Statistics and Probability PDF

Commission on Higher Education
in collaboration with the Philippine Normal University
TEACHING GUIDE FOR SENIOR HIGH SCHOOL
Statistics and Probability

CORE SUBJECT
This Teaching Guide was collaboratively developed and reviewed by educators from public
and private schools, colleges, and universities. We encourage teachers and other education
stakeholders to email their feedback, comments, and recommendations to the Commission on
Higher Education, K to 12 Transition Program Management Unit - Senior High School
Support Team at k12@ched.gov.ph. We value your feedback and recommendations.
Published by the Commission on Higher Education, 2016 
Chairperson: Patricia B. Licuanan, Ph.D.
Commission on Higher Education 

K to 12 Transition Program Management Unit 
Office Address: 4th Floor, Commission on Higher Education,  
C.P. Garcia Ave., Diliman, Quezon City 
Telefax: (02) 441-1143 / E-mail Address: k12@ched.gov.ph
DEVELOPMENT TEAM
Team Leader: Jose Ramon G. Albert, Ph.D.
Writers: Zita VJ Albacea, Ph.D., Mark John V. Ayaay, Isidoro P. David, Ph.D., Imelda E. de Mesa
Technical Editors: Nancy A. Tandang, Ph.D., Roselle V. Collado
Copy Reader: Rea Uy-Epistola
Illustrator: Michael Rey O. Santos
Cover Artists: Paolo Kurtis N. Tan, Renan U. Ortiz

CONSULTANTS
THIS PROJECT WAS DEVELOPED WITH THE PHILIPPINE NORMAL UNIVERSITY.
University President: Ester B. Ogena, Ph.D.
VP for Academics: Ma. Antoinette C. Montealegre, Ph.D.
VP for University Relations & Advancement: Rosemarievic V. Diaz, Ph.D.
Ma. Cynthia Rose B. Bautista, Ph.D., CHED
Bienvenido F. Nebres, S.J., Ph.D., Ateneo de Manila University
Carmela C. Oracion, Ph.D., Ateneo de Manila University
Minella C. Alarcon, Ph.D., CHED
Gareth Price, Sheffield Hallam University
Stuart Bevins, Ph.D., Sheffield Hallam University

SENIOR HIGH SCHOOL SUPPORT TEAM
CHED K TO 12 TRANSITION PROGRAM MANAGEMENT UNIT
Program Director: Karol Mark R. Yee
Lead for Senior High School Support: Gerson M. Abesamis
Lead for Policy Advocacy and Communications: Averill M. Pizarro
Course Development Officers: John Carlo P. Fernando, Danie Son D. Gonzalvo
Teacher Training Officers: Ma. Theresa C. Carlos, Mylene E. Dones
Monitoring and Evaluation Officer: Robert Adrian N. Daulat
Administrative Officers: Ma. Leana Paula B. Bato, Kevin Ross D. Nera, Allison A. Danao, Ayhen Loisse B. Dalena

This Teaching Guide by the Commission on Higher Education is licensed under a Creative
Commons Attribution-NonCommercial-ShareAlike 4.0 International License. This means you are free to:
Share: copy and redistribute the material in any medium or format
Adapt: remix, transform, and build upon the material.
The licensor, CHED, cannot revoke these freedoms as long as you follow the license terms.
However, under the following terms:
Attribution: You must give appropriate credit, provide a link to the license, and indicate if
changes were made. You may do so in any reasonable manner, but not in any way that suggests the
licensor endorses you or your use.
NonCommercial: You may not use the material for commercial purposes.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your
contributions under the same license as the original.

Printed in the Philippines by EC-TEC Commercial, No. 32 St. Louis Compound 7, Baesa, Quezon City,
ectec_com@yahoo.com
Preface
Prior to the implementation of K-12, Statistics was taught in public high schools in the Philippines
typically in the last quarter of third year. In private schools, Statistics was taught as either an elective,
or a required but separate subject outside of regular Math classes. In college, Statistics was taught
practically to everyone either as a three-unit or six-unit course. All college students had to take at least
three to six units of Math courses, and would typically “endure” a Statistics course to graduate.
Teachers who taught these Statistics classes, whether in high school or in college, would typically be
Math teachers, who may not necessarily have had formal training in Statistics. They were selected out
of the understanding (or misunderstanding) that Statistics is Math. Statistics does depend on and use a
lot of Math, but so do many disciplines, e.g. engineering, physics, accounting, chemistry, computer
science. But Statistics is not Math, not even a branch of Math. Hardly would one think that accounting
is a branch of mathematics simply because it does a lot of calculations. An accountant would also not
describe himself as a mathematician.
Math largely involves a deterministic way of thinking and the way Math is taught in schools leads
learners into a deterministic way of examining the world around them. Statistics, on the other hand,
deals largely with uncertainty. Statistics uses inductive thinking (from specifics to generalities),
while Math uses deduction (from the general to the specific).
“Statistics has its own tools and ways of thinking, and statisticians are quite insistent that
those of us who teach mathematics realize that statistics is not mathematics, nor is it even a
branch of mathematics. In fact, statistics is a separate discipline with its own unique ways of
thinking and its own tools for approaching problems.” - J. Michael Shaughnessy, “Research on
Students’ Understanding of Some Big Concepts in Statistics” (2006)
Statistics deals with data; its importance has been recognized by governments, by the private sector,
and across disciplines because of the need for evidence-based decision making. It has become even more
important in the past few years, now that more and more data are being collected, stored, analyzed and
re-analyzed. From the time when humanity first walked the face of the earth until 2003, we created about
5 exabytes of data (1 exabyte being a billion “gigabytes”). Information and communications
technology (ICT) tools have provided us the means to transmit and exchange data much faster, whether
these data are in the form of sound, text, visual images, signals, or any combination of
those forms, using desktops, laptops, tablets, mobile phones, and other gadgets with the use of the
internet and social media (Facebook, Twitter). With the data deluge arising from using ICT tools, as of 2012
as much as 5 exabytes (the amount of data created from the beginning of history up to 2003) were being
created every two days; a year later, this same amount of data was being created every ten
minutes.
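To make the scale of these figures concrete, the short sketch below works out the arithmetic implied by the paragraph above. It takes the quoted figures (5 exabytes, every two days, every ten minutes) at face value and does not verify them; the variable names are chosen only for this illustration.

```python
# Figures as quoted in the preface; they are taken at face value here.
EXABYTES_PER_BATCH = 5
GIGABYTES_PER_EXABYTE = 10**9  # 1 exabyte = a billion gigabytes

MINUTES_PER_DAY = 24 * 60
minutes_per_batch_2012 = 2 * MINUTES_PER_DAY  # 5 exabytes every two days
minutes_per_batch_2013 = 10                   # 5 exabytes every ten minutes

# How many times faster the same amount of data was being created a year later.
speedup = minutes_per_batch_2012 / minutes_per_batch_2013

# Gigabytes created per minute at the later rate.
gb_per_minute_2013 = EXABYTES_PER_BATCH * GIGABYTES_PER_EXABYTE / minutes_per_batch_2013

print(f"Speed-up from 2012 to 2013: {speedup:.0f} times")
print(f"Gigabytes per minute in 2013: {gb_per_minute_2013:,.0f}")
```

Under these quoted figures, two days is 2,880 minutes, so going from "every two days" to "every ten minutes" is a 288-fold increase in the rate of data creation.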
In order to make sense of data, which typically exhibit variation and uncertainty, we need the Science
of Statistics, to enable us to summarize data for describing or explaining phenomena, or to make
predictions (assuming trends in the data continue). Statistics is the science that studies data, and what
we can do with data. Teachers of Statistics and Probability can easily spend much time on the formal
methods and computations, losing sight of the real applications, and taking the excitement out of things.
The eminent statistician Bradley Efron mentioned how diverse statistical applications are:
“During the 20th Century statistical thinking and methodology has become the scientific
framework for literally dozens of fields including education, agriculture, economics, biology, and
medicine, and with increasing influence recently on the hard sciences such as astronomy,
geology, and physics. In other words, we have grown from a small obscure field into a big obscure
field.”
In consequence, the work of a statistician has even become fashionable. Google’s chief economist Hal
Varian wrote in 2009 that “the sexy job in the next ten years will be statisticians.” He went on to
mention that “The ability to take data - to be able to understand it, to process it, to extract value from
it, to visualize it, to communicate it - that's going to be a hugely important skill in the next decades, not
only at the professional level but even at the educational level for elementary school kids, for high school
kids, for college kids.”
This teaching guide, prepared by a team of professional statisticians and educators, aims to assist
Senior High School teachers of the Grade 11 second semester course in Statistics and Probability so that
they can help Senior High School students discover the fun in describing data, and in exploring the
stories behind the data. The K-12 curriculum provides for concepts in Statistics and Probability to be
taught from Grade 1 up to Grade 8, and in Grade 10, but the depth at which learners absorb these
concepts may need reinforcement. Thus, the first chapter of this guide discusses basic tools (such as
summary measures and graphs) for describing data. While Probability may have been discussed prior to
Grade 11, it is also discussed in Chapter 2, as a prelude to defining Random Variables and their
Distributions. The next chapter discusses Sampling and Sampling Distributions, which bridges
Descriptive Statistics and Inferential Statistics. The latter is started in Chapter 4, in Estimation, and
further discussed in Chapter 5 (which deals with Tests of Hypothesis). The final chapter discusses
Regression and Correlation.
Although Statistics and Probability may be tangential to the primary training of many if not all Senior
High School teachers of Statistics and Probability, it will be of benefit for them to see why this course is
important to teach. After all, if the teachers themselves do not find meaning in the course, neither will
the students. Work on developing this set of teaching materials has been supported by the Commission on
Higher Education under a Materials Development Sub-project of the K-12 Transition Project. These
materials will also be shared with the Department of Education.
Writers of this teaching guide recognize that few Senior High School teachers would have formal
training or applied experience with statistical concepts. Thus, the guide gives concrete suggestions on
classroom activities that can illustrate the wide range of processes behind data collection and data
analysis.
It would be ideal to use technology (e.g. computers) as a means to help teachers and students with
computations; hence, the guide also provides suggestions in case the class has access to a
computer room (particularly the use of spreadsheet applications like Microsoft Excel). It would be
unproductive for teachers and students to spend too much time working on formulas, and checking
computation errors at the expense of gaining knowledge and insights about the concepts behind the
formulas.
The guide gives a mixture of lectures and activities (the latter include actual collection and analysis of
data). It tries to follow suggestions of the Guidelines for Assessment and Instruction in Statistics
Education (GAISE) Project of the American Statistical Association to go beyond lecture methods, and
instead emphasize conceptual learning, use active learning strategies and focus on real data. The guide
indicates which material is optional, as there is really a lot of material that could be taught, but too little
time. Teachers will have to find a way of recognizing the diverse needs of students with variable
abilities and interests.
This teaching guide for Statistics and Probability, to be made available both digitally and in print to
senior high school teachers, shall provide Senior High School teachers of Statistics and Probability with
much-needed support as the country’s basic education system transitions into the K-12 curriculum. It is
earnestly hoped that Senior High School teachers of Grade 11 Statistics and Probability can direct
students into examining the context of data, identifying the consequences and implications of stories
behind Statistics and Probability, thus becoming critical consumers of information. It is further hoped
that the competencies gained by students in this course will help them become more statistically literate,
and better prepared for whatever employment choices (and higher education specializations) they pursue,
given that employers are recognizing the importance of having employees skilled in data management
and analysis in this very data-centric world.
K to 12 BASIC EDUCATION CURRICULUM
SENIOR HIGH SCHOOL – CORE SUBJECT
Grade: 11/12
Core Subject Title: Statistics and Probability
No. of Hours/Semester: 80 hours/semester
Prerequisite (if needed):
Core Subject Description: At the end of the course, the students must know how to find the mean and variance of a random variable, to apply sampling techniques and
distributions, to estimate population mean and proportion, to perform hypothesis testing on population mean and proportion, and to perform correlation and regression
analyses on real-life problems.
CONTENT: Random Variables and Probability Distributions
CONTENT STANDARDS: The learner demonstrates understanding of key concepts of random variables and
probability distributions.
PERFORMANCE STANDARDS: The learner is able to apply an appropriate random variable for a given real-life
problem (such as in decision making and games of chance).
LEARNING COMPETENCIES (CODE): The learner …
1. illustrates a random variable (discrete and continuous). (M11/12SP-IIIa-1)
2. distinguishes between a discrete and a continuous random variable. (M11/12SP-IIIa-2)
3. finds the possible values of a random variable. (M11/12SP-IIIa-3)
4. illustrates a probability distribution for a discrete random variable and its properties. (M11/12SP-IIIa-4)
5. constructs the probability mass function of a discrete random variable and its corresponding
histogram. (M11/12SP-IIIa-5)
6. computes probabilities corresponding to a given random variable. (M11/12SP-IIIa-6)
7. illustrates the mean and variance of a discrete random variable. (M11/12SP-IIIb-1)
8. calculates the mean and the variance of a discrete random variable. (M11/12SP-IIIb-2)
9. interprets the mean and the variance of a discrete random variable. (M11/12SP-IIIb-3)
10. solves problems involving mean and variance of probability distributions. (M11/12SP-IIIb-4)
CONTENT: Normal Distribution
CONTENT STANDARDS: The learner demonstrates understanding of key concepts of normal probability
distribution.
PERFORMANCE STANDARDS: The learner is able to accurately formulate and solve real-life problems in
different disciplines involving normal distribution.
LEARNING COMPETENCIES (CODE): The learner …
11. illustrates a normal random variable and its characteristics. (M11/12SP-IIIc-1)
12. constructs a normal curve. (M11/12SP-IIIc-2)
13. identifies regions under the normal curve corresponding to different standard normal
values. (M11/12SP-IIIc-3)
14. converts a normal random variable to a standard normal variable and vice versa. (M11/12SP-IIIc-4)
15. computes probabilities and percentiles using the standard normal table. (M11/12SP-IIIc-d-1)
K to 12 Senior High School Core Curriculum – Statistics and Probability, December 2013
CONTENT: Sampling and Sampling Distributions
CONTENT STANDARDS: The learner demonstrates understanding of key concepts of sampling and sampling
distributions of the sample mean.
PERFORMANCE STANDARDS: The learner is able to apply suitable sampling and sampling distributions of
the sample mean to solve real-life problems in different disciplines.
LEARNING COMPETENCIES (CODE): The learner …
1. illustrates random sampling. (M11/12SP-IIId-2)
2. distinguishes between parameter and statistic. (M11/12SP-IIId-3)
3. identifies sampling distributions of statistics (sample mean). (M11/12SP-IIId-4)
4. finds the mean and variance of the sampling distribution of the sample mean. (M11/12SP-IIId-5)
5. defines the sampling distribution of the sample mean for normal population when the variance is:
(a) known
(b) unknown
(M11/12SP-IIIe-1)
6. illustrates the Central Limit Theorem. (M11/12SP-IIIe-2)
7. defines the sampling distribution of the sample mean using the Central Limit Theorem. (M11/12SP-III-3)
8. solves problems involving sampling distributions of the sample mean. (M11SP-IIIe-f-1)
CONTENT: Estimation of Parameters
CONTENT STANDARDS: The learner demonstrates understanding of key concepts of estimation of population
mean and population proportion.
PERFORMANCE STANDARDS: The learner is able to estimate the population mean and population proportion
to make sound inferences in real-life problems in different disciplines.
LEARNING COMPETENCIES (CODE): The learner …
1. illustrates point and interval estimations. (M11/12SP-IIIf-2)
2. distinguishes between point and interval estimation. (M11/12SP-IIIf-3)
3. identifies point estimator for the population mean. (M11/12SP-IIIf-4)
4. computes for the point estimate of the population mean. (M11/12SP-IIIf-5)
5. identifies the appropriate form of the confidence interval estimator for the population mean when:
(a) the population variance is known, (b) the population variance is unknown, and (c) the Central
Limit Theorem is to be used. (M11/12SP-IIIg-1)
9. illustrates the t-distribution. (M11/12SP-IIIg-2)
10. constructs a t-distribution. (M11/12SP-IIIg-3)
11. identifies regions under the t-distribution corresponding to different t-values. (M11/12SP-IIIg-4)
11. identifies percentiles using the t-table. (M11/12SP-IIIg-5)
12. computes for the confidence interval estimate based on the appropriate form of the estimator for
the population mean. (M11/12SP-IIIh-1)
13. solves problems involving confidence interval estimation of the population mean. (M11/12SP-IIIh-2)
14. draws conclusion about the population mean based on its confidence interval estimate. (M11/12SP-IIIh-3)
15. identifies point estimator for the population proportion. (M11/12SP-IIIi-1)
16. computes for the point estimate of the population proportion. (M11/12SP-IIIi-2)
17. identifies the appropriate form of the confidence interval estimator for the population proportion
based on the Central Limit Theorem. (M11/12SP-IIIi-3)
18. computes for the confidence interval estimate of the population proportion. (M11/12SP-IIIi-4)
19. solves problems involving confidence interval estimation of the population proportion. (M11/12SP-IIIi-5)
20. draws conclusion about the population proportion based on its confidence interval
estimate. (M11/12SP-IIIi-6)
21. identifies the length of a confidence interval. (M11/12SP-IIIj-1)
22. computes for the length of the confidence interval. (M11/12SP-IIIj-2)
23. computes for an appropriate sample size using the length of the interval. (M11/12SP-IIIj-3)
24. solves problems involving sample size determination. (M11/12SP-IIIj-4)
CONTENT: Tests of Hypothesis
CONTENT STANDARDS: The learner demonstrates understanding of key concepts of tests of hypotheses on the
population mean and population proportion.
PERFORMANCE STANDARDS: The learner is able to perform appropriate tests of hypotheses involving the
population mean and population proportion to make inferences in real-life problems in different
disciplines.
LEARNING COMPETENCIES (CODE): The learner …
1. illustrates:
(a) null hypothesis
(b) alternative hypothesis
(c) level of significance
(d) rejection region; and
(e) types of errors in hypothesis testing.
(M11/12SP-IVa-1)
2. calculates the probabilities of committing a Type I and Type II error. (M11/12SP-IVa-2)
3. identifies the parameter to be tested given a real-life problem. (M11/12SP-IVa-3)
4. formulates the appropriate null and alternative hypotheses on a population mean. (M11/12SP-IVb-1)
5. identifies the appropriate form of the test-statistic when:
(a) the population variance is assumed to be known
(b) the population variance is assumed to be unknown; and
(c) the Central Limit Theorem is to be used.
(M11/12SP-IVb-2)
6. identifies the appropriate rejection region for a given level of significance when:
(a) the population variance is assumed to be known
(b) the population variance is assumed to be unknown; and
(c) the Central Limit Theorem is to be used.
(M11/12SP-IVc-1)
7. computes for the test-statistic value (population mean). (M11/12SP-IVd-1)
8. draws conclusion about the population mean based on the test-statistic value and the rejection
region. (M11/12SP-IVd-2)
9. solves problems involving test of hypothesis on the population mean. (M11/12SP-IVe-1)
10. formulates the appropriate null and alternative hypotheses on a population proportion. (M11/12SP-IVe-2)
11. identifies the appropriate form of the test-statistic when the Central Limit Theorem is to be
used. (M11/12SP-IVe-3)
12. identifies the appropriate rejection region for a given level of significance when the Central Limit
Theorem is to be used. (M11/12SP-IVe-4)
13. computes for the test-statistic value (population proportion). (M11/12SP-IVf-1)
14. draws conclusion about the population proportion based on the test-statistic value and the rejection
region. (M11/12SP-IVf-2)
15. solves problems involving test of hypothesis on the population proportion. (M11/12SP-IVf-g-1)
CONTENT: ENRICHMENT: Correlation and Regression Analyses
CONTENT STANDARDS: The learner demonstrates understanding of key concepts of correlation and regression
analyses.
PERFORMANCE STANDARDS: The learner is able to perform correlation and regression analyses on real-life
problems in different disciplines.
LEARNING COMPETENCIES (CODE):
1. illustrates the nature of bivariate data. (M11/12SP-IVg-2)
2. constructs a scatter plot. (M11/12SP-IVg-3)
3. describes shape (form), trend (direction), and variation (strength) based on a scatter
plot. (M11/12SP-IVg-4)
4. estimates strength of association between the variables based on a scatter plot. (M11/12SP-IVh-1)
5. calculates the Pearson’s sample correlation coefficient. (M11/12SP-IVh-2)
6. solves problems involving correlation analysis. (M11/12SP-IVh-3)
7. identifies the independent and dependent variables. (M11/12SP-IVi-1)
8. draws the best-fit line on a scatter plot. (M11/12SP-IVi-2)
9. calculates the slope and y-intercept of the regression line. (M11/12SP-IVi-3)
10. interprets the calculated slope and y-intercept of the regression line. (M11/12SP-IVi-4)
11. predicts the value of the dependent variable given the value of the independent
variable. (M11/12SP-IVj-1)
12. solves problems involving regression analysis. (M11/12SP-IVj-2)
Code Book Legend
Sample: M11/12SP-IIIa-1
LEGEND and SAMPLE, in the order the parts appear in the code:
First Entry
  Learning Area and Strand/Subject or Specialization: Mathematics
  Grade Level: Grade 11/12
  Sample: M11/12
Uppercase Letter/s
  Domain/Content/Component/Topic: Statistics and Probability
  Sample: SP
-
Roman Numeral (*Zero if no specific quarter)
  Quarter: Third Quarter
  Sample: III
Lowercase Letter/s (*Put a hyphen (-) in between letters to indicate more than a specific week)
  Week: Week one
  Sample: a
-
Arabic Number
  Competency: illustrates a random variable (discrete and continuous)
  Sample: 1
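The legend can be read as a small parsing rule. The sketch below is only an illustration of that rule, not part of the curriculum document: the function name parse_competency_code, the pattern, and the dictionary keys are hypothetical choices, and the pattern assumes the regular layout shown in the sample (quarters I to IV, or 0 for no specific quarter).

```python
import re

# Regular layout described in the legend, e.g. "M11/12SP-IIIa-1".
# The pattern and the names below are assumptions made for this sketch.
CODE_PATTERN = re.compile(
    r"([A-Z]+)"             # learning area, e.g. M for Mathematics
    r"(\d+(?:/\d+)?)"       # grade level, e.g. 11/12
    r"([A-Z]+)"             # domain/content, e.g. SP
    r"-(I{1,3}|IV|0)"       # quarter as a Roman numeral, or 0 if none
    r"([a-z](?:-[a-z])*)"   # week letter(s), hyphen-separated if more than one
    r"-(\d+)"               # competency number
)

QUARTERS = {"0": None, "I": 1, "II": 2, "III": 3, "IV": 4}


def parse_competency_code(code):
    """Split a competency code into its parts, or return None if it does not fit the pattern."""
    match = CODE_PATTERN.fullmatch(code.strip())
    if match is None:
        return None
    area, grade, domain, quarter, weeks, number = match.groups()
    return {
        "learning_area": area,
        "grade_level": grade,
        "domain": domain,
        "quarter": QUARTERS[quarter],
        "weeks": weeks.split("-"),
        "competency": int(number),
    }


print(parse_competency_code("M11/12SP-IIIa-1"))
print(parse_competency_code("M11/12SP-IVf-g-1"))
```

A code that omits the week letter, such as M11/12SP-III-3 in the list above, does not fit this pattern, and the function returns None for it rather than guessing the missing part.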
Table of Contents
Chapter 1: Exploring Data
• Introducing Statistics
• Data Collection Activity
• Basic Terms in Statistics
• Levels of Measurement
• Data Presentation
• Measures of Central Tendency
• Other Measures of Location
• Measures of Variation
• More on Describing Data: Summary Measures and Graphs

Chapter 2: Random Variables and Probability Distributions
• Probability
• Geometric Probability
• Random Variables
• Probability Distributions of Discrete Random Variables
• Probability Density Functions
• Mean and Variance of Discrete Random Variables
• More about Means and Variances
• The Normal Distribution and Its Properties
• Areas Under a Standard Normal Distribution
• Areas under a Normal Distribution

Chapter 3: Sampling
• Coin Tossing revisited from a Statistical Perspective
• The Need for Sampling
• Sampling Distribution of the Sample Mean
• Sampling without Replacement
• Sampling from a Box of Marbles, Nips, or Colored Paper Clips and One-Peso Coins
• Sampling from the Periodic Table

Chapter 4: On Estimation of Parameters
• Concepts of Point and Interval Estimation
• Point Estimation of the Population Mean
• Confidence Interval Estimation of the Population Mean
• Point and Confidence Interval Estimation of the Population Proportion
• More on Point Estimates and Confidence Intervals

Chapter 5: Tests of Hypothesis
• Basic Concepts in Hypothesis Testing
• Steps in Hypothesis Testing
• Test on Population Mean
• Test on Population Proportion
• More on Hypothesis Tests Regarding the Population Proportion

Chapter 6: Correlation and Regression Analysis
• Examining Relationships with Correlation

Biographical Notes
CHAPTER 1: EXPLORING DATA
Lesson 1: Introducing Statistics

TIME FRAME: 60 minutes
OVERVIEW OF LESSON
In decision making, we use statistics although some of us may not be aware of it.
In this lesson, we make the students realize that to decide logically, they need to
use statistics. An inquiry could be answered or a problem could be solved
through the use of statistics. In fact, without knowing it we use statistics in our
daily activities.
LEARNING COMPETENCIES:
At the end of the lesson, the learner should be able to identify questions that
could be answered using a statistical process and describe the activities involved
in a statistical process.
LESSON OUTLINE:
1. Motivation
2. Statistics as a Tool in Decision-Making
3. Statistical Process in Solving a Problem
REFERENCES:
Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua,
Welfredo Patungan, Nelia Marquez). Rex Bookstore.
Handbook of Statistics 1 (1st and 2nd Edition). Faculty of the
Institute of Statistics, UP Los Baños, College, Laguna 4031.
Workbooks in Statistics 1 (1st to 13th Edition). Faculty of
the Institute of Statistics, UP Los Baños, College, Laguna 4031.
DEVELOPMENT OF THE LESSON
A. Motivation
You may ask the students to share a question that is on their minds at the moment, and
write their questions on the board. (Note: As you write the questions on the board, you
may try to group them into two: one group for questions that are answerable by a
single fact, and another group for questions that require more than one piece of
information and need further thinking.)
The following are examples of what you could have written on the board:
Group 1:
• How old is our teacher?
• Is the vehicle of the Mayor of our city/town/municipality bigger than the
vehicle used by the President of the Philippines?
• How many days are there in December?
• Does the Principal of the school have a postgraduate degree?
• How much does the Barangay Captain receive as allowance?
• What is the weight of my smallest classmate?
Group 2:
• How old are the people residing in our town?
• Do dogs eat more than cats?
• Does it rain more in our country than in Thailand?
• Do math teachers earn more than science teachers?
• How many books do my classmates usually bring to school?
• What is the proportion of Filipino children aged 0 to 5 years who are
underweight or overweight for their age?
The first group of questions could be answered by a piece of information which
is considered always true. There is a correct answer which is based on a fact, and
you don’t need the process of inquiry to answer such a question. For
example, there is one and only one correct answer to the first question in Group
1, and that is your age as of your last birthday or the number of years since your
birth year.
On the other hand, in the second group of questions one needs observations or
data to be able to respond to the question. In some questions you need to get
the observations or responses of all those concerned to be able to answer the
question. For the first question in the second group, you need to ask all the
people in the locality about their age, and from the values you obtain you
get a representative value. To answer the second question in the second group,
you need to get the amount of food that all dogs and cats eat.
However, we know that it is not feasible to do so. Thus what you can do
is get a representative group of dogs and another representative group of
cats. Then we measure the amount of food each group of animals eats. From
these two sets of values, we could then infer whether dogs do eat more than
cats.
So as you can see in the second group of questions you need more information
or data to be able to answer the question. Either you need to get observations
from all those concerned or you get representative groups from which you
gather your data. But in both cases, you need data to be able to respond to the
question. Using data to find an answer or a solution to a problem or an inquiry is
actually using the statistical process or doing it with statistics.
Now, let us formalize what we discussed and know more about statistics and
how we use it in decision-making.
B. Main Lesson
1. Statistics as a Tool in Decision-Making
Statistics is defined as a science that studies data to be able to make a decision.
Hence, it is a tool in the decision-making process. Mention that Statistics as a
science involves the methods of collecting, processing, summarizing and
analyzing data in order to provide answers or solutions to an inquiry. One also
needs to interpret and communicate the results of the methods identified above
to support a decision that one makes when faced with a problem or an inquiry.
Trivia: The word “statistics” actually comes from the word “state”,
because governments have long been involved in statistical activities,
especially the conduct of censuses either for military or taxation purposes.
The need for and conduct of censuses are recorded in the pages of holy
texts. In the Christian Bible, particularly the Book of Numbers, God is
reported to have instructed Moses to carry out a census. Another census
mentioned in the Bible is the census ordered by Caesar Augustus
throughout the entire Roman Empire before the birth of Christ.
Inform students that uncovering patterns in data involves not just science
but also art, and this is why some people may think “Stat is eeeks!”
and may view any statistical procedures and results with much skepticism.
Make known to students that Statistics enables us to
• characterize persons, objects, situations, and phenomena;
• explain relationships among variables;
• formulate objective assessments and comparisons; and, more importantly
• make evidence-based decisions and predictions.
And to use Statistics in decision-making, there is a statistical process to follow,
which is discussed in the next section.
2. Statistical Process in Solving a Problem
You may go back to one of the questions identified in the second group and use
it to discuss the components of a statistical process. For illustration on how to do
it, let us discuss how we could answer the question “Do dogs eat more than
cats?”
As discussed earlier, this question requires you to gather data to generate
statistics which will serve as the basis for answering the query. There should be a plan
or a design on how to collect the data so that the information we get from it is
sufficient and any bias in responding to the query is minimized. In
relation to the query, we said earlier that we cannot gather the data from all
dogs and cats. Hence, the plan is to get a representative group of dogs and
another representative group of cats. These representative groups are
observed for some characteristics like the animal's weight, the amount of food in
grams eaten per day, and the breed of the animal. Included in the plan are factors
like how many dogs and cats are included in each group, how to select those
included in the representative groups, and when to observe these animals for
their characteristics.
After the data are gathered, we must verify the quality of the data to make a
good decision. A data quality check could be done as we process the data to
summarize the information extracted from the data. Then, using this information,
one can make a decision or provide answers to the problem or question at
hand.
To summarize, a statistical process in making a decision or providing solutions to

a problem include the following:
• Planning or designing the collection of data to answer statistical questions in

a way that maximizes information content and minimizes bias;
• Collecting the data as required in the plan;
• Verifying the quality of the data after they were collected;
• Summarizing the information extracted from the data; and
• Examining the summary statistics so that insight and meaningful information
can be produced to support decision-making or solutions to the question or
problem at hand.
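The steps above can be sketched in a few lines of Python. This is only an illustration: the food-intake figures are invented for the example (they are not real measurements), and the helper names `verify` and `summarize` are hypothetical.

```python
import statistics

# Steps 1-2 (plan and collect): hypothetical daily food intake in grams for a
# small representative group of each animal. These numbers are invented.
dog_grams = [410, 380, 450, 395, -1, 420]   # -1 marks a recording error
cat_grams = [210, 190, 230, 205, 220, 215]

def verify(values):
    """Step 3: keep only plausible readings (positive amounts of food)."""
    return [v for v in values if v > 0]

def summarize(values):
    """Step 4: reduce the data to a summary statistic (here, the mean)."""
    return statistics.mean(values)

dog_mean = summarize(verify(dog_grams))
cat_mean = summarize(verify(cat_grams))

# Step 5: examine the summaries to answer "Do dogs eat more than cats?"
dogs_eat_more = dog_mean > cat_mean
print(f"dogs: {dog_mean:.1f} g/day, cats: {cat_mean:.1f} g/day")
print("Dogs eat more than cats in this sample:", dogs_eat_more)
```

A comparison of two means is only one possible summary; the choice of summary statistic is part of the plan made in the first step.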
Hence, several activities make up a statistical process. For some questions the
process is simple, but for others it might be a little complicated to implement.
Also, not all questions or problems can be answered by a simple statistical
process; some problems need a complex statistical process.
However, one can be assured that logical decisions or solutions can be
formulated using a statistical process.
KEY POINTS
• There is a difference between questions that could be and those that could not be
answered using Statistics.
• Statistics is a science that studies data.
• There are many uses of Statistics but its main use is in decision-making.
• Logical decisions or solutions to a problem could be attained through a
statistical process.
ASSESSMENT
Note: Answers are provided inside the parentheses and italicized.
1. Identify which of the following questions are answerable using a statistical
process.
a. What is a typical size of a Filipino family? (answerable through a statistical
process)
b. How many hours in a day? (not answerable through a statistical process)
c. How old is the oldest man residing in the Philippines? (answerable through
a statistical process)
d. Is planet Mars bigger than planet Earth? (not answerable through a
statistical process)
e. What is the average wage rate in the country? (answerable through a
statistical process)
f. Would Filipinos prefer eating bananas rather than apples? (answerable
through a statistical process)
g. How long did you sleep last night? (not answerable through a statistical
process)
h. How much does a newly hired public school teacher in NCR earn in a month?
(not answerable through a statistical process)
i. How tall is a typical Filipino? (answerable through a statistical process)
j. Did you eat your breakfast today? (not answerable through a statistical
process)
2. For each of the identified questions in Number 1 that are answerable using a
statistical process, describe the activities involved in the process.
For a. What is a typical size of a Filipino family? (The process includes getting a
representative group of Filipino families and asking each family head how
many members the family has. From the gathered data, which have
undergone a quality check, a typical value of the number of family
members can be obtained. Such a typical value represents a possible answer
to the question.)
For c. How old is the oldest man residing in the Philippines? (The process
includes getting the ages of all residents of the country. From the gathered
data, which have undergone a quality check, the highest value of age can be
obtained. Such a value is the answer to the question.)
For e. What is the average wage rate in the country? (The process includes
getting all prevailing wage rates in the country. From the gathered data, which
have undergone a quality check, a typical value of the wage rate can be
obtained. Such a value is the answer to the question.)
For f. Would Filipinos prefer eating bananas rather than apples? (The process
includes getting a representative group of Filipinos and asking each one of them
which fruit he/she prefers, banana or apple. From the gathered data, which
have undergone a quality check, the proportion of those who prefer bananas
and the proportion of those who prefer apples are computed and compared.
The results of this comparison could provide a possible answer to the
question.)
For i. How tall is a typical Filipino? (The process includes getting a
representative group of Filipinos and measuring the height of each member of
the representative group. From the gathered data, which have undergone a
quality check, a typical value of the height of a Filipino can be obtained. Such a
typical value represents a possible answer to the question.)
Note: Tell the students that getting a representative group and obtaining a
typical value are to be learned in subsequent lessons in this subject.
Lesson 2: Data Collection Activity

OVERVIEW OF LESSON
As we have learned in the previous lesson, Statistics is a science that studies data.
Hence, to teach Statistics, using a real data set is recommended. In this lesson, we
present an activity where the students will be asked to provide some data that will
be submitted for consolidation by the teacher for future lessons. Data on heights
and weights, for instance, will be used for calculating Body Mass Index in the
integrative lesson. Students will also be given the perspective that the data they
provided is part of a bigger group of data as the same data will be asked from much
larger groups (the entire class, all Grade 11 students in school, all Grade 11
students in the district). The contextualization of data will also be discussed.
At the end of the lesson, the learner should be able to:
• Recognize the importance of providing correct information in a data collection
activity;
• Understand the issue of confidentiality of information in a data collection activity;
• Participate in a data collection activity; and
• Contextualize data
LESSON OUTLINE:
1. Preliminaries in a Data Collection Activity
2. Performing a Data Collection Activity
3. Contextualization of Data
A. Preliminaries in a Data Collection Activity
Before the lesson, prepare a sheet of paper listing everyone’s name in class with a
“Class Student Number” (see Attachment A for the suggested format). The
class student number is a random number chosen in the following fashion:
(a) Make a box with “tickets” (small pieces of paper of equal size) listing the
numbers 1 up to the number of students in the class.
(b) Shake the box, get a ticket, and assign the number in the ticket to the first
person in the list.
(c) Shake the box again, get another ticket, and assign the number of this ticket to
the next person in the list.
(d) Do (c) until you run out of tickets in the box.
At this point, all the students have their corresponding class student number written
beside their names in the prepared class list. Note that the preparation of the class
list is done before the class starts.
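If you prefer to prepare the class list on a computer, the ticket-drawing procedure can be imitated with a random shuffle. The sketch below is one possible way to do it, not a required method; the function name `assign_class_numbers` and the student names are hypothetical.

```python
import random

def assign_class_numbers(names, seed=None):
    """Give each name a distinct random number from 1 to len(names)."""
    rng = random.Random(seed)
    tickets = list(range(1, len(names) + 1))  # the "tickets" in the box
    rng.shuffle(tickets)                      # shaking the box
    # Tickets are handed out in list order, without replacement.
    return dict(zip(names, tickets))

# Hypothetical class list, for illustration only.
class_list = ["Student A", "Student B", "Student C", "Student D"]
numbers = assign_class_numbers(class_list, seed=2016)
for name, number in numbers.items():
    print(name, number)
```

Fixing the `seed` makes the assignment reproducible; leaving it as `None` gives a different assignment each time, just like a fresh draw from the box.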
At the start of the class, inform each student confidentially of his/her class student
number. Perhaps, when the attendance is called, each student can be provided a
separate piece of paper that lists her/his name and class student number. Tell
students to remember their class student number, and to always use this throughout
the semester whenever data are requested of them. Explain to students that in a data
collection activity, specific identities such as names are not required, especially
because people have a right to confidentiality. However, there should be a way to
develop and maintain a database so that the quality of the data can be checked and,
if necessary, the data can be verified with the respondents who provided them.
These preliminary steps for generating a class student number and informing
students confidentially of their class student number are essential for the data
collection activities to be performed in this lesson and other lessons so that students
can be uniquely identified, without having to obtain their names. Also inform the
students that the class student numbers they were given are meant to identify them
without having to know their specific identities in the class recording sheet (which
will contain the consolidated records that everyone had provided). This helps
protect confidentiality of information.
In statistical activities, facts are collected from respondents for purposes of getting
aggregate information, but confidentiality should be protected. Mention that the
agencies mandated to collect data are bound by law to protect the confidentiality of
information provided by respondents. Market research organizations in the
private sector and individual researchers also guard confidentiality, as they merely
want to obtain aggregate data. This way, respondents can be truthful in giving
information, and the researcher can give a commitment to respondents that the
data they provide will never be released to anyone in a form that will identify them
without their consent.
B. Performing a Data Collection Activity
Explain to the students that the purpose of this data collection activity is to gather
data that they could use for their future lessons in Statistics. It is important that they
do provide the needed information to the best of their knowledge. Also, before
they respond to the questionnaire provided in the Attachment B as Student
Information Sheet (SIS), it is recommended that each item in the SIS should be
clarified. The following are suggested clarifications to make for each item:
1. CLASS STUDENT NUMBER: This is the number that you provided confidentially
to the student at the start of the class.
2. SEX: This is the student’s biological sex and not his/her preferred gender. Hence,
the student has to choose only one of the two choices by placing a check mark (√) in
the space provided before the choices.
3. NUMBER OF SIBLINGS: This is the number of brothers and sisters that the
student has in their nuclear or immediate family. This number excludes him or
her in the count. Thus, if the student is the only child in the family then he/she
will report zero as his/her number of siblings.
4. WEIGHT (in kilograms): This refers to the student’s weight based on the
student’s knowledge. Note that the weight has to be reported in kilograms. In
case the student knows his/her weight in pounds, the value should be converted
to kilograms by dividing the weight in pounds by a conversion factor of 2.2
pounds per kilogram.
5. HEIGHT (in centimeters): This refers to the student’s height based on the
student’s knowledge. Note that the height has to be reported in centimeters. In
case the student knows his/her height in inches, the value should be converted
to centimeters by multiplying the height in inches by a conversion factor of 2.54
centimeters per inch.
6. AGE OF MOTHER (as of her last birthday in years): This refers to the age of the
student’s mother in years as of her last birthday; thus this number should be
reported as a whole number. In case the student’s mother is deceased or her
whereabouts are unknown, ask the student to provide the age the mother would
have if she were alive or present. You could help the student determine his/her
mother’s age based on other information that the student could provide, such as
the birth year of the mother or the student’s age. Note also that zero is not an
acceptable value.
7. USUAL DAILY ALLOWANCE IN SCHOOL (in pesos): This refers to the usual
amount in pesos that the student is given when he/she goes to school on
a weekday. Note that the student can give zero as a response for this item, in case
he/she has no monetary allowance per day.
8. USUAL DAILY FOOD EXPENDITURE IN SCHOOL (in pesos): This refers to the
usual amount in pesos that the student spends for food including drinks in
school per day. Note that the student can give zero as response for this item, in
case he/she does not spend for food in school.
9. USUAL NUMBER OF TEXT MESSAGES SENT IN A DAY: This refers to the usual
number of text messages that a student sends in a day. Note that the student can
give zero as a response for this item, in case he/she does not have a gadget to
use for sending text messages or simply does not send text messages.
10. MOST PREFERRED COLOR: The student is to choose the color that he/she
considers his/her most preferred among the given choices. Note that the student
can choose only one. Hence, he/she has to place a check mark (√) in the space
provided before the color he/she considers his/her most preferred color
among those given.
11. USUAL SLEEPING TIME: This refers to the usual sleeping time at night during a
typical weekday or school day. Note that the time is to be reported using the
24-hour clock, also known as military time (0:00 to 23:59 are the
possible values to use).
12. HAPPINESS INDEX FOR THE DAY: The student has to respond on how he/she
feels at that time using codes from 1 to 10. Code 1 means that the
student is very unhappy, while Code 10 means that the student is very
happy on the day when the data are being collected.
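The unit conversions in items 4 and 5 and the 24-hour format in item 11 can be checked with a short script such as the one below. It is a minimal sketch: the function names are hypothetical, the factor of 2.2 pounds per kilogram is the approximate value used in this guide, and the sample inputs are made up.

```python
LB_PER_KG = 2.2     # approximate conversion factor used in this guide
CM_PER_INCH = 2.54

def pounds_to_kg(pounds):
    """Item 4: divide the weight in pounds by 2.2 pounds per kilogram."""
    return pounds / LB_PER_KG

def inches_to_cm(inches):
    """Item 5: multiply the height in inches by 2.54 cm per inch."""
    return inches * CM_PER_INCH

def is_valid_24h(text):
    """Item 11: True if text looks like H:MM or HH:MM from 0:00 to 23:59."""
    hours, sep, minutes = text.partition(":")
    if not (sep and hours.isdigit() and minutes.isdigit() and len(minutes) == 2):
        return False
    return 0 <= int(hours) <= 23 and 0 <= int(minutes) <= 59

# Hypothetical student who knows his/her weight in pounds and height in inches.
print(round(pounds_to_kg(132), 1))   # weight in kilograms
print(round(inches_to_cm(63), 1))    # height in centimeters
print(is_valid_24h("23:00"), is_valid_24h("24:00"))
```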
After the clarification, the students are provided at most 10 minutes to respond to
the questionnaire. Ask the students to submit the completed SIS so that you could
consolidate the data gathered using a formatted worksheet file provided to you as
Attachment C. Having the data in an electronic file makes it easier for you to use them in
future lessons. Be sure that the students have provided information for all items in
the SIS.
Inform the students that you will compile all their responses. Compiling
these records from everyone in the class is an example of a census, since data have
been gathered from every student in class. Mention that the government, through
the Philippine Statistics Authority (PSA), conducts censuses to obtain information
about socio-demographic characteristics of the residents of the country. Census
data are used by the government to make plans, such as how many schools and
hospitals to build. Censuses of population and housing are conducted every 10
years on years ending in zero (e.g., 1990, 2000, 2010) to obtain population counts,
and demographic information about all Filipinos. Mid-decade population censuses
have also been conducted since 1995. Censuses of Agriculture, and of Philippine
Business and Industry, are also conducted by the PSA to obtain information on
production and other relevant economic information.
PSA is the government agency mandated to conduct censuses and surveys.
Through Republic Act 10625 (also referred to as The Philippine Statistical Act of
2013), PSA was created from four former government statistical agencies, namely:
National Statistics Office (NSO), National Statistical Coordination Board (NSCB),
Bureau of Labor and Employment Statistics (BLES) and Bureau of Agricultural
Statistics (BAS). The other agency created through RA 10625 is the Philippine
Statistical Research and Training Institute (PSRTI), which is mandated as the research
and training arm of the Philippine Statistical System. PSRTI was created from its
forerunner, the former Statistical Research and Training Center (SRTC).
C. Contextualization of Data
Ask students what comes to their minds when they hear the term “data” (which
may be viewed as a collection of facts from experiments, observations,
sample surveys and censuses, and administrative reporting systems).
Present to the students the following collection of numbers, figures, symbols, and
words, and ask them if they could consider the collection as data.
3, red, F, 156, 4, 65, 50, 25, 1, M, 9, 40, 68, blue, 78, 168, 69, 3, F, 6, 9, 45,
50, 20, 200, white, 2, pink, 160, 5, 60, 100, 15, 9, 8, 41, 65, black, 68, 165,
59, 7, 6, 35, 45,
Although the collection is composed of numbers and symbols that could be
classified as numeric or non-numeric, the collection has no meaning, that is, it is not
contextualized. Hence it cannot be referred to as data.
Tell the students that data are facts and figures that are collected,
presented, and analyzed. Data are either numeric or non-numeric and
must be contextualized. To contextualize data, that is, to give meaning to
the data, we must know the following six W’s of the data:
1. Who? Who provided the data?
2. What? What information was gathered from the respondents, and What is the unit of
measurement used for each piece of information (if any)?
3. When? When was the data collected?
4. Where? Where was the data collected?
5. Why? Why was the data collected?
6. HoW? HoW was the data collected?
Let us take as an illustration the data that you have just collected from the students,
and let us put meaning to it, or contextualize it, by responding to the questions with the
W’s. It is recommended that the students answer the W-questions so that they will
learn how to do it.
1. Who? Who provided the data?
• The students in this class provided the data.
2. What? What information was gathered from the respondents, and What is the unit of
measurement used for each piece of information (if any)?
• The information gathered includes Class Student Number, Sex, Number of
Siblings, Weight, Height, Age of Mother, Usual Daily Allowance in School,
Usual Daily Food Expenditure in School, Usual Number of Text Messages
Sent in a Day, Most Preferred Color, Usual Sleeping Time and Happiness
Index for the Day.
• The units of measurement for the information on Number of Siblings, Weight,
Height, Age of Mother, Usual Daily Allowance in School, Usual Daily Food
Expenditure in School, and Usual Number of Text Messages Sent in a Day
are person, kilogram, centimeter, year, pesos, pesos and message,
respectively.
3. When? When was the data collected?
• The data was collected on the first few days of classes for Statistics and
Probability.
4. Where? Where was the data collected?
• The data was collected inside our classroom.
5. Why? Why was the data collected?
• As explained earlier, the data will be used in our future lessons in Statistics
and Probability.
6. HoW? HoW was the data collected?
• The students provided the data by responding to the Student Information
Sheet prepared and distributed by the teacher for the data collection activity.
Once the data are contextualized, the collection of numbers and symbols has
meaning. It may now look like the following, which is just a small part of the
data collected in the earlier activity.
| Class Student Number | Sex | Number of siblings (in person) | Weight (in kg) | Height (in cm) | Age of mother (in years) | Usual daily allowance in school (in pesos) | Usual daily food expenditure in school (in pesos) | Usual number of text messages sent in a day | Most Preferred Color | Usual Sleeping Time | Happiness Index for the Day |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | M | 2 | 60 | 156 | 60 | 200 | 150 | 20 | RED | 23:00 | 8 |
| 2 | F | 5 | 63 | 160 | 66 | 300 | 200 | 25 | PINK | 22:00 | 9 |
| 3 | F | 3 | 65 | 165 | 59 | 250 | 50 | 15 | BLUE | 20:00 | 7 |
| 4 | M | 1 | 55 | 160 | 55 | 200 | 100 | 30 | BLACK | 19:00 | 6 |
| 5 | M | 0 | 65 | 167 | 45 | 350 | 300 | 35 | BLUE | 20:00 | 8 |
| : | : | : | : | : | : | : | : | : | : | : | : |
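One way to keep data contextualized when they are stored electronically is to save the answers to the W-questions and the units of measurement together with the values. The sketch below is only a suggestion: the dictionary layout and the names `context`, `units`, and `describe` are hypothetical, and only the first three records of the sample table and a subset of the units are shown.

```python
# Answers to the W-questions, stored alongside the values they describe.
context = {
    "who": "students in the class",
    "when": "first few days of classes",
    "where": "inside the classroom",
    "why": "for use in future lessons in Statistics and Probability",
    "how": "Student Information Sheet answered by each student",
}

# Units of measurement for some of the items (the rest of the "What").
units = {"siblings": "person", "weight": "kg", "height": "cm",
         "mother_age": "year", "allowance": "pesos", "food": "pesos",
         "texts": "message"}

columns = ["number", "sex", "siblings", "weight", "height", "mother_age",
           "allowance", "food", "texts", "color", "sleep", "happiness"]

# First three records of the sample table.
rows = [
    (1, "M", 2, 60, 156, 60, 200, 150, 20, "RED", "23:00", 8),
    (2, "F", 5, 63, 160, 66, 300, 200, 25, "PINK", "22:00", 9),
    (3, "F", 3, 65, 165, 59, 250, 50, 15, "BLUE", "20:00", 7),
]
records = [dict(zip(columns, row)) for row in rows]

def describe(record, column):
    """Show a value together with its unit, if the item has one."""
    unit = units.get(column)
    value = record[column]
    return f"{column}: {value} {unit}" if unit else f"{column}: {value}"

print(describe(records[0], "weight"))
print(describe(records[0], "color"))
```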
KEY POINTS
• Providing correct information in a government data collection activity is a
responsibility of every citizen in the country.
• Data confidentiality is important in a data collection activity.
• A census collects data from all possible respondents.
• Data to be collected must be clarified before the actual data collection.
• Data must be contextualized by answering six W-questions.
ATTACHMENT A: CLASS LIST
STUDENT NAME    CLASS STUDENT NUMBER    STUDENT NAME    CLASS STUDENT NUMBER
1. 36.
2. 37.
3. 38.
4. 39.
5. 40.
6. 41.
7. 42.
8. 43.
9. 44.
10. 45.
11. 46.
12. 47.
13. 48.
14. 49.
15. 50.
16. 51.
17. 52.
18. 53.
19. 54.
20. 55.
21. 56.
22. 57.
23. 58.
24. 59.
25. 60.
26. 61.
27. 62.
28. 63.
29. 64.
30. 65.
31. 66.
32. 67.
33. 68.
34. 69.
35. 70.
ATTACHMENT B: STUDENT INFORMATION SHEET
Instruction to the Students: Please provide completely the following information.
Your teacher is available to respond to your queries regarding the items in this
information sheet, if you have any. Rest assured that the information that you will be
providing will only be used in our lessons in Statistics and Probability.
1. CLASS STUDENT NUMBER: ______________
2. SEX (Put a check mark, √): ____ Male ____ Female
3. NUMBER OF SIBLINGS: _____
4. WEIGHT (in kilograms): ______________
5. HEIGHT (in centimeters): ______
6. AGE OF MOTHER (as of her last birthday in years): ________
(If your mother is deceased, provide the age she would have if she were alive)
7. USUAL DAILY ALLOWANCE IN SCHOOL (in pesos): _________________
8. USUAL DAILY FOOD EXPENDITURE IN SCHOOL (in pesos): ___________
9. USUAL NUMBER OF TEXT MESSAGES SENT IN A DAY: ______________
10. MOST PREFERRED COLOR (Put a check mark, √. Choose only one):
____ WHITE ____ RED ____ PINK ____ ORANGE
____ YELLOW ____ GREEN ____ BLUE ____ PEACH
____ BROWN ____ GRAY ____ BLACK ____ PURPLE
11. USUAL SLEEPING TIME (on weekdays): ______________
12. HAPPINESS INDEX FOR THE DAY:
On a scale from 1 (very unhappy) to 10 (very happy), how do you feel
today? ______
ATTACHMENT C: CLASS RECORDING SHEET (for the Teacher’s Use)
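| Class Student Number | Sex | Number of siblings (in person) | Weight (in kg) | Height (in cm) | Age of mother (in years) | Usual daily allowance in school (in pesos) | Usual daily food expenditure in school (in pesos) | Usual number of text messages sent in a day | Most Preferred Color | Usual Sleeping Time | Happiness Index for the Day |
|---|---|---|---|---|---|---|---|---|---|---|---|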
Lesson 3: Basic Terms in Statistics

OVERVIEW OF LESSON
As a continuation of Lesson 2 (where we contextualized data), in this lesson we define
basic terms in statistics as we continue to explore data. These basic terms include
universe, variable, population, and sample. We will also discuss in detail other
concepts related to a variable.
LEARNING OUTCOME(S):
At the end of the lesson, the learner is able to
• Define universe and differentiate it from population; and
• Define and differentiate between qualitative and quantitative variables, and
between discrete and continuous variables (that are quantitative).
LESSON OUTLINE:
1. Recall previous lesson on ‘Contextualizing Data’
2. Definition of Basic Terms in Statistics (universe, variable, population and sample)
3. Broad Classification of Variables (qualitative and quantitative, discrete and
continuous)
A. Recall previous lesson on ‘Contextualizing Data’
Begin by recalling with the students the data they provided in the previous lesson
and how they contextualized such data. You could show them the compiled data set
in a table like this:
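| Class Student Number | Sex | Number of siblings (in person) | Weight (in kg) | Height (in cm) | Age of mother (in years) | Usual daily allowance in school (in pesos) | Usual daily food expenditure in school (in pesos) | Usual number of text messages sent in a day | Most Preferred Color | Usual Sleeping Time | Happiness Index for the Day |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | M | 2 | 60 | 156 | 60 | 200 | 150 | 20 | RED | 23:00 | 8 |
| 2 | F | 5 | 63 | 160 | 66 | 300 | 200 | 25 | PINK | 22:00 | 9 |
| 3 | F | 3 | 65 | 165 | 59 | 250 | 50 | 15 | BLUE | 20:00 | 7 |
| 4 | M | 1 | 55 | 160 | 55 | 200 | 100 | 30 | BLACK | 19:00 | 6 |
| 5 | M | 0 | 65 | 167 | 45 | 350 | 300 | 35 | BLUE | 20:00 | 8 |
| : | : | : | : | : | : | : | : | : | : | : | : |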
Recall also their response on the first W of the data, that is, on the question “Who
provided the data?” We said last time that the students of the class provided the data, or
the data were taken from the students.
Another W of the data is What: What information was gathered from the respondents,
and What is the unit of measurement used for each piece of information (if
any)? Our responses are the following:
• The information gathered includes Class Student Number, Sex, Number of
Siblings, Weight, Height, Age of Mother, Usual Daily Allowance in School,
Usual Daily Food Expenditure in School, Usual Number of Text Messages
Sent in a Day, Most Preferred Color, Usual Sleeping Time and Happiness
Index.
• The units of measurement for the information on Number of Siblings, Weight,
Height, Age of Mother, Usual Daily Allowance in School, Usual Daily Food
Expenditure in School, and Usual Number of Text Messages Sent in a Day
are person, kilogram, centimeter, year, pesos, pesos and message,
respectively.
B. Main Lesson
1. Definition of Basic Terms
The collection of respondents from whom one obtains the data is called the universe
of the study. In our illustration, the set of students of this Statistics and Probability
class is our universe. But we must caution the students that a universe is not
necessarily composed of people, since there are studies where the observations
are taken from plants or animals or even from non-living things like buildings,
vehicles, farms, etc. So formally, we define universe as the collection or set of
units or entities from which we got the data. Thus, this set of units answers
the first W of data contextualization.
On the other hand, the pieces of information we asked from the students are referred to as
the variables of the study, and in the data collection activity we have 12 variables,
including Class Student Number. A variable is a characteristic that is
observable or measurable in every unit of the universe. From each student
of the class, we got his/her sex, number of siblings, weight, height, age of
mother, usual daily allowance in school, usual daily food expenditure in school,
usual number of text messages sent in a day, most preferred color, usual sleeping
time and happiness index for the day. Since these characteristics are observable in
each and every student of the class, they are referred to as variables.
The set of all possible values of a variable is referred to as a
population. Thus for each variable we observed, we have a population of values.
The number of populations in a study is equal to the number of variables
observed. In the data collection activity we had, there are 12 populations
corresponding to the 12 variables.
A subgroup of a universe or of a population is a sample. There are several
ways to take a sample from a universe or a population, and the way we draw the
sample dictates the kind of analysis we do with our data.
We can further visualize these terms in the following figure:
| | VARIABLE 1 | VARIABLE 2 | ... | VARIABLE 12 |
|---|---|---|---|---|
| Unit 1 | Value 1 | Value 1 | ... | Value 1 |
| : | : | : | ... | : |
| Unit N | Value N | Value N | ... | Value N |
| UNIVERSE | POPULATION OF VARIABLE 1 | POPULATION OF VARIABLE 2 | ... | POPULATION OF VARIABLE 12 |

SAMPLE:

| A SAMPLE OF UNITS | OR | A SAMPLE OF POPULATION VALUES |
|---|---|---|
| Unit 1 | | Value 1 |
| : | | : |
| Unit n | | Value n |
Figure 3.1 Visualization of the relationship among universe, variable, population and sample.
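The relationship among these terms can also be shown with a few lines of Python. The sketch below uses the heights of the five students in the sample table; treating those five students as the whole universe, the sample size of 3, and the variable names are assumptions made only for illustration.

```python
import random

# Universe: the units themselves (here, five class student numbers).
universe = [1, 2, 3, 4, 5]

# One variable observed on every unit: height in centimeters.
height_cm = {1: 156, 2: 160, 3: 165, 4: 160, 5: 167}

# Population: the values of that variable, one for each unit of the universe.
population = [height_cm[unit] for unit in universe]

# Sample: a subgroup of the universe ...
rng = random.Random(0)
sample_units = rng.sample(universe, k=3)
# ... or the corresponding subgroup of population values.
sample_values = [height_cm[unit] for unit in sample_units]

print("universe:", universe)
print("population of heights:", population)
print("sample of units:", sample_units)
print("sample of values:", sample_values)
```

Observing a second variable, such as weight, on the same five units would give a second population while the universe stays the same.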
2. Broad Classification of Variables
Following up on the concept of a variable, inform the students that a
variable usually takes on several values. Occasionally, however, a variable can assume
only one value, in which case it is called a constant. For instance, in a class of
fifteen-year-olds, the age in years of students is constant.
Variables can be broadly classified as either qualitative or quantitative, with the
latter further classified into discrete and continuous types (see Figure 3.3 below).
Figure 3.3 Broad Classification of Variables
(i) Qualitative variables express a categorical attribute, such as sex (male or
female), religion, marital status, region of residence, or highest educational
attainment. Qualitative variables do not strictly take on numeric values (although
we can have numeric codes for them, e.g., for the sex variable, 1 and 2 may refer to
male and female, respectively). Qualitative data answer questions of “what kind.”
Sometimes, there is a sense of ordering in qualitative data, e.g., income data
grouped into high, middle and low-income status. Data on sex or religion do not
have the sense of ordering, as there is no such thing as a weaker or stronger sex,
and a better or worse religion. Qualitative variables are sometimes referred to as
categorical variables.
(ii) Quantitative (otherwise called numerical) data, whose sizes are meaningful,
answer questions such as “how much” or “how many”. Quantitative variables
have actual units of measure. Examples of quantitative variables include the
height, weight, number of registered cars, household size, and total household
expenditures/income of survey respondents. Quantitative data may be further
classified into:
a. Discrete data are those data that can be counted, e.g., the number of days
for cellphones to fail, the ages of survey respondents measured to the
nearest year, and the number of patients in a hospital. These data assume
only a countable number of values (finite or countably infinite).
b. Continuous data are those that can be measured, e.g., the exact height of a
survey respondent and the exact volume of some liquid substance. The
possible values are uncountably infinite.
With this classification, let us then test the understanding of our students by asking
them to classify the variables we had in our last data gathering activity. They should
be able to classify these variables as qualitative or quantitative, and the quantitative
ones further as discrete or continuous. If they did it right, you have the following:
| VARIABLE | TYPE OF VARIABLE | TYPE OF QUANTITATIVE VARIABLE |
|---|---|---|
| Class Student Number | Qualitative | |
| Sex | Qualitative | |
| Number of Siblings | Quantitative | Discrete |
| Weight (in kilograms) | Quantitative | Continuous |
| Height (in centimeters) | Quantitative | Continuous |
| Age of Mother | Quantitative | Discrete |
| Usual Daily Allowance in School (in pesos) | Quantitative | Discrete |
| Usual Daily Food Expenditure in School (in pesos) | Quantitative | Discrete |
| Usual Number of Text Messages Sent in a Day | Quantitative | Discrete |
| Usual Sleeping Time | Qualitative | |
| Most Preferred Color | Qualitative | |
| Happiness Index for the Day | Qualitative | |
Special Note:
For quantitative data, arithmetical operations have some physical interpretation.
One can add 301 and 302 if these have quantitative meanings, but if these
numbers refer to room numbers, then adding them does not make any
sense. Even though a variable may take numerical values, it does not make the
corresponding variable quantitative! The issue is whether performing arithmetical
operations on these data would make any sense. It would certainly not make sense
to sum two zip codes or multiply two room numbers.
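The point of this note can be made concrete in code: the type of a variable, not the look of its values, decides which summary makes sense. In the sketch below, the lookup table `VARIABLE_TYPES` and the function `summarize` are hypothetical names, the classifications follow the table above, and the values come from the five sample records.

```python
import statistics

# Classification of a few variables, following the table above.
VARIABLE_TYPES = {
    "sex": ("qualitative", None),
    "number_of_siblings": ("quantitative", "discrete"),
    "weight_kg": ("quantitative", "continuous"),
    "most_preferred_color": ("qualitative", None),
}

def summarize(name, values):
    """Use the mean only when arithmetic on the values is meaningful."""
    kind, _ = VARIABLE_TYPES[name]
    if kind == "quantitative":
        return statistics.mean(values)
    # For qualitative variables, report the most frequent category instead.
    return statistics.mode(values)

weights = [60, 63, 65, 55, 65]
colors = ["RED", "PINK", "BLUE", "BLACK", "BLUE"]

print(summarize("weight_kg", weights))            # mean weight in kilograms
print(summarize("most_preferred_color", colors))  # most frequent color
```

A variable coded with numbers, such as 1 and 2 for sex, would still be listed as qualitative in the lookup table, so no mean would be computed for it.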
KEY POINTS
• A universe is a collection of units from which the data were gathered.
• A variable is a characteristic we observed or measured from every element of the
universe.
• A population is a set of all possible values of a variable.
• A sample is a subgroup of a universe or a population.
• In a study there is only one universe but could have several populations.
• Variables could be classified as qualitative or quantitative, and the latter could
be further classified as discrete or continuous.
ASSESSMENT
1. A market research company requested all teachers of a particular school to fill
out a questionnaire in relation to its product market study. The following are
some of the information supplied by the teachers:
• highest educational attainment
• predominant hair color
• body temperature
• civil status
• brand of laundry soap being used
• total household expenditures last month in pesos
• number of children in the household
• number of hours standing in queue while waiting to be served by a bank
teller
• amount spent on rice last week by the household
• distance travelled by the teacher in going to school
• time (in hours) consumed on Facebook on a particular day
a. If we are to consider the collection of information gathered through the
completed questionnaire, what is the universe for this data set? (The universe is
the set of all teachers in that school)
b. Which of the variables are qualitative? Which are quantitative? Among the
quantitative variables, classify them further as discrete or continuous.
• highest educational attainment (qualitative)
• predominant hair color (qualitative)
• body temperature (quantitative: continuous)
• civil status (qualitative)
• brand of laundry soap being used (qualitative)
• total household expenditures last month in pesos (quantitative: discrete)
• number of children in the household (quantitative: discrete)
• number of hours standing in queue while waiting to be served by a bank
teller (quantitative: discrete)
• amount spent on rice last week by the household (quantitative: discrete)
• distance travelled by the teacher in going to school (quantitative:
continuous)
• time (in hours) consumed on Facebook on a particular day (quantitative:
continuous)
c. Give at least two populations that could be observed from the variables identified
in (b).
(Possible answer: The population is the set of all values of the highest educational
attainment and another population is {single, married, divorced, separated,
widow/widower})
2. The Engineering Department of a big city did a listing of all buildings in their
locality. If you are planning to gather the characteristics of these buildings,
a. What is the universe of this data collection activity? (Set of all buildings in
the big city)
b. What are the crucial variables to observe? It would also be better if you
could classify each variable as qualitative or quantitative.
Furthermore, classify each quantitative variable as discrete or continuous. (A
possible answer is the number of floors in the building, quantitative,
discrete)
3. A survey of students in a certain school is conducted. The survey questionnaire
details the information on the following variables. For each of these variables,
identify whether the variable is qualitative or quantitative, and if the latter, state
whether it is discrete or continuous.
a. number of family members who are working (quantitative: discrete)
b. ownership of a cell phone among family members (qualitative)
c. length (in minutes) of longest call made on each cell phone owned per
month (quantitative: continuous)
d. ownership/rental of dwelling (qualitative)
e. amount spent in pesos on food in one week (quantitative: discrete)
f. occupation of household head (qualitative)
g. total family income (quantitative: discrete)
h. number of years of schooling of each family member (quantitative: discrete)
i. access of family members to social media (qualitative)
j. amount of time last week spent by each family member using the internet
(quantitative: continuous)
Explanatory Note:
• Teachers have the option to just ask this assessment orally to the entire class, or to
group students and ask them to identify answers, or to give this as homework, or
to use some questions/items here for a chapter examination.
Lesson 4: Levels of Measurement

OVERVIEW OF LESSON
In this lesson we discuss the different levels of measurement as we continue to
explore data. Knowing these levels will enable us to plan the data collection process we
need to employ in order to gather the appropriate data for analysis.
At the end of the lesson, the learner is able to identify and differentiate the different
levels of measurement and methods of data collection.
LESSON OUTLINE:
1. Motivational Activity
2. Levels of Measurement
3. Data Collection Methods
A. Motivational Activity
Ask the students first if they believe the following statement:
“Students who eat a healthy breakfast will do best on a quiz, students who eat an
unhealthy breakfast will get an average performance, and students who do not eat
anything for breakfast will do the worst on a quiz”
You could further ask one or more students who have different answers to defend
their answers. Then challenge the students to apply a statistical process to
investigate the validity of this statement. You could enumerate on the board the
steps in the process to undertake, such as the following:
1. Plan or design the collection of data to verify the validity of the statement in a
way that maximizes information content and minimizes bias;
2. Collect the data as required in the plan;
3. Verify the quality of the data after it was collected;
4. Summarize the information extracted from the data; and
5. Examine the summary statistics so that insight and meaningful information can
be produced to support your decision on whether or not to believe the given
statement.
Let us discuss in detail the first step. In planning or designing the data collection
activity, we could consider the set of all the students in the class as our universe.
Then let us identify the variables we need to observe or measure to verify the
validity of the statement. You may ask the students to participate in the discussion
by asking them to suggest questions that would yield the needed data. The following are
some possible queries:
1. Do you usually have breakfast before going to school?
(Note: This is answerable by Yes or No.)
2. What do you usually have for breakfast?
(Note: Possible responses for this question are rice, bread, banana, oatmeal,
cereal, etc.)
The responses to Questions 1 and 2 could lead us to identify whether a
student in the class had a healthy breakfast, an unhealthy breakfast or no breakfast
at all.
Furthermore, there is a need to determine the performance of the student in a quiz
on that day. The score in the quiz could be used to identify the student’s
performance as best, average or worst.
As we describe the data collection process to verify the validity of the statement,
there is also a need to include the levels of measurement for the variables of
interest.
B. Main Lesson:
1. Levels of Measurement
Inform students that there are four levels of measurement of variables: nominal,
ordinal, interval and ratio. These are hierarchical in nature and are described as
follows:
Nominal level of measurement arises when we have variables that are categorical
and non-numeric or where the numbers have no sense of ordering. As an example,
consider the numbers on the uniforms of basketball players. Is the player wearing
number 7 a worse player than the player wearing number 10? Maybe, or maybe not,
but the number on the uniform does not have anything to do with their
performance. The numbers on the uniform merely help identify the basketball
player. Other examples of variables measured at the nominal level include sex,
marital status, and religious affiliation. For the study on the validity of the statement
regarding the effect of breakfast on school performance, students who responded Yes
to Question Number 1 can be coded 1 while those who responded No can be
coded 0. The numbers used are simply numerical codes, and cannot be used
for ordering or for any mathematical computation.
Ordinal level also deals with categorical variables like the nominal level, but in
this level ordering is important, that is, the values of the variable could be ranked.
For the study on the validity of the statement regarding the effect of breakfast on
school performance, students who had a healthy breakfast can be coded 1, those who
had an unhealthy breakfast 2, and those who had no breakfast at all 3. Using the
codes, the responses could be ranked. Thus, the students who had a healthy
breakfast are ranked first while those who had no breakfast at all are ranked last in
terms of having a healthy breakfast. The numerical codes here have a meaningful
sense of ordering. Unlike the numbers on basketball uniforms, these codes suggest
that one student is having a healthier breakfast than another student. Other
examples of the ordinal scale include socio-economic status (A to E, where A is
wealthy, E is poor), difficulty of questions in an exam (easy, medium, difficult), rank in
a contest (first place, second place, etc.), and perceptions in Likert scales.
Note to Teacher: Let us also emphasize to the students that while there is
a sense of ordering, there is no zero point in an ordinal scale. In addition,
there is no way to find out how much “distance” there is between one
category and another. In a scale from 1 to 10, the difference between 7 and 8
may not be the same as the difference between 1 and 2.
Interval level tells us that one unit differs by a certain amount or degree from
another unit. Knowing how much one unit differs from another is an additional
property of the interval level on top of having the properties possessed by the ordinal
level. When measuring temperature in Celsius, a 10 degree difference has the same
meaning anywhere along the scale – the difference between 10 and 20 degrees
Celsius is the same as between 80 and 90 degrees Celsius. But we cannot say that 80
degrees Celsius is twice as hot as 40 degrees Celsius since there is no true zero, but
only an arbitrary zero point. A measurement of 0 degrees Celsius does not reflect a
true "lack of temperature." Thus, the Celsius scale is at the interval level. Another example of
a variable measured at the interval level is the Intelligence Quotient (IQ) of a person. We
can tell not only which person ranks higher in IQ but also how much higher he or
she ranks than another, but zero IQ does not mean no intelligence. The students
could also be classified or categorized according to their IQ level. Hence, IQ as
measured at the interval level also has the properties of variables measured at the
ordinal as well as the nominal level.
Special Note: Also inform the students that the interval level allows
addition and subtraction operations, but it does not possess an absolute
zero. Zero is arbitrary: it does not mean that the quantity being measured is absent.
Zero only represents an additional measurement point.
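A short numerical check can make this point concrete for students. The sketch below is an added illustration, not part of the original lesson. It assumes the standard conversion K = °C + 273.15 to the Kelvin scale, whose zero is a true zero, and the function name is hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius to Kelvin (assumed conversion: K = C + 273.15)."""
    return celsius + 273.15

# Differences are meaningful on an interval scale:
# a 10-degree gap is the same size anywhere along the Celsius scale.
gap_low = 20 - 10
gap_high = 90 - 80

# Ratios are not: the "twice as hot" reading of 80 versus 40 degrees Celsius
# disappears once the same temperatures are placed on a scale with a true zero.
ratio_celsius = 80 / 40
ratio_kelvin = celsius_to_kelvin(80) / celsius_to_kelvin(40)

print(gap_low == gap_high)       # True
print(ratio_celsius)             # 2.0
print(round(ratio_kelvin, 3))    # 1.128
```

The ratio of 2 is an artifact of where the Celsius zero happens to sit, which is why only addition and subtraction are meaningful at the interval level.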
Ratio level, in addition, tells us how many times as much of the property one unit
has compared with another unit. The ratio level possesses a meaningful (unique and non-arbitrary)
absolute, fixed zero point and allows all arithmetic operations. The existence of the
true zero point is the only difference between the ratio and interval levels of measurement.
Examples of the ratio scale include mass, height, weight, energy and electric
charge. With mass as an example, the difference between 120 grams and 135
grams is 15 grams, and this is the same as the difference between 380 grams and 395
grams. The size of a unit is constant at any point on the scale, and a measurement of 0 reflects a
complete lack of mass. Amount of money is also at the ratio level. We can say that
2,000 pesos is twice as much as 1,000 pesos. In addition, money has a true zero
point: if you have zero money, this implies the absence of money. For the study on
the validity of the statement regarding the effect of breakfast on school performance,
the student’s score in the quiz is measured at the ratio level. A score of zero implies
that the student did not get a correct answer at all.
In summary, we have the following levels of measurement:
Level      Property                                          Basic Empirical Operation
Nominal    No order, distance, or origin                     Determination of equivalence
Ordinal    Has order but no distance or unique origin        Determination of greater or lesser values
Interval   Both with order and distance but no unique origin Determination of equality of intervals or difference
Ratio      Has order, distance and unique origin             Determination of equality of ratios or means
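The hierarchical nature of the four levels can be sketched in a few lines of Python. This is only an illustrative sketch: the class name, the operation labels and the `allows` function are hypothetical names introduced here, and the mapping simply restates the summary table.

```python
from enum import IntEnum

class Level(IntEnum):
    """The four levels, numbered so that higher levels compare as greater."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# Lowest level at which each basic empirical operation becomes meaningful,
# following the summary table.
MINIMUM_LEVEL = {
    "equivalence": Level.NOMINAL,
    "greater_or_lesser": Level.ORDINAL,
    "equality_of_intervals": Level.INTERVAL,
    "equality_of_ratios": Level.RATIO,
}

def allows(level, operation):
    """Return True if the operation is meaningful at the given level."""
    return level >= MINIMUM_LEVEL[operation]

print(allows(Level.ORDINAL, "greater_or_lesser"))     # True
print(allows(Level.INTERVAL, "equality_of_ratios"))   # False
print(allows(Level.RATIO, "equivalence"))             # True
```

Because each level inherits the operations of the levels below it, a single comparison is enough to decide whether an operation is allowed.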
The levels of measurement depend mainly on the method of measurement, not on
the property measured. The weight of primary school students measured in
kilograms is at the ratio level, but the students can also be categorized as overweight,
normal, or underweight, in which case the weight is measured at the ordinal
level. Also, many scales are only interval because their zero point is arbitrarily
chosen.
To assess the students' understanding of the lesson, you may go back to the set of
variables in the data gathering activity done in Lesson 2. You could ask the students
to identify the level of measurement for each of the variables. If they do it right, you
will have the following:
VARIABLE                                              LEVEL OF MEASUREMENT
Class Student Number                                  Nominal
Sex                                                   Nominal
Number of Siblings                                    Ratio
Weight (in kilograms)                                 Ratio
Height (in centimeters)                               Ratio
Age of Mother                                         Ratio
Usual Daily Allowance in School (in pesos)            Ratio
Usual Daily Food Expenditure in School (in pesos)     Ratio
Usual Number of Text Messages Sent in a Day           Ratio
Usual Sleeping Time                                   Nominal
Most Preferred Color                                  Nominal
Happiness Index for the Day                           Ordinal
2. Methods of Data Collection
Variables are observed or measured using any of three methods of data
collection, namely: objective, subjective and use of existing records. The objective
and subjective methods obtain the data directly from the source. The former uses
any or a combination of the five senses (sight, touch, hearing, taste and
smell) to measure the variable, while the latter obtains data by getting responses
through a questionnaire. The resulting data from these two methods of data
collection are referred to as primary data. The data gathered in Lesson 2 are primary
data and were obtained using the subjective method.
On the other hand, secondary data are obtained through the use of existing records
or data collected by other entities for certain purposes. For example, when we use
data gathered by the Philippine Statistics Authority, we are using secondary data
and the method we employ to get the data is the use of existing records. Other
data sources include administrative records, news articles, internet, and the like.
However, we must emphasize to the students that when we use existing data we
must be confident of the quality of the data we are using by knowing how the data
were gathered. Also, we must remember to request permission and acknowledge
the source of the data when using data gathered by other agencies or people.
KEY POINTS
• Four levels of measurement: Nominal, Ordinal, Interval and Ratio
• Knowing the level at which a variable was measured or observed will guide us in
choosing the type of analysis to apply.
• Three methods of data collection include objective, subjective and use of
existing records.
• Using the data collection method as basis, data can be classified as either
primary or secondary data.
ASSESSMENT
Note: Answers are provided inside the parentheses and in bold face.
1. Using the data of the teachers in a particular school gathered by a market
research company, identify the level of measurement for each of the following
variables.
• highest educational attainment (ordinal)
• predominant hair color (nominal)
• body temperature (interval)
• civil status (nominal)
• brand of laundry soap being used (nominal)
• total household expenditures last month in pesos (ratio)
• number of children in a household (ratio)
• number of hours standing in queue while waiting to be served by a bank
teller (ratio)
• amount spent on rice last week by a household (ratio)
• distance travelled by the teacher in going to school (ratio)
• time (in hours) consumed on Facebook on a particular day (ratio)
2. The following variables are included in a survey conducted among students in a
certain school. Identify the level of measurement for each of the variables.
a. number of family members who are working (ratio);
b. ownership of a cell phone among family members (nominal);
c. length (in minutes) of longest call made on each cell phone owned per month
(ratio);
d. ownership/rental of dwelling (nominal);
e. amount spent in pesos on food in one week (ratio);
f. occupation of household head (nominal);
g. total family income (ratio);
h. number of years of schooling of each family member (ratio);
i. access of family members to social media (nominal);
j. amount of time last week spent by each family member using the internet
(ratio)
3. In the following, identify the data collection method used and the type of
resulting data.
a. The website of Philippine Airlines provides a questionnaire instrument that
can be answered electronically. (subjective method, primary data)
b. The latest series of the Consumer Price Index (CPI) generated by the
Philippine Statistics Authority was downloaded from PSA website. (use of
existing record, secondary data)
c. A reporter recorded the number of minutes to travel from one end to another
of the Metro Manila Rail Transit (MRT) during peak and off-peak hours.
(objective method, primary data)
d. Students measure the heights of plants using a meter stick. (objective
method, primary data)
e. A PSA enumerator conducting the Labor Force Survey goes around the country
to interview household heads on employment-related variables. (subjective
method, primary data)
Lesson 5: Data Presentation

OVERVIEW OF LESSON
In this lesson we enrich what the students have already learned from Grade 1 to 10
about presenting data. Additional concepts could help the students to
describe the data set further and more appropriately.
At the end of the lesson, the learner is able to identify and use the appropriate
method of presenting information from a data set effectively.
LESSON OUTLINE:
1. Review of Lessons in Data Presentation taken up from Grade 1 to 10.
2. Methods of Data Presentation
3. The Frequency Distribution Table and Histogram
A. Review of Lessons in Data Presentation taken up from Grade 1 to 10.
You could assist the students to recall what they have learned in Grade 1 to 10
regarding data presentation by asking them to participate in an activity. The activity
is called ‘Toss the Ball’. This is actually a review and wake-up exercise. Toss a ball to
a student and he/she will give the most important concept he/she learned about
data presentation.
You may list on the board their responses. You could summarize their responses to
be able to establish what they already know about data presentation techniques
and from this you could build other concepts on the topic. A suggestion is to
classify their answers according to the three methods of data presentation, i.e.
textual, tabular and graphical. A possible listing will be something like this:
Textual or Narrative Presentation:
• Detailed information is given in textual presentation.
• A narrative report is a way to present data.
Tabular Presentation:
• Numerical values are presented using tables.
• Some information is lost in tabular presentation of data.
• A frequency distribution table is also applicable to qualitative variables.
Graphical Presentation:
• Trends are easily seen in graphs compared to tables.
• It is good to present data using pictures or figures like the pictograph.
• Pie charts are used to present data as part of one whole.
• Line graphs are for time-series data.
• It is better to present data using graphs than tables as they are much better to
look at.
B. Main Lesson
1. Methods of Data Presentation
You could inform the students that in general there are three methods to present
data. Two or all of these three methods could be used at the same time to present
appropriately the information from the data set. These methods include the (1)
textual or narrative; (2) tabular; and (3) graphical method of presentation.
In presenting the data in textual or paragraph or narrative form, one describes the
data by enumerating some of the highlights of the data set, like giving the highest,
lowest or the average values. In case there are only a few observations, say fewer than
ten observations, the values could be enumerated if there is a need to do so. An
example is shown below:
The country’s poverty incidence among families as reported by the
Philippine Statistics Authority (PSA), the agency mandated to release
official poverty statistics, decreased from 21% in 2006 to 19.7% in
2012. For 2012, the regional estimates released by PSA indicate that the
Autonomous Region of Muslim Mindanao (ARMM) is the poorest region
with poverty incidence among families estimated at 48.7%. The region with
the smallest estimated poverty incidence among families at 2.6% is the
National Capital Region (NCR).
Data could also be summarized or presented using tables. The tabular method of
presentation is applicable for large data sets. Trends could easily be seen in this
kind of presentation. However, there is a loss of information when using such kind of
presentation. The frequency distribution table is the usual tabular form of
presenting the distribution of the data. The following are the common parts of a
statistical table:
a. Table title includes the number and a short description of what is found inside
the table.
b. Column header provides the label of what is being presented in a column.
c. Row header provides the label of what is being presented in a row.
d. Body is the information in the cells at the intersections of the rows and the columns.
In general, a table should have at least three rows and/or three columns. However,
conveying too much information in a single table is also not advisable. Tables are usually
used in written technical reports and in oral presentations. Table 5.1 is an example of
presenting data in tabular form. This example was taken from the 2015 Philippine
Statistics in Brief, a regular publication of the PSA, which is also the basis for the
example of the textual presentation given above.
Table 5.1 Regional estimates of poverty incidence among families (in percent)
based on the Family Income and Expenditures Survey
conducted in the same year of reporting.
Region 2006 2009 2012
NCR 2.9 2.4 2.6
CAR 21.1 19.2 17.5
I 19.9 16.8 14.0
II 21.7 20.2 17.0
III 10.3 10.7 10.1
IV A 7.8 8.8 8.3
IV B 32.4 27.2 23.6
V 35.4 35.3 32.3
VI 22.7 23.6 22.8
VII 30.7 26.0 25.7
VIII 33.7 34.5 37.4
IX 40.0 39.5 33.7
X 32.1 33.3 32.8
XI 25.4 25.5 25.0
XII 31.2 30.8 37.1
Caraga 41.7 46.0 31.9
ARMM 40.5 39.9 48.7
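The highlights quoted in the textual presentation can be recovered directly from the table. The following Python sketch is an added illustration (the variable names are hypothetical) that uses only the 2012 column of Table 5.1 to find the regions with the highest and lowest estimates.

```python
# 2012 column of Table 5.1 (poverty incidence among families, in percent)
poverty_2012 = {
    "NCR": 2.6, "CAR": 17.5, "I": 14.0, "II": 17.0, "III": 10.1,
    "IV A": 8.3, "IV B": 23.6, "V": 32.3, "VI": 22.8, "VII": 25.7,
    "VIII": 37.4, "IX": 33.7, "X": 32.8, "XI": 25.0, "XII": 37.1,
    "Caraga": 31.9, "ARMM": 48.7,
}

# The highest and lowest values are typical highlights of a narrative report.
poorest = max(poverty_2012, key=poverty_2012.get)
least_poor = min(poverty_2012, key=poverty_2012.get)

print(f"Highest: {poorest} at {poverty_2012[poorest]}%")        # Highest: ARMM at 48.7%
print(f"Lowest: {least_poor} at {poverty_2012[least_poor]}%")   # Lowest: NCR at 2.6%
```

This shows how the textual and tabular methods complement each other: the table keeps all the values, while the narrative picks out a few of them.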
Graphical presentation, on the other hand, is a visual presentation of the data.
Graphs are commonly used in oral presentations. There are several forms of graphs
to use, like the pie chart, pictograph, bar graph, line graph, histogram and box-plot.
Which form to use depends on what information is to be relayed. For example,
trends across time are easily seen using a line graph. However, values of variables at the
nominal or ordinal level of measurement should not be presented using a line graph.
Rather, a bar graph is more appropriate to use. A graphical presentation in the form
of a vertical bar graph of the 2012 regional estimates of poverty incidence among
families is shown below:
[Vertical bar graph. Vertical axis: Poverty Incidence Among Families in Percent (0 to 60). Horizontal axis: regions (NCR, CAR, I, II, III, IV A, IV B, V, VI, VII, VIII, IX, X, XI, XII, Caraga, ARMM).]
Figure 5.1 2012 Regional poverty incidence among families (2012 FIES).
Other examples of graphical presentations are given in the following figures, which are taken from the
Handbook of Statistics 1 (listed in the reference section at the end of this Teaching
Guide).
Figure 5.2. Percentage distribution of dogs according to groupings identified in a
dog show.
Figure 5.3. Distribution of fruit sales of a store for two days.
Figure 5.4. Weapons arrest rate from 1965 to 1992 by age of offender.
[Scatter plot. Vertical axis: weight in kg (30 to 80). Horizontal axis: height in cm (110 to 190).]
Figure 5.5. Height and weight of STAT 1 students registered during the
previous term.
2. The Frequency Distribution Table and Histogram
A special type of tabular and graphical presentation is the frequency distribution
table (FDT) and its corresponding histogram. Specifically, these are used to depict
the distribution of the data. Most of the time, these are used in technical reports. An
FDT is a presentation containing non-overlapping categories or classes of a variable
and the frequencies or counts of the observations falling into the categories or
classes. There are two types of FDT according to the type of data being organized:
a qualitative FDT or a quantitative FDT. For a qualitative FDT, the non-overlapping
categories of the variable are identified, and frequencies, as well as the percentages
of observations falling into the categories, are computed. A
quantitative FDT, on the other hand, is of two types: ungrouped and grouped. An ungrouped
FDT is constructed when there are only a few observations or if the data set
contains only a few possible values. A grouped FDT is constructed
when there is a large number of observations and when the data set involves many
possible values. The distinct values are grouped into class intervals. The creation of
columns for a grouped FDT follows a set of guidelines. One such procedure is
described in the following steps, which are taken from the Workbook in Statistics 1
(listed in the reference section at the end of this Teaching Guide).
Steps in the construction of a grouped FDT
1. Identify the largest data value or the maximum (MAX) and smallest data value or the
minimum (MIN) from the data set and compute the range, R. The range is the difference
between the largest and smallest value, i.e. R = MAX – MIN.
2. Determine the number of classes, k, using $k = \sqrt{N}$, where N is the total number of
observations in the data set. Round off k to the nearest whole number. It should be
noted that the computed k might not be equal to the actual number of classes
constructed in an FDT.
3. Calculate the class size, c, using c = R/k. Round off c to the nearest value with precision
the same as that with the raw data.
4. Construct the classes or the class intervals. A class interval is defined by a lower limit (LL)
and an upper limit (UL). The LL of the lowest class is usually the MIN of the data set. The
LL’s of the succeeding classes are then obtained by adding c to the LL of the preceding
classes. The UL of the lowest class is obtained by subtracting one unit of measure
($1/10^{x}$, where x is the maximum number of decimal places observed from the raw data) from
the LL of the next class. The UL’s of the succeeding classes are then obtained by adding
c to the UL of the preceding classes. The lowest class should contain the MIN, while the
highest class should contain the MAX.
5. Tally the data into the classes constructed in Step 4 to obtain the frequency of each
class. Each observation must fall in one and only one class.
6. Add (if needed) the following distributional characteristics:
a. True Class Boundaries (TCB). The TCBs reflect the continuous property of
continuous data. Each class is defined by a lower TCB (LTCB) and an upper TCB (UTCB).
These are obtained by taking the midpoints of the gaps between classes or by using
the following formulas: LTCB = LL – 0.5(one unit of measure) and UTCB = UL +
0.5(one unit of measure).
b. Class Mark (CM). The CM is the midpoint of a class and is obtained by taking the
average of the lower and upper TCB’s, i.e. CM = (LTCB + UTCB)/2.
c. Relative Frequency (RF). The RF refers to the frequency of the class as a fraction of
the total frequency, i.e. RF = frequency/N. RF can be computed for both qualitative
and quantitative data. RF can also be expressed in percent.
d. Cumulative Frequency (CF). The CF refers to the total number of observations
greater than or equal to the LL of the class (>CF) or the total number of observations
less than or equal to the UL of the class (<CF).
e. Relative Cumulative Frequency (RCF). RCF refers to the fraction of the total number
of observations greater than or equal to the LL of the class (>RCF) or the fraction of
the total number of observations less than or equal to the UL of the class (<RCF).
Both the <RCF and >RCF can also be expressed in percent.
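Steps 1 to 5 above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the Workbook's own procedure in code: the function names are hypothetical, the data are assumed to have MAX greater than MIN, and results are rounded to the precision of the raw data to avoid floating-point noise. The example call uses MIN = 10.1, MAX = 73.1 and N = 30, the values that appear in the assessment answer at the end of this lesson.

```python
import math

def class_intervals(min_value, max_value, n, decimals):
    """Return the (LL, UL) pairs of a grouped FDT, following Steps 1 to 4."""
    unit = 10 ** -decimals                 # one unit of measure, 1/10^x
    r = max_value - min_value              # Step 1: range R = MAX - MIN
    k = round(math.sqrt(n))                # Step 2: k = sqrt(N), rounded
    c = round(r / k, decimals)             # Step 3: class size c = R/k
    if c <= 0:
        raise ValueError("class size must be positive (is MAX greater than MIN?)")
    classes = []
    lower = round(min_value, decimals)     # Step 4: the lowest LL is the MIN
    while True:
        # UL is one unit of measure below the LL of the next class.
        upper = round(lower + c - unit, decimals)
        classes.append((lower, upper))
        if upper >= max_value:             # the highest class must contain the MAX
            break
        lower = round(lower + c, decimals)
    return classes

def tally(data, classes):
    """Step 5: count the observations falling in each class."""
    return [sum(1 for x in data if ll <= x <= ul) for ll, ul in classes]

classes = class_intervals(10.1, 73.1, 30, decimals=1)
print(classes)
# [(10.1, 22.6), (22.7, 35.2), (35.3, 47.8), (47.9, 60.4), (60.5, 73.0), (73.1, 85.6)]
print(tally([10.1, 22.6, 22.7, 73.1], classes))
# [2, 1, 0, 0, 0, 1]
```

Note that the computed k is 5 while six classes are actually constructed, which illustrates the remark in Step 2 that k might not equal the actual number of classes.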
The histogram is a graphical presentation of the frequency distribution table in the
form of a vertical bar graph. There are several forms of the histogram, and the most
common form has the frequency on its vertical axis and the true class boundaries
on the horizontal axis.
As an example, the FDT and its corresponding histogram of the 2012 estimated
poverty incidences of 144 municipalities and cities of Region VIII are shown below.
Poverty Incidence (%)    Frequency
00.000 - 20.015          3
20.015 - 40.015          59
40.015 - 60.015          78
60.015 - 80.015          4
80.015 - 100.00          0
[Histogram. Vertical axis: Frequency (0 to 80). Horizontal axis: True Class Boundaries. Bar heights: 3, 59, 78, 4, 0.]
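Where no graphing tool is at hand, the shape of this distribution can still be shown with a rough text rendering. The sketch below is an added illustration with a hypothetical function name. It draws the bars horizontally rather than vertically, and scales them so that the largest frequency spans 40 characters.

```python
def text_histogram(labels, freqs, width=40):
    """Return one line per class, with a bar proportional to its frequency."""
    peak = max(freqs)
    lines = []
    for label, f in zip(labels, freqs):
        bar = "#" * round(width * f / peak) if peak else ""
        lines.append(f"{label:>17} | {bar} {f}")
    return lines

# Classes and frequencies of the Region VIII example above
labels = ["00.000 - 20.015", "20.015 - 40.015", "40.015 - 60.015",
          "60.015 - 80.015", "80.015 - 100.00"]
freqs = [3, 59, 78, 4, 0]

for line in text_histogram(labels, freqs):
    print(line)
```

The rendering makes it easy to see that most of the 144 municipalities and cities fall in the two middle classes.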
KEY POINTS
• Three methods of data presentation: textual, tabular and graphical
• Two or all of the methods could be combined to fully describe the data at hand.
• The distribution of data is presented using a frequency distribution table and
histogram.
ASSESSMENT
Note: This exercise and its corresponding possible answers were lifted from
Workbook in Statistics 1 (listed in the reference section)
A. You are to describe the data on the following table. Perform what is being asked
for in the questions found after the table.
Table 5.2 Characteristics of the 30 members of the Batong Malake Senior Citizens
Association (BMSCA) who participated in their 2009 Lakbay-Aral.
Columns, in order: No.; Gender; Age as of Last Birthday; Receiving Monthly Pension? (Y/N); Gross Monthly Family Income (in thousand pesos); Number of Years as Member
1 Female 61 Yes 45.0 1
3 Male 74 No 33.5 10
4 Male 80 No 50.0 12
7 Female 75 No 41.0 2
8 Male 64 No 10.1 3
9 Male 65 No 46.5 5
10 Female 68 Yes 18.0 3
11 Female 71 Yes 34.2 6
12 Female 63 Yes 73.1 2
13 Female 72 Yes 15.6 11
14 Male 76 Yes 17.4 11
15 Female 69 No 33.8 8
16 Male 70 Yes 35.1 9
17 Male 74 Yes 18.6 6
18 Female 68 Yes 65.7 8
19 Female 70 No 19.6 3
20 Male 65 Yes 53.0 2
21 Male 64 Yes 18.4 1
22 Female 62 Yes 27.8 1
23 Female 63 No 33.4 2
24 Male 68 No 38.0 5
25 Male 67 Yes 37.6 5
26 Male 69 No 50.4 7
27 Female 68 Yes 44.3 4
28 Female 66 No 36.7 3
29 Female 63 No 18.0 2
30 Male 64 Yes 63.2 2
1. Choose a QUANTITATIVE variable from the given data set. Construct a
quantitative grouped FDT for this variable. Show preliminary computations (R, k,
and c). Also, construct a histogram for the data. Use appropriate labels and titles
for the table and graph. Describe the characteristics of the units in the data set
using a brief narrative report. Refer to the FDT and histogram constructed.
R = ____________________
k = ____________________
c = ____________________
Table __________________________________________________________________
Classes (LL, UL)   Frequency (F)   RF (%)   <CF   >CF   <RCF (%)   >RCF (%)   CM   LTCB   UTCB
Histogram:
Textual presentation:
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
Which of the three methods of data presentation do you think is most
appropriate to use for the variable chosen in Number 1? Justify your answer.
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
2. Choose a QUALITATIVE variable from Table 5.2. Construct an appropriate graph.
Use labels and a title for the graph.
Give a brief report describing the variable:
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
________________________________________________________________________
Possible Answers:
1. For the quantitative variable gross monthly family income:
R = 73.1 – 10.1 = 63
$k = \sqrt{30} = 5.477 \approx 5$
c = 63/5 = 12.6
Table 1. Distribution of the gross monthly family income (in thousand pesos) of the
30 Batong Malake Senior Citizens Association members who joined the
Lakbay-Aral.
LL     UL     Frequency (F)   RF (%)   <CF   >CF   <RCF (%)   >RCF (%)   CM      LTCB    UTCB
10.1   22.6   9               30.00    9     30    30.00      100.00     16.35   10.05   22.65
22.7   35.2   8               26.67    17    21    56.67      70.00      28.95   22.65   35.25
35.3   47.8   7               23.33    24    13    80.00      43.33      41.55   35.25   47.85
47.9   60.4   3               10.00    27    6     90.00      20.00      54.15   47.85   60.45
60.5   73.0   2               6.67     29    3     96.67      10.00      66.75   60.45   73.05
73.1   85.6   1               3.33     30    1     100.00     3.33       79.35   73.05   85.65
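The remaining columns of this table follow mechanically from the class limits and frequencies, so students can check them with a short script. The following Python sketch is an added illustration with hypothetical names. It applies the formulas of Step 6 (TCB, CM, RF, CF and RCF) with one unit of measure equal to 0.1, and rounds to two decimal places as in the table.

```python
def fdt_columns(limits, freqs, unit):
    """Compute RF, CF, RCF, CM and TCB for each class of a grouped FDT."""
    n = sum(freqs)
    rows = []
    less_cf = 0        # running count of observations <= UL of the class
    greater_cf = n     # running count of observations >= LL of the class
    for (ll, ul), f in zip(limits, freqs):
        less_cf += f
        ltcb = ll - 0.5 * unit             # LTCB = LL - 0.5(one unit of measure)
        utcb = ul + 0.5 * unit             # UTCB = UL + 0.5(one unit of measure)
        rows.append({
            "RF": round(100 * f / n, 2),
            "<CF": less_cf,
            ">CF": greater_cf,
            "<RCF": round(100 * less_cf / n, 2),
            ">RCF": round(100 * greater_cf / n, 2),
            "CM": round((ltcb + utcb) / 2, 2),
            "LTCB": round(ltcb, 2),
            "UTCB": round(utcb, 2),
        })
        greater_cf -= f
    return rows

limits = [(10.1, 22.6), (22.7, 35.2), (35.3, 47.8),
          (47.9, 60.4), (60.5, 73.0), (73.1, 85.6)]
freqs = [9, 8, 7, 3, 2, 1]

for (ll, ul), row in zip(limits, fdt_columns(limits, freqs, unit=0.1)):
    print(ll, ul, row)
```

Each printed row should agree with the corresponding row of the table, for example a class mark of 16.35 and true class boundaries of 10.05 and 22.65 for the first class.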
Histogram:
[Histogram. Vertical axis: Frequency (0 to 10). Horizontal axis: TCB, with boundaries at 10.05, 22.65, 35.25, 47.85, 60.45, 73.05 and 85.65.]
Figure 1. Monthly gross family income (in thousand pesos) of the 30 BMSCA members.
Textual presentation:
(Sample) The monthly gross family income of the 30 BMSCA members ranges
from 10.1 to 73.1 thousand pesos. More than half of them have an income of at
most 35,250 pesos. Only three of them, or 10%, have a monthly family income of
at least 60,450 pesos.
Which of the three methods of data presentation do you think is most

appropriate to use for the variable chosen in Number 1? Justify your answer.
(Sample)
Textual presentation: It is most appropriate to use a textual presentation since
the highlights of the family income of the BMSCA members can be presented.
Tabular presentation: It is most appropriate to use a tabular presentation since a
lot of the numerical information can be presented and trends in the monthly
income of the members can be seen.
Graphical presentation: A graphical presentation is most appropriate so that
trends in the monthly income of the BMSCA members are easily visible.
2. For the qualitative variable: gender
Figure 2. Distribution of the 30 BMSCA members by gender.
Brief Description: The majority of the 30 BMSCA members who joined the Lakbay-Aral are
males. Only 43% are females.
For the qualitative variable: whether member is receiving monthly pension or not
Figure 3. Distribution of the 30 BMSCA members as to whether
they are receiving monthly pension or not.
Brief Description: More than half of the 30 BMSCA members receive monthly
pension. Forty percent are not receiving monthly pension.
Lesson 6: Measures of Central Tendency

OVERVIEW OF LESSON
The lesson begins with students engaging in a review of some measures of central
tendency by considering a numerical example. Students are also asked to examine
both strengths and limitations of these measures. Assessments will be given to
students on their ability to calculate these measures, and also to get an overall
sense of whether they recognize how these measures respond to changes in data
values.
LEARNING OUTCOMES:
• Calculate commonly used measures of central tendency,
• Provide a sound interpretation of these summary measures, and
• Discuss the properties of these measures.
LESSON OUTLINE:
1. Motivation
2. Common Measures of Central Tendency: Mean, Median and Mode
3. Properties of the Mean, Median and Mode
REFERENCES
“Deciding Which Measure of Center to Use”
http://www.sharemylesson.com/teaching-resource/deciding-which-measure-of-center-to-use-50013703/
A. Motivation
Present to the students the following frequency distribution table of the monthly
income of 35 families residing in a nearby barangay/village.
Monthly Family Income in Pesos    Number of Families
12,000                            2
20,000                            3
24,000                            4
25,000                            8
32,250                            9
36,000                            5
40,000                            2
60,000                            2
You may ask the students the following questions to pique their interest and, at the same
time, introduce them to some summary statistics.
1. What is the highest monthly family income? Lowest?
Answer: Highest monthly family income is 60,000 pesos while the lowest is
12,000 pesos.
You may emphasize that the highest and lowest values, which are commonly known
as maximum and minimum, respectively are summary measures of a data set. They
represent important location values in the distribution of the data. However, these
measures do not give a measure of location in the center of the distribution.
2. What monthly family income is most frequent in the village?
Answer: Monthly family income that is most frequent is 32,250 pesos.
The value of 32,250 occurs most often or it is the value with the highest frequency.
This is called the modal value or simply the mode. In this data set, the value of
32,250 is found in the center of the distribution.
3. If you list down individually the values of the monthly family income from lowest
to highest, what is the monthly family income where half of the total number of
families have monthly family income less than or equal to that value while the other
half have monthly family income greater than that value?
Answer: When the values are arranged in increasing order, that is, when the data are put in an array as in
the following:
12,000; 12,000; 20,000; 20,000; 20,000; 24,000; 24,000; 24,000; 24,000; 25,000;
25,000;25,000; 25,000; 25,000; 25,000; 25,000; 25,000; 32,250; 32,250; 32,250;
32,250; 32,250; 32,250; 32,250; 32,250; 32,250; 36,000; 36,000; 36,000; 36,000;
36,000; 40,000; 40,000; 60,000; 60,000;
there are 17 values that are less than the middle value while another 17 values
are greater than or equal to the middle value. That middle value is the 18th observation
and it is equal to 32,250 pesos. The middle value is called the median and is
found in the center of the distribution.
4. What is the average monthly family income?
Answer: When computed using the data values, the average is 30,007.14 pesos.
The average monthly family income is commonly referred to as the arithmetic
mean or simply the mean. It is computed by adding all the values and then dividing the
sum by the number of values included in the sum. The average value is
also found somewhere in the center of the distribution.
Let us now summarize what we have learned from our illustration and introduce the
three common measures of central tendency.
B. Common Measures of Central Tendency: Mean, Median and Mode
Inform students that the most widely used measure of the center is the (arithmetic)
mean. It is computed as the sum of all observations in the data set divided by the
number of observations included in the sum. If we use the summation
symbol, $\sum_{i=1}^{N} x_i$, read as ‘sum of observations represented by $x_i$ where i takes the
values from 1 to N, and N refers to the total number of observations being added’,
we could compute the mean (usually denoted by the Greek letter µ) as
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$.
Using the example earlier with 35 observations of family income, the mean is
computed as
$\mu = \frac{12{,}000 + 12{,}000 + \cdots + 60{,}000}{35} = \frac{1{,}050{,}250}{35} = 30{,}007.14$
Alternatively, we could do the computation as follows:
Monthly Family Income    Number of Families    x_i × f_i
in Pesos (x_i)           (f_i)
12,000                   2                     12,000 × 2 = 24,000
20,000                   3                     20,000 × 3 = 60,000
24,000                   4                     24,000 × 4 = 96,000
25,000                   8                     25,000 × 8 = 200,000
32,250                   9                     32,250 × 9 = 290,250
36,000                   5                     36,000 × 5 = 180,000
40,000                   2                     40,000 × 2 = 80,000
60,000                   2                     60,000 × 2 = 120,000
                         Sum = 35              Sum = 1,050,250
For a large number of observations, it is advisable to use a computing tool such as a
calculator or computer software, e.g., a spreadsheet application like Microsoft Excel®.
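The frequency-table computation above can also be sketched in a few lines of Python. This is only an illustration: the names `income_freq` and `mean_from_frequencies` are made up for this sketch and are not part of the guide.

```python
# Frequency table from the lesson: monthly family income in pesos -> number of families
income_freq = {
    12_000: 2, 20_000: 3, 24_000: 4, 25_000: 8,
    32_250: 9, 36_000: 5, 40_000: 2, 60_000: 2,
}

def mean_from_frequencies(freq):
    """Return sum(x_i * f_i) / sum(f_i) for a {value: frequency} table."""
    total = sum(x * f for x, f in freq.items())   # sum of the x_i * f_i column
    n = sum(freq.values())                        # sum of the f_i column
    return total / n

n_families = sum(income_freq.values())
total_income = sum(x * f for x, f in income_freq.items())
mean_income = mean_from_frequencies(income_freq)

print(n_families, total_income, round(mean_income, 2))
```

Multiplying each value by its frequency gives the same total as adding all 35 incomes one by one, which is why the two computations in the text agree.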
The median, on the other hand, is the middle value in an array of observations. To
determine the median of a data set, the observations must first be arranged in
increasing or decreasing order. Then locate the middle value so that half of the
observations are less than or equal to that value while the other half of the observations
are greater than the middle value.
If N (the total number of observations in a data set) is odd, the median or the middle
value is the $\left(\frac{N+1}{2}\right)^{th}$ observation in the array. On the other hand, if N is even, then
the median or the middle value is the average of the two middle values, that is, the
average of the $\left(\frac{N}{2}\right)^{th}$ and $\left(\frac{N}{2}+1\right)^{th}$ observations. In the example given earlier, there
are 35 observations so N is 35, an odd number. The median is then the
$\left(\frac{35+1}{2}\right)^{th} = \left(\frac{36}{2}\right)^{th} = 18^{th}$ observation in the array. Locating the 18th observation in the array
leads us to the value equal to 32,250 pesos.
The mode or the modal value is the value that occurs most often, that is, the value
that has the highest frequency. In other words, the mode is the most fashionable
value in the data set. In the example above, the value of 32,250 pesos occurs
most often: it has the highest frequency, which is equal to nine.
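The position rule for the median and the highest-frequency rule for the mode can be sketched as follows. The helper names (`expand`, `median_by_position`, `modes`) are hypothetical and chosen only for this illustration.

```python
# Frequency table from the lesson: monthly family income in pesos -> number of families
income_freq = {
    12_000: 2, 20_000: 3, 24_000: 4, 25_000: 8,
    32_250: 9, 36_000: 5, 40_000: 2, 60_000: 2,
}

def expand(freq):
    """List every observation individually, in increasing order (the 'array')."""
    return [x for x in sorted(freq) for _ in range(freq[x])]

def median_by_position(sorted_values):
    """Median of a sorted list, using the odd/even position rule in the text."""
    n = len(sorted_values)
    if n % 2 == 1:
        # ((N + 1) / 2)th observation; subtract 1 because Python counts from 0
        return sorted_values[(n + 1) // 2 - 1]
    # average of the (N / 2)th and (N / 2 + 1)th observations
    return (sorted_values[n // 2 - 1] + sorted_values[n // 2]) / 2

def modes(freq):
    """All values that share the highest frequency (there may be more than one)."""
    top = max(freq.values())
    return [x for x, f in freq.items() if f == top]

array = expand(income_freq)
median_income = median_by_position(array)
modal_incomes = modes(income_freq)

print(len(array), median_income, modal_incomes)
```

Returning a list from `modes` is a design choice for this sketch: it leaves room for data sets with two or more modes.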
C. Properties of the Mean, Median and Mode
Each of these three measures has its own properties. Most of the time we use these
properties as basis for determining what measure to use to represent the center of
the distribution.
As mentioned before, the mean is the most commonly used measure of central
tendency. It can be likened to a “center of gravity”: if the values in an
array were to be put on a beam balance, the mean acts as the balancing point
where smaller observations will “balance” the larger ones, as seen in the following
illustration.
[Illustration: beam balance with the values 12,000; 20,000; 24,000; 25,000; 32,250; 36,000; 40,000 and 60,000 drawn as rectangles along the beam.]
Note that the frequency represented by the size of the rectangle serves as ‘weights’
in this beam balance.
To illustrate this property further, we could ask the students to subtract the value of
the mean from each observation (the difference is denoted as $d_i$) and then sum all the differences. The
computation can also be done alternatively as shown in the following table.
Monthly Family Income    d_i = x_i − µ                     Number of Families    d_i × f_i
in Pesos (x_i)           (rounded off)                     (f_i)
12,000                   12,000 – 30,007.14 = -18,007      2                     -18,007 × 2 = -36,014
20,000                   20,000 – 30,007.14 = -10,007      3                     -10,007 × 3 = -30,021
24,000                   24,000 – 30,007.14 = -6,007       4                     -6,007 × 4 = -24,029
25,000                   25,000 – 30,007.14 = -5,007       8                     -5,007 × 8 = -40,057
32,250                   32,250 – 30,007.14 = 2,243        9                     2,243 × 9 = 20,186
36,000                   36,000 – 30,007.14 = 5,993        5                     5,993 × 5 = 29,964
40,000                   40,000 – 30,007.14 = 9,993        2                     9,993 × 2 = 19,986
60,000                   60,000 – 30,007.14 = 29,993       2                     29,993 × 2 = 59,986
                                                           Sum = 35              Sum = 0 (before rounding)
Note: each product is computed from the unrounded difference and then rounded off.
The sum of the differences across all observations will be equal to zero. This
indicates that the mean is indeed the center of the distribution since the negative
and positive deviations cancel out and the sum is equal to zero.
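The cancelling of deviations can be checked without rounding error. The sketch below uses Python's `fractions.Fraction` for exact arithmetic; the variable names are illustrative only.

```python
from fractions import Fraction

# Frequency table from the lesson: monthly family income in pesos -> number of families
income_freq = {
    12_000: 2, 20_000: 3, 24_000: 4, 25_000: 8,
    32_250: 9, 36_000: 5, 40_000: 2, 60_000: 2,
}

n = sum(income_freq.values())
total = sum(x * f for x, f in income_freq.items())
mu = Fraction(total, n)   # exact mean, 1,050,250 / 35, kept as a fraction

# d_i * f_i for every row, then the grand total of the last column
weighted_deviations = {x: (x - mu) * f for x, f in income_freq.items()}
weighted_deviation_sum = sum(weighted_deviations.values())

print(float(mu), weighted_deviation_sum)
```

With the exact mean the total is exactly zero. Totals built from rounded entries, as in a hand-computed table, can differ from zero by a peso or so.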
In the formula for the mean, we can see that each observation contributes
to the value of the mean. All the data contribute equally in its
calculation. That is, the “weight” of each of the data items in the array is the
reciprocal of the total number of observations in the data set, i.e., $1/N$.
Means are also amenable to further computation, that is, you can combine
subgroup means to come up with the mean for all observations. For example, if
there are 3 groups with means equal to 10, 5 and 7 computed from 5, 15, and 10
observations respectively, one can compute the mean for all 30 observations as
follows, where $\mu_k$ and $n_k$ are the mean and the number of observations of group k:
$\mu = \frac{\mu_1 n_1 + \mu_2 n_2 + \mu_3 n_3}{30} = \frac{(10 \times 5) + (5 \times 15) + (7 \times 10)}{30} = \frac{195}{30} = 6.5$
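Combining subgroup means amounts to a weighted average with the group sizes as weights. A minimal sketch, assuming a hypothetical helper named `combined_mean`:

```python
def combined_mean(means, sizes):
    """Mean of all observations, given each group's mean and number of observations."""
    if len(means) != len(sizes):
        raise ValueError("each group needs one mean and one size")
    total = sum(m * n for m, n in zip(means, sizes))  # recovers the grand sum
    return total / sum(sizes)

# Three groups with means 10, 5 and 7 from 5, 15 and 10 observations
overall = combined_mean([10, 5, 7], [5, 15, 10])
print(overall)
```

Note that simply averaging the three means, (10 + 5 + 7) / 3, would ignore the group sizes and give a different value.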
If there are extremely large values, the mean will tend to be ‘pulled upward’, while if
there are extremely small values, the mean will tend to be ‘pulled downward’. The
extremely low or high values are referred to as ‘outliers’. Thus, outliers do affect the
value of the mean.
To illustrate this property, we could tell the students that if the two
families with the highest income each earned 600,000 pesos monthly instead of 60,000 pesos,
the computed value of the mean would be pulled upward, that is,
$\mu = \frac{12{,}000 + 12{,}000 + \cdots + 600{,}000 + 600{,}000}{35} = \frac{2{,}130{,}250}{35} = 60{,}864.29$
Thus, in the presence of extreme values or outliers, the mean is not a good measure
of the center. An alternative measure is the median. The mean is also computed
only for quantitative variables that are measured at least in the interval scale.
Like the mean, the median is computed for quantitative variables. But the median
can also be computed for variables measured in at least the ordinal scale. Another
property of the median is that it is not easily affected by extreme values or outliers.
In the example above, with a monthly family income of 600,000 pesos as the
extreme value, the median remains the same, equal to 32,250 pesos.
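The contrast between the mean and the median can be sketched with Python's standard `statistics` module. The sketch assumes that both 60,000-peso entries become 600,000 pesos, since that is the case the total of 2,130,250 corresponds to. The names `original`, `with_outliers` and `expand` are illustrative.

```python
from statistics import mean, median

# Frequency table from the lesson: monthly family income in pesos -> number of families
original = {
    12_000: 2, 20_000: 3, 24_000: 4, 25_000: 8,
    32_250: 9, 36_000: 5, 40_000: 2, 60_000: 2,
}

# Assumption for this sketch: the two highest incomes become 600,000 pesos each
with_outliers = {x: f for x, f in original.items() if x != 60_000}
with_outliers[600_000] = 2

def expand(freq):
    """List every observation individually, in increasing order."""
    return [x for x in sorted(freq) for _ in range(freq[x])]

mean_before, median_before = mean(expand(original)), median(expand(original))
mean_after, median_after = mean(expand(with_outliers)), median(expand(with_outliers))

print(round(mean_before, 2), median_before)
print(round(mean_after, 2), median_after)
```

The mean roughly doubles while the median does not move, because the 18th observation in the array is unaffected by changes at the upper end.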
For variables in the ordinal scale, the median should be used in determining the center of
the distribution. On the other hand, the mode is usually computed for data sets
which are mainly measured in the nominal scale of measurement. It is also
sometimes referred to as the nominal average. In a given data set, the mode can
easily be picked out by ocular inspection, especially if the data are not too many. In
some data sets, the mode may not be unique. The data set is said to be unimodal
if there is a unique mode, bimodal if there are two modes, and multimodal if
there are more than two modes. For continuous data, the mode is not very useful
since here, measurements (to the most precise significant digit) would theoretically
occur only once.
The mode is a more helpful measure for discrete and qualitative data with numeric
codes than for other types of data. In fact, in the case of qualitative data with
numeric codes, the mean and median are not meaningful.
The following diagram provides a guide in choosing the most appropriate measure
of central tendency to use in order to pinpoint or locate the center or the middle of
the distribution of the data set. Such measure, being the center of the distribution
‘typically’ represents the data set as a whole. Thus, it is very crucial to use the
appropriate measure of central tendency.
KEY POINTS
• A measure of central tendency is a location measure that pinpoints the center or
middle value.
• The three common measures of central tendency are the mean, median and
mode.
• Each measure has its own properties that serve as basis in determining when to
use it appropriately.
ASSESSMENT
1. Thirty people were asked the question, “How many people do you consider your
best friend?” The graph below shows their responses.
[Bar graph: frequency (vertical axis, 0 to 12) against number of best friends (horizontal axis, 1 to 8).]
What measure of central tendency would you use to find the center for the
number of best friends people have? Explain your answer. (Since there is a
presence of an outlier, one can use the median which is numerically equal to 3)
2. The mean age of 10 full time guidance counselors is 35 years old. Two new full
time guidance counselors, aged 28 and 30, are hired. Five years from now, what
would be the average age of these twelve guidance counselors? (The sum of ages
is 350 for 10 counselors, with the two newly hired, the sum is now 408, thus yielding a
mean currently at 34 years. Five years from now, the mean will go up to 39 years for the
12 guidance counselors.)
3. Houses in a certain area in a big city have a mean price of PhP4,000,000 but a
median price of only PhP2,500,000. How might you best explain this? (There is an
outlier (an extremely expensive house) in the prices of the houses.)
4. Five persons were asked about the usual number of hours they spend watching
television in a week. Their responses are: 5, 7, 3, 38, and 7 hours.
a. Obtain the mean, median and mode. (The mean is 12; median is 7, mode is
7.)
b. If another person were to be asked the same question and he/she responded
200 hours, how would this affect the mean, median and mode? (Median and
mode unchanged; mean increases to 43.3)
5. For the senior high school dance, there is a debate going on among students
regarding the color that will be featured prominently. Votes were sent by
students via SMS, and the results are as follows:
Color                    Red    Green    Orange    White    Yellow    Blue    Brown    Purple
No. of Votes Received    300    550      70        130      220       710     35       5
a. Is there a clear winner on the choice of color? (Yes)
b. Compute for the mean, median and modal color (if possible). (We cannot
compute for the mean and median. But the modal color is said to be blue.)
c. Why is it that we could or could not find each measure of the central
tendency? (We cannot compute for the mean and median since color is a
qualitative variable and is measured at the nominal level)
d. Which measure of central tendency will determine the color to be prominently
used during the senior high school dance? (mode)
6. Everyone studied very hard for the quiz in the Statistics and Probability Course.
There were 10 questions in the quiz, and the scores are distributed as follows:
Score Number of Students
10 8
9 12
8 6
7 5
6 3
5 2
4 0
3 1
2 1
1 0
0 2
a. Compute the mean, median, and mode for this set of data. (The
computation could be done as follows:
Score (x_i)    Number of Students (f_i)    x_i × f_i    Less Than Cumulative Frequency (< CF)
10             8                           80           40
9              12                          108          32
8              6                           48           20
7              5                           35           14
6              3                           18           9
5              2                           10           6
4              0                           0            4
3              1                           3            4
2              1                           2            3
1              0                           0            2
0              2                           0            2
               Sum = 40                    Sum = 304
Mean = $\mu = \frac{304}{40} = 7.6$;
Median is the average of the 20th and 21st observations = $\frac{8+9}{2} = 8.5$. Note
that the 20th observation is 8 while the 21st observation is 9 based on
the less than cumulative frequency.
Mode = 9 since that is the score with the highest frequency equal to 12.)
c. Suppose the teacher said “Everyone in the class will be getting either the
mean, median, or mode for their official score.”
i. What would students want to receive (mean, median, or mode)? (Mode)
ii. Which would students want to receive the least (mean, median or mode)?
(Mean)
iii. What would be the fairest score to receive? Ask students to explain their
answers. (Note: There is no right or wrong answer for this question. It all
depends on the reasoning of the students.)
Lesson 7: Other Measures of Location

OVERVIEW OF LESSON
In the previous lesson, we discussed measures of location known as measures of
central tendency. There are other measures of location which are useful in
describing the distribution of the data set. These measures of location include the
maximum, minimum, percentiles, deciles and quartiles. How to compute and
interpret these measures is also discussed in this lesson.
LEARNING OUTCOMES:
• Calculate measures of location other than the measure of central tendency, and
• Provide a sound interpretation of these summary measures.
LESSON OUTLINE:
1. Motivation
2. Measures of Location: Maximum, Minimum, Percentiles, Deciles and Quartiles
REFERENCES
“Deciding Which Measure of Center to Use”
http://www.sharemylesson.com/teaching-resource/deciding-which-measure-of-center-to-use-50013703/
Moore, D.S. (2007). The Basic Practice of Statistics, Fourth Edition W.H. Freeman
and Company.
A. Motivation
In the previous lesson, we asked the students to identify the highest and lowest family
income, and emphasized that the highest and the lowest values, which are
commonly known as maximum and minimum, respectively, are important summary
measures of a data set. They represent important location values in the distribution
of the data. However, these measures do not give a measure of location in the
center of the distribution. Instead, these two location measures give extreme
locations or points in a distribution.
For example, after a long test or examination, we are interested in the highest
score and the lowest score and, of course, in who got these scores. These are in addition to
knowing the average, median and modal scores. These measures tell us how the
students performed in the long test. Knowing these measures, we could take further
actions like rewarding the student(s) who got the highest score and assisting the
student(s) who got the lowest score. In addition, these measures also indicate whether the
long test was difficult or easy, and they may also indicate the students' level of
understanding of the concepts that are covered in the test.
To motivate the students, present the following distribution of scores in a 50-item
long test of 150 Grade 11 students of a nearby Senior High School and ask them to
respond to some questions.
Score in a Long Test    Number of Students
10 4
16 5
18 5
20 15
25 19
30 22
33 18
38 28
40 10
42 7
45 8
50 9
1. What is the highest score? Lowest score?
Answer: The highest score is 50 while the lowest is 10.
2. What is the most frequent score?
Answer: The most frequent score is 38, which is the score of 28 students.
3. What is the median score?
Answer: The median score is 33, which implies that 50% of the students, or
around 75 students, have scores of at most 33.
4. What is the average or mean score?
Answer: On the average, the students answered 32.04667 or 32 (rounded off) out
of 50 items correctly.
You could ask more questions like:
1. What is the score where 75% of the 150 students scored less than or equal
to it?
2. Do you think the long test is easy, given that 75 students have scores of at most 33
out of 50?
3. Do you need to be alarmed when 10% of the class got a score of at most 20
out of 50?
These questions could be answered by knowing other measures of location.
B. Measures of Location: Maximum, Minimum, Percentiles, Deciles and Quartiles
We formally define the maximum as a measure of location that pinpoints the
highest value in the data distribution while the minimum locates the lowest value.
There are other measures of location that are becoming common because of their
constant use in reporting rank in a distribution of scores, such as the percentile rank in
college entrance examinations. These measures are referred to as percentiles,
deciles, and quartiles.
A percentile is a measure that pinpoints a location that divides the distribution into 100
equal parts. It is usually represented by $P_j$, that value which separates the bottom j%
of the distribution from the top (100 − j)%. For example, $P_{30}$ is the value that separates
the bottom 30% of the distribution from the top 70%. Thus we say that 30% of the total
number of observations in the data set are less than or equal to $P_{30}$ while
the remaining 70% have values greater than $P_{30}$.
The steps in finding the jth percentile ($P_j$), lifted from the workbook cited as a reference at the end of this Teaching Guide, are as follows:
Step 1: Arrange the data values in ascending order of magnitude.
Step 2: Find the location of $P_j$ in the arranged list by computing $L = \left(\frac{j}{100}\right) \times N$,
where N is the total number of observations in the data set.
Step 3:
a. If L is a whole number, then $P_j$ is the mean or average of the values in the
Lth and (L+1)th positions.
b. If L is not a whole number, then $P_j$ is the value in the next higher
position.
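The three steps above can be sketched as a small Python function, tried here on the long-test scores from the Motivation. The function name `percentile` and the use of `divmod` to test whether L is a whole number are choices made for this sketch; j is assumed to be a whole number from 1 to 99.

```python
# Long-test scores from the Motivation: score -> number of students
score_freq = {10: 4, 16: 5, 18: 5, 20: 15, 25: 19, 30: 22,
              33: 18, 38: 28, 40: 10, 42: 7, 45: 8, 50: 9}

def expand(freq):
    """Step 1: list every observation individually, in ascending order."""
    return [x for x in sorted(freq) for _ in range(freq[x])]

def percentile(sorted_values, j):
    """Return P_j following Steps 2 and 3 (j is a whole number from 1 to 99)."""
    if not 0 < j < 100:
        raise ValueError("j must be between 1 and 99")
    n = len(sorted_values)
    # Step 2: L = (j / 100) * N, kept as whole part and remainder to avoid
    # floating-point surprises when checking for a whole number
    whole, remainder = divmod(j * n, 100)
    if remainder == 0:
        # Step 3a: average of the Lth and (L + 1)th values (1-based positions)
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # Step 3b: value in the next higher position, i.e. position whole + 1
    return sorted_values[whole]

scores = expand(score_freq)
p30 = percentile(scores, 30)
p50 = percentile(scores, 50)
p75 = percentile(scores, 75)
print(len(scores), p30, p50, p75)
```

Other textbooks and software packages use slightly different interpolation rules, so their percentiles may differ a little from the ones produced by this procedure.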
To illustrate we use the data on long test scores of 150 Grade 11 students of nearby
Senior High School. An additional column on less than cumulative frequency was
included to facilitate the computation.
Score in a Long Test    Number of Students    < CF
10 4 4
16 5 9
18 5 14
20 15 29
25 19 48
30 22 70
33 18 88
38 28 116
40 10 126
42 7 133
45 8 141
50 9 150
To find $P_{30}$ we note that j = 30. Since the observations are tabulated in increasing
order, we could proceed to Step 2, which asks us to compute L as
$L = \frac{j}{100} \times N = \frac{30}{100} \times 150 = 45$. The computed L, which is equal to 45, is a whole number and thus
we follow the first rule in Step 3, which states that $P_j$ is the average or mean of the
values found in the Lth and (L+1)th positions. Thus, we take the average of the 45th
and 46th observations, which are both equal to 25. We then say that the bottom 30%
of the scores are less than or equal to 25 while the top 70% of the
observations (which is around 105) are greater than 25.
Deciles and quartiles are then defined in relation to percentile. If the percentile
divides the distribution into 100 equal parts, deciles divide the distribution into 10
equal parts while quartiles divide the distribution into 4 equal parts. Thus, we say
that 10th Percentile is the same as the 1st Decile, 20th Percentile same as 2nd Decile,
25th Percentile same as 1st Quartile, 50th Percentile same as 5th Decile or 2nd Quartile
and so forth. Note also that, by the definition of the median in the previous lesson, we could
say that the median value is equal to the 50th Percentile or 5th Decile or 2nd Quartile.
Because of this relationship, the computation of the quartile and decile could be
coursed through the computation of the percentile.
To illustrate, if we want to compute the 3rd Decile or $D_3$, then we compute the 30th
Percentile or $P_{30}$. In other words, $D_3 = P_{30} = 25$ based on our earlier computation.
The 3rd Quartile or $Q_3$ is equal to $P_{75}$. We compute L as
$L = \frac{j}{100} \times N = \frac{75}{100} \times 150 = 112.5$. The computed L, which is equal to 112.5, is not a whole number and thus we
follow the second rule in Step 3, which states that $P_j$ is the value found in the next
higher position, specifically, in the 113th position, the next higher position after 112.5.
Thus, we take the 113th observation, which is equal to 38, as the value of $P_{75}$. We then
say that 75% of the class of 150 students, or around 113 students, correctly answered
at most 38 out of the 50 items.
The median which is equal to P50 is computed as the mean or average of the 75th
and 76th observations which are both equal to 33. Hence, we did get the same value
as the one we obtained using the definition we had in the previous lesson.
KEY POINTS
• There are other measures of location that could further describe the distribution
of the data set.
• The maximum and minimum values are measures of location that pinpoint the
extreme values, which are the highest and lowest values, respectively.
• Percentiles, quartiles and deciles are measures of locations that divide the
distribution into 100, 4 and 10 equal parts, respectively.
ASSESSMENT
Note: Answers are provided inside the parentheses and in bold face.
1. A businesswoman is planning to have a restaurant in the university belt. She
wants to study the weekly food allowance of the students in order to plan her
pricing strategy for the different menus she is going to offer. She asked 213
students and gathered the following data:
Weekly Food Allowance    Frequency    Weekly Food Allowance    Frequency
50                       5            550                      3
100                      3            600                      18
150                      6            700                      22
170                      1            750                      8
200                      8            800                      16
250                      5            900                      11
300                      5            1000                     27
350                      5            1200                     2
400                      6            1500                     3
450                      11           1700                     1
500                      46           2000                     1
a. Determine the value such that 60% of the students have a weekly food allowance of at
most that value.
(The statistic we want is $P_{60}$. We compute L as $L = \frac{j}{100} \times N = \frac{60}{100} \times 213 = 127.8 \cong 128$.
Then we take the 128th observation, which is equal to 700. Thus
we say that 60% of the students have at most 700 pesos as their weekly food
allowance.)
b. What percentage of the students have a weekly food allowance that is at
most 170 pesos?
(Here we are looking for the value of j. It is given that $P_j = 170$ is the 15th
observation in the array of 213 values. Thus, 15 is the value of L and using
this we compute the value of j as $j = \frac{L}{N} \times 100 = \frac{15}{213} \times 100 \cong 7$. Therefore we
say that 7% of the students have a weekly food allowance of at most 170
pesos.)
c. If the businesswoman wants at least 50% of the students to be able to
afford to eat in her restaurant, what should be the minimum total cost of the
meals that a student could have in a week?
(The statistic we want is the median or $P_{50}$. We compute L as
$L = \frac{j}{100} \times N = \frac{50}{100} \times 213 = 106.5 \cong 107$. Then we take the 107th observation, which is
equal to 600. Thus we say that at least 50% of the students could afford to
eat in the restaurant if the minimum total cost of the meals that a student
could have in a week is 600 pesos.)
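The reasoning in item 1b, which goes from a value back to a percentage, can be sketched by counting the observations that do not exceed the value. The helper `percent_at_most` is a hypothetical name used only for this illustration.

```python
# Weekly food allowance in pesos -> number of students (data from item 1)
allowance_freq = {
    50: 5, 100: 3, 150: 6, 170: 1, 200: 8, 250: 5, 300: 5, 350: 5,
    400: 6, 450: 11, 500: 46, 550: 3, 600: 18, 700: 22, 750: 8,
    800: 16, 900: 11, 1000: 27, 1200: 2, 1500: 3, 1700: 1, 2000: 1,
}

def percent_at_most(freq, value):
    """Percentage of observations that are less than or equal to `value`."""
    n = sum(freq.values())
    count = sum(f for x, f in freq.items() if x <= value)  # the '< CF' entry
    return 100 * count / n

n_students = sum(allowance_freq.values())
count_170 = sum(f for x, f in allowance_freq.items() if x <= 170)
pct_170 = percent_at_most(allowance_freq, 170)
print(n_students, count_170, round(pct_170))
```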
Lesson 8: Measures of Variation
OVERVIEW OF LESSON
In this lesson, students will be shown that measures of central
tendency are not enough to describe a data set, by scrutinizing two different data sets with the same
measures of central tendency. We illustrate this using data on the returns on stocks
where not only the mean, median and mode are the same, but also
other measures of location like the minimum and maximum. However, the spread
of the observations is different, which means that to further describe the data sets we
need additional measures of the dispersion of the data, i.e.,
range, interquartile range, variance, standard deviation, and coefficient of variation.
Also, the standard deviation, as a measure of dispersion can be viewed as a
measure of risk, specifically in the case of making investments in stock market. The
smaller the value of the standard deviation, the smaller is the risk.
LEARNING OUTCOMES:
• Calculate some measures of dispersion;
• Think of the strengths and limitations of these measures; and
• Provide a sound interpretation of these measures.
LESSON OUTLINE:
1. Introduction: The Case of the Returns on Stocks
2. Absolute Measures of Dispersion: Range, Interquartile Range, Variance, and
Standard Deviation
3. Relative Measure of Dispersion: Coefficient of Variation
REFERENCES
Bryant−Smith (2009): Practical Data Analysis, Second Edition. McGraw-Hill/Irvine,
USA.
Moore, D.S. (2007). The Basic Practice of Statistics, Fourth Edition W.H. Freeman
and Company.
“Range as a Measure of Variation” http://www.sharemylesson.com/teaching-resource/range-as-a-measure-of-variation-50009362
A. Introduction: The Case of the Returns on Stocks.
To introduce this lesson, tell the students the importance of thinking about their
future, of saving, and of wealth generation. Explain that a number of people invest
money into the stock market as an alternative financial instrument to generate
wealth from savings.
Explanatory Note: Stocks are shares of ownership in a company. When people buy
stocks they become part owners of the company, whether in terms of profits or
losses of the company.
Mention to students that the history of performance of a particular stock may be a
useful guide to what may be expected of its performance in the foreseeable future.
This is, of course, a very big assumption, but we have to assume it anyway.
Provide the following data to students representing the rates of return for two
stocks, which we will call Stock A and Stock B.
Year    Stock A    Stock B    Year    Stock A    Stock B
2005    0.081      0.214      2010    0.241      0.081
2006    0.231      0.193      2011    0.193      0.181
2007    0.214      0.132      2012    0.133      0.230
2008    0.214      0.073      2013    0.071      0.214
2009    0.181      0.066      2014    0.066      0.241
Inform students that the rate of return is defined as the increase in value of the
portfolio (including any dividends or other distributions) during the year divided by
its value at the beginning of the year. For instance, if the parents of Juana dela Cruz
invest 50,000 pesos in a stock at the beginning of the year, and the value of the
stock goes up to 60,000 pesos, thus having an increase in value of 10,000 pesos,
then the rate of return here is 10,000/50,000 = 0.20.
Explain to students that the rate of return may be positive or negative. It represents
the fraction by which your wealth would have changed had it been invested in that
particular combination of securities.
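The definition of the rate of return can be sketched as a one-line formula. The function name `rate_of_return` and its `distributions` parameter (for dividends or other payouts) are assumptions made for this illustration.

```python
def rate_of_return(begin_value, end_value, distributions=0.0):
    """(increase in value + distributions) / value at the beginning of the year."""
    if begin_value <= 0:
        raise ValueError("the beginning value must be positive")
    return (end_value - begin_value + distributions) / begin_value

# The example in the text: 50,000 pesos grows to 60,000 pesos
rate = rate_of_return(50_000, 60_000)

# A loss gives a negative rate of return
loss_rate = rate_of_return(50_000, 45_000)
print(rate, loss_rate)
```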
Now, let us compute some measures of location that we learned in previous
lessons to describe the data given above. You could ask the students to do this as a
sort of assessment of what they have already learned. It could be done by
recitation or through a quiz. Below is a summary of the computed values as well as a
graphical presentation of the rates of return of Stocks A and B.
           Maximum    Minimum    Mean      Median    Mode
Stock A    0.241      0.066      0.1625    0.187     0.214
Stock B    0.241      0.066      0.1625    0.187     0.214
[Line graph: rates of return of Stock A and Stock B (vertical axis, 0 to 0.3) by year, 2005 to 2014.]
Notice that there are no differences in the computed summary statistics but the
trend and actual values of the rate of returns for the two stocks are different as
depicted in the line graph. Such observation tells us that it is not enough to simply
use measures of location to describe a data set. We need additional measures such
as measures of variation or dispersion to describe further the data sets.
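The summary table for the two stocks can be reproduced with Python's standard `statistics` module. The sketch below is illustrative; the helper name `summarize` and the rounding to four decimal places are choices made here.

```python
from statistics import mean, median, mode

# Rates of return for 2005 to 2014, in year order, from the table in the text
stock_a = [0.081, 0.231, 0.214, 0.214, 0.181, 0.241, 0.193, 0.133, 0.071, 0.066]
stock_b = [0.214, 0.193, 0.132, 0.073, 0.066, 0.081, 0.181, 0.230, 0.214, 0.241]

def summarize(returns):
    """Measures of location used in the summary table, rounded to 4 decimals."""
    return {
        "maximum": max(returns),
        "minimum": min(returns),
        "mean": round(mean(returns), 4),
        "median": round(median(returns), 4),
        "mode": mode(returns),
    }

summary_a = summarize(stock_a)
summary_b = summarize(stock_b)
print(summary_a)
print(summary_b)
print(sorted(stock_a) == sorted(stock_b))
```

The two summaries coincide even though the underlying values, and the order in which they occur over the years, are not the same.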
In particular, summary measures of variability (such as the range and the standard
deviation) of the rates of return are used to measure risk associated with investment.
We could use measures of variation to decide whether it would make any difference
if we decide to invest wholly in Stock A, wholly in Stock B, or half of our investments
in Stock A and another half in Stock B. In general, there is higher risk in investing if
the rate of return fluctuates much or there is high variability in its historical values.
Thus, we choose the investment whose rate of return has a small measure
of dispersion.
There are two types of measures of variability or dispersion. One type is the
absolute measure which includes the range, interquartile range, variance, and
standard deviation. Absolute measure of dispersion provides a measure of
variability of observations or values within a data set. On the other hand, the
relative measure of dispersion which is the other type of measure of dispersion
is used to compare variability of data sets of different variables or variables
measured in different units of measurement. The coefficient of variation is a relative
measure of variability.
B. Absolute Measures of Dispersion: Range, Interquartile Range, Variance,
and Standard Deviation
The range is a simple measure of variation defined as the difference between the
maximum and minimum values. The range depends on the extremes; it ignores
information about what goes in between the smallest (minimum) and largest
(maximum) values in a data set. The larger the range, the larger is the dispersion of
the data set. We already encountered the range in previous lesson where we
discussed the construction of an FDT.
Using the data on the scores of 150 Grade 11 students of a nearby Senior High
School on a 50-item long test, we could demonstrate the computation of these
measures.
Score in a Long Test    Number of Students    < CF
10 4 4
16 5 9
18 5 14
20 15 29
25 19 48
30 22 70
33 18 88
38 28 116
40 10 126
42 7 133
45 8 141
50 9 150
In the above data, the maximum is 50 and the minimum is 10; hence, the range is
40. Note, however, that because the range depends only on the two extremes, it is
easily affected by unusually small or unusually large values. Because of this
property, another measure, the interquartile range or IQR, is often used instead.
The interquartile range or IQR is the difference between the 3rd and the 1st
quartiles. Hence, it gives you the spread of the middle 50% of the data set. Like the
range, the higher the value of the IQR, the larger is the dispersion of the data set.
Based on the computations we did in the previous lesson, the 3rd quartile or Q3 is
the 113th observation and is equal to 38, while Q1 or P25 is the 38th observation and is
equal to 25. Hence, IQR = Q3 – Q1 = 38 – 25 = 13.
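The range and the IQR above can be checked with a short Python sketch. The rule used here to locate a quartile (take the observation at position N·p rounded up, or the average of the observations at positions N·p and N·p + 1 when N·p is a whole number) is an assumption; it is chosen because it reproduces the 38th and 113th positions quoted above. The function name `value_at_quantile` is illustrative only.

```python
import math

# Frequency table from the text: score -> number of students
freq = {10: 4, 16: 5, 18: 5, 20: 15, 25: 19, 30: 22,
        33: 18, 38: 28, 40: 10, 42: 7, 45: 8, 50: 9}

# Expand the table into the sorted list of all 150 scores
scores = [x for x, f in sorted(freq.items()) for _ in range(f)]

def value_at_quantile(sorted_data, p):
    """Assumed rule for 0 < p < 1: use position N*p, averaging two
    neighboring values when N*p is a whole number."""
    pos = len(sorted_data) * p
    if pos == int(pos):
        k = int(pos)                       # 1-based position
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[math.ceil(pos) - 1]

data_range = max(scores) - min(scores)     # maximum minus minimum
q1 = value_at_quantile(scores, 0.25)       # 38th observation
q3 = value_at_quantile(scores, 0.75)       # 113th observation
iqr = q3 - q1
print(data_range, q1, q3, iqr)
```

Other quartile conventions exist and may give slightly different values for small data sets.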
Recall with the students the following property of the mean: when the deviation or
difference of each observation from the mean is obtained and summed over all the
observations, the sum is equal to zero. We said that this property shows that the
deviations of the observations from the mean cancel out, indicating that the mean is
indeed the center of the distribution. What if we square each difference before we get
the sum and use it to measure the spread of the observations? Doing this in our
example, we have the following table:
Score in a Long Test (x_i) | d_i = x_i - µ (rounded off) | d_i^2 | Number of Students (f_i) | d_i^2 × f_i
10 | 10 - 32 = -22 | 484 | 4 | 1936
16 | 16 - 32 = -16 | 256 | 5 | 1280
18 | 18 - 32 = -14 | 196 | 5 | 980
20 | 20 - 32 = -12 | 144 | 15 | 2160
25 | 25 - 32 = -7 | 49 | 19 | 931
30 | 30 - 32 = -2 | 4 | 22 | 88
33 | 33 - 32 = 1 | 1 | 18 | 18
38 | 38 - 32 = 6 | 36 | 28 | 1008
40 | 40 - 32 = 8 | 64 | 10 | 640
42 | 42 - 32 = 10 | 100 | 7 | 700
45 | 45 - 32 = 13 | 169 | 8 | 1352
50 | 50 - 32 = 18 | 324 | 9 | 2916
Sum = 14009
So, for each unique observation, we subtract the mean and refer to the difference as
d_i, square the difference, and sum the squares over all observations. Note that in
the table we multiply each squared difference by the number of students to account
for all observations. We then divide the sum by the total number of observations,
denoted by N. Summarizing these steps in a formula, we have
$s^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where the sum runs over all N
observations. We usually denote this expression as $s^2$ and call it the variance.
Thus, in this example, $s^2 = 14009/150 = 93.39$. For ease in computation, instead of
$\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, we use the equivalent expression
$\frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2$. When applied to our example, we have
$s^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2 = \frac{168{,}057}{150} - 32.04667^2 \approx 93.39$ (rounded off).
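The equivalence of the two expressions for the variance follows from expanding the square. The short derivation below uses the same symbols as the text ($x_i$ for the observations, $\mu$ for their mean, $N$ for the number of observations) and relies only on the fact that $\frac{1}{N}\sum x_i = \mu$:

```latex
\[
\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
  = \frac{1}{N}\sum_{i=1}^{N} x_i^2
    - 2\mu \cdot \frac{1}{N}\sum_{i=1}^{N} x_i
    + \mu^2
  = \frac{\sum_{i=1}^{N} x_i^2}{N} - 2\mu^2 + \mu^2
  = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2 .
\]
```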
Variance is a measure of dispersion that accounts for the average squared
deviation of each observation from the mean. Since we square the difference of
each observation from the mean, the unit of measurement of the variance is the
square of the unit used in measuring each observation. Such a property makes
interpretation a little problematic. For example, point² or kilogram² is difficult to
interpret compared to inches².
Hence, instead of the variance, the standard deviation is computed, which is the
positive square root of the variance, that is, $s = \sqrt{s^2}$. In the example,
$s = \sqrt{93.3933} = 9.6640$. To interpret, we say that on the average, the scores of
the students deviate from the mean score of 32 points by as much as 9.6640 or
approximately 10 points.
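As a check on the computations above, the following Python sketch computes the mean, the variance (by both the defining formula and the shortcut formula), and the standard deviation directly from the frequency table. It divides by N, as this lesson does. Because it uses the unrounded mean instead of 32, the results agree with the text to two decimal places rather than to every digit.

```python
import math

# Frequency table from the text: score -> number of students
freq = {10: 4, 16: 5, 18: 5, 20: 15, 25: 19, 30: 22,
        33: 18, 38: 28, 40: 10, 42: 7, 45: 8, 50: 9}

n = sum(freq.values())                                 # total observations
mean = sum(x * f for x, f in freq.items()) / n         # about 32.04667

# Defining formula: average squared deviation from the mean
var_def = sum(f * (x - mean) ** 2 for x, f in freq.items()) / n
# Shortcut formula: mean of the squares minus the square of the mean
var_short = sum(f * x ** 2 for x, f in freq.items()) / n - mean ** 2

sd = math.sqrt(var_def)
print(round(mean, 5), round(var_def, 2), round(var_short, 2), round(sd, 2))
```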
If all the observations are equal to a constant, then the mean is that constant, and
the measure of variation is zero. Furthermore, if for a given data set, the variance
and standard deviation turn out to be zero, then all the deviations from the average
must be zero, which means that all observations are equal. Note that if a data set
were rescaled, that is, if the observations were multiplied by some constant, then the
standard deviation of the new data set is merely the absolute value of the scaling
factor multiplied by the standard deviation of the original data set.
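These properties can be illustrated with Python's standard `statistics` module. The data values below are illustrative and are not taken from the lesson; `pstdev` is used because it divides by N, as in this lesson.

```python
import math
import statistics

data = [10, 16, 18, 20, 25]            # illustrative values, not from the text
sd = statistics.pstdev(data)            # standard deviation with divisor N

constant = statistics.pstdev([7, 7, 7, 7])            # all observations equal
scaled = statistics.pstdev([3 * x for x in data])     # every value times 3
flipped = statistics.pstdev([-2 * x for x in data])   # every value times -2

print(constant)                          # zero: the observations do not vary
print(math.isclose(scaled, 3 * sd))      # scaled by the factor 3
print(math.isclose(flipped, 2 * sd))     # scaled by |-2| = 2, never negative
```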
The variance and standard deviation are based on all the observations in the
data set, and each item is given a proper weight. They are extremely useful
measures of variability as they measure the average scattering of the data around
the mean, that is, how much the data fluctuate above and below the mean. The variance
and standard deviation increase with an increase in the deviations about the mean,
and decrease with decreases in these deviations. A small standard deviation (and
variance) means a high degree of uniformity in the observations and of
homogeneity in a series.
The variance is the most suitable for algebraic manipulations but as was pointed out
earlier, its value is in squared unit of measurements. On the other hand, the
standard deviation has unit of measure same as with that of the observations. Thus,
standard deviation serves as the primary measure of variation, just as the mean is
the primary measure of central location.
Let us go back to the motivation example, in which we have two stocks, A and B.
Both stocks have the same expected return as measured by the mean. However,
the standard deviation of the rates of return for Stock A is 0.0688 while that for
Stock B is 0.0685, indicating that Stock A has slightly higher risk than Stock B,
although the difference is small.
C. Relative Measure of Dispersion: Coefficient of Variation
To compare variability between or among different data sets, that is, data sets
for different variables or for the same variable measured in different units of
measurement, the coefficient of variation (CV) is used as a measure of relative
dispersion. It is usually expressed as a percentage and is computed as
$CV = \frac{s}{\mu} \times 100\%$. The CV is a measure of dispersion relative to the
mean of the data set. Since $s$ and $\mu$ have the same unit of measurement, the CV is
unitless, that is, it does not depend on the unit of measurement. Hence, it is used
to compare the variability across different data sets.
As an example, the CV of the scores of the students in the long test is computed as
$CV = \frac{s}{\mu} \times 100\% = \frac{9.6640}{32.04667} \times 100\% = 30.16\%$, while
the CV of the rates of return of Stock A, computed in the same way from its standard
deviation of 0.0688 and its mean rate of return, is 42.34%. Thus, we say the rate of
returns of Stock A is more variable than the scores of the students in the test. Here,
we used the CV to compare the variability of two different data sets.
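The CV of the test scores can be reproduced with a one-line function. This is a minimal sketch; the function name is illustrative, and it assumes a nonzero mean, since the CV is undefined otherwise.

```python
def coefficient_of_variation(sd, mean):
    """Return the CV in percent: the standard deviation relative to the mean."""
    return sd / mean * 100

# Standard deviation and mean of the long-test scores, as given in the text
cv_scores = coefficient_of_variation(9.6640, 32.04667)
print(round(cv_scores, 2))
```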
KEY POINTS
• Measures of dispersion are used to further describe the distribution of the data set.
• Absolute measures of variation include range, interquartile range, variance and
standard deviation.
• A relative measure of dispersion is provided by the coefficient of variation.
ASSESSMENT
1. Three friends, Gerald, Carmina, and Rodolfo are planning their business of selling
homemade peanut butter. They start the planning by doing a market study where
they obtained the prices (in pesos) of a 250-gram jar of several known brands of
peanut butter. Below is the data set they have collected:
100.80 197.60 158.00 131.60 184.40 149.20
136.00 109.60 360.40 122.80 131.60
After studying the data, Gerald said, “The prices of peanut butter are pretty similar.
The range is only PhP 30.80.” Carmina said, “You are mistaken! The prices are very
different. The range is PhP 259.60.” Rodolfo said, “I think you are both mistaken.
The range isn’t a useful measure to describe the variation of the data set.”
a. Explain what you think is the basis used by each person in support of their claims.
(Gerald did not arrange the data set from smallest to largest, and erroneously
subtracted the first value (100.80) from the last value (131.60) in the data set.
Carmina found the range correctly by subtracting the smallest value (100.80) from
the largest value (360.40). Rodolfo noticed that the maximum, 360.40, is an outlier.
As a result, the computed range of PhP 259.60 describes the variation of the
observations poorly, as it was unduly increased by the extreme value.)
b. Who should we agree with? Why?

(We can agree with both Carmina and Rodolfo. Carmina correctly calculated the
range; Rodolfo intelligently observed that while Carmina was correct in her
calculation, the range is not very useful in describing the variability of the
observations, as the range would only be PHP 96.80 if the outlier were removed
from the data set.)
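The three figures in this item can be verified with a short Python sketch. The variable names are illustrative, and results are rounded to two decimals because the prices are stored as floating-point numbers.

```python
# Prices (in pesos) in the order they were collected
prices = [100.80, 197.60, 158.00, 131.60, 184.40, 149.20,
          136.00, 109.60, 360.40, 122.80, 131.60]

def data_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)

gerald = prices[-1] - prices[0]           # last minus first value, unsorted
carmina = data_range(prices)              # correct range
outlier = max(prices)                     # 360.40
without_outlier = data_range([p for p in prices if p != outlier])

print(round(gerald, 2), round(carmina, 2), round(without_outlier, 2))
```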
2. Three hundred students taking a basic course in Statistics are given a similar final
examination. After checking the papers and while studying the distribution of the
final examination scores, the professor thought of several scenarios, which are
described below:
a. Suppose the professor will give 30% weight to the final examination. What effect
would multiplying all the final scores by 30% have on the mean of the final exam
scores? On the standard deviation of the final exam scores?
(The mean will also be rescaled by 30%, and so will the standard deviation.)
b. Suppose the professor wants to raise the final examination scores. What will be
the effect on the mean of the final exam scores if 5 points are added to each
final score? On the standard deviation of the final exam scores?
(The mean will also go up by 5 points, while the standard deviation stays the same.)
3. In a fitness center, the weights of a certain group of students were taken, resulting
in a common weight of 140 pounds. What would be the standard deviation of the
distribution of weights?
(Zero, since the observations do not vary.)
4. Determine which of the following statements is (are) TRUE or FALSE. Explain
briefly your answer.
a. If each observation in a data set is doubled, then the standard deviation
would also be doubled.
(True, since the variance would be quadrupled, and taking the square root of
the resulting variance gives twice the standard deviation.)
b. If in a set of data, positive numbers are changed to negative, while negative
numbers are changed to positive, then the standard deviation changes its sign as well.
(False, since the standard deviation is always nonnegative.)
Explanatory Note:
Teachers have the option to ask this assessment orally to the entire class to either
introduce or recall the notions of computing the range and of computing the
standard deviation, or to group students and ask them to identify answers, or to give
this as homework, or to use some questions/items here for a chapter examination.
Lesson 9: More on Describing Data: Summary Measures and Graphs

OVERVIEW OF LESSON:
In this lesson, students will do an activity that will use the data on heights and weights
which were collected in Lesson 2. They will construct box plots and calculate the
summary measures they have learned in previous lessons. These computed summary
measures and constructed box plots will be used to describe the data set fully so as to
provide a simple analysis of the data at hand.
LEARNING COMPETENCIES
At the end of the lesson, learners should be able to:
• Construct and interpret box plots; and
• Provide simple analysis of a data set based on its descriptive measures.
LESSON OUTLINE:
1. Preliminaries: Teacher’s Preparation for the Lesson
2. Motivation: The Student’s Height and Weight and Corresponding BMI
3. Construction and Interpretation of a Box-plot
REFERENCES
“Armspans” in STatistics Education Web (STEW)
http://www.amstat.org/education/stew/pdfs/Armspans.docx
“Deciding Which Measure of Center to Use”
http://www.sharemylesson.com/teaching-resource/deciding-which-measure-of-center-to-use-50013703/
Handbook of Statistics 1 (1st and 2nd Edition), authored by the Faculty of the Institute of
Statistics, UP Los Baños, College, Laguna 4031
A. Preliminaries: Teacher’s Preparation for the Lesson

Note: This is an activity that the teacher has to do in preparation for the
lesson.
A day before the actual schedule for this lesson, you should review some information
about the body mass index (BMI) so that you could compute the BMI of each student
in the class based on the students’ weights and heights collected in Lesson 2. This will
also make you more confident to discuss BMI in the class as well as use it to integrate
the lessons learned in this chapter. The following discussion provides useful
information about BMI.
The BMI, devised by Adolphe Quetelet, is defined as the body mass divided by the
square of the body height, and is universally expressed in units of kg/m², using weight
in kilograms and height in meters. When the term BMI is used informally, the units are
usually omitted. A high BMI can be an indicator of high body fatness. The BMI can be
used to screen for weight categories that may lead to health problems.
The BMI provides a simple numeric measure of a person's thickness or thinness,

allowing medical and health professionals to discuss weight problems more objectively
with the adult patients. The standard weight status categories associated with BMI
ranges for adults are listed below:
BMI Range | Weight Status | Health Risk
Below 18.5 | Underweight | Risk of developing problems such as nutritional deficiency and osteoporosis
18.5-22.9 | Normal or Healthy Weight | Low risk (healthy range)
23.0-27.4 | Overweight | Moderate risk of developing heart disease, high blood pressure, stroke, diabetes
27.5 and above | Obese | High risk of developing heart disease, high blood pressure, stroke, diabetes
For adults, a BMI from 18.5 up to 23 indicates optimal weight, a BMI lower than
18.5 suggests that the person is underweight, a BMI from 23 up to 27.5 indicates that
the person is overweight, and a BMI from 27.5 upward suggests that the person is
obese. Note that the thresholds 23 and 27.5 are used for Southeast Asians, as
suggested by the World Health Organization (WHO), though generally 25 and 30 are
used.
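The weight status categories in the table above can be expressed as a small Python function. This is a sketch under stated assumptions: the function name is illustrative, and values that fall between the listed ranges (for example, 22.95) are assigned to the lower category by treating each range as running up to, but not including, the next threshold.

```python
def weight_status(bmi):
    """Classify an adult BMI (kg/m^2) using the thresholds 18.5, 23.0, 27.5."""
    if bmi < 18.5:
        return "Underweight"
    if bmi < 23.0:
        return "Normal or Healthy Weight"
    if bmi < 27.5:
        return "Overweight"
    return "Obese"

for value in (17, 21, 25, 31):
    print(value, weight_status(value))
```

As the special notes that follow explain, these fixed thresholds are meant for adults and do not apply directly to children and teens.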
Special Notes about interpreting BMI:
1. Many but not all athletes have a high muscle to fat ratio and may have a BMI that is
misleadingly high relative to their body fat percentage. Exceptions also can be
made for the elderly, and the infirm.
2. For children and teens, the interpretation of BMI depends upon age and sex, even
though it is computed using the same formula. This difference in interpretation is
due to the variability in the amount of body fat with age and between girls and
boys, among children and teens. Instead of comparison against fixed thresholds for
underweight and overweight, the BMI is compared against the percentile for
children of the same gender and age. A BMI that is less than the 5th percentile is
considered underweight and above the 95th percentile is considered obese.
Children with a BMI between the 85th and 95th percentile are considered to be
overweight.
3. The following are other limitations in the interpretation of BMI.
a. Since the BMI depends upon weight and the square of height, it ignores the
basic scaling law which states that mass increases with the 3rd power of linear
dimensions. Thus, taller individuals, even if they had exactly the same body
shape and relative composition, always have a larger BMI.
b. BMI also does not account for body frame size; a person may have a small frame
and be carrying more fat than optimal, but the BMI may suggest that the
person is normal. A large-framed individual may be quite healthy with a fairly
low body fat percentage, but the BMI may yield an overweight classification.
In the Philippines, the government’s Food and Nutrition Research Institute (FNRI) of the
Department of Science and Technology collects the anthropometric data through the
National Nutrition Survey (NNS) to be able to generate estimates on the extent of child
malnutrition using three indicators of undernutrition: underweight, wasted and stunted.
The NNS is conducted every five years and based on the gathered weights and heights,
the nutritional status of the Filipinos was assessed.
A Filipino child whose weight is more than three standard deviations below the median
weight-for-age is said to be severely underweight, while a child whose weight is more
than two but not more than three standard deviations below the growth standard is
moderately underweight. Similarly, (moderate and severe) wasting and stunting are
defined in terms of the child growth standards on weight-for-height and
height-for-age, respectively. Using these standards, FNRI estimated, based on the
2013 NNS, that about one in five children aged 0 to 5 years were underweight and
about three in ten had stunted growth. Wasting, or low weight-for-height, was
estimated at 7.9 percent.
It was also reported that the incidence of malnutrition was high among children in the
poorest 20 percent of families: underweight (29.8 percent), stunting (44.8 percent), and
wasting (9.5 percent). Malnutrition is thus related to poverty. The percentage of
overweight children was highest among the "wealthiest" (10.7 percent). The figure
below shows the trends in the prevalence of stunting, underweight and wasting from
1989 to 2013 based on the data gathered by FNRI through its NNS.
[Figure 1: line chart of prevalence by year, 1990 to 2015, with series for stunting, underweight, and wasting; vertical axis marked from 10 to 50]
Figure 1. Prevalence of stunting, underweight, and wasting among 0-5 years old
preschoolers in the Philippines, 1989-2013.
When children under five are experiencing malnutrition, they are likely to carry this over
to early childhood, which has repercussions on learning achievements in school. In
consequence, government, through the Department of Social Welfare and
Development, as well as the Department of Education (DepED), has developed
feeding programs to reduce hunger, to aid in the development of children, to improve
nutritional status and to promote good health, as well as to reduce inequities by
encouraging families to send their children to school given the incentive of school
feeding benefits. DepED thus regularly collects school records of heights and weights
at the beginning and end of the school year to monitor nutrition of school-aged
children.
With this information and the class data gathered in Lesson 2, you are now to compute
the BMI of each student so that a table with the following format will be ready for the
group activity described in the next section.
Class Student Number | Sex | Height (in meters) | Weight (in kilograms) | BMI (rounded off to whole numbers)
Note that the height of the student collected in Lesson 2 is in centimeters; thus, you
have to divide the values by 100 to get the values in meters. Also, BMI is rounded off
to whole numbers for ease of computation in the group activity.
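The conversion and rounding just described can be sketched in Python as follows. The function name is illustrative. Rounding half up is assumed here (Python's built-in `round()` rounds halves to the nearest even number instead), and the three example calls use heights and weights that appear in the sample class data of the next section, with the heights expressed in centimeters.

```python
import math

def bmi_rounded(height_cm, weight_kg):
    """BMI in kg/m^2, rounded off to a whole number (half rounds up)."""
    height_m = height_cm / 100              # centimeters to meters
    bmi = weight_kg / height_m ** 2         # weight divided by height squared
    return math.floor(bmi + 0.5)

print(bmi_rounded(164, 40))
print(bmi_rounded(102, 60))
print(bmi_rounded(177, 27))
```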
B. Motivation: The Student’s Height and Weight and Corresponding BMI
The activities for this lesson are to be done by groups and will be conducted during the
entire class period. Hence, it is recommended that the grouping be done at the start of
the class and that the group members sit together in a circle, as the activity requires
group discussion. The students should be advised to stay in their groups for the
entire class period.
A suggested way to group the students into three groups is to have them count 1-2-3
sequentially; students with the same number will belong to the same group. Once
they are seated together as a group, you could begin the lesson by asking the students if
they think that males and females have the same heights, weights and BMI. Have them
guess what the distribution of heights, weights and BMI might look like for the whole
class and whether the distribution of heights, weights and BMI for males and females
would be the same.
The following are some possible questions to ask:
• Are the heights, weights, and BMI of males and females the same or
different?
• What are some other factors besides sex that might affect heights, weights
and BMI? (Possible factors that could be studied are age, location where
person resides, and year the data was collected.)
You could write these questions on the board so that the students will be reminded of
these questions while they perform a group activity. Assign the first group (those
students who were numbered ‘1’) for the variable ‘height’; the second group (those
students who were numbered ‘2’) for the variable ‘weight’; and third group (those
students who were numbered ‘3’) for the variable ‘BMI’. You will be using the class data
you prepared in the preliminary activity for this lesson. The following table provides
sample data showing what your class data should look like.
Class Student Number | Sex | Height (in meters) | Weight (in kilograms) | BMI (rounded off to whole numbers)
1 F 1.64 40 15
2 F 1.52 50 22
3 F 1.52 49 21
4 F 1.65 45 17
5 F 1.02 60 58
6 F 1.63 45 17
7 F 1.50 38 17
8 F 1.60 51 20
9 F 1.42 42 21
10 F 1.52 54 23
11 F 1.48 46 21
12 F 1.62 54 21
13 F 1.50 36 16
14 F 1.54 50 21
15 F 1.67 63 23
16 M 1.72 55 19
17 M 1.65 61 22
18 M 1.56 60 25
19 M 1.50 52 23
20 M 1.70 90 31
21 M 1.53 50 21
22 M 1.62 90 34
23 M 1.79 80 25
24 M 1.57 58 24
25 M 1.70 68 24
26 M 1.77 27 9
27 M 1.48 50 23
28 M 1.73 94 31
29 M 1.56 66 27
30 M 1.75 50 16
With the class data, ask each group to do the following for the assigned variable in
their group:
1. Compute the descriptive measures for the whole class and also for each subgroup
in the data set with sex as the grouping variable. The descriptive measures to
compute include the measures of location such as minimum, maximum, mean,
median, first and third quartiles; and measures of dispersion such as the range,
interquartile range (IQR) and standard deviation. Each group could use the
following format of the table to present the computed measures:
Table 9.1 Summary statistics of the variable __________.
Descriptive Measure | Computed Value: For the whole class with N = ___ | For the subgroup of Males with N = ___ | For the subgroup of Females with N = ___
Measures of Location
Minimum
Maximum
Mean
First Quartile
Median
Third Quartile
Measures of Dispersion
Range
IQR
Standard Deviation
2. With the computed descriptive measures, write a textual presentation of the data
for the variable assigned to the group.
The following tables provide the descriptive measures of the sample class data as a
whole and by subgroup. Note that there might be discrepancies in the computed
values due to rounding off.
Table 9.2 Summary statistics of the variable height (in meters) using the sample data.
Descriptive Measure | Computed Value: For the whole class with N = 30 | For the subgroup of Males with N = 15 | For the subgroup of Females with N = 15
Minimum 1.020 1.480 1.020
Maximum 1.790 1.790 1.670
Mean 1.582 1.642 1.522
First Quartile 1.520 1.560 1.500
Median 1.585 1.650 1.520
Third Quartile 1.670 1.730 1.630
Range 0.770 0.310 0.650
IQR 0.150 0.170 0.130
Standard Deviation 0.144 0.103 0.157
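The Females column of Table 9.2 can be reproduced with the sketch below, which uses only Python's standard library. Two points are assumptions rather than statements from the lesson: the quartile rule (position N·p rounded up, or the average of positions N·p and N·p + 1 when N·p is a whole number), which is chosen because it matches the tabulated quartiles, and the use of `statistics.stdev`, which divides by N − 1. The tabulated 0.157 appears to correspond to that divisor rather than to the divisor N used in Lesson 8.

```python
import math
import statistics

# Heights (in meters) of the 15 female students in the sample class data
female_heights = [1.64, 1.52, 1.52, 1.65, 1.02, 1.63, 1.50, 1.60,
                  1.42, 1.52, 1.48, 1.62, 1.50, 1.54, 1.67]

def value_at_quantile(sorted_data, p):
    """Assumed rule for 0 < p < 1: use position N*p, averaging two
    neighboring values when N*p is a whole number."""
    pos = len(sorted_data) * p
    if pos == int(pos):
        k = int(pos)                       # 1-based position
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[math.ceil(pos) - 1]

data = sorted(female_heights)
q1, median, q3 = (value_at_quantile(data, p) for p in (0.25, 0.50, 0.75))

summary = {
    "Minimum": data[0],
    "Maximum": data[-1],
    "Mean": round(statistics.mean(data), 3),
    "First Quartile": q1,
    "Median": median,
    "Third Quartile": q3,
    "Range": round(data[-1] - data[0], 3),
    "IQR": round(q3 - q1, 3),
    # stdev() divides by N - 1 (sample standard deviation)
    "Standard Deviation": round(statistics.stdev(data), 3),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same code can be applied to the male subgroup or to any other variable by replacing the list of values.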
Possible textual presentation of the data on heights:
Based on Table 9.2, on the average, a student of this class is 1.582 meters tall. The
shortest student is just a little over one meter tall while the tallest is 1.79 meters tall,
resulting in a range of 0.77 meter. The median, which is 1.585 meters, is almost the
same as the mean height.
Comparing the male and female students, on the average, male students are taller
than female students, but the dispersion of the heights of the female students is wider
compared to that of the male students. Thus, the heights of the male students of this
class tend to be closer to one another than those of the female students.
Table 9.3 Summary statistics of the variable weight (in kilograms) using the sample
data.
Descriptive Measure | Computed Value: For the whole class with N = 30 | For the subgroup of Males with N = 15 | For the subgroup of Females with N = 15
Minimum 27.0 27.0 36.0
Maximum 94.0 94.0 63.0
Mean 55.8 63.4 48.2
Median 51.5 60.0 49.0
Range 67.0 67.0 27.0
IQR 15.0 30.0 12.0
Possible textual presentation of the data on weights:
Using the statistics in Table 9.3, on the average, a student of this class weighs 55.8
kilograms. The minimum weight of the students in this class is only 27 kilograms while
the heaviest student of this class weighs 94 kilograms. There is a wide variation among
the values of the weights of the students in this class as measured by the range, which
is equal to 67 kilograms. The median weight for this class is 51.5 kilograms, which is
quite different from the mean, as the value of the latter was pulled upward by the
presence of extreme values.
Comparing the males and female students, on the average male students are heavier
than female students. The extreme values observed for the class are both coming from
male students. The wide variation observed on the students’ weights of this class was
also observed among the weights of the male students. In fact, the standard deviation
of the weights of the male students is more than double the standard deviation of the
weights of female students.
Table 9.4 Summary statistics of the variable BMI (in kg/m2) using the sample data.
Descriptive Measure | Computed Value: For the whole class with N = 30 | For the subgroup of Males with N = 15 | For the subgroup of Females with N = 15
Minimum 9.0 9.0 15.0
Maximum 58.0 34.0 58.0
Mean 22.9 23.6 22.2
Median 21.5 24.0 21.0
Range 49.0 25.0 43.0
IQR 5.0 6.0 5.0
Possible textual presentation of the data on BMIs:
Table 9.4 shows that the minimum BMI of the students in the class is 9 kg/m² while the
maximum is 58 kg/m². On the average, a student of this class has a BMI of 22.9. Also,
the median BMI for this class is 21.5, which is near the value of the mean BMI. The
variability of the values is also not that large, as a small standard deviation of 8.3 was
obtained.
Comparing the males and female students, on the average, the BMI of the male and
female students are near each other with numerical values equal to 23.6 and 22.2,
respectively. But there is a wider variation among the BMI values of the female students
compared to that of the male students. The standard deviation of the BMIs of the male
students is less than that of the female students.
Visual comparison of the data distributions between two or among several groups
could be achieved through box-plots. You may ask the students if they already know
how to construct a box-plot. If so, you may just review the steps with them. Otherwise,
you may briefly discuss the steps in constructing a box-plot as given in the next section
before you ask them to construct box-plots for their respective data sets.
C. Construction of a Box-Plot
Using five summary statistics, namely: minimum, maximum, median, first and third
quartiles, a box-plot can be constructed as follows:
1. Draw a rectangular box (horizontally or vertically) with the first and third quartiles
as the endpoints. Thus the width of the box is given by the IQR which is the
difference between the third and first quartiles.
2. Locate the median inside the box and identify it with a line segment.
3. Compute 1.5 × IQR. Use this value to identify markers, which are used to
identify outliers. The lowest marker is given by Q1 – 1.5 × IQR while the highest
marker is Q3 + 1.5 × IQR. Values outside these markers are said to be outliers and
could be represented by a solid circle.
4. One of the two whiskers of the box-plot is a line segment joining the side of the
box representing Q1 and the minimum while the other whisker is a line segment
joining Q3 and the maximum. This is for the case when the minimum and
maximum are not outliers. In the case that there are outliers, the whiskers will
only be line segments from the side of the box to its corresponding marker.
Also inform the students that a box-plot is also called a box-and-whiskers plot and
that it could easily be generated using statistical software. Comparison of data
distributions could easily be done visually using this kind of plot. Likewise, in
technical papers or reports, a box-plot is an accepted graphical presentation of a
data distribution.
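Step 3 of the construction can be sketched in Python as follows, using the heights of the female students in the sample class data and the quartiles reported for them in Table 9.2. Variable names are illustrative, and the sketch only computes the markers and flags the outliers; it does not draw the plot.

```python
# Heights (in meters) of the 15 female students, sorted
heights = [1.02, 1.42, 1.48, 1.50, 1.50, 1.52, 1.52, 1.52,
           1.54, 1.60, 1.62, 1.63, 1.64, 1.65, 1.67]

q1, q3 = 1.50, 1.63                 # first and third quartiles from Table 9.2
iqr = q3 - q1                       # width of the box

lower_marker = q1 - 1.5 * iqr       # values below this are outliers
upper_marker = q3 + 1.5 * iqr       # values above this are outliers

outliers = [x for x in heights if x < lower_marker or x > upper_marker]
print(round(lower_marker, 3), round(upper_marker, 3), outliers)
```

Only the height of 1.02 meters falls outside the markers, so it would be drawn as a solid circle.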
To complete the activity for this lesson, ask each group to construct box-plots of the
male and female data distributions of their assigned variable. They could further
improve their textual presentation by interpreting the resulting box-plots of their
data sets.
Using the sample class data, the following figures provide the box-plots for the
variables heights, weights and BMI by sex of the student. The said figures confirm
what was stated in the textual presentation.
Figure 9.1 Box-plots of the variable heights of the 30 students by sex.
We could also note that in Figure 9.1, the distribution of heights for the girls has a
larger range because of an outlier as represented by a solid circle given on the plot.
The distribution of the girls’ heights has a smaller median compared to the male
distribution.
Figure 9.2 Box-plots of the variable weights of the 30 students by sex.
For the variable weights, females have a lower median weight than males, as well as
less variability. The middle 50% of the female weight distribution is also observed to be
contained within the range of the male weight data.
Figure 9.3 Box-plots of the variable BMI of the 30 students by sex.
As for the variable BMI, females have a lower median BMI and lower variability
compared to those of males. There is at least one extremely obese female and one
severely underweight male.
With the computed descriptive statistics and corresponding box-plot(s), the analysis or
textual presentation could be further improved by describing data not only in terms of
the measures but also in terms of the interpretation of box plots. Furthermore, these
measures allow us to answer the guide questions provided at the start of the class.
KEY POINTS
• Descriptive measures are important statistics required in simple data analysis.

• Groups of data could be compared in terms of their descriptive measures.
• A box-plot is an approach to compare visually data distributions.
ASSESSMENT
In a university, the grading scale used for a subject is as follows: 1.0; 1.25; 1.5;
1.75; 2.0; 2.25; 2.5; 2.75; 3.0; 4.0; and 5.0. Grades from 1.0 to 3.0 are passing grades,
with 1.0 as the highest possible grade. The grade of 5.0 is failing while 4.0 is a
conditional grade. At the end of the semester, the general weighted average (GWA) of
each student is computed, and students with high GWAs are usually recognized.
Below is a table showing the GWA and sex of thirty students who are to be recognized
in a program for having high GWAs.
Name GWA Sex
Imelda 1.54 F
Frederick 1.45 M
Gerald 1.42 M
Jose 1.52 M
Ana 1.56 F
Isidoro 1.34 M
Roberto 1.36 M
Katherine 1.43 F
Barbara 1.49 F
Josie 1.58 F
Maria 1.64 F
Kenneth 1.56 M
Ofelia 1.56 F
Amparo 1.49 F
James 1.42 M
Ditas 1.24 F
Frenz 1.78 F
Ronald 1.06 M
Ruben 1.33 M
Belle 1.45 F
Elmo 1.38 M
Connie 1.27 F
Gina 1.22 F
Marcia 1.59 F
Jikko 1.60 M
Susan 1.59 F
Emman 1.63 M
Pinky 1.70 F
Rose 1.75 M
Brad 1.58 M
Use the approaches below to compare the academic performance of male and female
students in the previous term.
1. Compute the descriptive measures, which include the measures of location such
as minimum, maximum, mean, median, first and third quartiles; and measures of
dispersion such as the range, interquartile range (IQR) and standard deviation, by sex.
Descriptive Measure | Computed Value: For the subgroup of Males with N = 14 | For the subgroup of Females with N = 16
Minimum 1.06 1.22
Maximum 1.75 1.78
Mean 1.46 1.51
First Quartile 1.36 1.44
Median 1.44 1.55
Third Quartile 1.58 1.59
Range 0.69 0.56
IQR 0.22 0.15
Standard Deviation 0.17 0.16
2. Using the computed descriptive statistics, compare the two distributions in terms of
their measures of location and measures of dispersions. On the average, which
group of students perform better academically in the previous term? Which group
varies more?
(On the average, the GWA of the female students is 1.51 while that of the male
students is 1.46. Since a numerically lower GWA is a better grade, this implies that
the male students in this group performed better academically than the female
students. The medians differ as well (1.44 for males and 1.55 for females) and lead
to the same observation that the males performed better than the females. However,
the variability of the observations for the male students is higher compared to that
of the female students. Hence, we say that the GWAs of the male students vary more
than those of the female students.)
3. Sort the data within each group then determine what proportion in each group is
within one standard deviation of that group's mean. Are the proportions similar?
(Sorted data for the male students:
1.06 1.33 1.34 1.36 1.38 1.42 1.42 1.45 1.52 1.56 1.58 1.60 1.63 1.75
mean ± σ = 1.46 ± 0.17, which gives the interval from 1.29 to 1.63. Note that 12 out
of 14 observations, or 86% of the observations, are within one standard deviation of
the mean.
Sorted data for the female students:
1.22 1.24 1.27 1.43 1.45 1.49 1.49 1.54 1.56 1.56 1.58 1.59 1.59 1.64 1.70 1.78
mean ± σ = 1.51 ± 0.16, which gives the interval from 1.35 to 1.67. Note that 11 out
of 16 observations, or 69% of the observations, are within one standard deviation of
the mean.
The proportions of observations that are within one standard deviation of the mean
for each group are not the same. The proportion for the male group is larger than
that of the female group. This support the observation earlier that the GWAs of the
male students are more varied compared to those of female students.)
4. Construct box plots of the GWAs for the males and females. Compare the two data
distributions of GWAs.
(Visually, the two distributions of GWAs are different. The GWAs of the female
students are less dispersed compared to those of the male students. Numerically, the
median GWA of male students is lower than that of the female students. Hence,
male students in this group performed better academically than their female
counterparts. However, the GWAs of the female students are closer
to each other.)
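The check in item 3 above can be reproduced with a short script. This is only a sketch: the two lists are the sorted GWAs from item 3 (the female values are as read from the flattened table), the helper name within_one_sd is hypothetical, and the mean and sample standard deviation are rounded to two decimals before the interval is formed, as the guide does. With unrounded values the male observation 1.63 would fall just outside the interval.

```python
import statistics

# Sorted GWAs from item 3 (female values as read from the flattened table)
males = [1.06, 1.33, 1.34, 1.36, 1.38, 1.42, 1.42,
         1.45, 1.52, 1.56, 1.58, 1.60, 1.63, 1.75]
females = [1.22, 1.24, 1.27, 1.43, 1.45, 1.49, 1.49, 1.54,
           1.56, 1.56, 1.58, 1.59, 1.59, 1.64, 1.70, 1.78]


def within_one_sd(values):
    """Return (mean, sd, count inside mean +/- sd), using 2-decimal rounding."""
    mean = round(statistics.mean(values), 2)
    sd = round(statistics.stdev(values), 2)  # sample SD (divisor n - 1)
    low, high = round(mean - sd, 2), round(mean + sd, 2)
    count = sum(low <= v <= high for v in values)
    return mean, sd, count


for label, data in (("Males", males), ("Females", females)):
    mean, sd, count = within_one_sd(data)
    share = count / len(data)
    print(f"{label}: mean={mean}, sd={sd}, within one SD: {count}/{len(data)} ({share:.0%})")
```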
CHAPTER 2 : RANDOM VARIABLES AND
PROBABILITY DISTRIBUTIONS
Lesson 1: Probability
OVERVIEW OF LESSON
In this activity, learners initially review some basic concepts in Probability that they may
have learned prior to Grade 11. Then, they are taught additional concepts on conditional
probability. There are also discussions on the classical birthday problem that show
them how to compute the chance or probability of having at least two people in the
classroom share the same birthday.
LEARNING COMPETENCIES
At the end of the lesson, learners should be able to:
• define probability in terms of empirical frequencies

• show how to apply the General Addition Rule, and the Multiplication Rule
• make use of a tree diagram for conditional probabilities
LESSON OUTLINE
A. Introduction / Motivation: What is Probability?

B. Main Lesson: Computing the Probability of an Event
REFERENCES
Many of the materials in this lesson were adapted from:
De Veaux, R. D., Velleman, P. F., and Bock, D. E. (2006). Intro Stats. Pearson Ed. Inc.
Workbooks in Statistics 1, 11th Edition. Institute of Statistics, UP Los Baños, College,
Laguna 4031
Probability and Statistics: Module 18. (2013). Australian Mathematical Sciences Institute
and Education Services Australia. Retrieved from
http://www.amsi.org.au/ESA_Senior_Years/PDF/Probability4a.pdf
A. Introduction / Motivation: What is Probability and How to Assign It?
Begin the session with a discussion on the uncertainty in summaries generated
from data, especially when the data are “random” samples from a larger population of
units (e.g. people, farms, firms).
Examples: (i) approval ratings or the proportion of people voting for a candidate (in
an opinion poll); (ii) average family income (in the Philippine Statistics
Authority’s triennial Family Income and Expenditure Survey); (iii) average prices
of commodities (from sample outlets).
Explain that people can quantify uncertainty through the notion of PROBABILITY
(or Chance). Suggest to learners that if they were asked for the probability that they
would pass the next quiz, they may give a number between 0 percent and 100 percent.
Typically, the chances of a future outcome may be based on past experience
or on data collected. Very studious learners, for instance, may have passed their quizzes 100
percent of the time, while average students may have passed their quizzes 85 percent of
the time.
When considering probabilities of events, learners should be guided to consider a
particular context wherein possible outcomes are well defined and can be specified,
at least in principle, beforehand. This context is called a random process, wherein
we do not know which of the possible outcomes will occur, but we do know what is
on the list of possible outcomes. Inform learners that it can also be
helpful to view the probability of an event as its “long-run” empirical frequency,
or the fraction of times the event occurs under repeated “trials” of the
random process. In the next lesson, we shall call this the “empirical probability”
and mention that in practice, we expect these empirical probabilities to stabilize
toward some “theoretical probability.” (This is called the law of large numbers.)
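The idea of a long-run empirical frequency can be illustrated with a small simulation. The sketch below assumes a fair coin and a fixed seed (both are choices made here for illustration, and the function name running_frequency is hypothetical); it prints the fraction of heads after increasing numbers of tosses so that learners can see the fraction settle near 0.5.

```python
import random


def running_frequency(num_tosses, seed=2016):
    """Simulate fair-coin tosses; return the fraction of heads after each toss."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    fractions = []
    for toss in range(1, num_tosses + 1):
        heads += rng.random() < 0.5  # True counts as 1
        fractions.append(heads / toss)
    return fractions


freqs = running_frequency(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7} tosses: empirical probability of heads = {freqs[n - 1]:.4f}")
```

Changing the seed gives a different path, but the later fractions should still sit close to one half.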
Ask learners to think of random processes and an event where:
a. the outcome is certain. Examples may be getting a head (event) in the next
toss of a two-headed coin (random process) or getting a number of at most 6
(event) when a die is thrown once (random process)
b. the outcome is impossible. Examples may be getting a tail (event) in the next
toss of a two-headed coin (random process) or getting a number greater than 6
(event) when a die is thrown once (random process)
c. the outcome has an even chance of occurring. Examples may be a couple
having a boy (event) as their next child (random process) or getting a red card
(event) when randomly selecting a card from a deck of cards (random process)
d. the outcome has a strong but not a certain chance of occurring. Example
might be getting a sum of at most 11 (event) when a pair of dice is thrown
(random process)
Then, ask them for the probability associated with each of these events. (Answers: 100 percent
for certain events, 0 percent for impossible events, and 50 percent for outcomes
with an even chance of occurring. For the example in (d), there are 36 possible
outcomes when tossing a pair of fair dice, and 35 of them have a sum of at most 11, so
the chance of getting at most 11 is 35/36.) The closer the value of the probability is to
1, the more likely the event will occur, and the closer it is to 0, the less likely it will
occur.
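The count of 35 out of 36 outcomes in example (d) can be verified by listing every outcome for a pair of fair dice. This is a minimal sketch under the equally-likely assumption; the helper name probability is hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for a pair of fair dice
outcomes = list(product(range(1, 7), repeat=2))


def probability(event):
    """Classical probability: favourable outcomes over total outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


p_at_most_11 = probability(lambda dice: sum(dice) <= 11)
p_sum_12 = probability(lambda dice: sum(dice) == 12)
print("P(sum at most 11) =", p_at_most_11)
print("P(sum of 12)      =", p_sum_12)
```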
Important: Point out to learners the following properties of the probability of an
event:
• the probability of an event is a non-negative value. In fact, it ranges from
zero (0) (when the event is impossible) to one (1) (when the event is sure). The
closer the value is to one, the more likely the event will occur.
• the probability of the sure event is one (in other words, the chance of a
sure event is 100 percent).
• if A and B are mutually exclusive events, meaning it is impossible for
these two events to occur at the same time, then P(A or B) = P(A) + P(B).
This is called the Addition Rule.
A more general result (also called the General Addition Rule) states that:
P(A or B) = P(A) + P(B) - P(A and B)
Geometrically, from a Venn Diagram, the area of the union of A and B is the sum of
the areas of A and B, but this sum counts the intersection A ∩ B twice, so we need to
subtract the area of the intersection once from the sum of the areas of A and B.
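As an illustration of the General Addition Rule, the sketch below enumerates a standard 52-card deck and compares both sides of the rule. The choice of events (A = red card, B = face card) is an example picked here, not one taken from the lesson, and the helper name p is hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely cards

A = {card for card in deck if card[1] in ("hearts", "diamonds")}  # red card
B = {card for card in deck if card[0] in ("J", "Q", "K")}         # face card


def p(event):
    """Probability of an event given as a set of cards."""
    return Fraction(len(event), len(deck))


lhs = p(A | B)                 # P(A or B), counted directly
rhs = p(A) + p(B) - p(A & B)   # General Addition Rule
print(lhs, rhs)
```

Because six cards are both red and face cards, simply adding P(A) and P(B) would overstate the answer by P(A and B).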
Illustrate to learners that these properties can help us more readily compute the
probabilities of events.
P(at most a sum of 12 when tossing a pair of fair dice) =
P(at most a sum of 11 OR a sum of 12) = P(at most a sum of 11) + P(sum of 12)
But P(at most a sum of 12) = 1 and P(sum of 12) = 1/36.
Thus, solving for the value of P(at most a sum of 11),
P(at most a sum of 11) = 1 – 1/36 = 35/36
In general, if we are interested in Ac, the complement of an event A (i.e. the event
that happens when A does not), since
P(A or Ac) = P(A) + P(Ac) and P(A or Ac) = P(Sure event) = 100%
Thus,
P(A) + P(Ac) = 1 or equivalently
P(Ac) = 1 – P(A)
In consequence, the chance that an event does not occur is one (1) minus the
chance it does occur.
In terms of a Venn Diagram below, given a Sure Event S (represented by a square

with area 100%), and an event A (represented by the triangle whose area represents
the probability of A), then the chance of an event A not happening is one minus the
chance of event A happening (i.e. area of the square minus the area of the triangle).
Extra Notes: Mention also to learners that:
(1) Historically, probability was studied by gamblers who wanted to increase their
winnings (or at least decrease their losses).
(2) Probability describes random behavior, but does anything really happen at
random? Even Albert Einstein, when confronted by theories of quantum mechanics,
was said to have pointed out that “God does not play dice.” Yet, many events,
especially in nature “seem” to display random behavior. In many real life situations,
we will be able to model these by random processes and thus, apply probability to
understand the behavior of these situations.
B. Main Lesson: Computing Probabilities of Events
Mention to learners that the calculation of the probability of an event may

sometimes be considered directly from the nature of the phenomenon/random
process, with some assumptions of symmetry. Some underlying outcomes may be
“equally likely” by assumption such as fair coins and fair dice. In practice, these
assumptions need to be tested and will be the subject of inquiry in future lessons.
These assumptions are simplifications to help us calculate probabilities.
Example 1: Tell learners that a box contains green and blue chips. A chip is then
drawn from the box. If it is green, you win Php100. If it is blue, you win nothing.
• Learners have a choice between two boxes:
– Box A with 3 blue chips and 2 green chips
– Box B with 30 blue chips and 20 green chips
• Which would learners prefer?
Some learners may say B, but tell learners that it actually should not matter,
because the chance of winning Php100 is 2/5 = 40% in Box A, while in Box B, the
chance of winning is 20/50 = 40%. The probabilities are the same.
Conditional Probability
Mention to learners that sometimes, we may have extra information that can
change the probability of an event. Give the following definition of conditional
probability.
The conditional probability of event A given that B has occurred is denoted
as P(A|B) and defined as
P(A|B) = P(A and B) / P(B), provided that P(B) > 0.
Example 2: Suppose that we want to randomly select a student from among Grades
9 to 12 in a certain school.
         Sex
Grade    Male    Female    Total
9        84      145       229
10       40      82        122
11       36      52        88
12       25      36        61
Total    185     315       500
The chance of selecting a Grade 11 student, given that the student is male, can be
computed as follows:
Define events A and B as:
A = event that student selected is a Grade 11 student
B = event that student selected is male
Then
P(A|B) = P(A and B) / P(B) = (36/500) / (185/500) = 36/185, or about 19.5%.
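The conditional probability in Example 2 can be computed directly from the counts in the table. The sketch below is one possible way to organize the calculation; the dictionary layout and variable names are assumptions made for illustration.

```python
from fractions import Fraction

# counts[grade] = (male, female), copied from the table in Example 2
counts = {9: (84, 145), 10: (40, 82), 11: (36, 52), 12: (25, 36)}

total = sum(male + female for male, female in counts.values())
total_males = sum(male for male, _ in counts.values())

p_b = Fraction(total_males, total)          # P(B): selected student is male
p_a_and_b = Fraction(counts[11][0], total)  # P(A and B): male Grade 11 student
p_a_given_b = p_a_and_b / p_b               # definition of conditional probability

print("P(A|B) =", p_a_given_b, "or about", round(float(p_a_given_b), 4))
```

Note that the result equals 36/185: once we know the student is male, only the 185 male students matter.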
Example 3: A king comes from a family of two children. What is the chance that the
king has a sister?
Remind learners here that as the king comes from a family of two children, we are
given extra information that this family of two children has a boy, the king.
What we want to compute here is the probability that the sibling of the king is a
girl.
Let B be the event of having at least one boy. So B = {(b,b),(b,g),(g,b)}, where (x,y)
gives the sex of each child and the possible values are b for boy and g for girl.
Let A be the event that the king's sibling is a girl, A = {(b,g),(g,b)}.
The original sample space S of all possible outcomes is
S = {(b,b),(g,b),(b,g),(g,g)}, and each outcome has a ¼ chance of occurring.
Hence, P(A|B) = P(A and B) / P(B) = (2/4) / (3/4) = 2/3.
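The answer to Example 3 can be checked by enumerating the four equally likely two-child families. This is a small sketch; the set names follow the events S, A, and B defined in the example, and the helper p is hypothetical.

```python
from fractions import Fraction
from itertools import product

S = set(product("bg", repeat=2))            # {(b,b), (b,g), (g,b), (g,g)}
B = {family for family in S if "b" in family}              # at least one boy
A = {family for family in S if set(family) == {"b", "g"}}  # king has a sister


def p(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))


p_sister_given_boy = p(A & B) / p(B)
print("P(A|B) =", p_sister_given_boy)
```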
Independent Events
Sometimes, the extra information provided may not really change the probability of
an event. In this case, the events are said to be independent: the conditional
probability of A given B is equal to the (unconditional) probability of event
A.
Two events A and B are said to be independent if
P (A and B) = P (A) P (B)
This is also called the Multiplication Rule. Intuitively, we call events such as the outcomes of
tossing a coin (or a die) several times independent since future tosses are not
affected by previous outcomes.
If however, the events are not independent then we can still obtain the probability
that both events A and B will occur using the definition of conditional probability:
P (A and B) = P (A) P (B | A)
Example 4: Tell learners to suppose that there is a box that contains three tickets
marked 1, 2, and 3. We shake the box, draw out one ticket at random; shake the
box and draw out a second ticket. What would be the probability of getting a sum
of “three” if tickets were drawn with replacement? Without replacement?
The possible sums for the two tickets drawn with replacement are shown in a
contingency table and tree diagram below
In consequence the probability of getting a sum of three is:
P (sum of three) = 2/9
While if the tickets were drawn without replacement, we have
P (sum of three) = 2/6 = 1/3
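The two answers in Example 4 can be confirmed by listing the ordered pairs of tickets for each sampling scheme. The sketch below uses itertools for the listing; the function name p_sum_three is hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

tickets = (1, 2, 3)

with_replacement = list(product(tickets, repeat=2))    # 9 ordered pairs
without_replacement = list(permutations(tickets, 2))   # 6 ordered pairs


def p_sum_three(pairs):
    """Fraction of equally likely ordered pairs whose sum is three."""
    hits = sum(1 for first, second in pairs if first + second == 3)
    return Fraction(hits, len(pairs))


print("with replacement:   ", p_sum_three(with_replacement))
print("without replacement:", p_sum_three(without_replacement))
```

Drawing without replacement removes the pairs (1,1), (2,2), and (3,3), none of which sums to three, so the same two favourable pairs are divided by a smaller total.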
Exercise 5: (The Birthday Problem, originally posed by Richard von Mises in 1939,
reprinted in English in 1964) Mention to learners that in a room with at least
23 people, there is more than half a chance that at least two of them will have
the same birthday, and if there are more people, the chances increase further
toward 100% (about 99.9% with 70 people). Try it out with learners in your class.
Tell learners to identify how many of them have a birthday in January. Try to see if
you can get a match. Go to February if you don’t find anyone who matches. Then
March, and so forth.
The chance of 2 people having different birthdays is:
1 − 1/365 = 364/365 = 0.997260
The chance of N people having different birthdays is:
(1 − 1/365)(1 − 2/365) ⋯ (1 − (N − 1)/365)
So the chance that at least two of them will have the same birthday is:
p(N) = 1 − (1 − 1/365)(1 − 2/365) ⋯ (1 − (N − 1)/365)
We have the probabilities computed below for several values of N.
N 10 20 23 30 50 57
p(N) 11.7% 41.1% 50.7% 70.6% 97.0% 99.0%
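The values of p(N) in the table can be reproduced with a short loop over the product of the "all different" factors. This is a sketch assuming 365 equally likely birthdays (no leap years); the function name p_shared_birthday is hypothetical.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming equally likely days."""
    p_all_different = 1.0
    for k in range(1, n):
        p_all_different *= 1 - k / days  # factor (1 - k/365) for the next person
    return 1 - p_all_different


for n in (10, 20, 23, 30, 50, 57):
    print(f"N = {n:>2}: p(N) = {p_shared_birthday(n):.1%}")
```

Learners can extend the loop to other class sizes, for example the actual number of people in the room.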
KEY POINTS
• Probability is a numerical representation of the likelihood of occurrence of an
event. Its value is between zero (0) and one (1). A value close to 1
means the event is very likely to occur, while a value close to zero (0) means it is
not likely to occur.
• When A and B are mutually exclusive events, then the probability of A or B is
P(A or B) = P(A) + P(B) (this is called the Addition Rule)
• If A and B are independent events, then the probability of A and B is
P(A and B) = P(A) P(B) (this is called the Multiplication Rule)
ASSESSMENT
1. What would be the probability of

a. picking a black card at random from a standard deck of 52 cards?
b. picking a face card (i.e. a king, queen, or jack)?
c. not picking a face card?
Answer:
a. P(Black) = 26/52= ½ ;
b. P(Face)= 12/52 = 3/13 ;
c. P(not Face) = 1 – (3 /13) =10/13
2. What is the probability of rolling, on a fair die:
a. a 3?
b. an even number?
c. zero?
d. a number greater than 4?
e. a number lying between 0 and 7?
f. a multiple of 3, given that an even number was rolled?
Answer:
a. P(‘3’) = 1/6 ;
b. P(Even)= 3/6 = 1/2 ;
c. P(‘0’) = 0 ;
d. P(greater than 4) = P(5 or 6) = 2/6 = 1/3 ;
e. P(between 0 and 7) = P(1 or 2 or 3… or 6) = 6/6 = 1
f. P(multiple of 3 given even number) = P(multiple of 3 and even) / P(even)
= P(‘6’)/P(2 or 4 or 6) = (1/6) / (3/6 ) = 1/3
3. A standard deck of playing cards is well shuffled and from it, you are given two
cards. You can have 0, 1, or 2 aces: three possibilities altogether. So the probability
that you have two aces is equal to 1/3. What is flawed about this argument?
Answer:
The outcomes are not equally likely. There are (52)(51)/2=1326 ways of selecting
the first two cards. These are the equally likely outcomes. Of these ways, there
would be (4)(3)/2=6 ways of selecting two aces; (48)(47)/2=1128 ways of
selecting no aces, and 1326-6-1128=192 ways of selecting one ace. So, the
chance of getting two aces is 6/1326 and not 1/3.
4. You shuffle a deck of playing cards, and then start turning the cards over one at a time.
The first one is black. The second one is also a black card. So is the third, and this
happens up to the 10th card. You start thinking, “the next one will likely be red!” Are
you correct in this reasoning?
Answer:
Yes, there are 42 cards left, 26 red and only 16 black. However, likely does not
mean certain. There is still a 16/42 chance that the next card is black.
5. The family of Tony delivers newspapers, one to each house in their village.
Newspaper                    Number of houses
Philippine Star              250
Philippine Daily Inquirer    300
Manila Bulletin              150
Manila Times                 140
Manila Standard Today        100
Daily Tribune                60
What is the probability that a house picked at random has:

a. the Manila Times?
b. the Manila Standard Today or the Philippine Daily Inquirer?
c. a newspaper other than Daily Tribune?
Answer:
a. P(Manila Times) = 140/1000 =7/50;
b. P(Manila Standard Today or PDI)= (100 + 300)/1000 = 2/5 ;
c. P(other than Daily Tribune) = 1 – P(Daily Tribune) = 1 – (60/1000)= 940/1000
= 47/50
6. A class is going to play three games. In each game, some cards are put into a bag.
Each card has a square or a circle on it. One card will be taken out, then put back. If it
is a circle, the boys will get a point. If it is a square, the girls will get a point.
a. Which game are the girls least likely to win? Why?
b. Which game are the boys most likely to win? Why?
c. Which game are the girls certain to win?
d. Which game is impossible for the boys to win?
e. In which game is it equally likely that the boys or the girls win?
f. Are any of the games unfair? Why?
Answer:
a. Game 3. For the girls, the chances of winning games 1, 2, and 3 are,
respectively, 4/8 = 50%, 8/8 = 100%, and 4/12 = 33.3%.
b. Game 3. For the boys, the chances of winning games 1, 2, and 3 are,
respectively, 4/8 = 50%, 0/8 = 0%, and 8/12 = 66.7%.
c. game 2, chance is 100%
d. game 2, chance is 0%
e. game 1.
f. games 2 and 3.
7. In a computer ‘minefield’ game, ‘mines’ are hidden on grids. When you land
randomly on a square with a mine, you are out of the game.
a. The circles indicate where the mines are hidden on three different grids. On
which of the three grids is it hardest to survive?
b. Grid 1 above is a 3 by 6 grid with 6 mines. On which of the following grids is
it hardest to survive?
X. 99 mines on a 30 by 16 grid
Y. 40 mines on a 16 by 16 grid
Z. 10 mines on an 8 by 8 grid
Explain your reasoning.
Answer:
a. P(hit a mine) in grids 1, 2, and 3, respectively, is 6/18 = 33.3%, 8/25 = 32%, and 7/20 = 35%. Thus,
it is hardest to survive in grid 3.
b. P(hit a mine) in grids X, Y, and Z, respectively, is 99/(30x16) = 0.20625, 40/(16x16) = 0.15625, and
10/(8x8) = 0.15625. So, it is hardest to survive in grid X.
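The counting argument in the answer to assessment item 3 above (two cards dealt from a 52-card deck) can be checked with binomial coefficients. This is a sketch using math.comb from the Python standard library; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 2)    # ways to choose any 2 of 52 cards
two_aces = comb(4, 2)        # both cards are aces
no_aces = comb(48, 2)        # both cards are non-aces
one_ace = 4 * 48             # one ace and one non-ace

# The three cases cover every hand but are far from equally likely
print("two aces:", Fraction(two_aces, total_hands))
print("one ace: ", Fraction(one_ace, total_hands))
print("no aces: ", Fraction(no_aces, total_hands))
```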
Lesson 2: Geometric Probability
OVERVIEW OF LESSON
In this activity, learners initially review concepts in Probability, and discuss examples of
theoretical probability, geometric probability, and empirical probability. Then, they are
given a coin-tossing exercise to calculate the empirical probability of having a coin fall
on a particular square in a grid (to solve Buffon’s Coin problem). Learners are led to
discover that empirical probabilities, with more tosses, tend toward the
geometric/theoretical probability.
LEARNING COMPETENCIES
At the end of the lesson, learners should be able to:
• define and distinguish “geometric probability,” “empirical probability,” and
“theoretical probability”
• use simulation to identify an empirical solution to Buffon’s coin problem
• employ area formulas to identify a theoretical solution to Buffon’s coin problem
• observe that as the number of trials increases, the empirical probability tends to
approach the theoretical probability.
LESSON OUTLINE
A. Introduction: Recall How to Calculate Probability for Certain Random Processes
B. Main Lesson: Empirical and Theoretical Probability
C. Investigation on Empirical Probability: Buffon’s Coin Problem
D. Investigation on Theoretical Probability: Buffon’s Coin Problem
REFERENCES
Schneiter, K. Exploring Geometric Probabilities with Buffon’s Coin Problem. Utah State University in
Statistics Education Web (STEW) Online Journal of K-12 Statistics Lesson Plans. Retrieved from
http://www.amstat.org/education/stew/pdfs/EGPBCP.pdf
Nelia Marquez). Philippines: Rex Bookstore.
Workbooks in Statistics 1: 11th Edition. Institute of Statistics, UP Los Baños, College, Laguna 4031
Probability and Statistics: Module 18. (2013). Australian Mathematical Sciences Institute and Education
Services Australia. Retrieved from http://www.amsi.org.au/ESA_Senior_Years/PDF/Probability4a.pdf
Geometric Probability Examples. North Carolina School of Science and Mathematics. Retrieved from
http://www.dlt.ncssm.edu/stem/sites/default/files/GeometricProbabilityexamples.pdf
Geometric Probability Solutions. North Carolina School of Science and Mathematics. Retrieved from
http://www.dlt.ncssm.edu/stem/sites/default/files/GeometricProbabilitysolutions.pdf
MATERIALS REQUIRED
• Coins and a square grid. The diameter of the coin should be less than the
length of a side of a square on the grid; possibilities are plastic lids on floor tiles, or
coins on graph paper. Note that a blank grid for coins is provided on the last
page of this lesson.
• Pencil and paper for record keeping and note taking
• Calculator
A. Introduction / Motivation: Recall How to Assign Probabilities to Events
Begin the session with a recall of the notion of the PROBABILITY (or Chance) of
events, in the context of random processes where possible outcomes can be
determined beforehand, but not whether an outcome will occur.
Mention to students that probabilities of events may be assigned:
(a) theoretically, by assuming an understanding of the situation, such as
symmetry or equally likely outcomes (e.g. a fair coin or fair die being tossed so that
outcomes are equally likely), or when the events are related to areas of geometric
objects (this is called “geometric probability”)
(b) subjectively, with a personal assessment of the situation (e.g. when a student
tells his friend that he has a 50 percent chance of passing the quiz, or that the
probability that a student can swim around the world in 24 hours is zero (0))
(c) empirically, by collecting data from repeated trials or experiences and
getting the proportion of times an event occurs (e.g. observing 10 patients,
noticing that 6 of them responded to a medicine within one hour of the
treatment, and thus stating that the probability of response within an hour of
receiving the treatment is 60 percent)
Inform students that a few hundred years ago, people enjoyed betting on coins
tossed onto the floor ... Would they cross the line of the grid or not? Georges-Louis
Leclerc, Comte de Buffon (1707 – 1788), a French mathematician, started thinking
about this problem more systematically, expressing it as follows:
“What is the probability that a coin, tossed randomly at a grid, will land entirely
within a tile rather than beyond the tile boundaries?” (For the purposes of this
lesson, we will assume that the diameter of the coin is less than the length of a
side of the tile.)
The ‘Buffon coin problem’ is an exercise in geometric probability, where
probabilities are viewed as the proportions of areas (lengths or volumes) of
geometric objects under specified conditions. Examples of questions that deal with
geometric probabilities are:
• What is the probability of hitting the bull’s eye when a dart is thrown randomly
at a target, given the target has a diameter of 24 cm, and the bull’s eye has a
diameter of 10 cm?
• What is the probability that a four-colored spinner, with a diameter of 20 cm,
will land on red?
Geometric probabilities can be estimated using empirical approaches, or identified
exactly using analytical methods (theoretical probability).
An empirical probability is the proportion of times that an event of interest occurs
in a set number of repetitions of an experiment.
Example: Throw 100 darts at the target. 15 darts hit the bull’s eye.
The empirical probability of hitting the bull’s eye is 15/100 = 3/20.
Spin the spinner 50 times. Spinner lands on red 12 times.
Empirical probability = 12/50 = 6/25.
A theoretical probability is the proportion of times an event of interest is expected
to occur in an infinite number of repetitions of an experiment. For a geometric
probability, this is the ratio of the area of interest (e.g. bull’s eye) to the total area (e.g.
target).
Area of bull’s eye = π(5)² = 25π cm²
Area of target = π(12)² = 144π cm²
Theoretical probability of hitting bull’s eye = 25π / 144π = 25/144 ≈ 0.174
Area of red section = ¼ × π(10)² = 25π cm² (assuming the four colored sections are equal in size)
Area of spinner = π(10)² = 100π cm²
Theoretical probability of landing on red = 25π / 100π = 1/4
Ask learners how they can identify an empirical solution to Buffon’s coin problem.
(This corresponds to Item 1 on the Activity Sheet.)
C. Investigation on Empirical Probability: Buffon’s Coin Problem
I. Problem Formulation: Buffon’s coin
What is the probability that a coin, tossed randomly at a grid, will land entirely
within the tile rather than beyond the tile boundaries? (Recall that in this activity, we
assume that the diameter of the coin is less than the length of a side of the tile.)
II. Design and Implement a Plan to Collect the Data
• Discuss this as a class: How would you identify an empirical solution to

Buffon’s coin problem? (See Item 1 on Activity Sheet.)
After learners propose tossing coins at a grid, discuss details of the
experiment. Divide the class into groups of five students. How many times
will each group throw the coin? How will the coin be tossed? Will they count
the times the coin lands on a boundary or the times it lands entirely within
the tile? Will each group do this the same way? What difference will it make if
they do not? (For purposes of later discussion, it will be helpful if everyone
considers the event that the coin lands entirely within a tile.) Who will record
the outcome of each toss? How will this count translate into an empirical
probability?
• Experiment: Instruct each group to conduct the experiment, as designed by

the class. (See Item 2 of Activity Sheet.)
III. Analyze the Data
Instruct each group to use the data they gathered to compute the empirical
probability of the event they considered.
IV. Interpret the Results
• Discuss as a class:
o Summarize the empirical probabilities generated by the groups on the
blackboard. Ask learners what they observe about the empirical
probabilities computed by the groups. (They are not all the same,
many may be similar, a few may differ by a lot, if the experiment were
repeated different answers would be obtained).
o Is it possible to get a more stable answer? (Yes, repeat the experiment
more times, combine data from different groups).
o Ask students what they would expect to see if the coin could be
tossed an infinite number of times. Why would they expect to see
this? (See Item 3 of Activity Sheet.)
D. Investigation on Geometric Probability
I. Problem Formulation: Buffon’s coin
Recall Buffon’s coin problem: What is the probability that the coin, tossed
randomly at a grid, will land entirely within a tile rather than beyond the tile
boundaries?
II. Solution to the Problem
• Discuss this as a class: How will they identify a theoretical solution to Buffon’s
coin problem? (See Item 4 of Activity Sheet.)
Outline the process here: Identify the shape of the region within the tile
where the coin must land to be entirely within the tile. Look at the ratio of
the area of that shape to the area of a tile. Learners must work out the details
with group mates in the next segment.
Answer: The Probability of a Crack Crossing
Our main interest is in the event C that the coin crosses a crack, that is, a
boundary between tiles. However, it turns out to be easier to describe the
complementary event Cc that the coin does not cross a crack.
If the tile has sides of unit length and the coin has radius r < ½, then the coin
lies entirely within the tile exactly when its center falls in the inner square of
side 1 - 2r, so
P(Cc) = (1 - 2r)²
Thus, P(C) = 1 - (1 - 2r)²
• Explore: Again working in groups, ask students:

o To formulate a conjecture about the relationship between theoretical
and empirical probabilities. (See Item 5 of Activity Sheet.)
o To identify the shape of the region within the tile in which the coin
must land to be entirely within a tile. (This will be challenging for some
learners. The key is to consider where the center of the coin lands and
how close the center can be to the edge of a tile while the coin is not
on the boundary.)
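The relationship between the empirical and theoretical answers to Buffon's coin problem can also be explored by computer simulation. The sketch below assumes a tile with sides of unit length, a coin of radius r = 0.25 (an arbitrary choice made here), and a coin center that lands uniformly at random within a tile; the function name simulate_buffon_coin is hypothetical. It compares the simulated proportion with the formula P(Cc) = (1 - 2r)² given in the answer above.

```python
import random


def simulate_buffon_coin(radius, tosses=200_000, seed=1777):
    """Estimate P(coin lands entirely within a unit tile) by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    inside = 0
    for _ in range(tosses):
        # Position of the coin's center within the tile it falls on
        x, y = rng.random(), rng.random()
        # The coin is inside the tile when the center is at least one
        # radius away from every edge of the tile
        if radius <= x <= 1 - radius and radius <= y <= 1 - radius:
            inside += 1
    return inside / tosses


r = 0.25
empirical = simulate_buffon_coin(r)
theoretical = (1 - 2 * r) ** 2
print(f"empirical   P(Cc) = {empirical:.4f}")
print(f"theoretical P(Cc) = {theoretical:.4f}")
```

Running the function with a small number of tosses, such as 20 or 50, mimics what each group does in class and shows how much the empirical answer can vary.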
III. Analyze the Data
Instruct each group to use observations from the experiment and the group
discussion to compute the theoretical probability that a coin tossed
randomly at a grid lands entirely within a single tile. (See Item 6 of
Activity Sheet.)
IV. Interpret the Results
• Discuss as a class: Summarize learners’ answers on the board. Discuss

observations about these solutions. Bring the class to a consensus on the
solution. Observe that there is only one solution and it will not vary with
further investigation.
• Synthesis
What seems to be the relationship between empirical and theoretical
probabilities? (See Item 7 of Activity Sheet.)
KEY POINTS
• Empirical probabilities (obtained from observing the proportion of times an event

occurs in repeated trials) may differ, but the long run frequency of empirical
probabilities will stabilize toward the theoretical probability. (As the number of
trials increases, the empirical probability tends to converge to the theoretical
one).
• For some situations, we can calculate the theoretical probabilities as geometric
probabilities, when events pertain to areas of geometric objects.
• Sometimes, we associate probabilities subjectively, according to personal
assessment of the likelihood of an event to occur.
ACTIVITY SHEET 2-02
Definitions:
1. Empirical probability: the proportion of times an event of interest occurs in a set

number of repetitions of an experiment.
2. Theoretical probability: the proportion of times an event of interest would be

expected to occur in an infinite number of repetitions of an experiment.
3. Geometric probability: a probability concerned with proportions of areas (lengths

or volumes) of geometric objects under specified conditions.
4. Subjective probability: a probability derived from an individual's personal

assessment of the situation on whether a specific outcome is likely to occur
Investigation:
Consider the question: What is the probability that a coin, tossed randomly at a grid,
will land entirely within a tile rather than beyond the tile boundaries?
1. How would you be able to determine an empirical probability that “a coin, thrown
randomly at a grid, will land entirely within a tile of a grid rather than beyond the tile
boundaries?”
2. Work with your group to compute for the empirical probability that the coin will
land within the tile of a grid. Record your observations below:
3. What would you expect to observe if the coin were to be tossed an infinite number
of times at the grid? Why would you expect to see this?
4. How would you compute the theoretical probability that “a coin, thrown randomly at
a grid, will land entirely within a tile rather than beyond the tile boundaries?” How is
this question different from question 1?
5. What is the relationship between empirical and theoretical probabilities?
6. Work with your group to compute the theoretical probability that the coin will land
within the tile. Record your work below.
7. Compare the empirical and theoretical probabilities you found. How do your results
relate to the conjecture you proposed in item 5?
EXAMPLE OF A GRID
[Blank square grid for the coin-tossing activity]
ASSESSMENT 2-02
1. If a circle with diameter 20 cm is placed inside a square with sides of length 20 cm,
what is the chance that a dart thrown at random at the square will land inside the circle?
ANSWER: Area of Circle / Area of Square = (π × 10²) / 20² = 314 / 400 =
0.785 (using π ≈ 3.14)
2. Suppose two numbers, x and y, are generated at random, where 0 < x < 5
and 0 < y < 10. What is the probability that the sum is less than or equal to 2?
ANSWER: Area of Triangle / Area of Rectangle = (½ × 2 × 2) / (5 × 10) = 2
/ 50 = 0.04
3. A parachutist jumps from an airplane and lands on a square field that is one (1)
kilometer on each side. In each corner of the field there is a large tree. The
parachutist’s ropes will get tangled in the tree if she/he lands within 1/10 kilometer of
its trunk. What is the probability that she/he will land in the field without getting caught
in a tree?
Answer: To avoid getting caught in a tree, the parachutist must land in the region
shaded below:
Probability of not getting caught = (1 − 4 × ¼ × π × (0.1)²) / 1 = 1 − 0.01π =
0.968584
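The answer to item 3 can be compared with an empirical estimate. The sketch below assumes the landing point is uniformly distributed over the 1 km by 1 km field and that the trees stand exactly at the four corners; the function name estimate_safe_landing and the number of trials are choices made here for illustration.

```python
import math
import random


def estimate_safe_landing(trials=200_000, reach=0.1, seed=42):
    """Estimate P(landing at least `reach` km away from every corner tree)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    corners = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
    safe = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()  # random landing point in the field
        if all(math.hypot(x - cx, y - cy) >= reach for cx, cy in corners):
            safe += 1
    return safe / trials


empirical = estimate_safe_landing()
theoretical = 1 - math.pi * 0.1 ** 2  # field area minus four quarter circles
print(f"empirical   = {empirical:.4f}")
print(f"theoretical = {theoretical:.6f}")
```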
Lesson 3: Random Variables
OVERVIEW OF LESSON
In this lesson, the concept of a random variable is discussed. The notion of a statistical
experiment is defined as well as random variables that relate to experiments. Finally,
two types of random variables, discrete and continuous, are described.
LEARNING COMPETENCIES
At the end of the lesson, learners should be able to:
• illustrate/provide examples of random variables
• distinguish between discrete and continuous random variables
• find the possible values of a random variable
PRE-REQUISITE LESSONS
Types of Data (in particular, classifications of numerical variables) and Probability
LESSON OUTLINE
A. Introduction/Motivation: The coin toss and breath-holding activities
B. Main Lesson:
I. Introduce the concepts of a statistical experiment and a random variable
II. Distinguish between discrete and continuous random variables and give
examples of random variables
C. Group Discussion
D. Enrichment
REFERENCES
Workbooks in Statistics 1: 11th Edition, Institute of Statistics, UP Los Baños, College
Laguna 4031
Random Variables. Khan Academy. Retrieved from
https://www.khanacademy.org/math/probability/random-variables-topic/random_variables_prob_dist/v/random-variables
KEY CONCEPTS
Statistical Experiment, Outcomes, Random Variables, Discrete Random Variables,

Continuous Random Variables
MATERIALS NEEDED
• one-peso coin per student

• stop timer per group
A. Introduction/Motivation: The coin toss and breath-holding activities
Look at the current one-peso coin in circulation. It has Jose Rizal on one side, which
we will call Head (H), and the other side Tail (T). Ask learners to toss the one-peso
coin three times and record the results of the three tosses on their Activity Sheets.
Use H for heads and T for tails. If needed, first define the head side and
the tail side of the coin. For example, if a learner tosses heads, tails, heads, then the
learner should write HTH. Ask them to count the number of
heads that appeared and write it on their Activity Sheets as well.
Next, have all the learners hold their breath and record the time. This works best
if they time it as accurately as possible (if possible, use a cell phone timer and
record to the nearest hundredth of a second). If there is a limited number of
timers, do it one at a time, with one learner holding the timer while the other
holds his/her breath. Ask students to record the time on their Activity Sheets.
Then, record all the possible answers on the board for both activities. For the first
activity, write all eight possible outcomes, and then list down which one had zero
(0), 1, 2, or 3 heads. If you have time, tally the results. You can do this systematically
so that you do not get confused later on. Start with the outcomes with zero (0)
heads, then progress from there.
TTT
TTH
THT
HTT
THH
HTH
HHT
HHH
The breath-holding activity is a little more challenging. Expect to have a lot of
possible values. Just write about 10. Then, tell the learners that they can raise
their hands if they have different values. Notice that the first activity had only four
possible values for the number of heads, while the value in the second is almost
unique to each individual. It may help to get the lowest value and the highest value
recorded by students.
Emphasize the difference in the number of possible values in these two activities as
this is important in the discussion.
B. Main Lesson
I. Experiments and Random Variables
Begin the discussion with the definition of a Statistical Experiment: an activity
that will produce outcomes, or a process that will generate data. The outcomes
have a corresponding chance of occurrence. Examples are (a) tossing three
coins and counting the number of heads, (b) recording the time a person can hold
his/her breath, (c) counting the number of students in the classroom who are
present today, (d) obtaining the height of a student, etc.
Say that the two activities are examples of statistical experiments. Come up with
several examples, such as recording the results of an examination, asking the
weekly baon (or allowance) of students, identifying the waistline of students.
Emphasize that Statistical Experiments can have a few or a lot of possible
outcomes. In the coin toss example, there are eight possible outcomes. In the
breath-holding example, there can be a lot of possible outcomes. However, learners can
indicate that the possible values lie in a range, such as 10 seconds to 60 seconds (ask for
the shortest and the longest times in class and use those as the limits for this
example).
Suppose you give a learner candy based on the number of heads that appear in the
coin toss experiment (Remember: Giving a candy is optional). List down the
possible number of candies that can be given. Notice that it should only be zero (0),
1, 2, or 3. Then, you can list down all the outcomes of the experiment under each
value:
Number of candies Outcomes

0 TTT
1 TTH, THT, HTT
2 THH, HTH, HHT
3 HHH
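The table above can be reproduced by listing every outcome of three tosses and applying the candy rule to each one. The following Python sketch is an illustration only and is not part of the original guide. The function name `candies_for` is a hypothetical label for the rule "one candy per head."

```python
from itertools import product

# All eight outcomes of tossing a coin three times, written like "HTH".
outcomes = ["".join(tosses) for tosses in product("TH", repeat=3)]


def candies_for(outcome):
    """Assign a number to an outcome: one candy for each head."""
    return outcome.count("H")


# Group the outcomes under each number of candies, as in the table.
by_candies = {}
for outcome in outcomes:
    by_candies.setdefault(candies_for(outcome), []).append(outcome)

for candies in sorted(by_candies):
    print(candies, ", ".join(by_candies[candies]))
```

Each outcome receives exactly one number, while several outcomes may share the same number.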
Next, define a Random Variable: It is a way to map the outcomes of a statistical
experiment, which are determined by chance, into numbers. It is typically denoted by a capital
letter, usually X.
X: outcome → number
A random variable is actually neither random nor a variable in the traditional sense
in which a variable is defined in an algebra class (where we solve for the value of a
variable). It is technically a function from the sample space (the set of all possible
outcomes) to the set ℝ of real numbers.
Tell students that a random variable must take exactly one value for each random
outcome. Generally, as with functions, a number of possible outcomes may have
the same value of the random variable, and in practice, this occurs frequently. For
instance, three outcomes above for tossing a coin thrice would have 1 candy, and
three outcomes would have 2 candies.
Learners need to understand that random variables are conceptually different from
the mathematical variables that they have met before in math classes. A random
variable is linked to observations in the real world, where uncertainty is involved.
Learners should be told that random variables are central to the use of probability
in practice. They help model random phenomena and are
relevant to a wide range of human activities and disciplines, including agriculture,
biology, ecology, economics, medicine, meteorology, physics, psychology,
computer science, engineering, and others. They are used to model outcomes of
random processes that cannot be predicted deterministically in advance (although the
range of possible numerical outcomes can be described).
In the coin example, we can define the random variable X to be the number of
heads that appears from tossing a coin three times. While we do not know what the
resulting specific outcome is, we know the possible values of X in this case are zero
(0), 1, 2, or 3. You can also define another random variable Y to be the time a
person can hold his/her breath. This variable can take any one of
a great many possible values.
In the second example, the possible values range between the lowest and the
highest value recorded by students. Notice that it is really difficult to list down all
the possible values. That is why in this example, it is better to state the possible
values as an interval, such as
$10 \le Y \le 60$, if the lowest and highest values are 10 and 60, respectively.
II. Types of Random Variables:

Distinguish the two types of random variables, viz., discrete and continuous.
(a) Discrete Random Variables are random variables that can take on a finite (or
countably infinite) number of distinct values. Examples are the number of heads
obtained when tossing a coin thrice, the number of siblings a person has, the
number of students present in a classroom at a given time, the number of crushes a
person has at a particular time, etc.
Categorical variables can be considered discrete variables. Example: whether a

person has normal BMI or not, you can assign one (1) as the value for normal BMI
and zero (0) for not normal BMI. You can also put numbers to represent certain
categorical variables with more than two categories. You can also use ordinal
variables, like how much they like adobo on a scale of 1 to 10 (where 1 means
favorable and 10 unfavorable).
(b) Continuous Random Variables, on the other hand, are random variables
that take an uncountably infinite number of possible values, typically measurable
quantities. Examples are the time a person can hold his/her breath, the height or
weight or BMI of a person (if measured very accurately), and the time a person takes
to bathe. The values that a continuous random variable can have lie on a
continuum, such as an interval.
Extra Notes:
• You can modify the experiment to just tossing a coin twice instead of
thrice to make things simpler. Here, the outcomes will be only four: HH,
HT, TH, TT, and the possible values of X are 0, 1, and 2.
• You may use other examples of continuous variables such as height,
weight, length, and age.
• Feel free to add more examples, or get examples from the seatwork that
is in the next section.
C. Group Discussion
Group learners into threes. Given the following experiments and random variables,
ask the groups to identify what the possible values of the random variables are.
Also, for each random variable, identify whether the variable is discrete or
continuous.
1. Experiment: Roll a pair of dice
Random Variable: Sum of the numbers that appear on the pair of dice (Discrete)
2. Experiment: Ask a friend about preparing for a quiz in statistics
Random Variable: How much time (in hours) he/she spends studying for this
quiz (Continuous)
3. Experiment: Record the sex of family members in a family with four children
Random Variable: The number of girls among the children (Discrete)
4. Experiment: Buy an egg from the grocery
Random Variable: The weight of the egg in grams (Continuous)
5. Experiment: Record the number of hours one watches TV from 7 pm to 11
pm for the past five nights.
Random Variable: The number of hours spent watching TV from 7
pm to 11 pm (Continuous)
D. Enrichment
In tossing a coin four times, how many outcomes correspond to each value of the
random variable?
What if the coin would be tossed five times? six times? seven times? eight times?
Try to relate the outcomes to the numbers in Pascal’s triangle.
For tossing the coin four times, there will be five possible values,
0, 1, 2, 3, 4, with
1, 4, 6, 4, 1 outcomes, respectively.
For five coins there are six possible values,
0, 1, 2, 3, 4, and 5, with
1, 5, 10, 10, 5, 1 outcomes, respectively.
In general, for n tosses of a coin, there are n+1 possible values, 0, 1, 2, 3, …, n. If x
is a possible value, then there are
${}_nC_x = \dfrac{n!}{x!\,(n-x)!}$
outcomes associated with x.
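The link to Pascal's triangle can be checked by brute force. The sketch below is an illustration and not part of the original guide. The function name `outcomes_per_value` is hypothetical. It lists all 2^n outcomes of n tosses, counts how many have x heads, and compares each count with the combination nCx computed by Python's `math.comb`.

```python
from itertools import product
from math import comb


def outcomes_per_value(n):
    """Count, by listing all 2**n outcomes, how many have x heads."""
    counts = [0] * (n + 1)
    for tosses in product("HT", repeat=n):
        counts[tosses.count("H")] += 1
    return counts


for n in range(4, 9):
    counts = outcomes_per_value(n)
    # Each brute-force count should match the combination nCx.
    assert counts == [comb(n, x) for x in range(n + 1)]
    print(n, counts)
```

Each printed list is one row of Pascal's triangle.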
Next, learners may read ahead on probability distributions, which will be covered in the next
lesson.
KEY POINTS
• A Random Variable may be viewed as a way to map the outcomes of a statistical
experiment, which are determined by chance, into numbers.
• There are two types of random variables:
o Discrete: takes on a finite (or countably infinite) number of values
o Continuous: takes an uncountably infinite number of possible values,
typically measurable quantities
ACTIVITY SHEET 02-02
1. Toss a coin three times and record the results of the three tosses below.
(Use H for heads and T for tails.)
             First Toss    Second Toss    Third Toss
Outcome      ________      ________       ________
2. Count the number of heads that appeared.
3. Write all possible outcomes for tossing a coin three times, and then count the
number of heads for each outcome. List them down.
4. Hold your breath and accurately record the time you held your breath. (If possible,
use a cell phone timer and record up to the nearest hundredth of a second). Record
the time below: ________ seconds
ASSESSMENT
1. Identify a possible random variable (or if possible two random variables) given the
following statistical experiments. If possible, identify whether the variable is Discrete
or Continuous.
(Answers in bold are Discrete, while answers in italics are Continuous)
a. Take a quiz (score of students, whether a student passed or failed

the quiz, how long it took a student to answer the quiz)
b. Ask the class about their breakfast (whether students had breakfast or
not, how long students ate breakfast, the time students had breakfast, how
many calories they consumed)
c. Ask a neighbor about television shows (how many shows he/she
watches every night, what tv channel he/she prefers the most,
how long does he/she watch TV per week)
d. Ask a friend about Facebook (whether he/she has a Facebook account
or not , number of Facebook friends he/she has, the amount of time
he/she spends per week on Facebook)
e. Run 100m on the track (whether students were able to complete it in
under 15s or not, time to finish running 100m)
f. Ask a classmate about musical instruments (whether he/she plays an
instrument or not , how many instruments he/she can play, length
of time he/she plays the instrument per week)
g. Visit the nearest market and look for poultry, such as chickens (how many
stalls sell chickens, whether the first stall sells chickens or not,
total weight of chickens sold in a certain day)
h. Ask your mother about the EDSA revolution (whether your mother was
alive the time of the EDSA revolution or not, whether she was
there or not, what her age was during the EDSA revolution)
2. During a game of Tetris, we observe a sequence of three consecutive pieces. Each
Tetris piece has one of seven possible shapes, labeled here by the letters I, J, L, O, S, T,
and Z. So in this random procedure, we can observe a sequence such as STT,
SOL, JJJ, and so on. Define:
• X to be the number of occurrences of 'J' in a sequence of three
pieces. Then X can take the value 0, 1, 2 or 3.
• Y to be the number of different shapes in a sequence of three
pieces. Then Y can take the value 1, 2 or 3.
• T to be the time it takes a randomly selected Tetris gamer to end a game.
Identify whether X, Y and T are discrete or continuous.
Explanatory Notes:
• Teachers have the option to just ask the questions in this assessment orally to
the entire class, or to group learners and ask them to give the answers, or to
give this as homework, or to use some of the questions for a chapter
examination.
• The answers here are some of the possible answers and are not limited to
these. If the learners think of other examples, that is even better.
• Answers in bold are discrete while answers in italics are Continuous
Lesson 4: Probability Distributions of Discrete Random Variables
TIME FRAME: 60 minutes
• illustrate the probability distribution for discrete random variables and its properties
• compute probabilities corresponding to a given discrete random variable
• construct the probability mass function of a discrete random variable and its
corresponding histogram
PRE-REQUISITE LESSONS: Probability, Random Variables
LESSON OUTLINE
A. Introduction / Motivation: How Many Siblings Do Students Have?

B. Main Lesson: Probability Distributions of Discrete Random Variables including Examples
of Probability Distributions
I. Properties of Probability Distributions of Discrete Random Variables
II. Determining Probabilities based on the Probability Distribution
C. Enrichment
KEY CONCEPTS
Probability Distribution, Graphical and Tabular Representation of Probability Distributions,

Probabilities from Probability Distributions
REFERENCES
Workbooks in Statistics 1: 11th Edition, Institute of Statistics, UP Los Baños, College Laguna
4031
Probability and Statistics Module 19: Discrete Probability Distributions. (2013) Australian
Mathematical Sciences Institute and Education Services Australia. Retrieved from
http://www.amsi.org.au/ESA_Senior_Years/SeniorTopic4/4_md/SeniorTopic4c.html
http://www.amsi.org.au/ESA_Senior_Years/SeniorTopic4/4_md/SeniorTopic4c.html#content_1
https://www.youtube.com/watch?v=qSu-Rk-6apw&feature=youtu.be
A. Introduction/ Motivation: How Many Siblings Do Learners Have?
Ask learners to provide information on how many siblings they have. This can be
done either through the data set collected at the start of the year (see Chapter 1,
Lesson 1), or by asking them to raise their hands as you call out different numbers,
zero (0), 1, 2, 3, etc. Emphasize to them that this is an example of a random
variable. Ask them what type it is (to review Discrete Variables). Mention that
typically, we denote random variables as capital letters X, Y, Z, etc.
Construct a table of frequencies for W, the number of siblings (together with the
relative frequencies). The first column of the table lists the possible values of W,
the number of siblings (0, 1, 2, 3, etc.). The second column lists the frequencies (how
many students have the number of siblings given in the first column). The third
column lists the relative (or percentage) frequencies, i.e., the entries in the second
column divided by the number of learners, expressed as a percentage.
Suppose that we have the following table of frequencies:
W = Number of siblings    Frequency    Relative frequency
0                         2            4% = (2/50) x 100%
1                         10           20% = (10/50) x 100%
2                         28           56% = (28/50) x 100%
3                         5            10% = (5/50) x 100%
4                         3            6% = (3/50) x 100%
5                         1            2% = (1/50) x 100%
6                         0            0% = (0/50) x 100%
7                         1            2% = (1/50) x 100%
TOTAL                     50           100% = (50/50) x 100%
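The relative frequency column can be computed directly from the frequency column. The Python sketch below is an illustration and not part of the original guide. The variable names are hypothetical, and the row for six siblings is assumed to have a frequency of zero, which is consistent with the total of 50 learners.

```python
# Frequencies from the illustrative table: number of siblings -> learners.
# Assumption: no learner has exactly six siblings (frequency 0).
frequency = {0: 2, 1: 10, 2: 28, 3: 5, 4: 3, 5: 1, 6: 0, 7: 1}

total = sum(frequency.values())

# Relative frequency in percent for each value of W.
relative = {w: 100 * count / total for w, count in frequency.items()}

for w, percent in relative.items():
    print(f"W = {w}: {frequency[w]:>2} learners, {percent:.0f}%")
print("Total:", total, "learners,", sum(relative.values()), "percent")
```

The same steps apply to the actual class data: replace the counts in `frequency` with the tally made on the board.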
Next, draw a histogram to represent the relative frequencies. Emphasize that
the values on the y-axis represent these relative frequencies (in percent). You may
stretch the y-axis to make it visually better. For each rectangular region, compute
the area. Take note that the widths of the rectangles are all 1, so the area is just
equal to the height of the rectangle (the value of y), which is the probability. Have
them add the areas, and show that the sum is 100% (if it is not 100%, then have
them check their answers). Ask them whether that is just a coincidence or whether it
should be expected. (It should be expected because the sum of all probabilities should be 1
or 100%.)
[Figure: histogram of the relative frequencies, with Percent on the y-axis and Number of Siblings on the x-axis]
Note the following properties of this histogram:
• All the possible values for y are either zero or a positive number
less than or equal to 1 (or 100 percent). (Ask students if the percentage can be equal
to 100 percent. Yes! Suppose all the learners have one sibling. Then show the histogram for
this scenario.)
• The sum of all the areas under the graph should be equal to 1 (or 100
percent).
Note that these properties are the properties of the probability of an event (as the
chance of an event can go from 0 to 100 percent, and the chance of the sure event,
i.e. the whole area under the graph is 100 percent).
B. Main Lesson: Probability Distribution
Introduce the concept of the Probability Mass Function of Discrete

Random Variables:
a table, graph, or formula that lists all the possible values of the random variable
and the corresponding probability for each value. Take note that the probabilities
may be empirical probabilities, theoretical probabilities, or subjective probabilities.
In the examples done earlier, the table and the histogram are two ways to represent
the probability mass function, also called the probability distribution. You can
explain further that it is called probability distribution because it is as though we are
distributing probability weights (or masses) to all the possible observations or values
of the random variable. This will then lead to the properties of probability
distributions which will be discussed later. In the previous example, you distributed
all the “weights” from the learners to the different values of the random variable
(number of siblings).
As a second example, you can consider the probability distribution of the number
of heads occurring when tossing a coin three times (the activity done in Lesson 2 of
Chapter 2). Suppose there is an equal
chance that the coin lands on a head or a tail. (However, this assumption cannot
always be made, since we are not exactly sure that we have a “fair coin.” In fact, this is
what differentiates statistics from probability: in the latter, we make
assumptions about the probability of getting a head, while in statistics, we
collect data to estimate this unknown probability.) Then, there will be
eight outcomes, each one assumed to have a 1/8 chance of appearing. Suppose X is
the number of heads; then the Probability Distribution is as follows:
Tabular Representation of the Probability Distribution of X
Outcomes Number of heads Probability

TTT 0 1/8
TTH, THT, HTT 1 3/8
THH, HTH, HHT 2 3/8
HHH 3 1/8
In general, when flipping a coin n times where the coin has probability p of getting
a head in one toss, the probability mass function for getting exactly k heads
is
$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n.$
This is called the binomial pmf. The formula can be understood as follows: We
want exactly k heads and n − k tails. For a particular sequence of k heads, the
multiplication rule says that this has a chance of $p^k$, and similarly a particular
sequence of n − k tails has a chance of $(1-p)^{n-k}$. However, the k heads can occur
anywhere among the n trials, and there are $\binom{n}{k} = {}_nC_k$ different ways of distributing k
heads in a sequence of n trials.
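The binomial pmf multiplies the number of ways to place k heads among n tosses by the chance of any one such sequence. A minimal Python sketch follows; it is an illustration and not part of the original guide. The function name `binomial_pmf` is hypothetical, and `math.comb(n, k)` supplies the number of ways to choose the positions of the k heads.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k): chance of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Three tosses of a fair coin should reproduce the table: 1/8, 3/8, 3/8, 1/8.
fair = [binomial_pmf(k, 3, 0.5) for k in range(4)]
print(fair)  # [0.125, 0.375, 0.375, 0.125]

# The probabilities over all possible values of k add up to 1.
print(sum(fair))  # 1.0
```

Changing `p` gives the distribution for a biased coin without changing the reasoning.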
This second case is an example where the probabilities are derived theoretically.
So, whether probabilities are assigned empirically or theoretically, the probability
distribution should have the same general properties.
I. Properties of Probability Distributions of Discrete Random Variables
• Probabilities should be confined between zero (0) and 1 (inclusive of both

ends).
• The sum of all the probabilities should be 1 (i.e., 100%).
If we graphically represent the probability distribution of a discrete random variable
X, with the area of each rectangle corresponding to the probability of each value of
X, we note that each area must be non-negative (and at most equal to
one). In addition, the areas of all the rectangles should total 1 (or 100 percent).
Inform learners that for the two worked examples (on number of siblings and
number of heads obtained in three tosses of a fair coin), all these properties of
probability distributions were satisfied. It might be helpful if you do not remove the
two examples from the board. Show in each example that probabilities cannot be
below zero (0), and they cannot be above one (1). In addition, the probabilities sum
up to one (1). For the graphical representation, show the area for each rectangle is
the same as the probability, and then show that they all add up to 100%.
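The two properties can be checked mechanically for any table of probabilities. The Python sketch below is an illustration and not part of the original guide. The function name `is_probability_distribution` is hypothetical, and `math.isclose` is used so that small rounding errors in decimal arithmetic do not cause a false alarm.

```python
import math


def is_probability_distribution(probabilities):
    """Check the two properties: each value in [0, 1] and a total of 1."""
    values = list(probabilities.values())
    in_range = all(0 <= p <= 1 for p in values)
    sums_to_one = math.isclose(sum(values), 1.0)
    return in_range and sums_to_one


# The two worked examples: number of siblings and number of heads.
siblings = {0: 0.04, 1: 0.20, 2: 0.56, 3: 0.10, 4: 0.06, 5: 0.02, 7: 0.02}
heads = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}

print(is_probability_distribution(siblings))  # True
print(is_probability_distribution(heads))  # True

# A made-up table whose probabilities add up to 1.2 fails the check.
print(is_probability_distribution({0: 0.7, 1: 0.5}))  # False
```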
You can stop at this point and just give an assessment (either in the form of
seatwork or a short quiz). If there is not enough time, you may give the assessment as
homework.
II. Determining Probabilities based on the Probability Distribution
Since the probability distribution contains the values of a random variable and the
corresponding probability of each value, it can be used to determine the
probability that the random variable will take on certain values.
For example, given the illustrative data on the number of siblings that the learners
have, or better yet the actual distribution of the data in class, you can ask the
learners:
• What is the probability that a randomly-selected learner is an only child?

• What is the chance that a randomly-selected learner has at most two
siblings?
• What is the probability that a randomly-selected learner has three or more
siblings?
For the illustrative data shown above, if you want to determine the probability that
the randomly-selected learner is an only child, then you just get the probability that
W=0, i.e.,
P(W=0) = 4%.
For the other questions, the chance that a randomly-selected learner has at most
two siblings is:
P( W ≤ 2 ) = P( W = 0 or W = 1 or W = 2) = P(W=0) + P(W=1) + P(W=2)
= 4 % + 20% + 56% = 80%
while the probability that a randomly-selected learner has three or more siblings is
P( W ≥ 3 ) = P( W = 3 or W = 4 or W = 5 or W=7 )
= P(W=3) + P(W=4) + P(W=5) + P(W=7)
= 10% + 6% + 2% + 2 % = 20%
Alternatively, for the latter probability, you will notice that having three or more
siblings is the complement of having at most two siblings (whose probability was
calculated already to be at 80%). As was stated in Lesson 1, the chance of the
complement of an event is 100 percent minus the chance of an event.
P( W ≥ 3 ) = 1 – P( W ≤ 2 ) = 100% - 80% = 20%
You can also use the graphical representations of the probability distribution in
order to determine the probability of the events of interest. Since the area under
the graph is the same as the probability, then adding the areas of the rectangles
will give the appropriate probability that you are looking for.
You should point out to learners that, in general, for a discrete random variable X,
the probability that X lies in some discrete set A may be obtained by summing the
probabilities for the distinct values in the set A, that is,
$P(X \in A) = \sum_{x \in A} P(X = x).$
For instance, in the last case,
P( W ≥ 3 ) = P(W=3) + P(W=4) + P(W=5) + P(W=7)
= 10% + 6% + 2% + 2% = 20%
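Summing probabilities over the values in a set translates directly into code. The Python sketch below is an illustration and not part of the original guide. The names `pmf_w` and `probability` are hypothetical, and the event is passed in as a small function that says whether a value belongs to the set A.

```python
# Probability distribution of W (number of siblings) from the table.
pmf_w = {0: 0.04, 1: 0.20, 2: 0.56, 3: 0.10, 4: 0.06, 5: 0.02, 7: 0.02}


def probability(pmf, event):
    """P(X in A): add P(X = x) over the values x that satisfy the event."""
    return sum(p for x, p in pmf.items() if event(x))


only_child = probability(pmf_w, lambda w: w == 0)
at_most_two = probability(pmf_w, lambda w: w <= 2)
three_or_more = probability(pmf_w, lambda w: w >= 3)

print(round(only_child, 2), round(at_most_two, 2), round(three_or_more, 2))

# The complement rule gives the same answer for "three or more".
print(round(1 - at_most_two, 2))
```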
C. Enrichment
You can go beyond the examples and do exercises like these:
The following table gives a probability distribution for a random variable X, which
corresponds to the number of pens that children from a class have in their bags.
k    Probability that X = k
1    0.15
2    0.2
3    0.35
4    a
Identify the value of a.
Answer: since the total probability is 1, then $0.15 + 0.2 + 0.35 + a = 1$.
Therefore, $a = 1 - 0.70 = 0.30$.
You can also add questions like: What is the probability that a randomly-selected
student has at least three pens in his/her bag? Answer: $P(X \ge 3) = P(X = 3) + P(X = 4) = 0.35 + 0.30 = 0.65$
You can give similar examples that are based on the actual data that students provided
in the first lesson of Chapter 1 (the centralized data collected on the first day of
class). For instance, let Y = rating of how a randomly-selected student feels today
(on a scale of 1-10).
ASSESSMENT
1. A probability distribution is an equation that:
a. associates a particular probability of occurrence with each outcome in the

sample space
b. measures outcomes and assigns values of X to simple events
c. assigns a value to the variability in the sample space
d. assigns a value to the center of the sample space
Answer: A
2. A survey of high school students gave the following
frequency distribution for Y, the number of pets they have at home.
Y    Frequency
0    5
1    4
2    6
3    8
4    1
5    1
(a) Construct a histogram for the probability distribution
[Figure: histogram with Percent on the y-axis and number of pets they have at home on the x-axis]
(b) Determine the probability of selecting a student having

• 3 pets.
• At most 2 pets
• At least 3 pets
• At least 1 pet
Answer:
The probability of selecting a student with
• 3 pets is
P(Y=3) = 8/25 = 32%
• At most 2 pets is
P(Y≤2) = P(Y=0 or Y=1 or Y=2) = P(Y=0) + P(Y=1) + P(Y=2) =
(5/25) + (4/25) + (6/25) = 15/25 = 60%
• At least 3 pets is
P(Y≥3) = 1 − P(Y≤2) = 100% − 60% = 40%
• At least 1 pet is
P(Y≥1) = 1 − P(Y=0) = 1 − (5/25) = 100% − 20% = 80%
3. Your mom decides to buy a single ticket for the lotto. Suppose that it has the
following possible payoffs with their associated probabilities.
Payoff      Probability
P100        0.0500
P1,250      0.0100
P5,000      0.0050
P25,000     0.0010
P250,000    0.0005
P500,000    0.0001
(a) the probability that your mom will win any money is ________. (Answer: 0.0666)
(b) the probability that your mom will win at least P5000 is ________. (Answer:
0.0066)
4. The following table contains the probability distribution for X = the number of
retransmissions necessary to successfully transmit a 5 GB data package through
a double satellite media.
X       0       1       2       3
P(X)    0.40    0.30    0.25    0.05
(a) the probability of no retransmissions is ________. (Answer: 0.40)
(b) the probability of at least one retransmission is ________. (Answer: 0.60)
5. Erik is going to flip a coin twice. Each flip is independent, but the coin
is biased: the probability that the coin will land on heads is 25 percent each time. If
X is a random variable that represents the number of heads obtained when the
coin is tossed, create a histogram representing the probability distribution for all
possible values of X.
Answer: Generate a graph to represent the following probability distribution
P(X=0) = $(0.75)^2$ = 0.5625
P(X=1) = 2 (0.75) (0.25) = 0.375
P(X=2) = $(0.25)^2$ = 0.0625
6. There are 8 players on an amateur basketball team. They are practicing their
free throws by having each player shoot two free throws. The table below shows
the result of each player's free throws. "X" represents a missed free throw, and
"O" represents a successful free throw.
Player         Noel    Candido    Robert    Michael    Lino    Carlos    Angelo    Ramon
Free throws    XX      OO         XX        XO         OX      XO        XO        XX
Make a histogram representing the proportion for each possible number of free
throws made by a player.
Answer: Since we have the following probability distribution

P(X=0) =3/8
P(X=1) =4/8
P(X=2) = 1/8
the histogram is shown below:
[Figure: histogram with Percent on the y-axis and number of free throws (0, 1, 2) on the x-axis]
7. A couple intends to have children until they get at least one boy and one girl,
but they agree that they will not have more than four children, even if all are
girls or all are boys. (Assume boys and girls are equally likely.)
(a) Determine the probability model for the number of children they will have.
(b) Calculate the probability of having two children.
Answer:
Enumerating the possible scenarios and their probabilities, we get:
First Child    Second Child    Third Child    Fourth Child    Probability
Boy            Boy             Boy            Boy             (1/2) x (1/2) x (1/2) x (1/2) = 1/16
Boy            Boy             Boy            Girl            (1/2) x (1/2) x (1/2) x (1/2) = 1/16
Boy            Boy             Girl           -               (1/2) x (1/2) x (1/2) = 1/8
Boy            Girl            -              -               (1/2) x (1/2) = 1/4
Girl           Boy             -              -               (1/2) x (1/2) = 1/4
Girl           Girl            Boy            -               (1/2) x (1/2) x (1/2) = 1/8
Girl           Girl            Girl           Boy             (1/2) x (1/2) x (1/2) x (1/2) = 1/16
Girl           Girl            Girl           Girl            (1/2) x (1/2) x (1/2) x (1/2) = 1/16
(a) Thus the probability distribution is:
P(X = 2) = P(BG or GB) = 1/4 + 1/4 = 1/2
P(X = 3) = P(BBG or GGB) = 1/8 + 1/8 = 1/4
P(X = 4) = P(BBBB or BBBG or GGGB or GGGG) = 1/16 + 1/16 + 1/16 + 1/16 = 1/4
(b) The probability of having two children is P(X = 2) = 1/2
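The enumeration in item 7 can be automated. The Python sketch below is an illustration and not part of the original guide. The function name `family_size_distribution` and its parameter `max_children` are hypothetical, and the default cap of four children is an assumption taken from the answer table, which runs to a fourth child. Exact fractions are used so that the results match the hand computation.

```python
from fractions import Fraction


def family_size_distribution(max_children=4):
    """Distribution of the number of children under the stopping rule."""
    half = Fraction(1, 2)
    distribution = {}

    def extend(sequence, chance):
        has_both = "B" in sequence and "G" in sequence
        if has_both or len(sequence) == max_children:
            # The couple stops here: record the family size.
            size = len(sequence)
            distribution[size] = distribution.get(size, 0) + chance
            return
        # Otherwise the next child is a boy or a girl with equal chance.
        for child in "BG":
            extend(sequence + child, chance * half)

    extend("", Fraction(1))
    return distribution


for size, chance in sorted(family_size_distribution().items()):
    print(f"P(X = {size}) = {chance}")
```

Passing a different `max_children` shows how the distribution changes when the couple agrees on a different limit.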
8. A six-sided die is biased, yielding the following probability distribution for X, the
number of spots on the uppermost face when the die is rolled.
k 1 2 3 4 5 6
P(X=k)
Lesson 5: Probability Density Functions (of Continuous
Random Variables)
• define the probability density function for continuous variables and its properties
• compute probabilities that a given continuous random variable falls in some
interval
• compare and contrast the probability density function to the probability mass
function
PRE-REQUISITE LESSONS: Probability, Random Variables, Probability Distributions

of Discrete Random Variables
LESSON OUTLINE
A. Introduction: Recall Concept of Continuous Random Variables
B. Motivation: Differentiate Probability Density Function from Probability Mass
Function
C. Main Lesson: Probability Density Function; Examples of Probability Density
Functions (Uniform and Triangular Distributions) and Using Probability Density
Functions of Continuous Random Variables to compute probabilities falling in an
interval
D. Enrichment: Pseudo Random Numbers between 0 and 1
REFERENCES
Workbooks in Statistics 1: 11th Edition, Institute of Statistics, UP Los Baños, College
Laguna 4031
Probability and Statistics Module 21: Continuous Probability Distributions. (2013)
Australian Mathematical Sciences Institute and Education Services Australia. Retrieved
from http://www.amsi.org.au/ESA_Senior_Years/PDF/ContProbDist4e.pdf
KEY CONCEPTS:
Probability Density Function, Probability, Continuous Random Variables
A. Introduction: Recall Concept of Continuous Random Variables
Help learners recall that random variables map outcomes of independent random
events into numbers, and that there are two general types: discrete (which takes on
a specific and countable number of possible values) and continuous (which takes on
possible values in a continuum). An example of a continuous random variable is the
weight of a randomly-chosen rock from a pile of rocks. The exact weight of the
rock (to the smallest of measures) is impossible to pinpoint, since weight can be
measured to varying levels of accuracy and we are limited only by the precision of
our instruments. In principle, the weight could be any numerical value in some range. For
example, the weight of this randomly-chosen rock could be 245.4 grams, but it
could also be 245.38 grams, or 245.382 grams, and so on.
Ask learners to remember the heights data they provided in Lesson 1 of Chapter 1,
or the measured height they obtained in Lesson 3 of Chapter 1 (for calculating the
Body Mass Index). Tell them to assume that H is the height of a student chosen at
random in the class. Is H a continuous random variable or is it discrete? What is it
about the experiment that makes H continuous or discrete? What could they do to
change it? (Tell them that, in theory, H falls on a continuum of values. However, the data
can be “discretized” into a countable number of values. For instance, if the heights are
recorded to one decimal place in centimeters, then the possible values are
countable. Another way is to define H = 1 if the height is less than 167 cm, H = 2 if
it is at least 167 cm but less than 180 cm, and H = 3 if it is 180 cm or more. Then H would be discrete
with possible values of 1, 2, and 3.)
Define Y as the top bicycling speed of a randomly-chosen learner in class. Ask
learners if Y is a continuous random variable. (Speed is a common continuous
variable, and the value is chosen by a random process so Y is a continuous random
variable since there is always another possible value between any two speed values.
It would not be possible to count all possible speeds that Y could be.)
Define Z to be the number of people in the city/municipality/town who will vote in
the next presidential election. Ask learners if Z is a continuous random variable.
Their answer should be no since, even though there may be a huge number of
voters in the city/municipality/town, the number of voters is countable (and finite).
Let T be the total time spent on Facebook by a randomly-chosen learner from the
time he/she registered on FB. Then T is a continuous random variable.
B. Motivation: Probability Density Function as Analog of Probability
Mass Function
In the previous lesson, you showed learners that for discrete random variable X, that
takes on a finite or countably infinite number of possible values, there is a
probability distribution (or probability mass function)
P(X = x) for all of the possible values of X
Now, inform learners that for continuous random variables, the probability that X
takes on any particular value x is zero (0). That is, P(X = x) = 0 for every value x
of a continuous random variable X.
Instead, we will need to find the probability that X falls in some interval (a, b), that
is, we'll need to find P(a < X < b). We can use a curve called the probability
density function f(x), for this purpose.
Even though a fast food chain would claim that a hamburger weighs 100 grams, a
randomly-selected hamburger might weigh 98 grams while another might weigh
103 grams. What is the probability that a randomly-selected hamburger weighs
between 95 and 105 grams? That is, if we let X denote the weight (in grams) of a
randomly selected hamburger, what is P(95 < X < 105)?
Tell learners to assume that we selected 100 hamburgers and created a histogram
of the resulting weights. Perhaps, the histogram might look something like this:
[Figure: histogram of the weights of 100 hamburgers; horizontal axis: weight (90 to 110 grams), vertical axis: density (0 to 0.1)]
If we decreased the length of the class intervals on the histogram, then, the histogram
would look something like this:
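[Figure: histogram of hamburger weights with narrower class intervals; horizontal axis: weight (90 to 115 grams), vertical axis: density (0 to 0.15)]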
and if we pushed this further and decreased the intervals even more (and selected
even more hamburgers, say 1000 or 10,000), the intervals would eventually get so
small that we could represent the probability distribution of X, not as a histogram, but
as a curve (by connecting the "dots" at the tops of the tiny rectangles) that, in this
case, might look like this:
Tell learners that such a curve is denoted as f(x) and is called a (continuous)
probability density function.
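A probability density function f(x) has two properties: (1) f(x) ≥ 0 for all x, and
(2) the total area under the curve f(x) is 1. These are analogous to the properties
of the probability mass function (or
probability distribution) of a discrete random variable, i.e. the first property is about
probabilities being nonnegative, and the second property is about probabilities
adding up to 1.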
Explain to students that we can use the probability density function to find the
probability that X lies in some interval from a to b, by considering the area under
the graph of the probability density function over the interval a to b. That is:
$P(a < X < b) = \int_a^b f(x)\,dx$ = the area under the curve f(x) between a and b
This is also similar to the idea that the probability that X lies in some discrete set A may
be obtained by summing the probabilities for the distinct values in the set A, that is,
$P(X \in A) = \sum_{x \in A} P(X = x)$
We simply changed sums that appeared in the discrete case to “integrals” in the
continuous case. You should not assume though that learners know their calculus,
but you can tell them this is why they have to “endure” calculus, as it has many uses
especially for probability and statistics.
Example 1: Uniform Distribution, also called Rectangular Distribution
Suppose that your friend is always late, and that the continuous random variable X
represents the time from when you are supposed to meet your friend until he shows
up. Suppose that your friend could arrive anywhere from “a” minutes late up to “b” minutes late,
with all intervals of equal length between a and b being equally likely.
For example, if a = 10 and b = 60, then your friend is just as likely to be from 10 to 20
minutes late as he is to be 25 to 35 minutes late. The random variable X can be any
value in the interval from 10 to 60, that is, 10 ≤ X ≤ 60, and any two intervals of equal length
between 10 and 60, inclusive, are equally likely.
The random variable X is said to follow a uniform probability distribution, with
probability density function
$f(x) = c$ if a < x < b, for some constant c,
and f(x) = 0 outside the interval from a to b.
(i) What is the value of c?
(ii) Suppose that a = 0 and b = 1. What would be P(0.1 < X < 0.3), P(0.5 < X < 0.7), and
P(d < X < d + 0.2), assuming d > a and d + 0.2 < b?
Answer: Draw the probability density function
Tell learners that since the total area under the curve (which is the area of the
rectangle formed in the graph) should add up to one, then
c (b-a) = 1
Thus, c = 1/(b-a).
If a = 0 and b = 1, then c = 1. In this case, show that the probabilities P(0.1 < X < 0.3),
P(0.5 < X < 0.7), and P(d < X < d + 0.2), assuming d > a and d + 0.2 < b, would all be
0.2 because we would be forming rectangles with a height of 1 and a width of 0.2,
whose areas would be 0.2.
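The rectangle-area reasoning above can be checked with a few lines of Python. This is only a minimal sketch: the function name `uniform_prob` and its parameters are illustrative and do not come from the guide.

```python
def uniform_prob(lo, hi, a=0.0, b=1.0):
    """P(lo < X < hi) for X uniform on (a, b): the area of a rectangle."""
    if b <= a:
        raise ValueError("need a < b")
    # The density is zero outside (a, b), so clip the interval first.
    left = max(lo, a)
    right = min(hi, b)
    width = max(0.0, right - left)
    height = 1.0 / (b - a)  # the constant c = 1/(b - a)
    return width * height


# Intervals of equal width 0.2 inside (0, 1) have the same probability.
print(round(uniform_prob(0.1, 0.3), 4))
print(round(uniform_prob(0.5, 0.7), 4))
# The late-friend example: being 10 to 20 minutes late when a = 10, b = 60.
print(round(uniform_prob(10, 20, a=10, b=60), 4))
```

Learners can change the interval endpoints to see that only the width of the interval matters, not its position.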
P(0 < X < 1) = P(-1 < X < 0 ) = ½
Show them that we can form a smaller right triangle from ½ to 1, whose area
is (1/2) x (1/2) x (1/2) = 1/8, so that
P(½ < X < 1) = 1/8
Remind learners about the symmetry at zero (0) (i.e. that
there are two right triangles to the left and right of zero (0)). Tell learners that this
means that
P(0 < X < ½) = P(0 < X < 1) - P(½ < X < 1) = (1/2) - (1/8) = 3/8
An alternative way to calculate P(0 < X < ½) is to remember the area of a trapezoid:
half of the sum of the bases multiplied by the height. A trapezoid can be formed as
shown below with bases 1 and ½, and a height of 1/2.
Thus P(0 < X < ½) = (1/2) x (1 + (1/2)) x (1/2) = 3/8
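These triangle and trapezoid areas can also be verified numerically. The sketch below assumes the triangular density is f(x) = 1 - |x| on the interval from -1 to 1; this is an assumption inferred from the areas quoted above, and the helper names are illustrative.

```python
def tri_density(x):
    """Assumed triangular density: peak of 1 at x = 0, zero outside (-1, 1)."""
    return max(0.0, 1.0 - abs(x))


def area(f, lo, hi, steps=1000):
    """Area under f between lo and hi using the trapezoid rule."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h


print(round(area(tri_density, 0.0, 1.0), 6))   # right half of the triangle
print(round(area(tri_density, 0.5, 1.0), 6))   # small right triangle
print(round(area(tri_density, 0.0, 0.5), 6))   # trapezoid
```

Because the density is made of straight-line pieces, the trapezoid rule reproduces the geometric answers (1/2, 1/8 and 3/8) up to rounding.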
D. Enrichment: Pseudo Random Numbers between 0 and 1

Suppose we consider real numbers, randomly chosen between zero (0) and 1, for
which we record X as the random number truncated to one decimal place. For
example, the number 0.03421284007 is recorded as 0.0 (as the first decimal place is
zero (0)). Note that this means we are truncating (and not rounding). If the
mechanism generating these numbers has no preference for any position in the
interval (0, 1), then the probability distribution of X is a “discrete uniform”
distribution:
Pr(X = 0.0) = Pr(X = 0.1) = ··· = Pr(X = 0.9) =1/10
Now suppose that instead, we recorded it to two decimal places, then
Pr(X = 0.00) = Pr(X = 0.01) = ··· = Pr(X = 0.99) = 1/100
If we record the first k decimal places, the random variable X has $10^k$ possible
outcomes, each with probability $1/10^k$.
The spreadsheet application Microsoft Excel has a function that produces real
numbers between zero (0) and 1, selected so there is no preference for any position
in the interval (0, 1).
The actual mechanism is based on a deterministic generator of numbers called
linear congruential generation, so that in principle at some point, there will be a
cycle to repeat the numbers generated, although it will take a while to do so. This is
why this is more aptly called a pseudo random number generator. If you enter
=RAND() in a cell and hit return/enter, you will obtain such a pseudo random
number.
As we make the distribution finer and finer, with more and more values chosen, the
probability that any of these discrete random variables lie in the interval [0.1, 0.3) is
always 0.2. Thus X gets to be closer and closer toward a continuous random
variable with (uniform) probability density function f(x) =1 when 0< x < 1.
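The claim that the interval [0.1, 0.3) always has probability 0.2, however many decimal places are kept, can be confirmed by direct counting. The following is a minimal sketch using exact fractions; the function name `prob_in_interval` is illustrative.

```python
from fractions import Fraction


def prob_in_interval(k, lo=Fraction(1, 10), hi=Fraction(3, 10)):
    """P(lo <= X < hi) when X is a random number truncated to k decimal places.

    X then takes the 10**k equally likely values 0, 1/10**k, 2/10**k, ...
    """
    n = 10 ** k
    favorable = sum(1 for j in range(n) if lo <= Fraction(j, n) < hi)
    return Fraction(favorable, n)


for k in range(1, 5):
    print(k, 10 ** k, prob_in_interval(k))
```

Each line shows the number of decimal places, the number of possible values, and the probability of the interval, which stays at 1/5 as the grid of values becomes finer.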
KEY POINTS
• For a continuous random variable X, the probability P(X=a) for some value a is
always zero (0).
• For continuous random variables, we must consider the probability that it lies in
an interval. To find this probability, we need to find the area under the
probability density function on the given interval.
• The total area under a probability density function equals 1.
• A probability density function cannot be negative, since if it were, a negative
probability could be obtained, which is not allowed.
• The probability density function is analogous to, but different from, the
probability distribution function for a discrete random variable. For the discrete
case, the probability distribution sums up to 1, while for the continuous case, the
total area under the curve is 1.
ASSESSMENT
1. Suppose that in a certain class, grades will be given in a uniform distribution
where a = 80, b = 100. Let X be the grade of a randomly-selected student,
what is:
a. P( X < 90)
b. P( X > 90 )
c. P (X = 90)
d. P (85 <X < 95)
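Answers:
a. (90 - 80)/(100 - 80) = 10/20 = 0.5
b. (100 - 90)/(100 - 80) = 10/20 = 0.5
c. 0
d. (95 - 85)/(100 - 80) = 10/20 = 0.5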
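2. Let X have a triangular distribution with probability density function
$f(x) = \begin{cases} x & 0 \le x \le 1 \\ 2 - x & 1 \le x \le 2 \\ 0 & \text{otherwise} \end{cases}$
Determine
(a) P(0 < X < 0.2)
(b) P(X > 0.3)
(c) P(0.2 < X < 0.3)
Answer:
(a) (1/2)(0.2)(0.2) = 0.02
(b) 1 - (1/2)(0.3)(0.3) = 1 – 0.045 = 0.955
(c) (1/2)(0.3)(0.3) - (1/2)(0.2)(0.2) = 0.045 – 0.02 = 0.025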
Lesson 6: Mean and Variance of Discrete Random Variables
• describe or illustrate the mean (and variance) of a discrete random variable
• compute for the mean (and variance) of a discrete random variable
• provide an interpretation of the mean (and variance) of a discrete random variable
PRE-REQUISITE LESSONS: Random Variables, Probability Distribution of Discrete Random
Variables
MATERIALS NEEDED
• three coins
• paper
• pencil
KEY CONCEPTS: Mean (long-run average) of a Random Variable, Variance of a Random
Variable
LESSON OUTLINE
A. Introduction: Review of Probability Distributions
B. Main Lesson: The Mean and Variance of a Random Variable
C. Examples of Finding and Interpreting the Mean and Variance
D. Enrichment: Mean of a Continuous Variable
REFERENCES
Workbooks in Statistics 1: 11th Edition, Institute of Statistics, UP Los Baños, College Laguna 4031
Probability and Statistics Module 19: Discrete Probability Distributions. (2013) Australian Mathematical
Sciences Institute and Education Services Australia. Retrieved from
[staslectures]. Mean and Expected Value of Discrete Random Variables. Retrieved from
https://www.opened.com/video/mean-and-expected-value-of-discrete-random-variables/116285
https://www.opened.com/video/variance-and-standard-deviation-of-discrete-random-variables/116286
https://www.opened.com/video/mean-e-x-and-variance-var-x-for-continuous-random-variables/116287
A. Introduction: Review of Probability Distributions
Ask learners to toss three coins ten times and record the number of heads on each
toss. Divide the class into groups of three to five learners. Tell each learner to get the
average of the number of heads obtained for the first five tosses, and the average
of all the ten tosses. Then, within each group, get the average of the
averages that the members got. If possible, ask learners to get the average for the
entire class.
Next, ask learners what the possible number of heads was. They should say 0, 1, 2,
3. Next, ask them what the range of the average of the first five tosses was. Ask
what the highest and lowest values are in the class. Record the number on the
board. Do this also for the average of the ten tosses, the average of each group,
and, finally, record the average of the class. Ask learners what they notice about the
average of the number of heads. Also, ask them what they notice about the range
of values as the number of tosses is increased. They should have noticed
fluctuations in the averages, but the averages approach 1.5, and the range of values
from the averages gets narrower with more data (i.e. more students giving
information).
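If coins are not available, or to extend the activity to many more tosses, the experiment can be imitated on a computer. The sketch below is one possible simulation; the function name, the seed, and the chosen numbers of tosses are arbitrary illustrative choices.

```python
import random


def average_heads(n_tosses, rng):
    """Average number of heads over n_tosses tosses of three fair coins."""
    total = 0
    for _ in range(n_tosses):
        # Each coin contributes 1 for a head and 0 for a tail.
        total += sum(rng.randint(0, 1) for _ in range(3))
    return total / n_tosses


rng = random.Random(2016)  # fixed seed so the run can be repeated
for n in (5, 10, 100, 10_000):
    print(n, round(average_heads(n, rng), 3))
```

The averages for 5 or 10 tosses fluctuate noticeably from run to run, while the averages for thousands of tosses settle close to 1.5.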
B. Main Lesson: The Mean and Variance of a Discrete Random Variable
Recall Lesson 3 of this chapter (on Probability Distributions of Discrete
Random Variables). List down the distribution of the number of heads in tosses of
three fair coins (or three independent tosses of one fair coin). If possible, ask
learners to complete the first two columns of the table. Then, add a third column
for the product of the entries of the first and second columns, (X) P(X). Ask them to
fill out the entries and, after completing the table, to get the total of the third column.
Leave room for a fourth column, which will be filled out later.
X = number of heads    P(X)    (X)P(X)
0                      1/8     0
1                      3/8     3/8
2                      3/8     6/8
3                      1/8     3/8
Total                          12/8 = 1.5
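Definition
The mean (or expected value) of a discrete random variable X, denoted by μ or E(X),
is the sum of the possible values of X, each weighted by its probability:
$\mu = E(X) = \sum_x x\,P(X = x)$
For the number of heads in three tosses, μ = 1.5. Note that the mean is not
necessarily a possible value of the random variable. So learners cannot simply say
that the Mean is what they expect to be the number of heads when they toss three
coins. Rather, it is to be interpreted as a long-run average. Mention also to them
that the Mean is the value that we expect the long-run average to approach and it
is not the value of the random variable X that we expect to observe.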
Next, have learners recall that the average of a given set of data is a measure of
central tendency. Inform them that the expected value—being an average—
measures the center of the distribution of the possible values of X.
Mention that the mean of a (discrete) random variable X can be given as a physical
interpretation. Suppose we imagine that the x-axis is an infinite see-saw in each
direction, and at each possible value of X, we place weights equal to the
corresponding probabilities. Then, the mean is at a point which will make the see-
saw balanced. In other words, it is at the centre of gravity of the system.
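The weighted average and the see-saw picture can both be checked with exact arithmetic. This is a minimal sketch for the three-coin distribution; the names `pmf`, `mean` and `balance` are illustrative.

```python
from fractions import Fraction

# Number of heads in three tosses of a fair coin and its probabilities.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def mean(pmf):
    """Weighted average of the possible values, weights = probabilities."""
    return sum(x * p for x, p in pmf.items())


mu = mean(pmf)
# See-saw check: the weights times their signed distances from the pivot
# cancel out exactly when the pivot is placed at the mean.
balance = sum((x - mu) * p for x, p in pmf.items())

print(mu, float(mu), balance)
```

The mean comes out as 3/2, which is not one of the possible values 0, 1, 2, 3, and the balance is zero, matching the centre-of-gravity interpretation.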
It may be helpful to give other examples to help learners gain more insights.
C. Examples of Finding and Interpreting the Mean and Variance
I. Example with a biased die
Recall the biased six-sided die with a probability distribution for X, the number of
spots on the upward face when the die is rolled, given as follows:
i 1 2 3 4 5 6
P(X=i)
The expected value of the distribution may be calculated as follows:
$E(X) = 1\,P(X=1) + 2\,P(X=2) + 3\,P(X=3) + 4\,P(X=4) + 5\,P(X=5) + 6\,P(X=6)$
If q = 0, then this reduces to a fair die, for which we would have a long-run
average of 3.5 for the number of spots on the upward face.
Provide another example that is used in the real world, such as the following:
II. Practical Example Used in Insurance

An insurance company sells life insurance of Php500,000 for a premium or payment
of Php10,400 per year. Actuarial tables show the probability of normal death in the
year following the purchase of this policy is 0.1%. What is the expected “gain” for
this life insurance policy?
Inform learners there are two simple events here. Either the customer will live this
year or will die (a normal death). The probability of a normal death, as given by the
problem, is 0.001, which will yield a negative gain to the insurance company in the
amount of
Php10,400 - Php500,000 = -Php489,600. The probability that the customer will live is
1 - 0.001 = 0.999. Thus, the insurance company’s gain X from this insurance
policy in the year after the purchase has the following probability distribution:
Gains       Outcome         Probability
10,400      Live            0.999
-489,600    Normal Death    0.001
μ = (10,400)(0.999) + (-489,600)(0.001) = 9,900
Learners should take note that if the insurance company were to sell a very large
number N of the Php500,000 insurance policies to many people, then, with a long-run
average profit per insurance policy of Php9,900, the company would be expected to
make a total profit of N times Php9,900.
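The expected gain can be reproduced with a short computation. The sketch below uses the figures from the example; the helper `expected_value` and the choice of N = 1,000 policies are illustrative assumptions.

```python
def expected_value(outcomes):
    """Expected value of a list of (value, probability) pairs."""
    total_prob = sum(p for _, p in outcomes)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("probabilities must add up to 1")
    return sum(v * p for v, p in outcomes)


premium = 10_400       # yearly payment, in pesos
benefit = 500_000      # amount paid out on a normal death
p_death = 0.001        # probability of a normal death within the year

gains = [
    (premium, 1 - p_death),        # customer lives: company keeps the premium
    (premium - benefit, p_death),  # customer dies: company pays the benefit
]

expected_gain = expected_value(gains)
print(round(expected_gain, 2))

# Expected total profit if N such policies are sold (N is hypothetical).
n_policies = 1_000
print(round(n_policies * expected_gain, 2))
```

Changing `p_death` or `premium` shows how sensitive the long-run profit is to the actuarial assumptions.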
Next, ask learners whether a measure of central tendency is the only relevant
summary measure. They should remember that for a set of data, we also need other
summary measures, such as measures of variability. Learners have already met the
concepts of variance and standard deviation when summarizing data. Assist them to
remember that the variance and standard deviation of a set of data are measures of
spread. Tell learners that random variables also have a variance (and a standard
deviation). The variance is derived by getting the expected value of $(X - \mu)^2$, where μ is
the Mean.
To illustrate, go back to the Table on flipping three coins and get the number X of
heads in these three coins. Now, add the two columns with the following heading in
bold (See below.), and fill the corresponding values.
For the life insurance example, the corresponding table is:
Gains       Probability   Deviations   Squared Deviations   Weighted Squared Deviations
10,400      0.999         500          250,000              249,750
-489,600    0.001         -499,500     249,500,250,000      249,500,250
The variance is the sum of the entries on the last column, i.e.,
σ² = 249,750 + 249,500,250 = 249,750,000
while the standard deviation is the square root of the variance
σ ≈ 15,803.48
Remind learners that the standard deviation is the more understandable of the two
measures of spread, since the standard deviation is in the same units as X. For
example, if X is a random variable representing the number of heads in three tosses
of a fair coin, then the unit for the standard deviation is “heads,” while the variance is
in square heads (heads²).
Unlike the mean, there is no simple interpretation for the variance or standard
deviation. The variance though is analogous to the moment of inertia in Physics, but
that is not necessarily widely understood by learners. Stress that, in relative terms,
• a small standard deviation (and variance) means that the distribution of the
random variable is quite concentrated around the mean
• a large standard deviation (and variance) means that the distribution is rather
spread out, with some chance of observing values at some distance from the
mean
Inform learners that, in practice, the variance is not computed with the definition,
but rather using the following result:
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
Thus, the variance is the difference between the expected value of X² and the
square of the mean.
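To see that the definition and the computational result agree, both can be evaluated for the number of heads in three tosses of a fair coin. This is a minimal sketch with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Number of heads in three tosses of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = sum(x * p for x, p in pmf.items())

# Definition: expected value of the squared deviations from the mean.
var_by_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Computational result: expected value of X squared minus the squared mean.
mean_of_squares = sum(x ** 2 * p for x, p in pmf.items())
var_by_shortcut = mean_of_squares - mu ** 2

print(mu, mean_of_squares, var_by_definition, var_by_shortcut)
print(float(var_by_definition) ** 0.5)  # standard deviation, in "heads"
```

Both routes give a variance of 3/4, so the standard deviation is a little under 0.87 heads.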
Explanatory Note: This can be derived from the definition, some algebraic
expansion of a binomial expression, and some properties of expected values (such
as the mean of a constant is the constant):
$\mathrm{Var}(X) = E[(X - \mu)^2] = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2$
It is suggested though that this derivation not be discussed in class. It may be
helpful though to use this computational formula, and to use computers whenever
possible.
D. Enrichment: Mean of a Continuous Random Variable
Ask learners what they think would be the expected value of X if X were continuous
with probability density function f(x). They should provide the equivalent quantity
for a continuous random variable involving an integral. That is, the mean of
a continuous random variable X with probability density function f(x) is
$\mu = E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$
Tell learners that integrals may be viewed as sums if the curve is “discretized.” Also,
tell them when looking at a probability density function, they can locate the mean
by determining the “center of gravity” of the curve, i.e. where the pivot should be
placed to make the probability density function balanced on the x-axis, imagining
that the probability density function is a thin plate of uniform material, with height
f(x) at x.
An important consequence of this is that the mean of any random variable
(continuous or discrete) that has a symmetric distribution is always on the axis of
symmetry of the distribution. For a continuous random variable, this means the axis
of symmetry of the probability density function.
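The idea that an integral is a sum over a “discretized” curve can be illustrated numerically. The sketch below cuts the interval into thin strips and adds up value times probability for each strip; it uses the uniform density on (10, 60) from the late-friend example of the previous lesson, and the function names are illustrative.

```python
def mean_by_sum(density, lo, hi, steps=100_000):
    """Approximate the mean by summing x * f(x) * width over thin strips."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width        # midpoint of the i-th strip
        total += x * density(x) * width   # value times probability of the strip
    return total


def uniform_density(x, a=10.0, b=60.0):
    """Uniform density on (a, b): constant height 1/(b - a)."""
    return 1.0 / (b - a) if a < x < b else 0.0


approx_mean = mean_by_sum(uniform_density, 10.0, 60.0)
print(round(approx_mean, 4))
```

The result is very close to 35, the midpoint of the interval, which is where the axis of symmetry of this density lies.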
KEY POINTS
• The mean (or expected value) of a discrete random variable, say X, is a
weighted average of the possible values of the random variable, where the
weights are the respective probabilities.
• The variance is the expected value of the squared deviations from the mean.
• The standard deviation is the square root of the variance.
ASSESSMENT
1. The probability distributions of four random variables are shown below
For each of these probability distributions:
a. Confirm that the graph represents a probability distribution.
b. Guess what the means are.
c. Compute the actual value of the mean.
d. Provide a guess on which has the smallest variance among the distributions,
and the one with the largest variance
e. Calculate the variance and standard deviation.
Answer:
a. In each of the distributions, the probabilities are all clearly non-negative and
the sum of the probabilities equals one.
b. Visually, we can give good guesses on what the means are: E(X) approx. 6;
E(W) should be 5.5; EW should be 6; ET should be 5.
c. Using a spreadsheet, we can compute the Means as EX=5.7, EW=5.5; EW=6;
ET=5
d. Graphs show most variability for Y and W has the least variability
e. We could calculate the variance with the computing formula
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
so that we only need to obtain the expected value of the squared random
variable, and then subtract from this the square of the mean, and thus verify
that Y has the biggest variance and standard deviation (while W has the
least).
Lesson 7: More about Means and Variances
• explain why the mean and variance are important summary measures of a
probability distribution
• calculate the mean and variance for sums and differences of independent
random variables
• provide an interpretation of the mean (and variance) of a discrete random
variable
PRE-REQUISITE LESSONS: Random Variables, Probability Distributions, Expected
Values
LESSON OUTLINE
A. Review of Means and Variances
B. Motivation: The Chebyshev’s Inequality
C. Main Lesson: Properties of Means and Variances
REFERENCES
Grinstead, C. & Snell, J. (1997). Introduction to Probability: Second Revised Edition.

American Mathematical Society. Retrieved from
http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book
/Chapter6.pdf
Metzler, D. [David Metzler]. (2012, December 4). Example of expected value and
variance of a sum of two independent random variables. Retrieved from
https://www.youtube.com/watch?v=SMl7Jx7fcdE
A. Introduction: Review of Means and Variances
Ask learners to recall the definition of the expected value and variance of a random
variable. They should say:
• The expected value of a discrete random variable is a weighted average of
the possible values of a random variable, with the weights being the
probabilities.
• The variance of a random variable X with mean μ is the expected value of $(X - \mu)^2$.
Help learners remember that computationally, we derive the variance as
$\mathrm{Var}(X) = E(X^2) - \mu^2$
In addition, ask learners to define the standard deviation. They should say it is the
square root of the variance.
B. Motivation: Importance of Mean and Variance (and Standard Deviation) / The
Chebyshev’s Inequality
Learners may wonder why the mean and standard deviation are by far the two most
important summary measures of a distribution (whether a list of data, or for
probability distribution, including a probability density function).
Tell them about a mathematical result derived by a Russian mathematician named
Pafnuty Chebyshev, called Chebyshev’s Inequality, that says that for a
distribution,
(i) at least three fourths of the distribution is within two standard deviations from
the mean;
(ii) at least eight ninths of the distribution is within three standard deviations from the
mean. These bounds may be conservative though.
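Both statements are instances of the general form of Chebyshev's Inequality: at least 1 - 1/k² of a distribution lies within k standard deviations of its mean, for k greater than 1. A minimal sketch of the bound, with an illustrative function name and restricted to whole-number k for exact fractions:

```python
from fractions import Fraction


def chebyshev_lower_bound(k):
    """Least fraction of a distribution within k standard deviations of the mean.

    Uses the general bound 1 - 1/k**2; k is taken to be an integer above 1.
    """
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - Fraction(1, k * k)


for k in (2, 3, 4):
    print(k, chebyshev_lower_bound(k))
```

For k = 2 and k = 3 this returns three fourths and eight ninths, the two cases quoted above.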
In the next set of lessons, learners will discover that if a random variable has a
normal distribution, these limits are even quite conservative: about 95% of a normal
To make it simple, consider a case of two independent random variables, X and Y.
The expected value of the sum of independent random variables X and Y is the sum
of the expected values:
E(X + Y) = E(X) + E(Y)
while the expected value of the difference of X and Y is the difference of the
expected values:
E(X - Y) = E(X) - E(Y)
How about the variance? Explain to learners that if the random variables are
independent, then there is a simple Addition Rule for variances (for a sum of
random variables):
Var( X + Y ) = Var(X) + Var(Y)
What about the variances of a difference? Surprisingly, variance also adds up for a
difference of random variables:
Var( X - Y ) = Var(X) + Var(Y)
Variances are added for both the sum and difference of two independent random
variables because the variation in each variable contributes to the variation in each
case. The variability of the differences increases as much as the variability of sums.
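Learners who doubt that variances add for a difference can check it by listing all outcomes. The sketch below takes X and Y to be the results of two independent fair dice (an illustrative choice, not from the lesson) and computes the means and variances of X + Y and X - Y exactly.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die


def mean_and_variance(dist):
    """Mean and variance of a list of (value, probability) pairs."""
    mu = sum(v * pr for v, pr in dist)
    var = sum((v - mu) ** 2 * pr for v, pr in dist)
    return mu, var


one_die = [(x, p) for x in faces]
# Independence: each pair (x, y) has probability p * p.
total = [(x + y, p * p) for x, y in product(faces, repeat=2)]
difference = [(x - y, p * p) for x, y in product(faces, repeat=2)]

mean_x, var_x = mean_and_variance(one_die)
mean_sum, var_sum = mean_and_variance(total)
mean_diff, var_diff = mean_and_variance(difference)

print(mean_x, var_x)
print(mean_sum, var_sum)
print(mean_diff, var_diff)
```

The means add for the sum and subtract for the difference, while the variance is twice that of a single die in both cases.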
To illustrate this notion about sums (or differences) of random variables, consider a
team of four swimmers competing in a 4 x 100-meter medley relay, with each swimmer
swimming 100 meters. The swimmers’ performances are independent, having the
following means and standard deviations of the times (in seconds) to finish 100
meters.
Swimmer             Mean     Standard Deviation
1 (freestyle)       45.02    0.20
2 (butterfly)       50.25    0.26
3 (backstroke)      51.45    0.24
4 (breaststroke)    56.38    0.22
Recall in the previous lesson that for tossing a fair coin (where p=1/2) three times,
the expected value of the number of heads is 1.5. Writing Xi = 1 if the i-th toss is a
head and Xi = 0 otherwise, we can also derive this as:
E(X1 + X2 + X3) = E(X1) + E(X2) + E(X3) = (1/2) + (1/2) + (1/2) = 1.5
while the variance was 0.75, and we can also obtain this from
Var(X1 + X2 + X3) = Var(X1) + Var(X2) + Var(X3) = (1/2)(1/2) + (1/2)(1/2) +
(1/2)(1/2) = 3(1/2)(1/2) = 0.75
For tossing a fair coin ten times, the expected value of the number of heads is
E(X1 + X2 + X3 + … + X10) = E(X1) + E(X2) + E(X3) + … + E(X10) = 10(1/2) = 5
while the variance here is
Var(X1 + X2 + X3 + … + X10) = Var(X1) + Var(X2) + Var(X3) + … + Var(X10)
= 10(1/2)(1/2) = 5/2 = 2.5
and thus a standard deviation of approximately 1.58.
Using Chebyshev’s Inequality, we know that when tossing a fair coin ten times (and
repeating this coin tossing process many, many times), at least three fourths of the
time, the number of heads would be within about 3 heads of the expected value of
5 heads, where 3 is approximately 2 times the standard deviation (2 x 1.58 = 3.16).
In general, when we have a sequence of independent random variables X1, X2, X3,
…, Xn, with a common mean μ and a common standard deviation σ, then the sum
$S = X_1 + X_2 + \cdots + X_n$
will have an expected value of nμ and a variance of nσ².
If we were to toss a fair coin 100 times, then the expected value of the number of heads
obtained is 100(1/2) = 50, while the variance is 100(1/2)(1/2) = 25, so that the
standard deviation is 5. According to
Chebyshev’s Inequality, at least three fourths of the distribution of the number of
heads in 100 tosses of a fair coin is within 50 – 2(5) = 40 heads to 50 + 2(5) = 60
heads.
For tossing a coin n times where the probability of getting a head is p, if S is the
number of heads, then E(S) = n (p) while Var (S) = n (p) (1-p).
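The formulas E(S) = np and Var(S) = np(1 - p), and the Chebyshev statement for 100 tosses, can be checked against the exact distribution of the number of heads. This is a minimal sketch; the function name `binomial_pmf` and the variable names are illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """Exact distribution of the number of heads in n independent tosses."""
    return {s: comb(n, s) * p ** s * (1 - p) ** (n - s) for s in range(n + 1)}


n, p = 100, Fraction(1, 2)
pmf = binomial_pmf(n, p)

mean = sum(s * pr for s, pr in pmf.items())
var = sum((s - mean) ** 2 * pr for s, pr in pmf.items())

# Exact probability of getting between 40 and 60 heads, inclusive.
within_two_sd = sum(pr for s, pr in pmf.items() if 40 <= s <= 60)

print(mean, var)
print(float(within_two_sd))
```

The mean and variance match np and np(1 - p), and the exact probability of 40 to 60 heads is well above the three fourths guaranteed by Chebyshev's Inequality, which illustrates how conservative the bound can be.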
Remind learners that variances of independent random variables are the ones that
add up (not the standard deviations: variances have squared units, so the intuition
here is the underlying use of the Pythagorean theorem: the square of the
hypotenuse is the sum of squares of the legs). In addition, remind them that
variances of independent random variables add even when we are considering
differences between them.
KEY POINTS
• Adding or subtracting a constant from the distribution of a random variable X
shifts the mean E(X) by the constant but it does not change the variance
E( X ± c ) = E(X) ± c
Var( X ± c ) = Var(X)
• Multiplying or dividing the distribution of X by a constant changes the mean
by a factor equal to the constant, and the variability by the square of the
constant.
E( aX ) = a E(X)
Var( aX ) = a² Var(X)
• The expected value of the sum (difference) of independent random variables
X and Y is the sum (difference) of the expected values.
E(X ± Y) = E(X) ± E(Y)
while the variance of the sum (difference) of the random variables is the sum
of the variances.
Var( X ± Y ) = Var(X) + Var(Y)
ASSESSMENT
1. A grade 12 student uses the Internet to get information on temperatures in the
city where he intends to go for college. He finds information in degrees
Fahrenheit.
Determine the summary statistics equivalents in Celsius scale given °C = (°F - 32)(5/9)
Maximum = 82.4    Range = 23.4                Median = 71.6
Mean = 73.4       Standard Deviation = 7.2    IQR = 10.8
Answers:
The measures of location are sensitive to the transformation.
Maximum in Celsius = (82.4 - 32)(5/9) = 28
Mean in Celsius = (73.4 - 32)(5/9) = 23
Median in Celsius = (71.6 - 32)(5/9) = 22
but the measures of variation will only be sensitive to scale.
Range in Celsius = (23.4)(5/9) = 13
Standard Deviation in Celsius = (7.2)(5/9) = 4
Interquartile Range in Celsius = (10.8)(5/9) = 6
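The two kinds of conversion can be written as two small functions, one for measures of location (shift and scale) and one for measures of variation (scale only). This is a minimal sketch; the function and dictionary names are illustrative.

```python
def location_to_celsius(value_f):
    """Convert a measure of location: subtract 32, then multiply by 5/9."""
    return (value_f - 32) * 5 / 9


def spread_to_celsius(value_f):
    """Convert a measure of variation: only the scale factor 5/9 applies."""
    return value_f * 5 / 9


location_f = {"maximum": 82.4, "mean": 73.4, "median": 71.6}
spread_f = {"range": 23.4, "standard deviation": 7.2, "IQR": 10.8}

for name, value in location_f.items():
    print(name, round(location_to_celsius(value), 2))
for name, value in spread_f.items():
    print(name, round(spread_to_celsius(value), 2))
```

This mirrors the key points of the lesson: subtracting the constant 32 shifts the location measures but leaves the spread measures unchanged, while multiplying by 5/9 rescales both.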
2. Two companies are selling batteries for MP4 players. Company A claims an
average battery life of 12 hours while Company B advertises an average of 14
hours. Why would we need to know the standard deviation of the battery life
before deciding what brand to buy? Suppose the standard deviations are 1.5
hours for A and 1 hour for B, which battery is more likely to last all day?
Answer: While the mean is important, the standard deviation gives information on
variability. Together, they show consistency. Since B has a higher average
and lower variability, it is likely to last longer. According to Chebyshev’s
inequality, at least three fourths of the time, A will last within 9 hours to
15 hours; while B will last within 12 hours to 16 hours at least three fourths
of the time.
3. Suppose that in a casino, a certain slot machine pays out an average of Php15,
with a standard deviation of Php5000. Every play of the game costs a gambler
Php20.
a) Why is the standard deviation so large?

b) If your parent decides to play with this slot machine 5 times, what are the
mean and standard deviation of the casino’s profit?
c) If gamblers play with this slot machine 1000 times in a day, what are the
mean and standard deviation of the casino’s profits?
Answer:
a) Gamblers lose a small amount of money most of the time, but there are a
few large payouts by the slot machine.
In one play of the game, the slot machine loses X pesos to the gambler,
with a mean E(X) = 15, while SD(X) = 5000, or Var(X) = 25,000,000. Note
that for every play of the game, gamblers are charged Php20. Thus, the
casino actually loses only an average of E(X - 20) = Php15 - Php20 = -Php5
(i.e. on average, the casino wins Php5 per game), with a variability of
Var(X - 20) = 25,000,000
b) For 5 independent plays of the game, the expected value of “losses” would be
5 times (-Php5) = -Php25 (i.e. the casino earns Php25 on average over the 5 plays).
Since the variances of independent plays add up, the variability is
Var((X1 - 20) + … + (X5 - 20)) = 5 x 25,000,000 = 125,000,000
and thus, the standard deviation is about Php11,180
c) For 1000 independent plays of the game, the expected losses of the casino would be
1000 x (-Php5) = -Php5000
with a variance of
1000 x 25,000,000 = 25,000,000,000
or a standard deviation of about Php158,114
Lesson 8: The Normal Distribution and Its Properties
• describe a normal random variable and its characteristics
• draw a normal curve
• state the empirical rule
PRE-REQUISITE LESSONS: Random Variables, Probability Distribution of Continuous
Random Variables
KEY CONCEPTS: Mean, Variance of a Random Variable
LESSON OUTLINE
A. Introduction: Review of Continuous Random Variables
B. Motivation: Distribution of Weights of Babies
C. Main Lesson: The Normal Curve and Its Properties
D. Seatwork: Validating the Empirical Rule in the Weights Distribution
E. Enrichment: History regarding the Normal Curve
F. Further Enrichment: Distribution of Balls in The Quincunx
A. Introduction: Review of Continuous Random Variables
Ask learners to recall the definition of a continuous random variable. (It is a random
variable that can take any real value within a specified range, whereas a discrete
random variable takes on a countable number of values.) Learners should also
remember that a continuous variable involves the measurement of something, such
as the height of a randomly selected student, the weight of a newborn baby, or the
length of time that the battery of a cellphone lasts.
B. Motivation: Distribution of Weights of Babies Follows a Bell-shaped Curve
Consider the following data on the hospital weights (in pounds) of all 36
babies born in the maternity ward of a certain hospital.
4.94 4.69 5.16 7.29 7.19 9.47 6.61 5.84 6.83
3.45 2.93 6.38 4.38 6.76 9.01 8.47 6.8 6.4
8.6 3.99 7.68 2.24 5.32 6.24 6.19 5.63 5.37
5.26 7.35 6.11 7.34 5.87 6.56 6.18 7.35 4.21
The data have an average of 6.11 pounds and a standard deviation of 1.61 pounds.
Show learners the histogram for this data set, or ask them to generate it. Help them
observe that the histogram is approximately bell-shaped:
[Figure: histogram of the babies' weights, with weight (2 to 10 pounds) on the horizontal axis and density (0 to 0.3) on the vertical axis]
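If learners are asked to generate the histogram themselves, a short script can do the tallying. The following is a minimal Python sketch, assuming class intervals one pound wide starting at 2 pounds; the bin width and the text-based display are illustrative choices, not part of the original activity.

```python
from collections import Counter

# Hospital weights (in pounds) of the 36 babies
weights = [
    4.94, 4.69, 5.16, 7.29, 7.19, 9.47, 6.61, 5.84, 6.83,
    3.45, 2.93, 6.38, 4.38, 6.76, 9.01, 8.47, 6.8, 6.4,
    8.6, 3.99, 7.68, 2.24, 5.32, 6.24, 6.19, 5.63, 5.37,
    5.26, 7.35, 6.11, 7.34, 5.87, 6.56, 6.18, 7.35, 4.21,
]

# Tally each weight into a one-pound class interval [k, k + 1)
counts = Counter(int(w) for w in weights)

# Print one row of stars per class interval
for lower in range(2, 10):
    print(f"{lower}-{lower + 1} lb | {'*' * counts[lower]}")
```

The rows of stars rise toward the 6-7 pound interval and fall away on both sides, which is the bell shape the histogram is meant to show.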
C. Main Lesson: The Normal Curve and Its Properties
Inform learners that many continuous random variables, such as IQ scores, heights
of people, or weights of M&Ms, have bell-shaped histograms.
Tell them that the most important distribution in statistical science is the normal
distribution, which has a "bell-shaped" curve. Explain that the normal distribution
is considered the most important curve in statistics for reasons such as the following:
(a) Many random variables are either normally distributed or, at least,
approximately normally distributed. Heights, weights, examination scores, and the
log of the length of life of some equipment are a few examples of random variables
that are approximately normally distributed. Although the distributions are only
approximately normal, the approximation is usually quite close.
(b) It is easy for mathematical statisticians to work with the normal curve. A number
of hypothesis tests and the regression model are based on the assumption that
the underlying data have normal distributions. (Extra note: There are, however,
other kinds of continuous distributions that are used in practice. For instance,
the distribution that has been found convenient for modeling the length of life
of a piece of equipment is the Weibull distribution.)
Stress that the normal distribution is a continuous distribution, just like the uniform
and triangular distributions. However, the left and right tails of the normal
distribution extend indefinitely, coming ever closer to the x-axis without touching it.
Draw a picture of the normal (bell-shaped) curve.
Explain that the graph of the normal distribution depends on two factors: the
mean μ and the standard deviation σ. In fact, the mean and standard deviation
characterize the whole distribution. That is, we can get areas under the normal
curve given information about the mean and standard deviation.
Mention that the mean determines the location of the center of the bell shaped
curve. Thus, a change in the value of the mean shifts the graph of the normal curve
to the right or to the left.
Ask learners to recall what the mean, median, and mode of a distribution represent.
They should say (a) the mean represents the balancing point of the graph of the
distribution; (b) the mode represents the “high point” of the probability density
function (i.e. the graph of the distribution); (c) the median represents the point
where 50% of the area under the distribution is to the left and 50% of the area
under the distribution is to the right.
For symmetric distributions with a single peak, such as the normal curve, assist
learners to remember that in this case: Mean = Median = Mode.
Inform learners that the standard deviation determines the shape of the graphs
(particularly, the height and width of the curve). When the standard deviation is
large, the normal curve is short and wide, while a small value for the standard
deviation yields a skinnier and taller graph.
Draw the curves on the board:
Mention to learners that the curve on the left is shorter and wider than the
curve on the right, because the curve on the left has a bigger standard deviation.
Have learners note that a normal curve is symmetric about its mean and
is more concentrated in the middle than in the tails. Aside from that,
observe that normal curves differ in how spread out they are (and that the spread or
variability is measured by the standard deviation σ).
Tell learners that when a random variable X has a normal distribution with mean μ
and variance σ², we denote this as X ~ N(μ, σ²).
Technical Note: The height of a normal curve at some value x is a formidable-looking
expression that depends on the mean μ and standard deviation σ:
$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\}$$
You will notice that this involves three famous numbers in the history of
mathematics:
√2 ≈ 1.414213562,
π ≈ 3.141592654 and
Euler’s number e ≈ 2.7182818
Learners should not, however, be given this expression as they may feel threatened
by it. Instead, use the graphical form of the normal distribution by drawing the bell-
shaped curve.
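For teachers who wish to plot the curve or check its height at a few points, the expression in the Technical Note can be evaluated directly. Below is a minimal Python sketch; the function name normal_pdf is hypothetical, and the mean of 6.11 and standard deviation of 1.61 are borrowed from the babies' weights purely for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coefficient = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Illustrative values only: mean and standard deviation of the babies' weights
mu, sigma = 6.11, 1.61

# Heights one standard deviation below the mean, at the mean, and one above
for x in (mu - sigma, mu, mu + sigma):
    print(f"f({x:.2f}) = {normal_pdf(x, mu, sigma):.4f}")
```

The height is greatest at the mean and is the same one standard deviation on either side of it, which reflects the symmetry of the curve.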
Emphasize the following statements about the normal curve:
• The total area under the normal curve is equal to 1.

• The probability that a normal random variable X equals any particular value
a, P(X=a) is zero (0) (since it is a continuous random variable).
• The probability that X is less than a equals the area under the normal curve
bounded by a and minus infinity (as indicated by the shaded area in the
figure below)
• The probability that X is greater than some value a equals the area under the
normal curve bounded by a and plus infinity (as indicated by the non-shaded
area in the figure above).
• Since the normal curve is symmetric about the mean, the area under the
curve to the right of μ equals the area under the curve to the left of μ, which
equals ½, i.e. the mean μ is the median.
• The probability density function is maximized at μ, i.e. the mode is also the
mean.
• The normal curve has inflection points (i.e. points at which a change in the
direction of curvature occurs) at μ - σ and at μ + σ.
• As x increases without bound (gets larger and larger), the graph approaches,
but never reaches, the horizontal axis. As x decreases without bound (gets
larger and larger in the negative direction), the graph approaches, but never
reaches, the horizontal axis.
Emphasize also to learners that every normal curve (regardless of its mean or
standard deviation) conforms to the following "empirical rule" (also called the 68-
95-99.7 rule):
• About 68% of the area under the curve falls within 1 standard deviation of
the mean.
• About 95% of the area under the curve falls within 2 standard deviations of
the mean.
• Nearly the entire distribution (About 99.7% of the area under the curve) falls
within 3 standard deviations of the mean.
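The three percentages in the empirical rule can be checked numerically. The sketch below relies on a standard identity that is not stated in this guide: for any normal curve, the area within k standard deviations of the mean equals erf(k/√2), where erf is the error function available in Python's math module. The function name area_within is hypothetical.

```python
import math

def area_within(k):
    """Area under any normal curve within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The printed areas are 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% of the rule.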
Explanatory Note: The empirical rule is actually a theoretical result based on an
analysis of the normal distribution. In the first chapter, it was pointed out that the
importance of the mean and standard deviation as summary measures is due to
Chebyshev’s Inequality, which guarantees that the area under a distribution within
two standard deviations from the mean is at least 75%. For nearly all sets of data,
the actual percentage of data may be much greater than the bound specified by
Chebyshev’s Inequality. In fact, for a normal curve, the area within two standard
deviations from the mean is about 95%. Also, about two thirds of the distribution lie
within one standard deviation from the mean and nearly the entire distribution
(99.7%) is within three standard deviations from the mean.
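For reference, Chebyshev's Inequality can be written as below, where k is the number of standard deviations from the mean. Taking k = 2 gives the bound of 75% mentioned in this note, which may be compared with the area of about 95% for a normal curve.

```latex
P\left( |X - \mu| < k\sigma \right) \;\ge\; 1 - \frac{1}{k^2},
\qquad \text{so for } k = 2: \quad 1 - \frac{1}{2^2} = 0.75
```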
D. Seatwork: Validating the Empirical Rule
Ask learners to determine the frequency and relative frequency of babies’
weights that are within:
a) One standard deviation from the mean
Answer: 26 out of 36, or about 72%. These are the weights from μ - σ = 4.50 to
μ + σ = 7.72 pounds. The ten weights outside this range are 2.24, 2.93, 3.45,
3.99, 4.21, 4.38, 8.47, 8.6, 9.01, and 9.47.
b) Two standard deviations from the mean
Answer: 34 out of 36, or about 94%. These are the weights from μ - 2σ = 2.89 to
μ + 2σ = 9.33 pounds. Only 2.24 and 9.47 fall outside this range.
c) Three standard deviations from the mean
Answer: 36 out of 36, or 100%.
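The counts in this seatwork can be verified with a short script. This is a minimal Python sketch, assuming the standard deviation is computed with a divisor of n (the population form, since the data cover all 36 babies); the helper name count_within is hypothetical.

```python
import statistics

# Hospital weights (in pounds) of the 36 babies
weights = [
    4.94, 4.69, 5.16, 7.29, 7.19, 9.47, 6.61, 5.84, 6.83,
    3.45, 2.93, 6.38, 4.38, 6.76, 9.01, 8.47, 6.8, 6.4,
    8.6, 3.99, 7.68, 2.24, 5.32, 6.24, 6.19, 5.63, 5.37,
    5.26, 7.35, 6.11, 7.34, 5.87, 6.56, 6.18, 7.35, 4.21,
]

mu = statistics.mean(weights)       # about 6.11 pounds
sigma = statistics.pstdev(weights)  # population standard deviation, about 1.61 pounds

def count_within(k):
    """Number of weights within k standard deviations of the mean."""
    return sum(1 for w in weights if abs(w - mu) <= k * sigma)

for k in (1, 2, 3):
    n = count_within(k)
    print(f"within {k} SD: {n} of {len(weights)} ({n / len(weights):.0%})")
```

The observed proportions can then be set beside the 68%, 95%, and 99.7% of the empirical rule.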
Remark:
In place of examining the distribution of weights from babies, you may want to
examine heights or weights of learners obtained from the data collection activities
in Lesson 1 of Chapter 1.
E. Enrichment: History regarding the Normal Curve
Historical Notes on the Normal Curve:
(i) The French-English mathematician Abraham de Moivre first described the use of
the normal distribution in 1733 when he was developing the mathematics of
chance, particularly for approximating the binomial distribution. Marquis de
Laplace used the normal distribution as a model of measuring errors.
Adolphe Quetelet and Carl Friedrich Gauss popularized its use. Quetelet
used the normal curve to discuss “the average man” with the idea of using
the curve as some sort of an ideal histogram while Gauss used the normal
curve to analyze astronomical data in 1809.
(ii) In some disciplines, such as engineering, the normal distribution is also called
the Gaussian distribution (in honor of Gauss who did not first propose it!).
The first unambiguous use of the term “normal” distribution is attributed to
Sir Francis Galton in 1889 although Karl Pearson's consistent and exclusive
use of this term in his prolific writings led to its eventual adoption throughout
the statistical community.
F. Further Enrichment: Distribution of Balls in a Quincunx
If learners have access to the Internet, ask them to go to this website:
http://www.mathsisfun.com/data/quincunx.html
On this webpage, they will be shown a quincunx or "Galton Board" (named after Sir
Francis Galton). This is a triangular array of pegs. Balls are dropped onto the top
peg, and then they bounce their way down to the bottom, where they
are collected in little bins. Each time a ball hits one of the pegs, it bounces either to
the left or right with equal probability, and consequently the number of balls
collecting in the bins forms a bell-shaped curve, especially as the number of rows
(and bins) as well as the number of balls increases.
Tell learners to reset the simulator to its defaults, use 6 rows, and drop about 50
balls. Then have them try 12 rows with about 100 balls.
The chance of falling into the kth bin given n rows and a probability p of bouncing left
is given by the so-called binomial probability distribution,
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$$
In particular, for 12 rows (n=12) and a probability of bouncing left of 0.5 (p=0.5), we
can calculate the probability of being in the 5th bin from the right (k=5) as follows:
$$P(X = 5) = \binom{12}{5} (0.5)^5 (0.5)^7 = \frac{792}{4096} \approx 0.193$$
In fact, we can build the entire probability distribution for rows=12 and probability=0.5
like this:
Bin number from right 12 11 10 9 8 7 6
Probability 0.000244 0.00293 0.016113 0.053711 0.12085 0.193359 0.225586
Bin number from right 5 4 3 2 1 0
Probability 0.193359 0.12085 0.053711 0.016113 0.00293 0.000244
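The table of bin probabilities can be regenerated for any number of rows. Here is a minimal Python sketch, assuming the standard binomial formula with n rows and probability p of bouncing left; the function name binomial_pmf is hypothetical, and math.comb requires Python 3.8 or later.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of landing in bin k given n rows and probability p of bouncing left."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 12, 0.5

# Probability for each bin, listed from bin 12 down to bin 0 as in the table
table = {k: binomial_pmf(k, n, p) for k in range(n, -1, -1)}

for k, prob in table.items():
    print(f"bin {k:2d}: {prob:.6f}")
```

Changing n and p lets learners see how the shape of the distribution responds to the number of rows and to an unequal chance of bouncing left.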
KEY POINTS
• The normal distribution, a special continuous distribution, is extremely important in
statistics because many random variables that occur in real applications have
normal distributions (or approximately normal distributions).
• The normal distribution, characterized by its mean μ and its standard deviation σ,
has a graph that is bell-shaped. It is also symmetric about the mean, so that
the mean is the median and is also the mode (since the curve is
highest at the mean).
• The normal curve satisfies the Empirical Rule: (a) Approximately 68% of the area
under the normal curve is within one standard deviation from the mean; (b)
Approximately 95% of the area under the normal curve is within two standard
deviations from the mean; and (c) nearly everything, approximately 99.7% of the
area under the normal curve, is within three standard deviations from the mean.
ASSESSMENT
1. The data below and the accompanying histogram give the weights, to the
nearest hundredth of a gram, of a sample of 100 coins (each with a value of
P10). The mean weight is 8.69 grams and the standard deviation s is
approximately 0.055 gram.
8.57 8.62 8.65 8.67 8.68 8.7 8.71 8.73 8.74 8.77
8.57 8.62 8.65 8.67 8.68 8.7 8.71 8.73 8.74 8.77
8.58 8.63 8.65 8.67 8.69 8.7 8.71 8.73 8.74 8.77
8.59 8.63 8.65 8.67 8.69 8.7 8.72 8.73 8.75 8.78
8.6 8.63 8.65 8.67 8.69 8.7 8.72 8.73 8.75 8.78
8.6 8.63 8.66 8.67 8.69 8.71 8.72 8.73 8.75 8.79
8.61 8.64 8.66 8.68 8.69 8.71 8.72 8.73 8.76 8.79
8.61 8.64 8.66 8.68 8.69 8.71 8.72 8.74 8.76 8.8
8.62 8.64 8.66 8.68 8.7 8.71 8.72 8.74 8.76 8.81
8.62 8.64 8.66 8.68 8.7 8.71 8.72 8.74 8.76 8.81
[Figure: histogram of the coin weights, with weight (8.55 to 8.8 grams) on the horizontal axis and density (0 to 8) on the vertical axis]
a. Compare the mean and median.
b. What percentage of the data is within one standard deviation of the mean?
Within two standard deviations? Within three standard deviations?
c. Suppose you were to randomly select a coin from this collection. What is the
chance that its weight would be within one standard deviation of the mean? Two
standard deviations? Three standard deviations?
d. What percentage of the data is below the mean?
e. Suppose you were to randomly select a coin from this collection. What is the
chance that its weight would be below the mean?
Answer:
a. Very close, the median is 8.7 grams.

b. 67%, 95%, 100%.
c. According to the empirical rule, the chances are 68%, 95% and 99.7%.
d. 48%
e. 50%
2. Fifty students were asked to run a 100-meter dash. The data below (also shown
in a histogram) give the time, in seconds, that each student took to finish the
dash. The mean time for the 50 students is 15.8 seconds, and the standard
deviation s is approximately 3.29 seconds.
16 14 14 16 21 14 17 15 16 21
14 10 9 20 12 12 19 11 15 14
18 18 13 18 23 8 20 13 16 23
16 17 15 18 17 16 13 15 18 19
12 12 15 17 14 16 17 16 16 21
a. Calculate the median and compare it with the mean.

b. What percentage of the data is within one standard deviation of the mean?
Within two standard deviations? Within three standard deviations?
c. Suppose you were to randomly select a student from these fifty students.
What is the chance that the time it took for him/her to run the 100-meter
dash would be within one standard deviation of the mean? Two standard
deviations? Three standard deviations?
Answer:
a. Very close: the median is 16 seconds, while the mean is 15.8 seconds.

b. 70%, 92%, 100%.
c. According to the empirical rule, the chances are 68%, 95% and 99.7%.
3. Toss a fair coin twice and let X be the number of heads obtained. Generate the
histogram for the distribution. Consider tossing the fair coin 3 times, 5 times, 10
times, and 15 times. Generate the histogram for the number of heads also for
these cases.
As the number of tosses increases, what curve can be used to approximate the
histogram?
(a) Probability Distribution for number of heads in 2 tosses of a fair coin
X 0 1 2
P(X=x) 0.25 0.5 0.25
(b) Probability Distribution for number of heads in 3 tosses of a fair coin
X 0 1 2 3
P(X=x) 0.125 0.375 0.375 0.125
(c) Probability Distribution for number of heads in 5 tosses of a fair coin
X 0 1 2 3 4 5
P(X=x) 0.03125 0.15625 0.3125 0.3125 0.15625 0.03125
(d) Probability Distribution for number of heads in 10 tosses of a fair coin
X 0 1 2 3 4 5
P(X=x) 0.000976563 0.009766 0.043945 0.117188 0.205078 0.246094
X 6 7 8 9 10
P(X=x) 0.205078 0.117188 0.043945 0.009766 0.000977
(e) Probability Distribution for number of heads in 15 tosses of a fair coin
X 0 1 2 3 4 5
P(X=x) 3.05176E-05 0.000458 0.003204 0.013885 0.041656 0.091644
X 6 7 8 9 10
P(X=x) 0.15274 0.196381 0.196381 0.15274 0.091644
X 11 12 13 14 15
P(X=x) 0.041656 0.013885 0.003204 0.000458 3.05E-05
(f) As the number n of tosses increases, the normal curve can be used to
approximate the histogram of the number of heads in n tosses of a fair coin
(a “binomial probability distribution”).
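The closeness of the approximation in (f) can be examined numerically. The sketch below compares the probabilities for 15 tosses with the heights of a normal curve. It assumes two standard facts not derived in this lesson: the number of heads in n tosses of a fair coin has mean n/2 and standard deviation √n/2. The function names are hypothetical.

```python
import math

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n tosses, with probability p of heads."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)

n = 15
mu = n * 0.5                      # mean number of heads
sigma = math.sqrt(n * 0.5 * 0.5)  # standard deviation of the number of heads

print(" x  binomial   normal")
for k in range(n + 1):
    print(f"{k:2d}  {binomial_pmf(k, n):.6f}  {normal_pdf(k, mu, sigma):.6f}")
```

Rerunning the comparison with larger values of n shows the two columns drawing closer together.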
4. Suppose the weights of Filipino Grade 11 students are normally distributed with
a mean of 52 kilograms and a standard deviation of 1 kilogram. Explain what this
means in terms of the properties of a normal distribution
Answer:
Let Ω be all Grade 11 Filipino students, and let X denote their weight (in kg).
Then X ~ N(52, 1) kg. That is,
(i) the average weight, the most likely weight, and the median weight are all 52
kg.
(ii) weights of the grade 11 Filipino students as a whole are symmetric about the
weight 52 kg.
(iii) Around 68% of weights are from 51 to 53 kg (µ ± σ); around 95% of weights
are from 50 to 54 kg (µ ± 2σ); and around 99.7% of weights are from 49 to 55 kg
(µ ± 3σ ).
(iv) A histogram of weights of Filipino Grade 11 students creates a “Bell-Shaped
Curve” with the percentages of high and low weights dropping off exponentially.
5. The following data pertain to the points scored by a high school basketball
team in 28 games.
66 75 41 57 54 82 67
42 60 37 49 87 101 78
60 66 48 43 42 61 64
67 37 51 63 68 77 13
The average of the points scored is 59.14286, while the standard deviation is
17.856. The histogram is provided below.
[Figure: histogram of the basketball scores, with points scored (0 to 100) on the horizontal axis and density (0 to 0.03) on the vertical axis]
Can we approximate the distribution of scores with a normal curve, and what does
this mean in terms of the properties of a normal curve?
Answer:
Yes, we can approximate the distribution of basketball scores with a normal curve. Let
X denote the basketball score. Then, approximately, X ~ N(59.14, 17.86²). That is,
(i) the average score, the most likely score, and the median score are all 59 points.
(ii) scores in the basketball games as a whole are symmetric about 59 points.
(iii) Around 68% of scores are from 41 to 77 points (µ ± σ); around 95% of scores are
from 23 to 95 points (µ ± 2σ); and around 99.7% of scores are from 6 to 113
points (µ ± 3σ).
(iv) A histogram of the basketball scores creates an approximate “Bell-Shaped
Curve” with the percentages of high and low scores dropping off exponentially.
Lesson 9: Areas Under a Standard Normal Distribution
• compute probabilities using a table of cumulative areas under a standard normal
curve
• compute percentiles of a (standard) normal curve
PRE-REQUISITE LESSONS: Random Variables, Probability Distribution of Continuous
Random Variables, Properties of Normal Distributions
LESSON OUTLINE
A. Introduction: Review of Normal Distribution

B. Main Lesson: Areas Under a Normal Curve
C. Enrichment: Computing with Excel
A. Introduction: Review of Normal Distribution
Ask learners to recall some of the lessons learned about normal distributions. They
should be able to state that:
• A normal distribution has a symmetric bell-shaped curve (for its probability
density function) with one peak. This curve is characterized by its mean μ
(the center of symmetry, and also the peak) and standard deviation σ (the
distance from the center to the change-of-curvature points on either side). If
a random variable X has a normal distribution with mean μ and variance σ²,
we denote this as X ~ N(μ, σ²).
• A normal curve is symmetric about its mean (thus the mean is the median). It
is more concentrated in the middle and its peak is at the mean (so that the
mean is also the mode).
• Like any continuous distribution, the total area under the normal curve is
equal to 1, and the probability that a normal random variable X equals any
particular value a , P(X=a) is zero (0).
• The normal curve follows the empirical rule (also called the 68-95-99.7 rule):
o About 68% of the area under the curve falls within 1 standard
deviation of the mean.
o About 95% of the area under the curve falls within 2 standard
deviations of the mean.
o Nearly the entire distribution or about 99.7% of the area under the
curve falls within 3 standard deviations of the mean.
B. Main Lesson: Probabilities/Areas Under a Normal Curve
Inform learners that there are countless normal curves, and we could very readily
obtain the probabilities with computers, specifically with the use of spreadsheet
applications or statistical software applications like Microsoft Excel.
They need to examine a special normal distribution called the standard normal
distribution.
The Standard Normal Curve

Define the standard normal distribution to be the normal distribution with a
mean of 0 and a standard deviation of 1, and draw a standard normal curve:
Tell learners that the notation Z ~ N(0, 1) means that the random variable Z has a
standard normal distribution, i.e. μ = 0 and σ = 1. Ask learners to use the empirical
rule to determine the following areas under a standard normal curve:
(a) -1 to +1 (Answer should be 68 percent)
(b) -2 to +2 (Answer should be 95 percent)
(c) -3 to +3 (Answer should be 99.7 percent)
Provide a copy of the handout (found at the end of this lesson plan) to learners.
This is a table of the cumulative distribution function (i.e. the area to the left of
some particular value z) of a standard normal curve. That is, this table reports the
cumulative probability associated with a particular z-score:
Φ(z) = P(Z ≤ z) = area under a standard normal curve to the left of some
particular z
Explain to learners that the table’s rows show the whole number and tenths place of
the z-score, while the table’s columns show the hundredths place, and finally, the
cumulative probability Φ(z) appears in the cell of the table.
For example, a section of the standard normal table is reproduced below. To find
the cumulative probability of a z-score equal to -1.31, explain to students that they
should cross-reference the row of the table containing -1.3 with the column
containing 0.01. The table shows that the probability that a standard normal
random variable will be less than -1.31 is 0.0951; that is, Φ(-1.31) = P(Z ≤ -1.31) =
0.0951.
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
Practice further with learners on how to use this table of cumulative probabilities
under a standard normal curve. Assume that we have a random variable Z that has a
standard normal distribution. Ask them what would be:
(a) P( Z ≤ 0 ): Answer should be 0.5 since the first entry of the first line (of the
second page) for the Table of values of Φ(z) reads so.
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
(b) P( Z ≤ -1.54 ) ; As per Table of values of Φ(z), answer is 0.0618
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
(c) P(-1.54 ≤ Z ≤ 1.54) = 0.8764. Sketch a graph of the area of interest, and
show that the area between -1.54 and 1.54 can be obtained as the difference
between the area to the left of 1.54 and the area to the left of -1.54:
P(-1.54 ≤ Z ≤ 1.54) = P(Z ≤ 1.54) - P(Z ≤ -1.54) = 0.9382 - 0.0618 (as per the
table entries) = 0.8764
(d) P(Z ≥ 1.54) = 0.0618
P(Z ≥ 1.54) is an upper tail area, but the total area under the curve is 1, so
P(Z ≥ 1.54) is the difference between 1 and the area to the left of 1.54, i.e.
1 - P(Z ≤ 1.54) = 1 - 0.9382 = 0.0618
Alternatively, by symmetry, P(Z ≥ 1.54) = P(Z ≤ -1.54) = 0.0618
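The table look-ups in (a) to (d) can be cross-checked by computer. The sketch below uses the standard identity Φ(z) = ½[1 + erf(z/√2)], which is not given in this guide, together with the error function in Python's math module. The function name phi is hypothetical.

```python
import math

def phi(z):
    """Cumulative area to the left of z under the standard normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_a = phi(0)                   # (a) P(Z <= 0)
p_b = phi(-1.54)               # (b) P(Z <= -1.54)
p_c = phi(1.54) - phi(-1.54)   # (c) P(-1.54 <= Z <= 1.54)
p_d = 1 - phi(1.54)            # (d) P(Z >= 1.54)

for label, value in (("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)):
    print(f"({label}) {value:.4f}")
```

Rounded to four decimal places, the results are 0.5000, 0.0618, 0.8764, and 0.0618, in agreement with the table.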
Technical Computing Note:
The formula below can be used to approximate Φ(z) for z ≥ 0:
$$\Phi(z) \approx 1 - \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \left( a_1 y + a_2 y^2 + a_3 y^3 + a_4 y^4 + a_5 y^5 \right),$$
where
$$y = \frac{1}{1 + 0.2316419\,z}; \quad a_1 = 0.319381530; \quad a_2 = -0.356563782; \quad a_3 = 1.781477937; \quad a_4 = -1.821255978; \quad a_5 = 1.330274429.$$
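The approximation in this note is straightforward to program. Below is a minimal Python sketch using the constants given above; the handling of negative z through the symmetry Φ(-z) = 1 - Φ(z) and the comparison against an erf-based value are additions made here for illustration, and the function names are hypothetical.

```python
import math

# Coefficients a1 to a5 from the Technical Computing Note
A = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def phi_approx(z):
    """Polynomial approximation to the standard normal CDF."""
    if z < 0:
        # Symmetry of the normal curve: area left of -z equals area right of z
        return 1.0 - phi_approx(-z)
    y = 1.0 / (1.0 + 0.2316419 * z)
    poly = sum(a * y ** (i + 1) for i, a in enumerate(A))
    density = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    return 1.0 - density * poly

def phi_erf(z):
    """Reference value of the standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-1.31, 0.0, 1.54):
    print(f"z = {z:5.2f}: approx = {phi_approx(z):.6f}, erf-based = {phi_erf(z):.6f}")
```

The two columns agree closely enough that either can be used to reproduce the four-decimal entries of the table.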
Percentiles may also be obtained from the table. Illustrate this with an example such
as obtaining the values of z for which (a) the area to the left of z in a
standard normal curve is 0.5832; (b) the area to the right of z is 0.8508.
Show that we can find z directly for (a), by looking for the value of z that gives an entry
of 0.5832. In this case, we find z to be 0.21.
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
For (b), show first that if the area to the right of z is 0.8508, then the area to the
left of z is 1 - 0.8508 = 0.1492, so that, from the table, z = -1.04.
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
Computing Note:
If computers are available, show learners that we could alternatively use Excel to
obtain (a) and (b). Merely enter the command
= NORMSINV(0.5832)
and generate the value of z as 0.210086 for (a).
While for (b), we enter the command
= NORMSINV(1-0.8508)
and thus find z as –1.03987.
The 5th percentile of the standard normal curve can be obtained as -1.645, since as
per the table of values of Φ(z), we find that Φ(-1.64)=0.0505 and Φ(-1.65)=0.0495,
so that interpolating halfway between them yields Φ(-1.645)≈0.05.
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
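Where Excel is not available, the same percentiles can be obtained in Python. This is a minimal sketch, assuming Python 3.8 or later, whose standard library provides statistics.NormalDist with an inv_cdf method that plays the role of NORMSINV here; the variable names are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
Z = NormalDist(mu=0.0, sigma=1.0)

z_a = Z.inv_cdf(0.5832)      # (a) area to the left of z is 0.5832
z_b = Z.inv_cdf(1 - 0.8508)  # (b) area to the right of z is 0.8508
z_5 = Z.inv_cdf(0.05)        # 5th percentile

print(f"(a) z = {z_a:.4f}")
print(f"(b) z = {z_b:.4f}")
print(f"5th percentile: z = {z_5:.3f}")
```

Rounded to two decimal places, (a) and (b) give 0.21 and -1.04, the values read from the table, and the 5th percentile rounds to -1.645.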
Provide a copy of handout 2-07-2 on selected percentiles of the standard normal
distribution (found at the end of this lesson plan) that can be used for later reference.
C. Enrichment: Computing with Excel
(May be skipped, especially if class has no access to computers)
Inform learners that the table entries of the cumulative distribution function
(probabilities) can also be readily generated with Microsoft Excel, with the
NORMSDIST function. For instance, the area to the left of -1.31 in a standard
normal curve can be obtained by entering the following
=NORMSDIST(-1.31)
into any cell of the spreadsheet.
KEY POINTS
• The standard normal distribution is a normal distribution with a mean of 0
and a standard deviation of 1.
• Tables of the Cumulative Distribution Function of a Standard Normal
Distribution can be used to generate various areas of a standard normal
curve as well as percentiles of the distribution.
ASSESSMENT
1. The standard normal distribution

a. has a mean of zero (0) and a standard deviation of 1.
b. has a mean of 1 and a variance of zero (0).
c. has an area equal to 0.5.
d. cannot be used to approximate discrete probability distributions.
Answer: A
2. If Z has a standard normal distribution, and P (0 < Z < z ) is 0.3770, then the
value of z is
a. 0.18
b. 0.81
c. 1.16
d. 1.47
Answer: C
3. True or False: The probability that a standard normal random variable, Z, falls
between – 1.50 and 0.81 is 0.7242.
Answer: True
4. Suppose Z has a standard normal distribution with a mean of 0 and standard
deviation of 1. The probability that Z is less than 1.15 is __________.
Answer: 0.8749
5. Suppose Z has a standard normal distribution with a mean of zero (0) and
standard deviation of 1. The probability that Z values are larger than __________
is 0.3483.
Answer: 0.39
6. Suppose Z has a standard normal distribution with a mean of zero (0) and
standard deviation of 1. 85% of the possible Z values are smaller than
__________.
Answer: 1.04
7. Let Z be a standard normal random variable. Calculate:

(a) P(Z ≤ 1.43);
(b) P(Z> 1.43);
(c) P( –1.43 ≤ Z ≤ 1.43).
Answer: To solve (a), we merely read off directly the entry from the
Table of Cumulative Distribution Function values of a Standard Normal
Curve. Reproducing the needed part of this Table,
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
We find that the area under the curve is Φ(1.43) = 0.9236.
To solve (b), note that the sum of the area to the right and the area to
the left is the total area under the curve (100%), so that the area to the
right is 100% minus the area to the left. Thus, for (b), since the area to
the left of z = 1.43 is 0.9236 and the total area under the curve is 100%,
the area to the right of z = 1.43 is 1 - 0.9236 = 0.0764.
[Figure: area to the right of z = 1.43 under the standard normal curve (µZ = 0, σZ = 1) = 100% - area to the left of z = 1.43]
To obtain (c), note that by symmetry, since the area to the right of z =
1.43 is 0.0764, the area to the left of z = -1.43 is also 0.0764. Hence, the
area between -1.43 and +1.43 is 1 - 2(0.0764) = 1 - 0.1528 = 0.8472.
Alternatively, we obtain this probability as Φ(1.43) - Φ(-1.43) =
0.9236 - 0.0764 = 0.8472.
8. The Inter-quartile Range is the difference between the Third Quartile (i.e., the
75th percentile), and the First Quartile (the 25th percentile). Calculate the
Interquartile Range of a standard normal distribution.
Answer: We can obtain the IQR of a standard normal curve by
generating the upper and lower quartiles of the distribution. From the
Table, we have the upper and lower quartiles as 0.675 and –0.675,
respectively. Thus the IQR, the difference between the upper and
lower quartiles, is 1.35.
HANDOUT 2-07-1
Cumulative Distribution Function (CDF) of the Standard Normal Curve
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-3.8 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
-3.7 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
-3.6 0.0002 0.0002 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
-3.5 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002
-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
-2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
-0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
-0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
-0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
-0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
HANDOUT 2-07-1
CDF of the Standard Normal Curve (cont’d)
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
3.5 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998 0.9998
3.6 0.9998 0.9998 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.7 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
3.8 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999 0.9999
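The entries of this table can be reproduced numerically. The sketch below is one possible way to do so, assuming Python is available; it computes the cumulative area from the error function in the standard `math` module, and the helper name `phi` is chosen here for illustration. Because the table entries are rounded to four decimal places, a difference of two table entries may differ from the software value in the last digit.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative area to the left of z under the standard normal curve."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Reproduce a few table entries, rounded to four decimal places as in the handout.
for z in (-1.43, 0.0, 1.43):
    print(f"z = {z:5.2f}   F(z) = {phi(z):.4f}")

# Area between -1.43 and 1.43, computed without intermediate rounding.
area_between = phi(1.43) - phi(-1.43)
print(f"F(1.43) - F(-1.43) = {area_between:.4f}")
```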
HANDOUT 2-07-2
Selected Percentiles of the Standard Normal Distribution
z F(z)
-2.326 0.01
-1.96 0.025
-1.645 0.05
-1.282 0.10
-0.842 0.20
-0.675 0.25
0.00 0.50
0.675 0.75
0.842 0.80
1.282 0.90
1.645 0.95
1.96 0.975
2.326 0.99
Lesson 10: Areas under a Normal Distribution
• convert a normal random variable to a standard normal variable and vice versa
• compute probabilities using a table of cumulative areas under a standard normal
curve
• compute percentiles of a normal curve
PRE-REQUISITE LESSONS: Random Variables, Probability Distribution of
Continuous Random Variables, Properties of Normal Distributions
LESSON OUTLINE
B. Main Lesson: Areas Under a Normal Curve
REFERENCES
Welfredo Patungan, Nelia Marquez). Philippines: Rex Bookstore.

Laguna 4031
Probability and statistics: Module 22: Exponential and normal distributions (2013).
Australian Mathematical Sciences Institute and Education Services Australia. Retrieved
from
http://www.amsi.org.au/ESA_Senior_Years/PDF/ExpoNormDist4f.pdf
Ask learners to recall some of the lessons learned about normal distributions. They
should be able to state that
• A normal distribution has a symmetric bell-shaped curve (for its probability
density function) with one peak. This curve is characterized by its mean μ
(the center of symmetry, and also the peak) and standard deviation σ (the
distance from the center to the change-of-curvature points on either side). If
a random variable X has a normal distribution with mean μ and variance σ²,
we denote this as X ~ N(μ, σ²).
• A normal curve is symmetric about its mean (thus the mean is the median). It
is more concentrated in the middle and its peak is at the mean (so that the
mean is also the mode).
• Like any continuous distribution, the total area under the normal curve is
equal to 1, and the probability that a normal random variable X equals any
particular value a, P(X=a) is zero (0).
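• The normal curve follows the empirical rule (also called the 68-95-99.7 rule):
o About 68% of the area under the curve falls within 1 standard
deviation of the mean.
o About 95% of the area under the curve falls within 2 standard
deviations of the mean.
o Nearly the entire distribution, about 99.7% of the area under the
curve, falls within 3 standard deviations of the mean.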
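The 68-95-99.7 rule can be verified numerically. The following is a small Python sketch, offered only as an optional check; the function name `area_within` is an illustrative choice and not part of the lesson.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Areas within 1, 2 and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s) of the mean: {area_within(k):.4f}")
```

Because any normal curve can be standardized, the same areas apply whatever the values of the mean and standard deviation.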
B. Main Lesson: Probabilities/Areas Under a Normal Curve
Tell learners that, given a normally distributed random variable X ~ N(μ, σ²), we often
wish to find various probabilities pertaining to where an arbitrary measurement may
lie. For instance, we may want to find P(a ≤ X ≤ b), which is the probability that a
random measurement X lies between a and b.
We may also wish to find the proportion of measurements less than a value k (or at
most k), denoted by P(X < k) (or P(X ≤ k)). Remind learners that it does not
matter whether we are considering P(X < k) or P(X ≤ k), since P(X = k) = 0.
Finally, we may want the proportion greater than k (or at least k), denoted by
P(X > k) (or P(X ≥ k)).
In the last session, learners were given a lesson on the standard normal distribution.
We again make use of areas under a standard normal curve, but we first need to
convert the normal distribution at hand into standardized form.
Standard Scores (or Z-scores)
Whatever the value of the mean and standard deviation of a normal curve, we can
transform the whole normal curve into a standard normal curve (as illustrated in the
following figure).
This entails transforming all the data in a normal curve into standard units.
An observation is in standard units (i.e., it is a z-score) if it is expressed as the
number of standard deviations it lies above or below the average. That is, if x, μ,
and σ respectively represent the observation, its mean, and its standard deviation,
then the standardized form (or z-score) of x is
z = (x - μ) / σ
Reiterate to learners that a Z-score indicates how many standard deviations a certain
data element is from the mean. For instance, if examination scores in Statistics and
Probability have an average of 75 and a standard deviation of 5, then an exam score of
90 has a z-score of (90-75)/5 = 3 , while a score of 70 has a z-score of (70-75)/5 =-1. To
interpret these z-scores, we note that 90 is 3 standard deviations above the mean (75),
while 70 is one standard deviation “below” the mean.
Z-scores offer a good way of making variables comparable. Suppose a student got
an examination score of 90 in Statistics and Probability (where the mean was 75 and
the standard deviation was 5) and a 92 in an English examination (where the mean was
95 and the standard deviation was 3). While it might seem that the Statistics and
Probability score is “lower” (in absolute numbers) than the English score, the z-score
in Statistics and Probability is (90-75)/5 = 3 while the z-score in English is
(92-95)/3 = -1, so the “relative” performance in English (in relation to the average) is
actually lower than the relative performance in Statistics and Probability.
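The comparison of the two examination scores can be sketched in a few lines of Python. This is only an illustration; the function name `z_score` and its argument names are chosen here and do not come from the lesson.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / sd

# Scores, means and standard deviations taken from the example in the text.
statistics_z = z_score(90, mean=75, sd=5)
english_z = z_score(92, mean=95, sd=3)

print(f"Statistics and Probability: z = {statistics_z:+.1f}")
print(f"English:                    z = {english_z:+.1f}")
```

A positive z-score places the score above its class mean and a negative one below it, so the two scores can be compared even though they come from different examinations.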
Z-scores may also be used to transform normal random variables into standard
normal random variables, and this, in turn, can help us relate probabilities for
any normal distribution to areas under a standard normal curve, as the following
example on the heights of students illustrates.
Illustration for Finding Areas Under a Normal Curve
Ask learners to assume that the distribution of heights of all female Grade 11 students
can be modeled well by a normal curve with a mean of 1620 mm and a standard
deviation of 50 mm. Further, we wish to determine (a) the proportion of female Grade
11 students shorter than 1550 mm; (b) the proportion of female Grade 11 students
taller than 1650 mm; (c) the proportion of female Grade 11 students between 1600 and
1675 mm; (d) the height of a female Grade 11 student for which 10 percent of female
Grade 11 students are shorter than it; (e) the height of a female Grade 11 student for
which 75% of female Grade 11 students are taller than it.
For computing the answer to (a), tell learners to first transform 1550 to its z-score,
yielding (1550-1620)/50 = -1.4, so that we can associate the area to the left of 1550
(under a normal curve with mean 1620 and standard deviation 50) with the area
to the left of z = -1.4 under a standard normal curve. Reading from the table of the
Cumulative Distribution Function of a Standard Normal Curve, we find Φ(-1.4) =
0.0808, so about 8.08% of female Grade 11 students are shorter than 1550 mm.
For (b), ask learners what they should do. They should say they need to first transform
the height value 1650 to its standard units, (1650-1620)/50 = 0.6, and then note that
the area to the right of z = 0.6 under the standard normal curve is the difference
between the total area under a standard normal curve (100%) and the area to the left
of z = 0.6, Φ(0.6) = 0.7257. In consequence, the desired probability (and area) is
1 - 0.7257 = 0.2743.
Likewise, for (c), learners should mention they need to firstly transform 1600 and 1675
into their respective standardized forms, namely (1600-1620)/50 = -0.4 and (1675-
1620)/50 = 1.1, and then generate the area between these two z-scores as the
difference between Φ (1.1) and Φ (-0.4), i.e. 0.8643-0.3446=0.5197.
For (d), draw the figures on the board to illustrate what needs to be done:
Show that the 10th percentile of the height distribution may be obtained by first
getting the 10th percentile of the standard normal curve, which can be read off from
the table as -1.282. This means that the 10th percentile of the height distribution is
1.282 standard deviations below the mean. The required value for the height is
-1.282(50) + 1620 = 1555.9.
Finally, for (e), suggest to learners that we want the 25th percentile, as this is the
value for which 75 percent of the height distribution would be above it. Similar to
(d), tell students they can first find the 25th percentile of a standard normal curve
(-0.675), then obtain the required height as:
-0.675(50) + 1620 = 1586.25.
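The two-step procedure used in (a) to (e), standardize first and then use the standard normal curve, can be sketched in Python as follows. This is an optional illustration, not the method prescribed by the lesson; the helper names `to_z` and `from_z` are assumptions made here. Small differences from the table-based answers, for example in (e), come from the table's rounding of the percentiles.

```python
from statistics import NormalDist

MEAN_MM, SD_MM = 1620, 50      # height model from the illustration
Z = NormalDist()               # standard normal curve

def to_z(x):
    """Convert a height in mm to its z-score."""
    return (x - MEAN_MM) / SD_MM

def from_z(z):
    """Convert a z-score back to a height in mm."""
    return MEAN_MM + z * SD_MM

shorter_than_1550 = Z.cdf(to_z(1550))                      # (a) area to the left
taller_than_1650 = 1 - Z.cdf(to_z(1650))                   # (b) area to the right
between_1600_1675 = Z.cdf(to_z(1675)) - Z.cdf(to_z(1600))  # (c) area in between
tenth_percentile = from_z(Z.inv_cdf(0.10))                 # (d) 10% are shorter
twenty_fifth_percentile = from_z(Z.inv_cdf(0.25))          # (e) 75% are taller

print(f"(a) {shorter_than_1550:.4f}  (b) {taller_than_1650:.4f}  (c) {between_1600_1675:.4f}")
print(f"(d) {tenth_percentile:.1f} mm  (e) {twenty_fifth_percentile:.1f} mm")
```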
In the last lesson, the NORMSDIST function was illustrated in the Enrichment part of
the lesson. There are other important Excel functions for the normal distribution,
especially the NORMDIST and NORMINV functions.
The NORMDIST(x, mu, sigma, cumulative) function helps obtain cumulative probabilities
for general normal curves. The parameters x, mu and sigma are numeric values, while the
parameter cumulative is a logical TRUE or FALSE value. Note that sigma must be
greater than 0 (as it is a non-trivial standard deviation), but there are no similar
requirements for x or mu.
To illustrate, recall the female students’ height example, where we were first interested
in obtaining P(X ≤ 1550 given μ = 1620 and σ = 50), where X is the height of a
randomly selected female Grade 11 student. Students can merely use the NORMDIST
function, which asks for the score (1550), the mean (μ = 1620) and the standard
deviation (σ = 50) of the normal distribution:
= NORMDIST(1550,1620,50,TRUE)
Note that the final argument TRUE tells Excel that we wish to obtain the area to the left
(rather than the height of the normal curve).
Also, to compute P(X ≥ 1650 given μ = 1620 and σ = 50), learners can specify in
Microsoft Excel the command
= 1-NORMDIST(1650,1620,50,TRUE)
For P(1600 ≤ X ≤ 1675 given μ = 1620 and σ = 50), learners can enter
= NORMDIST(1675,1620,50,1) - NORMDIST(1600,1620,50,1)
The NORMINV (p, mu, sigma) function of Excel returns the value x such that, with
probability p, a normal random variable with mean mu and standard deviation sigma
takes on a value less than or equal to x. That is, the value returned is the (100 times
p)th percentile of the normal curve with mean mu and standard deviation sigma.
For instance, to obtain the 10th percentile of the distribution for the heights of female
Grade 11 students, merely enter in Excel
=NORMINV(0.1,1620,50)
The 25th percentile (the value for which 75 percent are above it) can be obtained with:
=NORMINV(0.25,1620,50)
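Where a spreadsheet is not available, the same quantities can be obtained in Python. The sketch below pairs each Excel formula from this lesson with a call on `statistics.NormalDist` from the Python standard library; the variable names are illustrative, and the pairing with the FALSE option of NORMDIST is added here only to show what the fourth argument controls.

```python
from statistics import NormalDist

# Normal model for the heights: mean 1620 mm, standard deviation 50 mm.
heights = NormalDist(mu=1620, sigma=50)

# = NORMDIST(1550,1620,50,TRUE): area to the left of 1550
p_below_1550 = heights.cdf(1550)

# = 1-NORMDIST(1650,1620,50,TRUE): area to the right of 1650
p_above_1650 = 1 - heights.cdf(1650)

# = NORMDIST(1675,1620,50,1) - NORMDIST(1600,1620,50,1): area between 1600 and 1675
p_between = heights.cdf(1675) - heights.cdf(1600)

# =NORMINV(0.1,1620,50) and =NORMINV(0.25,1620,50): percentiles
tenth_percentile = heights.inv_cdf(0.10)
twenty_fifth_percentile = heights.inv_cdf(0.25)

# With FALSE as the last argument, NORMDIST gives the height of the curve instead.
curve_height_at_mean = heights.pdf(1620)

print(round(p_below_1550, 4), round(p_above_1650, 4), round(p_between, 4))
print(round(tenth_percentile, 1), round(twenty_fifth_percentile, 1))
```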
KEY POINTS
• To obtain probabilities or percentiles under a normal curve, perform two steps:
Transform the normal curve into a standard normal curve by way of “z-scores”
(which involves subtracting the mean and dividing the result by the standard
deviation)
z = (X - μ) / σ.
Then, use the tables of the Cumulative Distribution Function of a Standard
Normal Distribution to obtain the required areas of a standard normal curve to
find the probabilities associated with the z-scores.
ASSESSMENT
1. If a particular batch of data is approximately normally distributed, we would find that
approximately
a) 2 of every 3 observations would fall between ± 1 standard deviation around the
mean.
b) 4 of every 5 observations would fall between ± 1.28 standard deviations around
the mean.
c) 19 of every 20 observations would fall between ± 2 standard deviations around
the mean.
d) All the above.
Answer: d
For problems 2 to 4 consider the following case.
The length of time it takes a Grade 11 student to play one game of the Candy Crush computer
app follows a normal distribution with a mean of 3.5 minutes and a standard deviation of 1 minute.
2. The probability that a randomly selected Grade 11 student will play one game of Candy
Crush in less than 3 minutes is
a) 0.3551
b) 0.3085
c) 0.2674
d) 0.1915
Answer: b
3. The probability that a randomly-selected grade 11 student will take between 2 and 4.5
minutes to play Candy Crush is:
a) 0.0919
b) 0.2255
c) 0.4938
d) 0.7745
Answer: d
4. The time that 75.8% of the Grade 11 students exceed when playing one game of
Candy Crush is
a) 2.8 minutes
b) 3.2 minutes
c) 3.4 minutes
d) 4.2 minutes
Answer: a
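Teachers who wish to check the answer key for problems 2 to 4 can do so with a short Python sketch such as the one below. It is not part of the assessment itself, and the variable names are illustrative.

```python
from statistics import NormalDist

# Playing time model from the problem: mean 3.5 minutes, standard deviation 1 minute.
play_time = NormalDist(mu=3.5, sigma=1.0)

# Problem 2: probability of finishing in less than 3 minutes.
p_under_3 = play_time.cdf(3.0)

# Problem 3: probability of taking between 2 and 4.5 minutes.
p_2_to_4_5 = play_time.cdf(4.5) - play_time.cdf(2.0)

# Problem 4: the time exceeded by 75.8% of students has 24.2% of the area to its left.
time_exceeded_by_758 = play_time.inv_cdf(1 - 0.758)

print(f"{p_under_3:.4f}  {p_2_to_4_5:.4f}  {time_exceeded_by_758:.1f} minutes")
```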
5. Rodrigo earned a score of 940 on a national achievement test. The mean test
score was 850 with a standard deviation of 100. What proportion of students
had a higher score than Rodrigo? (Assume that test scores are normally
distributed.) If there were 100,000 students who took the test, how many would
be expected to have a higher score than Rodrigo?
Answer :
Assuming that test scores are normally distributed, the solution involves
three steps.
First, we transform Rodrigo's test score into a z-score, using the z-score
transformation equation.
z = (X - μ) / σ = (940 - 850) / 100 = 0.90
Then, using a standard normal distribution table, we find the cumulative
probability associated with the z-score. In this case, we find P(Z < 0.90) =
0.8159.
Therefore, P(Z > 0.90) = 1 - P(Z < 0.90) = 1 - 0.8159 = 0.1841.
Thus, we estimate that 18.41 percent of the students tested had a higher
score than Rodrigo. If there were 100,000 students who took the exam,
then we expect
100,000 x (0.1841 ) = 18,410 students
to have scores higher than Rodrigo’s.
6. Every night when you get home from school, you take your dog Bantay for a
walk. The length of the walk is normally distributed with a mean of μ = 15
minutes and a standard deviation of σ = 3 minutes.
(a) What proportion of walks last less than 15 minutes?
(b) What proportion of walks last longer than 20 minutes?
(c) What proportion of walks last between 10 and 16 minutes?
Answer :
(a) P(X < 15 given μ=15 and σ=3) = P(Z < (15-15)/3 = 0) = 0.5
(b) P(X > 20 given μ=15 and σ=3) = P(Z > (20-15)/3 = 1.67) = 1 - P(Z ≤ 1.67)
= 1 - 0.9525 = 0.0475
(c) P(10 ≤ X ≤ 16 given μ=15 and σ=3) = P((10-15)/3 ≤ Z ≤ (16-15)/3) =
P(-1.67 ≤ Z ≤ 0.33) = Φ(0.33) - Φ(-1.67) = 0.6293 - 0.0475 = 0.5818
7. Suppose scores on an IQ test are normally distributed. If the test has a mean of
100 and a standard deviation of 10, what is the probability that a person who
takes the test will score between 90 and 110?
Answer : Here, we want to know the probability that the test score falls
between 90 and 110. The "trick" to solving this problem is to realize the
following:
P( 90 < X < 110 ) = P( X < 110 ) - P( X < 90 ) = P(Z < (110-100)/10 ) – P(Z<
(90-100)/10 ) = 0.84 - 0.16 = 0.68
Thus, about 68% of the test scores will fall between 90 and 110.
Alternatively, notice we are getting the scores within one standard
deviation from the mean, so the empirical rule will suggest the chance to
be 68%.
8. The following letter appeared in the popular “Dear Abby” newspaper advice
column in the 1970s:
Dear Abby: You wrote in your column that a woman is pregnant for 266 days. Who said
so? I carried my baby for ten months and five days, and there is no doubt about it
because I know the exact date my baby was conceived. My husband is in the Navy and
it couldn’t have possibly been conceived any other time because I saw him only once
for an hour, and I didn’t see him again until the day before the baby was born.
I don’t drink or run around, and there is no way this baby isn’t his, so please print a
retraction about the 266-day carrying time because otherwise I am in a lot of trouble. -
San Diego Reader
The advice column was founded in 1956 by Pauline Phillips under the pen name
"Abigail Van Buren" and carried on up to today by her daughter, Jeanne
Phillips, who now owns the legal rights to the pseudonym.
Suppose that according to pediatricians, pregnancy durations, let’s call them X,
tend to be normally distributed with μ = 266 days and σ = 16 days. Perform a
probability calculation that addresses San Diego Reader’s credibility, presuming
she was pregnant for 308 days. What would you conclude and why?
Answer:
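The chance of being pregnant for 308 or more days is
P(X > 308 days, given μ = 266 days and σ = 16 days) = P(Z > (308-266)/16) =
P(Z > 2.625) = 1 - 0.9957 = 0.0043,
which would happen in about 43 out of 10,000 pregnancies. A pregnancy this long is
therefore rare but not impossible, so the calculation by itself does not disprove
the reader's claim.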
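The tail probability in this answer can be checked with a short Python sketch. It is offered only as an optional verification and assumes the same normal model (mean 266 days, standard deviation 16 days) stated in the problem; the variable names are illustrative.

```python
from statistics import NormalDist

# Pregnancy duration model stated in the problem.
pregnancy_days = NormalDist(mu=266, sigma=16)

# z-score of a 308-day pregnancy and the area to its right.
z = (308 - 266) / 16
p_at_least_308 = 1 - pregnancy_days.cdf(308)

# Expected number of such pregnancies out of 10,000.
expected_per_10000 = 10_000 * p_at_least_308

print(f"z = {z}, P(X > 308) = {p_at_least_308:.4f}, about {expected_per_10000:.0f} per 10,000")
```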
CHAPTER 3: SAMPLING
Lesson 1: Coin Tossing Revisited from a Statistical Perspective
OVERVIEW OF LESSON
In this activity, learners revisit the coin tossing activity but this time, they look into how
the probability of getting a head is an unknown constant and needs to be estimated.
Other illustrations on sampling and estimation are also discussed.
• describe random sampling
• distinguish between (population) parameter and (sample) statistic
• describe sampling distributions of statistics (sample mean)
• discuss the Central Limit Theorem
LESSON OUTLINE
A. Introduction / Motivation : A Coin Need Not Be Fair
B. Main Lesson: Estimation of Probability of Getting a Head in a Single Toss of a
Coin
C. Data Collection
D. Data Analysis and Interpretation
E. Enrichment
REFERENCES
Richardson, M. Using Dice to Introduce Sampling Distributions. STatistics Education
Web (STEW). Retrieved from
http://www.amstat.org/education/stew/pdfs/UsingDicetoIntroduceSamplingDistributions.doc
Workbooks in Statistics 1: 11th Edition. Institute of Statistics, UP Los Baños, College,
Laguna 4031
Probability and statistics: Module 24. (2013). Australian Mathematical Sciences Institute.
Retrieved from
http://www.amsi.org.au/ESA_Senior_Years/PDF/InferenceProp4g.pdf
KEY CONCEPTS: Sampling, Estimation, Sampling Variation, Standard Error,
Central Limit Theorem
MATERIALS NEEDED: 1-peso coin per student
A. Introduction / Motivation: A Coin Need Not Be Fair
Learners may have heard of a “sample” of data being used: opinion polls that
estimate the fraction of voters who are likely to vote for a particular candidate in
the next presidential election; measurements of the heights and weights of
senior high school learners (done in the first chapter); or an experiment
on a sample of patients who are randomly allocated to either (a) a treatment group,
which is given some medical treatment, or (b) a control group, which is given a
placebo (such as a harmless salt solution) to control for the psychological effects
of being given a treatment.
The context here for sampling is to recall the coin-tossing experiment in Lesson 2-03
that involves tossing a one-peso coin, with the class getting either a head (H), the
face of Rizal on top, or a tail (T), the other side up. This time, however, it is crucial to
point out that the class does not assume beforehand that the coin is fair. That is, while
they may be able to assume that the probability of getting a head on a single toss
of a coin, P(H) = p, is a constant, it is not known; and thus, they would like to
estimate p. This makes the situation a statistical one that involves uncertainty.
Tell learners to suppose that they were to flip the coin n times. (Later on they will
begin to refer to this as taking a sample of size n.) The random variable X, as
defined before in Lesson 2-03, can take on values {0,1,2, … , n} and the number of
outcomes favorable to each value can be read off of Pascal’s Triangle. However,
this time the outcomes are no longer equally likely. Moreover, the probabilities
cannot be computed exactly because they are functions of p, which is not known to
them.
Note to teacher: In the previous chapter, learners have learned that the
probability of independent events happening simultaneously is the product of
their probabilities. So, the probability of x heads and n-x tails in a specific sequence is
given by
p^x (1-p)^(n-x)
Moreover, there are nCx = n! / (x! (n-x)!) ways that x heads can turn up out of n flips.
Hence, the probability of observing x heads in n tosses, given that the chance of
getting a head on a single toss is p, is
P(X = x) = nCx p^x (1-p)^(n-x) for x = 0, 1, 2, …, n
which is called the binomial probability mass function, or binomial pmf. As
the name implies, a pmf defines the probability mass corresponding to individual
values of the discrete random variable X. The binomial pmf depends on the value of
p, which may be assumed as constant but is unknown.
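To see how the binomial pmf depends on p, teachers may wish to tabulate it for a few trial values of p. The following Python sketch is one way to do this; the function name `binomial_pmf` and the trial values n = 5, p = 0.5 and p = 0.6 are chosen here purely for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x heads in n independent tosses,
    where each toss lands heads with probability p."""
    if not 0 <= x <= n:
        return 0.0
    # comb(n, x) counts the sequences with x heads; each has probability p^x (1-p)^(n-x).
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n = 5
for p in (0.5, 0.6):
    probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]
    formatted = ", ".join(f"{value:.4f}" for value in probabilities)
    print(f"p = {p}: {formatted}  (sum = {sum(probabilities):.4f})")
```

For every value of p the probabilities add up to 1, but how they are spread over x = 0, 1, …, n changes with p, which is why p has to be estimated from data.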
Intuition should lead most, if not all, learners to consider the number of
heads X observed divided by the number of tosses (or sample size) as a reasonable
estimate for p; that is,
X/n
is a natural estimate of the probability p of getting a head.
They know that X/n can take on values 0/n, 1/n, 2/n, … , n/n, but they will not know
which one until after the flipping is completed. Furthermore, they know that if the
experiment is repeated (flip the coin n times again), the observed X will not necessarily
be the same as in the previous one. Thus, X is no longer just a variable in the
mathematical sense; it is called a random variable (as was discussed in Lesson 2-03)
because its value can change from one experiment to the next, and the change cannot
be predicted with certainty. Variability and the attendant uncertainty in the result of
the sampling experiment are thus introduced.
Inform learners that, as will be shown later in the course, the one experiment (of n
flips of a coin yielding a single outcome x) allows the class to
(a) estimate the unknown probability p of getting a head (with x/n);

(b) estimate the uncertainty in this estimate, i.e., the value of the “probable
error” in estimation from the sample, aware that the estimate is subject to
“sampling variation” or “sampling error.” For instance, the approval
ratings of the President obtained from an opinion poll of about 1,200
respondents randomly selected are theoretically within a margin of error of
about 3 percentage points (as will be illustrated later) from the actual
approval ratings.
(c) compute an interval estimate from the sample, along with the chance that
the interval “captures” the unknown p. The uncertainty is still there, but it
can be measured using probability and there lies the connection between
statistics and probability (or mathematics).
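The margin of error of about 3 percentage points quoted in (b) for a poll of 1,200 respondents can be previewed with a rough calculation. The Python sketch below rests on assumptions that are not stated in the text at this point: a 95% level of confidence, the normal approximation to the sample proportion, and the most conservative case p = 0.5. The function name `margin_of_error` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Approximate margin of error for a sample proportion (normal approximation).

    Assumes simple random sampling; p = 0.5 gives the widest (most cautious) margin.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * sqrt(p * (1 - p) / n)

moe_1200 = margin_of_error(1200)
print(f"margin of error for n = 1200: about {100 * moe_1200:.1f} percentage points")
```

Under these assumptions the margin comes out slightly below 3 percentage points, consistent with the figure mentioned for opinion polls.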
B. Main Lesson: Estimation of Probability of Getting a Head in a Single Toss of a Coin
In discussing probability in the last chapter, we considered symmetry and
appropriate random mixing (such as shaking a die) to justify the assignment of
probabilities. For example, the chance of rolling a two using a fair die is 1 out
of 6. But knowing the probabilities of events, or even having a basis for assuming
particular values for probabilities, is actually not a common scenario. On the
contrary, we are often confronted with a situation such as a coin-tossing
experiment, where we know the size n of the random sample of units (or number of
trials), but we do not know the probability p of getting a head, and we would like
to estimate this constant but unknown quantity. We could extend this to the
scenario of wanting to know the
• percentage of voters who would be voting for a certain candidate in the
next election, or
• fraction of the population who are poor.
One of the main reasons for studying probability distributions is that they provide the
foundation for making conclusions or inferences about unknown population
characteristics, such as p, on the basis of sample data. Inform learners that
generalizing results beyond the data collected, provided that the data collected are a
part (sample) of a large set of items (population), is known as statistical
inference.
In the context of the two practical examples, we could get a random sample of
• voters, who can be asked about their current preference for voting in the
next election. We may be interested in using the sample to draw an
inference about the proportion of the population of voters who currently
prefer to vote for some candidate (and even profile these people in
relation to socioeconomic status, sex, age, or geographic location).
• respondents who can be asked about their income and/or expenditure,
and if some poverty line (that can be viewed as the minimum level of
income or expenditure required for a particular welfare level) is defined,
we can draw conclusions about the proportion of the population who are
poor (and consequently, describe the poor in relation to the non-poor).
Even without using any concepts from probability (discussed in the previous
chapter), learners should find it reasonable to think that a sample proportion should
tell us something about the population proportion (that is unknown). If we have a
“random sample” from the population, the sample is representative of the
population so we should be able to use the sample proportion as an estimate of the
population proportion. Provide some scenarios and ask what the estimate of p
would be for these scenarios:
• Flipping a coin 100 times and getting 52 heads. Ask learners what the
estimate of the probability of getting a head on a single toss would be.
The probability of getting a tail? They should say 52/100=0.52 and
48/100=0.48, respectively.
• Conducting an opinion poll of 1,200 randomly selected voters who
suggested these voting preferences:
              Metro Manila   Balance Luzon   Visayas   Mindanao   Total
Candidate X   195            197             115       261        768
Candidate Y   105            103             185       39         432
Total         300            300             300       300        1200
Ask learners what the estimated fraction of voting preference for
candidate X would be. Learners should see that nationally, the estimate is
768/1200=0.64 but the estimated proportions vary by geographic
location.
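The national and regional estimates from the poll can be computed as in the following Python sketch. The data are those in the table above; the dictionary layout and the helper name `share_for` are illustrative choices.

```python
# Poll counts by area: number of sampled voters preferring each candidate.
poll = {
    "Metro Manila": {"X": 195, "Y": 105},
    "Balance Luzon": {"X": 197, "Y": 103},
    "Visayas": {"X": 115, "Y": 185},
    "Mindanao": {"X": 261, "Y": 39},
}

def share_for(candidate, counts):
    """Sample proportion preferring the given candidate within one area."""
    return counts[candidate] / sum(counts.values())

# Estimated proportion for Candidate X in each area.
regional_share_x = {area: share_for("X", counts) for area, counts in poll.items()}

# National estimate: pool all respondents.
total_x = sum(counts["X"] for counts in poll.values())
total_respondents = sum(sum(counts.values()) for counts in poll.values())
national_share_x = total_x / total_respondents

for area, share in regional_share_x.items():
    print(f"{area:14s} {share:.3f}")
print(f"{'National':14s} {national_share_x:.3f}")
```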
• Conducting a sample survey of, say, 5 families selected randomly from a
list of families. They are asked to provide information on their monthly
family income and family size. Suppose that a family is poor if its monthly
per capita income, i.e. monthly total family income divided by the family
size, is less than Php1,800 per month.
Family   Monthly Total Family Income   Family Size   Per Capita Income
1        40,000                        5             8000.00
2        10,000                        6             1666.67
3        100,000                       3             33333.33
4        8,000                         8             1000.00
5        75,000                        4             18750.00
Ask learners what the estimated proportion of families that are poor would be. Learners
should see that only the second and fourth families are poor, so 2/5=0.40= 40% of
families are estimated to be poor.
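The per capita incomes and the estimated proportion of poor families can be computed as in this short Python sketch. The figures are those of the five sampled families; the variable names are illustrative.

```python
POVERTY_LINE = 1800  # pesos per person per month, as given in the example

# (monthly total family income in pesos, family size) for the five sampled families
families = [(40_000, 5), (10_000, 6), (100_000, 3), (8_000, 8), (75_000, 4)]

# Per capita income is the total family income divided by the family size.
per_capita_incomes = [income / size for income, size in families]

# A family is classified as poor if its per capita income is below the poverty line.
is_poor = [per_capita < POVERTY_LINE for per_capita in per_capita_incomes]

# Sample proportion of poor families.
proportion_poor = sum(is_poor) / len(is_poor)

for number, (per_capita, poor) in enumerate(zip(per_capita_incomes, is_poor), start=1):
    print(f"Family {number}: per capita income = {per_capita:10.2f}  poor = {poor}")
print(f"Estimated proportion of poor families: {proportion_poor:.2f}")
```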
C. Data Collection Activity
Give the learners the Activity Worksheet 3-01. Ask learners: If you were to toss a
coin for an extremely large number of times, what proportion of the tosses will be
heads? Of course, they will answer 1/2. Explain to learners that they are assuming
that the coin is fair. The goal of this activity is to estimate the proportion of tosses
that would result in a “head” and then, to examine the distribution of estimates in
repeated sampling.
Have learners work individually using the data collection procedure described on
the Activity Worksheet. Learners must determine the sample proportion of tosses
that would yield a “head” for each of the sample sizes of n=5, 10, 20, and 30.
Individual results are recorded on the Worksheet and each student will write
individual results on the blackboard (or worksheet of a spreadsheet application in a
computer) in an appropriately labeled column.
For a 45-student class, there should be 45 sample proportion values for each of the
sample sizes. Collecting this data provides learners the opportunity to participate in
an example of obtaining repeated samples. Calculating the proportion of “heads”
yielded for each of the sample sizes helps to reinforce the idea of a sample
proportion being a random variable whose value changes from sample to sample.
After the individual sample proportions have been computed and the results copied
onto the blackboard (or a computer worksheet), ask learners to input the class data
into the Class Data Table on the Activity Worksheet. Based on the generated data,
construct a Stem and Leaf Display (you may also use bar graphs). From this graph,
ask learners to describe the distribution of the values that they generated.
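If coins are not available, or as a follow-up to the activity, the data collection can be imitated on a computer. The Python sketch below simulates a class of 45 learners, each tossing a coin n times for n = 5, 10, 20 and 30. It assumes a fair coin (p = 0.5) purely for the purposes of the simulation, and the seed, names and layout are illustrative; actual class results will differ.

```python
import random
from statistics import mean, pstdev

def sample_proportion(n, p, rng):
    """Proportion of heads in n simulated tosses of a coin with P(H) = p."""
    heads = sum(1 for _ in range(n) if rng.random() < p)
    return heads / n

CLASS_SIZE = 45                 # one sample proportion per learner
SAMPLE_SIZES = (5, 10, 20, 30)  # numbers of tosses used in the activity
ASSUMED_P = 0.5                 # assumption for the simulation only
rng = random.Random(2016)       # fixed seed so that the run can be repeated

# For each sample size, every simulated learner reports one sample proportion.
class_data = {
    n: [sample_proportion(n, ASSUMED_P, rng) for _ in range(CLASS_SIZE)]
    for n in SAMPLE_SIZES
}

for n, proportions in class_data.items():
    print(f"n = {n:2d}: mean = {mean(proportions):.3f}, "
          f"standard deviation = {pstdev(proportions):.3f}")
```

Running the sketch with different seeds shows the same features learners should notice in their own data: the sample proportion changes from sample to sample, and its values tend to cluster more tightly as the number of tosses grows.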
D. Data Analysis and Interpretation
The figures below provide an example of results that can serve as a model. Here, 35
learners participated in producing the example data set. Stem and Leaf Displays have
been constructed for the proportions of tosses that yielded a “head” for each of the four
sample sizes used in the example (5, 10, 25, and 50). By examining the class data,
learners will begin to discover how statistics differs from mathematics, since statistics
involves uncertainty.
Stem and Leaf Display for sample sizes of n=5
data rounded to nearest multiple of .1
plot in units of .1
0* | 0
0t | 22222
0f | 44444444444
0s | 66666666666
0. | 8888
1* | 000

Stem and Leaf Display for sample sizes of n=10
plot in units of .1
0t | 222
0f | 4444444555555555
0s | 666666677777777
0. | 8

Stem and Leaf Display for sample sizes of n=25
plot in units of .01
2. | 888
3* | 2
3. | 6
4* | 0000044444
4. | 88888
5* | 22222
5. | 6
6* | 00004
6. | 88
7* | 22

Stem and Leaf Display for sample sizes of n=50
plot in units of .01
3* | 4
3. | 88
4* | 002222444
4. | 6688
5* | 00022244
5. | 666666
6* | 044
6. | 68
Figure 3-01.1. Stem and Leaf Display of Example Class Data
The example class data in Figure 3-01.1 are used for purposes of illustration, and
may serve as a prototype for the actual class data. For each sample size, learners
should construct a stem and leaf display (or any graphical representation of the
distribution such as a histogram) of the sample proportion values and describe the
shape, center, and spread of the distribution of values. For samples of size 5 and
size 10, there are only a few different values of the sample proportion, so the shape
is difficult to determine. The centers of the distributions for samples of
size 5 and size 10 both appear to be at around 0.50. The sample proportion values range
from 0 to 1 for samples of size 5, and from 0.2 to 0.8 for samples of size 10. For
samples of size 25, more distinct values are obtained for the sample proportion of
“heads.” The distribution again has a center at around 0.50, and the sample
proportions range from 0.28 to 0.72. For samples of size 50, the distribution of the
sample proportions appears roughly like a normal curve, i.e. mound-shaped (with a
slight rightward skew). The center is at around 0.50. The sample proportion values
range from 0.34 to 0.68.
For each sample size, ask learners to calculate the mean and standard deviation of
the sample proportions. The calculated values of the mean of the sample
proportion distributions are 0.52, 0.529, 0.489, and .50, respectively for samples of
size 5, 10, 25, and 50. The calculated values of the standard deviation of the sample
proportion distributions are 0.24, 0.15, 0.12, and 0.09, respectively for samples of
size 5, 10, 25, and 50.
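The means and standard deviations quoted for the example data can be recomputed from the stem and leaf displays. The sketch below is a hedged illustration: the frequency tables were read off Figure 3-01.1 by hand (each leaf counted as one learner), and the sample standard deviation with divisor n − 1 is assumed, since the guide does not say which divisor was used.

```python
from statistics import mean, stdev

# Example class data read off Figure 3-01.1, as {sample proportion: number of learners}.
example_data = {
    5:  {0.0: 1, 0.2: 5, 0.4: 11, 0.6: 11, 0.8: 4, 1.0: 3},
    10: {0.2: 3, 0.4: 7, 0.5: 9, 0.6: 7, 0.7: 8, 0.8: 1},
    25: {0.28: 3, 0.32: 1, 0.36: 1, 0.40: 5, 0.44: 5, 0.48: 5,
         0.52: 5, 0.56: 1, 0.60: 4, 0.64: 1, 0.68: 2, 0.72: 2},
    50: {0.34: 1, 0.38: 2, 0.40: 2, 0.42: 4, 0.44: 3, 0.46: 2, 0.48: 2,
         0.50: 3, 0.52: 3, 0.54: 2, 0.56: 6, 0.60: 1, 0.64: 2, 0.66: 1, 0.68: 1},
}

summary = {}
for n, frequencies in example_data.items():
    # Expand the frequency table into one value per learner.
    values = [value for value, count in frequencies.items() for _ in range(count)]
    summary[n] = (mean(values), stdev(values))  # stdev uses the n - 1 divisor
    print(f"n={n}: learners={len(values)}, mean={summary[n][0]:.3f}, sd={summary[n][1]:.3f}")
```

The results should come out close to the values quoted above, with the standard deviation shrinking as the sample size grows.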
Ask learners to think about the relationship between the center of the distribution
of the sample proportions and the value of the population proportion. Learners
should note that the distribution of sample proportion values is centered on the
value of the population proportion (here, p = 1/2 = 0.50).
Tell learners to think about the relationship between the sample size and the shape
of the distribution of the sample proportion. Learners should note that as the
sample size increases, the distribution of the sample proportion tends more towards
a normal distribution. (This is a consequence of the “Central Limit Theorem.”)
Ask learners: For which sample size is the standard deviation of the sample
proportion values the largest and for which sample size is the standard deviation
the smallest? Ask them why they think this happens. Learners should observe that
the variability of the sample proportion values, whether measured from the range or
the standard deviation, is related to the sample size. A larger sample size results in
smaller variability (smaller range and smaller standard deviation) in the sample
proportion values.
The results from the analyses of the repeated sampling can lead to a discussion on
the theoretical properties of the sampling distribution of a sample proportion. The
mean value of the distribution of a sample proportion for repeated random samples
of size n, drawn from the same population, is equal to the corresponding value of
the proportion of the population, p. (Here, the value of p seems to be 0.5). The
standard deviation of the distribution of a sample proportion for repeated random
samples of size n, drawn from the same population, decreases with an increase in
the sample size. The theoretical standard deviation of a sample
proportion, also called the “standard error,” can now be introduced:
$\sqrt{p(1-p)/n}$. The standard error is inversely proportional to the square root
of the sample size. In consequence, the bigger the sample size (i.e., the number of
tosses of the coin), the less variability there will be in the estimates.
Technical Note: The function $p(1-p)$ is maximized when $p = 1/2$, so the standard
error of the sample proportion is at most $\sqrt{(1/2)(1/2)/n} = \frac{1}{2\sqrt{n}}$.
As will be shown in later lessons, organizations that conduct opinion polls have 95%
confidence that the opinion polls have margins of error, i.e. twice the standard error
of a sample proportion, of at most 3 percentage points. They design the polls so that
$2 \cdot \frac{1}{2\sqrt{n}} = \frac{1}{\sqrt{n}} \le 0.03$, or equivalently, so that the
sample size satisfies $n \ge (1/0.03)^2 \approx 1{,}111$. This is why
these organizations use sample sizes of 1,200.
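The arithmetic behind the 1,200-respondent rule of thumb can be checked with a few lines of Python. This is a sketch under stated assumptions: the standard error is taken to be $\sqrt{p(1-p)/n}$ at its worst case p = 1/2, the margin of error is taken as twice the standard error, and the function names `max_standard_error` and `min_sample_size` are illustrative only.

```python
import math

def max_standard_error(n):
    """Largest possible standard error of a sample proportion: sqrt(0.5 * 0.5 / n)."""
    return math.sqrt(0.25 / n)

def min_sample_size(margin):
    """Smallest n with 2 * max_standard_error(n) <= margin, i.e. n >= 1 / margin**2."""
    return math.ceil(1 / margin ** 2)

n_needed = min_sample_size(0.03)             # sample size for a 3-point margin
margin_1200 = 2 * max_standard_error(1200)   # worst-case margin with 1,200 respondents
print(n_needed, round(margin_1200, 4))
```

A sample of 1,200 is comfortably above the minimum, which leaves some room for nonresponse.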
Learners will also observe that the standard error is dependent on p, and thus, we may
instead estimate it with $\sqrt{\hat{p}(1-\hat{p})/n}$, provided that the sample size is large
enough (usually $np \ge 10$ and $n(1-p) \ge 10$).
Inform learners that the distribution of a sample proportion for repeated random
samples of size n, drawn from the same population, will approximately follow a normal
(bell-shaped) distribution.
Note: The activity above can be done with other “simulations” of situations, e.g.,
consider tossing a die and observing the proportion of times that the upward face of
the die yields a “four” or a “five” (which should be expected to be 2/6 = 1/3, give or
take some random variation).
E. Enrichment
After learners have been introduced to the formula for a confidence interval for the
population proportion, data collected from the coin-tossing activity for n = 50 can
be used. Learners can construct a 95% confidence interval for the proportion of
tosses resulting in a “head”:
$\hat{p} \pm 2\sqrt{p(1-p)/n}$
which can be approximated as
$\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$
since p is unknown.
Note that since each student’s sample will be unique, after constructing their 95%
confidence intervals, learners can be asked to put the results onto the blackboard.
This way, the instructor can have a discussion on what confidence level means
(examining the percentage of the different confidence intervals that include ½ =
0.5). About 95 percent of learners will have confidence intervals that will hit the
target of 0.5, but about 5 percent of the intervals will not.
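The meaning of the confidence level can also be explored by simulation. The sketch below is illustrative rather than part of the guide: it assumes a fair coin, n = 50 tosses per learner, and the approximate interval $\hat{p} \pm 2\sqrt{\hat{p}(1-\hat{p})/n}$; the function names and seed are arbitrary choices.

```python
import math
import random

def confidence_interval(heads, n):
    """Approximate 95% interval: p_hat +/- 2 * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = heads / n
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def coverage(num_learners, n=50, p=0.5, seed=1):
    """Fraction of simulated learners whose interval contains the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_learners):
        heads = sum(1 for _ in range(n) if rng.random() < p)
        low, high = confidence_interval(heads, n)
        if low <= p <= high:
            hits += 1
    return hits / num_learners

class_coverage = coverage(45)         # one class of 45 learners
long_run_coverage = coverage(10000)   # many repetitions of the experiment
print(class_coverage, long_run_coverage)
```

With a single class the fraction of intervals that hit 0.5 can differ noticeably from 95 percent. Over many repetitions it settles near that figure, possibly a little below it, because the normal approximation is still rough at n = 50.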
KEY POINTS
• The probability p of getting a “head” in a single toss of a coin need not be 50%;
it is an unknown number, which you can estimate by flipping the coin n times
and noting the number x of times you get a “head.” The proportion x/n is then an
estimate of p.
• If several learners were to yield estimates from n tosses of the coin, the estimates
will not be the same, but they will have “sampling” variability. The standard
deviation of the estimates, called the standard error, is $\sqrt{p(1-p)/n}$. This standard
error is dependent on p, and thus, we may instead estimate it with
$\sqrt{\hat{p}(1-\hat{p})/n}$. The standard error is inversely proportional to the square
root of the sample size. In consequence, the bigger the sample size (i.e., the
number of tosses of the coin), the less variability we will have in the estimates.
• As the number of tosses increases, the distribution of estimates looks more and
more like a normal curve. (This is known as the “Central Limit Theorem,”
wherein the sample proportion has a sampling distribution whose shape can be
approximated by a normal curve, whose center is the value of the population
proportion and whose standard deviation is $\sqrt{p(1-p)/n}$.) The larger the sample, the
better the approximation will be.
ACTIVITY SHEET NUMBER 3-01
Using Coin Tosses to Introduce Sampling Distributions Activity Sheet
Suppose that you were to toss a regular one-Php coin a large number of times. What
proportion of the tosses will yield a “head”? p = ______________.
Individual Data Table (use two decimal places)
                                            5 Trials   10 Trials   25 Trials   50 Trials
Number of Tosses Resulting in “Heads”       ________   _________   _________   _________
Proportion of Tosses Resulting in “Heads”   ________   _________   _________   _________
Copy your sample proportions (use two decimals) onto the blackboard in
the appropriately labeled column.
Input the class proportion of tosses resulting in a “head” into the Class Data Table.
Class Data Table: Class Proportion of Tosses Resulting in a “Head”
Sample   n=5   n=10   n=25   n=50      Sample   n=5   n=10   n=25   n=50
1                                      21
2                                      22
3                                      23
4                                      24
5                                      25
6                                      26
7                                      27
8                                      28
9                                      29
10                                     30
11                                     31
12                                     32
13                                     33
14                                     34
15                                     35
16                                     36
17                                     37
18                                     38
19                                     39
20                                     40
Use the Class Data to answer the following questions. Remember that p = 1/2.
1. For each sample size, construct a stem-and-leaf display/histogram of the sample
proportion values and describe the shape, center, and spread of the distribution of
values.
2. Based on your stem-and-leaf displays/histograms, what do you think is the
relationship between the center of the distribution of the sample proportions and the
value of the population proportion?
3. Based on your stem-and-leaf displays/histograms, what do you think is the
relationship between the sample size and the shape of the distribution of the sample
proportion?
4. For each sample size, calculate the mean and standard deviation of the sample
proportions and write a sentence to interpret the standard deviation.
Sample Size    Mean    Standard Deviation
n=5
n=10
n=25
n=50
5. For which sample size is the standard deviation the largest and for which sample
size is the standard deviation the smallest? Why do you suppose this happens?
ASSESSMENT
1. A polling organization randomly selects 1,000 registered voters to estimate the
proportion of a large population that intends to vote for a certain movie celebrity in an
upcoming election. Although it is not known by the polling organization, the actual
proportion of the population that prefers the celebrity is 0.46.
(a) Give the numerical value of the mean of the sampling distribution of p̂.
Answer: .46
(b) Calculate the standard deviation of the sampling distribution of p̂.
Answer: $\sqrt{(0.46)(0.54)/1000} \approx 0.0158$
(c) If the proportion of the population that prefers the celebrity is .46, would a sample
proportion value of .60 be considered unusual?
Answer: Yes, a sample proportion of .60 would be considered unusual.
It is roughly 8.9 standard deviations above the mean: .46 + 8.9(.0158) ≈ .60.
2. Every senior high school student taking a Probability and Statistics class at a large
senior high school (about 1,100 learners) participated in a class project by rolling a
6-sided die 50 times. Each student determined the proportion of his or her 50 rolls for
which the result was a “1”. The instructor plans to draw a histogram of the 1,100
sample proportions.
(a) What will be the approximate shape of this histogram?
(i). Skewed left.
(ii). Uniform.
(iii). Normal (bell-shaped).
(iv). Skewed right.
ANSWER: iii
(b) What will be the approximate mean for the 1,100 sample proportions?
(i) 1/50
(ii) 1/6
(iii) 6/50
(iv). 6
ANSWER: ii. Note that the sample size here is 50
(c) What will be the approximate standard deviation for the 1,100 sample proportions?
(i) $\sqrt{(1/6)(5/6)/1100}$
(ii) $\sqrt{(1/6)(5/6)/50}$
(iii) $\sqrt{(1/100)(99/100)/1100}$
(iv) $\sqrt{(1/100)(99/100)/50}$
ANSWER: ii
3. A candy company assures the public that 20% (p = 0.20) of all its assorted candies
are chocolate flavored. Suppose the candies are packaged at random in small bags
containing about n = 100 candies. A class of senior high school learners learning about
percentages opens several bags, counts the various types of candies, and calculates
p̂ = proportion in the bag that are chocolate flavored. What is the z-score for p̂ = .25?
ANSWER:
$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
and, here: p = .20, p̂ = .25, and n = 100, so:
$z = \frac{0.25 - 0.20}{\sqrt{(0.20)(0.80)/100}} = \frac{0.05}{0.04} = 1.25$
4. It is believed that 60% of cars travelling on a major highway outside of Metro Manila
exceed the speed limit. A radar trap checks the speeds of 90 cars.
(a) Using the 68-95-99.7 Rule, draw and label the distribution of the proportion of cars
that the police will observe speeding.
ANSWER:
Center of distribution is 0.60; standard error is $\sqrt{(0.60)(0.40)/90} \approx 0.05163978$.
The normal curve follows the empirical rule (also called the 68-95-99.7
rule):
• About 68% of the area under the curve falls within 1 standard
deviation of the mean (here, from about 0.548 to 0.652).
• About 95% of the area under the curve falls within 2 standard
deviations of the mean (from about 0.497 to 0.703).
• Nearly the entire distribution (about 99.7% of the area under the
curve) falls within 3 standard deviations of the mean (from about 0.445 to 0.755).
Thus, we can approximate the sampling distribution of the proportion
of cars that will exceed the speed limit by a normal curve with center 0.60
and standard deviation 0.052.
(b) Do you think the appropriate conditions necessary for your analysis are met?
ANSWER: Both np = 90 × 0.6 = 54 and n(1 − p) = 90 × 0.4 = 36 are at least 10, as is
np(1 − p) = 21.6. Drivers may be viewed as
independent of each other, but if the flow of traffic is very fast, they may not behave
independently. Or if weather conditions are not good, these conditions may
affect all drivers. In these cases, there may be fewer speeders than expected.
5. Using the following table of standard errors for estimating the proportion p of voters
supporting a candidate:
(a) Obtain the maximum estimated standard error for a sample of size 1,200
ANSWER: .0145
(b) What would the gain be, if you go from a sample of size 1,000 to a sample of size
2,000, or even 10,000?
ANSWER: Doubling sample size from 1,000 to 2,000 will only reduce
standard error from 1.6 percentage points to 1.1 percentage points.
Increasing size from 1,000 to 10,000 will reduce standard error from 1.6
percentage points to about a third (0.5 percentage point).
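The numerical answers in this assessment all rest on the same two quantities, the standard error $\sqrt{p(1-p)/n}$ and the z-score $(\hat{p} - p)$ divided by that standard error. The short Python sketch below recomputes them for items 1, 3, 4, and 5(b). It is an illustrative check, not part of the original answer key, and the helper names `standard_error` and `z_score` are assumptions; item 5(b) is assumed to use the worst case p = 0.5.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def z_score(p_hat, p, n):
    """Number of standard errors between the sample proportion and p."""
    return (p_hat - p) / standard_error(p, n)

se_item1 = standard_error(0.46, 1000)   # item 1(b)
z_item1 = z_score(0.60, 0.46, 1000)     # item 1(c)
z_item3 = z_score(0.25, 0.20, 100)      # item 3
se_item4 = standard_error(0.60, 90)     # item 4(a)

# Item 5(b): worst-case standard error, in percentage points, for three sample sizes.
se_item5 = {n: 100 * standard_error(0.5, n) for n in (1000, 2000, 10000)}

print(round(se_item1, 4), round(z_item1, 1), round(z_item3, 2), round(se_item4, 4))
print({n: round(se, 1) for n, se in se_item5.items()})
```

The last line makes the point of item 5(b) concrete: doubling the sample size shrinks the standard error only by a factor of about 1.4, and a tenfold increase is needed to cut it to roughly a third.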
CHAPTER 3: SAMPLING
Lesson 2: The Need for Sampling
OVERVIEW OF LESSON
In this lesson, learners are given lectures (and assessments) regarding sampling—basic
concepts, discussions on why it is important to sample, descriptions of different types of
samples, as well as kinds of survey errors.
LEARNING COMPETENCIES
• Define random sampling
• Give reasons for sampling
• Distinguish between parameter and statistic
• Recognize the value of randomization as a defense against bias
• Identify that the size of the sample (not the fraction of the population) determines the
precision of estimates from a probability sample
LESSON OUTLINE
A. Motivation: What is a Survey and Why do we use Sampling?

B. Lesson Proper
1. Probability Sampling
2. Non-probability Sampling
3. Survey Errors
4. Sampling Distribution, Accuracy, and Precision
C. Data Collection
E. Enrichment
REFERENCES
Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua, Welfredo
Patungan, Nelia Marquez). Philippines: Rex Bookstore.
KEY CONCEPTS: Sampling, Estimation, Bias, Sampling Variation, Randomization
A. Motivation: What is a Survey and Why do we use Sampling (rather than full
enumeration)?
In Chapter 1, discussions on describing data assumed that data come from a

population of interest. When the recording of information of an entire population is
conducted, this is called a census. An example of this is collecting the grades of all
the Grade 11 learners, or the decennial population census done by the Philippine
Statistics Authority (PSA). However, in most cases, censuses involve great
challenges. Also, one does not need to do a full count to get information, especially
on flow data, such as agricultural production, household expenditure, and
establishment income. This brings us to sampling, which is the process of selecting
a section of the population.
Learners may have heard of sample surveys, especially opinion polls conducted
before an election. Ask a few learners to state the number of minutes they
spend to get to school in the morning. Then, after asking these few individuals,
describe to them the typical time it takes learners to get to school (such as the
average time). Ask learners if the descriptive statements you made are valid or not.
Next, define:
• a sample survey as a method of systematically gathering information on a

segment of the population, such as individuals, families, wildlife, farms,
business firms, and unions of workers, for the purpose of inferring
quantitative descriptors of the attributes of the population.
The fraction of the population being studied is called a sample.
Learners may wonder why people don’t just survey everyone instead and why they
“trust” opinion polls when these only interview 1,600 respondents and not the
actual millions of Filipinos who will be voting on election day. Learners should be
made aware that there are many reasons why we resort to sampling.
• Cost. A sample often provides useful and reliable information at a much
lower cost than a census. For extremely large populations, the conduct of a
census can even be impractical. In fact, the difficulty of analyzing complete
census data led to summarizing a census by taking a “sample” of returns.
• Timeliness. A sample usually provides more timely information because
fewer data are to be collected and processed. This attribute is particularly
important when information is needed quickly.
• Accuracy. A sample often provides information that is as accurate as, or more
accurate than, that of a census, because data errors typically can be controlled
better in smaller tasks.
• Detailed information. More time is spent in getting detailed information
with sample surveys than with censuses. In a census, we can often only
obtain stock, not flow data. For instance, agricultural production cannot be
generated from censuses.
• Destructive testing. When a test involves the destruction of an item,
sampling must be used. Battery life tests must use sampling because
something must be left to sell!
Inform learners that conducting a full census of voters can be quite costly and
besides, this is already done on Election Day itself. Only in rare cases is a full
enumeration of the population taken. For instance, the PSA conducts the
Census of Population and Housing every ten years, typically when the year ends in
0, although in 1995, 2007 and 2015, the PSA also conducted mid-decade
censuses. The financial costs of conducting censuses and processing their results are
huge compared to those of sample surveys.
Explain that in a sample survey, we can generate flow information that describes
characteristics of the subject covering a period of time. For instance, agricultural
production is collected not in an agriculture census but in a sample survey of
agricultural households and establishments. A sample survey covers more detailed
information on the unit of inquiry than that of a census, and is also less expensive to
conduct than a census.
Sampling theory, developed a century ago, has shown that one does not need to
conduct a census to obtain information, i.e. conducting a sample survey will do just
as well. Look at it this way: One does not need to finish drinking a pot full of coffee
to know if the coffee tastes good. A cup or even a sip will do, provided the
“sample” is taken in a “fair manner.” Even hospitals only extract blood samples
from patients for medical tests rather than extracting all the blood of the patient to
determine whether or not the patient gets a clean bill of health. What is crucial is to
design a sample survey that will be representative of the population it intends to
characterize. Typically, a sample survey can be expected to be representative
if chance methods are used for selecting respondents.
The sample surveys conducted by the PSA, such as household surveys, establishment
surveys, and agricultural surveys (which may involve households and establishments),
likewise use chance methods to select their survey respondents.
B. Lesson Proper
1. Probability Sampling
If data are to be used to make decisions about a population, then how the data are
collected is critical. For sample data to provide reliable information about a
population of interest, the sample must be representative of that population.
Selecting samples from the population using chance allows the samples to be
representative.
If a sample survey involves allowing every member of the population to have a

known, nonzero chance of being selected into the sample, then the sample survey
is called a probability sample. Probability samples are meant to ensure that the
segment taken is representative of the entire population. Examples of these include
the Family Income and Expenditure Survey (FIES), the Labor Force Survey, and the
Quarterly Survey of Establishments, all conducted by the PSA. Opinion polls
conducted by some non-government organizations with track records such as the
Social Weather Stations and Pulse Asia, likewise use chance methods to select their
survey respondents. Data collected from these probability sampling-based surveys
yield estimates of characteristics of the population that these surveys attempt to
describe.
Basic Types of Probability Sampling
a. Simple random sampling (SRS) involves allowing each possible

sample to have an equal chance of being picked and every member of the
population has an equal chance of being included in the sample. Selection
may be with replacement (selected individual or unit is returned to frame for
possible reselection) or without replacement (selected individual or unit isn’t
returned to the frame). This sampling method requires a listing of the
elements of the population called the sampling frame. In the case of
agricultural surveys or surveys of establishments, the sampling frame may
either be based on a list frame, or an area frame, or a mixture. Samples may
be obtained from the table of random numbers or computer random number
generators.
b. Stratified sampling is an extension of simple random sampling which
allows for different homogeneous groups, called strata, in the population to
be represented in the sample. To obtain a stratified sample, the population
is divided into two or more strata based on common characteristics. An SRS is
then drawn from each stratum, with sample sizes proportional to stratum
sizes. Samples from the strata are then combined into one. This is a common
technique when sampling from a population of voters, stratifying across racial
or socio-economic classes. When thinking of using stratification, the
following questions must be asked:
• Are there different groups within the population?
• Are these differences important to the investigation?
Figure 3-02.1 Illustration of Stratified Sampling
If the answer to both questions is yes, then stratified sampling is necessary.
Explanatory Note: Usually, stratified sampling is done when the population is

divided into several subgroups with common characteristics. The population
may be divided into urban and rural locations (as dwellings in rural areas may
tend to be more homogeneous than dwellings in urban areas); the student
population may be divided by the year level of learners; or the workers in a
hospital may be categorized by their different occupations—nurse, doctor,
janitor, secretary.
c. In systematic sampling, elements are selected from the population at a

uniform interval that is measured in time, order, or space.
Figure 3-02.2 Illustration of Systematic Sampling
Typically, the desired sample size n is decided first. The frame of
N units is then divided into n groups of k units each, where k = N/n. Then, one unit is
randomly selected from the first group, with every kth unit thereafter also
selected. For instance, in Figure 3-02.2, consider the population of 20 trees.
If the sample size is 4, then the frame is divided into 4 groups of k = 20/4 = 5 trees
each. Suppose that the fourth item is chosen in the first group, with every fifth unit
thereafter chosen. The sample then consists of the 4th, 9th, 14th, and 19th trees.
d. Cluster sampling divides the population into groups called clusters,
selects a random sample of clusters, and then subjects the sampled clusters
to complete enumeration, that is, everyone in the sampled clusters is made
part of the sample.
Figure 3-02.3 Illustration of Cluster Sampling
Explanatory Note: Clusters in the population may be based on convenience in
the collection of data. For example, in a village, clusters can be blocks of
houses. In a school, the clusters can be the sections. In a dormitory, clusters can
be the rooms. In a city or municipality, the clusters can be the different
barangays. Cluster sampling is conducted so that data collected need not come
from a huge geographic range, thus saving resources. For instance, instead of
getting a simple random sample of households from all over a town, clusters of
dwellings can be selected from different barangays so that the cost of data
collection can be minimized.
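The selection mechanics of simple random, systematic, and cluster sampling described above can be sketched in Python with the standard `random` module. The frame of 20 numbered units mirrors the 20-tree illustration; the four "blocks" used as clusters, the seed, and the function name `systematic_sample` are hypothetical choices made only for this illustration.

```python
import random

rng = random.Random(42)        # fixed seed (an arbitrary choice) for repeatable output
frame = list(range(1, 21))     # sampling frame: units numbered 1 to 20

# Simple random sampling of 4 units, without and with replacement.
srs_without = rng.sample(frame, 4)     # no unit can appear twice
srs_with = rng.choices(frame, k=4)     # a unit may be drawn more than once

def systematic_sample(frame, n, start=None):
    """Pick one unit in the first group of k = N/n units, then every kth unit after it."""
    k = len(frame) // n
    if start is None:
        start = rng.randint(1, k)      # random position within the first group
    return frame[start - 1::k][:n]

systematic = systematic_sample(frame, 4, start=4)   # the worked example: start at the 4th unit

# Cluster sampling: choose 2 of 4 hypothetical blocks, then take every unit in them.
clusters = {
    "Block A": [1, 2, 3, 4, 5],
    "Block B": [6, 7, 8, 9, 10],
    "Block C": [11, 12, 13, 14, 15],
    "Block D": [16, 17, 18, 19, 20],
}
chosen_blocks = rng.sample(sorted(clusters), 2)
cluster_sample = [unit for block in chosen_blocks for unit in clusters[block]]

print(srs_without, srs_with, systematic, chosen_blocks, cluster_sample)
```

In the systematic case, starting at the 4th unit with k = 5 selects units 4, 9, 14, and 19. In the cluster case the sample size is determined by the sizes of the chosen clusters rather than fixed in advance.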
Example:
Suppose you want to compute the mean grade point averages (GPAs) of learners at a
certain higher educational institution. You decide that an appropriate sample size is n =
100. To estimate the mean GPAs, you can use simple random sampling to select 100
learners and average their GPAs. Since freshmen GPAs tend to be lower than senior
GPAs, you may want to make sure that both classes are represented, so you decide to
use a stratified sample.
According to the university’s registrar, the student population consists of 35% freshmen,
30% sophomores, 20% juniors, and 15% seniors. Get samples from each stratum,
proportional to its size. Specifically, take simple random samples of 35 freshmen, 30
sophomores, 20 juniors, and 15 seniors. Then, average the GPAs of the learners to
estimate the GPA of the entire university.
Instead of a class, you can also have subgroups of the student population based on
their academic major, assuming that each student is assigned one major. When
stratifying into subgroups, the subgroups must be mutually exclusive. If they are not,
then some subjects will have a higher chance of being chosen since they belong in
more than one subgroup.
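The proportional allocation in this example can be written out as a short Python sketch. Everything beyond the shares (35%, 30%, 20%, 15%) and the sample size of 100 is an assumption made for illustration: the enrollment of 2,000, the made-up GPAs on a 1.0 to 4.0 scale, the seed, and the function name `proportional_allocation`.

```python
import random

def proportional_allocation(shares, n):
    """Sample size per stratum, proportional to its share of the population.

    Rounding can make the sizes add up to slightly more or less than n
    for other shares; here they add up to exactly 100.
    """
    return {stratum: round(share * n) for stratum, share in shares.items()}

shares = {"freshmen": 0.35, "sophomores": 0.30, "juniors": 0.20, "seniors": 0.15}
allocation = proportional_allocation(shares, 100)

rng = random.Random(7)         # fixed seed, arbitrary
population_size = 2000         # hypothetical enrollment
# Hypothetical roster: one list of made-up GPAs per stratum.
roster = {
    stratum: [round(rng.uniform(1.0, 4.0), 2) for _ in range(round(share * population_size))]
    for stratum, share in shares.items()
}

# Draw a simple random sample within each stratum, then combine the samples.
sample = []
for stratum, size in allocation.items():
    sample.extend(rng.sample(roster[stratum], size))

estimated_mean_gpa = sum(sample) / len(sample)
print(allocation, round(estimated_mean_gpa, 2))
```

Because each stratum is sampled in proportion to its size, the plain average of the combined sample serves as the estimate of the university-wide mean GPA.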
Inform learners that Statistics is different from Mathematics. The essential paradigm
in Statistics is induction (from the particular to the general) while Mathematics uses
deduction (from the general to the particular). The role of modern Statistics is to
develop tools that will allow scientifically valid inference from samples to the
populations from which they came.
Specific parameters (numerical summaries of the population such as a population
proportion or a population mean) are estimated by statistics, which are summaries of the
sample data such as a sample proportion or a sample mean. In probability
sampling, each member of the population has a positive and measurable chance of
inclusion in the sample. These inclusion probabilities serve as the bridge from
sample to population. However, this bridge is weak or nonexistent when the
inclusion probabilities cannot be computed, as in the case of non-probability samples.
Figure 3-02.4 Population, sample, and inference
2. Non-probability Sampling
Ask learners whether polls on voting preferences through SMS messages and
Facebook posts can be adequate to represent actual voting preference. Learners
should know or be made aware that results of such kinds of polls are filled with too
much noise as there is currently no way to determine the representativeness of
respondents to such surveys (if the targeted population is much bigger than the
sampled respondents). SMS and Facebook polls do not have complete coverage of
voters: not all voters have cellphones (especially among the poor) despite the
increase in mobile phone usage over the years; not everyone has internet access;
and, certainly, not every Filipino voter has a Facebook account. As of 2014, only a
third of Filipinos were reported to have access to the Internet.
In addition, a mere “random selection” of mobile phone numbers or of Facebook

users will in no way assure you of its representativeness of the voting population
even if everyone had a cellphone or a Facebook account since there will be
“nonresponses” that have to be accounted for.
Non-probability or judgment sampling is the generic name of several
sampling methods where some units in the population do not have a chance of
being selected in the sample, or where the inclusion probabilities cannot be computed.
Generally, the procedure involves arbitrary selection of “typical” or
“representative” units concerning which information is to be obtained. A few types
of non-probability samples are listed below:
a. Haphazard or accidental sampling involves an unsystematic

selection of sample units. Some disciplines like archaeology, history, and
even medicine draw conclusions from whatever items are made available.
Some disciplines like astronomy, experimental physics, and chemistry often
do not care about the “representativeness” of their specimens.
b. In convenience sampling, sample units expedient to the sampler are
taken.
c. For volunteer sampling, sample units are volunteers in studies wherein
the measuring process is painful or troublesome to a respondent.
d. Purposive sampling pertains to having an expert select a
representative sample based on his own subjective judgment. For instance,
in Accounting, a sample audit of ledgers may be taken of certain weeks
(which are viewed as typical). Many agricultural surveys also adopt this
procedure for lack of a specific sampling frame.
e. In Quota Sampling, sample units are picked for convenience but certain
quotas (such as the number of persons to interview) are given to interviewers.
This design is especially used in market research.
f. In Snowball Sampling, additional sample units are identified by asking
previously picked sample units for people they know who can be added to
the sample. Usually, this is used when the topic is not common, or the
population is hard to access.
Discuss with learners other ways of classifying surveys.
• size of the sample – e.g. large-scale or small-scale

• periodicity – longitudinal or panel, where respondents are monitored
periodically; cross-section; quarterly
• main objective – descriptive, analytic
• method of data collection – mail, face-to-face interview, e-survey,
phone survey, SMS survey
• respondents – individual, household, establishment (or enterprise), farmer,
OFW
3. Survey Errors
When collecting data, whether through sample surveys or censuses, a variety of

survey errors may arise. This is why it is crucial to design the data collection process
very carefully. Censuses may also overcount or undercount certain portions of the
population of interest. Household censuses in the Philippines, for instance, have
often been contentious because of undercounts and overcounts and their
implications on politics since congressional seats and Internal Revenue Allotment
(IRA) depend on population counts. Conclusions based on purposive samples, such
as telephone polls used in early morning television shows, SMS polls, or surveys in
Facebook, do not hold the same weight as probability-based samples. A probability
sample uses chance to ensure that the sample is much more representative of the
population, something that is not true of purposive samples.
Survey errors involve sampling errors and non-sampling errors:
• In the conduct of sample surveys, sampling error is roughly the difference

between the value obtained in a sample statistic and the value of the
population parameter that would have arisen had a census been conducted.
This difference comes from the operation of the chance process that
determines which particular units in the population are included in the
sample. This error can be positive or negative, small or large, but increasing
the sample size generally reduces this type of error. This error can be
estimated and reported along with the sample statistic. Since estimates of a
parameter from a probability sample would vary from sample to sample, the
variation in estimates serves as a measure of sampling error. Statisticians can
say, for instance, that in 2000, the FIES indicated that 39.5 percent of the
entire Filipino population is poor and that there are 95 chances in 100 that a
full census would reveal a value within 0.4% of the stated figure. The
approval ratings of the President, obtained from an opinion poll of about
1,200 respondents who were selected judiciously through chance-methods,
are theoretically within a margin of error of about 3 percentage points from
the actual approval ratings.
• Another type of error that statisticians consider in the collection of data is
called non-sampling error. There are many specific types of non-sampling
error. There may be selection bias or the systematic tendency to exclude
in a survey a particular group of units. As a result, you get coverage errors,
which arise if, for example, we assume that the respondents in a telephone
poll in an early morning television show reflect the entire population of
voters. Yet in fact, telephone polls in the Philippines at best represent only
the population of telephone subscribers, which is, in truth, only a small
minority of the targeted population of all Filipinos. Current television and
radio polls being conducted by a number of media stations reflect only the
population of those who are watching or listening to the show and who are
persistent in phoning in their views. Thus, there is a serious issue of
coverage. The same is true in the case of Internet-based and SMS surveys.
Even a seriously done Internet survey will only reflect those who have
Internet access, which is currently not the majority of Filipino households. To
illustrate coverage and other non-sampling errors, consider the following
case in point.
Example of Survey Estimate Fiasco:

In 1936, the Literary Digest, a famed magazine in the United States, conducted
a survey of its subscribers as well as telephone subscribers to predict the
outcome of the presidential race. The Digest erroneously predicted that then
incumbent President Franklin D. Roosevelt would receive 43% of the vote and
thus lose to the challenger Kansas Governor Alfred Landon when in actuality,
Landon only received 38% of the total vote. (The Digest went bankrupt
thereafter). At the same time, George Gallup set up his polling organization and
correctly forecasted Roosevelt’s victory from a mere sample of 50,000 people.
A post-mortem analysis revealed coverage errors arising from biases in sample
selection. The Literary Digest list of targeted respondents was taken from
telephone books, magazine subscriptions, club membership lists, and
automobile registrations. Inadvertently, the Digest targeted well-to-do voters,
who were predominantly Republican and who had a tendency to vote for their
candidate. The sample had a built-in bias to favor one group over another. This
is called selection bias. In addition, there was also a non-response bias
since, of the 10 million they targeted for the survey, only 2.4 million had actually
responded. A response rate of 24% is far too low to yield reliable estimates of
population parameters. Nonresponsive people may differ considerably in their
views from the views of responders.
Here, we see that obtaining a large number of respondents does not cure
procedural defects but only repeats them over and over again! When choosing a
sample, biases such as selection bias or nonresponse bias should be avoided.
However, in practice, it can be challenging to avoid nonresponse bias in surveys
since there are people who will fill out surveys and those who will not, even if
incentives are provided.
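The point that a large sample does not cure a defective selection procedure can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the electorate, the group sizes, the support rates, and the rule used to build the biased frame are all hypothetical and are not the actual 1936 figures.

```python
import random

random.seed(1936)  # fixed seed so the sketch is repeatable

# Hypothetical electorate (assumed numbers, not historical data):
# 30,000 well-to-do voters, 30% of whom back the incumbent, and
# 70,000 other voters, 75% of whom back the incumbent.
well_to_do = [1] * 9_000 + [0] * 21_000
others = [1] * 52_500 + [0] * 17_500
population = well_to_do + others
true_share = sum(population) / len(population)

# Biased frame: every well-to-do voter, but only every 10th of the others.
biased_frame = well_to_do + others[::10]

def estimate(frame, n):
    """Share of incumbent supporters in a simple random sample of size n."""
    sample = random.sample(frame, n)
    return sum(sample) / n

big_biased = estimate(biased_frame, 20_000)   # large sample, defective frame
small_random = estimate(population, 2_000)    # small sample, complete frame

print(f"true share          : {true_share:.3f}")
print(f"biased, n = 20,000  : {big_biased:.3f}")
print(f"random, n = 2,000   : {small_random:.3f}")
```

Under these assumptions, the large sample from the biased frame should land far below the true share, while the much smaller random sample should land close to it.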
Provide more examples to emphasize the lesson on non-sampling errors, such
as asking your learners a question but only selecting specific people to answer
the question. For example, ask what their favorite toy or game was when they were
growing up, but only ask either the boys or the girls. Then, based on the responses,
conclude that the answers hold for the entire class. Have them react to
your statements. Explain that this was an example of selection or coverage bias.
You can also ask what the average height of the class is, but only ask tall people.
Then, conclude that all members of the class are tall. Emphasize that non-
probability sampling makes the conclusions hard to generalize for the
population.
To remedy biases (or failures of a sample to represent the population) resulting
from “convenience” errors, polling organizations have since resorted to
using probability-based methodologies for selecting samples, where the subjects
are chosen on the basis of known probabilities, which in turn allow us to
compute the number of people each sampled respondent effectively
represents. Randomization, or using chance-based procedures for selecting
respondents, is the best guarantee against bias. However, it is important to first
have an idea of the sampling frame, i.e. the list of units in the targeted population, and to carefully
design the survey in order to make it representative of the targeted population.
Other possible sources of biases in sample surveys that one should be cautious
about:
• wording of questions, which can influence the response enormously
• the sensitivity of a survey topic (e.g., income, sex and illegal behavior)
• interviewer biases in selecting respondents or in the responses generated
because of the appearance and demeanor of the interviewer
• non-response biases, which happen when targeted respondents opt not
to provide information in the survey
Note to Teacher: You may mention the following examples to drive the point
further about survey errors. One rather famous example is the time when surveys
conducted by all the “reputable” pollsters Gallup, Crossley and Roper in the United
States in 1948 embarrassingly resulted in the wrong prediction that New York
Governor Thomas Dewey would beat the reelectionist US President Harry Truman in
the presidential race. No less than the famed Chicago Daily Tribune printed an early
edition with a headline based on the (wrong!) poll predictions.
Re-electionist Harry Truman showing the Chicago Daily Tribune early edition
There, problems resulted from the sampling design, with the interviewers being
provided excessive judgment calls on whom to interview. These polls used quota
sampling. Interviewers may have selected the least threatening people they would
encounter on the field, e.g. the best-dressed people so that the samples chosen
systematically over-represented a part of the population (and underrepresented
other groups).
Special surveys that measure the difference between respondents and non-
respondents show that lower-income and upper-income people tend not to
respond to questionnaires, so that modern polling organizations would prefer to
use personal interviews rather than mailed questionnaires.
In developed countries like the United States, the typical response rate for personal
interviews is about 65%, compared to merely 25% for mailed questionnaires. A
number of methods are now being tested to improve response rates in polls and
other surveys.
4. Sampling Distribution, Accuracy and Precision
As was pointed out earlier, statistics generated from a sample survey are subject to
both non-sampling and sampling errors. The latter arise because only a part of the
population is observed. There is likely to be some difference between the sample
statistic and the true value of the population parameter (that you would have
obtained had a census been conducted). To know more about this difference or
sampling error and consequently establish the reliability of the sample statistic, you
have to understand the chance process involved in the sample selection. For this
purpose, you have to analyze the sampling distribution or the set of all possible
values that the point estimate could take under repeated sampling, and possibly
approximate this sampling distribution.
When estimating, you should know something about the population to be
generalized. One of the characteristics of the population that is often estimated is
the mean. The population mean is often the parameter to be estimated. There can
be several estimators of the population mean, including the sample mean, sample
median, sample mode, and sample midrange. In similar manner, there can be
several estimators of the population variance $\sigma^2$. Given sample data
$x_1, x_2, \ldots, x_n$, where $\bar{x}$ represents the sample mean (i.e. the sum of the data divided by the sample
size $n$), then the sample variance defined with denominator $n-1$,
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,$$
and that with denominator $n$,
$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2,$$
are two estimators of the population variance.
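As a quick numerical illustration of the two variance estimators, the sketch below computes both on one small sample. The five weights are hypothetical values chosen for illustration; they are not data from this guide.

```python
import statistics

# Hypothetical sample of five weights in kg (assumed for illustration).
sample = [40, 38, 45, 50, 39]
n = len(sample)
x_bar = statistics.mean(sample)

# Sum of squared deviations from the sample mean.
sum_sq = sum((x - x_bar) ** 2 for x in sample)

var_n_minus_1 = sum_sq / (n - 1)   # estimator with denominator n - 1
var_n = sum_sq / n                 # estimator with denominator n

print(f"sample mean            : {x_bar}")
print(f"variance, divisor n - 1: {var_n_minus_1:.2f}")
print(f"variance, divisor n    : {var_n:.2f}")
```

The two results differ only by the factor (n - 1)/n, so the estimator with denominator n is always the smaller of the two. In Python's standard library, `statistics.variance` uses the divisor n - 1 and `statistics.pvariance` uses the divisor n.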
As was earlier pointed out, a good estimator must possess desirable properties:
accuracy and precision.
• Accuracy is a measure of how close the estimates are to the parameter. It
can be measured by bias, i.e., the difference of the expected value of the
estimate from the true value of the parameter. An estimator is said to be
unbiased if its bias is zero. Otherwise, the estimator is biased. When bias is
positive or greater than zero, the estimator overestimates the parameter. If
negative or below zero, the estimator underestimates the parameter.
• Precision is a measure of how close the estimates are to each other. The
variance of the estimator or its standard error gives a measure of how precise
the estimator is. The smaller the value of the standard error of an estimator,
the more precise the estimator is.
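Bias, as defined above, can be checked exactly for a tiny population by listing every possible sample. The sketch below does this for two estimators of the population variance: the sum of squared deviations from the sample mean divided by n - 1, and the same sum divided by n. The population of five weights and the sample size of 2 (drawn with replacement) are assumptions made only for illustration.

```python
from itertools import product
from statistics import mean, pvariance

# Hypothetical population of five weights in kg (assumed for illustration).
population = [40, 38, 45, 50, 39]
sigma2 = pvariance(population)     # the parameter: population variance
n = 2                              # sample size

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample)

# All 5**2 = 25 equally likely samples drawn with replacement.
samples = list(product(population, repeat=n))
est_n_minus_1 = [sum_sq_dev(s) / (n - 1) for s in samples]
est_n = [sum_sq_dev(s) / n for s in samples]

# Bias = expected value of the estimator minus the true parameter value.
bias_n_minus_1 = mean(est_n_minus_1) - sigma2
bias_n = mean(est_n) - sigma2

print(f"population variance : {sigma2:.4f}")
print(f"bias, divisor n - 1 : {bias_n_minus_1:.4f}")
print(f"bias, divisor n     : {bias_n:.4f}")
```

In this setting the estimator with divisor n - 1 has zero bias, while the one with divisor n has negative bias, i.e. it underestimates the population variance on average.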
In general, we want the estimator to be both accurate and precise. We can illustrate
precision and accuracy by way of an analogy. Let us represent the parameter as a
target bull’s eye while the estimates of the parameters are the arrows shot by an
archer. The first target (1) in the figure below illustrates a precise but not an
accurate estimator. The second target (2) shows that the archer or estimator is
accurate but not precise. The third target (3) shows the archer is both precise
and accurate while the last target (4) shows an estimator that is neither accurate nor
precise.
[Figure: four targets labeled (1), (2), (3), and (4)]
Figure 3-02.5 Analogy between estimation and hitting the bull’s eye
Example: The sample mean (of a simple random sample) is an estimator of the
population mean that is both accurate and precise. Its expected value is equal to
the population mean itself, which is why it is unbiased and, consequently, an accurate
estimator. It is precise because statistical theory has determined that it has the
smallest standard error compared to other estimators. Having these good
properties of an estimator makes the sample mean a good estimator of the
population mean.
E. Enrichment
Encourage learners to come up with a survey to discover something that can be
relevant to their experience in high school. For example, they can ask about the biggest
concern of learners in their grade level (Is it their academics, family, friends/peers,
etc?) or what the learners in their grade level want to do after high school, or who
they want to support in the next national elections. They can explore different
sampling methods or try different ways of collecting data (interviews,
questionnaires, text polls). Then, have them report their findings and ask them how
they can come up with the appropriate interpretation of the data they generated.
KEY POINTS
• Sampling is undertaken over full enumeration (census) since selecting a sample is
less time-consuming and less costly than selecting every item in the population.
An analysis of a sample is also less cumbersome and more practical than an
analysis of the entire population.
• Probability sampling involves units obtained using a chance mechanism, and
requires the use of a sampling frame (a list/map of all the sampling units in the
population), while in a non-probability sample, units are chosen without regard to
their probability of occurrence. The latter type of sample should not be used for
statistical inference. Typical basic probability samples include:
o Simple random sample of size n is one in which each set of n
elements in the population has an equal chance of being selected.
o Systematic sample is a sample drawn by first selecting a fixed starting point
in the larger population and then obtaining subsequent observations by
using a constant interval between samples taken.
o Stratified random sample is a sample chosen in such a way that the
population is divided into several subgroups, called strata, with random
samples drawn from each stratum.
o Cluster sample is a sample where entire groups (or clusters) are chosen at
random.
• Types of Survey errors
o Sampling error results from chance variation from sample to sample in a
probability sample. It is roughly the difference between the value obtained
in a sample statistic and the value of the population parameter that would
have arisen had a census been conducted. Since estimates of a parameter
from a probability sample would vary from sample to sample, the variation
in estimates serves as a measure of sampling error.
o Non-sampling error:
▪ Coverage error or selection bias results if some groups are excluded
from the frame and have no chance of being selected
▪ Non-response error or bias occurs when people who do not
respond may be different from those who do respond
▪ Measurement error arises from weaknesses in question design,
respondent error, and interviewer’s impact on the respondent
• A representative sample, using chance-based methods for selecting the sample
units, can provide insights about a population. The size of the sample, not its
relative size to the larger population, determines the precision of the statistics it
generates. Randomization, i.e. using chance-based procedures for selecting
respondents, is the best guarantee against bias.
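The statement that the size of the sample drives precision can be illustrated with the usual large-sample margin of error for a proportion. The sketch below is a rough approximation only, under assumptions not stated in this guide: simple random sampling from a large population, a 95% confidence level (z = 1.96), and the worst case p = 0.5. The function name `margin_of_error` is made up for this example.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion (assumes SRS)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (300, 1_200, 4_800):
    moe = 100 * margin_of_error(n)
    print(f"n = {n:>5}: about {moe:.1f} percentage points")
```

With about 1,200 respondents, the result is close to the margin of error of about 3 percentage points cited earlier for opinion polls. The population size does not appear in the formula, and quadrupling the sample size only halves the margin of error.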
ASSESSMENT
I. Select the best choice.

1. The process of using sample statistics to draw conclusions about true population parameters is
called
a) statistical inference
b) the scientific method
c) sampling
d) descriptive statistics
ANSWER: a
2. The universe or "totality of items or things" under consideration is called

a) a sample
b) a population
c) a parameter
d) a statistic
ANSWER: b
3. The portion of the universe that has been selected for analysis is called
a) a sample
b) a frame
c) a parameter
d) a statistic
ANSWER: a
4. A summary measure that is computed to describe a characteristic from only a sample of the
population is called
a) a parameter
b) a census
c) a statistic
d) the scientific method
ANSWER: c
5. A summary measure that is computed to describe a characteristic of an entire population is

called
a) a parameter
b) a census
c) a statistic
d) the scientific method
ANSWER: a
6. Which of the following is most likely a population as opposed to a sample?

a) respondents to a newspaper survey
b) the first 5 learners completing an assignment
c) every third person to arrive at the bank
d) registered voters in a county
ANSWER: d
7. Which of the following is most likely a parameter as opposed to a statistic?

a) The average score of the first five learners completing an assignment
b) The proportion of females registered to vote in a county
c) The average height of people randomly selected from a database
d) The proportion of trucks stopped yesterday that were cited for bad brakes
ANSWER: b
8. Which of the following is NOT a reason for the need for sampling?
a) It is usually too costly to study the whole population.
b) It is usually too time-consuming to look at the whole population.
c) It is sometimes destructive to observe the entire population.
d) It is always more informative by investigating a sample than the entire population.
ANSWER: d
9. Which of the following is NOT a reason for drawing a sample?

a) A sample is less time consuming than a census.
b) A sample is less costly to administer than a census.
c) A sample is always a good representation of the target population.
d) A sample is less cumbersome and more practical to administer.
ANSWER: c
10. The Philippine Airlines Internet site provides a questionnaire instrument that can be answered
electronically. Which of the 4 methods of data collection is involved when people complete the
questionnaire?
a) Published sources
b) Experimentation
c) Surveying
d) Observation
ANSWER: c
II. Identify the population, parameter of interest, the sampling frame, the sample, the sampling
method, and any potential sources of biases in the following studies
1. The producers of a television show asked information from Facebook users on the TV
show’s Facebook page about their sentiments (favorable, unfavorable, neutral) on a
segment on the TV show
Answer:
population: TV show viewers; parameter: proportion of TV viewers who have
favorable sentiments to a segment on the TV show; sampling frame: Facebook
users who have access to the TV show’s Facebook page; sampling method:
convenience sample; biases: coverage, non-response, and measurement errors
2. A question posted on the website of a daily newspaper in the Philippines asked visitors
of the site to indicate their voter preference for the next presidential election.
Answer:
population: Filipino voters ; parameter: proportion of voters who would vote for
some presidentiables; sampling frame: visitors of the website; sampling method:
voluntary response (no randomization used); biases: coverage (sampling frame is
not target population), and voluntary response has a “self selection bias” (those
who visit the site and respond may be predisposed to a particular answer).
3. In March 2015, Pulse Asia reported that the leading urgent concerns of Filipinos are
inflation control (46%), the increase of workers' pay (44%), and the fight against
government corruption (40%). On the other hand, Filipinos are least concerned with
national territorial integrity (5%), terrorism (5%), and charter change (4%).The
nationwide survey was conducted from March 1 to 7, 2015 with 1,200 respondents.
Answer:
population: Filipinos; parameter: proportion of people who are concerned about
various socio-economic issues; sampling frame: all Filipino adults; sampling
method: 1200 randomly selected respondents; biases: probably not biased.
Conclusions could be generalized
4. A sample survey of persons with disability (PWDs) was designed to be representative of
PWDs, by making use of PWD registers from local government units, but an assessment
suggested the registers were severely undercovering PWDs. The design was adjusted
to make use of snowball sampling where existing sampled PWDs would identify other
future subjects from among their acquaintances. The study attempted to examine the
proportion of PWDs who were poor.
Answer:
population: PWDs; parameter: proportion of PWDs who are poor; sampling
frame: PWD registers with extra PWDs from snowball sampling; sampling
method: random selection from PWD registers plus snowball sampling; biases:
coverage errors, non-response biases (from uncooperative PWDs)
III. Identify which sampling method is applied in the following situations.

1. The teacher randomly selects 20 boys and 15 girls from a batch of learners to be
members of a group that will go to a field trip
Stratified sampling
2. A sample of 10 mice are selected at random from a set of 40 mice to test the effect of a
certain medicine
Simple Random Sampling
3. All members of two of the five groups of people in a certain seminar are asked what
they think about the president.
Cluster Sampling
4. A barangay health worker asks every fourth house in the village for the ages of the
children living in those households.
Systematic Sampling
5. A sales clerk for a brand of clothing asks people who come up to her whether they
own an article of clothing from her brand.
Volunteer Sampling
6. A psychologist asks his patient, who suffers from depression, whether he knows other
people with the same condition, so he can include them in his study
Snowball Sampling
7. A brand manager of a toothpaste asks ten dentists whose clinics are closest to his office
whether they use a particular brand of toothpaste.
Convenience Sampling
IV. Examine each of the following questions that could be used in a survey for possible bias.
Indicate how the question might be improved
1. Do you go swimming?
a. Never
b. Rarely
c. Frequently
d. Sometimes
Possible Answer: The problem with this question is in the categories supplied for the answer.
Everybody has a different idea as to what words such as ‘sometimes’ and ‘frequently’ mean.
Instead, give specific time frames such as ‘twice a year’ or ‘once a month’. Also, the order of
answers should follow a logical sequence – in the example above, they do not.
2. How many books have you read in the last year?

a. None
b. 1 to 5
c. 5 to 10
d. 10 to 20
e. Over 20
Possible Answer: This question may contain prestige bias – would people be more likely to say
they have read plenty of books when they might actually not have read any? Also, the categories
for the answers need modification – which box would you tick for someone who answered ‘5’ or
‘10’?
3. Do you think senior high school learners should be required to wear school uniforms?
a. Agree
b. Disagree
c. Neutral or No opinion
Possible Answer: Seems unbiased
4. What do you think about the CPP-NPA attempt to blackmail the Government?
Possible Answer: This is a very leading question which uses an emotive word—blackmail. It
assumes that the CPP-NPA is blackmailing the Government and assumes that someone knows
about the issues and would be able to answer. A filter question would have to be used in this case
and the word “blackmail” changed.
5. What is wrong with the young people of today and what can we do about it?
Possible Answer: This question is double-barreled, leading, and ambiguous. It asks two questions
in one and so needs to be split up.
The word ‘wrong’ is emotive and suggests there is something not normal about the young people
of today. It asks the respondent to distance themselves and comment from a moral high ground.
CHAPTER 3: SAMPLING
Lesson 3: Sampling Distribution of the Sample Mean

OVERVIEW OF LESSON
In this lesson, learners are given lectures (and assessments) regarding the sampling
distribution of the sample mean and sample proportion (under conditions that random
sampling is done with replacement).
• identify sampling distributions of some statistics of the sample mean and the
sample proportion
• calculate the mean and variance of the sampling distribution of the sample
mean and sample proportion
• describe the approximate sampling distribution of the sample mean (and sample
proportion) when the sample size is large
LESSON OUTLINE
A. Introduction
B. Motivational Activity on Sampling Distribution
C. Lesson Proper:
1. Expected Value and Standard Error
2. Approximating the Sampling Distribution
3. Some Points of Confusion
KEY CONCEPTS: Sampling Distribution, Expected Value, Standard Error, Normal

Curve, Central Limit Theorem
A. Introduction
In previous lessons, learners were provided concepts about sampling, including the
reasons for sampling as opposed to the conduct of a full-enumeration census. It was
also pointed out that probability sampling, where samples are selected using
chance methods, enables the samples to be representative of the population being
studied. Samples can be drawn with or without replacement. In addition, if the
sampling protocol were to be replicated, then a new set of samples (and data)
would be obtained, thus yielding different estimates from one sample to another.
Thus, an estimate based on a sample could be different if the sampling process were
to be repeated many times. The set of all possible estimates generated is called the
sampling distribution. In this lesson, learners will be provided more descriptions
and discussions on sampling distributions.
B. Motivational Activity on Sampling Distribution
1. Instruct learners to form groups of five members each. Assign a number, from 1
to 5, to each member of the group. For each group, list the weight of the
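members on a sheet of paper. Compute for the mean weight of the group, $\mu$,
and the standard deviation of the weights, $\sigma$. For reference, the formula for the
standard deviation is
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2},$$
where $x_1, \ldots, x_N$ are the weights of the $N = 5$ members.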
2. Tell them to list all the possible samples with two members with replacement.
For example: 1-1, 1-2, 1-3, 1-4, 1-5, 2-1, 2-2, etc. There should be a total of 25
possible samples. Learners may list the weights of the learners in each of the
samples.
3. For each of the samples, compute for the average weight of the learners. Tell
them that these are the possible values of the sample mean, $\bar{x}$. Ask them how
many of the sample means are equal to the mean weight of the group. They
might notice that most, if not all, of the sample means are not equal to the true
mean weight. Ask them whether, based on the activity, a sample will always be
representative of the population. They should notice that
there is “chance error”, i.e. each sample mean may differ from the population
mean, although most of the differences are small.
4. After computing for all the possible values of $\bar{x}$, ask them to compute the
average of all the values of $\bar{x}$. This is the mean of all the sample means, $\mu_{\bar{x}}$. Ask
them to compare this value to the mean of the group. The learners should
notice that $\mu_{\bar{x}}$, which we shall refer to as the expected value of the sampling
distribution of the sample mean, is equal to the population (or group’s) mean $\mu$.
What does this say about the mean of all the possible samples? They should
mention that this suggests that while there is chance error, the sample mean
appears to be a fairly good estimate of the mean $\mu$.
5. Next, ask them to compute for the standard deviation of all the sample means,
$\sigma_{\bar{x}}$. Ask them to also compare this with the population standard deviation $\sigma$.
They should notice that the standard deviation of all the sample means (which
we shall later refer to as the “standard error”) is less than the population
standard deviation. This suggests that the “sampling distribution” of the sample
mean is more clustered around $\mu$ than the original “population.” Inform them
that, in fact, when sampling with replacement, the standard deviation of sample
means, $\sigma_{\bar{x}}$, is $\sigma/\sqrt{n}$, where the sample size $n$ here is 2.
6. Also, for each sample, ask learners to compute for the proportion of males (the
number of males in the sample divided by the size of the sample). This should be
one of three values, 0, ½, or 1, since each sample may have zero, one, or two
males. This is called a sample proportion, $\hat{p}$.
7. Compute for the mean of all sample proportions, that is, $\mu_{\hat{p}}$. How does this
compare to the proportion of males in the group? This is similar to the result
earlier that the mean of the sampling distribution is the targeted mean, since a
proportion is essentially an average, an average of 1’s and 0’s.
8. Next, ask them to compute for the standard deviation of all the possible values
of the sample proportion, $\sigma_{\hat{p}}$. Tell them to recall that for a binomial probability
model with a single trial, the standard deviation is $\sigma = \sqrt{p(1-p)}$, where $p$ is the proportion of males
in the group. Let them notice that $\sigma_{\hat{p}}$ is less than $\sigma$. In fact, $\sigma_{\hat{p}}$ should
equal $\sigma/\sqrt{n} = \sqrt{p(1-p)/n}$, where $n$ is the sample size, 2.
Here is a sample table that the learners can prepare:

Number in sample   Weights of sample   Sample mean, $\bar{x}$   Sample proportion, $\hat{p}$
1-1                40 kg, 40 kg        40 kg                    1
1-2                40 kg, 38 kg        39 kg                    0.5
…
5-5                39 kg, 39 kg        39 kg                    0
Total                                  1237.5                   15
Average                                49.5 kg                  0.6
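The whole activity can be sketched in a few lines of code, which may help in checking the learners' tables. The weights of learners 1, 2, and 5 and the sexes of learners 1, 2, and 5 follow the rows shown in the sample table. The weights and sexes of learners 3 and 4 are not shown there; the values below are assumptions chosen so that the totals agree with the table.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# (weight in kg, 1 if male else 0); entries for learners 3 and 4 are assumed.
group = {1: (40.0, 1), 2: (38.0, 0), 3: (65.0, 1), 4: (65.5, 1), 5: (39.0, 0)}
weights = [w for w, _ in group.values()]
males = [m for _, m in group.values()]

mu, sigma = mean(weights), pstdev(weights)   # group mean and standard deviation
p = mean(males)                              # proportion of males in the group

n = 2
pairs = list(product(group, repeat=n))       # 1-1, 1-2, ..., 5-5
sample_means = [mean(group[i][0] for i in pair) for pair in pairs]
sample_props = [mean(group[i][1] for i in pair) for pair in pairs]

print(len(pairs))                                    # number of possible samples
print(mean(sample_means), mu)                        # mean of sample means vs. mu
print(pstdev(sample_means), sigma / sqrt(n))         # SD of sample means vs. sigma / sqrt(n)
print(mean(sample_props), p)                         # mean of sample proportions vs. p
print(pstdev(sample_props), sqrt(p * (1 - p) / n))   # SD of proportions vs. formula
```

Each printed pair should agree up to rounding, which is what items 4, 5, 7, and 8 of the activity ask the learners to notice.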
C. Lesson Proper
1. Expected Value and Standard Error
Given a sample data set that was drawn from a certain population, the resulting
sample mean serves as an estimate of the mean of the population from which the
sample was derived. Ask learners to imagine having a replicated run of the random
sampling protocol. Would the new estimate of the mean be the same as the
previous estimate? Point out to learners that the new estimate is likely to vary from
the first estimate because of randomization. If the protocol were to be further
replicated many times, then we would have a distribution of estimates. The set of all
possible estimates is called the sampling distribution. This sampling distribution has
a mean and standard deviation. Henceforth, we define:
• the expected value (EV) as the mean of the sampling distribution

• the standard error (SE) as the standard deviation of the sampling distribution
It turns out that for the sampling distribution of the mean, EV is the population
mean $\mu$. That is,
$$E(\bar{x}) = \mu_{\bar{x}} = \mu.$$
Thus, we say that the sample mean is an unbiased estimate of the population
mean $\mu$. In addition, the SE can be viewed as measuring the amount of chance
variability in the estimates that could be generated over all possible samples. Ask
learners whether they prefer the estimates to have less variability. Their intuition
should lead them to say that they desire estimates with a small SE, since in this
case, the chances are good that the estimate will be close to the true value of the
parameter.
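For teachers who want the reasoning behind these two facts, here is a short derivation. It assumes that the sample values $x_1, \ldots, x_n$ are drawn at random with replacement, so that each draw has mean $\mu$ (the population mean) and variance $\sigma^2$ (the population variance), and that the draws are independent.

```latex
\begin{align*}
E(\bar{x}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(x_i)
            = \frac{1}{n}\,(n\mu) = \mu, \\
\operatorname{Var}(\bar{x}) &= \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(x_i)
            = \frac{1}{n^{2}}\,(n\sigma^{2}) = \frac{\sigma^{2}}{n},
\qquad
SE(\bar{x}) = \sqrt{\operatorname{Var}(\bar{x})} = \frac{\sigma}{\sqrt{n}}.
\end{align*}
```

The second line relies on independence of the draws, which is why the result is stated for sampling with replacement.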
Consider the data in Lesson 1-08 pertaining to the heights, weights, and BMIs of learners,
but suppose you limit interest to information about the female learners, whose heights,
weights, and BMIs we replicate below:
Student                           Height (in m)   Weight (in kg)   BMI (in kg/m²)
1                                 1.64            40               14.8721
2                                 1.52            50               21.64127
3                                 1.52            49               21.20845
4                                 1.65            45               16.52893
5                                 1.02            60               57.67013
6                                 1.626           45               17.02046
7                                 1.5             38               16.88889
8                                 1.6             51               19.92188
9                                 1.42            42.2             20.92839
10                                1.52            54               23.37258
11                                1.48            46               21.00073
12                                1.62            54               20.57613
13                                1.5             36               16
14                                1.54            50               21.08281
15                                1.67            63               22.58955
(Population) Average              1.52            48.21            22.09
(Population) Standard Deviation   0.11            5.23             6.96

A visualization of the distribution of the heights, weights, and BMIs of the 15 learners
is given in Figure 3-03.1 a, b, and c, respectively.
[Histograms: (A) Height, (B) Weight, (C) BMI; vertical axis: Fraction]
Figure 3-03.1. Distribution of (a) Heights, (b) Weights, and (c) BMIs of 15
Female Learners
Suppose you are interested in estimating the (population) average height, (population)
average weight, and (population) average BMI of these 15 female learners by getting
estimates based on the (sample) average height, (sample) average weight, and (sample)
average BMI of two female learners selected at random.
Draw a box of 15 tickets and mention that these tickets represent the 15 female
learners. Ask learners how many possible samples of size 2 can be obtained when drawing with replacement. They
should say that there are 15 × 15 = 225 possible samples of size 2 (i.e., 225 possible
samples of 2 female learners that can be drawn from the population of 15 female
learners).
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
Table 3-03.1 lists all the 225 possible random samples with replacement of sample size
2 that could be taken from the box above, together with the (sample) average height,
(sample) average weight, and (sample) average BMI for the specific sample drawn. For
example, if we had obtained sample number 80, with the tickets 6 and 5 drawn,
meaning student number 6 and student number 5 were selected, then for this sample,
the average of the height, weight and BMI of the learners is 1.323 m, 52.5 kg, and
37.3453 kg/m², respectively.
Student   Height (in m)   Weight (in kg)   BMI (in kg/m²)
6         1.626           45               17.02046
5         1.02            60               57.67013
Average   1.323           52.5             37.3453
Each of the 225 possible samples has an equal chance, i.e. 1/225, of being selected.
The sample averages for the heights, weights, and BMIs have sampling distributions
illustrated by Figure 3-03.2 (a), (b) and (c). While only one sample of size 2 would
actually be chosen, there is a host of possible samples and thus several
possible sample averages for the heights, weights, and BMIs that will serve as estimates
for the population averages. Estimates for the population average height (of 1.52
meters), for instance, can vary from as low as 1.02 meters to as high as 1.67 meters. In
estimating the population average weight (of 48.21 kg), the sample
average weight can range from 36 kg to 63 kg, while for the population average BMI (of
22.09 kg/m²), the sample averages can go from as low as 14.87 kg/m² to as high as
57.67 kg/m².
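The enumeration of the 225 samples can also be reproduced with a short script, sketched below for heights and weights only (BMI is left out to keep it brief). The heights and weights are the 15 female learners' values listed in this lesson; the numbering of samples (first ticket varying slowest) is an assumption that happens to agree with sample number 80 being the pair 6 and 5.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# (height in m, weight in kg) of the 15 female learners listed in the lesson.
learners = {
    1: (1.64, 40), 2: (1.52, 50), 3: (1.52, 49), 4: (1.65, 45), 5: (1.02, 60),
    6: (1.626, 45), 7: (1.5, 38), 8: (1.6, 51), 9: (1.42, 42.2), 10: (1.52, 54),
    11: (1.48, 46), 12: (1.62, 54), 13: (1.5, 36), 14: (1.54, 50), 15: (1.67, 63),
}

# Ordered pairs (first ticket, second ticket), drawn with replacement.
samples = list(product(learners, repeat=2))

def sample_mean(pair, k):
    """Average of variable k (0 = height, 1 = weight) over the two tickets."""
    return (learners[pair[0]][k] + learners[pair[1]][k]) / 2

mean_heights = [sample_mean(s, 0) for s in samples]
mean_weights = [sample_mean(s, 1) for s in samples]

print(len(samples))                                       # 225
print(samples[79], mean_heights[79], mean_weights[79])    # sample number 80
print(min(mean_heights), max(mean_heights))               # range of average heights
print(min(mean_weights), max(mean_weights))               # range of average weights

heights = [h for h, _ in learners.values()]
print(mean(mean_heights), mean(heights))                  # EV vs. population mean
print(pstdev(mean_heights), pstdev(heights) / sqrt(2))    # SE vs. sigma / sqrt(2)
```

The extreme sample averages come from drawing the same learner twice, which is why the range of the sample averages matches the range of the individual values.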
Table 3-03.1 Distinct samples of size two (with replacement) and average heights, weights,
and BMIs of sample
Sample First Second Average Average Average BMI Sample First Second Average Average Average BMI Sample First Second Average Average Average
Student Student Height Weight Student Student Height Weight Student Student Height Weight BMI
1 1 1 1.64 40 14.8721 76 6 1 1.633 42.5 15.94628 151 11 1 1.595 58.5 22.98107
2 1 2 1.58 45 18.25669 77 6 2 1.573 47.5 19.33087 152 11 2 1.56 43 17.93642
3 1 3 1.58 44.5 18.04028 78 6 3 1.573 47 19.11446 153 11 3 1.5 48 21.321
4 1 4 1.645 42.5 15.70052 79 6 4 1.638 45 16.7747 154 11 4 1.5 47.5 21.10459
5 1 5 1.33 50 36.27112 80 6 5 1.323 52.5 37.3453 155 11 5 1.565 45.5 18.76483
6 1 6 1.633 42.5 15.94628 81 6 6 1.626 45 17.02046 156 11 6 1.25 53 39.33543
7 1 7 1.57 39 15.8805 82 6 7 1.563 41.5 16.95468 157 11 7 1.553 45.5 19.0106
8 1 8 1.62 45.5 17.39699 83 6 8 1.613 48 18.47117 158 11 8 1.49 42 18.94481
9 1 9 1.53 41.1 17.90025 84 6 9 1.523 43.6 18.97443 159 11 9 1.54 48.5 20.46131
10 1 10 1.58 47 19.12234 85 6 10 1.573 49.5 20.19652 160 11 10 1.45 44.1 20.96456
11 1 11 1.56 43 17.93642 86 6 11 1.553 45.5 19.0106 161 11 11 1.5 50 22.18666
12 1 12 1.63 47 17.72412 87 6 12 1.623 49.5 18.7983 162 11 12 1.48 46 21.00073
13 1 13 1.57 38 15.43605 88 6 13 1.563 40.5 16.51023 163 11 13 1.55 50 20.78843
14 1 14 1.59 45 17.97746 89 6 14 1.583 47.5 19.05164 164 11 14 1.49 41 18.50037
15 1 15 1.655 51.5 18.73083 90 6 15 1.648 54 19.80501 165 11 15 1.51 48 21.04177
16 2 1 1.58 45 18.25669 91 7 1 1.57 39 15.8805 166 12 1 1.575 54.5 21.79514
17 2 2 1.52 50 21.64127 92 7 2 1.51 44 19.26508 167 12 2 1.63 47 17.72412
18 2 3 1.52 49.5 21.42486 93 7 3 1.51 43.5 19.04867 168 12 3 1.57 52 21.1087
19 2 4 1.585 47.5 19.0851 94 7 4 1.575 41.5 16.70891 169 12 4 1.57 51.5 20.89229
20 2 5 1.27 55 39.6557 95 7 5 1.26 49 37.27951 170 12 5 1.635 49.5 18.55253
21 2 6 1.573 47.5 19.33087 96 7 6 1.563 41.5 16.95468 171 12 6 1.32 57 39.12313
22 2 7 1.51 44 19.26508 97 7 7 1.5 38 16.88889 172 12 7 1.623 49.5 18.7983
23 2 8 1.56 50.5 20.78158 98 7 8 1.55 44.5 18.40539 173 12 8 1.56 46 18.73251
24 2 9 1.47 46.1 21.28483 99 7 9 1.46 40.1 18.90864 174 12 9 1.61 52.5 20.24901
25 2 10 1.52 52 22.50693 100 7 10 1.51 46 20.13074 175 12 10 1.52 48.1 20.75226
26 2 11 1.5 48 21.321 101 7 11 1.49 42 18.94481 176 12 11 1.57 54 21.97436
27 2 12 1.57 52 21.1087 102 7 12 1.56 46 18.73251 177 12 12 1.55 50 20.78843
28 2 13 1.51 43 18.82064 103 7 13 1.5 37 16.44445 178 12 13 1.62 54 20.57613
29 2 14 1.53 50 21.36204 104 7 14 1.52 44 18.98585 179 12 14 1.56 45 18.28807
30 2 15 1.595 56.5 22.11541 105 7 15 1.585 50.5 19.73922 180 12 15 1.58 52 20.82947
31 3 1 1.58 44.5 18.04028 106 8 1 1.62 45.5 17.39699 181 13 1 1.645 58.5 21.58284
32 3 2 1.52 49.5 21.42486 107 8 2 1.56 50.5 20.78158 182 13 2 1.57 38 15.43605
33 3 3 1.52 49 21.20845 108 8 3 1.56 50 20.56517 183 13 3 1.51 43 18.82064
34 3 4 1.585 47 18.86869 109 8 4 1.625 48 18.22541 184 13 4 1.51 42.5 18.60423
35 3 5 1.27 54.5 39.43929 110 8 5 1.31 55.5 38.79601 185 13 5 1.575 40.5 16.26447
36 3 6 1.573 47 19.11446 111 8 6 1.613 48 18.47117 186 13 6 1.26 48 36.83507
37 3 7 1.51 43.5 19.04867 112 8 7 1.55 44.5 18.40539 187 13 7 1.563 40.5 16.51023
38 3 8 1.56 50 20.56517 113 8 8 1.6 51 19.92188 188 13 8 1.5 37 16.44445
39 3 9 1.47 45.6 21.06842 114 8 9 1.51 46.6 20.42514 189 13 9 1.55 43.5 17.96094
40 3 10 1.52 51.5 22.29052 115 8 10 1.56 52.5 21.64723 190 13 10 1.46 39.1 18.4642
41 3 11 1.5 47.5 21.10459 116 8 11 1.54 48.5 20.46131 191 13 11 1.51 45 19.68629
42 3 12 1.57 51.5 20.89229 117 8 12 1.61 52.5 20.24901 192 13 12 1.49 41 18.50037
43 3 13 1.51 42.5 18.60423 118 8 13 1.55 43.5 17.96094 193 13 13 1.56 45 18.28807
44 3 14 1.53 49.5 21.14563 119 8 14 1.57 50.5 20.50235 194 13 14 1.5 36 16
45 3 15 1.595 56 21.899 120 8 15 1.635 57 21.25572 195 13 15 1.52 43 18.54141
46 4 1 1.645 42.5 15.70052 121 9 1 1.53 41.1 17.90025 196 14 1 1.585 49.5 19.29478
47 4 2 1.585 47.5 19.0851 122 9 2 1.47 46.1 21.28483 197 14 2 1.59 45 17.97746
48 4 3 1.585 47 18.86869 123 9 3 1.47 45.6 21.06842 198 14 3 1.53 50 21.36204
49 4 4 1.65 45 16.52893 124 9 4 1.535 43.6 18.72866 199 14 4 1.53 49.5 21.14563
50 4 5 1.335 52.5 37.09953 125 9 5 1.22 51.1 39.29926 200 14 5 1.595 47.5 18.80587
51 4 6 1.638 45 16.7747 126 9 6 1.523 43.6 18.97443 201 14 6 1.28 55 39.37647
52 4 7 1.575 41.5 16.70891 127 9 7 1.46 40.1 18.90864 202 14 7 1.583 47.5 19.05164
53 4 8 1.625 48 18.22541 128 9 8 1.51 46.6 20.42514 203 14 8 1.52 44 18.98585
54 4 9 1.535 43.6 18.72866 129 9 9 1.42 42.2 20.92839 204 14 9 1.57 50.5 20.50235
55 4 10 1.585 49.5 19.95076 130 9 10 1.47 48.1 22.15049 205 14 10 1.48 46.1 21.0056
56 4 11 1.565 45.5 18.76483 131 9 11 1.45 44.1 20.96456 206 14 11 1.53 52 22.2277
57 4 12 1.635 49.5 18.55253 132 9 12 1.52 48.1 20.75226 207 14 12 1.51 48 21.04177
58 4 13 1.575 40.5 16.26447 133 9 13 1.46 39.1 18.4642 208 14 13 1.58 52 20.82947
59 4 14 1.595 47.5 18.80587 134 9 14 1.48 46.1 21.0056 209 14 14 1.52 43 18.54141
60 4 15 1.66 54 19.55924 135 9 15 1.545 52.6 21.75897 210 14 15 1.54 50 21.08281
61 5 1 1.33 50 36.27112 136 10 1 1.58 47 19.12234 211 15 1 1.605 56.5 21.83618
62 5 2 1.27 55 39.6557 137 10 2 1.52 52 22.50693 212 15 2 1.655 51.5 18.73083
63 5 3 1.27 54.5 39.43929 138 10 3 1.52 51.5 22.29052 213 15 3 1.595 56.5 22.11541
64 5 4 1.335 52.5 37.09953 139 10 4 1.585 49.5 19.95076 214 15 4 1.595 56 21.899
65 5 5 1.02 60 57.67013 140 10 5 1.27 57 40.52136 215 15 5 1.66 54 19.55924
66 5 6 1.323 52.5 37.3453 141 10 6 1.573 49.5 20.19652 216 15 6 1.345 61.5 40.12984
67 5 7 1.26 49 37.27951 142 10 7 1.51 46 20.13074 217 15 7 1.648 54 19.80501
68 5 8 1.31 55.5 38.79601 143 10 8 1.56 52.5 21.64723 218 15 8 1.585 50.5 19.73922
69 5 9 1.22 51.1 39.29926 144 10 9 1.47 48.1 22.15049 219 15 9 1.635 57 21.25572
70 5 10 1.27 57 40.52136 145 10 10 1.52 54 23.37258 220 15 10 1.545 52.6 21.75897
71 5 11 1.25 53 39.33543 146 10 11 1.5 50 22.18666 221 15 11 1.595 58.5 22.98107
72 5 12 1.32 57 39.12313 147 10 12 1.57 54 21.97436 222 15 12 1.575 54.5 21.79514
73 5 13 1.26 48 36.83507 148 10 13 1.51 45 19.68629 223 15 13 1.645 58.5 21.58284
74 5 14 1.28 55 39.37647 149 10 14 1.53 52 22.2277 224 15 14 1.585 49.5 19.29478
75 5 15 1.345 61.5 40.12984 150 10 15 1.595 58.5 22.98107 225 15 15 1.605 56.5 21.83618
[Figure: three histograms showing the fraction of samples against (a) Average Height, (b) Average Weight, and (c) Average BMI]
Figure 3-03.2. Sampling distributions of the sample means of size n=2
learners for their (a) heights, (b) weights and (c) BMI levels
With regard to the sampling distribution of the sample average height, the EV may be
readily calculated as the mean of all the 225 sample average heights:
$$EV = \frac{1}{225}\sum_{i=1}^{225} \bar{x}_i \approx 1.52$$
while the SE, the standard deviation of the sample average heights, is
$$SE = \sqrt{\frac{1}{225}\sum_{i=1}^{225} (\bar{x}_i - EV)^2} \approx 0.11$$
It may be noted that the true values of the population average and population standard
deviation are 1.52 and 0.15, respectively.
In practice, each sample average (height) will be off from the population average
(height) by some “chance error.” How big the chance error is likely to be is roughly
measured by the SE. The sample averages are rarely more than 2 or 3 SEs away from
the population average.
Learners will note that the EV, i.e. the center of the sampling distribution,
$$EV \approx 1.52$$
is equal to the targeted population mean (height)
$$\mu \approx 1.52$$
while the SE, the standard deviation of the sample average heights,
$$SE \approx 0.11$$
is less than the standard deviation of the population of heights data:
$$\sigma \approx 0.15$$
It turns out that theoretically, the SE is the ratio of the population standard
deviation to the square root of the sample size, i.e.,
$$SE = \frac{\sigma}{\sqrt{n}} = \frac{0.15}{\sqrt{2}} \approx 0.11$$
Similarly, learners can verify that for weights, the sampling distribution has an EV
(approximately 48.21 kg) which is equal to the mean of the weights distribution,
and that the SE, the standard deviation of the sample average weights,
is the ratio of the (population) standard deviation of the weights to the square root of
the sample size, i.e.
$$SE = \frac{\sigma}{\sqrt{n}} \quad \text{with } n = 2$$
Learners may be able to see that the pattern holds true for BMI. That is, the sampling
distribution has an expected value (approximately 22.09 kg/m2)
that is equal to the mean of the BMI distribution,
and that the SE, the standard deviation of the sample average BMIs,
is also the ratio of the (population) standard deviation of the BMIs to the square root of
the sample size, i.e.
$$SE = \frac{\sigma}{\sqrt{n}} \quad \text{with } n = 2$$
Learners should now recognize that

• a sample statistic (a summary measure from sample data such as the sample
mean, sample percentage, sample median, and even the sample standard
deviation) has an associated sampling distribution arising from the fact that if
we were to repeatedly take all possible samples that could be generated, the
sample statistics will differ from sample to sample
• in practice, these sample statistics (such as a sample mean) will not be exactly
equal to the targeted parameter (i.e. the population mean)
• the difference between the sample statistic and the targeted parameter is a
chance error, which you hope to be small. How big the chance error is likely
to be is roughly measured by the SE, the ratio of the (population) standard
deviation to the square root of the sample size
It would be desirable to have the EV of the sampling distribution equal to the
targeted parameter, in which case the estimate is an unbiased estimate of the
parameter.
• Sample means turn out to be unbiased estimates of the population mean,
i.e. the expected value of the sampling distribution of the sample means is
the population mean.
Also, you would want the standard error, i.e. the standard deviation of
the sampling distribution, to take a small value, as this would imply that the
estimate is precise. Typically, to have a small standard error, you would have to
increase the sample size.
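The EV and SE results above can be checked by brute force. The following is a minimal Python sketch, not part of the original guide: it lists all 225 samples of size 2 (with replacement) and compares the mean and standard deviation of the sample averages with the population values. The heights are an assumption of this sketch: they are read off the rows of Table 3-03.1 that pair student 1 with each of the 15 students.

```python
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

# Heights (in m) of the 15 learners, inferred from Table 3-03.1
# (an assumption of this sketch, not a listing given in this section).
heights = [1.64, 1.52, 1.52, 1.65, 1.02, 1.626, 1.50, 1.60,
           1.42, 1.52, 1.48, 1.62, 1.50, 1.54, 1.67]

n = 2
# All 15 * 15 = 225 ordered samples of size 2 drawn with replacement,
# each reduced to its sample average.
sample_means = [fmean(sample) for sample in product(heights, repeat=n)]

ev = fmean(sample_means)     # expected value of the sample average
se = pstdev(sample_means)    # standard error: SD of the sampling distribution
mu = fmean(heights)          # population mean
sigma = pstdev(heights)      # population SD (divisor N)

print("number of samples:", len(sample_means))
print("EV:", round(ev, 4), " population mean:", round(mu, 4))
print("SE:", round(se, 4), " sigma / sqrt(n):", round(sigma / sqrt(n), 4))
```

Replacing `heights` with the weights or the BMIs of the learners lets the class repeat the same check for the other two variables.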
2. Approximating the Sampling Distribution
Inform learners that an approximation of the histogram of the sampling distribution
can be obtained when the sample size is large.
In the last chapter, learners learned about the normal curve and how to determine
areas under the curve, as well as how to obtain specific percentiles of the normal
curve. One of the reasons why the normal curve is important in statistical
applications is that under some fairly regular conditions, the normal curve can be
used to approximate sampling distributions of sample averages and sample
proportions regardless of the original shape of the parent distributions provided
that the sample sizes are rather large. This is called the Central Limit Theorem.
Example Revisited:
Recall the earlier example that provided an illustration of the sampling distribution of the
sample mean for a sample of size 2 taken with replacement from a box model with 15
tickets (representing 15 female learners).
Show learners the results of a simulation experiment conducted with the statistical
software Stata. The experiment consisted of drawing 10,000 random samples with
replacement, for each of the sample sizes n=3, n=5, n=9 and n=14, from the box
model representing the distribution of heights, weights, and BMIs of the N=15 learners.
The histograms of the resulting (simulated) sampling distributions for heights are shown in
Figure 3-03.3 (i)-(iv).
[Figure: four histograms (percent on the vertical axis) titled "Sampling Distribution for Average Heights", for n = 3, n = 5, n = 9 and n = 14]
Figure 3-03.3. Sampling Distribution of the Sample Mean Height (taken
from a random sample with replacement) of size (i) n=3; (ii) n=5; (iii) n=9;
(iv) n=14.
What can learners notice as n gets larger? They should notice that the sampling
distribution looks more and more like a normal curve as the sample size gets larger.
Figure 3-03.4 shows the sampling distribution for average weights of samples of size
n=3, n=5, n=9 and n=14. Learners should notice that the normal approximation for the
sampling distribution is already quite good even for n=3, but gets even better for n=14.
[Figure: four histograms (percent on the vertical axis) titled "Sampling Distribution for Average Weights", for n=3, n=5, n=9 and n=14]
Figure 3-03.4. Sampling Distribution of the Sample Mean Weight (taken
from a random sample with replacement) of size (i) n=3; (ii) n=5; (iii) n=9;
(iv) n=14.
For the BMI values, an illustration of the sampling distribution for sample averages of
sample size n=3, n=5, n=9 and n=14 is given in Figure 3-03.5, where we observe that for
“small samples” (of size n=3, 5), the sampling distribution appears to be a “mixture” of
at least two distributions, but as the sample size increases, once again the sampling
distribution appears to stabilize toward a normal curve.
[Figure: four histograms (percent on the vertical axis) titled "Sampling Distribution for Average BMIs", for n=3, n=5, n=9 and n=14]
Figure 3-03.5. Sampling Distribution of the Sample Mean BMI (taken from
a random sample with replacement) of size (i) n=3; (ii) n=5; (iii) n=9; (iv)
n=14.
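A simulation of this kind can be sketched in a few lines. The guide's experiment was run in Stata; the Python version below is only an illustrative stand-in under stated assumptions: the heights are inferred from Table 3-03.1, and the function name `simulate_means`, the seed, and the "within 2 SE" check are choices made for this sketch. For each sample size it draws 10,000 samples with replacement and reports the center and spread of the simulated sample means, together with the fraction of sample means lying within 2 SEs of the population mean (about 95% under a normal curve).

```python
import random
from math import sqrt
from statistics import fmean, pstdev

# Heights (in m) of the 15 learners, inferred from Table 3-03.1 (assumption).
heights = [1.64, 1.52, 1.52, 1.65, 1.02, 1.626, 1.50, 1.60,
           1.42, 1.52, 1.48, 1.62, 1.50, 1.54, 1.67]

def simulate_means(population, n, reps=10_000, seed=0):
    """Draw `reps` samples of size n with replacement; return their means."""
    rng = random.Random(seed)
    return [fmean(rng.choices(population, k=n)) for _ in range(reps)]

mu, sigma = fmean(heights), pstdev(heights)
results = {}
for n in (3, 5, 9, 14):
    means = simulate_means(heights, n)
    se_theory = sigma / sqrt(n)
    # Fraction of simulated sample means within 2 SEs of the population mean.
    within_2se = sum(abs(m - mu) <= 2 * se_theory for m in means) / len(means)
    results[n] = (fmean(means), pstdev(means), se_theory, within_2se)
    print(n, [round(value, 3) for value in results[n]])
```

Plotting a histogram of `means` for each n (with any plotting tool available to the class) gives pictures comparable to Figures 3-03.3 to 3-03.5.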
Summary
Three Major Points about the Sampling Distribution of the Sample Mean
(i) The EV is the population mean µ
(ii) The SE of the mean is $\sigma/\sqrt{n}$ (for samples with replacement), where σ is
the population standard deviation.
(iii) The shape is approximately normal, provided the sample size is
large enough, and regardless of the shape of parent distribution.
The first major point, the EV is the population mean µ, illustrates why the
sample mean is a reasonable estimate of the population mean. While there may
be some chance errors involved, on average, the chance errors cancel out. That
is, the average of the sampling distribution is the population mean, which we
wish to estimate.
The second major point, the SE of the mean is $\sigma/\sqrt{n}$ (for samples with replacement),
where σ is the population standard deviation, provides us a fast way to calculate the
SE of the mean (when we have generated a random sample with replacement). It
should be stressed to learners that the SE is an important guide for establishing the
reliability of the estimate. Other things being equal, you would prefer an estimate
that has a small SE. In the case of estimating the population mean, you could either
minimize the individual variability or increase the sample size to achieve a desired
standard error level.
The third major point is called the Central Limit Theorem, and it states that whether
the shape of the population histogram is skewed or symmetric, the sampling
distribution of the sample mean will still be reasonably approximated by a normal
curve, provided that you have a rather large sample size. Actually, there are some
mathematical conditions that the population distribution ought to obey, but they
need not be a concern. Also, how “large” a large enough sample is will actually
depend on the parent distribution. For most distributions, 30 appears to be good
enough; for practically all, 100 appears to be good enough. If, however, the parent
distribution is symmetric, even 12 to 15 would be good enough. If the parent
distribution is a normal curve, then all we need is one observation!
Similar results can be obtained for the case of sample proportions (since a
proportion is also an average of 1’s and 0’s):
• the expected value of the sampling distribution is the true value of the
population percentage regardless of the sample size. Thus, the sample
percentage (like the sample mean) is an unbiased estimate of the population
percentage (respectively the population mean).
• The SE of the sampling distribution (of sample proportions) is inversely
proportional to the square root of the sample size; in fact,
$$SE = \sqrt{\frac{P(1-P)}{n}}$$
so that as the sample size increases, the SE becomes smaller, i.e. the sampling
distribution of the proportion tends to bunch up more toward the true value of
the population proportion. Note that since the SE depends on the unknown
population percentage P, we can estimate the SE by bootstrapping the
population proportion P by the sample proportion p, i.e. estimating the SE with
$$\sqrt{\frac{p(1-p)}{n}}$$
Alternatively, one might notice that the SE of the sample proportion is at most
$$\frac{1}{2\sqrt{n}}$$
since the quadratic function P(1-P) is maximized at ¼ when P = ½.
• As the sample size is increased, with np and np(1-p) both at least 10, the
sampling distribution of sample proportions gets to be more and more like a
normal curve.
It is interesting to note that when estimating percentages or proportions, it is the
absolute size of the sample, i.e., n itself, that determines the SE: the formula for
the SE involves n but not the size of the population.
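The three SE expressions for a sample proportion can be compared numerically. The short Python sketch below is illustrative only: the function names and the values n = 400 and p = 0.37 are hypothetical, chosen for this example and not taken from the guide.

```python
from math import sqrt

def se_proportion(p, n):
    """SE of a sample proportion for samples of size n drawn with replacement."""
    return sqrt(p * (1 - p) / n)

def se_upper_bound(n):
    """Largest possible SE for sample size n, attained when P = 1/2."""
    return 1 / (2 * sqrt(n))

n = 400          # hypothetical sample size
p_hat = 0.37     # hypothetical sample proportion, plugged in for the unknown P

estimated_se = se_proportion(p_hat, n)
bound = se_upper_bound(n)
print("estimated SE:", round(estimated_se, 4))
print("upper bound on the SE:", round(bound, 4))
```

The upper bound is convenient at the planning stage, when no sample proportion is available yet.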
In summary, we can state the following major facts about the sampling distribution
of the sample proportion:
Three Major Points about the Sampling Distribution of the Sample Proportion p
(i) The EV for the sample proportion is the population percentage P
(ii) The SE of the proportion is $\sqrt{P(1-P)/n}$. Since this SE depends on the
unknown population percentage P, we need to estimate it. The
SE may be readily estimated by $\sqrt{p(1-p)/n}$, i.e., bootstrapping the
population proportion P by the sample proportion p.
Alternatively, one might notice that the SE of the sample
proportion p is at most $1/(2\sqrt{n})$ since the quadratic function P(1-P)
is maximized at ¼ when P = ½.
(iii) The shape is approximately normal, provided the sample size is
large enough, with np and np(1-p) both at least 10.
3. Some Points of Confusion
Often learners (and even teachers) get confused about the following:
a) Neither the original parent distribution of heights, weights, and BMI levels
(this is the “population”) nor the distribution of sample data is the same as the
sampling distribution. When you take a sample of the data, you can look at
the sample statistics and even the shape of the distribution of the sample.
But this is not the sampling distribution. The sampling distribution is an
imaginary construct: the collection of all possible values that a statistic (such
as a sample mean, or a sample proportion) can take over all possible samples.
We look into the sampling
distribution to be able to make a statement about the ‘behavior’ of a statistic
(as a statistic, like a sample mean or sample proportion, serves as an estimate
of a population parameter such as a population mean, or population
proportion).
b) The notion about independence in a random sample. The Central Limit
Theorem, which discusses the behavior of the sampling distribution as the
sample size gets large, depends on the notion of independence of sample
data. Well-designed sampling (and randomized experiments)
ensures independence.
c) The behavior of small samples. The Central Limit Theorem describes what
happens for large samples. For the weight data, which was already rather
symmetric (see Figure 3-03.1 b), even small samples would already yield a
sampling distribution that is very nearly normal. However, for the BMI data,
which was very skewed, the sample size would need to be fairly large for the
normal approximation to work.
d) Formulas for EV and SE. There are only two sample statistics examined here,
the sample mean and the sample proportion. The EV for the sample mean
turns out to be the population mean (and this is why the sample mean is said
to be a rather good estimator of the population mean). The EV for the
sample proportion likewise turns out to be population proportion (and this is
why the sample proportion is a good estimator of the population
proportion). Sample means and sample proportions might have some chance
error from the true values, but the average of all possible estimates turns out
to be the target. The SE of the sample mean is the ratio of the population
standard deviation to the square root of the sample size. For the case of the
sample proportion, which is also a mean (a mean of 1’s and 0’s), the SE is
$\sqrt{P(1-P)/n}$, since the underlying probability model here for the population is a
binomial (with a single trial), which has a standard deviation of $\sqrt{P(1-P)}$.
KEY POINTS
• For sampling distribution of sample means:
o The expected value is the population mean
o The standard error of the mean is $\sigma/\sqrt{n}$ for samples with replacement
o The shape is approximately normal, provided the sample size is large
enough, and regardless of the shape of parent distribution
• For sampling distribution of sample proportions:
o The expected value is the population percentage
o The standard error of the proportion is $\sqrt{P(1-P)/n}$. This standard error may
be readily estimated by $\sqrt{p(1-p)/n}$, i.e. bootstrapping the population
proportion P by the sample proportion p. Alternatively, one might notice
that the SE of the sample proportion is at most $1/(2\sqrt{n})$ since the quadratic
function P(1-P) is maximized at ¼ when P = ½.
o The shape is approximately normal, provided the sample size is large
enough, with np and np(1-p) both at least 10.
ASSESSMENT
1. Records indicate that the value of dwellings in a certain city is skewed to the right, with
a mean of P1.4 million and a standard deviation of P600,000. To check the accuracy of
the records, the city officials plan to conduct a detailed appraisal of 100 dwellings,
selected at random. Obtain an approximate sampling distribution model for the sample
mean value of the dwellings selected.
ANSWER: The sampling distribution should have an EV of P1.4 million with a SE
= $\sigma/\sqrt{n}$ = P600,000/$\sqrt{100}$ = P60,000. Use a normal distribution with mean
P1.4 million and standard deviation P60,000 to approximate the sampling distribution.
2. Suppose that a city gets an average of 36.4 inches of rain each year, with a standard
deviation of 4.2 inches. Assume that a normal curve applies.
a) During what percentage of years does the city get more than 41 inches of rain?
b) Less than how much rain falls in the driest 20% of all years?
c) A certain university is found in this city. A student has been studying in this
university for 4 years. Let y be the average amount of rain in those 4 years.
Describe the sampling distribution of the sample mean y
d) What is the chance that those 4 years average less than 31 inches of rain?
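ANSWER:
a) $P(X > 41) = P\left(Z > \frac{41 - 36.4}{4.2}\right) \approx P(Z > 1.095) \approx 0.137$
b) Identify the value of x where $P(X < x) = 0.20$. From the table of values from
the normal distribution, $P(Z < -0.84) \approx 0.20$. From here, we can get the value of
x, $x = 36.4 + (-0.84)(4.2) \approx 32.87$ inches.
c) Sampling distribution has EV = 36.4 and SE = $4.2/\sqrt{4} = 2.1$; the population of
rainfall data follows a normal curve, so the sampling distribution is, in fact,
exactly a normal curve with mean 36.4 and standard deviation 2.1.
d) $P(\bar{y} < 31) = P\left(Z < \frac{31 - 36.4}{2.1}\right) \approx P(Z < -2.57) \approx 0.005$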
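The normal-curve areas in item 2 can be checked without a printed table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are choices made for this illustration, and the computed values may differ from table-based answers in the third decimal place because z is not rounded.

```python
from math import sqrt
from statistics import NormalDist

# Annual rainfall model from item 2: mean 36.4 inches, SD 4.2 inches.
rain = NormalDist(mu=36.4, sigma=4.2)

p_more_than_41 = 1 - rain.cdf(41)     # a) chance of a year with more than 41 inches
driest_20 = rain.inv_cdf(0.20)        # b) 20th percentile of annual rainfall

# c) sampling distribution of the 4-year average: same mean, SD = 4.2 / sqrt(4)
mean_of_4 = NormalDist(mu=36.4, sigma=4.2 / sqrt(4))
p_avg_below_31 = mean_of_4.cdf(31)    # d) chance that the 4-year average is below 31

print(round(p_more_than_41, 4), round(driest_20, 2),
      round(mean_of_4.stdev, 2), round(p_avg_below_31, 4))
```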
3. The weight of potato chips in a medium-sized bag is said to be 10 ounces. The amount
that the packaging machine puts in a bag of potato chips can be modeled by a normal
curve with mean 10.2 ounces and standard deviation 0.12 ounces.
a) What fraction of all bags sold are underweight?

b) Some of the chips are sold in bargain packs of 3 bags. What is the chance that
none of the 3 is underweight?
c) What is the probability that the average weight of the 3 bags is below the stated
amount?
d) What is the chance the average weight of a 24-bag case of potato chips is below
10 ounces?
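ANSWERS:
a) $P(X < 10) = P\left(Z < \frac{10 - 10.2}{0.12}\right) \approx P(Z < -1.67) \approx 0.048$
b) Assuming the weights of the three bags are independent, $(1 - 0.048)^3 \approx 0.863$
c) Since the question is about the average of three, we use the sampling
distribution of the sample mean. The average is the same as that of the
population, 10.2. However, the standard deviation is now $0.12/\sqrt{3} \approx 0.069$.
Therefore the answer will be $P(\bar{x} < 10) \approx P(Z < -2.89) \approx 0.002$
d) This is very similar to the previous question. However, the standard deviation
is now $0.12/\sqrt{24} \approx 0.024$. Thus, the answer to the question is
$P(\bar{x} < 10) \approx P(Z < -8.2)$, which is practically 0.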
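The four answers to item 3 can be reproduced with a short Python sketch based on `statistics.NormalDist`. The variable names are illustrative, and part b) assumes, as the answer does, that the three bags in a bargain pack are filled independently.

```python
from math import sqrt
from statistics import NormalDist

# Filling model from item 3: mean 10.2 oz, SD 0.12 oz; stated weight 10 oz.
bag = NormalDist(mu=10.2, sigma=0.12)

p_under = bag.cdf(10)                  # a) a single bag is underweight
p_none_of_3 = (1 - p_under) ** 3       # b) none of 3 independent bags is underweight

# c), d) averages of 3 and of 24 bags: same mean, SD divided by sqrt(n)
p_avg3_below = NormalDist(10.2, 0.12 / sqrt(3)).cdf(10)
p_avg24_below = NormalDist(10.2, 0.12 / sqrt(24)).cdf(10)

print(round(p_under, 4), round(p_none_of_3, 4),
      round(p_avg3_below, 4), p_avg24_below)
```

Comparing a), c) and d) shows how quickly the chance of falling below the stated weight shrinks as the number of bags averaged grows.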
4. Consider a school district that has 10,000 11th graders. In this district, the average
weight of an 11th grader is 45 kg, with a standard deviation of 10 kg. Suppose you
draw a random sample of 50 learners. What is the probability that the average weight of
a sampled student will be less than 42.5 kg?
ANSWER: To solve this problem, we need to define the sampling distribution of
the mean. Because our sample size (of 50) is fairly large, we might assume that
the Central Limit Theorem holds, i.e. that the sampling distribution of the
sample mean will approximate a normal curve.
The EV of the sampling distribution is equal to the mean of the population (45
kg), while the SE of the sampling distribution (for sampling with replacement) is
given by SE = $\sigma/\sqrt{n} = 10/\sqrt{50} \approx 1.41$ kg.
Thus, using areas under a normal curve, the probability that a sample mean (of
50 learners) will have an average weight less than or equal to 42.5 kg is
approximately equal to 0.038.
5. Based on past experience, a bank believes that 20% of credit card customers are
considered bad credits. The bank has recently given 500 credit cards.
a) What are the mean and standard deviation of the sample proportion of bad
credits among the 500 credit cards?
b) What assumptions underlie your answer in a)
c) What is the chance that over 25% of these credit card applicants become bad
credits?
ANSWER:
a) For the sampling distribution of the sample proportions,
The EV, i.e., the mean of the sampling distribution, is
EV = µ = 20% (i.e. the population proportion)
while SE, the standard deviation of the sampling distribution, is
SE = $\sqrt{\frac{(0.20)(0.80)}{500}} \approx 0.018$, i.e. 1.8%
b) Assume that credit card clients pay their dues independently of each other so
that you have a random sample of all possible clients, and that these represent
less than 20% of all possible clients;
n p = 500 (0.20) = 100 and n p ( 1- p) = 500 (0.2) (0.8) = 80 are both at least 10.
c) Using a normal approximation to the sampling distribution, the chance that
over 25% of these credit card applicants become bad credits is approximately
equal to the area under a normal curve (with mean 20% and standard deviation
1.8%) to the right of 25%, i.e., 0.0027366
6. Assume that 40% of senior high school learners have Twitter accounts.
a) We randomly choose 100 learners. Let p represent the proportion of learners in
this sample that have Twitter accounts. What is the approximate model for the
sampling distribution of p? Specify the name of the distribution, the mean, and
standard deviation of the sampling distribution. Be sure to verify that the
conditions are met.
b) What is the approximate probability that less than half of this sample have
Twitter accounts?
ANSWER:
a) Normal with EV = µ = 40%, SE = $\sqrt{\frac{(0.40)(0.60)}{100}} \approx 0.049$, i.e. 4.9%
n p = 100 (0.40) = 40 and n p (1- p) = 100 (0.4) (0.6) = 24 are both at least 10
b) Using a normal approximation to the sampling distribution, the chance that less
than half of the sample have Twitter accounts is approximately equal to the area
under a normal curve (with mean 40% and standard deviation 4.9%) to the left of
50%, i.e.,
0.97936546
7. When a truckload of bananas arrives at the pier, a random sample of 150 is selected
and examined for bruises, discoloration, and other defects. The whole truckload is
rejected if more than 5% of the sample is unsatisfactory. Suppose that in fact, 10% of
the bananas on the truck do not meet the desired standard. What is the chance that the
entire truckload of bananas will be accepted anyway?
ANSWER: 0.0206 using a Normal curve with a mean of 0.10, and a standard
deviation of 0.0245
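The normal approximations used in items 5 to 7 follow one pattern, sketched below in Python. The helper name `proportion_model` is hypothetical. Because the sketch uses unrounded SEs, its results for items 5 and 6 differ slightly from the answers above, which use the rounded SEs of 1.8% and 4.9%.

```python
from math import sqrt
from statistics import NormalDist

def proportion_model(p, n):
    """Normal approximation to the sampling distribution of a sample proportion."""
    # Conditions used in this lesson: np and np(1-p) both at least 10.
    assert n * p >= 10 and n * p * (1 - p) >= 10, "sample too small"
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))

q5 = proportion_model(0.20, 500)   # item 5: bad credits among 500 cards
q6 = proportion_model(0.40, 100)   # item 6: Twitter accounts among 100 learners
q7 = proportion_model(0.10, 150)   # item 7: defective bananas among 150

p5 = 1 - q5.cdf(0.25)   # over 25% become bad credits
p6 = q6.cdf(0.50)       # less than half have Twitter accounts
p7 = q7.cdf(0.05)       # at most 5% unsatisfactory, so the truckload is accepted

print(round(q5.stdev, 4), round(p5, 4))
print(round(q6.stdev, 4), round(p6, 4))
print(round(q7.stdev, 4), round(p7, 4))
```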
CHAPTER 3: SAMPLING
Lesson 4: Sampling Without Replacement

OVERVIEW OF LESSON
This lesson continues the discussion on random sampling and sampling distributions. The lesson
first discusses preliminary concepts on the hypergeometric probability model, which serves as the
model for estimating proportions when sampling without replacement. Then it reexamines the
example from the previous lesson, where samples were drawn with replacement. This time, the
sampling is done without replacement to determine the effect on the sampling distribution.
• identify sampling distributions of statistics (sample mean and sample proportion) when
sampling is conducted without replacement
• calculate the mean (or expected value) and standard deviation (or standard error) of the
sampling distribution of the sample mean (and sample proportion) when sampling is
conducted without replacement
• describe the approximate sampling distribution of the sample mean (and sample
proportion) when the sample size is large and when sampling is conducted without
replacement
• solve problems involving sampling distributions of the sample mean
LESSON OUTLINE
A. Motivation: Survey Estimates of Voting Preferences

B. Introduction: The Binomial and Hypergeometric Probability Models
C. Main Lesson: Sampling Marbles from a Box without Replacement
D. Enrichment
KEY CONCEPTS: Sampling, Estimation, Sampling Variation, Standard Error, Central Limit
Theorem
DEVELOPMENT OF THE LESSON
A. Motivation: Survey Estimates of Voting Preferences
On May 10, 2004, the country elected Gloria Macapagal Arroyo (GMA) as President with
36.4% of the votes cast, while her closest rival, Fernando Poe, Jr. (FPJ, dubbed the King of
Philippine movies), garnered 33.2% of the votes. A few weeks before the elections, Pulse
Asia conducted its last pre-election survey, suggesting that GMA would lead Poe 37% to
31%, while another organization, Social Weather Stations (SWS), had Arroyo leading 37% to 30%.
Both surveys were within one percentage point of the actual percentage of votes received
by GMA as per the official count of the Commission on Elections. SWS and Pulse Asia,
however, underestimated the percentage of votes for FPJ by 3.2 and 2.2 percentage
points, respectively.
For the other three presidential candidates, both surveys were less than two percentage
points away from the actual proportions. Months before this, however, FPJ was leading
GMA.
The SWS also conducted a day-of-election survey (exit poll) that suggested GMA getting 36.6%
of the votes, and FPJ 27.6% of the votes. Although this exit poll involved
more sample voters, its underestimate of the FPJ vote was off by 5.6 percentage points
(FPJ voters may have come home late and been missed by the exit poll, given that it
was raining hard that day and that the poll estimates were released as early as possible).
Why do survey estimates differ from the “true value”? Why do sample proportions vary at
all, from survey to survey, assuming that survey protocols (of reputable organizations) are
fairly similar? How can surveys that were conducted at essentially the same time and asked
similar questions get different results?
Learners will probably suggest that people had not made up their minds yet on who to
vote for, but by a month or so before elections, voting preference would have already
stabilized. If estimates would differ from the true value, by how much variability among
surveys should you expect to see?
In the previous lesson, it was assumed that sampling is conducted with replacement.
Clearly, in a survey, it does not make sense if we were to ask the same person twice about
his or her voting preference. So, we will examine sampling without replacement in this
lesson.
B. Introduction: The Binomial and Hypergeometric Probability Models
Mention to learners that in Lesson 3-01, we re-examined coin tossing from a statistical
perspective. What happens in every flip of the coin is unaffected by earlier and later flips.
That is, the probability p of getting a head remains the same through all n flips of the coin.
In other words, the flips, or more precisely, their outcomes, are independent. As a sampling
exercise or procedure, we call this sampling with equal probability with replacement,
more commonly referred to as simple random sampling¹ (with replacement).
Learners have also learned in the previous chapter and were reminded in Lesson 3-01 that
the probability of observing x heads in n independent tosses of a coin is given by
$$P(X = x) = {}_nC_x \, p^x (1 - p)^{n-x} \quad \text{for } x = 0, 1, 2, \ldots, n$$
where the chance p of getting heads is a fixed constant. Learners have also learned or
been told before that it is called the binomial probability mass function (pmf). From Lesson
3-01, learners now know more: that the binomial pmf is the result of sampling for a
proportion when the sampling is equal probability with replacement.
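The binomial pmf above is easy to tabulate. The following Python sketch is illustrative only; the function name `binomial_pmf` and the values n = 10 and p = 0.5 (ten tosses of a fair coin) are assumptions made for the example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of x heads in n independent tosses, P(head) = p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.5   # hypothetical: ten tosses of a fair coin
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(x, round(prob, 4))
print("total probability:", round(sum(pmf), 4))
```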
Consider now a box that contains N marbles, M of which are white and the rest of other
colors, so that p = M/N is the proportion of white marbles.
M white marbles
N-M non-white marbles
Mix the marbles well. Then, draw n < N marbles successively, without putting back those
previously drawn. In this case, the outcomes of the draws are no longer independent. Hence the
variable X = number of white marbles in the sample will not behave according to the
binomial pmf². From basic counting techniques (through a branch of mathematics called
combinatorics), the number of ways n marbles can be drawn from N marbles is
${}_NC_n = \frac{N!}{n!\,(N-n)!}$; the numbers of ways x white marbles can be drawn from the M white
marbles and n-x from the N-M non-white marbles are ${}_MC_x$ and ${}_{N-M}C_{n-x}$, respectively.
Therefore, the probability of having x white marbles in the sample is
$$P(X = x) = \frac{{}_MC_x \; {}_{N-M}C_{n-x}}{{}_NC_n} \quad \text{for } x = 0, 1, 2, \ldots, n$$
This is the hypergeometric pmf.
¹ Strictly speaking, random does not mean equal probability. However, very early on in the history of statistics,
simple random was used to describe sampling with equal probability, and the usage persisted.
² X will follow the binomial pmf if the marbles are drawn with replacement: draw, replace, mix the marbles, and
repeat n times. Then, the outcomes of the draws are independent and you are drawing from the same population
(of marbles) each time.
When N and M, hence P = M/N too, are known, the hypergeometric pmf can be computed
exactly, without uncertainty. This is a mere exercise in probability.
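The two probability models can be compared numerically. The sketch below is only an illustration: the box of N = 20 marbles with M = 8 white and the sample size n = 5 are assumed values chosen for the example, and the function names are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n draws with replacement, each white with probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def hypergeometric_pmf(x, n, M, N):
    """P(X = x) for n draws without replacement from N marbles, M of them white."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Assumed (hypothetical) box: N = 20 marbles, M = 8 white, sample of n = 5.
N, M, n = 20, 8, 5
for x in range(n + 1):
    with_repl = binomial_pmf(x, n, M / N)
    without_repl = hypergeometric_pmf(x, n, M, N)
    print(x, round(with_repl, 4), round(without_repl, 4))
```

Both columns sum to 1; the hypergeometric probabilities are somewhat more concentrated around the middle values of x because the draws are not independent.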
C. Main Lesson: Sampling Marbles from a Box without Replacement
In practice, the fraction P of marbles that are white is unknown. There are many important
real-world sampling situations when P is not known and the sampling is done without
replacement, hence the hypergeometric pmf is useful. The aim is to infer about P. Think of
the proportion of voters for candidate A. This is now a problem in statistics, no longer an
exercise in computing the exact probabilities.
Since in each draw equal probability is assigned to all remaining marbles, it is intuitively
obvious that the sampling procedure assigns equal probability for every marble to be in the
sample. Hence the procedure is called sampling with equal probability without
replacement, or simple random sampling without replacement.
Furthermore, the equal probability property of the sampling procedure suggests that
assigning equal importance or weight to the outcome of every draw should lead to a
reasonable estimate for P. Picture the outcome of a sample as a sequence of n 1’s and 0’s,
where 1 stands for white and 0 not white; the sum of the sequence is X = number of whites.
Giving equal weight, with the sum of the weights = 1, means 1/n for each. This leads to the
estimate p = X/n, which is the same formula used in sampling with equal probability with
replacement.
Notice that X is the sum of the individual outcomes, each of which is 1 (Head in the
coin-flipping experiment, white in the marble-drawing experiment) or 0 otherwise (Tail, not
white). Hence p is a simple average, and serves to estimate the true proportion (of white
marbles in the marble-drawing experiment, or of heads in the coin-flipping experiment).
We will see in the next chapter that the simple mean is still used to estimate the true mean
of more complex variables like height, weight, BMI, test scores, etc. This is because, when
sampling with equal probability, the simple mean possesses some desirable or optimal
properties. For instance, if it were feasible to draw all possible samples (of size n) and
compute their simple means, then the mean of the latter (the so-called expected value of
the sampling distribution) is equal to the true mean. This was illustrated in the last lesson on
sampling distribution, under conditions that the sampling is conducted with replacement.
Learners may then wonder whether the sampling distribution of the sample mean is the same
with or without replacement. What is the difference between sampling with replacement and
sampling without replacement?
As will be shown in the examples below, the sampling distribution without replacement will
be more closely clustered together than a sampling distribution with replacement. This
means the sampling variation in the former is smaller, or that the estimate is more precise
(with lower SE) compared to sampling with replacement. Tell learners that this should not
come as a surprise since we sample to gain information about the population. Additional
information is gained whenever a new unit is drawn. However, no new information is gained
from a unit that had already been drawn previously.
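The claim can be checked by brute force on a tiny population. The sketch below assumes a hypothetical population of five values (not taken from this guide), lists every possible ordered sample of size 2 with and without replacement, and compares the centers and spreads of the two sampling distributions.

```python
from itertools import permutations, product
from statistics import mean, pstdev

# Hypothetical population of five values, used only for illustration.
population = [42, 45, 47, 50, 56]
n = 2

# All ordered samples of size n: with replacement (repeats allowed) and without.
means_with = [mean(s) for s in product(population, repeat=n)]
means_without = [mean(s) for s in permutations(population, n)]

print("population mean:", mean(population))
print("EV with / without replacement:", mean(means_with), mean(means_without))
print("SE with / without replacement:",
      round(pstdev(means_with), 3), round(pstdev(means_without), 3))
```

Both expected values coincide with the population mean, while the spread of the sample means is smaller when repeats are not allowed.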
When selecting a relatively small sample from a large population, the sampled units are
practically independent whether you sample with replacement or without replacement.
If you sample from a bathtub full of marbles, you do not need to sample with replacement
because drawing one white marble hardly changes the chance that the next marble drawn
is white. The draws are practically independent of each other, which means the selection
of one marble has a negligible influence on the selection of another.
When the population is small, obtaining a sample without replacement, such that units
selected are independent, is difficult. For example, if we sample without replacement from
a small bag of marbles, removing one white marble can influence the next color to be
drawn.
Standard Error of Sample Mean when Sampling without Replacement
How much lower is the SE of the mean without replacement? As was indicated in the last
lesson, under conditions of sampling with replacement, the standard error (SE) of the
sampling distribution of the mean is proportional to the population standard deviation σ
and inversely proportional to the square root of the sample size n:
$SE = \frac{\sigma}{\sqrt{n}}$
On the other hand, when sampling is conducted without replacement, the SE is
$SE = \sqrt{\frac{N-n}{N-1}} \cdot \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}}$
The term $\sqrt{\frac{N-n}{N-1}}$ is called the finite population correction (fpc) and the ratio
$\frac{n}{N}$ is the sampling rate, where n and N are the sample size and population size,
respectively.
When the sampling rate is small enough, the two SEs for the mean (where sampling is
conducted with and without replacement) can be assumed to be virtually the same.
But how small is “small enough”? It depends on the situation, although $\frac{n}{N} < .05$
is a workable rule of thumb in many real situations. In the majority of actual sampling
applications, N is very large so the fpc is replaced by 1.
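To see how quickly the correction approaches 1, the fpc can be tabulated for a few sampling rates. This is a minimal sketch that assumes the fpc has the form sqrt((N - n)/(N - 1)); the function names and the (n, N) pairs are chosen only for illustration.

```python
from math import sqrt

def fpc(n, N):
    """Finite population correction for a sample of n from a population of N."""
    return sqrt((N - n) / (N - 1))

def se_mean(sigma, n, N=None):
    """SE of the sample mean; pass N to apply the fpc (sampling without replacement)."""
    se = sigma / sqrt(n)
    return se if N is None else se * fpc(n, N)

# Illustrative pairs: a tiny population, a school district, a large region.
for n, N in [(2, 15), (50, 10_000), (1_200, 1_000_000)]:
    print(f"n={n}, N={N}, sampling rate={n / N:.4f}, fpc={fpc(n, N):.4f}")
```

For sampling rates far below .05 the fpc is so close to 1 that dropping it changes the SE only in the third or fourth significant digit.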
Example (Heights, Weights and BMIs of Female Learners Revisited):
In the last lesson, the heights, weights, and BMIs of 15 female learners formed the
population to be studied, with the (population) average height, (population) average
weight, and (population) average BMI the parameters of interest.
The entire sampling distributions for the average height, average weight, and average
BMI for a sample of size n=2 learners were extensively tabulated and illustrated. In
addition, the behavior of the sampling distribution was also simulated with 10,000
experiments (using statistical software) for sample sizes n=3, n=5, n=9, and n=12. It was observed
that the expected value of the sampling distributions always hit the mark (i.e. the
relevant population averages), but the SE got smaller with increasing sample size. In
addition, the sampling distribution could be approximated rather well with a normal
curve when the sample size was increased.
Here, you revisit the results, but this time under the assumption of sampling without
replacement.
If a random sample of size n=2 were to be obtained, then there would be (15)(14) = 210
possible, equally likely samples (counting the order of selection). The full list of samples
is given in Table 3-04.1.
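The count of 210 can be reproduced by listing the ordered pairs directly. The sketch below labels the learners 1 to 15 and uses Python's itertools.permutations; the variable names are illustrative only.

```python
from itertools import permutations

# Learners are labeled 1 to 15; a sample is an ordered pair (first, second)
# of two different learners, since no learner can be drawn twice.
learners = range(1, 16)
samples = list(permutations(learners, 2))

print(len(samples))   # number of ordered samples without replacement
print(samples[:3])    # the first few samples, in the same order as the table
```

The pairs come out in the same order as the rows of the table: learner 1 paired with each of the others, then learner 2, and so on.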
Table 3-04.1 Distinct samples of size two (without replacement) and average heights, weights, and BMIs
of sample
SAMPLE  FIRST STUDENT  SECOND STUDENT  AVERAGE HEIGHT  AVERAGE WEIGHT  AVERAGE BMI  (this group of six columns appears three times in each row)
1 1 2 1.58 45 18.25669 71 6 1 1.633 42.5 15.94628 141 11 1 1.56 43 17.93642
2 1 3 1.58 44.5 18.04028 72 6 2 1.573 47.5 19.33087 142 11 2 1.5 48 21.321
3 1 4 1.645 42.5 15.70052 73 6 3 1.573 47 19.11446 143 11 3 1.5 47.5 21.10459
4 1 5 1.33 50 36.27112 74 6 4 1.638 45 16.7747 144 11 4 1.565 45.5 18.76483
5 1 6 1.633 42.5 15.94628 75 6 5 1.323 52.5 37.3453 145 11 5 1.25 53 39.33543
6 1 7 1.57 39 15.8805 76 6 7 1.563 41.5 16.95468 146 11 6 1.553 45.5 19.0106
7 1 8 1.62 45.5 17.39699 77 6 8 1.613 48 18.47117 147 11 7 1.49 42 18.94481
8 1 9 1.53 41.1 17.90025 78 6 9 1.523 43.6 18.97443 148 11 8 1.54 48.5 20.46131
9 1 10 1.58 47 19.12234 79 6 10 1.573 49.5 20.19652 149 11 9 1.45 44.1 20.96456
10 1 11 1.56 43 17.93642 80 6 11 1.553 45.5 19.0106 150 11 10 1.5 50 22.18666
11 1 12 1.63 47 17.72412 81 6 12 1.623 49.5 18.7983 151 11 12 1.55 50 20.78843
12 1 13 1.57 38 15.43605 82 6 13 1.563 40.5 16.51023 152 11 13 1.49 41 18.50037
13 1 14 1.59 45 17.97746 83 6 14 1.583 47.5 19.05164 153 11 14 1.51 48 21.04177
14 1 15 1.655 51.5 18.73083 84 6 15 1.648 54 19.80501 154 11 15 1.575 54.5 21.79514
15 2 1 1.58 45 18.25669 85 7 1 1.57 39 15.8805 155 12 1 1.63 47 17.72412
16 2 3 1.52 49.5 21.42486 86 7 2 1.51 44 19.26508 156 12 2 1.57 52 21.1087
17 2 4 1.585 47.5 19.0851 87 7 3 1.51 43.5 19.04867 157 12 3 1.57 51.5 20.89229
18 2 5 1.27 55 39.6557 88 7 4 1.575 41.5 16.70891 158 12 4 1.635 49.5 18.55253
19 2 6 1.573 47.5 19.33087 89 7 5 1.26 49 37.27951 159 12 5 1.32 57 39.12313
20 2 7 1.51 44 19.26508 90 7 6 1.563 41.5 16.95468 160 12 6 1.623 49.5 18.7983
21 2 8 1.56 50.5 20.78158 91 7 8 1.55 44.5 18.40539 161 12 7 1.56 46 18.73251
22 2 9 1.47 46.1 21.28483 92 7 9 1.46 40.1 18.90864 162 12 8 1.61 52.5 20.24901
23 2 10 1.52 52 22.50693 93 7 10 1.51 46 20.13074 163 12 9 1.52 48.1 20.75226
24 2 11 1.5 48 21.321 94 7 11 1.49 42 18.94481 164 12 10 1.57 54 21.97436
25 2 12 1.57 52 21.1087 95 7 12 1.56 46 18.73251 165 12 11 1.55 50 20.78843
26 2 13 1.51 43 18.82064 96 7 13 1.5 37 16.44445 166 12 13 1.56 45 18.28807
27 2 14 1.53 50 21.36204 97 7 14 1.52 44 18.98585 167 12 14 1.58 52 20.82947
28 2 15 1.595 56.5 22.11541 98 7 15 1.585 50.5 19.73922 168 12 15 1.645 58.5 21.58284
29 3 1 1.58 44.5 18.04028 99 8 1 1.62 45.5 17.39699 169 13 1 1.57 38 15.43605
30 3 2 1.52 49.5 21.42486 100 8 2 1.56 50.5 20.78158 170 13 2 1.51 43 18.82064
31 3 4 1.585 47 18.86869 101 8 3 1.56 50 20.56517 171 13 3 1.51 42.5 18.60423
32 3 5 1.27 54.5 39.43929 102 8 4 1.625 48 18.22541 172 13 4 1.575 40.5 16.26447
33 3 6 1.573 47 19.11446 103 8 5 1.31 55.5 38.79601 173 13 5 1.26 48 36.83507
34 3 7 1.51 43.5 19.04867 104 8 6 1.613 48 18.47117 174 13 6 1.563 40.5 16.51023
35 3 8 1.56 50 20.56517 105 8 7 1.55 44.5 18.40539 175 13 7 1.5 37 16.44445
36 3 9 1.47 45.6 21.06842 106 8 9 1.51 46.6 20.42514 176 13 8 1.55 43.5 17.96094
37 3 10 1.52 51.5 22.29052 107 8 10 1.56 52.5 21.64723 177 13 9 1.46 39.1 18.4642
38 3 11 1.5 47.5 21.10459 108 8 11 1.54 48.5 20.46131 178 13 10 1.51 45 19.68629
39 3 12 1.57 51.5 20.89229 109 8 12 1.61 52.5 20.24901 179 13 11 1.49 41 18.50037
40 3 13 1.51 42.5 18.60423 110 8 13 1.55 43.5 17.96094 180 13 12 1.56 45 18.28807
41 3 14 1.53 49.5 21.14563 111 8 14 1.57 50.5 20.50235 181 13 14 1.52 43 18.54141
42 3 15 1.595 56 21.899 112 8 15 1.635 57 21.25572 182 13 15 1.585 49.5 19.29478
43 4 1 1.645 42.5 15.70052 113 9 1 1.53 41.1 17.90025 183 14 1 1.59 45 17.97746
44 4 2 1.585 47.5 19.0851 114 9 2 1.47 46.1 21.28483 184 14 2 1.53 50 21.36204
45 4 3 1.585 47 18.86869 115 9 3 1.47 45.6 21.06842 185 14 3 1.53 49.5 21.14563
46 4 5 1.335 52.5 37.09953 116 9 4 1.535 43.6 18.72866 186 14 4 1.595 47.5 18.80587
47 4 6 1.638 45 16.7747 117 9 5 1.22 51.1 39.29926 187 14 5 1.28 55 39.37647
48 4 7 1.575 41.5 16.70891 118 9 6 1.523 43.6 18.97443 188 14 6 1.583 47.5 19.05164
49 4 8 1.625 48 18.22541 119 9 7 1.46 40.1 18.90864 189 14 7 1.52 44 18.98585
50 4 9 1.535 43.6 18.72866 120 9 8 1.51 46.6 20.42514 190 14 8 1.57 50.5 20.50235
51 4 10 1.585 49.5 19.95076 121 9 10 1.47 48.1 22.15049 191 14 9 1.48 46.1 21.0056
52 4 11 1.565 45.5 18.76483 122 9 11 1.45 44.1 20.96456 192 14 10 1.53 52 22.2277
53 4 12 1.635 49.5 18.55253 123 9 12 1.52 48.1 20.75226 193 14 11 1.51 48 21.04177
54 4 13 1.575 40.5 16.26447 124 9 13 1.46 39.1 18.4642 194 14 12 1.58 52 20.82947
55 4 14 1.595 47.5 18.80587 125 9 14 1.48 46.1 21.0056 195 14 13 1.52 43 18.54141
56 4 15 1.66 54 19.55924 126 9 15 1.545 52.6 21.75897 196 14 15 1.605 56.5 21.83618
57 5 1 1.33 50 36.27112 127 10 1 1.58 47 19.12234 197 15 1 1.655 51.5 18.73083
58 5 2 1.27 55 39.6557 128 10 2 1.52 52 22.50693 198 15 2 1.595 56.5 22.11541
59 5 3 1.27 54.5 39.43929 129 10 3 1.52 51.5 22.29052 199 15 3 1.595 56 21.899
60 5 4 1.335 52.5 37.09953 130 10 4 1.585 49.5 19.95076 200 15 4 1.66 54 19.55924
61 5 6 1.323 52.5 37.3453 131 10 5 1.27 57 40.52136 201 15 5 1.345 61.5 40.12984
62 5 7 1.26 49 37.27951 132 10 6 1.573 49.5 20.19652 202 15 6 1.648 54 19.80501
63 5 8 1.31 55.5 38.79601 133 10 7 1.51 46 20.13074 203 15 7 1.585 50.5 19.73922
64 5 9 1.22 51.1 39.29926 134 10 8 1.56 52.5 21.64723 204 15 8 1.635 57 21.25572
65 5 10 1.27 57 40.52136 135 10 9 1.47 48.1 22.15049 205 15 9 1.545 52.6 21.75897
66 5 11 1.25 53 39.33543 136 10 11 1.5 50 22.18666 206 15 10 1.595 58.5 22.98107
67 5 12 1.32 57 39.12313 137 10 12 1.57 54 21.97436 207 15 11 1.575 54.5 21.79514
68 5 13 1.26 48 36.83507 138 10 13 1.51 45 19.68629 208 15 12 1.645 58.5 21.58284
69 5 14 1.28 55 39.37647 139 10 14 1.53 52 22.2277 209 15 13 1.585 49.5 19.29478
70 5 15 1.345 61.5 40.12984 140 10 15 1.595 58.5 22.98107 210 15 14 1.605 56.5 21.83618
Figure 3-04.2 illustrates the sampling distributions for the average height, average
weight, and average BMI of sample size n=2.
[Figure panels: histograms (vertical axis: Fraction) of (a) Average Height, (b) Average Weight, and (c) Average BMI]
Figure 3-04.2. Sampling distributions of the sample means of size n=2
female learners (selected at random without replacement) for their (a)
heights, (b) weights, and (c) BMI levels
Computations can also be readily made for the EVs and the SEs of the sampling
distributions for the average height, average weight, and average BMI when a sample
of size n=2 is taken (where sampling is done without replacement). They yield:
      Average Height   Average Weight   Average BMI
EV    1.52             48.21            22.09
SE    0.10             5.04             6.70
Recall the EVs and the SEs of the sampling distributions for the average height, average
weight, and average BMI of sample size n=2 (when sampling is conducted with
replacement). They were:
      Average Height   Average Weight   Average BMI
EV    1.52             48.21            22.09
SE    0.11             5.23             6.96
As was pointed out earlier, the EV for sampling distributions of means, whether with or
without replacement, is the target population parameter, i.e. the population mean. The SE
for sampling without replacement, however, is less than the SE for sampling with replacement.
In fact, the SE for means when sampling is done with replacement for a sample of size n is
given by:
$SE = \frac{\sigma}{\sqrt{n}}$
where σ is the population standard deviation, while the SE for means of sample size n from
a population with size N, when sampling is conducted without replacement, is:
$SE = \sqrt{\frac{N-n}{N-1}} \cdot \frac{\sigma}{\sqrt{n}}$
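The two tables of SEs reported for n=2 can be checked against each other. The sketch below assumes the correction factor is sqrt((N - n)/(N - 1)) with N=15 and n=2, and multiplies the reported with-replacement SEs by it; because the tabulated values are rounded to two decimals, agreement is expected only up to rounding.

```python
from math import sqrt

N, n = 15, 2
fpc = sqrt((N - n) / (N - 1))   # assumed form of the finite population correction

# SEs as reported in the two tables for samples of size n=2.
se_with = {"height": 0.11, "weight": 5.23, "BMI": 6.96}
se_without_reported = {"height": 0.10, "weight": 5.04, "BMI": 6.70}

for name, se in se_with.items():
    # corrected with-replacement SE next to the reported without-replacement SE
    print(name, round(se * fpc, 3), se_without_reported[name])
```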
Example (continued):
Learners may remember that the sampling distribution tends to get better
approximated by a normal curve with a center given by the EV and a standard deviation
given by the SE.
Show learners the results of a simulation experiment conducted with a statistical
software package called Stata. The results of 10,000 simulated random samples without
replacement, for sample sizes n=3, n=5, n=9, and n=12, from the box model representing
the heights, weights, and BMIs of the N=15 learners are shown in Figure 3-04.3,
Figure 3-04.4, and Figure 3-04.5, respectively.
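A similar simulation can be run without Stata. The sketch below is not the authors' program: it uses Python's random.sample on a hypothetical population of 15 weights (the guide's actual data are not reproduced here) and compares the simulated SE with the formula that includes the finite population correction.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical weights (kg) of N = 15 learners; not the data used in the guide.
population = [38, 40, 41, 42, 44, 45, 46, 47, 48, 50, 52, 53, 55, 58, 62]
N = len(population)
mu, sigma = mean(population), pstdev(population)

def simulate(n, reps=10_000):
    """Means of `reps` random samples of size n drawn without replacement."""
    return [sum(random.sample(population, n)) / n for _ in range(reps)]

results = {}
for n in (3, 5, 9, 12):
    sample_means = simulate(n)
    theory_se = sigma / sqrt(n) * sqrt((N - n) / (N - 1))
    results[n] = (mean(sample_means), pstdev(sample_means), theory_se)
    print(n, round(results[n][0], 2), round(results[n][1], 3), round(theory_se, 3))
```

Each printed row gives the sample size, the average of the simulated means, their standard deviation, and the theoretical SE; the last two should be close to each other and shrink as n grows.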
[Figure panels: histograms (vertical axis: Percent) of Average Height (Sampling without Replacement) for n=3, n=5, n=9, and n=12]
Figure 3-04.3. Sampling Distribution of the Sample Mean Height (taken
from a random sample without replacement) of size (i) n=3; (ii) n=5; (iii)
n=9; (iv) n=12
[Figure panels: histograms (vertical axis: Percent) of Average Weight (Sampling without Replacement) for n=3, n=5, n=9, and n=12]
Figure 3-04.4. Sampling Distribution of the Sample Mean Weight (taken
from a random sample without replacement) of size (i) n=3; (ii) n=5; (iii)
n=9; (iv) n=12
[Figure panels: histograms (vertical axis: Percent) of Average BMI (Sampling without Replacement) for n=3, n=5, n=9, and n=12]
Figure 3-04.5. Sampling Distribution of the Sample Mean BMI (taken from
a random sample without replacement) of size (i) n=3; (ii) n=5; (iii) n=9;
(iv) n=12
What justifies the choice of sampling “without replacement” over “with replacement”? As
was pointed out, more information is gained by having sampling done without
replacement. Provide the following example.
Example 2:
A janitor has 20 keys, and one of them is the key to a locked office door. Should he
sample the keys with or without replacement?
If he randomly tries the keys one by one, but does not set aside the ones he has tried,
then he is sampling with replacement. In this case, the long-run average number of tries
to unlock the door is 20.
If he randomly tries the keys one by one, setting aside the ones that do not work, then he
is sampling without replacement. In this case, the long-run average number of tries to
unlock the door is (20 + 1)/2 = 10.5, or about 11.
Here, sampling without replacement clearly makes more sense than sampling with
replacement.
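The two long-run averages can be worked out exactly. In the sketch below, which is an illustration with hypothetical names, the without-replacement average uses the fact that the right key is equally likely to be the 1st, 2nd, ..., 20th key tried, giving 10.5 tries (about 11); the with-replacement average is the mean of a geometric distribution and is also checked by a short simulation.

```python
import random
from fractions import Fraction

KEYS = 20

# Without replacement: the right key is equally likely to be tried 1st, 2nd, ..., 20th.
expected_without = sum(Fraction(k, KEYS) for k in range(1, KEYS + 1))

# With replacement: every try succeeds with probability 1/20, so the number of
# tries is geometric with mean 1 / (1/20).
expected_with = 1 / Fraction(1, KEYS)

def tries_with_replacement(rng):
    """Number of tries until key 0 (standing in for the right key) is picked."""
    tries = 1
    while rng.randrange(KEYS) != 0:
        tries += 1
    return tries

rng = random.Random(0)
simulated = sum(tries_with_replacement(rng) for _ in range(20_000)) / 20_000
print(expected_without, expected_with, round(simulated, 2))
```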
D. Enrichment
Inform learners that it is often a puzzle to many why merely a sample of 1,200 respondents in a
poll would be enough to represent a population of voters. Most polling organizations try to get
an estimate of the population of voters who will vote for some candidate (or whatever behavior
of interest).
As was pointed out, the SE of the sampling distribution of a sample mean where sampling is
done without replacement is given by
$SE = \sqrt{\frac{N-n}{N-1}} \cdot \frac{\sigma}{\sqrt{n}}$
where σ is the population standard deviation.
A percentage or fraction may be viewed as an average of tickets drawn from a box containing
1’s (representing those who will vote for the candidate) and 0’s (representing those who will
not), where n tickets are drawn successively without replacement. If P represents the fraction
of 1’s among the tickets, then the first ticket will follow a binomial probability model (with a
single trial) with a mean P, variance P(1-P), and thus standard deviation $\sqrt{P(1-P)}$.
In consequence, the sample percentage will have a sampling distribution with
EV = P
and
$SE = \sqrt{\frac{N-n}{N-1}} \cdot \sqrt{\frac{P(1-P)}{n}}$
that can be approximated by a normal curve. When the population is large relative to the
sample, the finite population correction (fpc) is nearly one so that the SE will be practically
$SE = \sqrt{\frac{P(1-P)}{n}} \le \frac{1}{2\sqrt{n}}$
The inequality above follows from the observation that P(1-P) is maximized at P = 1/2.
Since the sampling distribution follows a nearly normal curve, 95% of the time we would expect
the sample percentage to be within 2 SE from the true value. There would be a “margin of
error” between the sample and true percentage of
$2\,SE \le \frac{1}{\sqrt{n}}$
so if we allow the margin of error to be 3 percentage points, then solving for n in the equation
$\frac{1}{\sqrt{n}} = 0.03$
yields n ≈ 1,111. This is why reputable polling organizations in the country tend to use about
1,200 respondents (regardless of the population size).
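The arithmetic behind the 1,200-respondent rule can be sketched as follows. The code assumes the worst case P = 1/2 and ignores the fpc, as in the discussion above; the function names are hypothetical.

```python
from math import sqrt

def margin_of_error(n, P=0.5):
    """Two standard errors of a sample proportion, ignoring the fpc."""
    return 2 * sqrt(P * (1 - P) / n)

def n_for_margin(moe):
    """Sample size at which the worst-case (P = 1/2) margin of error equals moe."""
    return 1 / moe**2

print(round(n_for_margin(0.03), 1))       # sample size for a 3-point margin
for n in (300, 1_200, 4_800):
    print(n, round(margin_of_error(n), 4))
```

Quadrupling the sample size only halves the margin of error, which is one reason polls rarely go far beyond a thousand or so respondents.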
KEY POINTS
• Sampling with replacement results in independent draws that are unaffected by previous
outcomes, but in practice sampling without replacement is more common since it yields
more information. Additional information is gained whenever a new unit is
drawn, but no new information is gained from a unit that had already been drawn
previously (which can happen when sampling is done with replacement).
• When selecting a relatively small sample from a large population, the sampled units are
practically independent whether we sample with replacement or without
replacement.
• While the standard error (SE) of the sampling distribution of the mean is
$SE = \frac{\sigma}{\sqrt{n}}$
when sampling with replacement, the SE for the mean for sampling without
replacement is smaller, and given by
$SE = \sqrt{\frac{N-n}{N-1}} \cdot \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{N-n}{N-1} \cdot \frac{\sigma^2}{n}}$
where σ is the population standard deviation, while n and N are the sample size and
population size, respectively.
o The term $\sqrt{\frac{N-n}{N-1}}$ is called the finite population correction (fpc) and the
ratio $\frac{n}{N}$ is the sampling rate.
o When the sampling rate is small enough, the two SEs (for with and without
replacement) can be assumed to be virtually the same. In the majority of actual
sampling applications, N is very large so that the fpc is replaced by 1.
• For the special case when the sample mean is actually a proportion, the EV of the
sampling distribution of the sample proportion p is the population proportion P; the
standard error (SE) is
$SE = \sqrt{\frac{N-n}{N-1}} \cdot \sqrt{\frac{P(1-P)}{n}}$
where n and N are the sample size and population size, respectively.
ASSESSMENT
1. A city has 300,000 registered voters, with 120,000 of them poor. A survey organization
is about to take a random sample of 1,000 registered voters. Describe the sampling
distribution of the fraction of poor among the 1,000 sampled voters.
ANSWER:
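Approximately normal with mean given by
$EV = \frac{120,000}{300,000} = 0.40$
and standard deviation given by
$SE = \sqrt{\frac{300,000 - 1,000}{300,000 - 1}} \cdot \sqrt{\frac{(0.4)(0.6)}{1,000}} = 0.0155$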
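The answer to item 1 can be verified with a few lines of arithmetic. This sketch computes the SE both with and without the finite population correction (assumed to be sqrt((N - n)/(N - 1))); the variable names are illustrative.

```python
from math import sqrt

N, poor, n = 300_000, 120_000, 1_000
P = poor / N                                   # population proportion of poor voters

se_simple = sqrt(P * (1 - P) / n)              # ignoring the fpc
se_fpc = se_simple * sqrt((N - n) / (N - 1))   # with the fpc applied

print(P, round(se_simple, 4), round(se_fpc, 4))
```

Because only 1,000 of 300,000 voters are sampled, the correction does not change the SE at four decimal places.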
2. Consider a school district that has 10,000 11th graders. In this district, the average
weight of an 11th grader is 45 kg, with a standard deviation of 10 kg. Suppose you
draw a random sample of 50 learners. What is the probability that the average weight of
the sampled learners will be less than 42.5 kg?
ANSWER: To solve this problem, you need to define the sampling distribution of the
mean. Because the sample size (of 50) is fairly large, you might assume that the Central
Limit Theorem holds, i.e. that the sampling distribution of the sample mean will
approximate a normal curve.
The EV of the sampling distribution is equal to the mean of the population (45 kg), while
the SE of the sampling distribution (for sampling without replacement) is given by
$SE = \sqrt{\frac{10000 - 50}{10000 - 1}} \cdot \frac{10}{\sqrt{50}} \approx 1.41$
Thus, using areas under a normal curve, the probability that a sample mean (of 50 learners)
will be less than or equal to 42.5 kg is approximately equal to 0.0382.
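The normal-curve area in item 2 can be computed with Python's statistics.NormalDist. The sketch below evaluates the probability under both versions of the SE (with and without the finite population correction) so the two can be compared; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

N, n = 10_000, 50
mu, sigma = 45, 10

se_with = sigma / sqrt(n)                        # sampling with replacement
se_without = se_with * sqrt((N - n) / (N - 1))   # with the fpc applied

p_with = NormalDist(mu, se_with).cdf(42.5)
p_without = NormalDist(mu, se_without).cdf(42.5)

print(round(se_with, 4), round(p_with, 4))
print(round(se_without, 4), round(p_without, 4))
```

With a sampling rate of only 50/10,000, the two SEs, and hence the two probabilities, differ only slightly.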
3. According to Sherlock Holmes (in The Sign of Four):
“While the individual man is an insoluble puzzle, in the aggregate he becomes a
mathematical certainty. You can, for example, never foretell what any one man will be
up to, but you can say with precision what an average number will be up to. Individuals
vary, but percentages remain constant. So says the statistician.”
The statistician does not actually say that. What is Sherlock Holmes forgetting?
ANSWER: Sherlock Holmes forgets about chance error, that is, about uncertainty. The
“mathematical certainty” he may be referring to is that there is some kind of predictability:
the sampling distribution tends toward a normal curve (the Central Limit Theorem). But this
is a statistical model that involves uncertainty.
4. One public opinion poll uses a simple random sample (without replacement) of size
1,200 drawn from a region with a population of 10 million. Another poll uses another
simple random sample of size 1,200 from a region with a population of 1 million. The
polls are trying to estimate the proportion of voters who are in favor of constitutional
change. Other things being equal:
a) The first poll is likely to be a bit more accurate than the second;
b) The second poll is likely to be a bit more accurate than the first;
c) There is not likely to be much difference in the accuracy of the polls.
ANSWER: (c). While there is a finite population correction that should be applied to the SE
for both polls, this fpc is nearly 1. The reliability (i.e. precision) of the sample proportion
depends not on big N = the population size (when N is large), but rather on small n (the
sample size).
5. A survey organization wants to take a simple random sample without replacement to
estimate the proportion of voters who are in favor of voting for candidate A in the next
election. To keep costs down, they want to take a sample as small as possible, but their
client will tolerate chance errors of only 1 percentage point or so in the
estimate. Should they use a sample of size 100, 2,500, or 10,000? Note that past
experience suggests that the population percentage is in the range 20% to 40%.
ANSWER: The sample size should be around 2,500. Since the client wants
$SE = \sqrt{\frac{P(1-P)}{n}} \approx 0.01$, and P(1-P) is largest over the stated range at P = 0.4,
merely solve $\frac{(0.4)(0.6)}{n} = (0.01)^2$, which gives n = 2,400.
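The three candidate sample sizes in item 5 can be compared directly. This sketch ignores the fpc and takes P = 0.4 as the least favorable value in the stated 20% to 40% range; the function names are hypothetical.

```python
from math import sqrt

def se_proportion(P, n):
    """SE of a sample proportion, ignoring the fpc."""
    return sqrt(P * (1 - P) / n)

def n_for_target_se(P, target_se):
    """Sample size at which the SE of the sample proportion equals target_se."""
    return P * (1 - P) / target_se**2

for n in (100, 2_500, 10_000):
    print(n, round(se_proportion(0.4, n), 4))

print(round(n_for_target_se(0.4, 0.01)))   # smallest n meeting a 1-point SE at P = 0.4
```

A sample of 100 is far too small, 10,000 is more than the client needs, and 2,500 sits just above the required size.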
6. A simple random sample of 400 persons 15 years old and above is taken in Naga City.
The total years of schooling of all the sampled persons is 3230, so that the average
educational attainment is 8.1 years. The standard deviation of the sample data is
4.1 years. Describe the sampling distribution.
ANSWER: Approximately a normal curve with mean given by
EV =
and standard deviation given by
SE = = 0.2036
CHAPTER 3: SAMPLING
Lesson 5: Sampling from a Box of Marbles, Nips, or Colored Paper Clips, and One-Peso Coins
OVERVIEW OF LESSON
This lesson provides an activity to further help learners understand sampling. It focuses on
sampling objects from a box, where the objects can be categorized (e.g., marbles classified as
white or non-white; Nips colored chocolate candies categorized as green or non-green;
colored paper clips categorized as red or non-red; one-peso coins categorized based on the year
they were minted). Learners then select random samples without replacement of
marbles (or Nips or colored paper clips) or one-peso coins. For marbles (or Nips or colored
paper clips), learners examine the distribution of colors of several samples. For one-peso coins,
learners examine the age distributions of one-peso coins (categorizing the years into two: the
year 2014, and other years). Based on the data distributions, learners think about what the
color distribution of all marbles (or Nips, or colored paper clips), or the age distribution of all
one-peso coins, will look like.
• represent data with graphs (dot plots)

• describe statistical methods as a process for making inferences about population
parameters based on a random sample from that population
• use data from a sample survey to estimate a population mean or proportion and
develop a margin of error through the use of simulation for random sampling
• describe the difference in the sampling distribution when more samples are included
LESSON OUTLINE
A. Introduction
B. Main Lesson: Estimation of Probability of Getting a Head in a Single Toss of a Coin
C. Data Collection
E. Enrichment
KEY CONCEPTS: Sampling, Estimation, Sampling Variation, Standard Error, Central Limit
Theorem
MATERIALS NEEDED
For the activity, learners will need pencil, paper, and notes. The teacher should bring
• a large container with either (a) marbles, (b) Nips (although the drawback is that these
might melt), or (c) colored paper clips; or, (d) one-peso coins with different minting
dates
• copies of the Activity Worksheet
A. Introduction
Before starting the activity, the teacher may wish to review the definitions of population,
sample, population parameter, and sample statistic, reinforcing learners’ understanding of
these basic concepts for the lesson/activity on sampling marbles from a box with about 15%
white marbles in the box. The box should contain at least 500 marbles.
To make the lesson more tractable for learners in the class, a teacher should prepare the box
ahead of time. In other words, a teacher should have a large box of marbles containing 15%
white marbles for learners to sample in the classroom prior to the beginning of the lesson.
Learners should be asked to consider how random sampling could be used to explore the
population parameter of interest (the proportion of white marbles). They will be presented with
a large box of marbles and asked how they could possibly use it to help them with their
investigation. After learners have discussed the possibilities, they will be guided through an
investigation with a series of questions on the Activity Worksheet. They will first calculate the
proportion of white marbles in a single sample of marbles.
They will record the proportion on a sticky note and the class will construct a dot plot on the
board showing the approximate sampling distribution of the proportion of white marbles in a
sample of a fixed size of marbles.
They will then take another sample of marbles and calculate the proportion of white marbles in
the total number of marbles from both sample selections.
A second dot plot will be constructed on the board showing the sampling distribution of the
proportion of white marbles from the two samples combined.
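Teachers who want to preview what the class dot plots might look like can simulate the activity beforehand. The sketch below makes several assumptions that are not in the activity itself: a box of exactly 500 marbles with 75 white (15%), a class of 20 learners, a first sample of 30 marbles per learner, and a combined sample treated as a single draw of 60 marbles without replacement.

```python
import random

random.seed(2016)  # fixed seed so the run is reproducible

# Assumed box: 500 marbles, 15% of them white (1 = white, 0 = not white).
box = [1] * 75 + [0] * 425

def sample_proportion(n):
    """Proportion of white marbles in one sample of n drawn without replacement."""
    return sum(random.sample(box, n)) / n

# One sample of 30 marbles for each of 20 hypothetical learners.
first_round = [sample_proportion(30) for _ in range(20)]
# Two pooled draws of 30, modeled here as one draw of 60 without replacement.
combined = [sample_proportion(60) for _ in range(20)]

print(sorted(round(p, 2) for p in first_round))
print(sorted(round(p, 2) for p in combined))
```

Each printed list holds the values that would be placed on one class dot plot; rerunning with a different seed shows how much the plots vary from class to class.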
B. Data Analysis
At this point, learners will be performing an informal analysis of the data displayed on the two
class dot plots. They will first be asked to use the two plots to estimate the proportion of white
marbles. Then, they will be asked to compare the dot plots to analyze the effects of a larger
sample size.
The increased sample size should decrease the spread of the distribution.
Sample outcomes are shown in the table below. The data in the sample outcomes
were collected by sampling from a large box of marbles.
Sample Data Analysis

Note: Your actual samples will most likely be different from this one. So just use this as a mere
reference on how the class can interpret the results.
Twenty samples were collected for each of the sample sizes n = 30 and n = 60. The results
are shown in Table 3-05.1 below, and the corresponding dot plots in Figure 3-05.1.
Table 3-05.1 Samples Collected for Sample Sizes n=30 and n=60
Sample Proportions for n = 30     Sample Proportions for n = 60
Sample #  Proportion of white marbles     Sample #  Proportion of white marbles
1 5/30 = 0.17 1 12/60 = 0.20

2 6/30 = 0.20 2 15/60 = 0.25
3 3/30 = 0.10 3 14/60 = 0.23
4 8/30 = 0.27 4 7/60 = 0.12
5 3/30 = 0.10 5 10/60 = 0.17
6 3/30 = 0.10 6 10/60 = 0.17
7 6/30 = 0.20 7 11/60 = 0.18
8 5/30 = 0.17 8 7/60 = 0.12
9 7/30 = 0.23 9 8/60 = 0.13
10 3/30 = 0.10 10 10/60 = 0.17
11 3/30 = 0.10 11 8/60 = 0.13
12 4/30 = 0.13 12 11/60 = 0.18
13 7/30 = 0.23 13 6/60 = 0.10
14 5/30 = 0.17 14 11/60 = 0.18
15 5/30 = 0.17 15 12/60 = 0.20
16 1/30 = 0.03 16 10/60 = 0.17
17 6/30 = 0.20 17 12/60 = 0.20
18 8/30 = 0.27 18 11/60 = 0.18
19 1/30 = 0.03 19 13/60 = 0.22
20 5/30 = 0.17 20 8/60 = 0.13
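The sample outcomes in the table can be summarized numerically before looking at the dot plots. The sketch below copies the counts of white marbles from the table and computes the mean and standard deviation of the 20 sample proportions for each sample size; the variable names are illustrative.

```python
from statistics import mean, stdev

# Counts of white marbles in each of the 20 samples, copied from the table.
white_n30 = [5, 6, 3, 8, 3, 3, 6, 5, 7, 3, 3, 4, 7, 5, 5, 1, 6, 8, 1, 5]
white_n60 = [12, 15, 14, 7, 10, 10, 11, 7, 8, 10, 8, 11, 6, 11, 12, 10, 12, 11, 13, 8]

props_n30 = [w / 30 for w in white_n30]
props_n60 = [w / 60 for w in white_n60]

for label, props in (("n = 30", props_n30), ("n = 60", props_n60)):
    print(label, "mean:", round(mean(props), 3), "SD:", round(stdev(props), 3))
```

Both averages land near the 15% of white marbles placed in the box, and the proportions from the larger samples are less spread out.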
Figure 3-05.1 Dot Plots for Sample Proportions of White Marbles for Sample
Sizes n = 30 and n = 60
[Figure panels: dot plots of Proportion of White Marbles for Sample Size n = 30 and for Sample Size n = 60]
Interpret the Results
During the activity, learners can be asked if the information on the dot plots seemed to support
the claim that 15% of all marbles are white. Even though the proportion of white marbles will
vary considerably from sample to sample, the data displayed on the dot plot should support
the claim.
Learners will be asked to answer the following questions on the Activity Sheet as they work
through the sampling procedure and analysis. Answers to the questions are discussed below.
Teacher’s Step-By-Step Guide for This Activity
STEP 1: Preliminary Questions.
1. What is the specific question that needs to be addressed in your investigation?
Answer: After reading the questions of interest on the activity sheet, learners should
recognize that the question to be addressed is, “What proportion of all marbles are
white?” To set up the question, learners should recognize that the proportion of all
marbles that are white is a population parameter, with the population being all the
marbles produced.
2. How does this question relate to the background information given at the beginning of
the activity? Be sure to use some of the bold-faced terms in your answer.
Answer: Learners should see that this question relates to the background information
because it involves a population parameter (the proportion of white marbles) and a
population (all marbles produced).
3. What is the population of interest in the investigation?

Answer: The population of interest is all marbles produced. It is important that learners
recognize that this includes more than the number of marbles in the large box in class,
the marbles throughout the country, or even the number of marbles produced in a
given time period, but ALL marbles produced.
4. What is the population parameter of interest in the investigation?

Answer: The population parameter of interest is the proportion of white marbles.
Learners should be able to distinguish between the proportion and other possible
quantities or parameters.
5. Why can we not realistically calculate the population parameter of interest directly?
Answer: It is important that learners recognize how large a quantity of marbles would
be involved in calculating the proportion of white marbles in ALL marbles produced.
They should also realize that production is an ongoing process. Both of these factors
would make it impossible to count all marbles produced and determine the proportion
of white marbles.
6. How could the concept of random sampling be used to investigate the population
parameter of interest?
Answer: In answering this question learners should mention that random sampling is a
practical, reasonable way to gather information that can be used to help draw
conclusions about a population of interest. They should point out that in this activity, a
sample of marbles can easily be obtained at a local store and since it is part of the
population of all marbles, it can help us draw conclusions about the proportion of white
marbles in all marbles produced.
STEP 2: Answer the following questions before using the samples of marbles in your
investigation.
1. What are we interested in finding out about each of the samples of marbles?
Answer: The statistical question set up by learners in Step 1, #1, involved the proportion
of all marbles that are white. After referring back to this question, learners should
conclude that they are interested in finding out about the proportion of white marbles
in each sample.
2. What information do we need in order to answer our question or investigate the claim
that 15% of marbles produced are white? How can we use the samples of marbles to
obtain this information?
Answer: Learners are again asked to recognize that they need the proportion of white
marbles in the population of all marbles produced. They should then use the
background information at the beginning of the activity sheet to help them conclude
that they can use the samples of marbles as samples from which they can collect
information about the population of all marbles.
STEP 3: Each pair or group of learners will perform the following investigation using ONE
sample of marbles.
The total number of marbles in one sample is 30.
1. Calculate the proportion of white marbles in the sample.

Proportion of white marbles __________________________
Answer: Answers will vary. It is recommended that the teacher know the proportion of
white marbles in the box from which learners are drawing samples so he/she can know
if learners’ results are reasonable and can have an idea of the needed range on the dot
plot created in (3).
2. Give an estimate of the proportion of white marbles based on your sample.

Answer: Answers will vary, but should correspond with the answer obtained from the
previous question.
3. Write the proportion you found in (1) on the sticky note you were given and place it in
the appropriate position on the dot plot your teacher has drawn on the board. Did
every sample have the same proportion of white marbles?
Answer: Learners’ answers should be checked in order to determine the range needed
for the class dot plot. The horizontal axis on the dot plot should be scaled so learners
can easily determine where to place their sticky note. Learners should quickly see that
not every sample had the same proportion of white marbles.
4. What value is at the center of the dot plot constructed by using one sample per group?
Answer: The value at the center of the dot plot should be close to the proportion of
white marbles in the box from which the samples were drawn, say 15%. Learners
should conclude that this is not a coincidence: sample proportions tend to cluster
around the true value.
5. Define each of the following in the context of the investigation you are performing with
one sample of marbles.
Population of interest _____________________________________________________
Answer: All marbles produced. See explanation in the answer to Step 1, #3.
Population parameter of interest_____________________________________________
Answer: Proportion of white marbles in the population of all marbles. See explanation
in the answer to Step 1, #4.
Sample (drawn from population of interest) ___________________________________
Answer: 30 marbles from the box. Learners should be very specific with their
answer, indicating the number of marbles in the sample and where the sample was
obtained.
Statistic (used to estimate population parameter of interest) _______________________
Answer: Proportion of white marbles in one sample of marbles from the box. It is
important that learners indicate that it is the proportion in ONE sample, and where the
sample was obtained.
STEP 4: Each pair or group of learners will perform the following investigations using two
samples of marbles.
The total number of marbles in two samples is 60.
1. Calculate the proportion of white marbles in the overall sample (two samples with 30
marbles in each sample).
Proportion of white marbles in two samples __________________________
Answer: Answers will vary, but should be reasonable considering the proportion of
white marbles in the box from which the sample was drawn.
2. Give an estimate for the proportion of white marbles based on your overall sample.
Answer: Answers will vary, but should correspond with the answer obtained for the
previous question.
3. Write the proportion you found in (1) on the sticky note you were given and place it in
the appropriate position on the dot plot your teacher has drawn on the board. Did
every group have the same proportion of white marbles in two samples?
Answer: Learners’ answers should be checked in order to determine the range needed
for the class dot plot. The horizontal axis on the dot plot should be scaled so learners
can easily determine where to place their sticky note. Learners should quickly see that
not every group had the same proportion of white marbles in two samples.
4. What value is at the center of the dot plot constructed by using the overall samples of
each group? Is it a coincidence that the value is close to/far from the value of the
proportion of white marbles in the box (say 15%)?
Answer: The value at the center of the dot plot should be close to the proportion of
white marbles in the box from which the samples were drawn. Learners should
conclude that it is not a coincidence that the value is close to 15%.
5. Define each of the following in the context of the investigation with two samples of
marbles.
Population of interest _____________________________________________________
Answer: All marbles produced. See explanation in the answer to Step 1, #3.
Population parameter of interest_____________________________________________
Answer: Proportion of white marbles in the population of all marbles produced.
See explanation in the answer to Step 1, #4.
Sample (drawn from population of interest) ___________________________________
Answer: Two samples of 30 marbles each (60 marbles in total) from the box. Learners
should be very specific in their answers, indicating the number of marbles in the sample
and where the sample was obtained.
Statistic (used to estimate population parameter of interest) _______________________
Answer: Proportion of white marbles in two samples of marbles from the box. It is
important that learners indicate the proportion in two samples (total of 60 marbles) and
where the sample was obtained.
6. What type of changes occurred in the dot plot when you used two samples of marbles
instead of one?
Answer: Learners should observe that the spread of the distribution decreased when
the sample size increased.
7. If you only had one sample on which to base your estimate, how far off would your
estimate be? What would the worst case scenario be with one sample shown on the dot
plot? What would the worst case scenario be if you had two samples shown on the dot
plot?
Answer: For the data on the sample dot plot created on page 4 of this lesson, learners
may find a sample proportion as small as .03 or as high as .27 for n = 30. The worst case
scenario for n = 60 will be .10 on the low end and .25 on the high end. Learners should
recognize that the spread decreases as n increases. Learners will give similar
answers based on the class dot plots for sample sizes n = 30 and n = 60.
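The narrowing of the dot plot can also be checked with a quick simulation. The following Python sketch is illustrative only: it assumes the box holds 15% white marbles and treats each marble drawn as an independent draw (a reasonable approximation when the box is large). The function name, the seed, and the number of simulated groups are assumptions and are not part of the activity.

```python
import random
import statistics

def simulate_sample_proportions(n, p_white=0.15, groups=2000, seed=1):
    """Simulate `groups` samples of n marbles and return the proportion
    of white marbles in each sample.

    Assumption: each marble is white with probability p_white,
    independently of the others (a large, well-mixed box).
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(groups):
        whites = sum(1 for _ in range(n) if rng.random() < p_white)
        proportions.append(whites / n)
    return proportions

props_30 = simulate_sample_proportions(30)
props_60 = simulate_sample_proportions(60)

# Summarize center and spread of each simulated sampling distribution.
for n, props in ((30, props_30), (60, props_60)):
    print(f"n = {n}: center = {statistics.mean(props):.3f}, "
          f"spread (SD) = {statistics.pstdev(props):.3f}, "
          f"min = {min(props):.2f}, max = {max(props):.2f}")
```

Under these assumptions both simulated distributions should be centered near 0.15, with a visibly smaller spread for n = 60 than for n = 30, which is the pattern learners are expected to see on the class dot plots.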
Note to Teacher: If you use other materials, change the question as needed. For
example, if the materials available are Nips, ask about the proportion of green Nips
candies; if you have colored paper clips, ask about the proportion of red clips. This
activity is applicable as long as each item can be classified into one of two categories
(for example, white or not white).
The teacher can also modify the activity to ease the traffic of learners getting marbles.
For example, have each learner draw 10 marbles per sample so that the sampling goes
more quickly.
C. Possible Extensions
1. After learners understand the basic concept of a sampling distribution, design activities for
them. For instance, the teacher may ask for a copy of the data in DepEd’s Basic Education
Information System for the year, ask learners to sample schools, and then focus on a particular
parameter, e.g., the average pupil-to-teacher ratio in high school, the average enrolment, or
the proportion of high school learners who are indigenous people. Ask them to get a
sample of schools, obtain sample statistics from their samples, combine their
statistics with those of other learners, develop a dot plot of the sampling distribution, and
describe its shape, center, spread, and outliers.
2. Learners are introduced to the concept of the variability of a statistic and its relationship to
the spread of its sampling distribution. This is extended further as learners are introduced
to the concept that larger samples give smaller spreads. This can lead to the discussion of
the fact that the size of the population does not affect the spread of the sampling
distribution when the population is fairly large.
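The last point can be illustrated numerically. The sketch below uses the standard formula for the standard error of a sample proportion under simple random sampling without replacement, sqrt(p(1 - p)/n × (N - n)/(N - 1)). The values p = 0.15 and n = 60 and the population sizes N are illustrative assumptions, and the function name is hypothetical.

```python
import math

def se_proportion(p, n, population_size=None):
    """Standard error of a sample proportion under simple random sampling.

    If population_size is given, the finite population correction
    (N - n) / (N - 1) is applied; otherwise the population is treated
    as effectively infinite.
    """
    variance = p * (1 - p) / n
    if population_size is not None:
        variance *= (population_size - n) / (population_size - 1)
    return math.sqrt(variance)

p, n = 0.15, 60  # illustrative values, not data from the activity
for N in (500, 5_000, 50_000, 5_000_000):
    print(f"N = {N:>9,}: SE = {se_proportion(p, n, N):.4f}")
print(f"N very large: SE = {se_proportion(p, n):.4f}")
```

Once N is much larger than n, the correction factor is close to 1, so the spread depends almost entirely on the sample size n rather than on the population size.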
REFERENCES
Population Parameters with NIPS® by Anna Bargagliotti and Jeanie Gibson STatistics
Education Web (STEW),
https://www.amstat.org/education/stew/pdfs/PopulationParameterswithMMs.docx
See also:
Workbooks in Statistics 1: 11th Edition. Institute of Statistics, UP Los Baños, College Laguna
4031
Probability and statistics: Module 24. (2013). Australian Mathematical Sciences Institute and
Education Services Australia. Retrieved from
http://www.amsi.org.au/ESA_Senior_Years/PDF/InferenceProp4g.pdf
ACTIVITY SHEET 3-05
Background
A population parameter is a summary measure that describes some characteristic of a
given population. In statistics, the population is the entire collection of units (individuals,
households, establishments, farms, etc.) about which we want information. The population
parameter is a constant value that does not change. Many times, it is impractical or even
impossible to calculate the population parameter of interest, the most common reason being
that populations are often composed of very large numbers of units. When we cannot calculate
the population parameter directly, we use a sample, which is a part of the population from
which we actually collect information.
From the sample, we calculate a statistic, a summary measure that describes some
characteristic of the sample. A statistic will vary depending on the sample from which it was
calculated. Preferably the sample should be designed so that it is representative of the
population.
How are the sample and the population related? We use information from a sample (a
statistic) to draw conclusions about a population parameter. We will use this relationship
while performing the following investigation.
Question of Interest:
What is the proportion of marbles that are white?
Instructions: Your class will be divided into pairs or small groups to perform the following
investigation in order to answer the question above.
STEP 1: Preliminary questions.
1. What is the specific question that needs to be addressed in your investigation?

2. How does this question relate to the background information given at the beginning
of the activity? Be sure to use some of the bold-faced terms in your answer.
3. What is the population of interest in the investigation?
4. What is the population parameter of interest in the investigation?
5. Why can we not realistically calculate the population parameter of interest directly?
6. How could the concept of random sampling be used to investigate the population
parameter of interest?
Your group will now be presented with a large box full of marbles. Groups will then come up to
the box one by one and, without looking, grab 30 marbles. The group will note the
number of marbles of each color, place the sampled marbles back in the box, and repeat the
process. Once two samples are recorded, the group will return to their seats to answer the
questions in Steps 2 through 4.
STEP 2: Answer the following questions before using the samples of marbles in your
investigation.
1. What are we interested in finding out about each of the samples of marbles?
2. What information do we need in order to answer our question about the proportion of
marbles that are white? How can we use the samples of marbles to obtain this
information?
STEP 3: Each pair or group of learners will perform the following investigation using one
sample of marbles.
The total number of marbles in one sample is 30.
1. Calculate the proportion of white marbles in the sample.

Proportion of white marbles __________________________
2. Give an estimate of the proportion of white marbles based on your sample.
3. Write the proportion you found in (1) on the sticky note you were given and place it in
the appropriate position on the dot plot your teacher has drawn on the board. Did
every sample have the same proportion of white marbles?
4. What value is at the center of the dot plot constructed by using one sample per group?
5. Define each of the following in the context of the investigation you are performing with
one sample of marbles.
Population of interest __________________________________________________________
Population parameter of interest_________________________________________________
Sample (drawn from population of interest) _______________________________________
Statistic (used to estimate population parameter of interest) _______________________
STEP 4: Each pair or group of learners will perform the following investigation using two
samples of marbles.
The total number of marbles in two samples is 60.
1. Calculate the proportion of white marbles in the overall sample (two samples with 30
marbles in each sample).
Proportion of white marbles in two samples __________________________
2. Give an estimate for the proportion of white marbles based on your overall sample.
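3. Write the proportion you found in (1) on the sticky note you were given and place it in
the appropriate position on the dot plot your teacher has drawn on the board. Did
every group have the same proportion of white marbles in two samples?
4. What value is at the center of the dot plot constructed by using the overall samples of
each group?
5. Define each of the following in the context of the investigation with two samples of
marbles.
Population of interest _____________________________________________________
Population parameter of interest_____________________________________________
Sample (drawn from population of interest) ___________________________________
Statistic (used to estimate population parameter of interest) _______________________
6. What type of changes occurred in the dot plot when you used two samples of marbles
instead of one?
7. If you only had one sample on which to base your estimate, how far off would your
estimate be? What would the worst case scenario be with one sample shown on the
dot plot? What would the worst case scenario be if you had two samples shown on the
dot plot?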
Answers to Activity Sheet
STEP 1:
1. What proportion of all marbles produced is white?

2. We want to know a population parameter, which is the proportion of white marbles. The
population of interest is the total population of all marbles produced.
3. All marbles produced.
4. The proportion of white marbles.
5. It would be impossible to count all marbles produced because there would be too many and
production is an ongoing process.
6. We could use a sample of marbles that could reasonably be obtained at a local store and easily
be counted. We would use information from the sample to draw conclusions about the
proportion of white marbles in all marbles produced since the sample is a part of this
population.
STEP 2:
1. The proportion of white marbles.

2. We need the proportion of white marbles in the population of all marbles produced. We can use
the samples of marbles as samples from which we can collect information about the population.
STEP 3:
1. Answers will vary.
2. Answers will vary, but should correspond with the answer to (1).
3. No, not every sample had the same proportion of white marbles.
4. Answers will vary, but should be approximately equal to the true proportion of white marbles in
the box. This is not a coincidence.
5. Population of interest – all marbles produced.
Population parameter of interest – proportion of white marbles in the population of all marbles
produced.
Sample – 30 marbles from the box
Statistic – proportion of white marbles in one sample of marbles from the box
STEP 4:

3. No, not every group had the same proportion of white marbles.
4. Answers will vary, but should be approximately equal to the true proportion of white marbles in
the box. This is not a coincidence.
5. Learners should observe that the spread of the distribution of sample proportions for sample
size n = 60 is less than the spread for n = 30. Therefore, there is less variability in the sample
proportions for size n = 60. For this reason learners should conclude that the information on the
dot plot for two samples seems to support the claim more than the information obtained from
one sample.
6. Population of interest – all marbles produced.
Population parameter of interest – proportion of white marbles in the population of all
marbles produced.
Sample – two samples of size 30 of marbles
Statistic – proportion of white marbles in 60 marbles from the box.
7. The spread of the distribution decreased.
8. For the data on the sample dot plot created on page 4 of this lesson, learners may find a sample
proportion as small as .03 or as high as .27 for n = 30. The worst case scenario for n = 60 will be
.10 on the low end and .25 on the high end. Learners should recognize that the spread is
decreasing as n increases. Learners will give similar answers based on the class dot plots for
sample sizes n = 30 and n = 60.
ASSESSMENT
1. Explain the difference between a population parameter and a sample statistic.

Answer: A statistic is a numerical summary computed from a sample. A parameter is a numerical
summary computed from a population. A population parameter is a constant value that does not
change, whereas a statistic will vary depending on the sample from which it was calculated.
2. There are several different colors of marbles. Suppose you obtained a bag of marbles and
found that 10% of the bag was a certain specified color. Describe an activity that would allow
you to estimate how far away your estimate of 10% might be from the population proportion of
that color marble.
Answer: The answers will vary. Essential elements in the description would be:
a. A random sampling method using marbles as originally packaged.
b. Sample sizes that are large enough to be representative of the total population of marbles
c. A sampling method that involves obtaining repeated samples of the same size.
d. A method of representing the sampling distribution of the proportion of the specified color
of marbles, such as a dot plot or histogram.
e. Directions for performing an informal analysis of the sampling distribution displayed in the
dot plot or histogram.
3. If the sample size is larger, say each person draws 75 marbles, what is the effect on the
estimate of the parameter?
Answer: The larger the sample size, the less the sample proportions vary from sample to
sample, so the estimates tend to be closer to the population parameter.
CHAPTER 3: SAMPLING
Lesson 6: Sampling from the Periodic Table

OVERVIEW OF LESSON: This lesson introduces an activity that further helps learners
understand various sampling methods. This is largely taken from a STatistics Education Web
(STEW) lesson plan called It’s Elemental. Using the Periodic Table of Elements discussed in
Chemistry, learners collect data using simple random and systematic sampling. When both
samples are collected, learners then calculate appropriate descriptive statistics and use
sampling distributions to compare the performance of the methods. They also determine how
to set up a stratified random sample and a cluster sample, but only perform the cluster sample
in this activity.
• use data from a sample to estimate a population mean

• describe the sampling distributions of some statistics (sample mean)
• compare sampling distributions for sample means from different sampling methods to
determine the optimal sampling strategy
LESSON OUTLINE
A. Introduction
B. Data Collection and Preliminary Analysis
C. Further Analysis and Interpretation
D. Enrichment: Possible Extensions
KEY CONCEPTS: Simple Random Sampling (SRS), Systematic Sampling, Stratified
Sampling, Cluster Sampling
PREREQUISITES: Before learners begin the activity/lesson, they should have basic
knowledge of random sampling methods such as simple random, systematic, stratified, and
cluster sampling (discussed in Lesson 3-02). They should also be familiar with univariate
descriptive statistics, graphs (discussed in Chapter 1), and sampling distributions (discussed in
Lessons 3-01, 3-03, and 3-04).
MATERIALS NEEDED: For the activity, learners will need pencils. The instructor should
provide the activity sheet, complete with a periodic table of elements.
A. Introduction
Start the activity by giving learners the Activity Worksheet 3-06 and explaining that the overall
goal of the activity is to determine which sampling method is the most appropriate for
estimating the mean atomic weight of elements in the periodic table. The main question of
interest for this activity is: After taking a simple random sample and a systematic sample
from the periodic table of elements, which of the two is the more appropriate method?
The various sampling methods the learners will implement in this activity are based on the
periodic table of elements, so give a basic description of the periodic table as this will be
important in the design. The following description is available in the Worksheet, but go ahead
and summarize it for the learners.
“The periodic table is a tabular arrangement of chemical elements, ordered by their atomic
number (i.e., the number of protons in the nucleus of an element), electron configurations, and
recurring chemical properties. Dmitri Mendeleev created the periodic table in 1869. The table
reflects the “periodic” trends in the elements. Each row of the table is called a period,
and elements within a period have the same valence electron shell based on quantum
mechanical theory. The groups contain elements with similar physical properties due to the
number of electrons in the respective valence shell. The most up-to-date periodic table has
117 confirmed elements, 92 of them occurring naturally on earth, with scientists producing the
rest artificially in a laboratory. The value that learners will be most interested in is the atomic
weight of elements, which can be determined using a weighted average of the weights for
each element’s various isotopes.”
The periodic table the learners will use comes from the US National Institute of Standards and
Technology (www.nist.gov). This table only has 114 elements, so for the purposes of this
activity, the population size is N = 114 and the population mean atomic weight is µ = 141.09
grams per mole. The last two pages of the Activity Worksheet contain a text version of the
periodic table that may be easier for some learners to use.
B. Data Collection and Preliminary Analysis
After the periodic table has been described to learners, have them all start the activity.
Strategy 1: Finding a simple random sample (without replacement) of 25 elements
Using the RANDBETWEEN() function in MS Excel, enter =RANDBETWEEN(1,114) in at least 27
cells. Note that this function yields independent draws with replacement. Consider the
sample data set in Table 3-06.1. Since the sample contains duplicate elements (68 and 25
each appear twice), we delete the duplicate entries. Learners can continue
to sample elements until the sample comprises 25 unique elements.
Table 3-06.1. Example of a Simple Random Sample with Replacement
87 92 72 68 50 52 25 7 37
69 10 68 43 49 25 89 63 17
28 5 88 94 39 96 112 98 51
With the 25 elements, learners need to find the mean atomic weight of the sample. For
Table 3-06.1, the resulting sample mean is 142.2808 grams per mole.
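For classes that prefer a script to a spreadsheet, the same draw-and-discard procedure can be sketched in Python. This is only an illustration: the function name and the seed are hypothetical, and random.randint is used as a stand-in for Excel's RANDBETWEEN because both return an integer between two bounds, inclusive.

```python
import random

def srs_without_replacement(population_size, sample_size, seed=None):
    """Draw element numbers one at a time, as repeated RANDBETWEEN
    calls would, skipping duplicates until sample_size unique
    numbers have been collected."""
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        draw = rng.randint(1, population_size)  # inclusive at both ends
        if draw not in chosen:
            chosen.append(draw)
    return chosen

# The 27 draws shown in Table 3-06.1 (68 and 25 each appear twice).
table_draws = [87, 92, 72, 68, 50, 52, 25, 7, 37,
               69, 10, 68, 43, 49, 25, 89, 63, 17,
               28, 5, 88, 94, 39, 96, 112, 98, 51]
unique_draws = list(dict.fromkeys(table_draws))  # keeps first occurrences
print(len(unique_draws), "unique elements remain from Table 3-06.1")

# A fresh sample of 25 distinct element numbers (seed chosen arbitrarily).
new_sample = srs_without_replacement(114, 25, seed=2016)
print(sorted(new_sample))
```

Python's random.sample(range(1, 115), 25) produces the same kind of sample in a single call; the loop above is written out only to mirror the spreadsheet steps.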
As an alternative to Excel, learners may use a table of random digits to select the elements
in the periodic table.
A table of random digits is a list of the 10 digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 constructed in such a
way that the digit in any position in the list has the same chance of being any one of the ten digits,
and each value in the list has no influence on the other values in the list. An example of a part
of this table appears below.
11164 36318 75061 37674 26320 75100 10431 20418 19228 91792
21215 91791 76831 58678 87054 31687 93205 43685 19732 08468
10438 44482 66558 37649 08882 90870 12462 41810 01806 02977
36792 26236 33266 66583 60881 97395 20461 36742 02852 50564
73944 04773 12032 51414 82384 38370 00249 80709 72605 67497
49563 12872 14063 93104 78483 72717 68714 18048 25005 04151
64208 48237 41701 73117 33242 42314 83049 21933 92813 04763
51486 72875 38605 29341 80749 80151 33835 52602 79147 08868
99756 26360 64516 17971 48478 09610 04638 17141 09227 10606
71325 55217 13015 72907 00431 45117 33827 92873 02953 85474
65285 97198 12138 53010 94601 15838 16805 61004 43516 17020
17264 57327 38224 29301 31381 38109 34976 65692 98566 29550
95639 99754 31199 92558 68368 04985 51092 37780 40261 14479
61555 76404 86210 11808 12841 45147 97438 60022 12645 62000
78137 98768 04689 87130 79225 08153 84967 64539 79493 74917
62490 99215 84987 28759 19177 14733 24550 28067 68894 38490
24216 63444 21283 07044 92729 37284 13211 37485 10415 36457
16975 95428 33226 55903 31605 43817 22250 03918 46999 98501
59138 39542 71168 57609 91510 77904 74244 50940 31553 62562
29478 59652 50414 31966 87912 87154 12944 49862 96566 48825
96155 95009 27429 72918 08457 78134 48407 26061 58754 05326
29621 66583 62966 12468 20245 14015 04014 35713 03980 03024
12639 75291 71020 17265 41598 64074 64629 63293 53307 48766
14544 37134 54714 02401 63228 26831 19386 15457 17999 18306
83403 88827 09834 11333 68431 31706 26652 04711 34593 22561
67642 05204 30697 44806 96989 68403 85621 45556 35434 09532
64041 99011 14610 40273 09482 62864 01573 82274 81446 32477
17048 94523 97444 59904 16936 39384 97551 09620 63932 03091
93039 89416 52795 10631 09728 68202 20963 02477 55494 39563
82244 34392 96607 17220 51984 10753 76272 50985 97593 34320
96990 55244 70693 25255 40029 23289 48819 07159 60172 81697
09119 74803 97303 88701 51380 73143 98251 78635 27556 20712
57666 41204 47589 78364 38266 94393 70713 53388 79865 92069
46492 61594 26729 58272 81754 14648 77210 12923 53712 87771
08433 19172 08320 20839 13715 10597 17234 39355 74816 03363
10011 75004 86054 41190 10061 19660 03500 68412 57812 57929
92420 65431 16530 05547 10683 88102 30176 84750 10115 69220
35542 55865 07304 47010 43233 57022 52161 82976 47981 46588
86595 26247 18552 29491 33712 32285 64844 69395 41387 87195
72115 34985 58036 99137 47482 06204 24138 24272 16196 04393
07428 58863 96023 88936 51343 70958 96768 74317 27176 29600
35379 27922 28906 55013 26937 48174 04197 36074 65315 12537
10982 22807 10920 26299 23593 64629 57801 10437 43965 15344
90127 33341 77806 12446 15444 49244 47277 11346 15884 28131
63002 12990 23510 68774 48983 20481 59815 67248 17076 78910
40779 86382 48454 65269 91239 45989 45389 54847 77919 41105
43216 12608 18167 84631 94058 82458 15139 76856 86019 47928
96167 64375 74108 93643 09204 98855 59051 56492 11933 64958
70975 62693 35684 72607 23026 37004 32989 24843 01128 74658
85812 61875 23570 75754 29090 40264 80399 47254 40135 69916
We can use this table by first arbitrarily choosing a line.
We then read successive groups of three digits from this line. For each group, if the first digit
is even, replace it with 0; if it is odd, replace it with 1. Then keep the second and third digits
as they are to form a number, and discard the resulting number if it is larger than 114, equal
to 0, or a repeat.
Suppose the chosen line is the third line. Reading that line and continuing onto the next lines gives:
104-384-448-266-558-376-490-888-290-870-124-624-181-001-806-029-773-
679- 226-236-332-666-658-360-881-973-952-046-136-742-028-525-056-473-
944-047-731-203-251-414-823-843-837-000-249-807-097-260-567-497
Then, according to the protocol, we would choose the following 25 distinct elements:
104 X 48 66 X X 90 88 90 70 X 24 X 1 6 29 X
79 26 36 X 66 58 X 81 X X 46 X X 28 X 56 73
X 47 X 3 51 14 23 43
(Here X marks a number larger than 114, which is discarded. Note that 66 and 90 each came
up twice; the repeats are also discarded.)
Now, ask learners to complete the list.
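The protocol above can be written out as a short Python sketch so that learners can check their hand-selected list. The function name is hypothetical, and the code also discards a result of 0 (from a group such as 000), since there is no element numbered 0.

```python
def elements_from_random_digits(digits, population_size=114, sample_size=25):
    """Apply the three-digit protocol to a string of random digits.

    For each block of three digits: an even first digit becomes 0 and an
    odd first digit becomes 1; the second and third digits are kept.
    Numbers outside 1..population_size and repeats are discarded.
    """
    digits = "".join(ch for ch in digits if ch.isdigit())
    chosen = []
    for i in range(0, len(digits) - 2, 3):
        hundreds = int(digits[i]) % 2          # even -> 0, odd -> 1
        number = 100 * hundreds + int(digits[i + 1:i + 3])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

# Third, fourth, and fifth lines of the table of random digits above.
lines = """
10438 44482 66558 37649 08882 90870 12462 41810 01806 02977
36792 26236 33266 66583 60881 97395 20461 36742 02852 50564
73944 04773 12032 51414 82384 38370 00249 80709 72605 67497
"""
digit_sample = elements_from_random_digits(lines)
print(digit_sample)
```

Run on these three lines, the sketch stops at the 25th distinct element, 43, and reproduces the numbers listed above in the same order.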
Strategy 2: Finding a 1-in-5 systematic sample of elements.
Learners should be able to choose the random start using the same method as in Strategy 1. Instead of
sampling 25 from the numbers 1 up to 114 as in the simple random sample, the learners will
sample one integer k, 1 ≤ k ≤ 5, and then select every 5th element from the list of
elements ordered by atomic weight, starting at k. The sample size depends on the value of k
picked since 114 is not divisible by 5: it is 23 for k = 1, 2, 3, or 4 and 22 for k = 5. Learners
may think that 25 elements also need to be in this systematic sample, but that is impossible.
Suppose the integer chosen from 1 to 5 is 4. The sample will then contain the 23 elements in
Table 3-06.2.
Table 3-06.2. Elements in the 1-in-5 systematic sample where 4 was first drawn
(from 1 to 5), and every fifth element is then taken.
4 9 14 19 24 29 34 39 44
49 54 59 64 69 74 79 84 89
94 99 104 109 114
From the 23 elements in Table 3-06.2, the 1-in-5 systematic sample produces a sample mean of
144.5087 grams per mole.
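As a check on the selection step, the following Python sketch lists the element numbers in a 1-in-5 systematic sample for a given start k and shows how the sample size depends on k. The function name is hypothetical; in class, k would be drawn at random (for example with random.randint(1, 5)) rather than fixed at 4.

```python
def systematic_sample(population_size, step, start):
    """Return the element numbers start, start + step, start + 2*step, ...
    that do not exceed population_size."""
    if not 1 <= start <= step:
        raise ValueError("start must be between 1 and step")
    return list(range(start, population_size + 1, step))

# Start k = 4, as in the worked example.
sample_k4 = systematic_sample(114, 5, 4)
print(sample_k4)

# Sample size for each of the five possible starts.
sizes = {k: len(systematic_sample(114, 5, k)) for k in range(1, 6)}
print(sizes)
```

Because only five starts are possible, there are only five distinct systematic samples, none of which has 25 elements.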
Strategy 3: Stratified Random Sampling
Learners should explain how they would take a stratified random sample of 25
elements from the 114 using the following 4 strata: solid, liquid, gas, and artificial. Learners are
also given the information that there are 77 solids, 2 liquids, 11 gases, and 24 artificial
elements. In order to take a stratified sample, learners should divide each population stratum
size by the population size and then multiply this proportion by 25. For example, there
are 77 solids, so $\frac{77}{114} \times 100\% = 67.5\%$ of the population of elements are solids. Thus, there
should be $0.675 \times 25 = 16.875 \approx 17$ solids in the sample.
Similarly, learners should discover that the sample will have 17 solids, 1 liquid, 2 gases, and 5
artificial elements. A learner may ask how the 1 liquid made the sample, because the
calculation gives only about 0.44 liquids, which, with the rounding used for the other
three strata, would lead to 0 liquids. However, it could be argued that the sample should
have at least 1 representative element of the liquids, and by rounding up to 1, the resulting
sample has 25 elements.
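The allocation arithmetic for all four strata can be sketched in Python as follows. The function name and the "at least 1 per stratum" rule as coded here are illustrative assumptions based on the argument above. The exact share for solids prints as 16.886 rather than 16.875 because the code does not round 77/114 to 67.5% first.

```python
def proportional_allocation(strata_sizes, sample_size, minimum=1):
    """Allocate sample_size across strata in proportion to stratum size,
    rounding to the nearest whole number and then raising any stratum
    below `minimum` up to `minimum`."""
    population_size = sum(strata_sizes.values())
    exact = {name: size / population_size * sample_size
             for name, size in strata_sizes.items()}
    rounded = {name: round(value) for name, value in exact.items()}
    adjusted = {name: max(count, minimum) for name, count in rounded.items()}
    return exact, rounded, adjusted

strata = {"solid": 77, "liquid": 2, "gas": 11, "artificial": 24}
exact, rounded, adjusted = proportional_allocation(strata, 25)
for name in strata:
    print(f"{name:<10} exact = {exact[name]:6.3f}  "
          f"rounded = {rounded[name]:2d}  final = {adjusted[name]:2d}")
print("total:", sum(adjusted.values()))
```

For these strata the final allocation happens to add up to exactly 25. In general, rounding and the minimum rule can make the total differ from the target sample size, in which case one stratum has to be adjusted by hand.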
Strategy 4: Cluster Sample
Ask learners to take a cluster sample using the columns of elements as clusters. There are
18 clusters in total, and the goal is to take a random sample of 4 clusters. The
sample itself is not difficult to obtain, but learners need to understand why the columns are the
clusters and not the rows. The main reason is that the atomic weights within a column vary
widely, so a column is more representative of all elements, whereas the elements in a row
have very similar weights.
Tell learners to take a random sample of 4 of the 18 clusters. Finding this sample should now be
very easy for learners. Suppose that the 4 clusters randomly chosen from 1 to 18 are cluster
numbers 11, 5, 8, and 9. Table 3-06.3 then includes all the elements that are in these 4
clusters.
Table 3-06.3. Elements in the cluster sample of 4 groups
23 26 27 29 41 44 45 47
58 61 62 64 73 76 77 79
90 93 94 96 105 108 109 111
With the elements in Table 3-06.3 as the sample of 24 elements, the sample mean for the cluster
sample is 167.76 grams per mole.
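The random choice of clusters can be sketched as follows. The seed value is a hypothetical choice used only to make the draw repeatable, so the clusters printed will generally differ from the 11, 5, 8, and 9 used in the text. The list of atomic numbers is copied from Table 3-06.3.

```python
import random

# Columns (groups) of the periodic table serve as the clusters, numbered 1 to 18.
cluster_ids = list(range(1, 19))

rng = random.Random(2016)  # hypothetical seed, only to make the draw repeatable
chosen_clusters = sorted(rng.sample(cluster_ids, 4))  # 4 clusters, no repeats
print("clusters selected:", chosen_clusters)

# With the draw used in the text (clusters 11, 5, 8, and 9), the sample is the
# set of atomic numbers listed in Table 3-06.3.
table_3_06_3 = [
    23, 26, 27, 29, 41, 44, 45, 47,
    58, 61, 62, 64, 73, 76, 77, 79,
    90, 93, 94, 96, 105, 108, 109, 111,
]
print("elements in the text's cluster sample:", len(table_3_06_3))
```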
C. Further Analysis and Interpretation
Now that learners have a thorough understanding of how to take a simple random sample
(SRS) and a systematic sample, have them continue further with the activity. Ask learners which
of the two sampling methods, SRS and systematic sampling, they think will produce the least
variable mean atomic weight estimate after repeated sampling. It seems reasonable that most
learners will suspect the SRS to have the least variable estimate, but actually, since there are
only 5 possible systematic samples, repeated systematic sampling should produce the
more precise estimate. With the elements roughly ordered by atomic weight, each systematic
sample spreads across the whole range of weights and should be more representative of the
population of elements. The learners don’t have
to be correct in their response to this question because they will learn the answer at the
end of this activity.
Tell learners to redo the process of generating a simple random and systematic sample, but let
them do this individually. Then, ask them to determine the mean atomic weight of their
samples and to record their means on the board under the appropriate heading. Then, divide
the board into two sections such that learners can compile a list of means needed for the
sampling distribution portion of the activity. Table 3-06.4 below contains example data from
a set of, say, 30 learners.
Results and Interpretation
The learners should be able to see right away that the standard deviation for the simple
random samples is quite large compared to that for the systematic samples. Also, the average mean
atomic weights for both types of samples are nearly equal at 141.50 and 141.30. Therefore,
learners should conclude that the systematic sample produces a mean atomic weight estimate that is
more precise, that is, less variable from sample to sample. In the periodic table, the elements are ordered according to the number of
protons and this is directly related to the atomic weight of the elements. So for the most part,
the elements are ordered according to the atomic weight, which was the measure of interest.
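The comparison can also be rehearsed on a computer before class. The sketch below is not the class data: it assumes a hypothetical ordered population of 114 steadily increasing values as a stand-in for the atomic weights, and the function names are illustrative. Replacing `population` with the 114 weights from the activity sheet would reproduce the activity itself.

```python
import random
import statistics

# Hypothetical stand-in for the 114 atomic weights: an ordered, steadily
# increasing list. Replace it with the actual weights from the activity sheet.
population = [2.5 * i for i in range(1, 115)]

def srs_mean(values, n, rng):
    """Mean of a simple random sample of n values drawn without replacement."""
    return statistics.mean(rng.sample(values, n))

def systematic_mean(values, k, rng):
    """Mean of a 1-in-k systematic sample with a random start among the first k."""
    start = rng.randrange(k)
    return statistics.mean(values[start::k])

rng = random.Random(1)  # fixed seed so the run is repeatable
srs_means = [srs_mean(population, 25, rng) for _ in range(30)]   # 30 "learners"
sys_means = [systematic_mean(population, 5, rng) for _ in range(30)]

print("SRS:        mean of means =", round(statistics.mean(srs_means), 2),
      " sd =", round(statistics.stdev(srs_means), 2))
print("Systematic: mean of means =", round(statistics.mean(sys_means), 2),
      " sd =", round(statistics.stdev(sys_means), 2))
```

Because the stand-in population is ordered, each of the 5 possible systematic samples covers the whole range of values, so their means differ only slightly, while the simple random sample means vary much more.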
D. Enrichment: Possible Extensions
1. Carry out in class a cluster sample and stratified random sample to compare the variability
and precision of the mean atomic weight after repeated sampling.
2. Demonstrate the Central Limit Theorem by taking various sized samples of the simple
random sample. Repeatedly sample 5, 10, and 50 elements and compare the sampling
distributions of the mean atomic weights.
3. Begin with the 114 elements and calculate the sample size needed to reach a specified
margin of error for the four sample methods.
4. You can use other data sets as the source of data for sampling: data from the batch of
learners (weight, height, etc), BEIS (record of all the schools of DepEd), or medical records from
the school physician.
KEY POINTS
• Various random sampling methods may be employed in practice, such as simple random,
stratified, cluster, and systematic sampling.
• In systematic random sampling, the researcher first randomly picks the first item from the
population. Then, the researcher will select every kth subject from the list. The main
advantage of using systematic sampling over simple random sampling is its simplicity and
the assurance that the population will be evenly sampled. There exists a chance in simple
random sampling that allows a clustered selection of subjects. This is systematically
eliminated in systematic sampling.
• With stratified sampling, the population can be divided into groups (the strata) that are in
some meaningful way different from each other. The sample is chosen by having a simple
random sample chosen in each stratum.
• With cluster sampling, the population is divided into groups (the clusters) that are all
essentially the same as each other, but within the groups, their members are as diverse as
the population. Thus, the cluster sample is obtained by having a random sample of the
clusters (with all members in the cluster taken).
REFERENCES
Malloure, M. and Richardson, M. It’s Elemental! Sampling from the Periodic Table. Grand
Valley State University. STatistics Education Web (STEW). Retrieved from
https://www.amstat.org/education/stew/pdfs/ItsElemental!.docx
ACTIVITY SHEET NUMBER 3-06
The periodic table of chemical elements is a tabular method of displaying the chemical
elements. Although there were precursors to this table, its creation is generally credited to
Russian chemist Dmitri Mendeleev in 1869.
Mendeleev intended the table to illustrate recurring (“periodic”) trends in the properties of the
elements. The layout of the table has been refined and extended over time, especially with
new elements being discovered, and new theoretical models being developed to explain
chemical behavior.
The periodic table provides an extremely useful framework to classify, systematize, and
compare all the many different forms of chemical behavior. The table has also found wide
application in physics, biology, engineering, and industry. The current standard table contains
117 confirmed elements as of January 27, 2008 (while element 118 has been synthesized,
element 117 has not been). Ninety-two are found naturally on Earth, and the rest are synthetic
elements that have been produced artificially in particle accelerators.
The main value of the periodic table is the ability to predict the chemical properties of an
element based on its location on the table. It should be noted that the properties vary
differently when moving vertically along the columns of the table, than when moving
horizontally along the rows. The layout of the periodic table demonstrates recurring
(“periodic”) chemical properties. Elements are listed in order of increasing atomic number (i.e.
the number of protons in the atomic nucleus). Rows are arranged so that elements with similar
properties fall into the same vertical columns (groups). According to quantum mechanical
theories of electron configuration within atoms, each horizontal row (period) in the table
corresponds to the filling of a quantum shell of electrons. There are progressively longer
periods further down the table.
In printed tables, each element is usually listed with its element symbol and atomic number.
Many versions of the table also list the element’s atomic weight and other information. The
atomic weight is the average mass of the atoms of an element. It is a weighted average of
the naturally-occurring isotopes. For example, the atomic weight of Hydrogen is 1.00794
grams per mole.
Atomic Number  Atomic Weight  Element  Abbr.  Type  Period
1 1.01 Hydrogen H Gas 1
2 4.00 Helium He Gas 1
3 6.94 Lithium Li Solid 2
4 9.01 Beryllium Be Solid 2
5 10.81 Boron B Solid 2
6 12.01 Carbon C Solid 2
7 14.01 Nitrogen N Gas 2
8 16.00 Oxygen O Gas 2
9 19.00 Fluorine F Gas 2
10 20.18 Neon Ne Gas 2
11 22.99 Sodium Na Solid 3
12 24.30 Magnesium Mg Solid 3
13 26.98 Aluminum Al Solid 3
14 28.09 Silicon Si Solid 3
15 30.97 Phosphorus P Solid 3
16 32.06 Sulfur S Solid 3
17 35.45 Chlorine Cl Gas 3
18 39.95 Argon Ar Gas 3
19 39.10 Potassium K Solid 4
20 40.08 Calcium Ca Solid 4
21 44.96 Scandium Sc Solid 4
22 47.87 Titanium Ti Solid 4
23 50.94 Vanadium V Solid 4
24 52.00 Chromium Cr Solid 4
25 54.94 Manganese Mn Solid 4
26 55.84 Iron Fe Solid 4
27 58.93 Cobalt Co Solid 4
28 58.69 Nickel Ni Solid 4
29 63.55 Copper Cu Solid 4
30 65.41 Zinc Zn Solid 4
31 69.72 Gallium Ga Solid 4
32 72.64 Germanium Ge Solid 4
33 74.92 Arsenic As Solid 4
34 78.96 Selenium Se Solid 4
35 79.90 Bromine Br Liquid 4
36 83.80 Krypton Kr Gas 4
37 85.47 Rubidium Rb Solid 5
38 87.62 Strontium Sr Solid 5
39 88.91 Yttrium Y Solid 5
40 91.22 Zirconium Zr Solid 5
41 92.91 Niobium Nb Solid 5
42 95.94 Molybdenum Mo Solid 5
43 98.00 Technetium Tc Artificial 5
44 101.07 Ruthenium Ru Solid 5
45 102.91 Rhodium Rh Solid 5
46 106.42 Palladium Pd Solid 5
47 107.87 Silver Ag Solid 5
48 112.41 Cadmium Cd Solid 5
49 114.82 Indium In Solid 5
50 118.71 Tin Sn Solid 5
51 121.76 Antimony Sb Solid 5
52 127.60 Tellurium Te Solid 5
53 126.90 Iodine I Solid 5
54 131.29 Xenon Xe Gas 5
55 132.91 Cesium Cs Solid 6
56 137.33 Barium Ba Solid 6
57 138.91 Lanthanum La Solid 6
58 140.11 Cerium Ce Solid 6
59 140.91 Praseodymium Pr Solid 6
60 144.24 Neodymium Nd Solid 6
61 145.00 Promethium Pm Artificial 6
62 150.36 Samarium Sm Solid 6
63 151.96 Europium Eu Solid 6
64 157.25 Gadolinium Gd Solid 6
65 158.93 Terbium Tb Solid 6
66 162.50 Dysprosium Dy Solid 6
67 164.93 Holmium Ho Solid 6
68 167.26 Erbium Er Solid 6
69 168.93 Thulium Tm Solid 6
70 173.04 Ytterbium Yb Solid 6
71 174.97 Lutetium Lu Solid 6
72 178.49 Hafnium Hf Solid 6
73 180.95 Tantalum Ta Solid 6
74 183.84 Tungsten W Solid 6
75 186.21 Rhenium Re Solid 6
76 190.23 Osmium Os Solid 6
77 192.22 Iridium Ir Solid 6
78 195.08 Platinum Pt Solid 6
79 196.97 Gold Au Solid 6
80 200.59 Mercury Hg Liquid 6
81 204.38 Thallium Tl Solid 6
82 207.20 Lead Pb Solid 6
83 208.98 Bismuth Bi Solid 6
84 209.00 Polonium Po Solid 6
85 210.00 Astatine At Solid 6
86 222.00 Radon Rn Gas 6
87 223.00 Francium Fr Solid 7
88 226.00 Radium Ra Solid 7
89 227.00 Actinium Ac Solid 7
90 232.04 Thorium Th Solid 7
91 231.04 Protactinium Pa Solid 7
92 238.03 Uranium U Solid 7
93 237.00 Neptunium Np Artificial 7
94 244.00 Plutonium Pu Artificial 7
95 243.00 Americium Am Artificial 7
96 247.00 Curium Cm Artificial 7
97 247.00 Berkelium Bk Artificial 7
98 251.00 Californium Cf Artificial 7
99 252.00 Einsteinium Es Artificial 7
100 257.00 Fermium Fm Artificial 7
101 258.00 Mendelevium Md Artificial 7
102 259.00 Nobelium No Artificial 7
103 262.00 Lawrencium Lr Artificial 7
104 261.00 Rutherfordium Rf Artificial 7
105 262.00 Dubnium Db Artificial 7
106 266.00 Seaborgium Sg Artificial 7
107 264.00 Bohrium Bh Artificial 7
108 277.00 Hassium Hs Artificial 7
109 268.00 Meitnerium Mt Artificial 7
110 281.00 Ununnilium Uun Artificial 7
111 272.00 Unununium Uuu Artificial 7
112 285.00 Ununbium Uub Artificial 7
114 289.00 Ununquadium Uuq Artificial 7
116 292.00 Ununhexium Uuh Artificial 7
Part 1. Taking Samples
Instructions: Refer to the Periodic Table produced by the National Institute of Standards
and Technology (NIST) in 2003. This Periodic Table displays 114 elements, along with their
corresponding atomic numbers and atomic weights.
Notice that the atomic weight of the elements generally increases with the atomic
number of the elements. Thus, the first element listed, Hydrogen, with an atomic number
of 1, has the lowest atomic weight of 1.01 grams per mole and the last element listed,
Ununhexium, with an atomic number of 116, has the highest atomic weight of 292 grams per
mole.
In order to practice selecting different types of samples and to compare the performance of
different types of samples, we are going to consider our Population of interest to be all of the
elements shown on the NIST 2003 Periodic Table (thus, N = 114) and the variable of interest is
atomic weight. Let’s assume that we are interested in selecting samples from this population in
order to estimate the population mean atomic weight. The true mean atomic weight of the 114
elements on the NIST Periodic Table is µ = 141.09 grams per mole.
Strategy #1
Select a simple random sample of 25 elements. Sample without replacement.
What is the mean atomic weight for the 25 sampled elements? x̄ = _______________
Strategy #2
Select a 1-in-5 systematic sample of elements.
What is the mean atomic weight for the sampled elements? x̄ = _______________
Strategy #3
To select a stratified random sample of 25 elements, without replacement, divide the table
into 4 strata: Solid, Liquid, Gas, and Artificial. Note that there are 77 Solids, 2 Liquids,
11 Gases, and 24 Artificial elements. We would then sample 17 Solids, 1 Liquid, 2 Gases, and 5
Artificial elements.
Briefly explain why it makes sense to sample 17 of the Solids.
Strategy #4
Select a cluster sample of elements. Use the columns of elements as the clusters. Thus,
there are 18 clusters. Randomly select 4 clusters.
Briefly explain why it makes sense to use the columns as clusters and not the rows.
What is the mean atomic weight for the sampled elements? x̄ = _______________
Part 2. Comparison of Sampling Strategies
We want to use class data to determine if repeated simple random sampling of 25
elements will result in sample mean atomic weights that are less variable than the sample mean
weights resulting from repeated 1-in-5 systematic sampling of elements.
1. Do you think that repeated simple random sampling of elements will likely produce
less variable sample mean atomic weights than repeated 1-in-5 systematic sampling of
elements? Why? Or, why not?
2. Write your sample mean atomic weight on the white board in the column labeled
“Sample Means from Simple Random Samples.”
3. Write your sample mean atomic weight on the white board in the column labeled
“Sample Means from Systematic Samples.”
4. Record the class sample means for each of the sampling techniques below.
Simple Random Sampling:
Systematic Sampling:
Create stem plots and calculate descriptive statistics for the class sample means.
Simple Random Samples              Systematic Samples

Simple Random Sampling             Systematic Random Sampling
mean = __________                  mean = __________
standard deviation = __________    standard deviation = __________
first quartile = __________        first quartile = __________
median = __________                median = __________
third quartile = __________        third quartile = __________
Based upon the above calculations, do you think that repeated simple random sampling of
elements from the Periodic Table would most likely produce a more accurate estimate of the
population mean atomic weight than would a repeated 1-in-5 systematic sampling of
elements? Why? Or, why not?
ASSESSMENT
1. Identify the type of sampling method used in the following 4 scenarios. The possible
response options are:
A. Systematic Sample
B. Cluster Sample
C. Simple Random Sample
D. Stratified Random Sample.
Scenario 1: In a factory that produces television sets, every 100th set produced is inspected.
Answer: A
Scenario 2: A class of 200 learners is numbered from 1 to 200, and a table of random digits is
used to choose 60 learners from the class.
Answer: C
Scenario 3: A class of 200 learners is seated in 10 rows of 20 learners per row. Three learners
are randomly selected from every row.
Answer: D
Scenario 4: An airline company randomly chooses one flight from a list of all international
flights taking place that day. All passengers on that selected flight are asked to fill out a survey
on meal satisfaction.
Answer: B
2. Suppose a state has 10 universities, 25 four-year colleges, and 50 community colleges, each
of which offers multiple sections of an Introductory Statistics course each year. Researchers
want to conduct a survey of learners taking Introductory Statistics in the state. Explain a
method for collecting each of the following types of samples:
A. Stratified Random Sample
B. Cluster Sample
C. Simple Random Sample
Answer:
First, compile a list of all the Introductory Statistics courses taught in the state at each type of
learning institution.
(A) Randomly sample a representative proportion of Introductory Statistics courses from each
of the 3 strata: universities, four-year colleges, and community colleges.
(B) Randomly sample one of the 3 types of learning institutions and then take a census of all
introductory courses within that type of institution.
(C) Simply take a random sample of n introductory courses from the list of offerings without
considering the type of learning institution.
CHAPTER 4: ON ESTIMATION OF
PARAMETERS
Lesson 1: Concepts of Point and Interval Estimation
OVERVIEW OF LESSON: This chapter further builds on the discussions in the
previous chapter on sampling and sampling distribution to illustrate one of the basic
purposes of statistics: inference. Learners are given descriptions of basic concepts of
estimation, both point estimation and interval estimation.
LEARNING COMPETENCIES: At the end of the lesson, the learner should be
able to:
a. Classify a decision process as one that makes use of inferential statistics, or
not.
b. Illustrate point and interval estimation.
c. Differentiate point from interval estimation.
PRE-REQUISITE KNOWLEDGE AND SKILLS: Knowledge in sampling and
sampling distribution (discussed in Chapter 3)
LESSON OUTLINE:
a. Motivation: Concept of Inferential Statistics
b. Preliminary Lesson: Development of the Concept of Estimation
c. Differentiate point from interval estimation
A. Motivation: Concept of Inferential Statistics
At this point, learners should have already imbibed the idea that in reality they do
not have the whole population to work on. Hence, they should make use of a
representative subset of the population which they referred to in previous lessons as
a random sample. From this random sample, they will generate statistics they will
use to make inferences about the population and/or its parameters. This process is
referred to as inferential statistics and is illustrated below.
Note that the sample taken from the population must be a random sample obtained
using one of the sampling techniques discussed in the previous chapter. Likewise,
the inferences that they will make are subject to uncertainty which means that they
are not 100% sure of the inferences or conclusions they’ll make about the
population or its parameters, based on the statistics generated from the random
sample. In other words, there is a chance or likelihood that they will make a wrong
inference, and they will try to measure this likelihood so that they can minimize
it. This is the reason why probability was discussed in an earlier chapter.
B. Preliminary Lesson: Development of the Concept of Estimation
In making inferences about the population, learners can either provide a value or
values for the parameter or evaluate a statement about the parameter. This chapter
will focus on the former, which is generally referred to as estimation. For this
lesson, we will discuss two approaches to estimation, namely point and interval estimation,
and differentiate one from the other.
As a motivational activity, ask learners to write on 1/4 sheet of paper the following:
1. His/her “best” guess of your age by giving a single number
2. The same as in Number 1, but this time he/she should give a range of values
wherein your age would most likely fall
3. His/her confidence, rated from 0% (not confident) to 100%
(very confident), in his/her educated guess of the range of values in Number 2
Then, collect all the papers.
C. Main Lesson: Differentiate point from interval estimation
Discuss the results of this activity with learners, emphasizing the following points:
• There is no right or wrong numerical value for the given answers. However,
there might be misconceptions or misunderstanding of the concepts when
they provided their answers.
• On the first item, learners should give one logical number between 21 and 65
(inclusive). The student should have given only one number since you asked
them to give a point estimate. A point estimate is a numerical value and it
identifies a location or a position in the distribution of possible values. The
student’s guess of your age should be between 21 and 65 for it to be logical
since one usually starts working or teaching at the age of 21 and retires at the
age of 65 (compulsory retirement age).
• On the second item, tell learners they should have given a logical range of
values or set of values with lower and upper limits. The logical lower limit
should be at least 21 while the upper limit should be at most 65. The reasons
behind these numbers are similar to what were stated earlier. What they have
given is their interval estimate of your age. An interval estimate is a range
of values where most likely the true value will fall.
• As stated in item 3, the percentage should be within 0% to 100%. This
measure of confidence in the interval estimate is referred to as confidence
coefficient. When combined with their estimate in item 2, such is now
referred to as confidence interval estimate. Hence, a confidence interval
estimate is a range of values where one has a certain percentage of
confidence that the true value will likely fall in.
• In point estimation, since you are giving only a single value, there are only
two possibilities, that is, the estimate is either wrong or correct. Also, there is
no measure of how confident one is in his/her estimate. On the other hand,
the interval estimate gives more than one possible value as an estimate. In
addition, the confidence coefficient provides a measure of confidence in the
estimate.
Before the session ends, tell your learners your true age. Take note, too, of
how many learners gave correct point estimates and how many gave confidence
interval estimates that included your age.
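The tally described above can be done quickly once the papers are collected. The sketch below uses made-up responses and an assumed true age of 35 purely for illustration; each response holds the point estimate, the lower and upper limits of the interval estimate, and the stated confidence.

```python
# Hypothetical class responses:
# (point estimate, lower limit, upper limit, confidence in percent).
true_age = 35  # assumed value for illustration only
responses = [
    (30, 28, 34, 80),
    (35, 30, 40, 90),
    (42, 38, 45, 70),
    (33, 25, 45, 95),
]

# A point estimate is either exactly right or wrong.
correct_points = sum(1 for point, _, _, _ in responses if point == true_age)

# An interval estimate "hits" when the true value lies between its limits.
covering_intervals = sum(1 for _, low, high, _ in responses
                         if low <= true_age <= high)

print("correct point estimates:", correct_points, "of", len(responses))
print("intervals containing the true age:", covering_intervals, "of", len(responses))
```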
SEATWORK
The seatwork can be something similar to the motivational activity but be sure you
know the true value so you would have a basis for logical answers. Some interesting
parameters to estimate are as follows:
1. Age of a celebrity such as Sharon Cuneta
2. Income or salary of your father or mother
3. Length (in kilometres) of EDSA (in Metro Manila or some local national
highway)
4. Height of the tallest building in your city or municipality
5. Number of text messages per day sent by the principal of the school
TEACHER TIPS
• Keep the submitted papers as materials for future lessons.
• Ask learners to reflect on the difference between a “fortune teller” and one
who provides a confidence interval estimate.
KEY POINTS
• A point estimate is a numerical value and it identifies a location or a position
in the distribution of possible values.
• A confidence interval estimate is a range of values where one has a certain
percentage of confidence that the true value will likely fall in.
REFERENCES
Workbooks in Statistics 1: 11th Edition. Institute of Statistics, UP Los Baños, College,
Laguna 4031
ASSESSMENT
For each of the following situations, ask the student whether inferential statistics is
applicable or not:
1. The government would like to know the per capita rice consumption per day
of Filipinos.
Answer: Inferential statistics is applicable, as you cannot take the daily rice
consumption of every Filipino; the estimate is instead based on a probability sample of
households.
2. The effectiveness of a newly developed cure for cancer
Answer: Inferential statistics is NOT applicable as medical research makes
use of volunteers and not a random sample of cancer patients to test the
effectiveness of a newly developed cure for the disease.
3. A presidential candidate decides to take a survey through text messaging to
determine the proportion of voters who are likely to vote for him/her.
Answer: Inferential statistics cannot be used here since it is likely that not all
voters have cell phones, and even if everyone has one, it is important that the
sample represents the target population, which it does not here, partly also
due to non-responses.
4. A farmer wants to estimate the number of pigs he has in the pig pen. He
decides to capture 20 pigs, puts a red mark on the captured pigs, and then
lets them loose. After a day or so, the farmer decides to capture another
set of pigs, say, 10 of them, and notices that only one of them has a red
mark, and so he estimates that he has 20/(0.1) = 200 pigs.
Answer: Inferential statistics is applicable here, assuming that the 20 originally captured
pigs had roamed around across the population of pigs, and that the
recaptured 10 pigs form a random sample, so that 1/10 = 0.1 = 20/N and
thus N = 200.
5. An auditor of a government office wants to assess what proportion of
experiment records were done correctly. Instead of going over 10,000
records, she decides to sample the first 100 records on her desk, and notices
that 97 of them were done well. She concludes that 97% of all the records
given to her were done well.
Answer: If the records on the desk were put in random order, then this is a
valid application of inferential statistics. However, if the records are not in
random order, then the sample is not a random sample, and there will likely
be some bias in the estimate.
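The farmer's reasoning in item 4 can be written as a small function. This is a sketch under the same assumptions stated in the answer (the marked pigs mix freely and the second catch is a random sample); the function and parameter names are illustrative.

```python
def capture_recapture_estimate(marked, recaptured, marked_in_recapture):
    """Estimate population size N from marked/N = marked_in_recapture/recaptured."""
    if marked_in_recapture == 0:
        raise ValueError("no marked animals recaptured; N cannot be estimated")
    return marked * recaptured / marked_in_recapture

# 20 pigs marked, 10 pigs caught the second time, 1 of them marked.
estimate = capture_recapture_estimate(marked=20, recaptured=10, marked_in_recapture=1)
print(estimate)  # 200.0
```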
Lesson 2: Point Estimation of the Population Mean
• Identify possible point estimators of the population mean
• Discuss characteristics of a “good” estimator
• Appraise why the sample mean is the “best” estimator of the population mean
• Compute a point estimate of the population mean
PRE-REQUISITE KNOWLEDGE AND SKILLS

Knowledge in basic concepts in estimation (Lesson 4-01) as well as the sampling
distribution of the sample mean (Chapter 3)
LESSON OUTLINE
A. Possible point estimators of the population mean
B. Properties of a “good” estimator
C. The sample mean as the best linear unbiased estimator (BLUE) of the population
mean
D. Illustration on the computation of a point estimate of the population mean
DEVELOPMENT OF THE LESSON
At the start, review the difference between a parameter and a statistic.
A parameter is a characteristic of the population which is usually unknown and needs to
be estimated.
On the other hand, a statistic is computed from a random sample and hence, it is known and is used to
estimate the unknown parameter.
Recall that there are two types of estimation: point and interval estimation. In estimating a
parameter, the mathematical expression or formula you use in coming up with the
estimate is referred to as the estimator while the estimate is a numerical value that you
arrive at when you apply the estimator using the sample data.
As a motivational activity, ask learners to form groups of five. Each group will be given
a sample data set on weights of 10 learners that you generated beforehand. Each group
must have a different sample data set. Hence, the number of sample data sets corresponds
to the number of groups of five learners. Ask each group to discuss how they are going to
use the sample data to estimate the average weight of all learners in the class and
implement the process to get an estimate. This activity should be done in 10 minutes.
After the group activity, ask the leader of each group to report on the
mathematical expression or formula they used as well as the resulting estimate. It is
recommended that you list their reported estimators and estimates on the board.
For the discussion of the results of the activity, consider the following:
• Use what was reported and listed on the board to illustrate that there could be several
estimates for a parameter. It is possible that the learners will give only one
estimator, most commonly the sample mean or the average. If this is
the case, you could supply other possible estimators.
There could be several estimators for a parameter. For a population mean, usually
represented by the Greek letter µ, the following are possible estimators that make use
of sample data obtained using a simple random sampling scheme.
Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample Median: $\tilde{x} = x_{\left(\frac{n+1}{2}\right)}$ if $n$ is odd, and $\tilde{x} = \frac{1}{2}\left[x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right]$ if $n$ is even,
where $x_{(i)}$ is the ith observation in an array, that is, when
the observations are arranged in increasing or
decreasing order.
Sample Mode: the value(s) with the highest frequency
• With several estimators, we must choose and use the “best” estimator but how do we
choose the “best”? An estimator could be evaluated based on the two statistical
properties: accuracy and precision, which are both measures of closeness. Accuracy is
a measure of closeness of the estimates to the true value while precision is a measure
of closeness of the estimates to each other.
To illustrate, take the bull’s eye in a dart board as the parameter and the ‘hits’ made on
the board as the estimates. There could be a “hit” that is near the bull’s eye or an
estimate that is near the parameter. On the other hand, there could be a “hit” that is far
from the bull’s eye or an estimate that is far from the parameter. As shown in the
following figure, Estimate No. 1 is far from the parameter value while Estimate No. 3 is
near the parameter value.
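[Figure: dart board with the parameter µ at the bull's eye and the hits labeled Estimate No. 1, Estimate No. 2, and Estimate No. 3]
[Figure: second dart board with the parameter µ at the bull's eye]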
• Illustration of the Computation: Consider the following observed weights (in
kilograms) of a random sample of 20 learners and use it to estimate the true value of
the average weight of learners enrolled in the class.
40 45 46 48 48 50 55 55 56 58
58 59 60 60 62 62 64 64 65 66
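The sample mean is computed as:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{40 + 45 + \cdots + 66}{20} = \frac{1121}{20} = 56.05$ kg
Thus, we say the average weight of all learners in the class is estimated to be around
56.05 kg based on a simple random sample of 20 observations.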
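The same computation can be reproduced with Python's standard `statistics` module, which also gives the other two estimators mentioned earlier in the lesson. This is a minimal sketch; `multimode` requires Python 3.8 or later.

```python
import statistics

# Observed weights (kg) of the random sample of 20 learners.
weights = [40, 45, 46, 48, 48, 50, 55, 55, 56, 58,
           58, 59, 60, 60, 62, 62, 64, 64, 65, 66]

sample_mean = statistics.mean(weights)        # sum of the values divided by n
sample_median = statistics.median(weights)    # middle of the ordered values
sample_modes = statistics.multimode(weights)  # value(s) with the highest frequency

print("sum =", sum(weights), " n =", len(weights))
print("sample mean   =", sample_mean)
print("sample median =", sample_median)
print("sample modes  =", sample_modes)
```

Here the three estimators give different point estimates from the same data, which is why a criterion for choosing among them is needed.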
ENRICHMENT
For problems described in Numbers 1, 3, and 4, identify or formulate in words the
hypothesis of interest to attain the objective of the problem.
TEACHER TIPS
• Use the same numerical example and assessment problems for future lessons.
KEY POINTS
• When estimating a parameter (such as a population mean), there are various
possible estimators to use (including the sample mean, sample median, and sample
mode).
• What makes an estimator a good estimator? An estimator should have both
accuracy and precision.
o Accuracy is a measure of closeness of the estimates to the true value
o Precision is a measure of closeness of the estimates to each other
• We also prefer the estimator to be unbiased.
o Bias is the difference between the average value of the estimator (i.e. the
expected value of the sampling distribution) and the value of the population
parameter.
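Accuracy, precision, and bias can be illustrated with a small simulation. The sketch below assumes a hypothetical normal population with mean 50 and standard deviation 10 and repeatedly draws samples of size 20; these settings and the seed are illustrative choices, not values from the lesson.

```python
import random
import statistics

rng = random.Random(4)                      # fixed seed for a repeatable run
mu, sigma, n, reps = 50.0, 10.0, 20, 2000   # hypothetical population and design

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

for name, estimates in (("sample mean", means), ("sample median", medians)):
    bias = statistics.mean(estimates) - mu   # accuracy: average estimate minus true value
    spread = statistics.stdev(estimates)     # precision: closeness of estimates to each other
    print(f"{name:<14} bias = {bias:+.3f}  sd = {spread:.3f}")
```

Under these assumptions both estimators show a bias close to zero, but the sample mean has the smaller standard deviation, that is, the better precision.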
Answer: pesos.
The estimated true mean expenditure of PU learners for alcoholic beverages is Php
315.50 with precision (measured by standard error of the estimate) equal to
pesos. If the head of the university uses this estimate to check on his
observation, he could say there is a decline in the alcoholic beverage expenditures of the
learners. However, the risk of committing an error on this decision could not be
measured with a point estimate only. There is a need to do interval estimation or
hypothesis test procedure.
4. A local government official observes an increase in the number of individuals with
cardiovascular and obesity problems in his barangay. In order to improve the health
conditions of his constituents, he aims to promote an easy and cheap way to reduce
individuals’ weight. It is known that obesity results in a greater risk of having illnesses
like diabetes and heart problems. He encouraged his constituents to participate in his
project every weekend for 3 months. To know if the program is effective
in reducing their weight, he randomly selected 12 participants from the group who
completed the program. The weight loss data of the 12 randomly selected participants,
in kilograms, after completing the program are: 0.5, 0.7, 0.9, 1.1, 1.2, 1.3, 1.4, 2.0, 2.3,
2.4, 2.7, and 3.0. It is known that the weight loss of those who have completed the
dance program follows a normal distribution with variance of 3.24 kg². Provide a point
estimate of the average weight loss of the participants who have completed the dance
program. Give also a measure of the estimate’s precision.
Answer: $\bar{x} = \frac{19.5}{12} = 1.625$ kg.
The estimated true mean weight loss of the program participants is 1.625 kg with
precision (measured by standard error of the estimate) equal to $\frac{\sigma}{\sqrt{n}} = \frac{1.8}{\sqrt{12}} \approx 0.52$ kg.
If the local government official uses this estimate to know whether his program is
effective or not, he could say that the average weight loss of the participants after the
program is greater than zero. However, the risk of committing an error on this decision
could not be measured with a point estimate only. There is a need to do interval
estimation or hypothesis test procedure.
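The point estimate and its standard error for this problem can be verified as follows. This is a minimal sketch that uses the data and the known population variance given in the problem; the variable names are illustrative.

```python
import math
import statistics

# Weight loss (kg) of the 12 randomly selected participants.
weight_loss = [0.5, 0.7, 0.9, 1.1, 1.2, 1.3, 1.4, 2.0, 2.3, 2.4, 2.7, 3.0]
population_variance = 3.24  # kg^2, stated as known in the problem

point_estimate = statistics.mean(weight_loss)  # sample mean
# Standard error of the sample mean: sigma divided by the square root of n.
standard_error = math.sqrt(population_variance) / math.sqrt(len(weight_loss))

print(f"point estimate = {point_estimate:.3f} kg")
print(f"standard error = {standard_error:.3f} kg")
```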
Lesson 3: Confidence Interval Estimation of the Population
Mean (Part 1)
OVERVIEW OF LESSON: In this lesson, learners learn how to construct interval

estimates for the population mean when the variance $\sigma^2$ of the parent distribution is known.
They are also provided examples on how to determine minimum sample size requirements
for estimating the population mean given information on the variability of the parent
distribution.
LEARNING COMPETENCIES

• Assess accuracy of confidence interval estimates through its width
• Interpret confidence interval estimates
• Construct a (1-α)100% confidence interval estimator of the population mean when
the population variance is known
• Determine the required sample size in estimating the population mean under the
simple random sampling scheme
PRE-REQUISITE KNOWLEDGE AND SKILLS: Knowledge in point estimation as

well as sampling distribution of the sample mean
LESSON OUTLINE
A. Accuracy of the confidence interval estimates through its width.
B. Construction and interpretation of a (1-α)100% confidence interval estimator of
the population mean when the population variance is known.
C. Computation of the interval estimate of the population mean when the
population variance is known and its interpretation
D. Sample size determination under simple random sampling scheme in estimating
the population mean
E. Computation of the required sample size under simple random sampling
scheme in estimating the population mean and its interpretation
• The width of a confidence interval estimate represents the accuracy of the estimate. In
general, a confidence interval estimator is constructed as
Point Estimator ± (Tabular Value × Standard Error of the Estimator)
• In particular, for the population mean, the point estimator is the sample mean while the
standard error of the sample mean will be used in the computation. With a known
population variance ($\sigma^2$) and sample size (n), the standard error of the sample mean is
computed as the ratio of the standard deviation (square root of the variance) to the
square root of the sample size or, mathematically, $\sigma/\sqrt{n}$.
• Also, since the population variance is known, the sample mean, once standardized, will
follow the standard normal distribution or the Z distribution. This would mean
that the tabular value would come from the Z-distribution table. Usually, we use the
notation $Z_{\alpha/2}$ for the tabular value in the Z-distribution table whose area to its right is equal
to α/2.
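• Thus, a (1-α)100% confidence interval (CI) of the population mean (µ) when the population
variance ($\sigma^2$) is known is constructed as
$\left(\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
or
$\bar{x} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
where $\bar{x}$ is the sample mean computed from a simple random sample of size n. The
lower limit of the interval is $\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ while the upper limit is $\bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.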
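• The width of the interval estimate is the difference between the upper limit and
the lower limit of the interval estimate. Expressing it mathematically, we have:
$\left(\bar{x} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) - \left(\bar{x} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
This would lead to $2Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, where $Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ is usually referred to as the maximum
allowable deviation, denoted by D.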
• The maximum allowable deviation is a function of three factors: (1) population standard
deviation, σ; (2) sample size, n; and (3) confidence coefficient (1-α)100% through the tabular
value $Z_{\alpha/2}$. Take note of the following relationships between each of these three factors
and the confidence interval estimator, holding other factors constant:
1. The larger the variability of the population from which the simple random sample
was drawn, that is, the larger the value of σ, the larger the maximum allowable deviation
and, consequently, the wider the confidence interval estimate.
2. A bigger sample size will lead to a smaller maximum allowable deviation and a narrower
confidence interval estimate.
3. A higher confidence coefficient (1-α)100% means a lower value of α and thus a higher
tabular value $Z_{\alpha/2}$, which leads to a larger maximum allowable deviation and,
consequently, a wider confidence interval estimate.
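The three relationships above can be illustrated numerically. The sketch below is a minimal Python example; the function name is hypothetical, the baseline (σ = 9, n = 20, 95% confidence) matches the weights illustration in this lesson, and the three alternative settings are arbitrary values chosen only for comparison.

```python
from math import sqrt
from statistics import NormalDist


def max_allowable_deviation(sigma, n, confidence):
    """Return D = Z_(alpha/2) * sigma / sqrt(n) for a known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # value with area alpha/2 to its right
    return z * sigma / sqrt(n)


# Baseline: sigma = 9 kg, n = 20, 95% confidence.
base = max_allowable_deviation(sigma=9, n=20, confidence=0.95)
# Arbitrary variations, each changing one factor only.
more_variable = max_allowable_deviation(sigma=12, n=20, confidence=0.95)
bigger_sample = max_allowable_deviation(sigma=9, n=80, confidence=0.95)
more_confident = max_allowable_deviation(sigma=9, n=20, confidence=0.99)

for label, d in [
    ("baseline", base),
    ("larger sigma", more_variable),
    ("larger n", bigger_sample),
    ("higher confidence", more_confident),
]:
    print(f"{label:>18}: D = {d:.2f}, width = {2 * d:.2f}")
```

Because D is proportional to one over the square root of n, quadrupling the sample size (20 to 80) halves the width of the interval.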
The (1-α)100% confidence interval (CI) of the population mean (µ) can be interpreted as a
probability statement or a confidence statement. It is a probability statement when the
upper and lower limits are still considered random variables or they are not yet fixed.
Otherwise, it is considered a confidence statement. For example, one could say that the
probability that a 95% CI of the population mean will include the population mean in
the interval is equal to 0.95. Mathematically, this is expressed as:
$P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
Once we have computed or fixed the lower and upper limits, say the lower limit is 40
and the upper limit is 60, the 95% CI of the population mean becomes a confidence
statement. Thus, we say that we are 95% confident that the true mean value will be
between 40 and 60, and the probability that the true mean value will be between 40
and 60 is either one or zero. The probability is one if the true mean value is indeed
between 40 and 60, and if otherwise, the probability is zero. Note also that we could
interpret the 95% CI of the population mean, in terms of the number of interval
estimates out of all possible confidence interval estimates that will contain or include
the population mean. As stated earlier, out of all possible confidence intervals,
95% of them will contain or include the population mean.
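The "out of all possible confidence intervals" interpretation can be demonstrated by simulation. The sketch below is only an illustration under stated assumptions: it draws repeated samples from a normal population with a made-up true mean of 56, uses σ = 9 and n = 20 as in this lesson's example, and counts how often the 95% interval contains the true mean. The function name and the number of trials are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Share of simulated intervals x_bar +/- Z * sigma / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / trials


# mu = 56 is a hypothetical "true" mean used only for the simulation.
share = coverage(mu=56, sigma=9, n=20, confidence=0.95, trials=2000)
print(f"{share:.1%} of the simulated 95% intervals contain the true mean")
```

Each individual interval either contains the true mean or does not; the 95% describes the long-run share of intervals that do.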
Illustration of the Computation: Consider the numerical example used in point

estimation of the population mean where the following observed weights (in kilograms)
of a random sample of 20 learners were used.
40 45 46 48 48 50 55 55 56 58
58 59 60 60 62 62 64 64 65 66
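Assuming that the population standard deviation of the weights of all learners in the
class is 9 kg, the 95% confidence interval estimate of the true average weight of the
learners is
$\bar{x} \pm Z_{0.025}\frac{\sigma}{\sqrt{n}} = 56.05 \pm 1.96\left(\frac{9}{\sqrt{20}}\right) \approx 56.05 \pm 3.94$, or (52.11, 59.99) kg.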
Thus, we are 95% confident that the true average weight of all learners in the class is
between 52 kg and 60 kg (rounded off to the nearest integer).
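The interval for the weights example can be reproduced with a few lines of Python. This is a sketch that assumes the 20 observations listed above and the stated population standard deviation of 9 kg; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

# Observed weights (kg) of the 20 sampled learners.
weights = [40, 45, 46, 48, 48, 50, 55, 55, 56, 58,
           58, 59, 60, 60, 62, 62, 64, 64, 65, 66]
sigma = 9           # assumed known population standard deviation (kg)
confidence = 0.95

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # tabular value Z_(alpha/2)
x_bar = mean(weights)                               # point estimate
d = z * sigma / sqrt(len(weights))                  # maximum allowable deviation

lower, upper = x_bar - d, x_bar + d
print(f"sample mean = {x_bar:.2f} kg")
print(f"95% CI: ({lower:.2f}, {upper:.2f}) kg")
```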
Using the expression for the maximum allowable deviation, the required
sample size in estimating the population mean under simple random sampling scheme
is computed as $n = \left(\frac{Z_{\alpha/2}\,\sigma}{D}\right)^2$ (rounded up to the next higher integer). Hence,
greater variability of the population, a larger confidence coefficient, and a smaller maximum
allowable deviation require a larger sample size.
• Illustration of the Computation: Suppose we want to estimate the true average weight
of learners enrolled in a school using a sample to be drawn using simple random
sampling. How large should the sample be if we want the estimate to be within 2 kg
of the true value and we want to be 99% confident of our estimate? We could
assume that the population standard deviation of the weight is 9 kg.
$n = \left(\frac{Z_{0.005}\,\sigma}{D}\right)^2 = \left(\frac{2.58 \times 9}{2}\right)^2 \approx 134.8$
Thus, we need 135 learners in estimating the true average weight of learners enrolled in
this school under simple random sampling scheme with 99% confidence and a maximum
allowable deviation of 2 kg.
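The sample size computation can be sketched as a small Python function. The function name is hypothetical, and the tabular value is taken from the exact normal quantile instead of a rounded table entry, so the unrounded result may differ slightly from a hand computation while the rounded-up sample size is the same.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, max_deviation, confidence):
    """Smallest n with Z_(alpha/2) * sigma / sqrt(n) <= max_deviation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / max_deviation) ** 2)  # always round up


# Settings from the illustration: sigma = 9 kg, D = 2 kg, 99% confidence.
n = required_sample_size(sigma=9, max_deviation=2, confidence=0.99)
print(f"required sample size: {n}")
```

Rounding up, never down, guarantees that the maximum allowable deviation is not exceeded.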
TEACHER TIPS
• Use the same numerical example for future lessons.
ENRICHMENT
For an enrichment activity, ask learners to collect data to determine how far they can walk
blindfolded down the length of a hallway before they deviate from a straight walking path
and go “out of bounds.” This data will be used to test the theory that humans are not able
to walk in a straight line without being able to see landmarks. Learners can examine if the
data set they collected appear to be normally distributed and build a 95% confidence
interval to estimate the mean distance learners can walk before going out of bounds.
REFERENCES
Freedman, D., Pisani, R., and Purves, R. (2007). Statistics, Fourth Edition. New York: W. W.
Norton & Company.
Huey, M. Walk the Line. STatistics Education Web (STEW). Retrieved from
https://www.amstat.org/education/stew/pdfs/WalktheLine.docx
ASSESSMENT
I. Using some of the problems in Lesson 2 of this Chapter, ask learners to do the computational
exercises on the construction of the confidence interval of the population mean and determination of
the required sample size.
1. A company that manufactures electronic calculators uses a certain type of plastic. An alternative
plastic material is introduced in the market and the manager of the company is thinking of
shifting to this material. He will decide to shift if the mean breaking strength of the new material
is greater than 155. It is known that the breaking strengths of the new plastic material follow the
normal distribution and have a standard deviation of 10 psi (pounds per square inch). Six
samples of the new plastic materials were randomly selected and their breaking strengths were
determined. The data obtained were 156, 154, 168, 157, 160 and 158. Construct and interpret a
98% confidence interval for the true mean breaking strength of the new plastic material.
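Answer: With $\bar{x} = 953/6 \approx 158.83$ psi and its standard error equal to
$\sigma/\sqrt{n} = 10/\sqrt{6} \approx 4.08$ psi, the 98% confidence interval for the true mean breaking strength of the
new plastic material is
$158.83 \pm 2.33\left(\frac{10}{\sqrt{6}}\right) \approx 158.83 \pm 9.51$, or (149.3, 168.3) psi.
We say that we are 98% confident that the true mean breaking strength of the new plastic
material is between 149.3 and 168.3 psi.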
2. The head of the Philippine University (PU) observes a decline in the alcoholic expenditures of
learners from a monthly expenditure of Php 350 in the previous year. To check on this, he
randomly selected 10 PU learners who drink alcoholic beverages and asked the amount, in
pesos, that they usually spend on alcoholic beverages in a month. It is known that the usual
amount spent on alcoholic beverages by learners who drink alcoholic beverages follows the
normal distribution with standard deviation of Php 10. The data collected are: 400, 235, 200,
250, 200, 300, 500, 430, 420, and 220.
a. Construct and interpret a 95% confidence interval for the true mean amount spent by the
learners on alcoholic beverages.
Answer: With $\bar{x} = 3155/10 = 315.5$ pesos and its standard error
equal to $\sigma/\sqrt{n} = 10/\sqrt{10} \approx 3.162$ pesos, the 95% confidence interval for the true mean amount
spent by the learners on alcoholic beverages is expressed as
$315.5 \pm 1.96(3.162) \approx 315.5 \pm 6.20$, or (309.30, 321.70).
We say that we are 95% confident that the true mean amount spent by learners on alcoholic
beverages is between Php 309.30 and Php 321.70.
b. Find a 90% confidence interval for the true mean amount spent by the learners on alcoholic
beverages.
Answer: The 90% confidence interval for the true mean amount spent by the learners on
alcoholic beverages is expressed as
$315.5 \pm 1.645(3.162) \approx 315.5 \pm 5.20$, or (310.30, 320.70).
c. Calculate the widths of the confidence intervals in (a) and (b). How was the width of the
confidence interval affected by the decrease in the confidence coefficient holding the other
factors constant?
Answer: Holding other factors constant, as the confidence coefficient decreases, the width of the
interval becomes narrower.
d. How many learners must be sampled in order to be 99% confident that the estimated mean
amount spent on alcoholic beverages will be within Php 2.00 of the true mean?
3. A local government official observes an increase in the number of individuals with cardiovascular
and obesity problems in his barangay. In order to improve the health conditions of his
constituents, he aims to promote an easy and cheap way to reduce weight. It is known that
obesity results in greater risk of having illnesses like diabetes and heart problems. He
encouraged his constituents to participate in his Dance for Life project every weekend for 3
months. To know if the program is effective in reducing weight, he randomly selected 12
participants from the group who completed the program. The weight loss data, in kilograms, of
the 12 randomly selected participants after completing the program are: 0.5, 0.7, 0.9, 1.1, 1.2,
1.3, 1.4, 2.0, 2.3, 2.4, 2.7, and 3.0. It is known that the weight loss of those who have completed
the dance program follows a normal distribution with variance of 3.24 kg².
a. Construct and interpret a 90% confidence interval for the true mean weight loss of the
participants who have completed the dance program.
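Answer: With $\bar{x} = 1.625$ kg and its standard error equal to
$\sigma/\sqrt{n} = 1.8/\sqrt{12} \approx 0.5196$ kg, the 90% confidence interval for the true mean weight loss of the
participants who have completed the dance program is expressed as
$1.625 \pm 1.645\left(\frac{1.8}{\sqrt{12}}\right) \approx 1.625 \pm 0.8548$, or (0.7702, 2.4798).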
We say that we are 90% confident that the true mean weight loss of the participants who
have completed the dance program is between 0.7702 and 2.4798 kg.
b. How many participants must be sampled in order to be 95% confident that the estimated
mean weight loss of the participants will be within 0.5 kg of the true mean?
II. Provide the best choice in each item:
1. The width of a confidence interval estimate for a proportion will be

a) narrower for 99% confidence than for 95% confidence.
b) wider for a sample size of 100 than for a sample size of 50.
c) narrower for 90% confidence than for 95% confidence.
d) narrower when the sample proportion is 0.50 than when the sample proportion is
0.20.
ANSWER: C
2. A 99% confidence interval estimate can be interpreted to mean that

a) if all possible samples are taken and confidence interval estimates are developed,
99% of them would include the true population mean somewhere within their
interval.
b) we have 99% confidence that we have selected a sample whose interval does
include the population mean.
c) Both of the above.
d) None of the above.
ANSWER: C
3. When determining the sample size necessary for estimating the true population mean, which
factor is not considered when sampling with replacement?
a) The population size.
b) The population standard deviation.
c) The level of confidence desired in the estimate.
d) The allowable or tolerable sampling error.
ANSWER: A
4. Suppose a 95% confidence interval for µ turns out to be (1,000, 2,100). To make more
useful inferences from the data, it is desired to reduce the width of the confidence interval.
Which of the following will result in a reduced interval width?
a) Increase in the sample size.
b) Decrease in the confidence level.
c) Increase in the sample size and decrease in the confidence level.
d) Increase in the confidence level and decrease in the sample size.
ANSWER: C
5. In the construction of confidence intervals, if all other quantities are unchanged, an increase
in the sample size will lead to a ______ interval.
a) narrower
b) wider
c) less significant
d) biased
ANSWER: A
Chapter 4: Estimation of Parameters
Lesson 4: Confidence Interval Estimation of the Population

Mean (Part 2)
LESSON OVERVIEW: In this lesson, learners continue to learn about interval estimation
of the population mean discussed in the previous lesson, but this time under the
assumption that the parent distribution follows a normal curve, and that the population
variance $\sigma^2$ is unknown. The interval estimate makes use of percentiles of a Student’s t
distribution with n-1 degrees of freedom.

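LEARNING COMPETENCIES
• Construct a (1-α)100% confidence interval estimator of the population mean when
the population variance is unknown
• Use the Student’s t distribution table in getting a tabular value
• Construct a (1-α)100% confidence interval estimator of the population mean when
the population variance is unknown and sample size is large enough to invoke the
Central Limit Theorem
• Interpret confidence interval estimates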
PRE-REQUISITE KNOWLEDGE AND SKILLS: Knowledge in confidence interval

estimation of the population mean when the population variance is known
LESSON OUTLINE
A. Construction and interpretation of a (1-α)100% confidence interval estimator of the
population mean when the population variance is unknown
B. Use of the Student’s t distribution table in getting a tabular value
C. Construction and interpretation of a (1-α)100% confidence interval estimator of the
population mean when the population variance is unknown and sample size is large enough
to invoke the Central Limit Theorem
D. Illustration on the computation of an interval estimate of the population mean and its
interpretation
First, recall how to construct an interval estimator:
Point Estimator ± (Tabular Value × Standard Error of the Estimator)
In this expression, the tabular value depends on the sampling distribution of the sample
mean. You learned in the previous lesson that the tabular value to use in the mathematical
expression when the population variance is known is to be taken from the standard normal
distribution.
When the population variance is unknown, there is a slight change in the construction of the
confidence interval and the changes involve the tabular value and the standard error of the
sample mean.
A. Construction and interpretation of a (1-α)100% confidence interval estimator of the

population mean when the population variance is unknown
With an unknown population variance ($\sigma^2$), it has to be estimated using a simple random
sample of size n. A point estimator of the population variance is the sample variance,
denoted as $s^2$ and computed as $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$. The square root of the sample variance is the
sample standard deviation, denoted as s. This point estimate of the population standard
deviation is used in the computation of the standard error of the sample mean, which is
computed as the ratio of the sample standard deviation to the square root of the sample size
or, mathematically, $s/\sqrt{n}$.
B. Use of the Student’s t distribution table in getting a tabular value

The tabular value to use would come from the Student’s t-distribution table. Usually, we use
the notation $t_{(\alpha/2,\,n-1)}$ for the tabular value in the Student’s t-distribution with degrees of freedom
equal to n-1. Such a tabular value is also a point in the distribution whose area to its right is
equal to α/2.
• A Student’s t-distribution table (Please see attached table generated using MS Excel®)
provides the area or probability to the right of a given value ($t_0$). The illustration below
shows a part of the table. The first row of the table provides selected probabilities or
areas while the first column provides the degrees of freedom. The intersection of the
area and the degrees of freedom is the needed tabular value.
Selected probabilities (columns) by degrees of freedom, df (rows):
df 0.10 0.05 0.025 0.01 0.005
1 3.08 6.31 12.71 31.82 63.66
2 1.89 2.92 4.30 6.96 9.92
3 1.64 2.35 3.18 4.54 5.84
4 1.53 2.13 2.78 3.75 4.60
5 1.48 2.02 2.57 3.36 4.03
6 1.44 1.94 2.45 3.14 3.71
For example, the tabular value with an area of 0.025 to its right in a Student’s t distribution
with 3 df is $t_{(0.025,\,3)} = 3.18$.
[Figure: Student’s t distribution with 3 df, showing the area of 0.025 to the right of t(0.025,3) = 3.18]
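• Thus, a (1-α)100% confidence interval (CI) of the population mean (µ) when the population
variance ($\sigma^2$) is unknown is constructed as
$\left(\bar{x} - t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}},\ \bar{x} + t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}}\right)$
or
$\bar{x} \pm t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}}$
where $\bar{x}$ and s are the sample mean and sample standard deviation, respectively. Both
are computed using a simple random sample of size n. The lower limit of the interval is
$\bar{x} - t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}}$ while the upper limit is $\bar{x} + t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}}$.
• For this case, the width of the interval estimate is computed as:
$2\,t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}}$
and the maximum allowable deviation is $t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}}$.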
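The construction for the unknown-variance case can be sketched in Python as follows. The tabular values are copied from the 0.025 column of the Student's t Distribution Table attached to this lesson, the function name is hypothetical, and the data are the battery observations from this lesson's assessment.

```python
from math import sqrt
from statistics import mean, stdev

# 0.025 column of the lesson's Student's t table, keyed by degrees of freedom.
T_025 = {1: 12.71, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57, 6: 2.45,
         7: 2.36, 8: 2.31, 9: 2.26, 10: 2.23}


def t_interval_95(data):
    """95% CI for the mean with unknown variance: x_bar +/- t * s / sqrt(n)."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)              # sample standard deviation (divisor n - 1)
    t_value = T_025[n - 1]       # t(0.025, n - 1) read from the table
    d = t_value * s / sqrt(n)    # maximum allowable deviation
    return x_bar - d, x_bar + d


# Number of photos taken with each of the 10 sampled batteries (assessment data).
photos = [405, 564, 342, 456, 435, 543, 473, 452, 462, 475]
lower, upper = t_interval_95(photos)
print(f"95% CI: ({lower:.2f}, {upper:.2f}), width = {upper - lower:.2f}")
```

The dictionary covers only samples of up to 11 observations; larger samples would need more rows of the table.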
C. Construction and interpretation of a (1-α)100% confidence interval estimator of the

population mean when the population variance is unknown and sample size is large enough
to invoke the Central Limit Theorem
A property of the Student’s t distribution is that it approaches the standard normal
distribution as its degrees of freedom increase. Since the degrees of freedom that we are
concerned about at the moment depend on the sample size n, we can say that as n
increases, the Student’s t distribution approaches the standard normal distribution. This is
also consistent with the Central Limit Theorem, discussed in the previous chapter. With
these concepts, the tabular value to be used in the construction of the confidence interval
for the population mean when the sample size is at least 30 is to be taken from the
Z-distribution table. Thus, the following expression is to be used in constructing a (1-α)100%
confidence interval (CI) of the population mean (µ) when the population variance ($\sigma^2$) is
unknown and the sample size is at least 30:
$\left(\bar{x} - Z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + Z_{\alpha/2}\frac{s}{\sqrt{n}}\right)$
or
$\bar{x} \pm Z_{\alpha/2}\frac{s}{\sqrt{n}}$
• For this case, the width of the interval estimate is computed as:
$2Z_{\alpha/2}\frac{s}{\sqrt{n}}$
and the maximum allowable deviation is $Z_{\alpha/2}\frac{s}{\sqrt{n}}$.
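How quickly the Student's t tabular value approaches the Z value can be seen by comparing a few entries. The sketch below copies four values from the 0.025 column of the lesson's t table and compares them with the standard normal value computed in Python; the choice of degrees of freedom is arbitrary.

```python
from statistics import NormalDist

# Selected entries from the 0.025 column of the lesson's Student's t table.
t_025 = {5: 2.57, 10: 2.23, 20: 2.09, 30: 2.04}
z_025 = NormalDist().inv_cdf(0.975)  # Z value with area 0.025 to its right

gaps = [t_value - z_025 for t_value in t_025.values()]
for (df, t_value), gap in zip(t_025.items(), gaps):
    print(f"df = {df:>2}: t = {t_value:.2f}, z = {z_025:.2f}, difference = {gap:.2f}")
```

The difference shrinks as the degrees of freedom increase, which is why the Z table is used for samples of at least 30.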
D. Illustration of the Computation

Again, consider the numerical example used in point and interval estimation of the
population mean where the following observed weights (in kilograms) of a random sample
of 20 learners were used.
40 45 46 48 48 50 55 55 56 58
58 59 60 60 62 62 64 64 65 66
The sample mean is $\bar{x} = 56.05$ kg.
This time, you don’t have an assumed value of the population standard deviation of the
weights of all learners in the class. Because of this situation, there is a need to use a
point estimate of the population standard deviation. Using the same sample
observations given above, a point estimate of the population standard deviation is
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{1072.95}{19}} \approx 7.51$ kg.
With the sample mean and standard deviation, the 95% confidence interval estimate of
the true average weight of the learners is
$\bar{x} \pm t_{(0.025,\,19)}\frac{s}{\sqrt{n}} = 56.05 \pm 2.09\left(\frac{7.51}{\sqrt{20}}\right) \approx 56.05 \pm 3.51$, or (52.54, 59.56) kg.
Thus, we say that we are 95% confident that the true average weight of all learners in
the class is between 53 kg and 60 kg (rounded off to the nearest integer).
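The steps of this illustration can be traced one at a time in Python. The sketch below assumes the 20 weights listed above and the tabular value t(0.025, 19) = 2.09 read from the lesson's Student's t table; it computes the sample standard deviation from the sum of squared deviations so that each step is visible.

```python
from math import sqrt

# Observed weights (kg) of the 20 sampled learners.
weights = [40, 45, 46, 48, 48, 50, 55, 55, 56, 58,
           58, 59, 60, 60, 62, 62, 64, 64, 65, 66]

n = len(weights)
x_bar = sum(weights) / n                               # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in weights)    # sum of squared deviations
s = sqrt(sum_sq_dev / (n - 1))                         # sample standard deviation
std_error = s / sqrt(n)                                # standard error of the mean

t_value = 2.09  # t(0.025, 19) from the lesson's t table
lower = x_bar - t_value * std_error
upper = x_bar + t_value * std_error

print(f"mean = {x_bar:.2f} kg, s = {s:.2f} kg, standard error = {std_error:.2f} kg")
print(f"95% CI: ({lower:.2f}, {upper:.2f}) kg")
```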
TEACHER TIPS
ENRICHMENT
Plan an enrichment activity that involves learners measuring their foot sizes. The teacher
records the foot sizes of all learners in class in order to obtain the population mean foot size
of the entire class. The class is then divided into groups of 3 to 5 learners. Using a simple
random sample of 10 learners, the groups will estimate the average foot size of the entire
class. Numeric summaries (mean and five-number summary) and box plots can be used to
obtain point and interval estimates, respectively, for the mean foot size of the entire class.
The confidence level, or reliability, for the interval estimates computed by the learners is
estimated by obtaining the proportion of interval estimates that “trap” the population
average foot size of the entire class.
REFERENCES
Parks, S., Steinwachs, M., Diaz, R., and Molinaro, M. Did I Trap the Median? STatistics
Education Web (STEW). Retrieved from
https://www.amstat.org/education/stew/pdfs/DidITrapTheMedian.docx
Freedman, D., Pisani, R, and Purves, R. (2007). Statistics, Fourth Edition. New York: W. W.
Norton & Company.
ASSESSMENT
I. Using a problem in Lesson 2 of this Chapter, ask learners to do the computational
exercises on the construction of the confidence interval of the population mean when the
population variance is unknown.
1. The nickel-metal hydride (NiMH) battery is one of the most highly advertised rechargeable
batteries today. It is lighter and can last 2 to 4 times longer than alkaline or
standard nickel-cadmium (NiCd) batteries. To evaluate its performance, a random
sample of 10 NiMH batteries was taken. The number of photos taken using each battery
in a digital camera is given as follows: 405, 564, 342, 456, 435, 543, 473, 452, 462, and
475. Construct and interpret a 95% confidence interval for the true mean number of
photos taken using the NiMH battery.
Answer: With $\bar{x} = 460.7$ photos and its standard error equal to
$s/\sqrt{n} = 63.03/\sqrt{10} \approx 19.93$ photos, the 95% confidence interval for the true mean number of
photos taken using the NiMH battery is
$460.7 \pm 2.26(19.93) \approx 460.7 \pm 45.04$, or (415.66, 505.74).
We say that we are 95% confident that the true mean number of photos taken using the
NiMH battery is between 416 and 506 photos.
Further, we could have the following additional problems:

1. The Municipal Planning Officer of Los Baños wants to determine if the average wage of
labourers per hour in the municipality is below Php 320. A random sample of 40
labourers in the municipality yielded a mean of Php 300 per hour with a standard
deviation of Php 50 per hour. Using this information, construct a 99% confidence
interval estimate of the true average wage rate per hour of labourers in the Municipality
of Los Baños.
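Answer: With $\bar{x} = 300$ pesos and its standard error equal to $s/\sqrt{n} = 50/\sqrt{40} \approx 7.91$ pesos, the
99% confidence interval for the true average wage rate per hour of labourers in the
Municipality of Los Baños is
$300 \pm 2.58\left(\frac{50}{\sqrt{40}}\right) \approx 300 \pm 20.40$, or (279.60, 320.40).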
We say that we are 99% confident that the true average wage rate per hour of labourers
in the Municipality of Los Baños is between Php 280 and Php 320.
2. A machine produces metal pieces which are cylindrical in shape with a mean
diameter of 14.20 cm if the machine is in good condition. A quality engineer
evaluates the condition of the machine by using a random sample of 36 runs which
resulted in a mean diameter of 14.25 cm with standard deviation of 0.30 cm. Using this
information, construct a 95% confidence interval estimate of the true average diameter
of the cylindrical metal pieces produced by the machine.
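Answer: With $\bar{x} = 14.25$ cm and its standard error equal to $s/\sqrt{n} = 0.30/\sqrt{36} = 0.05$ cm, the 95%
confidence interval for the true average diameter of the cylindrical metal pieces
produced by the machine is
$14.25 \pm 1.96(0.05) = 14.25 \pm 0.098$, or (14.152, 14.348).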
We say that we are 95% confident that the average diameter of the cylindrical metal
pieces produced by the machine is between 14.152 and 14.348 cm.
II. Provide the Best Choice
1. Which of the following is not true about the Student’s t distribution?

a) It has more area in the tails and less in the center than does the normal
distribution.
b) It is used to construct confidence intervals for the population mean when the
population standard deviation is known.
c) It is bell-shaped and symmetrical.
d) As the number of degrees of freedom increases, the t distribution
approaches the normal distribution.
ANSWER: B
2. The t distribution
a) assumes the population is normally distributed.
b) approaches the normal distribution as the sample size increases.
c) has more area in the tails than does the normal distribution.
d) All of the above.
ANSWER: D
3. A major department store chain is interested in estimating the average amount its
credit card customers spent on their first visit to the chain’s new store in the mall.
Fifteen credit card accounts were randomly sampled and analyzed with the following
results:
$\bar{x}$ = Php 2525 and s = Php 1000. Assuming the distribution of the amount spent on
their first visit is approximately normal, what is the shape of the sampling distribution
of the sample mean that will be used to create the desired confidence interval for
µ?
a) Approximately normal with a mean of Php 2525
b) A standard normal distribution
c) A t distribution with 15 degrees of freedom
d) A t distribution with 14 degrees of freedom
ANSWER: D
4. A major department store chain is interested in estimating the average amount its
credit card customers spent on their first visit to the chain’s new store in the mall.
Fifteen credit card accounts were randomly sampled and analyzed with the following
results:
$\bar{x}$ = Php 2525 and s = Php 1000. Construct a 95% confidence interval for the
average amount its credit card customers spent on their first visit to the chain’s new
store in the mall.
a) 2525 pesos ± 454.5 pesos
b) 2525 pesos ± 506 pesos
c) 2525 pesos ± 550 pesos
d) 2525 pesos ± 554 pesos
ANSWER: D
5. As an aid to the establishment of personnel requirements, the director of a hospital

wishes to estimate the mean number of people who are admitted to the emergency
room during a 24-hour period. The director randomly selects 64 different 24-hour
periods and determines the number of admissions for each. For this sample,
$\bar{x} = 19.8$ and $s^2 = 25$. Which of the following assumptions is necessary in order for a
confidence interval to be valid?
a) The population sampled from has an approximate normal distribution.
b) The population sampled from has an approximate t distribution.
c) The mean of the sample equals the mean of the population.
d) None of these assumptions are necessary.
ANSWER: D
Student’s t Distribution Table
[Figure: Student’s t curve showing the probability (area) to the right of $t_0$]
selected probability or area to the right of a tabular value (α)

Df 0.10 0.05 0.025 0.01 0.005
1 3.08 6.31 12.71 31.82 63.66
2 1.89 2.92 4.30 6.96 9.92
3 1.64 2.35 3.18 4.54 5.84
4 1.53 2.13 2.78 3.75 4.60
5 1.48 2.02 2.57 3.36 4.03
6 1.44 1.94 2.45 3.14 3.71
7 1.41 1.89 2.36 3.00 3.50
8 1.40 1.86 2.31 2.90 3.36
9 1.38 1.83 2.26 2.82 3.25
10 1.37 1.81 2.23 2.76 3.17
11 1.36 1.80 2.20 2.72 3.11
12 1.36 1.78 2.18 2.68 3.05
13 1.35 1.77 2.16 2.65 3.01
14 1.35 1.76 2.14 2.62 2.98
15 1.34 1.75 2.13 2.60 2.95
16 1.34 1.75 2.12 2.58 2.92
17 1.33 1.74 2.11 2.57 2.90
18 1.33 1.73 2.10 2.55 2.88
19 1.33 1.73 2.09 2.54 2.86
20 1.33 1.72 2.09 2.53 2.85
21 1.32 1.72 2.08 2.52 2.83
22 1.32 1.72 2.07 2.51 2.82
23 1.32 1.71 2.07 2.50 2.81
24 1.32 1.71 2.06 2.49 2.80
25 1.32 1.71 2.06 2.49 2.79
26 1.31 1.71 2.06 2.48 2.78
27 1.31 1.70 2.05 2.47 2.77
28 1.31 1.70 2.05 2.47 2.76
29 1.31 1.70 2.05 2.46 2.76
30 1.31 1.70 2.04 2.46 2.75
∞ 1.28 1.65 1.96 2.33 2.58
Lesson 5: Point and Confidence Interval Estimation of the

Population Proportion
OVERVIEW OF LESSON: In this lesson, learners learn how to construct interval

estimates of the population proportion. They are also taught how to determine minimum
sample size requirements for estimating the population proportion.
LEARNING COMPETENCIES

• Identify a point estimator of the population proportion
• Discuss the properties of the sample proportion as point estimator
• Compute for a point estimate of the population proportion
• Identify an appropriate confidence interval estimator of the population proportion
using large sample based on the Central Limit Theorem
• Construct a (1-α)100% confidence interval estimator of the population proportion
using a large sample
• Interpret point and confidence interval estimates of the population proportion
PRE-REQUISITE KNOWLEDGE AND SKILLS: Knowledge in point estimation as
well as the sampling distribution of the sample proportion
LESSON OUTLINE
A. Point estimator of the population proportion
B. Properties of the sample proportion as point estimator of population proportion
C. Construction and interpretation of a (1-α)100% confidence interval estimator of
the population proportion using a large sample
D. Illustration of the computation of point and interval estimates of the
population proportion and their interpretation.
First, review the lesson on proportion as a parameter. The ratio of the number of units
possessing a characteristic to the total number of units in the population is a population
proportion. Examples are the proportion of learners who passed the last examination, the
proportion of Filipinos who live in poverty, the proportion of housing units in the Philippines
with roof made of strong materials, and proportion of Piatos chips that are not broken.
As a motivational activity, present the partial list of variables below in a data set gathered
from learners enrolled in Grade 11 Statistics and Probability this school year.
VARIABLE        DEFINITION/DESCRIPTION
HRS_STUD        usual number of hours spent studying outside school hours during weekdays
SEX             biological sex
HEIGHT          height measured in cm
WEIGHT          weight measured in kg
WAIST           waist girth measured in cm
HIP             hip girth measured in cm
MGINCOME        monthly family gross income
MONTH_ALLOW     monthly allowance
WEEK_FOOD       weekly expenditures on food outside home
AGE_FATHER      father's age
AGE_MOTHER      mother's age
NUM_SIBLINGS    number of siblings
MODE_TRANS      mode of transportation in going to school (private, service, public, not applicable (i.e. walking))
GENRE           preferred genre of music (e.g. rock, acoustic, mellow, etc.)
Ask learners to identify proportions that could be defined from these variables. Note that
some variables are straightforward while others need to be redefined further. The following
are some examples identified:
1. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and who spend at least 2 hours studying outside school hours during
weekdays
2. Proportion of male learners who are enrolled in Grade 11 Statistics and Probability
this school year
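3. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and at least 160 cm tall
4. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and at most 100 kg
5. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and with waist girth of at most 50 cm
6. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and with hip girth of at least 60 cm
7. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and who belong to a family whose gross monthly income is at most Php
15,000
8. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and with a monthly allowance equal to Php 4,000
9. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and with a weekly food expenditure outside home equal to Php 500
10. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and with a father whose age is at least 60 years
11. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and with a mother whose age is at least 60 years
12. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and with at least 3 siblings
13. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and who go to school using private vehicles
14. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this
school year and who have rock as preferred music genre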
Choose one of these proportions and ask learners how they would proceed if they were
asked to estimate it.
In the discussion, take note of the following:
• If, in general, you were to estimate the proportion of learners enrolled in Grade 11
Statistics and Probability this school year and with rock as preferred music genre from
a simple random sample of size n, an estimator for this population proportion is the
ratio of the number of sampled Grade 11 Statistics and Probability learners who
preferred rock to the sample size n. This is referred to as the sample proportion,
which is defined as the ratio of the number of sample units possessing the characteristic
of interest to n. Mathematically, the point estimator of the population proportion, based
on a simple random sample of size n, is expressed as $\hat{p} = a/n$, where a is the number of
sample units having the characteristic of interest.
• The sample proportion $\hat{p}$ as an estimator of the population proportion P is unbiased,
with standard error equal to $\sqrt{P(1-P)/n}$, which is estimated from the sample by
$\sqrt{\hat{p}(1-\hat{p})/n}$. This was discussed in the previous chapter on
sampling. Also, with a sufficiently large sample size n (say, at least 100), the sampling distribution
of the sample proportion can be approximated by a normal distribution
based on the Central Limit Theorem (CLT).
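• Using the above-mentioned concepts, a (1-α)100% confidence interval (CI) for the
population proportion (P) is constructed as
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
or
$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$$
where $\hat{p}$ is the sample proportion computed from a simple random sample of size n and
$z_{\alpha/2}$ is the standard normal value with upper-tail area α/2 (1.96 for 95% confidence).
The lower limit of the interval is $\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ while the upper limit is
$\hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$.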
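• Computing the width of the confidence interval estimate, we have:
$$\text{width} = 2\,z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2e$$
where the maximum allowable deviation (margin of error) is equal to
$e = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, with $\hat{p}$ the sample proportion and $z_{\alpha/2}$ the
standard normal critical value for the chosen confidence level.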
• Illustration of the Computation: Suppose in a simple random sample of 50 Grade 11
Statistics and Probability learners, 30 of them said they preferred the music genre rock.
The sample proportion is computed as $\hat{p} = 30/50 = 0.60$ with standard error equal to
$\sqrt{0.60(0.40)/50} \approx 0.069$. Using the same sample, the 95% confidence interval (CI)
of the population proportion of Grade 11 Statistics and Probability learners with rock as
preferred genre of music is constructed as
$$0.60 \pm 1.96(0.069) = 0.60 \pm 0.14, \quad \text{or} \quad (0.46,\ 0.74).$$
Hence, we estimate that 6 out of every 10 Grade 11 Statistics and Probability learners
would say that rock is their preferred music genre. Further, we could say that we are
95% confident that the true proportion of Grade 11 Statistics and Probability learners
who would say that rock is their preferred music genre is between 0.46 and 0.74; that is, out of
every 100 Grade 11 Statistics and Probability learners, we are 95% confident that
between 46 and 74 of them would say that rock is their preferred music
genre.
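The computation in the illustration can be checked with a short script. The following is a minimal Python sketch, not part of the original guide; the function name `proportion_ci` and the default critical value z = 1.96 (for 95% confidence) are assumptions made for illustration.

```python
import math


def proportion_ci(a, n, z=1.96):
    """Return (p_hat, standard_error, lower, upper) for a sample proportion.

    a : number of sample units having the characteristic of interest
    n : sample size
    z : standard normal critical value (1.96 is assumed for 95% confidence)
    """
    p_hat = a / n                              # point estimate a/n
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
    margin = z * se                            # maximum allowable deviation
    return p_hat, se, p_hat - margin, p_hat + margin


# Illustration from the text: 30 of 50 sampled learners prefer rock.
p_hat, se, lower, upper = proportion_ci(a=30, n=50)
print(f"p_hat = {p_hat:.2f}, SE = {se:.4f}")      # p_hat = 0.60, SE = 0.0693
print(f"95% CI: ({lower:.3f}, {upper:.3f})")      # 95% CI: (0.464, 0.736)
```

Passing a different `z` (for example 2.576 for 99% confidence) gives the interval at another confidence level.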
ENRICHMENT
Most national opinion polls sample at least 1,200 respondents (although there were about 100
million Filipinos as of mid-2014) and typically ask about the approval ratings of government
officials, especially the President. How is this possible? In this lesson, we saw that the likely
size of the chance error in sample percentages depends on the size of the sample and
hardly at all on the population size. The huge number of Filipinos that could be
sampled does not affect the standard error (of the proportion) but only makes it operationally
difficult to draw the random sample. Is 1,200 a big enough sample?
Most critics of sample surveys find it illogical that 1,200 respondents could represent
millions. It turns out that 1,200 is indeed a reasonable sample size for estimating
approval ratings. If the true approval rating of the President were 50%, then with a sample
size of 1,200, the standard error for the proportion is about 1.4 percentage points, and we
would have a margin of error of about 3 percentage points at 95% confidence. This shows why we
ought to be able to accurately assess the winner of a presidential race even before election
day itself, unless the proportions of votes for two candidates in an election are very close.
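The figures for a poll of 1,200 respondents can be verified numerically. The sketch below is illustrative only; the helper name `standard_error` and the use of z = 1.96 for 95% confidence are assumptions, not part of the original guide.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion when the true proportion is p."""
    return math.sqrt(p * (1 - p) / n)


# True approval rating assumed to be 50%, sample of 1,200 respondents.
se = standard_error(0.5, 1200)
margin = 1.96 * se   # margin of error at 95% confidence

print(f"SE = {100 * se:.2f} percentage points")          # SE = 1.44 percentage points
print(f"margin = {100 * margin:.2f} percentage points")  # margin = 2.83 percentage points
```

Note that the population size does not appear anywhere in the calculation, which is the point the paragraph makes.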
Suppose we are required to construct a 95% confidence interval for the proportion so
that it has a width of 5%. What sample size would be required? In the previous
chapter, we noted that a conservative estimate of the standard error of the sample
proportion is
$$\sqrt{\frac{0.5(0.5)}{n}}$$
since the maximum value p(1-p) can take is attained when p = ½. Since we want the width of the
confidence interval to be 5%, we would thus like to have the 95% confidence interval for
the proportion take the form
Sample Proportion ± 2.5%
This means that we want
1.96 (Estimate of Standard Error) = 0.025
or equivalently
$$1.96\sqrt{\frac{0.5(0.5)}{n}} = 0.025$$
Solving this equation for n yields:
$$n = \left(1.96 \cdot \frac{0.5}{0.025}\right)^{2} = 1536.64,$$
which is rounded up to n = 1537.
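The sample size calculation can be written as a small function. This is a minimal sketch under the same conservative assumption p = ½ used in the text; the function name `required_sample_size` and its parameters are hypothetical and chosen only for illustration.

```python
import math


def required_sample_size(margin, z=1.96, p=0.5):
    """Smallest n for which z * sqrt(p(1-p)/n) does not exceed the margin.

    margin : half-width of the confidence interval (0.025 for a 5% wide interval)
    z      : standard normal critical value (1.96 assumed for 95% confidence)
    p      : assumed proportion; 0.5 gives the most conservative (largest) n
    """
    n_exact = (z * math.sqrt(p * (1 - p)) / margin) ** 2
    return math.ceil(n_exact)   # round up so the margin is not exceeded


print(required_sample_size(0.025))   # 1537
```

Rounding up rather than to the nearest integer is a design choice: it guarantees that the resulting margin of error is no larger than the one requested.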
Ask learners whether they should account for sampling without replacement. Theoretically,
the required sample size of 1,537 has to be adjusted by incorporating the population size.
For a population of N = 100,000,000, a commonly used finite population adjustment gives a
sample of size:
$$n' = \frac{n}{1 + (n-1)/N} = \frac{1537}{1 + 1536/100{,}000{,}000} \approx 1537$$
which is the same as that obtained for sampling with replacement. This numerical result
explains why nationwide polls typically use only 1,200 to 1,600 respondents. Emphasize to
learners that a large population size has virtually no effect on the choice of the sample size
when estimating a population proportion.
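The negligible effect of a large population size can be seen by computing the adjustment for several population sizes. The sketch below assumes the common finite population adjustment n / (1 + (n - 1)/N); the guide's own formula did not survive in the text, so this formula and the function name `adjusted_sample_size` are assumptions.

```python
import math


def adjusted_sample_size(n0, population_size):
    """Adjust an initial sample size n0 for sampling without replacement.

    Uses the adjustment n0 / (1 + (n0 - 1) / N), which is assumed here
    and is not taken from the original guide.
    """
    return n0 / (1 + (n0 - 1) / population_size)


for N in (10_000, 1_000_000, 100_000_000):
    n_adj = math.ceil(adjusted_sample_size(1537, N))
    print(f"N = {N:>11,}: required sample size = {n_adj}")
```

For N = 100,000,000 the adjusted size rounds up to 1,537, the same as the unadjusted value; the adjustment only matters when the population is not much larger than the sample.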
REFERENCES
Freedman, D., Pisani, R., and Purves, R. (2007). Statistics, Fourth Edition. New York: W. W.
Norton & Company.
ASSESSMENT
The following are some problems that could serve as computational exercises on point and
confidence interval estimation of the population proportion.
1. Some government officials are proposing for the country’s academic calendar to be
moved from June-March to August-May. This proposal, according to the officials, if
approved, can further improve education by synchronizing our calendar with that of the
other countries. Government officials would push through with the proposal if at least
85% of the student population favor it. To know the opinion of learners regarding the
said proposal, a simple random sample of learners was obtained and they were asked if
they were in favor of the said proposal. Of the 1,000 surveyed learners, 892 said they
were in favor of the approval of the said proposal.
a. Find a point estimate of the true proportion of learners who are in favor of the
approval of the proposal and find its standard error.
Answer: With a = 892 and n = 1,000, $\hat{p} = 892/1000 = 0.892$ and its standard error is equal to
$\sqrt{0.892(0.108)/1000} \approx 0.0098$.
b. Construct a 99% confidence interval for the true proportion of learners who are in
favor of the approval of the proposal. Interpret the confidence interval obtained.
Answer: The 99% confidence interval estimate of the true proportion of learners
who are in favor of the approval of the proposal is expressed as
$$0.892 \pm 2.576(0.0098) = 0.892 \pm 0.025, \quad \text{or} \quad (0.867,\ 0.917).$$
We then say that we are 99% confident that the true proportion of learners who are
in favor of the approval of the proposal is between 0.87 and 0.92.
2. Because of several political problems the country is experiencing right now, a lawyer
became interested in knowing the opinion of the residents of a certain municipality
about plunder issues. The lawyer came up with a proposed program regarding the
resolution of plunder cases, to be pursued if a majority of the population were not satisfied
with the result of plunder cases filed in the country. She randomly selected 500 individuals
from the complete list of registered voters of the 2013 National Election in their
municipality. Each respondent was asked if he/she was satisfied with the outcome
of plunder cases filed in the country. Of the surveyed citizens, 180 said they were
satisfied with the result of plunder cases filed in the country.
a. Find a point estimate of the true proportion of citizens who are not satisfied with the
result of plunder cases filed in the country. Calculate the standard error of the
estimate.
Answer: With a = 500 – 180 = 320 and n = 500, $\hat{p} = 320/500 = 0.64$ and its standard error is
equal to $\sqrt{0.64(0.36)/500} \approx 0.0215$.
b. Construct a 95% confidence interval for the true proportion of citizens who are not
satisfied with the result of plunder cases filed in the country. Interpret the
confidence interval obtained.
Answer: The 95% confidence interval estimate of the true proportion of citizens
who are not satisfied with the result of plunder cases filed in the country is
expressed as
$$0.64 \pm 1.96(0.0215) = 0.64 \pm 0.042, \quad \text{or} \quad (0.598,\ 0.682).$$
We then say that we are 95% confident that the true proportion of citizens who are
not satisfied with the result of plunder cases filed in the country is between 0.60 and
0.68.
ENRICHMENT
For the problems described above, ask learners to discuss how the confidence interval
estimates could be used to meet the objective of each problem.
Lesson 6: More on Point Estimates and Confidence Intervals
OVERVIEW OF LESSON: In this lesson, learners undertake an activity to deepen their
understanding of point and interval estimation. This lesson is largely taken from a STatistics
Education Web (STEW) lesson plan called “Did I Trap the Median?” Learners recall the
information provided at the beginning of Chapter 1, particularly the rating score they gave
(from 1 to 10) about their state of happiness. Learners collect a random sample of 10 of
their classmates’ records on their respective states of happiness in order to obtain point and
interval estimates for the median level of state of happiness in the entire class.

At the end of the lesson, the learners should be able to:
• Calculate a point estimator of the population median number of text messages sent
in a day in class
• Construct a (1-α)100% confidence interval estimator of the population proportion
using large sample
• Interpret point and confidence interval estimates of the population proportion
MATERIALS REQUIRED: Ruler and Pencil, Calculators, Activity sheet

LESSON OUTLINE
A. Introduction
B. Data Collection
C. Data Analysis
D. Enrichment
DEVELOPMENT OF THE LESSON

A. Introduction
This lesson involves an activity where learners collect sample data from their class to
estimate the median state of happiness among the population of Grade 11 learners in the
entire class. Each student obtains a point estimate and constructs an interval estimate for
the median state of happiness in the entire class by using a simple random sample of 10
learners in the class. Numeric summaries and graphs are used to obtain point and interval
estimates, respectively, for the median state of happiness in the entire class. The teacher
examines the database on the state of happiness of all learners in class that was collected in
the first lesson of the first chapter in order to obtain the population median state of
happiness of the entire class. The confidence level for the interval estimates computed by
the learners is estimated by obtaining the proportion of learners’ sample interval estimates
that trap the population median state of happiness of the entire class.
Ask learners to hypothesize the answers to some of these questions:
1. What are the advantages of collecting a sample of only 10 records of the state of
happiness and not of the entire class to obtain the median state of happiness in the
entire class?
2. What are the advantages and disadvantages of using the sample median to
estimate the population median?
3. Is there any advantage to constructing an interval estimate as opposed to a point
estimate (the sample median) for the population median?
4. Is it possible to ascribe a reliability value to the interval estimate (ascribe a
probability that the interval contains the median)?
5. What are the factors that may affect the length and the reliability of an interval
estimate?
B. Data Collection
Have the learners recall that they reported their state of happiness to the teacher at
the beginning of Lesson 1-01. This was put in a database. Now, let them get similar records
of the state of happiness of 10 randomly selected learners in class. To ensure that each
student will use a random sample of 10 records from the class database:
1. Ask learners to generate 10 random numbers from one to the total number of
learners in class using a table of random digits. They may use the Table of Random
Digits from Lesson 3-06.
2. Have learners write down the generated numbers from least to greatest on the data
table. Explain that each number corresponds to a classmate. Next, list all the records
of the state of happiness of each student (from the database collected in Lesson 1-
01), together with the respective student numbers. Tell each student to write down
only the records for each randomly generated number that corresponds to a
classmate’s state of happiness.
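Where a table of random digits is not at hand, the selection of student numbers can be imitated on a computer. The sketch below swaps in Python's pseudo-random number generator for the Table of Random Digits named in the steps; the function name `draw_student_numbers`, the class size of 20, and the seed are assumptions made for illustration.

```python
import random


def draw_student_numbers(class_size, sample_size=10, seed=None):
    """Draw distinct student numbers from 1..class_size, sorted from least to greatest.

    A pseudo-random generator stands in for the table of random digits;
    sampling is without replacement, so no classmate is picked twice.
    """
    rng = random.Random(seed)
    numbers = rng.sample(range(1, class_size + 1), sample_size)
    return sorted(numbers)


# Hypothetical class of 20 learners; the seed only makes the draw repeatable.
chosen = draw_student_numbers(class_size=20, seed=1)
print(chosen)
```

Each learner would use a different seed (or none at all) so that the samples differ from one learner to another.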
A sample student data set is shown in Table 4-06.1 below. A blank data table is provided in
the Activity Sheet.
Table 4-06.1. Example Student Data Sample

Sample Student State of Happiness
1 7
2 7
3 8.5
4 4
5 6.5
6 7
7 7.5
8 5.5
9 7.5
10 9.5
An example class data set is shown in Table 4-06.2 below.
Table 4-06.2. Example Class Data

Student Number State of Happiness
1 7
2 7
3 8.5
4 4
5 6.5
6 7
7 7.5
8 5.5
9 7.5
10 9.5
11 6.5
12 6
13 8
14 5
15 4
16 8.5
17 5.5
18 8
19 5
20 6
1. Computing and Displaying Numerical Summaries

Different statistical tools are used for estimating numerical values in a population. For
example, when drawing a random sample one can calculate the sample mean (or average),
or the median (50th percentile) to obtain an estimate of a measure of the center of the
distribution of values pertaining to the entire population. Also, the range, the inter-quartile
range (difference from the 25th percentile or first quartile, to the 75th percentile or third
quartile) as well as the standard deviation from sample data can be computed to estimate
the spread of the values in a population (measures of variation).
2. Visualizing the Distribution

A box and whiskers plot is a graphical summary proposed by John Tukey for data that uses
the 5-number summary (minimum, 25th percentile, median, 75th percentile, and maximum)
to graphically display the distribution of a data set while highlighting measures of the center
(median), other positions (25th and 75th percentiles), and measures of variation (range,
inter-quartile range). This plot also allows us to identify outliers, numbers that are very different
from the rest of the data. Some features of the box plot of a sample data set will be used to
construct interval estimates for the median of the population.
To construct a box and whiskers plot, learners should compute the 5-number summary of
their sample data. Ask learners to order the values in their sample, from smallest to largest.
Now, learners can readily identify the minimum and maximum values in their sample data,
and proceed to compute the quartiles. The median or second quartile (Q2) is found by
locating the midpoint of the entire ordered sample data set. Since we have an even number
of data points in the example used here, we have two middle values so we find the median
by averaging these two values. The 25th percentile or first quartile (Q1) is found by
calculating the median of the lower half of the sample data (first five numbers). For the
sample data Q1 is the sole value in the middle position (third data point) of the first five
numbers. The 75th percentile or third quartile (Q3) is similarly found by calculating the
median of the upper half of the sample data (last five numbers). In this case, the third
quartile is in the eighth position.
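The quartile rule described above (median of the lower half and median of the upper half) can be sketched in Python as follows. The function name `five_number_summary` is hypothetical, and the data are the ten values assumed by the worked example, which coincide with the first ten class records in Table 4-06.2. The treatment of an odd sample size (leaving the middle value out of both halves) is an assumption, since the text only covers an even number of data points.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                                      # lower half of the ordered data
    upper = data[half:] if n % 2 == 0 else data[half + 1:]   # odd n: skip the middle value (assumption)
    return data[0], median(lower), median(data), median(upper), data[-1]


# Ten values assumed by the worked example (first ten records of Table 4-06.2).
sample = [7, 7, 8.5, 4, 6.5, 7, 7.5, 5.5, 7.5, 9.5]
print(five_number_summary(sample))   # (4, 6.5, 7.0, 7.5, 9.5)
```

Other quartile conventions exist (statistical software often interpolates), so results from a calculator or spreadsheet may differ slightly from this rule.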
The steps to draw the box plot using the sample data to construct an interval estimate for a
population median can be better described by means of an example. This is done in the
succeeding paragraphs using the sample data in Table 4-06.1. In addition, the teacher
should construct a box plot for the data of the entire class for a later discussion.
The median state of happiness in the entire class in the example in this lesson plan (Table 4-
06.2) is 6.75, while for the student data sample (Table 4-06.1), it is 7.0. Note that the
sample median of 7 can be used as a point estimate of the population median of 6.75.
Point estimates are obtained with the hope that they are close to the population value that
they are meant to estimate.
Confidence intervals have the extra advantage of providing a sense of uncertainty in the
estimation process. With confidence intervals, we can be quite confident of the accuracy in
estimation; i.e., that the exact population value that is being estimated (the population
median in this case) is captured or “trapped” by an interval constructed using sample data.
To construct an interval estimate for the population median using the features of a box plot,
start by having each student obtain the 5-number summary of his/her sample data as
described above. Notice that the smallest level of happiness for the sample
data in Table 4-06.1 is 4.0, while the largest level of happiness is 9.5. The median value
(Q2) of 7.0 indicates that about half of the learners in the sample have stated levels of
happiness less than or equal to 7, and that about half of the learners have levels of
happiness greater than or equal to 7.
The first quartile of the student data sample is 6.5 and the third quartile is 7.5. These
values for Q1 and Q3 indicate that about 25% of the learners in this sample have levels of
happiness less than or equal to 6.5, and about 25% of learners have levels of happiness
greater than or equal to 7.5. These values also indicate that about 50% or half of the learners in
the sample have levels of happiness between 6.5 and 7.5.
To construct a box plot follow these steps:
1. Mark the values of Q1 = 6.5, Q2 = 7.0, and Q3 = 7.5 on a horizontal scale
that spans across all the values in the sample data. Then, construct a box
above the scaled line using these values as indicated in Figure 4-06.1.
Figure 4-06.1. Box plot: Step 1 (box drawn above the scale from Q1 = 6.5 to Q3 = 7.5, with Q2 = 7.0 marked inside)
2. To find if there are any outliers or extreme values in a data set, compute the
inter-quartile range (IQR), which is the difference between the third and first
quartiles. Any data point beyond what are called the lower outlier bound, Q1
– 1.5(IQR), or the upper outlier bound, Q3 + 1.5(IQR), is considered to be an
outlier. In this case, IQR = 7.5 – 6.5 = 1; therefore any level of happiness
smaller than Q1 − 1.5(IQR) = 6.5 − (1.5)(1) = 5, or larger than Q3 + 1.5(IQR) =
7.5 + 1.5(1) = 9.0 is an outlier. There are two outliers in this data set, 4.0 and
9.5. These outliers are indicated by drawing stars above the scaled line at
about half the height of the box as shown in Figure 4-06.2 below.
Figure 4-06.2. Box plot: Step 2 (box from Q1 = 6.5 to Q3 = 7.5 with Q2 = 7.0; stars mark the outliers at 4.0 and 9.5)
3. Finally, find the minimum value that is not an outlier and the maximum value
that is not an outlier. Here, the minimum value that is not an outlier is 5.5,
and the maximum value that is not an outlier is 8.5. Then, add what are called
the whiskers to the box by drawing horizontal lines at about half the height of
the box, first from Q1 down to the minimum value that is not an outlier, and
second from Q3 up to the maximum value that is not an outlier as indicated in
Figure 4-06.3 below. Only when there are no outliers would the whiskers go
as far as the minimum and maximum values in the data set. To avoid drawing
the whiskers incorrectly, make sure to draw them after the outliers (if any)
have been added to the graph.
Figure 4-06.3. Box plot: Step 3 (box from Q1 = 6.5 to Q3 = 7.5 with Q2 = 7.0; whiskers extend to the smallest and largest values that are not outliers; stars mark the outliers at 4.0 and 9.5)
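Steps 2 and 3 above amount to a short calculation, sketched here in Python. The function names `outlier_bounds` and `whiskers_and_outliers` are hypothetical, and the data are the ten values assumed by the worked example (the first ten class records in Table 4-06.2), with Q1 = 6.5 and Q3 = 7.5 taken from the text.

```python
def outlier_bounds(q1, q3):
    """Return the lower and upper outlier bounds Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def whiskers_and_outliers(values, q1, q3):
    """Return (lower whisker end, upper whisker end, sorted list of outliers)."""
    low, high = outlier_bounds(q1, q3)
    inside = [v for v in values if low <= v <= high]       # values that are not outliers
    outliers = sorted(v for v in values if v < low or v > high)
    return min(inside), max(inside), outliers


sample = [7, 7, 8.5, 4, 6.5, 7, 7.5, 5.5, 7.5, 9.5]
print(outlier_bounds(6.5, 7.5))                  # (5.0, 9.0)
print(whiskers_and_outliers(sample, 6.5, 7.5))   # (5.5, 8.5, [4, 9.5])
```

The whisker ends are taken from the values that remain after the outliers are set aside, which mirrors the advice to draw the whiskers only after the outliers have been marked.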
Notice that the box and whiskers plot for the student sample data is quite
symmetric. The distributions of random sample data tend to reflect the distribution
of the population. At this point, you can draw on the board the box plot you
obtained for the data of the entire class, and ask learners if their sample data box
plots resemble that of the population. There may be a small proportion of learners
whose box plot may be quite different from the box plot of the data of the entire
class. This is due to random variation in the samples. However, most of the learners
should have a box plot that resembles that of the population.
3. Constructing an Interval Estimate

Ask learners to discuss how much their sample median differs from the population
median. In the above example, the sample median of 7.0 is off by 0.25 from the
population median of 6.75. Learners should note the wide variability in estimation
error when using their sample median as an estimate of the population median
(compared to the variability in estimation error when estimating the population
mean by the sample mean). Now ask learners if they would consider it reasonable to
provide an interval estimate that has a high probability of capturing or trapping the
exact median of the population. If they could provide an interval that captures or
traps the population median by using their own sample data, what would this
interval be? One suggestion might be to use the endpoints of the whiskers of their
box plots as an interval that has a high probability of trapping the population
median. However, learners may also realize that this interval is too wide to help
home in on the value of the population median (that is, that this interval has a large
margin of error). Then, ask learners whether the shorter interval from Q1 to Q3
(endpoints of the box instead of endpoints of the whiskers) would be more
reasonable to estimate the location of the population median.
Now, you can ask learners how confident they are that each time they obtain a
random sample of 10 learners and obtain the first and third quartiles of this sample,
the interval (Q1, Q3) captures or traps the population median. It would not be
surprising to have learners whose intervals (Q1, Q3) did not capture the population
median. If so, this would prevent learners from saying that they are 100%
confident that each time they take a sample of 10 learners and obtain the first and
third quartiles of their sample, the interval (Q1, Q3) will trap the population median.
So what is the level of confidence that learners have for capturing the population
median with the interval (Q1, Q3) from a random sample of 10 learners? To answer
this question, learners can obtain the reliability or level of confidence, of using (Q1,
Q3) from their sample of 10 learners as an interval estimate for the population
median. Simply obtain the proportion of learners in class whose interval estimate
trapped the population median (class median). For example, if 15 of the 20 learners
(75%) in the class obtained an interval (Q1, Q3) that trapped the class median of
6.75, then this means that each time someone takes a sample of 10 learners from
the class, we expect 75% of the intervals (Q1, Q3) will trap the population median
level of happiness.
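The same idea can be explored by simulation: instead of pooling the intervals of the learners in one class, a computer can draw many random samples of 10 and count how often (Q1, Q3) traps the class median. The sketch below is illustrative only. It uses the example class data of Table 4-06.2, the median-of-halves quartile rule from this lesson, and treats the median as trapped when Q1 ≤ median ≤ Q3; the function names, the number of trials, and the seed are assumptions.

```python
import random
from statistics import median

# Example class data from Table 4-06.2 (20 learners).
CLASS_DATA = [7, 7, 8.5, 4, 6.5, 7, 7.5, 5.5, 7.5, 9.5,
              6.5, 6, 8, 5, 4, 8.5, 5.5, 8, 5, 6]


def quartile_interval(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(upper)


def estimated_confidence(population, sample_size=10, trials=2000, seed=0):
    """Proportion of random samples whose (Q1, Q3) contains the population median."""
    rng = random.Random(seed)
    target = median(population)
    hits = 0
    for _ in range(trials):
        q1, q3 = quartile_interval(rng.sample(population, sample_size))
        if q1 <= target <= q3:   # endpoints counted as trapping (assumption)
            hits += 1
    return hits / trials


level = estimated_confidence(CLASS_DATA)
print(f"Estimated confidence level: {level:.0%}")
```

Using thousands of simulated samples rather than the 20 or so intervals available in a class gives a steadier estimate of the confidence level, at the cost of no longer being a hands-on classroom tally.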
C. Data Analysis
Learners should now have an idea of the advantages of using interval estimates, which,
once their level of reliability is known, are called confidence intervals. However, learners
may agree that a sample interval (Q1, Q3) is still too wide (that is, the interval has a large
margin of error) as a predictor of the location of the median. Ask learners questions
pertaining to possible refinements for these confidence intervals, such as the following,
which will be explored on the second day of this lesson plan:
1. What do you think would happen to the sample interval (Q1, Q3) if the sample size
increased from 10 to 15 (or 20)?
2. What do you think would happen to the sample interval (Q1, Q3) if the population
distribution is not a symmetric distribution?
3. Do you have any idea how to construct interval estimates that are shorter than the
interval (Q1, Q3)? Would a shorter interval necessarily change the level of reliability?
D. Enrichment
Extend this lesson and activity by increasing the sample size to 15 or 20, and let
learners see how increasing the sample size produces tighter intervals (Q1, Q3). Also,
they should explore how symmetric distributions produce interval estimates with
smaller reliability (lower level of confidence) than non-symmetric distributions.
REFERENCES
Parks, S. (California State University Sacramento, Dept. of Mathematics and Statistics), Steinwachs, M.
(University of California Davis, iAMSTEM Hub), Diaz, R. (California State University
Sacramento, Dept. of Mathematics and Statistics), and Molinaro, M. (University of California Davis).
Did I Trap the Median? STatistics Education Web (STEW).
Retrieved from https://www.amstat.org/education/stew/pdfs/DidITrapTheMedian.docx
Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua, Welfredo Patungan, Nelia
Marquez). Philippines: Rex Bookstore.
Freedman, D., Pisani, R., and Purves, R. (2007). Statistics, Fourth Edition. New York: W. W. Norton & Company.
ASSESSMENT
A class of 25 learners is selected and their exam scores are recorded. A random sample of
10 learners is taken from the class. The data are shown in the tables below.
Class Data Table
Student Exam Scores
1 90
2 101
3 106
4 108
5 125
6 130
7 115
8 91
9 112
10 107
11 76
12 103
13 69
14 94
15 106
16 78
17 121
18 80
19 85
20 80
21 99
22 76
23 92
24 89
25 121
Sample Data Table
Student Exam Scores
1 96
2 101
3 106
4 108
5 125
6 130
7 115
8 93
9 112
10 107
Using the above tables, answer the following questions:

a) Calculate the 5-number summary for the sample data table.
Answer:
5-number summary: minimum = 93, first quartile, Q1 = 101, median = 107.5, third
quartile, Q3= 115, maximum = 130.
b) Determine the lower and upper outlier bounds. Are there any outliers?
Answer:
Q1 = 101, Q3 = 115,
IQR = Q3 – Q1 = 115 – 101 = 14;
Lower outlier bound: Q1 – 1.5(IQR) = 101 – 1.5(14) = 80;
Upper outlier bound: Q3 + 1.5(IQR) = 115 + 1.5(14) = 136;
No outliers (no points located beyond the outlier bounds).
c) What are the minimum and maximum values that are not outliers? Note: If there are
no outliers below (above) the lower (upper) outlier bound, then the minimum
(maximum) value that is not an outlier matches the minimum (maximum) value of the
data set.
Answer: Minimum that is not an outlier = Minimum of the sample data (no outliers
below the lower outlier bound) = 93
Maximum that is not an outlier = Maximum of the sample data (no outliers above
the upper outlier bound) = 130
d) Construct a box plot of the exam scores for the sample data table.
Answer:
See box plot below:
(Figure: Sample Boxplot; horizontal axis labeled "IQ Score", with tick marks at 100, 110, 120, and 130)
e) Is the distribution of the data set symmetric or asymmetric?
Answer:
The box plot indicates that the distribution of the sample data is asymmetric due to
a longer upper whisker, and a larger spread for the values between the third quartile
and the median. (In the second day of this lesson plan, learners will learn that when
a box plot shows an asymmetry in this direction the distribution of the data is said to
be skewed right or positively skewed).
f) Compute the population median (median of the entire class of 25 learners).
Answer: The median of the entire class is 99.
g) Does the interval (Q1, Q3) trap the median of the class data?
Answer: No, the interval (Q1, Q3) from the sample does not trap the class median of 99. The
class median does not fall between 101 (first quartile) and 115 (third quartile).
Activity Sheet 4-06
1. Describe the data collection process that will be used.
2. Recall your answer to Activity Sheet 1-01a :
On a scale from 1 (very unhappy) to 10 (happiest), how do you feel today? ________
3. Record the state of happiness of 10 randomly chosen learners in your class.
Name State of Happiness
4. Arrange the values of the state of happiness from smallest to largest.
5. Complete the table below showing numeric summaries for the state of happiness for
your ten randomly chosen classmates.
Mean    Minimum    First Quartile (Q1)    Median    Third Quartile (Q3)    Maximum
6. Determine what values would be considered to be outliers for your 10 randomly

chosen classmates. Are there any outliers?
7. Construct a horizontal box plot for your 10 randomly chosen classmates. In the event
of having outliers for your data set, do not use outliers for the minimum or maximum
values. For the minimum and maximum values, plot the minimum value that is not
an outlier and the maximum value that is not an outlier.
(Blank horizontal scale for the box plot; axis label: state of happiness)
8. What is the class median state of happiness? Does your Q1 to Q3 interval estimate
trap the median for the entire class?
9. Based on the median of the entire class given by your teacher and the median of
your particular 10 randomly chosen classmates, calculate what proportion (percent)
of box plots trap the median for the entire class. This is the reliability (confidence
level) of using interval estimates from Q1 to Q3.
10. Think about what would happen if the sample size is increased. Would the
proportion of box plots that would trap the median increase or decrease? Why?
Simulation Worksheet
1. Hypothesize what the answers to the following questions might be and state why.
a) What happens to the width of the confidence intervals when the sample size
increases? Do the bounds of the intervals vary more? Why?
b) What happens to the level of confidence (reliability or percentage of sample
intervals that trap the population median) of the interval estimate when the
sample size increases? Why?
c) What happens to the width of the interval estimate when the population
distribution shape changes? Do the bounds of the intervals vary more?
Why?
d) What happens to the level of confidence (reliability or percentage of sample
intervals that trap the population median) when the population distribution
shape changes? Why?
CHAPTER 5: TESTS OF HYPOTHESIS
Lesson 1: Basic Concepts in Hypothesis Testing

At the end of the lesson, the learners should be able to:
• Illustrate a statistical hypothesis
• Differentiate a null hypothesis from alternative hypothesis
• Differentiate Type I from Type II error
• Illustrate consequences of committing errors
LESSON OUTLINE
a. Definition of statistical hypothesis
b. The difference of null hypothesis from alternative hypothesis
c. Consequences of making a decision
d. Two possible errors that could be committed in a test of hypothesis
Recall that statistical inference is concerned with either estimation or evaluation of a
statement or claim about a parameter or a distribution. The latter is the focus of this
chapter. Evaluation of a claim about a parameter or a distribution is done through a
statistical test of hypothesis.
As a motivational activity, ask learners to react to the government pronouncement about the El
Niño phenomenon. Describe the El Niño phenomenon and its possible consequences
further.
“The country will experience El Niño phenomenon in the next few months.”
Write learners’ reactions on the board. Their reactions may include the following:
1. The occurrence of El Niño phenomenon is not sure.
2. There is a possibility that El Niño phenomenon may not occur.
3. The effects of El Niño phenomenon are devastating to the country.
4. Some of the consequences of the El Niño phenomenon are tolerable while other
consequences are not.
5. The validity of the statement could be tested based on some empirical facts.
Discuss the results of this activity to learners with emphasis on the following points:
• The pronouncement is a claim that may be true or false. Such a claim could be referred to
as a statistical hypothesis. A statistical hypothesis is a claim or a conjecture
that may either be true or false. The claim is usually expressed in terms of the
value of a parameter or the distribution of the population values.
• There are two possible actions that one can take with the statement. These actions are
either to accept the statement or to reject it. These actions are brought about by
a decision on whether the statement is true or false. Some of the learners may believe that
the statement is true, hence they accept the pronouncement. Others may think that the
statement is false, hence they reject the claim.
• The actions we take have consequences. Possible consequences of accepting that
the statement is true include: (a) increase the importation of rice in anticipation of
supply shortage; (b) buy materials for water storage; (c) use drought-resistant varieties of
rice; (d) invest in programs to make Filipinos ready; and the like. On the other hand,
when the statement is rejected because we think it is false, possible consequences are
(a) We are not prepared for rice and water shortage; (b) Farmers experience great loss
on production; or (c) We do not do anything.
• Some of the consequences are tolerable while other consequences are severe.
Experiencing a few days of water shortage is tolerable but having rice shortage for a
month or two is unbearable. The severity of the possible consequences is the basis
for making the decision. If the consequences of accepting the claim that El Niño
phenomenon is going to happen are tolerable, then we may not reject the
pronouncement. However, if the consequences are severe, then we reject the claim.
Consider another statement or claim but this time regarding a parameter. Consider the
average number of text messages that a Grade 11 student sends in a day. The statement
could be stated as follows:
“The average daily number of text messages that a Grade 11 student sends is equal to
100.”
As discussed earlier, this statement can either be true or false. Hence, one can accept or
reject this statement. The validity of this statement can be assessed through a series of
steps known as a test of hypothesis. A test of hypothesis is a procedure based on a
random sample of observations with a given level of probability of
committing an error in making the decision, whether the hypothesis is true or
false.
In hypothesis testing, we first formulate the hypotheses to be tested. In the formulation of
the hypotheses, we take note of the following:
• There are two kinds of statistical hypotheses: the null and the alternative hypothesis. A
null hypothesis is the statement or claim or conjecture to be tested while
an alternative hypothesis is the claim that is accepted in case the null
hypothesis is rejected. The symbol “Ho” is used to represent a null hypothesis
while “Ha” is used to represent an alternative hypothesis. The statement “The average
daily number of text messages that a Grade 11 student sends is equal to 100.” is
considered a null hypothesis. In the event that we reject this claim, we can accept
another statement which states otherwise, that is, “The average daily number of text
messages that a Grade 11 student sends is not equal to 100.” This statement is our
alternative hypothesis.
• In formulating the hypotheses, we can use the following guidelines:
1. A null hypothesis is generally a statement of no change. Thus, a statement
of equality or one which involves the equality is usually considered in the null
hypothesis. Possible forms of the null hypothesis include (a) equality; (b) less than or
equal; and (c) greater than or equal.
2. The statistical hypothesis is about a parameter or distribution of the
population values. For example, the parameter in the statement is the average
daily number of text messages that a Grade 11 student sends. Usually, the
parameter is represented by a symbol; for the population mean, we use µ.
Hence, the null and alternative hypotheses could be stated using symbols as “Ho: µ
= 100 against Ha: µ ≠ 100.”
3. The null and alternative hypotheses are complementary and must not
overlap. The usual pairs are as follows:
(a) Ho: Parameter = Value versus Ha: Parameter ≠ Value;
(b) Ho: Parameter = Value versus Ha: Parameter < Value;
(c) Ho: Parameter = Value versus Ha: Parameter > Value;
(d) Ho: Parameter ≤ Value versus Ha: Parameter > Value; and
(e) Ho: Parameter ≥ Value versus Ha: Parameter < Value
• As discussed earlier, there are two actions that one can make on the hypothesis. One
can either reject or fail to reject (accept) a hypothesis. The table below shows
these actions:
Action | Hypothesis is TRUE | Hypothesis is FALSE
Reject the hypothesis | Error committed | No error committed
Fail to reject (accept) the hypothesis | No error committed | Error committed
• The table shows that there are no errors committed when we reject a false hypothesis
and when we fail to reject a true hypothesis. On the other hand, an error is
committed when we reject a true hypothesis and such error is called a
Type I error. Also, when we fail to reject (accept) a false hypothesis, we are
committing a Type II error.
• As mentioned earlier, for every action that one takes, there are consequences. When we
commit an error, there are consequences, too. Since it is an error in decision making,
the consequences may be tolerable or too severe, severe enough to cost lives. In
Statistics, we measure the chance of committing the error so we will have a basis in
making a decision.
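As an optional aid, the decision table above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_outcome` and its arguments are hypothetical, and in a real test the true state of the hypothesis is never known to the person making the decision.

```python
def classify_outcome(null_is_true: bool, rejected: bool) -> str:
    """Label the outcome of a decision, given the (normally unknown) truth."""
    if rejected and null_is_true:
        return "Type I error"      # rejected a true hypothesis
    if not rejected and not null_is_true:
        return "Type II error"     # failed to reject a false hypothesis
    return "no error"              # the decision matches the truth


# Print all four cells of the decision table.
for null_is_true in (True, False):
    for rejected in (True, False):
        action = "reject" if rejected else "fail to reject"
        truth = "TRUE" if null_is_true else "FALSE"
        print(f"Hypothesis is {truth}, we {action}: "
              f"{classify_outcome(null_is_true, rejected)}")
```

Learners can use the four printed lines to check their own answers when they identify Type I and Type II situations in the assessment problems.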
ASSESSMENT
As an assessment, choose one of the following problems and ask learners to formulate the
appropriate null and alternative hypotheses. You can also ask them to identify situations where Type
I and Type II errors are committed. Have them state the possible consequences of these errors.
1. A manufacturer of IT gadgets recently announced they had developed a new battery for a tablet
and claimed that it has an average life of at least 24 hours. Would you buy this battery?
Answer: The null hypothesis can be stated as Ho: The average life of the newly developed
battery for a tablet is at least 24 hours while the alternative hypothesis is Ha: The
average life of the newly developed battery for a tablet is less than 24 hours. Type I
error is committed when you did not buy the battery and a possible consequence is
you lost the opportunity to have a battery that could last for at least 24 hours. On the
other hand, Type II error is committed when you did buy the battery and found out
later that the battery’s life was less than 24 hours. A possible consequence of this
Type II error is that you wasted your money in buying the battery.
2. A teenager who wants to lose weight is contemplating following a diet she read about on
Facebook. She wants to adopt it but, unfortunately, following the diet requires buying
nutritious, low-calorie, yet expensive food. Help her decide.
Answer: The null hypothesis can be stated as Ho: The diet will not result in a change in her
weight while the alternative hypothesis is Ha: The diet will induce a reduction in her
weight. Type I error is committed when the teenager did follow the diet and a
possible consequence is that she spent unnecessarily on a diet that did not help her
reduce weight. On the other hand, Type II error is committed when the teenager did
not follow the diet. A possible consequence of this Type II error is that the teenager
lost the opportunity to attain her goal of weight reduction.
3. Alden is exclusively dating Maine. He remembers that on their first date, Maine told him that her
birthday was this month. However, he forgot the exact date. Ashamed to admit that he did not
remember, he decides to use hypothesis testing to make an educated guess that today is
Maine’s birthday. Help Alden do it.
Answer: The null hypothesis can be stated as Ho: Today is Maine’s birthday while the alternative
hypothesis is Ha: Maine’s birthday is on another day and not today. Type I error is
committed when Alden’s guess of Maine’s birthday is not on this day and a possible
consequence is that Alden failed to greet or give Maine a birthday gift today. On the
other hand, Type II error is committed when Alden guessed that today is Maine’s
birthday. A possible consequence of this Type II error is that Alden made the mistake
of greeting Maine a happy birthday on that day.
4. After senior high school, Lilifut is pondering whether or not to pursue a degree in Statistics. She
was told that if she graduates with a degree in Statistics, a life of fulfilment and happiness awaits
her. Assist her in making a decision.
Answer: The null hypothesis can be stated as Ho: Life of fulfillment and happiness awaits her
after obtaining a degree in Statistics while the alternative hypothesis is Ha: Life of
fulfillment and happiness does not happen after obtaining a degree in Statistics.
Type I error is committed when Lilifut does not pursue a degree in Statistics and a
possible consequence is that she’ll miss the promised life of fulfilment and
happiness after obtaining a degree in Statistics. On the other hand, Type II error is
committed when Lilifut decides to obtain a degree in Statistics. A possible
consequence of this Type II error is that Lilifut will not experience the life of
fulfilment and happiness she expected after obtaining a degree in Statistics.
5. An airline company regularly does quality control checks on airplanes. Tire inspection is included
since tires are sensitive to the heat produced when the airplane passes through the airport’s
runway. The company, since its operation, uses a particular type of tire which is guaranteed to
perform even at a maximum surface temperature of 107°C. However, the tires cannot be used
and need to be replaced when surface temperature exceeds a mean of 107°C. Help the
company decide whether or not to do a complete tire replacement.
Answer: The null hypothesis can be stated as Ho: The surface temperature of the tires is at most
107°C while the alternative hypothesis is Ha: The surface temperature of the tires is
greater than 107°C. Type I error is committed when the airline company orders a tire
replacement when in fact it is not needed. A possible consequence of this is that the
company will waste money in replacing the tires. On the other hand, Type II error is
committed when the airline company does not order tire replacement. A possible
consequence of this Type II error is an accident that may happen because of non-
replacement of the tires.
Multiple Choice
1. Which of the following would be an appropriate null hypothesis?

a) The mean of a population is equal to 50.
b) The mean of a sample is equal to 50.
c) The mean of a population is greater than 50.
d) Only (a) and (c) are true.
ANSWER: A
2. Which of the following would be an appropriate null hypothesis?

a) The population proportion is less than 0.45.
b) The sample proportion is less than 0.45.
c) The population proportion is no less than 0.45.
d) The sample proportion is no less than 0.45.
ANSWER: C
3. Which of the following would be an appropriate alternative hypothesis?

a) The mean of a population is equal to 50.
b) The mean of a sample is equal to 50.
c) The mean of a population is greater than 50.
d) The mean of a sample is greater than 50.
ANSWER: C
4. Which of the following would be an appropriate alternative hypothesis?

a) The population proportion is less than 0.45.
b) The sample proportion is less than 0.45.
c) The population proportion is no less than 0.45.
d) The sample proportion is no less than 0.45.
ANSWER: A
5. A Type II error is committed when

a) we reject a null hypothesis that is true.
b) we don't reject a null hypothesis that is true.
c) we reject a null hypothesis that is false.
d) we don't reject a null hypothesis that is false.
ANSWER: D
6. A Type I error is committed when
a) we reject a null hypothesis that is true.
b) we don't reject a null hypothesis that is true.
c) we reject a null hypothesis that is false.
d) we don't reject a null hypothesis that is false.
ANSWER: A
7. Suppose we wish to test H0: µ ≤ 47 versus H1: µ > 47. What will result if we conclude that the
mean is greater than 47 when its true value is really 52?
a) We have made a Type I error.
b) We have made a Type II error.
c) We have made a correct decision.
d) None of the above is correct.
ANSWER: C
8. If, as a result of a hypothesis test, we reject the null hypothesis when it is false, then we have
committed
a) a Type II error.
b) a Type I error.
c) no error.
d) an acceptance error.
ANSWER: C
9. The owner of a local restaurant has recently surveyed a random sample of n = 250 customers
of the restaurant. She would now like to determine whether or not the mean age of her
customers is over 30. If so, she planned to provide background music to appeal to an older
crowd. If not, no changes would be made to the background music in the restaurant. The
appropriate hypotheses to test are:
a) H0 : µ ≥ 30 versus H1 : µ < 30.
b) H0 : µ ≤ 30 versus H1 : µ > 30.
c) H0 : X̄ ≥ 30 versus H1 : X̄ < 30.
d) H0 : X̄ ≤ 30 versus H1 : X̄ > 30.
ANSWER: B
10. A major telco is considering opening a new telecom center in an area that currently does not
have any such centers. The telco will open the center if there is evidence that more than
5,000 of the 20,000 households in the area use the telco. It conducts a poll of 300 randomly
selected households in the area and finds that 96 subscribe to the telco. State the test of
interest to the telco.
a) H0 : p ≤ 0.32 versus H1 : p > 0.32
b) H0 : p ≤ 0.25 versus H1 : p > 0.25
c) H0 : p ≤ 5,000 versus H1 : p > 5,000
d) H0 : µ ≤ 5,000 versus H1 : µ > 5,000
ANSWER: B
Lesson 2: Steps in Hypothesis Testing


• Identify the steps in hypothesis testing
• Illustrate level of significance and corresponding rejection region
• Calculate the probabilities of committing an error in a test of hypothesis
LESSON OUTLINE
A. Introduce the steps in hypothesis testing procedure
B. Define level of significance and its role in hypothesis testing
C. Illustrate the corresponding rejection region based on a given level of
significance
D. Compute the probabilities of committing an error in a test of hypothesis
A test of hypothesis is a series of steps that starts with the formulation of the null and the
alternative hypotheses and ends with stating the conclusion. Each step has several
components to consider. It has parallelism with court proceedings which could be used as a
motivational activity.
As a motivational activity, ask learners how a court trial proceeds based on their knowledge.
Guide them by citing a popular case and letting them identify the steps to come up with a
verdict for the case. For example, take the case of former President Marcos’ ill-gotten
wealth case. List the steps that the learners identified. They may mention the following:
1. State the accusation against the family of former President Marcos.

2. Choose the jury. Set or review the guidelines to be used in the decision-making
process.
3. Present the evidence.
4. Decide on the matter, based on the evidence.
5. State the verdict, based on the decision made.
Discuss the results of this activity with learners, emphasizing that the steps in a court
proceeding are similar to the steps in conducting a test of hypothesis.
• As in a court proceeding, the first step is to state the accusation or the statement of
what will be evaluated as true or false. Parallel to hypothesis testing, we first formulate
the hypotheses to be tested. Remember that we do not know the true state of
nature of the hypothesis, that is, whether the hypothesis is true or false. Like in a court
trial, we do not know whether the accused is guilty or not.
• As discussed in the previous lesson, there are two types of hypotheses to state: the null
and the alternative. In a court proceeding, we can state the null hypothesis as Ho: The
accused is not guilty, while the alternative is stated as Ha: The accused is guilty.
• The second step is to state the decision rule that we will follow in making a
decision on whether to reject or fail to reject (accept) the null hypothesis.
In a court proceeding, it is a guideline that the court uses to evaluate the quantity and
quality of the evidence to be presented. Based on this guideline, the court decides
whether to reject or accept the hypothesis that the accused is not guilty. To be able to
specify the decision rule in a hypothesis testing procedure, there is a need to specify
the components of the rule. These components include the following:
1. We specify a level of significance, which is usually denoted as α, in doing the test of
hypothesis. It is the same α that we encountered in the discussion of the (1-α)100%
confidence interval estimate. A level of significance is the probability of
rejecting a true null hypothesis or committing a Type I error in the test
of hypothesis. Since it is a probability of committing an error, it is between 0 and 1
and is usually set to a small value.
2. We identify the test statistic to use in the decision rule. Usually, the test statistic
is a standardized expression of the point estimator of the parameter
identified in the hypothesis. The distribution of the test statistic also
needs to be specified.
3. Part of the decision rule is the specification of the rejection region. The rejection
region is that part of the distribution of the test statistic where we
reject the null hypothesis.
An example of a decision rule is stated as follows:
“At a given α = 0.05, we reject Ho if the computed test statistic (denoted
as tc) is greater than a tabular value of the t distribution with n-1 degrees
of freedom. Otherwise, we fail to reject Ho.”
In this decision rule, the level of significance is set at α equal to 0.05 and the test
statistic is denoted by tc which is assumed to follow the Student’s t-distribution with n-1
degrees of freedom. The rejection region is the area to the right of the tabular value
obtained from the Student’s t-distribution with n-1 degrees of freedom. Such rejection
region is illustrated in the following figure.
[Figure: Student’s t-distribution with the rejection region shaded to the right of the tabular value, ttab]
• The third step is then to compute the value of the test statistic using a random
sample of observations gathered or collected for the purpose of the test
of hypothesis. In a court proceeding, this is the time that the gathered evidence is
presented.
• With the computed value of the test statistic, the next step is to use the decision rule to
make a decision whether to reject or fail to reject (accept) the null
hypothesis. As in a court proceeding, the jury or the court will decide whether the
accused is guilty or not based on the evidence presented.
• Lastly, as a consequence of the decision, conclusions are made which are in
relation to the purpose of the test of hypothesis. In a court proceeding, this is
the time when the court gives its verdict on the accused. In both scenarios, this last step
is the most awaited part of the procedure.
At this point, we can summarize the steps as follows:

1. Formulate the null and alternative hypotheses.
2. Identify the test statistic to use. With the given level of significance and the
distribution of the test statistic, state the decision rule and specify the rejection
region.
3. Using a simple random sample of observations, compute the value of the test
statistic.
4. Make a decision whether to reject or fail to reject (accept) Ho.
5. State the conclusion.
Note that in drawing conclusions based on the test of hypothesis, we are not 100% sure
of our decisions or of our conclusions. Like in a court proceeding when the court
declares the accused is guilty, the decision is not made with certainty. In other words, the
court is not 100% sure that the accused is guilty. There is still that possibility that the
accused is not guilty and the court is making an error with its decision.
Recall that there are two types of errors that one can commit in decision making and these
are Type I and Type II error. Type I error is rejecting a hypothesis when in fact the
hypothesis is true. Thus, the court commits a Type I error when it declares that the accused
is guilty when in fact the accused is not guilty of the crime. In other words, the court had
convicted an innocent person. On the other hand, Type II error is committed when a false
hypothesis is accepted, that is, the court freed a person guilty of the crime.
Note: if there is still time, let learners discuss the consequences of committing the two
types of error and determine which of the two types of error—(1) convicting an innocent
person or (2) freeing a person guilty of the crime—has greater consequence.
However, unlike in a court proceeding, statistics can allow us to compute the probability of
committing an error in decision making. The probability of committing Type I error is
defined earlier as the level of significance and it is denoted by α. On the other hand, the
probability of committing Type II error is usually denoted by β. Also, in statistics the
decisions on the test of hypothesis are made with the given probabilities of Type I and Type
II error. We have the assurance that the test procedures that we use in statistics were
formulated with minimum probabilities of committing an error.
The probability of committing an error is a conditional probability problem. It is the
probability of making a decision based on the uncertainty of the true state of nature of the
hypothesis being tested. You may use the numerical example provided in the previous
lesson in this chapter to illustrate the computation of the probabilities of committing Type I
and Type II errors.
Numerical Example: Consider testing the null hypothesis “The average daily number of text
messages that a Grade 11 student sends is equal to 100” against the alternative hypothesis
“The average daily number of text messages that a Grade 11 student sends is
greater than 100.” A random sample of 16 students was selected and interviewed, and the
daily number of text messages each student sends was obtained. The null hypothesis is
rejected if the sample mean is at least 102; otherwise, the null hypothesis is accepted
(we fail to reject Ho). It is assumed that the number of text messages that a Grade 11
student sends in a day follows a normal distribution with standard deviation equal to 5 text
messages.
Computing for the probability of committing Type I error, we have
$$\alpha = P(\bar{X} \ge 102 \mid \mu = 100) = P\left(Z \ge \frac{102 - 100}{5/\sqrt{16}}\right) = P(Z \ge 1.60) = 0.0548$$
Thus, we say the probability of rejecting a true null hypothesis is 0.0548 or we say that on
the average, we are assured with 94.52% (1-0.0548 = 0.9452) confidence that we are
making a correct decision in accepting a true null hypothesis.
The alternative hypothesis is stated as “The average daily number of text messages that a
Grade 11 student sends is greater than 100.” If we assume that the true distribution of the
number of text messages that a Grade 11 student sends in a day follows a normal
distribution with a mean of 103 and a standard deviation equal to 5 text messages, then the
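computed probability of Type II error is
$$\beta = P(\bar{X} < 102 \mid \mu = 103) = P\left(Z < \frac{102 - 103}{5/\sqrt{16}}\right) = P(Z < -0.80) = 0.2119$$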
In this case, the probability of accepting a false null hypothesis or accepting Ho given that
the average number of text messages that a Grade 11 student sends in a day is indeed 103
(greater than 100) is computed as 0.2119.
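For teachers who want to check the two probabilities in the numerical example with a computer, here is a minimal sketch using Python's standard library. The variable names are illustrative only. It assumes, as the example does, that the sample mean is normally distributed with standard deviation 5/√16.

```python
from math import sqrt
from statistics import NormalDist

sigma = 5        # population standard deviation (given)
n = 16           # sample size (given)
cutoff = 102     # reject Ho when the sample mean is at least 102
mu_null = 100    # mean under the null hypothesis
mu_true = 103    # assumed true mean used for the Type II error

se = sigma / sqrt(n)  # standard error of the sample mean, 1.25

# Type I error: reject Ho (sample mean >= 102) when mu is really 100.
alpha = 1 - NormalDist(mu_null, se).cdf(cutoff)

# Type II error: fail to reject Ho (sample mean < 102) when mu is really 103.
beta = NormalDist(mu_true, se).cdf(cutoff)

print(f"alpha = {alpha:.4f}")  # about 0.0548
print(f"beta  = {beta:.4f}")   # about 0.2119
```

Changing `cutoff` lets learners see the trade-off: moving the cutoff higher lowers alpha but raises beta.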
ASSESSMENT
As an assessment, consider the following problem and ask students to do what is being asked for.
1. The Graduate Record Exam (GRE) is a standardized test required to be admitted to many
graduate schools in the United States. A high score in the GRE makes admission more likely.
According to the Educational Testing Service, the mean score for takers of GRE who do not
have training courses is 555 with a standard deviation of 139. Brain Philippines (BP) offers
expensive GRE training courses, claiming their graduates score better than those who have
not taken any training courses. To test the company’s claim, a statistician randomly selected
30 graduates of BP and asked their GRE scores.
a. Formulate the appropriate null and alternative hypotheses.
Answer: Ho: Graduates of BP courses did not score better than 555 while Ha:
Graduates of BP courses did score better than 555.
b. Identify situations when Type I and Type II errors are committed and state their possible
consequences.
Answer: Type I error is committed when we declare that the company’s claim is true
when in fact BP graduates do not perform better than 555 and a possible
consequence is that the tuition fee paid for the training is wasted. On the
other hand, Type II error is committed when we declare that BP’s claim is false
when in fact BP graduates do score better than 555 and a possible
consequence is that the opportunity to score better than 555 is lost.
c. Suppose the decision rule is “Reject Ho if the mean score of the sampled BP graduates
is greater than 590; otherwise, fail to reject Ho.” Compute for the level of significance for
this test. Also, find the risk of concluding that the BP graduates did not score better than
555 when in fact the mean score is 600.
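Answer: The probability of Type I error is the same as the level of significance denoted
by α.
$$\alpha = P(\bar{X} > 590 \mid \mu = 555) = P\left(Z > \frac{590 - 555}{139/\sqrt{30}}\right) \approx P(Z > 1.38) = 0.0838$$
On the other hand, the risk of concluding that the BP graduates did not score better
than 555 when in fact their mean score is 600 is the probability of committing Type II error and
such risk or probability is computed as follows:
$$\beta = P(\bar{X} \le 590 \mid \mu = 600) = P\left(Z \le \frac{590 - 600}{139/\sqrt{30}}\right) \approx P(Z \le -0.39) = 0.3483$$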
2. Consider a manufacturing process that is known to produce bulbs that have life lengths with
a standard deviation of 75 days. A potential customer will purchase bulbs from the company
that manufactures the bulbs if she is convinced that the average life of the bulbs is 1550
days.
a. Formulate the appropriate null and alternative hypotheses.
Answer: Ho: The average life of the bulbs is at least 1550 days (µ ≥ 1550), against
Ha: The average life of the bulbs is less than 1550 days (µ < 1550).
b. Identify situations when Type I and Type II errors are committed and state their possible
consequences.
Answer: Type I error is committed when we declare that the average life is less than 1550
days when in fact the average life is 1550 days or more. On the other hand, Type II error is
committed when we declare that the average is at least 1550 days, when in fact, it is less
than 1550 days.
c. Suppose the decision rule is “Reject Ho if a random sample of 50 bulbs has a mean life of less
than 1532 days; otherwise, fail to reject Ho.” Compute the level of significance
for this test. Also, find the risk of concluding that the average is at least 1550 days
when in fact the mean life is 1500 days.
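Answer: The probability of Type I error is the same as the level of significance denoted by
α.
$$\alpha = P(\bar{X} < 1532 \mid \mu = 1550) = P\left(Z < \frac{1532 - 1550}{75/\sqrt{50}}\right) \approx P(Z < -1.70) = 0.0446$$
On the other hand, the risk of concluding that the mean is at least 1550 days, when in
fact it is 1500 days, is:
$$\beta = P(\bar{X} \ge 1532 \mid \mu = 1500) = P\left(Z \ge \frac{1532 - 1500}{75/\sqrt{50}}\right) \approx P(Z \ge 3.02) = 0.0013$$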
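The two probabilities for the bulb problem can also be checked with a short Python sketch. It is offered only as a cross-check, with illustrative variable names, and it assumes the sample mean of 50 bulbs is approximately normal with standard deviation 75/√50. Here the rejection region is in the left tail, so the roles of the two tails are reversed compared with a right-tailed test. Values read from a Z-table may differ slightly in the last digit because the table rounds z to two decimal places.

```python
from math import sqrt
from statistics import NormalDist

sigma = 75       # population standard deviation of bulb life, in days (given)
n = 50           # sample size (given)
cutoff = 1532    # reject Ho when the sample mean is below 1532 days
mu_null = 1550   # boundary value of the null hypothesis (mu >= 1550)
mu_true = 1500   # assumed true mean used for the Type II error

se = sigma / sqrt(n)  # standard error of the sample mean

# Type I error: sample mean falls below the cutoff although mu = 1550.
alpha = NormalDist(mu_null, se).cdf(cutoff)

# Type II error: sample mean stays at or above the cutoff although mu = 1500.
beta = 1 - NormalDist(mu_true, se).cdf(cutoff)

print(f"alpha = {alpha:.4f}")
print(f"beta  = {beta:.4f}")
```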
Lesson 3: Test on Population Mean (Part 1)


• Formulate appropriate null and alternative hypotheses on the population mean
• Identify the appropriate form of the test statistic on the population mean when the
population variance is assumed to be known
• Identify the appropriate rejection region for a given level of significance when the
population variance is assumed to be known
• Conduct the test of hypothesis on population mean when the population variance is
assumed to be known
LESSON OUTLINE
A. Introduction of the possible null and alternative hypotheses on population mean
B. Steps in hypothesis testing on population mean when the population variance is
assumed to be known
C. Illustration of the test of hypothesis on the population mean when the
population variance is assumed to be known
As a review, recall the steps of hypothesis testing procedure discussed in the previous
lesson. You may either give it as a quiz or in the form of recitation. Or, write it on the board
to serve as a guide in the development of the lesson. The following are the steps identified
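in the previous lesson:
1. Formulate the null and alternative hypotheses.
2. Identify the test statistic to use. With the given level of significance and the
distribution of the test statistic, state the decision rule and specify the rejection
region.
3. Using a simple random sample of observations, compute the value of the test
statistic.
4. Make a decision whether to reject or fail to reject Ho.
5. State the conclusion.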
As a motivational activity, you may use the first step of the procedure which is to formulate
the appropriate null and alternative hypotheses. This way, you can also review the learners
on how to formulate the null and alternative hypotheses by asking them to do it with each
of the following real life problems. Through this, learners will be able to identify the
population parameter of interest in the problem.
Here are some real life problem situations that you can use:
1. The father of a senior high school student lists down the expenses he will incur when he sends
his daughter to the university. At the university where he wants his daughter to study, he hears that
the average tuition fee is at least Php20,000 per semester. He wants to do a test of hypothesis.
In this problem, the parameter of interest is the average tuition fee or the true population mean of
the tuition fee. In symbol, this parameter is denoted as µ. As applied to the problem, the appropriate
null and alternative hypotheses are:
Ho: The average tuition fee in the targeted university is at least Php20,000. In symbols, Ho: µ ≥
Php20,000.
Ha: The average tuition fee in the targeted university is less than Php20,000. In symbols, Ha: µ <
Php20,000.
2. The principal of an elementary school believes that this year, there would be more students from
the school who would pass the National Achievement Test (NAT), so that the proportion of students
who passed the NAT is greater than the proportion obtained in previous year, which is 0.75. What
will be the appropriate null and alternative hypotheses to test this belief?
In this problem, the parameter of interest is the proportion of students of the school who passed the
NAT this year. In symbol, this parameter is denoted as P. As applied to the problem, the appropriate
null and alternative hypotheses are:
Ho: The proportion of students of the school who passed the NAT this year is equal to 0.75. In
symbols, Ho: P = 0.75.
Ha: The proportion of students of the school who passed the NAT this year is greater than 0.75. In
symbols, Ha: P > 0.75.
Discuss the results of this activity with learners, pointing out the following.
• A statistical hypothesis is a statement about a parameter and deals with evaluating the value of
the parameter.
• The null and alternative hypotheses should be complementary and non-overlapping.
• Generally, the null hypothesis is a statement of equality or includes the equality condition as in
the case of ‘at least’ (greater than or equal) or ‘at most’ (less than or equal).
Choose the first problem or the problem on the average tuition fee to further develop the lesson. In
the problem, the parameter is the population mean. To identify the test statistic, which is part of the
second step, certain assumptions have to be made.
• Assuming that the population variance (σ²) is known and that the variable of interest is measured
at least in the interval scale and follows the normal distribution, the appropriate test statistic,
denoted as ZC, is computed as $Z_C = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, where $\bar{x}$ is the sample mean computed from a
simple random sample of n observations; µ0 is the hypothesized value of the parameter; and σ is
the population standard deviation. The test statistic follows the standard normal distribution
which means the tabular value in the Z-table will be used as critical or tabular value. With this,
the decision rule can be one of the following possibilities:
1. Reject the null hypothesis (Ho) if ZC < -Zα. Otherwise, we fail to reject Ho.
2. Reject the null hypothesis (Ho) if ZC > Zα. Otherwise, we fail to reject Ho.
3. Reject the null hypothesis (Ho) if |ZC| > Zα/2. Otherwise, we fail to reject Ho.
For the problem, the first is the appropriate decision rule. Suppose the level of significance (α) is
set at 0.05, then the decision rule for the problem could be stated as “Reject Ho if ZC < -Z0.05 = -1.645.
Otherwise, we fail to reject Ho.” Note that this test procedure is referred to as the “one-tailed
Z-test for the population mean when the population variance is known” and the
rejection region is the left tail of the standard normal distribution:
[Figure: standard normal distribution with the rejection region shaded to the left of -Zα = -1.645]
• The third step is to compute the value of the test statistic using a random sample of
observations gathered or collected for the purpose of the test of hypothesis. Suppose from a
simple random sample of 16 students, a sample mean of Php19,750 was obtained. Further, the
variable of interest, which is the tuition fee in the university, is said to be normally distributed
with an assumed population variance equal to 160,000 (so that σ = Php400). Hence, the computed test statistic is
$$Z_C = \frac{19{,}750 - 20{,}000}{400/\sqrt{16}} = \frac{-250}{100} = -2.50$$
• With the computed value of the test statistic equal to -2.50, the next step is to use the decision
rule to make a decision and this is to reject Ho.
• Lastly, as a consequence of the decision, conclusions are made which are in relation to the
purpose of the test of hypothesis. With the rejection of the null hypothesis, the father can then
say that the average tuition fee in the university where he wants his daughter to study is less than
Php20,000.
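The five steps for the tuition fee problem can be traced with a short Python sketch. This is only one possible way to carry out the computation; the variable names are illustrative, and the critical value is taken from the standard normal distribution in Python's standard library instead of a printed Z-table.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: Ho: mu >= 20000 versus Ha: mu < 20000 (left-tailed test).
mu0 = 20000

# Step 2: Z test with known variance; reject Ho if z_c < -z_alpha.
alpha = 0.05
z_crit = NormalDist().inv_cdf(alpha)   # about -1.645

# Step 3: compute the test statistic from the sample.
xbar = 19750             # sample mean (given)
n = 16                   # sample size (given)
sigma = sqrt(160000)     # population standard deviation, 400
z_c = (xbar - mu0) / (sigma / sqrt(n))

# Step 4: apply the decision rule.
reject = z_c < z_crit

# Step 5: state the conclusion.
print(f"z_c = {z_c:.2f}, critical value = {z_crit:.3f}")
print("Reject Ho: the average tuition fee is less than Php20,000."
      if reject else "Fail to reject Ho.")
```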
To summarize the lessons learned today, present the following table:

| Null Hypothesis (Ho) | Alternative Hypothesis (Ha) | Assumptions | Appropriate Test Statistic | Decision Rule and Rejection Region |
|---|---|---|---|---|
| µ = µ0 | µ ≠ µ0 | Variable of interest follows the normal distribution with known population variance (σ²) | $Z_C = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ | Reject Ho if $\lvert Z_C \rvert > Z_{\alpha/2}$. Otherwise, we fail to reject Ho. Rejection regions: both tails of the standard normal distribution, to the left of $-Z_{\alpha/2}$ and to the right of $Z_{\alpha/2}$. |
| µ = µ0 or µ ≤ µ0 | µ > µ0 | Variable of interest follows the normal distribution with known population variance (σ²) | $Z_C = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ | Reject Ho if $Z_C > Z_{\alpha}$. Otherwise, we fail to reject Ho. Rejection region: right tail of the standard normal distribution, to the right of $Z_{\alpha}$. |
| µ = µ0 or µ ≥ µ0 | µ < µ0 | Variable of interest follows the normal distribution with known population variance (σ²) | $Z_C = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ | Reject Ho if $Z_C < -Z_{\alpha}$. Otherwise, we fail to reject Ho. Rejection region: left tail of the standard normal distribution, to the left of $-Z_{\alpha}$. |
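The three decision rules for the Z-test can also be written as one small Python function. This is only a sketch: the function name `z_decision` and the labels "less", "greater" and "two-sided" are illustrative choices, not notation used in the guide.

```python
from statistics import NormalDist


def z_decision(z_c, alpha, alternative):
    """Return True when Ho is rejected under the stated alternative.

    alternative is "less" (Ha: mu < mu0), "greater" (Ha: mu > mu0)
    or "two-sided" (Ha: mu != mu0).
    """
    std_normal = NormalDist()
    if alternative == "less":
        return z_c < -std_normal.inv_cdf(1 - alpha)
    if alternative == "greater":
        return z_c > std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        return abs(z_c) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# The same computed statistic can lead to different decisions
print(z_decision(-2.50, 0.05, "less"))       # left-tail rule
print(z_decision(1.80, 0.05, "greater"))     # right-tail rule
print(z_decision(1.80, 0.05, "two-sided"))   # two-tail rule
```

The last two calls show why the alternative hypothesis must be fixed before looking at the data: a statistic of 1.80 exceeds Z0.05 = 1.645 but not Z0.025 = 1.96.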
ASSESSMENT
Using the problem in the assessment of the previous lesson, perform the test of hypothesis for the
statistician who wants to test the claim of Brain Philippines (BP). The random sample of 30 graduates he
obtains recorded a mean score of 560 in the GRE.
Step 1: Formulate the appropriate null and alternative hypotheses.

Answer: Ho: Graduates of BP courses did not score better than 555 or in symbols, µ ≤ 555 while
Ha: Graduates of BP courses score better than 555 or in symbols, µ > 555.
Step 2: Identify the test statistic to use. With the given level of significance and the distribution of the
test statistics, state the decision rule and specify the rejection region.
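Answer: The appropriate test statistic is $Z_C = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$. With 5% level of significance, the decision rule
is “Reject the null hypothesis (Ho) if ZC > Z0.05 = 1.645. Otherwise, we fail to reject Ho.”
The rejection region is found on the right tail of the standard normal distribution as shown
below:
[Figure: standard normal curve with the rejection region shaded in the right tail, to the right of Zα = 1.645]
Step 3: Using a simple random sample of observations, compute the value of the test statistic.
Answer: The computed test statistic is $Z_C = \frac{560 - 555}{\sigma/\sqrt{30}} = 5.48$, where σ is the population
standard deviation given in the problem of the previous lesson.
Step 4: Make a decision whether to reject or fail to reject Ho.

Answer: With the computed test statistic equal to 5.48, which is greater than 1.645, the null hypothesis is rejected.
Step 5: State the conclusion.

Answer: We then say that, at the 5% level of significance, the graduates of Brain Philippines score better than 555 on average in the GRE.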
MEETING LEARNERS’ NEEDS

• Continue to use the examples/assessments in other lessons in the future.
Lesson 4: Test on Population Mean (Part 2)

• Identify the appropriate form of test statistic on the population mean when the population
variance is assumed to be unknown
• Identify the appropriate rejection region for a given level of significance when the population
variance is assumed to be unknown
• Conduct the test of hypothesis on the population mean when the population variance is
assumed to be unknown
• Identify the appropriate form of test statistic on the population mean when the population
variance is assumed to be unknown and the sample size is large enough to invoke the
Central Limit Theorem
• Identify the appropriate rejection region for a given level of significance when the population
variance is assumed to be unknown and the sample size is large enough to invoke the
Central Limit Theorem
• Conduct the test of hypothesis on the population mean when the population variance is
assumed to be unknown and the sample size is large enough to invoke the Central Limit
Theorem
LESSON OUTLINE
A. Steps in hypothesis testing on population mean when the population variance is
assumed to be unknown
B. Illustration of the test of hypothesis on the population mean when the population
variance is assumed to be unknown
C. Steps in hypothesis testing on population mean when the population variance is
assumed to be unknown and the sample size is large enough to invoke the Central Limit
Theorem
D. Illustration of the test of hypothesis on the population mean when the population
variance is assumed to be unknown and the sample size is large enough to invoke the
Central Limit Theorem

As in the previous lesson, you may start the lesson by reviewing the steps of hypothesis testing
procedure:
1. Formulate the appropriate null and alternative hypotheses.
2. Identify the test statistic to use. With the given level of significance and the distribution of
the test statistics, state the decision rule and specify the rejection region.
3. Using a simple random sample of observations, compute the value of the test statistic.
4. Make a decision whether to reject or fail to reject (accept) Ho.
5. State the conclusion.
As a motivational activity, use the problem in the previous lesson but, in this case, emphasize that
the population variance is unknown. The problem can be stated as follows:
The father of a senior high school student lists down the expenses he will incur when he
sends his daughter to the university where he wants her to study. He hypothesizes that the
average tuition fee is at least Php20,000 per semester. He knows the variable of interest,
which is the tuition fee, is measured at least in the interval scale or specifically in the ratio
scale. He assumes that the variable of interest follows the normal distribution but both
population mean and variance are unknown. The father asks, at random, 25 students of the
university about their tuition fee per semester. He is able to get an average of Php20,050
with a standard deviation of Php500.
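• In this problem, the appropriate null and alternative hypotheses remain the same as in the
previous lesson and are stated as:
Ho: The average tuition fee in the university is at least Php20,000. In symbols, Ho: µ ≥ 20,000 pesos.
Ha: The average tuition fee in the university is less than Php20,000. In symbols, Ha: µ < 20,000 pesos.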
• With the assumption of unknown population variance (σ²), and with the variable of interest
measured at least in the interval scale and following the normal distribution, the appropriate test
statistic, denoted as tC, is computed as $t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where $\bar{x}$ and s are the sample mean and
sample standard deviation, respectively, computed from a simple random sample of n
observations; and µ0 is the hypothesized value of the parameter. The test statistic follows the
Student’s t-distribution with n-1 degrees of freedom, which means the tabular value in the
Student’s t-table will be used as the critical or tabular value. With this, the decision rule can be one
of the following possibilities:
1. Reject Ho if tC < -tα, n-1. Otherwise, we fail to reject Ho.
2. Reject Ho if tC > tα, n-1. Otherwise, we fail to reject Ho.
3. Reject Ho if |tC| > tα/2, n-1. Otherwise, we fail to reject Ho.
For the problem, the first is the appropriate decision rule. Suppose the level of significance (α) is
set at 0.05; then the decision rule for the problem can be stated as “Reject Ho if tC < -t0.05, 24 = -1.711.
Otherwise, we fail to reject Ho.” Note that this test procedure is referred to as the “one-tail
t-test for the population mean” and the rejection region is illustrated as follows:
[Figure: Student’s t-distribution with 24 df, with the rejection region shaded in the left tail, to the left of -tα, n-1 = -1.711]
• The third step in hypothesis testing procedure is to compute for the value of the test statistic
based on a random sample of observations collected. It was stated in the problem that from a
simple random sample of 25 students, a sample mean of PhP20,050 with standard deviation 500
pesos was obtained. Hence, the computed test statistic is
$t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{20{,}050 - 20{,}000}{500/\sqrt{25}} = 0.50$.
• The next step is to use the decision rule to make a decision. The computed value of the test
statistic, 0.50, does not fall in the rejection region, so the rule dictates that we fail to reject the
null hypothesis.
• Lastly, as a consequence of the decision, conclusions are to be stated. Since the null hypothesis
is not rejected, the father has no sufficient evidence against his hypothesis and can continue to say
that the average tuition fee at the university where he wanted his daughter to study is at least Php20,000.
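The t-test computation can be reproduced in Python as a quick check. The sketch below uses the sample figures from the problem (n = 25, sample mean Php20,050, s = Php500). Python's standard library has no Student's t quantile function, so the critical value is hard-coded from a t-table, which is an assumption of this sketch; the variable names are illustrative.

```python
from math import sqrt

mu_0 = 20_000    # hypothesized mean tuition fee (Php)
x_bar = 20_050   # sample mean (Php)
s = 500          # sample standard deviation (Php)
n = 25           # sample size, so df = n - 1 = 24

# Test statistic for the mean when the population variance is unknown
t_c = (x_bar - mu_0) / (s / sqrt(n))

# One-tailed critical value t_{0.05, 24} as listed in standard Student's t-tables
t_crit = 1.711

# Left-tail decision rule: reject Ho if tC < -t_{alpha, n-1}
reject_h0 = t_c < -t_crit

print(round(t_c, 2), reject_h0)
```

Because the sample mean lies above the hypothesized value, the statistic is positive and cannot fall in a left-tail rejection region.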
We proceed to the next lesson by asking learners what they will do in case the variable of interest
cannot be assumed to follow a normal distribution. Is there a way to test the hypotheses?
The answer to this question is: Yes, there is a way to do it but they must be assured that the sample
size is large enough to invoke the Central Limit Theorem they learned under the lesson on sampling
distribution of the sample mean. Let us say that for the given problem, a random sample of size 36 is
sufficient for us to invoke the theorem. Hence, we could restate the problem as follows. Notice that
we emphasize the change in the sample size to invoke the theorem.
The father of a senior high school student lists down the expenses he will incur when he
sends his daughter to the university, where he wanted her to study. He hypothesizes that the
average tuition fee is at least Php20,000 per semester. He knows the variable of interest,
which is the tuition fee, is measured at least in the interval scale or specifically in the ratio
scale. He assumes that the variable of interest follows a distribution with
unknown population mean and variance. The father asks, at random, 36 students of
the university about their tuition fee per semester. He is able to get an average of PhP20,250
with a standard deviation of 400 pesos.
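• In this problem, the appropriate null and alternative hypotheses remain the same as in the
previous lesson and are stated as follows:
Ho: The average tuition fee in the university is at least Php20,000. In symbols, Ho: µ ≥ 20,000 pesos.
Ha: The average tuition fee in the university is less than Php20,000. In symbols, Ha: µ < 20,000 pesos.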
• With the assumption of unknown distribution of the variable of interest as well as its population
variance (σ²) but with a sample size large enough to invoke the Central Limit Theorem, the test
statistic, denoted as tC, which was used earlier, is still appropriate to use. This test statistic is
computed as $t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where $\bar{x}$ and s are the sample mean and sample standard deviation,
respectively, computed from a simple random sample of n observations; and µ0 is the
hypothesized value of the parameter. However, this time with the Central Limit Theorem, we can
assume that the test statistic follows the standard normal distribution, which means the tabular
value in the Z-table will be used as the critical or tabular value. With this, the decision rule can be one of
the following possibilities:
1. Reject Ho if tC < -Zα. Otherwise, we fail to reject Ho.
2. Reject Ho if tC > Zα. Otherwise, we fail to reject Ho.
3. Reject Ho if |tC| > Zα/2. Otherwise, we fail to reject Ho.
For the problem, the first is the appropriate decision rule. Suppose the level of significance (α) is
set at 0.05; then the decision rule for the problem can be stated as “Reject Ho if tC < -Z0.05 = -1.645.
Otherwise, we fail to reject Ho.” The rejection region is illustrated as follows:
[Figure: standard normal curve with the rejection region shaded in the left tail, to the left of -Zα = -1.645]
• The third step in hypothesis testing procedure is to compute for the value of the test statistic
based on a random sample of observations collected. It was stated in the problem that from a
simple random sample of 36 students, a sample mean of PhP20,250 with standard deviation 400
pesos was obtained. Hence, the computed test statistic is
$t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{20{,}250 - 20{,}000}{400/\sqrt{36}} = 3.75$.
• The next step is to use the decision rule to make a decision. The computed value of the test
statistic, 3.75, is not less than -1.645, so the rule dictates that we fail to reject the null hypothesis.
• Lastly, as a consequence of the decision, conclusions are to be stated. Since the null hypothesis
is not rejected, the sample gives the father no sufficient evidence that the average tuition fee at the
university where he wanted his daughter to study is less than Php20,000.
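| Null Hypothesis (Ho) | Alternative Hypothesis (Ha) | Assumptions | Appropriate Test Statistic | Decision Rule and Rejection Region |
|---|---|---|---|---|
| µ = µ0 | µ ≠ µ0 | Variable of interest follows a normal distribution with unknown population variance (σ²). | $t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ | Reject Ho if $\lvert t_C \rvert > t_{\alpha/2,\,n-1}$. Otherwise, we fail to reject Ho. Rejection regions: both tails of the Student’s t-distribution with n-1 df, to the left of $-t_{\alpha/2,\,n-1}$ and to the right of $t_{\alpha/2,\,n-1}$. |
| µ = µ0 or µ ≤ µ0 | µ > µ0 | Variable of interest follows a normal distribution with unknown population variance (σ²). | $t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ | Reject Ho if $t_C > t_{\alpha,\,n-1}$. Otherwise, we fail to reject Ho. Rejection region: right tail of the Student’s t-distribution with n-1 df, to the right of $t_{\alpha,\,n-1}$. |
| µ = µ0 or µ ≥ µ0 | µ < µ0 | Variable of interest follows a normal distribution with unknown population variance (σ²). | $t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ | Reject Ho if $t_C < -t_{\alpha,\,n-1}$. Otherwise, we fail to reject Ho. Rejection region: left tail of the Student’s t-distribution with n-1 df, to the left of $-t_{\alpha,\,n-1}$. |
| µ = µ0 | µ ≠ µ0 | Variable of interest follows an unknown distribution but uses a large sample to invoke the CLT. | $t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ | Reject Ho if $\lvert t_C \rvert > Z_{\alpha/2}$. Otherwise, we fail to reject Ho. Rejection regions: both tails of the standard normal distribution, to the left of $-Z_{\alpha/2}$ and to the right of $Z_{\alpha/2}$. |
| µ = µ0 or µ ≤ µ0 | µ > µ0 | Variable of interest follows an unknown distribution but uses a large sample to invoke the CLT. | $t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ | Reject Ho if $t_C > Z_{\alpha}$. Otherwise, we fail to reject Ho. Rejection region: right tail of the standard normal distribution, to the right of $Z_{\alpha}$. |
| µ = µ0 or µ ≥ µ0 | µ < µ0 | Variable of interest follows an unknown distribution but uses a large sample to invoke the CLT. | $t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ | Reject Ho if $t_C < -Z_{\alpha}$. Otherwise, we fail to reject Ho. Rejection region: left tail of the standard normal distribution, to the left of $-Z_{\alpha}$. |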
ASSESSMENT
The purpose of the assessment is to conduct the test of hypothesis using the appropriate
components of the test procedure. Hence, ask learners to conduct the test of hypothesis for
each of the following real-life problems.
1. The minimum wage earners of the National Capital Region are believed to be receiving
less than Php500 per day. The CEO of a large supermarket chain in the region claims to
be paying its contractual employees more than the minimum daily wage rate of Php500. To check on
this claim, a labour union leader obtained a random sample of 144 contractual employees
from this supermarket chain. The survey of their daily wage earnings resulted in an average
wage of Php510 per day with a standard deviation of Php100. The daily wage in the region is
assumed to follow a distribution with an unknown population variance. Perform a test of
hypothesis at 5% level of significance to help the labour union leader make an empirically
based conclusion on the CEO’s claim.

Step 1: Formulate the appropriate null and alternative hypotheses.
Answer: Ho: The CEO’s claim is not true or the average daily wage rate of the contractual
employees at the supermarket is less than or equal to Php500. In symbols, µ ≤
500 while Ha: The CEO’s claim is true or the average daily wage rate of the
contractual employees at the supermarket is higher than Php500. In
symbols, µ > 500.
Step 2: Identify the test statistic to use. With the given level of significance and the
distribution of the test statistics, state the decision rule and specify the rejection region.
Answer: The appropriate test statistic is $t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$. With 5% level of significance, the
decision rule is “Reject the null hypothesis (Ho) if tC > Z0.05 = 1.645. Otherwise, we fail
to reject Ho.” The rejection region is found on the right tail of the standard normal
distribution as shown below:
[Figure: standard normal curve with the rejection region shaded in the right tail, to the right of Zα = 1.645]
Step 3: Using the sample statistics obtained from a random sample of size 144, compute for
the value of the test statistic.
Answer: The computed test statistic is $t_C = \frac{510 - 500}{100/\sqrt{144}} = 1.20$.
Step 4: Make a decision whether to reject or fail to reject Ho.
Answer: With the computed test statistic equal to 1.20, which is not greater than 1.645, the null
hypothesis is not rejected.
Step 5: State the conclusion.
Answer: At the 5% level of significance, there is no sufficient evidence to support the claim of
the CEO that the average daily wage of the contractual workers at the supermarket chain is
higher than Php500.
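As a check on the large-sample test for this problem, the following Python sketch recomputes the statistic from the survey figures (n = 144, sample mean Php510, s = Php100) and applies the right-tail rule with the standard normal critical value. The variable names are illustrative, and the use of `NormalDist` in place of a printed Z-table is an assumption of the sketch.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 500     # hypothesized mean daily wage (Php)
x_bar = 510    # sample mean daily wage (Php)
s = 100        # sample standard deviation (Php)
n = 144        # sample size, large enough to invoke the Central Limit Theorem
alpha = 0.05   # level of significance

# Same form of statistic as the t-test, compared against a Z critical value
t_c = (x_bar - mu_0) / (s / sqrt(n))
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Right-tail decision rule: reject Ho if tC > Z_alpha
reject_h0 = t_c > z_alpha

print(round(t_c, 2), round(z_alpha, 3), reject_h0)
```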
2. A brand of powdered milk is advertised as having a net weight of 250 grams. A curious
consumer obtained the net weight of 10 randomly selected cans. The values obtained are:
256, 248, 242, 245, 246, 248, 250, 255, 243 and 249 grams. Is there reason to believe that
the average net weight of the powdered milk cans is less than 250 grams at 10% level of
significance? Assume the net weight is normally distributed with unknown population
variance.
Step 1: Formulate the appropriate null and alternative hypotheses.
Answer: Ho: The average net weight of the powdered milk cans is equal to 250 grams. In
symbols, µ = 250 while Ha: The average net weight of the powdered milk cans is
less than 250 grams. In symbols, µ < 250.
Step 2: Identify the test statistic to use. With the given level of significance and the
distribution of the test statistics, state the decision rule and specify the rejection region.
Answer: The appropriate test statistic is $t_C = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$. With 10% level of significance, the
decision rule is “Reject the null hypothesis (Ho) if tC < -t0.10, 9 = -1.383.
Otherwise, we fail to reject Ho.” The rejection region is found at the left tail of the
Student’s t-distribution with 9 df shown below:
[Figure: Student’s t-distribution with 9 df, with the rejection region shaded in the left tail, to the left of -t0.10, 9 = -1.383]
Step 3: Using the 10 observations, compute for the value of the test statistic.
Answer: The sample mean and sample standard deviation are computed as
$\bar{x} = 248.2$ grams and $s \approx 4.61$ grams, respectively. The
computed test statistic is $t_C = \frac{248.2 - 250}{4.61/\sqrt{10}} \approx -1.23$.
Step 4: Make a decision whether to reject or fail to reject Ho.
Answer: With the computed test statistic equal to -1.23, which does not fall in the rejection
region, the null hypothesis is not rejected.
Step 5: State the conclusion.
Answer: At the 10% level of significance, there is no sufficient evidence that the average net
weight of the powdered milk cans is less than 250 grams, so the advertised net weight of
µ = 250 grams is not contradicted by the sample.
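Since this problem starts from raw observations, it is a natural place for a short Python sketch that computes the sample mean, the sample standard deviation and the test statistic directly from the ten net weights. The critical value is hard-coded from a Student's t-table (an assumption of the sketch, since the standard library has no t quantile function), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Net weights (grams) of the 10 randomly selected cans
weights = [256, 248, 242, 245, 246, 248, 250, 255, 243, 249]
mu_0 = 250   # advertised net weight (grams)

n = len(weights)
x_bar = mean(weights)   # sample mean
s = stdev(weights)      # sample standard deviation (divisor n - 1)

# Test statistic for the mean when the population variance is unknown
t_c = (x_bar - mu_0) / (s / sqrt(n))

# One-tailed critical value t_{0.10, 9} as listed in standard Student's t-tables
t_crit = 1.383

# Left-tail decision rule: reject Ho if tC < -t_{alpha, n-1}
reject_h0 = t_c < -t_crit

print(round(x_bar, 1), round(s, 2), round(t_c, 2), reject_h0)
```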
MEETING LEARNERS’ NEEDS
Lesson 5: Test on Population Proportion

• Formulate appropriate null and alternative hypotheses on the population proportion
• Identify the appropriate form of the test statistic on the population proportion when
the sample size is large enough to invoke the Central Limit Theorem
• Identify the appropriate rejection region for a given level of significance when the
sample size is large enough to invoke the Central Limit Theorem
• Conduct the test of hypothesis on population proportion when the sample size is
large enough to invoke the Central Limit Theorem
LESSON OUTLINE
a. Introduction of the possible null and alternative hypotheses on population
proportion
b. Steps in hypothesis testing on population proportion when the sample size is large
enough to invoke the Central Limit Theorem
c. Illustration of the test of hypothesis on the population proportion when the sample
size is large enough to invoke the Central Limit Theorem

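As in the previous lesson, start the lesson by reviewing the steps of hypothesis testing
procedure:
1. Formulate the appropriate null and alternative hypotheses.
2. Identify the test statistic to use. With the given level of significance and the
distribution of the test statistics, state the decision rule and specify the rejection
region.
3. Using a simple random sample of observations, compute the value of the test
statistic.
4. Make a decision on whether to reject or fail to reject (accept) Ho.
5. State the conclusion.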
As a motivational activity, you may use a problem in Lesson 3, which is stated as follows:
The principal of an elementary school believes that this year there would be more
students from the school who would pass the National Achievement Test (NAT),
so that the proportion of students who passed the NAT is greater than the
proportion obtained in the previous year, which is 0.75. What will be the appropriate
null and alternative hypotheses to test this belief?
In this problem, the parameter of interest is the proportion of students of the school who
will pass the NAT this year. In symbol, this parameter is denoted as P. As applied to the
problem, the appropriate null and alternative hypotheses are:
Ho: The proportion of students of the school who will pass the NAT this year is equal to
0.75. In symbols, Ho: P = 0.75.
Ha: The proportion of students of the school who will pass the NAT this year is greater than
0.75. In symbols, Ha: P > 0.75.
• The variable as to whether a student passes the NAT this year or not is said to follow a
Bernoulli distribution with parameter P. If we further take the number of students, out of n
students, who will pass the NAT this year as the variable of interest, then this
variable is distributed as binomial with parameters n and P. With the assumption of a
large sample to be able to invoke the Central Limit Theorem, the appropriate test
statistic, denoted as ZC, is computed as $Z_C = \frac{\hat{p} - P_0}{\sqrt{P_0(1 - P_0)/n}}$, where $\hat{p}$ is the sample
proportion computed from a simple random sample of n observations; and P0 is the
hypothesized value of the parameter. The test statistic follows the standard normal
distribution, which means the tabular value in the Z-table will be used as the critical or
tabular value. With this, the decision rule can be one of the following possibilities:
1. Reject the null hypothesis (Ho) if ZC < -Zα. Otherwise, we fail to reject Ho.
2. Reject the null hypothesis (Ho) if ZC > Zα. Otherwise, we fail to reject Ho.
3. Reject the null hypothesis (Ho) if |ZC| > Zα/2. Otherwise, we fail to reject Ho.
For the problem, the second option is the appropriate decision rule. Suppose the level
of significance (α) is set at 0.05, then the decision rule for the problem can be stated as
“Reject Ho if ZC > Z0.05 = 1.645. Otherwise, we fail to reject Ho.” Note that this test
procedure is referred to as “one-tail Z-test for population proportion” and the rejection
region is illustrated as follows:
[Figure: standard normal curve with the rejection region shaded in the right tail, to the right of Z0.05 = 1.645]
• The third step is to compute for the value of the test statistic using a random sample of
observations gathered or collected for the purpose of the test of hypothesis. Suppose
from a simple random sample of 100 students of the school, 78 students were able to
pass the NAT. Hence, the computed test statistic is
$Z_C = \frac{0.78 - 0.75}{\sqrt{0.75(1 - 0.75)/100}} = 0.6928$.
• With the computed value of the test statistic equal to 0.6928, which is not greater than 1.645,
the next step is to use the decision rule to make a decision: fail to reject Ho.
• Lastly, as a consequence of the decision, conclusions are made in relation to
the purpose of the test of hypothesis. With the non-rejection of the null hypothesis,
it can be concluded that, at 5% level of significance, there is no sufficient evidence that the
proportion of students of the school who passed the NAT this year is greater than 0.75.
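The one-tail Z-test for the proportion in this illustration can be verified with a few lines of Python. The sketch assumes the figures from the problem (78 passers out of 100 students, P0 = 0.75, α = 0.05); the variable names are illustrative, and the critical value comes from the standard library instead of a Z-table.

```python
from math import sqrt
from statistics import NormalDist

p_0 = 0.75     # hypothesized proportion of NAT passers
n = 100        # sample size
passed = 78    # number of students in the sample who passed
alpha = 0.05   # level of significance

p_hat = passed / n   # sample proportion

# Large-sample test statistic for a population proportion
z_c = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Right-tail decision rule: reject Ho if ZC > Z_alpha
reject_h0 = z_c > z_alpha

print(round(z_c, 4), round(z_alpha, 3), reject_h0)
```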
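| Null Hypothesis (Ho) | Alternative Hypothesis (Ha) | Assumptions | Appropriate Test Statistic | Decision Rule and Rejection Region |
|---|---|---|---|---|
| P = P0 | P ≠ P0 | Variable of interest follows the binomial distribution with n and P as parameters | $Z_C = \frac{\hat{p} - P_0}{\sqrt{P_0(1 - P_0)/n}}$ | Reject Ho if $\lvert Z_C \rvert > Z_{\alpha/2}$. Otherwise, we fail to reject Ho. Rejection regions: both tails of the standard normal distribution, to the left of $-Z_{\alpha/2}$ and to the right of $Z_{\alpha/2}$. |
| P = P0 or P ≤ P0 | P > P0 | Variable of interest follows the binomial distribution with n and P as parameters | $Z_C = \frac{\hat{p} - P_0}{\sqrt{P_0(1 - P_0)/n}}$ | Reject Ho if $Z_C > Z_{\alpha}$. Otherwise, we fail to reject Ho. Rejection region: right tail of the standard normal distribution, to the right of $Z_{\alpha}$. |
| P = P0 or P ≥ P0 | P < P0 | Variable of interest follows the binomial distribution with n and P as parameters | $Z_C = \frac{\hat{p} - P_0}{\sqrt{P_0(1 - P_0)/n}}$ | Reject Ho if $Z_C < -Z_{\alpha}$. Otherwise, we fail to reject Ho. Rejection region: left tail of the standard normal distribution, to the left of $-Z_{\alpha}$. |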
ASSESSMENT
Carry out a test of hypothesis to draw conclusions in relation to each of the following problems:
1. Previous evidence shows that the majority of the students are happy and contented with the
university’s policies. This year, a random sample of 100 students was drawn. They were asked if they
were happy and contented with the university’s policies. Out of 100 students, 65 said so. What
conclusions could be made at 10% level of significance?

Step 1: Formulate the appropriate null and alternative hypotheses.
Answer: Ho: At most, half of the student population are happy and contented with the university’s
policies. In symbols, P ≤ 0.50 while Ha: The majority of the student population are happy and
contented with the university’s policies. In symbols, P > 0.50.
Step 2: Identify the test statistic to use. With the given level of significance and the distribution of
the test statistics, state the decision rule and specify the rejection region.
Answer: Having the variable of interest defined as the number of students, out of n, who are happy
and contented with the university’s policies, the appropriate test statistic is $Z_C = \frac{\hat{p} - P_0}{\sqrt{P_0(1 - P_0)/n}}$.
With 10% level of significance, the decision rule is “Reject the null hypothesis (Ho) if ZC >
Z0.10 = 1.28. Otherwise, we fail to reject Ho.” The rejection region is found on the right tail
of the standard normal distribution as shown below:
[Figure: standard normal curve with the rejection region shaded in the right tail, to the right of Zα = 1.28]
Step 3: Using a simple random sample of observations, compute for the value of the test statistic.
Answer: The computed test statistic is $Z_C = \frac{0.65 - 0.50}{\sqrt{0.50(1 - 0.50)/100}} = 3.0$.
Step 4: Make a decision on whether to reject or fail to reject Ho.
Answer: With the computed test statistic equal to 3.0, which is greater than 1.28, the null hypothesis is rejected.
Step 5: State the conclusion.
Answer: We then say that, at the 10% level of significance, the majority of the student population
are happy and contented with the university’s policies.
2. An independent research group is interested in showing that the percentage of babies delivered
through Caesarean Section is decreasing. For the past years, 20% of the babies were delivered
through Caesarean Section. The research group randomly inspects the medical records of 144 births
and finds that 25 of the births were by Caesarean Section. Can the research group conclude that the
percentage of births by Caesarean Section has decreased at 5% level of significance?

Step 1: Formulate the appropriate null and alternative hypotheses.
Answer: Ho: The proportion of births that were delivered by Caesarean Section is not decreasing,
that is, it is still at least equal to 0.20. In symbols, P ≥ 0.20 while Ha: The proportion of
births that were delivered by Caesarean Section is decreasing, that is, it is less than 0.20. In
symbols, P < 0.20.
Step 2: Identify the test statistic to use. With the given level of significance and the distribution of
the test statistics, state the decision rule and specify the rejection region.
Answer: Having the variable of interest defined as the number of births out of n that were delivered
through Caesarean Section, the appropriate test statistic is $Z_C = \frac{\hat{p} - P_0}{\sqrt{P_0(1 - P_0)/n}}$. With 5%
level of significance, the decision rule is “Reject the null hypothesis (Ho) if ZC < -Z0.05 = -1.645.
Otherwise, we fail to reject Ho.” The rejection region is found on the left tail of the
standard normal distribution as shown below:
[Figure: standard normal curve with the rejection region shaded in the left tail, to the left of -Zα = -1.645]
Step 3: Using a simple random sample of observations, compute for the value of the test statistic.
Answer: The sample proportion is $\hat{p} = 25/144 \approx 0.1736$, so the computed test statistic is
$Z_C = \frac{0.1736 - 0.20}{\sqrt{0.20(1 - 0.20)/144}} \approx -0.79$.
Step 4: Make a decision on whether to reject or fail to reject Ho.
Answer: With the computed test statistic equal to -0.79, which is not less than -1.645, we fail to
reject the null hypothesis.
Step 5: State the conclusion.
Answer: We then say that, at the 5% level of significance, there is no sufficient evidence that the
proportion of births that were delivered by Caesarean Section is decreasing.
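Both assessment problems use the same large-sample statistic, so they can be evaluated directly from the counts with one small helper. The Python sketch below is illustrative only: the function name `proportion_z` is hypothetical, and the critical values come from the standard library's `NormalDist` instead of a Z-table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p_0):
    """Large-sample Z statistic for testing a population proportion."""
    p_hat = successes / n
    return (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)


std_normal = NormalDist()

# Problem 1: 65 of 100 students, Ha: P > 0.50, alpha = 0.10 (right-tail rule)
z_1 = proportion_z(65, 100, 0.50)
reject_1 = z_1 > std_normal.inv_cdf(1 - 0.10)

# Problem 2: 25 of 144 births, Ha: P < 0.20, alpha = 0.05 (left-tail rule)
z_2 = proportion_z(25, 144, 0.20)
reject_2 = z_2 < -std_normal.inv_cdf(1 - 0.05)

print(round(z_1, 2), reject_1)
print(round(z_2, 2), reject_2)
```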
MEETING LEARNERS’ NEEDS

CHAPTER 5: TEST OF HYPOTHESIS
Lesson 6: More on Hypothesis Tests Regarding the Population Proportion
OVERVIEW OF LESSON: In this lesson, learners participate in an activity to reinforce their
understanding of hypothesis testing. The lesson is largely taken from a STatistics Education
Web (STEW) lesson plan called “I Always Feel Like Somebody's Watching Me.” Learners
perform an experiment in order to test the “Psychic Staring Effect,” i.e., the idea that
people can sense they are being stared at. The activity can be used to illustrate large-
sample confidence intervals and hypothesis tests on proportions.
• Calculate a confidence interval estimate for the population proportion
• Formulate appropriate null and alternative hypotheses on the population proportion
• Identify the appropriate form of the test statistic on the population proportion when
the sample size is large enough to invoke the Central Limit Theorem
• Identify the appropriate rejection region for a given level of significance when the
sample size is large enough to invoke the Central Limit Theorem
• Conduct the test of hypothesis on population proportion when the sample size is
large enough to invoke the Central Limit Theorem
MATERIALS REQUIRED
• Scientific calculators
• Activity sheet (found at the end of this lesson)
• Some mechanism or instructions for incorporating randomness (the scientific
calculator can be used for this task).
LESSON OUTLINE
A. Point estimator of the population proportion
B. Properties of the sample proportion as point estimator of population proportion
C. Construction and interpretation of a (1-α)100% confidence interval estimator of
the population proportion using large sample
D. Illustration on the computation of a point and interval estimates of the
population proportion and its interpretation

A. ACTIVITY: “PSYCHIC STARING EFFECT”
This lesson involves an activity where learners collect and explore data. Students perform an
experiment in order to test the “Psychic Staring Effect.”
Explain first to students that the “Psychic Staring Effect” is the idea that people can sense
they are being stared at. This has been studied heavily by many different researchers, but
with different results. In 2003, Rupert Sheldrake wrote a book entitled “The Sense of Being
Stared At,” which contained anecdotal evidence for the phenomenon: “Many people have
had the experience of feeling that they are being looked at, and, on turning around, find
that they really are. Conversely, many people have stared at other people's backs, for
example in a lecture theater, and watched them become restless and then turn round.”
In the late 19th century, psychologist Edward B. Titchener suggested that the effect was an
illusion, and that when a person turned to check whether they were being watched, the
initial movement of their head might attract the focus of somebody behind them who was
previously only looking in their general direction. By the time the person had turned their
head fully, the second person would be looking directly at them, giving the mistaken
impression that they had been staring at them all along.
After introducing the “Psychic Staring Effect,” explain to learners that the goal of the
activity is to perform one of Sheldrake’s experiments, and perform a statistical analysis of
the data collected to either support or refute Sheldrake’s claims.
Data Collection
Put learners into pairs and explain the data collection procedure. One person
will be the Looker and the other, the Subject.
• The Subject should sit with his or her back to the Looker and keep his or her eyes
closed.
• The Looker either “looks” or does “not look” at the Subject in a series of 10 trials,
according to a random sequence. This random sequence can be generated by a
random number generator on a calculator or by tossing a fair coin.
The Looker should stand about 1 meter behind the Subject’s back, and either
“looks” or does “not look” at the Subject in accordance with a random sequence of
trials. The teacher might wish to instruct all the Lookers to look down at the data
collection sheet if they are not looking at the Subject on a particular trial. To signal
the beginning of each trial, the teacher should give a signal to the entire class, so
that all trials are performed simultaneously.
The teacher should say “Trial one: Begin.” (Since all Lookers are following different
random sequences of instructions, the teacher’s voice will give no relevant clues to
the Subjects.) The Subject then says “looking” or “not looking,”and the Looker
records the looking status and the Subject’s response on the Data Collection Sheet.
The Subject should not spend long thinking, but guess quite quickly. 10 seconds are
long enough. The Looker should record the Subject’s guess and then proceed to
the next trial. For the next trial, the teacher should say, “Trial two: Begin.” And so
on. The entire procedure is repeated for 10 trials. After the series of 10 trials has
been completed, the Lookers hand in their Data Collection Sheets. The Lookers and
Subjects then trade places. Each new Looker starts with a new data sheet.
Assuming a class size of 40 students is split into 20 pairs with each student within a
pair playing the role of the Looker, 400 total trials would be produced for the class.
After all the trials have been completed, collect the results from the pairs via the
Data Collection Sheets and tally the class results on the white board.
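Where a computer is available, the random sequence each Looker follows could be generated in advance instead of tossing a coin. The following Python sketch is one possible way to do it; the function name `looker_sequence`, the labels "look" and "not look", and the seed value are illustrative choices, not part of the original activity.

```python
import random


def looker_sequence(n_trials=10, seed=None):
    """Return a random sequence of 'look' / 'not look' instructions.

    Each trial is chosen independently with probability 1/2, like a fair coin toss.
    Passing a seed makes the sequence reproducible; leave it as None in class
    so that every Looker gets a different sequence.
    """
    rng = random.Random(seed)
    return [rng.choice(["look", "not look"]) for _ in range(n_trials)]


sequence = looker_sequence(seed=2016)
for trial, instruction in enumerate(sequence, start=1):
    print(f"Trial {trial}: {instruction}")
```

Each Looker should receive a separately generated sequence so that the teacher's spoken signal carries no information about any individual trial.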
Note: 400 total trials will be performed. However, we expect the Looker to be Looking at
the Subject in only half of these trials. It is this half that will be used for a part of the data
analysis that follows.
Table 1 contains sample results obtained when this activity is performed with 28 students.
Table 1. Example two-way frequency table for class data.
Answer of Subject
Status of Looker Looking Not Looking Row Totals
Looking 66 68 134
Not Looking 76 70 146
Column Totals 142 138 280
After the class results are tallied onto the white board, students use the data to answer a
series of questions designed to determine if the data support the existence of “Psychic
Staring Effect.”
B. DATA ANALYSIS
Here are three ways to analyze the data:
1. Confidence Intervals
Explain to learners that they need to make inferences regarding the population proportion,
p. In the context of Sheldrake’s experiment p is the proportion of the time that someone
who is being looked at can correctly identify that they are being looked at.
Help learners recall the following (large sample) formula that can be used to construct a
confidence interval for p:
$\hat{p} \pm z\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
where p̂ is the sample proportion of successes, n is the sample size, and z is the multiplier
(critical value).
The assumptions for using this procedure are: (1) The random variable of interest is
categorical; (2) The data are obtained using randomization; (3) The sample size is
sufficiently large so that the sampling distribution of the sample proportion is approximately
normal. Specifically, $n\hat{p} \geq 15$ and $n(1 - \hat{p}) \geq 15$.
Discuss the assumptions with the learners. Have them identify the random variable of
interest. On each trial, a success occurs if a Subject correctly identifies being looked at.
Thus, the variable of interest is whether or not a correct response is given by a Subject who
is being looked at. Whether or not the Looker is looking at the Subject on a particular trial
is obtained using randomization. The randomization is either done by flipping a fair coin or
generating a random number in order to determine whether or not the Looker does look at
a Subject on an individual trial. For a typical class of 40 students, the number of trials in which
the Looker is looking will be about 200, so the sample size requirement will most likely be met.
Translate the problem from one concerning the random variable of interest to one that
involves a population proportion. Ask students to identify what the unknown proportion is
in this experiment’s context. In this experiment, the population proportion p is the
proportion of those being looked at who correctly identify that they are being looked at.
Instruct learners to compute for the sample proportion for the class results. Discuss what the
value of the sample proportion indicates about the validity of the “Psychic Staring Effect.”
Ask learners to construct a 95% confidence interval for p based upon the class data. Once
the interval has been constructed, ask them to interpret the interval and to state whether or
not they believe that the class data indicate that there is a “Psychic Staring Effect.”
For the sample class data, the sample proportion of the time that someone who is being
looked at can identify that they are being looked at is p̂ = 66/134 ≈ .4925, or .49. For 95%
confidence, the z multiplier is 1.96. Thus, inserting the class values into the confidence
interval formula gives:
$$.49 \pm 1.96\sqrt{\frac{.49(1-.49)}{134}} = .49 \pm .08 = (.41, .57).$$
Based upon this confidence interval, we can say, with 95% confidence, that the proportion
of the time that someone who is being looked at can identify that they are being looked at
is between .41 and .57. Notice that this interval includes the value of .50, (the probability of
getting a head in flipping a fair coin), which does not support the existence of the “Psychic
Staring Effect.”
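The interval above can be checked with a short script. The following is a minimal sketch in Python; the function name `proportion_ci` is an illustrative choice and not part of the original activity. It works from the raw counts, so p̂ = 66/134 is not rounded before use and the upper limit comes out near .577 rather than .57. The conclusion is unchanged, since .50 still lies inside the interval.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    # Large-sample condition stated in the text: n*p_hat >= 15 and n*(1 - p_hat) >= 15
    if n * p_hat < 15 or n * (1 - p_hat) < 15:
        raise ValueError("sample too small for the large-sample interval")
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Sample class data from the text: 66 correct responses in 134 trials
lower, upper = proportion_ci(66, 134)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print("0.50 inside interval:", lower <= 0.50 <= upper)
```

Learners can replace 66 and 134 with their own class totals to obtain the interval for their data.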
2. Hypothesis Tests
Next, discuss with learners the procedure for performing a large-sample hypothesis test on
the value of a population proportion. If they wish to perform a hypothesis test on a
proportion, p, the corresponding test statistic formula to test H0: p = p0 is
$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}}$$
where p0 is the null hypothesized value, p̂ is the sample proportion of successes, and n is
the sample size.
The assumptions for this are the same as the ones for large-sample confidence interval
for p.
Have a class discussion about the appropriate value to use for p0 and the appropriate
alternative hypothesis to use in order to use the class data to test for the existence of the
“Psychic Staring Effect.” Learners should agree that the null hypothesis should
be H0: p = .50 and the upper-tailed alternative hypothesis should be HA: p > .50.
Once the test statistic value has been calculated, ask learners to calculate and interpret the
p-value and to make a conclusion in the context of the problem. That is, do the class data
indicate the presence of the “Psychic Staring Effect”?
For the sample class data, the test statistic is:
$$z = \frac{.49 - .50}{\sqrt{\dfrac{.50(1-.50)}{134}}} \approx -.23.$$
The teacher may wish to note that this test statistic value is very close to zero (0). A z test
statistic of zero (0) is obtained if the sample proportion is exactly equal to .50. Since the
alternative hypothesis here is upper-tailed, the p-value is equal to the area to the right of
−.23 under the standard normal distribution curve. This area is approximately equal to
.59. Based upon this p-value, the null hypothesis cannot be rejected. Therefore, the data
do not provide significant evidence to indicate that the proportion of the time that
someone who is being looked at can identify that they are being looked at exceeds .50.
In other words, the data do not support the existence of the “Psychic Staring Effect.”
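The test statistic and p-value can also be computed programmatically. The sketch below, in Python, uses the standard library's `statistics.NormalDist` for the normal curve area; the function name `one_proportion_z_test` is a hypothetical helper introduced here for illustration. It is called with p̂ rounded to .49, as in the worked example.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(p_hat, n, p0=0.50):
    """z statistic and upper-tailed p-value for H0: p = p0 versus HA: p > p0."""
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value

# Sample class data, with p_hat rounded to .49 as in the worked example
z, p_value = one_proportion_z_test(0.49, 134)
print(f"z = {z:.2f}, p-value = {p_value:.2f}")
```

A p-value this large gives no reason to reject the null hypothesis at any conventional significance level.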
REFERENCES
Nelia Marquez). Philippines: Rex Bookstore.
Freedman, D., Pisani, R., and Purves, R. (2007). Statistics, Fourth Edition. New York: W. W. Norton &
Company.
Richardson, M. and Stephenson, P. I Always Feel Like Somebody’s Watching Me. Grand Valley State
University in Statistics Education Web (STEW). Retrieved from
https://www.amstat.org/education/stew/pdfs/IAlwaysFeelLikeSomebodysWatchingMe.doc
Schneiter, K. Exploring Geometric Probabilities with Buffon’s Coin Problem. Utah State University in
Statistics Education Web (STEW) Online Journal of K-12 Statistics Lesson Plans. Retrieved from
http://www.amstat.org/education/stew/pdfs/EGPBCP.pdf
Activity Sheet 5-06
Background
The Psychic Staring Effect (Adapted from: http://en.wikipedia.org/wiki/Staring)
The “Psychic Staring Effect” is the idea that people can sense that they are being stared at.
It has been studied heavily, by many different researchers, with different results. In 2003,
Rupert Sheldrake wrote the controversial book “The Sense of Being Stared At,” which
contained a great deal of anecdotal evidence for the phenomenon: “Many people have
had the experience of feeling that they are being looked at, and, on turning around, find
that they really are. Conversely, many people have stared at other people's backs, for
example in a lecture theater, and watched them become restless and then turn round.”
After students reported the phenomenon to him in the late 19th century, psychologist
Edward B. Titchener suggested that the effect was an illusion, and that when a person
turned to check whether they were being watched, the initial movement of their head might
attract the focus of somebody behind them who was previously only looking in their general
direction. By the time the person had turned their head fully, the second person would be
looking directly at them, giving the mistaken impression that they had been staring at them
all along.
Goal:
Perform one of Sheldrake’s experiments to see if a statistical analysis of the resulting data
supports Sheldrake’s claims. Learners will perform Sheldrake’s “Method 1” which is an
experiment with Lookers and Subjects within the same room.
The Experiment:
The experiment involves working in pairs. One person is the Looker and the other the
Subject. The Subject sits with his or her back to the Looker and keeps his or her eyes
closed. Lookers either “look” or do “not look” at the Subjects in a series of 10 trials
according to a random sequence. This random sequence can be generated by a random
number generator on a calculator or a fair coin toss.
The Looker stands about 3 feet behind the Subject’s back, and either “looks” or does “not
look” at the Subject in accordance with the random sequence of trials. To signal the
beginning of each trial, the teacher will give a signal to the entire class, so that all trials are
performed simultaneously. The teacher will say “Trial one: Begin.” Since all Lookers are
following different random sequences of instructions, the teacher’s voice will give no
relevant clues to the Subjects. Then for the next trial, the teacher will say, “Trial two:
Begin.” And so on. The Subject then says “looking” or “not looking,” and the Looker
records the looking status and the Subject’s response on the Data Collection Sheet. The
Subject should not spend a long time thinking whether the Lookers are looking or not
looking, but guess quite quickly. 10 seconds are long enough. The Looker records the
Subject’s guess and then proceeds to the next trial. The same procedure is repeated for all
10 trials.
After the series of 10 trials has been completed, the Lookers hand in the Data Collection
Sheets. The Lookers and Subjects then trade places. Each new Looker starts with a new
data sheet.
Questions:
A. Large-Sample Confidence Interval on a Proportion
1. If a person can in fact determine whether they are being stared at, then what can we say
about the numerical value of the proportion of those being looked at who should guess
correctly and say “yes”?
2. For the class data:
(a) Identify the sample size. n = __________

(b) Identify the number of successes (recall, a trial results in a success if the Subject has
correctly identified being Looked at). x = __________
(c) Calculate the sample proportion of those being Looked at who correctly answered “yes.”
p̂ = __________
3. Construct a 95% confidence interval for the proportion of correct guesses for those being looked at (that is,
the proportion of time that those who were being looked at said “yes”).
(a) Give the formula:
(b) Plug the appropriate class data into the formula:
(c) Give the confidence interval:
(d) Interpret the interval. State whether or not the interval supports the “Psychic Staring
Effect”:
B. Hypothesis Test on a Proportion

4. Does the class data provide significant evidence to indicate that the proportion of “yes” guesses
is higher than 50% for those being looked at?
(a) State the appropriate hypotheses:
(b) Summarize the data into an appropriate test statistic:
(c) Find the p-value:
(d) Report a conclusion in the context of this problem:
(e) In the context of this problem, explain what a type 1 error would be:
(f) In the context of this problem, explain what a type 2 error would be:
Data Collection Sheet
Name of Looker ____________________
Name of Subject ____________________

Trial | Looker: Looking (yes) / Not Looking (no) | Subject: Says Being Looked At (yes) / Says Not Being Looked At (no)
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | |
8 | |
9 | |
10 | |

Summary:
Looking | Guess by Subject: Yes | Guess by Subject: No
Yes | |
No | |
ASSESSMENT
1. Suppose a new treatment for a certain disease is given to a sample of 100 patients. The treatment
was successful for 81 of the patients. Assume that these patients are representative of the population
of individuals who have this disease.
(a) Calculate the sample proportion that were successfully treated.
Answer: The sample proportion is p̂ = 81/100 = .81.
(b) Determine a 95% confidence interval for the proportion of the population for whom the
treatment would be successful. Write a sentence that interprets this interval.
Answer: The formula to use is Sample estimate ± Multiplier × Standard error. The sample estimate
is p̂ = .81 and the standard error is $\sqrt{.81(1-.81)/100} \approx .0392$. For 95% confidence
the multiplier is 1.96. The 95% confidence interval is .81 ± (1.96 × .0392), or .81 ± .077,
which is .733 to .887. With 95% confidence, we can say that if the whole
population with this disease received the treatment, the proportion successfully treated would be
between .733 and .887 (or 73.3% to 88.7%).
2. An ESP experiment is done wherein a participant guesses which of 4 cards the researcher has
randomly picked, where each card is equally likely. This is repeated for 100 trials. The null hypothesis
is that the subject is guessing, while the alternative is that the subject has ESP and can guess at
higher than the chance rate.
(i) What is the correct statement of the null hypothesis that the person does not have ESP?
A. H0: p = 0.5
B. H0: p = 4/100
C. H0: p = 1/4
D. H0: p > 1/4
Answer: C
(ii) The subject actually gets 35 correct answers. Which of the following describes the probability
represented by the p-value for this test?
A. The probability that the subject has ESP.

B. The probability that the subject is just guessing.
C. The probability of 35 or more correct guesses if the subject is guessing at the chance rate.
D. The probability of 35 or more correct guesses if the subject has ESP.
Answer: C
(iii) Which of the following would be a Type 1 error in this situation?
A. Declaring somebody does not have ESP when they actually do.
B. Declaring somebody has ESP when they actually don’t have ESP.
C. Analyzing the data with a confidence interval rather than a significance test.
D. Making a mistake in the calculations of the significance test.
Answer: B
3. The probability that a patient recovers from a certain stomach disease is .75. Suppose that 20
people who have contracted this disease are randomly selected. What is the probability that exactly
12 will recover from the disease?
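Answer: Let the random variable, Y, be the number of the selected people who recover from the
disease. Then Y has a binomial distribution with n = 20 and p = 0.75. Thus
$$P(Y = y) = \binom{20}{y}(0.75)^{y}(0.25)^{20-y}$$
for y = 0, 1, 2, ..., 20. So
$$P(Y = 12) = \binom{20}{12}(0.75)^{12}(0.25)^{8} \approx 0.0609.$$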
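The binomial probability in this item can be evaluated directly. The following is a small sketch in Python using `math.comb` for the binomial coefficient; the helper name `binomial_pmf` is an assumption made for illustration. It evaluates the probability of exactly 12 recoveries among 20 patients when p = 0.75.

```python
import math

def binomial_pmf(y, n, p):
    """P(Y = y) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)

# Exactly 12 recoveries among 20 patients, each recovering with probability 0.75
prob_12 = binomial_pmf(12, 20, 0.75)
print(f"P(Y = 12) = {prob_12:.4f}")
```

Summing `binomial_pmf(y, 20, 0.75)` over y = 0, 1, ..., 20 should give 1, which is a quick check that the formula has been entered correctly.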
Chapter 6: Correlation and Regression Analysis
Lesson 1: Examining Relationships with Correlation

OVERVIEW OF LESSON: In this lesson, learners discuss basic concepts and tools for
exploring relationships between two variables. In addition, an activity gives learners
hands-on experience in exploring relationships between two variables using a random sample of data
collected in Lesson 1-01. Each student chooses a question to work on, generates a scatter plot
to show the relationship of interest and, finally, estimates the correlation coefficient. After all learners
have interpreted their correlations, they pool their findings as a class to explore the variability in the
correlations found. As a class, they construct approximations to the sampling distribution of the
correlation coefficient and use it to make assertions about the values of the
population parameters.
LEARNING COMPETENCIES: At the end of the lesson, the learner should be able to:
• illustrate the nature of bivariate data

• construct a scatter plot
• describe shape (form), trend (direction), and variation (strength) of bivariate relationships based
on a scatter plot
• estimate strength of association between two variables based on a scatter plot
• calculate the Pearson’s sample correlation coefficient
• solve problems involving correlation analysis
LESSON OUTLINE
1. Motivation:
2. Preliminary Lesson: Correlation Measures Linear Association
3. Main Lesson: How to Generate the Scatterplot and Calculate the Correlation Coefficient
4. Enrichment: Sampling Distribution of Correlations
5. Advance Lesson: Rank Correlation
(A) Motivation
In Chapter 1, learners were guided through the basic tools used for describing data
pertaining to one variable. In practice, a number of variables are collected per data item,
such as information on an individual, household, establishment, farm, country, etc.
Although data can be described and explored one variable at a time, it is also important to
explore the relationship between two or among many variables. For instance, learners may
be asked how a student’s daily allowance compares with the
number of text messages he/she sends in a day, or how the weight of a student compares
with the height of the student. We might wish to compare information on poverty in a
particular region with information about crime. Such pairs of measurements are called
bivariate data. Observations of two or more variables per individual (or object) are called
multivariate data.
Inform learners that Sir Francis Galton was one of the first scientists who investigated the
relationships of variables within the context of studying family resemblances, particularly
the degree to which children resemble their parents. Galton’s disciple Karl Pearson further
worked on this topic through an extensive study on family resemblances. Part of this study
generated the heights of 1,078 fathers and those of their respective first-born sons when
they reached the age of maturity. A plot of these data is shown in Figure 6-01.1, where the
pairs of dots represent the father’s height on the horizontal axis and the son’s height on the
vertical axis. This is known as a scatterplot or scatter diagram. The scatterplot
suggests a positive association between the father’s height and that of his son, i.e.,
taller-than-average fathers tend to have taller-than-average sons and short fathers tend to have
short sons.
Figure 6-01.1 Heights of 1,078 fathers and sons. Reproduced from Figure 1, Chapter 8, p. 120,
Freedman, Pisani, Purves, 2007.
Studies that involve comparing two variables are conducted to find some connection (perhaps even
some suggestion of causality) between them. This analysis helps us establish whether we can
estimate the height of the son given the height of the father. If there is a weak association between
the variables, then information about one variable will not help us in estimating the other
variable. If there is a strong association between the variables, one way by which we can
estimate one of them given the other is to fit a line passing through the point of averages
(the point consisting of the averages of the two variables) with a slope equal to the ratio of
the standard deviations of the variables. An alternative to this line is the regression line,
which will be discussed in more detail in the next set of lessons in this chapter.
(B) Preliminary Lesson: Correlation Measures Linear Association
Let learners know that when the relationship between two numerical variables is of interest,
a scatterplot of these variables should be drawn. Tell learners that a scatterplot allows one
to visualize an association between two variables, if and when it exists. Some of the
questions that can be answered with the use of a scatterplot include:
• Does one variable tend to be large when another is large?

• Does one variable tend to be small when the other is large?
• Does the relationship between the variables more or less follow a straight line?
• Is the scatter in one variable the same, regardless of the value of the other variable?
(C) Main Lesson: How to Generate the Scatterplot
Before starting with the activity, the teacher may review the concepts of population,
sample, population parameter and sample statistic to reinforce the learner’s knowledge
and competence on these basic concepts.
Data obtained from all the learners in Lesson 1-01 may be used to create a scatterplot and
to conduct a correlation analysis.
For this lesson, the population of interest being discussed is the population of senior high
school learners who have filled out the Activity Sheet Number 1-01a in Lesson 1-01. The
statistical questions of interest for the activity are:
1. Does your daily allowance (x) increase or decrease with the number of text messages
you send in a day (y)?
2. Does your daily allowance (x) increase or decrease with how happy you are (y)?
3. Does your weight (x) increase or decrease with your height (y)?
Learners will be asked to choose the statistical question they would like to focus on.
Note for teachers on classroom organization: For small classes with fewer than 30 learners,
give only two questions to choose from. Make sure that the class is evenly divided among
the questions. It would be ideal to have at least 10-15 people working on the same
question. If the class size is too small, reduce the number of questions or have each
student choose two out of the three questions to explore.
The teacher should share the entire database collected from Lesson 1-01 and divide the
learners into groups. Ask the groups to take a random sample of 30 records from the entire
database. Suppose that we have the following snapshot of a sample data set:
Columns: Student Number, Sex, Height (in meters), Weight (in kg), Daily allowance in school, Usual number of text / day, Happiness
1 F 1.64 40 0 10 5
2 F 1.52 50 50 7
3 F 1.52 49 0 5 5
4 F 1.65 45 150 18 9
5 F 1.02 60 0 4 5
6 F 1.626 45 0 60 7
7 F 1.5 38 200 20 7
8 F 1.6 51 100 6
9 F 1.42 42.2 500 200 9
10 F 1.52 54 0 15 4
11 F 1.48 46 100 10 8
12 F 1.62 54 20 2 4
13 F 1.5 36 0 25 6
14 F 1.54 50 0 30 7
15 F 1.67 63 0 60 9
16 M 1.72 55 200 80 8
17 M 1.65 61 0 30 5
18 M 1.56 60 50 1 6
19 M 52 250 80 8
20 M 1.7 90 0 30 4
21 M 1.53 50 250 100 9
22 M 1.62 90 100 0 6
23 M 1.79 80 100 6 7
24 M 1.57 58 50 0 7
25 M 1.7 68 20 0 4
26 M 1.77 27 100 8
27 M 1.478 50 300 55 7
28 M 1.727 94 100 50 7
29 M 1.56 66 50 5 6
30 M 1.75 50 0 3 5
Note for teachers regarding Outlier Analysis: The class should discuss whether the
data are realistic or whether it will be necessary to delete some cases from the analysis
(especially if these cases seem incorrect).
Show learners how to obtain a scatterplot by hand.
• Tell them to draw a rectangular coordinate system and to label the x- and y-axes.
Learners may be asked if they still remember that Rene Descartes (1596-1650)
was the first to represent pairs of numbers with points in a coordinate system, and this
is why the x and y coordinates are also called the “Cartesian coordinates.”
• Suppose that x and y represent the daily allowance of learners and the typical number
of text messages sent by learners, respectively.
• Choose a range that includes the maximums and minimums of the two
variables. For the sample data, our x-values go from 0 to 500 (pesos), while our y-values
range from 0 to 200.
• Draw the first point, i.e. (0,10), on the graph, and then the remaining points as shown in
Figure 6-01.3.
Figure 6-01.2 Dataset in Excel 2013
Figure 6-01.3 Drawing Points on a Cartesian Coordinate System: (a) First Point; (b) All Points.
Horizontal axis: daily allowance in school (x); vertical axis: usual number of text messages sent in a day (y).
To obtain a scatterplot of the dataset using spreadsheet applications, such as Excel 2013,
merely put the dataset into a spreadsheet and highlight the columns of data that need to
be used (here, columns F and G). Then on the Insert tab, under the Charts group, choose
Scatter and then select the Scatter Icon. This yields a default scatterplot with a title and
legend (mentioning the y variable), shown in Figure 6-01.4 (a). This can be improved by
deleting the title and legend, and clicking on Layout, then Selecting Axis Titles, and then
adding Primary Horizontal Axis Title (with a title “allowance”) and also Primary Vertical Axis
Title (with a title “text messages”), thus yielding Figure 6-01.4 (b).
Figure 6-01.4 Scatterplots of Usual Number of Text Messages in a Day versus
Daily Allowance (Obtained from Excel 2013): (a) default scatterplot; (b) scatterplot with axis titles
The scatterplot is a very informative tool for determining the association between two
variables, say X and Y. Let mx and my be the respective means of the variables X and Y,
and sx and sy be the respective standard deviations of the variables X and Y. In Chapter 1, learners were
reminded of statistical concepts learned in previous grades, that the mean and standard
deviation of a list of data measure the center of and scattering in the list, respectively.
Furthermore, Chebychev's inequality guarantees that at least 75% of the x-values
(pertaining to the X variable) will be within ± 2 standard deviations of the mean of X, and
likewise, that the y coordinates of at least 75% of the points will be within ± 2 standard
deviations of the mean of Y.
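The 75% guarantee can be checked on any list of values. The following is a minimal sketch in Python; the helper name `fraction_within` is hypothetical, and the allowances are a small subset of complete records from the sample data table (students 1, 3, 4, 5, 6, 7, 9 and 10), used only for illustration.

```python
from statistics import fmean, pstdev

def fraction_within(values, k=2):
    """Fraction of values lying within k population standard deviations of their mean."""
    center = fmean(values)
    spread = pstdev(values)
    inside = [v for v in values if abs(v - center) <= k * spread]
    return len(inside) / len(values)

# Illustrative subset of daily allowances (pesos) from the sample data table
allowances = [0, 0, 150, 0, 0, 200, 500, 0]
share = fraction_within(allowances, k=2)
print(f"Share within 2 SDs of the mean: {share:.2f}")
print("At least 75%:", share >= 0.75)
```

The inequality only gives a lower bound, so the observed share is often well above 75%.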
Inform learners that two variables are said to be associated if knowing the value of one
variable provides information about the value of the other variable. More precisely, the two
variables X and Y are (linearly) associated if the standard deviation of the y-values around a
narrow range of x-values (i.e., a "vertical slice" through the scatterplot) is smaller than the
overall standard deviation sy of the variable Y; or if the standard deviation of the x-values
around a narrow range of y-values (i.e., a "horizontal slice"
through the scatterplot) is smaller than the overall standard deviation sx of the variable X.
For example, consider the plot of allowance against the number of texts sent. The two
variables are said to be associated if for particular values of X, say for the learners with
allowance of Php 50 (or thereabouts), the spread of the values of Y is less than the spread
of all the values of Y, and this is true for all the values of X. In that case, you may see that as
the allowance of the student increases, the number of texts sent also increases.
• If points with larger-than-average values of one variable tend to have larger-than-average
values of the other, and points with smaller-than-average values of one
variable tend to have smaller-than-average values of the other, the scattering of the
values of Y in vertical slices through the scatterplot will be smaller than sy. The
scatterplot of the variables X and Y would show positive association. Simply put,
the values of Y tend to increase as the values of X increase.
• If points with larger-than-average values of one variable tend to have smaller-than-average
values of the other, and points with smaller-than-average values of one
variable tend to have larger-than-average values of the other, the scattering of the
values of Y in vertical slices through the scatterplot will be smaller than sy; in this
instance, there is negative association. Simply put, the values of Y tend to
decrease as the values of X increase.
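The "vertical slice" idea can be tried out numerically. The sketch below, in Python, compares the standard deviation of the y-values inside one slice of x-values with the overall standard deviation of y. The helper `slice_sd` is a hypothetical name, and the data are a small subset of complete records from the sample data table, chosen only for illustration.

```python
from statistics import pstdev

def slice_sd(xs, ys, low, high):
    """Population SD of the y-values whose x-value falls in [low, high]."""
    in_slice = [y for x, y in zip(xs, ys) if low <= x <= high]
    if len(in_slice) < 2:
        raise ValueError("need at least two points in the slice")
    return pstdev(in_slice)

# Illustrative subset: daily allowance (x) and usual number of text messages (y)
allowance = [0, 0, 150, 0, 0, 200, 500, 0]
texts = [10, 5, 18, 4, 60, 20, 200, 15]

overall = pstdev(texts)
low_allowance_slice = slice_sd(allowance, texts, 0, 50)
print(f"Overall SD of y: {overall:.1f}")
print(f"SD of y in the slice 0 <= x <= 50: {low_allowance_slice:.1f}")
```

With only eight points this is a rough demonstration; with the full class data, several slices can be compared in the same way.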
In conjunction with such an analysis of the scatterplot, we may need a summary measure
that would inform us of whether or not there appears to be a relationship between two
variables in our data set. One such measure is called the correlation coefficient (or
correlation, for short), which, together with the other four summary measures mx, my, sx, and
sy, provides a basic description of two variables and their connection.
The correlation coefficient between two variables X and Y is a measure of
association between the variables. It is obtained by first getting the
product of the standardized scores of the X’s and Y’s, and then taking the
average of the resulting products.
The correlation coefficient is denoted by the Greek letter ρ (rho). The correlation (often
ascribed to Pearson) serves to measure linear association between two variables.
TECHNICAL NOTES
For a set of points (x1, y1), (x2, y2), …, (xn, yn), the correlation coefficient is
$$\rho = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\mu_x}{\sigma_x}\right)\left(\frac{y_i-\mu_y}{\sigma_y}\right)$$
In practice, what we have is a sample data set. The sample correlation coefficient, denoted as
r, is computed in exactly the form given above with the data still treated as though we had a
population.
Show learners that the correlation coefficient between daily allowance (x) and the
typical number of text messages (y) can be calculated by:
• first, obtaining the standardized values of these variables;
• then, computing the product of the standardized values; and
• finally, obtaining the average of these products, thus yielding the correlation
coefficient.
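The three steps above can be sketched in Python as follows. The function name `correlation_from_z_scores` is illustrative, the standard deviations are population standard deviations (as in the definition), and the data are a small subset of complete records from the sample data table, so the result is not expected to match the value for the full data set.

```python
from statistics import fmean, pstdev

def correlation_from_z_scores(xs, ys):
    """Average of the products of standardized scores (population SDs)."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    # Step 1: standardize each variable
    z_x = [(x - mean_x) / sd_x for x in xs]
    z_y = [(y - mean_y) / sd_y for y in ys]
    # Step 2: multiply the standardized values pair by pair
    products = [zx * zy for zx, zy in zip(z_x, z_y)]
    # Step 3: average the products
    return fmean(products)

# Illustrative subset: daily allowance (x) and usual number of text messages (y)
allowance = [0, 0, 150, 0, 0, 200, 500, 0]
texts = [10, 5, 18, 4, 60, 20, 200, 15]
r = correlation_from_z_scores(allowance, texts)
print(f"r = {r:.3f}")
```

Pairs with a missing x or y value have to be dropped before the calculation, which is why only complete records are used here.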
Calculation of this whole process with Excel 2013 is shown in Figure 6-01.5, for pairs of data
with full records of x and y. Here, the data on daily allowance (x) and the number of text
messages (y) are put on the fifth and sixth columns. The eighth and ninth columns of the
worksheet represent the standardized values of daily allowance (x) and the number of text
messages (y), respectively.
The last two entries of the fifth and sixth columns represent the mean and standard
deviation of the 27 data entries in the said columns which are treated as a population. For
instance, the item in cell E29, the mean of x, is obtained using the command
=AVERAGE(E2:E28)
while the item in cell E30, (the population standard deviation of x), is obtained using the
command
=STDEVP(E2:E28)
The eighth and ninth columns are the standardized values obtained by subtracting the
mean from the data and dividing the result by the standard deviation. For instance, cell
H2, which contains the standardized value of the first observation of x, is obtained using the
command
=(E2-$E$29)/$E$30
Each entry in the tenth column is the product of the corresponding entries from the eighth
and ninth columns. The final cell in the tenth column, that shows the average of the entries
in the column viz., 0.780283153, represents the correlation coefficient.
Figure 6-01.5 Computing correlation with Excel 2013
While the above calculations are obtained based on the definition of the correlation coefficient,
there is a much quicker way of obtaining the correlation coefficient using Excel 2013.
Using Excel 2013, construct the database (i.e. the first two columns in the worksheet shown
in Figure 6-01.6) and use the Excel function CORREL by entering:
=CORREL(A2:A41, B2:B41)
into any cell outside of the database. This generates the
correlation 0.780283153 between daily allowance (x) and the number of text messages (y).
Figure 6-01.6 Alternative and faster computation of correlation with Excel 2013
Alternative way of computing the correlation coefficient (in case the use of Excel is not possible)
List down all the values of X, Y, XY, X², and Y². On the bottommost row, calculate the
total for each column.
X Y XY X² Y²
0 10 0 0 100
50 0 0 2500 0
150 18 2700 22500 324
…
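The correlation coefficient is given below:
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
If the numerator and the denominator of the previous expression are each divided by n(n − 1),
the numerator becomes the sample covariance, denoted by $s_{xy}$, and the denominator becomes
the product of the standard deviations of X and Y. Thus, the correlation coefficient can also be computed as:
$$r = \frac{s_{xy}}{s_x s_y}$$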
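A common computational form of the correlation coefficient uses only the column totals: r = [nΣXY − ΣXΣY] / √([nΣX² − (ΣX)²][nΣY² − (ΣY)²]). The sketch below, in Python, assumes this form, which is algebraically equal to the average product of standardized scores. The function name `correlation_from_totals` is hypothetical, and the data are a small subset of complete records from the sample data table.

```python
import math

def correlation_from_totals(xs, ys):
    """Correlation computed from the totals of the X, Y, XY, X^2 and Y^2 columns."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

# Illustrative subset: daily allowance (x) and usual number of text messages (y)
allowance = [0, 0, 150, 0, 0, 200, 500, 0]
texts = [10, 5, 18, 4, 60, 20, 200, 15]
r = correlation_from_totals(allowance, texts)
print(f"r = {r:.3f}")
```

This mirrors the pencil-and-paper table: each `sum_...` variable is the total on the bottommost row of one column.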
If the linear association between the variables is strong, as shown in the worked example,
we would be able to predict the variable y (here, typical number of text messages) from the
variable x (here, daily allowance). Ask learners why daily allowance would be a good
predictor of number of text messages sent. Learners should be able to say that the number
of text messages sent daily is based on the socio-economic status of the student (that could
be measured by daily allowance of the student).
Learners should be guided that the correlation coefficient is unit-free because it is based on
the standard units of the variables. Mention to learners also that correlation can only range
from –1 to 1.
• When the correlation is zero, the cloud of points is either without form (as in Figure
6-01.7) or has a nonlinear pattern (as in Figure 6-01.8). The latter is a result of
correlation being a measure of linear association (and not just association) between
the variables. Variables that have zero correlation are said to be uncorrelated.
• When the correlation coefficient is positive, the variables tend to increase together.
The closer the (positive-valued) correlation coefficient gets to 1, the stronger the
linear association between the variables, and the more tightly the points cluster
around an upward-sloping line in the scatterplot.
o A correlation of exactly 1, referred to as a perfect correlation, has all the
points falling exactly on a line that is sloping upward.
• The variables tend to move in opposite directions if the correlation between them
is negative. When the correlation is negative, either: (a) as X increases, Y tends to decrease;
or (b) as X decreases, Y tends to increase. Also, as the (negative-valued) correlation
coefficient gets closer to –1, the linear association between the variables gets
stronger.
o A perfect –1 correlation will have all the points lying exactly on a line sloping
downward.
Figure 6-01.7 No apparent association between X and Y
Figure 6-01.8 A nonlinear relationship between X and Y
The correlation is a unit-free measure of linear association between
two variables, i.e., of the clustering of the values of the two variables around a line,
and it takes values between –1 and +1. The closer the correlation
between the variables is to –1, the stronger is the negative linear
relationship; the closer it is to +1, the stronger is the positive linear
relationship; and the closer it is to 0, the weaker the linear relationship.
Ask learners if they think that the correlation between two variables is sensitive to the
order of the variables. The answer here should be no. Interchanging the variables would
yield the same value for the correlation between the variables. Switching y (here, typical
number of text messages) and x (here, daily allowance) in the worked example would
still generate the same value for the correlation between the variables.
Help learners notice that adding a constant to all the values of one variable will not
change the correlation coefficient. If one is to add 100 to all values of daily allowance
in the worked example, the correlation coefficient will not be affected.
Let learners know also that multiplying one variable by a positive constant does not
affect the correlation. In particular, had all learners been given an across the board
increase of 10 percent in their daily allowances (which is equivalent to multiplying the
current values by 1.10), then the correlation we calculated would remain unchanged.
(But if a negative constant is multiplied to one variable, while the correlation does not
change in absolute value, it changes direction from positive to negative, or negative to
positive.)
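These properties can be verified numerically. The following is a minimal sketch in Python; `pearson_r` is an illustrative helper that averages the products of standardized scores, and the data are a small subset of complete records from the sample data table.

```python
from statistics import fmean, pstdev

def pearson_r(xs, ys):
    """Correlation as the average product of standardized scores."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sd_x, sd_y = pstdev(xs), pstdev(ys)
    return fmean(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                 for x, y in zip(xs, ys))

# Illustrative subset: daily allowance (x) and usual number of text messages (y)
allowance = [0, 0, 150, 0, 0, 200, 500, 0]
texts = [10, 5, 18, 4, 60, 20, 200, 15]

r = pearson_r(allowance, texts)
r_swapped = pearson_r(texts, allowance)                     # interchange x and y
r_shifted = pearson_r([x + 100 for x in allowance], texts)  # add a constant
r_scaled = pearson_r([1.10 * x for x in allowance], texts)  # 10 percent increase
r_negated = pearson_r([-x for x in allowance], texts)       # negative multiplier

print(f"original {r:.4f}, swapped {r_swapped:.4f}, shifted {r_shifted:.4f}")
print(f"scaled {r_scaled:.4f}, negated {r_negated:.4f}")
```

The first four values agree up to rounding error, while the last has the same size with the opposite sign.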
Learners may want to know whena correlation coefficient suggests strong association
between the variables. Although there are no hard rules in determining the strength of
the linear relationship based on the correlation coefficient, learners may want to use the
following guide in order to interpret the correlation:
0 < |r| < 0.3 Weak Correlation
0.3 ≤ |r| < 0.7 Moderate Correlation
|r| ≥ 0.7 Strong Correlation
For instance, the correlation coefficient of 0.780283153 between daily allowance (x) and
the number of text messages (y) indicates a strong positive correlation between the
variables.
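As a rough sketch, the guide can be written as a small function. The name `describe_strength` is hypothetical, and two choices here are assumptions of the sketch rather than part of the guide: the cut-offs are applied to the absolute value of r, and the boundary values 0.3 and 0.7 are placed in the higher category.

```python
def describe_strength(r):
    """Label a correlation using the rough 0.3 / 0.7 guide (not a hard rule)."""
    size = abs(r)  # assumption: strength depends on magnitude, not sign
    if size < 0.3:
        return "weak"
    if size < 0.7:
        return "moderate"
    return "strong"

def describe_direction(r):
    """Report the sign of the linear relationship."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"

r = 0.780283153  # correlation reported in the worked example
print(describe_strength(r), describe_direction(r))
```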
Notes on interpreting the correlation coefficient:
(i) A correlation of 70% does not mean that 70% of the points are clustered around
a line. Nor does it mean that there is twice as much linear association as
in a set of points that has a correlation of 35%.
(ii) Furthermore, a correlation analysis does not imply that the variable X causes the
variable Y; that is, association is not necessarily causation (although it may be
indicative of cause and effect relationships). Even if polio incidence
correlates strongly with soft drink consumption, this need not mean that soft
drink consumption causes polio. If the population of ants increases (in time)
with the population of persons, (and thus these numbers strongly correlate),
we cannot adopt a population control program for people based on
controlling the number of ants!
(iii) The presence of outliers (see Figure 6-01.9) easily affects the correlation of a set
of data, so it is important to take the correlation figure with a grain of salt if
we detect one or more outliers in the data. In some situations, we ought to
remove these outliers from the data set (especially those that are suspected
to be poor quality data) and re-do the correlation analysis. In other instances,
these outliers ought not to be removed as these records may be “correct
data” and they contain information that should not be deleted. In any
scatterplot, there will be more or less some points detached from the main
bulk of the data, and these seeming outliers need not be rejected without
due cause.
Figure 6-01.9 Outliers in Data
(iv) Moving from correlation to causation is often problematic as there are several
possible explanations for a correlation between X and Y (excluding the
possibility of chance). It may be that X influences (or causes) Y; Y influences X;
or both X and Y are influenced by some other variable. Thus, when
performing correlation analysis of variables without being given any
background knowledge or theory, inferring a causal link cannot be
justified regardless of the magnitude of the correlation. While there may be
a causal link between alcohol consumption and deaths from liver cirrhosis, it
would be difficult to infer one from the high correlation between pork
consumption and cirrhosis mortality. To further illustrate why correlation and
causation are not equivalent, consider the high correlation of ice cream sales
and the number of drowning cases; the high correlation of the absolute
number of unemployed in a country across the years with the number of
sunspots observed across the years; or the high correlation of the number of
diseases with the number of health professionals. Based on these examples,
it is flawed to conclude that eating ice cream causes drowning, that the
sun causes unemployment, or that health professionals cause diseases.
The seemingly high correlation of ice cream sales and the number of
drowning cases arises because ice cream sales increase as
temperatures increase (during summer!). During this time, more people go
to the beach, which also increases the chance of drowning. Therefore, the
increase in drowning cases coincides with the increase
in ice cream sales because both are driven by the season.
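The ice cream example can be imitated with a small simulation. Everything in the sketch below is invented for illustration: the temperature range, the slopes, the noise levels, and the seed. Sales and drownings are each generated from temperature only, with no direct link between them, yet their correlation still comes out high.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(7)  # fixed seed so the run is repeatable

temperature, ice_cream_sales, drownings = [], [], []
for _ in range(200):                     # 200 simulated days
    t = rng.uniform(20, 35)              # hypothetical temperature in Celsius
    temperature.append(t)
    # Both outcomes depend on temperature only, never on each other.
    ice_cream_sales.append(10 * t + rng.gauss(0, 20))
    drownings.append(0.5 * t + rng.gauss(0, 1.5))

r_sales_drownings = pearson_r(ice_cream_sales, drownings)
r_temp_sales = pearson_r(temperature, ice_cream_sales)
r_temp_drownings = pearson_r(temperature, drownings)

print("sales vs drownings:      ", round(r_sales_drownings, 2))
print("temperature vs sales:    ", round(r_temp_sales, 2))
print("temperature vs drownings:", round(r_temp_drownings, 2))
```

Here temperature plays the role of the "other variable" that influences both X and Y.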
(D) Enrichment: Sampling Distribution of Correlations
Note: Teacher may opt to break the lesson here for the following session day if class
sessions are only done for at most 60 minutes in a day.
1. Using the “samples” obtained from the database, learners should be asked to analyze
the sample on their own. Each student should create a scatterplot, examine whether
a linear relationship exists, point out any outliers, and compute the
correlation coefficient.
Once the individual activities have been completed, learners may want to work
together as a class. The teacher may arrange the class into groups according to the
questions they worked on. The groups must generate dot plots for the three different
questions for the slope, intercept, and correlation coefficient.
Figure 6-01.10 is a sample interpretation of the dot plots. The dot plot for the
correlation above reveals that in most samples the correlation is very high. The
distribution illustrates that except for one outlier value of .63, the correlation tends to
be greater than .85. Moreover, the majority of the samples revealed a correlation
greater than .95. Thus, a good guess for the population correlation coefficient would
be around .96. This would be a good guess because although the mode of the dot
plot is .99, there are still some values that are a bit lower. A .96 correlation was found
in three samples as well. The range of the dot plot is .36.
Figure 6-01.10. Dot Plot of Correlation of Usual Number of Text Messages with Daily Allowances
2. Break up each sample by gender and repeat the activity for each gender separately.
Compare whether there are differences between males’ and females’ text messaging
experiences.
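For teachers who want to preview what the class dot plot might look like, the sketch below simulates the enrichment activity. The "database" is entirely hypothetical (300 made-up learners whose text counts rise with allowance plus random noise), as are the seed and the choice of 20 samples, so the printed values will not match Figure 6-01.10.

```python
import random
from collections import Counter
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

rng = random.Random(2016)  # fixed seed so the run is repeatable

# Hypothetical database: (daily allowance, text messages per day).
population = []
for _ in range(300):
    allowance = rng.uniform(20, 200)
    texts = max(0.0, 5 + 0.3 * allowance + rng.gauss(0, 12))
    population.append((allowance, texts))

# Each "learner" draws a random sample of 30 records and computes r.
correlations = []
for _ in range(20):
    sample = rng.sample(population, 30)
    xs = [allowance for allowance, texts in sample]
    ys = [texts for allowance, texts in sample]
    correlations.append(round(pearson_r(xs, ys), 2))

# Text version of a dot plot: one "o" per sample at each rounded value.
for value, count in sorted(Counter(correlations).items()):
    print(f"{value:5.2f} {'o' * count}")
print("range:", round(max(correlations) - min(correlations), 2))
```

The spread of the dots approximates the sampling distribution of the correlation coefficient for samples of size 30 from this made-up population.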
(D) Advanced Lesson: Rank Correlation

A measure less sensitive to outliers is the rank correlation, or the correlation of the ranks
of the x-values with the ranks of the y-values. The rank correlation may be computed
instead of the typical (Pearson product moment) correlation coefficient, especially if there
are outliers in the data. This rank correlation is called Spearman’s rho.
Learners may want to determine Spearman’s rho using the data they have worked on. Make
sure to determine ranks properly for ties, e.g., if the fourth, fifth, and sixth smallest values
are equal, then they are all given the average rank of 5.
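A minimal sketch of the computation follows, assuming tied values share the average of the ranks they occupy. The helper names and the small data sets are hypothetical and chosen only to show the tie rule and the effect of one outlier.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """Rank from smallest (1) to largest; tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Fourth, fifth, and sixth smallest values are tied, so each gets rank 5.
print(average_ranks([1, 2, 3, 7, 7, 7, 9]))

# Hypothetical data with one extreme y-value.
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 60]
print("Pearson r:     ", round(pearson_r(x, y), 3))
print("Spearman's rho:", round(spearman_rho(x, y), 3))
```

Because the outlier changes only the size of the largest y-value and not its rank, Spearman's rho is unaffected by it while the Pearson coefficient drops.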
KEY POINTS
• A scatterplot (or scatter diagram) can be used to show the relationship between two
numerical variables.
• Correlation analysis is used to detect whether two variables are “linearly” related (or
associated), i.e., does one variable increase when the other variable increases, or does
one decrease when the other increases?
• Correlation measures the strength of association (linear relationship) between two
numerical variables:
o The correlation coefficient is a unit-free number that ranges from -1 to 1. It is not
affected by interchanging the order of the variables, by adding the same number
to all the values of one variable, or by multiplying all the values of one variable by the
same positive number.
o Correlation is only concerned with the strength of the relationship, and a causal
effect is not implied. In some cases, e.g., the correlation of the absolute number of
unemployed in a country across the years with the number of sunspots observed
across the years, both variables may be simultaneously influenced by a third variable
(in this example, time is the third variable).
o The correlation coefficient can be misleading when the data have outliers or when
the underlying empirical relationship between the variables is nonlinear.
Whenever possible, inspect the scatterplot for issues such as these.
REFERENCES
Gibson, J., McNelis, M., and Bargagliotti, A. “Text Messaging is Time Consuming! What
Gives?” Retrieved from STatistics Education Web (STEW):
https://www.amstat.org/education/stew/pdfs/TextMessagingisTimeConsumingWhatGives.doc
See also:
Freedman, D., Pisani, R., and Purves, R. (2007). Statistics. Fourth Edition. New York: W.W. Norton
& Company.
ACTIVITY SHEET 6-01
Introduction
The advent of cellphones has changed our lives. Many of us send lots of text messages
throughout a day. What factors could be related to the number of text messages a senior high
school learner sends in a day? In this activity, we will explore the relationship between the
number of text messages one sends in a day and a few other potential explanatory factors,
such as daily allowance and the happiness of a learner. We will also explore the relationship of
height and weight of senior high school learners.
Each learner will work individually on a random sample of data collected from the class (or the
whole school) and then later on share the results with the class. The class will then be grouped
to work together in drawing further conclusions.
Choose one of the following questions to explore:
a. Does your daily allowance (x) increase or decrease with the number of text
messages you send in a day (y)?
b. Does your daily allowance (x) increase or decrease with how happy you are (y)?
c. Does your weight (x) increase or decrease with your height (y)?
To answer the question you chose, you are going to take a random sample of 30 observations
from the database collected at the beginning of the Statistics and Probability course.
You will carry out the following steps on your own:
1. Based on the question you chose, generate a scatterplot to create a visual
representation of the data. Does the relationship appear to be linear? Are there any
outliers? What are some possible explanations for the existence of outliers? Should
you eliminate the outliers in your data set? Why or why not?
2. Based on the question you chose, compute the correlation coefficient.
3. Interpret the correlation coefficient in the context of the question.
Now, the class will be divided into three groups and will be asked to choose one question to
work on from the three provided above. Each group will complete the following table for their
chosen question.
ENRICHMENT
4. Collect the correlation coefficient found by each learner in the class and summarize the
result in a table:
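Question Chosen: _________________________ vs. _________________________________

Student  Correlation    Student  Correlation    Student  Correlation    Student  Correlation
   1     ___________       6     ___________      11     ___________      16     ___________
   2     ___________       7     ___________      12     ___________      17     ___________
   3     ___________       8     ___________      13     ___________      18     ___________
   4     ___________       9     ___________      14     ___________      19     ___________
   5     ___________      10     ___________      15     ___________      20     ___________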
5. Create a dot plot for the correlations.
6. Look at the dot plot that pertains to your question. This dot plot represents an
approximation to the sampling distribution of the correlation coefficient. What do
you notice about the dot plot? What is the range of the correlation coefficient? What
seems to be the most common correlation? If you are to guess what the correlation
was for the entire population, what would be your guess? Explain why.
ASSESSMENT 6-01
I. A study was done to investigate the relationship between the amount of protix (a new
protein-vitamin-mineral supplement) on fortified-vitamin rice, known as FVR, and the weight
gain of children. Ten randomly chosen sections of grade one pupils were fed with FVR
containing protix; different amounts (X) of protix were used for the 10 sections. The increase in
weight of each child was measured after a given period. The average gain (Y) in weight for
each section with a prescribed protix level (X) is as follows:
Section  Protix (X)  Gain (Y)     Section  Protix (X)  Gain (Y)
   1         50        92.6          6        100       106.2
   2         60        97.5          7        110       108.9
   3         70        96.5          8        120       108.4
   4         80       102.3          9        130       110.2
   5         90       105.8         10        140       110.8
a. Create a scatter diagram based on the data.
b. Does the scatter diagram suggest a linear relationship? What other relationships may
be tenable?
c. What is the correlation between the protix level and the average weight gain?
Answers:
(a) Scatter diagram below
(b) Yes, a linear relationship is tenable. A quadratic relationship may be more tenable for
the data.
(c) 0.954608
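The answer to (c) can be reproduced from the table with a few lines of code. The data are those given in the assessment; the `pearson_r` helper is a hand-written sketch of the usual product-moment formula rather than a particular package's routine.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Protix level (X) and average weight gain (Y) for the 10 sections.
protix = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140]
gain = [92.6, 97.5, 96.5, 102.3, 105.8, 106.2, 108.9, 108.4, 110.2, 110.8]

r = pearson_r(protix, gain)
print(round(r, 6))
```

The gains level off at the higher protix levels, which is why answer (b) mentions a quadratic relationship even though the correlation is very high.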
II. At a large local high school, the principal wanted to ensure that her learners would perform
well on this year’s standardized tests. As such, the principal came up with a list of factors that
may negatively or positively impact test scores and aimed to prove it to the learners while
giving a practice test out of 100 points. A month before the practice test, the principal asked
learners to fill out a survey asking them how many hours per week they hung out with their
friends and how many hours per week they spent in the study hall. Because the high school
was very large, the principal only surveyed a sample of the learners. Below are the two
scatterplots showing the survey results versus the students’ scores on the practice exam.
[Scatter plot (Collection 1): practice test score versus Hours_With_Friends]
Ŷ = −2.69X + 122.87, R² = .71
[Scatter plot (Collection 1): practice test score versus Hours_in_Study_Hall]
Ŷ = 2.85X + 76.183, R² = .02
Based on these two scatterplots, answer the following questions:
1. Is there a positive or negative relationship between the hours spent by students
with their friends and their test scores? How about the hours spent in the study hall and
their test scores?
2. On average, what would be the students’ scores if they spent zero hours per week
hanging out with friends? In the study hall?
3. On average, by how many points would the test score increase or decrease if each
student spent one extra hour in the study hall? Hanging out with friends?
When the learners heard the results of the study, they asked the principal to look at different
samples of learners in the high school. To accommodate the request of the learners, the
principal decided to randomly sample groups of 20 learners at a time, for 15 more times. The
following dot plots provide the summary of the results.
[Dot plots of the sampled regression slopes. Hours in Study Hall Dot Plot: Slope axis from 1.5 to 4.5. Hours with Friends Dot Plot: Slope axis from -4.0 to -1.5.]
Based on the dot plots above, answer the following questions:
4. Should the learners believe that the principal’s decision to mandate spending an extra
hour in the study hall every week would increase their scores on the test? Explain.
5. Should the learners try to decrease the number of hours they spent hanging out with
friends before the test? Explain.
Answers
1. There appears to be a negative linear relationship between the amount of time a
student spends hanging out with their friends and their test scores. There does not
seem to be a clear positive or negative relationship between the number of hours spent
in the study hall and the test scores.
2. On average, a student would score 122.87 on the test if they spent zero hours per week
hanging out with friends. This y-intercept does not have a practical interpretation since
there is no way to score more than 100 on the test. Also note that 0 is not within the
range of the collected data values for hours spent with friends. On average, a student
would score 76.183 on the test if they spent zero hours per week in the study hall.
3. On average, a student’s score will change by -2.69 points for every hour they spend
hanging out with friends. On average, a student’s score will increase by 2.85 points for
every hour they spend in the study hall.
4. The dot plot illustrates that all the sampled slopes are positive. This means that for
every one of the 50 samples of 20 subjects sampled, the slope of the regression line
was positive, showing that as the number of hours spent in the study hall increases, the
scores on the test increase. In particular, the dot plot shows that the slopes tend to be
for the most part between 2.6 and 3.6, meaning that on average, scores would be
raised by between 2.6 and 3.6 points for every extra hour spent in the study hall.
5. The dot plot illustrates that all the sampled slopes are negative. This means that for
every one of the 50 samples of 20 subjects sampled, the slope of the regression line
was negative, showing that as the number of hours spent with friends increases, the
scores on the test decrease. In particular, the dot plot shows that the slopes tend to be
centered around −2.5, meaning that on average, the scores would change by about
−2.5 points for every extra hour spent hanging out with friends.
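The interpretations of the intercept and slope in answers 2 and 3 can be checked with a short sketch that uses the two fitted lines printed under the scatterplots. The function names are hypothetical. The last step relies on the fact that, for a line fitted with a single explanatory variable, R² is the square of the correlation coefficient and r carries the sign of the slope.

```python
from math import copysign, sqrt

def predict_score(slope, intercept, hours):
    """Predicted practice-test score from a fitted line Y-hat = slope*X + intercept."""
    return slope * hours + intercept

def r_from_fit(slope, r_squared):
    """Correlation implied by a simple linear fit: sqrt(R^2) with the slope's sign."""
    return copysign(sqrt(r_squared), slope)

# Fitted lines reported with the two scatterplots.
friends = {"slope": -2.69, "intercept": 122.87, "r_squared": 0.71}
study_hall = {"slope": 2.85, "intercept": 76.183, "r_squared": 0.02}

for name, fit in (("friends", friends), ("study hall", study_hall)):
    at_zero = predict_score(fit["slope"], fit["intercept"], 0)
    at_one = predict_score(fit["slope"], fit["intercept"], 1)
    print(name,
          "| score at 0 hours:", round(at_zero, 3),
          "| change per extra hour:", round(at_one - at_zero, 2),
          "| implied r:", round(r_from_fit(fit["slope"], fit["r_squared"]), 2))
```

The implied correlation is strong and negative for hours with friends but close to zero for hours in the study hall, which matches answer 1.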
Biographical Notes

Jose Ramon G. Albert, Ph.D.
Team Leader
Dr. Jose Ramon Albert is a Senior Research Fellow of the Philippine Institute for
Development Studies. He is a professional statistician who has written on topics spanning
poverty measurement and analysis, education statistics, and statistical analysis of missing data.
He has been a Consultant of development agencies, including the United Nations
Statistical Institute for Asia and the Pacific and the World Bank Group. He has also served in
government agencies such as the Philippines’ Commission on Population, Malaysia's Economic
Planning Unit, and Lao's Department of Statistics. Dr. Albert has taught in different
universities including the University of the Philippines and the De La Salle University.
Dr. Albert served as President of the Philippine Statistical Association, Inc. For over fifteen
years, Dr. Albert has written and co-authored various monographs, papers, and journal articles.
He earned his Doctorate of Philosophy in Statistics and his Master of Science in Statistics
from the State University of New York at Stony Brook. He was a Philippine Department of
Science and Technology Scholar and graduated with a degree in Mathematics (Summa Cum
Laude and Awardee for Excellence in Mathematics) from the De La Salle University.

Zita V.J. Albacea, Ph.D.
Writer
Dr. Zita Albacea is the current Executive Director of the Philippine Statistical Research
and Training Institute. She has been teaching Statistics spanning survey operations, special
problems, and statistical theory at the University of the Philippines Los Baños for 34 years. She
served as Dean of the UPLB College of Arts and Sciences from 2011 to 2014 and UPLB Vice
Chancellor for Administration from 2002 to 2004. She co-authored several teaching workbooks,
lecture handbooks, and laboratory manuals for various topics in Statistics. Dr. Albacea completed
her doctorate degree in Statistics at the University of the Philippines Los Baños. She also
finished her master’s and bachelor’s (cum laude) degrees in Statistics at the same university.

Mark John V. Ayaay
Writer
Mr. Mark John V. Ayaay is the lead teacher in Statistics 1 at Philippine Science High School,
Diliman, where he has been teaching for five years and was considered an Outstanding Teacher for
three school years. Prior to teaching at PSHS, he was also an Instructor at the Ateneo de Manila
University, where he taught Calculus and Applied Mathematics. He finished his bachelor’s
degree in Mathematics at the Ateneo de Manila University, and is currently taking his Bachelor
of Science in Public Health major in Biostatistics at the University of the Philippines Manila.

Isidoro P. David, Ph.D.
Writer
Dr. Isidoro P. David is one of the frontrunners of the Statistics community in the country. He
served as President of the Philippine Statistical Association for two terms. He was consultant for
the Asian Development Bank, the National Statistics Office, and the Statistical Research and
Training Center, among others. He also held a teaching position at the University of the
Philippines. Dr. David finished his doctorate in Statistics at the Iowa State University of Science
and Technology. He received his master’s degree in Statistics and bachelor’s (cum laude) degree in
Agriculture, majoring in Statistics, at the University of the Philippines Diliman. He received
different local and international citations, including the Outstanding Social Scientist Award and the
Outstanding Researcher Award in Mathematical Sciences, and won the 3rd Mahalanobis International Award.

Imelda E. de Mesa, Ph.D.
Writer
Dr. Imelda E. de Mesa is a Senior Lecturer at the University of the Philippines School of Statistics,
Diliman, where she teaches undergraduate courses in Statistics. Before teaching at UP
Diliman, she was an Assistant Professor VI at the De La Salle University Manila, where she
taught courses such as Statistical Inference, Multivariate Analysis, and Nonparametric
Statistics at both the graduate and undergraduate levels. She also held various
leadership positions at the Philippine Statistical Association from 2003 to 2011, most notably as
the Board Director for five years.

Nancy E. Añez-Tandang, Ph.D.
Technical Editor
Dr. Nancy Tandang is Assistant Professor at the University of the Philippines Los Baños. She has
served as Chairperson for Research Committee at the Institute of Statistics in UPLB and has
collaborated on researches and lectures with partner universities and agencies, including the
Ateneo de Naga, TESDA, DA, and DepEd. Dr. Tandang co-authored academic resources on
Statistics such as the Training Manual on Microcomputer based Statistics for the Social Science
Research and all twelve editions of Workbook on Statistics. Dr. Tandang completed her doctorate
in Statistics, her master’s in Statistics, and her bachelor’s degree in Statistics (cum laude) at the
University of the Philippines Los Baños.

Roselle V. Collado
Technical Editor
Prof. Roselle V. Collado is Assistant Professor III at the University of the Philippines Los Baños
and is currently the Program Development Associate at the Office of Institutional Linkages
in UPLB. She completed her master’s degree in Statistics at the University of the Philippines Los
Baños under the DOST-ESEP scholarship grant. She graduated cum laude for her bachelor’s
degree in Statistics also at the same university. Dr. Collado has served as statistician for
different partner groups, including the Resources, Environment, and Economic Center
for Studies, DAR, Inc. and SEAMEO-SEARCA. She has also served as resource speaker in
lectures on Statistics around the Philippines.

Rea Uy-Epistola
Copyreader
Rea Uy-Epistola is currently Proprietress for the Material Recovery Facilities. She held consultant
posts for writing projects including those of UNICEF Philippines, Save the Children
Federation, Inc., and the Climate Change Commission. Rea Uy-Epistola graduated Cum
Laude at the University of Santo Tomas with a bachelor’s degree in Political Science.

Michael Rey O. Santos
Layout Artist
Michael Rey Santos is a freelance illustrator and graphic artist specializing both in traditional and
digital media. He worked as illustrator for publications such as Summit Media’s K-Zone
Magazine, and for government agencies including PAG-IBIG and CHED. Mr. Santos also taught
graphic applications and animated media at the First Academy of Computer Arts for 7 years. He
graduated at the De La Salle University with a bachelor’s degree in Psychology.

