Describing Data
Lesson 7: Measures of Variation
TIME FRAME: 1 hour session
OVERVIEW OF LESSON
In this lesson, students will see that measures of central tendency alone do not adequately
describe a data set. They will compare two data sets representing the returns on two stocks.
Here, the means are the same, and the measures of spread (range, standard deviation, and
variance) are also the same. The standard deviation can be viewed as a measure of risk. The
main learning here is that a mixture of the two stocks, whose return is the average of the
returns of the two stocks, carries less risk, because the standard deviation of the mixture is smaller.
LEARNING OUTCOME(S): At the end of the lesson, the learner is able to
LESSON OUTLINE:
1. Introduction
2. Case Study: Returns on Stocks
3. Analysis and Comments on Case
DEVELOPMENT OF THE LESSON
(A) Introduction
Discuss with students the importance of thinking about their future, of saving, and of wealth
generation. Explain that a number of people invest money in the stock market as an
alternative financial instrument to generate wealth from savings. (Explanatory Note: Stocks
are shares of ownership in a company. When people buy stocks, they become part owners of
the company and share in both its profits and its losses.)
Mention to students that the history of performance of a particular stock may be a useful
guide to what may be expected of its performance in the foreseeable future. (This is, of
course, a very big assumption, but we have to make it anyway.)
(B) Case Study: Returns on Stocks
Provide the following data to students representing the rates of return for two stocks, which
we'll call stock A and stock B.
Inform students that the rate of return is defined as the increase in value of the portfolio
(including any dividends or other distributions) during the year divided by its value at the
beginning of the year. For instance, if the parents of Juana dela Cruz invest 50,000 pesos in
a stock at the beginning of the year, and the value of the stock goes up to 60,000 pesos, an
increase in value of 10,000 pesos, then the rate of return is 10,000/50,000 = 0.20.
Explain to students that the rate of return may be positive or negative. It represents the
fraction by which your wealth would have changed had it been invested in that particular
combination of securities.
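The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `rate_of_return` and the second, loss-making scenario (50,000 pesos falling to 45,000 pesos) are assumptions introduced here, not part of the lesson's data.

```python
def rate_of_return(start_value, end_value, distributions=0.0):
    """Increase in value (plus dividends or other distributions) divided by the starting value."""
    return (end_value - start_value + distributions) / start_value


# The example from the text: 50,000 pesos grows to 60,000 pesos.
juana_return = rate_of_return(50_000, 60_000)

# Hypothetical losing year (not from the lesson): 50,000 pesos falls to 45,000 pesos.
losing_return = rate_of_return(50_000, 45_000)

print(f"{juana_return:.2f}")   # positive rate of return
print(f"{losing_return:.2f}")  # negative rate of return
```

The sign of the result shows directly whether the investment gained or lost value over the year.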
Divide students into groups of three, and ask each group to obtain the average return for the
two stocks and the standard deviations of the rates of return.
Tell them to use the historical performance of the stocks as a guide to making an investment
decision. Instruct them to look at summary measures of variability (such as the range and the
standard deviation) of the rates of return, and use these as measures of risk associated with
investing in a given security. Discuss whether it would make any difference if we decide to
invest wholly in stock A, wholly in stock B, or half of our investments in stock A and half in
stock B. Ask them why this is so.
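For teachers who want to preview the comparison before class, here is a minimal sketch. The return values below are hypothetical (they are NOT the lesson's stock A and stock B data); they were chosen so that the two stocks have the same mean and the same standard deviation, as in the case study. The variable names are likewise assumptions.

```python
from statistics import mean, pstdev

# Hypothetical yearly rates of return (NOT the lesson's data).
# Stock B has the same values as stock A, but they occur in different years.
stock_a = [0.10, 0.20, -0.05, 0.15, 0.05]
stock_b = [-0.05, 0.05, 0.20, 0.10, 0.15]

# Half of the money in A and half in B: each year's return is the average of the two.
mixture = [(a + b) / 2 for a, b in zip(stock_a, stock_b)]

for name, returns in [("A", stock_a), ("B", stock_b), ("50/50 mix", mixture)]:
    print(f"{name:>9}: mean = {mean(returns):.3f}, sd = {pstdev(returns):.3f}")
```

With these made-up numbers the mixture keeps the same mean return but has a smaller standard deviation, because the good years of one stock partly offset the bad years of the other.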
Notes on Calculating Measures of Variation
(i) A simple measure of variation is the range, the difference between the maximum and
minimum values.
While the range is simple, it only depends on the extremes; it ignores information
about what goes on between the smallest (minimum) and largest (maximum) values
in a data set.
We may want to have a measure of spread based on the deviations of all the data
values from the mean. Getting the mean of these deviations always yields a value of
zero, regardless of the values in the data set. However, the average of the absolute
values of these deviations is nontrivial. It is called the mean absolute deviation and
is useful for measuring spread. This measure, alas, does not have very interesting
mathematical properties.
An alternative to the mean absolute deviation is the variance, formed by taking the
mean of the squared deviations from the average. Unlike the mean absolute deviation,
the variance has some interesting mathematical properties, but we omit discussion of
these properties here. The variance is expressed in the square of the units of the data.
If we take its square root, we get the standard deviation.
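The measures described in these notes can be computed step by step as follows. This is a small sketch using hypothetical data values (6, 7, 8, 9, 10), chosen only to keep the arithmetic easy to follow; the variable names are assumptions.

```python
from math import sqrt

# Hypothetical data values, chosen only to keep the arithmetic simple.
data = [6, 7, 8, 9, 10]
n = len(data)

# Range: depends only on the two extremes.
data_range = max(data) - min(data)

# Deviations of every value from the mean.
mean_value = sum(data) / n
deviations = [x - mean_value for x in data]

mean_deviation = sum(deviations) / n            # zero for any data set
mad = sum(abs(d) for d in deviations) / n       # mean absolute deviation
variance = sum(d ** 2 for d in deviations) / n  # mean squared deviation (squared units)
std_dev = sqrt(variance)                        # back in the original units

print(data_range, mean_deviation, mad, variance, std_dev)
```

Dividing by `n` here matches the population formulas used in this lesson.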
TECHNICAL NOTES
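Given a data set $x_1, x_2, \ldots, x_N$, denote the mean as
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
(a) the variance, denoted as $\sigma^2$, is
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$$
(b) the population standard deviation, denoted as $\sigma$, is the square root of the variance:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}$$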
The variance and standard deviation are based on all items in the list, and each item is
given a proper weight. They are extremely useful measures of variability because they
measure the average scatter of the data around the mean, that is, how far the larger
values lie above the mean and how far the smaller values lie below it. The
variance and standard deviation increase when the deviations about the
mean increase, and decrease when these deviations decrease. A small standard deviation
(and variance) means a high degree of uniformity in the observations and of
homogeneity in a series.
The variance is most suitable for algebraic manipulations but, as was pointed out
earlier, its computation results in squared units. On the other hand, the standard
deviation has a value in the original units of the data. Thus, it serves as the primary
measure of variation, just as the mean is the primary measure of central location.
[Dot plot: eight points on an axis from 6 to 10]
Another is
[Dot plot: two points at each of the values 6, 7, 8, 9, and 10]
2. Gerald, Carmina, and Rodolfo obtained the prices (in pesos) of a jar of peanut butter at
several grocery stores. Below is the data they have collected:
100.80 197.60 158.00 131.60 184.40 149.20
136.00 109.60 360.40 122.80 131.60
After analyzing the data, Gerald said, "The prices of peanut butter are pretty similar. The range
is only PHP 30.80." Carmina said, "You are mistaken! The prices are very different. The range is
PHP 259.60. That is a big difference in terms of peanut butter." Rodolfo said, "I think you are both
mistaken. The range isn't a useful measure to describe this set of data."
Gerald did not order the data set from smallest to largest, and erroneously subtracted the first
value (100.80) from the last value (131.60) in the data set.
Carmina found the range correctly by subtracting the smallest value (100.80) from the largest
value (360.40).
Rodolfo noticed that the maximum, 360.40, is an outlier in this set of data. As a result, a range of
PHP 259.60 does not describe the variation of the data well, as it was unduly inflated
by the extreme value.
ANSWER:
Rodolfo astutely observed that while Carmina was correct in her calculations, the range is not
very useful in describing the variability of this set of data, as the range would only be PHP 96.80
if the outlier were removed from the data set.
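The three figures in this discussion can be checked with a short script. The prices are the ones listed in the problem; the variable names are assumptions made for this sketch.

```python
# Prices of a jar of peanut butter (in pesos), in the order they were collected.
prices = [100.80, 197.60, 158.00, 131.60, 184.40, 149.20,
          136.00, 109.60, 360.40, 122.80, 131.60]

# Gerald's mistake: last listed value minus first listed value, without sorting.
gerald_range = prices[-1] - prices[0]

# Carmina's (correct) range: largest value minus smallest value.
carmina_range = max(prices) - min(prices)

# Rodolfo's point: the range after setting aside the largest value.
without_outlier = sorted(prices)[:-1]
trimmed_range = max(without_outlier) - min(without_outlier)

print(f"Gerald:          PHP {gerald_range:.2f}")
print(f"Carmina:         PHP {carmina_range:.2f}")
print(f"Without outlier: PHP {trimmed_range:.2f}")
```

Dropping a single price changes the range by more than half, which is why the range alone is a fragile summary here.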
3. Three hundred students taking a first course in Statistics are provided a common final
examination. The following histogram shows the distribution of the final scores.
[Histogram of final scores: bars labeled A through H, on a horizontal axis running from 40 to 120]
Suppose the professor will give 30% weight to the Final Examination. What effect
would multiplying all the Final Scores by 30% have on the mean of the Final
Exam Scores? On the standard deviation of the Final Exam Scores?
Answer: The mean will be rescaled by 30%, and so will the standard deviation.
Suppose the professor wants to bloat the Final Examination Scores. What effect
would adding 5 points to all the Final Scores have on the mean of the Final Exam
Scores? On the standard deviation of the Final Exam Scores?
Answer: The mean will also go up by 5 points; the standard deviation stays the same.
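Both answers can be verified numerically. The sketch below uses five hypothetical scores (the lesson's 300 final scores are not reproduced in this guide), and the variable names are assumptions.

```python
from statistics import mean, pstdev

# Hypothetical final exam scores (NOT the lesson's data).
scores = [60, 70, 80, 90, 100]

weighted = [0.30 * s for s in scores]  # every score multiplied by 30%
bloated = [s + 5 for s in scores]      # 5 points added to every score

for label, values in [("original", scores), ("times 0.30", weighted), ("plus 5", bloated)]:
    print(f"{label:>10}: mean = {mean(values):.2f}, sd = {pstdev(values):.2f}")
```

Multiplying rescales both the center and the spread, while adding a constant shifts the center and leaves the spread untouched.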
4. In a fitness center, the weights of a certain group of students were taken, and every student
was found to weigh 140 pounds. What would be the standard deviation of the distribution of weights?
Answer:
Zero (since the data do not vary).
5. Determine whether each of the following five statements is true or false, and explain briefly.
a. The average and median of any list of data are always close together. (Answer: False)
b. Half of a list of data is always below the average. (Answer: False; this describes the
median, not the average)
c. If entries in a list are doubled, then the average is doubled. (Answer: True)
d. If entries in a list are doubled, then the standard deviation is doubled. (Answer: True)
e. If, in a set of data, positive numbers are changed to negative while negative numbers are
changed to positive, the standard deviation changes sign as well. (Answer: False; the standard
deviation is always nonnegative)
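A quick numerical check of the five statements is sketched below. The list is hypothetical, with one very large value chosen to make the effects easy to see; the variable names are assumptions.

```python
from statistics import mean, median, pstdev

# Hypothetical list with one very large value (NOT from the lesson).
data = [1, 2, 3, 4, 100]

# (a) The mean and the median can be far apart.
print("mean:", mean(data), "median:", median(data))

# (b) The share of values below the mean need not be one half.
share_below_mean = sum(x < mean(data) for x in data) / len(data)
print("share below the mean:", share_below_mean)

# (c), (d) Doubling every entry doubles both the mean and the standard deviation.
doubled = [2 * x for x in data]
print("mean ratio:", mean(doubled) / mean(data))
print("sd ratio:", pstdev(doubled) / pstdev(data))

# (e) Flipping every sign leaves the standard deviation unchanged and nonnegative.
flipped = [-x for x in data]
print("sd:", pstdev(data), "sd after sign flip:", pstdev(flipped))
```

One counterexample is enough to show that statements (a), (b), and (e) are false; statements (c) and (d) hold for any list, which the ratios merely illustrate.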
Explanatory Note:
Teachers have the option to just ask this assessment orally to the entire class to either
introduce or recall the notions of computing the range and of computing the standard
deviation, or to group students and ask them to identify answers, or to give this as
homework, or to use some questions/items here for a chapter examination.
Consider the following five data representing the difference in scores of two players in a
computer game:
To compute the population standard deviation here, five steps are needed:
(a) compute the mean, i.e. sum the values in the first
column and divide by the number of items, thus yielding
$\bar{x} = 1$;
(b) subtract the mean from each of the items (yielding the
deviations from the average) and thus obtain the second
column of the table;
(c) square the deviations from the average (the items in the second
column), and thus obtain the third column of the table;
(d) sum the values in the third column and divide by the number of
data values, thus yielding Variance = 7722/5 = 1544.4;
(e) take the square root of the result from (d): the standard deviation is
nearly 39.3.
In practice, the sum of the values in the third column may be obtained in a much faster
and more efficient way. This calculation involves (a*) summing the squared values of the
first column, 7727, and (b*) subtracting from (a*) the product of the sample size and the
square of the mean, 5 × 1² = 5, which gives 7722.
If the first column of the table were entered in an Excel spreadsheet as in Figure 6.1,
we could enter
= STDEVP(A2:A6)
in an empty cell (such as A9) and obtain the population standard deviation as
approximately 39.3.
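The shortcut rests on the identity that the sum of squared deviations equals the sum of the squared values minus N times the squared mean. Here is a minimal sketch that reproduces the figures quoted in the worked example from its summary numbers alone (five values, mean 1, sum of squares 7727); the variable names are assumptions.

```python
from math import sqrt

# Summary figures quoted in the worked example.
n = 5                  # number of data values
mean_value = 1         # mean of the first column
sum_of_squares = 7727  # sum of the squared values of the first column

# Shortcut: subtract n times the squared mean to get the sum of squared deviations.
sum_sq_dev = sum_of_squares - n * mean_value ** 2

variance = sum_sq_dev / n  # divide by n, as in the population formula
std_dev = sqrt(variance)

print(sum_sq_dev, variance, round(std_dev, 1))
```

This avoids building the second and third columns of the table, which is convenient when only totals are available.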
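Given a data set $x_1, x_2, \ldots, x_N$, denote the mean as $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$. The variance, denoted as $\sigma^2$, is
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2;$$
while the population standard deviation, denoted as $\sigma$, is the square root of the
variance, i.e.,
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}.$$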