
Data Description: Numerical Measures of Variability for Ungrouped Univariate Data
Suppose the two data sets below represent the scores of two groups of 5 students on a 30-item statistics test. Which group performs better?

Group A: 12, 15, 18, 24, 20


Group B: 14, 13, 26, 30, 6
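As a quick check, here is a minimal sketch using Python's standard `statistics` module (the variable names are ours, chosen for illustration). Both groups have the same mean, so the mean alone cannot tell them apart. A measure of spread can.

```python
from statistics import mean, stdev

group_a = [12, 15, 18, 24, 20]
group_b = [14, 13, 26, 30, 6]

for name, scores in [("A", group_a), ("B", group_b)]:
    spread = max(scores) - min(scores)   # the range: largest minus smallest value
    print(f"Group {name}: mean = {mean(scores):.1f}, range = {spread}, "
          f"sample SD = {stdev(scores):.2f}")

# Group A: mean = 17.8, range = 12, sample SD = 4.60
# Group B: mean = 17.8, range = 24, sample SD = 9.91
```

The averages are identical (17.8), but Group B's scores are far more spread out. The measures in this section make that difference precise.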
Objectives

Introduce some basic statistical measures of spread, or variability.
Show how to compute these measures and investigate some of their properties.
Introduction
A measure of variability for a collection of
data values is a number that is meant to
convey the idea of spread for the data set.
The most commonly used measures of
variability for sample data are the:
range

interquartile range

mean absolute deviation

variance or standard deviation

coefficient of variation.
The Range
Explanation of the term range: The
range is the difference between the
largest and smallest values in the data
set.
NOTE: The explanation is true for a
sample as well as a finite population of
values.
The Range
Example: What is the range for the following
sample values?

3, 8, 6, 14, 0, 4, 0, 12, 7, 0, -10

Solution: First, arrange the values from smallest to largest.

-10, 0, 0, 0, 3, 4, 6, 7, 8, 12, 14

Range = 14 - (-10) = 24
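The definition translates into one line of code. In this minimal sketch, `data_range` is a hypothetical helper name, chosen so that it does not shadow Python's built-in `range`:

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

sample = [3, 8, 6, 14, 0, 4, 0, 12, 7, 0, -10]
print(sorted(sample))       # [-10, 0, 0, 0, 3, 4, 6, 7, 8, 12, 14]
print(data_range(sample))   # 14 - (-10) = 24
```

Sorting is shown only to mirror the hand method. `max` and `min` do not need ordered input.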
The Range
Question: Why does subtracting the
smallest value from the largest value
produce a measure of spread?
The next slide shows the plot of the data
set.
Observe that the range measures the
distance between the smallest and largest
values.
This distance gives a measurement of the
spread of the data.
The Range

Range gives a measurement of spread.


Quick Tip:
The range does not use the concept of deviations.
It is affected by outliers (values that are very large or very small relative to the rest of the data set).
The range does not use all the information in the data set, only the largest and smallest values.
Thus, it is not a very useful measure of spread or variation.
The Range - Example
Example: What is the range for the
following sample values?

9, 995, 1000, 1002, 1014

Solution: Range = 1014 - 9 = 1005

Here the range is strongly affected by the outlying value of 9.
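To see how much one value drives the range, the sketch below recomputes it with the 9 removed. The removal is for illustration only. It is not a recommendation to delete outliers.

```python
values = [9, 995, 1000, 1002, 1014]

full_range = max(values) - min(values)             # 1014 - 9 = 1005

# illustration only: drop the outlying value 9 and recompute
without_outlier = [v for v in values if v != 9]
trimmed_range = max(without_outlier) - min(without_outlier)   # 1014 - 995 = 19

print(full_range, trimmed_range)   # 1005 19
```

A single value changes the range from 19 to 1005, which is why the range alone is a fragile measure of spread.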
The Interquartile Range
Explanation of the term interquartile
range: The interquartile range measures
the spread of the middle 50% of an
ordered data set.
NOTE: The interquartile range is obtained using the following steps:
Step 1: Order the data set from smallest to largest.
Step 2: Find the median of the ordered set. Denote it by Q2.
The Interquartile Range
Step 3: Find the median of the first (lower) 50% of the ordered set. The median found in Step 2 is not included in this portion of the data. Denote it by Q1.
Step 4: Find the median of the second (upper) 50% of the ordered set. The median found in Step 2 is not included in this portion of the data. Denote it by Q3.
The Interquartile Range
Step 5: The interquartile range is computed from

IQR = Q3 - Q1
The Interquartile Range
The following depicts the idea of the
interquartile range.
The Interquartile Range
Example: The following scores for a
statistics 10-point quiz were reported.
What is the value of the interquartile
range?

7, 8, 9, 6, 8, 0, 9, 9, 9

0, 0, 7, 10, 9, 8, 5, 7, 9
The Interquartile Range

Solution: Ordering the data in ascending order gives:

0, 0, 0, 5, 6, 7, 7, 7, 8, 8,
8, 9, 9, 9, 9, 9, 9, 10
The Interquartile Range

Solution (continued): The median (Q2) is 8 (the mean of the 9th and 10th data elements).
The first 50% of the data (excluding Q2) contains the elements
0, 0, 0, 5, 6, 7, 7, 7, 8
yielding a median of 6. Thus Q1 is 6.
Similarly, the second 50% contains 8, 8, 9, 9, 9, 9, 9, 9, 10, yielding a median of 9. Thus Q3 is 9.
The interquartile range is therefore 9 - 6 = 3.
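The five steps can be sketched in Python as shown below. This is a minimal sketch that assumes at least two values. `iqr_by_halves` is a hypothetical helper name, and `statistics.median` from the standard library computes each median.

```python
from statistics import median

def iqr_by_halves(values):
    """Quartiles and IQR by the five-step method: Q1 and Q3 are the medians of
    the lower and upper halves of the ordered data; when n is odd, the middle
    value (Q2) is left out of both halves. Assumes at least two values."""
    data = sorted(values)                  # Step 1: order the data
    n = len(data)
    q2 = median(data)                      # Step 2: median of the whole set
    half = n // 2
    lower = data[:half]                    # Step 3: lower half (middle value excluded if n is odd)
    upper = data[half + 1:] if n % 2 else data[half:]   # Step 4: upper half
    q1, q3 = median(lower), median(upper)
    return q1, q2, q3, q3 - q1             # Step 5: IQR = Q3 - Q1

quiz = [7, 8, 9, 6, 8, 0, 9, 9, 9,
        0, 0, 7, 10, 9, 8, 5, 7, 9]
print(iqr_by_halves(quiz))   # (6, 8.0, 9, 3)
```

Q2 prints as `8.0` because the median of an even-sized set is the mean of two middle values.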
The Interquartile Range
Solution (continued): Technology makes it easy to compute the interquartile range. In EXCEL, type the data into a worksheet, find Q3 and Q1 by typing =QUARTILE(array, quart) with quart = 3 and quart = 1, and then use the formula Q3 - Q1 to find the IQR.
Note that EXCEL's QUARTILE function (equivalent to QUARTILE.INC) interpolates between data values, so its quartiles can differ from the hand method above. For these data it gives Q1 = 6.25 and Q3 = 9, so IQR = 2.75.
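The same comparison can be made in Python (3.8 or later) with `statistics.quantiles`. To our understanding, `method="inclusive"` uses the same linear interpolation as EXCEL's QUARTILE/QUARTILE.INC. The default `method="exclusive"` corresponds to QUARTILE.EXC. Treat this mapping as an assumption worth checking against your own spreadsheet.

```python
from statistics import quantiles

quiz = [7, 8, 9, 6, 8, 0, 9, 9, 9,
        0, 0, 7, 10, 9, 8, 5, 7, 9]

# interpolating quartiles, believed to match EXCEL's QUARTILE.INC
q1, q2, q3 = quantiles(quiz, n=4, method="inclusive")
print(q1, q2, q3, q3 - q1)   # 6.25 8.0 9.0 2.75

# the default "exclusive" method, believed to match QUARTILE.EXC
e1, e2, e3 = quantiles(quiz, n=4)
print(e1, e2, e3, e3 - e1)   # 5.75 8.0 9.0 3.25
```

All three conventions (hand method, inclusive, exclusive) are legitimate. When you report an IQR, state which one you used.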
The Mean Absolute Deviation
The mean absolute deviation utilizes
deviations of the data values from the
mean.
Explanation of the term - Mean
Absolute Deviation (MAD): The mean
absolute deviation is the average of the
absolute deviations from the mean of
the data set.
The Mean Absolute Deviation
The MAD is computed using the following formula.

$$\text{MAD} = \frac{\sum |x - \bar{x}|}{n}$$

The formula says that you:
subtract the sample mean from each data value
take the absolute values of the results
add the absolute values together
divide by the sample size
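The four steps map directly onto code. In this minimal sketch, `mean_absolute_deviation` is a hypothetical helper name. The data are those of the example that follows.

```python
def mean_absolute_deviation(values):
    n = len(values)
    x_bar = sum(values) / n                       # the sample mean
    abs_devs = [abs(x - x_bar) for x in values]   # |x - x̄| for each value
    return sum(abs_devs) / n                      # add them up, divide by n

sample = [3, 8, 6, 12, 0, -4, 10]
print(round(mean_absolute_deviation(sample), 2))  # 4.57  (= 32/7)
```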
The Mean Absolute Deviation
Example: What is the MAD for the
following sample values?

3, 8, 6, 12, 0, -4, 10

Solution: First of all, the sample mean = 5 (Verify).
The table below shows the computations.

x     x - x̄    |x - x̄|
3     -2       2
8     3        3
6     1        1
12    7        7
0     -5       5
-4    -9       9
10    5        5
Total          32

MAD = 32/7 ≈ 4.57

Question: What does the MAD measure?
The MAD measures the average
(absolute) distance of the sample
values from the mean of the data
values.
The Mean Absolute Deviation

Each deviation contributes to the total in proportion to its size.

The average distance of the sample values from the mean is 4.57.
Quick Tip:

If data set A has a larger MAD than data set B, then it is reasonable to believe that the values in data set A are more spread out (variable) than the values in data set B.
The MAD is sensitive to values that are
very small or very large relative to the
rest of the data set.
The Variance and Standard
Deviation
The variance and standard deviation
are the most common and useful
measures of variability.
These two measures provide
information about how the data vary
about the mean.
The Variance and Standard
Deviation

When the data are clustered about the mean, the variance and standard deviation will be relatively small.
The Variance and Standard
Deviation

When the data are widely scattered about the mean, the variance and standard deviation will be relatively large.
Sample Variance and Standard
Deviation
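Explanation of the term sample variance: The sample variance is an approximate average of the squared deviations of the data values from the sample mean (the sum of the squared deviations is divided by n - 1 rather than by n).
The sample variance is denoted by s² and is computed from the following formula:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$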
Sample Variance and Standard
Deviation
Example: What is the variance for the
following sample values?
3, 8, 6, 14, 0, 11
NOTE: Do not let the formula
intimidate you. We will build a table to
help with the computations.
Sample Variance and Standard
Deviation
We will build a table to help in the computations. NOTE: The mean = 7.

x     x - x̄    (x - x̄)²
3     -4       16
8     1        1
6     -1       1
14    7        49
0     -7       49
11    4        16
Total          132

s² = 132/(6 - 1)
= 132/5
= 26.4
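The table computation can be written as a short Python sketch. `statistics.variance` from the standard library also divides by n - 1, so it serves as a cross-check.

```python
import math
from statistics import variance

sample = [3, 8, 6, 14, 0, 11]
n = len(sample)
x_bar = sum(sample) / n                        # sample mean: 42 / 6 = 7.0
sq_devs = [(x - x_bar) ** 2 for x in sample]   # squared deviations (x - x̄)²
ss = sum(sq_devs)                              # sum of squared deviations: 132.0

s2 = ss / (n - 1)        # sample variance: divide by n - 1
s = math.sqrt(s2)        # sample standard deviation

print(s2, round(s, 4))                      # 26.4 5.1381
print(math.isclose(s2, variance(sample)))   # True
```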
The Variance and Standard
Deviation
In the previous example, observe that the
variance is large relative to the size of the
data values.
This can be observed from the plot which
shows that the data values are very much
spread out about the mean value of 7.
Sample Variance and Standard
Deviation
Explanation of the term sample standard
deviation: The sample standard deviation is the
positive square root of the variance.
NOTE: The standard deviation has the same
unit as the variable.
Example: The sample standard deviation for the previous example is

s = √26.4 ≈ 5.1381
Quick Tips:

If all of the observations have the same value, the sample variance (standard deviation) will be zero. That is, there is no variability in the data set.
The variance (standard deviation) is influenced by outliers in the data set.
The unit for the standard deviation is the same as that for the raw data.
For this reason, the standard deviation is usually preferred over the variance as the measure of variability.
Population Variance and Standard
Deviation
Explanation of the term population
variance: The population variance is the
average of the squared deviations of the data
values from the population mean.
The population variance is denoted by σ² and is computed from the following formula:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Population Variance and Standard
Deviation
Explanation of the term population standard deviation: The population standard deviation is the positive square root of the population variance.
The population standard deviation is denoted by σ and is computed from the following formula:

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
Population Variance and Standard
Deviation

Consider the example on the previous page. Suppose the data set represents a population of values. What are the values of the variance and standard deviation?
The computation is the same, except that we divide the sum of the squared deviations by N instead of n - 1.
Population Variance and Standard
Deviation
We use the same table of squared deviations as in the sample example (their sum is 132). NOTE: The mean = 7.

σ² = 132/6
= 22
Population Variance and Standard
Deviation
Explanation of the term population standard
deviation: The population standard deviation is
the positive square root of the variance.
NOTE: The standard deviation has the same
unit as the variable.
Example: The population standard deviation for the previous example is

σ = √22 ≈ 4.6904
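The population version differs only in the divisor. In this sketch the same six values are treated as a population. Python's `statistics.pvariance` and `statistics.pstdev` compute the same quantities and serve as a cross-check.

```python
import math
from statistics import pvariance, pstdev

population = [3, 8, 6, 14, 0, 11]
N = len(population)
mu = sum(population) / N                                 # population mean: 7.0
sigma2 = sum((x - mu) ** 2 for x in population) / N      # divide by N, not n - 1
sigma = math.sqrt(sigma2)

print(sigma2, round(sigma, 4))                           # 22.0 4.6904
print(pvariance(population), round(pstdev(population), 4))   # same values from the library
```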
The Coefficient of
Variation
The coefficient of variation (CV) allows us to
compare the variation of two (or more)
different variables.
Explanation of the term sample coefficient
of variation: The sample coefficient of
variation is defined as the sample standard
deviation divided by the sample mean of the
data set.
Usually, the result is expressed as a
percentage.
The Coefficient of
Variation


$$\text{Sample CV} = \frac{s}{\bar{x}} \times 100\%$$

$$\text{Population CV} = \frac{\sigma}{\mu} \times 100\%$$

NOTE: The coefficient of variation standardizes the variation by dividing it by the mean of the values.
The Coefficient of
Variation
The coefficient of variation has no units, since the standard deviation and the mean have the same units and thus cancel each other out.
Because of this property, we can use
this measure to compare the variations
for different variables with different
units.
The Coefficient of
Variation
Example: The mean number of parking
tickets issued in a neighborhood over a
four-month period was 90, and the
standard deviation was 5. The average
revenue generated from the tickets was
$5,400, and the standard deviation was
$775. Compare the variations of the two
variables.
Solution is on the next slide.
The Coefficient of
Variation
Solution:
CV (tickets) = 5/90 × 100% ≈ 5.6%
CV (revenue) = 775/5,400 × 100% ≈ 14.4%

Since the CV is larger for the revenues, there is more variability in the recorded revenues than in the number of tickets issued.
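Because the CV is unitless, tickets (a count) and revenue (dollars) can be compared directly. In this minimal sketch, `coefficient_of_variation` is a hypothetical helper name.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: standard deviation / mean * 100."""
    return sd / mean * 100

cv_tickets = coefficient_of_variation(5, 90)       # number of tickets issued
cv_revenue = coefficient_of_variation(775, 5400)   # revenue in dollars

print(f"tickets: {cv_tickets:.2f}%, revenue: {cv_revenue:.2f}%")
# tickets: 5.56%, revenue: 14.35%
```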
The Coefficient of
Variation

Example: One stock (Stock A) had a 2006 mean asking price of $275.00 and a standard deviation of $65.257. Another stock (Stock B) had a mean asking price of $50.125 with a standard deviation of $15.525.
Compare the variations of the two variables, and thus the relative risk of investing in each stock.
Solution is on the next slide.
The Coefficient of
Variation
Solution:
CV (Stock A) = 65.257/275.00 × 100% ≈ 23.7%
CV (Stock B) = 15.525/50.125 × 100% ≈ 31.0%

Since the CV is larger for Stock B, Stock A is less risky.
Example
Compare the two stocks below:

Stock A:
Average price last year = $50
Standard deviation = $5
Stock B:
Average price last year = $100
Standard deviation = $5
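The exercise above is left without a solution. The sketch below computes the CV for both stock comparisons in this section, using a hypothetical helper `cv_percent`. With equal standard deviations of $5, Stock A's CV (10%) is twice Stock B's (5%). Relative to its price, Stock A is therefore the more variable of the two.

```python
def cv_percent(sd, mean):
    """Coefficient of variation as a percentage (sd / mean * 100)."""
    return sd / mean * 100

comparisons = {
    "2006 asking prices": {"Stock A": (65.257, 275.00), "Stock B": (15.525, 50.125)},
    "last year's prices": {"Stock A": (5, 50), "Stock B": (5, 100)},
}

for label, stocks in comparisons.items():
    cvs = {name: cv_percent(sd, mean) for name, (sd, mean) in stocks.items()}
    larger = max(cvs, key=cvs.get)   # larger CV = more variation relative to the mean
    print(label, {k: round(v, 2) for k, v in cvs.items()}, "-> larger CV:", larger)

# 2006 asking prices {'Stock A': 23.73, 'Stock B': 30.97} -> larger CV: Stock B
# last year's prices {'Stock A': 10.0, 'Stock B': 5.0} -> larger CV: Stock A
```

Equal standard deviations do not imply equal relative variability. The mean sets the scale.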
The Coefficient of
Variation
Explanation of the term population
coefficient of variation: the population
coefficient of variation is defined as the
population standard deviation divided by the
population mean of the data set.
NOTE: The population CV has the same
properties as the sample CV.
Sample Variance and Standard
Deviation: Alternative Formula
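Expanding the original formula for the sample variance and standard deviation, using the definition of summation and some of its rules, results in:

$$s^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)} \qquad s = \sqrt{\frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}}$$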
Sample Variance and Standard
Deviation: Alternative Formula
Now, let us apply the alternative formula to the sample on the previous page and see whether the results are the same. We form a table of values. Then, we plug the values of n, Σx, and Σx² into the formula.

x     3    8    6    14    0    11     Σx = 42
x²    9    64   36   196   0    121    Σx² = 426
Sample Variance and Standard
Deviation: Alternative Formula
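Plugging the values into the formula, we have:
Σx = 42, Σx² = 426, n = 6

$$\text{Variance} = s^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)} = \frac{6(426) - 42^2}{6(6-1)} = \frac{792}{30} = 26.4$$

$$\text{s.d.} = s = \sqrt{\frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}} = \sqrt{26.4} \approx 5.1381$$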
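A short sketch confirms that the shortcut and the definitional formula agree on this sample. The variable names are ours.

```python
import math

x = [3, 8, 6, 14, 0, 11]
n = len(x)
sum_x = sum(x)                     # Σx = 42
sum_x2 = sum(v * v for v in x)     # Σx² = 426

# shortcut (computational) formula: (6*426 - 42**2) / (6*5) = 792 / 30
var_shortcut = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

# definitional formula, for comparison
mean_x = sum_x / n
var_definition = sum((v - mean_x) ** 2 for v in x) / (n - 1)

print(var_shortcut, var_definition, round(math.sqrt(var_shortcut), 4))   # 26.4 26.4 5.1381
```

The shortcut avoids computing each deviation, which makes it convenient by hand. In floating-point arithmetic, however, subtracting two large, nearly equal numbers can lose precision. For large data sets or values with a large mean, software therefore generally favors the definitional form or an updating algorithm.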
Population Variance and Standard
Deviation: Alternative Formula

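The population variance and standard deviation can be computed using the alternative formula:

$$\text{Variance} = \sigma^2 = \frac{N\sum X^2 - \left(\sum X\right)^2}{N^2}$$

$$\text{s.d.} = \sigma = \sqrt{\frac{N\sum X^2 - \left(\sum X\right)^2}{N^2}}$$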
Population Variance and Standard
Deviation: Alternative Formula
Suppose the given data set below represents a population of values. Compute the standard deviation and variance.
Again, we make a table of values and then find ΣX and ΣX².

X     3    8    6    14    0    11     ΣX = 42
X²    9    64   36   196   0    121    ΣX² = 426
Population Variance and Standard
Deviation: Alternative Formula
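Plugging the values into the formula, we have:
ΣX = 42, ΣX² = 426, N = 6

$$\text{Variance} = \sigma^2 = \frac{N\sum X^2 - \left(\sum X\right)^2}{N^2} = \frac{6(426) - 42^2}{6^2} = \frac{792}{36} = 22$$

$$\text{s.d.} = \sigma = \sqrt{\frac{N\sum X^2 - \left(\sum X\right)^2}{N^2}} = \sqrt{22} \approx 4.6904$$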
Sample Variance and Standard
Deviation: Grouped Data
If the data set is presented in a frequency distribution table (grouped data), the sample variance and standard deviation can be found using the formulas:

$$\text{Variance} = s^2 = \frac{n\sum fX^2 - \left(\sum fX\right)^2}{n(n-1)} \qquad \text{s.d.} = s = \sqrt{\frac{n\sum fX^2 - \left(\sum fX\right)^2}{n(n-1)}}$$

where:
X is the class mark
f is the frequency
n = sum of the frequencies
Sample Variance and Standard
Deviation: Grouped Data
Find the sample variance and standard
deviation.
Sample Variance and Standard
Deviation: Grouped Data
Make a table of values for X, f, fX, X², and fX².

X     f     fX     X²      fX²
5     3     15     25      75
15    10    150    225     2250
25    6     150    625     3750
35    4     140    1225    4900
45    2     90     2025    4050
Σf = 25     ΣfX = 545      ΣfX² = 15,025
Sample Variance and Standard
Deviation: Grouped Data
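Substituting the following values from the table on the previous page:
n = Σf = 25, ΣfX = 545, ΣfX² = 15,025

$$\text{Variance} = s^2 = \frac{25(15025) - 545^2}{25(25-1)} = \frac{78600}{600} = 131$$

$$\text{s.d.} = s = \sqrt{\frac{n\sum fX^2 - \left(\sum fX\right)^2}{n(n-1)}} = \sqrt{131} \approx 11.4455$$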
Population Variance and Standard
Deviation: Grouped Data
If the data set presented in a frequency distribution table (grouped data) represents values obtained from a census (population), the variance and standard deviation can be found using the formulas:

$$\text{Variance} = \sigma^2 = \frac{N\sum fX^2 - \left(\sum fX\right)^2}{N^2} \qquad \text{s.d.} = \sigma = \sqrt{\frac{N\sum fX^2 - \left(\sum fX\right)^2}{N^2}}$$

where:
X is the class mark
f is the frequency
N = sum of the frequencies
Population Variance and Standard
Deviation: Grouped Data
Suppose the data set below represents a population of values. Find the variance and standard deviation.
Population Variance and Standard
Deviation: Grouped Data
Make a table of values for X, f, fX, X², and fX².

X     f     fX     X²      fX²
5     3     15     25      75
15    10    150    225     2250
25    6     150    625     3750
35    4     140    1225    4900
45    2     90     2025    4050
Σf = 25     ΣfX = 545      ΣfX² = 15,025
Population Variance and Standard
Deviation: Grouped Data
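Substituting the following values from the table on the previous page:
N = Σf = 25, ΣfX = 545, ΣfX² = 15,025

$$\text{Variance} = \sigma^2 = \frac{25(15025) - 545^2}{25^2} = \frac{78600}{625} = 125.76$$

$$\text{s.d.} = \sigma = \sqrt{\frac{N\sum fX^2 - \left(\sum fX\right)^2}{N^2}} = \sqrt{125.76} \approx 11.2143$$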
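The two grouped-data formulas differ only in the denominator, so one function with a flag can cover both. In this minimal sketch, `grouped_variance` and its `population` parameter are hypothetical names. The class marks and frequencies come from the table above.

```python
import math

def grouped_variance(class_marks, freqs, population=False):
    """Variance of grouped data from class marks X and frequencies f."""
    n = sum(freqs)                                                  # n (or N) = Σf
    sum_fx = sum(f * x for x, f in zip(class_marks, freqs))         # ΣfX
    sum_fx2 = sum(f * x * x for x, f in zip(class_marks, freqs))    # ΣfX²
    numerator = n * sum_fx2 - sum_fx ** 2
    denominator = n * n if population else n * (n - 1)             # N² vs n(n - 1)
    return numerator / denominator

marks = [5, 15, 25, 35, 45]
freqs = [3, 10, 6, 4, 2]

s2 = grouped_variance(marks, freqs)                        # 78600 / 600
sigma2 = grouped_variance(marks, freqs, population=True)   # 78600 / 625
print(s2, round(math.sqrt(s2), 4), sigma2, round(math.sqrt(sigma2), 4))
# 131.0 11.4455 125.76 11.2143
```

Grouped-data results are approximations, because every value in a class is treated as if it sat at the class mark.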
The Empirical Rule
Knowing the value of the mean and the value
of the standard deviation for a data set can
provide a great deal of information about the
data set.
In particular, if the data set has a single
mode and is symmetrical (bell-shaped), then
one can generalize some properties of the
distribution.
One such generalization is called the
Empirical Rule.
The Empirical Rule

The Empirical Rule gives some general statements relating the mean and the standard deviation of a bell-shaped distribution.
It relates the mean to one, two, and three standard deviations.
Empirical Rule

One Sigma Rule: Approximately 68% of the data values will lie within one standard deviation of the mean.
That is, one can expect a deviation of more than one sigma from the mean to occur about once in every three observations.
This is true because approximately 32% (roughly 1/3) of the values lie more than one standard deviation from the mean.
Empirical Rule - One Sigma
Rule

Graphical Display of the One Sigma Rule


Empirical Rule

Two Sigma Rule: Approximately 95% of the data values will lie within two standard deviations of the mean.
That is, one can expect a deviation of more than two sigma from the mean to occur about once in every twenty observations.
This is true because approximately 5% (1/20) of the values lie more than two standard deviations from the mean.
Empirical Rule - Two Sigma
Rule

Graphical Display of the Two Sigma Rule


Empirical Rule

Three Sigma Rule: Approximately 99.7% of the data values will lie within three standard deviations of the mean.
That is, one can expect a deviation of more than three sigma from the mean to occur about once in every 333 observations.
This is true because approximately 0.3% (about 1/333) of the values lie more than three standard deviations from the mean.
Empirical Rule - Three Sigma
Rule

Graphical Display of the Three Sigma Rule
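If the data follow an exactly normal (bell-shaped) distribution, the three percentages can be checked with the standard identity P(|Z| < k) = erf(k/√2), where Z is a standard normal variable. In this sketch, `within_k_sd` is a hypothetical helper name. The exact figures show that 68%, 95%, and 99.7% are rounded values. For an exact normal distribution, "once in every twenty" and "once in every 333" are closer to once in 22 and once in 370.

```python
import math

def within_k_sd(k):
    """P(|Z| < k) for a standard normal Z, computed as erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = within_k_sd(k)
    outside = 1 - inside
    print(f"within {k} SD: {inside:.2%}  outside: {outside:.2%}  (about 1 in {1 / outside:.0f})")

# within 1 SD: 68.27%  outside: 31.73%  (about 1 in 3)
# within 2 SD: 95.45%  outside: 4.55%  (about 1 in 22)
# within 3 SD: 99.73%  outside: 0.27%  (about 1 in 370)
```

Real data are rarely exactly normal. Treat the Empirical Rule as a guideline for roughly symmetric, single-peaked data.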
