Measures of Dispersion or Variation

Introduction
Objectives of studying variation
o Sometimes the average alone cannot adequately describe a set of observations.
Example
Monthly wages (Rs.)

           Factory A   Factory B   Factory C
           2300        2310        2380
           2300        2300        2210
           2300        2304        2220
           2300        2306        2200
           2300        2280        2490
Total      11,500      11,500      11,500
Average    2,300       2,300       2,300
Observations:
• Since the average wage is the same in all three factories, one is likely to conclude that the factories are alike in their wage structure.
• But a close examination reveals that the wage distributions in the three factories differ widely from one another.
• In factory A, each and every worker gets the same wage, so there is no variation.
• In factory B, there is slight variation, whereas in factory C there is wide variation.
Conclusion:
The measures of central tendency are, therefore, insufficient on their own. They must be supported by other measures to reveal the true characteristics of the data.
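The factory comparison can be checked numerically. The sketch below is only an illustration: it uses the wage figures from the example, and the choice of the population standard deviation (`pstdev`) as a second spread measure is an assumption made here, not something the example prescribes.

```python
from statistics import mean, pstdev

# Monthly wages (Rs.) from the three-factory example
wages = {
    "Factory A": [2300, 2300, 2300, 2300, 2300],
    "Factory B": [2310, 2300, 2304, 2306, 2280],
    "Factory C": [2380, 2210, 2220, 2200, 2490],
}

summary = {}
for name, values in wages.items():
    summary[name] = {
        "mean": mean(values),                  # identical for all three factories
        "range": max(values) - min(values),    # largest minus smallest wage
        "sd": pstdev(values),                  # population SD (assumed spread measure)
    }

for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']}, range={stats['range']}, sd={stats['sd']:.2f}")
```

The means agree, while the range and standard deviation grow from factory A to factory C, which is the point the observations make in words.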
Measures of Variation
• Knowing the measures of central tendency is not enough.
• Both of the distributions below have identical measures of central tendency.
[Figure 3.13: two distributions with identical measures of central tendency]
Definition
A measure of variation is designed to measure the extent to which the individual observations differ from their average.
Various Measures of Dispersion
1) Range
2) Quartile deviation
3) Mean deviation
4) Standard deviation
Desired Qualities of a Good Measure of Variation
1. It should be easy to understand and calculate.
2. It should be rigidly defined.
3. It should be based on all the values.
4. It should not be affected too much by abnormal extreme values.
5. It should be capable of further statistical analysis.
The Range
• Simplest measure of variation
• The range of a data set is the difference between the largest (L) and the smallest (S) value.
R = L − S
$\text{Coefficient of range} = \dfrac{L - S}{L + S}$
The Range
• Simplest measure of variation
• Difference between the largest and the smallest values:
Range = X_largest − X_smallest
Example (values plotted on a 0 to 14 axis; smallest value 1, largest value 13):
Range = 13 − 1 = 12
EXERCISE
• The following are the prices of shares of a company from Monday to Friday. Calculate the range and the coefficient of range.
Day Price
Monday 200
Tuesday 210
Wednesday 208
Thursday 160
Friday 250
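As a rough check on the exercise, the two formulas R = L − S and (L − S) / (L + S) can be applied directly. This is a minimal sketch; the variable names are illustrative only.

```python
# Share prices from the exercise table
prices = {
    "Monday": 200,
    "Tuesday": 210,
    "Wednesday": 208,
    "Thursday": 160,
    "Friday": 250,
}

largest = max(prices.values())     # L
smallest = min(prices.values())    # S

value_range = largest - smallest                             # R = L - S
coefficient_of_range = value_range / (largest + smallest)    # (L - S) / (L + S)

print(value_range)
print(round(coefficient_of_range, 4))
```

Only Thursday's and Friday's prices enter the result, which previews the limitation that the range ignores all the other observations.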
Merits and Limitations of Range
Merits
• Amongst all the methods, it is the simplest to understand and takes the least time to compute.
Limitations
i) Range is not based on each and every observation of the distribution.
Measures of Variation:
Why The Range Can Be Misleading
• Does not account for how the data are distributed
[Figure: two dot plots on a 7 to 12 axis with differently distributed values; in both, Range = 12 − 7 = 5]
• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 − 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 − 1 = 119
Interquartile range
The interquartile range represents the difference between the third quartile and the first quartile.
Symbolically,
Interquartile range = Q3 – Q1
Semi-interquartile range or quartile deviation
Q.D. = (Q3 – Q1) / 2
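To see why quartile-based measures are less affected by extreme values than the range, the two 25-value data sets from the outlier illustration can be compared. This is a sketch under one assumption: quartiles for ungrouped data are taken with Python's `statistics.quantiles` and its default "exclusive" method, and other quartile conventions can give slightly different values.

```python
from statistics import quantiles

def iqr_and_qd(values):
    """Return (interquartile range, quartile deviation) for ungrouped data."""
    q1, _, q3 = quantiles(values, n=4)   # default "exclusive" method (assumption)
    iqr = q3 - q1
    return iqr, iqr / 2

# The two data sets differ only in their last value (5 versus 120)
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

for label, data in (("without outlier", base), ("with outlier", with_outlier)):
    data_range = max(data) - min(data)
    iqr, qd = iqr_and_qd(data)
    print(f"{label}: range={data_range}, IQR={iqr}, Q.D.={qd}")
```

The range jumps from 4 to 119, while the interquartile range and quartile deviation stay the same, because the quartiles depend only on the middle of the ordered data.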
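Example: Calculate the quartile deviation and its coefficient from the following data.

Weekly Income (Rs)   No. of workers (f)   c.f.
Below 350            8                    8
350-370              16                   24
370-390              39                   63
390-410              58                   121
410-430              60                   181
430-450              40                   221
450-470              22                   243
470-490              15                   258
490-510              15                   273
510-530              9                    282
530 and above        10                   292

• Q1 = size of the (N/4)th observation = 292/4 = 73rd observation
Q1 lies in the class 390 – 410.
$Q_1 = L + \dfrac{N/4 - \text{p.c.f.}}{f} \times i = 390 + \dfrac{73 - 63}{58} \times 20 = 393.448$
where L is the lower limit of the quartile class, p.c.f. is the cumulative frequency of the preceding class, f is the frequency of the quartile class and i is the class width.
• Q3 = 449
• Q.D. = (Q3 – Q1) / 2 = (449 – 393.448) / 2 = 27.776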
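The interpolation used for Q1 and Q3 can be sketched as a small function. Two things here are assumptions rather than part of the example: the open-ended classes are given placeholder limits (330 and 550) that do not affect the quartiles, and the coefficient of quartile deviation is taken in its usual form (Q3 − Q1) / (Q3 + Q1), which the example asks for but does not spell out.

```python
def grouped_quantile(classes, p):
    """Interpolate the p-th quantile (0 < p < 1) from (lower, upper, frequency) classes."""
    n = sum(f for _, _, f in classes)
    target = p * n                 # e.g. N/4 for the first quartile
    cumulative = 0                 # cumulative frequency of the preceding classes (p.c.f.)
    for lower, upper, f in classes:
        if cumulative + f >= target:
            # L + (target - p.c.f.) / f * i
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("p must lie between 0 and 1")

# Weekly income classes; the outer limits 330 and 550 are placeholders (assumption)
income = [
    (330, 350, 8), (350, 370, 16), (370, 390, 39), (390, 410, 58),
    (410, 430, 60), (430, 450, 40), (450, 470, 22), (470, 490, 15),
    (490, 510, 15), (510, 530, 9), (530, 550, 10),
]

q1 = grouped_quantile(income, 0.25)
q3 = grouped_quantile(income, 0.75)
quartile_deviation = (q3 - q1) / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1)   # standard definition (assumption)

print(round(q1, 3), round(q3, 3), round(quartile_deviation, 3), round(coefficient_of_qd, 4))
```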
Mean deviation
$MD = \dfrac{\sum |x - \bar{x}|}{n}$
Example
Find the mean deviation of the data.

X        Deviation from mean    Absolute deviation from mean
10
20
30
50
35
25
45
25
Total    240
Steps:
1. Calculate the absolute deviation (ignore negative signs) of each value from the mean.
2. Add all the absolute deviations.
3. Divide the sum of the absolute deviations by the total number of observations.
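The three steps translate almost line for line into code. The following is a minimal sketch using the eight values from the example; the function name is illustrative.

```python
def mean_deviation(values):
    """Mean absolute deviation about the arithmetic mean."""
    mean = sum(values) / len(values)
    absolute_deviations = [abs(x - mean) for x in values]   # step 1
    total = sum(absolute_deviations)                        # step 2
    return total / len(values)                              # step 3

data = [10, 20, 30, 50, 35, 25, 45, 25]
md = mean_deviation(data)
print(md)  # 10.0
```

With a total of 240 over 8 observations the mean is 30, the absolute deviations add up to 80, and the mean deviation is 80 / 8 = 10.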
Variance and Standard Deviation
Variance: The average of the squared deviations of all the population measurements from the mean
Standard Deviation: The square root of the variance
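Variance and Standard deviation
$SD = \sigma = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}}$
$\text{Variance} = V = \sigma^2 = \dfrac{\sum (X - \bar{X})^2}{n}$
The standard deviation is the root mean of squared deviations of values from the mean.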
The Sample Variance
• Average (approximately) of squared deviations of values from the mean
– Sample variance: $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
The Sample Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
Sample standard deviation: $S = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
The Standard Deviation
Steps for Computing Standard Deviation
1. Compute the difference between each value and the mean.
2. Square each difference.
3. Add the squared differences.
4. Divide this total by n-1 to get the sample variance.
5. Take the square root of the sample variance to get the sample standard deviation.
Sample Standard Deviation: Calculation Example
Sample Data (Xi): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
$S = \sqrt{\dfrac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}}$
$= \sqrt{\dfrac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$
$= \sqrt{\dfrac{130}{7}} = 4.3095$
A measure of the “average” scatter around the mean
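The five computing steps can be followed one by one in code. This is a sketch using the eight sample values from the calculation example; the cross-check against `statistics.stdev` is added here for illustration.

```python
import math
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation, following the five computing steps."""
    n = len(values)
    mean = sum(values) / n
    differences = [x - mean for x in values]    # step 1: value minus mean
    squared = [d ** 2 for d in differences]     # step 2: square each difference
    total = sum(squared)                        # step 3: add the squared differences
    variance = total / (n - 1)                  # step 4: divide by n - 1
    return math.sqrt(variance)                  # step 5: take the square root

data = [10, 12, 14, 15, 17, 18, 18, 24]
s = sample_sd(data)
print(round(s, 4))  # 4.3095
```

The standard library's `stdev` uses the same n − 1 divisor, so `stdev(data)` should agree with the step-by-step result.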
Example
Find the standard deviation and variance of the following data:

X        Deviation from mean    Square of deviation
10
20
30
50
35
25
45
25
Total
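Because the slides give both a divisor of n (population formulas) and a divisor of n − 1 (sample formulas), the answer to this example depends on which one is intended. The sketch below computes both with Python's `statistics` module; which version the exercise expects is left open, since the example does not say.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [10, 20, 30, 50, 35, 25, 45, 25]

# Population formulas: sum of squared deviations divided by n
population_variance = pvariance(data)
population_sd = pstdev(data)

# Sample formulas: sum of squared deviations divided by n - 1
sample_variance = variance(data)
sample_sd = stdev(data)

print(population_variance, round(population_sd, 3))
print(round(sample_variance, 2), round(sample_sd, 3))
```

The sum of squared deviations is 1,200 in both cases; only the divisor (8 or 7) changes.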
Comparing Standard Deviations
[Figure: three dot plots on a common 11 to 21 axis]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
[Figure: two distributions, one labelled "Smaller standard deviation" and the other "Larger standard deviation"]
Summary Characteristics
• The more the data are spread out, the greater the range, variance, and standard deviation.
• The more the data are concentrated, the smaller the range, variance, and standard deviation.
• If the values are all the same (no variation), all these measures will be zero.
• None of these measures are ever negative.
The Coefficient of Variation
• Measures relative variation
• Always expressed as a percentage (%)
• Shows variation relative to the mean
• Can be used to compare the variability of two or more sets of data measured in different units
$CV = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\%$
Comparing Coefficients of Variation
• Stock A:
– Average price last year = $50
– Standard deviation = $5
$CV_A = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$5}{\$50} \cdot 100\% = 10\%$
• Stock B:
– Average price last year = $100
– Standard deviation = $5
$CV_B = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$5}{\$100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price.
Comparing Coefficients of Variation (cont'd)
• Stock A:
– Average price last year = $50
– Standard deviation = $5
$CV_A = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$5}{\$50} \cdot 100\% = 10\%$
• Stock C:
– Average price last year = $8
– Standard deviation = $2
$CV_C = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$2}{\$8} \cdot 100\% = 25\%$
Stock C has a much smaller standard deviation but a much higher coefficient of variation.
Example
Nutritional data about a sample of seven breakfast cereals include the number of calories per serving. Compute the mean, variance and s.d. of the calories in the cereals.

Cereal                       Calories
Kellogg’s All Bran           80
Kellogg’s Corn Flakes        100
Wheaties                     100
Organic Multigrain Flakes    110
Kellogg’s Rice Krispies      130
Wheat Vanilla Almond         190
Kellogg’s Mini Wheats        200

Solution
• Mean = 130
• Variance = 2,200
• S.D. = 46.9042
The standard deviation of 46.9042 indicates that the calories in the cereals cluster within one standard deviation of the mean, that is, between 83.0958 and 176.9042.
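The solution values can be reproduced with the sample formulas (divisor n − 1). This is a minimal sketch using the seven calorie counts from the example; the interval shown is simply the mean minus and plus one standard deviation.

```python
from statistics import mean, stdev, variance

calories = [80, 100, 100, 110, 130, 190, 200]

calorie_mean = mean(calories)
calorie_variance = variance(calories)   # sample variance, divisor n - 1
calorie_sd = stdev(calories)            # sample standard deviation

# One standard deviation either side of the mean
lower = calorie_mean - calorie_sd
upper = calorie_mean + calorie_sd

print(calorie_mean, calorie_variance, round(calorie_sd, 4))
print(round(lower, 4), round(upper, 4))
```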
Cereal                       Calories    Amount of Sugar (in grams)
Kellogg’s All Bran           80          6
Kellogg’s Corn Flakes        100         2
Wheaties                     100         4
Organic Multigrain Flakes    110         4
Kellogg’s Rice Krispies      130         4
Wheat Vanilla Almond         190         11
Kellogg’s Mini Wheats        200         10
Q. Which varies more from cereal to cereal: the number of calories or the amount of sugar (in grams)?
Solution
• Since calories and the amount of sugar have different units of measurement, the C.V. is needed to compare the variability in the two measurements.
• CV (calories) = 36.08%
• CV (sugar) = 57.84%
• Relative to the mean, the amount of sugar is much more variable than the calories.
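The two coefficients of variation can be recomputed from the raw table. This sketch assumes the sample standard deviation (divisor n − 1) is the S in CV = (S / mean) × 100%, as in the formula given for the coefficient of variation.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation relative to the mean."""
    return stdev(values) / mean(values) * 100

calories = [80, 100, 100, 110, 130, 190, 200]
sugar = [6, 2, 4, 4, 4, 11, 10]   # grams, in the same cereal order

cv_calories = coefficient_of_variation(calories)
cv_sugar = coefficient_of_variation(sugar)

print(round(cv_calories, 2), round(cv_sugar, 2))
```

Because the CV is unit-free, the two percentages can be compared directly even though one series is in calories and the other in grams.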
Compute the coefficient of variation of the following data. Which team is more consistent?

Team A    Team B
500       350
300       250
200       350
100       280
400       320
Locating Extreme Outliers: Z-Scores
• To compute the Z-score of a data value, subtract the mean and divide by the standard deviation.
• The Z-score is the number of standard deviations a data value is from the mean.
• A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0.
• The larger the absolute value of the Z-score, the farther the data value is from the mean.
Z-Score
$Z = \dfrac{X - \bar{X}}{S}$
where X represents the data value
$\bar{X}$ is the sample mean
S is the sample standard deviation
Z-Score
• Suppose the mean math MAT score is 490, with a standard deviation of 100.
• Compute the Z-score for a test score of 620.
$Z = \dfrac{X - \bar{X}}{S} = \dfrac{620 - 490}{100} = \dfrac{130}{100} = 1.3$
A score of 620 is 1.3 standard deviations above the mean and would not be considered an outlier.
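The Z-score rule can be written as two small helper functions. This is a sketch; the function names and the second test score of 800 are illustrative additions, while the ±3.0 threshold and the 620 example come from the slides.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

def is_extreme_outlier(z, threshold=3.0):
    """Apply the rule: Z below -3.0 or above +3.0 marks an extreme outlier."""
    return abs(z) > threshold

z = z_score(620, mean=490, sd=100)
print(z, is_extreme_outlier(z))              # 1.3 False

# Hypothetical score far from the mean, for contrast
z_far = z_score(800, mean=490, sd=100)
print(z_far, is_extreme_outlier(z_far))      # 3.1 True
```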
Cereal Calories
Kellogg’s All Bran 80
Kellogg’s Corn Flakes 100
Wheaties 100
Organic Multigrain Flakes 110
Kellogg’s Rice krispies 130
Wheat Vanilla Almond 190
Kellogg’s Mini Wheats 200
Practice Problems
1. A company has two sections, A and B, with 40 and 65 employees respectively. Their average weekly wages are $450 and $350. The standard deviations are 7 and 9.
(i) Which section has the larger wage bill?
(ii) Which section has the larger variability in wages?
2. A group of 80 candidates has an average height of 148.5 cm with a coefficient of variation of 2.5%. What is the standard deviation of their height?
Solution
• Section B has the larger wage bill (65 × $350 = $22,750 against 40 × $450 = $18,000).
• CVa = 1.56%
• CVb = 2.57%
Skewness
The term ‘Skewness’ refers to lack of symmetry or departure from symmetry.
When a distribution is not symmetrical (i.e. asymmetrical), it is called a skewed distribution.
1. Karl Pearson’s Coefficient of Skewness
$Sk_p = \dfrac{\text{Mean} - \text{Mode}}{\sigma}$
It is independent of the unit of measurement.
If the mode is ill-defined, the modified formula is
$Sk_p = \dfrac{3(\text{Mean} - \text{Median})}{\sigma}$
The coefficient lies between ±1 for a moderately skewed distribution.
Practice questions
1. Compute Karl Pearson’s Coefficient of Skewness of the following data: 6, 5, 7, 5, 8, 4, 6, 4, 6, 1.
2. Find the skewness of 6, 5, 7, 5, 8, 4, 6, 4, 6, 4, 1.
2. Bowley’s Coefficient of Skewness
This method is based on quartiles. The formula is:
$Sk_B = \dfrac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \dfrac{Q_3 + Q_1 - 2\,\text{Med.}}{Q_3 - Q_1}$
This method is useful for open-ended distributions and where extreme values are present.
3. Kelly’s Coefficient of Skewness
• $Sk_k = \dfrac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}}$
• $Sk_k = \dfrac{D_9 - 2D_5 + D_1}{D_9 - D_1}$
It should be noted that the three formulae for calculating skewness are based on different assumptions, and hence the answers obtained for the same question by different methods may differ.
Q1. The following data relate to the profits of 1,000 companies. Calculate the coefficient of skewness and comment on its value.

Annual Profits (crore)    No. of companies
10-12                     17
12-14                     53
14-16                     199
16-18                     194
18-20                     327
20-22                     208
22-24                     2
• Mean = 17.786
• Mode = 19.056
• S.D. = 2.52
• Sk_p = -0.504 (moderately negatively skewed)
• The mode exceeds the mean by an amount equal to about 50.4 per cent of the value of the s.d.
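The answer values can be reproduced from the frequency table. The sketch below rests on assumptions the slides do not state: class mid-values stand in for the observations, the standard deviation uses the divisor N, and the mode is interpolated inside the modal class with the usual grouped-data formula L + (f1 − f0) / (2·f1 − f0 − f2) × i. Those choices happen to match the printed answers.

```python
import math

# (lower limit, upper limit, number of companies)
classes = [
    (10, 12, 17), (12, 14, 53), (14, 16, 199), (16, 18, 194),
    (18, 20, 327), (20, 22, 208), (22, 24, 2),
]

n = sum(f for _, _, f in classes)
mids = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

mean = sum(f * m for f, m in zip(freqs, mids)) / n
sd = math.sqrt(sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / n)

# Mode by interpolation in the modal class (assumed grouped-data formula)
k = max(range(len(classes)), key=lambda i: freqs[i])
lower, upper, f1 = classes[k]
f0 = freqs[k - 1] if k > 0 else 0
f2 = freqs[k + 1] if k < len(classes) - 1 else 0
mode = lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

pearson_skewness = (mean - mode) / sd
print(round(mean, 3), round(mode, 3), round(sd, 2), round(pearson_skewness, 3))
```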
Q2. The following table gives the distribution of daily wages of 500 unskilled workers in a factory:
a) Obtain the limits of daily wages of the central 50 per cent of the observed workers.
b) Calculate Bowley’s Coefficient of Skewness.

Daily wages       No. of workers
Below 200         10
200-250           25
250-300           145
300-350           220
350-400           70
400 and above     30
• Q1 = 281.03
• Q3 = 344.32
• Hence the daily wages of the central 50% of workers lie between 281.03 and 344.32.
• Sk_B = -0.102
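The quartiles and Bowley's coefficient for this table can be checked with the same interpolation idea used for grouped quartiles. In this sketch the open-ended classes are given placeholder limits (150 and 450); that is an assumption, and it does not influence the result because no quartile falls in those classes.

```python
def grouped_quantile(classes, p):
    """Interpolate the p-th quantile from (lower, upper, frequency) classes."""
    n = sum(f for _, _, f in classes)
    target = p * n
    cumulative = 0   # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("p must lie between 0 and 1")

# Daily wage classes; 150 and 450 are placeholder outer limits (assumption)
wages = [
    (150, 200, 10), (200, 250, 25), (250, 300, 145),
    (300, 350, 220), (350, 400, 70), (400, 450, 30),
]

q1 = grouped_quantile(wages, 0.25)
median = grouped_quantile(wages, 0.50)
q3 = grouped_quantile(wages, 0.75)

bowley_skewness = (q3 + q1 - 2 * median) / (q3 - q1)
print(round(q1, 2), round(q3, 2), round(bowley_skewness, 3))
```

Since only quartiles enter the formula, the coefficient can be computed even though the first and last classes have no stated limits.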
Q3. A bank branch located in a city has the business objective of developing an improved process for serving customers during the noon-to-1:00 p.m. lunch period. The waiting time, in minutes, is defined as the time from when the customer enters the line until he or she reaches the teller window. Data collected from a sample of 15 customers during this hour are stored in
Data
Compute the mean, median, variance, S.D. and Z-scores.
• Are there any outliers?
• Are the data skewed?
• As a customer walks into the branch office during the lunch hour, she asks the branch manager how long she can expect to wait. The branch manager replies, “Almost certainly less than 5 minutes.” Evaluate the accuracy of this statement.
KURTOSIS
Kurtosis refers to the degree of flatness or peakedness of the frequency distribution curve.
If the curve is more peaked than the normal curve, it is called ‘leptokurtic’.
If it is more flat-topped than the normal curve, it is called ‘platykurtic’.
The normal curve itself is known as ‘mesokurtic’.
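Moments about Mean
• $\mu_1 = \sum (x - \bar{x}) / N$
• $\mu_2 = \sum (x - \bar{x})^2 / N$
• $\mu_3 = \sum (x - \bar{x})^3 / N$
• $\mu_4 = \sum (x - \bar{x})^4 / N$
• Kurtosis $= \beta_2 - 3 = (\mu_4 / \mu_2^2) - 3$, where $\beta_2 = \mu_4 / \mu_2^2$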
Example: College Students’ Heights
Here are grouped data for heights of 100 randomly selected students from St. Stephen College.

Height class (inches)    Mid value    f
59.5-62.5                61           5
62.5-65.5                64           18
65.5-68.5                67           42
68.5-71.5                70           27
71.5-74.5                73           8
a) Draw a histogram to check whether the data are symmetric or skewed.
b) If the data are skewed, comment on how highly skewed they are.
c) Also calculate the kurtosis value and interpret the result.
Answer keys
• Mean = 67.45
• Skewness = -0.1082 (approximately symmetric)
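The answer key can be reproduced with the moments about the mean, treating each class mid-value as repeated f times. One assumption is made here: skewness is taken as the moment coefficient μ3 / μ2^(3/2), which is not among the three skewness formulas given earlier but matches the printed value of -0.1082. Kurtosis follows the (μ4 / μ2²) − 3 form.

```python
# (class mid-value in inches, frequency)
heights = [(61, 5), (64, 18), (67, 42), (70, 27), (73, 8)]

n = sum(f for _, f in heights)
mean = sum(m * f for m, f in heights) / n

def central_moment(k):
    """k-th moment about the mean for the grouped data."""
    return sum(f * (m - mean) ** k for m, f in heights) / n

mu2 = central_moment(2)
mu3 = central_moment(3)
mu4 = central_moment(4)

skewness = mu3 / mu2 ** 1.5            # moment coefficient of skewness (assumption)
excess_kurtosis = mu4 / mu2 ** 2 - 3   # (mu4 / mu2^2) - 3

print(round(mean, 2), round(skewness, 4), round(excess_kurtosis, 4))
```

Under these assumptions the kurtosis value comes out slightly negative, which would place the distribution a little on the platykurtic side of the normal curve.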
Bin (upper class limit)    f
62.5                       5
65.5                       18
68.5                       42
71.5                       27
74.5                       8
[Figure: Histogram of Heights; horizontal axis: bins 62.5 to 74.5; vertical axis: frequency, 0 to 45]
