
Measures of Dispersion or Variation

Introduction
Objectives of studying variation

o Sometimes the average alone cannot adequately describe a set of
observations.
Example
Monthly wages (Rs.)

           Factory A   Factory B   Factory C
           2300        2310        2380
           2300        2300        2210
           2300        2304        2220
           2300        2306        2200
           2300        2280        2490

Total      11,500      11,500      11,500
Average    2,300       2,300       2,300
Observations:
• Since the average wage is the same in all three
factories, one is likely to conclude
that the factories are alike in their
wage structure.
• But a close examination reveals
that the wage distribution in the three
factories differs widely from one
to another.
• In factory A, every worker
gets the same wage, so there is
no variation.
• In factory B there is slight variation,
whereas in factory C there is wide
variation.
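The observations above can be checked numerically. The following is a minimal sketch using the wage figures from the table; the choice of the range and the population standard deviation as spread measures, and all variable names, are illustrative assumptions.

```python
from statistics import mean, pstdev

# Monthly wages (Rs.) copied from the example table.
wages = {
    "A": [2300, 2300, 2300, 2300, 2300],
    "B": [2310, 2300, 2304, 2306, 2280],
    "C": [2380, 2210, 2220, 2200, 2490],
}

summary = {}
for factory, values in wages.items():
    summary[factory] = {
        "mean": mean(values),                 # identical for all three factories
        "range": max(values) - min(values),   # largest minus smallest wage
        "sd": pstdev(values),                 # population standard deviation
    }
    print(factory, summary[factory])
```

All three means agree, while the range and standard deviation grow from factory A to factory C.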

Conclusion:
Measures of central tendency alone are
therefore insufficient. They must be
supported by other measures to
reveal the true characteristics of the
data.
Measures of Variation
• Knowing the measures of central
tendency is not enough
• Both of the distributions below have
identical measures of central
tendency

[Figure 3.13: two distributions with identical measures of central tendency]
Definition
A measure of variation is designed to
measure the extent to which
individual observations differ from their
average.
Various Measures of Dispersion
1) Range
2) Quartile deviation
3) Mean deviation
4) Standard deviation

Desired Qualities of a good
measure of Variation
1. It should be easy to understand and
calculate.
2. It should be rigidly defined.
3. It should be based on all the values.
4. It should not be affected too much
by abnormal extreme values.
5. It should be capable of further
statistical analysis.
The Range
• Simplest measure of variation
• The range of a data set is the difference between the
largest (L) and smallest (S) values.

\[ R = L - S \]

\[ \text{Coefficient of range} = \frac{L - S}{L + S} \]
The Range
• Difference between the largest and the smallest values:

\[ \text{Range} = X_{\text{largest}} - X_{\text{smallest}} \]

Example (values plotted on a number line from 0 to 14):

Range = 13 - 1 = 12
EXERCISE
• The following are the prices of shares of a
company from Monday to Friday. Calculate the range
and the coefficient of range.

Day         Price
Monday      200
Tuesday     210
Wednesday   208
Thursday    160
Friday      250
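A minimal sketch of the exercise, applying R = L - S and (L - S)/(L + S) to the share prices listed above; the variable names are illustrative.

```python
# Share prices from the exercise table.
prices = {
    "Monday": 200,
    "Tuesday": 210,
    "Wednesday": 208,
    "Thursday": 160,
    "Friday": 250,
}

largest = max(prices.values())    # L
smallest = min(prices.values())   # S

value_range = largest - smallest                          # R = L - S
coefficient_of_range = value_range / (largest + smallest)

print("Range:", value_range)
print("Coefficient of range:", round(coefficient_of_range, 4))
```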

Merits and Limitations of Range
• Amongst all the methods it is the
simplest to understand and takes
minimum time to compute.

Limitations
i) Range is not based on each and every
observation of the distribution.

Measures of Variation:
Why The Range Can Be Misleading
• Does not account for how the data are
distributed

[Figure: two dot plots on an axis from 7 to 12 with differently distributed points; in both, Range = 12 - 7 = 5]

• Sensitive to outliers

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 120 - 1 = 119
Interquartile range
Interquartile range represents the
difference between third quartile
and first quartile.
Symbolically,
Interquartile range = Q3 – Q1

Semi-interquartile range
or quartile deviation

\[ Q.D. = \frac{Q_3 - Q_1}{2} \]
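Example: Calculate the quartile deviation and its
coefficient from the following data.

Weekly income (Rs)   No. of workers (f)   c.f.
Below 350            8                    8
350-370              16                   24
370-390              39                   63
390-410              58                   121
410-430              60                   181
430-450              40                   221
450-470              22                   243
470-490              15                   258
490-510              15                   273
510-530              9                    282
530 and above        10                   292

• $Q_1$ = size of the (N/4)th observation = 292/4 = 73rd observation
$Q_1$ lies in the class 390-410.

\[ Q_1 = L + \frac{N/4 - \text{p.c.f.}}{f} \times i = 390 + \frac{73 - 63}{58} \times 20 = 393.448 \]

where L is the lower limit of the quartile class, p.c.f. is the cumulative frequency of the preceding class, f is the frequency of the quartile class and i is the class width.

• $Q_3$ = size of the (3N/4)th observation = 219th observation, which lies in the class 430-450.

\[ Q_3 = 430 + \frac{219 - 181}{40} \times 20 = 449 \]

\[ Q.D. = \frac{Q_3 - Q_1}{2} = \frac{449 - 393.448}{2} \approx 27.776 \]

\[ \text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{55.552}{842.448} \approx 0.066 \]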
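The interpolation used for the quartiles can be sketched as follows. The helper name `grouped_quantile` is hypothetical, and the limits 330 and 550 for the two open-ended classes are assumptions made only so that every class has two bounds; neither quartile falls in those classes, so the assumed limits do not affect the result.

```python
def grouped_quantile(classes, fraction):
    """Interpolate a quantile from (lower, upper, frequency) classes."""
    n = sum(f for _, _, f in classes)
    target = fraction * n          # e.g. N/4 for Q1
    cumulative = 0                 # cumulative frequency of preceding classes
    for lower, upper, f in classes:
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")

# Weekly income classes; 330 and 550 are assumed limits for the open classes.
income = [
    (330, 350, 8), (350, 370, 16), (370, 390, 39), (390, 410, 58),
    (410, 430, 60), (430, 450, 40), (450, 470, 22), (470, 490, 15),
    (490, 510, 15), (510, 530, 9), (530, 550, 10),
]

q1 = grouped_quantile(income, 0.25)
q3 = grouped_quantile(income, 0.75)
quartile_deviation = (q3 - q1) / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1)

print(round(q1, 3), round(q3, 3))
print(round(quartile_deviation, 3), round(coefficient_of_qd, 4))
```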
Mean deviation

\[ MD = \frac{\sum |x - \bar{x}|}{n} \]
Example
Find the mean deviation of the data.

X       Deviation from mean   Absolute deviation from mean
10
20
30
50
35
25
45
25
Total   240
Steps:
1. Calculate the absolute deviation of each value from the mean
(ignore negative signs).

2. Add all the absolute deviations.

3. Divide the sum of the absolute deviations by the
total number of observations.
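The three steps can be sketched for the eight values in the example table; the variable names are illustrative.

```python
# Data from the mean deviation example.
data = [10, 20, 30, 50, 35, 25, 45, 25]

mean_value = sum(data) / len(data)

# Step 1: absolute deviation of each value from the mean.
absolute_deviations = [abs(x - mean_value) for x in data]

# Step 2: add all the absolute deviations.
total_deviation = sum(absolute_deviations)

# Step 3: divide by the number of observations.
mean_deviation = total_deviation / len(data)

print(mean_value, total_deviation, mean_deviation)
```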

Variance and Standard
Deviation

Variance: the average of the squared deviations of all the
population measurements from the mean.

Standard deviation: the square root of the variance.
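Variance and Standard deviation

\[ SD = \sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} \]

\[ \text{Variance} = V = \sigma^2 = \frac{\sum (X - \bar{X})^2}{n} \]

The standard deviation is the root of the mean of squared deviations of
values from the mean.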
The Sample Variance
• Sum of squared deviations of
values from the mean, divided by n - 1
– Sample variance:

\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]

Where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
Measures of Variation:
The Sample Standard Deviation

• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
Sample standard deviation:

\[ S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \]
Measures of Variation:
The Standard Deviation
Steps for Computing Standard Deviation

1. Compute the difference between each value and the mean.
2. Square each difference.
3. Add the squared differences.
4. Divide this total by n-1 to get the sample
variance.
5. Take the square root of the sample variance
to get the sample standard deviation.
Measures of Variation:
Sample Standard Deviation:
Calculation Example
Sample Data ($X_i$): 10 12 14 15 17 18 18 24

n = 8, Mean = $\bar{X}$ = 16

\[ S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}} \]

\[ = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} \]

\[ = \sqrt{\frac{130}{7}} = 4.3095 \]

A measure of the "average" scatter around the mean.
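The five steps for the sample standard deviation can be sketched with the sample data from this calculation example. The comparison with `statistics.stdev` is included only as a cross-check.

```python
import math
from statistics import stdev

sample = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(sample)
mean_value = sum(sample) / n

# Steps 1-2: difference from the mean, squared.
squared_differences = [(x - mean_value) ** 2 for x in sample]

# Step 3: add the squared differences.
sum_of_squares = sum(squared_differences)

# Step 4: divide by n - 1 to get the sample variance.
sample_variance = sum_of_squares / (n - 1)

# Step 5: square root gives the sample standard deviation.
sample_sd = math.sqrt(sample_variance)

print(mean_value, sum_of_squares, round(sample_sd, 4))
print(round(stdev(sample), 4))  # library cross-check
```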
Example
Find the standard deviation and variance of the following data:

X       Deviation from mean   Square of deviation
10
20
30
50
35
25
45
25
Total
Comparing Standard
Deviations

[Figure: three dot plots on a common axis from 11 to 21]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
Comparing Standard
Deviations

[Figure: two distributions, one labeled "Smaller standard deviation" and the other "Larger standard deviation"]
Summary Characteristics
• The more the data are spread out, the greater the
range, variance, and standard deviation.

• The more the data are concentrated, the smaller the
range, variance, and standard deviation.

• If the values are all the same (no variation), all these
measures will be zero.

• None of these measures are ever negative.
Measures of Variation:
The Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare the variability
of two or more sets of data measured
in different units

\[ CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\% \]
Measures of Variation:
Comparing Coefficients of Variation
• Stock A:
– Average price last year = $50
– Standard deviation = $5

\[ CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \]

• Stock B:
– Average price last year = $100
– Standard deviation = $5

\[ CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \]

Both stocks have the same standard deviation, but stock B is less
variable relative to its price.
Measures of Variation:
Comparing Coefficients of Variation (cont'd)
• Stock A:
– Average price last year = $50
– Standard deviation = $5

\[ CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \]

• Stock C:
– Average price last year = $8
– Standard deviation = $2

\[ CV_C = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$2}{\$8} \cdot 100\% = 25\% \]

Stock C has a much smaller standard deviation but a much higher
coefficient of variation.
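The stock comparisons can be reproduced with a short sketch. The function name `coefficient_of_variation` is illustrative; the means and standard deviations are the ones quoted for stocks A, B and C.

```python
def coefficient_of_variation(sd, mean):
    """Return the coefficient of variation as a percentage."""
    return 100 * sd / mean

# (average price, standard deviation) in dollars, from the slides.
stocks = {"A": (50, 5), "B": (100, 5), "C": (8, 2)}

cv = {
    name: coefficient_of_variation(sd, mean)
    for name, (mean, sd) in stocks.items()
}

for name, value in cv.items():
    print(f"Stock {name}: CV = {value:.0f}%")
```

Stock C has the smallest standard deviation of the three but the largest relative variation.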
Example
Nutritional data about a sample of seven breakfast
cereals includes the number of calories per serving.
Compute the mean, variance and standard deviation of the calories in the
cereals.

Cereal                       Calories
Kellogg’s All Bran           80
Kellogg’s Corn Flakes        100
Wheaties                     100
Organic Multigrain Flakes    110
Kellogg’s Rice Krispies      130
Wheat Vanilla Almond         190
Kellogg’s Mini Wheats        200


Solution

• Mean = 130
• Variance = 2,200
• S.D. = 46.9042
The standard deviation of 46.9042 indicates that the
calories in the cereals are clustering within one standard deviation
of the mean, that is, between 130 - 46.9042 = 83.0958
and 130 + 46.9042 = 176.9042.
Cereal                       Calories   Amount of sugar (in grams)
Kellogg’s All Bran           80         6
Kellogg’s Corn Flakes        100        2
Wheaties                     100        4
Organic Multigrain Flakes    110        4
Kellogg’s Rice Krispies      130        4
Wheat Vanilla Almond         190        11
Kellogg’s Mini Wheats        200        10
Q. Which varies more from cereal to
cereal – the number of calories or
the amount of sugar (in grams) ?

Solution
• Since calories and the amount of sugar
have different units of measurements, C.V.
is needed to compare the variability in the
two measurements.
• CV (calories) = 36.08%
• CV (sugar) = 57.84%
• Relative to the mean , the amount of sugar
is much more variable than calories.
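The two coefficients of variation quoted in the solution can be reproduced with the sketch below, which uses the sample standard deviation (divisor n - 1) from Python's `statistics` module; the variable names are illustrative.

```python
from statistics import mean, stdev

# Calories and sugar (grams) for the seven cereals in the table.
calories = [80, 100, 100, 110, 130, 190, 200]
sugar = [6, 2, 4, 4, 4, 11, 10]

def cv_percent(values):
    """Sample standard deviation relative to the mean, in percent."""
    return 100 * stdev(values) / mean(values)

cv_calories = cv_percent(calories)
cv_sugar = cv_percent(sugar)

print(f"CV (calories) = {cv_calories:.2f}%")
print(f"CV (sugar)    = {cv_sugar:.2f}%")
```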

Compute the coefficient of variation of
the following data. Which team is more
consistent?

Team A   Team B
500      350
300      250
200      350
100      280
400      320
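One way to approach this exercise is sketched below. It assumes the sample standard deviation (divisor n - 1), as in the earlier slides; with the population formula the figures change slightly but the comparison does not. The team with the smaller coefficient of variation is the more consistent one.

```python
from statistics import mean, stdev

# Values for the two teams, from the exercise table.
team_a = [500, 300, 200, 100, 400]
team_b = [350, 250, 350, 280, 320]

def cv_percent(values):
    """Sample standard deviation relative to the mean, in percent."""
    return 100 * stdev(values) / mean(values)

cv_a = cv_percent(team_a)
cv_b = cv_percent(team_b)

more_consistent = "Team A" if cv_a < cv_b else "Team B"
print(f"CV A = {cv_a:.2f}%, CV B = {cv_b:.2f}%, more consistent: {more_consistent}")
```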

Locating Extreme Outliers:
Z-Scores
• To compute the Z-score of a data value, subtract the mean
and divide by the standard deviation.

• The Z-score is the number of standard deviations a data
value is from the mean.

• A data value is considered an extreme outlier if its Z-score is
less than -3.0 or greater than +3.0.

• The larger the absolute value of the Z-score, the farther the
data value is from the mean.
Locating Extreme Outliers:
Z-Score

\[ Z = \frac{X - \bar{X}}{S} \]

where X represents the data value
$\bar{X}$ is the sample mean
S is the sample standard deviation
Locating Extreme Outliers:
Z-Score

• Suppose the mean math MAT score is 490,
with a standard deviation of 100.
• Compute the Z-score for a test score of 620.

\[ Z = \frac{X - \bar{X}}{S} = \frac{620 - 490}{100} = \frac{130}{100} = 1.3 \]

A score of 620 is 1.3 standard deviations above the mean and would not
be considered an outlier.
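The Z-score calculation and the outlier rule can be sketched as follows; the function names are illustrative, and the cutoff of 3.0 is the one stated on the earlier slide.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

def is_extreme_outlier(z, cutoff=3.0):
    """Apply the rule: Z below -3.0 or above +3.0."""
    return abs(z) > cutoff

z = z_score(620, mean=490, sd=100)
print(z, is_extreme_outlier(z))
```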
Cereal Calories

Kellogg’s All Bran 80

Kellogg’s Corn Flakes 100

Wheaties 100

Organic Multigrain Flakes 110

Kellogg’s Rice krispies 130

Wheat Vanilla Almond 190

Kellogg’s Mini Wheats 200

Practice Problems
1. A company has two sections with 40 and
65 employees respectively. Their average
weekly wages are $450 and $350. The
standard deviations are 7 and 9.

(i) Which section has the larger wage bill?

(ii) Which section has the larger variability in
wages?
2. A group of 80 candidates has an
average height of 148.5 cm with a
coefficient of variation of 2.5%. What
is the standard deviation of their
height?
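Solution
• Section B has the larger wage bill (65 × 350 = 22,750 against 40 × 450 = 18,000).

• CVa = 1.56%

• CVb = 2.57%

• Section B therefore also has the larger variability in wages.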
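A minimal sketch of both practice problems. For problem 2 it rearranges CV = (S / mean) × 100 to S = CV × mean / 100; the dictionary layout and variable names are illustrative.

```python
# Problem 1: two sections of a company.
sections = {
    "A": {"employees": 40, "mean_wage": 450, "sd": 7},
    "B": {"employees": 65, "mean_wage": 350, "sd": 9},
}

wage_bill = {k: s["employees"] * s["mean_wage"] for k, s in sections.items()}
cv = {k: 100 * s["sd"] / s["mean_wage"] for k, s in sections.items()}

print("Wage bills:", wage_bill)
print("CVs (%):", {k: round(v, 2) for k, v in cv.items()})

# Problem 2: recover the standard deviation from the CV and the mean.
mean_height = 148.5   # cm
cv_height = 2.5       # per cent
sd_height = cv_height * mean_height / 100

print("SD of height (cm):", sd_height)
```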

Skewness
The term ‘Skewness’ refers to lack of
symmetry or departure from
symmetry.

When a distribution is not symmetrical
(i.e. asymmetrical), it is called a skewed
distribution.

1. Karl Pearson’s Coefficient of
Skewness

\[ Sk_p = \frac{\text{Mean} - \text{Mode}}{\sigma} \]

It is independent of the unit of measurement.

If the mode is ill-defined, the modified formula is

\[ Sk_p = \frac{3(\text{Mean} - \text{Median})}{\sigma} \]

The coefficient lies between -1 and +1 for a moderately
skewed distribution.
Practice question
1. Compute Karl Pearson’s Coefficient of
Skewness of the following data:
6,5,7, 5, 8, 4,6,4,6,1.

2. Find Skewness of 6,5, 7,5, 8, 4, 6, 4, 6, 4,1
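A sketch of how the two practice questions might be worked. It assumes the population standard deviation for σ and treats the mode as ill-defined whenever more than one value shares the highest frequency, in which case the median-based formula is used; the function name is illustrative.

```python
from statistics import mean, median, multimode, pstdev

def pearson_skewness(data):
    """Karl Pearson's coefficient; falls back to the median formula
    when the mode is not unique."""
    modes = multimode(data)
    sigma = pstdev(data)  # population standard deviation (assumption)
    if len(modes) == 1:
        return (mean(data) - modes[0]) / sigma
    return 3 * (mean(data) - median(data)) / sigma

first = [6, 5, 7, 5, 8, 4, 6, 4, 6, 1]        # unique mode: 6
second = [6, 5, 7, 5, 8, 4, 6, 4, 6, 4, 1]    # 4 and 6 both occur three times

sk_first = pearson_skewness(first)
sk_second = pearson_skewness(second)

print(round(sk_first, 4), round(sk_second, 4))
```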

2. Bowley’s Coefficient of Skewness
This method is based on quartiles. The
formula is:

\[ Sk_B = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2\,\text{Med.}}{Q_3 - Q_1} \]

This method is useful for open-end
distributions and where extreme values are
present.
3. Kelly’s Coefficient of
Skewness

\[ Sk_k = \frac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}} \]

\[ Sk_k = \frac{D_9 - 2D_5 + D_1}{D_9 - D_1} \]
It should be noted that the three formulae
for calculating skewness are based
on different assumptions, and hence the
answers obtained for the same question by
different methods may differ.
Q1. The following data relate to the profits
of 1,000 companies. Calculate the coefficient
of skewness and comment on its value.

Annual profits (crore)   No. of companies
10-12                    17
12-14                    53
14-16                    199
16-18                    194
18-20                    327
20-22                    208
22-24                    2
• Mean = 17.786
• Mode = 19.056
• S.D. = 2.52
• SKp = -0.504 (moderately negatively skewed)

• The mode exceeds the mean by an amount equal to about
50.4 per cent of the standard deviation.
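The figures above can be reproduced from the frequency table with the sketch below. It assumes class mid-values for the mean and standard deviation (population divisor N) and the usual interpolation within the modal class for the mode; the variable names are illustrative.

```python
import math

# (lower limit, upper limit, number of companies) from the table.
classes = [
    (10, 12, 17), (12, 14, 53), (14, 16, 199), (16, 18, 194),
    (18, 20, 327), (20, 22, 208), (22, 24, 2),
]

n = sum(f for _, _, f in classes)
mids = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

mean_value = sum(f * m for f, m in zip(freqs, mids)) / n
variance = sum(f * (m - mean_value) ** 2 for f, m in zip(freqs, mids)) / n
sd = math.sqrt(variance)

# Mode by interpolation inside the class with the highest frequency.
i = freqs.index(max(freqs))
lower, upper, f1 = classes[i]
f0, f2 = freqs[i - 1], freqs[i + 1]   # neighbouring class frequencies
mode = lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

sk_p = (mean_value - mode) / sd
print(round(mean_value, 3), round(mode, 3), round(sd, 2), round(sk_p, 3))
```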

Q2. The following table gives the distribution of
daily wages of 500 unskilled workers in a factory.
a) Obtain the limits of daily wages of the central 50 per
cent of the observed workers.
b) Calculate Bowley’s coefficient of skewness.

Daily wages     No. of workers
Below 200       10
200-250         25
250-300         145
300-350         220
350-400         70
400 and above   30
• Q1 = 281.03

• Q3 = 344.32
• Hence the daily wages of the central 50% of workers lie
between 281.03 and 344.32.

• Sk B = -0.102
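The quartiles and Bowley's coefficient can be checked with the sketch below. The helper `grouped_quantile` is hypothetical, and the limits 150 and 450 for the two open-ended classes are assumptions; no quartile falls in those classes, so the assumed limits do not change the answer.

```python
def grouped_quantile(classes, fraction):
    """Interpolate a quantile from (lower, upper, frequency) classes."""
    n = sum(f for _, _, f in classes)
    target = fraction * n
    cumulative = 0   # cumulative frequency of preceding classes
    for lower, upper, f in classes:
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")

# Daily wage classes; 150 and 450 are assumed limits for the open classes.
wages = [
    (150, 200, 10), (200, 250, 25), (250, 300, 145),
    (300, 350, 220), (350, 400, 70), (400, 450, 30),
]

q1 = grouped_quantile(wages, 0.25)
q2 = grouped_quantile(wages, 0.50)   # median
q3 = grouped_quantile(wages, 0.75)

bowley = (q3 + q1 - 2 * q2) / (q3 - q1)
print(round(q1, 2), round(q2, 2), round(q3, 2), round(bowley, 3))
```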

Q3. A bank branch located in a city has the
business objective of developing an
improved process for serving customers
during the noon-to-1:00 p.m. lunch period. The
waiting time, in minutes, is defined as the
time from when the customer enters the line until
he or she reaches the teller window. Data
collected from a sample of 15 customers
during this hour are stored in

Data

Compute the mean, median,
variance, S.D. and Z
scores.
• Are there any outliers?
• Are the data skewed?
• As a customer walks into the branch
office during the lunch hour, she
asks the branch manager how long
she can expect to wait. The branch
manager replies, "Almost certainly
less than 5 minutes". Evaluate the
accuracy of this statement.
KURTOSIS
Kurtosis refers to the degree of flatness
or peakedness in the frequency distribution
curve.

If the curve is more peaked than the
normal curve it is called ‘leptokurtic’.
If it is more flat-topped than the
normal curve it is called ‘platykurtic’.
The normal curve itself is known as
‘mesokurtic’.

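Moments about Mean
• $\mu_1 = \frac{\sum (x - \bar{x})}{N}$
• $\mu_2 = \frac{\sum (x - \bar{x})^2}{N}$
• $\mu_3 = \frac{\sum (x - \bar{x})^3}{N}$
• $\mu_4 = \frac{\sum (x - \bar{x})^4}{N}$

• Kurtosis: $\beta_2 = \frac{\mu_4}{\mu_2^2}$; the kurtosis measured relative to the normal curve (for which $\beta_2 = 3$) is $\beta_2 - 3 = \frac{\mu_4}{\mu_2^2} - 3$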
Example: College Students’ Heights

Here are grouped data for heights of 100 randomly selected students
from St. Stephen College.

Height (inches)   Class mid value   f
59.5-62.5         61                5
62.5-65.5         64                18
65.5-68.5         67                42
68.5-71.5         70                27
71.5-74.5         73                8
a) Draw a histogram to check whether data is
symmetric or skewed.
b) If data is skewed, comment on how highly
skewed.
c) Also calculate kurtosis value and interpret
the result.

Answer keys
• Mean= 67.45
• Skewness =-0.1082( approximately
symmetric)
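The answer key can be checked from the class mid-values with the sketch below. It assumes the skewness quoted is the moment coefficient μ3 / μ2^(3/2), which reproduces the stated value, and it reports both β2 = μ4 / μ2² and β2 - 3; the function and variable names are illustrative.

```python
# Class mid-values and frequencies from the heights table.
mids = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]

n = sum(freqs)
mean_height = sum(f * m for f, m in zip(freqs, mids)) / n

def central_moment(k):
    """k-th moment about the mean for the grouped data."""
    return sum(f * (m - mean_height) ** k for f, m in zip(freqs, mids)) / n

mu2 = central_moment(2)
mu3 = central_moment(3)
mu4 = central_moment(4)

skewness = mu3 / mu2 ** 1.5     # moment coefficient of skewness (assumption)
beta2 = mu4 / mu2 ** 2          # equals 3 for the normal curve
excess_kurtosis = beta2 - 3     # negative means flatter than normal

print(round(mean_height, 2), round(skewness, 4))
print(round(beta2, 3), round(excess_kurtosis, 3))
```

A value of β2 below 3 (negative excess) would indicate a curve somewhat flatter than the normal curve, that is, platykurtic.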

Bin    f
62.5   5
65.5   18
68.5   42
71.5   27
74.5   8

[Figure: Histogram of Heights; x-axis: bins 62.5 to 74.5, y-axis: frequency from 0 to 45]
