or Variation
Introduction
1
Objectives of studying variation
2
Example
Monthly wages (Rs.)
Factory A    Factory B    Factory C
2300         2310         2380
2300         2300         2210
2300         2304         2220
2300         2306         2200
2300         2280         2490
4
In factory A , each and every worker
is getting the same wage, so there is
no variation.
In factory B there is slight variation,
whereas in factory C the variation is wide.
5
Conclusion:
The measures of central tendency alone
are, therefore, insufficient. They must be
supported by some other measures to
reveal the true characteristics of the
data.
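The conclusion can be checked numerically. The sketch below uses the wage figures from the three-factory example; the variable names are illustrative only.

```python
from statistics import mean

# Monthly wages (Rs.) from the three-factory example.
wages = {
    "A": [2300, 2300, 2300, 2300, 2300],
    "B": [2310, 2300, 2304, 2306, 2280],
    "C": [2380, 2210, 2220, 2200, 2490],
}

# The average is the same in every factory, but the spread is not.
means = {name: mean(values) for name, values in wages.items()}
ranges = {name: max(values) - min(values) for name, values in wages.items()}

for name in wages:
    print(f"Factory {name}: mean = {means[name]}, range = {ranges[name]}")
```

All three factories share the same mean wage, so only a measure of spread separates them.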
6
Measures of Variation
• Knowing the measures of central
tendency is not enough
• Both of the distributions below have
identical measures of central
tendency
3-7
Figure 3.13
Definition
A measure of variation is designed to
measure the extent to which the
individual observations differ from their
average.
8
Various Measures of Dispersion
1) Range
2) Quartile deviation
3) Mean deviation
4) Standard deviation
9
Desired Qualities of a good
measure of Variation
1. It should be easy to understand and
calculate.
2. It should be rigidly defined.
3. It should be based on all the values.
4. It should not be affected too much
by abnormal extreme values.
5. It should be capable of further
statistical analysis.
10
The Range
Simplest measure of variation
The range of a data set is the difference between
its largest (L) and smallest (S) values.
$R = L - S$
Coefficient of range $= \frac{L - S}{L + S}$
11
The Range
Simplest measure of variation
Difference between the largest and the smallest values:
Example (values plotted on an axis from 0 to 14):
Range = 13 - 1 = 12
12
EXERCISE
• The following are the prices of shares of a
company from Monday to Friday. Calculate the range
and the coefficient of range.
Day Price
Monday 200
Tuesday 210
Wednesday 208
Thursday 160
Friday 250
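One way to work the exercise is sketched below, applying R = L - S and the coefficient of range (L - S)/(L + S) to the prices in the table. The variable names are illustrative.

```python
# Share prices from the exercise table.
prices = {
    "Monday": 200,
    "Tuesday": 210,
    "Wednesday": 208,
    "Thursday": 160,
    "Friday": 250,
}

largest = max(prices.values())    # L
smallest = min(prices.values())   # S

value_range = largest - smallest                            # R = L - S
coefficient_of_range = value_range / (largest + smallest)   # (L - S) / (L + S)

print(f"Range = {value_range}")
print(f"Coefficient of range = {coefficient_of_range:.4f}")
```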
13
Merits and Limitations of Range
• Amongst all the methods it is the
simplest to understand and takes
minimum time to compute.
Limitations
i) Range is not based on each and every
observation of the distribution.
14
Measures of Variation:
Why The Range Can Be Misleading
Does not account for how the data are distributed
[Figure: two dot plots on an axis from 7 to 12 with differently distributed values; for both, Range = 12 - 7 = 5]
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
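The sensitivity to outliers can be reproduced with the two lists above, which differ only in their last value. This is a small sketch; `value_range` is a helper name introduced here.

```python
def value_range(data):
    """Range = largest value - smallest value."""
    return max(data) - min(data)

# 25 observations, as listed on the slide.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]

# Same data with the last observation replaced by an extreme value.
with_outlier = base[:-1] + [120]

print(value_range(base))
print(value_range(with_outlier))
```

A single changed observation moves the range from 4 to 119, while the other 24 values are untouched.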
16
Semi-interquartile range
or quartile deviation.
17
Example: Calculate Quartile Deviation and its
coefficient from the following data
Weekly Income (Rs)    No. of workers (f)    c.f.
Below 350             8                     8
350-370               16                    24
370-390               39                    63
390-410               58                    121
410-430               60                    181
430-450               40                    221
450-470               22                    243
470-490               15                    258
490-510               15                    273
510-530               9                     282
530 and above         10                    292
18
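• Q1 = size of the (N/4)th observation
= 292/4 = 73rd observation
Q1 lies in the class 390 – 410.
$Q_1 = L + \frac{N/4 - \text{p.c.f.}}{f} \times i$
where L is the lower limit of the quartile class, p.c.f. is the cumulative frequency of the preceding class, f is the frequency of the quartile class and i is the class width.
$Q_1 = 390 + \frac{73 - 63}{58} \times 20 = 393.448$
Q3 = 449
QD = (Q3 - Q1)/2 = (449 - 393.448)/2 = 27.776
19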
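The interpolation can be sketched in code as follows. The class limits and frequencies come from the table; the function name `grouped_quantile` is hypothetical. The coefficient of quartile deviation is taken as (Q3 - Q1)/(Q3 + Q1), which is the usual definition and is assumed here because the slides do not spell it out.

```python
# (lower limit, upper limit, frequency); None marks an open-ended class.
classes = [
    (None, 350, 8), (350, 370, 16), (370, 390, 39), (390, 410, 58),
    (410, 430, 60), (430, 450, 40), (450, 470, 22), (470, 490, 15),
    (490, 510, 15), (510, 530, 9), (530, None, 10),
]

def grouped_quantile(classes, fraction):
    """Interpolate the value below which `fraction` of the observations lie."""
    n = sum(f for _, _, f in classes)
    target = fraction * n      # e.g. N/4 for Q1
    cumulative = 0             # cumulative frequency of the preceding classes
    for lower, upper, f in classes:
        if cumulative + f >= target:
            if lower is None or upper is None:
                raise ValueError("quantile falls in an open-ended class")
            # L + (target - p.c.f.) / f * i
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("fraction must be between 0 and 1")

q1 = grouped_quantile(classes, 0.25)
q3 = grouped_quantile(classes, 0.75)
quartile_deviation = (q3 - q1) / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1)

print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}")
print(f"QD = {quartile_deviation:.3f}, coefficient = {coefficient_of_qd:.4f}")
```

The open-ended first and last classes cause no difficulty here because neither quartile falls inside them.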
Mean deviation
$MD = \frac{\sum |x - \bar{x}|}{n}$
20
Example
X    Deviation from mean    Absolute deviation from mean
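As an illustration of the three columns, the sketch below builds them for the Factory C wages from the earlier example. The choice of data set is an assumption made here; any list of values would do.

```python
# Factory C monthly wages (Rs.) from the earlier example, used for illustration.
x_values = [2380, 2210, 2220, 2200, 2490]

n = len(x_values)
mean = sum(x_values) / n

# Columns of the working table: X, deviation from mean, absolute deviation.
rows = [(x, x - mean, abs(x - mean)) for x in x_values]

# Mean deviation = sum of absolute deviations / n
mean_deviation = sum(abs_dev for _, _, abs_dev in rows) / n

for x, dev, abs_dev in rows:
    print(f"{x:>6} {dev:>8.1f} {abs_dev:>8.1f}")
print(f"Mean deviation = {mean_deviation}")
```

The signed deviations sum to zero, which is why the absolute values are needed.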
22
Variance and Standard
Deviation
Variance: the average of the squared
deviations of all the
population measurements
from the mean.
Standard deviation: the positive square root of the variance.
23
Variance and Standard deviation
– Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
25
Measures of Variation:
The Sample Standard Deviation
$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
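A minimal sketch of the sample variance and sample standard deviation, written out step by step and cross-checked against Python's `statistics` module. The data are the Factory B and Factory C wages from the earlier example, chosen here only for illustration.

```python
import math
import statistics

def sample_variance(data):
    """S^2 = sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def sample_sd(data):
    """S = square root of the sample variance."""
    return math.sqrt(sample_variance(data))

factory_b = [2310, 2300, 2304, 2306, 2280]
factory_c = [2380, 2210, 2220, 2200, 2490]

for name, data in (("B", factory_b), ("C", factory_c)):
    print(f"Factory {name}: S^2 = {sample_variance(data)}, S = {sample_sd(data):.2f}")

# The library functions use the same n - 1 divisor.
agrees_with_library = math.isclose(sample_sd(factory_c), statistics.stdev(factory_c))
```

Both factories have the same mean wage, yet the standard deviation of Factory C is roughly eleven times that of Factory B.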
Measures of Variation:
The Standard Deviation
Steps for Computing Standard Deviation
n = 8, Mean = $\bar{X}$ = 16
Find the standard deviation and variance of the following data:
10
20
30
50
35
25
45
25
29
Total
Comparing Standard
Deviations
[Figure: dot plots of three data sets on a common axis from 11 to 21]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
30
Summary Characteristics
The more the data are spread out, the greater the
range, variance, and standard deviation.
If the values are all the same (no variation), all these
measures will be zero.
Wheaties 100
• Mean= 130
• Variance= 2,200
• S.D= 46.9042
The standard deviation of 46.9042 indicates that the
calories in the cereals are clustering within one standard deviation
of the mean, that is, between 83.0958 and 176.9042.
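The interval is simply the mean minus and plus one standard deviation. The short sketch below recomputes it from the stated mean and variance; the variable names are illustrative.

```python
import math

mean_calories = 130
variance_calories = 2200

# The standard deviation is the square root of the variance.
sd_calories = math.sqrt(variance_calories)

# One standard deviation on either side of the mean.
lower = mean_calories - sd_calories
upper = mean_calories + sd_calories

print(f"S.D. = {sd_calories:.4f}")
print(f"Interval: {lower:.4f} to {upper:.4f}")
```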
37
Cereal    Calories    Amount of sugar (in grams)
Kellogg’s All Bran 80 6
Wheaties 100 4
39
Solution
• Since calories and the amount of sugar
have different units of measurements, C.V.
is needed to compare the variability in the
two measurements.
• CV (calories) = 36.08%
• CV (sugar) = 57.84%
• Relative to the mean , the amount of sugar
is much more variable than calories.
40
Compute the coefficient of variation for
the following data. Which team is more
consistent?
Team A Team B
500 350
300 250
200 350
100 280
400 320
41
Locating Extreme Outliers:
Z-Scores
To compute the Z-score of a data value, subtract the mean
and divide by the standard deviation.
The larger the absolute value of the Z-score, the farther the
data value is from the mean.
Locating Extreme Outliers:
Z-Score
$Z = \frac{X - \bar{X}}{S}$
A score of 620 is 1.3 standard deviations above the mean and would not
be considered an outlier.
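The Z-score calculation can be sketched with the cereal figures quoted in this deck (mean 130, S.D. 46.9042) and the two calorie values that appear in its table. The cut-off of 3 for flagging an extreme outlier is a common rule of thumb and is an assumption here, not something the slides state.

```python
def z_score(value, mean, sd):
    """Z = (X - mean) / S"""
    return (value - mean) / sd

MEAN_CALORIES = 130
SD_CALORIES = 46.9042
OUTLIER_CUTOFF = 3.0  # assumed rule of thumb: |Z| above 3 is extreme

calories = {"Kellogg's All Bran": 80, "Wheaties": 100}

z_scores = {
    name: z_score(value, MEAN_CALORIES, SD_CALORIES)
    for name, value in calories.items()
}

for name, z in z_scores.items():
    label = "extreme outlier" if abs(z) > OUTLIER_CUTOFF else "not an outlier"
    print(f"{name}: Z = {z:.2f} ({label})")
```

A negative Z-score means the value lies below the mean; neither cereal is far enough from the mean to be flagged under the assumed cut-off.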
Cereal Calories
Wheaties 100
45
46
Practice Problems
1. A company has two sections with 40 and
65 employees respectively. Their average
weekly wages are $450 and $350. The
standard deviations are 7 and 9.
47
2. A group of 80 candidates has an
average height of 148.5 cm with a
coefficient of variation of 2.5%. What
is the standard deviation of their
height?
48
Solution
• Section B has the larger wage bill.
• CVa = 1.56%
• CVb = 2.57%
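These figures can be reproduced from the data in practice problem 1. The sketch assumes that the wage bill means number of employees times average weekly wage, and that CV = (S.D. / mean) x 100%.

```python
# Section data from practice problem 1.
sections = {
    "A": {"employees": 40, "mean_wage": 450, "sd": 7},
    "B": {"employees": 65, "mean_wage": 350, "sd": 9},
}

# Total weekly wage bill = employees * average wage (assumed meaning of "bill").
wage_bill = {name: s["employees"] * s["mean_wage"] for name, s in sections.items()}

# Coefficient of variation as a percentage of the mean.
cv = {name: s["sd"] / s["mean_wage"] * 100 for name, s in sections.items()}

for name in sections:
    print(f"Section {name}: wage bill = {wage_bill[name]}, CV = {cv[name]:.2f}%")
```

Section B pays the larger total and also shows the greater relative variability in wages.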
49
Skewness
The term ‘Skewness’ refers to lack of
symmetry or departure from
symmetry.
50
When a distribution is not symmetrical
(i.e. asymmetrical), it is called a skewed
distribution.
51
1.Karl Pearson’s Coefficient of
Skewness
Karl Pearson’s Coefficient of Skewness $= \frac{\text{Mean} - \text{Mode}}{\sigma}$
It is independent of the unit of measurement.
52
Practice question
1. Compute Karl Pearson’s Coefficient of
Skewness of the following data:
6,5,7, 5, 8, 4,6,4,6,1.
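One way to work this question is sketched below. It assumes that σ is the population standard deviation (divisor N), which is what `statistics.pstdev` computes; using the sample standard deviation instead would give a slightly different value.

```python
from statistics import mean, mode, pstdev

data = [6, 5, 7, 5, 8, 4, 6, 4, 6, 1]

data_mean = mean(data)
data_mode = mode(data)     # most frequent value
sigma = pstdev(data)       # population standard deviation (assumed for sigma)

# Karl Pearson's coefficient of skewness = (Mean - Mode) / sigma
sk_p = (data_mean - data_mode) / sigma

print(f"Mean = {data_mean}, Mode = {data_mode}, sigma = {sigma:.3f}")
print(f"Sk_p = {sk_p:.3f}")
```

A negative value indicates that the mean lies below the mode, that is, the data are negatively skewed.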
53
2. Bowley’s Coefficient of Skewness
This method is based on Quartiles. The
formula is:
$Sk_B = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2\,\text{Med.}}{Q_3 - Q_1}$
This method is useful in case of open-end
distribution and where extreme values are
present.
54
3. Kelly’s Coefficient of
Skewness
• $Sk_k = \frac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}}$
• $Sk_k = \frac{D_9 - 2D_5 + D_1}{D_9 - D_1}$
55
It should be noted that the three
formulae for calculating skewness are based
on different assumptions, and hence the
answers obtained for the same question by
different methods may differ.
56
Q1. The following data relate to the profits
of 1,000 companies: Calculate coefficient
of skewness and comment on its value.
Annual Profits ( crore) No. of companies
10-12 17
12-14 53
14-16 199
16-18 194
18-20 327
20-22 208
22-24 2
57
• Mean= 17.786
• Mode= 19.056
• S.D.= 2.52
• SKp= -0.504 (moderately negatively skewed)
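These answers can be reproduced from the frequency table with the sketch below. It relies on two assumptions not spelled out in the slides: the mode is interpolated within the modal class using the standard grouped-data formula L + (f1 - f0)/(2 f1 - f0 - f2) x i, and the standard deviation uses the divisor N.

```python
import math

# (lower limit, upper limit, number of companies)
classes = [
    (10, 12, 17), (12, 14, 53), (14, 16, 199), (16, 18, 194),
    (18, 20, 327), (20, 22, 208), (22, 24, 2),
]

n = sum(f for _, _, f in classes)

# Mean and standard deviation from class midpoints.
mean = sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n
variance = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes) / n
sd = math.sqrt(variance)

# Mode by interpolation inside the modal (highest-frequency) class.
k = max(range(len(classes)), key=lambda i: classes[i][2])
lo, hi, f1 = classes[k]
f0 = classes[k - 1][2] if k > 0 else 0                  # preceding class
f2 = classes[k + 1][2] if k < len(classes) - 1 else 0   # following class
mode = lo + (f1 - f0) / (2 * f1 - f0 - f2) * (hi - lo)

# Karl Pearson's coefficient of skewness.
sk_p = (mean - mode) / sd

print(f"Mean = {mean:.3f}, Mode = {mode:.3f}, S.D. = {sd:.2f}, Sk_p = {sk_p:.3f}")
```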
58
Q2.The following table gives the distribution of
daily wages of 500 unskilled workers in a factory:
a)Obtain the limits of daily wages of central 50 per
cent of the observed workers.
b)Calculate Bowley’s Coefficient of Skewness .
Daily wages No. of workers
Below 200 10
200-250 25
250-300 145
300-350 220
350-400 70
• Q1 = 281.03
• Q3 = 344.32
• Hence the daily wages of the central 50% of workers lie
between 281.03 and 344.32.
• Sk B = -0.102
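The quartiles and Bowley's coefficient can be reproduced with the sketch below. The frequencies listed in the table add up to 470, so the code takes N = 500 from the question text rather than from the table; that choice, and the helper name `grouped_quantile`, are assumptions made here.

```python
# (lower limit, upper limit, number of workers); None marks an open lower end.
classes = [
    (None, 200, 10), (200, 250, 25), (250, 300, 145),
    (300, 350, 220), (350, 400, 70),
]
N_TOTAL = 500  # stated in the question; the listed classes cover only 470

def grouped_quantile(classes, n_total, fraction):
    """Interpolate a quantile from grouped data, given the total count."""
    target = fraction * n_total
    cumulative = 0  # cumulative frequency of the preceding classes
    for lower, upper, f in classes:
        if cumulative + f >= target:
            if lower is None:
                raise ValueError("quantile falls in an open-ended class")
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("quantile lies beyond the listed classes")

q1 = grouped_quantile(classes, N_TOTAL, 0.25)
median = grouped_quantile(classes, N_TOTAL, 0.50)
q3 = grouped_quantile(classes, N_TOTAL, 0.75)

# Bowley's coefficient of skewness = (Q3 + Q1 - 2 * Median) / (Q3 - Q1)
sk_b = (q3 + q1 - 2 * median) / (q3 - q1)

print(f"Q1 = {q1:.2f}, Median = {median:.2f}, Q3 = {q3:.2f}")
print(f"Sk_B = {sk_b:.3f}")
```

The central 50 per cent of workers are those between Q1 and Q3, which is why part (a) reduces to finding the two quartiles.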
60
Q3. A bank branch located in a city has the
business objective of developing an
improved process for serving customers
during the noon-to-1:00 p.m. lunch period. The
waiting time, in minutes, is defined as the
time from when the customer enters the line until
he or she reaches the teller window. Data
collected from a sample of 15 customers
during this hour are stored in
61
Data
62
Compute the mean, median,
variance, S.D. and Z
scores.
• Are there any outliers ?
• Are the data skewed?
• As a customer walks into the branch
office during the lunch hour, she
asks the branch manager how long
she can expect to wait. The branch
manager replies, “Almost certainly
less than 5 minutes.” Evaluate the
accuracy of this statement.
63
KURTOSIS
Kurtosis refers to the degree of flatness
or peakedness in the frequency distribution
curve.
64
If the curve is more peaked than the
normal curve it is called ‘leptokurtic’.
If it is more flat-topped than the
normal curve it is called ‘platykurtic’.
The normal curve itself is known as
‘mesokurtic’.
65
Moments about Mean
• $\mu_1 = \sum (x - \bar{x}) / N$
• $\mu_2 = \sum (x - \bar{x})^2 / N$
66
• Kurtosis: $\beta_2 = \mu_4 / \mu_2^2$; excess kurtosis $= \beta_2 - 3$
67
Example: College Student’s Heights
68.5-71.5 70 27
71.5-74.5 73 8
68
a) Draw a histogram to check whether data is
symmetric or skewed.
b) If data is skewed, comment on how highly
skewed.
c) Also calculate kurtosis value and interpret
the result.
69
Answer keys
• Mean= 67.45
• Skewness = -0.1082 (approximately
symmetric)
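The answer key can be reproduced from the moments about the mean, as sketched below. The frequencies are those in the Bin/f table that accompanies the histogram. The midpoints 61, 64 and 67 are inferred by assuming every class is 3 units wide, like the two classes whose midpoints (70 and 73) are shown in the example. Skewness is computed here as μ3 / μ2^1.5, an assumed moment-based formula that matches the stated answer.

```python
# Class midpoints (first three inferred from a class width of 3) and frequencies.
midpoints = [61, 64, 67, 70, 73]
frequencies = [5, 18, 42, 27, 8]

n = sum(frequencies)
mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n

def central_moment(r):
    """r-th moment about the mean for the grouped data."""
    return sum(f * (x - mean) ** r for x, f in zip(midpoints, frequencies)) / n

mu2 = central_moment(2)
mu3 = central_moment(3)
mu4 = central_moment(4)

skewness = mu3 / mu2 ** 1.5    # moment coefficient of skewness (assumed formula)
beta2 = mu4 / mu2 ** 2         # kurtosis
excess_kurtosis = beta2 - 3    # negative means flatter than the normal curve

print(f"Mean = {mean:.2f}")
print(f"Skewness = {skewness:.4f}")
print(f"beta2 = {beta2:.3f}, excess kurtosis = {excess_kurtosis:.3f}")
```

Under these assumptions the excess kurtosis comes out slightly negative, which would make the distribution mildly platykurtic.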
70
Bin f
62.5 5
65.5 18
68.5 42
71.5 27
74.5 8
[Figure: Histogram of Heights; x-axis: bins 62.5, 65.5, 68.5, 71.5, 74.5; y-axis: frequency, 0 to 45]
71