
INTRODUCTION AND INFORMATION

In the last chapter, we discussed the following:


- the computation, comparison, and uses of the several measures of the average of a set of numerical data: the MEAN, MEDIAN, and MODE. These represent typical values that tend to be centrally located in a set of data arranged by magnitude in increasing or decreasing order.
- the quantiles (quartile, decile, and percentile), which are commonly used positional measures that divide a distribution of data into equal parts.
- Information on the measures of central location and on the positions of division of a set of numerical data is certainly not sufficient to give an adequate description of the data. We also need to know how the numerical data tend to spread or scatter about the average value.
- Two sets of data having the same mean or median may differ considerably in the spread of their values about the average.
For example, 2 sets of data, A and B, have the same mean of 50. Data A has a lowest score of 5 and a highest score of 95, while data B has 20 and 80 for its lowest and highest scores. Although the 2 sets of data have the same mean, they differ considerably in the spread of their data in terms of the range. The range of data A is 95 - 5 = 90, while the range of data B is 80 - 20 = 60. This indicates that the values in data B are less scattered about the mean of 50 and are therefore more homogeneous than the values in data A. The difference between the 2 ranges, 90 for A and 60 for B, is a big gap of 30.
- The measures of central tendency are not enough to provide complete and useful information about the data. They need to be supported by other computational measures of description: the MEASURES OF VARIATION, OR MEASURES OF DISPERSION. These indicate the degree or extent to which numerical values are dispersed or spread out about the average value in a distribution.
The popularly used measures of variation are the range, the semi-interquartile range, the interquartile range, the mean deviation (or average deviation), the variance, and the standard deviation.
THE RANGE
- the simplest measure of variation to compute
- the difference between the largest and the lowest values in the set of numerical data.
FOR UNGROUPED DATA: the difference between the highest value (HV) and the lowest value (LV)
R = HV - LV
Example:
The scores obtained by 12 students in a Statistics class are 80, 75, 61, 95, 100, 78, 85, 90, 73, 65, 87, and 81. Find the range.
R = HV - LV
R = 100 - 61 = 39
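The ungrouped range formula R = HV - LV can be sketched in Python as follows. The sketch is applied to the 12 scores above. The function name `data_range` is a hypothetical helper chosen only for this illustration.

```python
def data_range(values):
    """Range of ungrouped data: highest value minus lowest value (R = HV - LV)."""
    return max(values) - min(values)

# The 12 Statistics scores from the example above.
scores = [80, 75, 61, 95, 100, 78, 85, 90, 73, 65, 87, 81]
print(data_range(scores))  # 39
```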
FOR GROUPED DATA: the range is obtained by subtracting the lower boundary of the lowest class interval (LB LCI) from the upper boundary of the highest class interval (UB HCI) of a frequency distribution. This is because the class boundaries are considered the true limits.
R = UB HCI - LB LCI
Example:
Find the range of a given frequency distribution whose highest class interval is 91-95 and whose lowest class interval is 61-65.
The upper boundary of the highest class interval is 95.5, and the lower boundary of the lowest class interval is 60.5.
R = UB HCI - LB LCI
R = 95.5 - 60.5 = 35
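Here is a small sketch of the grouped-data range, computed from the class limits. It assumes that the limits are recorded to the nearest whole unit, so each true boundary lies half a unit beyond the stated limit (61 becomes 60.5 and 95 becomes 95.5, as in the example). The function name and the `unit` parameter are hypothetical.

```python
def grouped_range(lowest_class, highest_class, unit=1.0):
    """Range of grouped data from (lower limit, upper limit) class pairs.

    Assumption: limits are recorded to the nearest `unit`, so the true class
    boundaries lie unit/2 below the lower limit and unit/2 above the upper limit.
    """
    lower_boundary = lowest_class[0] - unit / 2   # LB of the lowest class interval
    upper_boundary = highest_class[1] + unit / 2  # UB of the highest class interval
    return upper_boundary - lower_boundary

print(grouped_range((61, 65), (91, 95)))  # 35.0
```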
*The range considers only the extreme values (the largest and the lowest values) in a distribution. It shows only the difference, or distance, between these 2 values.
- It does not consider, or tell anything about, the other values between these extreme values.

- If there are 1000 observations, only 2 of them are considered in the calculation of the range. The rest of the observations are simply ignored. The range is therefore a poor and unstable measure of variation, particularly when there is a large number of values. It is the least reliable measure of variation and should be used only when a quick measure is wanted.
- A reliable measure of spread is one which
considers or takes into account all the values
in a distribution.
THE INTER-QUARTILE RANGE (IQR)
- the difference between the third quartile (Q3), or upper quartile, and the first quartile (Q1), or lower quartile.
IQR = Q3 - Q1
THE SEMI INTERQUARTILE RANGE OR
QUARTILE DEVIATION (SIQR OR QD)
- indicates the variation or dispersion of the values covering the middle 50% of the distribution of the data. It is found by taking half of the distance between the third (upper) quartile and the first (lower) quartile.
SIQR or QD = (Q3 - Q1) / 2
*The range, the interquartile range, and the semi-interquartile range share the same disadvantage:
- None of them gives an idea of the density of the observations. Hence, each gives only little information on how the observations are concentrated about the central values.
- The semi-interquartile range, or quartile deviation, is an appropriate measure of variation only when the median is used as the measure of central tendency and the distribution is skewed.
EXAMPLE:
A manufacturing company produced the following number of units per day over a given period:
21, 25, 20, 28, 30, 23, 22, 31, 32, 27, 19, 33, 24, 29, 26, and 34
Determine the:

1. Range:
R = HV - LV
= 34 - 19 = 15
*Arrange the production units according to magnitude, then calculate Q1 and Q3:
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
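* Q1 is the point below which the lowest 25% of the data (arranged from lowest to highest) lie. Q3 is the point below which the lowest 75% of the data lie. With these 16 values, Q1 lies midway between the 4th and 5th values, (22 + 23) / 2 = 22.5. Q3 lies midway between the 12th and 13th values, (30 + 31) / 2 = 30.5.
APPLY THE FORMULAS AS FOLLOWS:
2. IQR = Q3 - Q1 = 30.5 - 22.5 = 8
3. SIQR or QD = (Q3 - Q1) / 2 = (30.5 - 22.5) / 2 = 4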
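Quartile conventions differ between textbooks and software. The sketch below assumes the convention used in these notes: Q1 and Q3 are the medians of the lower and upper halves of the sorted data. That convention reproduces the 22.5 and 30.5 above. Library routines such as Python's `statistics.quantiles` interpolate differently and give slightly different quartiles for the same data. The helper name `quartiles_median_of_halves` is hypothetical.

```python
from statistics import median

def quartiles_median_of_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data.

    When n is odd, the middle value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + len(data) % 2:]  # skip the middle value when n is odd
    return median(lower), median(upper)

units = [21, 25, 20, 28, 30, 23, 22, 31, 32, 27, 19, 33, 24, 29, 26, 34]
r = max(units) - min(units)          # range, R = HV - LV
q1, q3 = quartiles_median_of_halves(units)
iqr = q3 - q1                        # IQR = Q3 - Q1
siqr = iqr / 2                       # SIQR or QD = (Q3 - Q1) / 2
print(r, q1, q3, iqr, siqr)  # 15 22.5 30.5 8.0 4.0
```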
THE MEAN DEVIATION OR AVERAGE
DEVIATION
- a more reliable measure of variation than the range, the interquartile range, and the semi-interquartile range
- takes into account the deviations of the individual values from the mean.
- The deviations of the individual values from the mean can be either positive or negative, and they always add up to zero. For this reason, we calculate the mean deviation using the absolute values of the deviations.
- The absolute value of -7 or +7 is simply 7. Because it uses the absolute values of the deviations, the mean deviation is also referred to as THE MEAN ABSOLUTE DEVIATION.
- The mean deviation is defined as the average of the absolute deviations of the individual values of a set of numerical data from the mean, the median, or the mode.
- The MEAN is the most preferred and commonly used measure of central tendency for computing the mean deviation.

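The formulas are:
$$MD = \frac{\sum |x - \bar{x}|}{n} \quad \text{(ungrouped data)} \qquad MD = \frac{\sum f\,|x - \bar{x}|}{n} \quad \text{(grouped data)}$$
Where
x = individual value for ungrouped data, and the midpoint of each class interval for grouped data
x̄ or u = mean of the data
n = total number of observations (the sum of the frequencies)
f = frequency of each class interval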

FOR UNGROUPED DATA


1. Arrange the values from the lowest to the highest, or vice versa.
2. Compute the value of the mean.
3. Find the absolute value of each deviation from the mean.
4. Find the sum of the absolute values in step 3.
5. Substitute the values in the formula and solve.
*Companies use this measure in planning exercises, notably when making economic forecasts.
EXAMPLE
Say we need to find the mean deviation about the mean for the following data set: 6, 7, 10, 12, 13, 4, 8, 12
1. Find the mean of the given data.
Mean = (Sum of observations) / (Number of observations) = (6 + 7 + 10 + 12 + 13 + 4 + 8 + 12) / 8 = 72/8 = 9
2. Subtract the mean from each of the observations and record your results. These results are the respective deviations of the observations from the mean.
6 - 9, 7 - 9, 10 - 9, 12 - 9, 13 - 9, 4 - 9, 8 - 9, 12 - 9, or -3, -2, 1, 3, 4, -5, -1, 3
3. Find the absolute values of the deviations obtained above.
|-3|, |-2|, |1|, |3|, |4|, |-5|, |-1|, |3|, or 3, 2, 1, 3, 4, 5, 1, 3
4. Add the absolute values obtained above.
3 + 2 + 1 + 3 + 4 + 5 + 1 + 3 = 22
5. Divide the result of the last step by the number of observations. This gives the final answer.
Thus, the mean deviation about the mean for the given ungrouped data is 22/8 = 2.75.
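The steps above can be sketched in Python for the same 8 observations. The function name `mean_deviation` is hypothetical. Sorting (step 1) is omitted because it does not change the result.

```python
def mean_deviation(values):
    """Mean (absolute) deviation about the mean for ungrouped data."""
    n = len(values)
    mean = sum(values) / n                             # step 2
    abs_deviations = [abs(x - mean) for x in values]   # step 3: |x - mean|
    return sum(abs_deviations) / n                     # steps 4 and 5

data = [6, 7, 10, 12, 13, 4, 8, 12]
print(mean_deviation(data))  # 2.75
```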
FOR GROUPED DATA
1. Compute the mean of the distribution.
2. Subtract the mean from each of the midpoints, and write the absolute values of the results under the column |x - x̄|.
3. Multiply the items under column f by the corresponding items under column |x - x̄|.
4. Add the products in step 3 to obtain the value of Σf|x - x̄|, the numerator of the formula.
5. Divide the sum obtained in step 4 by n.

EXAMPLE
Calculate the mean absolute deviation for the following data:

Classes       10-15   15-20   20-25   25-30   30-35
Frequencies     3       5       7       4       2

Solution: Computation of Mean

Classes    fi     xi      xifi
10-15       3    12.5     37.5
15-20       5    17.5     87.5
20-25       7    22.5    157.5
25-30       4    27.5    110
30-35       2    32.5     65
        n = 21          Σxifi = 457.5

Mean = Σxifi / n = 457.5 / 21 = 21.786


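The formula for the MAD of grouped data is as follows:
$$\text{MAD} = \frac{\sum |x_i - \bar{x}|\, f_i}{n}$$
$$= \frac{|12.5 - 21.786|(3) + |17.5 - 21.786|(5) + |22.5 - 21.786|(7) + |27.5 - 21.786|(4) + |32.5 - 21.786|(2)}{21}$$
$$= \frac{(9.286)(3) + (4.286)(5) + (0.714)(7) + (5.714)(4) + (10.714)(2)}{21}$$
$$= \frac{27.858 + 21.43 + 4.998 + 22.856 + 21.428}{21}$$
$$= \frac{98.57}{21}$$
$$\approx 4.694$$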
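This Python sketch computes the grouped mean and the grouped MAD from the class midpoints xi and the frequencies fi in the table above. The function name is hypothetical. It keeps the unrounded mean, and rounding the result to three decimals gives about 4.694, in line with the hand computation.

```python
def grouped_mean_and_mad(midpoints, freqs):
    """Mean and mean absolute deviation of grouped data, using class midpoints."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n              # Σxifi / n
    mad = sum(abs(x - mean) * f for x, f in zip(midpoints, freqs)) / n   # Σ|xi - mean|fi / n
    return n, mean, mad

midpoints = [12.5, 17.5, 22.5, 27.5, 32.5]  # xi for classes 10-15, ..., 30-35
freqs = [3, 5, 7, 4, 2]                     # fi
n, mean, mad = grouped_mean_and_mad(midpoints, freqs)
print(n, round(mean, 3), round(mad, 3))  # 21 21.786 4.694
```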
THE VARIANCE
- The most important measures of variability are the variance and its square root.
- The variance is defined as the average of the squared deviations from the mean.
- The square root of the variance is known as the standard deviation.
- The variance of sample data is denoted by S² (S squared), while the symbol for the variance of a population is σ² (sigma squared).
FOR UNGROUPED DATA
1. Arrange the values vertically according to magnitude (lowest to highest, or vice versa).
2. Calculate the mean.
3. Obtain the individual deviations from the mean.
4. Square each deviation, and write the results under the column (x - x̄)², or (x - u)² for a population.
5. Find the sum of the squared deviations.
6. Divide the sum in step 5 by n - 1 for sample data, or by N for population data.
For the variance of sample data, the formula is:

$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

where x̄ is the mean of the sample data.

For the variance of a population:

$$\sigma^2 = \frac{\sum (x - u)^2}{N}$$

where u represents the mean of the population.

*Computing the variance of ungrouped data with the above procedure is long and time consuming. A simpler and easier solution is the raw data formula.

*We use n - 1 instead of n as the divisor when computing the variance of sample data, even though the variance is defined as the average of the squared deviations about the mean. The reason is to avoid the bias that is normally associated with a variance computed from random samples, especially when the sample size n is small. Different samples of size n selected from the same population generally yield different values for the variance. When n - 1 is used as the divisor, the average of these values over many samples tends to be closer to the actual variance, THE POPULATION VARIANCE.

EXAMPLE
Marks obtained by a few students are 75, 83, 54, 90, and 61. Find the variance of the sample.
Solution:
Formula for the mean:
x̄ = Σx / n
x̄ = 363 / 5 = 72.6
Construct the following table:

x      x - x̄     (x - x̄)²
75       2.4        5.76
83      10.4      108.16
54     -18.6      345.96
90      17.4      302.76
61     -11.6      134.56
Σx = 363         Σ(x - x̄)² = 897.2

Substituting into the formula for the variance of a sample:
S² = Σ(x - x̄)² / (n - 1) = 897.2 / 4 = 224.3
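The Python sketch below checks the marks example by the definition, S² = Σ(x - x̄)² / (n - 1), and compares it with `statistics.variance`, which also divides by n - 1. The second part illustrates the n - 1 argument. It uses a hypothetical population [2, 4, 6] that is not from the notes, and it assumes samples of size 2 drawn with replacement. Averaging S² over every such sample recovers the population variance exactly, whereas dividing by n falls short.

```python
from itertools import product
from statistics import pvariance, variance

def sample_variance(values):
    """S^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

marks = [75, 83, 54, 90, 61]
s2 = sample_variance(marks)
print(round(s2, 2), variance(marks))  # both give 224.3

# Why n - 1: enumerate every sample of size 2 (drawn with replacement) from a
# hypothetical tiny population, then average the two candidate estimators.
population = [2, 4, 6]
samples = list(product(population, repeat=2))
avg_with_n_minus_1 = sum(variance(s) for s in samples) / len(samples)
avg_with_n = sum(pvariance(s) for s in samples) / len(samples)
print(pvariance(population), avg_with_n_minus_1, avg_with_n)
```

Running the second part shows the n - 1 average matching the population variance (8/3), while the divide-by-n average is smaller (4/3).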

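The raw data formulas for computing the variance of ungrouped data are:
FOR SAMPLE DATA
$$S^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}$$
FOR POPULATION DATA
$$\sigma^2 = \frac{N\sum x^2 - \left(\sum x\right)^2}{N^2}$$
We will use n for the number of observations in sample data and N for population data.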
To solve the variance of ungrouped data by
the raw data formula we will follow the
procedures enumerated below
1. Arrange the values in terms of their
magnitude
2. Find the sum of the values
3. Square each value and write the results under the column x².
4. Get the sum of the squared values in step
3
5. Substitute the results obtained in step 2
and step 4 in the raw data formula
EXAMPLE
Find the variance by the raw data formula, given a computed mean of 13.5.

X      X²
9      81
10     100
11     121
13     169
13     169
15     225
17     289
20     400
ΣX = 108    ΣX² = 1554
n = 8

Substituting the values computed above into the raw data formula, we have:

S² = [8(1554) - (108)²] / [8(8 - 1)]
= (12432 - 11664) / 56
= 768 / 56
= 13.71
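The raw data formula can be sketched in Python as follows, with `statistics.variance` as a cross-check. The function name is hypothetical.

```python
from statistics import variance

def raw_data_sample_variance(values):
    """Sample variance by the raw data formula: (n*Σx² - (Σx)²) / (n(n - 1))."""
    n = len(values)
    sum_x = sum(values)                    # step 2: Σx
    sum_x2 = sum(x * x for x in values)    # steps 3-4: Σx²
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

data = [9, 10, 11, 13, 13, 15, 17, 20]
s2 = raw_data_sample_variance(data)
print(round(s2, 2))  # 13.71
```

One design caution: in floating-point arithmetic, subtracting two large, nearly equal quantities (nΣx² and (Σx)²) can lose precision. For large measurements, the deviation method is generally the safer choice. Here the inputs are integers, which Python sums exactly.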
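THE VARIANCE FOR GROUPED DATA

The variance of grouped data can be computed by either of two methods.

Method 1: Long Method Formula
The first method involves rather longer procedures based on the deviations of the class midpoints from the mean. For sample data and for population data, respectively:
$$S^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} \qquad \sigma^2 = \frac{\sum f(x - u)^2}{N}$$

Method 2: Short Method
The computation can be simplified with a short method known as the coding method formula. For sample data and for population data, respectively:
$$S^2 = c^2\left[\frac{n\sum fd^2 - \left(\sum fd\right)^2}{n(n-1)}\right] \qquad \sigma^2 = c^2\left[\frac{N\sum fd^2 - \left(\sum fd\right)^2}{N^2}\right]$$

We use x̄ for the mean of sample data and u for the mean of population data. Here x is the midpoint and f is the frequency of each class interval.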
PROCEDURES:
1. Find the value of the mean
2. Subtract the mean from each of the
midpoints of each class interval
3. Square each of the deviations in step 2, and write the results under (x - x̄)².
4. Multiply the squared deviations in step 3
by their corresponding frequencies
5. Obtain the sum of the results in step 4
6. Divide the results in step 5 by n-1 for
sample data and by N for population data
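The long-method procedure can be sketched in Python as follows. For illustration, it reuses the class midpoints and frequencies from the mean deviation example. The notes do not compute a variance for that table, so the printed value only demonstrates the method. The function name and the `sample` flag are hypothetical.

```python
def grouped_variance_long(midpoints, freqs, sample=True):
    """Long method: Σf(x - mean)² divided by n - 1 (sample) or N (population)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n            # step 1
    ss = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))    # steps 2-5
    return ss / (n - 1) if sample else ss / n                          # step 6

midpoints = [12.5, 17.5, 22.5, 27.5, 32.5]  # classes 10-15, ..., 30-35
freqs = [3, 5, 7, 4, 2]
print(round(grouped_variance_long(midpoints, freqs), 3))  # 35.714
```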

The d represents the coded value of a class interval, and c is the class size (width) of the class interval.
To obtain the variance of grouped data by the short method, or coding formula, follow the procedures listed below.
1. Write the coded values of the class intervals under the d column.
2. Multiply the frequencies by the corresponding coded values (fd).
3. Multiply the squared coded values by the corresponding frequencies (fd²).
4. Find the sums of the results in step 2 (Σfd) and in step 3 (Σfd²).
5. Substitute the values into the coding formula.
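The coding procedure can be sketched in Python under a few stated assumptions. The notes do not say how the coded values are assigned. This sketch assumes equal class widths c, a code of d = 0 for one chosen class, and codes -1, -2, ... below it and 1, 2, ... above it. The data are again the mean deviation example's frequencies, and the function and parameter names are hypothetical.

```python
def grouped_variance_coded(freqs, origin_index, c, sample=True):
    """Short (coding) method for the variance of grouped data.

    Assumes equal class widths c, with d = 0 for the class at origin_index
    and consecutive integer codes for the classes below and above it.
    """
    d = [i - origin_index for i in range(len(freqs))]          # step 1: coded values
    n = sum(freqs)
    sum_fd = sum(f * di for f, di in zip(freqs, d))            # step 2, summed: Σfd
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))      # step 3, summed: Σfd²
    numerator = n * sum_fd2 - sum_fd ** 2
    denominator = n * (n - 1) if sample else n * n
    return c ** 2 * numerator / denominator                    # step 5

freqs = [3, 5, 7, 4, 2]  # classes 10-15, 15-20, 20-25, 25-30, 30-35 (width c = 5)
print(round(grouped_variance_coded(freqs, origin_index=2, c=5), 3))  # 35.714
```

The result does not depend on which class is coded 0: setting `origin_index=0` gives the same variance, which also equals the long-method value for this table.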
EXAMPLE
The following gives the frequency
distribution of the daily commuting time (in
minutes) from home to work for all 25
employees of a company.

FOR GROUPED DATA (population data)

THE STANDARD DEVIATION
- The variance is the average of the squared deviations from the mean. This means that the variance of a group of data is expressed in squared units, which makes it difficult to use. Because of this, it is more convenient to use the square root of the variance, known as the standard deviation.
- THE MOST IMPORTANT MEASURE OF VARIATION
- It lets us determine the position of the scores in a frequency distribution in relation to the mean.
- A small standard deviation means that the values in a distribution are clustered close to the mean. A large one means that they are widely scattered, or spread out, from the mean.
FORMULAS TO COMPUTE THE STANDARD DEVIATION
The standard deviation is the square root of the variance, for ungrouped and grouped data alike:
$$S = \sqrt{S^2} \quad \text{(sample data)} \qquad \sigma = \sqrt{\sigma^2} \quad \text{(population data)}$$
EXAMPLE: BASED ON THE BOOK (Please refer to examples 1 and presented in table 3 example 1)
FOR UNGROUPED DATA (sample data)
If the variance of the ungrouped sample data is 13.71, the standard deviation is found by substituting it into the formula:
S = √13.71
S = 3.70
FOR GROUPED DATA (sample data)
If the variance of the grouped sample data is 52.93, substitute:
S = √52.93
S = 7.28
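The substitutions above can be sketched in Python. The second part computes the standard deviation directly from the 8 values in the raw data formula example, using `statistics.stdev`, which takes the square root of the n - 1 sample variance.

```python
import math
from statistics import stdev

# Standard deviation = square root of the variance.
s_ungrouped = math.sqrt(13.71)  # variance of the ungrouped sample data
s_grouped = math.sqrt(52.93)    # variance of the grouped sample data
print(round(s_ungrouped, 2), round(s_grouped, 2))  # 3.7 7.28

# The same result computed from the raw data of the raw-data-formula example.
data = [9, 10, 11, 13, 13, 15, 17, 20]
print(round(stdev(data), 2))  # 3.7
```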
*The variance and the standard deviation are generally accepted measures of dispersion.
*The standard deviation is more popularly used than the variance because its value is expressed in the same unit of measurement as the observations and the mean. Its values can therefore be used directly for analysis and interpretation.
EXAMPLE:
When the unit of measurement of the data is
in kilos the standard deviation and the mean
are in kilos. The variance on the other hand
is in kilos squared.
