
Chapter 3 Numerical Descriptive Measures

3.1 Measures of central tendency for ungrouped data



A measure of central tendency represents a data set by a numerical measure (a typical value):
- it is a single value that summarizes a set of data
- it locates the centre of the values
- it gives the centre of a histogram or a frequency distribution curve

Consists of 3 measures:
1. Mean 2. Median 3. Mode


Mean
The mean for population data $x_1, x_2, \ldots, x_N$ is denoted by $\mu$ and is defined as

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

The mean for sample data $x_1, x_2, \ldots, x_n$ is denoted by $\bar{x}$ and is defined as

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where
N is the population size
n is the sample size.

Example 3.1
The following are the ages of all eight employees of a small company
53 32 61 27 39 44 49 57

Solution:

$$\mu = \frac{53 + 32 + \cdots + 57}{8} = 45.25 \text{ years}$$


Reconsider Example 3.1. If a sample of three employees from this company is $\{32, 39, 57\}$, the sample mean is

$$\bar{x} = \frac{32 + 39 + 57}{3} = 42.67 \text{ years}$$

If instead the sample of three employees is $\{53, 27, 44\}$, the sample mean is

$$\bar{x} = \frac{53 + 27 + 44}{3} = 41.33 \text{ years}$$

Consequently, the value of the population mean is constant, whereas the value of the sample mean varies from sample to sample.
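The calculations in Example 3.1 can be checked with a few lines of Python. This is only a sketch; the helper name `mean` is an assumption made for illustration and is not part of the notes.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

# Ages of all eight employees (the population in Example 3.1).
ages = [53, 32, 61, 27, 39, 44, 49, 57]
population_mean = mean(ages)

# Two different samples of three employees give two different sample means.
sample_a = [32, 39, 57]
sample_b = [53, 27, 44]

print(population_mean)            # 45.25
print(round(mean(sample_a), 2))   # 42.67
print(round(mean(sample_b), 2))   # 41.33
```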


Note: 1. The mean does not necessarily equal one of the values in the original data.
2. The mean is influenced by extreme values.
3. The mean is not a suitable measure for a data set that contains extreme values.


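For ungrouped data in the form of a frequency distribution of single-valued classes, with distinct values $x_1, x_2, \ldots, x_k$ and frequencies $f_1, f_2, \ldots, f_k$,

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{n} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i = \frac{\sum f_i x_i}{\sum f_i}$$

where $n = \sum f_i$ is the total number of observations.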



Example 3.2
Find the mean of the following frequency distribution.

$x_i$    2    5    6    8
$f_i$    1    3    4    2

Solution:

$x_i$        2    5    6    8
$f_i$        1    3    4    2
$f_i x_i$    2   15   24   16

$$\bar{x} = \frac{2 + 15 + 24 + 16}{10} = 5.7$$
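The same weighted calculation can be sketched in Python. The function name `frequency_mean` is a hypothetical helper introduced here for illustration.

```python
def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum of f*x divided by sum of f."""
    total = sum(freqs)  # total number of observations
    return sum(f * x for x, f in zip(values, freqs)) / total

x = [2, 5, 6, 8]   # distinct values from Example 3.2
f = [1, 3, 4, 2]   # their frequencies

print(frequency_mean(x, f))  # 5.7
```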


Some properties for the mean

For a data set $\{x_1, x_2, \ldots, x_n\}$ with mean $\bar{x}$, if each number in that data set
1. is increased by a constant $b$, then the new mean is $\bar{x} + b$.
2. is multiplied by a constant $a$, then the new mean is $a\bar{x}$.
3. is multiplied by a constant $a$ and then increased by a constant $b$, so that $y_i = a x_i + b$ for $i = 1, 2, \ldots, n$, then the new mean is $\bar{y} = a\bar{x} + b$.
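A short derivation shows why the third property holds. It uses only the definition of the sample mean; properties 1 and 2 are the special cases $a = 1$ and $b = 0$.

```latex
\bar{y}
  = \frac{1}{n}\sum_{i=1}^{n} y_i
  = \frac{1}{n}\sum_{i=1}^{n} (a x_i + b)
  = a \cdot \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{1}{n}\cdot nb
  = a\bar{x} + b
```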


Outliers or extreme values
Values that are very small or very large relative to the majority of the values in a data set.


Example 3.3 (effect of an outlier on the mean)
Given the values 5610 3243 609 1187 32268
Clearly, 32268 is an outlier.
Mean = (5610 + 3243 + 609 + 1187 + 32268) / 5 = 8583.4

If we exclude the value 32268,
Mean = (5610 + 3243 + 609 + 1187) / 4 = 2662.25


Median
The median is the value of the middle term in a data set that has been ranked in increasing or decreasing order.

Median = the value of the $\left(\frac{n+1}{2}\right)$th term in a ranked data set, where $n$ = total number of observations

Note:
1. If n is odd, then the median is the value of the middle term in the ranked data.
2. If n is even, then the median is the average of the values of the two middle terms.

Example 3.4
Find the median of set A = { 10, 5, 19, 8, 3 } and set B = { 2, 7, 3, 6, 4, 5 }

Solution:
Rearrange the data in ascending order:
A = $\{3, 5, 8, 10, 19\}$        B = $\{2, 3, 4, 5, 6, 7\}$

Median of A = 8                  Median of B = $\frac{4 + 5}{2} = 4.5$
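The odd and even cases of the median rule can be sketched as follows. The function name `median` is chosen here for illustration.

```python
def median(values):
    """Median of ungrouped data: middle term, or average of the two middle terms."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th term is the single middle term
        return ranked[mid]
    # n even: average the two middle terms
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median([10, 5, 19, 8, 3]))    # 8
print(median([2, 7, 3, 6, 4, 5]))   # 4.5
```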



Note: 1. The median is not influenced by extreme values.
2. Extreme values are values that are very small or very large relative to the
majority of the values in a data set.

For ungrouped data in the form of a frequency distribution of single-valued classes,
the median can be found either from the ungrouped frequency distribution or from the
cumulative frequency distribution.

Example 3.5
Find the median of the following frequency distribution.

No. of children 0 1 2 3 4 5
Frequency 3 5 12 9 4 2

Solution:
Construct a cumulative frequency distribution

No. of children <0 <1 <2 <3 <4 <5 <6
Cumulative frequency 0 3 8 20 29 33 35
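One way to read the median off the cumulative frequencies is sketched below. Here n = 35 is odd, so the median is the 18th ranked observation; the cumulative frequency first reaches 18 in the class "2 children". The function name `frequency_median` is a hypothetical helper, not part of the notes.

```python
from itertools import accumulate

def frequency_median(values, freqs):
    """Median of a frequency distribution whose values are in ascending order."""
    n = sum(freqs)
    cumulative = list(accumulate(freqs))  # running totals of the frequencies

    def term(position):
        # Value of the position-th ranked observation (1-based).
        for value, cum in zip(values, cumulative):
            if position <= cum:
                return value
        raise ValueError("position exceeds the number of observations")

    if n % 2 == 1:
        return term((n + 1) // 2)
    return (term(n // 2) + term(n // 2 + 1)) / 2

children = [0, 1, 2, 3, 4, 5]
frequency = [3, 5, 12, 9, 4, 2]

print(frequency_median(children, frequency))  # 2
```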











Mode
the value that occurs with the highest frequency in a data set

Example 3.6
Find the mode of each of the following data sets.
i) 74, 9, 5, 8, 3, 8, 8 ii) 2, 6, 6, 6, 3, 8, 8, 8, 3
iii) 2, 2, 6, 6, 8, 8, 9, 9 iv) B, C, D, A, A, C, C, C, B, A

Solution:







Note: 1. The mode is not influenced by extreme values.
2. A data set may have no mode, one mode (unimodal), two modes (bimodal) or more
than two modes (multimodal).
3. The mode can be used for both quantitative and qualitative data.
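The notes above can be illustrated with the data sets of Example 3.6. The sketch below assumes one common convention, namely that a data set in which every distinct value occurs equally often has no mode; the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(data):
    """Return the list of modes; an empty list means the data set has no mode."""
    counts = Counter(data)
    highest = max(counts.values())
    # Assumed convention: if all distinct values are equally frequent, no mode exists.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return [value for value, c in counts.items() if c == highest]

print(modes([74, 9, 5, 8, 3, 8, 8]))        # [8]       unimodal
print(modes([2, 6, 6, 6, 3, 8, 8, 8, 3]))   # [6, 8]    bimodal
print(modes([2, 2, 6, 6, 8, 8, 9, 9]))      # []        no mode
print(modes(list("BCDAACCCBA")))            # ['C']     qualitative data
```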


Example 3.7
Find the mode of the following frequency distribution.

No. of children 0 1 2 3 4 5
Frequency 3 5 12 9 4 2

Solution:























3.2 Measures of Central Tendency for grouped data
Mean

Suppose data are grouped into k class intervals, and

$f_i$ = the frequency of class i
$m_i$ = the midpoint of class i
$N = \sum_{i=1}^{k} f_i$ = population size
$n = \sum_{i=1}^{k} f_i$ = sample size

Mean for population data:
$$\mu = \frac{\sum f_i m_i}{N}$$

Mean for sample data:
$$\bar{x} = \frac{\sum f_i m_i}{n}$$

Example 3.8
Find the mean of the following frequency distribution.

MAA161 Scores         10-12   13-15   16-18   19-21
Number of students        4      12      20      14


Solution Ex. 3.8

Class interval            10-12   13-15   16-18   19-21
Class midpoint ($m_i$)       11      14      17      20
Frequency ($f_i$)             4      12      20      14
$f_i m_i$                    44     168     340     280

If the data represent a population:
$$\mu = \frac{\sum f_i m_i}{N} = \frac{832}{50} = 16.64$$

If the data represent a sample:
$$\bar{x} = \frac{\sum f_i m_i}{n} = \frac{832}{50} = 16.64$$
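The grouped-data mean of Example 3.8 can be reproduced with a short Python sketch. The variable names are illustrative only.

```python
# Class limits and frequencies from Example 3.8.
classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
freqs = [4, 12, 20, 14]

# The midpoint of each class is the average of its two limits.
midpoints = [(low + high) / 2 for low, high in classes]

total = sum(freqs)  # 50 students
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / total

print(midpoints)      # [11.0, 14.0, 17.0, 20.0]
print(grouped_mean)   # 16.64
```

Because every observation in a class is represented by the class midpoint, the result is an approximation to the mean of the raw scores.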














Relationships among the mean, median and mode

1. Symmetric histogram and frequency curve with one peak.
The values of the mean, median and mode are identical and lie at the center of the
distribution.




2. Histogram and frequency curve skewed to the right.
The value of the mean is the largest, mode is the smallest and median lies between
these two.



3. Histogram and frequency curve skewed to the left.
The value of the mean is the smallest, mode is the largest and median lies between
these two.


3.3 Measures of dispersion

3.3.1 Measures of dispersion for ungrouped data

Measures of central tendency alone are sometimes not enough to reveal the whole
picture of the distribution of a data set.
A measure of central tendency does not describe how the data are spread out.

Data set Data Mean Median Mode
A 1, 3, 6, 10, 10, 21, 26 11 10 10
B 7, 8, 10, 10, 10, 15, 17 11 10 10
The mean, median and mode are the same for data sets A and B, but the data are
distributed differently: the values in A are more spread out than those in B.


Range
The range for a data set $\{x_1, x_2, \ldots, x_n\}$ is defined to be the difference between the largest
value and the smallest value.

Range = largest value - smallest value


Example 3.9: Find the range for data set A and data set B.
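As a quick check, the range of each data set can be computed directly from the definition. The function name `data_range` is a hypothetical helper.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

set_a = [1, 3, 6, 10, 10, 21, 26]
set_b = [7, 8, 10, 10, 10, 15, 17]

print(data_range(set_a))  # 25
print(data_range(set_b))  # 10
```

The two sets share the same mean, median and mode, but the larger range of A reflects its wider spread.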





Variance

The variance is the average of the squared deviation of the data from the mean


Consider a population of N measurements $x_1, x_2, \ldots, x_N$.

Population Mean = $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

Population Variance =
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N}\left[\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2\right]$$

Consider a sample of n measurements $x_1, x_2, \ldots, x_n$.

Sample Mean = $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Unbiased Sample Variance =
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$





Standard Deviation

The standard deviation is the positive square root of the variance.
Sample standard deviation = $s = \sqrt{s^2}$
Population standard deviation = $\sigma = \sqrt{\sigma^2}$

Note:
A small standard deviation means that the data are distributed closely to their mean
A large standard deviation means that the data are widely scattered about their mean
It is influenced by extreme values





Example 3.10
Data shows the salary per day for all 6 employees of a small company.
29.50, 16.50, 35.40, 21.30, 49.70, 24.60
Calculate the variance and standard deviation for these data.

Solution:

Method 1
Mean, $\mu$ = 29.5

$x_i$     $x_i - \mu$   $(x_i - \mu)^2$   $x_i^2$
29.50        0.00            0.00          870.25
16.50      -13.00          169.00          272.25
35.40        5.90           34.81         1253.16
21.30       -8.20           67.24          453.69
49.70       20.20          408.04         2470.09
24.60       -4.90           24.01          605.16

$\sum x_i = 177$      $\sum (x_i - \mu)^2 = 703.1$      $\sum x_i^2 = 5924.6$

Population variance =
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{6}(703.1) = 117.18$$

Population standard deviation = $\sigma = \sqrt{117.18} \approx 10.8$

Method 2

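$\sum x_i^2 = 5924.6$

Population variance =
$$\sigma^2 = \frac{1}{N}\left[\sum_{i=1}^{N} x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{N} x_i\right)^2\right] = \frac{1}{6}\left[5924.6 - \frac{(177)^2}{6}\right] = 117.18$$

Population standard deviation = $\sigma = \sqrt{117.18} \approx 10.8$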
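Both methods of Example 3.10 can be compared in a short Python sketch. The variable names are illustrative; the two formulas should agree up to floating-point rounding.

```python
import math

# Daily salaries of all 6 employees (a population).
salaries = [29.50, 16.50, 35.40, 21.30, 49.70, 24.60]
N = len(salaries)
mu = sum(salaries) / N

# Method 1: average of the squared deviations from the mean.
var_method1 = sum((x - mu) ** 2 for x in salaries) / N

# Method 2: short-cut formula using the sum of x and the sum of x squared.
sum_x = sum(salaries)
sum_x2 = sum(x * x for x in salaries)
var_method2 = (sum_x2 - sum_x ** 2 / N) / N

sigma = math.sqrt(var_method1)

print(round(var_method1, 2))  # 117.18
print(round(var_method2, 2))  # 117.18
print(round(sigma, 1))        # 10.8
```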



Example 3.13
A sample consists of 5 data values: 72, 49, 79, 55 and 57. Calculate the variance and standard
deviation.

Solution
$n = 5$, $\sum x_i = 312$, $\sum x_i^2 = 20100$

Sample variance =
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right] = \frac{1}{4}\left[20100 - \frac{(312)^2}{5}\right] = 157.8$$

Sample standard deviation = $s = \sqrt{157.8} = 12.56$
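The sample calculation can be cross-checked against Python's standard `statistics` module, whose `variance` and `stdev` functions use the same n - 1 divisor. This is a sketch for verification only.

```python
import math
import statistics

sample = [72, 49, 79, 55, 57]
n = len(sample)

# Short-cut formula with the n - 1 divisor.
s2 = (sum(x * x for x in sample) - sum(sample) ** 2 / n) / (n - 1)
s = math.sqrt(s2)

print(round(s2, 1))  # 157.8
print(round(s, 2))   # 12.56

# The standard library gives the same sample variance and standard deviation.
print(math.isclose(s2, statistics.variance(sample)))  # True
print(math.isclose(s, statistics.stdev(sample)))      # True
```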




3.3.2 Measures of dispersion for grouped data

Variance
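Population Variance =
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{k} f_i (m_i - \mu)^2 = \frac{1}{N}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{1}{N}\left(\sum_{i=1}^{k} f_i m_i\right)^2\right]$$

Sample Variance =
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i (m_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} f_i m_i\right)^2\right]$$

where k is the number of classes.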

Example 3.14
Find the variance from the following frequency distribution if it represents
a) a population b) a sample

Height (m)   20-22   23-25   26-28   29-31   32-34
Frequency        3       6      12       9       2

Solution:

Height   Midpoint, m   Frequency, f   $mf$   $m^2 f$
20-22        21              3          63     1323
23-25        24              6         144     3456
26-28        27             12         324     8748
29-31        30              9         270     8100
32-34        33              2          66     2178
Total:                      32         867    23805

$\sum f_i m_i = 867$, $\sum f_i m_i^2 = 23805$

a) $$\sigma^2 = \frac{1}{N}\left[\sum f_i m_i^2 - \frac{1}{N}\left(\sum f_i m_i\right)^2\right] = \frac{1}{32}\left[23805 - \frac{(867)^2}{32}\right] = 9.83$$

b) $$s^2 = \frac{1}{n-1}\left[\sum f_i m_i^2 - \frac{1}{n}\left(\sum f_i m_i\right)^2\right] = \frac{1}{31}\left[23805 - \frac{(867)^2}{32}\right] = 10.15$$
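The grouped-data variances of Example 3.14 can be reproduced with the short-cut formulas. The sketch below uses illustrative variable names and the class midpoints as stand-ins for the raw observations.

```python
# Class midpoints and frequencies from Example 3.14.
midpoints = [21, 24, 27, 30, 33]
freqs = [3, 6, 12, 9, 2]

n = sum(freqs)                                            # 32
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))     # 867
sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))  # 23805

# Bracketed term shared by the population and sample formulas.
corrected = sum_fm2 - sum_fm ** 2 / n

population_variance = corrected / n        # divisor N
sample_variance = corrected / (n - 1)      # divisor n - 1

print(n, sum_fm, sum_fm2)              # 32 867 23805
print(round(population_variance, 2))   # 9.83
print(round(sample_variance, 2))       # 10.15
```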


Use of Standard Deviation

We can find the proportion or percentage of the total observations that fall within a given
interval about the mean.


Chebyshev's Theorem
For any number k greater than 1, at least $\left(1 - \frac{1}{k^2}\right)$ of the data values lie within k
standard deviations of the mean.


Example 3.15
The average systolic blood pressure for 4000 women who were screened for high blood
pressure was found to be 187 with a standard deviation of 22. Using Chebyshev's theorem,
find at least what percentage of women in this group have a systolic pressure between 143
and 231.

Solution

The interval from 143 to 231 is centred at the mean 187 and extends 44 units on each side (143 = 187 - 44 and 231 = 187 + 44).

$$k = \frac{44}{22} = 2$$
Thus,
$$1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 0.75$$
or 75%
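The bound from Chebyshev's theorem is easy to evaluate for any k. The sketch below uses a hypothetical helper `chebyshev_lower_bound` and assumes the interval is symmetric about the mean, as in this example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

mean_bp, sd_bp = 187, 22
low, high = 143, 231

# Number of standard deviations from the mean to either end of the interval.
k = (high - mean_bp) / sd_bp

print(k)                         # 2.0
print(chebyshev_lower_bound(k))  # 0.75
print(chebyshev_lower_bound(3))  # 0.888..., i.e. at least about 89%
```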

Empirical Rule
For a bell-shaped distribution, approximately
- 68% of the observations lie within one standard deviation of the mean.
- 95% of the observations lie within two standard deviations of the mean.
- 99.7% of the observations lie within three standard deviations of the mean.


Example 3.16
The age distribution of a sample of 500 persons is bell-shaped with a mean of 40 years
and a standard deviation of 12 years. Determine the approximate percentage of people
who are 16 to 64 years old.

Solution
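$\bar{x} = 40$ years
$s = 12$ years

16 = 40 - 2(12) and 64 = 40 + 2(12), so the interval from 16 to 64 covers $k = 2$ standard deviations on each side of the mean.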

Thus, approximately 95% of the people in the sample are 16 to 64 years old.
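The three percentages of the empirical rule match the areas under a normal curve, which is the idealized bell-shaped distribution. The sketch below assumes an exactly normal distribution and uses `statistics.NormalDist` from the Python standard library; real bell-shaped data will only follow these figures approximately.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def coverage(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(coverage(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```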


3.4 Measures of position

Determine the position of a single value in relation to other values in a sample or a
population data set.

Quartiles
Quartiles are 3 summary measures that divide a ranked data set into 4 equal parts.
- The second quartile ($Q_2$) is the median of a data set.
- The first quartile ($Q_1$) is the value of the middle term among the observations that are less
than the median.
- The third quartile ($Q_3$) is the value of the middle term among the observations that are
greater than the median.

Each of the four parts contains 25% of the ranked observations:

    25%    |    25%    |    25%    |    25%
         $Q_1$       $Q_2$       $Q_3$


To Find The Quartiles of Ungrouped Data

Consider n items arranged in ascending order. Then,

The first quartile = Lower quartile = $Q_1$ = $\frac{1}{4}(n+1)$th value
The second quartile = Median = $Q_2$ = $\frac{1}{2}(n+1)$th value
The third quartile = Upper quartile = $Q_3$ = $\frac{3}{4}(n+1)$th value

When n is odd, the rule locates the exact position of the quartiles.
When n is even,
(a) if $\frac{n}{2}$ is even, then replace the decimal part of the $\frac{1}{4}(n+1)$th or
$\frac{3}{4}(n+1)$th position with 0.5, for example:
2.3 → 2.5
7.9 → 7.5

(b) if $\frac{n}{2}$ is odd, then round up the $\frac{1}{4}(n+1)$th position when its decimal part
is greater than 0.5, for example:
3.75 → 4
If the decimal part of the $\frac{3}{4}(n+1)$th position is smaller than 0.5, then round down, for
example:
2.25 → 2

Interquartile Range(IQR)

$$IQR = Q_3 - Q_1$$

The semi-interquartile range = The quartile deviation = $(Q_3 - Q_1)/2$


Example 3.17
The following are the ages of nine employees of an insurance company:
47 28 39 51 33 37 59 24 33
a) Find the values of the three quartiles. Where does the age of 28 fall in relation to the ages
of these employees?
b) Find the interquartile range.

Solution

Ranked data: 24 28 33 33 37 39 47 51 59

$Q_1 = \frac{28 + 33}{2} = 30.5$        $Q_2 = 37$        $Q_3 = \frac{47 + 51}{2} = 49$
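The positional rules for quartiles, including the rounding cases (a) and (b) for even n, can be sketched in Python. The function name `quartiles` is hypothetical, a position ending in .5 is taken as the average of the two neighbouring terms (as in the solution above), and at least two observations are assumed.

```python
def quartiles(data):
    """Return (Q1, Q2, Q3) using the (n + 1) position rules of these notes."""
    ranked = sorted(data)
    n = len(ranked)

    def term(q):
        # Position of the q-th quartile is q(n + 1)/4 = whole + rem/4.
        whole, rem = divmod(q * (n + 1), 4)
        if rem in (1, 3):  # position ends in .25 or .75 (only possible for even n)
            if n % 4 == 0:
                rem = 2                      # rule (a): replace the decimal part with .5
            else:
                whole += 1 if rem == 3 else 0  # rule (b): .75 rounds up, .25 rounds down
                rem = 0
        if rem == 0:
            return ranked[whole - 1]         # exact term (1-based position)
        return (ranked[whole - 1] + ranked[whole]) / 2  # halfway between two terms

    return term(1), term(2), term(3)

ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]
q1, q2, q3 = quartiles(ages)
print(q1, q2, q3)   # 30.5 37 49.0
print(q3 - q1)      # 18.5, the interquartile range
```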



Percentiles
The (approximate) value of the kth percentile, denoted by $P_k$, is

$P_k$ = value of the $\left(\frac{kn}{100}\right)$th term in a ranked data set

where k denotes the number of the percentile and n represents the sample size.



Example 3.18
For the data in Ex 3.17,
a) Find the interquartile range.
b) Find the value of the 62nd percentile.

Solution

The position of the 62nd percentile is
$$\frac{kn}{100} = \frac{62(9)}{100} = 5.58 \approx 5.6\text{th term}$$

The value of the 5.6th term can be approximated by the value of the sixth term in the ranked data.

$P_{62}$ = 62nd percentile = 39
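The percentile rule can be sketched as below. The notes approximate the 5.6th term by the sixth term; the sketch assumes this means rounding the position to the nearest whole term, which is an interpretation rather than something the notes state. The function name `percentile` is hypothetical.

```python
import math

def percentile(data, k):
    """Approximate kth percentile: value of the (kn/100)th ranked term."""
    ranked = sorted(data)
    position = k * len(ranked) / 100
    # Assumption: round the position to the nearest whole term (at least the 1st).
    index = max(1, math.floor(position + 0.5))
    return ranked[index - 1]

ages = [47, 28, 39, 51, 33, 37, 59, 24, 33]
print(62 * len(ages) / 100)   # 5.58
print(percentile(ages, 62))   # 39
```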








Box-and-whisker plot

a plot that shows the center, the spread, and the skewness of a data set.
Also helps to detect outliers.

Procedures:

Step 1: Rank the data in increasing order and calculate $Q_1$, $Q_2$, $Q_3$ and $IQR = Q_3 - Q_1$.
Step 2: Find
Lower inner fence (LIF) = $Q_1 - 1.5 \times IQR$
Upper inner fence (UIF) = $Q_3 + 1.5 \times IQR$
Step 3: Determine the smallest and the largest values in the given data set within the two
inner fences.
Step 4: Draw a box with its left side at the position of $Q_1$ and the right side at the position of
$Q_3$. Inside the box, draw a vertical line at the position of $Q_2$.
Step 5: Draw two lines called whiskers joining the box to the points of the smallest and the
largest values within the two inner fences found in Step 2. A value that falls outside
the two inner fences is shown by marking an asterisk and is called an outlier.


Example 3.19
The following data are the incomes ( in thousands of dollars) for a sample of 12 households:
35 29 44 72 34 64 41 50 54 104 39 58

Solution

Step 1
Ranked data: 29 34 35 39 41 44 50 54 58 64 72 104

For these data,
Median = (44 + 50)/2 = 47
$Q_1 = (35 + 39)/2 = 37$
$Q_3 = (58 + 64)/2 = 61$
$IQR = Q_3 - Q_1 = 61 - 37 = 24$

Step 2

Lower inner fence (LIF) = $Q_1 - 1.5 \times IQR = 37 - 1.5 \times 24 = 1$

Upper inner fence (UIF) = $Q_3 + 1.5 \times IQR = 61 + 1.5 \times 24 = 97$
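The fences from Step 2, and the values needed in Steps 3 and 5 of the procedure, can be sketched in Python. The quartiles are taken from Step 1 above; the variable names are illustrative.

```python
incomes = [35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58]

q1, q3 = 37, 61          # quartiles found in Step 1
iqr = q3 - q1            # 24

lif = q1 - 1.5 * iqr     # lower inner fence
uif = q3 + 1.5 * iqr     # upper inner fence

# Step 3: smallest and largest values that lie within the two inner fences.
inside = [x for x in incomes if lif <= x <= uif]
whisker_low, whisker_high = min(inside), max(inside)

# Step 5: values outside the inner fences are marked as outliers.
outliers = [x for x in incomes if x < lif or x > uif]

print(lif, uif)                   # 1.0 97.0
print(whisker_low, whisker_high)  # 29 72
print(outliers)                   # [104]
```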
