
TEACHING OBJECTIVES TO INTRODUCE THE THREE COMMON AVERAGES - MEAN, MEDIAN AND MODE - AND SHOW HOW TO CALCULATE THEM
MEASURES OF CENTRAL TENDENCY

CONTENTS

1. INTRODUCTION
2. THE ARITHMETIC MEAN
   2.1 The weighted mean
3. THE MEDIAN
4. THE MODE
5. COMPARISON OF THE ARITHMETIC MEAN, MEDIAN AND MODE
6. QUESTIONS FOR SELF-EVALUATION

Compiled by L E Greyling Technikon SA 1994

1. INTRODUCTION

Although the histogram and frequency polygon of a data set are not smooth curves, the general or basic shape of the curve may be represented by a smooth or continuous curve, and we call this the data distribution. There are four types of descriptive measures of a data distribution:

1. Descriptive measures of locality: A measure of locality gives an indication of the "midpoint" of the distribution. We also speak of a measure of central tendency.
2. Descriptive measures of dispersion: These measure the inherent variation of the observations, in other words how concentrated the values are.
3. Descriptive measures of symmetry: A distribution is symmetric if a perpendicular line can divide it into two halves which are mirror images of each other.
4. Descriptive measures of kurtosis: Kurtosis is a measure of the peakedness of a distribution.

In the previous unit we summarized data using tables and charts. In the next two units we will look at various ways to summarize a data set numerically, using the first two types of measure listed above: measures of location and measures of variation (dispersion). In this unit we discuss descriptive measures of location.

2. THE ARITHMETIC MEAN

The mean, commonly known as the average, of a set of data is calculated by adding all the values (or observations) and dividing this sum by the number of values.

$$\text{mean} = \frac{\text{sum of the values}}{\text{total number of values}}$$

EXAMPLE: Calculate the mean of the following values: 3 2 4 4 9 7 8 5 2 3 7

___________________________________________________________________________ CAS161Z -2MODULE 5: UNIT 3

SOLUTION:

$$\text{mean} = \frac{3+2+4+4+9+7+8+5+2+3+7}{11} = \frac{54}{11} = 4{,}9$$
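The same calculation can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and Python writes decimals with a point rather than the decimal comma used in this unit.

```python
from statistics import mean

# The eleven values from the example above.
values = [3, 2, 4, 4, 9, 7, 8, 5, 2, 3, 7]

total = sum(values)        # sum of the values
n = len(values)            # total number of values
mean_by_hand = total / n   # the definition of the mean

print(total, n)                 # 54 11
print(round(mean_by_hand, 1))   # 4.9

# The standard library gives the same result.
print(round(mean(values), 1))   # 4.9
```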

Instead of writing out the definition in words we will use a shorter notation. The values of n items are represented by $x_1, x_2, x_3, \dots, x_n$. In general, the value of the ith item is denoted by $x_i$, where $i = 1, 2, 3, \dots, n$. The Greek letter $\Sigma$ (sigma) is used to denote "the sum of". The sum of $x_1$ and $x_2$ and $x_3$ and so on up to and including $x_n$ is denoted by $\sum_{i=1}^{n} x_i$, that is

$$\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \dots + x_n$$

We will assume that the summation is always performed over all values of $x_i$ and will therefore write only $\sum x_i$. The mean can be calculated for a population and a sample using either the raw data or a frequency distribution.

Using the raw data:

Population mean: $$\mu = \frac{\sum x_i}{N}, \quad i = 1, 2, 3, \dots, N \qquad (1)$$

Sample mean: $$\bar{x} = \frac{\sum x_i}{n}, \quad i = 1, 2, 3, \dots, n \qquad (2)$$

where
$x_i$ represents the value of the ith item in the population or sample,
$N$ represents the number of items in the population,
$n$ represents the number of items in the sample,
$\mu$ represents the population mean,
$\bar{x}$ represents the sample mean.

Ideally the sample mean should equal the population mean; in practice it serves as an estimate of it.

Using a frequency distribution:

Sample mean: $$\bar{x} = \frac{\sum f_i x_i}{n}, \quad i = 1, 2, 3, \dots, k \qquad (3)$$

where
$x_i$ represents the midpoint of the ith class,
$f_i$ represents the frequency of the ith class,
$k$ represents the number of classes,
$n$ represents the number of items in the sample, and $n = \sum f_i$.

The midpoint of a class can be calculated using the following formula:

$$\text{Midpoint of class} = \frac{\text{lower class limit} + \text{upper class limit}}{2} \qquad (4)$$

Properties of the mean:

1. It can be calculated for any set of numerical data, so it always exists.
2. A set of numerical data has one and only one mean, so it is always unique.
3. It lends itself to further statistical treatment. For instance, the means of several sets of data can be combined into the overall mean of all the data.
4. It is relatively reliable in the sense that the means of many samples drawn from the same population usually do not fluctuate widely.
5. It takes into account every element of the set.

EXAMPLE:

Consider the following frequency distribution for the masses of potato tubers harvested from 50 small plots of potato plants:

Mass (kg)    Frequency
 5 - 15          5
15 - 25         10
25 - 35         20
35 - 45          5
45 - 55          5
55 - 65          5

SOLUTION: We are required to compute the arithmetic mean. We do this in tabular form:

xi        fi      fixi
10         5        50
20        10       200
30        20       600
40         5       200
50         5       250
60         5       300
Totals    50      1600

The arithmetic mean is

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{1600}{50} = 32 \text{ kg}$$
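The tabular calculation can be sketched in Python as follows. The function name `grouped_mean` and the way the classes are stored are assumptions made for illustration; the midpoints follow formula (4) and the mean follows formula (3).

```python
def grouped_mean(classes, frequencies):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i), with x_i the class midpoint."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    n = sum(frequencies)
    weighted_sum = sum(f * x for f, x in zip(frequencies, midpoints))
    return weighted_sum / n

# Class limits and frequencies from the potato example.
classes = [(5, 15), (15, 25), (25, 35), (35, 45), (45, 55), (55, 65)]
frequencies = [5, 10, 20, 5, 5, 5]

print(grouped_mean(classes, frequencies))  # 32.0
```

Note that the result is only an approximation of the mean of the raw observations, because every observation in a class is treated as if it lay at the class midpoint.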

2.1 THE WEIGHTED ARITHMETIC MEAN

In our discussion we have so far assumed that each item is of equal importance. We say that the weight of each item is the same. In some situations, some items are more important than others and we take this into account when we use the weighted mean. The weighted mean is calculated as follows:

$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} \qquad (5)$$

where $w_i$ represents the weight (relative importance) of the ith item and $x_i$ represents the ith item.
EXAMPLE:

A person who has just completed a long trip by car wants to calculate the average price he paid for fuel. The amount of fuel bought and the price he paid at every fuel station where he filled up are as follows:

30 litres at 108,4 cents per litre
80 litres at 100,8 cents per litre
55 litres at 98,0 cents per litre

SOLUTION:

$$\text{Average price per litre} = \frac{(30)(108{,}4) + (80)(100{,}8) + (55)(98{,}0)}{30 + 80 + 55} = \frac{16706}{165} = 101{,}2 \text{ cents per litre}$$
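Formula (5) translates directly into code. The sketch below uses a hypothetical helper `weighted_mean`, with the litres bought acting as the weights and the prices as the items.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Fuel example: price in cents per litre, weighted by litres bought.
prices = [108.4, 100.8, 98.0]
litres = [30, 80, 55]

average_price = weighted_mean(prices, litres)
print(round(average_price, 1))  # 101.2
```

For comparison, the unweighted mean of the three prices is about 102,4 cents, which overstates the price actually paid because the most expensive fill-up was also the smallest.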

3. THE MEDIAN

The median is the value of the middle term when the data are placed in order from smallest to largest. If there is an odd number of items then there will be a middle term, which is the median. If there is an even number of items then there will not be a middle term. In this case the median is the mean of the two middle values when the data are placed in order from smallest to largest. Placing the items in order from smallest to largest is known as ranking the data.

EXAMPLE:

Calculate the median of the following set: 9; 15,0; 25,6; 27,5; 31,0; 35,0

SOLUTION: The set has six elements, so the median has a value exactly halfway between the third and fourth elements of the set:

$$me = \frac{25{,}6 + 27{,}5}{2} = 26{,}55$$
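The ranking procedure described in this section can be sketched in Python. The function name `median_of` is illustrative; it ranks the data and then handles the odd and even cases separately.

```python
def median_of(values):
    """Median of raw data: rank the values, then take the middle term
    (odd count) or the mean of the two middle terms (even count)."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2

# Even number of items: the six values from the example above.
print(round(median_of([9, 15.0, 25.6, 27.5, 31.0, 35.0]), 2))  # 26.55

# Odd number of items: the eleven values used for the mean in section 2.
print(median_of([3, 2, 4, 4, 9, 7, 8, 5, 2, 3, 7]))  # 4
```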
3.1 CALCULATING THE MEDIAN FOR GROUPED DATA

The median value of grouped data is computed by means of a formula. Compute $\frac{n}{2}$, and find the median class interval, that is, the first class interval of which the "less than" cumulative frequency is greater than $\frac{n}{2}$. Then

$$me = L + c \, \frac{\frac{n}{2} - n_L}{f_{me}}$$

where
$L$ = lower limit of the median interval
$c$ = length of the median interval
$n_L$ = "less than" cumulative frequency at $L$ (i.e. of the previous class interval)
$f_{me}$ = frequency of the median class interval
$n$ = the total number of observations

EXAMPLE: The amount of water consumed by 82 animals is as follows:

Class interval (litres)   Frequency   "Less than" cumulative frequency
3,45 - 3,95                    6            6
3,95 - 4,45                   17           23
4,45 - 4,95                   22           45
4,95 - 5,45                   15           60
5,45 - 5,95                   14           74
5,95 - 6,45                    6           80
6,45 - 6,95                    2           82
Total                         82

The median is the $\frac{82}{2} = 41$st observation, which is in the interval 4,45 - 4,95. Thus we have

$L = 4{,}45$, $n_L = 23$, $f_{me} = 22$, $n = 82$, $c = 0{,}5$

$$me = 4{,}45 + 0{,}5 \times \frac{41 - 23}{22} = 4{,}86 \text{ litres}$$

We could also do this graphically, using the cumulative frequency polygon. This is shown in the figure below:

FIGURE 1: FINDING THE MEDIAN OF A FREQUENCY DISTRIBUTION

As can be seen from the figure it is not necessary to draw the whole cumulative frequency polygon in order to read off the median. One needs to draw only the part around the median, in which case the median can be read off more accurately.

EXERCISE: Compute the median of the following frequency distribution:

Class interval (g)   Frequency
10 - 15                   5
15 - 20                  14
20 - 25                  20
25 - 30                  12
30 - 35                   5
35 - 40                   2

SOLUTION: 22,5 g

Properties of the median:

1. The median can be calculated for any set of data and for any given set of data the median is unique.
2. Unlike the mean, the median is not so easily affected by extreme values. Thus if extreme values occur in a set of data the median is a better measure of locality than the mean. For example, consider the set of values: 26,5; 18,4; 19,5; 20,4; 25,6; 30,0; 53,0; 68,5; 102,0. Here we have $\bar{x} = 38{,}8$ and $me = 26{,}05$. We see that the extreme values 53,0; 68,5; and 102,0 have a marked effect on the mean.
3. The median, unlike the mean, can also be used to define the middle of a number of objects whose properties or qualities can be ranked. For instance we can rank a number of tasks according to their degree of difficulty and find the median of these tasks.
4. Perhaps the most important difference between the mean and the median is that the mean is very useful for further statistical calculations.
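The grouped-data formula of section 3.1 can be sketched in Python as follows. The function name `grouped_median` and its arguments are assumptions for illustration, and the sketch assumes contiguous classes of equal width. It checks both the water-consumption example and the exercise of section 3.1.

```python
def grouped_median(lower_limits, width, frequencies):
    """Median of grouped data: me = L + c * (n/2 - n_L) / f_me."""
    n = sum(frequencies)
    half = n / 2
    cumulative = 0  # "less than" cumulative frequency at the current lower limit (n_L)
    for lower, f in zip(lower_limits, frequencies):
        # The median class is the first class whose cumulative frequency
        # reaches n/2. A tie at exactly n/2 gives the class boundary, which
        # is the same value the next class would give.
        if f > 0 and cumulative + f >= half:
            return lower + width * (half - cumulative) / f
        cumulative += f
    raise ValueError("the frequency table is empty")

# Water consumption example (litres).
water = grouped_median([3.45, 3.95, 4.45, 4.95, 5.45, 5.95, 6.45], 0.5,
                       [6, 17, 22, 15, 14, 6, 2])
print(round(water, 2))  # 4.86

# Exercise (grams).
exercise = grouped_median([10, 15, 20, 25, 30, 35], 5, [5, 14, 20, 12, 5, 2])
print(exercise)  # 22.5
```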

4. THE MODE

It is defined as the value, category or class which occurs with the highest frequency.

EXAMPLE:

In a random sample of clients of a shoe shop the following distribution was found:

Size of shoe         245   255   260   262   265   270   275   280   290
Number of clients      2     4     5     6     8    13     6     4     2

Find the mode.

SOLUTION:

The mode of this set of values is 270 as its frequency is the highest.
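Finding the mode from a frequency table amounts to looking up the highest frequency. In the minimal sketch below, the helper name `modes` is illustrative; it returns a list because a data set may have more than one value with the highest frequency.

```python
def modes(counts):
    """Return every value whose frequency equals the highest frequency."""
    highest = max(counts.values())
    return [value for value, frequency in counts.items() if frequency == highest]

# Shoe sizes and number of clients from the example above.
shoe_counts = {245: 2, 255: 4, 260: 5, 262: 6, 265: 8,
               270: 13, 275: 6, 280: 4, 290: 2}

print(modes(shoe_counts))  # [270]
```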

4.1 CALCULATING THE MODE FOR GROUPED DATA

We now turn to the problem of computing the mode for grouped observations. This is best done by means of a formula:

$$mo = L + c \, \frac{f_m - f_{m-1}}{2 f_m - f_{m-1} - f_{m+1}}$$

where
$L$ = lower limit of the modal class interval
$c$ = length of the modal class interval
$f_m$ = frequency of the modal class interval
$f_{m-1}$ = the frequency of the class interval preceding the modal class interval
$f_{m+1}$ = the frequency of the class interval following the modal class interval
EXAMPLE: Consider the following frequency distribution:

Class interval (mm)   Frequency
100 - 110                  4
110 - 120                 13
120 - 130                 28
130 - 140                 39
140 - 150                  6
150 - 160                 10

The modal interval is 130 - 140, and we find

$L = 130$, $f_m = 39$, $f_{m-1} = 28$, $f_{m+1} = 6$, $c = 10$

$$mo = L + c \, \frac{f_m - f_{m-1}}{2 f_m - f_{m-1} - f_{m+1}} = 130 + 10 \times \frac{39 - 28}{78 - 28 - 6} = 130 + 10 \times \frac{11}{44} = 132{,}5 \text{ mm}$$
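As a quick check of the arithmetic, the formula can be written as a small Python function. The name `grouped_mode` and its parameter names are hypothetical; they simply mirror the symbols L, c, f_m, f_(m-1) and f_(m+1).

```python
def grouped_mode(lower, width, f_m, f_prev, f_next):
    """Mode of grouped data: mo = L + c * (f_m - f_prev) / (2*f_m - f_prev - f_next)."""
    return lower + width * (f_m - f_prev) / (2 * f_m - f_prev - f_next)

# Modal interval 130 - 140 from the example above.
print(grouped_mode(lower=130, width=10, f_m=39, f_prev=28, f_next=6))  # 132.5
```

The mode lies nearer the lower limit of the modal interval because the preceding class (frequency 28) is much fuller than the following one (frequency 6).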

Graphically the formula for computing the mode can be illustrated as follows:


FIGURE 2: COMPUTING THE MODE FROM THE HISTOGRAM OF GROUPED OBSERVATIONS

If there are two modal class intervals then one has to compute two separate modes. If the modal class intervals are consecutive intervals, the modes for the two intervals will be the same. This is illustrated by means of a further example.
EXAMPLE: Consider the following frequency distribution:

Class interval   Frequency
 6,5 -  7,5           2
 7,5 -  8,5           5
 8,5 -  9,5          13
 9,5 - 10,5          13
10,5 - 11,5           4
11,5 - 12,5           2
12,5 - 13,5           1

We see that there are two modal intervals, each with frequency 13, namely 8,5 - 9,5 and 9,5 - 10,5. Regarding the first interval, 8,5 - 9,5, as the modal interval we obtain:

$L = 8{,}5$, $c = 1$, $f_m = 13$, $f_{m-1} = 5$, $f_{m+1} = 13$

$$mo = 8{,}5 + 1 \times \frac{13 - 5}{26 - 5 - 13} = 9{,}5$$

If we regard the second interval, namely 9,5 - 10,5, as the modal interval, we obtain

$L = 9{,}5$, $c = 1$, $f_m = 13$, $f_{m-1} = 13$, $f_{m+1} = 4$

$$mo = 9{,}5 + 1 \times \frac{13 - 13}{26 - 13 - 4} = 9{,}5$$

which is the same answer.
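The case of tied modal intervals can be explored with a short Python sketch that applies the grouped-mode formula to every class holding the highest frequency. The function name `grouped_modes` is illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, not something stated in the text.

```python
def grouped_modes(lower_limits, width, frequencies):
    """Apply the grouped-mode formula to each class with the highest frequency."""
    highest = max(frequencies)
    results = []
    for m, f_m in enumerate(frequencies):
        if f_m != highest:
            continue
        # Assumption: a missing neighbour (first or last class) counts as 0.
        f_prev = frequencies[m - 1] if m > 0 else 0
        f_next = frequencies[m + 1] if m + 1 < len(frequencies) else 0
        mode = lower_limits[m] + width * (f_m - f_prev) / (2 * f_m - f_prev - f_next)
        results.append(mode)
    return results

lower_limits = [6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5]
frequencies = [2, 5, 13, 13, 4, 2, 1]

print(grouped_modes(lower_limits, 1, frequencies))  # [9.5, 9.5]
```

Both consecutive modal intervals give 9,5, the boundary they share, in agreement with the worked example.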

Properties of the mode:

1. It is easily calculated and it is not influenced by extreme values.
2. The mode has the advantage that it can be found for qualitative data as well as quantitative data.
3. The mode is best used in the case of discrete data as shown above.
4. However, it is not always uniquely defined and we may have a set of data with more than one mode.

5. COMPARISON OF THE ARITHMETIC MEAN, MEDIAN AND MODE

5.1 RELATIONSHIP

We have now discussed three measures of location: the arithmetic mean, the median and the mode. Which one of these three is preferable? Unfortunately an unambiguous choice is not possible. What is the difference between the three measures? Firstly we note that, for a symmetric frequency distribution, the three are identical. This is shown by an example.

EXAMPLE:

Class interval    fi    xi    fixi    Fi
 1 -  3            1     2      2      1
 3 -  5            3     4     12      4
 5 -  7            6     6     36     10
 7 -  9           10     8     80     20
 9 - 11            6    10     60     26
11 - 13            3    12     36     29
13 - 15            1    14     14     30
Total             30           240

Here Fi is the "less than" cumulative frequency. We see that

$$\bar{x} = \frac{240}{30} = 8$$

$$mo = 7 + 2 \times \frac{10 - 6}{20 - 6 - 6} = 8$$

$$me = 7 + 2 \times \frac{\frac{30}{2} - 10}{10} = 8$$

We say the frequency distribution is symmetric about 8. The symmetry is best seen by means of a histogram. Draw your own histogram from the data to confirm the symmetry.

As a general rule we have:

For a symmetric frequency distribution: $\bar{x} = mo = me$

For "mildly skew" frequency distributions the following relationship has been found to be approximately true:

$$\bar{x} - mo = 3(\bar{x} - me)$$

that is, the difference between the arithmetic mean and the mode is three times as large as the difference between the arithmetic mean and the median. The median is usually in the middle, closer to the arithmetic mean than to the mode.

FIGURE 4: RELATIONSHIP BETWEEN MEAN, MEDIAN AND MODE

5.2 ADVANTAGES AND DISADVANTAGES

Each of the three measures of location has certain advantages and disadvantages. The choice between the three will depend on practical considerations, and sometimes it may be beneficial to compute all three. One measure may sometimes be highly misleading, while the others may be revealing.

EXAMPLE: A small firm has 5 employees whose monthly salaries are as follows:

R100   R110   R120   R150   R220

The mean salary is $\text{R}\,\frac{700}{5} = \text{R}140$.
The median salary is R120.
The mode does not exist.

In order to impress the public, the owner defines himself to be an employee of the firm. His own "salary" is R2000, which makes the mean salary $\text{R}\,\frac{2700}{6} = \text{R}450$. The median salary is $\text{R}\,\frac{120 + 150}{2} = \text{R}135$.

This shows that the arithmetic mean is much more sensitive than the median to one observation which is far away from the others. Some of the advantages and disadvantages of the three measures of location are as follows:

(a) Simplicity
The median and the mode are easier to interpret than the arithmetic mean.

(b) Sensitivity
The arithmetic mean is more sensitive than the mode and the median to single outlying observations.

(c) Definition
The mode is not always defined. The definition of the median for even sample sizes is somewhat arbitrary. The arithmetic mean is well defined.

(d) Variability
The arithmetic mean is less variable (from sample to sample from the same population) than the median and the mode. Thus if we draw two samples from the same population, then the two arithmetic means may be expected to differ less than the two medians or the two modes.

(e) Mathematical properties
The arithmetic mean possesses mathematical properties which make it very convenient to work with in advanced statistical work.
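The sensitivity described in (b) can be checked numerically with the salary example of this section. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Monthly salaries (in rand) of the five employees.
salaries = [100, 110, 120, 150, 220]

# The owner adds his own "salary" of R2000 to the list.
with_owner = salaries + [2000]

mean_shift = mean(with_owner) - mean(salaries)        # 450 - 140
median_shift = median(with_owner) - median(salaries)  # 135 - 120

print("mean:  ", mean(salaries), "->", mean(with_owner))
print("median:", median(salaries), "->", median(with_owner))
print("shift in mean:", mean_shift, " shift in median:", median_shift)
```

One extreme observation moves the mean by R310 but the median by only R15.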

5.3 CHOICE OF MEASURE OF LOCATION

The choice of the measure of location to be used in a specific application depends mainly on three considerations: the purpose for which it is to be used, the nature of the distribution and whether further analyses are to be done.

(a) The purpose
The choice of a measure of location will depend on the specific problem. A shoe manufacturer will be interested to know which shoe size is worn by most people (i.e. the mode) rather than the arithmetic mean. In order to judge a student's performance relative to his class, it would be more informative to know whether he is in the upper half of the class, i.e. whether he is above the median, than to know how he stands with respect to the arithmetic mean.

(b) Nature of the distribution
If the distribution is symmetric it does not really matter which measure of location is used. If the distribution is very skew, the arithmetic mean may not be a very "typical" value since it is so sensitive to one or a few extreme values.

(c) Further analyses
If it is intended that further analyses should be done, e.g. comparing the sample to another sample, then the arithmetic mean is usually the most suitable measure of location.

---oOo---


6. QUESTIONS FOR SELF-EVALUATION


TIME: 1 HOURS

MARKS: 52
1. A large computer company conducted a survey of the hours worked by their part-time staff. The part-time staff work 18, 20 or 24 hours per week. A sample of weekly working hours of 35 part-time employees is given below.

18 20 24 18 20 20 24 18 20 20
20 18 24 18 24 20 24 20 20 18
20 24 18 20 20 18 20 18 24 20
20 20 18 24 20

(a) Calculate the mean
(b) Calculate the median
(c) Calculate the mode
(10)

2. The breaking strength of a fibre is tested in a laboratory. Forty samples of the fibre are tested and the breaking strengths (in grams) are given below.

2,143 2,143 2,148 2,158 2,137 2,137 2,147 2,140
2,142 2,168 2,168 2,150 2,133 2,170 2,145 2,146
2,134 2,162 2,125 2,122 2,164 2,130 2,137 2,149
2,165 2,133 2,151 2,156 2,145 2,162 2,149 2,161
2,142 2,137 2,134 2,168 2,169 2,156 2,151 2,131

(a) Use the raw data to
    (i) Calculate the mean
    (ii) Calculate the median
    (iii) Calculate the mode
    (10)

(b) Use a grouped frequency distribution to
    (i) Calculate the mean
    (ii) Calculate the median
    (iii) Calculate the mode
    (14)

3. The numbers of units produced by 180 workers in an engineering factory in one working week are given below.

Number of units   Number of workers
500 - 509                 8
510 - 519                18
520 - 529                23
530 - 539                37
540 - 549                47
550 - 559                26
560 - 569                16
570 - 579                 5
Total                   180

(a) Calculate the mean number of units
(b) Calculate the median number of units
(c) Calculate the modal number of units
(14)

4. In the data given below larger values are given a greater importance and thus a higher weight than smaller values. Calculate the weighted mean.

xi    8,2   5,6   7,3   9,8   7,1   6,2   9,0
wi    16    4     11    30    9     6     24
(4)

---oOo---

LEARNING OBJECTIVES

YOU SHOULD NOW BE ABLE TO:
* CALCULATE
  - THE MEAN,
  - THE MEDIAN AND
  - THE MODE
  OF A DATA SET
