
NUMERICAL SUMMARY OF DATA

In this lecture we shall look at summarising data by numbers, rather than by graphs as last week. We shall use the diagrams drawn in last week's lecture again this week for making estimates, and you will use the diagrams you drew in Tutorial 2 for estimating summary statistics in Tutorial 3. You will first learn the meanings of the various summary statistics and then learn how to use your calculator in standard deviation mode to save time and effort in calculating the mean and standard deviation. Further summary statistics are discussed in Business Statistics, Chapter 3, Section 3.6.

As we saw last week, a large display of data is not easily understood, but if we make a summary statement such as 'the average examination mark was 65' and 'the marks ranged from 40 to 75' we get a clearer understanding of the situation. A set of data is often described, or summarised, by two parameters: a measure of centrality, or location, and a measure of spread, or dispersion.

MEASURES OF CENTRALITY
Mode: The most commonly occurring value. The data may be nominal, ordinal, interval or ratio (Section 1.4 of Business Statistics).
Median: The middle value when all the data are placed in order. The data must be at least ordinal. For an even number of values, take the average of the middle two.
Mean: The arithmetic average. The data must be measured on at least an interval scale. It is calculated by dividing the 'sum of the values' by the 'number of values'.

$\bar{x} = \frac{\sum x}{n}$   or   $\bar{x} = \frac{\sum (fx)}{n}$

where
x represents the value of the data,
f is the frequency of that particular value,
$\bar{x}$ is the shorthand way of writing 'mean',
$\sum$ (sigma) is the shorthand way of writing 'the sum of'.

The mean may be thought of as the 'typical value' for a set of data and, as such, is the logical value to use when representing it.

The choice of measure of centrality depends, therefore, on the scale of measurement of the data. For example, it makes no sense to find the mean of ordinal or nominal data. All measures are suitable for interval or ratio data, so we shall consider numbers in order that all the measures can be illustrated.

Data values may also be:
discrete - i.e. not continuous, e.g. numbers of people;
continuous - e.g. heights;
single - each value stated separately;
grouped - frequencies stated for the number of cases in each value group, or interval for continuous data.

You still work for the Transport Manager, as last week, investigating the usage of cars.

Example 1 Non-grouped discrete data
Nine members of staff were selected at random from those using hire cars. The number of days each of them left the car unused in the garage in the previous week was:
2 6 2 4 1 4 3 1 1
Rearranged in numerical order:
1 1 1 2 2 3 4 4 6
Mode = 1; Median = 2; Mean = $\bar{x} = \frac{\sum x}{n} = \frac{24}{9} = 2.67$
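The three measures for Example 1 can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable name `days_unused` is chosen here for illustration and is not part of the handout.

```python
from statistics import mean, median, mode

# Days each of the nine hire cars stood unused (Example 1).
days_unused = [2, 6, 2, 4, 1, 4, 3, 1, 1]

print("ordered:", sorted(days_unused))
print("mode   =", mode(days_unused))            # most commonly occurring value
print("median =", median(days_unused))          # middle (5th) of the 9 ordered values
print("mean   =", round(mean(days_unused), 2))  # sum of values / number of values
```

The printed values should agree with the hand calculation above (mode 1, median 2, mean 2.67).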

Example 2 Grouped discrete data
Days idle for a whole staff of users of hired cars:

No. of days unused (x)    No. of staff (f)    fx
        0                        5              0
        1                       24             24
        2                       30             60
        3                       19             57
        4                       10             40
        5                        5             25
        6                        2             12
      Total                     95            218

Mode = 2 days (most common number of days)
Median = 2 days (middle, 48th, number of days)
Mean = Total number of days / Total number of staff = 218 / 95 = 2.295 = 2.3 days (to one decimal place)
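The same three measures can be read off a frequency table programmatically. The sketch below assumes the table is held as a Python dictionary mapping each value x to its frequency f; the helper name `value_at_position` is hypothetical. Locating the median by position in this simple way assumes an odd total frequency, as here (n = 95).

```python
# Frequency table from Example 2: x = days unused, f = number of staff.
freq = {0: 5, 1: 24, 2: 30, 3: 19, 4: 10, 5: 5, 6: 2}

n = sum(freq.values())                          # total number of staff
total_fx = sum(x * f for x, f in freq.items())  # total number of days
grouped_mean = total_fx / n

# Mode: the value with the largest frequency.
grouped_mode = max(freq, key=freq.get)

def value_at_position(table, position):
    """Return the value holding the given position in the ordered data,
    found by accumulating frequencies (a cumulative frequency count)."""
    running = 0
    for x in sorted(table):
        running += table[x]
        if running >= position:
            return x
    raise ValueError("position exceeds total frequency")

# Median: the (n + 1) / 2-th value, i.e. the 48th of 95.
grouped_median = value_at_position(freq, (n + 1) / 2)

print(n, total_fx, round(grouped_mean, 3), grouped_mode, grouped_median)
```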

Example 3 Grouped continuous data
34 employees stated their fuel costs during the previous week to be:

Value               Mid-value (x)    No. of employees (f)      fx
>= 59 and < 60          59.5                  2               119.0
>= 60 and < 61          60.5                  5               302.5
>= 61 and < 62          61.5                  4               246.0
>= 62 and < 63          62.5                  6               375.0
>= 63 and < 64          63.5                  5               317.5
>= 64 and < 65          64.5                  7               451.5
>= 65 and < 66          65.5                  3               196.5
>= 66 and < 67          66.5                  2               133.0
Total                                        34              2141.0

Modal class = 64 to 65
Median class = class including the 17th/18th user = 62 to 64
Mean = Total value of invoices / Total number of employees = 2141 / 34 = 62.97

MEASURES OF SPREAD (THE DATA MUST BE AT LEAST ORDINAL)

Range: Largest value - smallest value.


Interquartile Range: The range of the middle half of the ordered data.
Semi-interquartile Range: Half the value of the interquartile range.
Population Standard Deviation (xσn): This is also known as the 'root mean square deviation'. It is calculated by squaring the deviations from the mean, adding them, finding the average of the squared deviations, and then taking the square root of the result, or by using your calculator. As the name suggests, it applies when the whole population of interest is available for analysis.
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$   or   $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n}}$ for frequency data

where
x represents the value of the data,
f is the frequency of that particular value,
$\bar{x}$ is the shorthand way of writing 'mean',
s is the shorthand way of writing 'standard deviation',
$\sum$ is the shorthand way of writing 'the sum of',
$\sqrt{\ }$ means 'take the positive square root of', as the negative root has no meaning.

The standard deviation is a measure of how closely the data are grouped about the mean. The larger the value, the wider the spread. It is a measure of just how precise the mean is as the value to represent the whole data set.

An equivalent formula which is often used is:

$s = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$   or   $s = \sqrt{\frac{\sum fx^2}{n} - \left(\frac{\sum fx}{n}\right)^2}$ for frequency data.
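As a check that the two forms of the population standard deviation formula agree, the sketch below evaluates both for the Example 2 frequency table. The variable names are illustrative only; the table is assumed to be stored as a dictionary of value to frequency.

```python
from math import isclose, sqrt

# Example 2 frequency table: x = days idle, f = number of staff.
freq = {0: 5, 1: 24, 2: 30, 3: 19, 4: 10, 5: 5, 6: 2}

n = sum(freq.values())
mean = sum(f * x for x, f in freq.items()) / n

# Definition: root of the mean squared deviation from the mean.
sd_definition = sqrt(sum(f * (x - mean) ** 2 for x, f in freq.items()) / n)

# Equivalent form: mean of the squares minus the square of the mean.
sd_shortcut = sqrt(sum(f * x * x for x, f in freq.items()) / n - mean ** 2)

print(round(sd_definition, 3), round(sd_shortcut, 3))
print(isclose(sd_definition, sd_shortcut))
```

The second form needs only the running totals of f, fx and fx², which is why it is convenient for hand or calculator work.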

Sample Standard Deviation (xσn-1): Usually we are interested in the whole population but only have a sample taken from it available for analysis. When the formula above is applied to a sample, it gives a biased estimate of the population standard deviation: on average it is too small. If the denominator used is (n - 1) instead of n, the estimate is found to be more accurate.

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$   or   $s = \sqrt{\frac{\sum f(x - \bar{x})^2}{n - 1}}$ for frequency data

As we shall see shortly on the calculator, the keys for the two standard deviations are labelled xσn and xσn-1 respectively.
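The effect of the two denominators can be seen on the nine values of Example 1. This sketch uses `pstdev` (divides by n) and `stdev` (divides by n - 1) from Python's standard `statistics` module as stand-ins for the two calculator keys; that pairing is an illustration, not something the handout specifies.

```python
from math import isclose, sqrt
from statistics import pstdev, stdev

days_unused = [2, 6, 2, 4, 1, 4, 3, 1, 1]  # Example 1 data
n = len(days_unused)

sd_population = pstdev(days_unused)  # denominator n
sd_sample = stdev(days_unused)       # denominator n - 1

print(round(sd_population, 3), round(sd_sample, 3))

# The two differ only by the factor sqrt(n / (n - 1)),
# so the gap shrinks as the sample size grows.
print(isclose(sd_sample, sd_population * sqrt(n / (n - 1))))
```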

The examples below are the same as used previously for the measures of centrality because these two parameters are generally calculated together.

Example 1 Discrete data
The number of idle days for nine cars:
2 6 2 4 1 4 3 1 1
Rearranged in numerical order:
1 1 1 2 2 3 4 4 6

Range: 6 - 1 = 5 days
Interquartile Range:
Lower quartile: one quarter of (9 + 1) = 2.5th value = 1 day
Upper quartile: three quarters of (9 + 1) = 7.5th value = 4 days
Interquartile range = 4 - 1 = 3 days
Semi-interquartile Range: half the interquartile range = 1.5 days
(A much larger sample should really be used for this type of analysis.)
Standard Deviation: (See your calculator booklet for the method, as calculators vary considerably. See the Casio instructions below if you have a Casio. If still uncertain, ask in tutorials.)
For these nine numbers only = 1.633 days (xσn, SHIFT 2 for Casios)
As estimator for the whole population = 1.732 days (xσn-1, SHIFT 3 for Casios)

Example 2 Grouped discrete data
The numbers of idle days for 95 cars are:

No. of days idle (x)    No. of staff (f)    Cumulative frequency
        0                      5                     5
        1                     24                    29
        2                     30                    59
        3                     19                    78
        4                     10                    88
        5                      5                    93
        6                      2                    95
      Total                   95

Range: 6 - 0 = 6 days
Interquartile Range:
Lower quartile = (95 + 1)/4 = 24th value = 1 day
Upper quartile = 3(95 + 1)/4 = 72nd value = 3 days
Interquartile range = 3 - 1 = 2 days
Semi-interquartile Range: 2/2 = 1 day
Standard Deviation: (from calculator)
For this sample only = 1.345 days (xσn, SHIFT 2)
As estimator for population = 1.352 days (xσn-1, SHIFT 3)

Example 3 Grouped continuous data
The invoices for 34 cars:

Value               Mid-value (x)    No. of employees (f)      fx
>= 59 and < 60          59.5                  2               119.0
>= 60 and < 61          60.5                  5               302.5
>= 61 and < 62          61.5                  4               246.0
>= 62 and < 63          62.5                  6               375.0
>= 63 and < 64          63.5                  5               317.5
>= 64 and < 65          64.5                  7               451.5
>= 65 and < 66          65.5                  3               196.5
>= 66 and < 67          66.5                  2               133.0
Total                                        34              2141.0

Range: 66.5 - 59.5 = 7.0 (estimate)
Interquartile Range: Usually calculated from the quartiles estimated (see below) from a cumulative frequency diagram - an ogive.
Semi-interquartile Range: Estimated from the interquartile range from the ogive.
Standard Deviation: (from calculator)
For sample only = 1.929 (xσn, SHIFT 2)
As estimator for population = 1.958 (xσn-1, SHIFT 3)
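The spread figures for all three examples can be reproduced in a few lines. This is a sketch under stated assumptions: the frequency tables are expanded into flat lists by a hypothetical helper `expand`, the quartiles use `statistics.quantiles` with `method="exclusive"`, which places the quartiles at the (n + 1)/4 and 3(n + 1)/4 positions as in the text, and Example 3 uses the class mid-values, so its quartiles are only a rough stand-in for an ogive reading.

```python
from statistics import pstdev, quantiles, stdev

def expand(table):
    """Turn a {value: frequency} table into a flat, ordered list of values."""
    return sorted(x for x, f in table.items() for _ in range(f))

example1 = [2, 6, 2, 4, 1, 4, 3, 1, 1]
example2 = expand({0: 5, 1: 24, 2: 30, 3: 19, 4: 10, 5: 5, 6: 2})
example3 = expand({59.5: 2, 60.5: 5, 61.5: 4, 62.5: 6,
                   63.5: 5, 64.5: 7, 65.5: 3, 66.5: 2})

def spread(data):
    """Range, interquartile range and both standard deviations."""
    q1, _, q3 = quantiles(data, n=4, method="exclusive")
    return {
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "semi_iqr": (q3 - q1) / 2,
        "sd_n": round(pstdev(data), 3),          # population form
        "sd_n_minus_1": round(stdev(data), 3),   # sample form
    }

for name, data in [("Example 1", example1),
                   ("Example 2", example2),
                   ("Example 3", example3)]:
    print(name, spread(data))
```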

N.B. The sample standard deviation, xσn-1, is the standard deviation produced by default from both Minitab and SPSS analysis.

Use of a Casio calculator in S.D. Mode for finding mean and standard deviation

Input data (x is the number being keyed in.)
1. Clear all memories                                    SHIFT AC
2. Get into Standard Deviation Mode                      MODE 2
3. Input data
   a) Single numbers                                     x DT  x DT  etc.
   b) Grouped data                                       x ; f DT  x ; f DT  etc.
      (where x is the value and f the frequency.)

Output results
4. Check sample size (n)                                 RCL Red C (hyp)
5. Output mean (x̄)                                       SHIFT 1
6. Output population standard deviation (xσn)            SHIFT 2
7. Output sample standard deviation (xσn-1)              SHIFT 3

Estimation of the Mode from a Histogram

Mode (Last week's Mileages of Cars example)
We know that the modal mileage is in the interval between 140 and 150 miles but need to estimate it more accurately. Draw 'crossed diagonals' on the protruding part of the highest column. These two diagonals cross at an improved estimate of the modal mileage. By this method we also take into consideration the frequencies of the columns adjacent to the modal column.

[Figure: histogram of car mileage as drawn last week; frequency per 10 mile interval (0 to 12) against mileage (100 to 190 miles).]

Estimated Mode:
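The crossed-diagonals construction has a simple algebraic equivalent: the diagonals cross at a point that divides the modal interval in the ratio of the modal column's excess over its left neighbour to its excess over its right neighbour. The sketch below implements that formula. The function name and, in particular, the three frequencies in the usage line are hypothetical, since the histogram's actual column heights are not reproduced in this text, and the intervals are assumed to be of equal width.

```python
def crossed_diagonals_mode(lower, width, f_prev, f_modal, f_next):
    """Estimate the mode within the modal interval [lower, lower + width].

    f_prev, f_modal, f_next are the frequencies of the column before the
    modal column, the modal column itself, and the column after it.
    Assumes equal interval widths and that the modal column is strictly
    taller than at least one neighbour.
    """
    excess_left = f_modal - f_prev
    excess_right = f_modal - f_next
    return lower + width * excess_left / (excess_left + excess_right)

# Hypothetical column heights for the 130-140, 140-150 and 150-160 intervals.
print(crossed_diagonals_mode(lower=140, width=10, f_prev=9, f_modal=12, f_next=7))
```

When both neighbours are equally tall the estimate falls at the mid-point of the modal interval; a taller right-hand neighbour pulls the estimate to the right, and vice versa.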

Estimation of the Median and Interquartile Range from a Cumulative Frequency Diagram

Median (Last week's Mileages of Cars example)
Dot in from 50% on the Cumulative Percentage axis to your ogive and then down to the values on the horizontal axis. The value indicated is the estimated Median for the Mileages.

Quartiles
Dot in from the 75% and 25% values on the Cumulative Percentage axis to your ogive and then down to the values on the horizontal axis. The values indicated are the estimated Upper quartile and Lower quartile respectively of the Mileages. The difference between these measures is the Interquartile range. Half the Interquartile range is the Semi-interquartile range.

[Figure: cumulative frequency diagram (ogive); % cumulative frequency (0 to 100) against mileage (100 to 200 miles).]

Estimated Median:
Estimated Quartiles:
Interquartile range:
Semi-interquartile range:
% of users who travelled more than 170 miles:
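Reading values off an ogive amounts to linear interpolation between the plotted points. Because the mileage frequencies are not reproduced in this text, the sketch below applies the same reading to the Example 3 fuel-cost classes instead; the function names are hypothetical, and straight lines between the plotted points are assumed. Quartiles read this way need not match those computed from class mid-values.

```python
# Example 3: class boundaries and class frequencies.
boundaries = [59, 60, 61, 62, 63, 64, 65, 66, 67]
frequencies = [2, 5, 4, 6, 5, 7, 3, 2]

# Cumulative frequency plotted at each class boundary (0 at the first).
cum_freq = [0]
for f in frequencies:
    cum_freq.append(cum_freq[-1] + f)
total = cum_freq[-1]

def read_across(fraction):
    """Go in from a cumulative fraction (e.g. 0.5) to the ogive, then down."""
    target = fraction * total
    for i in range(1, len(cum_freq)):
        if cum_freq[i] >= target:
            step = (target - cum_freq[i - 1]) / (cum_freq[i] - cum_freq[i - 1])
            return boundaries[i - 1] + step * (boundaries[i] - boundaries[i - 1])
    raise ValueError("fraction must lie between 0 and 1")

def percent_above(value):
    """Go up from a value to the ogive, then across: percentage above it."""
    for i in range(1, len(boundaries)):
        if boundaries[i] >= value:
            step = (value - boundaries[i - 1]) / (boundaries[i] - boundaries[i - 1])
            below = cum_freq[i - 1] + step * (cum_freq[i] - cum_freq[i - 1])
            return 100 * (total - below) / total
    raise ValueError("value lies beyond the last class boundary")

lower_q, median_est, upper_q = (read_across(p) for p in (0.25, 0.5, 0.75))
print(lower_q, median_est, upper_q)
print("IQR =", upper_q - lower_q, " semi-IQR =", (upper_q - lower_q) / 2)
print("% above 65 =", round(percent_above(65), 1))
```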

COMPUTER OUTPUT: NUMERICAL SUMMARY OF EXAMPLE 3


Minitab output Example 3

Descriptive Statistics

Variable      N      Mean    Median   Tr Mean    StDev   SE Mean
Costs        34    62.974    63.000    62.970    1.955     0.335

Variable       Min       Max        Q1        Q3
Costs       59.500    66.500    61.500    64.500

SPSS output Example 3

Descriptives

INVOICES                                              Statistic    Std. Error
Mean                                                     62.971          .336
95% Confidence Interval for Mean    Lower Bound          62.288
                                    Upper Bound          63.654
5% Trimmed Mean                                          62.967
Median                                                   63.000
Variance                                                  3.832
Std. Deviation                                            1.958
Minimum                                                    59.5
Maximum                                                    66.5
Range                                                       7.0
Interquartile Range                                       3.000
Skewness                                                  -.043          .403
Kurtosis                                                  -.899          .788

Completed diagrams from lecture handout (Not for student handouts)

Estimation of the Mode from a Histogram
[Figure: histogram of car mileage as drawn last week; frequency per 10 mile interval (0 to 12) against mileage (100 to 190 miles).]
Estimated Mode: 146 miles

Estimation of the Median and Interquartile Range from a Cumulative Frequency Diagram
[Figure: cumulative frequency diagram (ogive); % cumulative frequency (0 to 100) against mileage (100 to 200 miles).]

Estimated Median: 146 miles
Estimated Quartiles: 134 and 156 miles
Interquartile range: 156 - 134 = 22 miles
Semi-interquartile range: 22/2 = 11 miles
% of users who travelled more than a given mileage, e.g. more than 170 miles = 5%

