
QMT554 DATA ANALYSIS

CHAPTER 3
DESCRIPTIVE STATISTICS (UNGROUPED DATA)

Introduction

Describing data using numerical measures for ungrouped data.

Measures in this chapter include:
i. Measures of central tendency
ii. Measures of dispersion
iii. Measures of position
iv. Measures of skewness

MEASURES OF CENTRAL TENDENCY
MEAN, MEDIAN, MODE

A measure of central tendency is usually called the average.
It is a single value situated at the center of the data and can be taken as a summary value for that data set.
The three common measures are the mean, the median and the mode.

MEAN (for ungrouped data)

The mean is the sum of the values divided by the total number of values.

For a population,

$$\mu = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$

where N is the number of values in the population.

For a sample,

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$$

where n is the number of values in the sample.


Example 1
The following data give the hours spent studying per week by nine students. Find the mean and interpret it.

12 16 7 22 11 18 7 25 16

Answer: $\bar{x} = 134/9 \approx 14.89$ hours
Interpretation: On average, 14.89 hours were spent studying per week.
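The mean calculation in Example 1 can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` and the variable `hours` are illustrative choices, not part of the course material.

```python
# Hours spent studying per week by nine students (Example 1).
hours = [12, 16, 7, 22, 11, 18, 7, 25, 16]


def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


mean_hours = sample_mean(hours)
print(f"Mean = {mean_hours:.2f} hours")  # Mean = 14.89 hours
```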

Example 2
The data represent the annual chocolate sales (in billions of dollars) for a sample of seven countries in the world. Find the mean and interpret it.

2.0 4.9 6.5 2.1 5.1 3.2 16.6

Pros and cons of using mean

Pros
Summarizes data in a way that is easy to understand.
Uses all the data.
Used in many statistical applications.

Cons
Affected by extreme values.

Mean Example #1
Department #1 - 10 agents (new customers recruited per agent)

Agent number:   1   2   3   4   5   6   7   8   9   10
New customers:  18  19  20  20  20  20  20  22  24  28

Sum = 18 + 19 + 20 + 20 + 20 + 20 + 20 + 22 + 24 + 28 = 211
Mean = 211 / 10 = 21.1
This is reasonable as a typical or middle number of new customers recruited, so the mean is a reasonable measure of central tendency here.

Mean Example #2
Department #2 - 10 agents (new customers recruited per agent)

Agent number:   1   2   3   4   5   6   7   8   9   10
New customers:  18  19  20  20  20  20  20  22  24  120

Sum = 18 + 19 + 20 + 20 + 20 + 20 + 20 + 22 + 24 + 120 = 303
Mean = 303 / 10 = 30.3
This does NOT seem to be a typical or middle number: no one in the department is close to 30.3.
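The two department examples show how a single extreme value pulls the mean. The sketch below reproduces both means and prints the median (defined in the next section) alongside them for comparison. It uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# New customers recruited by the 10 agents in each department.
dept1 = [18, 19, 20, 20, 20, 20, 20, 22, 24, 28]
dept2 = [18, 19, 20, 20, 20, 20, 20, 22, 24, 120]  # one extreme value

for name, data in (("Department #1", dept1), ("Department #2", dept2)):
    # The mean shifts with the extreme value; the median does not.
    print(f"{name}: mean = {mean(data):.1f}, median = {median(data)}")
```

Only the last agent differs between the two lists, yet the mean moves from 21.1 to 30.3 while the median stays at 20.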

MEDIAN (for ungrouped data)

The median is the midpoint of the data array; it lies in the middle of the ordered data.
The symbol for the median is MD.
It is NOT influenced by extremely high or low values in a set of data.
Examples of use: cost of houses, income, age, etc.

Steps in computing the median of raw data
Step 1: Arrange the data in order.
Step 2: Select the middle point. If the middle point falls halfway between two values, find the median by adding the two values and dividing by two.

Example 3
Find and interpret the median for the ages of seven preschool children. The ages are

1 3 4 2 3 5 1

Ordered data: 1 1 2 3 3 4 5
MD location* = (7 + 1)/2 = 4th value, so the median age is 3 years.

*MD location = the ((n + 1)/2)th value of the ordered observations

Interpretation: Half of the preschool children were aged at most 3 years and half were aged at least 3 years during the study period.

Example 4

The number of cloudy days for the top 6 cloudiest cities is shown. Find the median.

209 223 211 227 213 240

Ordered data: 209 211 213 223 227 240

$$\text{median} = \frac{213 + 223}{2} = 218$$

Interpretation: Half (50%) of the cities had at least 218 cloudy days during the study period.
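The two-step procedure for the median (order the data, then pick the middle point) can be sketched as follows. The function name `median_ungrouped` is an illustrative choice. The odd case corresponds to Example 3 and the even case to Example 4.

```python
def median_ungrouped(values):
    """Median of raw data: sort, then take the middle value,
    or average the two middle values when n is even."""
    ordered = sorted(values)          # Step 1: arrange the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd n: a single middle value
        return ordered[mid]
    # even n: middle point falls halfway between two values
    return (ordered[mid - 1] + ordered[mid]) / 2


ages = [1, 3, 4, 2, 3, 5, 1]                  # Example 3 (n = 7)
cloudy_days = [209, 223, 211, 227, 213, 240]  # Example 4 (n = 6)

print(median_ungrouped(ages))         # 3
print(median_ungrouped(cloudy_days))  # 218.0
```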

MODE
The mode is the value that occurs most often in a data set.
Data set with only one mode: unimodal
Data set with two modes: bimodal
Data set with more than two modes: multimodal

Example 5
The following data represent the duration (in days) of U.S. Space Shuttle voyages for the years 1992-1994. Find the mode and interpret it.

8 9 9 14 8 8 10 7 6 9 7 8 10 14 11 8 14 11

Answer: The mode for the data set is 8.
Interpretation: The most common duration of the U.S. Space Shuttle voyages was 8 days.
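Counting how often each value occurs gives the mode directly. The sketch below returns every value that shares the highest frequency, so it also covers the bimodal and multimodal cases. The helper name `modes` is hypothetical.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


# Durations (in days) of the voyages in Example 5.
durations = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]

print(modes(durations))        # [8]     -> unimodal
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  -> bimodal
```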

Example 6

Example 7: SPSS OUTPUT (EXPLORE)

Using the Example 1 data:
Mean: On average, 14.89 hours were spent studying per week.
Median: Half of the students studied at least 16 hours per week.


Relationship among mean, median & mode

As discussed in the previous topic, a histogram or a frequency distribution curve can be either skewed or symmetrical.

Knowing the values of the mean, median and mode gives some idea of the shape of the frequency curve.


Mean, median, and mode for a symmetric histogram and frequency distribution curve


Mean, median, and mode for a histogram and frequency distribution curve skewed to the right


Mean, median, and mode for a histogram and frequency distribution curve skewed to the left

Skewness based on mean, mode & median

Skewness measures the lack of symmetry in a data distribution. The relationship between the mean, median and mode indicates whether the data are symmetrical, left skewed or right skewed:
Mean > median: positively skewed
Mean = median: symmetrical
Mean < median: negatively skewed
Mean > mode: positively skewed
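The mean-versus-median rule can be written as a small classifier. This is a sketch under the assumption that only the mean/median comparison is applied. The function name `skew_direction` is illustrative, and the exact-equality test for "symmetrical" is a simplification that rarely holds for real data.

```python
from statistics import mean, median


def skew_direction(values):
    """Classify skewness by comparing the mean with the median."""
    m, md = mean(values), median(values)
    if m > md:
        return "positively skewed"
    if m < md:
        return "negatively skewed"
    return "symmetrical"  # exact equality; real data seldom match exactly


hours = [12, 16, 7, 22, 11, 18, 7, 25, 16]         # mean 14.89 < median 16
dept2 = [18, 19, 20, 20, 20, 20, 20, 22, 24, 120]  # mean 30.3 > median 20

print(skew_direction(hours))  # negatively skewed
print(skew_direction(dept2))  # positively skewed
```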


MEASURES OF POSITION
QUARTILES (1st Quartile & 3rd Quartile), BOX AND WHISKER PLOT

QUARTILE
Quartiles describe positional values of a data set. They divide the distribution into four groups, separated by Q1, Q2 and Q3.

First Quartile (Q1): positional value where 25% of the observations are smaller and 75% are larger than this value.
Second Quartile (Q2): positional value where 50% of the observations are greater than or equal to this value and the other half are smaller than or equal to this value. Second Quartile = median.
Third Quartile (Q3): positional value where 75% of the observations are smaller and 25% are larger than this value.

Finding Data Values Corresponding to Q1, Q2 and Q3.

Step 1: Arrange the data in order from lowest to highest.
Step 2: Find the median of the data values. This is the value for Q2.
Step 3: Find the median of the data values that fall below Q2. This is the value for Q1.
Step 4: Find the median of the data values that fall above Q2. This is the value for Q3.

Location formula
Q1 = the ((n + 1)/4)th value of the ordered observations
Q2 = the ((n + 1)/2)th value of the ordered observations
Q3 = the (3(n + 1)/4)th value of the ordered observations

Example 8A

Solutions

Ordered data: 5 6 12 13 15 18 22 50 (n = 8)

Second quartile, Q2
Q2 position = (8 + 1)/2 = 4.5

$$Q_2 = \text{median} = \frac{13 + 15}{2} = 14$$

Continue.

First quartile, Q1
Values below Q2: 5 6 12 13
Q1 position = (8 + 1)/4 = 2.25

$$Q_1 = \frac{6 + 12}{2} = 9$$

Third quartile, Q3
Values above Q2: 15 18 22 50

$$Q_3 = \frac{18 + 22}{2} = 20$$
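The four-step method for Q1, Q2 and Q3 (median of the whole set, then medians of the lower and upper halves) can be sketched as below. The function name `quartiles_by_halves` is hypothetical. Statistical software and library routines often use other interpolation rules, so their quartiles may differ slightly from the values produced by this method.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)              # Step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                # values that fall below Q2
    # For odd n the median itself belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


data_8a = [5, 6, 12, 13, 15, 18, 22, 50]  # Example 8A
print(quartiles_by_halves(data_8a))       # (9.0, 14.0, 20.0)
```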

Example 8B

Interpretation??

Box and Whisker Plot

Purpose: to find out what information can be discovered about the data, such as the center and spread.
A box and whisker plot involves five specific values:
1) The lowest value (minimum)
2) Q1
3) The median
4) Q3
5) The highest value (maximum)

Information From Box plot

If the median is near the center of the box, the distribution is approximately symmetric.
If the median falls to the left of the center of the box, the distribution is positively skewed.
If the median falls to the right of the center of the box, the distribution is negatively skewed.

Example 9
A stockbroker recorded the number of clients she saw each day over an 11-day period. The data are shown. Construct a box and whisker plot for the data. 33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31

Solutions

Step 1: Arrange the data in order.
23, 27, 29, 30, 31, 33, 38, 40, 42, 43, 51
Step 2: Find the median.
Median = 33 (the 6th of the 11 ordered values)
Step 3: Find Q1.
Q1 = 29 (the median of 23, 27, 29, 30, 31)
Step 4: Find Q3.
Q3 = 42 (the median of 38, 40, 42, 43, 51)
Step 5: Draw a scale for the data on the x axis.
Step 6: Locate the lowest value, Q1, the median, Q3 and the highest value on the scale.
Step 7: Draw a box around Q1 and Q3, draw a vertical line through the median, and connect the upper and lower values.
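Steps 1 to 4 produce the five values that a box and whisker plot is drawn from. The following sketch computes them for the Example 9 data, assuming the same median-of-halves rule used in the worked solution. The function name `five_number_summary` is illustrative, and the plotting steps (5 to 7) are not covered.

```python
from statistics import median


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum (median-of-halves quartiles)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the median is excluded from both halves.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return {
        "min": ordered[0],
        "Q1": median(lower),
        "median": median(ordered),
        "Q3": median(upper),
        "max": ordered[-1],
    }


clients = [33, 38, 43, 30, 29, 40, 51, 27, 42, 23, 31]  # Example 9
print(five_number_summary(clients))
# {'min': 23, 'Q1': 29, 'median': 33, 'Q3': 42, 'max': 51}
```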

SPSS OUTPUT

Example 10

A dietitian is interested in comparing the sodium content of real cheese and cheese substitute. The data for two random samples are shown. Compare the distribution using box plots.
Real cheese:        310 420 45 40 220 240 180 90
Cheese substitute:  270 180 250 290 130 260 340 310

INTERQUARTILE RANGE (IQR)

The interquartile range (IQR) is defined as the difference between Q3 and Q1, and is the range of the middle 50% of the data.

$$IQR = Q_3 - Q_1$$

The interquartile range is used to identify outliers, and it is also used as a measure of variability.
A data value is an outlier if it is smaller than $Q_1 - 1.5(IQR)$ or larger than $Q_3 + 1.5(IQR)$.
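The outlier rule can be tried on the Example 8A data, whose quartiles were found earlier as Q1 = 9 and Q3 = 20. This is a minimal sketch. The function name `iqr_outliers` and the word "fence" for the two cut-off values are illustrative choices, not terms from the slides.

```python
def iqr_outliers(values, q1, q3):
    """Return the IQR, the two cut-off values and any outliers."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # values below this are outliers
    upper_fence = q3 + 1.5 * iqr   # values above this are outliers
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return iqr, lower_fence, upper_fence, outliers


data_8a = [5, 6, 12, 13, 15, 18, 22, 50]
print(iqr_outliers(data_8a, q1=9, q3=20))  # (11, -7.5, 36.5, [50])
```

Here IQR = 20 - 9 = 11, so the cut-offs are -7.5 and 36.5, and the value 50 is flagged as an outlier.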

Example 11
Refer to Example 9. Find the IQR and identify whether there are any outliers.

MEASURES OF SKEWNESS
Pearson Coefficient of Skewness

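$$Sk = \frac{\text{Mean} - \text{Mode}}{s} \quad \text{or} \quad Sk = \frac{3(\text{Mean} - \text{Median})}{s}$$

where s is the standard deviation.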

Interpretation of measures
If Sk is positive: right/positively skewed
If Sk is negative: left/negatively skewed
If Sk = 0: symmetrical

If Sk takes a value in (-0.9999, -0.0001) or (0.0001, 0.9999), the data are approximately symmetrical.

Example 12A
The durations of stay of cancer patients warded in Hospital Sultanah Bahiyah were recorded in a frequency distribution. From the record, the mean is 28 days, the median is 25 days and the mode is 23 days. The standard deviation is 4.2 days. Find the skewness coefficient. What is the type of distribution?

Solution:

$$Sk = \frac{\text{Mean} - \text{Mode}}{s} = \frac{28 - 23}{4.2} = 1.1905$$

OR

$$Sk = \frac{3(\text{Mean} - \text{Median})}{s} = \frac{3(28 - 25)}{4.2} = 2.1429$$

So, from the Sk value, this distribution is right skewed.
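Both versions of the coefficient used in the solution to Example 12A can be checked numerically. The function names below are illustrative. The inputs are the summary values given in the example.

```python
def pearson_skew_mode(mean, mode, s):
    """Sk = (mean - mode) / s"""
    return (mean - mode) / s


def pearson_skew_median(mean, median, s):
    """Sk = 3 * (mean - median) / s"""
    return 3 * (mean - median) / s


# Example 12A: mean 28, median 25, mode 23, standard deviation 4.2 (days).
print(round(pearson_skew_mode(28, 23, 4.2), 4))    # 1.1905
print(round(pearson_skew_median(28, 25, 4.2), 4))  # 2.1429
```

Both values are positive, which agrees with the conclusion that the distribution is right skewed.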

Example 12B

MEASURES OF DISPERSION
RANGE, STANDARD DEVIATION, VARIANCE

RANGE
The range is the highest value minus the lowest value.
Disadvantages: it is influenced by outliers, and it is based on two values only; all other values in the data set are ignored.

Refer to Example 4:
Highest value = 240
Lowest value = 209
Range = 240 - 209 = 31

The variance and standard deviation are useful for measuring the spread or variability of a set of data.

The standard deviation tells how closely the values of a data set are clustered around the mean.
Lower value: the data values are spread over a relatively smaller range around the mean.
Larger value: the data values are spread over a relatively larger range around the mean (far from the mean).
The standard deviation is the positive square root of the variance.

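Population variance and standard deviation:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}$$

Sample variance and standard deviation:

$$s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}, \qquad s = \sqrt{s^2}$$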

Example 13
Find the sample variance and standard deviation for the amount of European auto sales for a sample of 6 years shown. The data are in millions of dollars.

11.2 11.9 12.0 12.8 13.4 14.3

Solution
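$$\sum x = 11.2 + 11.9 + \dots + 14.3 = 75.6$$

$$\sum x^2 = 11.2^2 + 11.9^2 + \dots + 14.3^2 = 958.94$$

$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{958.94 - \frac{75.6^2}{6}}{5} = 1.276$$

$$s = \sqrt{s^2} = \sqrt{1.276} = 1.13$$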
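The Example 13 solution can be reproduced with the same computational formula, using the sum of the values and the sum of their squares. This is a sketch only, and the function name `sample_variance` is illustrative.

```python
from math import sqrt


def sample_variance(values):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)"""
    n = len(values)
    sum_x = sum(values)                  # sum of the values
    sum_x2 = sum(x * x for x in values)  # sum of the squared values
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]  # millions of dollars
s2 = sample_variance(sales)
print(f"s^2 = {s2:.3f}, s = {sqrt(s2):.2f}")  # s^2 = 1.276, s = 1.13
```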

Example 14

Refer to Example 9. From the SPSS output, interpret the value of range, variance and standard deviation.


Example 15
The following data give the number of patients admitted to a hospital on seven days during the month of January 2003.

19 14 25 21 13 16

CALCULATOR MANUAL
1) CLEAR DATA
2) MODE (SD): PRESS 1
3) KEY IN DATA (e.g. if the data are 56, 57 and 58): press 56 and M+, followed by 57 and M+, then 58 and M+
4) Press SHIFT 1 for Σx², Σx, n
5) Press SHIFT 2 for x̄ (sample mean), xσn (population std. deviation), xσn-1 (sample std. deviation)


END OF CHAPTER 3
THANK YOU