
Numerical Measures

Chapter Goals
After completing this chapter, you should be able to:
- Compute and interpret the mean, median, and mode for a set of data
- Compute the range, variance, and standard deviation and know what these values mean
- Construct and interpret a box and whisker plot
- Compute and explain the coefficient of variation and z scores
- Use numerical measures along with graphs, charts, and tables to describe data

Chapter Topics
- Measures of Center and Location
  - Mean, median, mode, geometric mean, midrange
- Other Measures of Location
  - Weighted mean, percentiles, quartiles
- Measures of Variation
  - Range, interquartile range, variance and standard deviation, coefficient of variation

Summary Measures
Describing Data Numerically:
- Center and Location: Mean, Median, Mode, Weighted Mean
- Other Measures of Location: Percentiles, Quartiles
- Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

Measures of Center and Location


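Overview: Center and Location
- Mean
  Sample: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
  Population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
- Median
- Mode
- Weighted Mean
  Sample: $\bar{x}_W = \frac{\sum w_i x_i}{\sum w_i}$
  Population: $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$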

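Mean (Arithmetic Average)

- The mean is the arithmetic average of data values
- Sample mean (n = sample size):
  $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
- Population mean (N = population size):
  $\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$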

Mean (Arithmetic Average) (continued)

- The most common measure of central tendency
- Mean = sum of values divided by the number of values
- Affected by extreme values (outliers)

[Figure: two dot plots on a 0 to 10 axis]
Values 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
Values 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4
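The effect of a single extreme value on the mean can be checked with a short Python sketch. The function name `arithmetic_mean` is an illustrative choice; the two data sets are the ones from the example above.

```python
from statistics import mean


def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)


without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

print(arithmetic_mean(without_outlier))  # 3.0
print(arithmetic_mean(with_outlier))     # 4.0

# The standard-library helper agrees with the hand-written version.
assert mean(with_outlier) == arithmetic_mean(with_outlier)
```

Replacing the value 5 with 10 moves the mean from 3 to 4, even though the other four values are unchanged.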

Median
- Not affected by extreme values
- In an ordered array, the median is the "middle" number
  - If n or N is odd, the median is the middle number
  - If n or N is even, the median is the average of the two middle numbers

[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]

Mode
- A measure of central tendency
- Value that occurs most often
- Not affected by extreme values
- Used for either numerical or categorical data
- There may be no mode
- There may be several modes

[Figure: two dot plots (axes 0 to 6 and 0 to 14), one with Mode = 5 and one with no mode]
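A minimal Python sketch of the median and mode rules follows. The helper `modes` and the small data sets passed to it are made up for illustration; treating "every value occurs once" as "no mode" is an assumption that matches the wording above.

```python
from collections import Counter
from statistics import median


def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return [value for value, count in counts.items() if count == top]


# Median: middle value for odd n, average of the two middle values for even n.
print(median([1, 2, 3, 4, 5]))    # 3
print(median([1, 2, 3, 4, 10]))   # 3, unchanged by the extreme value
print(median([1, 2, 3, 4]))       # 2.5

# Mode: works for numerical and categorical data.
print(modes([1, 3, 5, 5, 5, 7, 9]))     # [5]
print(modes([0, 1, 2, 3, 4, 5, 6]))     # []
print(modes([1, 1, 2, 2, 3]))           # [1, 2]
print(modes(["red", "blue", "red"]))    # ['red']
```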

Weighted Mean
- Used when values are grouped by frequency or relative importance

Example: Sample of 26 Repair Projects

Days to Complete   Frequency
5                  4
6                  12
7                  8
8                  2

Weighted Mean Days to Complete:

$\bar{x}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} = 6.31 \text{ days}$
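The repair-project calculation can be reproduced with a few lines of Python. This is a sketch; `weighted_mean` is an illustrative name, with the frequencies used as the weights.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


days_to_complete = [5, 6, 7, 8]
frequency = [4, 12, 8, 2]  # 26 repair projects in total

result = weighted_mean(days_to_complete, frequency)
print(round(result, 2))  # 6.31
```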

Review Example
- Five houses on a hill by the beach

House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
[Figure: five houses on a hill, labeled $2,000 K, $500 K, $300 K, $100 K, $100 K]

Summary Statistics
House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000 (Sum = $3,000,000)

- Mean: $3,000,000 / 5 = $600,000
- Median: middle value of ranked data = $300,000
- Mode: most frequent value = $100,000

Which measure of location is the "best"?

- The mean is generally used, unless extreme values (outliers) exist
- Then the median is often used, since the median is not sensitive to extreme values
  - Example: median home prices may be reported for a region, because the median is less sensitive to outliers

Shape of a Distribution
- Describes how data is distributed
- Symmetric or skewed

Left-Skewed: Mean < Median < Mode (longer tail extends to left)
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean (longer tail extends to right)

Other Location Measures

Percentiles: the pth percentile in a data array (where 0 ≤ p ≤ 100):
- p% are less than or equal to this value
- (100 - p)% are greater than or equal to this value

Quartiles:
- 1st quartile = 25th percentile
- 2nd quartile = 50th percentile = median
- 3rd quartile = 75th percentile

Percentiles
- The pth percentile in an ordered array of n values is the value in the ith position, where

$i = \frac{p}{100}(n + 1)$

Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position:

$i = \frac{p}{100}(n + 1) = \frac{60}{100}(19 + 1) = 12$
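The position formula can be sketched in Python as below. The names `percentile_position` and `percentile` are illustrative. Interpolating linearly when the position is fractional, and clamping the position to the range 1..n, are assumptions added here. The 19-value array is made up so that the value in position 12 is easy to recognize.

```python
def percentile_position(p, n):
    """1-based position i = p * (n + 1) / 100 of the pth percentile."""
    return p * (n + 1) / 100


def percentile(ordered_values, p):
    """Value at the percentile position, interpolating between neighbors."""
    n = len(ordered_values)
    i = min(max(percentile_position(p, n), 1), n)  # keep position in 1..n
    lower = int(i)          # whole part of the position
    fraction = i - lower    # fractional part of the position
    if fraction == 0:
        return ordered_values[lower - 1]
    below = ordered_values[lower - 1]
    above = ordered_values[lower]
    return below + fraction * (above - below)


print(percentile_position(60, 19))  # 12.0

hypothetical_array = list(range(1, 20))  # 19 ordered values: 1, 2, ..., 19
print(percentile(hypothetical_array, 60))  # 12, the value in 12th position
```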

Quartiles
- Quartiles split the ranked data into 4 equal groups (25% of the values in each), separated by Q1, Q2, and Q3

Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)

Q1 = 25th percentile, so find the $\frac{25}{100}(9 + 1) = 2.5$ position,
so use the value halfway between the 2nd and 3rd values,
so Q1 = 12.5
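As a cross-check, Python's standard library can compute the same quartiles. In this sketch, `statistics.quantiles` with `method="exclusive"` places the cut points at the i(n + 1)/4 positions and interpolates between neighboring values, which agrees with the hand calculation for this data set.

```python
from statistics import quantiles

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # ordered sample, n = 9

# n=4 asks for the three cut points that split the data into quartiles.
q1, q2, q3 = quantiles(data, n=4, method="exclusive")

print(q1)  # 12.5, halfway between the 2nd and 3rd values
print(q2)  # 16.0, the median
print(q3)  # 19.5, halfway between the 7th and 8th values
```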

Box and Whisker Plot

- A graphical display of data using the 5-number summary:
  Minimum -- Q1 -- Median -- Q3 -- Maximum

[Figure: example box-and-whisker plot marking Minimum, 1st Quartile, Median, 3rd Quartile, and Maximum, with 25% of the values in each of the four segments]
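The five numbers behind a box-and-whisker plot can be computed as in the sketch below. The function name `five_number_summary` is illustrative, and the quartiles use the (n + 1) position method via `statistics.quantiles(..., method="exclusive")`. The data is the nine-value sample from the quartile example.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    q1, q2, q3 = quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(five_number_summary(data))  # (11, 12.5, 16.0, 19.5, 22)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.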

Shape of Box and Whisker Plots

- The box and central line are centered between the endpoints if data is symmetric around the median
- A box and whisker plot can be shown in either vertical or horizontal format

Distribution Shape and Box and Whisker Plot

[Figure: three box-and-whisker plots, each marking Q1, Q2, and Q3, for Left-Skewed, Symmetric, and Right-Skewed distributions]

Box-and-Whisker Plot Example

- Below is a box-and-whisker plot for a data set with Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27

[Figure: box-and-whisker plot on an axis marked 0, 2, 3, 5, 27]

- This data is very right skewed, as the plot depicts

Measures of Variation
- Range
- Interquartile Range
- Variance: Population Variance, Sample Variance
- Standard Deviation: Population Standard Deviation, Sample Standard Deviation
- Coefficient of Variation

Variation
- Measures of variation give information on the spread or variability of the data values.

[Figure: distributions with the same center but different variation]

Range
- Simplest measure of variation
- Difference between the largest and the smallest observations:

$\text{Range} = x_{\text{maximum}} - x_{\text{minimum}}$

Example:
[Figure: dot plot on a 0 to 14 axis]
Range = 14 - 1 = 13

Disadvantages of the Range

- Ignores the way in which data are distributed
  [Figure: two differently distributed dot plots on a 7 to 12 axis; Range = 12 - 7 = 5 for both]
- Sensitive to outliers
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
  Range = 5 - 1 = 4
  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
  Range = 120 - 1 = 119
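The outlier sensitivity of the range is easy to reproduce in Python. In this sketch the name `value_range` is illustrative, and the two lists are the 25-value data sets from the example above, which differ only in their last value.

```python
def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


# Shared first 24 values: eleven 1s, eight 2s, four 3s, one 4.
common = [1] * 11 + [2] * 8 + [3] * 4 + [4]

print(value_range(common + [5]))    # 4
print(value_range(common + [120]))  # 119
```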

Interquartile Range
- Can eliminate some outlier problems by using the interquartile range
- Eliminate some high- and low-valued observations and calculate the range from the remaining values
- Interquartile range = 3rd quartile - 1st quartile

Interquartile Range
Example:
[Figure: box-and-whisker plot with 25% of the values in each of the four segments]
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70

Interquartile range = 57 - 30 = 27
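A short Python sketch shows why the interquartile range resists outliers. The function name `interquartile_range` is illustrative, and the quartiles use the (n + 1) position method through `statistics.quantiles(..., method="exclusive")`. The two 25-value lists are the ones used earlier to show the range jumping from 4 to 119.

```python
from statistics import quantiles


def interquartile_range(values):
    """3rd quartile minus 1st quartile."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return q3 - q1


# Quartiles read off the box plot example: Q1 = 30, Q3 = 57.
print(57 - 30)  # 27

# Eleven 1s, eight 2s, four 3s, one 4, then either 5 or the outlier 120.
common = [1] * 11 + [2] * 8 + [3] * 4 + [4]
print(interquartile_range(common + [5]))    # 1.5
print(interquartile_range(common + [120]))  # 1.5, unaffected by the outlier
```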

Variance
- Average of squared deviations of values from the mean
- Sample variance:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

- Population variance:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
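The two variance formulas differ only in the divisor, which the following Python sketch makes explicit. The function names are illustrative, and the data set 1, 2, 3, 4, 5 is reused from the mean example for easy hand checking.

```python
from math import isclose
from statistics import pvariance, variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    center = sum(values) / n
    return sum((x - center) ** 2 for x in values) / (n - 1)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    size = len(values)
    center = sum(values) / size
    return sum((x - center) ** 2 for x in values) / size


data = [1, 2, 3, 4, 5]  # mean 3, squared deviations 4 + 1 + 0 + 1 + 4 = 10

print(sample_variance(data))      # 2.5
print(population_variance(data))  # 2.0

# The standard-library functions implement the same two formulas.
assert isclose(sample_variance(data), variance(data))
assert isclose(population_variance(data), pvariance(data))
```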

Standard Deviation
- Most commonly used measure of variation
- Shows variation about the mean
- Has the same units as the original data
- Sample standard deviation:

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

- Population standard deviation:

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

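Calculation Example: Sample Standard Deviation

Sample Data (x_i): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16

$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$

$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$

$= \sqrt{\frac{36 + 16 + 4 + 1 + 1 + 4 + 4 + 64}{7}} = \sqrt{\frac{130}{7}} \approx 4.3095$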
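The calculation for these eight sample values can be verified step by step in Python. In this sketch the variable names are illustrative. The squared deviations from the mean of 16 sum to 130, so the sample standard deviation is the square root of 130 / 7.

```python
from math import isclose, sqrt
from statistics import stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)

x_bar = sum(data) / n                            # 16.0
squared_devs = [(x - x_bar) ** 2 for x in data]  # 36, 16, 4, 1, 1, 4, 4, 64
sum_squared = sum(squared_devs)                  # 130.0

s = sqrt(sum_squared / (n - 1))
print(x_bar)        # 16.0
print(sum_squared)  # 130.0
print(round(s, 4))  # 4.3095

# statistics.stdev uses the same n - 1 divisor.
assert isclose(s, stdev(data))
```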

Comparing Standard Deviations

[Figure: three dot plots on an 11 to 21 axis]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.9258
Data C: Mean = 15.5, s = 4.57

Coefficient of Variation
- Measures relative variation
- Always in percentage (%)
- Shows variation relative to mean
- Is used to compare two or more sets of data measured in different units

Population: $CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$
Sample: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$

Comparing Coefficients of Variation

- Stock A:
  - Average price last year = $50
  - Standard deviation = $5

$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$

- Stock B:
  - Average price last year = $100
  - Standard deviation = $5

$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$

Both stocks have the same standard deviation, but stock B is less variable relative to its price
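The stock comparison reduces to one small function, sketched here in Python. The name `coefficient_of_variation` and its argument order are illustrative choices.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)
cv_b = coefficient_of_variation(std_dev=5, mean=100)

print(cv_a)  # 10.0 (percent) for stock A
print(cv_b)  # 5.0 (percent) for stock B
```

Because the result is a unitless percentage, the same function can compare data sets measured in different units.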

The Empirical Rule

- If the data distribution is bell-shaped, then the interval:
  - $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample

[Figure: bell curve with 68% of the area within $\mu \pm 1\sigma$]

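The Empirical Rule (continued)

- $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample
- $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample

[Figure: bell curves with 95% of the area within $\mu \pm 2\sigma$ and 99.7% within $\mu \pm 3\sigma$]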
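The 68 / 95 / 99.7 percentages can be checked against an exact normal distribution, which is one idealized case of "bell-shaped". This Python sketch uses `statistics.NormalDist` (available from Python 3.8); the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    # Prints roughly 68.3, 95.4 and 99.7 percent.
    print(k, round(100 * share_within(k), 1))
```

Real data that is only approximately bell-shaped will match these shares only approximately.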

Tchebysheff's Theorem
- Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean
- Examples:

At least               within
(1 - 1/1^2) = 0%       k = 1 ($\mu \pm 1\sigma$)
(1 - 1/2^2) = 75%      k = 2 ($\mu \pm 2\sigma$)
(1 - 1/3^2) = 89%      k = 3 ($\mu \pm 3\sigma$)
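The bound can be compared with an actual, strongly skewed data set in Python. In this sketch the function names are illustrative, and the data is the 25-value list with the outlier 120 from the range example. The population standard deviation (`statistics.pstdev`) is used so that the theorem applies directly to the list.

```python
from statistics import mean, pstdev


def tchebysheff_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    return 1 - 1 / k ** 2


def share_within(values, k):
    """Observed share of values within k standard deviations of the mean."""
    center = mean(values)
    spread = pstdev(values)
    inside = sum(1 for x in values if abs(x - center) <= k * spread)
    return inside / len(values)


data = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

for k in (1, 2, 3):
    bound = tchebysheff_bound(k)
    observed = share_within(data, k)
    print(k, round(bound, 4), observed)
    assert observed >= bound  # the bound holds for any distribution
```

The bound is a guaranteed minimum; for this list the observed shares are well above it.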

Standardized Data Values

- A standardized data value refers to the number of standard deviations a value is from the mean
- Standardized data values are sometimes referred to as z-scores

Standardized Population Values

$z = \frac{x - \mu}{\sigma}$

where:
- x = original data value
- $\mu$ = population mean
- $\sigma$ = population standard deviation
- z = standard score (number of standard deviations x is from $\mu$)

Standardized Sample Values

$z = \frac{x - \bar{x}}{s}$

where:
- x = original data value
- $\bar{x}$ = sample mean
- s = sample standard deviation
- z = standard score (number of standard deviations x is from $\bar{x}$)
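A minimal Python sketch of standardizing sample values follows. The function name `z_scores` is illustrative, and the data is the eight-value sample from the standard deviation calculation example (mean 16).

```python
from statistics import mean, stdev


def z_scores(values):
    """Number of sample standard deviations each value lies from the sample mean."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]


data = [10, 12, 14, 15, 17, 18, 18, 24]
z = z_scores(data)

print(round(z[0], 2))   # -1.39: the value 10 lies below the mean
print(round(z[-1], 2))  # 1.86: the value 24 lies above the mean
```

Standardized values always have mean 0 and, with the sample formula, sample standard deviation 1.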

Using Microsoft Excel

- Descriptive Statistics are easy to obtain from Microsoft Excel
  - Use menu choice: tools / data analysis / descriptive statistics
  - Enter details in dialog box

Using Excel

- Use menu choice: tools / data analysis / descriptive statistics

Using Excel (continued)
- Enter dialog box details
- Check box for summary statistics
- Click OK

Excel output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices: $2,000,000 500,000 300,000 100,000 100,000
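For readers without Excel, a rough Python analogue of a summary-statistics table for the house price data is sketched below. The choice of rows and the dictionary name `summary` are assumptions for illustration, not a copy of the Excel output.

```python
from statistics import mean, median, mode, stdev, variance

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

summary = {
    "Mean": mean(prices),                    # 600,000
    "Median": median(prices),                # 300,000
    "Mode": mode(prices),                    # 100,000
    "Standard Deviation": stdev(prices),     # sample formula, n - 1 divisor
    "Sample Variance": variance(prices),
    "Range": max(prices) - min(prices),
    "Minimum": min(prices),
    "Maximum": max(prices),
    "Sum": sum(prices),
    "Count": len(prices),
}

for label, value in summary.items():
    print(f"{label:<20}{value:>20,.0f}")
```

The sample standard deviation of these five prices works out to $800,000, larger than the mean itself, which reflects how strongly the $2,000,000 house dominates the data.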

