Chapter Goals
After completing this chapter, you should be able to:
Compute and interpret the mean, median, and mode for a set of data
Compute the range, variance, and standard deviation and know what these values mean
Construct and interpret a box and whiskers plot
Compute and explain the coefficient of variation and z scores
Use numerical measures along with graphs, charts, and tables to describe data
Chapter Topics
Measures of Center and Location
Mean, median, mode, geometric mean, midrange
Measures of Variation
Range, interquartile range, variance and standard deviation, coefficient of variation
Summary Measures
Describing Data Numerically
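Mean, Median, Mode, Weighted Mean
Sample mean (n = Sample Size):
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
Population mean (N = Population Size):
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
Weighted mean (sample and population):
$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i}$, $\mu_W = \frac{\sum w_i x_i}{\sum w_i}$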
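[Figure: two data sets plotted on number lines from 0 to 10]
Data 1, 2, 3, 4, 5: Mean = $\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$
Data 1, 2, 3, 4, 10: Mean = $\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$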
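The two mean calculations can be checked with a few lines of Python. This is a minimal sketch; the helper name `mean` and the variable names are illustrative and not part of the original slides.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


base = [1, 2, 3, 4, 5]
with_extreme_value = [1, 2, 3, 4, 10]  # same data, largest value replaced by 10

print(mean(base))                # 3.0
print(mean(with_extreme_value))  # 4.0
```

Replacing a single observation moves the mean from 3 to 4, which shows how strongly one extreme value can pull it.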
Median
Not affected by extreme values
[Figure: two data sets plotted on number lines from 0 to 10; Median = 3 for both]
Mode
A measure of central tendency
Value that occurs most often
Not affected by extreme values
Used for either numerical or categorical data
There may be no mode
There may be several modes
[Figure: two number lines (0 to 6 and 0 to 14); one data set has Mode = 5, the other has No Mode]
Weighted Mean
Used when values are grouped by frequency or relative importance
Example: Sample of 26 Repair Projects
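Days to Complete    Frequency
5                   4
6                   12
7                   8
8                   2

$\bar{X}_W = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4 \times 5) + (12 \times 6) + (8 \times 7) + (2 \times 8)}{4 + 12 + 8 + 2} = \frac{164}{26} \approx 6.31$ days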
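As a sketch of the weighted mean for the repair-project sample, the frequencies can serve as the weights. The variable names below are illustrative assumptions; the data are the four (days, frequency) pairs from the example.

```python
days = [5, 6, 7, 8]         # x_i: days to complete
frequency = [4, 12, 8, 2]   # w_i: number of projects taking that many days

# Weighted mean = sum(w_i * x_i) / sum(w_i)
weighted_sum = sum(w * x for w, x in zip(frequency, days))
total_weight = sum(frequency)
weighted_mean = weighted_sum / total_weight

print(weighted_sum, total_weight)  # 164 26
print(round(weighted_mean, 2))     # 6.31
```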
Review Example
Five houses on a hill by the beach
[Figure: five houses on a hill, with price labels ranging from $100 K to $2,000 K]
Summary Statistics
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum 3,000,000

Mean: ($3,000,000/5) = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent value = $100,000
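The three summary statistics for the house prices can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic, not the method used in the slides.

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)      # sum of prices / number of houses
median_price = statistics.median(house_prices)  # middle value of the ranked data
mode_price = statistics.mode(house_prices)      # most frequent value

print(mean_price, median_price, mode_price)  # 600000 300000 100000
```

The single $2,000,000 house pulls the mean well above the median, which is why the median is often preferred for describing skewed price data.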
Shape of a Distribution
Describes how the data are distributed
Symmetric or skewed
Left-Skewed: Mean < Median < Mode (longer tail extends to left)
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean (longer tail extends to right)
Percentiles
The pth percentile in an ordered array of n values is the value in the ith position, where
$i = \frac{p}{100}(n + 1)$
Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position:
$i = \frac{60}{100}(19 + 1) = 12$
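The position formula is easy to check numerically. In this sketch the function name `percentile_position` is a hypothetical helper; it returns the 1-based position i for the pth percentile of n ordered values.

```python
def percentile_position(p, n):
    """1-based position of the pth percentile among n ordered values: i = p/100 * (n + 1)."""
    return p * (n + 1) / 100


print(percentile_position(60, 19))  # 12.0
print(percentile_position(25, 9))   # 2.5
```

A fractional result such as 2.5 means the percentile falls between two ordered values rather than on one of them.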
Quartiles
Quartiles split the ranked data into 4 equal groups
[Figure: ranked data divided into four 25% groups by Q1, Q2, and Q3]
Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9) Q1 = 25th percentile, so find the $\frac{25}{100}(9 + 1) = 2.5$ position,
so use the value half way between the 2nd and 3rd values: Q1 = 12.5
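The quartile lookup can be sketched in code as follows. The function `percentile_value` is a hypothetical helper, and the linear interpolation it applies to fractional positions is an assumption: the example only covers the half-way case, where interpolation and "half way between" agree.

```python
def percentile_value(ordered, p):
    """Value at position i = p/100 * (n + 1) in an ordered list (1-based).

    Fractional positions are resolved by linear interpolation between the
    two neighbouring values (an assumption; other conventions exist).
    """
    n = len(ordered)
    i = p * (n + 1) / 100
    lower = int(i)       # whole part of the position
    fraction = i - lower
    if lower < 1:        # position falls before the first value
        return ordered[0]
    if lower >= n:       # position falls at or beyond the last value
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = percentile_value(data, 25)
q2 = percentile_value(data, 50)
q3 = percentile_value(data, 75)
print(q1, q2, q3)  # 12.5 16.0 19.5
```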
[Figure: box and whisker plot with the minimum, median, and maximum labeled]
A Box and Whisker plot can be shown in either vertical or horizontal format
[Figure: box and whisker plots with Q1, Q2, and Q3 marked; the example plot carries the axis labels 0, 2, 3, 5, and 27, with 27 as the Max]
Measures of Variation
Range
Interquartile Range
Variance
Standard Deviation: Population Standard Deviation, Sample Standard Deviation
Coefficient of Variation
Variation
Measures of variation give information on the spread or variability of the data values.
Range
Simplest measure of variation
Difference between the largest and the smallest observations:
Range = $x_{\text{maximum}} - x_{\text{minimum}}$
Example:
[Figure: data plotted on a number line from 0 to 14]
Range = 14 - 1 = 13
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
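The effect of a single outlier on the range can be reproduced directly. This is a small sketch using the two data sets above; `value_range` is an illustrative helper name.

```python
def value_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)


# Eleven 1s, eight 2s, four 3s, one 4, and a final value of 5 or 120.
common = [1] * 11 + [2] * 8 + [3] * 4 + [4]
without_outlier = common + [5]
with_outlier = common + [120]

print(value_range(without_outlier))  # 4
print(value_range(with_outlier))     # 119
```

Only one of the 25 observations changed, yet the range grew from 4 to 119.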
Interquartile Range
Can eliminate some outlier problems by using the interquartile range
Eliminate some high- and low-valued observations and calculate the range from the remaining values
Interquartile range = 3rd quartile - 1st quartile
Interquartile Range
Example:
X minimum    Q1    Median (Q2)    Q3    X maximum
12           30    45             57    70
Each of the four intervals between these values contains 25% of the data.
Interquartile range = 57 - 30 = 27
Variance
Average of squared deviations of values from the mean
Sample variance:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Standard Deviation
Most commonly used measure of variation Shows variation about the mean Has the same units as the original data
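Sample standard deviation:
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Population standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$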
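Example: Sample Data ($x_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{x}$ = 16
$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}} = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$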
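The sample standard deviation can be computed step by step as a check on the hand calculation. This sketch assumes the sample is the eight values 10, 12, 14, 15, 17, 18, 18, 24 (n = 8, mean 16); the variable names are illustrative.

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]  # assumed sample, n = 8

n = len(data)
mean = sum(data) / n

# Squared deviation of each value from the mean
squared_deviations = [(x - mean) ** 2 for x in data]
sum_of_squares = sum(squared_deviations)

# Sample variance divides by n - 1; the standard deviation is its square root
sample_variance = sum_of_squares / (n - 1)
sample_std = math.sqrt(sample_variance)

print(mean, sum_of_squares)
print(round(sample_std, 4))
```

Python's `statistics.stdev(data)` uses the same n - 1 divisor and can be used to confirm the result.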
[Figure: Data B and Data C plotted on a common axis from 11 to 21]
Coefficient of Variation
Measures relative variation
Always in percentage (%)
Shows variation relative to mean
Is used to compare two or more sets of data measured in different units
Population: $CV = \left(\frac{\sigma}{\mu}\right) \cdot 100\%$
Sample: $CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
Stock A:
$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
Stock B:
$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
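To illustrate the comparison, the sketch below computes the coefficient of variation for two stocks. The prices and standard deviations are hypothetical values chosen for illustration (the same standard deviation, different average prices); they are not taken from the slides.

```python
def coefficient_of_variation(std_dev, mean):
    """Sample CV in percent: (s / x-bar) * 100."""
    return 100 * std_dev / mean


# Hypothetical figures: equal standard deviations, different average prices.
cv_a = coefficient_of_variation(std_dev=5, mean=50)
cv_b = coefficient_of_variation(std_dev=5, mean=100)

print(cv_a, cv_b)  # 10.0 5.0
```

With equal standard deviations, the stock with the higher average price has the smaller CV, so it is less variable relative to its price.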
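The Empirical Rule
If the data distribution is bell-shaped, then the interval:
$\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample
$\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample
$\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample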
Tchebysheff's Theorem
Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean
Examples:
At least                      within
$(1 - 1/1^2) = 0\%$           k = 1 ($\mu \pm 1\sigma$)
$(1 - 1/2^2) = 75\%$          k = 2 ($\mu \pm 2\sigma$)
$(1 - 1/3^2) \approx 89\%$    k = 3 ($\mu \pm 3\sigma$)
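The bound in the theorem can be tabulated for any k. This is a minimal sketch; the function name `tchebysheff_lower_bound` is illustrative.

```python
def tchebysheff_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean: 1 - 1/k^2."""
    return 1 - 1 / k ** 2


for k in (1, 2, 3):
    print(k, f"{tchebysheff_lower_bound(k):.0%}")
# 1 0%
# 2 75%
# 3 89%
```

For k = 1 the bound is 0%, so the theorem gives no useful information there; it only becomes informative for k greater than 1.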
Using Excel
Using Excel (continued)
Enter dialog box details
Excel output
Microsoft Excel descriptive statistics output, using the house price data:
House Prices: $2,000,000 500,000 300,000 100,000 100,000