Академический Документы
Профессиональный Документы
Культура Документы
CVEN2002/CVEN2702
2016
2. Descriptive Statistics
Graphical representations
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
2 / 58
2. Descriptive Statistics
Stem-and-leaf plot
A stem-and-leaf plot (or just stemplot) is another effective way to
organise numerical data without much effort.
Idea: separate each observation into a stem (all but last digit) and
a leaf (final digit).
Example:
24 := 2|4
139 := 13|9
5 := 0|5.
Write all unique stems in vertical column with the smallest at the
top, and draw a vertical line at the right of this column.
Write each leaf in the row to the right of its stem, in increasing
order out from the stem.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
3 / 58
2. Descriptive Statistics
26 57 66 66 41 46 65 35 46 38 44 29 43
27 18 46 30 32 35 59 39 32 31 39 21 58
53 27 38 52 29 58 45 34 36 56 47 22 59
39 23 55 50 42 18 48 64 44 46 66 33 61
42 42 26 47 67 37 39 58 26 41 61 51 61
28 52 36 62 31 38 42 64 51 54 33 19 25
37 56 43 28 56 49 39 57 48 52 60 17 49
36 58 47 16 33 27 29 48 45 34 57 56 48
04 41 64 37
Dr Jakub Stoklosa
Semester 2, 2016
4 / 58
2. Descriptive Statistics
4
1345678889
1223456666777889999
0112233344555666677777888899999
111222223344445566666677788888999
00111222233455666667777888899
01111244455666778
Dr Jakub Stoklosa
Semester 2, 2016
5 / 58
2. Descriptive Statistics
Stem-and-leaf plot
The stemplot conveys information about the following aspects of the
data:
identification of a typical value.
extent of spread about the typical values.
presence of any gaps in the data.
extent of symmetry in the distribution values.
number and location of peaks.
presence of any outlying values.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
6 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
7 / 58
2. Descriptive Statistics
4
134
5678889
12234
56666777889999
0112233344
555666677777888899999
11122222334444
5566666677788888999
001112222334
55666667777888899
011112444
55666778
Dr Jakub Stoklosa
Semester 2, 2016
8 / 58
2. Descriptive Statistics
5
6
7
8
9
9
01445
1223567
01334578
156688
The right side appears to be shifted down one row from the other
side (better scores in Class 2 than in Class 1).
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
9 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
10 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
11 / 58
2. Descriptive Statistics
100
80
60
40
0
20
120
140
737
757
747
767
707
airplane model
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
12 / 58
2. Descriptive Statistics
Histogram: example
Example
A sample of students from a statistics class were asked how many credit
cards they carry. For 150 students, the frequency distribution is given below:
Number of cards
Frequency
0
12
1
42
2
57
3
24
4
9
5
4
6
2
30
20
0
10
frequency
40
50
The frequency distribution for the number of credit cards is given in the
following histogram:
CVEN2002/CVEN2702 (Statistics)
Drnumber
JakubofStoklosa
credit cards
Semester 2, 2016
13 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
14 / 58
2. Descriptive Statistics
number of observations
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
15 / 58
2. Descriptive Statistics
Dr Jakub Stoklosa
Semester 2, 2016
16 / 58
2. Descriptive Statistics
Histogram: example
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
17 / 58
2. Descriptive Statistics
Histogram: example. . .
[4, 6)
5
[6, 8)
18
[8, 10)
23
[10, 12)
20
[12, 14)
16
[14, 16)
4
[16, 18)
2
[18, 20)
1
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
18 / 58
2. Descriptive Statistics
10
0
Frequency
15
20
5
CVEN2002/CVEN2702 (Statistics)
10
Dr Jakub Stoklosa
15
20
Semester 2, 2016
19 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
20 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
21 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
22 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
23 / 58
2. Descriptive Statistics
Example: Bimodal
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
24 / 58
2. Descriptive Statistics
15
0
10
Frequency
20
25
30
20
40
60
80
Dr Jakub Stoklosa
Semester 2, 2016
25 / 58
2. Descriptive Statistics
15
0
10
Frequency
20
25
30
when class widths are unequal, the frequencies will give a distorted
representation of reality.
20
40
60
80
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
26 / 58
2. Descriptive Statistics
Density histogram
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
27 / 58
2. Descriptive Statistics
Density histogram
So we have:
relative frequency = density class width
= rectangle height rectangle width
= rectangle area
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
28 / 58
2. Descriptive Statistics
Density histogram
0.03
0.00
0.01
0.02
Density
0.04
0.05
0.06
20
40
60
80
Dr Jakub Stoklosa
Semester 2, 2016
29 / 58
2. Descriptive Statistics
Descriptive measures
Dotplots, stem-and-leaf plots, histograms and density histograms
summarise a data set graphically so we can visually discern the
overall pattern of variation.
It is also useful to numerically describe the data set.
Usual notation:
x1 , x2 , . . . , xn
for a sample consisting of n observations of the variable X .
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
30 / 58
2. Descriptive Statistics
Sample mean
n
x =
1X
xi
n
i=1
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
31 / 58
2. Descriptive Statistics
Example
Metabolic rate is the rate at which someone consumes energy.
Data (calories per 24 hours) from 7 men in a study of dieting are:
1792; 1666; 1362; 1614; 1460; 1867; 1439
Find the mean.
x = 17 (1792 + 1666 + 1362 + 1614 + 1460 + 1867 + 1439) = 1600
(calories/24h)
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
32 / 58
2. Descriptive Statistics
Sample median
The median, usually denoted m (or x ), is the value which divides the
data into two equal parts, half below the median and half above.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
33 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
1
x( n ) + x( n +1) .
2
2
2
Dr Jakub Stoklosa
Semester 2, 2016
34 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
35 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
36 / 58
2. Descriptive Statistics
1
(4 70, 000 + 160, 000) = 88, 000 ($)
5
Dr Jakub Stoklosa
Semester 2, 2016
37 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
38 / 58
2. Descriptive Statistics
Dr Jakub Stoklosa
Semester 2, 2016
39 / 58
2. Descriptive Statistics
Dr Jakub Stoklosa
Semester 2, 2016
40 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
41 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
42 / 58
2. Descriptive Statistics
Measures of variability
Different samples may have identical measures of centre, yet
differ from one another in other important ways.
In the figure below, we observe that the dispersion of a set of
observations is small if the values are closely clustered around
their mean (red sample), and that it is large if the values are
scattered widely about their mean (blue sample) here, both
samples have mean 10.
9.0
9.5
10.0
10.5
11.0
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
43 / 58
2. Descriptive Statistics
Measures of variability
It would seem reasonable to measure the variability in a dataset in
terms of the amounts by which the values deviate from their mean.
Define the deviations from the mean:
(x1 x ), (x2 x ), . . . , (xn x )
We might then think of using the average of those deviations as a
measure of variability in the data set.
Unfortunately, this will not do, because this average is always 0:
n
i=1
i=1
1X
1X
(xi x ) =
xi x = x x = 0
n
n
We need to remove the signs of those deviations, so that positive
and negative ones do not cancel each other out.
Taking the square is a natural thing to do.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
44 / 58
2. Descriptive Statistics
Sample variance
n
1 X
(xi x )2
s =
n1
2
i=1
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
45 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
46 / 58
2. Descriptive Statistics
(xi x ) =
xi +
xi x
x 2
i=1
i=1
n
X
i=1
n
X
i=1
i=1
xi2 + nx 2 2x
n
X
xi
i=1
xi2 + nx 2 2nx 2 =
i=1
n
X
xi2 nx 2 .
i=1
Hence,
1
s2 =
n1
n
X
!
xi2 nx 2
i=1
Dr Jakub Stoklosa
Semester 2, 2016
47 / 58
2. Descriptive Statistics
s2 =
1
n1
Pn
i=1 (xi
x )2
Dr Jakub Stoklosa
Semester 2, 2016
48 / 58
2. Descriptive Statistics
s2 =
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
49 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
50 / 58
2. Descriptive Statistics
Dr Jakub Stoklosa
Semester 2, 2016
51 / 58
2. Descriptive Statistics
Outliers: example
Example
Consider the energy consumption data on Slide 16. Suppose that the
five number summary is
{2.97, 7.9475, 9.835, 12.045, 18.26}
Find possible outliers.
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
52 / 58
2. Descriptive Statistics
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
53 / 58
2. Descriptive Statistics
Boxplots
A boxplot is a graphical display showing the five number
summary and any outlier value.
It is sometimes called box-and-whisker plot.
A central box spans the quartiles.
Dr Jakub Stoklosa
15
outlier
1.5 x IQR
Q3
10
IQR
Q2
Q1
1.5 x IQR
Semester 2, 2016
54 / 58
2. Descriptive Statistics
Boxplots
15
20
25
10
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
55 / 58
2. Descriptive Statistics
Quiz
Adapted from a 2011 exam.
In an air-pollution study, ozone concentrations were taken in a large
California city at 5.00 p.m. The eight readings (in parts per million)
were
7.9, 11.3, 6.9, 12.7, 13.2, 8.8, 9.3, 10.6
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
56 / 58
2. Descriptive Statistics
Solution to Quiz
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
57 / 58
2. Descriptive Statistics
Objectives
Objectives
Now you should be able to:
Compute an interpret the sample mean, sample variance, sample
standard deviation, sample median and sample quartiles.
Construct and interpret visual data displays, including the
stem-and-leaf plot, the histogram and the boxplot.
Comment and assess the overall pattern of data from visual
displays
Explain how to use the boxplots to visually compare two or more
samples of data
Recommended exercises (from the textbook):
Q5 p.20, Q17 p.23, Q3 p.69, Q15 p.77, Q35 p.86, Q37 p.86,
Q39 p.86, Q53 (a,c,d) p.95, Q59 p.96, Q67 p.98, Q69 p.98
CVEN2002/CVEN2702 (Statistics)
Dr Jakub Stoklosa
Semester 2, 2016
58 / 58