Statistic Frequency Distribution

Chapter 2
Descriptive Statistics
Larson/Farber 4th ed.
Chapter Outline
2.1 Frequency Distributions and Their Graphs
2.2 More Graphs and Displays
2.3 Measures of Central Tendency
2.4 Measures of Variation
2.5 Measures of Position
Overview
Descriptive Statistics
Describes the important characteristics of a set of
data.
Organize, present, and summarize data:
1. Graphically
2. Numerically
Important Characteristics of
Quantitative Data
Shape, Center, and Spread
Center: A representative or average value that
indicates where the middle of the data set is located.
Variation: A measure of the amount that the values
vary among themselves.
Distribution: The nature or shape of the distribution
of data (such as bell-shaped, uniform, or skewed).
Section 2.1
Frequency Distributions
and Their Graphs
Frequency Distribution
A table that organizes data values into classes or intervals,
along with the number of values that fall in each class
(frequency, f).
1. Ungrouped Frequency Distribution: for data sets with
few different values. Each value is in its own class.
2. Grouped Frequency Distribution: for data sets with
many different values, which are grouped together in
classes.
Grouped and Ungrouped

Ungrouped
Courses Taken    Frequency, f
(course counts missing; frequencies in order: 25, 38, 217, 1462, 932, 15)

Grouped
Age of Voters    Frequency, f
18-30            202
31-42            508
43-54            620
55-66            413
67-78            158
79-90            32
Ungrouped Frequency Distributions

Example: Number of Peas in a Pea Pod
Sample Size: 50
[Table: Peas per pod vs. Freq, f]
Graphs of Frequency Distributions:
Frequency Histograms

Frequency Histogram
A bar graph that represents the frequency distribution.
The horizontal scale is quantitative and measures the
data values.
The vertical scale measures the frequencies of the
classes.
Consecutive bars must touch.
[Figure: frequency histogram for the peas per pod example; horizontal axis: peas per pod, vertical axis: frequency, f]
Relative Frequency Distributions and
Relative Frequency Histograms
Relative Frequency Distribution
Shows the portion or percentage of the data that falls
in a particular class.
\[ \text{relative frequency} = \frac{\text{class frequency}}{\text{sample size}} = \frac{f}{n} \]
Relative Frequency Histogram
Has the same shape and the same horizontal scale as
the corresponding frequency histogram.
The vertical scale measures the relative frequencies,
not frequencies.
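The definition above can be sketched in a few lines of Python. The data set below is hypothetical (it is not from the slides), and the function name `relative_frequencies` is only an illustrative choice.

```python
from collections import Counter

# Hypothetical raw data (not from the slides): courses taken by 10 students.
courses_taken = [1, 2, 2, 3, 3, 3, 4, 4, 5, 3]

def relative_frequencies(data):
    """Return {value: (frequency f, relative frequency f/n)} for ungrouped data."""
    counts = Counter(data)
    n = len(data)
    return {value: (f, f / n) for value, f in sorted(counts.items())}

table = relative_frequencies(courses_taken)
for value, (f, rel) in table.items():
    print(f"{value:>6} {f:>4} {rel:6.2f}")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on the table.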
Grouped Frequency Distributions

Grouped Frequency Distribution
For data sets with many different values.
Groups data into 5-20 classes of equal width.

Exam Scores    Freq, f
30-39
40-49
50-59
60-69
70-79          13
80-89          10
90-99
Grouped Frequency Distribution Terms

Lower class limits: the smallest numbers that can
actually belong to the different classes.
Upper class limits: the largest numbers that can
actually belong to the different classes.
Class width: the difference between two
consecutive lower class limits.
Labeling Grouped Frequency Distributions
Class midpoints: the value halfway between the LCL
and UCL of a class:
\[ \text{midpoint} = \frac{(\text{Lower class limit}) + (\text{Upper class limit})}{2} \]
Class boundaries: the value halfway between a
UCL and the next LCL:
\[ \text{boundary} = \frac{(\text{Upper class limit}) + (\text{next Lower class limit})}{2} \]
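As a small worked sketch of these definitions, the snippet below applies them to the first two exam-score classes, 30-39 and 40-49. The function names are illustrative and do not come from the slides.

```python
def class_midpoint(lcl, ucl):
    """Value halfway between a class's lower and upper limits."""
    return (lcl + ucl) / 2

def class_boundary(ucl, next_lcl):
    """Value halfway between one class's upper limit and the next class's lower limit."""
    return (ucl + next_lcl) / 2

def class_width(lcl, next_lcl):
    """Difference between two consecutive lower class limits."""
    return next_lcl - lcl

print(class_midpoint(30, 39))   # 34.5
print(class_boundary(39, 40))   # 39.5
print(class_width(30, 40))      # 10
```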
Constructing a Grouped Frequency Distribution
1. Determine the range of the data:
Range = highest data value − lowest data value
May round up to the next convenient number.
2. Decide on the number of classes.
Usually between 5 and 20; otherwise, it may be difficult to detect any
patterns.
3. Find the class width:
\[ \text{class width} = \frac{\text{range}}{\text{number of classes}} \]
Round up to the next convenient number.
Constructing a Frequency Distribution

4. Find the class limits.
Choose the first LCL: use the minimum data entry
or something smaller that is convenient.
Find the remaining LCLs: add the class width to the
lower limit of the preceding class.
Find the UCLs: Remember that classes must cover
all data values and cannot overlap.
5. Find the frequencies for each class. (You may add a
tally column first and make a tally mark for each data
value in the class).
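The construction steps can be sketched as a short Python function. This is a minimal sketch under stated assumptions: the data are whole numbers, the first lower class limit is the minimum entry, and the exam scores are hypothetical (they are not the scores from the slides).

```python
import math

def grouped_frequency_distribution(data, num_classes):
    """Build equal-width classes for whole-number data and count each class."""
    lowest, highest = min(data), max(data)
    data_range = highest - lowest                      # step 1: range
    width = math.ceil(data_range / num_classes)        # step 3: class width, rounded up
    if width * num_classes <= data_range:              # make sure the maximum is covered
        width += 1
    table = []
    for i in range(num_classes):
        lcl = lowest + i * width                       # step 4: lower class limits
        ucl = lcl + width - 1                          # whole numbers, so classes cannot overlap
        f = sum(1 for x in data if lcl <= x <= ucl)    # step 5: frequency of the class
        table.append((lcl, ucl, f))
    return table

# Hypothetical exam scores (not from the slides).
scores = [30, 35, 42, 47, 51, 58, 63, 66, 69, 71,
          74, 75, 78, 79, 82, 85, 88, 91, 95, 99]
table = grouped_frequency_distribution(scores, num_classes=7)   # step 2: choose 7 classes
for lcl, ucl, f in table:
    print(f"{lcl}-{ucl}: {f}")
```

With these scores the range is 69, so 69 / 7 rounds up to a class width of 10 and the classes run from 30-39 to 90-99. The frequencies should sum to the sample size, which is a useful check.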
Shape of Distributions
Symmetric
Data is symmetric if the left half of its histogram is

roughly a mirror image of its right half.
Skewed
Data is skewed if it is not symmetric and if it extends
more to one side than the other.
Uniform
Data is uniform if it is equally distributed (on a
histogram, all the bars are the same height or
approximately the same height).
[Figure: The Shape of Distributions; example histograms labeled Symmetric, Skewed Left, Uniform, and Skewed Right]
Outliers
Unusual data values as compared to the rest of the set.
They may be distinguished by gaps in a histogram.
Section 2.2
More Graphs and Displays
Other Graphs
Besides Histograms, there are other methods of
graphing quantitative data:
Stem and Leaf Plots

Dot Plots
Time Series
Stem and Leaf Plots

Represents data by separating each data value into
two parts: the stem (such as the leftmost digit) and
the leaf (such as the rightmost digit)
Constructing Stem and Leaf Plots
Split each data value at the same place value to form the stem and a leaf. (Want 5-20 stems).
Arrange all possible stems vertically so there are no missing stems.
Write each leaf to the right of its stem, in order.
Create a key to recreate the data.
Variations of stem plots:
1. Split stems
2. Back to back stem plots.
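The basic construction can be sketched as follows. This assumes whole-number data split at the tens place (stem = tens and above, leaf = units digit); the data values and the function name are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return {stem: [leaves in order]}, splitting each value at the tens place."""
    plot = defaultdict(list)
    for x in sorted(data):
        plot[x // 10].append(x % 10)
    # List every stem from smallest to largest so there are no missing stems.
    return {stem: plot.get(stem, []) for stem in range(min(plot), max(plot) + 1)}

# Hypothetical data (not from the slides).
values = [12, 15, 21, 21, 47, 40, 43]
plot = stem_and_leaf(values)

print("Key: 1 | 2 means 12")
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Stem 3 has no leaves here but is still printed, matching the rule that no stems are skipped.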
Constructing a Stem-and-Leaf Plot
Include a key to identify

the values of the data.
Dot Plots
Dot plot
Consists of a graph in which each data value is plotted as
a point along a scale of values
Figure 2-5
Time Series
Data set is composed of quantitative entries taken at
regular intervals over a period of time (paired data).
e.g., the amount of precipitation measured each
day for one month.
Use a time series chart to graph: time on the horizontal
axis, the quantitative data on the vertical axis.
Time-Series Graph
Number of Screens at Drive-In Movie Theaters
Figure 2-8
Ex. www.eia.doe.gov/oil_gas/petroleum/
Graphing Qualitative Data Sets
Pareto Chart
A vertical bar graph in which the
height of each bar represents
frequency or relative frequency
(vertical axis: frequency; horizontal axis: categories).
Pie Chart
A circle is divided into sectors
that represent categories.
Constructing a Pie Chart

Find the total sample size.
Convert the frequencies to relative frequencies (percent).

Marital Status    Frequency, f (in millions)    Relative frequency (%)
Never Married     55.3                          55.3 / 219.7 ≈ 0.25 or 25%
Married           127.7                         127.7 / 219.7
Widowed           13.9                          13.9 / 219.7
Divorced          22.8                          22.8 / 219.7
Total             219.7
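The two steps can be sketched in Python using the marital status frequencies. The central angle of each sector (relative frequency times 360 degrees) is an addition not stated on the slide, included because it is what one needs to actually draw the sectors.

```python
# Frequencies in millions, from the marital status table.
marital_status = {"Never Married": 55.3, "Married": 127.7, "Widowed": 13.9, "Divorced": 22.8}

total = sum(marital_status.values())                                  # total sample size
relative = {category: f / total for category, f in marital_status.items()}
# Central angle of each sector (an addition to the slide): relative frequency x 360 degrees.
angles = {category: 360 * rel for category, rel in relative.items()}

for category in marital_status:
    print(f"{category:<14} {relative[category]:6.1%} {angles[category]:6.1f} degrees")
```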
Constructing Pareto Charts

Create a bar for each category, where the height of the
bar can represent frequency or relative frequency.
The bars are often positioned in order of decreasing
height, with the tallest bar positioned at the left.
Figure 2-6
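The ordering step is easy to sketch in code. The example below reuses the marital status frequencies from the pie chart slide as illustrative categories; the text bars are only a rough stand-in for a real chart.

```python
# Frequencies in millions, reused from the marital status example.
frequencies = {"Never Married": 55.3, "Married": 127.7, "Widowed": 13.9, "Divorced": 22.8}

# Order the categories by decreasing frequency, tallest bar first.
pareto_order = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

for category, f in pareto_order:
    bar = "#" * round(f / 5)          # crude text bar: one '#' per 5 million
    print(f"{category:<14} {bar} {f}")
```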
Section 2.3
Measures of Central Tendency
Measures of Central Tendency

Measure of central tendency
A value that represents a typical, or central, entry of a
data set.
Most common measures of central tendency:
Mean
Median
Mode
Measure of Central Tendency: Mean

Mean: The sum of all the data entries divided by the
number of entries.
Population mean: \( \mu = \frac{\sum x}{N} \)
Sample mean: \( \bar{x} = \frac{\sum x}{n} \)
Round-off rule for measures of center: Carry
one more decimal place than is in the original values. Do
not round until the last step.
Measure of Central Tendency: Median

Median
The value that lies in the middle of the data when the data
set is arranged in order from lowest to highest.
Measures the center of an ordered data set by dividing it
into two equal parts.
A sample median is often denoted \( \tilde{x} \).
If the data set has an
odd number of entries: median is the middle data entry.
even number of entries: median is the mean of the two
middle data entries.
Computing the Median

If the data set has an:
odd number of entries: the median is the middle data entry.
Example (ordered entries running from 2 to 13): the exact middle
value is 6, so \( \tilde{x} = 6 \).
even number of entries: the median is the mean of the two
middle data entries.
Example: the two middle entries are 6 and 7, so
\[ \tilde{x} = \frac{6 + 7}{2} = 6.5 \]
Measure of Central Tendency: Mode

Mode
The data entry that occurs with the greatest frequency.
If no entry is repeated the data set has no mode.
If two entries occur with the same greatest frequency,
each entry is a mode (bimodal).
a) 5.40 1.10 0.42 0.73 0.48 1.10
Mode is 1.10
b) 27 27 27 55 55 55 88 88 99
Bimodal: 27 & 55
c) 1 2 3 6 7 8 9 10
No Mode
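The three measures can be checked with Python's standard `statistics` module. The helper `modes` below is an illustrative function written for this sketch (it returns an empty list when no entry repeats, matching the "no mode" convention above); the data sets are the three mode examples.

```python
import statistics
from collections import Counter

def modes(data):
    """Return all entries with the greatest frequency, or [] if no entry repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)

a = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
b = [27, 27, 27, 55, 55, 55, 88, 88, 99]
c = [1, 2, 3, 6, 7, 8, 9, 10]

print(modes(a), modes(b), modes(c))   # [1.1] [27, 55] []
print(statistics.mean(a))             # about 1.54, pulled up by the entry 5.40
print(statistics.median(a))           # about 0.915, the mean of the middle entries 0.73 and 1.10
```

Data set a) also shows why the choice of average matters: the single large entry 5.40 moves the mean well above the median.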
Comparing the Mean, Median, and Mode

All three measures describe an average. Choose the one that best
represents a typical value in the set.
Mean:
The most familiar average.
A reliable measure because it takes into account every entry of a
data set.
May be greatly affected by outliers or skew.
Median:
A common average.
Not as affected by skew or outliers.
Mode: May be used if there is an overwhelming repeat.
Choosing the Best Average

The shape of your data and the existence of any
outliers may help you choose the best average:
Section 2.4
Measures of Variation
Measures of Variation (Spread)

Another important characteristic of quantitative data is how
much the data varies, or is spread out.
The 2 most common methods of measuring spread are:
1. Range
2. Standard deviation and Variance
Range
Range
The difference between the maximum and minimum
data entries in the set.
The data must be quantitative.
Range = (Max. data entry) − (Min. data entry)
Example: Finding the Range

The wait time to see a bank teller is studied at 2 banks.
Bank A has multiple lines, one for each teller.
Bank B has a single wait line for 1st available teller.
5 wait times (in minutes) are sampled from each bank:
Bank A:
5.2 6.2 7.5 8.4 9.2
Bank B:
6.6 6.8 7.5 7.7 7.9
Find the mean, median, and range for each bank.
Solution: Finding the Range

Bank A: Range = ?
Bank B: Range = ?
Note: The range is easy to compute, but only uses 2
values. Do the following 2 sets vary the same?
Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Set B: 1, 10, 10, 10, 10, 10, 10, 10, 10, 10
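One way to check the bank example and the two sets is a short Python sketch. The function name `data_range` is illustrative; results are rounded when printed because decimal wait times are stored as floating-point numbers.

```python
import statistics

def data_range(data):
    """Range = maximum data entry minus minimum data entry."""
    return max(data) - min(data)

bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]   # multiple lines
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]   # single line

for name, times in (("Bank A", bank_a), ("Bank B", bank_b)):
    print(name,
          "mean:", round(statistics.mean(times), 2),
          "median:", statistics.median(times),
          "range:", round(data_range(times), 1))

set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
set_b = [1] + [10] * 9
print(data_range(set_a), data_range(set_b))   # 9 9
```

Sets A and B have the same range even though their entries are spread very differently, which is the limitation the note points at.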
Standard Deviation and Variance

Measures the typical amount data deviates from the
mean.
Sample Variance, s²:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
Sample Standard Deviation, s:
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
Finding Sample Variance & Standard Deviation

1. Find the mean of the sample data set: \( \bar{x} = \frac{\sum x}{n} \)
2. Find the deviation of each entry: \( x - \bar{x} \)
3. Square each deviation: \( (x - \bar{x})^2 \)
4. Add to get the sum of the deviations squared: \( \sum (x - \bar{x})^2 \)
5. Divide by n − 1 to get the sample variance: \( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \)
6. Find the square root to get the sample standard deviation: \( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \)
Find the Standard Deviation and Variance
for Bank A (multi-line)

\[ \bar{x} = \frac{\sum x}{n} = \frac{36.5}{5} = 7.3 \text{ min} \qquad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]

Wait time, x (in min)    Deviation: x − x̄       Squares: (x − x̄)²
5.2                      5.2 − 7.3 = −2.1       (−2.1)² = 4.41
6.2                      6.2 − 7.3 =            (    )² =
7.5                      7.5 − 7.3 =            (    )² =
8.4                      8.4 − 7.3 =            (    )² =
9.2                      9.2 − 7.3 =            (    )² =
Σx = 36.5                                       Σ(x − x̄)² =

s² =
s = √s² =

Round to one more decimal than the data.
Don't round until the end.
Include the appropriate units.
Find the Standard Deviation and Variance
for Bank B (1 wait line)

\[ \bar{x} = \frac{\sum x}{n} = \frac{36.5}{5} = 7.3 \text{ min} \qquad s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]

Wait time, x (in min)    Deviation: x − x̄       Squares: (x − x̄)²
6.6
6.8
7.5
7.7
7.9
Σx = 36.5                                       Σ(x − x̄)² =

s² =
s = √s² =

Round to one more decimal than the data.
Don't round until the end.
Include the appropriate units.
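The six-step procedure can be written out directly in Python and used to check a hand-completed table. This is a sketch that follows the steps literally rather than calling a library routine; the function name is illustrative.

```python
import math

def sample_variance_and_sd(data):
    """Follow the six steps: mean, deviations, squares, sum, divide by n - 1, square root."""
    n = len(data)
    mean = sum(data) / n                          # step 1
    deviations = [x - mean for x in data]         # step 2
    squares = [d ** 2 for d in deviations]        # step 3
    sum_of_squares = sum(squares)                 # step 4
    variance = sum_of_squares / (n - 1)           # step 5
    return variance, math.sqrt(variance)          # step 6

bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]   # wait times in minutes
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]

var_a, sd_a = sample_variance_and_sd(bank_a)
var_b, sd_b = sample_variance_and_sd(bank_b)
print(f"Bank A: s^2 = {var_a:.2f} min^2, s = {sd_a:.2f} min")
print(f"Bank B: s^2 = {var_b:.2f} min^2, s = {sd_b:.2f} min")
```

Both banks have the same mean wait time of 7.3 minutes, so the comparison comes down to spread: the single-line bank has the much smaller standard deviation.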
Sample versus Population
Standard Deviation and Variance

                      Sample Statistics    Population Parameters
Mean                  x̄                    μ
Standard Deviation    s                    σ
Variance              s²                   σ²
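Sample versus Population
Standard Deviation
Note: Unlike x̄ and μ, the formulas for s and σ
are not mathematically the same:
Sample Standard Deviation
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]
Population Standard Deviation
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]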
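Python's standard `statistics` module exposes both formulas: `stdev` divides by n − 1 (sample) and `pstdev` divides by N (population). The sketch below applies both to the Bank A wait times purely to show the numerical difference; treating those five values as a whole population is an assumption made only for illustration.

```python
import statistics

wait_times = [5.2, 6.2, 7.5, 8.4, 9.2]   # Bank A wait times in minutes

s = statistics.stdev(wait_times)         # sample formula: divides by n - 1
sigma = statistics.pstdev(wait_times)    # population formula: divides by N

print(f"sample s = {s:.2f} min, population sigma = {sigma:.2f} min")
```

For the same data, the sample value is always the larger of the two because its denominator is smaller.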
Standard Deviation: Key Points
s ≥ 0 (When would s = 0?)
The standard deviation is a measure of variation of all
values from the mean. The larger s is, the more the
data varies.
The units of the standard deviation s are the same as
the units of the original data values. (The variance
has units².)
The value of the standard deviation s can increase
dramatically with the inclusion of one or more
outliers (data values far away from all others).
Interpreting Standard Deviation

Standard deviation is a measure of the typical amount
an entry deviates from the mean.
The more the entries are spread out, the greater the
standard deviation.
Solution: Using Technology to Find the Standard Deviation
[Figure: calculator output showing the sample mean and sample standard deviation]
Using Technology
The gas mileage of 2 cars is sampled over various
conditions:
Car A: 21.1 21.2 20.8 19.8 23.8 (mpg)
Car B: 25.2 19.1 18.0 24.4 20.3 (mpg)
Which car do you think gets better mpg?
Use a calculator to find the mean and standard deviation
for each to justify your choice.
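If a calculator is not at hand, the same summary can be produced with Python's `statistics` module. This is just one possible tool choice, not the method the slides prescribe.

```python
import statistics

car_a = [21.1, 21.2, 20.8, 19.8, 23.8]   # mpg
car_b = [25.2, 19.1, 18.0, 24.4, 20.3]   # mpg

mean_a, sd_a = statistics.mean(car_a), statistics.stdev(car_a)
mean_b, sd_b = statistics.mean(car_b), statistics.stdev(car_b)

print(f"Car A: mean = {mean_a:.2f} mpg, s = {sd_a:.2f} mpg")
print(f"Car B: mean = {mean_b:.2f} mpg, s = {sd_b:.2f} mpg")
```

The means are nearly equal, so the standard deviations carry most of the information: Car A's mileage is far more consistent across conditions.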
Standard Deviation and Spread

How does s show how much the data varies?
Three methods:
1. Range Rule of Thumb
2. Chebyshev's Theorem
3. The Empirical Rule
The Range Rule of Thumb

Range Rule: For most data sets, the majority of the
data lies within 2 standard deviations of the mean.
Recall: Range = High − Low
Estimate: Range ≈ 4s
Alternatively, if the range is known, you can use the range
rule to estimate the standard deviation:
\[ s \approx \frac{\text{Range}}{4} \]
Using the Range Rule of Thumb

A sample of women's heights has a mean of 64
inches and a standard deviation of 2.5 inches.
Using the range rule, most women fall within
what heights?
What would be an unusual height?
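The range rule is simple enough to sketch as three small functions. The function names and the test heights of 66 and 71 inches are hypothetical, chosen only to illustrate the rule with the mean and standard deviation given above.

```python
def usual_range(mean, s):
    """Interval within 2 standard deviations of the mean (range rule of thumb)."""
    return mean - 2 * s, mean + 2 * s

def is_unusual(x, mean, s):
    """A value is treated as unusual if it falls outside the usual range."""
    low, high = usual_range(mean, s)
    return x < low or x > high

def estimate_s(data_range):
    """Rough estimate of the standard deviation from a known range."""
    return data_range / 4

low, high = usual_range(64, 2.5)
print(low, high)                   # 59.0 69.0
print(is_unusual(66, 64, 2.5))     # False
print(is_unusual(71, 64, 2.5))     # True
```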
Using the Range Rule of Thumb

The sample of Exam Scores used in the class
handout had a mean of 73.6. Which of the
following is most likely the standard deviation of
the sample?
s = 3.6
s = 12.8
s = 74.5
Use the range rule to help justify your choice.
Chebyshev's Theorem
For data with any distribution, the proportion (or
fraction) of any set of data lying within K standard
deviations of the mean is always at least 1 − 1/K², where
K is any number greater than 1.
For K = 2, at least 3/4 (or 75%) of all values lie
within 2 standard deviations of the mean.
For K = 3, at least 8/9 (or about 89%) of all values lie
within 3 standard deviations of the mean.
Using Chebyshev's Theorem

A sample of salaries at an elementary school has a
mean of $32,000 and a standard deviation of $3000.
Use Chebyshev's Theorem to describe how the salaries
are spread out.
Would a salary of $28,000 be unusual?
Would a salary of $45,000 be unusual?
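A short sketch of how the salary example might be worked in Python. The function names are illustrative; the only facts used are the bound 1 − 1/K² and the mean and standard deviation given above.

```python
def chebyshev_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

mean, s = 32_000, 3_000

for k in (2, 3):
    low, high = mean - k * s, mean + k * s
    print(f"At least {chebyshev_bound(k):.1%} of salaries lie between ${low:,} and ${high:,}")

def deviations_from_mean(x):
    """How many standard deviations x lies from the mean (negative means below)."""
    return (x - mean) / s

print(deviations_from_mean(28_000))   # about -1.33: within 2 standard deviations
print(deviations_from_mean(45_000))   # about 4.33: well beyond 3 standard deviations
```

A salary of $28,000 sits inside the interval that holds at least 75% of the data, so it is not unusual, while $45,000 lies more than 4 standard deviations above the mean.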
The Empirical Rule

Empirical (68-95-99.7) Rule
For data sets having a symmetric, bell-shaped distribution:
About 68% of all values fall within 1 standard
deviation of the mean
About 95% of all values fall within 2 standard
deviations of the mean
About 99.7% of all values fall within 3 standard
deviations of the mean
Example: Using the Empirical Rule

A sample of IQs has a symmetric distribution with a mean
of 100 and a standard deviation of 15.
1. Sketch the distribution.
2. 68% of people have an IQ between what 2 values?
3. What percent of people have an IQ between 70 and 130?
4. What percent of people have an IQ between 100 and 115?
5. What percent of people have an IQ above 145?
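The numerical questions can be checked with a few lines of Python. The sketch relies only on the 68-95-99.7 percentages and on symmetry about the mean; the variable names are illustrative.

```python
# Percent of values within k standard deviations of the mean (Empirical Rule).
EMPIRICAL = {1: 68.0, 2: 95.0, 3: 99.7}

mean, s = 100, 15
intervals = {k: (mean - k * s, mean + k * s) for k in EMPIRICAL}

for k, (low, high) in intervals.items():
    print(f"About {EMPIRICAL[k]}% of IQs lie between {low} and {high}")

# By symmetry, half of the middle 68% lies between the mean and one deviation above it.
between_100_and_115 = EMPIRICAL[1] / 2
# Half of what lies beyond 3 standard deviations is in the upper tail (above 145).
above_145 = (100 - EMPIRICAL[3]) / 2

print(f"{between_100_and_115:.1f}% between 100 and 115")
print(f"{above_145:.2f}% above 145")
```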