BIOSTAT Chapter2

Chapter 2: Frequency Distributions
Frequency Distributions
After collecting data, the first task for a
researcher is to organize and simplify the
data so that it is possible to get a general
overview of the results. This is the goal of
descriptive statistical techniques.
One method for simplifying and organizing data
is to construct a frequency distribution.
Frequency Distributions (cont.)
A frequency distribution is an organized tabulation showing exactly how many
individuals are located in each category on the scale of measurement.
A frequency distribution presents an
organized picture of the entire set of
scores, and it shows where each
individual is located relative to others in
the distribution.
Frequency Distributions (cont.)
A frequency distribution can be presented as a table that organizes data values
into classes or intervals, along with the number of values that fall in each
class (the frequency, f).
1. Ungrouped Frequency Distribution: for data sets with few different values.
Each value is in its own class.
2. Grouped Frequency Distribution: for data sets with many different values,
which are grouped together into classes.
Grouped and Ungrouped Frequency Distributions

Ungrouped (Courses Taken; Frequency, f):
25, 38, 217, 1462, 932, 15

Grouped
Age of Voters   Frequency, f
18-30           202
31-42           508
43-54           620
55-66           413
67-78           158
79-90           32
Ungrouped Frequency Distributions

Number of Peas in a Pea Pod
Sample Size: 50
5
Peas per pod   Freq, f   Peas per pod   Freq, f
1
18
12
Frequency Distribution Tables

A frequency distribution table consists of at least
two columns - one listing categories on the scale of
measurement (X) and another for frequency (f).
In the X column, values are listed from the highest
to lowest, without skipping any.
For the frequency column, tallies are determined
for each value (how often each X value occurs
in the data set). These tallies are the frequencies
for each X value.
The sum of the frequencies should equal n.
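As a minimal sketch of these rules, the snippet below builds an ungrouped frequency table in Python. The scores are hypothetical and chosen only for illustration; they are not data from this chapter. Note how the X column runs from the highest to the lowest value without skipping any, so a value that never occurs still gets a row with f = 0.

```python
from collections import Counter

# Hypothetical scores, used only to illustrate the table layout.
scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]

counts = Counter(scores)

# List every X value from highest to lowest without skipping any;
# values that never occur (here X = 5) appear with f = 0.
table = [(x, counts.get(x, 0)) for x in range(max(scores), min(scores) - 1, -1)]

print(" X   f")
for x, f in table:
    print(f"{x:2d}  {f:2d}")

# The frequencies should add up to n, the number of scores.
n = sum(f for _, f in table)
print("sum of f =", n)
```

The final line is a quick check that the frequencies add up to n.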
Grouped Frequency Distribution
Sometimes, however, a set of scores covers a wide range of values. In these
situations, a list of all the X values would be quite long - too long to be a
simple presentation of the data.
To remedy this situation, a grouped frequency distribution table is used.
Grouped Frequency Distribution (cont.)
In a grouped table, the X column lists
groups of scores, called class intervals,
rather than individual values.
These intervals all have the same width,
usually a simple number such as 2, 5, 10,
and so on.
Each interval begins with a value that is a
multiple of the interval width. The interval
width is selected so that the table will have
approximately ten intervals.
Key Concepts:
Data in their original form and structure are called raw data or ungrouped data.
Example: Consider the following data on the testosterone serum levels (scores)
of 40 women, measured in mg/dl.
79  83  84  62  62  43  72  48  46  59
93  64  59  32  54  45  55  45  76  72
40  51  51  72  83  49  62  85  74  40
49  65  38  55  77  63  38  43  63  69
To construct a frequency distribution of the given raw data, we first find the
highest serum level and prepare a column of these levels, beginning from the
highest value and ending at the lowest one. Since the highest serum level is 93
and the lowest is 32, we have:
Serum Level   Tally     Serum Level   Tally
93            I         62            III
85            I         59            II
84            I         55            II
83            II        54            I
79            I         51            II
77            I         49            II
76            I         48            I
74            I         46            I
72            III       45            II
69            I         43            II
65            I         40            II
64            I         38            II
63            II        32            I
When raw data are organized into classes, they take on the nature of grouped
data. The table that results from organizing data into groups is called a
frequency distribution table (FDT).
Example:
The following presents a frequency distribution table of
the grouped data of the urine amylase (scores) of 15
patients in amylase units/hour.
SCORES
FREQUENCY
10-19
20-29
30-39
40-49
50-59
1
15
Components of a Frequency Table

Class Interval - the numbers defining the class; it consists of the end numbers,
called the class limits, namely the lower limit and the upper limit.
Class frequency - the number of observations falling in the class.
Class Boundaries - the so-called true class limits, classified as:
Lower Class Boundary (LCB) - the middle value between the lower class limit of
the class and the upper class limit of the preceding class.
Upper Class Boundary (UCB) - the middle value between the upper class limit of
the class and the lower class limit of the next class.
Class size - the difference between two consecutive upper limits or two
consecutive lower limits.
Class mark (CM) - the midpoint or middle value of a class interval.
Cumulative frequency - the accumulated frequencies of successive classes.
Greater than CF (>CF) - the number of observations greater than the LCB.
Less than CF (<CF) - the number of observations less than the UCB.
Steps in Constructing an FDT:
Step 1) Determine the number of classes. For a first approximation, it is
suggested to use the STURGES APPROXIMATION FORMULA:
K = 1 + 3.322 log n
where K = approximate number of classes
n = number of observations (log is the base-10 logarithm)
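A short sketch of the Sturges approximation in Python, applied to the 40 serum levels used in this chapter's example. The function name sturges_classes is an illustrative choice, and the logarithm is taken as base 10, which is what the constant 3.322 (about 1/log10(2)) implies.

```python
import math

def sturges_classes(n: int) -> float:
    """Approximate number of classes: K = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

n = 40                      # number of observations in the serum-level example
k_exact = sturges_classes(n)
k = round(k_exact)          # the number of classes is a whole number
print(f"K = {k_exact:.3f}, rounded to {k}")
```

For n = 40 this gives K of about 6.32, so 6 classes is the first approximation.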
Example: We now construct the FDT of the testosterone serum levels of 40 women,
as shown in the raw data.
Step 2) Determine the range R:
R = maximum value - minimum value
R = 93 - 32
R = 61
Step 3) Determine the approximate class size C using the formula
C = R/K
where R = range and K = number of classes from the Sturges Approximation
Formula.
Note: It is usually convenient to round C to the nearest whole number.
For n = 40, K = 1 + 3.322 log 40 ≈ 6.32, which rounds to K = 6, so
C = R/K
C = 61/6
C = 10.167
C = 10
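The arithmetic of Steps 2 and 3 can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and K = 6 is taken as the rounded Sturges value for n = 40.

```python
maximum, minimum = 93, 32   # highest and lowest serum levels in the example
K = 6                       # rounded Sturges value for n = 40

R = maximum - minimum       # Step 2: range
C_exact = R / K             # Step 3: approximate class size
C = round(C_exact)          # rounded to the nearest whole number

print("R =", R)
print("C =", round(C_exact, 3), "rounded to", C)
```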
Step 4) Determine the lowest class interval (the first class). This class should
include the minimum value in the data set. For uniformity, let us agree that,
for our purposes, the lower limit of the lowest class interval is the minimum
value itself.
Since the minimum value is 32 and C = 10, the lowest class is 32-41.
Step 5) Determine all class limits by adding the class size C to the limits of
the previous class. Here the classes are constructed by adding 10 to each class
limit. Thus we have:
Lower limit   Upper limit
32            41
42            51
52            61
62            71
72            81
82            91
92            101
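The class limits can also be generated programmatically. The sketch below assumes whole-number data, so each class of size C runs from its lower limit to lower + C - 1; the helper name class_limits is hypothetical.

```python
def class_limits(lowest: int, size: int, maximum: int):
    """Return (lower, upper) limits of consecutive equal-size classes,
    starting at `lowest` and continuing until `maximum` is covered."""
    limits = []
    lower = lowest
    while lower <= maximum:
        limits.append((lower, lower + size - 1))
        lower += size           # add the class size to get the next class
    return limits

classes = class_limits(lowest=32, size=10, maximum=93)
for lower, upper in classes:
    print(f"{lower}-{upper}")
```

With C = 10 and a start at 32, seven classes are needed to reach the maximum of 93, one more than the Sturges estimate of 6, which is only a first approximation.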
Step 6) Tally the scores or observations falling in each class.
Classes    Tally          Frequency
32-41      IIIII          5
42-51      IIIII IIIII    10
52-61      IIIII          5
62-71      IIIII III      8
72-81      IIIII II       7
82-91      IIII           4
92-101     I              1
                          N=40
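The tally can be reproduced directly from the 40 raw serum levels. The following is a sketch rather than a prescribed method: each value is assigned to its class by integer division, using the class size 10 and the lowest limit 32 from the earlier steps.

```python
# The 40 raw serum levels from the example.
serum = [79, 83, 84, 62, 62, 43, 72, 48, 46, 59,
         93, 64, 59, 32, 54, 45, 55, 45, 76, 72,
         40, 51, 51, 72, 83, 49, 62, 85, 74, 40,
         49, 65, 38, 55, 77, 63, 38, 43, 63, 69]

size, lowest = 10, 32

freq = {}
for x in serum:
    # Lower limit of the class that contains x.
    lower = lowest + ((x - lowest) // size) * size
    key = (lower, lower + size - 1)
    freq[key] = freq.get(key, 0) + 1

for (lower, upper), f in sorted(freq.items()):
    print(f"{lower}-{upper}: {'I' * f} ({f})")

print("N =", sum(freq.values()))
```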
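The following table presents the complete frequency distribution table,
indicating the class boundaries, the class marks, and the cumulative
frequencies.
Classes   f    LCB    UCB     CM     <CF   >CF
32-41     5    31.5   41.5    36.5   5     40
42-51     10   41.5   51.5    46.5   15    35
52-61     5    51.5   61.5    56.5   20    25
62-71     8    61.5   71.5    66.5   28    20
72-81     7    71.5   81.5    76.5   35    12
82-91     4    81.5   91.5    86.5   39    5
92-101    1    91.5   101.5   96.5   40    1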
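One way to compute the columns of the complete FDT is sketched below. It assumes whole-number class limits, so each boundary lies 0.5 beyond the corresponding limit (the middle value between adjacent classes), and it takes the class frequencies as tallied from the 40 raw serum levels.

```python
classes = [(32, 41), (42, 51), (52, 61), (62, 71), (72, 81), (82, 91), (92, 101)]
freqs = [5, 10, 5, 8, 7, 4, 1]   # tallied from the 40 raw serum levels

n = sum(freqs)
rows = []
less_cf = 0
for (lower, upper), f in zip(classes, freqs):
    lcb = lower - 0.5            # lower class boundary (assumes a gap of 1 between classes)
    ucb = upper + 0.5            # upper class boundary
    cm = (lower + upper) / 2     # class mark (midpoint)
    greater_cf = n - less_cf     # observations greater than the LCB
    less_cf += f                 # observations less than the UCB
    rows.append((lower, upper, f, lcb, ucb, cm, less_cf, greater_cf))

print("Class     f    LCB    UCB     CM   <CF  >CF")
for lower, upper, f, lcb, ucb, cm, lcf, gcf in rows:
    print(f"{lower:3d}-{upper:<4d} {f:3d} {lcb:6.1f} {ucb:6.1f} {cm:6.1f} {lcf:5d} {gcf:4d}")
```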
Frequency Distribution Graphs
In a frequency distribution graph, the score
categories (X values) are listed on the X axis
and the frequencies are listed on the Y axis.
When the score categories consist of
numerical scores from an interval or ratio
scale, the graph should be either a
histogram or a polygon.
Histograms
In a histogram, a bar is centered above each
score (or class interval) so that the height of
the bar corresponds to the frequency and the
width extends to the real limits, so that
adjacent bars touch.
Polygons
In a polygon, a dot is centered above each
score so that the height of the dot
corresponds to the frequency. The dots are
then connected by straight lines.
An additional line is drawn at each end to bring
the graph back to a zero frequency.
Bar graphs
When the score categories (X values) are
measurements from a nominal or an
ordinal scale, the graph should be a bar
graph.
A bar graph is just like a histogram
except that gaps or spaces are left
between adjacent bars.
Relative frequency
Many populations are so large that it is
impossible to know the exact number of
individuals (frequency) for any specific
category.
In these situations, population distributions
can be shown using relative frequency
instead of the absolute number of individuals
for each category.
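As a small illustration, relative frequency is simply each frequency divided by the total, p = f/n. The sketch below reuses the class frequencies of the serum-level example purely as convenient numbers; the slide itself is about populations in general.

```python
classes = ["32-41", "42-51", "52-61", "62-71", "72-81", "82-91", "92-101"]
freqs = [5, 10, 5, 8, 7, 4, 1]   # class frequencies of the serum-level example

n = sum(freqs)
rel = [f / n for f in freqs]     # relative frequency (proportion) of each class

for label, f, p in zip(classes, freqs, rel):
    print(f"{label:7s} f = {f:2d}  relative frequency = {p:.3f} ({p:.1%})")

# Proportions always add up to 1 (100%).
print("total =", round(sum(rel), 10))
```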
Smooth curve
If the scores in the population are measured on
an interval or ratio scale, it is customary to
present the distribution as a smooth curve
rather than a jagged histogram or polygon.
The smooth curve emphasizes the fact that the
distribution is not showing the exact frequency
for each category.
Frequency distribution graphs
Frequency distribution graphs are useful
because they show the entire set of scores.
At a glance, you can determine the highest
score, the lowest score, and where the scores
are centered.
The graph also shows whether the scores are
clustered together or scattered over a wide
range.
Shape
A graph shows the shape of the distribution.
A distribution is symmetrical if the left side
of the graph is (roughly) a mirror image of the
right side.
One example of a symmetrical distribution is
the bell-shaped normal distribution.
On the other hand, distributions are skewed
when scores pile up on one side of the
distribution, leaving a "tail" of a few extreme
values on the other side.
Positively and Negatively Skewed Distributions
In a positively skewed distribution, the scores tend to pile up on the left
side of the distribution with the tail tapering off to the right.
In a negatively skewed distribution, the scores tend to pile up on the right
side and the tail points to the left.
Percentiles, Percentile Ranks, and Interpolation
The relative location of individual scores within a distribution can be
described by percentiles and percentile ranks.
The percentile rank for a particular X value
is the percentage of individuals with scores
equal to or less than that X value.
When an X value is described by its rank, it is
called a percentile.
Percentiles, Percentile Ranks, and Interpolation (cont.)
To find percentiles and percentile ranks, two new columns are placed in the
frequency distribution table: one for cumulative frequency (cf) and the other
for cumulative percentage (c%).
Each cumulative percentage identifies the percentile rank for the upper real
limit of the corresponding score or class interval. When scores or percentages
do not correspond to upper real limits or cumulative percentages, you must use
interpolation to determine the corresponding ranks and percentiles.
Interpolation is a mathematical process based on the assumption that the scores
and the percentages change in a regular, linear fashion as you move through an
interval from one end to the other.
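A minimal sketch of linear interpolation between upper real limits, using the serum-level classes as an illustration. The cumulative percentages below come from the class frequencies 5, 10, 5, 8, 7, 4, 1 (n = 40); the real limits assume whole-number scores, and the function names are illustrative.

```python
# (real limit, cumulative percentage) pairs: the lower real limit of the
# first class at 0%, then the upper real limit of each class.
points = [(31.5, 0.0), (41.5, 12.5), (51.5, 37.5), (61.5, 50.0),
          (71.5, 70.0), (81.5, 87.5), (91.5, 97.5), (101.5, 100.0)]

def percentile_rank(x: float) -> float:
    """Percentile rank of score x, interpolated linearly within its interval."""
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return p0 + (x - x0) / (x1 - x0) * (p1 - p0)
    raise ValueError("score outside the range of the table")

def percentile(p: float) -> float:
    """Score at percentile rank p, interpolated linearly within its interval."""
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= p <= p1:
            return x0 + (p - p0) / (p1 - p0) * (x1 - x0)
    raise ValueError("rank must be between 0 and 100")

rank_65 = percentile_rank(65)   # 65 is 35% of the way through 61.5-71.5
p80 = percentile(80)            # 80% lies between 70% and 87.5%
print(f"percentile rank of X = 65: {rank_65:.1f}%")
print(f"80th percentile: X = {p80:.2f}")
```

For example, X = 65 lies 3.5 points into the 10-point interval from 61.5 to 71.5, so its rank is 35% of the way from 50% to 70%, about 57%.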
Stem-and-Leaf Displays
A stem-and-leaf display provides a very
efficient method for obtaining and
displaying a frequency distribution.
Each score is divided into a stem
consisting of the first digit or digits, and a
leaf consisting of the final digit.
Finally, you go through the list of scores,
one at a time, and write the leaf for each
score beside its stem.
The resulting display provides an
organized picture of the entire distribution.
The number of leaves beside each stem
corresponds to the frequency, and the
individual leaves identify the individual
scores.
Descriptive Statistics
Sample Illustration:
Which Group is Smarter?
Class A--IQs of 13 Students
Class B--IQs of 13 Students
102
115
127
162
128
109
131
103
131
89
96
111
80
109
93
87
98
106
140
93
110
119
97
120
105
109
Each individual may be different. If you try to understand a group by
remembering the qualities of each member, you become overwhelmed and fail to
understand the group.
Descriptive Statistics
Which group is smarter now?
Class A--Average IQ: 110.54
Class B--Average IQ: 110.23
They're roughly the same!
With a summary descriptive statistic, it is much easier to answer our question.
Other Graphs
Besides histograms, there are other methods of graphing quantitative data:
Stem and Leaf Plots
Dot Plots
Time Series
Stem and Leaf Plots
A stem-and-leaf plot represents data by separating each data value into two
parts: the stem (such as the leftmost digit) and the leaf (such as the
rightmost digit).
Larson/Farber 4th ed.
Constructing Stem and Leaf Plots
Split each data value at the same place value to form the stem and a leaf
(aim for 5-20 stems).
Arrange all possible stems vertically so there are no missing stems.
Write each leaf to the right of its stem, in order.
Create a key so the data can be recreated.
Variations of stem plots:
1. Split stems
2. Back-to-back stem plots
Constructing a Stem-and-Leaf Plot
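The construction steps can be sketched in Python as follows. This example is an illustration, not the slide's original figure: it reuses the 40 serum levels from earlier in the chapter and splits each value at the tens place, so the stem is the tens digit and the leaf is the units digit. The function name stem_and_leaf is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves} for non-negative integers,
    splitting each value at the tens place."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting first keeps leaves in order
        stems[v // 10].append(v % 10)
    # List every stem between the smallest and largest so none are missing.
    low, high = min(stems), max(stems)
    return {s: stems.get(s, []) for s in range(low, high + 1)}

serum = [79, 83, 84, 62, 62, 43, 72, 48, 46, 59,
         93, 64, 59, 32, 54, 45, 55, 45, 76, 72,
         40, 51, 51, 72, 83, 49, 62, 85, 74, 40,
         49, 65, 38, 55, 77, 63, 38, 43, 63, 69]

plot = stem_and_leaf(serum)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 3 | 2 = 32")
```

The number of leaves on each row equals the frequency of the corresponding ten-point interval, so the display doubles as a sideways histogram.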
Dot Plots
A dot plot consists of a graph in which each data value is plotted as a point
along a scale of values.
Figure 2-5
Time Series (Paired data)
A time series data set is composed of quantitative entries taken at regular
intervals over a period of time, e.g., the amount of precipitation measured
each day for one month.
Use a time series chart to graph the data, with time on the horizontal axis
and the quantitative data on the vertical axis.
Time-Series Graph
Number of Screens at Drive-In Movie Theaters
Figure 2-8
Graphing Qualitative Data Sets
Pie Chart
A circle is divided into sectors that represent categories.
Pareto Chart
A vertical bar graph in which the height of each bar represents frequency or
relative frequency. The categories are on the horizontal axis and frequency is
on the vertical axis.
Constructing a Pie Chart
Find the total sample size.
Convert the frequencies to relative frequencies (percent).

Marital Status   Frequency, f (in millions)   Relative frequency (%)
Never Married    55.3                         55.3/219.7 ≈ 0.25 or 25%
Married          127.7                        127.7/219.7
Widowed          13.9                         13.9/219.7
Divorced         22.8                         22.8/219.7
Total            219.7
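The conversion can be sketched in Python using the marital-status frequencies above. The central angle of each sector (relative frequency times 360 degrees) is an added detail that is not shown on the slide, but it is the usual way to size the sectors when drawing a pie chart by hand.

```python
# Frequencies in millions, from the marital-status table.
freq = {"Never Married": 55.3, "Married": 127.7, "Widowed": 13.9, "Divorced": 22.8}

total = sum(freq.values())                       # total sample size
rel = {cat: f / total for cat, f in freq.items()}
angles = {cat: p * 360 for cat, p in rel.items()}  # sector angle in degrees

print(f"Total: {total:.1f}")
for cat in freq:
    print(f"{cat:14s} {rel[cat]:.2f} or {rel[cat]:.0%}  sector angle {angles[cat]:.0f} degrees")
```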
Constructing Pareto Charts

Create a bar for each category, where the height
of the bar can represent frequency or relative
frequency.
The bars are often positioned in order of
decreasing height, with the tallest bar positioned
at the left.
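A small sketch of the ordering step, reusing the marital-status frequencies from the pie chart example purely for illustration. The text bars stand in for a real chart, and the scale of one character per 5 million is an arbitrary choice.

```python
# Frequencies in millions, reused from the pie chart example.
freq = {"Never Married": 55.3, "Married": 127.7, "Widowed": 13.9, "Divorced": 22.8}

# Order the categories by decreasing frequency, tallest bar first.
ordered = sorted(freq.items(), key=lambda item: item[1], reverse=True)

for category, f in ordered:
    bar = "#" * round(f / 5)     # one '#' per 5 million (arbitrary scale)
    print(f"{category:14s} {bar} {f}")
```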
