
Chapter 2: Frequency Distributions

Frequency Distributions
After collecting data, the first task for a
researcher is to organize and simplify the
data so that it is possible to get a general
overview of the results. This is the goal of
descriptive statistical techniques.
One method for simplifying and organizing data
is to construct a frequency distribution.

Frequency Distributions (cont.)

A frequency distribution is an organized
tabulation showing exactly how many
individuals are located in each category on
the scale of measurement.
A frequency distribution presents an
organized picture of the entire set of
scores, and it shows where each
individual is located relative to others in
the distribution.

FREQUENCY DISTRIBUTIONS (CONT.)
A frequency distribution is a table that organizes data values
into classes or intervals, along with the number of values that
fall in each class (frequency, f).

1. Ungrouped Frequency Distribution: for
data sets with few different values. Each
value is in its own class.

2. Grouped Frequency Distribution: for data
sets with many different values, which
are grouped together in the classes.

Grouped and Ungrouped Frequency Distributions

Ungrouped (Courses Taken against Frequency, f):
frequencies 25, 38, 217, 1462, 932, 15

Grouped
Age of Voters    Frequency, f
18-30            202
31-42            508
43-54            620
55-66            413
67-78            158
79-90            32

Ungrouped Frequency Distributions

Number of Peas in a Pea Pod
Sample Size: 50
[Table: Peas per pod against Freq, f]

Frequency Distribution Tables


A frequency distribution table consists of at least
two columns - one listing categories on the scale of
measurement (X) and another for frequency (f).
In the X column, values are listed from the highest
to lowest, without skipping any.
For the frequency column, tallies are determined
for each value (how often each X value occurs
in the data set). These tallies are the frequencies
for each X value.
The sum of the frequencies should equal n.
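
The construction just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name frequency_table and the quiz scores are hypothetical, and integer scores are assumed.

```python
from collections import Counter

def frequency_table(scores):
    """Return (X, f) rows listed from the highest X to the lowest, skipping no value."""
    counts = Counter(scores)
    highest, lowest = max(scores), min(scores)
    # Every integer value between the extremes gets a row, even when f = 0.
    return [(x, counts.get(x, 0)) for x in range(highest, lowest - 1, -1)]

# Hypothetical quiz scores, invented for illustration only.
scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8]
table = frequency_table(scores)

print(" X  f")
for x, f in table:
    print(f"{x:>2}  {f}")

# The frequencies should add up to n, the number of scores.
n = sum(f for _, f in table)
print("n =", n)
```

Note that the value 5 appears in the table with f = 0, because the X column does not skip any value between the highest and lowest score.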

Grouped Frequency Distribution

Sometimes, however, a set of scores covers
a wide range of values. In these situations,
a list of all the X values would be quite long
- too long to be a simple presentation of
the data.
To remedy this situation, a grouped
frequency distribution table is used.

Grouped Frequency Distribution (cont.)

In a grouped table, the X column lists
groups of scores, called class intervals,
rather than individual values.
These intervals all have the same width,
usually a simple number such as 2, 5, 10,
and so on.
Each interval begins with a value that is a
multiple of the interval width. The interval
width is selected so that the table will have
approximately ten intervals.
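
As a rough sketch of these rules, the following Python groups integer scores into equal-width intervals whose lower limits are multiples of the width. The function name grouped_table and the exam scores are hypothetical, chosen only to illustrate the idea.

```python
def grouped_table(scores, width):
    """Group integer scores into equal-width class intervals.

    Each interval starts at a multiple of `width`; rows run from the
    highest interval down to the lowest.
    """
    low = (min(scores) // width) * width    # lowest lower limit, a multiple of width
    high = (max(scores) // width) * width   # lower limit of the highest interval
    rows = []
    for start in range(high, low - 1, -width):
        end = start + width - 1
        f = sum(start <= x <= end for x in scores)
        rows.append((f"{start}-{end}", f))
    return rows

# Hypothetical exam scores, invented for illustration only.
exam = [53, 58, 61, 64, 67, 70, 72, 75, 75, 78, 81, 84, 88, 91, 94]
rows = grouped_table(exam, 5)

for interval, f in rows:
    print(f"{interval:<6} {f}")
```

With a width of 5 these scores fall into nine intervals, from 90-94 down to 50-54, which is close to the suggested ten.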

Key Concepts:

Data in its original form and structure
are called raw data or ungrouped data.
Example: Consider the following data
on 40 women's testosterone serum levels
(scores) measured in mg/dl.

79  83  84  62  62  43  72  48  46  59
93  64  59  32  54  45  55  45  76  72
40  51  51  72  83  49  62  85  74  40
49  65  38  55  77  63  38  43  63  69

To construct a frequency distribution of the given raw data,
we first find the highest serum level and prepare a column of
these levels, beginning from the highest value and ending at the
lowest one. Since the highest serum level is 93 and the lowest is
32, we have:

Serum Level  Tally    Serum Level  Tally
93           I        62           III
85           I        59           II
84           I        55           II
83           II       54           I
79           I        51           II
77           I        49           II
76           I        48           I
74           I        46           I
72           III      45           II
69           I        43           II
65           I        40           II
64           I        38           II
63           II       32           I

When these data are placed into a system in which they
are organized, they take on the nature of
grouped data. The table produced by organizing data into
groups is called a frequency distribution table
(FDT).
Example:
The following presents a frequency distribution table of
the grouped data of the urine amylase (scores) of 15
patients in amylase units/hour.

SCORES     FREQUENCY
10-19
20-29
30-39
40-49
50-59      1
Total      15

Components of a Frequency Table

Class Interval - the numbers defining the
class; it consists of the end numbers, called the class
limits, namely the lower limit and the upper limit.
Class Frequency - shows the number of observations
falling in the class.
Class Boundaries - the so-called true class
limits, classified as:
   Lower Class Boundary (LCB) - defined as the
   middle value between the lower class limit of the class
   and the upper class limit of the preceding class.
   Upper Class Boundary (UCB) - defined as the
   middle value between the upper class limit of
   the class and the lower class limit of the next class.
Class Size - the difference between two consecutive
upper limits or two consecutive lower limits.
Class Mark (CM) - the midpoint or middle value of a
class interval.
Cumulative Frequency (CF) - shows the accumulated
frequencies of successive classes:
   Greater than CF (>CF) - shows the number
   of observations greater than the LCB.
   Less than CF (<CF) - shows the number of
   observations less than the UCB.

Steps in Constructing an FDT:

Step 1) Determine the number of classes. As a first
approximation, it is suggested to use the STURGES
APPROXIMATION FORMULA.

K = 1 + 3.322 log n
where K = approximate number of classes
n = number of observations
log = common (base 10) logarithm

Example: We now construct the FDT
of the testosterone serum level of 40
women as shown in the raw data. With n = 40:
K = 1 + 3.322 log 40
K = 1 + 3.322(1.602)
K = 6.32
K = 6

Step 2) Determine the range R:
where R = maximum value - minimum value
R = max - min
R = 93 - 32
R = 61

Step 3) Determine the approximate class size C using
the formula
C = R/K
where R = range and K = number of classes from the
Sturges Approximation Formula
Note: It is usually convenient to round off C to the
nearest whole number.
C = R/K
C = 61/6
C = 10.167
C = 10
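
Steps 1 to 3 can be checked with a short Python sketch. The variable names are illustrative, and rounding K and C to the nearest whole number is the convention the text suggests rather than a fixed rule.

```python
import math

# The 40 testosterone serum levels from the raw data.
serum = [79, 83, 84, 62, 62, 43, 72, 48, 46, 59,
         93, 64, 59, 32, 54, 45, 55, 45, 76, 72,
         40, 51, 51, 72, 83, 49, 62, 85, 74, 40,
         49, 65, 38, 55, 77, 63, 38, 43, 63, 69]

n = len(serum)                    # number of observations
k = 1 + 3.322 * math.log10(n)     # Sturges' approximation (base 10 logarithm)
K = round(k)                      # approximate number of classes
R = max(serum) - min(serum)       # range
C = round(R / K)                  # approximate class size

print(f"n = {n}, K = {k:.3f} -> {K}, R = {R}, C = {R / K:.3f} -> {C}")
```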

Step 4) Determine the lowest class interval (or the
first class). This class should include the minimum
value in the data set. For uniformity, let us agree
that, for our purposes, the lower limit of the lowest
class interval starts at the minimum value.
Thus the lowest class is 32-41.

Step 5) Determine all class limits by adding the
class size C to the limits of the previous class.
Here the classes are constructed by adding 10 to each class
limit. Thus we have:

32 - 41
42 - 51
52 - 61
62 - 71
72 - 81
82 - 91
92 - 101

Step 6) Tally the scores or observations falling in each class.

Classes     Tally          Frequency
32 - 41     IIIII          5
42 - 51     IIIII-IIIII    10
52 - 61     IIIII          5
62 - 71     IIIII-III      8
72 - 81     IIIII-II       7
82 - 91     IIII           4
92 - 101    I              1
                           N=40
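
Steps 4 to 6 can be reproduced directly from the 40 raw values. The sketch below is illustrative only: it starts the first class at the minimum value, uses the class size C = 10 from Step 3, and counts the scores in each class.

```python
# The 40 testosterone serum levels from the raw data.
serum = [79, 83, 84, 62, 62, 43, 72, 48, 46, 59,
         93, 64, 59, 32, 54, 45, 55, 45, 76, 72,
         40, 51, 51, 72, 83, 49, 62, 85, 74, 40,
         49, 65, 38, 55, 77, 63, 38, 43, 63, 69]

C = 10                 # class size from Step 3
lower = min(serum)     # Step 4: the first class starts at the minimum value
classes = []
while lower <= max(serum):
    upper = lower + C - 1                           # Step 5: upper class limit
    f = sum(lower <= x <= upper for x in serum)     # Step 6: tally the class
    classes.append((lower, upper, f))
    lower += C

for lo, hi, f in classes:
    print(f"{lo:>3} - {hi:<4} {'I' * f:<11} {f}")

N = sum(f for _, _, f in classes)
print("N =", N)
```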

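The following table presents the complete frequency
distribution table, indicating the class boundaries, the
class marks, as well as the cumulative frequencies.

Classes     f     Class Boundaries    CM      <CF    >CF
32 - 41     5     31.5 - 41.5         36.5    5      40
42 - 51     10    41.5 - 51.5         46.5    15     35
52 - 61     5     51.5 - 61.5         56.5    20     25
62 - 71     8     61.5 - 71.5         66.5    28     20
72 - 81     7     71.5 - 81.5         76.5    35     12
82 - 91     4     81.5 - 91.5         86.5    39     5
92 - 101    1     91.5 - 101.5        96.5    40     1
N = 40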
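
The class boundaries, class marks and cumulative frequencies follow mechanically from the class limits and frequencies. The Python below is a minimal sketch under two assumptions: the frequencies are those counted from the 40 raw serum values, and consecutive class limits differ by 1, so each boundary lies 0.5 beyond the limit.

```python
# (lower limit, upper limit, frequency), counted from the 40 raw serum values.
classes = [(32, 41, 5), (42, 51, 10), (52, 61, 5), (62, 71, 8),
           (72, 81, 7), (82, 91, 4), (92, 101, 1)]

n = sum(f for _, _, f in classes)
rows = []
less_cf = 0
for lower, upper, f in classes:
    lcb = lower - 0.5                 # midway between this lower limit and the previous upper limit
    ucb = upper + 0.5                 # midway between this upper limit and the next lower limit
    mark = (lower + upper) / 2        # class mark (midpoint)
    less_cf += f                      # <CF: observations below this class's UCB
    greater_cf = n - (less_cf - f)    # >CF: observations above this class's LCB
    rows.append((lcb, ucb, mark, less_cf, greater_cf))

print("Classes    f   Boundaries     CM     <CF  >CF")
for (lower, upper, f), (lcb, ucb, mark, lcf, gcf) in zip(classes, rows):
    print(f"{lower:>3}-{upper:<4} {f:>3}   {lcb:>5}-{ucb:<6} {mark:>5}  {lcf:>4} {gcf:>4}")
```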

Frequency Distribution Graphs
In a frequency distribution graph, the score
categories (X values) are listed on the X axis
and the frequencies are listed on the Y axis.
When the score categories consist of
numerical scores from an interval or ratio
scale, the graph should be either a
histogram or a polygon.

Histograms
In a histogram, a bar is centered above each
score (or class interval) so that the height of
the bar corresponds to the frequency and the
width extends to the real limits, so that
adjacent bars touch.

Polygons
In a polygon, a dot is centered above each
score so that the height of the dot
corresponds to the frequency. The dots are
then connected by straight lines.
An additional line is drawn at each end to bring
the graph back to a zero frequency.

Bar graphs
When the score categories (X values) are
measurements from a nominal or an
ordinal scale, the graph should be a bar
graph.
A bar graph is just like a histogram
except that gaps or spaces are left
between adjacent bars.


Relative frequency
Many populations are so large that it is
impossible to know the exact number of
individuals (frequency) for any specific
category.
In these situations, population distributions
can be shown using relative frequency
instead of the absolute number of individuals
for each category.


Smooth curve
If the scores in the population are measured on
an interval or ratio scale, it is customary to
present the distribution as a smooth curve
rather than a jagged histogram or polygon.
The smooth curve emphasizes the fact that the
distribution is not showing the exact frequency
for each category.

Frequency distribution graphs
Frequency distribution graphs are useful
because they show the entire set of scores.
At a glance, you can determine the highest
score, the lowest score, and where the scores
are centered.
The graph also shows whether the scores are
clustered together or scattered over a wide
range.


Shape
A graph shows the shape of the distribution.
A distribution is symmetrical if the left side
of the graph is (roughly) a mirror image of the
right side.
One example of a symmetrical distribution is
the bell-shaped normal distribution.
On the other hand, distributions are skewed
when scores pile up on one side of the
distribution, leaving a "tail" of a few extreme
values on the other side.


Positively and Negatively
Skewed Distributions
In a positively skewed distribution, the
scores tend to pile up on the left side of
the distribution with the tail tapering off to
the right.
In a negatively skewed distribution, the
scores tend to pile up on the right side and
the tail points to the left.

Percentiles, Percentile Ranks,
and Interpolation

The relative location of individual scores
within a distribution can be described by
percentiles and percentile ranks.
The percentile rank for a particular X value
is the percentage of individuals with scores
equal to or less than that X value.
When an X value is described by its rank, it is
called a percentile.

Percentiles, Percentile Ranks,
and Interpolation (cont.)

To find percentiles and percentile ranks, two new
columns are placed in the frequency distribution table:
One is for cumulative frequency (cf) and the other is for
cumulative percentage (c%).
Each cumulative percentage identifies the percentile
rank for the upper real limit of the corresponding score
or class interval. When scores or percentages do not
correspond to upper real limits or cumulative
percentages, you must use interpolation to determine
the corresponding ranks and percentiles. Interpolation
is a mathematical process based on the assumption that
the scores and the percentages change in a regular,
linear fashion as you move through an interval from one
end to the other.
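
Linear interpolation between real limits can be sketched as follows. The table of real limits and cumulative percentages is hypothetical (intervals 0-4, 5-9, 10-14, 15-19 with f = 2, 4, 8, 6), and the function names percentile_rank and percentile are illustrative.

```python
# (real limit, cumulative percentage) pairs, lowest first.
# The first pair is the lower real limit of the lowest interval, at 0%.
# Hypothetical table: intervals 0-4, 5-9, 10-14, 15-19 with f = 2, 4, 8, 6.
points = [(-0.5, 0.0), (4.5, 10.0), (9.5, 30.0), (14.5, 70.0), (19.5, 100.0)]

def percentile_rank(x):
    """Percentile rank of score x, interpolated linearly within its interval."""
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            fraction = (x - x0) / (x1 - x0)    # how far into the interval x lies
            return p0 + fraction * (p1 - p0)   # same fraction of the percentage span
    raise ValueError("score lies outside the table")

def percentile(p):
    """Score at percentile rank p, interpolated linearly within its interval."""
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= p <= p1:
            fraction = (p - p0) / (p1 - p0)
            return x0 + fraction * (x1 - x0)
    raise ValueError("percentage lies outside 0-100")

print(percentile_rank(12))   # X = 12 is halfway through 9.5-14.5
print(percentile(50))        # the 50th percentile
```

Here X = 12 sits halfway between the real limits 9.5 and 14.5, so its rank is halfway between 30% and 70%.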

Stem-and-Leaf Displays
A stem-and-leaf display provides a very
efficient method for obtaining and
displaying a frequency distribution.
Each score is divided into a stem
consisting of the first digit or digits, and a
leaf consisting of the final digit.
Then you go through the list of scores,
one at a time, and write the leaf for each
score beside its stem.
The resulting display provides an
organized picture of the entire distribution.
The number of leaves beside each stem
corresponds to the frequency, and the
individual leaves identify the individual
scores.
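
A stem-and-leaf display is easy to build programmatically. The sketch below assumes two-digit integer scores, so the stem is the tens digit and the leaf is the units digit; the function name and the scores are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Map each stem (tens digit) to the sorted list of its leaves (units digits)."""
    groups = defaultdict(list)
    for x in sorted(scores):
        stem, leaf = divmod(x, 10)   # split the score into stem and leaf
        groups[stem].append(leaf)
    # List every stem between the lowest and highest, even one with no leaves.
    return {stem: groups.get(stem, [])
            for stem in range(min(groups), max(groups) + 1)}

# Hypothetical scores, invented for illustration only.
scores = [32, 38, 45, 47, 47, 61, 63, 68, 69, 70]
display = stem_and_leaf(scores)

for stem, leaves in display.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

The number of leaves on each row is the frequency for that stem, and stem 5 appears with no leaves because no score falls in the 50s.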

Descriptive Statistics
Sample Illustration:
Which Group is Smarter?

Class A--IQs of 13 Students
102  115
128  109
131   89
 98  106
140  119
 93   97
110

Class B--IQs of 13 Students
127  162
131  103
 96  111
 80  109
 93   87
120  105
109

Each individual may be different. If you try to understand a
group by remembering the qualities of each member, you
become overwhelmed and fail to understand the group.

Descriptive Statistics
Which group is smarter now?
Class A--Average IQ
110.54

Class B--Average IQ
110.23

They're roughly the same!

With a summary descriptive statistic, it is
much easier to answer our question.
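
The two averages can be verified with a few lines of Python. The assignment of the 26 IQ values to Class A and Class B below is an assumption reconstructed from the slide layout; it reproduces both stated averages.

```python
from statistics import mean

# Assumed split of the 26 IQ scores between the two classes.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

mean_a = round(mean(class_a), 2)
mean_b = round(mean(class_b), 2)

print("Class A average IQ:", mean_a)
print("Class B average IQ:", mean_b)
```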

Other Graphs

Besides histograms, there are other
methods of graphing quantitative
data:
Stem and Leaf Plots
Dot Plots
Time Series

Stem and Leaf Plots

A stem and leaf plot represents data by separating each data value into two
parts: the stem (such as the leftmost digit) and the leaf (such
as the rightmost digit).

Larson/Farber 4th ed.

Constructing Stem and Leaf Plots

Split each data value at the same place value to form the
stem and a leaf. (Aim for 5-20 stems.)

Arrange all possible stems vertically so there are no
missing stems.

Write each leaf to the right of its stem, in order.

Create a key to recreate the data.

Variations of stem plots:

1. Split stems
2. Back-to-back stem plots

Constructing a Stem-and-Leaf Plot

Dot Plots
A dot plot consists of a graph in which each data value is
plotted as a point along a scale of values.

Figure 2-5

Time Series
(Paired data)
A time series data set is composed of quantitative entries
taken at regular intervals over a period of time,
e.g., the amount of precipitation measured
each day for one month.

Use a time series chart to graph, with the quantitative
data on the vertical axis and time on the horizontal axis.

Time-Series Graph
Number of Screens at Drive-In Movie
Theaters

Figure 2-8

Graphing Qualitative Data Sets

Pie Chart
A circle is divided into
sectors that represent
categories.

Pareto Chart
A vertical bar graph in which the
height of each bar represents
frequency or relative frequency.
(Axes: Categories on the horizontal axis, Frequency on the vertical axis.)

Constructing a Pie Chart

Find the total sample size.
Convert the frequencies to relative frequencies (percent).

Marital Status    Frequency, f     Relative frequency (%)
                  (in millions)
Never Married     55.3             55.3/219.7 ≈ 0.25 or 25%
Married           127.7            127.7/219.7 ≈ 0.58 or 58%
Widowed           13.9             13.9/219.7 ≈ 0.06 or 6%
Divorced          22.8             22.8/219.7 ≈ 0.10 or 10%
Total: 219.7
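
The two steps can be carried out in Python for all four categories. The sector angle, 360 degrees times the relative frequency, is a standard extra step for drawing the chart that the slide does not state explicitly; the variable names are illustrative.

```python
# Marital status frequencies (in millions) from the table.
frequencies = {"Never Married": 55.3, "Married": 127.7,
               "Widowed": 13.9, "Divorced": 22.8}

total = sum(frequencies.values())                            # total sample size
relative = {k: f / total for k, f in frequencies.items()}    # relative frequencies
percent = {k: round(100 * p) for k, p in relative.items()}   # rounded percentages
angles = {k: 360 * p for k, p in relative.items()}           # pie sector angles in degrees

for k in frequencies:
    print(f"{k:<14} {frequencies[k]:>6}  {relative[k]:.2f}  {percent[k]:>3}%  {angles[k]:6.1f} deg")
print(f"Total: {total:.1f}")
```

Because each percentage is rounded separately, the rounded values sum to 99% rather than 100%; the unrounded relative frequencies still sum to 1.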

Constructing Pareto Charts


Create a bar for each category, where the height
of the bar can represent frequency or relative
frequency.
The bars are often positioned in order of
decreasing height, with the tallest bar positioned
at the left.
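
The ordering step of a Pareto chart amounts to sorting the categories by decreasing frequency. The sketch below reuses the marital status frequencies and prints text bars as a stand-in for a plotted chart; the scale of one '#' per 5 million is an arbitrary choice.

```python
# Marital status frequencies (in millions).
frequencies = {"Never Married": 55.3, "Married": 127.7,
               "Widowed": 13.9, "Divorced": 22.8}

# Sort categories so the tallest bar comes first (leftmost on a chart).
pareto_order = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)

for category, f in pareto_order:
    bar = "#" * round(f / 5)   # one '#' per 5 million, standing in for bar height
    print(f"{category:<14} {bar} {f}")
```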
