
# DATA SUMMARIZING AND PRESENTATION

This chapter presents the classification methods, by variants and by classes, followed by the main categories of visual presentation: tables and graphs.

## 3.1 Introduction

In order to express the central tendency in a dataset, the first activity of a researcher is to summarize the primary recorded data. This activity is redundant for secondary data, since secondary data are already summarized and available as tables, distributions, and graphical displays. Data summarizing therefore concerns only primary data, collected for the first time and used for the purpose of the current research.

Data classification yields frequency distributions, presented as simple tables and cross tables. The purpose of data grouping and classification is to put large datasets in order so as to extract the pertinent information that describes the dataset numerically.

One of the main concepts of data summarizing is the frequency distribution, which can be obtained according to one or two variables. If only one variable is taken into account, we obtain simple frequency distributions (single-variation frequency distributions, or univariate data), for which the classification procedures are classification by variants and classification by classes (intervals) of variation. If the recorded dataset is classified according to two variables, we obtain a double-variation frequency distribution (bivariate data), and the classification procedure is the cross-table procedure.

A statistical distribution is a table with two columns: the first column holds the variants or the classes and the second column holds the frequencies. Frequencies can be absolute, relative, or cumulated.

Absolute frequency, denoted by $f_i$, represents the number of units corresponding to a certain variant or falling into a certain class.

Relative frequency, denoted by $f_i^r$, represents the share of the absolute frequency corresponding to a variant or a class in the total number of units. It can be computed as a coefficient or as a percentage:

$$f_i^r = \frac{f_i}{\sum_i f_i}, \qquad f_i^r(\%) = \frac{f_i}{\sum_i f_i} \cdot 100,$$

with the property:

$$\sum_i f_i^r = 1, \qquad \sum_i f_i^r(\%) = 100.$$

Cumulated frequencies can be obtained from both absolute and relative frequencies. They can be computed as "more than the lower limit" frequencies, i.e. the number of units whose variable value is above the lower limit of the current class, or as "less than the upper limit" frequencies, i.e. the number of units whose variable value is below the upper limit of the current class.
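As a minimal sketch of these definitions (not a procedure prescribed by the text), the function below turns a list of absolute frequencies, given in class order, into relative frequencies and both kinds of cumulated frequencies. The name `frequency_table` and the dictionary keys are illustrative choices; the input reuses the five class frequencies of Exercise 6 in section 3.5.

```python
from itertools import accumulate

def frequency_table(freqs):
    """Build relative and cumulated frequencies from absolute frequencies f_i
    listed in class (or variant) order."""
    total = sum(freqs)                      # total population size, sum of f_i
    less_than = list(accumulate(freqs))     # units below the upper limit of each class
    # units above the lower limit of each class = total minus all earlier classes
    more_than = [total - lt + f for lt, f in zip(less_than, freqs)]
    return [
        {
            "abs": f,
            "rel": f / total,               # f_i^r as a coefficient
            "rel_pct": 100 * f / total,     # f_i^r as a percentage
            "cum_less": lt,                 # "less than the upper limit"
            "cum_more": mt,                 # "more than the lower limit"
        }
        for f, lt, mt in zip(freqs, less_than, more_than)
    ]

table = frequency_table([6, 4, 8, 4, 3])
for row in table:
    print(row)
```

The "less than" column grows to the total in the last class, while the "more than" column starts at the total in the first class; both can be read off the same running sum.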

All categories of frequencies are presented in tables. The elements of a statistical table are: the overall title, the internal titles, the measurement units of the recorded data, the data sources, and explanatory notes.

The simple frequency distribution is obtained by classifying the raw data according to the values of a single variable. Each possible value can be a number or a non-numeric category, so both qualitative and quantitative characteristics can be summarized by variants/categories. For a quantitative characteristic the possible value is a number, called a variant; for a qualitative characteristic it is a word, called an attribute.

The classification procedure is as follows: we read each recorded value sequentially and mark each appearance. Classification by variants assumes that the variants are ranked in increasing order.

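Example: The following data give the number of persons per household for 14 households:

| Household no. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of persons in the household | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 4 |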

Request: Classify the households by variants of the number of persons.

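| Number of members | Leaves (tally marks) | Absolute frequency | Relative frequency | Degrees in the pie chart |
|---|---|---|---|---|
| 3 | / | 1 | 1/14 = 0.07 | 0.07 × 360 = 25.2° |
| 4 | ///// ///// /// | 13 | 13/14 = 0.93 | 0.93 × 360 = 334.8° |
| Total | | 14 | 1 | 360° |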
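The tallying in the table above can be reproduced with Python's standard `collections.Counter`, which plays the role of the leaves by counting each appearance of a variant. This is a minimal sketch using the household data of the example; the printed layout is an illustrative choice.

```python
from collections import Counter

# Number of persons per household for the 14 households of the example
persons = [4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 4]

counts = Counter(persons)   # reads the records once and tallies each appearance
n = len(persons)

# Variants ranked in increasing order; one "/" leaf per appearance
for variant in sorted(counts):
    f = counts[variant]
    print(variant, "/" * f, f, round(f / n, 2))
```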

The classification procedure consists of reading the database sequentially and marking each appearance with a leaf (a tally mark) against the corresponding variant.

The absolute frequency is the number of units corresponding to a certain variant of the characteristic.

A relative frequency expresses the weight of the absolute frequency in the total and is computed as:

$$f_i^r = \frac{f_i}{\sum_i f_i},$$

where $f_i$ is the absolute frequency, $\sum_i f_i$ is the total population size, and $f_i^r$ is the relative frequency.

The structure of the households by variants of the number of members is presented using the pie chart in Figure 1.a:
Figure 1.a Pie chart: structure of households by number of persons (3 members: 7%; 4 members: 93%)

Pie charts are more suggestive and easier to understand than tables.

## 3.2.2 Data Summarizing by Classes

Data summarizing by classes can be applied only to quantitative characteristics. A class of variation, or interval, is defined by two boundaries: its lower and upper limits. The size of the class (the interval size) is defined as the difference between the upper limit and the lower limit.
This classification can be:
- continuous (the lower limit of the current class is the same as the upper limit of the previous class), or
- discrete (there are gaps between the upper limit of the current class and the lower limit of the next class). In this case the class size is the difference between two successive lower limits or between two successive upper limits.

Data grouping involves solving a few issues:

a. establishing the purpose of the classification by classes: data summarizing by classes is used in order to obtain synthetic data;
b. choosing the classification variable or variables: the classification should result in homogeneous groups;
c. establishing the number of classes, according to the purpose of the classification, the completeness criterion (the classification should comprise all the units) and empirical rules; the resulting frequency distribution should be as close as possible to the normal distribution (the Gauss bell curve);
d. constructing the classes continuously or discretely;
e. marking each frequency as it occurs.

Step 1: Compute the range of variation, also called the amplitude:

$$A = x_{\max} - x_{\min}$$

The amplitude is measured in the same unit as the variable.

Step 2: Choose a suitable number of classes, $r$, according to one of several criteria, for instance the rule of H. A. Sturges:

$$r = 1 + 3.322 \log_{10} n,$$

where $n$ is the number of observations.

Step 3: If all the classes have the same size and the variation is continuous, compute the class size $k$:

$$k = \frac{A}{r},$$

where $A$ is the amplitude and $r$ is the number of classes.

Step 4: Construct the classes by adding the class size $k$ step by step, starting from the minimum value $x_{\min}$ (or a smaller value), or by subtracting it step by step, starting from the maximum value $x_{\max}$ (or a larger value).
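Steps 1 to 4 can be sketched in a few lines of Python. Two choices here are assumptions not fixed by the text: Sturges' result is rounded to the nearest integer, and the classes start exactly at $x_{\min}$. The `failures` list is the data of Exercise 6 in section 3.5; `build_classes` is a hypothetical helper name.

```python
import math

def build_classes(values):
    """Steps 1-4: amplitude, number of classes (Sturges), class size, classes.
    Assumptions: r is rounded to the nearest integer; classes start at x_min."""
    x_min, x_max = min(values), max(values)
    amplitude = x_max - x_min                       # Step 1: A = x_max - x_min
    r = round(1 + 3.322 * math.log10(len(values)))  # Step 2: Sturges' rule
    k = amplitude / r                               # Step 3: equal class size
    # Step 4: add k step by step, starting from x_min
    classes = [(x_min + j * k, x_min + (j + 1) * k) for j in range(r)]
    return amplitude, r, k, classes

failures = [19, 6, 15, 20, 17, 16, 17, 12, 15, 29, 23, 17, 7, 10, 14,
            14, 27, 22, 8, 5, 23, 19, 9, 28, 5]
A, r, k, classes = build_classes(failures)
print(A, r, k)
print(classes)
```

For these 25 values the sketch gives $A = 24$, $r = \operatorname{round}(5.64) = 6$ and $k = 4$. Exercise 6 instead prescribes five classes of width 5, a reminder that Sturges' rule is only a guide. Since the last class ends exactly at $x_{\max}$, it has to be treated as closed on the right when the frequencies are counted.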

Example:
The class marks of a distribution of the monthly average number of daily trips made by an employee are 10.5, 18.5, 26.5, 34.5, and 42.5. Find: a) the class boundaries; b) the class limits.

Solution:
a) The boundary
- between the first two classes is (10.5 + 18.5)/2 = 14.5
- between the second and third is (18.5 + 26.5)/2 = 22.5
- between the third and fourth is (26.5 + 34.5)/2 = 30.5
- between the fourth and fifth is (34.5 + 42.5)/2 = 38.5

Since the difference between successive boundaries (the class width) is 22.5 - 14.5 = 8,
=> the lower boundary of the first class is 14.5 - 8 = 6.5
=> the upper boundary of the fifth class is 38.5 + 8 = 46.5

b) The class limits (the data being whole numbers, each limit lies 0.5 inside its boundary):
6.5 + 0.5 = 7 (lower limit of the first class)
46.5 - 0.5 = 46 (upper limit of the last class)
=> the class limits are 7-14; 15-22; 23-30; 31-38; 39-46
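The same derivation can be expressed as a small function. It is a sketch that assumes, as the example does, sorted and equally spaced class marks (at least three of them) and whole-number data, so that each limit lies 0.5 inside its boundary; `boundaries_and_limits` is a hypothetical name.

```python
def boundaries_and_limits(marks):
    """Derive class boundaries and integer class limits from equally spaced
    class marks (assumes sorted marks, at least three, and whole-number data)."""
    inner = [(a + b) / 2 for a, b in zip(marks, marks[1:])]  # midpoints between marks
    width = inner[1] - inner[0]                               # e.g. 22.5 - 14.5 = 8
    bounds = [inner[0] - width] + inner + [inner[-1] + width]
    # Whole-number data: limits lie 0.5 inside each boundary
    limits = [(lo + 0.5, hi - 0.5) for lo, hi in zip(bounds, bounds[1:])]
    return bounds, limits

bounds, limits = boundaries_and_limits([10.5, 18.5, 26.5, 34.5, 42.5])
print(bounds)
print(limits)
```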

One of the most effective ways of presenting numerical information is to construct a chart or diagram. The choice depends on the type of data.

There is a basic distinction between data that are discrete and data that are continuous. A dataset is discrete if it results from counting, for example the number of people in a room or the number of cars sold last month.

A dataset is continuous if the measurements are made on a continuous scale, for example the time taken to travel to work or the yield, in kilograms, of a manufacturing process.

There are some exceptions, for example, money can be seen as discrete
since it changes hands in increments but is usually treated as continuous
because the increments can be relatively small. Age is a continuous variable
but when quoted as age last birthday becomes discrete.

## 3.3.1 Presentation of Discrete Data

The counts of cars sold by model and region in Table 3-1 below represent discrete data.

Numbers of four types of cars sold by region during the last financial year
Table 3-1

| Model | North | South | East | West | Total |
|---|---|---|---|---|---|
| Sport | 675 | 60 | 35 | 20 | 790 |
| Sedan | 30 | 490 | 30 | 20 | 570 |
| Break | 150 | 180 | 235 | 15 | 580 |
| Van | 5 | 20 | 0 | 35 | 60 |
| Total | 860 | 750 | 300 | 90 | 2000 |

To make comparisons between regions easier, we can present the same information as percentages, as Table 3-2 below shows.

Percentage unit sales of four types of cars by region during the last financial year
Table 3-2

| Model | North | South | East | West |
|---|---|---|---|---|
| Sport | 78.49 | 8.00 | 11.67 | 22.22 |
| Sedan | 3.49 | 65.33 | 10.00 | 22.22 |
| Break | 17.44 | 24.00 | 78.33 | 16.67 |
| Van | 0.58 | 2.67 | 0.00 | 38.89 |
| Total | 100.00 | 100.00 | 100.00 | 100.00 |

It can be seen that Sport cars account for 78.49% of unit sales in the North and only 8.00% of unit sales in the South. To calculate these percentages we first find the fraction and then multiply by 100. Sport cars, for example, account for 675 sales out of the 860 achieved in the northern sales region: 675 / 860 = 0.7849, i.e. 78.49% when expressed as a percentage. The original totals are often given as base figures.
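Table 3-2 can be derived from Table 3-1 mechanically: each cell is divided by its column (region) total and multiplied by 100. A minimal Python sketch, using nested dictionaries as an illustrative data layout (not one used by the text):

```python
# Table 3-1: units sold by model (rows) and region (columns)
sales = {
    "Sport": {"North": 675, "South": 60, "East": 35, "West": 20},
    "Sedan": {"North": 30, "South": 490, "East": 30, "West": 20},
    "Break": {"North": 150, "South": 180, "East": 235, "West": 15},
    "Van":   {"North": 5, "South": 20, "East": 0, "West": 35},
}
regions = ["North", "South", "East", "West"]

# Column totals: the base figures of each region
totals = {r: sum(sales[m][r] for m in sales) for r in regions}

# Each cell as a percentage of its region's total, rounded to 2 decimals
percent = {
    m: {r: round(100 * sales[m][r] / totals[r], 2) for r in regions}
    for m in sales
}

for m in sales:
    print(m, [percent[m][r] for r in regions])
```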

## 3.3.2 Non-numeric Frequency Distributions

These take a similar form to their numeric counterparts, except that the
groups (or classes) describe qualitative (i.e. non-numeric) characteristics of
the data. For example:

a) Table 3-3 shows the production workforce employed at a factory.

b) Table 3-4 shows the insurance contracts issued by an insurance agent in a month.

Employee distribution
Table 3-3

Table 3-4

## 3.3.3 Time Series Presentation

The name given to the data describing the values of some variable over
successive time periods is a time series. For example, the data given in
Tables 3-5 and 3-6 are typical.
Production evolution
Table 3-5

Monthly sales evolution

Table 3-6

Combinations of non-numeric frequency distributions and time series often occur in business statistics. For example, Table 3-7 shows the breakdown of holiday locations booked through an international travel company. This type of data is sometimes known as a component time series (or evolution), since the various classifications (here, holiday destinations) can be thought of as the components of a meaningful total (here, the total number of holidays booked through the travel agent).

Holiday location evolution by region, years 94 to 99
Table 3-7

The data in Table 3-8, although similar in form to those of Table 3-7, have classifications for each month that cannot be added to form meaningful totals. They are sometimes called multiple time series, since the data given for each type of ice-cream form a separate time series.

Table 3-8

Price of ice-cream by month (rows: Vanilla, Chocolate, Fruits)

a. present data in an attractive and colorful way;
b. enable a general perspective of the data to be shown without redundant detail.

Diagrams can be used to replace listings or tabulations of data and are often used, for example, when the intended audience is not very sophisticated. On the other hand, a long and detailed business report can be complemented by a scattering of diagrams, which help to break up the text and thus make the report more agreeable to read.

The types of diagram (i.e. charts and graphs) described in this chapter can be conveniently classified under three headings, as follows:

a) Diagrams to display non-numeric frequency distributions: pictograms, simple bar charts, pie charts.

b) Diagrams to display time series: line diagrams, simple bar charts, miscellaneous charts.

c) Diagrams generally used to display various combinations of multiple non-numeric frequency distributions or time series:
- component, percentage and multiple bar charts;
- multiple pie charts;
- strata charts;
- Z-charts;
- Gantt charts;
- semi-logarithmic graphs.

These graphs are presented below.

## 3.4.1 Pictograms

In a pictogram, appropriate pictures replace bars to show comparisons. While this is more eye-catching, it is considerably less accurate and may be misleading (in Figure 1.b, for example, how much does a fraction of a person represent: half, or 45%?).

Features of pictograms
a) Pictograms are sometimes referred to as ideograms.
b) The symbols are normally duplicated and, for the sake of accuracy, the numeric values being represented are sometimes shown. A scaled axis can be included.
c) An alternative to duplicating the symbols is to magnify them. For example, Figure 2 represents different numbers of trucks by the area of the truck symbol, and Figure 3 represents an increase in sales of detergent (in kg) by the volume of the two detergent boxes.

Advantages:
- Easy to understand for a non-sophisticated audience; extremely suggestive.

Disadvantages:
- Can be awkward to construct if complex symbols are used.
- Not accurate enough for serious statistical presentation.
- Magnification of symbols (using areas or volumes) can be confusing unless the values being represented are clearly shown.

## 3.4.2 Bar Charts

A simple bar chart is a chart consisting of a set of bars separated by gaps (the bars are not joined). A separate bar is drawn for each class, to a height proportional to the class frequency. The bars all have the same width and, if desired, each bar can be shaded or colored differently. Simple bar charts can be used to represent non-numeric frequency distributions and time series equally well.

The numbers of cars sold by model or by region can be represented as vertical bars. The height of each bar is drawn in proportion to the number, using a vertical scale (Figure 4). We can also show the number of each model sold by region using a component bar chart (Figure 5).

Once we begin to examine the composition of totals, it can become difficult to see the relative size of some of the components. To overcome this problem, it is often convenient to change the absolute figures into percentages, giving bars that are all of the same length and making direct comparisons possible (Figure 7).
OBS.: Do not confuse simple bar charts and histograms. Histograms represent numeric data with joined bars. Simple bar charts represent non-numeric data or time series and have their bars separated from each other.

Figure 4. Sales of cars by region

Some features of simple bar charts:

a) They can be drawn with vertical or horizontal bars, but must show a scaled frequency axis.
b) They are easily adapted to show both positive and negative values. For example, Figure 6 shows the Balance of Payments of a particular country.

According to Figure 6, for this country the Balance of Payments started with a deficit of 300 million m.u. in 19X1 and 19X2. In 19X3 it rose to a surplus of almost 5000 million, rising to a maximum value of 100 million in 19X5. The surplus then decreased, turning into a deficit of 1000 m.u. in 19X7. This type of chart is also known as a loss and profit chart.

## Percentage bar chart

These bar charts have each bar representing a class but all drawn to the
same height, representing 100% (of the total). The constituent parts of each
class are then calculated as percentage of the total and shown within the bar
accordingly. Within each bar, components are stacked in the same order.

Figure 7. Percentage bar chart

Figure 7 depicts a percentage bar chart. Percentage bar charts are used where relative comparisons between components are important. The disadvantage is that the actual figures, including class totals, are not shown in the graph.
Multiple bar charts
These have a set of bars, each bar representing a single constituent part of the total. Within each set, the bars are physically joined and always arranged in the same sequence. Sets of bars should be separated.

A multiple bar chart is appropriate for comparing components within and across classes in actual terms, since each bar is drawn from a fixed base. Its only two disadvantages are that class totals are not easy to assimilate and that it can be unwieldy if there are a large number of classes.

## 3.4.3 Pie Charts

A pie chart shows the totality of the data being represented using a single circle (a pie). The circle is split into sectors (i.e. the pieces of the pie are sectors of the circle), the size of each one being drawn in proportion to the class frequency. Each sector can be shaded or colored differently if desired.
Figure 9 below shows a pie chart drawn for the data of Table 3-9 (see also Figure 1.a). Pie charts are generally used to represent non-numeric frequency distributions and are at their most effective where the classes need to be compared in relative terms.
Features of pie charts:
a) Pie charts are sometimes referred to as circular diagrams or divided circles.
b) In order to construct a pie chart, the size of each sector in degrees needs to be calculated. The procedure is:
- Calculate the proportion of the total that each frequency represents.
- Multiply each proportion by 360, giving the sizes (in degrees) of the sectors that need to be drawn.
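The two-step procedure for sector sizes translates directly into code. As an illustration (Table 3-9 is not reproduced here), this sketch uses the regional totals from Table 3-1; the function name `pie_sectors` is hypothetical.

```python
def pie_sectors(freqs):
    """Sector angle in degrees for each class: (f_i / total) * 360."""
    total = sum(freqs.values())
    return {name: f / total * 360 for name, f in freqs.items()}

# Regional totals from Table 3-1 (units sold)
region_totals = {"North": 860, "South": 750, "East": 300, "West": 90}
angles = pie_sectors(region_totals)
for region, degrees in angles.items():
    print(f"{region}: {degrees:.1f} degrees")
```

For data already expressed as percentages, as in Table 3-11, the same computation reduces to multiplying each percentage by 3.6, since 360 / 100 = 3.6.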

Table 3-9

c) Exploding the different sectors is an effective way of presenting a pie chart; see Figure 10.
Table 3-10

Figure 10 The breakdown of an employee's monthly pay, with each sector exploded.

Net salaries account for two thirds of total salaries.

Advantages:
- A dramatic and appealing way of presenting data.
- Good for comparing classes in relative terms.

Disadvantages:
- Construction is laborious. Circles should not be drawn freehand, and sectors should be drawn using a protractor. However, without a protractor, once the size of each sector has been determined, its physical size within the circle can be intelligently guessed at.
- Can be untidy if there are many classes (say 8 or more) and different shadings or colorings are used.

Multiple pie-charts

Multiple pie charts can be used as alternatives to percentage bar charts; that is, a pie chart (360 degrees) replaces a bar (100%) for each class or year. For example, Table 3-11 presents the skill classifications of the workforce at two factories. (Note that the degree figures in Table 3-11 can be obtained by multiplying each percentage by 3.6.)

Table 3-11

Figure 11. Multiple pie charts

Comments on the situation shown by Figure 11: at both factories, about 20% of the workforce is semi-skilled. However, whereas unskilled workers account for only 20% of the workforce of factory A, they constitute about 35% of factory B's workforce.
The main advantage of using multiple pie charts rather than a percentage bar chart is visual impact; they are generally felt to be more attractive. However, their construction is more involved, and this is considered a major disadvantage: most people prefer to work out percentages and draw straight bars rather than calculate the degrees of sectors and draw circles.

## Proportional pie charts

Sometimes the presentation of multiple pie charts can be taken a stage further than that shown in Table 3-11 and Figure 12. This is done by making the areas of the circles proportional to the class totals. Thus, if the total frequency of one class were twice that of another, the area of its circle would be twice as large. These are called proportional pie charts. The following procedure describes how to draw a pair of proportional pie charts to represent two classes having totals T1 and T2 respectively. At the same time, the technique is demonstrated using the data given in the previous section.
Steps for constructing proportional pie charts
Step 1: Determine the two class totals.
For the previous data, T1 = 116 (total number of workers at factory A) and T2 = 322 (total number of workers at factory B).
Step 2: Draw the first circle using any convenient radius.

Thus, for the previous data, we calculate the size of the sectors for both circles using the usual method.
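The radius of the second circle follows from the area requirement stated above: since a circle's area is $\pi r^2$, the ratio of areas $r_2^2 / r_1^2$ must equal $T_2 / T_1$, so $r_2 = r_1 \sqrt{T_2 / T_1}$. This sketch applies that relation to the factory totals; the 3 cm first radius is an arbitrary assumption, and `second_radius` is a hypothetical name.

```python
import math

def second_radius(r1, t1, t2):
    """Radius that makes the circle areas proportional to the class totals:
    (pi * r2**2) / (pi * r1**2) = t2 / t1, hence r2 = r1 * sqrt(t2 / t1)."""
    return r1 * math.sqrt(t2 / t1)

# Factory A (T1 = 116 workers) drawn with an arbitrary radius of 3 cm
r1 = 3.0
r2 = second_radius(r1, 116, 322)   # factory B, T2 = 322 workers
print(round(r2, 2))
```

With $T_1 = 116$ and $T_2 = 322$, the second radius is about 1.67 times the first, i.e. roughly 5 cm for a 3 cm first circle.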

The obvious impact of proportional pie charts is often outweighed by the considerable effort (both calculation and drawing) needed to construct them. This is particularly significant if more than two classes are involved.

## 3.4.4 Histograms

A histogram plots the values of a frequency distribution as a sequence of joined bars. The values of the variable are always represented along the horizontal axis and the frequencies along the vertical axis; see Figure 13.

Figure 13. Histogram (horizontal axis: Defective Items; vertical axis: Relative Frequency)

A histogram represents continuous data.


## 3.4.5 Strata Charts

Strata charts (otherwise known as cumulative line diagrams or area graphs) are used to represent component time series. As the middle of these three names implies, the separate line diagrams for the components are stacked (in a similar way to component bar charts). Figure 14 shows a strata chart of the number of flats built (by type) for a local authority, for the data given in the accompanying Table 3-12.

Table 3-12

Comments on the situation shown in the figure above: total building decreased from nearly 800 flats in 19X1 to a low of under 400 in 19X7, but over the next 3 years (up to 19X9) it recovered nearly half of the loss, to stand at about 550. Over the whole period, the building of both 2-bedroom and 3-or-more-bedroom flats decreased by about half, whereas 1-bedroom flats, despite a small slump in 19X7, remained at just over 300 per year.
a) It is important to shade the various sections of the chart, as shown in
Figure 14, since this emphasizes the cumulative nature of the data.
Compare this with the case of a multiple line diagram, where actual data
values for each variable are plotted and no shading would be used.
b) The advantages of a strata chart, compared with a multiple bar chart, are:
i. a total is shown for each year.
ii. as many components as desired can be plotted without confusion.
c) The advantages of a strata chart compared with a component bar chart are the impression of continuity that the lines give (appropriate for time series) and the fact that it is easier to construct.

d) The disadvantages of using a strata chart, compared with a multiple line diagram, are that cross-over points (i.e. where the value of one component overtakes the value of another) are not identified on a strata chart and that individual component comparisons are not as easy to make.

## 3.4.6 Z-charts

A Z-chart is a chart that presents an evolution as a combination of three separate line diagrams, described as follows.

a) The first line diagram describes the actual time series values. Normally, the time series consists of monthly measurements over one complete year, i.e. January through December.

b) The second line diagram describes the accumulated time series values. For monthly measurements, the first point coincides with the first point of the diagram in a), i.e. the January figure. The second point is the sum of January and February, the third the sum of January, February and March, and so on. This diagram is useful for charting monthly progress towards an annual total. The further it departs from a straight line, the more variation there has been in the actual monthly figures.
Production evolution
Table 3-13

c) The third line diagram is drawn such that each point describes the current month's figure plus the previous eleven months' figures, forming a twelve-month total. Collectively, the twelve values so obtained are called moving totals (see the table above). Note that, in order to calculate moving totals for a particular year, the previous year's figures must be known. The first point plotted (at January, this year) will be the sum of the figures from February last year to January this year. The second point plotted will be the sum of the figures from March last year to February this year, and so on. The last moving total plotted will coincide with the last accumulated value of the diagram in b) (the point corresponding to the top right-hand join of the Z). This diagram is useful for determining the long-term underlying trend of the data.
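The three lines can be computed together in a short sketch. Table 3-13's figures are not reproduced in this text, so the monthly values below are hypothetical placeholders, and `z_chart` is likewise an illustrative name. The moving total for each month sums the twelve months ending at that month, which is why last year's figures are needed as well.

```python
from itertools import accumulate

def z_chart(last_year, this_year):
    """Return the three Z-chart lines for this year: actual monthly values,
    accumulated (year-to-date) values and twelve-month moving totals."""
    if len(last_year) != 12 or len(this_year) != 12:
        raise ValueError("expected 12 monthly values for each year")
    actual = list(this_year)
    cumulative = list(accumulate(this_year))
    both = list(last_year) + list(this_year)   # 24 consecutive months
    # Month m of this year sits at index 12 + m; its moving total covers m+1 .. m+12
    moving = [sum(both[m + 1 : m + 13]) for m in range(12)]
    return actual, cumulative, moving

# Hypothetical monthly production figures (not the data of Table 3-13)
last_year = [120, 118, 125, 130, 128, 122, 115, 110, 119, 124, 127, 131]
this_year = [118, 116, 121, 126, 124, 117, 110, 106, 115, 120, 123, 126]

actual, cumulative, moving = z_chart(last_year, this_year)
print(actual)
print(cumulative)
print(moving)
```

A quick check of the construction: the last moving total equals the last accumulated value, which is the top right-hand join of the Z.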

The following example shows the construction of a Z-chart.

The data in Table 3-13 give the monthly production figures of a manufactured component and show the calculations necessary for constructing the Z-chart in Figure 15.

Figure 15. Z-chart, comprising actual, cumulative and moving totals for year 2.

Comment on the situation shown by Figure 15: production in the 2nd year was relatively constant, with a slight drop in the summer months. The long-term trend shows a drop in overall production.

## 3.4.7 GANTT Charts

A method of charting the progress of a project against a defined plan is provided by the Gantt chart. A number of scaled natural time periods (days, weeks or months) are identified, within each of which three bars can be drawn: one bar shows the planned achievement and the other two show actual achievement and cumulative achievement (to date).
As an example, Table 3-14 shows the planned achievement of five weeks' production for a project, together with the actual achievement up to the end of week three.
Table 3-14

Figure 16. Gantt chart drawn up for the data of Table 3-14. The comments shown on the chart are for information purposes and may or may not be included on an actual chart.

Comments on the situation shown by Figure 16: despite a shortfall in actual production in week 1, weeks 2 and 3 both showed an excess of production, which resulted in an overall excess of 8 units by the end of week 3.

## 3.4.8 Actual and Percentage Increases in Time Series

In business, it is sometimes necessary to show clearly, using a graphical method, whether some time series variable is increasing in actual or in percentage terms. The difference between these two can have a marked effect on successive values of the variable.

For example, suppose the yearly demand for a new technological product was estimated as 2000 in year 1. Table 3-15 shows the difference between an actual increase of 500 per year and a relative increase of 25% per year, and Figure 17 shows these values plotted using line diagrams.
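The two projections behind Table 3-15 can be generated as follows. This is a sketch assuming five years of forecasts (the table itself is not reproduced here); `demand_forecast` and its parameter names are illustrative. The constant actual increase adds the same amount each year, while the constant percentage increase multiplies by the same factor.

```python
def demand_forecast(start, years, step=500, rate=0.25):
    """Project demand from `start` for `years` years in two ways:
    a constant actual increase (add `step`) and a constant percentage
    increase (multiply by 1 + rate)."""
    actual = [start + step * t for t in range(years)]
    percent = [start * (1 + rate) ** t for t in range(years)]
    return actual, percent

actual, percent = demand_forecast(2000, 5)
for year, (a, p) in enumerate(zip(actual, percent), start=1):
    print(f"year {year}: {a:>6} {p:>10.2f}")
```

Over five years this gives 2000, 2500, 3000, 3500, 4000 for the actual increase and 2000, 2500, 3125, 3906.25, 4882.81 for the 25% increase, so the gap between the two widens every year.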

Table 3-15

Figure 17 a and b. Demand evolution forecast

However, reversing the situation: given only a diagram showing successive values of a time series variable, is it possible to determine quickly whether there is a constant actual or percentage increase in values? For a constant actual increase, the answer is yes: we look for a straight line. For a constant rate of increase, however, there is no easy way to tell by looking at the graph. This problem can be overcome by plotting the logarithms of the values to form a line diagram called a semi-logarithmic graph.

## 3.4.9 Semi-logarithmic Graphs

a) Semi-logarithmic graphs are used to display time series data, and their purpose is to show whether the rate of increase/decrease in the values of the variable is constant or not. They are constructed by plotting the logarithms of the given values against their respective time points; if a straight line results, the values are increasing or decreasing at a constant rate.
As an example, the data given in Table 3-15 (situation 2) will be plotted
using a semi-logarithmic graph. The layout of the calculations is shown
below in Table 3-16.
Table 3-16

The logarithms above, plotted against year, are displayed in Figure 18. Notice that the points form a perfect straight line (as expected, since the values were calculated using a constant 25% increase).
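Why the line is straight can be checked numerically: if each value is 1.25 times the previous one, each logarithm exceeds the previous one by the same amount, $\log_{10} 1.25 \approx 0.0969$. A minimal sketch (the helper `log_steps` is a hypothetical name) comparing the two series of Table 3-15 over an assumed five years:

```python
import math

def log_steps(values):
    """Differences between successive log10 values. Equal steps mean the
    points lie on a straight line in a semi-logarithmic graph."""
    logs = [math.log10(v) for v in values]
    return [b - a for a, b in zip(logs, logs[1:])]

constant_rate = [2000 * 1.25 ** t for t in range(5)]    # +25% per year
constant_amount = [2000 + 500 * t for t in range(5)]    # +500 per year

rate_steps = log_steps(constant_rate)      # every step is close to log10(1.25)
amount_steps = log_steps(constant_amount)  # steps shrink from year to year
print(rate_steps)
print(amount_steps)
```

The shrinking steps of the constant-amount series show why a straight line on an ordinary graph becomes a flattening curve on a semi-logarithmic one: the same absolute increase is a smaller and smaller percentage of a growing value.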

Figure 18. Estimated sales evolution

b) The larger the rate of increase of time series data, the steeper the (semi-logarithmic) line will be. This fact enables the rates of increase of two or more time series to be compared on the same set of axes. Note that when comparing time series data in this way, only the steepness (or inclination) of the lines is of any interest, not their positions. Figures 19 and 20 show the significance of some standard shapes of semi-logarithmic graphs.

The following points should be regarded as standards when constructing graphs and should be remembered.

a) All diagrams should be neat and attractive to look at. Always use graph paper and a ruler.
b) Diagrams should be easy to read, without excessive detail.
c) Always try to locate the diagram centrally on the paper, using as much of it as possible.
d) A general title must be given which describes what is being portrayed, but it should be as brief and to the point as possible.
e) Axes, if used, should be clearly labeled, giving the units of the data and a note of any break of scale.
f) Shading or coloring, if used, must be applied lightly, as heavy shading may detract from the presentation.
g) If two or more line diagrams appear together, distinguish between them clearly by labeling, coloring or dotting/dashing. If charts are colored or shaded, label them clearly or provide a separate key.

Figure 19 a, b, c, d. Examples of semi-logarithmic graphs

Non-numeric frequency distributions describe data by their qualitative characteristics. Time series consist of measurements of some variable over time. Both of these data structures, or a combination of the two, are represented by the charts and graphs described in this section.

Pictograms are charts that represent different magnitudes by repeating or varying the size of symbols/pictures that are easily identifiable by a non-expert audience. Their merit is that they are easy to understand. The disadvantages include possible misrepresentation, especially when the size of the symbols is varied.

## 3.5 Exercises
Multiple choice exercises with answers
1. In order to display the variation of a qualitative variable we can use:
a. the histogram
b. the frequency polygon
c. the ogive of cumulated frequencies
d. the bar chart
e. no graphical presentation is suitable for qualitative data distributions

2. Which of the following statements about pie charts is false?

a. Pie charts are graphical representations of the relative frequency distribution.
b. Pie charts are usually used to display the relative sizes of categories for interval data.
c. Pie charts always have the shape of a circle.
d. The area of each slice of a pie chart is the proportion of the corresponding category of the frequency distribution of a categorical variable.

3. The most appropriate type of chart for determining the number of observations at or below a specific value is:
a. a histogram
b. a pie chart
c. a time-series chart
d. a cumulative frequency ogive

4. The best type of chart for comparing two sets of categorical data is:
a. a line chart
b. a pie chart
c. a histogram
d. a bar chart

5. The variation of a discrete variable cannot be displayed using:

a. the bar chart
b. the scatter diagram
c. the pie chart
d. the line chart
e. it can be displayed using all the above-mentioned graphical techniques

6. The numbers of failures produced by a piece of equipment and recorded over the last 25 hours are as follows:

19 6 15 20 17 16 17 12 15
29 23 17 7 10 14 14 27 22
8 5 23 19 9 28 5

a. Construct a frequency distribution and a relative frequency distribution for these data. Use five class intervals, with the lower boundary of the first class being five items.
b. Construct a relative frequency histogram for these data.
c. What is the relationship between the total area under the histogram you have constructed and the relative frequencies of observations?

a.

| Class limits | Frequency | Relative frequency |
|---|---|---|
| 5 - 10* | 6 | .24 |
| 10 - 15 | 4 | .16 |
| 15 - 20 | 8 | .32 |
| 20 - 25 | 4 | .16 |
| 25 - 30 | 3 | .12 |
| Total | 25 | 1.00 |

*The class 5 - 10 contains observations up to but not including 10. The other classes are defined similarly. This notation is used throughout the chapter.
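The answers to parts a and c can be verified with a short sketch that applies the half-open class convention of the footnote (lower limit included, upper limit excluded). It assumes nothing beyond the exercise data; the variable names are illustrative.

```python
# Failures recorded over 25 hours (Exercise 6)
failures = [19, 6, 15, 20, 17, 16, 17, 12, 15,
            29, 23, 17, 7, 10, 14, 14, 27, 22,
            8, 5, 23, 19, 9, 28, 5]

width, first_lower = 5, 5
classes = [(first_lower + j * width, first_lower + (j + 1) * width) for j in range(5)]

# Half-open classes: lower limit included, upper limit excluded
freq = [sum(lo <= x < hi for x in failures) for lo, hi in classes]
rel = [f / len(failures) for f in freq]

# Each histogram bar has area width * relative frequency,
# so the total area is width * 1
total_area = sum(width * p for p in rel)
print(list(zip(classes, freq, rel)))
print(total_area)
```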

b. Note that the numbers that appear along the horizontal axis represent the
upper limits of the class intervals even though they appear in the center of
the classes of the histogram.

Relative frequency histogram (horizontal axis: Defective Items; vertical axis: Relative Frequency)

c. The area under the histogram between two values is five times the relative
frequency of observations that fall between those two values (where 5 is the
width of each class). The total area under the histogram will be equal to 5.

7. Consumers participating in a recent demand estimation survey in Bucharest were asked to state their ice-cream preference. With the data coded 1 for Vanilla, 2 for Chocolate, and 3 for Fruit, the responses collected were as follows:
3 1 2 3 1 3 3 2 1
3 3 2 1 1 3 2 3 1
3 2 3 2 1 3 3

a. Develop a frequency distribution and a proportion distribution for the data. What do the data suggest about the strength of the ice-cream categories on the Bucharest ice-cream market?
b. Construct a frequency bar graph.

a.

| Ice-cream category | Frequency | Proportion |
|---|---|---|
| Vanilla | 7 | 0.28 |
| Chocolate | 6 | 0.24 |
| Fruit | 12 | 0.48 |
| Total | 25 | 1.00 |

Fruit ice-cream is stronger on the Bucharest market than Vanilla and Chocolate ice-cream.
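Counting the coded responses is a direct application of classification by variants. A minimal sketch with `collections.Counter`, mapping the codes back to their labels; each proportion is the frequency divided by the 25 responses.

```python
from collections import Counter

# Coded responses: 1 = Vanilla, 2 = Chocolate, 3 = Fruit
codes = [3, 1, 2, 3, 1, 3, 3, 2, 1,
         3, 3, 2, 1, 1, 3, 2, 3, 1,
         3, 2, 3, 2, 1, 3, 3]
labels = {1: "Vanilla", 2: "Chocolate", 3: "Fruit"}

counts = Counter(codes)
n = len(codes)

# category -> (frequency, proportion)
distribution = {labels[c]: (counts[c], counts[c] / n) for c in sorted(labels)}
for name, (f, p) in distribution.items():
    print(f"{name:<10} {f:>3} {p:.2f}")
```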
b.

Frequency bar graph (categories: Vanilla, Chocolate, Fruit; vertical axis: frequency)

Open-ended exercises without answers

8. Identify the type of data for which each of the following graphs is
appropriate.
a. Histogram
b. Pie chart
c. Bar chart
9. For each of the following examples, identify the data type as either qualitative, ranked, or quantitative, and specify the appropriate measurement scale for each as either interval, nominal, or ordinal.
a. the letter grades received by students in a computer science class
b. the number of students in a statistics course
c. the starting salaries of new Ph.D. graduates from a statistics program
d. the size of fries (small, medium, large) ordered by a sample of Burger King customers
e. the college (Arts and Science, Business, etc.) you are enrolled in

10. In its 2002 report, a company presented the following data regarding its
sales (in millions of dollars), net income (in millions of dollars), return
on equity (%), and net income per share (in dollars).

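| Year | 2002 | 2001 | 2000 | 1999 | 1998 |
|---|---|---|---|---|---|
| Sales volume (millions of $) | 185 | 57 | 73 | 76 | 70 |
| Net income (millions of $) | 7.1 | 2.4 | 4.1 | 4.2 | 1.2 |
| Return on investment (%) | 30.5 | 11.6 | 22.4 | 29.7 | 10.0 |
| Net profit per share ($) | 1.0 | .35 | .50 | .52 | .12 |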

a. Use bar charts to present these data.
b. Assume that you are an unscrupulous statistician and want to make the data appear more positive than they really are. Draw the bar charts accordingly.