
Visualizing and Presenting Data

Tables?
Frequency distributions?
Graphical representation?

Newspapers, magazines and television all use these types of display to try to convey information in an easy-to-assimilate way.
Glyn Davis & Branko Pecar

In a nutshell, these forms of display aim to summarise large sets of raw data so that we can see at a glance the 'behaviour' of the data.

Learning Objectives

Understand the different types of data variables that can be used to represent a specific measurement.
Know how to present data in table form.
Present data in a variety of graphical forms.
Construct frequency distributions from raw data.
Distinguish between discrete and continuous data.
Construct histograms for equal and unequal class widths.
Understand what we mean by a frequency polygon.

Introduction
Statistics has two distinct branches:
1. Descriptive Statistics comprises collecting and presenting data (tables and graphs) and describing data (central tendency, dispersion, skewness, kurtosis).
2. Inferential Statistics comprises drawing conclusions about a population value based upon sample data (point and interval estimates, hypothesis testing, fitting lines to data sets (X, Y) using least squares regression, and analysing time series data).

Summary of Presenting Data


[Diagram: Presenting Categorical Data]

The Different Types of Data Variable


Variable: a variable is any measured characteristic or attribute that differs for different subjects, e.g. height of a building, eye colour.
Qualitative (or categorical): a descriptive variable measuring a particular characteristic (e.g. eye colour), or a variable that can be ranked (e.g. finished first, fourth etc.). These variables have values that can only be placed into categories, such as yes and no.
Quantitative (or numerical) these

The Different Types of Data Variable


Nominal: assigning items to categories, e.g. counting the number of people with blue eyes. When numbers are used only to label an item or individual, the data are called nominal. Frequency distributions are usually used to tabulate and analyse problems involving nominal data.
Ordinal: a set of data is said to be ordinal if the values belonging to it can be ranked. Numbers are used to rank objects or attributes.

The Different Types of Data Variable


Interval: an interval scale is a scale of measurement where the distance between any two adjacent units of measurement (or intervals) is the same but the zero point is arbitrary.
Ratio: ratio data are continuous data where both differences and ratios are interpretable and which have a natural zero.

Recognising a measurement scale

Nominal data
1. Classification data, e.g. male or female, red or black car.
2. Arbitrary labels, e.g. m or f, r or b, 0 or 1.
3. No ordering, e.g. it makes no sense to state that r > b.

Ordinal data
1. Ordered list, e.g. student satisfaction scale of 1, 2, 3, 4, and 5.
2. Differences between values are not important, e.g. political parties can be given labels: far left, left, mid, right, far right etc. and student satisfaction scale of 1, 2, 3, 4, and 5.

Interval data
1. Ordered, constant scale, with no natural zero, e.g. temperature, dates.
2. Differences make sense, but ratios do not, e.g. temperature difference.

Ratio data
1. Ordered, constant scale, and a natural zero, e.g. length, height, weight, and age.

Tables
Tables come in a variety of formats, from simple tables to frequency distributions, that summarise data sets in a form that gives users access to the important information.

Proposed voting behaviour by 1110 university students
(Source: University Student Survey October 2008)

Party          Frequency   Frequency %
Conservative   400         36
Labour         510         46
Democrat       78          7
Green          55          5
Other          67          6
Total          1110        100

Example 1.1 Simple table illustrating the voting intentions of 1110 students
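The percentage column in the voting table can be reproduced from the raw frequencies. The sketch below is illustrative only: the variable names are assumptions, and the percentages are rounded to whole numbers because that is how they appear in the table.

```python
# Frequencies from the voting table (Example 1.1).
votes = {
    "Conservative": 400,
    "Labour": 510,
    "Democrat": 78,
    "Green": 55,
    "Other": 67,
}

total = sum(votes.values())  # number of students surveyed

# Percentage frequency for each party, rounded to the nearest whole percent.
percentages = {party: round(100 * f / total) for party, f in votes.items()}

for party, f in votes.items():
    print(f"{party:<13}{f:>6}{percentages[party]:>5}%")
print(f"{'Total':<13}{total:>6}{sum(percentages.values()):>5}%")
```

Rounded percentages do not always add up to exactly 100; here they happen to.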

Simple Tables
Half-yearly sales of XBAR Ltd.

Month   January   February   March   April   May     June    Total
Pink    5200      4100       6000    6900    6050    7000    35250
Blue    2100      1050       2950    5000    6300    5200    22600
Total   7300      5150       8950    11900   12350   12200   57850

Example 1.2 Half-yearly sales of XBAR Ltd

                            Single               Married
                            Under 30   30+       Under 30   30+
Less than 15 hrs per week   330        358       1162       484
15 hrs or more per week     1719       241       643        1521
Total                       2049       599       1805       2005

Example 1.3 Viewing habits of adult males


Frequency Distributions
Consider the set of data that represents the number of insurance claims processed each day by an insurance firm over a period of 40 days:
3, 5, 9, 6, 4, 7, 8, 6, 2, 5, 10, 1, 6, 3, 6, 5, 4, 7, 8, 4, 5, 9, 4, 2, 7, 6, 1, 3, 5, 6, 2, 6, 4, 8, 3, 1, 7, 9, 7, 2.

SCORE   TALLY      FREQUENCY, f
1       |||        3
2       ||||       4
3       ||||       4
4       |||||      5
5       |||||      5
6       ||||| ||   7
7       |||||      5
8       |||        3
9       |||        3
10      |          1
                   Σf = 40

Example 1.4 Frequency distribution
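The tally for the insurance claims can be checked by counting each score programmatically. This is a minimal sketch using Python's standard `collections.Counter`; the variable names are illustrative and not part of the original example.

```python
from collections import Counter

# Number of insurance claims processed on each of the 40 days.
claims = [3, 5, 9, 6, 4, 7, 8, 6, 2, 5, 10, 1, 6, 3, 6, 5, 4, 7, 8, 4,
          5, 9, 4, 2, 7, 6, 1, 3, 5, 6, 2, 6, 4, 8, 3, 1, 7, 9, 7, 2]

# Counter maps each distinct score to the number of times it occurs.
frequency = Counter(claims)

print("SCORE  TALLY       f")
for score in sorted(frequency):
    f = frequency[score]
    print(f"{score:>5}  {'|' * f:<10}{f:>3}")
print("sum of f =", sum(frequency.values()))
```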

Grouped Frequency Distributions


Consider the following data set of miles recorded by 120 salesmen in one week.

403 407 407 408 410 412 413 413
415 416 418 419 420 421 421 423
424 424 425 426 428 430 430 430
431 432 432 433 433 434 435 435
436 436 436 438 438 438 439 440
440 441 442 442 443 444 444 445
446 447 447 447 448 449 450 450
451 451 451 452 452 453 453 453
454 455 455 456 457 457 458 459
459 460 460 462 462 462 463 464
465 466 468 468 469 470 471 471
472 473 474 474 475 476 477 478
479 481 482 482 483 485 486 488
489 490 493 494 495 497 498 498
500 502 502 505 508 509 511 515

Example 1.5 Data set

Counting frequencies
MILEAGE     TALLY                                       FREQUENCY, f
400 - 419   ||||| ||||| ||                              12
420 - 439   ||||| ||||| ||||| ||||| ||||| ||            27
440 - 459   ||||| ||||| ||||| ||||| ||||| ||||| ||||    34
460 - 479   ||||| ||||| ||||| ||||| ||||                24
480 - 499   ||||| ||||| |||||                           15
500 - 519   ||||| |||                                   8
                                                        Σf = 120

Example 1.5 Grouped frequency distribution

See text for the Excel solution
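The slides point to Excel for the solution; as an alternative illustration, the grouped counts can also be obtained with a short Python sketch. The 120 mileages are listed in ascending order, and the constants `FIRST_LOWER` and `CLASS_WIDTH` are illustrative names chosen to match the stated classes 400 - 419, 420 - 439 and so on.

```python
# Miles recorded by 120 salesmen in one week (Example 1.5), in ascending order.
miles = [
    403, 407, 407, 408, 410, 412, 413, 413, 415, 416, 418, 419, 420, 421, 421,
    423, 424, 424, 425, 426, 428, 430, 430, 430, 431, 432, 432, 433, 433, 434,
    435, 435, 436, 436, 436, 438, 438, 438, 439, 440, 440, 441, 442, 442, 443,
    444, 444, 445, 446, 447, 447, 447, 448, 449, 450, 450, 451, 451, 451, 452,
    452, 453, 453, 453, 454, 455, 455, 456, 457, 457, 458, 459, 459, 460, 460,
    462, 462, 462, 463, 464, 465, 466, 468, 468, 469, 470, 471, 471, 472, 473,
    474, 474, 475, 476, 477, 478, 479, 481, 482, 482, 483, 485, 486, 488, 489,
    490, 493, 494, 495, 497, 498, 498, 500, 502, 502, 505, 508, 509, 511, 515,
]

FIRST_LOWER = 400  # lower stated limit of the first class
CLASS_WIDTH = 20   # each class spans 20 whole miles, e.g. 400 - 419

# One counter per class, keyed by its (lower, upper) stated limits.
counts = {(lower, lower + CLASS_WIDTH - 1): 0
          for lower in range(FIRST_LOWER, 520, CLASS_WIDTH)}

for m in miles:
    # Integer division locates the class that contains this mileage.
    lower = FIRST_LOWER + (m - FIRST_LOWER) // CLASS_WIDTH * CLASS_WIDTH
    counts[(lower, lower + CLASS_WIDTH - 1)] += 1

for (lower, upper), f in counts.items():
    print(f"{lower} - {upper}: {f}")
print("sum of f =", sum(counts.values()))
```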

Class Intervals and Boundaries


Data can exist in two forms: discrete and continuous.
1. Discrete data occurs as an integer (whole number), e.g. 1, 2, 3, 4, 5, 6, ... etc.
2. Continuous data occurs as a continuous number and can take any level of accuracy, e.g. the number of miles travelled could be 440.3 or 440.34 etc.

     STATED LIMIT     MATHEMATICAL LIMIT
                      DISCRETE    CONTINUOUS
A    5 - under 10     5 - 9       5 - 9.999999...
     10 - under 15    10 - 14     10 - 14.999999...
B    5 - 9            5 - 9       4.5 - 9.5
     10 - 15          10 - 15     9.5 - 15.5

Normally, we would look at creating 5 to 12 classes in the grouped frequency distribution, where class width = Upper Class Boundary - Lower Class Boundary.

Class width = (Highest Value - Lowest Value) / Number of Classes
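As a rough illustration of the class width formula and of the boundaries in row B of the table, the sketch below applies both to the mileage data (lowest value 403, highest 515). The function names are hypothetical, and the choice of 6 classes is an assumption taken from the grouped distribution shown earlier.

```python
def class_width(highest, lowest, number_of_classes):
    """Class width = (highest value - lowest value) / number of classes."""
    return (highest - lowest) / number_of_classes


def class_boundaries(lower_limit, upper_limit, gap=1):
    """Boundaries for stated limits such as 5 - 9, where neighbouring
    classes are separated by `gap` (1 for whole-number limits)."""
    half_gap = gap / 2
    return lower_limit - half_gap, upper_limit + half_gap


# Mileage data: lowest value 403, highest value 515, 6 classes assumed.
raw_width = class_width(515, 403, 6)
print(round(raw_width, 2))

# Row B of the table: stated limits 5 - 9 have boundaries 4.5 - 9.5.
lower, upper = class_boundaries(5, 9)
print(lower, upper, upper - lower)
```

The raw width of roughly 18.7 is not a convenient figure, which is consistent with the rounder width of 20 used for the classes 400 - 419, 420 - 439 and so on.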

Graphical Representation of Data


The next stage of analysis after the data has been tabulated is to graph the data using a variety of methods to provide a suitable graph. In this section we will explore:
1. Bar charts
2. Pie charts
3. Histograms
4. Frequency polygons
5. Scatter plots
6. Time series plots

The type of graph you use depends upon the type of variable you are dealing with within your data set, e.g. category (or nominal), ordinal, or interval (or ratio) data, as follows:

Data type             Which graph to use?
Category or nominal   Bar chart, pie chart, cross tab tables (or contingency tables).
Ordinal               Bar chart, pie chart, scatter plots.
Interval or ratio     Histogram, frequency polygon, cumulative frequency curve (or ogive), scatter plots, time series plots.

Bar charts
Categorical data is represented largely by bar and pie charts. Bar charts are very useful in providing a simple pictorial representation of several sets of data on one graph.

Example 1.7 Bar chart for proposed voting behaviour
See text for the Excel solution

Horizontal Bar Charts


Example 1.8 Component bar chart for half-yearly car sales
See text for the Excel solution

Pie charts
In a pie chart each relative frequency is represented by a slice of a circle. Each section represents a category, and the area of a section represents the frequency or number of objects within a category.
Pie charts are particularly useful in showing relative proportions, but their effectiveness tends to diminish for more than eight categories.

Example 1.11 Pie chart for proposed voting behaviour
See text for the Excel solution

Pie chart angles


A set of instructions is provided below if you would like to calculate the angle of each slice in the circle that represents each voting category.

Political Party   Voting Behaviour   Angle Calculation   Angle (1 decimal place)
Conservative      400                (360/1110)*400      129.7°
Labour            510                (360/1110)*510      165.4°
Democrat          78                 (360/1110)*78       25.3°
Green             55                 (360/1110)*55       17.8°
Other             67                 (360/1110)*67       21.7°
Total =           1110                                   359.9°
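The angle calculation (360/1110) * frequency can be sketched in a few lines of Python. The variable names are illustrative; the rounding to one decimal place follows the table, which is also why the rounded angles add up to 359.9 rather than exactly 360.

```python
# Voting frequencies for the 1110 students.
votes = {
    "Conservative": 400,
    "Labour": 510,
    "Democrat": 78,
    "Green": 55,
    "Other": 67,
}

total = sum(votes.values())

# Each slice takes a share of the 360 degrees proportional to its frequency.
angles = {party: round(360 / total * f, 1) for party, f in votes.items()}

for party, angle in angles.items():
    print(f"{party:<13}{votes[party]:>5}{angle:>8.1f}")
print(f"{'Total':<13}{total:>5}{sum(angles.values()):>8.1f}")
```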

Histograms
A graph of the data in a frequency distribution is called a
histogram. The area of each bar is a measure of the frequency of
occurrence (number of values) within each category. If the bar
widths are the same (constant) then the height of the bar is
directly related to the frequency and this information can then be
used to construct the histogram.
Example 1.12 Histogram for the number of insurance claims processed
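Because the area of each bar measures the frequency, the bar height is the frequency divided by the class width (often called the frequency density). The sketch below illustrates this with the grouped mileage classes; the boundaries 399.5, 419.5 and so on are an assumption that follows the whole-number convention for stated limits such as 400 - 419.

```python
# (lower boundary, upper boundary, frequency) for the grouped mileage data.
classes = [
    (399.5, 419.5, 12),
    (419.5, 439.5, 27),
    (439.5, 459.5, 34),
    (459.5, 479.5, 24),
    (479.5, 499.5, 15),
    (499.5, 519.5, 8),
]

heights = []
for lower, upper, f in classes:
    width = upper - lower
    height = f / width  # bar height, so that height * width equals f
    heights.append(height)
    print(f"{lower} - {upper}: width={width}, height={height:.2f}, area={height * width:.0f}")
```

With equal class widths the heights are simply the frequencies scaled by a constant, so plotting the frequencies directly gives the same shape. With unequal widths the division by width is what keeps the areas comparable.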

Histogram Example
Example 1.13 Histogram for the miles recorded by 120 salesmen
See text for the Excel solution

Frequency Polygon
A frequency polygon is formed from a histogram by joining the mid-points of the tops of the rectangles by straight lines. The mid-points of the first and last class are joined to the x-axis on either side at a distance equal to half the class interval of the first and last class.

Example 1.15 Frequency polygon for the miles recorded by 120 salesmen
See text for the Excel solution
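The interior vertices of the frequency polygon are the class mid-points paired with the class frequencies. The sketch below computes them for the grouped mileage data; it deliberately leaves out the two closing points on the x-axis, and the variable names are illustrative.

```python
# (lower stated limit, upper stated limit, frequency) for the mileage classes.
classes = [
    (400, 419, 12),
    (420, 439, 27),
    (440, 459, 34),
    (460, 479, 24),
    (480, 499, 15),
    (500, 519, 8),
]

# Mid-point of a class = (lower limit + upper limit) / 2.
vertices = [((lower + upper) / 2, f) for lower, upper, f in classes]

for midpoint, f in vertices:
    print(f"mid-point {midpoint}: frequency {f}")
```

Joining these points in order with straight lines gives the polygon drawn over the histogram.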

Creating Scatter Plots


A scatter plot is a graph which helps us assess visually the form of relationship between two variables. To illustrate the idea of a scatter plot consider the following problem.

Example 1.16

Employee Number   Productivity, X   % Raise in Productivity, Y
1                 47                4.2
2                 71                8.1
3                 64                6.8
4                 35                4.3
5                 43                5.0
6                 60                7.5
7                 38                4.7
8                 59                5.9
9                 67                6.9
10                56                5.7
11                67                5.7
12                57                5.4
13                69                7.5
14                38                3.8
15                54                5.9
16                76                6.3
17                53                5.7
18                40                4.0
19                47                5.2
20                23                2.2

Scatter plots
Example 1.16 Scatter plot for the % raise in productivity against productivity
See text for the Excel solution

Time series
Time series analysis is concerned with data collected over a period
of time. It attempts to isolate and evaluate various factors which
contribute to changes over time in such variable series as imports
and exports, sales, unemployment and prices. If we can evaluate the
main components which determine the value of say sales for a
particular month then we can project the series into the future to
obtain a forecast.

Example 1.17

Sales of Pip Ltd 2001-2004 (tons)

Year   Quarter 1   Quarter 2   Quarter 3   Quarter 4
2001   654         620         698         723
2002   756         698         748         802
2003   843         799         856         889
2004   967         876         960         976

Time series plots


Example 1.17 Time series plot for quarterly sales of Pip Ltd
See text for the Excel solution

Conclusion
In this presentation we explored summarising data sets using the following three concepts:

Tables
Frequency distributions
Graphs
