
AAMS1773 QUANTITATIVE STUDIES

CHAPTER 1: INTRODUCTION TO STATISTICS AND DATA PRESENTATION
INTRODUCTION TO STATISTICS
Statistics refers to the scientific procedures and methods for collecting, organizing, summarizing, presenting and analyzing data, as well as for drawing valid conclusions and making reasonable decisions based on the analysis.
The figures that result from statistical analysis are also referred to as statistics.
[Figure: STATISTICS at the centre, linked to Collecting, Organizing, Presenting, Analyzing and Interpreting]

PURPOSE OF STATISTICS
Statistical techniques are used extensively by marketing managers, accountants, consumers, educators, politicians, physicians, etc.
Statistical techniques are used to make many decisions that affect our lives. Regardless of what your future line of work is, you will make decisions that involve data.

Reasons for learning statistics:
- To know how to properly present and describe information.
- To know how to obtain reliable forecasts of variables of interest.
- To know how to draw conclusions about large populations based on information obtained from samples.

Population and Sample


Population: The set of all items under observation.
Sample: A set of items selected from the population, i.e. a subset of the population.

Statistic and Parameter


A summary measure, such as the mean, median, mode or standard deviation, computed from sample data is called a statistic.
A summary measure for the entire population is called a parameter.
Statisticians often estimate population parameters from the corresponding sample statistics.
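To make the distinction concrete, here is a minimal Python sketch. The population (the whole numbers 1 to 100), the sample size of 10 and the seed are illustrative assumptions, not data from these notes: the population mean is a parameter, and the mean of a random sample drawn from that population is a statistic that estimates it.

```python
import random
import statistics

# Hypothetical population: the whole numbers 1 to 100 (illustrative data only).
population = list(range(1, 101))

# Parameter: a summary measure computed from the entire population.
population_mean = statistics.mean(population)

# Statistic: the same summary measure computed from sample data.
rng = random.Random(42)              # fixed seed so the sample is reproducible
sample = rng.sample(population, 10)  # 10 distinct items, drawn without replacement
sample_mean = statistics.mean(sample)

print("Parameter (population mean):", population_mean)
print("Statistic (sample mean):", sample_mean)
```

Changing the seed gives a different sample and usually a different sample mean, while the parameter stays at 50.5; this variability is why a sample statistic is treated as an estimate of the parameter.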

TYPES OF VARIABLES
A variable measures a characteristic of the population that the researcher wants to study.

Variable: a characteristic of the population of interest, e.g. monthly income of respondents, respondents' age, gender, level of education, number of children and type of house owned.
A variable is either quantitative or qualitative:
Quantitative or Numerical: measured on a numerical scale; yields a numerical response. E.g. How tall are you? The answer is numerical.
Qualitative or Attributive: measured on a non-numerical scale; yields a categorical response. E.g. Are you a Malaysian? The answer is only Yes or No.
A quantitative variable is either discrete or continuous:
Discrete: a numerical response which arises from a counting process. E.g. How many mobile phones do you have?
Continuous: a numerical response which arises from a measuring process. E.g. What is your weight?

DATA PRESENTATION
Raw data
Data that have been collected but not yet organized or processed are called raw data.
When every observed value of the random variable is listed, the data are called ungrouped data.
Grouping is one of the most common methods of organizing data. When we group data, we are actually constructing a frequency distribution for the raw data.
Frequency Distribution
A frequency distribution is a table in which the possible values of a variable are grouped into non-overlapping classes, and the number of observed values that fall into each class is recorded.
Data organized in a frequency distribution are called grouped data.
Example
The frequency distribution below represents the number of books read by 500 students in a school during one year:
No. of books read    No. of students (Frequency)
0 - 9                52
10 - 19              63
20 - 29              71
30 - 39              96
40 - 49              43
50 - 59              58
60 - 79              72
80 - 99              45
The variable is the number of books read.
The data (number of books read) are grouped into 8 classes.

The following are some guidelines for the construction of frequency distributions; they are not absolute rules.
- Classes / class intervals should be non-overlapping, so that no observation is counted twice.
- Normally, the number of classes should be no fewer than 5 and no more than 15.
- The number of classes, k, can be estimated using the formula
  $$k = \frac{\log n}{\log 2}$$
  where n = number of observations.
- Use equal class sizes / widths whenever possible.
- The class size / width, i, can be determined as
  $$i = \frac{H - L}{k}$$
  where H = the highest data value and L = the lowest data value.
- The starting point should be a little smaller than the lowest value. If possible, it should be an exact multiple of the class size.
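The two formulas can be evaluated directly. The Python sketch below is illustrative only (the function names are hypothetical); it reproduces the figures used in the worked example on books borrowed, where n = 30, H = 100 and L = 10. Rounding k and i up to convenient values is still a judgement made by hand, as in the notes.

```python
import math

def number_of_classes(n):
    """Estimate the number of classes, k = log n / log 2."""
    return math.log(n) / math.log(2)

def class_width(highest, lowest, k):
    """Estimate the class size, i = (H - L) / k."""
    return (highest - lowest) / k

k = number_of_classes(30)
print(f"k = {k:.4f}")      # prints k = 4.9069, so use k = 5
i = class_width(100, 10, 5)
print(f"i = {i}")          # prints i = 18.0, so use a convenient size such as 20
```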
Some common practices for classes:
*Class (exclusive type)    **Class (inclusive type)    Class (open-ended)
0 - < 10  or  0 - 10       0 - 9                       Below 20
10 - < 20 or  10 - 20      10 - 19                     20 - < 30
20 - < 30 or  20 - 30      20 - 29                     30 - < 40
30 - < 40 or  30 - 40      30 - 39                     40 - < 50
40 - < 50 or  40 - 50      40 - 49                     50 and above
*Class (exclusive type) is mainly used for continuous data, or for discrete data that have been rounded to the nearest ten, hundred, thousand, million, etc.
**Class (inclusive type) is mainly used for discrete data where there is a gap between classes.

Example
The following is a record of the number of books borrowed per week in the library for 30 weeks:
21  47  64  42  89  76  55  100  75  67
89  15  97  25  35  12  92   36  93  34
87  27  74  21  66  25  47   10  89  30
Tabulate the data in the form of a frequency distribution, grouping by a suitable class size.
Solution:
The variable is the number of books borrowed per week, which is discrete.
Number of classes:
$$k = \frac{\log n}{\log 2} = \frac{\log 30}{\log 2} = 4.9069$$
Use k = 5.
Class size: Lowest value = 10; highest value = 100
$$i = \frac{H - L}{k} = \frac{100 - 10}{5} = 18$$
Use i = 20.
Frequency distribution for the number of books borrowed per week in the library for 30 weeks:
Number of books    Tally count    Number of weeks
10 - 29            ///// ///      8
30 - 49            ///// //       7
50 - 69            ////           4
70 - 89            ///// //       7
90 - 109           ////           4
Total                             30
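The tallying step can be checked mechanically. A minimal Python sketch, assuming the inclusive classes 10 - 29, 30 - 49, ..., 90 - 109 chosen above (the variable and function names are illustrative):

```python
# Number of books borrowed per week over 30 weeks (data from the example).
books = [21, 47, 64, 42, 89, 76, 55, 100, 75, 67,
         89, 15, 97, 25, 35, 12, 92, 36, 93, 34,
         87, 27, 74, 21, 66, 25, 47, 10, 89, 30]

# Inclusive classes: (lower limit, upper limit); both limits belong to the class.
classes = [(10, 29), (30, 49), (50, 69), (70, 89), (90, 109)]

def tally(data, classes):
    """Count how many observations fall within each inclusive class."""
    return [sum(1 for x in data if lo <= x <= hi) for lo, hi in classes]

freqs = tally(books, classes)
for (lo, hi), f in zip(classes, freqs):
    # one stroke per observation, followed by the frequency
    print(f"{lo:>3} - {hi:<3}  {'/' * f:<10} {f}")
print("Total", sum(freqs))
```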

Example
The average amount of rainfall (in cm) for a small town was recorded for the month of December:
20.42  21.06  22.40  21.117  22.6    33.01  22.89  22.9
30.34  25.61  23     24.5    26.881  24.49  23.7   28
25.0   25.69  27.14  26.321  27.216  19.22  29.6   26.5
24.15  24.18  26.4   25      25.7    28     25.556
Construct a grouped frequency distribution for the data using a suitable class size.
Solution:
The variable is the average amount of rainfall, which is continuous.
Number of classes:
$$k = \frac{\log n}{\log 2} = \frac{\log 31}{\log 2} = 4.9542$$
Use k = 5.
Class size: Lowest value = 19.22; highest value = 33.01
$$i = \frac{H - L}{k} = \frac{33.01 - 19.22}{5} = 2.758$$
Use i = 3.
Frequency distribution for the average amount of rainfall in the month of December:
Average amount of rainfall (cm)    Tally count       Number of days
19 - < 22                          ////              4
22 - < 25                          ///// /////       10
25 - < 28                          ///// ///// //    12
28 - < 31                          ////              4
31 - < 34                          /                 1
Total                                                31
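For exclusive classes of equal width, each value can be assigned to its class by arithmetic instead of comparison. A minimal Python sketch, assuming the starting point 19, class size 3 and k = 5 chosen above (names are illustrative):

```python
# Average rainfall (cm) recorded in December (31 values from the example).
rainfall = [20.42, 21.06, 22.40, 21.117, 22.6, 33.01, 22.89, 22.9,
            30.34, 25.61, 23, 24.5, 26.881, 24.49, 23.7, 28,
            25.0, 25.69, 27.14, 26.321, 27.216, 19.22, 29.6, 26.5,
            24.15, 24.18, 26.4, 25, 25.7, 28, 25.556]

start, width, k = 19, 3, 5  # starting point, class size and number of classes

# Exclusive classes [lower, upper): a value equal to an upper limit
# belongs to the next class, e.g. 25 goes into 25 - < 28.
# Assumes every value lies in [start, start + k * width), i.e. [19, 34).
freqs = [0] * k
for x in rainfall:
    index = int((x - start) // width)
    freqs[index] += 1

for j, f in enumerate(freqs):
    lower = start + j * width
    print(f"{lower} - < {lower + width}: {f}")
print("Total:", sum(freqs))
```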

Basic components of a frequency distribution:
- Class limits: the smallest and largest possible measurements in each class, i.e. the lower and upper limits of the class.
- Class boundaries: the dividing lines between successive classes.
- Class size / class width = upper class boundary - lower class boundary. The exceptions are open-ended first and last classes used to include extreme values.
- Open-ended classes: one boundary is not specified, e.g. "below 20" or "50 and above". In further calculations, assume such a class is of the same size as its immediate neighboring class.
- Class mark or class mid-point: the value exactly at the middle of a class. It lies halfway between the class limits or between the class boundaries:
  $$\text{Class mark} = \frac{\text{lower class limit} + \text{upper class limit}}{2}$$
  or
  $$\text{Class mark} = \frac{\text{lower class boundary} + \text{upper class boundary}}{2}$$
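For inclusive classes of whole numbers, such as 10 - 29, 30 - 49, ..., each boundary lies halfway across the gap between one class's upper limit and the next class's lower limit. The Python sketch below assumes a gap of 1 (so each boundary is 0.5 beyond a limit), which holds for the books example but not for every data set; the function name is hypothetical.

```python
def class_summary(limits, gap=1):
    """For inclusive classes given as (lower limit, upper limit) pairs, return
    (lower boundary, upper boundary, class size, class mark) for each class.
    Assumes the gap between consecutive classes is `gap` (1 for whole numbers)."""
    half = gap / 2
    rows = []
    for lower, upper in limits:
        lb, ub = lower - half, upper + half
        rows.append((lb, ub, ub - lb, (lower + upper) / 2))
    return rows

summary = class_summary([(10, 29), (30, 49), (50, 69), (70, 89), (90, 109)])
for lb, ub, size, mark in summary:
    print(f"{lb} - {ub}   size = {size}   mark = {mark}")
```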

Example
Class      Class boundaries    Class size            Class mark
10 - 29    9.5 - 29.5          29.5 - 9.5 = 20       19.5
30 - 49    29.5 - 49.5         49.5 - 29.5 = 20      39.5
50 - 69    49.5 - 69.5         69.5 - 49.5 = 20      59.5
70 - 89    69.5 - 89.5         89.5 - 69.5 = 20      79.5
90 - 109   89.5 - 109.5        109.5 - 89.5 = 20     99.5
[Figure: number line showing the class limits 10, 29 | 30, 49 | 50, 69 | 70, ..., the class boundaries 9.5, 29.5, 49.5, 69.5, ... between them, and the class marks 19.5, 39.5 and 59.5 of the 1st, 2nd and 3rd classes]
Example
Class        Class boundaries    Class size      Class mark
19 - < 22    19 - 22             22 - 19 = 3     20.5
22 - < 25    22 - 25             25 - 22 = 3     23.5
25 - < 28    25 - 28             28 - 25 = 3     26.5
28 - < 31    28 - 31             31 - 28 = 3     29.5
31 - < 34    31 - 34             34 - 31 = 3     32.5
[Figure: number line on which the class limits and class boundaries coincide at 19, 22, 25, 28, ..., with the class marks 20.5, 23.5 and 26.5 of the 1st, 2nd and 3rd classes]
Histogram
A histogram is a graphical representation of a frequency distribution. A bar is drawn for each class, and the area of each bar is proportional to the class frequency. The bars are drawn adjacent to one another. Class boundaries are marked on the horizontal axis.
For a frequency distribution with equal class sizes, the height of each bar is drawn proportional to the actual frequency of the class, and the width of each bar extends from the lower class boundary to the upper class boundary of the class.
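As a rough illustration, the sketch below draws a horizontal text histogram in Python for the books-borrowed distribution used in the next example (frequencies 8, 7, 4, 7, 4). It is a console stand-in for a plotted chart, not the method used in these notes, and the function name is hypothetical: each row is one class, identified by its class boundaries, and the bar length equals the frequency, which is valid only because all classes have the same size.

```python
# Class boundaries and frequencies for the books-borrowed distribution.
boundaries = [9.5, 29.5, 49.5, 69.5, 89.5, 109.5]
frequencies = [8, 7, 4, 7, 4]

def text_histogram(boundaries, frequencies, symbol="#"):
    """Return one text row per class: its boundaries and a bar whose
    length equals the class frequency (valid only for equal class sizes)."""
    rows = []
    for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
        rows.append(f"{lower:>6} - {upper:<6} | {symbol * f} ({f})")
    return rows

for row in text_histogram(boundaries, frequencies):
    print(row)
```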
Example
Construct a histogram for the frequency distribution of the number of books borrowed per week in the library for 30 weeks:
Number of books    Number of weeks
10 - 29            8
30 - 49            7
50 - 69            4
70 - 89            7
90 - 109           4
Total              30

Solution:
[Figure: Histogram of number of books borrowed per week. Horizontal axis: Number of books borrowed, with class boundaries 9.5, 29.5, 49.5, 69.5, 89.5 and 109.5; vertical axis: Frequency (0 to 10).]

Example
Construct a histogram for the frequency distribution of the average amount of rainfall in the month of December:
Average amount of rainfall (cm)    Number of days
19 - < 22                          4
22 - < 25                          10
25 - < 28                          12
28 - < 31                          4
31 - < 34                          1
Total                              31

Solution:
[Figure: Histogram of the average amount of rainfall (cm) in the month of December. Horizontal axis: Average amount of rainfall (cm), with class boundaries 19, 22, 25, 28, 31 and 34; vertical axis: Frequency (0 to 14).]

For a frequency distribution with unequal class sizes, the height of each bar is drawn proportional to the adjusted frequency of the class, where
$$\text{Adjusted frequency} = \frac{\text{common class size} \times \text{frequency}}{\text{class size}}$$
Example
Construct a histogram for the frequency distribution of sales of
46 branches of a company in the course of one week.

Sales (units)    No. of branches
0 - 99           10
100 - 199        18
200 - 299        8
300 - 499        6
500 - 699        4

Solution:
Sales (units)   No. of branches (frequency)   Class boundaries   Class size   *Adjusted frequency
0 - 99          10                            -0.5 - 99.5        100          10
100 - 199       18                            99.5 - 199.5       100          18
200 - 299       8                             199.5 - 299.5      100          8
300 - 499       6                             299.5 - 499.5      200          3
500 - 699       4                             499.5 - 699.5      200          2
* Adjusted frequency, where the common class size = 100:
$$\text{Adjusted frequency} = \frac{100 \times \text{frequency}}{\text{class size}}$$
[Figure: Histogram of sales of 46 branches of a company in one week. Horizontal axis: Sales (units), with class boundaries -0.5, 99.5, 199.5, 299.5, 499.5 and 699.5; vertical axis: Adjusted freq. (0 to 20).]
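The adjusted-frequency column can be computed directly from the class boundaries. A minimal Python sketch using the sales example, with the common class size of 100 taken from the notes (the function name is hypothetical):

```python
# Sales of 46 branches in one week: class boundaries and frequencies from the example.
boundaries = [-0.5, 99.5, 199.5, 299.5, 499.5, 699.5]
frequencies = [10, 18, 8, 6, 4]

def adjusted_frequencies(boundaries, frequencies, common_size):
    """Scale each frequency by common_size / class size, so that the bar
    areas of a histogram with unequal classes stay proportional to frequency."""
    adjusted = []
    for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
        size = upper - lower
        adjusted.append(common_size * f / size)
    return adjusted

heights = adjusted_frequencies(boundaries, frequencies, common_size=100)
print(heights)  # [10.0, 18.0, 8.0, 3.0, 2.0]
```

For a distribution with equal class sizes, choosing the common class size equal to that size leaves every frequency unchanged, so the same function also covers the equal-class case.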

Types of histogram          Vertical axis label
Frequency histogram         No. of observations
Relative freq. histogram    Proportion of observations
Percentage histogram        Percentage of observations
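The three types differ only in how the bar heights are scaled. A small Python sketch, reusing the books-borrowed frequencies from the earlier example for illustration: the relative frequency of a class is its frequency divided by the total number of observations, and the percentage is that proportion multiplied by 100.

```python
frequencies = [8, 7, 4, 7, 4]   # weeks per class, books-borrowed example
total = sum(frequencies)        # 30 weeks

relative = [f / total for f in frequencies]   # proportion of observations
percentage = [100 * r for r in relative]      # percentage of observations

for f, r, p in zip(frequencies, relative, percentage):
    print(f"{f:>2}  {r:.4f}  {p:.2f}%")
```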

The term skewness is used to describe the shape of a frequency distribution.
Positive skewness: the peak of the histogram lies to the left of the centre of the distribution.
Negative skewness: the peak of the histogram lies to the right of the centre of the distribution.
If the peak of the histogram lies at the centre of the distribution, with the two slopes virtually identical, the distribution is said to be symmetrical, or not skewed.


Cumulative Frequency Distribution
Given a frequency distribution, a cumulative frequency distribution can be derived by successively adding the frequencies of the classes.
There are two types of cumulative frequency distributions:
1. Less than cumulative frequency distribution
A table showing the total frequency of all values less than the upper class boundary of each class is called a less than cumulative frequency distribution.
2. More than cumulative frequency distribution
A table showing the total frequency of all values greater than or equal to the lower class boundary of each class is called a more than cumulative frequency distribution.
* In the examination, only the less than cumulative frequency distribution will be included.
Example
Number of books    Number of weeks (freq.)    Class boundaries
10 - 29            8                          9.5 - 29.5
30 - 49            7                          29.5 - 49.5
50 - 69            4                          49.5 - 69.5
70 - 89            7                          69.5 - 89.5
90 - 109           4                          89.5 - 109.5

< Cum. Freq. table
No. of books    Cum. freq.
< 9.5           0
< 29.5          8
< 49.5          15
< 69.5          19
< 89.5          26
< 109.5         30
(The values in the first column are the upper class boundaries.)
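Building the less than table amounts to a running total of the frequencies, starting from 0 below the first class. A minimal Python sketch using the standard library's `itertools.accumulate` on the books-borrowed data (the variable names are illustrative):

```python
from itertools import accumulate

upper_boundaries = [9.5, 29.5, 49.5, 69.5, 89.5, 109.5]
frequencies = [8, 7, 4, 7, 4]  # number of weeks in each class

# Start with 0 for "< 9.5" (no observations lie below the first class),
# then keep a running total of the class frequencies.
cum_freq = [0] + list(accumulate(frequencies))

for boundary, cf in zip(upper_boundaries, cum_freq):
    print(f"< {boundary:<6} {cf}")
```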


Example
Average amount of rainfall (cm)    Number of days (freq.)    Class boundaries
19 - < 22                          4                         19 - 22
22 - < 25                          10                        22 - 25
25 - < 28                          12                        25 - 28
28 - < 31                          4                         28 - 31
31 - < 34                          1                         31 - 34

< Cum. Freq. table
Ave. amount of rainfall (cm)    Cum. freq.
< 19                            0
< 22                            4
< 25                            14
< 28                            26
< 31                            30
< 34                            31
(The values in the first column are the upper class boundaries.)

Ogives (Cum. Freq. Polygon / Cum. Freq. Curve)
An ogive is a line chart of a cumulative frequency distribution.
There are two types of ogives:
1. Less than ogive: shows the cumulative frequency less than the upper class boundary, plotted against the upper class boundary of each class.
2. More than ogive: shows the cumulative frequency greater than or equal to the lower class boundary, plotted against the lower class boundary of each class.
*In the examination, only the less than ogive will be included.


Example
The following table shows the output produced by 20 employees in an hour in a factory.
Output (units)    Number of employees
1 - 5             1
6 - 10            2
11 - 15           3
16 - 20           9
21 - 25           5

Construct a less than cumulative frequency distribution and plot a less than ogive. Hence estimate
(i) the number of employees producing output less than 13 units
(ii) the proportion of employees producing output more than 22 units
(iii) the number of units of output which will be exceeded by 90% of the employees
(iv) the number of employees producing output between 8 and 18 units.
Solution:
Output (units)    Number of employees (freq.)    Class boundaries
1 - 5             1                              0.5 - 5.5
6 - 10            2                              5.5 - 10.5
11 - 15           3                              10.5 - 15.5
16 - 20           9                              15.5 - 20.5
21 - 25           5                              20.5 - 25.5

< Cum. Freq. table
Output (units)    Cum. freq.
< 0.5             0
< 5.5             1
< 10.5            3
< 15.5            6
< 20.5            15
< 25.5            20


[Figure: '<' Ogive of output produced by 20 employees. Horizontal axis: Output, with class boundaries 0.5, 5.5, 10.5, 15.5, 20.5 and 25.5; vertical axis: cum. freq. (0 to 20).]

From the < ogive, we can estimate
(i) the number of employees producing output less than 13 units to be 4.5.
(ii) the proportion of employees producing output more than 22 units. The ogive gives 16.5 employees producing less than 22 units, so the proportion is
$$\frac{20 - 16.5}{20} = \frac{3.5}{20} = 0.175$$
(iii) the number of units of output which will be exceeded by 90% of the employees to be x units.
90% of the employees are producing more than x units, so the other 10% (10% × 20 = 2 employees) are producing less than x units. From the < ogive, x = 8 units.
(iv) the number of employees producing output between 8 and 18 units. The ogive gives 10.5 employees below 18 units and 2 employees below 8 units, so the estimate is 10.5 - 2 = 8.5.
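Reading values off the ogive by eye is equivalent to linear interpolation between the plotted points, provided the points are joined by straight lines as in a cumulative frequency polygon. The Python sketch below makes that assumption explicit (the helper names are hypothetical) and reproduces the four estimates above from the < cum. freq. table.

```python
from bisect import bisect_right

# Less than cumulative frequency table for the output of 20 employees.
boundaries = [0.5, 5.5, 10.5, 15.5, 20.5, 25.5]
cum_freq = [0, 1, 3, 6, 15, 20]

def interpolate(xs, ys, x):
    """Linearly interpolate y at x on the polygon through the points (xs, ys).
    xs must be strictly increasing and xs[0] <= x <= xs[-1]."""
    j = min(bisect_right(xs, x), len(xs) - 1)  # index of the right-hand point
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

def cf_at(output):
    """Number of employees producing less than `output`, read off the ogive."""
    return interpolate(boundaries, cum_freq, output)

def output_at(cf):
    """Output at which the ogive reaches cumulative frequency `cf`."""
    return interpolate(cum_freq, boundaries, cf)

n = cum_freq[-1]
print("(i)  ", cf_at(13))               # employees below 13 units
print("(ii) ", (n - cf_at(22)) / n)     # proportion above 22 units
print("(iii)", output_at(0.10 * n))     # output exceeded by 90% of employees
print("(iv) ", cf_at(18) - cf_at(8))    # employees between 8 and 18 units
```

Because a hand-drawn ogive may be smoothed into a curve, graph readings in practice can differ slightly from these interpolated values.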



TUTORIAL 1 (DATA PRESENTATION)
1. The data below are the marks obtained by 40 students in an examination.
62  54  38  33  80  66  56  60  68  52
57  71  85  47  50  71  52  76  49  69
48  68  55  49  79  41  61  65  75  81
64  58  66  59  52  43  65  48  41  56
(a) Construct a frequency distribution table using 30 - 39 as the first class, 40 - 49 as the second class and so on.
(b) Draw a histogram for the above data.
(c) Construct a less than cumulative frequency distribution.
(d) Draw a less than cumulative frequency polygon.
2. The following data are the heights (to the nearest centimeter) of 85 employees in a company:
169  179  183  186  166  181  177  173  167  193  176
183  162  170  186  174  188  165  168  174  170  176
186  177  185  175  179  166  190  182  182  180  194
177  184  175  168  181  180  172  178  192  175  189
180  175  183  191  172  188  180  176  185  178  179
173  165  170  178  181  181  189  187  191  179  196
179  182  171  169  171  184  198  182  175  190  187
176  164  187  167  185  177  184  178

(a) Tabulate the above data in the form of a frequency distribution, using 160 - < 165 as the first class, 165 - < 170 as the second class and so on.
(b) Draw a histogram for the above data.
(c) Construct a less than cumulative frequency distribution.
(d) Draw a less than cumulative frequency polygon (ogive).


(e) Using the ogive in part (d), estimate:
(i) the height which will be exceeded by 25% of the employees;
(ii) the number of employees whose heights are less than 175 cm;
(iii) the proportion of employees whose heights exceed 175 cm.
3. The following table shows the gross profit of a random sample of 500 small companies in a year.
Gross profit ($ thousand)    Percentage of companies
Under 10                     8
10 and under 20              22
20 and under 30              36
30 and under 40              18
40 and under 60              10
60 and under 90              6

(a) Draw a histogram.
(b) Construct a less than cumulative frequency distribution.
(c) Plot a less than ogive and use it to estimate
(i) the number of small companies which earned at least $38,000 of gross profit;
(ii) the proportion of small companies which earned less than $45,000 of gross profit.
4. The following data show the number of rejects from the assembly line of a local manufacturer, recorded over a period of 80 days:
Number of rejects    Number of days
0 - 4                1
5 - 9                14
10 - 14              23
15 - 19              20
20 - 24              16
25 - 29              6

(a) Draw a histogram for the data.
(b) Construct a less than cumulative frequency distribution and plot a less than cumulative frequency polygon. Use the graph to estimate
(i) the number of days that produce at most 12 rejects;
(ii) the number of rejects exceeded by 10% of the days.
5. The following cumulative frequency distribution shows the duration of each telephone call made by an employee, recorded over a period of one month:
Duration (minutes)    Number of calls
Under 3               45
Under 6               104
Under 9               142
Under 12              173
Under 18              192
Under 24              200
(a) Draw the ogive for the above cumulative frequency distribution.
(b) Use the ogive to estimate:
(i) the number of calls that lasted between 5 and 10 minutes;
(ii) the duration not exceeded by 90% of the calls.
(c) Redraft the above data in the form of a frequency distribution and construct a histogram.
Answers:
2. (e) (i) 185.5 cm    (ii) 23    (iii) 0.7294
3. (c) (i) 100    (ii) 0.865
4. (b) (i) 26.5 days    (ii) 24 rejects
5. (b) (i) 68 calls    (ii) 14 min.
