
MTH201

Probability and Statistics

Acknowledgment: Sona Golder & Sheldon Ross


Introduction

In a broad sense, making an inference implies partially or completely describing a phenomenon.

Before discussing inference making, we must have a method for characterizing or describing a set of numbers.

The characterizations must be meaningful, so that knowledge of the descriptive measures enables us to clearly visualize a set of numbers.

Generally, we can characterize a set of numbers using either graphical or numerical methods.
Graphical Methods

Raw data refers to the information collected from a sample that has not been subjected to statistical manipulation.

Table: Number of Children per Family

2 2 5 3 4 3 3 6 4 3 4 4 4 4 4 2 5 9 2 3
1 3 5 2 4 4 4 3 3 2 2 4 2 2 6 6 1 3 3 3
3 2 3 4 7 3 3 3 2 2 2 2 3 2 3 2 3 2 5 2
3 2 2 2 4 3 3 2 3 2 4 3 3 3 4 2 4 1 2 2
2 4 3 3 3 5 2 3 3 2 2 3 3 4 2 2 2 7 2 3

The information shown in the following table is a frequency distribution. A frequency distribution represents the number of occurrences of each possible value (or interval) of a variable.
Frequency Distribution

The absolute frequency ($f_i$) reports the number of times that a value $x_i$ occurs in $n$ observations.

The relative frequency ($f_i/n$) reports the proportion of observations for which the value $x_i$ occurs.

The empirical cumulative distribution is the fraction of the sample (data) smaller than or equal to some number $x_i$.

We can display all of this information either in tabular form or graphically.
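In symbols, as a minimal restatement: write the sample as $y_1, \dots, y_n$ (notation assumed here for illustration, not taken from the original slides) and the distinct values as $x_i$. The last line works one case with the number-of-children data tabulated below, indexing the values so that $x_i = i$.

```latex
\begin{aligned}
f_i &= \#\{\, j : y_j = x_i \,\}, \qquad \text{relative frequency} = \frac{f_i}{n},\\
\hat{F}_n(x) &= \frac{1}{n}\,\#\{\, j : y_j \le x \,\} = \sum_{i \,:\, x_i \le x} \frac{f_i}{n},\\[4pt]
\text{e.g. } f_2 &= 34, \quad \frac{f_2}{n} = \frac{34}{100} = 0.34, \quad
\hat{F}_{100}(2) = \frac{f_1 + f_2}{n} = \frac{3 + 34}{100} = 0.37.
\end{aligned}
```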

Table: Number of Children per Family

Number of  Tabulation                                Frequency  Relative   Empirical Cumulative
Children                                                        Frequency  Distribution Function
1          |||                                       3          0.03       0.03
2          ||||| ||||| ||||| ||||| ||||| ||||| ||||  34         0.34       0.37
3          ||||| ||||| ||||| ||||| ||||| ||||| ||||  34         0.34       0.71
4          ||||| ||||| ||||| |||                     18         0.18       0.89
5          |||||                                     5          0.05       0.94
6          |||                                       3          0.03       0.97
7          ||                                        2          0.02       0.99
8                                                    0          0.00       0.99
9          |                                         1          0.01       1.00
Total                                                n = 100    1.00
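The table can be reproduced from the raw data with a short script. This is a sketch assuming Python 3; the helper names `tally` and `frequency_table` are hypothetical, chosen for illustration. It counts each value with `collections.Counter` and then accumulates the counts to obtain the empirical cumulative distribution.

```python
from collections import Counter

# Raw data from the "Number of Children per Family" table (one value per family).
RAW = """
2 2 5 3 4 3 3 6 4 3 4 4 4 4 4 2 5 9 2 3
1 3 5 2 4 4 4 3 3 2 2 4 2 2 6 6 1 3 3 3
3 2 3 4 7 3 3 3 2 2 2 2 3 2 3 2 3 2 5 2
3 2 2 2 4 3 3 2 3 2 4 3 3 3 4 2 4 1 2 2
2 4 3 3 3 5 2 3 3 2 2 3 3 4 2 2 2 7 2 3
"""


def tally(f):
    """Tally marks grouped in fives, e.g. 7 -> '||||| ||'."""
    groups = ["|||||"] * (f // 5)
    if f % 5:
        groups.append("|" * (f % 5))
    return " ".join(groups)


def frequency_table(data, values):
    """Rows of (value, absolute freq, relative freq, empirical cumulative)."""
    n = len(data)
    counts = Counter(data)
    rows, running = [], 0
    for x in values:
        f = counts[x]        # absolute frequency f_i (0 if x never occurs)
        running += f         # number of observations <= x
        rows.append((x, f, f / n, running / n))
    return rows


children = [int(tok) for tok in RAW.split()]
table = frequency_table(children, range(1, 10))

print(f"n = {len(children)}")
for x, f, rel, cum in table:
    print(f"{x}  {tally(f):<40}  {f:3d}  {rel:.2f}  {cum:.2f}")
```

Listing the values explicitly (`range(1, 10)`) keeps the zero-frequency row for 8 children, which `Counter` alone would not report.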
Relative Frequency Distribution as a (Density) Histogram

Figure: Relative Frequency Distribution as a (Density) Histogram (vertical axis: relative frequency; horizontal axis: number of children)
Empirical Cumulative Distribution Function

Figure: Empirical Cumulative Distribution Function (vertical axis: empirical cumulative distribution; horizontal axis: number of children)
Stem-and-Leaf Plot

A stem-and-leaf plot is similar to a histogram, except that it contains more information by providing details regarding individual values in the sample.

The data are arranged by place value. The digits in the largest
place are referred to as the stem and the digits in the smallest
place are referred to as the leaves.

An example should make this clear.



Figure: Stem-and-Leaf Plot for Exam Grades


Distribution of Examination Grades (n=50)

81 38 93 47 45 70 52 59 51 93
54 42 58 58 67 63 71 91 65 90
42 69 60 69 68 65 77 72 75 73
71 57 76 61 74 76 73 75 97 84
83 84 87 83 82 81 85 79 34 66

Stem  Leaves          Frequency
3     48              2
4     2257            4
5     1247889         7
6     0135567899      10
7     0112334556679   13
8     112334457       9
9     01337           5
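A stem-and-leaf display like this one can be generated by splitting each grade into its tens digit (stem) and units digit (leaf). The sketch below assumes every value is a non-negative integer below 100; `stem_and_leaf` is a hypothetical helper, and it omits stems with no leaves, which a general implementation would still print.

```python
from collections import defaultdict

# Exam grades from "Distribution of Examination Grades (n=50)".
GRADES = [81, 38, 93, 47, 45, 70, 52, 59, 51, 93,
          54, 42, 58, 58, 67, 63, 71, 91, 65, 90,
          42, 69, 60, 69, 68, 65, 77, 72, 75, 73,
          71, 57, 76, 61, 74, 76, 73, 75, 97, 84,
          83, 84, 87, 83, 82, 81, 85, 79, 34, 66]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits) as a string."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)   # e.g. 81 -> stem 8, leaf 1
        leaves[stem].append(leaf)
    return {s: "".join(str(d) for d in sorted(ls)) for s, ls in sorted(leaves.items())}


plot = stem_and_leaf(GRADES)
for stem, leaf_str in plot.items():
    print(f"{stem} | {leaf_str:<15} {len(leaf_str)}")
```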
Percentiles

Figure: Relative Frequency Distribution as a (Density) Histogram (vertical axis: relative frequency; horizontal axis: number of children)

Suppose that a family has 2 children. We can say that 37% (3% + 34% = 37%) of families in the sample have two or fewer children.

Thus, a two-child family represents the 37th percentile in our sample.
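The same calculation can be written as a percentile rank: the percentage of observations less than or equal to a given value. The helper `percentile_rank` is hypothetical and follows the "less than or equal to" reading used above; other textbooks use slightly different conventions (for example, counting only half of the observations tied with the value). The 100 observations are rebuilt from the frequency column of the table.

```python
# Rebuild the 100 observations from the frequency column of the table.
children = [1] * 3 + [2] * 34 + [3] * 34 + [4] * 18 + [5] * 5 + [6] * 3 + [7] * 2 + [9] * 1


def percentile_rank(data, value):
    """Percentage of observations less than or equal to `value`."""
    at_or_below = sum(1 for x in data if x <= value)
    return 100 * at_or_below / len(data)


print(percentile_rank(children, 2))  # 37.0
```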

The percentiles that cut the data into four quarters have special names.

The 25th percentile is known as the lower quartile.

The 75th percentile is known as the upper quartile.
The 50th percentile is known as the median.

The distance between the upper quartile and the lower quartile is called the interquartile range (IQR).

Information about these quartiles can be shown using a box plot (also called a box-and-whisker plot).
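For the children data the quartiles can be computed with Python's standard `statistics.quantiles` function, which with `n=4` returns the three cut points: lower quartile, median, upper quartile. Several interpolation conventions for quartiles exist; this sketch uses the module's default (`method="exclusive"`). Because long runs of tied values surround each cut point in these data, the usual conventions agree here.

```python
import statistics

# Rebuild the 100 observations from the frequency column of the table.
children = [1] * 3 + [2] * 34 + [3] * 34 + [4] * 18 + [5] * 5 + [6] * 3 + [7] * 2 + [9] * 1

# n=4 gives the three cut points that split the sorted data into quarters.
q1, median, q3 = statistics.quantiles(children, n=4)
iqr = q3 - q1   # interquartile range
print(f"lower quartile = {q1}, median = {median}, upper quartile = {q3}, IQR = {iqr}")
```

This prints `lower quartile = 2.0, median = 3.0, upper quartile = 4.0, IQR = 2.0`.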
Box Plot

Figure: A Box Plot (or Box and Whisker Plot) for the Number of Children per Family (labeled: lower adjacent value, 25th percentile, median, 75th percentile, upper adjacent value, maximum value; horizontal axis: number of children)
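The figure labels lower and upper adjacent values without defining them. A common convention, usually attributed to Tukey and assumed in this sketch, draws the whiskers to the most extreme observations lying within 1.5 × IQR of the quartiles and plots anything beyond them individually. `box_plot_summary` is a hypothetical helper.

```python
import statistics

# Rebuild the 100 observations from the frequency column of the table.
children = [1] * 3 + [2] * 34 + [3] * 34 + [4] * 18 + [5] * 5 + [6] * 3 + [7] * 2 + [9] * 1


def box_plot_summary(data, k=1.5):
    """Quartiles, adjacent values and outliers, assuming Tukey's k * IQR fences."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3,
        "lower_adjacent": min(inside),   # smallest observation within the fences
        "upper_adjacent": max(inside),   # largest observation within the fences
        "outliers": sorted(x for x in data if x < lower_fence or x > upper_fence),
    }


summary = box_plot_summary(children)
print(summary)
```

With quartiles 2 and 4 the fences are -1 and 7, so the whiskers run from 1 to 7 and the single nine-child family falls outside them, consistent with the figure labelling the maximum value separately from the upper adjacent value.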
Violin Plot

A violin plot offers a potentially more informative alternative to a box plot.

A violin plot is essentially the same as a box plot, except that it also shows the (local) probability density of the data at different values of the variable.

An example should make this clear.



Figure: A Violin Plot for the Number of Children per Family (labeled: 25th percentile, median, 75th percentile; horizontal axis: number of children)
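The slides do not say how the violin's density outline is estimated. A common choice is a Gaussian kernel density estimate, and the sketch below assumes that, with a hand-picked bandwidth of 0.5 (an assumption; plotting software typically picks the bandwidth with a rule of thumb such as Scott's or Silverman's). `gaussian_kde` here is a hypothetical pure-Python helper, not a library function.

```python
import math

# Rebuild the 100 observations from the frequency column of the table.
children = [1] * 3 + [2] * 34 + [3] * 34 + [4] * 18 + [5] * 5 + [6] * 3 + [7] * 2 + [9] * 1


def gaussian_kde(data, bandwidth):
    """Return f(x) = (1 / (n*h)) * sum_j phi((x - x_j) / h), with phi the standard normal pdf."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xj) / bandwidth) ** 2) for xj in data)

    return density


f = gaussian_kde(children, bandwidth=0.5)   # bandwidth chosen by hand (assumption)
for x in [1, 2, 2.5, 3, 4, 6, 8, 9]:
    print(f"x = {x:>3}: estimated density = {f(x):.3f}")
```

A violin plot draws this curve (mirrored) along the value axis, so the wide part of the violin sits over 2 and 3 children, where most families are.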
Discrete and Continuous Variables

In our running example, the number of children is a discrete variable.

A discrete variable is one that can only take on values from the set of all integers (positive and negative whole numbers and zero), i.e. $D = \{x \mid x \text{ is an integer}\}$. Examples: voter choice or turnout.

A continuous variable is one that can take on values from the set of all real numbers, i.e. $C = \{x \mid -\infty < x < +\infty\}$. Examples: placement on a liberal-conservative scale and income.
Histogram for Continuous Variable

Table: Household Income in the United States (1989)

Household Income          Frequency     Relative Frequency
Less than $5,000          5,684,517     0.06
$5,000 to $9,999          8,529,980     0.09
$10,000 to $14,999        8,133,273     0.09
$15,000 to $24,999        16,123,742    0.18
$25,000 to $34,999        14,575,125    0.16
$35,000 to $49,999        16,428,455    0.18
$50,000 to $74,999        13,777,883    0.15
$75,000 to $99,999        4,704,808     0.05
$100,000 to $149,999      2,593,768     0.03
$150,000 or more          1,442,031     0.02
Total                     91,993,582    1.01*
* Relative frequencies do not sum to exactly 1.00 due to rounding.
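As an arithmetic check, the relative frequencies can be recomputed from the frequency column (each frequency divided by the total). With exact division, the fourth class (15,000 to 24,999 dollars) comes to about 0.175 and the sixth class (35,000 to 49,999 dollars) to about 0.179, both 0.18 when rounded, and the ten rounded values sum to 1.01. This sketch only reproduces that division; `INCOME` is a name introduced here for illustration.

```python
# Household counts by income class (United States, 1989), from the table above.
INCOME = [
    ("Less than $5,000", 5_684_517),
    ("$5,000 to $9,999", 8_529_980),
    ("$10,000 to $14,999", 8_133_273),
    ("$15,000 to $24,999", 16_123_742),
    ("$25,000 to $34,999", 14_575_125),
    ("$35,000 to $49,999", 16_428_455),
    ("$50,000 to $74,999", 13_777_883),
    ("$75,000 to $99,999", 4_704_808),
    ("$100,000 to $149,999", 2_593_768),
    ("$150,000 or more", 1_442_031),
]

total = sum(count for _, count in INCOME)
relative = [(label, count / total) for label, count in INCOME]

for label, r in relative:
    print(f"{label:<22} {r:.4f} -> {r:.2f}")

# Rounding each share to 2 decimals first is why the column need not sum to 1.00.
rounded_sum = round(sum(round(r, 2) for _, r in relative), 2)
print(f"total households = {total:,}; sum of rounded relative frequencies = {rounded_sum}")
```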

Figure 2.2: Household Income in the United States (1989) (histogram; horizontal axis: household income, in $1,000)



The cells have been chosen somewhat arbitrarily, but with the following conveniences in mind:

1. The number of cells is a reasonable compromise between too much detail and too little. Usually 5 to 15 cells are appropriate.
2. Each cell midpoint, which hereafter will represent all observations in the cell, is a convenient whole number.
3. If an observation occurs right on a cell boundary, then move the first such occurrence into the cell above, the second into the cell below, and repeat. This minimizes directional bias in the distribution.
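Rule 3 can be made concrete with a small binning routine. This is a sketch under stated assumptions: cells are given by a sorted list of edges, every value lies between the first and last edge, the last cell includes its upper edge, and a single alternation flag is shared by all interior boundaries (the rule as stated does not say whether the alternation restarts at each boundary). `bin_counts` is a hypothetical helper built on the standard `bisect` module.

```python
import bisect


def bin_counts(values, edges):
    """Count observations per cell; cell k covers edges[k] to edges[k + 1].

    A value exactly on an interior boundary alternates between the cell above
    (larger values) and the cell below, starting with the cell above.
    Assumption: one alternation flag is shared by all interior boundaries.
    The last cell also includes its upper edge.
    """
    counts = [0] * (len(edges) - 1)
    send_up = True
    for v in values:
        if not edges[0] <= v <= edges[-1]:
            raise ValueError(f"{v} lies outside the cells")
        k = bisect.bisect_left(edges, v)   # index of the first edge >= v
        if 0 < k < len(edges) - 1 and edges[k] == v:
            # v sits on the boundary between cell k - 1 (below) and cell k (above)
            counts[k if send_up else k - 1] += 1
            send_up = not send_up
        else:
            # interior point or an outer edge: cell whose left edge is <= v
            cell = bisect.bisect_right(edges, v) - 1
            counts[min(cell, len(counts) - 1)] += 1
    return counts


print(bin_counts([5, 10, 10, 10, 15, 20, 25, 30], [0, 10, 20, 30]))
```

Here the call returns `[2, 4, 2]`: the three 10s go up, down, up, and the 20 goes down because it is the fourth boundary value encountered.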
