
Introduction to Probability and Statistics

Handout #7

Instructor: Lingzhou Xue

TA: Daniel Eck

The pdf file for this class is available on the class web page.

Chapter 8 Fundamental Sampling Distributions and Data Descriptions

Sample Mean $\bar{X}$ and Sample Variance $S^2$.

Histogram and Box Plot.

Central Limit Theorem (CLT).

$\chi^2$, $t$, and $F$ Distributions.

Example 1: Sample Distribution

The sample distribution is the distribution resulting from the collection of actual data. A major characteristic of a sample is that it contains a finite number of scores; this number is denoted by the letter n. For example, suppose that the following data were collected:

15 14 15 18 15 20 15 16 17 14 17 13 11 14 18 12 17 12 21 8

14 17 14 12 13 15 15 16 17 14 16 13 14 15 18 16 16 17 14 15

16 15 17 12 14 14 13 13 13 14

These numbers constitute a sample distribution.

[Figure: Histogram of the sample; horizontal axis x (8 to 20), vertical axis Density (0.00 to 0.20).]

Sample Distribution.

In addition to the frequency distribution, the sample distribution can be described with numbers, called statistics. Examples of statistics are the mean, median, mode, standard deviation, range, and correlation coefficient, among others.

If a different sample were taken, different scores would result. However, there would also be some consistency: while the statistics would not be exactly the same, they would be similar. To achieve order in this chaos, statisticians have developed probability models.

[Figure: four histograms; horizontal axis x, vertical axis Density.]

Random Sampling

Population

A population consists of the totality of the observations with which we are concerned.

It is the entire group we are interested in, which we wish to describe or draw conclusions about.

Sample

A sample is a subset of a population.

[Figure: 2008 Presidential Race from CNN.]

Example 2

Suppose you want to find out the percentage of students at UMN who enjoy reading Time. The population is all of the students who attend UMN. If we randomly select 20% of the population, this selection is the sample in this experiment.

A simple random sample of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.

Random Sample

Let $X_1, X_2, \ldots, X_n$ be n independent random variables, each having the same probability distribution $f(x)$. Define $X_1, X_2, \ldots, X_n$ to be a random sample of size n from the population $f(x)$ and write its joint probability distribution as

$$g(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n).$$

Some Important Statistics

Statistic

A statistic is a function of random variables that does not depend upon any unknown parameter.

Sample Mean & Sample Variance

If $X_1, X_2, \ldots, X_n$ represent a random sample of size n, then the sample mean is defined by the statistic

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

and the sample variance is defined by the statistic

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

Example 3

A comparison of coffee prices at 4 randomly selected grocery stores in San Diego showed increases from the previous month of 12, 15, 17, and 20 cents for a 1-pound bag. Find the variance of this random sample of price increases.

Solution:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{4} x_i = 16 \text{ cents}.$$

$$s^2 = \frac{1}{4-1} \sum_{i=1}^{4} (x_i - \bar{x})^2 = \frac{34}{3}.$$
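The calculation in Example 3 can be reproduced in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the handout.

```python
from statistics import mean, variance

# Price increases (in cents) from Example 3.
increases = [12, 15, 17, 20]

x_bar = mean(increases)      # sample mean: (1/n) * sum of x_i
s2 = variance(increases)     # sample variance: divides by n - 1, not n

print(x_bar)   # 16
print(s2)      # 34/3 as a floating-point number
```

Note that `statistics.variance` uses the divisor n - 1, matching the definition of $S^2$ above; `statistics.pvariance` divides by n instead.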

Theorem

If $S^2$ is the variance of a random sample of size n, we may write

$$S^2 = \frac{1}{n(n-1)} \left[ n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2 \right].$$

Proof:
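The handout leaves the proof blank. One standard way to verify the identity, offered here as a sketch rather than as the argument given in class, is to expand the square and use $\sum_{i=1}^{n} X_i = n\bar{X}$:

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} X_i^2 - 2\bar{X}\sum_{i=1}^{n} X_i + n\bar{X}^2 \\
  &= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \\
  &= \sum_{i=1}^{n} X_i^2 - \frac{1}{n}\Bigl(\sum_{i=1}^{n} X_i\Bigr)^2 .
\end{align*}
Dividing both sides by $n-1$ gives
\[
S^2 = \frac{1}{n(n-1)}\Bigl[\, n\sum_{i=1}^{n} X_i^2
      - \Bigl(\sum_{i=1}^{n} X_i\Bigr)^2 \Bigr].
\]
```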

Example 4

Find the sample mean and variance of the data 3, 4, 5, 6, 6, and 7, representing the number of trout caught by a random sample of 6 fishermen on June 19, 1996, at Lake Muskoka.

Solution:

$$\sum_{i=1}^{6} x_i^2 = 171, \quad \sum_{i=1}^{6} x_i = 31, \quad n = 6.$$

Hence the sample mean is $\bar{x} = 31/6 \approx 5.17$, and

$$s^2 = \frac{1}{6 \cdot 5} \left[ 6 \cdot 171 - 31^2 \right] = \frac{13}{6}.$$

Thus the sample standard deviation is $s = \sqrt{13/6} = 1.47$.
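As a check on Example 4, the shortcut form of the sample variance can be compared with the definition. This is a minimal sketch; exact fractions are used so that the two forms can be compared without rounding error, and the variable names are illustrative.

```python
from fractions import Fraction
from statistics import variance

# Trout counts from Example 4.
catch = [3, 4, 5, 6, 6, 7]
n = len(catch)

sum_x = sum(catch)                   # 31
sum_x2 = sum(x * x for x in catch)   # 171

# Shortcut form: s^2 = [n * sum(x^2) - (sum x)^2] / [n (n - 1)]
s2_shortcut = Fraction(n * sum_x2 - sum_x ** 2, n * (n - 1))

# Definition form, computed exactly on Fractions for comparison.
s2_definition = variance([Fraction(x) for x in catch])

print(s2_shortcut)                   # 13/6
print(s2_shortcut == s2_definition)  # True
print(float(s2_shortcut) ** 0.5)     # sample standard deviation, about 1.47
```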

Example 5

The numbers of incorrect answers on a true-false competency test for a random sample of 15 students were recorded as follows:

2, 1, 3, 0, 1, 3, 6, 0, 3, 4. Find

the sample mean;

the sample variance.

Mode

The mode of a list of numbers is the value (or values) that occurs most frequently. A trick to remember this one is that mode starts with the same first two letters as most. Most frequently - Mode. You'll never forget that one!

Median

The median is the middle value in your list. When the number of entries in the list is odd, the median is the middle entry after sorting the list into increasing order. When the number of entries is even, the median is the sum of the two middle entries (after sorting the list into increasing order) divided by two. Thus, remember to line up your values: the middle number is the median! Be sure to remember the odd and even rule.
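The odd and even rule for the median, and the possibility of more than one mode, can be illustrated as follows. This is a minimal sketch: `odd_list` is a made-up list for illustration, and `even_list` reuses the trout data from Example 4.

```python
from statistics import median, multimode

odd_list = [7, 3, 5, 6, 4]        # five entries (hypothetical data)
even_list = [3, 4, 5, 6, 6, 7]    # six entries (data from Example 4)

print(median(odd_list))            # 5   (middle entry after sorting: 3 4 5 6 7)
print(median(even_list))           # 5.5 (average of the two middle entries, 5 and 6)
print(multimode(even_list))        # [6]
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]  (two values tie for most frequent)
```

`multimode` returns every value that attains the highest frequency, which matches the definition of the mode as the value or values occurring most often.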

Data Displays and Graphical Methods

Box Plot

A box plot (also known as a box-and-whisker diagram or plot) is a convenient way of graphically depicting the five-number summary, which consists of the 25th percentile (lower quartile or first quartile, $Q_1$), the median, the 75th percentile (upper quartile or third quartile, $Q_3$), and the two adjacent values (the most extreme observations that are not outliers); in addition, the box plot indicates which observations, if any, are considered unusual, or outliers.

Outlier

Outliers are observations that are considered to be unusually far from the bulk of the data. Technically, one may view an outlier as being an observation that represents a "rare event." If the distance from the box exceeds 1.5 times the interquartile range, $Q_3 - Q_1$ (in either direction), the observation may be labeled an outlier.

[Figure: Box Plot.]

Example 6

The following numbers are the numbers of marbles owned by fifteen different boys, arranged from least to greatest: 18 27 34 52 54 59 61 68 78 82 85 87 91 93 100.

Find the median.

Find the lower quartile.

Find the upper quartile.

Find the interquartile range.
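The quantities asked for in Example 6, together with the 1.5 × IQR outlier rule, can be checked numerically. This is a minimal sketch that assumes the "exclusive" quartile convention of Python's `statistics.quantiles`; other conventions (for example `method="inclusive"`) can give slightly different quartiles, so the handout's intended convention is an assumption here.

```python
from statistics import quantiles

# Marble counts from Example 6 (already sorted).
marbles = [18, 27, 34, 52, 54, 59, 61, 68, 78, 82, 85, 87, 91, 93, 100]

# With method="exclusive" the quartiles sit at positions
# (n + 1)/4, (n + 1)/2 and 3(n + 1)/4 of the sorted data.
q1, med, q3 = quantiles(marbles, n=4, method="exclusive")
iqr = q3 - q1

# 1.5 * IQR rule: anything beyond these fences may be labeled an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in marbles if x < lower_fence or x > upper_fence]

print(q1, med, q3, iqr)   # 52.0 68.0 87.0 35.0
print(outliers)           # []
```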

[Figure: Box-and-Whisker Plot for Example 6.]

Sampling Distribution of Means

Sampling Distribution

The probability distribution of a statistic is called a sampling distribution.

If we are sampling from a population with unknown distribution, either finite or infinite, the sampling distribution of $\bar{X}$ will be approximately normal with mean $\mu$ and variance $\sigma^2/n$ provided that the sample size is large (n > 30).

Central Limit Theorem

If $\bar{X}$ is the mean of a random sample of size n taken from a population with mean $\mu$ and finite variance $\sigma^2$, then the limiting form of the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

as n → ∞, is the standard normal distribution N(0, 1).
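The theorem can be explored by simulation. The sketch below assumes, purely for illustration, a strongly skewed exponential population with mean 4 (so $\mu = \sigma = 4$), samples of size n = 100, and 2000 repetitions; the seed and the number of repetitions are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(0)   # fixed seed so the run is repeatable

BETA = 4.0       # population mean (and standard deviation) of the exponential
N = 100          # size of each sample
REPS = 2000      # number of simulated samples (arbitrary choice)

def sample_mean():
    # random.expovariate takes the rate, i.e. 1 / mean.
    return sum(random.expovariate(1.0 / BETA) for _ in range(N)) / N

means = [sample_mean() for _ in range(REPS)]

print(mean(means))    # close to mu = 4
print(stdev(means))   # close to sigma / sqrt(n) = 4 / 10 = 0.4

# Share of standardized means within +/- 1.96; about 0.95 if Z is nearly N(0, 1).
z = [(m - BETA) / (BETA / N ** 0.5) for m in means]
coverage = sum(abs(v) < 1.96 for v in z) / REPS
print(coverage)
```

Although the population itself is far from normal, the simulated sample means cluster symmetrically around $\mu$ with spread close to $\sigma/\sqrt{n}$.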

Example 7

Let $\bar{X}$ be the sample mean of a random sample of size 100 drawn from an exponential distribution with density

$$f(x) = \frac{1}{4} e^{-x/4}, \quad x > 0.$$

[Figure: Exponential p.d.f. with β = 4.]

Decide which of the graphs labeled (a)-(d) would most closely resemble the sampling distribution of the sample mean $\bar{X}$. Explain briefly your reasoning.

[Figure: four candidate graphs labeled (a)-(d).]

Example 8

An electrical firm manufactures light bulbs that have a length of life that is approximately normally distributed, with mean equal to 800 hours and a standard deviation of 40 hours. Find the probability that a random sample of 16 bulbs will have an average life of less than 775 hours.

Solution:
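A numerical check of Example 8, as a minimal sketch: since the population is normal, $\bar{X}$ is normal with mean 800 and standard deviation $40/\sqrt{16} = 10$, and the probability is read off the normal c.d.f. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 40, 16

# Sampling distribution of the sample mean: N(mu, sigma / sqrt(n)).
x_bar_dist = NormalDist(mu, sigma / sqrt(n))

z = (775 - mu) / (sigma / sqrt(n))   # standardized value
p = x_bar_dist.cdf(775)              # P(X-bar < 775)

print(z, round(p, 4))   # -2.5 0.0062
```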

Example 9

The blood cholesterol levels of a population of workers have mean 202 and standard deviation 14.

1. If a sample of 36 workers is selected, approximate the probability that the sample mean of their blood cholesterol levels will lie between 198 and 206.

2. Repeat 1 when the sample size is 64.

Solution:

Sampling Distribution: Difference Between Two Averages

If independent samples of size $n_1$ and $n_2$ are drawn at random from two populations, discrete or continuous, with means $\mu_1$ and $\mu_2$, and variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then the sampling distribution of the differences of means, $\bar{X}_1 - \bar{X}_2$, is approximately normally distributed with mean and variance given by

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \quad \text{and} \quad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

Hence

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

is approximately a standard normal variable.

Example 10

The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 year, while those of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 year. What is the probability that a random sample of 36 tubes from manufacturer A will have a mean lifetime that is at least 1 year more than the mean lifetime of a sample of 49 tubes from manufacturer B?

Solution:
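A numerical check of Example 10, as a minimal sketch of the normal approximation for $\bar{X}_A - \bar{X}_B$. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_a, sigma_a, n_a = 6.5, 0.9, 36   # manufacturer A
mu_b, sigma_b, n_b = 6.0, 0.8, 49   # manufacturer B

# Mean and standard deviation of the difference of sample means.
mean_diff = mu_a - mu_b
sd_diff = sqrt(sigma_a ** 2 / n_a + sigma_b ** 2 / n_b)

# P(X-bar_A - X-bar_B >= 1.0) via the standardized value.
z = (1.0 - mean_diff) / sd_diff
p = 1 - NormalDist().cdf(z)

print(round(sd_diff, 3), round(z, 2), round(p, 4))   # about 0.189, 2.65, 0.004
```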

Example 11

The mean score for freshmen on an aptitude test at a certain college is 540, with a standard deviation of 50. What is the probability that two groups of students selected at random, consisting of 64 and 100 students, respectively, will differ in their mean scores by

1. more than 10 points?

2. an amount between 5 and 10 points?

Solution:

Sampling Distribution of $S^2$

If $S^2$ is the variance of a random sample of size n taken from a normal population having the variance $\sigma^2$, then the statistic

$$\chi^2 = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$$

has a chi-squared distribution with $\nu = n - 1$ degrees of freedom.

Example 12

Find the probability that a random sample of 21 observations, from a normal population with variance $\sigma^2 = 5$, will have a variance $s^2$

1. greater than 2.065;

2. between 2.065 and 3.6445.

Solution:
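A numerical check of Example 12, as a sketch. Python's standard library has no chi-squared c.d.f., so this uses a closed form for the upper tail that is valid only when the degrees of freedom are even (here $\nu = 20$): $P(\chi^2_{2k} > x) = e^{-x/2} \sum_{j=0}^{k-1} (x/2)^j / j!$. The helper names `chi2_upper_tail` and `to_chi2` are hypothetical.

```python
from math import exp

def chi2_upper_tail(x, df):
    """P(chi-squared with df degrees of freedom > x), for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0            # j = 0 term of the sum
    for j in range(1, df // 2):
        term *= half / j              # builds half**j / j! incrementally
        total += term
    return exp(-half) * total

n, sigma2 = 21, 5.0
df = n - 1

def to_chi2(s2):
    # Convert a sample variance into the statistic (n - 1) s^2 / sigma^2.
    return df * s2 / sigma2

p_greater = chi2_upper_tail(to_chi2(2.065), df)
p_between = p_greater - chi2_upper_tail(to_chi2(3.6445), df)

print(round(p_greater, 2), round(p_between, 2))   # 0.99 0.19
```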

t-Distribution

Let Z be a standard normal random variable and V a chi-squared random variable with $\nu$ degrees of freedom. If Z and V are independent, then the distribution of the random variable T, where

$$T = \frac{Z}{\sqrt{V/\nu}},$$

is given by the density function

$$h(t) = \frac{\Gamma[(\nu+1)/2]}{\Gamma(\nu/2)\sqrt{\pi\nu}} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \quad -\infty < t < \infty.$$

This is known as the t-distribution with $\nu$ degrees of freedom.
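The density above can be evaluated directly, and tail areas can be approximated by numerical integration. The following is a minimal sketch using Simpson's rule from the standard library only; the function names `t_pdf` and `t_upper_tail` are hypothetical, and the two checks use the tabulated critical values $t_{0.05}(14) = 1.761$ and $t_{0.005}(14) = 2.977$.

```python
from math import gamma, pi, sqrt

def t_pdf(t, nu):
    """Density h(t) of the t-distribution with nu degrees of freedom."""
    c = gamma((nu + 1) / 2) / (gamma(nu / 2) * sqrt(pi * nu))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_upper_tail(t, nu, steps=2000):
    """P(T > t) for t >= 0, using symmetry: 0.5 minus the area over [0, t]."""
    if t < 0:
        raise ValueError("t must be non-negative")
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

print(round(t_upper_tail(1.761, 14), 3))   # 0.05
print(round(t_upper_tail(2.977, 14), 3))   # 0.005
```

Because the density is symmetric about 0, lower-tail areas follow from $P(T < -t) = P(T > t)$.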

[Figure: The t-distribution curves for ν = 2, 5, and 100; horizontal axis x (−3 to 3), vertical axis f(x) (0.0 to 0.4).]

Corollary

Let $X_1, X_2, \ldots, X_n$ be independent random variables that are all normal with mean $\mu$ and standard deviation $\sigma$. Let

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \quad \text{and} \quad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

Then the random variable $T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a t-distribution with $\nu = n - 1$ degrees of freedom.

Example 13

Find k such that $P(k < T < -1.761) = 0.045$ for a random sample of size 15 selected from a normal distribution with mean $\mu$ and $T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$.

Solution:

What Is the t-Distribution Used for?

The t-distribution is used extensively in problems that deal with inference about the population mean or in problems that involve comparative samples (i.e., in cases where one is trying to determine if means from two samples are significantly different). The reader should note that the use of the t-distribution for the statistic

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

requires that $X_1, X_2, \ldots, X_n$ be normal.

Example 14

Suppose scores on an IQ test are normally distributed, with a mean of 100. Suppose 25 people are randomly selected and tested. The standard deviation in the sample group is 25. What is the probability that the average test score in the sample group will be at most 110.3?

Solution:

Note for $\chi^2_\alpha(\nu)$

In the textbook, we have $\alpha = P(\chi^2 > \chi^2_\alpha(\nu))$. That is, $\chi^2_\alpha$ represents the $\chi^2$-value above which we find an area equal to $\alpha$.

Note for $t_\alpha(\nu)$

In the textbook, we use $\alpha = P(T > t_\alpha(\nu))$. That is, $t_\alpha$ represents the t-value above which we find an area equal to $\alpha$.

Note for $f_\alpha(\nu_1, \nu_2)$

In the textbook, we have $\alpha = P(F > f_\alpha(\nu_1, \nu_2))$. That is, $f_\alpha$ represents the f-value above which we find an area equal to $\alpha$.

Example 13b

Consider $T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ for a random sample of size n = 8.

Calculate $P(T < 2.517)$ and $P(2.998 < T < 3.499)$.

Find k such that $P(k < T < 2.517) = 0.975$.

Solution:

F-Distribution

Let U and V be two independent random variables having chi-squared distributions with $\nu_1$ and $\nu_2$ degrees of freedom, respectively. Then the distribution of the random variable $F = \frac{U/\nu_1}{V/\nu_2}$ is given by the density

$$f(x) = \begin{cases} \dfrac{\Gamma[(\nu_1+\nu_2)/2]\,(\nu_1/\nu_2)^{\nu_1/2}}{\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)} \, \dfrac{x^{(\nu_1/2)-1}}{(1+\nu_1 x/\nu_2)^{(\nu_1+\nu_2)/2}}, & x > 0; \\ 0, & x \le 0. \end{cases}$$

This is known as the F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom.

[Figure: The F-distribution curves for (ν1, ν2) = (100, 100), (6, 10), and (10, 30); horizontal axis x (0 to 5), vertical axis f(x) (0.0 to 2.0).]

Theorem

If $S_1^2$ and $S_2^2$ are the variances of independent random samples of size $n_1$ and $n_2$ taken from normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then

$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$

has an F-distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.

What Is the F-Distribution Used for?

The F-distribution is used in two-sample situations to draw inferences about the population variances. However, the F-distribution is applied to many other types of problems in which the sample variances are involved. In fact, the F-distribution is called the variance ratio distribution.

Example 15

If $S_1^2$ and $S_2^2$ represent the variances of independent random samples of size $n_1 = 31$ and $n_2 = 25$, taken from normal populations with variances $\sigma_1^2 = 20$ and $\sigma_2^2 = 10$, respectively, find:

1. $P(S_2^2 < 7.526)$.

2. $P\left(\frac{S_1^2}{S_2^2} > 3.88\right)$.

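Solution:

1. Since $(n_2 - 1)S_2^2/\sigma_2^2 = 24 S_2^2/10$ has a chi-squared distribution with 24 degrees of freedom,

$$P(S_2^2 < 7.526) = P\left(\frac{24 \cdot S_2^2}{10} < \frac{24 \cdot 7.526}{10}\right) = 1 - P(\chi^2_{24} > 18.0624) = 1 - 0.80 = 0.20$$

2. Since $\frac{S_1^2}{S_2^2} \cdot \frac{\sigma_2^2}{\sigma_1^2}$ has an F-distribution with 30 and 24 degrees of freedom,

$$P\left(\frac{S_1^2}{S_2^2} > 3.88\right) = P\left(\frac{S_1^2}{S_2^2} \cdot \frac{\sigma_2^2}{\sigma_1^2} > 3.88 \cdot \frac{\sigma_2^2}{\sigma_1^2}\right) = P\left(\frac{S_1^2}{S_2^2} \cdot \frac{\sigma_2^2}{\sigma_1^2} > 3.88 \cdot \frac{1}{2}\right) = P(F(30, 24) > 1.94) = 0.05$$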
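The F-probability in part 2 can be checked by integrating the F density numerically. This is a minimal sketch using Simpson's rule and the standard library only; the function names `f_pdf` and `f_upper_tail` are hypothetical, and the integration assumes $\nu_1 > 2$ so that the density is 0 at x = 0.

```python
from math import gamma

def f_pdf(x, nu1, nu2):
    """Density of the F-distribution with nu1 and nu2 degrees of freedom."""
    if x <= 0:
        return 0.0                     # exact at x = 0 only when nu1 > 2
    c = (gamma((nu1 + nu2) / 2) * (nu1 / nu2) ** (nu1 / 2)
         / (gamma(nu1 / 2) * gamma(nu2 / 2)))
    return c * x ** (nu1 / 2 - 1) / (1 + nu1 * x / nu2) ** ((nu1 + nu2) / 2)

def f_upper_tail(x, nu1, nu2, steps=2000):
    """P(F > x) = 1 minus the area under the density over [0, x]."""
    h = x / steps                      # steps must be even for Simpson's rule
    total = f_pdf(0.0, nu1, nu2) + f_pdf(x, nu1, nu2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, nu1, nu2)
    return 1 - total * h / 3

sigma1_sq, sigma2_sq = 20, 10
threshold = 3.88 * sigma2_sq / sigma1_sq   # 3.88 * (1/2) = 1.94

p = f_upper_tail(threshold, 30, 24)
print(round(threshold, 2), round(p, 2))    # 1.94 0.05
```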