
# Probability Distribution

A continuous random variable can assume any value in an interval on the real line or in a collection of intervals.

It is not possible to talk about the probability of the random variable assuming a particular value. Instead, we talk about the probability of the random variable assuming a value within a given interval.

The probability of the random variable assuming a value within some given interval from x1 to x2 is defined to be the area under the graph of the probability density function between x1 and x2.

[Figure: graphs of the probability density function f(x) for the uniform, normal, and exponential distributions, each with x1 and x2 marked on the x-axis.]
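The idea that a probability is an area under the density curve can be checked numerically. The following is a minimal sketch, not part of the original slides: the helper `area_under` and the parameter values (a uniform density on 0 to 10, an exponential density with rate 0.5) are illustrative assumptions.

```python
import math

def area_under(pdf, x1, x2, steps=10_000):
    """Approximate the area under pdf between x1 and x2 (midpoint rule)."""
    width = (x2 - x1) / steps
    return sum(pdf(x1 + (i + 0.5) * width) for i in range(steps)) * width

# Hypothetical densities; the parameters are chosen only for illustration.
def uniform_pdf(x, a=0.0, b=10.0):
    return 1.0 / (b - a) if a <= x <= b else 0.0

def exponential_pdf(x, rate=0.5):
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

# P(2 <= X <= 5) for the uniform; closed form is (5 - 2) / 10.
p_uniform = area_under(uniform_pdf, 2.0, 5.0)
# P(1 <= X <= 3) for the exponential; closed form is e^-0.5 - e^-1.5.
p_exponential = area_under(exponential_pdf, 1.0, 3.0)

print(round(p_uniform, 4), round(p_exponential, 4))
```

Shrinking the interval toward a single point drives the area to zero, which is why only interval probabilities are meaningful for a continuous variable.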

## Normal Probability Distributions

The normal probability distribution is the most important distribution for describing a continuous random variable.

It is widely used in statistical inference.

## Normal Probability Distributions

Normal Probability Density Function

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^{2}/(2\sigma^{2})}$$

where:

μ = mean
σ = standard deviation
π = 3.14159
e = 2.71828

## Normal Probability Distributions

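It's a probability density function, so no matter what the values of μ and σ are, it must integrate to 1!

$$\int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx = 1$$

$$E(X) = \mu = \int_{-\infty}^{+\infty} x\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx$$

$$\mathrm{Var}(X) = \sigma^{2} = \left(\int_{-\infty}^{+\infty} x^{2}\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx\right) - \mu^{2}$$

Standard Deviation(X) = σ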
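These three integrals can be verified numerically. The sketch below is an illustration under stated assumptions: μ = 500 and σ = 50 are arbitrary example values, the integration range is truncated to μ ± 10σ, and the helper `integrate` is a simple midpoint rule rather than anything prescribed by the slides.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-0.5 * ((x - mu) / sigma)^2) / (sigma * sqrt(2 * pi))."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def integrate(g, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of g from lo to hi."""
    width = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * width) for i in range(steps)) * width

mu, sigma = 500.0, 50.0                      # assumed example values
lo, hi = mu - 10 * sigma, mu + 10 * sigma    # tails beyond this are negligible

total = integrate(lambda x: normal_pdf(x, mu, sigma), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), lo, hi)
variance = integrate(lambda x: x * x * normal_pdf(x, mu, sigma), lo, hi) - mean ** 2

print(round(total, 6), round(mean, 3), round(variance, 1), round(math.sqrt(variance), 2))
```

Changing `mu` and `sigma` changes the recovered mean and variance but leaves the total area at 1.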

## Normal Probability Distributions

Characteristics
The distribution is symmetric; its skewness
measure is zero.

## Normal Probability Distributions

Characteristics
The entire family of normal probability distributions is defined by its mean μ and its standard deviation σ.

[Figure: normal curve labeled with its mean μ and standard deviation σ.]

## Normal Probability Distributions

Characteristics
The highest point on the normal curve is at the mean,
which is also the median and mode.

## Normal Probability Distributions

Characteristics
The mean can be any numerical value: negative, zero,
or positive.

[Figure: normal curves centered at different means on the x-axis, with −10 and 20 marked.]

## Normal Probability Distributions

Characteristics
The standard deviation determines the width of the
curve: larger values result in wider, flatter curves.
[Figure: normal curves with σ = 15 and σ = 25; the σ = 25 curve is wider and flatter.]

## Normal Probability Distributions

Characteristics
Probabilities for the normal random variable are given by areas under the curve. The total area under the curve is 1 (0.5 to the left of the mean and 0.5 to the right).

[Figure: normal curve with an area of 0.5 on each side of the mean.]

## Normal Probability Distributions

Since the area under the curve represents probability, the probability of a normal random variable at one specific value is zero. With a single value, one can't find an area, since the area must be bounded by two values. Thus,

P(x = 10) = 0

P(x = 3) = 0

P(x = 7.5) = 0

Only intervals have area, so probabilities are stated for ranges such as:

P(1 < x < 3)

P(x > 3)

## Normal Probability Distributions

Characteristics
68.26% of values of a normal random variable
are within +/- 1 standard deviation of its mean.
95.44% of values of a normal random variable
are within +/- 2 standard deviations of its mean.
99.72% of values of a normal random variable
are within +/- 3 standard deviations of its mean.

[Figure: normal curve with 68.26% of the area between μ − 1σ and μ + 1σ, 95.44% between μ − 2σ and μ + 2σ, and 99.72% between μ − 3σ and μ + 3σ.]
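The percentages in this rule can be reproduced from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `within_k_sd` is an assumption made for illustration. Computed to full precision the areas round to about 68.27%, 95.45%, and 99.73%; the slightly smaller figures quoted above are what you get when working from a table rounded to four decimals.

```python
from statistics import NormalDist

def within_k_sd(k, mu=0.0, sigma=1.0):
    """Area under Normal(mu, sigma) between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * within_k_sd(k):.2f}%")

# The area does not depend on the particular mean and standard deviation.
same_area = abs(within_k_sd(2) - within_k_sd(2, mu=127.8, sigma=15.5)) < 1e-12
print(same_area)
```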

## Normal Probability Distributions

There may be thousands of normal distribution curves, each with a different mean and a different standard deviation. Since the shapes are different, the areas under the curves between any two points are also different.

To make life easier, all normal distributions can be converted to a standard normal distribution. A standard normal distribution has a mean of 0 and a standard deviation of 1.

No matter what μ and σ are, the area between μ − σ and μ + σ is about 68.26%; the area between μ − 2σ and μ + 2σ is about 95.44%; and the area between μ − 3σ and μ + 3σ is about 99.72%.

Almost all values fall within 3 standard deviations.

## How good is the rule for real data?

Check some example data:
The mean of the weight of the women = 127.8 lbs
The standard deviation (SD) = 15.5 lbs

68% of 120 = 0.68 × 120 ≈ 82 runners

In fact, 79 runners fall within 1 SD (15.5 lbs) of the mean.

[Figure: percent of runners by weight in pounds (x-axis from 80 to 160), with 112.3, 127.8, and 143.3 marked.]

95% of 120 = 0.95 × 120 = 114 runners

In fact, 115 runners fall within 2 SDs of the mean.

[Figure: percent of runners by weight in pounds (x-axis from 80 to 160), with 96.8, 127.8, and 158.8 marked.]

99.7% of 120 = 0.997 × 120 ≈ 119.6 runners

In fact, all 120 runners fall within 3 SDs of the mean.

[Figure: percent of runners by weight in pounds (x-axis from 80 to 160), with 81.3, 127.8, and 174.3 marked.]
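The comparison between the rule and the runner data can be laid out in a few lines. This is only a sketch: the mean, SD, sample size, and observed counts are the ones quoted on the slides, while the variable names and the `rows` structure are illustrative assumptions.

```python
mean, sd, n = 127.8, 15.5, 120            # values quoted on the slides
rule = {1: 0.68, 2: 0.95, 3: 0.997}       # share expected within k SDs
observed = {1: 79, 2: 115, 3: 120}        # counts reported on the slides

rows = []
for k in (1, 2, 3):
    low = round(mean - k * sd, 1)         # lower bound of the interval, in lbs
    high = round(mean + k * sd, 1)        # upper bound of the interval, in lbs
    expected = round(rule[k] * n, 1)      # runners expected under the rule
    rows.append((k, low, high, expected, observed[k]))
    print(f"{k} SD: {low} to {high} lbs, expected {expected}, observed {observed[k]}")
```

The observed counts stay within a few runners of the expected ones, which is the sense in which the rule works well for these data.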

Example
Suppose SAT scores roughly follow a normal distribution in the U.S. population of college-bound students (with range restricted to 200-800), and the average math SAT is 500 with a standard deviation of 50. Then:

68% of students will have scores between 450 and 550

95% will be between 400 and 600

99.7% will be between 350 and 650

## Standard Normal Probability Distributions

The formula for the standardized normal probability density function is

$$p(Z) = \frac{1}{(1)\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Z-0}{1}\right)^{2}} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^{2}}$$

## The Standard Normal Distribution (Z)

All normal distributions can be converted into the standard normal curve by subtracting the mean and dividing by the standard deviation:

$$Z = \frac{X - \mu}{\sigma}$$

Somebody calculated all the integrals for the standard normal and put them in a table, so we never have to integrate!

Even better, computers now do all the integration.

## Standard Normal Probability Distributions

The letter z is used to designate the standard
normal random variable.
[Figure: standard normal curve centered at z = 0 with σ = 1.]

## Applications of Standard Normal Distribution

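Problem:
What's the probability of getting a math SAT score of 575 or less, given μ = 500 and σ = 50?

$$Z = \frac{575 - 500}{50} = 1.5$$

i.e., a score of 575 is 1.5 standard deviations above the mean.

$$P(X \le 575) = \int_{200}^{575} \frac{1}{(50)\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-500}{50}\right)^{2}}\,dx \approx \int_{-\infty}^{1.5} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^{2}}\,dz$$

Yikes!
But looking up Z = 1.5 in a standard normal chart (or entering it into SAS) is no problem: the answer is .9332.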
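The same lookup can be done in software. The slides mention SAS; the sketch below swaps in Python's standard-library `statistics.NormalDist` as an assumed stand-in and shows that standardizing first gives the same answer as working on the original scale.

```python
from statistics import NormalDist

mu, sigma, score = 500, 50, 575

z = (score - mu) / sigma                      # standardize: 1.5 SDs above the mean
p_via_z = NormalDist().cdf(z)                 # area to the left of z on the standard normal
p_direct = NormalDist(mu, sigma).cdf(score)   # same area without standardizing

print(z, round(p_via_z, 4), round(p_direct, 4))
```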

## Applications of Standard Normal Distribution

Problem:
Test scores of a special examination administered to all potential
employees of a firm are normally distributed with a mean of 500
points and a standard deviation of 100 points. What is the probability
that a score selected at random will be higher than 700?

P(x > 700) = ?

If we convert this normal variable, x, to a standard normal variable, z:

z = (x − μ) / σ = (700 − 500) / 100 = 2

On the x-scale, 500 and 700 correspond to 0 and 2 on the z-scale, so

P(x > 700) = P(z > 2) = 1 − 0.9772 = 0.0228

Problem
If birth weights in a population are normally distributed with a
mean of 109 oz and a standard deviation of 13 oz,
a. What is the chance of obtaining a birth weight of 141 oz
or heavier when sampling birth records at random?
b. What is the chance of obtaining a birth weight of 120 or
lighter?

Solution
a. What is the chance of obtaining a birth weight of 141 oz or heavier when sampling birth records at random?

$$Z = \frac{141 - 109}{13} = 2.46$$

From the chart or SAS, a Z of 2.46 corresponds to a right tail (greater than) area of: P(Z ≥ 2.46) = 1 − .9931 = .0069 or .69%

Solution
b. What is the chance of obtaining a birth weight of 120 oz or lighter?

$$Z = \frac{120 - 109}{13} = .85$$

From the chart or SAS, a Z of .85 corresponds to a left tail area of: P(Z ≤ .85) = .8023 = 80.23%
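Both parts of the birth weight problem follow the same two steps: standardize, then read a tail area. A minimal sketch, assuming Python's `statistics.NormalDist` in place of the chart or SAS; the helper `z_score` is a hypothetical name, and it rounds z to two decimals only to mimic a printed z-table.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()      # mean 0, standard deviation 1
mu, sigma = 109, 13            # birth weight mean and SD in oz

def z_score(x, mu, sigma, decimals=2):
    """Standardize x, rounded to two decimals as a printed z-table requires."""
    return round((x - mu) / sigma, decimals)

z_a = z_score(141, mu, sigma)
p_a = 1 - STD_NORMAL.cdf(z_a)  # right tail: 141 oz or heavier

z_b = z_score(120, mu, sigma)
p_b = STD_NORMAL.cdf(z_b)      # left tail: 120 oz or lighter

print(z_a, round(p_a, 4))
print(z_b, round(p_b, 4))
```

Skipping the rounding of z changes the answers only in the third or fourth decimal place.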

## Looking up probabilities in the standard normal table

What is the area to the left of Z = 1.51 in a standard normal curve?

The area to the left of Z = 1.51 is 93.45%.
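A printed standard normal table is just the cumulative distribution function evaluated on a grid: the row gives z to one decimal and the column gives the second decimal. The sketch below regenerates a single row under that assumption; `z_table_row` is a hypothetical helper, not a function from any statistics package.

```python
from statistics import NormalDist

def z_table_row(z_tenths):
    """Left-tail areas for z = z_tenths + 0.00, 0.01, ..., 0.09, rounded to 4 decimals."""
    std_normal = NormalDist()
    return [round(std_normal.cdf(z_tenths + hundredths / 100), 4)
            for hundredths in range(10)]

row = z_table_row(1.5)
print(row[1])   # column .01 of row 1.5, i.e. the area to the left of z = 1.51
```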

## Are my data normal?

Not all continuous random variables are normally distributed!!
It is important to evaluate how well the data are approximated by a normal distribution.

## Are my data normally distributed?

1. Look at the histogram! Does it appear bell shaped?
2. Compute descriptive summary measures: are the mean, median, and mode similar?
3. Do 2/3 of observations lie within 1 std dev of the mean? Do 95% of observations lie within 2 std dev of the mean?
4. Look at a normal probability plot: is it approximately linear?
5. Run tests of normality (such as Kolmogorov-Smirnov). But be cautious: these tests are highly influenced by sample size!
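Checks 2 and 3 in the list above are easy to automate. The sketch below is illustrative only: it runs on simulated data (drawn with `random.gauss`, using an arbitrary mean and SD), and `normality_summary` is a hypothetical helper, not a formal test of normality.

```python
import random
import statistics

def normality_summary(data):
    """Descriptive checks: mean vs. median, and shares within 1 and 2 SDs."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    n = len(data)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "within_1_sd": sum(abs(x - mean) <= sd for x in data) / n,
        "within_2_sd": sum(abs(x - mean) <= 2 * sd for x in data) / n,
    }

random.seed(0)                                         # reproducible simulated sample
sample = [random.gauss(127.8, 15.5) for _ in range(10_000)]
summary = normality_summary(sample)

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

For data that are close to normal, the mean and median should nearly coincide and the two shares should land near 2/3 and 95%.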

## Normal approximation to the binomial

When you have a binomial distribution where n is large and p is middle-of-the-road (not too small, not too big, closer to .5), then the binomial starts to look like a normal distribution. In fact, this doesn't even take a particularly large n.

Recall: if the probability of being a smoker among a group of cases with lung cancer is .6, what's the probability that in a group of 8 cases you have fewer than 2 smokers?

## Normal approximation to the binomial

When you have a binomial distribution where n is large and p isn't too small (rule of thumb: mean > 5), then the binomial starts to look like a normal distribution.
Recall: smoking example

[Figure: bar chart of the binomial distribution for the smoking example.]

Starting to have a normal shape even with fairly small n. You can imagine that if n got larger, the bars would get thinner and thinner and this would look more and more like a continuous function, with a bell curve shape. Here np = 4.8.

## What is the probability of fewer than 2 smokers?

Exact binomial probability (from before) = .00066 + .00786 = .00852

Normal approximation probability:
μ = np = 4.8
σ = √(np(1 − p)) = 1.39

$$Z = \frac{2 - 4.8}{1.39} = \frac{-2.8}{1.39} \approx -2$$

P(Z < −2) = .0228

A little off, but in the right ballpark. We could also use the value to the left of 1.5 (as we really wanted to know less than but not including 2; this is called the continuity correction):

$$Z = \frac{1.5 - 4.8}{1.39} = \frac{-3.3}{1.39} = -2.37$$

P(Z ≤ −2.37) = .0089

A fairly good approximation of the exact probability, about .0085.
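The exact and approximate answers can be compared side by side. The sketch below is a hedged illustration: it computes the exact binomial sum with `math.comb` (about 0.0085) and uses `statistics.NormalDist` for the approximation with unrounded μ and σ, so its outputs differ a little from the hand calculation, which rounds σ to 1.39 and Z to two decimals. The helper name `binom_pmf` is an assumption.

```python
import math
from statistics import NormalDist

n, p = 8, 0.6   # 8 lung cancer cases, P(smoker) = .6

def binom_pmf(k, n, p):
    """P(X = k) for a binomial with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

exact = binom_pmf(0, n, p) + binom_pmf(1, n, p)    # P(X < 2) = P(X = 0) + P(X = 1)

mu = n * p                                          # 4.8
sigma = math.sqrt(n * p * (1 - p))                  # about 1.39
approx_plain = NormalDist(mu, sigma).cdf(2)         # area to the left of 2
approx_corrected = NormalDist(mu, sigma).cdf(1.5)   # continuity correction

print(round(exact, 5), round(approx_plain, 4), round(approx_corrected, 4))
```

The continuity-corrected value lands much closer to the exact probability than the uncorrected one.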

Practice problem
1. You are performing a cohort study. If the probability of developing disease in the exposed group is .25 for the study duration, then if you sample (randomly) 500 exposed people, what's the probability that at most 120 people develop the disease?

Solution:
By hand (yikes!):

$$P(X \le 120) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4) + \dots + P(X=120)$$

$$= \binom{500}{0}(.25)^{0}(.75)^{500} + \binom{500}{1}(.25)^{1}(.75)^{499} + \binom{500}{2}(.25)^{2}(.75)^{498} + \dots + \binom{500}{120}(.25)^{120}(.75)^{380}$$

OR use SAS:
data _null_;
  /* P(X <= 120) for n = 500 trials with p = .25 */
  Cohort = cdf('binomial', 120, .25, 500);
  put Cohort;
run;

0.323504227

OR use the normal approximation:

μ = np = 500(.25) = 125 and σ² = np(1 − p) = 93.75; σ = 9.68

$$Z = \frac{120 - 125}{9.68} = -.52$$

P(Z < −.52) = .3015
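For readers without SAS, the exact sum and the normal approximation can both be sketched in Python using only the standard library. This is an assumed equivalent, not the method from the slides; the continuity-corrected variant (evaluating at 120.5) is an extra shown for comparison, and the unrounded z gives a value slightly different from the table-based .3015.

```python
import math
from statistics import NormalDist

n, p, k = 500, 0.25, 120   # 500 exposed people, P(disease) = .25, at most 120 cases

# Exact binomial: sum P(X = i) for i = 0, 1, ..., 120.
exact = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

mu = n * p                                    # 125
sigma = math.sqrt(n * p * (1 - p))            # sqrt(93.75), about 9.68
approx = NormalDist(mu, sigma).cdf(k)         # plain normal approximation
approx_cc = NormalDist(mu, sigma).cdf(k + 0.5)  # with continuity correction

print(round(exact, 4), round(approx, 4), round(approx_cc, 4))
```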


## More Sample Problems

a. P(z ≤ 1.5) =
b. P(z ≤ 1.0) =
c. P(1 ≤ z ≤ 1.5) =
d. P(0 < z < 2.5) =

## More Sample Problems

a. P(z ≤ 1.5) = 0.9332
b. P(z ≤ 1.0) = 0.8413
c. P(1 ≤ z ≤ 1.5) = 0.9332 − 0.8413 = 0.0919
d. P(0 < z < 2.5) = 0.9938 − 0.5000 = 0.4938

## More Sample Problems

a. P(z ≤ −1.0) =
b. P(z ≥ −1) =
c. P(z ≥ −1.5) =
d. P(−3 < z ≤ 0) =

## More Sample Problems

a. P(z ≤ −1.0) = 0.1587
b. P(z ≥ −1) = 1 − P(z ≤ −1) = 1 − 0.1587 = 0.8413
c. P(z ≥ −1.5) = 1 − P(z ≤ −1.5) = 1 − 0.0668 = 0.9332
d. P(−3 < z ≤ 0) = 0.5 − 0.0013 = 0.4987
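All of these sample problems reduce to differences of the standard normal cumulative distribution function. The sketch below collects them in one place, assuming `statistics.NormalDist` as the lookup; the dictionary keys are just labels. Values computed this way can differ from table arithmetic in the fourth decimal, because the table entries are already rounded.

```python
from statistics import NormalDist

phi = NormalDist().cdf   # area to the left of z under the standard normal curve

answers = {
    "P(z <= 1.5)": phi(1.5),
    "P(z <= 1.0)": phi(1.0),
    "P(1 <= z <= 1.5)": phi(1.5) - phi(1.0),
    "P(0 < z < 2.5)": phi(2.5) - phi(0.0),
    "P(z <= -1.0)": phi(-1.0),
    "P(z >= -1)": 1 - phi(-1.0),
    "P(z >= -1.5)": 1 - phi(-1.5),
    "P(-3 < z <= 0)": phi(0.0) - phi(-3.0),
}

for label, value in answers.items():
    print(f"{label} = {value:.4f}")
```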

Given: μ = 77, σ = 20

a. P(x < 50) = ?

Convert to z: z = (x − μ) / σ = (50 − 77) / 20 = −1.35

P(x < 50) = P(z < −1.35) = 0.0885

b. P(x > 100) = ?

z = (100 − 77) / 20 = 1.15

P(x > 100) = P(z > 1.15) = 1 − P(z ≤ 1.15) = 1 − 0.8749 = 0.1251 or 12.51%

## Continuation of Sample Problem

c. x = ? to be considered a heavy user

Upper 20% of the area is in the right tail of the normal curve.
80% of the area is to the left. Go to Table 1 and locate 0.8 (or
80%) as the table entry. The closest entry is 0.7995. That
point represents a z-value of 0.84. Use this value of z in the
following equation:
z = (x − μ) / σ
0.84 = (x − 77) / 20

x = 93.8 hours
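The reverse table lookup described here (find the table entry closest to 0.80, read off its z, then solve for x) can be mimicked in code. This is a sketch under assumptions: `closest_table_z` is a hypothetical helper that rebuilds table entries for z from 0.00 to 3.99 with `statistics.NormalDist`, so it only handles left-tail areas of 0.5 or more.

```python
from statistics import NormalDist

def closest_table_z(target_area):
    """Return the z in 0.00..3.99 whose 4-decimal table entry is closest to target_area."""
    std_normal = NormalDist()
    candidates = [i / 100 for i in range(400)]
    return min(candidates,
               key=lambda z: abs(round(std_normal.cdf(z), 4) - target_area))

mu, sigma = 77, 20
z = closest_table_z(0.80)   # 80% of the area lies to the left of the cut-off
x = mu + z * sigma          # solve z = (x - mu) / sigma for x

print(z, round(x, 1))
```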

## More Sample Problems

A statistics instructor grades on a curve. He does not want to give an A to more than 15 percent of his class. If test scores of students in statistics are normally distributed with a mean of 75 and a standard deviation of 10, what should be the cut-off point for an A?

The top 15 percent lies in the right tail, so 85 percent of the area is to the left; the closest table entry to 0.85 corresponds to z = 1.04.
z = (x − μ) / σ
1.04 = (x − 75) / 10
x = 85.4 or 85

Sample Problems
The service life of a certain brand of automobile battery is
normally distributed with a mean of 1000 days and a standard
deviation of 100 days. The manufacturer of the battery wants
to offer a guarantee, but does not know the length of the
warranty. It does not want to replace more than 10 percent of
the batteries sold. What should be the length of the warranty?
At most 10 percent of the area may lie in the left tail, which corresponds to z = −1.28.
z = (x − μ) / σ
−1.28 = (x − 1000) / 100
x = 872 days
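Cut-off problems like the grading curve and the warranty length ask for a percentile, which is the inverse of the cumulative distribution function. A minimal sketch, assuming `statistics.NormalDist.inv_cdf` from the Python standard library in place of a reverse table lookup; `cutoff` is a hypothetical helper name. Because it does not round z to two decimals, the results differ slightly from the table-based answers.

```python
from statistics import NormalDist

def cutoff(mu, sigma, left_area):
    """Value x such that P(X <= x) = left_area for X ~ Normal(mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(left_area)

grade_cutoff = cutoff(75, 10, 0.85)        # top 15% receive an A, so 85% lies below
warranty_days = cutoff(1000, 100, 0.10)    # replace at most 10% of the batteries

print(round(grade_cutoff, 1), round(warranty_days))
```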

Reference:
Anderson, Sweeney, and Williams