
Data, Models and Decisions

Session 3: Distributions
PGP 13-15

Dr. Rohit Joshi, IIM Shillong

Learning Objectives
In this session, you will learn to:
Distinguish between discrete random variables and
continuous random variables.
Determine the mean and variance of discrete and
continuous distributions.
Identify the types of statistical experiments that can be
described by discrete and continuous distributions, and
work such problems.

Random variables
Random variables are of two types:
Discrete random variable
Continuous random variable

What is a distribution?

Describes the shape of a batch of numbers


Gives the probability of each possible outcome

Why Distribution?
Can serve as a basis for standardized comparison of
empirical distributions
Can help us estimate confidence intervals for
inferential statistics
Form a basis for more advanced statistical methods
The fit between observed distributions and certain
theoretical distributions is an assumption of many
statistical procedures

Distributions
Discrete distributions
Binomial Distribution
Hypergeometric Distribution
Poisson Distribution
Continuous Distribution
Normal Distribution
Uniform Distribution
Beta Distribution (t, F, Chi square)
Exponential Distribution

Important discrete probability distribution: The binomial

The Binomial Distribution: Properties


A fixed number of observations, n
ex. 15 tosses of a coin; ten light bulbs taken from a
warehouse
Two mutually exclusive and collectively exhaustive
categories
ex. head or tail in each toss of a coin; defective or not
defective light bulb; having a boy or girl
Generally called "success" and "failure"
Probability of success is p, probability of failure is 1 − p
Constant probability for each observation
The outcome of one observation does not affect the
outcome of the other
Two sampling methods
Infinite population without replacement
Finite population with replacement

Binomial distribution
Take the example of 5 coin tosses. What's the
probability that you flip exactly 3 heads in 5 coin
tosses?
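
One way to answer the question is to list every possible sequence of 5 tosses and count those with exactly 3 heads. The sketch below assumes a fair coin, so that all sequences are equally likely; the variable names are illustrative only.

```python
from itertools import product

# Enumerate all 2**5 equally likely sequences of 5 fair-coin tosses.
outcomes = list(product("HT", repeat=5))

# Count the sequences that contain exactly 3 heads.
favourable = sum(1 for seq in outcomes if seq.count("H") == 3)

prob_three_heads = favourable / len(outcomes)
print(favourable, len(outcomes), prob_three_heads)  # 10 32 0.3125
```

Enumeration works only for small n; the binomial formula gives the same count without listing the sequences.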

Binomial distribution, generally


Note the general pattern emerging: if you have only two possible
outcomes (call them 1/0 or yes/no or success/failure) in n independent
trials, then the probability of exactly X successes is

\[ P(X) = \binom{n}{X} p^{X} (1 - p)^{n - X} \]

where
n = number of trials
X = number of successes out of n trials
p = probability of success
1 − p = probability of failure

Binomial distribution: example


If I toss a coin 20 times, what's the probability of
getting exactly 10 heads?

\[ P(X = 10) = \binom{20}{10} (0.5)^{10} (0.5)^{10} = 0.176 \]
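
The calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `binomial_pmf` is an assumption made for illustration, not something defined in the slides.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# 20 tosses of a fair coin, exactly 10 heads.
p_10_heads = binomial_pmf(10, 20, 0.5)
print(round(p_10_heads, 3))  # 0.176
```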

Binomial distribution: example


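If I toss a coin 20 times, what's the probability of
getting 2 or fewer heads?

\[ P(X = 0) = \binom{20}{0} (0.5)^{0} (0.5)^{20} = \frac{20!}{20!\,0!} (0.5)^{20} = 9.5 \times 10^{-7} \]
\[ P(X = 1) = \binom{20}{1} (0.5)^{1} (0.5)^{19} = \frac{20!}{19!\,1!} (0.5)^{20} = 20 \times 9.5 \times 10^{-7} = 1.9 \times 10^{-5} \]
\[ P(X = 2) = \binom{20}{2} (0.5)^{2} (0.5)^{18} = \frac{20!}{18!\,2!} (0.5)^{20} = 190 \times 9.5 \times 10^{-7} = 1.8 \times 10^{-4} \]
\[ P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) \approx 2.0 \times 10^{-4} \]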
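
"2 or fewer heads" is a cumulative probability: the sum of the probabilities of 0, 1 and 2 heads. A minimal sketch of that sum follows; `binomial_pmf` and `binomial_cdf` are illustrative helper names.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x): add up the pmf for 0, 1, ..., x successes."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

p_at_most_2 = binomial_cdf(2, 20, 0.5)
print(f"{p_at_most_2:.2e}")  # 2.01e-04
```

With p = 0.5 the sum is (1 + 20 + 190) / 2^20 = 211 / 1,048,576.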

All probability distributions are characterized
by an expected value and a variance.
If X follows a binomial distribution with parameters n
and p: X ~ Bin(n, p)
Mean

\[ \mu = E(X) = np \]

Variance and Standard Deviation

\[ \sigma^{2} = np(1 - p) \]
\[ \sigma = \sqrt{np(1 - p)} \]

where n = sample size
p = probability of success
(1 − p) = probability of failure
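
As a sanity check, the mean and variance can also be computed directly from the probabilities and compared with np and np(1 − p). The sketch below uses n = 20 and p = 0.5 from the coin example; these values are chosen only for illustration.

```python
from math import comb, sqrt

n, p = 20, 0.5

# Full binomial distribution: P(X = x) for x = 0..n.
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# Mean and variance from the definition E(X) = sum x P(x), Var = sum (x - mu)^2 P(x).
mean_from_pmf = sum(x * px for x, px in enumerate(pmf))
var_from_pmf = sum((x - mean_from_pmf) ** 2 * px for x, px in enumerate(pmf))

print(mean_from_pmf, n * p)            # both close to 10
print(var_from_pmf, n * p * (1 - p))   # both close to 5
print(sqrt(n * p * (1 - p)))           # standard deviation, about 2.236
```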

Applications
A manufacturing plant labels items as either
defective or acceptable
A firm bidding for contracts will either get a
contract or not
A marketing research firm receives survey responses
of "yes I will buy" or "no I will not"
New job applicants either accept the offer or reject it
Your team either wins or loses the football game at
the company picnic

The Hypergeometric Distribution


The binomial distribution is applicable
when selecting from a finite population with
replacement or from an infinite population
without replacement.
The hypergeometric distribution is
applicable when selecting from a finite
population without replacement.

The Hypergeometric Distribution

\[ P(X) = \frac{\binom{A}{X} \binom{N - A}{n - X}}{\binom{N}{n}} \]

Where
N = population size
A = number of successes in the population
N − A = number of failures in the population
n = sample size
X = number of successes in the sample
n − X = number of failures in the sample

The Hypergeometric Distribution


Example
Three different computers are checked from 10 in the
department. 4 of the 10 computers have illegal
software loaded. What is the probability that 2 of the
3 selected computers have illegal software loaded?
So, N = 10, n = 3, A = 4, X = 2

\[ P(X = 2) = \frac{\binom{A}{X} \binom{N - A}{n - X}}{\binom{N}{n}} = \frac{\binom{4}{2} \binom{6}{1}}{\binom{10}{3}} = \frac{(6)(6)}{120} = 0.3 \]

The probability that 2 of the 3 selected computers
have illegal software loaded is .30, or 30%.
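
The same number can be reproduced in code. This is a minimal sketch; `hypergeom_pmf` is an illustrative name, and the argument names follow the symbols N, A, n and X used on the slide.

```python
from math import comb

def hypergeom_pmf(x: int, N: int, A: int, n: int) -> float:
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N items that contains A successes."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)

# 10 computers, 4 with illegal software, 3 checked, 2 found with illegal software.
p_two = hypergeom_pmf(2, N=10, A=4, n=3)
print(round(p_two, 2))  # 0.3
```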

The Hypergeometric Distribution


Characteristics
The mean of the hypergeometric distribution is:

\[ \mu = E(X) = \frac{nA}{N} \]

The standard deviation is:

\[ \sigma = \sqrt{\frac{nA(N - A)}{N^{2}}} \cdot \sqrt{\frac{N - n}{N - 1}} \]

where \( \sqrt{\frac{N - n}{N - 1}} \) is called the Finite Population Correction Factor,
which arises from sampling without replacement from a finite population.
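
To see the formulas at work, the sketch below evaluates them for the computer example (N = 10, A = 4, n = 3) and compares the result with the mean and standard deviation computed directly from the probabilities. The choice of example values and the variable names are assumptions for illustration.

```python
from math import comb, sqrt

N, A, n = 10, 4, 3  # population size, successes in population, sample size

# Closed-form mean and standard deviation, including the finite population correction.
mean_formula = n * A / N
sd_formula = sqrt(n * A * (N - A) / N**2) * sqrt((N - n) / (N - 1))

# The same quantities from the full distribution P(X = x), x = 0..min(n, A).
pmf = {x: comb(A, x) * comb(N - A, n - x) / comb(N, n) for x in range(min(n, A) + 1)}
mean_pmf = sum(x * px for x, px in pmf.items())
sd_pmf = sqrt(sum((x - mean_pmf) ** 2 * px for x, px in pmf.items()))

print(round(mean_formula, 4), round(mean_pmf, 4))  # 1.2 1.2
print(round(sd_formula, 4), round(sd_pmf, 4))      # 0.7483 0.7483
```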

The Poisson Distribution: Definitions

An area of opportunity is a continuous unit or
interval of time, volume, or such area in which
more than one occurrence of an event can
occur.
ex. The number of scratches in a car's paint
ex. The number of mosquito bites on a person
ex. The number of computer crashes in a day

The Poisson Distribution: Properties

Apply the Poisson Distribution when:
You wish to count the number of times an event occurs in a
given area of opportunity
The probability that an event occurs in one area of opportunity
is the same for all areas of opportunity
The number of events that occur in one area of opportunity is
independent of the number of events that occur in the other
areas of opportunity
The probability that two or more events occur in an area of
opportunity approaches zero as the area of opportunity
becomes smaller
The average number of events per unit is λ (lambda)

The Poisson Distribution: Formula

\[ P(X) = \frac{e^{-\lambda} \lambda^{X}}{X!} \]

where:
P(X) = the probability of X events in an area of opportunity
λ = expected number of events
e = mathematical constant approximated by 2.71828

An example
Suppose that, on average, 5 cars enter a parking lot
per minute. What is the probability that in a given
minute, 7 cars will enter?

\[ P(7) = \frac{e^{-\lambda} \lambda^{X}}{X!} = \frac{e^{-5}\, 5^{7}}{7!} = 0.104 \]

So, there is a 10.4% chance 7 cars will enter the
parking lot in a given minute.

Mean = Variance = λ
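
The parking-lot calculation can be reproduced as follows. This is a minimal sketch; `poisson_pmf` is an illustrative name and `lam` stands for λ.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with mean lam: e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam**x / factorial(x)

# On average 5 cars per minute; probability that exactly 7 arrive in a minute.
p_seven = poisson_pmf(7, 5)
print(round(p_seven, 3))  # 0.104
```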

Continuous Probability Distribution

Continuous Distribution
A continuous random variable is a variable that can assume
any value on a continuum (can assume an uncountable
number of values)
thickness of an item
time required to complete a task
temperature of a solution
height
These can potentially take on any value, depending only on
the ability to measure precisely and accurately.

The Normal Distribution: Properties

Bell shaped
Symmetrical and asymptotic
Mean, median and mode are equal
Location is characterized by the mean, μ
Spread is characterized by the standard deviation, σ
Area to the right and to the left of the mean is 1/2 each
The random variable has an infinite
theoretical range: −∞ to +∞
[Figure: bell-shaped curve of f(X), with Mean = Median = Mode at the center]

The Normal Distribution: Density Function

The formula for the normal probability density
function is

\[ f(X) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{1}{2} \left( \frac{X - \mu}{\sigma} \right)^{2}} \]

Where e = the mathematical constant approximated by 2.71828
π = the mathematical constant approximated by 3.14159
μ = the population mean
σ = the population standard deviation
X = any value of the continuous variable
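
The density formula translates almost line for line into code. The sketch below evaluates it and compares the result with `NormalDist.pdf` from Python's standard `statistics` module; the values μ = 5 and σ = 10 are borrowed from the later examples in this session purely for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: 1 / (sqrt(2*pi) * sigma) * exp(-0.5 * ((x - mu) / sigma)^2)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sqrt(2 * pi) * sigma)

mu, sigma = 5, 10
for x in (-5, 5, 15):
    # Hand-written formula next to the standard-library value.
    print(x, round(normal_pdf(x, mu, sigma), 5), round(NormalDist(mu, sigma).pdf(x), 5))
```

Note that the density is a height, not a probability; probabilities come from areas under the curve.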

Sigma understanding of a NC

μ ± 3σ covers 99.7 % of the area
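
The 99.7 % figure matches the area of a normal curve within three standard deviations of the mean. A short check with the standard library is sketched below; it also shows the coverage for one and two standard deviations.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area between -k and +k standard deviations for k = 1, 2, 3.
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, area in coverage.items():
    print(f"within {k} sigma of the mean: {area:.4f}")
```

The three areas come out at roughly 68 %, 95 % and 99.7 %.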

What is a Sigma level?

A metric that indicates how well a process
is performing.
Higher is better
Measures the capability of the process to
perform defect-free work
Also known as z, it is based on standard
deviation for continuous data

Finding Probabilities
Probability is the area
under the curve!

\[ P(c \le X \le d) = \,? \]

[Figure: curve of f(X) with the area between c and d shaded]

Many Normal Distributions

There are an infinite number of normal distributions.
By varying the parameters μ and σ, we obtain
different normal distributions.

Table Lookup of a
Standard Normal Probability

\[ P(0 \le Z \le 1) = 0.3413 \]

Z       0.00     0.01     0.02
0.00    0.0000   0.0040   0.0080
0.10    0.0398   0.0438   0.0478
0.20    0.0793   0.0832   0.0871
...
1.00    0.3413   0.3438   0.3461
1.10    0.3643   0.3665   0.3686
1.20    0.3849   0.3869   0.3888

Standardized Normal
Distribution
Cumulative Standardized Normal
Distribution Table (Portion)

Z      .00     .01     .02
0.0    .5000   .5040   .5080
0.1    .5398   .5438   .5478
0.2    .5793   .5832   .5871
0.3    .6179   .6217   .6255

μ_Z = 0, σ_Z = 1
For Z = 0.12, the cumulative probability is .5478
[Figure: standardized normal curve, shaded area exaggerated]

Only One Table is Needed

Standardizing Example

\[ Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12 \]

Normal Distribution: μ = 5, σ = 10, X = 6.2
Standardized Normal Distribution: μ_Z = 0, σ_Z = 1, Z = 0.12
[Figure: both curves, shaded area exaggerated]
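
The standardizing step and the table lookup can both be sketched in code. `standardize` is an illustrative helper name; `NormalDist().cdf` from the standard library plays the role of the cumulative table.

```python
from statistics import NormalDist

def standardize(x: float, mu: float, sigma: float) -> float:
    """Convert a value of X to a Z score: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

z = standardize(6.2, mu=5, sigma=10)
cumulative = NormalDist().cdf(z)  # area to the left of Z under the standard normal curve

print(round(z, 2), round(cumulative, 4))  # 0.12 0.5478
```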

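Example:

\[ P(2.9 \le X \le 7.1) = .1664 \]

\[ Z = \frac{X - \mu}{\sigma} = \frac{2.9 - 5}{10} = -.21 \]
\[ Z = \frac{X - \mu}{\sigma} = \frac{7.1 - 5}{10} = .21 \]

Normal Distribution: μ = 5, σ = 10, area between X = 2.9 and X = 7.1
Standardized Normal Distribution: μ_Z = 0, σ_Z = 1, area of .0832 on each side of 0, between Z = −0.21 and Z = 0.21
[Figure: both curves, shaded area exaggerated]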

Example:

\[ P(2.9 \le X \le 7.1) = .1664 \] (continued)

Cumulative Standardized Normal
Distribution Table (Portion)

Z      .00     .01     .02
0.0    .5000   .5040   .5080
0.1    .5398   .5438   .5478
0.2    .5793   .5832   .5871
0.3    .6179   .6217   .6255

μ_Z = 0, σ_Z = 1
For Z = 0.21, the cumulative probability is .5832
[Figure: standardized normal curve, shaded area exaggerated]

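Example:

\[ P(2.9 \le X \le 7.1) = .1664 \] (continued)

Cumulative Standardized Normal
Distribution Table (Portion)

Z       .00     .01     .02
-0.3    .3821   .3783   .3745
-0.2    .4207   .4168   .4129
-0.1    .4602   .4562   .4522
-0.0    .5000   .4960   .4920

μ_Z = 0, σ_Z = 1
For Z = −0.21, the cumulative probability is .4168
Subtracting the two cumulative probabilities gives .5832 − .4168 = .1664
[Figure: standardized normal curve, shaded area exaggerated]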
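
An interval probability is the difference of two cumulative probabilities. A minimal sketch for this example, using μ = 5 and σ = 10 as in the standardizing step, follows.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=5, sigma=10)

# P(2.9 <= X <= 7.1) = F(7.1) - F(2.9), where F is the cumulative distribution.
p_between = x_dist.cdf(7.1) - x_dist.cdf(2.9)

# The same probability through Z scores of -0.21 and +0.21.
std = NormalDist()
p_between_z = std.cdf(0.21) - std.cdf(-0.21)

print(round(p_between, 4), round(p_between_z, 4))
```

The result agrees with the table answer of .1664 to about three decimal places; the small remaining difference comes from rounding in the table.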

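Example:

\[ P(X \ge 8) = .3821 \]

\[ Z = \frac{X - \mu}{\sigma} = \frac{8 - 5}{10} = .30 \]

Normal Distribution: μ = 5, σ = 10, area to the right of X = 8
Standardized Normal Distribution: μ_Z = 0, σ_Z = 1, area of .3821 to the right of Z = 0.30
[Figure: both curves, shaded area exaggerated]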

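Example:

\[ P(X \ge 8) = .3821 \] (continued)

Cumulative Standardized Normal
Distribution Table (Portion)

Z      .00     .01     .02
0.0    .5000   .5040   .5080
0.1    .5398   .5438   .5478
0.2    .5793   .5832   .5871
0.3    .6179   .6217   .6255

μ_Z = 0, σ_Z = 1
For Z = 0.30, the cumulative probability is .6179, so P(X ≥ 8) = 1 − .6179 = .3821
[Figure: standardized normal curve, shaded area exaggerated]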
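
An upper-tail probability is one minus the cumulative probability. The sketch below assumes the same μ = 5 and σ = 10 as the rest of the example.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=5, sigma=10)

# P(X >= 8) = 1 - P(X <= 8); for a continuous variable P(X = 8) is zero.
p_at_least_8 = 1 - x_dist.cdf(8)
print(round(p_at_least_8, 4))  # 0.3821
```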

Finding Z Values for Known Probabilities

What is Z given probability = 0.6217?

Cumulative Standardized Normal
Distribution Table (Portion)

Z      .00     .01     .02
0.0    .5000   .5040   .5080
0.1    .5398   .5438   .5478
0.2    .5793   .5832   .5871
0.3    .6179   .6217   .6255

μ_Z = 0, σ_Z = 1
The table entry .6217 lies in row 0.3, column .01, so Z = .31
[Figure: standardized normal curve, shaded area exaggerated]

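Recovering X Values for Known Probabilities

Normal Distribution: μ = 5, σ = 10
Standardized Normal Distribution: μ_Z = 0, σ_Z = 1, cumulative area .6179 to the left of Z = 0.30 and .3821 to the right

\[ X = \mu + Z\sigma = 5 + (.30)(10) = 8 \]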
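
Reading the table in reverse corresponds to the inverse cumulative distribution function. A hedged sketch using `NormalDist.inv_cdf` from the standard library follows; the variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal

# Known cumulative probability -> Z value (reverse table lookup).
z_for_6217 = std.inv_cdf(0.6217)
print(round(z_for_6217, 2))  # 0.31

# Known cumulative probability -> X value, via X = mu + Z * sigma.
mu, sigma = 5, 10
z_for_6179 = std.inv_cdf(0.6179)
x_value = mu + z_for_6179 * sigma
print(round(z_for_6179, 2), round(x_value, 1))  # 0.3 8.0
```

Calling `NormalDist(mu, sigma).inv_cdf(0.6179)` directly gives the same X without the intermediate Z step.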

An Example
We have a training program designed to
upgrade the supervisory skills of production
line supervisors. Because the program is
self-administered, supervisors require different numbers
of hours to complete it. A study of
past participation indicates that the mean
length of time spent on the program is 500
hours and that this normally distributed random
variable has a standard deviation of 100 hours.

Solve (individual exercise):

What is the probability that a participant
selected at random will require:
More than 500 hrs to complete the program?
Between 500 and 650 hrs to complete the
training program?
More than 700 hrs?
Less than 580 hrs?
Between 450 and 650 hrs?
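
One way to check your answers after working the exercise by hand is sketched below. It assumes only the mean of 500 hours and the standard deviation of 100 hours stated in the problem; the variable names are illustrative.

```python
from statistics import NormalDist

# Completion time in hours: mean 500, standard deviation 100.
hours = NormalDist(mu=500, sigma=100)

p_more_than_500 = 1 - hours.cdf(500)
p_500_to_650 = hours.cdf(650) - hours.cdf(500)
p_more_than_700 = 1 - hours.cdf(700)
p_less_than_580 = hours.cdf(580)
p_450_to_650 = hours.cdf(650) - hours.cdf(450)

results = [
    ("more than 500", p_more_than_500),
    ("between 500 and 650", p_500_to_650),
    ("more than 700", p_more_than_700),
    ("less than 580", p_less_than_580),
    ("between 450 and 650", p_450_to_650),
]
for label, prob in results:
    print(f"{label}: {prob:.4f}")
```

Answers read from a printed Z table may differ slightly in the last decimal because of rounding.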

Lowest Stock decision at post office


The manager of a small postal substation is
trying to quantify the variation in the weekly
demand for mailing envelopes. She has decided to
assume that this demand is normally distributed.
She knows that on average 100 envelopes are
purchased weekly and that 90 percent of the time,
weekly demand is below 115. The manager wants
to stock enough mailing envelopes each week so
that the probability of running out of envelopes is
no higher than 5 percent. Can you suggest the
lowest such stock level?
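
A possible line of attack is sketched below: first back out the standard deviation from the 90 percent statement, then find the 95th percentile of demand. The sketch assumes demand is treated as a continuous normal variable and that the stock level is rounded up to a whole envelope; both are modelling assumptions, not part of the problem statement.

```python
from math import ceil
from statistics import NormalDist

std = NormalDist()  # standard normal
mean_demand = 100

# P(demand < 115) = 0.90  =>  (115 - 100) / sigma = z at the 90th percentile.
sigma = (115 - mean_demand) / std.inv_cdf(0.90)

# Stock S must satisfy P(demand > S) <= 0.05, so S is the 95th percentile of demand.
stock_level = mean_demand + std.inv_cdf(0.95) * sigma

print(round(sigma, 2), round(stock_level, 2), ceil(stock_level))
```

With table values of about 1.28 and 1.645 for the two Z scores, the standard deviation is close to 11.7 and the stock level is a little over 119, so 120 envelopes after rounding up.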

Prediction of number of spectators in a match


Mr. John, the McDonald stand manager for the One Day
Series at Sri Lanka's cricket stadium, just had two
cancellations on his crew. This means that if more than 72,000
people come to watch today's cricket match, the line for hot dogs
will constitute a disgrace to Mr. John and will harm
business at future games. Mr. John knows from
experience that the number of spectators who come to the game is
normally distributed with mean 67,000 and standard deviation
4,000 people. Mr. John has the option to hire two temporary
employees, at an additional cost of $200, to ensure that business
won't be harmed in the future. If he believes the future
harm to business of having more than 72,000 fans at the match
would be $5,000, what would you suggest he do?
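
One way to frame the decision is to compare the certain cost of hiring with the expected cost of not hiring. The sketch below assumes Mr. John decides on expected cost alone (that is, he is risk neutral); that criterion is an assumption, not something stated in the problem.

```python
from statistics import NormalDist

attendance = NormalDist(mu=67_000, sigma=4_000)

# Probability that more than 72,000 spectators turn up: P(Z > 1.25).
p_overflow = 1 - attendance.cdf(72_000)

hire_cost = 200                      # certain cost of two temporary employees
expected_harm = p_overflow * 5_000   # expected loss if he does not hire

decision = "hire" if expected_harm > hire_cost else "do not hire"
print(round(p_overflow, 4), round(expected_harm, 2), decision)
```

The overflow probability is a little over 10 percent, so the expected harm of roughly $528 exceeds the $200 hiring cost.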

Inspection Shop
On the basis of past experience,
automobile inspectors at Maruti Udyog
Limited in Gurgaon have noticed that 5
percent of the cars coming in for their
annual inspection fail to pass. Find the
probability that between 7 and 18 of the
next 200 cars to enter the inspection shop
will fail the inspection.
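
This problem can be treated as a binomial with n = 200 and p = 0.05, and it is also a natural place to try the normal approximation to the binomial. The sketch below computes both. It assumes "between 7 and 18" includes the endpoints and applies a continuity correction of 0.5 in the approximation; both choices are assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 200, 0.05
low, high = 7, 18  # assumption: the interval is read as 7 <= X <= 18

# Exact binomial probability: add up P(X = k) for k = 7..18.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(low, high + 1))

# Normal approximation with mean np and standard deviation sqrt(np(1 - p)).
mu, sigma = n * p, sqrt(n * p * (1 - p))
approx_dist = NormalDist(mu, sigma)
approx = approx_dist.cdf(high + 0.5) - approx_dist.cdf(low - 0.5)

print(round(exact, 4), round(approx, 4))
```

The two results should be close, since np = 10 and n(1 − p) = 190 are both comfortably above 5.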
