
A Philosophical Essay on Probabilities (1814)

Probability is the ratio of the number of favorable cases to that of all cases possible.
Suppose we throw a coin twice. What is the probability that we will throw at least one head?
There are four equally possible cases that might arise:
1. One head and one tail.
2. One tail and one head.
3. Two tails.
4. Two heads.
Three of these cases give us at least one head, so the probability that we seek is 3/4.

Pierre Simon de Laplace (1749-1827)
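The four equally possible cases above can be checked by brute-force enumeration. This short Python sketch (ours, not Laplace's) tallies both "at least one head" (3 of the 4 cases, probability 3/4) and "exactly one head" (2 of the 4, probability 1/2):

```python
from itertools import product
from fractions import Fraction

# The four equally possible outcomes of two coin throws.
outcomes = list(product("HT", repeat=2))

# Count favorable cases for each reading of the question.
at_least_one = sum(1 for o in outcomes if "H" in o)
exactly_one = sum(1 for o in outcomes if o.count("H") == 1)

p_at_least_one = Fraction(at_least_one, len(outcomes))
p_exactly_one = Fraction(exactly_one, len(outcomes))
print(p_at_least_one, p_exactly_one)  # 3/4 1/2
```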

Laplace firmly believed that, in reality, every event is fully determined by general laws of the universe. But nature is complex and we are woefully ignorant of her ways; we must therefore calculate probabilities to compensate for our limitations. Events, in other words, are probable only relative to our meager knowledge. In an epigram that has defined strict determinism ever since, Laplace boasted that if anyone could provide a complete account of the position and motion of every particle in the universe at any single moment, then total knowledge of nature's laws would permit a full determination of all future history. Laplace directly links the need for a theory of probability to human ignorance of nature's deterministic ways. He writes: "So it is that we owe to the weakness of the human mind one of the most delicate and ingenious of mathematical theories, the science of chance or probability." (Analytical Theory of Probabilities, as cited by Stephen J. Gould, Dinosaurs in a Haystack, p. 27.)

Laplace's Classical Definition: The probability of an event A is defined a priori, without actual experimentation, as

P(A) = (Number of outcomes favorable to A) / (Total number of possible outcomes),    (1-1)

provided all these outcomes are equally likely.


Consider a box with n white and m red balls. In this case, there are two elementary outcomes: white ball or red ball.
The probability of selecting a white ball is n/(n+m).
We can use the above classical definition to determine the probability that a given number is divisible by a prime p.
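Both counting claims can be sketched with exact rational arithmetic; the helper name and the chosen values of n, m, and p below are ours:

```python
from fractions import Fraction

def classical_probability(favorable, total):
    # Laplace: favorable outcomes over equally likely total outcomes.
    return Fraction(favorable, total)

# Box with n white and m red balls: P(white) = n / (n + m).
n, m = 3, 7
p_white = classical_probability(n, n + m)
print(p_white)  # 3/10

# Divisibility by a prime p among the first N integers: floor(N/p)
# favorable cases, so the ratio approaches 1/p as N grows.
p, N = 7, 10**6
p_div = classical_probability(N // p, N)
print(float(p_div))  # close to 1/7
```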

An advantage of the study of probability is its use in everyday life.

What actions will improve your probability of success? Buying lottery tickets? Investing in junk bonds? Buying gold? Investing in real estate? Smoking?
What events are most probable in my lifetime? What kind of event is likely to kill me? Meteor impact? Earthquake? Tsunami? Terrorism? War? Accident? Heart disease? Cancer?
Should you buy insurance? How much should you pay for insurance?
None of these probabilities are fixed. As knowledge increases and parameters change, so do the probabilities.

Prisoners' dilemma
Three prisoners, A, B, and C, are in jail. One of them is to be executed and the other two will be set free.
Prisoner A asks the guard: "One of my partners, B or C, will be set free. Could you please tell me which one of them will be set free?"
The guard thinks a while and tells A: "If I do not tell you, then your chance of death is 1/3. But if I tell you, then there are only two left and you are one of the two who might be killed. Your chance of death will be 1/2. Do you really want to increase your chance of death?"

Some basics:
Flip a coin 3 times; how many possible outcomes are there?
With each flip there are two possible outcomes, and we do this 3 times: there are 3 events, each with two possible outcomes, so there are a total of 2*2*2 = 8 results.
The general formulation: the number of possible results with k trials, with ni possible outcomes in the i-th trial, is

N = n1 * n2 * ... * nk
How many values can a 3-digit binary number
have?

Another example: How many possible license plates are there using three letters and three numbers?
N = 26*26*26*10*10*10 = 17,576,000
Permutations: The permutations of r objects taken from a set of n distinct objects is the number of ways n things taken r at a time can be arranged.
Example: We have 20 rock samples; how many ways can you select 3 samples from the 20, in order?
The first rock can be any one of the 20; the 2nd can be any of the remaining 19, and the 3rd can be any of 18. So the answer is 20*19*18 = 6,840.
The formulation is:

nPr = n(n-1)(n-2)...(n-r+1)

Factorial: The factorial operation is defined as:

n! = n(n-1)(n-2)...1

By definition, 0! is set equal to 1.

We can rewrite the permutation equation as:

nPr = n(n-1)...(n-r+1) = [n(n-1)...(n-r+1)(n-r)...1] / [(n-r)(n-r-1)...1] = n!/(n-r)!

Example: How many different hands are there in straight poker (no draw), counting order?

52P5 = 52!/(52-5)! = 52!/47! = 52*51*50*49*48 = 311,875,200
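The product form and the factorial form of nPr agree; a small check using only `math.factorial` (the helper name is ours):

```python
import math

def permutations(n, r):
    # nPr = n! / (n - r)!
    return math.factorial(n) // math.factorial(n - r)

# 20 rock samples, pick 3 in order: 20 * 19 * 18.
print(permutations(20, 3))   # 6840

# Ordered 5-card deals from a 52-card deck.
print(permutations(52, 5))   # 311875200
```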

The poker example isn't quite correct, because it assumes that the order in which you received the cards is important, which it isn't. We need another parameter where order isn't important.
COMBINATIONS: When we don't care about the order of the outcomes (ABC = ACB), then we talk about the number of COMBINATIONS of n objects taken r at a time. This turns out to be the number of permutations divided by r!:

nCr = nPr/r! = n!/(r!(n-r)!) = (n choose r)

The numbers (n choose r) are called the binomial coefficients.

So how many different poker hands are there really?

52C5 = 52P5/5! = 311,875,200/(5*4*3*2*1) = 2,598,960

How many ways can you pick three marbles from 9 marbles?

9C3 = 9!/(3!6!) = (9*8*7)/(3*2*1) = 84
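Dividing out the r! orderings gives the combination counts quoted above; a sketch with our own helper:

```python
import math

def combinations(n, r):
    # nCr = n! / (r! * (n - r)!): order no longer matters.
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Unordered poker hands: the ordered count divided by 5!.
ordered = 52 * 51 * 50 * 49 * 48
print(combinations(52, 5))            # 2598960
print(ordered // math.factorial(5))   # same number

# Three marbles chosen from nine.
print(combinations(9, 3))             # 84
```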

Probability
Now that we know how to tell what's possible, how do we tell what's probable?
The basic concept is: if there are s possible favorable outcomes of an event and there are n outcomes possible, then the probability of success is s/n:

p = s/n

However, this is only true if all outcomes are equally likely.

Example: What is the probability of drawing an ace from a deck of cards?
Since there are 52 cards, there are 52 possible outcomes, and, since there are 4 aces, four of those outcomes are favorable; thus:
P = 4/52 = 1/13 = 7.7%
Example: A cancer surgery patient gets biopsies on 6 lymph nodes. If any one is found to contain cancer, then the cancer will be known to have spread and the patient will receive chemotherapy. If only 1 in 10 lymph nodes is actually cancerous, what are the odds of all six sampled nodes coming out negative?

Our possible outcomes are 10 nodes taken 6 at a time, or 10C6 = 10!/(6!(10-6)!) = (10*9*8*7)/(4*3*2*1) = 210.
Favorable outcomes are missing the 1 cancerous node out of 10 in 6 tries, which is the same as picking only from the 9 clear nodes in 6 tries:
9C6 = 9!/(6!(9-6)!) = (9*8*7)/(3*2*1) = 84. So the probability is 84/210 = 40%. Lesson to surgeons: sample LOTS of nodes!
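The biopsy count can be replayed directly (the helper name is ours):

```python
import math

def C(n, r):
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# 10 nodes, 1 cancerous, sample 6. An all-negative result means all
# six sampled nodes came from the 9 clean ones.
total = C(10, 6)      # 210 possible samples
clean = C(9, 6)       # 84 all-clean samples
p_all_negative = clean / total
print(total, clean, p_all_negative)  # 210 84 0.4
```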

When the probabilities of some outcomes are greater than those of others, the above calculations don't work. A better definition is:
The probability of an outcome is the fraction of trials in which that outcome is observed, over a large number of trials.

Example: The probability of sunshine for more than 2 hours per day in June in Murree is 97%. This statistic, valuable to the Tourist Bureau, is based on a large number of samples of sunshine in June.

The Law of Large Numbers:
If an experiment is repeated a large number of times, the fraction of times a particular outcome is observed will approach the probability of that outcome.
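The law can be watched in action with a seeded simulation (entirely our sketch; the fair-coin probability is 0.5, and the seed just makes the run reproducible):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

freqs = {}
for trials in (100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(trials))
    freqs[trials] = heads / trials
    print(trials, freqs[trials])

# The observed fraction of heads drifts toward the true probability 0.5
# as the number of flips grows.
```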

Rules and definitions

S: Sample Space: all possible outcomes of an experiment.
A: Event: a subset of S. An event may contain more than one outcome.
Mutually exclusive: two events that have no common outcomes.
The probability of an event must be greater than or equal to 0 and less than or equal to 1:

0 ≤ P(A) ≤ 1

Also, P(S) = 1.

If P(A) = 1, A is a certainty.
If two events are mutually exclusive, then the probability that one or the other will occur is the sum of their probabilities.
∪: the union symbol. It means "or".
∩: the intersection symbol. It means "and".
If A and B are mutually exclusive:
P(A ∪ B) = P(A) + P(B)
P(A ∩ B) = 0
A̅: the complement symbol (overbar). It means "not".
P(A) + P(A̅) = 1

Additional Probability Addition Rules

Venn Diagram: two overlapping circles in the sample space, with regions 0.18 (oil only), 0.12 (overlap), and 0.24 (gas only).

Venn Diagrams illustrate the probabilities of non-exclusive events. The circles represent two different events embedded in the sample space. This could be the probabilities of hitting economical oil (orange) or gas (pink).
P(oil) = 0.18 + 0.12 = 0.30
P(gas) = 0.24 + 0.12 = 0.36
P(gas ∪ oil) = 0.18 + 0.12 + 0.24 = 0.54
Note: this is the inclusive OR, in that both events can occur and still be counted.

If we had used our previous addition rule,
P(A ∪ B) = P(A) + P(B) = P(oil) + P(gas) = 0.30 + 0.36 = 0.66,
we would overestimate the probability of finding gas and oil. We fix that by writing:
P(oil ∪ gas) = P(oil) + P(gas) - P(oil ∩ gas) = 0.30 + 0.36 - 0.12 = 0.54
If the events are mutually exclusive, then P(A ∩ B) = 0, and the original rule is recovered.
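The inclusion-exclusion bookkeeping with the Venn-diagram numbers can be sketched as:

```python
# Venn-diagram regions from the slide: oil only, both, gas only.
p_oil_only, p_both, p_gas_only = 0.18, 0.12, 0.24

p_oil = p_oil_only + p_both              # 0.30
p_gas = p_gas_only + p_both              # 0.36

# Naive addition double-counts the overlap ...
naive = p_oil + p_gas                    # 0.66
# ... so subtract the intersection once (inclusion-exclusion).
p_oil_or_gas = p_oil + p_gas - p_both    # 0.54

print(round(naive, 2), round(p_oil_or_gas, 2))  # 0.66 0.54
```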

Conditional Probability
"What if" probabilities are very common - probabilities where an outcome depends on the occurrence of a previous outcome.
If a strength 5 hurricane hits Hawks Bay, what is the probability that KANUPP will fail?
If an earthquake occurs off the south coast, what is the probability that a major tsunami will be generated?
If a disaster occurs, what is the probability that our insurance company will not have sufficient funds?
If oil supply drops below demand, what is the probability that we can make do with alternative energy?

Conditional probability is the probability that an event will occur, given that another event has already occurred:

P(A|B) = P(A ∩ B) / P(B)

The probability of A given that B has occurred is equal to the probability of A and B divided by the probability of B.
In the oil and gas example, what is the probability of finding oil given that gas was found?
P(oil | gas) = 0.12/0.36 = 1/3 = 33%

Bayes' Basic Theorem

Rewriting the above equation, we get:
P(A ∩ B) = P(B) P(A | B)
and
P(A ∩ B) = P(A) P(B | A).

If A and B are independent, then whether B has already occurred or not does not affect the probability of A:
P(A|B) = P(A).
Substituting into the multiplication rule above:
P(A ∩ B) = P(A) P(B), if A and B are independent.
For n independent events:

P = p1 * p2 * ... * pn

Example:
What is the probability of death by meteoroid impact?
The probability of a planet-killer meteoroid impact in a given year is about 10^-8. The average person lives about 60 years, and there are about 5x10^9 people.
Every 10^8 years, 5x10^9 people will be killed by an impact, but every 60 years 5x10^9 people will be killed by other causes. So, in 10^8 years, (10^8)/60 * 5x10^9 die of other causes, and 5x10^9 people will be killed by an impact.
Divide the total deaths by impact by the total deaths by other causes to get the probability of death by impact:
P(death by impact) ~ 1 in 17 million
This is about the same as death by lightning.

Peak Oil Example

The probability that (A) demand for oil will outstrip supply within the next 5 years is ~70%.
The probability that (B) we will be able to satisfy demand with other energy sources taking up the shortfall is ~20%.
The probability of (C) global economic chaos, given A ∩ B̅, is ~60%.
The probability of global economic chaos beginning within the next 5 years:
P(A) P(B̅|A) = 0.7*(1-0.2) = 0.56
P(C) = 0.56*0.6 ~ 34%
This is the argument that is getting considerable attention now: Google "peak oil".
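The chained estimate is just a product of the stated probabilities (scenario numbers from the slide; variable names are ours):

```python
p_A = 0.70        # demand outstrips supply within 5 years
p_B = 0.20        # alternative sources can cover the shortfall
p_C_given = 0.60  # economic chaos, given an uncovered shortfall

p_uncovered = p_A * (1 - p_B)      # P(A) * P(not-B | A) = 0.56
p_chaos = p_uncovered * p_C_given  # ~ 0.34
print(round(p_uncovered, 2), round(p_chaos, 2))  # 0.56 0.34
```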

If there is more than one event Bi (all mutually exclusive) that is conditionally related to event A, then P(A) is the sum of the conditional probabilities over the Bi:

P(A) = Σ_{i=1}^{n} P(A | Bi) P(Bi)

This yields:

P(Bi | A) = P(Bi) P(A | Bi) / Σ_{i=1}^{n} P(A | Bi) P(Bi)

which is the general Bayes' Theorem.

Like much of statistics, the formulas are incomprehensible without examples. Consider:

An unknown marine fossil fragment was found at a fossil site in a stream bed. You want a better fossil, but there are two possible sources upstream. Drainage basin B1 covers 180 km2 and B2 covers 100 km2.

Based on area alone, the probability that the fossil comes from one or the other basin is:
P(B1) = 180/280 = 0.64
P(B2) = 100/280 = 0.36
However, a geological map shows that 35% of the outcrops in B1 are marine, while 80% of the outcrops in B2 are marine. The conditional probabilities are:
P(A|B1) = 0.35 (probability of a marine fossil given B1)
P(A|B2) = 0.80 (probability of a marine fossil given B2)
We can now use Bayes' theorem to find the probability that the fossil came from B1, given that the fossil is marine:

P(B1 | A) = P(A | B1)P(B1) / [P(A | B1)P(B1) + P(A | B2)P(B2)] = (0.35 * 0.64) / (0.35 * 0.64 + 0.80 * 0.36) = 0.44
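The drainage-basin update can be replayed numerically (using the exact areas rather than the rounded 0.64/0.36):

```python
# Priors from basin areas; likelihoods from marine-outcrop fractions.
p_B1, p_B2 = 180 / 280, 100 / 280
p_A_given_B1, p_A_given_B2 = 0.35, 0.80

# Total probability of a marine fossil, then Bayes' theorem.
p_A = p_A_given_B1 * p_B1 + p_A_given_B2 * p_B2
p_B1_given_A = p_A_given_B1 * p_B1 / p_A

print(round(p_B1_given_A, 2))  # 0.44
```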

Fig. 1.1 shows a set A, its complement A̅, the union A ∪ B, and the intersection A ∩ B.

If A ∩ B = φ, the empty set, then A and B are said to be mutually exclusive (M.E.).
A partition of Ω is a collection of mutually exclusive subsets of Ω such that their union is Ω:

Ai ∩ Aj = φ, i ≠ j, and ∪_{i=1}^{n} Ai = Ω.    (1-9)

Fig. 1.2 shows such a partition A1, A2, ..., Ai, ..., Aj, ..., An of Ω.

De Morgan's Laws:

(A ∪ B)̅ = A̅ ∩ B̅ ;   (A ∩ B)̅ = A̅ ∪ B̅    (1-10)

Fig. 1.3 illustrates these identities with Venn diagrams for A ∪ B and A ∩ B and their complements.

Often it is meaningful to talk about at least some of the subsets of Ω as events, for which we must have a mechanism to compute their probabilities.

Example 1.1: Consider the experiment where two coins are simultaneously tossed. The various elementary events are

ξ1 = (H, H), ξ2 = (H, T), ξ3 = (T, H), ξ4 = (T, T)

and

Ω = {ξ1, ξ2, ξ3, ξ4}.

The subset A = {ξ1, ξ2, ξ3} is the same as "Head has occurred at least once" and qualifies as an event.
Suppose two subsets A and B are both events; then consider:
Does an outcome belong to A or B: A ∪ B
Does an outcome belong to A and B: A ∩ B
Does an outcome fall outside A? A̅

The conditional probability P(A|B) is a measure of the event A given that B has already occurred. We denote this conditional probability by
P(A|B) = probability of the event A given that B has occurred.
We define

P(A|B) = P(AB) / P(B),    (1-35)

provided P(B) ≠ 0. As we show below, the above definition satisfies all probability axioms discussed earlier.

We have
(i)   P(A|B) = P(AB)/P(B) ≥ 0, since P(AB) ≥ 0 and P(B) > 0;    (1-36)
(ii)  P(Ω|B) = P(ΩB)/P(B) = P(B)/P(B) = 1,    (1-37)
since ΩB = B;
(iii) suppose A ∩ C = φ. Then

P(A ∪ C | B) = P((A ∪ C)B)/P(B) = P(AB ∪ CB)/P(B).    (1-38)

But AB ∩ CB = φ, hence P(AB ∪ CB) = P(AB) + P(CB), and

P(A ∪ C | B) = P(AB)/P(B) + P(CB)/P(B) = P(A|B) + P(C|B),    (1-39)

satisfying all probability axioms in (1-13). Thus (1-35) defines a legitimate probability measure.

Properties of Conditional Probability:

a. If B ⊂ A, then AB = B, and

P(A|B) = P(AB)/P(B) = P(B)/P(B) = 1,    (1-40)

since if B ⊂ A, the occurrence of B implies automatic occurrence of the event A. As an example, let
A = {outcome is even}, B = {outcome is 2}
in a die-tossing experiment. Then B ⊂ A, and P(A|B) = 1.
b. If A ⊂ B, then AB = A, and

P(A|B) = P(AB)/P(B) = P(A)/P(B) > P(A).    (1-41)

(In a die experiment, A = {outcome is 2}, B = {outcome is even}, so that A ⊂ B. The statement that B has occurred (outcome is even) makes the odds for "outcome is 2" greater than without that information.)
c. We can use conditional probability to express the probability of a complicated event in terms of simpler related events.
Let A1, A2, ..., An be pairwise disjoint with union Ω. Thus Ai ∩ Aj = φ, i ≠ j, and

∪_{i=1}^{n} Ai = Ω.    (1-42)

Thus

B = B(A1 ∪ A2 ∪ ... ∪ An) = BA1 ∪ BA2 ∪ ... ∪ BAn.    (1-43)

But Ai ∩ Aj = φ implies BAi ∩ BAj = φ, so that from (1-43)

P(B) = Σ_{i=1}^{n} P(BAi) = Σ_{i=1}^{n} P(B|Ai) P(Ai).    (1-44)

With the notion of conditional probability, next we introduce the notion of independence of events.
Independence: A and B are said to be independent events if

P(AB) = P(A) P(B).    (1-45)

Notice that the above definition is a probabilistic statement, not a set-theoretic notion such as mutual exclusiveness.

Suppose A and B are independent; then

P(A|B) = P(AB)/P(B) = P(A)P(B)/P(B) = P(A).    (1-46)

Thus if A and B are independent, the event that B has occurred does not shed any more light on the event A. It makes no difference to A whether B has occurred or not.
An example will clarify the situation:

Example 1.2: A box contains 6 white and 4 black balls. Remove two balls at random without replacement. What is the probability that the first one is white and the second one is black?
Let W1 = "first ball removed is white"
B2 = "second ball removed is black"

We need P(W1 ∩ B2). We have W1 ∩ B2 = W1B2 = B2W1.
Using the conditional probability rule,

P(W1B2) = P(B2W1) = P(B2|W1) P(W1).    (1-47)

But
P(W1) = 6/(6+4) = 6/10 = 3/5,
and
P(B2|W1) = 4/(5+4) = 4/9,
and hence
P(W1B2) = (4/9)(3/5) = 4/15 ≈ 0.27.
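Example 1.2 can be verified with exact fractions:

```python
from fractions import Fraction

# 6 white, 4 black; two draws without replacement.
p_W1 = Fraction(6, 10)           # first ball white
p_B2_given_W1 = Fraction(4, 9)   # second black, given first white
p_W1_and_B2 = p_B2_given_W1 * p_W1
print(p_W1_and_B2)  # 4/15

# Total probability that the second ball is black.
p_B1 = Fraction(4, 10)
p_B2_given_B1 = Fraction(3, 9)
p_B2 = p_B2_given_W1 * p_W1 + p_B2_given_B1 * p_B1
print(p_B2)  # 2/5

# Independence would require P(W1)P(B2) == P(W1 and B2); it fails.
print(p_W1 * p_B2 == p_W1_and_B2)  # False
```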

Are the events W1 and B2 independent? Our common sense says no. To verify this we need to compute P(B2). Of course the fate of the second ball very much depends on that of the first ball. The first ball has two options: W1 = "first ball is white" or B1 = "first ball is black". Note that W1 ∩ B1 = φ and W1 ∪ B1 = Ω, hence W1 together with B1 form a partition.
Thus (see (1-42)-(1-44))

P(B2) = P(B2|W1) P(W1) + P(B2|B1) P(B1) = (4/9)(3/5) + (3/9)(2/5) = 4/15 + 2/15 = 6/15 = 2/5,

and

P(B2) P(W1) = (2/5)(3/5) = 6/25 ≠ 4/15 = P(B2W1).

As expected, the events W1 and B2 are dependent.

From (1-35),

P(AB) = P(A|B) P(B).    (1-48)

Similarly, from (1-35)

P(B|A) = P(BA)/P(A) = P(AB)/P(A),

or

P(AB) = P(B|A) P(A).    (1-49)

From (1-48)-(1-49), we get

P(A|B) P(B) = P(B|A) P(A),

or

P(A|B) = [P(B|A)/P(B)] P(A).    (1-50)

Equation (1-50) is known as Bayes' theorem.

Although simple enough, Bayes' theorem has an interesting interpretation: P(A) represents the a-priori probability of the event A. Suppose B has occurred, and assume that A and B are not independent. How can this new information be used to update our knowledge about A? Bayes' rule in (1-50) takes into account the new information ("B has occurred") and gives out the a-posteriori probability of A given B.
We can also view the event B as new knowledge obtained from a fresh experiment. We know something about A as P(A). The new information is available in terms of B. The new information should be used to improve our knowledge/understanding of A. Bayes' theorem gives the exact mechanism for incorporating such new information.

A more general version of Bayes' theorem involves a partition of Ω. From (1-50)

P(Ai|B) = P(B|Ai) P(Ai) / P(B) = P(B|Ai) P(Ai) / Σ_{i=1}^{n} P(B|Ai) P(Ai),    (1-51)

where we have made use of (1-44). In (1-51), Ai, i = 1, ..., n, represent a set of mutually exclusive events with associated a-priori probabilities P(Ai), i = 1, ..., n. With the new information "B has occurred", the information about Ai can be updated by the n conditional probabilities P(B|Ai), i = 1, ..., n, using (1-47).

Example 1.3: Two boxes B1 and B2 contain 100 and 200 light bulbs respectively. The first box (B1) has 15 defective bulbs and the second 5. Suppose a box is selected at random and one bulb is picked out.
(a) What is the probability that it is defective?
Solution: Note that box B1 has 85 good and 15 defective bulbs. Similarly box B2 has 195 good and 5 defective bulbs.
Let D = "defective bulb is picked out".
Then

P(D|B1) = 15/100 = 0.15,   P(D|B2) = 5/200 = 0.025.

Since a box is selected at random, they are equally likely:

P(B1) = P(B2) = 1/2.

Thus B1 and B2 form a partition as in (1-43), and using (1-44) we obtain

P(D) = P(D|B1) P(B1) + P(D|B2) P(B2) = 0.15 × (1/2) + 0.025 × (1/2) = 0.0875.

Thus, there is about a 9% probability that a bulb picked at random is defective.

(b) Suppose we test the bulb and it is found to be defective. What is the probability that it came from box 1, P(B1|D)?

P(B1|D) = P(D|B1) P(B1) / P(D) = (0.15 × 1/2) / 0.0875 = 0.8571.    (1-52)

Notice that initially P(B1) = 0.5; then we picked out a box at random and tested a bulb that turned out to be defective. Can this information shed some light on which box we might have picked?
From (1-52), P(B1|D) = 0.857 > 0.5, and indeed it is now more likely that we chose box 1 rather than box 2. (Recall box 1 has six times more defective bulbs than box 2.)
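The bulb example in code, as a direct transcription of the total-probability and Bayes steps:

```python
# Example 1.3: equal priors over the boxes, then condition on a defect.
p_B1 = p_B2 = 0.5
p_D_given_B1 = 15 / 100
p_D_given_B2 = 5 / 200

p_D = p_D_given_B1 * p_B1 + p_D_given_B2 * p_B2
p_B1_given_D = p_D_given_B1 * p_B1 / p_D

print(round(p_D, 4))           # 0.0875
print(round(p_B1_given_D, 4))  # 0.8571
```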

CONTINUOUS RANDOM VARIABLES

Definition and Basic Properties

Recall that a random variable X is simply a function from a sample space S into the real numbers. The random variable is discrete if the range of X is finite or countably infinite. This refers to the number of values X can take on, not the size of the values. The random variable is continuous if the range of X is uncountably infinite and X has a suitable pdf (see below). Typically an uncountably infinite range results from an X that makes a physical measurement, e.g., the position, size, time, age, flow, volume, or area of something.

Definition and Basic Properties

The pdf of a continuous random variable X must satisfy three conditions.
1. It is a nonnegative function (but unlike in the discrete case it may take on values exceeding 1).
2. Its definite integral over the whole real line equals one. That is,

∫_{-∞}^{∞} f(x) dx = 1

Definition and Basic Properties

3. Its definite integral over a subset B of the real numbers gives the probability that X takes a value in B. That is,

∫_B f(x) dx = P(X ∈ B)

for every subset B of the real numbers. As a special case (the usual case), for all real numbers a and b,

∫_a^b f(x) dx = P(a ≤ X ≤ b)

Put simply, the probability is simply the area under the pdf curve over the interval [a, b].

Definition and Basic Properties

If X has uncountable range and such a pdf, then X is a continuous random variable. In this case we often refer to f as a continuous pdf. Note that this means f is the pdf of a continuous random variable; it does not necessarily mean that f is a continuous function.

Definition and Basic Properties

Note that by this definition the probability of X taking on a single value a is always 0. This follows from

P(X = a) = P(a ≤ X ≤ a) = ∫_a^a f(x) dx = 0,

since every definite integral over a degenerate interval is 0. This is, of course, quite different from the situation for discrete random variables.

Definition and Basic Properties

Consequently we can be sloppy about inequalities. That is,

P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)

Remember that this is blatantly false for discrete random variables.

Definition and Basic Properties

There are random variables that are neither discrete nor continuous, being discrete at some points in their ranges and continuous at others. They are not hard to construct, but they seldom appear in introductory courses and will not concern us.
Mathematicians have defined many generalizations of the Riemann integral of freshman calculus, the Riemann-Stieltjes integral and the Lebesgue integral being common examples. With a suitable generalized integral it is possible to treat discrete and continuous random variables identically (as well as the mixed random variables), but this approach lies far beyond the scope of our course.

Definition and Basic Properties

Examples
Let X be a random variable with range [0,2] and pdf defined by f(x) = 1/2 for all x between 0 and 2 and f(x) = 0 for all other values of x. Note that since the integral of zero is zero we get

∫_{-∞}^{∞} f(x) dx = ∫_0^2 (1/2) dx = (1/2) x |_0^2 = 1 - 0 = 1

That is, as with all continuous pdfs, the total area under the curve is 1. We might use this random variable to model the position at which a two-meter length of rope breaks when put under tension, assuming every point is equally likely. Then the probability the break occurs in the last half-meter of the rope is

P(3/2 ≤ X ≤ 2) = ∫_{3/2}^2 f(x) dx = ∫_{3/2}^2 (1/2) dx = (1/2) x |_{3/2}^2 = 1 - 3/4 = 1/4

Definition and Basic Properties

Examples
Let Y be a random variable whose range is the nonnegative reals and whose pdf is defined by

f(x) = (1/750) e^{-x/750}

for nonnegative values of x (and 0 for negative values of x). Then

∫_{-∞}^{∞} f(x) dx = lim_{t→∞} ∫_0^t (1/750) e^{-x/750} dx = lim_{t→∞} [-e^{-x/750}]_0^t = lim_{t→∞} (e^0 - e^{-t/750}) = 1 - 0 = 1

Definition and Basic Properties

The random variable Y might be a reasonable choice to model the lifetime in hours of a standard light bulb with average life 750 hours. To find the probability a bulb lasts under 500 hours, you calculate

P(0 ≤ Y ≤ 500) = ∫_0^500 (1/750) e^{-x/750} dx = [-e^{-x/750}]_0^500 = 1 - e^{-2/3} ≈ 0.487
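The closed form can be cross-checked by integrating the pdf numerically (midpoint rule; the step count is an arbitrary choice of ours):

```python
import math

# P(0 <= Y <= 500) for f(x) = (1/750) exp(-x/750).
p_closed = 1 - math.exp(-2 / 3)
print(round(p_closed, 3))  # 0.487

# Midpoint-rule integration of the pdf over [0, 500].
f = lambda x: math.exp(-x / 750) / 750
n = 100_000
h = 500 / n
p_numeric = sum(f((i + 0.5) * h) for i in range(n)) * h
print(abs(p_numeric - p_closed) < 1e-6)  # True
```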

Definition and Basic Properties


Note that in both these examples the pdf is not a
continuous function. Also note that in all these
cases the pdf behaves as a linear density function
in the physical sense: the definite integral of the
density of a nonhomogeneous wire or of a lamina
gives the mass of the wire or lamina over the
specified interval. Here the mass is the
probability.


Cumulative Distribution Functions

The cdf of a continuous random variable has the same definition as that for a discrete random variable. That is,

F(x) = P(X ≤ x)

In practice this means that F is essentially a particular antiderivative of the pdf, since

F(x) = P(X ≤ x) = ∫_{-∞}^x f(t) dt

Thus at the points where f is continuous, F′(x) = f(x).

Cumulative Distribution Functions

Knowing the cdf of a random variable greatly facilitates computation of probabilities involving that random variable since, by the Fundamental Theorem of Calculus,

P(a ≤ X ≤ b) = F(b) - F(a)

Cumulative Distribution Functions

In the second example above, F(x) = 0 if x is negative, and for nonnegative x we have

F(x) = ∫_0^x (1/750) e^{-t/750} dt = [-e^{-t/750}]_0^x = 1 - e^{-x/750}

Thus the probability of a light bulb lasting between 500 and 1000 hours is

F(1000) - F(500) = (1 - e^{-1000/750}) - (1 - e^{-500/750}) = e^{-2/3} - e^{-4/3} ≈ 0.250
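Probabilities over any interval now come from just two cdf evaluations:

```python
import math

# cdf of the light-bulb lifetime for x >= 0.
F = lambda x: 1 - math.exp(-x / 750)

# P(500 <= Y <= 1000) = F(1000) - F(500).
p = F(1000) - F(500)
print(f"{p:.3f}")  # 0.250
```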

Cumulative Distribution Functions

In the first example above, F(x) = 0 for negative x, F(x) = 1 for x greater than 2, and F(x) = x/2 for x between 0 and 2, since for such x we have

F(x) = ∫_0^x (1/2) dt = (1/2) t |_0^x = x/2

Thus to find the probability the rope breaks somewhere in the first meter we calculate F(1) - F(0) = 1/2 - 0 = 1/2, which is intuitively correct.

Cumulative Distribution Functions

If X is a continuous random variable, then its cdf is a continuous function. Moreover,

lim_{x→-∞} F(x) = 0   and   lim_{x→∞} F(x) = 1

Again these results are intuitive.

Expectation and Variance

Definitions
The expected value of a continuous random variable X is defined by

E(X) = ∫_{-∞}^{∞} x f(x) dx

Note the similarity to the definition for discrete random variables. Once again we often denote it by μ. As in the discrete case this integral may not converge, in which case the expectation of X is undefined.

Expectation and Variance

Definitions
As in the discrete case we define the variance by

Var(X) = E((X - μ)²)

Once again the standard deviation is the square root of the variance. Variance and standard deviation do not exist if the expected value by which they are defined does not converge.

Expectation and Variance

Theorems
The Law of the Unconscious Statistician holds in the continuous case. Here it states

E(h(X)) = ∫_{-∞}^{∞} h(x) f(x) dx

Expected value still preserves linearity. That is,

E(aX + b) = a E(X) + b

The proof depends on the linearity of the definite integral (even an improper Riemann integral).

Expectation and Variance

Theorems
Similarly the expected value of a sum of functions of X equals the sum of the expected values of those functions (see Theorem 4.3 in the book) by the linearity of the definite integral.
The shortcut formula for the variance holds for continuous random variables, depending only on the two preceding linearity results and a little algebra, just as in the discrete case. The formula states

Var(X) = E(X²) - E(X)² = E(X²) - μ²

Variance and standard deviation still act in the same way on linear functions of X. Namely,

Var(aX + b) = a² Var(X)   and   SD(aX + b) = |a| SD(X)

Expectation and Variance

Examples
In the two-meter-wire problem, the expected value should be 1, intuitively. Let us calculate:

E(X) = ∫_0^2 x (1/2) dx = (1/4) x² |_0^2 = 1 - 0 = 1

Expectation and Variance

Examples
In the same example the variance is

Var(X) = E(X²) - 1² = ∫_0^2 x² (1/2) dx - 1 = (1/6) x³ |_0^2 - 1 = 8/6 - 1 = 1/3

and consequently

SD(X) = √(1/3) = 1/√3 = √3/3 ≈ 0.577

This result seems plausible.
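The uniform-rope moments in code, using the shortcut formula (E(X) = 1 and E(X²) = 4/3 come from the direct integrations above):

```python
import math

# f(x) = 1/2 on [0, 2]: first and second moments by direct integration.
E_X = 1.0
E_X2 = 4 / 3

var = E_X2 - E_X**2   # shortcut: E(X^2) - E(X)^2
sd = math.sqrt(var)
print(round(var, 4), round(sd, 4))  # 0.3333 0.5774
```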

Expectation and Variance

Examples
It is also possible to compute the expected value and variance in the light bulb example. The integration involves integration by parts.
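Rather than carrying out the integration by parts here, one can confirm numerically that the mean lifetime is 750 hours (the truncation point T and step count are our choices; the tail beyond T contributes negligibly):

```python
import math

# E(Y) = integral of x * (1/750) exp(-x/750) over [0, infinity).
f = lambda x: math.exp(-x / 750) / 750

T, n = 20_000, 200_000   # truncate the improper integral at T
h = T / n
E = sum((i + 0.5) * h * f((i + 0.5) * h) for i in range(n)) * h
print(round(E))  # 750
```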
