PROBABILITY THEORY
Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation)
1. Prove the tail-sum theorem: for a random variable X taking values in {0, 1, 2, . . .}, the expectation equals the tail sum
$$\sum_{x=0}^{\infty} P(X > x).$$
The quantity P(X > x) is itself a sum, which makes the tail sum into
$$\sum_{x=0}^{\infty} \sum_{r=x+1}^{\infty} f_X(r).$$
First we sum over r for each x, then over x, getting a set of pairs that looks
like this:
r = 4   * * * *
r = 3   * * *
r = 2   * *
r = 1   *
r = 0
        x = 0 1 2 3 . . .
But we could also do the same sum by first summing over x for each r, then
summing over r. The sum becomes
$$\sum_{r=1}^{\infty} \sum_{x=0}^{r-1} f_X(r).$$
The inner sum has r equal terms, so this is
$$\sum_{r=1}^{\infty} r f_X(r) = E(X).$$
Thus
$$E(X) = \sum_{x=0}^{\infty} P(X > x).$$
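As a quick numerical check, here is a short Python sketch comparing the direct expectation with the tail sum. The mass function below is a made-up example, not one from the notes:

```python
# Verify the tail-sum theorem E(X) = sum over x >= 0 of P(X > x)
# for a small hand-picked distribution on {0, 1, 2, 3}.
f = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}  # example mass function

# Direct expectation: sum of x * f(x).
direct = sum(x * p for x, p in f.items())

# Tail sum: P(X > x) is the sum of f(r) over r > x.
tail = sum(sum(p for r, p in f.items() if r > x) for x in range(max(f)))

print(direct, tail)  # the two totals agree
```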
2. Calculate the expectation for the geometric distribution by using the tail sum theorem, then find the expectation for the negative binomial distribution by treating it as a sum of geometric random variables.
Use the “until” version: X is the total number of times you roll a die if you stop when you first roll a 6. Let p be the probability of rolling a 6 and q = 1 − p the probability of not rolling a 6. Then P(X > x) = q^x, since X > x exactly when the first x rolls all miss.
By the tail sum theorem,
$$E(X) = \sum_{x=0}^{\infty} P(X > x) = \sum_{x=0}^{\infty} q^x = \frac{1}{1-q} = \frac{1}{p}.$$
Let Y_n be the total number of times you roll a die if you stop when you roll a 6 for the nth time. Then Y_n = X_1 + X_2 + · · · + X_n, where each X_i is a geometric random variable distributed like X above. (That is, Y_n has the negative binomial distribution.)
Since expectation is additive,
$$E(Y_n) = \frac{n}{p}.$$
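Both expectations are easy to check by simulation. The sketch below rolls a virtual fair die; the trial count and seed are my own choices:

```python
import random

random.seed(1)  # reproducible runs

def rolls_until_six():
    """Number of rolls of a fair die up to and including the first 6."""
    count = 0
    while True:
        count += 1
        if random.randint(1, 6) == 6:
            return count

# E(X) = 1/p = 6 for the geometric variable above.
n_trials = 200_000
avg_x = sum(rolls_until_six() for _ in range(n_trials)) / n_trials

# Y_3 = X_1 + X_2 + X_3 has the negative binomial distribution,
# so E(Y_3) = 3/p = 18.
avg_y3 = sum(rolls_until_six() + rolls_until_six() + rolls_until_six()
             for _ in range(n_trials)) / n_trials

print(avg_x, avg_y3)  # should be near 6 and 18
```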
Sometimes we’ll write f_{X,Y} and F_{X,Y} for the joint mass function and joint distribution function of X and Y.
Recall that we defined X and Y to be independent if the events {X = x}
and {Y = y} are independent for all x and y.
Lemma 0.2. The discrete random variables X and Y are independent if
and only if
fX,Y (x, y) = fX (x)fY (y) for all x, y ∈ R.
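Lemma 0.2 can be illustrated by brute force. The sketch below uses two hypothetical examples of my own: a pair of independent fair dice, whose joint mass function does factorise, and the dependent pair Y = X, whose joint mass function does not:

```python
from fractions import Fraction
from itertools import product

# X and Y are two independent fair dice: build the joint mass function
# by enumerating the 36 equally likely outcomes.
joint = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}

# Marginal mass functions recovered from the joint.
fx = {a: sum(p for (x, _), p in joint.items() if x == a) for a in range(1, 7)}
fy = {b: sum(p for (_, y), p in joint.items() if y == b) for b in range(1, 7)}

# Lemma 0.2: X and Y independent <=> the joint factorises into marginals.
factorises = all(joint[(a, b)] == fx[a] * fy[b]
                 for a, b in product(range(1, 7), repeat=2))

# For contrast, Y = X (one die read twice) is maximally dependent:
# the joint puts mass only on the diagonal and does not factorise.
dep = {(a, a): Fraction(1, 6) for a in range(1, 7)}
gx = {a: Fraction(1, 6) for a in range(1, 7)}
dep_fact = all(dep.get((a, b), Fraction(0)) == gx[a] * gx[b]
               for a, b in product(range(1, 7), repeat=2))

print(factorises, dep_fact)  # True False
```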
Definition 0.3. The correlation of X and Y is
$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}.$$
Note that the Cauchy–Schwarz inequality holds for random variables:
$$\big(E(XY)\big)^2 \le E(X^2)\,E(Y^2),$$
with equality if and only if P(aX = bY) = 1 for some real a and b, at least one of which is non-zero.
(Checking the statement about equality will be part of the homework.)
Now let’s prove Lemma 0.4, the bound ρ(X, Y)^2 ≤ 1, using the Cauchy–Schwarz inequality. Plugging in the definition of cov and var, this is equivalent to showing that
$$\big(E((X - EX)(Y - EY))\big)^2 \le E\big((X - EX)^2\big)\,E\big((Y - EY)^2\big).$$
Now note that X −EX and Y −EY are themselves random variables, so the
Cauchy-Schwarz inequality applies to them, proving exactly the statement
$$\big(E((X - EX)(Y - EY))\big)^2 \le E\big((X - EX)^2\big)\,E\big((Y - EY)^2\big).$$
Finally, by the Cauchy-Schwarz inequality, we get equality in the above
equation if and only if P(a(X − EX) = b(Y − EY )) = 1 for some real a and
b.
This implies P(aX = bY + c) = 1 for some constant c.
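A numerical sketch of the bound and its equality case, using sample versions of cov, var, and ρ on data I generate here (the sample sizes and coefficients are arbitrary choices, not from the notes):

```python
import random

random.seed(0)

def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    """Sample covariance (population-style normalisation)."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def corr(xs, ys):
    """Sample correlation rho = cov / sqrt(var * var)."""
    return cov(xs, ys) / (cov(xs, xs) * cov(ys, ys)) ** 0.5

# Noisy data: |rho| stays strictly below 1.
xs = [random.gauss(0, 1) for _ in range(1000)]
ys = [x + random.gauss(0, 1) for x in xs]
print(corr(xs, ys))

# A linear relation Z = 3X - 2 gives |rho| = 1 (up to rounding),
# matching the equality case P(aX = bY + c) = 1.
zs = [3 * x - 2 for x in xs]
print(corr(xs, zs))
```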
The conditional mass function of X given an event B with P(B) > 0 is
$$f(x \mid B) = P(X = x \mid B) = \frac{P(\{X = x\} \cap B)}{P(B)}.$$
5. Conditional Expectation
Why is this useful? We can play the conditioning trick. For two events B and B^c that partition the sample space,
$$E(X) = P(B)\,E(X \mid B) + P(B^c)\,E(X \mid B^c).$$
The result extends in the obvious way to any partition of the sample space into disjoint events B_1, B_2, . . . .
Lemma 0.6.
$$E(X) = \sum_i P(B_i)\,E(X \mid B_i).$$
The conditional distribution function of Y given the event {X = x} is
$$P(Y \le y \mid X = x) = \frac{P(\{Y \le y\} \cap \{X = x\})}{P(X = x)} = F_{Y|X}(y \mid x).$$
A conditional mass function is defined the same way:
$$P(Y = y \mid X = x) = \frac{P(\{Y = y\} \cap \{X = x\})}{P(X = x)} = f_{Y|X}(y \mid x).$$
The conditional expectation of Y, given the event {X = x}, is a function of
x. Since the events {X = x} form a partition of the sample space, Lemma
0.6 shows that
$$E(Y) = \sum_x P(X = x)\,E(Y \mid X = x).$$
Lemma 0.7. If X has mass function f and g : R → R, then
$$E(g(X)) = \sum_x g(x) f(x).$$
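A quick check of Lemma 0.7: the fair die and the choice g(x) = x² below are my own example, not the notes’. The point of the lemma is that ∑ g(x)f(x) agrees with the expectation computed from the distribution of g(X) itself:

```python
from fractions import Fraction

# Lemma 0.7 for a fair die and g(x) = x^2.
f = {x: Fraction(1, 6) for x in range(1, 7)}
g = lambda x: x * x

# Left side: sum over x of g(x) f(x), no distribution of g(X) needed.
lhs = sum(g(x) * p for x, p in f.items())

# Right side: build the mass function of Y = g(X), then compute E(Y).
fy = {}
for x, p in f.items():
    fy[g(x)] = fy.get(g(x), Fraction(0)) + p
rhs = sum(y * p for y, p in fy.items())

print(lhs, rhs)  # 91/6 91/6
```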
Proof that E(ψ(X)) = E(Y) for ψ(x) = E(Y | X = x): by Lemma 0.7 and the definition of f_{Y|X},
$$E(\psi(X)) = \sum_x \psi(x)\, f_X(x) = \sum_{x,y} y\, f_{Y|X}(y \mid x)\, f_X(x) = \sum_{x,y} y\, f_{X,Y}(x, y) = \sum_y y\, f_Y(y) = E(Y).$$
You and I meet a generous gambler who offers to pay us a dollar for each
head that appears in four tosses of a coin. Our winnings are a random
variable X. We agree that if there are fewer than two heads (event A), I pocket the winnings; otherwise you do:
There are 5 of the 16 equally likely outcomes in which I pocket the winnings: P(A) = 5/16.
My conditional expectation is E(X | A) = 0 × (1/5) + 1 × (4/5) = 4/5.
There are 11 of the 16 outcomes in which you pocket the winnings: P(A^c) = 11/16.
Your conditional expectation is E(X | A^c) = 2 × (6/11) + 3 × (4/11) + 4 × (1/11) = 28/11.
Lemma 0.6 guarantees that
$$P(A)\,E(X \mid A) + P(A^c)\,E(X \mid A^c) = E(X),$$
or
$$\frac{5}{16} \cdot \frac{4}{5} + \frac{11}{16} \cdot \frac{28}{11} = \frac{4}{16} + \frac{28}{16} = 2.$$
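Every number in this example can be recovered by brute-force enumeration of the 16 outcomes, as in this sketch:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 16 equally likely outcomes of four coin tosses;
# X = number of heads, A = {fewer than two heads}.
outcomes = [sum(t) for t in product((0, 1), repeat=4)]  # heads per outcome
a = [x for x in outcomes if x < 2]
ac = [x for x in outcomes if x >= 2]

p_a = Fraction(len(a), 16)            # P(A)
e_x_a = Fraction(sum(a), len(a))      # E(X | A)
p_ac = Fraction(len(ac), 16)          # P(A^c)
e_x_ac = Fraction(sum(ac), len(ac))   # E(X | A^c)

# Lemma 0.6: the pieces reassemble into E(X) = 2.
total = p_a * e_x_a + p_ac * e_x_ac
print(p_a, e_x_a, p_ac, e_x_ac, total)
```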
8. Two Wins Takes All
Here is an illustration of how, by using conditional expectation, you can
solve a problem that would otherwise force you to sum a messy series.
Harvard and Yale are playing a series of football overtimes, each of which
Harvard wins with probability p. As soon as one team wins two overtimes
in a row, the series ends. H is the event that Harvard wins the series,
Y = H^c is the event that Yale wins the series, and random variable X is
the number of overtimes in the series.
By conditioning on the outcome of the first overtime, find E(X).
We must also condition on the outcome of the second overtime!
If Harvard wins the first overtime, then either it wins the second overtime and the series ends after just two overtimes, or Yale wins and the series continues for one overtime longer than if Yale had just won the first overtime.
Thus
$$E(X \mid H_1) = P(H_2)\,E(X \mid H_1 \cap H_2) + P(Y_2)\,E(X \mid H_1 \cap Y_2),$$
and so
$$E(X \mid H_1) = 2p + q\,(1 + E(X \mid Y_1)).$$
If Yale wins the first overtime, then either it wins the second overtime and
the series ends after just two overtimes, or Harvard wins and the series
continues for one overtime longer than if Harvard had just won the first
overtime.
Thus
$$E(X \mid Y_1) = P(Y_2)\,E(X \mid Y_1 \cap Y_2) + P(H_2)\,E(X \mid Y_1 \cap H_2),$$
and so
$$E(X \mid Y_1) = 2q + p\,(1 + E(X \mid H_1)).$$
The rest of the problem is just algebra. Solve the two equations simultaneously to get
$$E(X \mid H_1) = \frac{2 + q^2}{1 - pq}$$
and
$$E(X \mid Y_1) = \frac{2 + p^2}{1 - pq}.$$
Then
$$E(X) = P(H_1)\,E(X \mid H_1) + P(Y_1)\,E(X \mid Y_1) = p \cdot \frac{2 + q^2}{1 - pq} + q \cdot \frac{2 + p^2}{1 - pq} = \frac{2 + pq}{1 - pq}.$$
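The closed forms are easy to sanity-check against a simulation. The value p = 0.6, the seed, and the trial count below are arbitrary choices of mine:

```python
import random

random.seed(2)
p = 0.6          # probability Harvard wins an overtime (example value)
q = 1 - p

# Closed forms derived above.
e_h1 = (2 + q * q) / (1 - p * q)
e_y1 = (2 + p * p) / (1 - p * q)
e_x = p * e_h1 + q * e_y1            # should equal (2 + pq) / (1 - pq)

def series_length():
    """Play overtimes until one team wins two in a row; return the count."""
    last = None
    n = 0
    while True:
        n += 1
        winner = 'H' if random.random() < p else 'Y'
        if winner == last:
            return n
        last = winner

n_trials = 200_000
sim = sum(series_length() for _ in range(n_trials)) / n_trials
print(e_x, sim)  # the simulated mean should be close to the closed form
```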