MATHEMATICS 154, SPRING 2009

PROBABILITY THEORY
Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation)

Last modified: March 7, 2009


Reference:
PRP, Sections 3.6 and 3.7.

1. Tail-Sum Theorem (PRP, page 84, problem 13)

Random variable X has non-negative integer values. The “tail sum” is


∑_{x=0}^{∞} P(X > x)

The quantity P(X > x) is itself a sum, which makes the tail sum into

∑_{x=0}^{∞} ∑_{r=x+1}^{∞} f_X(r)

First we sum over r for each x, then over x, getting a set of pairs that looks
like this:
r = 4    *  *  *  *
r = 3    *  *  *
r = 2    *  *
r = 1    *
r = 0
x =      0  1  2  3 ...
But we could also do the same sum by first summing over x for each r, then
summing over r. The sum becomes

∑_{r=1}^{∞} ∑_{x=0}^{r−1} f_X(r)


Since f_X(r) does not depend on x, the inner sum contributes r copies of f_X(r), giving

∑_{r=1}^{∞} r f_X(r) = E(X).

We have proved the tail-sum theorem:


E(X) = ∑_{x=0}^{∞} P(X > x)
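As a quick numerical sanity check (a sketch of mine, not part of the outline), the snippet below compares E(X) with the tail sum for a Binomial(10, 0.3) random variable; the choice of distribution is arbitrary.

    # Numerical check of the tail-sum theorem for X ~ Binomial(10, 0.3).
    from math import comb

    n, p = 10, 0.3
    f = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]  # mass function

    expectation = sum(k * f[k] for k in range(n + 1))          # E(X) = sum_k k f(k)
    tail_sum = sum(sum(f[r] for r in range(x + 1, n + 1))      # sum_x P(X > x)
                   for x in range(n + 1))

    print(expectation, tail_sum)   # both are n*p = 3 (up to floating-point rounding)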

2. Calculate the expectation for the geometric distribution by using the tail-sum theorem, then find the expectation for the negative binomial distribution by treating it as a sum of geometric random variables.
Use the “until” version: X is the total number of times you roll a die if
you stop when you first roll a 6. If q is the probability of not rolling a 6,
P(X > x) = q^x. (Let p be the probability of rolling a 6, so p + q = 1.)
By the tail sum theorem,

E(X) = ∑_{x=0}^{∞} P(X > x) = ∑_{x=0}^{∞} q^x = 1/(1 − q) = 1/p.

Let Y_n be the total number of times you roll a die if you stop when you roll a 6 for the nth time. Then Y_n = X_1 + X_2 + ⋯ + X_n, where each X_i is a geometric random variable distributed like X above. (That is, Y_n has the negative binomial distribution.)

By linearity of expectation,

E(Y_n) = n E(X) = n/p.
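Here is a small simulation sketch (not from the outline) of the die-rolling setup: it estimates E(X) and E(Y_n) by Monte Carlo and compares them with 1/p and n/p. The sample size, seed, and n = 3 are arbitrary choices.

    import random

    def rolls_until_six(rng):
        """Number of rolls of a fair die up to and including the first 6."""
        count = 0
        while True:
            count += 1
            if rng.randint(1, 6) == 6:
                return count

    rng = random.Random(0)
    p, n, trials = 1 / 6, 3, 200_000

    x_mean = sum(rolls_until_six(rng) for _ in range(trials)) / trials
    y_mean = sum(sum(rolls_until_six(rng) for _ in range(n))   # Y_n = X_1 + ... + X_n
                 for _ in range(trials)) / trials

    print(x_mean, 1 / p)   # both roughly 6
    print(y_mean, n / p)   # both roughly 18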

3. We want to study further the relationship between two random variables.


Definition 0.1. The joint mass function of two discrete random variables X and Y is the function f : R^2 → [0, 1] defined by

f(x, y) = P(X = x and Y = y).

Their joint distribution function is the function F : R^2 → [0, 1] defined by

F(x, y) = P(X ≤ x and Y ≤ y).

Sometimes we’ll write f_{X,Y} and F_{X,Y} for the functions above.
Recall that we defined X and Y to be independent if the events {X = x}
and {Y = y} are independent for all x and y.
Lemma 0.2. The discrete random variables X and Y are independent if
and only if
f_{X,Y}(x, y) = f_X(x) f_Y(y) for all x, y ∈ R.

(This will be on the homework.)
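For a concrete illustration (my addition, not in the outline), two independent fair dice satisfy the factorization in Lemma 0.2; the brute-force check below assumes the joint mass function puts weight 1/36 on every pair.

    from fractions import Fraction

    # Joint mass function of two independent fair dice: f(x, y) = 1/36.
    f_joint = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

    # Marginals obtained by summing out the other variable.
    f_X = {x: sum(f_joint[x, y] for y in range(1, 7)) for x in range(1, 7)}
    f_Y = {y: sum(f_joint[x, y] for x in range(1, 7)) for y in range(1, 7)}

    # Independence <=> joint = product of marginals at every point.
    print(all(f_joint[x, y] == f_X[x] * f_Y[y]
              for x in range(1, 7) for y in range(1, 7)))   # True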

Recall that we defined X and Y to be uncorrelated if E(XY ) = E(X)E(Y ).


And we proved that if X and Y are independent, then they are uncorrelated.
When X and Y are not uncorrelated, we want some measure of just how
correlated they are.

Definition 0.3. The covariance of X and Y is

cov(X, Y ) = E[(X − EX)(Y − EY )].

And we define the correlation (coefficient) of X and Y to be

ρ(X, Y) = cov(X, Y) / √(var(X) var(Y)).

Note that

cov(X, Y ) = E[(X − EX)(Y − EY )] = E(XY ) − (EX)(EY ).

So cov(X, Y ) = 0 if and only if X and Y are uncorrelated.


When X = Y, cov(X, X) = E(X^2) − (EX)^2 = var(X). So covariance generalizes the notion of variance.
Why do we bother defining the correlation coefficient? It is because we’d
like to measure how correlated two random variables are on a fixed scale
(with fixed maximum and minimum values).
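To see these definitions in action, here is a sketch (not part of the outline) that computes cov(X, Y) and ρ(X, Y) directly from a joint mass function; the example pair, X a fair die and Y = X^2, is an illustrative assumption. The resulting ρ lands strictly between −1 and 1, which the next lemma makes precise.

    from fractions import Fraction
    from math import sqrt

    # X is a fair die roll and Y = X**2; the joint law puts mass 1/6 on each pair (x, x**2).
    pairs = {(x, x * x): Fraction(1, 6) for x in range(1, 7)}

    E_X  = sum(x * p for (x, _), p in pairs.items())
    E_Y  = sum(y * p for (_, y), p in pairs.items())
    E_XY = sum(x * y * p for (x, y), p in pairs.items())
    var_X = sum(x * x * p for (x, _), p in pairs.items()) - E_X**2
    var_Y = sum(y * y * p for (_, y), p in pairs.items()) - E_Y**2

    cov = E_XY - E_X * E_Y
    rho = float(cov) / sqrt(float(var_X) * float(var_Y))
    print(cov, rho)   # rho is close to, but below, 1 (Y is a nonlinear function of X)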
Lemma 0.4. The correlation coefficient ρ satisfies |ρ(X, Y)| ≤ 1, with equality if and only if P(aX + bY = c) = 1 for some a, b, c ∈ R with a and b not both zero.

We’ll prove Lemma 0.4 using the Cauchy-Schwarz inequality.


Theorem 0.5. (Cauchy-Schwarz Inequality) For random variables X, Y,

(E(XY))^2 ≤ E(X^2) E(Y^2),

with equality if and only if P(aX = bY) = 1 for some real a and b, at least one of which is non-zero.

Proof. For a, b ∈ R, let Z = aX − bY. Then

0 ≤ E(Z^2) = a^2 E(X^2) − 2ab E(XY) + b^2 E(Y^2).

We view a^2 E(X^2) − 2ab E(XY) + b^2 E(Y^2) as a quadratic polynomial in a. Since it is always ≥ 0, this polynomial has at most one real root. (Why?) Therefore its discriminant must be non-positive, i.e. if b ≠ 0,

(E(XY))^2 − E(X^2) E(Y^2) ≤ 0.

(Checking the statement about equality will be part of the homework.)
Now let’s prove Lemma 0.4 using the Cauchy-Schwarz inequality.

Proof. We want to show that |ρ(X, Y )| ≤ 1.

Plugging in the definition of ρ, this is equivalent to showing that

(cov(X, Y))^2 ≤ var(X) var(Y).

Plugging in the definitions of cov and var, this is equivalent to showing that

(E((X − EX)(Y − EY)))^2 ≤ E((X − EX)^2) E((Y − EY)^2).

Now note that X − EX and Y − EY are themselves random variables, so the Cauchy-Schwarz inequality applies to them and gives exactly this inequality.
Finally, by the Cauchy-Schwarz inequality, we get equality above if and only if P(a(X − EX) = b(Y − EY)) = 1 for some real a and b.
This implies P(aX = bY + c) = 1 for some constant c.

4. Conditional Random Variable


A random variable is still a random variable if you condition on some event.
Simple examples:

• The roll on a die, conditioned on the number being odd.


• The number of heads when you toss a coin 4 times, conditioned on
getting at least 3 heads.

The conditional mass function is

f(x|B) = P(X = x | B) = P({X = x} ∩ B) / P(B)

It is easy to show that this is a mass function (it sums to 1).


Note that f_{Y|X} = f_{X,Y}/f_X.
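As an illustration of the first simple example above (a sketch, not in the outline), the conditional mass function of a fair die given the event B = {roll is odd} can be tabulated and checked to sum to 1.

    from fractions import Fraction

    f = {x: Fraction(1, 6) for x in range(1, 7)}   # mass function of a fair die
    B = {1, 3, 5}                                  # conditioning event: roll is odd
    P_B = sum(f[x] for x in B)

    # f(x | B) = P({X = x} ∩ B) / P(B); zero off B.
    f_given_B = {x: (f[x] / P_B if x in B else Fraction(0)) for x in f}

    print(f_given_B)                    # each odd value gets mass 1/3
    print(sum(f_given_B.values()))      # 1 -- it is a genuine mass function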

5. Conditional Expectation

Since f(x|B) is a mass function, it has an expectation:

E(X|B) = ∑_x x f(x|B)

Why is this useful? We can play the conditioning trick. For two events B and B^c that partition the sample space,

E(X) = P(B) E(X|B) + P(B^c) E(X|B^c).

The result extends in the obvious way to any partition of the sample space into disjoint events B_1, B_2, ....

Lemma 0.6.

E(X) = ∑_i P(B_i) E(X|B_i)
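A quick check of Lemma 0.6 (my addition, not in the outline): partition the fair-die sample space into odd and even outcomes and recompute E(X).

    from fractions import Fraction

    f = {x: Fraction(1, 6) for x in range(1, 7)}      # fair die
    partition = [{1, 3, 5}, {2, 4, 6}]                # B_1 = odd, B_2 = even

    def cond_expectation(event):
        """E(X | B) computed from the conditional mass function."""
        p_event = sum(f[x] for x in event)
        return sum(x * f[x] / p_event for x in event)

    total = sum(sum(f[x] for x in B) * cond_expectation(B) for B in partition)
    print(total, sum(x * f[x] for x in f))   # both are 7/2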

6. For two random variables X and Y, define the conditional distribution function, conditional mass function, and conditional expectation of Y given X = x. Prove that E(E(Y|X)) = E(Y).
This is a special case of a conditional random variable.
Since {X = x} is an event, we can condition on it as long as its probability is
positive. The resulting conditional distribution function for Y is a function
both of y and of x.

P(Y ≤ y | X = x) = P({Y ≤ y} ∩ {X = x}) / P(X = x) = F_{Y|X}(y|x).
A conditional mass function is defined the same way:

P(Y = y | X = x) = P({Y = y} ∩ {X = x}) / P(X = x) = f_{Y|X}(y|x).
The conditional expectation of Y, given the event {X = x}, is a function of
x. Since the events {X = x} form a partition of the sample space, Lemma
0.6 shows that
E(Y) = ∑_x P(X = x) E(Y | X = x).

Now recall a lemma from Section 3.3 of the book.

Lemma 0.7. If X has mass function f and g : R → R, then
E(g(X)) = ∑_x g(x) f(x)

(whenever the sum is absolutely convergent).


Definition 0.8. Let ψ(x) = E(Y | X = x). We call ψ(X) the conditional expectation of Y given X, written as E(Y|X).

Note: although conditional expectation sounds like a number, this is actually a RANDOM VARIABLE.

Theorem 0.9. E(ψ(X)) = E(Y).

Proof.

E(ψ(X)) = ∑_x ψ(x) f_X(x)
        = ∑_{x,y} y f_{Y|X}(y|x) f_X(x)
        = ∑_{x,y} y f_{X,Y}(x, y)
        = ∑_y y f_Y(y)
        = E(Y).
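Here is a numerical check of Theorem 0.9 (not part of the outline); the small joint mass function on {0, 1} × {0, 1, 2} is an arbitrary assumption made for the illustration.

    from fractions import Fraction

    # An arbitrary joint mass function on {0, 1} x {0, 1, 2} (weights sum to 1).
    f_XY = {(0, 0): Fraction(1, 6), (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 12),
            (1, 0): Fraction(1, 12), (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4)}

    f_X = {x: sum(p for (x2, _), p in f_XY.items() if x2 == x) for x in (0, 1)}

    def psi(x):
        """psi(x) = E(Y | X = x) = sum_y y f_{Y|X}(y|x)."""
        return sum(y * p / f_X[x] for (x2, y), p in f_XY.items() if x2 == x)

    E_psi_X = sum(psi(x) * f_X[x] for x in (0, 1))          # E(E(Y | X))
    E_Y = sum(y * p for (_, y), p in f_XY.items())          # E(Y) directly
    print(E_psi_X, E_Y)                                     # both are 13/12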

7. Conditional Expectation Example

You and I meet a generous gambler who offers to pay us a dollar for each
head that appears in four tosses of a coin. Our winnings are a random
variable X. We agree that if there are fewer than two heads (event A), I pocket the winnings; otherwise you do.
There are 5 of 16 ways for me to pocket the winnings: P(A) = 5/16.
My conditional expectation is E(X|A) = 0 × 1/5 + 1 × 4/5 = 4/5.
There are 11 of 16 ways for you to pocket the winnings: P(A^c) = 11/16.
Your conditional expectation is E(X|A^c) = 2 × 6/11 + 3 × 4/11 + 4 × 1/11 = 28/11.
Lemma 0.6 guarantees that

P(A) E(X|A) + P(A^c) E(X|A^c) = E(X),

or

(5/16)(4/5) + (11/16)(28/11) = 2.
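The arithmetic above can be double-checked by enumerating all 16 equally likely toss sequences (a sketch of mine, not from the outline).

    from fractions import Fraction
    from itertools import product

    outcomes = list(product("HT", repeat=4))              # 16 equally likely sequences
    X = {seq: seq.count("H") for seq in outcomes}         # winnings = number of heads
    A = [seq for seq in outcomes if X[seq] < 2]           # I win: fewer than two heads
    Ac = [seq for seq in outcomes if X[seq] >= 2]         # you win

    P_A = Fraction(len(A), 16)
    E_X_given_A  = Fraction(sum(X[s] for s in A), len(A))
    E_X_given_Ac = Fraction(sum(X[s] for s in Ac), len(Ac))

    print(P_A, E_X_given_A, E_X_given_Ac)                 # 5/16, 4/5, 28/11
    print(P_A * E_X_given_A + (1 - P_A) * E_X_given_Ac)   # 2 = E(X)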

8. Two Wins Takes All
Here is an illustration of how, by using conditional expectation, you can
solve a problem that would otherwise force you to sum a messy series.
Harvard and Yale are playing a series of football overtimes, each of which
Harvard wins with probability p. As soon as one team wins two overtimes
in a row, the series ends. H is the event that Harvard wins the series,
Y = H c is the event that Yale wins the series, and random variable X is
the number of overtimes in the series.
By conditioning on the outcome of the first overtime, find E(X). (To evaluate the resulting conditional expectations, we must also condition on the outcome of the second overtime!)
If Harvard wins the first overtime, then either it wins the second overtime and the series ends after just two overtimes, or Yale wins and the series continues for one overtime longer than if Yale had just won the first overtime.
Thus E(X|H_1) = P(H_2) E(X|H_1 ∩ H_2) + P(Y_2) E(X|H_1 ∩ Y_2), and so

E(X|H_1) = 2p + q(1 + E(X|Y_1)).
If Yale wins the first overtime, then either it wins the second overtime and
the series ends after just two overtimes, or Harvard wins and the series
continues for one overtime longer than if Harvard had just won the first
overtime.
Thus E(X|Y_1) = P(Y_2) E(X|Y_1 ∩ Y_2) + P(H_2) E(X|Y_1 ∩ H_2), and so

E(X|Y_1) = 2q + p(1 + E(X|H_1)).
The rest of the problem is just algebra. Solve to get
E(X|H_1) = (2 + q^2)/(1 − pq)

and

E(X|Y_1) = (2 + p^2)/(1 − pq).
Then
E(X) = P(H_1) E(X|H_1) + P(Y_1) E(X|Y_1),

which works out to


E(X) = p (2 + q^2)/(1 − pq) + q (2 + p^2)/(1 − pq) = (2 + pq)/(1 − pq).
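As a final check (not part of the outline), the sketch below evaluates the closed-form answers for an arbitrary choice p = 0.6 and compares them with a direct simulation of the series; the seed and trial count are also arbitrary.

    import random

    p = 0.6
    q = 1 - p

    # Solve E(X|H1) = 2p + q(1 + E(X|Y1)) and E(X|Y1) = 2q + p(1 + E(X|H1)).
    a = (2 + q * q) / (1 - p * q)            # E(X | H1)
    b = (2 + p * p) / (1 - p * q)            # E(X | Y1)
    formula = p * a + q * b                  # should equal (2 + p*q)/(1 - p*q)

    def series_length(rng):
        """Play overtimes until one team wins two in a row; return the count."""
        last, run, n = None, 0, 0
        while run < 2:
            winner = "H" if rng.random() < p else "Y"
            n += 1
            run = run + 1 if winner == last else 1
            last = winner
        return n

    rng = random.Random(1)
    simulated = sum(series_length(rng) for _ in range(100_000)) / 100_000
    print(formula, (2 + p * q) / (1 - p * q), simulated)   # all roughly equal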
