
Pairs of Random Variables

• In this chapter, we consider experiments that produce a collection of random variables, X1, X2, . . . , Xn, where n can be any positive integer.

• For most of this chapter, we study n = 2 random variables: X and Y . A pair of random variables is enough to show the important concepts and useful problem-solving techniques. Moreover, the definitions and theorems we introduce for X and Y generalize to n random variables. These generalized definitions appear near the end of this chapter in Section 5.10.

• We also note that a pair of random variables X and Y is the same as the two-dimensional vector [X Y ]'. Similarly, the random variables X1, . . . , Xn can be written as the n-dimensional vector X = [X1 · · · Xn]'. Since the components of X are random variables, X is called a random vector. Thus this chapter begins our study of random vectors.

• This subject is continued in Chapter 8, which uses techniques of linear algebra to develop further the properties of random vectors.
Pairs of Random Variables: Definitions
• We begin here with the definition of FX,Y (x, y), the joint cumulative distribution
function of two random variables, a generalization of the CDF introduced in Sec-
tion 3.4 and again in Section 4.2.

• The joint CDF is a complete probability model for any experiment that produces two random variables. However, it is not very useful for analyzing practical experiments.

• More useful models are PX,Y (x, y), the joint probability mass function for two dis-
crete random variables, presented in Sections 5.2 and 5.3, and fX,Y (x, y), the joint
probability density function of two continuous random variables, presented in Sec-
tions 5.4 and 5.5.

• Section 5.7 considers functions of two random variables and their expected values.

• We extend the definition of independent events to define independent random variables.

• The subject of Section 5.9 is the special case in which X and Y are Gaussian.
Example 5.1

We would like to measure random variable X, but we instead observe

Y = X + Z. (5.1)
The noise Z prevents us from perfectly observing X. In some settings, Z
is an interfering signal. In the simplest setting, Z is just noise inside the
circuitry of your measurement device that is unrelated to X. In this case,
it is appropriate to assume that the signal and noise are independent;
that is, the events X = x and Z = z are independent. This simple model
produces three random variables, X, Y and Z, but any pair completely
specifies the remaining random variable. Thus we will see that a proba-
bility model for the pair (X, Z) or for the pair (X, Y ) will be sufficient to
analyze experiments related to this system.
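
The model is easy to simulate. A minimal Matlab sketch (not from the text; it assumes, purely for illustration, that X and Z are independent Gaussian (0, 1) random variables):

m = 10000;                 % number of independent trials
x = randn(m,1);            % signal samples X (assumed Gaussian (0,1))
z = randn(m,1);            % noise samples Z, generated independently of X
y = x + z;                 % noisy observations Y = X + Z as in Equation (5.1)
corrcoef(x,z)              % off-diagonal entries near 0: X and Z are independent

Each pair (x(i), y(i)) is then one sample of the pair (X, Y ) discussed below.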
Section 5.1

Joint Cumulative Distribution


Function
Figure 5.1

[Figure: the shaded region {X ≤ x, Y ≤ y}, extending down and to the left of the point (x, y)]

The area of the (X, Y ) plane corresponding to the joint cumulative dis-
tribution function FX,Y (x, y).
Definition 5.1   Joint Cumulative Distribution Function (CDF)

The joint cumulative distribution function of random variables X and Y is

FX,Y (x, y) = P [X ≤ x, Y ≤ y] .
Theorem 5.1

For any pair of random variables X, Y ,

(a) 0 ≤ FX,Y (x, y) ≤ 1,
(b) FX,Y (∞, ∞) = 1,
(c) FX (x) = FX,Y (x, ∞),
(d) FY (y) = FX,Y (∞, y),
(e) FX,Y (x, −∞) = 0,
(f) FX,Y (−∞, y) = 0,
(g) If x ≤ x1 and y ≤ y1, then FX,Y (x, y) ≤ FX,Y (x1, y1).
Example 5.2 Problem

X years is the age of children entering first grade in a school. Y years is


the age of children entering second grade. The joint CDF of X and Y is

FX,Y (x, y) =   0                 x < 5,
                0                 y < 6,
                (x − 5)(y − 6)    5 ≤ x < 6, 6 ≤ y < 7,
                y − 6             x ≥ 6, 6 ≤ y < 7,              (5.3)
                x − 5             5 ≤ x < 6, y ≥ 7,
                1                 otherwise.
Find FX(x) and FY (y).
Example 5.2 Solution

Using Theorem 5.1(c) and Theorem 5.1(d), we find

FX (x) =   0        x < 5,              FY (y) =   0        y < 6,
           x − 5    5 ≤ x < 6,                     y − 6     6 ≤ y < 7,        (5.4)
           1        x ≥ 6,                         1         y ≥ 7.
Referring to Theorem 4.6, we see from Equation (5.4) that X is a contin-
uous uniform (5, 6) random variable and Y is a continuous uniform (6, 7)
random variable.
Theorem 5.2
P [x1 < X ≤ x2, y1 < Y ≤ y2] = FX,Y (x2, y2) − FX,Y (x2, y1)
− FX,Y (x1, y2) + FX,Y (x1, y1) .
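
As an illustrative check (a sketch, not part of the text), we can code the joint CDF of Example 5.2 as an anonymous function and apply Theorem 5.2 to the rectangle 5.5 < X ≤ 6, 6.5 < Y ≤ 7:

% FX,Y(x,y) of Equation (5.3), written case by case
F = @(x,y) (x>=5 & x<6 & y>=6 & y<7).*(x-5).*(y-6) ...
         + (x>=6         & y>=6 & y<7).*(y-6) ...
         + (x>=5 & x<6   & y>=7).*(x-5) ...
         + (x>=6         & y>=7);
x1 = 5.5; x2 = 6; y1 = 6.5; y2 = 7;
p = F(x2,y2) - F(x2,y1) - F(x1,y2) + F(x1,y1)   % = 0.25

The result 0.25 agrees with treating X as uniform (5, 6) and Y as uniform (6, 7).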
Quiz 5.1

Express the following extreme values of the joint CDF FX,Y (x, y) as num-
bers or in terms of the CDFs FX(x) and FY (y).

(a) FX,Y (−∞, 2)

(b) FX,Y (∞, ∞)

(c) FX,Y (∞, y)

(d) FX,Y (∞, −∞)


Quiz 5.1 Solution

Each value of the joint CDF can be found by considering the correspond-
ing probability.
(a) FX,Y (−∞, 2) = P[X ≤ −∞, Y ≤ 2] ≤ P[X ≤ −∞] = 0 since X cannot
take on the value −∞.
(b) FX,Y (∞, ∞) = P[X ≤ ∞, Y ≤ ∞] = 1.
This result is given in Theorem 5.1.
(c) FX,Y (∞, y) = P[X ≤ ∞, Y ≤ y] = P[Y ≤ y] = FY (y).
(d) FX,Y (∞, −∞) = P[X ≤ ∞, Y ≤ −∞] = P[Y ≤ −∞] = 0 since Y cannot
take on the value −∞.
Section 5.2

Joint Probability Mass Function


Definition 5.2   Joint Probability Mass Function (PMF)

The joint probability mass function of discrete random variables X and Y is

PX,Y (x, y) = P [X = x, Y = y] .
Example 5.3 Problem

Test two integrated circuits one after the other. On each test, the
possible outcomes are a (accept) and r (reject). Assume that all circuits
are acceptable with probability 0.9 and that the outcomes of successive
tests are independent. Count the number of acceptable circuits X and
count the number of successful tests Y before you observe the first reject.
(If both tests are successful, let Y = 2.) Draw a tree diagram for the
experiment and find the joint PMF PX,Y (x, y).
Example 5.3 Solution

[Tree diagram: the first test is a with probability 0.9 or r with probability 0.1; the second test likewise. The four leaves are aa (X = 2, Y = 2), ar (X = 1, Y = 1), ra (X = 1, Y = 0), and rr (X = 0, Y = 0).]

The experiment has the tree diagram shown above. The sample space of the experiment is

S = {aa, ar, ra, rr} .                                        (5.7)

Observing the tree diagram, we compute

P [aa] = 0.81,    P [ar] = 0.09,                              (5.8)
P [ra] = 0.09,    P [rr] = 0.01.                              (5.9)
Each outcome specifies a pair of values X and Y . Let g(s) be the function
that transforms each outcome s in the sample space S into the pair of
random variables (X, Y ). Then

g(aa) = (2, 2), g(ar) = (1, 1), g(ra) = (1, 0), g(rr) = (0, 0).
(5.10)
[Continued]
Example 5.3 Solution (Continued 2)

For each pair of values x, y, PX,Y (x, y) is the sum of the probabilities of the
outcomes for which X = x and Y = y. For example, PX,Y (1, 1) = P[ar].

PX,Y (x, y)   y = 0   y = 1   y = 2
x = 0         0.01    0       0
x = 1         0.09    0.09    0
x = 2         0       0       0.81

The joint PMF can be represented by the table above, or, as shown below, as a set of labeled points in the x, y plane where each point is a possible value of the pair (x, y), or as a simple list:

[Plot: the points (0, 0), (1, 0), (1, 1), and (2, 2) in the x, y plane, labeled with the probabilities 0.01, 0.09, 0.09, and 0.81.]

                 0.81   x = 2, y = 2,
                 0.09   x = 1, y = 1,
PX,Y (x, y) =    0.09   x = 1, y = 0,
                 0.01   x = 0, y = 0,
                 0      otherwise.
Theorem 5.3

For discrete random variables X and Y and any set B in the X, Y plane,
the probability of the event {(X, Y ) ∈ B} is
P [B] = ∑_{(x,y)∈B} PX,Y (x, y) .
Example 5.4 Problem

Continuing Example 5.3, find the probability of the event B that X,


the number of acceptable circuits, equals Y , the number of tests before
observing the first failure.
Example 5.4 Solution

Mathematically, B is the event {X = Y }. The elements of B with nonzero


probability are

B ∩ SX,Y = {(0, 0), (1, 1), (2, 2)} . (5.12)


Therefore,

P [B ] = PX,Y (0, 0) + PX,Y (1, 1) + PX,Y (2, 2)


= 0.01 + 0.09 + 0.81 = 0.91. (5.13)
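
The same sum is a one-liner in Matlab (an illustrative sketch anticipating the grid approach of Section 5.11):

PXY = [0.01 0 0; 0.09 0.09 0; 0 0 0.81];   % rows x = 0,1,2; columns y = 0,1,2
PB = sum(diag(PXY))                         % P[X = Y] = 0.91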
Figure 5.2

[Two plots: the regions B = {X + Y ≤ 3} and B = {X² + Y² ≤ 9} in the (X, Y ) plane.]

Subsets B of the (X, Y ) plane. Points (X, Y ) ∈ SX,Y are marked by bullets.
Quiz 5.2

The joint PMF PQ,G(q, g) for random variables Q and G is given in the
following table:
PQ,G (q, g ) g = 0 g = 1 g = 2 g = 3
q=0 0.06 0.18 0.24 0.12
q=1 0.04 0.12 0.16 0.08
Calculate the following probabilities:
(a) P[Q = 0]
(b) P[Q = G]
(c) P[G > 1]
(d) P[G > Q]
Quiz 5.2 Solution
From the joint PMF of Q and G given in the table, we can calculate the requested
probabilities by summing the PMF over those values of Q and G that correspond to the
event.
(a) The probability that Q = 0 is
P [Q = 0] = PQ,G (0, 0) + PQ,G (0, 1) + PQ,G (0, 2) + PQ,G (0, 3)
= 0.06 + 0.18 + 0.24 + 0.12 = 0.6. (1)
(b) The probability that Q = G is
P [Q = G] = PQ,G (0, 0) + PQ,G (1, 1) = 0.18. (2)
(c) The probability that G > 1 is

P [G > 1] = ∑_{g=2}^{3} ∑_{q=0}^{1} PQ,G (q, g)
          = 0.24 + 0.16 + 0.12 + 0.08 = 0.6.                  (3)

(d) The probability that G > Q is

P [G > Q] = ∑_{q=0}^{1} ∑_{g=q+1}^{3} PQ,G (q, g)
          = 0.18 + 0.24 + 0.12 + 0.16 + 0.08 = 0.78.          (4)
Section 5.3

Marginal PMF
Theorem 5.4

For discrete random variables X and Y with joint PMF PX,Y (x, y),
PX (x) = ∑_{y∈SY} PX,Y (x, y) ,        PY (y) = ∑_{x∈SX} PX,Y (x, y) .
Example 5.5 Problem
In Example 5.3, we found that random variables X and Y have the joint PMF shown in this table. Find the marginal PMFs for the random variables X and Y .

PX,Y (x, y)   y = 0   y = 1   y = 2
x = 0         0.01    0       0
x = 1         0.09    0.09    0
x = 2         0       0       0.81
Example 5.5 Solution
We note that both X and Y have range {0, 1, 2}. Theorem 5.4 gives
PX (0) = ∑_{y=0}^{2} PX,Y (0, y) = 0.01,        PX (1) = ∑_{y=0}^{2} PX,Y (1, y) = 0.18,        (5.14)

PX (2) = ∑_{y=0}^{2} PX,Y (2, y) = 0.81,        PX (x) = 0   for x ≠ 0, 1, 2.                   (5.15)

Referring to the table representation of PX,Y (x, y), we observe that each value of PX (x) is the result of adding all the entries in one row of the table. Similarly, the formula for the PMF of Y in Theorem 5.4, PY (y) = ∑_{x∈SX} PX,Y (x, y), is the sum of all the entries in one column of the table. [Continued]
Example 5.5 Solution (Continued 2)
We display PX(x) and PY (y) by rewriting the table and placing the row sums and column
sums in the margins.
PX,Y (x, y) y=0 y=1 y=2 PX (x)
x=0 0.01 0 0 0.01
x=1 0.09 0.09 0 0.18
x=2 0 0 0.81 0.81
PY (y) 0.10 0.09 0.81
Thus the column in the right margin shows PX(x) and the row in the bottom margin
shows PY (y). Note that the sum of all the entries in the bottom margin is 1 and
so is the sum of all the entries in the right margin. This is simply a verification of
Theorem 3.1(b), which states that the PMF of any random variable must sum to 1.
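
The row-sum/column-sum recipe is equally direct in Matlab (an illustrative sketch in the style of Section 5.11):

PXY = [0.01 0 0; 0.09 0.09 0; 0 0 0.81];   % rows x = 0,1,2; columns y = 0,1,2
PX = sum(PXY,2)'                            % row sums:    [0.01 0.18 0.81]
PY = sum(PXY,1)                             % column sums: [0.10 0.09 0.81]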
Quiz 5.3

The probability mass function PH,B(h, b) for the two random variables H
and B is given in the following table. Find the marginal PMFs PH(h) and
PB(b).
PH,B (h, b)   b = 0   b = 2   b = 4
h = −1        0       0.4     0.2
h = 0         0.1     0       0.1            (5.16)
h = 1         0.1     0.1     0
Quiz 5.3 Solution
By Theorem 5.4, the marginal PMF of H is
PH (h) = ∑_{b=0,2,4} PH,B (h, b) .                            (1)

For each value of h, this corresponds to calculating the row sum across the table of the
joint PMF. Similarly, the marginal PMF of B is
PB (b) = ∑_{h=−1}^{1} PH,B (h, b) .                           (2)

For each value of b, this corresponds to the column sum down the table of the joint
PMF. The easiest way to calculate these marginal PMFs is to simply sum each row and
column:
PH,B(h,b) b=0 b=2 b=4 PH(h)
h = −1 0 0.4 0.2 0.6
h=0 0.1 0 0.1 0.2
h=1 0.1 0.1 0 0.2
PB(b) 0.2 0.5 0.3
Section 5.4

Joint Probability Density Function


Definition 5.3   Joint Probability Density Function (PDF)

The joint PDF of the continuous random variables X and Y is a function fX,Y (x, y) with the property

FX,Y (x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y (u, v) dv du.
Theorem 5.5
fX,Y (x, y) = ∂²FX,Y (x, y) / ∂x ∂y
Example 5.6 Problem

Use the joint CDF for children's ages X and Y given in Example 5.2 to derive the joint PDF presented in Equation (5.5).
Example 5.6 Solution

Referring to Equation (5.3) for the joint CDF FX,Y (x, y), we must eval-
uate the partial derivative ∂ 2FX,Y (x, y)/∂x ∂y for each of the six regions
specified in Equation (5.3). However, ∂ 2FX,Y (x, y)/∂x ∂y is nonzero only
if FX,Y (x, y) is a function of both x and y. In this example, only the region
{5 ≤ x < 6, 6 ≤ y < 7} meets this requirement. Over this region,

fX,Y (x, y) = ∂²/∂x ∂y [(x − 5)(y − 6)] = (∂/∂x [x − 5]) (∂/∂y [y − 6]) = 1.        (5.18)
Over all other regions, the joint PDF fX,Y (x, y) is zero.
Theorem 5.6

A joint PDF fX,Y (x, y) has the following properties, corresponding to the first and second axioms of probability (see Section 1.2):

(a) fX,Y (x, y) ≥ 0 for all (x, y),
(b) ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y (x, y) dx dy = 1.
Theorem 5.7

The probability that the continuous random variables (X, Y ) are in A is


P [A] = ∫∫_A fX,Y (x, y) dx dy.
Example 5.7 Problem

Random variables X and Y have joint PDF



fX,Y (x, y) =   c   0 ≤ x ≤ 5, 0 ≤ y ≤ 3,                    (5.19)
                0   otherwise.
Find the constant c and P[A] = P[2 ≤ X < 3, 1 ≤ Y < 3].
Example 5.7 Solution

The large rectangle in the diagram is the area of nonzero probability.


Theorem 5.6 states that the integral of the joint PDF over this rectangle
is 1:
1 = ∫_0^5 ∫_0^3 c dy dx = 15c.                                (5.20)

[Diagram: the 5 × 3 rectangle of nonzero probability in the X, Y plane, with the small dark rectangle A = {2 ≤ X < 3, 1 ≤ Y < 3} inside it.]

Therefore, c = 1/15. The small dark rectangle in the diagram is the event A = {2 ≤ X < 3, 1 ≤ Y < 3}. P[A] is the integral of the PDF over this rectangle, which is

P [A] = ∫_2^3 ∫_1^3 (1/15) dv du = 2/15.                      (5.21)

This probability model is an example of a pair of random variables uni-


formly distributed over a rectangle in the X, Y plane.
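
A numerical check of both results (an illustrative sketch using Matlab's integral2):

f = @(x,y) (1/15)*((x>=0)&(x<=5)&(y>=0)&(y<=3));   % the joint PDF of Equation (5.19) with c = 1/15
total = integral2(f, 0, 5, 0, 3)                    % = 1, confirming c = 1/15
PA    = integral2(f, 2, 3, 1, 3)                    % = 2/15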
Figure 5.3
[Four panels showing, shaded, the intersection of the region {u ≤ x, v ≤ y} with the triangular support of fX,Y :
(a) x < 0 or y < 0,   (b) 0 ≤ y ≤ x ≤ 1,   (c) 0 ≤ x < y, 0 ≤ x ≤ 1,   (d) 0 ≤ y ≤ 1, x > 1.]
(Case (e) covering the whole triangle is omitted.)

Five cases for the CDF FX,Y (x, y) of Example 5.8.


Example 5.8 Problem

Find the joint CDF FX,Y (x, y) when X and Y have joint PDF

fX,Y (x, y) =   2   0 ≤ y ≤ x ≤ 1,                            (5.22)
                0   otherwise.

[Diagram: the triangular support 0 ≤ y ≤ x ≤ 1 in the X, Y plane, where fX,Y (x, y) = 2.]
Example 5.8 Solution
We can derive the joint CDF using Definition 5.3 in which we integrate the joint PDF
fX,Y (x, y) over the area shown in Figure 5.1. To perform the integration it is extremely
useful to draw a diagram that clearly shows the area with nonzero probability and then
to use the diagram to derive the limits of the integral in Definition 5.3.

The difficulty with this integral is that the nature of the region of integration depends
critically on x and y. In this apparently simple example, there are five cases to consider!
The five cases are shown in Figure 5.3. First, we note that with x < 0 or y < 0, the
triangle is completely outside the region of integration, as shown in Figure 5.3a. Thus
we have FX,Y (x, y) = 0 if either x < 0 or y < 0. Another simple case arises when x ≥ 1
and y ≥ 1. In this case, we see in Figure 5.3e that the triangle is completely inside the
region of integration, and we infer from Theorem 5.6 that FX,Y (x, y) = 1. The other
cases we must consider are more complicated. In each case, since fX,Y (x, y) = 2 over
the triangular region, the value of the integral is two times the indicated area. When
(x, y) is inside the area of nonzero probability (Figure 5.3b), the integral is
FX,Y (x, y) = ∫_0^y ∫_v^x 2 du dv = 2xy − y²   (Figure 5.3b).        (5.23)
[Continued]
Example 5.8 Solution (Continued 2)
In Figure 5.3c, (x, y) is above the triangle, and the integral is
FX,Y (x, y) = ∫_0^x ∫_v^x 2 du dv = x²   (Figure 5.3c).              (5.24)
The remaining situation to consider is shown in Figure 5.3d, when (x, y) is to the right
of the triangle of nonzero probability, in which case the integral is
FX,Y (x, y) = ∫_0^y ∫_v^1 2 du dv = 2y − y²   (Figure 5.3d).         (5.25)
The resulting CDF, corresponding to the five cases of Figure 5.3, is

FX,Y (x, y) =   0           x < 0 or y < 0           (a),
                2xy − y²    0 ≤ y ≤ x ≤ 1            (b),
                x²          0 ≤ x < y, 0 ≤ x ≤ 1     (c),        (5.26)
                2y − y²     0 ≤ y ≤ 1, x > 1         (d),
                1           x > 1, y > 1             (e).
In Figure 5.4, the surface plot of FX,Y (x, y) shows that cases (a) through (e) correspond
to contours on the “hill” that is FX,Y (x, y). In terms of visualizing the random variables,
the surface plot of FX,Y (x, y) is less instructive than the simple triangle characterizing
the PDF fX,Y (x, y).
Because the PDF in this example is fX,Y (x, y) = 2 over (x, y) ∈ SX,Y , each probability is
just two times the area of the region shown in one of the diagrams (either a triangle
or a trapezoid). You may want to apply some high school geometry to verify that the
results obtained from the integrals are indeed twice the areas of the regions indicated.
The approach taken in our solution, integrating over SX,Y to obtain the CDF, works for
any PDF.
Figure 5.4

[Surface plot of FX,Y (x, y) over 0 ≤ x ≤ 2, 0 ≤ y ≤ 2.]

A graph of the joint CDF FX,Y (x, y) of Example 5.8.


Example 5.9 Problem

As in Example 5.7, random variables X and Y have joint PDF



fX,Y (x, y) =   1/15   0 ≤ x ≤ 5, 0 ≤ y ≤ 3,                  (5.27)
                0      otherwise.
What is P[A] = P[Y > X]?
Example 5.9 Solution

Applying Theorem 5.7, we integrate fX,Y (x, y) over the part of the X, Y
plane satisfying Y > X. In this case,
[Diagram: the region Y > X within the rectangle 0 ≤ X ≤ 5, 0 ≤ Y ≤ 3.]

P [A] = ∫_0^3 ( ∫_x^3 (1/15) dy ) dx                                          (5.28)
      = ∫_0^3 (3 − x)/15 dx = −(3 − x)²/30 |_0^3 = 3/10.                      (5.29)
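
A quick Monte Carlo confirmation (an illustrative sketch, not part of the text):

m = 1e6;
x = 5*rand(m,1);  y = 3*rand(m,1);    % (X,Y) uniform over the 5-by-3 rectangle
PA = mean(y > x)                       % close to 3/10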
Quiz 5.4

The joint probability density function of random variables X and Y is



fX,Y (x, y) =   cxy   0 ≤ x ≤ 1, 0 ≤ y ≤ 2,                   (5.30)
                0     otherwise.

Find the constant c. What is the probability of the event A = {X² + Y² ≤ 1}?
Quiz 5.4 Solution
To find the constant c, we apply

1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y (x, y) dx dy
  = ∫_0^2 ∫_0^1 cxy dx dy = c ∫_0^2 ( x²/2 |_0^1 ) y dy = (c/2) ∫_0^2 y dy = (c/2)( y²/2 |_0^2 ) = c.   (1)

Thus c = 1.

[Diagram: the quarter disk A = {x² + y² ≤ 1, x ≥ 0, y ≥ 0} inside the support 0 ≤ x ≤ 1, 0 ≤ y ≤ 2.]

To calculate P[A], we write

P [A] = ∫∫_A fX,Y (x, y) dx dy.                                               (2)

To integrate over A, we convert to polar coordinates using the substitutions x = r cos θ, y = r sin θ and dx dy = r dr dθ. This yields

P [A] = ∫_0^{π/2} ∫_0^1 r² sin θ cos θ · r dr dθ
      = ( ∫_0^1 r³ dr ) ( ∫_0^{π/2} sin θ cos θ dθ ) = ( r⁴/4 |_0^1 ) ( sin²θ/2 |_0^{π/2} ) = 1/8.       (3)
Section 5.5

Marginal PDF
Theorem 5.8

If X and Y are random variables with joint PDF fX,Y (x, y),
fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy,        fY (y) = ∫_{−∞}^{∞} fX,Y (x, y) dx.
Proof: Theorem 5.8

From the definition of the joint PDF, we can write


FX (x) = P [X ≤ x] = ∫_{−∞}^{x} ( ∫_{−∞}^{∞} fX,Y (u, y) dy ) du.            (5.31)
Taking the derivative of both sides with respect to x (which involves
differentiating an integral with variable limits), we obtain
fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy.

A similar argument holds for fY (y).
Example 5.10 Problem

The joint PDF of X and Y is



fX,Y (x, y) =   5y/4   −1 ≤ x ≤ 1, x² ≤ y ≤ 1,               (5.32)
                0      otherwise.
Find the marginal PDFs fX(x) and fY (y).
Example 5.10 Solution

We use Theorem 5.8 to find the marginal PDF fX(x). In the figure that
accompanies Equation (5.33) below, the gray bowl-shaped region depicts
those values of X and Y for which fX,Y (x, y) > 0. When x < −1 or when
x > 1, fX,Y (x, y) = 0, and therefore fX(x) = 0. For −1 ≤ x ≤ 1,
[Diagram: the bowl-shaped support x² ≤ y ≤ 1 with the vertical bar X = x marked.]

fX (x) = ∫_{x²}^{1} (5y/4) dy = 5(1 − x⁴)/8.                                  (5.33)
The complete expression for the marginal PDF of X is

fX (x) =   5(1 − x⁴)/8   −1 ≤ x ≤ 1,                         (5.34)
           0             otherwise.

[Plot of fX (x) over −1 ≤ x ≤ 1.]
[Continued]
Example 5.10 Solution (Continued 2)
For the marginal PDF of Y , we note that for y < 0 or y > 1, fY (y) = 0. For 0 ≤ y ≤ 1,
we integrate over the horizontal bar marked Y = y. The boundaries of the bar are
x = −√y and x = √y. Therefore, for 0 ≤ y ≤ 1,

[Diagram: the support with the horizontal bar Y = y extending from x = −√y to x = √y.]

fY (y) = ∫_{−√y}^{√y} (5y/4) dx = (5y/4) x |_{x=−√y}^{x=√y} = 5y^{3/2}/2.     (5.35)
The complete marginal PDF of Y is
fY (y) =   (5/2) y^{3/2}   0 ≤ y ≤ 1,                        (5.36)
           0               otherwise.

[Plot of fY (y) over 0 ≤ y ≤ 1.]
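
As a sanity check (an illustrative sketch), each marginal PDF should integrate to 1:

fx = @(x) (5/8)*(1 - x.^4);           % fX(x) from Equation (5.34)
fy = @(y) (5/2)*y.^(3/2);             % fY(y) from Equation (5.36)
integral(fx, -1, 1)                   % = 1
integral(fy,  0, 1)                   % = 1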
Quiz 5.5

The joint probability density function of random variables X and Y is



fX,Y (x, y) =   6(x + y²)/5   0 ≤ x ≤ 1, 0 ≤ y ≤ 1,          (5.37)
                0             otherwise.
Find fX(x) and fY (y), the marginal PDFs of X and Y .
Quiz 5.5 Solution
By Theorem 5.8, the marginal PDF of X is

fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy.                                            (1)

Note that fX (x) = 0 for x < 0 or x > 1. For 0 ≤ x ≤ 1,

fX (x) = (6/5) ∫_0^1 (x + y²) dy = (6/5) [xy + y³/3]_{y=0}^{y=1} = (6x + 2)/5.   (2)

The complete expression for the PDF of X is

fX (x) =   (6x + 2)/5   0 ≤ x ≤ 1,                                             (3)
           0            otherwise.

By the same method we obtain the marginal PDF for Y . For 0 ≤ y ≤ 1,

fY (y) = ∫_{−∞}^{∞} fX,Y (x, y) dx
       = (6/5) ∫_0^1 (x + y²) dx = (6/5) [x²/2 + xy²]_{x=0}^{x=1} = (6y² + 3)/5.  (4)

Since fY (y) = 0 for y < 0 or y > 1, the complete expression for the PDF of Y is

fY (y) =   (3 + 6y²)/5   0 ≤ y ≤ 1,                                            (5)
           0             otherwise.
Section 5.6

Independent Random Variables


Definition 5.4 Independent Random Variables

Random variables X and Y are independent if and only if

Discrete: PX,Y (x, y) = PX(x)PY (y)

Continuous: fX,Y (x, y) = fX(x)fY (y).


Example 5.11 Problem

Are the children's ages X and Y in Example 5.2 independent?


Example 5.11 Solution

In Example 5.2, we derived the CDFs FX(x) and FY (y), which showed
that X is uniform (5, 6) and Y is uniform (6, 7). Thus X and Y have
marginal PDFs
 
fX (x) =   1   5 ≤ x < 6,        fY (y) =   1   6 ≤ y < 7,        (5.38)
           0   otherwise,                   0   otherwise.
Referring to Equation (5.5), we observe that fX,Y (x, y) = fX(x)fY (y).
Thus X and Y are independent.
Example 5.12 Problem

fX,Y (x, y) =   4xy   0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
                0     otherwise.
Are X and Y independent?
Example 5.12 Solution

The marginal PDFs of X and Y are


 
fX (x) =   2x   0 ≤ x ≤ 1,        fY (y) =   2y   0 ≤ y ≤ 1,        (5.39)
           0    otherwise,                   0    otherwise.
It is easily verified that fX,Y (x, y) = fX(x)fY (y) for all pairs (x, y), and so
we conclude that X and Y are independent.
Example 5.13 Problem


fU,V (u, v) =   24uv   u ≥ 0, v ≥ 0, u + v ≤ 1,              (5.40)
                0      otherwise.
Are U and V independent?
Example 5.13 Solution

Since fU,V (u, v) looks similar in form to fX,Y (x, y) in the previous example,
we might suppose that U and V can also be factored into marginal PDFs
fU(u) and fV (v). However, this is not the case. Owing to the triangular
shape of the region of nonzero probability, the marginal PDFs are
 
fU (u) =   12u(1 − u)²   0 ≤ u ≤ 1,        fV (v) =   12v(1 − v)²   0 ≤ v ≤ 1,
           0             otherwise,                   0             otherwise.
Clearly, U and V are not independent. Learning U changes our knowledge
of V . For example, learning U = 1/2 informs us that P[V ≤ 1/2] = 1.
Example 5.14 Problem

Consider again the noisy observation model of Example 5.1. Suppose X


is a Gaussian (0, σX ) information signal sent by a radio transmitter and
Y = X + Z is the output of a low-noise amplifier attached to the antenna
of a radio receiver. The noise Z is a Gaussian (0, σZ ) random variable
that is generated within the receiver. What is the joint PDF fX,Z(x, z)?
Example 5.14 Solution

From the information given, we know that X and Z have PDFs


fX (x) = (1/√(2πσX²)) exp{−x²/(2σX²)},        fZ (z) = (1/√(2πσZ²)) exp{−z²/(2σZ²)}.        (5.41)
The signal X depends on the information being transmitted by the sender
and the noise Z depends on electrons bouncing around in the receiver
circuitry. As there is no reason for these to be related, we model X and
Z as independent. Thus, the joint PDF is
 
fX,Z (x, z) = fX (x) fZ (z) = (1/(2πσX σZ)) exp{ −(1/2)( x²/σX² + z²/σZ² ) }.        (5.42)
Quiz 5.6(A)

Random variables X and Y in Example 5.3 and random variables Q and


G in Quiz 5.2 have joint PMFs:
PX,Y (x, y) y=0 y=1 y=2 PQ,G(q, g) g=0 g=1 g=2 g=3
x=0 0.01 0 0 q=0 0.06 0.18 0.24 0.12
x=1 0.09 0.09 0 q=1 0.04 0.12 0.16 0.08
x=2 0 0 0.81

(a) Are X and Y independent?

(b) Are Q and G independent?


Quiz 5.6(A) Solution
(a) For random variables X and Y from Example 5.3, we observe that PY (1) = 0.09
and PX(0) = 0.01. However,
PX,Y (0, 1) = 0 ≠ PX (0) PY (1) .                                     (1)

Since we have found a pair x, y such that PX,Y (x, y) ≠ PX (x)PY (y), we can conclude that X and Y are dependent. Note that whenever PX,Y (x, y) = 0, independence requires that either PX (x) = 0 or PY (y) = 0.
(b) For random variables Q and G from Quiz 5.2, it is not obvious whether they are
independent. Unlike X and Y in part (a), there are no obvious pairs q, g that fail
the independence requirement. In this case, we calculate the marginal PMFs from
the table of the joint PMF PQ,G(q, g) in Quiz 5.2. In transposed form, this table is
PQ,G(q, g) q=0 q=1 PG(g)
g=0 0.06 0.04 0.10
g=1 0.18 0.12 0.30
g=2 0.24 0.16 0.40
g=3 0.12 0.08 0.20
PQ(q) 0.60 0.40
Careful study of the table will verify that PQ,G(q, g) = PQ(q)PG(g) for every pair q, g.
Hence Q and G are independent.
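
The full check of part (b) is a short computation (an illustrative sketch):

PQG = [0.06 0.18 0.24 0.12; 0.04 0.12 0.16 0.08];  % rows q = 0,1; columns g = 0,1,2,3
PQ = sum(PQG,2);  PG = sum(PQG,1);                  % marginal PMFs
max(max(abs(PQG - PQ*PG)))                          % = 0, so PQ,G(q,g) = PQ(q)PG(g) everywhere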
Quiz 5.6(B)

Random variables X1 and X2 are independent and identically distributed with probability density function

fX (x) =   x/2   0 ≤ x ≤ 2,                                   (5.43)
           0     otherwise.
What is the joint PDF fX1,X2(x1, x2)?
Quiz 5.6(B) Solution

Since X1 and X2 are identical, fX1 (x) = fX2 (x) = fX (x). Since X1 and X2 are independent,

fX1,X2 (x1, x2) = fX1 (x1) fX2 (x2) =   x1 x2 / 4   0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 2,        (1)
                                        0           otherwise.
Section 5.7

Expected Value of a Function of


Two Random Variables
Theorem 5.9

For random variables X and Y , the expected value of W = g(X, Y ) is


Discrete:      E[W ] = ∑_{x∈SX} ∑_{y∈SY} g(x, y) PX,Y (x, y)

Continuous:    E[W ] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y (x, y) dx dy.
Theorem 5.10
E [a1g1(X, Y ) + · · · + angn(X, Y )] = a1 E [g1(X, Y )] + · · · + an E [gn(X, Y )] .
Proof: Theorem 5.10
Let g(X, Y ) = a1 g1 (X, Y ) + · · · + an gn (X, Y ). For discrete random variables X, Y , Theo-
rem 5.9 states
E [g(X, Y )] = ∑_{x∈SX} ∑_{y∈SY} (a1 g1 (x, y) + · · · + an gn (x, y)) PX,Y (x, y) .        (5.44)

We can break the double summation into n weighted double summations:


E [g(X, Y )] = a1 ∑_{x∈SX} ∑_{y∈SY} g1 (x, y) PX,Y (x, y) + · · · + an ∑_{x∈SX} ∑_{y∈SY} gn (x, y) PX,Y (x, y) .

By Theorem 5.9, the ith double summation on the right side is E[gi (X, Y )]; thus,
E [g(X, Y )] = a1 E [g1 (X, Y )] + · · · + an E [gn (X, Y )] . (5.45)
For continuous random variables, Theorem 5.9 says
E [g(X, Y )] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (a1 g1 (x, y) + · · · + an gn (x, y)) fX,Y (x, y) dx dy.        (5.46)
To complete the proof, we express this integral as the sum of n integrals and recognize
that each of the new integrals is a weighted expected value, ai E[gi (X, Y )].
Theorem 5.11

For any two random variables X and Y ,

E [X + Y ] = E [X ] + E [Y ] .
Theorem 5.12

The variance of the sum of two random variables is

Var [X + Y ] = Var [X ] + Var [Y ] + 2 E [(X − µX )(Y − µY )] .


Proof: Theorem 5.12

Since E[X + Y ] = µX + µY ,
Var[X + Y ] = E[(X + Y − (µX + µY ))²]
            = E[((X − µX ) + (Y − µY ))²]
            = E[(X − µX )² + 2(X − µX )(Y − µY ) + (Y − µY )²].        (5.47)
We observe that each of the three terms in the preceding expected values
is a function of X and Y . Therefore, Theorem 5.10 implies
Var[X + Y ] = E[(X − µX )²] + 2 E[(X − µX )(Y − µY )] + E[(Y − µY )²].        (5.48)
The first and last terms are, respectively, Var[X] and Var[Y ].
Example 5.15 Problem

A company website has three pages. They require 750 kilobytes, 1500
kilobytes, and 2500 kilobytes for transmission. The transmission speed
can be 5 Mb/s for external requests or 10 Mb/s for internal requests.
Requests arrive randomly from inside and outside the company indepen-
dently of page length, which is also random. The probability models for
transmission speed, R, and page length, L, are:

PR (r) =   0.4   r = 5,             PL (l) =   0.3   l = 750,
           0.6   r = 10,                       0.5   l = 1500,        (5.49)
           0     otherwise,                    0.2   l = 2500,
                                               0     otherwise.

Write an expression for the transmission time g(R, L) seconds. Derive the expected transmission time E[g(R, L)]. Does E[g(R, L)] = g(E[R], E[L])?
Example 5.15 Solution

The transmission time T seconds is the page length (in kb) divided by the transmission speed (in kb/s), or T = 8L/1000R. Because R and L are independent, PR,L (r, l) = PR (r)PL (l) and

E [g(R, L)] = ∑_r ∑_l (8l / 1000r) PR (r) PL (l)
            = (8/1000) ( ∑_r PR (r)/r ) ( ∑_l l PL (l) )
            = (8/1000) (0.4/5 + 0.6/10) (0.3(750) + 0.5(1500) + 0.2(2500))
            = 1.652 s.                                                  (5.50)

By comparison, E[R] = ∑_r r PR (r) = 8 Mb/s and E[L] = ∑_l l PL (l) = 1475 kilobytes. This implies

g(E [R] , E [L]) = 8 E [L] / (1000 E [R]) = 1.475 s ≠ E [g(R, L)] .     (5.51)
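
The same expectation on a sample space grid (an illustrative sketch in the style of Section 5.11):

sr = [5 10]';           pr = [0.4 0.6]';        % PMF of R
sl = [750 1500 2500]';  pl = [0.3 0.5 0.2]';    % PMF of L
[SR,SL] = ndgrid(sr,sl);                        % 2-by-3 sample space grid
PRL = pr*pl';                                   % joint PMF, since R and L are independent
T = 8*SL./(1000*SR);                            % transmission time g(r,l) at each grid point
ET = sum(sum(T.*PRL))                           % = 1.652, matching Equation (5.50)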
Section 5.8

Covariance, Correlation and


Independence
Definition 5.5 Covariance

The covariance of two random variables X and Y is

Cov [X, Y ] = E [(X − µX ) (Y − µY )] .


Example 5.16

Suppose we perform an experiment in which we measure X and Y in


centimeters (for example the height of two sisters). However, if we change
units and measure height in meters, we will perform the same experiment
except we observe X̂ = X/100 and Ŷ = Y /100. In this case, X̂ and Ŷ
have expected values µX̂ = µX /100 m, µŶ = µY /100 m and
Cov[X̂, Ŷ ] = E[(X̂ − µX̂ )(Ŷ − µŶ )]
            = E[(X − µX )(Y − µY )] / 10,000 = Cov[X, Y ] / 10,000  m².        (5.53)
Changing the unit of measurement from cm2 to m2 reduces the covari-
ance by a factor of 10, 000. However, the tendency of X − µX and Y − µY
to have the same sign is the same as the tendency of X̂ − µX̂ and Ŷ − µŶ
to have the same sign. (Both are an indication of how likely it is that a
girl is taller than average if her sister is taller than average).
Definition 5.6 Correlation Coefficient

The correlation coefficient of two random variables X and Y is


ρX,Y = Cov[X, Y ] / √(Var[X] Var[Y ]) = Cov[X, Y ] / (σX σY ).
Figure 5.5

[Three scatter plots of 200 sample pairs each: (a) ρX,Y = −0.9, (b) ρX,Y = 0, (c) ρX,Y = 0.9.]

Each graph has 200 samples, each marked by a dot, of the random
variable pair (X, Y ) such that E[X] = E[Y ] = 0, Var[X] = Var[Y ] = 1.
Theorem 5.13

If X̂ = aX + b and Ŷ = cY + d, then

(a) ρX̂,Ŷ = ρX,Y , (b) Cov[X̂, Ŷ ] = ac Cov[X, Y ].


Theorem 5.14
−1 ≤ ρX,Y ≤ 1.
Proof: Theorem 5.14

Let σX² and σY² denote the variances of X and Y , and for a constant a, let W = X − aY . Then,

Var[W ] = E[(X − aY )²] − (E[X − aY ])².                           (5.54)

Since E[X − aY ] = µX − aµY , expanding the squares yields

Var[W ] = E[X² − 2aXY + a²Y²] − (µX² − 2aµX µY + a²µY²)
        = Var[X] − 2a Cov[X, Y ] + a² Var[Y ].                     (5.55)

Since Var[W ] ≥ 0 for any a, we have 2a Cov[X, Y ] ≤ Var[X] + a² Var[Y ].
Choosing a = σX /σY yields Cov[X, Y ] ≤ σY σX , which implies ρX,Y ≤ 1.
Choosing a = −σX /σY yields Cov[X, Y ] ≥ −σY σX , which implies ρX,Y ≥ −1.
Theorem 5.15

If X and Y are random variables such that Y = aX + b,

ρX,Y =   −1   a < 0,
          0   a = 0,
          1   a > 0.
5.8 Comment: Examples of Correlation
Some examples of positive, negative, and zero correlation coefficients include:

• X is a student's height. Y is the same student's weight. 0 < ρX,Y < 1.

• X is the distance of a cellular phone from the nearest base station. Y is the power of the received signal at the cellular phone. −1 < ρX,Y < 0.

• X is the temperature of a resistor measured in degrees Celsius. Y is the temperature of the same resistor measured in Kelvins. ρX,Y = 1.

• X is the gain of an electrical circuit measured in decibels. Y is the attenuation, measured in decibels, of the same circuit. ρX,Y = −1.

• X is the telephone number of a cellular phone. Y is the Social Security number of the phone's owner. ρX,Y = 0.
Definition 5.7 Correlation

The correlation of X and Y is rX,Y = E[XY ]


Theorem 5.16
(a) Cov[X, Y ] = rX,Y − µX µY .

(b) Var[X + Y ] = Var[X] + Var[Y ] + 2 Cov[X, Y ].

(c) If X = Y , Cov[X, Y ] = Var[X] = Var[Y ] and rX,Y = E[X 2] = E[Y 2].


Proof: Theorem 5.16

Cross-multiplying inside the expected value of Definition 5.5 yields

Cov [X, Y ] = E [XY − µX Y − µY X + µX µY ] . (5.56)


Since the expected value of the sum equals the sum of the expected
values,

Cov [X, Y ] = E [XY ] − E [µX Y ] − E [µY X ] + E [µY µX ] . (5.57)


Note that in the expression E[µY X], µY is a constant. Referring to
Theorem 3.12, we set a = µY and b = 0 to obtain E[µY X] = µY E[X] =
µY µX . The same reasoning demonstrates that E[µX Y ] = µX E[Y ] =
µX µY . Therefore,

Cov [X, Y ] = E [XY ] − µX µY − µY µX + µY µX = rX,Y − µX µY . (5.58)


The other relationships follow directly from the definitions and Theo-
rem 5.12.
Example 5.17 Problem

For the integrated circuits tests in Example 5.3, we found in Example 5.5
that the probability model for X and Y is given by the following matrix.
PX,Y (x, y ) y = 0 y = 1 y = 2 PX (x)
x=0 0.01 0 0 0.01
x=1 0.09 0.09 0 0.18
x=2 0 0 0.81 0.81
P Y (y ) 0.10 0.09 0.81
Find rX,Y and Cov[X, Y ].
Example 5.17 Solution

By Definition 5.7,
rX,Y = E [XY ] = ∑_{x=0}^{2} ∑_{y=0}^{2} x y PX,Y (x, y)              (5.59)
                = (1)(1)(0.09) + (2)(2)(0.81) = 3.33.                 (5.60)
To use Theorem 5.16(a) to find the covariance, we find

E [X ] = (1)(0.18) + (2)(0.81) = 1.80,


E [Y ] = (1)(0.09) + (2)(0.81) = 1.71. (5.61)
Therefore, by Theorem 5.16(a), Cov[X, Y ] = 3.33−(1.80)(1.71) = 0.252.
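
The same calculation on a grid (an illustrative sketch):

PXY = [0.01 0 0; 0.09 0.09 0; 0 0 0.81];   % rows x = 0,1,2; columns y = 0,1,2
[SX,SY] = ndgrid(0:2, 0:2);
rXY = sum(sum(SX.*SY.*PXY))                 % = 3.33
EX = sum(sum(SX.*PXY));  EY = sum(sum(SY.*PXY));
covXY = rXY - EX*EY                         % = 0.252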
Definition 5.8 Orthogonal Random Variables

Random variables X and Y are orthogonal if rX,Y = 0.


Definition 5.9 Uncorrelated Random Variables

Random variables X and Y are uncorrelated if Cov[X, Y ] = 0.


Theorem 5.17

For independent random variables X and Y ,

(a) E[g(X)h(Y )] = E[g(X)] E[h(Y )],

(b) rX,Y = E[XY ] = E[X] E[Y ],

(c) Cov[X, Y ] = ρX,Y = 0,

(d) Var[X + Y ] = Var[X] + Var[Y ],


Proof: Theorem 5.17
We present the proof for discrete random variables. By replacing PMFs and sums
with PDFs and integrals we arrive at essentially the same proof for continuous random
variables. Since PX,Y (x, y) = PX(x)PY (y),
E [g(X)h(Y )] = ∑_{x∈SX} ∑_{y∈SY} g(x) h(y) PX (x) PY (y)
              = ( ∑_{x∈SX} g(x)PX (x) ) ( ∑_{y∈SY} h(y)PY (y) ) = E [g(X)] E [h(Y )] .        (5.62)

If g(X) = X, and h(Y ) = Y , this equation implies rX,Y = E[XY ] = E[X] E[Y ]. This
equation and Theorem 5.16(a) imply Cov[X, Y ] = 0. As a result, Theorem 5.16(b)
implies Var[X + Y ] = Var[X] + Var[Y ]. Furthermore, ρX,Y = Cov[X, Y ]/(σX σY ) = 0.
Example 5.18 Problem

For the noisy observation Y = X +Z of Example 5.1, find the covariances


Cov[X, Z] and Cov[X, Y ] and the correlation coefficients ρX,Z and ρX,Y .
Example 5.18 Solution

We recall from Example 5.1 that the signal X is Gaussian (0, σX ), that
the noise Z is Gaussian (0, σZ ), and that X and Z are independent. We
know from Theorem 5.17(c) that independence of X and Z implies

Cov [X, Z ] = ρX,Z = 0. (5.63)


In addition, by Theorem 5.17(d),

Var[Y ] = Var[X] + Var[Z] = σX² + σZ².                              (5.64)

Since E[X] = E[Z] = 0, Theorem 5.11 tells us that E[Y ] = E[X] + E[Z] = 0 and Theorem 5.17(b) says that E[XZ] = E[X] E[Z] = 0. This permits us to write

Cov [X, Y ] = E [XY ] = E [X(X + Z)]
            = E [X² + XZ] = E [X²] + E [XZ] = E [X²] = σX².          (5.65)

This implies

ρX,Y = Cov[X, Y ] / √(Var[X] Var[Y ]) = σX² / √(σX² (σX² + σZ²)) = √( (σX²/σZ²) / (1 + σX²/σZ²) ).   (5.66)
Quiz 5.8(A)
Random variables L and T have joint PMF
PL,T (l, t) t = 40 sec t = 60 sec
l = 1 page 0.15 0.1
l = 2 pages 0.30 0.2
l = 3 pages 0.15 0.1.
Find the following quantities.
(a) E[L] and Var[L]
(b) E[T ] and Var[T ]
(c) The covariance Cov[L, T ]
(d) The correlation coefficient ρL,T
Quiz 5.8(A) Solution
It is helpful to first make a table that includes the marginal PMFs.
PL,T(l, t) t = 40 t = 60 PL(l)
l=1 0.15 0.1 0.25
l=2 0.3 0.2 0.5
l=3 0.15 0.1 0.25
PT(t) 0.6 0.4
(a) The expected value of L is
E [L] = 1(0.25) + 2(0.5) + 3(0.25) = 2. (1)
Since the second moment of L is

E[L²] = 1²(0.25) + 2²(0.5) + 3²(0.25) = 4.5,                       (2)

the variance of L is

Var[L] = E[L²] − (E[L])² = 0.5.                                    (3)

(b) The expected value of T is

E [T ] = 40(0.6) + 60(0.4) = 48.                                   (4)

The second moment of T is

E[T²] = 40²(0.6) + 60²(0.4) = 2400.                                (5)
[Continued]
Quiz 5.8(A) Solution (Continued 2)
Thus

Var[T ] = E[T²] − (E[T ])² = 96.                                   (6)

(c) First we need to find

E [LT ] = ∑_{t=40,60} ∑_{l=1}^{3} l t PL,T (l, t)
= 1(40)(0.15) + 2(40)(0.3) + 3(40)(0.15)
+ 1(60)(0.1) + 2(60)(0.2) + 3(60)(0.1)
= 96. (7)
The covariance of L and T is
Cov [L, T ] = E [LT ] − E [L] E [T ] = 96 − 2(48) = 0. (8)
(d) Since Cov[L, T ] = 0, the correlation coefficient is ρL,T = 0.
Quiz 5.8(B)

The joint probability density function of random variables X and Y is



fX,Y (x, y) =   xy   0 ≤ x ≤ 1, 0 ≤ y ≤ 2,                   (5.67)
                0    otherwise.
Find the following quantities.
(a) E[X] and Var[X]
(b) E[Y ] and Var[Y ]
(c) The covariance Cov[X, Y ]
(d) The correlation coefficient ρX,Y
Quiz 5.8(B) Solution
As in the discrete case, the calculations become easier if we first calculate the marginal
PDFs fX(x) and fY (y). For 0 ≤ x ≤ 1,
fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy = ∫_0^2 xy dy = (1/2) x y² |_{y=0}^{y=2} = 2x.        (1)

Similarly, for 0 ≤ y ≤ 2,

fY (y) = ∫_{−∞}^{∞} fX,Y (x, y) dx = ∫_0^1 xy dx = (1/2) x² y |_{x=0}^{x=1} = y/2.        (2)

The complete expressions for the marginal PDFs are

fX (x) =   2x   0 ≤ x ≤ 1,         fY (y) =   y/2   0 ≤ y ≤ 2,        (3)
           0    otherwise,                    0     otherwise.

From the marginal PDFs, it is straightforward to calculate the various expectations.

(a) The first and second moments of X are

    E [X] = ∫_{−∞}^{∞} x fX (x) dx = ∫_0^1 2x² dx = 2/3,              (4)

    E [X²] = ∫_{−∞}^{∞} x² fX (x) dx = ∫_0^1 2x³ dx = 1/2.            (5)
[Continued]
Quiz 5.8(B) Solution (Continued 2)
The variance of X is
Var[X] = E[X²] − (E[X])² = 1/2 − 4/9 = 1/18.

(b) The first and second moments of Y are

    E [Y ] = ∫_{−∞}^{∞} y fY (y) dy = ∫_0^2 (1/2) y² dy = 4/3,        (6)

    E [Y²] = ∫_{−∞}^{∞} y² fY (y) dy = ∫_0^2 (1/2) y³ dy = 2.         (7)

The variance of Y is

Var[Y ] = E[Y²] − (E [Y ])² = 2 − 16/9 = 2/9.                         (8)

(c) We start by finding

    E [XY ] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy fX,Y (x, y) dx dy
            = ∫_0^1 ∫_0^2 x² y² dy dx = ( x³/3 |_0^1 )( y³/3 |_0^2 ) = 8/9.        (9)

The covariance of X and Y is then

Cov [X, Y ] = E [XY ] − E [X] E [Y ] = 8/9 − (2/3)(4/3) = 0.          (10)

(d) Since Cov[X, Y ] = 0, the correlation coefficient is ρX,Y = 0.
Section 5.9

Bivariate Gaussian Random


Variables
Definition 5.10   Bivariate Gaussian Random Variables

Random variables X and Y have a bivariate Gaussian PDF with parame-


ters µX , µY , σX > 0, σY > 0, and ρX,Y satisfying −1 < ρX,Y < 1 if

fX,Y (x, y) = (1 / (2πσX σY √(1 − ρX,Y²)))
              × exp{ − [ ((x − µX)/σX)² − 2ρX,Y (x − µX)(y − µY)/(σX σY) + ((y − µY)/σY)² ] / (2(1 − ρX,Y²)) }.
Figure 5.6
[Three surface plots of fX,Y (x, y), for ρ = −0.9, ρ = 0, and ρ = 0.9, each shown above a scatter plot of 200 sample pairs (X, Y ).]

The Joint Gaussian PDF fX,Y (x, y) for µX = µY = 0, σX = σY = 1, and three values of
ρX,Y = ρ. Next to each PDF, we plot 200 sample pairs (X, Y ) generated with that PDF.
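
A short sketch (illustrative, not from the text) of how sample pairs like those in Figure 5.6 can be generated for a given ρ, with Gaussian (0, 1) marginals:

rho = 0.9;  m = 200;
x = randn(m,1);                            % X is Gaussian (0,1)
y = rho*x + sqrt(1-rho^2)*randn(m,1);      % Y is Gaussian (0,1) with correlation rho
plot(x, y, '.');  xlabel('X');  ylabel('Y');

Since X and Y constructed this way are linear combinations of independent Gaussian random variables, Theorem 5.21 below guarantees that they are bivariate Gaussian.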
Theorem 5.18

If X and Y are the bivariate Gaussian random variables in Definition 5.10,


X is the Gaussian (µX , σX ) random variable and Y is the Gaussian (µY , σY )
random variable:
fX (x) = (1/(σX √(2π))) e^{−(x−µX)²/2σX²},        fY (y) = (1/(σY √(2π))) e^{−(y−µY)²/2σY²}.
Proof: Theorem 5.18

Integrating fX,Y (x, y) in Equation (5.69) over all y, we have

fX (x) = ∫_{−∞}^{∞} fX,Y (x, y) dy
       = (1/(σX √(2π))) e^{−(x−µX)²/2σX²} ∫_{−∞}^{∞} (1/(σ̃Y √(2π))) e^{−(y−µ̃Y(x))²/2σ̃Y²} dy.   (5.70)

The integral over y equals 1 because it is the integral of a Gaussian PDF. The remainder of the formula is the PDF of the Gaussian (µX , σX ) random variable. The same reasoning with the roles of X and Y reversed leads to the formula for fY (y).
Theorem 5.19

Bivariate Gaussian random variables X and Y in Definition 5.10 have


correlation coefficient ρX,Y .
Theorem 5.20

Bivariate Gaussian random variables X and Y are uncorrelated if and only


if they are independent.
Theorem 5.21

If X and Y are bivariate Gaussian random variables with PDF given by


Definition 5.10, and W1 and W2 are given by the linearly independent
equations

W1 = a1X + b1Y, W2 = a2X + b2Y,


then W1 and W2 are bivariate Gaussian random variables such that

E [Wi] = aiµX + biµY , i = 1, 2,


Var[Wi] = ai² σX² + bi² σY² + 2 ai bi ρX,Y σX σY ,   i = 1, 2,

Cov [W1, W2] = a1 a2 σX² + b1 b2 σY² + (a1 b2 + a2 b1) ρX,Y σX σY .
Example 5.19 Problem

For the noisy observation in Example 5.14, find the PDF of Y = X + Z.


Example 5.19 Solution

Since X is Gaussian (0, σX ) and Z is Gaussian (0, σZ ) and X and Z are


independent, X and Z are jointly Gaussian. It follows from Theorem 5.21
that Y is Gaussian with E[Y ] = E[X] + E[Z] = 0 and variance σY² = σX² + σZ². The PDF of Y is

fY (y) = (1/√(2π(σX² + σZ²))) e^{−y²/2(σX² + σZ²)}.                  (5.71)
Example 5.20 Problem

Continuing Example 5.19, find the joint PDF of X and Y when σX = 4


and σZ = 3.
Example 5.20 Solution

From Theorem 5.21, we know that X and Y are bivariate Gaussian. We


also know that µX = µY = 0 and that Y has variance σY² = σX² + σZ² = 25. Substituting σX = 4 and σZ = 3 in the formula for the correlation coefficient derived in Example 5.18, we have

ρX,Y = √( (σX²/σZ²) / (1 + σX²/σZ²) ) = 4/5.                         (5.72)

Applying these parameters to Definition 5.10, we obtain

fX,Y (x, y) = (1/(24π)) e^{−(25x²/16 − 2xy + y²)/18}.                (5.73)
Quiz 5.9

Let X and Y be jointly Gaussian (0, 1) random variables with correlation


coefficient 1/2. What is the joint PDF of X and Y ?
Quiz 5.9 Solution

This problem just requires identifying the various parameters in Defini-


tion 5.10. Specifically, from the problem statement, we know ρ = 1/2
and

µX = 0, µY = 0,
σX = 1, σY = 1.
Applying these facts to Definition 5.10, we have
fX,Y (x, y) = e^{−2(x² − xy + y²)/3} / (√3 π).                       (1)
Section 5.10

Multivariate Probability Models


Definition 5.11 Multivariate Joint CDF

The joint CDF of X1, . . . , Xn is

FX1,...,Xn (x1, . . . , xn) = P [X1 ≤ x1, . . . , Xn ≤ xn] .


Definition 5.12 Multivariate Joint PMF

The joint PMF of the discrete random variables X1, . . . , Xn is

PX1,...,Xn (x1, . . . , xn) = P [X1 = x1, . . . , Xn = xn] .


Definition 5.13 Multivariate Joint PDF

The joint PDF of the continuous random variables X1, . . . , Xn is the function

fX1,...,Xn (x1, . . . , xn) = ∂ⁿFX1,...,Xn (x1, . . . , xn) / (∂x1 · · · ∂xn).
Theorem 5.22

If X1, . . . , Xn are discrete random variables with joint PMF


PX1,...,Xn(x1, . . . , xn),

(a) PX1,...,Xn(x1, . . . , xn) ≥ 0,


(b) ∑_{x1∈SX1} · · · ∑_{xn∈SXn} PX1,...,Xn (x1, . . . , xn) = 1.
Theorem 5.23

If X1, . . . , Xn have joint PDF


fX1,...,Xn(x1, . . . , xn),

(a) fX1,...,Xn(x1, . . . , xn) ≥ 0,


(b) FX1,...,Xn (x1, . . . , xn) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} fX1,...,Xn (u1, . . . , un) du1 · · · dun,

(c) ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} fX1,...,Xn (x1, . . . , xn) dx1 · · · dxn = 1.
Theorem 5.24

The probability of an event A expressed in terms of the random variables


X1, . . . , Xn is
Discrete:      P[A] = ∑_{(x1,...,xn)∈A} PX1,...,Xn (x1, . . . , xn)

Continuous:    P[A] = ∫ · · · ∫_A fX1,...,Xn (x1, . . . , xn) dx1 dx2 · · · dxn.
Example 5.21 Problem

Consider a set of n independent trials in which there are r possible out-


comes s1, . . . , sr for each trial. In each trial, P[si] = pi. Let Ni equal the
number of times that outcome si occurs over n trials. What is the joint
PMF of N1, . . . , Nr ?
Example 5.21 Solution

The solution to this problem appears in Theorem 2.9 and is repeated


here:

PN1,...,Nr (n1, . . . , nr) = ( n over n1, . . . , nr ) p1^{n1} p2^{n2} · · · pr^{nr},        (5.74)

where ( n over n1, . . . , nr ) = n!/(n1! · · · nr!) is the multinomial coefficient.
Theorem 5.25

For a joint PMF PW,X,Y,Z(w, x, y, z) of discrete random variables W, X, Y, Z,


some marginal PMFs are
PX,Y,Z (x, y, z) = ∑_{w∈SW} PW,X,Y,Z (w, x, y, z) ,

PW,Z (w, z) = ∑_{x∈SX} ∑_{y∈SY} PW,X,Y,Z (w, x, y, z) .
Theorem 5.26

For a joint PDF fW,X,Y,Z(w, x, y, z) of continuous random variables W, X, Y, Z,


some marginal PDFs are
fW,X,Y (w, x, y) = ∫_{−∞}^{∞} fW,X,Y,Z (w, x, y, z) dz,

fX (x) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} fW,X,Y,Z (w, x, y, z) dw dy dz.
Example 5.22 Problem

As in Quiz 5.10, the random variables Y1, . . . , Y4 have the joint PDF

fY1,...,Y4 (y1, . . . , y4) =   4   0 ≤ y1 ≤ y2 ≤ 1, 0 ≤ y3 ≤ y4 ≤ 1,        (5.75)
                                0   otherwise.
Find the marginal PDFs fY1,Y4(y1, y4), fY2,Y3(y2, y3), and fY3(y3).
Example 5.22 Solution

fY1,Y4 (y1, y4) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fY1,...,Y4 (y1, . . . , y4) dy2 dy3.            (5.76)

In the foregoing integral, the hard part is identifying the correct limits. These limits will depend on y1 and y4. For 0 ≤ y1 ≤ 1 and 0 ≤ y4 ≤ 1,

fY1,Y4 (y1, y4) = ∫_{y1}^{1} ∫_{0}^{y4} 4 dy3 dy2 = 4(1 − y1) y4.                        (5.77)

The complete expression for fY1,Y4 (y1, y4) is

fY1,Y4 (y1, y4) =   4(1 − y1) y4   0 ≤ y1 ≤ 1, 0 ≤ y4 ≤ 1,                               (5.78)
                    0              otherwise.

Similarly, for 0 ≤ y2 ≤ 1 and 0 ≤ y3 ≤ 1,

fY2,Y3 (y2, y3) = ∫_{0}^{y2} ∫_{y3}^{1} 4 dy4 dy1 = 4 y2 (1 − y3).                       (5.79)
[Continued]
Example 5.22 Solution (Continued 2)

The complete expression for fY2,Y3 (y2, y3) is

fY2,Y3 (y2, y3) =   4 y2 (1 − y3)   0 ≤ y2 ≤ 1, 0 ≤ y3 ≤ 1,                              (5.80)
                    0               otherwise.

Lastly, for 0 ≤ y3 ≤ 1,

fY3 (y3) = ∫_{−∞}^{∞} fY2,Y3 (y2, y3) dy2 = ∫_{0}^{1} 4 y2 (1 − y3) dy2 = 2(1 − y3).     (5.81)

The complete expression is

fY3 (y3) =   2(1 − y3)   0 ≤ y3 ≤ 1,                                                     (5.82)
             0           otherwise.
Definition 5.14   N Independent Random Variables

Random variables X1, . . . , Xn are independent if for all x1, . . . , xn,

Discrete: PX1,...,Xn(x1, . . . , xn) = PX1(x1)PX2(x2) · · · PXn(xn)

Continuous: fX1,...,Xn(x1, . . . , xn) = fX1(x1)fX2(x2) · · · fXn(xn).


Definition 5.15   Independent and Identically Distributed (iid)

X1, . . . , Xn are independent and identically distributed (iid) if

Discrete: PX1,...,Xn(x1, . . . , xn) = PX(x1)PX(x2) · · · PX(xn)

Continuous: fX1,...,Xn(x1, . . . , xn) = fX(x1)fX(x2) · · · fX(xn).


Example 5.23 Problem

The random variables X1, . . . , Xn have the joint PDF



fX1,...,Xn (x1, . . . , xn) =   1   0 ≤ xi ≤ 1, i = 1, . . . , n,        (5.83)
                                0   otherwise.
Let A denote the event that maxi Xi ≤ 1/2. Find P[A].
Example 5.23 Solution

We can solve this problem by applying Theorem 5.24:


 
P [A] = P [max_i Xi ≤ 1/2] = P [X1 ≤ 1/2, . . . , Xn ≤ 1/2]
      = ∫_0^{1/2} · · · ∫_0^{1/2} 1 dx1 · · · dxn = 1/2ⁿ.            (5.84)

As n grows, the probability that the maximum is less than 1/2 rapidly goes to 0.

We note that inspection of the joint PDF reveals that X1, . . . , Xn are iid continuous uniform (0, 1) random variables. The integration in Equation (5.84) is easy because independence implies

P [A] = P [X1 ≤ 1/2, . . . , Xn ≤ 1/2]
      = P [X1 ≤ 1/2] × · · · × P [Xn ≤ 1/2] = (1/2)ⁿ.                (5.85)
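
A Monte Carlo confirmation for n = 4 (an illustrative sketch, not part of the text):

n = 4;  m = 1e6;
X = rand(m,n);                     % each row holds one trial of X1,...,Xn, iid uniform (0,1)
mean(max(X,[],2) <= 1/2)           % close to (1/2)^4 = 0.0625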
Quiz 5.10

The random variables Y1, . . . , Y4 have the joint PDF



fY1,...,Y4 (y1, . . . , y4) =   4   0 ≤ y1 ≤ y2 ≤ 1, 0 ≤ y3 ≤ y4 ≤ 1,        (5.86)
                                0   otherwise.
Let C denote the event that maxi Yi ≤ 1/2. Find P[C].
Quiz 5.10 Solution

We find P[C] by integrating the joint PDF over the region of interest.
Specifically,
P [C] = ∫_0^{1/2} ∫_0^{y2} ∫_0^{1/2} ∫_0^{y4} 4 dy3 dy4 dy1 dy2
      = 4 ( ∫_0^{1/2} y2 dy2 ) ( ∫_0^{1/2} y4 dy4 )
      = 4 ( (1/2) y2² |_0^{1/2} ) ( (1/2) y4² |_0^{1/2} ) = 4 (1/8)(1/8) = 1/16.        (1)
Section 5.11

Matlab
Sample Space Grids
• We start with the case when X and Y are finite random variables with ranges
SX = {x1 , . . . , xn } , SY = {y1 , . . . , ym } . (5.87)
In this case, we can take advantage of Matlab techniques for surface plots of g(x, y)
over the x, y plane.

• In Matlab, we represent SX and SY by the n element vector sx and m element


vector sy.

• The function [SX,SY]=ndgrid(sx,sy) produces the pair of n × m matrices,


   
SX =   x1 · · · x1               SY =   y1 · · · ym
       ⋮        ⋮                       ⋮        ⋮              (5.88)
       xn · · · xn                      y1 · · · ym
We refer to matrices SX and SY as a sample space grid because they are a grid
representation of the joint sample space
SX,Y = {(x, y)|x ∈ SX , y ∈ SY } . (5.89)
That is, [SX(i,j) SY(i,j)] is the pair (xi , yj ).
Probabilities on Grids
• To complete the probability model, for X and Y , in Matlab, we employ the n × m
matrix PXY such that PXY(i,j) = PX,Y (xi , yj ).

• To make sure that probabilities have been generated properly, we note that
[SX(:) SY(:) PXY(:)]
is a matrix whose rows list all possible pairs xi , yj and corresponding probabilities
PX,Y (xi , yj ).

• Given a function g(x, y) that operates on the elements of vectors x and y, the
advantage of this grid approach is that the Matlab function g(SX,SY) will calculate
g(x, y) for each x ∈ SX and y ∈ SY .

• In particular, g(SX,SY) produces an n × m matrix with i, jth element g(xi , yj ).


Example 5.24 Problem

An Internet photo developer website prints compressed photo images.


Each image file contains a variable-sized image of X × Y pixels described
by the joint PMF
PX,Y (x, y ) y = 400 y = 800 y = 1200
x = 800 0.2 0.05 0.1
(5.90)
x = 1200 0.05 0.2 0.1
x = 1600 0 0.1 0.2.
For random variables X, Y , write a script imagepmf.m that defines the
sample space grid matrices SX, SY, and PXY.
Example 5.24 Solution
In the script imagepmf.m, the matrix SX has [800 1200 1600]' for each column and SY has [400 800 1200] for each row.

%imagepmf.m
PXY=[0.2 0.05 0.1; ...
     0.05 0.2 0.1; ...
     0 0.1 0.2];                        % joint PMF PX,Y(x,y); rows x, columns y
[SX,SY]=ndgrid([800 1200 1600],...
               [400 800 1200]);         % sample space grid

After running imagepmf.m, we can inspect the variables:

>> imagepmf; SX
SX =
         800         800         800
        1200        1200        1200
        1600        1600        1600
>> SY
SY =
         400         800        1200
         400         800        1200
         400         800        1200
Example 5.25 Problem

At 24 bits (3 bytes) per pixel, a 10:1 image compression factor yields


image files with B = 0.3XY bytes. Find the expected value E[B] and the
PMF PB(b).
Example 5.25 Solution

%imagesize.m
imagepmf;
SB=0.3*(SX.*SY);            % SB(i,j) = g(x_i,y_j) = 0.3*x_i*y_j
eb=sum(sum(SB.*PXY))        % E[B] via Theorem 5.9
sb=unique(SB)'
pb=finitepmf(SB,PXY,sb)'

The script imagesize.m produces the expected value as eb, and produces the PMF, which is represented by the vectors sb and pb. The 3 × 3 matrix SB has i, jth element g(xi, yj) = 0.3 xi yj. The calculation of eb is simply a Matlab implementation of Theorem 5.9. Since some elements of SB are identical,
sb=unique(SB) extracts the unique elements. Although SB and PXY are
both 3 × 3 matrices, each is stored internally by Matlab as a 9-element
vector. Hence, we can pass SB and PXY to the finitepmf() function,
which was designed to handle a finite random variable described by a
pair of column vectors. Figure 5.7 shows one result of running the pro-
gram imagesize. The vectors sb and pb comprise PB(b). For example,
PB(288000) = 0.3.
Figure 5.7

>> imagesize
eb =
319200
sb =
96000 144000 192000 288000 384000 432000 576000
pb =
0.2000 0.0500 0.0500 0.3000 0.1000 0.1000 0.2000

Output resulting from imagesize.m in Example 5.25.


Example 5.26 Problem

Write a function xy=imagerv(m) that generates m sample pairs of the


image size random variables X, Y of Example 5.25.
Example 5.26 Solution

The function imagerv uses the imagepmf.m script to define the matrices
SX, SY, and PXY. It then calls the finiterv.m function. Here is the code
imagerv.m and a sample run:

function xy = imagerv(m)
imagepmf;
S=[SX(:) SY(:)];
xy=finiterv(S,PXY(:),m);

>> xy=imagerv(3)
xy =
         800         400
        1200         800
        1600         800
Example 5.27 Problem

Given a list xy of sample pairs of random variables X, Y with Matlab


range grids SX and SY, write a Matlab function

fxy=freqxy(xy,SX,SY)

that calculates the relative frequency of every pair x, y. The output fxy
should correspond to the matrix [SX(:) SY(:) PXY(:)].
Example 5.27 Solution

function fxy = freqxy(xy,SX,SY)
xy=[xy; SX(:) SY(:)];               % append every possible pair so none is missing
[U,I,J]=unique(xy,'rows');
N=hist(J,1:max(J))-1;               % -1 undoes the appended copy of each pair
N=N/sum(N);
fxy=[U N(:)];
fxy=sortrows(fxy,[2 1 3]);

The matrix [SX(:) SY(:)] in freqxy has rows that list all possible pairs x, y. We append this matrix to xy to ensure that the new xy has every possible pair x, y. Next, the unique function copies all unique rows of xy to the matrix U and also provides the vector J that indexes the rows of xy in U; that is, xy=U(J,:). In
addition, the number of occurrences of j in J indicates the number of
occurrences in xy of row j in U. Thus we use the hist function on
J to calculate the relative frequencies. We include the correction factor
-1 because we had appended [SX(:) SY(:)] to xy at the start. Lastly,
we reorder the rows of fxy because the output of unique produces the
rows of U in a different order from [SX(:) SY(:) PXY(:)].
Example 5.28 Problem

Generate m = 10, 000 samples of random variables X, Y of Example 5.25.


Calculate the relative frequencies and use stem3 to graph them.
Example 5.28 Solution

The script imagestem.m generates the following relative frequency stem


plot.

%imagestem.m
imagepmf;
xy=imagerv(10000);
fxy=freqxy(xy,SX,SY);
stem3(fxy(:,1),...
      fxy(:,2),fxy(:,3));
xlabel('\it x');
ylabel('\it y');

[Stem plot of the relative frequencies of the sample pairs (x, y).]
