Introduction to Econometrics
(3rd Updated Edition, Global Edition)
by
James H. Stock and Mark W. Watson

Solutions to End-of-Chapter Exercises: Chapter 2*

(This version August 17, 2014)

*Limited distribution: For Instructors Only. Answers to all odd-numbered
questions are provided to students on the textbook website. If you find errors in
the solutions, please pass them along to us at mwatson@princeton.edu.

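2.1. (a) Probability distribution of Y:

              Y = 0   Y = 1   Y = 2
Probability   0.36    0.48    0.16

(b) $\mu_Y = E(Y) = (0 \times 0.36) + (1 \times 0.48) + (2 \times 0.16) = 0.8$

Using Key Concept 2.3, $\mathrm{var}(Y) = E(Y^2) - [E(Y)]^2$, and

$E(Y^2) = (0^2 \times 0.36) + (1^2 \times 0.48) + (2^2 \times 0.16) = 1.12$,

so that

$\mathrm{var}(Y) = E(Y^2) - [E(Y)]^2 = 1.12 - (0.8)^2 = 0.48$.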
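As a numerical check on 2.1(b), the following minimal Python sketch recomputes the mean and variance from the probability distribution. The helper name `mean_and_variance` is hypothetical and not part of the textbook.

```python
def mean_and_variance(pmf):
    """Return (mean, variance) of a discrete distribution given as {value: probability}."""
    mean = sum(value * prob for value, prob in pmf.items())
    second_moment = sum(value ** 2 * prob for value, prob in pmf.items())
    # var(Y) = E(Y^2) - [E(Y)]^2
    return mean, second_moment - mean ** 2

pmf_y = {0: 0.36, 1: 0.48, 2: 0.16}
mu_y, var_y = mean_and_variance(pmf_y)
print(f"E(Y) = {mu_y:.2f}, var(Y) = {var_y:.2f}")
```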

2.2. We know from Table 2.2 that $\Pr(Y = 0) = 0.22$, $\Pr(Y = 1) = 0.78$, $\Pr(X = 0) = 0.30$, and $\Pr(X = 1) = 0.70$. So

(a)
$\mu_Y = E(Y) = 0 \times \Pr(Y = 0) + 1 \times \Pr(Y = 1) = 0 \times 0.22 + 1 \times 0.78 = 0.78$,

$\mu_X = E(X) = 0 \times \Pr(X = 0) + 1 \times \Pr(X = 1) = 0 \times 0.30 + 1 \times 0.70 = 0.70$.

(b)
$\sigma_X^2 = E[(X - \mu_X)^2]$
$= (0 - 0.70)^2 \times \Pr(X = 0) + (1 - 0.70)^2 \times \Pr(X = 1)$
$= (-0.70)^2 \times 0.30 + 0.30^2 \times 0.70 = 0.21$,

$\sigma_Y^2 = E[(Y - \mu_Y)^2]$
$= (0 - 0.78)^2 \times \Pr(Y = 0) + (1 - 0.78)^2 \times \Pr(Y = 1)$
$= (-0.78)^2 \times 0.22 + 0.22^2 \times 0.78 = 0.1716$.

(c)
$\sigma_{XY} = \mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
$= (0 - 0.70)(0 - 0.78)\Pr(X = 0, Y = 0) + (0 - 0.70)(1 - 0.78)\Pr(X = 0, Y = 1)$
$\quad + (1 - 0.70)(0 - 0.78)\Pr(X = 1, Y = 0) + (1 - 0.70)(1 - 0.78)\Pr(X = 1, Y = 1)$
$= (-0.70) \times (-0.78) \times 0.15 + (-0.70) \times 0.22 \times 0.15 + 0.30 \times (-0.78) \times 0.07 + 0.30 \times 0.22 \times 0.63$
$= 0.084$,

$\mathrm{corr}(X, Y) = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{0.084}{\sqrt{0.21 \times 0.1716}} = 0.4425$.
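The sums in 2.2 can be checked mechanically. The sketch below is one possible way to do so, assuming the joint probabilities that appear in the covariance sum of part (c); the function name `joint_summary` is hypothetical.

```python
import math

# Joint probabilities Pr(X = x, Y = y), keyed by (x, y), as used in 2.2(c)
joint = {(0, 0): 0.15, (0, 1): 0.15, (1, 0): 0.07, (1, 1): 0.63}

def joint_summary(joint):
    """Means, variances, covariance and correlation of a discrete joint distribution."""
    mu_x = sum(x * p for (x, _), p in joint.items())
    mu_y = sum(y * p for (_, y), p in joint.items())
    var_x = sum((x - mu_x) ** 2 * p for (x, _), p in joint.items())
    var_y = sum((y - mu_y) ** 2 * p for (_, y), p in joint.items())
    cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
    corr = cov / math.sqrt(var_x * var_y)
    return mu_x, mu_y, var_x, var_y, cov, corr

mu_x, mu_y, var_x, var_y, cov_xy, corr_xy = joint_summary(joint)
print(round(cov_xy, 4), round(corr_xy, 4))
```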

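2.3.

(a) $E(V) = E(11 - 2Y) = 11 - 2E(Y) = 11 - 2 \times 0.78 = 9.44$

$E(W) = E(4 + 8X) = 4 + 8E(X) = 4 + 8 \times 0.70 = 9.6$

(b) $\sigma_W^2 = \mathrm{var}(4 + 8X) = 8^2 \sigma_X^2 = 64 \times 0.21 = 13.44$

$\sigma_V^2 = \mathrm{var}(11 - 2Y) = (-2)^2 \sigma_Y^2 = 4 \times 0.1716 = 0.6864$

(c) $\sigma_{WV} = \mathrm{cov}(4 + 8X, 11 - 2Y) = 8 \times (-2) \times \mathrm{cov}(X, Y) = -16 \times 0.084 = -1.344$

$\mathrm{corr}(W, V) = \frac{\sigma_{WV}}{\sigma_W \sigma_V} = \frac{-1.344}{\sqrt{13.44 \times 0.6864}} = -0.4425$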

2.4. (a) $E(X^3) = 0^3 \times (1 - p) + 1^3 \times p = p$
(b) $E(X^k) = 0^k \times (1 - p) + 1^k \times p = p$
(c) $E(X) = 0.3$

$\mathrm{var}(X) = E(X^2) - [E(X)]^2 = 0.3 - 0.09 = 0.21$

Thus, $\sigma = \sqrt{0.21} = 0.46$.
To compute the skewness, use the formula from exercise 2.21:

$E(X - \mu)^3 = E(X^3) - 3[E(X^2)][E(X)] + 2[E(X)]^3$
$= 0.3 - 3 \times 0.3^2 + 2 \times 0.3^3 = 0.084$
Alternatively, $E(X - \mu)^3 = [(1 - 0.3)^3 \times 0.3] + [(0 - 0.3)^3 \times 0.7] = 0.084$
Thus, skewness $= E(X - \mu)^3/\sigma^3 = 0.084/0.21^{3/2} = 0.87$.
To compute the kurtosis, use the formula from exercise 2.21:

$E(X - \mu)^4 = E(X^4) - 4[E(X)][E(X^3)] + 6[E(X)]^2[E(X^2)] - 3[E(X)]^4$
$= 0.3 - 4 \times 0.3^2 + 6 \times 0.3^3 - 3 \times 0.3^4 = 0.0777$
Alternatively, $E(X - \mu)^4 = [(1 - 0.3)^4 \times 0.3] + [(0 - 0.3)^4 \times 0.7] = 0.0777$
Thus, kurtosis is $E(X - \mu)^4/\sigma^4 = 0.0777/0.21^2 = 1.76$.
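The "alternatively" route in 2.4(c), which sums over the two outcomes directly, is easy to sketch in Python. The function name `bernoulli_shape` is hypothetical; the code assumes X is Bernoulli with Pr(X = 1) = p.

```python
def bernoulli_shape(p):
    """Skewness and kurtosis of a Bernoulli(p) variable from its central moments."""
    variance = p * (1 - p)
    # Central moments: sum over the outcomes X = 1 (prob p) and X = 0 (prob 1 - p)
    third = (1 - p) ** 3 * p + (0 - p) ** 3 * (1 - p)
    fourth = (1 - p) ** 4 * p + (0 - p) ** 4 * (1 - p)
    return third / variance ** 1.5, fourth / variance ** 2

skewness, kurtosis = bernoulli_shape(0.3)
print(round(skewness, 2), round(kurtosis, 2))
```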

2.5. Let X denote temperature in °F and Y denote temperature in °C. Recall that Y = 0 when X = 32 and
Y = 100 when X = 212. This implies $Y = (100/180) \times (X - 32)$ or $Y = -17.78 + (5/9) \times X$. Using Key
Concept 2.3, $\mu_X = 65$°F implies that $\mu_Y = -17.78 + (5/9) \times 65 = 18.33$°C, and $\sigma_X = 5$°F implies
$\sigma_Y = (5/9) \times 5 = 2.78$°C.

2.6.

The table shows that Pr(X = 0, Y = 0) = 0.078, Pr(X = 0, Y = 1) = 0.673,

Pr(X = 1, Y = 0) = 0.042, Pr(X = 1, Y = 1) = 0.207, Pr(X = 0) = 0.751,
Pr(X = 1) = 0.249, Pr(Y = 0) = 0.12, Pr(Y = 1) = 0.88.
(a)

$E(Y) = \mu_Y = 0 \times \Pr(Y = 0) + 1 \times \Pr(Y = 1) = 0 \times 0.12 + 1 \times 0.88 = 0.88$.

(b) Unemployment rate $= \frac{\#(\text{unemployed})}{\#(\text{labor force})} = \Pr(Y = 0) = 0.12$

(c) Calculate the conditional probabilities first:

$\Pr(Y = 0 | X = 0) = \frac{\Pr(X = 0, Y = 0)}{\Pr(X = 0)} = \frac{0.078}{0.751} = 0.104$,
$\Pr(Y = 1 | X = 0) = \frac{\Pr(X = 0, Y = 1)}{\Pr(X = 0)} = \frac{0.673}{0.751} = 0.896$,
$\Pr(Y = 0 | X = 1) = \frac{\Pr(X = 1, Y = 0)}{\Pr(X = 1)} = \frac{0.042}{0.249} = 0.169$,
$\Pr(Y = 1 | X = 1) = \frac{\Pr(X = 1, Y = 1)}{\Pr(X = 1)} = \frac{0.207}{0.249} = 0.831$.

The conditional expectations are

$E(Y | X = 1) = 0 \times \Pr(Y = 0 | X = 1) + 1 \times \Pr(Y = 1 | X = 1) = 0 + 1 \times 0.831 = 0.831$,
$E(Y | X = 0) = 0 \times \Pr(Y = 0 | X = 0) + 1 \times \Pr(Y = 1 | X = 0) = 0 + 1 \times 0.896 = 0.896$.
(d) Use the solution to part (c),
Unemployment rate for college graduates = 1 − E(Y|X = 1) = 1 − 0.831 = 0.169
Unemployment rate for non-college graduates = 1 − E(Y|X = 0) = 1 − 0.896 = 0.104.
(e) The probability that a randomly selected worker who is reported being unemployed is a
college graduate is

$\Pr(X = 1 | Y = 0) = \frac{\Pr(X = 1, Y = 0)}{\Pr(Y = 0)} = \frac{0.042}{0.12} = 0.35$.

The probability that this worker is a non-college graduate is

$\Pr(X = 0 | Y = 0) = 1 - \Pr(X = 1 | Y = 0) = 1 - 0.35 = 0.65$.

(f) Educational achievement and employment status are not independent because they do not
satisfy the condition that, for all values of x and y,

$\Pr(X = x | Y = y) = \Pr(X = x)$.

For example, from part (e) Pr(X = 0|Y = 0) = 0.65 while from the table Pr(X = 0) = 0.751.
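The conditional probabilities in 2.6 all come from dividing a joint probability by a marginal. A minimal Python sketch of that step follows; the function names are hypothetical, and the joint probabilities are those quoted from the table at the start of the solution.

```python
# Joint probabilities Pr(X = x, Y = y), keyed by (x, y)
joint = {(0, 0): 0.078, (0, 1): 0.673, (1, 0): 0.042, (1, 1): 0.207}

def marginal_x(joint, x):
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(joint, y):
    return sum(p for (_, yi), p in joint.items() if yi == y)

def pr_y_given_x(joint, y, x):
    """Pr(Y = y | X = x) = Pr(X = x, Y = y) / Pr(X = x)."""
    return joint[(x, y)] / marginal_x(joint, x)

def pr_x_given_y(joint, x, y):
    """Pr(X = x | Y = y) = Pr(X = x, Y = y) / Pr(Y = y)."""
    return joint[(x, y)] / marginal_y(joint, y)

unemp_college = pr_y_given_x(joint, y=0, x=1)        # part (d)
unemp_noncollege = pr_y_given_x(joint, y=0, x=0)     # part (d)
college_given_unemp = pr_x_given_y(joint, x=1, y=0)  # part (e)
print(round(unemp_college, 3), round(unemp_noncollege, 3), round(college_given_unemp, 2))
```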

2.7.

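Using obvious notation, $C = M + F$; thus $\mu_C = \mu_M + \mu_F$ and $\sigma_C^2 = \sigma_M^2 + \sigma_F^2 + 2\,\mathrm{cov}(M, F)$. This
implies
(a) $\mu_C = 50 + 48 = 98$, that is, \$98,000 per year.
(b) $\mathrm{corr}(M, F) = \frac{\mathrm{cov}(M, F)}{\sigma_M \sigma_F}$, so that $\mathrm{cov}(M, F) = \sigma_M \sigma_F\,\mathrm{corr}(M, F)$. Thus $\mathrm{cov}(M, F) =
15 \times 13 \times 0.9 = 175.50$, where the units are squared thousands of dollars per year.
(c) $\sigma_C^2 = \sigma_M^2 + \sigma_F^2 + 2\,\mathrm{cov}(M, F)$,
so that $\sigma_C^2 = 15^2 + 13^2 + 2(175.5) = 745$ and $\sigma_C = \sqrt{745} = 27.295$ thousand dollars per year.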

(d) First you need to look up the current euro/dollar exchange rate in the Wall Street Journal, the
Federal Reserve web page, or other financial data outlet. Suppose that this exchange rate is e
(say e = 0.75 euros per dollar); each dollar is therefore worth e euros. The mean is therefore
$e \times \mu_C$ (in units of thousands of euros per year), and the standard deviation is $e \times \sigma_C$ (in units
of thousands of euros per year). The correlation is unit-free, and is unchanged.
2.8.

$\mu_Z = E(3(Y - 4)) = 3(E(Y) - 4) = 3(4 - 4) = 0$

$\sigma_Z^2 = \mathrm{var}(3(Y - 4)) = 9\,\mathrm{var}(Y) = 9 \times \frac{1}{9} = 1$

2.9.

                              Value of Y                   Probability
                   2      4      6      8      10          distribution of X
Value of X    3    0.04   0.09   0.03   0.12   0.01        0.29
              6    0.10   0.06   0.15   0.03   0.02        0.36
              9    0.13   0.11   0.04   0.06   0.01        0.35
Probability
distribution of Y  0.27   0.26   0.22   0.21   0.04        1.00

(a) The probability distribution is given in the table above.

$E(Y) = 2(0.27) + 4(0.26) + 6(0.22) + 8(0.21) + 10(0.04) = 4.98$.

$E(Y^2) = 2^2(0.27) + 4^2(0.26) + 6^2(0.22) + 8^2(0.21) + 10^2(0.04) = 30.6$.

$\mathrm{Var}(Y) = E(Y^2) - [E(Y)]^2 = 30.6 - 4.98^2 = 5.7996$.
(b) The conditional probability distribution of Y given X = 6 is given in the table below.

Value of Y           2           4           6           8           10
Pr(Y = y | X = 6)    0.10/0.36   0.06/0.36   0.15/0.36   0.03/0.36   0.02/0.36

$E(Y | X = 6) = 2(0.10/0.36) + 4(0.06/0.36) + 6(0.15/0.36) + 8(0.03/0.36) + 10(0.02/0.36) = 4.944$
$E(Y^2 | X = 6) = 2^2(0.10/0.36) + 4^2(0.06/0.36) + 6^2(0.15/0.36) + 8^2(0.03/0.36) + 10^2(0.02/0.36) = 29.667$
$\mathrm{Var}(Y | X = 6) = E(Y^2 | X = 6) - [E(Y | X = 6)]^2 = 29.667 - 4.944^2 = 5.219$.
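(c)

$E(XY) = 3 \times 2 \times 0.04 + 3 \times 4 \times 0.09 + \cdots + 9 \times 10 \times 0.01 = 29.4$

$\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = 29.4 - 6.18 \times 4.98 = -1.3764$
With $\mathrm{var}(X) = 5.7276$ and $\mathrm{var}(Y) = E(Y^2) - [E(Y)]^2 = 30.6 - 4.98^2 = 5.7996$,
$\mathrm{Corr}(X, Y) = \mathrm{cov}(X, Y)/\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)} = -1.3764/\sqrt{5.7276 \times 5.7996} = -0.2388$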
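The moments in 2.9 can be recomputed directly from the joint table. The following is a minimal sketch rather than a prescribed method; the helper `expectation` is a hypothetical name, and the table entries are those of the joint distribution above.

```python
import math

x_values = (3, 6, 9)
y_values = (2, 4, 6, 8, 10)
# Rows follow x_values, columns follow y_values: Pr(X = x, Y = y)
joint = (
    (0.04, 0.09, 0.03, 0.12, 0.01),
    (0.10, 0.06, 0.15, 0.03, 0.02),
    (0.13, 0.11, 0.04, 0.06, 0.01),
)

def expectation(func):
    """E[func(X, Y)] under the joint distribution."""
    return sum(func(x, y) * joint[i][j]
               for i, x in enumerate(x_values)
               for j, y in enumerate(y_values))

e_x = expectation(lambda x, y: x)
e_y = expectation(lambda x, y: y)
var_x = expectation(lambda x, y: x ** 2) - e_x ** 2
var_y = expectation(lambda x, y: y ** 2) - e_y ** 2
cov_xy = expectation(lambda x, y: x * y) - e_x * e_y
# Correlation divides by the product of standard deviations, not variances
corr_xy = cov_xy / math.sqrt(var_x * var_y)
print(round(e_y, 2), round(var_y, 4), round(cov_xy, 4), round(corr_xy, 4))
```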

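2.10.

Using $\frac{Y - \mu_Y}{\sigma_Y} \sim N(0, 1)$ and Appendix Table 1, we have

(a) $\Pr(Y \le 5) = \Pr\left(\frac{Y - 4}{3} \le \frac{5 - 4}{3}\right) = \Phi\left(\frac{1}{3}\right) = 0.6304$

(b) $\Pr(Y > 2) = 1 - \Pr(Y \le 2) = 1 - \Phi\left(-\frac{1}{4}\right) = \Phi\left(\frac{1}{4}\right) = 0.5987$

(c) $\Pr(2 \le Y \le 5) = \Pr\left(\frac{2 - 1}{2} \le \frac{Y - 1}{2} \le \frac{5 - 1}{2}\right) = \Phi(2) - \Phi\left(\frac{1}{2}\right) = 0.2857$

(d) $\Pr(1 \le Y \le 4) = \Pr\left(\frac{1 - 2}{1} \le \frac{Y - 2}{1} \le \frac{4 - 2}{1}\right) = \Phi(2) - \Phi(-1) = \Phi(2) - (1 - \Phi(1)) = 0.8185$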

2.11.

(a) 0.90
(b) 0.95
(c) 0.95
(d) When $Y \sim \chi^2_8$, then $Y/8 \sim F_{8,\infty}$.

2.12.

(a) 0.90
(b) 0.90
(c) 0.9108
(d) The $t_{df}$ distribution and N(0, 1) approach each other only when n becomes large, that is,
when the degrees of freedom become large.
(e) 0.95
(f) 0.01

2.13.

(a) $E(Y^2) = \mathrm{Var}(Y) + \mu_Y^2 = 4 + 0 = 4$;

$E(W^2) = \mathrm{Var}(W) + \mu_W^2 = 16 + 0 = 16$

(b) Y and W are symmetric around 0, thus skewness is equal to 0; because their mean is zero,
this means that the third moment is zero.
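(c) The kurtosis of the normal is 3, so $3 = \frac{E(Z - \mu_Z)^4}{\sigma_Z^4}$ for any normal random variable Z. Because Y and W
have mean zero, $E(Y^4) = 3\sigma_Y^4 = 3 \times 4^2$ and $E(W^4) = 3\sigma_W^4 = 3 \times 16^2$.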

(d) First, condition on X = 0, so that S = W:

$E(S | X = 0) = 0$; $E(S^2 | X = 0) = 16$; $E(S^3 | X = 0) = 0$; $E(S^4 | X = 0) = 3 \times 16^2$

Similarly,

$E(S | X = 1) = 0$; $E(S^2 | X = 1) = 4$; $E(S^3 | X = 1) = 0$; $E(S^4 | X = 1) = 3 \times 4^2$

From the law of iterated expectations

$E(S) = E(S | X = 0) \times \Pr(X = 0) + E(S | X = 1) \times \Pr(X = 1) = 0$
$E(S^2) = E(S^2 | X = 0) \times \Pr(X = 0) + E(S^2 | X = 1) \times \Pr(X = 1) = 16 \times 0.1 + 4 \times 0.9 = 5.2$
$E(S^3) = E(S^3 | X = 0) \times \Pr(X = 0) + E(S^3 | X = 1) \times \Pr(X = 1) = 0$
$E(S^4) = E(S^4 | X = 0) \times \Pr(X = 0) + E(S^4 | X = 1) \times \Pr(X = 1) = 3 \times 16^2 \times 0.1 + 3 \times 4^2 \times 0.9 = 120$

(e) $\mu_S = E(S) = 0$, thus $E(S - \mu_S)^3 = E(S^3) = 0$, from part (d). Thus skewness = 0. Similarly,
$\sigma_S^2 = E(S - \mu_S)^2 = E(S^2) = 5.2$, and $E(S - \mu_S)^4 = E(S^4) = 120$.
Thus, kurtosis $= 120/(5.2^2) = 4.44$.
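The iterated-expectations step in 2.13(d) and (e) can be sketched numerically. The function below is a hypothetical helper that assumes, as in the solution, that S equals a mean-zero normal with variance 16 with probability 0.1 and a mean-zero normal with variance 4 with probability 0.9.

```python
def mixture_moments(prob0, var0, var1):
    """Second moment, fourth moment and kurtosis of a two-component mixture of
    mean-zero normals, via the law of iterated expectations."""
    prob1 = 1 - prob0
    # For a mean-zero normal with variance v: E(Z^2) = v and E(Z^4) = 3 v^2
    second = prob0 * var0 + prob1 * var1
    fourth = prob0 * 3 * var0 ** 2 + prob1 * 3 * var1 ** 2
    return second, fourth, fourth / second ** 2

second_s, fourth_s, kurtosis_s = mixture_moments(0.1, 16, 4)
print(round(second_s, 2), round(fourth_s, 2), round(kurtosis_s, 2))
```

The kurtosis exceeds 3 even though each component is normal, which is the point of the exercise.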

2.14.

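The central limit theorem suggests that when the sample size ($n$) is large, the distribution of the
sample average ($\bar{Y}$) is approximately $N(\mu_Y, \sigma_{\bar{Y}}^2)$ with $\sigma_{\bar{Y}}^2 = \frac{\sigma_Y^2}{n}$. Given $\mu_Y = 50$ and $\sigma_Y^2 = 21$,

(a) With $n = 100$, $\sigma_{\bar{Y}}^2 = \frac{\sigma_Y^2}{n} = \frac{21}{100} = 0.21$, and

$\Pr(\bar{Y} \le 51) = \Pr\left(\frac{\bar{Y} - 50}{\sqrt{0.21}} \le \frac{51 - 50}{\sqrt{0.21}}\right) \approx \Phi(2.1822) = 0.9853$

(b) With $n = 150$, $\sigma_{\bar{Y}}^2 = \frac{\sigma_Y^2}{n} = \frac{21}{150} = 0.14$, and

$\Pr(\bar{Y} > 49) = 1 - \Pr(\bar{Y} \le 49) = 1 - \Pr\left(\frac{\bar{Y} - 50}{\sqrt{0.14}} \le \frac{49 - 50}{\sqrt{0.14}}\right) \approx 1 - \Phi(-2.6726) = 0.9962$

(c) With $n = 45$, $\sigma_{\bar{Y}}^2 = \frac{\sigma_Y^2}{n} = \frac{21}{45} = 0.467$, and

$\Pr(50.5 \le \bar{Y} \le 51) = \Pr\left(\frac{50.5 - 50}{\sqrt{0.467}} \le \frac{\bar{Y} - 50}{\sqrt{0.467}} \le \frac{51 - 50}{\sqrt{0.467}}\right) \approx \Phi(1.4633) - \Phi(0.7317)$
$= 0.9283 - 0.7678 = 0.1605$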
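2.15. (a)

$\Pr(9.6 \le \bar{Y} \le 10.4) = \Pr\left(\frac{9.6 - 10}{\sqrt{4/n}} \le \frac{\bar{Y} - 10}{\sqrt{4/n}} \le \frac{10.4 - 10}{\sqrt{4/n}}\right) = \Pr\left(\frac{9.6 - 10}{\sqrt{4/n}} \le Z \le \frac{10.4 - 10}{\sqrt{4/n}}\right)$

where Z ~ N(0, 1). Thus,

(i) n = 20; $\Pr\left(\frac{9.6 - 10}{\sqrt{4/n}} \le Z \le \frac{10.4 - 10}{\sqrt{4/n}}\right) = \Pr(-0.89 \le Z \le 0.89) = 0.63$
(ii) n = 100; $\Pr\left(\frac{9.6 - 10}{\sqrt{4/n}} \le Z \le \frac{10.4 - 10}{\sqrt{4/n}}\right) = \Pr(-2.00 \le Z \le 2.00) = 0.954$
(iii) n = 1000; $\Pr\left(\frac{9.6 - 10}{\sqrt{4/n}} \le Z \le \frac{10.4 - 10}{\sqrt{4/n}}\right) = \Pr(-6.32 \le Z \le 6.32) = 1.000$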
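(b)

$\Pr(10 - c \le \bar{Y} \le 10 + c) = \Pr\left(\frac{-c}{\sqrt{4/n}} \le \frac{\bar{Y} - 10}{\sqrt{4/n}} \le \frac{c}{\sqrt{4/n}}\right) = \Pr\left(\frac{-c}{\sqrt{4/n}} \le Z \le \frac{c}{\sqrt{4/n}}\right)$.

As n gets large, $\frac{c}{\sqrt{4/n}}$ gets large, and the probability converges to 1.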

(c) This follows from (b) and the definition of convergence in probability given in
Key Concept 2.6.
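The convergence described in 2.15 can be illustrated by evaluating the same probability for increasing n. This is a minimal sketch assuming $\mu_Y = 10$, $\sigma_Y^2 = 4$, and c = 0.4 as in part (a); the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within(c, var, n):
    """Pr(mu - c <= Ybar <= mu + c) when Ybar ~ N(mu, var / n)."""
    z = c / math.sqrt(var / n)
    return normal_cdf(z) - normal_cdf(-z)

probabilities = {n: prob_within(0.4, 4, n) for n in (20, 100, 1000)}
for n, prob in probabilities.items():
    print(n, round(prob, 3))
```

The printed probabilities increase toward 1 as n grows, which is the behavior part (b) establishes for any fixed c > 0.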

2.16.

There are several ways to do this. Here is one way. Generate $n$ draws of $Y$: $Y_1, Y_2, \ldots, Y_n$. Let $X_i = 1$
if $Y_i < 3.6$; otherwise set $X_i = 0$. Notice that $X_i$ is a Bernoulli random variable with $\mu_X = \Pr(X = 1) =
\Pr(Y < 3.6)$. Compute $\bar{X}$. Because $\bar{X}$ converges in probability to $\mu_X = \Pr(X = 1) = \Pr(Y < 3.6)$,
$\bar{X}$ will be an accurate approximation if $n$ is large.
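The simulation recipe in 2.16 might look like the following in Python. The distribution of Y is not restated in the solution, so the normal distribution with mean 5 and standard deviation 10 used here is purely an assumed example, and `simulate_prob_below` is a hypothetical helper name.

```python
import random

def simulate_prob_below(draw, cutoff, n, seed=0):
    """Estimate Pr(Y < cutoff) as the sample mean of X_i = 1{Y_i < cutoff}."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if draw(rng) < cutoff)
    return hits / n

# Assumed example distribution (not taken from the solution text): Y ~ N(5, 10^2)
estimate = simulate_prob_below(lambda rng: rng.gauss(5, 10), 3.6, n=200_000)
print(round(estimate, 3))
```

Any other distribution can be substituted by passing a different `draw` function; the accuracy of the estimate improves as n grows.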

2.17.

= Pr
0.24
0.24

n
n

Y 0.6

0.04
Y 0.6
0.2157
= Pr

= Pr
0.577=
0.24
0.24

0.24

n
n
n

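(ii) $P(\bar{Y} \le 0.56) = \Pr\left(\frac{\bar{Y} - 0.6}{\sqrt{0.24/n}} \le \frac{0.56 - 0.6}{\sqrt{0.24/n}}\right) = \Pr\left(\frac{\bar{Y} - 0.6}{\sqrt{0.24/n}} \le -1.157\right) = 0.12$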

(b) We know $\Pr(-1.96 \le Z \le 1.96) = 0.95$, thus we want n to satisfy

=
0.61

$\frac{0.65 - 0.60}{\sqrt{0.24/n}} > 1.96$ and $\frac{0.55 - 0.60}{\sqrt{0.24/n}} < -1.96$. Solving these inequalities yields $n > 368.8$, that is, $n \ge 369$.

2.18.

$\Pr(Y = \$0) = 0.95$, $\Pr(Y = \$30{,}000) = 0.05$.

(a) The mean of Y is

$\mu_Y = 0 \times \Pr(Y = \$0) + 30{,}000 \times \Pr(Y = \$30{,}000) = \$1{,}500$.

The variance of Y is

$\sigma_Y^2 = E[(Y - \mu_Y)^2]$
$= (0 - 1500)^2 \times \Pr(Y = \$0) + (30{,}000 - 1500)^2 \times \Pr(Y = \$30{,}000)$
$= (-1500)^2 \times 0.95 + (28{,}500)^2 \times 0.05 = 4.275 \times 10^7$,

so the standard deviation of Y is $\sigma_Y = (4.275 \times 10^7)^{1/2} = \$6{,}538.35$.

(b) $\sigma_{\bar{Y}}^2 = \frac{\sigma_Y^2}{n} = \frac{4.275 \times 10^7}{120} = 356{,}250$.

Using the central limit theorem,

$\Pr(\bar{Y} > 3000) = 1 - \Pr(\bar{Y} \le 3000)$
$= 1 - \Pr\left(\frac{\bar{Y} - 1500}{\sqrt{356{,}250}} \le \frac{3000 - 1500}{\sqrt{356{,}250}}\right)$
$\approx 1 - \Phi(2.5131) = 1 - 0.9940 = 0.0060$.

2.19. (a) $\Pr(Y = y_j) = \sum_{i=1}^{l} \Pr(X = x_i, Y = y_j) = \sum_{i=1}^{l} \Pr(Y = y_j | X = x_i)\Pr(X = x_i)$

(b) $E(Y) = \sum_{j=1}^{k} y_j \Pr(Y = y_j) = \sum_{j=1}^{k} y_j \sum_{i=1}^{l} \Pr(Y = y_j | X = x_i)\Pr(X = x_i)$
$= \sum_{i=1}^{l} \left(\sum_{j=1}^{k} y_j \Pr(Y = y_j | X = x_i)\right)\Pr(X = x_i)$
$= \sum_{i=1}^{l} E(Y | X = x_i)\Pr(X = x_i)$.

(c) When X and Y are independent,

$\Pr(X = x_i, Y = y_j) = \Pr(X = x_i)\Pr(Y = y_j)$,

so

$\sigma_{XY} = E[(X - \mu_X)(Y - \mu_Y)]$
$= \sum_{i=1}^{l}\sum_{j=1}^{k} (x_i - \mu_X)(y_j - \mu_Y)\Pr(X = x_i, Y = y_j)$
$= \sum_{i=1}^{l}\sum_{j=1}^{k} (x_i - \mu_X)(y_j - \mu_Y)\Pr(X = x_i)\Pr(Y = y_j)$
$= \left(\sum_{i=1}^{l} (x_i - \mu_X)\Pr(X = x_i)\right)\left(\sum_{j=1}^{k} (y_j - \mu_Y)\Pr(Y = y_j)\right)$
$= E(X - \mu_X)\,E(Y - \mu_Y) = 0 \times 0 = 0$,

$\mathrm{corr}(X, Y) = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{0}{\sigma_X \sigma_Y} = 0$.

2015 Pearson Education, Ltd.

2.20. (a) $\Pr(Y = y_i) = \sum_{j=1}^{l}\sum_{h=1}^{m} \Pr(Y = y_i | X = x_j, Z = z_h)\Pr(X = x_j, Z = z_h)$

(b) $E(Y) = \sum_{i=1}^{k} y_i \Pr(Y = y_i)$
$= \sum_{i=1}^{k} y_i \sum_{j=1}^{l}\sum_{h=1}^{m} \Pr(Y = y_i | X = x_j, Z = z_h)\Pr(X = x_j, Z = z_h)$
$= \sum_{j=1}^{l}\sum_{h=1}^{m} \left[\sum_{i=1}^{k} y_i \Pr(Y = y_i | X = x_j, Z = z_h)\right]\Pr(X = x_j, Z = z_h)$
$= \sum_{j=1}^{l}\sum_{h=1}^{m} E(Y | X = x_j, Z = z_h)\Pr(X = x_j, Z = z_h)$

where the first line is the definition of the mean, the second uses (a), the third is a
rearrangement, and the final line uses the definition of the conditional
expectation.

2.21.

(a) $E(X - \mu)^3 = E[(X - \mu)^2(X - \mu)] = E[X^3 - 2X^2\mu + X\mu^2 - X^2\mu + 2X\mu^2 - \mu^3]$
$= E(X^3) - 3E(X^2)\mu + 3E(X)\mu^2 - \mu^3 = E(X^3) - 3E(X^2)E(X) + 3E(X)[E(X)]^2 - [E(X)]^3$
$= E(X^3) - 3E(X^2)E(X) + 2E(X)^3$

(b) $E(X - \mu)^4 = E[(X^3 - 3X^2\mu + 3X\mu^2 - \mu^3)(X - \mu)]$
$= E[X^4 - 3X^3\mu + 3X^2\mu^2 - X\mu^3 - X^3\mu + 3X^2\mu^2 - 3X\mu^3 + \mu^4]$
$= E(X^4) - 4E(X^3)E(X) + 6E(X^2)E(X)^2 - 4E(X)E(X)^3 + E(X)^4$
$= E(X^4) - 4[E(X)][E(X^3)] + 6[E(X)]^2[E(X^2)] - 3[E(X)]^4$

2.22.

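The mean and variance of R are given by

$\mu = w \times 0.08 + (1 - w) \times 0.05$

$\sigma^2 = w^2 \times 0.07^2 + (1 - w)^2 \times 0.04^2 + 2 \times w \times (1 - w) \times [0.07 \times 0.04 \times 0.25]$

where $0.07 \times 0.04 \times 0.25 = \mathrm{Cov}(R_s, R_b)$ follows from the definition of the correlation between $R_s$
and $R_b$.
(a) $\mu = 0.065$; $\sigma = 0.044$
(b) $\mu = 0.0725$; $\sigma = 0.056$
(c) $w = 1$ maximizes $\mu$; $\sigma = 0.07$ for this value of w.
(d) The derivative of $\sigma^2$ with respect to w is

$\frac{d\sigma^2}{dw} = 2w \times 0.07^2 - 2(1 - w) \times 0.04^2 + (2 - 4w) \times [0.07 \times 0.04 \times 0.25]$
$= 0.0102w - 0.0018$
Solving for w yields $w = 18/102 = 0.18$. (Notice that the second derivative is positive, so that
this is the global minimum.) With $w = 0.18$, $\sigma_R = 0.038$.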
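The portfolio formulas in 2.22 lend themselves to a short numerical check. The sketch below assumes the parameter values used in the solution (means 0.08 and 0.05, standard deviations 0.07 and 0.04, correlation 0.25); the function names are hypothetical, and the closed-form minimizer comes from setting the derivative in part (d) to zero.

```python
import math

def portfolio_moments(w, mu_s=0.08, mu_b=0.05, sd_s=0.07, sd_b=0.04, rho=0.25):
    """Mean and standard deviation of R = w * Rs + (1 - w) * Rb."""
    cov = rho * sd_s * sd_b
    mean = w * mu_s + (1 - w) * mu_b
    variance = w ** 2 * sd_s ** 2 + (1 - w) ** 2 * sd_b ** 2 + 2 * w * (1 - w) * cov
    return mean, math.sqrt(variance)

def min_variance_weight(sd_s=0.07, sd_b=0.04, rho=0.25):
    """Weight w that sets the derivative of the variance with respect to w to zero."""
    cov = rho * sd_s * sd_b
    return (sd_b ** 2 - cov) / (sd_s ** 2 + sd_b ** 2 - 2 * cov)

mean_half, sd_half = portfolio_moments(0.5)   # part (a)
mean_75, sd_75 = portfolio_moments(0.75)      # part (b)
w_star = min_variance_weight()                # part (d)
_, sd_star = portfolio_moments(w_star)
print(round(w_star, 4), round(sd_star, 3))
```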

2.23.

X and Z are two independently distributed standard normal random variables, so

$\mu_X = \mu_Z = 0$, $\sigma_X^2 = \sigma_Z^2 = 1$, $\sigma_{XZ} = 0$.
(a) Because of the independence between X and Z, $\Pr(Z \le z | X = x) = \Pr(Z \le z)$, and
$E(Z | X) = E(Z) = 0$. Thus $E(Y | X) = E(X^2 + Z | X) = E(X^2 | X) + E(Z | X) = X^2 + 0 = X^2$.

(b) $E(X^2) = \sigma_X^2 + \mu_X^2 = 1$, and $\mu_Y = E(X^2 + Z) = E(X^2) + \mu_Z = 1 + 0 = 1$.
(c) $E(XY) = E(X^3 + ZX) = E(X^3) + E(ZX)$. Using the fact that the odd moments of a standard
normal random variable are all zero, we have $E(X^3) = 0$. Using the independence between
X and Z, we have $E(ZX) = \mu_Z \mu_X = 0$. Thus $E(XY) = E(X^3) + E(ZX) = 0$.

(d)
$\mathrm{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[(X - 0)(Y - 1)]$
$= E(XY - X) = E(XY) - E(X)$
$= 0 - 0 = 0$,

$\mathrm{corr}(X, Y) = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{0}{\sigma_X \sigma_Y} = 0$.

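2.24. (a) $E(Y_i^2) = \sigma^2 + \mu^2 = \sigma^2$ and the result follows directly.

(b) $(Y_i/\sigma)$ is distributed i.i.d. N(0, 1), $W = \sum_{i=1}^{n} (Y_i/\sigma)^2$, and the result follows from the
definition of a $\chi^2_n$ random variable.

(c) $E(W) = E\left(\sum_{i=1}^{n} \frac{Y_i^2}{\sigma^2}\right) = \sum_{i=1}^{n} E\left(\frac{Y_i^2}{\sigma^2}\right) = n$.

(d) Write

$V = \frac{Y_1}{\sqrt{\frac{\sum_{i=2}^{n} Y_i^2}{n - 1}}} = \frac{Y_1/\sigma}{\sqrt{\frac{\sum_{i=2}^{n} (Y_i/\sigma)^2}{n - 1}}}$

which follows from dividing the numerator and denominator by $\sigma$. $Y_1/\sigma \sim N(0, 1)$,
$\sum_{i=2}^{n} (Y_i/\sigma)^2 \sim \chi^2_{n-1}$, and $Y_1/\sigma$ and $\sum_{i=2}^{n} (Y_i/\sigma)^2$ are independent. The result then
follows from the definition of the t distribution.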

2.25.

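(a) $\sum_{i=1}^{n} a x_i = (a x_1 + a x_2 + a x_3 + \cdots + a x_n) = a(x_1 + x_2 + x_3 + \cdots + x_n) = a \sum_{i=1}^{n} x_i$

(b) $\sum_{i=1}^{n} (x_i + y_i) = (x_1 + y_1 + x_2 + y_2 + \cdots + x_n + y_n)$
$= (x_1 + x_2 + \cdots + x_n) + (y_1 + y_2 + \cdots + y_n)$
$= \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$

(c) $\sum_{i=1}^{n} a = (a + a + a + \cdots + a) = na$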

(d)

(a bx
i 1

i 1

i 1

i 1

i 1

i 1

i 1

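2.26. (a) $\mathrm{corr}(Y_i, Y_j) = \frac{\mathrm{cov}(Y_i, Y_j)}{\sigma_{Y_i}\sigma_{Y_j}} = \frac{\mathrm{cov}(Y_i, Y_j)}{\sigma_Y \sigma_Y} = \frac{\mathrm{cov}(Y_i, Y_j)}{\sigma_Y^2} = \rho$, where the first equality
uses the definition of correlation, the second uses the fact that $Y_i$ and $Y_j$ have the same
variance (and standard deviation), the third equality uses the definition of standard
deviation, and the fourth uses the correlation given in the problem. Solving for
$\mathrm{cov}(Y_i, Y_j)$ from the last equality gives the desired result.

(b) $\bar{Y} = \frac{1}{2}Y_1 + \frac{1}{2}Y_2$, so that $E(\bar{Y}) = \frac{1}{2}E(Y_1) + \frac{1}{2}E(Y_2) = \mu_Y$, and

$\mathrm{var}(\bar{Y}) = \frac{1}{4}\mathrm{var}(Y_1) + \frac{1}{4}\mathrm{var}(Y_2) + \frac{2}{4}\mathrm{cov}(Y_1, Y_2) = \frac{\sigma_Y^2}{2} + \frac{\rho\sigma_Y^2}{2}$

(c) $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, so that $E(\bar{Y}) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{1}{n}\sum_{i=1}^{n} \mu_Y = \mu_Y$, and

$\mathrm{var}(\bar{Y}) = \mathrm{var}\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right)$
$= \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}(Y_i) + \frac{2}{n^2}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \mathrm{cov}(Y_i, Y_j)$
$= \frac{1}{n^2}\sum_{i=1}^{n} \sigma_Y^2 + \frac{2}{n^2}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \rho\sigma_Y^2$
$= \frac{\sigma_Y^2}{n} + \frac{n(n-1)}{n^2}\rho\sigma_Y^2$
$= \frac{\sigma_Y^2}{n} + \left(1 - \frac{1}{n}\right)\rho\sigma_Y^2$

where the fourth line uses $\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} a = a(1 + 2 + 3 + \cdots + (n - 1)) = \frac{a\,n(n-1)}{2}$ for any variable a.

(d) When n is large, $\frac{\sigma_Y^2}{n} \approx 0$ and $\frac{1}{n} \approx 0$, and the result follows from (c).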
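The variance formula for the sample mean of equally correlated observations, $\sigma_Y^2/n + (1 - 1/n)\rho\sigma_Y^2$, can be checked against a brute-force sum over the covariance matrix. The sketch below is illustrative only; both function names are hypothetical.

```python
def var_mean_formula(n, var, rho):
    """Closed form: var / n + (1 - 1 / n) * rho * var."""
    return var / n + (1 - 1 / n) * rho * var

def var_mean_bruteforce(n, var, rho):
    """Sum every entry of the n x n covariance matrix and divide by n^2."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += var if i == j else rho * var
    return total / n ** 2

for n in (2, 5, 50, 500):
    print(n, round(var_mean_formula(n, 1.0, 0.5), 4), round(var_mean_bruteforce(n, 1.0, 0.5), 4))
```

As n grows, both values approach $\rho\sigma_Y^2$ (here 0.5), matching the limit in part (d).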

2.27. (a) $E(W) = E[E(W | Z)] = E[E(X - \tilde{X} | Z)] = E[E(X | Z) - E(X | Z)] = 0$.

(b) $E(WZ) = E[E(WZ | Z)] = E[Z\,E(W | Z)] = E[Z \times 0] = 0$.
(c) Using the hint: $V = W - h(Z)$, so that $E(V^2) = E(W^2) + E[h(Z)^2] - 2E[W h(Z)]$.
Using an argument like that in (b), $E[W h(Z)] = 0$. Thus, $E(V^2) = E(W^2) +
E[h(Z)^2]$, and the result follows by recognizing that $E[h(Z)^2] \ge 0$ because $h(z)^2 \ge 0$
for any value of z.