By Joan Llull
Main references:
Goldberger: 13.1, 14.1-14.5, 16.4, 25.1-25.4, (13.5), 15.2-15.5, 16.1, 19.1
I. Classical Regression Model
A. Introduction
estimator for $\alpha$ and $\beta$:¹
$$(\hat{\alpha}, \hat{\beta}) = \arg\min_{(a,b)} \frac{1}{N} \sum_{i=1}^{N} (y_i - a - b'x_i)^2. \quad (4)$$
This estimator is called Ordinary Least Squares. The solution to the above
problem is:
" N #1 N
X X
0
= (xi xN )(xi xN ) (xi xN )(yi yN ).
i=1 i=1
0
= yN xN . (5)
Note that the first term of is a K K matrix, while the second is a K 1 vector.
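As a quick numerical check (a sketch with simulated data; `a_hat` and `b_hat` below stand for $\hat{\alpha}$ and $\hat{\beta}$, and all names are illustrative), the deviation-from-means formulas in (5) coincide with a direct least-squares fit that includes a constant:

```python
import numpy as np

# Simulated data: N observations and K regressors (purely illustrative)
rng = np.random.default_rng(0)
N, K = 500, 3
x = rng.normal(size=(N, K))
y = 1.0 + x @ np.array([0.5, -2.0, 1.5]) + rng.normal(size=N)

# Deviation-from-means formulas, as in equation (5)
x_dev = x - x.mean(axis=0)
y_dev = y - y.mean()
b_hat = np.linalg.solve(x_dev.T @ x_dev, x_dev.T @ y_dev)  # K x K system
a_hat = y.mean() - b_hat @ x.mean(axis=0)

# Same coefficients as a direct least-squares fit with an intercept column
W = np.column_stack([np.ones(N), x])
assert np.allclose(np.concatenate([[a_hat], b_hat]),
                   np.linalg.lstsq(W, y, rcond=None)[0])
```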
$$(y - Wd)'(y - Wd) = y'y - y'Wd - d'W'y + d'W'Wd = y'y - 2d'W'y + d'W'Wd. \quad (8)$$
The last equality is obtained by observing that all elements in the sum are scalars.
The first order condition is:
$$-2W'y + 2(W'W)\hat{\beta} = 0, \qquad W'y = (W'W)\hat{\beta}, \qquad \hat{\beta} = (W'W)^{-1}W'y. \quad (9)$$
¹ To avoid complications with the notation below, in this chapter we follow the convention of writing the estimators as functions of the realizations $(y_i, x_i)$ instead of as functions of the random variables $(Y_i, X_i)$.
Note that we need $W'W$ to be of full rank so that it can be inverted. That is to say, we require the absence of multicollinearity.
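A minimal sketch of the normal equations in (9), again with simulated data (all names are illustrative); the last lines show how a perfectly collinear column makes $W'W$ rank deficient, so the inverse in (9) does not exist:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
W = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])  # constant + 2 regressors
y = W @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=N)

# Normal equations (9): solve W'W beta = W'y instead of inverting explicitly
beta_hat = np.linalg.solve(W.T @ W, W.T @ y)

# Perfect multicollinearity: a duplicated column makes W'W singular
W_bad = np.column_stack([W, W[:, 1]])
print(np.linalg.matrix_rank(W_bad.T @ W_bad))  # 3, not 4: W'W cannot be inverted
```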
D. Residuals and Fitted Values
2) $\hat{y}'\hat{u} = 0$ because $\hat{y}'\hat{u} = \hat{\beta}'W'\hat{u} = \hat{\beta}' \cdot 0 = 0$.
3) $y'\hat{y} = \hat{y}'\hat{y}$ because $y'\hat{y} = (\hat{y} + \hat{u})'\hat{y} = \hat{y}'\hat{y} + \hat{u}'\hat{y} = \hat{y}'\hat{y} + 0 = \hat{y}'\hat{y}$.
Following arguments exactly analogous to the proof of the variance decomposition for the linear prediction model in Chapter 3, we can prove that:
$$y'y = \hat{y}'\hat{y} + \hat{u}'\hat{u} \qquad \text{and} \qquad \widehat{\operatorname{Var}}(y) = \widehat{\operatorname{Var}}(\hat{y}) + \widehat{\operatorname{Var}}(\hat{u}), \quad (10)$$
where $\widehat{\operatorname{Var}}(z) \equiv N^{-1} \sum_{i=1}^{N} (z_i - \bar{z})^2$. To prove the first, we simply need basic algebra:
$$\hat{u}'\hat{u} = (y - \hat{y})'(y - \hat{y}) = y'y - y'\hat{y} - \hat{y}'y + \hat{y}'\hat{y} = y'y - \hat{y}'\hat{y}. \quad (11)$$
Given the result in item 4) above, we can conclude that $(\hat{y} - \bar{y})'(\hat{y} - \bar{y}) = \hat{y}'\hat{y} - N\bar{y}^2$. Thus:
$$N \widehat{\operatorname{Var}}(\hat{u}) = \hat{u}'\hat{u} = y'y - \hat{y}'\hat{y} = y'y - N\bar{y}^2 - (\hat{y}'\hat{y} - N\bar{y}^2) = N \widehat{\operatorname{Var}}(y) - N \widehat{\operatorname{Var}}(\hat{y}), \quad (13)$$
completing the proof.
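The orthogonality properties and the decompositions in (10)-(13) are easy to verify numerically; the sketch below uses simulated data and illustrative variable names (`y_fit` for $\hat{y}$, `u_hat` for $\hat{u}$):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 300
W = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = W @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=N)

beta_hat = np.linalg.solve(W.T @ W, W.T @ y)
y_fit = W @ beta_hat          # fitted values
u_hat = y - y_fit             # residuals

assert np.allclose(W.T @ u_hat, 0)                           # W'u = 0 (first-order conditions)
assert np.isclose(y_fit @ u_hat, 0)                          # item 2): fitted values orthogonal to residuals
assert np.isclose(y @ y_fit, y_fit @ y_fit)                  # item 3)
assert np.isclose(y @ y, y_fit @ y_fit + u_hat @ u_hat)      # sum-of-squares decomposition in (10)
assert np.isclose(np.var(u_hat), np.var(y) - np.var(y_fit))  # variance decomposition, (10) and (13)
```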
Similar to the population case described in Chapter 3, this result allows us to
write the sample coefficient of determination as:
$$R^2 \equiv 1 - \frac{\sum_{i=1}^{N} \hat{u}_i^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} = \frac{\sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} = \frac{\widehat{\operatorname{Var}}(\hat{y})}{\widehat{\operatorname{Var}}(y)} = \frac{[\widehat{\operatorname{Cov}}(y, \hat{y})]^2}{\widehat{\operatorname{Var}}(y)\widehat{\operatorname{Var}}(\hat{y})} = \hat{\rho}^2_{y,\hat{y}}. \quad (14)$$
The last equality is obtained by multiplying and dividing by $\hat{y}'\hat{y}$, and using that $y'\hat{y} = \hat{y}'\hat{y}$ as shown above.
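Continuing the same kind of sketch (simulated data, illustrative names), the alternative expressions in (14) give the same number:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 300
W = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = W @ np.array([1.0, 0.8, -1.2]) + rng.normal(size=N)

beta_hat = np.linalg.solve(W.T @ W, W.T @ y)
y_fit = W @ beta_hat
u_hat = y - y_fit

r2_ssr = 1 - (u_hat @ u_hat) / np.sum((y - y.mean()) ** 2)  # 1 - SSR/SST
r2_var = np.var(y_fit) / np.var(y)                          # Var(y_fit)/Var(y)
r2_cor = np.corrcoef(y, y_fit)[0, 1] ** 2                   # squared sample correlation
assert np.allclose([r2_ssr, r2_var], r2_cor)
```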
which implies that $E[y_i|x_1, \ldots, x_N] = E[y_i|x_i]$. This is not satisfied, for example, by time series data: if $x_i = y_{i-1}$ (that is, a regressor is the lag of the dependent variable), then $E[y_i|x_1, \ldots, x_N] = E[y_i|x_i, x_{i+1} = y_i] = y_i \neq E[y_i|x_i]$.
$\operatorname{Var}(y_i|x_i) = \sigma^2$ and $\operatorname{Cov}(y_i, y_j|x_1, \ldots, x_N) = 0$ for all $i \neq j$:
$$\operatorname{Var}(y_i|x_i) = \operatorname{Var}(E[y_i|x_1, \ldots, x_N]|x_i) + E[\operatorname{Var}(y_i|x_1, \ldots, x_N)|x_i] = \operatorname{Var}(E[y_i|x_i]|x_i) + E[\sigma^2|x_i] = 0 + \sigma^2 = \sigma^2. \quad (16)$$
We could also check as before that an i.i.d. random sample would satisfy
this condition.
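A small simulation of the homoskedasticity condition; the regressor is made discrete here only so that conditional variances are easy to estimate by grouping (data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
N, sigma2 = 200_000, 4.0
x = rng.integers(0, 3, size=N)                       # discrete regressor: easy conditioning
y = 1.0 + 2.0 * x + rng.normal(scale=np.sqrt(sigma2), size=N)

# Within each value of x, the conditional variance is (approximately) sigma^2
for v in range(3):
    print(v, y[x == v].var())                        # each close to 4.0
```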
II. Statistical Results and Interpretation
A. Unbiasedness and Efficiency
The first result that we obtained indicates that OLS gives an unbiased estimator of $\beta$ under the classical assumptions. Now we need to check how good it is in terms of efficiency. The Gauss-Markov Theorem establishes that OLS is a BLUE (best linear unbiased estimator). More specifically, the theorem states that, in the class of estimators that are conditionally unbiased and linear in $y$, $\hat{\beta}$ is the estimator with the minimum variance.
To prove it, consider an alternative linear estimator $\tilde{\beta} \equiv Cy$, where $C$ is a function of the data $W$. We can define, without loss of generality, $C \equiv (W'W)^{-1}W' + D$, where $D$ is a function of $W$. Assume that $\tilde{\beta}$ satisfies $E[\tilde{\beta}|W] = \beta$ (hence, it is another linear unbiased estimator). We first check that $E[\tilde{\beta}|W] = \beta$ is equivalent to $DW = 0$:
$$E[\tilde{\beta}|W] = E[\beta + (W'W)^{-1}W'u + DW\beta + Du|W] = (I + DW)\beta,$$
$$(I + DW)\beta = \beta \iff DW = 0, \quad (20)$$
Therefore, $\operatorname{Var}(\hat{\beta}|W)$ is the minimum conditional variance among linear unbiased estimators. Finally, to prove that $\operatorname{Var}(\hat{\beta})$ is also the minimum unconditionally, we use the variance decomposition and the fact that the estimators are conditionally unbiased, which implies $\operatorname{Var}(E[\tilde{\beta}|W]) = \operatorname{Var}(E[\hat{\beta}|W]) = 0$. Using that, we obtain $\operatorname{Var}(\tilde{\beta}) = E[\operatorname{Var}(\tilde{\beta}|W)]$, and likewise for $\hat{\beta}$. Hence, proving that $\operatorname{Var}(\tilde{\beta}) - \operatorname{Var}(\hat{\beta}) \geq 0$, which is what we need in order to establish that $\operatorname{Var}(\hat{\beta})$ is the minimum for this class of estimators, is the same as proving $E[\operatorname{Var}(\tilde{\beta}|W) - \operatorname{Var}(\hat{\beta}|W)] \geq 0$. Note that, given a random matrix $A$, because $Z'E[A]Z = E[Z'AZ] \geq 0$ for any fixed vector $Z$ when $A$ is positive semidefinite, $E[A]$ is also positive semidefinite. Therefore, since we proved that $\operatorname{Var}(\tilde{\beta}|W) - \operatorname{Var}(\hat{\beta}|W) \geq 0$, that is, it is positive semidefinite, its expectation must also be positive semidefinite, which completes the proof.
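A Monte Carlo sketch of the Gauss-Markov result: any alternative linear estimator $Cy$ with $C = (W'W)^{-1}W' + D$ and $DW = 0$ is also unbiased but has a (weakly) larger variance than OLS, component by component. The data, the particular choice of $D$, and all names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
N, reps = 100, 5000
W = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])   # W held fixed across replications
beta = np.array([1.0, 0.5, -0.5])

ols_map = np.linalg.solve(W.T @ W, W.T)        # (W'W)^{-1} W'
M = np.eye(N) - W @ ols_map                    # annihilator matrix, so MW = 0
D = 0.1 * rng.normal(size=(3, N)) @ M          # any D of this form satisfies DW = 0
C = ols_map + D                                # alternative linear estimator

ols_draws = np.empty((reps, 3))
alt_draws = np.empty((reps, 3))
for r in range(reps):
    y = W @ beta + rng.normal(size=N)
    ols_draws[r] = ols_map @ y
    alt_draws[r] = C @ y

print(ols_draws.mean(axis=0), alt_draws.mean(axis=0))   # both close to beta: unbiased
print(ols_draws.var(axis=0) <= alt_draws.var(axis=0))   # OLS variance is smaller, component by component
```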
which easily delivers that the maximum likelihood estimator of $\beta$ is the OLS estimator, and $\hat{\sigma}^2 = \frac{\hat{u}'\hat{u}}{N}$. Therefore, we can conclude that, under the normality assumption, the OLS estimator is conditionally a BUE (best unbiased estimator). We could prove, indeed, that $\sigma^2(W'W)^{-1}$ is (conditionally) the Cramér-Rao lower bound. Even though we are not going to prove it (it is not a trivial proof), unconditionally there is no BUE. To prove it, we would need to use the unconditional likelihood $f(y|W)f(W)$ instead of $f(y|W)$ alone.
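To see the first claim numerically, the following sketch (my own, with simulated data and illustrative names; `scipy.optimize.minimize` is just one convenient way to maximize the likelihood) checks that maximizing the Gaussian log-likelihood over $(\beta, \sigma^2)$ reproduces the OLS coefficients and $\hat{\sigma}^2 = \hat{u}'\hat{u}/N$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
N = 200
W = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = W @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=N)

def neg_loglik(theta):
    """Negative Gaussian log-likelihood; sigma^2 is parameterized through its log."""
    beta, sigma2 = theta[:-1], np.exp(theta[-1])
    resid = y - W @ beta
    return 0.5 * (N * np.log(2 * np.pi * sigma2) + resid @ resid / sigma2)

res = minimize(neg_loglik, x0=np.zeros(W.shape[1] + 1), method="BFGS")

beta_ols = np.linalg.solve(W.T @ W, W.T @ y)
u_hat = y - W @ beta_ols
print(res.x[:-1], beta_ols)                  # MLE of beta vs. OLS: essentially identical
print(np.exp(res.x[-1]), u_hat @ u_hat / N)  # MLE of sigma^2 vs. u'u/N: essentially identical
```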
Regarding $\sigma^2$, similarly to what happened with the variance of a random variable, the MLE is biased:
$$\hat{u} = y - W\hat{\beta} = y - W(W'W)^{-1}W'y = (I - W(W'W)^{-1}W')y = My, \quad (26)$$
so that $\hat{u}'\hat{u} = u'M'Mu = u'Mu$ and $E[\hat{u}'\hat{u}|W] = E[\operatorname{tr}(u'Mu)|W] = \operatorname{tr}(M\,E[uu'|W]) = \sigma^2 \operatorname{tr}(M)$, where we used the fact that $u'Mu$ is a scalar (and hence equal to its trace), and some of the tricks about traces used above. Now: