The Frisch-Waugh-Lovell Theorem
(Handout Version)∗
Walter Belluzzo
Econ 507 Econometric Analysis
Spring 2013
• The main result of the FWL theorem is sometimes called Partitioned Regression
because only part of the covariates is included explicitly in the estimated model.
• This theorem is very useful in regression analysis because it allows us to partition the
covariates to facilitate derivation.
• We will develop the ideas underlying the FWL theorem starting from the properties of
least squares projections and the linear transformation of the regressors.
• We are interested in understanding how this linear transformation of the regressors changes
the vector of fitted values ŷ and the residuals vector û.
• Note that each column ai of A is a k-vector, just like β, and each block Xai is an n-vector,
which is a linear combination of the columns of X.
∗ This lecture is based on D & M’s Chapter 2.
Linear Transformations and the Column Space
• Since S(X) contains all linear combinations of the columns of X, it must be that every
Xai is in S(X).
• Now, take any w ∈ S(X). Since w = Xb for some b ∈ Rᵏ (why?), we can write

  Xb = X(AA⁻¹)b = (XA)(A⁻¹b).

• Note that A⁻¹b is a k-vector, so that the RHS is actually a linear combination of the
columns of XA, and thus it is in S(XA).
• As a result, we can conclude that every element of S(X) is in S(XA) and vice-versa. That
is, these two subspaces must be identical.
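• As a quick numerical check of this equivalence (a minimal sketch in Python with numpy; the simulated matrices, the seed, and the helper proj are my illustrative assumptions, not part of the lecture), we can verify that the orthogonal projection matrices onto S(X) and S(XA) coincide:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 50, 3
    X = rng.normal(size=(n, k))   # arbitrary simulated regressor matrix
    A = rng.normal(size=(k, k))   # random k x k matrix, almost surely nonsingular

    def proj(M):
        # Orthogonal projection matrix onto S(M): M (M'M)^{-1} M'
        return M @ np.linalg.solve(M.T @ M, M.T)

    # S(X) = S(XA): the two projection matrices are identical.
    print(np.allclose(proj(X), proj(X @ A)))   # True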
• Because S(X) and S⊥(X) are not affected by the linear transformation, the fitted values
and residuals vectors are the same whether we use X or XA as regressors.
• Remember that we can write a linear combination of the columns of X as Xb for any (real)
k-vector b.
• Thus, we see that replacing X by XA requires adjusting the vector of OLS estimates by
A−1 , so that ŷ will remain unchanged.
• Alternatively, we can substitute XA into the expression for β̂ and obtain the same result
(writing β̃ for the estimate from the transformed regressors):

  β̃ = ((XA)′(XA))⁻¹(XA)′y
    = (A′X′XA)⁻¹A′X′y
    = A⁻¹(X′X)⁻¹(A′)⁻¹A′X′y   (why?)
    = A⁻¹(X′X)⁻¹X′y
    = A⁻¹β̂.
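• The same substitution can be checked numerically (a minimal sketch with simulated data; the helper ols and all variable names are mine, not the lecture's):

    import numpy as np

    rng = np.random.default_rng(1)
    n, k = 100, 3
    X = rng.normal(size=(n, k))
    A = rng.normal(size=(k, k))   # nonsingular with probability 1
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

    def ols(X, y):
        # OLS coefficients (X'X)^{-1} X'y
        return np.linalg.solve(X.T @ X, X.T @ y)

    b = ols(X, y)        # beta-hat from X
    b_t = ols(X @ A, y)  # estimate from the transformed regressors XA
    print(np.allclose(b_t, np.linalg.solve(A, b)))   # b_t = A^{-1} beta-hat
    print(np.allclose(X @ b, (X @ A) @ b_t))         # y-hat is unchanged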
Linear Transformations and the Regression Intercept
• The invariance of ŷ and û to nonsingular linear transformations can be extended to the
addition of a constant amount to one or more of the regressors, provided that the regression
equation includes an intercept.
• Thus, if ι is a regressor, we can always construct a matrix A that produces the effect of adding
a constant to a regressor.
• For instance, with k = 3 we have:

       ⎡ x11  x12  x13 ⎤
       ⎢ x21  x22  x23 ⎥ ⎡ a11  a12  a13 ⎤
  XA = ⎢  ⋮    ⋮    ⋮  ⎥ ⎢ a21  a22  a23 ⎥
       ⎣ xn1  xn2  xn3 ⎦ ⎣ a31  a32  a33 ⎦
• Making xi1 = 1 for all i, and transposing the first row of XA just to facilitate typesetting, we
get:

  ⎡ a11 + a21 x12 + a31 x13 ⎤
  ⎢ a12 + a22 x12 + a32 x13 ⎥
  ⎣ a13 + a23 x12 + a33 x13 ⎦

• Note that every column of XA picks up an element of the first row of A multiplying ι, and
therefore we can always accommodate any additive constant in x2 and/or x3 by choosing
suitable a's.
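• The sketch below (Python/numpy again; the constants c2 and c3 are arbitrary choices of mine) builds such an A explicitly and confirms that it adds constants to x2 and x3 while leaving ι intact:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 50
    X = np.column_stack([np.ones(n),           # iota: the intercept column
                         rng.normal(size=n),   # x2
                         rng.normal(size=n)])  # x3

    c2, c3 = 4.0, -1.5                         # constants to add to x2 and x3
    A = np.array([[1.0,  c2,  c3],             # first row of A carries the constants
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])

    XA = X @ A                                  # column 1 is still iota
    print(np.allclose(XA[:, 1], X[:, 1] + c2))  # column 2 is x2 + c2*iota
    print(np.allclose(XA[:, 2], X[:, 2] + c3))  # column 3 is x3 + c3*iota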
• Let us now consider the role of the intercept in the adjustment needed in the vector β̂.
• Adding a constant a to the regressor changes the OLS estimate of the intercept, but not of
the slope. We can write the corresponding model as

  yi = γ1 + γ2 (xi + a) + ui.
• The OLS estimators for γ2 and β2 are the same,

  γ̂2 = β̂2 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²,
because adding a constant to x does not change (x − x̄):

  (x + a) − (1/n) Σ(x + a) = (x + a) − (x̄ + a) = x − x̄.
• To help our understanding, it is useful to get this result directly from our simple regression
model.
• Let zi = xi + c. Then,

  yi = β1 + β2 (zi − c) + ui = (β1 − β2 c) + β2 zi + ui = γ1 + β2 zi + ui,

  where zi − c = xi and γ1 = β1 − β2 c.
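• A quick simulated check of this invariance (the data-generating values and the helper simple_ols are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 200
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    c = 10.0
    z = x + c                                  # regressor shifted by a constant

    def simple_ols(x, y):
        # Intercept and slope from regressing y on a constant and x
        slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
        return y.mean() - slope * x.mean(), slope

    b1, b2 = simple_ols(x, y)
    g1, g2 = simple_ols(z, y)
    print(np.isclose(g2, b2))                  # slopes are identical
    print(np.isclose(g1, b1 - b2 * c))         # gamma1 = beta1 - beta2*c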
• What if E(u) ≠ 0? Can you cast this case into the additive constant term framework?
• We can also illustrate these ideas geometrically, using one of the previous diagrams with
ι as one of the regressors:
[Figure: geometric illustration with origin O and regressors x1 and x2; adding cι to x2 gives z, and ŷ decomposes as β̂1 x1 + β̂2 x2 or, equivalently, α̂1 x1 + β̂2 z, with angles θ and φ between the vectors.]
More on Adding Constants to Regressors
• Because there is no restriction on the arbitrary placement of the vector ι in our figure, we
can simply rotate the whole picture at will.
• Note, however, that in the simple regression example illustrated in our figure the trans-
formation cannot be the addition of a constant, because there would be no intercept.
• Even without an intercept, we can still get α̂2 = β̂2 and ŷ invariant, if the transformation
produces the necessary adjustment in the coefficient of the other variable.
• The answer is evident once you see that the magic happens only because we can write
model (1) as

  yi = (β1 − β2 c) + β2 zi + ui.   (2)

• Consider now a model without an intercept,

  yi = β1 xi1 + β2 xi2 + ui.

• Try to rewrite it for zi = xi2 + c, just like we did before. Can you get something like model
(2)?

  yi = β1 xi1 + β2 zi − β2 c + ui.   (3)
• Now we cannot rearrange (3) to get rid of the −β2 c by absorbing it into another parameter
and keep the model unchanged.
• Note that −β2 c works like an intercept in the transformed model, so that we now have 3
parameters.
• It should be clear to you at this point that the subspaces spanned by the regressors before
and after the transformation have different dimensions.
• If the new column space is different, the fitted values and residuals vectors from the
transformed model will not be the same as those from the original model.
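• The sketch below illustrates this numerically: without an intercept, adding a constant to a regressor changes the fitted values (simulated data; the helper fitted is an assumption of the sketch):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100
    X = rng.normal(size=(n, 2)) + 1.0          # two regressors, neither equal to iota
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

    def fitted(X, y):
        # OLS fitted values from regressing y on X (no intercept added)
        return X @ np.linalg.solve(X.T @ X, X.T @ y)

    Z = X.copy()
    Z[:, 1] += 3.0                             # add a constant to the second regressor
    # The column space changes, so the fit changes too.
    print(np.allclose(fitted(X, y), fitted(Z, y)))   # False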
• With the intercept in place, x1 = ι, and we can simply combine β1 and β2 c into a single
parameter, while keeping the column space of X intact.
• When x1 ≠ ι we can get the same result only if the transformation is a linear function of
the other regressor:

  y = β1 x1 + β2 (z − cx1) + residuals,   with z = x2 + cx1.
• Now we see that the effect of the transformation “migrates” to the coefficient of x1, leaving
β2 intact in the transformed model.
• But note that β2 is associated with z in the transformed model, and not x2 as in the
original model.
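• And a simulated check that a linear function of the other regressor does work (same caveats as the previous sketch):

    import numpy as np

    rng = np.random.default_rng(5)
    n = 100
    X = rng.normal(size=(n, 2)) + 1.0
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

    def ols(X, y):
        return np.linalg.solve(X.T @ X, X.T @ y)

    c = 3.0
    W = X.copy()
    W[:, 1] += c * X[:, 0]                       # z = x2 + c*x1
    bX, bW = ols(X, y), ols(W, y)
    print(np.allclose(X @ bX, W @ bW))           # fitted values unchanged
    print(np.isclose(bW[1], bX[1]))              # coefficient on z equals that on x2
    print(np.isclose(bW[0], bX[0] - c * bX[1]))  # the effect migrates to x1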
• Partitioning the parameter vector accordingly, the regression model can be written as

  y = X1 β1 + X2 β2 + u.
• Then, for a k1 × k2 matrix A, adding X1A to X2 while keeping X1 intact will not change
the estimate β̂2, nor ŷ and û; see the sketch below.
• Do not confuse this A matrix with the k × k matrix we defined to get a nonsingular linear
transformation of X. This k1 × k2 matrix is actually a block of that previous one.
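• A minimal simulated check of this claim (the blocks X1 and X2, the k1 × k2 matrix A, and the helper ols are all illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(6)
    n, k1, k2 = 100, 2, 2
    X1 = rng.normal(size=(n, k1))
    X2 = rng.normal(size=(n, k2))
    y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([2.0, 0.5]) + rng.normal(size=n)

    def ols(X, y):
        return np.linalg.solve(X.T @ X, X.T @ y)

    A = rng.normal(size=(k1, k2))              # arbitrary k1 x k2 block
    b = ols(np.column_stack([X1, X2]), y)
    b_t = ols(np.column_stack([X1, X2 + X1 @ A]), y)

    print(np.allclose(b_t[k1:], b[k1:]))               # beta2-hat unchanged
    print(np.allclose(b_t[:k1], b[:k1] - A @ b[k1:]))  # beta1-hat absorbs X1*A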
3 Deviations from the mean
• We know that adding/subtracting a constant to a regressor does not change fitted values
or residuals. Moreover, parameter estimates other than the intercept will not change
either.
• Because the mean of a regressor is a constant for a given sample, running a regression on
deviations from the mean should affect only the intercept estimate.
• Therefore, even though x is not orthogonal to ι, centering transforms it into a vector
orthogonal to ι:
  Mι x = (I − Pι)x = x − ι(ι′ι)⁻¹ι′x = x − x̄ι = z
• The main reason for centering covariates is that the resulting change in the intercept gives
it a special interpretation, which is very convenient sometimes.
• Because zi = 0 when xi = x̄, the intercept of the centered regression equals the mean of y
conditional on x = x̄.
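• A short numerical illustration of centering as a projection and of the resulting intercept (a Python/numpy sketch with simulated data):

    import numpy as np

    rng = np.random.default_rng(7)
    n = 50
    x = rng.normal(loc=5.0, size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    iota = np.ones(n)

    # Centering is the residual from projecting x on iota: M_iota x = x - xbar*iota.
    z = x - iota * (iota @ x) / (iota @ iota)
    print(np.allclose(z, x - x.mean()))        # same thing
    print(np.isclose(iota @ z, 0.0))           # z is orthogonal to iota

    # The intercept of the centered regression is the sample mean of y.
    Xc = np.column_stack([iota, z])
    a = np.linalg.solve(Xc.T @ Xc, Xc.T @ y)
    print(np.isclose(a[0], y.mean()))          # True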
Graphic Illustration of the Effect of Centering
• The figure below depicts a scatter plot of 50 observations simulated from our model,
and the estimated regression line, drawn in red.
• In terms of the centered regressor, the model is

  y = (β1 + β2 x̄) ι + β2 z + u = α1 ι + β2 z + u,   where α1 = β1 + β2 x̄.
• What we are really doing is rescaling the x-axis so that the origin is at x̄.
[Figure: the scatter plot and fitted line in two coordinate systems; in the original axes the intercept is β̂1 at the origin O, while in the centered axes the origin moves to x̄ and the intercept becomes α̂1 = β̂1 + β̂2 x̄.]
4 Orthogonal Regressors
• A very special feature of the regression on centered covariates is that the regressors are
orthogonal to ι (remember that ι′z = 0).
• Because the regressors are orthogonal, the estimate α̂2 in our simple regression example
is the same whether or not we include the regressor ι.
• To see why this is true, remember that we are decomposing y into three vectors: y = α̂1 ι + α̂2 z + û.
• But projecting y onto S(z) annihilates both α̂1 ι and û, because these vectors are orthog-
onal to S(z).
• As a result, we are actually projecting α̂2 z onto S(z), which results in α̂2 z itself. (why?)
• It follows, then, that α̂2 is the same as given by regressing y on both ι and z.
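• Numerically (simulated data; z is the centered regressor, and the slope formulas are the usual OLS expressions):

    import numpy as np

    rng = np.random.default_rng(8)
    n = 100
    x = rng.normal(loc=3.0, size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    iota = np.ones(n)
    z = x - x.mean()                           # centered regressor, orthogonal to iota

    X = np.column_stack([iota, z])
    a_long = np.linalg.solve(X.T @ X, X.T @ y)[1]   # slope from y on (iota, z)
    a_short = (z @ y) / (z @ z)                     # slope from y on z alone
    print(np.isclose(a_long, a_short))              # identical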
• For any regressors x1 and x2 we can illustrate geometrically what happens as we transform
one of them to get orthogonality:
[Figure: origin O, regressors x1 and x2, and the transformed regressor z = x2 + cι; ŷ = β̂1 x1 + β̂2 x2 = α̂1 x1 + β̂2 z, with angle θ1 between x1 and ŷ.]
• Note that DM derive this for Px and P1 , stressing that orthogonality of X1 and X2 is
not necessary to obtain the result, but only symmetry of the corresponding orthogonal
projection matrices.
• I changed the notation to Z and W to avoid any confusion with this. In the result presented
above I used only the relevant property of the projections Pz and Px, namely that S(Z) ⊂ S(X).
• Confusion may arise because orthogonality is linked to symmetry. It can be shown
that a square matrix P is (i) a projection matrix iff it is idempotent, and (ii) an orthogonal
projection matrix iff it is idempotent and symmetric.
• It should be clear that symmetry of the projection matrices is not linked to orthogonality
of Z and W in the preceding derivation.
• We consider here only P, but the same results hold for the complementary projection.
You should check this (DM Exercise 2.15).
5 Short and Long Regression
Short and Long Regression with Orthogonal Regressors
• We can generalize the orthogonal regressors case to any two groups of regressors using
the partitioned regression model
y = X1 β1 + X2 β2 + u. (4)
• Side note: Remember that we already defined the partition of the matrix X such that
X = [X1 X2 ], where X1 is n × k1 and X2 is n × k2 , with k1 + k2 = k.
• We will refer to the partitioned regression on the full set of regressors (4) as the long
regression and the regressions on a subset of the regressors (that is, X1 or X2 ) as the
short regression.
• If X1 and X2 are orthogonal (X1′X2 = 0), the short regression of y on X1 alone yields the
same estimate of β1 as the long regression, since

  X1 β̃1 = P1 y = P1 Px y = P1 ŷ
         = P1 X1 β̂1 + P1 X2 β̂2
         = X1 β̂1,

  where the last equality uses P1 X2 = 0.
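• A simulated check, constructing X2 to be exactly orthogonal to X1 (a sketch under my own naming, not the lecture's code):

    import numpy as np

    rng = np.random.default_rng(9)
    n, k1, k2 = 100, 2, 2
    X1 = rng.normal(size=(n, k1))
    W = rng.normal(size=(n, k2))
    # Build X2 orthogonal to X1 by projecting W off S(X1).
    X2 = W - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ W)
    y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([2.0, 0.5]) + rng.normal(size=n)

    def ols(X, y):
        return np.linalg.solve(X.T @ X, X.T @ y)

    b_long = ols(np.column_stack([X1, X2]), y)[:k1]   # beta1-hat, long regression
    b_short = ols(X1, y)                              # beta1-hat, short regression
    print(np.allclose(b_long, b_short))               # identical when X1'X2 = 0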
• As a result, we obtain exactly the same α̂2 from the long regression y = X1 α1 + M1 X2 α2 + u
as from the short regression of y on M1 X2 alone, since X1 and M1 X2 are orthogonal.
Long Regression on X2 vs. Long Regression on M1 X2
  y = X1 β1 + X2 β2 + u
  y = X1 α1 + (I − P1)X2 α2 + u
    = X1 α1 + (X2 − X1(X1′X1)⁻¹X1′X2)α2 + u
    = X1 α1 + (X2 − X1 A)α2 + u.
• But this is just the generalized result for adding a linear transformation of a set of regres-
sors to the other set of regressors.
• Thus, we know that α̂2 = β̂2 and also that the vectors of fitted values ŷ and residuals û
will be the same.
• Because M1 X2 ⊥ X1 we know that α̂2 from the short regression equals β̂2 from the long
regression.
• However, the fitted values and the residuals from the short regression will be different,
because dropping regressors changes the column space where ŷ lies.
• To see this, note that y can be decomposed into fitted values from either the short or the
long regression. Therefore,
  Short regression:  y = M1 X2 α̂2 + v̂
  Long regression:   y = M1 X2 α̂2 + P1 y + û
  Subtracting,       y − P1 y = M1 X2 α̂2 + û,   i.e.,   M1 y = M1 X2 α̂2 + û,

so the two residual vectors differ exactly by P1 y: v̂ = P1 y + û.
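• The sketch below verifies this residual decomposition on simulated data (P1, M1X2, and the helper resid are assumptions of the sketch):

    import numpy as np

    rng = np.random.default_rng(10)
    n, k1, k2 = 100, 2, 2
    X1 = rng.normal(size=(n, k1))
    X2 = rng.normal(size=(n, k2))
    y = rng.normal(size=n)

    P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)   # projection onto S(X1)
    M1X2 = X2 - P1 @ X2

    def resid(X, y):
        return y - X @ np.linalg.solve(X.T @ X, X.T @ y)

    v = resid(M1X2, y)                           # short-regression residuals
    u = resid(np.column_stack([X1, M1X2]), y)    # long-regression residuals
    print(np.allclose(v, u + P1 @ y))            # they differ exactly by P1*y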
Frisch-Waugh-Lovell Theorem
Theorem 3 (Frisch-Waugh-Lovell). The OLS estimates of β2 and the residuals from the re-
gressions
y = X1 β1 + X2 β2 + u
and
M1 y = M1 X2 β2 + residuals
are numerically identical.
Proof
• Let β̂1 and β̂2 be the OLS estimates from the original model. Then we can write y =
Px y + Mx y = X1 β̂1 + X2 β̂2 + Mx y.
• Next, we show that the OLS estimate of β2 from the transformed model, call it β̃2, is equal
to β̂2:

  β̃2 = ((M1 X2)′(M1 X2))⁻¹(M1 X2)′y
     = (X2′M1 X2)⁻¹X2′M1 y                       (M1 is symmetric and idempotent)
     = (X2′M1 X2)⁻¹X2′M1 (X1 β̂1 + X2 β̂2 + Mx y)
     = (X2′M1 X2)⁻¹(X2′M1 X2)β̂2 = β̂2,

  since M1 X1 = 0, M1 Mx = Mx, and X2′Mx = 0.
• To show that residuals are also numerically identical, we start again with y = X1 β̂1 +
X2 β̂2 + Mx y, but now we premultiply both sides by M1 ,
  M1 y = M1 X1 β̂1 + M1 X2 β̂2 + M1 Mx y.

  Since M1 X1 = 0 and M1 Mx = Mx, rearranging gives

  M1 y − M1 X2 β̂2 = Mx y = û,
which is the desired result, since the LHS is the residual vector from the transformed
regression.
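• Finally, the theorem can be verified numerically (a minimal sketch on simulated data; all names are mine):

    import numpy as np

    rng = np.random.default_rng(11)
    n, k1, k2 = 100, 2, 2
    X1 = rng.normal(size=(n, k1))
    X2 = rng.normal(size=(n, k2))
    y = X1 @ np.array([1.0, -1.0]) + X2 @ np.array([2.0, 0.5]) + rng.normal(size=n)

    def ols(X, y):
        return np.linalg.solve(X.T @ X, X.T @ y)

    # Original regression: y on [X1, X2].
    b = ols(np.column_stack([X1, X2]), y)
    u_hat = y - np.column_stack([X1, X2]) @ b

    # FWL regression: M1*y on M1*X2.
    P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
    M1y, M1X2 = y - P1 @ y, X2 - P1 @ X2
    b2 = ols(M1X2, M1y)
    u_fwl = M1y - M1X2 @ b2

    print(np.allclose(b[k1:], b2))       # identical estimates of beta2
    print(np.allclose(u_hat, u_fwl))     # identical residuals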