
ECON 594: Lecture #8

Thomas Lemieux
Department of Economics, UBC
May 2012
1 Measurement error
Until now, we have always assumed that all relevant variables ($y$ and $x$) were measured without error. In reality, however, there are many reasons to believe that a substantial amount of error is introduced in the measurement of standard economic variables. For example, people may not recall exactly when asked about recent consumption expenditures, hours of work, earnings, etc. A number of validation surveys have indeed indicated that these errors can be substantial by comparing detailed employer administrative data on earnings to self-reports by individuals. In light of this, it is important to assess how standard econometric estimates are affected by measurement error in either (or both) the explanatory or dependent variables.
1.1 Classical measurement error
The most standard form of measurement error that has been considered in the literature is what is called classical measurement error. Under classical measurement error, it is assumed that the observed value of $y$ (or $x$) is equal to the true value of $y$ (or $x$) plus a purely random component. More specifically, we can write the measured value of $y$ as the sum of the true value $y^*$ plus a measurement error $u_y$:

$$ y = y^* + u_y. \qquad (1) $$
A first important result is that measurement error in the dependent variable does not affect the consistency of OLS estimates of $\beta$. To see this, let's first write the true model as:

$$ y^* = x^* \beta + \varepsilon. \qquad (2) $$

From equation (1) it follows that $y^* = y - u_y$. Substituting back into equation (2) yields

$$ y = x^* \beta + \varepsilon + u_y. \qquad (3) $$

So measurement error in $y$ only changes the interpretation of the error term, which now consists of both the standard error term, $\varepsilon$, and a component due to measurement error, $u_y$.
This does not affect the consistency of the OLS estimate $\hat\beta$, since

$$ \mathrm{plim}(\hat\beta) = \frac{\mathrm{cov}(y, x^*)}{\mathrm{var}(x^*)} = \frac{\mathrm{cov}(x^*\beta + \varepsilon + u_y, \; x^*)}{\mathrm{var}(x^*)} = \beta + \frac{\mathrm{cov}(\varepsilon + u_y, \; x^*)}{\mathrm{var}(x^*)} = \beta \qquad (4) $$

because $\mathrm{cov}(\varepsilon, x^*) = \mathrm{cov}(u_y, x^*) = 0$. Note, however, that adding a measurement error term does increase the variance of the error term from $\mathrm{var}(\varepsilon)$ to $\mathrm{var}(\varepsilon) + \mathrm{var}(u_y)$. As a result, it also increases the variance (or standard error) of $\hat\beta$ from $(X'X)^{-1}\mathrm{var}(\varepsilon)$ to $(X'X)^{-1}[\mathrm{var}(\varepsilon) + \mathrm{var}(u_y)]$.
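
To make both results concrete, here is a minimal Python simulation. It is not from the original notes: the sample size, $\beta = 2$, the error scales, and the helper ols_slope are all illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n, beta = 100_000, 2.0

x_star = rng.normal(size=n)              # true regressor x*
eps = rng.normal(size=n)                 # structural error
y_star = beta * x_star + eps             # true model, equation (2)
u_y = rng.normal(scale=2.0, size=n)      # classical error in y
y = y_star + u_y                         # measured y, equation (1)

def ols_slope(y, x):
    # univariate OLS slope: cov(y, x) / var(x)
    x_c, y_c = x - x.mean(), y - y.mean()
    return (x_c @ y_c) / (x_c @ x_c)

print(ols_slope(y_star, x_star))   # ~ 2.0 with the true y
print(ols_slope(y, x_star))        # still ~ 2.0: consistency preserved
print(np.var(y - beta * x_star))   # residual variance grows from
                                   # var(eps) ~ 1 to var(eps) + var(u_y) ~ 5

The slope estimate stays centered on $\beta$, while the larger residual variance translates directly into larger standard errors, as in the formula above.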
A second important result is that measurement error in the explanatory variable, $x$, does result in a bias in the OLS estimate of $\beta$. To see this, consider the measured value of $x$ as the sum of the true value $x^*$ plus a measurement error $u_x$:

$$ x = x^* + u_x. \qquad (5) $$

Substituting $x^* = x - u_x$ in equation (2), we now get

$$ y = x\beta + (\varepsilon - \beta u_x). \qquad (6) $$

This is obviously a problem since the presence of $u_x$ in the error term generates a mechanical correlation between the error term, $\varepsilon - \beta u_x$, and the explanatory variable, $x = x^* + u_x$.
In fact, it now follows that

$$ \mathrm{plim}(\hat\beta) = \frac{\mathrm{cov}(y, x)}{\mathrm{var}(x)} = \frac{\mathrm{cov}(x^*\beta + \varepsilon, \; x^* + u_x)}{\mathrm{var}(x^* + u_x)} = \frac{\mathrm{var}(x^*)}{\mathrm{var}(x^* + u_x)}\,\beta + \frac{\mathrm{cov}(x^*\beta, u_x) + \mathrm{cov}(\varepsilon, \; x^* + u_x)}{\mathrm{var}(x^* + u_x)} = \frac{\mathrm{var}(x^*)}{\mathrm{var}(x^*) + \mathrm{var}(u_x)}\,\beta. \qquad (7) $$
So $\hat\beta$ now only converges in probability to a fraction $\frac{\mathrm{var}(x^*)}{\mathrm{var}(x^*) + \mathrm{var}(u_x)} < 1$ of the true $\beta$. This bias is called an attenuation bias since $\hat\beta$ is biased towards 0 (irrespective of whether $\beta$ is positive or negative). Note that the attenuation bias term is closely linked to the signal-to-noise ratio, since $\mathrm{var}(x^*)$ is the variance of the signal while $\mathrm{var}(u_x)$ is the variance of the noise. The larger the variance of the noise relative to the signal, the larger is the magnitude of the attenuation bias.
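
The attenuation formula is easy to verify by simulation. The sketch below is illustrative: the variances, $\beta = 2$, and the ols_slope helper are arbitrary assumptions, not part of the notes.

import numpy as np

rng = np.random.default_rng(1)
n, beta = 200_000, 2.0
var_xs, var_ux = 1.0, 0.5                 # signal and noise variances

x_star = rng.normal(scale=np.sqrt(var_xs), size=n)
y = beta * x_star + rng.normal(size=n)    # true model, equation (2)
x = x_star + rng.normal(scale=np.sqrt(var_ux), size=n)   # equation (5)

def ols_slope(y, x):
    # univariate OLS slope: cov(y, x) / var(x)
    x_c, y_c = x - x.mean(), y - y.mean()
    return (x_c @ y_c) / (x_c @ x_c)

print(ols_slope(y, x))                    # ~ 1.33, attenuated towards 0
print(beta * var_xs / (var_xs + var_ux))  # equation (7) prediction: 1.33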
Note that this simple and clear result about the form of the bias only holds in the case of a univariate regression. With multiple regressors where only one element of (the vector) $x$ is measured with error, the bias will also depend (in a not so straightforward way) on the correlation between the $x$ variable measured with error and the other explanatory variables.
1.2 Measurement error bias in first differences
When $x$ is measured with error, first-differencing to eliminate a fixed effect will generally magnify the importance of the measurement error bias. To see this, note that once the variables in the above model are replaced with their first differences, we have:

$$ \mathrm{plim}(\hat\beta_{FD}) = \frac{\mathrm{var}(\Delta x^*)}{\mathrm{var}(\Delta x^*) + \mathrm{var}(\Delta u_x)}\,\beta. \qquad (8) $$
The problem is that the variance of the noise term doubles from $\mathrm{var}(u_x)$ to $\mathrm{var}(\Delta u_x) = 2\,\mathrm{var}(u_x)$. By contrast, the variance of the signal declines if there is substantial serial correlation in $x^*$, which is typically the case with panel data (the serial correlation is equal to one in the extreme but common case of time-invariant regressors). If $\rho_x$ is the autocorrelation in $x^*$, it follows (since $\mathrm{var}(\Delta x^*) = \mathrm{var}(x^*_t) + \mathrm{var}(x^*_{t-1}) - 2\,\mathrm{cov}(x^*_t, x^*_{t-1})$) that

$$ \mathrm{var}(\Delta x^*) = 2(1 - \rho_x)\,\mathrm{var}(x^*) \qquad (9) $$
which is smaller than $\mathrm{var}(x^*)$ when $\rho_x > 0.5$. So unless $\rho_x \leq 0$, which is unlikely, first differencing will tend to magnify the attenuation bias due to measurement error. Using a similar approach, it can be shown that using the within (or difference from means) approach instead of first differences does not generally increase the measurement error bias as much. Taking longer (say $t$ vs. $t-3$) differences is another way of removing the fixed effect without increasing the measurement error bias as much as in the case of first differences. The intuition is that the autocorrelation term $\rho_x$ tends to decline for longer differences (for example in an AR(1) model), which results in a larger variance in the signal and a smaller attenuation bias. Griliches and Hausman have a classic piece on this in the 1986 Journal of Econometrics.
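
A two-period simulation illustrates how first-differencing magnifies the attenuation. This is again a hedged sketch: the persistence $\rho_x = 0.8$, the noise variance, and the omission of the fixed effect (set to zero so that only the measurement-error channel operates) are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
n, beta, rho = 200_000, 2.0, 0.8      # rho: autocorrelation of x*
var_ux = 0.5                          # variance of the classical noise

# two-period panel; var(x*) = 1 in both periods
x1 = rng.normal(size=n)
x2 = rho * x1 + rng.normal(scale=np.sqrt(1 - rho**2), size=n)

y1 = beta * x1 + rng.normal(size=n)
y2 = beta * x2 + rng.normal(size=n)

mx1 = x1 + rng.normal(scale=np.sqrt(var_ux), size=n)   # measured x
mx2 = x2 + rng.normal(scale=np.sqrt(var_ux), size=n)

def ols_slope(y, x):
    # univariate OLS slope: cov(y, x) / var(x)
    x_c, y_c = x - x.mean(), y - y.mean()
    return (x_c @ y_c) / (x_c @ x_c)

# pooled levels: beta * 1 / (1 + 0.5) ~ 1.33
print(ols_slope(np.concatenate([y1, y2]), np.concatenate([mx1, mx2])))
# first differences, equation (8):
# beta * 2(1 - rho) / (2(1 - rho) + 2 * var_ux) = 2 * 0.4 / 1.4 ~ 0.57
print(ols_slope(y2 - y1, mx2 - mx1))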
In practical terms, if you find that first-difference estimates are closer to zero than within estimates or estimates based on longer differences, measurement error may be the culprit. In such a case, you should trust the latter set of estimates more than standard first differences. It can also be shown that measurement error tends to magnify the attenuation bias in the random effects estimator relative to OLS. This is an additional reason for simply correcting the OLS standard errors by clustering instead of estimating the more efficient (but also more sensitive to measurement error) random effects model.
1.3 IV as a possible solution
What can be done about measurement error in the explanatory variables? Since the problem in equation (6) is that the error term is correlated with $x$, the standard solution is to find an instrumental variable for $x$ that is correlated with $x$, but not with the error term $\varepsilon - \beta u_x$. As it turns out, finding such a variable is not quite as difficult as in the standard case where the correlation with the error term is due to a deeper economic mechanism. In particular, when the problem is that $x$ is only an imperfect proxy for the true $x^*$, there may also be other proxies for $x^*$ that are available. For instance, consider a second proxy $z$, where

$$ z = x^* + u_z. \qquad (10) $$
Provided that the measurement error in $z$, $u_z$, is uncorrelated with the measurement error in $x$, $u_x$, we get that $z$ is a valid instrumental variable since:

$$ \mathrm{cov}(z, x) = \mathrm{var}(x^*) > 0 \qquad (11) $$

and

$$ \mathrm{cov}(z, \; \varepsilon - \beta u_x) = \mathrm{cov}(x^* + u_z, \; \varepsilon - \beta u_x) = 0. \qquad (12) $$
So the simple solution is to use one proxy as an instrument for the other proxy, and consistently estimate $\beta$ by TSLS. Note that $z$ will remain a valid instrument for $x$ (but not the other way around) even if $z$ has a different scale, i.e. $z = \lambda x^* + u_z$, with $\lambda \neq 1$. This can be quite useful with panel data, since $z$ can then consist of lagged values of $x$, where $z$ is correlated with $x$ (provided that $\rho_x \neq 0$) but where the measurement errors are not correlated under the assumption that measurement error is iid (classical measurement error).
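
A short simulation shows the two-proxy logic at work. Everything below is an illustrative assumption (the proxy noise scales, $\beta = 2$, and the helpers ols_slope and iv_slope); iv_slope computes the just-identified IV estimator $\mathrm{cov}(z, y)/\mathrm{cov}(z, x)$, which is numerically identical to TSLS with a single instrument.

import numpy as np

rng = np.random.default_rng(3)
n, beta = 200_000, 2.0

x_star = rng.normal(size=n)
y = beta * x_star + rng.normal(size=n)
x = x_star + rng.normal(scale=0.7, size=n)  # first proxy, equation (5)
z = x_star + rng.normal(scale=0.9, size=n)  # second proxy, equation (10)

def ols_slope(y, x):
    # univariate OLS slope: cov(y, x) / var(x)
    x_c, y_c = x - x.mean(), y - y.mean()
    return (x_c @ y_c) / (x_c @ x_c)

def iv_slope(y, x, z):
    # just-identified IV estimator: cov(z, y) / cov(z, x)
    z_c = z - z.mean()
    return (z_c @ (y - y.mean())) / (z_c @ (x - x.mean()))

print(ols_slope(y, x))    # attenuated: ~ 2 / (1 + 0.49) ~ 1.34
print(iv_slope(y, x, z))  # ~ 2.0: z purges the error in x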
Another approach is to use group means as an instrumental variable. For example, if we have data on firms, the average value of $x$ in the industry can be used as an instrument for firm-level $x$. The two variables tend to be strongly correlated if $x$ varies substantially across industries, while measurement error in the industry-level variable tends to be small since firm-level measurement errors get averaged out to zero when we compute the industry mean. Even better, the mean $x$ among all other firms in the industry can be used as an instrument to make sure the firm-specific measurement error does not become part of the industry average.
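
The leave-one-out group-mean instrument can be sketched the same way; the industry structure and all parameter values below are invented for illustration, and iv_slope is the same hypothetical helper as above.

import numpy as np

rng = np.random.default_rng(4)
n_ind, n_firm, beta = 500, 50, 2.0
n = n_ind * n_firm
ind = np.repeat(np.arange(n_ind), n_firm)     # industry id per firm

mu = rng.normal(size=n_ind)                   # industry component of x*
x_star = mu[ind] + rng.normal(scale=0.5, size=n)
y = beta * x_star + rng.normal(size=n)
x = x_star + rng.normal(scale=0.8, size=n)    # noisy firm-level x

# leave-one-out industry mean of measured x: exclude each firm's own x
z = (np.bincount(ind, weights=x)[ind] - x) / (n_firm - 1)

def iv_slope(y, x, z):
    # just-identified IV estimator: cov(z, y) / cov(z, x)
    z_c = z - z.mean()
    return (z_c @ (y - y.mean())) / (z_c @ (x - x.mean()))

print(iv_slope(y, x, z))   # ~ 2.0: other firms' errors average out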
1.4 Non-classical measurement error
Many validation studies have found that the assumption underlying classical measurement error was incorrect, and that measurement error was often correlated with the $x$ variable. For example, several validation studies indicate that high-income people tend to under-report their income, while low-income people tend to over-report. This means that $u_x$ is negatively correlated with $x^*$, which complicates the formula for the attenuation bias.
The IV approach will still work provided that $u_z$ remains uncorrelated with $u_x$, but this won't happen if the measurement error $u_z$ is also non-classical and correlated with $x^*$.
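
To see how this kind of error changes the bias, one can posit a specific mean-reverting form, $u_x = -\delta x^* + v$. This functional form is my assumption for illustration, not from the notes; under it, measured $x = (1 - \delta)x^* + v$ and the plim of OLS no longer equals the classical attenuation formula.

import numpy as np

rng = np.random.default_rng(5)
n, beta, delta = 200_000, 2.0, 0.3

x_star = rng.normal(size=n)
y = beta * x_star + rng.normal(size=n)
u_x = -delta * x_star + rng.normal(scale=0.5, size=n)  # mean-reverting error
x = x_star + u_x                       # = (1 - delta) * x* + v

def ols_slope(y, x):
    # univariate OLS slope: cov(y, x) / var(x)
    x_c, y_c = x - x.mean(), y - y.mean()
    return (x_c @ y_c) / (x_c @ x_c)

print(ols_slope(y, x))
# plim = beta*(1-delta)*var(x*) / ((1-delta)^2*var(x*) + var(v)) ~ 1.89,
# which differs from the classical attenuation formula in equation (7)
print(beta * (1 - delta) / ((1 - delta) ** 2 + 0.25))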
Another case where non-classical measurement error can prevail is when both $y$ and $x$ are defined partly in terms of the same underlying variables. Take for instance the case where both $y$ and $x$ are defined on a per capita basis. For example, say we have a sample of regions with regional-level aggregate variables $Y$ and $X$ that are divided by population $P$ to get a per capita number. When working with logs, we have

$$ y = \log(Y) - \log(P) \quad \text{and} \quad x = \log(X) - \log(P). $$
If measured population is based on a noisy estimate of the true population $P^*$, where $\log(P) = \log(P^*) + u_p$, while $Y$ and $X$ are measured without error, we then get that

$$ y = \log(Y) - \log(P) = \log(Y) - \log(P^*) - u_p = y^* - u_p $$

and

$$ x = \log(X) - \log(P) = \log(X) - \log(P^*) - u_p = x^* - u_p. $$
So we now have the same measurement error in $x$ and $y$, which results in inconsistent OLS estimates since

$$ \mathrm{plim}(\hat\beta) = \frac{\mathrm{cov}(y, x)}{\mathrm{var}(x)} = \frac{\mathrm{cov}(y^* - u_p, \; x^* - u_p)}{\mathrm{var}(x^* - u_p)} = \frac{\mathrm{cov}(x^*\beta + \varepsilon - u_p, \; x^* - u_p)}{\mathrm{var}(x^* - u_p)} = \frac{\beta\,\mathrm{var}(x^*) + \mathrm{var}(u_p)}{\mathrm{var}(x^*) + \mathrm{var}(u_p)} \neq \beta. \qquad (13) $$
As before, the bias goes away when the variance of the signal, $\mathrm{var}(x^*)$, becomes very large relative to the variance of the noise, $\mathrm{var}(u_p)$. But when the signal-to-noise ratio is not that large, the mechanical correlation between $y$ and $x$ due to $u_p$ tends to bias $\hat\beta$ towards 1. For example, even if the true $\beta$ is zero, we have

$$ \mathrm{plim}(\hat\beta) = \frac{\mathrm{var}(u_p)}{\mathrm{var}(x^*) + \mathrm{var}(u_p)} \qquad (14) $$

which lies between 0 and 1, depending on the signal-to-noise ratio.
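
A last sketch illustrates this mechanical bias in the extreme case $\beta = 0$; as everywhere above, the variances and the helper function are illustrative assumptions, not part of the notes.

import numpy as np

rng = np.random.default_rng(6)
n = 200_000
var_xs, var_up = 1.0, 0.5

x_star = rng.normal(scale=np.sqrt(var_xs), size=n)
y_star = rng.normal(size=n)            # true beta = 0: y* unrelated to x*
u_p = rng.normal(scale=np.sqrt(var_up), size=n)  # error in log population
y, x = y_star - u_p, x_star - u_p      # shared error enters both sides

def ols_slope(y, x):
    # univariate OLS slope: cov(y, x) / var(x)
    x_c, y_c = x - x.mean(), y - y.mean()
    return (x_c @ y_c) / (x_c @ x_c)

print(ols_slope(y, x))             # ~ 0.33, spurious positive slope
print(var_up / (var_xs + var_up))  # equation (14) prediction: 0.333...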