Академический Документы
Профессиональный Документы
Культура Документы
240A
For 240A we do not go in to great detail. Key OLS results are in Section
1 and 4. The theorems cited in sections 2 and 3 are those from Appendix A of
Cameron and Trivedi (2005), Microeconometrics: Methods and Applications.
yi = x i + ui
ui jxi [0; 2 ], not necessarily normal.
PN 1
Then E[b] = and V[b] = 2 2
i=1 xi for any N .
1.2. Consistency of b
Intuitively b collapses on its mean of as N ! 1, since V[b] ! 0 as N ! 1.
The formal term is that b converges in probability to , or that plim b = ,
p
or that b ! . We say that b is consistent for .
The usual method of proof is not this simple as our toolkit works separately
on each average, and then combines results. Given the dgp, b can be written as
1
PN
b = + N Pi=1 xi ui :
1 N 2
N i=1 xi
a
where means is asymptotically distributed as.
Again our toolkit works separatelyp on each average, and then combines re-
sults. The method is to rescale by N , to get something with nondegenerate
distribution, and
h PN 2 i
1
PN 2 1
p p
i=1 xi ui d
N 0; plim N i=1 xi
N (b ) = N
P N
! P
N
1
i=1 xi
2
plim N1 N 2
i=1 xi
" #
1
d 2 1 PN 2
! N 0; plim x :
N i=1 i
P
The key component is that p1N N i=1 xi ui has a normal distribution as N ! 1,
by a central limit theorem. p
Given this theoretical result we convert from N (b ) to b and drop the
plim, giving
b a N [ , 2 (PN x2 ) 1 ]:
i=1 i
2. Consistency
The keys are (1) convergence in probability; (2) law of large numbers for an
average; (3) combining pieces.
2
Examples include: (1)
P bN is an estimator, say b; (2) bN is a component of an
estimator, such as N 1 i xi ui ; (3) bN is a test statistic.
Due to sampling randomness we can never be certain that a random sequence
bN , such as an estimator bN , will be within a given small distance of its limit, even
if the sample is innitely large. But we can be almost certain. Dierent ways of
expressing this almost certainty correspond to dierent types of convergence of a
sequence of random variables to a limit. The one most used in econometrics is
convergence in probability.
Recall that a sequence of nonstochastic real numbers faN g converges to a if
for any " > 0, there exists N = N (") such that for all N > N ,
e.g. if aN = 2+3=N; then the limit a = 2 since jaN aj = j2+3=N 2j = j3=N j < "
for all N > N = 3=":
For a sequence of r.v.s we cannot be certain that jbN bj < ", even for large N ,
due to the randomness. Instead, we require that the probability of being within
" is arbitrarily close to one.
Thus fbN g converges in probability to b if
3
Denition A2: (Consistency) An estimator b is consistent for 0 if
plim b = 0:
4
Note that Xi here is general notation for a random variable, and in the regression
context does not necessarily denote the regressor variables. For example Xi = xi ui .
Denition A7: (Law of Large Numbers) A weak law of large numbers (LLN)
species conditions on the individual terms Xi in XN under which
p
(XN E[XN ]) ! 0:
For a strong law of large numbers the convergence is instead almost surely.
If a LLN can be applied then
plim XN = lim E[XN ] in general
1
PN
= lim N i=1 E[Xi ] if Xi independent over i
= if Xi iid.
The simplest laws of large numbers assume that Xi is iid.
Theorem A8: (Kolmogorov LLN ) Let fXi g be iid (independent and identically
as
distributed). If and only if E[Xi ] = exists and E[jXi j] < 1, then (XN ) ! 0.
The Kolmogorov LLN gives almost sure convergence. Usually convergence in
probability is enough and we can use the weaker Khinchines Theorem.
Theorem A8b: (Khinchines Theorem) Let fXi g be iid (independent and iden-
p
tically distributed). If and only if E[Xi ] = exists, then (XN ) ! 0.
There are other laws of large numbers. In particular, if Xi are independent
but not identically distributed we can use the Markov LLN.
5
3. Asymptotic Normality
The keys are (1) convergence in distribution; (2) central limit theorem for an
average; (3) combining pieces.
Given consistency, the estimator b has a degenerate distribution that collapses
on 0 as N ! 1. So cannot do statistical inference. [Indeed there is no reason to
do it if N ! 1.] Need to magnify or rescale b to obtain a random variable with
nondegenerate distribution as N ! 1.
lim FN = F;
N !1
6
This result is also called the Continuous Mapping Theorem.
d p
Theorem A12: (Slutskys Theorem) If aN ! a and bN ! b, where a is a random
variable and b is a constant, then
d
(i) aN + b N ! a + b
d
(ii) aN bN ! ab (3.2)
d
(iii) aN =bN ! a=b, provided Pr[b = 0] = 0:
XN E[XN ]
ZN = p ;
V[XN ]
7
If XN satises a central
p limit theorem, then so too does h(N )XN for functions
h( ) such as h(N ) = N , since
d d p
Using Slutskys theorem that aN bN ! a b (for aN ! a and bN ! b), this
implies that
PN
p1
1 XN i=1 xi ui p 2 d
p d
p xi ui = Np E[x2 ] ! N [0; 1] 2 E[x2 ] ! N [0; 2 E[x2 ]]:
N i=1 2 E[x2 ]
8
d d p
Then using Slutskys theorem that aN =bN ! a=b (for aN ! a and bN ! b)
PN
p p1
i=1 xi ui
N (b ) = N
1
P N 2
N i=1 xi
N [0; 2 E[x2 ]] d N [0; 2 E[x2 ]] d h i
d 2 2 1
! P ! ! N 0; E[x ] ;
plim N1 N 2
i=1 xi
E[x2 ]
P
where we use result from consistency proof that plim N 1 N 2 2
i=1 xi =E[x ].
a
XN 1
b N ; 2
x2i :
i=1
Note that this is exactly the same result as we would have got if yi = xi + ui
with ui N [0; 2 ]:
9
4.1. Multivariate CLT
Now we need to work with vectors. Two useful results follow.
Denition A16a: (Multivariate Central Limit Theorem) Let N = E[XN ]
and VN = V[XN ]. A multivariate central limit theorem (CLT) species the
conditions on the individual terms Xi in XN under which
1=2 d
VN (XN N) ! N [0; I]:
1=2
Note that if we can apply a CLT to XN we can also apply it to N XN .
d
Theorem A17: (Limit Normal Product Rule) If a vector aN ! N [ ; A] and
p
a matrix HN ! H, where H is positive denite, then
d
HN aN ! N [H ; HAH0 ]:
b= 1 1
+ N X0 X N 1
X0 u:
P
The reason for renormalization in the right-hand side is that N 1 X0 X = N 1 i xi x0i
is an average that converges in probability to a nite nonzero matrix if xi satises
assumptions that permit a LLN to be applied to xi x0i .
Then
1
plim b = + plim N 1 X0 X plim N 1 X0 u ;
using Slutskys Theorem (Theorem A.3). The OLS estimator is therefore consis-
tent for (i.e., plim b OLS = ) if
1
plim N X0 u = 0:
P
If a law of LLN can be applied to the average N 1 X0 u = N 1 i xi ui then
a necessary condition for this to hold is that E[xi ui ] = 0. The fundamental
condition for consistency of OLS is that E[ui jxi ] = 0 so that E[xi ui ] = 0.
10
4.3. Limit Distribution of OLS
Given consistency, the limit distribution of b is degenerate with p
all the mass at
. To obtain a limit distribution we scale b OLS up by a multiple N , so
p 1
N (b ) = N 1 X0 X N 1=2 X0 u:
We know plim N 1 X0 X exists and is nite and nonzero from the proof of consis-
tency. For iid errors, E[ujX] = 0 and V[X0 ujX] = E[X0 uu0 X0 jX] = 2 X0 X we
assume that a CLT can be applied to bN = N 1=2 X0 u to yield
1 d
[N X0 X] 1=2
N 1=2
X0 u ! N [0; I], so by Theorem A17
1=2 d
N X0 u ! N [0; 2
plim N 1
X0 X ]:
so
b a 2
N[ ; (X0 X) 1 ]:
The asymptotic variance matrix is
V[ b ] = 2
(X0 X) 1 ;
11
4.5. OLS with Heteroskedastic Errors
What if the errors are heteroskedastic?
PN 2 If E[uu0 jX] = = Diag[ 2i ] then V[X0 ujX] =
0 0 0 0 0
E[X uu X jX] = X X = i=1 i xi xi . A Multivariate CLT gives
1 d
[N X0 X] 1=2
N 1=2
X0 u ! N [0; I], so
1=2 d
N X0 u ! N [0; plim N 1
X0 X];
leading to
p d
N (b
1
) ! (plim N 1 X0 X) N [0; plim N 1 X0 X]
d 1 1
! N [0; 2 (plim N 1 0
X X) plim N 1 X0 X (plim N 1
X0 X) ]:
Then dropping the limits etcetera
b a N [ ; (X0 X) 1 X0 X(X0 X) 1 ]:
XN XN
plim N 1 u2i xi x0i = plim N 1 2 0
i xi xi :
i=1 i=1
p
The error is not observed, so we instead use the residual. Formally, since b ! ,
p
b2i u2i ! 0 so replacing u2i by u
u b2i makes no dierence asymptotically. This gives
Whites estimator.
This is a fundamental result. Without specifying a model for heteroskedasticity
we can do OLS and get heteroskedastic robust standard errors. This generalizes
to other failures of = 2 I, notably serially correlated errors (use Newey-West)
and clustered errors. And it generalizes to estimators other than OLS (e.g. IV,
MLE).
12
5. Optional Extra: Dierent Sampling Schemes
The key for consistency is obtaining the probability of the two averages (of xi ui
and of x2i ), by use of laws of large numbers (LLN). And for asymptotic normality
the key is the limit distribution of the average of xi ui , obtained by a central limit
theorem (CLT).
Dierent assumptions about the stochastic properties of xi and ui lead to
dierent properties of x2i and xi ui and hence dierent LLN and CLT.
2. Fixed regressors.
This occurs in an experiment where we x the xi and observe the resulting
random yi . Given xi xed and ui iid it follows that xi ui are inid (even if ui
are iid), while x2i are nonstochastic.
13
The rest of the side-condition is
Plikely to hold with cross-section data. e.g. if set
1
= 1, then need variance plus i=1 ( i =i ) < 1 which happens if 2i is bounded.
2 2
Theorem A15: (Liapounov CLT ) Let fXi g be independent with E[Xi ] = i and
PN PN 2 (2+ )=2
V[Xi ] = 2i . If lim i=1 E[jXi ij
2+
] = i=1 i = 0, for some choice
P q PN 2 d
of > 0, then ZN = N i=1 (Xi i )= i=1 i ! N [0; 1].
14
5.3.3. Exogenous Stratied Sampling with iid errors
Assume xi inid with mean E[xi ] and variance V[xi ] and ui iid with mean 0.
Now xi ui are inid with mean E[xi ui ] =E[xi ]E[ui ] = 0 and variance V[xi ui ] =E[x2i ] 2 ,
P p
so need Markov LLN. This yields N 1 i xi ui ! 0, with the side-condition satis-
ed if E[x2i ] is bounded.
And x2i are inid, so need Markov LLN with side-condition that requires e.g.
existence and boundedness of E[x4i ].
Combining again get plim b = .
d d p
Using Slutskys theorem that aN bN ! a b (for aN ! a and bN ! b), this
implies that
PN
p1
1 XN i=1 xi ui p 2 d
p d
p xi ui = Np E[x2 ] ! N [0; 1] 2 E[x2 ] ! N [0; 2 E[x2 ]]:
N i=1 2 E[x2 ]
d d p
Then using Slutskys theorem that aN =bN ! a=b (for aN ! a and bN ! b)
PN
p p1
i=1 xi ui
N (b ) = N
1
P N 2
N i=1 xi
N [0; 2 E[x2 ]] d N [0; 2 E[x2 ]] d h i
d 2 2 1
! P ! ! N 0; E[x ] ;
plim N1 N 2
i=1 xi
E[x2 ]
P
where we use result from consistency proof that plim N 1 N 2 2
i=1 xi =E[x ].
15
5.4.2. Fixed Regressors with iid errors
Assume xi xed and and ui iid with mean 0 and variance 2 .
Then xi ui are inid with mean 0 and variance V[xi ui ] = x2i 2 . Apply Liapounov
LLN yielding
0 1 PN
PN p1
p N 1
x u 0 x i ui
N @q i=1 i i A = q N i=1 d
! N [0; 1]:
P P
lim N 1 N i=1 xi
2 2 2 lim N 1 N 2
i=1 xi
d d p
Using Slutskys theorem that aN bN ! a b (for aN ! a and bN ! b), this
implies
PN r
p1
1 XN N i=1 xi ui
XN
p x i ui = q PN 2
2N 1 x2i
N i=1 2 lim N 1 i=1
i=1 xi
r
XN d 1 XN 2
= N [0; 1] 2 lim N 1 x2i ! N [0; 2 lim xi ]:
i=1 N i=1
d d p
Then using Slutskys theorem that aN =bN ! a=b (for aN ! a and bN ! b)
h PN 2 i " #
1
PN 2 1
p p
i=1 x i u i d
N 0; lim N i=1 x i d 1 XN 2
1
N (b ) = N1 PN 2 ! 1
P N
! N 0; 2
lim xi :
N i=1 x i lim N i=1 x 2
i
N i=1
16
6. Simulation Exercise
6.1. Data Generating Process
2
Suppose yi are iid
P (1), which has mean 1 and variance 2.
1 N
Then y = N i=1 yi has E[y] = 1, V[y] = V[y]=N = 2=N , and using the result
that the sum of N independent 2 (1) is 2 (N ), we know that y 2
(N )=N .
6.2. Distribution of y
From a CLT we know that as N ! 1
y N [1; 2=N ]:
Question: how well does this apply in nite samples? Answer: do a simulation:
Then does the density of the S generated ys look like the density of a standard
normal with mean 1 and variance 2=N ? We can do this in several ways: comparing
key moments, or comparing key percentiles, or compare a histogram or (kernel)
density estimate to the normal.
Then does the density of the S generated ts look like a t(N 1) density? Again
we can test the similarity using moments, percentiles or kernel density estimate.
17
6.4. Size of t-test
For the t-test it is behavior in the tails that really matter. In addition to view
the 2:5 and 97:5 percentiles, we investigate the size of the t-test. Recall
The nominal size of the test is the size that we think we are testing at. Assume
that t T (N 1) distributed under H0 . Then for two-tailed testing at 5 percent
18