Regression
Fabian Waldinger
Waldinger (Warwick) 1 / 44
Topics Covered in Lecture
1 Bad controls.
2 Standard errors.
3 Quantile regression.
Bad Control Problem
An Example
We are interested in the effect of a college degree on earnings.
People can work in two occupations: white collar (Wi = 1) or blue
collar (Wi = 0).
A college degree also increases the chance of getting a white collar
job.
⇒ Potential outcomes of getting a college degree: earnings and
working in a white collar job.
Suppose you are interested in the effect of a college degree on
earnings. It may be tempting to include occupation as an additional
control (because earnings may substantially differ across occupations):

Yi = β1 + β2 Ci + β3 Wi + εi

The observed outcome and occupation are determined by the potential outcomes:

Yi = Ci Y1i + (1 − Ci) Y0i
Wi = Ci W1i + (1 − Ci) W0i
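The bad control problem can be illustrated with a short simulation. This is a hypothetical data-generating process (variable names and all coefficients are assumptions for illustration, not from the lecture): a randomly assigned degree C raises earnings Y directly, and also raises the chance of a white-collar job W, which in turn depends on unobserved ability.

```python
import numpy as np

# Illustrative DGP: even with a randomly assigned degree C, controlling for
# occupation W biases the estimated effect of C, because W depends on both
# C and unobserved ability.
rng = np.random.default_rng(0)
n = 200_000
ability = rng.normal(size=n)
C = (rng.normal(size=n) > 0).astype(float)                      # randomly assigned degree
W = (0.5 * C + ability + rng.normal(size=n) > 0).astype(float)  # white-collar job
Y = 1.0 * C + ability + rng.normal(size=n)                      # true effect of C is 1.0

def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_no_control = ols(np.column_stack([ones, C]), Y)[1]        # close to the true 1.0
b_bad_control = ols(np.column_stack([ones, C, W]), Y)[1]    # biased downward
```

Conditioning on W compares, within the white-collar group, graduates against non-graduates who got the job despite having no degree (and hence higher ability on average), which biases the coefficient on C even though C itself is randomly assigned.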
Part 2
Standard Errors
Standard Errors in Practical Applications
With a general error covariance matrix Ω, the OLS variance is:

V(β̂) = (X′X)⁻¹ X′ΩX (X′X)⁻¹

With homoskedastic, uncorrelated errors this reduces to the usual:

V(β̂) = σ²(X′X)⁻¹

Consider a regression with a regressor that varies only at the group level:

Yig = β0 + β1 xg + eig

- i : individual
- g : group, with G groups.
Grouped Error Structure
eig = vg + ηig
An error structure like this can increase standard errors (Kloek, 1981;
Moulton, 1986).
With this error structure the intraclass correlation coefficient becomes:

ρe = σv² / (σv² + ση²)
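A short simulation can check the intraclass correlation. The variance components below (σv² = 0.3, ση² = 0.7) are assumed numbers for illustration:

```python
import numpy as np

# Simulate e_ig = v_g + eta_ig for two members of each group and check that
# their correlation matches rho_e = s2v / (s2v + s2eta).
rng = np.random.default_rng(1)
G = 100_000
s2v, s2eta = 0.3, 0.7
v = rng.normal(scale=np.sqrt(s2v), size=G)         # one group shock v_g per group
e1 = v + rng.normal(scale=np.sqrt(s2eta), size=G)  # member 1 of each group
e2 = v + rng.normal(scale=np.sqrt(s2eta), size=G)  # member 2, same v_g
rho_hat = np.corrcoef(e1, e2)[0, 1]                # close to 0.3
rho_theory = s2v / (s2v + s2eta)
```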
Derivation of the Moulton Factor - Notation
Stack the ng observations of group g into vectors:

yg = (Y1g, Y2g, ..., Yng,g)′        eg = (e1g, e2g, ..., eng,g)′
Derivation of the Moulton Factor - Group Covariance
The error covariance matrix is block diagonal:

E(ee′) = Ω = diag(Ω1, Ω2, ..., ΩG)        [(G·ng) × (G·ng)]

with within-group blocks

Ωg = σe² × [ 1    ρe   ...  ρe
             ρe   1    ...  ρe
             ...            ρe
             ρe   ...  ρe   1  ]        (ng × ng)

- where ρe = σv² / (σv² + ση²) (see above).
- The dimensions of the matrix are only correct if ng is the same
across all groups.
Derivation of the Moulton Factor
Rewriting (multiplying Ωg by an ng-vector of ones ι):

Ωg ι = σe² × [ 1 + ρe + ρe + ... + ρe ]          [ 1 + (ng − 1)ρe ]
             [ 1 + ρe + ρe + ... + ρe ]  = σe² × [ 1 + (ng − 1)ρe ]
             [ ...                    ]          [ ...            ]
             [ 1 + ρe + ρe + ... + ρe ]          [ 1 + (ng − 1)ρe ]        (ng × 1)
Derivation of the Moulton Factor
Using ι′ × (1 + (ng − 1)ρe, ..., 1 + (ng − 1)ρe)′ = ng [1 + (ng − 1)ρe] we get:

xg ι′ Ωg ι xg′ = σe² ng [1 + (ng − 1)ρe] xg xg′
Derivation of the Moulton Factor
Now define τg = 1 + (ng − 1)ρe so we get:

xg ι′ Ωg ι xg′ = σe² ng τg xg xg′

And therefore:

X′ΩX = σe² Σg ng τg xg xg′

V(β̂) = (X′X)⁻¹ X′ΩX (X′X)⁻¹
     = σe² (Σg ng xg xg′)⁻¹ [Σg ng τg xg xg′] (Σg ng xg xg′)⁻¹
Comparing Group Data Variance to Normal OLS Variance
If group sizes are equal (ng = n and τg = τ) we get:

V(β̂) = σe² (Σg n xg xg′)⁻¹ [Σg n τ xg xg′] (Σg n xg xg′)⁻¹
      = σe² τ (Σg n xg xg′)⁻¹

Therefore

Vclustered(β̂) / VOLS(β̂) = τ = 1 + (n − 1) ρe
This ratio tells us how much we overestimate precision by ignoring the
intraclass correlation. The square root of this ratio is called the
Moulton factor (Moulton, 1986).
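A Monte Carlo sketch can confirm this ratio. The design below (group sizes, variance components, a group-level regressor) is an assumed setup for illustration, not taken from the lecture:

```python
import numpy as np

# Group-level regressor x_g, equal group sizes n, errors e_ig = v_g + eta_ig.
# The true sampling variance of b1, relative to the naive OLS formula,
# should be close to the design effect 1 + (n - 1) * rho_e.
rng = np.random.default_rng(2)
G, n = 50, 20
s2v, s2eta = 0.2, 0.8
rho_e = s2v / (s2v + s2eta)                  # intraclass correlation = 0.2
x = np.repeat(rng.normal(size=G), n)         # x varies only across groups
X = np.column_stack([np.ones(G * n), x])
XtX_inv = np.linalg.inv(X.T @ X)

betas = []
for _ in range(4000):
    e = np.repeat(rng.normal(scale=np.sqrt(s2v), size=G), n) \
        + rng.normal(scale=np.sqrt(s2eta), size=G * n)
    y = 1.0 + 0.0 * x + e                    # true slope is zero
    betas.append((XtX_inv @ (X.T @ y))[1])

var_true = np.var(betas)                     # Monte Carlo variance of b1
var_ols = (s2v + s2eta) * XtX_inv[1, 1]      # naive formula with sigma^2 known
ratio = var_true / var_ols                   # close to 1 + (n - 1) * rho_e = 4.8
```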
Generalized Moulton Factor
With equal group sizes the ratio of covariances is (last slide):

Vclustered(β̂) / VOLS(β̂) = 1 + (n − 1) ρe

For a fixed number of total observations this gets larger if the group size
n goes up and if the within-group correlation ρe increases.
The formula above covers the important special case where the group
sizes are the same and where the regressors do not vary within groups.
A more general formula allows the regressor xig to vary within groups
(therefore it is now indexed by i) and it allows for different group
sizes ng:

Vclustered(β̂) / VOLS(β̂) = 1 + [V(ng)/n̄ + n̄ − 1] ρx ρe

where n̄ is the average group size and ρx is the intraclass correlation of xig.
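As a worked example with hypothetical numbers (average group size, variance of group sizes, and both intraclass correlations are assumptions chosen for illustration):

```python
# Generalized Moulton formula with assumed inputs.
nbar, var_ng = 30, 100          # average group size and variance of group sizes
rho_x, rho_e = 0.8, 0.1         # intraclass correlations of regressor and error
inflation = 1 + (var_ng / nbar + nbar - 1) * rho_x * rho_e
moulton_factor = inflation ** 0.5   # multiply naive standard errors by this
```

Here the naive variance understates the clustered variance by a factor of about 3.6, so conventional standard errors should be scaled up by roughly 1.9.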
Clustering and IV
The Moulton factor works similarly with 2SLS estimates. In that case
the Moulton factor is:

Vclustered(β̂) / VOLS(β̂) = 1 + [V(ng)/n̄ + n̄ − 1] ρx̂ ρe

where ρx̂ is the intraclass correlation of the first-stage fitted values x̂ig.
Clustering and Differences-in-Differences
Practical Tips for Estimating Models with Clustered Data
V(β̂) = (X′X)⁻¹ (Σg Xg′ Ψ̂g Xg) (X′X)⁻¹

Ψ̂g is estimated as:

Ψ̂g = a êg êg′ = a × [ ê1g²        ê1g ê2g   ...        ê1g êng,g
                      ê1g ê2g     ê2g²                 ...
                      ...                    ...       êng−1,g êng,g
                      ê1g êng,g   ...        êng−1,g êng,g   êng,g² ]

a is a degrees-of-freedom correction.
The clustered estimator is consistent as the number of groups gets
large given any within-group correlation structure, not just the
parametric model that we have investigated above. ⇒ Asymptotics
work at the group level.
Increasing group sizes to infinity does not make the estimator
consistent if the number of groups does not increase.
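A minimal numpy sketch of this clustered (Liang-Zeger) variance estimator, assuming the common finite-sample correction a = G/(G−1) × (n−1)/(n−k) (one standard choice among several):

```python
import numpy as np

def cluster_robust_vcov(X, resid, groups):
    """Clustered sandwich variance (X'X)^-1 (sum_g s_g s_g') (X'X)^-1,
    where s_g = X_g' e_g, with a degrees-of-freedom correction a."""
    n, k = X.shape
    labels = np.unique(groups)
    G = len(labels)
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = np.zeros((k, k))
    for g in labels:
        Xg = X[groups == g]
        eg = resid[groups == g]
        s = Xg.T @ eg                 # k-vector of within-group score sums
        meat += np.outer(s, s)        # equals Xg' (eg eg') Xg
    a = (G / (G - 1)) * ((n - 1) / (n - k))
    return a * XtX_inv @ meat @ XtX_inv
```

With each observation as its own cluster this reduces to the usual heteroskedasticity-robust (HC1) estimator.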
Practical Tips for Estimating Models with Clustered Data
3 Use group averages instead of micro data:
I.e. estimate:
Ȳg = β0 + β1 Xg + ēg
4 Block bootstrap:
Draw blocks of data defined by the groups g.
5 Do GLS:
For this one would have to assume a particular error structure and
then estimate GLS.
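The block bootstrap can be sketched as follows (the function name and setup are illustrative, not from the lecture): resample whole groups with replacement, re-estimate on each resample, and take the spread of the resampled slopes as the standard error.

```python
import numpy as np

def block_bootstrap_se(y, X, groups, n_boot=500, seed=0):
    """Bootstrap standard error of the slope, resampling whole clusters."""
    rng = np.random.default_rng(seed)
    labels = np.unique(groups)
    idx_by_g = [np.flatnonzero(groups == g) for g in labels]
    slopes = []
    for _ in range(n_boot):
        draw = rng.integers(len(labels), size=len(labels))  # clusters, with replacement
        idx = np.concatenate([idx_by_g[j] for j in draw])
        b = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
        slopes.append(b[1])
    return np.std(slopes, ddof=1)
```

With grouped errors and a group-level regressor, this standard error is much larger than the naive OLS one, as the Moulton formula predicts.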
How Many Clusters Do We Need?
Solutions For Small Number of Clusters
Ψ̂g = a ẽg ẽg′

ẽg = Ag êg

where Ag solves:

Ag′ Ag = (I − Hg)⁻¹
Hg = Xg (X′X)⁻¹ Xg′

and a is a degrees-of-freedom correction
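This residual adjustment can be sketched in numpy. One valid choice of Ag (an assumption here, since any Ag with Ag′Ag = (I − Hg)⁻¹ works) is the symmetric inverse square root of I − Hg:

```python
import numpy as np

def adjusted_residuals(X, resid, groups):
    """Rescale each group's residuals as e~_g = A_g e_g, with A_g the
    symmetric inverse square root of (I - H_g), so A_g' A_g = (I - H_g)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    out = np.empty_like(resid)
    for g in np.unique(groups):
        m = groups == g
        Xg = X[m]
        Hg = Xg @ XtX_inv @ Xg.T                  # H_g = X_g (X'X)^-1 X_g'
        w, V = np.linalg.eigh(np.eye(m.sum()) - Hg)
        Ag = V @ np.diag(w ** -0.5) @ V.T         # (I - H_g)^(-1/2)
        out[m] = Ag @ resid[m]
    return out
```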
Solutions For Small Number of Clusters
Part 3
Quantile Regression
Quantile Regression
Quantile Regression
Qτ(Yi | Xi) = Fy⁻¹(τ | Xi)
The Minimization
With nine equally likely values yi = 1, ..., 9 (each with weight 1/9), the
expected check loss is:

ρτ(u) = Σ_{yi ≥ u} [ (1/9)(yi − u) ] · τ + Σ_{yi < u} [ (1/9)(yi − u) ] · (τ − 1)
The Loss Function: Example
If we want to find the median, τ = 0.5:

Expected loss if u = 3:

ρ0.5(3) = Σ_{i=3}^{9} [ (0.5/9)(i − 3) ] − Σ_{i=1}^{2} [ (0.5/9)(i − 3) ]
        = (0.5/9)[0 + 1 + 2 + 3 + 4 + 5 + 6] − (0.5/9)[−2 − 1] = (0.5/9) · 24

Expected loss if u = 5:

ρ0.5(5) = Σ_{i=5}^{9} [ (0.5/9)(i − 5) ] − Σ_{i=1}^{4} [ (0.5/9)(i − 5) ]
        = (0.5/9)[0 + 1 + 2 + 3 + 4] − (0.5/9)[−4 − 3 − 2 − 1] = (0.5/9) · 20
The Loss Function: Example
u:                  1    2    3    4    5    6    7    8    9
ρ0.5(u) × 9/0.5:   36   29   24   21   20   21   24   29   36

The loss is minimised at u = 5, the median.
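The whole table can be recomputed with a few lines (a sketch; the common 1/9 weight is factored out so the entries match the slide's units):

```python
import numpy as np

# Expected check loss over the nine equally likely points y = 1..9 at tau = 0.5.
y = np.arange(1, 10)

def loss_sum(u, tau):
    """Sum of the check function rho_tau(y_i - u) over the nine points."""
    d = y - u
    return tau * d[d >= 0].sum() + (tau - 1) * d[d < 0].sum()

row = [loss_sum(u, 0.5) / 0.5 for u in y]   # 36, 29, 24, 21, 20, 21, 24, 29, 36
u_min = y[int(np.argmin(row))]              # 5, the median
```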
The Loss Function: Example (bottom 1/3)
We would like to find the bottom 1/3 of the distribution: τ = 1/3
The expected loss is:

ρ1/3(u) = Σ_{yi ≥ u} [ (1/9)(yi − u) ] · (1/3) + Σ_{yi < u} [ (1/9)(yi − u) ] · (1/3 − 1)

⇒ Now the loss function weights positive and negative terms asymmetrically.
Expected loss if u = 3:

ρ1/3(3) = Σ_{i=3}^{9} [ (1/27)(i − 3) ] − Σ_{i=1}^{2} [ (2/27)(i − 3) ]
        = (1/27)[0 + 1 + 2 + 3 + 4 + 5 + 6] − (2/27)[−2 − 1] = 21/27 + 6/27 = 27/27

Expected loss if u = 5:

ρ1/3(5) = Σ_{i=5}^{9} [ (1/27)(i − 5) ] − Σ_{i=1}^{4} [ (2/27)(i − 5) ]
        = (1/27)[0 + 1 + 2 + 3 + 4] − (2/27)[−4 − 3 − 2 − 1] = 10/27 + 20/27 = 30/27
The Loss Function: Example
u:               1    2    3    4    5    6    7    8    9
ρ1/3(u) × 27:   36   30   27   27   30   36   45   57   72

The loss is minimised at u = 3 (tied with u = 4): one third of the nine
points lie at or below 3.
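This table can be recomputed the same way (a sketch; scaling by 27 means multiplying the 1/9-weighted loss by 27, i.e. 3 times the raw check-loss sum):

```python
import numpy as np

# Expected check loss over y = 1..9 at tau = 1/3, in units of 1/27.
y = np.arange(1, 10)

def loss_sum(u, tau):
    """Sum of the check function rho_tau(y_i - u) over the nine points."""
    d = y - u
    return tau * d[d >= 0].sum() + (tau - 1) * d[d < 0].sum()

row = [3 * loss_sum(u, 1 / 3) for u in y]   # 36, 30, 27, 27, 30, 36, 45, 57, 72
u_min = y[int(np.argmin(row))]              # 3 or 4: the loss is tied at the 1/3 quantile
```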
Estimating Quantile Regression
Quantile Regression - An Example: Returns to Education
In one of the previous lectures you may have looked at Angrist and
Krueger's (1992) paper on the returns to education.
Rising wage inequality has prompted labour economists to investigate
whether returns to education changed differently for people at
different deciles of the wage distribution.
Angrist, Chernozhukov, and Fernandez-Val (2006) show some
evidence on this in their 2006 Econometrica paper.
Quantile Regression Results
Adapted from Angrist, Chernozhukov, and Fernandez-Val (2006) as reported in Angrist
and Pischke (2009)
Quantile Regression Results
1980:
an additional year of education raises mean wages by 7.2 percent.
an additional year of education raises median wages by 6.8 percent.
slightly higher effects at lower and upper quantiles.
1990:
Returns go up at all quantiles and remain fairly similar across
quantiles.
2000:
returns start to diverge:
an additional year of education raises wages in the lowest decile by 9.2
percent.
an additional year of education raises wages at the median by 11.1
percent.
an additional year of education raises wages in the highest decile by
15.7 percent.
Graphical Illustration
Censored Quantile Regression
Yi is the variable that one would like to see, but one only observes
Yi,obs, which equals c where the real Yi is greater than c.
Quantile regression can be used to estimate the effect of covariates
on conditional quantiles that are below the censoring point (if the
data is censored from above).
If earnings were censored above the median we could still estimate
effects at the median.
Censored Quantile Regression - Estimation
Powell (1986) proposed the quantile estimator:

Qτ[Yi | Xi] = min(c, Xi′βτ)
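Powell's idea can be sketched in the simplest location case (all numbers below are illustrative assumptions; a grid search stands in for a proper optimiser, since the objective is flat above the censoring point and not convex in general):

```python
import numpy as np

# Latent outcome y* with median 2.0, observed only up to c = 2.5.
rng = np.random.default_rng(6)
ystar = rng.normal(loc=2.0, size=50_000)       # latent outcome, median 2.0
c = 2.5
yobs = np.minimum(ystar, c)                    # censored from above at c

def powell_loss(u, tau=0.5):
    """Check loss of the observed data against the censored fit min(c, u)."""
    d = yobs - np.minimum(c, u)
    return np.where(d >= 0, tau * d, (tau - 1) * d).sum()

grid = np.linspace(0.0, 5.0, 501)
m_hat = grid[int(np.argmin([powell_loss(u) for u in grid]))]   # close to 2.0
```

Because the censoring point 2.5 lies above the median, minimising the check loss against min(c, u) still recovers the latent median, which is exactly the logic of the estimator above.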