ECON 550
E.g. $\partial \widehat{Wage} / \partial Age = 1.431 - 2 \cdot 0.014 \cdot Age$
Interpreting Coefficient Estimates (cont.)
• Applied to our example, the marginal effect of a change in
age starting from age 20 is
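E.g. (with $\beta_1$ and $\beta_2$ the coefficients on $Age$ and $Age^2$)
$\left.\frac{\partial Wage}{\partial Age}\right|_{Age=20} = \beta_1 + 2\beta_2 \cdot 20$
$\Rightarrow \left.\frac{\partial \widehat{Wage}}{\partial Age}\right|_{Age=20} = \hat\beta_1 + 2\hat\beta_2 \cdot 20$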
S.E. of Estimated Effects (cont.)
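$SE\left(\left.\frac{\partial \widehat{Wage}}{\partial Age}\right|_{Age=20}\right)$ can be backed out from a calculation of the
F-statistic for the test that the true effect of age on wage at
age 20 is zero under the null:
$H_0: \beta_1 + 2\beta_2 \cdot 20 = 0$
$SE(\hat\beta_1 + 2\hat\beta_2 \cdot 20) = \frac{|\hat\beta_1 + 2\hat\beta_2 \cdot 20 - 0|}{\sqrt{F}}$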
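$\ln(X + \Delta X) - \ln(X) \cong \frac{\Delta X}{X}$
$Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i \Leftrightarrow \Delta Y \cong \beta_1 \frac{\Delta X}{X}$
$\widehat{Wage} = -12.881 + 9.448\ln(Age)$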
[Figure: two panels plotted against Age (axis from 20 to 60): Linear-Log Model and Cubic Model]
The Log-Linear Model
• In the log-linear regression model, Y is in logarithms but X
is not.
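$\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$
$\widehat{\ln(Wage)} = 2.500 + 0.010 \cdot Age$
[Figure: fitted log-linear model plotted against Age (axis from 20 to 60)]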
The Log-Log Model
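• In the log-log regression model both Y and X are in
logarithms.
$\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i$
$\widehat{\ln(Wage)} = 1.285 + 0.443\ln(Age)$
[Figure: Log-Log Model, plotted against ln(Age) (axis from 3 to 4.5)]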
Comparing Logarithmic Models
• If we want to compare models in terms of fit, the
dependent variable must be equivalent across
specifications.
• Consequently, we cannot compare the linear-log model to
either the log-linear or the log-log model.
• We can compare the linear-log model to the polynomial
model.
• We can compare the log-linear and log-log models.
• Ultimately, we should be guided by our desired
interpretation for the data.
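The comparison rule above can be illustrated with a small sketch. It is only an illustration: the data are generated without noise from the log-log curve quoted in these slides, and the helper `r_squared` is a hypothetical name, not part of any course software.

```python
import math

def r_squared(x, y):
    """R-squared of a simple regression of y on x (equals the squared correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

# Illustrative, noiseless data (an assumption made for this sketch).
ages = list(range(20, 61, 5))
wages = [math.exp(1.285 + 0.443 * math.log(a)) for a in ages]
ln_age = [math.log(a) for a in ages]
ln_wage = [math.log(w) for w in wages]

# Same dependent variable, ln(Wage): these two fits are comparable.
r2_log_linear = r_squared(ages, ln_wage)
r2_log_log = r_squared(ln_age, ln_wage)

# Dependent variable is Wage in levels: comparable to a polynomial model
# in levels, but not to the two logarithmic fits above.
r2_linear_log = r_squared(ln_age, wages)
```

Because the data were built from the log-log curve, the log-log fit is essentially perfect here; with real data the ranking is an empirical question.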
Interactions Between Regressors
• Recall that a second group of non-linear regression
functions involves effects of one regressor, $X_1$, on Y which
depend on the value of another regressor, $X_2$.
• E.g. ,
• E.g. ,
Interactions of Binary Regressors
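• If $D_1$ and $D_2$ are two binary dummy variables we might
want to allow the effect of $D_1$ on Y to depend on the value
of $D_2$.
• This effect is estimated through an interaction term or
interacted regressor:
$Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i$
Interactions of Binary Regressors (cont.)
E.g. Returns to Education
$Y_i$ = average hourly earnings
$D_{1i} \in \{0,1\}$ = {no college degree, degree}
$D_{2i} \in \{0,1\}$ = {male, female}
Interaction Effects – Binary Regressors
• The difference in wage associated with going to college is:
$E[Y_i \mid D_{1i} = 1, D_{2i}] - E[Y_i \mid D_{1i} = 0, D_{2i}] = \beta_1$ if $D_{2i} = 0$ (male)
$E[Y_i \mid D_{1i} = 1, D_{2i}] - E[Y_i \mid D_{1i} = 0, D_{2i}] = \beta_1 + \beta_3$ if $D_{2i} = 1$ (female)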
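A minimal sketch of the interaction logic, assuming a model with an intercept, two dummies, and their interaction. In such a fully saturated model the OLS coefficients equal differences of cell means, so no regression routine is needed. The earnings figures and the function name `saturated_coefficients` are hypothetical.

```python
from statistics import mean

def saturated_coefficients(rows):
    """rows: (y, degree, female) tuples with both dummies in {0, 1}.

    Returns (b0, b1, b2, b3) for y = b0 + b1*degree + b2*female
    + b3*degree*female, computed from the four cell means.
    """
    cells = {}
    for y, degree, female in rows:
        cells.setdefault((degree, female), []).append(y)
    m = {key: mean(values) for key, values in cells.items()}
    b0 = m[(0, 0)]
    b1 = m[(1, 0)] - m[(0, 0)]            # college gap for males
    b2 = m[(0, 1)] - m[(0, 0)]            # gender gap without a degree
    b3 = (m[(1, 1)] - m[(0, 1)]) - b1     # extra college gap for females
    return b0, b1, b2, b3

# Hypothetical hourly earnings, two observations per cell.
rows = [
    (15.0, 0, 0), (17.0, 0, 0),
    (24.0, 1, 0), (26.0, 1, 0),
    (13.0, 0, 1), (15.0, 0, 1),
    (20.0, 1, 1), (22.0, 1, 1),
]
b0, b1, b2, b3 = saturated_coefficients(rows)
college_gap_male = b1
college_gap_female = b1 + b3
```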
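(1) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + u_i$
(2) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i \times D_i) + u_i$
(3) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i$
where D is a binary indicator variable (e.g. gender) and X
is a continuous regressor.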
• (1) allows for only the intercept to depend on D.
• (2) allows for only the slope to depend on D.
• (3) allows for both intercept and slope to depend on D.
Interactions of Binary and Continuous Regressors
E.g. Teacher Evaluations
• Estimating (1) allows for males
and females to have different evaluation scores, on
average, but implicitly imposes that the effect of beauty be
the same for males and females alike.
• Estimating (2) imposes
that males and females have the same average
evaluation scores but allows for the effect of beauty to
differ by gender.
• Estimating (3)
allows for males and females to differ by average
evaluation score and allows for the effect of beauty to
differ by gender.
∆ ∆
,
∆ ∆
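E.g. $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + u$
1) Estimate regression model without polynomial terms
and compute fitted values $\hat{Y}$
2) Estimate $Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k + \delta_1 \hat{Y}^2 + \delta_2 \hat{Y}^3 + error$
3) Perform F-test of $H_0: \delta_1 = 0, \delta_2 = 0$
Under the null, the base model is correctly specified (i.e.
polynomials of existing regressors are unnecessary).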
Misspecification Tests (cont.)
Note that the RESET test is silent on whether your model
should include additional variables.
Rejection of the null hypothesis does not indicate which
regressors should be specified in quadratic and/or
cubic terms.
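$Y_i = \beta_0 + \beta_1 X_i^* + u_i$, $X_i = X_i^* + e_i$
• We assume $E[Y \mid X^*, X] = E[Y \mid X^*]$, s.t. $X$ is uninformative
about $Y$ once $X^*$ has been accounted for.
• We assume $E[e] = 0$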
Errors in Variables (cont.)
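• If we instead estimate
$Y_i = \beta_0 + \beta_1 X_i + v_i$,
then use
$X_i^* = X_i - e_i$, so that $v_i = u_i - \beta_1 e_i$,
such that:
$\hat\beta_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$
Errors in Variables (cont.)
$\hat\beta_1 = \beta_1 + \frac{\sum_i (X_i - \bar{X}) v_i}{\sum_i (X_i - \bar{X})^2}$
$\Rightarrow plim\,\hat\beta_1 = \beta_1 + \frac{cov(X, v)}{var(X)}$
(1) If $cov(X, e) = 0$,
$cov(X, v) = 0$ and $\hat\beta_1$ will be consistent.
Errors in Variables (cont.)
(2) If $cov(X^*, e) = 0$, then:
$cov(X, e) = cov(X^* + e, e) = \sigma_e^2$
$cov(X, v) = cov(X, u - \beta_1 e) = -\beta_1 \sigma_e^2$
$\Rightarrow cov(X, v) \neq 0$ and $plim\,\hat\beta_1 = \beta_1 \frac{\sigma_{X^*}^2}{\sigma_{X^*}^2 + \sigma_e^2}$
Attenuation bias!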
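A small simulation can make the attenuation result concrete. This is a sketch under assumed values (true slope 2, unit variances for the true regressor, the measurement error, and the regression error); none of these numbers come from the course data.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(0)
n = 20000
beta1 = 2.0                              # assumed true slope
sd_xstar, sd_e, sd_u = 1.0, 1.0, 1.0     # assumed standard deviations

x_star = [random.gauss(0, sd_xstar) for _ in range(n)]       # true regressor
y = [1.0 + beta1 * xs + random.gauss(0, sd_u) for xs in x_star]
x_obs = [xs + random.gauss(0, sd_e) for xs in x_star]        # mismeasured regressor

slope_true_x = ols_slope(x_star, y)      # close to beta1
slope_obs_x = ols_slope(x_obs, y)        # shrunk toward zero

# Probability limit under classical measurement error:
# beta1 * var(X*) / (var(X*) + var(e))
attenuated_plim = beta1 * sd_xstar ** 2 / (sd_xstar ** 2 + sd_e ** 2)
```

With equal variances for the true regressor and the error, the estimated slope should settle near half of the true slope.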
Classical Measurement Error Solutions
• Under classical measurement error, the direction of bias is
always toward zero.
• For some types of analyses, this is not too problematic.
,
β
∗ ∗
, , 0
So, we expect
(Suppose, for instance, that wage was 20% higher for a group
of people and they worked 2 more hours per week but they
only reported that their wage was 10% higher. We would
attribute the additional 2 hours to only a 10% rise in the wage
when in fact it was due to a 20% rise).
Measurement Error in Y
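• Formally, suppose now that we would like to estimate
$Y_i^* = \beta_0 + \beta_1 X_i + u_i$
but observe only $Y_i = Y_i^* + e_i$, so that we estimate $Y_i = \beta_0 + \beta_1 X_i + v_i$ with $v_i = u_i + e_i$.
• As usual,
$\hat\beta_1 = \beta_1 + \frac{\sum_i (X_i - \bar{X}) v_i}{\sum_i (X_i - \bar{X})^2}$
$\Rightarrow plim\,\hat\beta_1 = \beta_1 + \frac{cov(X, v)}{var(X)}$
where $cov(X, v) = cov(X, u) + cov(X, e)$.
If $cov(X, e) = 0$,
$cov(X, v) = 0$ and $\hat\beta_1$ will be consistent.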
Measurement Error in Y (cont.)
• Mismeasurement of the dependent variable that has
mean zero and is uncorrelated with any regressor (i.e.
classical measurement error) will not yield biased
coefficient estimates, but the estimated standard errors
will be larger than otherwise.
• (Random measurement error with non-zero mean will
merely bias the intercept coefficient estimate).
cov , cov ,
⇒ cov ,
1
Simultaneity Solutions
• Just as for omitted variable bias or errors-in-variables
bias, the best option for addressing simultaneity bias is
instrumental variables regression.
“Rating sites apparently even have the power to bring a well-known UNC Law professor to
his electronic knees. It’s not every day that a torts professor sends his former students a “rather
embarrassing request” to repair his online reputation. It’s also certainly not every day that the
students respond en masse….
On Tuesday, Professor Michael Corrado sent the following email to 2Ls who took his torts
class last year, basically pleading for their help. ...
Differences-in-Differences (DiD) (cont.)
E.g. Financial Incentives for Fitness
What is ?
Next consider:
What is ?
What is ?
Differences-in-Differences (DiD) (cont.)
Next consider:
What is ?
What is ?
Differences-in-Differences (DiD) (cont.)
• Now consider again the full model:
What is ?
What is ?
What is ?
What is ?
Differences-in-Differences (DiD) (cont.)
but we estimate
,
then | if , 0 and 0.
⇒
Limitations - First Differencing
• A possible downside with first differencing, however, is
that there may be relatively little variation over time in the
changes in X to explain changes in Y, even if there exists
a substantial degree of cross-sectional variation to exploit.
where .
• $\alpha_i$ is referred to as an entity fixed effect and allows for
each entity to have its own intercept (i.e. time-average
effect on Y).
Accounting for Fixed Effects
Where have we previously encountered situations where
we wanted to allow different groups within our data to
have different intercepts?
How did we allow for this in our regression models?
Indicator (dummy) variables!
The Dummy Variable Approach
• The dummy variable approach to fixed effects regression
consists of including separate binary indicator variables to
flag each individual entity in the dataset.
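• For example, $D2_i = 1$ if $i = 2$ and 0 otherwise.
• To avoid perfect multicollinearity, only $n - 1$ dummy variable
regressors may be included, so we arbitrarily drop the
first:
$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \gamma_3 D3_i + \cdots + \gamma_n Dn_i + u_{it}$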
⇒
Limitations - Dummy Variable Approach
• While statistically valid, the dummy variable approach has
the practical downside of requiring estimation of a large number of
coefficients (one for each of the $n - 1$ entity dummies, in addition to the slope coefficients).
• This can be computationally slow and clutters the
regression output if we do not care about the magnitude
of each of the separate fixed effects.
Entity Demeaning
• Yet a third method for accounting for time-invariant entity
fixed effects consists of time-demeaning (subtracting the
mean of each variable over all T time periods) for each of
the regressors and the dependent variable.
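• Thus, instead of estimating
$Y_{it} = \beta_1 X_{it} + \alpha_i + u_{it}$,
we estimate
$Y_{it} - \bar{Y}_i = \beta_1 (X_{it} - \bar{X}_i) + (u_{it} - \bar{u}_i)$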
Entity Demeaning (cont.)
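• Since $\alpha_i$ is time-invariant,
$\bar{\alpha}_i = \frac{1}{T}\sum_{t=1}^{T}\alpha_i = \frac{1}{T} \cdot T \cdot \alpha_i = \alpha_i$,
so $\alpha_i - \bar{\alpha}_i = 0$ and the fixed effect drops out of the demeaned equation.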
where
• $\hat\beta_1^{FD}$ is the slope coefficient estimate from the first
differenced model (estimated without an intercept),
• $\hat\beta_1^{DV}$ corresponds to the model with $n - 1$ dummy
variable indicators, and
• $\hat\beta_1^{DM}$ represents the coefficient estimate from the entity
demeaned model.
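With exactly two time periods, the first-differenced slope (no intercept) and the entity-demeaned slope coincide. The sketch below checks this on a tiny hypothetical panel; the numbers and the helper name `slope_through_origin` are made up for illustration. The dummy variable regression gives the same slope as the demeaned regression, but is omitted here to keep the example short.

```python
def slope_through_origin(x, y):
    """OLS slope for a regression without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# Hypothetical panel: entity -> [(x, y) in period 1, (x, y) in period 2].
panel = {
    "A": [(1.0, 5.0), (2.0, 8.0)],
    "B": [(3.0, 2.0), (5.0, 5.0)],
    "C": [(2.0, 9.0), (2.5, 9.5)],
}

# First differences: one observation per entity.
dx = [obs[1][0] - obs[0][0] for obs in panel.values()]
dy = [obs[1][1] - obs[0][1] for obs in panel.values()]
beta_fd = slope_through_origin(dx, dy)

# Entity demeaning: subtract each entity's time average from both variables.
x_dm, y_dm = [], []
for obs in panel.values():
    x_bar = sum(x for x, _ in obs) / len(obs)
    y_bar = sum(y for _, y in obs) / len(obs)
    for x, y in obs:
        x_dm.append(x - x_bar)
        y_dm.append(y - y_bar)
beta_dm = slope_through_origin(x_dm, y_dm)
```

Entities with very different levels of Y (entity C versus entity B) do not distort either estimate, because both transformations remove the entity-specific intercept.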
Time Fixed Effects
• Just as we may worry about omitted variable bias arising
through time-invariant determinants of Y, we might
also/instead worry about unobserved time-varying effects
that are the same across entities.
• The fully-specified (time and entity) fixed effects
regression model is hence
⇔
• E.g. Traffic Fatalities: evolving paternalistic views
affecting national vehicle safety standards and “sin” taxes.
Time Fixed Effects (cont.)
• Accounting for time fixed effects proceeds in much the
same way as for entity fixed effects.
1) Entity and time dummies:
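$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}$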
with both state and year fixed effects, what can be said
about the internal validity of our regression results?
• In terms of FE, the intercepts for the two entities, control and treated,
are the entity fixed effects, written as and , while is the time
fixed effect. captures the DiD treatment effect.
• Or, recognizing 1, 0:
Δ
• captures the average change in gym visits for the control group
• captures the DiD treatment effect.
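In the two-group, two-period case the DiD estimate can be computed directly from four group means. The sketch below assumes hypothetical gym-visit counts; the function name `did_from_means` is illustrative only.

```python
from statistics import mean

def did_from_means(control_pre, control_post, treated_pre, treated_post):
    """Return (average change for controls, DiD treatment effect)."""
    control_change = mean(control_post) - mean(control_pre)
    treated_change = mean(treated_post) - mean(treated_pre)
    return control_change, treated_change - control_change

# Hypothetical weekly gym visits.
control_change, did_effect = did_from_means(
    control_pre=[4.0, 6.0],
    control_post=[5.0, 7.0],
    treated_pre=[5.0, 7.0],
    treated_post=[9.0, 11.0],
)
```

The first returned value plays the role of the time effect (the change the control group experiences anyway), and the second is the additional change in the treated group.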
DiD and Parallel Trends Assumption
• A DiD regression compares the trend in the outcome in the treatment group to the
trend in the outcome in the control group.
• In order for this comparison to yield a good estimate of the treatment effect, we
must rule out any differences in pre-existing trends between the two groups.
• If the pre-existing trends differ, then any difference in differences may simply
reflect a continuation of these pre-existing trends rather than a causal effect.
• Using data from the pre-period, create a linear time trend, a variable that equals 1
in period 1, 2 in period 2, … and T in period T, the last untreated time period.
• Then, interact it with a treatment group dummy and run the model below on
pre-treatment data:
$Y_{it} = \beta_0 + \beta_1 Trend_t + \beta_2 Treat_i + \beta_3 (Trend_t \times Treat_i) + u_{it}$
• If the coefficient on this interaction is statistically different from zero ($\beta_3 \neq 0$), the data flunk
the parallel trends assumption and the DiD estimate is likely to be biased.
• Researchers typically plot the trends in both the pre-period and the treatment
period with a vertical line at the time the treatment is applied.
• If you have too few observations to run the regression above, you can
simply plot the time trend.
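A rough sketch of the pre-trend check. In a regression of the outcome on a trend, a treatment dummy, and their interaction, the interaction coefficient equals the treated group's trend slope minus the control group's trend slope, so the point estimate can be obtained from two simple regressions. The group averages below are hypothetical, and a formal test would also need a standard error, which this sketch does not compute.

```python
def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

trend = [1, 2, 3, 4]                        # pre-treatment periods only

# Hypothetical pre-period outcome averages by group.
control = [10.0, 11.0, 12.1, 12.9]
treated_parallel = [12.0, 13.1, 13.9, 15.0]
treated_diverging = [12.0, 14.0, 16.0, 18.0]

# Point estimates of the trend-by-treatment interaction coefficient.
gap_parallel = ols_slope(trend, treated_parallel) - ols_slope(trend, control)
gap_diverging = ols_slope(trend, treated_diverging) - ols_slope(trend, control)
```

A gap near zero is consistent with parallel pre-trends; a clearly non-zero gap is a warning sign for the DiD estimate.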
Differences in Differences in Differences
(DDD)
• Conceptually, the DDD captures the difference between two
DD results in one regression.
• The first result is the one we’ve already studied.
• The second result is for a group that is exposed to the
treatment but should not be affected by it.
• For example, Philadelphia might enact a tutoring program for
high school juniors while Pittsburgh does not.
• We might be interested in the effect of the program on the change
between two test scores, one at the start of the junior year and the
other at the end of the junior year.
• The first DD subtracts the change in test scores for juniors in
Pittsburgh from the change in test scores for juniors in
Philadelphia:
$DD_{juniors} = \Delta Score_{Philadelphia,\,juniors} - \Delta Score_{Pittsburgh,\,juniors}$
DDD (cont.)
• The DDD basically runs the DD for sophomores and
subtracts the result for them from the result for juniors.
• The DDD accounts for whether any time-varying omitted
variable might be causing the change in test scores
instead of the tutoring program.
• For example, it may be that at the same time that
Philadelphia got money for the tutoring program, it also
got money for better labs, new textbooks and better
teachers.
• These changes should also affect sophomores, but since
the tutoring program is only for juniors, sophomores would
not be affected by it.
DDD (cont.)
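• The DDD can be captured in a single regression as follows:
• $\Delta Score_i = \beta_0 + \beta_1 Philadelphia_i + \beta_2 Junior_i + \beta_3 (Philadelphia_i \times Junior_i) + \Delta u_i$
• Notice that every regressor in the longer regression from the previous
slide that lacks the post-period indicator disappears when it’s first differenced.
• Once again, we have 4 parameters to describe 4 groups, but this time
we’re describing changes instead of levels.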
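The triple difference can also be written as arithmetic on four average changes, one per city-by-grade group. The score changes below are hypothetical, and `triple_difference` is an illustrative helper name.

```python
def triple_difference(change):
    """change: dict mapping (city, grade) to the average change in test score."""
    dd_juniors = change[("Philadelphia", "junior")] - change[("Pittsburgh", "junior")]
    dd_sophomores = (
        change[("Philadelphia", "sophomore")] - change[("Pittsburgh", "sophomore")]
    )
    return dd_juniors, dd_sophomores, dd_juniors - dd_sophomores

# Hypothetical average score changes over the school year.
change = {
    ("Philadelphia", "junior"): 8.0,
    ("Pittsburgh", "junior"): 3.0,
    ("Philadelphia", "sophomore"): 4.0,
    ("Pittsburgh", "sophomore"): 2.0,
}
dd_juniors, dd_sophomores, ddd = triple_difference(change)
```

Here the sophomore DD picks up city-wide changes that affect both grades, and subtracting it leaves the part of the junior DD attributable to the program.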
Problem Set #6
due Weds. 2/28, in class.
Econometrics
ECON 550
Instrumental Variables
W Chapter 15
1) Instrumental Variables
• Instrument Relevance, Exogeneity, and Monotonicity
• IV Estimation
• 2SLS Estimation
• Testing
• Endogeneity
• Overidentifying Restrictions
• Weak Instruments
1) Omitted Variables
2) Functional Form Misspecification
3) Measurement Error (Errors-in-Variables)
4) Simultaneity (Simultaneous Causality)
5) Sample Selection
Why Instrumental Variables? (cont.)
• When more direct solutions are not available (e.g. explicit
controls or fixed effects), instrumental variables (IV)
regression offers a possible method for mitigating bias
due to omitted variables, simultaneity, or measurement
error.
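$cov(Z, Y) = \beta_1 cov(Z, X) + cov(Z, u)$
$\Rightarrow \beta_1 = \frac{cov(Z, Y)}{cov(Z, X)}$ when $cov(Z, u) = 0$ and $cov(Z, X) \neq 0$
$\Rightarrow \hat\beta_{IV} = \frac{\sum_i (Z_i - \bar{Z})(Y_i - \bar{Y})}{\sum_i (Z_i - \bar{Z})(X_i - \bar{X})}$
Properties of $\hat\beta_{IV}$ - Consistency
$\hat\beta_{IV} = \beta_1 + \frac{\sum_i (Z_i - \bar{Z}) u_i}{\sum_i (Z_i - \bar{Z})(X_i - \bar{X})}$
$\Rightarrow plim\,\hat\beta_{IV} = \beta_1 + \frac{cov(Z, u)}{cov(Z, X)} = \beta_1$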
Properties of $\hat\beta_{IV}$ - Unbiasedness
⇒ , ,
· ,
from regression of
on Z (including constant)
Unless , 1 (i.e. ,
, ,
·
, ,
, ·
Weak Instruments (cont.)
• Asymptotic bias for the IV estimator will be more severe
than for the OLS estimator if:
$\left|\frac{corr(Z, u)}{corr(Z, X)}\right| > |corr(X, u)|$
1 (exogenous)
2) (endogenous)
2SLS Estimation (cont.)
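• 2SLS regression thus proceeds in two stages:
1) In the first stage, we regress the endogenous regressor, X, on
the instrument(s) and obtain predicted values of the component
of X which is uncorrelated with the error term u from the
regression of Y on X:
$X_i = \pi_0 + \pi_1 Z_i + v_i \Rightarrow \hat{X}_i = \hat\pi_0 + \hat\pi_1 Z_i$
2) In the second stage (i.e. the main or “structural” equation), we
regress Y on these predicted values:
$Y_i = \beta_0 + \beta_1 \hat{X}_i + error_i$
$\Rightarrow \hat\beta_{2SLS} = \frac{cov(\hat{X}, Y)}{var(\hat{X})} = \frac{\hat\pi_1 cov(Z, Y)}{\hat\pi_1^2 var(Z)} = \frac{cov(Z, Y)}{\hat\pi_1 \cdot var(Z)} = \frac{cov(Z, Y)}{cov(Z, X)}$,
since $\hat\pi_1 = cov(Z, X) / var(Z)$.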
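The two-stage procedure can be checked numerically. The sketch below simulates data with an unobserved confounder (all parameter values are assumptions chosen for illustration), then compares OLS, the ratio-of-covariances IV estimator, and manual two-stage least squares with a single instrument.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance (dividing by n; the scaling cancels in all ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

random.seed(1)
n = 20000
z = [random.gauss(0, 1) for _ in range(n)]       # instrument
c = [random.gauss(0, 1) for _ in range(n)]       # unobserved confounder
x = [0.8 * zi + ci + random.gauss(0, 1) for zi, ci in zip(z, c)]
# Assumed true effect of x on y is 2; the confounder enters the error term.
y = [1.0 + 2.0 * xi + 3.0 * ci + random.gauss(0, 1) for xi, ci in zip(x, c)]

beta_ols = cov(x, y) / cov(x, x)                 # biased upward here
beta_iv = cov(z, y) / cov(z, x)                  # IV estimator

# Stage 1: regress x on z and form fitted values.
pi1 = cov(z, x) / cov(z, z)
pi0 = mean(x) - pi1 * mean(z)
x_hat = [pi0 + pi1 * zi for zi in z]
# Stage 2: regress y on the fitted values.
beta_2sls = cov(x_hat, y) / cov(x_hat, x_hat)
```

With one instrument and one endogenous regressor, the two-stage slope and the covariance ratio agree up to floating-point error. Standard errors from a manually run second stage are not valid, so this sketch reports point estimates only.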
Structural (IV/2SLS), Reduced Form, and
First-Stage Equations
The reduced form equation evaluates the effect of the
instrument directly on the outcome.
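$X_i = \pi_0 + \pi_1 Z_i + v_i$ (First Stage)
$Y_i = \beta_0 + \beta_1 X_i + u_i$ (Second Stage Structural Equation)
$Y_i = \gamma_0 + \gamma_1 Z_i + \varepsilon_i$ (Reduced Form Equation)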
• To see this, note that you can think of the portion of the
variation in X that is explained by Z as capturing the subset of
the sample that is induced to “comply” with X, the “treatment.”
V V
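Durbin-Wu-Hausman Test:
$H = (\hat\beta_{IV} - \hat\beta_{OLS})' \left[\hat{V}(\hat\beta_{IV}) - \hat{V}(\hat\beta_{OLS})\right]^{-1} (\hat\beta_{IV} - \hat\beta_{OLS}) \sim \chi^2$
under the null that the regressors are exogenous.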
Tests of Endogeneity (cont.)
Regression-Based Test:
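Under the null ($X$ is exogenous), the residual from the
first stage regression should have no statistically
significant effect if included as an extra regressor in the
OLS regression.
1) Estimate the first stage $X_i = \pi_0 + \pi_1 Z_i + v_i \Rightarrow \hat{v}_i$
• $\hat{v}$ captures variation in $X$ that is orthogonal to
the first-stage regressors and therefore potentially correlated with $u$
2) Estimate $Y_i = \beta_0 + \beta_1 X_i + \delta \hat{v}_i + error_i$
3) Test $H_0: \delta = 0$
$\Rightarrow cov(\hat{v}, u) = 0 \Leftrightarrow cov(X, u) = 0 \Rightarrow \delta = 0$
Tests of Endogeneity (cont.)
• Rejection of $H_0: \delta = 0$ implies that $X$ is endogenous (through
correlation between $\hat{v}$ and $u$).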
Use IV!
Problem Set #7
due Weds. 3/6, in class.
Econometrics
ECON 550
March 7, 2018
Instrumental Variables,
Limited Dependent Variables
W Chapter 15-17
1) IV Tests
• Overidentification
• Weak Instruments
2) Simultaneous Equations
3) Limited Dependent Variables
• LPM/Probit/Logit
• Tobit
• Maximum Likelihood Estimation
Presentation Guidelines
• Plan concise, 15-18 minute presentations
• Presentation should cover
• Research question, motivation, and background
• Research design – modeling, data, etc.
• Results (w/ careful interpretation)
• Discussion – validity of results; extensions, etc.
• Group members should participate equally
• Participation in ~5 minute Q+A following others’
presentations will also factor into grades
Overidentification (OID) Tests
• An IV regression is said to be just identified if there are as
many instruments as endogenous regressors.
E.g. Air Travel (cont.)
(Inverse Supply)
(Demand)
E.g.
1) Binary outcomes
2) Corner solutions/censoring
3) Counts
(1) Examples of Binary Outcomes
• Smoking: How do cigarette taxes affect whether or not
an individual smokes at a particular point in time?
• ER visits: How do medical co-payments affect whether or
not an individual uses the emergency room (over the
course of a year)?
• Poverty: How does an individual’s poverty status as a
child affect whether or not they live in poverty as adults?
• Sovereign default: How does the use of a pegged/fixed
exchange rate affect whether or not a country defaults on
its debt in a period of economic turmoil?
(2) Examples of Corner Solutions
• Smoking: How do cigarette taxes affect how many
cigarettes an individual smokes?
• Dividend payouts: How does the fraction of executive
compensation coming from stock options impact dividend
payments to shareholders?
• Capital expenditures: How do bonus depreciation rules
affect business expenditures on new industrial
machinery?
(3) Examples of Count Outcomes
• Number of children: How do government expenditures
on pre-K “schooling” affect the number of children per
household?
• Number of ER visits: How do medical co-payments affect
the number of ER visits made over the course of a year?
• Number of exported products: How do trade barriers
(e.g. tariff rates) affect the number of products produced
for export?
Why the Special Treatment?
• Binary explanatory variables posed no special problems
for estimation by OLS, so why the special treatment for
binary dependent variables?
[Figure: response probability, Pr(Y = 1 | X), plotted against Change in Annual Property Taxes (axis from -20 to 20)]
N-L Binary Response Models (cont.)
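• For any real number z, provided that $e$ is
symmetrically distributed about 0 and independent of $X$,
$\Pr(e > -z) = \Pr(e < z) = 1 - \Pr(e \geq z)$
$\Rightarrow \Pr(Y = 1 \mid X) = \Pr(e > -(\beta_0 + X\beta))$
$\Rightarrow \Pr(Y = 1 \mid X) = G(\beta_0 + X\beta)$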
$\Phi(z) = \int_{-\infty}^{z} \phi(v)\,dv$, $\phi(z) = (2\pi)^{-0.5} \cdot \exp(-0.5 z^2)$
∆ Pr 1 β ⋯ ∆ ⋯β
β ⋯ ⋯β
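• Equivalently, $\Pr(Y = 1 \mid X) = G(\beta_0 + X\beta)$ implies that
$\frac{\partial \Pr(Y = 1 \mid X)}{\partial X_j} = g(\beta_0 + X\beta) \cdot \beta_j$, where $g(z) = \frac{dG(z)}{dz}$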
If ⋯ ,
Pr 1
g ·
Pr 1
g · 2
Calculation of Marginal Effects (cont.)
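For discrete variables, if $X_k \in \{0, 1\}$ is a dummy variable, the marginal effect is the change in the predicted probability,
$\Delta \Pr(Y = 1 \mid X) = \Pr(Y = 1 \mid X_k = 1) - \Pr(Y = 1 \mid X_k = 0)$,
holding the other regressors fixed.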
Calculation of Marginal Effects (cont.)
• $X_i\hat\beta$ and $g(X_i\hat\beta)$ will differ across observations because each
observation has its own values of X.
• Since the calculation of the marginal effect depends on these
values, they too will differ across observations.
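A short sketch of how marginal effects vary with X in probit and logit models. It assumes a single continuous regressor with hypothetical coefficient values; the function names are illustrative and do not belong to any particular statistics package.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    g = logistic_cdf(z)
    return g * (1.0 - g)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def marginal_effect(b0, b1, x, pdf):
    """Effect of a small change in a continuous x: g(b0 + b1*x) * b1."""
    return pdf(b0 + b1 * x) * b1

def average_marginal_effect(b0, b1, xs, pdf):
    """Average of the observation-level marginal effects."""
    return sum(marginal_effect(b0, b1, x, pdf) for x in xs) / len(xs)

def dummy_effect(index_without_dummy, b_dummy, cdf):
    """Discrete change in Pr(Y = 1) when a dummy switches from 0 to 1."""
    return cdf(index_without_dummy + b_dummy) - cdf(index_without_dummy)

# Hypothetical coefficients and regressor values.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
logit_me_at_zero = marginal_effect(0.0, 1.0, 0.0, logistic_pdf)
probit_me_at_zero = marginal_effect(0.0, 1.0, 0.0, normal_pdf)
logit_ame = average_marginal_effect(0.0, 1.0, xs, logistic_pdf)
logit_dummy = dummy_effect(0.0, 1.0, logistic_cdf)
probit_dummy = dummy_effect(0.0, 1.0, normal_cdf)
```

The average marginal effect is smaller than the effect evaluated at an index of zero, because the density g is largest there and falls off in both directions.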
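$\Rightarrow \hat\beta_{MLE} = \arg\max_{\beta} \sum_i \log f(Y_i; X_i, \beta)$
Probit/Logit MLE
• For binary response models, where $Y \in \{0,1\}$, the density of
$Y \mid X$ is given by
$f(Y; X, \beta) = G(X\beta)^{Y} \cdot [1 - G(X\beta)]^{1 - Y}$
$\Rightarrow \log f = Y \log G(X\beta) + (1 - Y) \log[1 - G(X\beta)]$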
Pr 1, 0.5 Pr 0, 0.5
• Concretely,
Pseudo-$R^2 = 1 - \mathcal{L}_{ur} / \mathcal{L}_{0}$
where $\mathcal{L}_{ur}$ is the log-likelihood function estimated for the
unrestricted model, and $\mathcal{L}_{0}$ is the log-likelihood for an
intercept-only model.
Model Fit (cont.)
• Given that pseudo-$R^2 = 1 - \mathcal{L}_{ur} / \mathcal{L}_{0}$,
If the included regressors have no explanatory power,
$\mathcal{L}_{ur} = \mathcal{L}_{0}$ and pseudo-$R^2 = 0$.
Usually, $|\mathcal{L}_{ur}| < |\mathcal{L}_{0}|$ ⇒ pseudo-$R^2 > 0$ (i.e. since MLE
involves maximizing the (negative) log likelihood, adding
regressors cannot reduce the value of the estimated log
likelihood function.)
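The pseudo-R-squared calculation can be sketched from fitted probabilities alone. The example assumes the fitted probabilities have already been obtained from some probit or logit fit (the values below are hypothetical), and it uses the fact that an intercept-only binary response model fits every observation with the sample mean of Y.

```python
import math

def bernoulli_loglik(y, p):
    """Sum of Y*log(p) + (1 - Y)*log(1 - p) over observations."""
    return sum(
        yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi) for yi, pi in zip(y, p)
    )

def pseudo_r2(y, fitted_prob):
    """One minus the ratio of the model log-likelihood to the intercept-only one."""
    p0 = sum(y) / len(y)                         # intercept-only fitted probability
    loglik_ur = bernoulli_loglik(y, fitted_prob)
    loglik_0 = bernoulli_loglik(y, [p0] * len(y))
    return 1.0 - loglik_ur / loglik_0

# Hypothetical outcomes and fitted probabilities.
y = [0, 0, 1, 1]
informative = [0.2, 0.2, 0.8, 0.8]
uninformative = [0.5, 0.5, 0.5, 0.5]

r2_informative = pseudo_r2(y, informative)
r2_uninformative = pseudo_r2(y, uninformative)
```

When the fitted probabilities equal the sample mean for everyone, the two log-likelihoods coincide and the measure is zero.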
Multiple Hypothesis Testing
• For the same reason that the standard measure of $R^2$ is
inappropriate under probit or logit estimation, F-tests are
likewise invalid and testing involves values of the
maximized likelihood function.
Pr 1
σ σ σ
Corner Solutions – Tobit Model (cont.)
• Taken together, we can write
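$\ell_i(\beta, \sigma) = I(Y_i = 0) \cdot \log\left[1 - \Phi\left(\frac{X_i\beta}{\sigma}\right)\right] + I(Y_i > 0) \cdot \log\left[\frac{1}{\sigma} \cdot \phi\left(\frac{Y_i - X_i\beta}{\sigma}\right)\right]$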
2 | 0,
Assignment