# Causality and Selection

## PA5044: Regression Analysis Accelerated

Professor Janna E. Johnson
Humphrey School of Public Affairs
University of Minnesota

## For Jan 20, 2016

## Why metrics?

- Applied Econometrics: economists' use of data to answer cause-and-effect questions
- Consists of disciplined data analysis paired with the machinery of statistical inference
- This class will provide you with the foundations of metrics
- Most important foundation: the distinction between causality and correlation
- Even with the most advanced metrics methods, it is exceedingly difficult to establish causality
- Mistakenly interpreting metrics results as causal is one of the easiest and most dangerous errors one can make

## The Power of Metrics

- Economists use metrics to try to answer very important questions
  - Will mandatory health insurance really make Americans healthier?
  - Is it better in terms of lifetime earnings to attend a public university or a private college that costs 4 times as much?
  - Do smaller class sizes increase student test scores?
  - Do conditional cash transfer programs in developing countries really improve child health and education outcomes?
- The methods we cover in this class (and PA 5033) have the potential to provide definitive causal answers to these questions, but doing so is extremely difficult
- When our methods provide us with a result we can interpret as causal, economists say we have identification

## Identification

- People who have more schooling earn more than those with less schooling
- Is this relationship causal?
- A defensible model in economics is that human capital (knowledge) represents an investment in people that employers are often willing to pay for. Schooling is one way to accumulate human capital
- While the observation that those who go to school longer make more is consistent with the human capital model, there are other interpretations

- Alternative interpretation: People who go to school longer are smarter and have other traits (persistence, conscientiousness) that are valued by employers
- The correlation between earnings and schooling is spurious: more able people are paid more but they also happen to get more schooling
- Put differently, how do we know how much, if any, of the higher earnings of people with more schooling is actually caused by the schooling itself?
- This, in a nutshell, is the fundamental problem of social science: how do we know the differences in outcomes we observe are the result of differences in observed choices that people make?
- Economists and other social scientists refer to this conundrum as the identification problem or the selection problem

## The Roy Model

- Suppose there are two occupations: economists and accountants
- Suppose that both professions are equally pleasant
- The earnings of the $i$th person in accounting are given by $y_{0i}$, where $y_{0i} \sim N(65{,}000,\ 5{,}000^2)$
- The earnings of the $i$th person in economics are given by $y_{1i}$, where $y_{1i} \sim N(60{,}000,\ 10{,}000^2)$
- In addition, we assume the correlation between accounting and economics earnings is high: 0.84. If you're going to be a good economist, it's very likely you'll also be good at accounting.

Now let's build a model of occupation selection. Because each individual doesn't have a preference between accounting and doing economics, she picks the one that pays the most. Her observed earnings are

$$y_i = \max(y_{0i}, y_{1i})$$

Note that here $Y_i$ is written in lower case because it is the realization of a random variable. Because both $Y_{0i}$ and $Y_{1i}$ are lower-case on the right-hand side of the equation, we are assuming that our agent knows what her earnings would be in each occupation before she decides to be an economist or accountant. Let $D_i = 1$ indicate she chooses economics.

Now, we're going to play God and do something we can't do in real life: pretend to observe the potential earnings of the entire population.
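
The setup above can be sketched as a small simulation. This is only an illustration under stated assumptions: the means, standard deviations, and correlation come from the slides, while the seed, the sample size, and the names `simulate_roy`, `people`, and `share_econ` are choices made here. With these parameters roughly 22 percent of people should pick economics, and the group means should land in the neighborhood of the table that follows, though not on the same numbers.

```python
import math
import random

# Parameters taken from the slides; the seed and sample size are illustrative.
MU_ACC, SD_ACC = 65_000.0, 5_000.0     # accounting earnings y0
MU_ECON, SD_ECON = 60_000.0, 10_000.0  # economics earnings y1
RHO = 0.84                             # correlation between y0 and y1


def simulate_roy(n, seed=0):
    """Draw (y0, y1, d) for n people; d = 1 if economics pays more."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)
        z1 = rng.gauss(0.0, 1.0)
        y0 = MU_ACC + SD_ACC * z0
        # Mixing z0 into y1 gives corr(y0, y1) = RHO.
        y1 = MU_ECON + SD_ECON * (RHO * z0 + math.sqrt(1.0 - RHO**2) * z1)
        d = 1 if y1 > y0 else 0
        people.append((y0, y1, d))
    return people


def mean(values):
    values = list(values)
    return sum(values) / len(values)


people = simulate_roy(100_000)
economists = [p for p in people if p[2] == 1]
accountants = [p for p in people if p[2] == 0]

share_econ = len(economists) / len(people)
print(f"share choosing economics: {share_econ:.3f}")
groups = [("accountants", accountants), ("economists", economists), ("everyone", people)]
for label, group in groups:
    print(f"{label:12s} mean y0 = {mean(p[0] for p in group):9.0f}   "
          f"mean y1 = {mean(p[1] for p in group):9.0f}   N = {len(group)}")
```

Because both potential earnings are stored for every simulated person, the counterfactual cells that are hidden in real data can be read off directly.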

Here is the data:

| | Accounting earnings | Economics earnings | N |
|---|---|---|---|
| Accountants | **63,985** | 56,599 | 78,414 |
| Economists | 68,690 | **72,317** | 21,586 |
| Mean (Total) | 65,001 | 59,992 | 100,000 |

You usually only observe the bolded elements, which are the realized outcomes.

We, as economists, want to know what impact choosing to be an economist has on economists' earnings relative to the counterfactual of them being accountants.

Because everyone is ex ante identical, you might try to estimate the impact of being an economist by comparing the observed earnings, or

$$\Delta^{N} = \bar{y}_{1,D=1} - \bar{y}_{0,D=0} = 72{,}317 - 63{,}985 = 8{,}332$$

This is naive because it has implicitly assumed that $E(Y_0 \mid D = 1) = E(Y_0 \mid D = 0)$, or that a good estimate of economists' counterfactual accounting earnings is the observed accountants' accounting earnings.

Since we got to observe all potential outcomes in this exercise, we can calculate what the real gain from becoming economists is for the people who become economists.

$$\text{ATET} = \bar{y}_{1,D=1} - \bar{y}_{0,D=1} = 72{,}317 - 68{,}690 = 3{,}627$$
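
A quick arithmetic check, using the group means from the table, shows how the naive comparison splits into the true gain for economists plus a selection term: economists would also have out-earned accountants had they gone into accounting. The variable names are illustrative, and the decomposition is a standard identity rather than something stated on the slides.

```python
# Group means copied from the table on the slides.
y0_accountants = 63_985  # accounting earnings of accountants (observed)
y1_economists = 72_317   # economics earnings of economists (observed)
y0_economists = 68_690   # accounting earnings of economists (counterfactual)

naive = y1_economists - y0_accountants
atet = y1_economists - y0_economists
# How much more economists would earn as accountants than actual accountants do.
selection_bias = y0_economists - y0_accountants

print(f"naive = {naive}, ATET = {atet}, selection term = {selection_bias}")
print("naive equals ATET plus selection term:", naive == atet + selection_bias)
```

In this example more than half of the naive gap comes from who chooses economics, not from what economics pays them.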

Alternatively, you could have estimated the impact of becoming economists for those who actually became accountants, in which case the naive estimator implicitly assumed that $E(Y_1 \mid D = 0) = E(Y_1 \mid D = 1)$.

For accountants, the impact on earnings if they had become economists is

$$\text{ATEN} = \bar{y}_{1,D=0} - \bar{y}_{0,D=0} = 56{,}599 - 63{,}985 = -7{,}386$$

It looks like those who became accountants made the right choice: they would have been worse off if they had become economists! This again shows why the naive estimator is so incredibly horrible.

Finally, you may want to know what the impact would be if we made everyone become economists. If you used the naive estimator, you'd be implicitly assuming that $E(Y_1) = E(Y_1 \mid D = 1)$ and $E(Y_0) = E(Y_0 \mid D = 0)$. Because the data are made up, we can calculate the real ATE:

$$\text{ATE} = \bar{y}_{1} - \bar{y}_{0} = 59{,}992 - 65{,}001 = -5{,}009$$

All three of these parameters are meaningful, and our naive estimator gives us none of them.
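
As a rough cross-check, the three parameters and the naive estimate can be recomputed from the table, and the ATE can be recovered as the share-weighted average of the effects for economists and for accountants. The variable names are illustrative; the weighted-average relationship is a standard identity that holds here up to rounding in the table's means.

```python
# Group means and counts copied from the table on the slides.
n_accountants, n_economists = 78_414, 21_586
y0_acc, y1_acc = 63_985, 56_599    # accountants: accounting, economics earnings
y0_econ, y1_econ = 68_690, 72_317  # economists: accounting, economics earnings
y0_all, y1_all = 65_001, 59_992    # population means

naive = y1_econ - y0_acc  # compares the two observed means
atet = y1_econ - y0_econ  # gain for those who chose economics
aten = y1_acc - y0_acc    # gain accountants would get from switching
ate = y1_all - y0_all     # gain if everyone became an economist

share_econ = n_economists / (n_economists + n_accountants)
ate_from_groups = share_econ * atet + (1 - share_econ) * aten

print(f"naive={naive}  ATET={atet}  ATEN={aten}  ATE={ate}")
print(f"share-weighted average of ATET and ATEN: {ate_from_groups:.1f}")
```

The naive estimate is positive and larger than any of the three parameters, two of which are negative.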

You should find this depressing. Relatively simple estimators can go tragically wrong. While I have done this in terms of the selection of occupation, think about how this is relevant for public policy:

- Job training programs
- The returns to education (President Obama's new proposal for free community college)
- Programs to prevent recidivism
- Programs to prevent dropping out of high school

Finally, notice that people who choose to be an accountant benefit from being an accountant; people who choose to be an economist benefit from being an economist. Mr. Roy reminds us that people are utility maximizers, and we forget that at our own (and their) peril.

## The Selection Problem

What is going on here? We need a model to think about this. The model is

$$y_{1i} = g_1(x_i, u_i)$$
$$y_{0i} = g_0(x_i, u_i)$$

where $x_i$ is a vector of observed variables we will call covariates or observables, and $u_i$ is a vector of unobserved variables we call unobservables.

The functions $g_1(\cdot)$ and $g_0(\cdot)$ are two unknown functions, one for each value of $D$.

The fundamental problem of evaluation is that you get to observe (at most) only one of the two outcome states. This makes estimation of causal effects really hard.

Why does this make things so hard? Agents are making optimal choices with information not available to the econometrician.

The naive estimator ignores the nonrandom selection into the two regimes and results in biased estimates of the parameters. Social scientists say that such estimates suffer from a selection bias. There are two forms of selection bias: (1) selection on observables and (2) selection on unobservables.

Selection on observables is the easier of the two (but hard enough to keep
us occupied for a while). Ordinary Least Squares (OLS) regressions are
one means of trying to account for selection on observables
Selection on unobservables is extremely difficult. This is the problem that
keeps people like me employed. If you take the follow-up to this course,
Multivariate Analysis, you'll learn about some of the techniques one can
use to try to deal with selection on unobservables. Fixed effects models,
instrumental variables, and selection models are a few of the more
common methods.

## The Returns to Schooling

To make this discussion a bit more concrete, consider two youths (with the
same observables) making the decision to drop out of high school: one
drops out and one sticks it out and completes high school
Suppose that the one who stays in high school has high earnings, does not
get arrested, and gets married. The one who drops out has low earnings,
gets arrested, and never gets married

Are any of these differences in outcomes causal? Possible confounders include

- Differences in standardized test scores
- Criminal activity
- Lousy high schools
- High-paying job

Before making that judgment, at a minimum you would want to know a great deal about the decision-making process of the two youths.

What is the source of the variation in choices made by people in my sample and why do I think it (or some of it) is not related to the outcome I am studying?

Why are these observations useful for estimating the missing counterfactual?

## Back to Our Model

Keeping with the dropout story, let $D = 0$ indicate the student drops out of high school and let $D = 1$ indicate that the student finishes high school.

There are two possible outcomes for the students: $y_1$, the outcome if they finish high school, and $y_0$, the outcome if they do not finish high school.

There are two parameters you might be interested in:

$$\Delta(D = 1) = E(y_1 \mid D = 1) - E(y_0 \mid D = 1)$$
$$\Delta(D = 0) = E(y_1 \mid D = 0) - E(y_0 \mid D = 0)$$

Unfortunately, the data do not reveal $E(y_0 \mid D = 1)$ or $E(y_1 \mid D = 0)$. You have to estimate those. How?

Well, you just use the naive estimator and assume

$$E(y_0 \mid D = 1) = E(y_0 \mid D = 0)$$
$$E(y_1 \mid D = 0) = E(y_1 \mid D = 1)$$

But if you claim to believe this, people will look at you like you said you believe in Santa Claus.

You might be able, however, to find a set of covariates $x$ so that

$$E(y_0 \mid x, D = 1) = E(y_0 \mid x, D = 0)$$
$$E(y_1 \mid x, D = 0) = E(y_1 \mid x, D = 1)$$

This will allow you to identify the parameters

$$\Delta(x, D = 1) = E(y_1 \mid x, D = 1) - E(y_0 \mid x, D = 1)$$
$$\Delta(x, D = 0) = E(y_1 \mid x, D = 0) - E(y_0 \mid x, D = 0)$$
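
To see what conditioning on covariates buys, here is a minimal simulated sketch. Everything in it is hypothetical: the single binary covariate `x`, the probabilities, the earnings numbers (in thousands of dollars), and the constant effect of 5. Selection operates only through `x`, so the two equalities above hold by construction. That is an assumption of the example, not something real data guarantee.

```python
import random

rng = random.Random(1)
TRUE_EFFECT = 5.0  # hypothetical effect of finishing high school (in $1,000s)

rows = []
for _ in range(50_000):
    x = 1 if rng.random() < 0.5 else 0          # observed covariate, e.g. a high test score
    p_finish = 0.8 if x == 1 else 0.3           # x shifts the choice ...
    d = 1 if rng.random() < p_finish else 0
    y0 = 20.0 + 10.0 * x + rng.gauss(0.0, 3.0)  # ... and x shifts the outcome
    y1 = y0 + TRUE_EFFECT
    y = y1 if d == 1 else y0                    # only one potential outcome is observed
    rows.append((x, d, y))


def mean_y(x_value=None, d_value=None):
    """Mean observed outcome in the cell defined by x and d (None = any)."""
    ys = [y for x, d, y in rows
          if (x_value is None or x == x_value) and (d_value is None or d == d_value)]
    return sum(ys) / len(ys)


naive = mean_y(d_value=1) - mean_y(d_value=0)

# Compare finishers and dropouts within each value of x, then average the
# within-cell gaps using the distribution of x among finishers (D = 1).
finishers = [x for x, d, _ in rows if d == 1]
share_x1 = sum(finishers) / len(finishers)
gap_x1 = mean_y(1, 1) - mean_y(1, 0)
gap_x0 = mean_y(0, 1) - mean_y(0, 0)
conditional = share_x1 * gap_x1 + (1 - share_x1) * gap_x0

print(f"naive: {naive:.2f}   conditional on x: {conditional:.2f}   truth: {TRUE_EFFECT}")
```

The naive gap mixes the effect of finishing with the effect of `x`, while the within-`x` comparison should land close to the simulated effect. If an unobserved variable also drove both the choice and the outcome, the within-`x` comparison would be biased too.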

But how do we actually do this and estimate $\Delta$?

What we do most of the time is make parametric assumptions that make life a lot easier. Suppose we assume

$$E(y_1 \mid x) = g_1(x)$$
$$E(y_0 \mid x) = g_0(x)$$

With this assumption, now all we have to do is estimate the functions $g_1(x)$ and $g_0(x)$ and we solve all of our problems, subject of course to some huge assumptions.

What we typically do is use Ordinary Least Squares (OLS) to estimate these functions, which means we impose a linear functional form.
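
One way this might look in practice is sketched below, under strong assumptions that are built into the simulation: a single continuous covariate `x`, linear $g_1$ and $g_0$ with made-up coefficients, and a choice that depends only on `x`. The helper `ols_line` is a hypothetical name for the textbook one-regressor least-squares formulas; it is not taken from the slides.

```python
import random


def ols_line(xs, ys):
    """Intercept and slope of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


rng = random.Random(2)
data = []
for _ in range(20_000):
    x = rng.uniform(0.0, 10.0)
    d = 1 if rng.random() < x / 10.0 else 0  # choice depends only on observed x
    e = rng.gauss(0.0, 2.0)
    y0 = 10.0 + 2.0 * x + e                  # hypothetical g0 plus noise
    y1 = 14.0 + 3.0 * x + e                  # hypothetical g1 plus noise
    data.append((x, d, y0, y1))

treated = [r for r in data if r[1] == 1]
control = [r for r in data if r[1] == 0]

# Each regression uses only the outcome that is observed for that group.
a1, b1 = ols_line([r[0] for r in treated], [r[3] for r in treated])
a0, b0 = ols_line([r[0] for r in control], [r[2] for r in control])

# Predicted gain for each treated person, averaged over the treated.
atet_ols = sum((a1 + b1 * r[0]) - (a0 + b0 * r[0]) for r in treated) / len(treated)
# Known only because the data are simulated.
atet_true = sum(r[3] - r[2] for r in treated) / len(treated)

print(f"OLS-based estimate: {atet_ols:.2f}   simulated truth: {atet_true:.2f}")
```

The estimate tracks the simulated truth only because the linear form is correct and nothing unobserved drives the choice. Those are the "huge assumptions" at work.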
