Lecture 7 (Ch14): Pooled Cross Sections and Simple Panel Data Methods
An independently pooled cross section
This type of data is obtained by sampling randomly from a population at different points in time (usually in different years).
You can pool the data from the different years and run regressions.
However, you usually include year dummies.
Panel data
Panel data are also cross-sectional data collected at different points in time. However, they follow the same individuals over time.
With panel data you can do a bit more than with a pooled cross section.
You usually include year dummies as well.
Pooling independent cross sections across time
As long as the data are collected independently, pooling them over time causes few problems.
However, the distribution of the independent variables may change over time. For example, the distribution of education changes over time.
To account for such changes, you usually need to include a dummy variable for each year (year dummies), except for one year, which serves as the base year.
Often the coefficients on the year dummies are of interest.
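As a rough illustration of the year-dummy idea, the sketch below builds a toy pooled cross section and includes year dummies in two ways. The data are simulated and the variable names (`y`, `x`, `y74`, `y76`) are made up for this illustration; they do not come from the lecture's datasets.

```stata
* Toy pooled cross section: 300 simulated observations from three years
clear
set seed 12345
set obs 300
generate year = 1972 + 2*mod(_n, 3)      // takes the values 1972, 1974, 1976
generate x = rnormal()
generate y = 1 + 0.5*x + 0.3*(year == 1974) + 0.6*(year == 1976) + rnormal()

* Option 1: build the dummies by hand, leaving 1972 out as the base year
generate y74 = (year == 1974)
generate y76 = (year == 1976)
regress y x y74 y76

* Option 2: factor-variable notation; by default the lowest year is the base
regress y x i.year
```

Both regressions should give the same coefficients; the factor-variable form simply saves creating the dummies by hand.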
Example 1
Suppose you would like to see how the fertility rate changes over time after controlling for various characteristics.
The next slide shows the OLS estimates of the determinants of fertility over time. (Data: FERTIL1.dta)
The data are collected every other year.
The base year for the year dummies is 1972.
Dependent variable = number of kids per woman
. reg kids educ age agesq black east northcen west farm othrural town smcity y74 y76 y78 y80 y82 y84
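Since the year dummies are often of interest themselves, one natural follow-up is a joint test that all of them are zero. The sketch below assumes FERTIL1.dta is in the working directory and that it contains a dummy for every survey year after 1972, including `y78`.

```stata
* Assumes FERTIL1.dta is in the current working directory
use FERTIL1.dta, clear
regress kids educ age agesq black east northcen west farm othrural town smcity y74 y76 y78 y80 y82 y84

* Joint F test of H0: all year-dummy coefficients are zero,
* i.e. no change in fertility over time once the controls are held fixed
test y74 y76 y78 y80 y82 y84
```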
Example 2
CPS78_85.dta has wage data collected in 1978 and 1985.
We estimate the earnings equation, which includes education, experience, experience squared, a union dummy, a female dummy and the year dummy for 1985.
To see whether the gender gap has changed over time, include the interaction between female and the 1985 dummy; that is, estimate the following.
$\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 expersq + \beta_4 union + \beta_5 female + \beta_6 year85 + \beta_7 (year85 \times female) + u$
You can check whether the gender wage gap in 1985 differs from the gap in the base year (1978) by testing whether $\beta_7$ is equal to zero.
The gender gap in each period is given by:
- gender gap in the base year (1978) = $\beta_5$
- gender gap in 1985 = $\beta_5 + \beta_7$
. reg lwage educ exper expersq union female y85 y85fem
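The two gender-gap quantities can be read off the regression directly. The sketch below is one way to do it, assuming CPS78_85.dta is in the working directory and already contains the interaction `y85fem` used in the command above.

```stata
* Assumes CPS78_85.dta is in the current working directory
use CPS78_85.dta, clear
regress lwage educ exper expersq union female y85 y85fem

* H0: the gender gap did not change between 1978 and 1985
test y85fem

* Gender gap in 1985: coefficient on female plus coefficient on y85fem,
* reported with its own standard error and confidence interval
lincom female + y85fem
```

The coefficient on `female` by itself is the gap in the base year, 1978.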
Example: Effects of a garbage incinerator on housing prices
This example is based on studies of housing prices in North Andover, Massachusetts.
The rumor that a garbage incinerator would be built in North Andover began after 1978. Construction of the incinerator began in 1981.
You want to examine whether the incinerator affected housing prices.
Our hypothesis is the following.
The most naive analysis would be to run the following regression using only the 1981 data.
$price = \beta_0 + \beta_1 nearinc + u$
where price is the real price (i.e., deflated using the CPI to express it in constant 1978 dollars).
Using KIELMC.dta, the result is the following.
. reg rprice nearinc if year==1981
But can we say from this estimation that the incinerator has negatively affected housing prices?
To see this, estimate the same equation using the 1978 data. Note that this is before the rumor about the incinerator began.
. reg rprice nearinc if year==1978
Note that houses near the site where the incinerator is to be built were already cheaper than houses farther from the location.
So check whether the price difference associated with being near the incinerator is greater in the 1981 regression.
. reg rprice nearinc if year==1981
The difference-in-difference estimator:
$\hat{\delta}_1$ = (coefficient for nearinc in 1981) $-$ (coefficient for nearinc in 1978)
$= -30688.27 - (-18824.37) = -11863.90$
So the incinerator decreased house prices by about 11,864 dollars on average.
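Note that, in this example, the coefficient for nearinc in the 1978 regression is equal to the difference between the average price of houses near the incinerator site and the average price of houses farther away.
The difference-in-difference estimator can also be obtained from a single regression on the pooled data:
$price = \beta_0 + \beta_1 nearinc + \beta_2 year81 + \delta_1 (year81 \times nearinc) + u$
Here $\delta_1$ is the difference-in-difference estimator.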
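A short derivation, added here as a sketch, of why the interaction coefficient matches the difference between the two year-by-year nearinc coefficients. It uses the pooled model with intercept $\beta_0$, coefficients $\beta_1$ on nearinc and $\beta_2$ on year81, and $\delta_1$ on the interaction, and it assumes the error has zero mean conditional on nearinc and year81.

```latex
\begin{aligned}
E(price \mid nearinc=1,\ year81=0) - E(price \mid nearinc=0,\ year81=0)
  &= (\beta_0 + \beta_1) - \beta_0 = \beta_1 \\
E(price \mid nearinc=1,\ year81=1) - E(price \mid nearinc=0,\ year81=1)
  &= (\beta_0 + \beta_1 + \beta_2 + \delta_1) - (\beta_0 + \beta_2) = \beta_1 + \delta_1 \\
\text{(gap in 1981)} - \text{(gap in 1978)}
  &= (\beta_1 + \delta_1) - \beta_1 = \delta_1
\end{aligned}
```

The first line is the nearinc coefficient in the 1978-only regression and the second is the nearinc coefficient in the 1981-only regression, so their difference is exactly $\delta_1$.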
. reg rprice nearinc y81 y81nrinc
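To see that the two routes give the same number, the following sketch computes the estimate both ways. It assumes KIELMC.dta is in the working directory and contains `y81` and `y81nrinc` as used above; the scalar names `b78`, `b81` and `did` are made up for this illustration.

```stata
* Assumes KIELMC.dta is in the current working directory
use KIELMC.dta, clear

* Route 1: two separate regressions, then subtract the nearinc coefficients
regress rprice nearinc if year == 1981
scalar b81 = _b[nearinc]
regress rprice nearinc if year == 1978
scalar b78 = _b[nearinc]
scalar did = b81 - b78
display "difference of the two nearinc coefficients: " did

* Route 2: one pooled regression with the interaction term
regress rprice nearinc y81 y81nrinc
display "coefficient on y81nrinc: " _b[y81nrinc]
```

A practical reason to prefer the pooled regression is that it also reports a standard error and t statistic for the difference-in-difference estimate.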
The group of people who are affected by the policy is called the treatment group.
Those who are not affected by the policy are called the control group.
Suppose that you want to know how a change in the spousal tax deduction has affected the hours worked by women.
Suppose you have pooled data on workers in 1994 and 1995.
The next slide shows the typical procedure you follow to conduct the difference-in-difference analysis.
Step 1: Create the treatment dummy such that it equals 1 if the person is in the treatment group and 0 if the person is in the control group.
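The remaining steps of the procedure are not reproduced here. As a hedged sketch of how such an analysis is commonly set up, the example below simulates a two-period pooled data set and runs the regression with a treatment dummy, a post-period dummy and their interaction. All variable names (`treat`, `after`, `treat_after`, `hours`) and all numbers are hypothetical; the simulated policy effect is set to -3 hours.

```stata
* Simulated data, for illustration only
clear
set seed 2024
set obs 1000
generate treat = (runiform() < 0.5)       // 1 = treatment group, 0 = control group
generate after = (runiform() < 0.5)       // 1 = observed after the policy change
generate treat_after = treat*after        // interaction of the two dummies
generate hours = 35 + 2*treat + 1*after - 3*treat_after + rnormal(0, 5)

* The coefficient on treat_after is the difference-in-difference estimate;
* it should come out close to the simulated effect of -3
regress hours treat after treat_after
```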
You did not find evidence that receiving the grant reduces the scrap rate.
The reason why we did not find a significant effect is probably the endogeneity problem.
Companies with low-ability workers tend to apply for the grant, which creates a positive bias in the estimation. If you observed the average ability of the workers, you could eliminate the bias by including the ability variable. But since you cannot observe ability, you have the following situation.
Eliminating bias using two-period panel data
Now, go back to the equation.
$\log(Scrap) = \beta_0 + \beta_1 grant + \beta_2 \log(sales) + \beta_3 \log(employment) + \underbrace{(\beta_4 ability + u)}_{v}$
First, for each firm, take the first difference. That is, compute the following.
$\Delta \log(Scrap)_{it} = \log(Scrap)_{it} - \log(Scrap)_{i,t-1}$
It follows that, by taking the first difference, you can eliminate the fixed effect:
$\Delta \log(Scrap)_{it} = \beta_1 \Delta grant_{it} + \beta_2 \Delta \log(sales)_{it} + \beta_3 \Delta \log(employment)_{it} + \beta_5 \Delta year88_{it} + \Delta u_{it}$
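The step from the levels equation to the differenced equation can be written out as follows. This is a sketch under two assumptions that the slides only imply: the levels equation also contains the year dummy $year88_{it}$ with coefficient $\beta_5$, and ability is constant within a firm over the two periods, written $ability_i$.

```latex
\begin{aligned}
\log(Scrap)_{it}   &= \beta_0 + \beta_1 grant_{it} + \beta_2 \log(sales)_{it} + \beta_3 \log(employment)_{it}
                      + \beta_5 year88_{it} + \beta_4 ability_i + u_{it} \\
\log(Scrap)_{i,t-1} &= \beta_0 + \beta_1 grant_{i,t-1} + \beta_2 \log(sales)_{i,t-1} + \beta_3 \log(employment)_{i,t-1}
                      + \beta_5 year88_{i,t-1} + \beta_4 ability_i + u_{i,t-1} \\
\Delta \log(Scrap)_{it} &= \beta_1 \Delta grant_{it} + \beta_2 \Delta \log(sales)_{it} + \beta_3 \Delta \log(employment)_{it}
                      + \beta_5 \Delta year88_{it} + \Delta u_{it}
\end{aligned}
```

Subtracting the second line from the first cancels both $\beta_0$ and $\beta_4 ability_i$, which is why the differenced equation has no constant and no ability term.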
. ******************************
. * Generate first differenced *
. * variables *
. ******************************
. gen difflscrap=lscrap-L.lscrap
(363 missing values generated)
. gen diffgrant=grant-L.grant
(157 missing values generated)
. gen difflsales=lsales-L.lsales
(226 missing values generated)
. gen difflemploy=lemploy-L.lemploy
(181 missing values generated)
. gen diffd88=d88-L.d88
(157 missing values generated)
. **********************
. * Run the regression *
. **********************
. reg difflscrap diffgrant difflsales difflemploy diffd88 if year<=1988, nocons
When you use the nocons option, Stata omits the constant term.
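An alternative that avoids creating the differenced variables by hand is Stata's `D.` difference operator. The sketch below assumes the panel is already in memory and that the firm identifier is called `fcode`; that name is an assumption, since the slides do not show how the panel was declared.

```stata
* Assumption: the firm identifier is named fcode and the time variable is year.
* The L. and D. operators only work after the panel structure is declared.
xtset fcode year

* D.x is x minus its one-period lag, so this mirrors the regression above
regress D.lscrap D.grant D.lsales D.lemploy D.d88 if year <= 1988, nocons
```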
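General case
The first-differenced model in a more general situation can be written as follows.
$y_{it} = \beta_0 + \beta_1 x_{it1} + \beta_2 x_{it2} + \dots + \beta_k x_{itk} + a_i + u_{it}$
where $a_i$ is the fixed effect.
If $a_i$ is correlated with any of the explanatory variables, the estimated coefficients will be biased. So take the first difference to eliminate $a_i$, then estimate the following model by OLS.
$\Delta y_{it} = \beta_1 \Delta x_{it1} + \beta_2 \Delta x_{it2} + \dots + \beta_k \Delta x_{itk} + \Delta u_{it}$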
First differencing for more than two periods
You can use first differencing for more than two periods.
You just have to difference adjacent periods successively.
For example, suppose that you have 3 periods. Then for the dependent variable, you compute $\Delta y_{i2} = y_{i2} - y_{i1}$ and $\Delta y_{i3} = y_{i3} - y_{i2}$. Do the same for the x-variables. Then run the regression.
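A tiny made-up panel makes the successive differencing concrete. The data and variable names below (`id`, `t`, `y`, `x`, `dy`, `dx`) are invented for this illustration.

```stata
* Two units observed in three periods (made-up numbers)
clear
input id t y x
1 1 10 1
1 2 12 2
1 3 15 4
2 1 20 3
2 2 21 3
2 3 25 5
end

xtset id t
generate dy = D.y      // y in period t minus y in period t-1, within each id
generate dx = D.x
list id t y dy x dx, sepby(id)
```

Each unit's first period has missing differences, so a panel with 3 periods contributes 2 differenced observations per unit to the regression.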
Exercise
The data set ezunem.dta contains city-level unemployment claim statistics for the state of Indiana. It also contains information about whether or not each city has an enterprise zone.
An enterprise zone is an area that encourages business and investment through reduced taxes and restrictions.
Enterprise zones are usually created in economically depressed areas with the purpose of increasing economic activity and reducing unemployment.
Using ezunem.dta, you are asked to estimate the effect of enterprise zones on city-level unemployment claims. Use the log of unemployment claims as the dependent variable.
OLS results
. reg luclms ez d81 d82 d83 d84 d85 d86 d87 d88
First differencing
. reg lagluclms lagez lagd81 lagd82 lagd83 lagd84 lagd85 lagd86 lagd87 lagd88, nocons
The do file used to generate the results.
reg luclms ez d81 d82 d83 d84 d85 d86 d87 d88
reg lagluclms lagez lagd81 lagd82 lagd83 lagd84 lagd85 lagd86 lagd87 lagd88, nocons
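The commands that create the differenced variables are not shown above. One way they might have been generated is sketched below; it assumes ezunem.dta is in the working directory and that the city identifier is called `city`, which is an assumption rather than something stated in the slides. The `lag` prefix follows the variable names used in the regression, even though the variables hold first differences.

```stata
* Assumes ezunem.dta is in the current working directory
use ezunem.dta, clear

* Assumption: the panel identifier is named city
xtset city year

* First-difference the dependent variable, the policy dummy and the year dummies
foreach v of varlist luclms ez d81 d82 d83 d84 d85 d86 d87 d88 {
    generate lag`v' = `v' - L.`v'
}

* Pooled OLS in levels, then the first-differenced regression
regress luclms ez d81 d82 d83 d84 d85 d86 d87 d88
regress lagluclms lagez lagd81 lagd82 lagd83 lagd84 lagd85 lagd86 lagd87 lagd88, nocons
```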
The assumptions for the first difference method
Assumption FD1: Linearity
$y_{it} = \beta_0 + \beta_1 x_{it1} + \dots + \beta_k x_{itk} + a_i + u_{it}$
Assumption FD2:
Assumption FD3:
There is no perfect collinearity. In
addition, each explanatory variable
changes over time at least for some i
in the sample.
Assumption FD4: Strict exogeneity
Assumption FD5: Homoskedasticity
$\mathrm{Var}(\Delta u_{it} \mid X_i) = \sigma^2$