The following replication exercise closely follows homework assignment #2 in ECNS 562.
The data for this exercise can be found here.
The data cover the expansion of the Earned Income Tax Credit (EITC), legislation aimed at
providing a tax break for low-income individuals. For some background on the subject, see
Eissa, Nada, and Jeffrey B. Liebman. 1996. "Labor Supply Responses to the Earned Income Tax
Credit." Quarterly Journal of Economics 111(2): 605-637.
STATA:
*************************************************************************
*The following block of commands goes at the start of nearly all do files
*Bracket comments with /* */ or just use an asterisk at line beginning
R:
# Kevin Goulding
# ECNS 562 - Assignment 2

#########################################################################
# Load the foreign package
require(foreign)

# Import data from web site
# update: first download the file eitc.dta from this link:
# https://docs.google.com/open?id=0B0iAUHM7ljQ1cUZvRWxjUmpfVXM
# Then import from your hard drive:
eitc = read.dta("C:/link/to/my/download/folder/eitc.dta")
Note that comments can be embedded anywhere in R code by putting a # to the left of them (i.e. anything to the right of a # is ignored by R). Alternately, you can download the data file and import it from your hard drive. Use forward slashes (or doubled backslashes) in Windows paths, because a single backslash starts an escape sequence inside an R string:
eitc = read.dta("C:/DATA/Courses/Econ 562/homework/eitc.dta")
Recall from part 1 of this series, the following code to describe and summarize your data:
STATA:
des
sum
R:
In R, each column of your data is assigned a class, which determines how that column is
treated by various functions. To see which class R has assigned to each of your variables, and
then to summarize the data, run the following code:
sapply(eitc,class)
summary(eitc)
source('sumstats.r')
sumstats(eitc)
To output the summary statistics table to LaTeX, use the following code:
require(xtable) # xtable package helps create LaTeX code from R.
xtable(sumstats(eitc))
Note: You will need to re-run the code for sumstats() which you can find in an earlier post.
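If the earlier post is not at hand, a rough stand-in can be written in a few lines. The sketch below is not the author's sumstats(); the name sumstats_sketch, its output columns, and the toy data frame are all assumptions made for illustration.

```r
# Hypothetical stand-in for sumstats(); not the original function
sumstats_sketch <- function(x) {
  # Keep numeric columns only, since mean() and sd() need numbers
  num <- x[, sapply(x, is.numeric), drop = FALSE]
  data.frame(
    n    = sapply(num, function(v) sum(!is.na(v))),
    mean = sapply(num, mean, na.rm = TRUE),
    sd   = sapply(num, sd,   na.rm = TRUE),
    min  = sapply(num, min,  na.rm = TRUE),
    max  = sapply(num, max,  na.rm = TRUE)
  )
}

# Toy data for illustration only (made-up values, not the EITC file)
toy <- data.frame(work = c(1, 0, 1, 1), earn = c(1200, 0, 800, NA))
sumstats_sketch(toy)
```

Because the result is an ordinary data frame with one row per variable, it can be passed to xtable() in the same way as the call above.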
STATA:
summarize if children==0
summarize if children == 1
summarize if children >=1
summarize if children >=1 & year == 1994
R:
# The following code utilizes the sumstats function (you will need to re-run this code)
sumstats(eitc[eitc$children == 0, ])
sumstats(eitc[eitc$children == 1, ])
sumstats(eitc[eitc$children >= 1, ])
sumstats(eitc[eitc$children >= 1 & eitc$year == 1994, ])

# Alternately, you can use the built-in summary function
summary(eitc[eitc$children == 0, ])
summary(eitc[eitc$children == 1, ])
summary(eitc[eitc$children >= 1, ])
summary(eitc[eitc$children >= 1 & eitc$year == 1994, ])

# Another example: Summarize variable 'work' for women with one child from 1993 onwards.
summary(subset(eitc, year >= 1993 & children == 1, select=work))
The code above includes all summary statistics – but say you are only interested in the mean.
You could then be more specific in your coding, like this:
mean(eitc[eitc$children == 0, 'work'])
mean(eitc[eitc$children == 1, 'work'])
mean(eitc[eitc$children >= 1, 'work'])
The other statistics in the summary output have their own functions and work the same way: min()
for the minimum value, max() for the maximum value, sd() for the standard deviation, and others.
To create a new variable called “c.earn” equal to earnings conditional on working (if “work”
= 1), “NA” otherwise (“work” = 0) – use the following code:
STATA:
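R:
# Earnings times the work dummy: equals earn for workers, 0 for non-workers
eitc$c.earn=eitc$earn*eitc$work
# Save the column names, then recode zeros to NA in a copy of the new column
z = names(eitc)
X = as.data.frame(eitc$c.earn)
X[] = lapply(X, function(x){replace(x, x == 0, NA)})
# Swap the recoded copy in for the original column and restore the names
eitc = cbind(eitc,X)
eitc$c.earn = NULL
names(eitc) = z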
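The same variable can also be built in one step with ifelse(). The sketch below runs on a small made-up data frame (toy and its values are hypothetical). It differs from the code above in one respect: it keys on work directly, so a working person with zero recorded earnings keeps a 0 rather than becoming NA.

```r
# Toy data for illustration only (made-up values)
toy <- data.frame(earn = c(1500, 0, 0, 900), work = c(1, 0, 1, 1))

# Earnings conditional on working: earn where work == 1, NA otherwise
toy$c.earn <- ifelse(toy$work == 1, toy$earn, NA)

# Average earnings among workers only; NA rows are dropped
mean(toy$c.earn, na.rm = TRUE)
```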
Construct a variable for the treatment called “anykids” = 1 for treated individuals (those with at
least one child), and a variable for after the expansion called “post93” = 1 for 1994 and later.
STATA:
R:
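Both dummies can be built from logical comparisons, in the same style as the post91 line used for the placebo test further down. A minimal sketch on a made-up data frame (toy and its values are hypothetical; with the real data, replace toy with eitc):

```r
# Toy data for illustration only (made-up values)
toy <- data.frame(year = c(1991, 1993, 1994, 1996),
                  children = c(0, 2, 1, 0))

# Treatment dummy: 1 if the woman has at least one child
toy$anykids <- as.numeric(toy$children >= 1)

# Post-expansion dummy: 1 for 1994 and later
toy$post93 <- as.numeric(toy$year >= 1994)

toy
```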
Create a plot
Create a graph which plots mean annual employment rates by year (1991-1996) for single
women with children (treatment) and without children (control).
STATA:
preserve
collapse work, by(year anykids)
gen work0 = work if anykids==0
label var work0 "Single women, no children"
gen work1 = work if anykids==1
label var work1 "Single women, children"
twoway (line work0 year, sort) (line work1 year, sort), ytitle(Labor Force Participation Rates)
graph save Graph "homework\eitc1.gph", replace
R:
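One way to mirror the Stata steps in base R is sketched below on simulated data (toy and its employment probabilities are made up for illustration, and anykids is assumed to exist already). Here aggregate() plays the role of collapse, and plot() with lines() draws one line per group.

```r
# Simulated stand-in for the eitc data (made-up values, illustration only)
set.seed(562)
toy <- data.frame(year = rep(1991:1996, each = 100),
                  anykids = rep(0:1, times = 300))
toy$work <- rbinom(nrow(toy), 1, ifelse(toy$anykids == 1, 0.45, 0.57))

# Mean of work by year and group (the analogue of Stata's collapse)
agg <- aggregate(work ~ year + anykids, data = toy, FUN = mean)

ctrl  <- agg[agg$anykids == 0, ]  # single women, no children
treat <- agg[agg$anykids == 1, ]  # single women, children

# Solid line for the control group, dashed line for the treatment group
plot(ctrl$year, ctrl$work, type = "l", ylim = range(agg$work),
     xlab = "Year", ylab = "Labor Force Participation Rates")
lines(treat$year, treat$work, lty = 2)
legend("topright", lty = 1:2,
       legend = c("Single women, no children", "Single women, children"))
```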
Calculate the unconditional difference-in-difference estimates of the effect of the 1993 EITC
expansion on employment of single women.
STATA:
R:
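The unconditional estimate is simply a difference of four cell means. The sketch below computes it on a tiny made-up data frame (toy, its values, and the helper cell() are hypothetical), assuming post93 and anykids have been created as described earlier.

```r
# Toy data for illustration only (made-up values)
toy <- data.frame(
  post93  = c(0, 0, 0, 0, 1, 1, 1, 1),
  anykids = c(0, 0, 1, 1, 0, 0, 1, 1),
  work    = c(1, 0, 0, 0, 1, 0, 1, 0)
)

# Mean employment in one period-by-group cell
cell <- function(p, k) mean(toy$work[toy$post93 == p & toy$anykids == k])

# (treated after - treated before) - (control after - control before)
did <- (cell(1, 1) - cell(0, 1)) - (cell(1, 0) - cell(0, 0))
did  # 0.5 for these made-up values
```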
STATA:
R:
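The same estimate can be obtained from a regression of work on the two dummies and their interaction: the coefficient on the interaction term is the difference-in-difference estimate. A sketch on simulated data (toy, the sample size, and the built-in effect of 0.1 are assumptions for illustration; the name reg1 is also hypothetical):

```r
# Simulated stand-in for the eitc data (made-up values, illustration only)
set.seed(562)
n <- 400
toy <- data.frame(post93  = rep(0:1, each = n / 2),
                  anykids = rep(0:1, times = n / 2))
# Employment probability rises by 0.1 for treated women after the expansion
toy$work <- rbinom(n, 1, 0.5 + 0.1 * toy$post93 * toy$anykids)

# Linear probability model with both dummies and their interaction
reg1 <- lm(work ~ post93 + anykids + post93*anykids, data = toy)
summary(reg1)

# The difference-in-difference estimate
coef(reg1)["post93:anykids"]
```

Because the model is saturated in the two dummies, this coefficient matches the estimate built from the four cell means exactly; the regression adds a standard error.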
Adding additional variables is a matter of including them in your coded regression equation,
as follows:
STATA:
reg work post93 anykids interaction nonwhite age age2 ed finc nonlaborinc
R:
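A possible R counterpart of the Stata command is sketched below on simulated data. Everything in toy is made up, and nonlaborinc is assumed to be an existing column because its construction is not shown here. In an R formula the interaction and the squared term do not need to be generated first: post93:anykids and I(age^2) stand in for Stata's interaction and age2.

```r
# Simulated stand-in for the eitc data (made-up values, illustration only)
set.seed(562)
n <- 500
toy <- data.frame(
  post93      = rbinom(n, 1, 0.5),
  anykids     = rbinom(n, 1, 0.5),
  nonwhite    = rbinom(n, 1, 0.4),
  age         = sample(20:54, n, replace = TRUE),
  ed          = sample(0:11, n, replace = TRUE),
  finc        = runif(n, 0, 50000),
  nonlaborinc = runif(n, 0, 5000)   # assumed column, definition not shown
)
toy$work <- rbinom(n, 1, 0.5)

# Difference-in-difference regression with additional controls
reg2 <- lm(work ~ post93 + anykids + post93:anykids + nonwhite +
             age + I(age^2) + ed + finc + nonlaborinc, data = toy)
summary(reg2)
```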
STATA:
R:
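A placebo test arbitrarily assigns a treatment date before the actual treatment date and checks
whether the estimated treatment effect is significant. A significant placebo effect would suggest
that the two groups were already on different trends before the real treatment.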
STATA:
In R, first we’ll subset the data to exclude the time period after the real treatment (1994 and
later). Next, we’ll create a new treatment dummy variable, and run a regression as before on
our data subset.
R:
# Subset the data, including only years before 1994.
eitc.sub = eitc[eitc$year <= 1993,]

# Create a new "after treatment" dummy variable
# and interaction term
eitc.sub$post91 = as.numeric(eitc.sub$year >= 1992)

# Run a placebo regression where placebo treatment = post91*anykids
reg3 <- lm(work ~ anykids + post91 + post91*anykids, data = eitc.sub)
summary(reg3)
The entire code for this post is available here (File –> Save As). If you have any questions or
find problems with my code, you can e-mail me directly at kevingoulding {at} gmail [dot]
com.