
Stata Introduction and Worksheet

Maren Vairo, 2018

• You can work in teams!

The Stata interface


• Start Stata.
• Look around. Typically, the right-hand panels show the list of variables and the properties of
the dataset, the top box in the middle is the Results window, the box below it is where you
enter commands, and the left-hand panel lets you review previous commands.
• Figure out how to access help files, if you haven’t yet (type help). Browse around the help
pages. The links at the bottom of each help page are especially useful for beginners, since
they point to a number of commands related to the one you looked up.

Using data files


• Clear the workspace: clear any data and results currently loaded in Stata (use clear).
• Set a working directory: this tells Stata where to find the files we want to open and
where to store our output (the command is cd).
• Open statatutorial.dta: double-click the file, use the Open button, or use the use command. The
data contain information on individual and household characteristics for a sample of
individuals. Of particular interest is whether they bought health insurance (variable “ins”) or not.
• View and describe the data; note how many variables and observations it contains. Then view or
list only those observations for which ins equals 1 (in Stata the condition is written ins==1).
Now do the same, showing only the variable “ins”.
(describe, codebook, browse, edit, list, count, if condition)
• Generate a new variable that captures whether an individual has “good” or better health status
(=1) or not (=0). Then generate a new variable that contains the observation number (hint: “_n” is
Stata’s internal code for the current observation number). Drop both of those variables. Use the
command preserve, replace the values of one of your variables, keep only female observations,
and use the command restore to undo those changes. (gen, replace, drop, keep, preserve,
restore)
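• Save the data under a new name, then export it in an Excel-friendly format such as
comma-separated text. (save, outsheet; from Stata 13 on, export delimited supersedes outsheet)
• Open the exported file in Stata. (insheet; from Stata 13 on, import delimited supersedes insheet)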
• Advanced stuff: Input an additional observation and drop it again. Split your variables into two
parts, save them in different data files, then merge them again. Create two separate datasets
with insured and uninsured observations, then append them. (input, save, drop, keep,
use, append, merge, rm)
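
One possible way through the data-handling tasks above is sketched below. It is not the official solution. The folder path, the output file names, and the variables health (assumed coded 1 = poor to 5 = excellent) and female (assumed 0/1) are assumptions, since the worksheet names only ins and hhincome here; check describe for the names actually used in statatutorial.dta. The merge 1:1 syntax requires Stata 11 or later.

```stata
* Sketch only. Names marked "assumed" are not given in the worksheet.
clear
cd "C:/stata_workshop"                  // assumed folder; use your own path
use statatutorial.dta, clear

describe                                // variables, types, number of observations
count if ins == 1                       // how many individuals bought insurance
list ins if ins == 1 in 1/20            // insured observations among the first 20 rows

* Indicator for good or better health; "health" and its coding are assumed
generate byte goodhealth = (health >= 3) if !missing(health)
generate long obsno = _n                // _n is the current observation number
drop goodhealth obsno

preserve                                // take a snapshot of the data in memory
replace hhincome = hhincome / 1000      // temporary change to one variable
keep if female == 1                     // "female" is an assumed 0/1 variable
restore                                 // return to the snapshot

save statatutorial_copy.dta, replace    // assumed file name
outsheet using statatutorial_copy.csv, comma replace
insheet using statatutorial_copy.csv, comma clear

* Split the variables into two files, then merge them back on an observation id
use statatutorial_copy.dta, clear
generate long id = _n
preserve
keep id ins
save part_ins.dta, replace
restore
drop ins
save part_rest.dta, replace
use part_ins.dta, clear
merge 1:1 id using part_rest.dta

* Split the observations by insurance status, then append them again
use statatutorial_copy.dta, clear
keep if ins == 1
save insured.dta, replace
use statatutorial_copy.dta, clear
keep if ins == 0
append using insured.dta
```

After merge, the variable _merge reports which observations matched; here every observation should come from both files.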

Basic statistics
• Report some summary statistics on your variables. Among other things, report the sample mean,
the median, and the standard deviation of all variables; and the 10th and 90th percentiles of the
variable hhincome. Now report the 10th and 90th percentiles of the variable hhincome only for
households who bought insurance. (summarize, codebook, centile, table, tabulate)
• Report an estimate for the population mean of hhincome, and report its standard error. Think
about the difference between this task and the previous task of reporting sample moments. It is a
possible source of confusion, so never forget that difference. Also think about the relation
between the standard deviation of hhincome in the sample that you reported in the previous step
and the standard error that you are reporting here. (mean)
• Use graphics commands to plot the distribution of the variable hhincome. (histogram,
kdensity)
• Advanced stuff: Report an estimate for the population mean of hhincome and of its standard
error without using the built-in Stata routine. (gen, egen)
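
The statistics tasks above could be approached roughly as follows. This is a sketch, not the intended answer key; the helper variable names (mu_hat, sd_hat, n_obs, se_hat) are made up for illustration. The manual part relies on the textbook relation that, for a simple random sample, the standard error of the sample mean is the sample standard deviation divided by the square root of the number of observations.

```stata
* Sketch only; helper variable names are illustrative.
use statatutorial.dta, clear

summarize                               // mean, standard deviation, min, max
summarize hhincome, detail              // adds the median and selected percentiles
centile hhincome, centile(10 90)
centile hhincome if ins == 1, centile(10 90)

mean hhincome                           // estimate of the population mean and its standard error

histogram hhincome
kdensity hhincome

* Manual version: se(mean) = sd / sqrt(N), ignoring missing values of hhincome
egen double mu_hat = mean(hhincome)
egen double sd_hat = sd(hhincome)
egen long   n_obs  = count(hhincome)
generate double se_hat = sd_hat / sqrt(n_obs)
display "mean = " mu_hat[1] "   se = " se_hat[1]
```

Comparing the displayed values with the output of mean hhincome is a quick check that the manual calculation matches the built-in one.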

Regressions
• Run an OLS regression of the variable ins on age, age squared, white, married, poor, and
hhincome. (reg, gen)
• What is each element of the results panel telling you? Are any of the coefficients significant? Do
these variables explain a large fraction of why an individual takes up health insurance?
• Save the fitted (predicted) values from your regression in a new variable. Check whether their
distribution has a property that calls into question the usefulness of the linear regression model
in this application.
• Find out where Stata stores the estimated regression coefficients and variance-covariance
matrix. (ereturn list, mat list)
• Save the estimated regression coefficients into scalars and produce the fitted values manually.
(ereturn list, scalar)
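
A minimal sketch of the regression tasks above, assuming the regressors are stored under exactly the names used in the worksheet (age, white, married, poor, hhincome). The new names age2, ins_hat, ins_hat_manual, diff, and the b_* scalars are illustrative. Because ins is a 0/1 outcome, one property worth checking is whether any fitted values fall outside the interval from 0 to 1.

```stata
* Sketch only; new variable and scalar names are illustrative.
use statatutorial.dta, clear
generate age2 = age^2
regress ins age age2 white married poor hhincome

predict double ins_hat, xb              // fitted values from the last regression
summarize ins_hat, detail
count if (ins_hat < 0 | ins_hat > 1) & !missing(ins_hat)

ereturn list                            // everything regress left behind
matrix list e(b)                        // coefficient vector
matrix list e(V)                        // variance-covariance matrix

* Coefficients as scalars, then the fitted values by hand
scalar b_cons     = _b[_cons]
scalar b_age      = _b[age]
scalar b_age2     = _b[age2]
scalar b_white    = _b[white]
scalar b_married  = _b[married]
scalar b_poor     = _b[poor]
scalar b_hhincome = _b[hhincome]

generate double ins_hat_manual = scalar(b_cons) + scalar(b_age)*age ///
    + scalar(b_age2)*age2 + scalar(b_white)*white                   ///
    + scalar(b_married)*married + scalar(b_poor)*poor               ///
    + scalar(b_hhincome)*hhincome

generate double diff = ins_hat - ins_hat_manual
summarize diff                          // should be zero up to rounding
```

Wrapping each scalar in scalar() avoids any ambiguity if a variable in the dataset happens to share a name or abbreviation with a scalar.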

Your do-file
• Start a new “do-file” and record the commands for all the previous tasks, so that your work is
reproducible. Redo all previous steps for females only.
• Let’s make things more convenient: tell Stata to keep a log file of all the results that you
produce. (log)
• Advanced stuff: Take a look at the “set” options that Stata provides. Ask Stata to always report
how long each command takes to run, and rerun all of the previous steps. Then turn the time
report off again, since it will get annoying after a while. (set, set rmsg)
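
A do-file for the steps above might be organized along these lines. The file names worksheet.do and worksheet_log.txt and the variable female are assumptions for illustration; the commands in the middle stand in for whatever you recorded from the earlier sections.

```stata
* worksheet.do (assumed file name): skeleton of a reproducible session
capture log close                       // close any log left open by an earlier run
log using worksheet_log.txt, text replace

set rmsg on                             // report the run time after each command

use statatutorial.dta, clear
keep if female == 1                     // "female" is an assumed 0/1 variable

summarize
mean hhincome
generate age2 = age^2
regress ins age age2 white married poor hhincome

set rmsg off                            // switch the timing messages off again
log close
```

Running the file with do worksheet.do repeats the whole analysis and overwrites the log, so the results on disk always correspond to the current commands.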

Try-your-own time
• Try all the things in Stata that you never dared to try, and ask if you have questions!
• Check out the by and bysort commands.
• Check out loops: foreach and forvalues.
• Or try problem 1 from the first problem set, if you haven’t done so yet.
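
As a starting point for exploring by, bysort, and loops, here is a short sketch using only variables named in the worksheet. It is one illustration among many possible uses, not a required exercise.

```stata
* Sketch only: group-wise commands and loops
use statatutorial.dta, clear

bysort ins: summarize hhincome          // sorts by ins, then summarizes within each group

foreach v of varlist age hhincome {     // loop over a list of variables
    summarize `v'
}

forvalues i = 0/1 {                     // loop over the values 0 and 1
    display "Percentiles of hhincome for ins == `i'"
    centile hhincome if ins == `i', centile(10 90)
}
```

Inside a loop, the loop variable is referenced as a local macro, written with a left single quote (backtick) and a right single quote around its name.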

Where to find help


• Stata Help: in the program (Help > Contents) or at www.stata.com/help.cgi?contents
• Carolina Population Center Stata Tutorials:
http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/index.html
• UCLA ATS Stata tutorials: www.ats.ucla.edu/stat/stata/
