
STATA Tutorial

GVPT622: Quantitative Methods I


September 4, 2002

1 Preliminaries
STATA is a command-line driven statistics package. This means that much like DOS, you need to
type commands into the software to make it execute any routine. While this is a bit more difficult than
menu-driven packages like SPSS, it is much faster and more flexible. This document is meant to get you
started working with STATA.

2 Getting Started
2.1 Why All those Windows?
STATA is a multiple-windowed environment. When you open STATA, you will see 4 windows.
1. Review window - The review window gives you a list of previously typed commands. You can access
these in two different ways. You can scroll through the previous commands with the scroll arrows and
click on a command, or you can hit the “page up” key, which brings the previous command into
the command window. If you hit the “page up” key twice, the next-to-last command
will pop up, and so forth.
2. Variables window - The variables window provides a list of the variables and their labels that are
in the currently loaded dataset.
3. STATA Command window - The STATA command window is where the user can interact with
STATA. This is where commands are typed in.

4. STATA Results window - The STATA results window shows you the results of the commands typed
into STATA.
A fifth window, the Graphics window, displays the graphs produced by graph commands typed in the command
window. This window is not visible when you first open STATA; it pops up directly
following a graphical command.

2.2 Log Files


Log files are a way to save all of the commands and corresponding output generated during a STATA
session. It is essential to open a log file during every session to keep track of data manipulation and any
analysis performed. A log file can be opened in two different ways. First, you can open a log using the
menu option: File>>Log>>Begin at which point the program will ask you to specify the name for the
log file. Make sure that you put it in the directory you want. There will be two options for a log file.
The .smcl format is a formatted log and the .log format is an unformatted log which can be opened in
any text editor. I find the .log files easier to work with, but this is only a personal preference. If you
want a .log file, make sure to change the option in the pull-down menu of the save box. The second way
to open a log is to type the command in the command window:
log using filename [, append replace [ text | smcl ] ]

This type of log file captures everything that comes up in the results window. There is another type of
log file - the command log - that captures only commands, not output. This type of file is one that would
allow you to replicate your analysis with just one command. The command log can be requested using
command-line syntax as follows:

cmdlog using filename [, append replace]


The command log can then be opened as a text file, or can be opened in STATA’s do-file editor. You
can access this document by clicking the envelope looking button (the fifth one from the left in STATA’s
toolbar) which opens the do-file editor or you can access it through the command line by typing:

doedit filename
The file can easily be run in STATA by typing in the command line:

do filename
or by clicking on the Tools>>Run menu option in the STATA do-file editor.
Log files can be suspended or closed. Suspending a log file can be done by typing “log off” which
temporarily closes the log file. The log can then be turned back on by typing “log on”. Closing a log file
is done by typing “log close”. You may then open a new log file. You could open the same log file and
add more information by typing:

log using logname.log, append


You can also replace a log file with a new log file by replacing the append statement with the replace
statement in the syntax above.
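The log commands above can be put together as in the following minimal sketch of one session. The file name session1.log is only an assumed example; use whatever name and directory you want.

```stata
* Start a plain-text log; replace overwrites any existing file of that name
log using session1.log, replace text
* ... commands typed here are recorded along with their output ...
log off
* ... commands typed here are not recorded ...
log on
log close
* Later, reopen the same file and add to the end of it
log using session1.log, append
log close
```

Note that replace throws away the old contents of the file, so use append whenever you want to keep earlier work.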

2.3 Getting Data in STATA Format


There are 4 main ways to put your data in STATA:
1. Typing data into STATA.
2. Copy and paste from another program.
3. Infile/Insheet.
4. Stat-Transfer.

2.3.1 Typing in Data


You can type data into STATA directly without having it in any other format. You can type data into
a spreadsheet environment either by typing “edit” in the command window or by pushing the button
with the spreadsheet (without the magnifying glass) on it. The button with the spreadsheet and a
magnifying glass is a data viewing environment where the data cannot be edited. STATA’s capabilities
as a spreadsheet are lacking so typing directly into STATA is only a good idea with very small datasets.

2.3.2 Copy and Paste


Data can be cut and pasted from other programs into STATA. This works particularly well with
spreadsheet data, but can also work with delimited text data. To copy and paste data into STATA,
simply open the data editor as suggested above, then copy from a spreadsheet like Excel and paste into
STATA.
There are a couple of text editors that allow the user to copy blocks of text, such as columns, from
the middle of a document, which is particularly useful in these situations. These are Textpad
- www.textpad.org and WinEdt - www.winedt.org. Both are essentially shareware. WinEdt costs $30
to register, and the program becomes annoying after the trial period is up. Textpad can also be
registered, but is not particularly annoying if you don’t register it.

2.3.3 infile/insheet
STATA’s infile command allows the user to bring nearly any sheet of data into the program. This is
usually done from a .txt document. The data file should be delimited by tabs, spaces or commas. Insheet
is a similar command designed specifically for data written out of a spreadsheet program; unlike
infile, it takes the delimiter as an option. The syntax of
the infile command is as follows (see footnote 1):

infile varlist [_skip[(#)] [varlist [_skip[(#)] ...]]] using filename [if exp]
[in range][, automatic byvariable(#) clear ]
The syntax to the insheet command is:

insheet [varlist] using filename [, double [no]names
{ comma | tab | delimiter("char") } clear ]

You can get a description of what the arguments mean to these and other functions by typing:
help infile1
help insheet
or more generally:

help <function>
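As a minimal sketch of the two commands above, the lines below read a comma-delimited file and a space-delimited file. The file names survey.csv and survey.txt and the variable names id, age and income are assumed examples, not files supplied with the course.

```stata
* Read a comma-separated file whose first line holds the variable names
insheet using survey.csv, comma names clear
* Check what was read in
describe

* Read three numeric variables from a space-delimited text file
infile id age income using survey.txt, clear
describe
```

The clear option drops whatever data are currently in memory, so save your work before using it.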

Dictionaries - Dictionaries are a way to define variable types. String variables are easiest to read
with infile when a dictionary is used. Dictionary files include not only the data you want to input, but a
dictionary command at the beginning. For an example, see “H:/GVPT622 F02/auto.dct”, which you can open
in a text editor. To use a dictionary with the infile statement, just type:

infile using filename.dct

STATA has two basic types of variables: string and numeric.


1. String variables are those that contain at least one non-numeric character such as a letter or symbol.
STATA calls these “str” variables. There is always a number after the “str” which denotes how
many characters wide the variable is, so a variable that is str8 is 8 characters long.

2. Numeric variables are those containing only numbers (including possibly a decimal point). There
are different kinds of numeric variables: byte, int, long, float and double. They all have different
minima, maxima and precision toward zero. Type “help datatypes” for a more thorough discussion.
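To make the idea of a dictionary file concrete, here is a minimal sketch of what one might contain. The file name cars.dct, the variable names, the labels and the two data rows are assumptions made for illustration; the course file auto.dct may be laid out differently.

```stata
dictionary {
    str18 make   "Make and model"
    int   price  "Price"
    int   mpg    "Mileage (mpg)"
}
"AMC Concord"  4099  22
"Buick Regal"  5189  20
```

If these lines were saved as cars.dct, typing infile using cars.dct should read in two observations, with make stored as an str18 string variable and price and mpg stored as int.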

2.3.4 Stat-Transfer
By far, the easiest way to get data into STATA or nearly any other format for that matter, is with
Stat-Transfer. This program allows the user to take data in nearly any format (including SAS, SPSS,
Excel (or other spreadsheet), Access (or other database), Systat, Gauss, Limdep, Matlab, Statistica,
etc...) and transfer the data into any other format. One of the benefits is that variable names and labels
as well as value labels tend to be preserved across formats. Stat-Transfer is a windows program that
should be on the statistical software menu in the graduate lab or in LeFrak.
The program works in 4 steps.
1. Choose the type of file you want to transfer.
2. Find the file on your computer.
3. Specify the type of file into which you want to transfer your data.
4. Hit “Transfer”.
For more advanced users, there are tabs of observations, variables and options that will help the user
tweak the program to produce more polished data, but often times specifying further options in these
tabs is not necessary.

Footnote 1: The square brackets [ and ] in the command syntax mark optional elements; they are not
typed as part of the command.

3 Saving and Loading Data in STATA Format


3.1 Loading Data
Data can be loaded in one of two ways:
1. Menu - With the Menu option File>>Open, you can search and load data. Similarly, you can type
ctrl+O or hit the open folder button, the first one on the left-hand side of the STATA toolbar.
2. Syntax - You can type the use command directly into the command window. The command is as
follows:

use filename [, clear nolabel ]

The clear option allows data to be loaded in even if data are currently loaded into the program and
have changed since the last save command was executed.

3.2 Saving Data


Data can be saved in a couple of different ways as well.
1. Saving with the Menu - menu option File>>Save or File>>Save As, can be used to save data in
STATA format. These files end in a “.dta” extension. One can also hit ctrl+s to save as well.
2. Syntax - data can be saved using the command save as follows:

save [filename] [, nolabel old replace all intercooled ]

The old option instructs the software to save the dataset in the format of the previous version of STATA.
You will need it only if you are using STATA 7 elsewhere and want to use the data in STATA
6 in the lab. The replace option simply overwrites an existing dataset that has the exact same name. The
other options are irrelevant to your work.
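A typical load-and-save sequence might look like the sketch below. The dataset names mydata and newdata are assumed examples.

```stata
* Load a dataset, discarding whatever is currently in memory
use mydata, clear
* ... make changes to the data here ...
* Save under a new name; replace overwrites newdata.dta if it already exists
save newdata, replace
```

Saving under a new name leaves the original file untouched, which is a reasonable habit when you are recoding variables.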

4 Graphing
STATA’s graphing capabilities are not the best of the statistical packages, but they are sufficient for
exploratory analysis. They are, however, probably not good enough for publication. There are many
possibilities. These can be broken down into two basic types - univariate and bivariate.

4.1 Univariate Graphs


Univariate graphs are usually meant to describe the distributional properties of a single variable.
These include histograms, density plots, boxplots, and oneway scatterplots.

4.1.1 Histogram
Histograms - Histograms place observations into categories (or bins); the height of each bar shows
the percentage of the total observations that fall in that bin. The command in STATA is:
graph [variable] [weight] [if exp] [in range], histogram [common_options
bin(#) {freq | percent} normal[(#,#)] density(#)]
The “bin” argument allows you to set the number of categories into which the observations are placed.
The normal option overlays a normal density curve on the histogram.
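For instance, the following sketch uses the syntax given above with a variable called income, which is an assumed name. Versions of STATA later than the ones used in this course changed the graph syntax, so the second comment shows what I believe to be the later equivalent.

```stata
* Histogram of income with 10 bins and a normal curve overlaid
graph income, histogram bin(10) normal
* In STATA 8 and later the equivalent command is:
*   histogram income, bin(10) normal
```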

4.1.2 Density Plots
A density plot is also called a “smoothed histogram”. In this graph, there are no bins. It is a single
line that is more like the population density function than the histogram. The command in stata for this
is:
kdensity varname [weight] [if exp] [in range] [, nograph
generate(newvarx newvard) n(#) width(#)
{biweight|cosine|epan|gauss|parzen|rectangle|triangle} normal
stud(#) at(varx) symbol(...) connect(...) title(string)
graph_options ]
The gauss option is probably the one that will be most useful. The biweight, cosine, epan (Epanechnikov),
parzen, rectangle and triangle options all control how observations are weighted (this is
analogous to deciding which bin they are in).
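A short sketch of the command in use, again assuming a variable named income:

```stata
* Kernel density estimate using the Gaussian weighting
kdensity income, gauss
* Same estimate with a normal density overlaid for comparison
kdensity income, gauss normal
```

Comparing the two curves in the second graph is a quick way to see where the variable departs from normality.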

4.1.3 Boxplots
Boxplots, sometimes called “box and whisker” plots, are particularly good at showing the spread of a
distribution. The box represents the inter-quartile range (the range between the 25th and 75th percentiles).
The whiskers cover most of the rest of the observations, but some extreme outliers can still lie outside
the whiskers. The STATA command to make a boxplot is:

graph [varlist] [weight] [if exp] [in range], box [common_options
[no]alt vwidth root]

4.1.4 Oneway Scatterplots


Oneway scatterplots (also called rug plots in other packages) are yet another way to visualize univariate
distributions. These work best with smaller datasets; with larger ones, the distributional
qualities are not distinguishable. The STATA command to construct a oneway scatterplot is:
graph [varlist] [weight] [if exp] [in range], oneway [common_options
jitter(#)]

4.2 Bivariate Graphs


Bivariate graphs display the relationship between two variables. While there are, in theory, a number
of possibilities for visualizing two variables together, such as a joint density plot, the one used almost
exclusively is the bivariate scatterplot.

4.2.1 Bivariate Scatterplot


The bivariate scatterplot uses the values on two variables (X and Y) as coordinates graphed onto
a set of coordinate axes. A number of different lines can be plotted on the graphs to further describe
the relationship between the two variables. We will learn more about these later in the semester. The
command to create a bivariate scatterplot in STATA is:
graph [varlist] [weight] [if exp] [in range], twoway [common_options
jitter(#) rescale rbox {y|x|r}reverse]
You may consult the STATA graphing manual or help files for more specific help on any of these and
many other commands for the graphical display of data.
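As a minimal sketch of the scatterplot command above, assume two variables named income and educ. The first variable listed goes on the vertical axis and the second on the horizontal axis.

```stata
* Scatterplot with income on the vertical axis and educ on the horizontal axis
graph income educ, twoway
* Add a small amount of random noise so overlapping points can be told apart
graph income educ, twoway jitter(2)
```

The jitter option is mostly useful when one or both variables take only a few distinct values.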

5 Miscellaneous
There are a number of other commands that will become useful as you begin to use STATA on a
regular basis.

1. Describe - describe provides you with a list of properties of the variables specified or all of the
variables in the dataset if no variables are specified.

describe [varlist] [, short detail fullnames numbers ]

2. Summarize - summarize provides the number of observations, mean, standard deviation, min and max for
all of the variables specified or all variables in the data if none are specified.

summarize [varlist] [weight] [if exp] [in range] [, [detail|meanonly]
format ]

3. Set Memory - the memory set function will be important when you are using large datasets.

set mem 100m

This will set the memory at 100 megabytes, which should be sufficient for nearly all of your projects.
The upper bound is determined by the computer’s physical memory; if 100 megabytes is not
enough and your computer has more memory available, you can set the limit higher.
4. Labelling - Labelling variables and variable values is important to keeping your dataset manageable.
You will hear horror stories from many quantitative types about how they didn’t label variables and
variable values because they were sure they would always remember and then two years later after
having left the project sitting, they come back only to find they’ve forgotten everything about the
variables and their coding. You will need three different commands to properly label your variables.
(a) Label Variable - this command simply attaches a label to the variable name. So, if for instance
the name of the variable is ’var1’, and you label it ’party ID’, then ’party ID’ will show up in
all printed output containing that variable. The command in STATA is as follows:
label variable varname ["label"]
Where ’varname’ is the variable name (var1 in the example above) and label is the label you
want to apply to that variable name (party ID in the example above). So, to create the label
party ID for var1, we would type the following:
label variable var1 "party ID"
(b) Label Define - this command defines value labels. For instance, our party ID variable may have
republicans, independents and democrats. We want to make a label so that if we tabulate the
variable, instead of 0, 1 and 2 as categories, it shows republicans, independents and democrats.
The general STATA code is as follows:
label define lblname # "label" [# "label" ...] [, add modify nofix ]
Where lblname is the name you want to give to the label, like ’partyid’ for this case, # signifies
the number you want the label to apply and label is the descriptor. For this example, we would
type:
label define partyid 0 "republican" 1 "independent" 2 "democrat"
(c) Label Values - Finally, we can apply the new value label we defined ’partyid’, to the variable
of interest.
label values var1 partyid
More generally, the syntax is:
label values varname [lblname] [, nofix ]
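Putting several of these commands together, a short sketch of the party ID example might run as follows. It assumes a dataset is already loaded and contains a variable named var1 coded 0, 1 and 2.

```stata
* Look at the variable before labelling it
describe var1
summarize var1, detail
* Attach a variable label, define the value labels, then apply them
label variable var1 "party ID"
label define partyid 0 "republican" 1 "independent" 2 "democrat"
label values var1 partyid
* The table should now show the labels in place of 0, 1 and 2
tabulate var1
```

Running tabulate at the end is a simple check that the value labels were attached to the right codes.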

6 Resources
1. STATA’s website: www.stata.com has a number of useful resources, like help files and FAQ’s.
2. STATA also has a listserv called Statalist. To subscribe, consult the
Statalist FAQs located at http://www.stata.com/support/faqs/res/statalist.html.
3. Reference manuals are also a great source of information. Hopefully we will have them available to
you early in the semester.
