
Chapter 1: A Gentle Introduction and UE

Chapter 1

1.1 Acquiring Stata

First things first. Before you can use Stata, you have to get access to it.
How do you get it? Your college or university may provide Stata in
official computer labs. If it doesn't (or if you want a personal copy), you
can buy and download Stata directly (www.stata.com). Fortunately,
reasonable student pricing is available.

With access to Stata, you open it as you would any program on your
computer (like Word, Excel, etc.). When you open Stata on a PC, you
should see something like Figure 1.1.

FIGURE 1.1

Stata also runs on Macs and, while it looks slightly different, the
commands and functionality are nearly identical to those on a PC.
Figure 1.2 shows Stata on a Mac.

Using Stata 1-1


FIGURE 1.2

Let's talk about what you see. There are five panels, or windows, in
Stata. The biggest one, squarely in the middle of the screen, is the
Results window. Nicely, it shows you the results of what you tell Stata
to do.

At the top left is the Review window. This area provides a history of
all the commands you have given Stata. The top right is where the
variables in your dataset will show up, and the bottom right is where
you'll see the properties of those variables.

The bottom center window is the Command window. As the name
suggests, this is where you tell Stata what to do, where you
actually program. (Don't panic! You can work in Stata by typing
commands one at a time, or you can roll all your commands up into a
single program, called a "do-file" in Stata language.)

1.2 Getting Data in Stata

With Stata open, we should move along and open a dataset.

Stata Format

There are a number of ways to get data into Stata. The easiest, of
course, is for the data to be in Stata format to begin with. As with
most software packages, a particular file extension is associated with
a particular type of file. A Microsoft Word document has the extension
.docx, and .pdf is the extension for Adobe Acrobat. Stata data sets
have a .dta extension.

Opening a .dta file in Stata is pretty straightforward. Click on File at
the top left. Then click on Open, select the folder your data set is
in, and click the dataset name. You are off and running. In Figure 1.3,
I do exactly this for the Magic Hill dataset (named HTWT1.dta)
introduced in Section 1.4 of Using Econometrics. I had previously saved
that file to my hard drive (from the Using Econometrics Student
Companion website).

HTWT1.dta has two variables:

Y: weight (in pounds) of the ith customer


X: height (in inches above 5 feet) of the ith customer

FIGURE 1.3

Figure 1.4 displays what you should see in Stata after loading the
HTWT1.dta file.

FIGURE 1.4

Notice in the Variables window (blue arrow) there are two variables, X
and Y, just as you expected. The Results window gives a record of
what I did (red arrow). In this case, I opened a data set. In Stata-speak,
to open a data set is to "use" a data set. The line use
"/Volumes/ECONOMICS/Economics/Data/HTWT1.dta" is really a Stata
programming command. Don't stress out. We will break that command (and
many others) down a bit later. For now, appreciate that even
though you opened a data set by a point-and-click approach, Stata
recorded what you did in its own language. That is a nice nugget to
keep in mind.

Also notice that the command that opened the data set was recorded in
the Review window.

Of course, to open a Stata dataset you could also find the file on
your computer and double-click on it, just like you open files with
most other common software.
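
If you would rather type the command than point and click, the following is a minimal sketch of the same step. It assumes HTWT1.dta has been saved in Stata's current working directory; the path shown in my Results window is specific to my computer, and yours will differ.

```stata
* Assumption: HTWT1.dta is in the current working directory.
* Show which folder Stata is currently working in.
pwd

* Load the data set; the clear option discards any data already in memory.
use "HTWT1.dta", clear

* List the variables in the data set (here, X and Y).
describe
```

The clear option matters because Stata holds one data set in memory at a time and will refuse to load a new one over unsaved changes.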

While you will have access to all the datasets used in Using
Econometrics in Stata format, that won't be the case with many other
datasets. With that in mind, we should cover a couple of common
approaches to getting data into Stata.

The Hard Way: Manual Data Entry

Often in life, there is a hard way to do something. Note that "hard"
does not necessarily mean "ineffective." The hard way to get data
into Stata is to input it manually.

Let's say you have the following data that you need to get into Stata.

Income     Experience   Name
$35,000    8            Bruce
$45,000    6            Sue
$52,500    9            Maria
$37,500    15           Woody
$20,000    1            John

Income is defined as annual income in dollars, Experience is in years,
and Name is, well, the person's name.

As before, open Stata as you would any other program. At the very top,
you will see an icon that looks like a spreadsheet with a pencil. Figure
1.5 shows this:

FIGURE 1.5

If you click on this icon, it will open a Data Editor window. Figure 1.6
shows the Data Editor. As the name suggests, this is where you can
edit data.

FIGURE 1.6

The Data Editor looks very similar to a spreadsheet. It is organized in
rows and columns. In Stata, each column is a variable. Each row is an
observation.

Start in the top-left cell (indicated by the blue arrow), type 35000,
and hit Enter. Figure 1.7 shows what you should see:

FIGURE 1.7

Notice that the column is now named "var1" and the row is automatically
numbered 1. We should go ahead and tell Stata that we want
the name of this variable to be "Income" and not "var1." Since we are
in the Data Editor, an easy way to do this is to double-click on
"var1" in the Properties window at the far right of the page (shown
by a blue arrow in Figure 1.8).

FIGURE 1.8

Name the variable "Income." Figure 1.9 shows what you see after
doing this.

FIGURE 1.9

Naming variables is important for obvious reasons. You want to make
sure variable names are informative but not excessively long. Also,
keep in mind that Stata is case sensitive. To Stata, "Income" and
"income" are different words.
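
To see case sensitivity in action, here is a small sketch. It builds a one-row data set purely for illustration and then asks Stata whether a variable exists under each spelling. The capture prefix and the _rc return code are standard Stata features, but the data set here is made up for the demonstration.

```stata
* Build a tiny illustrative data set with one variable named Income.
clear
set obs 1
generate Income = 35000

* "Income" exists, so this check passes silently.
confirm variable Income

* "income" (lowercase) does not exist. capture traps the error
* instead of stopping, and stores a nonzero return code in _rc.
capture confirm variable income
display _rc
```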

The next step should be to enter experience and the name of the first
person (Bruce). You would enter 8 in the first row, second column
and then "Bruce" in the first row, third column. Figure 1.10 shows this.

FIGURE 1.10

Again, notice that when we entered experience, the variable was
automatically named "var2," and the variable holding the individual's
name was "var3." Naturally, we would want to rename these "Experience"
and "Name," as we did for Income.
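
Renaming can also be done with a typed command rather than through the Properties window. The sketch below recreates Bruce's row with Stata's default names (var1, var2, var3) using the input command, then renames each variable. Treat it as one possible approach, not the only one.

```stata
* Recreate the first observation with Stata's default variable names.
* str10 tells Stata that var3 holds text of up to 10 characters.
clear
input var1 var2 str10 var3
35000 8 "Bruce"
end

* Syntax: rename oldname newname
rename var1 Income
rename var2 Experience
rename var3 Name

* Confirm the new names.
describe
```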

At this point we have all the information for Bruce in the data set. The
first row in the data set contains all of Bruce's information. It is
worth repeating that a row in Stata is an observation.

After we rename the variables, we should go ahead and enter the
information for the other four people. Figure 1.11 shows what you
should see after all the information is entered.

FIGURE 1.11
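
For small data sets like this one, the whole table can also be typed as commands instead of cell by cell in the Data Editor. The following is a sketch using Stata's input command; the variable names follow the ones chosen above, and str10 (text of up to 10 characters) is my choice for the Name column.

```stata
* Start with an empty data set.
clear

* Declare the variables, then type one observation per line.
* Income is entered without dollar signs or commas.
input Income Experience str10 Name
35000  8 "Bruce"
45000  6 "Sue"
52500  9 "Maria"
37500 15 "Woody"
20000  1 "John"
end

* Display the data to check the entries.
list
```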

You have now worked through getting data into Stata the hard way. I
would suggest that at this point you save your data set. "Save early
and save often" is a VERY good rule to live by! The easiest way to do
this is to click on File > Save As, as you would with any other software
(such as Word), as shown in Figure 1.12. Naturally, after you have saved
and named the file the first time, to save again you just click on Save.
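
Saving has a command equivalent as well. Here is a minimal sketch; the file name IncomeData.dta is hypothetical, and the file is written to Stata's current working directory.

```stata
* A small illustrative data set (two of the five people).
clear
input Income Experience str10 Name
35000 8 "Bruce"
45000 6 "Sue"
end

* Hypothetical file name. The replace option lets Stata overwrite
* an existing file of the same name, like clicking Save again.
save "IncomeData.dta", replace

* Reload the file to confirm it was written.
use "IncomeData.dta", clear
describe
```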

FIGURE 1.12

I want to make a really important note at this point, something you
might have stumbled onto. Notice that when I entered the income for
Bruce, I did NOT use a comma or a dollar sign ($). In Stata, there are
essentially two types of data: numeric and non-numeric. Numeric data
contain only numbers (and a decimal point, if called for). Data with
anything other than numbers are non-numeric. While this is an
oversimplification, it is a good place to start. The takeaway at the
moment is that Stata would have seen "$35,000" as a non-numeric entry,
no different from "Bruce." Since we need it to be a number, we entered
35000 as the value.
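
If you ever do end up with values like "$35,000" stored as text, Stata's destring command can convert them. The sketch below is illustrative only: the variable name IncomeText is made up, and the text value is built with char(36) (the dollar sign) so that it can be created without any typing in the Data Editor.

```stata
* Build a one-row data set holding the text "$35,000".
clear
set obs 1
generate str10 IncomeText = char(36) + "35,000"

* Convert to a number, ignoring dollar signs, spaces, and commas.
* generate() stores the result in a new variable named Income.
destring IncomeText, generate(Income) ignore("$ ,")

* IncomeText is a string; Income is numeric.
describe
list
```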

After you have saved your data set, you can close the Data Editor
window. You'll probably notice that there are many lines in the Results
and Review windows. This is shown in Figure 1.13 (blue and red arrows,
respectively).

FIGURE 1.13

What you see is Stata making a record, in the form of Stata commands,
of everything you did as you entered the data. As before, this is a
helpful (and sensible) feature of Stata and something we will explore
more formally later.

The Less Hard Way: Importing Data

Another common way to get data into Stata is to import it from
another format. While Stata can import a number of data formats, perhaps
the most common import is from a Microsoft Excel spreadsheet. With
that in mind, we'll take some time to walk through the process.

We will use the same data we entered manually. I have recorded the
data in an Excel file, shown in Figure 1.14.

FIGURE 1.14

To import this into Stata, click File > Import in Stata and select "Excel
spreadsheet (*.xls; *.xlsx)." This is shown in Figure 1.15 (blue arrow).

FIGURE 1.15

Once you do that, another window will open, shown in Figure 1.16.

FIGURE 1.16

From here, click on "Browse" (blue arrow), which will allow you to
select the file you want to import. My file is named
ExcelImportData.xlsx. You should see something along the lines of
Figure 1.17.

FIGURE 1.17

Before clicking "OK," we should talk about a couple of settings. The
first, identified by a blue arrow, asks whether you want the first
row in your Excel file to be treated as the variable names. In our case,
we should check this box because row 1 has our variable names. If the
first Excel row doesn't contain the names, of course, don't check it!

The second setting asks whether we want to import the data as
strings (indicated by the red arrow). While "string" has a formal
computer science definition, for our purposes it means "not a number."
Clearly, this is not what we want. We need our income and experience
data to be numbers in Stata. So, you should not check that box.

After checking the first box and NOT the second box, click "OK." This
will automatically pull the Excel data into Stata. You should then see
something like Figure 1.18.

FIGURE 1.18

And you are in the same place as if you had manually entered the data
(though this is a good bit more fun!). If this were a real project, you'd
want to go ahead and save your newly imported data set.
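
The import dialog has a command equivalent, import excel. So that the sketch below runs on its own, it first writes the five observations to an Excel file and then imports that file back. The file name ExcelImportDemo.xlsx is hypothetical (chosen so it will not overwrite your own spreadsheet) and is written to the current working directory.

```stata
* Create the data and write it to a hypothetical Excel file,
* with the variable names in the first row.
clear
input Income Experience str10 Name
35000  8 "Bruce"
45000  6 "Sue"
52500  9 "Maria"
37500 15 "Woody"
20000  1 "John"
end
export excel using "ExcelImportDemo.xlsx", firstrow(variables) replace

* Import the spreadsheet. The firstrow option treats row 1 as variable
* names, like checking the first box in the dialog.
* (The allstring option would import every column as text, like
* checking the second box; we leave it off.)
import excel "ExcelImportDemo.xlsx", firstrow clear
describe
```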

1.3 Some Basics of Using Data

Once you have data in Stata, you can actually do interesting things.
You will use some commands frequently in Stata, and here we'll work
through some of the common ones. We'll use the income and
experience data introduced above and pick up right after the Excel
data import.

Summary Statistics

One question that might come up is, "What is the average income for
our data set?" Put another way, what is the sample mean of income?

To get summary statistics (which include the mean, standard deviation,
minimum, and maximum), you would give the command:

summarize variablename

As a general rule throughout this document, actual Stata
commands will be given in blue font and other elements in
Stata command lines, such as variables, will be in red. Both
will be italicized.

Taking the above syntax and applying it to our income and experience
data, you would type the following in the Command window:

summarize Income

and hit enter.

You should see something like Figure 1.19.

FIGURE 1.19

The results of your command are reported, nicely, in the Results
window, along with a record of the command you gave Stata. This
single command gives quite a bit of information. Let's walk through
each item:

1. Obs.: the number of observations used in the calculation.
2. Mean: the sample mean (i.e., average) of the variable.
3. Std. Dev.: the sample standard deviation of the variable.
4. Min: the minimum value of the variable found in the data set.
5. Max: the maximum value of the variable found in the data set.
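
The numbers in that list can be checked by hand for a data set this small: the five incomes sum to 190,000, so the mean is 190,000 / 5 = 38,000. The sketch below recreates the data so it runs on its own and then displays a few of the results that summarize leaves behind in r().

```stata
* Recreate the five observations.
clear
input Income Experience str10 Name
35000  8 "Bruce"
45000  6 "Sue"
52500  9 "Maria"
37500 15 "Woody"
20000  1 "John"
end

* Summary statistics for Income.
summarize Income

* summarize stores its results; display shows them one at a time.
* Number of observations:
display r(N)
* Sample mean:
display r(mean)
* Minimum and maximum:
display r(min)
display r(max)
```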

If you wanted even more information on income, you could ask for
detailed summary statistics. To do this, you would add ", detail" to
the end of the command.

summarize Income, detail

Figure 1.20 provides a picture of what you should see after this
command, zooming in to show only what is displayed in the
Results window.

FIGURE 1.20

Adding "detail" to the command gives you much more information. Our
data set only has 5 observations, so this is not as interesting as if we
had thousands of observations. Still, the point is that you can easily get
quite a bit of information about a variable in Stata, whether it has 5
observations or 5 million.
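
One of the extra statistics the detail option reports is the median (the 50th percentile). As a sketch, the code below recreates the data and displays the median from the stored results. With the five incomes sorted (20000, 35000, 37500, 45000, 52500), the middle value is 37500.

```stata
* Recreate the five observations.
clear
input Income Experience str10 Name
35000  8 "Bruce"
45000  6 "Sue"
52500  9 "Maria"
37500 15 "Woody"
20000  1 "John"
end

* Detailed summary statistics: percentiles, variance, skewness, etc.
summarize Income, detail

* The median is stored as r(p50).
display r(p50)
```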

It's easy to get summary statistics for more than one variable at the
same time. The general syntax in Stata is:

summarize variablename1 variablename2 variablename3

You can add as many variables to the statement as you want. Or, if
you are lazy (no comment), you can just type:

summarize

That will give summary statistics on every variable in the data set.
Doing that for our data set generates something along the lines of
Figure 1.21.

FIGURE 1.21

You get a listing of every variable in your data set along with calculated
summary statistics. Notice, however, that there is something funny about
the Name variable. Stata reports that it has no observations and does
not provide any summary statistics. What is going on?

If you think about it for a moment, Name is a text variable. It records
the name of each individual in the sample. When was the last time
you tried to average names? I thought so. Stata is being polite when it
reports 0 observations.
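
Here is a sketch that puts the multi-variable forms side by side and then looks at Name in a way that does make sense for text. The tabulate command, which counts how often each value appears, is my suggestion here and is not otherwise covered in this chapter.

```stata
* Recreate the five observations.
clear
input Income Experience str10 Name
35000  8 "Bruce"
45000  6 "Sue"
52500  9 "Maria"
37500 15 "Woody"
20000  1 "John"
end

* Two variables at once.
summarize Income Experience

* Every variable in the data set; Name shows 0 observations.
summarize

* Check that Name really is stored as text.
confirm string variable Name

* For a text variable, a frequency table is more useful than a mean.
tabulate Name
```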

Creating Variables

Another useful skill in Stata is creating new variables from existing
variables. For example, using our current income and
experience data, we might wonder what each person is paid per year
of experience. Put another way, we could create a variable named
IncPerYrExp (note that I tried to make the variable name informative
but not too long), which is defined as income divided by years of
experience.

The general syntax in Stata to create a new variable is:

generate newvariable = some_mathematical_function

where newvariable is the name you give to the variable you are
creating.

To create IncPerYrExp as defined above for our data set, I'd give the
following command:

generate IncPerYrExp = Income / Experience

Figure 1.22 shows what you should see in Stata after this command.

FIGURE 1.22

Not much exciting happens. But, notice that in the Variables window
you have one more variable than before: IncPerYrExp (indicated by
blue arrow).

You can click on the Data Editor icon to see it. If you do that, you
will see something along the lines of Figure 1.23.

FIGURE 1.23

Nicely, Stata did just what we asked: create a new variable, name it
IncPerYrExp, and define it as income divided by experience. Perfect.
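
You can also check the new variable without opening the Data Editor. The sketch below recreates the data, creates IncPerYrExp, and lists it next to the variables it was built from. For Bruce, for instance, 35000 / 8 = 4375.

```stata
* Recreate the five observations.
clear
input Income Experience str10 Name
35000  8 "Bruce"
45000  6 "Sue"
52500  9 "Maria"
37500 15 "Woody"
20000  1 "John"
end

* Income per year of experience.
generate IncPerYrExp = Income / Experience

* Show the result alongside the inputs.
list Name Income Experience IncPerYrExp
```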

The generate command is quite flexible and can handle a number of
mathematical expressions. The following are examples of what you
could do (even if you wouldn't want to). Can you decipher what is
going on in each one?

generate IncMinusExp = Income - Experience

generate IncPlusExp = Income + Experience

generate IncInThousands = Income/1000

generate Inc_Squared = Income*Income

generate Inc_Squared = Income^2

generate ln_Inc = ln(Income)

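The last generate command is one to note. It creates a variable (ln_Inc)
that is the natural log of income. It uses a mathematical function:
ln(). Stata has many functions, and we will cover more as needed. Also
note that the two Inc_Squared commands are alternatives that do the
same thing; Stata will not create a variable whose name is already in
use, so run only one of them.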
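
As a sketch of how you might convince yourself that two expressions are equivalent, the code below creates the squared variable both ways under different names and asks Stata to verify that they match. The name Inc_Squared2 is hypothetical, used only so the two versions can coexist. The assert command stops with an error if the condition is false for any observation and says nothing if it is true.

```stata
* Recreate the five observations.
clear
input Income Experience str10 Name
35000  8 "Bruce"
45000  6 "Sue"
52500  9 "Maria"
37500 15 "Woody"
20000  1 "John"
end

* Two ways to square income, stored under different names.
generate Inc_Squared  = Income * Income
generate Inc_Squared2 = Income^2

* Silent if the two variables agree in every row.
assert Inc_Squared == Inc_Squared2

* The duplicate is no longer needed.
drop Inc_Squared2

* Natural log of income.
generate ln_Inc = ln(Income)
list Name Income Inc_Squared ln_Inc
```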

1.4 Beyond Data Manipulation: OLS Regression

Hopefully, you are starting to feel a bit more comfortable with Stata
and working with data in Stata. There is much more to learn and do
(Stata is pretty amazing!) and we are on our way.

Sections 1.4 and 1.5 of Using Econometrics present two examples of
regression analysis. It seems entirely appropriate to use one of those
to show how Stata can be used to generate regression results.

The good news is that running a regression in Stata is pretty
straightforward. The basic syntax is:

regress dependentvariable independentvariable

The regress command tells Stata to take the specified variables and
perform a regression. Let's work through the Magic Hill example on
page 17 in Using Econometrics.

The Magic Hill data can be downloaded from the Using Econometrics
Student Companion website. The name of the data set is HTWT1.dta
and it has two variables:

Y: weight (in pounds) of the ith customer


X: height (in inches above 5 feet) of the ith customer

The model proposed in Using Econometrics, Section 1.4, Equation 1.18,
is:

$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$

After loading the data into Stata, type the following command in the
Command window and hit enter.

regress Y X

Figure 1.24 indicates what you will see right before hitting enter. Figure
1.25 shows what you should see right after hitting enter.

FIGURE 1.24

FIGURE 1.25

A lot has happened in the Results window. For now, focus on the three
arrows (blue, red, and green). The blue arrow points to the regression
command. The red arrow points to the column of variables in the
regression: Y, X, and something called _cons. That something is the
model's estimated intercept term, otherwise known as $\hat{\beta}_0$.

The green arrow points to the Coef. column, which reports the
estimated coefficients. The first number in the Coef. column is
6.377093. That is the estimate of $\beta_1$, the coefficient on X. It
matches the 6.38 (rounded) of Equation 1.19 in Using Econometrics. Just
below that, in the _cons row, is the estimate of $\beta_0$, the
intercept. It is 103.3971, which rounds to 103.40.
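
After a regression, Stata keeps the estimated coefficients available for further use. The following sketch assumes HTWT1.dta is in the current working directory. It reruns the regression, displays the two estimates, and computes a fitted weight for a customer 10 inches above 5 feet (using the rounded estimates above, roughly 103.40 + 6.38 x 10, or about 167 pounds). The height of 10 inches is just an illustrative value.

```stata
* Assumption: HTWT1.dta is in the current working directory.
use "HTWT1.dta", clear

* Regress weight (Y) on height above 5 feet (X).
regress Y X

* Estimated slope on X.
display _b[X]

* Estimated intercept.
display _b[_cons]

* Fitted weight for X = 10 (illustrative value).
display _b[_cons] + _b[X]*10
```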
