
Exploring Regression Results using Margins

Once you've run a regression, the next challenge is to figure out what the results mean. The margins command, new in Stata 11, is a powerful tool for understanding a model.

The examples in this article will use the auto data set included with Stata. Load it with:

sysuse auto

Rates of Change
One of the basic questions after running a regression is "If X changes, how much does that change Y?"
or in calculus terms, "What is the derivative of Y with respect to X?"

Linear Regression
In a linear regression, the answer is usually just the coefficient on X. Run the following
regression:

reg price i.foreign c.weight##c.weight displacement


This regresses the price of the car on foreign, weight and weight squared, and displacement. If you're not familiar with the new factor and interaction notation also introduced in Stata 11, see the Factor Variables section in Stata for Researchers: Usage and Syntax.
In the model run above, the coefficient on displacement is about 3.64, meaning that if you increase displacement by one cubic inch the expected price increases by $3.64. However, weight is not so simple: if you increase weight by one pound, that increases both weight and weight squared, and the total effect depends on what weight was to begin with.
This is where the margins command becomes useful. With the dydx() option, margins calculates the derivative of the mean expected outcome with respect to the variable you specify.

margins, dydx(displacement)
gives you 3.64, the original coefficient on displacement. The standard error, P-value and 95%
confidence interval are also very similar to the original regression results, though they're calculated
differently and thus not quite identical. But consider:

margins, dydx(weight)
This gives a result of 2.73, which is nothing like the coefficient on either weight or weight squared.

What margins does here is take the numerical derivative of the mean expected price with respect to weight. In doing so, margins looks at the actual data. Thus it considers the effect of changing the Honda Civic's weight from 1,760 pounds as well as changing the Lincoln Continental's from 4,840 pounds (the weight squared term matters more for the latter than the former). It then averages these effects with those for all the other cars to get its result of 2.73: each additional pound of weight adds $2.73 to the mean expected price.
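To make the averaging concrete, here is a minimal sketch that reproduces margins, dydx(weight) by hand for this model. It uses the fact that, for this specification, the derivative of predicted price with respect to weight is _b[weight] + 2*_b[c.weight#c.weight]*weight for each car; dprice_dweight is just an illustrative variable name. The point is to show what the averaged number means, not how margins is implemented internally (margins differentiates numerically, so the two agree only up to numerical precision).

```stata
sysuse auto, clear
regress price i.foreign c.weight##c.weight displacement

* Each car's derivative of predicted price with respect to weight:
* d(price)/d(weight) = b_weight + 2*b_weightsquared*weight
generate double dprice_dweight = _b[weight] + 2*_b[c.weight#c.weight]*weight

* The mean over the estimation sample is the average marginal effect;
* compare it with the output of: margins, dydx(weight)
summarize dprice_dweight if e(sample)
```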
Another approach is to set all the variables to their means, then find the derivative of expected price
with respect to weight at that point. You can do that by adding the atmeans option:

margins, dydx(weight) atmeans


In this case the result is the same. But consider a slightly more complicated model, where weight and weight squared are both interacted with foreign:

reg price i.foreign##c.weight##c.weight displacement


With this model:

margins, dydx(weight)
gives 3.08, while:

margins, dydx(weight) atmeans


gives 3.30. The mean effect is not necessarily the same as the effect at the mean.
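Writing out the derivative by hand shows why. In the first model, the derivative of expected price with respect to weight is _b[weight] + 2*_b[c.weight#c.weight]*weight, a straight-line function of weight, so its average over the cars equals its value at mean weight; that is why the two margins commands agreed there. In the interacted model, foreign cars get an extra _b[1.foreign#c.weight] + 2*_b[1.foreign#c.weight#c.weight]*weight. Averaging the last of these terms involves the mean of foreign*weight, while atmeans uses mean foreign times mean weight, and those are not equal in general. The sketch below computes both numbers for comparison with the margins output; it assumes Stata's standard names for the interaction coefficients, and dpdw and the macro names are illustrative.

```stata
sysuse auto, clear
regress price i.foreign##c.weight##c.weight displacement

* Each car's derivative with respect to weight: the domestic part first...
generate double dpdw = _b[weight] + 2*_b[c.weight#c.weight]*weight
* ...plus the extra slope that applies only to foreign cars
replace dpdw = dpdw + foreign*(_b[1.foreign#c.weight] + 2*_b[1.foreign#c.weight#c.weight]*weight)

* Mean of the derivatives: compare with margins, dydx(weight)
summarize dpdw if e(sample)

* Derivative at the means: plug mean foreign (the share of foreign cars)
* and mean weight into the same formula; compare with ..., atmeans
quietly summarize foreign if e(sample)
local fbar = r(mean)
quietly summarize weight if e(sample)
local wbar = r(mean)
local base  = _b[weight] + 2*_b[c.weight#c.weight]*`wbar'
local shift = _b[1.foreign#c.weight] + 2*_b[1.foreign#c.weight#c.weight]*`wbar'
display "derivative at the means: " `base' + `fbar'*`shift'
```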

Binary Outcome Models

margins is even more useful for models with binary outcomes, where interpretation is always
difficult. Before you can run one, you need a binary dependent variable. Create it with the following:

gen bigEngine=(displacement>150)

displacement is a measure of engine size, and we'll call anything over 150 cubic inches "big."
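One caution when building indicators this way: Stata treats a missing value as larger than any number, so displacement>150 would be true, and bigEngine would be 1, for a car with missing displacement. The auto data have no missing displacement values, so the command above is fine here, but a more defensive version adds an if qualifier, as in this sketch.

```stata
sysuse auto, clear

* Check for missing values first (the auto data have none)
count if missing(displacement)

* Leave bigEngine missing whenever displacement is missing
generate bigEngine = (displacement > 150) if !missing(displacement)
tabulate bigEngine
```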
Now consider the model:

logit bigEngine i.foreign weight


The coefficients tell us that being foreign makes a car less likely to have a big engine, while being heavy
makes it more likely. But by how much?

margins gives us one way to answer that question:

margins, dydx(weight)
This tells us that the derivative of the mean expected probability of having a big engine with respect to weight is .0002664. This suggests that if you had roughly 3,750 cars and added a pound of weight to every one of them, you'd expect one to switch from having a "small" engine to a "big" engine, which is not a very big effect. However, note that the atmeans option changes things somewhat:

margins, dydx(weight) atmeans


gives .0008623, or about one expected change per 1,160 cars.
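For a logit model, the derivative of a car's predicted probability p with respect to weight is _b[weight]*p*(1-p). That makes both numbers above easy to check by hand: average the derivative over the cars, or evaluate it once at the sample means of foreign and weight. The two differ because p*(1-p) is a curved function of the covariates, so its average is not its value at the average. A minimal sketch; phat, dp_dweight and the macro names are illustrative.

```stata
sysuse auto, clear
generate bigEngine = (displacement > 150)
logit bigEngine i.foreign weight

* Average of the derivatives (compare with: margins, dydx(weight))
predict double phat, pr
generate double dp_dweight = _b[weight]*phat*(1 - phat)
summarize dp_dweight if e(sample)

* Derivative at the means (compare with: margins, dydx(weight) atmeans);
* the mean of foreign is the share of cars that are foreign
quietly summarize foreign if e(sample)
local fbar = r(mean)
quietly summarize weight if e(sample)
local wbar = r(mean)
local pbar = invlogit(_b[_cons] + _b[1.foreign]*`fbar' + _b[weight]*`wbar')
display "dp/dweight at the means: " _b[weight]*`pbar'*(1 - `pbar')
```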
Next try:

margins, dydx(foreign)
Stata knows that foreign is an indicator variable because you specified it as i.foreign in the model, so instead of looking at small changes, margins considers the effect of changing foreign from 0 to 1. If
all the cars in the sample were domestic (which they are not) and then became foreign, the mean
expected probability of having a big engine would fall by 0.186. This is almost one change for every five
cars, a much bigger effect.
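Because foreign entered the model as i.foreign, the number margins reports for it is an average discrete change rather than a derivative. A sketch of the same calculation by hand: predict every car's probability of a big engine as if it were domestic and again as if it were foreign, then average the differences. p0, p1 and change are illustrative names.

```stata
sysuse auto, clear
generate bigEngine = (displacement > 150)
logit bigEngine i.foreign weight

* Probability for every car as if it were domestic (foreign = 0)...
generate double p0 = invlogit(_b[_cons] + _b[weight]*weight)
* ...and as if it were foreign (foreign = 1)
generate double p1 = invlogit(_b[_cons] + _b[1.foreign] + _b[weight]*weight)

* The mean difference should match: margins, dydx(foreign)
generate double change = p1 - p0
summarize change if e(sample)
```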

Multinomial Logit
Multinomial logit models can be even harder to interpret because the coefficients only compare two
states. Copy and paste the following command to load a data set that was carefully constructed to
illustrate the pitfalls of interpreting multinomial logit results:

use http://www.ssc.wisc.edu/sscc/pubs/files/margins_mlogit.dta

It contains two variables: an integer y that takes on the values 1, 2 and 3, and a continuous variable x. They are negatively correlated (cor y x).

Now run the following model:

mlogit y x

The coefficient of x for outcome 2 is negative, so it's tempting to say that as x increases the probability of y being 2 decreases. But in fact that's not the case, as the margins command will show you:

margins, dydx(x) predict(outcome(2))

The predict() option allows you to choose the response margins is examining; predict(outcome(2)) specifies that you're interested in the expected probability of outcome 2. And in fact the probability of outcome 2 increases with x, the derivative being 0.016.

How can that be? Recall that the coefficients given by mlogit only compare the probability of a given outcome with the base outcome. Thus the x coefficient of -5.34 for outcome 2 tells you that as x increases, observations are likely to move from outcome 2 to outcome 1. Meanwhile the x coefficient of -21.292 for outcome 3 tells you that as x increases, observations are likely to move from outcome 3 to outcome 1. What it doesn't tell you is that as x increases, observations also move from outcome 3 to outcome 2, and in fact that effect dominates the movement from 2 to 1.
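The algebra behind this is short. Taking outcome 1 as the base, as in the output discussed above, its intercept and slope are fixed at zero. Differentiating the multinomial logit probability shows that the sign of the effect on p_2 depends on whether β_2 lies above or below the probability-weighted average of all the slope coefficients, not on the sign of β_2 alone. Plugging in β_2 = -5.34 and β_3 = -21.292, the derivative for outcome 2 is positive for any observation where p_3 exceeds about a third of p_1; margins then averages these per-observation derivatives.

```latex
p_j = \frac{\exp(\alpha_j + \beta_j x)}{\sum_{k=1}^{3} \exp(\alpha_k + \beta_k x)},
\qquad \alpha_1 = \beta_1 = 0

\frac{\partial p_j}{\partial x} = p_j \Bigl( \beta_j - \sum_{k=1}^{3} p_k \beta_k \Bigr)

\frac{\partial p_2}{\partial x} > 0
\iff \beta_2 \, p_1 > p_3 (\beta_3 - \beta_2)
\iff p_3 > \frac{5.34}{15.952}\, p_1 \approx 0.335\, p_1
```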


You can see it if you change the base category of the regression:

mlogit y x, base(2)
Now the coefficients tell you about the probability of each outcome compared to outcome 2, and the
fact that the negative x coefficient for outcome 3 is much larger (in absolute terms) than the positive
coefficient for outcome 1 indicates that increasing x increases the probability of outcome 2.

Levels
margins can also predict the level of the outcome variable under various scenarios. Sometimes these
"counter-factuals" can be interesting results in and of themselves: "What would the mean income
be if all the blacks in my sample were white?" "What would the mean test score have been if the
school's demographics hadn't changed?"
Load the automobile data set again and re-run our first regression:

sysuse auto
reg price i.foreign c.weight##c.weight displacement
To examine the impact of foreign on the mean expected price, type:

margins foreign
This sets foreign to zero for all cars, leaving the other variables unchanged, finds the predicted
price for each car, and then averages them. It then sets foreign to one for all cars and repeats the
process. If you wanted to set all the other variables to their means instead, you'd add the atmeans option just like before:

margins foreign, atmeans
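Here is a sketch of the procedure margins foreign follows, done by hand, using preserve and restore so the real data come back afterward. The two means it displays should match the two rows of margins foreign (without atmeans); price_dom, price_for and the macro names are illustrative.

```stata
sysuse auto, clear
regress price i.foreign c.weight##c.weight displacement

preserve
* Counter-factual 1: every car is domestic
replace foreign = 0
predict double price_dom, xb
quietly summarize price_dom
local mean_dom = r(mean)
* Counter-factual 2: every car is foreign
replace foreign = 1
predict double price_for, xb
quietly summarize price_for
local mean_for = r(mean)
* Bring back the real data
restore

display "mean predicted price, all domestic: " `mean_dom'
display "mean predicted price, all foreign:  " `mean_for'
```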


The foreign variable can only take on two values (Stata knows this because you marked it as i.foreign), so the margins command calculated its results for both of them. Obviously you can't look at all possible values for continuous variables, so for continuous variables you have to specify the values you're interested in with the at() option. For example, to see what the mean expected price would be if all the cars weighed 3,000 pounds, type:

margins, at(weight=3000)
If you wanted to compare different values of weight, replace the 3000 with a list of numbers in parentheses:

margins, at(weight=(2000 3000 4000))
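The at() results can be reproduced the same way. In this sketch every car's weight is replaced by each value in turn while the other variables keep their real values. Because the model was specified with c.weight##c.weight, predict recomputes the squared term from the new weight automatically, so there is no separate weight-squared variable to update. The macro names are illustrative.

```stata
sysuse auto, clear
regress price i.foreign c.weight##c.weight displacement

foreach w in 2000 3000 4000 {
    preserve
    * Give every car the current loop weight; everything else stays real
    quietly replace weight = `w'
    quietly predict double phat, xb
    quietly summarize phat
    local mean_`w' = r(mean)
    restore
    display "weight = `w': mean predicted price = " `mean_`w''
}
```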


You can include multiple variables in the at() option, allowing you to set up any scenario you're
interested in. For example, you can find what the mean expected price would be if all the cars were
foreign and weighed 3,000 pounds with:

margins, at(weight=3000 foreign=1)


If you're interested in a statistic that margins can't calculate (say, the effect on a particular car)
there is an alternative technique for examining counter-factual scenarios. It involves actually changing
the data, making sure you can get the real data back, and then using the predict command. For
more information see Making Predictions with Counter-Factual Data in Stata.
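As a sketch of that technique, the code below asks a question margins does not answer directly: what would the model predict for the Honda Civic if it were 500 pounds heavier? The 500-pound change is a hypothetical scenario chosen for illustration, and p_actual, p_heavier and the macro name are illustrative.

```stata
sysuse auto, clear
regress price i.foreign c.weight##c.weight displacement

* Prediction using the real data (kept after restore)
predict double p_actual, xb

preserve
* Hypothetical scenario: every car is 500 pounds heavier
replace weight = weight + 500
predict double p_heavier, xb
* The weight listed here is the altered (heavier) weight
list make weight p_actual p_heavier if make == "Honda Civic"
quietly summarize p_heavier if make == "Honda Civic"
local civic_heavier = r(mean)
* Restore the real data; p_heavier is discarded with the altered copy
restore
```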
Last Revised: 2/17/2010
2012 UW Board of Regents, University of Wisconsin - Madison