Understanding Regression Results Using Margins

Exploring Regression Results using Margins
Once you've run a regression, the next challenge is to figure out what the results mean. The
margins command, new in Stata 11, is a powerful tool for understanding a model.
The examples in this article will use the auto data set included with Stata. Load it with:
sysuse auto
Rates of Change
One of the basic questions after running a regression is "If X changes, how much does that change Y?"
or in calculus terms, "What is the derivative of Y with respect to X?"
Linear Regression
In the case of simple linear regression, the answer is usually the coefficient on X. Run the following
regression:
reg price i.foreign c.weight##c.weight displacement

This regresses the price of the car on foreign, weight and weight squared, and
displacement. If you're not familiar with the new factor and interaction notation also introduced in
Stata 11, see the Factor Variables section in Stata for Researchers: Usage and Syntax.
In the model run above, the coefficient on displacement is about 3.64, meaning that if you
increase displacement by one cubic inch, the expected price increases by $3.64. However, weight is not
so simple: if you increase weight by one that increases both weight and weight squared, and the
total effect depends on what weight was to begin with.
This is where the margins command becomes useful. With the dydx() option, margins
calculates the derivative of the mean expected outcome with respect to the variable you specify.
margins, dydx(displacement)
gives you 3.64, the original coefficient on displacement. The standard error, P-value and 95%
confidence interval are also very similar to the original regression results, though they're calculated
differently and thus not quite identical. But consider:
margins, dydx(weight)
This gives a result of 2.73, which is nothing like the coefficient on either weight or weight
squared.
What margins does here is take the numerical derivative of the mean expected price with respect to
weight. In doing so, margins looks at the actual data. Thus it considers the effect of changing the
Honda Civic's weight from 1,760 pounds as well as changing the Lincoln Continental's from 4,840 (the
weight squared term is more important with the latter than the former). It then averages them along
with all the other cars to get its result of 2.73, or that each additional pound of weight adds $2.73 to
the mean expected price.
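To make the averaging concrete, here is a minimal sketch that computes the same quantity by hand. It assumes the quadratic model above, where the derivative of price with respect to weight for each car is the weight coefficient plus twice the weight-squared coefficient times that car's weight. The variable name dydw_hand is made up for this illustration.

```stata
* Sketch: reproduce the point estimate of margins, dydx(weight) by hand
sysuse auto, clear
quietly regress price i.foreign c.weight##c.weight displacement

* Per-car derivative: b[weight] + 2 * b[weight squared] * weight
* (dydw_hand is a hypothetical name used only in this sketch)
generate double dydw_hand = _b[weight] + 2 * _b[c.weight#c.weight] * weight

* The mean of the per-car derivatives is the average marginal effect
summarize dydw_hand

* Compare with what margins reports
margins, dydx(weight)
drop dydw_hand
```

The mean reported by summarize should agree with the margins point estimate. The standard error will not, since summarize only describes the spread of the per-car derivatives and knows nothing about the uncertainty in the coefficients.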
Another approach is to set all the variables to their means, then find the derivative of expected price
with respect to weight at that point. You can do that by adding the atmeans option:
margins, dydx(weight) atmeans

In this case the result is the same. But consider a slightly more complicated model, where weight
and weight squared are both interacted with foreign:
reg price i.foreign##c.weight##c.weight displacement

With this model:
margins, dydx(weight)
gives 3.08, while:
margins, dydx(weight) atmeans
gives 3.30. The mean effect is not necessarily the same as the effect at the mean.
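One way to see where the gap comes from is to compute both numbers by hand. The following is a sketch, not a description of how margins works internally. It assumes the coefficient names shown (you can check them with regress, coeflegend) and assumes that atmeans sets the foreign indicator to its sample mean. The names dydw_hand, mw and mf are made up for this illustration. Because it uses /// line continuation, run it from a do-file.

```stata
* Sketch: average marginal effect vs. marginal effect at the means
sysuse auto, clear
quietly regress price i.foreign##c.weight##c.weight displacement

* Per-car derivative; the foreign terms apply only when foreign == 1
generate double dydw_hand = _b[weight] + 2*_b[c.weight#c.weight]*weight ///
    + foreign * (_b[1.foreign#c.weight] + 2*_b[1.foreign#c.weight#c.weight]*weight)

* Mean of the per-car derivatives (the mean effect)
summarize dydw_hand

* Same formula evaluated once at the sample means (the effect at the mean)
quietly summarize weight
scalar mw = r(mean)
quietly summarize foreign
scalar mf = r(mean)
display _b[weight] + 2*_b[c.weight#c.weight]*mw ///
    + mf * (_b[1.foreign#c.weight] + 2*_b[1.foreign#c.weight#c.weight]*mw)
drop dydw_hand
```

The two differ because the derivative now contains the product of foreign and weight, and the mean of a product is generally not the product of the means. In the earlier model the derivative was linear in weight alone, which is why the two approaches agreed there.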
Binary Outcome Models
margins is even more useful for models with binary outcomes, where interpretation is always
difficult. Before you can run one, you need a binary dependent variable. Create it with the following:
gen bigEngine=(displacement>150)
displacement is a measure of engine size, and we'll call anything over 150 cubic inches "big."
Now consider the model:
logit bigEngine i.foreign weight

The coefficients tell us that being foreign makes a car less likely to have a big engine, while being heavy
makes it more likely. But by how much?
margins gives us one way to answer that question:
margins, dydx(weight)
This tells us that the derivative of the mean expected probability of having a big engine with respect to
weight is .0002664. This suggests that if you had 3753 types of cars and added a pound of weight to
all of them, you'd expect one to switch from having a "small" engine to a "big" engine--not a very big
effect. However, note that the atmeans option changes things somewhat:
margins, dydx(weight) atmeans
gives .0008623, or one expected change per 1160 cars.
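For a logit model the derivative of the predicted probability with respect to a continuous variable is the coefficient times p(1-p), so the average marginal effect can be sketched by hand as follows. The variable names p_hand and dpdw_hand are made up for this illustration.

```stata
* Sketch: average marginal effect of weight after logit, by hand
sysuse auto, clear
generate bigEngine = (displacement > 150)
quietly logit bigEngine i.foreign weight

* Predicted probability for each car (the default prediction after logit)
predict double p_hand

* Logit derivative of Pr(bigEngine = 1) with respect to weight
generate double dpdw_hand = _b[weight] * p_hand * (1 - p_hand)

* The mean should agree with the estimate from margins, dydx(weight)
summarize dpdw_hand
drop p_hand dpdw_hand
```

Since p(1-p) is largest when p is near 0.5 and close to zero when p is near 0 or 1, cars whose predicted probability is already extreme contribute very little to the average. That is one reason the average effect and the effect at the means can differ so much in this model.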
Next try:
margins, dydx(foreign)
Stata knows that foreign is an indicator variable because you specified it as i.foreign in the model,
so instead of looking at small changes, margins considers the effect of changing foreign from 0 to 1. If
all the cars in the sample were domestic (which they are not) and then became foreign, the mean
expected probability of having a big engine would fall by 0.186. This is almost one change for every five
cars, a much bigger effect.
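The 0-to-1 comparison can also be sketched by hand: predict every car's probability with foreign set to 0, again with foreign set to 1, and average the difference. The variable names p0_hand, p1_hand and diff_hand are made up for this illustration, and preserve/restore puts the real data back afterwards.

```stata
* Sketch: discrete change in Pr(bigEngine) when foreign goes from 0 to 1
sysuse auto, clear
generate bigEngine = (displacement > 150)
quietly logit bigEngine i.foreign weight

preserve
replace foreign = 0
predict double p0_hand          // probability if every car were domestic
replace foreign = 1
predict double p1_hand          // probability if every car were foreign
generate double diff_hand = p1_hand - p0_hand

* The mean difference should agree with margins, dydx(foreign)
summarize diff_hand
restore
```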
Multinomial Logit
Multinomial logit models can be even harder to interpret because the coefficients only compare two
states. Copy and paste the following command to load a data set that was carefully constructed to
illustrate the pitfalls of interpreting multinomial logit results:
use http://www.ssc.wisc.edu/sscc/pubs/files/margins_mlogit.dta
It contains two variables, an integer y that takes on the values 1, 2 and 3; and a continuous variable
x. They are negatively correlated (cor y x).
Now run the following model:
mlogit y x
The coefficient of x for outcome 2 is negative, so it's tempting to say that as x increases the probability
of y being 2 decreases. But in fact that's not the case, as the margins command will show you:
margins, dydx(x) predict(outcome(2))
The predict() option allows you to choose the response margins is examining.
predict(outcome(2)) specifies that you're interested in the expected probability of outcome 2.
And in fact the probability of outcome 2 increases with x, the derivative being 0.016.
How can that be? Recall that the coefficients given by mlogit only compare the probability of a given
outcome with the base outcome. Thus the x coefficient of -5.34 for outcome 2 tells you that as x
increases, observations are likely to move from outcome 2 to outcome 1. Meanwhile the x coefficient of
-21.292 for outcome 3 tells you that as x increases observations are likely to move from outcome 3 to
outcome 1. What it doesn't tell you is that as x increases observations also move from outcome 3 to
outcome 2, and in fact that effect dominates the movement from 2 to 1.

You can see it if you change the base category of the regression:
mlogit y x, base(2)
Now the coefficients tell you about the probability of each outcome compared to outcome 2, and the
fact that the negative x coefficient for outcome 3 is much larger (in absolute terms) than the positive
coefficient for outcome 1 indicates that increasing x increases the probability of outcome 2.
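Rather than reasoning from coefficients, you can ask margins for the derivative of each outcome's probability directly. This sketch simply repeats the earlier command for all three outcomes. The marginal effects on the probabilities do not depend on which base category the model was fit with.

```stata
* Sketch: marginal effect of x on the probability of each outcome
use http://www.ssc.wisc.edu/sscc/pubs/files/margins_mlogit.dta, clear
quietly mlogit y x

margins, dydx(x) predict(outcome(1))
margins, dydx(x) predict(outcome(2))
margins, dydx(x) predict(outcome(3))
```

Because the three probabilities always sum to one, the three derivatives should sum to zero apart from rounding. Any increase in the probability of outcome 2 has to be matched by decreases elsewhere.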
Levels
margins can also predict the level of the outcome variable under various scenarios. Sometimes these
"counter-factuals" can be interesting results in and of themselves: "What would the mean income
be if all the blacks in my sample were white?" "What would the mean test score have been if the
school's demographics hadn't changed?"
Load the automobile data set again and re-run our first regression:
sysuse auto
reg price i.foreign c.weight##c.weight displacement
To examine the impact of foreign on the mean expected price, type:
margins foreign
This sets foreign to zero for all cars, leaving the other variables unchanged, finds the predicted
price for each car, and then averages them. It then sets foreign to one for all cars and repeats the
process. If you wanted to set all the other variables to their means instead, you'd add the atmeans
option just like before:
margins foreign, atmeans

The foreign variable can only take on two values (Stata knows this because you marked it as
i.foreign) so the margins command calculated its results for both of them. Obviously you can't
look at all possible values for continuous variables, so for continuous variables you have to specify the
values you're interested in with the at() option. For example, to see what the mean expected price
would be if all the cars weighed 3,000 pounds, type:
margins, at(weight=3000)
If you wanted to compare different values of weight, replace the 3000 with a list of numbers in
parentheses:
margins, at(weight=(2000 3000 4000))

You can include multiple variables in the at() option, allowing you to set up any scenario you're
interested in. For example, you can find what the mean expected price would be if all the cars were
foreign and weighed 3,000 pounds with:
margins, at(weight=3000 foreign=1)
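The two approaches can also be combined. As a sketch, listing the factor variable before the comma and the continuous variable in at() should give the mean expected price for every combination of the two:

```stata
* Sketch: mean expected price for domestic and foreign cars at three weights
sysuse auto, clear
quietly regress price i.foreign c.weight##c.weight displacement

margins foreign, at(weight=(2000 3000 4000))
```

This produces six predictions, one for each value of foreign at each of the three weights, which makes it easy to compare the scenarios side by side.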

If you're interested in a statistic that margins can't calculate (say, the effect on a particular car)
there is an alternative technique for examining counter-factual scenarios. It involves actually changing
the data, making sure you can get the real data back, and then using the predict command. For
more information see Making Predictions with Counter-Factual Data in Stata.
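A minimal sketch of that technique, assuming the first regression from this article and a scenario in which every car weighs 3,000 pounds, might look like the following. The variable names price_hat and price_cf are made up for this illustration, and preserve/restore is one way of making sure the real data comes back.

```stata
* Sketch: per-car predictions under a counter-factual scenario
sysuse auto, clear
quietly regress price i.foreign c.weight##c.weight displacement

preserve
predict double price_hat        // prediction with the real data
replace weight = 3000           // the counter-factual scenario
predict double price_cf         // prediction if the car weighed 3,000 pounds
list make price_hat price_cf in 1/5
restore
```

Because the model was specified with factor variable notation, predict recomputes the weight squared term from the changed weight automatically.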
Last Revised: 2/17/2010
2012 UW Board of Regents, University of Wisconsin - Madison
