Once you've run a regression, the next challenge is to figure out what the results mean. The
margins command, new in Stata 11, is a powerful tool for understanding a model.
The examples in this article will use the auto data set included with Stata. Load it with:
sysuse auto
Rates of Change
One of the basic questions after running a regression is "If X changes, how much does that change Y?"
or in calculus terms, "What is the derivative of Y with respect to X?"
Linear Regression
In the case of simple linear regression, the answer is usually the coefficient on X. Run the following
regression:

reg price i.foreign c.weight##c.weight displacement

The margins command with the dydx() option calculates the derivative of the mean expected outcome with respect to the variable you specify. Thus:

margins, dydx(displacement)
gives you 3.64, the original coefficient on displacement. The standard error, P-value and 95%
confidence interval are also very similar to the original regression results, though they're calculated
differently and thus not quite identical. But consider:
margins, dydx(weight)
This gives a result of 2.73, which is nothing like the coefficient on either weight or weight
squared. What margins does here is take the numerical derivative of the mean expected price with respect to
weight. In doing so, margins looks at the actual data. Thus it considers the effect of changing the
Honda Civic's weight from 1,760 pounds as well as changing the Lincoln Continental's from 4,840 (the
weight squared term is more important with the latter than the former). It then averages them along
with all the other cars to get its result of 2.73, or that each additional pound of weight adds $2.73 to
the mean expected price.
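The averaging margins does here can be sketched in a few lines of Python. The coefficients below are made up for illustration; they are not the estimates Stata reports for the auto data:

```python
# Sketch of how margins, dydx(weight) averages per-car derivatives when the
# model includes both weight and weight squared. The coefficients are
# hypothetical, NOT Stata's estimates from the auto data.
b_weight = -5.0        # made-up coefficient on weight
b_weight_sq = 0.00129  # made-up coefficient on weight squared

def dprice_dweight(w):
    # derivative of b_weight*w + b_weight_sq*w**2 with respect to w
    return b_weight + 2 * b_weight_sq * w

weights = [1760, 4840, 3190]  # Civic, Continental, one more car (illustrative)
per_car = [dprice_dweight(w) for w in weights]
ame = sum(per_car) / len(per_car)  # the single number margins would report
```

Note how the squared term contributes far more to the Continental's derivative than to the Civic's, which is why margins must look at the actual data before averaging.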
Another approach is to set all the variables to their means, then find the derivative of expected price
with respect to weight at that point. You can do that by adding the atmeans option:
margins, dydx(weight) atmeans

gives 3.08.

Logit
margins is even more useful for models with binary outcomes, where interpretation is always
difficult. Before you can run one, you need a binary dependent variable. Create it with the following:
gen bigEngine=(displacement>150)
displacement is a measure of engine size, and we'll call anything over 150 cubic inches "big."
Now consider the model:

logit bigEngine i.foreign weight

and then run:

margins, dydx(weight)
This tells us that the derivative of the mean expected probability of having a big engine with respect to
weight is .0002664. This suggests that if you had 3753 types of cars and added a pound of weight to
all of them, you'd expect one to switch from having a "small" engine to a "big" engine--not a very big
effect. However, note that the atmeans option would change these results somewhat. Now consider:
margins, dydx(foreign)
Stata knows that foreign is an indicator variable because you specified it as i.foreign.
Thus, instead of looking at small changes, margins considers the effect of changing foreign from 0 to 1. If
all the cars in the sample were domestic (which they are not) and then became foreign, the mean
expected probability of having a big engine would fall by 0.186. This is almost one change for every five
cars, a much bigger effect.
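What margins does for an indicator can be sketched in a few lines of Python. The coefficients below are made up for illustration; they are not estimates from any actual logit fit of bigEngine:

```python
import math

# Hypothetical coefficients for a logit model of bigEngine on foreign and
# weight. These are made-up numbers, NOT Stata's estimates.
b0, b_foreign, b_weight = -2.0, -1.5, 0.0008

def logistic(z):
    # inverse logit link: maps a linear index to a probability
    return 1 / (1 + math.exp(-z))

def p_big(foreign, weight):
    return logistic(b0 + b_foreign * foreign + b_weight * weight)

# For an indicator, margins contrasts foreign=1 against foreign=0 for every
# car (holding weight at its observed value), then averages the differences:
weights = [1760, 4840, 3190]  # illustrative car weights
effects = [p_big(1, w) - p_big(0, w) for w in weights]
avg_effect = sum(effects) / len(effects)
```

Because the negative b_foreign shifts every car's probability down, each per-car contrast is negative, and their average plays the role of the 0.186 drop described above.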
Multinomial Logit
Multinomial logit models can be even harder to interpret because the coefficients only compare two
states. Copy and paste the following command to load a data set that was carefully constructed to
illustrate the pitfalls of interpreting multinomial logit results:
use http://www.ssc.wisc.edu/sscc/pubs/files/margins_mlogit.dta
mlogit y x
The coefficient of x for outcome 2 is negative, so it's tempting to say that as x increases the probability
of y being 2 decreases. But in fact that's not the case, as the margins command will show you. One way
to see why is to re-run the model with outcome 2 as the base category:

mlogit y x, base(2)
Now the coefficients tell you about the probability of each outcome compared to outcome 2, and the
fact that the negative x coefficient for outcome 3 is much larger (in absolute terms) than the positive
coefficient for outcome 1 indicates that increasing x increases the probability of outcome 2.
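The point is easy to demonstrate: in a multinomial logit the outcome probabilities are a softmax of the linear indices, so a negative coefficient by itself does not pin down the sign of the effect. Here is a sketch with made-up coefficients (outcome 1 is the base):

```python
import math

# Hypothetical mlogit coefficients on x, with outcome 1 as the base
# (coefficient 0). Outcome 2's coefficient is negative, outcome 3's even
# more so. These are illustrative numbers, not the data set's estimates.
coef = {1: 0.0, 2: -1.0, 3: -5.0}

def probs(x):
    # softmax of the linear indices gives the outcome probabilities
    expz = {k: math.exp(b * x) for k, b in coef.items()}
    total = sum(expz.values())
    return {k: v / total for k, v in expz.items()}

# Despite outcome 2's negative coefficient, P(y=2) rises as x goes from
# 0 to 0.5, because outcome 3's probability collapses even faster:
p_low = probs(0.0)[2]
p_high = probs(0.5)[2]
```

This is exactly the pattern described above: the large negative coefficient on outcome 3 means increasing x mostly moves probability mass away from outcome 3, and some of it lands on outcome 2.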
Levels
margins can also predict the level of the outcome variable under various scenarios. Sometimes these
"counter-factuals" can be interesting results in and of themselves: "What would would the mean income
be if all the blacks in my sample were white?" "What would the mean test score have been if the
school's demographics hadn't changed?"
Load the automobile data set again and re-run our first regression:
sysuse auto
reg price i.foreign c.weight##c.weight displacement
To examine the impact of foreign, type:

margins foreign
This sets foreign to zero for all cars, leaving the other variables unchanged, finds the predicted
price for each car, and then averages them. It then sets foreign to one for all cars and repeats the
process. If you wanted to set all the other variables to their means instead, you'd add the atmeans
option just like before:

margins foreign, atmeans

Because you told Stata that foreign is a categorical variable (i.foreign), the margins command
calculated its results for both of its values. Obviously you can't
look at all possible values for continuous variables, so for continuous variables you have to specify the
values you're interested in with the at() option. For example, to see what the mean expected price
would be if all the cars weighed 3,000 pounds, type:
margins, at(weight=3000)
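Both kinds of level calculations can be sketched the same way: force a variable to a value for every car, predict, and average. The coefficients below are made up for illustration (and the squared term is dropped for brevity); they are not Stata's estimates:

```python
# Hypothetical linear-model coefficients, NOT Stata's estimates:
b0, b_foreign, b_weight = 10000.0, 2500.0, 2.0

cars = [(1, 1760), (0, 4840), (0, 3190)]  # (foreign, weight), illustrative

def predict(foreign, weight):
    return b0 + b_foreign * foreign + b_weight * weight

# margins foreign: force foreign to each value, keep weight as observed
level_f0 = sum(predict(0, w) for _, w in cars) / len(cars)
level_f1 = sum(predict(1, w) for _, w in cars) / len(cars)

# margins, at(weight=3000): force weight to 3000, keep foreign as observed
level_w3000 = sum(predict(f, 3000) for f, _ in cars) / len(cars)
```

In this purely linear sketch the gap between level_f1 and level_f0 is just the foreign coefficient; with interactions or squared terms, as in the real model, the averaging over the observed data matters.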
If you wanted to compare different values of weight, you can put a list of values in the
parentheses: