034 Data Analysis For Managers


Problem Set #4 Write-Up

1. Omitted Variable Bias Formula: Anti-Ulcer Drug Demand
In this problem, we plugged numbers into the omitted variable bias formula. This is a
simple case where we have the variable we need (logP) and the omitted variable bias
formula tells us how far wrong we go if we omit it.
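As a minimal sketch of the formula at work, the simulation below uses made-up data (the coefficients 0.3, -1.2 and -0.5 and the variable names are assumptions for illustration, not the problem set's numbers). It checks that the coefficient on logADV when logP is omitted equals the coefficient from the full regression plus the bias term: the logP coefficient times the slope from regressing logP on logADV.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def simple_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_var_slopes(x1, x2, y):
    """OLS slopes from regressing y on x1 and x2 (with an intercept)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Synthetic data: all coefficients here are hypothetical.
rng = random.Random(0)
n = 200
log_adv = [rng.gauss(0, 1) for _ in range(n)]
log_p = [-0.5 * a + rng.gauss(0, 1) for a in log_adv]
log_q = [0.3 * a - 1.2 * p + rng.gauss(0, 0.5)
         for a, p in zip(log_adv, log_p)]

b_adv, b_p = two_var_slopes(log_adv, log_p, log_q)  # full regression
b_adv_short = simple_slope(log_adv, log_q)          # logP omitted
delta = simple_slope(log_adv, log_p)                # logP on logADV
bias = b_p * delta                                  # omitted variable bias

print("full:", b_adv, "short:", b_adv_short, "bias:", bias)
```

In sample, the short-regression coefficient equals the full-regression coefficient plus the bias term exactly (up to rounding), which is why plugging numbers into the formula reproduces the gap between the two regressions.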
Takeaways: Our aim is for you to have a deeper understanding of:

Sources of omitted variable bias and how to start thinking about it.
What it means to control for a variable via a residual regression: the computer is
using the part of the variation in Advertising that is uncorrelated with Price and
testing whether that independent variation predicts quantity.
So, when we hear that estimates were generated by a multivariate regression,
which controls for the other variables in the model, that's what the computer is
doing.

Students asked about the residual regression:

Regress logQ on the residuals from (3). Does it make sense that this is a good
estimate of the coefficient on logP if (1) is the true model?

Our estimates of the error term in (3) are the residuals from the regression of
logP on logADV: these are uncorrelated with logADV by construction, but they are of
course still related to logP. As noted in class, they are the part of logP that is
uncorrelated with logADV. So, when we regress logQ on them, we are using variation
in logP that is uncorrelated with logADV and seeing whether that residual variation
is related to logQ. If it is, then we say that this is the relationship between
logP and logQ controlling for logADV.
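The two-step logic above can be sketched on made-up data (the data-generating numbers and variable names below are assumptions for illustration only). The sketch regresses logP on logADV, keeps the residuals, regresses logQ on them, and compares the slope with the logP coefficient from the regression of logQ on both variables.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def simple_fit(x, y):
    """Return (intercept, slope) from an OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def coef_on_second(x1, x2, y):
    """Coefficient on x2 from an OLS regression of y on x1 and x2."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    return (s11 * s2y - s12 * s1y) / (s11 * s22 - s12 ** 2)

# Synthetic data: all coefficients here are hypothetical.
rng = random.Random(1)
n = 200
log_adv = [rng.gauss(0, 1) for _ in range(n)]
log_p = [-0.5 * a + rng.gauss(0, 1) for a in log_adv]
log_q = [0.3 * a - 1.2 * p + rng.gauss(0, 0.5)
         for a, p in zip(log_adv, log_p)]

# Step 1: regress logP on logADV and keep the residuals.
a0, a1 = simple_fit(log_adv, log_p)
resid = [p - (a0 + a1 * a) for a, p in zip(log_adv, log_p)]

# Step 2: regress logQ on those residuals.
_, b_resid = simple_fit(resid, log_q)

# Benchmark: logP coefficient from regressing logQ on logADV and logP.
b_multi = coef_on_second(log_adv, log_p, log_q)

# The residuals are uncorrelated with logADV by construction.
m_adv = mean(log_adv)
cross = sum(e * (a - m_adv) for e, a in zip(resid, log_adv))

print("residual regression:", b_resid, "multiple regression:", b_multi)
```

The two slopes agree up to rounding, which is the sense in which the multiple regression "controls for" logADV.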


The threein terms (threein and logP_3in) are not statistically significant
when tested separately, but they are significant when tested jointly (F-test,
with p-value of 0.005). The difference is that the F-test is a test of whether
both restrictions are true and takes into account that the variables are related
to one another. The t-tests do not consider this interrelationship.
We conclude from the F-test that these variables matter as a group, so it
would do damage to the model if we excluded them. Consider if you
dropped only logP_3in. Now, threein would be statistically significant, as
shown in the .log file. Only by including both in the model did the t-tests
come out insignificant.
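For reference, a joint F-statistic of this kind can be computed from the sums of squared residuals of the restricted model (both threein terms dropped) and the unrestricted model (both included). The helper below is a sketch using the standard textbook formula; the function name and the numbers passed to it are hypothetical and are not taken from the .log file.

```python
def f_statistic(ssr_restricted, ssr_unrestricted,
                num_restrictions, df_resid_unrestricted):
    """F-statistic for testing several exclusion restrictions jointly.

    ssr_restricted: sum of squared residuals with the variables dropped
    ssr_unrestricted: sum of squared residuals with the variables included
    num_restrictions: number of coefficients set to zero (2 for the threein terms)
    df_resid_unrestricted: observations minus estimated coefficients
                           in the unrestricted model
    """
    numerator = (ssr_restricted - ssr_unrestricted) / num_restrictions
    denominator = ssr_unrestricted / df_resid_unrestricted
    return numerator / denominator

# Made-up inputs, purely to show the call.
f_value = f_statistic(ssr_restricted=12.0, ssr_unrestricted=10.0,
                      num_restrictions=2, df_resid_unrestricted=50)
print(f_value)
```

The statistic asks how much worse the fit gets when both variables are dropped together, so it can be large even when each t-statistic is small because the two variables are closely related.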
Say you dropped the threein terms. Are all of the remaining variables
individually statistically significant? What is the R-squared?
The estimates look good: a lot of significance, high R-squared. This
doesn't mean it is a well-specified model.
What is the new price elasticity?
Dropping the threein terms also affects the estimated price elasticity, which is
now an average of the price elasticity with one good and with three goods in the
market. The elasticity goes from -1.17 to -1.78.
Consider the following model:

Let's say you wanted to impose the restriction that the coefficients on threein
and fourin are equal to one another. What new regression model would impose
that restriction? What would your new price elasticity estimate be imposing
that restriction?
Recall that twoin is a dummy variable for two drugs in the market (0 before, 1 when
there are 2 drugs, and 0 after) with similar definitions for threein and fourin.
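And you want to impose the restriction that the coefficient on threein equals the
coefficient on fourin. Then rewrite the model, substituting the coefficient on
threein into the coefficient on fourin (imposing that restriction), and collect the
threein and fourin terms.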
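As a sketch of that substitution, write only the relevant part of the model, with hypothetical labels $\beta_3$ for the coefficient on threein and $\beta_4$ for the coefficient on fourin (the original symbols are not reproduced here, so these names and the elided terms are assumptions):

```latex
\begin{align*}
\log Q &= \dots + \beta_3\,\text{threein} + \beta_4\,\text{fourin} + \dots + \varepsilon \\
       &= \dots + \beta_3\,\text{threein} + \beta_3\,\text{fourin} + \dots + \varepsilon
          && \text{(imposing } \beta_4 = \beta_3\text{)} \\
       &= \dots + \beta_3\,(\text{threein} + \text{fourin}) + \dots + \varepsilon
\end{align*}
```

Because threein and fourin are never both equal to 1 in the same period, their sum is itself a 0/1 dummy.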


So, you could create a new variable called three_or_fourin = threein + fourin, and the
estimated coefficient on that variable would be the estimate of the coefficient on
threein (which equals the coefficient on fourin as well). This variable is a dummy
variable equal to 1 if there are 3 or 4 drugs in the