a) In this regression, the OLS estimator of the coefficient on smoke is unlikely to be consistent
because of endogeneity: the error term may contain omitted variables that are correlated with both
smoke and bmi. Two such variables could be:

1. Alcohol Consumption
2. Exercise

Alcohol consumption reflects a certain kind of lifestyle and is likely to be highly correlated with smoking
patterns. Additionally, a person's alcohol intake is also likely to be correlated with that person's bmi (heavy
drinking can lead to a beer belly, which raises bmi). Similarly, whether one exercises or not also reflects a
certain kind of lifestyle and is thus correlated with both smoking and bmi. Because both variables are omitted
from the regression, they end up in the error term, so smoke is correlated with the error term and the OLS
estimator is inconsistent.

The asymptotic bias from omitting each variable is:

1. Alcohol Consumption: The asymptotic bias is $\beta \, \mathrm{Cov}(smoke, alcohol) / \mathrm{Var}(smoke)$,
where $\beta$ is the coefficient of alcohol consumption on bmi. Alcohol consumption is positively correlated
with smoking, and $\beta$ is also positive since alcohol consumption is positively related to bmi. Therefore,
the bias is positive.
2. Exercise: The asymptotic bias is $\alpha \, \mathrm{Cov}(smoke, exercise) / \mathrm{Var}(smoke)$, where
$\alpha$ is the coefficient of exercise on bmi. Exercise is negatively correlated with smoking, and $\alpha$ is
also negative since exercise lowers bmi. The product of two negative terms is positive, so the bias is
positive in this case as well.
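
To illustrate the sign argument, here is a small Python simulation (standard library only). It is a sketch under made-up assumptions: the variables alcohol and exercise, every coefficient, and the data-generating process are hypothetical and are not taken from the dataset. It regresses bmi on smoke alone and compares the resulting bias with the omitted-variable-bias formula above, extended to both omitted variables by adding the two terms.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate_ovb(n=100_000, seed=0):
    """Return (simulated OLS bias, omitted-variable-bias formula) for a HYPOTHETICAL DGP."""
    rng = random.Random(seed)
    # Hypothetical coefficients: true effect of smoke, effect of alcohol (beta), effect of exercise (alpha)
    b_smoke, beta, alpha = 0.0, 0.5, -0.8
    smoke, alcohol, exercise, bmi = [], [], [], []
    for _ in range(n):
        alc = rng.gauss(0, 1)
        ex = rng.gauss(0, 1)
        # Assumed signs: smoking rises with drinking and falls with exercise
        s = 0.6 * alc - 0.4 * ex + rng.gauss(0, 1)
        y = 25 + b_smoke * s + beta * alc + alpha * ex + rng.gauss(0, 1)
        smoke.append(s)
        alcohol.append(alc)
        exercise.append(ex)
        bmi.append(y)
    var_s = cov(smoke, smoke)
    # Short regression of bmi on smoke only (alcohol and exercise omitted)
    ols_slope = cov(smoke, bmi) / var_s
    formula = (beta * cov(smoke, alcohol) + alpha * cov(smoke, exercise)) / var_s
    return ols_slope - b_smoke, formula


bias, formula = simulate_ovb()
print(f"simulated bias = {bias:.3f}, formula = {formula:.3f}")
```

Under these assumed parameters both numbers come out positive and close to the population value (0.5 · 0.6 + 0.8 · 0.4) / 1.52 ≈ 0.41. Flipping the sign of either a correlation or a coefficient flips the sign of that variable's contribution.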
b) Refer to the log file.

c) To check whether a variable is a good instrument, we need to consider instrument relevance
(whether the instrument is correlated with smoke) and instrument exogeneity (the IV should be
uncorrelated with the error term, i.e. it should have no partial effect on bmi other than through smoke).
The dataset has the following three candidate instruments for smoking:

1. Cigarette Tax: This is a good instrument because it plausibly satisfies both instrument relevance and
instrument exogeneity. Cigarette taxes raise the price of cigarettes and so should reduce smoking, but they
are unlikely to affect bmi except through smoke, so cigtax should be uncorrelated with the error term. The
only drawback of using cigtax as an IV is that if the demand for cigarettes is inelastic, then cigtax will not
be strongly correlated with smoke, making it a weak instrument.

2. Involuntary Unemployment: This is not a very good instrument because it can affect bmi directly, not
only through smoke. A person who is involuntarily unemployed may be unable to meet his/her calorie
requirements and could lose a lot of weight, which lowers bmi. Thus it violates the instrument exogeneity
condition.

3. Separated/Divorced Status: This is not a very good instrument because it is unlikely to be strongly
correlated with smoke. While the shock of going through a divorce or separation could induce some
people to smoke, divorced people are not systematically more likely to smoke than non-divorced people,
so the instrument relevance condition is likely violated. In addition, the mental anguish of a divorce or
separation could cause a person to eat less, which would affect bmi directly rather than only through
smoke. This violates instrument exogeneity as well.
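
A hedged Python sketch of why exogeneity matters: with a single instrument z, the IV estimator is Cov(z, bmi) / Cov(z, smoke), and its probability limit is the true coefficient plus Cov(z, u) / Cov(z, smoke), where u is the error term. The simulation below uses made-up coefficients, with a cigtax-like instrument that only moves smoke and an unemployment-like instrument that is given a hypothetical direct effect on bmi.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def iv_slope(z, x, y):
    """Simple IV estimator with one instrument z for one regressor x."""
    return cov(z, y) / cov(z, x)


def simulate(n=100_000, seed=1):
    rng = random.Random(seed)
    beta = 0.0    # hypothetical true effect of smoke on bmi
    gamma = -1.0  # hypothetical DIRECT effect of unemployment on bmi (exclusion violated)
    tax, unemp, smoke, bmi = [], [], [], []
    for _ in range(n):
        t = rng.gauss(0, 1)  # cigtax-like instrument
        w = rng.gauss(0, 1)  # unemployment-like instrument
        u = rng.gauss(0, 1)  # unobserved lifestyle factor in the error term
        s = -0.5 * t + 0.5 * w + u + rng.gauss(0, 1)
        y = 25 + beta * s + gamma * w + 2.0 * u + rng.gauss(0, 1)
        tax.append(t)
        unemp.append(w)
        smoke.append(s)
        bmi.append(y)
    return {
        "ols": cov(smoke, bmi) / cov(smoke, smoke),
        "iv_cigtax": iv_slope(tax, smoke, bmi),
        "iv_unemployment": iv_slope(unemp, smoke, bmi),
    }


estimates = simulate()
print(estimates)
```

Under these assumptions OLS is pulled away from the true value of 0 (toward about 0.6), the cigtax-like IV lands near 0, and the unemployment-like IV converges to gamma / 0.5 = -2 instead, because its direct effect on bmi sits in the error term of the bmi equation.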

d) Refer to the log file. The instruments are jointly highly significant, with a p-value of 0.0001.
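
Joint significance of the instruments is usually assessed with an F-test in the first-stage regression of smoke on the instruments. The Python sketch below computes that F statistic by hand on simulated data. The coefficients and the 15% share of separated/divorced people are invented, and the exogenous controls that the actual regression would keep in both the restricted and the unrestricted model are left out for brevity.

```python
import random


def ols_ssr(X, y):
    """OLS of y on the columns of X via the normal equations; returns the sum of squared residuals."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    b = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2 for row, yi in zip(X, y))


def first_stage_f(n=2000, seed=2):
    """F statistic for H0: all instrument coefficients are zero (simulated, HYPOTHETICAL data)."""
    rng = random.Random(seed)
    unrestricted, restricted, smoke = [], [], []
    for _ in range(n):
        cigtax = rng.gauss(0, 1)
        unemployment = rng.gauss(0, 1)
        sepdiv = 1.0 if rng.random() < 0.15 else 0.0
        # Hypothetical first stage in which only cigtax matters much
        s = 1.0 - 0.3 * cigtax + 0.1 * unemployment + 0.05 * sepdiv + rng.gauss(0, 1)
        unrestricted.append([1.0, cigtax, unemployment, sepdiv])
        restricted.append([1.0])  # intercept only (controls omitted in this sketch)
        smoke.append(s)
    ssr_u = ols_ssr(unrestricted, smoke)
    ssr_r = ols_ssr(restricted, smoke)
    q = 3                           # number of restrictions (instruments)
    df = n - len(unrestricted[0])   # residual degrees of freedom, unrestricted model
    return ((ssr_r - ssr_u) / q) / (ssr_u / df)


print(f"first-stage F = {first_stage_f():.1f}")
```

The p-value reported by Stata comes from comparing a statistic like this with an F(q, df) distribution; the standard library has no F distribution, so the sketch stops at the statistic.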

e) Refer to the log file. Compared with the OLS regression in part (b), the main change is that the
coefficient on smoke changes from 0.1387 to -1.1753, and its standard error rises from 0.2968 to 2.346.
However, in both the OLS and the 2SLS estimates the p-value on smoke is large, so we cannot reject the
null hypothesis that smoke has no effect on bmi. Apart from smoke, the coefficients and standard errors
of all the other variables remain more or less the same in both the OLS and the 2SLS regressions.
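
A rough way to see why the standard error on smoke grows so much under 2SLS: in the textbook case of a single endogenous regressor with no other controls and homoskedastic errors, the two variance formulas differ by the first-stage R-squared. This is a simplification of the actual regression, which includes other controls, and the two $\hat\sigma^2$ terms are computed from different residuals, so the ratio is only approximate.

```latex
\widehat{\mathrm{Var}}(\hat\beta_{OLS}) = \frac{\hat\sigma^2}{SST_{smoke}},
\qquad
\widehat{\mathrm{Var}}(\hat\beta_{2SLS}) = \frac{\hat\sigma^2}{SST_{smoke}\, R^2_{smoke,z}}
```

Here $SST_{smoke}$ is the total sum of squares of smoke and $R^2_{smoke,z}$ is the R-squared from regressing smoke on the instruments z. Because $R^2_{smoke,z} \le 1$, the 2SLS variance is larger, and a first stage with low explanatory power inflates it considerably, which is consistent with the jump from 0.2968 to 2.346.
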
f) The model estimated in part e) is not appropriate because it uses all three instruments, cigtax,
unemployment and sepdiv, in its IV regression. However, as I reasoned in part c), only cigtax satisfies
both the conditions of instrument relevance and instrument exogeneity, and thus it is the only instrument
that should be used in our 2SLS regression. Refer to the log file for the estimation.
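
With cigtax as the only instrument the model is exactly identified, and the 2SLS point estimate reduces to the simple IV ratio Cov(cigtax, bmi) / Cov(cigtax, smoke). The Python sketch below (simulated data, hypothetical coefficients, controls omitted) runs the two stages by hand and checks that they match that ratio. Standard errors from a hand-run second stage are not valid, which is one reason to rely on a built-in IV command such as Stata's `ivregress 2sls` for inference.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def two_stage_by_hand(z, x, y):
    """2SLS point estimate with one instrument z, computed as two OLS regressions."""
    # Stage 1: regress smoke (x) on cigtax (z) and form fitted values
    pi = cov(z, x) / cov(z, z)
    mz, mx = mean(z), mean(x)
    x_hat = [mx + pi * (zi - mz) for zi in z]
    # Stage 2: regress bmi (y) on the fitted values of smoke
    return cov(x_hat, y) / cov(x_hat, x_hat)


rng = random.Random(4)
b_smoke = 0.0  # hypothetical true effect of smoke on bmi
z, x, y = [], [], []
for _ in range(5000):
    t = rng.gauss(0, 1)  # hypothetical cigtax
    u = rng.gauss(0, 1)  # unobserved confounder
    s = -0.5 * t + u + rng.gauss(0, 1)
    z.append(t)
    x.append(s)
    y.append(25 + b_smoke * s + 2.0 * u + rng.gauss(0, 1))

by_hand = two_stage_by_hand(z, x, y)
ratio = cov(z, y) / cov(z, x)
print(by_hand, ratio)  # the two agree up to floating-point rounding
```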

The only drawback is that, once the new model is estimated in Stata, it does not show a strong correlation
between smoke and cigtax, so cigtax may be a weak instrument. However, among the three IVs that we have,
it is the only one that satisfies both instrument relevance and instrument exogeneity, and thus we should use it.
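
To illustrate what a weak first stage costs, the Python sketch below draws many simulated samples and computes the simple IV estimate with a strong and a very weak instrument. All parameters (first-stage strengths 1.0 and 0.05, sample size 500, 200 replications) are made up for illustration and are not estimates from the data.

```python
import random
import statistics


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def iv_draws(pi, reps=200, n=500, seed=5):
    """Simple IV estimates across repeated samples; pi is the (hypothetical) first-stage strength."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        z, x, y = [], [], []
        for _ in range(n):
            zi = rng.gauss(0, 1)  # instrument (cigtax-like)
            u = rng.gauss(0, 1)   # confounder
            z.append(zi)
            x.append(pi * zi + u + rng.gauss(0, 1))
            y.append(2.0 * u + rng.gauss(0, 1))  # true effect of x on y is 0
        draws.append(cov(z, y) / cov(z, x))
    return draws


def iqr(v):
    """Interquartile range of a list of numbers."""
    q1, _, q3 = statistics.quantiles(v, n=4)
    return q3 - q1


strong, weak = iv_draws(pi=1.0), iv_draws(pi=0.05)
print(f"IQR strong = {iqr(strong):.2f}, IQR weak = {iqr(weak):.2f}")
```

The weak-instrument estimates are far more dispersed, which mirrors the large standard error on smoke; weak-instrument IV estimates are also known to be biased toward OLS in finite samples.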