Вы находитесь на странице: 1из 2

MA Discussion (30 Sep)

Dont drop observations just because they are missing in a control


variable.
www.ats.ucla.edu/stat/stata/modules/missing.html
o Missing observation use summarize Stata drops
observations if they miss a single variable in the regression
model.
o Dont drop outliers at the START of the project. Only do so after
testing if they affect the end result.
Unclassified hospital type vs . Unclassified is treated as the
excluded type, so it does not affect the regression when we remove
those observations.
Generally drop if you think it might bias other results If you think the
missing-ness is systematic.
Correct Y variable is either Medicaid or uninsured, not
Medicaid/countymedicaid.
o If the DGP is Y = a + bX, this assumes that a=0 and b=1.
o If the DGP is Y = a + bX + cZ, this says that a=0, b=1, and Z is
somehow better related to the ratio of Y/X instead of just Y.
Medicare*admits is a bad interaction variable. Imagine two hospitals in
the same county. If one type of hospital is systematically bigger.
Avoid sequential dropping! How to decide whether to drop?
o Requires F-test tool.
o Compares against different baselines
o In future, its good to explicitly say what the core model is, and
then compare all against this baseline.
Pay attention to:
o Coefficient, SE, p-value of the KEY VARIABLE OF INTEREST (e.g.
forprofit)
o R2
Robustness
o Done after finalizing the core model
o Add stuff and see if it makes a difference!
o Focus on goal (key variable of interest, R 2), not the newly added
stuff!
F-test
o Test othernfp countyowner church
Running Y forprofit othernfp church
o Everything compared RELATIVE against countyowned
o We only know if forprofit took their fair share relative to
countyowned
o Run t-tests:
Test forprofit = othernfp
Test forprofit = church
o Finish the interpretation significantly more or less?
Multi-collinearity: multiple x variables so correlated that the regression
cannot get correct standard errors.

Use composite variables (add multiple variables that might be


collinear together. Use z-scores if units are different.
Test to see if they are jointly significant.
Vif command
o If > 10, variables may be subject to problematic multicollinearity
o Take this with a huge pinch of salt though.
To create a composite variable, we need the underlying variables to be
standardized.
o Have the same mean and SD
o (X X_mean) / X_stddev */
o return list
o di r(mean)
o gen temp = (countyprivateinsurance r(mean))/r(sd)
egen
o egen score1 = std(countyprivateinsurance) Creates standard
z-score
o gen check = temp == score1
o tab check
o drop check temp
o egen score2 = std(countymedicaid)
o egen score3 = std(countymedicare)
o tab temp forprofit See mean, dummy variable split
The residual is the difference between predicted Y and actual Y.
o predict line, residuals
o predict line, xb
o su abc
o (help predict)
o gen diff = Medicaid line
o corr abc diff
o scatter abc line (find a plot to see if data is heteroskedastic)
No pattern; well-scattered No heteroskedasticity
o If residual for the outlier is very small, then you know the model
is trying to fit the outlier. If the residual is large, then you know
the model ignored the outlier.
Search outreg2, all
o Install.
o

Вам также может понравиться