Robust Regression

Regression Methods
We are going to look at four approaches to robust regression:
Regression with robust standard errors
Regression with robust standard errors including the cluster option
Regression with random effects
Regression with fixed effects
We will look at a model that predicts the api 2000 scores.
Our focus is whether the average class size in K through 3 (acs_k3) and the average class size in grades 4 through 6 (acs_46) affect academic performance.
We use a new data set:
http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi2
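A minimal sketch of loading this data set in Stata. It assumes the URL above is still reachable from your machine and that the outcome variable is named api00 (the slides only say "api 2000 scores"); dnum is the district identifier used later.

```stata
* Load the data set named on the slide, replacing any data in memory
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear

* Inspect the variables used in this section
* (api00 is an assumed name for the api 2000 score)
describe api00 acs_k3 acs_46 dnum
summarize api00 acs_k3 acs_46
```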
4.1.1 Regression with Robust Standard Errors
The Stata regress command includes a
robust option for estimating the standard
errors using the Huber-White sandwich
estimators.
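For reference, the sandwich estimator can be written as below. This is the standard textbook form, not something taken from the slides: X is assumed to be the n-by-k design matrix with rows x_i', and \hat{e}_i are the OLS residuals. The factor n/(n-k) is the small-sample adjustment that Stata applies by default with regress.

```latex
\widehat{V}(\hat{\beta})
  = \frac{n}{n-k}\,
    (X'X)^{-1}
    \Bigl( \sum_{i=1}^{n} \hat{e}_i^{\,2}\, x_i x_i' \Bigr)
    (X'X)^{-1}
```

The outer (X'X)^{-1} terms are the "bread" and the middle sum is the "meat". The coefficient estimates themselves do not appear in the formula except through the residuals.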
Such robust standard errors can deal with a collection of minor concerns about failure to meet assumptions:
Minor problems about normality
Heteroscedasticity
Some observations that exhibit large residuals, leverage or influence
With the robust option, the point estimates of the coefficients are exactly the same as in OLS, but the standard errors take into account issues concerning heterogeneity of variance (heteroscedasticity) and lack of normality.
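A short sketch of the comparison, assuming the elemapi2 data are already in memory and that the outcome variable is named api00. Only the two class-size predictors named on the slides are included; the original analysis may have used more.

```stata
* Ordinary OLS fit for comparison
regress api00 acs_k3 acs_46

* Same model with Huber-White (sandwich) standard errors:
* coefficients are unchanged, only the standard errors,
* t statistics and confidence intervals differ
regress api00 acs_k3 acs_46, robust
```

In recent Stata versions the same request can be written as vce(robust).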
Using the Cluster Option
The elemapi2 dataset contains data on 400 schools that come from 37 school districts. It is quite possible that the scores within each school district are not independent, and this could lead to residuals that are not independent within districts.
We can use the cluster option to indicate that the observations are clustered into districts (based on dnum) and that the observations may be correlated within districts, but are independent between districts.
As with the robust option, the estimates of the coefficients are the same as the OLS estimates, but the standard errors take into account that the observations within districts are not independent.
If you have a very small number of clusters compared to your overall sample size, the standard errors could be considerably larger than the OLS results. For example, if there were only 3 districts, the standard errors would be computed on the aggregate scores for just 3 districts.
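The cluster option can be sketched as follows, assuming the elemapi2 data are in memory and the outcome variable is named api00. dnum is the district identifier mentioned in the text.

```stata
* Cluster-robust standard errors: residuals may be correlated
* within a district (dnum) but are treated as independent
* across districts
regress api00 acs_k3 acs_46, cluster(dnum)
```

The equivalent modern spelling is vce(cluster dnum). The output header reports the number of clusters, which is worth checking against the 37 districts stated above.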
Control for random effect (school district)
Control for fixed effect (school district)
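A hedged sketch of how the two district-level models might be fitted. It uses xtreg with dnum as the panel (group) variable; the slides do not show which command was actually used, so treat this as one plausible choice, again with api00 as an assumed outcome name.

```stata
* Declare the school district as the grouping variable
xtset dnum

* Random-effects model: district intercepts are treated as draws
* from a distribution, uncorrelated with the predictors
xtreg api00 acs_k3 acs_46, re

* Fixed-effects model: each district gets its own intercept,
* so only within-district variation identifies the slopes
xtreg api00 acs_k3 acs_46, fe
```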
2.4 Examine Distribution Assumption
The classical regression model assumes that the errors, and hence the outcome (dependent variable) given the predictors, are normally distributed.
In large samples, this assumption is not that important because of the Central Limit Theorem.
In small samples, however, the distribution assumption can be relevant.
We will investigate issues concerning normality.
Here we check the normality of enroll.
We start by making some graphs:
Histogram
Kdensity
We can use the normal option to superimpose a normal curve on this graph and the bin(20) option to use 20 bins.
The distribution looks skewed to the right.
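The histogram described here could be produced along these lines, assuming a data set containing enroll is in memory:

```stata
* Histogram of enroll with 20 bins and a normal curve overlaid
histogram enroll, normal bin(20)
```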
An alternative to histograms is the kernel density plot, which approximates the probability density of the variable.
Kernel density plots have the advantage of being smooth and of being independent of the choice of origin, unlike histograms.
Stata implements kernel density plots with the kdensity command.
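A minimal example of the command, assuming enroll is in memory. The normal option, which overlays a normal density for comparison, is an addition here and is not mentioned on the slide.

```stata
* Kernel density estimate of enroll, with a normal density overlaid
kdensity enroll, normal
```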
Having concluded that enroll is not normally distributed, how should we address this problem?
We may try to transform enroll to make it more normally distributed. Potential transformations include taking the log, taking the square root, or raising the variable to a power.
Stata includes the ladder and gladder commands to help select the right transformation. ladder reports numeric results and gladder produces a graphic display.
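Both commands take the variable name directly. A sketch, assuming enroll is in memory:

```stata
* Numeric summary: a normality test for each candidate
* transformation on the ladder of powers
ladder enroll

* Graphic version: one small histogram per transformation
gladder enroll
```

In the ladder output, the transformation with the smallest chi-square statistic is the one closest to normal.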
This indicates that the log transformation would help to make enroll more normally distributed.
Let's use the generate command with the log function to create the variable lenroll, which will be the log of enroll.
Note that log in Stata gives you the natural log, not log base 10. To get log base 10, type log10(var).
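The transformation step might look like this. lenroll is the name given on the slide; l10enroll is a hypothetical name used only to illustrate log10.

```stata
* Natural log of enroll
generate lenroll = log(enroll)

* Re-check the distribution after transforming
histogram lenroll, normal bin(20)

* Base-10 log, shown only for comparison
* (l10enroll is an illustrative name)
generate l10enroll = log10(enroll)
```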
2.5 Summary
Simple Regression
Multiple Regression
Hypothesis Testing
Examine the normality assumption
Quiz I
Make graphs of api99: histogram,
kdensity plot
What is the correlation between api99
and meals?
Regress api99 on meals.
Create and list the fitted (predicted)
values.
Graph meals and api99 with and without
the regression line.
Quiz II
Look at the correlations among the
variables api99 meals ell avg_ed using
the corr and pwcorr commands.
Perform a regression predicting api99
from meals and ell. Interpret the output.
