Digression: percentage change
Chapter 6: Regression Analysis with
Qualitative Information
An approximation to this exact formula is given by the
difference in the logs of the respective values:
$pc \approx \left(\log(y_1) - \log(y_0)\right) \cdot 100$
This approximation works well for small changes.
Using the numbers from the previous slide, the price of the T-shirt,
according to the formula above, has risen by approximately
$\left(\log(21\$) - \log(20\$)\right) \cdot 100 = 4.879\% \approx 5\%$
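The comparison can be checked numerically. The sketch below assumes the exact formula from the previous slide is the standard definition of percentage change, $(y_1 - y_0)/y_0 \cdot 100$; the function names are illustrative only.

```python
import math

def exact_pct_change(y0, y1):
    # Standard percentage change (assumed to be the "exact formula").
    return (y1 - y0) / y0 * 100

def log_pct_change(y0, y1):
    # Log-difference approximation from the slide.
    return (math.log(y1) - math.log(y0)) * 100

exact_small = exact_pct_change(20, 21)   # T-shirt example: 20$ -> 21$
approx_small = log_pct_change(20, 21)

# A large change (20$ -> 40$) shows how the approximation degrades.
exact_large = exact_pct_change(20, 40)
approx_large = log_pct_change(20, 40)

print(round(exact_small, 3), round(approx_small, 3))
print(round(exact_large, 3), round(approx_large, 3))
```

For the small change the two numbers are close; for the doubling in price the log difference falls well short of the exact 100%.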
Example: log(wage) equation
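[Equation: estimated log(wage) regression including female and two squared terms; reported standard errors (0.099), (0.036), 0.007, (0.0059), (0.0001), (0.007), (0.00023). The coefficient on female is −0.297.]
$\log(wage_{female}) - \log(wage_{male}) = -0.297$
$\log\left(\frac{wage_{female}}{wage_{male}}\right) = -0.297$ (rewrite)
$\frac{wage_{female}}{wage_{male}} = \exp(-0.297)$ (apply exponential function)
$\frac{wage_{female} - wage_{male}}{wage_{male}} = \exp(-0.297) - 1 \approx -0.257$ (subtract 1)
(−25.7%)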
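The last step of this derivation can be reproduced in a few lines. This is a minimal sketch; the function name is hypothetical, and only the coefficient −0.297 comes from the slide.

```python
import math

def exact_pct_from_log_coef(coef):
    # Exact percentage difference implied by a dummy coefficient
    # when the dependent variable is in logs: 100 * (exp(coef) - 1).
    return (math.exp(coef) - 1) * 100

delta_female = -0.297                 # coefficient on female from the slide
approx_pct = 100 * delta_female       # direct reading of the coefficient
exact_pct = exact_pct_from_log_coef(delta_female)

print(round(approx_pct, 1), round(exact_pct, 1))
```

The direct reading gives about −29.7%, while the exact calculation gives about −25.7%, so the gap is noticeable for a coefficient of this size.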
Example: log(wage) equation
The coefficient on a dummy variable, multiplied by 100, has a
percentage interpretation when the dependent variable is in logs
Read this way, the percentage change is only an
approximation that works well for small
changes
The exact percentage change can be
calculated the way described on the
previous slide
Using Dummy Variables for
Multiple Categories
Up to now we have focused on dummy
variables based on two categories (e.g. male vs.
female). Now we turn to
dummy variables for multiple categories
Multiple Categories
Suppose we want to estimate the effect of credit
ratings (CR) on bond interest rates (BIR)
Several agencies (Moody's, Standard & Poor's)
rate the quality of government debt, where
the rating depends on the probability of default
Suppose for simplicity that ratings range from 0
to 4, with 0 being the worst credit rating and 4
being the best credit rating
This is an example of an ordinal variable with
multiple categories
Multiple Categories
How can we incorporate the variable CR in our
model? One way is to estimate the following
$BIR = \beta_0 + \beta_1 CR + \text{other factors}$
$\beta_1$ would give us the change in BIR if CR
increases by one unit (e.g. from 0 to 1, or from
2 to 3)
But is the effect on BIR when CR changes
from 0 to 1 the same as when CR changes from
2 to 3? Probably not!
Multiple Categories
Better alternative: define 4 dummy variables
$C_1$, $C_2$, $C_3$ and $C_4$, where $C_j$ equals 1 if CR = j
and 0 otherwise (for j = 1,...,4), and run the
regression
$BIR = \beta_0 + \delta_1 C_1 + \delta_2 C_2 + \delta_3 C_3 + \delta_4 C_4 + \text{other factors}$
The interpretation of, e.g., $\delta_2$ is the following:
how does BIR change if CR changes from 0
(the base group) to 2?
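As an illustration of how the dummies $C_1, \dots, C_4$ are built from CR, here is a minimal sketch in plain Python. The function name and the sample ratings are made up for the example.

```python
def rating_dummies(cr, levels=(1, 2, 3, 4)):
    # Return [C1, C2, C3, C4] for a credit rating cr in 0..4.
    # CR = 0 is the base group, so it maps to all zeros.
    return [1 if cr == j else 0 for j in levels]

ratings = [0, 2, 4, 1]                       # hypothetical sample of CR values
rows = [rating_dummies(cr) for cr in ratings]

for cr, row in zip(ratings, rows):
    print(cr, row)
```

Each observation has at most one dummy equal to 1, and the base group is identified by all four being 0.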
Multiple Categories
Any categorical variable can be turned into
a set of dummy variables
Because the base group is represented by
the intercept, if there are n categories there
should be n − 1 dummy variables
If there are a lot of categories, it may make
sense to group some together
Example: top 10 ranking, 11-25, etc.
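Grouping categories before creating dummies might look like the sketch below. Only the cutoffs "top 10" and "11-25" come from the slide; the catch-all group "rest", its choice as base group, and all names are assumptions for illustration.

```python
def rank_group(rank):
    # Map a ranking position to a coarser category.
    if rank <= 10:
        return "top10"
    if rank <= 25:
        return "r11_25"
    return "rest"  # assumed catch-all, used here as the base group

def group_dummies_for_rank(rank):
    # Three groups, so two dummies (n - 1); "rest" is all zeros.
    g = rank_group(rank)
    return {"top10": int(g == "top10"), "r11_25": int(g == "r11_25")}

print(group_dummies_for_rank(7))
print(group_dummies_for_rank(25))
print(group_dummies_for_rank(40))
```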
Example: wage equation
Remember the model of wage
determination from slide 15:
$\log(wage) = \beta_0 + \delta_1 female + \text{other factors}$
We have seen that women earn less than
men after controlling for other factors like
experience, tenure and education ($\delta_1 < 0$)
Now we want to know whether marital
status also affects the wage
Example: wage equation
One way would be to estimate:
$\log(wage) = \beta_0 + \delta_1 female + \delta_2 married + \text{other factors}$,
where married is 1 if the person is married and
0 otherwise
Drawback: Estimating this regression we
implicitly assume that the effect of married is
the same for men and women!
Solution: multiple categories
Example: wage equation
In this case we get 4 different categories:
married men, married women, single men and
single women
Denote these by marrmale, marrfem, singmale
and singfem
As the base group we choose single men, so
singmale will not be included in the regression
Example: wage equation
The model that allows the effect of marital
status on wages to vary between men and
women is:
$\log(wage) = \beta_0 + \delta_1 marrmale + \delta_2 marrfem + \delta_3 singfem + \text{other factors}$
Interpretation: $\delta_1$ measures the difference in
log wages between married men and single men
(base group)
$\delta_2$ measures the difference in log wages between
married women and single men
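The three category dummies can be derived from the two binary variables female and married. The following is a minimal sketch; the function name is hypothetical, while the variable names follow the slide.

```python
def group_dummies(female, married):
    # female and married are 0/1 indicators.
    # Single men are the base group and get (0, 0, 0).
    marrmale = int(married and not female)
    marrfem = int(married and female)
    singfem = int(not married and female)
    return marrmale, marrfem, singfem

print(group_dummies(female=0, married=0))  # single man (base group)
print(group_dummies(female=0, married=1))  # married man
print(group_dummies(female=1, married=1))  # married woman
print(group_dummies(female=1, married=0))  # single woman
```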
Example: wage equation
We can also calculate the difference in wages
between single women and married women
from the previous regression
This difference is given by
$\delta_3 - \delta_2$
Example: wage equation
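[Equation: estimated log(wage) regression on marrmale, marrfem, singfem and other factors; reported standard errors (0.100), (0.055), (0.058), (0.056).]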
Other Interactions with Dummies
Can also consider interacting a dummy
variable, d, with a continuous variable, x
$y = \beta_0 + \delta_1 d + \beta_1 x + \delta_2 d \cdot x + u$
If d = 0, then $y = \beta_0 + \beta_1 x + u$
If d = 1, then $y = (\beta_0 + \delta_1) + (\beta_1 + \delta_2) x + u$
This is interpreted as a change in the slope
in addition to a change in the intercept
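A small sketch makes the two group-specific lines explicit. The parameter values below are invented purely for illustration and do not come from any estimate in these slides.

```python
def group_line(beta0, beta1, delta1, delta2, d):
    # Intercept and slope of the regression line for group d (0 or 1)
    # in y = beta0 + delta1*d + beta1*x + delta2*d*x + u.
    intercept = beta0 + delta1 * d
    slope = beta1 + delta2 * d
    return intercept, slope

# Hypothetical parameter values.
params = dict(beta0=1.0, beta1=0.5, delta1=-0.25, delta2=0.125)

line_d0 = group_line(d=0, **params)
line_d1 = group_line(d=1, **params)
print(line_d0, line_d1)
```

Here $\delta_1$ shifts the intercept and $\delta_2$ shifts the slope for the group with d = 1.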
Testing for Differences Across
Groups
Testing whether a regression function is
different for one group versus another can be
thought of as simply testing for the joint
significance of the dummy and its interactions
with all other x variables
So, you can estimate the model with all the
interactions and without and form an F
statistic, but this could be unwieldy
The Chow Test
It turns out you can compute the proper F
statistic without running the unrestricted
model with interactions with all k continuous
variables
Run the model without interactions for group one to
get $SSR_1$, then for group two to get $SSR_2$
Run the same model for all observations pooled to get $SSR$,
then
$F = \frac{SSR - (SSR_1 + SSR_2)}{SSR_1 + SSR_2} \cdot \frac{n - 2(k+1)}{k+1}$
The Chow Test (continued)
The Chow test is really just a simple F test
for exclusion restrictions, but we've
realized that $SSR_{ur} = SSR_1 + SSR_2$
Note, we have k + 1 restrictions (each of the
slope coefficients and the intercept)
Note the unrestricted model would estimate
2 different intercepts and 2 different sets of k slope
coefficients, so the df is n − 2k − 2
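The Chow statistic is easy to compute once the three sums of squared residuals are available. The sketch below takes them as given; the function name and the numbers in the usage example are hypothetical and not taken from any data set.

```python
def chow_f(ssr_pooled, ssr1, ssr2, n, k):
    # Chow F statistic with k regressors (plus intercept) and n observations.
    ssr_ur = ssr1 + ssr2          # unrestricted SSR: sum over the two groups
    q = k + 1                     # number of restrictions
    df = n - 2 * (k + 1)          # degrees of freedom of the unrestricted model
    return ((ssr_pooled - ssr_ur) / q) / (ssr_ur / df)

# Hypothetical inputs, for illustration only.
f_stat = chow_f(ssr_pooled=100.0, ssr1=40.0, ssr2=50.0, n=108, k=3)
print(round(f_stat, 4))
```

The result would be compared with a critical value of the F distribution with k + 1 and n − 2(k + 1) degrees of freedom.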
Example:
college grade point averages
$cgpa = \beta_0 + \beta_1 sat + \beta_2 hsperc + \beta_3 tothrs + u$
cgpa: College grade point average
sat: Scholastic Assessment Test score
hsperc: High school rank percentile
tothrs: Total hours of college courses
Question: Is the regression line different for
men and women?
Example:
college grade point averages
Variant 1:
Use the usual F test
Run the following regression
$cgpa = \beta_0 + \delta_0 female + \beta_1 sat + \delta_1 female \cdot sat + \beta_2 hsperc + \delta_2 female \cdot hsperc + \beta_3 tothrs + \delta_3 female \cdot tothrs + u$
Test the hypothesis $H_0: \delta_0 = 0, \delta_1 = 0, \delta_2 = 0, \delta_3 = 0$
This would be the usual F test
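For reference, the usual F statistic for this hypothesis can be written out as follows. The symbols $SSR_r$ (model without female and its interactions), $SSR_{ur}$ (model with them), $q$ and $k_{ur}$ are notation introduced here, not taken from the slide.

```latex
F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k_{ur} - 1)},
\qquad q = 4, \quad k_{ur} = 7, \quad n - k_{ur} - 1 = n - 8
```

With k = 3 regressors in the original model, q = k + 1 = 4 and n − 8 = n − 2(k + 1), the same degrees of freedom as in the Chow test.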
Example:
college grade point averages
Variant 2:
Estimate the restricted model separately for each group
(males and females) and get $SSR_{female}$ and $SSR_{male}$
Then estimate the restricted model with males and
females pooled together to get $SSR$
to get $SSR$: $cgpa = \beta_0 + \beta_1 sat + \beta_2 hsperc + \beta_3 tothrs + u$
to get $SSR_{sex}$ for $sex \in \{male, female\}$: $cgpa = \beta_0^{sex} + \beta_1^{sex} sat + \beta_2^{sex} hsperc + \beta_3^{sex} tothrs + u^{sex}$
Example:
college grade point averages
Given $SSR_{female}$, $SSR_{male}$ and $SSR$, we can
plug the values into the formula
This would be the Chow test
$F = \frac{SSR - (SSR_{female} + SSR_{male})}{SSR_{female} + SSR_{male}} \cdot \frac{n - 2(k+1)}{k+1}$
The Linear Probability Model
(LPM)
A special case of regression analysis
occurs if the dependent variable y is
binary (0 or 1). This issue will be
discussed on the next slides
Linear Probability Model
$P(y=1|x) = E(y|x)$ when y is a binary variable,
so we can write our model as
$P(y = 1|x) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$
So, the interpretation of $\beta_j$ is the change in the
probability of success when $x_j$ changes by one unit, holding the other regressors fixed
The predicted y is the predicted probability of
success
Potential problem: predictions can lie outside
[0,1], which is strange for a
probability!
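The out-of-range problem is easy to see with a toy example. The coefficients and regressor values below are hypothetical and chosen only so that one prediction exceeds 1.

```python
def lpm_predict(beta, x):
    # Predicted probability beta0 + beta1*x1 + ... + betak*xk.
    # beta = [beta0, beta1, ..., betak], x = [x1, ..., xk].
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))

beta = [0.25, 0.125]                      # hypothetical intercept and slope
preds = [lpm_predict(beta, [x]) for x in (0, 2, 8)]
out_of_range = [p for p in preds if p < 0 or p > 1]

print(preds)
print(out_of_range)
```

Nothing in the linear functional form keeps the fitted values inside [0,1], so large or small values of x can produce "probabilities" above 1 or below 0.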
Linear Probability Model (cont)
Irrespective of potential predictions outside of
[0,1], parameter estimates of the LPM are
consistent
The LPM violates the assumption of
homoskedasticity, because $Var(y|x) = P(y=1|x)\,[1 - P(y=1|x)]$ depends on x, so inference is affected
Despite drawbacks, it's usually a good tool to
start with when y is binary
Linear Probability Model (cont)
Alternatives to LPM: nonlinear or nonparametric curve fitting
Example: P(treatment under adult criminal law | age when crime is
committed)
[Figure: P(treatment = adult), from 0 to 1, plotted against age at offense, from 16 to 24; kernel = epanechnikov, degree = 0, bandwidth = .56, pwidth = .83]
Caveats on Program Evaluation
A typical use of a dummy variable is when
we are looking for a program effect
For example, we may have individuals that
received job training, or welfare, etc
We need to remember that usually
individuals choose whether to participate in
a program, which may lead to a
self-selection problem
Self-selection Problems
If we can control for everything that is
correlated with both participation and the
outcome of interest then it's not a problem
Often, though, there are unobservables that
are correlated with participation
In this case, the estimate of the program
effect is biased, and we don't want to set
policy based on it!
Self-selection Problems
One situation in which the problem of
self-selection does not occur is when program
participation is randomized
If all individuals have the same probability
of ending up in the program then
self-selection is not a problem
Randomization solves the selection
problem
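A small simulation can illustrate the point. Everything in this sketch is assumed for illustration: the data-generating process, the unobservable called ability, the true program effect of 1.0, and the function names. The program effect is estimated by a simple difference in means, which corresponds to regressing the outcome on the participation dummy alone.

```python
import random

def diff_in_means(outcomes, treated):
    # Difference in mean outcome between participants and non-participants.
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)

def simulate(randomized, n=20000, true_effect=1.0, seed=0):
    rng = random.Random(seed)
    outcomes, treated = [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)              # unobservable, affects the outcome
        if randomized:
            d = rng.random() < 0.5             # same probability for everyone
        else:
            d = ability + rng.gauss(0, 1) > 0  # high-ability people opt in more often
        y = true_effect * d + ability + rng.gauss(0, 1)
        outcomes.append(y)
        treated.append(d)
    return diff_in_means(outcomes, treated)

est_self_selected = simulate(randomized=False)
est_randomized = simulate(randomized=True)
print(round(est_self_selected, 2), round(est_randomized, 2))
```

Under self-selection the estimate is well above the true effect because participants differ in the unobservable; under randomization it is close to the true effect.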