ECON2206
Slides07
Lecturer: Minxian Yang
ie_Slides07 my, School of Economics, UNSW 1
7. Multiple Regression Model: Qualitative Variables (Ch7)
7. Multiple Regression Model: Binary Variables
Lecture plan
Qualitative information and dummy (binary) variables
Regression with dummy regressors
Interactions with dummy regressors
Binary dependent variable: linear probability model
Qualitative information
Many variables in the social sciences are qualitative
(non-numerical) factors that take two values.
eg. female, married, insured, etc.
They can be defined as binary (0-1) valued variables,
known as dummy variables.
eg. female = 1 if female and female = 0 otherwise.
insured = 1 if insured and insured = 0 otherwise.
The assignment of values (0,1) is often determined by
interpretation convenience.
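As a small illustration of the 0-1 coding described above, the sketch below turns qualitative answers into dummy variables. The records, field names and values are hypothetical and only serve the example.

```python
# Hypothetical survey records; field names and values are made up for illustration.
records = [
    {"gender": "female", "insured": "yes"},
    {"gender": "male", "insured": "no"},
    {"gender": "female", "insured": "no"},
]

# female = 1 if female, 0 otherwise; insured = 1 if insured, 0 otherwise
female = [1 if r["gender"] == "female" else 0 for r in records]
insured = [1 if r["insured"] == "yes" else 0 for r in records]

# The sample mean of a 0-1 variable is the share of observations coded 1
share_female = sum(female) / len(female)
print(female, insured, share_female)
```

Coding female = 1 rather than male = 1 is only a naming choice; it decides which group the coefficient is read against.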
Dummy explanatory variables
We use dummy variables to incorporate qualitative
information into regression models.
eg. Wage model
$wage = \beta_0 + \delta_0\,female + \beta_1\,educ + u$,
where $\delta_0$ characterises the gender difference in wage.
Under the ZCM assumption,
$E(wage \mid female=1, educ) = \beta_0 + \delta_0 + \beta_1\,educ$,
$E(wage \mid female=0, educ) = \beta_0 + \beta_1\,educ$,
$\delta_0$ represents an intercept shift.
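The intercept shift can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the data are hypothetical and noise-free (generated from wage = 2.0 - 1.5 female + 0.5 educ, numbers chosen only for the example), and the helper functions `solve` and `ols` are written here for the purpose, not taken from any library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical noise-free data: wage = 2.0 - 1.5*female + 0.5*educ
data = [(f, e) for f in (0, 1) for e in (8, 10, 12, 16)]
X = [[1.0, f, e] for f, e in data]          # columns: intercept, female, educ
wage = [2.0 - 1.5 * f + 0.5 * e for f, e in data]

b0, d0, b1 = ols(X, wage)

# Predicted wage at the same education level (educ = 12) for each group
pred_female = b0 + d0 + b1 * 12
pred_male = b0 + b1 * 12
gap = pred_female - pred_male               # equals the dummy coefficient d0
print(round(b0, 6), round(d0, 6), round(b1, 6), round(gap, 6))
```

The gap between the two predictions is the same at every value of educ, which is what "intercept shift" means: the two regression lines are parallel.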
Dummy explanatory variables
Interpretation of dummy
eg. Wage model (continued)
$wage = \beta_0 + \delta_0\,female + \beta_1\,educ + u$.
Would you add the male dummy in the model?
No, doing so leads to perfect collinearity (violation of MLR3).
Males are treated as the base group (against which
comparisons are made).
We could regress wage on male and educ, where females
would be the base group and the coefficient interpretation
would be different.
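The perfect collinearity can be seen directly: with an intercept, female + male equals the constant column for every observation, so X'X is singular and OLS has no unique solution. The following sketch uses a small hypothetical sample and a hand-written 3x3 determinant; all data values are assumptions made for illustration.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def gram(columns):
    """X'X for a design matrix given as a list of columns."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]


female = [1, 0, 0, 1, 1, 0]            # hypothetical sample
male = [1 - f for f in female]         # male = 1 - female for every observation
const = [1] * len(female)
educ = [12, 16, 10, 12, 14, 11]        # hypothetical years of education

det_trap = det3(gram([const, female, male]))   # intercept plus both dummies
det_ok = det3(gram([const, female, educ]))     # one dummy, males as base group
print(det_trap, det_ok)
```

A zero determinant for the first design and a nonzero one for the second is the numerical counterpart of dropping one dummy to define the base group.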
Dummy explanatory variables
eg. Wage model (continued): WAGE1.RAW
$wage = -1.57 - 1.81\,female + 0.572\,educ + 0.025\,exper + 0.141\,tenure$
with standard errors (0.72), (0.26), (0.049), (0.012), (0.021) in the same order,
$n = 526$, $R^2 = 0.364$
Negative intercept is not meaningful here.
A female worker is predicted to earn $1.81 less than a male
worker at the same level of educ, exper and tenure.
Compare the above with the simple regression
$wage = 7.10 - 2.51\,female$, with standard errors (0.21), (0.30); $n = 526$, $R^2 = 0.116$
But the coefficient of female here is not ceteris paribus.
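In a simple regression on a single dummy, the intercept is the mean of the base group and the slope is the difference between the two group means, which is why it is not a ceteris paribus effect. A minimal sketch with hypothetical wages (these numbers are not from WAGE1.RAW):

```python
from statistics import mean

# Hypothetical hourly wages and gender dummy (not WAGE1.RAW)
wage = [7.5, 6.0, 9.0, 4.5, 5.0, 8.0, 4.0, 6.5]
female = [0, 0, 0, 1, 1, 0, 1, 1]

# Simple OLS: slope = sample covariance / sample variance of the regressor
x_bar, y_bar = mean(female), mean(wage)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(female, wage))
sxx = sum((x - x_bar) ** 2 for x in female)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Group means computed directly
mean_male = mean(w for w, f in zip(wage, female) if f == 0)
mean_female = mean(w for w, f in zip(wage, female) if f == 1)
print(intercept, slope, mean_male, mean_female - mean_male)
```

The raw difference in means mixes the gender gap with any differences in educ, exper and tenure between the groups, which the multiple regression holds fixed.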
Dummy explanatory variables
log dependent variable
eg. Wage model (continued): what if y = log(wage)?
$\log(wage) = .501 - .301\,female + .087\,educ + .005\,exper + .017\,tenure$
with standard errors (.102), (.037), (.007), (.002), (.003) in the same order,
$n = 526$, $R^2 = 0.392$
A female worker is predicted to earn approximately 30.1% less
than a male worker at the same level of educ, exper and tenure.
A more accurate calculation:
$\log(wage_f / wage_m) = -.301$
$(wage_f - wage_m)/wage_m = \exp(-.301) - 1 = -.260 = -26\%$
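The two readings of the female coefficient can be compared with a few lines of arithmetic. The helper name `exact_pct_change` is introduced here for illustration; the coefficient -0.301 is the one reported on the slide.

```python
import math


def exact_pct_change(coef):
    """Exact percentage difference implied by a dummy coefficient in a log(y) model."""
    return 100.0 * (math.exp(coef) - 1.0)


delta_hat = -0.301                     # coefficient on female in the log(wage) model
approx = 100.0 * delta_hat             # log approximation, in percent
exact = exact_pct_change(delta_hat)    # exact percentage difference
print(f"approx {approx:.1f}%, exact {exact:.1f}%")
```

The approximation and the exact figure are close for small coefficients and drift apart as the coefficient grows in absolute value.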
Binary response and linear probability model
Shortcomings of LPM
The predicted probability can be either less than 0 or
greater than 1. (eg. the in-labour-force probability for
those with kidslt6 ≥ 4 is predicted to be negative, where
kidslt6 is the number of kids less than 6 years old.)
Linear function is not suitable for modelling probabilities.
Logit model: $P(y=1 \mid x) = \{1 + \exp[-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)]\}^{-1}$
Probit model: $P(y=1 \mid x) = \Phi(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)$
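To see why a linear function is awkward for probabilities, the sketch below evaluates a linear index alongside the logit and probit transformations of the same index. The coefficients `b0` and `b1` are made up for illustration, and reusing one index for all three models is only a device for comparison; estimated coefficients would differ in scale across LPM, logit and probit. The probit function uses the standard normal cdf computed from `math.erf`.

```python
import math


def logit_prob(z):
    """Logit response probability: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def probit_prob(z):
    """Probit response probability: standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical index b0 + b1 * kidslt6 with made-up coefficients
b0, b1 = 0.6, -0.25
for kids in range(0, 5):
    z = b0 + b1 * kids     # read as a probability in the LPM, so it can leave [0, 1]
    print(kids, round(z, 3), round(logit_prob(z), 3), round(probit_prob(z), 3))
```

With these numbers the linear index turns negative at four young children, while the logit and probit values stay strictly between 0 and 1 for any index.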
For LPM, it can easily be shown that
$Var(u \mid x) = Var(y \mid x) = P(y=1 \mid x)[1 - P(y=1 \mid x)]$,
ie, MLR5 does not hold as the conditional variance
depends on x (heteroskedasticity). It does not cause
estimation bias but does invalidate the usual OLS standard errors.
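The variance result follows in a few lines from the fact that y only takes the values 0 and 1. A sketch of the derivation, writing $p(x)$ for $P(y=1 \mid x)$ (a shorthand introduced here, not used on the slide):

```latex
\begin{align*}
E(y \mid x)   &= 1 \cdot p(x) + 0 \cdot [1 - p(x)] = p(x), \\
E(y^2 \mid x) &= E(y \mid x) = p(x) \quad \text{since } y^2 = y \text{ for } y \in \{0, 1\}, \\
Var(y \mid x) &= E(y^2 \mid x) - [E(y \mid x)]^2 = p(x)[1 - p(x)], \\
Var(u \mid x) &= Var(y - \beta_0 - \beta_1 x_1 - \dots - \beta_k x_k \mid x) = Var(y \mid x).
\end{align*}
```

The variance is largest (0.25) at $p(x) = 0.5$ and shrinks towards 0 as $p(x)$ approaches 0 or 1, so it changes with x whenever the response probability does.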
Summary
Dummy variables are useful to measure the ceteris
paribus differences among different groups in the
sample.
Dummies are also useful to incorporate ordinal
information.
Dummies can be interacted with other variables to
allow for different slopes for different groups, and
to test various hypotheses of interest.
Chow test and F-test are used to test for differences
in the regressions across groups.
Binary response leads to the LPM, where the fitted
values are interpreted as probabilities of success.