
LECTURE 3

MULTIPLE REGRESSION MODEL: ESTIMATION

Xi Qu
Fall 2017
The Multiple Regression Model

• Definition of the multiple regression model

Explain variable y in terms of several variables x1, x2, ..., xk:

y = β0 + β1x1 + β2x2 + … + βkxk + u

• β0: intercept parameter (constant term)
• β1, ..., βk: slope parameters (coefficients)
• y: dependent variable (explained variable, response variable, regressand, left-hand-side variable)
• x1, ..., xk: independent variables (explanatory variables, regressors, covariates, control variables, right-hand-side variables)
• u: error term (disturbance, unobservables)
Motivation for multiple regression
• Motivation:
1. Incorporate more explanatory factors into the model
2. Explicitly hold fixed other factors that otherwise would be in u
3. Allow for more flexible functional forms

• Example 1: Wage equation

wage = β0 + β1educ + β2exper + u

• wage: hourly wage; educ: years of education; exper: labor market experience; u: all other factors
• β1 now measures the effect of education explicitly holding experience fixed


• Example 2: Average test scores and per-student spending

avgscore = β0 + β1expend + β2avginc + u

• avgscore: average standardized test score of a school; expend: per-student spending at this school; avginc: average family income of students at this school; u: other factors

• Per-student spending is likely to be correlated with average family income at a given high school because of school financing

• Omitting average family income from the regression would lead to a biased estimate of the effect of spending on average test scores

• In a simple regression model, the effect of per-student spending would partly include the effect of family income on test scores
• Example 3: Family income and family consumption

cons = β0 + β1inc + β2inc² + u

• cons: family consumption; inc: family income; inc²: family income squared; u: other factors

• Model has two explanatory variables: income and income squared
• Consumption is explained as a quadratic function of income
• One has to be very careful when interpreting the coefficients:

Δcons/Δinc = β1 + 2β2inc

• By how much does consumption increase if income is increased by one unit? It depends on how much income is already there
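As a small numerical illustration of this marginal effect, the sketch below evaluates β1 + 2β2·inc at several income levels (the coefficient values are made up for the example, not taken from the lecture):

```python
# Marginal effect of income on consumption in the quadratic model
# cons = b0 + b1*inc + b2*inc^2 + u. Coefficient values are illustrative only.
b1, b2 = 0.80, -0.002

for inc in [10, 50, 100]:
    marginal_effect = b1 + 2 * b2 * inc   # d(cons)/d(inc) = b1 + 2*b2*inc
    print(f"income = {inc}: marginal effect = {marginal_effect:.3f}")
```

With b2 < 0, each additional unit of income raises consumption by less than the previous one, which is exactly the "depends on how much income is already there" point.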
• Example 4: CEO salary, sales, and CEO tenure

log(salary) = β0 + β1log(sales) + β2ceoten + β3ceoten² + u

• salary: CEO salary (in logs); sales: firm sales (in logs); ceoten, ceoten²: quadratic function of CEO tenure with the firm

• Model assumes a constant elasticity relationship between CEO salary and the sales of his or her firm
• Model assumes a quadratic relationship between CEO salary and his or her tenure with the firm

• Meaning of "linear" regression

• The model has to be linear in the parameters (not in the variables)
QUESTION
• 1. In the equation y = β0 + β1x1 + β2x2 + u, β2 is a(n) _____.
a. independent variable
b. dependent variable
c. slope parameter
d. intercept parameter

• 2. Consider the following regression equation: y = β0 + β1x1 + β2x2 + u. What does β1 imply?
a. β1 measures the ceteris paribus effect of x1 on x2.
b. β1 measures the ceteris paribus effect of y on x1.
c. β1 measures the ceteris paribus effect of x1 on y.
d. β1 measures the ceteris paribus effect of x1 on u.
OLS estimation of the multiple
regression model
• Random sample

{(xi1, xi2, ..., xik, yi) : i = 1, ..., n}

• Regression residuals

ûi = yi − ŷi = yi − β̂0 − β̂1xi1 − … − β̂kxik

• Minimize the sum of squared residuals:

min over β̂0, β̂1, ..., β̂k of Σi=1..n ûi²

• The minimization will be carried out by computer
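As a minimal sketch of what the computer does, the minimizer of the sum of squared residuals can be computed from the normal equations (X'X)β̂ = X'y. The data and true coefficient values below are simulated purely for illustration:

```python
# OLS for the multiple regression model via the normal equations.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + u            # true parameters: 1.0, 0.5, -0.3

X = np.column_stack([np.ones(n), x1, x2])    # design matrix with a constant
beta_hat = np.linalg.solve(X.T @ X, X.T @ y) # minimizes the sum of squared residuals
print(beta_hat)                              # close to [1.0, 0.5, -0.3]
```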
Interpretation of the multiple
regression model
Ideally, we want to interpret the coefficient as

Δy/Δxj = βj:

"By how much does the dependent variable change if the j-th independent variable is increased by one unit, holding all other independent variables constant?"

• This holds as long as the unobserved factors do not change when the explanatory variables are changed, i.e. Δu/Δxj = 0 (infeasible to guarantee!)

• The multiple linear regression model manages to hold the values of other explanatory variables fixed even if, in reality, they are correlated with the explanatory variable under consideration

• "Ceteris paribus" interpretation
FORMAL INTERPRETATION

• Formally, if

E(u | x1, ..., xk) = 0,

then

E(y | x1, ..., xk) = β0 + β1x1 + … + βkxk,

and

βj = ΔE(y | x1, ..., xk)/Δxj:

"By how much does the expected (average) value of the dependent variable change if the j-th independent variable is increased by one unit, holding all other independent variables constant?"

• Need to explicitly state "holding (the other factors) x1, ..., xj-1, xj+1, ..., xk constant"
• Example 1: Determinants of college GPA

Fitted equation: colGPA = 1.29 + .453 hsGPA + .0094 ACT

• colGPA: grade point average at college; hsGPA: high school grade point average; ACT: achievement test score

• Interpretation
• Holding ACT fixed, another point on hsGPA is associated with another .453 points of college grade point average

• Or: If we compare two students with the same ACT, but the hsGPA of student A is one point higher, we predict student A to have a colGPA that is .453 higher than that of student B

• Holding hsGPA fixed, on average, another 10 points on ACT are associated with less than one-tenth of a point on college GPA
“Partialling out” interpretation
of multiple regression
• The estimated coefficient of an explanatory variable can be obtained in two steps:
• 1) Regress the explanatory variable on all other explanatory variables and keep the residuals
• 2) Regress y on the residuals from this regression

• Why does this procedure work?

• The residuals from the first regression are the part of the explanatory variable that is uncorrelated with the other explanatory variables
• The slope coefficient of the second regression therefore represents the isolated effect of the explanatory variable on the dependent variable
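A quick numerical check of this two-step ("partialling out", or Frisch-Waugh) result, on simulated data chosen only for illustration:

```python
# Partialling-out check: the coefficient on x1 from the full regression equals
# the slope from regressing y on the residuals of (x1 regressed on 1 and x2).
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)           # x1 correlated with x2
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.solve(X_full.T @ X_full, X_full.T @ y)

# Step 1: regress x1 on a constant and x2, keep the residuals r1
X_other = np.column_stack([np.ones(n), x2])
gamma = np.linalg.solve(X_other.T @ X_other, X_other.T @ x1)
r1 = x1 - X_other @ gamma

# Step 2: regress y on r1 (no constant needed; r1 has mean ~0)
slope = (r1 @ y) / (r1 @ r1)

print(beta_full[1], slope)                   # the two numbers agree
```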
Properties of OLS
• Fitted values and residuals

ŷi = β̂0 + β̂1xi1 + … + β̂kxik        ûi = yi − ŷi

• ŷi: fitted or predicted values; ûi: residuals (deviations from the regression line)

• Algebraic properties of OLS regression

(1) Σi=1..n ûi = 0    (2) Σi=1..n xij ûi = 0 for each j    (3) ȳ = β̂0 + β̂1x̄1 + … + β̂kx̄k

• (1) Deviations from the regression line sum up to zero
• (2) Correlation between residuals and regressors is zero
• (3) The sample averages of y and the x's lie on the regression line
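These algebraic properties are easy to verify numerically; a self-contained sketch on simulated data:

```python
# Verify the three algebraic properties of OLS residuals on simulated data.
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat

print(u_hat.sum())                           # ~0: residuals sum to zero
print(X.T @ u_hat)                           # ~0 vector: orthogonal to each regressor
print(y.mean() - X.mean(axis=0) @ beta_hat)  # ~0: sample means lie on the line
```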
Goodness of Fit
• Decomposition of total variation:

SST = SSE + SSR,

where SST = Σi (yi − ȳ)², SSE = Σi (ŷi − ȳ)², and SSR = Σi ûi².

• R-squared:

R² = SSE/SST = 1 − SSR/SST

• Notice that R-squared can only increase if another explanatory variable is added to the regression

• Alternative expression for R-squared:

R² = [Corr(yi, ŷi)]²

R-squared is equal to the squared correlation coefficient between the actual and the predicted values of the dependent variable
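Both expressions can be checked in a few lines (simulated data, illustrative only):

```python
# R-squared computed two ways: 1 - SSR/SST, and the squared correlation
# between the actual and fitted values of the dependent variable.
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat

ssr = np.sum((y - y_hat) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2_from_sums = 1 - ssr / sst
r2_from_corr = np.corrcoef(y, y_hat)[0, 1] ** 2

print(r2_from_sums, r2_from_corr)   # the two values coincide
```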
• Example: Explaining arrest records

narr86 = .712 − .150 pcnv − .034 ptime86 − .104 qemp86

• narr86: number of times arrested in 1986, per 100 men; pcnv: proportion of prior arrests that led to conviction; ptime86: months in prison in 1986; qemp86: quarters employed in 1986

• Interpretation (holding the other factors constant):

• Proportion of prior arrests +0.5: −.150(.5) = −.075, i.e. 7.5 fewer arrests per 100 men
• Months in prison +12: −.034(12) = −.408 arrests for a given man
• Quarters employed +1: −.104, i.e. 10.4 fewer arrests per 100 men
• Example: Explaining arrest records (cont.)
• An additional explanatory variable is added: the average sentence in prior convictions

• R-squared increases only slightly

• Interpretation:
• Average prior sentence increases the number of arrests (?)
• Limited additional explanatory power, as R-squared increases only a little

• General remark on R-squared
• Even if R-squared is small (as in the given example), the regression may still provide good estimates of ceteris paribus effects
QUESTION
• 3. Which of the following is true of R2?
a. R2 is also called the standard error of regression.
b. A low R2 indicates that the Ordinary Least Squares line fits the data well.
c. R2 usually decreases with an increase in the number of independent variables in a regression.
d. R2 shows what percentage of the total variation in the dependent variable, Y, is explained by the explanatory variables.

• 4. The value of R2 always _____.
a. lies below 0
b. lies above 1
c. lies between 0 and 1
d. lies between 1 and 1.5
Standard Assumptions
• Assumption MLR.1 (Linear in parameters)

y = β0 + β1x1 + … + βkxk + u

In the population, the relationship between y and the x's is linear.

• Assumption MLR.2 (Random sampling)

{(xi1, xi2, ..., xik, yi) : i = 1, ..., n}

The data are a random sample drawn from the population. Each data point therefore follows the population equation:

yi = β0 + β1xi1 + β2xi2 + … + βkxik + ui
Standard Assumptions
• Assumption MLR.3 (No perfect collinearity)
In the sample (and therefore in the population), none of the independent variables is constant and there are no exact linear relationships among the independent variables

• Remarks on MLR.3
• The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed

• If an explanatory variable is a perfect linear combination of other explanatory variables, it is superfluous and may be eliminated

• Once the model includes an intercept, constant variables are also ruled out (they are collinear with the intercept)
Example for perfect collinearity
• Example 1:
cons = β0 + β1log(income) + β2log(income²) + u
(log(income²) = 2 log(income) is an exact linear function of log(income))

• Example 2:
cons = β0 + β1husbIncome + β2wifeIncome + β3familyIncome + u
(familyIncome = husbIncome + wifeIncome)

• Example 3:
expenditure = β0 + β1deathrate + β2surviverate + u
(deathrate + surviverate = 1, an exact linear relationship involving the intercept)
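A sketch of what perfect collinearity does numerically: the design matrix loses rank, so the normal equations have no unique solution. The data and variable names below are simulated for illustration, echoing Example 2:

```python
# Perfect collinearity makes the design matrix rank-deficient, so the
# normal equations (X'X)b = X'y have no unique solution.
import numpy as np

rng = np.random.default_rng(4)
n = 200
husb = rng.normal(50.0, 10.0, size=n)
wife = rng.normal(40.0, 10.0, size=n)
family = husb + wife                      # exact linear combination of the others

X = np.column_stack([np.ones(n), husb, wife, family])
print(np.linalg.matrix_rank(X))           # 3, not 4: one column is redundant
print(np.linalg.cond(X.T @ X))            # enormous condition number

# Dropping the redundant column restores a well-behaved, full-rank problem.
X_ok = X[:, :3]
print(np.linalg.matrix_rank(X_ok))        # 3 = number of columns
```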
QUESTION
• 5. If an independent variable in a multiple linear regression model is an exact linear combination of other independent variables, the model suffers from the problem of _____.
a. perfect collinearity
b. homoskedasticity
c. heteroskedasticity
d. omitted variable bias

• 6. The assumption that there are no exact linear relationships among the independent variables in a multiple linear regression model fails if _____, where n is the sample size and k is the number of parameters.
a. n > 2
b. n = k + 1
c. n > k
d. n < k + 1
Standard Assumptions
• Assumption MLR.4 (Zero conditional mean)

E(u | x1, x2, ..., xk) = 0

• The values of the explanatory variables must contain no information about the mean of the unobserved factors

• This is the key assumption for unbiased estimators and for the "ceteris paribus" interpretation

• In a multiple regression model, the zero conditional mean assumption is much more likely to hold, because fewer things end up in the error
• Example 1: Wage equation

wage = β0 + β1educ + β2abil + u

If ability were not included in the regression, it would end up in the error term; it would then be hard to defend that education is uncorrelated with the error.

• Example 2: Housing price

price = β0 + β1size + β2bedrooms + u

If size were not included in the regression, it would end up in the error term; it would then be hard to defend that the number of bedrooms is uncorrelated with size, which would be in the error.

• Example 3: Average test scores

avgscore = β0 + β1expend + β2avginc + u

If average income were not included in the regression, it would end up in the error term; it would then be hard to defend that school expenditure is uncorrelated with the error.
• Discussion of the zero conditional mean assumption
• Explanatory variables that are correlated with the error term are called endogenous.
• Explanatory variables that are uncorrelated with the error term are called exogenous.
• Exogeneity is the key assumption for a causal interpretation of the regression, and for unbiasedness of the OLS estimators.

• Theorem 3.1 (Unbiasedness of OLS)
Under assumptions MLR.1 through MLR.4,

E(β̂j) = βj,   j = 0, 1, ..., k

• Unbiasedness is an average property in repeated samples; in a given sample, the estimates may still be far away from the true values
• Including irrelevant variables in a regression model

y = β0 + β1x1 + β2x2 + β3x3 + u,   with β3 = 0 in the population

No problem for unbiasedness, because E(β̂3) = 0 and the other estimators remain unbiased.
However, including irrelevant variables may increase the sampling variance.

• Omitting relevant variables can cause the problem of endogeneity

True model (contains x1 and x2):
y = β0 + β1x1 + β2x2 + u

Estimated model (x2 is omitted): y is regressed on x1 alone, yielding estimates β̃0 and β̃1
• Omitted variable bias

If x1 and x2 are correlated, assume a linear regression relationship between them:

x2 = δ0 + δ1x1 + v,   where v is an error term

Substituting into the true model gives

y = (β0 + β2δ0) + (β1 + β2δ1)x1 + (u + β2v)

If y is only regressed on x1, then β0 + β2δ0 will be the estimated intercept and β1 + β2δ1 the estimated slope on x1, so that

E(β̃1) = β1 + β2δ1

• Conclusion: In general, all estimated coefficients will be biased
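A small simulation of this bias formula (all numbers illustrative): with β2 = 1 and δ1 = 0.5, the short regression's slope should center on β1 + β2δ1 = 1.5 rather than the true β1 = 1.0.

```python
# Simulate omitted variable bias: E(short-regression slope) = b1 + b2*d1.
import numpy as np

rng = np.random.default_rng(5)
b0, b1, b2 = 0.0, 1.0, 1.0        # true model: y = b0 + b1*x1 + b2*x2 + u
d0, d1 = 0.0, 0.5                 # relationship between regressors: x2 = d0 + d1*x1 + v

slopes = []
for _ in range(2000):
    n = 200
    x1 = rng.normal(size=n)
    x2 = d0 + d1 * x1 + rng.normal(size=n)
    y = b0 + b1 * x1 + b2 * x2 + rng.normal(size=n)
    # Short regression of y on a constant and x1 only (x2 omitted)
    X_short = np.column_stack([np.ones(n), x1])
    slope = np.linalg.lstsq(X_short, y, rcond=None)[0][1]
    slopes.append(slope)

print(np.mean(slopes))            # ~1.5 = b1 + b2*d1, not the true b1 = 1.0
```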
• Example: Omitting ability in a wage equation

wage = β0 + β1educ + β2abil + u,   abil = δ0 + δ1educ + v

Here β2 and δ1 will both be positive, so the bias β2δ1 is positive.

The return to education will be overestimated, because E(β̃1) = β1 + β2δ1 > β1.
It will look as if people with many years of education earn very high wages, but this is partly due to the fact that people with more education are also more able on average.

• When is there no omitted variable bias?

• If the omitted variable is irrelevant (β2 = 0) or uncorrelated with the included variable (δ1 = 0)
• Omitted variable bias: more general cases

True model (contains x1, x2, and x3):
y = β0 + β1x1 + β2x2 + β3x3 + u

Estimated model (x3 is omitted): y is regressed on x1 and x2 only

• No general statements are possible about the direction of the bias

• The analysis works as in the simple case if the omitted regressor is uncorrelated with the other included regressors

• Example: Omitting ability in a wage equation

wage = β0 + β1educ + β2exper + β3abil + u

If experience is approximately uncorrelated with education and ability, then the direction of the omitted variable bias can be analyzed as in the simple two-variable case.
QUESTION
• 7. Exclusion of a relevant variable from a multiple linear regression model leads to the problem of _____.
a. misspecification of the model
b. multicollinearity
c. perfect collinearity
d. homoskedasticity

• 8. Suppose the variable x2 has been omitted from the following regression equation: y = β0 + β1x1 + β2x2 + u. Let β̃1 be the estimator obtained when x2 is omitted from the equation. The bias in β̃1 is positive if _____.
a. β2 > 0 and x1 and x2 are positively correlated
b. β2 < 0 and x1 and x2 are positively correlated
c. β2 > 0 and x1 and x2 are negatively correlated
d. β2 = 0 and x1 and x2 are negatively correlated
Standard Assumptions
• Assumption MLR.5 (Homoscedasticity)

Var(u | x1, ..., xk) = σ²

The values of the explanatory variables must contain no information about the variance of the unobserved factors.

• Example: Wage equation

Var(u | educ, exper) = σ²

This assumption may also be hard to justify in many cases.

• Shorthand notation:

Var(u | x) = σ²,   with x = (x1, x2, ..., xk)

All explanatory variables are collected in a random vector x.
• Theorem 3.2 (Sampling variances of the OLS slope estimators)

Under assumptions MLR.1 through MLR.5:

Var(β̂j) = σ² / [SSTj (1 − Rj²)],   j = 1, ..., k

where
• σ² is the variance of the error term
• SSTj = Σi (xij − x̄j)² is the total sample variation in explanatory variable xj
• Rj² is the R-squared from a regression of explanatory variable xj on all other independent variables (including a constant)
Components of OLS Variances:
• 1) The error variance σ²
• A high error variance increases the sampling variance because there is more "noise" in the equation
• A large error variance necessarily makes estimates imprecise
• The error variance does not decrease with the sample size

• 2) The total sample variation in the explanatory variable, SSTj
• More sample variation leads to more precise estimates
• Total sample variation automatically increases with the sample size
• Increasing the sample size is thus a way to get more precise estimates
• 3) Linear relationships among the independent variables, Rj²
Regress xj on all other independent variables (including a constant):
the R-squared of this regression, Rj², will be higher the better xj can be linearly explained by the other independent variables

• The sampling variance of β̂j will be higher the better the explanatory variable xj can be linearly explained by the other independent variables
• The problem of almost linearly dependent explanatory variables is called multicollinearity (i.e. Rj² close to 1 for some j)
• The extreme case is perfect collinearity, where an exact linear relationship exists among the explanatory variables.
• An example for multicollinearity

avgscore = β0 + β1teacherexp + β2matexp + β3otherexp + u

where avgscore is the average standardized test score of a school, teacherexp the expenditures for teachers, matexp the expenditures for instructional materials, and otherexp other expenditures.

• The different expenditure categories will be strongly correlated. It will be hard to estimate the differential effects of different expenditure categories, because all expenditures are either high or low.

For precise estimates of the differential effects, one would need information about situations where the expenditure categories change differentially.

As a consequence, the sampling variance of the estimated effects will be large.
• Discussion of the multicollinearity problem
• In the above example, it might be better to lump all expenditure categories together, because their effects cannot be disentangled.

• In other cases, dropping some independent variables may reduce multicollinearity (but this may lead to omitted variable bias).

• Only the sampling variance of the variables involved in multicollinearity will be inflated; the estimates of the other effects may still be very precise.

• Note that multicollinearity is not a violation of MLR.3.

• Multicollinearity may be detected through "variance inflation factors":

VIFj = 1 / (1 − Rj²)

As an (arbitrary) rule of thumb, the variance inflation factor should not be larger than 10.
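A sketch of computing VIFs directly from the auxiliary regressions (simulated data; in practice a library routine such as statsmodels' variance_inflation_factor could be used, but the computation below assumes only numpy):

```python
# Variance inflation factor: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing x_j on the other explanatory variables (plus a constant).
import numpy as np

rng = np.random.default_rng(6)
n = 300
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)   # strongly correlated with x1
x3 = rng.normal(size=n)                    # unrelated to the others
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j of X, from the auxiliary regression on the other columns."""
    others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ coef
    r2 = 1 - np.sum(resid ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1.0 / (1.0 - r2)

for j in range(3):
    print(f"VIF for x{j+1}: {vif(X, j):.2f}")   # large for x1 and x2, near 1 for x3
```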
The Error Variance:
• Estimating the error variance

σ̂² = SSR / (n − k − 1) = (1 / (n − k − 1)) Σi=1..n ûi²

• The number of observations minus the number of estimated parameters, n − (k + 1), is also called the degrees of freedom.

• The n estimated squared residuals in the sum are not completely independent, but related through the k + 1 equations that define the first order conditions of the minimization problem.

• With this degrees-of-freedom correction, the estimator of the error variance is unbiased.

• Theorem 3.3 (Unbiased estimator of the error variance)
Under assumptions MLR.1 through MLR.5,

E(σ̂²) = σ²
• Estimation of the sampling variances of the OLS estimators

• The true sampling variance of the estimated β̂j:

Var(β̂j) = σ² / [SSTj (1 − Rj²)]

• Plug in σ̂² for the unknown σ² to get the estimated sampling variance:

Var̂(β̂j) = σ̂² / [SSTj (1 − Rj²)]

• Its square root, se(β̂j) = σ̂ / [SSTj (1 − Rj²)]^(1/2), is the standard error of β̂j.

• Note that these formulas are only valid under assumptions MLR.1 through MLR.5 (in particular, there has to be homoscedasticity)
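A self-contained sketch that computes σ̂² and a standard error from these formulas and checks it against the equivalent matrix expression σ̂²(X'X)⁻¹ (simulated data, illustrative coefficient values):

```python
# Standard error of an OLS slope: se(b_j) = sqrt(sigma2_hat / (SST_j*(1 - R_j^2))),
# which matches the corresponding diagonal entry of sigma2_hat * (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(7)
n, k = 500, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - k - 1)          # SSR / degrees of freedom

# Formula-based standard error for x1 (column index 1)
others = np.column_stack([np.ones(n), x2])
g, *_ = np.linalg.lstsq(others, x1, rcond=None)
r2_1 = 1 - np.sum((x1 - others @ g) ** 2) / np.sum((x1 - x1.mean()) ** 2)
sst_1 = np.sum((x1 - x1.mean()) ** 2)
se_formula = np.sqrt(sigma2_hat / (sst_1 * (1 - r2_1)))

# Matrix-based standard error
se_matrix = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])

print(se_formula, se_matrix)                        # the two agree
```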
Efficiency of OLS:
• The Gauss-Markov Theorem
• Under assumptions MLR.1 through MLR.5, OLS is unbiased
• However, under these assumptions there may be many other estimators that are unbiased
• Which one is the unbiased estimator with the smallest variance?
• In order to answer this question, one usually limits oneself to linear estimators, i.e. estimators that are linear in the dependent variable:

β̃j = Σi=1..n wij yi

where the weights wij may be arbitrary functions of the sample values of all the explanatory variables; the OLS estimator can be shown to be of this form.
• Theorem 3.4 (Gauss-Markov Theorem)
• Under assumptions MLR.1 through MLR.5, the OLS estimators are the best linear unbiased estimators (BLUEs) of the regression coefficients, i.e.

Var(β̂j) ≤ Var(β̃j),   j = 0, 1, ..., k,

for all linear estimators β̃j for which E(β̃j) = βj.

• OLS is only the best estimator if MLR.1 through MLR.5 hold; if there is heteroscedasticity, for example, there are better estimators.
QUESTION
• 9. Find the degrees of freedom in a regression model that has 10 observations and 7 independent variables.
a. 17
b. 2
c. 3
d. 4

• 10. The Gauss-Markov theorem will not hold if _____.
a. the error term has the same variance given any values of the explanatory variables
b. the error term has an expected value of zero given any values of the independent variables
c. the independent variables have exact linear relationships among them
d. the regression model relies on the method of random sampling for collection of data
