
Introductory Econometrics

Multiple Regression: Inference

Monash Econometrics and Business Statistics

2019

1 / 38
Recap

I In the multiple regression model

yi = β0 + β1 xi1 + ... + βk xik + ui , i = 1, 2, . . . , n ⇔ y = Xβ + u

under fairly minimal assumptions the OLS estimator


β̂ = (X′X)⁻¹X′y is an unbiased estimator of β
I These assumptions are:
1. the population model is linear in parameters;
2. the conditional expectation of true errors given all explanatory
variables is zero;
3. the sample is randomly selected; and
4. the explanatory variables are not perfectly collinear
I If we add homoskedasticity (no conditional heteroskedasticity) to the
above assumptions, then β̂ is the B.L.U.E. for β (Gauss-Markov Theorem)

2 / 38
I Under these assumptions, the variance-covariance matrix of β̂
conditional on X is σ²(X′X)⁻¹, which we estimate using

V̂ar(β̂ | X) = σ̂²(X′X)⁻¹

where

σ̂² = (Σⁿᵢ₌₁ ûᵢ²)/(n − k − 1) = û′û/(n − k − 1)
I The standard errors of the coefficients reported by all statistical
packages are the square roots of the diagonal elements of this
estimated variance-covariance matrix
I Statistical packages also report the standard error of the regression,
which is σ̂, the square root of σ̂ 2

3 / 38
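As a concrete illustration (not part of the lecture), the formulas above can be computed directly with numpy on simulated data; all names below are ours, not the textbook's:

```python
# Minimal sketch of the slide's formulas on simulated data (assumption:
# numpy is available; data and names are illustrative, not from the lecture).
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 2                                   # n observations, k regressors

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # with intercept
beta = np.array([1.0, 0.5, -0.3])
y = X @ beta + rng.normal(scale=2.0, size=n)                 # u ~ N(0, 4)

# beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# sigma_hat^2 = u_hat' u_hat / (n - k - 1)
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - k - 1)

# estimated variance-covariance matrix and the reported standard errors
vcov = sigma2_hat * XtX_inv
se = np.sqrt(np.diag(vcov))
```

Here `se` matches what a statistics package would report as the coefficient standard errors, and `np.sqrt(sigma2_hat)` is the standard error of the regression.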
Lecture Outline
I Sampling Distributions of the OLS Estimators (Textbook reference
4-1)
I Testing Hypotheses About a Single Population Parameter
(Textbook reference 4-2)
I Confidence Intervals (Textbook reference 4-3)
I Testing Single Linear Restrictions (Textbook reference 4-4)
I Testing a joint hypothesis of several restrictions - The F test
I An important special case: Testing the overall significance of a
regression model
I Application of t-test to testing a single linear restriction involving
several parameters (textbook reference 4.4)

4 / 38
Sampling Distributions of the OLS Estimators

I We want to test hypotheses about the βj . This means we
hypothesize that a population parameter takes a certain value, then
use the data to determine whether the hypothesis is likely to be false.
I Example: Hypothesis: Missing lectures has no causal effect on final
exam performance

final = β0 + β1 skipped + β2 priGPA + u

The null hypothesis is

H0 : β1 = 0

5 / 38
I The assumptions we have made so far give us:

E(β̂ | X) = β

and

Var(β̂ | X) = σ²(X′X)⁻¹

I But hypothesis testing requires the sampling distribution of β̂

6 / 38
I Assumption MLR.6 or E.5 (Normality): Conditional on X the
population errors are normally distributed
I The random sampling assumption implies that population errors are
independent of each other
I The zero conditional expectation assumption implies that
population errors have mean zero
I The homoskedasticity (no conditional heteroskedasticity) assumption
implies that population errors have constant variance σ²
I Assumption MLR.6 can therefore be expressed as: conditional on the
explanatory variables, the population errors ui are i.i.d. N(0, σ²)
I Similarly, E.5 can be expressed as u | X ∼ N(0, σ²In)
I Assumptions MLR.1 to MLR.6 are called the Classical Linear Model
(CLM) assumptions

7 / 38
The Classical Linear Model

[Figure: conditional densities f(y | x) of y at x = x1, x2, x3, each
Normal with the same variance, centred on the population regression
line β0 + β1x]

8 / 38
I The beauty of the Normal distribution is that any linear combination
of Normal random variables is also Normally distributed
I Recall that:

β̂ = (X′X)⁻¹X′y
= β + (X′X)⁻¹X′u

which shows that, conditional on X, β̂ is a linear combination of
Normally distributed errors, and therefore we obtain the following
result:
I Under the CLM assumptions

β̂ | X ∼ N(β, σ²(X′X)⁻¹)

9 / 38
I That is, for j = 0, 1, . . . , k

β̂j | X ∼ N(βj , Var(β̂j ))

where

Var(β̂j ) = σ² [(X′X)⁻¹]jj

I Therefore the standardised estimator

(β̂j − βj ) / sd(β̂j ) ∼ N(0, 1)

because ...

10 / 38
Testing Hypotheses About a Single Population Parameter

I We cannot directly use the result

(β̂j − βj ) / sd(β̂j ) ∼ N(0, 1)

to test hypotheses about βj : sd(β̂j ) depends on σ, which is unknown
I But we have σ̂ as an estimator of σ. Using it in place of σ gives us
the standard error, se(β̂j ).
I Replacing σ (an unknown constant) with σ̂ (an estimator that varies
across samples) takes us from the standard Normal to the t
distribution.

11 / 38
I Under the CLM assumptions,

(β̂j − βj ) / se(β̂j ) ∼ tn−k−1 = tdf

(no proof required)
I The t distribution also has a bell shape, but is more spread out (has
fatter tails) than the N(0, 1). It gets more similar to N(0, 1) as its
df increases.

12 / 38
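This claim is easy to verify numerically; a sketch assuming scipy is available (the specific df values are just examples):

```python
# Check that t critical values exceed the N(0,1) value and shrink toward
# it as the degrees of freedom grow (scipy assumed; df values illustrative).
from scipy.stats import norm, t

z = norm.ppf(0.975)                     # two-sided 5% critical value, N(0,1)
crit = {df: t.ppf(0.975, df) for df in (5, 30, 120)}
# fatter tails: every t critical value is larger than z,
# and they decrease toward z as df increases
```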
I We use the result on the t distribution to test the null hypothesis
about a single βj
I Most routinely we use it to test if controlling for all other x, xj has
no partial effect on y :
H0 : βj = 0
for which we use the t statistic (or t ratio),

tβ̂j = β̂j / se(β̂j )

which is computed automatically by most statistical packages for
each estimated coefficient
I But to conduct the test we need an alternative hypothesis

13 / 38
I The alternative hypothesis can be one-sided, such as H1 : βj < 0 or
H1 : βj > 0, or it can be two-sided, H1 : βj ≠ 0.
I The alternative determines what kind of evidence counts against the
null. For example, under H1 : βj < 0 we only want to reject the null
if we find evidence that βj is negative. We are not interested in any
evidence that βj is positive, and we do not reject the null if we find
such evidence.
I Example: For the effect of missing lectures on final performance, the
alternative hypothesis is that missing lectures has a negative effect
on your final score, even after controlling for how smart you are. We
are not interested in any evidence that missing lectures improves
final performance
I With two-sided alternatives, we take any evidence that βj may not
be zero, whether positive or negative, as legitimate

14 / 38
I We also need to specify α, the size or significance level of the test,
which is the probability of wrongly rejecting the null when it is true
(Type I error)
I Using the tn−k−1 distribution, the significance level and the type of
alternative, we determine the critical value that defines the rejection
region.
I For H1 : βj > 0, the rejection region is the upper tail of the tn−k−1
distribution: we reject if tcalc exceeds the critical value

15 / 38
I For H1 : βj ≠ 0, the rejection region is in both tails: we reject if
|tcalc| exceeds the critical value
I If tcalc (the value of the t statistic in our sample) falls in the
rejection region, we reject the null
I When we reject the null, we say that xj is statistically significant at
the α% level
I When we fail to reject the null, we say that xj is not statistically
significant at the α% level

16 / 38
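A sketch of the rejection rule for the three alternatives, with hypothetical numbers (scipy assumed):

```python
# Rejection decisions at significance level alpha for the three alternatives
# (scipy assumed; df, alpha and t_calc are hypothetical illustrative values).
from scipy.stats import t

df, alpha, t_calc = 60, 0.05, -2.10

c_one = t.ppf(1 - alpha, df)            # one-sided critical value
c_two = t.ppf(1 - alpha / 2, df)        # two-sided critical value

reject_left = t_calc < -c_one           # H1: beta_j < 0
reject_right = t_calc > c_one           # H1: beta_j > 0
reject_two = abs(t_calc) > c_two        # H1: beta_j != 0
```

Note that the one-sided critical value is smaller than the two-sided one: the one-sided test concentrates all of α in a single tail.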
I We can use the t test to test the null hypothesis

H0 : βj = r

where r is a constant, not necessarily zero. Under the CLM
assumptions, we know that

(β̂j − r) / se(β̂j ) ∼ tn−k−1 if H0 is true

so we test the null using this t statistic.


I Percentiles of t distribution with various degrees of freedom are
given in Table G.2 on page 833 (5th edition) of the textbook (or
page 745 of the 6th edition)

17 / 38
Steps involved in statistical verification of a question

1. Formulating H0 and H1 for the question


2. Determining the appropriate test statistic and stating its distribution
under H0
3. Determining the rejection region with reference to the null
distribution, H1 and the desired level of significance of the test (α),
using statistical tables or software
4. Calculating the test statistic from regression results
5. Arriving at a conclusion: Rejecting H0 if the value of the test
statistic falls inside the rejection region, and not rejecting H0
otherwise
6. Explaining the conclusion in the context of the question

18 / 38
Confidence Intervals

I Another way to use classical statistical testing is to construct a
confidence interval, using the same critical value as was used for a
two-sided test
I A 100(1 − α)% confidence interval is defined as

β̂j ± c × se(β̂j )

where c is the 100(1 − α/2) percentile of the tn−k−1 distribution

I The interpretation of a 100(1 − α)% confidence interval is that the
interval will cover the true parameter with probability (1 − α)
I If the confidence interval does not contain zero, we can deduce that
xj is statistically significant at the α level

19 / 38
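A sketch of the interval and the significance check, with hypothetical estimates (scipy assumed):

```python
# 95% confidence interval for a single coefficient (scipy assumed;
# beta_hat_j and se_j are hypothetical numbers, not from any regression).
from scipy.stats import t

beta_hat_j, se_j = 0.56, 0.22
df, alpha = 60, 0.05

c = t.ppf(1 - alpha / 2, df)            # 100(1 - alpha/2) percentile
ci = (beta_hat_j - c * se_j, beta_hat_j + c * se_j)

# zero outside the interval <=> xj significant at the alpha level
significant = not (ci[0] <= 0 <= ci[1])
```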
p-value of a test

I An alternative to the classical approach to hypothesis testing is to
ask: “given the evidence in the sample, what is the smallest
significance level at which the null would be rejected?” At all
significance levels below this value the null is not rejected, and at
all levels above it the null is rejected
I To find this, we compute the t statistic under the null, then look up
which percentile it corresponds to in the tn−k−1 distribution
I This value is called the p-value
I Most statistical packages report the p-value for the null of βj = 0,
assuming a two-sided alternative. If you want the p-value for a
one-sided alternative, divide the two-sided p-value by 2 (valid
provided the estimate lies on the side of the alternative)

20 / 38
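A sketch of the p-value computation (scipy assumed; the t statistic is hypothetical):

```python
# Two-sided p-value for H0: beta_j = 0, and its one-sided counterpart
# (scipy assumed; t_calc and df are hypothetical illustrative values).
from scipy.stats import t

t_calc, df = -2.10, 60

p_two = 2 * t.sf(abs(t_calc), df)       # two-sided p-value
p_one = p_two / 2                       # one-sided (estimate on side of H1)
```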
Example: Effect of missing classes on the final score

21 / 38
Testing multiple linear restrictions: The F test

I Sometimes we want to test multiple restrictions. For example, in the
regression model

y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + u

we are always interested in the overall significance of the model, by
testing

H0 : β1 = β2 = β3 = β4 = 0

or we may be interested in testing

H0 : β3 = β4 = 0

or even a more exotic hypothesis such as:

H0 : β1 = −β2 , β3 = β4 = 0

22 / 38
I The first null involves four restrictions, the second involves ...
restriction and the third involves ... restrictions
I The alternative can only be that at least one of these restrictions is
not true
I The test statistic involves estimating two equations, one without
restrictions (the unrestricted model) and one with the restrictions
imposed (the restricted model), and seeing how much their sum of
squared residuals differ
I This is particularly easy for testing exclusion restrictions like the first
two nulls on the previous slide

23 / 38
I For example for

H0 : β3 = β4 = 0 (note: 2 restrictions)

the alternative is

H1 : at least one of β3 or β4 is not zero

the unrestricted model is:

y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + u (UR)

and the restricted model is:

y = β0 + β1x1 + β2x2 + u (R)

24 / 38
I The test statistic is

F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)] ∼ Fq,n−k−1 under H0

where q is the number of restrictions, and Fq,n−k−1 is an F
distribution with (q, n − k − 1) degrees of freedom. q is called the
numerator df, and n − k − 1 is called the denominator df.
I The F statistic is always positive (why?) - so if you happen to get a
negative number, you should realise that you must have made a
mistake
I The 10%, 5% and 1% critical values of the F distribution for
various numerator and denominator dfs are given in Tables G.3a,
G.3b and G.3c of the textbook

25 / 38
I Suppose q = 3 and n − k − 1 = 60. Then from Table G.3b the 5%
Fcrit = 2.76

I We reject the null if Fcalc > 2.76, and not reject otherwise

26 / 38
A useful formulation of the F test

I The SSTs of the restricted and the unrestricted models are the same
(why?), therefore we have:

SSRr = (1 − Rr²) SST
SSRur = (1 − Rur²) SST

I This allows us to write the F statistic as:

F = [(Rur² − Rr²)/q] / [(1 − Rur²)/(n − k − 1)]

which is more useful because R² is reported more often than SSR

27 / 38
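The equivalence of the SSR form and the R² form can be checked numerically; a sketch on simulated data (numpy assumed, all names ours):

```python
# Verify that the SSR form and the R^2 form of the F statistic agree
# (numpy assumed; data simulated purely for illustration).
import numpy as np

rng = np.random.default_rng(1)
n, k, q = 80, 4, 2                      # test q exclusion restrictions

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.8, -0.5, 0.0, 0.0]) + rng.normal(size=n)

def fit(Xm):
    """Return (SSR, R^2) from an OLS fit of y on the columns of Xm."""
    b = np.linalg.lstsq(Xm, y, rcond=None)[0]
    u = y - Xm @ b
    ssr = u @ u
    sst = ((y - y.mean()) ** 2).sum()
    return ssr, 1 - ssr / sst

ssr_ur, r2_ur = fit(X)                  # unrestricted: all regressors
ssr_r, r2_r = fit(X[:, :3])             # restricted: drop x3 and x4

F_ssr = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
F_r2 = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
```

The two forms agree because both models contain an intercept, so they share the same SST.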
Test for overall significance of a model

I This is the F statistic that is reported in the Eviews output any time
we estimate a regression
I It is for the special null hypothesis that all slope parameters are zero

H0 : β1 = β2 = · · · = βk = 0

meaning that none of the explanatory variables helps explain y
I If we cannot reject this null, our model is useless
I The test statistic is

F = [R²/k] / [(1 − R²)/(n − k − 1)] ∼ Fk,n−k−1 under H0

I Eviews also produces the p-value of this test (“Prob(F-statistic)” in
the regression output)

28 / 38
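A sketch with a hypothetical R², showing how the overall-significance F statistic and its p-value are computed (scipy assumed):

```python
# Overall-significance F statistic from R^2 alone (scipy assumed;
# r2, k and n are hypothetical illustrative values).
from scipy.stats import f

r2, k, n = 0.30, 4, 65

F = (r2 / k) / ((1 - r2) / (n - k - 1))
p = f.sf(F, k, n - k - 1)               # what Eviews labels Prob(F-statistic)
```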
Example
I Determinants of long-term economic growth: controlling for initial
conditions (the size of GDP in 1960 and human capital in 1960)
and openness, is the rule of law a significant determinant of the
long-term growth rate?
I Data from “Finance and the Sources of Growth” Journal of
Financial Economics, Vol. 58, pp. 261-300, published in 2000.
I The unrestricted model:

29 / 38
I H0 : βrev coups = βassassinations = 0
I H1 : at least one of them is not zero
I F = [(SSRr − SSRur)/2] / [SSRur/(65 − 5 − 1)] ∼ F2,59 under H0
I The restricted model:

I Fcalc = [(157.1668 − 147.6766)/2] / [147.6766/(65 − 5 − 1)] = 1.896
in this sample

30 / 38
I Level of significance of the test: 5%
I The critical value from Table G.3b lies between 3.15 for F2,60 and
3.23 for F2,40. In such cases we prefer to be conservative and choose
the closest degrees of freedom below the one we are after
I Fcalc < 3.23 ⇒ there is not enough evidence to suggest that
revolutions, coups d’état or political assassinations have any effect
on long-term growth, after controlling for the effects of initial GDP
per capita, the education of the workforce and openness to trade.
I The F test for overall significance of both regressions shows that
each model has at least one significant explanatory variable for
explaining growth.

31 / 38
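The arithmetic of this example can be reproduced from the reported SSRs; scipy also gives the exact F(2, 59) critical value rather than the conservative table bound:

```python
# Reproduce F_calc for the growth example from the reported SSRs, and
# compare with the exact 5% critical value of F(2, 59) (scipy assumed).
from scipy.stats import f

ssr_r, ssr_ur = 157.1668, 147.6766      # from the slides
q, df = 2, 65 - 5 - 1                   # q restrictions, denominator df

F_calc = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)
F_crit = f.ppf(0.95, q, df)             # exact 5% critical value
reject = F_calc > F_crit                # same conclusion as the table bound
```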
Testing a single restriction that involves more than one
parameter
Textbook reference 4.4

I In the regression model

y = β0 + β1x1 + β2x2 + β3x3 + u

sometimes we want to test hypotheses such as:

H0 : β1 = β2 (Null1)

or, as another example, we may want to test

H0 : β1 + β2 + β3 = 1 (Null2)

I Both of these hypotheses state that a single linear combination of
the β is equal to a constant (what is that constant for Null1?)

32 / 38
I We could use the F test as usual; on that basis there would be no
good reason to study this case separately
I However, with a single hypothesis we can have one-sided
alternatives. For example, for Null1 we can have

H1 : β1 > β2 (Alt1)

and for Null2 we can have

H1 : β1 + β2 + β3 < 1 (Alt2)

I The F test cannot be used for one-sided alternatives
I We can reformulate the problem and use a t test

33 / 38
I Consider Null1. Define
δ = β1 − β2
I We can write Null1 and Alt1 as

H0 : δ=0
H1 : δ>0

I The estimator for δ is δ̂ = β̂1 − β̂2


I And Var (δ̂) = Var (β̂1 ) + Var (β̂2 ) − 2Cov (β̂1 , β̂2 )
I Under the CLM assumptions β̂ conditional on X is Normally
distributed, so δ̂ is also Normally distributed (why?)
I We can use the above to come up with a t-test to test H0 : δ = 0
I But that would be too messy. A more practical solution is to use a
cool reparameterisation trick

34 / 38
Reparameterisation: A cool useful trick

I Consider Null1 and Alt1

H0 : β1 − β2 = 0
H1 : β1 − β2 > 0

and set
δ = β1 − β2 ⇒ β1 = δ + β2
I Replace β1 in the model with δ + β2 and rearrange

y = β0 + (δ + β2)x1 + β2x2 + β3x3 + u ⇒
y = β0 + δx1 + β2(x1 + x2) + β3x3 + u

which shows that if we regress y on a constant, x1, (x1 + x2) and
x3, the OLS estimate of the coefficient of x1 will be δ̂, and its
standard error is given to us directly, without any messy calculations!
Amazing! (The audience goes WoW!)

35 / 38
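The trick is easy to verify by simulation; in the sketch below (numpy assumed, data simulated) the coefficient on x1 in the reparameterised regression equals β̂1 − β̂2 from the original one, and its standard error comes out directly:

```python
# Simulation check of the reparameterisation trick (numpy assumed;
# the data and coefficient values are purely illustrative).
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.9 * x1 + 0.4 * x2 - 0.2 * x3 + rng.normal(size=n)

def ols(Xm, y):
    """OLS coefficients and standard errors for a regression of y on Xm."""
    b = np.linalg.lstsq(Xm, y, rcond=None)[0]
    u = y - Xm @ b
    s2 = (u @ u) / (len(y) - Xm.shape[1])         # sigma_hat^2, df = n-k-1
    se = np.sqrt(s2 * np.diag(np.linalg.inv(Xm.T @ Xm)))
    return b, se

b, _ = ols(np.column_stack([np.ones(n), x1, x2, x3]), y)          # original
b_rep, se_rep = ols(np.column_stack([np.ones(n), x1, x1 + x2, x3]), y)

delta_hat = b[1] - b[2]                 # beta1_hat - beta2_hat by hand
# b_rep[1] equals delta_hat, and se_rep[1] is its standard error, for free
```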
Reparameterisation: Example
I Test that controlling for IQ, one additional year of education has the
same proportional effect on wage as one additional year of
experience, against the alternative that it has a larger effect:

I Note that the estimated coefficient of EDUC in the reparameterised
regression equals β̂EDUC − β̂EXPER from the original regression
I δ is the coefficient of EDUC in the reparameterised model
I tcalc = 5.59, which is larger than 1.65, the 5% one-sided critical
value of t931
I Therefore we reject the null
36 / 38
Summary
I In this lecture we learned that if we add the assumption that
population errors are normally distributed, we get that the OLS
estimator, conditional on X is normally distributed
I Using this, we learned how to test hypotheses about any of the
slope parameters, or about a single linear combination of several of
them, using a t test
I We learned that we can reparameterise the population model to
enable us to test a single linear combination of parameters easily
I We also learned how to form confidence intervals for any of the
slope parameters
I We learned that hypothesis tests could also be done with the help of
confidence intervals, or alternatively using p-values

37 / 38
Summary

I Further, we can test multiple linear restrictions using an F test


I The test statistic is based on the sums of squared residuals of the
unrestricted and the restricted models (SSRur and SSRr):

F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)] ∼ Fq,n−k−1 under H0

I Using the significance level of the test, we can find the Fcrit of the
test from the F table
I We will learn in the tutorial how to get this from Eviews
I We reject the null if Fcalc > Fcrit, and do not reject otherwise
I The F-test for overall significance of a regression model is an
important case that is provided by all statistical packages
I Finally, we learned how to test general linear restrictions

38 / 38
