
LECTURE 4

Introductory Econometrics
Hypothesis testing
October 25, 2011
ON THE PREVIOUS LECTURE

We have listed the classical assumptions of regression models:
- model linear in parameters, explanatory variables linearly independent
- (normally distributed) error term with zero mean and constant variance, no serial autocorrelation
- no correlation between error term and explanatory variables

We have shown that if these assumptions hold, the OLS estimate is
- consistent
- unbiased
- efficient
- normally distributed
ON TODAY'S LECTURE

- We are going to discuss how hypotheses about coefficients can be tested in regression models
- We will explain what significance of coefficients means
- We will learn how to read regression output
QUESTIONS WE ASK

- What conclusions can we draw from our regression?
- What can we learn about the real world from a sample?
- Is it likely that our results could have been obtained by chance?
- If our theory is correct, what are the odds that this particular sample would have been observed?
HYPOTHESIS TESTING

- We cannot prove that a given hypothesis is correct using hypothesis testing
- All that can be done is to state that a particular sample conforms to a particular hypothesis
- We can often reject a given hypothesis with a certain degree of confidence
- In such a case, we conclude that it is very unlikely the sample result would have been observed if the hypothesized theory were correct
NULL AND ALTERNATIVE HYPOTHESES

- First step in hypothesis testing: state explicitly the hypothesis to be tested
- Null hypothesis: statement of the range of values of the regression coefficient that would be expected to occur if the researcher's theory were not correct
- Alternative hypothesis: specification of the range of values of the coefficient that would be expected to occur if the researcher's theory were correct
- In other words: we define the null hypothesis as the result we do not expect
NULL AND ALTERNATIVE HYPOTHESES

Notation:
- H_0 ... null hypothesis
- H_A ... alternative hypothesis

Examples:
- One-sided test:  H_0: \beta \leq 0  vs  H_A: \beta > 0
- Two-sided test:  H_0: \beta = 0  vs  H_A: \beta \neq 0
TYPE I AND TYPE II ERRORS

- It would be unrealistic to think that conclusions drawn from regression analysis will always be right
- There are two types of errors we can make:
  - Type I: we reject a true null hypothesis
  - Type II: we do not reject a false null hypothesis
- Example:  H_0: \beta = 0  vs  H_A: \beta \neq 0
  - Type I error: it holds that \beta = 0, but we conclude that \beta \neq 0
  - Type II error: it holds that \beta \neq 0, but we conclude that \beta = 0
TYPE I AND TYPE II ERRORS

Example:
- H_0: the defendant is innocent
- H_A: the defendant is guilty
- Type I error = sending an innocent person to jail
- Type II error = freeing a guilty person

Obviously, lowering the probability of a Type I error means increasing the probability of a Type II error.
In hypothesis testing, we focus on the Type I error and we ensure that its probability is not unreasonably large.
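The meaning of the Type I error rate can be illustrated with a small simulation (a sketch, not part of the lecture): when H_0 is true, a test run at the 5% level should falsely reject in roughly 5% of samples. Here we test H_0: mu = 0 on data generated with mu = 0, using the large-sample normal critical value 1.96.

```python
# Simulate the Type I error rate of a 5%-level two-sided test of H0: mu = 0.
import math
import random

random.seed(42)                      # fixed seed for reproducibility

def t_statistic(sample):
    """Sample mean divided by its estimated standard error."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean / math.sqrt(var / n)

n_sims, n_obs = 2000, 100
rejections = sum(
    1
    for _ in range(n_sims)
    if abs(t_statistic([random.gauss(0.0, 1.0) for _ in range(n_obs)])) > 1.96
)
rejection_rate = rejections / n_sims
print(f"empirical Type I error rate: {rejection_rate:.3f}")   # close to 0.05
```

Tightening the critical value (say, to 2.58, the 1% level) would lower this rejection rate, but would also make it harder to reject a false H_0, i.e. raise the Type II error rate.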
DECISION RULE

- A sample statistic must be calculated that allows the null hypothesis to be rejected or not, depending on the magnitude of that sample statistic compared with a preselected critical value found in tables
- The critical value divides the range of possible values of the statistic into two regions: the acceptance region and the rejection region
- The idea is that if the value of the coefficient is as stated under H_0, the value of the sample statistic should not fall into the rejection region
- If the value of the sample statistic falls into the rejection region, we reject H_0
ONE-SIDED REJECTION REGION

H_0: \beta \leq 0  vs  H_A: \beta > 0

[Figure: distribution of \hat\beta under H_0, with the acceptance region to the left of the critical value and the rejection region in the right tail; the tail area is the probability of a Type I error]
TWO-SIDED REJECTION REGION

H_0: \beta = 0  vs  H_A: \beta \neq 0

[Figure: distribution of \hat\beta under H_0, with the acceptance region in the middle and a rejection region in each tail; the two tail areas together equal the probability of a Type I error]
THE t-TEST

- We use the t-test to test hypotheses about individual regression slope coefficients
- Tests of more than one coefficient at a time (joint hypotheses) are typically done with the F-test (see next lecture)
- The t-test is appropriate to use when the stochastic error term is normally distributed and when the variance of that distribution must be estimated
- The t-test accounts for differences in the units of measurement of the variables
THE t-TEST

Consider the model

    y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon

Suppose we want to test (b is some constant)

    H_0: \beta_1 = b  vs  H_A: \beta_1 \neq b

We know from the last lecture that

    \hat\beta_1 \sim N\left( \beta_1, \mathrm{Var}(\hat\beta_1) \right) ,
    \qquad
    \frac{\hat\beta_1 - \beta_1}{\sqrt{\mathrm{Var}(\hat\beta_1)}} \sim N(0, 1) ,

where \mathrm{Var}(\hat\beta_1) is an element of the covariance matrix of \hat\beta.
THE t-TEST

    \mathrm{Var}(\hat\beta) =
    \begin{pmatrix}
    \mathrm{Var}(\hat\beta_0) & \mathrm{Cov}(\hat\beta_0, \hat\beta_1) & \mathrm{Cov}(\hat\beta_0, \hat\beta_2) \\
    \mathrm{Cov}(\hat\beta_1, \hat\beta_0) & \mathrm{Var}(\hat\beta_1) & \mathrm{Cov}(\hat\beta_1, \hat\beta_2) \\
    \mathrm{Cov}(\hat\beta_2, \hat\beta_0) & \mathrm{Cov}(\hat\beta_2, \hat\beta_1) & \mathrm{Var}(\hat\beta_2)
    \end{pmatrix}
    = \sigma^2 (X'X)^{-1}

In particular,

    \mathrm{Var}(\hat\beta_1) = \sigma^2 \left[ (X'X)^{-1} \right]_{22}
THE t-TEST

Problem: we do not know the value of the parameter \sigma^2

It has to be estimated as

    \hat\sigma^2 := s^2 = \frac{e'e}{n - k} ,

where k is the number of regression coefficients (here k = 3)

It can be shown that

    \frac{(n - k) s^2}{\sigma^2} \sim \chi^2_{n-k}

We denote the standard error of \hat\beta_1 (the sample counterpart of the standard deviation of \hat\beta_1)

    \mathrm{s.e.}(\hat\beta_1) = \sqrt{ s^2 \left[ (X'X)^{-1} \right]_{22} }
THE t-TEST

We define the t-statistic

    t := \frac{ (\hat\beta_1 - \beta_1) \Big/ \sqrt{\sigma^2 \left[ (X'X)^{-1} \right]_{22}} }
              { \sqrt{ \dfrac{(n-k) s^2}{\sigma^2} \Big/ (n-k) } }
    \;\sim\; \frac{N(0,1)}{\sqrt{ \chi^2_{n-k} \big/ (n-k) }} \;=\; t_{n-k}

which simplifies to

    t = \frac{\hat\beta_1 - \beta_1}{\sqrt{ s^2 \left[ (X'X)^{-1} \right]_{22} }}
      = \frac{\hat\beta_1 - \beta_1}{\mathrm{s.e.}(\hat\beta_1)}

This statistic depends only on the estimate \hat\beta_1 and our hypothesis about \beta_1, and it has a known distribution.
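As a minimal numerical sketch of this machinery (pure Python, with invented toy data, not the lecture's data set), consider a simple regression y = \beta_0 + \beta_1 x + \varepsilon, where \left[(X'X)^{-1}\right]_{22} reduces to 1/S_{xx}:

```python
# Compute b1_hat, s^2 = e'e/(n-k), s.e.(b1_hat) = sqrt(s^2/Sxx), and the
# t-statistic for H0: b1 = 0 by hand, on made-up toy data.
import math

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]

n = len(x)
k = 2                                    # number of coefficients: b0 and b1
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1_hat = s_xy / s_xx                     # OLS slope estimate
b0_hat = y_bar - b1_hat * x_bar          # OLS intercept estimate
residuals = [yi - (b0_hat + b1_hat * xi) for xi, yi in zip(x, y)]
s2 = sum(e * e for e in residuals) / (n - k)   # s^2 = e'e / (n - k)
se_b1 = math.sqrt(s2 / s_xx)                   # s.e.(b1_hat)

t = (b1_hat - 0.0) / se_b1               # t-statistic for H0: b1 = 0
print(f"b1_hat = {b1_hat:.4f}, s.e. = {se_b1:.4f}, t = {t:.2f}")
```

With such a tight fit the t-statistic comes out far above any conventional critical value of the t_{n-k} distribution, so H_0: \beta_1 = 0 would be rejected.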
TWO-SIDED t-TEST

Our hypothesis is

    H_0: \beta_1 = b  vs  H_A: \beta_1 \neq b

Hence, our t-statistic is

    t = \frac{\hat\beta_1 - b}{\mathrm{s.e.}(\hat\beta_1)}

We set the probability of a Type I error to 5%: we say the significance level of the test is 5%, or that we test at the 95% confidence level

We compare our statistic to the critical values t_{n-k, 0.975} and t_{n-k, 0.025} (note that t_{n-k, 0.025} = -t_{n-k, 0.975})
TWO-SIDED t-TEST

[Figure: density of the t_{n-k} distribution; rejection regions of 2.5% in each tail, beyond the critical values t_{n-k, 0.025} and t_{n-k, 0.975}; acceptance region in between; significance level 5%]

We reject H_0 if |t| > t_{n-k, 0.975}
ONE-SIDED t-TEST

Suppose our hypothesis is

    H_0: \beta_1 \leq b  vs  H_A: \beta_1 > b

Our t-statistic still is

    t = \frac{\hat\beta_1 - b}{\mathrm{s.e.}(\hat\beta_1)}

We set the probability of a Type I error to 5%

We compare our statistic to the critical value t_{n-k, 0.95}
ONE-SIDED t-TEST

[Figure: density of the t_{n-k} distribution; rejection region of 5% in the right tail, beyond the critical value t_{n-k, 0.95}]

We reject H_0 if t > t_{n-k, 0.95}
SIGNIFICANCE OF THE COEFFICIENT

The most common test performed in regression is

    H_0: \beta = 0  vs  H_A: \beta \neq 0

with the t-statistic

    t = \frac{\hat\beta}{\mathrm{s.e.}(\hat\beta)} \sim t_{n-k}

If we reject H_0: \beta = 0, we say the coefficient is significant

This t-statistic (and the corresponding p-value) are displayed in most regression outputs
EXAMPLE

Let us study the impact of years of education on wages:

    wage = \beta_0 + \beta_1 \, education + \beta_2 \, experience + \varepsilon

Output from Gretl:

    gretl output for Pavla Nikolovova  2011-10-22 23:20
    Model 3: OLS, using observations 1-526
    Dependent variable: wage

                 coefficient   std. error   t-ratio    p-value
      --------------------------------------------------------
      const      -3.39054      0.766566     -4.423     1.18e-05  ***
      educ        0.644272     0.0538061    11.97      2.28e-29  ***
      exper       0.0700954    0.0109776     6.385     3.78e-10  ***

    Mean dependent var   5.896103   S.D. dependent var   3.693086
    Sum squared resid    5548.160   S.E. of regression   3.257044
    R-squared            0.225162   Adjusted R-squared   0.222199
    F(2, 523)            75.98998   P-value(F)           1.07e-29
    Log-likelihood      -1365.969   Akaike criterion     2737.937
    Schwarz criterion    2750.733   Hannan-Quinn         2742.948
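A quick sanity check on the printout above: each reported t-ratio is just the coefficient estimate divided by its standard error (i.e. the t-statistic for H_0: \beta = 0).

```python
# Recompute the t-ratios from the (estimate, std. error) pairs in the output.
estimates = {
    "const": (-3.39054, 0.766566),
    "educ":  (0.644272, 0.0538061),
    "exper": (0.0700954, 0.0109776),
}
t_ratios = {name: b / se for name, (b, se) in estimates.items()}
for name, t in t_ratios.items():
    print(f"{name:5s}  t = {t:7.3f}")
```

The tiny p-values next to these t-ratios say that values this extreme would essentially never arise by chance if the true coefficients were zero, so all three coefficients are significant.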
EXAMPLE

Output from Stata:

          Source |       SS       df       MS              Number of obs =     526
      -----------+------------------------------           F(  2,   523) =   75.99
           Model |   1612.2545     2  806.127251           Prob > F      =  0.0000
        Residual |  5548.15979   523  10.6083361           R-squared     =  0.2252
      -----------+------------------------------           Adj R-squared =  0.2222
           Total |  7160.41429   525  13.6388844           Root MSE      =   3.257

            wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -----------+----------------------------------------------------------------
            educ |   .6442721   .0538061    11.97   0.000     .5385695    .7499747
           exper |   .0700954   .0109776     6.39   0.000     .0485297    .0916611
           _cons |  -3.390539   .7665661    -4.42   0.000    -4.896466   -1.884613
CONFIDENCE INTERVAL

A 95% confidence interval for \beta is an interval centered around \hat\beta such that \beta \in (\hat\beta - c, \, \hat\beta + c) with probability 95%:

    P\left( \hat\beta - c < \beta < \hat\beta + c \right) =
    P\left( -\frac{c}{\mathrm{s.e.}(\hat\beta)} < \frac{\hat\beta - \beta}{\mathrm{s.e.}(\hat\beta)} < \frac{c}{\mathrm{s.e.}(\hat\beta)} \right) = 0.95

Since \frac{\hat\beta - \beta}{\mathrm{s.e.}(\hat\beta)} \sim t_{n-k}, we derive the confidence interval

    \left( \hat\beta - t_{n-k, 0.975} \cdot \mathrm{s.e.}(\hat\beta), \;
           \hat\beta + t_{n-k, 0.975} \cdot \mathrm{s.e.}(\hat\beta) \right)
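As a sketch, this formula approximately reproduces the 95% confidence interval for the educ coefficient in the Stata output above. With n - k = 523 degrees of freedom, t_{523, 0.975} is very close to the standard normal quantile, so we use the large-sample approximation z_{0.975} ≈ 1.96 available in the standard library.

```python
# 95% CI for the educ coefficient: beta_hat +/- z_{0.975} * s.e.(beta_hat),
# using the normal quantile as a large-sample stand-in for t_{523,0.975}.
from statistics import NormalDist

beta_hat = 0.6442721                  # educ estimate from the output
se = 0.0538061                        # its standard error

c = NormalDist().inv_cdf(0.975) * se  # half-width of the interval
lower, upper = beta_hat - c, beta_hat + c
print(f"95% CI for educ: ({lower:.4f}, {upper:.4f})")
```

Stata's exact interval [.5385695, .7499747] uses the t quantile; the normal approximation agrees with it to about three decimal places.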
SUMMARY

- We discussed the principle of hypothesis testing
- We derived the t-statistic
- We defined the concept of the p-value
- We explained what significance of a coefficient means
- We walked through a regression output in an example
TO BE CONTINUED ... :)

Next exercise session:
- revision of the t-test
- introduction to statistical software (hopefully)

Next lecture:
- testing of multiple linear restrictions
- assessing the goodness of fit (R^2)

Home assignment:
- to be submitted at the next lecture