
ECON 3049: ECONOMETRICS

Semester 1 - 2009
Department of Economics
The University of the West Indies, Mona

These notes are not typo-free!!


Contents

1 Introduction
  1.1 Definition of Econometrics
  1.2 Methodological Approach to Econometrics
  1.3 Regression Analysis
  1.4 Statistical vs. Deterministic Relations
  1.5 Correlation, Causation and Regression
  1.6 The concept of ‘Ceteris Paribus’
  1.7 Structure of Economic Data
  1.8 Review of some probabilistic concepts
  1.9 Review of the summation operator

2 Simple Regression Analysis
  2.1 Some basic Concepts
  2.2 Linearity in Variables vs. Linearity in Parameters

3 Model Estimation
  3.1 Method 1: Method of Moments
  3.2 Method 2: Ordinary Least Squares (OLS)
  3.3 Properties of the OLS regression line (SRF)

4 Assumptions behind the CLRM
  4.1 Properties of the OLS estimators
  4.2 The Variance of the OLS estimators
  4.3 Gauss-Markov Theorem

5 R-Squared (R²)
  5.1 Properties of R²
  5.2 Sample Correlation (r) and R²
  5.3 Estimating the error variance σ²

6 Statistical Inference - Confidence Interval Estimation, Hypothesis Testing, Prediction and Goodness of Fit
  6.1 Normality of β̂0 and β̂1
  6.2 Test for Significance
  6.3 t distribution: ratio of Chi and Standard Normal distributions
  6.4 Confidence Interval Estimation
  6.5 Prediction

7 Multiple Linear Regression Model
  7.1 Properties of the OLS estimators
  7.2 Statistical Inference
    7.2.1 Hypothesis Testing [Part 2]
    7.2.2 Restricted vs. Unrestricted Models
    7.2.3 Case II: Testing Multiple Hypotheses
    7.2.4 Confidence Interval Estimation

8 Violation of Some Assumptions of the CLRM
  8.1 Multicollinearity
    8.1.1 The effect of Perfect Multicollinearity on Estimation
    8.1.2 The effect of Near (Perfect) Multicollinearity
  8.2 Heteroscedasticity
    8.2.1 How to adjust the model for heteroscedasticity

9 Regression with Dummy (Qualitative) Variables
  9.1 Incorporating a single dummy as a Regressor
  9.2 Dummy regressors in log-linear models
  9.3 Dummies for Multiple Categories
  9.4 Interactions Among Dummies
    9.4.1 Other Interactions with Dummies
  9.5 Testing for Differences Across Groups
    9.5.1 The Chow Test
  9.6 Linear Probability Model
  9.7 Caveats on Policy Evaluation
    9.7.1 Self-selection Problems
  9.8 Current Affairs Applications

1 Introduction
1.1 Definition of Econometrics
Econometrics is the analysis of economic phenomena by applying mathematics and statistical inference to economic theory, with the ultimate aim of empirically verifying the theory.

1.2 Methodological Approach to Econometrics


1. State theory or hypothesis.

2. Specify the mathematical model of the theory.

3. Specify the econometric model of the theory.

4. Collect the data.

5. Estimate the parameters of the econometric model.

6. Test the hypothesis.

7. Forecast or Predict.

8. Use the empirical results of the econometric model for control or policy prescription.

1.3 Regression Analysis


“Regression Analysis is concerned with the study of the dependence of one variable, the
dependent variable, on one or more other variables, the explanatory variable(s), with a view
to estimate and/or predict the (population) mean or average value of the former in terms of
the known or fixed (in repeated sampling) value of the latter” (Gujarati).

1.4 Statistical vs. Deterministic Relations


Statistical - considers variables that are random or stochastic. A random or stochastic
variable is one that has a non-degenerate probability distribution function. Examples of
statistical relations:

1. the effect of corruption on growth,

2. the effect of corruption on inflation.

Deterministic (Functional) - involves variables that are non-random or non-stochastic. Examples of deterministic relations are Newton's laws of gravity and motion. Deterministic relations are found in classical physics.
In this course we abstract from deterministic relations and deal only with statistical
relations.

1.5 Correlation, Causation and Regression


• Regression analysis does not necessarily imply causation.

• Correlation is the measure of linear association between two variables.

• Correlation analysis is a ‘symmetrical concept’.

• Regression analysis is an ‘asymmetrical concept’.

Note:

• Correlation Analysis - both variables are stochastic.

• Regression Analysis - the dependent variable is stochastic but the explanatory variable
is fixed or non-stochastic.

• Correlation does not necessarily imply causation. (Read Tolstoy.)

1.6 The concept of ‘Ceteris Paribus’


‘Ceteris Paribus’ means holding all other things constant. What is the relation between
ceteris paribus and partial differentiation?

Note: Ceteris paribus is crucial to causal analysis because we cannot establish causality
without holding other factors constant. For example:

• the effect of education on wages,

• the effect of corruption on growth,

• the effect of education on crime.

1.7 Structure of Economic Data


Cross-sectional Data - Data on one or more variables for individuals, firms, cities, states,
countries or other units of observation collected at the same point in time.

Time Series Data - A collection of observations on the values that a variable takes at
different points in time. Intervals can be daily, monthly, yearly etc.

Pooled Cross Section - Combining sets of cross-sectional data to increase the sample size.
For example, a cross-sectional household survey conducted in two different years (two
different random samples).

Panel or Longitudinal Data - A time series data set for each cross-sectional member in
the data set. For example, wage data on a set of individuals over a 25-year period.

Note: The distinction between the latter two data structures is that in panel data, the same
cross-sectional units are followed over the given period.
In this course, we restrict our focus to cross-sectional data.

1.8 Review of some probabilistic concepts


See Wooldridge, Appendix B

1.9 Review of the summation operator


See Wooldridge, Appendix A

2 Simple Regression Analysis
2.1 Some basic Concepts
Recall the aim of regression analysis. Now let Y be the dependent variable, X be the
explanatory variable and (Y , X) be drawn from the same population of interest. We want
a functional form that will allow us to express Y in terms of X. In the context of a Simple
Linear Regression Model, we write

Y = β0 + β1 X + U (2.1)

Equation (2.1) is also called a “two variable linear regression model” or a “bivariate linear
regression model”.

Various terms are used for the variables in a regression model.

Table 1: Terms used for Y and X

  Y Variable    X Variable
  Dependent     Independent
  Explained     Explanatory
  Response      Control
  Predicted     Predictor
  Regressand    Regressor
                Covariate

In equation (2.1), U is known as the error term or disturbance term. That is, U captures
all elements (factors) other than X that affect Y . Note that U is unobserved.

Y = β0 + β1 X + U
⇒ ∆Y = β1 ∆X + ∆U (2.2)
⇒ ∆Y = β1 ∆X if ∆U = 0

• ∆U = 0 implies that the other elements are held constant,“ceteris paribus”,

• β0 is known as the intercept parameter,

• β1 is known as the slope parameter (coefficient of X).

Thus β1 measures the effect of a change in X on Y, ceteris paribus. In Equation (2.2) we
see that X has a linear effect on Y.

Now assume: (a) E(U) = 0 and (b) E(U | X) = E(U). Assumption (b) says that U is
mean-independent of X; it implies (i) that X and U are uncorrelated and (ii) that U is not
linearly related to X.

Together, (a) and (b) imply that E(U | X) = E(U) = 0.


Now taking conditional expectation w.r.t. X of Equation (2.1) gives

E(Y | X) = E[(β0 + β1 X + U) | X] = β0 + β1 X + E(U | X)

⇒ E(Y | X) = β0 + β1 X        (2.3)

Combining (2.1) and (2.3) we have Y = E(Y | X) + U . Equation (2.3) is known as the
“Population Regression Function” (PRF). Note that β0 and β1 are unknown but fixed
parameters in the PRF.
In regression analysis we seek to estimate the parameters of the PRF.
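
To make the PRF concrete, consider a small Python simulation (a sketch only; the parameter values, seed and variable names are my own, not part of the notes). For each fixed value of X, the sample average of Y across many draws of U approximates the PRF value β0 + β1 X:

import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 5.0, 1.5  # illustrative (assumed) PRF parameters

for x in range(1, 6):
    u = rng.normal(0.0, 2.0, size=100_000)  # error with E(U | X) = 0
    y = beta0 + beta1 * x + u
    # Sample mean of Y at this X should be close to the PRF value.
    print(x, round(y.mean(), 2), "vs", beta0 + beta1 * x)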

2.2 Linearity in Variables vs. Linearity in Parameters


• Linearity in variables - e.g. if E(Y | X) = β0 + β1 X³, then this is not a linear function
in the variable X.

• Linearity in parameters - e.g. if E(Y | X) = β0 + β1² X, then this is not a linear
function in the parameter β1.

Note: We will use linear in simple linear regression to mean linear in parameters!!

[Figure 1: Graph of fitted values and residuals]

3 Model Estimation
Let us begin with Equation 2.1

Yi = β0 + β1 Xi + Ui , for i = 1, . . . , n

Since the population regression function is not directly observable, we estimate the sample
regression function (SRF):

Yi = β̂0 + β̂1 Xi + Ûi

Yi = Ŷi + Ûi ,

where:

1. n is the sample size,

2. Ŷi is the estimated (conditional mean) value of Yi ,

3. Ûi is the residual, that is the difference between the actual and the estimated values
of Yi (Ûi = Yi − Ŷi ).

Question: how do we obtain β̂0 & β̂1 ?


Answer: there are 3 general approaches to estimating parameters of the PRF: (1) method
of moments, (2) least squares and (3) maximum likelihood. We will only discuss method of
moments and least squares approaches in this course.

3.1 Method 1: Method of Moments
This method requires only the two assumptions in Section 2.1 that were used to derive the
PRF, namely (a) E(U) = 0 and (b) E(U | X) = E(U). Recall that we can combine (a) and (b)
to obtain E(U | X) = 0, which implies that U and X are uncorrelated. That is,

0 = Cov(X, U ) = E(XU ) − E(X)E(U )


⇒ 0 = E(XU ) since E(U ) = 0.

In essence we now have

1. E(U ) = 0

2. E(XU ) = 0

Using (1) ⇒ E(U ) = E(Y − β0 − β1 X) = 0.


Using (2) ⇒ E(XU ) = E[X(Y − β0 − β1 X)] = 0.
The sample analogue of E(Y − β0 − β1 X) = 0 is:

(1/n) Σᵢ₌₁ⁿ (Yi − β̂0 − β̂1 Xi) = 0        (3.1)

Similarly, the sample analogue of E[X(Y − β0 − β1 X)] = 0 is:

(1/n) Σᵢ₌₁ⁿ Xi (Yi − β̂0 − β̂1 Xi) = 0      (3.2)

Using (3.1) we have

Ȳ − β̂0 − β̂1 X̄ = 0  ⇒  β̂0 = Ȳ − β̂1 X̄.

Using (3.2) we have

(1/n) Σᵢ₌₁ⁿ Xi (Yi − β̂0 − β̂1 Xi) = 0
(1/n) Σ (Xi Yi − β̂0 Xi − β̂1 Xi²) = 0
(1/n) Σ Xi Yi − β̂0 (1/n) Σ Xi − β̂1 (1/n) Σ Xi² = 0
(1/n) Σ Xi Yi − β̂0 X̄ − β̂1 (1/n) Σ Xi² = 0
(1/n) Σ Xi Yi − (Ȳ − β̂1 X̄) X̄ − β̂1 (1/n) Σ Xi² = 0

⇒ β̂1 = [(1/n) Σ Xi Yi − X̄Ȳ] / [(1/n) Σ Xi² − X̄²]
      = [(1/n) Σ (Xi − X̄)(Yi − Ȳ)] / [(1/n) Σ (Xi − X̄)²]
      = [Σ (Xi − X̄)(Yi − Ȳ)] / [Σ (Xi − X̄)²]

Thus, given Y = β0 + β1 X + U, the MOM estimators β̂0 and β̂1 of β0 and β1 are as follows:

β̂0 = Ȳ − β̂1 X̄
β̂1 = [(1/n) Σ Xi Yi − X̄Ȳ] / [(1/n) Σ Xi² − X̄²]
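
As an aside, the MOM estimators are easy to compute directly. Below is a minimal Python sketch (the function name mom_estimates and the simulated data are my own, purely illustrative):

import numpy as np

def mom_estimates(x, y):
    # Solves the two sample moment conditions (3.1) and (3.2):
    # (1/n) Σ (Yi − b0 − b1 Xi) = 0 and (1/n) Σ Xi (Yi − b0 − b1 Xi) = 0.
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = (np.mean(x * y) - x.mean() * y.mean()) / (np.mean(x**2) - x.mean() ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=1_000)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=1_000)  # true β0 = 1, β1 = 2
print(mom_estimates(x, y))  # should be close to (1.0, 2.0)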

Example 3.1. Consider the following data for the two variable regression model

Yi = β0 + β1 Xi + Ui , for i = 1, . . . , n,

which satisfies all the standard assumptions of the Classical Linear Regression Model:
n = 10,  ΣX = 30,  ΣY = 20,  ΣX² = 92,  ΣY² = 50,  ΣXY = 64.

Find the MOM estimators of β0 and β1 .

Answer:

β̂1 = [(1/n) Σ Xi Yi − X̄Ȳ] / [(1/n) Σ Xi² − X̄²]
    = [(1/10)(64) − (3)(2)] / [(1/10)(92) − (30/10)²]
    = (6.4 − 6) / (9.2 − 9) = 2

Similarly,

β̂0 = Ȳ − β̂1 X̄
   = 2 − (2)(3) = −4
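
A quick arithmetic check of Example 3.1 in Python (the variable names are mine; only the given summary statistics are used):

n, sum_x, sum_y, sum_x2, sum_xy = 10, 30, 20, 92, 64

x_bar, y_bar = sum_x / n, sum_y / n
b1 = (sum_xy / n - x_bar * y_bar) / (sum_x2 / n - x_bar**2)
b0 = y_bar - b1 * x_bar
print(b1, b0)  # approximately 2.0 and -4.0, as computed above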

Formulae:

1. Σᵢ₌₁ⁿ (Xi − X̄) = 0

2. Σᵢ₌₁ⁿ (Xi − X̄)² = Σᵢ₌₁ⁿ (Xi − X̄)Xi

3. Σᵢ₌₁ⁿ (Xi − X̄)(Yi − Ȳ) = Σᵢ₌₁ⁿ (Xi − X̄)Yi

4. Σᵢ₌₁ⁿ (Xi − X̄)² = Σᵢ₌₁ⁿ Xi² − nX̄²

5. Σᵢ₌₁ⁿ (Xi − X̄)(Yi − Ȳ) = Σᵢ₌₁ⁿ Xi Yi − nX̄Ȳ

Proving the formulae above:

1. Σ (Xi − X̄) = Σ Xi − Σ X̄ = Σ Xi − nX̄ = nX̄ − nX̄ = 0

2. Σ (Xi − X̄)² = Σ (Xi − X̄)(Xi − X̄)
              = Σ [(Xi − X̄)Xi + (Xi − X̄)(−X̄)]
              = Σ (Xi − X̄)Xi − Σ (Xi − X̄)X̄
              = Σ (Xi − X̄)Xi − X̄ Σ (Xi − X̄)
              = Σ (Xi − X̄)Xi − X̄ · 0
              = Σ (Xi − X̄)Xi

3. Similar to (2).

4. Σ (Xi − X̄)² = Σ (Xi² − 2Xi X̄ + X̄²)
              = Σ Xi² − 2X̄ Σ Xi + nX̄²
              = Σ Xi² − 2nX̄² + nX̄²
              = Σ Xi² − nX̄²

5. Similar to (4): expand the product and use Σ Xi = nX̄ and Σ Yi = nȲ.
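
The five identities are also easy to confirm numerically. A small sketch with made-up data (names mine):

import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=50), rng.normal(size=50)
dx, dy = x - x.mean(), y - y.mean()  # deviation forms
n = len(x)

assert np.isclose(dx.sum(), 0)                                        # formula 1
assert np.isclose((dx**2).sum(), (dx * x).sum())                      # formula 2
assert np.isclose((dx * dy).sum(), (dx * y).sum())                    # formula 3
assert np.isclose((dx**2).sum(), (x**2).sum() - n * x.mean() ** 2)    # formula 4
assert np.isclose((dx * dy).sum(), (x * y).sum() - n * x.mean() * y.mean())  # formula 5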

Example 3.2. Suppose Ȳ = 2, X̄ = 3, n = 10, Σᵢ₌₁ⁿ (Xi − X̄)² = 2 and
Σᵢ₌₁ⁿ (Xi − X̄)(Yi − Ȳ) = 4 for the model

Yi = α0 + α1 Xi + Ui ,   i = 1, . . . , n.

Find the MOM estimators of α0 and α1.

Answer:

α̂1 = [Σᵢ₌₁ⁿ (Xi − X̄)(Yi − Ȳ)] / [Σᵢ₌₁ⁿ (Xi − X̄)²]  (deviation form)  ⇒  α̂1 = 4/2 = 2.

Also,

α̂0 = Ȳ − α̂1 X̄
   = 2 − 2(3) = −4

3.2 Method 2: Ordinary Least Squares (OLS)


Recall the SRF:

Yi = β̂0 + β̂1 Xi + Ûi


= Ŷi + Ûi

where Ûi is the residual and Ŷi is the estimated (conditional mean) value of Yi, that is,
Ŷi = β̂0 + β̂1 Xi. Then Ûi = Yi − Ŷi.
The least-squares criterion states that β̂0 and β̂1 must be selected so that the sum of
squared residuals, Σ Ûi², is as small as possible. By virtue of the least-squares criterion
we therefore seek β̂0 and β̂1 that solve

min_{β̂0, β̂1} Σᵢ₌₁ⁿ Ûi²  ⇒  min_{β̂0, β̂1} Σᵢ₌₁ⁿ (Yi − β̂0 − β̂1 Xi)²

Differentiating w.r.t. β̂0 and β̂1 yields:

∂(Σ Ûi²)/∂β̂0 = −2 Σ (Yi − β̂0 − β̂1 Xi) = 0
∂(Σ Ûi²)/∂β̂1 = −2 Σ (Yi − β̂0 − β̂1 Xi) Xi = 0

Then the First Order Conditions imply

1. Σ Ûi = 0

2. Σ Ûi Xi = 0

Alternatively,

Σ (Yi − β̂0 − β̂1 Xi) = 0        (3.3)

Σ (Yi − β̂0 − β̂1 Xi) Xi = 0     (3.4)

Equations (3.3) and (3.4) are known as the normal equations. We use equation (3.3) to solve
for β̂0:

⇒ Σ Yi − Σ β̂0 − β̂1 Σ Xi = 0
⇒ Σ Yi − nβ̂0 − β̂1 Σ Xi = 0
⇒ β̂0 = (Σ Yi)/n − β̂1 (Σ Xi)/n
   or β̂0 = Ȳ − β̂1 X̄        (3.5)

Put (3.5) into (3.4) and solve for β̂1:

Σ Xi Yi − β̂0 Σ Xi − β̂1 Σ Xi² = 0
Σ Xi Yi − (Ȳ − β̂1 X̄) Σ Xi − β̂1 Σ Xi² = 0
⇒ β̂1 (Σ Xi² − X̄ Σ Xi) = Σ Xi Yi − Ȳ Σ Xi
⇒ β̂1 = [n Σ Xi Yi − Σ Xi Σ Yi] / [n Σ Xi² − (Σ Xi)²]
      = [Σ (Xi − X̄)(Yi − Ȳ)] / [Σ (Xi − X̄)²]

Notation: In this class we will define X̃i ≡ Xi − X̄ and Ỹi ≡ Yi − Ȳ; that is, X̃i is the
deviation of Xi from its mean value. Then

β̂1 = (Σ X̃i Ỹi) / (Σ X̃i²)    (deviation form)

Aside (Method of Moments): (1/n) Σ Ûi = 0 and (1/n) Σ Ûi Xi = 0.

Remark 3.3. The method of moments conditions for the sample are identical to the first
order conditions from the OLS approach. Thus, for our classical linear regression models,
the estimators from these two estimation approaches are identical.
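
To illustrate Remark 3.3 numerically, the sketch below (the simulated data and names are my own) computes the closed-form MOM/OLS estimates and checks them against numpy's least-squares fit:

import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 5, size=200)
y = 3.0 - 1.5 * x + rng.normal(0, 0.5, size=200)  # true β0 = 3, β1 = -1.5

# Closed-form OLS slope and intercept (identical to the MOM solution).
dx = x - x.mean()
b1 = (dx * (y - y.mean())).sum() / (dx**2).sum()
b0 = y.mean() - b1 * x.mean()

# numpy's least-squares line fit returns (slope, intercept) for deg=1.
b1_np, b0_np = np.polyfit(x, y, deg=1)
assert np.isclose(b1, b1_np) and np.isclose(b0, b0_np)
print(b0, b1)  # close to 3.0 and -1.5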

3.3 Properties of the OLS regression line (SRF)


1. The SRF passes through the sample means of X and Y .

2. The mean value of the estimated Yi (= Ŷi) is equal to the mean of the actual Yi. That is,
(1/n) Σ Ŷi = Ȳ.

3. The residuals Ûi have mean equal to zero. One implication of this property is that
the SRF can be written as

Yi − Ȳ = β̂1 (Xi − X̄) + Ûi
⇒ Ỹi = β̂1 X̃i + Ûi    (deviation form)

By virtue of this property we also have

Ŷi − Ȳ = β̂1 X̃i ,

for the estimated value Ŷi in deviation form.

4. There is zero correlation between the residuals Ûi and the fitted values Ŷi .

5. There is zero correlation between the residuals Ûi and the explanatory variable Xi .
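
The five properties are straightforward to check numerically. A sketch with simulated data follows (names mine); formal proofs are asked for in the questions below:

import numpy as np

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=500)
y = 2.0 + 0.8 * x + rng.normal(0, 1.0, size=500)

dx = x - x.mean()
b1 = (dx * (y - y.mean())).sum() / (dx**2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x     # fitted values
u_hat = y - y_hat       # residuals

assert np.isclose(b0 + b1 * x.mean(), y.mean())   # 1: SRF passes through (X̄, Ȳ)
assert np.isclose(y_hat.mean(), y.mean())         # 2: mean of fitted values equals Ȳ
assert np.isclose(u_hat.mean(), 0)                # 3: residuals have zero mean
assert np.isclose(np.corrcoef(u_hat, y_hat)[0, 1], 0, atol=1e-8)  # 4
assert np.isclose(np.corrcoef(u_hat, x)[0, 1], 0, atol=1e-8)      # 5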

Questions:

a Verify all the above properties of the SRF by providing a proof for each property.

b Do all the properties hold if the simple linear regression model is of the form
Yi = β1 Xi + Ui , i = 1, . . . , n?
