# PGP

Session 2
2017
Understanding Econometrics
A case of Multiple Regression

PRECISION OF THE REGRESSION COEFFICIENTS

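The variances of the least-squares estimators are

$$\sigma_{\hat b_1}^2 = \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar X)^2}, \qquad \sigma_{\hat b_0}^2 = \frac{\sigma_u^2 \sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar X)^2}$$

The error variance $\sigma_u^2$ is unknown. The mean squared deviation of the residuals, $\mathrm{MSD}(e) = \frac{1}{n}\sum_{i=1}^{n} e_i^2$, underestimates it:

$$E[\mathrm{MSD}(e)] = \frac{n-2}{n}\,\sigma_u^2$$

Hence, an unbiased estimator of $\sigma_u^2$ is

$$s_u^2 = \frac{n}{n-2}\,\mathrm{MSD}(e) = \frac{\sum_{i=1}^{n} e_i^2}{n-2}$$
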
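. reg API FLE

| Source | SS | df | MS |
|---|---|---|---|
| Model | 81632.8614 | 1 | 81632.8614 |
| Residual | 43466.3386 | 18 | 2414.79659 |
| Total | 125099.2 | 19 | 6584.16842 |

Number of obs = 20, F(1, 18) = 33.81, Prob > F = 0.0000, R-squared = 0.6525, Root MSE = 49.141

| API | Coef. | Std. Err. | t | P>\|t\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| FLE | -2.114271 | .3636374 | -5.81 | 0.000 | [-2.878245, -1.350298] |
| _cons | 951.8735 | 22.78791 | 41.77 | 0.000 | [903.9979, 999.7491] |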
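
As a rough check on the precision formulas, the sketch below recomputes the error-variance estimate and the standard error of the slope from the sums of squares printed in the `reg API FLE` output. One value is an assumption on my part: the sum of squared deviations of FLE is taken from the "Total" SS line of the `reg FLE PE` output that appears later in these notes.

```python
import math

# Values read from the `reg API FLE` output in these notes.
rss = 43466.3386      # residual sum of squares
n = 20                # number of observations

# Assumption: sum of squared deviations of FLE around its mean, taken from
# the "Total" SS line of the `reg FLE PE` output later in these notes.
sxx = 18261.8

s_u2 = rss / (n - 2)            # estimate of the error variance
root_mse = math.sqrt(s_u2)      # what Stata reports as Root MSE
se_b1 = math.sqrt(s_u2 / sxx)   # standard error of the slope on FLE

print(f"s_u^2 = {s_u2:.5f}")
print(f"Root MSE = {root_mse:.3f}")
print(f"se(b1) = {se_b1:.7f}")
```

The three printed values should agree, up to rounding, with the Residual MS, Root MSE and the Std. Err. of FLE in the output.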

A flowchart illustrating the process to move from statistics to econometrics:

1. Start.
2. Formulate the problem (done): choose a set of variables, choose the form of the model, specify assumptions.
3. Fit the model (done): use method of fitting, LS.
4. Validate assumptions (what are those?): diagnostic checks such as residual plots and outlier detection. OK? If no, go back to an earlier step; if yes, continue.
5. Evaluate the fitted model: goodness of fit tests. OK? If no, go back to an earlier step; if yes, continue.
6. Use the model for the ...
7. Stop.

TYPES OF REGRESSION MODEL AND ASSUMPTIONS

Assumptions

## A.1: The model is linear in parameters and correctly specified.

Y = b1 + b2 X + u

## Examples of models that are not linear in parameters:

Y = b1 X^b2 + u
Y = b1 + b2X2 + b3X3 + b2b3X4 + u

Assumptions

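## A.2: The disturbance term has zero expectation

E(ui) = 0 for all i

Suppose instead that the disturbance in Yi = b0 + b1Xi + ui has a nonzero mean, $E(u_i) = \mu_u \neq 0$. Define $v_i = u_i - \mu_u$. Then

$$Y_i = b_0 + b_1 X_i + v_i + \mu_u = b_0^* + b_1 X_i + v_i, \qquad \text{where } b_0^* = b_0 + \mu_u$$

$$E(v_i) = E(u_i - \mu_u) = \mu_u - \mu_u = 0$$

The nonzero mean is absorbed into the intercept, and the redefined disturbance $v_i$ satisfies A.2.

A.2*: cov(X, u) = 0, which gives unbiased estimates.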
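Results of the Assumptions: UNBIASEDNESS OF THE REGRESSION COEFFICIENTS

## Simple regression model: Y = b0 + b1X + u

LS estimate of the slope:

$$\hat b_1 = \frac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_i (X_i - \bar X)^2} = b_1 + \sum_i a_i u_i, \qquad a_i = \frac{X_i - \bar X}{\sum_j (X_j - \bar X)^2}$$

$$E(\hat b_1) = E\Big(b_1 + \sum_i a_i u_i\Big) = b_1 + \sum_i E(a_i u_i) = b_1 + \sum_i E(a_i)\,E(u_i) = b_1$$

The factorization $E(a_i u_i) = E(a_i)E(u_i)$ requires X to be distributed independently of u, and the last step uses $E(u_i) = 0$ from A.2.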
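
The unbiasedness result can be illustrated with a small Monte Carlo sketch. Everything here is hypothetical: the true parameters, the fixed regressor and the normal errors are illustrative choices, not data from these notes. The code draws many samples from Y = b0 + b1X + u with E(u) = 0, estimates the slope each time, and compares the average estimate with the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the run is reproducible

# Hypothetical true parameters and a fixed (non-random) regressor.
b0_true, b1_true = 2.0, 0.5
X = np.arange(20, dtype=float)
x_dev = X - X.mean()
a = x_dev / np.sum(x_dev ** 2)   # the weights a_i from the derivation

n_reps = 2000
slopes = np.empty(n_reps)
for r in range(n_reps):
    u = rng.normal(0.0, 1.0, size=X.size)   # E(u_i) = 0, drawn independently of X
    Y = b0_true + b1_true * X + u
    # OLS slope: sum (X_i - Xbar)(Y_i - Ybar) / sum (X_i - Xbar)^2
    slopes[r] = np.sum(x_dev * (Y - Y.mean())) / np.sum(x_dev ** 2)

# For the last sample, the estimate equals b1 + sum(a_i * u_i).
decomposition = b1_true + np.sum(a * u)

mean_slope = slopes.mean()
print(f"true slope = {b1_true}, average estimate = {mean_slope:.4f}")
print(f"last estimate = {slopes[-1]:.6f}, b1 + sum(a*u) = {decomposition:.6f}")
```

Individual estimates scatter around the true slope; it is their average over repeated samples that settles close to it.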

Assumptions
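A.3. There should be some variation in X.

$$\hat b_1 = \frac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_i (X_i - \bar X)^2}$$

If $X_i = \bar X$ for all $i$, then $\hat b_1 = \frac{0}{0}$, which is undefined.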

A.4. The variance of the random error u is constant across observations (homoscedasticity):

$$\sigma_{u_i}^2 = \sigma_u^2 \quad \text{for all } i$$

If Assumption A.4 is not satisfied, the OLS regression coefficients will be inefficient, and you should be able to obtain more reliable results by using a modification of the OLS regression technique.


## A.5: The covariance between any pair of random errors, ui and uj (i ≠ j), is zero

$$\operatorname{cov}(u_i, u_j) = E\big[(u_i - E(u_i))(u_j - E(u_j))\big] = 0$$

For example, just because the disturbance term is large and positive in one observation,
there should be no tendency for it to be large and positive in the next (or large and
negative, for that matter, or small and positive, or small and negative).

If this assumption is not satisfied, OLS will again give inefficient estimates.

Gauss-Markov Theorem: Under these assumptions of the linear regression model, the estimators $\hat b_0$ and $\hat b_1$ have the smallest variance of all linear and unbiased estimators of b0 and b1. They are called the Best Linear Unbiased Estimators (BLUE).

Assumptions
A.6: For inference purposes, we assume the errors follow a normal distribution.

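| Source | SS | df | MS |
|---|---|---|---|
| Model | 81632.8614 | 1 | 81632.8614 |
| Residual | 43466.3386 | 18 | 2414.79659 |
| Total | 125099.2 | 19 | 6584.16842 |

Number of obs = 20, F(1, 18) = 33.81, Prob > F = 0.0000, R-squared = 0.6525, Root MSE = 49.141

| API | Coef. | Std. Err. | t | P>\|t\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| FLE | -2.114271 | .3636374 | -5.81 | 0.000 | [-2.878245, -1.350298] |
| _cons | 951.8735 | 22.78791 | 41.77 | 0.000 | [903.9979, 999.7491] |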

Statistical analysis: usefulness of X as a predictor of Y

An appropriate test statistic for testing the null hypothesis H0: b1 = 0 against the alternative H1: b1 ≠ 0 is the t-test.

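Sampling Distribution of $\hat b_0$ (intercept) and $\hat b_1$ (slope)

$$t_1 = \frac{\hat b_1}{\mathrm{s.e.}(\hat b_1)}, \qquad \mathrm{s.e.}(\hat b_1) = \frac{s_u}{\sqrt{\sum_i (x_i - \bar x)^2}}$$

$$t_0 = \frac{\hat b_0}{\mathrm{s.e.}(\hat b_0)}, \qquad \mathrm{s.e.}(\hat b_0) = s_u \sqrt{\frac{1}{n} + \frac{\bar x^2}{\sum_i (x_i - \bar x)^2}}$$

$$s_u = \sqrt{\frac{\sum_i e_i^2}{n-2}}$$
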
t-Test

Under H0, the statistic t1 is distributed as a Student's t with n-2 degrees of freedom. The test is carried out by comparing this observed value with the appropriate critical value obtained from the t table.
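
The test can be worked through with the numbers in the `reg API FLE` output. This is only a sketch: the critical value 2.101 is the two-sided 5% value for 18 degrees of freedom as read from a standard t table (rounded), and the 5% significance level is my choice for illustration.

```python
# Values read from the `reg API FLE` output in these notes.
b1_hat = -2.114271    # estimated coefficient on FLE
se_b1 = 0.3636374     # its standard error
n = 20

# Assumption: two-sided 5% critical value of Student's t with
# n - 2 = 18 degrees of freedom, read from a t table (rounded).
t_crit = 2.101

t1 = b1_hat / se_b1                  # test statistic for H0: b1 = 0
reject_h0 = abs(t1) > t_crit

ci_low = b1_hat - t_crit * se_b1     # 95% confidence interval for b1
ci_high = b1_hat + t_crit * se_b1

print(f"t1 = {t1:.2f}, |t1| > {t_crit}: {reject_h0}")
print(f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

Since |t1| is well above the critical value, H0: b1 = 0 is rejected at the 5% level, in line with the reported p-value of 0.000. The interval matches the reported one up to the rounding of the critical value.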

Evaluate the model: GOODNESS OF FIT

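$$e_i = Y_i - \hat Y_i \quad\Longrightarrow\quad Y_i = \hat Y_i + e_i$$

$$\mathrm{TSS} = \sum_i (Y_i - \bar Y)^2 = \sum_i \big[(\hat Y_i + e_i) - (\bar{\hat Y} + \bar e)\big]^2 = \sum_i \big[(\hat Y_i - \bar Y) + e_i\big]^2$$

since $\bar{\hat Y} = \bar Y$ and $\bar e = 0$. Expanding the square:

$$\begin{aligned}
\sum_i (Y_i - \bar Y)^2 &= \sum_i (\hat Y_i - \bar Y)^2 + \sum_i e_i^2 + 2\sum_i (\hat Y_i - \bar Y)\,e_i \\
&= \sum_i (\hat Y_i - \bar Y)^2 + \sum_i e_i^2 + 2\sum_i \hat Y_i e_i - 2\bar Y \sum_i e_i \\
&= \sum_i (\hat Y_i - \bar Y)^2 + \sum_i e_i^2
\end{aligned}$$

because $\sum_i \hat Y_i e_i = 0$ and $\sum_i e_i = 0$. That is, TSS = ESS + RSS.

$$R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} = \frac{\sum_i (\hat Y_i - \bar Y)^2}{\sum_i (Y_i - \bar Y)^2}, \qquad R^2 = 1 - \frac{\sum_i e_i^2}{\sum_i (Y_i - \bar Y)^2}$$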
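
The decomposition can be checked numerically. The sketch below uses a small made-up data set (not the API data) to fit a simple regression, confirm that the cross term vanishes, and show that the two expressions for R2 agree.

```python
import numpy as np

# Small made-up data set, for illustration only.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Least-squares fit of Y on X with an intercept.
x_dev = X - X.mean()
b1_hat = np.sum(x_dev * (Y - Y.mean())) / np.sum(x_dev ** 2)
b0_hat = Y.mean() - b1_hat * X.mean()

Y_fit = b0_hat + b1_hat * X    # fitted values
e = Y - Y_fit                  # residuals

tss = np.sum((Y - Y.mean()) ** 2)        # total sum of squares
ess = np.sum((Y_fit - Y.mean()) ** 2)    # explained sum of squares
rss = np.sum(e ** 2)                     # residual sum of squares
cross = np.sum((Y_fit - Y.mean()) * e)   # cross term, zero up to rounding

r2_from_ess = ess / tss
r2_from_rss = 1.0 - rss / tss

print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}, cross term = {cross:.2e}")
print(f"R2 = {r2_from_ess:.4f} (ESS/TSS), {r2_from_rss:.4f} (1 - RSS/TSS)")
```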

Measure of Variation

Multiple Regression: can I improve goodness of fit?

Is it only poverty, or something else?
Consider parents' education: is it that FLE explains API, or that FLE is correlated with a whole set of other variables (including parents' schooling and school funding)? We do not know.

The goal is to isolate how poverty affects API, but the true cause might in fact be a variable that was omitted.

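Now there are 3 normal equations:

$$\hat b_0:\quad \sum_{i=1}^{n} (Y_i - \hat b_0 - \hat b_1 X_{1i} - \hat b_2 X_{2i})(-1) = 0$$

$$\hat b_1:\quad \sum_{i=1}^{n} (Y_i - \hat b_0 - \hat b_1 X_{1i} - \hat b_2 X_{2i})(-X_{1i}) = 0$$

$$\hat b_2:\quad \sum_{i=1}^{n} (Y_i - \hat b_0 - \hat b_1 X_{1i} - \hat b_2 X_{2i})(-X_{2i}) = 0$$

$$\hat b_0 = \bar Y - \hat b_1 \bar X_1 - \hat b_2 \bar X_2$$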

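Multiple Regression

With lowercase letters denoting deviations from sample means ($x_{1i} = X_{1i} - \bar X_1$, and so on):

$$\hat b_1 = \frac{\sum_{i=1}^{n} x_{1i} y_i \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} x_{1i} x_{2i} \sum_{i=1}^{n} x_{2i} y_i}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \big(\sum_{i=1}^{n} x_{1i} x_{2i}\big)^2}$$

$$\hat b_2 = \frac{\sum_{i=1}^{n} x_{2i} y_i \sum_{i=1}^{n} x_{1i}^2 - \sum_{i=1}^{n} x_{1i} x_{2i} \sum_{i=1}^{n} x_{1i} y_i}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \big(\sum_{i=1}^{n} x_{1i} x_{2i}\big)^2}$$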
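
As a sanity check on the two-regressor formulas, the sketch below evaluates them on simulated data and compares the result with a generic least-squares solver. The data-generating values are hypothetical, and `np.linalg.lstsq` is used only as an independent reference.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed for reproducibility
n = 50

# Hypothetical data: X2 is correlated with X1.
X1 = rng.normal(size=n)
X2 = 0.6 * X1 + rng.normal(size=n)
Y = 1.0 + 2.0 * X1 - 3.0 * X2 + rng.normal(size=n)

# Deviations from sample means (the lowercase variables).
x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()
s11, s22, s12 = np.sum(x1 * x1), np.sum(x2 * x2), np.sum(x1 * x2)
s1y, s2y = np.sum(x1 * y), np.sum(x2 * y)

den = s11 * s22 - s12 ** 2
b1_hat = (s1y * s22 - s12 * s2y) / den
b2_hat = (s2y * s11 - s12 * s1y) / den
b0_hat = Y.mean() - b1_hat * X1.mean() - b2_hat * X2.mean()

# Reference solution from a generic least-squares routine.
A = np.column_stack([np.ones(n), X1, X2])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)

print("formulas:", b0_hat, b1_hat, b2_hat)
print("lstsq:   ", coef)
```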

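Multiple=Simple

$$\hat b_1 = \frac{\sum_{i=1}^{n} x_{1i} y_i \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} x_{1i} x_{2i} \sum_{i=1}^{n} x_{2i} y_i}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \big(\sum_{i=1}^{n} x_{1i} x_{2i}\big)^2} = \frac{\sum_{i=1}^{n} x_{1i} y_i}{\sum_{i=1}^{n} x_{1i}^2} \qquad \text{if } \sum_i x_{1i} x_{2i} = 0$$

so the multiple regression coefficient equals the simple regression slope when x1 and x2 are uncorrelated in the sample.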

If we leave out a relevant variable, the difference between the simple and the multiple regression coefficient may be quite large, depending on the covariance between x1 and x2 and between y and x2.
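
The size of that difference can be read off the Stata outputs in these notes. In-sample, the simple-regression slope equals the multiple-regression coefficient plus the omitted variable's coefficient times the slope from regressing the omitted variable on the included one. The sketch below checks this identity with the reported coefficients; the variable names are mine.

```python
# Coefficients read from the Stata outputs in these notes.
b_fle_multiple = -0.5105995    # FLE in `reg API FLE PE`
b_pe_multiple = 2.335998       # PE in `reg API FLE PE`
slope_pe_on_fle = -0.6865041   # FLE in `reg PE FLE`
b_fle_simple = -2.114271       # FLE in `reg API FLE`

# Simple slope implied by the multiple regression when PE is left out.
implied_simple = b_fle_multiple + b_pe_multiple * slope_pe_on_fle

# Part of the simple slope that is due to the omitted PE.
gap = b_fle_simple - b_fle_multiple

print(f"implied simple slope = {implied_simple:.6f}")
print(f"reported simple slope = {b_fle_simple:.6f}")
print(f"difference attributable to omitted PE = {gap:.6f}")
```

Most of the FLE coefficient in the simple regression is picked up from PE, which is strongly negatively correlated with FLE.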

Simple Regression

Multiple Regression
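. reg API FLE PE

| Source | SS | df | MS |
|---|---|---|---|
| Model | 107548.893 | 2 | 53774.4467 |
| Residual | 17550.3065 | 17 | 1032.37097 |
| Total | 125099.2 | 19 | 6584.16842 |

Number of obs = 20, F(2, 17) = 52.09, Prob > F = 0.0000, R-squared = 0.8597, Root MSE = 32.131

| API | Coef. | Std. Err. | t | P>\|t\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| FLE | -.5105995 | .3987211 | -1.28 | 0.218 | [-1.351827, .3306285] |
| PE | 2.335998 | .4662362 | 5.01 | 0.000 | [1.352325, 3.31967] |
| _cons | 777.1664 | 37.91938 | 20.50 | 0.000 | [697.1635, 857.1693] |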

. corr FLE PE
(obs=20)

| | FLE | PE |
|---|---|---|
| FLE | 1.0000 | |
| PE | -0.8027 | 1.0000 |

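. reg FLE PE

| Source | SS | df | MS |
|---|---|---|---|
| Model | 11768.0225 | 1 | 11768.0225 |
| Residual | 6493.77755 | 18 | 360.765419 |
| Total | 18261.8 | 19 | 961.147368 |

Number of obs = 20, F(1, 18) = 32.62, Prob > F = 0.0000, R-squared = 0.6444, Root MSE = 18.994

| FLE | Coef. | Std. Err. | t | P>\|t\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| PE | -.9386783 | .1643529 | -5.71 | 0.000 | [-1.283971, -.5933856] |
| _cons | 89.72497 | 7.430862 | 12.07 | 0.000 | [74.1133, 105.3366] |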

. predict e1,resid
(2 missing values generated)

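. reg API e1

| Source | SS | df | MS |
|---|---|---|---|
| Model | 1693.00479 | 1 | 1693.00479 |
| Residual | 123406.195 | 18 | 6855.89973 |
| Total | 125099.2 | 19 | 6584.16842 |

Number of obs = 20, F(1, 18) = 0.25, Prob > F = 0.6253, R-squared = 0.0135, Root MSE = 82.8

| API | Coef. | Std. Err. | t | P>\|t\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| e1 | -.5105995 | 1.027504 | -0.50 | 0.625 | [-2.669305, 1.648106] |
| _cons | 835.8 | 18.51472 | 45.14 | 0.000 | [796.902, 874.698] |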

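. reg PE FLE

| Source | SS | df | MS |
|---|---|---|---|
| Model | 8606.56421 | 1 | 8606.56421 |
| Residual | 4749.23579 | 18 | 263.846433 |
| Total | 13355.8 | 19 | 702.936842 |

Number of obs = 20, F(1, 18) = 32.62, Prob > F = 0.0000, R-squared = 0.6444, Root MSE = 16.243

| PE | Coef. | Std. Err. | t | P>\|t\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| FLE | -.6865041 | .1201998 | -5.71 | 0.000 | [-.9390345, -.4339736] |
| _cons | 74.78907 | 7.532511 | 9.93 | 0.000 | [58.96385, 90.61429] |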

. predict e2,resid
(2 missing values generated)

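. reg API e2

| Source | SS | df | MS |
|---|---|---|---|
| Model | 25916.0318 | 1 | 25916.0318 |
| Residual | 99183.1682 | 18 | 5510.17601 |
| Total | 125099.2 | 19 | 6584.16842 |

Number of obs = 20, F(1, 18) = 4.70, Prob > F = 0.0437, R-squared = 0.2072, Root MSE = 74.231

| API | Coef. | Std. Err. | t | P>\|t\| | 95% Conf. Interval |
|---|---|---|---|---|---|
| e2 | 2.335998 | 1.077137 | 2.17 | 0.044 | [.0730171, 4.598978] |
| _cons | 835.8 | 16.59846 | 50.35 | 0.000 | [800.9279, 870.6721] |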
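
The Stata steps above regress one explanatory variable on the other, keep the residuals, and then regress API on those residuals; the slope on e1 reproduces the FLE coefficient from `reg API FLE PE`, and the slope on e2 reproduces the PE coefficient. The sketch below repeats this partialling-out procedure on simulated data, since the API data set itself is not included in these notes. The helper `simple_ols` and all data-generating values are hypothetical.

```python
import numpy as np

def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on x with a constant."""
    x_dev = x - x.mean()
    slope = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
    return y.mean() - slope * x.mean(), slope

rng = np.random.default_rng(2)   # fixed seed for reproducibility
n = 40

# Hypothetical data with two negatively correlated regressors.
x2 = rng.normal(size=n)
x1 = -0.8 * x2 + rng.normal(size=n)
y = 5.0 - 0.5 * x1 + 2.0 * x2 + rng.normal(size=n)

# Multiple regression of y on x1 and x2 (coef = [const, x1, x2]).
A = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Step 1: regress x1 on x2 and keep the residuals,
# i.e. the part of x1 that x2 does not explain.
a0, a1 = simple_ols(x2, x1)
e1 = x1 - (a0 + a1 * x2)

# Step 2: regress y on those residuals.
intercept_on_e1, slope_on_e1 = simple_ols(e1, y)

print(f"x1 coefficient, multiple regression: {coef[1]:.6f}")
print(f"slope of y on e1:                    {slope_on_e1:.6f}")
print(f"intercept of y on e1: {intercept_on_e1:.6f}, mean of y: {y.mean():.6f}")
```

Because the residuals have mean zero, the intercept in the second step is just the sample mean of y, which is why both `reg API e1` and `reg API e2` report the same _cons of 835.8.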

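Perfect Multicollinearity

Suppose $x_{2i} = \lambda x_{1i}$ for some constant $\lambda$. Then

$$\hat b_1 = \frac{\sum_{i=1}^{n} x_{1i} y_i \sum_{i=1}^{n} (\lambda x_{1i})^2 - \sum_{i=1}^{n} x_{1i} (\lambda x_{1i}) \sum_{i=1}^{n} (\lambda x_{1i}) y_i}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} (\lambda x_{1i})^2 - \big(\sum_{i=1}^{n} x_{1i} (\lambda x_{1i})\big)^2}$$

$$= \frac{\lambda^2 \big(\sum_{i=1}^{n} x_{1i} y_i \sum_{i=1}^{n} x_{1i}^2 - \sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{1i} y_i\big)}{\lambda^2 \big(\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{1i}^2 - \big(\sum_{i=1}^{n} x_{1i}^2\big)^2\big)}$$

= 0/0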
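
A tiny numerical sketch of the same problem: when one regressor is an exact multiple of the other, the denominator of the coefficient formula is zero and the design matrix loses rank. The data and the multiplier 3 are made up for illustration.

```python
import numpy as np

# Hypothetical regressors with exact linear dependence: X2 = 3 * X1.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = 3.0 * X1

# Deviations from sample means.
x1 = X1 - X1.mean()
x2 = X2 - X2.mean()

# Denominator shared by the formulas for both slope coefficients.
denominator = np.sum(x1 ** 2) * np.sum(x2 ** 2) - np.sum(x1 * x2) ** 2

# Design matrix with a constant: 3 columns, but only 2 are independent.
A = np.column_stack([np.ones(X1.size), X1, X2])
rank = np.linalg.matrix_rank(A)

print(f"denominator = {denominator}")
print(f"columns = {A.shape[1]}, rank = {rank}")
```

With a zero denominator the individual coefficients on X1 and X2 cannot be separated, whatever the values of Y.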

Which model should we choose?
. reg API PE

| Source | SS | df | MS |
|---|---|---|---|
| Model | 105855.889 | 1 | 105855.889 |
| Residual | 19243.3112 | 18 | 1069.07284 |

Number of obs = 20, F(1, 18) = 99.02, Prob > F = 0.0000, R-squared = 0.8462