Do Huu Luat
luat.do@luatdo.com
Outline
• Examples
• The research process
• Linear regression model
• Qualitative regressors
Relationship between variables:
Examples
• [Individual] Does a master's degree raise my
wage?
• [Producer] By how much does my output increase if I
hire more labor?
• [Seller] By how much would demand for my
product increase if I reduced the price by $X?
• [Policy maker] Do subsidies (cash transfers) to
the poor reduce their work effort?
• [Farmer and policy maker] Does a lower fertilizer
price result in higher profit?
Relationship between variables:
Examples
• [Wage and gender] Is there gender
discrimination in wages?
• [Labor supply] Do female workers work harder
than male workers?
• [Labor supply] Do laborers work more when
wages are higher?
Econometric techniques
Multiple regressions
Advanced methods to obtain unbiased estimates
Multiple regression
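[Figure: population regression function E(y|x) = b0 + b1x plotted against x; each observation y1, ..., y4 at x1, ..., x4 deviates from the line by its error term u1, ..., u4]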
Sample Regression Function (SRF)
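[Figure: sample regression function $\hat{y} = \hat{b}_0 + \hat{b}_1 x$ plotted against x; each observation y1, ..., y4 at x1, ..., x4 deviates from the fitted line by its residual e1, ..., e4]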
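The PRF and the SRF can be written side by side. This is a sketch in the slides' notation; the hats on the estimated coefficients and the observation index $i$ are notational assumptions.

```latex
\begin{align*}
\text{PRF:}\quad & y_i = E(y \mid x_i) + u_i = b_0 + b_1 x_i + u_i \\
\text{SRF:}\quad & \hat{y}_i = \hat{b}_0 + \hat{b}_1 x_i, \qquad e_i = y_i - \hat{y}_i
\end{align*}
```

The error term $u_i$ is never observed. The residual $e_i$ is its sample counterpart and is computed from the data.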
Method Of Ordinary Least Squares
❑The method of Ordinary Least Squares (OLS) searches for the
coefficients that minimize the residual sum of squares
(RSS): $RSS = \sum e_i^2$
❑The PRF cannot be observed, but the SRF can be estimated, so we
apply OLS to the sample data to find the SRF
❑To obtain the values of the regression coefficients,
take the derivatives of RSS with respect to each regression
coefficient and set them equal to zero.
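For one regressor, setting the two derivatives to zero gives closed-form estimates. The sketch below is a minimal illustration, not the course's own code; the function name `ols_simple` and the four data points are made up for the example.

```python
def ols_simple(x, y):
    """Return (b0, b1) that minimize sum((y - b0 - b1*x)**2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Solving the first-order conditions gives slope = S_xy / S_xx.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical sample, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

b0, b1 = ols_simple(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
rss = sum(e ** 2 for e in residuals)
print(b0, b1, rss)
```

The first-order conditions also imply that the residuals sum to zero and are uncorrelated with x in the sample, which can be checked on `residuals`.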
Goodness Of Fit: R2
❑$R^2$, the coefficient of determination, is an overall
measure of the goodness of fit of the estimated
regression line.
❑It gives the proportion of the total variation in the
dependent variable that is explained by the
regressors.
❑It is a value between 0 (no fit) and 1 (perfect fit).
❑Let: Explained Sum of Squares $ESS = \sum (\hat{Y}_i - \bar{Y})^2$
Residual Sum of Squares $RSS = \sum e_i^2$
Total Sum of Squares $TSS = \sum (Y_i - \bar{Y})^2$
❑Then: $R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$
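The decomposition TSS = ESS + RSS, which makes the two expressions for R-squared agree, can be checked numerically. This is a small sketch with made-up data; the fitted values are the OLS fit of y on x = 1, 2, 3, 4 with an intercept, and the helper name `sums_of_squares` is hypothetical.

```python
def sums_of_squares(y, y_hat):
    """Return (ESS, RSS, TSS) for observed y and fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ess = sum((fi - y_bar) ** 2 for fi in y_hat)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return ess, rss, tss


# Hypothetical sample and its OLS fitted values (fit: y_hat = 1.9 * x).
y = [2.0, 4.0, 5.0, 8.0]
y_hat = [1.9, 3.8, 5.7, 7.6]

ess, rss, tss = sums_of_squares(y, y_hat)
r2 = ess / tss
r2_alt = 1 - rss / tss
print(ess, rss, tss, r2, r2_alt)
```

The equality of `r2` and `r2_alt` relies on the fitted values coming from OLS with an intercept; for arbitrary predictions the two expressions can differ.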
Goodness Of Fit: R2
Degree of Freedom: 𝑑𝑓
• 𝑛 is total number of observations
• 𝑘 is total number of estimated coefficients
• 𝑑𝑓 for 𝑅𝑆𝑆 = 𝑛 − 𝑘
Goodness Of Fit: R Squared Adjusted
❑A drawback of $R^2$ is that it is a nondecreasing function of $k$: adding a regressor never lowers it.
❑Sometimes researchers play the game of “maximizing” $R^2$
(some think that the higher the $R^2$, the better the model).
❑To avoid this temptation, $R^2$ should take into account the
number of regressors.
❑Such an $R^2$ is called the adjusted $R^2$, denoted $\bar{R}^2$ (R-bar
squared), and is computed from the (unadjusted) $R^2$ as
follows:
$\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k}$
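A short sketch of the adjustment, assuming k counts every estimated coefficient including the intercept, as in the degrees-of-freedom slide. The function name and the numbers are hypothetical; they only show that the same unadjusted value is penalized more when more coefficients are estimated.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k estimated coefficients."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Same unadjusted R-squared, different model sizes (hypothetical values).
small_model = adjusted_r2(0.30, n=50, k=3)
large_model = adjusted_r2(0.30, n=50, k=6)
print(small_model, large_model)
```

Unlike the unadjusted measure, the adjusted one can fall when a regressor is added, and it can even be negative for a very poor fit.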
Example
Data: 9,200 individual laborers from VHLSS 2008
• workday: number of working days in 12 months
• wage: in thousands VND/day
• gender: dummy variable, male = 1
• age: in years
• edu: years of schooling
• married: dummy variable, married = 1
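Summary statistics
. sum workday wage gender age edu married
. reg workday wage gender age edu married

      Source |       SS           df       MS      Number of obs   =     9,251
-------------+----------------------------------   F(5, 9245)      =     13.65
       Model |  910367.176         5  182073.435   Prob > F        =    0.0000
    Residual |   123276639     9,245   13334.412   R-squared       =    0.0073
-------------+----------------------------------   Adj R-squared   =    0.0068
       Total |   124187006     9,250  13425.6223   Root MSE        =    115.47

------------------------------------------------------------------------------
     workday |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------------------------------------------------------------------

Reading the output:
• Model SS = ESS, Residual SS = RSS, Total SS = TSS
• F(5, 9245) and Prob > F: the F-statistic and its P-value
• workday (head of the first column): the regressand; the rows below it list the regressors
• t: the t-statistic; P>|t|: the P-value of the t-statistic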
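The header statistics of the regression output can be recomputed from the sums of squares and degrees of freedom in the table. The sketch below only redoes that arithmetic; the variable names are chosen for the example, and k = 6 assumes five regressors plus the intercept, which matches the residual df of 9,245.

```python
import math

# Values copied from the Stata output above.
ess, rss, tss = 910367.176, 123276639.0, 124187006.0
n, k = 9251, 6  # assumption: 5 regressors plus the intercept

r2 = ess / tss
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
f_stat = (ess / (k - 1)) / (rss / (n - k))  # Model MS / Residual MS
root_mse = math.sqrt(rss / (n - k))         # sqrt of Residual MS

print(round(r2, 4), round(adj_r2, 4), round(f_stat, 2), round(root_mse, 2))
```

The rounded results agree with R-squared = 0.0073, Adj R-squared = 0.0068, F = 13.65 and Root MSE = 115.47 as reported by Stata.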
Model selection
. estat ic
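For reference, `estat ic` reports the Akaike and Bayesian information criteria. In the sketch below, $\ln L$ is the maximized log-likelihood, $k$ the number of estimated parameters and $N$ the number of observations; these symbols are notational choices for this note.

```latex
\begin{align*}
AIC &= -2 \ln L + 2k \\
BIC &= -2 \ln L + k \ln N
\end{align*}
```

When comparing models fitted to the same sample, the model with the lower AIC or BIC is preferred. BIC penalizes extra parameters more heavily than AIC once $\ln N > 2$.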
. sum u, detail

                          Residuals
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -122.7432      -445.4255
 5%    -116.9056      -157.4657
10%    -113.8881       -155.805       Obs                9246
25%    -105.6947      -151.2645       Sum of Wgt.        9246
[Figure: conditional densities f(y|x) drawn at x1 and x2 around the regression line E(y|x) = b0 + b1x]
Heteroskedastic Case
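[Figure: conditional densities f(y|x) drawn at x1, x2 and x3 around the regression line E(y|x) = b0 + b1x, with a spread that changes with x]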
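The difference between constant and non-constant error variance can be illustrated with simulated errors. This is a sketch on artificial data, not the VHLSS sample; the standard deviations, the seed and the helper `spread_ratio` are assumptions made for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

x = list(range(1, 201))
u_homo = [random.gauss(0, 10) for _ in x]           # Var(u|x) is constant
u_hetero = [random.gauss(0, 0.5 * xi) for xi in x]  # sd of u grows with x


def spread_ratio(u):
    """Variance of the errors at high x divided by the variance at low x."""
    half = len(u) // 2
    return statistics.pvariance(u[half:]) / statistics.pvariance(u[:half])


ratio_homo = spread_ratio(u_homo)
ratio_hetero = spread_ratio(u_hetero)
print(ratio_homo, ratio_hetero)
```

A ratio near 1 is what constant variance would produce; a ratio far above 1 signals that the spread of the errors increases with x.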
Example: working days and wage
Regression without outliers
. reg workday wage gender age edu married if wage<15000
For a one-sided test, use P-value/2
. test wage
( 1) wage = 0
F( 1, 9237) = 169.65
Prob > F = 0.0000
. test married
( 1) married = 0
F( 1, 9237) = 1.08
Prob > F = 0.2985
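With a single restriction, the F statistic reported by `test` is the square of the coefficient's t statistic, so the absolute t value can be recovered from it. The sketch below redoes that arithmetic with the numbers from the output above. The halved P-value applies to a one-sided alternative only when the estimated sign agrees with that alternative, which the output shown here does not reveal.

```python
import math

# F(1, 9237) values and the two-sided P-value copied from the test output.
f_wage, f_married = 169.65, 1.08
p_married_two_sided = 0.2985

abs_t_wage = math.sqrt(f_wage)        # |t| for wage
abs_t_married = math.sqrt(f_married)  # |t| for married
p_married_one_sided = p_married_two_sided / 2

print(round(abs_t_wage, 2), round(abs_t_married, 2), p_married_one_sided)
```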
Testing Individual Coefficient: t test
. reg workday wage gender age edu married if wage<15000
. test gender age edu married
( 1) gender = 0
( 2) age = 0
( 3) edu = 0
( 4) married = 0
F( 4, 9237) = 5.76
Prob > F = 0.0001
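A joint test such as F(4, 9237) compares a restricted model, in which the tested coefficients are set to zero, with the unrestricted one. The sketch below shows the standard statistic for m restrictions; the function name and the RSS values are hypothetical, since the slides do not report the restricted RSS.

```python
def f_restrictions(rss_restricted, rss_unrestricted, m, n, k):
    """F statistic for m linear restrictions.

    k counts all coefficients of the unrestricted model, intercept included.
    """
    numerator = (rss_restricted - rss_unrestricted) / m
    denominator = rss_unrestricted / (n - k)
    return numerator / denominator


# Hypothetical numbers, for illustration only.
f_stat = f_restrictions(120.0, 100.0, m=4, n=105, k=5)
print(f_stat)
```

The statistic is compared with the F distribution with m and n - k degrees of freedom, which in the output above are 4 and 9237.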
F test for overall significance
❑Step 1: Form hypotheses
❑$H_0: \beta_2 = \beta_3 = \dots = \beta_k = 0$
❑$H_a$: at least one of $\beta_2, \dots, \beta_k$ is different from 0
. test wage gender age edu married
( 1) wage = 0
( 2) gender = 0
( 3) age = 0
( 4) edu = 0
( 5) married = 0
F( 5, 9237) = 39.01
Prob > F = 0.0000
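The overall F statistic can also be written in terms of R-squared. As a hedged check, the sketch below applies that form to the full-sample regression reported earlier in the deck (9,251 observations), because the R-squared of the restricted-sample regression is not shown; the function name `overall_f` is hypothetical.

```python
def overall_f(r2, n, k):
    """Overall-significance F statistic from R-squared.

    n is the number of observations, k the number of coefficients
    including the intercept.
    """
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


# Full-sample regression: R-squared recomputed from Model SS / Total SS.
r2_full = 910367.176 / 124187006.0
f_full = overall_f(r2_full, n=9251, k=6)
print(round(f_full, 2))
```

The result rounds to 13.65, the F(5, 9245) value in the full-sample output. A model can therefore be jointly significant even when its R-squared is very small, provided the sample is large.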
F test for overall significance
. reg workday wage gender age edu married if wage<15000
gen white = 0
replace white = 1 if race == 1
gen black = 0
replace black = 1 if race == 2
• Then add white and black to the regression as regressors
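The same dummy construction, sketched outside Stata. The codes 1 = white and 2 = black follow the `replace` commands above; the code 3 and the sample values are hypothetical, standing in for the omitted base category.

```python
# Hypothetical race codes: 1 = white, 2 = black, 3 = any other category.
race = [1, 2, 3, 1, 2]

# One 0/1 indicator per category that enters the regression.
white = [1 if r == 1 else 0 for r in race]
black = [1 if r == 2 else 0 for r in race]

print(white, black)
```

The remaining category gets no dummy of its own. It serves as the base group, and the coefficients on white and black are read as differences relative to it; adding a dummy for every category together with the intercept would create perfect collinearity.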
The wage function again
. reg lwage edu tenure age age2 male married union white black