ASSIGNMENT 1
WONG SHAO YUN CHARIS
LILI LAURINA EISENRING
Question 1
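Consider a simple linear regression: $y_i = \beta_0 + \beta_1 x_i + u_i$, where $u_i$ is the error term. The sample regression function is $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
Assume the Gauss-Markov assumptions are satisfied. This implies that $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased estimators, i.e. $E(\hat{\beta}_0) = \beta_0$ and $E(\hat{\beta}_1) = \beta_1$. (P1)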
Let [...]
To prove that [...]:
E2: [...]
E3: [...]
E4: [...]
E5: [...]
E6: [...]
E7: [...] (substituting E2 and then E1 inside, and using E6)
[...], i.e. [...]
Hence, [...] is an unbiased estimator of [...].
Question 2
(a) Yes. The p-value for $\hat{\beta}_{expersq}$ is reported as 0.000, which is smaller than the 1% significance level, so we reject the null hypothesis that $\beta_{expersq} = 0$. Thus, exper$^2$ is statistically significant at the 1% significance level.
(b) $\%\Delta wage \approx 100\,(\hat{\beta}_{exper} + 2\hat{\beta}_{expersq}\, exper)\,\Delta exper$
Holding other things equal and using the above approximation, the approximate return to the fifth year of experience is:
100*(.0328542 + 2*(-.0006606)*5)*5 = 100*(0.0262482*5) = 13.12%
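The approximation in (b) can be checked with a short Python sketch. The coefficients are copied from the regression output in this assignment; the function name `approx_pct_change` is only illustrative. The first call reproduces the calculation as written above (exper = 5, change of 5 years). The second call is an alternative reading, assumed here and not taken from the answer, in which the fifth year is one extra year starting from exper = 4.

```python
# Coefficients copied from the regression output in this assignment.
B_EXPER = 0.0328542
B_EXPERSQ = -0.0006606


def approx_pct_change(exper: float, d_exper: float) -> float:
    """Approximate % change in wage: 100 * (b_exper + 2*b_expersq*exper) * d_exper."""
    return 100 * (B_EXPER + 2 * B_EXPERSQ * exper) * d_exper


# The calculation as written in the answer: exper = 5, d_exper = 5.
as_in_answer = approx_pct_change(exper=5, d_exper=5)

# Alternative reading (an assumption): one extra year starting from exper = 4.
one_year_from_4 = approx_pct_change(exper=4, d_exper=1)

print(round(as_in_answer, 2), round(one_year_from_4, 2))
```

The two readings differ by a factor of roughly five, so it is worth stating which values of exper and the change in exper are being plugged in.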
(c) If education increases by 2 years, the wage on average increases by approximately 100*(.0853489*2) = 17.1%, holding other factors constant.
(d) F(4,521) is the F-statistic to test $H_0: \beta_{educ} = \beta_{tenure} = \beta_{exper} = \beta_{expersq} = 0$ against $H_1$: $H_0$ is not true.
Since the reported p-value is 0.0000, F(4,521) = 73.09 exceeds the critical value of the F-distribution with q = number of restrictions = 4 and degrees of freedom = n-k-1 = 521. Thus, we reject $H_0$ at the 5% significance level. Therefore, one or more of these variables (education, tenure, experience and exper$^2$) are important regressors in explaining the wage.
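As a check on the reported F(4,521), the statistic can be recomputed from sums of squared residuals. The sketch below assumes the standard form F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n-k-1)], with the intercept-only model as the restricted model, so that its SSR equals the Total SS in the Stata output. The function name `f_statistic` is illustrative.

```python
def f_statistic(ssr_restricted: float, ssr_unrestricted: float, q: int, df_resid: int) -> float:
    """F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_resid]."""
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_resid)


# Values copied from the ANOVA block of the Stata output (Total SS and Residual SS).
SST = 148.329751        # SSR of the intercept-only (restricted) model
SSR_UR = 95.0110462     # SSR of the model with educ, tenure, exper, expersq

overall_f = f_statistic(SST, SSR_UR, q=4, df_resid=521)
print(round(overall_f, 2))
```

The same function applies to any exclusion test once the restricted and unrestricted residual sums of squares are known.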
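(e) Test $H_0: \beta_{exper} = \beta_{expersq} = 0$ vs $H_1$: $H_0$ is not true.
q = number of restrictions = 2 and degrees of freedom = n-k-1 = 521.
Unrestricted (UR) Model: $\log(wage) = \beta_0 + \beta_{educ}\,educ + \beta_{tenure}\,tenure + \beta_{exper}\,exper + \beta_{expersq}\,exper^2 + u$
Restricted (R) Model: $\log(wage) = \beta_0 + \beta_{educ}\,educ + \beta_{tenure}\,tenure + u$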
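Results for Unrestricted model:

. reg lwage educ tenure exper expersq

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  4,   521) =   73.09
       Model |  53.3187052     4  13.3296763           Prob > F      =  0.0000
    Residual |  95.0110462   521  .182362853           R-squared     =  0.3595
-------------+------------------------------           Adj R-squared =  0.3545
       Total |  148.329751   525   .28253286           Root MSE      =  .42704

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0853489   .0071885    11.87   0.000      .071227    .0994709
      tenure |   .0208413   .0030037     6.94   0.000     .0149404    .0267422
       exper |   .0328542   .0051135     6.42   0.000     .0228085    .0428999
     expersq |  -.0006606   .0001111    -5.94   0.000    -.0008789   -.0004423
       _cons |   .1983445   .1019556     1.95   0.052    -.0019501    .3986392
------------------------------------------------------------------------------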
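Results for Restricted model:

. reg lwage educ tenure

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  2,   523) =  116.67
       Model |  45.7626421     2   22.881321           Prob > F      =  0.0000
    Residual |  102.567109   523   .19611302           R-squared     =  0.3085
-------------+------------------------------           Adj R-squared =  0.3059
       Total |  148.329751   525   .28253286           Root MSE      =  .44285

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0865276   .0069909    12.38   0.000     .0727939    .1002613
      tenure |   .0258143   .0026795     9.63   0.000     .0205504    .0310782
       _cons |   .4044739   .0916956     4.41   0.000      .224337    .5846109
------------------------------------------------------------------------------

. test exper expersq

 ( 1)  exper = 0
 ( 2)  expersq = 0

       F(  2,   521) =   20.72
            Prob > F =    0.0000

$F = \frac{(SSR_R - SSR_{UR})/q}{SSR_{UR}/(n-k-1)} = \frac{(102.567109 - 95.0110462)/2}{95.0110462/521} \approx 20.72$
The p-value = 0.0000 is less than the 5% significance level, implying F = 20.72 > critical F-value at the 5% significance level. Hence, we reject $H_0: \beta_{exper} = \beta_{expersq} = 0$ at the 5% significance level. Therefore, it is quite likely that experience and (experience)$^2$ are jointly important in explaining wage.
Question 3
(a) As this model is a level-log model, the interpretation of $\beta_1$ (the coefficient on log(expendA)) would be: $\Delta voteA \approx (\beta_1/100)\,\%\Delta expendA$.
Therefore, holding other factors constant, a 1% increase in Candidate A's campaign expenditure on average increases the percentage of the vote received by Candidate A by $\beta_1/100$ percentage points.
(b) The null hypothesis is that a 1% increase in A's expenditures is offset by a 1% increase in B's expenditures. This is equivalent to $H_0: \beta_1 + \beta_2 = 0$, where $\beta_2$ is the coefficient on log(expendB).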
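To make the level-log reading in (a) and the offsetting hypothesis in (b) concrete, the sketch below converts percentage changes in spending into percentage-point changes in voteA. The numbers are read from the regression of voteA on lexpendB, lexpendA_B and prtystrA reported in this assignment. It assumes that lexpendA_B = lexpendA - lexpendB, so that the coefficient on lexpendA_B estimates $\beta_1$ and the coefficient on lexpendB estimates $\beta_1 + \beta_2$. The function name `delta_vote` is illustrative.

```python
# Assumption: lexpendA_B = lexpendA - lexpendB in the reported regression, so the
# coefficient on lexpendA_B is beta1 and the coefficient on lexpendB is beta1 + beta2.
BETA1 = 6.083316          # coefficient on lexpendA_B in the reported output
THETA = -0.532101         # coefficient on lexpendB in the reported output
BETA2 = THETA - BETA1     # implied coefficient on log(expendB)


def delta_vote(pct_change_a: float, pct_change_b: float) -> float:
    """Approximate change in voteA (percentage points) for given % changes in spending."""
    return (BETA1 / 100) * pct_change_a + (BETA2 / 100) * pct_change_b


only_a = delta_vote(1, 0)   # 1% more spending by A alone
both = delta_vote(1, 1)     # 1% more spending by both candidates
print(round(only_a, 4), round(both, 4))
```

Under the null hypothesis in (b), the second quantity would be exactly zero in the population; the estimate is small but not zero.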
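(c) Results for given model:
The reported t-ratios for $\hat{\beta}_1$ and $\hat{\beta}_2$ test $H_0: \beta_1 = 0$ and $H_0: \beta_2 = 0$ individually, and not the null hypothesis in part (b).
Furthermore, the t-statistic to test the null hypothesis in part (b) is calculated as follows:
$t = \frac{\hat{\beta}_1 + \hat{\beta}_2}{se(\hat{\beta}_1 + \hat{\beta}_2)}$
However, $se(\hat{\beta}_1 + \hat{\beta}_2) = \sqrt{Var(\hat{\beta}_1) + Var(\hat{\beta}_2) + 2\,Cov(\hat{\beta}_1, \hat{\beta}_2)}$, which cannot be computed from the reported standard errors alone.
Thus, to test $H_0: \theta = 0$ vs. $H_1: \theta \neq 0$, regress voteA against lexpendB, lexpendA_B and prtystrA, where $\theta = \beta_1 + \beta_2$ and lexpendA_B = lexpendA - lexpendB.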
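The t-statistic for the hypothesis in part (b) can be read off the regression of voteA on lexpendB, lexpendA_B and prtystrA reported in this assignment. The sketch below recomputes it from the coefficient and standard error on lexpendB, assuming lexpendA_B = lexpendA - lexpendB so that this coefficient equals $\beta_1 + \beta_2$. The 1.96 critical value is the large-sample normal approximation, used here in place of the exact t critical value with 169 degrees of freedom.

```python
# Coefficients and standard error copied from the reported regression of voteA on
# lexpendB, lexpendA_B and prtystrA. Assumption: lexpendA_B = lexpendA - lexpendB.
THETA_HAT = -0.532101     # coefficient on lexpendB, i.e. beta1 + beta2
SE_THETA = 0.5330858      # its standard error
BETA1_HAT = 6.083316      # coefficient on lexpendA_B

t_theta = THETA_HAT / SE_THETA
beta2_hat = THETA_HAT - BETA1_HAT   # implied coefficient on lexpendB in the original model

# Two-sided 5% critical value, using the large-sample normal approximation.
CRITICAL_5PCT = 1.96
reject_h0 = abs(t_theta) > CRITICAL_5PCT
print(round(t_theta, 2), round(beta2_hat, 6), reject_h0)
```

The recomputed t-ratio matches the -1.00 shown in the output, which is well inside the acceptance region at the 5% level.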
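. reg voteA lexpendB lexpendA_B prtystrA

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  3,   169) =  215.23
       Model |  38405.1097     3  12801.7032           Prob > F      =  0.0000
    Residual |  10052.1388   169  59.4801115           R-squared     =  0.7926
-------------+------------------------------           Adj R-squared =  0.7889
       Total |  48457.2486   172  281.728189           Root MSE      =  7.7123

------------------------------------------------------------------------------
       voteA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lexpendB |   -.532101   .5330858    -1.00   0.320    -1.584466    .5202639
  lexpendA_B |   6.083316     .38215    15.92   0.000     5.328914    6.837719
    prtystrA |   .1519574   .0620181     2.45   0.015     .0295274    .2743873
       _cons |   45.07893   3.926305    11.48   0.000     37.32801    52.82985
------------------------------------------------------------------------------

Results from auxiliary regression:

. reg u2 lexpendA lexpendB prtystrA

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  3,   169) =    3.25
       Model |  79798.8863     3  26599.6288           Prob > F      =  0.0233
    Residual |  1383202.77   169  8184.63179           R-squared     =  0.0545
-------------+------------------------------           Adj R-squared =  0.0378
       Total |  1463001.66   172   8505.8236           Root MSE      =  90.469

------------------------------------------------------------------------------
          u2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lexpendA |  -12.01325   4.482781    -2.68   0.008    -20.86271   -3.163787
    lexpendB |  -3.011196   4.443722    -0.68   0.499    -11.78355    5.761159
    prtystrA |  -.5260204   .7274986    -0.72   0.471    -1.962176     .910135
       _cons |   159.5399   46.05722     3.46   0.001      68.6183    250.4615
------------------------------------------------------------------------------

LM statistic = n * $R^2_{\hat{u}^2}$ = 173 * 0.0545 = 9.4285 > 7.815 = 5% critical value of the $\chi^2_3$ distribution.
Hence, we reject the null hypothesis and conclude that the model contains heteroskedastic errors (i.e. the homoskedasticity assumption is not met).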
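The LM form of the heteroskedasticity test is simple enough to recompute by hand. The sketch below uses the number of observations and the R-squared of the auxiliary regression of the squared residuals, both copied from the output in this assignment, together with the 5% critical value of the chi-square distribution with 3 degrees of freedom quoted in the text.

```python
N_OBS = 173                   # observations in the auxiliary regression
R2_AUX = 0.0545               # R-squared from regressing squared residuals on the regressors
CHI2_CRIT_3DF_5PCT = 7.815    # 5% critical value, chi-square with 3 degrees of freedom

lm_stat = N_OBS * R2_AUX
reject_homoskedasticity = lm_stat > CHI2_CRIT_3DF_5PCT
print(round(lm_stat, 4), reject_homoskedasticity)
```

The F-version of the same test is the overall F of the auxiliary regression, F(3, 169) = 3.25 with p-value 0.0233, which leads to the same decision at the 5% level.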
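(f) To test for nonlinearities in the original model, we test $H_0: \delta_1 = \delta_2 = 0$, where $\delta_1$ and $\delta_2$ are the slope parameters on $\hat{y}^2$ and $\hat{y}^3$ in the following regression:
$voteA = \beta_0 + \beta_1\,lexpendA + \beta_2\,lexpendB + \beta_3\,prtystrA + \delta_1 \hat{y}^2 + \delta_2 \hat{y}^3 + error$
The results of the above regression:

. reg voteA lexpendA lexpendB prtystrA yhat2 yhat3

      Source |       SS       df       MS              Number of obs =     173
-------------+------------------------------           F(  5,   167) =  187.76
       Model |  41139.1796     5  8227.83592           Prob > F      =  0.0000
    Residual |  7318.06896   167  43.8207722           R-squared     =  0.8490
-------------+------------------------------           Adj R-squared =  0.8445
       Total |  48457.2486   172  281.728189           Root MSE      =  6.6197

------------------------------------------------------------------------------
       voteA |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lexpendA |  -15.66736   2.972818    -5.27   0.000    -21.53651   -9.798213
    lexpendB |   16.34374   3.379216     4.84   0.000      9.67225    23.01523
    prtystrA |  -.3593402   .0921216    -3.90   0.000    -.5412132   -.1774672
       yhat2 |   .0836951   .0107371     7.79   0.000     .0624972    .1048931
       yhat3 |  -.0005976   .0000759    -7.88   0.000    -.0007474   -.0004478
       _cons |  -68.73712   16.28095    -4.22   0.000    -100.8801   -36.59411
------------------------------------------------------------------------------

. test yhat2 yhat3

 ( 1)  yhat2 = 0
 ( 2)  yhat3 = 0

       F(  2,   167) =   31.20
            Prob > F =    0.0000

Hence, reject $H_0$.
Question 4
(a) The OLS estimators minimise $\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1})^2$ with respect to $\beta_0$ and $\beta_1$. Thus, after differentiating with respect to $\beta_0$ and $\beta_1$, the following first order conditions are:
$\sum_{i=1}^{n} (y_i - \tilde{\beta}_0 - \tilde{\beta}_1 x_{i1}) = 0$ -- (1)
$\sum_{i=1}^{n} x_{i1}\,(y_i - \tilde{\beta}_0 - \tilde{\beta}_1 x_{i1}) = 0$ -- (2)
Substitute (1) into (2):
From (1), $\tilde{\beta}_0 = \bar{y} - \tilde{\beta}_1 \bar{x}_1$, so (2) becomes $\sum_{i=1}^{n} x_{i1}\,[(y_i - \bar{y}) - \tilde{\beta}_1 (x_{i1} - \bar{x}_1)] = 0$, which gives
$\tilde{\beta}_1 = \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(y_i - \bar{y})}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}$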
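The two first order conditions (1) and (2) can be verified numerically. The sketch below solves them in closed form for a simple regression and then checks that the residuals sum to zero and are orthogonal to the regressor. The data are hypothetical and lie exactly on the line y = 1 + 2x; the function name `ols_simple` is illustrative.

```python
from statistics import fmean


def ols_simple(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) solving the two first order conditions."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar   # rearranged condition (1)
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x, so the fit should recover (1, 2).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
b0, b1 = ols_simple(x, y)

# Check the first order conditions: residuals sum to zero and are orthogonal to x.
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
foc1 = sum(resid)
foc2 = sum(xi * ri for xi, ri in zip(x, resid))
print(b0, b1, foc1, foc2)
```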
(b) Assume that $x_1$ and $x_2$ are not correlated with each other. This implies that the error term $v$, which contains $x_2$, is uncorrelated with $x_1$, implying that the zero conditional mean assumption holds, i.e. $E(v \mid x_1) = 0$.
However, if the zero conditional mean assumption does not hold, this would imply that the error term $v$ is correlated with $x_1$, i.e. $Cov(x_1, v) \neq 0$.
Hence, [...].
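(c) Assume $x_1$ and $x_2$ are positively correlated and $\beta_2$ has a positive theoretical sign.
Positive correlation between $x_1$ and $x_2$ implies that $x_1$ and $v$ are positively correlated, i.e.
$Cov(x_1, v) = Cov(x_1, \beta_2 x_2 + u) = \beta_2\,Cov(x_1, x_2) + Cov(x_1, u) = \beta_2\,Cov(x_1, x_2) > 0$
(This is because under the true regression model, zero conditional mean is satisfied, i.e. $E(u \mid x_1, x_2) = 0$, so $Cov(x_1, u) = 0$.)
Therefore, $plim\,\tilde{\beta}_1 = \beta_1 + \frac{Cov(x_1, v)}{Var(x_1)} > \beta_1$ (as $Var(x_1) > 0$).
Hence, even under large samples, $\tilde{\beta}_1$ overestimates the true $\beta_1$. Therefore, $\tilde{\beta}_1$ is not a consistent estimator of $\beta_1$.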
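The direction of the bias in (c) can be illustrated with a small simulation. Everything in the sketch below is hypothetical: the true parameters, the strength of the correlation between x1 and x2, the sample size and the random seed are chosen only for illustration. It regresses y on x1 alone and compares the slope with the large-sample value $\beta_1 + \beta_2\,Cov(x_1, x_2)/Var(x_1)$.

```python
import random
from statistics import fmean

random.seed(0)                        # fixed seed so the run is reproducible
BETA0, BETA1, BETA2 = 0.5, 1.0, 2.0   # hypothetical true parameters (beta2 > 0)
RHO = 0.5                             # hypothetical slope of x2 on x1 (positive correlation)
N = 20_000

x1 = [random.gauss(0, 1) for _ in range(N)]
x2 = [RHO * a + random.gauss(0, 1) for a in x1]
y = [BETA0 + BETA1 * a + BETA2 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

# Short regression of y on x1 only: slope = sample Cov(x1, y) / sample Var(x1).
x1_bar, y_bar = fmean(x1), fmean(y)
slope_short = (
    sum((a - x1_bar) * (c - y_bar) for a, c in zip(x1, y))
    / sum((a - x1_bar) ** 2 for a in x1)
)

# Large-sample value implied by the omitted variable formula (Var(x1) = 1 here).
plim_short = BETA1 + BETA2 * RHO
print(round(slope_short, 3), plim_short)
```

With these settings the short-regression slope settles near 2 rather than the true value of 1, and increasing the sample size does not remove the gap.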
Question 5
(a) Mean of prpblck = 0.113 = 11.3%; standard deviation of prpblck = 0.1824165
Mean of income = 47053.78; standard deviation of income = 13179.29
prpblck is measured as a proportion (so 0.113 corresponds to 11.3%) and income is measured in dollars.
(b) Results:
Results in equation form:
(0.018992) (0.026001) (0.000000362)
n = 401, $R^2$ = 0.0681
Interpret the coefficient on prpblck: As the proportion of blacks increases by 0.1 (i.e. 10
percentage points), price of soda on average increases by 0.012 dollars, ceteris paribus.
The coefficient on prpblck is not economically large: a 10 percentage point increase in prpblck raises the price of soda by only about 1.2 cents.
(c) Results:
Interpret the coefficient on prpblck: As the proportion of blacks increases by 0.01 (i.e. 1
percentage point), price of soda on average increases by 0.1215%, ceteris paribus.
If prpblck increases by 0.2 (20 percentage points), price of soda on average increases by
2.43%, ceteris paribus.
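The percentage effects in (c) use the approximation %Δpsoda ≈ 100·β·Δprpblck. The sketch below reproduces both figures and, for comparison, computes the exact percentage change implied by a log-level model, 100·(exp(β·Δprpblck) - 1). The coefficient 0.1215 is inferred from the interpretation given above, not copied from a regression table, and the function names are illustrative.

```python
import math

# Coefficient on prpblck implied by the interpretation in the text
# (a 0.01 increase in prpblck raises psoda by about 0.1215%).
B_PRPBLCK = 0.1215


def approx_pct(d_prpblck: float) -> float:
    """Approximate % change in psoda: 100 * b * d_prpblck."""
    return 100 * B_PRPBLCK * d_prpblck


def exact_pct(d_prpblck: float) -> float:
    """Exact % change implied by the log model: 100 * (exp(b * d_prpblck) - 1)."""
    return 100 * (math.exp(B_PRPBLCK * d_prpblck) - 1)


print(round(approx_pct(0.01), 4), round(approx_pct(0.2), 2), round(exact_pct(0.2), 2))
```

For a change as small as 0.2 the approximate and exact figures differ only in the second decimal place, so the approximation used in the answer is adequate here.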
(d) Results of model in part (c) after including prppov:
Intuition: The higher the proportion in poverty (i.e. higher prppov), the lower the demand
for soda. This is because there are fewer people who are able to afford soda. Thus, by the
theory of demand, price of soda (psoda) will increase. This implies a positive relationship
between prppov and psoda.
Since the coefficient of prppov is positive (as shown in the above report), it is what we
expected.
(e) "Because log(income) and prppov are so highly correlated, they have no business being in the same regression."
The above statement is false. The classical linear model assumptions rule out only perfect collinearity, i.e. perfect correlation between log(income) and prppov. Explanatory variables are allowed to be correlated with each other, as long as the correlation is not perfect. Since the correlation between log(income) and prppov (shown below) shows that these two variables are not perfectly correlated, log(income) and prppov can both be in the same regression.
(f) (i) Test $H_0: \beta_{prpblck} = \beta_{\log(income)} = \beta_{prppov} = 0$
Hence, based on the p-value, we reject $H_0$ at the 5% significance level. We conclude that one or more of these variables (prpblck, log(income) and prppov) are important in explaining the price of soda.
(ii) Test [...]
Hence, based on the p-value, we fail to reject $H_0$ at the 5% significance level.
(iii) Test [...]
Hence, we reject $H_0$ at the 5% significance level.