
Homework 3

Problem 3.1
In a study relating college grade point average to time spent in various activities, you
distribute a survey to several students. The students are asked how many hours they spend
each week in four activities: studying (study), sleeping (sleep), working (work), and
leisure (leisure). Any activity is put into one of the four categories, so that for each student, the
sum of hours in the four activities must be 168.

(i) In the model

$$GPA = \beta_0 + \beta_1 study + \beta_2 sleep + \beta_3 work + \beta_4 leisure + u,$$

does it make sense to hold sleep, work, and leisure fixed, while changing study?
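No. It does not make sense to hold sleep, work, and leisure fixed while changing study.
The variables study, sleep, work, and leisure are linearly related: for every student they
sum to 168 hours per week, so if study increases by one hour, at least one of the other
three must decrease by one hour.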

(ii) Explain why this model violates Assumption MLR.3.


Every hour of the week falls into one of the four categories. A student can study more or
less, but once we know how much time is spent in three of the categories, we know how
much time is spent in the fourth. For example,

$$leisure = 168 - study - sleep - work,$$

so there is a perfect linear relationship between the first three variables (together with the
intercept) and the fourth. Therefore, if all four variables are included along with an
intercept, Assumption MLR.3 (no perfect collinearity) is violated.
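MLR.3 fails because the columns of the regressor matrix (intercept plus the four activities) are linearly dependent: 168 times the intercept column equals the sum of the four activity columns. The sketch below checks this with an exact rank computation over the rationals; the six students' hours are hypothetical numbers chosen only for illustration, and `rank` is a small helper written for this example, not part of any library.

```python
from fractions import Fraction


def rank(rows):
    """Exact rank of a matrix (list of rows), by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it is a combination of earlier columns
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# Hypothetical weekly hours (study, sleep, work) for six students;
# leisure is whatever remains of the 168-hour week.
students = [(20, 56, 10), (35, 49, 0), (12, 60, 25), (28, 50, 15), (15, 63, 30), (40, 45, 5)]
X = [[1, st, sl, wk, 168 - st - sl - wk] for st, sl, wk in students]  # intercept + 4 activities

print(rank(X), "independent columns out of", len(X[0]))  # 4 independent columns out of 5
```

Using exact fractions avoids having to pick a numerical tolerance, which is why the rank comes out as exactly 4 rather than "numerically almost 5".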

(iii) How could you reformulate the model so that its parameters have a useful
interpretation and it satisfies Assumption MLR.3?
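We could drop one of the four variables. For example, we could drop leisure from the
model and rewrite it as

$$GPA = \beta_0 + \beta_1 study + \beta_2 sleep + \beta_3 work + u.$$

Because the week always has 168 hours, increasing study by one hour while holding sleep
and work fixed means that leisure must fall by one hour. In this way, $\beta_1$ tells us the
effect on GPA of substituting one more hour of study for one less hour of leisure. Since
study, sleep, and work are no longer an exact linear function of one another, this model
satisfies MLR.3 (provided there is no other exact relationship among them in the sample).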
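Substituting the adding-up constraint into the four-variable model shows what the slope on study measures once leisure is dropped. In this derivation the β's are the parameters of the original model from part (i), not the relabeled β's of the reformulated model:

```latex
\begin{aligned}
GPA &= \beta_0 + \beta_1 study + \beta_2 sleep + \beta_3 work + \beta_4 (168 - study - sleep - work) + u \\
    &= (\beta_0 + 168\beta_4) + (\beta_1 - \beta_4)\, study + (\beta_2 - \beta_4)\, sleep + (\beta_3 - \beta_4)\, work + u.
\end{aligned}
```

So the coefficient on study in the reformulated model corresponds to $\beta_1 - \beta_4$ in the original notation: the effect of an hour of study relative to an hour of leisure.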
Problem 3.2
Suppose that average worker productivity at manufacturing firms (avgprod) depends on
two factors, average hours of training (avgtrain) and average worker ability (avgabil):

$$avgprod = \beta_0 + \beta_1 avgtrain + \beta_2 avgabil + u.$$

Assume that this equation satisfies the Gauss-Markov assumptions. If grants have been
given to firms whose workers have less than average ability, so that avgtrain and avgabil
are negatively correlated, what is the likely bias in $\tilde{\beta}_1$ obtained from the simple regression
of avgprod on avgtrain?
If we omit avgabil, we misspecify the model as

$$avgprod = \beta_0 + \beta_1 avgtrain + v.$$

If avgabil affects avgprod, and if avgtrain and avgabil are correlated, then omitting
avgabil leads to omitted variable bias.

The sign of the bias is likely to be negative. This is because corr(avgtrain, avgabil) is
negative (the grants went to firms whose workers have below-average ability, so those
firms provide more training), and because avgabil positively affects avgprod (presumably,
higher-ability workers tend to be more productive, so $\beta_2 > 0$).

Since the correlation between the omitted and included variables is negative, and the
effect of the omitted variable on avgprod is positive, we expect the bias in $\tilde{\beta}_1$ to be
negative: the simple regression tends to underestimate the effect of training.
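The omitted variable bias formula, $E(\tilde{\beta}_1) - \beta_1 = \beta_2 \tilde{\delta}_1$ with $\tilde{\delta}_1$ the slope from regressing avgabil on avgtrain, makes the sign argument concrete. The simulation below is only a sketch with made-up parameters (true effects 0.3 and 0.5, normally distributed variables), not firm data; it shows the simple-regression slope landing below the true $\beta_1$ when $\beta_2 > 0$ and $\tilde{\delta}_1 < 0$.

```python
import random

random.seed(42)
BETA1, BETA2 = 0.3, 0.5  # hypothetical true effects of avgtrain and avgabil (BETA2 > 0)
n = 20_000

avgabil = [random.gauss(0, 1) for _ in range(n)]
# Grants target low-ability firms, so training is negatively related to ability.
avgtrain = [-0.5 * a + random.gauss(0, 1) for a in avgabil]
avgprod = [1 + BETA1 * t + BETA2 * a + random.gauss(0, 1) for t, a in zip(avgtrain, avgabil)]


def slope(y, x):
    """OLS slope from the simple regression of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


b1_tilde = slope(avgprod, avgtrain)  # simple regression that omits avgabil
delta1 = slope(avgabil, avgtrain)    # negative, by construction
print(f"b1_tilde = {b1_tilde:.3f} vs true BETA1 = {BETA1}; delta1 = {delta1:.3f}")
```

With these made-up numbers the population slope of avgabil on avgtrain is -0.4, so the simple-regression slope should be near 0.3 + 0.5 × (-0.4) = 0.1.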
Problem 3.3
The following equation describes the relationship between fourth-grade pass rates on a
math test, measured as a percent, spending per student (exppp, in dollars), and the
percentage of students eligible for free and reduced-price lunches (lunch):

$$math4 = \beta_0 + \beta_1 \log(exppp) + \beta_2 lunch + u.$$

(i) How much is the percentage point change in math4 when exppp increases by 10
percent?
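Because math4 is already measured in percent, a change in log(exppp) turns into a percentage point change through $\beta_1$. A worked version of part (i), using the usual approximation $\Delta\log(exppp) \approx \%\Delta exppp/100$:

```latex
\Delta \widehat{math4} = \beta_1\, \Delta \log(exppp) \approx \frac{\beta_1}{100}\,(\%\Delta exppp) = \frac{\beta_1}{100}(10) = \frac{\beta_1}{10}
```

So a 10% increase in exppp changes math4 by about $\beta_1/10$ percentage points (exactly $\beta_1 \log(1.1) \approx 0.0953\,\beta_1$).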
(ii) If expenditure per student is higher at poor schools, are log(exppp) and lunch
positively or negatively correlated?
(iii) The following equations were estimated:

$$\widehat{math4} = 84.84 - 1.52\log(exppp), \qquad R^2 = 0.0003,$$

$$\widehat{math4} = 84.84 + 11.38\log(exppp) - 0.471\,lunch, \qquad R^2 = 0.370.$$

From these simple and multiple regression results, determine whether, in this sample,
log(exppp) and lunch are positively or negatively correlated.
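One way to settle (iii) is the in-sample omitted variable identity $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$ (the same identity checked in Problem 3.6 (iv)), where $\tilde{\delta}_1$ is the slope from regressing lunch on log(exppp). The sketch solves it for $\tilde{\delta}_1$, taking the simple-regression slope as -1.52 and the lunch coefficient as -0.471 (negative signs assumed).

```python
# Coefficients from the two estimated equations (negative signs assumed).
b1_simple = -1.52     # slope on log(exppp) in the simple regression
b1_multiple = 11.38   # slope on log(exppp) in the multiple regression
b2_lunch = -0.471     # coefficient on lunch in the multiple regression

# Identity: b1_simple = b1_multiple + b2_lunch * delta1,
# where delta1 is the slope from regressing lunch on log(exppp).
delta1 = (b1_simple - b1_multiple) / b2_lunch
print(round(delta1, 2))  # 27.39
```

A positive $\tilde{\delta}_1$ has the same sign as the sample correlation, so log(exppp) and lunch are positively correlated in this sample, which is also what (ii) predicts if spending is higher at poorer schools.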
Problem 3.4
The median starting salary for new law school graduates is determined by

$$\log(salary) = \beta_0 + \beta_1 LSAT + \beta_2 GPA + \beta_3 \log(libvol) + \beta_4 \log(cost) + \beta_5 rank + u,$$

where LSAT is the median LSAT score for the graduating class, GPA is the median college
GPA for the class, libvol is the number of volumes in the law school library, cost is the
annual cost of attending law school, and rank is a law school ranking (with rank = 1
being the best).
(i) What signs do you expect for all slope parameters? Justify your answers.
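We expect $\beta_5 < 0$: because rank = 1 is the best school, a larger value of rank means a
lower-ranked school, which should lower the starting salary. We expect positive
relationships between all of the other slope parameters ($\beta_1$ to $\beta_4$) and salary. For
LSAT and GPA, a higher score signals higher ability and a better chance of getting a good
job. The number of volumes in the law school library and the annual cost are both
measures of school quality, so they should also raise salary. However, their effects may
be weaker: there is no guarantee that the books are sufficient or are read efficiently, and
part of the annual cost could go to purposes other than education.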

(ii) The estimated equation is

$$\widehat{\log(salary)} = 8.34 + 0.0047\,LSAT + 0.248\,GPA + 0.095\log(libvol) + 0.038\log(cost) - 0.0033\,rank,$$
$$n = 136, \qquad R^2 = 0.842.$$

What is the predicted ceteris paribus difference in salary for schools with a median
GPA different by one point? (Report your answer as a percentage.)

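Since the coefficient on GPA is 0.248, a one-point difference in median GPA predicts a
salary difference of about 100 × 0.248 = 24.8%, ceteris paribus.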

(iii) Interpret the coefficient on the variable log(libvol).

The coefficient 0.095 is the elasticity of salary with respect to libvol: a 1% increase in
libvol is predicted to increase salary by about 0.095%, holding the other factors fixed.

(iv) Would you say it is better to attend a higher ranked law school? How much is a
difference in ranking of 20 worth in terms of predicted starting salary?
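Yes. The coefficient on rank is negative, so a better-ranked school (a smaller value of
rank) predicts a higher starting salary. A difference in ranking of 20 is worth about
20 × 0.0033 × 100 = 6.6% in predicted starting salary, ceteris paribus.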
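Parts (ii) and (iv) both turn a change in log(salary) into a percentage. The rule 100 × Δlog(salary) is an approximation; the exact change is 100 × [exp(Δlog(salary)) - 1]. A small sketch comparing the two with the reported coefficients (`pct_change` is a hypothetical helper name):

```python
import math


def pct_change(log_points):
    """Approximate and exact percentage change in salary for a given change in log(salary)."""
    return 100 * log_points, 100 * (math.exp(log_points) - 1)


gpa_approx, gpa_exact = pct_change(0.248)           # one more point of median GPA
rank_approx, rank_exact = pct_change(0.0033 * 20)   # a school ranked 20 places better
print(f"GPA:  {gpa_approx:.1f}% approx, {gpa_exact:.1f}% exact")    # GPA:  24.8% approx, 28.1% exact
print(f"rank: {rank_approx:.1f}% approx, {rank_exact:.1f}% exact")  # rank: 6.6% approx, 6.8% exact
```

The gap between the two grows with the size of the log change, which is why it matters more for GPA (0.248) than for the rank difference (0.066).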
Computer Exercises (All data are contained in the file Data_HW3.xls)
Because I was not able to use INFILE in iCloud, I entered the data with CARDS instead, and I
show only part of the CARDS data in each SAS program because the full listing is too long.

Problem 3.5
Use the data in sheet WAGE1 to confirm the partialling out interpretation of the OLS
estimates by considering the following model:

$$\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure + u.$$

This first requires regressing educ on exper and tenure and saving the residuals, $\hat{r}_1$.

When I run this regression in SAS I get:


Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             13.57496          0.18432     73.65     <.0001
exper        1             -0.07379          0.00976     -7.56     <.0001
tenure       1              0.04768          0.01834      2.60     0.0096

$$\widehat{educ} = 13.57496 - 0.07379\,exper + 0.04768\,tenure$$

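/* Step 1: regress educ on exper and tenure (WAGE1; only the first rows of data shown) */
DATA adMaru;
INPUT educ exper tenure @@;
CARDS;
11 2 0
12 22 2
11 2 0

;
TITLE 'Multiple Linear Regression';
PROC REG DATA=adMaru;
MODEL educ = exper tenure;
OUTPUT OUT=resid R=r1;   /* saves the residuals as variable r1 in data set resid */
RUN;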

Then, regress log(wage) on $\hat{r}_1$. Compare the coefficient on $\hat{r}_1$ with the coefficient on educ
in the regression of log(wage) on educ, exper, and tenure.

To get the residuals, I subtract the fitted values from educ:

$$\hat{r}_1 = educ - \widehat{educ}$$

When I run this regression in SAS I get:


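Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              1.62976          0.02072     78.64     <.0001
r1           1              0.09112          0.00790     11.54     <.0001

$$\widehat{\log(wage)} = 1.62976 + 0.09112\,\hat{r}_1$$

By the partialling-out interpretation, the coefficient on $\hat{r}_1$ should be identical to the
coefficient on educ in the multiple regression of log(wage) on educ, exper, and tenure. A
quick check on the residuals: OLS residuals have mean zero, so the intercept in this
regression should equal the sample mean of log(wage), 1.62327 (see the full SAS output).
The reported intercept, 1.62976, differs slightly, which suggests that the r1 values entered
in CARDS are only approximations to the OLS residuals from the first regression.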

/* Step 2: regress log(wage) on the residuals r1 from step 1 (only the first rows shown) */
DATA adMaru;
INPUT logwage r1 @@;
CARDS;
1.131402 -2.43
1.175573 -0.13
1.098612 -2.43

;
TITLE 'Simple Linear Regression';
PROC REG DATA=adMaru;
MODEL logwage = r1;
RUN;
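The partialling-out (Frisch-Waugh) result says the slope on the step-1 residuals equals the coefficient on educ in the full multiple regression. The sketch below checks this on hypothetical simulated data (not the WAGE1 sheet; the parameters are made up), using a small hand-written OLS routine based on the normal equations. `ols` and `solve` are illustrative helpers, not SAS procedures.

```python
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y, regressors):
    """OLS coefficients [intercept, slopes...] of y on an intercept plus the regressors."""
    X = [[1.0] + [col[i] for col in regressors] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)


# Hypothetical simulated data (NOT WAGE1); the parameters are made up.
random.seed(1)
n = 500
exper = [random.uniform(0, 40) for _ in range(n)]
tenure = [random.uniform(0, 20) for _ in range(n)]
educ = [13.5 - 0.07 * e + 0.05 * t + random.gauss(0, 2.6) for e, t in zip(exper, tenure)]
logwage = [0.3 + 0.09 * ed + 0.004 * e + 0.02 * t + random.gauss(0, 0.45)
           for ed, e, t in zip(educ, exper, tenure)]

# Step 1: regress educ on exper and tenure and save the residuals r1.
g = ols(educ, [exper, tenure])
r1 = [ed - (g[0] + g[1] * e + g[2] * t) for ed, e, t in zip(educ, exper, tenure)]

# Step 2: regress logwage on r1 alone.
step2 = ols(logwage, [r1])

# Full multiple regression of logwage on educ, exper, and tenure.
full = ols(logwage, [educ, exper, tenure])

print(f"slope on r1 = {step2[1]:.6f}, coefficient on educ = {full[1]:.6f}")
```

The two printed numbers agree to floating-point precision, and because the residuals average to zero, the step-2 intercept equals the mean of logwage.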
Problem 3.6
Use the data in sheet WAGE2 for this problem. As usual, be sure all of the following
regressions contain an intercept.
(i) Run a simple regression of IQ on educ to obtain the slope coefficient, say, $\tilde{\delta}_1$.

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             53.68715          2.62293     20.47     <.0001
educ         1              3.53383          0.19221     18.39     <.0001

$$\widehat{IQ} = 53.68715 + 3.53383\,educ$$

/* Simple regression of IQ on educ (WAGE2; only the first rows of data shown) */
DATA adMaru;
INPUT IQ educ @@;
CARDS;
93 12
119 18
108 14

;
TITLE 'Simple Linear Regression';
PROC REG DATA=adMaru;
MODEL IQ = educ;
RUN;

(ii) Run the simple regression of log(wage) on educ, and obtain the slope coefficient, $\tilde{\beta}_1$.

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              5.97306          0.08137     73.40     <.0001
educ         1              0.05984          0.00596     10.03     <.0001

$$\widehat{\log(wage)} = 5.97306 + 0.05984\,educ$$
/* Simple regression of log(wage) on educ (WAGE2; only the first rows of data shown) */
DATA adMaru;
INPUT logwage educ @@;
CARDS;
6.645091 12
6.694562 18
6.715384 14

;
TITLE 'Simple Linear Regression';
PROC REG DATA=adMaru;
MODEL logwage = educ;
RUN;
(iii) Run the multiple regression of log(wage) on educ and IQ, and obtain the slope
coefficients, $\hat{\beta}_1$ and $\hat{\beta}_2$, respectively.

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              5.65829          0.09624     58.79     <.0001
educ         1              0.03912          0.00684      5.72     <.0001
IQ           1              0.00586       0.00099791      5.88     <.0001

$$\widehat{\log(wage)} = 5.65829 + 0.03912\,educ + 0.00586\,IQ$$
/* Multiple regression of log(wage) on educ and IQ (WAGE2; only the first rows shown) */
DATA adMaru;
INPUT logwage educ IQ @@;
CARDS;
6.645091 12 93
6.694562 18 119
6.715384 14 108

;
TITLE 'Multiple Linear Regression';
PROC REG DATA=adMaru;
MODEL logwage = educ IQ;
RUN;

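(iv) Verify that $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$.

We take $\hat{\beta}_1 = 0.03912$ from the last regression and add $\hat{\beta}_2 = 0.00586$, also from the
last regression, times $\tilde{\delta}_1 = 3.53383$ from the first regression:
0.03912 + 0.00586 × 3.53383 ≈ 0.0598, which matches $\tilde{\beta}_1 = 0.05984$ from part (ii) up to rounding.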
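The same check as a few lines of arithmetic, using the coefficients exactly as printed in the SAS output above:

```python
beta1_hat = 0.03912     # educ, multiple regression in (iii)
beta2_hat = 0.00586     # IQ, multiple regression in (iii)
delta1_tilde = 3.53383  # educ slope from regressing IQ on educ in (i)
beta1_tilde = 0.05984   # educ slope from the simple regression in (ii)

implied = beta1_hat + beta2_hat * delta1_tilde
print(round(implied, 5), beta1_tilde)  # 0.05983 0.05984
```

The remaining difference of about 0.00001 comes from the coefficients being printed with a limited number of digits.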
Problem 3.7
The data in sheet CEOSAL contain information on 177 chief executive officers and can be
used to examine the effects of firm performance on CEO salary.
(i) Estimate a model relating annual salary to firm sales and market value. Make the
model of the constant elasticity variety for both independent variables. Write the
results out in equation form.
This is a constant elasticity model:

$$\log(salary) = \beta_0 + \beta_1 \log(sales) + \beta_2 \log(mktval) + u$$

When I run this model in SAS I get:


Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              4.62092          0.25441     18.16     <.0001
logsales     1              0.16213          0.03967      4.09     <.0001
logmktval    1              0.10671          0.05012      2.13     0.0347

Based on this constant elasticity regression of annual salary on firm sales and market
value, the estimated equation is:

$$\widehat{\log(salary)} = 4.62092 + 0.16213\log(sales) + 0.10671\log(mktval), \qquad n = 177, \quad R^2 = 0.2991$$

/* Constant elasticity model for CEO salary (CEOSAL; only the first rows of data shown) */
DATA datamaru37;
INPUT logsalary logsales logmktval @@;
CARDS;
7.057037 8.732305 10.05191
6.39693 5.645447 7.003066
5.937536 5.129899 7.003066

;
TITLE 'Multiple Linear Regression';
PROC REG DATA=datamaru37;
MODEL logsalary = logsales logmktval;
RUN;

(ii) Add profits to the model from part (i). Why can this variable not be included in
logarithmic form?
We are asked to add profits as an explanatory variable. It cannot enter in logarithmic
form because profits can be negative (a firm can lose money), and the logarithm is not
defined for zero or negative values.
When I run this model in SAS I get:
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              4.68692          0.37973     12.34     <.0001
logsales     1              0.16137          0.03991      4.04     <.0001
logmktval    1              0.09753          0.06369      1.53     0.1275
profits      1           0.00003566       0.00015196      0.23     0.8147

Based on this regression of log(salary) on log(sales), log(mktval), and profits, the
estimated equation is:

$$\widehat{\log(salary)} = 4.68692 + 0.16137\log(sales) + 0.09753\log(mktval) + 0.00003566\,profits, \qquad R^2 = 0.2993$$
(ii) (continued) Would you say that these firm performance variables explain most of the
variation in CEO salaries?
The R-squared of this model is 0.2993, so the firm performance variables explain only
about 30% of the sample variation in log(salary), and roughly 70% is left unexplained.
Therefore, I would not say that these firm performance variables explain most of the
variation in CEO salaries.
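The R-squared and adjusted R-squared can be recomputed from the Analysis of Variance table in the SAS output for Problem 3.7 (ii) (n = 177 observations, k = 3 slope parameters), which also gives the unexplained share directly:

```python
n, k = 177, 3  # observations and slope parameters in part (ii)
ss_model, ss_error, ss_total = 19.35098, 45.29524, 64.64622  # sums of squares from SAS

r2 = ss_model / ss_total
adj_r2 = 1 - (ss_error / (n - k - 1)) / (ss_total / (n - 1))
print(f"R-squared = {r2:.4f}, adjusted = {adj_r2:.4f}, unexplained share = {1 - r2:.1%}")
# R-squared = 0.2993, adjusted = 0.2872, unexplained share = 70.1%
```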

/* Add profits in levels, since it can be negative (only the first rows of data shown) */
DATA datamaru37;
INPUT logsalary logsales logmktval profits @@;
CARDS;
7.057037 8.732305 10.05191 966
6.39693 5.645447 7.003066 48
5.937536 5.129899 7.003066 40

;
TITLE 'Multiple Linear Regression';
PROC REG DATA=datamaru37;
MODEL logsalary = logsales logmktval profits;
RUN;

(iii) Add the variable ceoten to the model in part (ii). What is the estimated percentage
return for another year of CEO tenure, holding other factors fixed?
When I run in SAS I get:
Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              4.55778          0.38025     11.99     <.0001
logsales     1              0.16223          0.03948      4.11     <.0001
logmktval    1              0.10176          0.06303      1.61     0.1083
profits      1           0.00002905       0.00015035      0.19     0.8470
ceoten       1              0.01168          0.00534      2.19     0.0301

Based on this regression of log(salary) on log(sales), log(mktval), profits, and CEO
tenure, the estimated equation is:

$$\widehat{\log(salary)} = 4.55778 + 0.16223\log(sales) + 0.10176\log(mktval) + 0.00002905\,profits + 0.01168\,ceoten$$

Holding the other factors fixed, one more year of CEO tenure is predicted to increase
salary by about 100 × 0.01168 ≈ 1.17%.

/* Add CEO tenure to the model from part (ii) (only the first rows of data shown) */
DATA datamaru37;
INPUT logsalary logsales logmktval profits ceoten @@;
CARDS;
7.057037 8.732305 10.05191 966 2
6.39693 5.645447 7.003066 48 10
5.937536 5.129899 7.003066 40 3

;
TITLE 'Multiple Linear Regression';
PROC REG DATA=datamaru37;
MODEL logsalary = logsales logmktval profits ceoten;
RUN;
(iv) Find the sample correlation coefficient between the variables log(mktval) and profits.
Are these variables highly correlated? What does this say about the OLS estimators?
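The sample correlation between profits and log(mktval) is 0.777, so the two variables are
highly correlated. They tend to move together: profits measure how the firm is doing, and
market value is based on past, present, and expected future profitability. High (but not
perfect) correlation does not violate MLR.3 and does not bias the OLS estimators, but it
inflates the variances of the estimators on log(mktval) and profits. This is consistent with
their large standard errors and insignificant t statistics in part (ii).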
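One way to put a number on the variance inflation is the variance inflation factor, VIF = 1/(1 - R²_j), where R²_j comes from regressing one regressor on all the others. As a simplification, the sketch uses only the pairwise correlation 0.777; this equals the exact VIF only in a model whose sole regressors are log(mktval) and profits. In the actual model with log(sales) as well, the auxiliary R-squared from that larger regression would be needed.

```python
r = 0.777  # sample correlation between log(mktval) and profits reported above

# With two regressors, R^2 from regressing one on the other is r^2.
vif = 1 / (1 - r ** 2)
print(round(vif, 2))  # 2.52
```

Under this simplification, the sampling variance of each of the two slope estimators would be about 2.5 times what it would be if the two variables were uncorrelated.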
Corresponding SAS output:

Problem 3.5 (Multiple Regression)

Multiple Linear Regression

The REG Procedure


Model: MODEL1
Dependent Variable: educ
Number of Observations Read 526
Number of Observations Used 526

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2        407.94631     203.97316     29.49   <.0001
Error             523       3617.48335       6.91679
Corrected Total   525       4025.42966

Root MSE          2.62998   R-Square   0.1013
Dependent Mean   12.56274   Adj R-Sq   0.0979
Coeff Var        20.93477

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             13.57496          0.18432     73.65     <.0001
exper        1             -0.07379          0.00976     -7.56     <.0001
tenure       1              0.04768          0.01834      2.60     0.0096
Problem 3.5 (Simple Regression)

Simple Linear Regression

The REG Procedure


Model: MODEL1
Dependent Variable: logwage
Number of Observations Read 526
Number of Observations Used 526

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1         30.04971      30.04971    133.13   <.0001
Error             524        118.28005       0.22573
Corrected Total   525        148.32976

Root MSE          0.47511   R-Square   0.2026
Dependent Mean    1.62327   Adj R-Sq   0.2011
Coeff Var        29.26845

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              1.62976          0.02072     78.64     <.0001
r1           1              0.09112          0.00790     11.54     <.0001
Problem 3.6 (i)

Simple Linear Regression

The REG Procedure


Model: MODEL1
Dependent Variable: IQ
Number of Observations Read 935
Number of Observations Used 935

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1            56281         56281    338.02   <.0001
Error             933           155347     166.50218
Corrected Total   934           211627

Root MSE         12.90357   R-Square   0.2659
Dependent Mean  101.28235   Adj R-Sq   0.2652
Coeff Var        12.74020

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             53.68715          2.62293     20.47     <.0001
educ         1              3.53383          0.19221     18.39     <.0001
Problem 3.6 (ii)

Simple Linear Regression

The REG Procedure


Model: MODEL1
Dependent Variable: logwage
Number of Observations Read 935
Number of Observations Used 935

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1         16.13771      16.13771    100.70   <.0001
Error             933        149.51859       0.16026
Corrected Total   934        165.65629

Root MSE          0.40032   R-Square   0.0974
Dependent Mean    6.77900   Adj R-Sq   0.0964
Coeff Var         5.90529

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              5.97306          0.08137     73.40     <.0001
educ         1              0.05984          0.00596     10.03     <.0001
Problem 3.6 (iii)

Multiple Linear Regression

The REG Procedure


Model: MODEL1
Dependent Variable: logwage
Number of Observations Read 935
Number of Observations Used 935

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2         21.47795      10.73897     69.42   <.0001
Error             932        144.17834       0.15470
Corrected Total   934        165.65629

Root MSE          0.39332   R-Square   0.1297
Dependent Mean    6.77900   Adj R-Sq   0.1278
Coeff Var         5.80198

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              5.65829          0.09624     58.79     <.0001
educ         1              0.03912          0.00684      5.72     <.0001
IQ           1              0.00586       0.00099791      5.88     <.0001
Problem 3.7 (i)

Multiple Linear Regression

The REG Procedure


Model: MODEL1
Dependent Variable: logsalary
Number of Observations Read 177
Number of Observations Used 177

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2         19.33656       9.66828     37.13   <.0001
Error             174         45.30966       0.26040
Corrected Total   176         64.64622

Root MSE          0.51029   R-Square   0.2991
Dependent Mean    6.58285   Adj R-Sq   0.2911
Coeff Var         7.75188

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              4.62092          0.25441     18.16     <.0001
logsales     1              0.16213          0.03967      4.09     <.0001
logmktval    1              0.10671          0.05012      2.13     0.0347
Problem 3.7 (ii)

Multiple Linear Regression

The REG Procedure


Model: MODEL1
Dependent Variable: logsalary
Number of Observations Read 177
Number of Observations Used 177

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3         19.35098       6.45033     24.64   <.0001
Error             173         45.29524       0.26182
Corrected Total   176         64.64622

Root MSE          0.51169   R-Square   0.2993
Dependent Mean    6.58285   Adj R-Sq   0.2872
Coeff Var         7.77301

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              4.68692          0.37973     12.34     <.0001
logsales     1              0.16137          0.03991      4.04     <.0001
logmktval    1              0.09753          0.06369      1.53     0.1275
profits      1           0.00003566       0.00015196      0.23     0.8147
Problem 3.7 (iii)
Multiple Linear Regression

The REG Procedure


Model: MODEL1
Dependent Variable: logsalary
Number of Observations Read 177
Number of Observations Used 177

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4         20.57681       5.14420     20.08   <.0001
Error             172         44.06941       0.25622
Corrected Total   176         64.64622

Root MSE          0.50618   R-Square   0.3183
Dependent Mean    6.58285   Adj R-Sq   0.3024
Coeff Var         7.68937

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              4.55778          0.38025     11.99     <.0001
logsales     1              0.16223          0.03948      4.11     <.0001
logmktval    1              0.10176          0.06303      1.61     0.1083
profits      1           0.00002905       0.00015035      0.19     0.8470
ceoten       1              0.01168          0.00534      2.19     0.0301
