
BES Tutorial Sample Solutions, S1/13

WEEK 11 TUTORIAL EXERCISES (To be discussed in the week starting May 20)

1. Use a calculator to compute the sample least squares regression line for
the model

\[ y = \beta_0 + \beta_1 x + \varepsilon , \]

given the following six observations.

y    2    8    6   12    9   11
x    1    4    3   10   10    8

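\[ \bar{x} = \frac{36}{6} = 6; \qquad \bar{y} = \frac{48}{6} = 8 \]

\[ \sum (x_i - \bar{x})(y_i - \bar{y}) = (1-6)(2-8) + (4-6)(8-8) + \dots + (8-6)(11-8) = 62 \]

\[ \sum (x_i - \bar{x})^2 = (1-6)^2 + (4-6)^2 + \dots + (8-6)^2 = 74 \]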

TN: Dividing these two terms by (n - 1) = 5 gives the sample covariance
between x and y and the sample variance of x, respectively.


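The least squares estimates are

\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{62}{74} = 0.8378 \]

\[ b_0 = \bar{y} - b_1 \bar{x} = 8 - 0.8378 \times 6 = 2.9732 \]

Thus the sample regression line is

\[ \hat{y} = 2.9732 + 0.8378\,x \]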
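The hand calculation above can be checked with a few lines of Python. This is only a sketch of the textbook formulas; the function name `least_squares` is an illustrative choice, not something from the tutorial.

```python
# Observations from Exercise 1.
x = [1, 4, 3, 10, 10, 8]
y = [2, 8, 6, 12, 9, 11]


def least_squares(x, y):
    """Return (b0, b1) for the simple least squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squared deviations of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares(x, y)
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")  # b1 = 0.8378, b0 = 2.9730
```

The intercept prints as 2.9730 rather than 2.9732 because the code carries the unrounded slope 62/74 through the calculation, whereas the hand calculation uses the slope rounded to four decimals.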

2. Suppose the relationship between the dependent variable weekly
household consumption expenditure in dollars (y) and the independent
variable weekly household income in dollars (x) is represented by the
simple regression model (i refers to the ith observation or household):

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]

Suppose a sample of observations yields least squares estimates of
b0 = -32 and b1 = 0.82.

(a) What does \(\varepsilon_i\) represent in the model?
It is the random disturbance term. It includes any purely random factors or
errors and factors that have been left out of the model but whose influence is
considered minor.

(b) State the basic (classical) assumptions made about the \(\varepsilon_i\)'s in this
model. Explain in words what the assumptions mean.

(i) \(E(\varepsilon_i \mid x_i) = 0\) for all observations. The conditional mean of the
disturbance does not depend on x and is normalized to zero. Note this is
different from Keller, who only mentions the normalization to zero. That the
conditional mean of the disturbances does not depend on x ensures
unbiasedness of the OLS estimator and so is the much more important
component of this assumption. Relating back to the previous part of the
question, it implies that omitted factors that might affect expenditure but
appear in the disturbance are assumed to be uncorrelated with x.
(ii) The observations \((x_i, y_i)\), \(i = 1, \dots, n\), are drawn by simple random
sampling and hence are iid.
(iii) The standard deviation of \(\varepsilon_i\) is constant for all observations. It is
denoted by \(\sigma_\varepsilon\) and we say the disturbances are homoskedastic. Here that
implies the variability in household expenditure does not depend on income,
which is possibly problematic in practice.
(iv) The disturbances for any two observations are independent. This
implies, in particular, that there is no correlation between disturbances
associated with different observations. In this example the factors in the
disturbance for household i are not correlated with those for household j.
(v) \(\varepsilon_i\) is normally distributed for all observations.

(c) Does the estimate of b0 = -32 make sense? If not, does this
necessarily invalidate the model? Explain your answer.


This indicates that if a household had a zero weekly income then on average
such a household would have negative consumption, which does not make
sense. However, this does not necessarily invalidate the model. It may be that
the linear model is only a reasonable approximation for some range of
household incomes, not including incomes near zero. In particular, the
relationship may be nonlinear for values of x near zero. The conclusion is
that we should be careful in interpreting the intercept term, as it may not be
very meaningful in some cases.

(d) Interpret both \(\beta_1\) and b1. What does the model predict would be the
change in y following a $10 increase in x from some initial level?

\(\beta_1\) is the (unknown) population change in the mean value of y resulting
from a one unit increase in x, whereas b1 = 0.82 is an estimate of \(\beta_1\). In
this particular example this is the marginal propensity to consume that would
be discussed in economics courses. The predicted change in y following a $10
increase in x would be 10 × b1 = 10 × 0.82 = $8.20.
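As a quick numerical illustration of parts (c) and (d), the fitted line can be written as a small function. This sketch assumes the intercept is -32, the sign implied by the discussion of negative consumption in part (c). The starting income of $500 is a hypothetical value chosen only for the example, and the function name is illustrative.

```python
B0 = -32.0  # estimated intercept (assumed negative, as part (c) implies)
B1 = 0.82   # estimated slope


def predicted_expenditure(income):
    """Predicted weekly expenditure in dollars for a given weekly income in dollars."""
    return B0 + B1 * income


start = 500.0  # hypothetical initial income level
change = predicted_expenditure(start + 10) - predicted_expenditure(start)

print(predicted_expenditure(0))  # -32.0, the implausible prediction at zero income
print(round(change, 2))          # 8.2
```

Because the model is linear, the $8.20 change does not depend on the starting income.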

(e) Suppose we measured y and x in cents rather than dollars. What
effect would this have on the estimated coefficient of x? What effect
would it have on the estimated intercept?
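In this case $x becomes 100x cents and $y becomes 100y cents. The estimated
coefficient of \(x_i\) when the variables are measured in dollars is given by

\[ b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

If we let \(b_1^*\) be the estimated slope coefficient when the variables are
measured in cents, we have

\[ b_1^* = \frac{\sum (100x_i - 100\bar{x})(100y_i - 100\bar{y})}{\sum (100x_i - 100\bar{x})^2}
         = \frac{100^2 \sum (x_i - \bar{x})(y_i - \bar{y})}{100^2 \sum (x_i - \bar{x})^2} = b_1 \]

Also, denote by \(b_0^*\) the estimated intercept in this case; then we have

\[ b_0^* = 100\bar{y} - b_1^* (100\bar{x}) = 100(\bar{y} - b_1 \bar{x}) = 100\, b_0 \]

Thus estimation of this model (with the same, but rescaled data) would lead
to an unchanged b1, whilst the intercept term would become 100 × (-32) = -3200.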

(f) Suppose y were measured in dollars but x were measured in cents.


What effects would this have on the estimated coefficient of x?


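Denote the estimated slope and intercept in this case by \(\tilde{b}_1\) and \(\tilde{b}_0\). Then

\[ \tilde{b}_1 = \frac{\sum (100x_i - 100\bar{x})(y_i - \bar{y})}{\sum (100x_i - 100\bar{x})^2}
             = \frac{100 \sum (x_i - \bar{x})(y_i - \bar{y})}{100^2 \sum (x_i - \bar{x})^2} = \frac{b_1}{100} \]

\[ \tilde{b}_0 = \bar{y} - \tilde{b}_1 (100\bar{x}) = \bar{y} - \frac{b_1}{100}(100\bar{x}) = \bar{y} - b_1 \bar{x} = b_0 \]
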
Now estimation of this model would lead to the estimated coefficient of the
income variable being 0.0082 and estimated intercept would be unchanged.
This makes sense since:
If income is measured in dollars, we predict expenditure (in dollars) will
increase by $0.82 if household income increases by one dollar.
If income is measured in cents, we predict expenditure (in dollars) will
increase by $0.0082 if household income increases by one cent.
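The rescaling results in parts (e) and (f) can also be confirmed numerically. The tutorial does not supply the household data, so the income and spending figures below are made up purely for illustration; only the relationships between the estimates matter, not their values.

```python
def least_squares(x, y):
    """Return (b0, b1) for the simple least squares regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


# Hypothetical weekly data in dollars (not from the tutorial).
income = [300.0, 450.0, 600.0, 800.0, 1000.0, 1250.0]
spending = [220.0, 330.0, 470.0, 620.0, 790.0, 1000.0]

income_cents = [100 * v for v in income]
spending_cents = [100 * v for v in spending]

b0, b1 = least_squares(income, spending)                  # both in dollars
b0_e, b1_e = least_squares(income_cents, spending_cents)  # part (e): both in cents
b0_f, b1_f = least_squares(income_cents, spending)        # part (f): only x in cents

print(b1, b1_e, b1_f)  # slope: unchanged in (e), divided by 100 in (f)
print(b0, b0_e, b0_f)  # intercept: multiplied by 100 in (e), unchanged in (f)
```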

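(g) Distinguish between \(\varepsilon_i\) and \(e_i\) (the residual associated with
observation i). Illustrate your answer with a diagram.

We can think of \(e_i\) as an estimate of the true random disturbance
associated with observation i, \(\varepsilon_i\). The disturbance
\(\varepsilon_i = y_i - \beta_0 - \beta_1 x_i\) is the vertical distance from observation i to the
unknown population regression line and so is unobservable, whereas the
residual \(e_i = y_i - b_0 - b_1 x_i\) is the vertical distance to the estimated sample
regression line and can be computed from the data.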
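A small simulation makes the distinction concrete, because in simulated data the disturbances are known. The population values below (intercept -32, slope 0.82, disturbance standard deviation 40), the sample size and the income range are all assumptions chosen for illustration, not figures from the tutorial.

```python
import random

random.seed(0)

# Hypothetical population parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = -32.0, 0.82, 40.0

x = [random.uniform(200, 2000) for _ in range(30)]
eps = [random.gauss(0, SIGMA) for _ in x]  # true disturbances (unobservable in practice)
y = [BETA0 + BETA1 * xi + e for xi, e in zip(x, eps)]

# Least squares estimates from the simulated sample.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals: deviations from the estimated line, computable from the data alone.
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

for e_true, e_hat in list(zip(eps, resid))[:5]:
    print(f"disturbance {e_true:8.2f}   residual {e_hat:8.2f}")

# Residuals sum to zero by construction; the disturbances generally do not.
print(sum(resid), sum(eps))
```

Each residual differs from its disturbance by \((b_0 - \beta_0) + (b_1 - \beta_1) x_i\), the gap between the estimated and population lines at \(x_i\).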




3. Computing Exercise #4
Refer to the computing program and answer Discussion Questions 4.1
and 4.2 associated with simple linear regression.
Q4.1 Discussion:

Based on the information you obtained, describe the relationship
between the returns on the individual stock (Intel) and the returns on
the overall market (S&P).
As indicated below in the Line fit plot produced for the second question, there
is a positive correlation between the returns on Intel stock and the overall
market return. However, there is considerable variation around the
superimposed linear relationship.


Q4.2 Discussion:

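i) What is the sample regression line?
From the Excel regression output below:

\[ \hat{y} = 0.022 + 1.472\,x , \]

where y is the return on Intel stock (INTEL) and x is the return on the
market index (INDEX).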

ii) Is there sufficient evidence to infer at the 5% significance level that
there is a linear relationship between the return on Intel
Corporation stock and the return on the total market?
The appropriate hypotheses to be tested are:

\[ H_0: \beta_1 = 0; \qquad H_1: \beta_1 \neq 0 \]

which according to the Excel output yields a p-value of 0.0069, and so for any
significance level greater than 0.0069 (which includes 5%) we would reject
the null and conclude there is evidence to suggest a linear relationship.

iii) Is there sufficient evidence to infer at the 5% significance level that
Intel Corporation stock is more sensitive than the average stock?
Now the appropriate hypotheses to be tested are:

\[ H_0: \beta_1 = 1; \qquad H_1: \beta_1 > 1 \]

The standardized test statistic for this hypothesis is:

\[ t = \frac{b_1 - 1}{s_{b_1}} = \frac{1.47163 - 1}{0.52052} = 0.9061 \]

Using a t critical value with 40 degrees of freedom (actually n - 2 = 46
degrees of freedom, but this value is not in the tables) yields a rejection
region of t > 1.684. Alternatively, with a relatively large sample size we can
invoke the CLT and use the 5% normal critical value of 1.645.

In either case the calculated test statistic falls well short of the rejection
region and we cannot reject the null hypothesis.
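The test in part iii) can be reproduced from the two numbers Excel reports for the slope. In this sketch the 40-degrees-of-freedom critical value is entered as the tabulated constant 1.684, since the Python standard library has no t distribution; the normal critical value is computed with `statistics.NormalDist`.

```python
from statistics import NormalDist

# Slope estimate and its standard error, taken from the Excel output.
b1, se_b1 = 1.47163, 0.52052

# One-sided test of H0: beta1 = 1 against H1: beta1 > 1.
t_stat = (b1 - 1) / se_b1

T_CRIT_40 = 1.684                    # tabulated t value, 40 df, 5% in the upper tail
z_crit = NormalDist().inv_cdf(0.95)  # standard normal 5% upper-tail critical value

reject_t = t_stat > T_CRIT_40
reject_z = t_stat > z_crit

print(f"t = {t_stat:.4f}, z critical = {z_crit:.3f}")  # t = 0.9061, z critical = 1.645
print(reject_t, reject_z)                              # False False
```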

iv) Discuss the significance of the findings?
While there is evidence of a strong positive relationship between the returns,
the evidence of whether the Intel stock is more or less sensitive to the market is
weak. The point estimate of 1.472 indicates evidence in favour of being more
sensitive but we cannot exclude the possibility that it is in fact less sensitive.
The 95% CI provided by Excel is (0.424, 2.519) and hence includes values
consistent with both possibilities.


v) Explain the meaning of the regression and residual sum of squares.
The total sum of squares, representing the total variation (0.4446) in the
dependent variable (returns on Intel stock), can be decomposed into two
parts: a regression sum of squares (0.0658) representing the part explained
by the regression model, and the residual sum of squares (0.3788) representing
the part left over and unexplained by the model. In this case the latter is large
relative to the former, leading to an R² of 0.148, indicating that only 14.8% of
the variation in Intel stock returns is explained by the market model.
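The summary figures in the Excel output follow directly from the two sums of squares and the sample size. The sketch below recomputes them; the variable names are illustrative.

```python
import math

# Sums of squares and sample size from the Excel ANOVA table.
ss_regression = 0.065822161
ss_residual = 0.378800255
n = 48

ss_total = ss_regression + ss_residual
df_regression, df_residual = 1, n - 2

r_squared = ss_regression / ss_total          # share of variation explained
ms_residual = ss_residual / df_residual       # residual mean square
f_stat = (ss_regression / df_regression) / ms_residual
std_error = math.sqrt(ms_residual)            # "Standard Error" of the regression

print(f"R^2 = {r_squared:.4f}")               # R^2 = 0.1480
print(f"F = {f_stat:.4f}")                    # F = 7.9932
print(f"standard error = {std_error:.4f}")    # standard error = 0.0907
```

With a single regressor, F equals the square of the slope's t statistic: 2.82722² ≈ 7.993, which is why the p-value for the slope (0.00693) matches Significance F.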

This is consistent with our initial observation from the scatter plot that there
was considerable variation around the trendline. See also the line fit plot that
overlays the estimated market model on the bivariate scatter.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.3848
R Square            0.1480
Adjusted R Square   0.1295
Standard Error      0.0907
Observations        48

ANOVA
             df   SS            MS         F          Significance F
Regression    1   0.065822161   0.065822   7.993182   0.0069287
Residual     46   0.378800255   0.008235
Total        47   0.444622416

            Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   0.02192        0.01508          1.45365   0.15283   -0.00843    0.05228     -0.00843      0.05228
INDEX       1.47163        0.52052          2.82722   0.00693    0.42387    2.51938      0.42387      2.51938

[Figure: INDEX Line Fit Plot. INTEL and Predicted INTEL (vertical axis) plotted against INDEX (horizontal axis).]
