
BES Tutorial Sample Solutions, S2, 2010

These solutions will be posted on the BES website with a one-week delay.


WEEK 12 TUTORIAL EXERCISES (To be discussed in the week starting
October 11)
1. Recall the Anzac Garage data (ANZACG.XLS) used in Week 3, question
7, where we considered the simple linear regression model given by:

\[ price = \beta_0 + \beta_1 \, age + u \]

where price = used car price in dollars and age = age of the car in years.
The EXCEL results obtained using Ordinary Least Squares are
presented below:

Regression Statistics
R Square             0.077
Standard Error       42069
Observations         117

             Coefficients   Standard Error   t Stat    p-value
Intercept    47469          6748             7.035     0.000
Age          -2658          856              -3.106    0.002

(a) Interpret the t Stat and the p-values in the EXCEL output.
What do you need to assume?

The t stats and p-values in the EXCEL output are derived from two-tailed
tests with null hypotheses that the associated population parameter equals 0.
Hence, larger t stats (in absolute value) and lower p-values give stronger
evidence that the associated population parameter is nonzero. Here, the
p-values for both the intercept and the Age coefficient are below 1%, hence
both estimates are statistically significant and we can be confident that
both population parameters are nonzero.

We need to assume the disturbances are normal or, because the sample size is
large, invoke the CLT.
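
As a rough check on part (a), the t statistics and two-tailed p-values can be recomputed from the reported coefficients and standard errors. The sketch below is illustrative only: it assumes the standard normal approximation (EXCEL uses the t distribution, so its p-values differ slightly), it enters the Age coefficient as negative, and the function name is hypothetical. Small differences from the printed t stats come from rounding in the reported figures.

```python
import math

def t_stat_and_p_value(coef, se):
    """t statistic for H0: parameter = 0 and its two-tailed p-value,
    using the standard normal approximation (an assumption for large n)."""
    t = coef / se
    # P(|Z| > |t|) for standard normal Z, via the complementary error function
    p = math.erfc(abs(t) / math.sqrt(2))
    return t, p

# Reported (rounded) estimates and standard errors from the EXCEL output
estimates = [("Intercept", 47469, 6748), ("Age", -2658, 856)]

for name, coef, se in estimates:
    t, p = t_stat_and_p_value(coef, se)
    print(f"{name}: t = {t:.3f}, two-tailed p = {p:.4f}")
```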


(b) Calculate a 95% confidence interval for the coefficient on age.

The standard normal critical value is 1.96, hence the 95% confidence interval is:

\[ -2658 \pm 1.96 \times 856 = -2658 \pm 1678 = (-4336, -980) \]
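
The interval in (b) can be reproduced in a few lines. This is a minimal sketch using the normal critical value 1.96 from the solution; the helper name `normal_ci` is hypothetical.

```python
def normal_ci(estimate, se, z=1.96):
    """Confidence interval: estimate +/- z * standard error."""
    half_width = z * se
    return estimate - half_width, estimate + half_width

# Age coefficient and its standard error from the regression output
lower, upper = normal_ci(-2658, 856)
print(f"95% CI for the Age coefficient: ({lower:.0f}, {upper:.0f})")
```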

(c) Interpret the \(R^2\) value.

The regression model including age explains 7.7% of the variation in used car
prices.

(d) Test whether the estimated coefficient of Age is significantly less
than zero at the 5% level of significance.

Unlike in (a), this is a one-tailed test:

\[ H_0: \beta_1 = 0; \quad H_1: \beta_1 < 0 \]

Decision rule: Reject \(H_0\) if \(b_1/se(b_1) < -1.645\)
Test statistic: \(b_1/se(b_1) = -3.106 < -1.645\) and hence reject \(H_0\)
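
The one-tailed decision rule in (d) can be sketched as follows. The normal critical value -1.645 is taken from the solution; the lower-tail p-value is an extra illustration computed under the same normal approximation, and the function name is hypothetical.

```python
import math

def lower_tail_test(b, se, z_crit=-1.645):
    """One-tailed test of H0: beta = 0 against H1: beta < 0.
    Returns the test statistic, the lower-tail p-value under the
    standard normal approximation, and the reject decision."""
    t = b / se
    p = 0.5 * math.erfc(-t / math.sqrt(2))  # P(Z < t)
    return t, p, t < z_crit

t, p, reject = lower_tail_test(-2658, 856)
print(f"t = {t:.3f}, one-tailed p = {p:.5f}, reject H0: {reject}")
```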

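(e) Estimate a 95% confidence interval for the mean price for a
second-hand passenger car that is 10 years old and interpret the
result. Note: the sample mean of age is 6.44 years.

A 10-year-old car is expected to be valued at \(47469 - 10 \times 2658 = 20889\) dollars.
The boundaries of the confidence interval for this prediction can be found by:

\[ \hat{y} \pm t_{\alpha/2,\,n-2} \; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \]

where \(s = 42069\), \(se(b_1) = 856\) and hence

\[ \sum (x_i - \bar{x})^2 = \left( \frac{s}{se(b_1)} \right)^2 = \left( \frac{42069}{856} \right)^2 \approx 2415 \]

Hence:

\[ 20889 \pm 1.98 \times 42069 \sqrt{\frac{1}{117} + \frac{(10 - 6.44)^2}{2415}} = 20889 \pm 9783 \]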

We are 95% confident that the mean price of 10-year-old cars lies between
$11,106 and $30,672. While the impact of age on price is precisely
estimated, the CI is quite wide because of the large amount of
unexplained variation that is indicated by the very low \(R^2\) value reported.
(Note: use of normal critical values here would be acceptable given the
large sample size and would make little practical difference, as the
critical value would be 1.96 rather than 1.98.)
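
The calculation in (e) can be checked numerically. The sketch below uses only figures quoted in the solution (s, se(b1), n, the sample mean of age and the critical value 1.98) and recovers the sum of squared deviations of age from s and se(b1); variable names are illustrative.

```python
import math

# Figures quoted in the solution
b0, b1 = 47469, -2658      # intercept and Age coefficient
s, se_b1 = 42069, 856      # regression standard error and se of the slope
n, x_bar = 117, 6.44       # sample size and sample mean of age
x0, t_crit = 10, 1.98      # age of interest and critical value

# se(b1) = s / sqrt(Sxx), so Sxx = (s / se(b1))^2
sxx = (s / se_b1) ** 2

y_hat = b0 + b1 * x0
half_width = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)

print(f"Sxx = {sxx:.0f}")
print(f"Predicted mean price = {y_hat}")
print(f"95% CI: ({y_hat - half_width:.0f}, {y_hat + half_width:.0f})")
```

Replacing `t_crit` with 1.96 gives the normal-based interval mentioned in the note.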

Anzac Garage's pricing scheme based on the age of the car is not
working out very well. When its second-hand cars are compared with
cars of the same age from other dealers, prices often diverge. One of
their consultants noted that the value of a second-hand car should
depend on both the Odometer reading and the Age of the vehicle.
This consultant wanted to estimate the following two simple linear
regression models separately:

\[ price = \beta_0 + \beta_1 \, age + u \]
\[ price = \gamma_0 + \gamma_1 \, odometer + v \]

where Odometer = distance the car has travelled since leaving the factory,
in kilometers. A senior consultant advised use of a multiple linear
regression model instead:

\[ price = \delta_0 + \delta_1 \, odometer + \delta_2 \, age + e \]

(f) Discuss why the simple linear regression methods may not be
preferable to the multiple regression method, in general, and in
the context of this problem. The resultant OLS estimates for the
multiple regression model are given below.

The predictive performance of the model will improve as relevant variables are
added to a simple regression model.

Also, the assumption that the disturbance is uncorrelated with the explanatory
variables is critical for the unbiased estimation of the coefficients of included
variables. In the simple price-on-age regression it will be violated if variables
affecting price and correlated with age have been omitted from the model.
This is likely to be the case here with the distance the car has traveled.

We see the \(R^2\) has improved (approximately doubled) with the addition of
odometer, and the coefficient on age is now much smaller in magnitude and is
now statistically insignificant.
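
The omitted variable argument can be illustrated with a small simulation. Everything in the sketch below is synthetic and assumed for illustration: the data are not the Anzac Garage data, and the numbers (12000 km per year, a price effect of -0.27 per km, the noise levels) are made up. Price is generated to depend on odometer only, so the true age effect is zero; the simple regression on age still picks up a large negative slope because age and odometer are correlated. The age coefficient from the multiple regression is obtained by partialling odometer out of age (the Frisch-Waugh result).

```python
import random

def ols(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(0)
n = 5000
# Synthetic data: older cars have travelled further; price depends on km only
age = [random.uniform(0, 15) for _ in range(n)]
odometer = [12000 * a + random.gauss(0, 20000) for a in age]
price = [55000 - 0.27 * o + random.gauss(0, 5000) for o in odometer]

# Simple regression of price on age: odometer is left in the disturbance
_, slope_simple = ols(age, price)

# Multiple regression coefficient on age: remove the part of age explained
# by odometer, then regress price on what is left
a0, a1 = ols(odometer, age)
age_resid = [a - (a0 + a1 * o) for a, o in zip(age, odometer)]
_, slope_multiple = ols(age_resid, price)

print(f"Age slope, simple regression:   {slope_simple:.0f}")
print(f"Age slope, controlling for km:  {slope_multiple:.0f}")
```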

SUMMARY OUTPUT

Regression Statistics
R Square             0.150
Standard Error       40568
Observations         117

                Coefficients   Standard Error   t Stat    P-value
Intercept       53867          6825             7.893     0.000
Odometer (km)   -0.270         0.087            -3.110    0.002
Age             -360           1108             -0.325    0.746

2. Computing Exercise #4
Refer to the Computing Work document and answer question 3 on
page 23 on multiple regression.

After estimating three import equations, the first two being simple
linear regressions and the third being a multiple regression containing GNE
and relative prices as explanatory variables, you were asked the
following discussion question:

Are the coefficients \(\beta_1\) and \(\beta_2\) statistically different from zero at the 5%
level? Of the three regression equations you estimated, which one
provides a better explanation of the level of imports?

The p-values for \(\beta_1\) and \(\beta_2\) are both <0.0005 and hence at all conventional
significance levels one would reject the null hypotheses that these coefficients
are individually equal to zero.

We could interpret "better" in a number of ways. In terms of fit, the third
regression is best in terms of adjusted \(R^2\): 0.9713 compared to 0.9457 and 0.3167
in the two simple regression models. (Notice the multiple regression model will
always dominate the two simple regression models in terms of \(R^2\)
but may not in terms of adjusted \(R^2\).)
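
The adjusted \(R^2\) quoted for the multiple regression can be recovered from its \(R^2\), the sample size and the number of explanatory variables using the standard formula \(1 - (1 - R^2)(n - 1)/(n - k - 1)\). A minimal sketch, using R Square = 0.9736 and 26 observations from the summary output, with k = 2 regressors (GNE and relative prices):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a regression with n observations and
    k explanatory variables (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

adj = adjusted_r2(0.9736, n=26, k=2)
print(f"Adjusted R-squared = {adj:.4f}")
```

The penalty factor (n - 1)/(n - k - 1) grows with k, which is why adding a regressor always weakly raises \(R^2\) but can lower adjusted \(R^2\).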

In addition, though, you could argue that the multiple regression model is
better because it guards against the omitted variable bias that is likely in the
two simple linear regression models.

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.9867
R Square              0.9736
Adjusted R Square     0.9713
Standard Error        3140.3680
Observations          26

             Coefficients   Standard Error   t Stat    P-value
Intercept    16101.329      10822.442        1.488     0.150
GNE          0.249          0.011            23.406    0.000
Price        -38978.894     8255.354         -4.722    0.000

3. SIA: Sydney housing prices.
Recall the housing price data for Sydney suburbs used in Question 6 in
Week 3. Your statistically naïve friend has been doing some analysis of
Sydney housing prices using these data and has asked you for help. In
addition to the price data there are a number of characteristics
associated with the suburb that have been collected and are likely to
explain some of the large variation in housing prices across suburbs
that is observed in the data. Your friend was very interested in the
impact on housing prices of being located under the flight path. The
regression of housing price on the flightpath variable (Model 1)
provided a result that he did not expect. On your advice he ran a
second regression (Model 2) that included several extra explanatory
variables. Results for Model 1 and Model 2 are presented in the table,
together with a full description of the variables used in the analysis.

Housing price is the mean of the median price of houses sold in each
suburb for two quarters (September and December 2002), measured
in thousands of dollars;
Distance to CBD is distance measured in kilometers of the suburb from
Sydney's CBD;
Distance to Airport is distance measured in kilometers of the suburb
from Sydney Airport;
Distance to beach is distance of the suburb measured in kilometers
from the nearest beach;
Flightpath is a dummy variable that equals 1 if the suburb is under the
flightpath and equals 0 otherwise.
(a) How would you interpret the regression estimates for the
parameters in Model 1 and explain why your friend found the
result to be unexpected?

Because the estimate of \(\beta_1\) is positive, houses under the flightpath
on average sell for more ($216,200 more) than houses not under the
flightpath. This is surprising because you would expect the aircraft noise
associated with being under the flightpath to be unattractive and hence to
lead to lower, not higher, prices.
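
With a single dummy regressor, the fitted values of Model 1 are simply the two group means. The short sketch below, using the Model 1 estimates from the results table (prices in thousands of dollars), makes that explicit; the function name is hypothetical.

```python
# Model 1 estimates (housing price in thousands of dollars)
INTERCEPT = 569.9
FLIGHTPATH_COEF = 216.2

def model1_price(flightpath):
    """Fitted mean price (in $000) for flightpath = 0 or 1."""
    return INTERCEPT + FLIGHTPATH_COEF * flightpath

not_under = model1_price(0)   # mean price for suburbs not under the flightpath
under = model1_price(1)       # mean price for suburbs under the flightpath

print(f"Not under flightpath: ${not_under * 1000:,.0f}")
print(f"Under flightpath:     ${under * 1000:,.0f}")
print(f"Difference:           ${(under - not_under) * 1000:,.0f}")
```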

(b) Explain why the results in Model 1 are unreliable as a basis for
determining the impact on housing prices of being located under
the flight path. Which of the assumptions associated with simple
linear regression has clearly been violated in Model 1?

You would like to make the statement about the impact of being under the
flightpath holding other factors constant. This is not possible with Model 1,
as it is a simple linear regression and hence there is potential for omitted
(confounding) variables that lead to biased estimates of the impact of being
situated under the flightpath.

For example, proximity to the beach is likely to impact on housing prices and
be correlated with being under the flightpath. In Model 1, the variable
Distance to beach is in the disturbance term and hence leads to a violation of
the assumption that \(E(u|X) = 0\).

(c) Write a brief description of the results for Flightpath in Model 2 in
terms of the parameter estimate, its interpretation and its
statistical significance.

The estimated parameter indicates a $51,500 premium (much smaller than
for Model 1) for suburbs under the flightpath relative to those not, holding
other factors constant.

For statistical significance:
\(H_0: \beta_i = 0\) versus \(H_1: \beta_i \neq 0\), where \(\beta_i\) is the ith regression coefficient.
Because we have a large sample size we can invoke the CLT and use standard
normal critical values when evaluating the test statistics given by \(b_i/se(b_i)\).

If we choose \(\alpha = 0.05\) then the decision rule will be to reject if \(|b_i/se(b_i)| > 1.96\).

The test statistic for flightpath (51.5/50.2 = 1.03) indicates that this
parameter is not statistically different from zero.
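
The same decision rule can be applied to every Model 2 coefficient at once. The sketch below uses the estimates and standard errors from the results table; the distance coefficients are entered with the signs used in the Randwick prediction in part (e), although only the absolute value matters for this two-tailed test. The variable and function names are illustrative.

```python
Z_CRIT = 1.96  # two-tailed 5% standard normal critical value

# (variable, estimate, standard error) for Model 2
model2 = [
    ("Intercept", 853.5, 35.5),
    ("Flightpath", 51.5, 50.2),
    ("Distance to CBD", -21.5, 3.4),
    ("Distance to Airport", 21.0, 2.9),
    ("Distance to beach", -13.9, 2.3),
]

def is_significant(estimate, se, z_crit=Z_CRIT):
    """Reject H0: beta_i = 0 when |b_i / se(b_i)| exceeds the critical value."""
    return abs(estimate / se) > z_crit

results = {name: is_significant(b, se) for name, b, se in model2}

for name, b, se in model2:
    verdict = "reject H0" if results[name] else "do not reject H0"
    print(f"{name}: t = {b / se:.2f} -> {verdict}")
```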

(d) Interpret the overall fit of Model 2.

Model 2 produces an \(R^2\) of 0.372, that is, 37.2% of the variation in Sydney housing
prices is explained by the explanatory variables in the regression.

(e) Use Model 2 to predict the average housing price for the suburb of
Randwick, which is 5.21 km from the CBD, 1.78 km from the
beach, 6.62 km from the airport and is not deemed to be under
the flightpath.

\[ \text{Prediction} = 853.5 + 0 - 21.5 \times 5.21 + 21 \times 6.62 - 13.9 \times 1.78 = 855.763 \]

The predicted average house price for Randwick is $855,763.
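
The Randwick prediction can be reproduced by pairing each Model 2 coefficient with the suburb's value for that variable. This is a minimal sketch; the dictionary keys are illustrative names, not names from the data set.

```python
# Model 2 coefficients (housing price in thousands of dollars)
coefficients = {
    "flightpath": 51.5,
    "dist_cbd": -21.5,
    "dist_airport": 21.0,
    "dist_beach": -13.9,
}
INTERCEPT = 853.5

# Randwick's characteristics as given in the question
randwick = {
    "flightpath": 0,
    "dist_cbd": 5.21,
    "dist_airport": 6.62,
    "dist_beach": 1.78,
}

def predict(values):
    """Fitted price (in $000) for a suburb described by `values`."""
    return INTERCEPT + sum(coefficients[k] * v for k, v in values.items())

prediction = predict(randwick)
print(f"Predicted price: {prediction:.3f} thousand dollars")
```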

Multiple regression results for Sydney housing prices*

Dependent variable: Housing price

Explanatory variables     Model 1     Model 2
Intercept                 569.9       853.5
                          (20.6)      (35.5)
Flightpath                216.2       51.5
                          (56.0)      (50.2)
Distance to CBD                       -21.5
                                      (3.4)
Distance to Airport                   21.0
                                      (2.9)
Distance to beach                     -13.9
                                      (2.3)
Observations              503         503
R-squared                 0.029       0.372

*Numbers in brackets below coefficient estimates are standard errors.