Students Tutorial Answers Week12 ECON1203 UNSW

WEEK 12 TUTORIAL EXERCISES (To be discussed in the week starting October 11)

1. Recall the Anzac Garage data (ANZACG.XLS) used in Week 3, question 7, where we considered the simple linear regression model given by:

$$\text{price} = \beta_0 + \beta_1\,\text{age} + u$$

where price = used car price in dollars and age = age of the car in years. The EXCEL results obtained using Ordinary Least Squares are presented below:

Regression Statistics
R²                 0.077
Standard Error     42069
Observations       117

             Coefficients   Standard Error   t Stat    P-value
Intercept    47469          6748             7.035     0.000
Age          -2658          856              -3.106    0.002

(a) Interpret the t Stat and the p-values in the EXCEL output. What do you need to assume?

The t-stats and p-values in the EXCEL output are derived from two-tail tests with null hypotheses that the associated population parameter equals 0. Hence, larger t-stats (in absolute value) and lower p-values mean we are more confident that the associated population parameter is non-zero. Here, the p-values for both the intercept and Age coefficients are below 1%, hence we can be confident that both population parameters are statistically significant (non-zero).

We need to assume the disturbances are normal or, because the sample size is large, invoke the CLT.
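
The t-stats and p-values discussed above can be reproduced approximately from the reported coefficients and standard errors. The following is a minimal sketch, assuming the large-sample standard normal approximation (EXCEL itself uses the t distribution with n - 2 degrees of freedom, and works from unrounded estimates, so the third decimal place can differ). The function name `two_tailed_p_value` is illustrative only.

```python
import math

def two_tailed_p_value(t_stat: float) -> float:
    """Two-tailed p-value under the standard normal approximation."""
    # For Z ~ N(0, 1): P(|Z| > |t|) = erfc(|t| / sqrt(2))
    return math.erfc(abs(t_stat) / math.sqrt(2))

# (coefficient, standard error) pairs taken from the EXCEL output;
# the Age coefficient is negative (older cars are cheaper).
estimates = {"Intercept": (47469, 6748), "Age": (-2658, 856)}

results = {}
for name, (coef, se) in estimates.items():
    t_stat = coef / se  # test of H0: parameter = 0
    results[name] = (t_stat, two_tailed_p_value(t_stat))
    print(f"{name}: t = {t_stat:.3f}, p = {results[name][1]:.4f}")
```

Both p-values come out below 0.01, in line with the conclusion that both parameters are statistically significant.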


(b) Calculate a 95% confidence interval for the coefficient on age.

The standard normal critical value is 1.96, hence the 95% confidence interval is:

-2658 ± 1.96 × 856 = -2658 ± 1678 = (-4336, -980)
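
The interval arithmetic can be checked with a few lines of code. This is a sketch only; the helper name `confidence_interval` is hypothetical, and the estimate is entered with its negative sign (-2658), as implied by the negative effect of age on price.

```python
def confidence_interval(estimate, std_error, critical_value=1.96):
    """Return (lower, upper) bounds: estimate -/+ critical_value * std_error."""
    margin = critical_value * std_error
    return estimate - margin, estimate + margin

# Age coefficient and its standard error from the EXCEL output
lower, upper = confidence_interval(-2658, 856)
print(f"95% CI for the Age coefficient: ({lower:.0f}, {upper:.0f})")
```

Because the whole interval lies below zero, it is consistent with the two-tailed test in part (a) rejecting a zero coefficient at the 5% level.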

(c) Interpret the R² value.

The regression model including age explains 7.7% of the variation in used car prices.

(d) Test whether the estimated coefficient of Age is significantly less than zero at the 5% level of significance.

Unlike in (a) this is a one-tailed test:

H0: β1 = 0 versus H1: β1 < 0

Decision rule: Reject H0 if b1/se(b1) < -1.645

Test statistic: b1/se(b1) = -3.106 < -1.645 and hence reject H0
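
The one-tailed decision rule can be sketched in code as follows, assuming the hypotheses are H0: β1 = 0 against H1: β1 < 0 and that the standard normal approximation is used for the critical value. The variable names are illustrative.

```python
from statistics import NormalDist

b1, se_b1 = -2658, 856   # Age coefficient and standard error from the EXCEL output
alpha = 0.05             # significance level

# Lower-tail critical value of the standard normal, roughly -1.645
critical_value = NormalDist().inv_cdf(alpha)

t_stat = b1 / se_b1
reject_h0 = t_stat < critical_value   # reject only for sufficiently negative values

print(f"t = {t_stat:.3f}, critical value = {critical_value:.3f}, reject H0: {reject_h0}")
```

All of the 5% rejection region sits in the lower tail, which is why the critical value is -1.645 rather than the -1.96 used for a two-tailed test.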

(e) Estimate a 95% confidence interval for the mean price for a secondhand passenger car that is 10 years old and interpret the result. Note: the sample mean of age is 6.44 years.

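A 10-year-old car is expected to be valued at $47469 - 10 × 2658 = $20889.

The boundaries of the confidence interval for this prediction can be found by:

$$\hat{y} \pm t_{\alpha/2,\,n-2}\; s\,\sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{\sum (x_i - \bar{x})^2}},$$

where s = 42069, se(b1) = 856 and hence

$$\sum (x_i - \bar{x})^2 = \left(\frac{s}{\mathrm{se}(b_1)}\right)^2 = \left(\frac{42069}{856}\right)^2 \approx 2415.$$

Hence:

$$20889 \pm 1.98 \times 42069 \times \sqrt{\frac{1}{117} + \frac{(10 - 6.44)^2}{2415}} = 20889 \pm 9783$$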

We are 95% confident that the mean price of a 10-year-old car will fall between $11,106 and $30,672. While the impact of age on price is precisely estimated, the CI is quite wide because of the large amount of unexplained variation that is indicated by the very low R² value reported.

(Note: use of normal critical values here would be acceptable given the large sample size and would make little practical difference, as the critical value would be 1.96 rather than 1.98.)
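
The interval for the mean price can be reproduced step by step. The sketch below assumes the standard formula for a confidence interval on the conditional mean in simple regression, and recovers the sum of squared deviations of age from the identity se(b1) = s / sqrt(Σ(x_i - x̄)²). Variable names are illustrative.

```python
import math

b0, b1 = 47469, -2658      # intercept and Age coefficient
s, se_b1 = 42069, 856      # regression standard error and se of the Age coefficient
n, mean_age = 117, 6.44    # sample size and sample mean of age
age = 10                   # age at which the mean price is estimated
t_crit = 1.98              # t critical value used in the worked answer

y_hat = b0 + b1 * age      # point estimate of the mean price

# se(b1) = s / sqrt(Sxx)  =>  Sxx = (s / se(b1))^2, about 2415
sxx = (s / se_b1) ** 2

# Standard error of the estimated conditional mean
se_mean = s * math.sqrt(1 / n + (age - mean_age) ** 2 / sxx)

margin = t_crit * se_mean
lower, upper = y_hat - margin, y_hat + margin
print(f"{y_hat} ± {margin:.0f} -> ({lower:.0f}, {upper:.0f})")
```

Replacing `t_crit` with 1.96 narrows the interval only slightly, which illustrates the point made in the note about normal critical values.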

Anzac Garage's pricing scheme based on the age of the car is not working out very well. When its secondhand cars are compared with cars of the same age from other dealers, prices often diverge. One of their consultants noted that the value of a secondhand car should depend on both the Odometer reading as well as the Age of the vehicle. This consultant wanted to estimate the following two simple linear regression models separately:

$$\text{price} = \beta_0 + \beta_1\,\text{age} + u$$

$$\text{price} = \alpha_0 + \alpha_1\,\text{odometer} + v$$

where Odometer = distance the car has travelled since leaving the factory in kilometers. A senior consultant advised use of a multiple linear regression model instead:

$$\text{price} = \beta_0 + \beta_1\,\text{odometer} + \beta_2\,\text{age} + u$$

(f) Discuss why the simple linear regression methods may not be preferable to the multiple regression method, in general, and in the context of this problem. The resultant OLS estimates for the multiple regression model are given below:

The predictive performance of the model will improve as relevant variables are added to a simple regression model.

Also, the assumption that the disturbance is uncorrelated with the explanatory variables is critical for the unbiased estimation of coefficients of included variables. In the simple price on age regression it will be violated if variables affecting price and correlated with age have been omitted from the model. This is likely to be the case here with distance the car has travelled.

In the multiple regression results, odometer is statistically significant, and the coefficient on age is now much smaller in magnitude and is now statistically insignificant.
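
The omitted-variable argument can be illustrated numerically. The sketch below uses small hypothetical data (not the Anzac Garage sample) and an assumed noise-free pricing rule in which price falls by 0.4 per year of age and 0.2 per thousand kilometres. It relies on the exact in-sample identity: slope from the short regression = age slope from the long regression + odometer slope from the long regression × slope of odometer on age.

```python
def mean(values):
    return sum(values) / len(values)

def cross_dev(x, y):
    """Sum of cross-products of deviations from the sample means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

# Hypothetical data, NOT the Anzac Garage sample: older cars have travelled further.
age = [1, 2, 3, 5, 6, 8, 9, 12]                  # years
odometer = [15, 38, 52, 90, 95, 140, 150, 210]   # thousands of km
# Assumed pricing rule (thousands of dollars), chosen purely for illustration.
price = [60 - 0.4 * a - 0.2 * o for a, o in zip(age, odometer)]

s_aa = cross_dev(age, age)
s_oo = cross_dev(odometer, odometer)
s_ao = cross_dev(age, odometer)
s_ap = cross_dev(age, price)
s_op = cross_dev(odometer, price)

# Simple regression of price on age alone (odometer omitted)
short_age_slope = s_ap / s_aa

# Multiple regression of price on age and odometer (two-regressor OLS formulas)
det = s_aa * s_oo - s_ao ** 2
long_age_slope = (s_oo * s_ap - s_ao * s_op) / det
long_odo_slope = (s_aa * s_op - s_ao * s_ap) / det

# Omitted-variable identity linking the two regressions
implied_short_slope = long_age_slope + long_odo_slope * (s_ao / s_aa)

print(f"age slope, odometer omitted:  {short_age_slope:.3f}")
print(f"age slope, odometer included: {long_age_slope:.3f}")
```

In this constructed example the simple regression attributes to age much of the price decline that is actually due to distance travelled, which is the same mechanism suggested by the change in the Age coefficient between the two sets of results.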

SUMMARY OUTPUT

Regression Statistics
R Square           0.150
Standard Error     40568
Observations       117

                Coefficients   Standard Error   t Stat    P-value
Intercept       53867          6825             7.893     0.000
Odometer (km)   -0.270         0.087            -3.110    0.002
Age             -360           1108             -0.325    0.746

2. Computing Exercise #4

Refer to the Computing Work document and answer question 3 on page 23 on multiple regression.

After estimating three import equations, the first two being simple linear regressions and the third being a multiple regression containing GNE and relative prices as explanatory variables, you were asked the following discussion question:

[...] level? Of the three regression equations you estimated, which one provides a better explanation of the level of imports?

The p-values for β1 and β2 are both < 0.0005 and hence at all conventional significance levels one would reject the null hypotheses that these coefficients are individually equal to zero.


The multiple regression is best in terms of adjusted R²: 0.9713 compared to 0.9457 and 0.3167 in the two simple regression models. (Notice the multiple regression model will always dominate the two simple regression models in terms of R² but may not in terms of adjusted R².)
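
The adjusted R² reported for the multiple regression can be recovered from its R², the number of observations and the number of explanatory variables. This is a minimal sketch using the usual definition, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1); the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n_obs, n_regressors):
    """Adjusted R-squared, penalising R-squared for the number of regressors."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Multiple regression of imports on GNE and relative prices:
# R Square = 0.9736, 26 observations, 2 explanatory variables
adj = adjusted_r_squared(0.9736, 26, 2)
print(f"Adjusted R-squared: {adj:.4f}")

# With R-squared held fixed, an extra regressor lowers the adjusted value
adj_with_extra = adjusted_r_squared(0.9736, 26, 3)
```

The second call shows why adjusted R² need not rise when a variable is added: unless R² increases enough, the degrees-of-freedom penalty dominates.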

In addition, though, you could argue that the multiple regression model is better because it guards against the omitted variable bias that is likely in the two simple linear regression models.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9867
R Square             0.9736
Adjusted R Square    0.9713
Standard Error       3140.3680
Observations         26

             Coefficients   Standard Error   t Stat    P-value
Intercept    16101.329      10822.442        1.488     0.150
GNE          0.249          0.011            23.406    0.000
Price        -38978.894     8255.354         -4.722    0.000

3. SIA: Sydney housing prices.

Recall the housing price data for Sydney suburbs used in Question 6 in Week 3. Your statistically naive friend has been doing some analysis of Sydney housing prices using these data and has asked you for help. In addition to the price data there are a number of characteristics associated with the suburb that have been collected and are likely to explain some of the large variation in housing prices across suburbs that is observed in the data. Your friend was very interested in the impact on housing prices of being located under the flight path. The regression of housing price on the flightpath variable (Model 1) provided a result that he did not expect. On your advice he ran a second regression (Model 2) that included several extra explanatory variables. Results for Model 1 and Model 2 are presented in the table, together with a full description of variables used in the analysis.


Housing price is the mean of the median price of houses sold in each suburb for two quarters (September and December 2002), measured in thousands of dollars;

Distance to CBD is distance measured in kilometers of the suburb from Sydney's CBD;

Distance to Airport is distance measured in kilometers of the suburb from Sydney Airport;

Distance to beach is distance of the suburb measured in kilometers from the nearest beach;

Flightpath is a dummy variable that equals 1 if the suburb is under the flightpath and equals 0 otherwise.

(a) How would you interpret the regression estimates for the parameters in Model 1 and explain why your friend found the result to be unexpected?

Houses in suburbs under the flightpath on average sell for more ($216,200 more) than houses not under the flightpath. This is surprising because you would expect that aircraft noise associated with being under the flightpath would be unattractive and hence lead to lower, not higher, prices.
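
In a regression on a single dummy variable, the intercept is the average outcome for the group with dummy = 0 and the slope is the difference in group averages. A short sketch using the Model 1 estimates (prices in thousands of dollars; the function name is illustrative):

```python
# Model 1 estimates, in thousands of dollars
intercept, flightpath_coef = 569.9, 216.2

def predicted_price(flightpath: int) -> float:
    """Predicted average housing price; flightpath is 1 if under the flight path, else 0."""
    return intercept + flightpath_coef * flightpath

not_under = predicted_price(0)   # average for suburbs not under the flight path
under = predicted_price(1)       # average for suburbs under the flight path

print(f"Not under flight path: ${not_under * 1000:,.0f}")
print(f"Under flight path:     ${under * 1000:,.0f}")
```

The gap between the two predicted averages is exactly the Flightpath coefficient, which is why it reads as a $216,200 premium in Model 1.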

(b) Explain why the results in Model 1 are unreliable as a basis for determining the impact on housing prices of being located under the flight path. Which of the assumptions associated with simple linear regression has clearly been violated in Model 1?

You would like to make the statement about the impact of being under the flightpath holding other factors constant. This is not possible with Model 1 as it is a simple linear regression and hence there is potential for omitted (confounding) variables that lead to biased estimates of the impact of being situated under the flightpath.

For example, proximity to the beach is likely to impact on housing prices and be correlated with being under the flightpath. In Model 1, the variable Distance to beach is in the disturbance term and hence leads to a violation of the assumption that E(u|X) = 0.

(c) Write a brief description of the results for Flightpath in Model 2 in terms of the parameter estimate, its interpretation and its statistical significance.

The Flightpath estimate in Model 2 implies an average price difference of $51,500 (compared with $216,200 for Model 1) for suburbs under the flightpath relative to those that are not, holding other factors constant.

For statistical significance:

H0: βi = 0 versus H1: βi ≠ 0, where βi is the ith regression coefficient.

Because we have a large sample size we can invoke the CLT and use standard normal critical values when evaluating the test statistics given by bi/se(bi).

If we choose α = 0.05 then the decision rule will be to reject if |bi/se(bi)| > 1.96.

The test statistic for Flightpath (51.5/50.2 = 1.03) indicates that this parameter is not statistically different from zero.
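
The same two-tailed decision rule can be applied to every coefficient in Model 2. The sketch below takes the estimates and standard errors from the results table; the negative signs on Distance to CBD and Distance to beach are inferred from the prediction arithmetic in part (e), and they do not affect the outcome because the rule uses the absolute value of the test statistic.

```python
# (coefficient, standard error) pairs for Model 2, in thousands of dollars
model2 = {
    "Intercept": (853.5, 35.5),
    "Flightpath": (51.5, 50.2),
    "Distance to CBD": (-21.5, 3.4),      # sign inferred from part (e)
    "Distance to Airport": (21.0, 2.9),
    "Distance to beach": (-13.9, 2.3),    # sign inferred from part (e)
}

CRITICAL_VALUE = 1.96   # two-tailed standard normal critical value at alpha = 0.05

significant = {}
for name, (coef, se) in model2.items():
    t_stat = coef / se
    significant[name] = abs(t_stat) > CRITICAL_VALUE
    print(f"{name}: t = {t_stat:.2f}, reject H0: {significant[name]}")
```

Only Flightpath fails to reach significance at the 5% level; the distance variables and the intercept all have test statistics well beyond 1.96 in absolute value.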

(d) Interpret the overall fit of Model 2.

Model 2 produces an R² of 0.372, so 37.2% of the variation in Sydney housing prices is explained by the explanatory variables in the regression.

(e) Use Model 2 to predict the average housing price for the suburb of Randwick, which is 5.21 kms from the CBD, 1.78 kms from the beach, 6.62 kms from the airport and is not deemed to be under the flightpath.

Prediction = 853.5 + 0 - 21.5 × 5.21 + 21 × 6.62 - 13.9 × 1.78 = 855.763

The predicted average house price for Randwick is $855,763.
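
The prediction is a weighted sum of Randwick's characteristics using the Model 2 coefficients. A brief sketch follows; the dictionary keys are illustrative labels, and the negative signs on the CBD and beach coefficients are those required for the arithmetic to give 855.763.

```python
# Model 2 coefficients, in thousands of dollars
coef = {
    "intercept": 853.5,
    "flightpath": 51.5,
    "cbd": -21.5,       # per km from the CBD
    "airport": 21.0,    # per km from the airport
    "beach": -13.9,     # per km from the nearest beach
}

# Randwick's characteristics (flightpath = 0: not under the flight path)
randwick = {"flightpath": 0, "cbd": 5.21, "airport": 6.62, "beach": 1.78}

prediction = coef["intercept"] + sum(coef[k] * value for k, value in randwick.items())
print(f"Predicted average price: ${prediction * 1000:,.0f}")
```

Because the Flightpath dummy is 0 for Randwick, its coefficient drops out of the sum and does not affect the prediction.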

Multiple regression results for Sydney housing prices*

Dependent variable: Housing price

Explanatory variables    Model 1         Model 2
Intercept                569.9 (20.6)    853.5 (35.5)
Flightpath               216.2 (56.0)    51.5 (50.2)
Distance to CBD                          -21.5 (3.4)
Distance to Airport                      21.0 (2.9)
Distance to beach                        -13.9 (2.3)
Observations             503             503
R-squared                0.029           0.372

*Numbers in brackets after coefficient estimates are standard errors.

