# STAT 2509 Exam Review Problems 1

REMINDER: On the final exam
1. …

2. "Test" means that you must carry out a formal hypothesis testing procedure with H0, Ha, test stat., critical region, calculated value of test stat., and conclusion.

3. … numerical values of this test stat. and p-value.

4. "Find all examples (indications) of ... in output" means you must give the actual numerical …

5. Don't forget to state assumptions where necessary.
________________________________________________________________________________________
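1. The job placement centre at a large University would like to predict the starting salary (given in \$000) … represents the number of years of prior job related experience. The following data was obtained on a …

$$
X^T X = \begin{pmatrix} 15 & 44.7 & 13.8 \\ 44.7 & 137.19 & 40.36 \\ 13.8 & 40.36 & 22.72 \end{pmatrix},
\qquad
(X^T X)^{-1} = \begin{pmatrix} 2.52089 & -0.77695 & -0.15100 \\ -0.77695 & 0.254727 & 0.019415 \\ -0.15100 & 0.019415 & 0.101240 \end{pmatrix}
$$

$$
X^T Y = \begin{pmatrix} 350.6 \\ 1069.27 \\ 336.88 \end{pmatrix},
\qquad
\sum Y_i^2 = 8387.48,
\qquad
\hat{Y} = 2.1858 + 6.5138 X_1 + 1.925 X_2
$$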
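Parts (b), (c), (e) and (f) can all be worked from the matrix summaries with the usual formulas b = (XᵀX)⁻¹XᵀY, SSR = bᵀXᵀY − nȲ² and SSE = ΣYᵢ² − bᵀXᵀY. The sketch below is a study aid, not an official solution; it assumes the columns of X are (1, X1, X2) with X1 = GPA, as the fitted equation suggests, and it still leaves the comparison with F and t table values to you.

```python
# Problem 1: rebuild the regression ANOVA pieces from the matrix summaries.
import math

XtX = [[15, 44.7, 13.8],
       [44.7, 137.19, 40.36],
       [13.8, 40.36, 22.72]]
XtX_inv = [[2.52089, -0.77695, -0.15100],
           [-0.77695, 0.254727, 0.019415],
           [-0.15100, 0.019415, 0.101240]]
XtY = [350.6, 1069.27, 336.88]
sum_y2 = 8387.48
n, p = 15, 3  # n is the (1,1) entry of X^T X; p = number of coefficients

# b = (X^T X)^{-1} X^T Y
b = [sum(XtX_inv[i][j] * XtY[j] for j in range(p)) for i in range(p)]

cf = XtY[0] ** 2 / n                      # n * Ybar^2
bXtY = sum(bi * v for bi, v in zip(b, XtY))
sst = sum_y2 - cf
ssr = bXtY - cf
sse = sum_y2 - bXtY
msr, mse = ssr / (p - 1), sse / (n - p)
f_stat = msr / mse
r2 = ssr / sst

# Partial t statistic for GPA (X1) in a model that already contains X2
t_gpa = b[1] / math.sqrt(mse * XtX_inv[1][1])

# Correlation between X1 and X2 from corrected sums of squares and cross-products
s11 = XtX[1][1] - XtX[0][1] ** 2 / n
s22 = XtX[2][2] - XtX[0][2] ** 2 / n
s12 = XtX[1][2] - XtX[0][1] * XtX[0][2] / n
r12 = s12 / math.sqrt(s11 * s22)

print([round(v, 4) for v in b])
print(f"SSR={ssr:.3f} SSE={sse:.3f} SST={sst:.3f} F={f_stat:.1f} R2={r2:.4f}")
print(f"t(GPA)={t_gpa:.2f} r(X1,X2)={r12:.3f}")
```

Small differences from hand calculations come from the rounding of (XᵀX)⁻¹ in the problem statement.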

a) Give the model assumed including the assumptions necessary for estimation and prediction.
b) Construct the ANOVA table.
c) At the 5% level of significance is there a linear relationship between starting salary and the two explanatory variables?
d) … regression on GPA and job related experience?
e) Test whether GPA makes a significant contribution to a model with years of job related experience in it. Use α = 0.05.
f) Find the correlation coefficient between X1 and X2.
g) Would it be reasonable to say that it is estimated that the average starting salary increases by 6.5138 (\$000) with an increase of 1 in GPA? Why?

2.

a) What departures from a linear regression model can be studied from a plot of the residuals against the predicted values? What departures can be studied from a histogram of the residuals?
b) Draw an example residual plot (eᵢ vs. Ŷᵢ) for each of the following cases:
   i) error variance decreases with Ŷᵢ
   ii) the true regression function is … shaped but an SLR function is fitted.

3. Computel operates 3 computers at different locations. The computers are identical in make and model but are subject to different degrees of voltage fluctuation in the power lines serving the respective installations. It is desired to test whether the average length of operating time between failures is the same for the 3 computers. It is believed that time between failures follows either an exponential or a Weibull distribution. The data obtained are shown below.

| Computer | Operating time between failures (in hours) |
|---|---|
| A | 105, 93, 90, 217, 22 |
| B | 85, 43, 1, 37, 14 |
| C | 183, 144, 219, 86, 39 |

Use an appropriate test to determine whether there is sufficient evidence to conclude, at the 5% level of significance, that average length of operating time differs for the 3 computers.
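Because the times are believed to be exponential or Weibull rather than normal, a rank-based procedure is a natural choice; the sketch below assumes the intended test is the Kruskal-Wallis test, H = 12/(N(N+1)) Σ Rᵢ²/nᵢ − 3(N+1), compared with a χ² value on k − 1 = 2 df. It handles only untied data, which is the case here.

```python
# Kruskal-Wallis H statistic for Problem 3 (standard library only).
import math
from itertools import chain

times = {
    "A": [105, 93, 90, 217, 22],
    "B": [85, 43, 1, 37, 14],
    "C": [183, 144, 219, 86, 39],
}


def kruskal_wallis_h(groups):
    """Return (H, rank sums); raises on ties, which would need midranks."""
    pooled = sorted(chain.from_iterable(groups.values()))
    if len(set(pooled)) != len(pooled):
        raise ValueError("tied observations: use midranks and a tie correction")
    rank = {value: i + 1 for i, value in enumerate(pooled)}  # ranks 1..N
    n_total = len(pooled)
    rank_sums = {g: sum(rank[v] for v in obs) for g, obs in groups.items()}
    h = 12 / (n_total * (n_total + 1)) * sum(
        rank_sums[g] ** 2 / len(obs) for g, obs in groups.items()
    ) - 3 * (n_total + 1)
    return h, rank_sums


h, rank_sums = kruskal_wallis_h(times)
# For df = 2 the chi-square upper-tail point has the closed form -2 ln(alpha).
chi2_crit = -2 * math.log(0.05)  # about 5.991
print(rank_sums, round(h, 2), "reject H0" if h > chi2_crit else "do not reject H0")
```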
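4.

| | Surface 1 | Surface 2 | Surface 3 | Totals |
|---|---|---|---|---|
| 1 | 68 | 72 | 65 | 205 |
| 2 | 40 | 43 | 42 | 125 |
| 3 | 82 | 89 | 84 | 255 |
| 4 | 56 | 60 | 50 | 166 |
| 5 | 70 | 75 | 68 | 213 |
| 6 | 80 | 91 | 86 | 257 |
| 7 | 47 | 58 | 50 | 155 |
| 8 | 55 | 68 | 52 | 175 |
| 9 | 78 | 77 | 75 | 230 |
| 10 | 53 | 65 | 60 | 178 |
| Totals | 629 | 698 | 632 | 1959 |

a) Fill in the blanks in the ANOVA table.
b) …
c) …
d) Which multiple comparison were you using in part (c)?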

The SAS System

The ANOVA Procedure

Class Level Information

| Class | Levels | Values |
|---|---|---|
| surface | 3 | 1 2 3 |

Number of observations 30

Dependent Variable: measure

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 11 | 6009.166667 | 546.287879 | 56.15 | <.0001 |
| Error | 18 | 175.133333 | 9.729630 | | |
| Corrected Total | 29 | 6184.300000 | | | |

| R-Square | Coeff Var | Root MSE | measure Mean |
|---|---|---|---|
| 0.971681 | 4.776777 | 3.119235 | 65.30000 |

| Source | DF | Anova SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| surface | 2 | 304.200000 | 152.100000 | 15.63 | 0.0001 |

Bonferroni (Dunn) t Tests for measure

NOTE: This test controls the Type I experimentwise error rate, but it generally has a higher Type II error rate than Tukey's for all pairwise comparisons.

Alpha 0.05
Error Degrees of Freedom
Error Mean Square
Critical Value of t
Minimum Significant Difference

Comparisons significant at the 0.05 level are indicated by ***.

| Comparison | Difference Between Means | Simultaneous Lower Limit | Simultaneous Upper Limit |
|---|---|---|---|
| A2 - A3 | 6.600 | 2.918 | 10.282 |
| A2 - A1 | 6.900 | 3.218 | 10.582 |
| A3 - A2 | -6.600 | -10.282 | -2.918 |
| A3 - A1 | 0.300 | -3.382 | 3.982 |
| A1 - A2 | -6.900 | -10.582 | -3.218 |
| A1 - A3 | -0.300 | -3.982 | 3.382 |
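The Model line has 11 df, which matches surface (2 df) plus the 10 rows of the data table (9 df); this is consistent with a randomized block layout in which each row is a block, although the row factor is not named in the problem, so treat that as an assumption. A short sketch that reproduces the SAS sums of squares from the table (the error df and MSE are also the values needed for the blank Bonferroni lines):

```python
# Problem 4: randomized block ANOVA, 3 surfaces (treatments) x 10 rows (blocks).
surface = {
    1: [68, 40, 82, 56, 70, 80, 47, 55, 78, 53],
    2: [72, 43, 89, 60, 75, 91, 58, 68, 77, 65],
    3: [65, 42, 84, 50, 68, 86, 50, 52, 75, 60],
}
n_trt = len(surface)
n_blocks = len(surface[1])
grand = sum(sum(col) for col in surface.values())
cf = grand ** 2 / (n_trt * n_blocks)            # correction factor

ss_total = sum(y * y for col in surface.values() for y in col) - cf
ss_trt = sum(sum(col) ** 2 for col in surface.values()) / n_blocks - cf
block_totals = [sum(col[i] for col in surface.values()) for i in range(n_blocks)]
ss_block = sum(bt ** 2 for bt in block_totals) / n_trt - cf
sse = ss_total - ss_trt - ss_block

df_trt, df_block = n_trt - 1, n_blocks - 1
df_err = df_trt * df_block
mse = sse / df_err
f_trt = (ss_trt / df_trt) / mse
print(f"SS surface={ss_trt:.4f} SS block={ss_block:.4f} SSE={sse:.4f} df_error={df_err}")
print(f"MSE={mse:.6f} F(surface)={f_trt:.2f}")
```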

5. A local real estate association in a metropolitan area would like to develop an equation to predict the selling price of a single family house based on the number of rooms (X1) and the neighbourhood (X2). The postulated model was

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon,
\qquad
X_2 = \begin{cases} 0 & \text{if neighbourhood A} \\ 1 & \text{if neighbourhood B} \end{cases},
\qquad
X_3 = X_1 X_2 .
$$

The SAS output is given below. Assuming there are no obvious assumption violations:
a) Write the separate models for each neighbourhood.
b) Write the separate fitted equations for each neighbourhood.
c) Test whether the two lines are coincident (the same). Use α = .05. You do NOT have to state assumptions.
d) Test whether the two lines have the same slopes. Use α = .05. Would you reject H0 at α = .001? Why or why not?
e) Find the simple linear regression equation to predict selling price based on neighbourhood.

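The SAS System

Model: MODEL1

Model Crossproducts X'X X'Y Y'Y

| X'X | INTERCEP | ROOMS | NBHD | X3 | PRICE |
|---|---|---|---|---|---|
| INTERCEP | 20 | 179 | 10 | 93 | 2251.9 |
| ROOMS | 179 | 1661 | 93 | 897 | 20742.4 |
| NBHD | 10 | 93 | 10 | 93 | 1221.7 |
| X3 | 93 | 897 | 93 | 897 | 11686.7 |
| PRICE | 2251.9 | 20742.4 | 1221.7 | 11686.7 | 261206.41 |

Dependent Variable: PRICE

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | 3 | 6695.98884 | 2231.99628 | 37.288 | 0.0001 |
| Error | 16 | 957.74066 | 59.85879 | | |
| C Total | 19 | 7653.72950 | | | |

Root MSE 7.73685, R-square 0.8749, C.V. 6.87139

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | 33.945082 | 13.69038740 | 2.479 | 0.0247 |
| ROOMS | 1 | 8.031967 | 1.56627876 | 5.128 | 0.0001 |
| NBHD | 1 | -5.902091 | 18.83336576 | -0.313 | 0.7580 |
| X3 | 1 | 2.089217 | 2.07797714 | 1.005 | 0.3297 |

Model: MODEL2

Dependent Variable: PRICE

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | 2 | 6635.48076 | 3317.74038 | 55.391 | 0.0001 |
| Error | 17 | 1018.24874 | 59.89698 | | |
| C Total | 19 | 7653.72950 | | | |

Root MSE 7.73931, R-square 0.8670, C.V. 6.87359

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | 23.737133 | 9.18675358 | 2.584 | 0.0193 |
| ROOMS | 1 | 9.218938 | 1.02962305 | 8.954 | 0.0001 |
| NBHD | 1 | 12.696743 | 3.53537232 | 3.591 | 0.0023 |

Model: MODEL3

Dependent Variable: PRICE

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | 1 | 5862.94370 | 5862.94370 | 58.931 | 0.0001 |
| Error | 18 | 1790.78580 | 99.48810 | | |
| C Total | 19 | 7653.72950 | | | |

Root MSE 9.97437, R-square 0.7660, C.V. 8.85863

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | 23.338677 | 11.83895807 | 1.971 | 0.0643 |
| ROOMS | 1 | 9.972774 | 1.29910323 | 7.677 | 0.0001 |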
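Parts (c) and (d) compare nested models by the extra sum of squares, F = [(SSE_R − SSE_F)/(df_R − df_F)]/MSE_F, using MODEL1 as the full model; part (e) needs only the INTERCEP and NBHD entries of the crossproduct matrix (with a 0/1 dummy, ΣX2 = ΣX2² = 10). The numbers below are copied from the output; comparing each F with the F(2, 16) or F(1, 16) table value is left to you.

```python
# Problem 5: extra-sum-of-squares F tests and the SLR of PRICE on NBHD.
sse_full, df_full = 957.74066, 16           # MODEL1: ROOMS, NBHD, X3
sse_parallel, df_parallel = 1018.24874, 17  # MODEL2: ROOMS, NBHD
sse_rooms, df_rooms = 1790.78580, 18        # MODEL3: ROOMS only
mse_full = sse_full / df_full


def partial_f(sse_reduced, df_reduced):
    """F = [(SSE_R - SSE_F) / (df_R - df_F)] / MSE_F."""
    return ((sse_reduced - sse_full) / (df_reduced - df_full)) / mse_full


f_coincident = partial_f(sse_rooms, df_rooms)        # H0: beta2 = beta3 = 0
f_same_slope = partial_f(sse_parallel, df_parallel)  # H0: beta3 = 0 (equals t^2 for X3)

# SLR of PRICE on NBHD from the X'X / X'Y crossproducts
n, sum_x, sum_x2 = 20, 10, 10
sum_y, sum_xy = 2251.9, 1221.7
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n
print(f"F coincident={f_coincident:.3f}  F same slope={f_same_slope:.3f}")
print(f"PRICE-hat = {b0:.2f} + {b1:.2f} NBHD")
```

With a single 0/1 predictor, b0 is the mean price in neighbourhood A and b0 + b1 the mean in neighbourhood B.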

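6. … samples of 10 supermarkets were selected in each of the 4 cities. The results obtained are given below.

| Toronto | Kingston | Ottawa | Montreal |
|---|---|---|---|
| 68 | 72 | 65 | 80 |
| 40 | 63 | 42 | 75 |
| 72 | 89 | 74 | 88 |
| 56 | 60 | 50 | 92 |
| 70 | 75 | 68 | 93 |
| 60 | 91 | 76 | 70 |
| 47 | 58 | 50 | 62 |
| 55 | 68 | 52 | 78 |
| 58 | 77 | 75 | 86 |
| 53 | 65 | 60 | 72 |

Totals: Toronto 579, Kingston 718, Ottawa 612, Montreal 796; grand total 2705.

a) Fill in the gaps in the ANOVA table.
b) … Use α = .05.
c) If appropriate, determine which cities differ. Use the Tukey Multiple Comparison procedure with α = .05.
d) … Why?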

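data;
input city cost @@;
cards;
1 68 1 40 1 72 1 56 1 70 1 60 1 47 1 55 1 58 1 53
2 72 2 63 2 89 2 60 2 75 2 91 2 58 2 68 2 77 2 65
3 65 3 42 3 74 3 50 3 68 3 76 3 50 3 52 3 75 3 60
4 80 4 75 4 88 4 92 4 93 4 70 4 62 4 78 4 86 4 72
;
run;
proc anova;
class city;
model cost = city;
run;

Analysis of Variance Procedure

Class Level Information

| Class | Levels | Values |
|---|---|---|
| CITY | 4 | 1 2 3 4 |

Number of observations in data set = 40

Dependent Variable: COST

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | | | | 8.16 | 0.0003 |
| Error | | | | | |
| Corrected Total | | 7331.375000 | | | |

| R-Square | C.V. | Root MSE | MEASURE Mean |
|---|---|---|---|
| 0.404682 | 16.28204 | | 67.6250000 |

| Source | DF | Anova SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| CITY | | | | 8.16 | 0.0003 |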
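The gaps in this one-way table can be recovered from the city totals and the Corrected Total SS that SAS does print: SS(city) = ΣTᵢ²/nᵢ − T²/N and SSE = SST − SS(city). A minimal sketch:

```python
# Problem 6: fill in the one-way ANOVA table from the city totals.
totals = {"Toronto": 579, "Kingston": 718, "Ottawa": 612, "Montreal": 796}
n_per_city = 10
k = len(totals)
n = k * n_per_city
sst = 7331.375  # Corrected Total SS, copied from the SAS output

cf = sum(totals.values()) ** 2 / n
ss_city = sum(t ** 2 for t in totals.values()) / n_per_city - cf
sse = sst - ss_city
df_city, df_error, df_total = k - 1, n - k, n - 1
ms_city, mse = ss_city / df_city, sse / df_error
f_stat = ms_city / mse
print(df_city, df_error, df_total)
print(f"SS city={ss_city:.3f} SSE={sse:.3f} MS city={ms_city:.4f} MSE={mse:.4f} F={f_stat:.2f}")
print(f"Root MSE={mse ** 0.5:.4f}")
```

As a check, SS(city)/SST reproduces the printed R-Square of 0.404682.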

7.

a) The least squares regression equation

$$
\hat{Y} = 1.5 - 3.5X_1 + 7.5X_2 - 150X_3
$$

gave the following results:

F test for H0: β1 = β2 = β3 = 0 vs. HA: at least one βj ≠ 0: F = 37.5, p-value = .0002

t tests for H0: βj = 0 vs. HA: βj ≠ 0:

| Variable | t | p-value |
|---|---|---|
| X1 | -0.5 | .54 |
| X2 | 1.89 | .28 |
| X3 | -1.29 | .31 |

i) …
ii) If X1 were removed from the model, do you think the coefficients of X2 and X3 would still be approximately 7.5 and -150 respectively? Why or why not?
iii) Would

$$
\frac{SSR(X_2 \mid X_1, X_3)}{TSS} = \frac{SSR(X_1, X_2, X_3) - SSR(X_1, X_3)}{TSS}
$$

be approximately equal to $SSR(X_2)/TSS$, the proportion of the total sum of squares accounted for by X2 in a simple linear regression of Y on X2? Why?

b) … order linear model containing the 2 variables X1 and X2. A colleague claims that the extra 3 variables X3 = X1², X4 = X2², X5 = X1X2 are needed. There are 85 observations, and TSS = 332.92.
The fitted regression equation using all 5 variables is:

$$
\hat{Y} = 1.109 + 1.982X_1 + 4.657X_2 + 4.223X_3 + 1.218X_4 + 0.223X_5, \qquad SSR = 296.30.
$$

The fitted equation using only X1 and X2 is:

$$
\hat{Y} = 45.73 + 9.803X_1 + 7.451X_2, \qquad SSR = 287.71.
$$

Based on these results which claim would you support? Use α = .05.
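Part (b) is a general linear (extra sum of squares) test of H0: β3 = β4 = β5 = 0: F = [(SSR_full − SSR_reduced)/q] / [SSE_full/(n − p_full)], where p counts the intercept. A sketch of the arithmetic; the F(3, 79) table value (a little above 2.7) still has to be looked up.

```python
# Problem 7(b): do X1^2, X2^2 and X1*X2 add to the first-order model?
n, tss = 85, 332.92
ssr_full, p_full = 296.30, 6         # 5 predictors + intercept
ssr_reduced, p_reduced = 287.71, 3   # X1, X2 + intercept

sse_full = tss - ssr_full
q = p_full - p_reduced               # number of extra terms being tested
f_stat = ((ssr_full - ssr_reduced) / q) / (sse_full / (n - p_full))
print(f"F = {f_stat:.3f} on ({q}, {n - p_full}) df")
```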
8. Based on the following correlation matrix

| | Y | X1 | X2 | X3 |
|---|---|---|---|---|
| Y | 1 | .25 | .36 | .59 |
| X1 | .25 | 1 | .54 | .22 |
| X2 | .36 | .54 | 1 | .31 |
| X3 | .59 | .22 | .31 | 1 |

a) What is the relationship between the correlation coefficient of Y with X1 and the coefficient of determination for the SLR of Y on X1? What are these values in this problem?
b) Which two X variables are most highly correlated with Y? Which 2 X variables are the most highly correlated with each other?
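In a simple linear regression the coefficient of determination equals the square of the correlation coefficient, so part (a) is a one-line calculation; part (b) only needs the largest entries of the matrix. A small sketch of both steps:

```python
# Problem 8: r^2 for the SLR of Y on X1, and the strongest correlations.
corr = {
    ("Y", "X1"): 0.25, ("Y", "X2"): 0.36, ("Y", "X3"): 0.59,
    ("X1", "X2"): 0.54, ("X1", "X3"): 0.22, ("X2", "X3"): 0.31,
}
r_y_x1 = corr[("Y", "X1")]
r2_y_x1 = r_y_x1 ** 2  # in SLR, R^2 = r^2

# Correlations with Y, sorted by absolute size (largest last)
with_y = sorted((abs(r), x) for (a, x), r in corr.items() if a == "Y")
# Largest correlation between two X variables
between_x = max((abs(r), pair) for pair, r in corr.items() if "Y" not in pair)
print(round(r2_y_x1, 4), [x for _, x in with_y[-2:]], between_x[1])
```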

9. The manager of a small engine repair shop wants to determine whether the length of time it takes to … special order a part was recorded for 22 randomly selected orders from each of the three warehouses A, B, C. SAS output is shown below.

DATA PARTS;
INPUT warehous \$ time @@;
CARDS;
A 13 A 17 A 14 A 10 A 9 A 15 A 10 A 11 A 13 A 18 A 14
A 13 A 15 A 12 A 14 A 15 A 17 A 14 A 11 A 14 A 16 A 12
B 7 B 12 B 9 B 15 B 6 B 10 B 12 B 10 B 8 B 14 B 10
B 6 B 9 B 13 B 11 B 9 B 13 B 10 B 11 B 16 B 10 B 9
C 10 C 12 C 18 C 19 C 17 C 15 C 20 C 11 C 15 C 13 C 17
C 13 C 17 C 14 C 16 C 15 C 13 C 15 C 9 C 14 C 14 C 15
;
RUN;
PROC CHART;
BY warehous;
VBAR time;
RUN;
PROC ANOVA;
CLASS warehous;
MODEL time = warehous;
MEANS warehous / TUKEY CLDIFF;
RUN;

[PROC CHART histogram, warehouse = A: Frequency by time Midpoint (10, 12, 14, 16, 18)]

[PROC CHART histogram, warehouse = B: Frequency by time Midpoint (6.25, 8.75, 11.25, 13.75, 16.25)]

[PROC CHART histogram, warehouse = C: Frequency by time Midpoint (10.0, 12.5, 15.0, 17.5, 20.0)]

[Residual plot: RESIDUAL vs. PRED (Predicted Value of TIME); residual axis from -10 to 5, predicted-value axis from 10.0 to 15.0]

Analysis of Variance Procedure

Class Level Information

| Class | Levels | Values |
|---|---|---|
| WAREHOUS | 3 | A B C |

Number of observations in data set = 66

Dependent Variable: TIME

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | | 205.7272727 | | 15.00 | 0.0001 |
| Error | | 432.0454545 | | | |
| Corrected Total | | 637.7727273 | | | |

| R-Square | C.V. | Root MSE | TIME Mean |
|---|---|---|---|
| 0.322571 | 20.35779 | 2.618752 | 12.8636364 |

| Source | DF | Anova SS | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| WAREHOUS | | 205.7272727 | | 15.00 | 0.0001 |

Tukey's Studentized Range (HSD) Test for variable: TIME

NOTE: This test controls the type I experimentwise error rate.

Alpha = 0.05, Confidence = 0.95, df = 63, MSE = 6.857864
Critical Value of Studentized Range = 3.395
Minimum Significant Difference = 1.8953

Comparisons significant at the 0.05 level are indicated by ***.

| WAREHOUS Comparison | Difference Between Means | Simultaneous 95% Lower Limit | Simultaneous 95% Upper Limit |
|---|---|---|---|
| C - A | 1.136 | -0.759 | 3.032 |
| C - B | 4.182 | 2.287 | 6.077 |
| A - C | -1.136 | -3.032 | 0.759 |
| A - B | 3.045 | 1.150 | 4.941 |
| B - C | -4.182 | -6.077 | -2.287 |
| B - A | -3.045 | -4.941 | -1.150 |

a) Based on the histograms and residual plots shown above would you say that an ANOVA F test was valid? Why or why not?
b) Fill in the missing values in the output.
c) Test whether the average number of days to special order a part is the same for all 3 warehouses. Use α = .05.
d) If appropriate, use the results of the Tukey multiple comparison test to determine which warehouses differ in average number of days to special order. (You do not need to carry out a formal hypothesis test.) Under what circumstances would carrying out such tests not be appropriate?
e) Why are multiple comparison methods needed?
f) Carry out the calculations to obtain the Tukey C.I. for C - A.
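Parts (b) and (f) can be checked directly against the data lines of the SAS program. With equal group sizes the Tukey interval is (ȳ_C − ȳ_A) ± q·√(MSE/nᵢ). The sketch reuses the critical value q = 3.395 printed by SAS, since the studentized range distribution is not in the Python standard library.

```python
# Problem 9: one-way ANOVA from the raw data and the Tukey interval for C - A.
import math

data = {
    "A": [13, 17, 14, 10, 9, 15, 10, 11, 13, 18, 14,
          13, 15, 12, 14, 15, 17, 14, 11, 14, 16, 12],
    "B": [7, 12, 9, 15, 6, 10, 12, 10, 8, 14, 10,
          6, 9, 13, 11, 9, 13, 10, 11, 16, 10, 9],
    "C": [10, 12, 18, 19, 17, 15, 20, 11, 15, 13, 17,
          13, 17, 14, 16, 15, 13, 15, 9, 14, 14, 15],
}
k = len(data)
n = sum(len(v) for v in data.values())
means = {g: sum(v) / len(v) for g, v in data.items()}
grand_mean = sum(sum(v) for v in data.values()) / n

ss_model = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in data.items())
sse = sum((y - means[g]) ** 2 for g, v in data.items() for y in v)
df_model, df_error = k - 1, n - k
mse = sse / df_error
f_stat = (ss_model / df_model) / mse

q_crit = 3.395                 # q(0.05; 3, 63), copied from the SAS output
n_i = len(data["C"])           # equal group sizes (22)
msd = q_crit * math.sqrt(mse / n_i)
diff_ca = means["C"] - means["A"]
print(f"df: {df_model}, {df_error}, {n - 1}  SS model={ss_model:.4f}  SSE={sse:.4f}  F={f_stat:.2f}")
print(f"C - A = {diff_ca:.3f}, Tukey CI = ({diff_ca - msd:.3f}, {diff_ca + msd:.3f})")
```

The minimum significant difference computed this way (about 1.8955) differs from the printed 1.8953 only because SAS carries q to more digits.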
10. When carrying out Bonferroni multiple comparisons on the means of 5 different populations with α = .10,
a) How many tests are needed to test for differences between all possible pairs of means?
b) What are the null and alternative hypotheses being tested? What is the formula for the critical difference?
c) What is the significance level for each individual test?
d) Define what is meant by the family, or overall error rate, for the tests.
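The Bonferroni bookkeeping is simple counting: with m pairwise tests, each test is run at α/m, and the two-sided critical value is the t point with upper-tail area α/(2m) on n_T − k error df, giving a critical difference of t·√(MSE(1/nᵢ + 1/nⱼ)). A minimal sketch of the counts for k = 5 and α = .10:

```python
# Problem 10: Bonferroni bookkeeping for all pairwise comparisons of 5 means.
from math import comb

k, family_alpha = 5, 0.10
m = comb(k, 2)                  # number of pairwise tests
alpha_each = family_alpha / m   # significance level for each individual test
tail_prob = alpha_each / 2      # upper-tail area for the two-sided t critical value
print(m, alpha_each, tail_prob)
```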
11. In order to predict monthly power usage based on house size, data was obtained and a scatterplot … summarized below.

$$
X^T X = \begin{pmatrix} 10 & 0 & 20 \\ 0 & 20 & 0 \\ 20 & 0 & 68 \end{pmatrix},
\qquad
(X^T X)^{-1} = \begin{pmatrix} \frac{17}{70} & 0 & -\frac{1}{14} \\ 0 & \frac{1}{20} & 0 \\ -\frac{1}{14} & 0 & \frac{1}{28} \end{pmatrix},
\qquad
X^T Y = \begin{pmatrix} 27.1 \\ 14.3 \\ 53.1 \end{pmatrix},
\qquad
\sum Y_i^2 = 83.77
$$

a) State the appropriate model along with the assumptions necessary for estimation and hypothesis testing.
b) Find the least squares fitted regression equation.
c) Complete the following ANOVA table.

| Source | d.f. | S.S. | M.S. |
|---|---|---|---|
| regression | | 10.2677 | |
| error | | 0.0613 | |
| total | | 10.329 | |

d) What is MSE estimating? What does it mean when we say MSE is an unbiased estimator?
e) Is there a significant linear relationship between monthly power usage and the explanatory variables? Use α = 0.05.
f) Compute the coefficient of correlation between X and Y.
g) Find a 95% C.I. estimate for β2. Based on this confidence interval would you conclude that …
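The sketch below takes XᵀX = [[10, 0, 20], [0, 20, 0], [20, 0, 68]] and (XᵀX)⁻¹ = [[17/70, 0, −1/14], [0, 1/20, 0], [−1/14, 0, 1/28]] as read from the summary. The coefficients, ANOVA sums of squares and the interval for β2 follow from these alone; the correlation in part (f) additionally assumes that the second column of X is X itself (so ΣX = 0 and ΣX² = 20), which is an interpretation, not something stated in the problem. The t value 2.365 (7 df, upper tail .025) is a table value.

```python
# Problem 11: coefficients, ANOVA pieces, r(X, Y) and a 95% CI for beta2.
import math

xtx_inv = [[17 / 70, 0.0, -1 / 14],
           [0.0, 1 / 20, 0.0],
           [-1 / 14, 0.0, 1 / 28]]
xty = [27.1, 14.3, 53.1]
sum_y2, n, p = 83.77, 10, 3

b = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
b_xty = sum(bi * v for bi, v in zip(b, xty))
cf = xty[0] ** 2 / n
ssr, sse = b_xty - cf, sum_y2 - b_xty
mse = sse / (n - p)
f_stat = (ssr / (p - 1)) / mse

# Assumption: column 2 of X is X, so sum X = 0 and sum X^2 = 20 (from X^T X).
sum_x, sum_x2 = 0.0, 20.0
sxx = sum_x2 - sum_x ** 2 / n
sxy = xty[1] - sum_x * xty[0] / n
syy = sum_y2 - cf
r_xy = sxy / math.sqrt(sxx * syy)

t_crit = 2.365  # t table value, 7 df, upper-tail area 0.025
se_b2 = math.sqrt(mse * xtx_inv[2][2])
ci_b2 = (b[2] - t_crit * se_b2, b[2] + t_crit * se_b2)
print([round(v, 4) for v in b], round(ssr, 4), round(sse, 4), round(f_stat, 1))
print(round(r_xy, 4), tuple(round(v, 4) for v in ci_b2))
```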

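12. In order to compare three brands of gasoline, each brand was tested in each of seven cars, driven under identical conditions. The miles per gallon achieved by each car for each gasoline brand is given below. Partial SAS output is also given.

| Car | Gas A | Gas B | Gas C | Totals |
|---|---|---|---|---|
| 1 | 16 | 19 | 23 | 58 |
| 2 | 18 | 22 | 24 | 64 |
| 3 | 17 | 23 | 22 | 62 |
| 4 | 20 | 20 | 23 | 63 |
| 5 | 19 | 22 | 25 | 66 |
| 6 | 20 | 21 | 20 | 61 |
| 7 | 18 | 23 | 22 | 63 |
| Totals | 128 | 150 | 159 | 437 |

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Gas Brand | | | | 14.53 |
| Car | | | | 0.84 |
| error | | | | |
| Total | | 115.238 | | |

a) Fill in the blanks in the ANOVA table.
b) At the .05 level of significance is there sufficient evidence to conclude that the average miles per gallon differs between the 3 brands of gas?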
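Because every brand is tested in every car, the layout can be analysed as a randomized block design with gas brand as the treatment and car as the block; that reading of the design is an assumption based on the description. A sketch that recovers the blank sums of squares and checks them against the printed Total SS and F values:

```python
# Problem 12: randomized block ANOVA (gas brand = treatment, car = block).
mpg = {
    "A": [16, 18, 17, 20, 19, 20, 18],
    "B": [19, 22, 23, 20, 22, 21, 23],
    "C": [23, 24, 22, 23, 25, 20, 22],
}
n_trt, n_blocks = len(mpg), len(mpg["A"])
cf = sum(map(sum, mpg.values())) ** 2 / (n_trt * n_blocks)

ss_total = sum(y * y for col in mpg.values() for y in col) - cf
ss_gas = sum(sum(col) ** 2 for col in mpg.values()) / n_blocks - cf
ss_car = sum(sum(col[i] for col in mpg.values()) ** 2 for i in range(n_blocks)) / n_trt - cf
sse = ss_total - ss_gas - ss_car

df_gas, df_car = n_trt - 1, n_blocks - 1
df_err = df_gas * df_car
mse = sse / df_err
f_gas = (ss_gas / df_gas) / mse
f_car = (ss_car / df_car) / mse
print(f"SS: gas={ss_gas:.4f} car={ss_car:.4f} error={sse:.4f} total={ss_total:.4f}")
print(f"df: {df_gas}, {df_car}, {df_err}; MSE={mse:.4f}; F gas={f_gas:.2f}; F car={f_car:.2f}")
```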

13. The Director of Management Information Systems at a conglomerate must prepare his long range forecasts for the company's 3 year budget. In particular he must develop staffing ratios to predict the … electronic data processing staffs of 10 companies within the industry are as follows:

X: 15, 7, 20, 12, 16, 20, 10, 9, 18, 15
Y: 6, 2, 10, 4, 7, 8, 4, 6, 7, 9

$$
X^T X = \begin{pmatrix} 10 & 142 \\ 142 & 2204 \end{pmatrix},
\qquad
X^T Y = \begin{pmatrix} 63 \\ 979 \end{pmatrix},
\qquad
\sum Y_i^2 = 451
$$

a) State the appropriate SLR model along with the assumptions necessary for estimation and hypothesis testing.
b) Find the least squares fitted regression line.
c) Compute the coefficient of determination and interpret it.
d) Assuming that it was concluded that there is a linear relationship between number of project … managers needed at Company XYZ if it plans to employ 14 programmers.
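Parts (b) to (d) follow from the corrected sums Sxx = ΣX² − (ΣX)²/n, Sxy = ΣXY − ΣXΣY/n and Syy = ΣY² − (ΣY)²/n. The sketch assumes X is the number of programmers (the row totalling 142), as part (d) suggests, so the estimate for Company XYZ is the fitted value at X = 14.

```python
# Problem 13: SLR of Y on X from the summary sums.
n, sum_x, sum_x2 = 10, 142, 2204
sum_y, sum_xy, sum_y2 = 63, 979, 451

sxx = sum_x2 - sum_x ** 2 / n
sxy = sum_xy - sum_x * sum_y / n
syy = sum_y2 - sum_y ** 2 / n
b1 = sxy / sxx
b0 = sum_y / n - b1 * sum_x / n
r2 = sxy ** 2 / (sxx * syy)     # coefficient of determination
y_hat_14 = b0 + b1 * 14         # point estimate at X = 14
print(f"Y-hat = {b0:.4f} + {b1:.4f} X,  R^2 = {r2:.4f},  Y-hat(14) = {y_hat_14:.2f}")
```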

14. A cost analyst for a large university would like to develop a regression model to predict library expenditures for materials and salaries (in \$millions). Three explanatory variables are available for consideration:
X1 = no. of volumes in the library (in thousands)
X2 = …
X3 = no. of current serials (in thousands)
A sample of 20 large research libraries was selected and the computer output on the following pages was produced.
a) List all the indications of multicollinearity you can find in the output.
b) Assuming that any assumption violations in MODEL7 (the 3 variable model) are not severe enough to invalidate estimation and hypothesis tests, test whether X1 and X3 contribute to the prediction of library expenditures in a model that includes X2. Use α = .05.
c) Which regression model would you advise the analyst to choose? Provide a detailed explanation for your choice.

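SAS Correlation

| CORR | X1 | Y | X2 | X3 |
|---|---|---|---|---|
| X1 | 1.0000 | 0.9267 | 0.9463 | 0.9082 |
| Y | 0.9267 | 1.0000 | 0.9188 | 0.8953 |
| X2 | 0.9463 | 0.9188 | 1.0000 | 0.8761 |
| X3 | 0.9082 | 0.8953 | 0.8761 | 1.0000 |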
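Variance inflation factors are not printed in this output, so the following is a supplementary check rather than an "indication in the output". With three predictors, the R² of one X regressed on the other two can be computed from the pairwise correlations above, and VIF = 1/(1 − R²); values above about 10 are a common rule-of-thumb warning sign.

```python
# Problem 14(a), supplementary: VIFs implied by the correlations among X1, X2, X3.
r12, r13, r23 = 0.9463, 0.9082, 0.8761


def vif(r_ya, r_yb, r_ab):
    """VIF of a predictor regressed on two others, from pairwise correlations:
    R^2 = (r_ya^2 + r_yb^2 - 2 r_ya r_yb r_ab) / (1 - r_ab^2), VIF = 1 / (1 - R^2)."""
    r2 = (r_ya ** 2 + r_yb ** 2 - 2 * r_ya * r_yb * r_ab) / (1 - r_ab ** 2)
    return 1 / (1 - r2)


vifs = {
    "X1": vif(r12, r13, r23),  # X1 on X2, X3
    "X2": vif(r12, r23, r13),  # X2 on X1, X3
    "X3": vif(r13, r23, r12),  # X3 on X1, X2
}
print({name: round(v, 2) for name, v in vifs.items()})
```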

Model: MODEL1

Dependent Variable: Y

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | 1 | 314544161.69 | 314544161.69 | 115.589 | 0.0001 |
| Error | 19 | 51703289.412 | 2721225.7585 | | |
| C Total | 20 | 366247451.10 | | | |

Root MSE 1649.61382, R-square 0.8588, C.V. 17.43350

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | 3266.565093 | 679.47370850 | 4.807 | 0.0001 |
| X1 | 1 | 2.314688 | 0.21529500 | 10.751 | 0.0001 |

[Residual plot, MODEL1: RESIDUAL vs. PRED; residual axis from -4000 to 4000, PRED axis from 6000 to 24000]
Model: MODEL2

Dependent Variable: Y

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | 1 | 309158879.58 | 309158879.58 | 102.893 | 0.0001 |
| Error | 19 | 57088571.522 | 3004661.6591 | | |
| C Total | 20 | 366247451.10 | | | |

Root MSE 1733.39599, R-square 0.8441, C.V. 18.31892

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | 664.572085 | 946.21380579 | 0.702 | 0.4910 |
| X2 | 1 | 114.398010 | 11.27782614 | 10.144 | 0.0001 |

[Residual plot, MODEL2: RESIDUAL vs. PRED; residual axis from -4000 to 4000, PRED axis from 4000 to 22000]
Model: MODEL3

Dependent Variable: Y

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | 1 | 293586581.99 | 293586581.99 | 76.770 | 0.0001 |
| Error | 19 | 72660869.103 | 3824256.2686 | | |
| C Total | 20 | 366247451.10 | | | |

Root MSE 1955.57057, R-square 0.8016, C.V. 20.66692

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | 2244.979484 | 927.70259199 | 2.420 | 0.0257 |
| X3 | 1 | 270.408976 | 30.86217256 | 8.762 | 0.0001 |

[Residual plot, MODEL3: RESIDUAL vs. PRED; residual axis from -10000 to 5000, PRED axis from 4000 to 20000]

Model: MODEL4

Dependent Variable: Y

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | 2 | 320669262.90 | 160334631.45 | 63.320 | 0.0001 |
| Error | 18 | 45578188.196 | 2532121.5665 | | |
| C Total | 20 | 366247451.10 | | | |

Root MSE 1591.26414, R-square 0.8756, C.V. 16.81684

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | 1967.060970 | 1061.9394636 | 1.852 | 0.0805 |
| X1 | 1 | 1.369402 | 0.64228602 | 2.132 | 0.0470 |
| X2 | 1 | 49.798787 | 32.01875259 | 1.555 | 0.1373 |

[Residual plot, MODEL4: RESIDUAL vs. PRED; residual axis from -4000 to 4000, PRED axis from 4000 to 22000]

Model: MODEL5

Dependent Variable: Y

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | 2 | 320570959.74 | 160285479.87 | 63.165 | 0.0001 |
| Error | 18 | 45676491.359 | 2537582.8533 | | |
| C Total | 20 | 366247451.10 | | | |

Root MSE 1592.97924, R-square 0.8753, C.V. 16.83497

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | 2656.899966 | 766.17770724 | 3.468 | 0.0027 |
| X1 | 1 | 1.619576 | 0.49665564 | 3.261 | 0.0043 |
| X3 | 1 | 92.552875 | 60.05604268 | 1.541 | 0.1407 |

[Residual plot, MODEL5: RESIDUAL vs. PRED; residual axis from -4000 to 4000, PRED axis from 4000 to 22000]
Model: MODEL6

Dependent Variable: Y

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | 2 | 322034328.09 | 161017164.05 | 65.553 | 0.0001 |
| Error | 18 | 44213123.005 | 2456284.6114 | | |
| C Total | 20 | 366247451.10 | | | |

Root MSE 1567.25384, R-square 0.8793, C.V. 16.56310

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | 792.153987 | 857.33421874 | 0.924 | 0.3677 |
| X2 | 1 | 71.975594 | 21.14952765 | 3.403 | 0.0032 |
| X3 | 1 | 117.454025 | 51.30102781 | 2.290 | 0.0343 |

[Residual plot, MODEL6: RESIDUAL vs. PRED; residual axis from -5000 to 5000, PRED axis from 4000 to 20000]

Model: MODEL7

Dependent Variable: Y

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Value | Prob > F |
|---|---|---|---|---|---|
| Model | 3 | 325360308.32 | 108453436.11 | 45.093 | 0.0001 |
| Error | 17 | 40887142.781 | 2405126.0459 | | |
| C Total | 20 | 366247451.10 | | | |

Root MSE 1550.84688, R-square 0.8884, C.V. 16.38970

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | T for H0: Parameter=0 | Prob > \|T\| |
|---|---|---|---|---|---|
| INTERCEP | 1 | 1566.569997 | 1073.9602583 | 1.459 | 0.1629 |
| X1 | 1 | 0.854365 | 0.72652834 | 1.176 | 0.2558 |
| X2 | 1 | 44.374955 | 31.44623297 | 1.411 | 0.1762 |
| X3 | 1 | 82.284681 | 58.91869126 | 1.397 | 0.1805 |

[Residual plot, MODEL7: RESIDUAL vs. PRED; residual axis from -4000 to 4000, PRED axis from 4000 to 22000]
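For part (b), MODEL2 (X2 only) is the reduced model and MODEL7 (X1, X2, X3) the full model, so the partial F statistic for H0: β1 = β3 = 0 is [(SSE(X2) − SSE(X1, X2, X3))/2] / MSE(X1, X2, X3). The sketch copies the error sums of squares from the output and, as part (b) instructs, assumes MODEL7's assumption violations are not severe; the F(2, 17) table value (about 3.59) is looked up separately.

```python
# Problem 14(b): do X1 and X3 add to a model that already contains X2?
sse_x2, df_x2 = 57088571.522, 19       # MODEL2 (X2 only)
sse_full, df_full = 40887142.781, 17   # MODEL7 (X1, X2, X3)

q = df_x2 - df_full                    # coefficients tested: beta1 = beta3 = 0
f_stat = ((sse_x2 - sse_full) / q) / (sse_full / df_full)
print(f"F = {f_stat:.3f} on ({q}, {df_full}) df")
```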