
Stat 250 Gunderson Lecture Notes

11: Regression Analysis

The invalid assumption that correlation implies cause is probably among the two or
three most serious and common errors of human reasoning.

Stephen Jay Gould, The Mismeasure of Man

Describing and assessing the significance of relationships between variables is very important in
research. We will first learn how to do this in the case when the two variables are quantitative.
Quantitative variables have numerical values that can be ordered according to those values.

Main idea
We wish to study the relationship between two quantitative variables.

Generally one variable is the __ RESPONSE __ variable, denoted by y.
This variable measures the outcome of the study
and is also called the __ DEPENDENT __ variable.

The other variable is the __ EXPLANATORY __ variable, denoted by x.
It is the variable that is thought to explain the changes we see in the response variable.

The explanatory variable is also called the __ INDEPENDENT __ variable.

The first step in examining the relationship is to use a graph, a scatterplot, to display the
relationship. We will look for an overall pattern and see if there are any departures from this
overall pattern.

If a linear relationship appears to be reasonable from the scatterplot, we will take the next step
of finding a model (an equation of a line) to summarize the relationship. The resulting equation
may be used for predicting the response for various values of the explanatory variable. If certain
assumptions hold, we can assess the significance of the linear relationship and make some
confidence intervals for our estimations and predictions.

Let's begin with an example that we will carry throughout our discussions.


Graphing the Relationship: Restaurant Bill vs Tip
How well does the size of a restaurant bill predict the tip the server receives? Below are the bills
and tips (in dollars) from six different restaurant visits.

Bill   41   98   25   85   50   73
Tip     8   17    4   12    5   14

Response (dependent) variable y = Tip Amount.

Explanatory (independent) variable x = Amount of the Bill.

Step 1: Examine the data graphically with a scatterplot.
Add the points to the scatterplot below:

Interpret the scatterplot in terms of ...
overall form (does the average pattern look like a straight line, or is it curved?)
direction of association (positive or negative)
strength of association (how much do the points vary around the average pattern?)
any deviations from the overall form?


Describing a Linear Relationship with a Regression Line

Regression analysis is the area of statistics used to examine the relationship between a
quantitative response variable and one or more explanatory variables. A key element is the
estimation of an equation that describes how, on average, the response variable is related to
the explanatory variables. A regression equation can also be used to make predictions.

The simplest kind of relationship between two variables is a straight line; the analysis in this case
is called linear regression.

Regression Line for Bill vs. Tip
Remember the equation of a line? y = mx + b
In statistics we denote the regression line for a sample as:

\[ \hat{y} = b_0 + b_1 x \]

where:

\( \hat{y} \) (y-hat) = the predicted y or estimated y value
\( b_0 \) = y-intercept = estimated y when x = 0 (not always meaningful)
\( b_1 \) = slope = how much of an increase/decrease we expect to see in y when x increases by 1 unit.

Goal: To find a line that is close to the data points; find the best-fitting line.

How? What do we mean by best?
One measure of how good a line fits is to look at the observed errors in prediction.

[Figure: scatterplot with a possible line; the observed error if we used this line to predict is \( y - \hat{y} \).]

Observed errors = __ \( y - \hat{y} \) __ are called __ residuals __

So we want to choose the line for which the sum of squares of the observed errors (the sum of squared
residuals) is the least.

The line that does this is called: __ Least Squares Regression Line __
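
To make "best" concrete, here is a minimal Python sketch that compares the sum of squared observed errors for one arbitrary candidate line with that of the least squares line these notes arrive at for the bill and tip data. The helper name `sum_squared_errors` and the candidate line (a flat 15% tip with no intercept) are illustrative assumptions, not part of the original notes.

```python
# Bill (x) and tip (y) amounts in dollars, from the six restaurant visits.
bills = [41, 98, 25, 85, 50, 73]
tips = [8, 17, 4, 12, 5, 14]


def sum_squared_errors(intercept, slope, xs, ys):
    """Sum of squared observed errors (y - y_hat) for the line y_hat = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical candidate line: tip = 15% of the bill (chosen only for illustration).
sse_candidate = sum_squared_errors(0.0, 0.15, bills, tips)

# Least squares line from the notes: y_hat = -0.5877 + 0.17077 * x.
sse_least_squares = sum_squared_errors(-0.5877, 0.17077, bills, tips)

print(f"candidate line SSE     = {sse_candidate:.2f}")
print(f"least squares line SSE = {sse_least_squares:.2f}")
```

Any other straight line tried in place of the candidate should give a sum of squared errors at least as large as the least squares value.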


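The equations for the estimated slope and intercept are given by:

\[ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum (x - \bar{x})\,y}{\sum (x - \bar{x})^2} = \frac{S_{XY}}{S_{XX}} = r\,\frac{s_y}{s_x} \]

\[ b_0 = \bar{y} - b_1 \bar{x} \]

The least squares regression line (estimated regression function) is:

\[ \hat{y} = \hat{\mu}_y(x) = b_0 + b_1 x \]

Here \( \hat{y} \) is used to predict a single y at a given x, and \( \hat{\mu}_y(x) \) is used to estimate the
average y for all cases with that x.
More on this distinction later when we talk about prediction intervals vs. CIs for a mean.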

To find this estimated regression line for our bill and tip data by hand, it is easier if we set up a
calculation table. By filling in this table and computing the column totals, we will have all of the
main summaries needed to perform a complete linear regression analysis. Note that here we
have n = 6 observations. The first five rows have been completed for you. In general, use R or a
calculator to help with the graphing and numerical computations!

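x = bill   y = tip   x - xbar           (x - xbar)^2      (x - xbar) y         y - ybar       (y - ybar)^2
41         8         41 - 62 = -21      (-21)^2 = 441     (-21)(8) = -168      8 - 10 = -2    (-2)^2 = 4
98         17        98 - 62 = 36       (36)^2 = 1296     (36)(17) = 612       17 - 10 = 7    (7)^2 = 49
25         4         25 - 62 = -37      (-37)^2 = 1369    (-37)(4) = -148      4 - 10 = -6    (-6)^2 = 36
85         12        85 - 62 = 23       (23)^2 = 529      (23)(12) = 276       12 - 10 = 2    (2)^2 = 4
50         5         50 - 62 = -12      (-12)^2 = 144     (-12)(5) = -60       5 - 10 = -5    (-5)^2 = 25
73         14        73 - 62 = 11       (11)^2 = 121      (11)(14) = 154       14 - 10 = 4    (4)^2 = 16
Total 372  60                           3900              666                                 134

\[ \bar{x} = \frac{372}{6} = 62, \qquad \bar{y} = \frac{60}{6} = 10 \]

Slope Estimate:

\[ b_1 = \frac{\sum (x - \bar{x})\,y}{\sum (x - \bar{x})^2} = \frac{666}{3900} = 0.17077 \]

y-intercept Estimate:

\[ b_0 = \bar{y} - b_1 \bar{x} = 10 - (0.17077)(62) = 10 - 10.5877 = -0.5877 \]

Estimated Regression Line:

\[ \hat{y} = b_0 + b_1 x = -0.5877 + 0.17077\,(x) \]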
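
The calculation table can also be reproduced in a few lines of code. The sketch below uses plain Python with the six bill and tip values from the notes; the variable names (`s_xx`, `s_xy`, `s_yy`) are illustrative choices that mirror the column totals, not notation required by the course.

```python
# Bill (x) and tip (y) amounts in dollars, from the six restaurant visits.
bills = [41, 98, 25, 85, 50, 73]
tips = [8, 17, 4, 12, 5, 14]

n = len(bills)
x_bar = sum(bills) / n
y_bar = sum(tips) / n

# Column totals of the calculation table.
s_xx = sum((x - x_bar) ** 2 for x in bills)               # sum of (x - x_bar)^2
s_xy = sum((x - x_bar) * y for x, y in zip(bills, tips))  # sum of (x - x_bar) * y
s_yy = sum((y - y_bar) ** 2 for y in tips)                # sum of (y - y_bar)^2

b1 = s_xy / s_xx         # slope estimate
b0 = y_bar - b1 * x_bar  # y-intercept estimate

print(f"x_bar = {x_bar:.0f}, y_bar = {y_bar:.0f}")
print(f"S_XX = {s_xx:.0f}, S_XY = {s_xy:.0f}, S_YY = {s_yy:.0f}")
print(f"y-hat = {b0:.4f} + {b1:.5f} * x")
```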


Predict the tip for a dinner guest who had a $50 bill.

\[ \hat{y} = b_0 + b_1 x = -0.5877 + 0.17077(50) = 7.95 \]

Note: The 5th dinner guest in the sample had a bill of $50 and the observed tip was $5.

Find the residual for the 5th observation.

Notation for a residual: \( e_5 = y_5 - \hat{y}_5 = 5 - 7.95 = -2.95 \)

The residuals

You found the residual for one observation. You could compute the residual for each
observation. The following table shows each residual.

x = bill   y = tip   predicted values                 residuals        squared residuals
                     y-hat = -0.5877 + 0.17077(x)     e = y - y-hat    e^2 = (y - y-hat)^2
41         8         6.41                              1.59            2.52
98         17        16.15                             0.85            0.72
25         4         3.68                              0.32            0.10
85         12        13.93                            -1.93            3.73
50         5         7.95                             -2.95            8.70
73         14        11.88                             2.12            4.49
Total                                                                  ~20.27

SSE = sum of squared errors (or residuals) ≈ 20.27
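
As a rough check on the residual table, the following Python sketch recomputes the predicted values, residuals, and SSE. It takes the coefficients as reported in the R output for this example (-0.58769 and 0.17077); the list names are illustrative.

```python
bills = [41, 98, 25, 85, 50, 73]
tips = [8, 17, 4, 12, 5, 14]

# Coefficients as reported in the R output for Tip ~ Bill.
b0, b1 = -0.58769, 0.17077

fitted = [b0 + b1 * x for x in bills]             # predicted values y-hat
residuals = [y - f for y, f in zip(tips, fitted)]  # e = y - y-hat
sse = sum(e ** 2 for e in residuals)               # sum of squared residuals

for x, y, f, e in zip(bills, tips, fitted, residuals):
    print(f"bill={x:3d}  tip={y:3d}  predicted={f:6.2f}  residual={e:6.2f}  squared={e * e:5.2f}")
print(f"SSE = {sse:.2f}")
```

The residuals sum to (essentially) zero, which is a general property of a least squares line that includes an intercept.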


Measuring Strength and Direction of a Linear Relationship with Correlation

The correlation coefficient r is a measure of strength of the linear relationship between y and x.

Properties about the Correlation Coefficient r

1. r ranges from ... -1 to +1 (and it is unitless)

2. Sign of r indicates ... direction of the association

3. Magnitude of r indicates ... strength

(r = -0.8 and r = +0.8 indicate equally strong linear associations)
A strong r is discipline specific
r = 0.8 might be an important (or strong) correlation in engineering
r = 0.6 might be a strong correlation in psychology or medical research

4. r ONLY measures the strength of the LINEAR relationship.

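Some pictures:

[Figure: example scatterplots of y versus x; panel labels r = +0.7, r = 0.4, and r near 0.]

The formula for the correlation:

\[ r = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}} \]

(but we will get it from computer output or from \( r^2 \))

Tips Example: we will soon see that r = __ 0.9213 __

Interpretation:
A fairly strong positive linear association
between amount of the bill and the
amount of tip.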
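
For readers who want to see the correlation computed directly from the data rather than read off computer output, here is a small Python sketch for the bill and tip example. It uses the form r = S_XY / sqrt(S_XX * S_YY) from the course formula summary; the variable names are illustrative.

```python
import math

bills = [41, 98, 25, 85, 50, 73]
tips = [8, 17, 4, 12, 5, 14]

n = len(bills)
x_bar = sum(bills) / n
y_bar = sum(tips) / n

s_xx = sum((x - x_bar) ** 2 for x in bills)
s_yy = sum((y - y_bar) ** 2 for y in tips)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(bills, tips))

# Correlation coefficient: unitless, always between -1 and +1.
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.4f}")
```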

The square of the correlation \( r^2 \)

The squared correlation coefficient \( r^2 \) always has a value between __ 0 and 1 __ and is
sometimes presented as a percent. It can be shown that the square of the correlation is related
to the sums of squares that arise in regression.

[Figure: total variation in the y's and the variation not accounted for by the line.]

The responses (the amount of tip) in the data set are not all the same; they do vary. We would
measure the total variation in these responses as \( SSTO = \sum (y - \bar{y})^2 \) (the last column total in the
calculation table, which we said we would use later).
Part of the reason why the amount of tip varies is because there is a linear relationship between
amount of tip and amount of bill, and the study included different amounts of bill.

When we found the least squares regression line, there was still some small variation remaining
of the responses from the line. This amount of variation that is not accounted for by the linear
relationship is called the SSE.

The amount of variation that is accounted for by the linear relationship is called the sum of
squares due to the model (or regression), denoted by SSM (or sometimes as SSR).

So we have: SSTO = __ SSM + SSE __
It can be shown that

\[ r^2 = \frac{SSTO - SSE}{SSTO} = \frac{SSM}{SSTO} \]

= the proportion of total variability in the responses that can be explained by the linear
relationship with the explanatory variable x.

Note: The value of \( r^2 \) and these sums of squares are summarized in an ANOVA table that is
standard output from computer packages when doing regression.


Measuring Strength and Direction for Bill vs Tip

From our first calculation table we have:

SSTO = __ 134 __

From our residual calculation table we have:

SSE = __ 20.27 __

So the squared correlation coefficient for our bill and tip regression is:

\[ r^2 = \frac{SSTO - SSE}{SSTO} = \frac{134 - 20.27}{134} = \frac{113.73}{134} = 0.84873 \]

Interpretation:
We accounted for ~84.9% of the variation in __ Amount of Tips received __

by the linear regression on Amount of the Bill.

The correlation coefficient is \( r = +\sqrt{0.84873} = 0.9213 \) (positive, because the slope is positive).
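
The same arithmetic can be sketched in Python, starting from the two sums of squares quoted in the notes (SSTO = 134, SSE = 20.27). Taking the sign of r from the sign of the slope is the usual convention for simple linear regression; the variable names are illustrative.

```python
import math

ssto = 134.0   # total variation in the tips
sse = 20.27    # variation not accounted for by the line
slope = 0.17077

ssm = ssto - sse          # variation accounted for by the linear relationship
r_squared = ssm / ssto    # proportion of variation explained

# r takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(f"SSM = {ssm:.2f}")
print(f"r^2 = {r_squared:.5f} ({100 * r_squared:.1f}% of the variation in tips)")
print(f"r   = {r:.4f}")
```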

A few more general notes:

Nonlinear relationships

Detecting Outliers and their influence on regression results.

Dangers of Extrapolation (predicting outside the range of your data)

Dangers of combining groups inappropriately (Simpson's Paradox)

Correlation does not prove causation


R Regression Analysis for Bill vs Tips

Let's look at the R output for our Bill and Tip data.
We will see that many of the computations are done for us.

Call:
lm(formula = Tip ~ Bill, data = Tips)

Residuals:
      1       2       3       4       5       6
 1.5862  0.8523  0.3185 -1.9277 -2.9508  2.1215

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.58769    2.41633  -0.243  0.81980
Bill         0.17077    0.03604   4.738  0.00905 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.251 on 4 degrees of freedom
Multiple R-squared:  0.8487,    Adjusted R-squared:  0.8109
F-statistic: 22.45 on 1 and 4 DF,  p-value: 0.009052

Correlation Matrix
          Bill       Tip
Bill 1.0000000 0.9212755
Tip  0.9212755 1.0000000

ANOVA Table
Response: Tip
          Df  Sum Sq Mean Sq F value   Pr(>F)
Bill       1 113.732 113.732  22.446 0.009052 **
Residuals  4  20.268   5.067
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Inference in Linear Regression Analysis

The material covered so far focuses on using the data for a sample to graph and describe the
relationship. The slope and intercept values we have computed are statistics; they are estimates
of the underlying true relationship for the larger population.

Next we turn to making inferences about the relationship for the larger population. Here is a
nice summary to help us distinguish between the regression line for the sample and the
regression line for the population.

[Figure: Regression Line for the Sample; Regression Line for the Population]

Aside: \( E(Y) = \mu_Y(x) \) = mean response at a given x; sometimes called the regression function.
It can take on many forms; we will consider the simple linear regression function: \( \beta_0 + \beta_1 x \)



To do formal inference, we think of our \( b_0 \) and \( b_1 \) as estimates of the unknown parameters \( \beta_0 \) and
\( \beta_1 \). Below we have the somewhat statistical way of expressing the underlying model that
produces our data:

Linear Model: the response \( y = [\beta_0 + \beta_1 (x)] + \varepsilon \)

= [Population relationship] + Randomness

This statistical model for simple linear regression assumes that for each value of x the observed
values of the response (the population of y values) are normally distributed, varying around some
true mean (that may depend on x in a linear way) and a standard deviation \( \sigma \) that does not
depend on x. This true mean is sometimes expressed as \( E(Y) = \beta_0 + \beta_1 (x) \). The components
and assumptions regarding this statistical model are shown visually below.

[Figure: diagram of the model, with the true regression line labeled.]

The \( \varepsilon \) represents the true error term. These would be the deviations of a particular value of the
response y from the true regression line. As these are the deviations from the mean, these
error terms should have a normal distribution with mean 0 and constant standard deviation \( \sigma \).

Now, we cannot observe these \( \varepsilon \)'s. However, we will be able to use the estimated (observable)
errors, namely the residuals, to come up with an estimate of the standard deviation and to check
the conditions about the true errors.
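
One way to get a feel for this model is to simulate from it. The Python sketch below draws many responses at a single fixed x from y = beta0 + beta1 * x + epsilon, with epsilon normal with mean 0 and standard deviation sigma. The parameter values (beta0 = -0.5, beta1 = 0.17, sigma = 2.0) are hypothetical, chosen only for illustration; in practice the true parameters are unknown.

```python
import random

random.seed(250)  # fixed seed so the sketch is reproducible

# Hypothetical population parameters (assumptions for illustration only).
beta0, beta1, sigma = -0.5, 0.17, 2.0


def simulate_response(x):
    """Draw one response y = beta0 + beta1 * x + epsilon, with epsilon ~ N(0, sigma)."""
    epsilon = random.gauss(0.0, sigma)
    return beta0 + beta1 * x + epsilon


# Many simulated responses at the same x vary around the true mean E(Y).
x = 50
true_mean = beta0 + beta1 * x
draws = [simulate_response(x) for _ in range(10_000)]
sample_mean = sum(draws) / len(draws)

print(f"true mean E(Y) at x = {x}: {true_mean:.2f}")
print(f"average of simulated responses: {sample_mean:.2f}")
```

Individual simulated responses scatter around the true regression line, while their average at a given x settles near the true mean.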



So what have we done, and where are we going?
1. Estimate the regression line based on some data. DONE!
2. Measure the strength of the linear relationship with the correlation. DONE!
3. Use the estimated equation for predictions. DONE!
4. Assess if the linear relationship is statistically significant.
5. Provide interval estimates (confidence intervals) for our predictions.
6. Understand and check the assumptions of our model.

We have already discussed the descriptive goals of 1, 2, and 3. For the inferential goals of 4 and
5, we will need an estimate of the unknown standard deviation in regression, \( \sigma \).

Estimating the Standard Deviation for Regression
The standard deviation for regression can be thought of as measuring the average size of the
residuals. A relatively small standard deviation from the regression line indicates that individual
data points generally fall close to the line, so predictions based on the line will be close to the
actual values.

It seems reasonable that our estimate of this average size of the residuals be based on the
residuals, using the sum of squared residuals and dividing by the appropriate degrees of freedom.
Our estimate of \( \sigma \) is given by:

\[ s = \sqrt{\frac{\text{sum of squared residuals}}{n - 2}} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{MSE}, \quad \text{where } SSE = \sum (y - \hat{y})^2 = \sum e_i^2 \]

Note: Why n - 2? In estimating the mean response we had to estimate 2 quantities, the y-
intercept and the slope; so we lose 2 df.
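
For the bill and tip data, this estimate can be sketched in Python from the residual sum of squares reported in the ANOVA table (SSE = 20.268 with n = 6 observations). The variable names are illustrative.

```python
import math

sse = 20.268  # sum of squared residuals, from the ANOVA table
n = 6         # number of observations

df = n - 2          # two quantities were estimated: intercept and slope
mse = sse / df      # mean squared error
s = math.sqrt(mse)  # estimate of the regression standard deviation

print(f"df = {df}, MSE = {mse:.3f}, s = {s:.3f}")
```

The result should agree with the "Residual standard error" line in the R summary output.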

Estimating the Standard Deviation: Bill vs Tip

Below are the portions of the R regression output that we could use to obtain the estimate of
\( \sigma \) for our regression analysis.

From Summary:

Residual standard error: 2.251 on 4 degrees of freedom
Multiple R-squared:  0.8487,    Adjusted R-squared:  0.8109
F-statistic: 22.45 on 1 and 4 DF,  p-value: 0.009052

Or from ANOVA:

Response: Tip
          Df  Sum Sq Mean Sq F value   Pr(>F)
Bill       1 113.732 113.732  22.446 0.009052 **
Residuals  4  20.268   5.067


Significant Linear Relationship?

Consider the following hypotheses:
\( H_0: \beta_1 = 0 \) versus \( H_a: \beta_1 \neq 0 \)
What happens if the null hypothesis is true?
If \( \beta_1 = 0 \) then \( E(Y) = \beta_0 \), a constant no matter what the value of x is; i.e., knowing x does not help to
predict the response. So these hypotheses are testing if there is a significant nonzero linear
relationship between y and x.

There are a number of ways to test this hypothesis. One way is through a t-test statistic (think
about why it is a t and not a z test). The general form for a t-test statistic is:

\[ t = \frac{\text{sample statistic} - \text{null value}}{\text{standard error of the sample statistic}} \]

We have our sample estimate for \( \beta_1 \); it is \( b_1 \). And we have the null value of 0. So we need the
standard error for \( b_1 \). We could derive it, using the idea of sampling distributions (think about
the population of all possible \( b_1 \) values if we were to repeat this procedure over and over many
times). Here is the result:

t-test for the population slope \( \beta_1 \)
To test \( H_0: \beta_1 = 0 \) we would use

\[ t = \frac{b_1 - 0}{\text{s.e.}(b_1)}, \quad \text{where } \text{s.e.}(b_1) = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} \]

and the degrees of freedom for the t-distribution are n - 2.

This t-statistic could be modified to test a variety of hypotheses about the population slope
(different null values and various directions of extreme).

Try It!
Significant Relationship between Bill and Tip?
Is there a significant (nonzero) linear relationship between the total cost of a restaurant bill and
the tip that is left? (Is the bill a useful linear predictor for the tip?)
That is, test \( H_0: \beta_1 = 0 \) versus \( H_a: \beta_1 \neq 0 \) using a 5% level of significance.

1. \( \text{s.e.}(b_1) = \dfrac{s}{\sqrt{\sum (x - \bar{x})^2}} = \dfrac{2.251}{\sqrt{3900}} = 0.036 \)

2. \( t = \dfrac{b_1 - 0}{\text{s.e.}(b_1)} = \dfrac{0.17077 - 0}{0.036} = 4.74 \)

3. Using the t-table with df = 6 - 2 = 4, we have p-value < 2(0.020) = 0.04.
We can reject H0 and conclude the amount of the bill is a significant linear predictor of the amount of the
tip (for the population of such dinner customers).
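
The first two steps of this test can be sketched in Python using the summary values from the notes (s = 2.251, S_XX = 3900, b1 = 0.17077). Since the standard library has no t-distribution, the sketch does not compute a p-value; instead it compares |t| with the two-sided 5% critical value t* = 2.78 for df = 4 that the notes use for the confidence interval. The variable names are illustrative.

```python
import math

s = 2.251      # estimated regression standard deviation
s_xx = 3900    # sum of (x - x_bar)^2
b1 = 0.17077   # estimated slope
null_value = 0

se_b1 = s / math.sqrt(s_xx)          # standard error of the slope
t_stat = (b1 - null_value) / se_b1   # t-test statistic, df = n - 2 = 4

t_star = 2.78  # two-sided 5% critical value for df = 4, as used in the notes
reject_h0 = abs(t_stat) > t_star

print(f"s.e.(b1) = {se_b1:.4f}")
print(f"t = {t_stat:.2f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```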


Think about it:
Based on the results of the previous t-test conducted at the 5% significance level, do you think a
95% confidence interval for the true slope \( \beta_1 \) would contain the value of 0?

Confidence Interval for the population slope \( \beta_1 \)

\[ b_1 \pm t^{*}\,\text{s.e.}(b_1) \]

where df = n - 2 for the \( t^{*} \) value

Compute the interval and check your answer.
Could you interpret the 95% confidence level here?

\[ 0.17077 \pm (2.78)(0.036) \;\Rightarrow\; 0.17077 \pm 0.10008 \;\Rightarrow\; (0.07069,\ 0.27085) \]

(t* = 2.78 from df = 4 and 95% confidence)

If this experiment were repeated many times,
we'd expect 95% of the resulting confidence intervals to contain the population slope \( \beta_1 \).
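
The interval arithmetic above can be checked with a short Python sketch. It uses the rounded values from the notes (b1 = 0.17077, s.e.(b1) = 0.036, t* = 2.78) and also checks whether 0 falls inside the interval, which connects back to the "Think about it" question. The variable names are illustrative.

```python
b1 = 0.17077    # estimated slope
se_b1 = 0.036   # standard error of the slope (rounded, as in the notes)
t_star = 2.78   # t* for df = 4 and 95% confidence

margin = t_star * se_b1
lower, upper = b1 - margin, b1 + margin
contains_zero = lower <= 0 <= upper

print(f"margin of error = {margin:.5f}")
print(f"95% CI for the slope: ({lower:.5f}, {upper:.5f})")
print(f"interval contains 0: {contains_zero}")
```

An interval that excludes 0 is consistent with rejecting the null hypothesis of a zero slope in the two-sided test at the 5% level.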

Inference about the Population Slope using R
Below are the portions of the R regression output that we could use to perform the t-test and
obtain the confidence interval for the population slope \( \beta_1 \).

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.58769    2.41633  -0.243  0.81980
Bill         0.17077    0.03604   4.738  0.00905 **

Note: There is a third way to test \( H_0: \beta_1 = 0 \) versus \( H_a: \beta_1 \neq 0 \).
It involves another F-test from an ANOVA for regression.

Response: Tip
          Df  Sum Sq Mean Sq F value   Pr(>F)
Bill       1 113.732 113.732  22.446 0.009052 **
Residuals  4  20.268   5.067

* The t-test is more flexible than the F-test; the F-test is only two-sided with null value = 0.
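
In simple linear regression the F statistic from the ANOVA table equals the square of the t statistic for the slope, which is why the two tests give the same two-sided p-value. A minimal Python sketch using the values printed in the R output above (variable names are illustrative):

```python
t_value = 4.738      # t statistic for the slope, from the Coefficients table
f_value = 22.446     # F statistic, from the ANOVA table
ms_model = 113.732   # Mean Sq for Bill
ms_error = 5.067     # Mean Sq for Residuals (MSE)

f_from_anova = ms_model / ms_error  # F = MS(model) / MSE
t_squared = t_value ** 2            # should match F up to rounding

print(f"MS(model) / MSE = {f_from_anova:.3f}")
print(f"t^2             = {t_squared:.3f}")
print(f"reported F      = {f_value:.3f}")
```

Small differences in the last digits come from rounding in the printed output.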


Predicting for Individuals versus Estimating the Mean

Consider the relationship between the bill and tip ...

Least squares regression line (or estimated regression function):

\( \hat{y} = -0.5877 + 0.17077(x) \)

also \( \hat{E}(Y) = -0.5877 + 0.17077(x) \)

We also have: s = 2.251

How would you predict the tip for Barb who had a $50 restaurant bill?

\( \hat{y} = -0.5877 + 0.17077(50) = 7.95 \), that is, $7.95

How would you estimate the mean tip for all customers who had a $50 restaurant bill?

\( \hat{E}(Y) = -0.5877 + 0.17077(50) = 7.95 \), that is, $7.95

So our estimate for predicting a future observation and for estimating the mean response are
found using the same least squares regression equation. What about their standard errors? (We
would need the standard errors to be able to produce an interval estimate.)

Idea: Consider a population of individuals and a population of means:

[Figure: Population of individuals; Population of means]

What is the standard deviation for a population of individuals? \( \sigma \)

What is the standard deviation for a population of means? \( \sigma / \sqrt{n} \)

Which standard deviation is larger? The one for individuals, \( \sigma \).

So a prediction interval for an individual response will be

__ wider __ (wider or narrower)

than a confidence interval for a mean response.


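Here are the (somewhat messy) formulas:

Confidence interval for the mean response: \( \hat{y} \pm t^{*}\,\text{s.e.(fit)} \), where \( \text{s.e.(fit)} = s\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{S_{XX}}} \)

Prediction interval for an individual response: \( \hat{y} \pm t^{*}\,\text{s.e.(pred)} \), where \( \text{s.e.(pred)} = \sqrt{s^2 + (\text{s.e.(fit)})^2} \)

In both cases df = n - 2.

Try It! Bill vs Tip
Construct a 95% confidence interval for the mean tip given for all customers who had a $50
bill (x). Recall: n = 6, \( \bar{x} = 62 \), \( \sum (x - \bar{x})^2 = S_{XX} = 3900 \), \( \hat{y} = -0.5877 + 0.17077(x) \), and s =
2.251.

\( \hat{y} = -0.5877 + 0.17077(50) = 7.95 \), that is, $7.95

\[ \text{s.e.(fit)} = s\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{XX}}} = 2.251\sqrt{\frac{1}{6} + \frac{(50 - 62)^2}{3900}} = 1.0157 \]

t* = 2.78 (with df = 4)

\[ \hat{y} \pm t^{*}\,\text{s.e.(fit)} \;\Rightarrow\; 7.95 \pm (2.78)(1.0157) \;\Rightarrow\; 7.95 \pm 2.82 \]
=> ($5.13, $10.77)

Construct a 95% prediction interval for the tip from an individual customer who had a $50 bill (x).

\[ \text{s.e.(pred)} = \sqrt{s^2 + (\text{s.e.(fit)})^2} = \sqrt{(2.251)^2 + (1.0157)^2} = 2.47 \]

\[ \hat{y} \pm t^{*}\,\text{s.e.(pred)} \;\Rightarrow\; 7.95 \pm 2.78(2.47) \;\Rightarrow\; 7.95 \pm 6.87 \]
=> ($1.08, $14.82)
It is wider!

Show prediction interval and
confidence interval bands on the scatterplot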
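
Both intervals can be computed side by side in a short Python sketch. It uses the summary values recalled in this example (n = 6, x-bar = 62, S_XX = 3900, s = 2.251, t* = 2.78) together with the standard simple-regression forms s.e.(fit) = s * sqrt(1/n + (x - x_bar)^2 / S_XX) and s.e.(pred) = sqrt(s^2 + s.e.(fit)^2). The variable names are illustrative.

```python
import math

n, x_bar, s_xx = 6, 62, 3900
s = 2.251                    # estimated regression standard deviation
b0, b1 = -0.5877, 0.17077    # least squares estimates
t_star = 2.78                # t* for df = 4 and 95% confidence

x_new = 50                   # bill amount of interest
y_hat = b0 + b1 * x_new      # point estimate, used for both intervals

# Standard error for estimating the MEAN tip at x_new.
se_fit = s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / s_xx)
# Standard error for predicting an INDIVIDUAL tip at x_new.
se_pred = math.sqrt(s ** 2 + se_fit ** 2)

ci = (y_hat - t_star * se_fit, y_hat + t_star * se_fit)
pi = (y_hat - t_star * se_pred, y_hat + t_star * se_pred)

print(f"y-hat = {y_hat:.2f}")
print(f"s.e.(fit) = {se_fit:.4f}, s.e.(pred) = {se_pred:.2f}")
print(f"95% CI for the mean tip:       (${ci[0]:.2f}, ${ci[1]:.2f})")
print(f"95% PI for an individual tip:  (${pi[0]:.2f}, ${pi[1]:.2f})")
```

Because s.e.(pred) always includes the extra s^2 term, the prediction interval is wider than the confidence interval at every x.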


Checking Assumptions in Regression

Let's recall the statistical way of expressing the underlying model that produces our data:

Linear Model: the response \( y = [\beta_0 + \beta_1 (x)] + \varepsilon \)

= [Population relationship] + Randomness

where the \( \varepsilon \)'s, the true error terms, should be normally distributed
with mean 0 and constant standard deviation \( \sigma \),
and this randomness is independent from one case to another.

Thus there are four essential technical assumptions required for inference in linear regression:

(1) Relationship is in fact linear.
(2) TRUE Errors should be normally distributed.
(3) TRUE Errors should have constant variance.
(4) TRUE Errors should not display obvious patterns.

Now, we cannot observe these \( \varepsilon \)'s. However, we will be able to use the estimated (observable)
errors, namely the residuals, to come up with an estimate of the standard deviation and to check
the conditions about the true errors.

So how can we check these assumptions with our data and estimated model?

(1) Relationship is in fact linear. Examine the scatterplot of y versus x.

(2) TRUE Errors should be normally distributed. Histogram or QQ plot of residuals.

(3) TRUE Errors should have constant variance.
(4) TRUE Errors should not display obvious patterns.
Residual vs Fitted Plot (residuals versus \( \hat{y} \)): if we see random scatter with no pattern
in a horizontal band => ok

Now, if we saw ...

[Figure: Residual vs Fitted Plot that shows evidence that the true errors do not have constant variance]

[Figure: Residual vs Fitted Plot that shows evidence that the underlying relationship may not be linear (maybe quadratic)]


Let's turn to one last full regression problem that
includes checking assumptions.

Relationship between height and foot length for College Men

The heights (in inches) and foot lengths (in centimeters) of 32 college men were used to
develop a model for the relationship between height and foot length. The scatterplot and R
regression output are provided.

           mean       sd  n
foot   27.78125 1.549701 32
height 71.68750 3.057909 32

Call:
lm(formula = foot ~ height, data = heightfoot)

Residuals:
     Min       1Q   Median       3Q      Max
-1.74925 -0.81825  0.07875  0.58075  2.25075

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.25313    4.33232   0.058    0.954
height       0.38400    0.06038   6.360 5.12e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.028 on 30 degrees of freedom
Multiple R-squared:  0.5741,    Adjusted R-squared:  0.5599
F-statistic: 40.45 on 1 and 30 DF,  p-value: 5.124e-07

Correlation Matrix
            foot    height
foot   1.0000000 0.7577219
height 0.7577219 1.0000000

Analysis of Variance Table
Response: foot
          Df Sum Sq Mean Sq F value    Pr(>F)
height     1 42.744  42.744  40.446 5.124e-07 ***
Residuals 30 31.705   1.057

Also note that: \( S_{XX} = \sum (x - \bar{x})^2 = 289.87 \)


a. How much would you expect foot length to increase for each 1-inch increase in height?
Include the units.

This is asking about the slope: 0.384 centimeters (per 1-inch increase in height).

b. What is the correlation between height and foot length?

r = 0.7577 (Would you be able to interpret the value of \( r^2 \)?)

c. Give the equation of the least squares regression line for predicting foot length from height.

predicted y = \( \hat{y} \) = 0.253 + 0.384(x)

d. Suppose Max is 70 inches tall and has a foot length of 28.5 centimeters. Based on the least
squares regression line, what is the value of the prediction error (residual) for Max? Show
all work.

predicted y = \( \hat{y} \) = 0.253 + 0.384(70) = 27.13 cm

observed y - predicted y = 28.5 - 27.13 = 1.37 cm

e. Use a 1% significance level to assess if there is a significant positive linear relationship
between height and foot length. State the hypotheses to be tested, the observed value of the
test statistic, the corresponding p-value, and your decision.

Hypotheses: H0: __ \( \beta_1 = 0 \) __
Ha: __ \( \beta_1 > 0 \) __

Test Statistic Value: __ 6.36 __   p-value: __ 0.0000005124 / 2 = 0.0000002562 __

Decision: (circle)   Reject H0   Fail to reject H0
Reject H0, since the p-value is far below 0.01.

Conclusion: Thus it appears there is a significant positive linear relationship between
height and foot lengths for the population of college men represented by the sample.


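f. Calculate a 95% confidence interval for the average foot length for all college men who are
70 inches tall. (Just clearly plug in all numerical values.)

\[ \hat{y} \pm t^{*}\, s\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \;\Rightarrow\; 27.133 \pm (2.04)(1.028)\sqrt{\frac{1}{32} + \frac{(70 - 71.6875)^2}{289.87}} \]

\[ \Rightarrow\; 27.133 \pm 0.425 \;\Rightarrow\; (26.708,\ 27.558) \]

g. Consider the residuals vs fitted plot shown.

[Figure: residuals vs fitted plot]

Does this plot support the conclusion that the linear regression model is appropriate?

__ Yes __   No

Explain:

The plot shows a random scatter in a horizontal band around 0 with no pattern.

Note: on the exam, students who said NO because the variation appears to be
changing were marked as ok too.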
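
Parts (d) and (f) can be checked numerically with a small Python sketch. It plugs in values from the R output for this example (intercept 0.25313, slope 0.38400, s = 1.028, n = 32, mean height 71.6875, S_XX = 289.87) and the t* = 2.04 used in the notes for df = 30. The variable names are illustrative.

```python
import math

b0, b1 = 0.25313, 0.38400     # intercept and slope from the R output
s, n = 1.028, 32              # residual standard error and sample size
x_bar, s_xx = 71.6875, 289.87  # mean height and sum of (x - x_bar)^2
t_star = 2.04                 # t* for df = 30 and 95% confidence, as in the notes

height = 70           # Max's height in inches
observed_foot = 28.5  # Max's foot length in centimeters

# Part (d): predicted foot length and residual for Max.
predicted = b0 + b1 * height
residual = observed_foot - predicted

# Part (f): 95% confidence interval for the mean foot length at 70 inches.
se_fit = s * math.sqrt(1 / n + (height - x_bar) ** 2 / s_xx)
margin = t_star * se_fit
ci = (predicted - margin, predicted + margin)

print(f"predicted foot length = {predicted:.2f} cm, residual = {residual:.2f} cm")
print(f"margin of error = {margin:.3f}")
print(f"95% CI for the mean foot length: ({ci[0]:.3f}, {ci[1]:.3f})")
```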


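Regression

Linear Regression Model

Population Version:
Mean: \( \mu_Y(x) = E(Y) = \beta_0 + \beta_1 x \)
Individual: \( y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \), where \( \varepsilon_i \) is \( N(0, \sigma) \)

Sample Version:
Mean: \( \hat{y} = b_0 + b_1 x \)
Individual: \( y_i = b_0 + b_1 x_i + e_i \)

Parameter Estimators

\[ b_1 = \frac{S_{XY}}{S_{XX}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum (x - \bar{x})\,y}{\sum (x - \bar{x})^2} \]

\[ b_0 = \bar{y} - b_1 \bar{x} \]

Residuals

\( e = y - \hat{y} \) = observed y - predicted y

Correlation and its square

\[ r = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}} \]

\[ r^2 = \frac{SSTO - SSE}{SSTO} = \frac{SSREG}{SSTO}, \quad \text{where } SSTO = S_{YY} = \sum (y - \bar{y})^2 \]

Estimate of \( \sigma \)

\[ s = \sqrt{MSE} = \sqrt{\frac{SSE}{n - 2}}, \quad \text{where } SSE = \sum (y - \hat{y})^2 = \sum e^2 \]

Standard Error of the Sample Slope

\[ \text{s.e.}(b_1) = \frac{s}{\sqrt{S_{XX}}} = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} \]

Confidence Interval for \( \beta_1 \)

\( b_1 \pm t^{*}\,\text{s.e.}(b_1) \), df = n - 2

t-Test for \( \beta_1 \)

To test \( H_0: \beta_1 = 0 \): \( t = \dfrac{b_1 - 0}{\text{s.e.}(b_1)} \), df = n - 2

or \( F = \dfrac{MSREG}{MSE} \), df = 1, n - 2

Confidence Interval for the Mean Response

\( \hat{y} \pm t^{*}\,\text{s.e.(fit)} \), df = n - 2

where \( \text{s.e.(fit)} = s\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{S_{XX}}} \)

Prediction Interval for an Individual Response

\( \hat{y} \pm t^{*}\,\text{s.e.(pred)} \), df = n - 2

where \( \text{s.e.(pred)} = \sqrt{s^2 + (\text{s.e.(fit)})^2} \)

Standard Error of the Sample Intercept

\[ \text{s.e.}(b_0) = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{XX}}} \]

Confidence Interval for \( \beta_0 \)

\( b_0 \pm t^{*}\,\text{s.e.}(b_0) \), df = n - 2

t-Test for \( \beta_0 \)

To test \( H_0: \beta_0 = 0 \): \( t = \dfrac{b_0 - 0}{\text{s.e.}(b_0)} \), df = n - 2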

Additional Notes
A place to jot down questions you may have and ask during office hours, take a few extra notes, write
out an extra problem or summary completed in lecture, or create your own summary about these concepts.
