Regression Analysis Explained

Stat 250 Gunderson Lecture Notes
11: Regression Analysis

"The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning."
Stephen Jay Gould, The Mismeasure of Man
Describing and assessing the significance of relationships between variables is very important in research. We will first learn how to do this in the case when the two variables are quantitative. Quantitative variables have numerical values that can be ordered according to those values.

Main idea
We wish to study the relationship between two quantitative variables.
Generally one variable is the ___RESPONSE___ variable, denoted by y. This variable measures the outcome of the study and is also called the ___DEPENDENT___ variable.
The other variable is the ___EXPLANATORY___ variable, denoted by x. It is the variable that is thought to explain the changes we see in the response variable. The explanatory variable is also called the ___INDEPENDENT___ variable.

The first step in examining the relationship is to use a graph (a scatterplot) to display the relationship. We will look for an overall pattern and see if there are any departures from this overall pattern.
If a linear relationship appears to be reasonable from the scatterplot, we will take the next step of finding a model (an equation of a line) to summarize the relationship. The resulting equation may be used for predicting the response for various values of the explanatory variable. If certain assumptions hold, we can assess the significance of the linear relationship and make some confidence intervals for our estimations and predictions.
Let's begin with an example that we will carry throughout our discussions.
Graphing the Relationship: Restaurant Bill vs Tip
How well does the size of a restaurant bill predict the tip the server receives? Below are the bills and tips from six different restaurant visits, in dollars.

Bill ($)   41   98   25   85   50   73
Tip ($)     8   17    4   12    5   14

Response (dependent) variable y = Tip Amount.
Explanatory (independent) variable x = Amount of the Bill.
Step 1: Examine the data graphically with a scatterplot.
Add the points to the scatterplot below.
Interpret the scatterplot in terms of...
- overall form (does the average pattern look like a straight line, or is it curved?)
- direction of association (positive or negative)
- strength of association (how much do the points vary around the average pattern?)
- any deviations from the overall form?
Describing a Linear Relationship with a Regression Line
Regression analysis is the area of statistics used to examine the relationship between a quantitative response variable and one or more explanatory variables. A key element is the estimation of an equation that describes how, on average, the response variable is related to the explanatory variables. A regression equation can also be used to make predictions.
The simplest kind of relationship between two variables is a straight line; the analysis in this case is called linear regression.
Regression Line for Bill vs. Tip
Remember the equation of a line? y = mx + b
In statistics we denote the regression line for a sample as:
$$\hat{y} = b_0 + b_1 x$$
where:
ŷ (y-hat) = the predicted y or estimated y value
b0 = y-intercept = estimated y when x = 0 (not always meaningful)
b1 = slope = how much of an increase/decrease we expect to see in y when x increases by 1 unit
Goal: To find a line that is "close" to the data points, that is, to find the "best fitting" line.
How? What do we mean by best?
One measure of how well a line fits is to look at the observed errors in prediction.
[Figure: scatterplot with a possible line; the vertical gap from a point to the line is the observed error if we used this line to predict, y - ŷ]
Observed errors = ___ y - ŷ ___ are called ___ residuals ___.
So we want to choose the line for which the sum of squares of the observed errors (the sum of squared residuals) is the least.
The line that does this is called: ___ Least Squares Regression Line ___
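The equations for the estimated slope and intercept are given by:
$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum (x - \bar{x})\,y}{\sum (x - \bar{x})^2} = \frac{S_{XY}}{S_{XX}} = r\,\frac{s_y}{s_x}$$
$$b_0 = \bar{y} - b_1 \bar{x}$$
The least squares regression line (estimated regression function) is:
$$\hat{y} = \hat{\mu}_Y(x) = b_0 + b_1 x$$
Read as ŷ, it predicts a single y at a given x; read as μ̂Y(x), it estimates the average y for all individuals with that x. More on this distinction later when we talk about prediction intervals vs. CIs for a mean.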
To find this estimated regression line for our bill and tip data by hand, it is easier if we set up a calculation table. By filling in this table and computing the column totals, we will have all of the main summaries needed to perform a complete linear regression analysis. Note that here we have n = 6 observations. The first five rows have been completed for you. In general, use R or a calculator to help with the graphing and numerical computations!
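x = bill  y = tip  x - x̄            (x - x̄)²        (x - x̄)y          y - ȳ          (y - ȳ)²
41        8        41 - 62 = -21    (-21)² = 441    (-21)(8) = -168   8 - 10 = -2    (-2)² = 4
98        17       98 - 62 = 36     (36)² = 1296    (36)(17) = 612    17 - 10 = 7    (7)² = 49
25        4        25 - 62 = -37    (-37)² = 1369   (-37)(4) = -148   4 - 10 = -6    (-6)² = 36
85        12       85 - 62 = 23     (23)² = 529     (23)(12) = 276    12 - 10 = 2    (2)² = 4
50        5        50 - 62 = -12    (-12)² = 144    (-12)(5) = -60    5 - 10 = -5    (-5)² = 25
73        14       73 - 62 = 11     (11)² = 121     (11)(14) = 154    14 - 10 = 4    (4)² = 16

Column totals: Σx = 372, Σy = 60, Σ(x - x̄)² = 3900, Σ(x - x̄)y = 666, Σ(y - ȳ)² = 134
Means: x̄ = 372/6 = 62 and ȳ = 60/6 = 10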
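Slope estimate:
$$b_1 = \frac{\sum (x - \bar{x})\,y}{\sum (x - \bar{x})^2} = \frac{666}{3900} = 0.17077$$
y-intercept estimate:
$$b_0 = \bar{y} - b_1 \bar{x} = 10 - (0.17077)(62) = 10 - 10.5877 = -0.5877$$
Estimated regression line:
$$\hat{y} = b_0 + b_1 x = -0.5877 + 0.17077\,x$$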
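As a cross-check on the calculation table, here is a minimal Python sketch (the notes themselves use R; Python is used here only for illustration) that recomputes x̄, ȳ, S_XX = Σ(x - x̄)², S_XY = Σ(x - x̄)y, and the least squares slope and intercept from the six bill/tip pairs. The variable names are illustrative, not from the notes.

```python
# Least squares estimates for the bill/tip data, following the calculation
# table: center x, then form S_XX and S_XY.
bills = [41, 98, 25, 85, 50, 73]   # x = bill ($)
tips = [8, 17, 4, 12, 5, 14]       # y = tip ($)

n = len(bills)
x_bar = sum(bills) / n             # 372 / 6 = 62
y_bar = sum(tips) / n              # 60 / 6 = 10

# S_XX = sum of (x - x_bar)^2 ; S_XY = sum of (x - x_bar) * y
s_xx = sum((x - x_bar) ** 2 for x in bills)
s_xy = sum((x - x_bar) * y for x, y in zip(bills, tips))

b1 = s_xy / s_xx                   # slope
b0 = y_bar - b1 * x_bar            # intercept

print(f"x_bar={x_bar}, y_bar={y_bar}, S_XX={s_xx}, S_XY={s_xy}")
print(f"b1={b1:.5f}, b0={b0:.4f}")
```

Running it should print S_XX = 3900 and S_XY = 666, with b1 ≈ 0.17077 and b0 ≈ -0.5877, matching the hand calculation and the R coefficients.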
Predict the tip for a dinner guest who had a $50 bill.
$$\hat{y} = b_0 + b_1 x = -0.5877 + 0.17077(50) = 7.95$$
Note: The 5th dinner guest in the sample had a bill of $50 and the observed tip was $5.
Find the residual for the 5th observation.
Notation for a residual:
$$e_5 = y_5 - \hat{y}_5 = 5 - 7.95 = -2.95$$
The residuals
You found the residual for one observation. You could compute the residual for each observation. The following table shows each residual.

x = bill  y = tip  predicted value              residual     squared residual
                   ŷ = -0.5877 + 0.17077x       e = y - ŷ    e² = (y - ŷ)²
41        8        6.41                         1.59         2.52
98        17       16.15                        0.85         0.72
25        4        3.68                         0.32         0.10
85        12       13.93                        -1.93        3.73
50        5        7.95                         -2.95        8.70
73        14       11.88                        2.12         4.49
                                                Total        ≈ 20.27

SSE = sum of squared errors (or residuals) ≈ 20.27
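A minimal Python sketch of the residual table (illustrative only; the variable names are hypothetical): it plugs each bill into the fitted line, subtracts to get e = y - ŷ, and sums the squares to get SSE. Unrounded arithmetic gives SSE ≈ 20.268, the value R reports in its ANOVA table; the hand total of 20.27 differs only by rounding.

```python
bills = [41, 98, 25, 85, 50, 73]
tips = [8, 17, 4, 12, 5, 14]

# Coefficients from the least squares fit: S_XY / S_XX and y_bar - b1 * x_bar.
b1 = 666 / 3900
b0 = 10 - b1 * 62

fitted = [b0 + b1 * x for x in bills]                 # y-hat for each bill
residuals = [y - yh for y, yh in zip(tips, fitted)]   # e = y - y-hat
sse = sum(e ** 2 for e in residuals)                  # sum of squared residuals

for x, y, yh, e in zip(bills, tips, fitted, residuals):
    print(f"x={x:3d}  y={y:2d}  yhat={yh:6.2f}  e={e:6.2f}  e^2={e * e:5.2f}")
print(f"SSE = {sse:.3f}")
```

Least squares residuals always sum to zero (up to floating-point error), which is a quick sanity check on the fitted line.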
Measuring Strength and Direction of a Linear Relationship with Correlation
The correlation coefficient r is a measure of the strength of the linear relationship between y and x.
Properties of the Correlation Coefficient r
1. r ranges from -1 to +1 (and it is unitless).
2. The sign of r indicates the direction of the association.
3. The magnitude of r indicates the strength.
   (r = -0.8 and r = +0.8 indicate equally strong linear associations.)
   What counts as a strong r is discipline-specific:
   r = 0.8 might be an important (or strong) correlation in engineering;
   r = 0.6 might be a strong correlation in psychology or medical research.
4. r ONLY measures the strength of the LINEAR relationship.
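Some pictures:
[Figure: example scatterplots labeled r = +0.7, r = 0.4, and r ≈ 0]
The formula for the correlation:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}}$$
(but we will get it from computer output or from r²)
Tips Example: we will soon see that r = ___0.9213___
Interpretation: A fairly strong positive linear association between the amount of the bill and the amount of the tip.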
The square of the correlation, r²
[Figure: the total variation in the y's compared with the variation not accounted for by the regression line]
The squared correlation coefficient r² always has a value between ___0 and 1___ and is sometimes presented as a percent. It can be shown that the square of the correlation is related to the sums of squares that arise in regression.
The responses (the amount of tip) in the data set are not all the same; they do vary. We would measure the total variation in these responses as SSTO = Σ(y - ȳ)² (the last column total in the calculation table, which we said we would use later).
Part of the reason why the amount of tip varies is that there is a linear relationship between the amount of tip and the amount of the bill, and the study included different bill amounts.
When we found the least squares regression line, there was still some small variation of the responses remaining around the line. This amount of variation that is not accounted for by the linear relationship is called the SSE.
The amount of variation that is accounted for by the linear relationship is called the sum of squares due to the model (or regression), denoted by SSM (or sometimes SSR).
So we have: SSTO = ___SSM + SSE___
It can be shown that
$$r^2 = \frac{SSM}{SSTO} = \frac{SSTO - SSE}{SSTO}$$
= the proportion of total variability in the responses that can be explained by the linear relationship with the explanatory variable x.
Note: The value of r² and these sums of squares are summarized in an ANOVA table that is standard output from computer packages when doing regression.
Measuring Strength and Direction for Bill vs Tip
From our first calculation table we have: SSTO = ___134___
From our residual calculation table we have: SSE = ___20.27___
So the squared correlation coefficient for our bill and tip regression is:
$$r^2 = \frac{SSTO - SSE}{SSTO} = \frac{134 - 20.27}{134} = \frac{113.73}{134} = 0.84873$$
Interpretation: We accounted for about 84.9% of the variation in the ___amount of tips received___ by the linear regression on the amount of the bill.
The correlation coefficient is
$$r = +\sqrt{0.84873} = 0.9213$$
(taking the positive root because the slope b1 is positive).
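The sketch below (Python, for illustration only) computes r two ways: directly from r = S_XY/√(S_XX S_YY), and from r² = (SSTO - SSE)/SSTO. To avoid recomputing residuals it uses the standard least squares identity SSE = SSTO - S_XY²/S_XX, which these notes do not state.

```python
import math

bills = [41, 98, 25, 85, 50, 73]
tips = [8, 17, 4, 12, 5, 14]
n = len(bills)
x_bar, y_bar = sum(bills) / n, sum(tips) / n

s_xx = sum((x - x_bar) ** 2 for x in bills)                          # 3900
s_yy = sum((y - y_bar) ** 2 for y in tips)                           # SSTO = 134
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(bills, tips))   # 666

# Direct formula for the correlation
r = s_xy / math.sqrt(s_xx * s_yy)

# Same quantity through the sums of squares (identity: SSE = SSTO - S_XY^2 / S_XX)
ssto = s_yy
sse = ssto - s_xy ** 2 / s_xx
r_squared = (ssto - sse) / ssto

print(f"r = {r:.4f}, r^2 = {r_squared:.5f}, sqrt(r^2) = {math.sqrt(r_squared):.4f}")
```

It prints r ≈ 0.9213 and r² ≈ 0.8487; the 0.84873 in the hand calculation differs slightly because it used SSE rounded to 20.27.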
A few more general notes:
- Nonlinear relationships
- Detecting outliers and their influence on regression results
- Dangers of extrapolation (predicting outside the range of your data)
- Dangers of combining groups inappropriately (Simpson's Paradox)
- Correlation does not prove causation
R Regression Analysis for Bill vs Tips
Let's look at the R output for our Bill and Tip data. We will see that many of the computations are done for us.
Call:
lm(formula = Tip ~ Bill, data = Tips)

Residuals:
      1       2       3       4       5       6
 1.5862  0.8523  0.3185 -1.9277 -2.9508  2.1215

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.58769    2.41633  -0.243  0.81980
Bill         0.17077    0.03604   4.738  0.00905 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.251 on 4 degrees of freedom
Multiple R-squared:  0.8487,  Adjusted R-squared:  0.8109
F-statistic: 22.45 on 1 and 4 DF,  p-value: 0.009052

Correlation Matrix
          Bill       Tip
Bill 1.0000000 0.9212755
Tip  0.9212755 1.0000000

ANOVA Table
Response: Tip
          Df  Sum Sq Mean Sq F value   Pr(>F)
Bill       1 113.732 113.732  22.446 0.009052 **
Residuals  4  20.268   5.067
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Inference in Linear Regression Analysis
The material covered so far focuses on using the data for a sample to graph and describe the relationship. The slope and intercept values we have computed are statistics; they are estimates of the underlying true relationship for the larger population.
Next we turn to making inferences about the relationship for the larger population. Here is a nice summary to help us distinguish between the regression line for the sample and the regression line for the population.
[Figure: two panels, "Regression Line for the Sample" and "Regression Line for the Population"]
Aside: E(Y) = μY(x) = mean response at a given x; sometimes called the regression function. It can take on many forms; we will consider the simple linear regression function: β0 + β1x.
To do formal inference, we think of our b0 and b1 as estimates of the unknown parameters β0 and β1. Below we have the somewhat statistical way of expressing the underlying model that produces our data:
Linear Model: the response y = [β0 + β1(x)] + ε
                             = [Population relationship] + Randomness
This statistical model for simple linear regression assumes that for each value of x the observed values of the response (the population of y values) are normally distributed, varying around some true mean (that may depend on x in a linear way) with a standard deviation σ that does not depend on x. This true mean is sometimes expressed as E(Y) = β0 + β1(x). The components and assumptions of this statistical model are shown visually below.
[Figure: true regression line, with the responses varying around it at each x]
The ε represents the true error term. These would be the deviations of a particular value of the response y from the true regression line. As these are the deviations from the mean, these error terms should have a normal distribution with mean 0 and constant standard deviation σ.
Now, we cannot observe these ε's. However, we will be able to use the estimated (observable) errors, namely the residuals, to come up with an estimate of the standard deviation and to check the conditions about the true errors.
So what have we done, and where are we going?
1. Estimate the regression line based on some data. DONE!
2. Measure the strength of the linear relationship with the correlation. DONE!
3. Use the estimated equation for predictions. DONE!
4. Assess if the linear relationship is statistically significant.
5. Provide interval estimates (confidence intervals) for our predictions.
6. Understand and check the assumptions of our model.
We have already discussed the descriptive goals 1, 2, and 3. For the inferential goals 4 and 5, we will need an estimate of the unknown standard deviation in regression, σ.
Estimating the Standard Deviation for Regression
The standard deviation for regression can be thought of as measuring the average size of the residuals. A relatively small standard deviation from the regression line indicates that individual data points generally fall close to the line, so predictions based on the line will be close to the actual values.
It seems reasonable that our estimate of this average size of the residuals be based on the residuals, using the sum of squared residuals and dividing by the appropriate degrees of freedom. Our estimate of σ is given by:
$$s = \sqrt{\frac{\text{sum of squared residuals}}{n - 2}} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{MSE}, \qquad \text{where } SSE = \sum (y_i - \hat{y}_i)^2$$
Note: Why n - 2? In estimating the mean response we had to estimate 2 quantities, the y-intercept and the slope; so we lose 2 df.
Estimating the Standard Deviation: Bill vs Tip
Below are the portions of the R regression output that we could use to obtain the estimate of σ for our regression analysis.
From Summary:
Residual standard error: 2.251 on 4 degrees of freedom
F-statistic: 22.45 on 1 and 4 DF,  p-value: 0.009052
Or from ANOVA:
Response: Tip
          Df  Sum Sq Mean Sq F value   Pr(>F)
Bill       1 113.732 113.732  22.446 0.009052 **
Residuals  4  20.268   5.067
Either way, s = √MSE = √5.067 = 2.251.
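A short Python sketch of the estimate of σ, taking SSE = 20.268 from the Residuals row of the R ANOVA table above; it divides by n - 2 and takes the square root (names are illustrative).

```python
import math

sse = 20.268   # Residuals row, Sum Sq column of the ANOVA table
n = 6          # number of restaurant visits

mse = sse / (n - 2)    # n - 2 df: two quantities (b0 and b1) were estimated
s = math.sqrt(mse)     # estimate of sigma, the "residual standard error"
print(f"MSE = {mse:.3f}, s = {s:.3f}")
```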
Significant Linear Relationship?
Consider the following hypotheses:
H0: β1 = 0 versus Ha: β1 ≠ 0
What happens if the null hypothesis is true?
If β1 = 0 then E(Y) = β0, a constant no matter what the value of x is; i.e., knowing x does not help to predict the response. So these hypotheses are testing if there is a significant nonzero linear relationship between y and x.
There are a number of ways to test this hypothesis. One way is through a t-test statistic (think about why it is a t and not a z test). The general form for a t-test statistic is:
$$t = \frac{\text{sample statistic} - \text{null value}}{\text{standard error of the sample statistic}}$$
We have our sample estimate for β1; it is b1. And we have the null value of 0. So we need the standard error for b1. We could derive it using the idea of sampling distributions (think about the population of all possible b1 values if we were to repeat this procedure over and over many times). Here is the result:

t-test for the population slope β1
To test H0: β1 = 0 we would use
$$t = \frac{b_1 - 0}{\text{s.e.}(b_1)}, \qquad \text{where } \text{s.e.}(b_1) = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$$
and the degrees of freedom for the t-distribution are n - 2.
This t-statistic could be modified to test a variety of hypotheses about the population slope (different null values and various directions of extreme).
Try It! Significant Relationship between Bill and Tip?
Is there a significant (nonzero) linear relationship between the total cost of a restaurant bill and the tip that is left? (Is the bill a useful linear predictor for the tip?)
That is, test H0: β1 = 0 versus Ha: β1 ≠ 0 using a 5% level of significance.
1. $$\text{s.e.}(b_1) = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} = \frac{2.251}{\sqrt{3900}} = 0.036$$
2. $$t = \frac{b_1 - 0}{\text{s.e.}(b_1)} = \frac{0.17077 - 0}{0.036} = 4.74$$
3. Using the t-table with df = 6 - 2 = 4, we have p-value < 2(0.020) = 0.04. (R reports the exact p-value, 0.00905.)
We can reject H0 and conclude the amount of the bill is a significant linear predictor of the amount of the tip (for the population of such dinner customers).
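The t-test arithmetic as a minimal Python sketch (illustrative only; it does not compute the p-value, which still comes from the t-table or from R's Pr(>|t|) column):

```python
import math

b1 = 0.17077     # estimated slope
s = 2.251        # residual standard error (estimate of sigma)
s_xx = 3900      # sum of (x - x_bar)^2 from the calculation table
n = 6

se_b1 = s / math.sqrt(s_xx)   # standard error of the sample slope
t = (b1 - 0) / se_b1          # null value is 0
df = n - 2
print(f"SE(b1) = {se_b1:.5f}, t = {t:.3f}, df = {df}")
```

Carrying the unrounded standard error gives t ≈ 4.738, the value in R's Coefficients table; dividing by the rounded 0.036 gives the hand answer 4.74.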
Think about it:
Based on the results of the previous t-test conducted at the 5% significance level, do you think a 95% confidence interval for the true slope β1 would contain the value 0?
Confidence Interval for the population slope β1
$$b_1 \pm t^*\,\text{s.e.}(b_1), \qquad \text{where df} = n - 2 \text{ for the } t^* \text{ value}$$
Compute the interval and check your answer.
$$0.17077 \pm (2.78)(0.036) = 0.17077 \pm 0.10008 \Rightarrow (0.07069,\ 0.27085)$$
(t* = 2.78 from df = 4 and 95% confidence)
The interval does not contain 0, which agrees with rejecting H0 at the 5% level.
Could you interpret the 95% confidence level here?
If this experiment were repeated many times, we'd expect 95% of the resulting confidence intervals to contain the population slope β1.
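A sketch of the slope confidence interval in Python. The multiplier t* = 2.78 is hard-coded from the t-table value used in the notes (df = 4, 95% confidence); computing it would need a t-distribution quantile function, which Python's standard library does not provide.

```python
b1 = 0.17077
se_b1 = 0.036
t_star = 2.78   # t-table value for 95% confidence with df = 4 (as in the notes)

margin = t_star * se_b1
lower, upper = b1 - margin, b1 + margin
print(f"{b1} +/- {margin:.5f} -> ({lower:.5f}, {upper:.5f})")

# Duality with the two-sided 5% test: reject H0: beta1 = 0 exactly when 0 is outside.
contains_zero = lower <= 0 <= upper
print("interval contains 0:", contains_zero)
```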
Inference about the Population Slope using R
Below are the portions of the R regression output that we could use to perform the t-test and obtain the confidence interval for the population slope β1.
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.58769    2.41633  -0.243  0.81980
Bill         0.17077    0.03604   4.738  0.00905 **
Note: There is a third way to test H0: β1 = 0 versus Ha: β1 ≠ 0. It involves an F-test from an ANOVA for regression.
Response: Tip
          Df  Sum Sq Mean Sq F value   Pr(>F)
Bill       1 113.732 113.732  22.446 0.009052 **
Residuals  4  20.268   5.067
*The t-test is more flexible than the F-test; the F-test is only two-sided, with null value 0.
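The F statistic in the ANOVA table can be reproduced from the sums of squares, F = MSREG/MSE. The Python sketch below (illustrative only) also checks a standard fact for simple linear regression: F equals the square of the slope's t statistic, up to rounding, which is why the F-test and the two-sided t-test give the same p-value (0.009052 vs 0.00905).

```python
ss_reg, df_reg = 113.732, 1   # Bill row of the ANOVA table
sse, df_err = 20.268, 4       # Residuals row

ms_reg = ss_reg / df_reg      # MSREG
mse = sse / df_err            # MSE
f_stat = ms_reg / mse

t = 4.738                     # t value for Bill from the Coefficients table
print(f"F = {f_stat:.3f}, t^2 = {t ** 2:.3f}")
```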
Predicting for Individuals versus Estimating the Mean
Consider the relationship between the bill and tip.
Least squares regression line (or estimated regression function):
$$\hat{y} = -0.5877 + 0.17077(x) \qquad \text{also} \qquad \hat{E}(Y) = -0.5877 + 0.17077(x)$$
We also have: s = 2.251
How would you predict the tip for Barb, who had a $50 restaurant bill?
$$\hat{y} = -0.5877 + 0.17077(50) = \$7.95$$
How would you estimate the mean tip for all customers who had a $50 restaurant bill?
$$\hat{E}(Y) = -0.5877 + 0.17077(50) = \$7.95$$
So our estimates for predicting a future observation and for estimating the mean response are found using the same least squares regression equation. What about their standard errors? (We would need the standard errors to be able to produce an interval estimate.)
Idea: Consider a population of individuals and a population of means:
[Figure: two distributions, "Population of individuals" and "Population of means"]
What is the standard deviation for a population of individuals? σ
What is the standard deviation for a population of means? σ/√n
Which standard deviation is larger? σ, the one for individuals.
So a prediction interval for an individual response will be ___wider___ (wider or narrower) than a confidence interval for a mean response.
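Here are the (somewhat messy) formulas:
Confidence interval for the mean response: ŷ ± t* s.e.(fit), and prediction interval for an individual response: ŷ ± t* s.e.(pred), both with df = n - 2, where
$$\text{s.e.(fit)} = s\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x - \bar{x})^2}}, \qquad \text{s.e.(pred)} = \sqrt{s^2 + [\text{s.e.(fit)}]^2}$$

Try It! Bill vs Tip
Construct a 95% confidence interval for the mean tip given by all customers who had a $50 bill (x). Recall: n = 6, x̄ = 62, Σ(x - x̄)² = S_XX = 3900, ŷ = -0.5877 + 0.17077(x), and s = 2.251.
$$\hat{y} = -0.5877 + 0.17077(50) = \$7.95$$
$$\text{s.e.(fit)} = s\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x - \bar{x})^2}} = 2.251\sqrt{\frac{1}{6} + \frac{(50 - 62)^2}{3900}} = 1.0157$$
t* = 2.78 (with df = 4)
$$\hat{y} \pm t^*\,\text{s.e.(fit)} = 7.95 \pm (2.78)(1.0157) = 7.95 \pm 2.83 \Rightarrow (\$5.12,\ \$10.78)$$
Construct a 95% prediction interval for the tip from an individual customer who had a $50 bill (x).
$$\text{s.e.(pred)} = \sqrt{s^2 + [\text{s.e.(fit)}]^2} = \sqrt{(2.251)^2 + (1.0157)^2} = 2.47$$
$$\hat{y} \pm t^*\,\text{s.e.(pred)} = 7.95 \pm 2.78(2.47) = 7.95 \pm 6.87 \Rightarrow (\$1.08,\ \$14.82)$$
It is wider!
[Figure: prediction interval and confidence interval bands shown on the scatterplot]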
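A minimal Python sketch of both intervals at x = 50, using the summaries recalled above and the t-table value t* = 2.78 (df = 4). Because it does not round ŷ to 7.95 before adding the margin, its endpoints may differ from the hand results in the second decimal place.

```python
import math

n, x_bar, s_xx = 6, 62, 3900
b0, b1 = -0.5877, 0.17077
s = 2.251
t_star = 2.78   # 95% confidence, df = n - 2 = 4 (t-table value used in the notes)

x_new = 50
y_hat = b0 + b1 * x_new

# Standard error for the mean response at x_new, then for a single new response
se_fit = s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / s_xx)
se_pred = math.sqrt(s ** 2 + se_fit ** 2)

conf_int = (y_hat - t_star * se_fit, y_hat + t_star * se_fit)
pred_int = (y_hat - t_star * se_pred, y_hat + t_star * se_pred)

print(f"y_hat = {y_hat:.2f}")
print(f"s.e.(fit) = {se_fit:.4f}, 95% CI for mean: ({conf_int[0]:.2f}, {conf_int[1]:.2f})")
print(f"s.e.(pred) = {se_pred:.2f}, 95% PI for individual: ({pred_int[0]:.2f}, {pred_int[1]:.2f})")
```

The prediction interval is wider because s.e.(pred) adds the individual-level variability s² on top of the uncertainty in the fitted mean.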
Checking Assumptions in Regression
Let's recall the statistical way of expressing the underlying model that produces our data:
Linear Model: the response y = [β0 + β1(x)] + ε
                             = [Population relationship] + Randomness
where the ε's, the true error terms, should be normally distributed with mean 0 and constant standard deviation σ, and this randomness is independent from one case to another.
Thus there are four essential technical assumptions required for inference in linear regression:
(1) Relationship is in fact linear.
(2) TRUE errors should be normally distributed.
(3) TRUE errors should have constant variance.
(4) TRUE errors should not display obvious patterns.
Now, we cannot observe these ε's. However, we will be able to use the estimated (observable) errors, namely the residuals, to come up with an estimate of the standard deviation and to check the conditions about the true errors.
So how can we check these assumptions with our data and estimated model?
(1) Relationship is in fact linear: examine the scatterplot of y versus x.
(2) TRUE errors should be normally distributed: histogram or QQ plot of the residuals.
(3) TRUE errors should have constant variance, and
(4) TRUE errors should not display obvious patterns: Residual vs Fitted Plot (residuals e versus ŷ).
    If we see a random scatter with no pattern in a horizontal band around 0 => ok.
Now, if we saw...
[Figure: Residual vs Fitted Plot that shows evidence that the true errors do not have constant variance]
[Figure: Residual vs Fitted Plot that shows evidence that the underlying relationship may not be linear (maybe quadratic)]
Let's turn to one last full regression problem that includes checking assumptions.
Relationship between Height and Foot Length for College Men
The heights (in inches) and foot lengths (in centimeters) of 32 college men were used to develop a model for the relationship between height and foot length. The scatterplot and R regression output are provided.

           mean       sd  n
foot   27.78125 1.549701 32
height 71.68750 3.057909 32
Call:
lm(formula = foot ~ height, data = heightfoot)

Residuals:
     Min       1Q   Median       3Q      Max
-1.74925 -0.81825  0.07875  0.58075  2.25075

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.25313    4.33232   0.058    0.954
height       0.38400    0.06038   6.360 5.12e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
F-statistic: 40.45 on 1 and 30 DF,  p-value: 5.124e-07

Correlation Matrix
            foot    height
foot   1.0000000 0.7577219
height 0.7577219 1.0000000

Analysis of Variance Table
Response: foot
          Df Sum Sq Mean Sq F value    Pr(>F)
height     1 42.744  42.744  40.446 5.124e-07 ***
Residuals 30 31.705   1.057

Also note that: S_XX = Σ(x - x̄)² = 289.87
a. How much would you expect foot length to increase for each 1-inch increase in height? Include the units.
This is asking about the slope: 0.384 centimeters.
b. What is the correlation between height and foot length?
r = 0.7577 (would you be able to interpret the value of r²?)
c. Give the equation of the least squares regression line for predicting foot length from height.
predicted y = ŷ = 0.253 + 0.384(x)
d. Suppose Max is 70 inches tall and has a foot length of 28.5 centimeters. Based on the least squares regression line, what is the value of the prediction error (residual) for Max? Show all work.
predicted y = ŷ = 0.253 + 0.384(70) = 27.13 cm
observed y - predicted y = 28.5 - 27.13 = 1.37 cm
e. Use a 1% significance level to assess if there is a significant positive linear relationship between height and foot length. State the hypotheses to be tested, the observed value of the test statistic, the corresponding p-value, and your decision.
Hypotheses: H0: ___β1 = 0___
            Ha: ___β1 > 0___
Test Statistic Value: ___6.36___   p-value: ___0.0000005124/2 = 0.0000002562___
Decision (circle one): Reject H0 / Fail to reject H0. Answer: Reject H0.
Conclusion: Thus it appears there is a significant positive linear relationship between height and foot length for the population of college men represented by the sample.
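A small Python sketch of the step from R's two-sided p-value to the one-sided p-value for Ha: β1 > 0. Halving is valid only when the t statistic falls on the side named in Ha; otherwise the one-sided p-value is 1 - p/2. The values are copied from the height/foot output above.

```python
t_value = 6.360            # t value for height (Coefficients table)
p_two_sided = 5.124e-07    # Pr(>|t|) reported by R, which is two-sided

# For Ha: beta1 > 0, halve the two-sided p-value only if t > 0.
if t_value > 0:
    p_one_sided = p_two_sided / 2
else:
    p_one_sided = 1 - p_two_sided / 2

alpha = 0.01
print(f"one-sided p-value = {p_one_sided:.4e}")
print("Reject H0" if p_one_sided < alpha else "Fail to reject H0")
```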

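f. Calculate a 95% confidence interval for the average foot length for all college men who are 70 inches tall. (Just clearly plug in all numerical values.)
$$\hat{y} \pm t^*\,s\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 27.133 \pm (2.04)(1.028)\sqrt{\frac{1}{32} + \frac{(70 - 71.69)^2}{289.87}}$$
$$= 27.133 \pm 0.425 \Rightarrow (26.708,\ 27.558)$$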
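The same plug-in for part f, plus the residual from part d, as an illustrative Python sketch. It uses the unrounded mean height from the output (x̄ = 71.6875), the R coefficients, s = √MSE, and t* = 2.04 (df = 30) hard-coded from the t-table.

```python
import math

n = 32
x_bar = 71.6875            # mean height (in)
s_xx = 289.87              # sum of (x - x_bar)^2
b0, b1 = 0.25313, 0.38400  # from the Coefficients table
mse = 1.057                # Residuals Mean Sq in the ANOVA table
t_star = 2.04              # 95% confidence, df = 30 (t-table value used in the notes)

s = math.sqrt(mse)
x_new = 70
y_hat = b0 + b1 * x_new
se_fit = s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / s_xx)
margin = t_star * se_fit
print(f"y_hat = {y_hat:.3f} cm, s = {s:.3f}, s.e.(fit) = {se_fit:.4f}")
print(f"95% CI: ({y_hat - margin:.3f}, {y_hat + margin:.3f})")

# Part d: residual for Max (70 in tall, foot length 28.5 cm)
residual_max = 28.5 - y_hat
print(f"residual for Max = {residual_max:.2f} cm")
```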
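g. Consider the residuals vs fitted plot shown.
[Figure: residuals vs fitted plot for the height and foot length regression]
Does this plot support the conclusion that the linear regression model is appropriate?
Yes / No (Answer: Yes)
Explain: The plot shows a random scatter in a horizontal band around 0 with no pattern.
Note: On the exam, students who said NO because the variation appears to be changing were marked as ok too.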
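Regression

Linear Regression Model
Population Version:
Mean: $$\mu_Y(x) = E(Y) = \beta_0 + \beta_1 x$$
Individual: $$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \text{where } \epsilon_i \text{ is } N(0, \sigma)$$
Sample Version:
Mean: $$\hat{y} = b_0 + b_1 x$$
Individual: $$y_i = b_0 + b_1 x_i + e_i$$

Parameter Estimators
$$b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum (x - \bar{x})\,y}{\sum (x - \bar{x})^2} = \frac{S_{XY}}{S_{XX}}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

Residuals
$$e = y - \hat{y} = \text{observed } y - \text{predicted } y$$

Correlation and its square
$$r = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}}, \qquad r^2 = \frac{SSTO - SSE}{SSTO} = \frac{SSREG}{SSTO}, \qquad \text{where } SSTO = S_{YY} = \sum (y - \bar{y})^2$$

Estimate of σ
$$s = \sqrt{MSE} = \sqrt{\frac{SSE}{n - 2}}, \qquad \text{where } SSE = \sum (y - \hat{y})^2 = \sum e^2$$

Standard Error of the Sample Slope
$$\text{s.e.}(b_1) = \frac{s}{\sqrt{S_{XX}}} = \frac{s}{\sqrt{\sum (x - \bar{x})^2}}$$
Confidence Interval for β1: $$b_1 \pm t^*\,\text{s.e.}(b_1), \qquad df = n - 2$$
t-Test for β1: to test H0: β1 = 0, $$t = \frac{b_1 - 0}{\text{s.e.}(b_1)}, \qquad df = n - 2$$
or $$F = \frac{MSREG}{MSE}, \qquad df = 1,\ n - 2$$

Standard Error of the Sample Intercept
$$\text{s.e.}(b_0) = s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{XX}}}$$
Confidence Interval for β0: $$b_0 \pm t^*\,\text{s.e.}(b_0), \qquad df = n - 2$$
t-Test for β0: to test H0: β0 = 0, $$t = \frac{b_0 - 0}{\text{s.e.}(b_0)}, \qquad df = n - 2$$

Confidence Interval for the Mean Response
$$\hat{y} \pm t^*\,\text{s.e.(fit)}, \qquad df = n - 2, \qquad \text{where } \text{s.e.(fit)} = s\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{S_{XX}}}$$

Prediction Interval for an Individual Response
$$\hat{y} \pm t^*\,\text{s.e.(pred)}, \qquad df = n - 2, \qquad \text{where } \text{s.e.(pred)} = \sqrt{s^2 + [\text{s.e.(fit)}]^2}$$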
Additional Notes
A place to jot down questions you may have and ask during office hours, take a few extra notes, write out an extra problem or summary completed in lecture, or create your own summary about these concepts.