4.1 Variable Selection for the Linear Model
So in linear regression, the more features ($X_j$) the better (since RSS keeps going down)?

NO!
Carefully selected features can improve model accuracy. But adding too many can lead to overfitting:

- Overfitted models describe random error or noise instead of any underlying relationship.
- They generally have poor predictive performance on test data.
For instance, we can use a 15-degree polynomial function to fit the following data so that the fitted curve goes nicely through the data points. However, a brand new data set collected from the same population may not fit this particular curve well at all.
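This phenomenon is easy to reproduce. The sketch below (synthetic data; the quadratic trend, noise level, and seed are invented for illustration) fits both a 15-degree and a 2-degree polynomial: the 15-degree fit passes almost exactly through the training points, yet a fresh sample from the same population fits it far worse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: a quadratic trend plus noise.
x_train = np.linspace(0, 1, 16)
y_train = 1 + 2 * x_train - 3 * x_train**2 + rng.normal(0, 0.3, x_train.size)

# A degree-15 polynomial nearly interpolates the 16 training points...
coefs_15 = np.polyfit(x_train, y_train, deg=15)
# ...while a degree-2 fit matches the true underlying curve.
coefs_2 = np.polyfit(x_train, y_train, deg=2)

def rss(coefs, x, y):
    """Residual sum of squares of a polynomial fit on data (x, y)."""
    return float(np.sum((y - np.polyval(coefs, x)) ** 2))

# A brand new sample from the same population.
x_test = np.linspace(0.02, 0.98, 50)
y_test = 1 + 2 * x_test - 3 * x_test**2 + rng.normal(0, 0.3, x_test.size)

print("train RSS, degree 15:", rss(coefs_15, x_train, y_train))  # near zero
print("train RSS, degree 2: ", rss(coefs_2, x_train, y_train))
print("test RSS,  degree 15:", rss(coefs_15, x_test, y_test))
print("test RSS,  degree 2: ", rss(coefs_2, x_test, y_test))
```

On the training data the 15-degree model always wins (its parameter space contains the quadratic), which is exactly why training RSS alone cannot be used for model selection.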
Sometimes when we do prediction we may not want to use all of the predictor variables (sometimes p is too big). For example, a DNA array expression example has a sample size (N) of 96 but a dimension (p) of over 4000!

In such cases we would select a subset of predictor variables to perform regression or classification, e.g., to choose $k$ predicting variables from the total of $p$ variables, yielding minimum $RSS(\hat{\beta})$.
Variable Selection for the Linear Regression Model
When the association of $Y$ and $X_j$ conditioning on other features is of interest, we are interested in testing $H_0: \beta_j = 0$ versus $H_a: \beta_j \neq 0$.

Under the normal error (residual) assumption,

$$z_j = \frac{\hat{\beta}_j}{\hat{\sigma}\sqrt{v_j}},$$

where $v_j$ is the $j$th diagonal element of $(X^T X)^{-1}$. $z_j$ is distributed as $t_{N-p-1}$ (a Student's t distribution with $N - p - 1$ degrees of freedom).
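The t-statistic above can be computed directly from the least-squares fit. A minimal sketch on synthetic data (the design, true coefficients, and seed are all invented for illustration; the third feature is deliberately null):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N, p = 50, 3  # N observations, p features plus an intercept

# Hypothetical design: intercept column plus p random features.
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
beta_true = np.array([1.0, 2.0, 0.0, -1.5])   # third feature truly has beta = 0
y = X @ beta_true + rng.normal(0, 1.0, N)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - p - 1)      # unbiased estimate of sigma^2
v = np.diag(np.linalg.inv(X.T @ X))           # v_j: j-th diagonal of (X'X)^{-1}

z = beta_hat / np.sqrt(sigma2_hat * v)        # t-statistics, one per coefficient
pvals = 2 * stats.t.sf(np.abs(z), df=N - p - 1)  # two-sided p-values
print(z)
print(pvals)
```

A large $|z_j|$ (small p-value) leads us to reject $H_0: \beta_j = 0$; the truly nonzero coefficients should produce tiny p-values here.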
When prediction is of interest:

- F test
- Likelihood ratio test
- AIC, BIC, etc.
- Cross-validation
F test
The residual sum of squares $RSS(\beta)$ is defined as:

$$RSS(\beta) = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{N} (y_i - X_i \beta)^2$$

To compare a nested smaller model with $p_0$ predictors (residual sum of squares $RSS_0$) against a bigger model with $p_1$ predictors (residual sum of squares $RSS_1$), form

$$F = \frac{(RSS_0 - RSS_1)/(p_1 - p_0)}{RSS_1/(N - p_1 - 1)}$$

Under the normal error assumption, the F statistic will have a $F_{(p_1 - p_0),\,(N - p_1 - 1)}$ distribution.
For linear regression models, an individual t-test is equivalent to an F-test for dropping a single coefficient $\beta_j$ from the model.
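Both the F statistic and the t/F equivalence are easy to verify numerically. A minimal sketch (synthetic data; the two-predictor design and seed are invented), testing whether the second predictor can be dropped:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
N = 60

# Hypothetical data with two predictors; we test dropping x2.
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
y = 1 + 2 * x1 + 0.5 * x2 + rng.normal(size=N)

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

X1 = np.column_stack([np.ones(N), x1, x2])   # bigger model, p1 = 2 predictors
X0 = np.column_stack([np.ones(N), x1])       # smaller model, p0 = 1 predictor

p1, p0 = 2, 1
F = ((rss(X0, y) - rss(X1, y)) / (p1 - p0)) / (rss(X1, y) / (N - p1 - 1))
p_value = stats.f.sf(F, p1 - p0, N - p1 - 1)

# Equivalence check: for dropping a single coefficient, F equals t^2.
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid = y - X1 @ beta
sigma2 = resid @ resid / (N - p1 - 1)
v = np.diag(np.linalg.inv(X1.T @ X1))
t_x2 = beta[2] / np.sqrt(sigma2 * v[2])      # t-statistic for the x2 coefficient

print(F, t_x2**2, p_value)
```

Because the smaller model is nested in the bigger one, $RSS_0 \ge RSS_1$ always, so $F \ge 0$; the agreement of $F$ and $t^2$ holds exactly for a single dropped coefficient.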
Likelihood Ratio Test (LRT)

Let $L_1$ be the maximum value of the likelihood of the bigger model. Let $L_0$ be the maximum value of the likelihood of the nested smaller model.

The likelihood ratio $\lambda = L_0 / L_1$ is always between 0 and 1, and the less likely are the restrictive assumptions underlying the smaller model, the smaller will be $\lambda$.

The likelihood ratio test statistic (deviance), $-2\log(\lambda)$, approximately follows a $\chi^2_{p_1 - p_0}$ distribution.

So we can test the fit of the 'null' model $M_0$ against a more complex model $M_1$.
Note that as $N \to \infty$ the quantiles of the $F_{(p_1 - p_0),\,(N - p_1 - 1)}$ distribution approach those of the $\chi^2_{p_1 - p_0}$ distribution divided by $(p_1 - p_0)$, so the F test and the LRT agree asymptotically.
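For a Gaussian linear model with $\sigma^2$ profiled out by maximum likelihood, the deviance reduces to $-2\log(\lambda) = N \log(RSS_0 / RSS_1)$. A minimal sketch (synthetic data; the design and seed are invented, and the second predictor is truly irrelevant so the null model should usually survive):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
N = 80

x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
y = 1 + 1.5 * x1 + rng.normal(size=N)   # x2 plays no role in the truth

def rss(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

X0 = np.column_stack([np.ones(N), x1])        # smaller model M0
X1m = np.column_stack([np.ones(N), x1, x2])   # bigger model M1

# Gaussian linear model with sigma^2 estimated by MLE in each model:
# -2 log(lambda) = N * log(RSS_0 / RSS_1).
deviance = N * np.log(rss(X0, y) / rss(X1m, y))
p_value = stats.chi2.sf(deviance, df=1)       # df = p1 - p0 = 1
print(deviance, p_value)
```

A small deviance (large p-value) means the data give little reason to prefer $M_1$ over $M_0$.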
Akaike Information Criterion (AIC)

Use of the LRT requires that our models are nested. Akaike (1971/74) proposed a more general measure of "model badness":

$$AIC = -2\log L(\hat{\theta}) + 2p$$

where $p$ is the number of parameters.

Faced with a collection of putative models, the 'best' (or 'least bad') one can be chosen by seeing which has the lowest AIC.

The scale is statistical, not scientific, but the trade-off is clear: we must improve the log-likelihood by one unit for every extra parameter.

AIC is asymptotically equivalent to leave-one-out cross-validation.
Bayes Information Criterion (BIC)

AIC tends to overfit models (see Good and Hardin, Chapter 12, for how to check this). Another information criterion which penalizes complex models more severely is:

$$BIC = -2\log L(\hat{\theta}) + p \log(n)$$

also known as the Schwarz criterion due to Schwarz (1978), where an approximate Bayesian derivation is given.

Lowest BIC is taken to identify the 'best model', as before. BIC tends to favor simpler models than those chosen by AIC.
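For a Gaussian linear model, $-2\log L(\hat{\theta}) = n\log(RSS/n)$ up to an additive constant that is the same for every model, so AIC and BIC differences can be computed from the RSS alone. A minimal sketch (synthetic data; only 2 of 5 hypothetical candidate predictors carry signal), showing the heavier BIC penalty:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 100

# Hypothetical data: only the first 2 of 5 candidate predictors matter.
X_all = rng.normal(size=(N, 5))
y = 1 + 2 * X_all[:, 0] - 1 * X_all[:, 1] + rng.normal(size=N)

def gaussian_ic(X, y):
    """Return (AIC, BIC) up to a shared additive constant."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    n, p = X.shape                       # p counts the intercept as a parameter
    neg2loglik = n * np.log(r @ r / n)   # -2 log L(theta_hat) + const
    return neg2loglik + 2 * p, neg2loglik + p * np.log(n)

# Nested models using the first k candidate predictors, k = 1..5.
results = {}
for k in range(1, 6):
    X = np.column_stack([np.ones(N), X_all[:, :k]])
    results[k] = gaussian_ic(X, y)
    print(k, results[k])
```

Since $\log(100) \approx 4.6 > 2$, each extra parameter costs more under BIC than under AIC, which is why BIC leans toward simpler models.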
Stepwise Selection

AIC and BIC also allow stepwise model selection.

An exhaustive search for the best subset may not be feasible if p is very large. There are two main alternatives:
Forward stepwise selection:

- First we approximate the response variable y with a constant (i.e., an intercept-only regression model).
- Then we gradually add one more variable at a time (or add main effects first, then interactions).
- At each step we choose, from the remaining variables, the one that yields the best accuracy in prediction when added to the pool of already selected variables. This accuracy can be measured by the F statistic, LRT, AIC, BIC, etc.
- For example, if we have 10 predictor variables, first we would approximate y with a constant. Then we would perform 10 regressions, each using a different one of the 10 predictors; each regression gives a residual sum of squares, and the variable that yields the minimum residual sum of squares is chosen and put in the pool of selected variables. We then proceed to choose the next variable from the 9 left, and so on.
Backward stepwise selection: This is similar to forward stepwise selection, except that we start with the full model using all the predictors and gradually delete variables one at a time.
There are various methods developed to choose the number of predictors, for instance the F-ratio test. We stop forward or backward stepwise selection when no predictor produces an F-ratio statistic greater than some threshold.
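The forward procedure described above can be sketched compactly using AIC as the selection criterion (a minimal sketch on synthetic data; the `forward_stepwise` helper, the 8-predictor design, and the seed are all invented for illustration, with only predictors 0 and 3 carrying signal):

```python
import numpy as np

rng = np.random.default_rng(5)
N, P = 100, 8

# Hypothetical data: predictors 0 and 3 carry the signal, the rest are noise.
X_all = rng.normal(size=(N, P))
y = 2 * X_all[:, 0] - 1.5 * X_all[:, 3] + rng.normal(size=N)

def aic(X, y):
    """Gaussian AIC up to an additive constant shared by all models."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    n, p = X.shape
    return n * np.log(r @ r / n) + 2 * p

def forward_stepwise(X_all, y):
    """Greedy forward selection: repeatedly add the variable that most
    improves AIC; stop when no remaining variable improves it."""
    n, P = X_all.shape
    selected, remaining = [], list(range(P))
    current = aic(np.ones((n, 1)), y)            # intercept-only model
    while remaining:
        scores = {j: aic(np.column_stack([np.ones(n), X_all[:, selected + [j]]]), y)
                  for j in remaining}
        best = min(scores, key=scores.get)
        if scores[best] >= current:              # stopping rule: no improvement
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected

selected = forward_stepwise(X_all, y)
print(selected)
```

Swapping `aic` for BIC or an F-ratio threshold changes only the scoring line and the stopping rule, not the greedy structure of the search.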
Source URL: https://onlinecourses.science.psu.edu/stat857/node/45