Published on STAT897D (https://onlinecourses.science.psu.edu/stat857)

4.1 Variable Selection for the Linear Model
So in linear regression, the more features (X_j) the better (since RSS keeps going down)?
NO!
Carefully selected features can improve model accuracy. But adding too many can lead to overfitting:
Overfitted models describe random error or noise instead of any underlying relationship.
They generally have poor predictive performance on test data.
For instance, we can use a 15-degree polynomial function to fit the following data so that the fitted curve goes nicely through the data points. However, a brand new data set collected from the same population may not fit this particular curve well at all.
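To make the overfitting point concrete, here is a small sketch (synthetic noisy sine data, an assumption for illustration, not the course's data set): a high-degree polynomial drives the training RSS down, while its error on fresh data from the same population is typically much worse.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic data (an assumption for illustration): noisy samples of a sine curve.
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
x_test = np.linspace(0.025, 0.975, 20)          # fresh points from the same range
truth = lambda x: np.sin(2.0 * np.pi * x)
y_train = truth(x_train) + rng.normal(scale=0.3, size=x_train.size)
y_test = truth(x_test) + rng.normal(scale=0.3, size=x_test.size)

def rss(deg, x_fit, y_fit, x_eval, y_eval):
    """Fit a degree-`deg` polynomial by least squares; return RSS on (x_eval, y_eval)."""
    poly = Polynomial.fit(x_fit, y_fit, deg)
    return float(np.sum((y_eval - poly(x_eval)) ** 2))

for deg in (1, 3, 15):
    print(f"degree {deg:2d}: "
          f"train RSS = {rss(deg, x_train, y_train, x_train, y_train):8.4f}, "
          f"test RSS = {rss(deg, x_train, y_train, x_test, y_test):8.4f}")
```

Because the polynomial bases are nested, the training RSS can only go down as the degree grows; the test RSS is what exposes the overfit.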

Sometimes when we do prediction we may not want to use all of the predictor variables (sometimes p is too big). For example, a DNA array expression example has a sample size (N) of 96 but a dimension (p) of over 4000!
In such cases we would select a subset of predictor variables to perform regression or classification, e.g. to choose the k predicting variables, out of the total of p variables, that yield the minimum RSS(β̂).

Variable Selection for the Linear Regression Model
When the association of Y and X_j conditioning on the other features is of interest, we are interested in testing H0: β_j = 0 versus Ha: β_j ≠ 0.
Under the normal error (residual) assumption, z_j = β̂_j / (σ̂ √v_j), where v_j is the jth diagonal element of (XᵀX)⁻¹.
z_j is distributed as t_{N-p-1} (a Student's t distribution with N - p - 1 degrees of freedom).
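As a sketch of this test (the toy data and coefficients below are my own assumptions, not the course's), the z statistic can be computed directly from the least-squares quantities:

```python
import numpy as np

# Toy deterministic data: y depends linearly on x, with a small
# sinusoidal perturbation standing in for noise (illustrative assumption).
N = 20
x = np.arange(N, dtype=float)
y = 2.0 + 3.0 * x + np.sin(x)

X = np.column_stack([np.ones(N), x])      # design matrix with intercept
p = X.shape[1] - 1                        # number of predictors (excluding intercept)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y              # least-squares estimate
resid = y - X @ beta_hat
sigma_hat = np.sqrt(resid @ resid / (N - p - 1))
v = np.diag(XtX_inv)                      # v_j: jth diagonal of (X'X)^(-1)
z = beta_hat / (sigma_hat * np.sqrt(v))   # z_j = beta_hat_j / (sigma_hat * sqrt(v_j))
print("z statistics:", z)
```

Each |z_j| is then compared against the quantiles of the t distribution with N - p - 1 degrees of freedom.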

When prediction is of interest:
F test
Likelihood ratio test
AIC, BIC, etc.
Cross-validation.

F test
The residual sum of squares RSS(β) is defined as:
RSS(β) = Σ_{i=1}^{N} (y_i - ŷ_i)² = Σ_{i=1}^{N} (y_i - X_i β)²

Let RSS_1 correspond to the bigger model with p_1 + 1 parameters, and RSS_0 correspond to the nested smaller model with p_0 + 1 parameters.
The F statistic measures the reduction of RSS per additional parameter in the bigger model:
F = [(RSS_0 - RSS_1) / (p_1 - p_0)] / [RSS_1 / (N - p_1 - 1)]
Under the normal error assumption, the F statistic will have an F distribution with (p_1 - p_0) and (N - p_1 - 1) degrees of freedom.
For linear regression models, an individual t test is equivalent to an F test for dropping a single coefficient β_j from the model.
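A minimal sketch of the F test for two nested models, on deterministic toy data of my own (an assumption, not the course's example). It also checks the equivalence just noted: for dropping a single coefficient, F equals the square of that coefficient's t statistic.

```python
import numpy as np

def fit_rss(X, y):
    """Least-squares fit; return (beta_hat, residual sum of squares)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return beta, float(r @ r)

# Toy deterministic data (illustrative assumption).
N = 30
x1 = np.arange(N, dtype=float)
x2 = np.cos(x1)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + np.sin(0.7 * x1)

p0, p1 = 1, 2
X0 = np.column_stack([np.ones(N), x1])        # smaller model, p0 + 1 parameters
X1 = np.column_stack([np.ones(N), x1, x2])    # bigger model,  p1 + 1 parameters
_, rss0 = fit_rss(X0, y)
beta1, rss1 = fit_rss(X1, y)

# F statistic: reduction in RSS per extra parameter, scaled by the big model's MSE.
F = ((rss0 - rss1) / (p1 - p0)) / (rss1 / (N - p1 - 1))

# Equivalent individual t test for the dropped coefficient (here, x2's):
sigma_hat = np.sqrt(rss1 / (N - p1 - 1))
v = np.diag(np.linalg.inv(X1.T @ X1))
t = beta1[2] / (sigma_hat * np.sqrt(v[2]))
print("F =", F, " t^2 =", t ** 2)
```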

Likelihood Ratio Test (LRT)
Let L_1 be the maximum value of the likelihood of the bigger model.
Let L_0 be the maximum value of the likelihood of the nested smaller model.
The likelihood ratio Λ = L_0 / L_1 is always between 0 and 1, and the less likely the restrictive assumptions underlying the smaller model are, the smaller Λ will be.
The likelihood ratio test statistic (deviance), -2 log(Λ), approximately follows a χ² distribution with p_1 - p_0 degrees of freedom.
So we can test the fit of the 'null' model M_0 against a more complex model M_1.
Note that as N grows, the quantiles of (p_1 - p_0) times an F_{(p_1-p_0),(N-p_1-1)} random variable approach those of the χ²_{p_1-p_0} distribution, so the two tests agree in large samples.
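For the Gaussian linear model, the deviance has a closed form once the error variance is profiled out of the likelihood: -2 log(Λ) = N log(RSS_0 / RSS_1). A sketch with toy deterministic data (my own illustrative assumption):

```python
import math
import numpy as np

def rss_of(X, y):
    """Residual sum of squares of the least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

# Toy deterministic data (illustrative assumption).
N = 30
x1 = np.arange(N, dtype=float)
x2 = np.cos(x1)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + np.sin(0.7 * x1)

rss0 = rss_of(np.column_stack([np.ones(N), x1]), y)        # smaller model M0
rss1 = rss_of(np.column_stack([np.ones(N), x1, x2]), y)    # bigger model M1

# Deviance for Gaussian errors with the variance profiled out;
# compare to a chi-square quantile with p1 - p0 = 1 degree of freedom.
deviance = N * math.log(rss0 / rss1)
print("deviance -2 log(Lambda) =", deviance)
```

Since RSS_0 ≥ RSS_1 for nested models, the deviance is always non-negative.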

Akaike Information Criterion (AIC)
Use of the LRT requires that our models are nested. Akaike (1971/74) proposed a more general measure of "model badness":
AIC = -2 log L(θ̂) + 2p
where p is the number of parameters.
Faced with a collection of putative models, the 'best' (or 'least bad') one can be chosen by seeing which has the lowest AIC.
The scale is statistical, not scientific, but the trade-off is clear: we must improve the log likelihood by one unit for every extra parameter.
AIC is asymptotically equivalent to leave-one-out cross-validation.
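For Gaussian linear regression, -2 log L(θ̂) reduces to N log(RSS/N) plus a constant that is the same for every model fit to the same data, so AIC comparisons can be sketched as follows (the RSS values below are made-up numbers, for illustration only):

```python
import math

def gaussian_aic(rss, n, p):
    """AIC of a Gaussian linear model, up to an additive constant shared by
    all models on the same n observations (so rankings are unaffected)."""
    return n * math.log(rss / n) + 2 * p

# Hypothetical models on the same data: extra parameters must "buy"
# enough RSS reduction to be worth their penalty of 2 apiece.
print(gaussian_aic(rss=100.0, n=50, p=3))
print(gaussian_aic(rss=95.0, n=50, p=6))    # lower RSS, but 3 more parameters
```

Here the small RSS improvement does not pay for three extra parameters, so the smaller model has the lower AIC.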

Bayes Information Criterion (BIC)
AIC tends to overfit models (see Good and Hardin, Chapter 12, for how to check this). Another information criterion which penalizes complex models more severely is:
BIC = -2 log L(θ̂) + p log(n)
also known as the Schwarz criterion due to Schwarz (1978), where an approximate Bayesian derivation is given.
The lowest BIC is taken to identify the 'best model', as before.
BIC tends to favor simpler models than those chosen by AIC.
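Using the same Gaussian-model reduction as for AIC (the RSS numbers are made up, for illustration only): once log(n) > 2, i.e. n ≥ 8, each parameter costs more under BIC than under AIC, so BIC can prefer the smaller of two models that AIC ranks the other way.

```python
import math

def gaussian_aic(rss, n, p):
    # Up to an additive constant shared by all models on the same data.
    return n * math.log(rss / n) + 2 * p

def gaussian_bic(rss, n, p):
    # Same constant; the per-parameter penalty is log(n) instead of 2.
    return n * math.log(rss / n) + p * math.log(n)

n = 100                        # log(100) ~ 4.6 > 2, so BIC penalizes harder
small = dict(rss=83.0, p=4)    # hypothetical smaller model
big = dict(rss=80.0, p=5)      # hypothetical bigger model, slightly lower RSS

aic_prefers_big = gaussian_aic(big["rss"], n, big["p"]) < gaussian_aic(small["rss"], n, small["p"])
bic_prefers_small = gaussian_bic(big["rss"], n, big["p"]) > gaussian_bic(small["rss"], n, small["p"])
print("AIC prefers bigger:", aic_prefers_big)
print("BIC prefers smaller:", bic_prefers_small)
```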

Stepwise Selection

AIC and BIC also allow stepwise model selection.

An exhaustive search for the best subset may not be feasible if p is very large. There are two main alternatives:
Forward stepwise selection:
First we approximate the response variable y with a constant (i.e., an intercept-only regression model).
Then we gradually add one more variable at a time (or add main effects first, then interactions).
Each time, we choose from the remaining variables the one that yields the best accuracy in prediction when added to the pool of already selected variables. This accuracy can be measured by the F statistic, LRT, AIC, BIC, etc.
For example, if we have 10 predictor variables, first we would approximate y with a constant, and then use one variable out of the 10 (we would perform 10 regressions, each using a different predictor variable; each regression gives a residual sum of squares, and the variable that yields the minimum residual sum of squares is chosen and put in the pool of selected variables). We then proceed to choose the next variable from the 9 left, etc.
Backward stepwise selection: This is similar to forward stepwise selection, except that we start with the full model using all the predictors and gradually delete variables one at a time.
There are various methods developed to choose the number of predictors, for instance the F-ratio test. We stop forward or backward stepwise selection when no predictor produces an F-ratio statistic greater than some threshold.
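The forward procedure described above can be sketched as a greedy loop. The toy data and the simple stopping rule "take k steps" are my own simplifying assumptions; in practice one would stop via the F-ratio test, AIC, or BIC.

```python
import numpy as np

def rss_of(X, y):
    """Residual sum of squares of the least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def forward_stepwise(X, y, k):
    """Greedy forward selection: start from an intercept-only model and,
    at each step, add the variable that most reduces the RSS, until
    k variables have been chosen. Returns the selected column indices."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    design = np.ones((n, 1))                      # intercept-only start
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in remaining:                       # one regression per candidate
            cand = np.column_stack([design, X[:, j]])
            r = rss_of(cand, y)
            if r < best_rss:
                best_j, best_rss = j, r
        selected.append(best_j)
        remaining.remove(best_j)
        design = np.column_stack([design, X[:, best_j]])
    return selected

# Toy deterministic data (illustrative assumption): y is driven mainly by column 2.
n = 40
t = np.arange(n, dtype=float)
X = np.column_stack([np.sin(t), np.cos(t), t])
y = 5.0 * t + np.sin(3.0 * t)
print("selected columns:", forward_stepwise(X, y, 2))
```

Backward selection is the mirror image: start from all p columns and repeatedly drop the variable whose removal increases the RSS the least.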
Source URL: https://onlinecourses.science.psu.edu/stat857/node/45
