Gradient boosting
From Wikipedia, the free encyclopedia

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

The idea of gradient boosting originated in the observation by Leo Breiman[1] that boosting can be interpreted as an optimization algorithm on a suitable cost function. Explicit regression gradient boosting algorithms were subsequently developed by Jerome H. Friedman,[2][3] simultaneously with the more general functional gradient boosting perspective of Llew Mason, Jonathan Baxter, Peter Bartlett and Marcus Frean.[4][5] The latter two papers introduced the abstract view of boosting algorithms as iterative functional gradient descent algorithms: that is, algorithms that optimize a cost function over function space by iteratively choosing a function (weak hypothesis) that points in the negative gradient direction. This functional gradient view of boosting has led to the development of boosting algorithms in many areas of machine learning and statistics beyond regression and classification.

Contents
1 Informal introduction
2 Algorithm
3 Gradient tree boosting
3.1 Size of trees
4 Regularization
4.1 Shrinkage
4.2 Stochastic gradient boosting
4.3 Number of observations in leaves
4.4 Penalize Complexity of Tree
5 Usage
6 Names
7 See also
8 References

Informal introduction
(This section follows the exposition of gradient boosting by Li.[6])

Like other boosting methods, gradient boosting combines weak learners into a single strong learner in an iterative fashion. It is easiest to explain in the least-squares regression setting, where the goal is to learn a model $F$ that predicts values $\hat{y} = F(x)$, minimizing the mean squared error $(\hat{y} - y)^2$ to the true values $y$ (averaged over some training set).

At each stage $m$ ($1 \le m \le M$) of gradient boosting, it may be assumed that there is some imperfect model $F_m$ (at the outset, a very weak model that just predicts the mean $y$ in the training set could be used). The gradient boosting algorithm does not change $F_m$ in any way; instead, it improves on it by constructing a new model that adds an estimator $h$ to provide a better model $F_{m+1}(x) = F_m(x) + h(x)$. The question is now, how to find $h$? The gradient boosting solution starts with the observation that a perfect $h$ would imply

$$F_{m+1}(x) = F_m(x) + h(x) = y$$

or, equivalently,

$$h(x) = y - F_m(x).$$

Therefore, gradient boosting will fit $h$ to the residual $y - F_m(x)$. Like in other boosting variants, each $F_{m+1}$ learns to correct its predecessor $F_m$. A generalization of this idea to loss functions other than squared error (and to classification and ranking problems) follows from the observation that the residuals $y - F_m(x)$ are the negative gradients (with respect to $F(x)$) of the squared error loss function $\frac{1}{2}(y - F(x))^2$. So, gradient boosting is a gradient descent algorithm, and generalizing it entails "plugging in" a different loss and its gradient.
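To make the residual-fitting view concrete, here is a minimal Python sketch of least-squares gradient boosting (an illustration, not code from the article's references; scikit-learn's DecisionTreeRegressor serves as a convenient weak learner, and names such as fit_gradient_boosting and the parameter values are assumptions for the example):

    # A minimal sketch of least-squares gradient boosting: each new tree is
    # fit to the residuals y - F_m(x) of the current ensemble.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_gradient_boosting(X, y, n_stages=100):
        f0 = y.mean()                      # F_0: just predict the mean of y
        current = np.full(len(y), f0)      # F_m(x_i) on the training set
        trees = []
        for _ in range(n_stages):
            residual = y - current         # h is fit to the residual y - F_m(x)
            tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
            trees.append(tree)
            current = current + tree.predict(X)   # F_{m+1}(x) = F_m(x) + h(x)
        return f0, trees

    def predict(f0, trees, X):
        return f0 + sum(tree.predict(X) for tree in trees)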

Algorithm
In many supervised learning problems one has an output variable $y$ and a vector of input variables $x$, connected together via a joint probability distribution $P(x, y)$. Using a training set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ of known values of $x$ and corresponding values of $y$, the goal is to find an approximation $\hat{F}(x)$ to a function $F^*(x)$ that minimizes the expected value of some specified loss function $L(y, F(x))$:

$$\hat{F} = \arg\min_F \mathbb{E}_{x,y}[L(y, F(x))].$$

The gradient boosting method assumes a real-valued $y$ and seeks an approximation $\hat{F}(x)$ in the form of a weighted sum of functions $h_i(x)$ from some class $\mathcal{H}$, called base (or weak) learners:

$$\hat{F}(x) = \sum_{i=1}^M \gamma_i h_i(x) + \text{const}.$$

In accordance with the empirical risk minimization principle, the method tries to find an approximation $\hat{F}(x)$ that minimizes the average value of the loss function on the training set. It does so by starting with a model consisting of a constant function $F_0(x)$, and incrementally expanding it in a greedy fashion:

$$F_0(x) = \arg\min_\gamma \sum_{i=1}^n L(y_i, \gamma),$$

$$F_m(x) = F_{m-1}(x) + \arg\min_{f \in \mathcal{H}} \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + f(x_i)),$$

where $f$ is restricted to be a function from the class $\mathcal{H}$ of base learner functions.

However, the problem of choosing at each step the best $f$ for an arbitrary loss function $L$ is a hard optimization problem in general, so we "cheat" by solving a much easier problem instead.

The idea is to apply a steepest descent step to this minimization problem. If we only cared about predictions at the points of the training set, and $f$ were unrestricted, we would update the model per the following equations, where we view $L(y_i, F(x_i))$ not as a functional of $F$, but as a function of the vector of values $F(x_1), \ldots, F(x_n)$:

$$F_m(x) = F_{m-1}(x) - \gamma_m \sum_{i=1}^n \nabla_{F_{m-1}} L(y_i, F_{m-1}(x_i)),$$

$$\gamma_m = \arg\min_\gamma \sum_{i=1}^n L\big(y_i,\; F_{m-1}(x_i) - \gamma \nabla_{F_{m-1}} L(y_i, F_{m-1}(x_i))\big).$$

But as $f$ must come from a restricted class of functions (that is what allows us to generalize), we just choose the one that most closely approximates the gradient of $L$. Having chosen $f$, the multiplier $\gamma$ is then selected using line search, just as shown in the second equation above.
In pseudocode, the generic gradient boosting method is:[2][7]

Input: training set $\{(x_i, y_i)\}_{i=1}^n$; a differentiable loss function $L(y, F(x))$; number of iterations $M$.

Algorithm:

1. Initialize model with a constant value:
$$F_0(x) = \arg\min_\gamma \sum_{i=1}^n L(y_i, \gamma).$$
2. For $m = 1$ to $M$:
   1. Compute so-called pseudo-residuals:
   $$r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)} \quad \text{for } i = 1, \ldots, n.$$
   2. Fit a base learner $h_m(x)$ to the pseudo-residuals, i.e. train it using the training set $\{(x_i, r_{im})\}_{i=1}^n$.
   3. Compute multiplier $\gamma_m$ by solving the following one-dimensional optimization problem:
   $$\gamma_m = \arg\min_\gamma \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)).$$
   4. Update the model:
   $$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x).$$
3. Output $F_M(x)$.
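As an illustrative sketch of the pseudocode above (not a reference implementation), note that the loss enters only through its gradient. Here the absolute-error loss $L(y, F) = |y - F|$ is plugged in as an example, with pseudo-residuals $\mathrm{sign}(y_i - F(x_i))$ and the multiplier $\gamma_m$ found by a scipy line search; all names and parameter choices are assumptions:

    # Sketch of the generic algorithm with a pluggable loss (absolute error here).
    import numpy as np
    from scipy.optimize import minimize_scalar
    from sklearn.tree import DecisionTreeRegressor

    def fit(X, y, M=100):
        gamma0 = np.median(y)          # step 1: arg min_gamma sum_i |y_i - gamma|
        f = np.full(len(y), gamma0)    # F_0(x_i) on the training set
        stages = []
        for _ in range(M):
            r = np.sign(y - f)         # step 2.1: pseudo-residuals for |y - F|
            h = DecisionTreeRegressor(max_depth=2).fit(X, r)   # step 2.2
            hx = h.predict(X)
            # step 2.3: one-dimensional line search for gamma_m
            gamma = minimize_scalar(lambda g: np.abs(y - (f + g * hx)).sum()).x
            f = f + gamma * hx         # step 2.4: F_m = F_{m-1} + gamma_m h_m
            stages.append((gamma, h))
        return gamma0, stages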

Gradient tree boosting

Gradient boosting is typically used with decision trees (especially CART trees) of a fixed size as base learners. For this special case, Friedman proposes a modification to the gradient boosting method which improves the quality of fit of each base learner.

Generic gradient boosting at the $m$-th step would fit a decision tree $h_m(x)$ to pseudo-residuals. Let $J_m$ be the number of its leaves. The tree partitions the input space into $J_m$ disjoint regions $R_{1m}, \ldots, R_{J_m m}$ and predicts a constant value in each region. Using the indicator notation, the output of $h_m(x)$ for input $x$ can be written as the sum

$$h_m(x) = \sum_{j=1}^{J_m} b_{jm} \mathbf{1}_{R_{jm}}(x),$$

where $b_{jm}$ is the value predicted in the region $R_{jm}$.[8]

Then the coefficients $b_{jm}$ are multiplied by some value $\gamma_m$, chosen using line search so as to minimize the loss function, and the model is updated as follows:

$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x), \quad \gamma_m = \arg\min_\gamma \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)).$$

Friedman proposes to modify this algorithm so that it chooses a separate optimal value $\gamma_{jm}$ for each of the tree's regions, instead of a single $\gamma_m$ for the whole tree. He calls the modified algorithm "TreeBoost". The coefficients $b_{jm}$ from the tree-fitting procedure can then be simply discarded, and the model update rule becomes:

$$F_m(x) = F_{m-1}(x) + \sum_{j=1}^{J_m} \gamma_{jm} \mathbf{1}_{R_{jm}}(x), \quad \gamma_{jm} = \arg\min_\gamma \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma).$$
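As a sketch of the per-region idea (again with the absolute loss, for which the optimal constant in a region is the median residual there), one can fit the tree and then compute a separate $\gamma_{jm}$ per leaf. Using scikit-learn's public apply() to identify regions is an illustrative choice, not Friedman's implementation:

    # One TreeBoost step: a separate gamma_jm per leaf region R_jm, instead of
    # a single multiplier for the whole tree (shown for L(y, F) = |y - F|).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def treeboost_step(X, y, f):
        r = np.sign(y - f)                    # pseudo-residuals
        tree = DecisionTreeRegressor(max_depth=3).fit(X, r)
        leaves = tree.apply(X)                # region R_jm of each training point
        # gamma_jm = arg min_g sum_{x_i in R_jm} |y_i - (f_i + g)|, i.e. the
        # median residual within the region
        gammas = {j: np.median((y - f)[leaves == j]) for j in np.unique(leaves)}
        return f + np.array([gammas[j] for j in leaves]), tree, gammas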
Size of trees
$J$, the number of terminal nodes in trees, is the method's parameter that can be adjusted for a data set at hand. It controls the maximum allowed level of interaction between variables in the model. With $J = 2$ (decision stumps), no interaction between variables is allowed. With $J = 3$ the model may include effects of the interaction between up to two variables, and so on.

Hastie et al.[7] comment that typically $4 \le J \le 8$ works well for boosting and results are fairly insensitive to the choice of $J$ in this range; $J = 2$ is insufficient for many applications, and $J > 10$ is unlikely to be required.

Regularization
Fitting the training set too closely can lead to degradation of the model's generalization ability. Several so-called regularization techniques reduce this overfitting effect by constraining the fitting procedure.

One natural regularization parameter is the number of gradient boosting iterations $M$ (i.e. the number of trees in the model when the base learner is a decision tree). Increasing $M$ reduces the error on the training set, but setting it too high may lead to overfitting. An optimal value of $M$ is often selected by monitoring prediction error on a separate validation data set. Besides controlling $M$, several other regularization techniques are used.

Shrinkage
An important part of the gradient boosting method is regularization by shrinkage, which consists in modifying the update rule as follows:

$$F_m(x) = F_{m-1}(x) + \nu \cdot \gamma_m h_m(x), \quad 0 < \nu \le 1,$$

where the parameter $\nu$ is called the "learning rate".

Empirically it has been found that using small learning rates (such as $\nu < 0.1$) yields dramatic improvements in a model's generalization ability over gradient boosting without shrinking ($\nu = 1$).[7] However, it comes at the price of increased computational time both during training and querying: a lower learning rate requires more iterations.
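In the sketches above, shrinkage amounts to a one-line change of the update step (the value 0.1 is only an example of a small learning rate):

    # Damp each stage's contribution by a learning rate nu (illustrative value).
    nu = 0.1
    f = f + nu * gamma * hx    # instead of f = f + gamma * hx; needs a larger M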

Stochastic gradient boosting
Soon after the introduction of gradient boosting, Friedman proposed a minor modification to the algorithm, motivated by Breiman's bagging method.[3] Specifically, he proposed that at each iteration of the algorithm, a base learner should be fit on a subsample of the training set drawn at random without replacement.[9] Friedman observed a substantial improvement in gradient boosting's accuracy with this modification.

The subsample size is some constant fraction $f$ of the size of the training set. When $f = 1$, the algorithm is deterministic and identical to the one described above. Smaller values of $f$ introduce randomness into the algorithm and help prevent overfitting, acting as a kind of regularization. The algorithm also becomes faster, because regression trees have to be fit to smaller data sets at each iteration. Friedman[3] obtained that $0.5 \le f \le 0.8$ leads to good results for small and moderately sized training sets. Therefore, $f$ is typically set to 0.5, meaning that one half of the training set is used to build each base learner.

Also, like in bagging, subsampling allows one to define an out-of-bag error of the prediction performance improvement by evaluating predictions on those observations which were not used in the building of the next base learner. Out-of-bag estimates help avoid the need for an independent validation data set, but often underestimate actual performance improvement and the optimal number of iterations.[10]
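Reusing the names of the sketches above (X, y, and the pseudo-residuals r), the modification is to draw a random half of the rows, without replacement, before fitting each base learner; the fraction 0.5 follows the typical choice just described:

    # Fit each base learner on a random subsample drawn without replacement.
    rng = np.random.default_rng(0)          # fixed seed, for reproducibility
    idx = rng.choice(len(y), size=len(y) // 2, replace=False)
    h = DecisionTreeRegressor(max_depth=2).fit(X[idx], r[idx])
    # Rows not in idx are "out-of-bag" and can be used to estimate the
    # improvement from this stage without a separate validation set.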

Number of observations in leaves
Gradient tree boosting implementations often also use regularization by limiting the minimum number of observations in trees' terminal nodes (this parameter is called n.minobsinnode in the R gbm package[10]). It is used in the tree-building process by ignoring any splits that lead to nodes containing fewer than this number of training set instances.

Imposing this limit helps to reduce variance in predictions at leaves.
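For comparison, scikit-learn's gradient boosting implementation exposes the analogous constraint as min_samples_leaf (the value 10 below is an arbitrary illustration):

    # scikit-learn's analogue of n.minobsinnode: forbid splits that would
    # leave fewer than min_samples_leaf training instances in a node.
    from sklearn.ensemble import GradientBoostingRegressor
    model = GradientBoostingRegressor(min_samples_leaf=10)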

Penalize Complexity of Tree
Another useful regularization technique for gradient boosted trees is to penalize the model complexity of the learned model.[11] The model complexity can be defined as proportional to the number of leaves in the learned trees. The joint optimization of loss and model complexity corresponds to a post-pruning algorithm that removes branches that fail to reduce the loss by a threshold. Other kinds of regularization, such as an $\ell_2$ penalty on the leaf values, can also be added to avoid overfitting.
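In the xgboost library, which uses this formulation,[11] the per-leaf penalty and the $\ell_2$ penalty on leaf values correspond to the gamma and reg_lambda parameters respectively; the values below are illustrative only:

    # gamma: minimum loss reduction a split must achieve (per-leaf penalty);
    # reg_lambda: L2 penalty on leaf values. Both values are examples.
    import xgboost as xgb
    model = xgb.XGBRegressor(gamma=1.0, reg_lambda=1.0)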

Usage
Recently, gradient boosting has gained some popularity in the field of learning to rank. The commercial web search engines Yahoo[12] and Yandex[13] use variants of gradient boosting in their machine-learned ranking engines.

Names

The method goes by a variety of names. Friedman introduced his regression technique as a "Gradient Boosting Machine" (GBM).[2] Mason, Baxter et al. described the generalized abstract class of algorithms as "functional gradient boosting".[4][5]

A popular open-source implementation[10] for R calls it a "Generalized Boosting Model". Commercial implementations from Salford Systems use the names "Multiple Additive Regression Trees" (MART) and TreeNet, both trademarked.

See also
AdaBoost
Random forest
xgboost

References
1. Breiman, L. "Arcing The Edge (http://statistics.berkeley.edu/sites/default/files/tech-reports/486.pdf)" (June 1997)
2. Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine (http://www-stat.stanford.edu/~jhf/ftp/trebst.pdf)" (February 1999)
3. Friedman, J. H. "Stochastic Gradient Boosting (https://statweb.stanford.edu/~jhf/ftp/stobst.pdf)" (March 1999)
4. Mason, L.; Baxter, J.; Bartlett, P. L.; Frean, Marcus (1999). "Boosting Algorithms as Gradient Descent" (PDF). In S. A. Solla, T. K. Leen and K. Müller. Advances in Neural Information Processing Systems 12. MIT Press. pp. 512-518.
5. Mason, L.; Baxter, J.; Bartlett, P. L.; Frean, Marcus (May 1999). Boosting Algorithms as Gradient Descent in Function Space (PDF).
6. Cheng Li. "A Gentle Introduction to Gradient Boosting" (PDF).
7. Hastie, T.; Tibshirani, R.; Friedman, J. H. (2009). "10. Boosting and Additive Trees". The Elements of Statistical Learning (2nd ed.). New York: Springer. pp. 337-384. ISBN 0-387-84857-6.
8. Note: in the case of usual CART trees, the trees are fitted using least-squares loss, and so the coefficient $b_{jm}$ for the region $R_{jm}$ is equal to just the value of the output variable, averaged over all training instances in $R_{jm}$.
9. Note that this is different from bagging, which samples with replacement because it uses samples of the same size as the training set.
10. Ridgeway, Greg (2007). Generalized Boosted Models: A guide to the gbm package (https://cran.r-project.org/web/packages/gbm/gbm.pdf).
11. Tianqi Chen. Introduction to Boosted Trees (http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf)
12. Cossock, David and Zhang, Tong (2008). Statistical Analysis of Bayes Optimal Subset Ranking (http://www.stat.rutgers.edu/~tzhang/papers/it08-ranking.pdf), page 14.
13. Yandex corporate blog entry about the new ranking model "Snezhinsk" (http://webmaster.ya.ru/replies.xml?item_no=5707&ncrnd=5118) (in Russian)
