Gradient boosting
From Wikipedia, the free encyclopedia

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

The idea of gradient boosting originated in the observation by Leo Breiman[1] that boosting can be interpreted as an optimization algorithm on a suitable cost function. Explicit regression gradient boosting algorithms were subsequently developed by Jerome H. Friedman,[2][3] simultaneously with the more general functional gradient boosting perspective of Llew Mason, Jonathan Baxter, Peter Bartlett and Marcus Frean.[4][5] The latter two papers introduced the abstract view of boosting algorithms as iterative functional gradient descent algorithms: that is, algorithms that optimize a cost function over function space by iteratively choosing a function (weak hypothesis) that points in the negative gradient direction. This functional gradient view of boosting has led to the development of boosting algorithms in many areas of machine learning and statistics beyond regression and classification.

Contents
1 Informal introduction
2 Algorithm
3 Gradient tree boosting
3.1 Size of trees
4 Regularization
4.1 Shrinkage
4.2 Stochastic gradient boosting
4.3 Number of observations in leaves
4.4 Penalize Complexity of Tree
5 Usage
6 Names
7 See also
8 References

Informal introduction
(This section follows the exposition of gradient boosting by Li.[6])

Like other boosting methods, gradient boosting combines weak learners into a single strong learner in an iterative fashion. It is easiest to explain in the least-squares regression setting, where the goal is to learn a model $F$ that predicts values $\hat{y} = F(x)$, minimizing the mean squared error $(\hat{y} - y)^2$ to the true values $y$ (averaged over some training set).

At each stage $m$ ($1 \le m \le M$) of gradient boosting, it may be assumed that there is some imperfect model $F_m$ (at the outset, a very weak model that just predicts the mean $y$ in the training set could be used). The gradient boosting algorithm does not change $F_m$ in any way; instead, it improves on it by constructing a new model that adds an estimator $h$ to provide a better model $F_{m+1}(x) = F_m(x) + h(x)$. The question is now, how to find $h$? The gradient boosting solution starts with the observation that a perfect $h$ would imply

$$F_{m+1}(x) = F_m(x) + h(x) = y$$

or, equivalently,

$$h(x) = y - F_m(x).$$

Therefore, gradient boosting will fit $h$ to the residual $y - F_m(x)$. Like in other boosting variants, each $F_{m+1}$ learns to correct its predecessor $F_m$. A generalization of this idea to loss functions other than squared error (and to classification and ranking problems) follows from the observation that the residuals $y - F_m(x)$ are the negative gradients (with respect to $F(x)$) of the squared error loss function $\frac{1}{2}(y - F(x))^2$. So, gradient boosting is a gradient descent algorithm, and generalizing it entails "plugging in" a different loss and its gradient.
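To make the residual-fitting view concrete, here is a minimal Python sketch of least-squares gradient boosting (an illustration, not code from the article's references; scikit-learn's DecisionTreeRegressor serves as a convenient weak learner, and names such as fit_gradient_boosting and the parameter values are assumptions for the example):

    # A minimal sketch of least-squares gradient boosting: each new tree is
    # fit to the residuals y - F_m(x) of the current ensemble.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def fit_gradient_boosting(X, y, n_stages=100):
        f0 = y.mean()                      # F_0: just predict the mean of y
        current = np.full(len(y), f0)      # F_m(x_i) on the training set
        trees = []
        for _ in range(n_stages):
            residual = y - current         # h is fit to the residual y - F_m(x)
            tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
            trees.append(tree)
            current = current + tree.predict(X)   # F_{m+1}(x) = F_m(x) + h(x)
        return f0, trees

    def predict(f0, trees, X):
        return f0 + sum(tree.predict(X) for tree in trees)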

Algorithm
In many supervised learning problems one has an output variable $y$ and a vector of input variables $x$, connected together via a joint probability distribution $P(x, y)$. Using a training set $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ of known values of $x$ and corresponding values of $y$, the goal is to find an approximation $\hat{F}(x)$ to a function $F^*(x)$ that minimizes the expected value of some specified loss function $L(y, F(x))$:

$$\hat{F} = \arg\min_F \mathbb{E}_{x,y}[L(y, F(x))].$$

The gradient boosting method assumes a real-valued $y$ and seeks an approximation $\hat{F}(x)$ in the form of a weighted sum of functions $h_i(x)$ from some class $\mathcal{H}$, called base (or weak) learners:

$$\hat{F}(x) = \sum_{i=1}^M \gamma_i h_i(x) + \text{const}.$$

In accordance with the empirical risk minimization principle, the method tries to find an approximation $\hat{F}(x)$ that minimizes the average value of the loss function on the training set. It does so by starting with a model consisting of a constant function $F_0(x)$, and incrementally expanding it in a greedy fashion:

$$F_0(x) = \arg\min_\gamma \sum_{i=1}^n L(y_i, \gamma),$$

$$F_m(x) = F_{m-1}(x) + \arg\min_{f \in \mathcal{H}} \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + f(x_i)),$$

where $f$ is restricted to be a function from the class $\mathcal{H}$ of base learner functions.

However, the problem of choosing at each step the best $f$ for an arbitrary loss function $L$ is a hard optimization problem in general, so we "cheat" by solving a much easier problem instead.

The idea is to apply a steepest descent step to this minimization problem. If we only cared about predictions at the points of the training set, and $f$ were unrestricted, we would update the model per the following equations, where we view $L(y_i, F(x_i))$ not as a functional of $F$, but as a function of the vector of values $F(x_1), \ldots, F(x_n)$:

$$F_m(x) = F_{m-1}(x) - \gamma_m \sum_{i=1}^n \nabla_{F_{m-1}} L(y_i, F_{m-1}(x_i)),$$

$$\gamma_m = \arg\min_\gamma \sum_{i=1}^n L\big(y_i,\; F_{m-1}(x_i) - \gamma \nabla_{F_{m-1}} L(y_i, F_{m-1}(x_i))\big).$$

But as $f$ must come from a restricted class of functions (that is what allows us to generalize), we just choose the one that most closely approximates the gradient of $L$. Having chosen $f$, the multiplier $\gamma$ is then selected using line search, just as shown in the second equation above.
In pseudocode, the generic gradient boosting method is:[2][7]

Input: training set $\{(x_i, y_i)\}_{i=1}^n$; a differentiable loss function $L(y, F(x))$; number of iterations $M$.

Algorithm:

1. Initialize model with a constant value:
$$F_0(x) = \arg\min_\gamma \sum_{i=1}^n L(y_i, \gamma).$$
2. For $m = 1$ to $M$:
   1. Compute so-called pseudo-residuals:
   $$r_{im} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)} \quad \text{for } i = 1, \ldots, n.$$
   2. Fit a base learner $h_m(x)$ to the pseudo-residuals, i.e. train it using the training set $\{(x_i, r_{im})\}_{i=1}^n$.
   3. Compute multiplier $\gamma_m$ by solving the following one-dimensional optimization problem:
   $$\gamma_m = \arg\min_\gamma \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)).$$
   4. Update the model:
   $$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x).$$
3. Output $F_M(x)$.
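As an illustrative sketch of the pseudocode above (not a reference implementation), note that the loss enters only through its gradient. Here the absolute-error loss $L(y, F) = |y - F|$ is plugged in as an example, with pseudo-residuals $\mathrm{sign}(y_i - F(x_i))$ and the multiplier $\gamma_m$ found by a scipy line search; all names and parameter choices are assumptions:

    # Sketch of the generic algorithm with a pluggable loss (absolute error here).
    import numpy as np
    from scipy.optimize import minimize_scalar
    from sklearn.tree import DecisionTreeRegressor

    def fit(X, y, M=100):
        gamma0 = np.median(y)          # step 1: arg min_gamma sum_i |y_i - gamma|
        f = np.full(len(y), gamma0)    # F_0(x_i) on the training set
        stages = []
        for _ in range(M):
            r = np.sign(y - f)         # step 2.1: pseudo-residuals for |y - F|
            h = DecisionTreeRegressor(max_depth=2).fit(X, r)   # step 2.2
            hx = h.predict(X)
            # step 2.3: one-dimensional line search for gamma_m
            gamma = minimize_scalar(lambda g: np.abs(y - (f + g * hx)).sum()).x
            f = f + gamma * hx         # step 2.4: F_m = F_{m-1} + gamma_m h_m
            stages.append((gamma, h))
        return gamma0, stages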

Gradient tree boosting

Gradient boosting is typically used with decision trees (especially CART trees) of a fixed size as base learners. For this special case, Friedman proposes a modification to the gradient boosting method which improves the quality of fit of each base learner.

Generic gradient boosting at the $m$-th step would fit a decision tree $h_m(x)$ to pseudo-residuals. Let $J_m$ be the number of its leaves. The tree partitions the input space into $J_m$ disjoint regions $R_{1m}, \ldots, R_{J_m m}$ and predicts a constant value in each region. Using the indicator notation, the output of $h_m(x)$ for input $x$ can be written as the sum

$$h_m(x) = \sum_{j=1}^{J_m} b_{jm} \mathbf{1}_{R_{jm}}(x),$$

where $b_{jm}$ is the value predicted in the region $R_{jm}$.[8]

Then the coefficients $b_{jm}$ are multiplied by some value $\gamma_m$, chosen using line search so as to minimize the loss function, and the model is updated as follows:

$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x), \quad \gamma_m = \arg\min_\gamma \sum_{i=1}^n L(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)).$$

Friedman proposes to modify this algorithm so that it chooses a separate optimal value $\gamma_{jm}$ for each of the tree's regions, instead of a single $\gamma_m$ for the whole tree. He calls the modified algorithm "TreeBoost". The coefficients $b_{jm}$ from the tree-fitting procedure can then be simply discarded, and the model update rule becomes:

$$F_m(x) = F_{m-1}(x) + \sum_{j=1}^{J_m} \gamma_{jm} \mathbf{1}_{R_{jm}}(x), \quad \gamma_{jm} = \arg\min_\gamma \sum_{x_i \in R_{jm}} L(y_i, F_{m-1}(x_i) + \gamma).$$
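As a sketch of the per-region idea (again with the absolute loss, for which the optimal constant in a region is the median residual there), one can fit the tree and then compute a separate $\gamma_{jm}$ per leaf. Using scikit-learn's public apply() to identify regions is an illustrative choice, not Friedman's implementation:

    # One TreeBoost step: a separate gamma_jm per leaf region R_jm, instead of
    # a single multiplier for the whole tree (shown for L(y, F) = |y - F|).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def treeboost_step(X, y, f):
        r = np.sign(y - f)                    # pseudo-residuals
        tree = DecisionTreeRegressor(max_depth=3).fit(X, r)
        leaves = tree.apply(X)                # region R_jm of each training point
        # gamma_jm = arg min_g sum_{x_i in R_jm} |y_i - (f_i + g)|, i.e. the
        # median residual within the region
        gammas = {j: np.median((y - f)[leaves == j]) for j in np.unique(leaves)}
        return f + np.array([gammas[j] for j in leaves]), tree, gammas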
Size of trees
$J$, the number of terminal nodes in trees, is the method's parameter that can be adjusted for a data set at hand. It controls the maximum allowed level of interaction between variables in the model. With $J = 2$ (decision stumps), no interaction between variables is allowed. With $J = 3$ the model may include effects of the interaction between up to two variables, and so on.

Hastie et al.[7] comment that typically $4 \le J \le 8$ works well for boosting and results are fairly insensitive to the choice of $J$ in this range; $J = 2$ is insufficient for many applications, and $J > 10$ is unlikely to be required.

Regularization
Fitting the training set too closely can lead to degradation of the model's generalization ability. Several so-called regularization techniques reduce this overfitting effect by constraining the fitting procedure.

One natural regularization parameter is the number of gradient boosting iterations $M$ (i.e. the number of trees in the model when the base learner is a decision tree). Increasing $M$ reduces the error on the training set, but setting it too high may lead to overfitting. An optimal value of $M$ is often selected by monitoring prediction error on a separate validation data set. Besides controlling $M$, several other regularization techniques are used.

Shrinkage
An important part of the gradient boosting method is regularization by shrinkage, which consists in modifying the update rule as follows:

$$F_m(x) = F_{m-1}(x) + \nu \cdot \gamma_m h_m(x), \quad 0 < \nu \le 1,$$

where the parameter $\nu$ is called the "learning rate".

Empirically it has been found that using small learning rates (such as $\nu < 0.1$) yields dramatic improvements in a model's generalization ability over gradient boosting without shrinking ($\nu = 1$).[7] However, it comes at the price of increased computational time both during training and querying: a lower learning rate requires more iterations.
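In the sketches above, shrinkage amounts to a one-line change of the update step (the value 0.1 is only an example of a small learning rate):

    # Damp each stage's contribution by a learning rate nu (illustrative value).
    nu = 0.1
    f = f + nu * gamma * hx    # instead of f = f + gamma * hx; needs a larger M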

Stochastic gradient boosting
Soon after the introduction of gradient boosting, Friedman proposed a minor modification to the algorithm, motivated by Breiman's bagging method.[3] Specifically, he proposed that at each iteration of the algorithm, a base learner should be fit on a subsample of the training set drawn at random without replacement.[9] Friedman observed a substantial improvement in gradient boosting's accuracy with this modification.

The subsample size is some constant fraction $f$ of the size of the training set. When $f = 1$, the algorithm is deterministic and identical to the one described above. Smaller values of $f$ introduce randomness into the algorithm and help prevent overfitting, acting as a kind of regularization. The algorithm also becomes faster, because regression trees have to be fit to smaller data sets at each iteration. Friedman[3] obtained that $0.5 \le f \le 0.8$ leads to good results for small and moderately sized training sets. Therefore, $f$ is typically set to 0.5, meaning that one half of the training set is used to build each base learner.

Also, like in bagging, subsampling allows one to define an out-of-bag error of the prediction performance improvement by evaluating predictions on those observations which were not used in the building of the next base learner. Out-of-bag estimates help avoid the need for an independent validation data set, but often underestimate actual performance improvement and the optimal number of iterations.[10]
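Reusing the names of the sketches above (X, y, and the pseudo-residuals r), the modification is to draw a random half of the rows, without replacement, before fitting each base learner; the fraction 0.5 follows the typical choice just described:

    # Fit each base learner on a random subsample drawn without replacement.
    rng = np.random.default_rng(0)          # fixed seed, for reproducibility
    idx = rng.choice(len(y), size=len(y) // 2, replace=False)
    h = DecisionTreeRegressor(max_depth=2).fit(X[idx], r[idx])
    # Rows not in idx are "out-of-bag" and can be used to estimate the
    # improvement from this stage without a separate validation set.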

Number of observations in leaves
Gradient tree boosting implementations often also use regularization by limiting the minimum number of observations in trees' terminal nodes (this parameter is called n.minobsinnode in the R gbm package[10]). It is used in the tree-building process by ignoring any splits that lead to nodes containing fewer than this number of training set instances.

Imposing this limit helps to reduce variance in predictions at leaves.
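For comparison, scikit-learn's gradient boosting implementation exposes the analogous constraint as min_samples_leaf (the value 10 below is an arbitrary illustration):

    # scikit-learn's analogue of n.minobsinnode: forbid splits that would
    # leave fewer than min_samples_leaf training instances in a node.
    from sklearn.ensemble import GradientBoostingRegressor
    model = GradientBoostingRegressor(min_samples_leaf=10)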

Penalize Complexity of Tree
Another useful regularization technique for gradient boosted trees is to penalize the model complexity of the learned model.[11] The model complexity can be defined as proportional to the number of leaves in the learned trees. The joint optimization of loss and model complexity corresponds to a post-pruning algorithm that removes branches that fail to reduce the loss by a threshold. Other kinds of regularization, such as an $\ell_2$ penalty on the leaf values, can also be added to avoid overfitting.
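In the xgboost library, which uses this formulation,[11] the per-leaf penalty and the $\ell_2$ penalty on leaf values correspond to the gamma and reg_lambda parameters respectively; the values below are illustrative only:

    # gamma: minimum loss reduction a split must achieve (per-leaf penalty);
    # reg_lambda: L2 penalty on leaf values. Both values are examples.
    import xgboost as xgb
    model = xgb.XGBRegressor(gamma=1.0, reg_lambda=1.0)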

Usage
Recently, gradient boosting has gained some popularity in the field of learning to rank. The commercial web search engines Yahoo[12] and Yandex[13] use variants of gradient boosting in their machine-learned ranking engines.

Names

The method goes by a variety of names. Friedman introduced his regression technique as a "Gradient Boosting Machine" (GBM).[2] Mason, Baxter et al. described the generalized abstract class of algorithms as "functional gradient boosting".[4][5]

A popular open-source implementation[10] for R calls it a "Generalized Boosting Model". Commercial implementations from Salford Systems use the names "Multiple Additive Regression Trees" (MART) and TreeNet, both trademarked.

See also
AdaBoost
Random forest
xgboost

References
1. Breiman, L. "Arcing The Edge (http://statistics.berkeley.edu/sites/default/files/tech-reports/486.pdf)" (June 1997)
2. Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine (http://www-stat.stanford.edu/~jhf/ftp/trebst.pdf)" (February 1999)
3. Friedman, J. H. "Stochastic Gradient Boosting (https://statweb.stanford.edu/~jhf/ftp/stobst.pdf)" (March 1999)
4. Mason, L.; Baxter, J.; Bartlett, P. L.; Frean, Marcus (1999). "Boosting Algorithms as Gradient Descent" (PDF). In S. A. Solla, T. K. Leen and K. Müller. Advances in Neural Information Processing Systems 12. MIT Press. pp. 512-518.
5. Mason, L.; Baxter, J.; Bartlett, P. L.; Frean, Marcus (May 1999). Boosting Algorithms as Gradient Descent in Function Space (PDF).
6. Cheng Li. "A Gentle Introduction to Gradient Boosting" (PDF).
7. Hastie, T.; Tibshirani, R.; Friedman, J. H. (2009). "10. Boosting and Additive Trees". The Elements of Statistical Learning (2nd ed.). New York: Springer. pp. 337-384. ISBN 0-387-84857-6.
8. Note: in the case of usual CART trees, the trees are fitted using least-squares loss, and so the coefficient $b_{jm}$ for the region $R_{jm}$ is equal to just the value of the output variable, averaged over all training instances in $R_{jm}$.
9. Note that this is different from bagging, which samples with replacement because it uses samples of the same size as the training set.
10. Ridgeway, Greg (2007). Generalized Boosted Models: A guide to the gbm package (https://cran.r-project.org/web/packages/gbm/gbm.pdf).
11. Tianqi Chen. Introduction to Boosted Trees (http://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf)
12. Cossock, David and Zhang, Tong (2008). Statistical Analysis of Bayes Optimal Subset Ranking (http://www.stat.rutgers.edu/~tzhang/papers/it08-ranking.pdf), page 14.
13. Yandex corporate blog entry about the new ranking model "Snezhinsk" (http://webmaster.ya.ru/replies.xml?item_no=5707&ncrnd=5118) (in Russian)
