
Decision Trees

Group Members
(in order of presentation)

Betson Thomas [Overview, ID3]
Andrew Gaun [C4.5]
Billy Anzovino [Paper: SLIQ]

References
- Overview, ID3
- http://en.wikipedia.org/wiki/Decision_Trees
- http://www.cise.ufl.edu/~ddd/cap6635/Fall97/Short papers/2.htm
- http://www.cis.temple.edu/~ingargio/cis587/readings/id3c45.html
- http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/06prop/id3/id3.html
- http://www.autonlab.org/tutorials/dtree.html
- http://www.autonlab.org/tutorials/infogain.html
- http://www.rulequest.com/see5comparison.html

What is a decision tree?
- General: a graph/model that helps make decisions.

What is a decision tree? [cont]
- Data Mining: a predictive model used to classify data.
- A set of attributes that are tested to predict an outcome [class].
- Each node [attribute] represents a choice between a number of alternatives [attribute values].
- Each leaf node represents a classification of the data.

How do we create one?
- We need data.
- CLS/ID3 requirements:
  - Attribute-value description
    - The same attributes must describe each [record], and the set of values must be fixed.
    - C4.5 handles continuous data.
  - Predefined classes
    - Attributes are not learned by the algorithm.
  - Discrete classes
    - Clearly distinct.
    - Don't want: Bread = {stiff, very stiff, a bit stiff, soft, very soft}
    - Want: Vodka = {Likes, Doesn't Like}
  - Sufficient test data
    - Need to distinguish valid patterns from chance.
Play Baseball?

Outlook   Temperature  Humidity  Wind    Play ball
Sunny     Hot          High      Weak    No
Sunny     Hot          High      Strong  No
Overcast  Hot          High      Weak    Yes
Rain      Mild         High      Weak    Yes
Rain      Cool         Normal    Weak    Yes
Rain      Cool         Normal    Strong  No
Overcast  Cool         Normal    Strong  Yes
Sunny     Mild         High      Weak    No
Sunny     Cool         Normal    Weak    Yes
Rain      Mild         Normal    Weak    Yes
Sunny     Mild         Normal    Strong  Yes
Overcast  Mild         High      Strong  Yes
Overcast  Hot          Normal    Weak    Yes
Rain      Mild         High      Strong  No

Algorithms
- CLS
- ID3
- C4.5
- C5/See5

[Figure: benchmark comparisons of C5 and C4.5 on the sleep, income, and forest datasets. Blue is C5; gray is C4.5.]

CLS
- C = training data
- Step 1: If all [records] in C are positive, then create a YES node and halt.
  If all [records] in C are negative, create a NO node and halt.
  Otherwise, select an [attribute] with values v1, ..., vn and create a decision node.
- Step 2: Partition the training [records] in C into subsets C1, C2, ..., Cn according to the values v1, ..., vn.
- Step 3: Apply the algorithm recursively to each of the sets Ci.
- Note: the trainer (the expert) decides which [attribute] to select.
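The three steps above can be sketched in Python. This is a minimal sketch, assuming binary "yes"/"no" class labels; `choose_attribute` is a hypothetical callback standing in for the expert's choice.

```python
def cls(records, choose_attribute, target="class"):
    """CLS sketch: the trainer supplies choose_attribute(records) -> attribute."""
    labels = {r[target] for r in records}
    if labels == {"yes"}:                    # Step 1: all records positive
        return "YES"
    if labels == {"no"}:                     # Step 1: all records negative
        return "NO"
    attr = choose_attribute(records)         # the expert picks the decision attribute
    node = {}
    for v in {r[attr] for r in records}:     # Step 2: partition C into C1..Cn
        subset = [r for r in records if r[attr] == v]
        node[v] = cls(subset, choose_attribute, target)  # Step 3: recurse on Ci
    return (attr, node)

# The "expert" always tests Wind first on this toy data:
data = [{"wind": "weak", "class": "yes"}, {"wind": "strong", "class": "no"}]
print(cls(data, lambda rs: "wind"))
```

ID3 keeps this same skeleton and replaces `choose_attribute` with the information-gain heuristic described next.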

ID3
- Extended from CLS.
- Adds an attribute-selection heuristic.
- Searches through the data and selects the best attribute, the one that best separates the data.
- Minimize the depth of the tree.
- Occam's Razor
  - Of two equivalent theories or explanations, all other things being equal, the simplest one is to be preferred.
- Occam's Razor for decision trees
  - "The world is inherently simple. Therefore the smallest decision tree that is consistent with the samples is the one that is most likely to identify unknown objects correctly."

ID3 Attribute Selection
- Information Gain
  - Measures how well an attribute separates data into classes.
  - Select the attribute with the highest information gain = most useful for classification = most informative = good decision.
  - To define information gain, we need entropy [from information theory = quantifying information].
- Entropy
  - Measures the amount of information in an attribute.

Entropy
- Entropy(S) = -Σ p(I) log2 p(I)
  - S = set of samples
  - I = value of the class attribute
- Example
  - S has 14 samples, 9 = YES, 5 = NO
  - Entropy(S) = -Σ p(I) log2 p(I)
    = -(p(YES) log2 p(YES) + p(NO) log2 p(NO))
    = -((9/14) log2 (9/14) + (5/14) log2 (5/14))
    = -(.643(-.637) + .357(-1.485)) = 0.94
- If S had an equal distribution, 7 = YES, 7 = NO:
  - Entropy(S) = 1, interpreted as totally random.
- If S had 14 = YES, 0 = NO:
  - Entropy(S) = 0, interpreted as perfectly classified.
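The three cases above can be checked with a few lines of Python. A minimal sketch: `entropy` takes the class counts of S directly.

```python
import math

def entropy(counts):
    """Entropy(S) = -sum p(I) * log2 p(I), over the class counts of S."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c)

print(round(entropy([9, 5]), 2))  # 0.94 -- the 9 YES / 5 NO sample
print(entropy([7, 7]))            # 1.0  -- equal split: totally random
print(entropy([14, 0]))           # 0.0  -- one class: perfectly classified
```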

Information Gain
- Gain(S, A) = information gain of S due to attribute A
- Gain(S, A) = Entropy(S) - Entropy(S, A)
- Entropy(S, A) = Σ ((|Sv|/|S|) * Entropy(Sv))
  - across all possible values v of attribute A
  - Sv = subset of S for which attribute A has value v
  - |Sv| = number of elements in Sv
  - |S| = number of elements in S

Information Gain Example
- Example using the Wind attribute from the dataset.
- Wind can be Weak or Strong, |Wind| = 14
- |Wind=Weak| = 8, YES = 6, NO = 2
- |Wind=Strong| = 6, YES = 3, NO = 3
- Entropy(S, Wind) = Σ ((|Sv|/|S|) * Entropy(Sv))
  = (8/14) Entropy(S_weak) + (6/14) Entropy(S_strong)
- Entropy(S_weak) = -((6/8) log2 (6/8) + (2/8) log2 (2/8)) = 0.811
- Entropy(S_strong) = -((3/6) log2 (3/6) + (3/6) log2 (3/6)) = 1.0
- Entropy(S, Wind) = (8/14)(0.811) + (6/14)(1.0) = 0.892
- Gain(S, Wind) = Entropy(S) - Entropy(S, Wind)
- Gain(S, Wind) = 0.94 - 0.892 = 0.048
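The Wind calculation can be replayed directly, using an entropy-over-class-counts helper of the same shape as before:

```python
import math

def entropy(counts):
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c)

e_s      = entropy([9, 5])   # Entropy(S): 9 YES / 5 NO
e_weak   = entropy([6, 2])   # Entropy(S_weak)   ~ 0.811
e_strong = entropy([3, 3])   # Entropy(S_strong) = 1.0
e_wind   = (8/14) * e_weak + (6/14) * e_strong   # Entropy(S, Wind) ~ 0.892
print(round(e_s - e_wind, 3))  # Gain(S, Wind) = 0.048
```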

ID3 [cont]
- Attribute selection heuristic, at each node:
  - Compute the information gain for each attribute.
  - Split on the attribute with the highest gain.
- Using the dataset, S = all records:
  - Gain(S, Outlook) = 0.246
  - Gain(S, Temperature) = 0.029
  - Gain(S, Humidity) = 0.151
  - Gain(S, Wind) = 0.048
- Select Outlook since it is highest; now repeat with the remaining attributes:
  - Gain(S_sunny, Humidity) = 0.970
  - Gain(S_sunny, Temperature) = 0.570
  - Gain(S_sunny, Wind) = 0.019
- Select Humidity.
- Repeat until:
  - All data is classified perfectly = all records at a node are of the same class.
  - We run out of attributes: use majority voting.
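The whole loop above — score every attribute, split on the winner, recurse, and fall back to majority voting — fits in a short sketch. This is my own compact rendering on the play-baseball table, not Quinlan's code:

```python
import math
from collections import Counter

def entropy(rows, target):
    n = len(rows)
    return sum(-(c / n) * math.log2(c / n)
               for c in Counter(r[target] for r in rows).values())

def gain(rows, attr, target):
    n = len(rows)
    rem = sum(len(sub) / n * entropy(sub, target)
              for v in {r[attr] for r in rows}
              for sub in [[r for r in rows if r[attr] == v]])
    return entropy(rows, target) - rem

def id3(rows, attrs, target):
    classes = {r[target] for r in rows}
    if len(classes) == 1:                      # perfectly classified -> leaf
        return classes.pop()
    if not attrs:                              # out of attributes -> majority vote
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return (best, {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}})

cols = ("Outlook", "Temperature", "Humidity", "Wind", "Play")
table = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
data = [dict(zip(cols, row)) for row in table]
root, branches = id3(data, list(cols[:-1]), "Play")
print(root)                  # Outlook
print(branches["Overcast"])  # Yes
```

As on the slide, the root split is Outlook, the Overcast branch is a pure Yes leaf, and the Sunny branch goes on to split on Humidity.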

What's wrong with Gain?
- Information gain is biased towards attributes with a large number of values.
- For example, if an attribute in the table is an id, then it would be chosen. This is because each branch would produce a leaf, which would cause Info_id(D) = 0. Thus, information gain would be maximal, because Gain(id) = Info(D).

Gain Ratios
- Gain ratios are used to address the bias of the ID3 algorithm.

    GainRatio_A(D) = Gain(A) / SplitInfo_A(D)

- SplitInfo is the information due to the split of D on the attribute A.
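A sketch of SplitInfo and the ratio, replaying the id-attribute pathology from the previous slide (14 records, each in its own branch):

```python
import math

def split_info(branch_sizes):
    """SplitInfo_A(D) = -sum (|Dj|/|D|) * log2(|Dj|/|D|) over the branches."""
    n = sum(branch_sizes)
    return sum(-(s / n) * math.log2(s / n) for s in branch_sizes if s)

def gain_ratio(gain, branch_sizes):
    return gain / split_info(branch_sizes)

# An id attribute sends each of the 14 records down its own branch:
print(round(split_info([1] * 14), 3))         # 3.807
print(round(gain_ratio(0.944, [1] * 14), 3))  # 0.248 -- the maximal gain is tamed
```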

Gain Ratio Example

rec  Age      Income  Student  Credit_rating  Buys_computer
r1   <=30     High    No       Fair           No
r2   <=30     High    No       Excellent      No
r3   31...40  High    No       Fair           Yes
r4   >40      Medium  No       Fair           Yes
r5   >40      Low     Yes      Fair           Yes
r6   >40      Low     Yes      Excellent      No
r7   31...40  Low     Yes      Excellent      Yes
r8   <=30     Medium  No       Fair           No
r9   <=30     Low     Yes      Fair           Yes
r10  >40      Medium  Yes      Fair           Yes
r11  <=30     Medium  Yes      Excellent      Yes
r12  31...40  Medium  No       Excellent      Yes
r13  31...40  High    Yes      Fair           Yes
r14  >40      Medium  No       Excellent      No

From Professor Anita Wasilewska's example slides

Gain Ratio Example 2
- Calculations of the gain ratio for rec:
  - I(P, N) = -(9/(9+5)) log2 (9/(9+5)) - (5/(9+5)) log2 (5/(9+5))
            = .643(0.64) + (.357)(1.49) = .944
  - I(Pi, Ni) = -(0/(0+1)) log2 (0/(0+1)) - (1/(0+1)) log2 (1/(0+1))
              = 0 (the 0·log 0 term is taken as 0) + (1)(0) = 0
  - E(rec) = I(P_r1, N_r1) + I(P_r2, N_r2) + ... = 0
  - Gain(rec) = .944 - 0 = .944
  - SplitInfo_rec(Root) = -14 * (1/14 * log2 (1/14)) = 14 * 0.271953923 = 3.807354922
  - GainRatio_rec(Root) = .944 / 3.807354922 = 0.248

Gain Ratio Example 3
- Calculations of the gain ratio for Student:
  - I(P, N) = -(9/(9+5)) log2 (9/(9+5)) - (5/(9+5)) log2 (5/(9+5))
            = .643(0.64) + (.357)(1.49) = .944
  - I(P1, N1) = -(6/(6+1)) log2 (6/(6+1)) - (1/(6+1)) log2 (1/(6+1)) = .591
  - I(P2, N2) = -(3/(3+4)) log2 (3/(3+4)) - (4/(3+4)) log2 (4/(3+4)) = .987
  - E(Student) = ((6+1)/14) * .591 + ((3+4)/14) * .987 = .296 + .493 = .789
  - Gain(Student) = .944 - .789 = .155
  - SplitInfo_Student(Root) = -7/14 log2 (7/14) - 7/14 log2 (7/14) = 1
  - GainRatio_Student(Root) = .155 / 1 = 0.155

Gain Ratio Example 4
- Calculations of the gain ratio for Income:
  - I(P, N) = -(9/(9+5)) log2 (9/(9+5)) - (5/(9+5)) log2 (5/(9+5))
            = .643(0.64) + (.357)(1.49) = .944
  - I(P1, N1) = -(2/(2+2)) log2 (2/(2+2)) - (2/(2+2)) log2 (2/(2+2)) = 1
  - I(P2, N2) = -(4/(4+2)) log2 (4/(4+2)) - (2/(4+2)) log2 (2/(4+2)) = .918
  - I(P3, N3) = -(3/(3+1)) log2 (3/(3+1)) - (1/(3+1)) log2 (1/(3+1)) = .811
  - E(Income) = ((2+2)/14) * 1 + ((4+2)/14) * .918 + ((3+1)/14) * .811
              = .286 + .393 + .232 = .911
  - Gain(Income) = .944 - .911 = .033
  - SplitInfo_Income(Root) = -4/14 log2 (4/14) - 6/14 log2 (6/14) - 4/14 log2 (4/14) = 1.557
  - GainRatio_Income(Root) = .033 / 1.557 = 0.0212
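The SplitInfo values and ratios from these three examples can be double-checked in a few lines (only the branch sizes from the buys_computer table are needed):

```python
import math

def split_info(sizes):
    n = sum(sizes)
    return sum(-(s / n) * math.log2(s / n) for s in sizes if s)

# Branch sizes per attribute in the buys_computer table:
print(round(split_info([1] * 14), 3))   # 3.807  rec: 14 singleton branches
print(round(split_info([7, 7]), 3))     # 1.0    Student: 7 yes / 7 no
print(round(split_info([4, 6, 4]), 3))  # 1.557  Income: 4 high / 6 medium / 4 low

# GainRatio = Gain / SplitInfo, using the gains computed on the slides:
print(round(0.944 / split_info([1] * 14), 3))   # 0.248
print(round(0.155 / split_info([7, 7]), 3))     # 0.155
print(round(0.033 / split_info([4, 6, 4]), 4))  # 0.0212
```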

C4.5
An extension of ID3. The algorithm is very similar, with the following differences:

- Uses GainRatio to find the attribute to split on, instead of just Gain.
- Can handle continuous attributes.
  - First the table is sorted by the continuous attribute, A. A threshold, h, from the attribute list is chosen, splitting A into A <= h and A > h. The h used is the one that maximizes the GainRatio.
- Can build with training sets with unknown attribute values.
  - Only records with defined values are considered for the gain ratio.
- Can use test data with unknown attribute values.
  - When an attribute is missing, we estimate its value by the probability of the various results.
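The threshold search can be sketched as follows. The humidity readings are made-up illustration data, and trying every observed value as h is a simplification of C4.5's actual candidate selection:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try each observed value h as a split A <= h / A > h; keep the best GainRatio."""
    base, n = entropy(labels), len(labels)
    best_h, best_ratio = None, -1.0
    for h in sorted(set(values))[:-1]:          # the maximum value cannot split
        left  = [l for v, l in zip(values, labels) if v <= h]
        right = [l for v, l in zip(values, labels) if v > h]
        rem   = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        split = entropy(["<="] * len(left) + [">"] * len(right))   # SplitInfo
        ratio = (base - rem) / split
        if ratio > best_ratio:
            best_h, best_ratio = h, ratio
    return best_h, best_ratio

# Made-up humidity readings with a clean class break after 80:
h, r = best_threshold([65, 70, 75, 80, 85, 90, 95],
                      ["Yes", "Yes", "Yes", "Yes", "No", "No", "No"])
print(h)  # 80
```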

Mean and Variance
- Mean and variance for a Bernoulli trial: p, p(1-p)
- Expected success rate: f = S/N
- Mean and variance for f: p, p(1-p)/N
- For large enough N, f follows a Normal distribution.
- The c% confidence interval [-z <= X <= z] for a random variable with mean 0 is given by:

    Pr[-z <= X <= z] = c

- With a symmetric distribution:

    Pr[-z <= X <= z] = 1 - 2 * Pr[X >= z]

Taken from Dr. Gregory Piatetsky-Shapiro's slides

Transforming f
- Transformed value for f (i.e. subtract the mean and divide by the standard deviation):

    (f - p) / sqrt( p(1-p)/N )

- Resulting equation:

    Pr[ -z <= (f - p) / sqrt( p(1-p)/N ) <= z ] = c

- Solving for p:

    p = ( f + z^2/(2N) +/- z * sqrt( f/N - f^2/N + z^2/(4N^2) ) ) / ( 1 + z^2/N )

Taken from Dr. Gregory Piatetsky-Shapiro's slides

C4.5 Methods
- The error estimate for a subtree is the weighted sum of the error estimates for all its leaves.
- Error estimate for a node (upper bound):

    e = ( f + z^2/(2N) + z * sqrt( f/N - f^2/N + z^2/(4N^2) ) ) / ( 1 + z^2/N )

- If c = 25%, then z = 0.69 (from the normal distribution).
- f is the error on the training data.
- N is the number of instances covered by the leaf.

Taken from Dr. Gregory Piatetsky-Shapiro's slides
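The bound is easy to evaluate. A sketch; note that this normal-approximation estimate is not identical to the exact binomial U_CF used on the following slides, so the numbers differ:

```python
import math

def error_estimate(f, N, z=0.69):
    """Upper confidence bound on a node's error rate; z = 0.69 for c = 25%."""
    return (f + z * z / (2 * N)
            + z * math.sqrt(f / N - f * f / N + z * z / (4 * N * N))) \
           / (1 + z * z / N)

# A leaf with no training errors over 6 cases still gets a nonzero estimate,
# and the estimate grows as the leaf covers fewer instances:
print(round(error_estimate(0.0, 6), 3))  # 0.074
print(error_estimate(0.0, 1) > error_estimate(0.0, 6))  # True
```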

C4.5 Tree Error
- Estimated error of each node of the tree:
  - If a leaf: error = N * U_CF(E, N)
  - Otherwise: sum of the subtrees' errors
- N = number of instances covered by the node
- E = number of incorrectly classified instances
- U_CF = upper confidence bound of the binomial distribution
- CF = confidence level

C4.5 Pruning

Color  Class
red    1
red    1
red    1
red    1
red    1
red    1
white  2
blue   1
blue   1
blue   1
blue   1
blue   1
blue   1
blue   1
blue   1
blue   1

- When pruning, start at the bottom.
- If we split on Color we get 3 leaves:
  - On the red leaf, class 1 with N = 6, E = 0: U25(0, 6) = .206
  - On the white leaf, class 2 with N = 1, E = 0: U25(0, 1) = .750
  - On the blue leaf, class 1 with N = 9, E = 0: U25(0, 9) = .143

Based on numbers from J.R. Quinlan's slides

C4.5 Pruning, Part 2
- When we consider replacing a node, we first calculate the error of its subtree and compare it with the error if we replace it with a leaf.
- The error of the subtree is: 6 * .206 + 1 * .750 + 9 * .143 = 3.273
- The error if replaced: 16 * U25(1, 16) = 16 * .157 = 2.512
- So we replace the subtree with a leaf.
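The comparison above, in code. The U25 values are the exact binomial bounds quoted from Quinlan's slides, hard-coded here rather than computed:

```python
# Exact binomial upper bounds U25(E, N) quoted on the slides:
u25 = {(0, 6): 0.206, (0, 1): 0.750, (0, 9): 0.143, (1, 16): 0.157}

# Error of the Color subtree: sum over its leaves of N * U25(E, N).
subtree_error = 6 * u25[(0, 6)] + 1 * u25[(0, 1)] + 9 * u25[(0, 9)]
# Error if the subtree is replaced by one leaf (16 cases, 1 misclassified).
leaf_error = 16 * u25[(1, 16)]

print(round(subtree_error, 3), round(leaf_error, 3))  # 3.273 2.512
print(leaf_error < subtree_error)                     # True -> prune to a leaf
```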

C5.0/See5
Ross Quinlan, creator of ID3 and C4.5, went on to create a commercial improvement on these algorithms, called C5.0 for Unix/Linux and See5 for Windows.

- Speed: C5.0 is orders of magnitude faster than C4.5.
- Memory usage: C5.0 is more memory efficient than C4.5.
- Smaller decision trees: C5.0 gets similar results to C4.5 with considerably smaller decision trees.
- Support for boosting: boosting improves the trees and gives them more accuracy.
- Weighting: C5.0 allows you to weight different attributes and misclassification types.
- Winnowing: C5.0 automatically winnows the data to help reduce noise.

Source: Wikipedia

References
- Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993.
- Quinlan, J. R. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4:77-90, 1996.
- Quinlan, J. R. C4.5 and Beyond. www.cs.uvm.edu/~xwu/kdd/Slides/C4.5byRossQuinlanforICDM06.pdf, 2006.
- Wasilewska, Anita: Decision Tree Examples. www.cs.sunysb.edu/~cse634/examplesdtree.pdf
- Wikipedia: C4.5 algorithm. http://en.wikipedia.org/wiki/C4.5_algorithm
- Piatetsky-Shapiro, Gregory: Machine Learning in the Real World: C4.5. http://www.kdnuggets.com/data_mining_course/