Академический Документы
Профессиональный Документы
Культура Документы
Overfitting
Supplimentary material
www
http://dms.irb.hr/tutorial/tut_dtrees.php
http://www.cs.uregina.ca/~dbd/cs831/notes/ml
/dtrees/4_dtrees1.html
ICS320 2
Decision Tree for
PlayTennis
Attributes and their values:
Outlook: Sunny, Overcast, Rain
ICS320 3
Decision Tree for
PlayTennis
Outlook
No Yes No Yes
ICS320 4
Decision Tree for
PlayTennis
Outlook
No Yes No
ICS320 Yes 6
Decision Tree for
Conjunction
Outlook=Sunny Wind=Weak
Outlook
Wind No No
Strong Weak
No Yes
ICS320 7
Decision Tree for
Disjunction
Outlook=Sunny Wind=Weak
Outlook
No Yes No Yes
ICS320 8
Decision Tree
decision trees represent disjunctions of conjunction
Outlook
(Outlook=Sunny Humidity=Normal)
(Outlook=Overcast)
(Outlook=Rain Wind=Weak)
ICS320 9
When to consider Decision
Trees
Instances describable by attribute-value pairs
e.g Humidity: High, Normal
ICS320 10
Top-Down Induction of
Decision Trees ID3
4. A the best decision attribute for next
node
5. Assign A as decision attribute for node
6. For each value of A create new
descendant
7. Sort training examples to leaf node
according to
the attribute value of the branch
5. If all training examples are perfectly
classified (same value of target attribute)
stop, else iterate over new
ICS320leaf nodes. 11
Which Attribute is best?
ICS320 12
Entropy
Over
Sunny Rain
cast
Note: 0Log20 =0
ICS320 20
ID3 Algorithm Note: 0Log20
=0
[D1,D2,,D14] Outlook
[9+,5-]
Ssunny=[D1,D2,D8,D9,D11]
[D3,D7,D12,D13]
[D4,D5,D6,D10,D14
[2+,3-] [4+,0-] [3+,2-]
Test for this node
? Yes ?
Gain(Ssunny , Humidity)=0.970-(3/5)0.0 2/5(0.0) = 0.970
Gain(Ssunny , Temp.)=0.970-(2/5)0.0 2/5(1.0)-(1/5)0.0 =
0.570
Gain(Ssunny , Wind)=0.970= -(2/5)1.0 3/5(0.918) = 21
ICS320
ID3 Algorithm
Outlook
No Yes No Yes
coincidence
A long hypothesis that fits the data might be a
coincidence
Argument opposed:
There are many ways to define small sets of hypotheses
hypothesis
ICS320 23
Overfitting
ICS320 24
Overfitting in Decision Tree
Learning
ICS320 25
Avoid Overfitting
How can we avoid overfitting?
Stop growing when data split not statistically
significant
Grow full tree then post-prune
Minimize:
size(tree) +
size(misclassifications(tree))
ICS320 26
Converting a Tree to Rules
Outlook
ICS320 29