
IS328 Data Mining

Decision Trees and Classification

Tutorial 10 Exercises

SECTION A - MULTIPLE CHOICE QUESTIONS

Q1 Which of the following is not a decision tree node?


A. Root node.
B. Predicted node.
C. Leaf node.
D. Internal node.

Q2 Which of the following is an unsupervised learning?


A.     K Nearest Neighbour
B.     Decision Tree
C.     Clustering
D.     Naive Bayesian
E.     None of the above

Q3  A classification problem involves four attributes plus a class. The attributes have 3, 2, 2,
and 3 possible values each. The class has 2 possible values. How many possible different
examples can be there?
A.     10
B.     24
C.     36
D.     72
E.     96

Q4 The power of a self-learning system lies in __________.


A. cost.
B. speed.
C. accuracy.
D. simplicity.

Q5 How many internal nodes are there in the tree below:

A. 4
B. 5
C. 2
D. 3

Use the three-class confusion matrix below to answer questions (6) through
(10).

                    Computed Decision
                 Class 1   Class 2   Class 3
Actual Class 1      10         5         3
Actual Class 2       5        15         5
Actual Class 3       2         4        11

Q6 What percent of the instances were correctly classified?


A 40%
B 45%
C 50%
D 60%

Q7 How many class 2 instances are in the dataset?


A 20
B 23
C 25
D 30

Q8 How many instances were incorrectly classified with class 3?


A 5
B 8
C 10
D 12

Q9 What is the error rate of the model?


A 40%
B 45%
C 50%
D 60%

Q10 Which class instances were classified with the least error rate?
A Class 1
B Class 2
C Class 3
D Class 1 and Class 2

Answers to MCQs

Q1 – B
Q2 – C
Q3 – D
Q4 - C
Q5 – C
Q6 - D
Q7 - C
Q8 - B
Q9 - A
Q10 - C

Explanations for Q6-Q10.

Q6 What percent of the instances were correctly classified?


Total number of instances = 10+5+3+5+15+5+2+4+11 = 60

Answer = (10 + 15 + 11) / 60 = 36/60 = 60%

Q7 How many class 2 instances are in the dataset?



Answer = (5 + 15+ 5) = 25

Q8 How many instances were incorrectly classified with class 3?


Answer = 3 + 5 = 8

Q9 What is the error rate of the model?


Answer = 100 - Accuracy = 100 – 60 = 40%

Q10 Which class instances were classified with the least error rate?
Error rate for Class 1 = (5 + 3) / 18 = 8/18 = 0.444 = 44.4%
Error rate for Class 2 = (5 + 5) / 25 = 10/25 = 0.400 = 40.0%
Error rate for Class 3 = (2 + 4) / 17 = 6/17 = 0.353 = 35.3%

Therefore, Class 3 instances were classified with the least error rate.
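The arithmetic behind Q6-Q10 can be checked with a short Python sketch (the matrix is hard-coded from the table above; variable names are illustrative, not part of the tutorial):

```python
# Confusion matrix from the tutorial: rows = actual class, columns = computed decision.
cm = [
    [10, 5, 3],   # actual Class 1
    [5, 15, 5],   # actual Class 2
    [2, 4, 11],   # actual Class 3
]

total = sum(sum(row) for row in cm)           # 60 instances in the dataset
correct = sum(cm[i][i] for i in range(3))     # diagonal: 10 + 15 + 11 = 36
accuracy = correct / total                    # 0.6 -> 60% (Q6)
error_rate = 1 - accuracy                     # 0.4 -> 40% (Q9)

# Per-class error rate: misclassified instances of a class / total in that class (Q10).
class_error = [(sum(cm[i]) - cm[i][i]) / sum(cm[i]) for i in range(3)]

print(f"accuracy = {accuracy:.0%}, error rate = {error_rate:.0%}")
print([f"{e:.1%}" for e in class_error])      # Class 3 has the smallest error rate
```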

SECTION B – Decision Trees

EX1: Information Gain Calculations

Calculate the information gain for the attribute splits shown in figures (a) and (b).

[Figures for splits (a) and (b) not reproduced here.]

EX2: Determine the root node of the decision tree for the following
data set. Show your calculations.

Example   Hour    Weather   Accident   Stall   Commute (Target)

D1        8 AM    Sunny     No         No      Long
D2        8 AM    Cloudy    No         Yes     Long
D3        10 AM   Sunny     No         No      Short
D4        9 AM    Rainy     Yes        No      Long
D5        9 AM    Sunny     Yes        Yes     Long
D6        10 AM   Sunny     No         No      Short
D7        10 AM   Cloudy    No         No      Short
D8        9 AM    Rainy     No         No      Medium
D9        9 AM    Sunny     Yes        No      Long
D10       10 AM   Cloudy    Yes        Yes     Long
D11       10 AM   Rainy     No         No      Short
D12       8 AM    Cloudy    Yes        No      Long
D13       9 AM    Sunny     No         No      Medium

• For the 13 Training instances


– 7 : Long
– 4 : Short
– 2 : Medium
• Entropy:
H(T) = - (7/13) log2(7/13) - (4/13) log2(4/13) - (2/13) log2(2/13)

H(T) = 0.4809 + 0.5232 + 0.4155 = 1.4196
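This entropy value can be reproduced numerically; a minimal sketch using Python's math.log2:

```python
from math import log2

# Class counts for the target Commute: 7 Long, 4 Short, 2 Medium (13 instances).
counts = [7, 4, 2]
total = sum(counts)

# H(T) = -sum p_i * log2(p_i) over the three classes.
H = -sum((c / total) * log2(c / total) for c in counts)
print(round(H, 4))  # 1.4196
```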

Given our commute time sample set, we can calculate the entropy of each attribute at the root
node

• Gain(T, Hour)= ?
– 8am=3 (3 Long, 0 Short, 0 Medium)
– 9am=5 (3 Long, 0 Short, 2 Medium)
– 10am=5 (1 Long, 4 Short, 0 Medium)

– Entropy(T8am) = - (3/3) log2 (3/3) = 0
– Entropy(T9am) = - (3/5) log2 (3/5) - (2/5) log2 (2/5)
– Entropy(T10am ) = - (1/5) log2 (1/5) - (4/5) log2 (4/5)
– Gain(T, Hour) = Entropy(T)- ((P(8am)Entropy(T8am) +
P(9am) Entropy(T9am)+ P(10am) Entropy(T10am) )
= Entropy(T) - ((3/13)Entropy(T8am)+(5/13)Entropy(T9am)+
(5/13)Entropy(T10am))

Gain(T, Hour) = 0.768449

• Gain(T, Weather)= ?
– Sunny=6 (3 Long, 2 Short, 1 Medium)
– Cloudy=4 (3 Long, 1 Short, 0 Medium)
– Rainy=3 (1 Long, 1 Short, 1 Medium)

– Entropy(Tsunny) = - (3/6) log2 (3/6) - (2/6) log2 (2/6) - (1/6) log2 (1/6)
– Entropy(Tcloudy) = - (3/4) log2 (3/4) - (1/4) log2 (1/4)
– Entropy(Trainy ) = - (1/3) log2 (1/3) - (1/3) log2 (1/3) - (1/3) log2 (1/3)
– Gain(T, Weather) = Entropy(T)- ((P(Sunny)Entropy(Tsunny) +
P(Cloudy) Entropy(Tcloudy)+ P(Rainy) Entropy(Trainy) )
= Entropy(T) - ((6/13)Entropy(Tsunny)+(4/13)Entropy(Tcloudy)+
(3/13)Entropy(Trainy))

Gain(T, Weather) = 0.130719

• Gain(T, accident)= ?
– Yes=5 (5 Long)
– No=8 (2 Long, 4 Short, 2 Medium)

– Entropy(Tyes) = - (5/5) log2 (5/5) = 0

– Entropy(Tno) = - (2/8) log2 (2/8) - (4/8) log2 (4/8) - (2/8) log2 (2/8) = 1.5
– Gain(T, accident) = Entropy(T)- ((P(yes)Entropy(Tyes) +
P(no) Entropy(Tno))
=Entropy(T) - ((5/13)Entropy(Tyes)+(8/13)Entropy(Tno))
Gain(T, accident)= 0.496479

• Gain(T, stall)= ?
– Yes=3 (3 Long)
– No=10 (4 Long, 4 Short, 2 Medium)

– Entropy(Tyes) = - (3/3) log2 (3/3) = 0
– Entropy(Tno) = - (4/10) log2 (4/10) - (4/10) log2 (4/10) - (2/10) log2 (2/10)
– Gain(T, stall) = Entropy(T)- ((P(yes)Entropy(Tyes) +
P(no) Entropy(Tno))
= Entropy(T) - ((3/13)Entropy(Tyes)+(10/13)Entropy(Tno))

Gain(T, stall)= 0.248842

Attribute    Information Gain

Hour         0.768449
Weather      0.130719
Accident     0.496479
Stall        0.248842

Therefore, the root node is “Hour”, the attribute with the highest information gain.
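All four gains can be reproduced with a small ID3-style sketch (the training table is hard-coded from EX2; the entropy/gain helper names are my own):

```python
from collections import Counter
from math import log2

# Training set from EX2: (Hour, Weather, Accident, Stall, Commute).
rows = [
    ("8 AM",  "Sunny",  "No",  "No",  "Long"),
    ("8 AM",  "Cloudy", "No",  "Yes", "Long"),
    ("10 AM", "Sunny",  "No",  "No",  "Short"),
    ("9 AM",  "Rainy",  "Yes", "No",  "Long"),
    ("9 AM",  "Sunny",  "Yes", "Yes", "Long"),
    ("10 AM", "Sunny",  "No",  "No",  "Short"),
    ("10 AM", "Cloudy", "No",  "No",  "Short"),
    ("9 AM",  "Rainy",  "No",  "No",  "Medium"),
    ("9 AM",  "Sunny",  "Yes", "No",  "Long"),
    ("10 AM", "Cloudy", "Yes", "Yes", "Long"),
    ("10 AM", "Rainy",  "No",  "No",  "Short"),
    ("8 AM",  "Cloudy", "Yes", "No",  "Long"),
    ("9 AM",  "Sunny",  "No",  "No",  "Medium"),
]

def entropy(labels):
    """H = -sum p * log2(p) over the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    """Information gain of splitting on column `col` (target is the last column)."""
    g = entropy([r[-1] for r in rows])
    for v in set(r[col] for r in rows):
        subset = [r[-1] for r in rows if r[col] == v]
        g -= len(subset) / len(rows) * entropy(subset)
    return g

for i, name in enumerate(["Hour", "Weather", "Accident", "Stall"]):
    print(name, round(gain(rows, i), 6))
# Hour 0.768449, Weather 0.130719, Accident 0.496479, Stall 0.248842
```

The attribute with the largest gain (Hour) is chosen as the root.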

EX3: Consider the following decision tree and answer the question below:

Calculate the accuracy of the above decision tree model using the test data
given below.

TEST DATA

OUTLOOK   TEMP   HUMIDITY   WINDY   PLAY

Sunny     Hot    High       False   No
Sunny     Hot    High       True    No
Sunny     Mild   High       False   Yes
Sunny     Cool   Normal     True    Yes
Sunny     Mild   Normal     True    No
Rainy     Hot    High       False   No
Rainy     Hot    High       True    No
Rainy     Mild   High       False   Yes
Rainy     Cool   Normal     True    Yes
Rainy     Mild   Normal     True    No

What is the accuracy of this model?

Total no. of test instances = 10
No. of correct classifications = 6
Accuracy = 6/10 = 60%
