TMI 4013 Data Mining

Revision
Question 1
Construct a classification model using the simple
naïve Bayes method using the dataset below.
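
The dataset for this question is not reproduced here, so the following is only a minimal sketch of a simple naïve Bayes classifier on a small made-up categorical dataset. The attribute values, class labels and the function names `train_naive_bayes` and `predict` are all illustrative assumptions, and no smoothing is applied.

```python
from collections import Counter

def train_naive_bayes(rows, labels):
    """Estimate class priors P(C) and conditional probabilities P(value | C)."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}
    # cond_counts[(attribute index, value, class)] = number of matching rows
    cond_counts = Counter()
    for row, c in zip(rows, labels):
        for i, value in enumerate(row):
            cond_counts[(i, value, c)] += 1

    def likelihood(i, value, c):
        # Relative frequency of the value within class c (no smoothing).
        return cond_counts[(i, value, c)] / class_counts[c]

    return priors, likelihood

def predict(priors, likelihood, row):
    """Score each class as P(C) * product of P(value | C); return the best class."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, value in enumerate(row):
            score *= likelihood(i, value, c)
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical toy data: (outlook, humidity) -> play
rows = [("sunny", "high"), ("sunny", "low"), ("rainy", "high"), ("rainy", "low")]
labels = ["no", "yes", "no", "yes"]

priors, likelihood = train_naive_bayes(rows, labels)
best, scores = predict(priors, likelihood, ("sunny", "low"))
print(best, scores)
```

For the actual question, the same steps apply: count class frequencies for the priors, count attribute values per class for the conditional probabilities, then multiply them for the record to be classified.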
Question 2
a) Calculate the Gini index of impurity for every
descriptive attribute in the dataset given below.

b) Which attribute should be selected as the root of
the tree according to the Gini index?

$$\mathrm{GINI}(t) = 1 - \sum_{i=1}^{w} [p(C_i \mid t)]^2$$

$$\mathrm{GiniIndex}(A) = \mathrm{Gini}(\mathrm{Class}) - \sum_{j=1}^{m} P(a_j) \times \mathrm{Gini}(A = a_j)$$
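
The two Gini formulas can be sketched in code as follows. The dataset for this question is not reproduced here, so the attribute values and class labels below are made up for illustration, and the function names `gini` and `gini_index` are assumptions. Under the definition above, GiniIndex(A) is the reduction in impurity obtained by splitting on A, so a larger value indicates a better candidate for the root.

```python
from collections import Counter

def gini(labels):
    """GINI(t) = 1 - sum over classes of p(C_i | t)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_index(values, labels):
    """GiniIndex(A) = Gini(Class) - sum over values a_j of P(a_j) * Gini(A = a_j)."""
    n = len(labels)
    weighted = 0.0
    for v in set(values):
        subset = [c for a, c in zip(values, labels) if a == v]
        weighted += len(subset) / n * gini(subset)
    return gini(labels) - weighted

# Hypothetical attribute column and class column (not from the question).
outlook = ["sunny", "sunny", "rainy", "rainy"]
play = ["no", "no", "yes", "yes"]

print(gini(play))                 # impurity of the class column
print(gini_index(outlook, play))  # impurity reduction from splitting on outlook
```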
Question 3
RegNo  Body Weight (kg)  Body Height (cm)  Distance to Centroid 1  Distance to Centroid 2  Cluster Tag
B1     65                178               8.1                     2.4                     2
B2     70                179               3.4                     3.6                     1
B3     75                182               2.7                     9.5                     1
B4     72                180               1.5                     5.9                     1
B5     68                175               7.8                     2.0                     2

a) Given the data in the table, determine which
cluster is a better cluster by calculating the Sum
of Squared Errors (SSE) for both Cluster 1 and
Cluster 2.

$$\mathrm{SSE}(C_k) = \sum_{x \in C_k} d(x, r_k)^2$$
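
The SSE calculation can be sketched as follows. It assumes that the "Distance to Centroid" columns give d(x, r_k) and that the Cluster Tag column gives each record's cluster membership; the function name `sse` is illustrative.

```python
# (RegNo, distance to centroid 1, distance to centroid 2, cluster tag), from the table
records = [
    ("B1", 8.1, 2.4, 2),
    ("B2", 3.4, 3.6, 1),
    ("B3", 2.7, 9.5, 1),
    ("B4", 1.5, 5.9, 1),
    ("B5", 7.8, 2.0, 2),
]

def sse(records, k):
    """Sum of squared distances from the members of cluster k to centroid k."""
    # rec[k] is the distance to centroid k (k = 1 or 2); rec[3] is the cluster tag.
    return sum(rec[k] ** 2 for rec in records if rec[3] == k)

sse1 = sse(records, 1)  # members B2, B3, B4
sse2 = sse(records, 2)  # members B1, B5
print(round(sse1, 2), round(sse2, 2))  # approximately 21.10 and 9.76
```

A smaller SSE means the members lie closer to their centroid.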
Question 3 (cont…)
b) Calculate the within-cluster variation for both
Cluster 1 and Cluster 2. Then, calculate the sum
of within-cluster variation.

$$wc(C_k) = \frac{1}{|C_k|}\,\mathrm{SSE}(C_k)$$

$$WC = \sum_{k=1}^{K} wc(C_k)$$
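
A minimal sketch of the within-cluster variation, assuming the formula is read as SSE(C_k) divided by the number of records in cluster k, and taking the distances of each cluster's members to their own centroid from the table. The names `distances` and `within_cluster_variation` are illustrative.

```python
# Distances of each cluster's members to their own centroid, from the table.
distances = {
    1: [3.4, 2.7, 1.5],  # B2, B3, B4 -> centroid 1
    2: [2.4, 2.0],       # B1, B5     -> centroid 2
}

def within_cluster_variation(dists):
    """wc(C_k) = SSE(C_k) / |C_k|, with |C_k| assumed to be the cluster size."""
    return sum(d ** 2 for d in dists) / len(dists)

wc = {k: within_cluster_variation(d) for k, d in distances.items()}
WC = sum(wc.values())  # sum of within-cluster variation over all clusters
print(wc, WC)
```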

c) Given that the centroids for Cluster 1 and
Cluster 2 are (72.3, 181.5) and (67.1, 176.8)
respectively, calculate the sum of squared
distances between cluster centroids for Cluster 1
and Cluster 2.

$$BC = \sum_{1 \le j < k \le K} d(r_j, r_k)^2$$
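
The between-cluster term can be sketched as follows. The question does not name the distance function d, so Euclidean distance is assumed here; the helper name `squared_distance` is illustrative.

```python
from itertools import combinations

# Centroids (weight in kg, height in cm) as given in the question.
centroids = {1: (72.3, 181.5), 2: (67.1, 176.8)}

def squared_distance(p, q):
    """Squared Euclidean distance between two points (assumed choice of d)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Sum over all pairs j < k; with K = 2 there is a single pair.
BC = sum(
    squared_distance(centroids[j], centroids[k])
    for j, k in combinations(sorted(centroids), 2)
)
print(round(BC, 2))  # approximately 49.13
```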

d) Calculate the overall cluster quality (Q).

$$Q = \frac{BC}{WC}$$
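
As a rough sketch, Q can be computed once BC and WC are known. The inputs below are recomputed from the table and the given centroids, assuming Euclidean distance between centroids and a within-cluster variation equal to SSE divided by cluster size; the function name `cluster_quality` is illustrative.

```python
def cluster_quality(bc, wc):
    """Q = BC / WC: between-cluster separation relative to within-cluster variation."""
    if wc == 0:
        raise ValueError("WC must be non-zero")
    return bc / wc

# BC from the two centroids (72.3, 181.5) and (67.1, 176.8), Euclidean distance assumed.
bc = (72.3 - 67.1) ** 2 + (181.5 - 176.8) ** 2

# WC from the members' distances to their own centroid (SSE divided by cluster size).
wc_total = (3.4 ** 2 + 2.7 ** 2 + 1.5 ** 2) / 3 + (2.4 ** 2 + 2.0 ** 2) / 2

Q = cluster_quality(bc, wc_total)
print(round(Q, 3))
```

Under this definition, a larger Q corresponds to clusters that are far apart relative to their internal spread.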
All the best for your finals!
