Classification: k-NN and Decision trees
Similarity and dissimilarity
Similarity and dissimilarity (2)
Example: three news snippets to compare.

"Parliament overwhelmingly approved amendments to the Firearms Act on Wednesday. The new law requires physicians to inform authorities of individuals they consider unfit to own guns. It also increases the age for handgun ownership from 18 to 20."

"Parliament's Committee for Constitutional Law says that persons applying for handgun licences should not be required to join a gun club in the future. Government is proposing that handgun users be part of a gun association for at least two years."

"The cabinet on Wednesday will be looking at a controversial package of proposed changes in the curriculum in the nation's comprehensive schools. The most divisive issue is expected to be the question of expanded language teaching."
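The slide does not show how such documents would actually be compared. One standard approach (an assumption here, not stated on the slide) is to count word occurrences and take the cosine of the resulting bag-of-words vectors; a minimal sketch:

## toy bag-of-words counts over a shared vocabulary (illustrative)
vocab <- c("parliament", "handgun", "curriculum", "wednesday")
d1 <- c(1, 1, 0, 1)   # firearms amendments story
d2 <- c(1, 1, 0, 0)   # gun club story
d3 <- c(0, 0, 1, 1)   # school curriculum story

## cosine similarity: near 1 for documents sharing many words
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
cosine(d1, d2)   # high: both are about gun legislation
cosine(d1, d3)   # lower: little vocabulary overlap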
Similarity and dissimilarity (3)
- Similarity: s
  - Numerical measure of the degree to which two objects are alike
  - Higher for objects that are alike
  - Typically between 0 (no similarity) and 1 (completely similar)
- Dissimilarity: d
  - Numerical measure of the degree to which two objects are different
  - Higher for objects that are different
  - Typically between 0 (no difference) and ∞ (completely different)
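The two scales are often linked by a monotone transformation; a minimal sketch, where s = 1/(1 + d) is just one common choice for turning a dissimilarity in [0, ∞) into a similarity in (0, 1]:

## Euclidean distance as a dissimilarity d in [0, Inf)
dissim <- function(x, y) sqrt(sum((x - y)^2))

## one common way to turn a dissimilarity into a similarity in (0, 1]:
## identical objects get s = 1, very different objects get s near 0
sim <- function(x, y) 1 / (1 + dissim(x, y))

dissim(c(0, 0), c(3, 4))   # 5
sim(c(0, 0), c(3, 4))      # 1/6
sim(c(1, 2), c(1, 2))      # 1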
Nearest neighbour classifier

[Figure: the nearest neighbour classifier, illustrated over three slides; figures not reproduced.]
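As a stand-in for the missing figures, here is a minimal sketch of the classifier they illustrate, using knn from R's standard class package on the built-in iris data (my choice of data, not the slides'):

library(class)

## split iris into a training and a test set
set.seed(1)
idx   <- sample(nrow(iris), 100)
train <- iris[idx, 1:4];  test <- iris[-idx, 1:4]

## classify each test point by the majority class among its k nearest
## training points (k = 1 gives the plain nearest neighbour classifier)
pred <- knn(train, test, cl = iris$Species[idx], k = 1)
mean(pred == iris$Species[-idx])   # test-set accuracy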
[Figure: a first example decision tree; the root node tests whether the text includes 'millions', with yes/no edges leading to further tests and leaf labels.]
Decision trees: A second example
- Node contents:
  - Each terminal node is assigned a prediction (here, for simplicity: a definite class label).
  - Each non-terminal node defines a test, with the outgoing edges representing the various possible results of the test (here, for simplicity: a test only involves a single feature); see the sketch below.
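One way to make this node description concrete is as a recursive data structure; a minimal sketch, where the tree and all names are illustrative rather than the slide's example:

## a non-terminal node holds a single-feature test; a terminal node
## holds a definite class label
tree <- list(feature = "x1", threshold = 2.8,
             left  = list(label = "A"),                      # x1 <= 2.8
             right = list(feature = "x2", threshold = 2.0,
                          left  = list(label = "C"),         # x2 <= 2.0
                          right = list(label = "B")))        # x2 >  2.0

## follow the edge matching each test result until a terminal node
predict_tree <- function(node, x) {
  if (!is.null(node$label)) return(node$label)
  if (x[[node$feature]] <= node$threshold)
    predict_tree(node$left, x) else predict_tree(node$right, x)
}

predict_tree(tree, list(x1 = 1.5, x2 = 3.0))  # "A"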
Learning a decision tree from data: General idea

[Figure: built up over five slides, the (x1, x2) plane is recursively partitioned by the single-feature splits x1 > 2.8, x2 > 2.0, and x1 > 4.3 into regions labelled A, B, and C.]
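The figure's procedure is greedy and top-down: at each node, try every single-feature threshold split, keep the one whose two child subsets are purest, and recurse. A minimal sketch of the split search (names are illustrative; misclassification rate is used as the impurity, though Gini or entropy work the same way):

## impurity of a label vector: fraction not in the majority class
misclass <- function(y) 1 - max(table(y)) / length(y)

## exhaustively search single-feature threshold splits "x_j <= t"
best_split <- function(X, y) {
  best <- list(score = Inf)
  for (j in seq_len(ncol(X))) {
    for (t in sort(unique(X[, j]))) {
      go_left <- X[, j] <= t
      if (all(go_left) || !any(go_left)) next
      ## size-weighted average impurity of the two children
      score <- mean(go_left)  * misclass(y[go_left]) +
               mean(!go_left) * misclass(y[!go_left])
      if (score < best$score)
        best <- list(feature = j, threshold = t, score = score)
    }
  }
  best
}

X <- data.frame(x1 = c(1, 2, 3, 4, 5), x2 = c(2, 1, 3, 1, 3))
y <- c("A", "A", "B", "C", "B")
best_split(X, y)   # picks x1 <= 2: the left child is pure "A"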
Decision tree vs linear classifier (e.g., LDA)
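The comparison figure is not reproduced here; the contrast is that a tree builds an axis-parallel, piecewise-constant boundary while LDA draws a single linear one. A minimal sketch fitting both to the built-in iris data (my choice of data, not the slide's):

library(rpart); library(MASS)

fit_tree <- rpart(Species ~ ., data = iris)
fit_lda  <- lda(Species ~ ., data = iris)

## training accuracy of each (resubstitution only, for illustration)
mean(predict(fit_tree, iris, type = "class") == iris$Species)
mean(predict(fit_lda, iris)$class == iris$Species)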
Feature test conditions

- Continuous features:
  - Same as ordinal (in practice, threshold tests of the form x_j ≤ t; see the sketch below)
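The slide leaves the mechanics implicit. A common convention (an assumption here, not stated on the slide) is to take candidate thresholds at the midpoints between consecutive distinct sorted values of the feature:

## candidate thresholds for a continuous feature: midpoints between
## consecutive distinct sorted values (a common convention)
candidate_thresholds <- function(x) {
  v <- sort(unique(x))
  (v[-length(v)] + v[-1]) / 2
}

candidate_thresholds(c(4.3, 2.8, 2.8, 1.0))  # 1.90 3.55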
Impurity measures

A key part of choosing the test for a node is measuring the purity of a subset D of the training data.

- Suppose there are K classes. Let p̂mc be the fraction of examples of class c among all examples in node m.
- Impurity measures (each is zero for a pure node and largest when all classes are equally frequent):
  - Misclassification error: 1 − max_c p̂mc
  - Gini index: Σ_c p̂mc (1 − p̂mc)
  - Entropy: −Σ_c p̂mc log2 p̂mc
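In code, all three measures are short functions of the class proportions in a node; a minimal sketch (function names are illustrative):

## class proportions p_hat in a node, then the three impurity measures
props <- function(y) as.vector(table(y)) / length(y)

misclass_error <- function(y) 1 - max(props(y))
gini           <- function(y) { p <- props(y); sum(p * (1 - p)) }
entropy        <- function(y) { p <- props(y); -sum(p * log2(p)) }

y <- c("a", "a", "a", "b")   # proportions 3/4 and 1/4
misclass_error(y)  # 0.25
gini(y)            # 0.375
entropy(y)         # ~0.811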
Impurity measures: Binary classification
[Figure: entropy/2, Gini index, and misclassification error plotted against the class-1 proportion p for a two-class node; all three are zero at p = 0 and p = 1 and peak at p = 0.5, where each reaches 0.5.]

A candidate split of D into subsets {D1, D2} is then scored by the impurity decrease Q(D) − Q({D1, D2}), where Q({D1, D2}) is the size-weighted average impurity of the subsets.
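The lost plot is straightforward to regenerate; a minimal sketch in base R, using the standard closed forms for a two-class node:

## impurity of a binary node as a function of the class-1 proportion p
p    <- seq(0.001, 0.999, length.out = 200)
ent2 <- -(p * log2(p) + (1 - p) * log2(1 - p)) / 2  # entropy scaled by 1/2
gin  <- 2 * p * (1 - p)                             # Gini index
mis  <- pmin(p, 1 - p)                              # misclassification error

plot(p, ent2, type = "l", ylim = c(0, 0.5), ylab = "impurity")
lines(p, gin, lty = 2); lines(p, mis, lty = 3)
legend("topleft", c("Entropy/2", "Gini", "Misclassification error"),
       lty = 1:3)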
Avoiding over-fitting

- A tree-building process that keeps splitting until every node has size nmax (e.g., nmax = 5) will overfit.
Reduced-error pruning

- The tree-building stage can use any of the impurity measures (misclassification rate, Gini, entropy), but in the pruning stage, Steps 4–5 almost invariably use the misclassification rate.
Cost-complexity pruning with Boston data and rpart
## set things up
library(rpart); library(MASS)
dev.new(); par(mfrow = c(1, 3))
## check out the data quickly
head(Boston)
## fit & plot tree
tt <- rpart(medv ~ ., data = Boston)
plot(tt); text(tt)
## plot cost-complexity curve
plotcp(tt)
## apply "1-SE" rule, yields cp=0.03 in this case
ttp <- prune(tt, cp = 0.03)
plot(ttp); text(ttp)
Properties of decision trees

- Nonparametric approach:
  - If the tree size is unlimited, a tree can approximate any decision boundary to arbitrary precision (like k-NN)
  - So with infinite data it could in principle always learn the optimal classifier
  - But with finite training data it needs some way of avoiding overfitting (pruning): the bias–variance tradeoff!

[Figure: repeated over nine slides, the partition of the (x1, x2) plane is refined step by step as the tree grows.]
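A minimal sketch of the overfitting point, fitting an unpruned and a pruned tree to noisy synthetic data (all settings are illustrative):

library(rpart)
set.seed(2)
n <- 200
x <- data.frame(x1 = runif(n, 0, 5), x2 = runif(n, 0, 5))
y <- factor(ifelse(x$x1 + rnorm(n) > 2.8, "A", "B"))

## grow until nodes are tiny (cp = 0), then prune back
full   <- rpart(y ~ ., data = x, cp = 0, minsplit = 2)
pruned <- prune(full, cp = 0.05)
mean(predict(full, x, type = "class") == y)    # near-perfect fit: memorises noise
mean(predict(pruned, x, type = "class") == y)  # lower training accuracy, less overfit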
Properties of decision trees (contd.)
- The textbook may give the impression that decision trees and ensemble methods belong together, but we will consider ensemble methods more generally (in a week or two)