Task:
– Learn a model that maps each attribute set x
into one of the predefined class labels y
Training Set (excerpt):

Tid  Attrib1  Attrib2  Attrib3  Class
3    No       Small    70K      No
6    No       Medium   60K      No

Apply the learned model to the Test Set (excerpt):

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
15   No       Large    67K      ?
Base Classifiers
– Decision Tree based Methods
– Rule-based Methods
– Nearest-neighbor
– Neural Networks
– Naïve Bayes and Bayesian Belief Networks
– Support Vector Machines
Ensemble Classifiers
– Boosting, Bagging, Random Forests
(Attribute types: categorical, categorical, continuous, class)
Splitting Attributes

ID  Home Owner  Marital Status  Annual Income  Defaulted Borrower
1   Yes         Single          125K           No
2   No          Married         100K           No
3   No          Single          70K            No
4   Yes         Married         120K           No
5   No          Divorced        95K            Yes
6   No          Married         60K            No
7   Yes         Divorced        220K           No
8   No          Single          85K            Yes
9   No          Married         75K            No
10  No          Single          90K            Yes

Decision tree:

Home Owner
  Yes -> NO
  No  -> MarSt
           Single, Divorced -> Income
                                 < 80K -> NO
                                 > 80K -> YES
           Married -> NO
Another tree for the same training data:

MarSt
  Married          -> NO
  Single, Divorced -> Home Owner
                        Yes -> NO
                        No  -> Income
                                 < 80K -> NO
                                 > 80K -> YES

There could be more than one tree that fits the same data!
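The two trees above can be checked against the ten training records directly. The sketch below (not from the slides) encodes each tree as plain if/else rules and verifies that both fit the data perfectly:

```python
# The loan data from the slide, as (home_owner, marital_status, income_K, defaulted).
records = [
    ("Yes", "Single",   125, "No"),
    ("No",  "Married",  100, "No"),
    ("No",  "Single",    70, "No"),
    ("Yes", "Married",  120, "No"),
    ("No",  "Divorced",  95, "Yes"),
    ("No",  "Married",   60, "No"),
    ("Yes", "Divorced", 220, "No"),
    ("No",  "Single",    85, "Yes"),
    ("No",  "Married",   75, "No"),
    ("No",  "Single",    90, "Yes"),
]

def tree_a(owner, marital, income):
    """First tree: split on Home Owner, then Marital Status, then Income."""
    if owner == "Yes":
        return "No"
    if marital in ("Single", "Divorced"):
        return "Yes" if income >= 80 else "No"
    return "No"  # Married

def tree_b(owner, marital, income):
    """Second tree: split on Marital Status, then Home Owner, then Income."""
    if marital == "Married":
        return "No"
    if owner == "Yes":
        return "No"
    return "Yes" if income >= 80 else "No"

for tree in (tree_a, tree_b):
    assert all(tree(o, m, i) == y for o, m, i, y in records)
print("both trees fit the training data")
```

Both functions agree on every record, which is exactly the point: the training data alone does not determine a unique tree.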
Many Algorithms:
– Hunt’s Algorithm (one of the earliest)
– CART
– ID3, C4.5
– SLIQ, SPRINT
Hunt's algorithm grows the tree one split at a time:

(c) Marital Status
      Single, Divorced -> Defaulted = Yes
      Married          -> Defaulted = No

(d) Marital Status
      Single, Divorced -> Annual Income
                            < 80K  -> Defaulted = No
                            >= 80K -> Defaulted = Yes
      Married          -> Defaulted = No
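Hunt's recursive scheme can be sketched in a few lines: if all records at a node share one class, emit a leaf; otherwise apply an attribute test and recurse on each subset. This is an illustrative simplification (the test order is fixed by hand rather than chosen by an impurity measure), and the attribute names are made up for the example:

```python
def hunt(records, tests):
    """Grow a tree over `records` (list of (features_dict, label)) using the
    ordered list of (name, predicate) attribute `tests`."""
    labels = [y for _, y in records]
    if len(set(labels)) == 1:                  # pure node -> leaf
        return labels[0]
    if not tests:                              # no tests left -> majority class
        return max(set(labels), key=labels.count)
    (name, pred), rest = tests[0], tests[1:]
    left = [(x, y) for x, y in records if pred(x)]
    right = [(x, y) for x, y in records if not pred(x)]
    if not left or not right:                  # test does not split -> skip it
        return hunt(records, rest)
    return {name: {True: hunt(left, rest), False: hunt(right, rest)}}

# The ten loan records from the earlier slide.
data = [
    ({"owner": "Yes", "marital": "Single",   "income": 125}, "No"),
    ({"owner": "No",  "marital": "Married",  "income": 100}, "No"),
    ({"owner": "No",  "marital": "Single",   "income": 70},  "No"),
    ({"owner": "Yes", "marital": "Married",  "income": 120}, "No"),
    ({"owner": "No",  "marital": "Divorced", "income": 95},  "Yes"),
    ({"owner": "No",  "marital": "Married",  "income": 60},  "No"),
    ({"owner": "Yes", "marital": "Divorced", "income": 220}, "No"),
    ({"owner": "No",  "marital": "Single",   "income": 85},  "Yes"),
    ({"owner": "No",  "marital": "Married",  "income": 75},  "No"),
    ({"owner": "No",  "marital": "Single",   "income": 90},  "Yes"),
]
tests = [
    ("Home Owner = Yes", lambda x: x["owner"] == "Yes"),
    ("Married",          lambda x: x["marital"] == "Married"),
    ("Income < 80K",     lambda x: x["income"] < 80),
]
tree = hunt(data, tests)
print(tree)
```

With this test order the result reproduces the tree from the earlier slide: Home Owner at the root, then Marital Status, then Income.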
Greedy approach:
– Nodes with purer class distribution are preferred
– E.g., a node with counts C0: 9, C1: 1 is preferred over one with C0: 5, C1: 5
Gini Index

    GINI(t) = 1 - \sum_j [p(j|t)]^2

Entropy

    Entropy(t) = -\sum_j p(j|t) \log p(j|t)

Misclassification error

    Error(t) = 1 - \max_i P(i|t)
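The three impurity measures can be computed directly from a node's class counts. A minimal sketch (log base 2 is assumed for the entropy, as is common):

```python
from math import log2

def impurities(counts):
    """Return (gini, entropy, misclassification error) for a node whose
    class counts are given, e.g. [9, 1] for 9 records of C0 and 1 of C1."""
    n = sum(counts)
    ps = [c / n for c in counts]
    gini    = 1 - sum(p * p for p in ps)
    entropy = -sum(p * log2(p) for p in ps if p > 0)
    error   = 1 - max(ps)
    return gini, entropy, error

print(impurities([5, 5]))   # maximally impure: (0.5, 1.0, 0.5)
print(impurities([9, 1]))   # purer node: lower on all three measures
print(impurities([10, 0]))  # pure node: all three measures are zero
```

All three measures are maximal for the 50/50 node and zero for a pure node, matching the "purer is preferred" criterion above.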
Advantages:
– Inexpensive to construct
– Extremely fast at classifying unknown records
– Easy to interpret for small-sized trees
– Accuracy is comparable to other classification
techniques for many simple data sets
Basic idea:
– If it walks like a duck and quacks like a duck, then it's probably a duck
– Compute the distance from the test record to the training records, and classify it using its nearest neighbors
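A minimal k-nearest-neighbor sketch of that idea, assuming Euclidean distance and majority vote (the data points and labels are made up for illustration):

```python
from math import dist
from collections import Counter

def knn_predict(train, x, k=3):
    """train: list of (point, label) pairs; x: test point (tuple)."""
    # Sort training records by distance to x and keep the k closest.
    neighbors = sorted(train, key=lambda pair: dist(pair[0], x))[:k]
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((1, 1), "duck"), ((1, 2), "duck"), ((2, 1), "duck"),
         ((8, 8), "goose"), ((9, 8), "goose"), ((8, 9), "goose")]
print(knn_predict(train, (1.5, 1.5)))  # -> duck
print(knn_predict(train, (9, 9)))      # -> goose
```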
Conditional probability:

    P(Y|X) = P(X,Y) / P(X)
    P(X|Y) = P(X,Y) / P(Y)

Bayes theorem:

    P(Y|X) = P(X|Y) P(Y) / P(X)
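A worked instance of Bayes' theorem, with made-up numbers for illustration; P(X) is expanded by the law of total probability:

```python
p_y = 0.2             # prior P(Y)              (assumed for the example)
p_x_given_y = 0.9     # likelihood P(X|Y)       (assumed)
p_x_given_not_y = 0.3 # likelihood P(X|not Y)   (assumed)

# P(X) = P(X|Y) P(Y) + P(X|not Y) P(not Y)
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)   # = 0.42

# Bayes theorem: P(Y|X) = P(X|Y) P(Y) / P(X)
p_y_given_x = p_x_given_y * p_y / p_x
print(round(p_y_given_x, 4))  # -> 0.4286
```

Even a strong likelihood (0.9) yields a moderate posterior here, because the prior P(Y) is small.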
© Tan,Steinbach, Kumar Introduction to Data Mining 8/05/2005 21
Evaluating Classifiers
Confusion Matrix:

                       PREDICTED CLASS
                       Class=Yes   Class=No
ACTUAL     Class=Yes   a (TP)      b (FN)
CLASS      Class=No    c (FP)      d (TN)

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)

    Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
Methods for Classifier Evaluation
Holdout
– Reserve k% for training and (100-k)% for testing
Random subsampling
– Repeated holdout
Cross validation
– Partition data into k disjoint subsets
– k-fold: train on k-1 partitions, test on the remaining one
– Leave-one-out: k=n
Bootstrap
– Sampling with replacement
– .632 bootstrap:

    acc_boot = \frac{1}{b} \sum_{i=1}^{b} (0.632 \cdot acc_i + 0.368 \cdot acc_s)
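The k-fold scheme above can be sketched in a few lines: partition the data into k disjoint folds, train on k−1 of them, test on the held-out fold, and average the k accuracies. `train_fn` and `eval_fn` are placeholders for any learner and its accuracy function; the toy learner below just predicts the majority class:

```python
def k_fold_cv(data, k, train_fn, eval_fn):
    """Average test accuracy over k disjoint folds."""
    folds = [data[i::k] for i in range(k)]          # k disjoint subsets
    accs = []
    for i in range(k):
        test = folds[i]                             # held-out partition
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        model = train_fn(train)
        accs.append(eval_fn(model, test))
    return sum(accs) / k

# Toy check: records are (value, label) with label = "is even".
data = [(x, x % 2 == 0) for x in range(20)]

def train_fn(train):
    pos = sum(1 for _, y in train if y)
    return pos >= len(train) - pos                  # model = constant majority label

def eval_fn(model, test):
    return sum(1 for _, y in test if y == model) / len(test)

print(k_fold_cv(data, 5, train_fn, eval_fn))
```

Leave-one-out is the special case k = n, and random subsampling differs only in drawing the train/test split afresh each repetition instead of rotating through fixed folds.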
Problem with Accuracy
                       PREDICTED CLASS
                       Class=Yes   Class=No
ACTUAL     Class=Yes   35 (TP)     5 (FN)
CLASS      Class=No    5 (FP)      5 (TN)

    Accuracy = (TP + TN) / (TP + FN + FP + TN)
    Precision (p) = TP / (TP + FP)
    Recall (r) = TP / (TP + FN)
    F-measure (F) = 2rp / (r + p) = 2TP / (2TP + FN + FP)

Accuracy = 0.8
For the Yes class: precision = 0.875, recall = 0.875, F-measure = 0.875
For the No class: precision = 0.5, recall = 0.5, F-measure = 0.5
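Recomputing the per-class measures from the confusion matrix above (treating each class in turn as the positive class) confirms the numbers:

```python
def prf(tp, fn, fp):
    """Precision, recall, and F-measure for the chosen positive class."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * r * p / (r + p)
    return p, r, f

tp, fn, fp, tn = 35, 5, 5, 5
accuracy = (tp + tn) / (tp + fn + fp + tn)
print(accuracy)            # -> 0.8
print(prf(tp, fn, fp))     # "Yes" as positive class: (0.875, 0.875, 0.875)
print(prf(tn, fp, fn))     # "No" as positive class:  (0.5, 0.5, 0.5)
```

The 0.8 accuracy hides the fact that the No class is classified no better than a coin flip, which is the problem the slide title points at.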
© Tan, Steinbach, Kumar, Introduction to Data Mining, 8/05/2005 — CSCI 8980: Biomedical Data Mining, Spring 2011
Example of classification accuracy measures
                       PREDICTED CLASS
                       Class=Yes   Class=No
ACTUAL     Class=Yes   99 (TP)     1 (FN)
CLASS      Class=No    10 (FP)     90 (TN)
Accuracy = 0.9450
Sensitivity = 0.99
Specificity = 0.90
Measures of Classification Performance
                 PREDICTED CLASS
                 Yes    No
ACTUAL     Yes   TP     FN
CLASS      No    FP     TN
ROC Curve
(TPR,FPR):
• (0,0): declare everything
to be negative class
• (1,1): declare everything
to be positive class
• (1,0): ideal
• Diagonal line:
– Random guessing
– Below diagonal line:
• prediction is opposite of
the true class
Using ROC for Model Comparison
• No model consistently
outperforms the other
– M1 is better for small FPR
– M2 is better for large FPR
ROC (Receiver Operating Characteristic)

ROC Curve Example
How to construct an ROC curve

Class         +     -     +     -     -     -     +     -     +     +
P             0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95

Threshold >=  0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
TP            5     4     4     3     3     3     3     2     2     1     0
FP            5     5     4     4     3     2     1     1     0     0     0
TN            0     0     1     1     2     3     4     4     5     5     5
FN            0     1     1    2     2     2     2     3     3     4     5
TPR           1     0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2   0
FPR           1     1     0.8   0.8   0.6   0.4   0.2   0.2   0     0     0

ROC Curve: plot TPR (y-axis) against FPR (x-axis) over all thresholds.
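The table above can be recomputed from the scores directly: for each distinct threshold, predict "+" whenever P >= threshold, count TP and FP, and convert to TPR = TP/(TP+FN) and FPR = FP/(FP+TN). A short sketch:

```python
# Scores and true labels from the slide's ten records.
scores = [0.25, 0.43, 0.53, 0.76, 0.85, 0.85, 0.85, 0.87, 0.93, 0.95]
labels = ["+", "-", "+", "-", "-", "-", "+", "-", "+", "+"]

pos = labels.count("+")  # 5 positives
neg = labels.count("-")  # 5 negatives

roc = []
for t in sorted(set(scores)) + [1.00]:
    # Predict "+" for every record whose score is at least the threshold.
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == "+")
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == "-")
    roc.append((t, tp / pos, fp / neg))   # (threshold, TPR, FPR)

for t, tpr, fpr in roc:
    print(t, tpr, fpr)
```

Duplicate scores (the three 0.85s) collapse into a single threshold here, so this version produces one point per distinct score plus the all-negative endpoint at 1.00; the endpoints (TPR, FPR) = (1, 1) and (0, 0) match the table's first and last columns.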