Part 2: Wrappers for feature subset selection
Mikko Korpela
26th September, 2006
Outline of presentation
Introduction
Feature subset selection
Relevance and optimality of features
Filter approach, wrapper approach
Experimental setting
Comparative results
Overfitting
Summary
Introduction
Different from choosing the relevant set of features!
Bibliography
[1] R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273-324, 1997.
Relevance of features
Intuitively, X1 is relevant
Both X2 and X4 are relevant, but either one can be omitted
Definitions of relevance fail miserably:
Filter approach
[Diagram: Input features → Feature subset selection → Induction Algorithm]
Wrapper approach
[Diagram: the Training set feeds a feature subset search; each candidate Feature set is evaluated by running the Induction Algorithm (Performance estimation → Estimated Accuracy); the selected Feature set and the Induction Algorithm produce a Hypothesis, which receives a Final Evaluation on the Test set]
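The wrapper loop can be sketched in plain Python. This is an illustration, not the paper's implementation: the names `cv_accuracy`, `wrapper_select` and `induce_1nn` are mine, and the exhaustive subset enumeration is only feasible for a handful of features.

```python
from itertools import combinations

def cv_accuracy(induce, X, y, features, k=5):
    """Estimated accuracy of the induction algorithm restricted to the
    given feature indices, via k-fold cross-validation (interleaved folds)."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_X = [[X[i][f] for f in features] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        classify = induce(train_X, train_y)
        correct += sum(classify([X[i][f] for f in features]) == y[i]
                       for i in test_idx)
    return correct / n

def wrapper_select(induce, X, y, n_features):
    """Score every non-empty feature subset with the induction algorithm
    itself and keep the one with the best estimated accuracy."""
    best_subset, best_acc = None, -1.0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            acc = cv_accuracy(induce, X, y, subset)
            if acc > best_acc:
                best_subset, best_acc = subset, acc
    return best_subset, best_acc

def induce_1nn(train_X, train_y):
    """A stand-in induction algorithm: 1-nearest-neighbour classifier."""
    def classify(x):
        i = min(range(len(train_X)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(train_X[j], x)))
        return train_y[i]
    return classify
```

The key point of the wrapper approach is visible in the code: the evaluation function is the induction algorithm's own cross-validated accuracy, not a feature-only statistic as in the filter approach.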
[Figure: the wrapper search over subsets of four features, shown as bit vectors such as 1,0,0,0 and 1,1,0,1; each node is a feature subset whose estimated accuracy is obtained by inducing and evaluating a classifier]
Experimental setting:
Datasets
Experimental setting:
Induction algorithms
Two families of induction algorithms are used in the paper (induction algorithms build classifiers)
1. Decision tree algorithms
C4.5, builds trees top-down and prunes them
ID3, no pruning
2. Naive-Bayes
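To make "induction algorithms build classifiers" concrete, here is a toy Naive-Bayes inducer for discrete features. It is an illustrative sketch with names of my choosing, not the Naive-Bayes implementation used in the paper; the smoothing denominator assumes binary feature values.

```python
import math
from collections import Counter, defaultdict

def induce_naive_bayes(train_X, train_y):
    """Naive-Bayes: assumes features are independent given the class and
    predicts the class with the highest posterior probability."""
    class_counts = Counter(train_y)
    value_counts = defaultdict(Counter)     # value_counts[(class, feature)][value]
    for row, label in zip(train_X, train_y):
        for f, v in enumerate(row):
            value_counts[(label, f)][v] += 1

    def classify(x):
        def log_posterior(c):
            score = math.log(class_counts[c] / len(train_y))
            for f, v in enumerate(x):
                # Laplace smoothing (+1 / +2 for binary values) so unseen
                # feature values do not zero out the posterior
                score += math.log((value_counts[(c, f)][v] + 1)
                                  / (class_counts[c] + 2))
            return score
        return max(class_counts, key=log_posterior)

    return classify
```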
State space search
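Greedy forward selection, viewed as state-space search over the bit-vector states shown earlier, can be sketched as follows. This is a minimal sketch under assumed names: `evaluate` stands for whatever accuracy-estimation procedure the wrapper uses (e.g. cross-validation).

```python
# States are feature subsets (bit vectors in the slides); the start state is
# the empty set, and each step adds the single feature that most improves
# the estimated accuracy.
def forward_selection(evaluate, n_features):
    """Climb the subset lattice until no single addition improves the
    estimate; return the final subset and its estimated accuracy."""
    current, current_score = frozenset(), evaluate(frozenset())
    while True:
        children = [current | {f} for f in range(n_features) if f not in current]
        if not children:
            return current, current_score   # all features already included
        best_child = max(children, key=evaluate)
        best_score = evaluate(best_child)
        if best_score <= current_score:
            return current, current_score   # local optimum reached
        current, current_score = best_child, best_score
```

Backward elimination is the mirror image: start from the full feature set and remove one feature per step.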
Compound operators in
state space
All features
Compound operators in
state space: Example
[Figure: the 16 nodes of the four-feature subset lattice, from 0,0,0,0 (no features) to 1,1,1,1 (all features); compound operators combine the best single-feature changes so the search can move several levels at once]
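A hypothetical sketch of the compound-operator idea: besides the ordinary children (one feature flipped), also generate states that apply the top-ranked 2, 3, ... single-feature changes at once, letting the search jump several levels of the lattice per node expansion. The function name and interface are mine, not the paper's.

```python
def compound_children(evaluate, state, n_features):
    """Children of `state` in the subset lattice (one feature added), plus
    compound states built by unioning the best-ranked single additions."""
    singles = sorted((state | {f} for f in range(n_features) if f not in state),
                     key=evaluate, reverse=True)
    compounds = []
    combined = singles[0] if singles else state
    for child in singles[1:]:          # fold in the 2nd, 3rd, ... best additions
        combined = combined | child
        compounds.append(combined)
    return singles + compounds
```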
Compound operators in
state space: Results
[Plots: real accuracy vs. number of nodes evaluated, crx dataset, backward elimination]
Comparative results
Overfitting
Definition: the training data is modeled too well, but predictions are poor
The search engine is guided by accuracy estimates
The estimates can be poor and misleading
Mainly a problem when the number of instances is small
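The point about misleading estimates can be demonstrated with synthetic data. In this sketch (hypothetical sizes and function name, completely random features), the best-looking single-feature rule appears well above its true 50% accuracy simply because many random candidates were compared:

```python
import random

def best_looking_feature_accuracy(n_instances, n_features, seed=0):
    """With few instances and many purely random binary features, return the
    training-set accuracy of the best single-feature rule 'predict y = f'.
    The true accuracy of any such rule is 50%, yet the maximum over many
    random features is optimistically higher: the selection overfits."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n_instances)]
    X = [[rng.randint(0, 1) for _ in range(n_features)]
         for _ in range(n_instances)]

    def accuracy(f):
        return sum(X[i][f] == y[i] for i in range(n_instances)) / n_instances

    return max(accuracy(f) for f in range(n_features))
```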
[Plots: accuracy vs. number of nodes evaluated for Rand (forward selection), Breast Cancer (forward selection) and Glass2 (backward elimination)]
Summary
[Diagram: Feature subset selection wrapped around the Induction Algorithm]
* Classification results