Dan Steinberg
2012
http://www.salford-systems.com
TreeNet is unusual among learning machines in being very strong for both
classification and regression problems
Aspects of TreeNet
immune to outliers
handles missing values automatically
results invariant under order preserving transformations of variables
Can be used to easily test for presence of interactions and their degree
Medical diagnoses
Insurance claim fraud
Thus historical data for non-frauds actually includes undetected fraud
Some of the 0s are actually 1s which complicates learning (possibly fatal)
TreeNet manages such data successfully
Multi-tree methods have been under development since the early 1990s.
Most important variants (and dates of published articles) are:
TreeNet version 1.0 was released in 2002 but its power took some
time to be recognized
Work on TreeNet continues with major refinements underway
(Friedman in collaboration with Salford Systems)
Simplest example:
TreeNet
TreeNet Process
Compute residuals for this simple model (prediction error) for every record
in data (even for classification model)
Grow second small tree to predict the residuals from first tree
New model is now:
Tree1 + Tree2
Compute residuals from this new 2-tree model and grow 3rd tree to predict
revised residuals
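The loop just described can be sketched in a few lines of Python. This is an illustrative toy (boosted one-split "stumps" on a single predictor, with an invented learning rate), not Salford's implementation:

```python
def fit_stump(x, r):
    """Find the split on x that best predicts the residuals r (least
    squares), returning (threshold, left_mean, right_mean).
    Assumes x has at least two distinct values."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi < t]
        right = [ri for xi, ri in zip(x, r) if xi >= t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + \
              sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def boost(x, y, n_trees=20, rate=0.1):
    """Grow each new stump on the residuals of the current model."""
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # current errors
        t, lm, rm = fit_stump(x, resid)
        stumps.append((t, lm, rm))
        # add only a small fraction of the new tree: slow learning
        pred = [pi + rate * (lm if xi < t else rm)
                for xi, pi in zip(x, pred)]
    return stumps, pred
```

Each stump is fit to the residuals of the current model, and only a small fraction of it is added, so the ensemble improves a little per cycle. On a toy target that steps from 1 to 3 at x = 4, twenty boosted stumps drive the residual sum of squares close to zero.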
[Figure: the first tree is grown on the original target and is intentionally a weak model.]

[Figure: three small trees. Example splits include COST2INC < 65.4, EQ2CUST_STF < 5.04, and TOTAL_DEPS < 65.5M; each terminal node carries a small score (e.g. +0.172, -0.228, +0.068), and the scores from Tree 1, Tree 2, and Tree 3 are summed to give the predicted response.]
Slow learning: the method peels the onion, extracting very small amounts of
information in any one learning cycle
TreeNet can leverage hints available in the data across a number of
predictors
Friedman originally named his methodology MART as the method generates small
trees which are summed to obtain an overall score
The model can be thought of as a series expansion approximating the true functional
relationship
We can think of each small tree as a mini-scorecard, making use of possibly different
combinations of variables. Each mini-scorecard is designed to offer a slight
improvement by correcting and refining its predecessors
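The series-expansion view can be written compactly. Assuming M small trees T_m with coefficients beta_m (notation mine, not from the slides), the model is:

```latex
F_M(x) = F_0 + \beta_1 T_1(x) + \beta_2 T_2(x) + \cdots + \beta_M T_M(x)
       = F_0 + \sum_{m=1}^{M} \beta_m T_m(x)
```

Each T_m is one mini-scorecard; the sum over all of them is the overall score.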
Because each tree starts at the root node and can use all of the available data a
TreeNet model can never run out of data no matter how many trees are built.
The number of trees required to obtain best possible performance can vary considerably
from data set to data set
Some models converge after just a handful of trees and others may require many
thousands
Criterion                 CXE         Class Error   ROC Area    Lift
---------------------------------------------------------------------
Optimal Number of Trees   739         1069          643         163
Optimal Criterion         0.4224394   0.1803279     0.8862029   1.9626555
[Figure: classification accuracy of models fit to artificial data (red). Orange: logistic regression. Green: TreeNet optimized for classification accuracy. Blue: TreeNet optimized for maximum log-likelihood (CXE).]
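The gap between the two TreeNet criteria can be made concrete: two score sets can tie on classification accuracy yet differ sharply on cross-entropy (CXE). A small illustrative check, with invented data:

```python
import math

def cxe(y, p):
    """Average negative log-likelihood of binary labels y under probs p."""
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p)) / len(y)

def class_error(y, p, threshold=0.5):
    """Fraction misclassified when thresholding the scores."""
    return sum((pi >= threshold) != yi for yi, pi in zip(y, p)) / len(y)

y = [1, 1, 0, 0]
p_confident = [0.9, 0.9, 0.1, 0.1]   # well-calibrated scores
p_hesitant  = [0.6, 0.6, 0.4, 0.4]   # same ranking, weaker scores

# Both score sets classify every record correctly...
assert class_error(y, p_confident) == class_error(y, p_hesitant) == 0.0
# ...but CXE distinguishes them.
assert cxe(y, p_confident) < cxe(y, p_hesitant)
```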
Interpreting TN Models
Partial dependency plots: These exhibit the relationship between the target and
any predictor as captured by the model.
Variable Importance Rankings: These stable rankings give an excellent
assessment of the relative importance of predictors
For regression models SPM 7.0 displays box plots for residuals and residual outlier
diagnostics
For binary classification TN also displays:
ROC curves: TN models produce scores that are typically unique for each scored
record allowing records to be ranked from best to worst. The ROC curve and area
under the curve reveal how successful the ranking is.
Confusion Matrix: Using an adjustable score threshold this matrix displays the
model false positive and false negative rates.
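Both diagnostics can be computed directly from model scores. A minimal pure-Python sketch with invented example data (not TN's internals):

```python
def roc_auc(y, scores):
    """Probability that a random positive outscores a random negative
    (ties count half): the area under the ROC curve."""
    pos = [s for yi, s in zip(y, scores) if yi == 1]
    neg = [s for yi, s in zip(y, scores) if yi == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion(y, scores, threshold):
    """(true_pos, false_pos, false_neg, true_neg) at a score threshold."""
    tp = sum(yi == 1 and s >= threshold for yi, s in zip(y, scores))
    fp = sum(yi == 0 and s >= threshold for yi, s in zip(y, scores))
    fn = sum(yi == 1 and s < threshold for yi, s in zip(y, scores))
    tn = sum(yi == 0 and s < threshold for yi, s in zip(y, scores))
    return tp, fp, fn, tn

y = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2]
assert abs(roc_auc(y, scores) - 5 / 6) < 1e-12   # one discordant pair
assert confusion(y, scores, threshold=0.5) == (2, 1, 1, 1)
```

Raising or lowering the threshold trades false positives against false negatives, which is why the confusion matrix display is adjustable.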
[Partial dependence plot. Y-axis: one-half log-odds.]
Salford Systems Copyright 2005-2012
Partial dependency plots are extracted from the TreeNet model via
simulation
To trace out the impact of the predictor X on the target Y we start with
a single record from the training data and generate a set of
predictions for the target Y for each of a large number of different
values of X
We repeat this process for every record in the training data and then
average the results to produce the displayed partial dependency plot
In this way the plot reflects the effects of all the other relevant
predictors
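The simulation described above can be sketched directly. The model below is a hypothetical scoring function, used only to show the mechanics:

```python
def partial_dependence(model, records, j, grid):
    """Average model prediction with feature j forced to each grid value."""
    curve = []
    for v in grid:
        preds = []
        for rec in records:
            modified = list(rec)
            modified[j] = v          # override feature j only
            preds.append(model(modified))
        curve.append(sum(preds) / len(preds))
    return curve

# Toy additive model: f(x) = 2*x0 + x1 (invented for illustration)
model = lambda x: 2 * x[0] + x[1]
records = [[0, 5], [1, 7], [2, 9]]
pd_x0 = partial_dependence(model, records, 0, grid=[0, 1, 2])
# The slope w.r.t. x0 is recovered: each grid step raises the average by 2
assert pd_x0 == [7.0, 9.0, 11.0]
```

Because every training record contributes to the average at each grid value, the curve reflects the predictor's effect with the other variables held at their observed values.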
Can obtain
SAS code
Java
C
PMML
Other languages coming:
SQL
Visual Basic
Impose Constraint
In this case constraint is mild in that the differences on Y-axis are quite small
Interaction Detection
[Figure: interaction detection example involving the predictors NMOS and PAY_FREQUENCY_2$.]
Mid-sized bank
1789 Trees in model with best test sample area under ROC curve
Criterion           CXE        Class Error   ROC Area    Lift
--------------------------------------------------------------
Optimal N Trees     2000       1930          1789        1911
Optimal Criterion   0.534914   0.3485293     0.7164921   1.9231729
[Partial dependence plot: risk (probability of default) versus the Sales / Current Assets ratio. Over ratio values 10 to 60 the contribution ranges roughly from +0.003 down to -0.004.]
[Partial dependence plot: risk (probability of default) versus Ratio 2. Over ratio values 0.1 to 1.0 the contribution ranges roughly from +0.0006 down to -0.0004.]
Criterion           CXE       Class Error   ROC Area   Lift
------------------------------------------------------------
Optimal N Trees     2000      1812          1978       1763
Optimal Criterion   0.54538   0.39331       0.70749    1.84881

For comparison, the earlier run:

Criterion           CXE       Class Error   ROC Area   Lift
------------------------------------------------------------
Optimal N Trees     2000      1930          1789       1911
Optimal Criterion   0.53491   0.34852       0.71649    1.92317
Two-node trees do not permit interactions of any kind, as each term in the
model involves a single predictor in isolation. Two-node tree models can be
highly nonlinear but strictly additive
Three- or more-node trees allow progressively higher-order interactions.
Running the model with different tree sizes allows simple discovery of the precise
amount of interaction required for maximum-performance models
In this example there is no strong evidence for low-order interactions
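The additivity claim behind two-node trees can be checked numerically: in an additive model, the effect of changing one predictor does not depend on the value of another. A toy illustration with invented functions:

```python
def interaction_gap(f, a, b, c, d):
    """Zero iff the effect of moving x1 from a to b is the same at
    x2 = c and at x2 = d (i.e. no interaction between x1 and x2)."""
    return (f(b, c) - f(a, c)) - (f(b, d) - f(a, d))

additive = lambda x1, x2: 3 * x1 + x2 ** 2       # sum of one-way effects
interacting = lambda x1, x2: 3 * x1 + x1 * x2    # contains an x1*x2 term

assert interaction_gap(additive, 0, 1, 2, 5) == 0      # purely additive
assert interaction_gap(interacting, 0, 1, 2, 5) != 0   # interaction present
```

A sum of two-node trees is exactly a sum of one-variable functions, so it always passes this test; larger trees can fail it, which is what the tree-size comparison probes.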
Challenges include
66 banks
25 potential predictors
Study run in 2003!
[Plot: cross-validated mean absolute error (y-axis, roughly 0.6 to 1.4) versus model size in trees (x-axis, 60 to 945).]
[Figure: distribution of banks by country (AT, BE, CA, CH, DE, DK, ES, FR, GB, GR, IE, IT, JP, LU, NL, PT, SE, US).]
Swiss banks (CH) tend to be rated low risk; Greek, Portuguese, Italian, Irish, and Japanese banks tend to be rated high risk
Bank Specialization
[Figure: distribution of banks by specialization: savings, cooperative, and commercial banks.]
[Histograms of predictor distributions, including an asset-scale variable (0 to 6e+08) and ROAE (roughly -20 to 30).]