Find the Best Prospects for a New Product by Using a Data Mining Model

Ting Millette,
Wachovia Bank
Copyright © 2007, SAS Institute Inc. All rights reserved.

ABSTRACT

- Goal: The financial firm is launching a new product and wants to identify the best 10,000 (10k) prospects for it.
- Logistic Regression, Decision Tree and Neural Network models are built to score the prospective customers.
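Once each prospect has a predicted probability from one of these models, picking the best 10,000 reduces to ranking by score and keeping the top of the list. A minimal sketch, assuming the scores are already held in a mapping from customer ID to predicted probability; the function name `select_top_prospects` and the IDs are hypothetical, not part of the original analysis:

```python
import heapq


def select_top_prospects(scores, k=10_000):
    """Return the k customer IDs with the highest predicted probability.

    `scores` maps a (hypothetical) customer ID to the model's predicted
    probability of being a good prospect. Ties keep their input order.
    """
    ranked = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    return [customer_id for customer_id, _ in ranked]


if __name__ == "__main__":
    example_scores = {"C001": 0.91, "C002": 0.12, "C003": 0.67, "C004": 0.45}
    print(select_top_prospects(example_scores, k=2))  # ['C001', 'C003']
```

`heapq.nlargest` avoids sorting the whole customer file when k is much smaller than the population; `sorted(..., reverse=True)[:k]` would give the same result.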

Data Source

- Demographics (Age, Gender, Zip Code, Years at Current Job, Marital Status, Number of Children …)
- Account Information (Product Line, Average Monthly Balance, Credit Limit, Cash Advance)
- Transaction Information (Average Number of Purchases, Over Limit, Number of Credit Card Applications)

Methods

Logistic Regression
- Dependent variable: whether or not the customer is a good prospect
- Independent variables: the customer's age, gender, zip code, income, marital status, children, car, savings account, checking account and mortgage
- Model terms: two-factor interactions and polynomial terms
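Two-factor interactions and polynomial terms are extra columns derived from the original inputs before the logistic model is fit. A minimal sketch of that expansion and of how a fitted model turns the terms into a probability, assuming numeric (already encoded) inputs; the helper names and the coefficient values in the usage example are hypothetical, not estimates from this study:

```python
import math
from itertools import combinations


def expand_terms(x):
    """Return main effects plus squared terms and two-factor interactions."""
    terms = dict(x)  # main effects
    for name, value in x.items():
        terms[f"{name}^2"] = value * value  # polynomial (quadratic) term
    for a, b in combinations(sorted(x), 2):
        terms[f"{a}*{b}"] = x[a] * x[b]  # two-factor interaction
    return terms


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(x, intercept, coefs):
    """P(good prospect) from a fitted logistic model; missing coefficients count as 0."""
    terms = expand_terms(x)
    eta = intercept + sum(coefs.get(name, 0.0) * value for name, value in terms.items())
    return sigmoid(eta)


if __name__ == "__main__":
    customer = {"age": 40.0, "income": 55.0}  # hypothetical encoded inputs
    print(sorted(expand_terms(customer)))
    # ['age', 'age*income', 'age^2', 'income', 'income^2']
    coefs = {"age": 0.02, "income": 0.01, "age*income": -0.0001}  # hypothetical
    print(round(predict_probability(customer, -1.5, coefs), 4))
```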

Methods

Decision Tree
- Split search: binary, multiway (L-way) splits
- Splitting criteria: Gini index, Pearson chi-square, entropy
- Stopping rules: postpruning, prepruning
- Settings: Leaf Size 5, Max Branch 2, Max Depth 6, Min Categorical Levels 5, Number of Rules 5
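The Gini index and entropy criteria both score a candidate split by how much it reduces node impurity. A minimal sketch for a binary target (non-buyer / buyer), assuming class counts per child node are already tallied; it shows the textbook formulas rather than Enterprise Miner's exact split search, and the counts in the usage example are hypothetical:

```python
import math


def gini(counts):
    """Gini impurity 1 - sum(p_k^2) of a node, given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0


def entropy(counts):
    """Entropy -sum(p_k * log2 p_k) of a node, given its class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c) if n else 0.0


def impurity_reduction(parent, children, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


if __name__ == "__main__":
    parent = [50, 50]             # [non-buyers, buyers] in the node
    split = [[40, 10], [10, 40]]  # hypothetical binary split
    print(round(impurity_reduction(parent, split, gini), 4))     # 0.18
    print(round(impurity_reduction(parent, split, entropy), 4))  # 0.2781
```

The split search evaluates every candidate split this way and keeps the one with the largest reduction, subject to settings such as the minimum leaf size.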

Methods

Neural Networks
- Model selection criteria: profit/loss, misclassification rate, average error
- Architectures: GLIM, MLP, ORBFEQ, NRBFEH, NRBFEW, NRBFEQ
- Direct connections: No
- Number of hidden units: 3
- Maximum iterations: 20
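With no direct connections, every input reaches the output only through the hidden layer. A minimal sketch of the forward pass of such a multilayer perceptron (MLP) with three hidden units and a logistic output for a binary target, assuming tanh hidden activations; the weights are hypothetical placeholders rather than trained values, and the training loop (capped at 20 iterations in this study) is not shown:

```python
import math


def mlp_predict(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-hidden-layer MLP with no direct input-to-output links.

    hidden_weights[j][i] connects input i to hidden unit j. Hidden units use
    tanh; the output unit uses the logistic function, so the result can be
    read as P(buyer = 1).
    """
    hidden = [
        math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    z = output_bias + sum(v * h for v, h in zip(output_weights, hidden))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    x = [0.5, -1.2]                             # two standardized inputs (hypothetical)
    W = [[0.1, -0.3], [0.4, 0.2], [-0.5, 0.6]]  # 3 hidden units x 2 inputs
    b = [0.0, 0.1, -0.1]
    v = [0.7, -0.2, 0.5]
    print(round(mlp_predict(x, W, b, v, output_bias=-0.3), 4))
```

Adding direct connections would simply add a weighted sum of the raw inputs to `z`; setting that option to "No" leaves it out.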

Variable Selection

- Stepwise Logistic Regression
- Decision Tree
- Variable Selection Node: R-square, Chi-square
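The R-square and chi-square criteria screen inputs one at a time against the target. A simplified sketch of the two statistics, assuming a binary 0/1 target: a numeric input's R-square is taken as its squared Pearson correlation with the target, and a categorical input gets the Pearson chi-square statistic of its contingency table with the target. The Variable Selection node's own procedure is more involved, and the data below is hypothetical:

```python
from collections import Counter
from statistics import mean


def r_square(x, y):
    """Squared Pearson correlation between a numeric input x and the target y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy) if sxx and syy else 0.0


def chi_square(categories, y):
    """Pearson chi-square statistic for a categorical input versus the target."""
    n = len(y)
    cell = Counter(zip(categories, y))  # observed counts per (level, target)
    row = Counter(categories)
    col = Counter(y)
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    buyer = [1, 1, 0, 0, 1, 0]
    income = [80, 75, 30, 35, 60, 40]         # hypothetical
    marital = ["M", "M", "S", "S", "M", "S"]  # hypothetical
    print(round(r_square(income, buyer), 3))
    print(chi_square(marital, buyer))  # 6.0 (perfect association)
```

Inputs whose statistic falls below a chosen threshold would be rejected before modeling.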

MODEL SETUP

Model Assessment

- Confusion Matrix
- Lift Chart
- ROC Chart
- Profit Chart
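These assessment tools all start from the actual target and the predicted probability. A minimal sketch of the confusion matrix at a single cutoff, plus the (1 - specificity, sensitivity) pair that gives one point on the ROC chart. It assumes a 0/1 target; the 0.55 default cutoff echoes the one used for validation, the function names are hypothetical, and the data is made up:

```python
def confusion_matrix(actual, prob, cutoff=0.55):
    """Counts of TP/FP/TN/FN when prob >= cutoff predicts a buyer."""
    tp = fp = tn = fn = 0
    for y, p in zip(actual, prob):
        predicted = 1 if p >= cutoff else 0
        if predicted and y:
            tp += 1
        elif predicted:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


def roc_point(cm):
    """(false positive rate, true positive rate) = (1 - specificity, sensitivity)."""
    negatives = cm["FP"] + cm["TN"]
    positives = cm["TP"] + cm["FN"]
    fpr = cm["FP"] / negatives if negatives else 0.0
    tpr = cm["TP"] / positives if positives else 0.0
    return fpr, tpr


if __name__ == "__main__":
    actual = [1, 0, 1, 1, 0, 0, 1, 0]
    prob = [0.9, 0.6, 0.7, 0.4, 0.2, 0.1, 0.8, 0.5]
    cm = confusion_matrix(actual, prob)
    print(cm)             # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
    print(roc_point(cm))  # (0.25, 0.75)
```

Sweeping the cutoff from 1 down to 0 and plotting the resulting points traces the full ROC curve.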

Model Assessment

Lift Chart
- Regression has a lift of 175%
- Neural Network has a lift of 310%
- Decision Tree has a lift of 335%
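Under the usual definition, lift at a given depth is the buyer rate among the top-scored customers divided by the buyer rate in the whole population, so a lift of 335% (3.35) means the top group holds 3.35 times the buyers of a random group of the same size. A minimal sketch, assuming lift is read at a caller-chosen depth (the depth behind the figures above is not stated); the function name and data are hypothetical:

```python
def lift_at_depth(actual, prob, depth=0.10):
    """Buyer rate in the top `depth` fraction by score, divided by the overall buyer rate."""
    n = len(actual)
    base_rate = sum(actual) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no buyers")
    k = max(1, round(depth * n))  # number of customers in the top group
    ranked = sorted(zip(prob, actual), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(y for _, y in ranked[:k]) / k
    return top_rate / base_rate


if __name__ == "__main__":
    actual = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    prob = [0.95, 0.10, 0.30, 0.85, 0.20, 0.15, 0.40, 0.05, 0.35, 0.25]
    print(round(lift_at_depth(actual, prob, depth=0.20), 2))  # 3.33
```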

Model Assessment

Profit Chart (looking at the top 20%)
- Decision Tree earns a profit of 0.75k
- Neural Network earns a profit of 0.56k
- Logistic Regression earns a profit of 0.39k
The Decision Tree therefore outperforms both the Logistic Regression and Neural Network models.
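A profit chart plots profit against depth; reading it at the top 20% means contacting only the best-scored fifth of customers. A minimal sketch, assuming a fixed revenue per responding buyer and a fixed cost per contact; the slide does not give the values behind its 0.75k, 0.56k and 0.39k figures, so the $100 and $10 below are hypothetical, as is the function name:

```python
def profit_at_depth(actual, prob, depth, revenue_per_buyer, cost_per_contact):
    """Profit from contacting the top `depth` fraction of customers ranked by score."""
    n = len(actual)
    k = round(depth * n)  # number of customers contacted
    ranked = sorted(zip(prob, actual), key=lambda pair: pair[0], reverse=True)
    buyers = sum(y for _, y in ranked[:k])
    return buyers * revenue_per_buyer - k * cost_per_contact


if __name__ == "__main__":
    actual = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    prob = [0.95, 0.10, 0.30, 0.85, 0.20, 0.15, 0.40, 0.05, 0.35, 0.25]
    for depth in (0.2, 0.5, 1.0):
        print(depth, profit_at_depth(actual, prob, depth, 100, 10))
    # 0.2 180
    # 0.5 250
    # 1.0 200
```

Comparing models at the same depth, as the slide does, keeps the contact cost identical so only the number of buyers reached differs.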

Model Assessment
- R-square and average squared error (ASE) are better measures of model fit than the F test.
- The Decision Tree outperforms the Logistic Regression and Neural Network models, with a smaller average squared error and a smaller misclassification rate.

Tool            Target  Target Event  Root ASE    Valid: Root ASE  Test: Root ASE
Neural Network  buyer   1             0.4198573   0.375608677      0.377804163
Regression      buyer   1             0.42919849  0.390205283      0.359309433
Tree            buyer   1             0.29295261  0.305571021      0.341392826

Tool            Target  Target Event  Misclassification Rate  Valid: Misclassification Rate  Test: Misclassification Rate
Neural Network  buyer   1             0.230769231             0.177570094                    0.175925926
Regression      buyer   1             0.237762238             0.177570094                    0.166666667
Tree            buyer   1             0.097902098             0.112149533                    0.148148148
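The fit statistics in these tables can be computed directly from actual targets and predicted probabilities. A minimal sketch for a binary target, assuming ASE is the mean of (actual - predicted probability)^2 over all cases and misclassification uses a 0.5 cutoff; Enterprise Miner's own definitions may differ in detail, and the function name and data are hypothetical:

```python
import math


def assessment_stats(actual, prob, cutoff=0.5):
    """Average squared error, root ASE, and misclassification rate."""
    n = len(actual)
    ase = sum((y - p) ** 2 for y, p in zip(actual, prob)) / n
    errors = sum(1 for y, p in zip(actual, prob) if (p >= cutoff) != (y == 1))
    return {"ASE": ase, "Root ASE": math.sqrt(ase), "Misclassification Rate": errors / n}


if __name__ == "__main__":
    actual = [1, 0, 1, 0]
    prob = [0.8, 0.3, 0.4, 0.1]
    print(assessment_stats(actual, prob))
```

Computing the same statistics separately on the training, validation and test partitions gives the three columns of each table.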

Model Validation and Optimization

Validation:
At a cutoff point of 0.55, the predicted probability is 90% versus a random probability of 44%. The decision tree model has a lift of 335%, which means it performs more than three times better than no model.

Model Validation and Optimization

Optimization:
C = Fixed Cost + Per-Customer Cost * Mailing Rate * Population Size
E = Expected Number of Buyers * Avg. Customer Value
  = (Lift * Base Response Rate * Mailing Rate * Population Size) * Avg. Customer Value
Profit = E - C (to be maximized), or ROI = E / C
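The cost and revenue formulas can be evaluated over a grid of mailing rates to find the most profitable one. A minimal sketch, assuming the expected number of buyers is lift × base response rate × number of customers mailed, and that lift depends on the mailing rate through a caller-supplied curve; the function names, the lift curve and all dollar figures are hypothetical:

```python
def campaign_economics(mailing_rate, lift, base_rate, population,
                       fixed_cost, cost_per_customer, avg_customer_value):
    """Return (C, E, profit E - C, ROI E / C) for one mailing rate."""
    mailed = mailing_rate * population
    cost = fixed_cost + cost_per_customer * mailed      # C
    expected_buyers = lift * base_rate * mailed         # assumption: lift x base rate x mailed
    revenue = expected_buyers * avg_customer_value      # E
    return cost, revenue, revenue - cost, revenue / cost


def best_mailing_rate(rates, lift_curve, **params):
    """Mailing rate from `rates` that maximizes profit E - C.

    `lift_curve(rate)` returns the model's lift when mailing that fraction.
    """
    return max(rates, key=lambda r: campaign_economics(r, lift_curve(r), **params)[2])


def example_lift_curve(rate):
    """Hypothetical lift curve: 3.5 for tiny mailings, falling to 1.0 when mailing everyone."""
    return 1.0 + 2.5 * (1.0 - rate)


if __name__ == "__main__":
    params = dict(base_rate=0.05, population=100_000, fixed_cost=5_000,
                  cost_per_customer=1.0, avg_customer_value=40.0)  # hypothetical
    rates = [i / 10 for i in range(1, 11)]
    best = best_mailing_rate(rates, example_lift_curve, **params)
    print(best)  # 0.6
    print(round(campaign_economics(best, example_lift_curve(best), **params)[2]))  # 175000
```

Maximizing ROI instead of profit would usually favor a smaller mailing, since the fixed cost is spread over fewer but better prospects; swapping index `[2]` for `[3]` in `best_mailing_rate` makes that change.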

Checklist

- Correct differentiation among training, validation and test data sets
- Use optimization to determine a good cutoff
- Cost model recognizes that the cost of a false positive is less than the cost of a false negative
- Specify a baseline model for cost comparison
- Recognize that the value of the model is calculated on the score data
- Calculate the dollar worth of the model as the difference between the model and baseline profits
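Several checklist items fit together: pick the cutoff by optimization under a cost model in which a missed buyer (false negative) costs more than an unnecessary contact (false positive), then value the model as its profit minus a baseline's profit on the score data. A minimal sketch, assuming a "contact everyone" baseline and hypothetical economics of $100 revenue per buyer and $10 per contact, so a missed buyer forgoes $90 while a wasted contact loses $10; the function names are illustrative:

```python
def total_profit(actual, prob, cutoff, value_per_buyer, cost_per_contact):
    """Profit from contacting every customer whose score is at least `cutoff`."""
    return sum(value_per_buyer * y - cost_per_contact
               for y, p in zip(actual, prob) if p >= cutoff)


def best_cutoff(actual, prob, value_per_buyer, cost_per_contact):
    """Observed score (or +inf, meaning contact nobody) that maximizes total profit."""
    candidates = sorted(set(prob)) + [float("inf")]
    return max(candidates,
               key=lambda c: total_profit(actual, prob, c, value_per_buyer, cost_per_contact))


def model_worth(actual, prob, value_per_buyer, cost_per_contact):
    """(best cutoff, model profit minus the profit of contacting everyone)."""
    cutoff = best_cutoff(actual, prob, value_per_buyer, cost_per_contact)
    model = total_profit(actual, prob, cutoff, value_per_buyer, cost_per_contact)
    baseline = sum(value_per_buyer * y - cost_per_contact for y in actual)
    return cutoff, model - baseline


if __name__ == "__main__":
    actual = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    prob = [0.95, 0.10, 0.30, 0.85, 0.20, 0.15, 0.40, 0.05, 0.35, 0.25]
    print(model_worth(actual, prob, 100, 10))  # (0.35, 60)
```

In line with the first checklist item, the cutoff would normally be chosen on validation data and the worth reported on separate test or score data; the sketch reuses one list for brevity.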

Conclusion

- This paper showed how to build data mining models using logistic regression, decision trees and neural networks, how to compare the models by assessing their performance, and how to check that the chosen model generalizes by validating it.
- Targeting potential customers with the best-performing data mining model can help a financial institution reduce costs and increase revenue.

Contact Information

Your comments and questions are valued and encouraged.

Contact the author at:

Ting Millette
Wachovia Bank
831-277-1276
Ting.millette@wachovia.com

Questions?

Thanks for coming!
