Defaults
Henry Chang | Avani Sharma
Atindra Bandi | Abraham Khan
Group 14
Situation
Key Questions:
Categorical variables
Sex, Education, Marriage, and Payment status over 6 months
[Figures: share of customers with no default vs. default; breakdown by sex (Male, Female); age density by default status (x-axis: Age, y-axis: Density); breakdowns by education (University, Post Graduate, High school, Others) and by marital status (Single, Married, Others)]
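Breakdowns like the ones in these charts can be tabulated with a simple group-by. The sketch below is minimal and has some assumptions. Each customer is taken to be a dict with hypothetical keys such as `"sex"` and a 0/1 `"default"` flag, because the slides do not name the columns. The toy records are made up for illustration and do not come from the dataset.

```python
from collections import defaultdict


def default_rate_by(records, column, target="default"):
    """Share of customers with target == 1 within each level of `column`.

    `column` and `target` are hypothetical field names; adjust them to the dataset.
    """
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for row in records:
        level = row[column]
        totals[level] += 1
        defaults[level] += int(row[target])  # target assumed to be coded 0/1
    return {level: defaults[level] / totals[level] for level in totals}


# Toy records for illustration only
customers = [
    {"sex": "Male", "default": 1},
    {"sex": "Male", "default": 0},
    {"sex": "Female", "default": 0},
    {"sex": "Female", "default": 1},
    {"sex": "Female", "default": 0},
    {"sex": "Female", "default": 0},
]
print(default_rate_by(customers, "sex"))  # {'Male': 0.5, 'Female': 0.25}
```

The same function applies unchanged to education, marital status, or any of the payment-status columns.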
100,000 defaulted
Logistic regression
• 10-fold cross-validation, repeated 20 times
• Kept the variables that were selected most often across runs:
  • Credit limit
  • Recent payment amounts
  • Recent delayed payments
  • Age of customer
Henry
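Repeated k-fold cross-validation with a selection-frequency count can be sketched as follows, using 10 folds and 20 repeats. The slides do not say which selection procedure ran inside each fold, so `select` is a hypothetical callback. It receives one fold's training rows and returns the set of variable names kept. This is a minimal sketch of the counting logic only and does not fit a logistic regression.

```python
import random
from collections import Counter


def kfold_indices(n, k, rng):
    """Shuffle row indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def selection_frequency(rows, select, k=10, repeats=20, seed=0):
    """Fraction of the k * repeats training splits in which each variable is selected.

    `select` is a hypothetical stand-in for the per-fold selection step.
    """
    rng = random.Random(seed)
    counts = Counter()
    n = len(rows)
    for _ in range(repeats):
        for fold in kfold_indices(n, k, rng):
            held_out = set(fold)
            train = [rows[i] for i in range(n) if i not in held_out]
            counts.update(select(train))  # count each variable kept in this split
    total = k * repeats
    return {var: c / total for var, c in counts.most_common()}


# Toy selector: always keeps "credit_limit", keeps "age" only when row 0 is in training
def toy_select(train):
    return {"credit_limit"} | ({"age"} if 0 in train else set())


print(selection_frequency(list(range(100)), toy_select))
# {'credit_limit': 1.0, 'age': 0.9}
```

Row 0 is held out in exactly one of the 10 folds per repeat, so the toy `"age"` variable is kept in 9 of every 10 splits. Variables with frequencies near 1.0 are the stable ones worth keeping.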
Model Comparison
Sensitivity = true positive rate
Specificity = true negative rate
Accuracy = correct prediction rate
AUC = area under ROC curve

Metric         Logistic Regression   Naïve Bayes   Random Forest
Accuracy       80%                   80%           80%
Sensitivity    15%                   46%           53%
Specificity    98%                   90%           88%
AUC            73%                   73%           76%
Cut-off        0.57                  0.97          0.32
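The definitions above can be computed from predicted default probabilities and a cut-off. This is a minimal sketch that assumes a score at or above the cut-off counts as a predicted default (1). The toy labels and scores are made up for illustration.

```python
def classification_metrics(y_true, scores, cutoff):
    """Accuracy, sensitivity and specificity when score >= cutoff predicts default (1)."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # correct prediction rate
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Toy data for illustration only
y = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.6, 0.7]
# Lowering the cut-off from 0.5 to 0.3 raises sensitivity (2/3 -> 1.0)
# and lowers specificity (2/3 -> 1/3); accuracy stays at 4/6.
print(classification_metrics(y, scores, 0.5))
print(classification_metrics(y, scores, 0.3))
```

This trade-off explains why each model in the table reports its own cut-off. The same accuracy can hide very different sensitivity and specificity.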
[Figure: ROC curves; x-axis: 1 - Specificity (false positive rate)]
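An ROC curve is traced by sweeping the cut-off over every distinct score and recording the pair (1 - specificity, sensitivity) at each step. The AUC is the area under that curve. This is a minimal sketch using the trapezoidal rule on made-up toy scores. It is not the tool that produced the figure.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs, sweeping the cut-off over every distinct score from high to low."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # cut-off above every score: nothing predicted as default
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy data for illustration only
y = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.6, 0.7]
print(round(auc_trapezoid(roc_points(y, scores)), 4))  # 0.7778
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of defaulters from non-defaulters. The values in the comparison table above (73% to 76%) sit between these extremes.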
Misclassification
• Threshold 0.3
• Number of trees
• Number of variables
[Figure: misclassification error, in sample vs. out of sample]
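The in-sample vs. out-of-sample comparison uses misclassification error at the chosen threshold. This is a minimal sketch that assumes predicted default probabilities are already available from some fitted model and that a score at or above the threshold counts as a predicted default. The toy scores are made up and do not reproduce the random forest results.

```python
def misclassification_error(y_true, scores, threshold=0.3):
    """Share of customers whose predicted class (score >= threshold) differs from the actual class."""
    wrong = sum(1 for y, s in zip(y_true, scores) if (s >= threshold) != (y == 1))
    return wrong / len(y_true)


# Toy data for illustration only
in_sample_y = [1, 0, 0, 1, 0]
in_sample_scores = [0.8, 0.1, 0.2, 0.6, 0.25]
out_sample_y = [1, 0, 0, 1]
out_sample_scores = [0.35, 0.2, 0.31, 0.1]

print(misclassification_error(in_sample_y, in_sample_scores))    # 0.0
print(misclassification_error(out_sample_y, out_sample_scores))  # 0.5
```

A large gap between in-sample and out-of-sample error, as in this toy case, is a sign of overfitting. That is why tuning settings such as the number of trees and the number of variables are judged on out-of-sample error.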
Random forest has the best balance between true positive rate (TPR) and false positive rate (FPR), with the highest sensitivity (53%) and AUC (76%).