
Capstone Project – BFS

Group Name:
1. Anil Jain – DDA1710455
2. Abhineet Vyas – DDA1710158
3. Nilesh Patel – DDA1710093
4. Ramanpreet Singh – DDA1710022
Problem Statement & Business Objective

Problem Statement
• CredX is a leading credit card provider that receives thousands of credit card applications every year. In the past few years, however, it has experienced an increase
in credit loss. The CEO believes that the best strategy to mitigate credit risk is to ‘acquire the right customers’.

Business Objective
The objective of this project is to help CredX identify the right customers by applying predictive modelling techniques to historical data on the bank’s applicants.
As a modelling exercise, the following need to be determined or achieved:
1. To Determine – the factors/significant variables which affect credit risk
2. To Create – strategies which help in mitigating the risk at acquisition
3. To Assess – the financial benefits achieved through this project, explained in the following terms:

• The implications of using the model for auto approval or rejection, i.e. how many applicants on average the model would automatically approve or reject
• The potential credit loss avoided with the help of the model
• Assumptions based on which the model has been built
Problem solving Approach
To solve this business problem of assessing credit default risk at acquisition, we applied the ‘CRISP-DM framework’, which includes the
following steps:
# 1. Business Understanding
# 2. Data Understanding
#2.1 check for null values, sanity check, duplicate records
#2.2 Univariate analysis for categorical variables
#2.2.1 Histogram/Bar chart to understand the distribution
#2.2.2 Box plot to identify the outliers
#2.3 Univariate and bivariate analysis for continuous variables
#2.3.1 Histogram to understand the distribution
#2.3.2 Box plot to identify the outliers
# 3. Data Preparation
#3.1 Remove or mutate missing values based on the business justification
#3.2 outlier treatment
#3.3 data imputation for missing values with WOE
#3.4 Chi-square test for feature selection for categorical variables
#3.5 IV test for feature selection for continuous variables
#3.6 During feature selection, we would also revisit the data preparation step to scale the data with WOE values
#3.7 Fine and coarse classing during data WOE analysis of respective variables
# 4. Data Modelling - Prepare the below different models
#4.1 logistic regression
#4.1.1 Demographic data model
#4.1.2 Merge data model (Demographic + credit score data)
#4.2 Decision tree
#4.3 random forest
# 5. Model Evaluation - select the best model based on the below criteria
#5.1 Accuracy, Sensitivity, Specificity of the model
#5.2 KS statistics - Lift and Gain chart
#5.3 Application scorecard based on the probability
# 6. Model Deployment
#6.1 the potential average credit loss per default and then overall potential average credit loss
#6.2 Auto approval and rejection rate
#6.3 Recommendations to reduce the potential credit loss with the help of the model
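Steps 3.3 to 3.7 rely on weight of evidence (WOE) and information value (IV). The following is a minimal sketch of how both could be computed for one variable that has already been binned, assuming a binary default flag (1 = default). The function name `woe_iv`, the bin labels and the toy records are illustrative only, and the sign convention ln(%good / %bad) is one common choice, not necessarily the one used in this project.

```python
import math
from collections import defaultdict

def woe_iv(bins, defaults):
    """Return ({bin: WOE}, IV) for one binned variable.

    bins     -- bin label per applicant (hypothetical coarse classes)
    defaults -- 1 if the applicant defaulted, 0 otherwise
    Assumes every bin holds at least one good and one bad record;
    otherwise the logarithm is undefined and bins should be merged.
    """
    good = defaultdict(int)
    bad = defaultdict(int)
    for b, d in zip(bins, defaults):
        if d == 1:
            bad[b] += 1
        else:
            good[b] += 1
    total_good = sum(good.values())
    total_bad = sum(bad.values())

    woe, iv = {}, 0.0
    for b in sorted(set(bins)):
        pct_good = good[b] / total_good   # share of all non-defaulters in this bin
        pct_bad = bad[b] / total_bad      # share of all defaulters in this bin
        woe[b] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[b]
    return woe, iv

# Toy data, not taken from the CredX dataset
bins = ["low"] * 10 + ["high"] * 10
defaults = [1] * 4 + [0] * 6 + [1] * 1 + [0] * 9
woe, iv = woe_iv(bins, defaults)
```

Under this convention a negative WOE marks a bin with a larger share of defaulters than of non-defaulters, and variables with a higher IV separate the two groups more strongly.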
Data Understanding
There are two data sets in this project — demographic and credit bureau data.
Demographic/application data: This is obtained from the information provided by the applicants at the time of credit card application. It contains
customer-level information on age, gender, income, marital status, etc.
Credit bureau: This is taken from the credit bureau and contains variables such as 'number of times 30 DPD or worse in last 3/6/12 months',
'outstanding balance', 'number of trades', etc.
Both the above datasets would be used for predictive modelling and we would be merging these datasets.
For our analysis we would be considering only the ‘Approved population’ of the dataset (i.e. applications which were accepted for a credit card).
For data understanding we would be doing the following checks:
1. Identification of categorical and continuous variables
2. Check for null values, missing values, sanity checks, duplicate records & outliers
3. Univariate and bivariate analysis for categorical variables
4. Univariate and bivariate analysis for continuous variables
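The duplicate, missing-value and merge checks above can be sketched in plain Python as follows. All field names (`application_id`, `age`, `income`, `trades_12m`, `performance_tag`) and the toy records are hypothetical, and the sketch assumes, purely for illustration, that an application without a performance tag was not approved.

```python
from collections import Counter

# Toy records; the field names are hypothetical, not the real column names.
demographic = [
    {"application_id": 1, "age": 34, "income": 20},
    {"application_id": 2, "age": None, "income": 41},
    {"application_id": 2, "age": None, "income": 41},   # duplicate record
    {"application_id": 3, "age": 51, "income": 12},
]
credit_bureau = [
    {"application_id": 1, "trades_12m": 4, "performance_tag": 0},
    {"application_id": 2, "trades_12m": 11, "performance_tag": 1},
    {"application_id": 3, "trades_12m": 2, "performance_tag": None},  # assumed not approved
]

def duplicate_ids(rows, key="application_id"):
    """IDs that occur more than once."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)

def drop_duplicates(rows, key="application_id"):
    """Keep the first record seen for each ID."""
    seen, out = set(), []
    for r in rows:
        if r[key] not in seen:
            seen.add(r[key])
            out.append(r)
    return out

def missing_counts(rows):
    """Number of missing (None) values per field."""
    counts = Counter()
    for r in rows:
        for field, value in r.items():
            if value is None:
                counts[field] += 1
    return dict(counts)

def merge(left, right, key="application_id"):
    """Inner join of two record lists on the key."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left if l[key] in right_by_key]

dups = duplicate_ids(demographic)
demo_clean = drop_duplicates(demographic)
merged = merge(demo_clean, credit_bureau)
# Approved population only (assumption: a missing tag means not approved)
approved = [r for r in merged if r["performance_tag"] is not None]
```

Running the duplicate check before the join keeps a repeated ID from silently multiplying rows in the merged data.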
EDA – Bivariate Analysis (significant variables)
• Low and Middle Income group customers seem to have more defaulters than other income groups.
#Note – income bands: 0 to 15 – Low Income, 16 to 31 – Middle Income, 31 to 45 – Upper Middle Income, >45 – High Income
• Customers with <2 years in current company seem to default the most.
#Note – we converted no. of months to years for EDA plotting
• Customers with 2-5 yrs. in current residence seem to default the most.
#Note – we converted no. of months to years for EDA plotting
• Between 2-4 PL trades opened in the last 6 months shows a higher no. of defaulters.
• Between 9-13 & 24 trades opened in the last 12 months shows a higher no. of defaulters.
• Between 3-5 trades opened in the last 6 months shows a higher no. of defaulters.
Modelling – Logistic
Logistic Model Evaluation
# 4. Data Modeling in detail
Decision Tree/Random Forest
# 4. Model deployment in detail
Decision Tree/Random Forest Model Evaluation

# 5. Model Evaluation in detail
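The evaluation criteria (accuracy, sensitivity, specificity and the KS statistic) can be sketched as below, assuming the model outputs a probability of default and that 1 = default is the positive class. The function names, the 0.5 cutoff and the toy values are illustrative. The KS here is computed row by row over the sorted probabilities, whereas a lift/gain chart would typically report it per decile.

```python
def confusion_metrics(y_true, y_prob, cutoff=0.5):
    """Accuracy, sensitivity and specificity with 1 = default as the positive class."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # share of defaulters caught
        "specificity": tn / (tn + fp),   # share of non-defaulters passed
    }

def ks_statistic(y_true, y_prob):
    """Largest gap between the cumulative shares of defaulters and
    non-defaulters when applicants are sorted by descending probability."""
    total_bad = sum(y_true)
    total_good = len(y_true) - total_bad
    pairs = sorted(zip(y_prob, y_true), reverse=True)
    cum_bad = cum_good = 0
    ks = 0.0
    for i, (p, y) in enumerate(pairs):
        if y == 1:
            cum_bad += 1
        else:
            cum_good += 1
        # Only compare at the end of a run of tied probabilities
        last_of_tie = i + 1 == len(pairs) or pairs[i + 1][0] != p
        if last_of_tie:
            ks = max(ks, abs(cum_bad / total_bad - cum_good / total_good))
    return ks

# Toy values, not model output from this project
y_true = [1, 1, 0, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
metrics = confusion_metrics(y_true, y_prob, cutoff=0.5)
ks = ks_statistic(y_true, y_prob)
```

Because defaulters are usually a small minority, accuracy alone can look high for a model that catches few defaults, which is why sensitivity, specificity and KS are read together.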


Application scorecard
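One common way to turn a predicted probability into an application score is log-odds scaling. The sketch below assumes that approach. The base score of 400 at good:bad odds of 10:1 and the 20 points to double the odds are hypothetical parameters chosen for illustration, not values stated in this deck.

```python
import math

def application_score(p_default, base_score=400.0, base_odds=10.0, pdo=20.0):
    """Map a probability of default to a score via log-odds scaling.

    base_score, base_odds and pdo (points to double the odds) are
    hypothetical scaling parameters. Requires 0 < p_default < 1.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds_good = (1.0 - p_default) / p_default   # good:bad odds
    return offset + factor * math.log(odds_good)

score_at_base = application_score(1 / 11)      # odds of 10:1
score_at_double = application_score(1 / 21)    # odds of 20:1
```

With this scaling a higher score means lower predicted risk, so an approval cutoff can be set directly on the score.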

Assumptions
1. We are considering only the approved population for modelling, as credit bureau data for the
rejected population is not available to us.
2. We are deleting records with missing values where the percentage of such records is very low, assuming that they will not have any
significant impact on the model.
3. Statistical methodology will suffice here for feature selection.
Final benefits and Recommendation
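The auto approval/rejection rate and the potential credit loss avoided could be estimated along the following lines. This is a sketch under simplifying assumptions: the loss on a default is taken to equal the outstanding balance, applicants at or above the probability cutoff are auto-rejected, and the function name, cutoff and toy figures are hypothetical.

```python
def deployment_summary(applicants, cutoff):
    """applicants: list of (p_default, defaulted, outstanding_balance).

    Assumes the loss on a default equals the outstanding balance and that
    applicants with p_default >= cutoff are auto-rejected.
    """
    n = len(applicants)
    rejected = [a for a in applicants if a[0] >= cutoff]
    loss_without_model = sum(bal for _, d, bal in applicants if d == 1)
    loss_avoided = sum(bal for _, d, bal in rejected if d == 1)
    return {
        "approval_rate": (n - len(rejected)) / n,
        "rejection_rate": len(rejected) / n,
        "loss_without_model": loss_without_model,
        "loss_avoided": loss_avoided,
    }

# Toy applicants, not figures from the CredX data
applicants = [
    (0.05, 0, 1000),
    (0.10, 0, 2000),
    (0.40, 1, 5000),
    (0.70, 1, 8000),
    (0.60, 0, 3000),
]
summary = deployment_summary(applicants, cutoff=0.5)
```

The rejected non-defaulter in the toy data illustrates the trade-off: a stricter cutoff avoids more loss but also turns away customers who would have repaid.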
