Contents
1 Objective of Project
2 Assumptions
7 Conclusion
1 Objective of Project
We want to build a model that helps Thera Bank, which has more liability customers than asset customers, identify the potential customers who have a higher probability of purchasing a personal loan. The model is based on last year's campaign data for 5000 customers, in which the campaign had a 9.6% success rate.
You have been brought in as a consultant, and your job is to build the best model for identifying the customers who have a higher probability of purchasing the loan. You are expected to do the following:
● EDA of the available data. Showcase the results using appropriate graphs. (10 Marks)
● Apply appropriate clustering to the data and interpret the output. (10 Marks)
● Build appropriate models (CART & Random Forest) using the train and test data. Interpret all the model outputs and make the necessary modifications wherever applicable (such as pruning). (20 Marks)
● Check the performance of all the models that you have built on both the train and test data. Use all the model performance measures you have learned so far, and state which model performs best. (20 Marks)
2 Assumptions
No specific assumption is made about the data.
● Libraries loaded: readxl, corrplot, ggplot2, caTools, rpart, rpart.plot, randomForest, lattice, etc.
● Personal Loan is treated as the dependent variable and all other attributes as independent variables.
● ID is not used, as it is unique for each customer and does not help in model building.
● The data includes demographic information about each customer (Age, Experience, Income, ZIP Code, Family members, Education), which reflects customer behavior, so these columns are taken into consideration.
● Personal Loan, Securities Account, CD Account, Online and CreditCard take only the values 0 or 1. Mortgage is not binary: it ranges from 0 to 635.
● Personal Loan has a mean of 0.096, i.e. last year's campaign had a 9.6% success rate.
● Boxplots were used to look for outliers; outliers are present in Income (in K/month), CCAvg and Mortgage.
● qplot bar charts were used to compare the counts of each level of Personal Loan, Education, Securities Account, CreditCard, CD Account and Online, i.e. to check whether the "1" level is under-represented in the data.
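library(readxl)
library(ggplot2)
library(corrplot)
# mydata holds the bank data read from Excel with readxl::read_excel() (the read call is not shown)
str(mydata)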
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 5000 obs. of 14 variables:
$ ID : num 1 2 3 4 5 6 7 8 9 10 ...
$ Age (in years) : num 25 45 39 35 35 37 53 50 35 34 ...
$ Experience (in years): num 1 19 15 9 8 13 27 24 10 9 ...
$ Income (in K/month) : num 49 34 11 100 45 29 72 22 81 180 ...
$ ZIP Code : num 91107 90089 94720 94112 91330 ...
$ Family members : num 4 3 1 1 4 4 2 1 3 1 ...
$ CCAvg : num 1.6 1.5 1 2.7 1 0.4 1.5 0.3 0.6 8.9 ...
$ Education : num 1 1 1 2 2 2 2 3 2 3 ...
$ Mortgage : num 0 0 0 0 0 155 0 0 104 0 ...
$ Personal Loan : num 0 0 0 0 0 0 0 0 0 1 ...
$ Securities Account : num 1 1 0 0 0 0 0 0 0 0 ...
$ CD Account : num 0 0 0 0 0 0 0 0 0 0 ...
$ Online : num 0 0 0 0 0 1 1 0 1 0 ...
$ CreditCard : num 0 0 0 0 1 0 0 1 0 0 ...
summary(mydata)
ID Age (in years) Experience (in years) Income (in K/month) ZIP Code Family members
Min. : 1 Min. :23.00 Min. :-3.0 Min. : 8.00 Min. : 9307 Min. :1.000
1st Qu.:1251 1st Qu.:35.00 1st Qu.:10.0 1st Qu.: 39.00 1st Qu.:91911 1st Qu.:1.000
Median :2500 Median :45.00 Median :20.0 Median : 64.00 Median :93437 Median :2.000
Mean :2500 Mean :45.34 Mean :20.1 Mean : 73.77 Mean :93153 Mean :2.397
3rd Qu.:3750 3rd Qu.:55.00 3rd Qu.:30.0 3rd Qu.: 98.00 3rd Qu.:94608 3rd Qu.:3.000
Max. :5000 Max. :67.00 Max. :43.0 Max. :224.00 Max. :96651 Max. :4.000
NA's :18
CCAvg Education Mortgage Personal Loan Securities Account CD Account
Min. : 0.000 Min. :1.000 Min. : 0.0 Min. :0.000 Min. :0.0000 Min. :0.0000
1st Qu.: 0.700 1st Qu.:1.000 1st Qu.: 0.0 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000
Median : 1.500 Median :2.000 Median : 0.0 Median :0.000 Median :0.0000 Median :0.0000
Mean : 1.938 Mean :1.881 Mean : 56.5 Mean :0.096 Mean :0.1044 Mean :0.0604
3rd Qu.: 2.500 3rd Qu.:3.000 3rd Qu.:101.0 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:0.0000
Max. :10.000 Max. :3.000 Max. :635.0 Max. :1.000 Max. :1.0000 Max. :1.0000
Online CreditCard
Min. :0.0000 Min. :0.000
1st Qu.:0.0000 1st Qu.:0.000
Median :1.0000 Median :0.000
Mean :0.5968 Mean :0.294
3rd Qu.:1.0000 3rd Qu.:1.000
Max. :1.0000 Max. :1.000
anyNA(mydata)
TRUE
sum(is.na(mydata))
18
mydata=na.omit(mydata)
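# Correlation matrix of the numeric columns (computed before the factor conversion below)
correlationMatrix=cor(mydata)
# read_excel() keeps names such as "Personal Loan"; make.names() turns them into
# syntactic names ("Personal.Loan") so the $ references below resolve to real columns
colnames(mydata)=make.names(colnames(mydata))
# Treat the binary flags, Education and ZIP Code as categorical variables
mydata$Personal.Loan=as.factor(mydata$Personal.Loan)
mydata$Securities.Account=as.factor(mydata$Securities.Account)
mydata$CD.Account=as.factor(mydata$CD.Account)
mydata$Online=as.factor(mydata$Online)
mydata$CreditCard=as.factor(mydata$CreditCard)
mydata$Education=as.factor(mydata$Education)
mydata$ZIP.Code=as.factor(mydata$ZIP.Code)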
boxplot(mydata$Age..in.years.)
boxplot(mydata$Experience..in.years.)
boxplot(mydata$Income..in.K.month.)
boxplot(mydata$Family.members)
boxplot(mydata$CCAvg)
boxplot(mydata$Mortgage)
qplot(mydata$Personal.Loan)
qplot(mydata$Education)
qplot(mydata$Securities.Account)
qplot(mydata$CD.Account)
qplot(mydata$Online)
qplot(mydata$CreditCard)
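The boxplots above flag Income, CCAvg and Mortgage as having outliers. The same rule can be checked numerically: boxplot() draws as separate points any values lying more than 1.5 times the interquartile range beyond the hinges, and boxplot.stats() returns exactly those points. A minimal sketch, assuming a small hypothetical data frame in place of mydata; count_boxplot_outliers() is a helper introduced here for illustration:

```r
# Count the values a boxplot would draw as outliers: points beyond
# coef * IQR from the hinges (boxplot.stats() uses coef = 1.5 by default,
# the same rule boxplot() uses for its whiskers).
count_boxplot_outliers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]                      # drop missing values first
  length(boxplot.stats(x, coef = coef)$out)
}

# Hypothetical example data (not the Thera Bank file): a right-skewed
# variable with a few very large values, similar in shape to Income.
set.seed(1)
example <- data.frame(
  Income = c(rnorm(200, mean = 60, sd = 15), 200, 210, 224),
  Age    = round(runif(203, 23, 67))
)

# One outlier count per column
outlier_counts <- sapply(example, count_boxplot_outliers)
print(outlier_counts)
```

Applied to the numeric columns of mydata, the same sapply() call would give one count per variable instead of one boxplot per variable.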
Clustering
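# Inspect the pairwise distances between customers (see ?dist); the dist() call
# that creates distMatrix is not shown
print(distMatrix, digits = 3)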
clustering=scale(mydata[,1:14])
print(clustering)
ID Age (in years) Experience (in years) Income (in K/month) ZIP Code
[1,] -1.7315312529 -1.77423939 -1.665911856 -0.538174951 -0.96401766
[2,] -1.7308385019 -0.02952064 -0.096320584 -0.864022980 -1.44378718
[3,] -1.7301457508 -0.55293627 -0.445118645 -1.363656626 0.73873996
[4,] -1.7294529998 -0.90188002 -0.968315735 0.569708351 0.45219785
[5,] -1.7287602487 -0.90188002 -1.055515250 -0.625067758 -0.85892081
[6,] -1.7280674977 -0.72740814 -0.619517675 -0.972638990 -0.48613329
[7,] -1.7273747466 0.66836686 0.601275536 -0.038541305 -0.67936070
[8,] -1.7266819956 0.40665905 0.339676991 -1.124701404 0.37255045
[9,] -1.7259892445 -0.90188002 -0.881116220 0.156967513 -1.44378718
[10,] -1.7252964935 -0.98911595 -0.968315735 2.307564509 -0.06103300
[11,] -1.7246037424 1.71519811 1.647669717 0.678324360 0.73402709
[12,] -1.7239109914 -1.42529564 -1.317113796 -0.625067758 -1.35518534
[13,] -1.7232182403 0.23218717 0.252477476 0.873833178 -0.02191623
[14,] -1.7225254893 1.19178248 1.037273112 -0.733683768 0.83299723
[15,] -1.7218327382 1.88966998 1.822068747 0.830386774 -0.66522211
[16,] -1.7211399872 1.27901842 0.862874082 -1.124701404 0.89614960
[17,] -1.7204472361 -0.64017220 -0.532318160 1.221404410 0.87541300
[18,] -1.7197544851 -0.29122845 -0.183520099 0.156967513 0.54315612
[19,] -1.7190617340 0.05771530 0.078078446 2.589966135 -0.72978834
[20,] -1.7183689830 0.84283873 0.688475051 -1.146424606 0.73873996
[21,] -1.7176762319 0.93007467 0.950073597 -1.059531798 0.40648307
[22,] -1.7169834809 1.01731061 0.601275536 -0.234050123 -1.44095946
[23,] -1.7162907298 -1.42529564 -1.317113796 -0.255773325 -1.35518534
[24,] -1.7155979788 -0.11675658 -0.183520099 -0.668514162 -0.86363367
[25,] -1.7149052277 -0.81464408 -0.793916705 1.699314854 1.11624033
[26,] -1.7142124767 -0.20399252 -0.096320584 -0.972638990 0.54315612
[27,] -1.7135197256 -0.46570033 -0.357919130 0.200413917 0.90086246
[28,] -1.7128269746 0.05771530 -0.009121069 1.829654065 -1.45556934
[29,] -1.7121342235 0.93007467 0.862874082 -0.559898153 0.65343713
[30,] -1.7114414724 -0.64017220 -0.619517675 0.982449188 0.44842756
[31,] -1.7107487214 1.19178248 1.298871657 -0.842299778 -0.02191623
[32,] -1.7100559703 -0.46570033 -0.357919130 -0.972638990 0.45455428
[33,] -1.7093632193 0.66836686 0.688475051 -0.711960566 0.77691415
[34,] -1.7086704682 -1.33805970 -1.229914280 -1.211594212 -0.85892081
[35,] -1.7079777172 -1.25082377 -1.317113796 -0.516451749 0.41590880
[36,] -1.7072849661 0.23218717 0.339676991 0.156967513 -0.23823667
[37,] -1.7065922151 1.19178248 1.298871657 1.025895592 0.73873996
[38,] -1.7058994640 0.49389498 0.426876506 -0.060264507 1.25432724
[39,] -1.7052067130 -0.29122845 -0.183520099 1.460359632 0.45314042
[40,] -1.7045139619 -0.64017220 -0.619517675 0.135244311 0.45361171
[41,] -1.7038212109 1.01731061 1.037273112 0.222137119 -0.22645451
[42,] -1.7031284598 -0.98911595 -0.968315735 -0.299219729 0.45691071
[43,] -1.7024357088 -1.16358783 -1.142714765 1.264850814 -1.47677723
[44,] -1.7017429577 -0.55293627 -0.445118645 -0.625067758 1.16101254
[45,] -1.7010502067 0.05771530 -0.009121069 0.656601158 0.43004739
[46,] -1.7003574556 1.01731061 0.950073597 -0.473005345 0.73873996
[47,] -1.6996647046 -0.55293627 -0.532318160 -0.668514162 0.87729815
[48,] -1.6989719535 -0.72740814 -0.706717190 2.611689337 -0.83535649
[49,] -1.6982792025 0.93007467 0.514076021 0.156967513 1.22275105
[50,] -1.6975864514 -0.46570033 -0.357919130 -0.538174951 -0.36736913
[51,] -1.6968937004 -1.16358783 -1.055515250 -1.428826232 -0.49932931
[52,] -1.6962009493 1.36625436 1.473270687 1.243127612 0.73873996
[53,] -1.6955081983 -1.33805970 -1.229914280 -0.038541305 0.40177021
[54,] -1.6948154472 0.40665905 0.514076021 2.524796529 -1.37026651
[55,] -1.6941226962 -1.42529564 -1.317113796 -0.646790960 1.25668367
[56,] -1.6934299451 -0.37846439 -0.270719615 1.416913228 0.40978208
[57,] -1.6927371941 0.84283873 0.862874082 -0.972638990 0.40177021
[58,] -1.6920444430 0.93007467 0.950073597 1.243127612 1.16101254
[59,] -1.6913516920 -1.51253158 -1.578712341 0.417645937 0.43004739
[60,] -1.6906589409 -1.25082377 -1.317113796 2.481350125 -0.86363367
[61,] -1.6899661899 0.31942311 0.339676991 -0.755406970 -1.29533198
[62,] -1.6892734388 0.14495123 0.078078446 1.112788400 0.11994096
[63,] -1.6885806878 -0.29122845 -0.183520099 -1.124701404 -1.44378718
[64,] -1.6878879367 -0.29122845 -0.270719615 -0.907469384 0.64589654
[65,] -1.6871951857 0.14495123 0.252477476 0.678324360 -1.47442079
[66,] -1.6865024346 1.19178248 1.298871657 1.243127612 -0.84478222
[67,] -1.6858096835 1.45349030 1.386071172 0.678324360 1.18646200
[68,] -1.6851169325 0.66836686 0.252477476 -0.625067758 0.92866836
[69,] -1.6844241814 0.14495123 0.078078446 -0.299219729 0.11994096
[70,] -1.6837314304 0.66836686 0.775674566 -1.168147808 -1.46452378
[71,] -1.6830386793 -0.29122845 -0.183520099 0.895556380 -0.85656437
Family members CCAvg Education Mortgage Personal Loan Securities Account CD Account
[1,] 1.3971629 -0.19336610 -1.0489730 -0.5554684 -0.3258427 2.9286223 -0.2535149
[2,] 0.5254452 -0.25058550 -1.0489730 -0.5554684 -0.3258427 2.9286223 -0.2535149
[3,] -1.2179901 -0.53668251 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[4,] -1.2179901 0.43604731 0.1416887 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[5,] 1.3971629 -0.53668251 0.1416887 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[6,] 1.3971629 -0.87999891 0.1416887 0.9684153 -0.3258427 -0.3413892 -0.2535149
[7,] -0.3462724 -0.25058550 0.1416887 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[8,] -1.2179901 -0.93721831 1.3323505 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[9,] 0.5254452 -0.76556011 0.1416887 0.4670084 -0.3258427 -0.3413892 -0.2535149
[10,] -1.2179901 3.98365017 1.3323505 -0.5554684 3.0683519 -0.3413892 -0.2535149
[11,] 1.3971629 0.26438911 1.3323505 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[12,] 0.5254452 -1.05165711 0.1416887 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[13,] -0.3462724 1.06546072 1.3323505 -0.5554684 -0.3258427 2.9286223 -0.2535149
[14,] 1.3971629 0.32160851 0.1416887 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[15,] -1.2179901 0.03551150 -1.0489730 -0.5554684 -0.3258427 2.9286223 -0.2535149
[16,] -1.2179901 -0.25058550 1.3323505 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[17,] 1.3971629 1.58043533 1.3323505 0.7619536 3.0683519 -0.3413892 -0.2535149
[18,] 1.3971629 0.26438911 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[19,] -0.3462724 3.52589497 1.3323505 -0.5554684 3.0683519 -0.3413892 -0.2535149
[20,] -1.2179901 -0.82277951 0.1416887 -0.5554684 -0.3258427 2.9286223 -0.2535149
[21,] NA -0.59390191 0.1416887 0.5358290 -0.3258427 -0.3413892 -0.2535149
[22,] 0.5254452 0.03551150 1.3323505 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[23,] -1.2179901 -0.42224370 -1.0489730 2.0007236 -0.3258427 -0.3413892 -0.2535149
[24,] -0.3462724 -0.70834071 -1.0489730 1.0470673 -0.3258427 2.9286223 -0.2535149
[25,] -0.3462724 1.12268012 -1.0489730 1.0077413 -0.3258427 -0.3413892 -0.2535149
[26,] 0.5254452 -0.82277951 -1.0489730 0.3981878 -0.3258427 -0.3413892 -0.2535149
[27,] 1.3971629 -0.99443771 1.3323505 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[28,] -1.2179901 0.26438911 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[29,] -1.2179901 0.14995031 1.3323505 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[30,] -1.2179901 0.77936372 0.1416887 -0.5554684 3.0683519 -0.3413892 3.9437520
[31,] -1.2179901 -0.42224370 1.3323505 0.6439755 -0.3258427 -0.3413892 -0.2535149
[32,] -1.2179901 0.03551150 0.1416887 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[33,] -0.3462724 -0.76556011 1.3323505 1.3420126 -0.3258427 -0.3413892 -0.2535149
[34,] 0.5254452 -0.59390191 1.3323505 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[35,] 1.3971629 -0.07892730 1.3323505 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[36,] 0.5254452 -0.70834071 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[37,] -1.2179901 0.55048611 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[38,] -1.2179901 -0.30780490 1.3323505 1.3911701 -0.3258427 -0.3413892 -0.2535149
[39,] 0.5254452 1.75209353 1.3323505 -0.5554684 3.0683519 2.9286223 3.9437520
[40,] 1.3971629 -0.70834071 1.3323505 2.2465112 -0.3258427 -0.3413892 -0.2535149
[41,] 0.5254452 -0.19336610 1.3323505 -0.5554684 -0.3258427 2.9286223 -0.2535149
[42,] 0.5254452 0.20716971 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[43,] 1.3971629 -0.47946310 0.1416887 3.4951127 3.0683519 -0.3413892 -0.2535149
[44,] -1.2179901 -0.70834071 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[45,] -1.2179901 2.15262934 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[46,] 1.3971629 0.32160851 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[47,] 0.5254452 -0.70834071 0.1416887 0.9487523 -0.3258427 -0.3413892 -0.2535149
[48,] 1.3971629 -0.99443771 1.3323505 1.5189797 3.0683519 2.9286223 3.9437520
[49,] -0.3462724 1.46599653 1.3323505 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[50,] -1.2179901 -0.07892730 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[51,] 1.3971629 -0.70834071 0.1416887 -0.5554684 -0.3258427 2.9286223 -0.2535149
[52,] -1.2179901 0.55048611 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[53,] -1.2179901 -1.05165711 -1.0489730 1.4796537 -0.3258427 -0.3413892 -0.2535149
[54,] 0.5254452 0.09273091 1.3323505 1.8040934 3.0683519 -0.3413892 -0.2535149
[55,] -1.2179901 -0.99443771 1.3323505 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[56,] -0.3462724 3.46867556 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[57,] 0.5254452 -1.05165711 0.1416887 -0.5554684 -0.3258427 2.9286223 3.9437520
[58,] -0.3462724 -0.42224370 1.3323505 -0.5554684 3.0683519 -0.3413892 -0.2535149
[59,] NA -0.99443771 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[60,] -0.3462724 1.46599653 -1.0489730 3.9178675 -0.3258427 -0.3413892 -0.2535149
[61,] 0.5254452 -0.13614670 0.1416887 -0.5554684 -0.3258427 2.9286223 -0.2535149
[62,] -1.2179901 2.15262934 -1.0489730 0.5456605 -0.3258427 2.9286223 -0.2535149
[63,] -1.2179901 -0.53668251 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[64,] 1.3971629 -1.10887652 0.1416887 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[65,] -0.3462724 0.77936372 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[66,] -1.2179901 1.06546072 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[67,] -0.3462724 0.49326671 -1.0489730 2.7479181 -0.3258427 -0.3413892 -0.2535149
[68,] 1.3971629 0.03551150 1.3323505 0.7422906 -0.3258427 2.9286223 -0.2535149
[69,] 0.5254452 0.09273091 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[70,] 1.3971629 -0.99443771 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
[71,] -1.2179901 0.89380252 -1.0489730 -0.5554684 -0.3258427 -0.3413892 -0.2535149
Online CreditCard
[1,] -1.2164961 -0.6452498
[2,] -1.2164961 -0.6452498
[3,] -1.2164961 -0.6452498
[4,] -1.2164961 -0.6452498
[5,] -1.2164961 1.5494774
[6,] 0.8218687 -0.6452498
[7,] 0.8218687 -0.6452498
[8,] -1.2164961 1.5494774
[9,] 0.8218687 -0.6452498
[10,] -1.2164961 -0.6452498
[11,] -1.2164961 -0.6452498
[12,] 0.8218687 -0.6452498
[13,] -1.2164961 -0.6452498
[14,] 0.8218687 -0.6452498
[15,] -1.2164961 -0.6452498
[16,] 0.8218687 1.5494774
[17,] -1.2164961 -0.6452498
[18,] -1.2164961 -0.6452498
[19,] -1.2164961 -0.6452498
[20,] -1.2164961 1.5494774
[21,] 0.8218687 -0.6452498
[22,] 0.8218687 -0.6452498
[23,] 0.8218687 -0.6452498
[24,] -1.2164961 -0.6452498
[25,] -1.2164961 1.5494774
[26,] 0.8218687 -0.6452498
[27,] -1.2164961 -0.6452498
[28,] 0.8218687 1.5494774
[29,] 0.8218687 1.5494774
[30,] 0.8218687 1.5494774
[31,] 0.8218687 -0.6452498
[32,] 0.8218687 -0.6452498
[33,] -1.2164961 -0.6452498
[34,] -1.2164961 -0.6452498
[35,] 0.8218687 -0.6452498
[36,] -1.2164961 -0.6452498
[37,] -1.2164961 1.5494774
[38,] -1.2164961 -0.6452498
[39,] 0.8218687 -0.6452498
[40,] 0.8218687 -0.6452498
[41,] -1.2164961 -0.6452498
[42,] -1.2164961 -0.6452498
[43,] 0.8218687 -0.6452498
[44,] 0.8218687 -0.6452498
[45,] 0.8218687 1.5494774
[46,] -1.2164961 1.5494774
[47,] 0.8218687 -0.6452498
[48,] 0.8218687 1.5494774
[49,] -1.2164961 1.5494774
[50,] -1.2164961 1.5494774
[51,] 0.8218687 -0.6452498
[52,] 0.8218687 -0.6452498
[53,] -1.2164961 -0.6452498
[54,] 0.8218687 -0.6452498
[55,] 0.8218687 -0.6452498
[56,] 0.8218687 -0.6452498
[57,] 0.8218687 -0.6452498
[58,] -1.2164961 -0.6452498
[59,] -1.2164961 -0.6452498
[60,] -1.2164961 -0.6452498
[61,] 0.8218687 -0.6452498
[62,] -1.2164961 -0.6452498
[63,] -1.2164961 -0.6452498
[64,] 0.8218687 -0.6452498
[65,] -1.2164961 -0.6452498
[66,] 0.8218687 1.5494774
[67,] -1.2164961 -0.6452498
[68,] -1.2164961 -0.6452498
[69,] 0.8218687 1.5494774
[70,] 0.8218687 -0.6452498
[71,] -1.2164961 1.5494774
[ reached getOption("max.print") -- omitted 4929 rows ]
attr(,"scaled:center")
ID Age (in years) Experience (in years) Income (in K/month)
2500.500000 45.338400 20.104600 73.774200
ZIP Code Family members CCAvg Education
93152.503000 2.397230 1.937938 1.881000
Mortgage Personal Loan Securities Account CD Account
56.498800 0.096000 0.104400 0.060400
Online CreditCard
0.596800 0.294000
attr(,"scaled:scale")
ID Age (in years) Experience (in years) Income (in K/month)
1443.5200033 11.4631656 11.4679537 46.0337293
ZIP Code Family members CCAvg Education
2121.8521973 1.1471604 1.7476590 0.8398691
Mortgage Personal Loan Securities Account CD Account
101.7138021 0.2946207 0.3058093 0.2382503
Online CreditCard
0.4905893 0.4556375
cluster$height
[1] 0.1644316 0.1659243 0.1698234 0.1701177 0.2045809 0.2057245 0.2096606 0.2108413 0.2156200 0.2200490
[11] 0.2285585 0.2326489 0.2430373 0.2437600 0.2522115 0.2525235 0.2534795 0.2554537 0.2554810 0.2588159
[21] 0.2749192 0.2750783 0.2802163 0.2853737 0.2955212 0.2959120 0.3089493 0.3091822 0.3100042 0.3178606
[31] 0.3180990 0.3188373 0.3216391 0.3217560 0.3218460 0.3219622 0.3262639 0.3282431 0.3283350 0.3286615
[41] 0.3335504 0.3371445 0.3418396 0.3514692 0.3564445 0.3571165 0.3606581 0.3613403 0.3637455 0.3671325
[51] 0.3703429 0.3757498 0.3765238 0.3776570 0.3791352 0.3792368 0.3797722 0.3797880 0.3851541 0.3855276
[61] 0.3872324 0.3874331 0.3887630 0.3960005 0.3968269 0.3972734 0.3985425 0.4023943 0.4024181 0.4030343
[71] 0.4061548 0.4082258 0.4086243 0.4095620 0.4096985 0.4124044 0.4134006 0.4158446 0.4188541 0.4195476
[81] 0.4200046 0.4201225 0.4216312 0.4254159 0.4258333 0.4262333 0.4273095 0.4294341 0.4319551 0.4344206
[91] 0.4373783 0.4412600 0.4418621 0.4422429 0.4439687 0.4446968 0.4470181 0.4510722 0.4531260 0.4536369
[101] 0.4563163 0.4568637 0.4568665 0.4591126 0.4633760 0.4635948 0.4638394 0.4654513 0.4666214 0.4679173
[111] 0.4685319 0.4687902 0.4706936 0.4728982 0.4767953 0.4775498 0.4776026 0.4777108 0.4783247 0.4791086 ... and so on
[ reached getOption("max.print") -- omitted 3999 entries ]
##Adding cluster number back to the dataset
mydata$cluster
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[51] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[101] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[151] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[201] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[251] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[301] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[351] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[401] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[451] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[501] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[551] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[601] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[651] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[701] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[751] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[801] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[851] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[901] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[951] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[ reached getOption("max.print") -- omitted 4000 entries ]
bank$Frequency = as.vector(table(mydata$cluster))
View(bank)
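The listing above prints distMatrix, cluster$height and mydata$cluster but does not show the dist(), hclust() and cutree() calls, nor how the bank profile table was built. The sketch below shows one way those steps fit together in base R. It assumes Euclidean distance, Ward linkage (method = "ward.D2") and k = 2 clusters, none of which the original states, and it runs on hypothetical simulated customers rather than the bank data. In the printed mydata$cluster almost every customer falls into cluster 1; since the Assumptions section already treats ID as uninformative, dropping ID (and the target, Personal Loan) before calling scale() may give more interpretable clusters.

```r
# Hypothetical data: two groups of customers with different income and spending
set.seed(42)
customers <- data.frame(
  Income = c(rnorm(50, 40, 8), rnorm(50, 150, 15)),
  CCAvg  = c(rnorm(50, 1, 0.3), rnorm(50, 5, 0.8)),
  Age    = round(runif(100, 25, 65))
)

scaled     <- scale(customers)                       # mean 0, sd 1 per column
distMatrix <- dist(scaled, method = "euclidean")     # pairwise distances
cluster    <- hclust(distMatrix, method = "ward.D2") # assumed linkage

# Cut the dendrogram into k groups and attach the label to each customer
k <- 2
customers$cluster <- cutree(cluster, k = k)

# Cluster profile: mean of each variable per cluster, plus cluster sizes
profile <- aggregate(. ~ cluster, data = customers, FUN = mean)
profile$Frequency <- as.vector(table(customers$cluster))
print(profile)
```

Plotting the dendrogram with plot(cluster) before choosing k is a common way to judge where to cut.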
CART & Random Forest
##Splitting
table(mydata$`Personal Loan`)
0 1
4520 480
sum(mydata$`Personal Loan`==1)/nrow(mydata)
0.096
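The summary(TrainingData) output further down shows 3500 rows with a Personal Loan mean of 0.096, the same share as the full data, which is what a split stratified on the target produces (caTools::sample.split(y, SplitRatio = 0.7) preserves the label ratio in this way). A base-R sketch of the idea, using the class counts 4520 and 480 from the table above as hypothetical labels; stratified_split() is a helper name introduced here:

```r
# Stratified split: sample the same fraction from each class so both sets
# keep the original share of loan takers. Returns TRUE for training rows.
stratified_split <- function(y, train_ratio = 0.7) {
  in_train <- logical(length(y))            # all FALSE to start
  for (cls in unique(y)) {
    idx <- which(y == cls)                  # rows belonging to this class
    n_train <- round(length(idx) * train_ratio)
    in_train[idx[sample.int(length(idx), n_train)]] <- TRUE
  }
  in_train
}

# Hypothetical labels with the class counts from the table above
y <- c(rep(0, 4520), rep(1, 480))
set.seed(1000)
in_train <- stratified_split(y, train_ratio = 0.7)
print(c(train_rows = sum(in_train), train_share = mean(y[in_train]),
        test_rows = sum(!in_train), test_share = mean(y[!in_train])))
```

With these counts the training set gets round(4520 * 0.7) = 3164 non-takers and round(480 * 0.7) = 336 takers, 3500 rows in total, so its loan-taker share stays at 336 / 3500 = 0.096.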
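library(caTools)
# TrainingData (3500 rows) and TestingData (the remaining 1500 rows) come from a 70/30 split,
# presumably made with caTools::sample.split(); the split call is not shown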
summary(TrainingData)
ID Age (in years) Experience (in years) Income (in K/month) ZIP Code Family members
Min. : 1 Min. :23.00 Min. :-3.00 Min. : 8.00 Min. : 9307 Min. :1.000
1st Qu.:1261 1st Qu.:36.00 1st Qu.:10.00 1st Qu.: 39.00 1st Qu.:91942 1st Qu.:1.000
Median :2495 Median :45.00 Median :20.00 Median : 64.00 Median :93437 Median :2.000
Mean :2503 Mean :45.51 Mean :20.28 Mean : 73.86 Mean :93155 Mean :2.408
3rd Qu.:3759 3rd Qu.:56.00 3rd Qu.:30.00 3rd Qu.: 98.00 3rd Qu.:94608 3rd Qu.:4.000
Max. :4994 Max. :67.00 Max. :43.00 Max. :224.00 Max. :96651 Max. :4.000
NA's :11
CCAvg Education Mortgage Personal Loan Securities Account CD Account
Min. : 0.000 Min. :1.000 Min. : 0.00 Min. :0.000 Min. :0.0000 Min. :0.00000
1st Qu.: 0.700 1st Qu.:1.000 1st Qu.: 0.00 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.00000
Median : 1.500 Median :2.000 Median : 0.00 Median :0.000 Median :0.0000 Median :0.00000
Mean : 1.931 Mean :1.875 Mean : 55.52 Mean :0.096 Mean :0.1046 Mean :0.06229
3rd Qu.: 2.500 3rd Qu.:3.000 3rd Qu.:101.00 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:0.00000
Max. :10.000 Max. :3.000 Max. :635.00 Max. :1.000 Max. :1.0000 Max. :1.00000
Online CreditCard
Min. :0.0000 Min. :0.0000
1st Qu.:0.0000 1st Qu.:0.0000
Median :1.0000 Median :0.0000
Mean :0.6029 Mean :0.2937
3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :1.0000 Max. :1.0000
str(TrainingData)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3500 obs. of 14 variables:
$ ID : num 1 2 4 5 7 8 9 10 12 13 ...
$ Age (in years) : num 25 45 35 35 53 50 35 34 29 48 ...
$ Experience (in years): num 1 19 9 8 27 24 10 9 5 23 ...
$ Income (in K/month) : num 49 34 100 45 72 22 81 180 45 114 ...
$ ZIP Code : num 91107 90089 94112 91330 91711 ...
$ Family members : num 4 3 1 4 2 1 3 1 3 2 ...
$ CCAvg : num 1.6 1.5 2.7 1 1.5 0.3 0.6 8.9 0.1 3.8 ...
$ Education : num 1 1 2 2 2 3 2 3 2 3 ...
$ Mortgage : num 0 0 0 0 0 0 104 0 0 0 ...
$ Personal Loan : num 0 0 0 0 0 0 0 1 0 0 ...
$ Securities Account : num 1 1 0 0 0 0 0 0 0 1 ...
$ CD Account : num 0 0 0 0 0 0 0 0 0 0 ...
$ Online : num 0 0 0 0 1 0 1 0 1 0 ...
$ CreditCard : num 0 0 0 1 0 1 0 0 0 0 ...
colnames(TrainingData)=make.names(colnames(TrainingData))
print(TrainingData)
# A tibble: 3,500 x 14
ID Age..in.years. Experience..in.~ Income..in.K.mo~ ZIP.Code Family.members CCAvg Education Mortgage
* <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 25 1 49 91107 4 1.6 1 0
2 2 45 19 34 90089 3 1.5 1 0
3 4 35 9 100 94112 1 2.7 2 0
4 5 35 8 45 91330 4 1 2 0
5 7 53 27 72 91711 2 1.5 2 0
6 8 50 24 22 93943 1 0.3 3 0
7 9 35 10 81 90089 3 0.6 2 104
8 10 34 9 180 93023 1 8.9 3 0
9 12 29 5 45 90277 3 0.1 2 0
10 13 48 23 114 93106 2 3.8 3 0
# ... with 3,490 more rows, and 5 more variables: Personal.Loan <dbl>, Securities.Account <dbl>,
# CD.Account <dbl>, Online <dbl>, CreditCard <dbl>
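read_excel() keeps column names with spaces, brackets and slashes, as the str() output shows, while the code refers to names such as Personal.Loan and Age..in.years.. make.names() produces exactly those syntactic names, and until it is applied a $ lookup with the dotted name finds no column. A small check using a few of the column names listed by str():

```r
# Raw column names as listed by str(mydata)
raw_names   <- c("Age (in years)", "Income (in K/month)", "Personal Loan", "CD Account")
clean_names <- make.names(raw_names)
# Every character not allowed in an R name (space, bracket, slash) becomes ".":
# "Age..in.years." "Income..in.K.month." "Personal.Loan" "CD.Account"
print(clean_names)

# Before renaming, $ with the dotted name finds no column and returns NULL
df <- data.frame(`Personal Loan` = c(0, 1, 0), check.names = FALSE)
lookup_before <- df$Personal.Loan
names(df) <- make.names(names(df))
lookup_after <- df$Personal.Loan    # now resolves to the column
```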
library(rpart)
library(rpart.plot)
seed=1000
set.seed(seed)
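# Full tree: cp = 0 disables the complexity penalty, and each leaf keeps at least
# 5 observations (minbucket = 5); the call matches the one printed by printcp()
TrainingTree = rpart(formula = Personal.Loan ~ ., data = TrainingData, method = "class",
                     cp = 0, minbucket = 5)
printcp(TrainingTree)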
Classification tree:
rpart(formula = Personal.Loan ~ ., data = TrainingData, method = "class",
cp = 0, minbucket = 5)
n= 3500
plotcp(TrainingTree)
# Pruned_TrainingTree is TrainingTree cut back with prune(); the cp value used is not shown
printcp(Pruned_TrainingTree)
Classification tree:
rpart(formula = Personal.Loan ~ ., data = TrainingData, method = "class",
cp = 0, minbucket = 5)
n= 3500
# table(TrainingData$Personal.Loan,TrainingData$Cart_Prediction)
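Pruned_TrainingTree is printed above, but the pruning step itself is not shown. A common approach is to read the cp table of the full tree, take the CP value with the lowest cross-validated error (xerror), and pass it to prune(). A minimal sketch with rpart on hypothetical simulated data; the original's cp choice may differ (a one-standard-error rule is another common option):

```r
library(rpart)   # recommended package shipped with standard R installations

# Hypothetical data standing in for TrainingData: loan uptake depends
# mainly on income, with some label noise
set.seed(1000)
n <- 1000
sim <- data.frame(
  Income = round(runif(n, 8, 224)),
  CCAvg  = round(runif(n, 0, 10), 1),
  Family = sample(1:4, n, replace = TRUE)
)
p <- ifelse(sim$Income > 120, 0.7, 0.05)
sim$Personal.Loan <- factor(rbinom(n, 1, p))

# Grow a deliberately large tree, with the same settings as TrainingTree
full_tree <- rpart(Personal.Loan ~ ., data = sim, method = "class",
                   cp = 0, minbucket = 5)

# Choose the cp with the lowest cross-validated error in the cp table
cp_table <- full_tree$cptable
best_cp  <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Prune back to that complexity and inspect the result
pruned_tree <- prune(full_tree, cp = best_cp)
printcp(pruned_tree)
```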
##Random Forest
library(randomForest)
set.seed(seed)
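# Call as reported by print() below
RandomForestModel = randomForest(Personal.Loan ~ ., data = TrainingData[, c(-4, -14)], ntree = 501, mtry = 3,
                                 nodesize = 10, importance = TRUE, na.action = na.exclude)
print(RandomForestModel)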
Call:
randomForest(formula = Personal.Loan ~ ., data = TrainingData[, c(-4, -14)], ntree = 501, mtry = 3, nodesize = 10,
importance = TRUE, na.action = na.exclude)
Type of random forest: regression
Number of trees: 501
No. of variables tried at each split: 3
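RandomForestModel$err.rate
NULL
# err.rate is NULL because Personal.Loan is still numeric, so a regression forest was
# fitted ("Type of random forest: regression" above); err.rate is only stored for
# classification forests. Converting the target first with
# TrainingData$Personal.Loan = as.factor(TrainingData$Personal.Loan)
# gives a classification forest whose err.rate has OOB, 0 and 1 columns.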
plot(RandomForestModel)
legend("topright", c("OOB", "0", "1"), text.col = 1:6, lty = 1:3, col = 1:3)
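The forest above reports "Type of random forest: regression" and an err.rate of NULL, because Personal.Loan is stored as a number (<dbl> in the tibble printout). randomForest() builds a classification forest, with the OOB, 0 and 1 error columns that the legend refers to, only when the response is a factor. A sketch on hypothetical simulated data showing the difference (randomForest() may warn when a numeric response has very few distinct values, so that warning is suppressed here):

```r
library(randomForest)

# Hypothetical data: loan uptake driven mainly by income
set.seed(1000)
n <- 600
sim <- data.frame(
  Income = round(runif(n, 8, 224)),
  CCAvg  = round(runif(n, 0, 10), 1),
  Family = sample(1:4, n, replace = TRUE)
)
sim$Personal.Loan <- rbinom(n, 1, ifelse(sim$Income > 120, 0.7, 0.05))

# Numeric 0/1 target: a regression forest, so err.rate is NULL
rf_numeric <- suppressWarnings(randomForest(Personal.Loan ~ ., data = sim, ntree = 101))
print(rf_numeric$type)

# Factor target: a classification forest with OOB error tracked per tree
sim$Personal.Loan <- as.factor(sim$Personal.Loan)
rf_class <- randomForest(Personal.Loan ~ ., data = sim, ntree = 101,
                         mtry = 2, nodesize = 10, importance = TRUE)
print(rf_class)

# Last row of err.rate: overall OOB error and the error within each class
print(tail(rf_class$err.rate, 1))
```

With the factor target, plot(rf_class) draws one line per err.rate column, which is what the three-entry legend expects.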
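set.seed(12345)
# Tune mtry with randomForest::tuneRF(); the start of this call (the object it is
# assigned to and the x = predictor data argument) is missing from the listing
    y = TrainingData$Personal.Loan,
    ntreeTry = 51,
    nodesize = 10,
    mtryStart = 6,        # starting value of mtry
    stepFactor = 1.5,     # mtry is multiplied or divided by this at each step
    improve = 0.0001,     # minimum relative OOB improvement needed to keep searching
    trace = TRUE,
    plot = TRUE,
    doBest = TRUE,        # refit a forest with the best mtry found
    importance = TRUE)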
# print(tuneRF) prints the source code of the tuneRF() function itself (from
# namespace:randomForest) rather than the tuning result; print the object that the
# tuneRF() call above was assigned to in order to see the OOB error for each mtry.
## Building the CART model
# cartParameters is a list of rpart.control() settings; its values are not shown in the listing
TrainingDataModel <- rpart(formula = Personal.Loan ~ ., data = TrainingData, method = "class",
                           control = cartParameters)
TrainingDataModel
n= 3500
library(rattle)
library(RColorBrewer)
fancyRpartPlot(TrainingDataModel)
printcp(TrainingDataModel)
Classification tree:
rpart(formula = Personal.Loan ~ ., data = TrainingData, method = "class",
control = cartParameters)
n= 3500
# bestcp: the cp value selected from the cp table above (how it was computed is not shown)
bestcp
library(gplots)
library(ineq)
library(InformationValue)
View(TrainingData)
##Confusion Matrix
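colnames(TestingData)=make.names(colnames(TestingData))
# Cart_PredictionClass holds the CART model's predicted class for each test customer
# (the predict() step is not shown); rows = actual class, columns = predicted class
confusionMatrix_Cart_Test=table(TestingData$Personal.Loan, TestingData$Cart_PredictionClass)
# Error rate = (false positives + false negatives) / number of test rows
ErrorRate_CartModel=print((confusionMatrix_Cart_Test[1,2]+confusionMatrix_Cart_Test[2,1])/nrow(TestingData))
[1] 0.02133333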
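An error rate of 0.0213 corresponds to an accuracy of about 97.9%. Because only 9.6% of customers took the loan, a model that predicted "no loan" for everyone would already reach about 90.4% accuracy on this data, so the share of actual loan takers the model finds (sensitivity) is worth reporting alongside the error rate. A base-R sketch computing these measures from a table(actual, predicted) matrix laid out like confusionMatrix_Cart_Test; confusion_metrics() and the example labels are hypothetical:

```r
confusion_metrics <- function(actual, predicted) {
  lv <- c("0", "1")
  # Rows = actual class, columns = predicted class, as in table(actual, predicted)
  cm <- table(factor(actual, levels = lv), factor(predicted, levels = lv))
  tn <- cm["0", "0"]; fp <- cm["0", "1"]
  fn <- cm["1", "0"]; tp <- cm["1", "1"]
  c(error_rate  = (fp + fn) / sum(cm),
    accuracy    = (tp + tn) / sum(cm),
    sensitivity = tp / (tp + fn),   # share of actual loan takers that are found
    specificity = tn / (tn + fp),   # share of non-takers correctly left out
    precision   = tp / (tp + fp))   # share of predicted takers that are real takers
}

# Hypothetical labels for illustration only
actual    <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
predicted <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 0)
m <- confusion_metrics(actual, predicted)
print(round(m, 3))
```

For the CART model the same function could be called as confusion_metrics(TestingData$Personal.Loan, TestingData$Cart_PredictionClass).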
##AUC
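# auc is assumed to be a ROCR performance object, e.g. performance(prediction(scores, actuals), "auc");
# that step is not shown
auc_TrainingData = as.numeric(auc@y.values)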
## Gini Coefficient
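AUC can also be computed without an extra package through its rank (Mann-Whitney) form: the probability that a randomly chosen loan taker receives a higher score than a randomly chosen non-taker, with ties counted as one half. The Gini coefficient used with ROC curves is then 2 * AUC - 1; this differs from applying ineq::ineq(x, type = "Gini") to the predicted scores, which measures how unequal the scores themselves are. A minimal sketch; auc_rank(), gini_from_auc() and the example scores are hypothetical:

```r
# AUC via ranks: (sum of positive ranks - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
auc_rank <- function(actual, score) {
  pos <- score[actual == 1]
  neg <- score[actual == 0]
  r <- rank(c(pos, neg))                 # average ranks handle tied scores
  n_pos <- length(pos)
  n_neg <- length(neg)
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Gini coefficient derived from the AUC
gini_from_auc <- function(auc) 2 * auc - 1

# Hypothetical scores: 4 of the 6 (taker, non-taker) pairs are ranked correctly
actual <- c(0, 0, 0, 1, 1)
score  <- c(0.1, 0.4, 0.35, 0.8, 0.3)
auc_value  <- auc_rank(actual, score)
gini_value <- gini_from_auc(auc_value)
print(c(AUC = auc_value, Gini = gini_value))
```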
##Concordance Ratio
Concordance_TrainingData = Concordance(actuals=Personal.Loan$actuals,
predictedscores=Personal.Loan$predictedscores)
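The concordance ratio compares every (loan taker, non-taker) pair: a pair is concordant when the taker has the higher predicted score, discordant when the non-taker does, and tied otherwise. A base-R sketch of that definition with hypothetical scores; concordance_ratio() is a name introduced here, and its arguments differ from the Concordance() call above:

```r
concordance_ratio <- function(actual, score) {
  pos <- score[actual == 1]              # scores of loan takers
  neg <- score[actual == 0]              # scores of non-takers
  diffs <- outer(pos, neg, "-")          # one entry per (taker, non-taker) pair
  n_pairs <- length(diffs)
  c(concordant = sum(diffs > 0) / n_pairs,
    discordant = sum(diffs < 0) / n_pairs,
    tied       = sum(diffs == 0) / n_pairs)
}

# Hypothetical scores: 6 pairs, of which 4 concordant, 1 discordant, 1 tied
actual <- c(0, 0, 0, 1, 1)
score  <- c(0.1, 0.4, 0.3, 0.8, 0.3)
res <- concordance_ratio(actual, score)
print(res)
```

outer() builds one entry per pair, so for the 480 takers and 4520 non-takers in the full data that is about 2.2 million differences, which fits comfortably in memory.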
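7 Conclusion
The Personal Loan distribution is consistent between the train and test data (the training set keeps the 9.6% share of loan takers), so the split is representative. The CART model misclassifies about 2.1% of test customers (error rate 0.0213), which suggests that it generalizes well to unseen data.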