
Insurance: Mathematics and Economics 8 (1989) 31-34

North-Holland

A credit scoring model for personal loans

A. STEENACKERS
Katholieke Universiteit Leuven, B-3000 Louvain, Belgium

M.J. GOOVAERTS
Katholieke Universiteit Leuven, B-3000 Louvain, Belgium
Universiteit van Amsterdam, 1011 NH Amsterdam, The Netherlands

A logistic regression model is used to develop a numerical scoring system for personal loans.

Keywords: Credit scoring model, Logistic regression.

0167-6687/89/$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)

1. Introduction

In each bank or credit company the evaluation of new credit demands is one of the basic aspects of credit granting. Traditionally the decision whether to grant credit to an individual is taken by a specialized person, who handles each demand individually. As this is a time-consuming (and therefore expensive) process, financial institutions often make use of a credit scoring system.

Such a system is a computerized procedure which attributes to the client a number of points (a score) according to a number of relevant characteristics such as income, profession, age, etc. If the total score obtained by summation of the individual scores is high enough, i.e., if it is higher than a so-called cut-off level, the credit will be granted. If the total score does not reach the cut-off level, the credit will be refused.

This report will focus on the practical derivation of a credit scoring model for personal loans. Section 2 briefly reviews the statistical method. In Section 3 a description of the data is given. In Section 4 the resulting credit scoring model is presented.

2. The method

The two major questions to be answered in the derivation of a credit scoring model are:

(1) Which characteristics are to be used in the scoring model as variables that can discriminate between a good and a bad loan?
(2) How to obtain the score for each characteristic?

Mathematically, a credit scoring system can be expressed as a decision rule based on a linear function

$$f(X_1, \dots, X_n) = b_1 X_1 + b_2 X_2 + \dots + b_n X_n,$$

where

$X_i$ = relevant characteristic,
$b_i$ = weight or score corresponding to characteristic $X_i$.

The method most frequently used for the determination of the coefficients $b_i$ is discriminant analysis [see for example Myers and Forgy (1963)].

However, one of the basic underlying assumptions in discriminant analysis is that the variables $X_i$ are normally distributed. This assumption is violated here, as most of the variables used in a credit scoring system are categorical variables (e.g. profession, marital status).

For that reason a decision rule was determined using a logistic regression model, which does not require the variables $X_i$ to be multivariate normally distributed. In the credit scoring area this model was first proposed by Chesser (1974) for forecasting commercial loan non-compliance.

In the logistic regression model the assumption is made that the probability of a loan being good depends on the level of the characteristics $X_i$. Specifically, one assumes that the posterior probability of a good loan is given by

$$p_x = \frac{e^{b_0 + b_1 X_1 + \dots + b_n X_n}}{1 + e^{b_0 + b_1 X_1 + \dots + b_n X_n}}, \qquad (1)$$

where

$X_i$ = relevant characteristic,
$b_i$ = corresponding weight.
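As an illustration of equation (1), the following minimal sketch evaluates the posterior probability of a good loan for one applicant and compares it with a cut-off level, in the spirit of the decision rule described in the Introduction. The intercept, the weights, the characteristic names and the cut-off of 0.5 are all hypothetical placeholders chosen for this example; they are not the confidential values fitted in this study.

```python
import math

# Hypothetical intercept b_0 and weights b_i, for illustration only.
INTERCEPT = 1.0
WEIGHTS = {
    "not_homeowner": -0.8,        # dummy variable: 1 if the applicant owns no house
    "no_previous_credits": -0.6,  # dummy variable: 1 if no previous credits are reported
}


def linear_predictor(applicant: dict[str, float]) -> float:
    """Return b_0 + sum_i b_i * X_i; characteristics missing from the input count as 0."""
    return INTERCEPT + sum(b * applicant.get(name, 0.0) for name, b in WEIGHTS.items())


def posterior_good(applicant: dict[str, float]) -> float:
    """Evaluate equation (1): p_x = e^z / (1 + e^z), with z the linear predictor."""
    z = linear_predictor(applicant)
    # Both branches equal e^z / (1 + e^z); the split avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def grant_credit(applicant: dict[str, float], cutoff: float = 0.5) -> bool:
    """Accept the loan when the predicted probability exceeds the cut-off level."""
    return posterior_good(applicant) > cutoff


if __name__ == "__main__":
    applicant = {"not_homeowner": 1, "no_previous_credits": 0}
    p = posterior_good(applicant)
    print(f"p_x = {p:.3f}, grant credit: {grant_credit(applicant)}")
```

Because the characteristics are coded as dummy variables, each weight simply shifts the exponent in equation (1) when the corresponding category applies.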



Equivalently,

$$\ln \frac{p_x}{1 - p_x} = b_0 + b_1 X_1 + \dots + b_n X_n. \qquad (2)$$

This means that the natural logarithm of the ratio of the posterior probability of a good loan to the posterior probability of a bad loan is equal to a linear function of the characteristics $X_i$.

The weights $b_i$ are to be estimated by use of the maximum likelihood method [see Altman et al. (1981)].

Finally, a new loan is allocated to the population of the good loans if its predicted probability $p_x$ is higher than a cut-off level c, which will be determined in the last section.

3. The data

For the present study, data on personal loans were collected in a Belgian credit company. The loans dated from November 1984 till December 1986. The total sample contains three kinds of loans: good loans (995), bad loans (1257) and refused loans (693). Refused loans are loans which were not accepted by the credit company. An accepted loan is by definition classified as a bad loan after three reminders.

The total sample is randomly divided into an original sample (containing 800 good, 1000 bad and 500 refused loans), which is used for the derivation of the scoring model, and a so-called holdout sample (containing 195 good, 252 bad and 193 refused loans), used for an unbiased test of the model.

Information was available on personal characteristics, professional situation, financial situation and loan-related characteristics. See Table 1 for a complete list of available characteristics.

Table 1
Characteristics of applicant.

 1. Marital status
 2. Nationality
 3. Sex
 4. No. children
 5. Age ^a
 6. Having a telephone ^a
 7. Time at present address ^a
 8. Geographical region in Belgium ^a
 9. Profession ^a
10. Working at private/public sector ^a
11. Time at present job ^a
12. Total monthly revenue ^a
13. Total monthly expenses
14. Homeowner ^a
15. Previous credit experience
16. No. previous credits ^a
17. Duration of the loan ^a
18. Amount of the loan
19. Object of the loan

^a Variables included in the scoring model.

All possible values for each item were grouped into different categories, and dummy variables were defined to describe each category. For example, professions were grouped into 11 categories according to the necessary educational background.

As frequency tables indicate that short-period loans and long-period loans are more likely to be good than intermediate-period loans, duration of the loan is divided into three groups: less than 21 months, between 22 and 47 months and more than 47 months.

For the selection of the characteristics in the scoring model, the occurrence of all possible values of each characteristic is examined separately in the sample of good loans and bad loans, in order to obtain a first indication of the discriminating power of each characteristic. For example, as 96% of the clients in the sample of good loans have no telephone, against 90% in the sample of bad loans, the characteristic "having a telephone or not" can be considered as a possible variable in the scoring model.

For the final selection of the characteristics, a stepwise logistic regression is carried out. This technique performs a number of logistic regressions, starting with the single variable with the most predictive power, and adding one by one the variables that give the best improvement in goodness of fit of the model, until no further single addition achieves a specified significance level. [See Bartolucci and Fraser (1977).]

As data on good and bad loans are collected from a portfolio which already passed a screening procedure in the credit company, a credit scoring model which is based only on these data gives biased results if it is used for the selection of new loans. Therefore data on refused loans are incorporated in the model in the following way. First a scoring model is built using only the data of good and bad loans. As we do not know whether a
refused loan would have been good if it had been accepted, we apply this model to classify the refused loans into good and bad refused loans. Of course we will never know with certainty whether a refused loan would have been good or bad. The final scoring model is then constructed with the same variables as in the first model but with corresponding weights calculated on the basis of a sample which also includes the classified refused loans.

4. Results

As the final scoring model should remain confidential, the explicit prediction formula for the posterior probability of a new loan to be good will not be reproduced here.

Table 2
Regression results.

                                          First model   Final model
-2 log likelihood
(a) Logistic model                            2042.67       2383.99
(b) Model with only an intercept              2473.06       3165.44
(c) Likelihood ratio                           430.39        781.45
    d.f.                                           28            28
    prob-value                                   0.00          0.00
(d) R-statistic                                  0.39          0.48

Percentage of correct classification ^a
Original sample
    Good loans                                  63.5%      70.0% ^b
    Bad loans                                   76.7%      79.1% ^b
Holdout sample
    Good loans                                  60.0%         62.6%
    Bad loans                                   79.0%         76.6%

^a For a cut-off level c = 0.50.
^b Including refused loans.

The characteristics which are included in the scoring model are indicated by a superscript a in Table 1. All these variables are significant at the 0.05 level.
The signs of all the weights correspond to the theoretical considerations. Among these characteristics, the number of previous credits is the most predictive variable. People who claim to have no previous loans will get a lower weight (and consequently a lower predicted probability of paying back their loan) than people who mention one or two previous loans.

The second important criterion in the selection of new credits by the scoring model is the possession of a house. A client who does not own a house will get a negative weight for this characteristic; a homeowner will get a zero weight. This means that, ceteris paribus, a homeowner will get a higher probability of having a good loan than someone who does not possess a house.

Some of the variables in Table 1, as for example marital status, are not included in the model, although frequency tables for these variables indicate a difference of more than 5% between good and bad loans. These variables are eliminated because of strong intercorrelations with other variables. For example, marital status is correlated with age, possession of a house, etc.

Table 2 shows the results from the first model, based only on bad and good loans, as well as from the final model including data on refused loans. In (a) the -2 log likelihood ($-2 \ln L$) of the total regression model is given; in (b) the same statistic is given for a model with only an intercept ($-2 \ln L_0$).

The likelihood ratio ($LR = 2(\ln L - \ln L_0)$) in (c) is significant for both models.

In (d) the R-statistic is given, which is a statistic similar to the normal multiple correlation coefficient, but with a correction for the number of estimated parameters $k$:

$$R^2 = \frac{-2 \ln L_0 + 2 \ln L - 2k}{-2 \ln L_0}.$$

If the $-2k$ correction is ignored, R is equal to 1 if the model fits perfectly and R is equal to 0 if the model does not differ from the null hypothesis that the predicted probability is equal to the proportion of good loans in the sample for all observations.

Note that the R-statistic in the final model has a higher value than in the first model, because obviously the final model will give good predictions for the refused loans, as these are classified by a model based on the same explanatory variables.

The percentages given in Table 2 are the rates of correct classification in the original sample as well as in the holdout sample for a cut-off level c equal to 0.50. For example, 76.6% of the bad loans in the holdout sample will be classified by the scoring model as bad loans.

These percentages can be used for the determination of the cut-off level in the following way. As the actual rate of acceptance in the credit
company is 53% and the actual rate of delinquency is 20%, the calculations in Figure 1 show that at a cut-off level c = 0.50 about 46.3% of the new loans will be accepted by the scoring model. If one decreases the cut-off level to c = 0.45, the acceptance rate of the scoring model will increase to 51%. In this way the cut-off level can be adapted according to the percentage of loans the credit company wants to accept.

[Figure 1: tree diagram. 100 loans are split into 53 accepted (53%, the actual rate of acceptance) and 47 refused by the credit company; the accepted loans are split according to the actual rate of delinquency; each group is further split into loans accepted and loans refused by the scoring model.]
Fig. 1. Rate of acceptance at cut-off level c = 0.50.

Finally, as the development of a credit scoring model relies on past experience for information on the applicants' characteristics and behaviour, a periodical review of the model will be necessary to adjust for shifts in the underlying factors, as for example income levels.

References

Altman, E., R. Avery, R. Eisenbeis and J. Sinkey (1981). Application of Classification Techniques in Business, Banking and Finance. JAI Press, Greenwich, CT.
Bartolucci, A.A. and M.D. Fraser (1977). Comparative step-up and composite tests for selecting prognostic indicators associated with survival. Biometrical Journal 19, 437-448.
Chesser, D.L. (1974). Predicting loan noncompliance. Journal of Commercial Bank Lending 56, no. 8, 28-38.
Cox, D. (1970). Analysis of Binary Data. Methuen, London.
Myers, J.H. and W. Forgy (1963). The development of numerical credit evaluation systems. Journal of the American Statistical Association 58, no. 303, 799-806.
Van Nieuwburg, M.J.T.J. (1984). Credit-scoring: Een management informatiesysteem. Tijdschrift voor Economie en Management XXIX, no. 4.
