The credit assessments made by corporate banks have been evolving in recent
years: from the subjective judgment of the bank's credit experts towards more
mathematically grounded procedures. Banks are increasingly recognizing the
pressing need for comprehensive modeling of credit risk, and the financial
crisis of 2008 is certain to reinforce the need for sound modeling procedures.
In this thesis a modeling framework for credit assessment models is constructed.
Different modeling procedures are tried, leading to the conclusion that logistic
regression is the most suitable framework for credit rating models. Analyzing
the performance of different link functions for the logistic regression leads
to the conclusion that the complementary log-log link is most suitable for
modeling the default event.
Validation of credit rating models lacks a single numeric measure that summarizes
model performance. A solution to this problem is suggested, using principal
component representatives of a few discriminatory power indicators. With a
single measure of model performance, model development becomes a much more
efficient process; the same goes for variable selection. The data used in the
modeling process are not as extensive as would be the case for many banks. A
resampling process is introduced that is useful for obtaining stable estimates
of model performance from a relatively small dataset.
Preface
The project was carried out in the period from October 1st 2007 to October 1st
2008.
The subject of the thesis is the statistical aspect of credit risk modeling.
I would also like to thank my family, my girlfriend Hrund for her moral support,
my older son Halli for his patience and my new-born son Almar for his inspiration
and for allowing me some sleep.
Contents
Summary
Preface
Acknowledgements
1 Introduction
1.1 Background
4 Data Resources
6 Validation Methods
6.5 Discussion
7 Modeling Results
8 Conclusion
C Programming
Introduction
1.1 Background
Banking is built on the idea of profiting by lending money to those in need
of it. Banks collect interest on the payments which borrowers make in order
to pay back the money they borrowed. The likely event that some borrowers
will default on their loans, that is, fail to make their payments, results in
a financial loss for the bank.
In the application process for new loans, banks assess the potential borrower's
creditworthiness. As a measure of creditworthiness, an assessment is made of
the probability of default of the potential borrower. The risk that the credit
assessment of a borrower is too modest is called credit risk. Credit risk
modeling is quite an active research field. Before the milestone of Altman [2],
credit risk on corporate loans was assessed through the subjective analysis of
the credit experts of financial institutions.
Probability of default is a key figure in the daily operation of any credit institute,
as it is used as a measure of credit risk in both internal and external reporting.
The credit risk assessments made by banks are commonly referred to as credit
rating models. In this thesis various statistical methods are used as modeling
procedures.
This thesis is done in co-operation with a corporate bank, which supplied the
necessary data resources. The aim of the thesis is to see whether logistic
regression can outperform the current heuristic credit rating model used in the
co-operating corporate bank. The current model is called Rating Model Corporate
(RMC) and is described further in Section 4.5.1. This was the only clear aim
in the beginning, but further goals emerged in the course of the thesis.
First, some variables that were not used in RMC but were still available are
tested. Then an attempt is made to model credit default with different
mathematical procedures, and an effort is made to combine some of those methods
with logistic regression. Since discriminant analysis has seen extensive use
in credit modeling, its performance was documented for comparison.
The general purpose of this thesis is to inform the reader on how credit rating
models can be constructed. Special emphasis is placed on the practical methods
that a bank in the corporate banking sector could make use of in the
development process of a new credit rating model.
1.3 Outline of Thesis
Credit risk modeling is a wide field. In this thesis an attempt is made to shed
light on its many different subjects. Chapters 2 and 6 provide the fundamental
understanding of credit risk modeling.
In order to get a better feel for the credit modeling framework, there are some
important concepts and measures worth considering. It is also worth considering
the need for credit modeling and the important role of international
legislation on banking supervision, called Basel II.
In Section 2.1 the most important concepts of the credit modeling framework
are defined. The definitions are partly adapted from the detailed discussion
in Ong [26] and Alexander and Sheedy [1]. Section 2.2 discusses the ongoing
financial crisis, which is partly due to poor credit ratings, and finally the
model development process is introduced in Section 2.3.
The first of the two rather formal definitions states that the borrower is in
default if the bank believes it will not receive its debt in full without
claiming ownership of the collateral3 taken. The second scenario is simpler,
as it states that if the borrower has not made some promised payment, which
was due 90 days ago, the borrower is considered to have defaulted on its
payment. The sentence regarding overdrafts4 can be interpreted to cover the
case where the borrower makes a transaction breaking the advised limit, or is
struggling to lower its outstanding balance, thus making the bank fear that it
will not receive its payment.
It is important to note the difference between three terms: insolvency,
bankruptcy and default. The three terms are frequently used interchangeably in
the literature. In order to avoid confusion, they are given an explanation
here. The term insolvency refers to a borrower that is unable to pay its debt,
whereas a borrower that has defaulted on its debt is either unwilling or
unable to pay it. To complicate matters even further, insolvency is often
referred to as the situation where liabilities exceed assets, but such firms
might still be profitable and thus able to pay all their debts. Bankruptcy is
a legal finding that results in court supervision over the financial affairs
of a borrower that is either insolvent or in default. It is important to note
that a borrower that has defaulted can come back from being defaulted by
settling the debt. That might be done by adding collateral or by obtaining
alternative funding. Furthermore, as will be seen later when considering loss
given default, the event of a default does not necessarily result in a
financial loss for the bank.
When potential borrowers apply for a loan at a bank, the bank will evaluate
the creditworthiness of the potential borrower. This assessment is of whether
2 A firm is any business entity such as a corporation, partnership or sole trader.
3 Collateral is an asset of the borrower that becomes the lender's if the borrower
defaults on the loan.
4 An overdraft is a type of loan meant to cover a firm's short term cash needs. It
generally has an upper bound and interest is paid on the outstanding balance of the
overdraft loan.
the borrower can pay the principal and interest when due. The risk that arises
from the uncertainty of the credit assessment, especially that it is too
modest, is called credit risk. According to the Basel Handbook [26], credit
risk is the major risk to which banks are exposed, since making loans is the
primary activity of most banks. A formal definition of credit risk is given by
Zenios [35].
The creditworthiness may decline over time, due to bad management or external
factors such as rising inflation5, weaker exchange rates6, increased
competition or volatility in asset value.
Exposure at Default (EAD) is the amount that the borrower legally owes the
bank. It may not be the entire amount of the funds the bank has granted the
borrower.
5 Inflation is an economic term for the general increase in the price level of goods
and services.
6 An exchange rate describes the relation between two currencies, specifying how much
one currency is worth in terms of the other.
Loss Given Default (LGD) is the percentage of EAD that the bank actually
loses. Banks like to protect themselves and frequently do so by taking
collateral or by holding credit derivatives8 as securitization. Borrowers may
even have a guarantor who will adopt the debt if the borrower defaults; in
that case the LGD takes the value zero. The mirror image of LGD, the recovery
rate given default, is frequently used in the literature; together the loss
and the recovery add up to the amount owed by the borrower at the time of
default, EAD. Loss given default is simply the expected percentage of loss on
the funds provided to the borrower. Altman et al. [4] report empirical
evidence that observed default rates and LGDs are positively correlated. From
this observation it is possible to conclude that banks are successful in
protecting themselves when default rates are moderate, but fail to do so when
high default rates are observed.
Expected Loss (EL) can be seen as the average of historically observed losses.
EL can also be estimated using estimates of the three components in equation
(2.1).
EL = PD × EAD × LGD    (2.1)
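Equation (2.1) is a plain product of the three risk components and can be sketched in a few lines of code. The function name and the figures in the example are illustrative, not taken from the thesis.

```python
def expected_loss(pd_, ead, lgd):
    """Expected loss per equation (2.1): EL = PD * EAD * LGD.

    pd_ -- probability of default, on [0, 1]
    ead -- exposure at default, in currency units
    lgd -- loss given default, as a fraction of EAD on [0, 1]
    """
    return pd_ * ead * lgd

# A borrower with a 2% default probability, an exposure of 1,000,000
# and an expected loss severity of 40% of the exposure:
el = expected_loss(0.02, 1_000_000, 0.40)
print(el)  # 8000.0
```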
EL estimates are partly decisive for the bank's capital requirement. The
capital requirement, that is the amount of money that the bank has to keep
available, is determined by financial authorities and is based on common
capital ratios9. The capital requirement is usually substantially higher than
EL, as it has to cover all types of risk that the bank is exposed to, such as
market, liquidity, systemic and operational risks10, or simply all risks that
might result in a solvency crisis for the bank. Un-expected Loss (UEL) is
defined in Alexander
8 Credit derivatives are bilateral contracts between a buyer and seller, under which the
seller sells protection against the credit risk of an underlying bond, loan or other financial
asset.
9 Tier I, Tier II, leverage ratio, Common stockholders’ equity.
10 Market risk is the risk of unexpected changes in prices or interest or exchange
rates. Liquidity risk is the risk that the costs of adjusting financial positions
will increase substantially or that a firm will lose access to financing. Systemic
risk is the risk of a breakdown in marketwide liquidity or chain-reaction default.
Operational risk is the risk of fraud, systems failures, trading errors, and many
other internal organizational risks.
and Sheedy [1] with respect to a certain Value at Risk (VaR) quantile and the
probability distribution of the portfolio's loss. The VaR quantile can be seen
as an estimate of the maximum loss. The VaR quantile is defined mathematically
as Pr[Loss ≤ VaRα] = α, where α is generally chosen as a high quantile,
99%-99.9%. For a certain VaRα quantile the UEL can be defined as

UEL = VaRα − EL
The name un-expected loss is somewhat confusing as the value rather states
how much incremental loss could be expected in a worst case scenario. Further
discussion on how to obtain an estimate of EL, VaRα and UEL can be seen in
Appendix A.
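The empirical VaRα and UEL can be sketched from a sample of portfolio losses. The loss distribution below is simulated, the quantile estimator is the simple order-statistic approximation, and all names are illustrative rather than the thesis's own.

```python
import random
import statistics

def var_quantile(losses, alpha=0.99):
    """Empirical VaR_alpha: a loss L with Pr[Loss <= L] approximately alpha,
    taken as an order statistic of the sample."""
    ordered = sorted(losses)
    idx = min(len(ordered) - 1, int(alpha * len(ordered)))
    return ordered[idx]

def unexpected_loss(losses, alpha=0.99):
    """UEL = VaR_alpha - EL, with EL estimated by the mean observed loss."""
    return var_quantile(losses, alpha) - statistics.mean(losses)

random.seed(1)
# Hypothetical portfolio losses with a heavy right tail, as credit losses have.
losses = [random.expovariate(1 / 100.0) for _ in range(10_000)]
print(unexpected_loss(losses, alpha=0.99))
```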
One of the primary objectives of this thesis is to consider how to obtain the
best possible estimate of the probability of default of specific borrowers. It
is therefore worth considering the purpose of acquiring the best possible
estimates of PDs. The PDs are reported as a measure of risk both to the bank's
executive board and to financial supervisory authorities. The duty of a
financial supervisory authority is to monitor the bank's financial
undertakings and to ensure that banks have reliable banking procedures.
Financial supervisory authorities determine banks' capital requirements. As
banks like to minimize their capital requirements, it is of great value to
show that credit risk is successfully modeled. Expected loss and capital
requirements, along with the PDs, are the main factors in deciding the
interest rate for each borrower. As most borrowers will look for the best
offer on the market, it is vital to have a good rating model. In a competitive
market, banks will lend at increasingly lower interest rates. Thus some of
them might default, and as banks lend to other banks, that might cause a chain
reaction.
Banking legislation
The Basel Committee published an accord called Basel II in 2004, which is
meant to create international standards that banking regulators can use when
creating regulations about how much capital banks need to hold in order to
remain solvent against credit and operational risks.
More specifically, the aim of the Basel II regulations is, according to Ong
[26], to quantify and separate operational risk from credit risk and to ensure
that capital allocation is more risk sensitive. In other words, Basel II sets
guidelines for how banks' in-house estimates of the loss parameters,
probability of default (PD), loss given default (LGD), and exposure at default
(EAD), should be made. As banks need the regulator's approval, these
guidelines ensure that banks hold sufficient capital to cover the risk that
the bank exposes itself to through its lending and investment practices. These
international standards should protect the international financial system from
problems that might arise should a major bank or a series of banks collapse.
Credit Modeling
The Basel II accord introduces good practices for internal rating systems as
an alternative to using ratings obtained from credit rating agencies. Credit
rating agencies rate firms, countries and financial instruments based on their
credit risk. The largest and most cited agencies are Moody's, Standard &
Poor's and Fitch Ratings. Internal rating systems have the advantage over the
rating agencies that there is additional information available inside the
bank, such as credit history and credit experts' valuations. Internal ratings
can be obtained for all borrowers, whereas agency ratings might be missing for
some potential borrowers. Furthermore, rating agencies only publicly report
the risk grades of larger firms, whereas there is a price to view their
ratings for small and medium sized firms.
There are two different types of credit models that should not be confused.
One is credit rating models and the other is credit pricing models. There is a
fundamental difference between the two, as credit rating models are used to
model PDs while pricing models consider combinations of PDs, EADs and LGDs to
model the EL. A graphical representation of the two models can be seen in
Figure 2.1.
In this thesis credit rating models are the main concern, as they are of more
practical use and can be used to get estimates of EL. By estimating the EL,
the same result as for credit pricing models is obtained. Reconsider the
relationship between the risk components in equation (2.1).
The PDs are obtained from the credit rating model, and the EAD is easily
estimated as the current exposure. An estimate of LGD can be found by
collecting historical data on LGD; in Figure 2.2 an example of an LGD
distribution can be seen. The average, which lies around 40%, does not
represent the distribution well. A more sophisticated procedure would be to
model the event of loss or no loss with some classification procedure, e.g.
logistic regression, and then use the left part of the empirical distribution
to model those classified as no loss and the right part for those classified
as loss. The averages of each side of the distribution could be used. It
would, though, be even better to use LGD as a stochastic variable and consider
it to be independent of PD. It is generally seen in practice that LGDs are
assumed independent of PDs, as Altman et al. [4] point out that the commercial
credit pricing models12 use LGD either as a constant parameter or as a
stochastic variable independent of PD.
12 These value-at-risk (VaR) models include J.P. Morgan's CreditMetrics®,
McKinsey's CreditPortfolioView®, Credit Suisse Financial Products' CreditRisk+®,
KMV's PortfolioManager® and Kamakura's Risk Manager®.

Figure 2.2: Histogram of LGD (relative frequency against LGD [%]).
2.2 Subprime Mortgage Crisis
In their 2006 paper, Altman et al. [4] argue that a type of credit bubble was
on the rise, causing seemingly highly distressed firms to remain non-bankrupt
when, in more normal periods, many of these firms would have defaulted. Their
words could be understood to mean that too much credit had been given to
distressed firms, which would result in greater losses when that credit bubble
collapsed. With the financial crisis of 2008, that credit bubble is certain to
have burst. This might result in high default rates and significant losses for
corporate banks in the next year or two; only time will tell.
The financial crisis of 2008 is directly related to the subprime mortgage
crisis, while high oil and commodity prices have increased inflation, which
has induced further crisis situations. A brief discussion, adapted from
Maslakovic [22], on the subprime mortgage crisis and its causes follows.
A recession occurs when there has been negative growth in real gross domestic
product (GDP) for two or more consecutive quarters. A sustained recession is
referred to as a depression.
The mortgage lenders were the first to be affected as borrowers defaulted, but
major banks and other financial institutions around the world were hurt as
well. The reason for their pain was a financial engineering tool called
securitization, where rights to the mortgage payments are passed on via
mortgage-backed securities (MBS) and collateralized debt obligations (CDO).
Corporate, individual and institutional investors holding MBSs or CDOs faced
significant losses as the value of the underlying mortgage assets declined.
The stock prices of firms reporting great losses caused by their involvement
in MBSs or CDOs fell drastically.
The widespread dispersion of credit risk through CDOs and MBSs and the
unclear effect on financial institutions caused lenders to reduce lending activity
or to make loans at higher interest rates. Similarly, the ability of corporations
to obtain funds through the issuance of commercial paper was affected. This
aspect of the crisis is consistent with a credit crisis term called credit crunch. The
general crisis caused stock markets to decline significantly in many countries.
The liquidity concerns drove central banks around the world to take action to
provide funds to member banks to encourage the lending of funds to worthy
borrowers and to re-invigorate the commercial paper markets.
The credit crunch has cooled the world economic system, as fewer and more
expensive loans decrease the investments of businesses and consumers. The
major contributors to the subprime mortgage crisis were poor lending practices
and mispricing of credit risk. Credit rating agencies have been criticized for
giving CDOs and MBSs based on subprime mortgage loans much higher ratings
than they should have, thus encouraging investors to buy into these securities.
Critics claim that conflicts of interest were involved, as rating agencies are paid
by the firms that organize and sell the debt to investors, such as investment
banks. The market for mortgages had previously been dominated by government
sponsored agencies with stricter rating criteria.
In the financial crisis, which has been especially hard for financial institutes
around the world, the words of the prominent Cambridge economist John May-
nard Keynes have never been more appropriate, as he observed in 1931 during
the Great Depression:
A sound banker, alas, is not one who foresees danger and avoids
it, but one who, when he is ruined, is ruined in a conventional way
along with his fellows, so that no one can really blame him.
2.3 Development Process of Credit Rating Models
The data used are recordings from the co-operating bank's database, and they
are the same data as used in Rating Model Corporate (RMC). The data, which are
given a full discussion in Chapter 4, can be categorized as shown at the top
of Figure 2.3.
The data go through a certain cleaning process. A firm that is not observed in
two successive years is either a new customer or a retiring one, and is thus
removed from the dataset. Observations with missing values are also removed
from the dataset.
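The two cleaning rules above can be sketched as follows. The record layout and field names are invented for illustration; the thesis's actual database schema is not given here.

```python
# Hypothetical records: one row per firm per year; None marks a missing value.
records = [
    {"firm": "A", "year": 2005, "equity_ratio": 0.31, "default": 0},
    {"firm": "A", "year": 2006, "equity_ratio": 0.28, "default": 0},
    {"firm": "B", "year": 2006, "equity_ratio": None, "default": 1},
    {"firm": "C", "year": 2006, "equity_ratio": 0.12, "default": 0},
]

def clean(records):
    # Keep only firms observed in two successive years ...
    years = {}
    for r in records:
        years.setdefault(r["firm"], set()).add(r["year"])
    ok = {f for f, ys in years.items() if any(y + 1 in ys for y in ys)}
    # ... and drop observations with missing values.
    return [r for r in records
            if r["firm"] in ok and all(v is not None for v in r.values())]

print([r["firm"] for r in clean(records)])  # ['A', 'A']
```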
When the data have been cleansed they will be referred to as complete, and
they are then split into training and validation sets. The total data will be
split approximately as follows: 50% will be used as a training set, 25% as a
validation set and 25% as a test set.
The training set is used to fit the model and the validation set is used to
estimate the prediction error for model selection. In order to account for the
small sample of data, in particular of bad cases, the process of splitting,
fitting, transformation and validation is performed repeatedly.
The test set is then used to assess the generalization error of the final
model chosen. The training and validation sets, together called the modeling
set, are randomly chosen from the 2005, 2006 and 2007 datasets, whereas the
test set is the 2008 dataset. The repeated splitting of the modeling set is
done by choosing a random sample without replacement such that the training
set is 2/3 and the validation set is 1/3 of the modeling set.
In the early stages of the modeling process it was observed that different
random splits into training and validation sets resulted in considerably
different results. In order to accommodate this problem, a resampling process
is performed and the average performance over N samples is considered for
variable selection. In order to ensure that the same N samples are used in the
resampling process, a fixed seeding procedure is performed.
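A fixed-seed resampling scheme along these lines reproduces the same N splits on every run, which is what makes the averaged performances comparable across candidate models. Function and parameter names below are illustrative, not the thesis's own.

```python
import random

def resample_splits(modeling_set, n_samples=50, seed=42):
    """Draw N reproducible 2/3 training / 1/3 validation splits of the
    modeling set, sampling without replacement each time."""
    rng = random.Random(seed)  # fixed seed => the same N splits every run
    n_train = 2 * len(modeling_set) // 3
    splits = []
    for _ in range(n_samples):
        shuffled = rng.sample(modeling_set, len(modeling_set))
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits

splits = resample_splits(list(range(300)), n_samples=50)
train, valid = splits[0]
print(len(train), len(valid))  # 200 100
```

Because the generator is seeded once, every candidate model can be evaluated on exactly the same 50 training/validation pairs.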
An example of the different performances for different splits, for RMC and a
logistic regression model, can be seen in Figure 2.4. The figure shows the
clear need for the resampling process. This can be seen by considering the
different splits in iterations 1 and 50 respectively. For iteration 1 the RMC
would have been preferred to the LR model; the opposite conclusion would have
been reached if the split of iteration 50 had been considered.
Figure 2.4: Performance comparison (PCA.stat against iteration) of the
logistic regression model (1) and RMC (2) over 50 resampling iterations.
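The single performance measure plotted as PCA.stat is, per the summary, a principal component representative of several discriminatory power indicators. The exact construction is not specified in this excerpt; the sketch below shows one plausible version: standardize the indicator columns, take the leading eigenvector of their covariance matrix by power iteration, and use the projection onto it as the summary score. All names are assumptions.

```python
import math

def first_pc_scores(X, iters=200):
    """Project rows of X onto the first principal component. Each row of X
    holds the discriminatory power indicators for one resampling iteration."""
    n, p = len(X), len(X[0])
    # Standardize each column to zero mean and unit variance.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
           for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    # Covariance matrix of the standardized data.
    C = [[sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1)
          for b in range(p)] for a in range(p)]
    # Leading eigenvector by power iteration.
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # One summary score per row.
    return [sum(Z[i][j] * v[j] for j in range(p)) for i in range(n)]
```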
The datasets consist of creditworthiness data and a variable recording whether
the firm has defaulted a year later. The default variable is given the value
one if the firm has defaulted and the value zero otherwise.
When the training and validation sets have been properly constructed, the
modeling is performed. The modeling refers to the process of constructing a
model that can predict whether a borrower will default on their loan, using
previous information on similar firms. The proposed model is fitted using the
data of the training set and then a prediction is made for the validation set.
If logistic
regression15 is used as a modeling method, then the predicted values will lie
on the interval [0,1] and can be interpreted as probabilities of default (PD).
Generally, when one is modeling some event or non-event, the predicted values
are rounded to one for the event and to zero for the non-event. A problem with
this is that the fitted values depend largely on the ratio of zeros to ones in
the training sample. That is, when there are many zeros compared to ones in
the training set, which is the case for credit default data, the predicted
values will be small. Those probabilities can be interpreted as the
probability of default of an individual firm. An example of computed
probabilities can be seen in Figure 2.5.
Figure 2.5: Example of computed probabilities of default.
From Figure 2.5 it is apparent that the largest PD is considerably below 0.5,
and thus all the fitted values would get the value zero if they were rounded
to binary
15 Logistic regression is a modeling procedure that is specialized for modeling when the
dependent variable is either one or zero. Logistic regression is introduced in section 3.2.2 and
a more detailed discussion can be seen in section 5.2.2.
numbers. This is the main reason why ordinary classification and validation
methods do not work on credit default data. The observed probabilities of
default are small numbers and thus not easily interpreted. Hence, to enhance
readability, the default probabilities are transformed to risk ratings. Rating
Model Corporate has 12 possible ratings, and the same transformation to the
risk rating scale was used for the proposed models, in order to ensure
comparability. The transformation from PDs to risk ratings is summarized in
Table 2.1.
The transformation from PDs to risk ratings is summarized in Table 2.1.
PD-interval Rating
[ 0.0%; 0.11% [ 12
[ 0.11%; 0.17% [ 11
[ 0.17%; 0.26% [ 10
[ 0.26%; 0.41% [ 9
[ 0.41%; 0.64% [ 8
[ 0.64%; 0.99% [ 7
[ 0.99%; 1.54% [ 6
[ 1.54%; 2.40% [ 5
[ 2.40%; 3.73% [ 4
[ 3.73%; 5.80% [ 3
[ 5.80%; 9.01% [ 2
[ 9.01%; 100.0% ] 1
Table 2.1: Probabilities of Default (PD) are transformed to the relative risk
rating.
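The transformation in Table 2.1 can be implemented as a simple lookup over the table's upper interval bounds. The sketch below is an illustration of that mapping; the bounds are taken from the table, while the function and constant names are invented.

```python
import bisect

# Upper bounds of the PD-intervals from Table 2.1, as fractions. Each
# interval is closed on the left and open on the right, except the last,
# and ratings run from 12 (best, lowest PD) down to 1 (worst).
BOUNDS = [0.0011, 0.0017, 0.0026, 0.0041, 0.0064, 0.0099,
          0.0154, 0.0240, 0.0373, 0.0580, 0.0901, 1.0]

def pd_to_rating(pd_):
    """Map a probability of default on [0, 1] to the 12-grade rating scale."""
    # bisect_right respects the half-open intervals: a PD equal to a bound
    # falls into the next (worse) rating class.
    return max(1, 12 - bisect.bisect_right(BOUNDS, pd_))

print(pd_to_rating(0.02))  # 5, since 2% lies in [1.54%; 2.40%[
```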
It is apparent from Table 2.1 that the PD-intervals are very different in
size. It is also apparent that low PDs, representing good borrowers, are
transformed to high risk ratings. An example of a risk rating distribution can
be seen in Figure 2.6. When the ratings have been obtained it is possible to
validate the results; that is done by computing the discriminatory power16 of
the observed ratings. The discriminatory power indicators are then compared to
the indicators calculated for RMC on the specific validation set. The model
performance is concluded from the discriminatory power indicators. Numerous
discriminatory power methods are presented in Section 6.4. Important
information can be drawn from visual representations of the model performance,
such as the relative and cumulative frequencies of the good and bad cases
respectively and the respective ROC curve, which are all introduced in
Sections 6.2 and 6.3. Visual comparison is not made when the modeling is
performed on numerous modeling sets, that is, when the resampling process is
used.
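One widely used discriminatory power indicator, the area under the ROC curve, can be computed directly from the ratings of the good and bad cases via the Mann-Whitney formulation. This sketch is generic, not the thesis's Section 6.4 implementation, and the example ratings are invented.

```python
def auc(ratings_good, ratings_bad):
    """Area under the ROC curve as the probability that a randomly drawn
    good case is rated above a randomly drawn bad case (ties count 1/2)."""
    wins = 0.0
    for g in ratings_good:
        for b in ratings_bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(ratings_good) * len(ratings_bad))

# Hypothetical ratings on the 12-grade scale (higher = better borrower):
print(auc([9, 10, 7, 12, 8], [3, 5, 8]))  # 0.9
```

A value of 0.5 means no discrimination and 1.0 means the ratings separate the good and bad cases perfectly.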
16 The term discriminatory power refers to the fundamental ability to differentiate
between good and bad cases.
Figure 2.6: Example of a Risk Rating distribution, when the PDs have been
transformed to risk ratings.
From the model performance it is possible to assess different variables and
modeling procedures. The results can be seen in Chapter 7.
Chapter 3
Commonly Used Credit Assessment Models
In this chapter, credit assessment models commonly used in practice are
presented. First their general functionality and application are introduced,
followed by a light discussion of current research in the field. The credit
assessment models are used to rate borrowers based on their creditworthiness,
and they can be grouped as seen in Figure 3.1. The three main groups are
heuristic, statistical and causal models. In practice, combinations of
heuristic models and either of the other two types are frequently used and
referred to as hybrid models. The discussion here is adapted from Datschetzky
et al. [13]1, which should be viewed for a more detailed discussion.
Heuristic models are discussed in Section 3.1, and a brief introduction to
statistical models is given in Section 3.2, with a more detailed discussion in
Chapter 5. In Section 3.3 models based on option pricing theory and cash flow
simulation are introduced, and finally hybrid form models are introduced in
Section 3.4.
Heuristic models attempt to use past experience to evaluate the future
creditworthiness of a potential borrower. Credit experts choose the relevant
creditworthiness factors and their weights based on their experience. The
significance of the factors is not necessarily estimated and their weights are
not necessarily optimized.
Expert systems are software solutions which aim to recreate human problem
solving abilities. The system uses data and rules selected by credit experts
in order to produce its expert evaluation.
Altman and Saunders [3] report that bankers tend to be overly pessimistic
about credit risk and that multivariate credit-scoring systems tend to
outperform such expert systems.
Fuzzy logic systems can be seen as a special case of expert systems with the
additional ability of fuzzy logic. In a fuzzy logic system, specific values
entered for creditworthiness criteria are not allocated to a single
categorical term, e.g. high or low; rather, they are assigned multiple values.
As an example, consider an expert system that rates firms with a return on
equity of 15% or more as good and a return on equity of less than 15% as poor.
It is not in line with human decision-making behavior to have such sharp
decision boundaries, as it is not sensible to rate a firm with a return on
equity of 14.9% as poor and a firm with a return on equity of 15% as good. By
introducing a linguistic variable as seen in Figure 3.2, a firm having a
return on equity of 5% would be considered 100% poor and a firm having a
return on equity of 25% would be considered 100% good. A firm with a return on
equity of 15% would be considered 50% poor and 50% good. These linguistic
variables are used in a computer based evaluation.
Figure 3.2: Linguistic variable for return on equity (%), with degrees of
membership in the terms poor and good.
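The linguistic variable of Figure 3.2 can be sketched as a pair of piecewise-linear membership functions. The breakpoints 5% and 25% follow the example in the text; the function and key names are illustrative.

```python
def membership(roe, low=5.0, high=25.0):
    """Degrees of membership in the linguistic terms 'poor' and 'good' for
    a return on equity (in %), interpolating linearly between 100% poor
    at `low` and 100% good at `high`."""
    good = min(1.0, max(0.0, (roe - low) / (high - low)))
    return {"poor": 1.0 - good, "good": good}

print(membership(15.0))  # {'poor': 0.5, 'good': 0.5}
```

A firm at 14.9% return on equity is then graded almost exactly like one at 15%, instead of falling on the other side of a sharp boundary.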
In 1968, Altman [2] introduced his Z-score formula for predicting bankruptcy;
this was the first attempt to predict bankruptcy using financial ratios. To
form the Z-score formula, Altman used linear multivariate discriminant
analysis, with an original data sample consisting of 66 firms, half of which
had filed for bankruptcy.
All the values except the Market Value of Equity, in X4, can be found directly
in firms' financial statements. The weights of the original Z-score were based
on data from publicly held manufacturers with assets greater than $1 million,
but the formula has since been modified for private manufacturing,
non-manufacturing and service companies. The discriminant function of the
Z-score model can be summarized as follows
D = w0 + w1 X1 + w2 X2 + . . . + wk Xk    (3.2)
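Equation (3.2) is a plain linear combination and can be sketched as below. The weights in the example are Altman's published 1968 Z-score coefficients (with no intercept), but the ratio values are invented for illustration.

```python
def discriminant_score(x, weights):
    """Linear discriminant score of equation (3.2):
    D = w0 + w1*x1 + ... + wk*xk."""
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * xi for w, xi in zip(ws, x))

# Altman's original 1968 Z-score weights (intercept w0 = 0) applied to the
# five financial ratios X1..X5; the ratio values here are made up.
z_weights = [0.0, 1.2, 1.4, 3.3, 0.6, 1.0]
ratios = [0.15, 0.10, 0.08, 0.90, 1.10]
print(discriminant_score(ratios, z_weights))
```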
Another popular tool for credit assessment is logistic regression. Logistic
regression uses as its dependent variable a binary variable that takes the
value one if a borrower defaulted in the observation period and zero
otherwise. The independent variables are all parameters potentially relevant
to credit risk. Logistic regression is discussed further and in more detail in
Section 5.2.2. A logistic regression is often represented using the logit link
function as
p(X) = 1 / (1 + exp[−(β0 + β1 X1 + β2 X2 + · · · + βk Xk)])    (3.3)
where p(X) is the probability of default given the k input variables X. Logistic
regression has several advantages over DA. Firstly, it does not require normally
distributed input variables, so qualitative creditworthiness characteristics can
be taken into account. Secondly, the results of logistic regression can be inter-
preted directly as the probability of default. According to Datschetzky et al.
[13], logistic regression has seen more widespread use both in academic research
and in practice in recent years. This can be attributed to its flexibility in data
handling and more readable results compared to discriminant analysis.
3.2 Statistical Models
In this section a short introduction is given to other methods which can be grouped
under the heading of statistical and machine learning methods. As computing
advanced, new methods were tried as credit assessment methods; those include
This method is also known as classification and regression trees (CART) and is
given a more detailed introduction under that name in Section 5.5.
Hazard regression3 considers time until failure, default in the case of credit
modeling. Lando [21] refers to hazard regression as the most natural statisti-
cal framework to analyze survival data, but as Altman and Saunders [3] point
out, a financial institution would need a portfolio of some 20,000-30,000 firms
to develop very stable estimates of default probabilities. Very few financial
institutions worldwide come even remotely close to having this number of poten-
tial borrowers. The Robert Morris Associates, Philadelphia, PA, USA, have
nevertheless initiated a project to develop a shared national database, among larger
banks, of historic mortality loss rates on loans. Rating agencies have adopted
and modified the mortality approach and utilize it in their structured financial
instrument analysis, according to Altman and Saunders [3].
The revolutionary work of Black and Scholes (1973) and Merton (1974) formed
the basis of option pricing theory. The theory, originally used to price
options4, can also be used to value default risk on the basis of individual
transactions. Option pricing models can be constructed without using a com-
prehensive default history; however, they require data on the economic value of
assets, debt and equity, and especially on volatilities. The main idea behind the
option pricing model is that credit default occurs when the economic value of
the borrower's assets falls below the economic value of the debt.
The data required make it impossible to use option pricing models in the public
sector, and obtaining the required data is not without problems in the
corporate sector either; it is, for example, difficult in many cases to assess the economic
value of assets.
Cash flow models are simulation models of future cash flow arising from the
assets being financed and are thus especially well suited for credit assessment
in specialized lending transactions. Here the transaction itself is rated, not the
potential borrower, and the result is therefore referred to as a transaction rating.
Cash flow models can be viewed as a variation of the option pricing model where
the economic value of the firm is calculated on the basis of expected future cash
flow.
Since the pioneering work of Markowitz (1959), portfolio theory has been ap-
plied to common stock data. The theory could just as well be applied to the
fixed income area involving corporate and government bonds, and even to banks'
portfolios of loans. Even though portfolio theory could be a useful tool for fi-
nancial institutions, widespread use of the theory has not been seen, according to
Altman and Saunders [3]. Portfolio theory lays out how rational investors will
use diversification to optimize their portfolios. The traditional objective of
portfolio theory is to maximize return for a given level of risk; the theory can also be
used for guidance on how to price risky assets.
4 A financial instrument that gives the right, but not the obligation, to engage in a future transaction.
The models discussed in previous sections are rarely used in their pure form.
Heuristic models are often used in combination with statistical or causal mod-
els. Even though statistical and causal models are generally seen as better rating
procedures, the inclusion of credit experts' knowledge generally improves ratings.
In addition, not all statistical models are capable of processing qualitative infor-
mation (e.g. discriminant analysis), or they require a large data set to produce
significant results.
There are four main architectures to combine the qualitative data with the
quantitative data.
- Knock Out Criteria: here the credit experts set some predefined rules
which have to be fulfilled before a credit assessment is made. This could
for example mean that certain risky sectors are not considered as potential
customers.
- Special Rules: here the credit experts set some predefined rules. The
rules can take almost any form and regard every aspect of the modeling
procedure. An example of such a rule would be that start-up firms cannot
get higher ratings than some predefined rating.
Table 3.1: Typical values obtained in practice for the Gini coefficient as a mea-
sure of discriminatory power.
In the study of Yu et al. [34], highly evolved neural networks were compared
with logistic regression, a simple artificial neural network (ANN) and a support
vector machine (SVM). The study also compared a fuzzy support vector ma-
chine (Fuzzy SVM). The study was performed on detailed information of 60
5 pp. 109
6 The Gini coefficient ranges from zero to one, one being optimal. The Gini coefficient is
introduced in Section 6.4
The results show that logistic regression has the worst performance of all the single modeling
procedures, whereas SVM performs best of the single modeling procedures.
By introducing fuzzy logic to the SVM the performance improves. The multi-stage
reliability-based neural network ensemble learning models all show similar
performance and outperform the single and hybrid form models significantly.
provide the best estimation for default with an average 91.69% hit rate. Neural
Networks provided the second best results with an average hit rate of 89.00%.
The K-Nearest Neighbor algorithm had an average hit rate of 85.05%. These
results outperformed a logistic regression model using the Probit link function,
which attained an average hit rate of 84.87%. Although the results are for
mortgage loan data, it is clear that logistic regression models can be outperformed.
Current studies
Credit crises in the 1970s and 1980s fueled research in the field, resulting in great
improvements in observed default rates. High default rates in the early 1990s and
at the beginning of the new millennium have ensured that credit risk modeling is
still an active research field. In the light of the financial crisis of 2008, research
in the field is sure to continue. Most of the current research is highly evolved
and well beyond the scope of this thesis and is thus only given a brief discussion.
Even though it is not very practical for most financial institutions, much of current
research is focused on option pricing models. Lando [21] introduces intensity
modeling as the most exciting research area in the field. Intensity models can be
explained in a naive way as a mixture of hazard regression and standard pricing
machinery. The objective of intensity models is not to obtain the probability of
default but to build better models for credit spreads and default intensities. The
mathematics of intensity models is highly advanced and one should refer to Lando [21]
for a complete discussion of the topic.
The subject of credit pricing has also been the subject of extensive research, es-
pecially as credit derivatives have come into more common use. The use of macro-
economic variables is seen as material for prospective studies.
The discussion here on credit assessment models is rather limited; for further
reading one could consult Altman and Saunders [3] and Altman et al. [4] for a
discussion of the developments in credit modeling, and Datschetzky et al. [13] for a
good overview of models used in practice. Lando [21] gives a good overview
of current research in the field, along with an extensive list of references.
Chapter 4
Data Resources
The times we live in are sometimes referred to as the information age, as the
technical breakthrough of commercial computers has made the recording of
information an easier task. Along with increased information, it has also made
computations more efficient, furthering advances in practical mathematical mod-
eling.
In the development of a statistical credit rating model, the quality of the data
used in the model development is of great importance. Especially important is
the information on the few firms that have defaulted on their liabilities.
In this chapter the data made available by the co-operating Corporate bank
are presented. This chapter is partly influenced by the co-operating bank’s in-
house paper Credit [11]. Section 4.1 introduces data dimensionality and data
processing is discussed. Introduction of quantitative and qualitative figures are
given in Sections 4.2 and 4.3, respectively. Customer factors are introduced in
Section 4.4 and other factors and figures are introduced in Section 4.5. Finally,
some preliminary data analysis are performed in Section 4.6.
The data used in the modeling process are the data used in the co-operating
Corporate bank's current credit rating model, which is called Rating Model
Corporate (RMC) and is introduced in Section 4.5.1. The available data can
be grouped according to their nature into the following groups
- Quantitative
- Qualitative
- Customer factors
- Other factors and figures
Rating Model Corporate is a heuristic model and was developed in 2004. There-
fore, the first raw data are from 2004, as can be seen in Table 4.1. In order to
validate the performance of the credit rating model, the dependent variable,
which is whether the firm has defaulted on its obligations a year after it was
rated, is needed. In order to construct datasets that are admissible for valida-
tion, firms that are not observed in two successive years, thus being either
new customers or retiring ones, are removed from the dataset. The first valida-
tion was done in 2005, and from Table 4.1 it can be seen that the observations of
the constructed 2005 dataset are noticeably fewer than in the raw datasets of 2004
and 2005, due to the exclusion of new or retiring customers. The constructed
datasets are the datasets that the co-operating bank would perform their vali-
dation on; they are however not admissible for modeling purposes, the
reason being that there are missing values in the constructed datasets.
When the data have been cleansed they are split into training and validation
sets. The total data will be split approximately as follows: 50% will be used
as a training set, 25% as a validation set and 25% as a test set.
The training set is used to fit the model and the validation set is used to estimate
the prediction error for model selection. In order to account for the small sample
of data, that is of bad cases, the process of splitting, fitting and validation is
performed repeatedly. The average performance of the repeated evaluations is
then considered in the model development.
The test set is then used to assess the generalization error of the final model
chosen. The training and validation sets, together called the modeling set, are
randomly chosen from the 2005, 2006 and 2007 datasets, whereas the test
set is the 2008 dataset. The repeated splitting of the modeling set is done by
choosing a random sample without replacement such that the training set is 2/3
and the validation set is 1/3 of the modeling set.
- Real estate
- Trade
- Production
- Service
- Transport
Table 4.2: Summary of the portfolios concentration between sectors and sector-
wise default rates.
By analyzing Table 4.2 it is apparent that the production sector is the largest
and has the highest default rate. The trade and real estate sectors, on the other
hand, have rather low default rates.
Only the first four categories of these ratios are used to measure firms' creditwor-
thiness, as the market ratios are mostly used in the financial markets. The dis-
cussion here and in the following sections on financial ratios is largely adapted
from Credit [11] and Bodie et al. [9].
As the values used to calculate the financial ratios are obtained from firms'
financial statements, it is only possible to calculate financial ratios when a firm
has published its financial statements. This produces two kinds of problems:
firstly, new firms do not have financial statements, and secondly, new data are
only available once a year.
in both short and long term. Financial statements are usually reported annually and split
into two main parts, first the balance sheet and secondly the income statement. The balance
sheet reports current assets, liabilities and equity, while the income statement reports the
income, expenses and the profit/loss of the reporting period.
liquidity = Current assets / Current liabilities    (4.1)
The liquidity ratio can also be seen as an indicator of the firm's ability to avoid
insolvency in the short run and should thus be a good indicator of creditwor-
thiness. By considering the components of equation (4.1), it can be seen that
a large positive value of the current ratio can be seen as a positive indicator of
creditworthiness. In the case where the current liabilities are zero, this is considered
a positive indicator of creditworthiness, and the liquidity ratio is given the
extreme value 1000. In Table 4.3 the summary statistics of the liquidity ratio
can be seen for all sectors and each individual sector.
Table 4.3: Summary statistics of the Liquidity ratio, without the 1000 values.
The rate of observed extreme values, ev(1000), is also listed for each sector.
As can be seen in Table 4.3, by looking at the medians and first quartiles, the real
estate sector has the lowest liquidity ratio. The transport sector also has low
liquidity ratios. The liquidity ratio for all sectors and each individual sector can
be seen in Figure 4.1.
2 Current assets are cash and other assets expected to be converted to cash, sold, or consumed
within a year. Current liabilities usually include, amongst others, wages, accounts, taxes,
short-term debt and the proportion of long-term debt to be paid this year.
4.2 Quantitative key figures
The liquidity ratio will simply be referred to as the liquidity, as it measures
the firm's ability to liquidate its current assets by turning them into cash. It
is worth noting, though, that it is just a measure of liquidity, as the book value of
assets might be considerably different from their actual value. Mathematically the
liquidity will be referred to as αl .
The Debt ratio is a key figure consisting of net interest bearing debt divided by
the earnings before interest, taxes, depreciation and amortization (EBITDA)4.
The Debt ratio can be calculated using equation (4.2), where the figures are
obtainable from the firm's financial statement.
Debt/EBITDA = Net interest bearing debt / (Operating profit/loss + Depreciation/Amortization)    (4.2)
where the net interest bearing debt can be calculated from the firm's financial
statement and equation (4.3).
Net interest bearing debt = Subordinated loan capital + long term liabilities
+ Current liabilities to mortgagebanks + Current bank liabilities
+ Current liabilities to group + Current liabilities to owner, etc.
− Liquid funds − Securities − Group debt
− Outstanding accounts from owner, etc.
(4.3)
The Debt ratio is a measure of the pay-back period, as it indicates how long
it would take to pay back all liabilities with the current operating profit.
The longer the pay-back period, the greater the risk, and thus small ratios
indicate that the firm is in a good financial position. As both debt and EBITDA
can be negative there are some precautions that have to be taken, as a negative
ratio can have two different meanings. In the case where the
debt is negative this is a positive thing, and the ratio should thus be overwritten with zero or
a negative number to indicate positive creditworthiness. In the case where the
EBITDA is negative or zero the ratio should be overwritten with a large number to
indicate poor creditworthiness; in the original dataset these figures are -1000 and
1000, respectively. In the case when both values are negative they are assigned
the resulting positive value, even though negative debt can be considered a
much more positive thing.
4 Amortization is the write-off of intangible assets and depreciation is the wear and tear of
tangible assets.
Figure 4.1: Histograms of the liquidity ratio for all sectors and each individual
sector; the figure shows a refined scale of this key figure for the complete dataset.
Table 4.4: Summary of debt/EBITDA, for all sectors and each individual sec-
tor, without figures outside the ±1000 range. The rate of the extreme values
ev(1000) and ev(-1000) for each sector is also listed.
From Table 4.4 it is clear that the real estate sector has a considerably larger
Debt ratio than the other sectors, which are all rather similar. This inconsistency
between sectors has to be considered before modeling. Mathematically the Debt
ratio will be referred to as αd .
The Return On total Assets (ROA) percentage shows how profitable a com-
pany's assets are in generating revenue. The total assets are approximated as
the average of this year's and last year's total assets, which are the assets
that formed the operating profit/loss. Return On total Assets is a measure of
profitability and can be calculated using equation (4.4) and the relevant compo-
nents from the firm's financial statements.
ROA = Operating profit/loss / (½ (Balance sheet0 + Balance sheet−1))    (4.4)
Figure 4.2: Histograms of Debt/EBITDA for all sectors and each individual
sector, in a refined scale. The ±1000 values are not shown.
In equation (4.4) the balance sheets5 have the subscripts zero and minus one,
which refer to the current and last year's assets, respectively. For firms that
only have the current balance sheet, that value is used instead of the average
of the current and last year's assets. Return on assets gives an indication of
the capital intensity of the firm, which differs between sectors. Firms that have
undergone large investments will generally have lower return on assets. Start-up
firms do not have a balance sheet and are thus given the poor-creditworthiness
value -100. By taking a look at the histograms of the ROA in Figure 4.3 it is
clear that the transport sector and especially the real estate sector have quite
different distributions compared to the other sectors.
As can be seen from Table 4.5, the ROA differs significantly between sectors.
The mean values might be misleading and it is better to consider the median
value and the first and third quartiles. It can be seen that the transport and
real estate sectors do not have as high ROA as the others, which can partly be
explained by the large investments made by many real estate sector firms. It is
also observable that the first quartile of the service sector is considerably lower
than the others', indicating a heavier negative tail than in the other sectors.
Solvency can also be described as the ability of a firm to meet its long-term fixed
expenses and to accomplish long-term expansion and growth. The Solvency ratio,
also often referred to as the equity ratio, consists of the shareholders' equity6
and the balance sheet, obtainable from the firm's financial statement.
Solvency = Shareholders’ equity / Balance sheet    (4.5)
5 Balance sheet=Total Assets=Total Liabilities + Shareholders’ Equity
6 Equity=Total Assets-Total Liabilities. Equity is defined in Section ??.
Figure 4.3: Histograms of the Return On total Assets for all sectors and each
individual sector.
The balance sheet can be considered as either the total assets or the sum of
total liabilities and shareholders' equity. By considering the balance sheet to be
the sum of total liabilities and shareholders' equity, the solvency ratio describes
to what degree the shareholders' equity is funding the firm. The solvency ratio
is a percentage, ideally on the interval [0%, 100%]. The higher the solvency
ratio, the better the firm's financial position.
By viewing Table 4.6 it can be seen that the minimum values are large negative
figures. This occurs when the valuations placed on assets do not exceed the lia-
bilities, in which case negative equity exists. In the case when the balance sheet is zero,
as is the case for newly started firms, the Solvency ratio is given the extremely
negative creditworthiness value of -100. To get a better view of the distribution
of the Solvency ratio, histograms of it can be seen in Figure
4.4. As can be seen in Figure 4.4, the distribution lies mainly on the positive side
of zero. The transport and real estate sectors look quite different compared
to the other sectors. By considering the median value and the first and
third quartiles it is observable that the trade and production sectors are quite
similar. The real estate and service sectors are tailed towards 100, while the real
estate sector is also tailed towards zero.
4.2.5 Discussion
Firms that have just started business do not have any financial statements to
construct the quantitative key figures. In order to assess the creditworthiness
of a start-up firm there are two possibilities. One is to build a separate start-up
model and the other is to adapt the start-up firms to the rating model.
There is one other thing worth noting regarding financial ratios, and that is
Figure 4.4: Histograms of the Solvency ratio for all sectors and each individual
sector.
that they are constructed from values that are called book values and might be
far from the actual market values. The book value of liabilities is subject
to less uncertainty, but might be subject to some uncertainty in interest and
exchange rates, that is, if the firm holds debt that carries adjustable rates or is
in foreign currencies, respectively. As the equity is calculated as the difference
between the total assets and total liabilities, the equity value might be far from
the actual market value. This fact results in some deterioration of the predictive
power of the financial ratios.
By considering the key figures in the previous sections it is clear that there are
two problematic situations. First, it is difficult to decide what values should
be assigned in the cases when the actual ratio is nonsensical, and second, there are
differences between sectors. The predictive power of the key figures would be
poor, especially for some sectors, if they were used without correcting them
for each sector. An article by Altman and Saunders [3] reports that sector-
relative financial ratios, rather than simple firm-specific financial ratios, are
better predictors of corporate default. It is stated that in general, the sector-
relative financial ratio model outperformed the simple firm-specific model.
The key figures have been scaled by the co-operating bank for use in their RMC.
The scaling process is performed in such a way that the scaled key figures
are on the continuous scale from 1 to 7, where 1 indicates a bad situation and
7 indicates a good situation. In the cases when the actual ratios are nonsensical,
they are assigned the value 1 if they are to represent poor creditworthiness
and 7 if they are to represent positive creditworthiness. After the simple firm-
specific financial ratios have been scaled to correct them for each sector they
are referred to as scores. Since they have been adjusted for their sector, it is of
no interest to consider each sector separately.
Histograms of the scaled quantitative factors along with the default variable
and RMC's ratings can be seen in Figure 4.5. In the same figure one can see
the Spearman's rank correlations7 and dotplots of the scaled key figures. The
Spearman's rank correlation is used as an alternative to the Pearson correlation
as it is a non-parametric procedure and thus does not need any distributional
assumptions. In Figure 4.5 it can be seen that there is some correlation between
the scaled key figures, especially between the debt and return scores and the liquidity score.
7 Correlation is a numerical measure of how related two variables are. Correlation coeffi-
cients range from minus one to one, where one means that the two variables move completely
together and minus one that they move completely opposite. If the correlation coefficient is
zero there is no relation between the two variables.
Mathematically the scaled key figures will be referred to as the Greek letter alpha
with a tilde sign above it, α̃.
In the credit application process, credit experts rate the potential borrower on
six different aspects, each reflecting the firm's position in that particular field. The
fields that make up the qualitative figures are the following.
- Market position
- Refunding
The customer chief handling the loan application rates the potential borrower
in each field. The qualitative ratings are on a discrete scale from 1 to 7, where 1
indicates a bad situation and 7 indicates a good situation. These ratings then
need to be accepted by administrators in the credit department of the bank. It
is possible to reject an individual factor if it is not relevant to a firm.
In order to get a better feel for the qualitative factors, a dotplot can be seen in
Figure 4.6, where red dots are defaulted firms and black dots are solvent firms.
In the same figure one can see the Spearman's rank correlations and histograms
of the qualitative factors. From Figure 4.6 it is clear that the qualitative factors
are considerably correlated. It is also noticeable that red dots appear more often
in the lower left corner of the dotplots, indicating that the qualitative factors have
some predictive power.
For example, new firms do not have earlier minimum or maximum ratings, so if those
variables are to be used for modeling purposes it would result in smaller datasets.
For the qualitative figures there are quite a few cases where one of the six values
is missing, and in order to save the observation from being omitted it would be
4.3 Qualitative figures
Figure 4.5: Dotplot of all the scaled quantitative factors along with the default
variable and RMC ratings, where red dots are defaulted firms and black dots are
solvent firms. In the lower triangle the correlations of the variables can be seen
and on the diagonal their respective histograms.
Figure 4.6: Dotplot of all the qualitative factors, where red dots are defaulted
firms and black dots are solvent firms. In the lower triangle are the cor-
relations of the qualitative factors and on the diagonal there are histograms of
them.
4.4 Customer factors
The customer factors listed in Table 4.7 are available in the data
as they are used in Rating Model Corporate. As can be seen from Table 4.7,
the customer factors all have three levels; the most negative ones are in the
highest row and they get more positive further down. The stock exchange
listing is unlikely to have any predictive power as there are very few stock
exchange listed firms in the portfolio, and furthermore being stock exchange listed is
not an indicator of a more likely default event, quite the contrary. The
stock exchange listing can thus only be used as a heuristic variable, giving
8 The principal component analysis method is presented in Section 5.6.
stock exchange listed firms a higher rating than estimated. The reason for this
is that stock exchange listed firms have an active market for their shares and
can go to the market when in need of money by offering more shares.
In this section some of the factors and figures that are not part of the qualitative
figures, quantitative figures or customer factors are presented.
The rating model used by FIH today is called Rating Model Corporate. As it is
a rather delicate industrial secret it will only be briefly introduced. The model is
a heuristic9 model which uses the variables presented in the previous sections.
A systematic overview of the workings of Rating Model Corporate can be
seen in Figure 4.7.
A weighted average of the scaled quantitative key figures and a weighted average of the
qualitative factors are weighted together to get an initial score. Customer
factors are then added to the model score, which is then used in an exponential
formula in order to get an estimated PD. The PDs are then mapped to the
final score, which is on the range 1-12. There are also several special rules.
The weighted averaging makes it easy to handle missing values. The performance
of RMC can be seen in Section 7.5.
The KOB Score is a rating from the Danish department of the firm Experian, which
is an international rating agency and is Denmark's largest credit rating agency.
The correlation between the KOB ratings and Rating Model Corporate is around 0.6, so
9 A heuristic is a problem solving method. Heuristics are non-conventional strategies to
solve a problem. Heuristics can be seen as simple rules, educated guesses or intuitive
judgments.
4.5 Other factors and figures
it can be assumed that there is some variance between them. The KOB rating is on the
scale 0 to 100, where 0 is the worst and 100 the best; if the rating is low,
the creditworthiness is also low. The KOB rating is a weighted conclusion,
where the economic factors have the highest weight, but there are also other
factors that are taken into consideration. These factors can have positive or
negative effects and can change the ratings given in Table 7.16. There are some
complications regarding the KOB score, as there are some firms that are rated B
plus some number, e.g. B50. In order to solve this, all firms rated B50
and higher were given the numeric value 20 and all firms having ratings lower
In the datasets generated from the bank's database there are a few other factors
and figures that have not been mentioned earlier; they are the following
these figures and factors are now given a brief introduction. In mathematical
notation these figures will be referred to as the Greek letter sigma, ς, with the
first letter of the figure as a subscript.
Lowest and highest earlier ratings are the minimum and maximum ratings the firm has had over the last twelve months. Earlier ratings should only be taken into consideration with the utmost care. When earlier values are used for modeling purposes they are often referred to as having a memory. Including a variable with a memory could undermine the robustness of the other variables.
Guarantor Rating
Subjective Rating
Credit experts can give their subjective opinion on what the final credit rating should be. Credit experts are only supposed to give this subjective rating if, in their opinion, there are some external factors influencing the firm's creditworthiness.
Each firm has an identity number that is used to obtain matching information
between different datasets.
Default
The dependent variable is a binary variable stating whether the firm has fulfilled its obligations or not. A formal and much more detailed description can be seen in Section 2.
Equity
The shareholder’s equity is the difference between the total assets and total debt.
Should all the firms assets be sold and all liabilities settled then the shareholders
The relative and cumulative frequencies and the relative ROC curves of the 2005 and 2006 data can be seen in Figure 4.8, while the corresponding plots for the 2007 data and the whole dataset can be seen in Figure 4.9. The complete datasets were used to form Figures 4.8 and 4.9. The default frequency of the datasets can be seen in Figure 4.8, and it is interesting to see that there is quite some difference between years. Likewise, it is interesting to see the difference between the distributions of the bad cases. There are also considerably better results for the 2006 dataset compared to the 2005 dataset.
The number of variables used in this analysis is quite limited. It is thus worth concluding with a few words on variable selection for the development of a new credit rating model.
Chen et al. [10] list 28 variables for modeling credit default and discuss their predictive power, using a support vector machine as the modeling procedure. Behr and Güttler [7] report quite a few interesting points on variable selection for a logistic regression. It is also worth noting that their research is performed on a dataset ten times the size of the data available for this research.
For a logistic regression it might improve the model performance if the model variable age were measured as a continuous variable; by using CART analysis it could then be possible to obtain information on at what age interval firms are most vulnerable to solvency problems.
Figure 4.8: The relative and cumulative frequencies and the relative ROC curve
of 2005 and 2006 data using complete datasets.
Figure 4.9: The relative and cumulative frequencies and the relative ROC curve
of 2007 and all available data using complete datasets.
Chapter 5
The Modeling Toolbox
As competition gets harder in the banking sector, advances are constantly sought at all levels, and modeling standards are no exception. This chapter contains an overview of some of the methods used to analyze data, construct models and validate the outcome. An effort was made to keep the mathematical notation as simple as possible for readers with less statistical or mathematical background. In the sections where more advanced topics are introduced, a brief summary of the concept and its usage is given in order to make it easier for readers with less statistical knowledge to understand the topic.
The general theory behind linear models and generalized linear models is introduced in Sections 5.1 and 5.2, discriminant analysis in Section 5.3 and different classification methods in Sections 5.4 and 5.5. In Section 5.1 some of the basic concepts of statistics are introduced, whereas more advanced methods are introduced in Sections 5.2-5.5. Finally, in Section 5.6 a method used to reduce multidimensional data sets to lower dimensions is introduced.
Linear models encompass regression and the analysis of variance or covariance, and are often referred to as general linear models. The general linear model dates back to Carl Friedrich Gauss (1777-1855).
The underlying assumptions of the general linear model are introduced in Sections 5.1.5 - 5.1.7. In our complex world there exist problems that do not fit those underlying assumptions, and therefore an extension called generalized linear models is introduced in Section 5.2. As it is rather inconvenient that both general and generalized linear models have the same initials, general linear models will be abbreviated as LM and generalized linear models as GLM. Montgomery and Runger [24] give a good introduction to LM; some of the topics here are adapted therefrom.
The mathematical notation of the general linear regression is widely known and can be found in most statistical textbooks. Even though linear regression is not used directly in the model, it is the foundation of the logistic regression and general knowledge of it is of great importance. Linear regression is used to model a certain variable called the response or dependent variable, denoted here as y. The dependent variable is modeled with explanatory variables, called the independent variables, which are denoted as X here.
y = Xβ + ε (5.5)
where
\[
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
X = \begin{pmatrix}
1 & x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\
1 & x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n,1} & x_{n,2} & \cdots & x_{n,k}
\end{pmatrix},
\]
\[
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}
\qquad \text{and} \qquad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\]
where X is called the design matrix, β is the unknown parameter vector and ε is
the error term. By using historical data it is possible to estimate the parameter
vector using the least squares estimation, introduced in Section 5.1.2. From the
estimated parameter vector β̂ it is possible to obtain a fit or a prediction of the dependent variable.
ŷ = X β̂ (5.6)
The term ŷ is a vector called the predicted or fitted values. The error term can then be measured as the difference between the actual observed values, y, and the predicted values, ŷ:

ε = y − ŷ = y − X β̂ (5.7)

The measured error term is usually referred to as the residuals.
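The fit and residual computations of equations (5.5)-(5.7) can be sketched numerically; this is an illustrative example with made-up data, not code from the thesis:

```python
import numpy as np

# Sketch: least squares fit of the linear model y = X*beta + eps,
# with fitted values and residuals as in equations (5.6)-(5.7).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept column
beta_true = np.array([1.0, 2.0, -0.5])      # assumed "true" parameters
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# beta_hat = (X^T X)^{-1} X^T y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_fit = X @ beta_hat        # fitted values y-hat
residuals = y - y_fit       # measured error term, the residuals
```

With a small noise term the estimated parameter vector lands very close to the vector used to generate the data.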
The objective of the estimation is to obtain the estimate of the β parameters that minimizes the loss function L(β):

β̂ = arg min_β L(β) (5.8)
L2(β) is also referred to as the residual sum of squares (RSS). This minimization is called least squares estimation and is obtained by equating the first derivative of the loss function L2(β) to zero and solving for β. Without going into detail, the resulting equations, called the normal equations, that must be solved are

X^T X β̂ = X^T y (5.10)

β̂ = (X^T X)^{-1} X^T y (5.11)

The significance of the individual parameter estimates can be tested with the statistic

z_j = β̂_j / √(σ̂² c_j) (5.12)
where cj is the jth diagonal element of (X T X)−1 . Under the null hypothesis
that βj = 0, zj is compared to different significance levels, α, of the student-t
distribution with N − k − 1 degrees of freedom, t(N −k−1,α) . A large absolute
value of zj will lead to rejection of the null hypothesis, i.e. large zj represent
significant β estimates.
Another test, frequently applied for LMs, is a test for model reduction. Consider models M0 and M1 with parameter vectors

\[
H_0: \beta = \beta_0 = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_q \end{pmatrix}
\qquad \text{and} \qquad
H_1: \beta = \beta_1 = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \qquad q < p
\]
The deviances D0 and D1 can thus be calculated and used to calculate the
F -statistic
\[
F = \frac{D_0 - D_1}{p - q} \Big/ \frac{D_1}{N - p} \qquad (5.17)
\]
The null hypothesis is thus rejected for large values of F relative to some α-level
of the F (p − q, N − p) distribution.
As a measure of the goodness of fit for multiple linear regression models the coefficient of determination is introduced. The coefficient of determination, R², is based on the comparison of a suggested model to the minimal model. The residual sum of squares (RSS0) of the minimal model introduced in equation (5.2) is the largest and worst reasonable RSS value. The RSS for any other model can be computed and compared to RSS0:

R² = (RSS0 − RSS) / RSS0 (5.18)

For a perfect fit the RSS will be zero and the resulting coefficient of determination will be one. The coefficient of determination for the minimal model will be zero. All models improving on the minimal model should thus have R² satisfying 0 ≤ R² ≤ 1. The coefficient of determination can be interpreted as the proportion of the total variation in the data explained by the model; for R² = 0.5, 50% of the total variation is explained by the model.
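Equation (5.18) can be sketched directly in code; this is an illustrative example, not from the thesis:

```python
import numpy as np

# Sketch: coefficient of determination computed by comparing a model's RSS
# to RSS0 of the minimal (intercept-only) model, as in equation (5.18).
def r_squared(y, y_fit):
    rss = np.sum((y - y_fit) ** 2)           # residual sum of squares of the model
    rss0 = np.sum((y - np.mean(y)) ** 2)     # RSS of the minimal model
    return (rss0 - rss) / rss0

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r_squared(y, y)                        # perfect fit -> 1.0
minimal = r_squared(y, np.full(4, np.mean(y)))   # minimal model -> 0.0
```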
where loglik is the maximized value of the log-likelihood function for the estimated model. The AIC takes the number of variables p into account, just like the adjusted R². For the logistic regression model with the binomial log-likelihood the AIC becomes

AIC = −(2/N) · loglik + 2p/N (5.21)
If there are several competing models built from the same data then they can
be ranked according to their AIC, with the one having the lowest AIC being the
best.
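The AIC ranking described above can be sketched as follows; the log-likelihood values and model names here are made up for illustration:

```python
# Sketch: ranking candidate models by the per-observation AIC of
# equation (5.21); lowest AIC wins. The loglik values are fictitious.
def aic(loglik, p, N):
    return -2.0 / N * loglik + 2.0 * p / N

N = 500
candidates = {
    "model_a": aic(loglik=-180.0, p=3, N=N),
    "model_b": aic(loglik=-175.0, p=8, N=N),
    "model_c": aic(loglik=-178.0, p=4, N=N),
}
best = min(candidates, key=candidates.get)  # model with the lowest AIC
```

Note how model_b, despite its higher log-likelihood, is penalized for its extra parameters.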
5.1.5 Normality
The observations are assumed to be normally distributed,

y ∼ N(µ, σ²)

and the same distributional assumptions are thus necessary for the terms on the right side of equation (5.5). The objective of the linear modeling is to find the linear combination of the design variables X^T β that results in a zero mean residual vector,

ε ∼ N(0, σ_ε²)

so that the fitted values satisfy

ŷ ∼ N(µ, σ_ŷ²)

which holds except for the minimal model, where ŷ = β₀ = µ. Now it is easy to see that the variance of the minimal model is the variance of the original observations, σ_ε² = σ². As it is the objective of the modeling to describe the variance of the observations, it is clear that σ² is the maximum variance and that it is desired to find a model that results in a decrease in variance.
5.1.6 Homoscedasticity
One of the underlying assumptions of the LM is that the random variables, both dependent and independent, are homoscedastic, that is, all observations of each variable have the same finite variance. This assumption is usually made before the modeling and leads to adequate estimation results, even if the assumption is not true. If the assumption is not true the model is called heteroscedastic. Heteroscedasticity can be observed from scatterplots of the data variables. It can also be observed from residual plots, that is, plots of residuals against the dependent variables. An example of both homoscedastic and heteroscedastic data can be seen in Figure 5.1.
[Figure 5.1: Scatterplots of homoscedastic (left) and heteroscedastic (right) data.]
5.1.7 Linearity
The term linear in the general linear regression might cause confusion for those who are not familiar with it: what exactly is linear? The term refers to the β parameters, as the model must be a linear combination of them; the independent variables themselves may be transformed as desired by non-linear functions. Such transformations are made on non-normally distributed variables to try to bring them closer to a normal distribution, as a better fit is observed if the independent variables are normally distributed. As an example, the model

y = β₀ + e^{β₁ x₁} + β₂ x₂ + · · · + β_p x_p + ε (5.22)

is not linear, since β₁ enters through the exponential function.
As data do not always comply with the underlying assumptions, more advanced methods were developed, called generalized linear models (GLM). Since the term was first introduced by Nelder and Wedderburn (1972), it has slowly become well known and widely used. Acknowledgment has to be given to the contribution of the computer age, which has brought access to large databases and major advances in computing resources.
The main idea behind GLM is to formulate linear models for a transformation of the mean value, through the link function, while keeping the observations untransformed, thereby preserving the distributional properties of the observations.
One of the advances was the recognition that the nice properties of the normal distribution are shared by a wider class of distributions called the exponential family of distributions. The exponential family will be introduced in the next section. Most definitions and theory in this section are influenced by two books on GLM, i.e. Thyregod and Madsen [31] and Dobson [14].
where the function κ(θ) is called the cumulant generator. The formulation in (5.24) is called the canonical parameterization of the family and the parameter θ is called the nuisance or canonical parameter.

The exponential dispersion family has an extra parameter, the so-called dispersion parameter, and is defined as
The exponential family forms the basis for the discussion on GLM.
In case the dependent variable is measured on the binary scale {0, 1}, the mathematical problem is called logistic regression. Logistic regression is a special case of the GLM and is given a discussion here. Define a binary random variable Z taking the value 1 if an event occurs and 0 if it does not occur;
the probabilities of each case can be modeled as Pr(Z = 1) = p and Pr(Z =
0) = 1 − p. For n such independent random variables (Z1 , Z2 , . . . , Zn ) with
probabilities Pr(Zi = 1) = pi , the joint probability function is
\[
\prod_{i=1}^{n} p_i^{z_i} (1 - p_i)^{1 - z_i}
= \exp\left[ \sum_{i=1}^{n} z_i \ln\frac{p_i}{1 - p_i} + \sum_{i=1}^{n} \ln(1 - p_i) \right] \qquad (5.26)
\]
There are several link functions for the binomial distribution. A popular link function is the logistic or logit link function

g(p_i) = ln( p_i / (1 − p_i) ) = X_i^T β

with the inverse

p(x_i) = exp(X_i^T β) / (1 + exp(X_i^T β)) (5.30)
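The logit link and its inverse in equation (5.30) can be sketched directly; this is an illustrative example, not from the thesis:

```python
import numpy as np

# Sketch: the logit link g(p) = ln(p / (1 - p)) maps probabilities to the
# linear predictor, and its inverse maps the predictor back into (0, 1),
# as in equation (5.30).
def logit(p):
    return np.log(p / (1.0 - p))

def inv_logit(eta):
    return np.exp(eta) / (1.0 + np.exp(eta))

p = np.array([0.1, 0.5, 0.9])
eta = logit(p)
recovered = inv_logit(eta)   # the two functions are inverses of each other
```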
There are also other link functions that can be used in logistic regression, such as the complementary log-log link.
In the model selection process for the GLM an important measure is the deviance, also called the log-likelihood statistic. Consider the log-likelihood function l(y, θ) of a member of the exponential family; the deviance is defined as

d(y, θ) = 2 max_θ l(y, θ) − 2 l(y, θ) (5.34)

The deviance of the binomial distribution is a rather complex function and will therefore not be derived here. The deviance is usually reported in the result summary of most computation software handling the GLM. It is interesting that the deviance of Y ∼ N(µ, σ²I) is simply the residual sum of squares (RSS) introduced in Section 5.1.2. The deviance is thus used in the same manner in GLM as the RSS is used in the LM, and models with smaller deviance are preferable to models with larger deviance. In analogy with equation (5.18), a pseudo coefficient of determination can be defined as

pseudo R² = (D0 − D) / D0
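The deviance and the pseudo-R² above can be sketched for binary data; the fitted PDs below are made up for illustration, and this is not code from the thesis:

```python
import numpy as np

# Sketch: binomial deviance for binary observations and the pseudo-R^2
# defined as (D0 - D) / D0. For binary data the saturated log-likelihood
# is zero, so the deviance reduces to -2 times the log-likelihood.
def binomial_deviance(y, p_hat):
    return -2.0 * np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

y = np.array([0, 0, 1, 1, 0, 1])
p_model = np.array([0.1, 0.2, 0.8, 0.7, 0.3, 0.9])    # fictitious fitted PDs

d0 = binomial_deviance(y, np.full(len(y), y.mean()))  # minimal (null) model
d = binomial_deviance(y, p_model)                     # candidate model
pseudo_r2 = (d0 - d) / d0
```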
Linear discriminant analysis (LDA) arises from the assumption that the independent variables are normally distributed within each of the two classes defined by the dependent variable. Consider the binary dependent variable y forming the two classes j ∈ {0, 1} and k independent variables X spanning the k-dimensional input space R^k. Decision theory for classification expresses the need to know the class posteriors Pr(Y|X) for optimal classification. Suppose f_j(x) is the class conditional density of X and π_j is the prior probability of class j. Then from Bayes' theorem

\[
\Pr(Y = j | X = x) = \frac{f_j(x)\pi_j}{\sum_{l=0}^{1} f_l(x)\pi_l}
= \frac{f_j(x)\pi_j}{f_0(x)\pi_0 + f_1(x)\pi_1} \qquad (5.35)
\]
Consider the two classes defined by the binary dependent variable to be associated with multivariate independent variables which are assumed to be normally distributed. By examining the log-ratio of the two class densities it is possible to see that the equal covariance matrices assumption causes the normalization factors to cancel, as well as the quadratic part in the exponents. It is from the assumption that the covariance matrices are equal that the LDA is derived. That is, however, hardly the case in practice, and thus an estimate of the covariance matrix is used; it is found to be
\[
\hat{\Sigma} = \frac{1}{N_0 + N_1 - 2}
\left( \sum_i (X_{0i} - \hat{\mu}_0)(X_{0i} - \hat{\mu}_0)^T
     + \sum_i (X_{1i} - \hat{\mu}_1)(X_{1i} - \hat{\mu}_1)^T \right) \qquad (5.40)
\]
Even though it is not obvious, it can be seen that equation (5.39) is a linear function of x. The discrimination boundary is set where the two discriminant functions are equal. It is also possible to combine the two discriminant functions, and then the LDA rule would classify to class 1 if

\[
x^T \hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_0)
> \frac{1}{2}\hat{\mu}_1^T \hat{\Sigma}^{-1}\hat{\mu}_1
- \frac{1}{2}\hat{\mu}_0^T \hat{\Sigma}^{-1}\hat{\mu}_0
+ \ln\frac{\pi_0}{\pi_1} \qquad (5.42)
\]
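The classification rule of equation (5.42), with the pooled covariance estimate of equation (5.40), can be sketched numerically; the two-dimensional data below are synthetic, and this is an illustration rather than the thesis's implementation:

```python
import numpy as np

# Sketch: LDA classification with a pooled covariance estimate
# (equation (5.40)) and the combined rule of equation (5.42).
rng = np.random.default_rng(1)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))   # class 0 sample
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))   # class 1 sample
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
pi0 = pi1 = 0.5                                             # equal priors

# pooled covariance estimate
S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X0) + len(X1) - 2)
Sinv = np.linalg.inv(S)

def classify(x):
    lhs = x @ Sinv @ (mu1 - mu0)
    rhs = (0.5 * mu1 @ Sinv @ mu1 - 0.5 * mu0 @ Sinv @ mu0
           + np.log(pi0 / pi1))
    return int(lhs > rhs)   # 1 if classified to class 1
```

Points near each class mean are assigned to that class; with equal priors the boundary lies midway between the two means.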
Lando [21] points out an interesting connection between the LDA and the logistic regression using the logit link function. From the result of Bayes' theorem in equation (5.35) it is possible to write the probability of default accordingly. Applying the logarithm and comparing equation (5.39) with equation (5.30), it appears as if the LDA and LR are the same. They are not; both have the same linear form, but they differ in the way the linear coefficients are estimated. The logistic regression is considered to be more general, relying on fewer assumptions. The LDA assumes that the explanatory variables are normally distributed and that the classes have equal covariance matrices, even though the assumption regarding equal covariance matrices is considered to be less significant. Those assumptions make it possible to consider the logistic regression a safer bet, as it is more robust than the LDA.
As there are many fewer defaulted firms than non-defaulted firms in credit rating data, methods such as k-NN do not work that well as classifiers. Their criteria can, however, be used as additional information: for the k-NN method, the average number of defaults in the neighborhood is used as an independent variable. It is easy to argue against using different methods with the same input data, but as they rely on completely different assumptions it can be thought of as looking at the data from different perspectives to get a better view of the big picture.
The splitting rules make up a binary decision tree; the solution algorithm does not only have to decide automatically on the splitting variables and split points, but also on the shape of the tree. In mathematics, trees grow from the first node, called the root, to their final nodes, called leaves. All nodes except for the root node have a unique parent node, and all nodes except for the leaf nodes have exactly two child nodes. Ancestors refers to parents, grandparents and so forth; points of connection are known as forks and the segments as branches.
Let p_m be the proportion of default observations in node m. Starting with the complete data set, consider a splitting variable J and split point s; the objective is to solve

min_{J,s} Q_m(T) (5.51)
where Q_m(T) is called the node impurity of the tree T. The impurity of a set of samples is designed to capture how similar the samples are to each other; the smaller the number, the less impure the sample set is. There are a few different measures of Q_m(T) available, including the misclassification error, the Gini index and the cross-entropy. They are all similar, but the cross-entropy and the Gini index have the advantage that they are differentiable and thus better suited to numerical optimization than the misclassification error.
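For a binary default/non-default split, the three impurity measures can be written as functions of the node's default proportion p_m; this is an illustrative sketch, not code from the thesis:

```python
import numpy as np

# Sketch: three node impurity measures for a node with default
# proportion p: misclassification error, Gini index and cross-entropy.
def misclassification(p):
    return min(p, 1 - p)

def gini(p):
    return 2 * p * (1 - p)

def cross_entropy(p):
    if p in (0.0, 1.0):     # convention: 0 * log(0) = 0
        return 0.0
    return -p * np.log(p) - (1 - p) * np.log(1 - p)
```

All three measures are zero for a pure node and maximal at p = 0.5; only the Gini index and the cross-entropy are differentiable at every interior point.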
[Figure: a binary decision tree. The root node holds the first splitting rule of the form X_j ≤ s; each leaf node m reports the default proportion p_m and the number of observations N_m.]
The first splitting rule makes up the root node. All observations that satisfy the splitting rule follow the left branch from the root node, while the others follow the right branch. At the leaf nodes the default proportion, p_m, is reported along with the number of observations, N_m, in accordance with the splitting rules of its ancestors. There are numerous stopping criteria; one example is requiring a minimum node size.
The size of a tree is an important measure, as a large tree might overfit the data whereas a small tree might miss out on important structures. A useful strategy called cost-complexity pruning is often used to get a well-sized tree. The strategy is as follows: a large tree T0 is grown, stopping the splitting process when some small minimum node size is reached. Let the subtree T ⊂ T0 be any tree that can be obtained by pruning T0, that is, by collapsing any number of its internal (non-leaf) nodes. Let |T| denote the number of terminal nodes in T. Then the cost-complexity criterion is defined as
\[
C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T| \qquad (5.52)
\]
The objective is then to find, for some α, the subtree T_α ⊂ T0 that minimizes C_α(T). The tuning parameter α > 0 governs the tradeoff between tree size and its goodness of fit to the data.
While a certain degree of complexity follows the use of CART, it has been shown to be a very helpful tool for analyzing data. The introduction here is mainly adapted from Hastie et al. [18].
P X = Y (5.53)

where

\[
P = \begin{pmatrix} p_1 \\ \vdots \\ p_m \end{pmatrix}
  = \begin{pmatrix} p_{1,1} & \cdots & p_{1,m} \\ \vdots & \ddots & \vdots \\ p_{m,1} & \cdots & p_{m,m} \end{pmatrix}, \qquad
X = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}
  = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m,1} & x_{m,2} & \cdots & x_{m,n} \end{pmatrix},
\]
\[
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
  = \begin{pmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,n} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m,1} & y_{m,2} & \cdots & y_{m,n} \end{pmatrix}
\]
On our journey to find the transformation P there are a few things which need to be introduced. The original data have the variance-covariance matrix C_X, an m × m matrix,

C_X = (1/(n − 1)) X X^T

Y also has an m × m variance-covariance matrix,

C_Y = (1/(n − 1)) Y Y^T

and it is the objective of PCA to find an orthogonal transformation P such that the variance-covariance matrix C_Y is a diagonal matrix.
Let A = X X^T; as a symmetric matrix it can be diagonalized by an orthogonal matrix E of its eigenvectors,

A = E D E^T (5.55)

Choosing P = E^T, so that A = P^T D P, gives

\[
C_Y = \frac{1}{n-1} P A P^T
    = \frac{1}{n-1} P (P^T D P) P^T
    = \frac{1}{n-1} (P P^T) D (P P^T)
    = \frac{1}{n-1} (P P^{-1}) D (P P^{-1})
    = \frac{1}{n-1} D \qquad (5.56)
\]
Without going into too much detail, the eigenvalues λ of the symmetric m × m matrix A are the solutions to the equation

det(A − λI) = 0

There exist m real-valued eigenvalues, some of which may be equal. If λ is an eigenvalue, then there exist vectors p ≠ 0, the eigenvectors, such that

A p = λ p
There are a few properties of the PCA that need to be addressed:

- A similar result is that the total variance, i.e. the sum of the variances of the original variables, is equal to the sum of the variances of the principal components,

  Σ_i V(x_i) = Σ_i V(y_i)

- If the variables are measured on very different scales they can be standardized first. Then the empirical correlation matrix forms the basis of the analysis instead of the empirical variance matrix.

- It is also possible to compute the PCA with a method called singular value decomposition (SVD), which is mathematically more involved but numerically more accurate according to R Development Core Team [23].
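The construction of equations (5.53)-(5.56) can be verified numerically; this is an illustrative sketch with random data, not the thesis's computation:

```python
import numpy as np

# Sketch: PCA via eigendecomposition of A = X X^T. Choosing P as the
# transposed eigenvector matrix makes the covariance of Y = P X diagonal,
# as in equation (5.56).
rng = np.random.default_rng(2)
m, n = 3, 500
X = rng.normal(size=(m, n))
X[1] += 0.8 * X[0]                        # introduce correlation between rows
X = X - X.mean(axis=1, keepdims=True)     # center each variable

A = X @ X.T                               # proportional to C_X
eigvals, E = np.linalg.eigh(A)            # A = E D E^T
P = E.T                                   # the sought orthogonal transformation
Y = P @ X
C_Y = Y @ Y.T / (n - 1)

off_diag = C_Y - np.diag(np.diag(C_Y))    # should vanish numerically
```

The off-diagonal elements of C_Y are zero up to numerical precision, and the total variance (the trace) is preserved by the orthogonal transformation.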
Chapter 6
Validation Methods
The term Discriminatory Power plays a significant role in the validation process. The term refers to the fundamental ability of a rating model to differentiate between default and non-default cases.
In Figure 6.1 the relative frequency of both default and non-default cases can be seen for the 12 rating classes, where class 1 represents the worst firms and class 12 the best firms. From the density functions it is apparent how the ratings for both default and non-default cases are distributed. It is desired that the two distributions are considerably different in order to discriminate between default and non-default cases. A nice distribution of the bad cases would have the most observations at rating class one, as it does in Figure 6.1, decreasing smoothly to the right. On the other hand, the distribution of the good cases would preferably be the mirrored distribution of the bad cases, that is, skewed to the right. Most important is that the two distributions are different and separable.
Figure 6.1: Example of a frequency density distribution for both default and
non-default cases.
An ideal rating procedure would run vertically from (0,0) to (0,1) and horizontally from (0,1) to (1,1); such a rating procedure would only need two rating classes. A rating model with no predictive power would run along the diagonal. It is desirable that the ROC curve is a concave function over the entire range. If this condition is violated, then there is a rating class with a lower default probability than a superior rating class. It is obviously desired to have decreasing default probabilities with higher ratings.
The Area Under the Curve (AUC) is a numerical measure of the area under the ROC curve. For an ideal rating model the AUC would be 1, and for a non-differentiating model it would be 0.5. The higher the value of the AUC, the higher the discriminatory power of the rating procedure. The AUC is a one-dimensional measure of discriminatory power and does thus not capture the shape of the ROC curve. In Figure 6.4 ROC curves for two different models with the same AUC measure are shown. It is then impossible to select either model from the AUC statistic alone. The steeper curve, corresponding to the black curve in Figure 6.4, would though be preferred, as it predicts better for rating classes containing worse firms. The slope of the ROC curve in each section reflects
Figure 6.3: Example of an ideal ROC curve and an ROC curve with no predictive power. The red line represents the ideal procedure, whereas the blue line represents a procedure with no predictive power.
the ratio of bad versus good cases in the respective rating class. It would thus be preferred that the ROC curve is steepest in the beginning and that the steepness then decreases; this would make the curve concave over the entire range. If this condition is violated, an inferior class will show a lower default probability than a rating class which is actually superior. A non-concave region can of course be caused by statistical fluctuations, but it should be avoided in the development process of a new rating model. Neither curve in Figure 6.4 is concave over the entire range: the red curve violates concavity in the region near the (0.1, 0.5) point, and the black curve in the region near the (0.9, 0.95) point.
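The AUC can be computed without drawing the curve at all, as the probability that a randomly chosen non-default case is rated higher than a randomly chosen default case (ties counting one half). The ratings below are made up for illustration; this is a sketch, not the thesis's implementation:

```python
import numpy as np

# Sketch: AUC as the Mann-Whitney probability that a non-default case
# outranks a default case, with ties counted as one half.
def auc(ratings_good, ratings_bad):
    wins = 0.0
    for g in ratings_good:
        for b in ratings_bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(ratings_good) * len(ratings_bad))

good = np.array([8, 9, 7, 10, 6, 9])   # fictitious ratings of non-default cases
bad = np.array([2, 3, 5, 7])           # fictitious ratings of default cases
```

Comparing a sample against itself gives exactly 0.5 (no discrimination), and perfectly separated samples give 1.0, matching the interpretation of the AUC given above.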
Figure 6.4: Comparison of two different ROC curves with the same AUC.
The Gini coefficient is a linear transformation of the AUC and is thus completely correlated with the AUC indicator. It thus carries no additional information and is just calculated to compare model performance to reported performance. The Gini coefficient has a geometrical connection to a graphical representation called the CAP curve or Powercurve, which is a graph of the cumulative frequencies of default cases versus all cases. The ROC curve is more sensitive than the CAP curve and is thus preferred.
where H0 refers to the absolute information value, which represents the information known regardless of the rating procedure.
The Brier Score can be decomposed into terms separating its essential properties. The first term,

p(1 − p) (6.10)

describes the variance of the default rate observed over the entire sample, p. This value is independent of the rating procedure and depends only on the observed sample.
This is simply the Brier Score scaled with the variation term, which is constant for each sample. Recalling that a low value for the calibration is desired and a large value for the resolution, it is easy to see that larger values are desired for the Brier Skill Score. Both the calibration and the resolution can be considered one-dimensional measures of discriminatory power, and the BSS thus two-dimensional. The fact that the resolution term is larger than the calibration term in absolute terms undermines the BSS: a great improvement in calibration might be overlooked if the value of the resolution changed by the same amount. It might therefore be better to consider the two terms separately. Reliability diagrams give a visual representation of the Brier Skill Score and are considered in Section 6.4.6.
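The Brier Score and one common form of the Brier Skill Score, scaling with the variance term p(1 − p) of equation (6.10), can be sketched as follows; the observations are made up, and this is an illustration rather than the thesis's exact formulation:

```python
import numpy as np

# Sketch: Brier Score as the mean squared difference between forecast PDs
# and outcomes, and a Brier Skill Score scaled with the sample variance
# term p(1 - p) of equation (6.10).
def brier_score(y, p_hat):
    return np.mean((p_hat - y) ** 2)

def brier_skill_score(y, p_hat):
    p_bar = np.mean(y)
    return 1.0 - brier_score(y, p_hat) / (p_bar * (1.0 - p_bar))

y = np.array([0, 0, 0, 1, 1])               # fictitious default outcomes
perfect = brier_skill_score(y, y.astype(float))          # perfect forecasts
trivial = brier_skill_score(y, np.full(5, y.mean()))     # constant forecast
```

Perfect forecasts score 1, while the uninformative constant forecast scores 0, consistent with larger BSS values being desirable.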
Reliability diagrams, also referred to as calibration curves, show observed default rates against forecasted default rates. An example of a reliability diagram can be seen in Figure 6.5. The red line in the figure represents the observed default
frequency for the whole portfolio. The blue line is a diagonal line and represents the optimal line for the calibration curve. The black line then represents the observed calibration curve of a rating model. A well calibrated rating procedure would fall very close to the diagonal line. It is observable that there are six observations making up the calibration curve, which means that defaults were observed in six of the twelve rating classes.
                          Future Rating
Current Rating    A   Aa    B   Bb    C   Cc  Default
A                16   11    9    2    0    0    0
Aa                4    7    9    5    3    1    0
B                 1   11   15   15    9    5    1
Bb                0    3   14   19   13    8    3
C                 0    0    2    9   14    9    5
Cc                1    0    1    4    7    9    9
The current rating is listed in the rows and the future rating in the columns. The observed frequencies are generally accumulated along the main diagonal of the matrix. The cases that lie on the diagonal represent borrowers who did not migrate from their original rating over the observed time horizon. The more rating classes that are in use, the more frequently changes will be observed between ratings, and the lower the concentration along the diagonal. In order to calculate the transition probabilities it is necessary to convert the absolute numbers into row probabilities; each row should thus sum to one. Datschetzky et al. [13] make further suggestions on how to calculate the transition probabilities.
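The row-normalization step can be sketched directly with the migration counts from the table above; this is an illustrative sketch, not the thesis's code:

```python
import numpy as np

# Sketch: converting absolute migration counts into row-wise transition
# probabilities, so that each row of the transition matrix sums to one.
counts = np.array([
    [16, 11,  9,  2,  0,  0, 0],   # current rating A
    [ 4,  7,  9,  5,  3,  1, 0],   # Aa
    [ 1, 11, 15, 15,  9,  5, 1],   # B
    [ 0,  3, 14, 19, 13,  8, 3],   # Bb
    [ 0,  0,  2,  9, 14,  9, 5],   # C
    [ 1,  0,  1,  4,  7,  9, 9],   # Cc
], dtype=float)

P = counts / counts.sum(axis=1, keepdims=True)   # row probabilities
```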
Sun and Wang [29] state that stability analysis must take the time homogeneity of the transition matrix into account when analyzing whether the model results in a PIT or TTC rating system. The identity matrix I is a homogeneous matrix, and it can also be seen as a completely through-the-cycle (TTC) rating procedure. The deviation from homogeneity can be measured by defining a matrix P̃ representing the distance from the actual matrix P to the homogeneous matrix I,

P̃ = P − I (6.14)
Jafry and Schuermann [19] discuss various methods of measuring the deviation from homogeneity and propose a metric defined as the average singular value of a transition matrix, M_svd, described in equation (6.15), where λ_i denotes the ith eigenvalue. The singular values can be obtained using singular value decomposition, which makes the average singular value easy to compute from the resulting diagonal matrix D.

\[
M_{svd}(P) = \frac{1}{n} \sum_{i=1}^{n} \sqrt{\lambda_i(\tilde{P}' \tilde{P})} \qquad (6.15)
\]
For the identity matrix, which can be seen as a representative matrix for through-the-cycle (TTC) ratings, the resulting average singular value is zero. For completely point-in-time (PIT) ratings the average singular value is one. The scale is linear between those two values.
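The metric of equation (6.15) can be sketched in a few lines; this is an illustrative implementation under the definitions above, not the thesis's code:

```python
import numpy as np

# Sketch: the average singular value metric M_svd of equation (6.15),
# measuring the distance of a transition matrix P from the identity.
def m_svd(P):
    n = P.shape[0]
    P_tilde = P - np.eye(n)   # deviation from homogeneity, equation (6.14)
    # the singular values of P_tilde are the square roots of the
    # eigenvalues of P_tilde' P_tilde
    return np.linalg.svd(P_tilde, compute_uv=False).sum() / n
```

For the identity matrix (pure TTC) the metric is zero; for a matrix that completely reshuffles the ratings, such as a two-state swap, it reaches one.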
6.5 Discussion
It seems natural that banks would hold two rating models, one that could be considered PIT and another that is TTC. The PIT rating system could then consider macro-economic variables along with all the other variables and be updated frequently, while the TTC system would rely more on qualitative assessments along with key figures. This would, though, entail additional model development cost. Another possibility is to have a second rating scale with fewer possible ratings. That would, though, cause problems for firms having PDs on the edges of some PD interval, as they might then frequently change between grades.
It should be noted that information on the properties of the rating model is lost
in the calculation of CIER, AUC and all the other one-dimensional measures of
discriminatory power, so they have limited meaning as individual indicators in
the assessment of a rating model. This is perhaps best seen in the terms making
up the Brier Skill Score: different discriminatory power indicators might improve
for a specific model while others deteriorate.
Modeling Results
In previous chapters the modeling toolbox and the validation methods that are
used in the development process for a new credit rating model were presented. In
this chapter the most important findings of the development process of a new
credit rating model are presented. The development process is given a full
description in Section 2.3. The findings are presented in order of significance;
less important findings can be seen in Appendix B.
Firstly, the general performance of a logistic regression model using the same
variables as Rating Model Corporate (RMC) is compared to the performance of RMC
in Section 7.1. In Section 7.2 the results of principal component analysis are
reported. The resampling process that was used in most of the modeling is
introduced in Section 7.3. The performance of single-variable models can be seen
in Section 7.4 and the variable selection process in Section 7.5. The performance
of models using new parameters is introduced in Section 7.6 and discriminant
analysis is presented in Section 7.7. Finally, results for different link
functions can be seen in Section 7.8.
7.1 General Results
The aim of the thesis is to see whether logistic regression can outperform the
benchmark credit rating model, Rating Model Corporate (RMC), used in the
co-operating corporate bank. A logistic regression model using the same variables
as RMC is therefore constructed and predictions are made for the test set. The
test set consists of creditworthiness observations from 2007 and observations on
defaults from 2008. The total modeling set is used to construct the parameter
estimates for the predicting model. The rating performance of RMC, the logistic
regression model and the logistic regression model with subjective ratings can be
seen in Table 7.1.
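The fit-and-predict step described above can be sketched as follows. This is a minimal, self-contained illustration of maximum-likelihood logistic regression on synthetic data, not the thesis's actual model, variables or data:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit a logistic regression (logit link) by gradient ascent on the
    log-likelihood. Illustrative only: the thesis models default (y = 1)
    on financial key figures; here X and y are synthetic stand-ins."""
    X = np.column_stack([np.ones(len(X)), X])   # add intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # predicted PDs
        beta += lr * X.T @ (y - p) / len(y)     # score (gradient) step
    return beta

def predict_pd(beta, X):
    X = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-X @ beta))

# synthetic example: one key figure, higher values -> more likely to default
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (rng.random(500) < 1 / (1 + np.exp(-(2 * x - 1)))).astype(float)
beta = fit_logistic(x.reshape(-1, 1), y)
pd_hat = predict_pd(beta, x.reshape(-1, 1))     # PDs for a "test set"
```

In practice the fitted PDs would then be mapped through the bank's transformation to the 12-grade rating scale before comparison with RMC.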
Table 7.1: Performance statistics of RMC, the logistic regression model and
logistic regression with subjective ratings. High values are desired, except for
the Calibration.
Considering, for example, the PCA.stat [1] in Table 7.1, it can be seen that RMC
has a considerably lower score than the logistic regression model.
The model under the heading Subjective is the same logistic regression model,
except that the ratings are overwritten with the subjective ratings where they
are present [2]. The subjective ratings are used in RMC and it is thus
interesting to see whether they improve the performance of the logistic
regression model. PCA.stat and BSS indicate that the subjective ratings are an
improvement, but the AUC, Pietra and CIER statistics indicate otherwise. It is
debatable whether the subjective ratings are indeed improving the performance,
and judging from the large values of PCA.stat it is out of its comfort zone [3].
It would of course be optimal if a rating
[1] The PCA.stat discriminatory power statistic is presented in Section 7.2.
[2] The subjective ratings are the special rating opinions of credit experts who
feel that there are special conditions not captured by the rating model.
[3] See Section 7.2 for a discussion of this matter.
model would perform so well that the subjective ratings could be assumed
unnecessary. Further interesting observations can be made by comparing the
validation figures in Figure 7.1.
By considering the relative frequencies of the good cases of RMC and the LR
model, it can be seen that there is a considerable difference in distributional
shape between the two models. RMC has a normal-like distribution with a somewhat
heavier tail towards rating one. The logistic regression has a totally different
distribution, as it is almost steadily increasing from one to twelve. It is
likewise interesting to view the distribution of bad cases, that is defaults: it
can be seen, compared to the frequencies observed earlier in Figures 4.8 and 4.9,
that there are quite a few observed defaults with relatively high credit ratings.
It is also worth noting that the logistic regression model has defaults up to
rating 9, whereas RMC only has defaults up to rating 7. Although one might
consider this a negative thing for the LR model, it is not; on the contrary, the
center of the LR distribution is approximately 9 against approximately 6 for RMC,
as can be seen by viewing the cumulative frequencies. The LR model puts the whole
scale to better use: it is for example difficult to argue what the difference is
between ratings 9 and 12 in RMC, whereas there is obviously a much greater
difference for the LR model.
By comparing the ratings of RMC to the relative ratings obtained from the LR
model, the difference in distributions is observable. From Table 7.2 it can be
seen that the LR model generally gives considerably higher credit ratings. It is
interesting that most ratings higher than 8 in RMC are given the rating 12 in the
LR model. The higher ratings are a result of the lower probabilities of default
observed from the LR model. The probabilities of default obtained from the LR
model are largely dependent on the general default rate of the modeling set. It
is thus possible to manipulate the general PDs; higher PDs could be obtained by
removing some of the non-default firms. Another possibility would be to
reconsider the transformation that transforms the PDs into risk ratings.
It is interesting to note, as can be seen from Table 7.1, that the Calibration of
the models is higher than the Resolution, which is totally different from the
results of previous years. This results in a negative BSS, and it is quite
interesting to view the reliability diagrams of the two models, which can be seen
in Figure 7.2. Considering first the reliability diagram of RMC, it is clear that
the model is poorly calibrated, as the calibration curve does not lie around the
diagonal line. The reliability diagram of the LR model is somewhat better
calibrated and can be seen in Figure 7.3.
[Figure: panels showing the relative and cumulative frequencies of good and bad
cases, over ratings 1-12, for the LR rating model and for RMC.]
Figure 7.1: Validation plot. Compares the performance of RMC and a logistic
regression model.
[Figure: reliability diagram of RMC; observed default rate on a logarithmic
scale.]
Figure 7.2: Reliability diagram of Rating Model Corporate. Shows the observed
default rate of each class against the forecasted default rate of the respective
class. The black line is the calibration curve, the red line is the observed
default rate of the entire portfolio and the blue line represents the optimal
line.
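A reliability diagram of this kind can be built from three columns: the rating class, the forecasted PD and the default indicator. A minimal sketch (the function name and all numbers below are hypothetical, not the bank's portfolio):

```python
import numpy as np

def reliability_points(ratings, forecast_pd, defaulted):
    """For each rating class, return (mean forecasted PD, observed default rate).

    A well-calibrated model yields points close to the diagonal when
    observed rates are plotted against forecasted rates."""
    ratings = np.asarray(ratings)
    forecast_pd = np.asarray(forecast_pd, dtype=float)
    defaulted = np.asarray(defaulted, dtype=float)
    points = {}
    for r in np.unique(ratings):
        mask = ratings == r
        points[int(r)] = (forecast_pd[mask].mean(), defaulted[mask].mean())
    return points

# three hypothetical rating classes, four obligors each
pts = reliability_points(
    ratings=[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    forecast_pd=[0.20, 0.22, 0.18, 0.20, 0.05, 0.06, 0.05, 0.04,
                 0.01, 0.01, 0.02, 0.01],
    defaulted=[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
)
```

Plotting the resulting pairs on log-scaled axes, together with the diagonal and the portfolio-wide default rate, reproduces the structure of the diagram above.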
         LR rating
RMC    1   2   3   4   5   6   7   8   9  10  11  12
5      1   1   3  14  33  96 101  75  54  34   9  15
6      0   1   3   6  20  47  88 131 133  72  53  46
7      0   0   1   1   6  16  31  69 106 101 101  91
8      0   0   0   1   0   2  16  19  25  50  72 144
9      0   0   0   1   0   2   1   4  18  18  22 125
10     0   1   0   0   0   0   0   0   6   4   9  58
11     0   0   0   0   0   0   0   0   0   0   1  30
12     0   0   0   0   0   0   0   0   0   0   0   9
Table 7.2: Rating comparison matrix, comparing the ratings of RMC (rows) to the
ratings of the logistic regression model (columns).
Tables 7.3 and 7.4 show the transition matrices of RMC and the LR model,
respectively. The transition matrices show the change in ratings from the ratings
of 2007 to the current ratings of 2008. Both matrices have the highest rates
along the diagonal, as expected. There is an obvious difference between the two
transition matrices, as the one for RMC has the highest density in the middle of
the matrix, while the LR transition matrix has the highest density in the lower
right corner. It is also observable that the LR model has a few entities that had
a high rating in 2007 and have had a major downfall since. That is a clear
disadvantage, as the bank would like to believe that a firm with a high rating
would also have a high rating a year later. As a measure of rating stability the
average singular value is calculated [4]. The average singular value of RMC is
calculated as 0.6519, so RMC can be considered a 65% point-in-time (PIT) rating
system. The average singular value of the LR model is calculated as 0.7135, so
the LR model can be considered a 71% point-in-time (PIT) rating system, i.e. a
more point-in-time rating procedure than RMC.
The main conclusion to be drawn from the results in this section is clear: the
logistic regression model outperforms the heuristic model RMC.
[Figure 7.3: Reliability diagram of the logistic regression model; observed
default rate on a logarithmic scale, with lines as in Figure 7.2.]
          2008
2007    1   2   3   4   5   6   7   8   9  10  11  12
1      16  11   9   4   1   3   0   1   0   0   0   0
2       4   7   9   9   3   1   2   1   0   0   0   0
3       1  11  35  30  17   5   1   1   0   0   0   0
4       4   4  24  69  54  28   8   5   2   0   0   0
5       4   3   8  34 117  94  37  12   3   2   0   0
6       2   3   4  26  70 176 120  34   8   0   0   0
7       2   1   0   6  15  61 149  85  26   1   0   0
8       0   1   1   2   5  13  43  99  45   7   1   0
9       0   1   0   0   1   4  19  23  41  24   1   0
10      0   0   1   0   0   0   0   6   3  20   5   1
11      0   0   0   0   0   0   0   2   0   2   2   0
12      0   0   0   0   0   0   0   0   0   0   4   3
Table 7.3: Transition matrix, showing the changes in RMC ratings from 2007 (rows)
to 2008 (columns).
          2008
2007    1   2   3   4   5   6   7   8   9  10  11  12
1       3   2   3   1   1   2   1   0   0   0   0   0
2       0   2   3   3   2   1   0   1   0   0   0   0
3       2   4   6   3   8   0   3   0   1   0   0   0
4       0   0   8  17  17   6   3   1   2   0   1   1
5       1   0   5   4  15  19  11   9   3   1   0   0
6       0   0   3   4  14  36  26   9  13   6   3   0
7       0   0   0   2  10  14  31  24  14   8   7   3
8       0   0   0   0   5  14  21  52  33  20   8   7
9       0   0   1   0   3   3  17  38  65  43  23  13
10      0   0   1   0   1   1   5  18  31  47  40  21
11      1   1   1   1   0   2   2   9  16  28  51  45
12      0   0   0   2   4   1   5   5  10  30  43 235
Table 7.4: Transition matrix, showing the changes in LR ratings from 2007 (rows)
to 2008 (columns).
7.2 Principal Component Analysis
Table 7.5: List and description of the different principal component analyses
that were done.
The general results of PCA I-VI are discussed in Appendix B. The performance of
the principal component representatives of PCA I-VI can be seen in Section 7.4.
The PCA of the discriminatory power indicators is, however, of more interest and
is discussed in full in Section 7.2.1.
The fact that there is no single numerical measure of model performance makes the
validation of a rating procedure a difficult task. In order to address this
problem, PCA is performed on a set of discriminatory power indicators to reduce
the dimension of the variables taken into consideration. The PCA is performed on
numerous discriminatory power indicators and the first principal component
representative is then considered as a single numeric measure of discriminatory
power. In order to explain this in more detail, it is important to understand
what is going on inside the PCA.
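To make the construction concrete, here is a sketch (not the thesis's code) of how such a first-principal-component score can be computed: standardize the indicators and project onto the first right singular vector. The indicator values below are invented for illustration:

```python
import numpy as np

def pca_stat(dpi_matrix):
    """First principal component score of standardized indicators.

    Rows are models, columns are discriminatory power indicators
    (e.g. AUC, Pietra, CIER, BSS). The score has mean zero over the
    sample used to fit the PCA."""
    X = np.asarray(dpi_matrix, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize columns
    # principal directions are the right singular vectors of the centered data
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    w = vt[0]
    if w.sum() < 0:            # fix the sign so higher DPIs give a higher score
        w = -w
    return Z @ w               # PC1 score, one value per model

# three hypothetical models scored on (AUC, Pietra, CIER, BSS)
scores = pca_stat([
    [0.85, 0.60, 0.30, 0.10],
    [0.90, 0.68, 0.50, 0.14],
    [0.80, 0.55, 0.25, 0.08],
])
```

When the indicators are strongly positively correlated, as in Figure 7.4, the weights come out close to equal and the score behaves like an average of the standardized indicators.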
There are quite a few interesting things to be seen in Figure 7.4. First, it is
possible to see that the AUC and the Gini Index are completely correlated, as
expected. Perhaps more surprisingly, the Pietra Index is strongly correlated with
the AUC and the Gini Index, while the CIER indicator is not that correlated with
the other indicators, except perhaps with Resolution and BSS. The Resolution and
BSS are in turn very correlated, leading to the conclusion that the calibration
measure has little leverage in the BSS, due to the relative difference in size
between the calibration and the resolution, as the resolution is generally
considerably larger.
The variance measure is only dependent on the default rate of the sample and is,
as preferred, mostly uncorrelated with the other discriminatory power indicators.
The Brier score is, however, quite correlated with the variance, and it is thus
easy to see why the BSS should be preferred to the Brier score.
Recall that the Calibration is a measure of how well calibrated the model is,
i.e. a small difference between forecasted and observed default rates is desired.
As can be seen in Figure 7.4 there is no considerable correlation between the
Calibration and the other indicators, so it is possible to conclude that no other
indicator describes the calibration. The Brier indicator is almost completely
uncorrelated with the Calibration indicator, further undermining the usability of
the Brier indicator.
It is apparent from Table 7.6 that the first principal component describes most
of the variance of the four indicators. The first principal component will be
referred to as PCA.stat when reporting model performance. It is then interesting
[Figure 7.4: pairs plot of the discriminatory power indicators AUC, Pietra,
CIER, Gini, Variance, Calibration, Resolution, Brier and BSS, showing their
pairwise scatterplots and correlations (e.g. AUC-Gini 1.00, Resolution-BSS
0.97).]
Table 7.6: The rotation of variables and summary of the principal component
analysis of the discriminatory power indicators.
It is worth noting that the average value of PCA.stat is zero for the sample that
was used in the principal component analysis. Models that perform better than
average get a positive value and models performing worse get a negative value for
PCA.stat. By analyzing the range of the first principal component representative
in Figure 7.5, it is observed that most of the values lie in the range [-4, 4].
As PCA is a linear transformation, it assumes linear relationships in the data.
As can be seen from the dotplots in Figure 7.4, the relationships between the
variables considered in the PCA are relatively linear. The relationships between
the DPIs outside the range considered in Figure 7.4 might, however, be
non-linear. Values of PCA.stat outside the range [-4, 4] must thus be considered
with care.
A problem with the use of PCA.stat could be user acceptance, as those with less
statistical background might reject its use. It is thus worth noting that
PCA.stat can be considered a weighted average of the standardized DPIs. The
weights can be seen in Table 7.6 under the heading PC1, and as they are close to
equal, PCA.stat is almost the simple average of the standardized DPIs. The term
standardized refers to the procedure that makes it possible to compare variables
of different scales. Standardization is usually performed by subtracting the mean
from all observations and dividing by the sample standard deviation. This
standardization can be thought of as converting apples and oranges into cash in
order to compare them. After extensive use of PCA.stat, the conclusion was made
that it is indeed a good single indicator of model performance, and no mismatches
were observed in its use.

[Figure 7.5: pairs plot of the discriminatory power indicators and the first
principal component (PCA1), with pairwise correlations.]
Following the consideration of whether the calibration term gets lost in the
calculation of BSS, an additional PCA was performed using the four DPIs from the
previous PCA along with the Calibration. The results of the PCA with five DPIs
can be seen in Table 7.7.
Table 7.7: The rotation of variables and summary of the principal component
analysis of five discriminatory power indicators.
From Table 7.7 it can be seen that the weight for the Calibration is somewhat
smaller than the other weights. The proportion of variance that the first
principal component describes is also somewhat smaller than for the PCA with only
four DPIs. As can be seen in Figure 7.6 the correlations are similar in most
cases, except for the Calibration, where the correlation is somewhat larger than
observed when only four DPIs were used. It is difficult to draw any strong
conclusions from the comparison of the two PCAs. The decision to go with the PCA
using only four DPIs was made as it was considered the safer choice.

7.3 Resampling Iterations
In order to estimate how many resamplings are necessary to get a stable measure
of the actual model performance, the model performance of RMC was considered;
when the performance statistic and its standard deviation have stabilized, a
sufficient number of resamplings can be assumed. To save computation time only
the AUC discriminatory power indicator is considered, i.e. the mean over all
samples and the respective standard deviations.
[Figure 7.6: pairs plot of AUC, Pietra, CIER, Calibration, Resolution, Brier,
BSS and the first principal component of the five-DPI PCA (PCA2), with pairwise
correlations.]
The performance of RMC and of a randomly chosen model with the scaled solvency as
a single variable, for 30, 40, 50, 60 and 80 resampling iterations, can be seen
in Table 7.8.
Table 7.8: Performance of RMC and of the model with the solvency score as a
variable, for 30, 40, 50, 60 and 80 resampling iterations.
Considering the results in Table 7.8 it is apparent that the mean AUC is in all
cases stable to two decimal places, whereas the standard deviation is only stable
to the same degree after 50 iterations. Even though this analysis is not
extensive, it is considered enough to conclude that 50 iterations give a fair
estimate of the actual model performance. It is important to note that no
significant correlation was observed between sample size and model performance,
strengthening the belief that splits with differences in default rates of no more
than ±10% can all be considered equally good.
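The resampling procedure can be sketched as repeated random splits with the AUC computed on each validation sample; the Mann-Whitney formulation of the AUC is used here. This is an illustrative reconstruction on synthetic data, not the thesis's original code:

```python
import numpy as np

def auc(scores, defaulted):
    """AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen defaulter gets a higher score (PD) than a randomly chosen
    non-defaulter, counting ties as one half."""
    scores = np.asarray(scores, dtype=float)
    defaulted = np.asarray(defaulted, dtype=bool)
    bad, good = scores[defaulted], scores[~defaulted]
    greater = (bad[:, None] > good[None, :]).sum()
    ties = (bad[:, None] == good[None, :]).sum()
    return (greater + 0.5 * ties) / (len(bad) * len(good))

def resampled_auc(scores, defaulted, n_iter=50, test_frac=0.3, seed=0):
    """Mean and standard deviation of the AUC over repeated random splits."""
    scores = np.asarray(scores, dtype=float)
    defaulted = np.asarray(defaulted, dtype=bool)
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_iter):
        idx = rng.permutation(len(scores))[: int(test_frac * len(scores))]
        # skip degenerate splits with no defaulters or no non-defaulters
        if 0 < defaulted[idx].sum() < len(idx):
            aucs.append(auc(scores[idx], defaulted[idx]))
    return np.mean(aucs), np.std(aucs, ddof=1)

# synthetic portfolio: defaults are more likely where the forecasted PD is high
rng = np.random.default_rng(1)
pd_hat = rng.random(400)
y = rng.random(400) < pd_hat
mean_auc, sd_auc = resampled_auc(pd_hat, y)
```

Tracking the mean and standard deviation as n_iter grows reproduces the stabilization check summarized in Table 7.8.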
It is also interesting to consider, against the results in Table 7.8, that
Datschetzky et al. [13] note that for an empirical dataset the upper bound of the
AUC is approximately 0.9. The performance of Rating Model Corporate (RMC) is thus
very good, and there is not much room for improvement. It is, however, difficult
to draw conclusions on this matter, as there might be various reasons for the
good performance of RMC; for instance the economic situation in the years under
consideration has been considered good, and this particular loan portfolio might
be more conservative than those of the general banks considered in Ong [26].
Having considered how many resampling iterations should be performed, the number
of 50 iterations was chosen.

7.4 Performance of Individual Variables

To evaluate the performance of individual variables, one-variable models are
constructed and their performance over 50 iterations is documented. The results
of numerous univariate models with quantitative key figures as variables are
considered in Table 7.9. Only the AUC discriminatory power indicator is
considered in order to save calculation time; the average AUC and respective
standard deviations are listed in Table 7.9.
Table 7.9: Performance of single-variable models for the quantitative key
figures. Model 10 is, however, a multivariate model considering all the scaled
quantitative key figures.
Starting from the top of Table 7.9, first is the performance of RMC, for
comparison. Models 2-9 then show the performance of the quantitative key figures.
It is clear that the sector-relative key figures outperform the simple
firm-specific key figures by quite a margin. Considering the scaled key figures,
the solvency has the best performance, then the debt, closely followed by the
return, whereas the liquidity has the least discriminatory power. Model 10 then
considers the sum of all the quantitative key figures and, unsurprisingly,
outperforms all the single-parameter models.
The results of models considering the qualitative figures as variables can be
seen in Table 7.10. In models 10-15 the performance of each individual
qualitative figure is considered. It can be seen that the refunding variable has
the most predictive power, followed by the risk assessment of the credit experts.
The management, stability and position variables have medium performance, whereas
the situation variable shows the least performance. The first principal component
of the qualitative figures performs well and the second principal component has
some predictive power. The sum of all qualitative figures shows the best
performance, closely followed by the sum of the first two principal components.
It is, however, interesting to see that the standard deviation of
Table 7.10: Performance of single-variable models for the qualitative figures.
Models 18 and 19 are, however, multivariate models considering all the
qualitative figures.
Table 7.11: Performance of single-variable models for the categorical variables.
The performance of the customer factors can be seen in Table 7.11. The sum of the
numeric values of the factors, as they are used in RMC, performs quite well by
itself. Interestingly, the factor telling which sector the firm belongs to has
some predictive power. Viewing that model in more detail, it is apparent that the
real estate sector is the least risky, then trade, transport and service, and
finally, by far the riskiest, the industry sector. Another interesting point from
Table 7.11 is that the obligation factor outperforms the annotation and age
factors by some margin.
In Table 7.12 the performance of some of the principal components of the various
PCAs can be seen. The first six models, models 25-30, can be seen as pairs of the
first and second principal components of different PCAs of the quantitative key
figures.

Table 7.12: Performance of single-variable models for the principal components
of different PCAs of the quantitative key figures. A combined PCA for both
qualitative and quantitative figures is considered in models 31-38.

The first pair, models 25 and 26, gives the results for a regular PCA of the
unscaled quantitative key figures. Models 27 and 28 show the results when
separate PCAs were performed on the observations from each sector; that was done
in order to account for the variance between sectors. The performance of the
scaled quantitative key figures can be seen in models 29 and 30. Models 27 and 28
clearly perform better than models 25 and 26, so making separate PCAs for
different sectors results in great improvements compared with a single PCA for
all observations. The first principal component of the scaled key figures has the
greatest discriminatory power of models 25-30. It is then interesting to see that
the second principal component of the scaled key figures has no predictive power.
The results of models 31-34 are for models using the first four principal
components of a PCA using both qualitative and quantitative variables. The
performance of model 31 barely matches the performance of model 16, which only
uses the qualitative figures. The other principal components have no real
predictive power. It can thus be concluded that this is not the way to go.
The results of models 35-38 are for models using the first four principal
components of a PCA using both qualitative and scaled quantitative variables. The
performance of the first principal component is the best of the models presented
up to this point. There is some limited predictive power in the second and fourth
principal components.
There are some variables available that are not used in RMC. They are listed
in Table 7.13.
Table 7.13: Performance of single-variable models for variables that are not used
in RMC.
As can be seen from Table 7.13, the KOB rating system performs well, but clearly
not as well as RMC. It is interesting to see that the minimum earlier ratings
outperform the earlier maximum ratings. From this the conclusion can be drawn
that a more conservative model would perform better. It might thus be worth
considering introducing a special rule in the model that would make it harder for
ratings to go up than to go down. It is also observable from Table 7.13 that the
equity has some predictive power, indicating that size matters, that is, if the
value of the equity is considered a measure of size.
In a paper by Behr and Güttler [7] it is reported that, apart from being good
individual indicators, positive growth rates of the solvency ratio and the
return-on-sales ratio reduce the default risk of firms. This result gives reason
to analyze the performance of the change in the quantitative key figures. The
analysis requires information on a firm's key ratios in three successive years;
to construct one complete dataset, e.g. for 2006, data from 2004, 2005 and 2006
are required. With the 2008 data available it was possible to construct three
complete datasets. The performance of the change in the scaled key figures can be
seen in Table 7.14.
From Table 7.14 it is clear that the change in the solvency ratio is the only one
with some limited predictive power. It is worth remembering that the return ratio
measures the return on total assets, not the return on sales. As this analysis
was performed late in the process, the change in the solvency ratio is not used
in any further modeling.
Table 7.14: Performance of models with the change in the scaled key figures as
variables.
7.5 Performance of Multivariate Models

From the results presented in the previous tables, the next step is to analyze
some of those variables together with other variables, in such a way that a
decisive conclusion can be reached about which variables to use in the model and
which ones need not be considered further.
Regular stepwise regression does not work for this problem, as it is desired to
have the same variables in all resamplings; the selection can thus not be done
inside the resampling loop. The reason is that for one split into training and
validation sets a variable might be included, and then excluded for a different
split. The process of adding one variable at a time is therefore used: if a
variable improves the model it is included in further analysis. There is a
problem with this procedure, as it is hard for variables to be excluded from the
model that is, at the time, the best model. It is thus up to the modeler to
decide whether an attempt should be made to exclude an existing variable. After
the introduction of PCA.stat the variable selection process could, however, be
automated.
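The forward selection just described can be sketched as a greedy loop once a single performance number, such as the mean PCA.stat over all resamplings, is available. The score function below is a hypothetical stand-in for the full resampled model evaluation:

```python
def forward_select(candidates, score_fn, min_gain=0.0):
    """Greedy forward selection: variables are added one at a time and a
    variable is kept only if it improves the model score by more than
    min_gain. Variables, once included, are not removed, mirroring the
    procedure described in the text.

    score_fn(subset) is assumed to return a single performance number
    (e.g. mean PCA.stat); higher is better."""
    selected = []
    best = float("-inf")
    improved = True
    while improved:
        improved = False
        for var in [c for c in candidates if c not in selected]:
            trial = score_fn(selected + [var])
            if trial > best + min_gain:          # best variable this round
                best, best_var, improved = trial, var, True
        if improved:
            selected.append(best_var)
    return selected, best

# toy score: counts how many genuinely predictive variables are included,
# penalising model size slightly (names and values are invented)
useful = {"solvency", "debt"}
score = lambda subset: len(useful & set(subset)) - 0.01 * len(subset)
vars_, best = forward_select(["solvency", "debt", "liquidity", "return"], score)
```

With a deterministic score like this the loop picks exactly the useful variables and stops; with a resampled score in practice, a small positive min_gain guards against adding variables on noise.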
In Section 7.4 the performance of each individual variable was discussed in full.
In this section the performance of different combinations of variables is
introduced, by bringing the principal component analysis into the modeling.
The results in Table 7.15 are a summary of the many result tables that can be
seen in Appendix B. Models I-III consider different methods of dealing with the
key figures, and it seems that the sector-wise PCA of the unscaled key figures
outperforms, by some margin, both the sum of all scaled key figures and the first
principal component of the scaled key figures. From these results it is
interesting to compare models VII, X and XI, which are the same models as I-III
except that they all include the pc1 (ϕ) variable. Then the PCA of the unscaled
key figures is no longer performing best; it is actually performing worst. Model
VII performs best of the three models VII, X and XI.
Models IV-VI model the qualitative figures in three different ways. Comparing
models IV and V, it is clear that the second principal component representative
of the qualitative figures does not improve the model. Model VI performs by far
the best of the three models under consideration; it does, however, have by far
the highest standard deviation.
Models XI-XIV consider different mixtures of the first two principal component
representative variables of the independent PCAs of the qualitative and
quantitative figures. Comparing models XI and XII, it is clear that there is
almost no improvement from the second principal component representative of the
scaled qualitative figures. Model XIII performs best and, interestingly, model
XIV has the worst performance.
At one point in time model IX seemed to have the best performance, and thus
customer factors were introduced in models XV-XVII. All the customer factors
improve the model by some margin. It is, however, observable from Table 7.15 that
models VIII and XVIII outperform model IX.
Model XVIII clearly has the best performance of the models in Table 7.15 that
only consider the quantitative and qualitative figures. In models XV-XVII the
customer factors are introduced and, interestingly, it seems that the age factor
no longer has predictive power.
The models presented in Table 7.15 are not the only models tested, but rather are
given as an example of how the model selection process worked. No higher-order
relationships were observed; higher order here refers to products of variables.
7.6 Addition of Variables

In this section variables that are not used in RMC are included in the model with
the best performance up to this point. The variables in question are the KOB
score, the maximum and minimum earlier ratings, and the equity. The performance
of each of these variables can be seen in Table 7.13, and even though the earlier
ratings show quite a good performance as single variables, it is the author's
opinion that earlier ratings should not be used as variables, as that would
reduce the robustness of the model. The performance of the earlier ratings was
nevertheless recorded and, to make a long story short, neither of the earlier
ratings was able to improve the performance of model XXII. The same result was
observed for the equity.
Including these variables in the analysis results in a somewhat smaller dataset
than the complete dataset, as these variables include some missing values. The
results obtained when the KOB score is included in model XXII can be seen in
Table 7.16. It is clear that some of the predictive power of the KOB score is not
modeled in RMC, and vice versa. From these results it is possible to conclude
that there is room for improvement in the modeling process; a model could be
considered very good if it were not possible to improve it by including the KOB
score. The room for improvement could be filled by including new variables. The
problem is that collecting new quantitative variables from earlier years is a
massive project, which explains the lack of experiments with new variables, as
they are not available in the co-operating bank's database.
The models in Table 7.17 both include the KOB rating as a variable. The model to
the left has subjective ratings overwriting the predicted ratings. It is
interesting that including the subjective ratings gives almost no improvement,
which is very desirable. The model to the right has double weights on the
defaulted observations; the idea behind that attempt was to make a more
conservative model. As can be seen from Table 7.17, the performance drops
significantly, the major influence being the CIER indicator. It is thus concluded
that weighted analysis is not the way to go.
ŷ ∼ pc1(α̃) + Σϕ    ŷ ∼ pc1(α̃) + Σϕ
RMC    +γo + γaa + ςk    +γo + γaa + γa + ςk
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.88879 0.01759 0.90878 0.02231 0.90946 0.02221
Pietra 0.61418 0.04502 0.68886 0.04866 0.69414 0.04829
CIER 0.28553 0.10688 0.52570 0.06470 0.54776 0.06591
BSS 0.08782 0.01946 0.13986 0.01978 0.14120 0.02118
PCA.stat -0.08614 1.55514 3.37185 1.48053 3.56527 1.50515
AIC 564.401 20.3699 564.636 19.8915
Pseudo R2 0.38769 0.02196 0.39197 0.02152
Table 7.16: Model performance with the KOB rating included as a variable.
ŷ ∼ pc1(α̃) + Σϕ    ŷ ∼ pc1(α̃) + Σϕ
+γo + γaa + ςk & ςs    +γo + γaa + ςk & w2
DP Indicator Mean Std.dev. Mean Std.dev.
AUC 0.91349 0.02021 0.90873 0.01855
Pietra 0.70955 0.04764 0.69545 0.04584
CIER 0.55151 0.06946 0.04294 0.10642
BSS 0.13995 0.02272 0.10126 0.02051
PCA.stat 3.78365 1.52714 0.43464 1.59559
AIC 564.636 19.8915 903.181 35.7216
Pseudo R2 0.39197 0.02152 0.42616 0.02273
Table 7.17: The model to the left has subjective ratings overwriting the pre-
dicted ratings. The model to the right has additional weights on the defaulted
observations. Both models include the KOB rating as a variable. The & indicates
that the variables following it are applied heuristically.
ŷ ∼ pc1 (α̃) + pc2 (α̃) ŷ ∼ pc1 (α̃) + pc1 (ϕ) ŷ ∼ pc1 (α̃) + pc2 (α̃)
+pc1 (ϕ) +pc2 (ϕ) +pc1 (ϕ) + pc2 (ϕ)
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.85973 0.02346 0.86957 0.02202 0.86799 0.02268
Pietra 0.58314 0.04719 0.60452 0.04426 0.59574 0.04252
CIER 0.24045 0.13268 0.24499 0.13586 0.25094 0.13530
BSS 0.04849 0.02020 0.05035 0.01927 0.05246 0.02020
PCA.stat -2.12909 1.75003 -1.65832 1.70556 -1.69990 1.69341
From the comparison of Tables 7.19 and B.3 it can be seen that the LDA out-
performs the logistic regression by quite a margin. The drawback of the LDA
is that it is impossible to include the customer factors in the model directly. That can,
though, be done by applying the customer factors in a heuristic procedure. The
heuristic procedure was performed in such a way that the final rating was down-
graded by one or two notches if the customer factors were indicating negative factors.
In Table 7.20 the results when the accountant's annotations and subjective ratings
are included are shown.
ŷ ∼ pc1 (α̃) + pc1 (ϕ) ŷ ∼ pc1 (α̃) + pc1 (ϕ) ŷ ∼ pc1 (α̃) + pc1 (ϕ)
+pc2 (ϕ) & ςs +pc2 (ϕ) + γaa +pc2 (ϕ) + γaa & ςs
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.87784 0.02211 0.87133 0.02249 0.87908 0.02244
Pietra 0.62879 0.04580 0.60339 0.04423 0.62754 0.04562
CIER 0.30502 0.12192 0.25010 0.13791 0.29537 0.12446
BSS 0.06372 0.01810 0.05825 0.02312 0.07081 0.02140
PCA.stat -0.66560 1.62737 -1.41711 1.81301 -0.51832 1.72214
By comparing the PCA.stat of the three models in Table 7.20 with the PCA.stat
of the middle model in Table 7.19, it is clear that both the accountant's annota-
tions and the subjective ratings improve the model. It is also noticeable that the
subjective rating improves the performance by a greater margin than the accoun-
tant's annotation. It is also interesting to see that when customer factors were
introduced in Section 7.5 it resulted in a jump in model performance, whereas when
further customer factors were included in the heuristic procedure, it reduced
the model performance. The conclusion is that linear discriminant analysis is
not likely to outperform logistic regression, given its prerequisite of normally
distributed explanatory variables.
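The heuristic downgrading step can be sketched in a few lines. This is an illustration only: the factor names, the notch logic and the orientation of the rating scale (higher class assumed safer) are assumptions, not the thesis's implementation, which was done in R.

```python
# Illustrative sketch of the heuristic downgrade described above; the factor
# names and the scale orientation (higher rating class = safer) are assumptions.
def heuristic_rating(base_rating, failed_obligations, accountant_annotation,
                     worst=1):
    """Downgrade the model rating by one notch per negative customer factor,
    never going below the worst rating class."""
    downgrade = int(failed_obligations) + int(accountant_annotation)
    return max(base_rating - downgrade, worst)

print(heuristic_rating(5, failed_obligations=True, accountant_annotation=True))
```

The point of keeping the rule this simple is that the customer factors never enter the LDA itself; they only adjust its output.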
An extensive attempt was made to model the support vector machine without
great success. This is given a brief discussion in Section B.3 in Appendix B.
7.8 Link functions

As can be seen in Section 5.2.2, there are several available link functions. In this
section the performance of the different link functions is discussed. Porath
[27] reports that the complementary log-log link function is the most suitable
link function when modeling default probabilities. The results can be seen in
Table 7.22, and from them it is clear from the PCA.stat indicator
that the complementary log-log link function has the best performance.
The complementary log-log link function was observed to have some conver-
gence problems; that is, in some cases many iterations were needed to get stable
estimates of the parameters. In order to save time the complementary log-log link
function was thus not used in other analyses; the logit link was used in all other
analyses unless otherwise noted. It is though important to note
that the complementary log-log link function is especially well suited for model-
ing default data. Other links were tried but were subject to severe convergence
problems and lack of performance, and are thus not given further discussion.
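The asymmetry that makes the complementary log-log link attractive for rare-event data can be seen directly from the inverse links. The following is a self-contained Python illustration of the link functions themselves, not the fitted models (the thesis's computations were done in R):

```python
# The logit inverse link is symmetric about p = 0.5; the complementary
# log-log inverse link is asymmetric, approaching 1 faster than it leaves 0,
# which suits modeling a rare event such as default.
import math

def inv_logit(eta):
    """Inverse logit link: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_cloglog(eta):
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

for eta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"eta={eta:+.1f}  logit={inv_logit(eta):.4f}  cloglog={inv_cloglog(eta):.4f}")
```

Note that inv_logit(0) = 0.5 while inv_cloglog(0) ≈ 0.632: the cloglog curve is not centered, which is exactly the asymmetry referred to above.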
Chapter 8
Conclusion
This chapter contains a short summary of the results found in the thesis, in Section
8.1. Suggestions for possible further work related to the work done in this thesis
are discussed in Section 8.2.
8.1 Summary

In this thesis many different aspects of the development process of a new credit
rating model were considered. The main aspects are: modeling procedures,
variable performance, the variable selection procedure and the validation process.
Various methods are available to model the default event, and some of the best
suited models for the problem are statistical models. The most appropriate
statistical models are those that can produce individual probabilities of
whether a certain firm will default or not. Of the modeling procedures tried,
logistic regression was seen to be the most practical procedure to
model default probabilities. That conclusion follows from the smooth transition from
creditworthiness data to probabilities of default.
The linear and quadratic discriminant analysis methods have a clear lack of
generality for the modeling of credit default, as they require normality of the
explanatory variables.
The amount of data available cannot be considered optimal, in two senses.
First, the number of defaults is rather limited and only three years can
be considered in the modeling process. This problem was addressed by performing
recursive resampling of the modeling data and considering the average perfor-
mance over 50 resampling iterations. Secondly, the lack of different quantitative
key ratios made the variable selection analysis very limited. The credit rating
score of a credit rating agency, called the KOB score, showed a significant increase in
model performance. From this it is possible to conclude that there is definitely
room for improvement that could be filled by including variables that were not
available in the available data.
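The resampling idea can be sketched as follows. This is an illustrative Python sketch with hypothetical scores and default flags, not the thesis's R code; it averages a rank-based AUC over bootstrap resamples to stabilize the estimate for a small dataset:

```python
# Stabilizing a performance estimate for a small dataset by averaging
# over resampling iterations. Data and scoring values are hypothetical.
import random

def auc(scores_bad, scores_good):
    """Probability that a defaulter's risk score exceeds a non-defaulter's
    (ties count half) -- the area under the ROC curve."""
    wins = sum((b > g) + 0.5 * (b == g)
               for b in scores_bad for g in scores_good)
    return wins / (len(scores_bad) * len(scores_good))

def resampled_auc(data, iterations=50, seed=1):
    """Average AUC over bootstrap resamples of a (score, defaulted) dataset."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(iterations):
        sample = [rng.choice(data) for _ in data]   # resample with replacement
        bad = [s for s, d in sample if d]
        good = [s for s, d in sample if not d]
        if bad and good:                            # need both classes present
            estimates.append(auc(bad, good))
    return sum(estimates) / len(estimates)

# Hypothetical data: higher score = riskier; 1 marks a default.
data = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
        (0.4, 0), (0.3, 0), (0.2, 0), (0.1, 0), (0.85, 1)]
print(round(resampled_auc(data), 3))
```

The standard deviations reported alongside the means in the tables of Chapter 7 and Appendix B come from exactly this kind of repetition.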
The validation of credit rating models seems to lack a single numerical measure
of model performance. This causes great problems in model development,
and thus a new measure, called PCA.stat, is suggested. The PCA.stat is not
really a new measure, as it is a principal component representative of some
selected discriminatory power indicators. With one numerical measure of model
performance, variable selection and model development in general become
much more efficient.
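The construction behind PCA.stat can be sketched in a few lines. The indicator values below are hypothetical and the exact scaling may differ from the thesis's construction; the point is that the first principal component of the standardized indicators yields one score per model:

```python
# Sketch of the PCA.stat idea: summarize several discriminatory power
# indicators by each model's score on their first principal component.
# Indicator values below are hypothetical.
import math

def first_pc_scores(rows):
    """Rows are per-model indicator vectors (e.g. AUC, Pietra, CIER, BSS).
    Returns each row's score on the first principal component of the
    standardized indicators, found by power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    # covariance of standardized columns (= correlation matrix)
    c = [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
          for b in range(p)] for a in range(p)]
    v = [1.0] * p                                   # power iteration
    for _ in range(200):
        w = [sum(c[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:                                  # orient so higher = better
        v = [-x for x in v]
    return [sum(z[i][j] * v[j] for j in range(p)) for i in range(n)]

# Hypothetical (AUC, Pietra, CIER, BSS) values for four candidate models.
models = [(0.86, 0.58, 0.21, 0.045),
          (0.88, 0.60, 0.27, 0.084),
          (0.89, 0.63, 0.45, 0.092),
          (0.91, 0.69, 0.53, 0.140)]
scores = first_pc_scores([list(m) for m in models])
print([round(s, 2) for s in scores])
```

Because the four indicators are strongly positively correlated across competitive models, the first component captures most of their joint variation, which is what makes a single ranking number defensible.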
8.2 Further work

The next step would be to consider some macroeconomic variables in the mod-
eling process. Interest rates, gas prices, house prices and inflation are amongst
the economic variables that could bring important value to the model.
It would also be advisable to consider each sector separately; it is easy to
see that, for example, house prices probably have a greater influence on firms in
the real estate sector, and gas prices on firms in the transport sector.
Several discriminatory power indicators were introduced here, while
many other discriminatory power indicators are available, many of which
assume that the underlying distributions of both the default and
non-default cases are normal. That distributional assumption sim-
ply cannot hold, especially for the distribution of default cases. No discrimina-
tory power indicators with those assumptions are considered in this thesis. It
would be interesting to develop some new discriminatory power indicators that
consider the PDs instead of the risk ratings.
It would also be interesting to apply fixed-income portfolio analysis,
which, as Altman and Saunders [3] point out, has not seen widespread use
to this day. Portfolio theory could be applied to a bank's portfolio to price new loan
applicants, by determining interest rates after calculating their probability
of default, their risk measure.
Appendix A
Credit Pricing Modeling
In this appendix a practical method for estimating the loss distribution is
presented. The theory is mostly adapted from Alexander and Sheedy [1] and Ong
[26].
In order to estimate the loss distribution from a loan portfolio, the probability
distribution of defaults has to be estimated first. For the portfolio a firm can
either default with probability π or stay solvent with probability (1-π). The
default events for different firms are assumed independent and are thus well
fitted by the binomial distribution. The probability of exactly k defaults in the
portfolio is then:
Pr(k) = n! / (k!(n − k)!) · π^k (1 − π)^(n−k)        (A.1)
For large n this probability can be approximated by the Poisson distribution:
Pr(k) = (nπ)^k e^(−nπ) / k!        (A.2)
According to two rules of thumb the approximation is good if n ≥ 20 and
π ≤ 0.05, or if n ≥ 100 and nπ ≤ 10.
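The quality of the approximation under these rules of thumb is easy to check numerically. The following Python sketch, with made-up portfolio figures, compares the binomial probabilities of (A.1) with the Poisson probabilities of (A.2):

```python
# Numerical check of the Poisson approximation (A.2) to the binomial
# default distribution (A.1); n and pi below are invented example values
# chosen to satisfy the first rule of thumb (n >= 20, pi <= 0.05).
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """Pr(exactly k defaults) under the binomial model (A.1)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson approximation (A.2) with lam = n * pi."""
    return lam**k * exp(-lam) / factorial(k)

n, pi = 100, 0.02
for k in range(6):
    b, q = binom_pmf(k, n, pi), poisson_pmf(k, n * pi)
    print(f"k={k}: binomial={b:.4f}  poisson={q:.4f}")
```

For these values the two probabilities agree to roughly two decimal places, consistent with the rules of thumb above.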
where the PD is adopted from the credit rating model and the EAD is estimated as the
current exposure. The LGD should be estimated from historical data.
For this procedure to be used in practice, the whole portfolio has to be divided
into m approximately equally large sub-portfolios. The reason for splitting the whole
portfolio into smaller portfolios is that for large n the binomial distribution
behaves as the normal distribution, as provided by the central limit theorem. The
portfolio should be divided by size of exposure, such that the firms with the
smallest exposure are in the first portfolio and so on. If the aforementioned rules
of thumb are satisfied, then the probability distribution of defaults is approxi-
mated by the Poisson distribution.
Pr(k)i = (nπ)i^k e^(−(nπ)i) / k!,    i = 1, 2, . . . , m        (A.4)
From equations A.4 and A.5 it is possible to estimate the expected loss (EL).
That is done by summing the losses for all k such that the cumulative Pr(k)i is
0.5, and then summing the relative losses over all m portfolios. Similarly, it is
possible to estimate VaRα by summing the cumulative probability up to the α
level. From the EL and VaRα it is possible to calculate the unexpected loss
(UEL), which is sometimes also referred to as the incremental credit reserve (ICR)
in the literature.
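The quantile step of this procedure can be sketched as follows. The portfolio figures are invented and the aggregation over the m sub-portfolios is omitted; the sketch only shows how the median (toward EL) and α-level (toward VaR) default counts would be read off the Poisson distribution for a single sub-portfolio:

```python
# Hedged sketch of the quantile step above, for one sub-portfolio with
# invented inputs: read the median and 99% default counts off the Poisson
# distribution, then convert counts to losses via LGD and EAD.
from math import exp, factorial

def poisson_quantile(lam, level):
    """Smallest k whose cumulative Poisson probability reaches `level`."""
    k, cumulative = 0, exp(-lam)
    while cumulative < level:
        k += 1
        cumulative += lam**k * exp(-lam) / factorial(k)
    return k

n, pi = 200, 0.02            # 200 loans with a 2% PD (hypothetical)
lgd, ead = 0.45, 1.0         # loss given default, exposure per loan (hypothetical)
lam = n * pi
median_defaults = poisson_quantile(lam, 0.5)
var_defaults = poisson_quantile(lam, 0.99)
print("median loss:", median_defaults * lgd * ead)
print("99% quantile loss:", var_defaults * lgd * ead)
```

The difference between the α-level loss and the median loss is the unexpected loss (UEL) referred to above.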
Appendix B
Additional Modeling Results
In order to make an educated decision on how the quantitative key figures should
be used in the model, three different approaches are compared in Table B.1. The first
model in Table B.1 uses the first two principal components of the PCA conducted for
each sector. The second model shows the performance of the sum of scaled
quantitative key figures. The last model in Table B.1 shows the performance of
the first principal component of the scaled quantitative key figures.
From Table B.1 it might be difficult to decide which of the three models has the
best performance. A good place to start when analyzing tables similar to Table B.1
is the PCA.stat statistic, as it pulls together several of the other statis-
tics. The first two principal components of the unscaled quantitative figures have
the highest PCA.stat but fail to perform well on some of the other statistics.
The PCA.stat was constructed from competitive models, and judging from the low
PCA.stat values for the models in Table B.1, those models can hardly be consid-
ered competitive. PCA.stat is constructed from the AUC, Pietra, CIER
and BSS indicators, which are always presented along with the PCA.stat so the
reader can gain confidence in it.
ŷ ∼ pc∗1(α) + pc∗2(α)    ŷ ∼ Σα̃    ŷ ∼ pc1(α̃)
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.73982 0.02557 0.78286 0.02757 0.77181 0.02925
Pietra 0.40328 0.05507 0.43447 0.04889 0.41429 0.05248
CIER 0.09513 0.35933 -0.24632 0.29130 -0.18314 0.33465
Gini 0.47964 0.05113 0.56572 0.05513 0.54362 0.05850
Variance 0.01659 0.00070 0.01659 0.00070 0.01659 0.00070
Calibration 0.00027 0.00012 0.00039 0.00022 0.00026 0.00016
Resolution 0.00044 0.00012 0.00052 0.00013 0.00050 0.00014
Brier 0.01643 0.00070 0.01647 0.00072 0.01636 0.00070
BSS 0.00988 0.00790 0.00756 0.01957 0.01417 0.01674
PCA.stat -7.81915 2.02009 -8.18340 2.46617 -8.16708 2.58280
AIC 884.937 19.3976 829.646 18.2382 835.900 18.4168
Pseudo R2 0.05803 0.01389 0.12159 0.01196 0.10846 0.01203
The AIC and pseudo R2 measure the fit of the model, not its performance
as a credit assessment model. The sum of the scaled quantitative key figures has
the best fit, indicating that it might be a good performer. The last model in
Table B.1 considers the first principal component of the scaled quantitative key
figures and performs slightly better than the sum model,
according to the PCA.stat.
Considering next the qualitative figures in Table B.2, no matter which indicators
are analyzed it is quickly apparent that the qualitative figures outperform the
quantitative key figures.
B.1 Detailed Performance of Multivariate Models
ŷ ∼ pc1(ϕ)    ŷ ∼ pc1(ϕ) + pc2(ϕ)    ŷ ∼ Σϕ
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.83785 0.02643 0.84726 0.02414 0.84770 0.02499
Pietra 0.56553 0.04685 0.56431 0.04372 0.56672 0.05526
CIER 0.20652 0.15527 0.16860 0.15328 0.23910 0.16155
BSS 0.04437 0.01755 0.04028 0.01642 0.05217 0.02123
PCA.stat -2.97002 1.82934 -3.06658 1.73939 -2.43837 2.07074
AIC 761.940 20.2347 753.953 20.1698 817.509 22.8831
Pseudo R2 0.18774 0.01524 0.19845 0.01516 0.20976 0.01930
From Table B.2 it can be seen, e.g. by considering the PCA.stat, that the model
containing the sum of the qualitative figures outperforms the other two models.
Interestingly, it also has the highest AIC, indicating that it
has the poorest fit of the three. It should be taken into consideration that
the sum model has six variables whereas the others have only one and
two variables, which somewhat explains the high value of the AIC. It should also be noted
that the sum model has a higher standard deviation on the PCA.stat than the
others.
In Table B.3 some combinations of the qualitative and quantitative figures are
considered. It is clear that the second principal component representative of
the qualitative figures has some good predictive powers. The results in Tables
B.4-B.8 are given a full discussion in Section 7.5 and are just listed here for further
reference.
As can be seen in Table B.9, the model including the age factor does not perform
as well as the one without the age factor. Although this seems decisive it is in
fact not, as age is most probably a factor. Recall that firms with
missing values are deleted from the modeling dataset considered
to this point; it should be clear that young firms are more likely to have missing
values than older firms. A successful attempt was made to prove this point by
considering a modeling dataset that excluded the qualitative figures and using
the principal component representatives of the qualitative figures.
In Table B.10 the results of the attempt to use the first two principal compo-
nents instead of the sum of qualitative key figures are listed. From the results it
is clear that this does not outperform the use of the sum of qualitative key figures,
as can be seen in Table B.9.
ŷ ∼ Σα̃ + pc1(ϕ)    ŷ ∼ Σα̃ + pc1(ϕ) + pc2(ϕ)    ŷ ∼ Σi∈{1,2,4} pci(α̃, ϕ)
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.86001 0.02357 0.87128 0.02174 0.86231 0.02345
Pietra 0.58767 0.05149 0.60575 0.04357 0.59143 0.05542
CIER 0.21020 0.13418 0.20746 0.13256 0.22424 0.13581
BSS 0.04492 0.01971 0.05300 0.02067 0.04449 0.01952
PCA.stat -2.29438 1.78160 -1.70544 1.71236 -2.16406 1.86009
AIC 736.802 20.5171 728.705 20.8637 736.490 20.5230
Pseudo R2 0.22326 0.01630 0.23408 0.01676 0.21930 0.01629
RMC ŷ ∼ pc∗1 (α) + pc∗2 (α) + pc1 (ϕ) ŷ ∼ pc1 (α̃) + pc1 (ϕ)
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.88380 0.01890 0.84780 0.02379 0.85851 0.02396
Pietra 0.60149 0.04582 0.57775 0.04827 0.58138 0.05110
CIER 0.26502 0.12102 0.20872 0.14318 0.20029 0.14207
BSS 0.08448 0.02098 0.04582 0.01877 0.04721 0.01986
PCA.stat -0.48041 1.69085 -2.61091 1.76196 -2.37164 1.85403
AIC 751.191 20.3170 738.210 19.9818
Pseudo R2 0.20355 0.01591 0.21532 0.01554
Table B.4: Models considering different principal component procedures for the
quantitative key figures.
ŷ ∼ pc1 (α̃) + pc2 (α̃) ŷ ∼ pc1 (α̃) + pc1 (ϕ) ŷ ∼ pc1 (α̃) + pc2 (α̃)
+pc1 (ϕ) +pc2 (ϕ) +pc1 (ϕ) + pc2 (ϕ)
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.85783 0.02422 0.86962 0.02130 0.84780 0.02379
Pietra 0.58297 0.05028 0.60758 0.03841 0.57775 0.04827
CIER 0.20727 0.13902 0.18618 0.14480 0.20872 0.14318
BSS 0.04557 0.02050 0.04726 0.02038 0.04582 0.01877
PCA.stat -2.37979 1.81994 -1.95172 1.66847 -2.61091 1.76196
AIC 739.176 20.3843 729.683 20.3124 751.191 20.3170
Pseudo R2 0.21643 0.01604 0.22660 0.01606 0.20355 0.01591
ŷ ∼ Σi∈{1,2,4} pci(α̃, ϕ)    ŷ ∼ Σi∈{1,2,4} pci(α̃, ϕ)    ŷ ∼ Σi∈{1,2,4} pci(α̃, ϕ)
+γo    +γo + γaa    +γo + γaa + γa
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.87745 0.02374 0.87787 0.02419 0.87710 0.02421
Pietra 0.61149 0.05311 0.61095 0.05151 0.62099 0.04929
CIER 0.42878 0.11387 0.44029 0.12174 0.44121 0.11794
BSS 0.08360 0.02103 0.08785 0.02211 0.08859 0.02215
PCA.stat 0.17029 1.76229 0.32746 1.80317 0.43781 1.75137
AIC 699.890 20.9853 696.784 21.8350 695.738 22.1820
Pseudo R2 0.26281 0.01789 0.27043 0.01896 0.27584 0.01942
Table B.6: Introducing the customer factors: γo indicates whether a firm has
previously failed to fulfill its obligations, γaa indicates whether the accountant
has made annotations in the firm's financial statement, and γa is
an age factor.
RMC    ŷ ∼ Σα̃ + Σϕ    ŷ ∼ Σα̃ + Σϕ + γo
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.88380 0.01890 0.87424 0.02341 0.88547 0.02434
Pietra 0.60149 0.04582 0.61055 0.05222 0.63648 0.05308
CIER 0.26502 0.12102 0.22785 0.13405 0.44337 0.10300
BSS 0.08448 0.02098 0.05660 0.02115 0.08619 0.01971
PCA.stat -0.48041 1.69085 -1.42421 1.80689 0.70390 1.65541
AIC 723.649 21.5181 688.360 22.0673
Pseudo R2 0.24808 0.01773 0.29018 0.01929
Table B.7: Models where both the qualitative and quantitative figures are
used individually. The obligation variable is also introduced to the model with
both the qualitative and quantitative figures.
ŷ ∼ Σα̃ + Σϕ    ŷ ∼ Σα̃ + Σϕ
RMC    +γo + γaa    +γo + γaa + γa
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.88380 0.01890 0.88572 0.02489 0.88565 0.02540
Pietra 0.60149 0.04582 0.63273 0.05262 0.63800 0.04865
CIER 0.26502 0.12102 0.45406 0.10027 0.44279 0.10310
BSS 0.08448 0.02098 0.09227 0.02162 0.08995 0.02112
PCA.stat -0.48041 1.69085 0.86769 1.71254 0.81453 1.72739
AIC 684.090 22.6430 683.246 23.0888
Pseudo R2 0.29904 0.02009 0.30424 0.02059
Table B.8: Model performances when two further customer factors are intro-
duced.
ŷ ∼ pc1(α̃) + Σϕ    ŷ ∼ pc1(α̃) + Σϕ
RMC    +γo + γaa    +γo + γaa + γa
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.88380 0.01890 0.88557 0.02427 0.88578 0.02529
Pietra 0.60149 0.04582 0.63032 0.04787 0.63582 0.04852
CIER 0.26502 0.12102 0.45815 0.10427 0.44672 0.11022
BSS 0.08448 0.02098 0.09323 0.02151 0.09212 0.02027
PCA.stat -0.48041 1.69085 0.88147 1.69849 0.86564 1.74342
AIC 683.929 22.1750 682.563 22.6804
Pseudo R2 0.29278 0.01964 0.29853 0.02021
Table B.9: The first principal component representative of scaled key figures is
introduced as a replacement of the sum of scaled key figures.
ŷ ∼ pc1(α̃) + pc1(ϕ)    ŷ ∼ Σα̃ + pc1(ϕ)
+pc2(ϕ) + γo + γaa + γa    +pc2(ϕ) + γo + γaa + γa
DP Indicator Mean Std.dev. Mean Std.dev.
AUC 0.88414 0.02199 0.88359 0.02326
Pietra 0.63235 0.03944 0.63518 0.04604
CIER 0.44639 0.11448 0.44064 0.11415
BSS 0.09031 0.02131 0.08904 0.02132
PCA.stat 0.75239 1.58640 0.71479 1.65092
AIC 689.933 21.9609 690.250 22.3572
Pseudo R2 0.28205 0.01928 0.28815 0.01960
Table B.10: The first two principal components of the qualitative figures are
introduced as a replacement of the sum of qualitative figures.
B.2 Additional Principal Component Analysis

Generally the regular variables of the complete dataset were used for model-
ing purposes. It is, however, interesting to analyze the performance of the principal
component representatives. In this section the general results of PCA I-IV are
presented.
Table B.11: The rotation of variables and summary of the principal component
analysis of the qualitative figures.
The PCA for the quantitative key figures has several variants. First the
PCA of the scaled key figures is pursued; the results are summarized in Table
B.12. From Table B.12 it can be seen that all the scaled key figures have the
same sign, but the debt score has the largest rotation loading of all the variables.
The first PC only accounts for 46% of the total variance of the variables. It is
interesting that the first two PCs are quite similar for the liquidity and solvency
scores.
Table B.12: The rotation of variables and summary of the principal component
analysis of the scaled quantitative key figures.

The PCA for the unscaled figures is done by scaling them to unit variance
before conducting the PCA. This results in totally different loadings, if
compared with the scaled key figures. The summary results can be seen in Table
B.13, and it is noticeable that the liquidity ratio has a different sign compared to
the other ratios. It is then interesting to see that the first PC only accounts
for 35% of the total variance of the variables. It is thus clear that the variance
between variables is much greater than for the qualitative figures. It is interesting
to consider the results of Table 7.12, where it can be seen that the second PC
does not have any predictive powers.
Table B.13: The rotation of variables and summary of the principal component
analysis of the unscaled quantitative key figures.
In order to deal with the different distributions of key figures between sectors, a
PCA was done on each sector separately. The results can be seen in Table B.14.
From Table B.14 it is quite noticeable that the first PCs are considerably different
between sectors.
Table B.14: The rotation of variables and summary of the principal component
analysis of the quantitative key figures for each sector separately.
In order to analyze whether the combination of the qualitative and scaled quan-
titative figures would perform better than the individual PCAs of the qualitative and
quantitative figures, a PCA of the combined set was conducted. The results of the
PCA of the qualitative and quantitative figures can be seen in Tables B.15 and B.16.
Variable PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
DEBT 0.0840 0.6872 0.1468 0.0104 0.6927 −0.1016 0.0925 −0.0079 0.0129 0.0094
LIQUIDITY −0.0188 0.1708 −0.8314 −0.5273 0.0157 −0.0122 −0.0083 −0.0186 0.0203 −0.0005
RETURN −0.0954 −0.6757 0.0549 −0.2793 0.6282 −0.2086 0.1182 −0.0134 0.0124 −0.0300
SOLVENCY −0.1917 −0.1297 −0.5041 0.7618 0.2740 0.1843 0.0092 −0.0089 −0.0509 −0.0099
MANAGEMENT −0.4145 0.0832 0.0705 −0.0700 −0.0312 −0.0445 −0.0839 −0.5429 −0.5611 −0.4379
STABILITY −0.3900 0.0333 0.1035 −0.1753 0.1106 0.4194 −0.3115 0.6378 −0.3309 0.0625
POSITION −0.3951 0.0241 0.1113 −0.1206 0.0800 0.4561 −0.2144 −0.4093 0.6093 0.1315
SITUATION −0.3779 0.0732 0.0377 −0.0441 −0.1280 0.1254 0.8575 0.1871 0.1245 −0.1745
REFUNDING −0.3881 0.0856 −0.0243 0.1012 −0.0802 −0.5912 −0.2963 0.2908 0.3961 −0.3821
RISK −0.4163 0.0624 0.0025 0.0332 −0.0884 −0.4011 0.0642 −0.1031 −0.1696 0.7807
Standard deviation 2.0634 1.1605 1.0238 0.9177 0.7931 0.7195 0.6789 0.5760 0.5607 0.5015
Proportion of Variance 0.4258 0.1347 0.1048 0.0842 0.0629 0.0518 0.0461 0.0332 0.0314 0.0251
Cumulative Proportion 0.4258 0.5604 0.6653 0.7495 0.8124 0.8641 0.9102 0.9434 0.9748 1.0000
Table B.15: The rotation of variables and summary of the principal component analysis of both the quantitative and
qualitative figures.
Variable PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
DEBT SCORE 0.1971 0.6559 −0.0820 0.1753 −0.0979 0.0010 −0.1157 0.0612 −0.6802 −0.0640
LIQUIDITY SCORE 0.1746 0.2238 0.6274 −0.5386 0.2759 −0.3905 0.0114 0.0813 0.0167 0.0032
RETURN SCORE 0.1111 0.6140 −0.4280 −0.0523 0.1441 −0.0946 0.0918 −0.0563 0.6190 0.0184
SOLVENCY SCORE 0.2276 0.2115 0.5870 0.3253 −0.3643 0.4440 0.0262 −0.1459 0.3184 0.0006
MANAGEMENT 0.3950 −0.1568 −0.0898 −0.0488 0.1441 −0.0317 −0.4736 −0.6054 0.0085 −0.4430
STABILITY 0.3739 −0.0830 −0.1631 −0.4081 −0.0103 0.3722 0.6469 −0.2498 −0.1931 0.0570
POSITION 0.3774 −0.0924 −0.1604 −0.3556 −0.1382 0.3554 −0.4771 0.5458 0.0858 0.1364
SITUATION 0.3586 −0.1489 −0.0970 0.0500 −0.6489 −0.5728 0.1890 0.1355 0.0734 −0.1693
REFUNDING 0.3734 −0.1398 0.0443 0.4288 0.5104 0.0235 0.2540 0.4343 0.0362 −0.3738
RISK 0.4003 −0.1279 −0.0105 0.2981 0.1981 −0.2167 −0.0817 −0.1773 −0.0335 0.7804
Standard deviation 2.1201 1.2402 1.1459 0.7455 0.6851 0.6766 0.5770 0.5597 0.5233 0.5010
Proportion of Variance 0.4495 0.1538 0.1313 0.0556 0.0469 0.0458 0.0333 0.0313 0.0274 0.0251
Cumulative Proportion 0.4495 0.6033 0.7346 0.7902 0.8371 0.8829 0.9162 0.9475 0.9749 1.0000
Table B.16: The rotation of variables and summary of the principal component analysis of both the scaled quantitative
and qualitative figures.
B.3 Unsuccessful Modeling
In this section some of the methods that were tried without success are given a
brief discussion. In addition to logistic regression, a k-Nearest Neighbor (k-NN)
analysis and a CART analysis were tried. It is not possible to use CART or k-NN
directly in credit modeling, as they do not provide independent estimates of
probabilities of default for each firm. Frydman et al. [16] note that CART
outperforms discriminant analysis and that even better results were obtained
by combining the methods. As these methods do not provide probabilities of default for
individual borrowers, some results from their analysis were used instead as
explanatory variables. For the k-NN, the ratio of defaulted neighbours, Ki (k)
in equation (5.49), was used as a variable. For the CART model, the default
ratio pm of the split region into which that particular firm falls is used as
a variable. When the k-NN ratio was used as a variable, the resulting probabilities of
default were too low, as can be seen in Figure B.1.
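The k-NN variable described above can be sketched as follows. This is an illustrative Python version with hypothetical two-dimensional firm data; the thesis's analysis used the R class package:

```python
# For each firm, compute the fraction of its k nearest neighbours (in the
# space of standardized key figures) that defaulted -- the K_i(k) ratio.
# Firm coordinates and default flags below are hypothetical.
import math

def knn_default_ratio(firms, defaults, k=5):
    """firms: list of feature vectors; defaults: 0/1 flags per firm.
    Returns the defaulted share among each firm's k nearest other firms."""
    ratios = []
    for i, x in enumerate(firms):
        dists = sorted(
            (math.dist(x, y), defaults[j])
            for j, y in enumerate(firms) if j != i)
        neighbours = dists[:k]
        ratios.append(sum(d for _, d in neighbours) / k)
    return ratios

# Hypothetical two-feature firms; the last three are defaulters.
firms = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3),
         (0.9, 0.8), (1.0, 0.9), (0.95, 0.85)]
defaults = [0, 0, 0, 1, 1, 1]
print(knn_default_ratio(firms, defaults, k=3))
```

This ratio then enters the logistic regression as an ordinary explanatory variable, which is how the method was used here.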
When one of the k-NN ratios was used as an independent variable, the re-
sulting PDs were much more conservative than for the other models. The models
with k-NN variables give much better values for the Akaike Information Crite-
rion (AIC) than any of the other models. It is thus clear that the k-NN has some
good predictive powers, but as it results in such conservative PDs, models
using the information of the neighborhood cannot be used with the same rating
transformation as the RMC. In Figure B.1 it can be seen that a large proportion
of the ratings are 11 and 12. As the transformation in use is not appropriate
for this model, the k-NN variable is left out of the analysis. It is though likely
to perform well if another transformation is analyzed.
The CART analysis did not perform well; an example of a CART tree can be seen
in Figure B.2. The tree in Figure B.2 has the obligation factor at the top.
As the obligation factor has 3 levels (no failure, failed in the past 12 months and
failed in the past 24 months), the bc labeling refers to either of the failed obligation
levels. The left leg from the root contains all firms that have previously not
failed to fulfill their obligations, while the right one contains the firms that have
previously failed to fulfill their obligations. At the next nodes it is the pc1 (ϕ)
variable that provides the split. At the leaf nodes, one can see the default rate
and the number of observations selected by the criteria above.
The same problem as for the k-NN was observed when modeling the problem with
the support vector machine (SVM): the resulting PDs are small relative
to the PDs obtained from logistic regression. That
results in high risk ratings, with no observations getting risk ratings lower than six.
The SVM is a complex method with several tuning parameters, and despite
extensive analysis reasonable PDs were not obtained.
[Figure B.1: Relative and cumulative frequencies of good and bad cases, for the LR rating model and for the test set, across the twelve rating classes.]
[Figure B.2: Example of a CART tree. The obligation factor (OBLIGATION = bc) provides the root split, with further splits on PCAquanti1, PCAquanti2 and PCAquali; the leaf nodes show default rates and observation counts.]
max PCA.stat
subject to x0 = 0, x12 = 1, and xi ≤ xi+1 for i = 1, 2, . . . , 11.        (B.1)
Appendix C
Programming
The MASS package by Venables and Ripley [32] makes discrim-
inant analysis possible, and the CART analyses were done with the help of
the rpart package by Therneau and Atkinson [30]. The Design
package by Frank E. Harrell Jr. [20] made it possible to use a penalty in the logistic regression.
The xtable package by Dahl [12] makes the transition of
reporting tables from R straight into LaTeX a very easy task. With a touch of
class, Venables and Ripley [32] make it possible to perform a k-Nearest Neigh-
bor analysis very easily, with the class package.
C.2 R code
The code appendix is omitted, but all code is available upon request. Please send
an email to arnar.einarsson@gmail.com.
19. Yusuf Jafry and Til Schuermann. Measurement, estimation and comparison of credit migration matrices. Journal of Banking and Finance, 28:2603-2639, August 2004.
20. Frank E. Harrell Jr. Design: Design Package, 2007. URL http://biostat.mc.vanderbilt.edu/s/Design. R package version 2.1-1.
21. D. Lando. Credit Risk Modeling: Theory and Applications. Princeton Series in Finance. Princeton University Press, 2004.