The credit assessments made by corporate banks have been evolving in recent
years: from the subjective judgment of the bank's credit experts towards more
mathematically grounded procedures. Banks are increasingly recognizing the
pressing need for comprehensive modeling of credit risk, and the financial
crisis of 2008 is certain to reinforce the need for sound modeling procedures.
In this thesis a modeling framework for credit assessment models is constructed.
Different modeling procedures are tried, leading to the conclusion that logistic
regression is the most suitable framework for credit rating models. Analyzing
the performance of different link functions for the logistic regression leads
to the conclusion that the complementary log-log link is most suitable for
modeling the default event.
Validation of credit rating models lacks a single numeric measure that summarizes
model performance. A solution to this problem is suggested, using principal
component representatives of a few discriminatory power indicators. With a
single measure of model performance, model development becomes a much more
efficient process; the same goes for variable selection. The data used in the
modeling process are not as extensive as would be the case for many banks. A
resampling process is introduced that is useful for obtaining stable estimates
of model performance from a relatively small dataset.
Preface
The project was carried out in the period from October 1st 2007 to October 1st
2008.
The subject of the thesis is the statistical aspect of credit risk modeling.
I would also like to thank my family, my girlfriend Hrund for her moral support,
my older son Halli for his patience and my new-born son Almar for his inspiration
and for allowing me some sleep.
Contents
Summary
Preface
Acknowledgements
1 Introduction
1.1 Background
4 Data Resources
6 Validation Methods
6.5 Discussion
7 Modeling Results
8 Conclusion
C Programming
Introduction
1.1 Background
Banking is built on the idea of profiting by lending money to those in need
of it. Banks collect interest on the payments which borrowers make in order
to pay back the money they borrowed. The likely event that some borrowers
will default on their loans, that is, fail to make their payments, results in
a financial loss for the bank.
In the application process for new loans, banks assess the potential borrower's
creditworthiness. As a measure of creditworthiness, an assessment is made of
the probability of default of the potential borrower. The risk that the credit
assessment of a borrower is too modest is called credit risk. Credit risk
modeling is quite an active research field. Before the milestone of Altman [2],
credit risk on corporate loans was assessed through the subjective analysis of
the credit experts of financial institutions.
Probability of default is a key figure in the daily operation of any credit institute,
as it is used as a measure of credit risk in both internal and external reporting.
The credit risk assessments made by banks are commonly referred to as credit
rating models. In this thesis various statistical methods are used as modeling
procedures.
This thesis is done in co-operation with a corporate bank, which supplied the
necessary data resources. The aim of the thesis is to see whether logistic
regression can outperform the current heuristic credit rating model used in the
co-operating corporate bank. The current model is called Rating Model Corporate
(RMC) and is described further in Section 4.5.1. This was the only clear aim
in the beginning, but further goals emerged in the course of the thesis.
First, some variables that were not used in RMC but were still available are
tested. Then an attempt is made to model credit default with different
mathematical procedures, and an effort is made to combine some of those methods
with logistic regression. Since discriminant analysis has seen extensive use
in credit modeling, its performance was documented for comparison.
The general purpose of this thesis is to inform the reader on how credit rating
models can be constructed. Special emphasis is placed on the practical methods
that a bank in the corporate banking sector could make use of in the
development process of a new credit rating model.
1.3 Outline of Thesis
Credit risk modeling is a wide field. In this thesis an attempt is made to shed
light on its many different subjects. Chapters 2 and 6 provide the fundamental
understanding of credit risk modeling.
In order to get a better feel for the credit modeling framework, there are some
important concepts and measures worth considering. It is also worth considering
the need for credit modeling and the important role of international
legislation on banking supervision, called Basel II.
In Section 2.1 the most important concepts of the credit modeling framework
are defined. The definitions are partly adapted from the detailed discussion
in Ong [26] and Alexander and Sheedy [1]. Section 2.2 discusses the ongoing
financial crisis, which is partly due to poor credit ratings, and finally the
model development process is introduced in Section 2.3.
The first of the two rather formal definitions states that the borrower is in
default if the bank believes it will not receive its debt in full without
claiming ownership of the collateral3 taken. The second scenario is simpler,
as it states that if the borrower has not made some promised payment, which
was due 90 days ago, the borrower is considered to have defaulted on its
payment. The sentence regarding overdrafts4 can be interpreted to cover the
case where the borrower makes a transaction breaking the advised limit, or is
struggling to lower its outstanding balance, thus making the bank fear that it
will not receive its payment.
It is important to note the difference between three terms: insolvency,
bankruptcy and default. The three terms are frequently used interchangeably in
the literature. In order to avoid confusion, they are given an explanation
here. The term insolvency refers to a borrower that is unable to pay its debt,
whereas a borrower that has defaulted on its debt is either unwilling or
unable to pay it. To complicate matters even further, insolvency is often
referred to as the situation where liabilities exceed assets, but such firms
might still be profitable and thus able to pay all their debts. Bankruptcy is
a legal finding that results in court supervision over the financial affairs
of a borrower that is either insolvent or in default. It is important to note
that a borrower that has defaulted can come back from being defaulted by
settling the debt. That might be done by adding collateral or by obtaining
alternative funding. Furthermore, as will be seen later when considering loss
given default, the event of a default does not necessarily result in a
financial loss for the bank.
When potential borrowers apply for a loan at a bank, the bank will evaluate
the creditworthiness of the potential borrower. This assessment is of whether
2 A firm is any business entity such as a corporation, partnership or sole trader.
3 Collateral is an asset of the borrower that becomes the lender's if the borrower
defaults on the loan.
4 An overdraft is a type of loan meant to cover a firm's short term cash needs. It
generally has an upper bound and interest is paid on the outstanding balance of the
overdraft loan.
the borrower can pay the principal and interest when due. The risk that arises
from the uncertainty of the credit assessment, especially that it is too
modest, is called credit risk. According to the Basel Handbook [26], credit
risk is the major risk to which banks are exposed, since making loans is the
primary activity of most banks. A formal definition of credit risk is given by
Zenios [35].
The creditworthiness may decline over time, due to bad management or external
factors such as rising inflation5, weaker exchange rates6, increased
competition or volatility in asset value.
Exposure at Default (EAD) is the amount that the borrower legally owes the
bank. It may not be the entire amount of the funds the bank has granted the
borrower.
5 Inflation is an economic term for the general increase in the price level of goods
and services.
6 An exchange rate describes the relation between two currencies, specifying how much
one currency is worth in terms of the other.
Loss Given Default (LGD) is the percentage of EAD that the bank actually
loses. Banks like to protect themselves and frequently do so by taking
collateral or by holding credit derivatives8 as securitization. Borrowers may
even have a guarantor who will adopt the debt if the borrower defaults; in
that case the LGD takes the value zero. The mirror image of LGD, the recovery
rate given default, is frequently used in the literature; together the loss
and the recovery add up to the amount owed by the borrower at the time of
default, EAD. Loss given default is simply the expected percentage of loss on
the funds provided to the borrower. Altman et al. [4] report empirical
evidence that observed default rates and LGDs are positively correlated. From
this observation it is possible to conclude that banks are successful in
protecting themselves when default rates are moderate, but fail to do so when
high default rates are observed.
Expected Loss (EL) can be seen as the average of historically observed losses.
EL can also be estimated using estimates of the three components in equation
(2.1).
EL = PD × EAD × LGD    (2.1)
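Equation (2.1) is a plain product of the three risk components and can be sketched in a few lines of code. The function name and the figures in the example are illustrative, not taken from the thesis.

```python
def expected_loss(pd_, ead, lgd):
    """Expected loss per equation (2.1): EL = PD * EAD * LGD.

    pd_ -- probability of default, on [0, 1]
    ead -- exposure at default, in currency units
    lgd -- loss given default, as a fraction of EAD on [0, 1]
    """
    return pd_ * ead * lgd

# A borrower with a 2% default probability, an exposure of 1,000,000
# and an expected loss severity of 40% of the exposure:
el = expected_loss(0.02, 1_000_000, 0.40)
print(el)  # 8000.0
```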
EL estimates are partly decisive for the bank's capital requirement. The
capital requirement, that is the amount of money that the bank has to keep
available, is determined by financial authorities and is based on common
capital ratios9. The capital requirement is usually substantially higher than
EL, as it has to cover all types of risk that the bank is exposed to, such as
market, liquidity, systemic and operational risks10, or simply all risks that
might result in a solvency crisis for the bank. Un-expected Loss (UEL) is
defined in Alexander
8 Credit derivatives are bilateral contracts between a buyer and seller, under which the
seller sells protection against the credit risk of an underlying bond, loan or other financial
asset.
9 Tier I, Tier II, leverage ratio, Common stockholders’ equity.
10 Market risk is the risk of unexpected changes in prices or interest or exchange
rates. Liquidity risk is the risk that the costs of adjusting financial positions
will increase substantially or that a firm will lose access to financing. Systemic
risk is the risk of a breakdown in marketwide liquidity or chain-reaction default.
Operational risk is the risk of fraud, systems failures, trading errors, and many
other internal organizational risks.
and Sheedy [1] with respect to a certain Value at Risk (VaR) quantile and the
probability distribution of the portfolio's loss. The VaR quantile can be seen
as an estimate of the maximum loss. The VaR quantile is defined mathematically
as Pr[Loss ≤ VaRα] = α, where α is generally chosen as a high quantile,
99%-99.9%. For a certain VaRα quantile the UEL can be defined as

UEL = VaRα − EL
The name un-expected loss is somewhat confusing as the value rather states
how much incremental loss could be expected in a worst case scenario. Further
discussion on how to obtain an estimate of EL, VaRα and UEL can be seen in
Appendix A.
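The empirical VaRα and UEL can be sketched from a sample of portfolio losses. The loss distribution below is simulated, the quantile estimator is the simple order-statistic approximation, and all names are illustrative rather than the thesis's own.

```python
import random
import statistics

def var_quantile(losses, alpha=0.99):
    """Empirical VaR_alpha: a loss L with Pr[Loss <= L] approximately alpha,
    taken as an order statistic of the sample."""
    ordered = sorted(losses)
    idx = min(len(ordered) - 1, int(alpha * len(ordered)))
    return ordered[idx]

def unexpected_loss(losses, alpha=0.99):
    """UEL = VaR_alpha - EL, with EL estimated by the mean observed loss."""
    return var_quantile(losses, alpha) - statistics.mean(losses)

random.seed(1)
# Hypothetical portfolio losses with a heavy right tail, as credit losses have.
losses = [random.expovariate(1 / 100.0) for _ in range(10_000)]
print(unexpected_loss(losses, alpha=0.99))
```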
One of the primary objectives of this thesis is to consider how to obtain the
best possible estimate of the probability of default of specific borrowers. It
is therefore worth considering the purpose of acquiring the best possible
estimates of PDs. The PDs are reported as a measure of risk both to the bank's
executive board and to financial supervisory authorities. The duty of a
financial supervisory authority is to monitor the bank's financial
undertakings and to ensure that banks have reliable banking procedures.
Financial supervisory authorities determine banks' capital requirements. As
banks like to minimize their capital requirements, it is of great value to
show that credit risk is successfully modeled. Expected loss and capital
requirements, along with the PDs, are the main factors in deciding the
interest rate for each borrower. As most borrowers will look for the best
offer on the market, it is vital to have a good rating model. In a competitive
market, banks will lend at increasingly lower interest rates. Thus some of
them might default, and as banks lend to other banks, that might cause a chain
reaction.
Banking legislation
The Basel Committee published an accord called Basel II in 2004, which is
meant to create international standards that banking regulators can use when
creating regulations about how much capital banks need to hold in order to
remain solvent against credit and operational risks.
More specifically, the aim of the Basel II regulations is, according to Ong
[26], to quantify and separate operational risk from credit risk and to ensure
that capital allocation is more risk sensitive. In other words, Basel II sets
guidelines for how banks' in-house estimates of the loss parameters,
probability of default (PD), loss given default (LGD), and exposure at default
(EAD), should be made. As banks need the regulator's approval, these
guidelines ensure that banks hold sufficient capital to cover the risk that
the bank exposes itself to through its lending and investment practices. These
international standards should protect the international financial system from
problems that might arise should a major bank or a series of banks collapse.
Credit Modeling
The Basel II accord introduces good practices for internal rating systems as
an alternative to using ratings obtained from credit rating agencies. Credit
rating agencies rate firms, countries and financial instruments based on their
credit risk. The largest and most cited agencies are Moody's, Standard &
Poor's and Fitch Ratings. Internal rating systems have the advantage over the
rating agencies that there is additional information available inside the
bank, such as credit history and credit experts' valuations. Internal ratings
can be obtained for all borrowers, whereas agency ratings might be missing for
some potential borrowers. Furthermore, rating agencies only publicly report
the risk grades of larger firms, whereas there is a price to view their
ratings for small and medium sized firms.
There are two different types of credit models that should not be confused.
One is credit rating models and the other is credit pricing models. There is a
fundamental difference between the two, as credit rating models are used to
model PDs while pricing models consider combinations of PDs, EADs and LGDs to
model the EL. A graphical representation of the two models can be seen in
Figure 2.1.
In this thesis credit rating models are the main concern, as they are of more
practical use and can be used to get estimates of EL. By estimating the EL,
the same result as for credit pricing models is obtained. Reconsider the
relationship between the risk components in equation (2.1).
The PDs are obtained from the credit rating model, and the EAD is easily
estimated as the current exposure. An estimate of LGD can be found by
collecting historical data on LGD; in Figure 2.2 an example of an LGD
distribution can be seen. The average, which lies around 40%, does not
represent the distribution well. A more sophisticated procedure would be to
model the event of loss or no loss with some classification procedure, e.g.
logistic regression, and then use the left part of the empirical distribution
to model those classified as no loss and the right part for those classified
as loss. The averages of each side of the distribution could be used. It
would, though, be even better to use LGD as a stochastic variable and consider
it to be independent of PD. It is generally seen in practice that LGDs are
assumed independent of PDs, as Altman et al. [4] point out that the commercial
credit pricing models12 use LGD either as a constant parameter or as a
stochastic variable independent of PD.
12 These value-at-risk (VaR) models include J.P. Morgan's CreditMetrics®,
McKinsey's CreditPortfolioView®, Credit Suisse Financial Products' CreditRisk+®,
KMV's PortfolioManager® and Kamakura's Risk Manager®.

Figure 2.2: Histogram of LGD (relative frequency against LGD [%]).
2.2 Subprime Mortgage Crisis
In their 2006 paper, Altman et al. [4] argue that a type of credit bubble was
on the rise, causing seemingly highly distressed firms to remain non-bankrupt
when, in more normal periods, many of these firms would have defaulted. Their
words could be understood to mean that too much credit had been given to
distressed firms, which would result in greater losses when that credit bubble
collapsed. With the financial crisis of 2008, that credit bubble is certain to
have burst. This might result in high default rates and significant losses for
corporate banks in the next year or two; only time will tell.
The financial crisis of 2008 is directly related to the subprime mortgage
crisis, while high oil and commodity prices have increased inflation, which
has induced further crisis situations. A brief discussion, adapted from
Maslakovic [22], on the subprime mortgage crisis and its causes follows.
A recession occurs when there has been negative growth in real gross domestic
product (GDP) for two or more consecutive quarters. A sustained recession is
referred to as a depression.
The mortgage lenders were the first to be affected as borrowers defaulted, but
major banks and other financial institutions around the world were hurt as
well. The reason for their pain was a financial engineering tool called
securitization, where rights to the mortgage payments are passed on via
mortgage-backed securities (MBS) and collateralized debt obligations (CDO).
Corporate, individual and institutional investors holding MBSs or CDOs faced
significant losses as the value of the underlying mortgage assets declined.
The stock prices of firms reporting great losses caused by their involvement
in MBSs or CDOs fell drastically.
The widespread dispersion of credit risk through CDOs and MBSs and the
unclear effect on financial institutions caused lenders to reduce lending activity
or to make loans at higher interest rates. Similarly, the ability of corporations
to obtain funds through the issuance of commercial paper was affected. This
aspect of the crisis is consistent with a credit crisis term called credit crunch. The
general crisis caused stock markets to decline significantly in many countries.
The liquidity concerns drove central banks around the world to take action to
provide funds to member banks to encourage the lending of funds to worthy
borrowers and to re-invigorate the commercial paper markets.
The credit crunch has cooled the world economic system, as fewer and more
expensive loans decrease the investments of businesses and consumers. The
major contributors to the subprime mortgage crisis were poor lending practices
and mispricing of credit risk. Credit rating agencies have been criticized for
giving CDOs and MBSs based on subprime mortgage loans much higher ratings
than they should have, thus encouraging investors to buy into these securities.
Critics claim that conflicts of interest were involved, as rating agencies are paid
by the firms that organize and sell the debt to investors, such as investment
banks. The market for mortgages had previously been dominated by government
sponsored agencies with stricter rating criteria.
In the financial crisis, which has been especially hard for financial institutes
around the world, the words of the prominent Cambridge economist John May-
nard Keynes have never been more appropriate, as he observed in 1931 during
the Great Depression:
A sound banker, alas, is not one who foresees danger and avoids
it, but one who, when he is ruined, is ruined in a conventional way
along with his fellows, so that no one can really blame him.
2.3 Development Process of Credit Rating Models
The data used are recordings from the co-operating bank's database, and they
are the same data as used in Rating Model Corporate (RMC). The data, which are
given a full discussion in Chapter 4, can be categorized as shown at the top
of Figure 2.3.
The data go through a certain cleaning process. A firm that is not observed in
two successive years is either a new customer or a retiring one, and is thus
removed from the dataset. Observations with missing values are also removed
from the dataset.
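The two cleaning rules above can be sketched as follows. The record layout and field names are invented for illustration; the thesis's actual database schema is not given here.

```python
# Hypothetical records: one row per firm per year; None marks a missing value.
records = [
    {"firm": "A", "year": 2005, "equity_ratio": 0.31, "default": 0},
    {"firm": "A", "year": 2006, "equity_ratio": 0.28, "default": 0},
    {"firm": "B", "year": 2006, "equity_ratio": None, "default": 1},
    {"firm": "C", "year": 2006, "equity_ratio": 0.12, "default": 0},
]

def clean(records):
    # Keep only firms observed in two successive years ...
    years = {}
    for r in records:
        years.setdefault(r["firm"], set()).add(r["year"])
    ok = {f for f, ys in years.items() if any(y + 1 in ys for y in ys)}
    # ... and drop observations with missing values.
    return [r for r in records
            if r["firm"] in ok and all(v is not None for v in r.values())]

print([r["firm"] for r in clean(records)])  # ['A', 'A']
```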
When the data have been cleansed they will be referred to as complete, and
they are then split into training and validation sets. The total data will be
split approximately as follows: 50% will be used as a training set, 25% as a
validation set and 25% as a test set.
The training set is used to fit the model and the validation set is used to
estimate the prediction error for model selection. In order to account for the
small sample of data, in particular of bad cases, the process of splitting,
fitting, transformation and validation is performed repeatedly.
The test set is then used to assess the generalization error of the final
model chosen. The training and validation sets, together called the modeling
set, are randomly chosen from the 2005, 2006 and 2007 datasets, whereas the
test set is the 2008 dataset. The repeated splitting of the modeling set is
done by choosing a random sample without replacement such that the training
set is 2/3 and the validation set is 1/3 of the modeling set.
In the early stages of the modeling process it was observed that different
random splits into training and validation sets resulted in considerably
different results. In order to accommodate this problem, a resampling process
is performed and the average performance over N samples is considered for
variable selection. In order to ensure that the same N samples are used in the
resampling process, a fixed seeding procedure is performed.
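A fixed-seed resampling scheme along these lines reproduces the same N splits on every run, which is what makes the averaged performances comparable across candidate models. Function and parameter names below are illustrative, not the thesis's own.

```python
import random

def resample_splits(modeling_set, n_samples=50, seed=42):
    """Draw N reproducible 2/3 training / 1/3 validation splits of the
    modeling set, sampling without replacement each time."""
    rng = random.Random(seed)  # fixed seed => the same N splits every run
    n_train = 2 * len(modeling_set) // 3
    splits = []
    for _ in range(n_samples):
        shuffled = rng.sample(modeling_set, len(modeling_set))
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits

splits = resample_splits(list(range(300)), n_samples=50)
train, valid = splits[0]
print(len(train), len(valid))  # 200 100
```

Because the generator is seeded once, every candidate model can be evaluated on exactly the same 50 training/validation pairs.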
An example of the different performances for different splits, for RMC and a
logistic regression model, can be seen in Figure 2.4. The figure shows the
clear need for the resampling process. This can be seen by considering the
different splits in iterations 1 and 50 respectively. For iteration 1 the RMC
would have been preferred to the LR model; the opposite conclusion would have
been reached if the split of iteration 50 had been considered.
Figure 2.4: Performance comparison (PCA.stat against iteration) of the
logistic regression model (1) and RMC (2) over 50 resampling iterations.
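The single performance measure plotted as PCA.stat is, per the summary, a principal component representative of several discriminatory power indicators. The exact construction is not specified in this excerpt; the sketch below shows one plausible version: standardize the indicator columns, take the leading eigenvector of their covariance matrix by power iteration, and use the projection onto it as the summary score. All names are assumptions.

```python
import math

def first_pc_scores(X, iters=200):
    """Project rows of X onto the first principal component. Each row of X
    holds the discriminatory power indicators for one resampling iteration."""
    n, p = len(X), len(X[0])
    # Standardize each column to zero mean and unit variance.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1))
           for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    # Covariance matrix of the standardized data.
    C = [[sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1)
          for b in range(p)] for a in range(p)]
    # Leading eigenvector by power iteration.
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # One summary score per row.
    return [sum(Z[i][j] * v[j] for j in range(p)) for i in range(n)]
```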
The datasets consist of creditworthiness data and a variable recording whether
the firm has defaulted a year later. The default variable is given the value
one if the firm has defaulted and the value zero otherwise.
When the training and validation sets have been properly constructed, the
modeling is performed. The modeling refers to the process of constructing a
model that can predict whether a borrower will default on their loan, using
previous information on similar firms. The proposed model is fitted using the
data of the training set and then a prediction is made for the validation set.
If logistic
regression15 is used as a modeling method, then the predicted values will lie
on the interval [0,1] and can be interpreted as probabilities of default (PD).
Generally, when one is modeling some event or non-event, the predicted values
are rounded to one for the event and to zero for the non-event. A problem with
this is that the fitted values depend largely on the ratio of zeros to ones in
the training sample. That is, when there are many zeros compared to ones in
the training set, which is the case for credit default data, the predicted
values will be small. Those probabilities can be interpreted as the
probability of default of an individual firm. An example of computed
probabilities can be seen in Figure 2.5.
Figure 2.5: Example of computed probabilities of default.
From Figure 2.5 it is apparent that the largest PD is considerably below 0.5,
and thus all the fitted values would get the value zero if they were rounded
to binary
15 Logistic regression is a modeling procedure that is specialized for modeling when the
dependent variable is either one or zero. Logistic regression is introduced in section 3.2.2 and
a more detailed discussion can be seen in section 5.2.2.
numbers. This is the main reason why ordinary classification and validation
methods do not work on credit default data. The observed probabilities of
default are small numbers and thus not easily interpreted. Hence, to enhance
readability, the default probabilities are transformed to risk ratings. Rating
Model Corporate has 12 possible ratings, and the same transformation to the
risk rating scale was used for the proposed models, in order to ensure
comparability. The transformation from PDs to risk ratings is summarized in
Table 2.1.
The transformation from PDs to risk ratings is summarized in Table 2.1.
PD-interval Rating
[ 0.0%; 0.11% [ 12
[ 0.11%; 0.17% [ 11
[ 0.17%; 0.26% [ 10
[ 0.26%; 0.41% [ 9
[ 0.41%; 0.64% [ 8
[ 0.64%; 0.99% [ 7
[ 0.99%; 1.54% [ 6
[ 1.54%; 2.40% [ 5
[ 2.40%; 3.73% [ 4
[ 3.73%; 5.80% [ 3
[ 5.80%; 9.01% [ 2
[ 9.01%; 100.0% ] 1
Table 2.1: Probabilities of Default (PD) are transformed to the relative risk
rating.
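The transformation in Table 2.1 can be implemented as a simple lookup over the table's upper interval bounds. The sketch below is an illustration of that mapping; the bounds are taken from the table, while the function and constant names are invented.

```python
import bisect

# Upper bounds of the PD-intervals from Table 2.1, as fractions. Each
# interval is closed on the left and open on the right, except the last,
# and ratings run from 12 (best, lowest PD) down to 1 (worst).
BOUNDS = [0.0011, 0.0017, 0.0026, 0.0041, 0.0064, 0.0099,
          0.0154, 0.0240, 0.0373, 0.0580, 0.0901, 1.0]

def pd_to_rating(pd_):
    """Map a probability of default on [0, 1] to the 12-grade rating scale."""
    # bisect_right respects the half-open intervals: a PD equal to a bound
    # falls into the next (worse) rating class.
    return max(1, 12 - bisect.bisect_right(BOUNDS, pd_))

print(pd_to_rating(0.02))  # 5, since 2% lies in [1.54%; 2.40%[
```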
It is apparent from Table 2.1 that the PD-intervals are very different in
size. It is also apparent that low PDs, representing good borrowers, are
transformed to high risk ratings. An example of a risk rating distribution can
be seen in Figure 2.6. When the ratings have been obtained it is possible to
validate the results; that is done by computing the discriminatory power16 of
the observed ratings. The discriminatory power indicators are then compared to
the indicators calculated for RMC on the specific validation set. The model
performance is concluded from the discriminatory power indicators. Numerous
discriminatory power methods are presented in Section 6.4. Important
information can be drawn from visual representations of the model performance,
such as the relative and cumulative frequencies of the good and bad cases
respectively and the respective ROC curve, which are all introduced in
Sections 6.2 and 6.3. Visual comparison is not made when the modeling is
performed on numerous modeling sets, that is, when the resampling process is
used.
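One widely used discriminatory power indicator, the area under the ROC curve, can be computed directly from the ratings of the good and bad cases via the Mann-Whitney formulation. This sketch is generic, not the thesis's Section 6.4 implementation, and the example ratings are invented.

```python
def auc(ratings_good, ratings_bad):
    """Area under the ROC curve as the probability that a randomly drawn
    good case is rated above a randomly drawn bad case (ties count 1/2)."""
    wins = 0.0
    for g in ratings_good:
        for b in ratings_bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(ratings_good) * len(ratings_bad))

# Hypothetical ratings on the 12-grade scale (higher = better borrower):
print(auc([9, 10, 7, 12, 8], [3, 5, 8]))  # 0.9
```

A value of 0.5 means no discrimination and 1.0 means the ratings separate the good and bad cases perfectly.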
16 The term discriminatory power refers to the fundamental ability to differentiate
between good and bad cases.
Figure 2.6: Example of a Risk Rating distribution, when the PDs have been
transformed to risk ratings.
From the model performance it is possible to assess different variables and
modeling procedures. The results can be seen in Chapter 7.
Chapter 3
Commonly Used Credit Assessment Models
In this chapter, credit assessment models commonly used in practice are
presented. First their general functionality and application are introduced,
followed by a light discussion of current research in the field. The credit
assessment models are used to rate borrowers based on their creditworthiness,
and they can be grouped as seen in Figure 3.1. The three main groups are
heuristic, statistical and causal models. In practice, combinations of
heuristic models and either of the other two types are frequently used and
referred to as hybrid models. The discussion here is adapted from Datschetzky
et al. [13]1, which should be viewed for a more detailed discussion.
Heuristic models are discussed in Section 3.1, and a brief introduction to
statistical models is given in Section 3.2, with a more detailed discussion in
Chapter 5. In Section 3.3 models based on option pricing theory and cash flow
simulation are introduced, and finally hybrid form models are introduced in
Section 3.4.
Heuristic models attempt to use past experience to evaluate the future
creditworthiness of a potential borrower. Credit experts choose the relevant
creditworthiness factors and their weights based on their experience. The
significance of the factors is not necessarily estimated and their weights are
not necessarily optimized.
Expert systems are software solutions which aim to recreate human problem
solving abilities. The system uses data and rules selected by credit experts
in order to produce its expert evaluation.
Altman and Saunders [3] report that bankers tend to be overly pessimistic
about credit risk and that multivariate credit-scoring systems tend to
outperform such expert systems.
Fuzzy logic systems can be seen as a special case of expert systems with the
additional ability of fuzzy logic. In a fuzzy logic system, specific values
entered for creditworthiness criteria are not allocated to a single
categorical term, e.g. high or low; rather, they are assigned multiple values.
As an example, consider an expert system that rates firms with a return on
equity of 15% or more as good and a return on equity of less than 15% as poor.
It is not in line with human decision-making behavior to have such sharp
decision boundaries, as it is not sensible to rate a firm with a return on
equity of 14.9% as poor and a firm with a return on equity of 15% as good. By
introducing a linguistic variable as seen in Figure 3.2, a firm having a
return on equity of 5% would be considered 100% poor and a firm having a
return on equity of 25% would be considered 100% good. A firm with a return on
equity of 15% would be considered 50% poor and 50% good. These linguistic
variables are used in a computer based evaluation.
Figure 3.2: Linguistic variable for return on equity (%), with degrees of
membership in the terms poor and good.
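The linguistic variable of Figure 3.2 can be sketched as a pair of piecewise-linear membership functions. The breakpoints 5% and 25% follow the example in the text; the function and key names are illustrative.

```python
def membership(roe, low=5.0, high=25.0):
    """Degrees of membership in the linguistic terms 'poor' and 'good' for
    a return on equity (in %), interpolating linearly between 100% poor
    at `low` and 100% good at `high`."""
    good = min(1.0, max(0.0, (roe - low) / (high - low)))
    return {"poor": 1.0 - good, "good": good}

print(membership(15.0))  # {'poor': 0.5, 'good': 0.5}
```

A firm at 14.9% return on equity is then graded almost exactly like one at 15%, instead of falling on the other side of a sharp boundary.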
In 1968, Altman [2] introduced his Z-score formula for predicting bankruptcy;
this was the first attempt to predict bankruptcy using financial ratios. To
form the Z-score formula, Altman used linear multivariate discriminant
analysis, with an original data sample consisting of 66 firms, half of which
had filed for bankruptcy.
All the values except the Market Value of Equity, in X4, can be found directly
in firms' financial statements. The weights of the original Z-score were based
on data from publicly held manufacturers with assets greater than $1 million,
but the formula has since been modified for private manufacturing,
non-manufacturing and service companies. The discriminant function of the
Z-score model can be summarized as follows
D = w0 + w1 X1 + w2 X2 + . . . + wk Xk    (3.2)
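Equation (3.2) is a plain linear combination and can be sketched as below. The weights in the example are Altman's published 1968 Z-score coefficients (with no intercept), but the ratio values are invented for illustration.

```python
def discriminant_score(x, weights):
    """Linear discriminant score of equation (3.2):
    D = w0 + w1*x1 + ... + wk*xk."""
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * xi for w, xi in zip(ws, x))

# Altman's original 1968 Z-score weights (intercept w0 = 0) applied to the
# five financial ratios X1..X5; the ratio values here are made up.
z_weights = [0.0, 1.2, 1.4, 3.3, 0.6, 1.0]
ratios = [0.15, 0.10, 0.08, 0.90, 1.10]
print(discriminant_score(ratios, z_weights))
```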
Another popular tool for credit assessment is logistic regression. Logistic
regression uses as its dependent variable a binary variable that takes the
value one if a borrower defaulted in the observation period and zero
otherwise. The independent variables are all parameters potentially relevant
to credit risk. Logistic regression is discussed further and in more detail in
Section 5.2.2. A logistic regression is often represented using the logit link
function as
p(X) = 1 / (1 + exp[−(β0 + β1 X1 + β2 X2 + · · · + βk Xk)])    (3.3)
where p(X) is the probability of default given the k input variables X. Logistic
regression has several advantages over DA. Firstly, it does not require normally
distributed input variables, so qualitative creditworthiness characteristics can
be taken into account. Secondly, the results of logistic regression can be inter-
preted directly as the probability of default. According to Datschetzky et al.
[13], logistic regression has seen more widespread use both in academic research
and in practice in recent years. This can be attributed to its flexibility in data
handling and more readable results compared to discriminant analysis.
3.2 Statistical Models
In this section a short introduction is given to other methods which can be grouped
under the heading of statistical and machine learning methods. As computing
advanced, new methods were tried as credit assessment methods; those include
This method is also known as classification and regression trees (CART) and is
given a more detailed introduction under that name in Section 5.5.
Hazard regression3 considers time until failure, default in the case of credit
modeling. Lando [21] refers to hazard regression as the most natural statisti-
cal framework to analyze survival data, but as Altman and Saunders [3] point
out, a financial institution would need a portfolio of some 20,000-30,000 firms
to develop very stable estimates of default probabilities. Very few financial
institutions worldwide come even remotely close to having this number of poten-
tial borrowers. The Robert Morris Associates, Philadelphia, PA, USA, have
nevertheless initiated a project to develop a shared national database, among larger
banks, of historic mortality loss rates on loans. Rating agencies have adopted
and modified the mortality approach and utilize it in their structured financial
instrument analysis, according to Altman and Saunders [3].
The revolutionary work of Black and Scholes (1973) and Merton (1974) formed
the basis of option pricing theory. The theory, originally used to price
options4, can also be used to value default risk on the basis of individual
transactions. Option pricing models can be constructed without using a com-
prehensive default history; however, they require data on the economic value of
assets, debt and equity, and especially on volatilities. The main idea behind the
option pricing model is that credit default occurs when the economic value of
the borrower's assets falls below the economic value of the debt.
The data required make it impossible to use option pricing models in the public
sector, and obtaining the required data is not without problems in the
corporate sector either; it is, for example, difficult in many cases to assess the economic
value of assets.
Cash flow models are simulation models of future cash flow arising from the
assets being financed and are thus especially well suited for credit assessment
in specialized lending transactions. Here the transaction itself is rated, not the
potential borrower, and the result is therefore referred to as a transaction rating.
Cash flow models can be viewed as a variation of the option pricing model where
the economic value of the firm is calculated on the basis of expected future cash
flow.
Since the pioneering work of Markowitz (1959), portfolio theory has been ap-
plied to common stock data. The theory could just as well be applied to the
fixed income area involving corporate and government bonds, and even to banks'
portfolios of loans. Even though portfolio theory could be a useful tool for fi-
nancial institutions, widespread use of the theory has not been seen, according to
Altman and Saunders [3]. Portfolio theory lays out how rational investors will
use diversification to optimize their portfolios. The traditional objective of
portfolio theory is to maximize return for a given level of risk; the theory can also be
used for guidance on how to price risky assets.
4 A financial instrument that gives the right, but not the obligation, to engage in a future transaction.
The models discussed in previous sections are rarely used in their pure form.
Heuristic models are often used in combination with statistical or causal mod-
els. Even though statistical and causal models are generally seen as better rating
procedures, the inclusion of credit experts' knowledge generally improves ratings.
In addition, not all statistical models are capable of processing qualitative infor-
mation (e.g. discriminant analysis), or they require a large data set to produce
significant results.
There are four main architectures to combine the qualitative data with the
quantitative data.
- Knock Out Criteria: here the credit experts set some predefined rules
which have to be fulfilled before a credit assessment is made. This could
for example mean that certain risky sectors are not considered as potential
customers.
- Special Rules: here the credit experts set some predefined rules. The
rules can take almost any form and regard every aspect of the modeling
procedure. An example of such a rule would be that start-up firms cannot
get higher ratings than some predefined rating.
Table 3.1: Typical values obtained in practice for the Gini coefficient as a mea-
sure of discriminatory power.
In the study of Yu et al. [34], highly evolved neural networks were compared
with logistic regression, a simple artificial neural network (ANN) and a support
vector machine (SVM). The study also compared a fuzzy support vector ma-
chine (Fuzzy SVM). The study was performed on detailed information of 60
5 pp. 109
6 The Gini coefficient ranges from zero to one, one being optimal. The Gini coefficient is
introduced in Section 6.4
The results show that logistic regression has the worst performance of all the single modeling
procedures, whereas SVM performs best of the single modeling procedures.
By introducing fuzzy logic to the SVM the performance improves. The multi-stage
reliability-based neural network ensemble learning models all show similar
performance and outperform the single and hybrid form models significantly.
provide the best estimation for default with an average 91.69% hit rate. Neural
Networks provided the second best results with an average hit rate of 89.00%.
The K-Nearest Neighbor algorithm had an average hit rate of 85.05%. These
results outperformed a logistic regression model using the Probit link function,
which attained an average hit rate of 84.87%. Although the results are for
mortgage loan data, it is clear that logistic regression models can be outperformed.
Current studies
Credit crises in the 1970s and 1980s fueled research in the field, resulting in great
improvements in observed default rates. High default rates in the early 1990s and
at the beginning of the new millennium have ensured that credit risk modeling is
still an active research field. In the light of the financial crisis of 2008, research
in the field is sure to continue. Most of the current research is highly evolved
and well beyond the scope of this thesis and is thus only given a brief discussion.
Even though it is not very practical for most financial institutions, much of current
research is focused on option pricing models. Lando [21] introduces intensity
modeling as the most exciting research area in the field. Intensity models can be
explained in a naive way as a mixture of hazard regression and standard pricing
machinery. The objective of intensity models is not to obtain the probability of
default but to build better models for credit spreads and default intensities. The
mathematics of intensity models is highly advanced and one should refer to Lando [21]
for a complete discussion of the topic.
The subject of credit pricing has also been the subject of extensive research, es-
pecially as credit derivatives have come into more common use. The use of macro-
economic variables is seen as material for prospective studies.
The discussion here on credit assessment models is rather limited; for further
reading one could consult Altman and Saunders [3] and Altman et al. [4] for a
discussion of the developments in credit modeling, and Datschetzky et al. [13] for a
good overview of models used in practice. Lando [21] gives a good overview
of current research in the field, along with an extensive list of references.
Chapter 4
Data Resources
The times we live in are sometimes referred to as the information age, as the
technical breakthrough of commercial computers has made the recording of
information an easier task. Along with increased information, it has also made
computations more efficient, furthering advances in practical mathematical mod-
eling.
In the development of a statistical credit rating model, the quality of the data
used in the model development is of great importance. Especially important is
the information on the few firms that have defaulted on their liabilities.
In this chapter the data made available by the co-operating Corporate bank
are presented. This chapter is partly influenced by the co-operating bank’s in-
house paper Credit [11]. Section 4.1 introduces data dimensionality and data
processing is discussed. Introduction of quantitative and qualitative figures are
given in Sections 4.2 and 4.3, respectively. Customer factors are introduced in
Section 4.4 and other factors and figures are introduced in Section 4.5. Finally,
some preliminary data analysis are performed in Section 4.6.
The data used in the modeling process are the data used in the co-operating
Corporate bank's current credit rating model, which is called Rating Model
Corporate (RMC) and is introduced in Section 4.5.1. The available data can
be grouped according to their nature into the following groups
- Quantitative
- Qualitative
- Customer factors
- Other factors and figures
Rating Model Corporate is a heuristic model and was developed in 2004. There-
fore, the first raw data are from 2004, as can be seen in Table 4.1. In order to
validate the performance of the credit rating model, the dependent variable,
which is whether the firm has defaulted on its obligations a year after it was
rated, is needed. In order to construct datasets that are admissible for valida-
tion, firms that are not observed in two successive years, thus being either
new customers or retiring ones, are removed from the dataset. The first valida-
tion was done in 2005, and from Table 4.1 it can be seen that the observations of
the constructed 2005 dataset are noticeably fewer than in the raw datasets of 2004
and 2005, due to the exclusion of new or retiring customers. The constructed
datasets are the datasets that the co-operating bank would perform their vali-
dation on; they are however not admissible for modeling purposes, the
reason being that there are missing values in the constructed datasets.
When the data have been cleansed they are split into training and validation
sets. The total data will be split approximately as follows: 50% will be used
as a training set, 25% as a validation set and 25% as a test set.
The training set is used to fit the model and the validation set is used to estimate
the prediction error for model selection. In order to account for the small sample
of data, that is of bad cases, the process of splitting, fitting and validation is
performed repeatedly. The average performance of the repeated evaluations is
then considered in the model development.
The test set is then used to assess the generalization error of the final model
chosen. The training and validation sets, together called the modeling set, are
randomly chosen from the 2005, 2006 and 2007 datasets, whereas the test
set is the 2008 dataset. The repeated splitting of the modeling set is done by
choosing a random sample without replacement such that the training set is 2/3
and the validation set is 1/3 of the modeling set.
- Real estate
- Trade
- Production
- Service
- Transport
Table 4.2: Summary of the portfolios concentration between sectors and sector-
wise default rates.
By analyzing Table 4.2 it is apparent that the production sector is the largest
and has the highest default rate. The trade and real estate sectors, on the other
hand, have rather low default rates.
Only the first four categories of these ratios are used to measure firms' creditwor-
thiness, as the market ratios are mostly used in the financial markets. The dis-
cussion here and in the following sections on financial ratios is largely adapted
from Credit [11] and Bodie et al. [9].
As the values used to calculate the financial ratios are obtained from firms'
financial statements, it is only possible to calculate financial ratios when a firm
has published its financial statements. This produces two kinds of problems:
firstly, new firms do not have financial statements, and secondly, new data are
only available once a year.
in both short and long term. Financial statements are usually reported annually and split
into two main parts, first the balance sheet and secondly the income statement. The balance
sheet reports current assets, liabilities and equity, while the income statement reports the
income, expenses and the profit/loss of the reporting period.
liquidity = Current assets / Current liabilities    (4.1)
The liquidity ratio can also be seen as an indicator of the firm's ability to avoid
insolvency in the short run and should thus be a good indicator of creditwor-
thiness. By considering the components of equation (4.1), it can be seen that
a large positive value of the current ratio can be seen as a positive indicator of
creditworthiness. In the case where the current liabilities are zero, this is considered
a positive indicator of creditworthiness, and the liquidity ratio is given the
extreme value 1000. In Table 4.3 the summary statistics of the liquidity ratio
can be seen for all sectors and each individual sector.
Table 4.3: Summary statistics of the Liquidity ratio, without the 1000 values.
The rate of observed extreme values, ev(1000), is also listed for each sector.
As can be seen in Table 4.3, by looking at the medians and first quartiles, the real
estate sector has the lowest liquidity ratio. The transport sector also has low
liquidity ratios. The liquidity ratio for all sectors and each individual sector can
be seen in Figure 4.1.
2 Current assets are cash and other assets expected to be converted to cash, sold, or consumed
within a year. Current liabilities usually include, amongst others, wages, accounts, taxes,
short-term debt and the proportion of long-term debt to be paid this year.
4.2 Quantitative key figures
The liquidity ratio will simply be referred to as the liquidity, as it measures
the firm's ability to liquidate its current assets by turning them into cash. It
is worth noting, though, that it is just a measure of liquidity, as the book value of
assets might be considerably different from their actual value. Mathematically the
liquidity will be referred to as αl .
The Debt ratio is a key figure consisting of net interest bearing debt divided by
the earnings before interest, taxes, depreciation and amortization (EBITDA)4.
The Debt ratio can be calculated using equation (4.2), where the figures are
obtainable from the firm's financial statement.
Debt/EBITDA = Net interest bearing debt / (Operating profit/loss + Depreciation/Amortization)    (4.2)
where the net interest bearing debt can be calculated from the firm's financial
statement and equation (4.3).
Net interest bearing debt = Subordinated loan capital + long term liabilities
+ Current liabilities to mortgagebanks + Current bank liabilities
+ Current liabilities to group + Current liabilities to owner, etc.
− Liquid funds − Securities − Group debt
− Outstanding accounts from owner, etc.
(4.3)
The Debt ratio is a measure of the pay-back period, as it indicates how long
it would take to pay back all liabilities with the current operating profit.
The longer the pay-back period, the greater the risk, and thus small ratios
indicate that the firm is in a good financial position. As both debt and EBITDA
can be negative there are some precautions that have to be taken, as a negative
ratio can have two different meanings. In the case where the
debt is negative this is a positive thing, and the ratio should thus be overwritten with zero or
a negative number to indicate positive creditworthiness. In the case where the
EBITDA is negative or zero the ratio should be overwritten with a large number to
indicate poor creditworthiness; in the original dataset these figures are -1000 and
1000, respectively. In the case when both values are negative they are assigned
the resulting positive value, even though negative debt can be considered a
much more positive thing.
4 Amortization is the write-off of intangible assets and depreciation is the wear and tear of
tangible assets.
Figure 4.1: Histograms of the liquidity ratio for all sectors and each individual
sector; the figure shows a refined scale of this key figure for the complete dataset.
Table 4.4: Summary of debt/EBITDA, for all sectors and each individual sec-
tor, without figures outside the ±1000 range. The rate of the extreme values
ev(1000) and ev(-1000) for each sector is also listed.
From Table 4.4 it is clear that the real estate sector has a considerably larger
Debt ratio than the other sectors, which are all rather similar. This inconsistency
between sectors has to be considered before modeling. Mathematically the Debt
ratio will be referred to as αd .
The Return On total Assets (ROA) percentage shows how profitable a com-
pany's assets are in generating revenue. The total assets are approximated as
the average of this year's and last year's total assets, which are the assets
that formed the operating profit/loss. Return On total Assets is a measure of
profitability and can be calculated using equation (4.4) and the relevant compo-
nents from the firm's financial statements.
ROA = Operating profit/loss / (½ (Balance sheet0 + Balance sheet−1))    (4.4)
Figure 4.2: Histograms of Debt/EBITDA for all sectors and each individual
sector, in a refined scale. The ±1000 values are not shown.
In equation (4.4) the balance sheets5 have the subscripts zero and minus one,
which refer to the current and last year's assets, respectively. For firms that
only have the current balance sheet, that value is used instead of the average
of the current and last year's assets. Return on assets gives an indication of
the capital intensity of the firm, which differs between sectors. Firms that have
undergone large investments will generally have lower return on assets. Start-up
firms do not have a balance sheet and are thus given the poor-creditworthiness
value -100. By taking a look at the histograms of the ROA in Figure 4.3 it is
clear that the transport sector and especially the real estate sector have quite
different distributions compared to the other sectors.
As can be seen from Table 4.5, the ROA differs significantly between sectors.
The mean values might be misleading and it is better to consider the median
value and the first and third quartiles. It can be seen that the transport and
real estate sectors do not have as high ROA as the others, which can partly be
explained by the large investments made by many real estate sector firms. It is
also observable that the first quartile of the service sector is considerably lower
than the others', indicating a heavier negative tail than in the other sectors.
Solvency can also be described as the ability of a firm to meet its long-term fixed
expenses and to accomplish long-term expansion and growth. The Solvency ratio,
also often referred to as the equity ratio, consists of the shareholders' equity6
and the balance sheet, obtainable from the firm's financial statement.
Solvency = Shareholders’ equity / Balance sheet    (4.5)
5 Balance sheet=Total Assets=Total Liabilities + Shareholders’ Equity
6 Equity=Total Assets-Total Liabilities. Equity is defined in Section ??.
Figure 4.3: Histograms of the Return On total Assets for all sectors and each
individual sector.
The balance sheet can be considered as either the total assets or the sum of
total liabilities and shareholders' equity. By considering the balance sheet to be
the sum of total liabilities and shareholders' equity, the solvency ratio describes
to what degree the shareholders' equity is funding the firm. The solvency ratio
is a percentage, ideally on the interval [0%, 100%]. The higher the solvency
ratio, the better the firm's financial position.
By viewing Table 4.6 it can be seen that the minimum values are large negative
figures. This occurs when the valuations placed on assets do not exceed the lia-
bilities, in which case negative equity exists. In the case when the balance sheet is zero,
as is the case for newly started firms, the Solvency ratio is given the extremely
negative creditworthiness value of -100. To get a better view of the distribution
of the Solvency ratio, histograms of it can be seen in Figure
4.4. As can be seen in Figure 4.4, the distribution lies mainly on the positive side
of zero. The transport and real estate sectors look quite different compared
to the other sectors. By considering the median value and the first and
third quartiles it is observable that the trade and production sectors are quite
similar. The real estate and service sectors are tailed towards 100, while the real
estate sector is also tailed towards zero.
4.2.5 Discussion
Firms that have just started business do not have any financial statements to
construct the quantitative key figures. In order to assess the creditworthiness
of a start-up firm there are two possibilities. One is to build a separate start-up
model and the other is to adapt the start-up firms to the rating model.
There is one other thing worth noting regarding financial ratios, and that is
Figure 4.4: Histograms of the Solvency ratio for all sectors and each individual
sector.
that they are constructed from values that are called book values and might be
far from the actual market values. The book value of liabilities is subject
to less uncertainty, but might be subject to some uncertainty in interest and
exchange rates, that is, if the firm holds debt that carries adjustable rates or is
in foreign currencies, respectively. As the equity is calculated as the difference
between the total assets and total liabilities, the equity value might be far from
the actual market value. This fact results in some deterioration of the predictive
power of the financial ratios.
By considering the key figures in the previous sections it is clear that there are
two problematic situations. First, it is difficult to decide what values should
be assigned in the cases when the actual ratio is nonsensical, and second, there are
differences between sectors. The predictive power of the key figures would be
poor, especially for some sectors, if they were used without correcting them
for each sector. An article by Altman and Saunders [3] reports that sector-
relative financial ratios, rather than simple firm-specific financial ratios, are
better predictors of corporate default. It is stated that in general, the sector-
relative financial ratio model outperformed the simple firm-specific model.
The key figures have been scaled by the co-operating bank for use in their RMC.
The scaling process is performed in such a way that the scaled key figures
are on the continuous scale from 1 to 7, where 1 indicates a bad situation and
7 indicates a good situation. In the cases when the actual ratios are nonsensical,
they are assigned the value 1 if they are to represent poor creditworthiness
and 7 if they are to represent positive creditworthiness. After the simple firm-
specific financial ratios have been scaled to correct them for each sector they
are referred to as scores. Since they have been adjusted for their sector, it is of
no interest to consider each sector separately.
Histograms of the scaled quantitative factors along with the default variable
and RMC's ratings can be seen in Figure 4.5. In the same figure one can see
the Spearman's rank correlations7 and dotplots of the scaled key figures. The
Spearman's rank correlation is used as an alternative to the Pearson correlation
as it is a non-parametric procedure and thus does not need any distributional
assumptions. In Figure 4.5 it can be seen that there is some correlation between
the scaled key figures, especially between the debt and return scores and the liquidity score.
7 Correlation is a numerical measure of how related two variables are. Correlation coeffi-
cients range from minus one to one, where one means that the two variables move completely
together and minus one that they move completely opposite. If the correlation coefficient is
zero there is no relation between the two variables.
Mathematically the scaled key figures will be referred to as the Greek letter alpha
with a tilde sign above it, α̃.
In the credit application process, credit experts rate the potential borrower on
six different aspects, each reflecting the firm's position in that particular field. The
fields that make up the qualitative figures are the following.
- Market position
- Refunding
The customer chief handling the loan application rates the potential borrower
in each field. The qualitative ratings are on a discrete scale from 1 to 7, where 1
indicates a bad situation and 7 indicates a good situation. These ratings then
need to be accepted by administrators in the credit department of the bank. It
is possible to reject an individual factor if it is not relevant to a firm.
In order to get a better feel for the qualitative factors, a dotplot can be seen in
Figure 4.6, where red dots are defaulted firms and black dots are solvent firms.
In the same figure one can see the Spearman's rank correlations and histograms
of the qualitative factors. From Figure 4.6 it is clear that the qualitative factors
are considerably correlated. It is also noticeable that red dots appear more often
in the lower left corner of the dotplots, indicating that the qualitative factors have
some predictive power.
For example, new firms do not have earlier minimum or maximum ratings, so if those
variables are to be used for modeling purposes it would result in smaller datasets.
For the qualitative figures there are quite a few cases where one of the six values
is missing, and in order to save the observation from being omitted it would be
4.3 Qualitative figures
Figure 4.5: Dotplot of all the scaled quantitative factors along with the default
variable and RMC ratings, where red dots are defaulted firms and black dots are
solvent firms. In the lower triangle the correlations of the variables can be seen
and on the diagonal their respective histograms.
Figure 4.6: Dotplot of all the qualitative factors, where red dots are defaulted
firms and black dots are solvent firms. In the lower triangle are the cor-
relations of the qualitative factors and on the diagonal there are histograms of
them.
4.4 Customer factors
The customer factors listed in Table 4.7 are available in the data
as they are used in Rating Model Corporate. As can be seen from Table 4.7,
the customer factors all have three levels; the most negative ones are in the
highest row and they get more positive further down. The stock exchange
listing is unlikely to have any predictive power as there are very few stock
exchange listed firms in the portfolio, and furthermore being stock exchange listed is
not an indicator of a more likely default event, quite the contrary. The
stock exchange listing can thus only be used as a heuristic variable, giving
8 The principal component analysis method is presented in Section 5.6.
stock exchange listed firms a higher rating than estimated. The reason for this
is that stock exchange listed firms have an active market for their shares and
can go to the market when in need of money by offering more shares.
In this section some of the factors and figures that are not part of the qualitative
figures, quantitative figures or customer factors are presented.
The rating model used by FIH today is called Rating Model Corporate. As it is
a rather delicate industrial secret it will only be briefly introduced. The model is
a heuristic9 model which uses the variables presented in the previous sections.
A systematic overview of the workings of Rating Model Corporate can be
seen in Figure 4.7.
A weighted average of the scaled quantitative key figures and a weighted average of the
qualitative factors are weighted together to get an initial score. Customer
factors are then added to the model score, which is then used in an exponential
formula in order to get an estimated PD. The PDs are then mapped to the
final score, which is on the range 1-12. There are also several special rules.
The weighted averaging makes it easy to handle missing values. The performance
of RMC can be seen in Section 7.5.
The KOB Score is a rating from the Danish department of the firm Experian, which
is an international rating agency and is Denmark's largest credit rating agency.
The correlation between the KOB ratings and Rating Model Corporate is around 0.6, so
9 A heuristic is a problem solving method. Heuristics are non-conventional strategies to
solve a problem. Heuristics can be seen as simple rules, educated guesses or intuitive
judgments.
4.5 Other factors and figures
it can be assumed that there is some variance between them. The KOB rating is on the
scale 0 to 100, where 0 is the worst and 100 the best; if the rating is low,
the creditworthiness is also low. The KOB rating is a weighted conclusion,
where the economic factors have the highest weight, but there are also other
factors that are taken into consideration. These factors can have positive or
negative effects and can change the ratings given in Table 7.16. There are some
complications regarding the KOB score, as there are some firms that are rated B
plus some number, e.g. B50. In order to solve this, all firms rated B50
and higher were given the numeric value 20 and all firms having ratings lower
In the datasets generated from the bank's database there are a few other factors
and figures that have not been mentioned earlier; they are the following
these figures and factors are now given a brief introduction. In mathematical
notation these figures will be referred to as the Greek letter sigma, ς, with the
first letter of the figure as a subscript.
Lowest and highest earlier ratings are the minimum and maximum ratings the firm has had over the last twelve months. Earlier ratings should only be taken into consideration with the utmost care. When earlier values are used for modeling purposes they are often referred to as having a memory. Including a variable with a memory could undermine the robustness of the other variables.
Guarantor Rating
Subjective Rating
Credit experts can give their subjective opinion on what the final credit rating should be. Credit experts are only supposed to give this subjective rating if, in their opinion, there are some external factors influencing the firm's creditworthiness.
Each firm has an identity number that is used to obtain matching information
between different datasets.
Default
The dependent variable is a binary variable stating whether the firm has fulfilled its obligations or not. A formal and much more detailed description can be seen in Section 2.
Equity
The shareholder’s equity is the difference between the total assets and total debt.
Should all the firms assets be sold and all liabilities settled then the shareholders
The relative and cumulative frequencies and the relative ROC curves of the 2005 and 2006 data can be seen in Figure 4.8, while the corresponding plots for the 2007 data and the whole dataset can be seen in Figure 4.9. The complete datasets were used to form Figures 4.8 and 4.9. The default frequency of the datasets can be seen in Figure 4.8, and it is interesting to see that there is quite some difference between years. Likewise, it is interesting to see the difference between the distributions of the bad cases. There are also considerably better results for the 2006 dataset compared to the 2005 dataset.
The number of variables used in this analysis is quite limited. It is thus worth concluding with a few words on variable selection for the development of a new credit rating model.
Chen et al. [10] list 28 variables for modeling credit default and discuss their predictive power, using a support vector machine as the modeling procedure. Behr and Güttler [7] report quite a few interesting points on variable selection for a logistic regression. It is also worth noting that their research is performed on a dataset ten times the size of the data available for this research.
For a logistic regression it might improve the model performance if the model variable age were measured as a continuous variable; by using CART analysis it could then be possible to obtain information on at what age interval firms are most vulnerable to solvency problems.
Figure 4.8: The relative and cumulative frequencies and the relative ROC curve
of 2005 and 2006 data using complete datasets.
Figure 4.9: The relative and cumulative frequencies and the relative ROC curve
of 2007 and all available data using complete datasets.
Chapter 5
The Modeling Toolbox
As competition gets harder in the banking sector, advances are constantly sought at all levels, and modeling standards are no exception. This chapter contains an overview of some of the methods used to analyze data, construct models and validate the outcome. An effort was made to keep the mathematical notation as simple as possible for readers with less statistical or mathematical background. In the sections where more advanced topics are introduced, a brief summary of the concept and its usage is given in order to make it easier for readers with less statistical knowledge to understand the topic.
The general theory behind linear models and generalized linear models is introduced in Sections 5.1 and 5.2, discriminant analysis in Section 5.3 and different classification methods in Sections 5.4 and 5.5. In Section 5.1 some of the basic concepts of statistics are introduced, whereas more advanced methods are introduced in Sections 5.2-5.5. Finally, in Section 5.6 a method used to reduce multidimensional data sets to lower dimensions is introduced.
Linear models encompass regression and the analysis of variance or covariance, and are often referred to as general linear models. The general linear model dates back to Carl Friedrich Gauss (1777-1855).
The underlying assumptions of the general linear model are introduced in Sections 5.1.5 - 5.1.7. In our complex world there exist problems that do not fit those underlying assumptions, and therefore an extension called generalized linear models is introduced in Section 5.2. As it is rather inconvenient that both general and generalized linear models have the same initials, general linear models will be abbreviated as LM and generalized linear models as GLM. Montgomery and Runger [24] give a good introduction to LM; some of the topics here are adapted therefrom.
The mathematical notation of the general linear regression is widely known and can be found in most statistical textbooks. Even though linear regression is not used directly in the model, it is the foundation of the logistic regression and general knowledge of it is of great importance. Linear regression is used to model a certain variable called the response or dependent variable, denoted here as y. The dependent variable is modeled with explanatory variables, called the independent variables, which are denoted as X here.
y = Xβ + ε (5.5)
where
\[
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad
X = \begin{pmatrix}
1 & x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\
1 & x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n,1} & x_{n,2} & \cdots & x_{n,k}
\end{pmatrix},
\]
\[
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}
\qquad \text{and} \qquad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
\]
where X is called the design matrix, β is the unknown parameter vector and ε is
the error term. By using historical data it is possible to estimate the parameter
vector using the least squares estimation, introduced in Section 5.1.2. From the
estimated parameter vector β̂ it is possible to obtain a fit or a prediction of the dependent variable.
ŷ = X β̂ (5.6)
The term ŷ is a vector called the predicted or fitted values. The error term can then be measured as the difference between the actual observed values, y, and the predicted values, ŷ:

ε = y − ŷ = y − X β̂ (5.7)

The measured error term is usually referred to as the residuals.
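The fit and residual computations of equations (5.5)-(5.7) can be sketched numerically; this is an illustrative example with made-up data, not code from the thesis:

```python
import numpy as np

# Sketch: least squares fit of the linear model y = X*beta + eps,
# with fitted values and residuals as in equations (5.6)-(5.7).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept column
beta_true = np.array([1.0, 2.0, -0.5])      # assumed "true" parameters
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# beta_hat = (X^T X)^{-1} X^T y, computed by solving the normal equations
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_fit = X @ beta_hat        # fitted values y-hat
residuals = y - y_fit       # measured error term, the residuals
```

With a small noise term the estimated parameter vector lands very close to the vector used to generate the data.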
The objective of the estimation is to obtain the estimate of the β parameters that minimizes the loss function L(β):

β̂ = arg min_β L(β) (5.8)
L2(β) is also referred to as the residual sum of squares (RSS). This minimization is called least squares estimation and is obtained by equating the first derivative of the loss function L2(β) to zero and solving for β. Without going into detail, the resulting equations, called the normal equations, that must be solved are

X^T X β̂ = X^T y (5.10)

β̂ = (X^T X)^{-1} X^T y (5.11)

The significance of the individual parameter estimates can be tested with the statistic

z_j = β̂_j / √(σ̂² c_j) (5.12)
where cj is the jth diagonal element of (X T X)−1 . Under the null hypothesis
that βj = 0, zj is compared to different significance levels, α, of the student-t
distribution with N − k − 1 degrees of freedom, t(N −k−1,α) . A large absolute
value of zj will lead to rejection of the null hypothesis, i.e. large zj represent
significant β estimates.
Another test, frequently applied for LMs, is a test for model reduction. Consider models M0 and M1 with parameter vectors

\[
H_0: \beta = \beta_0 = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_q \end{pmatrix}
\qquad \text{and} \qquad
H_1: \beta = \beta_1 = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \qquad q < p
\]
The deviances D0 and D1 can thus be calculated and used to calculate the
F -statistic
\[
F = \frac{D_0 - D_1}{p - q} \Big/ \frac{D_1}{N - p} \qquad (5.17)
\]
The null hypothesis is thus rejected for large values of F relative to some α-level
of the F (p − q, N − p) distribution.
As a measure of the goodness of fit for multiple linear regression models the coefficient of determination is introduced. The coefficient of determination, R², is based on the comparison of a suggested model to the minimal model. The residual sum of squares (RSS0) of the minimal model introduced in equation (5.2) is the largest and worst reasonable RSS value. The RSS for any other model can be computed and compared to RSS0:

R² = (RSS0 − RSS) / RSS0 (5.18)

For a perfect fit the RSS will be zero and the resulting coefficient of determination will be one. The coefficient of determination for the minimal model will be zero. All models improving on the minimal model should thus have R² satisfying 0 ≤ R² ≤ 1. The coefficient of determination can be interpreted as the proportion of the total variation in the data explained by the model; for R² = 0.5, 50% of the total variation is explained by the model.
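Equation (5.18) can be sketched directly in code; this is an illustrative example, not from the thesis:

```python
import numpy as np

# Sketch: coefficient of determination computed by comparing a model's RSS
# to RSS0 of the minimal (intercept-only) model, as in equation (5.18).
def r_squared(y, y_fit):
    rss = np.sum((y - y_fit) ** 2)           # residual sum of squares of the model
    rss0 = np.sum((y - np.mean(y)) ** 2)     # RSS of the minimal model
    return (rss0 - rss) / rss0

y = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r_squared(y, y)                        # perfect fit -> 1.0
minimal = r_squared(y, np.full(4, np.mean(y)))   # minimal model -> 0.0
```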
where loglik is the maximized value of the log-likelihood function for the estimated model. The AIC takes the number of variables p into account, just like the adjusted R². For the logistic regression model with the binomial log-likelihood the AIC becomes

AIC = −(2/N) · loglik + 2p/N (5.21)
If there are several competing models built from the same data then they can
be ranked according to their AIC, with the one having the lowest AIC being the
best.
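The AIC ranking described above can be sketched as follows; the log-likelihood values and model names here are made up for illustration:

```python
# Sketch: ranking candidate models by the per-observation AIC of
# equation (5.21); lowest AIC wins. The loglik values are fictitious.
def aic(loglik, p, N):
    return -2.0 / N * loglik + 2.0 * p / N

N = 500
candidates = {
    "model_a": aic(loglik=-180.0, p=3, N=N),
    "model_b": aic(loglik=-175.0, p=8, N=N),
    "model_c": aic(loglik=-178.0, p=4, N=N),
}
best = min(candidates, key=candidates.get)  # model with the lowest AIC
```

Note how model_b, despite its higher log-likelihood, is penalized for its extra parameters.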
5.1.5 Normality
The observations are assumed to be normally distributed,

y ∼ N(µ, σ²)

and the same distributional assumptions are thus necessary for the terms on the right side of equation (5.5). The objective of the linear modeling is to find the linear combination of the design variables X^T β that results in a zero mean residual vector,

ε ∼ N(0, σ_ε²)

so that the fitted values satisfy

ŷ ∼ N(µ, σ_ŷ²)

which holds except for the minimal model, where ŷ = β₀ = µ. Now it is easy to see that the variance of the minimal model is the variance of the original observations, σ_ε² = σ². As it is the objective of the modeling to describe the variance of the observations, it is clear that σ² is the maximum variance and that it is desired to find a model that results in a decrease in variance.
5.1.6 Homoscedasticity
One of the underlying assumptions of the LM is that the random variables, both dependent and independent, are homoscedastic, that is, all observations of each variable have the same finite variance. This assumption is usually made before the modeling and leads to adequate estimation results, even if the assumption is not true. If the assumption is not true the model is called heteroscedastic. Heteroscedasticity can be observed from scatterplots of the data variables. It can also be observed from residual plots, that is, plots of residuals against the dependent variables. An example of both homoscedastic and heteroscedastic data can be seen in Figure 5.1.
[Figure 5.1: Scatterplots of homoscedastic (left) and heteroscedastic (right) data.]
5.1.7 Linearity
The term linear in the general linear regression might cause confusion for those who are not familiar with it: what exactly is linear? The term refers to the β parameters, as the model must be a linear combination of them; the independent variables themselves may be transformed as desired by non-linear functions. Such transformations are made on non-normally distributed variables to try to bring them closer to a normal distribution, as a better fit is observed if the independent variables are normally distributed. As an example, the model

y = β₀ + e^{β₁ x₁} + β₂ x₂ + · · · + β_p x_p + ε (5.22)

is not linear, since β₁ enters through the exponential function.
As data do not always comply with the underlying assumptions, more advanced methods were developed, called generalized linear models (GLM). Since the term was first introduced by Nelder and Wedderburn (1972), it has slowly become well known and widely used. Acknowledgment has to be given to the contribution of the computer age, which has brought access to large databases and major advances in computing resources.
The main idea behind GLM is to formulate linear models for a transformation of the mean value, through the link function, while keeping the observations untransformed, thereby preserving the distributional properties of the observations.
One of the advances was the recognition that the nice properties of the normal distribution are shared by a wider class of distributions called the exponential family of distributions. The exponential family will be introduced in the next section. Most definitions and theory in this section are influenced by two books on GLM, i.e. Thyregod and Madsen [31] and Dobson [14].
where the function κ(θ) is called the cumulant generator. The formulation in (5.24) is called the canonical parameterization of the family and the parameter θ is called the nuisance or canonical parameter.

The exponential dispersion family has an extra parameter, the so-called dispersion parameter, and is defined as
The exponential family forms the basis for the discussion on GLM.
In case the dependent variable is measured on the binary scale {0, 1}, the mathematical problem is called logistic regression. Logistic regression is a special case of the GLM and is given a discussion here. Define a binary random variable Z taking the value 1 if an event occurs and 0 if it does not occur;
the probabilities of each case can be modeled as Pr(Z = 1) = p and Pr(Z =
0) = 1 − p. For n such independent random variables (Z1 , Z2 , . . . , Zn ) with
probabilities Pr(Zi = 1) = pi , the joint probability function is
\[
\prod_{i=1}^{n} p_i^{z_i} (1 - p_i)^{1 - z_i}
= \exp\left[ \sum_{i=1}^{n} z_i \ln\frac{p_i}{1 - p_i} + \sum_{i=1}^{n} \ln(1 - p_i) \right] \qquad (5.26)
\]
There are several link functions for the binomial distribution. A popular link function is the logistic or logit link function

g(p_i) = ln( p_i / (1 − p_i) ) = X_i^T β

with the inverse

p(x_i) = exp(X_i^T β) / (1 + exp(X_i^T β)) (5.30)
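The logit link and its inverse in equation (5.30) can be sketched directly; this is an illustrative example, not from the thesis:

```python
import numpy as np

# Sketch: the logit link g(p) = ln(p / (1 - p)) maps probabilities to the
# linear predictor, and its inverse maps the predictor back into (0, 1),
# as in equation (5.30).
def logit(p):
    return np.log(p / (1.0 - p))

def inv_logit(eta):
    return np.exp(eta) / (1.0 + np.exp(eta))

p = np.array([0.1, 0.5, 0.9])
eta = logit(p)
recovered = inv_logit(eta)   # the two functions are inverses of each other
```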
There are also other link functions that can be used in logistic regression, such as the complementary log-log link.
In the model selection process for the GLM an important measure is the deviance, also called the log-likelihood statistic. Consider the log-likelihood function l(y, θ) of a member of the exponential family; the deviance is defined as

d(y, θ) = 2 max_θ l(y, θ) − 2 l(y, θ) (5.34)

The deviance of the binomial distribution is a rather complex function and will therefore not be derived here. The deviance is usually reported in the result summary of most computation software handling the GLM. It is interesting that the deviance of Y ∼ N(µ, σ²I) is simply the residual sum of squares (RSS) introduced in Section 5.1.2. The deviance is thus used in the same manner in GLM as the RSS is used in the LM, and models with smaller deviance are preferable to models with larger deviance. In analogy with equation (5.18), a pseudo coefficient of determination can be defined as

pseudo R² = (D0 − D) / D0
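The deviance and the pseudo-R² above can be sketched for binary data; the fitted PDs below are made up for illustration, and this is not code from the thesis:

```python
import numpy as np

# Sketch: binomial deviance for binary observations and the pseudo-R^2
# defined as (D0 - D) / D0. For binary data the saturated log-likelihood
# is zero, so the deviance reduces to -2 times the log-likelihood.
def binomial_deviance(y, p_hat):
    return -2.0 * np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

y = np.array([0, 0, 1, 1, 0, 1])
p_model = np.array([0.1, 0.2, 0.8, 0.7, 0.3, 0.9])    # fictitious fitted PDs

d0 = binomial_deviance(y, np.full(len(y), y.mean()))  # minimal (null) model
d = binomial_deviance(y, p_model)                     # candidate model
pseudo_r2 = (d0 - d) / d0
```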
Linear discriminant analysis (LDA) arises from the assumption that the independent variables are normally distributed within each of the two classes defined by the dependent variable. Consider the binary dependent variable y forming the two classes j ∈ {0, 1} and k independent variables X spanning the k-dimensional input space R^k. Decision theory for classification expresses the need to know the class posteriors Pr(Y|X) for optimal classification. Suppose f_j(x) is the class conditional density of X and π_j is the prior probability of class j. Then from Bayes' theorem

\[
\Pr(Y = j | X = x) = \frac{f_j(x)\pi_j}{\sum_{l=0}^{1} f_l(x)\pi_l}
= \frac{f_j(x)\pi_j}{f_0(x)\pi_0 + f_1(x)\pi_1} \qquad (5.35)
\]
Consider the two classes defined by the binary dependent variable to be associated with multivariate independent variables which are assumed to be normally distributed. By examining the log-ratio of the two class densities it is possible to see that the equal covariance matrices assumption causes the normalization factors to cancel, as well as the quadratic part in the exponents. It is from the assumption that the covariance matrices are equal that the LDA is derived. That is, however, hardly the case in practice, and thus an estimate of the covariance matrix is used; it is found to be
\[
\hat{\Sigma} = \frac{1}{N_0 + N_1 - 2}
\left( \sum_i (X_{0i} - \hat{\mu}_0)(X_{0i} - \hat{\mu}_0)^T
     + \sum_i (X_{1i} - \hat{\mu}_1)(X_{1i} - \hat{\mu}_1)^T \right) \qquad (5.40)
\]
Even though it is not obvious, it can be seen that equation (5.39) is a linear function of x. The discrimination boundary is set where the two discriminant functions are equal. It is also possible to combine the two discriminant functions, and then the LDA rule would classify to class 1 if

\[
x^T \hat{\Sigma}^{-1}(\hat{\mu}_1 - \hat{\mu}_0)
> \frac{1}{2}\hat{\mu}_1^T \hat{\Sigma}^{-1}\hat{\mu}_1
- \frac{1}{2}\hat{\mu}_0^T \hat{\Sigma}^{-1}\hat{\mu}_0
+ \ln\frac{\pi_0}{\pi_1} \qquad (5.42)
\]
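The classification rule of equation (5.42), with the pooled covariance estimate of equation (5.40), can be sketched numerically; the two-dimensional data below are synthetic, and this is an illustration rather than the thesis's implementation:

```python
import numpy as np

# Sketch: LDA classification with a pooled covariance estimate
# (equation (5.40)) and the combined rule of equation (5.42).
rng = np.random.default_rng(1)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))   # class 0 sample
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))   # class 1 sample
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
pi0 = pi1 = 0.5                                             # equal priors

# pooled covariance estimate
S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X0) + len(X1) - 2)
Sinv = np.linalg.inv(S)

def classify(x):
    lhs = x @ Sinv @ (mu1 - mu0)
    rhs = (0.5 * mu1 @ Sinv @ mu1 - 0.5 * mu0 @ Sinv @ mu0
           + np.log(pi0 / pi1))
    return int(lhs > rhs)   # 1 if classified to class 1
```

Points near each class mean are assigned to that class; with equal priors the boundary lies midway between the two means.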
Lando [21] points out an interesting connection between the LDA and the logistic regression using the logit link function. From the result of Bayes' theorem in equation (5.35) it is possible to write the probability of default accordingly. Applying the logarithm and comparing equation (5.39) with equation (5.30), it appears as if the LDA and LR are the same. They are not; both have the same linear form, but they differ in the way the linear coefficients are estimated. The logistic regression is considered to be more general, relying on fewer assumptions. The LDA assumes that the explanatory variables are normally distributed and that the classes have equal covariance matrices, even though the assumption regarding equal covariance matrices is considered to be less significant. Those assumptions make it possible to consider the logistic regression a safer bet, as it is more robust than the LDA.
As there are many fewer defaulted firms than non-defaulted firms in credit rating data, methods such as k-NN do not work that well as classifiers. Their criteria can, however, be used as additional information: for the k-NN method, the average number of defaults in the neighborhood is used as an independent variable. It is easy to argue against using different methods with the same input data, but as they rely on completely different assumptions it can be thought of as looking at the data from different perspectives to get a better view of the big picture.
The splitting rules make up a binary decision tree; the solution algorithm does not only have to decide automatically on the splitting variables and split points, but also on the shape of the tree. In mathematics, trees grow from the first node, called the root, to their final nodes, called leaves. All nodes except for the root node have a unique parent node, and all nodes except for the leaf nodes have exactly two child nodes. Ancestors refers to parents, grandparents and so forth; points of connection are known as forks and the segments as branches.
Let p_m be the proportion of default observations in node m. Starting with the complete data set, consider a splitting variable J and split point s; the objective is to solve

min_{J,s} Q_m(T) (5.51)
where Q_m(T) is called the node impurity of the tree T. The impurity of a set of samples is designed to capture how similar the samples are to each other; the smaller the number, the less impure the sample set is. There are a few different measures of Q_m(T) available, including the misclassification error, the Gini index and the cross-entropy. They are all similar, but the cross-entropy and the Gini index have the advantage that they are differentiable and thus better suited to numerical optimization than the misclassification error.
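For a binary default/non-default split, the three impurity measures can be written as functions of the node's default proportion p_m; this is an illustrative sketch, not code from the thesis:

```python
import numpy as np

# Sketch: three node impurity measures for a node with default
# proportion p: misclassification error, Gini index and cross-entropy.
def misclassification(p):
    return min(p, 1 - p)

def gini(p):
    return 2 * p * (1 - p)

def cross_entropy(p):
    if p in (0.0, 1.0):     # convention: 0 * log(0) = 0
        return 0.0
    return -p * np.log(p) - (1 - p) * np.log(1 - p)
```

All three measures are zero for a pure node and maximal at p = 0.5; only the Gini index and the cross-entropy are differentiable at every interior point.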
[Figure: a binary decision tree. The root node holds the first splitting rule of the form X_j ≤ s; each leaf node m reports the default proportion p_m and the number of observations N_m.]
The first splitting rule makes up the root node. All observations that satisfy the splitting rule follow the left branch from the root node, while the others follow the right branch. At the leaf nodes the default proportion, p_m, is reported along with the number of observations, N_m, in accordance with the splitting rules of its ancestors. There are numerous stopping criteria; one example is requiring a minimum node size.
The size of a tree is an important measure, as a large tree might overfit the data whereas a small tree might miss out on important structures. A useful strategy called cost-complexity pruning is often used to get a well-sized tree. The strategy is as follows: a large tree T0 is grown, stopping the splitting process when some small minimum node size is reached. Let the subtree T ⊂ T0 be any tree that can be obtained by pruning T0, that is, by collapsing any number of its internal (non-leaf) nodes. Let |T| denote the number of terminal nodes in T. Then the cost-complexity criterion is defined as
\[
C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T| \qquad (5.52)
\]
The objective is then to find, for some α, the subtree T_α ⊂ T0 that minimizes C_α(T). The tuning parameter α > 0 governs the tradeoff between tree size and its goodness of fit to the data.
While a certain degree of complexity follows the use of CART, it has been shown to be a very helpful tool for analyzing data. The introduction here is mainly adapted from Hastie et al. [18].
P X = Y (5.53)

where

\[
P = \begin{pmatrix} p_1 \\ \vdots \\ p_m \end{pmatrix}
  = \begin{pmatrix} p_{1,1} & \cdots & p_{1,m} \\ \vdots & \ddots & \vdots \\ p_{m,1} & \cdots & p_{m,m} \end{pmatrix}, \qquad
X = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}
  = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m,1} & x_{m,2} & \cdots & x_{m,n} \end{pmatrix},
\]
\[
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}
  = \begin{pmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,n} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m,1} & y_{m,2} & \cdots & y_{m,n} \end{pmatrix}
\]
On our journey to find the transformation P there are a few things which need to be introduced. The original data have the variance-covariance matrix C_X, an m × m matrix,

C_X = (1/(n − 1)) X X^T

Y also has an m × m variance-covariance matrix,

C_Y = (1/(n − 1)) Y Y^T

and it is the objective of PCA to find an orthogonal transformation P such that the variance-covariance matrix C_Y is a diagonal matrix.
Let A = X X^T; as a symmetric matrix it can be diagonalized by an orthogonal matrix E of its eigenvectors,

A = E D E^T (5.55)

Choosing P = E^T, so that A = P^T D P, gives

\[
C_Y = \frac{1}{n-1} P A P^T
    = \frac{1}{n-1} P (P^T D P) P^T
    = \frac{1}{n-1} (P P^T) D (P P^T)
    = \frac{1}{n-1} (P P^{-1}) D (P P^{-1})
    = \frac{1}{n-1} D \qquad (5.56)
\]
Without going into too much detail, the eigenvalues λ of the symmetric m × m matrix A are the solutions to the equation

det(A − λI) = 0

There exist m real-valued eigenvalues, some of which may be equal. If λ is an eigenvalue, then there exist vectors p ≠ 0, the eigenvectors, such that

A p = λ p
There are a few properties of the PCA that need to be addressed:

- A similar result is that the total variance, i.e. the sum of the variances of the original variables, is equal to the sum of the variances of the principal components,

  Σ_i V(x_i) = Σ_i V(y_i)

- If the variables are measured on very different scales they can be standardized first. Then the empirical correlation matrix forms the basis of the analysis instead of the empirical variance matrix.

- It is also possible to compute the PCA with a method called singular value decomposition (SVD), which is mathematically more involved but numerically more accurate according to R Development Core Team [23].
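The construction of equations (5.53)-(5.56) can be verified numerically; this is an illustrative sketch with random data, not the thesis's computation:

```python
import numpy as np

# Sketch: PCA via eigendecomposition of A = X X^T. Choosing P as the
# transposed eigenvector matrix makes the covariance of Y = P X diagonal,
# as in equation (5.56).
rng = np.random.default_rng(2)
m, n = 3, 500
X = rng.normal(size=(m, n))
X[1] += 0.8 * X[0]                        # introduce correlation between rows
X = X - X.mean(axis=1, keepdims=True)     # center each variable

A = X @ X.T                               # proportional to C_X
eigvals, E = np.linalg.eigh(A)            # A = E D E^T
P = E.T                                   # the sought orthogonal transformation
Y = P @ X
C_Y = Y @ Y.T / (n - 1)

off_diag = C_Y - np.diag(np.diag(C_Y))    # should vanish numerically
```

The off-diagonal elements of C_Y are zero up to numerical precision, and the total variance (the trace) is preserved by the orthogonal transformation.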
Chapter 6
Validation Methods
The term Discriminatory Power plays a significant role in the validation process. The term refers to the fundamental ability of a rating model to differentiate between default and non-default cases.
In Figure 6.1 the relative frequency of both default and non-default cases can be seen for the 12 rating classes, where class 1 represents the worst firms and class 12 the best firms. From the density functions it is apparent how the ratings for both default and non-default cases are distributed. It is desired that the two distributions are considerably different in order to discriminate between default and non-default cases. A nice distribution of the bad cases would have the most observations at rating class one, as it does in Figure 6.1, decreasing smoothly to the right. On the other hand, the distribution of the good cases would preferably be the mirrored distribution of the bad cases, that is, skewed to the right. Most important is that the two distributions are different and separable.
Figure 6.1: Example of a frequency density distribution for both default and
non-default cases.
An ideal rating procedure would run vertically from (0,0) to (0,1) and horizontally from (0,1) to (1,1); such a rating procedure would only need two rating classes. A rating model with no predictive power would run along the diagonal. It is desirable that the ROC curve is a concave function over the entire range. If this condition is violated, then there is a rating class with a lower default probability than a superior rating class. It is obviously desired to have decreasing default probabilities with higher ratings.
The Area Under the Curve (AUC) is a numerical measure of the area under the ROC curve. For an ideal rating model the AUC would be 1, and for a non-differentiating model it would be 0.5. The higher the value of the AUC, the higher the discriminatory power of the rating procedure. The AUC is a one-dimensional measure of discriminatory power and does thus not capture the shape of the ROC curve. In Figure 6.4 ROC curves for two different models with the same AUC measure are shown. It is then impossible to select either model from the AUC statistic alone. The steeper curve, corresponding to the black curve in Figure 6.4, would though be preferred, as it predicts better for rating classes containing worse firms. The slope of the ROC curve in each section reflects
Figure 6.3: Example of an ideal ROC curve and an ROC curve with no predictive power. The red line represents the ideal procedure, whereas the blue line represents a procedure with no predictive power.
the ratio of bad versus good cases in the respective rating class. It would thus be preferred that the ROC curve is steepest in the beginning and that the steepness then decreases; this would make the curve concave over the entire range. If this condition is violated, an inferior class will show a lower default probability than a rating class which is actually superior. A non-concave region can of course be caused by statistical fluctuations, but it should be avoided in the development process of a new rating model. Neither curve in Figure 6.4 is concave over the entire range: the red curve violates concavity in the region near the (0.1, 0.5) point, and the black curve in the region near the (0.9, 0.95) point.
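The AUC can be computed without drawing the curve at all, as the probability that a randomly chosen non-default case is rated higher than a randomly chosen default case (ties counting one half). The ratings below are made up for illustration; this is a sketch, not the thesis's implementation:

```python
import numpy as np

# Sketch: AUC as the Mann-Whitney probability that a non-default case
# outranks a default case, with ties counted as one half.
def auc(ratings_good, ratings_bad):
    wins = 0.0
    for g in ratings_good:
        for b in ratings_bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(ratings_good) * len(ratings_bad))

good = np.array([8, 9, 7, 10, 6, 9])   # fictitious ratings of non-default cases
bad = np.array([2, 3, 5, 7])           # fictitious ratings of default cases
```

Comparing a sample against itself gives exactly 0.5 (no discrimination), and perfectly separated samples give 1.0, matching the interpretation of the AUC given above.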
Figure 6.4: Comparison of two different ROC curves with the same AUC.
The Gini coefficient is a linear transformation of the AUC and is thus completely correlated with the AUC indicator. It thus carries no additional information and is just calculated to compare model performance to reported performance. The Gini coefficient has a geometrical connection to a graphical representation called the CAP curve or Powercurve, which is a graph of the cumulative frequencies of default cases versus all cases. The ROC curve is more sensitive than the CAP curve and is thus preferred.
where H0 refers to the absolute information value, which represents the information known regardless of the rating procedure.
The Brier Score can be decomposed into terms separating its essential properties. The first term,

p(1 − p) (6.10)

describes the variance of the default rate observed over the entire sample, p. This value is independent of the rating procedure and depends only on the observed sample.
This is simply the Brier Score scaled with the variation term, which is constant for each sample. Recalling that a low value for the calibration is desired and a large value for the resolution, it is easy to see that larger values are desired for the Brier Skill Score. Both the calibration and the resolution can be considered one-dimensional measures of discriminatory power, and the BSS thus two-dimensional. The fact that the resolution term is larger than the calibration term in absolute terms undermines the BSS: a great improvement in calibration might be overlooked if the value of the resolution changed by the same amount. It might therefore be better to consider the two terms separately. Reliability diagrams give a visual representation of the Brier Skill Score and are considered in Section 6.4.6.
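The Brier Score and one common form of the Brier Skill Score, scaling with the variance term p(1 − p) of equation (6.10), can be sketched as follows; the observations are made up, and this is an illustration rather than the thesis's exact formulation:

```python
import numpy as np

# Sketch: Brier Score as the mean squared difference between forecast PDs
# and outcomes, and a Brier Skill Score scaled with the sample variance
# term p(1 - p) of equation (6.10).
def brier_score(y, p_hat):
    return np.mean((p_hat - y) ** 2)

def brier_skill_score(y, p_hat):
    p_bar = np.mean(y)
    return 1.0 - brier_score(y, p_hat) / (p_bar * (1.0 - p_bar))

y = np.array([0, 0, 0, 1, 1])               # fictitious default outcomes
perfect = brier_skill_score(y, y.astype(float))          # perfect forecasts
trivial = brier_skill_score(y, np.full(5, y.mean()))     # constant forecast
```

Perfect forecasts score 1, while the uninformative constant forecast scores 0, consistent with larger BSS values being desirable.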
Reliability diagrams, also referred to as calibration curves, show observed default rates against forecasted default rates. An example of a reliability diagram can be seen in Figure 6.5. The red line in the figure represents the observed default
frequency for the whole portfolio. The blue line is a diagonal line and represents the optimal line for the calibration curve. The black line then represents the observed calibration curve of a rating model. A well calibrated rating procedure would fall very close to the diagonal line. It is observable that there are six observations making up the calibration curve, which means that defaults were observed in six of the twelve rating classes.
                          Future Rating
Current Rating    A   Aa    B   Bb    C   Cc  Default
A                16   11    9    2    0    0    0
Aa                4    7    9    5    3    1    0
B                 1   11   15   15    9    5    1
Bb                0    3   14   19   13    8    3
C                 0    0    2    9   14    9    5
Cc                1    0    1    4    7    9    9
The current rating is listed in the rows and the future rating in the columns. The observed frequencies are generally accumulated along the main diagonal of the matrix. The cases that lie on the diagonal represent borrowers who did not migrate from their original rating over the observed time horizon. The more rating classes that are in use, the more frequently changes will be observed between ratings, and the lower the concentration along the diagonal. In order to calculate the transition probabilities it is necessary to convert the absolute numbers into row probabilities; each row should thus sum to one. Datschetzky et al. [13] make further suggestions on how to calculate the transition probabilities.
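The row-normalization step can be sketched directly with the migration counts from the table above; this is an illustrative sketch, not the thesis's code:

```python
import numpy as np

# Sketch: converting absolute migration counts into row-wise transition
# probabilities, so that each row of the transition matrix sums to one.
counts = np.array([
    [16, 11,  9,  2,  0,  0, 0],   # current rating A
    [ 4,  7,  9,  5,  3,  1, 0],   # Aa
    [ 1, 11, 15, 15,  9,  5, 1],   # B
    [ 0,  3, 14, 19, 13,  8, 3],   # Bb
    [ 0,  0,  2,  9, 14,  9, 5],   # C
    [ 1,  0,  1,  4,  7,  9, 9],   # Cc
], dtype=float)

P = counts / counts.sum(axis=1, keepdims=True)   # row probabilities
```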
Sun and Wang [29] state that stability analysis must take the time homogeneity of the transition matrix into account when analyzing whether the model results in a PIT or TTC rating system. The identity matrix I is a homogeneous matrix, and it can also be seen as a completely through-the-cycle (TTC) rating procedure. The deviation from homogeneity can be measured by defining a matrix P̃ representing the distance from the actual matrix P to the homogeneous matrix I,

P̃ = P − I (6.14)
Jafry and Schuermann [19] discuss various methods of measuring the deviation from homogeneity and propose a metric defined as the average singular value of a transition matrix, M_svd, described in equation (6.15), where λ_i denotes the ith eigenvalue. The singular values can be obtained using singular value decomposition, which makes the average singular value easy to compute from the resulting diagonal matrix D.

\[
M_{svd}(P) = \frac{1}{n} \sum_{i=1}^{n} \sqrt{\lambda_i(\tilde{P}' \tilde{P})} \qquad (6.15)
\]
For the identity matrix, which can be seen as a representative matrix for through-the-cycle (TTC) ratings, the resulting average singular value is zero. For completely point-in-time (PIT) ratings the average singular value is one. The scale is linear between those two values.
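The metric of equation (6.15) can be sketched in a few lines; this is an illustrative implementation under the definitions above, not the thesis's code:

```python
import numpy as np

# Sketch: the average singular value metric M_svd of equation (6.15),
# measuring the distance of a transition matrix P from the identity.
def m_svd(P):
    n = P.shape[0]
    P_tilde = P - np.eye(n)   # deviation from homogeneity, equation (6.14)
    # the singular values of P_tilde are the square roots of the
    # eigenvalues of P_tilde' P_tilde
    return np.linalg.svd(P_tilde, compute_uv=False).sum() / n
```

For the identity matrix (pure TTC) the metric is zero; for a matrix that completely reshuffles the ratings, such as a two-state swap, it reaches one.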
6.5 Discussion
It seems natural that banks would hold two rating models, one that could be considered PIT and another that is TTC. The PIT rating system could then consider macro-economic variables along with all the other variables and be updated frequently, while the TTC system would rely more on qualitative assessments along with key figures. This would, though, entail additional model development cost. Another possibility is to have a second rating scale with fewer possible ratings. That would, though, cause problems for firms having PDs on the edges of some PD interval, as they might then frequently change between grades.
It should be noted that information on the properties of the rating model is lost
in the calculation of CIER, AUC and all the other one-dimensional measures of
discriminatory power, so they have limited meaning as individual indicators in
the assessment of a rating model. This is perhaps best seen in the terms making
up the Brier Skill Score: different discriminatory power indicators might improve
for a specific model while others deteriorate.
Modeling Results
In previous chapters the modeling toolbox and the validation methods that are
used in the development process for a new credit rating model were presented. In
this chapter the most important findings of the development process of a new
credit rating model are presented. The development process is given a full
description in Section 2.3. The findings are presented in order of significance;
less important findings can be seen in Appendix B.
Firstly, the general performance of a logistic regression model using the same
variables as Rating Model Corporate (RMC) is compared to the performance of RMC
in Section 7.1. In Section 7.2 the results of principal component analysis are
reported. The resampling process that was used in most of the modeling is
introduced in Section 7.3. The performance of single-variable models can be seen
in Section 7.4 and the variable selection process in Section 7.5. The performance
of models using new parameters is introduced in Section 7.6 and discriminant
analysis is presented in Section 7.7. Finally, results for different link
functions can be seen in Section 7.8.
7.1 General Results
The aim of the thesis is to see whether logistic regression can outperform the
benchmark credit rating model, Rating Model Corporate (RMC), used in the
co-operating corporate bank. A logistic regression model using the same variables
as RMC is therefore constructed and predictions are made for the test set. The
test set consists of creditworthiness observations from 2007 and observations on
defaults from 2008. The total modeling set is used to construct the parameter
estimates for the predicting model. The rating performance of RMC, the logistic
regression model and the logistic regression model with subjective ratings can be
seen in Table 7.1.
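The fit-and-predict step described above can be sketched as follows. This is a minimal, self-contained illustration of maximum-likelihood logistic regression on synthetic data, not the thesis's actual model, variables or data:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit a logistic regression (logit link) by gradient ascent on the
    log-likelihood. Illustrative only: the thesis models default (y = 1)
    on financial key figures; here X and y are synthetic stand-ins."""
    X = np.column_stack([np.ones(len(X)), X])   # add intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # predicted PDs
        beta += lr * X.T @ (y - p) / len(y)     # score (gradient) step
    return beta

def predict_pd(beta, X):
    X = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-X @ beta))

# synthetic example: one key figure, higher values -> more likely to default
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (rng.random(500) < 1 / (1 + np.exp(-(2 * x - 1)))).astype(float)
beta = fit_logistic(x.reshape(-1, 1), y)
pd_hat = predict_pd(beta, x.reshape(-1, 1))     # PDs for a "test set"
```

In practice the fitted PDs would then be mapped through the bank's transformation to the 12-grade rating scale before comparison with RMC.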
Table 7.1: Performance statistics of RMC, the logistic regression model and
logistic regression with subjective ratings. High values are desired, except for
the Calibration.
Considering, for example, the PCA.stat [1] in Table 7.1, it can be seen that RMC
has a considerably lower score than the logistic regression model.
The model under the heading Subjective is the same logistic regression model,
except that the ratings are overwritten with the subjective ratings where they
are present [2]. The subjective ratings are used in RMC and it is thus
interesting to see whether they improve the performance of the logistic
regression model. PCA.stat and BSS indicate that the subjective ratings are an
improvement, but the AUC, Pietra and CIER statistics indicate otherwise. It is
debatable whether the subjective ratings are indeed improving the performance,
and judging from the large values of PCA.stat it is out of its comfort zone [3].
It would of course be optimal if a rating
[1] The PCA.stat discriminatory power statistic is presented in Section 7.2.
[2] The subjective ratings are the special rating opinions of credit experts who
feel that there are special conditions not captured by the rating model.
[3] See Section 7.2 for a discussion of this matter.
model would perform so well that the subjective ratings could be assumed
unnecessary. Further interesting observations can be made by comparing the
validation figures in Figure 7.1.
By considering the relative frequencies of the good cases of RMC and the LR
model, it can be seen that there is a considerable difference in distributional
shape between the two models. RMC has a normal-like distribution with a somewhat
heavier tail towards rating one. The logistic regression has a totally different
distribution, as it is almost steadily increasing from one to twelve. It is
likewise interesting to view the distribution of bad cases, that is defaults: it
can be seen, compared to the frequencies observed earlier in Figures 4.8 and 4.9,
that there are quite a few observed defaults with relatively high credit ratings.
It is also worth noting that the logistic regression model has defaults up to
rating 9, whereas RMC only has defaults up to rating 7. Although one might
consider this a negative thing for the LR model, it is not; on the contrary, the
center of the LR distribution is approximately 9 against approximately 6 for RMC,
as can be seen by viewing the cumulative frequencies. The LR model puts the whole
scale to better use: it is for example difficult to argue what the difference is
between ratings 9 and 12 in RMC, whereas there is obviously a much greater
difference for the LR model.
By comparing the ratings of RMC to the relative ratings obtained from the LR
model, the difference in distributions is observable. From Table 7.2 it can be
seen that the LR model generally gives considerably higher credit ratings. It is
interesting that most ratings higher than 8 in RMC are given the rating 12 in the
LR model. The higher ratings are a result of the lower probabilities of default
observed from the LR model. The probabilities of default obtained from the LR
model are largely dependent on the general default rate of the modeling set. It
is thus possible to manipulate the general PDs; higher PDs could be obtained by
removing some of the non-default firms. Another possibility would be to
reconsider the transformation that transforms the PDs into risk ratings.
It is interesting to note, as can be seen from Table 7.1, that the Calibration of
the models is higher than the Resolution, which is totally different from the
results of previous years. This results in a negative BSS, and it is quite
interesting to view the reliability diagrams of the two models, which can be seen
in Figure 7.2. Considering first the reliability diagram of RMC, it is clear that
the model is poorly calibrated, as the calibration curve does not lie around the
diagonal line. The reliability diagram of the LR model is somewhat better
calibrated and can be seen in Figure 7.3.
[Figure: panels showing the relative and cumulative frequencies of good and bad
cases, over ratings 1-12, for the LR rating model and for RMC.]
Figure 7.1: Validation plot. Compares the performance of RMC and a logistic
regression model.
[Figure: reliability diagram of RMC; observed default rate on a logarithmic
scale.]
Figure 7.2: Reliability diagram of Rating Model Corporate. Shows the observed
default rate of each class against the forecasted default rate of the respective
class. The black line is the calibration curve, the red line is the observed
default rate of the entire portfolio and the blue line represents the optimal
line.
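A reliability diagram of this kind can be built from three columns: the rating class, the forecasted PD and the default indicator. A minimal sketch (the function name and all numbers below are hypothetical, not the bank's portfolio):

```python
import numpy as np

def reliability_points(ratings, forecast_pd, defaulted):
    """For each rating class, return (mean forecasted PD, observed default rate).

    A well-calibrated model yields points close to the diagonal when
    observed rates are plotted against forecasted rates."""
    ratings = np.asarray(ratings)
    forecast_pd = np.asarray(forecast_pd, dtype=float)
    defaulted = np.asarray(defaulted, dtype=float)
    points = {}
    for r in np.unique(ratings):
        mask = ratings == r
        points[int(r)] = (forecast_pd[mask].mean(), defaulted[mask].mean())
    return points

# three hypothetical rating classes, four obligors each
pts = reliability_points(
    ratings=[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    forecast_pd=[0.20, 0.22, 0.18, 0.20, 0.05, 0.06, 0.05, 0.04,
                 0.01, 0.01, 0.02, 0.01],
    defaulted=[1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
)
```

Plotting the resulting pairs on log-scaled axes, together with the diagonal and the portfolio-wide default rate, reproduces the structure of the diagram above.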
         LR rating
RMC    1   2   3   4   5   6   7   8   9  10  11  12
5      1   1   3  14  33  96 101  75  54  34   9  15
6      0   1   3   6  20  47  88 131 133  72  53  46
7      0   0   1   1   6  16  31  69 106 101 101  91
8      0   0   0   1   0   2  16  19  25  50  72 144
9      0   0   0   1   0   2   1   4  18  18  22 125
10     0   1   0   0   0   0   0   0   6   4   9  58
11     0   0   0   0   0   0   0   0   0   0   1  30
12     0   0   0   0   0   0   0   0   0   0   0   9
Table 7.2: Rating comparison matrix, comparing the ratings of RMC (rows) to the
ratings of the logistic regression model (columns).
Tables 7.3 and 7.4 show the transition matrices of RMC and the LR model,
respectively. The transition matrices show the change in ratings from the ratings
of 2007 to the current ratings of 2008. Both matrices have the highest rates
along the diagonal, as expected. There is an obvious difference between the two
transition matrices, as the one for RMC has the highest density in the middle of
the matrix, while the LR transition matrix has the highest density in the lower
right corner. It is also observable that the LR model has a few entities that had
a high rating in 2007 and have had a major downfall since. That is a clear
disadvantage, as the bank would like to believe that a firm with a high rating
would also have a high rating a year later. As a measure of rating stability the
average singular value is calculated [4]. The average singular value of RMC is
calculated as 0.6519, so RMC can be considered a 65% point-in-time (PIT) rating
system. The average singular value of the LR model is calculated as 0.7135, so
the LR model can be considered a 71% point-in-time (PIT) rating system, i.e. a
more point-in-time rating procedure than RMC.
The main conclusion to be drawn from the results in this section is clear: the
logistic regression model outperforms the heuristic model RMC.
[Figure 7.3: Reliability diagram of the logistic regression model; observed
default rate on a logarithmic scale, with lines as in Figure 7.2.]
          2008
2007    1   2   3   4   5   6   7   8   9  10  11  12
1      16  11   9   4   1   3   0   1   0   0   0   0
2       4   7   9   9   3   1   2   1   0   0   0   0
3       1  11  35  30  17   5   1   1   0   0   0   0
4       4   4  24  69  54  28   8   5   2   0   0   0
5       4   3   8  34 117  94  37  12   3   2   0   0
6       2   3   4  26  70 176 120  34   8   0   0   0
7       2   1   0   6  15  61 149  85  26   1   0   0
8       0   1   1   2   5  13  43  99  45   7   1   0
9       0   1   0   0   1   4  19  23  41  24   1   0
10      0   0   1   0   0   0   0   6   3  20   5   1
11      0   0   0   0   0   0   0   2   0   2   2   0
12      0   0   0   0   0   0   0   0   0   0   4   3
Table 7.3: Transition matrix, showing the changes in RMC ratings from 2007 (rows)
to 2008 (columns).
          2008
2007    1   2   3   4   5   6   7   8   9  10  11  12
1       3   2   3   1   1   2   1   0   0   0   0   0
2       0   2   3   3   2   1   0   1   0   0   0   0
3       2   4   6   3   8   0   3   0   1   0   0   0
4       0   0   8  17  17   6   3   1   2   0   1   1
5       1   0   5   4  15  19  11   9   3   1   0   0
6       0   0   3   4  14  36  26   9  13   6   3   0
7       0   0   0   2  10  14  31  24  14   8   7   3
8       0   0   0   0   5  14  21  52  33  20   8   7
9       0   0   1   0   3   3  17  38  65  43  23  13
10      0   0   1   0   1   1   5  18  31  47  40  21
11      1   1   1   1   0   2   2   9  16  28  51  45
12      0   0   0   2   4   1   5   5  10  30  43 235
Table 7.4: Transition matrix, showing the changes in LR ratings from 2007 (rows)
to 2008 (columns).
7.2 Principal Component Analysis
Table 7.5: List and description of the different principal component analyses
that were done.
The general results of PCA I-VI are discussed in Appendix B. The performance of
the principal component representatives of PCA I-VI can be seen in Section 7.4.
The PCA of the discriminatory power indicators is, however, of more interest and
is discussed in full in Section 7.2.1.
The fact that there is no single numerical measure of model performance makes the
validation of a rating procedure a difficult task. In order to address this
problem, PCA is performed on a set of discriminatory power indicators to reduce
the dimension of the variables taken into consideration. The PCA is performed on
numerous discriminatory power indicators and the first principal component
representative is then considered as a single numeric measure of discriminatory
power. In order to explain this in more detail, it is important to understand
what is going on inside the PCA.
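To make the construction concrete, here is a sketch (not the thesis's code) of how such a first-principal-component score can be computed: standardize the indicators and project onto the first right singular vector. The indicator values below are invented for illustration:

```python
import numpy as np

def pca_stat(dpi_matrix):
    """First principal component score of standardized indicators.

    Rows are models, columns are discriminatory power indicators
    (e.g. AUC, Pietra, CIER, BSS). The score has mean zero over the
    sample used to fit the PCA."""
    X = np.asarray(dpi_matrix, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize columns
    # principal directions are the right singular vectors of the centered data
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    w = vt[0]
    if w.sum() < 0:            # fix the sign so higher DPIs give a higher score
        w = -w
    return Z @ w               # PC1 score, one value per model

# three hypothetical models scored on (AUC, Pietra, CIER, BSS)
scores = pca_stat([
    [0.85, 0.60, 0.30, 0.10],
    [0.90, 0.68, 0.50, 0.14],
    [0.80, 0.55, 0.25, 0.08],
])
```

When the indicators are strongly positively correlated, as in Figure 7.4, the weights come out close to equal and the score behaves like an average of the standardized indicators.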
There are quite a few interesting things to be seen in Figure 7.4. First, it is
possible to see that the AUC and the Gini Index are completely correlated, as
expected. Perhaps more surprisingly, the Pietra Index is strongly correlated with
the AUC and the Gini Index, while the CIER indicator is not that correlated with
the other indicators, except perhaps with Resolution and BSS. The Resolution and
BSS are in turn very correlated, leading to the conclusion that the calibration
measure has little leverage in the BSS, due to the relative difference in size
between the calibration and the resolution, as the resolution is generally
considerably larger.
The variance measure is only dependent on the default rate of the sample and is,
as preferred, mostly uncorrelated with the other discriminatory power indicators.
The Brier score is, however, quite correlated with the variance, and it is thus
easy to see why the BSS should be preferred to the Brier score.
Recall that the Calibration is a measure of how well calibrated the model is,
i.e. a small difference between forecasted and observed default rates is desired.
As can be seen in Figure 7.4 there is no considerable correlation between the
Calibration and the other indicators, so it is possible to conclude that no other
indicator describes the calibration. The Brier indicator is almost completely
uncorrelated with the Calibration indicator, further undermining the usability of
the Brier indicator.
It is apparent from Table 7.6 that the first principal component describes most
of the variance of the four indicators. The first principal component will be
referred to as PCA.stat when reporting model performance. It is then interesting
[Figure 7.4: pairs plot of the discriminatory power indicators AUC, Pietra,
CIER, Gini, Variance, Calibration, Resolution, Brier and BSS, showing their
pairwise scatterplots and correlations (e.g. AUC-Gini 1.00, Resolution-BSS
0.97).]
Table 7.6: The rotation of variables and summary of the principal component
analysis of the discriminatory power indicators.
It is worth noting that the average value of PCA.stat is zero for the sample that
was used in the principal component analysis. Models that perform better than
average get a positive value and models performing worse get a negative value for
PCA.stat. By analyzing the range of the first principal component representative
in Figure 7.5, it is observed that most of the values lie in the range [-4, 4].
As PCA is a linear transformation, it assumes linear relationships in the data.
As can be seen from the dotplots in Figure 7.4, the relationships between the
variables considered in the PCA are relatively linear. The relationships between
the DPIs outside the range considered in Figure 7.4 might, however, be
non-linear. Values of PCA.stat outside the range [-4, 4] must thus be considered
with care.
A problem with the use of PCA.stat could be user acceptance, as those with less
statistical background might reject its use. It is thus worth noting that
PCA.stat can be considered a weighted average of the standardized DPIs. The
weights can be seen in Table 7.6 under the heading PC1, and as they are close to
equal, PCA.stat is almost the simple average of the standardized DPIs. The term
standardized refers to the procedure that makes it possible to compare variables
of different scales. Standardization is usually performed by subtracting the mean
from all observations and dividing by the sample standard deviation. This
standardization can be thought of as converting apples and oranges into cash in
order to compare them. After extensive use of PCA.stat, the conclusion was made
that it is indeed a good single indicator of model performance, and no mismatches
were observed in its use.

[Figure 7.5: pairs plot of the discriminatory power indicators and the first
principal component (PCA1), with pairwise correlations.]
Following the consideration of whether the calibration term gets lost in the
calculation of BSS, an additional PCA was performed using the four DPIs from the
previous PCA along with the Calibration. The results of the PCA with five DPIs
can be seen in Table 7.7.
Table 7.7: The rotation of variables and summary of the principal component
analysis of five discriminatory power indicators.
From Table 7.7 it can be seen that the weight for the Calibration is somewhat
smaller than the other weights. The proportion of variance that the first
principal component describes is also somewhat smaller than for the PCA with only
four DPIs. As can be seen in Figure 7.6 the correlations are similar in most
cases, except for the Calibration, where the correlation is somewhat larger than
observed when only four DPIs were used. It is difficult to draw any strong
conclusions from the comparison of the two PCAs. The decision to go with the PCA
using only four DPIs was made as it was considered the safer choice.

7.3 Resampling Iterations
In order to estimate how many resamplings are necessary to get a stable measure
of the actual model performance, the model performance of RMC was considered;
when the performance statistic and its standard deviation have stabilized, a
sufficient number of resamplings can be assumed. To save computation time only
the AUC discriminatory power indicator is considered, i.e. the mean over all
samples and the respective standard deviations.
[Figure 7.6: pairs plot of AUC, Pietra, CIER, Calibration, Resolution, Brier,
BSS and the first principal component of the five-DPI PCA (PCA2), with pairwise
correlations.]
The performance of RMC and of a randomly chosen model with the scaled solvency as
a single variable, for 30, 40, 50, 60 and 80 resampling iterations, can be seen
in Table 7.8.
Table 7.8: Performance of RMC and of the model with the solvency score as a
variable, for 30, 40, 50, 60 and 80 resampling iterations.
Considering the results in Table 7.8 it is apparent that the mean AUC is in all
cases stable to two decimal places, whereas the standard deviation is only stable
to the same degree after 50 iterations. Even though this analysis is not
extensive, it is considered enough to conclude that 50 iterations give a fair
estimate of the actual model performance. It is important to note that no
significant correlation was observed between sample size and model performance,
strengthening the belief that splits with differences in default rates of no more
than ±10% can all be considered equally good.
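The resampling procedure can be sketched as repeated random splits with the AUC computed on each validation sample; the Mann-Whitney formulation of the AUC is used here. This is an illustrative reconstruction on synthetic data, not the thesis's original code:

```python
import numpy as np

def auc(scores, defaulted):
    """AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen defaulter gets a higher score (PD) than a randomly chosen
    non-defaulter, counting ties as one half."""
    scores = np.asarray(scores, dtype=float)
    defaulted = np.asarray(defaulted, dtype=bool)
    bad, good = scores[defaulted], scores[~defaulted]
    greater = (bad[:, None] > good[None, :]).sum()
    ties = (bad[:, None] == good[None, :]).sum()
    return (greater + 0.5 * ties) / (len(bad) * len(good))

def resampled_auc(scores, defaulted, n_iter=50, test_frac=0.3, seed=0):
    """Mean and standard deviation of the AUC over repeated random splits."""
    scores = np.asarray(scores, dtype=float)
    defaulted = np.asarray(defaulted, dtype=bool)
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_iter):
        idx = rng.permutation(len(scores))[: int(test_frac * len(scores))]
        # skip degenerate splits with no defaulters or no non-defaulters
        if 0 < defaulted[idx].sum() < len(idx):
            aucs.append(auc(scores[idx], defaulted[idx]))
    return np.mean(aucs), np.std(aucs, ddof=1)

# synthetic portfolio: defaults are more likely where the forecasted PD is high
rng = np.random.default_rng(1)
pd_hat = rng.random(400)
y = rng.random(400) < pd_hat
mean_auc, sd_auc = resampled_auc(pd_hat, y)
```

Tracking the mean and standard deviation as n_iter grows reproduces the stabilization check summarized in Table 7.8.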
It is also interesting to consider, against the results in Table 7.8, that
Datschetzky et al. [13] note that for an empirical dataset the upper bound of the
AUC is approximately 0.9. The performance of Rating Model Corporate (RMC) is thus
very good, and there is not much room for improvement. It is, however, difficult
to draw conclusions on this matter, as there might be various reasons for the
good performance of RMC; for instance the economic situation in the years under
consideration has been considered good, and this particular loan portfolio might
be more conservative than those of the general banks considered in Ong [26].
Having considered how many resampling iterations should be performed, the number
of 50 iterations was chosen.

7.4 Performance of Individual Variables

To evaluate the performance of individual variables, one-variable models are
constructed and their performance over 50 iterations is documented. The results
of numerous univariate models with quantitative key figures as variables are
considered in Table 7.9. Only the AUC discriminatory power indicator is
considered in order to save calculation time; the average AUC and respective
standard deviations are listed in Table 7.9.
Table 7.9: Performance of single-variable models for the quantitative key
figures. Model 10 is, however, a multivariate model considering all the scaled
quantitative key figures.
Starting from the top of Table 7.9, first is the performance of RMC, for
comparison. Models 2-9 then show the performance of the quantitative key figures.
It is clear that the sector-relative key figures outperform the simple
firm-specific key figures by quite a margin. Considering the scaled key figures,
the solvency has the best performance, then the debt, closely followed by the
return, whereas the liquidity has the least discriminatory power. Model 10 then
considers the sum of all the quantitative key figures and, unsurprisingly,
outperforms all the single-parameter models.
The results of models considering the qualitative figures as variables can be
seen in Table 7.10. In models 10-15 the performance of each individual
qualitative figure is considered. It can be seen that the refunding variable has
the most predictive power, followed by the risk assessment of the credit experts.
The management, stability and position variables have medium performance, whereas
the situation variable shows the least performance. The first principal component
of the qualitative figures performs well and the second principal component has
some predictive power. The sum of all qualitative figures shows the best
performance, closely followed by the sum of the first two principal components.
It is, however, interesting to see that the standard deviation of
Table 7.10: Performance of single-variable models for the qualitative figures.
Models 18 and 19 are, however, multivariate models considering all the
qualitative figures.
Table 7.11: Performance of single-variable models for the categorical variables.
The performance of the customer factors can be seen in Table 7.11. The sum of the
numeric values of the factors, as they are used in RMC, performs quite well by
itself. Interestingly, the factor telling which sector the firm belongs to has
some predictive power. Viewing that model in more detail, it is apparent that the
real estate sector is the least risky, then trade, transport and service, and
finally, by far the riskiest, the industry sector. Another interesting point from
Table 7.11 is that the obligation factor outperforms the annotation and age
factors by some margin.
In Table 7.12 the performance of some of the principal components of the various
PCAs can be seen. The first six models, models 25-30, can be seen as pairs of the
first and second principal components of different PCAs of the quantitative key
figures.

Table 7.12: Performance of single-variable models for the principal components
of different PCAs of the quantitative key figures. A combined PCA for both
qualitative and quantitative figures is considered in models 31-38.

The first pair, models 25 and 26, gives the results for a regular PCA of the
unscaled quantitative key figures. Models 27 and 28 show the results when
separate PCAs were performed on the observations from each sector; that was done
in order to account for the variance between sectors. The performance of the
scaled quantitative key figures can be seen in models 29 and 30. Models 27 and 28
clearly perform better than models 25 and 26, so making separate PCAs for
different sectors results in great improvements compared with a single PCA for
all observations. The first principal component of the scaled key figures has the
greatest discriminatory power of models 25-30. It is then interesting to see that
the second principal component of the scaled key figures has no predictive power.
The results of models 31-34 are for models using the first four principal
components of a PCA using both qualitative and quantitative variables. The
performance of model 31 barely matches the performance of model 16, which only
uses the qualitative figures. The other principal components have no real
predictive power. It can thus be concluded that this is not the way to go.
The results of models 35-38 are for models using the first four principal
components of a PCA using both qualitative and scaled quantitative variables. The
performance of the first principal component is the best of the models presented
up to this point. There is some limited predictive power in the second and fourth
principal components.
There are some variables available that are not used in RMC. They are listed
in Table 7.13.
Table 7.13: Performance of single-variable models for variables that are not used
in RMC.
As can be seen from Table 7.13, the KOB rating system performs well, but clearly
not as well as RMC. It is interesting to see that the minimum earlier ratings
outperform the earlier maximum ratings. From this the conclusion can be drawn
that a more conservative model would perform better. It might thus be worth
considering introducing a special rule in the model that would make it harder for
ratings to go up than to go down. It is also observable from Table 7.13 that the
equity has some predictive power, indicating that size matters, that is, if the
value of the equity is considered a measure of size.
In a paper by Behr and Güttler [7] it is reported that, apart from being good
individual indicators, positive growth rates of the solvency ratio and the
return-on-sales ratio reduce the default risk of firms. This result gives reason
to analyze the performance of the change in the quantitative key figures. The
analysis requires information on a firm's key ratios in three successive years;
to construct one complete dataset, e.g. for 2006, data from 2004, 2005 and 2006
are required. With the 2008 data available it was possible to construct three
complete datasets. The performance of the change in the scaled key figures can be
seen in Table 7.14.
From Table 7.14 it is clear that the change in the solvency ratio is the only one
with some limited predictive power. It is worth remembering that the return ratio
measures the return on total assets, not the return on sales. As this analysis
was performed late in the process, the change in the solvency ratio is not used
in any further modeling.
Table 7.14: Performance of models with the change in the scaled key figures as
variables.
7.5 Performance of Multivariate Models

From the results presented in the previous tables, the next step is to analyze
some of those variables together with other variables, in such a way that a
decisive conclusion can be reached about which variables to use in the model and
which ones need not be considered further.
Regular stepwise regression does not work for this problem, as it is desired to
have the same variables in all resamplings; the selection can thus not be done
inside the resampling loop. The reason is that for one split into training and
validation sets a variable might be included, and then excluded for a different
split. The process of adding one variable at a time is therefore used: if a
variable improves the model it is included in further analysis. There is a
problem with this procedure, as it is hard for variables to be excluded from the
model that is, at the time, the best model. It is thus up to the modeler to
decide whether an attempt should be made to exclude an existing variable. After
the introduction of PCA.stat the variable selection process could, however, be
automated.
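The forward selection just described can be sketched as a greedy loop once a single performance number, such as the mean PCA.stat over all resamplings, is available. The score function below is a hypothetical stand-in for the full resampled model evaluation:

```python
def forward_select(candidates, score_fn, min_gain=0.0):
    """Greedy forward selection: variables are added one at a time and a
    variable is kept only if it improves the model score by more than
    min_gain. Variables, once included, are not removed, mirroring the
    procedure described in the text.

    score_fn(subset) is assumed to return a single performance number
    (e.g. mean PCA.stat); higher is better."""
    selected = []
    best = float("-inf")
    improved = True
    while improved:
        improved = False
        for var in [c for c in candidates if c not in selected]:
            trial = score_fn(selected + [var])
            if trial > best + min_gain:          # best variable this round
                best, best_var, improved = trial, var, True
        if improved:
            selected.append(best_var)
    return selected, best

# toy score: counts how many genuinely predictive variables are included,
# penalising model size slightly (names and values are invented)
useful = {"solvency", "debt"}
score = lambda subset: len(useful & set(subset)) - 0.01 * len(subset)
vars_, best = forward_select(["solvency", "debt", "liquidity", "return"], score)
```

With a deterministic score like this the loop picks exactly the useful variables and stops; with a resampled score in practice, a small positive min_gain guards against adding variables on noise.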
In Section 7.4 the performance of each individual variable was discussed in full.
In this section the performance of different combinations of variables is
introduced, by bringing the principal component analysis into the modeling.
The results in Table 7.15 are a summary of the many result tables that can be
seen in Appendix B. Models I-III consider different methods of dealing with the
key figures, and it seems that the sector-wise PCA of the unscaled key figures
outperforms, by some margin, both the sum of all scaled key figures and the first
principal component of the scaled key figures. From these results it is
interesting to compare models VII, X and XI, which are the same models as I-III
except that they all include the pc1 (ϕ) variable. Then the PCA of the unscaled
key figures is no longer performing best; it is actually performing worst. Model
VII performs best of the three models VII, X and XI.
Models IV-VI model the qualitative figures in three different ways. Comparing
models IV and V, it is clear that the second principal component representative
of the qualitative figures does not improve the model. Model VI performs by far
the best of the three models under consideration; it does, however, have by far
the highest standard deviation.
Models XI-XIV consider different mixtures of the first two principal component
representative variables of the independent PCAs of the qualitative and
quantitative figures. Comparing models XI and XII, it is clear that there is
almost no improvement from the second principal component representative of the
scaled qualitative figures. Model XIII performs best and, interestingly, model
XIV has the worst performance.
At one point in time model IX seemed to have the best performance, and thus
customer factors were introduced in models XV-XVII. All the customer factors
improve the model by some margin. It is, however, observable from Table 7.15 that
models VIII and XVIII outperform model IX.
Model XVIII clearly has the best performance of the models in Table 7.15 that
only consider the quantitative and qualitative figures. In models XV-XVII the
customer factors are introduced and, interestingly, it seems that the age factor
no longer has predictive power.
The models presented in Table 7.15 are not the only models tested, but rather are
given as an example of how the model selection process worked. No higher-order
relationships were observed; higher order here refers to products of variables.
7.6 Addition of Variables

In this section variables that are not used in RMC are included in the model with
the best performance up to this point. The variables in question are the KOB
score, the maximum and minimum earlier ratings, and the equity. The performance
of each of these variables can be seen in Table 7.13, and even though the earlier
ratings show quite a good performance as single variables, it is the author's
opinion that earlier ratings should not be used as variables, as that would
reduce the robustness of the model. The performance of the earlier ratings was
nevertheless recorded and, to make a long story short, neither of the earlier
ratings was able to improve the performance of model XXII. The same result was
observed for the equity.
Including these variables in the analysis results in a somewhat smaller dataset
than the complete dataset, as these variables include some missing values. The
results obtained when the KOB score is included in model XXII can be seen in
Table 7.16. It is clear that some of the predictive power of the KOB score is not
modeled in RMC, and vice versa. From these results it is possible to conclude
that there is room for improvement in the modeling process; a model could be
considered very good if it were not possible to improve it by including the KOB
score. The room for improvement could be filled by including new variables. The
problem is that collecting new quantitative variables from earlier years is a
massive project, which explains the lack of experiments with new variables, as
they are not available in the co-operating bank's database.
The models in Table 7.17 both include the KOB rating as a variable. The model to
the left has subjective ratings overwriting the predicted ratings. It is
interesting that including the subjective ratings gives almost no improvement,
which is very desirable. The model to the right has double weights on the
defaulted observations; the idea behind that attempt was to make a more
conservative model. As can be seen from Table 7.17, the performance drops
significantly, the major influence being the CIER indicator. It is thus concluded
that weighted analysis is not the way to go.
ŷ ∼ pc1(α̃) + Σϕ    ŷ ∼ pc1(α̃) + Σϕ
RMC    +γo + γaa + ςk    +γo + γaa + γa + ςk
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.88879 0.01759 0.90878 0.02231 0.90946 0.02221
Pietra 0.61418 0.04502 0.68886 0.04866 0.69414 0.04829
CIER 0.28553 0.10688 0.52570 0.06470 0.54776 0.06591
BSS 0.08782 0.01946 0.13986 0.01978 0.14120 0.02118
PCA.stat -0.08614 1.55514 3.37185 1.48053 3.56527 1.50515
AIC 564.401 20.3699 564.636 19.8915
Pseudo R2 0.38769 0.02196 0.39197 0.02152
Table 7.16: Model performance with the KOB rating included as a variable.
ŷ ∼ pc1(α̃) + Σϕ    ŷ ∼ pc1(α̃) + Σϕ
+γo + γaa + ςk & ςs    +γo + γaa + ςk & w2
DP Indicator Mean Std.dev. Mean Std.dev.
AUC 0.91349 0.02021 0.90873 0.01855
Pietra 0.70955 0.04764 0.69545 0.04584
CIER 0.55151 0.06946 0.04294 0.10642
BSS 0.13995 0.02272 0.10126 0.02051
PCA.stat 3.78365 1.52714 0.43464 1.59559
AIC 564.636 19.8915 903.181 35.7216
Pseudo R2 0.39197 0.02152 0.42616 0.02273
Table 7.17: The model to the left has subjective ratings overwriting the pre-
dicted ratings. The model to the right has additional weights on the defaulted
observations. Both models include the KOB rating as a variable. The & indicates
that the variables following it are applied heuristically.
ŷ ∼ pc1 (α̃) + pc2 (α̃) ŷ ∼ pc1 (α̃) + pc1 (ϕ) ŷ ∼ pc1 (α̃) + pc2 (α̃)
+pc1 (ϕ) +pc2 (ϕ) +pc1 (ϕ) + pc2 (ϕ)
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.85973 0.02346 0.86957 0.02202 0.86799 0.02268
Pietra 0.58314 0.04719 0.60452 0.04426 0.59574 0.04252
CIER 0.24045 0.13268 0.24499 0.13586 0.25094 0.13530
BSS 0.04849 0.02020 0.05035 0.01927 0.05246 0.02020
PCA.stat -2.12909 1.75003 -1.65832 1.70556 -1.69990 1.69341
From the comparison of Tables 7.19 and B.3 it can be seen that the LDA out-
performs the logistic regression by quite a margin. The drawback of the LDA
is that it is impossible to include the customer factors in the model directly. That can,
though, be done by applying the customer factors in a heuristic procedure. The
heuristic procedure was performed in such a way that the final rating was down-
graded by one or two notches if the customer factors were indicating negative factors.
In Table 7.20 the results when the accountant's annotations and subjective ratings
are included are shown.
ŷ ∼ pc1 (α̃) + pc1 (ϕ) ŷ ∼ pc1 (α̃) + pc1 (ϕ) ŷ ∼ pc1 (α̃) + pc1 (ϕ)
+pc2 (ϕ) & ςs +pc2 (ϕ) + γaa +pc2 (ϕ) + γaa & ςs
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.87784 0.02211 0.87133 0.02249 0.87908 0.02244
Pietra 0.62879 0.04580 0.60339 0.04423 0.62754 0.04562
CIER 0.30502 0.12192 0.25010 0.13791 0.29537 0.12446
BSS 0.06372 0.01810 0.05825 0.02312 0.07081 0.02140
PCA.stat -0.66560 1.62737 -1.41711 1.81301 -0.51832 1.72214
By comparing the PCA.stat of the three models in Table 7.20 with the PCA.stat
of the middle model in Table 7.19, it is clear that both the accountant's annota-
tions and the subjective ratings improve the model. It is also noticeable that the
subjective rating improves the performance by a greater margin than the accoun-
tant's annotation. It is also interesting to see that when customer factors were
introduced in Section 7.5 it resulted in a jump in model performance, whereas when
further customer factors were included in the heuristic procedure, it reduced
the model performance. The conclusion is that linear discriminant analysis is
not likely to outperform logistic regression, given its prerequisite of normally
distributed explanatory variables.
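The heuristic downgrading step can be sketched in a few lines. This is an illustration only: the factor names, the notch logic and the orientation of the rating scale (higher class assumed safer) are assumptions, not the thesis's implementation, which was done in R.

```python
# Illustrative sketch of the heuristic downgrade described above; the factor
# names and the scale orientation (higher rating class = safer) are assumptions.
def heuristic_rating(base_rating, failed_obligations, accountant_annotation,
                     worst=1):
    """Downgrade the model rating by one notch per negative customer factor,
    never going below the worst rating class."""
    downgrade = int(failed_obligations) + int(accountant_annotation)
    return max(base_rating - downgrade, worst)

print(heuristic_rating(5, failed_obligations=True, accountant_annotation=True))
```

The point of keeping the rule this simple is that the customer factors never enter the LDA itself; they only adjust its output.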
An extensive attempt was made to model the support vector machine without
great success. This is given a brief discussion in Section B.3 in Appendix B.
7.8 Link functions

As can be seen in Section 5.2.2, there are several available link functions. In this
section the performance of the different link functions is discussed. Porath
[27] reports that the complementary log-log link function is the most suitable
link function when modeling default probabilities. The results can be seen in
Table 7.22, and from them it is clear from the PCA.stat indicator
that the complementary log-log link function has the best performance.
The complementary log-log link function was observed to have some conver-
gence problems; that is, in some cases many iterations were needed to get stable
estimates of the parameters. In order to save time the complementary log-log link
function was thus not used in other analyses; the logit link was used in all other
analyses unless otherwise noted. It is though important to note
that the complementary log-log link function is especially well suited for model-
ing default data. Other links were tried but were subject to severe convergence
problems and lack of performance, and are thus not given further discussion.
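The asymmetry that makes the complementary log-log link attractive for rare-event data can be seen directly from the inverse links. The following is a self-contained Python illustration of the link functions themselves, not the fitted models (the thesis's computations were done in R):

```python
# The logit inverse link is symmetric about p = 0.5; the complementary
# log-log inverse link is asymmetric, approaching 1 faster than it leaves 0,
# which suits modeling a rare event such as default.
import math

def inv_logit(eta):
    """Inverse logit link: p = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_cloglog(eta):
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

for eta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"eta={eta:+.1f}  logit={inv_logit(eta):.4f}  cloglog={inv_cloglog(eta):.4f}")
```

Note that inv_logit(0) = 0.5 while inv_cloglog(0) ≈ 0.632: the cloglog curve is not centered, which is exactly the asymmetry referred to above.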
Chapter 8
Conclusion
This chapter contains a short summary of the results found in the thesis, in Section
8.1. Suggestions for possible further work related to the work done in this thesis
are discussed in Section 8.2.
8.1 Summary

In this thesis many different aspects of the development process of a new credit
rating model were considered. The main aspects are: modeling procedures,
variable performance, the variable selection procedure and the validation process.
Various methods are available to model the default event, and some of the best
suited models for the problem are statistical models. The most appropriate
statistical models are those that can produce individual probabilities of
whether a certain firm will default or not. Of the modeling procedures tried,
logistic regression was seen to be the most practical procedure to
model default probabilities. That conclusion follows from the smooth transition from
creditworthiness data to probabilities of default.
The linear and quadratic discriminant analysis methods have a clear lack of
generality for the modeling of credit default, as they require normality of the
explanatory variables.
The amount of data available cannot be considered optimal, in two senses.
First, the number of defaults is rather limited and only three years can
be considered in the modeling process. This problem was addressed by performing
recursive resampling of the modeling data and considering the average perfor-
mance over 50 resampling iterations. Secondly, the lack of different quantitative
key ratios made the variable selection analysis very limited. The credit rating
score of a credit rating agency, called the KOB score, showed a significant increase in
model performance. From this it is possible to conclude that there is definitely
room for improvement that could be filled by including variables that were not
available in the available data.
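The resampling idea can be sketched as follows. This is an illustrative Python sketch with hypothetical scores and default flags, not the thesis's R code; it averages a rank-based AUC over bootstrap resamples to stabilize the estimate for a small dataset:

```python
# Stabilizing a performance estimate for a small dataset by averaging
# over resampling iterations. Data and scoring values are hypothetical.
import random

def auc(scores_bad, scores_good):
    """Probability that a defaulter's risk score exceeds a non-defaulter's
    (ties count half) -- the area under the ROC curve."""
    wins = sum((b > g) + 0.5 * (b == g)
               for b in scores_bad for g in scores_good)
    return wins / (len(scores_bad) * len(scores_good))

def resampled_auc(data, iterations=50, seed=1):
    """Average AUC over bootstrap resamples of a (score, defaulted) dataset."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(iterations):
        sample = [rng.choice(data) for _ in data]   # resample with replacement
        bad = [s for s, d in sample if d]
        good = [s for s, d in sample if not d]
        if bad and good:                            # need both classes present
            estimates.append(auc(bad, good))
    return sum(estimates) / len(estimates)

# Hypothetical data: higher score = riskier; 1 marks a default.
data = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.5, 0),
        (0.4, 0), (0.3, 0), (0.2, 0), (0.1, 0), (0.85, 1)]
print(round(resampled_auc(data), 3))
```

The standard deviations reported alongside the means in the tables of Chapter 7 and Appendix B come from exactly this kind of repetition.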
The validation of credit rating models seems to lack a single numerical measure
of model performance. This causes great problems in model development,
and thus a new measure, called PCA.stat, is suggested. The PCA.stat is not
really a new measure, as it is a principal component representative of some
selected discriminatory power indicators. With one numerical measure of model
performance, variable selection and model development in general become
much more efficient.
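The construction behind PCA.stat can be sketched in a few lines. The indicator values below are hypothetical and the exact scaling may differ from the thesis's construction; the point is that the first principal component of the standardized indicators yields one score per model:

```python
# Sketch of the PCA.stat idea: summarize several discriminatory power
# indicators by each model's score on their first principal component.
# Indicator values below are hypothetical.
import math

def first_pc_scores(rows):
    """Rows are per-model indicator vectors (e.g. AUC, Pietra, CIER, BSS).
    Returns each row's score on the first principal component of the
    standardized indicators, found by power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    # covariance of standardized columns (= correlation matrix)
    c = [[sum(z[i][a] * z[i][b] for i in range(n)) / (n - 1)
          for b in range(p)] for a in range(p)]
    v = [1.0] * p                                   # power iteration
    for _ in range(200):
        w = [sum(c[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    if sum(v) < 0:                                  # orient so higher = better
        v = [-x for x in v]
    return [sum(z[i][j] * v[j] for j in range(p)) for i in range(n)]

# Hypothetical (AUC, Pietra, CIER, BSS) values for four candidate models.
models = [(0.86, 0.58, 0.21, 0.045),
          (0.88, 0.60, 0.27, 0.084),
          (0.89, 0.63, 0.45, 0.092),
          (0.91, 0.69, 0.53, 0.140)]
scores = first_pc_scores([list(m) for m in models])
print([round(s, 2) for s in scores])
```

Because the four indicators are strongly positively correlated across competitive models, the first component captures most of their joint variation, which is what makes a single ranking number defensible.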
8.2 Further work

The next step would be to consider some macroeconomic variables in the mod-
eling process. Interest rates, gas prices, house prices and inflation are amongst
the economic variables that could bring important value to the model.
It would also be advisable to consider each sector separately; it is easy to
see that, for example, house prices probably have a greater influence on firms in
the real estate sector, and gas prices on firms in the transport sector.
Several discriminatory power indicators were introduced here, while
many other discriminatory power indicators are available, many of which
assume that the underlying distributions of both the default and
non-default cases are normal. That distributional assumption sim-
ply cannot hold, especially for the distribution of default cases. No discrimina-
tory power indicators with those assumptions are considered in this thesis. It
would be interesting to develop some new discriminatory power indicators that
consider the PDs instead of the risk ratings.
It would also be interesting to apply fixed-income portfolio analysis,
which, as Altman and Saunders [3] point out, has not seen widespread use
to this day. Portfolio theory could be applied to a bank's portfolio to price new loan
applicants, by determining interest rates after calculating their probability
of default, their risk measure.
Appendix A
Credit Pricing Modeling
In this appendix a practical method for estimating the loss distribution is
presented. The theory is mostly adapted from Alexander and Sheedy [1] and Ong
[26].
In order to estimate the loss distribution from a loan portfolio, the probability
distribution of defaults has to be estimated first. For the portfolio a firm can
either default with probability π or stay solvent with probability (1-π). The
default events for different firms are assumed independent and are thus well
fitted by the binomial distribution. The probability of exactly k defaults in the
portfolio is then:
Pr(k) = n! / (k!(n − k)!) · π^k (1 − π)^(n−k)        (A.1)
For large n this probability can be approximated by the Poisson distribution:
Pr(k) = (nπ)^k e^(−nπ) / k!        (A.2)
According to two rules of thumb the approximation is good if n ≥ 20 and
π ≤ 0.05, or if n ≥ 100 and nπ ≤ 10.
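The quality of the approximation under these rules of thumb is easy to check numerically. The following Python sketch, with made-up portfolio figures, compares the binomial probabilities of (A.1) with the Poisson probabilities of (A.2):

```python
# Numerical check of the Poisson approximation (A.2) to the binomial
# default distribution (A.1); n and pi below are invented example values
# chosen to satisfy the first rule of thumb (n >= 20, pi <= 0.05).
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """Pr(exactly k defaults) under the binomial model (A.1)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson approximation (A.2) with lam = n * pi."""
    return lam**k * exp(-lam) / factorial(k)

n, pi = 100, 0.02
for k in range(6):
    b, q = binom_pmf(k, n, pi), poisson_pmf(k, n * pi)
    print(f"k={k}: binomial={b:.4f}  poisson={q:.4f}")
```

For these values the two probabilities agree to roughly two decimal places, consistent with the rules of thumb above.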
where the PD is adopted from the credit rating model and the EAD is estimated as the
current exposure. The LGD should be estimated from historical data.
For this procedure to be used in practice, the whole portfolio has to be divided
into m approximately equally large sub-portfolios. The reason for splitting the whole
portfolio into smaller portfolios is that for large n the binomial distribution
behaves as the normal distribution, as provided by the central limit theorem. The
portfolio should be divided by size of exposure, such that the firms with the
smallest exposure are in the first portfolio and so on. If the aforementioned rules
of thumb are satisfied, then the probability distribution of defaults is approxi-
mated by the Poisson distribution.
Pr(k)i = (nπ)i^k e^(−(nπ)i) / k!,    i = 1, 2, . . . , m        (A.4)
From equations A.4 and A.5 it is possible to estimate the expected loss (EL).
That is done by summing the losses for all k such that the cumulative Pr(k)i is
0.5, and then summing the relative losses over all m portfolios. Similarly, it is
possible to estimate VaRα by summing the cumulative probability up to the α
level. From the EL and VaRα it is possible to calculate the unexpected loss
(UEL), which is sometimes also referred to as the incremental credit reserve (ICR)
in the literature.
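The quantile step of this procedure can be sketched as follows. The portfolio figures are invented and the aggregation over the m sub-portfolios is omitted; the sketch only shows how the median (toward EL) and α-level (toward VaR) default counts would be read off the Poisson distribution for a single sub-portfolio:

```python
# Hedged sketch of the quantile step above, for one sub-portfolio with
# invented inputs: read the median and 99% default counts off the Poisson
# distribution, then convert counts to losses via LGD and EAD.
from math import exp, factorial

def poisson_quantile(lam, level):
    """Smallest k whose cumulative Poisson probability reaches `level`."""
    k, cumulative = 0, exp(-lam)
    while cumulative < level:
        k += 1
        cumulative += lam**k * exp(-lam) / factorial(k)
    return k

n, pi = 200, 0.02            # 200 loans with a 2% PD (hypothetical)
lgd, ead = 0.45, 1.0         # loss given default, exposure per loan (hypothetical)
lam = n * pi
median_defaults = poisson_quantile(lam, 0.5)
var_defaults = poisson_quantile(lam, 0.99)
print("median loss:", median_defaults * lgd * ead)
print("99% quantile loss:", var_defaults * lgd * ead)
```

The difference between the α-level loss and the median loss is the unexpected loss (UEL) referred to above.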
Appendix B
Additional Modeling Results
In order to make an educated decision on how the quantitative key figures should
be used in the model, three different approaches are compared in Table B.1. The first
model in Table B.1 uses the first two principal components of the PCA conducted for
each sector. The second model shows the performance of the sum of scaled
quantitative key figures. The last model in Table B.1 shows the performance of
the first principal component of the scaled quantitative key figures.
From Table B.1 it might be difficult to decide which of the three models has the
best performance. A good place to start when analyzing tables similar to Table B.1
is the PCA.stat statistic, as it pulls together several of the other statis-
tics. The first two principal components of the unscaled quantitative figures have
the highest PCA.stat but fail to perform well on some of the other statistics.
The PCA.stat was constructed from competitive models, and judging from the low
PCA.stat values for the models in Table B.1, those models can hardly be consid-
ered competitive. PCA.stat is constructed from the AUC, Pietra, CIER
and BSS indicators, which are always presented along with the PCA.stat so the
reader can gain confidence in it.
ŷ ∼ pc∗1(α) + pc∗2(α)    ŷ ∼ Σα̃    ŷ ∼ pc1(α̃)
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.73982 0.02557 0.78286 0.02757 0.77181 0.02925
Pietra 0.40328 0.05507 0.43447 0.04889 0.41429 0.05248
CIER 0.09513 0.35933 -0.24632 0.29130 -0.18314 0.33465
Gini 0.47964 0.05113 0.56572 0.05513 0.54362 0.05850
Variance 0.01659 0.00070 0.01659 0.00070 0.01659 0.00070
Calibration 0.00027 0.00012 0.00039 0.00022 0.00026 0.00016
Resolution 0.00044 0.00012 0.00052 0.00013 0.00050 0.00014
Brier 0.01643 0.00070 0.01647 0.00072 0.01636 0.00070
BSS 0.00988 0.00790 0.00756 0.01957 0.01417 0.01674
PCA.stat -7.81915 2.02009 -8.18340 2.46617 -8.16708 2.58280
AIC 884.937 19.3976 829.646 18.2382 835.900 18.4168
Pseudo R2 0.05803 0.01389 0.12159 0.01196 0.10846 0.01203
The AIC and pseudo R2 measure the fit of the model, not its performance
as a credit assessment model. The sum of the scaled quantitative key figures has
the best fit, indicating that it might be a good performer. The last model in
Table B.1 considers the first principal component of the scaled quantitative key
figures and performs slightly better than the sum model,
according to the PCA.stat.
Considering next the qualitative figures in Table B.2, no matter which indicators
are analyzed it is quickly apparent that the qualitative figures outperform the
quantitative key figures.
B.1 Detailed Performance of Multivariate Models
ŷ ∼ pc1(ϕ)    ŷ ∼ pc1(ϕ) + pc2(ϕ)    ŷ ∼ Σϕ
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.83785 0.02643 0.84726 0.02414 0.84770 0.02499
Pietra 0.56553 0.04685 0.56431 0.04372 0.56672 0.05526
CIER 0.20652 0.15527 0.16860 0.15328 0.23910 0.16155
BSS 0.04437 0.01755 0.04028 0.01642 0.05217 0.02123
PCA.stat -2.97002 1.82934 -3.06658 1.73939 -2.43837 2.07074
AIC 761.940 20.2347 753.953 20.1698 817.509 22.8831
Pseudo R2 0.18774 0.01524 0.19845 0.01516 0.20976 0.01930
From Table B.2 it can be seen, e.g. by considering the PCA.stat, that the model
containing the sum of the qualitative figures outperforms the other two models.
Interestingly, it also has the highest AIC, indicating that it
has the poorest fit of the three. It should be taken into consideration that
the sum model has six variables whereas the others have only one and
two variables, which somewhat explains the high value of the AIC. It should also be noted
that the sum model has a higher standard deviation on the PCA.stat than the
others.
In Table B.3 some combinations of the qualitative and quantitative figures are
considered. It is clear that the second principal component representative of
the qualitative figures has some good predictive powers. The results in Tables
B.4-B.8 are given a full discussion in Section 7.5 and are just listed here for further
reference.
As can be seen in Table B.9, the model including the age factor does not perform
as well as the one without the age factor. Although this seems decisive it is in
fact not, as age is most probably a factor. Recall that firms with
missing values are deleted from the modeling dataset considered
to this point; it should be clear that young firms are more likely to have missing
values than older firms. A successful attempt was made to prove this point by
considering a modeling dataset that excluded the qualitative figures and using
the principal component representatives of the qualitative figures.
In Table B.10 the results of the attempt to use the first two principal compo-
nents instead of the sum of qualitative key figures are listed. From the results it
is clear that this does not outperform the use of the sum of qualitative key figures,
as can be seen in Table B.9.
ŷ ∼ Σα̃ + pc1(ϕ)    ŷ ∼ Σα̃ + pc1(ϕ) + pc2(ϕ)    ŷ ∼ Σi∈{1,2,4} pci(α̃, ϕ)
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.86001 0.02357 0.87128 0.02174 0.86231 0.02345
Pietra 0.58767 0.05149 0.60575 0.04357 0.59143 0.05542
CIER 0.21020 0.13418 0.20746 0.13256 0.22424 0.13581
BSS 0.04492 0.01971 0.05300 0.02067 0.04449 0.01952
PCA.stat -2.29438 1.78160 -1.70544 1.71236 -2.16406 1.86009
AIC 736.802 20.5171 728.705 20.8637 736.490 20.5230
Pseudo R2 0.22326 0.01630 0.23408 0.01676 0.21930 0.01629
RMC ŷ ∼ pc∗1 (α) + pc∗2 (α) + pc1 (ϕ) ŷ ∼ pc1 (α̃) + pc1 (ϕ)
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.88380 0.01890 0.84780 0.02379 0.85851 0.02396
Pietra 0.60149 0.04582 0.57775 0.04827 0.58138 0.05110
CIER 0.26502 0.12102 0.20872 0.14318 0.20029 0.14207
BSS 0.08448 0.02098 0.04582 0.01877 0.04721 0.01986
PCA.stat -0.48041 1.69085 -2.61091 1.76196 -2.37164 1.85403
AIC 751.191 20.3170 738.210 19.9818
Pseudo R2 0.20355 0.01591 0.21532 0.01554
Table B.4: Models considering different principal component procedures for the
quantitative key figures.
ŷ ∼ pc1 (α̃) + pc2 (α̃) ŷ ∼ pc1 (α̃) + pc1 (ϕ) ŷ ∼ pc1 (α̃) + pc2 (α̃)
+pc1 (ϕ) +pc2 (ϕ) +pc1 (ϕ) + pc2 (ϕ)
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.85783 0.02422 0.86962 0.02130 0.84780 0.02379
Pietra 0.58297 0.05028 0.60758 0.03841 0.57775 0.04827
CIER 0.20727 0.13902 0.18618 0.14480 0.20872 0.14318
BSS 0.04557 0.02050 0.04726 0.02038 0.04582 0.01877
PCA.stat -2.37979 1.81994 -1.95172 1.66847 -2.61091 1.76196
AIC 739.176 20.3843 729.683 20.3124 751.191 20.3170
Pseudo R2 0.21643 0.01604 0.22660 0.01606 0.20355 0.01591
ŷ ∼ Σi∈{1,2,4} pci(α̃, ϕ)    ŷ ∼ Σi∈{1,2,4} pci(α̃, ϕ)    ŷ ∼ Σi∈{1,2,4} pci(α̃, ϕ)
+γo    +γo + γaa    +γo + γaa + γa
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.87745 0.02374 0.87787 0.02419 0.87710 0.02421
Pietra 0.61149 0.05311 0.61095 0.05151 0.62099 0.04929
CIER 0.42878 0.11387 0.44029 0.12174 0.44121 0.11794
BSS 0.08360 0.02103 0.08785 0.02211 0.08859 0.02215
PCA.stat 0.17029 1.76229 0.32746 1.80317 0.43781 1.75137
AIC 699.890 20.9853 696.784 21.8350 695.738 22.1820
Pseudo R2 0.26281 0.01789 0.27043 0.01896 0.27584 0.01942
Table B.6: Introducing the customer factors: γo indicates whether a firm has
previously failed to fulfill its obligations, γaa indicates whether the accountant
has made annotations in the firm's financial statement, and γa is
an age factor.
RMC    ŷ ∼ Σα̃ + Σϕ    ŷ ∼ Σα̃ + Σϕ + γo
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.88380 0.01890 0.87424 0.02341 0.88547 0.02434
Pietra 0.60149 0.04582 0.61055 0.05222 0.63648 0.05308
CIER 0.26502 0.12102 0.22785 0.13405 0.44337 0.10300
BSS 0.08448 0.02098 0.05660 0.02115 0.08619 0.01971
PCA.stat -0.48041 1.69085 -1.42421 1.80689 0.70390 1.65541
AIC 723.649 21.5181 688.360 22.0673
Pseudo R2 0.24808 0.01773 0.29018 0.01929
Table B.7: Models where both the qualitative and quantitative figures are
used individually. The obligation variable is also introduced to the model with
both the qualitative and quantitative figures.
ŷ ∼ Σα̃ + Σϕ    ŷ ∼ Σα̃ + Σϕ
RMC    +γo + γaa    +γo + γaa + γa
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.88380 0.01890 0.88572 0.02489 0.88565 0.02540
Pietra 0.60149 0.04582 0.63273 0.05262 0.63800 0.04865
CIER 0.26502 0.12102 0.45406 0.10027 0.44279 0.10310
BSS 0.08448 0.02098 0.09227 0.02162 0.08995 0.02112
PCA.stat -0.48041 1.69085 0.86769 1.71254 0.81453 1.72739
AIC 684.090 22.6430 683.246 23.0888
Pseudo R2 0.29904 0.02009 0.30424 0.02059
Table B.8: Model performances when two further customer factors are intro-
duced.
ŷ ∼ pc1(α̃) + Σϕ    ŷ ∼ pc1(α̃) + Σϕ
RMC    +γo + γaa    +γo + γaa + γa
DP Indicator Mean Std.dev. Mean Std.dev. Mean Std.dev.
AUC 0.88380 0.01890 0.88557 0.02427 0.88578 0.02529
Pietra 0.60149 0.04582 0.63032 0.04787 0.63582 0.04852
CIER 0.26502 0.12102 0.45815 0.10427 0.44672 0.11022
BSS 0.08448 0.02098 0.09323 0.02151 0.09212 0.02027
PCA.stat -0.48041 1.69085 0.88147 1.69849 0.86564 1.74342
AIC 683.929 22.1750 682.563 22.6804
Pseudo R2 0.29278 0.01964 0.29853 0.02021
Table B.9: The first principal component representative of scaled key figures is
introduced as a replacement of the sum of scaled key figures.
ŷ ∼ pc1(α̃) + pc1(ϕ)    ŷ ∼ Σα̃ + pc1(ϕ)
+pc2(ϕ) + γo + γaa + γa    +pc2(ϕ) + γo + γaa + γa
DP Indicator Mean Std.dev. Mean Std.dev.
AUC 0.88414 0.02199 0.88359 0.02326
Pietra 0.63235 0.03944 0.63518 0.04604
CIER 0.44639 0.11448 0.44064 0.11415
BSS 0.09031 0.02131 0.08904 0.02132
PCA.stat 0.75239 1.58640 0.71479 1.65092
AIC 689.933 21.9609 690.250 22.3572
Pseudo R2 0.28205 0.01928 0.28815 0.01960
Table B.10: The first two principal components of the qualitative figures are
introduced as a replacement of the sum of qualitative figures.
B.2 Additional Principal Component Analysis

Generally the regular variables of the complete dataset were used for model-
ing purposes. It is, however, interesting to analyze the performance of the principal
component representatives. In this section the general results of PCA I-IV are
presented.
Table B.11: The rotation of variables and summary of the principal component
analysis of the qualitative figures.
The PCA for the quantitative key figures has several variants. First the
PCA of the scaled key figures is pursued; the results are summarized in Table
B.12. From Table B.12 it can be seen that all the scaled key figures have the
same sign, but the debt score has the largest rotation loading of all the variables.
The first PC only accounts for 46% of the total variance of the variables. It is
interesting that the first two PCs are quite similar for the liquidity and solvency
scores.
Table B.12: The rotation of variables and summary of the principal component
analysis of the scaled quantitative key figures.

The PCA for the unscaled figures is done by scaling them to unit variance
before conducting the PCA. This results in totally different loadings, if
compared with the scaled key figures. The summary results can be seen in Table
B.13, and it is noticeable that the liquidity ratio has a different sign compared to
the other ratios. It is then interesting to see that the first PC only accounts
for 35% of the total variance of the variables. It is thus clear that the variance
between variables is much greater than for the qualitative figures. It is interesting
to consider the results of Table 7.12, where it can be seen that the second PC
does not have any predictive powers.
Table B.13: The rotation of variables and summary of the principal component
analysis of the unscaled quantitative key figures.
In order to deal with the different distributions of key figures between sectors, a
PCA was done on each sector separately. The results can be seen in Table B.14.
From Table B.14 it is quite noticeable that the first PCs are considerably different
between sectors.
Table B.14: The rotation of variables and summary of the principal component
analysis of the quantitative key figures for each sector separately.
In order to analyze whether the combination of the qualitative and scaled quan-
titative figures would perform better than the individual PCAs of the qualitative and
quantitative figures, a PCA of the combined set was conducted. The results of the
PCA of the qualitative and quantitative figures can be seen in Tables B.15 and B.16.
Variable PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
DEBT 0.0840 0.6872 0.1468 0.0104 0.6927 −0.1016 0.0925 −0.0079 0.0129 0.0094
LIQUIDITY −0.0188 0.1708 −0.8314 −0.5273 0.0157 −0.0122 −0.0083 −0.0186 0.0203 −0.0005
RETURN −0.0954 −0.6757 0.0549 −0.2793 0.6282 −0.2086 0.1182 −0.0134 0.0124 −0.0300
SOLVENCY −0.1917 −0.1297 −0.5041 0.7618 0.2740 0.1843 0.0092 −0.0089 −0.0509 −0.0099
MANAGEMENT −0.4145 0.0832 0.0705 −0.0700 −0.0312 −0.0445 −0.0839 −0.5429 −0.5611 −0.4379
STABILITY −0.3900 0.0333 0.1035 −0.1753 0.1106 0.4194 −0.3115 0.6378 −0.3309 0.0625
POSITION −0.3951 0.0241 0.1113 −0.1206 0.0800 0.4561 −0.2144 −0.4093 0.6093 0.1315
SITUATION −0.3779 0.0732 0.0377 −0.0441 −0.1280 0.1254 0.8575 0.1871 0.1245 −0.1745
REFUNDING −0.3881 0.0856 −0.0243 0.1012 −0.0802 −0.5912 −0.2963 0.2908 0.3961 −0.3821
RISK −0.4163 0.0624 0.0025 0.0332 −0.0884 −0.4011 0.0642 −0.1031 −0.1696 0.7807
Standard deviation 2.0634 1.1605 1.0238 0.9177 0.7931 0.7195 0.6789 0.5760 0.5607 0.5015
Proportion of Variance 0.4258 0.1347 0.1048 0.0842 0.0629 0.0518 0.0461 0.0332 0.0314 0.0251
Cumulative Proportion 0.4258 0.5604 0.6653 0.7495 0.8124 0.8641 0.9102 0.9434 0.9748 1.0000
Table B.15: The rotation of variables and summary of the principal component analysis of both the quantitative and
qualitative figures.
Variable PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
DEBT SCORE 0.1971 0.6559 −0.0820 0.1753 −0.0979 0.0010 −0.1157 0.0612 −0.6802 −0.0640
LIQUIDITY SCORE 0.1746 0.2238 0.6274 −0.5386 0.2759 −0.3905 0.0114 0.0813 0.0167 0.0032
RETURN SCORE 0.1111 0.6140 −0.4280 −0.0523 0.1441 −0.0946 0.0918 −0.0563 0.6190 0.0184
SOLVENCY SCORE 0.2276 0.2115 0.5870 0.3253 −0.3643 0.4440 0.0262 −0.1459 0.3184 0.0006
MANAGEMENT 0.3950 −0.1568 −0.0898 −0.0488 0.1441 −0.0317 −0.4736 −0.6054 0.0085 −0.4430
STABILITY 0.3739 −0.0830 −0.1631 −0.4081 −0.0103 0.3722 0.6469 −0.2498 −0.1931 0.0570
POSITION 0.3774 −0.0924 −0.1604 −0.3556 −0.1382 0.3554 −0.4771 0.5458 0.0858 0.1364
SITUATION 0.3586 −0.1489 −0.0970 0.0500 −0.6489 −0.5728 0.1890 0.1355 0.0734 −0.1693
REFUNDING 0.3734 −0.1398 0.0443 0.4288 0.5104 0.0235 0.2540 0.4343 0.0362 −0.3738
RISK 0.4003 −0.1279 −0.0105 0.2981 0.1981 −0.2167 −0.0817 −0.1773 −0.0335 0.7804
Standard deviation 2.1201 1.2402 1.1459 0.7455 0.6851 0.6766 0.5770 0.5597 0.5233 0.5010
Proportion of Variance 0.4495 0.1538 0.1313 0.0556 0.0469 0.0458 0.0333 0.0313 0.0274 0.0251
Cumulative Proportion 0.4495 0.6033 0.7346 0.7902 0.8371 0.8829 0.9162 0.9475 0.9749 1.0000
Table B.16: The rotation of variables and summary of the principal component analysis of both the scaled quantitative
and qualitative figures.
B.3 Unsuccessful Modeling
In this section some of the methods that were tried without success are given a
brief discussion. In addition to logistic regression, a k-Nearest Neighbor (k-NN)
analysis and a CART analysis were tried. It is not possible to use CART or k-NN
directly in credit modeling, as they do not provide independent estimates of
probabilities of default for each firm. Frydman et al. [16] note that CART
outperforms discriminant analysis and that even better results were obtained
by combining the methods. As these methods do not provide probabilities of default for
individual borrowers, some results from their analysis were used instead as
explanatory variables. For the k-NN, the ratio of defaulted neighbours, Ki (k)
in equation (5.49), was used as a variable. For the CART model, the default
ratio pm of the split region into which that particular firm falls is used as
a variable. When the k-NN ratio was used as a variable, the resulting probabilities of
default were too low, as can be seen in Figure B.1.
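The k-NN variable described above can be sketched as follows. This is an illustrative Python version with hypothetical two-dimensional firm data; the thesis's analysis used the R class package:

```python
# For each firm, compute the fraction of its k nearest neighbours (in the
# space of standardized key figures) that defaulted -- the K_i(k) ratio.
# Firm coordinates and default flags below are hypothetical.
import math

def knn_default_ratio(firms, defaults, k=5):
    """firms: list of feature vectors; defaults: 0/1 flags per firm.
    Returns the defaulted share among each firm's k nearest other firms."""
    ratios = []
    for i, x in enumerate(firms):
        dists = sorted(
            (math.dist(x, y), defaults[j])
            for j, y in enumerate(firms) if j != i)
        neighbours = dists[:k]
        ratios.append(sum(d for _, d in neighbours) / k)
    return ratios

# Hypothetical two-feature firms; the last three are defaulters.
firms = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3),
         (0.9, 0.8), (1.0, 0.9), (0.95, 0.85)]
defaults = [0, 0, 0, 1, 1, 1]
print(knn_default_ratio(firms, defaults, k=3))
```

This ratio then enters the logistic regression as an ordinary explanatory variable, which is how the method was used here.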
When one of the k-NN ratios was used as an independent variable, the re-
sulting PDs were much more conservative than for the other models. The models
with k-NN variables give much better values for the Akaike Information Crite-
rion (AIC) than any of the other models. It is thus clear that the k-NN has some
good predictive powers, but as it results in such conservative PDs, models
using the information of the neighborhood cannot be used with the same rating
transformation as the RMC. In Figure B.1 it can be seen that a large proportion
of the ratings are 11 and 12. As the transformation in use is not appropriate
for this model, the k-NN variable is left out of the analysis. It is though likely
to perform well if another transformation is analyzed.
The CART analysis did not perform well; an example of a CART tree can be seen
in Figure B.2. The tree in Figure B.2 has the obligation factor at the top.
As the obligation factor has 3 levels (no failure, failed in the past 12 months and
failed in the past 24 months), the bc labeling refers to either of the failed obligation
levels. The left leg from the root contains all firms that have previously not
failed to fulfill their obligations, while the right one contains the firms that have
previously failed to fulfill their obligations. At the next nodes it is the pc1 (ϕ)
variable that provides the split. At the leaf nodes, one can see the default rate
and the number of observations selected by the criteria above.
The same problem as for the k-NN was observed when modeling the problem with
the support vector machine (SVM): the resulting PDs are small relative
to the PDs obtained from logistic regression. That
results in high risk ratings, with no observations getting risk ratings lower than six.
The SVM is a complex method with several tuning parameters, and despite
extensive analysis reasonable PDs were not obtained.
[Figure B.1: Relative and cumulative frequencies of good and bad cases, for the LR rating model and for the test set, across the twelve rating classes.]
[Figure B.2: Example of a CART tree. The obligation factor (OBLIGATION = bc) provides the root split, with further splits on PCAquanti1, PCAquanti2 and PCAquali; the leaf nodes show default rates and observation counts.]
max PCA.stat
subject to x0 = 0, x12 = 1, and xi ≤ xi+1 for i = 1, 2, . . . , 11.        (B.1)
Appendix C
Programming
The MASS package by Venables and Ripley [32] makes discrim-
inant analysis possible, and the CART analyses were done with the help of
the rpart package by Therneau and Atkinson [30]. The Design
package by Frank E. Harrell Jr. [20] made it possible to use a penalty in the logistic regression.
The xtable package by Dahl [12] makes the transition of
reporting tables from R straight into LaTeX a very easy task. With a touch of
class, Venables and Ripley [32] make it possible to perform a k-Nearest Neigh-
bor analysis very easily, with the class package.
C.2 R code
The code appendix is omitted, but all code is available upon request. Please send
an email to arnar.einarsson@gmail.com.
19. Yusuf Jafry and Til Schuermann. Measurement, estimation and comparison of credit migration matrices. Journal of Banking and Finance, 28:2603-2639, August 2004.
20. Frank E. Harrell Jr. Design: Design Package, 2007. URL http://biostat.mc.vanderbilt.edu/s/Design. R package version 2.1-1.
21. D. Lando. Credit Risk Modeling: Theory and Applications. Princeton Series in Finance. Princeton University Press, 2004.