Journal of Banking & Finance 50 (2015) 599–607

Combining accounting data and a structural model for predicting credit ratings: Empirical evidence from European listed firms
Michael Doumpos a,*, Dimitrios Niklis a, Constantin Zopounidis a,b, Kostas Andriosopoulos c
a Technical University of Crete, School of Production Engineering and Management, Financial Engineering Laboratory, University Campus, 73100 Chania, Greece
b Audencia Group, Nantes School of Management, France
c ESCP Europe Business School, Research Centre for Energy Management, UK
Article info
Article history: Available online 24 January 2014
JEL classification: C44, G24, G13
Keywords: Credit ratings; Rating agencies; Black–Scholes–Merton model; Multi-criteria decision making

Abstract
Ratings issued by credit rating agencies (CRAs) play an important role in the global financial environment. Among other issues, past studies have explored the potential for predicting these ratings using a variety of explanatory factors and modeling approaches. This paper describes a multi-criteria classification approach that combines accounting data with a structural default prediction model in order to obtain improved predictions and test the incremental information that a structural model provides in this context. Empirical results are presented for a panel data set of European listed firms during the period 2002–2012. The analysis indicates that a distance-to-default measure obtained from a structural model adds significant information compared to popular financial ratios. Nevertheless, its power is considerably weakened when market capitalization is also considered. The robustness of the results is examined over time and under different rating category specifications.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Credit ratings are important ingredients of the credit risk management process, and they are widely used for estimating default probabilities, supporting credit-granting decisions, pricing loans, and managing loan portfolios. Credit ratings are either obtained through models developed internally by financial institutions (Treacy and Carey, 2000) or provided externally by credit rating agencies (CRAs). The latter, despite the criticisms on their scope and accuracy (e.g., Frost, 2007; Pagano and Volpin, 2010; Tichy et al., 2011), are widely used by investors, financial institutions, and regulators, and they have been extensively studied in academic research (for a recent overview, see Jeon and Lovo, 2013). In this context, models that explain and replicate the ratings issued by CRAs can be useful in various ways, as they can facilitate an understanding of the factors that drive CRAs' evaluations, provide investors and regulators with early-warning signals and information for important rating changes, and support the credit risk assessment process for firms not rated by the CRAs.
* Corresponding author. Tel.: +30 28210 37318; fax: +30 28210 69410.
E-mail addresses: mdoumpos@dpem.tuc.gr (M. Doumpos), dniklis@isc.tuc.gr (D. Niklis), kostas@dpem.tuc.gr (C. Zopounidis), kandriosopoulos@escpeurope.eu (K. Andriosopoulos).
0378-4266/$ - see front matter © 2014 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.jbankfin.2014.01.010

Previous studies have focused on analyzing and predicting credit ratings using mostly firm-specific data (usually in the form of financial ratios) and market variables (Huang et al., 2004; Mizen and Tsoukas, 2012; Pasiouras et al., 2006). Some recent studies (Hwang et al., 2010; Hwang, 2013; Lu et al., 2012) have also considered default risk estimates from structural models (Black and Scholes, 1973; Merton, 1974). Nevertheless, this line of research has been underdeveloped, as no systematic analysis has been conducted to examine the value of the additional information that the estimates of structural models provide compared to accounting-based data for predicting credit ratings, even though considerable research has been done on this issue in the context of default prediction (e.g., Agarwal and Taffler, 2008; Hillegeist et al., 2004; Vassalou and Xing, 2004). Hilscher and Wilson (2013), however, argue that focusing solely on a firm's default risk may lead to considerable loss of information in credit risk assessment, as systematic risk is also an important yet distinct dimension, and it is best modeled through credit ratings. This is in accordance with the results of Das et al. (2009), who found that a combination of accounting variables and a structural model was more powerful in explaining CDS spreads when compared to the independent use of its main components.
Therefore, given the fundamental differences between default prediction and credit ratings and the possible synergies that can be derived through the combination of different credit risk modeling approaches, it is interesting to explore the usefulness of incorporating market-based risk estimates from a structural model into the analysis and prediction of credit ratings in combination with financial data and simple market variables, such as capitalization. This type of analysis can provide further evidence on the relationship between credit ratings and default estimates, the added value of structural models in credit risk assessment, the role of their main ingredients, and the synergies between structural and reduced-form models. In this study, we contribute to the literature by exploring these issues using a sample of European companies from different countries over the period 2002–2012. While most of the past studies related to the analysis of the ratings issued by CRAs have focused on the U.S. and the UK, the ratings of firms in European countries (other than the UK) have been relatively under-examined. The focus on European data has some interesting aspects. First, during the past decade, particularly after the outbreak of the European sovereign debt crisis, the role of CRAs has received much attention from authorities, regulators, and governments in Europe. Furthermore, in contrast to U.S. firms, which operate out of a single country, European firms face different economic and business conditions, and the global crisis has not affected all European countries in the same manner. These particular features make it interesting to examine how the findings of studies conducted in other regions and time periods translate into a cross-country European setting, and to investigate the existence of time-varying effects, particularly in light of the ongoing turmoil in the European economic environment.
Apart from the above contributions to the literature, on the methodological side we employ an innovative non-parametric multi-criteria decision-making (MCDM) technique, as opposed to the parametric statistical methods (e.g., logistic or probit models) often used in this area. MCDM is well suited to the ordinal nature of credit ratings and the features of credit scorecards, while taking into account the nonlinearities observed in previous studies (Hwang et al., 2010; Mizen and Tsoukas, 2012) through an easy-to-comprehend additive modeling form that does not rely on statistical assumptions. In this framework, we introduce a new linear programming approach for building rating prediction models that explicitly take into consideration the multi-grading nature of credit ratings.
The obtained empirical results indicate that a structural model provides significant additional information when combined with traditional accounting-based ratios. However, its significance is considerably reduced when market capitalization is also considered. The analysis of the stability of the results over time further shows that the relative importance of the capitalization of firms has increased during the European sovereign debt crisis. Finally, the obtained conclusions are robust when considering a dichotomic scheme (i.e., investment vs. speculative grades), but the proposed multi-grading MCDM modeling approach is found to be more accurate than dichotomic prediction models.
The rest of the paper is organized as follows. Section 2 discusses the market model used in the analysis, as well as the multi-criteria approach employed for constructing the credit rating classification models. Section 3 is devoted to the description of the data set and the variables, whereas Section 4 presents the empirical framework and the analysis of the obtained results. Finally, Section 5 concludes the paper and discusses some future research directions.
2. Methodology
2.1. Market model
The works of Black and Scholes (1973) and Merton (1974) led to the development of the research on structural models for credit risk modeling. In this framework (henceforth referred to as BSM), a firm is assumed to have a simple debt structure, consisting of a single liability with face value $L$ maturing at time $T$. The firm defaults on its debt at maturity if the market value of its assets is lower than $L$. In this context, the firm's market value of equity ($E$) is modeled as a call option on the underlying assets ($A$), whose value is given by the Black–Scholes option pricing formula:

\[ E = A\,N(d_1) - L e^{-r_f T} N\left(d_1 - \sigma\sqrt{T}\right) \tag{1} \]

where $r_f$ is the risk-free rate, $\sigma$ is the volatility of the asset returns, $N$ represents the cumulative normal distribution function, and

\[ d_1 = \frac{\ln(A/L) + (r_f + 0.5\sigma^2)T}{\sigma\sqrt{T}} \]
Furthermore, under Merton's assumption that equity is a function of assets and time, the following equation is derived from Itô's lemma (Hull, 2011):

\[ \sigma_E = \frac{A}{E}\,N(d_1)\,\sigma \tag{2} \]
Solving Eqs. (1) and (2) simultaneously or with iterative procedures (Hillegeist et al., 2004; Vassalou and Xing, 2004) leads to an estimate of the market value of assets ($A$) and the volatility of the asset returns ($\sigma$). Then, a distance-to-default (DD) measure can be defined as the number of standard deviations that the firm is away from default (i.e., how much $\ln(A/L)$ should deviate from its mean in order for default to occur; Vassalou and Xing, 2004):

\[ DD = \frac{\ln(A/L) + (\mu - 0.5\sigma^2)T}{\sigma\sqrt{T}} \tag{3} \]

where $\mu$ is the expected return on assets, which can be estimated from the annual changes in $A$ obtained from the solution of Eqs. (1) and (2).
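As an illustration of how Eqs. (1)–(3) fit together, the following is a minimal Python sketch, not the estimation code used in the paper. It assumes a simple fixed-point scheme: for a given asset volatility, Eq. (1) is inverted for A by bisection, and the volatility is then updated from Eq. (2). The function names, the starting guess, the tolerances, and the numerical inputs are all hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def d1(A, L, rf, sigma, T):
    return (math.log(A / L) + (rf + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))

def equity_value(A, L, rf, sigma, T):
    """Eq. (1): equity as a call option on the firm's assets."""
    x = d1(A, L, rf, sigma, T)
    return A * norm_cdf(x) - L * math.exp(-rf * T) * norm_cdf(x - sigma * math.sqrt(T))

def solve_assets(E, sigma_E, L, rf, T=1.0, tol=1e-10, max_iter=500):
    """Solve Eqs. (1) and (2) for (A, sigma) by fixed-point iteration (illustrative scheme)."""
    sigma = sigma_E * E / (E + L)      # starting guess (an assumption)
    A = E + L
    for _ in range(max_iter):
        # Call-price bounds imply E < A < E + L * exp(-rf * T), so bisection is safe.
        lo, hi = E, E + L * math.exp(-rf * T)
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if equity_value(mid, L, rf, sigma, T) < E:
                lo = mid
            else:
                hi = mid
        A = 0.5 * (lo + hi)
        # Eq. (2) rearranged: sigma = sigma_E * E / (A * N(d1))
        sigma_new = sigma_E * E / (A * norm_cdf(d1(A, L, rf, sigma, T)))
        converged = abs(sigma_new - sigma) < tol
        sigma = sigma_new
        if converged:
            break
    return A, sigma

def distance_to_default(A, L, mu, sigma, T=1.0):
    """Eq. (3)."""
    return (math.log(A / L) + (mu - 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))

# Round-trip check on hypothetical inputs: generate E and sigma_E from known
# (A, sigma), then recover them from Eqs. (1) and (2).
A_true, sigma_true, L, rf, T = 100.0, 0.2, 60.0, 0.03, 1.0
E = equity_value(A_true, L, rf, sigma_true, T)
sigma_E = (A_true / E) * norm_cdf(d1(A_true, L, rf, sigma_true, T)) * sigma_true
A_hat, sigma_hat = solve_assets(E, sigma_E, L, rf, T)
dd = distance_to_default(A_hat, L, mu=0.05, sigma=sigma_hat, T=T)
```

The fixed-point update is only one of several possible schemes; it tends to converge quickly when the firm is far from default, but convergence is not guaranteed in general.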
Despite its simplicity and appealing grounding in financial theory, the basic BSM model is based on some well-documented but strong assumptions (Agarwal and Taffler, 2008; Bharath and Shumway, 2008), most notably involving the simple structure of a firm's debt (e.g., it is assumed that a firm issues a zero-coupon bond of maturity $T$, and that default only occurs at maturity) and the statistical distribution of the firm's asset value (it is assumed that it follows a geometric Brownian motion, thus implying that the asset value is log-normally distributed). Nevertheless, the model has attracted much interest among academics and practitioners, and several variants have been introduced in the literature (see Agarwal and Taffler, 2008 for a comparative analysis).
2.2. Multi-criteria analysis approach
In this study, the development of models to explain and predict credit ratings is based on a non-parametric MCDM approach. MCDM has evolved into a major discipline in operations research involved with decision problems under multiple criteria, and has been extensively used in various areas of financial risk management (Zopounidis and Doumpos, 2013), including credit scoring and rating (Doumpos and Pasiouras, 2005; Doumpos and Zopounidis, 2011). In this context, we introduce and employ a variant of the UTADIS multi-criteria classification method (Doumpos and Zopounidis, 2002) in order to cope with the multi-class nature of credit ratings. The adopted MCDM approach is based on the construction of an evaluation (scoring) model expressed in the form of an additive value function, which is widely used by financial institutions for credit scoring and rating (Krahnen and Weber, 2001):

\[ V(\mathbf{x}_i) = \sum_{k=1}^{K} w_k v_k(x_{ik}) \tag{4} \]

where $\mathbf{x}_i = (x_{i1}, \ldots, x_{iK})$ is the data vector for firm $i$ on $K$ independent attributes (evaluation criteria), $w_k$ is the (non-negative) trade-off constant for criterion $k$ (the trade-offs are normalized to sum up to 1), and $v_k(\cdot)$ is the corresponding marginal value function scaled in [0, 1]. The credit scores obtained from the additive model (4) range in [0, 1] and are inversely related to the risk level of the firms (i.e., high credit scores indicate low risk and vice versa). The marginal value functions decompose the overall credit score of a firm into partial scores at the criteria level; they are non-decreasing for profit-related attributes and non-increasing for cost-related indicators. For example, the higher the profitability of a firm (according to a profitability indicator such as the return on assets, ROA), the higher (i.e., closer to 1) its performance on the corresponding marginal value function and the higher its overall credit score (i.e., lower risk level). The marginal value functions have a functional-free form (piece-wise linear) that is inferred directly through the model-fitting process. This enables the additive model to capture the nonlinear (monotone) relationships between the independent attributes and the ratings of the firms. Nevertheless, in contrast to popular nonlinear data mining algorithms (e.g., neural networks), the additive form of the model makes it easy to comprehend as it adopts the structure of a simple credit scorecard. On the other hand, the criteria trade-offs act as proxies for the relative importance of the independent attributes (on a 0–1 scale). Given the additive and compensatory structure of the evaluation model, a high risk level implied by the low performance of an attribute with a high trade-off constant is not easily compensated by high performance by other attributes with low trade-offs.1
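To make the structure of the additive model (4) concrete, the sketch below evaluates a credit score from piece-wise linear marginal value functions. It is an illustration only: the breakpoints, marginal values, trade-offs, and the two attributes chosen (ROA and DD) are hypothetical and are not the fitted values reported in the paper.

```python
from bisect import bisect_right

def piecewise_linear(breakpoints, values):
    """Return a marginal value function v(x) that linearly interpolates
    the given (breakpoint, value) pairs and is flat outside their range."""
    def v(x):
        if x <= breakpoints[0]:
            return values[0]
        if x >= breakpoints[-1]:
            return values[-1]
        j = bisect_right(breakpoints, x) - 1
        x0, x1 = breakpoints[j], breakpoints[j + 1]
        y0, y1 = values[j], values[j + 1]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return v

def credit_score(x, weights, marginals):
    """Additive value function: V(x) = sum_k w_k * v_k(x_k)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "trade-offs must sum to 1"
    return sum(w * v(xk) for w, v, xk in zip(weights, marginals, x))

# Hypothetical two-criterion scorecard (values scaled in [0, 1], non-decreasing).
v_roa = piecewise_linear([-5.0, 0.0, 5.0, 15.0], [0.0, 0.2, 0.7, 1.0])
v_dd = piecewise_linear([0.0, 2.0, 5.0, 10.0], [0.0, 0.3, 0.8, 1.0])
weights = [0.6, 0.4]

score_a = credit_score([5.0, 5.0], weights, [v_roa, v_dd])   # 0.6*0.7 + 0.4*0.8
score_b = credit_score([2.5, 3.5], weights, [v_roa, v_dd])   # 0.6*0.45 + 0.4*0.55
```

Because each term is a weight times a bounded partial score, the contribution of every attribute to the overall score can be read off directly, which is the scorecard-like property the text refers to.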
On the basis of its overall credit score as defined by (4), a firm $i$ is classified into risk grade $R_\ell$ if and only if

\[ t_\ell < V(\mathbf{x}_i) < t_{\ell-1} \tag{5} \]

where $1 > t_1 > t_2 > \cdots > t_{N-1} > 0$ are the thresholds that distinguish a set of $N$ ordered rating classes $R_1, R_2, \ldots, R_N$ (class $R_1$ is the low-risk grade and $R_N$ is the high-risk one). Given a training sample consisting of $m_\ell$ observations from each rating class $R_\ell$, the additive model can be developed through the solution of the following optimization problem:

\begin{align}
\min\ & \sum_{\ell=1}^{N} \frac{1}{m_\ell} \sum_{i \in R_\ell} \sum_{h=1}^{N} \left( y_{ih}^{+} + y_{ih}^{-} \right) \tag{6} \\
\text{s.t.}\ & V(\mathbf{x}_i) + y_{i\ell}^{+} \ge t_\ell + \delta \quad \forall\, i \in \{R_1, \ldots, R_\ell\},\ \ell = 1, \ldots, N-1 \tag{7} \\
& V(\mathbf{x}_i) - y_{i\ell}^{-} \le t_{\ell-1} - \delta \quad \forall\, i \in \{R_\ell, \ldots, R_N\},\ \ell = 2, \ldots, N \tag{8} \\
& t_\ell - t_{\ell+1} \ge \varepsilon \quad \ell = 1, \ldots, N-2 \tag{9} \\
& w_1 + w_2 + \cdots + w_K = 1 \tag{10} \\
& v_k(x_{ik}) - v_k(x_{jk}) \ge 0 \quad \forall\, x_{ik} \ge x_{jk} \tag{11} \\
& 0 \le v_k(x_{ik}) \le 1; \quad w_k,\ y_{i\ell}^{+},\ y_{i\ell}^{-} \ge 0 \quad \forall\, i, k, \ell \tag{12}
\end{align}

The objective of this optimization formulation is to fit an additive scoring model to the data, so that the total weighted classification error (downgrade and upgrade errors) is minimized. The classification errors in the objective function are weighted by the inverse of the number of training observations from each rating class, thereby avoiding the construction of a scoring model that is biased toward larger classes.2

1 For instance, if a profitability indicator has a high trade-off constant and a leverage indicator has a low trade-off, then poor profitability cannot be easily compensated for by lower debt levels in order to achieve an overall high credit score.
2 The objective function can be easily modified to introduce different weights for the downgrade and upgrade errors, thus enabling analysts to take into consideration the different costs of such errors. However, due to the lack of such information, this possibility is not explored in this study.

Fig. 1. Classification errors. (a) Two firms from the low-risk class, $R_1$, are downgraded to categories $R_2$ and $R_3$. (b) Two firms from the high-risk class, $R_3$, are upgraded.

Constraint (7) defines the downgrade errors $y_{i\ell}^{+}, y_{i,\ell+1}^{+}, \ldots, y_{i,N-1}^{+}$ for a firm $i$ from rating class $R_\ell$ ($\ell = 1, \ldots, N-1$). These errors refer to violations of the lower bounds of classes $R_\ell, \ldots, R_{N-1}$. In particular, if a firm $i$ from the rating class $R_\ell$ is downgraded by the model (see Fig. 1a), then $V(\mathbf{x}_i) < t_\ell$. If the downgrade is limited to a one-notch rating difference, then $t_{\ell+1} < V(\mathbf{x}_i) < t_\ell$ and the error variables are $y_{i\ell}^{+} = t_\ell - V(\mathbf{x}_i) + \delta > 0$ and $y_{i,\ell+1}^{+} = \cdots = y_{i,N-1}^{+} = 0$, where $\delta$ is a small positive constant used to model the strict inequalities in classification rule (5). If there is a two-notch downgrade compared to the actual rating of the firm, then $V(\mathbf{x}_i) < t_{\ell+1} < t_\ell$ and the error variables are $y_{i\ell}^{+} = t_\ell - V(\mathbf{x}_i) + \delta$, $y_{i,\ell+1}^{+} = t_{\ell+1} - V(\mathbf{x}_i) + \delta$, and the others equal to 0. Thus, the total downgrade error is $y_{i\ell}^{+} + y_{i,\ell+1}^{+}$. The same interpretation extends to higher downgrades as well. Similarly, constraint (8) defines the upgrade errors $y_{i2}^{-}, \ldots, y_{i\ell}^{-}$ for a firm $i$ from rating class $R_\ell$ ($\ell = 2, \ldots, N$) as the violations of the upper bounds of the classes $R_2, \ldots, R_\ell$. If a firm $i$ from the rating class $R_\ell$ is upgraded by the model (see Fig. 1b), then $V(\mathbf{x}_i) > t_{\ell-1}$. If the upgrade is limited to a one-notch rating difference, then $t_{\ell-1} < V(\mathbf{x}_i) < t_{\ell-2}$ and the error variables are $y_{i\ell}^{-} = V(\mathbf{x}_i) - t_{\ell-1} + \delta > 0$ and $y_{i2}^{-} = \cdots = y_{i,\ell-1}^{-} = 0$. If there is a two-notch upgrade, then $V(\mathbf{x}_i) > t_{\ell-2} > t_{\ell-1}$ and the error variables are $y_{i,\ell-1}^{-} = V(\mathbf{x}_i) - t_{\ell-2} + \delta$, $y_{i\ell}^{-} = V(\mathbf{x}_i) - t_{\ell-1} + \delta$, and the others equal to 0. Thus, the total upgrade error is $y_{i\ell}^{-} + y_{i,\ell-1}^{-}$. The same interpretation extends to higher upgrades. It should be noted that, as illustrated in Fig. 1, the use of multiple error variables enables the distinction between errors of different magnitudes, thus leading to better model fitting for multi-group problems.
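The classification rule (5) and the error variables of constraints (7) and (8) can be illustrated with a short Python sketch. This is a hedged illustration, not the fitting procedure itself: given a score and fixed thresholds, it computes the smallest error values that satisfy the two constraints, which is what the minimization would select. The thresholds, scores, and the value of delta are hypothetical, and the plus/minus labels for downgrade/upgrade errors are a notational assumption.

```python
def classify(score, thresholds):
    """Rule (5): thresholds t_1 > ... > t_{N-1}; returns the grade 1..N
    (1 = lowest risk). A score exactly equal to a threshold is assigned
    to the riskier class here, since rule (5) uses strict inequalities."""
    for grade, t in enumerate(thresholds, start=1):
        if score > t:
            return grade
    return len(thresholds) + 1

def error_variables(score, actual, thresholds, delta=1e-3):
    """Smallest errors satisfying constraints (7) and (8) for a firm whose
    actual class is `actual`, keyed by the index used in each constraint."""
    n_classes = len(thresholds) + 1
    t = dict(enumerate(thresholds, start=1))            # t[1], ..., t[N-1]
    # (7): V + y_plus[h] >= t_h + delta, for h = actual, ..., N-1
    down = {h: max(0.0, t[h] + delta - score) for h in range(actual, n_classes)}
    # (8): V - y_minus[h] <= t_{h-1} - delta, for h = 2, ..., actual
    up = {h: max(0.0, score - t[h - 1] + delta) for h in range(2, actual + 1)}
    return down, up

thresholds = [0.8, 0.6, 0.4, 0.2]                       # hypothetical, N = 5

# A firm from R1 scored 0.55 is placed in R3: a two-notch downgrade.
down_a, up_a = error_variables(0.55, actual=1, thresholds=thresholds)

# A firm from R4 scored 0.65 is placed in R2: a two-notch upgrade.
down_b, up_b = error_variables(0.65, actual=4, thresholds=thresholds)
```

In both examples two error variables are positive, so a two-notch error costs more than a one-notch error, which is the distinction between error magnitudes that the multiple error variables are meant to capture.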
Constraint (9) ensures that the class limits are decreasing (with $\varepsilon$ being a small positive constant), whereas (10) ensures that the attributes' trade-offs sum up to 1. Finally, constraint (11) imposes the monotonicity of the marginal value functions (i.e., assuming that all attributes are expressed in maximization form).
Using a piece-wise linear modeling scheme for the marginal value functions, this optimization model can be expressed in linear programming form (for the details, see Doumpos and Zopounidis, 2002), thereby allowing the fitting of model (4) to large data sets, which are common in credit scoring and rating.
3. Data and variables
The empirical analysis is based on a panel data set consisting of 1325 firm-year observations involving European listed companies over the period 2002–2012. The sample covers eight different countries and five business sectors, as illustrated in Table 1.

Table 1
Sample composition (number of observations) by year, country, and business sector.

Year   No. of obs.   Country       No. of obs.   Sector                          No. of obs.
2002        38       Germany           308       Manufacturing                       853
2003       102       France            303       Information and communication       220
2004       115       UK                298       Wholesale and retail trade          130
2005       126       Switzerland       135       Transportation and storage           90
2006       135       Netherlands       130       Construction                         32
2007       138       Italy              88
2008       140       Spain              33
2009       139       Belgium            30
2010       149
2011       151
2012        92
Total     1325       Total            1325       Total                              1325

Table 3
Averages of independent variables by rating group.

            R1      R2      R3      R4      R5    Invest.  Specul.
ROA       13.69    8.02    6.58    4.26   -2.79     7.64     2.47
EBIT/IE   18.33    8.42    5.54    3.12    0.54     7.56     2.46
EQ/TA     46.86   34.74   30.48   27.82   21.33    33.26    26.17
EQ/LTD     2.29    1.27    1.06    0.87    0.78     1.23     0.85
CAP       18.01   16.75   15.74   14.65   13.12    16.27    14.26
DD         8.96    5.94    4.17    2.89    1.31     5.17     2.49

Financial data for the firms in the sample were collected from the Osiris database, whereas Bloomberg was used to get the firms' ratings from S&P. Due to the sparsity of the data set with respect to the number of observations from each rating grade in the S&P scale, the observations were re-grouped. Two schemes are considered for this purpose. In the first, the sample firms are grouped into five major risk classes, as follows: (1) class $R_1$, consisting of the lowest-risk cases, with ratings in the range from AA- up to AAA; (2) class $R_2$, consisting of cases with ratings A-, A, and A+; (3) class $R_3$, consisting of cases from low investment-level grades (BBB-, BBB, BBB+); (4) class $R_4$, consisting of speculative ratings (BB-, BB, BB+); and (5) the high-risk category $R_5$, which includes highly speculative ratings (i.e., D up to B+). Alternatively, we also consider a two-group setting distinguishing between speculative (D to BB+) and investment grades (BBB- to AAA). The percentage of observations in each rating group under these schemes is shown in Table 2.
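The two regrouping schemes can be expressed as a simple lookup over the rating scale. The sketch below is illustrative only: the list `SP_SCALE` is an assumed ordering of S&P long-term rating symbols (it omits special symbols such as SD or NR), and the function names are hypothetical.

```python
# Assumed ordering of S&P long-term ratings, from lowest to highest risk.
SP_SCALE = ["AAA", "AA+", "AA", "AA-", "A+", "A", "A-", "BBB+", "BBB", "BBB-",
            "BB+", "BB", "BB-", "B+", "B", "B-", "CCC+", "CCC", "CCC-", "CC", "C", "D"]

def five_class(rating):
    """Map a rating to the five-grade scheme R1..R5 described in the text."""
    rank = SP_SCALE.index(rating)
    if rank <= SP_SCALE.index("AA-"):
        return "R1"            # AA- up to AAA
    if rank <= SP_SCALE.index("A-"):
        return "R2"            # A-, A, A+
    if rank <= SP_SCALE.index("BBB-"):
        return "R3"            # BBB-, BBB, BBB+
    if rank <= SP_SCALE.index("BB-"):
        return "R4"            # BB-, BB, BB+
    return "R5"                # D up to B+

def two_class(rating):
    """Map a rating to the investment/speculative scheme."""
    is_investment = SP_SCALE.index(rating) <= SP_SCALE.index("BBB-")
    return "investment" if is_investment else "speculative"
```

For example, `five_class("BBB-")` returns `"R3"` and `two_class("BB+")` returns `"speculative"`, so R1 to R3 together coincide with the investment grades.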
Table 2
Percentage of sample observations in each risk category.

           R1     R2     R3     R4     R5    Investment  Speculative
2002       5.3   31.6   39.5   18.4    5.3      76.3        23.7
2003       5.9   30.4   43.1   12.7    7.8      79.4        20.6
2004       7.8   31.3   39.1   13.0    8.7      78.3        21.7
2005       7.1   28.6   39.7   16.7    7.9      75.4        24.6
2006       7.4   24.4   43.7   20.0    4.4      75.6        24.4
2007       7.2   23.9   45.7   19.6    3.6      76.8        23.2
2008       5.7   24.3   42.1   22.9    5.0      72.1        27.9
2009       4.3   25.9   41.0   20.1    8.6      71.2        28.8
2010       4.0   24.8   41.6   22.1    7.4      70.5        29.5
2011       4.0   23.2   43.7   23.2    6.0      70.9        29.1
2012       5.4   27.2   45.7   15.2    6.5      78.3        21.7
Overall    5.8   26.3   42.4   19.0    6.5      74.5        25.5

R1 includes ratings from AA- to AAA; R2 includes A-, A, and A+; R3 consists of BBB-, BBB, BBB+; R4 includes BB-, BB, BB+; R5 includes ratings from D to B+. Speculative grades range from D to BB+, and investment grades range from BBB- to AAA.

For every observation in the sample for year t, the S&P long-term rating is recorded at the end of June, while annual financial data are taken from the end of year t − 1. The financial data involve four financial ratios: ROA (profit before taxes/total assets), interest coverage (earnings before interest and taxes/interest expenses, EBIT/IE), solvency (equity/total assets, EQ/TA), and the long-term debt leverage ratio (equity/long-term debt, EQ/LTD). ROA is the primary indicator used to measure corporate profitability. Interest coverage assesses firms' ability to cover their debt obligations through their operating profits. The solvency ratio analyzes the capital adequacy of the firms, whereas the long-term debt leverage ratio takes into consideration the long-term debt burden of firms relative to their equity. In addition to these financial ratios, we also take into account the sizes of firms, as measured by the logarithm of their market capitalization (CAP), and a country risk indicator (CRI) defined as a binary variable that takes the value of 1 for countries rated by S&P as AAA (in a given year) and 0 otherwise.
The DD measure from the BSM model is also employed, estimated from the daily stock prices of the firms in year t − 1. The risk-free rate for countries in the Eurozone is taken from the three-month Euribor rate, whereas the three-month treasury-bill and LIBOR rates are employed for the UK and Switzerland, respectively. The face value of debt is defined using current liabilities plus half the long-term debt, in accordance with KMV's implementation of the BSM model (Dwyer et al., 2004) and the arguments and empirical results presented by Vassalou and Xing (2004).3 Finally, following simplifications similar to those of Bharath and Shumway (2008) and Li and Miu (2010), the annual equity returns and the corresponding volatilities over year t − 1 are used as proxies of the expected return on assets ($\mu$) and its volatility ($\sigma$), respectively. However, to avoid using a negative or near-zero return on assets, we adopt the transformation of Hillegeist et al. (2004) and set $\mu = \max\{r_f, r_E\}$, where $r_f$ is the risk-free rate and $r_E$ denotes the annual equity return.
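The input definitions in this paragraph reduce to a few one-line computations, sketched below in Python. The function names and numbers are hypothetical, and the annualization of daily volatility with 252 trading days is an assumption of this sketch; the text states only that daily stock prices are used.

```python
import math
import statistics

def debt_face_value(current_liabilities, long_term_debt):
    """Face value of debt L: current liabilities plus half the long-term debt."""
    return current_liabilities + 0.5 * long_term_debt

def expected_asset_return(rf, equity_return):
    """Floor on the drift, mu = max{r_f, r_E}, following the text."""
    return max(rf, equity_return)

def annualized_volatility(daily_prices, trading_days=252):
    """Sample standard deviation of daily log returns, scaled to one year.
    The sqrt(252) scaling is an assumption, not taken from the paper."""
    log_returns = [math.log(p1 / p0) for p0, p1 in zip(daily_prices, daily_prices[1:])]
    return statistics.stdev(log_returns) * math.sqrt(trading_days)

L = debt_face_value(current_liabilities=40.0, long_term_debt=60.0)   # 40 + 30
mu_bad_year = expected_asset_return(rf=0.02, equity_return=-0.15)    # floored at r_f
mu_good_year = expected_asset_return(rf=0.02, equity_return=0.10)
vol = annualized_volatility([100.0, 110.0, 99.0, 108.9])
```

The floor in `expected_asset_return` matters mainly in crisis years, when realized equity returns are negative and would otherwise push the DD measure down through the drift term.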
Table 3 presents the averages (over all years) of the financial variables and the DD indicator for each rating group in the sample. All variables have a clear monotone (increasing) relationship with the ratings. In particular, highly rated firms (e.g., rating group $R_1$) are more profitable, have higher interest coverage, are better capitalized and leveraged in terms of long-term debt, and have higher market capitalization. The DD is also considerably larger for highly rated firms than for low-rated ones. The differences between the rating groups are statistically significant at the 1% level for all variables under the Kruskal–Wallis non-parametric test. The country indicator was also found significant at the 5% level according to the $\chi^2$ test ($p = 0.045$).
3 Vassalou and Xing (2004) note that considering long-term debt in the BSM model (even though a time horizon of one year is assumed) is important because firms make interest payments in the short term for their long-term liabilities. Additionally, long-term debt enables firms to roll over their short-term liabilities. The authors also conducted tests using different portions of long-term debt, and concluded that there is no significant variation in the results of the BSM model compared to the base setting where half of the long-term liabilities are used.

Table 4
Kendall's τ rank correlations with the credit ratings.

           Five risk-grades   Full rating scale
ROA            -0.335             -0.327
EBIT/IE        -0.391             -0.372
EQ/TA          -0.199             -0.181
EQ/LTD         -0.202             -0.186
CAP            -0.565             -0.570
TA             -0.404             -0.396
DD             -0.382             -0.375
DD-VX          -0.361             -0.352
DD-BS          -0.370             -0.362

The relationship between the selected variables and the rating of the firms is also verified with Kendall's τ rank correlation coefficient, as illustrated in Table 4. The results are reported both for the five rating categories defined above and for the full rating scale of S&P. For comparison purposes, we also include the correlations for the logarithm of total assets (TA) as an alternative size indicator to market capitalization. Results for two alternative procedures for estimating the DD measure are also reported, including the iterative procedure of Vassalou and Xing (2004) and the model of Bharath and Shumway (2008). These are denoted as DD-VX and DD-BS, respectively. All correlations are found significant at the 1% level. The negative signs indicate that (as expected) all variables are negatively related to credit risk (i.e., the higher the considered variables the lower the risk). Market capitalization has the strongest correlation with the ratings. Total assets are also strongly associated with the ratings, but the correlation is considerably weaker compared to capitalization, thus confirming that capitalization incorporates additional information when compared to accounting-based measures of a firm's size, such as its total assets. Among the three estimates of DD, the one obtained with the procedure adopted in this study has the highest correlation with the ratings. Finally, it is also worth noting the limited differences between the five-grade scheme and the full rating scale of S&P, which confirms that there is no significant information lost due to the adopted simplification of the rating scale.
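For readers who want to reproduce this kind of rank correlation, the sketch below computes Kendall's τ in pure Python. It uses the tie-adjusted τ-b variant, which is an assumption here since the text does not state which variant was used, and the six observations are hypothetical. Risk grades are coded 1 (lowest risk) to 5 (highest risk), so a variable that is higher for safer firms yields a negative coefficient.

```python
import math
from itertools import combinations

def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty)), where
    Tx and Ty count pairs tied only in x and only in y, respectively."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                      # tied in both: ignored
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + tied_x_only) *
                      (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denom

# Hypothetical data: log market capitalization and five-grade risk class.
cap = [18.2, 17.1, 16.0, 15.5, 14.9, 13.0]
grade = [1, 2, 2, 3, 4, 5]
tau = kendall_tau_b(cap, grade)           # close to -1: larger firms, lower risk
```

The pairwise loop is quadratic in the sample size, which is acceptable for a sample of this scale but would be replaced by a sorting-based algorithm for much larger data sets.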
4. Results
4.1. Empirical setting
Given that, during the time period spanned by the data, the European economic environment has experienced significant changes (e.g., the outbreak of the global crisis and the European sovereign debt crisis), we test the dynamics and robustness of the results over time by developing and validating a series of models through a walk-forward approach. In particular, the data for the period 2002–2005 are used first for model fitting, whereas the subsequent period, 2006–2012, serves as the holdout sample. In a second run, the training data are extended up to 2006, and the holdout data span the period 2007–2012. The same process is repeated up to the case where the training data cover the period 2002–2010. Thus, six training and test runs are performed. Henceforth, these walk-forward runs will be referred to as F05 (model fitting on data up to 2005) up to F10 (model fitting on data up to 2010).
In each run of the walk-forward approach, four models are developed. The first model (henceforth referred to as M1) is based solely on the four financial ratios and the country indicator. The second model (M2) additionally considers the DD. A third model (M3) extends M1 by adding capitalization, whereas the most comprehensive model (M4) combines all independent attributes. The consideration of these four models provides a decomposition of the results, covering both the information content of the market model compared to accounting ratios, as well as its explanatory power as opposed to the size of the firms, which was found to be the strongest predictor.
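The nesting of the four specifications can be written down as attribute sets, which makes explicit what each pairwise comparison isolates. This is only an illustrative encoding; the dictionary and the helper function are hypothetical, and the attribute labels follow the abbreviations used in the text.

```python
BASE = ["ROA", "EBIT/IE", "EQ/TA", "EQ/LTD", "CRI"]   # four ratios + country indicator

MODELS = {
    "M1": BASE,
    "M2": BASE + ["DD"],
    "M3": BASE + ["CAP"],
    "M4": BASE + ["DD", "CAP"],
}

def added_attributes(larger, smaller):
    """Attributes present in one specification but not in the other."""
    return set(MODELS[larger]) - set(MODELS[smaller])

dd_vs_accounting = added_attributes("M2", "M1")   # DD on top of accounting data
dd_given_size = added_attributes("M4", "M3")      # DD once CAP is already included
```

Comparing M2 with M1 therefore measures the information DD adds to accounting data, whereas comparing M4 with M3 measures what DD adds once capitalization is controlled for.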
In the section below, we analyze the results obtained using the five-point rating classification described in the previous section. As a robustness test, the two-class investment-speculative scheme will be discussed in a separate section.
4.2. Overall results
Table 5 summarizes the trade-offs of all variables in the models, averaged over all six tests of the walk-forward approach. As a measure of the variability of the results over the six tests, the coefficients of variation are also reported (in parentheses).
Under the two models that do not consider the capitalization of firms (i.e., models M1 and M2), ROA and interest coverage (EBIT/IE) are the strongest variables, followed by solvency (EQ/TA), whereas the country risk rating indicator appears to be practically irrelevant in both models. The DD measure has a strong effect in model M2, with an average trade-off constant of 23.39%. The introduction of capitalization in models M3 and M4 has a significant impact on the relative importance of the attributes. In particular, both models show capitalization as the dominant factor, with the likelihood of receiving a low risk rating increasing with capitalization. ROA and interest coverage are also found to be significant predictors. The significance of market capitalization has also been reported by Hwang et al. (2010) and Hwang (2013), who derived similar results for U.S. firms. Furthermore, with the introduction of market capitalization in model M4, the relative importance of the DD indicator becomes considerably weaker than in model M2, which ignores capitalization. This result is consistent with the findings on bankruptcy prediction of Agarwal and Taffler (2008), who used a data set of UK bankruptcies and concluded that market-based models do not carry much information over simpler market variables, particularly market capitalization. It is worth noting that capitalization is actually used in the BSM model. Therefore, these results indicate that the power of the BSM model can be mostly attributed to the use of capitalization rather than to the remaining parameters. This may explain why different variants of the model based on alternative ways of estimating the volatility of assets and the corresponding expected returns have produced robust results in past studies (Afik et al., 2012; Agarwal and Taffler, 2008; Charitou et al., 2013). As a consequence, once capitalization is explicitly taken into consideration, the BSM estimates lose a significant part of their information value.
The coefficients of variation of the variables' trade-offs are generally low in all models. This indicates
that the conclusions drawn above are robust over time (i.e., under the six tests conducted
through the walk-forward approach). Details on the evolution of
the variables' relative importance over the six walk-forward tests
are illustrated in Fig. 2 for all variables except for the country indicator, which, as explained above,
makes only a minor contribution to the models. The relative importance of ROA and interest coverage
follows a slightly decreasing trend over time. For ROA
this decreasing trend is stronger under models M3 and M4,
whereas for interest coverage the decrease is larger with models
M1 and M2. In contrast, the trade-off of long-term leverage
steadily increases and the trend is robust under all modeling settings. The same is also observed
for capitalization, whose relative importance appears higher in the walk-forward tests that
incorporate data for 2008–2010. The trade-off of the DD
follows an increasing trend under model M2, but the introduction
of capitalization in model M4 maintains it at an almost steady level
of around 10%. The same behavior is also evident for the solvency
ratio (EQ/TA), whose importance increases over time in models
M1 and M2, whereas in M3 and M4 its importance follows a
decreasing trend. The effect on DD can be explained on the basis
of the arguments described above on its connection to capitalization. The differences in the time
trend of the relative importance of solvency, due to the introduction of market
capitalization, can in turn be explained on the basis of evidence reported
in the literature on the interaction and relationship between
capitalization ratios and capital structure. For instance, Baker and
Wurgler (2002) concluded that fluctuations in market valuations
have long-lasting effects on capital structure, with low-leverage
firms being those that raise funds during the peak of their market
valuation, whereas Chen and Zhao (2006) found that the relationship between market value and capital
structure is non-monotonic.
Our empirical results indicate that, in the context of analyzing
credit ratings, even though at first glance it may seem that the
importance of solvency has increased over time, this conclusion is not robust
once the capitalization of the firms is controlled for. In fact, the positive market conditions up to 2007 (which
helped European firms to reduce their leverage and improve their
solvency) may have been the actual reason for the increase in the
relative importance of the solvency ratio that is observed in models
M1 and M2. However, after controlling for the market capitalization of the firms, the effect of
solvency becomes weaker and appears to be steady over time.

Table 5
Trade-offs (in %) of the independent variables (averages over all years, with coefficients of variation in parentheses).

      ROA           EBIT/IE       EQ/TA         EQ/LTD        CI            DD            CAP
M1    40.17 (0.04)  31.37 (0.13)  18.34 (0.17)  8.26 (0.28)   1.86 (0.09)   -             -
M2    27.13 (0.09)  27.20 (0.16)  13.71 (0.18)  6.89 (0.24)   1.67 (0.07)   23.39 (0.11)  -
M3    19.98 (0.16)  15.49 (0.11)  9.08 (0.06)   8.38 (0.22)   1.03 (0.08)   -             46.03 (0.05)
M4    15.12 (0.20)  13.43 (0.13)  7.13 (0.12)   7.80 (0.22)   0.93 (0.13)   10.70 (0.09)  44.89 (0.05)

Fig. 2. The trade-offs of the variables over time.
[Figure panels: ROA, EBIT/IE, EQ/TA, EQ/LTD, DD, and CAP; horizontal axis: walk-forward tests F05 to F10; vertical axis: trade-off (%); series: models M1 to M4.]
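The averages and coefficients of variation reported in Table 5 can be illustrated with a minimal Python sketch. It assumes that the coefficient of variation is the standard deviation of a variable's trade-off across the six walk-forward tests divided by its mean. The use of the population standard deviation, the helper name `coefficient_of_variation`, and the input numbers are assumptions made for illustration; they are not taken from the study.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return std / mean for a sequence of trade-offs.

    Assumption: the population standard deviation is used; the source
    does not state which variant was applied.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return pstdev(values) / m


# Hypothetical trade-offs (in %) of one variable over the six
# walk-forward tests F05..F10. Illustrative numbers only.
hypothetical_tradeoffs = [44.0, 45.0, 46.0, 46.0, 47.0, 48.0]

avg = mean(hypothetical_tradeoffs)
cv = coefficient_of_variation(hypothetical_tradeoffs)
print(f"average trade-off: {avg:.2f}%, coefficient of variation: {cv:.3f}")
```

A low coefficient of variation, as in this toy series, corresponds to a trade-off that barely moves from one walk-forward test to the next.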
Table 6 compares the classification accuracies of all models in the holdout samples.4 The rows in
the two panels correspond to the six walk-forward tests, while the
columns correspond to the years in the holdout samples. For each
year in the holdout samples, Panel A presents the difference between the classification accuracy
of model M2 and that of model M1, whereas Panel B presents the same information for the
comparison of M4 to M3. The last two columns of the table present the overall classification
accuracies of the models for the full holdout panel data.

4 The classification accuracy of a rating model is defined as the ratio of the
number of correct rating assignments by the model (i.e., cases where the ratings
estimated by the model coincide with the actual ratings) to the total number of
observations in the sample.
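The layout of the rows and columns of Table 6 can be sketched in Python as follows. The sketch assumes an expanding training window that starts in 2002 and ends in the cutoff year of each test (e.g., F07 uses data up to 2007), with all later years up to 2012 forming the holdout sample. The function name `walk_forward_splits` and its parameters are hypothetical, and the exact windowing used in the study may differ.

```python
def walk_forward_splits(first_year=2002, last_year=2012,
                        first_cutoff=2005, last_cutoff=2010):
    """Map each walk-forward label (F05..F10) to (training years, holdout years).

    Assumption: training data accumulate from first_year up to the cutoff,
    and the holdout sample covers every later year up to last_year.
    """
    splits = {}
    for cutoff in range(first_cutoff, last_cutoff + 1):
        label = f"F{cutoff % 100:02d}"
        training_years = list(range(first_year, cutoff + 1))
        holdout_years = list(range(cutoff + 1, last_year + 1))
        splits[label] = (training_years, holdout_years)
    return splits


splits = walk_forward_splits()
for label, (training_years, holdout_years) in splits.items():
    print(label, "train:", training_years[0], "-", training_years[-1],
          "holdout:", holdout_years[0], "-", holdout_years[-1])
```

Under these assumptions, later tests have fewer holdout years, which is why the rows of such a table are filled only from the first holdout year of each test onward.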
Both models M1 and M2 clearly perform poorly, as their classification accuracies are below 50%.
The accuracies for model M1, which only considers the four financial ratios and the country
indicator, range from 37% to 40%. The introduction of DD into model
M2 improves the results (over M1) by 4.5% on average (according
to McNemar's test, the differences between the two models are
statistically significant at the 5% level, except for the F07 walk-forward
test, which is based on the data up to 2007). The comparisons of
the two models by year indicate that M2 performs better than M1 in almost all cases, and is
significantly better in 2011 and 2012. The introduction of capitalization in models M3
and M4 considerably improves the results. Model M3 outperforms
M1 by more than 18% on average, and M4 performs better than M2
by more than 15%. When compared to model M3, which combines
the financial ratios, the country indicator, and capitalization, the
introduction of DD in model M4 provides a slight improvement
of 1.2% on average, but the differences are not significant at the
5% level according to McNemar's test.

Table 6
Comparison of classification accuracies.

Panel A: M2 - M1 differences
                                                               Overall accur.
       2006    2007    2008    2009    2010    2011    2012    M1      M2
F05    2.22    2.90    0.71    2.88    4.03    4.64    11.96   38.77   41.74
F06            6.52    2.14    2.16    3.36    5.30    14.13   37.70   42.03
F07                    2.14    0.72    0.67    4.64    9.78    37.11   39.34
F08                            0.00    3.36    3.97    14.13   39.17   43.69
F09                                    4.70    5.30    10.87   40.05   46.43
F10                                            5.96    8.70    39.92   46.91

Panel B: M4 - M3 differences
                                                               Overall accur.
       2006    2007    2008    2009    2010    2011    2012    M3      M4
F05    0.74    5.07    0.71    1.44    1.34    2.65    2.17    54.03   55.40
F06            1.45    0.71    0.72    1.34    3.97    2.17    57.97   58.71
F07                    0.00    2.16    0.67    4.64    0.00    56.18   56.93
F08                            0.72    2.68    2.65    1.09    56.87   58.00
F09                                    2.68    3.31    2.17    58.42   60.20
F10                                            0.66    2.17    60.91   62.14

Note: the yearly differences are listed as they appear in the source text; minus signs, where present in the original table, are not preserved.
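The accuracy measure of footnote 4 and a McNemar-type comparison of two rating models on the same holdout observations can be sketched as follows. This is a minimal illustration that assumes the exact (binomial) form of McNemar's test; the source does not state which variant was used. The function names and the toy ratings are hypothetical.

```python
from math import comb


def accuracy(actual, predicted):
    """Share of observations whose estimated rating equals the actual one."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def mcnemar_exact(actual, pred_a, pred_b):
    """Exact two-sided McNemar test on the discordant pairs of two models.

    b: cases that model A rates correctly and model B does not.
    c: cases that model B rates correctly and model A does not.
    Under the null hypothesis the discordant cases split 50/50.
    """
    b = sum(pa == y and pb != y for y, pa, pb in zip(actual, pred_a, pred_b))
    c = sum(pa != y and pb == y for y, pa, pb in zip(actual, pred_a, pred_b))
    n = b + c
    if n == 0:
        return b, c, 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical actual ratings (1 = R1, ..., 5 = R5) and two sets of estimates.
actual = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
pred_m1 = [1, 2, 2, 3, 5, 2, 3, 3, 3, 4]
pred_m2 = [1, 2, 3, 4, 5, 1, 3, 3, 4, 4]

acc_m1 = accuracy(actual, pred_m1)
acc_m2 = accuracy(actual, pred_m2)
b, c, p_value = mcnemar_exact(actual, pred_m1, pred_m2)
print(f"accuracy M1 = {acc_m1:.2f}, accuracy M2 = {acc_m2:.2f}, p-value = {p_value:.3f}")
```

The test uses only the observations on which the two models disagree in correctness, so a large accuracy gap on a small sample can still fail to reach significance.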
Table 7 presents the overall average classification matrix (from
all years) for model M4. On average, the accuracy rate for the
highly rated firms (rating category R1) is 90.23%. Thus, more than
90% of the firms from risk category R1 (i.e., rated as at least AA)
are also assigned by the model to the same rating class (the corresponding accuracy rate is
significantly different from a random classification at the 1% level, according to a z-test).
On the other hand, the accuracy rates are lower for the firms in classes R2
and R3 (48.09% and 57.82%, respectively; still significantly different from random at the 1% level).
The overall classification accuracy rate is 57.8% (total percentage of correct predictions), and it
is significantly different from a random classification result at the
1% level. Furthermore, all misclassifications are restricted to at most two notches. In fact, 94% of
the errors involve one-notch misclassifications. Overall, downgrades account for 55% of the
misclassifications versus 45% for upgrades, with the downgrades outnumbering the upgrades in all of
the six walk-forward tests, except for F05. This is a positive result,
as it indicates that the discrepancies between the multi-criteria
models and the actual credit ratings of the firms involve mostly
cases for which the models provide more conservative estimates
(i.e., indicating that the risk is higher than the one implied by the
ratings). The downgrades from the investment grades (R1–R3) to
speculative classes are 18% of the total number of observations in
the investment classes, whereas the upgrade rate (from speculative to investment grades) is 23% of
the total number of observations in the speculative categories R4 and R5.
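The quantities discussed in this paragraph (row-wise classification percentages, the share of one-notch errors, and the split between downgrades and upgrades) can be computed as in the following sketch. Rating classes are coded 1 (R1, lowest risk) to 5 (R5, highest risk), so an estimate with a larger class index than the actual one counts as a downgrade. The function names and the toy data are hypothetical.

```python
def classification_matrix(actual, estimated, n_classes=5):
    """Row-normalized matrix: % of firms with actual class i estimated in class j."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for a, e in zip(actual, estimated):
        counts[a - 1][e - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([100.0 * c / total if total else 0.0 for c in row])
    return matrix


def error_profile(actual, estimated):
    """Accuracy plus the composition of the misclassifications."""
    notches = [e - a for a, e in zip(actual, estimated) if e != a]
    n_errors = len(notches)
    profile = {"accuracy": 1 - n_errors / len(actual)}
    if n_errors:
        profile["one_notch_share"] = sum(abs(d) == 1 for d in notches) / n_errors
        profile["downgrade_share"] = sum(d > 0 for d in notches) / n_errors
        profile["upgrade_share"] = sum(d < 0 for d in notches) / n_errors
    return profile


# Hypothetical actual and estimated rating classes for twelve firms.
actual = [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
estimated = [1, 2, 2, 3, 1, 3, 4, 3, 4, 3, 5, 3]

matrix = classification_matrix(actual, estimated)
profile = error_profile(actual, estimated)
for label, row in zip(["R1", "R2", "R3", "R4", "R5"], matrix):
    print(label, [round(value, 2) for value in row])
print(profile)
```

Each row of the resulting matrix sums to 100, which matches the way Table 7 is read: rows are actual ratings and columns are the model's estimates.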
The classification results are further tested using the area under
the receiver operating characteristic (AUROC) curve. The AUROC is
usually employed to measure the discriminating power of two-class credit scoring and rating models;
however, it can be extended to multi-class cases, such as the one considered in this study. To
this end, the generalization proposed by Hand and Till (2001) is
employed, which indicates the probability that a firm from any rating category $R_\ell$
($\ell = 1, \ldots, N-1$) has a higher evaluation score
(credit score) than any other firm with a worse rating (i.e., from
any of the classes $R_{\ell+1}, \ldots, R_N$). Fig. 3 illustrates the average AUROC
for all models. The results are in accordance with the observations
made above regarding the classification accuracies. In particular,
model M2 consistently outperforms M1 (by an average of 2.3%),
whereas the two models that take capitalization into account
(M3 and M4) are almost indistinguishable, as their differences
range between 0.04% and 0.3%. Furthermore, similar to the comparison of the classification
accuracies (Table 6), the improvement
that the introduction of the DD brings in model M2 compared to
M1 steadily increases over time, particularly in the walk-forward
tests F08–F10, in which the development of the models is based
on data that include the effects of the crisis.
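A simplified sketch of a multi-class AUROC in the spirit of Hand and Till (2001) is given below. It assumes that each firm has a single credit score (higher meaning lower risk) and averages, over all pairs of rating classes, the probability that a firm from the better class scores higher than a firm from the worse class, with ties counted as one half. The function names and the toy data are hypothetical, and the exact computation in the study may differ.

```python
from itertools import combinations


def pairwise_auc(better_scores, worse_scores):
    """Probability that a firm from the better class outscores one from the worse class."""
    wins = 0.0
    for b in better_scores:
        for w in worse_scores:
            if b > w:
                wins += 1.0
            elif b == w:
                wins += 0.5  # ties count as one half
    return wins / (len(better_scores) * len(worse_scores))


def multiclass_auroc(scores, ratings):
    """Unweighted average of the pairwise AUCs over all pairs of rating classes.

    Ratings are coded so that a smaller number is a better rating (1 = R1).
    """
    classes = sorted(set(ratings))
    by_class = {c: [s for s, r in zip(scores, ratings) if r == c] for c in classes}
    aucs = [pairwise_auc(by_class[i], by_class[j])
            for i, j in combinations(classes, 2)]
    return sum(aucs) / len(aucs)


# Hypothetical credit scores and actual rating classes for six firms.
scores = [0.9, 0.6, 0.7, 0.75, 0.4, 0.3]
ratings = [1, 1, 2, 2, 3, 3]

m = multiclass_auroc(scores, ratings)
print(f"multi-class AUROC: {m:.4f}")
```

A value of 1 would mean that the scores separate every pair of classes perfectly, while 0.5 corresponds to scores that carry no ordering information.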
Table 7
Average classification matrix for model M4.

                 Estimated rating (%)
Actual rating    R1      R2      R3      R4      R5
R1               90.23   6.32    3.45    0.00    0.00
R2               17.53   48.09   31.35   3.03    0.00
R3               1.49    12.60   57.82   26.94   1.16
R4               0.00    1.60    27.67   60.43   10.29
R5               0.00    0.00    2.17    35.65   62.17

Percentage of sample observations with actual rating Ri (rows) that are classified by
the model in class Rj (columns).
Significantly different from random classification at the 1% level.

Fig. 3. AUROC of the models, averaged over all years.
[Figure: horizontal axis: walk-forward tests F05 to F10; vertical axis: AUROC (%); series: models M1 to M4.]
Table 8
Average trade-offs (in %) of the independent variables (coefficients of variation in parentheses) under the two-class setting.

      ROA           EBIT/IE       EQ/TA         EQ/LTD        CI            DD            CAP
M1    36.38 (0.15)  31.85 (0.16)  20.51 (0.36)  9.47 (0.23)   1.78 (0.09)   -             -
M2    23.03 (0.23)  25.26 (0.28)  14.91 (0.36)  6.79 (0.22)   1.76 (0.21)   28.24 (0.25)  -
M3    14.78 (0.27)  18.24 (0.13)  11.94 (0.44)  7.32 (0.10)   1.48 (0.44)   -             46.24 (0.03)
M4    11.26 (0.22)  15.50 (0.18)  10.34 (0.46)  6.41 (0.12)   1.31 (0.44)   10.94 (0.19)  44.24 (0.03)

Fig. 4. Classification accuracy and AUROC for the two-class models (averaged over all years).
[Figure panels: Accuracy and AUROC; horizontal axis: walk-forward tests F05 to F10; series: models M1 to M4.]
4.3. Speculative versus investment grades

We tested the robustness of the results presented above by also
considering a binary classification scheme, based on discriminating
between speculative and investment grades. Table 8 summarizes
the trade-offs of the variables in each of the four models. The results confirm the findings
discussed previously (e.g., Table 5). In
particular, among the financial ratios, ROA, interest coverage, and
solvency have the highest relative contribution in the models.
The country indicator once again has marginal relevance, whereas
DD has a relative contribution in model M2 equal to 28.24%, which
is similar to its trade-off (23.39%) in the same model with the
five-class scheme. Capitalization is again the dominant factor in
models M3 and M4, with its trade-off being consistently higher
than 40% and exhibiting low variability over the six walk-forward
tests (as indicated by the coefficient of variation, which is equal to
0.03). As with the observation made above for model M2, the relative importance of DD in model M4
under this two-class setting
is almost identical to the multi-grade specification discussed in
the previous section (i.e., 10.94% in the two-class scheme versus
10.7% in the five-grade scheme).
Details on the classification results (accuracy rate and AUROC)
are shown in Fig. 4. As seen with the multi-class setting, the introduction of DD in model M2
improves both the classification accuracy (by 1.77% on average) and AUROC (by 1.6% on average), with
the improvements becoming larger in the walk-forward tests
based on the most recent data (the same result was also observed
in the multi-category setting). On the other hand, the introduction
of DD provides no noticeable benefit when capitalization is already
included in the model. Under the most comprehensive model, M4,
the overall (average) accuracy is 77%, with the downgrade error
rate being 23.5% and the upgrade error rate being 21.8%. Interestingly,
these error rates are slightly higher than the ones reported earlier
for the multi-class scheme, thus indicating that the proposed
MCDM model-fitting approach for multi-class problems is better
suited for the analysis of credit ratings in a multi-grading scheme
than a simpler dichotomous approach.
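The two error rates quoted for the binary setting can be illustrated with a short sketch. It assumes, following the grouping used earlier in the text, that classes R1 to R3 are investment grades and R4 and R5 are speculative grades. The downgrade error rate is then the share of investment-grade firms that the model places in the speculative group, and the upgrade error rate is the reverse. The function names and the toy data are hypothetical.

```python
INVESTMENT, SPECULATIVE = "investment", "speculative"


def to_grade(rating_class, last_investment_class=3):
    """Map a five-class rating (1 = R1, ..., 5 = R5) to a binary grade."""
    return INVESTMENT if rating_class <= last_investment_class else SPECULATIVE


def two_class_errors(actual_grades, estimated_grades):
    """Return (accuracy, downgrade error rate, upgrade error rate).

    Assumes that both grades occur at least once among the actual grades.
    """
    pairs = list(zip(actual_grades, estimated_grades))
    est_for_investment = [e for a, e in pairs if a == INVESTMENT]
    est_for_speculative = [e for a, e in pairs if a == SPECULATIVE]
    downgrade_rate = (sum(e == SPECULATIVE for e in est_for_investment)
                      / len(est_for_investment))
    upgrade_rate = (sum(e == INVESTMENT for e in est_for_speculative)
                    / len(est_for_speculative))
    acc = sum(a == e for a, e in pairs) / len(pairs)
    return acc, downgrade_rate, upgrade_rate


# Hypothetical five-class ratings, collapsed to the binary scheme.
actual_grades = [to_grade(r) for r in [1, 2, 3, 3, 4, 4, 5, 2]]
estimated_grades = [to_grade(r) for r in [1, 3, 4, 3, 3, 4, 5, 2]]

acc, downgrade_rate, upgrade_rate = two_class_errors(actual_grades, estimated_grades)
print(f"accuracy = {acc:.2f}, downgrade error = {downgrade_rate:.3f}, "
      f"upgrade error = {upgrade_rate:.3f}")
```

Because each error rate is normalized by the size of its own group, the two rates need not add up to one minus the overall accuracy.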
5. Conclusions and future perspectives

The analysis of the ratings issued by CRAs has received much
attention in the financial literature, due to their significance in
the context of credit risk management and their widespread use
by investors, policy makers, and managers. In this study, we sought
to explain and predict the credit ratings issued by S&P on the basis
of financial and market data, using a unique cross-country panel
data set from Europe over the period 2002–2012. The BSM structural model was employed to obtain DD
estimates, and their information content in predicting credit ratings was compared to
models based on financial and market capitalization data. For the
analysis, we employed an MCDM technique based on a linear programming formulation for fitting
additive credit scoring models to the data. The results demonstrate that, even though the BSM
model significantly enhances the predictions based solely on financial ratios, its information power
is considerably weaker when market capitalization is considered. Furthermore, the relative
importance of market capitalization and long-term leverage has increased after the outbreak of the
crisis, whereas the relative importance of profitability ratios has decreased. The developed
multi-criteria models exhibit good and robust behavior, with most classification
errors restricted to one-notch divergences from the actual ratings.
These empirical results could be extended in a number of directions. First, additional predictor
attributes could be considered, focusing on macroeconomic factors, which could be of particular
importance over the business cycle and during economic turmoil,
providing a better description of cross-country differences. Data
related to market sentiment and information from the CDS markets could also be useful for predicting
credit ratings by complementing the estimates of structural models with timely findings
on market trends. Variables related to regulatory frameworks
(Cheng and Neamtiu, 2009) and corporate governance (Alali
et al., 2012) are also important for the analysis of credit ratings
in a comprehensive context. Second, beyond
static analyses of credit ratings, the explanation and prediction of
rating changes are also of major interest to market
participants. The investigation could also be extended to cover
non-listed companies, which raises the question of how structural
models can be generalized successfully to cases where market
data are not available. Finally, the combination of financial data,
structural models, and credit ratings in an integrated risk management context could be considered,
in accordance with the findings
reported in recent studies (e.g., Das et al., 2009; Hilscher and
Wilson, 2013) on the possible synergies that can be derived by
combining different types of risk models and measures in credit
risk assessment.
References
Afik, Z., Arad, O., Galil, K., 2012. Using Merton Model: An Empirical Assessment of Alternatives. Tech. Rep., Ben Gurion University, Israel.
Agarwal, V., Taffler, R., 2008. Comparing the performance of market-based and accounting-based bankruptcy prediction models. J. Bank. Finance 32 (8), 1541–1551.
Alali, F., Anandarajan, A., Jiang, W., 2012. The effect of corporate governance on firms' credit ratings: further evidence using governance score in the United States. Account. Finance 52 (2), 291–312.
Baker, M., Wurgler, J., 2002. Market timing and capital structure. J. Finance 57 (1), 1–32.
Bharath, S.T., Shumway, T., 2008. Forecasting default with the Merton distance to default model. Rev. Financ. Stud. 21 (3), 1339–1369.
Black, F., Scholes, M., 1973. The pricing of options and corporate liabilities. J. Polit. Econ. 81 (3), 637–659.
Charitou, A., Dionysiou, D., Lambertides, N., Trigeorgis, L., 2013. Alternative bankruptcy prediction models using option-pricing theory. J. Bank. Finance 37 (7), 2329–2341.
Chen, L., Zhao, X., 2006. On the relation between the market-to-book ratio, growth opportunity, and leverage ratio. Finance Res. Lett. 3 (4), 253–266.
Cheng, M., Neamtiu, M., 2009. An empirical analysis of changes in credit rating properties: timeliness, accuracy and volatility. J. Account. Econ. 47 (1–2), 108–130.
Das, S.R., Hanouna, P., Sarin, A., 2009. Accounting-based versus market-based cross-sectional models of CDS spreads. J. Bank. Finance 33 (4), 719–730.
Doumpos, M., Pasiouras, F., 2005. Developing and testing models for replicating credit ratings: a multicriteria approach. Comput. Econ. 25 (4), 327–341.
Doumpos, M., Zopounidis, C., 2002. Multicriteria Decision Aid Classification Methods. Springer, New York.
Doumpos, M., Zopounidis, C., 2011. A multicriteria outranking modeling approach for credit rating. Decis. Sci. 42 (3), 721–742.
Dwyer, D., Kocagil, A., Stein, R., 2004. Moody's KMV RiskCalc™ v3.1 Model. Tech. Rep., Moody's Investor Services. <http://www.moodys.com/sites/products/ProductAttachments/RiskCalc3.1Whitepaper.pdf> (last accessed 08.12.13).
Frost, C.A., 2007. Credit rating agencies in capital markets: a review of research evidence on selected criticisms of the agencies. J. Account., Audit. Finance 22 (3), 469–492.
Hand, D.J., Till, R.J., 2001. A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach. Learn. 45 (2), 171–186.
Hillegeist, S.A., Keating, E.K., Cram, D.P., Lundstedt, K.G., 2004. Assessing the probability of bankruptcy. Rev. Account. Stud. 9 (1), 5–34.
Hilscher, J., Wilson, M., 2013. Credit Ratings and Credit Risk: Is One Measure Enough? Tech. Rep., AFA 2013 San Diego Meeting.
Huang, Z., Chen, H., Hsu, C.-J., Chen, W.-H., Wu, S., 2004. Credit rating analysis with support vector machines and neural networks: a market comparative study. Decis. Support Syst. 37 (4), 543–558.
Hull, J., 2011. Options, Futures, and Other Derivatives, eighth ed. Prentice Hall, New Jersey.
Hwang, R.-C., 2013. Forecasting credit ratings with the varying-coefficient model. Quant. Finance, 1–19.
Hwang, R.-C., Chung, H., Chu, C., 2010. Predicting issuer credit ratings using a semiparametric method. J. Empirical Finance 17 (1), 120–137.
Jeon, D.-S., Lovo, S., 2013. Credit rating industry: a helicopter tour of stylized facts and recent theories. Int. J. Ind. Org. 31 (5), 643–651.
Krahnen, J.P., Weber, M., 2001. Generally accepted rating principles: a primer. J. Bank. Finance 25 (1), 3–23.
Li, M.-Y.L., Miu, P., 2010. A hybrid bankruptcy prediction model with dynamic loadings on accounting-ratio-based and market-based information: a binary quantile regression approach. J. Empirical Finance 17 (4), 818–833.
Lu, H.-M., Tsai, F.-T., Chen, H., Hung, M.-W., Li, S.-H., 2012. Credit rating change modeling using news and financial ratios. ACM Trans. Manage. Inform. Syst. 3 (3), 14:1–14:30.
Merton, R.C., 1974. On the pricing of corporate debt: the risk structure of interest rates. J. Finance 29 (2), 449–470.
Mizen, P., Tsoukas, S., 2012. Forecasting US bond default ratings allowing for previous and initial state dependence in an ordered probit model. Int. J. Forecast. 28 (1), 273–287.
Pagano, M., Volpin, P., 2010. Credit ratings failures and policy options. Econ. Policy 25 (62), 401–431.
Pasiouras, F., Gaganis, C., Zopounidis, C., 2006. The impact of bank regulations, supervision, market structure, and bank characteristics on individual bank ratings: a cross-country analysis. Rev. Quant. Finance Account. 27 (4), 403–438.
Tichy, G., Lannoo, K., Gwilym, O., Alsakka, R., Masciandaro, D., Paudyn, B., 2011. Credit rating agencies: part of the solution or part of the problem? Intereconomics 46 (5), 232–262.
Treacy, W.F., Carey, M., 2000. Credit risk rating systems at large US banks. J. Bank. Finance 24 (1–2), 167–201.
Vassalou, M., Xing, Y., 2004. Default risk in equity returns. J. Finance 59 (2), 831–868.
Zopounidis, C., Doumpos, M., 2013. Multicriteria decision systems for financial problems. TOP 21 (2), 241–261.
