
Chapter 3

Non-Linear Probability Models



In Chapter 2, we described the basic regression technique employed in estimating
probability models. However, three problems stand out when estimating the Linear
Probability Model: low goodness of fit, unreasonable probability estimates, and the
inability to capture non-linear effects of the explanatory variables on the default
probability. In this chapter, we explore the estimation of non-linear probability models.

The essence of non-linear regression involving a dummy dependent variable is to estimate a
regression model with $k$ explanatory variables,

$$y_i^* = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + u_i, \qquad (3.1)$$
where $u_i$ is the error term in estimation and $y_i^*$ is not observed. In our case, $y_i^*$ is the
probability of default, which is not directly observable. This is often called the latent
variable estimation technique. Instead of observing $y_i^*$, what we observe is a dummy
variable $y_i$ defined by

$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise.} \end{cases} \qquad (3.2)$$

To estimate (3.1), we need to assume, or impose structure on, the error term $u_i$. Note that
$$P_i = \text{Prob}(y_i = 1) = \text{Prob}\left(u_i > -\beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\right) = 1 - F\left(-\beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\right), \qquad (3.3)$$



where $F(\cdot)$ is the cumulative distribution function of the error term $u$. Note that, if the distribution of
$u$ is symmetric, then $F(-Z) = 1 - F(Z)$. Therefore,

$$P_i = \text{Prob}(y_i = 1) = F\left(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}\right).$$
What are the possible error distributions one can assume? We consider two commonly used
distributions: (i) the logistic distribution, and (ii) the normal distribution.

3.1. Logistic Model
The exact functional form to be estimated in equation (3.3) depends upon the assumption one
makes about the error terms. If the cumulative distribution of $u_i$ is logistic, we have what is
popularly known as the logit model.

The random variable $Z$ follows a logistic distribution if

$$F(Z_i) = \frac{e^{Z_i}}{1 + e^{Z_i}}. \qquad (3.4)$$
Note that

$$\ln\left(\frac{F(Z_i)}{1 - F(Z_i)}\right) = Z_i,$$

where $\ln$ denotes the natural logarithm (base $e$). Therefore,

$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}.$$

The left-hand side of this equation is called the log-odds ratio.
Some of the interesting features of the logistic model are:
- As $Z_i \to -\infty$, the cumulative probability approaches zero, while as $Z_i \to +\infty$, the cumulative probability approaches one.
- Although $L$ is linear in the explanatory variables, the probabilities are not. To see this, note that $\frac{dP}{dx_j} = \beta_j P(1 - P)$. This shows that the rate of change in the probability as an explanatory variable changes depends not only on the parameter $\beta_j$, but also on the level of probability from which the change is measured (see the sketch below). The effect of a unit change in the explanatory variable on the probability is highest when $P = 0.5$ and is smallest when $P$ approaches 0 or 1.
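To make this non-linear marginal effect concrete, here is a minimal Python sketch; the coefficient values and the debt-equity ratios below are hypothetical, chosen only for illustration:

```python
import numpy as np

beta0, beta1 = -5.0, 10.0        # hypothetical logit coefficients
x = np.array([0.2, 0.5, 0.8])    # hypothetical debt-equity ratios

p = 1 / (1 + np.exp(-(beta0 + beta1 * x)))   # logistic probabilities
marginal = beta1 * p * (1 - p)               # dP/dx = beta1 * P * (1 - P)

for xi, pi, mi in zip(x, p, marginal):
    print(f"x = {xi:.1f}: P = {pi:.3f}, dP/dx = {mi:.3f}")
```

Running this shows the marginal effect peaking at $P = 0.5$ (where $x = 0.5$ here) and shrinking towards the extremes.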
A simple plot of the logistic distribution function is given in figure 3.1.

Figure 3.1: Logistic function (cumulative probability plotted against $Z$ over the range $-15$ to $15$).


How do we estimate the above equation? Unfortunately, a standard estimation technique like
Ordinary Least Squares (OLS) is not applicable here.¹ Instead, we employ a technique
called the Maximum Likelihood technique. This is arrived at by setting up the likelihood
function

$$l = \prod_{y_i = 1} P_i \prod_{y_i = 0} (1 - P_i)$$

and then choosing the parameters $\beta_j$ to maximize $l$. Most advanced econometric software
packages do this routinely.²
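For readers who want to see the mechanics, the following Python sketch maximizes the likelihood directly (via the equivalent log-likelihood); the data are hypothetical, and in practice a packaged routine would be used:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(beta, X, y):
    """Negative log of l = prod_{y=1} P_i * prod_{y=0} (1 - P_i) for the logit model."""
    p = 1 / (1 + np.exp(-(X @ beta)))   # logistic probabilities
    eps = 1e-12                         # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# hypothetical sample: debt-equity ratios and default indicators
deq = np.array([0.1, 0.2, 0.3, 0.35, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8])
y   = np.array([0,   0,   0,   0,    1,   0,    1,   1,   1,   1  ])
X = np.column_stack([np.ones_like(deq), deq])   # intercept column plus regressor

result = minimize(neg_log_likelihood, x0=np.zeros(2), args=(X, y), method="BFGS")
print("Estimated (beta0, beta1):", result.x)
```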
We report below the routine findings of a logistic regression estimated on the data set given
in Statistical data.xls. The software package employed for the logistic regression is
EViews 5.0. The estimated logistic regression equation is

$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = -5.15 + 14.17\,\frac{\text{Debt}}{\text{Equity}}. \qquad (3.5)$$

Therefore, for a firm that has a debt-equity ratio of 0.5 (say), the probability of default is
(approximately) 0.535.

¹ For details see Gujarati (1995).
² The OLS technique cannot be employed for one additional reason: the estimation suffers from
heteroscedasticity (Gujarati, 1995).

Goodness of Fit: How good is the estimated model obtained in (3.5)? Usually, when
confronted with this question, we base our decision on a goodness-of-fit statistic, the
$R^2$ or the adjusted $R^2$, depending on whether we are estimating a single-explanatory-variable
model or a multiple-explanatory-variable model. However, the usual interpretation of $R^2$ may
not be valid here, nor is it even the most desirable statistic for judging goodness of fit. The
reasons are as follows. Firstly, the observed frequencies are binary, zeroes and ones, while the
estimated probabilities are continuous variables in the interval $[0, 1]$. Secondly,
a good probability model for us is one where, if a firm is classified as 'high probability of
default' as compared to another classified as 'low probability of default', the first firm should
actually have defaulted while the second should not have.

In other words, a good probability estimation model should minimize the type I
and type II errors. The type I error in this case is classifying a firm as having a low probability of
default when it has actually defaulted. Similarly, the type II error is classifying a
firm as having a high probability of default when it has actually not defaulted. Therefore, we need
to specify a threshold level of default probability (here, say 0.5) and calculate the extent of errors in our
model. Out of the 32 observations, of which 13 were defaulters and 19 non-defaulters, the
classification table with a threshold default probability of 0.5 is presented below.






Table 3.1

                     Estimated Equation          Constant Probability
                 Dep=0   Dep=1   Total       Dep=0   Dep=1   Total
P(Dep=1)<=0.5      19       2      21          19      13      32
P(Dep=1)>0.5        0      11      11           0       0       0
Total              19      13      32          19      13      32
Correct            19      11      30          19       0      19
% Correct         100   84.62   93.75         100       0   59.38
% Incorrect         0   15.38    6.25           0     100   40.63
Total Gain*         0   84.62   34.38

The interpretation of the above classification table is crucial in understanding goodness of fit
for logit models. Out of 32 observations involving 19 non-defaulters (those with $y_i = 0$) and
13 defaulters (those with $y_i = 1$), the model has correctly predicted 30 firms. The model
predicted a default probability of $P > 0.5$ for 11 firms. All 11 of these are correctly
classified; that is, none of the actual non-defaulters is estimated to have a probability of
default exceeding 0.5. Therefore, the type II error is zero here. However, there were actually
13 defaulters, of which the model correctly captured 11; it could not correctly classify 2 of
the defaulters. This is the type I error. The findings of this model are now contrasted with
the findings of a constant probability model. Note that we could have set up a trivial model
that classifies all firms as non-defaulters, or all firms as defaulters. How accurate would that
model be? If we consider the first trivial model (where we call all firms non-defaulters), we
will be correct for 19 out of 32 firms, as the 19 non-defaulters are correctly classified.
However, this trivial model also means that we incorrectly classify the remaining 13. Our
estimated logit model should have the desirable property of more explanatory power than the
trivial constant probability model. The total percentage gain our model has over the constant
probability model is

$$\left(\frac{19 + 11}{32}\right) - \left(\frac{19}{32}\right) = 34.38\%.$$

In both parentheses, the denominator is the total number of observations, while the numerator
is the number of observations correctly classified. Classification analysis is the most
commonly used goodness-of-fit check for logit models.³

Note that if we set the cut-off probability at 0.4, we get a more conservative approach to
measuring default risk. Although the overall predictive accuracy of the model may go down,
this would mean a lower type I error, which may be desirable if default is very costly. The
eventual cut-off to use is best left to the practitioners.
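A short sketch of how such error counts can be computed for any cut-off is given below; the predicted probabilities and outcomes here are hypothetical placeholders, and in practice they would come from the fitted model:

```python
import numpy as np

def classification_summary(p_hat, y, cutoff=0.5):
    """Tabulate correct classifications and type I / type II errors at a cut-off."""
    pred = (p_hat > cutoff).astype(int)
    type1 = np.sum((pred == 0) & (y == 1))  # predicted safe, actually defaulted
    type2 = np.sum((pred == 1) & (y == 0))  # predicted default, actually safe
    return np.sum(pred == y), type1, type2

# hypothetical fitted probabilities and observed outcomes
p_hat = np.array([0.1, 0.45, 0.3, 0.55, 0.7, 0.9])
y     = np.array([0,   1,    0,   1,    1,   1  ])

for c in (0.5, 0.4):
    correct, t1, t2 = classification_summary(p_hat, y, cutoff=c)
    print(f"cutoff={c}: correct={correct}, type I={t1}, type II={t2}")
```

In this toy example, lowering the cut-off from 0.5 to 0.4 eliminates the type I error, illustrating the conservatism described above.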

While employing the logit model, one has to be wary of certain problems. While most
standard packages can handle the estimation problems, one has to be careful about the problem of
disproportionate samples. In other words, it is desirable to have an almost equal number of
defaulters and non-defaulters in the sample. Usually, in a cross section involving n borrowers,
the defaulters will be outnumbered by the non-defaulters. One possible way of
overcoming this problem is to apply different sampling rates to the two groups. Different
sampling rates tend to affect the estimated intercept term.

3.2. Probit Model

In the previous section, the cumulative distribution of $u_i$ was logistic. However, if the error
terms $u_i$ in (3.3) follow a normal distribution, we have the probit model.⁴ Therefore, with the
error term following the standard normal distribution, we have
$$F(Z_i) = \int_{-\infty}^{Z_i} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt. \qquad (3.6)$$
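As a quick numerical sanity check of (3.6), one can evaluate the integral directly and compare it against a library implementation of the normal CDF; a minimal sketch:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def probit_cdf(z):
    """Evaluate F(z) in (3.6) by integrating the standard normal density."""
    value, _ = quad(lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi), -np.inf, z)
    return value

print(probit_cdf(1.0))   # approx. 0.8413
print(norm.cdf(1.0))     # same value from scipy's built-in normal CDF
```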

³ Alternative measures include McFadden's $R^2$ and the Hosmer-Lemeshow test. Both these tests are beyond the scope
of this book. For reference, see Gujarati (1995).
⁴ This is a bit of a misnomer: it should ideally have been called the normit model, and indeed some books refer to it as the normit.
However, the nomenclature probit is more widely used.
Similar to the logit model, the estimation of the probit model uses a non-linear estimation technique
involving the method of maximum likelihood. The cumulative probability function plot
of the probit model is given in figure 3.2.





Figure 3.2: Probit (cumulative probability plotted against $Z$ over the range $-15$ to $15$).


Note that the plots of the cumulative probability distributions under the logistic and the normal are very
similar. This can be seen from figure 3.3.
Figure 3.3: Comparison: Logit vs. Probit (the logistic and probit cumulative probability functions plotted together against $Z$ over the range $-15$ to $15$).

Therefore, we should expect very similar results whether we use the logistic or the normal
distribution function for the error terms, except in the tails. However, if we have a large number of
observations, then observations in the tails will tend to show up more frequently. This
means the parameter estimates from logit and probit will tend to differ significantly. Amemiya
(1981) suggests the following approximation of probit estimates from logit estimates:
multiply the estimated parameters of the logit model by 0.625 to get the estimated probit
parameters.
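The quality of this approximation is easy to inspect numerically; a small sketch:

```python
import numpy as np
from scipy.stats import norm

# Amemiya (1981): probit coefficients are approx. 0.625 x logit coefficients,
# i.e. the logistic CDF at z is close to the normal CDF at 0.625 * z
for z in (-4.0, -2.0, 0.0, 2.0, 4.0):
    logistic = 1 / (1 + np.exp(-z))
    probit_scaled = norm.cdf(0.625 * z)
    print(f"z={z:5.1f}  logistic={logistic:.4f}  probit(0.625 z)={probit_scaled:.4f}")
```

The two columns agree closely near the middle of the range and drift apart in the tails, consistent with figure 3.3.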

Let us directly estimate the probit model for the data given in Statistical data.xls. EViews 5.0
gives the estimates as $\beta_0 = -2.74$ and $\beta_1 = 7.68$. Note that the corresponding parameter
estimates using logistic regression were $\beta_0 = -5.15$ and $\beta_1 = 14.17$. Similar to table 3.1,
the default prediction of the probit model is presented in table 3.2. The results are identical. (Why?)


Table 3.2

                     Estimated Equation          Constant Probability
                 Dep=0   Dep=1   Total       Dep=0   Dep=1   Total
P(Dep=1)<=0.5      19       2      21          19      13      32
P(Dep=1)>0.5        0      11      11           0       0       0
Total              19      13      32          19      13      32
Correct            19      11      30          19       0      19
% Correct         100   84.62   93.75         100       0   59.38
% Incorrect         0   15.38    6.25           0     100   40.63
Total Gain*         0   84.62   34.38

A Final Word:
LPM, logit or probit? With reasonable accuracy, the estimated coefficients of logit and probit can
be approximated from the LPM regression. Amemiya (1981) suggests that the estimated
coefficients of the LPM, $\beta_{LP}$, and the coefficients of the logit model, $\beta_L$, are related as
follows:

$$\beta_{LP} = 0.25\,\beta_L \quad \text{except for the intercept term},$$
$$\beta_{LP} = 0.25\,\beta_L + 0.5 \quad \text{for the intercept term}.$$
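A tiny helper makes the conversion mechanical; this sketch applies Amemiya's relations in the logit-to-LPM direction, using the chapter's logit estimates as the example input:

```python
def logit_to_lpm(intercept_logit, slopes_logit):
    """Approximate LPM coefficients from logit coefficients (Amemiya, 1981)."""
    intercept_lpm = 0.25 * intercept_logit + 0.5   # intercept gets the +0.5 shift
    slopes_lpm = [0.25 * b for b in slopes_logit]  # slopes are simply scaled
    return intercept_lpm, slopes_lpm

# example with the logit estimates reported above (intercept -5.15, slope 14.17)
print(logit_to_lpm(-5.15, [14.17]))   # approx. (-0.79, [3.54])
```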


As the estimated coefficients of the logit and probit models are closely related, one can
expect broadly similar results using either of the methods. While logit or probit
appears to do significantly better than the LPM, the choice between logit and probit is not
obvious. However, one possible hint in choosing logit over probit, or vice versa, can be sought
in figure 3.3. As the logistic distribution has fatter tails than the normal, if we consider a large
sample, which may contain potentially very healthy firms or extremely insolvent firms,
logistic regression may be more appropriate. For relatively small sample sizes, there is hardly
any difference between the two.




Exercises and Questions:
3.1 (Detailed Exercise): The output from a logistic regression performed on 1578 Indian
firms over the period 1994-2004 is given below. The variables used for the
regression are:
Default = 1, if the firm defaults; = 0, otherwise.
MCAP = Average market capitalization of the firm (in crore Rs)
Sal = Workers' wages and salary (in crore Rs)
TA = Total assets (in crore Rs)
Sales = Sales (in crore Rs)

The logistic regression output is given below:
Logit Regression Output
Dependent Variable: Default
Method: ML - Binary Logit (Quadratic hill climbing)
Sample (adjusted): 2 1579
Included observations: 1303 after adjustments
Convergence achieved after 4 iterations
Covariance matrix computed using second derivatives

Variable       Coefficient   Std. Error   z-Statistic   Prob.
C              -1.0004       0.072337     -13.8298      0
log(Sal/TA)    -1.50E-09     1.04E-09     -1.44131      0.0495
log(MCAP)      1.35E-09      3.17E-10     4.261213      0
log(Sales)     1.39E-10      7.03E-10     0.19734       0.0036


Interpret the above findings. Would you have performed the regression any differently? Explain.
Suppose a particular firm has the following numbers: Sales: 20 crores; Salary:
0.5 crores; Total assets: 20 crores; Market capitalization: 20 crores. What is the
probability that it will default?
To see how well the model fitted, we performed the following three additional checks. How do
you interpret the three outputs?

Check 1:
Dependent Variable: CHOICE
Method: ML - Binary Logit (Quadratic hill climbing)
Sample (adjusted): 2 1579
Prediction Evaluation (success cutoff C = 0.5)

                 Estimated Equation          Constant Probability
               Dep=0   Dep=1   Total      Dep=0   Dep=1   Total
P(Dep=1)<=C      908     346    1254        922     381    1303
P(Dep=1)>C        14      35      49          0       0       0
Total            922     381    1303        922     381    1303
Correct          908      35     943        922       0     922
% Correct      98.48    9.19   72.37        100       0   70.76
% Incorrect     1.52   90.81   27.63          0     100   29.24
Total Gain*    -1.52    9.19    1.61
Percent Gain**    NA    9.19    5.51
Check 2:
Dependent Variable: CHOICE
Method: ML - Binary Logit (Quadratic hill climbing)
Sample (adjusted): 2 1579
Prediction Evaluation (success cutoff C = 0.2)

                 Estimated Equation          Constant Probability
               Dep=0   Dep=1   Total      Dep=0   Dep=1   Total
P(Dep=1)<=C       12       6      18          0       0       0
P(Dep=1)>C       910     375    1285        922     381    1303
Total            922     381    1303        922     381    1303
Correct           12     375     387          0     381     381
% Correct        1.3   98.43    29.7          0     100   29.24
% Incorrect     98.7    1.57    70.3        100       0   70.76
Total Gain*      1.3   -1.57    0.46
Percent Gain**   1.3      NA    0.65
Check 3:
Dependent Variable: CHOICE
Method: ML - Binary Logit (Quadratic hill climbing)
Sample (adjusted): 2 1579
Prediction Evaluation (success cutoff C = 0.8)

                 Estimated Equation          Constant Probability
               Dep=0   Dep=1   Total      Dep=0   Dep=1   Total
P(Dep=1)<=C      917     367    1284        922     381    1303
P(Dep=1)>C         5      14      19          0       0       0
Total            922     381    1303        922     381    1303
Correct          917      14     931        922       0     922
% Correct      99.46    3.67   71.45        100       0   70.76
% Incorrect     0.54   96.33   28.55          0     100   29.24
Total Gain*    -0.54    3.67    0.69
Percent Gain**    NA    3.67    2.36

3.2 (Detailed Exercise): Refer to the data in the accompanying CD. For the year
2003, estimate a logistic default probability model using the algorithm described
in the appendix of this chapter.
- Interpret your results.
- How does your result compare with the Linear Probability Model and the
Linear Discriminant Model obtained in exercises 2.2 and 2.3 of the
previous chapter?
- Which of the three models gives you the best in-sample and out-of-sample
predictions?


Appendix 3

Logistic Regressions in Excel

Although most standard statistical software packages routinely perform logistic
regression, it is worthwhile to describe the various steps in estimating the logit regression.
We will estimate a logit regression using Excel.

Consider the example in LogitExcel.xls. This contains cross-section data on 635 firms,
with 200 defaulters and 435 non-defaulters. The explanatory variable $X_i$ is the debt-equity
ratio of firm $i$. The indicator variable is

$$y_i = \begin{cases} 1 & \text{if the firm has defaulted} \\ 0 & \text{otherwise.} \end{cases}$$

We wish to estimate the equation

$$L_i = \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1 X_i + u_i.$$

Note that we cannot run the regression by simply substituting the indicator variable in place of $P_i$
(why?). Therefore, we follow these steps to estimate the model.

Step I: Divide the population into 32 groups: 31 groups having 20 observations each and the
final group having 15 observations. Calculate the average debt-equity ratio $\bar{X}_i$ for each
group $i$.

Step II: For each of the groups, calculate the probability of default as

$$\hat{P}_i = \frac{n_i}{N_i} = \frac{\text{Number of defaulters}}{\text{Group size}}.$$

Note that by design we have $N_i = 20$ for 31 groups and $N_i = 15$ for the last group.
Every group $i$ formed must satisfy $0 < \hat{P}_i < 1$.

Step III: Arrange the groups in ascending order of $\bar{X}_i$.
Step IV: Calculate the log-odds ratio as

$$\hat{L}_i = \ln\left(\frac{\hat{P}_i}{1 - \hat{P}_i}\right).$$

Step V: Calculate $w_i = \sqrt{N_i \hat{P}_i (1 - \hat{P}_i)}$ for each group.

Step VI: Multiply $\bar{X}_i$ and $\hat{L}_i$ by $w_i$ to obtain $X_i^* = w_i \bar{X}_i$ and $L_i^* = w_i \hat{L}_i$. The transformed
equation is now

$$L_i^* = \beta_0 w_i + \beta_1 X_i^* + v_i,$$

where $v_i = w_i u_i$.

Step VII: Regress (OLS) $L_i^*$ on $w_i$ and $X_i^*$. The transformation performed in Step VI
ensures that the problem of heteroscedasticity is addressed,⁵ so the coefficient estimates will be
efficient. While regressing, ensure that the regression is through the origin: in Excel's
regression dialogue box, tick the option 'Constant is Zero'. This is because the constant of the
original model has been transformed into the coefficient on $w_i$.
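The whole procedure is compact enough to script. Below is a minimal Python sketch of Steps I-VII; the function name and the synthetic data generation are mine for illustration, and it simply drops any group with $\hat{P}_i$ equal to 0 or 1, as Step II requires:

```python
import numpy as np

def grouped_logit_wls(x, y, group_size=20):
    """Steps I-VII: grouped weighted least squares for ln(P/(1-P)) = b0 + b1*X."""
    order = np.argsort(x)                      # sort so groups come in ascending X
    x, y = x[order], y[order]
    groups = np.array_split(np.arange(len(x)), int(np.ceil(len(x) / group_size)))
    Xbar, L, w = [], [], []
    for g in groups:
        N, p = len(g), y[g].mean()             # Step II: P_hat = defaulters / size
        if 0 < p < 1:                          # keep only groups with 0 < P_hat < 1
            Xbar.append(x[g].mean())           # Step I: group average of X
            L.append(np.log(p / (1 - p)))      # Step IV: log-odds
            w.append(np.sqrt(N * p * (1 - p))) # Step V: WLS weight
    Xbar, L, w = map(np.array, (Xbar, L, w))
    Z = np.column_stack([w, w * Xbar])         # Step VI: transformed regressors
    beta, *_ = np.linalg.lstsq(Z, w * L, rcond=None)  # Step VII: OLS, no intercept
    return beta                                # [b0_hat, b1_hat]

# synthetic illustration: defaults generated from a known logit model
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, 635)
p_true = 1 / (1 + np.exp(-(-3.8 + 2.2 * x)))
y = (rng.uniform(size=635) < p_true).astype(float)
print(grouped_logit_wls(x, y))   # should land near (-3.8, 2.2)
```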

The regression output appears below.

⁵ This method of transformation is often referred to as Weighted Least Squares (WLS) or Generalized Least
Squares (GLS).





Table 3.A.1

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.68123
R Square             0.464074
Adjusted R Square    0.411111
Standard Error       0.591054
Observations         31

ANOVA
             df         SS         MS          F   Significance F
Regression    2   8.772711   4.386356   12.55596         0.000128
Residual     29   10.13099   0.349345
Total        31    18.9037

           Coefficients   Standard Error     t Stat    P-value   Lower 95%   Upper 95%
Intercept             0             #N/A       #N/A       #N/A        #N/A        #N/A
X*             2.183884         0.325669   6.705832   2.34E-07    1.517815    2.849953
W             -3.76392          0.462128   -8.14476   5.56E-09    -4.70908    -2.81877


We wish to estimate the probability that a firm will default given a certain debt-equity ratio.
Consider a firm that has a debt-equity ratio of 1.407; corresponding to that observation,
$w_i = 2.198$. Therefore, from $\hat{L}_i = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}_i = -3.764 + 2.184 \times 1.407$, we have
$\hat{L}_i = -0.691$. Now, from

$$\hat{L}_i = \ln\left(\frac{\hat{P}_i}{1 - \hat{P}_i}\right),$$

we calculate the estimated probability as $\hat{P}_i = 0.33$.
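The back-transformation from log-odds to probability is a one-liner; a quick check of the numbers above:

```python
import math

b0, b1 = -3.764, 2.184             # estimates from Table 3.A.1 (coefficients on w and X*)
L = b0 + b1 * 1.407                # log-odds at a debt-equity ratio of 1.407
P = math.exp(L) / (1 + math.exp(L))
print(f"L = {L:.3f}, P = {P:.2f}")  # L = -0.691, P = 0.33
```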
The regression output of a logistic regression using the MLE method in EViews is given below.

Table 3.A.2
Variable   Coefficient   Std. Error   z-Statistic   Prob.
C          -7.618468     0.597114     -12.75882     0
DEQ        4.387461      0.355321     12.34789      0

Mean dependent var      0.315457    S.D. dependent var      0.465065
S.E. of regression      0.311471    Akaike info criterion   0.622937
Sum squared resid       61.31312    Schwarz criterion       0.636981
Log likelihood          -195.4709   Hannan-Quinn criter.    0.62839
Restr. log likelihood   -395.2342   Avg. log likelihood     -0.30831
LR statistic (1 df)     399.5266    McFadden R-squared      0.50543
Probability(LR stat)    0


Note that the two sets of coefficients are not directly comparable. However, in both outputs the
debt-equity ratio turns out to be a significant variable. Both methodologies lead to about
86% correct predictions. However, the nature of the incorrect predictions, that is, the exact type I
and type II errors, is not identical (verify with the data). The estimated probabilities from the two
algorithms are plotted in Figure 3.A.1. Interpret the two graphs.
Figure 3.A.1: Comparison of logistic probabilities using EViews and Excel (estimated cumulative probabilities from the two algorithms plotted against debt-equity ratios).

To conclude, let us estimate the above model using the LPM. The estimated coefficients
are given in table 3.A.3; the estimated equation is

$$\hat{P}_i = -0.344 + 0.484\,X_i.$$

The error classification also gives results similar to the two models above. Are the coefficients obtained
in tables 3.A.1 and 3.A.3 comparable? Are the coefficients obtained in tables 3.A.2 and 3.A.3
comparable? Why?




Table 3.A.3
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.693114
R Square             0.480407
Adjusted R Square    0.479585
Standard Error       0.335497
Observations         634

ANOVA
             df         SS         MS          F   Significance F
Regression    1   65.77187   65.77187   584.3376         6.45E-92
Residual    632   71.13665   0.112558
Total       633   136.9085

           Coefficients   Standard Error     t Stat    P-value   Lower 95%   Upper 95%
Intercept      -0.34453          0.03038   -11.3405   2.91E-27    -0.40419    -0.28487
DEQ            0.484019         0.020023   24.17308   6.45E-92    0.444699    0.523339
