
Disability Income Insurance
Explaining the structural changes in the disability duration of Dutch
self-employed with a focus on business cycle-related variables.
Dorthe van Waarden
July 14, 2012
Master's Thesis Mathematics
Supervisors: Michel Mandjes, Theo Beekman,
Folkert de Jong, Martin Heijnsbroek
Faculty of Science
University of Amsterdam
Abstract
This thesis analyses the dynamics of the return-to-work process of Dutch
self-employed using a unique data set containing more than 30,000 sick leave claims
during the period 2003-2011. We estimate a multi-state model and analyse the
transitions from one state to another during the incapacity spell using both a proportional
hazards model and a logit model. In particular, we focus on the influence of the
business cycle on the full and partial recovery rates, as well as on the fall-back
rate. Finally, the influence of the various risk factors is quantified by calculating
the expected duration until recovery for different values of the risk factors and
comparing these to the benchmark self-employed.
Details
Title: Disability Income Insurance
Author: Dorthe van Waarden, djvw@science.uva.nl, 5801974
Supervisor: Prof.dr Michel Mandjes (UvA)
Theo Beekman (Achmea)
Folkert de Jong, Martin Heijnsbroek (MIcompany)
Second reviewer: Prof.dr. Rudesindo Nunez-Queija
Date: July 14, 2012
Faculty of Science
University of Amsterdam
Science Park 904, 1098 XH Amsterdam
http://www.science.uva.nl/math
Preface
This Master's thesis is part of my internship at MIcompany, a specialized commercial analytics
agency that helps companies deal with two key challenges. First, it helps to discover growth
opportunities based on granular analysis of customer data. Second, MIcompany helps to build
an in-house capability to leverage analytics in the business. It helps its customers with
recruitment and development of their analytical talent, builds analytical tools that facilitate easy
replication of smart analysis, and provides specialized support. One of its clients is a large
Dutch insurance company, which provided the data and research question for this thesis. I really
enjoyed the practical side of this internship and being able to work in two different companies.
Foremost, I would like to express my sincere gratitude to all my supervisors. First of all I would
like to thank Prof.dr Michel Mandjes for assisting me in finding an internship and supporting me
throughout the whole process. Second, I would like to thank Theo Beekman for taking the time
to explain everything about disability insurance and the various models to me, and for providing
the data and background information needed for this thesis. Last, but certainly not least, I
would like to express my thanks to Folkert de Jong and Martin Heijnsbroek from MIcompany,
for giving me the opportunity to do my internship and for initiating and supporting the contact
with the insurance company.
Contents
Preface
1 Introduction
2 Disability Income Insurance
2.1 History
2.2 Features of the insurance
2.3 Data description
3 Recovery models
3.1 Current model
3.2 Multi-state model
3.2.1 Markov chains
3.3 Explanatory variables
3.3.1 Business cycle
4 Survival Theory
4.1 Hazard function
4.2 Kaplan-Meier estimator
4.3 Censored data and the Likelihood function
4.4 Cox Proportional Hazards model
4.4.1 PH assumption
4.4.2 Partial Likelihood
4.5 Time-varying covariates
4.5.1 Episode splitting
4.6 Competing risks
4.7 Unobserved heterogeneity: Frailty models
4.8 Goodness of fit
5 Binary regression
5.1 The logit model
5.1.1 Likelihood function
5.2 Panel data
5.2.1 Linear panel models
5.2.2 Binary panel models
5.3 Goodness of fit
5.3.1 Maximum likelihood theory
5.3.2 Pseudo R² measures
6 Results
6.1 Data set
6.2 Model without the business cycle
6.2.1 Unobserved heterogeneity
6.2.2 Proportional hazards assumption
6.3 Business cycle
7 Quantifying the influence of the risk factors
7.1 Transition probabilities in the multi-state model
7.2 Expected duration until recovery
7.2.1 Other causes of the fluctuations in the loss ratio
8 Conclusion and advice
A Estimation results of the MPH model
A.1 PH assumption
A.2 Business cycle
A.2.1 Influence business cycle per profession
A.2.2 Influence business cycle per disorder
B Estimation results of the logit model
B.1 Business cycle
C Business cycle
D Comparison of models
D.1 Comparing non-nested models
D.1.1 Maximum likelihood
D.1.2 Information Criteria
Chapter 1
Introduction
When an employee is sick or injured and unable to perform his normal working activities,
his employer is obliged to continue his salary payments. In the first year this amounts to
100% of his last salary and in the second year to 70%. After these two years the government
will provide benefit payments until the employee has recovered. However, about 14% of the Dutch
labour force consists of self-employed, who cannot benefit from this arrangement. In order to
assure themselves of continuation of income and make sure their business survives in case of
long-term sickness or disability, the self-employed can buy a so-called disability income insurance.
In this thesis we consider a large Dutch insurance company, which provides a disability income
insurance for self-employed. For this company it is crucial to know which risk factors affect
the probability that a client becomes disabled and which affect the return-to-work process.
Understanding these risk factors gives the company more insight into the individual risk of each
applicant and can help to determine the premium that should be asked and the amount of
capital that should be held. To illustrate the importance of knowing what these factors are and
entail, we will start by sketching the financial and economic background. The yearly premium
of the disability income insurance amounts to €250-275 million. This means that a difference
of one percentage point in the benefit payments results in a profit or loss of almost €3 million.
Besides this, the financial interest is also driven by the planned introduction of Solvency II in
January 2013. This is the newest risk management regulatory framework, developed by the
European Union (EU), and consists of a three-pillar structure of insurance supervision. The most
important of these three pillars concerns the quantitative requirements, a set of rules for determining
the minimal capital and the target capital. The minimal capital partly depends on whether the
business is related to life or non-life insurance, whereas the target capital corresponds to the
insurer's economic capital for running its business within a given safety level. To determine the
target capital, an insurance company can use either a standard or an internal risk model. The
latter is a model constructed by the insurer itself for its specific needs, based on its own data. In
contrast, a standard model is one designed by the EU and is used uniformly across insurers. It is
expected that internal models result in a more accurate analysis of the insurer's financial situation
than the more generic standard models. However, before an internal model may be used, it
has to be certified by the EU. This process requires detailed documentation of the model and
its underlying assumptions. Furthermore, the model has to be examined periodically to ensure that it
is properly adjusted to the dynamic financial environment.
It is reasonable to assume that more knowledge about which risk factors affect the disability
process will result in a better-fitting model. When the fit of the model improves, this will result
in less unexplained variance in the payments. With Solvency II in mind, this means that the
company could hold less target capital. The capital that is freed up can then be used for other
purposes to benefit the company.
In this thesis we will focus on the loss ratio, which is the ratio between the benefit payments and
the premium. In an ideal situation, the company would be able to predict the payments with
great accuracy, so that it could adjust its premiums in order to keep this ratio approximately
constant over all years. However, the loss ratio of the company under consideration was not
constant at all, but showed some large variations over the last years. This is shown in
figure 1.1.
Figure 1.1: The loss ratio of the company under consideration in the period 2001-2010.
The aim of this thesis is to explain these fluctuations and to give advice on how this explanation
could be used in order to improve the current models. The loss ratio is determined by three
things: the probability that an insured becomes disabled, the duration of the disability, and the
premium that is received. Therefore our research question should be divided into three separate
parts. We will start by briefly discussing the last one. The premium is determined solely by
the company itself and it is not affected by external factors. Therefore it seems reasonable
not to analyse it in this thesis. Still, it is worth mentioning that in the last years the policy of the
company has changed. In order to satisfy its loyal customers, the company felt that a new insurance
should not be much cheaper than an existing one. This resulted in a reduction in prices for the
existing clients, which amounted to a total of €10 million. However, even if the company had
not implemented this reduction and we included this extra €10 million of premium in the loss
ratio, the fluctuations would still be clearly observable. This means that the fluctuations are not
solely caused by this price reduction. Therefore the research question is now reduced to:
What is the cause of the variations in the benefit payments?
To answer this question we first take a look at the chance that an insured becomes disabled. As it
turns out, this chance is rather constant and there are no large deviations, as is shown in figure 1.2.
Figure 1.2: The percentage of insured that become disabled in a certain month
Therefore it can be stated that there are no indications that the fluctuations in the loss ratio
are caused by changes in the disability probability. On the other hand, the average duration of
a claim does vary over time. This is clearly illustrated in figure 1.3. Since the influx did not
change substantially, an increase in the percentage disabled must be caused by an increase in
disability duration and vice versa.
Figure 1.3: The percentage of insured that is disabled.
Therefore the final remaining question we would like to answer in this thesis is: What caused
the fluctuations in disability durations? First of all, the current models do not include certain
variables, such as the year the disability started, the brand, and the deferment period, which we
assume to be important factors in disability duration. In the literature various other causes are
mentioned. For example, De Ravin [15] refers to the following reasons for increasing disability
experience:
• Weaker underwriting standards;
• Greater awareness of the cover and the right to claim;
• Changing work ethic and social attitudes to insurance;
• Changes in the economic environment;
• More liberal definitions of disability;
• Weakening of other policy terms and conditions;
• Under-resourced or under-skilled claims management.
The hypothesis that changes in the economic environment influence the recovery process is also
stated in [30] and [31]. Looking back at figure 1.3, we notice that the increase in the percentage
of disabled insured started in the first quarter of 2008, the year in which the financial and
economic crisis started. Therefore it seems reasonable to assume that the economic environment
is a major contributor to the variation in disability duration.
With respect to the business cycle, the hypothesis is that a self-employed wants to return to
his business as soon as possible when the economy is booming, since in that case a large profit
and hence a large income can be achieved. Moral hazard could play a role when the economy
is in a downturn. During a recession the income of a self-employed is likely to be lower than
in periods of high economic growth, and hence a replacement income paid by disability
insurance may seem an attractive alternative. These expectations are confirmed by Smoluk [31],
who states that when the consumption-to-wealth ratio¹ is high, long-term disability claim rates
are low, and vice versa.
It is also expected that the influence differs per disorder and profession. For example, it is
plausible that the influence on claimants with cancer is relatively small, whereas it will probably
be more noticeable for stress-related disorders, such as backache or other locomotive disorders.
Since the economic crisis heavily affected the construction and the retail industry, we think
that claimants from those sectors are more sensitive to changes in the economic environment. On the other
hand, we assume that claimants working in the (para)medical sector are hardly affected by the
business cycle.
The articles written by Spierdijk and Koning [32], Amelink [7] and Bultena [10] are used as a
starting point of this thesis. In these articles similar data sets are used to identify risk factors
for positive and negative recovery and to estimate the claim reserves and their uncertainty. We
will extend this analysis by including extra variables and by using not only survival analysis, but
also logistic regression methods. The outline of this thesis is as follows. In chapter 2 it will be
explained how disability income insurance for self-employed is organized in the Netherlands. Also
some aspects of the insurance company under consideration and properties of the data set are
discussed. In chapter 3 a multi-state model to model the recovery process is introduced, together
with the principle of Markov chains. The chapter finishes with an overview of all explanatory
variables which will be considered. In chapters 4 and 5 we respectively introduce survival and
logit models. These models will be used to estimate the four transitions in the multi-state
model. The results of these methods will be discussed in chapter 6. In chapter 7 we will quantify
the influence of the various risk factors by calculating the expected duration until recovery for
different values of the risk factors. Finally, in chapter 8 we will draw our conclusions and give
advice on how the results could be used in practice.
¹ Financial economic theory suggests that the consumption-to-wealth ratio reflects consumption smoothing and
reveals expectations about future wealth. For individuals contemplating submitting an LTD claim, the expected
payoff to exercising this insurance option is a function of their expectations about their future wealth.
Chapter 2
Disability Income Insurance
In this chapter it will first be explained how income insurance for self-employed is organized in the
Netherlands. Second, we will discuss some features of the income insurance sold by the company
that provided the data for this thesis. We will finish with an overview of the characteristics of
the data that will be used in our analysis.
2.1 History
Until 2004, the state provided income insurance for self-employed in the Netherlands by means
of the WAZ, the Wet arbeidsongeschiktheidsverzekering zelfstandigen (English: Self-Employed
Income Insurance Act). The costs of the WAZ were funded from the tax paid by the
self-employed. This insurance, however, only included compensation for loss of income after one year
of disability and for a maximum of 70% of the statutory minimum wage. For those who wanted
earlier or extra compensation, private insurance companies provided additional insurance. Such
an insurance consisted of two parts: an A-cover and a B-cover. The first one provided income
in the first year of sickness, whereas the second one covered the income loss after the first year.
The cover of the WAZ was very limited, but at the same time it was quite expensive for many
self-employed. Therefore there were a lot of complaints and in August 2004 the WAZ was abolished
by the Dutch government, partly at the request of MKB-Nederland (the largest entrepreneurs'
organization in the Netherlands, representing small and medium-sized companies). Since then,
income insurance for self-employed has only been available from private insurance companies.
2.2 Features of the insurance
In this section we will have a closer look at the policy conditions belonging to the income insurance
sold by the company that provided the data for this study.
Upon buying an income insurance sold by the company under consideration, a self-employed has
to make a number of choices. First of all he has to decide on the amount insured. In order
to avoid moral hazard, this annual replacement income can never exceed 80% of the income of
the self-employed, with a maximum of €250,000. Second, he has to choose the deferment period,
which refers to the time between becoming disabled and the start of the benefit payments. This
period can be 14 days, 1, 2, 3, 6, 12 or even 24 months and depends on how long the insured can
survive on other income or savings. The longer this deferment period, the lower the premium
that should be paid. Furthermore, an insured should decide which criterion is used
to determine whether he is disabled: being unable to perform his original profession or being unable to
perform any kind of job. Last, a client has to choose the end age, which is the age at which
the payments will stop regardless of the health state of the insured. This can be chosen to be 50 or
any age between 55 and 65. Instead of choosing an end age there is also the option of receiving
benefit payments for a fixed period of time, namely 1, 2, 3, 4 or 5 years.
The benefit payments depend on the percentage of disability of a self-employed, which is also
known as the replacement rate. There are two different cases.
1. The benefit payments equal the percentage of incapacity times the amount insured.
2. The benefit payments equal the payout percentage times the amount insured, where the
payout percentages are defined as follows:
Disability percentage    Payout percentage
 0 -  24 %                 0
25 -  34 %                30
35 -  44 %                40
45 -  54 %                50
55 -  64 %                60
65 -  79 %                75
80 - 100 %               100
In both cases the client should choose a lower bound. If his disability percentage is below this
bound, he will not receive any benefit payments. In the first case the lower bound can be chosen
to be 25%, 35%, 50% or 75%, whereas in the second case the choice is between 25%, 35%, 45%,
55%, 65% and 80%. For simplicity, in this thesis we will only consider case 1 and we will assume that
the lower bound is set to 25%, which is the case for the majority of the insured.
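As an illustration, the two payout rules described above can be sketched in Python. This is a minimal sketch and not the insurer's actual calculation: the function name annual_benefit, its parameters and the restriction to whole disability percentages are assumptions made here for the example. The payout table is copied from the text above.

```python
# Payout table for case 2, copied from the text: (lower %, upper %, payout %).
PAYOUT_TABLE = [
    (0, 24, 0),
    (25, 34, 30),
    (35, 44, 40),
    (45, 54, 50),
    (55, 64, 60),
    (65, 79, 75),
    (80, 100, 100),
]


def annual_benefit(disability_pct, amount_insured, case=1, lower_bound=25):
    """Annual benefit for a whole-number disability percentage (hypothetical helper)."""
    if not 0 <= disability_pct <= 100:
        raise ValueError("disability percentage must lie between 0 and 100")
    if disability_pct < lower_bound:
        return 0.0  # below the chosen lower bound: no benefit is paid
    if case == 1:
        # Case 1: benefit is proportional to the disability percentage.
        return disability_pct / 100 * amount_insured
    # Case 2: look up the payout percentage in the step table.
    for low, high, payout in PAYOUT_TABLE:
        if low <= disability_pct <= high:
            return payout / 100 * amount_insured
    raise ValueError("disability percentage must be a whole number")


print(annual_benefit(50, 40000))           # case 1, lower bound 25%
print(annual_benefit(70, 40000, case=2))   # case 2, payout percentage 75
```

With an amount insured of 40,000, a claimant who is 50% disabled receives half of that amount under case 1, while a claimant who is 70% disabled receives 75% of it under case 2.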
When an insured is fully disabled, no insurance premium has to be paid. In all other cases the
insured has to pay a premium which equals the original premium times the percentage he is still
able to work. The benefit payments can end for various reasons, for example if the insured
recovers, passes away or reaches his end age.
2.3 Data description
The data set used for this thesis has been provided by a large Dutch insurance company and
consists of 62451 approved sick leave claims during the period running from December 2002 up
to February 2012. For each claim a wide range of characteristics is given. Besides the gender and
the day of birth, we also know the profession in which the claimant is working. In addition, the
date on which the disability started and the current status of the claim are known. Furthermore
for each claim we know the total disability duration, measured in months, and the evolution of
the replacement rate during the incapacity spell. This rate is a time-varying variable reported
on a monthly basis. Finally we have information about the postal code, the amount insured and
the brand of insurance (1, 2, 3 or 4).
For some of the claims the first couple of replacement rates are missing. This can be due to
several reasons. For example, it can be the case that the insurance company was not able to
determine the rate for the first month(s). This can happen when the claim is reported on the last
day of a given month. Another explanation could be that the claimant has a deferment period
of a couple of days, meaning that there will be some time between the start of the disability and
the start of the observation. It could also be that a client does not report his claim immediately,
but waits for a period which can even last a couple of years. If the first replacement rates are
missing, these will be treated as delayed entries. Claims for which all entries are either missing
or between 0% and 25% are removed from our data set. Also claims due to pregnancy
without complications (the so-called G600) are removed, since these claims follow a different
process than claims due to other illnesses. For example, pregnancy claims occur almost exclusively
among women and are more frequent among younger insured, whereas the prevalence of other claims
is higher among older clients. The final adjustment that had to be made concerns the different
brands. In our data set, 95% of the claims correspond to brand 1. The remaining 5% belong to
brand 2, 3 or 4, and these were added to our data set only after the merger of the four
brands in the years 2009/2010. As a result, these brands contain relatively few short claims and
relatively many long-term disability claims, which causes a bias. Therefore we only consider the claims
belonging to brand 1.
This resulted in a nal data set containing 31780 claims. However, before we could start analyzing
these data, we had to make some corrections.
• If the claim had ended because of death or expiry, the zeros after the last positive
disability percentage were replaced by missing values. In this way only the changes in disability
during the validity of the claim are considered, and not the seeming recovery at the end.
• Zeros preceding the first positive disability percentage were replaced by missing values. This
was the case for so-called IBNER claims (incurred but not enough reported), meaning that
the claim was fully reported only some time after the disability started. It would be
incorrect to assign the value 0 to these cases, since that would mean that the client is not
disabled.
• There were two claims which were noteworthy, because they had a strikingly deviating
recovery process. After checking, however, it turned out that some replacement rates were
wrong, so these were corrected manually.
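The first two corrections can be sketched as follows. This is only an illustration under our own assumptions: the monthly replacement rates of one claim are stored in a list, missing values are represented by None, and the function name clean_rates and the flag ended_by_death_or_expiry are hypothetical.

```python
def clean_rates(rates, ended_by_death_or_expiry=False):
    """Return a copy of the monthly replacement rates with misleading zeros set to None."""
    cleaned = list(rates)
    # Indices of all months with a positive disability percentage.
    positive = [i for i, r in enumerate(cleaned) if r is not None and r > 0]
    if not positive:
        return cleaned  # no positive rate at all: nothing to correct
    first, last = positive[0], positive[-1]
    # Zeros before the first positive rate stem from late reporting,
    # not from the claimant being healthy.
    for i in range(first):
        if cleaned[i] == 0:
            cleaned[i] = None
    # Zeros after the last positive rate are only a seeming recovery
    # when the claim ended because of death or expiry.
    if ended_by_death_or_expiry:
        for i in range(last + 1, len(cleaned)):
            if cleaned[i] == 0:
                cleaned[i] = None
    return cleaned


print(clean_rates([0, 0, 100, 50, 0, 0], ended_by_death_or_expiry=True))
print(clean_rates([0, 100, 0]))
```

In the second call the trailing zero is kept, because for a claim that did not end by death or expiry it represents a genuine recovery.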
Of the final set, 6171 claims are still continuing. We cannot simply assume that these have
ended in our observation window, so we will treat these claims as right-censored. If we consider
the claims that have ended, we see that most of them did so because of recovery of the insured.
There are, however, other reasons for the benefit payments to stop, for example because the
insured passed away or had reached his end age. For our analysis we are only interested in those
claims which have ended because of a full recovery. Claims that ended for another reason will
therefore also be treated as right-censored, resulting in a total of 7083 right-censored claims,
which is 22.29% of the total data set.
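The censoring rule can be made explicit with a small sketch. The labels used for the end reasons ("recovery", "ongoing", "death", "end_age") and the function name survival_record are assumptions for this example; the actual coding in the data set is not given in this thesis. The last line checks the share of right-censored claims reported above.

```python
def survival_record(duration_months, end_reason):
    """Return (duration, event), where event is 1 only for a full recovery."""
    # Every other end reason, including a still ongoing claim, is right-censored.
    event = 1 if end_reason == "recovery" else 0
    return duration_months, event


# Four made-up claims: (duration in months, hypothetical end reason label).
claims = [(5, "recovery"), (40, "ongoing"), (12, "death"), (30, "end_age")]
records = [survival_record(d, r) for d, r in claims]
censored_share = sum(1 for _, event in records if event == 0) / len(records)
print(records, censored_share)

# Share of right-censored claims in the final data set, from the counts in the text.
print(round(7083 / 31780 * 100, 2))
```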
We end this chapter by providing an overview of some characteristics of our data:
Gender    Number    Percentage
Male      27956     87.97%
Female     3824     12.03%
Table 2.1: Preliminary statistics: gender.
Profession       Number    Percentage
Agriculture      13639     42.92%
Construction      5987     18.84%
Shopkeeper        2592      8.16%
(Para)Medical     2407      7.57%
Service           1655      5.21%
Other             5500     17.30%
Table 2.2: Preliminary statistics: type of profession.
Disorder                 Symptoms    Cancer    Infections    Injury    Other
Locomotive disease       28.7%       0.2%      0%            23.0%     12.5%
Psychological disease     4.2%       0%        0%             0%        5.3%
Digestive disease         0.4%       0.5%      0.2%           0%        3.8%
Other                     4.5%       1.9%      1.3%           2.0%     11.8%
Table 2.3: Preliminary statistics: type of disorder.
The age of the claimants at the start of the disability varies between 18 and 64 years. The
average age is 43 years.
Chapter 3
Recovery models
In order to determine to what extent the business cycle affects the expected duration until
recovery, we have to be able to determine what the expected duration until recovery would be if
the influence of the business cycle is neglected. Therefore we will first discuss how the disability
durations have been modeled by the insurance company until now. A drawback of this current model
is that it only focuses on the total disability volume of all insured. Therefore no individual
information on the claimants is considered and no prognoses about partial or negative recovery
(= transition to a higher disability percentage) are provided. However, information about these
types of rehabilitation can be very useful, since claimants who face a health deterioration during
their disability spell may be particularly costly for insurers, due to a higher replacement income
and a possibly prolonged sick leave duration. It is therefore important to analyze the
transitions from one health state to another and to know which variables affect the various transition
rates. Furthermore, the future recovery process of a claimant is likely to depend on his current
health state. For example, we expect slower recovery for a self-employed with a higher disability
percentage. By analyzing disability spells in relation to the health condition of the claimant,
we can assess the precise role of the risk factors in each stage of the return-to-work process.
Another drawback of the current model is that it only contains a limited number of explanatory
variables, namely age, gender and time. In order to improve this and to give more insight into
the entire interrelated trajectory of the process of rehabilitation, we will present a multi-state
model based on Markov processes. We will add extra variables which are expected to affect the
recovery process, such as type of disorder and deferment period.
3.1 Current model
In the model that is currently used by the insurance company there is a distinction between two
states:
• Healthy (H): Disability of 0-24%. In this state no benefit payments are made by the
insurer, so the insured is either at work or in an unpaid sick leave situation.
• Disabled (D): Disability of 25-100%.
In order to model the transitions between these states the following definitions are used:
Definition 1. The disability volume at time t is the sum of the disability percentages of all
insured at time t. With DIS%(X, A, t) we denote the disability volume at time t of all insured
who are X years old and who have been disabled for A to A + 1 months at time t.
Figure 3.1: The two-state model which is currently used by the insurance company. The transitions
reflect invalidation and recovery.
Definition 2. The recovery probability r(X, A, t) is the probability that an insured, who became
disabled at the age of X and at time t − 1 is disabled for A months, recovers in the subsequent
month.
The recovery probability is determined by the difference in the disability volume of all insured
of age X between two consecutive months, divided by the initial volume, or as a formula (for t ≥ 1):
\[
r(X, A, t) = \frac{\mathrm{DIS\%}(X, A-1, t-1) - \mathrm{DIS\%}(X, A, t)}{\mathrm{DIS\%}(X, A-1, t-1)}
\]
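A small numerical sketch may clarify the formula. The function name recovery_probability and the example volumes are our own illustrative assumptions and are not taken from the data set.

```python
def recovery_probability(volume_prev, volume_now):
    """r(X, A, t) computed from DIS%(X, A-1, t-1) and DIS%(X, A, t)."""
    if volume_prev <= 0:
        raise ValueError("the starting disability volume must be positive")
    # Relative decrease of the disability volume between two consecutive months.
    return (volume_prev - volume_now) / volume_prev


# Made-up volumes: 1200 percentage points at time t-1, 900 one month later.
print(recovery_probability(1200, 900))
```

In this made-up example a quarter of the disability volume disappears within one month. Note that the formula measures the decrease of the total volume, so a partial recovery of many claimants and a full recovery of a few claimants cannot be distinguished.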
Applying this formula to our data set with X = 43 results in the graph shown in figure 3.2
(averaged over time t). The graph is erratic for large A, since there are relatively
few observations for these durations. However, it is clear that the recovery probability strongly
declines as the disability spell continues.
Figure 3.2: Recovery probabilities for a 43-year-old.
Besides the influence of time, it was also shown that age is an important risk factor. In general
one can say that the older the claimant, the slower the recovery. Therefore a linear relation
between age and recovery was chosen. Third, it was pointed out that the graphs showed
some kinks. This was modeled by dividing the formula into five parts and by including three
extra parameters for the first three months of disability. Because of the confidentiality of the
information, the exact formulas cannot be shown in this thesis.
3.2 Multi-state model
As mentioned before, the current model has the drawback that it only considers the total
disability volume of all insured and it does not provide individual information about partial or negative
recovery. In order to give more insight into the recovery process, we will split the state of disability
into more states. In this way we can distinguish between mild and severe disability. When an
insured is disabled, his level of disability is expressed as a percentage, ranging from 0 to 100%. If
we included all these different percentages in our model, this would result in 101 states and
101 × 100 = 10,100 possible transitions. Since we would like to create a simple model to describe
the behavior of insured, this is not practical. Therefore we will define a model with only three
disability states. The choice of these states is based on the graph shown in figure 3.3. We notice
that the most common percentages in the third month of disability are 0, around 50 and 100.
Figure 3.3: The frequency of disability percentages in the third month of disability.
Based on these frequencies the different disability percentages can be classified into three states¹:
• State 0, Healthy: Disability of 0-24%. In this state no benefit payments are made by the
insurer, so the insured is either at work or in an unpaid sick leave situation.
• State 1, Partially disabled: Disability of 25-75%.
• State 2, Disabled: Disability of 76-100%.
During a disability spell, a claimant can jump between these three states. There are six possible
transitions: 0 → 1, 0 → 2, 1 → 0, 1 → 2, 2 → 0 and 2 → 1. However, the first two transitions
correspond to an insured becoming disabled or to a claimant experiencing a fall-back
within four weeks after he has fully recovered. These transitions will not be discussed in this
thesis. Consequently, there are four transitions left to focus on. A graphical representation of
the multi-state model is given in figure 3.4. We notice that transition 1 → 2 represents a decline
in health status, whereas the other three reflect a full or partial recovery.
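The classification into three states and the four transitions of interest can be sketched in a few lines of Python. This is only an illustration: the function names, the monthly list of percentages and the example spell are hypothetical, and integer percentages are assumed.

```python
def disability_state(percentage):
    """Map a disability percentage (0-100) to state 0, 1 or 2."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percentage <= 24:
        return 0  # healthy: no benefit payments
    if percentage <= 75:
        return 1  # partially disabled
    return 2      # disabled


def observed_transitions(monthly_percentages):
    """Return (month, from_state, to_state) for every change of state.

    Transitions out of state 0 are skipped, mirroring the choice in the
    text to leave 0 -> 1 and 0 -> 2 out of the analysis.
    """
    states = [disability_state(p) for p in monthly_percentages]
    transitions = []
    for month in range(1, len(states)):
        previous, current = states[month - 1], states[month]
        if previous != current and previous != 0:
            transitions.append((month, previous, current))
    return transitions


# Hypothetical spell: one disability percentage per month.
spell = [100, 100, 60, 60, 80, 50, 10]
print(observed_transitions(spell))
# -> [(2, 2, 1), (4, 1, 2), (5, 2, 1), (6, 1, 0)]
```

In this made-up spell the claimant partially recovers, falls back (1 → 2), partially recovers again and finally recovers fully.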
Footnote 1: We have performed some robustness checks, and it turned out that our final results are robust to changes in
the definition of the three health states.
Figure 3.4: A multi-state model with three states, representing the different degrees of disability.
The two transitions which are illustrated by the dashed arrows are not observed.
A claimant makes a transition from one state to another at some moment in time. There are
two different ways we can specify time here. The first possibility is to define it as the total time
spent in the system, whereas another way would be to specify it as the time in the current state.
In the first case the time at which a claimant makes a transition is the time since
the start of the disability, and in the second case it is the time spent in a particular state since
the previous transition. As the purpose of this thesis is to address the influence of the business
cycle on the expected duration until recovery, it is reasonable to consider the time in the system.
Claims with a duration of more than 3 years are regarded as chronic, and it is assumed that the
recovery processes of these claimants are not affected by the business cycle. Therefore we will
only consider transitions that take place before the 37th month. This results in a total of 33595
transitions, of which an overview is given below in table 3.1.
Current state    Next state 0     Next state 1     Next state 2
1                14584 (43.4%)    -                3118 (9.3%)
2                6554 (19.5%)     9339 (27.8%)     -
Table 3.1: An overview of the transitions in the data set.
3.2.1 Markov chains
A natural way to model the recovery process is by using a multi-state model. Andersen et al. [3]
have studied such models using a finite-state Markov process model in which the hazard rates for each
possible transition in the multi-state model are modeled by a separate Cox proportional hazards
model. Proportional hazards models will be explained in chapter 4 about survival analysis. In
this section we will give a more formal definition of the multi-state model introduced in the
previous section. For this we need the definitions of a stochastic process and a Markov chain,
which we will discuss now.
Loosely speaking, a stochastic process is a phenomenon that can be thought of as evolving in
time in a random manner. More formally, we define:
Definition 3. A stochastic process is a collection $X = (X_t)_{t \in T}$ of measurable maps from a
probability space $(\Omega, \mathcal{F}, P)$ to the state space $(E, \mathcal{E})$.
The index t is a time parameter, and we view the index set T as the set of all observation instants
of the process. The stochastic process is called a discrete-time process when T is countable. In
contrast, when T is an interval of the real line we say that $X = (X_t)_{t \in T}$ is a continuous-time
process.
Definition 4. Let $X = (X_t)_{t \in T}$ be a stochastic process with finite state space $E = \{x_0, x_1, \dots, x_m\}$
and let T correspond to a finite set of times. Then $X = (X_t)_{t \in T}$ is called a Markov chain if the
Markov property is satisfied: for all $n = 1, 2, \dots$ and $x_0, \dots, x_n \in E$ it holds that
\[ P(X_n = x_n \mid X_0 = x_0, \dots, X_{n-1} = x_{n-1}) = P(X_n = x_n \mid X_{n-1} = x_{n-1}). \tag{3.1} \]
We notice that the Markov property states that the transition probability only depends on the
current state $X_{n-1} = x_{n-1}$, and is independent of the path before $x_{n-1}$.
Another way to characterize a Markov chain is by stating:
\[ \begin{aligned}
f(x_0, \dots, x_n) &= f(x_0) f(x_1 \mid x_0) f(x_2 \mid x_0, x_1) \cdots f(x_n \mid x_0, \dots, x_{n-1}) \\
&= f(x_0) f(x_1 \mid x_0) f(x_2 \mid x_1) \cdots f(x_n \mid x_{n-1}).
\end{aligned} \]
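The factorization above means that the probability of a whole path is the initial probability times a product of one-step transition probabilities. The sketch below illustrates this under two assumptions that are not taken from the thesis: the chain is time-homogeneous (one transition matrix for all months), and the numbers in `initial` and `transition` are invented for illustration only.

```python
def path_probability(path, initial, transition):
    """P(X_0 = path[0], ..., X_n = path[n]) under the Markov factorization."""
    probability = initial[path[0]]
    for previous, current in zip(path, path[1:]):
        probability *= transition[previous][current]
    return probability


# Hypothetical monthly probabilities for the states 0, 1 and 2.
initial = {0: 0.0, 1: 0.4, 2: 0.6}
transition = {
    0: {0: 1.0, 1: 0.0, 2: 0.0},  # state 0 treated as absorbing in this toy example
    1: {0: 0.2, 1: 0.7, 2: 0.1},
    2: {0: 0.1, 1: 0.2, 2: 0.7},
}

# Path: disabled, disabled, partially disabled, healthy.
print(path_probability([2, 2, 1, 0], initial, transition))
```

For this path the product is 0.6 × 0.7 × 0.2 × 0.2, so only the previous state enters each factor.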
Between two different states from E transitions can take place. These are formally defined as
the set
\[ \{(i, j) \mid i \neq j;\ i, j \in E\}. \]
Definition 5. For $0 \le t < s$ and $i, j \in \{0, \dots, m\}$ the transition probabilities are given by:
\[ P_{ij}(t, s) = P(X_s = x_j \mid X_t = x_i). \]
The transition intensities are defined (for $i \neq j$) by:
\[ q_{ij}(t) = \lim_{s \downarrow t} \frac{P_{ij}(t, s)}{s - t}. \]
Therefore, the transition intensity can be interpreted as an instantaneous probability of going
from state $x_i$ to state $x_j$. The advantage of the transition intensity over the related probability
is the fact that it depends on a single time variable, instead of two.
We end this section by formally defining our multi-state model as a Markov chain $X = (X_t)_{t \in T}$
with $T = \{0, 1, \dots, 36\}$ in months and state space $(E = \{0, 1, 2\}, 2^E)$. The transitions are given
by $\{(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)\}$, of which we only focus on the last four.
3.3 Explanatory variables
In this section we will discuss the independent variables which might affect the recovery process.
First, we will give an overview of the individual-specific covariates that will be considered.
Second, the variables related to the business cycle will be discussed.
Regressor                     Description
Socio-economic status
  Gender                      Dummy variable for gender (male = 1)
  Age                         Age at the start of the disability spell (in years)
Classification of disorder
  Locomotive                  Dummy variable for locomotive disease
  Psychological               Dummy variable for psychological disease
  Digestive                   Dummy variable for digestive disease
Type of disorder
  1                           Dummy variable for symptoms
  2                           Dummy variable for cancer
  4                           Dummy variable for infections
  5                           Dummy variable for injury
Occupational class            Dummy variables for Agricultural sector, Construction, Shopkeeper,
                              (Para)Medical sector, Service and Other
Contract characteristics
  Compensation                Insured income
  Deferment (see footnote 2)  Deferment period
Other
  Dis. Year                   The year in which the disability started
  Previous                    Dummy variable for previous state
Table 3.2: Description of (possible) explanatory variables.
3.3.1 Business cycle
The term business cycle (or economic cycle) refers to economy-wide fluctuations in production
or economic activity over several months or years. These fluctuations occur around a long-term
growth trend, and typically involve shifts over time between periods of relatively rapid economic
growth (an expansion or boom), and periods of relative stagnation or decline (a contraction or
recession).
Footnote 2: The possible deferment periods are divided into four groups: A (≤ 14 days), B (1 month), C (2 to 6 months)
and D (> 6 months).
Figure 3.5: The different phases of the business cycle.
In order to assist in the analysis of the state and the course of the Dutch economy, the CBS,
also called Statistics Netherlands, developed the Business Cycle Tracer (BCT). As its name
indicates, the BCT traces the cyclical nature of economic developments. The state of the business
cycle is determined using a selection of key macro-economic indicators. Portraying these fifteen
indicators together results in a coherent picture of the state of the economy at a particular
moment in time. We will illustrate this with the following figure.
Figure 3.6: The Business Cycle Tracer in March 2012.
For each indicator, the deviation from its long-term trend is given on the y-axis and the
period-on-period change is given on the x-axis. Four situations can be distinguished, corresponding to
the four different quadrants:
Above trend and decreased (upper left-hand quadrant).
Below trend and decreased (lower left-hand quadrant).
Below trend and increased (lower right-hand quadrant).
Above trend and increased (upper right-hand quadrant).
For each indicator this results in a coordinate in one of the four quadrants. The distribution of
the various indicators across these quadrants gives an indication of the state and the course of
the Dutch business cycle. This is based on the average position and movement of the indicators.
In a period of high economic growth most of the indicators will be above trend, whereas in a
period of economic decline they will be below trend.
The 15 indicators are divided into three macro-economic clusters: confidence, economy and
labour market. To analyse the business cycle, it is also important to know how the 15 indicators
relate to each other in terms of time, which is also called business cycle phasing. To this end
the indicators can be divided into leading, coincident and lagging with respect to upward and
downward movements in the business cycle. The extent to which each indicator leads,
coincides or lags is calculated by correlating it to the Business Cycle Tracer Indicator. This
indicator is the unweighted arithmetic mean of the fifteen indicators [38].
Leading indicators are the first to show which way macro-economic activity is headed in the
medium term. Therefore it is important to look at these indicators first in the BCT. Normally
they move into a subsequent phase on average six months earlier than the coincident indicators.
There are 5 leading indicators in total, of which 4 are confidence indicators. As a sentiment
factor, confidence in the economy will adjust to the business cycle more quickly than the physical
economic and labour market indicators. The 5th leading indicator is temp hours. Because of
its temporary nature, work via temp agencies can also adapt to economic circumstances more
quickly.
Coincident indicators correlate most closely in time with the upward and downward movements
in macro-economic activity. There are 7 coincident indicators in the BCT, of which 6 are economic
indicators and 1 is a labour market indicator (bankruptcies). The coincident indicators are very important for
the BCT, as they provide the actual information for a reliable up-to-date picture of the Dutch
business cycle.
Lastly, there is the group of lagging indicators. In the BCT these lagging indicators are the second
confirmation that the business cycle has moved to or is in a next phase. It is no coincidence
that the 3 lagging indicators are all labour market indicators: labour volume, job vacancies and
unemployment. As a result of the strict labour regulations in the Netherlands, it takes some
time for the rigid labour market to adapt to changes in the economy. Compared with the other
two groups, the movements of the lagging indicators are the calmest, and this is exactly why
they are included in the BCT. Once they begin to change, there is no doubt about the way the
economy is headed.
The BCT indicator shows the state of the business cycle in one figure. The leading indicators
will have shown this on average six months earlier. They should be seen as the quick business
cycle indicators. But it is the coincident indicators that show the actual changes in the business
cycle. The role of the lagging indicators is mainly to confirm the durability of the business cycle
changes. This is important because the course of the cycle is not constant but variable. An
overview of the phasing of the 15 indicators is shown in figure 3.7.
Figure 3.7: Phasing of the BCT indicators
In order to address the influence of the business cycle on the recovery process, a proper measure
should be used. However, as illustrated above, there is no universally agreed single measure of
the economy. Data on interest rates, unemployment, GDP and inflation are frequently quoted in
the media, but this is by no means a definitive list of economic variables. Other measures such as
the number of bankruptcies and retail sales are also useful in explaining aspects of the economy.
Each economic statistic captures some part of the economy, but no measure has been found
that adequately describes the overall situation [30]. Furthermore, these measures differ in the
degree of correlation with the producer environment. Based on these differences and exploratory
analyses it was chosen to consider the 5 indicators described below (see footnote 3).
DNB business cycle indicator
The DNB business cycle indicator provides insight into the short-term economic outlook
and aims to identify turning points in the Dutch business cycle up to seven months
ahead. The indicator is drawn from consumer and producer surveys, financial indicators and
export indicators [39].
Confidence
As a summary measure of the leading indicators we consider the confidence, which is
defined as the weighted average of the producer and consumer confidence index:
The producer confidence index (PCI), or business confidence, is based on a survey of 1700
manufacturing companies which gathers up-to-date information on economic developments for
all activities of the manufacturing industry. The basis of this producer confidence consists
of three components of the economic survey: how companies evaluate their order positions,
the number of finished products in stock and the anticipated economic activity in the next
three months.
The consumer confidence index (CCI) is an indicator designed to measure consumer
confidence, which is defined as the degree of optimism on the state of the economy that
consumers are expressing through their activities of saving and spending.
Since the values of the PCI range from -23.5 to 9.4, whereas the values of the CCI range from
-40 to 18, it is chosen to assign them the weights 2 and 1 respectively.
Hence: Confidence = (2*PCI + CCI)/3.
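As a small illustration of this weighting, the formula can be written as a function. The function name and the index values used in the example call are hypothetical and are not observations from the data set.

```python
def confidence(pci, cci):
    """Weighted average of producer (weight 2) and consumer (weight 1) confidence."""
    return (2 * pci + cci) / 3


# Hypothetical month: producers mildly positive, consumers negative.
print(confidence(3.0, -6.0))  # -> 0.0
```

The double weight roughly compensates for the fact that the PCI moves over a range about half as wide as that of the CCI.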
Footnote 3: Information about these variables can be found on www.cbs.nl and www.dnb.nl.
Gross domestic product
As the coincident indicator we use the gross domestic product (GDP), since it is the main
indicator for the development of the economy [38]. The GDP refers to the market value of all
officially recognized final goods and services produced in the Netherlands. Economic growth is
measured in terms of the volume change in GDP. As a formula, the GDP can be described as:
GDP = private consumption + gross investment + government spending + (exports - imports).
Labour
As a summary measure of the lagging indicators the arithmetic average of the labour volume and
the number of vacancies is used.
Average income
As an unconventional measure of the business climate, we have chosen to consider the average
income of a self-employed person, corrected for inflation.
It is assumed that an increase in each of these measures corresponds to an improvement in the
business climate, and vice versa. In his thesis [10], Pieter Bultena measured the influence of the
unemployment rate on the different transitions. We have chosen to neglect this variable, since it
is a measure for employees rather than for the self-employed. On the other hand, Remko Amelink [7]
used the GDP growth rate and the business confidence index as business cycle related variables.
He concludes that a decrease in the GDP growth rate leads to higher recovery rates, hence
shorter durations. For the coefficient of the business confidence index he allows for a structural
change after March 2009. After this date, most of the self-employed started to experience the
consequences of the financial and economic crisis. It is concluded that until March 2009, an
increase in the business confidence index led to a significant increase in recovery rates. However,
after this date, a change in the index no longer has a significant effect.
Chapter 4
Survival Theory
In this and the following chapter we will discuss econometric models which could be used for
the four transitions introduced in section 3.2. An econometric model is a set of joint probability
distributions to which the true joint probability distribution of the variables under study is
supposed to belong. In the case in which the elements of this set can be indexed by a finite
number of real-valued parameters, the model is called a parametric model; otherwise it is a
nonparametric or semi-parametric model.
We note that in our case there are only two possible outcomes for each transition (jump or no
jump), hence the number of models that can be used is limited. In this chapter we will discuss
survival analysis, in the next one we will continue with binary regression.
In survival analysis, interest centers on a group or groups of individuals for each of whom (which)
there is defined a point event, often called a failure, occurring after a length of time called the
failure time. To determine this time precisely, there are three requirements:
1. A time origin must be unambiguously dened.
2. A scale for measuring the passage of time must be agreed.
3. The meaning of failure must be entirely clear.
A special source of difficulty in the analysis of survival data is the possibility that some individuals
may not be observed for the full time to failure. Ideally both the birth and death dates of all
subjects are known (for our purpose the date a claimant enters and leaves a specific state). In
practice, however, this often will not be the case. Sometimes it is only known that the failure
time is after some date, which is called right censoring. Right censoring will occur for those
subjects whose date of birth is known, but who are still alive when they withdraw from the
study or when the study ends. If a subject's lifetime is known to be less than a certain duration,
the lifetime is said to be left-censored.
It can also happen that subjects with a lifetime less than some threshold may not be observed
at all: this is called truncation. Note that truncation is different from left-censoring, since for
a left-censored datum, we know that the subject exists, but for a truncated datum, we may be
completely unaware of the subject.
In the rest of this chapter we think of failure time as a continuous random variable T, equipped
with distribution function $F(t) = P(T \le t) = P(T < t)$ and probability density function
$f(t) = dF/dt$. We consider a large population of people who enter some given state at a time we
shall identify as T = 0. The calendar time of entry need not be the same for all people and in
most practical cases it will not be. Thus, T measures the time on person-specific clocks that are
each set to zero at the moment that person enters the state we consider. T is then referred to as
the duration of stay in the state. For now it is assumed that the population is homogeneous with
respect to regressor variables that affect the distribution of T. This means that each duration of
stay is a realization of a random variable from the same probability distribution.
4.1 Hazard function
The probability that a person who has occupied a state for a time t leaves it in the short interval dt
after t is equal to $P(t \le T < t + dt \mid T \ge t)$. However, our interest is focused on the instantaneous
rate of leaving per unit time period at t. Therefore we need the following definition:
Definition 6. The hazard function is defined as:
\[ \lambda(t) = \lim_{\Delta t \downarrow 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}. \]
Loosely stated, the hazard function, which is also called the hazard rate, is the probability that a
certain event happens in a certain period, given what has happened before the beginning of that
period. We note, however, that the hazard rate is not a true probability, in the sense that the ratio
in the definition can exceed the value 1 as $\Delta t$ decreases. It is most useful to think of the hazard as a characteristic
of individuals, not of populations or samples (unless everyone in the population is exactly the
same). Each individual may have a hazard function that is completely different from anyone
else's.
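By using the definition of conditional probability, we can express the hazard function in terms
of the distribution function and the probability density function of the continuous random variable T:
\[ \begin{aligned}
\lambda(t) &= \lim_{\Delta t \downarrow 0} \frac{P(t \le T < t + \Delta t)}{P(T \ge t)} \cdot \frac{1}{\Delta t}
= \lim_{\Delta t \downarrow 0} \frac{F(t + \Delta t) - F(t)}{1 - F(t)} \cdot \frac{1}{\Delta t} \\
&= \frac{F'(t)}{1 - F(t)} = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{S(t)},
\end{aligned} \tag{4.1} \]
where $S(t) := 1 - F(t) = P(T > t)$ is called the survival function, since it gives the probability
of survival to time t.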
Since $f(t) = dF(t)/dt = -dS(t)/dt$, we can view (4.1) as a differential equation in t whose
solution, subject to the initial condition S(0) = 1, is given by:
\[ S(t) = \exp\left( -\int_0^t \lambda(s)\, ds \right), \tag{4.2} \]
as can be verified by differentiation.
This shows how one can calculate the probability distribution of the duration of state occupancy
given the hazard function. We note that it follows that the density function of T can be written
as:
\[ f(t) = \lambda(t) \exp\left( -\int_0^t \lambda(s)\, ds \right) = \lambda(t) S(t). \tag{4.3} \]
So, $\lambda(t)$, S(t) and f(t) are alternative ways to describe the distribution of the probability of exit
over the positive real axis; if we know one, we can deduce the others.
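The step from a hazard function to the survival and density functions can be checked numerically. The sketch below is only an illustration: it integrates an arbitrary hazard with the trapezoidal rule, and the linear hazard used in the example is a made-up choice, not a hazard estimated in this thesis.

```python
import math


def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate the integral of hazard(s) over [0, t] by the trapezoidal rule."""
    if t == 0:
        return 0.0
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, steps):
        total += hazard(k * width)
    return total * width


def survival(hazard, t):
    """S(t) = exp(-integral of the hazard from 0 to t)."""
    return math.exp(-cumulative_hazard(hazard, t))


def density(hazard, t):
    """f(t) = hazard(t) * S(t)."""
    return hazard(t) * survival(hazard, t)


def example_hazard(t):
    """Hypothetical hazard that increases linearly with duration."""
    return 0.1 + 0.02 * t


# The integral of 0.1 + 0.02 s over [0, 5] is 0.75, so S(5) should equal exp(-0.75).
print(survival(example_hazard, 5.0), math.exp(-0.75))
print(density(example_hazard, 5.0))
```

Starting from the survival function instead, the hazard can be recovered as minus the derivative of ln S(t), which is the sense in which knowing one of the three functions determines the other two.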
Last, we introduce the cumulative hazard or integrated hazard, which is defined as
\[ \Lambda(t) = -\ln S(t) = \int_0^t \lambda(s)\, ds. \]
We can think of $\Lambda(t)$ as the sum of the risks one will face going from duration 0 to t.
Suppose we are interested in the expectation of life (or the expected duration in a specific state).
Let $E(T) = \mu$. By definition we have:
\[ \mu = \int_0^\infty t f(t)\, dt. \]
Integrating by parts, and making use of the fact that $f(t) = -dS(t)/dt$, where S has limits S(0) = 1
and $S(\infty) = 0$, one can show that
\[ \mu = \int_0^\infty S(t)\, dt. \tag{4.4} \]
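For completeness, the integration by parts behind (4.4) can be written out as follows. The only extra assumption is that $t S(t) \to 0$ as $t \to \infty$, which holds whenever the expected duration is finite.

```latex
\begin{aligned}
\mu &= \int_0^\infty t f(t)\, dt = -\int_0^\infty t \, \frac{dS(t)}{dt}\, dt \\
    &= \Big[ -t S(t) \Big]_0^\infty + \int_0^\infty S(t)\, dt \\
    &= \int_0^\infty S(t)\, dt .
\end{aligned}
```

For a constant hazard $\lambda$, for instance, $S(t) = e^{-\lambda t}$ and the integral gives the familiar mean $1/\lambda$.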
We have seen that the hazard function is useful to describe the probability distribution for the
time of event occurrence. Every hazard function has a corresponding probability distribution.
But hazard functions can be extremely complicated, and the associated probability distributions
may be rather complex. We will only examine some simple hazard functions and discuss
their associated probability distributions. These hazard functions are the basis for some widely
employed regression models.
Example 1. The simplest function states that the hazard is constant over time: $\lambda(t) = \lambda$ or,
equivalently, $\log \lambda(t) = \log(\lambda) = \mu$ (here $\mu$ denotes the level of the log-hazard, not the
expected duration). Substituting this hazard into equation (4.2) and carrying out
the integration implies that the survival function is $S(t) = e^{-\lambda t}$. Then, from equation (4.3),
we get the density function, $f(t) = \lambda e^{-\lambda t}$. This is the density function of the well-known
exponential distribution with parameter $\lambda$. Thus, a constant hazard implies an exponential
distribution for the time until an event occurs (or the time between events), which makes sense
due to the memoryless property of this distribution.
Example 2. The next step up in complexity is to let the natural logarithm of the hazard be a
linear function of time: $\log \lambda(t) = \mu + a t$. Taking the logarithm is a convenient and popular
way to ensure that $\lambda(t)$ is nonnegative, regardless of the values of $\mu$, a, and t. Of course, we can
rewrite the equation as $\lambda(t) = e^{\mu} e^{at} = \lambda e^{at}$, with $\lambda = e^{\mu}$. After integration we find that
$S(t) = \exp\left( -\frac{\lambda}{a} (e^{at} - 1) \right)$,
so $F(t) = 1 - \exp\left( -\frac{\lambda}{a} (e^{at} - 1) \right)$. Hence, this hazard function implies that the time of event occurrence
has a Gompertz distribution.
Example 3. Another possibility is to assume: $\log \lambda(t) = \log(\lambda a) + (a - 1) \log t$ or, equivalently,
$\lambda(t) = \lambda a t^{a-1}$. Then the survival function equals $S(t) = e^{-\lambda t^a}$. The cumulative distribution
function becomes $F(t) = 1 - e^{-\lambda t^a}$, in which we recognize the Weibull distribution.
The Weibull model is the one used most frequently in economic applications.
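The three examples can be checked against each other numerically: for each closed-form survival function, minus the derivative of ln S(t) should return the stated hazard. The sketch below does this with a central difference; the parameter values are arbitrary illustrations, and the function names are not taken from any library.

```python
import math


def exponential(lam):
    """Constant hazard lam with S(t) = exp(-lam * t)."""
    return (lambda t: lam,
            lambda t: math.exp(-lam * t))


def gompertz(lam, a):
    """Hazard lam * exp(a t) with S(t) = exp(-(lam / a) * (exp(a t) - 1))."""
    return (lambda t: lam * math.exp(a * t),
            lambda t: math.exp(-(lam / a) * (math.exp(a * t) - 1.0)))


def weibull(lam, a):
    """Hazard lam * a * t^(a - 1) with S(t) = exp(-lam * t^a)."""
    return (lambda t: lam * a * t ** (a - 1.0),
            lambda t: math.exp(-lam * t ** a))


def hazard_from_survival(survival, t, eps=1e-6):
    """Recover the hazard as -d/dt ln S(t) using a central difference."""
    return -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)


# Arbitrary illustrative parameters.
models = {
    "exponential": exponential(0.3),
    "gompertz": gompertz(0.1, 0.2),
    "weibull": weibull(0.1, 1.5),
}
for name, (hazard, survival) in models.items():
    print(name, hazard(2.0), hazard_from_survival(survival, 2.0))
```

In each printed line the two numbers agree up to the error of the finite difference, which confirms that every survival function above belongs to its stated hazard.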
4.2 Kaplan-Meier estimator
In this section we introduce the non-parametric Kaplan-Meier estimator, which estimates the
survivor and hazard functions (see also [16]). Most of the time it is used for preliminary analysis
of the data, since a drawback of this estimator is that it does not allow for covariates.
Suppose we consider a data set with n subjects, which have ordered survival times
$t_1 < \dots < t_n < \infty$. We will use the counting process notation $\{N_i(t), Y_i(t) \mid 0 < t < \infty\}$, where $N_i(t)$ takes
the value one if subject i has been observed to fail at or before time t and takes value zero otherwise.
$Y_i(t)$ takes value one if subject i is at risk at time t and zero otherwise. We denote the aggregated
processes by:
$N(t) = \sum_i N_i(t) = \sum_i 1_{\{t_i \le t\}}$: the number of spells completed up to and including time t.
$Y(t) = \sum_i Y_i(t)$: the number of persons at risk of making a transition immediately prior to
t, which is made up of those who have a censored or completed spell of length t or longer.
We start by estimating the integrated hazard rate. To this end we consider a small interval of
time:
\[ \Lambda(s + h) - \Lambda(s) \approx \lambda(s) h \approx P(\text{event in } (s, s + h] \mid \text{at risk at } s). \]
It is reasonable to estimate this probability by:
\[ \frac{N(s + h) - N(s)}{Y(s)}. \]
Integrating this over the range (0, t] yields the so-called Nelson-Aalen estimator
\[ \hat{\Lambda}(t) = \int_0^t \frac{dN(s)}{Y(s)}. \]
Since we deal with discrete time intervals in the disability data set, it is more convenient to define
the Nelson-Aalen estimator by the equivalent sum
\[ \hat{\Lambda}(t) = \sum_{i: t_i \le t} \frac{\Delta N(t_i)}{Y(t_i)}, \]
where $\Delta N(t_i)$ denotes the number of events occurring precisely at time $t_i$, the time until the ith
event. Since the integrated hazard rate has no useful interpretation, we will transform it to a
survival function. A logical estimator, proposed by Breslow (1972), is
\[ \hat{S}_B(t) = \exp\left( -\hat{\Lambda}(t) \right). \tag{4.5} \]
However, the Kaplan-Meier estimator uses the increment of the Nelson-Aalen estimator at the
ith failure: $d\hat{\Lambda}(t_i) = dN(t_i)/Y(t_i)$, where $dN(t_i)$ is the number of events at $t_i$. The proportion of those entering a state who survive to
the first observed survival time $t_1$ is simply one minus the proportion who made a transition
out of the state by that time: $\hat{S}_{KM}(t_1) = 1 - dN(t_1)/Y(t_1) = 1 - d\hat{\Lambda}(t_1)$. Similarly, the
proportion surviving to the second observed survival time $t_2$ is $\hat{S}_{KM}(t_1)$ multiplied by one minus
the proportion who made a transition out of the state between $t_1$ and $t_2$. More generally, the
Kaplan-Meier estimator of the survival function is defined as:
\[ \hat{S}_{KM}(t) = \prod_{i: t_i \le t} \left( 1 - d\hat{\Lambda}(t_i) \right). \tag{4.6} \]
This estimator differs slightly from $\hat{S}_B$, but since $e^{-x} \approx 1 - x$ for small x, the two
estimators nearly coincide for small increments $d\hat{\Lambda}$, that is, when there are many subjects still at
risk. The two estimates are in fact asymptotically equivalent, since as $n \to \infty$ the individual
increments get arbitrarily small [34].
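The estimators of this section are simple enough to compute by hand. The following sketch, written for illustration only, evaluates the Nelson-Aalen sum, the Breslow estimator (4.5) and the Kaplan-Meier product (4.6) at each distinct event time. The function name and the six spells in the example are hypothetical; `events` is 1 for an observed transition and 0 for a right-censored spell.

```python
import math


def survival_estimates(durations, events):
    """Return (t, nelson_aalen, breslow, kaplan_meier) at each distinct event time."""
    event_times = sorted({t for t, d in zip(durations, events) if d == 1})
    cumulative_hazard = 0.0
    km_survival = 1.0
    rows = []
    for t in event_times:
        # Y(t_i): spells (censored or completed) of length t or longer.
        at_risk = sum(1 for u in durations if u >= t)
        # dN(t_i): number of observed transitions exactly at t.
        n_events = sum(1 for u, d in zip(durations, events) if u == t and d == 1)
        increment = n_events / at_risk
        cumulative_hazard += increment
        km_survival *= 1.0 - increment
        rows.append((t, cumulative_hazard, math.exp(-cumulative_hazard), km_survival))
    return rows


# Hypothetical spells in months; the third and fifth are right-censored.
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
for row in survival_estimates(durations, events):
    print(row)
```

Because $e^{-x} \ge 1 - x$, the Breslow estimate is never below the Kaplan-Meier estimate, and the gap is largest at the last event time, where only one spell is still at risk.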
4.3 Censored data and the Likelihood function
One of the main advantages of survival theory is that it can handle censored data. The only
type of censoring we will consider is right-censoring. For this type we observe spells from time
0 until a censoring time c. Some spells will have ended by this time anyway, but others will
be incomplete and all we know is that they will end somewhere in the interval $(c, \infty)$. In this
section we will discuss how these censored observations can be incorporated into the likelihood
function. For a brief introduction to maximum likelihood theory we refer to (D.1.1).
Suppose we have n individuals with transition lifetimes distributed according to the survivor function S(t),
with associated density f(t) and hazard $\lambda(t)$. We further assume that person i is observed during
$t_i$ time units. If he jumps at $t_i$, his contribution to the likelihood function is the density at that
duration, which according to (4.3) can be written as
\[ L_i = f(t_i) = S(t_i) \lambda(t_i). \]
However, if we are dealing with a censored observation, all we know is that the lifetime exceeds
$t_i$. The probability of this event is
\[ L_i = S(t_i), \]
which becomes the contribution of a censored observation to the likelihood.
We now introduce a transition indicator $d_i$, taking the value one if person i jumps and the value
zero if the observation is censored. Then the likelihood function can be written as
\[ L = \prod_{i=1}^n L_i = \prod_{i=1}^n S(t_i) \lambda(t_i)^{d_i}. \]
Taking the natural logarithm and recalling that $\ln S(t) = -\Lambda(t)$ by the definition of the cumulative hazard, we obtain
the log-likelihood function
\[ l = \ln L = \sum_{i=1}^n \left( d_i \ln \lambda(t_i) - \Lambda(t_i) \right). \]
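To see the censored log-likelihood at work, consider the constant hazard of example 1, for which $\ln \lambda(t_i) = \ln \lambda$ and $\Lambda(t_i) = \lambda t_i$. Setting the derivative of the log-likelihood to zero gives the estimate $\hat{\lambda} = \sum_i d_i / \sum_i t_i$, the number of observed transitions divided by the total time at risk. The sketch below illustrates this; the data and function names are hypothetical.

```python
import math


def log_likelihood(lam, durations, events):
    """l(lam) = sum_i (d_i * ln(lam) - lam * t_i) for a constant hazard lam."""
    return sum(d * math.log(lam) - lam * t for t, d in zip(durations, events))


def exponential_mle(durations, events):
    """Maximiser of the log-likelihood: observed transitions / total exposure."""
    return sum(events) / sum(durations)


# Hypothetical spells in months; events = 0 marks a right-censored spell.
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]

lam_hat = exponential_mle(durations, events)
print(lam_hat)                                      # 4 transitions in 17 months at risk
print(log_likelihood(lam_hat, durations, events))
```

Censored spells add to the denominator but not to the numerator, which is how the incomplete observations still carry information about the hazard.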
4.4 Cox Proportional Hazards model
Up to this point we have been concerned with a homogeneous population, where the lifetimes
of all individuals are governed by the same survival function S(t) and hazard $\lambda(t)$. However,
individuals have distinctive features, such as age, gender and social environment, which are likely
to affect their lifetimes. To cope with this we will introduce a vector of covariates and consider
the general problem of modeling the influence of these independent variables on the survival time.
This can be done by using a parametric model, for instance the exponential (see example 1) or
Weibull (see example 3) distribution. Such models are relatively easy to estimate in the presence
of censoring, but they produce inconsistent parameter estimates if any part of the parametric
model is misspecified. One way of resolving this is to choose parametric functional forms that are
flexible and hence provide some protection against misspecification. Unfortunately, identification
and estimation of such flexible functional forms can be rather complicated. However, there
is a semi-parametric method that requires less than complete distributional specification, the
Proportional Hazards model, developed by David Cox (1987, [13]). In fact, this method is viewed
as empirically so successful that it has become the standard method for analyzing survival data
[11].
In the proportional hazards (PH) model the hazard at time t for an individual with covariates
$x_i$ (not including a constant) is assumed to be
\[ \lambda(t \mid x_i) = \lambda_0(t) \phi(x_i). \tag{4.7} \]
Note that the model separates the effect of time from the effect of the covariates. The time-dependent
function $\lambda_0(t)$ is the baseline hazard function that describes the risk for individuals
with $x_i = 0$, who serve as a reference cell. The function $\phi(x_i)$ is the relative risk: a proportionate
increase or reduction in risk, associated with the set of characteristics $x_i$. Note that the increase
or reduction in risk is the same at all durations t. Usually $\phi(x_i)$ is chosen to be equal to $e^{\beta^T x_i}$,
since this ensures $\phi(x_i) > 0$. Furthermore it makes the coefficients easy to interpret: suppose
the jth regressor $x_j$ increases by one unit, while the other regressors are unchanged. Then:
\[ \lambda(t \mid x_{\text{new}}) = \lambda_0(t) e^{\beta^T x + \beta_j} = e^{\beta_j} \lambda(t \mid x). \]
Thus the new hazard is $e^{\beta_j}$ times the original hazard.
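The interpretation of a coefficient as a multiplicative effect on the hazard can be illustrated with a short computation. The coefficients and covariate values below are invented for the example and are not estimates from this thesis; because the baseline hazard cancels in the ratio, it does not need to be specified.

```python
import math


def relative_risk(beta, x):
    """exp(beta^T x): the factor by which the baseline hazard is multiplied."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))


# Hypothetical coefficients for a dummy variable and for age.
beta = [0.5, -0.02]
x = [1.0, 43.0]
x_new = [1.0, 44.0]  # the second regressor (age) increased by one unit

ratio = relative_risk(beta, x_new) / relative_risk(beta, x)
print(ratio, math.exp(beta[1]))
```

Both printed numbers equal $e^{-0.02} \approx 0.98$: in this made-up example one extra year of age lowers the hazard by about 2% at every duration t.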
Besides these, there are other reasons for considering this model:
1. There is a simple and easily understood interpretation to the idea that the effect of a
variable (say, treatment) is to multiply the hazard by a constant factor.
2. In some fields there is empirical evidence to support the assumption of proportional hazards
in distinct treatment groups.
3. Within this formulation, censoring and the occurrence of several types of failure are relatively
easily accommodated.
4. It is possible to incorporate time-varying covariates with relative ease.
Therefore the following formulation of the proportional hazards model will be used:
\[ \lambda(t \mid x_i) = \lambda_0(t) e^{\beta^T x_i}. \tag{4.8} \]
The baseline function, $\lambda_0(t)$, is an unspecified function, which makes the Cox model semi-parametric.
Further we note that all hazard functions $\lambda(t \mid x)$ of this form are proportional
to the baseline hazard, with scale factor $e^{\beta^T x_i}$, which is not an explicit function of t.
4.4.1 PH assumption
The proportional hazards assumption requires that the hazard ratio is constant over time, or
equivalently, that the hazard for one individual is proportional to the hazard for any other
individual, where the proportionality constant is independent of time.
Let $\tilde{x}$ and $x$ denote the sets of predictors for two different individuals. Then we can write the
hazard ratio as
\[ HR = \frac{\lambda(t \mid \tilde{x})}{\lambda(t \mid x)} = \exp\left( \sum_{i=1}^k \beta_i (\tilde{x}_i - x_i) \right). \]
So the proportional hazards assumption can be stated as:
\[ \exp\left( \sum_{i=1}^k \beta_i (\tilde{x}_i - x_i) \right) = \text{constant}, \]
or equivalently
\[ \sum_{i=1}^k \beta_i (\tilde{x}_i - x_i) = \text{constant}. \tag{4.9} \]
David Cox observed that if the proportional hazards assumption holds (or is assumed to hold) it
is possible to estimate the effect parameter(s) without any consideration of the baseline hazard
function. We can integrate both sides of (4.8) from 0 to t to obtain the cumulative hazards
\[ \Lambda(t \mid x_i) = \Lambda_0(t) e^{\beta^T x_i}, \]
which are also proportional. By changing the signs and exponentiating we obtain the survivor
functions
\[ S(t \mid x_i) = S_0(t)^{e^{\beta^T x_i}}, \tag{4.10} \]
where $S_0(t) = e^{-\Lambda_0(t)}$ is the baseline survivor function. Thus, the effect of the covariate value $x_i$
on the survivor function is to raise it to a power given by the relative risk $e^{\beta^T x_i}$.
The most important question now arising is how to check the proportional hazards assumption.
As a general rule one can take: if two hazards cross, the PH assumption is not met, so the Cox
PH model is inappropriate. But what if the hazards do not cross? In that case there are two
general approaches to verify the PH assumption: graphically or by means of a goodness of fit
(GOF) test. The GOF approach is more appealing than the graphical one, since it provides a single
test statistic for each variable and is not as subjective as the graphical approach. Nevertheless,
a GOF test may be too global, in that it may not detect specific departures from the PH
assumption that may be observed in the graphical way. We will now explain the two methods
in detail.
1. Graphical approach
There are two types of graphical techniques available. The most popular of these involves comparing estimated log-log survivor curves over different categories of variables. A log-log survival curve is a transformation of an estimated survival curve that results from taking the natural log of an estimated survival probability twice, with a sign change in between because $\ln S$ is negative. Taking logs in (4.10) gives $-\ln S(t|x) = e^{\beta^T x}\,(-\ln S_0(t))$, and taking logs once more it follows that:
\[ \ln(-\ln S(t|x)) = \beta^T x + \ln(-\ln S_0(t)) = \sum_{j=1}^{k} \beta_j x_j + \ln(-\ln S_0(t)). \]
Now suppose we consider two different individuals and let $\tilde{x}$ and $x$ denote the sets of predictors for these two. Subtracting the second log-log function from the first yields:
\[ \ln(-\ln S(t|\tilde{x})) - \ln(-\ln S(t|x)) = \sum_{j=1}^{k} \beta_j (\tilde{x}_j - x_j). \]
According to (4.9) this should be constant if the PH assumption is satisfied. Hence we can conclude that if the log-log survival curves of two individuals are parallel over time, they satisfy the proportional hazards assumption.
An alternative graphical approach is to compare the observed with the predicted survivor curves. The observed curves are derived for categories of the variable being checked, without putting this variable in the PH model. The predicted curves are derived with this variable included in the model. If both curves are close, then the PH assumption is satisfied. This method is the graphical analog of the goodness-of-fit testing approach we will describe later and is therefore a reasonable alternative to the first graphical approach. In particular, the GOF test uses the same observed and expected survival probability estimates that are used to obtain the observed and expected plots.
The first step is to stratify the data by categories of the predictor that is being verified. Then the observed plots are drawn by deriving the Kaplan-Meier curves for each category separately. To obtain expected plots, we fit a Cox PH model containing the predictor. We then substitute the value for each category into the formula for the estimated survival curve. This results in a separate estimated survival curve for each category. Finally, the plots are compared by putting both sets on the same graph. When the observed and expected plots are close to one another, it can be concluded that the PH assumption is satisfied. If, however, one or more categories show quite discrepant plots, we conclude that the PH assumption is violated.
An obvious drawback of both graphical approaches is deciding how parallel or close the two graphs should be. This is a subjective decision. Most often a conservative strategy is used, meaning that the PH assumption is assumed to be satisfied unless there is strong evidence that the two curves are nonparallel or strongly discrepant.
2. Goodness-of-fit tests
In contrast to the graphical approach, the goodness-of-fit testing method provides a test statistic or equivalent p-value for assessing the PH assumption for a given predictor. Hence it is possible to make a more clear-cut and evidence-based decision. A number of different tests for assessing the PH assumption have been proposed in the literature. The test proposed by Schoenfeld ([29]) is the most common in practice. Instead of a single residual for each individual, there are separate residuals for each individual, for each covariate. However, the Schoenfeld residuals are not defined for censored individuals.
The Schoenfeld residual vector is calculated on a per event time basis as:
\[ S_i(t) = x_i - \bar{x}_i, \]
where $\bar{x}_i$ is a weighted average of the covariates over the risk set at time $t$ and is given by:
\[ \bar{x}_i = \frac{\sum_{j=1}^{n} x_j(t)\, Y_j(t) \exp(\beta^T x_j)}{\sum_{j=1}^{n} Y_j(t) \exp(\beta^T x_j)}. \]
The implementation of the test can be performed in three steps.
Step 1. Fit a Cox PH model and obtain the Schoenfeld residuals for each covariate and individual.
Step 2. Create a variable that ranks the order of failures. The subject who has the first event gets a value of 1, the next gets a value of 2, and so on.
Step 3. Test the correlation between the variables created in steps 1 and 2. The null hypothesis is that there is no correlation between the Schoenfeld residuals and ranked failure time.
The idea behind this statistical test is that if the PH assumption holds for a particular variable, then the Schoenfeld residuals for that covariate will not be related to survival time. A nonsignificant (i.e., large) p-value suggests that the PH assumption is reasonable, whereas a small p-value, say less than 0.05, suggests that the independent variable being tested does not satisfy this assumption.
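The three steps can be sketched in a few lines for a single covariate. This is an illustrative sketch under simplifying assumptions: the data are hypothetical, there are no tied event times, and the coefficient is simply passed in (in practice it would come from the fitted Cox model of step 1). The helper names are made up for this example.

```python
import math

def schoenfeld_residuals(times, events, x, beta):
    """Residual x_i - xbar_i at each event time, for one covariate.

    times: observed durations; events: 1 = event, 0 = censored;
    beta: coefficient value (assumed to come from a fitted Cox model).
    """
    residuals = []
    for i in sorted(range(len(times)), key=lambda s: times[s]):
        if events[i] == 0:
            continue  # residuals are not defined for censored spells
        at_risk = [j for j in range(len(times)) if times[j] >= times[i]]
        weights = [math.exp(beta * x[j]) for j in at_risk]
        xbar = sum(w * x[j] for w, j in zip(weights, at_risk)) / sum(weights)
        residuals.append(x[i] - xbar)
    return residuals

def pearson(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))

# Hypothetical toy data: four spells, one of them censored.
times, events, x = [2, 3, 5, 7], [1, 1, 0, 1], [1.0, 0.0, 1.0, 0.0]
res = schoenfeld_residuals(times, events, x, beta=0.0)   # step 1 (beta assumed known)
ranks = list(range(1, len(res) + 1))                     # step 2: rank of each failure
rho = pearson(res, ranks)                                # step 3 tests whether rho differs from 0
print(res, rho)
```

The sketch stops at the correlation coefficient; the p-value of step 3 would come from a standard correlation test in a statistics package.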
4.4.2 Partial Likelihood
Cox proposed a method to estimate $\beta$ in the PH model without a simultaneous estimation of the baseline hazard function $\lambda_0(t)$. However, if desired, an estimate of the baseline hazard can be recovered after estimation of $\beta$. The likelihood function for the proportional hazards model can be factored into two parts: one that depends on both $\lambda_0(t)$ and $\beta$, and one that depends on $\beta$ alone. Partial likelihood discards the first part and treats the second part, the partial likelihood function, as if it were an ordinary likelihood function. Because there is some information about $\beta$ in the discarded portion of the likelihood function, the resulting estimates are not fully efficient: their standard errors are larger than they would be if we had used the entire likelihood function to obtain the estimates. In most cases, however, the loss of efficiency is quite small and in return we gain robustness, because the estimates have good properties regardless of the actual shape of the baseline hazard function. Partial likelihood estimates still have two of the three standard properties of ML estimates: they are consistent and asymptotically normal [11].
In this section we will discuss the basics of partial likelihood. We will assume that the failure times of our observations are ordered and that we have a partition of observations into two groups: those who make a transition to another state, and those who are at risk. Let $t_1 < ... < t_i < ... < t_k$ denote the ordered discrete failure times of the spells in the sample of size $n$ ($n \geq k$). We define:
• The risk set $R(t_i) = \{j : t_j \geq t_i\}$: the set of individuals who are at risk of failing just before the $i$th ordered failure time. It includes all spells that are not yet completed or censored.
• The death set $D_i = D(t_i) = \{j : t_i = t_j\}$: the set of subjects that die at time $t_i$.
• The risk score $r_i(\beta) = e^{\beta^T x_i} = r_i$, for subject $i$.
Furthermore, we recall the indicator function:
\[ Y_j(t_i) = \begin{cases} 1 & \text{if } j \in R(t_i) \\ 0 & \text{otherwise} \end{cases} \]
We now consider the probability that a particular spell ends at time $t_i$. First we compute the probability that spell $i$ is the actual spell that ends.
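\begin{align*}
P(T_i = t_i \mid i \in R(t_i)) &= \frac{P(T_i = t_i \mid T_i \geq t_i)}{\sum_{j \in R(t_i)} P(T_j = t_i \mid T_j \geq t_i)} = \frac{\lambda(t_i|x_i)}{\sum_{j \in R(t_i)} \lambda(t_i|x_j)} \\
&= \frac{e^{\beta^T x_i}}{\sum_{j \in R(t_i)} e^{\beta^T x_j}} \tag{4.11} \\
&= \frac{r_i}{\sum_j Y_j(t_i)\, r_j} \tag{4.12}
\end{align*}
where in (4.11) the baseline hazard $\lambda_0(t)$ has dropped out, as a consequence of the PH assumption.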
The partial likelihood function is now defined to be the joint product of $P(T_i = t_i \mid i \in R(t_i))$ over the $k$ ordered failure times:
\[ PL(\beta) = \prod_{i=1}^{k} \frac{r_i}{\sum_j Y_j(t_i)\, r_j}. \tag{4.13} \]
The partial log-likelihood is then given by
\begin{align*}
\ln(PL(\beta)) &= \sum_{i=1}^{k} \Big[ \ln r_i - \ln\Big( \sum_j Y_j(t_i)\, r_j \Big) \Big] \\
&= \sum_{i=1}^{k} \Big[ \beta^T x_i - \ln\Big( \sum_j Y_j(t_i)\, r_j \Big) \Big]. \tag{4.14}
\end{align*}
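To make the partial log-likelihood concrete, the sketch below evaluates it for one covariate on a hypothetical toy sample without tied event times. The function name and the data are illustrative assumptions; an actual estimation would maximise this function over $\beta$ with a numerical optimiser.

```python
import math

def partial_log_likelihood(times, events, x, beta):
    """ln PL(beta) for one covariate and untied event times."""
    r = [math.exp(beta * xi) for xi in x]            # risk scores r_i
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue                                 # censored spells only enter via the risk sets
        # sum over the risk set R(t_i), i.e. sum_j Y_j(t_i) r_j
        denom = sum(r[j] for j in range(len(times)) if times[j] >= t_i)
        total += beta * x[i] - math.log(denom)
    return total

# Hypothetical toy sample: three spells, the last one censored.
times, events, x = [1.0, 2.0, 3.0], [1, 1, 0], [1.0, 0.0, 0.0]
ll_at_zero = partial_log_likelihood(times, events, x, beta=0.0)
print(ll_at_zero)   # at beta = 0 every risk score is 1, so this is -ln(3) - ln(2)
```

Note that the baseline hazard never appears in the computation, which is exactly the point of the partial likelihood.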
Equations (4.13) and (4.14) are derived by assuming continuous survival times and do not allow for tied events. This assumption is doubtful because survival times, in our case recovery times, are measured in discrete time units and there will be ties in the survival times. One option to deal with discrete time data is to use a discrete survival model, for example the logistic model which will be discussed in chapter 5. Another option would be to adjust equation (4.12). We will discuss three ways this can be done. For each method we give the contribution to the partial likelihood function, $L_i(\beta)$. The partial likelihood is then given by $PL(\beta) = \prod_{i=1}^{k} L_i(\beta)$.
The exact method assumes that the survival time has a continuous distribution and that the tied survival times are in fact different. The exact likelihood contribution is given by
\[ L_i(\beta) = \frac{\prod_{k \in D_i} r_k}{\sum_{q \in Q_i} r_q}, \tag{4.15} \]
where $Q_i$ is the set of all $d_i$-tuples that could be selected from $R(t_i)$, with $d_i = |D_i|$, and $r_q$ is the product of $r_j$ for all members $j$ of the $d_i$-tuple $q$. We notice that if there are $k$ individuals with tied failure times, there are $k!$ possible orderings of these failures to account for. As one can imagine, this becomes quite complicated with many tied values. The partial likelihood of (4.15) can be expressed as an integral (using integration by parts, see [14]):
\[ \int_0^\infty \prod_{i=1}^{k} \Big[ 1 - \exp\Big( -\frac{r_i\, t}{\sum_j Y_j(t_i)\, r_j} \Big) \Big] e^{-t}\, dt. \]
Due to the computational difficulty of this method, two other methods have been proposed to approximate the exact partial likelihood. A standard one, due to Breslow (1974), is the Breslow approximation, which states
\[ L_i(\beta) = \frac{\prod_{k \in D_i} r_k}{\Big[ \sum_j Y_j(t_i)\, r_j \Big]^{|D_i|}}. \tag{4.16} \]
This approximation works well if the number of failures at time $t_i$ is small relative to the number at risk. If this is not the case, however, it lacks accuracy. Therefore Efron (1977) proposed the Efron approximation of the contribution to the partial likelihood, in which the tied subjects each have a share¹:
\[ L_i(\beta) = \frac{\prod_{k \in D_i} r_k}{\prod_{k=1}^{|D_i|} \Big[ \sum_j Y_j(t_i)\, r_j - \frac{k-1}{|D_i|} \sum_{j \in D_i} r_j \Big]}. \tag{4.17} \]
One can check that in the absence of ties these approximations give the same partial likelihood as (4.13).
We will end this section with an example to illustrate the three methods.
Example 4. Suppose we consider four subjects for which we measure the time until an event, where the first two subjects have an event measured at exactly the same time. Without any knowledge of the true ordering of the survival times of subjects 1 and 2, we have to consider all possible orderings, of which there are 2! = 2. If subject 1 fails before 2, the contribution to the partial likelihood is given by:
\[ \Big( \frac{r_1}{r_1 + r_2 + r_3 + r_4} \Big) \Big( \frac{r_2}{r_2 + r_3 + r_4} \Big). \]
A similar term arises if 2 fails before 1. The exact likelihood contribution is the sum of these two possibilities (averaging them instead only rescales the contribution by the constant $1/2!$, which does not affect the estimate of $\beta$):
\begin{align*}
L_i(\beta) &= \Big( \frac{r_1}{r_1 + r_2 + r_3 + r_4} \Big) \Big( \frac{r_2}{r_2 + r_3 + r_4} \Big) + \Big( \frac{r_2}{r_1 + r_2 + r_3 + r_4} \Big) \Big( \frac{r_1}{r_1 + r_3 + r_4} \Big) \\
&= \frac{r_1 r_2}{(r_1 + r_2 + r_3 + r_4)(r_2 + r_3 + r_4)} + \frac{r_1 r_2}{(r_1 + r_2 + r_3 + r_4)(r_1 + r_3 + r_4)}.
\end{align*}
Using (4.16), we see that the Breslow contribution is equal to:
\[ L_i(\beta) = \frac{r_1 r_2}{(r_1 + r_2 + r_3 + r_4)^2}. \]
Finally, the Efron approximation is given by:
\[ L_i(\beta) = \frac{r_1 r_2}{(r_1 + r_2 + r_3 + r_4)\big( \tfrac{1}{2} r_1 + \tfrac{1}{2} r_2 + r_3 + r_4 \big)}. \]
¹ Since there are a lot of ties in our data set, we will use the Efron approximation in our estimations.
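The three treatments of ties in Example 4 can be compared numerically. The sketch below is illustrative only: the risk scores are hypothetical, the dictionary `r` is assumed to hold exactly the subjects at risk at the tied event time, and the "exact" function enumerates and sums all orderings of the tied subjects (dividing by 2! gives their average).

```python
from itertools import permutations
from math import prod

def exact(r, tied):
    """Sum over all orderings of the tied subjects, as enumerated in Example 4."""
    total = 0.0
    for order in permutations(tied):
        remaining = sum(r.values())
        term = 1.0
        for k in order:
            term *= r[k] / remaining
            remaining -= r[k]            # subject k leaves the risk set
        total += term
    return total

def breslow(r, tied):
    """Breslow contribution: every tied subject faces the full risk set."""
    return prod(r[k] for k in tied) / sum(r.values()) ** len(tied)

def efron(r, tied):
    """Efron contribution: the tied subjects are removed from the risk set in equal shares."""
    risk, tied_sum, d = sum(r.values()), sum(r[k] for k in tied), len(tied)
    denom = prod(risk - (k / d) * tied_sum for k in range(d))
    return prod(r[k] for k in tied) / denom

r = {1: 1.5, 2: 0.5, 3: 1.0, 4: 2.0}   # hypothetical risk scores r_1, ..., r_4
tied = [1, 2]                           # subjects 1 and 2 fail at the same time
print(exact(r, tied) / 2, efron(r, tied), breslow(r, tied))
```

For these values the Efron contribution lies between the Breslow contribution and the averaged exact one, which is consistent with its use when ties are frequent.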
4.5 Time-varying covariates
All preceding results have been restricted to models where regressors are variables that vary across individuals but, for a given individual, not over time. In our case, however, we will consider variables related to the business cycle, which do vary over time. It would be incorrect to treat these variables as if they were fixed, since the entire history of the covariate over the spell may be relevant. There are two different types of time-dependent variables:
• An external time-dependent covariate is one whose path is generated without any influence of the individual.
• An internal time-dependent covariate is one where the change of the covariate over time is related to the behavior of the individual.
Suppose we have a vector of time-dependent covariates, which for the $i$th individual in our sample is denoted by $x_i(t) = (x_{i1}(t), ..., x_{iq}(t))^T$, corresponding to the value of these covariates at time $t$. Note that this notation allows us to use time-independent variables as well. For example, if the $j$th variable is time-independent, we have that $x_{ij}(t) = x_{ij}$ for all $t$.
We now let $x_i^H(t)$ denote the history of the vector of time-dependent covariates up to time $t$:
\[ x_i^H(t) = \{ x_i(s),\ 0 \leq s \leq t \}. \]
We can now define the hazard rate at time $t$ conditional on this history by:
\[ \lambda(t|x_i^H(t)) = \lim_{\Delta t \to 0} \frac{P(t \leq T < t + \Delta t \mid T \geq t,\ x_i^H(t))}{\Delta t}. \]
This is the instantaneous rate of failure at time $t$, given that the individual was at risk at time $t$ and has history $x_i^H(t)$. For such a conditional hazard rate, we can consider a proportional hazards model
\[ \lambda(t|x_i^H(t)) = \lambda_0(t)\, e^{\beta^T g(x_i^H(t))}, \]
where $g(x_i^H(t))$ is a vector of functions of the history.
A note has to be made when interpreting hazard rates with time-dependent variables, since the hazard function may not necessarily be used to construct survival distributions. For example, if we have a time-independent variable $x$, then the conditional survival distribution
\[ S(t|x) = P(T \geq t \mid x) = \exp\Big( -\int_0^t \lambda(s|x)\, ds \Big) \]
is well-defined and meaningful. However, the distribution
\[ S(t|x^H(t)) = P(T \geq t \mid x^H(t)) \]
may not make any sense, since $x^H(t)$ can only be measured for an individual who is still alive at time $t$.
The estimation of the regression parameters in the model, as well as the underlying cumulative hazard functions, does not create additional difficulties. That is, we can use the theory developed so far for time-independent covariates with only slight modification. Recall the definition of the partial likelihood (4.13). What matters at each failure time $t_j$ is the value of the regressors $x_i^H(t_j)$ for those observations in the risk set $R(t_j)$. Thus for the $i$th subject, the time-independent vector $x_i$ is replaced by the function of time-dependent covariates $g(x_i^H(t_j))$. The partial likelihood changes accordingly.
4.5.1 Episode splitting
One important way of dealing with time-dependent covariates in practice is by means of episode splitting. Here, every time a covariate changes its value during an episode, the episode is split up at that point in time, resulting in two new episodes. In our case, the variables related to the business cycle are measured on a monthly basis, so the recovery process is split up several times accordingly. Suppose individual $i$ is disabled for $k$ months. For each month he is receiving benefit payments, the corresponding values of the business cycle variables are known. By means of episode splitting this claim is split into $k$ different observations, each covering a period of 1 month together with the value of the variables for that specific calendar month. So each of these episodes will look like a complete case, in the sense that it carries all the necessary variables. However, there is one important difference: if an episode has been split, the first part of the former episode has to be considered as censored, since the event (if any) will occur only in the last sub-episode.
Note that this procedure, even though it looks as if we have increased the number of cases in the data, does not invalidate the statistical inference. This is because the real units of event history analysis are not individuals but individuals × time, and the time span covered by the claims does not change at all by episode splitting.
Example 5. Suppose we consider a claim which started in state 1 at $t = 0$, which corresponds to the calendar date 04-2005. The health status of the claimant remains unchanged for two months. Then he recovers and the claim ends. Further suppose there are two variables, the time-independent covariate age and the time-dependent covariate producer confidence. For this particular spell the data can be written as a three-line record, rather than a one-line record, as follows:

ID   Time   Age   Prod. Conf.   Transition   State   Previous   Date
1    1      42    0.6           0            1       NA         04-2005
1    2      42    0.1           0            1       NA         05-2005
1    3      42    1.4           1            0       1          06-2005

Table 4.1: Episode split observation.
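The splitting of Example 5 can be sketched as a small function that turns one claim into monthly records. This is an illustrative sketch, not the procedure actually used for the data set: the function name, the field names and the dictionary layout of the monthly covariate are assumptions, and the covariate values are the ones printed in Table 4.1.

```python
def split_episode(claim_id, start_year, start_month, n_months, age,
                  prod_conf, origin_state, end_state=None):
    """Split one claim into monthly records; only the last record can carry the event.

    prod_conf maps (year, month) to the monthly covariate value.
    end_state=None means the claim is still open (right-censored).
    """
    records = []
    for k in range(n_months):
        year = start_year + (start_month - 1 + k) // 12
        month = (start_month - 1 + k) % 12 + 1
        event = (k == n_months - 1) and end_state is not None
        records.append({
            "id": claim_id, "time": k + 1, "age": age,
            "prod_conf": prod_conf[(year, month)],
            "transition": int(event),
            "state": end_state if event else origin_state,
            "previous": origin_state if event else None,
            "date": f"{month:02d}-{year}",
        })
    return records

# Covariate values as printed in Table 4.1.
prod_conf = {(2005, 4): 0.6, (2005, 5): 0.1, (2005, 6): 1.4}
rows = split_episode(1, 2005, 4, 3, 42, prod_conf, origin_state=1, end_state=0)
for row in rows:
    print(row)
```

All sub-episodes except the last have `transition` equal to 0, which is how the censoring of the earlier parts of the spell is encoded.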
4.6 Competing risks
Until now we have only considered transitions from one state to another. However, if a client is in state 1, he can jump either to state 0 or to state 2. Therefore the possibility of exit to one of several destination states has to be considered. For this purpose we will use a so-called competing risks model (CRM) with the latent approach. This model is applicable to modeling time in one state when exit is to a number of competing states. It is attractive because it is relatively straightforward to implement in a PH model. In the latent approach, a survival analysis is performed separately for each event type, where other competing events are treated as right-censored. Separate hazards are thus estimated for each failure type.
We will only treat the case with two competing risks, since an insured can jump to two different states, depending on the state he is staying in. However, it is relatively easy to generalize this model to the situation with $m$ competing risks. The setup of the model is as follows. Each claim has an underlying failure time, which is subject to censoring. A failure may be one of 2 different types, given by the set $\{1, 2\}$. We may think of this as a situation with two distinct causes of transition from a given state. However, the occurrence of a failure of one kind removes the individual from the risk of other kinds of events. Therefore, given censoring of the remaining duration for each individual, we observe at most one complete duration.
In a competing risks model with 2 types of failure, there are 3 states $\{0, 1, 2\}$, where 0 represents the initial state and $\{1, 2\}$ are possible destination states. The model provides the joint distribution of the spell duration $T$ and the exit route $R$, which is an indicator random variable that takes one of the values in the set $\{1, 2\}$. We define:
• $\lambda_1(t)$ is the latent hazard rate of exit to destination 1, with survival times characterized by the density function $f_1(t)$ and latent failure time given by the random variable $T_1$:
\[ \lambda_1(t) = \lim_{\Delta t \to 0} \frac{P(t \leq T < t + \Delta t,\ R = 1 \mid T \geq t)}{\Delta t} \]
• $\lambda_2(t)$ is the latent hazard rate of exit to destination 2, with survival times characterized by the density function $f_2(t)$ and latent failure time given by the random variable $T_2$:
\[ \lambda_2(t) = \lim_{\Delta t \to 0} \frac{P(t \leq T < t + \Delta t,\ R = 2 \mid T \geq t)}{\Delta t} \]
• $\lambda(t)$: the hazard rate for exit to one of the two destinations.
Each destination-specific hazard rate can be thought of as the hazard rate that would apply if transition to the other state was not possible. If this were the case, we would be able to link the observed hazards with the destination-specific hazard. However, since there are competing risks, the hazard rates are latent rather than observed in this way. What we observe in the data is either no event at all (a censored case, with spell length $T_{ci}$) or an exit to state 1 or 2. The observed failure time is $T_i = \min\{T_{1i}, T_{2i}, T_{ci}\}$ and the corresponding exit route is given by $r = \arg\min\{T_{1i}, T_{2i}, T_{ci}\}$. We now define destination-specific censoring indicators:
\[ \delta_i^1 = \begin{cases} 1 & \text{if } i \text{ exits to 1} \\ 0 & \text{otherwise} \end{cases} \tag{4.18} \]
\[ \delta_i^2 = \begin{cases} 1 & \text{if } i \text{ exits to 2} \\ 0 & \text{otherwise} \end{cases} \tag{4.19} \]
Now, for each individual $i$ we have a vector of the form $(x_i, T_i, \delta_i^1, \delta_i^2)$. Sometimes it can be useful to include a censoring indicator $\delta_i^c$ which equals 1 if the spell is censored (for example because of withdrawal from the study), and 0 otherwise. However, we note that $\delta_i^c = 1 - \delta_i^1 - \delta_i^2$, so all information is already captured in the vector.
Now our goal is to develop a method to estimate the destination-specific hazard rates. First, we assume that the two risks, with hazards $\lambda_1(t)$ and $\lambda_2(t)$, are independent. This implies that
\[ \lambda(t) = \lambda_1(t) + \lambda_2(t). \]
Given that failure occurs at time $t$, the conditional probability that the failure is of type $i = 1, 2$ is $\lambda_i(t)/\lambda(t)$, hence the marginal probability that the failure is of type $i = 1, 2$ is
\[ P(T = T_i) = \int_0^\infty \lambda_i(t) \exp\Big( -\int_0^t \lambda(z)\, dz \Big) dt. \]
Independence also means that the survivor function for exit can be factored into a product of destination-specific survivor functions:
\begin{align*}
S(t) &= \exp\Big( -\int_0^t \lambda(s)\, ds \Big) = \exp\Big( -\int_0^t (\lambda_1(s) + \lambda_2(s))\, ds \Big) \\
&= \exp\Big( -\int_0^t \lambda_1(s)\, ds \Big) \exp\Big( -\int_0^t \lambda_2(s)\, ds \Big) = S_1(t) S_2(t).
\end{align*}
Now we consider the likelihood in this independent competing risks model with two destinations. The individual sample likelihood contribution is of three types:
1. Exit to 1: $L^1 = f_1(T) S_2(T)$. This summarizes the chances of a transition to 1 combined with no transition to 2.
2. Exit to 2: $L^2 = f_2(T) S_1(T)$. This summarizes the chances of a transition to 2 combined with no transition to 1.
3. Censored spell: $L^c = S(T) = S_1(T) S_2(T)$.
By using the destination-specific censoring indicators (4.18) and (4.19), we obtain that the overall contribution from individual $i$ to the likelihood $L$ is given by:
\begin{align*}
L_i &= (L_i^1)^{\delta_i^1} (L_i^2)^{\delta_i^2} (L_i^c)^{1 - \delta_i^1 - \delta_i^2} \\
&= [f_1(T_i) S_2(T_i)]^{\delta_i^1}\, [f_2(T_i) S_1(T_i)]^{\delta_i^2}\, [S_1(T_i) S_2(T_i)]^{1 - \delta_i^1 - \delta_i^2} \\
&= \Big[ \frac{f_1(T_i)}{S_1(T_i)} \Big]^{\delta_i^1} S_1(T_i) \Big[ \frac{f_2(T_i)}{S_2(T_i)} \Big]^{\delta_i^2} S_2(T_i) \\
&= \Big\{ [\lambda_1(T_i)]^{\delta_i^1} S_1(T_i) \Big\} \Big\{ [\lambda_2(T_i)]^{\delta_i^2} S_2(T_i) \Big\}.
\end{align*}
Or, equivalently:
\[ \ln L_i = \Big[ \delta_i^1 \ln \lambda_1(T_i) + \ln S_1(T_i) \Big] + \Big[ \delta_i^2 \ln \lambda_2(T_i) + \ln S_2(T_i) \Big]. \]
Now the log-likelihood for the whole sample equals the sum of this expression over all individuals. In other words, the log-likelihood separates into two parts, each of which depends only on parameters specific to that destination. Hence one can maximize the overall log-likelihood by maximizing the two components separately. These results generalize straightforwardly to the situation with more than two independent competing risks. This means that the model can be estimated very easily. One should define new destination-specific censoring variables as above and then estimate separate models for each transition.
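The recipe of the last sentence can be sketched as follows. To keep the example self-contained it assumes constant latent hazards (an exponential model) rather than a Cox model, so that each part of the log-likelihood, $\sum_i [\delta_i \ln \lambda - \lambda T_i]$, has the closed-form maximiser "number of exits divided by total exposure". The data and function names are hypothetical.

```python
def destination_indicators(exit_route):
    """Map exit routes (1, 2, or None for a censored spell) to (delta1, delta2, delta_c)."""
    d1 = [int(route == 1) for route in exit_route]
    d2 = [int(route == 2) for route in exit_route]
    dc = [1 - a - b for a, b in zip(d1, d2)]
    return d1, d2, dc

def constant_hazard_mle(durations, delta):
    """Maximiser of sum(delta * ln(h) - h * T): number of exits divided by total exposure."""
    return sum(delta) / sum(durations)

# Hypothetical spells: duration in months and exit route (None = censored).
durations = [2.0, 5.0, 3.0, 10.0]
exit_route = [1, 2, 1, None]

d1, d2, dc = destination_indicators(exit_route)
h1 = constant_hazard_mle(durations, d1)   # exits to 2 are treated as censored here
h2 = constant_hazard_mle(durations, d2)   # exits to 1 are treated as censored here
print(d1, d2, dc, h1, h2)
```

Each hazard is estimated on all spells, with only the censoring indicator changing between the two fits, which mirrors the separation of the log-likelihood.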
4.7 Unobserved heterogeneity: Frailty models
In the PH model, it is implicitly assumed that a homogeneous population is studied. This means that all individuals sampled into the study are in principle similar and only differ on certain covariates. In many applications, however, it is impossible to measure all relevant covariates related to the disease of interest. The frailty approach is a statistical modeling concept which aims to account for heterogeneity caused by such unmeasured covariates. Frailty models are an extension of the proportional hazards model.
We start this section by illustrating the importance of identifying unobserved heterogeneity in survival analysis. Suppose the aggregated hazard rate out of disability is known to be a declining function of the length of the spell. If all individuals were identical then this would imply negative duration dependence, that is, a falling probability of recovery the longer an individual is disabled. However, suppose that the population consists of two different groups of equal size: type F (fast) and S (slow), where F has a higher hazard rate than S. As a result, group F has a higher risk of failure than group S. This implies that the proportion of type F individuals among those still in the sample declines over time. So the hazard rate in the total population will appear to fall over time, despite the fact that the hazard for both groups remains constant. It can be concluded that unobserved heterogeneity may give the appearance of a total decline in hazard rate, even when the individual rates are constant. As a result, the estimated hazard rate in models that do not allow for unobserved heterogeneity can become biased towards negative duration dependence.
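The fast/slow illustration can be checked numerically. In the sketch below both groups have a constant hazard, and the hazard of the mixed population is the average of the two hazards weighted by each group's share among those still disabled. The hazard values are hypothetical.

```python
import math

def population_hazard(t, h_fast, h_slow, share_fast=0.5):
    """Hazard among those still disabled at t when each group has a constant hazard."""
    surv_fast = share_fast * math.exp(-h_fast * t)            # type F still disabled
    surv_slow = (1.0 - share_fast) * math.exp(-h_slow * t)    # type S still disabled
    return (h_fast * surv_fast + h_slow * surv_slow) / (surv_fast + surv_slow)

# Hypothetical monthly recovery hazards for the two groups.
path = [population_hazard(t, h_fast=0.5, h_slow=0.1) for t in (0, 3, 6, 12, 24)]
print([round(h, 4) for h in path])
```

The printed sequence starts at the average of the two hazards and falls towards the hazard of the slow group, although no individual hazard changes over time.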
To allow for unobserved heterogeneity we can add a frailty term, or random effect, $\nu$ to the proportional hazards model (4.8). This random effect can be interpreted as a function of unobserved explanatory variables such as risk aversion, motivation to recover, lifestyle or education. We will use the specification that the frailty term $v = e^{\nu}$ operates multiplicatively on the hazard rate. Hence the hazard rate can be written as:
\[ \lambda(t|x_i, v) = \lambda_0(t)\, e^{\beta^T x_i + \nu} = \lambda(t|x_i)\, v. \]
Because of the definition of the frailty term, there is no information available about $v$, which implies that we should make assumptions on the individual values. First of all it is assumed that $v$ is independent of time and of the vector of covariates $x$. Furthermore, $v$ should have unit mean (a normalization required for identification) and finite variance $\sigma^2 > 0$. Last, since the hazard rate cannot be negative, the distribution of $v$ has to be chosen from a class of positive distributions. The most frequently used frailty distribution is the gamma distribution. This is convenient from a computational and analytical point of view, because it is easy to derive closed-form expressions for the survival, density and hazard functions. This is due to the simplicity of the Laplace transform. For $z > 0$ the gamma density is given by:
\[ f(z; \alpha, n) = \frac{\alpha^n}{\Gamma(n)}\, z^{n-1} e^{-\alpha z}, \]
with $E[z] = n/\alpha$ and $\mathrm{Var}(z) = n/\alpha^2$. Normalization sets $n = \alpha$, so that $E[z] = 1$ and $\mathrm{Var}(z) = 1/\alpha$.
A PH model with frailty term is better known as a mixed proportional hazard (MPH) model. It can be shown that the relationship between the frailty survival function and the survival function without frailty is given by:
\[ S(t|x, v) = [S(t|x)]^{v}. \]
Thus the individual effect $v$ scales the survival function in such a way that individuals with above-average frailty leave relatively fast, while the opposite occurs for individuals with below-average values of $v$.
Another advantage of adding a frailty term is the fact that it can help to solve the problem of multiple spells. Suppose that person $i$ has $c(i)$ different claims, each with its own frailty term $v_i$. Conditional on the covariates $x_i$ and $v_i$ the individual hazard function $\lambda(t|x_i, v_i)$ is the same for all these spells. The likelihood contribution of person $i$ is then given by:
\[ \int_0^\infty \cdots \int_0^\infty \prod_{j=1}^{c(i)} S(t_i^j|x_i^j, v_i^j)\, \lambda(t_i^j|x_i^j, v_i^j)^{d_i}\ dG(v_i^1, ..., v_i^{c(i)}), \tag{4.20} \]
where $t_i^j$ is the length of the $j$-th duration of individual $i$ and $G(v_i^1, ..., v_i^{c(i)})$ is a joint distribution, see [9].
However, if it is assumed that an individual has a shared frailty term for different spells, the individual likelihood contribution in (4.20) reduces to:
\[ \int_0^\infty \prod_{j=1}^{c(i)} S(t_i^j|x_i^j, v_i)\, \lambda(t_i^j|x_i^j, v_i)^{d_i}\ dG(v_i), \]
where $G(v_i)$ is a particular distribution, for example the gamma distribution.
4.8 Goodness of fit
One way to assess if a Cox PH model is adequately specified is via the diagnostic approach of the Cox-Snell residuals, which are defined by
\[ (r_{CS})_i = \exp(\hat{\beta}^T x_i)\, \hat{\Lambda}_0(t), \]
where $\hat{\Lambda}_0(t)$ is the estimated cumulative baseline hazard function at time $t$ and is defined as
\[ \hat{\Lambda}_0(t) = \sum_{i=1}^{n} \int_0^t \frac{dN_i(s)}{\sum_{j=1}^{n} Y_j(s) \exp(\hat{\beta}^T x_j(s))}. \]
If the fitted model is correct and $\hat{\beta}$ is close to the true value of $\beta$, then the $(r_{CS})_i$'s should be a plausible sample of observations from a unit exponential distribution. Thus, a plot of Cox-Snell residuals versus observations or time will not lead to a symmetric display [1].
Another goodness-of-fit statistic can be formed by first obtaining the partial likelihood estimate of $r_i = \exp(\beta^T x_i)$, which is $\hat{r}_i = \exp(\hat{\beta}^T x_i)$. Then the subjects are grouped into regions based on the percentiles of $\hat{r}_i$, which are called percentiles of risk. Abeysekera and Sooriyarachchi [1] suggest forming $G$ regions of approximately equal size, so that the first group contains the $n/G$ subjects with the smallest $\hat{r}_i$'s, and the last group contains the $n/G$ subjects with the largest $\hat{r}_i$'s. In general, this classification groups together subjects that have similar risks of death at any given time. For $g = 1, ..., G-1$ group indicators are defined by
\[ I_{ig} = \begin{cases} 1 & \text{if } \hat{r}_i \text{ is in region } g \\ 0 & \text{otherwise} \end{cases} \]
In order to assess the goodness of fit of the PH model $\lambda(t|x_i) = \lambda_0(t) \exp(\beta^T x_i)$ we consider the alternative Cox model
\[ \lambda(t|x_i) = \lambda_0(t) \exp\Big( \beta^T x_i + \sum_{g=1}^{G-1} \gamma_g I_{ig} \Big). \tag{4.21} \]
If the PH model is correctly specified, one should have $\gamma_g = 0$ for all $g$. To test the goodness of fit of the PH model versus the alternative (4.21) one can use the likelihood ratio, Wald, or score statistic to test the null hypothesis $H_0: \gamma_1 = \gamma_2 = ... = \gamma_{G-1} = 0$. If the PH model has been correctly specified, each of these statistics should have an approximate chi-squared distribution with $(G-1)$ degrees of freedom.
Chapter 5
Binary regression
In the previous chapter it was assumed that the survival process was continuous and that change could occur anywhere during the time interval. In many cases, however, measurements of time are imprecise. Many researchers rely on event history data that are collected at particular intervals in panel studies. This is also the case with the disability data in our study, which were collected on a monthly basis. This means that we do not know the exact timing of the event, but rather the interval in which it occurred. For this reason, this type of data is often referred to as interval-censored. To analyse time-to-event data when survival times are grouped into discrete intervals of time, discrete-time models were developed.
As in the previous chapter, the focus is on how the hazard function $\lambda(t)$ depends upon covariates. In both continuous and discrete-time models, the risk of the event occurring at time $t$ is being modeled. Whereas the dependent variable in a continuous-time model is the hazard rate, in a discrete-time model it is the odds. Although the dependent variable might at first glance appear to be different, the approximation is close and it actually conveys the same information [27].
The central difference with discrete-time models is that the discrete-time hazard function is the probability of an event occurring during interval $(t-1, t]$, conditional on the fact that the event did not occur before this interval. For $T$ denoting the event time, this is written as:
\[ \lambda(t) = P(T \in (t-1, t] \mid T > t-1). \]
The probability of surviving interval $(t-1, t]$, given survival up to its start, is $P(T > t \mid T > t-1) = 1 - \lambda(t)$, so the survival function is represented by:
\[ S(t) = P(T > t) = \prod_{s=1}^{t} (1 - \lambda(s)), \]
which is the probability that an event did not occur up to and including time $t$. The cumulative distribution function, which measures the probability that an event occurs by time $t$ (the probability of failure), is written as:
\[ F(t) = P(T \leq t) = 1 - S(t). \]
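A short numerical sketch may help to connect the discrete-time hazard with the survival and failure probabilities. It assumes that the survival function is the probability of no event in any interval up to and including $t$, so that it is the running product of the per-interval survival probabilities; the monthly hazards are hypothetical.

```python
def discrete_survival(hazards):
    """Running product of (1 - hazard(s)) over intervals s = 1, 2, ..., t."""
    surv, out = 1.0, []
    for h in hazards:
        surv *= 1.0 - h      # survive one more interval
        out.append(surv)
    return out

hazards = [0.2, 0.5, 0.5]            # hypothetical monthly recovery probabilities
S = discrete_survival(hazards)       # probability of still being disabled after month t
F = [1.0 - s for s in S]             # probability of having recovered by month t
print(S, F)
```

With these values 80% of the claims are still open after the first month, 40% after the second and 20% after the third.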
Considering the fact that our discrete-time data is in the form of a binary variable (transition or no transition), it is possible to estimate commonly used binary models, which will be discussed in this chapter. A basic knowledge of linear regression is presumed. For a brief introduction one could consider [23] or [35].
Binary analysis is in many ways the complement of ordinary linear regression whenever the dependent variable is not a continuous variable but a state which may or may not hold, or a
category in a given classification. When such discrete variables occur among the regressors, they are dealt with by the introduction of one or several (0, 1) dummy variables. However, when the dependent variable belongs to this type, the regression model breaks down. Binary analysis then provides a good alternative.
Suppose we consider $m$ individuals which correspond to $m$ independent observations $y_1, ..., y_m$. The $i$th observation can be treated as a realization of the random variable $Y_i$, with a Bernoulli distribution with success probability $p_i$. Hence
\[ Y_i = \begin{cases} 1 & \text{with probability } p_i \\ 0 & \text{with probability } 1 - p_i \end{cases} \]
We now assume that for each trial i there is a set of explanatory variables that might inuence
p
i
= P(y
i
= 1). These variables can be thought of as a k-dimensional vector x
i
= (x
i1
, ..., x
ik
)
T
.
The regression model defined by
\[
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i,
\]
is called the linear probability model. Here $\beta_0$ is called the intercept and $\beta_1, \beta_2, \dots$ the regression
coefficients of $x_{i1}, x_{i2}, \dots$ respectively. The intercept is the expected value of $y_i$ when all the independent
variables are equal to 0, hence it is the value of someone without risk factors. Each of the
regression coefficients describes the size of the contribution of the corresponding risk factor. A
positive value means that the variable increases the probability of the outcome, whereas a negative
one means that this risk factor decreases the probability of the outcome. The larger the absolute value,
the greater the influence. Finally, $\varepsilon_i$ is the error term or the disturbance. It contains factors
other than $x_{i1}, \dots, x_{ik}$ that affect $y_i$. The key assumption for every regression model is:
\[
E[\varepsilon_i \mid x_{i1}, \dots, x_{ik}] := E[\varepsilon_i \mid x_i] = 0 \tag{5.1}
\]
This requires that all factors in the unobserved error term are uncorrelated with the explanatory
variables. It also means that we have correctly accounted for the functional relationships between
$y_i$ and the regression coefficients.
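If (5.1) is fulfilled, we obtain:
\[
E[y_i \mid x_i] = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} = \beta^T x_i,
\]
where we use that $x_{i0} = 1$ and $\beta = (\beta_0, \dots, \beta_k)^T$. Since the variable $y_i$ can only take the values 0
and 1, we have
\begin{align}
p_i &= P(y_i = 1 \mid x_i) \tag{5.2} \\
    &= 1 \cdot P(y_i = 1 \mid x_i) + 0 \cdot P(y_i = 0 \mid x_i) \notag \\
    &= E[y_i \mid x_i] \tag{5.3} \\
    &= \beta^T x_i. \tag{5.4}
\end{align}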
The linear probability model has several disadvantages. For example, it places implicit
restrictions on the parameters, as (5.4) requires that $0 \le \beta^T x_i \le 1$ for all $i = 1, \dots, m$. Furthermore,
the error terms $\varepsilon_i$ are not normally distributed, which is caused by the distribution of $y_i$. Since
$\varepsilon_i = y_i - \beta^T x_i$ it follows that $\varepsilon_i$ is a random variable with discrete distribution given by:
\[
\varepsilon_i = \begin{cases} 1 - \beta^T x_i & \text{with probability } \beta^T x_i, \\ -\beta^T x_i & \text{with probability } 1 - \beta^T x_i. \end{cases}
\]
Hence the distribution of $\varepsilon_i$ depends on $x_i$. Now the conventional ordinary least squares (OLS)
formula for the standard errors does not apply. Further, if the OLS estimates $\hat{\beta}$ are used to
compute the estimated probabilities $\hat{P}(y_i = 1) = \hat{\beta}^T x_i$, then this may result in values smaller
than zero or larger than one. So in that case we cannot talk about real probabilities.
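The out-of-range problem of the linear probability model can be illustrated with a minimal sketch. The data below are hypothetical (a single regressor with $y_i = 1$ exactly when $x_i > 0$) and are not taken from the thesis data set; the names `x`, `y`, `X` and `fitted` are chosen for illustration only.

```python
import numpy as np

# Hypothetical toy data: one regressor, outcome equals 1 when x > 0.
x = np.linspace(-3.0, 3.0, 13)
y = (x > 0).astype(float)

# Design matrix with a leading column of ones (the intercept, x_{i0} = 1).
X = np.column_stack([np.ones_like(x), x])

# OLS estimate of beta in the linear probability model.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted "probabilities" beta_hat^T x_i for every observation.
fitted = X @ beta_hat

print("beta_hat:", beta_hat)
print("smallest fitted value:", fitted.min())
print("largest fitted value:", fitted.max())
```

For these toy data the fitted values at the extremes of `x` fall below zero and above one, so they cannot be read as probabilities.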
However, the probabilities can be confined to values between zero and one by using a non-linear
model. Let F be a function with range [0, 1] and let
\[
p_i = P(y_i = 1 \mid x_i) = F(\beta^T x_i) \tag{5.5}
\]
The choice of such a function F corresponds to assuming a specific distribution for the unobserved
individual effects. Often F is chosen to be a cumulative distribution function, because in that
case we have a monotonically non-decreasing function. This has the advantage that positive
coefficients $\beta_j$ correspond to positive effects on the success probability and vice versa. In the
following section we will discuss the most common choice for F, resulting in the logit model.
There are also other possible choices, resulting in the probit model for example. However, the
results of these models turn out to be very similar [17].
5.1 The logit model
The logit model arises if F is chosen to be the logistic function.
Definition 7. The logistic function of $y \in \mathbb{R}$ is given by:
\[
F(y) = \Lambda(y) = \frac{1}{1 + e^{-y}} = \frac{e^{y}}{1 + e^{y}}.
\]
Figure 5.1: The logistic function
The logistic function is the cdf of the logistic distribution, where the density function is given
by:
\[
f(y) = \lambda(y) = \frac{e^{y}}{(1 + e^{y})^{2}}.
\]
The main interest lies in determining the marginal effect of a change in a regressor on the
conditional probability that y = 1. For a change in the jth regressor, which is assumed to be
continuous, this equals (use (5.5)):
\begin{align*}
\frac{\partial P(y_i = 1 \mid x_i)}{\partial x_{ij}} &= \lambda(\beta^T x_i)\,\beta_j \\
 &= \Lambda(\beta^T x_i)\,[1 - \Lambda(\beta^T x_i)]\,\beta_j \\
 &= p_i (1 - p_i)\,\beta_j.
\end{align*}
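As a small numerical check of this marginal effect, the sketch below compares the analytic expression $p_i(1 - p_i)\beta_j$ with a central finite difference. The coefficient vector and covariates are hypothetical values chosen only for illustration.

```python
import numpy as np

def logistic(z):
    """Logistic cdf: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients (beta_0, beta_1, beta_2) and covariates with x_{i0} = 1.
beta = np.array([-0.5, 0.8, -0.3])
x_i = np.array([1.0, 1.2, 0.4])
j = 1  # regressor whose marginal effect we inspect

# Analytic marginal effect p_i (1 - p_i) beta_j.
p_i = logistic(beta @ x_i)
analytic = p_i * (1.0 - p_i) * beta[j]

# Central finite difference of P(y_i = 1 | x_i) with respect to x_{ij}.
h = 1e-6
x_up = x_i.copy()
x_up[j] += h
x_down = x_i.copy()
x_down[j] -= h
numeric = (logistic(beta @ x_up) - logistic(beta @ x_down)) / (2.0 * h)

print("analytic:", analytic, "numeric:", numeric)
```

Because $p_i(1 - p_i) \le 1/4$, the marginal effect on the probability is at most a quarter of the coefficient in absolute value, and it is largest when $p_i = 1/2$.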
A very common interpretation of the coefficients is in terms of marginal effects on the odds ratio
rather than on the probability. We have:
\begin{align*}
p_i = \Lambda(\beta^T x_i) &= \frac{e^{\beta^T x_i}}{1 + e^{\beta^T x_i}} \\
\Rightarrow \quad \frac{p_i}{1 - p_i} &= e^{\beta^T x_i} \\
\Rightarrow \quad \ln \frac{p_i}{1 - p_i} &= \beta^T x_i.
\end{align*}
Here $p_i/(1 - p_i)$ measures the probability that $y_i = 1$ relative to the probability that $y_i = 0$ and
is called the odds ratio or relative risk. We notice that for the logit model the log-odds ratio is
linear in the regressors.
The function $g(p) = \ln \frac{p}{1 - p}$ is called the logit, which explains the name of the corresponding
model.
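The linearity of the log-odds implies that a one-unit increase in a regressor multiplies $p_i/(1 - p_i)$ by $e^{\beta_j}$, whatever the starting value. A minimal sketch with hypothetical values for the intercept, slope and covariate:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

# Hypothetical intercept, slope and covariate value.
beta_0, beta_1 = -1.0, 0.4
x = 2.0

p_before = logistic(beta_0 + beta_1 * x)
p_after = logistic(beta_0 + beta_1 * (x + 1.0))  # covariate raised by one unit

odds_before = p_before / (1.0 - p_before)
odds_after = p_after / (1.0 - p_after)

# The ratio of the two odds equals exp(beta_1), independent of x.
ratio = odds_after / odds_before
print("ratio of odds:", ratio, "exp(beta_1):", math.exp(beta_1))
```

This is why the exponent of a coefficient is usually reported next to the coefficient itself.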
5.1.1 Likelihood function
The preferred method of estimation in probability models is maximum likelihood, since this
permits the estimation of the parameters of almost any analytical specification of the probability
function. Furthermore, it yields estimates that are consistent and asymptotically efficient,
together with estimates of their asymptotic covariance matrix.
Suppose we have $i = 1, \dots, n$ observations on the occurrence of a certain event, denoted by the
(0, 1) variable $y_i$, and a number of covariates which are arranged in the vector $x_i = (x_{i1}, \dots, x_{ik})^T$.
The density function of $y_i$ is given by:
\[
f(y_i, x_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}, \qquad y_i \in \{0, 1\},
\]
where $p_i = \Lambda(\beta^T x_i)$. This implies that the sample density of a vector y of zeros and ones is
written as:
\[
f(y, x) = \prod_{i=1}^{n} f(y_i, x_i).
\]
Hence, the log-likelihood is given by:
\begin{align*}
l(\beta) &= \ln \prod_{i=1}^{n} f(y_i, x_i) \\
 &= \sum_{i=1}^{n} [y_i \ln(p_i) + (1 - y_i) \ln(1 - p_i)] \\
 &= \sum_{i=1}^{n} [y_i \ln \Lambda(\beta^T x_i) + (1 - y_i) \ln(1 - \Lambda(\beta^T x_i))].
\end{align*}
Note that the actual ordering of the observations does not affect their density nor the (log-)likelihood.
This is because the observations are independent, so the ordering is arbitrary.
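When differentiating with respect to $\beta$, it follows that the maximum likelihood estimator $\hat{\beta}$
solves:
\begin{align*}
0 &= \sum_{i=1}^{n} \left[ \frac{y_i}{\Lambda(\beta^T x_i)} \lambda(\beta^T x_i) x_i - \frac{1 - y_i}{1 - \Lambda(\beta^T x_i)} \lambda(\beta^T x_i) x_i \right] \\
 &= \sum_{i=1}^{n} \frac{y_i - \Lambda(\beta^T x_i)}{\Lambda(\beta^T x_i)(1 - \Lambda(\beta^T x_i))} \lambda(\beta^T x_i) x_i \\
 &= \sum_{i=1}^{n} (y_i - \Lambda(\beta^T x_i)) x_i.
\end{align*}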
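These first-order conditions have no closed-form solution, so they are solved iteratively. The sketch below uses Newton-Raphson on simulated data; it is an illustration under stated assumptions (the simulated sample, the helper `fit_logit` and the true coefficients are all hypothetical), not the estimation routine used for the results in this thesis.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(X, y, n_iter=25, tol=1e-10):
    """Solve sum_i (y_i - Lambda(beta^T x_i)) x_i = 0 by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = logistic(X @ beta)
        score = X.T @ (y - p)                            # gradient of the log-likelihood
        hessian = -(X * (p * (1.0 - p))[:, None]).T @ X  # second derivative matrix
        step = np.linalg.solve(hessian, score)
        beta = beta - step                               # Newton update
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated (hypothetical) data: intercept plus one standard normal regressor.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([-0.5, 1.0])
y = (rng.uniform(size=n) < logistic(X @ true_beta)).astype(float)

beta_hat = fit_logit(X, y)
score_at_mle = X.T @ (y - logistic(X @ beta_hat))
print("beta_hat:", beta_hat)
print("score at beta_hat:", score_at_mle)
```

At the returned estimate the score is numerically zero, which is exactly the condition derived above.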
5.2 Panel data
Panel or longitudinal data provide information on individual behavior both across time and
across individuals. Panel data are obtained by initially selecting a sample S and then collecting
observations for a sequence of time periods, $t = 1, \dots, T$. This produces a sequence of data vectors
$\{w_1, \dots, w_T\}$ that is used to make inferences about either the behavior of the population or the
behavior of the particular sample of data drawn from a non-stationary population. A major
advantage of panel data is the increased precision in estimation. This is the result of an increase
in the number of observations owing to combining - also called pooling - several time periods of
data for each individual. In this section we will discuss linear panel data models, which later on
will be adapted to panel data models for logistic regression.
5.2.1 Linear panel models
A very general linear model for panel data permits the intercept and slope coefficients to vary
over both individual and time:
\[
y_{it} = \alpha_{it} + \beta_{it}^T x_{it} + \varepsilon_{it},
\]
where i indexes individuals in a cross section and t indexes time. Like before, $y_{it}$ is a scalar
dependent variable, $x_{it}$ a $(k \times 1)$ vector of independent variables and $\varepsilon_{it}$ an error term. However,
this model is too general and it is not estimable since there are more parameters to estimate than
there are observations. As a consequence, further restrictions need to be placed on the extent to
which $\alpha_{it}$ and $\beta_{it}$ vary with i and t, and on the behavior of the error term $\varepsilon_{it}$.
The most restrictive model is a pooled model that specifies constant coefficients, which is also
the usual assumption for cross-section analysis:
\[
y_{it} = \alpha + \beta^T x_{it} + \varepsilon_{it}. \tag{5.6}
\]
If this model is correctly specified and regressors are uncorrelated with the error, then it can be
consistently estimated using pooled OLS.
A simple variant of model (5.6) permits intercepts to vary across individuals and over time, while
the slope parameters do not. Then
\[
y_{it} = \alpha_i + \gamma_t + \beta^T x_{it} + \varepsilon_{it}, \tag{5.7}
\]
where it is assumed that $x_{it}$ does not include an intercept. If there are n individuals and T time
points, this model has $n + (T - 1) + \dim[x]$ parameters. These can be consistently estimated if
both $n \to \infty$ and $T \to \infty$. Since patients are followed for a finite time span, T will not go to $\infty$;
we are in the setting of a short panel, where only n grows. In that setting the $\gamma_t$ can still be
consistently estimated, so the time dummies are simply incorporated
into the regressors $x_{it}$. The challenge then lies in estimating the parameters $\beta$ controlling for
the n individual intercepts $\alpha_i$ [11].
The individual-specific effects model allows each cross-sectional unit to have a different
intercept term:
\[
y_{it} = \alpha_i + \beta^T x_{it} + \varepsilon_{it}, \tag{5.8}
\]
where $\varepsilon_{it}$ is assumed to be iid over both i and t. This is a more parsimonious way to express
model (5.7), with time dummies included in the regressors $x_{it}$. The $\alpha_i$ are random variables that
capture unobserved heterogeneity. Throughout this section it is assumed that the error term has
mean zero conditional on past, current and future values of the regressors:
\[
E[\varepsilon_{it} \mid \alpha_i, x_{i1}, \dots, x_{iT}] = 0, \qquad t = 1, \dots, T \tag{5.9}
\]
There are two variants of the individual-specific effects model:
• The fixed effects (FE) model: $\alpha_i$ is treated as an unobserved random variable that is
  potentially correlated with the observed regressors $x_{it}$.
• The random effects (RE) model: The unobservable individual effects $\alpha_i$ are assumed to
  be random variables that are distributed independently of the regressors. Usually additional
  assumptions are made:
  \[
  \alpha_i \sim [\alpha, \sigma_\alpha^2]; \qquad \varepsilon_{it} \sim [0, \sigma_\varepsilon^2],
  \]
  so that both the random effects and the error term are assumed to be iid. Note that no
  specific distributions have been specified.
Because of (5.9) both models assume that
\[
E[y_{it} \mid \alpha_i, x_{it}] = \alpha_i + \beta^T x_{it}.
\]
The individual-specific effect $\alpha_i$ is unknown and in short panels cannot be consistently estimated,
so we cannot estimate $E[y_{it} \mid \alpha_i, x_{it}]$. Instead, we can eliminate $\alpha_i$ by taking the expectation
conditional only on $x_{it}$, resulting in
\[
E[y_{it} \mid x_{it}] = E[\alpha_i \mid x_{it}] + \beta^T x_{it}.
\]
For the RE model it is assumed that $E[\alpha_i \mid x_{it}] = \alpha$, so $E[y_{it} \mid x_{it}] = \alpha + \beta^T x_{it}$. In the FE
model, however, $E[\alpha_i \mid x_{it}]$ varies with $x_{it}$ and it is not known how. Therefore we cannot identify
$E[y_{it} \mid x_{it}]$. It is nonetheless possible to consistently estimate $\beta$ in the FE model with short panels
and to identify the marginal effect
\[
\beta = \frac{\partial E[y_{it} \mid \alpha_i, x_{it}]}{\partial x_{it}},
\]
even though the conditional mean is not identified. However, a drawback is that identification
of the marginal effects is only permitted for time-varying regressors. In contrast to this, the
RE model has the advantage of permitting consistent estimation of all parameters, including
coefficients of time-invariant regressors [11].
In our study, the individual-specific term is used to capture unobserved characteristics such as lifestyle,
willingness to recover and education. It is reasonable to assume that these characteristics are
uncorrelated with the known regressors. In the following section we will therefore only consider
the random effects model.
5.2.2 Binary panel models
The natural extension of the binary outcome model from cross-section data (as discussed in 5.1)
to panel data with individual-specific effects is to specify that $y_{it}$ only takes the values 0 and 1,
with
\begin{align*}
P(y_{it} = 1 \mid x_{it}, \alpha_i) &= \Lambda(\alpha_i + \beta^T x_{it}) \\
 &= \frac{e^{\alpha_i + \beta^T x_{it}}}{1 + e^{\alpha_i + \beta^T x_{it}}}.
\end{align*}
Using this and assuming conditional independence, the joint density of the ith observation
$y_i = (y_{i1}, \dots, y_{iT})$ is given by
\[
f(y_i \mid x_i, \alpha_i) = \prod_{t=1}^{T} \Lambda(\alpha_i + \beta^T x_{it})^{y_{it}} \, (1 - \Lambda(\alpha_i + \beta^T x_{it}))^{1 - y_{it}}. \tag{5.10}
\]
For binary data, the conditional probability is also the conditional mean (see (5.2)), so
\[
E[y_{it} \mid x_{it}, \alpha_i] = \Lambda(\alpha_i + \beta^T x_{it}).
\]
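The random effects MLE assumes that the individual effects are normally distributed, with
$\alpha_i \sim N(0, \sigma_\alpha^2)$. The random effects MLE of $\beta$ and $\sigma_\alpha^2$ maximizes the log-likelihood
\[
\ln L = \sum_{i=1}^{N} \ln f(y_i \mid x_i, \beta, \sigma_\alpha^2).
\]
Here
\[
f(y_i \mid x_i, \beta, \sigma_\alpha^2) = \int f(y_i \mid x_i, \alpha_i) \, \frac{1}{\sqrt{2\pi\sigma_\alpha^2}} \exp\left( -\frac{\alpha_i^2}{2\sigma_\alpha^2} \right) d\alpha_i, \tag{5.11}
\]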
and $f(y_i \mid x_i, \alpha_i)$ is given by (5.10). There is no closed-form solution for this integral and it is
standard to compute it numerically using quadrature methods [35]. If fixed effects are absent,
an alternative to the RE model is a pooled binary model that simply specifies that
$P(y_{it} = 1 \mid x_{it}) = \Lambda(\beta^T x_{it})$. Statistical inference should then be based on panel-robust standard errors.
More efficient estimation is possible using a generalized estimating equation (GEE) approach,
see Liang and Zeger [26]. This is also the approach used by the statistical package SAS, by
which the estimations are performed.¹
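To make the quadrature step concrete, the following sketch approximates the integral in (5.11) for a single individual with Gauss-Hermite quadrature. It is an illustration under stated assumptions only: the covariates, coefficients, the value of $\sigma_\alpha$ and the helper names are hypothetical, and it is not the SAS routine used for the estimations.

```python
import itertools
import numpy as np
from numpy.polynomial.hermite import hermgauss

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def conditional_density(y_i, X_i, beta, alpha_i):
    """Joint density (5.10) of one individual's sequence, given alpha_i."""
    p = logistic(alpha_i + X_i @ beta)
    return float(np.prod(p ** y_i * (1.0 - p) ** (1.0 - y_i)))

def integrated_density(y_i, X_i, beta, sigma_alpha, n_nodes=30):
    """Approximate the integral (5.11) by Gauss-Hermite quadrature."""
    nodes, weights = hermgauss(n_nodes)
    # The substitution alpha = sqrt(2) * sigma_alpha * z turns the normal
    # integral into the form int exp(-z^2) g(z) dz handled by hermgauss.
    alphas = np.sqrt(2.0) * sigma_alpha * nodes
    values = np.array([conditional_density(y_i, X_i, beta, a) for a in alphas])
    return float(np.sum(weights * values) / np.sqrt(np.pi))

# Hypothetical individual observed for T = 3 periods with k = 2 regressors.
X_i = np.array([[0.5, 1.0], [0.1, 0.0], [-0.3, 1.0]])
beta = np.array([0.8, -0.4])
sigma_alpha = 1.2

# Summing the integrated density over all 2^T outcome sequences should give one.
all_sequences = [np.array(s, dtype=float) for s in itertools.product([0, 1], repeat=3)]
total = sum(integrated_density(y, X_i, beta, sigma_alpha) for y in all_sequences)
print("sum over all sequences:", total)
```

Setting `sigma_alpha` to zero removes the unobserved heterogeneity, in which case the integrated density reduces to the conditional density (5.10) evaluated at $\alpha_i = 0$.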
5.3 Goodness of fit
Measures of model fit and tests of significance for logistic regression are not identical to those
in OLS regression, although they are conceptually related. In the latter, measures of variation in
the form of sums of squares are the building blocks of $R^2$ as well as of tests of significance of
overall prediction and gain in prediction. In logistic regression, measures of deviance replace the
sums of squares of OLS regression as the building blocks of measures of fit and statistical tests.
Each deviance measure is a measure of a lack of fit of the data to a logit model. Two of them
are particularly useful. The first of them is the null deviance, $D_{null} = -2 \ln L_{null}$, which is a
summary number of all the deviance that could potentially be accounted for. It can be thought
of as a measure of lack of fit of data to a model containing an intercept but no predictors. It
provides a baseline against which to compare predictions from other models that contain at
least one predictor. The second is the model deviance from a model containing k predictors,
$D_k = -2 \ln L_k$, which is a summary number of all the deviance that remains to be predicted
after prediction from a set of k predictors. If the model containing k predictors fits better than
a model without predictors, the model deviance should be smaller than the null deviance.
5.3.1 Maximum likelihood theory
As said, measures of goodness of fit and test statistics in logistic regression are constructed
from the deviance measures. Since these are derived from ratios of maximum likelihoods under
different models, the statistical tests built on deviances are referred to as likelihood ratio tests.
In this section we will discuss these kinds of tests together with some other measures of goodness
of fit.
One of the basics of the statistical model of maximum likelihood is the definition of a parameter
space in which the true parameter $\theta$ as well as its maximum likelihood estimator $\hat{\theta}$ must lie.
Nested hypotheses that restrict $\theta$ to a subspace of a wider but still acceptable parameter space
are tested. The restriction may constrain one element of $\theta$ to a particular value (most often
zero), but it may also take a more general form, for example an interval. There are three types
of tests of such restrictions against the wider alternative, namely
• Likelihood ratio tests (LR), based on a comparison of the maximum value of the log
  likelihood with and without restrictions.
• Wald tests, based on a comparison of the restricted values with the asymptotic normal
  distribution of the unrestricted parameter estimates. Under the Wald test, the maximum
  likelihood estimate $\hat{\theta}$ of the parameter of interest is compared with the proposed value $\theta_0$,
  with the assumption that the difference between the two will be approximately normally
  distributed.
• Lagrange multiplier tests (LM), based on a comparison of the score vector of the
  unrestricted model, evaluated at the constrained parameter estimates, with the unrestricted
  value (which is zero). These tests are also known as score tests.
¹ The procedure proc genmod is used with the logistic function as link function.
Any nested hypothesis can be subjected to all three tests. They are asymptotically equivalent,
but they may give different results in any particular instance. We will briefly discuss them here,
following the approach taken by [19].
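Suppose we are testing the null hypothesis that the data y are generated by a joint density
function $f(y, \theta_0)$ against the alternative hypothesis that the data are generated by $f(y, \theta)$, for
$\theta \in \mathbb{R}^k$. The log-likelihood is defined as
\[
L(y, \theta) = \log f(y, \theta).
\]
Further we define the score function as
\[
s(y, \theta) = \frac{\partial L(y, \theta)}{\partial \theta}, \tag{5.12}
\]
and the Fisher Information as
\[
I(\theta) = -E\left[ \frac{\partial^2 L}{\partial \theta \, \partial \theta^T}(\theta) \right]. \tag{5.13}
\]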
The maximum likelihood estimator $\hat{\theta}$ now solves the equation $s(y, \hat{\theta}) = 0$. If the MLE has a normal
distribution, and if $I(\theta)$ is consistently estimated by $I(\hat{\theta})$, then
\[
\xi_W = (\hat{\theta} - \theta_0)^T I(\hat{\theta}) (\hat{\theta} - \theta_0)
\]
will have a $\chi^2$ distribution with k degrees of freedom when the null hypothesis is true. Since the
variance of $\hat{\theta}$ can be calculated as the inverse of the Fisher Information, $\mathrm{var}(\hat{\theta}) = I^{-1}(\theta)$, in the
1-dimensional case this can be reduced to
\[
\xi_W = \frac{(\hat{\theta} - \theta_0)^2}{\mathrm{var}(\hat{\theta})}. \tag{5.14}
\]
This is commonly known as the Wald test.
The likelihood ratio test is based upon the difference between the maximum of the likelihood
under the null and under the alternative hypothesis. The corresponding measure is given by
\begin{align}
\xi_{LR} &= -2[L(y, \theta_0) - L(y, \hat{\theta})] \tag{5.15} \\
 &= D_{null} - D_k. \notag
\end{align}
It can be shown that $\xi_{LR}$ asymptotically has a $\chi^2$ distribution if the null hypothesis is true.
There are several reasons to prefer the likelihood ratio test to the Wald test. One is that the
Wald test can give different answers to the same question, depending on how the question is
phrased. For example, asking whether $\theta = 1$ is the same as asking whether $\log \theta = 0$. However,
the Wald statistic for the first case is not the same as the Wald statistic for the latter one. This
is caused by the fact that there is in general no neat relationship between the standard errors of
$\theta$ and $\log \theta$. On the contrary, likelihood ratio tests will give exactly the same answer whether we
work with $\theta$, $\log \theta$ or any other monotonic transformation of $\theta$. Another reason is that the Wald
test is based on two assumptions (that we know the standard error, and that the distribution is
chi-squared), whereas the likelihood ratio test uses only one assumption (that the distribution is
chi-squared).
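The lack of invariance of the Wald test can be made concrete with a small sketch. As an assumption for illustration we take exponentially distributed durations with rate $\theta$, for which the MLE is the reciprocal of the sample mean, $\mathrm{var}(\hat{\theta}) \approx \hat{\theta}^2/n$ and, by the delta method, $\mathrm{var}(\log \hat{\theta}) \approx 1/n$. The sample size and sample mean below are hypothetical.

```python
import math

n = 50        # hypothetical sample size
x_bar = 0.8   # hypothetical sample mean of the durations

theta_hat = 1.0 / x_bar          # MLE of the exponential rate
phi_hat = math.log(theta_hat)    # MLE of phi = log(theta)

def loglik_theta(theta):
    """Exponential log-likelihood as a function of the rate theta."""
    return n * math.log(theta) - theta * n * x_bar

def loglik_phi(phi):
    """The same log-likelihood, parameterised by phi = log(theta)."""
    return loglik_theta(math.exp(phi))

# Wald statistics for H0: theta = 1 and for the equivalent H0: log(theta) = 0.
wald_theta = (theta_hat - 1.0) ** 2 / (theta_hat ** 2 / n)
wald_phi = (phi_hat - 0.0) ** 2 / (1.0 / n)

# Likelihood ratio statistics in both parameterisations.
lr_theta = -2.0 * (loglik_theta(1.0) - loglik_theta(theta_hat))
lr_phi = -2.0 * (loglik_phi(0.0) - loglik_phi(phi_hat))

print("Wald (theta):", wald_theta, "Wald (log theta):", wald_phi)
print("LR (theta):", lr_theta, "LR (log theta):", lr_phi)
```

The two Wald statistics differ although they test the same hypothesis, whereas the likelihood ratio statistic is the same in both parameterisations.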
Another possible test is the Lagrange multiplier (LM) or score test, which has the advantage
that it can be formulated in situations where the variability is difficult to estimate. The statistic
to test the null hypothesis $H_0 : \theta = \theta_0$ is given by:
\[
\xi_{LM} = \frac{s(y, \theta_0)^2}{I(\theta_0)}, \tag{5.16}
\]
which asymptotically takes a $\chi^2_1$ distribution under the null hypothesis. Alternatively, one can
also test the statistic $\sqrt{\xi_{LM}}$ against a normal distribution. This approach is equivalent and
yields identical results. The main advantage of the Lagrange multiplier test is that it does
not require an estimate of the information under the alternative hypothesis or unconstrained
maximum likelihood. This makes testing feasible even when the maximum likelihood estimate
is a boundary point in the parameter space.
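As a worked example of the three statistics, consider a single Bernoulli probability p with k successes in n trials and $H_0 : p = p_0$. The numbers are hypothetical. For this model the log-likelihood is $k \ln p + (n - k) \ln(1 - p)$, the score is $k/p - (n - k)/(1 - p)$ and the Fisher information is $n / (p(1 - p))$.

```python
import math

n, k = 100, 60   # hypothetical number of trials and successes
p0 = 0.5         # value of p under the null hypothesis
p_hat = k / n    # unrestricted maximum likelihood estimate

def loglik(p):
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def score(p):
    return k / p - (n - k) / (1.0 - p)

def information(p):
    return n / (p * (1.0 - p))

# Wald (5.14): squared distance, scaled by the information at the MLE.
wald = (p_hat - p0) ** 2 * information(p_hat)

# Likelihood ratio (5.15): twice the gap in log-likelihood.
lr = -2.0 * (loglik(p0) - loglik(p_hat))

# Lagrange multiplier (5.16): only quantities evaluated under the null.
lm = score(p0) ** 2 / information(p0)

print("Wald:", wald, "LR:", lr, "LM:", lm)
```

The three values are close but not identical in this finite sample, and only the LM statistic is computed entirely from quantities evaluated at $p_0$.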
The three tests are based on different statistics which measure the distance between $H_0$ and $H_1$.
The Wald test is formulated in terms of $\theta_0 - \hat{\theta}$, the LR test in terms of $L(\theta_0) - L(\hat{\theta})$, and the
LM test in terms of $s(\theta_0)$. This difference is illustrated by figure 5.2.
Figure 5.2: The log-likelihood function plotted against $\theta$ for a particular realization of y, for
k = 1.
The Wald test is based upon the horizontal difference between $\theta_0$ and $\hat{\theta}$, the LR test is based
upon the vertical difference, and the LM test is based on the slope of the likelihood function at
$\theta_0$. Each of these methods is a reasonable measure of the distance between $H_0$ and $H_1$. In fact,
when L is a smooth curve that is well approximated by a quadratic, they all give the same test.
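Theorem 1. If $L(\theta) = b - \frac{1}{2} (\theta - \hat{\theta})^T A (\theta - \hat{\theta})$ where A is a symmetric positive definite matrix,
b is a scalar and $\hat{\theta}$ is a function of the data, then the Wald, LR and LM tests are identical.
Proof. By direct substitution in (5.15) one can see that
\[
\xi_{LR} = (\theta_0 - \hat{\theta})^T A (\theta_0 - \hat{\theta}).
\]
Further, by differentiating it follows that
\[
s(\theta) = \frac{\partial L}{\partial \theta}(\theta) = -A(\theta - \hat{\theta})
\]
and
\[
I = -\frac{\partial^2 L}{\partial \theta \, \partial \theta^T} = A.
\]
Substituting this in the k-dimensional versions of (5.14) and (5.16) results in
\begin{align*}
\xi_W &= (\theta_0 - \hat{\theta})^T A (\theta_0 - \hat{\theta}), \\
\xi_{LM} &= s(\theta_0)^T A^{-1} s(\theta_0) \\
 &= (\theta_0 - \hat{\theta})^T A (\theta_0 - \hat{\theta}).
\end{align*}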
Whenever the true value of $\theta$ is equal to or close to $\theta_0$, the likelihood function in the neighborhood
of $\theta_0$ will be approximately quadratic for large samples. This is the reason for the asymptotic
equivalence of the different likelihood tests.
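Theorem 1 can be checked numerically for an exactly quadratic log-likelihood. The matrix A, the scalar b and the two parameter vectors below are arbitrary hypothetical values.

```python
import numpy as np

# Hypothetical ingredients of a quadratic log-likelihood.
A = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric positive definite
b = 3.0
theta_hat = np.array([1.0, -0.5])
theta_0 = np.array([0.2, 0.1])

def loglik(theta):
    d = theta - theta_hat
    return b - 0.5 * d @ A @ d

def score(theta):
    # Gradient of the quadratic log-likelihood.
    return -A @ (theta - theta_hat)

information = A   # minus the (constant) second derivative matrix

d0 = theta_0 - theta_hat
wald = d0 @ information @ d0
lr = -2.0 * (loglik(theta_0) - loglik(theta_hat))
lm = score(theta_0) @ np.linalg.solve(information, score(theta_0))

print("Wald:", wald, "LR:", lr, "LM:", lm)
```

All three statistics coincide up to rounding error, as the theorem states.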
5.3.2 Pseudo $R^2$ measures
In addition to the likelihood tests, several different $R^2$-like measures have been developed.
These measures are interpreted in a manner similar to the coefficient of determination in multiple
regression. The pseudo $R^2$ value for a logit model can be calculated as
\[
R^2_{logit} = \frac{D_{null} - D_k}{D_{null}}.
\]
Just like the multiple regression counterpart, the value of $R^2_{logit}$ ranges from 0 to 1. As the model
fit increases, the deviance decreases. A perfect fit has a deviance of 0 and an $R^2_{logit}$ of 1.
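The deviances and the pseudo $R^2$ can be computed by hand in a simple case. The sketch below assumes a logit model with an intercept and a single 0/1 risk factor; for such a model the fitted probabilities of the maximum likelihood fit equal the observed proportions in the two groups, so no iterative estimation is needed. The counts are hypothetical.

```python
import math

# Hypothetical grouped data for one 0/1 risk factor:
# value of the dummy -> (number of claimants, number with y = 1).
groups = {0: (40, 10), 1: (60, 36)}

def bernoulli_loglik(successes, trials, p):
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

n_total = sum(n for n, _ in groups.values())
k_total = sum(k for _, k in groups.values())

# Null model (intercept only): the fitted probability is the overall proportion.
loglik_null = bernoulli_loglik(k_total, n_total, k_total / n_total)

# Model with the dummy: the fitted probabilities are the group proportions.
loglik_model = sum(bernoulli_loglik(k, n, k / n) for n, k in groups.values())

d_null = -2.0 * loglik_null
d_model = -2.0 * loglik_model
lr_statistic = d_null - d_model            # likelihood ratio statistic (5.15)
pseudo_r2 = (d_null - d_model) / d_null    # pseudo R^2 for the logit model

print("D_null:", d_null, "D_k:", d_model)
print("LR statistic:", lr_statistic, "pseudo R^2:", pseudo_r2)
```

The model deviance is smaller than the null deviance, and the pseudo $R^2$ is the share of the null deviance that the risk factor accounts for.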
Chapter 6
Results
In this chapter the effects of the risk factors on the different transitions will be discussed, based
on the survival and logit model discussed in chapter 4 and 5 respectively.
6.1 Data set
The results in this section are obtained by fitting separate models for each transition. This
is done based on an expanded data set, obtained from the original data set, containing the
durations of the transitions made by the claimants. To correctly account for competing risks, for
each claimant with transition $i \to j$ ($i \neq j$) a row is added for the other (unobserved) transition
that is possible from the initial state. This additional row contains the same information as the
original row; only the event variable is different and equals zero, since the added transition does
not actually take place. Since we also applied the method of episode splitting, see section 4.5.1,
there are a lot of left-truncated durations in our data, hence we have to characterize each duration
by a start and stop variable. For the censored transitions we do not observe the end state.
We deal with this by including two rows in the augmented data set, which correspond
to the two possible transitions, with event indicator equal to zero. See [27] for a more extensive
explanation about preparing data sets for (discrete) survival analysis. We have chosen to lag
the time-dependent covariates corresponding to the business cycle by one month. By lagging a
variable we ensure that the cause precedes the effect. If the variable that supposedly causes
the effect is measured at the same time as the effect, the cause-effect logic
is lost and will not appear in the results. In this case, lagging ensures that changes in
the time-dependent variable precede the actual event. In addition, in this way we also
take into account that changes in the disability percentages have to be communicated to and
administered by the insurance company, which can take some time.
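The two data preparation steps described above can be sketched as follows. This is only an illustration of the idea, not the code used to build the data set: the function names, the dictionary layout of a row and the example values are hypothetical, and the states are coded as 0 (healthy), 1 (mild disability) and 2 (severe disability).

```python
# Possible destination states from each disability state.
DESTINATIONS = {1: (0, 2), 2: (0, 1)}

def expand_spell(claim_id, start, stop, from_state, to_state):
    """Return one row per possible transition out of from_state.

    to_state is the observed destination, or None for a censored spell,
    in which case both rows get event indicator zero.
    """
    rows = []
    for destination in DESTINATIONS[from_state]:
        rows.append({
            "claim": claim_id,
            "start": start,
            "stop": stop,
            "from": from_state,
            "to": destination,
            "event": int(destination == to_state),
        })
    return rows

def lag_by_one_month(series):
    """Map each month to the covariate value observed one month earlier."""
    return {month: series[month - 1] for month in series if (month - 1) in series}

# Observed transition 1 -> 0 after 45 days, and a censored spell in state 2.
observed_rows = expand_spell(claim_id=7, start=0, stop=45, from_state=1, to_state=0)
censored_rows = expand_spell(claim_id=8, start=0, stop=30, from_state=2, to_state=None)

# Hypothetical monthly business cycle indicator, indexed by month number.
indicator = {1: 10.0, 2: 11.0, 3: 12.0}
lagged_indicator = lag_by_one_month(indicator)

print(observed_rows)
print(censored_rows)
print(lagged_indicator)
```

Each spell thus contributes two rows, at most one of which carries an event, and the covariate value attached to a month is the one observed in the preceding month.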
6.2 Model without the business cycle
Figure 6.1: Kaplan-Meier estimates of transition-specific survival functions.
On the vertical axis, figure 6.1 shows the proportion of people entering a specific state at time
0 (on a person-specific clock) who are still in that state after some days. Calendar time is
ignored in this figure, meaning that the different start dates of different clients play no role in
the construction of the figure. As expected, the function starts at one and monotonically declines
to zero, indicating that all people should eventually jump to another state. We observe that a
transition from state 2 to 0 takes place in the first months. After this period the probability of a
full recovery from a severe disorder is minimal. On the other hand, for the transition from state
1 to state 2 it is noted that even after two years, there is still a significant probability of falling
back. The probabilities of the other two transitions, which reflect a small improvement in health
status, are distributed almost equally.
Before the influence of the economy could be addressed, the model without business cycle related
variables had to be estimated. In this model (further referred to as the original model) all other
possible covariates, as discussed in table 3.2, were included. Then, using the Information Criteria
discussed in section D.1.2, insignificant variables were deleted based on stepwise exclusion. The
results of this estimation, together with the exponent of the coefficient and the results of the
significance test, can be found in Appendix A and B. Here a mark for a variable x is used to
denote that this covariate is not included in the model. The estimated coefficients are
asymptotically normally distributed and are tested against the null hypothesis $H_0 : \beta = 0$. In this section
the effects of the included risks on the different transition probabilities will be discussed. Since
the results of the survival and logit model are very similar, we will restrict the discussion to the
survival model. First we note that the variable for compensation benefit level is insignificant for
all transitions. This contradicts the findings of Spierdijk and Koning [32], but is in concordance
with Amelink [7].
Transition $1 \to 2$ (see table A.1)
A transition from state 1 to 2 means that a claimant falls back from a mild disability to a more
severe illness. The coefficient of age is insignificant and this regressor is not incorporated in
the final model. Therefore it can be concluded that the fall-back rate of an older self-employed person
is not significantly lower or higher than the fall-back rate of a younger one. At a
5% significance level it can be said that females have a 14% higher fall-back rate than males.
Almost all dummies for occupational class are significant. Working in construction is an indicator
for a higher fall-back rate, while working in the (para)medical or agricultural sector, or in a
shop, indicates a lower fall-back rate. Most of the disorder type dummies have an insignificant
coefficient. Only the ones corresponding to digestive and psychological disorders or cancer have a
p-value smaller than 0,05. Claimants with a digestive disorder have a 114% higher fall-back rate
and claimants with a psychological disorder a 15% lower one, when compared to the other types of
disorder, whereas the presence of cancer indicates an increase of 104%. Finally it is shown that
both the previous state and the year in which the disability started have no significant influence
on the transition intensity from a mild to a severe disorder. These variables are not included in
the model. The coefficients of the deferment period are not significant either.
We will now consider which risk factors affect the probability of recovery from a mild disability.
First we note that the coefficient for age is negative and significant. This implies that older
claimants recover more slowly. Surprisingly, the dummy variable for female is insignificant and
not included in the model. Hence, recovery from a mild disorder does not depend significantly on the gender of
the claimant. For the dummy variables representing the occupational classes it is noted that a
self-employed person working in service or in the (para)medical sector has approximately a 20% higher
recovery rate, whereas a claimant working in the agricultural sector has a 6% higher chance to
recover than insured working in other professions. It can also be concluded that the recovery
rate is significantly influenced by almost all disorders. A digestive disorder indicates a higher
recovery rate, whereas locomotive or psychological disorders are indicators for slow recovery.
Furthermore, claimants with symptoms, infections, or injuries recover faster than average.
On the other hand, cancer is an indicator for slow recovery. From the significant coefficient for
the previous state it can be concluded that if a claimant just experienced a partial recovery (so
made a transition from state 2 to 1) he has a 52% higher recovery rate than a claimant who was
healthy before entering state 1. All dummy variables for the year in which the disability started
are insignificant, except for the years 2005 and 2006. When compared to the year 2003 it can
be said that these years are an indicator for faster recovery. For the deferment period it can be
said that the longer the period, the lower the recovery rate.
A transition from state 2 to 0 reflects a full recovery from a severe illness. We see that the
coefficients for both age and female are significant and negative. This means that females recover
more slowly than men, and the recovery rate decreases with age. There are two significant
occupational dummies: shopkeeper and construction. The first decreases the probability of
recovery by 12%, whereas the latter causes an increase in recovery of 24%. Most of the
disorder type dummies are significant, except for infections. We see that both locomotive and
psychological disorders slow down the recovery, although the effect of the second one is much
stronger. As with transition $1 \to 0$, insured with a digestive disorder, symptoms
or injuries recover faster. On the other hand, cancer is an indicator for a longer duration in state
2, lowering the recovery rate by 75%. Further it is noticed that, compared to the year
2003, insured who became disabled in 2006, 2008 or 2009 recover more slowly. Again it can be
concluded that the shorter the deferment period, the higher the recovery rate.
The last transition we should discuss is the partial recovery from a severe disorder to a mild
disorder. We see that the significant coefficient of age is negative, so the older the self-employed person,
the lower the recovery rate. Almost all occupational dummies are significant, except the dummy
for service. Working in the agricultural or (para)medical sector, or in a shop, influences the
recovery rate in a positive way, while persons working in construction recover more slowly.
We notice that all disorder dummy variables are significant. Psychological disorder and cancer
slow down the recovery, whereas the other types of disorders increase the recovery rate. From
the significant coefficient for the previous state 1 it can be concluded that a claimant has a lower
recovery rate if he was healthy before entering state 2. In contrast to transition $2 \to 0$, we see
that, when compared to the year 2003, insured who become disabled in later years recover faster.
Like before it can be concluded that the shorter the deferment period, the higher the recovery
rate.
6.2.1 Unobserved heterogeneity
Besides the estimation results for the various risk factors, tables A.1-A.4 also report the frailty
term. It is shown that these are significant for all four transition rates. However, the question
arises whether the results would be very different if this frailty term were omitted.
Therefore we have run the same model without unobserved heterogeneity. It turned out that
there are 7 cases in which a variable was significant in one model and insignificant in the other.
Therefore it can be concluded that adding a frailty term really influences the outcomes of the
model and that removing it would result in different conclusions.
6.2.2 Proportional hazards assumption
After fitting a proportional hazards model, one should test the proportional hazards assumption.
As explained in section 4.4.1 this can be done visually or formally, based on the Schoenfeld
residuals.¹ The results of this Schoenfeld residuals test for the different transitions are given in
Appendix A.1. Note that a large test statistic (and small p-value) suggests no proportionality.
From these results it is concluded that, based on a 5% significance level, in about half of the
cases proportionality does not hold. It now should be investigated how serious these violations
of the PH assumption are. Therefore we have plotted time-dependent coefficient plots of the
different covariates where the proportionality assumption was violated. For transition $1 \to 2$
these are shown in Appendix A.1. We observe that in most cases the deviation outside the
confidence bounds of $\hat{\beta}_j$ is numerically small. In some cases there are relatively large deviations,
for example for the covariate psychological disorder. However, these deviations are mostly at the
end of the time horizon, when there are relatively few observations. Therefore we can conclude
that the results are good enough to use the proportional hazards model.
¹ This is done by using the function cox.zph in R.
6.3 Business cycle
The influence of a variable related to the business cycle was measured by adding it to the original
model and considering the values of the information criteria and the corresponding p-value. If
the BIC of the extended model was lower than that of the original and the p-value was significant,
it was concluded that that specific variable affected the transition.
Survival model
An overview of the coefficients of all variables per transition can be found in tables A.9-A.12.
The significant variables and the exponents of their corresponding coefficients are shown in table
6.1. These values should be interpreted as follows: if, for example, the DNB indicator increases
by 1 point, for the hazard function we have $\lambda^{new}_{12} = 0{,}971 \, \lambda^{old}_{12}$, and for the survival function
it holds that $S^{new}_{12} = (S^{old}_{12})^{0{,}971}$.
Variable      | 1 → 2 | 1 → 0 | 2 → 0 | 2 → 1
DNB indicator | 0,971 | 1,021 |       |
Confidence    | 0,996 | 1,004 |       |
GDP           | 0,971 | 1,013 |       |
Labour market |       |       |       |
Income        |       |       |       |
Table 6.1: Summary of the significant variables per transition together with the exponent of the
estimated coefficient (based on a 5% significance level).
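The relation between a hazard ratio and the survival function used in this interpretation follows from $S(t) = \exp(-\int_0^t \lambda(u)\,du)$: multiplying the hazard by a constant c multiplies the cumulative hazard by c and therefore raises the survival function to the power c. A minimal sketch, using the value 0,971 from table 6.1 and a hypothetical survival probability:

```python
import math

hazard_ratio = 0.971   # exponent of the DNB indicator coefficient, transition 1 -> 2
s_old = 0.60           # hypothetical probability of still being in state 1

# Route 1: scale the cumulative hazard and transform back.
cumulative_hazard_old = -math.log(s_old)
cumulative_hazard_new = hazard_ratio * cumulative_hazard_old
s_new_from_hazard = math.exp(-cumulative_hazard_new)

# Route 2: raise the old survival probability to the power of the hazard ratio.
s_new_from_power = s_old ** hazard_ratio

print("via cumulative hazard:", s_new_from_hazard)
print("via power formula:", s_new_from_power)
```

Both routes give the same value, and because the hazard ratio is below one the new survival probability is slightly higher than the old one, which corresponds to a lower fall-back rate.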
Logit model
The significant variables and the exponents of their corresponding coefficients are shown in table
6.2. An overview of the coefficients of all variables per transition can be found in tables
B.5–B.8.
                 1 → 2    1 → 0    2 → 0    2 → 1
Confidence       0,996    1,004
GDP              0,970    1,013
Labour market
Income
Table 6.2: Summary of the significant variables per transition, together with the exponent of the
estimated coefficient.
The results of both models are again very similar. Therefore, the results of the
logit model will be viewed as a confirmation of the results obtained by applying survival analysis.
We see that the variables labour market and income are insignificant for each transition.
For this reason these two variables are removed from further analysis. In the remainder of this
thesis only the DNB indicator, the confidence and the GDP will be considered.
Furthermore, based on a 5% significance level, the business cycle only affects
the transitions 1 → 2 and 1 → 0. We see that in general a positive business environment is
associated with a higher recovery rate and a lower fall-back rate. The question arises whether this
influence is the same for all professions and disorders. To answer this question, we considered
different subgroups and assessed the influence of the various business cycle indicators. The results
for the different transitions can be found in Appendix A.2.1 and A.2.2. We summarise the results below.
Influence per profession
- The influence of the business cycle on claimants working in construction is significant for
  the fall-back rate and the recovery from a mild disorder. The variable confidence also
  significantly affects full recovery from a severe disorder. However, there are no indications
  of a relation between the business cycle and partial recovery from a severe disorder.
- Insured working in the agricultural sector or as a shopkeeper are not affected by changes
  in the business cycle.
- For claimants in the (para)medical sector the economic environment only influences full
  recovery from a mild disorder. All other transitions are unaffected.
- There is only one significant coefficient for insured working in service: the higher the GDP,
  the lower the fall-back rate, and vice versa.
It can be concluded that the influence of the business cycle is most noticeable for claimants
working in construction. The influence of the economy on the other professions is less evident.
Influence per disorder
- The business cycle clearly affects the recovery process of insured with a locomotive
  disorder, since the variables are significant for all transitions except 2 → 0. Hence, only the
  probability of full recovery from a severe locomotive disorder is unaffected by the economic
  environment.
- There is no significant relationship between the state of the economy and the recovery
  process of claimants with a digestive disorder.
- Insured with cancer or a psychological disorder are hardly affected by the business cycle.
  For both there is only one significant coefficient.
- For insured with disorder type symptoms the economic situation mainly affects one-step
  recovery, i.e. transitions 1 → 0 and 2 → 1. The influence on the fall-back rate is less clear
  and there are no significant coefficients for full recovery from a severe disorder.
- When a claimant suffers from an infection, the business cycle only influences the transition
  1 → 0. The better the economy, the higher the probability of recovery from a mild infection.
- There are indications that the business cycle affects the recovery process of claimants with
  an injury. However, there is no consistent influence on one of the transitions in particular.
Referring back to table 2.3 it can be concluded that the influence of the business cycle is most
evident for claimants with locomotive symptoms. There are also indications of an influence on
claimants with a locomotive injury. On the other hand, there is hardly any influence on insured
with other types of disability.
In almost all cases either all three business cycle-related variables are significant
or all are insignificant. Looking at the graphs of the DNB-indicator, the confidence and the
GDP as shown in appendix C, we see that they follow approximately the same pattern (only on a
different scale). In addition, the variables are strongly interrelated. Therefore
it would be incorrect to add them all to the model; instead, we should pick one. Now the
question is: which variable should be used to measure the business cycle? Different motivations
lead to different answers. For example, the GDP has the advantage of being
objective, since its value is determined by the formula:
GDP = private consumption + gross investment + government spending + (exports - imports).
On the other hand, the DNB-indicator has the benefit of looking 7 months ahead. Therefore
this variable could be used by the insurance company to adjust the premium or develop special
recovery programs at an early stage. Because of this forecasting property the
DNB-indicator is added to our model as the business cycle-related variable.
Chapter 7
Quantifying the influence of the risk factors
In the previous chapter we considered the coefficients and corresponding hazards of the various
variables. These only provide information about the influence of a variable on the transition
rate. However, the insurance company is mainly interested in the influence of the risk factors on
the average recovery time. In this chapter we will calculate the expected duration until recovery,
and we will show how this duration changes if the value of one of the covariates changes. First
we consider the transition probabilities in the multi-state model. These are used to calculate
the so-called taboo probabilities, which are needed to determine the expected recovery period.
Finally, the results of these calculations will be discussed in section 7.2.
7.1 Transition probabilities in the multi-state model
We start this section with a short review of the multi-state model. For an introduction to Markov
chains and transition probabilities we refer to section 3.2.1.
Our multi-state model consists of three states, 0, 1 and 2, so the state space is given by
$\mathcal{S} = \{0, 1, 2\}$. We define $S(t)$ as the random state at time $t$, where $S(t) \in \mathcal{S}$ and $t > 0$.
A transition from state $i$ to state $j$ takes place with intensity $\lambda_{ij}(t) = \lambda_{ij}(t \mid x)$, for
$i, j \in \mathcal{S}$. The transition probabilities are denoted by:
$$P_{ij}(t, s) = P(S(s) = j \mid S(t) = i).$$
The relationship between the transition probabilities and intensities is given by:
$$\lambda_{ij}(t) = \lim_{s \downarrow t} \frac{P_{ij}(t, s)}{s - t}, \qquad i \neq j. \tag{7.1}$$
The transition intensities of our model are known and are given in chapter 6. We would now
like to express the transition probabilities in terms of these transition intensities.
Theorem 2. The transition probabilities and transition intensities satisfy the Kolmogorov
backward differential equations
$$\frac{d}{dt} P_{ij}(t, s) = P_{ij}(t, s) \sum_{l : l \neq i} \lambda_{il}(t) - \sum_{k : k \neq i} P_{kj}(t, s)\, \lambda_{ik}(t), \tag{7.2}$$
and the Kolmogorov forward differential equations
$$\frac{d}{ds} P_{ij}(t, s) = \sum_{k : k \neq j} P_{ik}(t, s)\, \lambda_{kj}(s) - P_{ij}(t, s) \sum_{l : l \neq j} \lambda_{jl}(s). \tag{7.3}$$
Proof. By using the Markov property (definition 3.1) and the law of total probability, we find
that, for $t < u < s$:
$$\begin{aligned}
P(S(t) = i, S(s) = j) &= \sum_{k} P(S(t) = i, S(u) = k, S(s) = j) \\
&= \sum_{k} P(S(s) = j \mid S(u) = k)\, P(S(u) = k \mid S(t) = i)\, P(S(t) = i).
\end{aligned}$$
Dividing both sides by $P(S(t) = i)$ results in
$$P_{ij}(t, s) = \sum_{k} P_{ik}(t, u)\, P_{kj}(u, s), \qquad i, j \in \mathcal{S} \text{ and } s > u > t > 0.$$
These equations are known as the Chapman-Kolmogorov equations. Next we consider:
$$\begin{aligned}
P_{ij}(u, s) - P_{ij}(t, s) &= P_{ij}(u, s) - \sum_{k} P_{ik}(t, u)\, P_{kj}(u, s) \\
&= [1 - P_{ii}(t, u)]\, P_{ij}(u, s) - \sum_{k : k \neq i} P_{ik}(t, u)\, P_{kj}(u, s).
\end{aligned}$$
Dividing by $u - t$ and taking the limit as $u \downarrow t$ results in
$$\begin{aligned}
\lim_{u \downarrow t} \frac{P_{ij}(u, s) - P_{ij}(t, s)}{u - t}
&= \lim_{u \downarrow t} \left[ \frac{1 - P_{ii}(t, u)}{u - t}\, P_{ij}(u, s) - \sum_{k : k \neq i} \frac{P_{ik}(t, u)}{u - t}\, P_{kj}(u, s) \right] \\
&= \lim_{u \downarrow t} \left[ \sum_{l : l \neq i} \frac{P_{il}(t, u)}{u - t}\, P_{ij}(u, s) - \sum_{k : k \neq i} P_{kj}(u, s)\, \frac{P_{ik}(t, u)}{u - t} \right].
\end{aligned}$$
Since we are summing over the finite state space, the sums are finite, so we may interchange
the limit and the summation. Using (7.1) this results in
$$\frac{d}{dt} P_{ij}(t, s) = P_{ij}(t, s) \sum_{l : l \neq i} \lambda_{il}(t) - \sum_{k : k \neq i} P_{kj}(t, s)\, \lambda_{ik}(t).$$
Hence (7.2) is proven. The proof of (7.3) is analogous.
However, deriving the transition probabilities from the Kolmogorov differential equations
is rather complex. To simplify this, Haberman and Pitacco [20] propose to assume
time-homogeneous transition intensities. Unfortunately, this method is not applicable in our situation,
since there is a negative duration dependence, as shown in chapter 6. Another possible solution
is to consider the occupancy probabilities:
Definition 8. The occupancy probability is the probability of staying in a state and is, for $i \in \mathcal{S}$
and $t < s$, defined as
$$P_{\overline{ii}}(t, s) := P(S(u) = i \;\; \forall u \in [t, s] \mid S(t) = i). \tag{7.4}$$
This definition differs from the definition of $P_{ii}(t, s)$, which also gives the probability of
being in the same state at times $t$ and $s$, but allows visits to other states $j$ ($j \neq i$) between $t$
and $s$. As before, the relationship between occupancy probabilities and transition intensities
can be derived from the Kolmogorov backward differential equations:
$$\frac{d}{dt} P_{\overline{ii}}(t, s) = P_{\overline{ii}}(t, s) \sum_{j : j \neq i} \lambda_{ij}(t).$$
These equations can be solved by using the boundary condition $P_{\overline{ii}}(t, t) = 1$, resulting in
$$P_{\overline{ii}}(t, s) = \exp\left( - \int_t^s \sum_{j : j \neq i} \lambda_{ij}(u)\, du \right), \tag{7.5}$$
as can be verified by differentiation.
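The differentiation step can be written out as follows. This is only a sketch: $\lambda_{ij}$ is used here for the transition intensities and $P_{\overline{ii}}$ for the occupancy probability, and the shorthand $\Lambda_i(t, s)$ for the integrated exit intensity is introduced purely for this illustration.

```latex
\begin{align*}
\Lambda_i(t,s) &:= \int_t^s \sum_{j : j \neq i} \lambda_{ij}(u)\,du,
  \qquad
  \frac{\partial}{\partial t}\Lambda_i(t,s) = -\sum_{j : j \neq i} \lambda_{ij}(t), \\
\frac{d}{dt}\exp\bigl(-\Lambda_i(t,s)\bigr)
  &= -\frac{\partial \Lambda_i(t,s)}{\partial t}\,\exp\bigl(-\Lambda_i(t,s)\bigr)
   = \exp\bigl(-\Lambda_i(t,s)\bigr) \sum_{j : j \neq i} \lambda_{ij}(t), \\
\exp\bigl(-\Lambda_i(t,t)\bigr) &= \exp(0) = 1 .
\end{align*}
```

Hence the exponential satisfies both the differential equation for the occupancy probability and the boundary condition.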
In order to determine the expected duration until recovery we need the transition probabilities
between the different states. As explained above, however, we have no closed-form expression
for $P_{ij}(t, s)$. On the other hand, we do have a closed-form solution for the
occupancy probabilities. Using (7.5) we have for $t < s$:
$$P_{\overline{11}}(t, s) = \exp\left( - \int_t^s [\lambda_{12}(u) + \lambda_{10}(u)]\, du \right); \tag{7.6}$$
$$P_{\overline{22}}(t, s) = \exp\left( - \int_t^s [\lambda_{21}(u) + \lambda_{20}(u)]\, du \right). \tag{7.7}$$
Since our time axis $t$ is on a monthly scale, an insured can make at most one transition between $t$
and $t + 1$; otherwise he stays in his current state. So in our case we have $P_{\overline{ii}}(t, t + 1) = P_{ii}(t, t + 1)$.
Therefore it is possible to derive a closed-form solution for the one-month transition probabilities
$P_{ij}(t, t + 1)$. The probability that a claimant leaves state $i$ in the interval $(t, t + 1]$ is equal to
$1 - P_{ii}(t, t + 1)$. This probability is divided over the destination states proportionally to the
integrated transition intensities. This results in the following one-month transition and occupancy
probabilities:
$$P_{11}(t, t + 1) = \exp\left( - \int_t^{t+1} [\lambda_{10}(u) + \lambda_{12}(u)]\, du \right); \tag{7.8}$$
$$P_{10}(t, t + 1) = (1 - P_{11}(t, t + 1))\, \frac{\int_t^{t+1} \lambda_{10}(u)\, du}{\int_t^{t+1} [\lambda_{10}(u) + \lambda_{12}(u)]\, du}; \tag{7.9}$$
$$P_{12}(t, t + 1) = 1 - P_{11}(t, t + 1) - P_{10}(t, t + 1); \tag{7.10}$$
$$P_{22}(t, t + 1) = \exp\left( - \int_t^{t+1} [\lambda_{20}(u) + \lambda_{21}(u)]\, du \right); \tag{7.11}$$
$$P_{20}(t, t + 1) = (1 - P_{22}(t, t + 1))\, \frac{\int_t^{t+1} \lambda_{20}(u)\, du}{\int_t^{t+1} [\lambda_{20}(u) + \lambda_{21}(u)]\, du}; \tag{7.12}$$
$$P_{21}(t, t + 1) = 1 - P_{22}(t, t + 1) - P_{20}(t, t + 1). \tag{7.13}$$
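The one-month probabilities above can be computed with a few lines of code. The following Python sketch is an illustration only and not the implementation used for this thesis: the function name is hypothetical, and the integrated intensities passed to it are made-up numbers rather than estimates from chapter 6.

```python
import math

def one_month_probs(cum_first: float, cum_second: float):
    """Return (stay, to_first, to_second) for one month.

    cum_first and cum_second are the transition intensities towards the two
    possible destination states, integrated over (t, t + 1].
    """
    total = cum_first + cum_second
    if total == 0.0:
        return 1.0, 0.0, 0.0              # no risk of leaving the current state
    stay = math.exp(-total)               # one-month occupancy probability
    leave = 1.0 - stay                    # probability of leaving the state
    to_first = leave * cum_first / total  # split proportional to the intensities
    to_second = leave - to_first          # remainder goes to the other state
    return stay, to_first, to_second

# Hypothetical integrated intensities for one month (not thesis estimates).
p11, p10, p12 = one_month_probs(cum_first=0.08, cum_second=0.02)  # from state 1: to 0, to 2
p22, p20, p21 = one_month_probs(cum_first=0.03, cum_second=0.04)  # from state 2: to 0, to 1
```

By construction the three probabilities for each starting state add up to one, and the ratio between the two exit probabilities equals the ratio between the integrated intensities.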
With these probabilities it is now possible to calculate the probabilities $P_{ij}(t, s)$ for every pair
$t < s$. However, in order to determine the expected duration until recovery we do not need to
know all these probabilities. We only need to know the probability of starting in state 1 or 2,
following some path through these two states and arriving in state 0 for the first time at time $t$ (for
all $t = 2, 3, \ldots$). This concept is captured in the so-called taboo probabilities.
Definition 9. For $i \in \{1, 2\}$ the taboo probability is given by
$$q_{i0}(1, t) = P_i[S(2) \neq 0, \ldots, S(t - 1) \neq 0, S(t) = 0],$$
$$q_{i0}(1, 1) = 0,$$
where $P_i$ denotes the probability given $S(1) = i$.
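In order to calculate the taboo probabilities we define the matrices $Q(t)$ and $R(t)$, $t = 1, 2, \ldots$, by
$$Q(t) = \begin{pmatrix} 0 & 0 & 0 \\ 0 & P_{11}(t, t + 1) & P_{12}(t, t + 1) \\ 0 & P_{21}(t, t + 1) & P_{22}(t, t + 1) \end{pmatrix} \tag{7.14}$$
$$R(t) = \begin{pmatrix} 0 & 0 & 0 \\ P_{10}(t, t + 1) & 0 & 0 \\ P_{20}(t, t + 1) & 0 & 0 \end{pmatrix} \tag{7.15}$$
where the entries are defined by (7.8)-(7.13) and rows and columns are indexed by the states 0, 1
and 2. Combining this, we obtain, for $i \in \{1, 2\}$ and $t = 2, 3, \ldots$
$$q_{i0}(1, t) = \left[ \left( \prod_{s=1}^{t-2} Q(s) \right) R(t - 1) \right]_{i0}, \tag{7.16}$$
that is, the entry in the row of state $i$ and the column of state 0, where the empty product for
$t = 2$ is the identity matrix.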
Using definition 9, the expected duration until recovery $ER_i$ of a self-employed with risk factors
$X = x$, starting in state $i$, is given by
$$ER_i = \sum_{t=1}^{\infty} q_{i0}(1, t)\, (t - 1). \tag{7.17}$$
Note that this sum runs from $t = 1$ to infinity, but the transition intensities are only known
for $t = 1, \ldots, 100$. However, at $t = 101$ some claims are still continuing.
Neglecting these claims would seriously bias the expected duration. Therefore it will be assumed
that these claims last for 101 months.
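The calculation of the expected duration can be sketched in Python as follows. This is an illustration under stated assumptions and not the implementation used for this thesis: the function name and the one-month probabilities are hypothetical, the probabilities are taken constant over time for simplicity, and instead of building the 3 x 3 matrices the sketch tracks the probability of being in state 1 or 2 without having visited state 0, which amounts to the same product of one-month matrices followed by a final step into state 0.

```python
def expected_duration(q_list, r_list, start_state, horizon=101):
    """Expected months until first arrival in state 0, starting at t = 1.

    q_list[s - 1] = ((p11, p12), (p21, p22)) and r_list[s - 1] = (p10, p20)
    hold the one-month probabilities for the step from month s to s + 1.
    Claims still open at t = horizon are assigned a duration of `horizon`
    months, mirroring the assumption made in the text.
    """
    # Probability of being in state 1 / state 2 without a visit to state 0.
    alive = [1.0, 0.0] if start_state == 1 else [0.0, 1.0]
    expected = 0.0
    for t in range(2, horizon + 1):
        q, r = q_list[t - 2], r_list[t - 2]
        taboo = alive[0] * r[0] + alive[1] * r[1]   # first arrival in 0 at time t
        expected += taboo * (t - 1)
        alive = [alive[0] * q[0][0] + alive[1] * q[1][0],
                 alive[0] * q[0][1] + alive[1] * q[1][1]]
    return expected + sum(alive) * horizon

# Hypothetical, time-constant one-month probabilities (each row sums to one).
steps = 100
q_const = [((0.85, 0.05), (0.10, 0.85))] * steps
r_const = [(0.10, 0.05)] * steps
er1 = expected_duration(q_const, r_const, start_state=1)
er2 = expected_duration(q_const, r_const, start_state=2)
```

With these made-up probabilities a claimant starting in state 2 has a longer expected duration than one starting in state 1, which is the qualitative pattern one would expect from the model.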
7.2 Expected duration until recovery
In order to determine the influence of the different risk factors on the expected recovery time,
we will calculate the expected duration until recovery as described in the previous section. First
of all, we will do the calculations for a benchmark self-employed starting in state 1, and we will
repeat this for the same self-employed starting in state 2. Subsequently, we will change the value
of one of the covariates and re-calculate the expected duration until recovery. In this way we can
measure the impact of a change in one of the risk factors.
As the benchmark self-employed we take the average insured: a male of 42, working
in the agricultural sector, suffering from locomotive symptoms (also known as an L1-disorder).
Like the majority of the claimants, he has a deferment period of 14 days. The time-dependent
DNB-indicator representing the business cycle gives rise to a problem. Since the future values
of this variable are unknown, it cannot be used for the prediction of the expected recovery time.
This is solved by assuming that the value of this indicator is constant during a spell. For
the benchmark self-employed it is assumed that this value is equal to the average value during
2003–2011, namely 0.
                                State 1    State 2
Benchmark                         9,88      14,64
Age = 60                         18,20      25,34
Gender = Female                  11,16      16,36
Profession = Construction        10,16      15,36
Profession = (Para)Medical        7,67      12,32
Profession = Service              8,14      13,25
Profession = Shopkeeper          11,37      17,61
Disorder = Loc. injury            5,19       9,27
Disorder = Psych. symptoms        9,92      20,57
Disorder = Digestive              5,74       8,38
Disorder = Dig. cancer           13,40      30,28
Deferment = B                    12,91      19,12
Deferment = C                    19,35      25,93
Deferment = D                    28,58      40,42
DNB = -2                         10,72      15,46
DNB = 2                           9,11      13,86
Table 7.1: Estimates of the expected recovery time (in months) for self-employed with different
characteristics, starting in disability state 1 or 2. The benchmark self-employed is a male of 42
working in the agricultural sector, suffering from locomotive symptoms, with a deferment period
of 14 days and the DNB-indicator equal to 0.
The results of the calculations can be found in table 7.1. In the first column the expected duration
until recovery is shown for claimants starting with a mild disorder. For claimants starting with
a severe disorder the results are given in the second column. We will consider a difference in
expectation significant if and only if it is an increase or decrease of at least 5%. So for state 1,
expectations are considered to be significantly different when they are outside the interval
[9,39; 10,37], and for state 2 outside the interval [13,91; 15,37].
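The intervals above follow directly from the benchmark values in table 7.1, as the following minimal Python sketch shows. The function names are hypothetical, and treating values exactly on the boundary as not significant is an assumption of this sketch.

```python
def significance_band(benchmark: float, rel_change: float = 0.05):
    """Interval outside which an expected duration counts as significantly different."""
    return benchmark * (1.0 - rel_change), benchmark * (1.0 + rel_change)

def is_significant(value: float, benchmark: float, rel_change: float = 0.05) -> bool:
    """True if `value` lies outside the band around `benchmark`."""
    low, high = significance_band(benchmark, rel_change)
    return value < low or value > high

low1, high1 = significance_band(9.88)    # benchmark, state 1 (table 7.1)
low2, high2 = significance_band(14.64)   # benchmark, state 2 (table 7.1)

construction_state1 = is_significant(10.16, 9.88)   # Profession = Construction
age60_state1 = is_significant(18.20, 9.88)          # Age = 60
```

Rounded to two decimals the bands reproduce the intervals stated in the text, and the construction value for state 1 falls inside its band while the value for age 60 falls outside.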
We will start by discussing the result for the benchmark self-employed and subsequently discuss
the influence of the various risk factors. The expected duration until recovery of the benchmark
self-employed depends significantly on whether he started in state 1 or 2. If he started in state
1, his expected duration until recovery is 9,9 months. However, if the benchmark self-employed
had started in state 2 instead of 1, his disability spell would last an expected period of
14,6 months. When comparing the columns of table 7.1 for the other characteristics, it is noted
that in all cases the expected duration when started in state 2 is significantly higher than when
started in state 1. Therefore it can be concluded that the expected duration until recovery of
a self-employed starting with a mild disorder is significantly lower than that of a self-employed
starting with a severe disorder. Furthermore it is noted that in both cases the expected duration
until recovery rises significantly when the claimant is 60 years old. Also for female insured we
see an increase in the expected recovery time.
Concentrating on the different occupational classes it is noted that claimants working in the
(para)medical sector or in service recover significantly faster than claimants in the agricultural
sector. On the other hand, working as a shopkeeper significantly increases the expected duration
until recovery. For claimants working in construction we see a slightly increased expectation of
0,3 (state 1) and 0,7 months (state 2). This is an increase of less than 5%, hence it is insignificant.
Changes in the disorder type lead in almost all cases to significantly different estimates of the
expected duration until recovery from both a mild and a severe disorder. In both cases, claimants
with locomotive injuries or a digestive disorder recover significantly faster than claimants with
locomotive symptoms. On the other hand, claimants with digestive cancer experience a much
longer disability spell. When they start in state 2, the expected duration until recovery is more
than double that of the benchmark self-employed. Insured with a mild psychological disorder show
no significant deviation from the expected duration of the benchmark. However, for insured with
a severe psychological disorder we observe an increase in expected duration of almost 6 months.
For both insured starting in state 1 and insured starting in state 2, the length of the deferment
period is a significant indicator for the duration of the disability spell. The
longer the deferment period, the longer it takes for a claimant to recover. There are probably
two reasons for this. First, claimants sometimes do not report their disability
when its duration is shorter than the deferment period. Second, a selection effect can play an
important role. Most claimants choose a longer deferment period either when they have enough
savings to cover the first period, or when they suspect that they will not experience (many)
short spells. So when they do suffer from a disorder, there is a higher chance that this spell will
last longer than average.
An increase in the DNB-indicator reflects an improvement of the economic situation. We see
that when the DNB-indicator increases from -2 to 2, the expected recovery period decreases
significantly, for both state 1 and 2. For claimants starting with a mild disorder we see an
average duration of 10,72 months when the indicator equals -2, and a duration of
9,11 months when it equals 2. For claimants starting with a severe disorder these
values are 15,46 and 13,86 months respectively. Hence, in both cases we observe a decline in
expected recovery period of 1,6 months. Therefore it can be concluded that an increase in the
DNB-indicator leads to a decrease in the expected duration until recovery.
7.2.1 Other causes of the fluctuations in the loss ratio
Over the period 2003–2011 the DNB-indicator ranged from -2,02 (August 2008) to 2,74
(July 2007). This corresponds to a maximum difference in expected duration until recovery of
approximately 1,9 months. When comparing graph 1.3 and C.1a, it is noted that in the first
years an increase in the DNB-indicator corresponds with a decrease in the percentage of insured
with a claim and vice versa, as expected. However, in the period after June 2009 this is not the
case. Therefore it is assumed that the fluctuations in the loss ratio are not solely explained by
changes in the business cycle, but that other processes also play a role. For example,
both the man/woman-ratio and the ratio between the different professions have changed over
time. In 2003 more than 91% of the claims were made by men. This number declined to
approximately 85% in 2011. Since it is shown that in general women recover more slowly, this
shift is another reason for the increase in average recovery time. Another change
in the composition of the portfolio is one in the ratio of the different occupational classes, as is
shown in the following graph.
Figure 7.1: Change of ratio in professions over the different years in which the disability started.
We notice that the percentage of claimants working in the agricultural sector decreased very
fast, while the percentage of the other professions increased. For example, the percentage of
claimants working in service doubled over the last two years. This is important because the
different professions have different expected durations until recovery, as was shown in table 7.1.
Furthermore, the characteristics of the claimants differ per profession, as shown in figure 7.2a
and 7.2b.
Figure 7.2: (a) Disorders per profession (in %). (b) Ratio men/women per profession (in %).
Chapter 8
Conclusion and advice
The aim of this thesis was to explain the fluctuations in the loss ratio of the disability insurance
company and to give advice on how this explanation could be used to improve the
current models. We used a unique data set containing more than 30000 claims during the period
2003–2011. The focus was primarily on the influence of the business cycle on the recovery
process. In order to differentiate between negative, partial and full recovery a multi-state model
was introduced. The model consisted of three states: one healthy state and two disability states.
Only the transitions between the two disability states and from the disability states to the
healthy state were considered. These transitions were estimated by both a proportional hazards
model and a logistic regression model. First a basic model was created, after which the influence
of the various business cycle-related variables was assessed. The results of the two
estimation methods were almost identical: the business cycle mainly affects the fall-back rate
and full recovery from a mild disorder. Both transitions are affected by the DNB-indicator, the
leading variable confidence and the coincident variable GDP. On the other hand, it was shown
that the lagging variable labour market and the average income of self-employed have no influence on
the recovery process.
For transition 1 → 0 the business cycle-related variables have a positive effect: the higher the
value of these variables, the higher the rate of full recovery from a mild disorder. For transition
1 → 2 it is exactly the other way around: the higher the value of the business cycle-related
variables, the lower the fall-back rate. From these findings it can be concluded that a positive
business climate results in less negative recovery and more recovery from a mild disorder. It
was also investigated whether these results differ per profession or disorder. It turned out that
the influence of the business cycle is most significant for claimants working in construction and
claimants suffering from locomotive symptoms. For the other occupational classes and types of
disability the influence was hardly, or not at all, noticeable.
These findings could be used when constructing an internal model for Solvency II. Since disability
spells last longer in periods of low confidence, low GDP and a low value of the DNB-indicator,
the insurance company should increase the required loss reserves in these periods, and vice versa.
Another advantage of knowing what slows down recovery is that the company can provide extra
services, such as prevention or reintegration services, to counter long sick-leave durations in
times of economic downturn. Since it was shown that claimants with a psychological disorder
or cancer are hardly influenced by changes in the business climate, it is probable that for these
claimants extra reintegration services will not be beneficial. The same holds, for example,
for claimants working in the agricultural sector or as a shopkeeper.
In order to quantify the influence of the different risk factors one of the business cycle-related
variables had to be added to the model. The DNB-indicator was chosen because of its
forecasting property. Subsequently the influence of the risk factors was assessed by determining
the expected duration until recovery, both when starting with a mild disorder and when starting
with a severe disorder. These results were compared to the expected durations of the benchmark
self-employed. It was shown that a difference in the DNB-indicator can account for a difference in
expected duration until recovery of at most 1,9 months. Therefore it can be concluded that
changes in the business cycle did influence the percentage of insured with a claim and therefore
affected the loss ratio of the insurance company. However, compared to the other risk
factors, the influence of the DNB-indicator is relatively small. It also turned out
that the composition of the portfolio was not constant, but changed over time. For instance,
the men/women-ratio declined, meaning that in the most recent years there were more women and
fewer men than in earlier years. Also the ratio of the different professions changed during the
last decade. The percentage of claimants working in the agricultural sector decreased, while the
contribution of the other professions in our portfolio increased. This development is important
because the characteristics of the claimants differ per profession, for example in the frequency
of the various disorders.
Finally, some recommendations for further research are given. First of all, it would be interesting
to quantify the influence of the change in the portfolio. In this way it could be determined
which part of the increase in the average disability duration is caused by the business cycle and
which by changes in the composition of the insured. Second, when determining the expected
duration until recovery it was assumed that the value of the time-dependent DNB-indicator was
constant during a spell. However, to obtain more accurate expectations one should account for
the changes in the DNB-indicator over time. Last, we only considered the data from the
years 2003–2011, which consisted of one period of economic growth and one of economic decline.
By adding the years 1995–2002 to the data set one would be able to compare two economic
cycles with each other, since in that period the dot-com bubble had its climax.
Appendix A
Estimation results of the MPH model
                   Coef    Exp(Coef)   se(Coef)   p-Value
Age                 -         -           -          -
Female             0,142     1,153      0,054      0,008
Agriculture       -0,110     0,895      0,049      0,024
Construction       0,178     1,195      0,054      0,003
(Para)Medical     -0,204     0,816      0,077      0,008
Service           -0,177     0,838      0,101      0,078
Shopkeeper        -0,242     0,785      0,074      0,001
Locomotive         0,006     1,006      0,049      0,900
Psychological     -0,146     0,864      0,070      0,036
Digestive          0,761     2,141      0,084      0,000
Symptoms          -0,002     0,998      0,042      0,970
Cancer             0,695     2,004      0,101      0,000
Infections        -0,257     0,773      0,160      0,110
Injury            -0,043     0,957      0,053      0,410
Prev. state
Year 2004
Year 2005
Year 2006
Year 2007
Year 2008
Year 2009
Year 2010
Year 2011
Deferment B       -0,050     0,951      0,042      0,230
Deferment C       -0,074     0,929      0,053      0,160
Deferment D       -0,479     0,619      0,542      0,380
Amount insured      -         -           -          -
Frailty                                            0,001
Table A.1: Estimation results for the MPH model, transition 1 → 2.
                   Coef    Exp(Coef)   se(Coef)   p-Value
Age               -0,020     0,980      0,001      0,000
Female
Agriculture        0,060     1,062      0,026      0,018
Construction       0,055     1,057      0,032      0,079
(Para)Medical      0,176     1,193      0,040      0,000
Service            0,180     1,197      0,050      0,000
Shopkeeper        -0,044     0,957      0,038      0,250
Locomotive        -0,115     0,891      0,026      0,000
Psychological     -0,167     0,846      0,035      0,000
Digestive          0,452     1,572      0,049      0,000
Symptoms           0,185     1,203      0,022      0,000
Cancer            -0,155     0,856      0,077      0,043
Infections         0,145     1,156      0,073      0,046
Injury             0,528     1,695      0,026      0,000
Prev. state 2      0,419     1,521      0,020      0,000
Year 2004          0,024     1,024      0,035      0,500
Year 2005          0,120     1,127      0,035      0,001
Year 2006          0,072     1,074      0,035      0,040
Year 2007          0,062     1,064      0,035      0,079
Year 2008          0,034     1,035      0,035      0,330
Year 2009          0,029     1,030      0,035      0,410
Year 2010          0,056     1,058      0,038      0,140
Year 2011          0,007     1,007      0,068      0,920
Deferment B       -0,161     0,851      0,021      0,000
Deferment C       -0,431     0,650      0,029      0,000
Deferment D       -0,722     0,486      0,360      0,045
Frailty                                            0,000
                   Coef    Exp(Coef)   se(Coef)   p-Value
Age                0,026     0,975      0,002      0,000
Female             0,142     0,862      0,055      0,010
Agriculture        0,089     0,970      0,046      0,054
Construction       0,244     1,243      0,050      0,000
(Para)Medical      0,059     1,084      0,076      0,440
Service            0,155     1,170      0,089      0,080
Shopkeeper         0,160     0,881      0,073      0,028
Locomotive         0,410     0,708      0,045      0,000
Psychological      1,007     0,398      0,067      0,000
Digestive          0,593     1,786      0,068      0,000
Symptoms           0,277     1,285      0,043      0,000
Cancer             1,484     0,247      0,126      0,000
Infections         0,010     1,016      0,132      0,940
Injury             0,485     1,564      0,048      0,000
Prev. state 1      0,059     1,061      0,057      0,300
Year 2004          0,026     0,003      0,062      0,680
Year 2005          0,054     0,114      0,062      0,390
Year 2006          0,134     0,063      0,063      0,033
Year 2007          0,056     0,040      0,063      0,370
Year 2008          0,232     0,015      0,063      0,000
Year 2009          0,188     0,007      0,063      0,003
Year 2010          0,116     0,030      0,066      0,077
Year 2011          0,313     0,978      0,113      0,006
Deferment B        0,237     1,251      0,038      0,000
Deferment C        0,402     1,040      0,055      0,000
Deferment D        0,322     1,062      0,421      0,440
Frailty                                            0,000
                   Coef    Exp(Coef)   se(Coef)   p-Value
Age               -0,006     0,994      0,001      0,011
Female              -         -           -          -
Agriculture        0,344     1,410      0,033      0,000
Construction      -0,194     0,823      0,039      0,000
(Para)Medical      0,262     1,300      0,051      0,000
Service            0,007     1,007      0,065      0,910
Shopkeeper         0,147     1,159      0,051      0,004
Locomotive         0,270     1,310      0,034      0,000
Psychological     -0,263     0,769      0,045      0,000
Digestive          0,126     1,134      0,059      0,033
Symptoms           0,152     1,164      0,030      0,000
Cancer            -1,059     0,347      0,073      0,000
Infections         0,270     1,310      0,103      0,009
Injury             0,115     1,122      0,035      0,001
Prev. state 1      0,100     1,105      0,032      0,002
Year 2004          0,169     1,184      0,047      0,000
Year 2005          0,238     1,269      0,047      0,000
Year 2006          0,191     1,211      0,047      0,000
Year 2007          0,274     1,316      0,047      0,000
Year 2008          0,167     1,182      0,046      0,000
Year 2009          0,142     1,153      0,047      0,002
Year 2010          0,207     1,230      0,049      0,000
Year 2011          0,132     1,142      0,083      0,110
Deferment B       -0,118     0,888      0,028      0,000
Deferment C       -0,230     0,794      0,038      0,000
Deferment D       -1,378     0,252      0,343      0,000
Frailty                                            0,000
A.1 PH assumption
                 χ²-stat.   p-value                      χ²-stat.   p-value
Female             1,561     0,211    Symptoms             1,770     0,183
Agriculture        1,160     0,281    Cancer               7,766     0,005
Construction       0,036     0,851    Infections          10,294     0,001
(Para)Medical      3,148     0,076    Injury              11,054     0,001
Service            3,997     0,046    Deferment B          7,882     0,005
Shopkeeper         0,040     0,841    Deferment C          1,877     0,171
Locomotive         3,475     0,062    Deferment D          0,967     0,325
Psychological      0,023     0,879    GLOBAL             105,129     0,000
Digestive          4,507     0,034
Table A.5: Test statistics of the univariate test for non-proportionality and the global test of
proportional hazards for the transition 1 → 2.
                 χ²-stat.   p-value                      χ²-stat.   p-value
Age               19,10      0,000    Year 2004            0,670     0,413
Female            15,00      0,000    Year 2005            0,450     0,502
Agriculture        3,690     0,055    Year 2006            0,360     0,548
Construction       2,540     0,111    Year 2007            4,800     0,028
(Para)Medical      1,850     0,174    Year 2008            0,118     0,731
Service            0,567     0,452    Year 2009            1,260     0,261
Shopkeeper         2,030     0,154    Year 2010            0,472     0,492
Locomotive         9,410     0,002    Year 2011            5,090     0,024
Psychological    273,0       0,000    Previous state 2    34,00      0,000
Digestive          0,858     0,354    Deferment B         52,00      0,000
Symptoms          19,30      0,000    Deferment C        102,0       0,000
Cancer             0,123     0,726    Deferment D          0,002     0,966
Infections         0,161     0,688    GLOBAL             920,0       0,000
Injury           108,0       0,000
                 χ²-stat.   p-value                      χ²-stat.   p-value
Age                0,844     0,358    Year 2004            4,160     0,041
Female             5,720     0,017    Year 2005            1,690     0,193
Service            0,255     0,613    Year 2009            8,690     0,003
Shopkeeper         0,102     0,750    Year 2010           12,70      0,000
Locomotive        47,20      0,000    Year 2011            1,070     0,301
Psychological    167,0       0,000    Previous state 2     0,909     0,340
Digestive         14,20      0,000    Deferment B          2,260     0,133
Symptoms           7,970     0,005    Deferment C          0,039     0,843
Cancer             0,264     0,608    Deferment D          0,261     0,609
Infections         0,594     0,441    GLOBAL             251,0       0,000
Injury            12,10      0,001
                 χ²-stat.   p-value                      χ²-stat.   p-value
Age                9,106     0,003    Year 2004            0,522     0,470
Service            3,447     0,063    Year 2008            6,919     0,008
Shopkeeper         5,526     0,019    Year 2009           21,064     0,000
Locomotive         3,239     0,072    Year 2010           15,137     0,000
Psychological     16,621     0,000    Year 2011            6,389     0,012
Digestive          0,013     0,910    Previous state 2    80,522     0,000
Symptoms           8,606     0,003    Deferment B          8,140     0,004
Cancer            21,125     0,000    Deferment C         77,106     0,000
Infections         0,868     0,352    Deferment D          0,639     0,424
Injury             2,110     0,146    GLOBAL              89,486     0,000
(a) Service, p-value: 0,046
(b) Digestive, p-value: 0,034
(c) Cancer, p-value: 0,005
(d) Infections, p-value: 0,001
(e) Injury, p-value: 0,001
(f) Deferment B, p-value: 0,005
Figure A.1: Time-dependent coefficient plots for variables where the proportional hazards
assumption is violated (transition 1 → 0).
A.2 Business cycle
               Coefficient  Exp(coef)  p-value
DNB indicator     -0,030      0,971     0,013
Confidence        -0,004      0,996     0,050
GDP               -0,029      0,971     0,000
Labour market     -0,001      0,999     0,610
Income            -0,013      0,987     0,110
Table A.9: Coefficients, hazard ratios and p-values for the business cycle variables, transition
1 → 2.
               Coefficient  Exp(coef)  p-value
DNB indicator      0,022      1,022     0,026
Confidence         0,004      1,004     0,003
GDP                0,013      1,013     0,018
Labour market      0,000      1,000     0,750
Income             0,005      1,005     0,520
Table A.10: Coefficients, hazard ratios and p-values for the business cycle variables, transition
1 → 0.
               Coefficient  Exp(coef)  p-value
DNB indicator      0,004      1,004     0,870
Confidence         0,005      1,005     0,110
GDP                0,014      1,014     0,280
Labour market     -0,001      0,999     0,670
Income             0,021      1,021     0,220
Table A.11: Coefficients, hazard ratios and p-values for the business cycle variables, transition
2 → 0.
               Coefficient  Exp(coef)  p-value
DNB indicator     -0,019      0,981     0,110
Confidence        -0,002      0,998     0,210
GDP               -0,013      0,987     0,056
Labour market      0,000      1,000     0,830
Income             0,001      1,001     0,900
Table A.12: Coefficients, hazard ratios and p-values for the business cycle variables, transition
2 → 1.
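The Exp(coef) column of the business cycle tables is the exponential of the coefficient, and it is read as a multiplicative effect on the hazard per unit increase of the variable. The sketch below illustrates this for the GDP coefficient of transition 1 → 2, taken as -0,029 (negative, since its Exp(coef) of 0,971 lies below one). The function names are ours and purely illustrative.

```python
import math


def hazard_ratio(coef: float) -> float:
    """Multiplicative change in the hazard for a one-unit increase of the covariate."""
    return math.exp(coef)


def percent_change_in_hazard(coef: float) -> float:
    """The same effect expressed as a percentage change of the hazard."""
    return 100.0 * (math.exp(coef) - 1.0)


# GDP coefficient for transition 1 -> 2 (decimal comma written as a decimal point).
gdp_coef = -0.029
print(f"hazard ratio: {hazard_ratio(gdp_coef):.3f}")
print(f"change in hazard: {percent_change_in_hazard(gdp_coef):.2f}%")
```

Small differences in the third decimal between a recomputed value and a tabulated one can arise because the tabulated coefficients are themselves rounded.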
A.2.1 Influence of the business cycle per profession
Agriculture
1 → 2   1 → 0   2 → 0   2 → 1
DNB indicator
Confidence
GDP
Table A.13: Profession = agriculture: Summary of the significant variables per transition
together with the exponent of the estimated coefficient (based on a 5% significance level).
Construction
1 → 2   1 → 0   2 → 0   2 → 1
Confidence 0,988 1,010 1,015
GDP 0,950 1,038
Table A.14: Profession = construction: Summary of the significant variables per transition
together with the exponent of the estimated coefficient (based on a 5% significance level).
(Para)Medical
1 → 2   1 → 0   2 → 0   2 → 1
DNB indicator 1,064
Confidence 1,010
GDP
Table A.15: Profession = (para)medical: Summary of the significant variables per transition
together with the exponent of the estimated coefficient (based on a 5% significance level).
Service
1 → 2   1 → 0   2 → 0   2 → 1
DNB indicator
Confidence
GDP 0,928
Table A.16: Profession = service: Summary of the significant variables per transition together
with the exponent of the estimated coefficient (based on a 5% significance level).
Shopkeeper
1 → 2   1 → 0   2 → 0   2 → 1
DNB indicator
Confidence
GDP
Table A.17: Profession = shopkeeper: Summary of the significant variables per transition
together with the exponent of the estimated coefficient (based on a 5% significance level).
A.2.2 Influence of the business cycle per disorder
Locomotive
1 → 2   1 → 0   2 → 0   2 → 1
DNB indicator 0,971 1,013 1,024
Confidence 1,003 1,004
GDP 0,966 1,009
Table A.18: Disorder = locomotive: Summary of the significant variables per transition together
with the exponent of the estimated coefficient (based on a 5% significance level).
Psychological
1 → 2   1 → 0   2 → 0   2 → 1
DNB indicator
Confidence
GDP 0,952
Table A.19: Disorder = psychological: Summary of the significant variables per transition
together with the exponent of the estimated coefficient (based on a 5% significance level).
Digestive
1 → 2   1 → 0   2 → 0   2 → 1
DNB indicator
Confidence
GDP
Table A.20: Disorder = digestive: Summary of the significant variables per transition together
with the exponent of the estimated coefficient (based on a 5% significance level).
Symptoms
1 → 2   1 → 0   2 → 0   2 → 1
Confidence 1,004 1,004
GDP 0,973 1,013 1,014
Table A.21: Disorder type = symptoms: Summary of the significant variables per transition
together with the exponent of the estimated coefficient (based on a 5% significance level).
Cancer
1 → 2   1 → 0   2 → 0   2 → 1
DNB indicator
Confidence
GDP 0,949
Table A.22: Disorder type = cancer: Summary of the significant variables per transition together
with the exponent of the estimated coefficient (based on a 5% significance level).
Infections
1 → 2   1 → 0   2 → 0   2 → 1
Confidence 1,018
GDP 1,050
Table A.23: Disorder type = infections: Summary of the significant variables per transition
together with the exponent of the estimated coefficient (based on a 5% significance level).
Injury
1 → 2   1 → 0   2 → 0   2 → 1
Confidence
GDP 0,968
Table A.24: Disorder type = injury: Summary of the significant variables per transition together
with the exponent of the estimated coefficient (based on a 5% significance level).
Appendix B
Estimation results of the logit model
               Coefficient  Exp(coef)  Std. error  p-value
Age                  -          -          -          -
Female             0,147      1,158      0,054      0,006
Agriculture       -0,104      0,925      0,050      0,036
Construction       0,180      1,110      0,059      0,002
(Para)Medical     -0,202      0,817      0,078      0,010
Service           -0,150      0,861      0,105      0,152
Shopkeeper        -0,242      0,785      0,074      0,001
Locomotive         0,036      1,037      0,048      0,940
Psychological     -0,154      0,857      0,069      0,0025
Digestive          0,759      2,136      0,085      0,000
Symptoms           0,010      1,010      0,041      0,804
Cancer             0,638      1,893      0,101      0,000
Infections        -0,220      0,803      0,163      0,177
Injury            -0,003      0,997      0,052      0,948
Prev. state 2
Year 2003            -          -          -          -
Year 2004            -          -          -          -
Year 2005            -          -          -          -
Year 2006            -          -          -          -
Year 2007            -          -          -          -
Year 2008            -          -          -          -
Year 2009            -          -          -          -
Year 2010            -          -          -          -
Deferment B       -0,063      0,939      0,042      0,136
Deferment C       -0,099      0,906      0,053      0,060
Deferment D       -0,446      0,640      0,575      0,439
Time in system    -0,041      0,960      0,002      0,000
Table B.1: Estimation results for the logit model, transition 1 → 2.
               Coefficient  Exp(coef)  Std. error  p-value
Age               -0,022      0,978      0,001      0,000
Female            -0,042      0,959      0,029      0,145
Agriculture        0,048      1,049      0,026      0,068
Construction       0,042      1,043      0,033      0,206
(Para)Medical      0,184      1,202      0,040      0,000
Service            0,184      1,202      0,050      0,003
Shopkeeper        -0,044      0,957      0,038      0,237
Locomotive        -0,111      0,895      0,027      0,000
Psychological     -0,092      0,912      0,032      0,004
Digestive          0,467      1,595      0,053      0,000
Symptoms           0,172      1,188      0,022      0,000
Cancer            -0,108      0,898      0,075      0,150
Infections         0,165      1,179      0,071      0,020
Injury             0,512      1,669      0,028      0,000
Prev. state 2      0,532      1,702      0,020      0,000
Year 2004          0,003      1,003      0,037      0,936
Year 2005          0,108      1,114      0,036      0,003
Year 2006          0,061      1,063      0,036      0,089
Year 2007          0,039      1,040      0,037      0,286
Year 2008          0,015      1,015      0,036      0,674
Year 2009          0,007      1,007      0,036      0,851
Year 2010          0,030      1,030      0,039      0,447
Year 2011         -0,022      0,978      0,072      0,763
Deferment B       -0,147      0,834      0,022      0,000
Deferment C       -0,405      0,641      0,029      0,000
Deferment D       -0,706      0,463      0,346      0,042
Time in system    -0,126      0,882      0,002      0,000
               Coefficient  Exp(coef)  Std. error  p-value
Age               -0,024      0,976      0,002      0,000
Female            -0,125      0,882      0,051      0,014
Agriculture       -0,080      0,923      0,043      0,063
Construction       0,226      1,254      0,046      0,000
(Para)Medical      0,049      1,050      0,071      0,487
Service            0,139      1,149      0,082      0,090
Shopkeeper        -0,135      0,874      0,067      0,045
Locomotive        -0,301      0,740      0,042      0,000
Psychological     -0,829      0,436      0,060      0,000
Digestive          0,628      1,874      0,064      0,000
Symptoms           0,231      1,260      0,039      0,000
Cancer            -1,376      0,253      0,121      0,000
Infections         0,052      1,053      0,124      0,673
Injury             0,409      1,505      0,044      0,000
Prev. state 1      0,151      1,163      0,054      0,005
Year 2004         -0,067      0,935      0,057      0,240
Year 2005         -0,075      0,928      0,057      0,189
Year 2006         -0,165      0,848      0,058      0,005
Year 2007         -0,085      0,919      0,058      0,142
Year 2008         -0,260      0,771      0,058      0,000
Year 2009         -0,219      0,803      0,057      0,000
Year 2010         -0,164      0,849      0,061      0,007
Year 2011         -0,347      0,707      0,107      0,001
Deferment B       -0,217      0,805      0,035      0,000
Deferment C       -0,376      0,687      0,051      0,000
Deferment D       -0,300      0,741      0,382      0,433
Time in system    -0,180      0,835      0,006      0,000
               Coefficient  Exp(coef)  Std. error  p-value
Age               -0,007      0,993      0,001      0,000
Female               -          -          -          -
Agriculture        0,317      1,373      0,031      0,000
Construction      -0,158      0,854      0,036      0,000
(Para)Medical      0,239      1,270      0,048      0,000
Service            0,032      1,033      0,062      0,603
Shopkeeper         0,091      1,095      0,048      0,058
Locomotive         0,293      1,340      0,034      0,000
Psychological     -0,131      0,877      0,041      0,001
Digestive          0,124      1,132      0,057      0,030
Symptoms           0,112      1,119      0,028      0,000
Cancer            -0,783      0,457      0,065      0,000
Infections         0,308      1,361      0,096      0,001
Injury             0,083      1,087      0,033      0,011
Prev. state 1      0,610      1,840      0,032      0,000
Year 2004          0,114      1,121      0,043      0,008
Year 2005          0,170      1,185      0,044      0,000
Year 2006          0,108      1,114      0,044      0,013
Year 2007          0,201      1,223      0,044      0,000
Year 2008          0,083      1,087      0,043      0,054
Year 2009          0,023      1,023      0,044      0,606
Year 2010          0,116      1,123      0,045      0,010
Year 2011          0,061      1,063      0,073      0,405
Deferment B       -0,093      0,911      0,027      0,001
Deferment C       -0,118      0,889      0,034      0,001
Deferment D       -1,042      0,353      0,289      0,000
Time in system    -0,064      0,938      0,002      0,000
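B.1 Business cycle
               Coefficient  Exp(coef)  p-value
DNB indicator     -0,030      0,970     0,012
Confidence        -0,004      0,996     0,032
GDP               -0,030      0,970     0,000
Labour            -0,005      0,995     0,707
Income            -0,012      0,988     0,117
Table B.5: Coefficients, hazard rates and p-values for the business cycle variables, transition
1 → 2.
               Coefficient  Exp(coef)  p-value
DNB indicator      0,020      1,020     0,006
Confidence         0,004      1,004     0,006
GDP                0,013      1,013     0,027
Labour            -0,007      0,993     0,599
Income             0,003      1,003     0,686
Table B.6: Coefficients, hazard rates and p-values for the business cycle variables, transition
1 → 0.
               Coefficient  Exp(coef)  p-value
DNB indicator     -0,005      0,995     0,818
Confidence         0,004      1,004     0,224
GDP                0,013      1,013     0,321
Labour            -0,003      0,997     0,388
Income             0,012      1,012     0,494
Table B.7: Coefficients, hazard rates and p-values for the business cycle variables, transition
2 → 0.
               Coefficient  Exp(coef)  p-value
DNB indicator     -0,017      0,983     0,165
Confidence        -0,001      0,999     0,544
GDP               -0,011      0,989     0,094
Labour             0,005      1,005     0,739
Income             0,004      1,004     0,631
Table B.8: Coefficients, hazard rates and p-values for the business cycle variables, transition
2 → 1.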
Appendix C
Business cycle
In this appendix the significant business cycle-related variables are plotted for the years
2003–2011. All three graphs show approximately the same pattern, but on a different scale.
(a) DNB indicator
(b) Confidence
(c) GDP
Figure C.1: The values of the significant business cycle-related variables during the period
2003–2011.
Appendix D
Comparison of models
This appendix explains how different models can be compared. We used this approach in
chapters 4 and 5 to obtain the best-fitting survival and logit models.
D.1 Comparing non-nested models
Models are non-nested if it is impossible to express one model as a constrained version of the
other. Non-nested models may include the same predictor variables, but they may also involve
some variables that are unique to each model. Discriminating between nested models is possible
by using a standard hypothesis test of the parametric restriction that reduces one model to the
other. In the non-nested case, however, alternative methods are needed. For example,
we can use information criteria such as Akaike's Information Criterion (AIC) or the Bayesian
Information Criterion (BIC). These criteria are full-sample criteria and they seek, in model
selection, to balance the accuracy of estimation against the best
approximation to reality. The essential intuition behind all these criteria is that there exists
a tension between model fit, as measured by the maximized log-likelihood, and the principle
of parsimony, which favors a simple model. The fit of the model can be improved by increasing
model complexity, but parameters are only added if the resulting improvement in fit sufficiently
compensates for the loss of parsimony. The different information criteria vary in how steeply they
penalize model complexity. Before we can introduce them, we first explain the principle
of maximum likelihood estimation.
D.1.1 Maximum likelihood
The principle of maximum likelihood (ML) is based on distributional assumptions about the
data. Suppose we have $n$ random variables with observations $y_i$ ($i = 1, \ldots, n$) for a sample of $n$
independent subjects. The probability density function for $y_i$ is denoted by $f(y_i \mid \theta)$, where $\theta$ is
a parameter characterizing the distribution. The ML principle is an estimation principle that
finds an estimate for one or more unknown parameters (such as $\theta$) such that it maximizes the
likelihood of observing the data $y_i$, $i = 1, \ldots, n$. The likelihood $L$ of a model can be interpreted
as the probability of the observed data $y$, given that model. The probability of observing $y_i$ is
given by the density function $f(y_i \mid \theta)$. Because the observations on the $n$ subjects are assumed
to be independent, the joint density function of all observations is the product of the densities:
$$L(\theta) = \prod_{i=1}^{n} f(y_i \mid \theta) \tag{D.1}$$
This is called the likelihood and is a function of the unknown parameter $\theta$. A certain parameter
value $\theta_1$ is more likely than another value $\theta_2$, in light of the observed data, if it makes those data
more probable. In that case one should have:
$$L(\theta_1 \mid y) > L(\theta_2 \mid y).$$
The expression for the likelihood can be simplified if its natural logarithm is taken, in which case
the product in (D.1) is replaced by a sum:
$$l(\theta) = \ln\left( \prod_{i=1}^{n} f(y_i \mid \theta) \right) = \sum_{i=1}^{n} \ln f(y_i \mid \theta). \tag{D.2}$$
Since the natural logarithm is a monotone increasing function, maximizing the log-likelihood (D.2)
yields the same estimates as maximizing the likelihood (D.1).
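As an illustration of (D.1) and (D.2), the sketch below maximizes both the likelihood and the log-likelihood over a grid of parameter values and checks that they select the same estimate. It assumes an exponential density $f(y \mid \theta) = \theta e^{-\theta y}$ and a handful of made-up durations; neither is taken from the thesis data, and the grid search merely stands in for a proper numerical optimizer.

```python
import math

# Hypothetical durations (e.g. in weeks); illustrative only, not thesis data.
durations = [3.0, 7.0, 12.0, 20.0, 8.0]


def likelihood(theta: float, ys: list) -> float:
    """L(theta) = prod_i f(y_i | theta) with f(y | theta) = theta * exp(-theta * y)."""
    return math.prod(theta * math.exp(-theta * y) for y in ys)


def log_likelihood(theta: float, ys: list) -> float:
    """l(theta) = sum_i ln f(y_i | theta) = sum_i (ln theta - theta * y_i)."""
    return sum(math.log(theta) - theta * y for y in ys)


# Candidate parameter values 0.001, 0.002, ..., 1.000.
grid = [i / 1000 for i in range(1, 1001)]

best_by_likelihood = max(grid, key=lambda t: likelihood(t, durations))
best_by_log_likelihood = max(grid, key=lambda t: log_likelihood(t, durations))

# For the exponential density the maximizer is known in closed form: n / sum(y_i).
closed_form = len(durations) / sum(durations)

print(best_by_likelihood, best_by_log_likelihood, closed_form)
```

In practice the log-likelihood is preferred because the product in (D.1) quickly underflows for large samples, whereas the sum in (D.2) remains numerically stable.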
D.1.2 Information Criteria
Akaike (1974) proposed Akaike's Information Criterion as a simple criterion for comparing
model fit. It takes into account the number of regression coefficients being
tested; given equal fit of two models, the more parsimonious model (i.e. the one having fewer
predictors) will have a better AIC fit index. This criterion consists of the log-likelihood and an
additional term to penalize for lack of parsimony:
$$\mathrm{AIC} = -2 \ln L + 2k,$$
where $k$ is the number of parameters, including the intercept.
Schwarz (1978) criticized Akaike's criterion as being asymptotically non-optimal. He proposed a
revised form of the penalty function by introducing the Bayesian Information Criterion:
$$\mathrm{BIC} = -2 \ln L + (\ln N) k,$$
where $N$ is the number of observations. The BIC may be negative or positive; the lower (more
negative) the value, the better the fit.
If model parsimony is important, then the BIC is more widely used, since the model-size penalty
for AIC is relatively low: the BIC penalty $(\ln N) k$ exceeds the AIC penalty $2k$ as soon as $N \geq 8$.
Given their simplicity, penalized likelihood criteria are often used for
selecting the best model. However, there is no clear answer to which criterion, if any, should be
preferred. From a decision-theoretic point of view, the choice of the model from a set of models
should depend on the intended use of that model [8].
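The two criteria can be computed directly from a maximized log-likelihood. The sketch below compares two hypothetical models; the log-likelihood values, parameter counts and sample size are invented for illustration and do not correspond to any model estimated in this thesis.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike's Information Criterion: -2 ln L + 2k."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik: float, k: int, n_obs: int) -> float:
    """Bayesian Information Criterion: -2 ln L + (ln N) k."""
    return -2.0 * log_lik + math.log(n_obs) * k


# Hypothetical fitted models: (maximized log-likelihood, number of parameters).
models = {"small": (-1520.0, 5), "large": (-1514.0, 9)}
n_obs = 1000  # hypothetical number of observations

for name, (log_lik, k) in models.items():
    print(f"{name}: AIC = {aic(log_lik, k):.2f}, BIC = {bic(log_lik, k, n_obs):.2f}")

# The preferred model under each criterion is the one with the lowest value.
best_aic = min(models, key=lambda m: aic(*models[m]))
best_bic = min(models, key=lambda m: bic(*models[m], n_obs))
print("preferred by AIC:", best_aic, "| preferred by BIC:", best_bic)
```

With these invented numbers the two criteria disagree: the gain in log-likelihood of the larger model outweighs the AIC penalty but not the heavier BIC penalty, which shows why the choice of criterion matters when parsimony is a concern.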
Bibliography
[1] W.W.M. Abeysekera and M.R. Sooriyarachchi, Use of Schoenfeld's Global Test to Test the
Proportional Hazards Assumption in the Cox Proportional Hazards Model: an Application
to a Clinical Study, J. Natn. Sci. Foundation Sri Lanka 2009 37(1): 41-51.
[2] A. Agresti, Categorical Data Analysis, New York: John Wiley & Sons Inc, 2002.
[3] P.K. Andersen, O. Borgan, R.D. Gill and N. Keiding, Statistical Models Based on Counting
Processes, Springer Series in Statistics, Springer-Verlag, 1993.
[4] B.H. Baltagi, Econometric Analysis of Panel Data, John Wiley & Sons, Ltd, second edition,
2001.
[5] P.D. Allison, Logistic Regression Using the SAS System: Theory and Application, SAS
Institute Inc., 1999.
[6] P.D. Allison, Survival Analysis using SAS: A Practical Guide, SAS Institute Inc., USA,
2010.
[7] R. Amelink, Disability Durations of Dutch Self-Employed; assessing how certain risk factors,
particular variables related to the business cycle, influence disability durations of Dutch
self-employed, Faculty of Economics and Business, Groningen, 2010.
[8] D. Beal, Information Criteria Methods in SAS for Multiple Linear Regression Models, Sci-
ence Applications International Corporation, Oak Ridge, TN.
[9] G. J. van den Berg, Duration Models: Specification, Identification and Multiple Durations,
Handbook of Econometrics, 2001.
[10] P. W. Bultena, Application of Claim Reserving in a Multiple State Model for Disability
Insurance, Faculty of Economics and Business, Groningen, 2009.
[11] A. Cameron and P. Trivedi, Microeconometrics; methods and applications, Cambridge Uni-
versity Press, 2005.
[12] J. Cohen, P. Cohen, S. West and L. Aiken, Applied Multiple Regression/Correlation Analysis
for the Behavioral Sciences, Lawrence Erlbaum Associates, Publisher, London, 2003.
[13] D.R. Cox and D. Oakes, Analysis of Survival Data, Chapman and Hall, London, New York,
1984.
[14] D. Delong, G. Guirguis and Y. So, Efficient Computation of Subset Selection Probabilities
with Application to Cox Regression, Biometrika 81: 607-611, 1994.
[15] J. De Ravin, The Management of Disability Income Claims, Australian Actuarial Journal,
Volume IV Issue 4, The Institute of Actuaries of Australia, 1998.
[16] S.P. Jenkins, Survival Analysis, Lecture notes, University of Essex, 2005.
[17] J. Cramer, Logit Models From Economics and Other Fields, Cambridge University Press,
2003.
[18] J. Cramer, The Logit Model for Economists, Routledge, Chapman and Hall Inc, 1991.
[19] R. F. Engle, Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics, Hand-
book of Econometrics, Volume II, Elsevier Science Publishers BV, 1984.
[20] S. Haberman and E. Pitacco, Actuarial Models for Disability Insurance, Chapman & Hall,
1999.
[21] Hair, Black, Babin, Anderson and Tatham, Multivariate Data Analysis, Pearson Prentice
Hall, New Jersey, 2006.
[22] K. van Harn and P. Holewijn, Markov-ketens in diskrete tijd, Epsilon Uitgaven, Utrecht,
1991.
[23] C. Heij, P. de Boer, P. H. Franses, T. Kloek and H. van Dijk, Econometric Methods with
Applications in Business and Economics, Oxford University Press, 2004.
[24] T. Lancaster, The Econometric Analysis of Transition Data, Cambridge University Press,
1990.
[25] P. Leeflang, D. Wittink, M. Wedel and P. Naert, Building Models for Marketing Decisions,
Kluwer Academic Publishers, Dordrecht, 2000.
[26] K. Liang and S. Zeger, Longitudinal Data Analysis Using Generalized Linear Models, 1986,
Biometrika 73 (1): 13-22.
[27] M. Mills, Introducing Survival and Event History Analysis, Sage Publications Ltd, 2011.
[28] F. van Ruth, B. Schouten and R. Wekker, The Statistics Netherlands Business Cycle Tracer.
Methodological Aspects; concept, cycle computation and indicator selection, 2005.
[29] D. Schoenfeld, Residuals for the proportional hazards regression model, Biometrika, 1982.
[30] D. Service and K. Ferris, Disability Experience and Economic Correlations, Institute of
Actuaries of Australia Convention, 2001.
[31] H.J. Smoluk, Long-Term Disability Claims Rates and the Consumption-to-Wealth Ratio,
The Journal of Risk and Insurance 2009, Vol. 76, No. 1, 109-131.
[32] L. Spierdijk and R. Koning, Sickness Absenteeism Among Self-Employed: Determinants of
Return to Work, 2010.
[33] L. Spierdijk, et al., The Determinants of Sick Leave Durations of Dutch Self-Employed, J.
Health Econ. 2009.
[34] T. Therneau and P. Grambsch, Modeling Survival Data: Extending the Cox Model,
Springer-Verlag, 2000.
[35] J.M. Wooldridge, Introductory Econometrics; a modern approach, 2008.
[36] D. Zhang, Analysis of Survival Data, Lecture notes, Department of Statistics, North Carolina
State University, 2005.
[37] http://www.ats.ucla.edu/stat/sas/seminars/sas_logistic/logistic1.htm.
[38] http://www.cbs.nl/.
[39] http://www.dnb.nl/en/onderzoek-2/dnb238497.jsp.
