
Stochastic Claims Reserving Methods

in Non-Life Insurance

Mario V. Wüthrich¹                    Michael Merz²
Department of Mathematics             Faculty of Economics
ETH Zürich                            University of Tübingen

Version 1.1

¹ ETH Zürich, CH-8092 Zürich, Switzerland.
² University of Tübingen, D-72074 Tübingen, Germany.

© 2006 (M. Wüthrich, ETH Zürich & M. Merz, Uni Tübingen)

Contents

1 Introduction and Notation                                              7
  1.1 Claims process                                                     7
      1.1.1 Accounting principle and accident year                       9
      1.1.2 Inflation                                                   10
  1.2 Structural framework to the claims reserving problem              12
      1.2.1 Fundamental properties of the reserving process             13
      1.2.2 Known and unknown claims                                    15
  1.3 Outstanding loss liabilities, classical notation                  16
  1.4 General Remarks                                                   18

2 Basic Methods                                                         21
  2.1 Chain-ladder model (distribution free model)                      21
  2.2 The Bornhuetter-Ferguson method                                   27
  2.3 Number of IBNyR claims, Poisson model                             30
      2.3.1 Poisson derivation of the chain-ladder model                34

3 Chain-ladder models                                                   39
  3.1 Mean square error of prediction                                   39
  3.2 Chain-ladder method                                               41
      3.2.1 The Mack model                                              42
      3.2.2 Conditional process variance                                46
      3.2.3 Estimation error for single accident years                  48
      3.2.4 Conditional MSEP in the chain-ladder model for aggregated
            accident years                                              59
  3.3 Analysis of error terms                                           62
      3.3.1 Classical chain-ladder model                                63
      3.3.2 Enhanced chain-ladder model                                 64
      3.3.3 Interpretation                                              65
      3.3.4 Chain-ladder estimator in the enhanced model                66
      3.3.5 Conditional process and prediction errors                   67
      3.3.6 Chain-ladder factors and conditional estimation error       68
      3.3.7 Parameter estimation                                        75

4 Bayesian models                                                       85
  4.1 Introduction to credibility claims reserving methods              85
      4.1.1 Benktander-Hovinen method                                   86
      4.1.2 Minimizing quadratic loss functions                         89
      4.1.3 Cape-Cod Model                                              92
      4.1.4 A distributional example to credible claims reserving       95
  4.2 Exact Bayesian models                                             99
      4.2.1 Motivation                                                  99
      4.2.2 Log-normal/Log-normal model                                101
      4.2.3 Overdispersed Poisson model with gamma a priori
            distribution                                               108
      4.2.4 Exponential dispersion family with its associate
            conjugates                                                 116
      4.2.5 Poisson-gamma case, revisited                              125
  4.3 Bühlmann-Straub Credibility Model                                126
      4.3.1 Parameter estimation                                       132
  4.4 Multidimensional credibility models                              136
      4.4.1 Hachemeister regression model                              137
      4.4.2 Other credibility models                                   140
  4.5 Kalman filter                                                    142

5 Outlook                                                              149

A Unallocated loss adjustment expenses                                 151
  A.1 Motivation                                                       151
  A.2 Pure claims payments                                             152
  A.3 ULAE charges                                                     153
  A.4 New York-method                                                  153
  A.5 Example                                                          157

B Distributions                                                        159
  B.1 Discrete distributions                                           159
      B.1.1 Binomial distribution                                      159
      B.1.2 Poisson distribution                                       159
      B.1.3 Negative binomial distribution                             160
  B.2 Continuous distributions                                         160
      B.2.1 Normal distribution                                        160
      B.2.2 Log-normal distribution                                    160
      B.2.3 Gamma distribution                                         161
      B.2.4 Beta distribution                                          162


Chapter 1
Introduction and Notation
1.1 Claims process

In this lecture we consider claims reserving for a branch of insurance called

Non-Life Insurance.

This branch is sometimes also called General Insurance (UK) or Property and Casualty Insurance (US).
This branch usually contains all kinds of insurance products except life insurance products. The separation has mainly two reasons: 1) Life insurance products are rather different from non-life insurance contracts, e.g. in the terms of a contract, the type of claims, etc. This implies that life and non-life products are modelled rather differently. 2) Moreover, in many countries, e.g. in Switzerland, there is a strict legal separation between life insurance and non-life insurance products: a company for non-life insurance products is not allowed to sell life products, and a life insurance company can, besides life products, only sell health and disability products. Every Swiss company which sells both life and non-life products therefore has at least two legal entities.
The branch non-life insurance contains the following lines of business (LoB):
- Motor insurance (motor third party liability, motor hull)
- Property insurance (private and commercial property against fire, water, flooding, business interruption, etc.)
- Liability insurance (private and commercial liability including directors and officers (D&O) liability insurance)
- Accident insurance (personal and collective accident including compulsory accident insurance and workmen's compensation)
- Health insurance (private personal and collective health)
- Marine insurance (including transportation)
- Other insurance products, like aviation, travel insurance, legal protection, credit insurance, epidemic insurance, etc.

A non-life insurance policy is a contract between two parties, the insurer and the insured. The insured pays a fixed amount of money (called premium) to the insurer and in return obtains financial coverage against the random occurrence of well-specified events (or at least a promise that he gets a well-defined amount in case such an event happens). The right of the insured to these amounts (in case the event happens) constitutes a claim by the insured on the insurer.
The amount which the insurer is obliged to pay in respect of a claim is known as the claim amount or loss amount. The payments which make up this claim are known as
- claims payments,
- loss payments,
- paid claims, or
- paid losses.
The history of a typical claim may look as follows:

[Figure 1.1: Typical time line of a non-life insurance claim - accident date, reporting date, claims payments, claims closing, reopening, further payments, final claims closing.]


This means that usually the insurance company is not able to settle a claim immediately, this is mainly due to two reasons:
1. Usually, there is a reporting delay (time-lag between claims occurrence and
claims reporting to the insurer). The reporting of a claim can take several
years, especially in liability insurance (e.g. asbestos or environmental pollution claims), see also Example 1.1.
c
2006
(M. W
uthrich, ETH Z
urich & M. Merz, Uni T
ubingen)

Chapter 1. Introduction and Notation

2. After the reporting it can take several years until a claim gets finally settled.
In property insurance we usually have a rather fast settlement whereas in
liability or bodily injury claims it often takes a lot of time until the total
degree of a claim is clear and known (and can be settled).
3. It can also happen that a closed claim needs to be reopend due to new
(unexpected) new developments or in case a relapse happens.

1.1.1 Accounting principle and accident year

There are different premium accounting principles: i) premium booked, ii) premium written, iii) premium earned. Which principle should be chosen depends on the kind of business written. W.l.o.g. we concentrate in the present manuscript on the premium earned principle:
Usually an insurance company closes its books at least once a year. Let us assume that we always close our books on December 31. How should we show a one-year contract which was written on October 1, 2006, with two premium installments paid on October 1, 2006 and April 1, 2007?
We assume that
- premium written 2006 = 100,
- premium booked 2006 = 50 (= premium received in 2006),
- pipeline premium 31.12.2006 = 50 (= premium which will be received in 2007), which gives premium booked 2007 = 50.
If we assume that the risk exposure is distributed uniformly over time (pro rata temporis), this implies that
- premium earned 2006 = 25 (= premium used for exposure in 2006),
- unearned premium reserve UPR 31.12.2006 = 75 (= premium which will be used for exposure in 2007), which gives premium earned 2007 = 75.
If the exposure is not pro rata temporis, then of course we have a different split of the premium earned into the different accounting years. In order to have a consistent financial statement it is important that the accident date and the premium accounting principle are compatible (via the exposure pattern). Hence all claims which have accident year 2006 have to be matched to the premium earned 2006, i.e. the claims 2006 have to be paid by the premium earned 2006, whereas the claims with accident year later than 2006 have to be paid by the unearned premium reserve UPR 31.12.2006.
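The pro rata temporis figures above can be reproduced in a few lines; a minimal sketch, assuming a month-based exposure fraction (the amounts are those of the example):

```python
# Premium earned principle, pro rata temporis, for the one-year contract
# written October 1, 2006: total premium 100, one installment of 50 per
# calendar year. The month-based exposure fraction is an assumption here.
premium_written_2006 = 100
premium_booked_2006 = 50                                        # received in 2006
pipeline_premium = premium_written_2006 - premium_booked_2006   # booked in 2007

months_on_risk_2006 = 3                       # October, November, December
earned_fraction_2006 = months_on_risk_2006 / 12   # uniform exposure over 12 months
premium_earned_2006 = premium_written_2006 * earned_fraction_2006
UPR = premium_written_2006 - premium_earned_2006  # unearned premium reserve 31.12.2006
```

A non-uniform exposure pattern would simply replace `earned_fraction_2006` by the fraction of exposure falling into 2006.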

Hence on the one hand we have to build premium reserves for future exposures, but on the other hand we also need to build claims reserves for unsettled claims of past exposures. There are two different types of claims reserves for past exposures:
1. IBNyR reserves (incurred but not yet reported): We need to build claims reserves for claims which have occurred before 31.12.2006, but which have not been reported by the end of the year (i.e. the reporting delay lapses into the next accounting years).
2. IBNeR reserves (incurred but not enough reported): We need to build claims reserves for claims which have been reported before 31.12.2006, but which have not been settled yet, i.e. we still expect payments in the future, which need to be financed by the already earned premium.
Example 1.1 (Reporting delay)

number of reported claims, non-cumulative according to reporting delay

accident |                      reporting period
  year   |   0    1    2    3    4    5    6    7    8    9   10
    0    | 368  191   28    8    6    5    3    1    0    0    1
    1    | 393  151   25    6    4    5    4    1    2    1    0
    2    | 517  185   29   17   11   10    8    1    0    0    1
    3    | 578  254   49   22   17    6    3    0    1    0    0
    4    | 622  206   39   16    3    7    0    1    0    0    0
    5    | 660  243   28   12   12    4    4    1    0    0    0
    6    | 666  234   53   10    8    4    6    1    0    0    0
    7    | 573  266   62   12    5    7    6    5    1    0    1
    8    | 582  281   32   27   12   13    6    2    1    0
    9    | 545  220   43   18   12    9    5    2    0
   10    | 509  266   49   22   15    4    8    0
   11    | 589  210   29   17   12    4    9
   12    | 564  196   23   12    9    5
   13    | 607  203   29    9    7
   14    | 674  169   20   12
   15    | 619  190   41
   16    | 660  161
   17    | 660

Table 1.1: claims development triangle for number of IBNyR cases (source [75])

1.1.2 Inflation

The following subsection on inflation follows Taylor [75].


Claims costs are often subject to inflation. Usually it is not the typical inflation, like salary or price inflation: inflation is very specific to the LoB chosen. For example, in the LoB accident, inflation is driven by medical inflation, whereas for the LoB motor hull, inflation is driven by the technical complexity of car repairing techniques. The essential point is that claims inflation may continue beyond the occurrence date of the accident up to the point of its final payments/settlement.
If $X_{t_i}$ denote the positive single claims payments at time $t_i$ expressed in money value at time $t_1$, then the total claim amount in money value at time $t_1$ is given by
$$C_1 = \sum_{i \ge 1} X_{t_i}. \qquad (1.1)$$
If $\lambda(\cdot)$ denotes the index which measures the claims inflation, the actual (nominal) claim amount is
$$C = \sum_{i \ge 1} \frac{\lambda(t_i)}{\lambda(t_1)}\, X_{t_i}. \qquad (1.2)$$
Whenever $\lambda$ is an increasing function we observe that $C$ is bigger than $C_1$. Of course, in practice we only observe the unindexed payments $X_{t_i}\, \lambda(t_i)/\lambda(t_1)$, and in general it is difficult to estimate an index function such that we obtain indexed values $X_{t_i}$. Finding an index function $\lambda(\cdot)$ is equivalent to defining appropriate deflators, which is a well-known concept in market consistent actuarial valuation, see e.g. Wüthrich-Bühlmann-Furrer [91].
The basic idea behind indexed values $C_1$ is that, if two sets of payments relate to identical circumstances except that there is a time translation in the payments, their indexed values will be the same, whereas the unindexed values are not the same: For $c > 0$ we assume that
$$\widetilde{X}_{t_i + c} = X_{t_i}. \qquad (1.3)$$
For increasing $\lambda$ we have that
$$\widetilde{C}_1 = \sum_{i \ge 1} \widetilde{X}_{t_i + c} = \sum_{i \ge 1} X_{t_i} = C_1, \qquad (1.4)$$
$$\widetilde{C} = \sum_{i \ge 1} \frac{\lambda(t_i + c)}{\lambda(t_1)}\, \widetilde{X}_{t_i + c} = \sum_{i \ge 1} \frac{\lambda(t_i + c)}{\lambda(t_1)}\, X_{t_i} > C, \qquad (1.5)$$
whenever $\lambda$ is an increasing function (we have assumed (1.3)). This means that the unindexed values differ by the factor $\lambda(t_i + c)/\lambda(t_i)$. However, in practice this ratio often turns out to be of an even different form, namely
$$\big(1 + \alpha(t_i, t_i + c)\big)\, \frac{\lambda(t_i + c)}{\lambda(t_i)}, \qquad (1.6)$$
meaning that over the time interval $[t_i, t_i + c]$ claim costs are inflated by an additional factor $\big(1 + \alpha(t_i, t_i + c)\big)$ above the natural inflation. This additional inflation is referred to as superimposed inflation and can be caused e.g. by changes in the jurisdiction and an increased claims awareness of the insured. We will not further discuss this in the sequel.
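Relation (1.2) can be sketched numerically; the index function (a flat 3% p.a.) and the payment pattern below are hypothetical:

```python
# Indexed vs nominal claim amount, as in (1.1)-(1.2). lam(t) is an assumed
# claims inflation index; payments are (t_i, X_{t_i}) in money value at t_1.
def lam(t):
    return 1.03 ** t        # hypothetical index, increasing in t

t1 = 0.0
payments = [(0.0, 100.0), (1.0, 50.0), (3.0, 20.0)]

C1 = sum(x for _, x in payments)                       # indexed total, (1.1)
C = sum(lam(t) / lam(t1) * x for t, x in payments)     # nominal total, (1.2)
# Since lam is increasing, the nominal amount C exceeds the indexed C1.
```

Time-translating all payments by c > 0 leaves `C1` unchanged (relation (1.4)) but increases `C`, which is the content of (1.5).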

1.2 Structural framework to the claims reserving problem

In this section we present a mathematical framework for claims reserving. For this purpose we follow Arjas [5]. Observe that in this subsection all actions of a claim are ordered according to their notification at the insurance company. From a statistical point of view this makes perfect sense; however, from an accounting point of view, one should rather order the claims according to their occurrence/accident date, as has been done e.g. in Norberg [58, 59]. Of course, there is a one-to-one relation between the two concepts.
We assume that we have $N$ claims within a fixed time period with reporting dates $T_1, \ldots, T_N$ (assume that they are ordered, $T_i \le T_{i+1}$ for all $i$). Fix the $i$-th claim. Then $T_i = T_{i,0}, T_{i,1}, \ldots, T_{i,j}, \ldots, T_{i,N_i}$ denotes the sequence of dates where some action on claim $i$ is observed; at time $T_{i,j}$ we have for example a payment, a new estimation of the claims adjuster or other new information on claim $i$. $T_{i,N_i}$ denotes the final settlement of the claim. Assume that $T_{i,N_i + k} = \infty$ for $k \ge 1$.
We specify the events that take place at time $T_{i,j}$ by
$$X_{i,j} = \begin{cases} \text{payment at time } T_{i,j} \text{ for claim } i, \\ 0, \text{ if there is no payment at time } T_{i,j}, \end{cases} \qquad (1.7)$$
$$I_{i,j} = \begin{cases} \text{new information available at } T_{i,j} \text{ for claim } i, \\ \emptyset, \text{ if there is no new information at time } T_{i,j}. \end{cases} \qquad (1.8)$$
We set $X_{i,j} = 0$ and $I_{i,j} = \emptyset$ whenever $T_{i,j} = \infty$.
With this structure we can define various interesting processes; moreover, our claims reserving problem splits into several subproblems. For every $i$ we obtain a marked point process.
Payment process of claim $i$. $(T_{i,j}, X_{i,j})_{j \ge 0}$ defines the following cumulative payment process
$$C_i(t) = \sum_{j:\ T_{i,j} \le t} X_{i,j}. \qquad (1.9)$$
Moreover $C_i(t) = 0$ for $t < T_i$. The total ultimate claim amount is given by
$$C_i(\infty) = C_i(T_{i,N_i}) = \sum_{j \ge 0} X_{i,j}. \qquad (1.10)$$
The total claims reserves for claim $i$ at time $t$ for the future liabilities (outstanding claim at time $t$) are given by
$$R_i(t) = C_i(\infty) - C_i(t) = \sum_{j:\ T_{i,j} > t} X_{i,j}. \qquad (1.11)$$
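The processes (1.9)-(1.11) can be illustrated for a single claim; the event history below is hypothetical:

```python
import math

# Hypothetical event history (T_{i,j}, X_{i,j}) of one claim: event times,
# each with the payment made at that time (0.0 if the event carried none).
events = [(1.0, 500.0), (1.5, 0.0), (2.3, 300.0), (4.0, 200.0)]

def C_i(t):
    """Cumulative payments up to time t, eq. (1.9)."""
    return sum(x for s, x in events if s <= t)

ultimate = C_i(math.inf)   # total ultimate claim amount C_i(infinity), eq. (1.10)

def R_i(t):
    """Outstanding claim (reserves) at time t, eq. (1.11)."""
    return ultimate - C_i(t)
```

At t = 2 only the first payment is visible, so C_i(2) = 500 and the outstanding claim is R_i(2) = 500; after the final settlement at t = 4 the reserve is zero.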


Information process of claim $i$ is given by $(T_{i,j}, I_{i,j})_{j \ge 0}$.
Settlement process of claim $i$ is given by $(T_{i,j}, I_{i,j}, X_{i,j})_{j \ge 0}$.
We denote the aggregated processes of all claims $i$ by
$$C(t) = \sum_{i=1}^{N} C_i(t), \qquad R(t) = \sum_{i=1}^{N} R_i(t). \qquad (1.12)$$
$C(t)$ denotes all payments up to time $t$ for all $N$ claims, and $R(t)$ denotes the outstanding claims payments (reserves) at time $t$ for these $N$ claims.
We consider now claims reserving as a prediction problem. Let
$$\mathcal{F}_t^N = \sigma\left\{(T_{i,j}, I_{i,j}, X_{i,j})_{i \ge 1,\, j \ge 0} :\ T_{i,j} \le t\right\} \qquad (1.13)$$

be the information available at time $t$. This $\sigma$-field is obtained from the information available at time $t$ from the claims settlement processes. Often there is additional exogenous information $\mathcal{E}_t$ at time $t$ (change of legal practice, high inflation, job market information, etc.). Therefore we define the information which the insurance company has at time $t$ by
$$\mathcal{F}_t = \mathcal{F}_t^N \vee \mathcal{E}_t. \qquad (1.14)$$
Problem. Estimate the conditional distribution
$$\mu_t(\cdot) = P\left[\left. C(\infty) \in \cdot\, \right| \mathcal{F}_t\right], \qquad (1.15)$$
with its first two moments
$$M_t = E\left[\left. C(\infty) \right| \mathcal{F}_t\right], \qquad (1.16)$$
$$V_t = \operatorname{Var}\left(\left. C(\infty) \right| \mathcal{F}_t\right). \qquad (1.17)$$

1.2.1 Fundamental properties of the reserving process

Because of
$$C(\infty) = C(t) + R(t), \qquad (1.18)$$
we have that
$$M_t = C(t) + E\left[\left. R(t) \right| \mathcal{F}_t\right] \stackrel{\text{def.}}{=} C(t) + m_t, \qquad (1.19)$$
$$V_t = \operatorname{Var}\left(\left. R(t) \right| \mathcal{F}_t\right). \qquad (1.20)$$

Lemma 1.2 $M_t$ is an $\mathcal{F}_t$-martingale, i.e. for $t > s$ we have that
$$E\left[\left. M_t \right| \mathcal{F}_s\right] = M_s, \qquad \text{a.s.} \qquad (1.21)$$
Proof. This follows from the tower property of successive forecasts: $E[M_t \,|\, \mathcal{F}_s] = E\big[E[C(\infty) \,|\, \mathcal{F}_t] \,\big|\, \mathcal{F}_s\big] = E[C(\infty) \,|\, \mathcal{F}_s] = M_s$. □

Lemma 1.3 The variance process $V_t$ is an $\mathcal{F}_t$-supermartingale, i.e. for $t > s$ we have that
$$E\left[\left. V_t \right| \mathcal{F}_s\right] \le V_s, \qquad \text{a.s.} \qquad (1.22)$$
Proof. Using Jensen's inequality for $t > s$ we have a.s. that
$$\begin{aligned}
E\left[\left. V_t \right| \mathcal{F}_s\right] &= E\left[\left. \operatorname{Var}\left(C(\infty) \,|\, \mathcal{F}_t\right) \right| \mathcal{F}_s\right] \\
&= E\left[\left. E\left[C^2(\infty) \,\big|\, \mathcal{F}_t\right] \right| \mathcal{F}_s\right] - E\left[\left. E\left[C(\infty) \,|\, \mathcal{F}_t\right]^2 \right| \mathcal{F}_s\right] \\
&\le E\left[C^2(\infty) \,\big|\, \mathcal{F}_s\right] - E\left[\left. E\left[C(\infty) \,|\, \mathcal{F}_t\right] \right| \mathcal{F}_s\right]^2 \qquad (1.23) \\
&= \operatorname{Var}\left(C(\infty) \,|\, \mathcal{F}_s\right) = V_s.
\end{aligned}$$
□
Consider $u > t$ and define the increment from $t$ to $u$ by
$$M(t,u) = M_u - M_t. \qquad (1.24)$$
Then, a.s., we have that
$$E\left[\left. M(t,u)\, M(u,\infty) \right| \mathcal{F}_t\right] = E\left[\left. M(t,u)\, E\left[\left. M(u,\infty) \right| \mathcal{F}_u\right] \right| \mathcal{F}_t\right] = E\left[\left. M(t,u) \left(E\left[\left. C(\infty) \right| \mathcal{F}_u\right] - M_u\right) \right| \mathcal{F}_t\right] = 0. \qquad (1.25)$$
This implies that $M(t,u)$ and $M(u,\infty)$ are uncorrelated, which is the well-known property that martingales have uncorrelated increments.
First approach to the claims reserving problem. Use the martingale integral representation. This leads to the innovation gains process, which determines $M_t$ when updating $\mathcal{F}_t$.
- This theory is well-understood.
- One has little idea about the updating process.
- One has (statistically) not enough data.


Second approach to the claims reserving problem. For $t < u$ we have that $\mathcal{F}_t \subseteq \mathcal{F}_u$. Since $M_t$ is an $\mathcal{F}_t$-martingale we have that
$$E\left[\left. M(t,u) \right| \mathcal{F}_t\right] = 0 \qquad \text{a.s.} \qquad (1.26)$$
We define the incremental payments between $t$ and $u$ by
$$X(t,u) = C(u) - C(t). \qquad (1.27)$$
Hence we have that
$$\begin{aligned}
M(t,u) &= E\left[\left. C(\infty) \right| \mathcal{F}_u\right] - E\left[\left. C(\infty) \right| \mathcal{F}_t\right] \\
&= C(u) + E\left[R(u) \,|\, \mathcal{F}_u\right] - \left(C(t) + E\left[R(t) \,|\, \mathcal{F}_t\right]\right) \\
&= X(t,u) + E\left[R(u) \,|\, \mathcal{F}_u\right] - E\left[\left. C(u) - C(t) + R(u) \right| \mathcal{F}_t\right] \qquad (1.28) \\
&= X(t,u) - E\left[X(t,u) \,|\, \mathcal{F}_t\right] + E\left[R(u) \,|\, \mathcal{F}_u\right] - E\left[R(u) \,|\, \mathcal{F}_t\right].
\end{aligned}$$
Hence we have the following two terms:
1. prediction error for payments within $(t, t+1]$:
$$X(t, t+1) - E\left[X(t, t+1) \,|\, \mathcal{F}_t\right]; \qquad (1.29)$$
2. prediction error of reserves $R(t+1)$ when updating information:
$$E\left[R(t+1) \,|\, \mathcal{F}_{t+1}\right] - E\left[R(t+1) \,|\, \mathcal{F}_t\right]. \qquad (1.30)$$

1.2.2 Known and unknown claims

As in Subsection 1.1.1 we define IBNyR (incurred but not yet reported) claims and reported claims. The following process counts the number of reported claims:
$$N_t = \sum_{i \ge 1} 1_{\{T_i \le t\}}. \qquad (1.31)$$
Hence we can split the ultimate claim and the reserves at time $t$ according to whether we have a reported or an IBNyR claim:
$$R(t) = \sum_i R_i(t)\, 1_{\{T_i \le t\}} + \sum_i R_i(t)\, 1_{\{T_i > t\}}, \qquad (1.32)$$
where
$$\sum_i R_i(t)\, 1_{\{T_i \le t\}} \qquad \text{reserves for claims reported at time } t, \qquad (1.33)$$
$$\sum_i R_i(t)\, 1_{\{T_i > t\}} \qquad \text{reserves for IBNyR claims at time } t. \qquad (1.34)$$

Hence we define
$$R_t^{rep} = E\left[\left. \sum_i R_i(t)\, 1_{\{T_i \le t\}} \right| \mathcal{F}_t\right] = E\left[\left. \sum_{i=1}^{N_t} R_i(t) \right| \mathcal{F}_t\right], \qquad (1.35)$$
$$R_t^{IBNyR} = E\left[\left. \sum_i R_i(t)\, 1_{\{T_i > t\}} \right| \mathcal{F}_t\right] = E\left[\left. \sum_{i=N_t+1}^{N} R_i(t) \right| \mathcal{F}_t\right], \qquad (1.36)$$
where $N$ is the total (random) number of claims. Hence we easily see that
$$R_t^{rep} = \sum_{i \le N_t} E\left[\left. R_i(t) \right| \mathcal{F}_t\right], \qquad (1.37)$$
$$R_t^{IBNyR} = E\left[\left. \sum_{i=N_t+1}^{N} R_i(t) \right| \mathcal{F}_t\right]. \qquad (1.38)$$
$R_t^{rep}$ denotes the future payments expected at time $t$ for reported claims. This is often called the best estimate reserves at time $t$ for reported claims. $R_t^{IBNyR}$ are the future payments expected at time $t$ for IBNyR claims (or best estimate reserves for IBNyR claims).
Conclusions. (1.37)-(1.38) show that the reserves for reported claims and the reserves for IBNyR claims are of rather different nature:
i) The reserves for reported claims should be determined individually, i.e. on a single claims basis. Often one has quite a lot of information on reported claims (e.g. case estimates), which asks for an estimate on single claims.
ii) The reserves for IBNyR claims cannot be decoupled, due to the fact that $N$ is not known at time $t$ (see (1.36)). Moreover, we have no information on a single claims basis. This shows that IBNyR reserves should be determined on a collective basis.
Unfortunately, most of the classical claims reserving methods do not distinguish reported claims from IBNyR claims, i.e. they estimate the claims reserves on both classes at the same time. In that context we have to slightly disappoint the reader, because most of the methods presented in this manuscript also do not make this distinction.
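The split (1.32)-(1.34) into reported and IBNyR reserves can be sketched as follows; the reporting times and reserve amounts are hypothetical:

```python
# Split of total reserves at time t into reported and IBNyR parts, as in
# (1.32)-(1.34). Each claim carries its reporting time T_i and its
# outstanding amount R_i(t); the figures are made up for illustration.
t = 2.0
claims = [(0.5, 120.0), (1.8, 300.0), (2.7, 80.0)]   # (T_i, R_i(t))

R_rep = sum(r for T, r in claims if T <= t)    # reserves for reported claims
R_ibnyr = sum(r for T, r in claims if T > t)   # reserves for IBNyR claims
R_total = R_rep + R_ibnyr                      # R(t) in (1.32)
```

In practice the individual $R_i(t)$ of IBNyR claims are of course not observable, which is exactly why $R_t^{IBNyR}$ in (1.38) must be estimated on a collective basis rather than claim by claim.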

1.3 Outstanding loss liabilities, classical notation

In this subsection we introduce the classical claims reserving notation and terminology. In most cases outstanding loss liabilities are estimated in so-called claims development triangles, which separate claims along two time axes.

In the sequel we always denote by
$$i = \text{accident year, year of occurrence}, \qquad (1.39)$$
$$j = \text{development year, development period}. \qquad (1.40)$$

For illustrative purposes we assume that $X_{i,j}$ denotes all payments in development period $j$ for claims with accident year $i$, i.e. this corresponds to the incremental claims payments for claims with accident year $i$ done in accounting year $i + j$. Below, we see which other meanings $X_{i,j}$ can have.
In a claims development triangle, accident years are usually on the vertical axis whereas development periods are on the horizontal axis (see also Table 1.1). Usually the loss development tables split into two parts: the upper triangle/trapezoid, where we have observations, and the lower triangle, where we want to estimate the outstanding payments. On the diagonals we always see the accounting years. Hence the claims data have the following structure: for accident years $i = 0, 1, \ldots, I$ (vertical) and development years $j = 0, 1, \ldots, J$ (horizontal), the cells with $i + j \le I$ contain the realizations of the random variables $C_{i,j}, X_{i,j}$ (observations), and in the cells with $i + j > I$ the $C_{i,j}, X_{i,j}$ have to be predicted.
Data can be shown in cumulative form or in non-cumulative (incremental) form. Incremental data are always denoted by $X_{i,j}$, and cumulative data are given by
$$C_{i,j} = \sum_{k=0}^{j} X_{i,k}. \qquad (1.41)$$

The incremental data $X_{i,j}$ may denote the incremental payments in cell $(i,j)$, the number of reported claims with reporting delay $j$ and accident year $i$, or the change of reported claim amount in cell $(i,j)$. For cumulative data $C_{i,j}$ we often use the terminology cumulative payments, total number of reported claims, or claims incurred (for cumulative reported claims). $C_{i,\infty}$ is often called the ultimate claim amount/load of accident year $i$ or the total number of claims in year $i$. To summarize:
- $X_{i,j}$: incremental payments / number of reported claims with delay $j$ / change of reported claim amount;
- $C_{i,j}$: cumulative payments / total number of reported claims / claims incurred.

Usually we have observations $\mathcal{D}_I = \{X_{i,j};\ i + j \le I\}$ in the upper trapezoid, and $\mathcal{D}_I^c = \{X_{i,j};\ i + j > I\}$ needs to be estimated.
The payments in a single accounting year are
$$X_k = \sum_{i+j=k} X_{i,j}, \qquad (1.42)$$
these are the payments in the $(k+1)$-st diagonal.
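Relations (1.41) and (1.42) are simple array operations; a minimal sketch on a made-up 3x3 incremental triangle:

```python
import numpy as np

# Hypothetical incremental payments X[i, j]; zeros fill the unobserved
# lower triangle (i + j > I with I = 2).
X = np.array([
    [500, 200, 100],
    [520, 230,   0],
    [540,   0,   0],
])
I_, J = X.shape[0] - 1, X.shape[1] - 1

# (1.41): cumulative payments C[i, j] = X[i, 0] + ... + X[i, j].
C = X.cumsum(axis=1)

# (1.42): accounting-year payments X_k = sum over the diagonal i + j = k.
X_acc = [int(sum(X[i, k - i] for i in range(max(0, k - J), min(k, I_) + 1)))
         for k in range(I_ + J + 1)]
```

Here `X_acc[k]` collects the $(k+1)$-st diagonal, e.g. `X_acc[1]` adds $X_{0,1}$ and $X_{1,0}$.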


If $X_{i,j}$ denote incremental payments, then the outstanding loss liabilities for accident year $i$ at time $j$ are given by
$$R_{i,j} = \sum_{k=j+1}^{\infty} X_{i,k} = C_{i,\infty} - C_{i,j}. \qquad (1.43)$$
$R_{i,j}$ are also called claims reserves; this is essentially the amount we have to estimate (lower triangle) so that together with the past payments $C_{i,j}$ we obtain the whole claims load (ultimate claim) for accident year $i$.

1.4 General Remarks

If we consider loss reserving models, i.e. models which estimate the total claim amount, there are always several possibilities to do so:
- cumulative or incremental data,
- payments or claims incurred data,
- split of small and large claims,
- indexed or unindexed data,
- number of claims and claims averages,
- etc.

Usually, different methods and differently aggregated data sets lead to very different results. Only an experienced reserving actuary is able to tell which is an accurate/good estimate of the future liabilities for a specific data set. Often there are many phenomena in the data which first need to be understood before applying a method (we cannot simply project the past to the future by applying one model).
With this in mind we describe different methods, but only practical experience will tell which method should be applied in which situation. I.e. the focus of this manuscript lies on the mathematical description of stochastic models. We derive various properties of these models. The question of an appropriate model choice for a specific data set is not treated here. Indeed, this is probably one of the most difficult questions. Moreover, there is only very little literature on this topic; e.g. for the chain-ladder method certain aspects are considered in Barnett-Zehnwirth [7] and Venter [77].
Remark on claims figures. When we speak about claims development triangles (paid or incurred data), these usually contain loss adjustment expenses which can be allocated/attributed to single claims (and are therefore contained in the claims figures). Such expenses are called allocated loss adjustment expenses (ALAE). These are typically expenses for external lawyers, an external expertise, etc. Internal loss adjustment expenses (income of the claims handling department, maintenance of the claims handling system, management fees, etc.) are typically not contained in the claims figures and therefore have to be estimated separately. These costs are called unallocated loss adjustment expenses (ULAE). In the appendix we describe the New York-method (paid-to-paid method), which serves to estimate ULAE. The New York-method is a rather rough method which only works well in stationary situations, so one could think of more sophisticated methods. Since ULAE are usually rather small compared to the other claims payments, the New York-method is often sufficient in practical applications.


Chapter 2
Basic Methods

We start the general discussion on claims reserving with three standard methods:
1. Chain-ladder method
2. Bornhuetter-Ferguson method
3. Poisson model for claim counts
This short chapter serves, on the one hand, illustrative purposes: it gives some ideas of how one can tackle the problem and presents the two easiest methods (the chain-ladder and Bornhuetter-Ferguson methods). On the other hand, one should realize that in practice these are the methods which are used most often (due to their simplicity). The chain-ladder method will be discussed in detail in Chapter 3, the Bornhuetter-Ferguson method in Chapter 4.
We assume that the last development period is given by $J$, i.e. $X_{i,j} = 0$ for $j > J$, and that the last observed accident year is given by $I$ (of course we assume $J \le I$).

2.1 Chain-ladder model (distribution free model)

The chain-ladder model is probably the most popular loss reserving technique. We give different derivations of the chain-ladder model. In this section we give a distribution-free derivation (see Mack [49]); the conditional prediction error of the chain-ladder model will be treated in Chapter 3.
The classical actuarial literature often explains the chain-ladder method as a pure computational algorithm to estimate claims reserves. It was only much later that actuaries started to think about stochastic models which generate the chain-ladder algorithm. The first to come up with a full stochastic model for the chain-ladder method was Mack [49]. In 1993, Mack [49] published one of the most famous articles in claims reserving on the calculation of the standard error in the chain-ladder model.
Model Assumptions 2.1 (Chain-ladder model)
There exist development factors $f_0, \ldots, f_{J-1} > 0$ such that for all $0 \le i \le I$ and all $1 \le j \le J$ we have that
$$E\left[\left. C_{i,j} \right| C_{i,0}, \ldots, C_{i,j-1}\right] = E\left[\left. C_{i,j} \right| C_{i,j-1}\right] = f_{j-1}\, C_{i,j-1}, \qquad (2.1)$$
and different accident years $i$ are independent. □
2
Remarks 2.2
We assume independence of the accident years. We will see below that this
assumption is done in almost all of the methods. It means that we have
already eliminated accounting year effects in the data.
In addition we could also do stronger assumptions for the sequences Ci,0 , Ci,1 , . . .,
namely that they form Markov chains. Moreover, observe that
Ci,j

j1
Y

fl1

(2.2)

l=0

forms a martingale for j 0.


The factors fj are called development factors, chain-ladder factors or age-toage factors. It is the central object of interest in the chain-ladder method.
Lemma 2.3 Let $\mathcal{D}_I = \{C_{i,j};\ i + j \le I,\ 0 \le j \le J\}$ be the set of observations (upper trapezoid). Under Model Assumptions 2.1 we have for all $I - J + 1 \le i \le I$ that
$$E\left[\left. C_{i,J} \right| \mathcal{D}_I\right] = E\left[\left. C_{i,J} \right| C_{i,I-i}\right] = C_{i,I-i}\, f_{I-i} \cdots f_{J-1}. \qquad (2.3)$$
Proof. This is an exercise using conditional expectations:
$$E\left[\left. C_{i,J} \right| C_{i,I-i}\right] = E\left[\left. C_{i,J} \right| \mathcal{D}_I\right] = E\left[\left. E\left[\left. C_{i,J} \right| C_{i,J-1}\right] \right| \mathcal{D}_I\right] = f_{J-1}\, E\left[\left. C_{i,J-1} \right| \mathcal{D}_I\right]. \qquad (2.4)$$
If we iterate this procedure until we reach the diagonal $i + j = I$, we obtain the claim. □

Lemma 2.3 gives an algorithm for estimating the expected ultimate claim given the observations $\mathcal{D}_I$. This algorithm is often called the recursive algorithm. For known chain-ladder factors $f_j$ we estimate the expected outstanding claims liabilities of accident year $i$ based on $\mathcal{D}_I$ by
$$E\left[\left. C_{i,J} \right| \mathcal{D}_I\right] - C_{i,I-i} = C_{i,I-i} \left(f_{I-i} \cdots f_{J-1} - 1\right). \qquad (2.5)$$
This corresponds to the best estimate reserves for accident year $i$ at time $I$ (based on the information $\mathcal{D}_I$). Unfortunately, in most practical applications the chain-ladder factors are not known and need to be estimated. We define
$$j^*(i) = \min\{J,\, I - i\} \qquad \text{and} \qquad i^*(j) = I - j, \qquad (2.6)$$
which denote the last observed indices on the diagonal. The age-to-age factors $f_{j-1}$ are estimated as follows:
$$\hat{f}_{j-1} = \frac{\sum_{k=0}^{i^*(j)} C_{k,j}}{\sum_{k=0}^{i^*(j)} C_{k,j-1}}. \qquad (2.7)$$

Estimator 2.4 (Chain-ladder estimator) The CL estimator for $E[C_{i,j} \mid \mathcal{D}_I]$ is given by
$$\widehat{C}_{i,j}^{CL} = \widehat{E}[C_{i,j} \mid \mathcal{D}_I] = C_{i,I-i}\, \hat{f}_{I-i} \cdots \hat{f}_{j-1} \qquad (2.8)$$
for $i + j > I$.
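The estimation steps (2.7)-(2.8) can be sketched in a few lines of Python; the small cumulative triangle below is illustrative and is not the book's data.

```python
# Sketch of the chain-ladder factors (2.7) and ultimates (2.8).
# Row i of the (illustrative) triangle holds C_{i,0}, ..., C_{i,I-i}.
triangle = [
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 168.0, 185.0],
    [120.0, 175.0],
    [130.0],
]
I = len(triangle) - 1
J = I  # square triangle: development periods 0..J

# Age-to-age factors f_hat[j] = sum_k C_{k,j+1} / sum_k C_{k,j},
# summing over all accident years with both columns observed.
f_hat = []
for j in range(J):
    num = sum(row[j + 1] for row in triangle if len(row) > j + 1)
    den = sum(row[j] for row in triangle if len(row) > j + 1)
    f_hat.append(num / den)

# Chain-ladder ultimates: C_{i,I-i} * f_hat[I-i] * ... * f_hat[J-1].
ultimates = []
for i, row in enumerate(triangle):
    c = row[-1]
    for j in range(I - i, J):
        c *= f_hat[j]
    ultimates.append(c)

# Reserves (2.5): estimated ultimate minus latest observation.
reserves = [u - row[-1] for u, row in zip(ultimates, triangle)]
```

The oldest accident year is fully developed, so its reserve is zero; every younger year picks up one more estimated factor.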
We define (see also Table 2.1)
$$\mathcal{B}_k = \{C_{i,j};\ i + j \le I,\ 0 \le j \le k\} \subseteq \mathcal{D}_I. \qquad (2.9)$$
In fact, we have $\mathcal{B}_J = \mathcal{D}_I$, which is the set of all observations at time $I$.


Lemma 2.5 Under Model Assumptions 2.1 we have that:

a) $\hat{f}_j$ is, given $\mathcal{B}_j$, an unbiased estimator for $f_j$, i.e. $E\big[\hat{f}_j \,\big|\, \mathcal{B}_j\big] = f_j$,

b) $\hat{f}_j$ is (unconditionally) unbiased for $f_j$, i.e. $E\big[\hat{f}_j\big] = f_j$,

c) $\hat{f}_0, \ldots, \hat{f}_{J-1}$ are uncorrelated, i.e. $E\big[\hat{f}_0 \cdots \hat{f}_{J-1}\big] = E\big[\hat{f}_0\big] \cdots E\big[\hat{f}_{J-1}\big]$,

d) $\widehat{C}_{i,J}^{CL}$ is, given $C_{i,I-i}$, an unbiased estimator for $E[C_{i,J} \mid \mathcal{D}_I] = E[C_{i,J} \mid C_{i,I-i}]$, i.e. $E\big[\widehat{C}_{i,J}^{CL} \,\big|\, C_{i,I-i}\big] = E[C_{i,J} \mid \mathcal{D}_I]$, and

 accident              number of reported claims, non-cumulative according to reporting delay
   year                                  reporting delay
           0     1    2    3    4    5    6   7   8   9   10
    0    368   191   28    8    6    5    3   1   0   0    1
    1    393   151   25    6    4    5    4   1   2   1    0
    2    517   185   29   17   11   10    8   1   0   0    1
    3    578   254   49   22   17    6    3   0   1   0    0
    4    622   206   39   16    3    7    0   1   0   0    0
    5    660   243   28   12   12    4    4   1   0   0    0
    6    666   234   53   10    8    4    6   1   0   0    0
    7    573   266   62   12    5    7    6   5   1   0    1
    8    582   281   32   27   12   13    6   2   1   0
    9    545   220   43   18   12    9    5   2   0
   10    509   266   49   22   15    4    8   0
   11    589   210   29   17   12    4    9
   12    564   196   23   12    9    5
   13    607   203   29    9    7
   14    674   169   20   12
   15    619   190   41
   16    660   161
   17    660

Table 2.1: The set $\mathcal{B}_3$


e) $\widehat{C}_{i,J}^{CL}$ is (unconditionally) unbiased for $E[C_{i,J}]$, i.e. $E\big[\widehat{C}_{i,J}^{CL}\big] = E[C_{i,J}]$.

At first sight, the uncorrelatedness of the $\hat{f}_j$ is surprising, since neighboring estimators of the age-to-age factors depend on the same data.
Proof of Lemma 2.5. a) We have
$$E\big[\hat{f}_{j-1} \,\big|\, \mathcal{B}_{j-1}\big] = \frac{\sum_{k=0}^{i^*(j)} E[C_{k,j} \mid \mathcal{B}_{j-1}]}{\sum_{k=0}^{i^*(j)} C_{k,j-1}} = \frac{\sum_{k=0}^{i^*(j)} C_{k,j-1}\, f_{j-1}}{\sum_{k=0}^{i^*(j)} C_{k,j-1}} = f_{j-1}. \qquad (2.10)$$
This immediately implies the conditional unbiasedness.
b) Follows immediately from a).
c) For the uncorrelatedness of the estimators we have for $j < k$
$$E\big[\hat{f}_j\, \hat{f}_k\big] = E\Big[E\big[\hat{f}_j\, \hat{f}_k \,\big|\, \mathcal{B}_k\big]\Big] = E\Big[\hat{f}_j\, E\big[\hat{f}_k \,\big|\, \mathcal{B}_k\big]\Big] = E\big[\hat{f}_j\, f_k\big] = f_j\, f_k, \qquad (2.11)$$
which implies the claim.
d) For the unbiasedness of the chain-ladder estimator we have
$$E\big[\widehat{C}_{i,J}^{CL} \,\big|\, C_{i,I-i}\big] = E\big[C_{i,I-i}\, \hat{f}_{I-i} \cdots \hat{f}_{J-1} \,\big|\, C_{i,I-i}\big] = E\big[C_{i,I-i}\, \hat{f}_{I-i} \cdots \hat{f}_{J-2}\, E[\hat{f}_{J-1} \mid \mathcal{B}_{J-1}] \,\big|\, C_{i,I-i}\big] = f_{J-1}\, E\big[\widehat{C}_{i,J-1}^{CL} \,\big|\, C_{i,I-i}\big]. \qquad (2.12)$$
Iteration of this procedure leads to
$$E\big[\widehat{C}_{i,J}^{CL} \,\big|\, C_{i,I-i}\big] = C_{i,I-i}\, f_{I-i} \cdots f_{J-1} = E[C_{i,J} \mid \mathcal{D}_I]. \qquad (2.13)$$
e) Follows immediately from d).

This finishes the proof of this lemma. □
Remarks 2.6

• Observe that we have proved in Lemma 2.5 that the estimators $\hat{f}_j$ are uncorrelated. But pay attention to the fact that they are not independent. In fact, the squares of two successive estimators $\hat{f}_j$ and $\hat{f}_{j+1}$ are negatively correlated (see also Lemma 3.8 below). It is also this negative correlation which will lead to quite some discussion about estimation errors of our parameter estimates.

• Observe that Lemma 2.5 d) shows that we obtain unbiased estimators for the best estimate reserves $E[C_{i,J} \mid \mathcal{D}_I]$.

Let us finish this section with an example.

Example 2.7 (Chain-ladder method)

 i\j        0        1         2         3         4         5         6         7         8         9
 0    5946975  9668212  10563929  10771690  10978394  11040518  11106331  11121181  11132310  11148124
 1    6346756  9593162  10316383  10468180  10536004  10572608  10625360  10636546  10648192
 2    6269090  9245313  10092366  10355134  10507837  10573282  10626827  10635751
 3    5863015  8546239   9268771   9459424   9592399   9680740   9724068
 4    5778885  8524114   9178009   9451404   9681692   9786916
 5    6184793  9013132   9585897   9830796   9935753
 6    5600184  8493391   9056505   9282022
 7    5288066  7728169   8256211
 8    5290793  7648729
 9    5675568
 f̂_j   1.4925   1.0778    1.0229    1.0148    1.0070    1.0051    1.0011    1.0010    1.0014

Table 2.2: Observed historical cumulative payments $C_{i,j}$ and estimated chain-ladder factors $\hat{f}_j$

 i    $\widehat{C}_{i,J}^{CL}$    reserves
 0       11148124           0
 1       10663318       15126
 2       10662008       26257
 3        9758606       34538
 4        9872218       85302
 5       10092247      156494
 6        9568143      286121
 7        8705378      449167
 8        8691971     1043242
 9        9626383     3950815
 Total                6047061

Table 2.3: Estimated cumulative chain-ladder payments $\widehat{C}_{i,j}^{CL}$ and estimated chain-ladder reserves $\widehat{C}_{i,J}^{CL} - C_{i,I-i}$
2.2 The Bornhuetter-Ferguson method

The Bornhuetter-Ferguson method is in general a very robust method, since it does not consider outliers in the observations. We will further comment on this in Chapter 4. The method goes back to Bornhuetter-Ferguson [10], who published it in 1972 in an article called "The actuary and IBNR". The Bornhuetter-Ferguson method is usually understood as a pure algorithm to estimate reserves (this is also the way it was published in [10]). There are several possibilities to define an appropriate underlying stochastic model which motivates the BF method. Straightforward are, for example, the following assumptions:
Model Assumptions 2.8

• Different accident years $i$ are independent.

• There exist parameters $\mu_0, \ldots, \mu_I > 0$ and a pattern $\beta_0, \ldots, \beta_J > 0$ with $\beta_J = 1$ such that for all $i \in \{0, \ldots, I\}$, $j \in \{0, \ldots, J-1\}$ and $k \in \{1, \ldots, J-j\}$
$$E[C_{i,0}] = \beta_0\, \mu_i, \qquad E[C_{i,j+k} \mid C_{i,0}, \ldots, C_{i,j}] = C_{i,j} + \mu_i\, (\beta_{j+k} - \beta_j). \qquad (2.14)$$

□

Then we have $E[C_{i,j}] = \beta_j\, \mu_i$ and $E[C_{i,J}] = \mu_i$. The sequence $(\beta_j)_j$ denotes the claims development pattern. If $C_{i,j}$ are cumulative payments, then $\beta_j$ is the expected cumulative cashflow pattern (also called payout pattern). Such a pattern is often used when one needs to build market-consistent/discounted reserves, where time values differ over time (see also Subsection 1.1.2 on inflation).
From this discussion we see that Model Assumptions 2.8 imply the following model assumptions.
Model Assumptions 2.9

• Different accident years $i$ are independent.

• There exist parameters $\mu_0, \ldots, \mu_I > 0$ and a pattern $\beta_0, \ldots, \beta_J > 0$ with $\beta_J = 1$ such that for all $i \in \{0, \ldots, I\}$ and $j \in \{0, \ldots, J-1\}$
$$E[C_{i,j}] = \beta_j\, \mu_i. \qquad (2.15)$$

□

Often the Bornhuetter-Ferguson method is explained with the help of Model Assumptions 2.9 (see e.g. Radtke-Schmidt [63], pages 37ff.). However, with Model Assumptions 2.9 we face some difficulties: Observe that
$$E[C_{i,J} \mid \mathcal{D}_I] = E[C_{i,J} \mid C_{i,0}, \ldots, C_{i,I-i}] = C_{i,I-i} + E[C_{i,J} - C_{i,I-i} \mid C_{i,0}, \ldots, C_{i,I-i}]. \qquad (2.16)$$
Without additional assumptions we do not know exactly what we should do with this last term. If we knew that the incremental payment $C_{i,J} - C_{i,I-i}$ is independent of $C_{i,0}, \ldots, C_{i,I-i}$, this would imply that
$$E[C_{i,J} \mid \mathcal{D}_I] = C_{i,I-i} + (1 - \beta_{I-i})\, \mu_i, \qquad (2.17)$$
which also comes out of Model Assumptions 2.8.


In both sets of model assumptions it remains to estimate the last term in (2.16)-(2.17). In the Bornhuetter-Ferguson method this is done as follows.

Estimator 2.10 (Bornhuetter-Ferguson estimator) The BF estimator is given by
$$\widehat{C}_{i,J}^{BF} = \widehat{E}[C_{i,J} \mid \mathcal{D}_I] = C_{i,I-i} + \big(1 - \hat{\beta}_{I-i}\big)\, \hat{\mu}_i \qquad (2.18)$$
for $I - J + 1 \le i \le I$, where $\hat{\beta}_{I-i}$ is an estimate for $\beta_{I-i}$ and $\hat{\mu}_i$ is an a priori estimate for $E[C_{i,J}]$.
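Estimator 2.10 is a one-line computation; a minimal sketch, where the function name `bf_ultimate` and its three inputs (latest observed cumulative claim, estimated pattern value, a priori ultimate) are illustrative choices and not prescribed by the text:

```python
def bf_ultimate(c_latest, beta_latest, mu_prior):
    """Bornhuetter-Ferguson ultimate (2.18):
    C_{i,I-i} + (1 - beta_hat_{I-i}) * mu_hat_i."""
    return c_latest + (1.0 - beta_latest) * mu_prior
```

For example, with latest cumulative payments of 8000, an estimated pattern value of 0.9 (90% assumed paid) and an a priori ultimate of 10000, the BF ultimate is 8000 + 0.1 · 10000 = 9000; a fully developed year (pattern value 1) keeps its observed value.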
Comparison of Bornhuetter-Ferguson and chain-ladder estimator. From the Chain-ladder Assumptions 2.1 we have that
$$E[C_{i,j}] = E\big[E[C_{i,j} \mid C_{i,j-1}]\big] = f_{j-1}\, E[C_{i,j-1}] = E[C_{i,0}] \prod_{k=0}^{j-1} f_k, \qquad (2.19)$$
$$E[C_{i,J}] = E[C_{i,0}] \prod_{k=0}^{J-1} f_k, \qquad (2.20)$$
which implies
$$E[C_{i,j}] = \prod_{k=j}^{J-1} f_k^{-1}\; E[C_{i,J}]. \qquad (2.21)$$
If we compare this to the Bornhuetter-Ferguson model (Model Assumptions 2.9), $E[C_{i,j}] = \beta_j\, \mu_i$, we find that
$$\prod_{k=j}^{J-1} f_k^{-1} \qquad \text{plays the role of} \qquad \beta_j, \qquad (2.22)$$
since $\prod_{k=j}^{J-1} f_k^{-1}$ describes the proportion of $\mu_i = E[C_{i,J}]$ already paid after $j$ development periods in the chain-ladder model. Therefore the two quantities in (2.22) are often identified: this can be done with Model Assumptions 2.9, but not with Model Assumptions 2.8 (since Model Assumptions 2.8 are neither implied by the chain-ladder assumptions nor vice versa). I.e., if one knows the chain-ladder factors $f_j$, one constructs a development pattern $\beta_j$ using the identity in (2.22), and vice versa. Then the Bornhuetter-Ferguson estimator can be rewritten as follows:
$$\widehat{C}_{i,J}^{BF} = C_{i,I-i} + \Bigg(1 - \frac{1}{\prod_{j=I-i}^{J-1} \hat{f}_j}\Bigg)\, \hat{\mu}_i. \qquad (2.23)$$
On the other hand we have for the chain-ladder estimator that
$$\widehat{C}_{i,J}^{CL} = C_{i,I-i} \prod_{j=I-i}^{J-1} \hat{f}_j = C_{i,I-i} + C_{i,I-i} \Bigg(\prod_{j=I-i}^{J-1} \hat{f}_j - 1\Bigg) = C_{i,I-i} + \frac{\widehat{C}_{i,J}^{CL}}{\prod_{j=I-i}^{J-1} \hat{f}_j} \Bigg(\prod_{j=I-i}^{J-1} \hat{f}_j - 1\Bigg) = C_{i,I-i} + \Bigg(1 - \frac{1}{\prod_{j=I-i}^{J-1} \hat{f}_j}\Bigg)\, \widehat{C}_{i,J}^{CL}. \qquad (2.24)$$
Hence the difference between the Bornhuetter-Ferguson method and the chain-ladder method is that in the Bornhuetter-Ferguson method we completely believe in our a priori estimate $\hat{\mu}_i$, whereas in the chain-ladder method the a priori estimate is replaced by an estimate $\widehat{C}_{i,J}^{CL}$ which comes entirely from the observations.
Parameter estimation.

• For $\mu_i$ we need an a priori estimate $\hat{\mu}_i$. This is often a plan value from a strategic business plan. This value is estimated before one has any observations, i.e. it is a pure a priori estimate.

• For the still-to-come factor $(1 - \beta_{I-i})$ one should also use an a priori estimate if one applies the Bornhuetter-Ferguson method strictly. This should be done independently from the observations. In most practical applications one leaves the path of the pure Bornhuetter-Ferguson method at this point and estimates the still-to-come factor from the data with the chain-ladder estimators: if $\hat{f}_k$ denote the chain-ladder estimators (2.7) (see also (2.22)), then we set
$$\hat{\beta}_j^{(CL)} = \hat{\beta}_j = \frac{1}{\prod_{k=j}^{J-1} \hat{f}_k} = \prod_{k=j}^{J-1} \frac{1}{\hat{f}_k}. \qquad (2.25)$$
In that case the Bornhuetter-Ferguson method and the chain-ladder method differ only in the choice of the estimator for the ultimate claim, i.e. a priori estimate vs. chain-ladder estimate (see (2.23) and (2.24)).
Example 2.11 (Bornhuetter-Ferguson method)
We revisit the example given in Table 2.2 (see Example 2.7).

      a priori                        estimator        estimator         BF          CL
 i    estimate $\hat{\mu}_i$   $\hat{\beta}_{I-i}^{(CL)}$   $\widehat{C}_{i,J}^{BF}$   $\widehat{C}_{i,J}^{CL}$   reserves    reserves
 0     11653101      100.0%     11148124    11148124
 1     11367306       99.9%     10664316    10663318      16124       15126
 2     10962965       99.8%     10662749    10662008      26998       26257
 3     10616762       99.6%      9761643     9758606      37575       34538
 4     11044881       99.1%      9882350     9872218      95434       85302
 5     11480700       98.4%     10113777    10092247     178024      156494
 6     11413572       97.0%      9623328     9568143     341305      286121
 7     11126527       94.8%      8830301     8705378     574089      449167
 8     10986548       88.0%      8967375     8691971    1318646     1043242
 9     11618437       59.0%     10443953     9626383    4768384     3950815
 Total                                                  7356580     6047061

Table 2.4: Claims reserves from the Bornhuetter-Ferguson method and the chain-ladder method

We already see in this example that using different methods can lead to substantial differences in the claims reserves.

2.3 Number of IBNyR claims, Poisson model

We close this chapter with the Poisson model, which is mainly used for claim counts. The remarkable thing about the Poisson model is that it leads to the same reserves as the chain-ladder model (see Lemma 2.16). It was Mack [48], Appendix A, who first proved that the chain-ladder reserves are the maximum likelihood reserves for the Poisson model.

Model Assumptions 2.12 (Poisson model)
There exist parameters $\mu_0, \ldots, \mu_I > 0$ and $\gamma_0, \ldots, \gamma_J > 0$ such that the incremental quantities $X_{i,j}$ are independent Poisson distributed with
$$E[X_{i,j}] = \mu_i\, \gamma_j \qquad (2.26)$$
for all $i \le I$ and $j \le J$, and $\sum_{j=0}^{J} \gamma_j = 1$.

□

For the definition of the Poisson distribution we refer to the appendix, Section B.1.2.
The cumulative quantity in accident year $i$, $C_{i,J}$, is again Poisson distributed with
$$E[C_{i,J}] = \mu_i. \qquad (2.27)$$
Hence, $\mu_i$ is a parameter that stands for the expected number of claims in accident year $i$ (exposure), whereas $\gamma_j$ defines an expected reporting/cashflow pattern over the different development periods $j$. Moreover we have
$$\frac{E[X_{i,j}]}{E[X_{i,0}]} = \frac{\gamma_j}{\gamma_0}, \qquad (2.28)$$
which is independent of $i$.
Lemma 2.13 The Poisson model satisfies Model Assumptions 2.8.

Proof. The independence of different accident years follows from the independence of the $X_{i,j}$. Moreover, we have that $E[C_{i,0}] = E[X_{i,0}] = \mu_i\, \gamma_0 = \mu_i\, \beta_0$ with $\beta_0 = \gamma_0$, and
$$E[C_{i,j+k} \mid C_{i,0}, \ldots, C_{i,j}] = C_{i,j} + \sum_{l=1}^{k} E[X_{i,j+l} \mid C_{i,0}, \ldots, C_{i,j}] = C_{i,j} + \mu_i \sum_{l=1}^{k} \gamma_{j+l} = C_{i,j} + \mu_i\, (\beta_{j+k} - \beta_j), \qquad (2.29)$$
with $\beta_j = \sum_{l=0}^{j} \gamma_l$. This finishes the proof. □

To estimate the parameters $(\mu_i)_i$ and $(\gamma_j)_j$ there are different methods; one possibility is to use the maximum likelihood estimators. The likelihood function for $\mathcal{D}_I = \{C_{i,j};\ i + j \le I,\ j \le J\}$ (the $\sigma$-algebra generated by $\mathcal{D}_I$ is the same as the one generated by $\{X_{i,j};\ i + j \le I,\ j \le J\}$) is given by
$$L_{\mathcal{D}_I}(\mu_0, \ldots, \mu_I, \gamma_0, \ldots, \gamma_J) = \prod_{i+j \le I} e^{-\mu_i \gamma_j}\; \frac{(\mu_i\, \gamma_j)^{X_{i,j}}}{X_{i,j}!}. \qquad (2.30)$$
We maximize this likelihood function by setting the $I + J + 2$ partial derivatives of its logarithm w.r.t. the unknown parameters $\mu_i$ and $\gamma_j$ equal to zero. Thus, we obtain on $\mathcal{D}_I$ that
$$\hat{\mu}_i \sum_{j=0}^{(I-i) \wedge J} \hat{\gamma}_j = \sum_{j=0}^{(I-i) \wedge J} X_{i,j} = C_{i,(I-i) \wedge J}, \qquad (2.31)$$
$$\hat{\gamma}_j \sum_{i=0}^{I-j} \hat{\mu}_i = \sum_{i=0}^{I-j} X_{i,j}, \qquad (2.32)$$
for all $i \in \{0, \ldots, I\}$ and all $j \in \{0, \ldots, J\}$, under the constraint that $\sum_j \hat{\gamma}_j = 1$. This system has a unique solution and gives us the ML estimates for $\mu_i$ and $\gamma_j$.
Estimator 2.14 (Poisson ML estimator) The ML estimator in the Poisson Model 2.12 is for $i + j > I$ given by
$$\widehat{X}_{i,j}^{Poi} = \widehat{E}[X_{i,j}] = \hat{\mu}_i\, \hat{\gamma}_j, \qquad (2.33)$$
$$\widehat{C}_{i,J}^{Poi} = \widehat{E}[C_{i,J} \mid \mathcal{D}_I] = C_{i,I-i} + \sum_{j=I-i+1}^{J} \widehat{X}_{i,j}^{Poi}. \qquad (2.34)$$
Observe that
$$\widehat{C}_{i,J}^{Poi} = C_{i,I-i} + \Bigg(1 - \sum_{j=0}^{I-i} \hat{\gamma}_j\Bigg)\, \hat{\mu}_i; \qquad (2.35)$$
hence the Poisson ML estimator has the same form as the BF Estimator 2.10. However, here we use estimates for $\mu_i$ and $\gamma_j$ that depend on the data.
Example 2.15 (Poisson ML estimator)
We revisit the example given in Table 2.2 (see Example 2.7).

 i         0        1        2       3       4       5       6      7      8      9
 0   5946975  3721237  895717  207760  206704   62124  65813  14850  11130  15813
 1   6346756  3246406  723222  151797   67824   36603  52752  11186  11646
 2   6269090  2976223  847053  262768  152703   65444  53545   8924
 3   5863015  2683224  722532  190653  132976   88340  43329
 4   5778885  2745229  653894  273395  230288  105224
 5   6184793  2828338  572765  244899  104957
 6   5600184  2893207  563114  225517
 7   5288066  2440103  528043
 8   5290793  2357936
 9   5675568

Table 2.5: Observed historical incremental payments $X_{i,j}$

 $\hat{\gamma}_j$ ($j = 0, \ldots, 9$): 58.96%, 29.04%, 6.84%, 2.17%, 1.44%, 0.69%, 0.51%, 0.11%, 0.10%, 0.14%
 $\hat{\mu}_i$ ($i = 0, \ldots, 9$): 11148124, 10663318, 10662008, 9758606, 9872218, 10092247, 9568143, 8705378, 8691972, 9626383
 estimated reserves ($i = 1, \ldots, 9$): 15126, 26257, 34538, 85302, 156494, 286121, 449167, 1043242, 3950815; total 6047061

Table 2.6: Estimated $\hat{\mu}_i$, $\hat{\gamma}_j$, incremental payments $\widehat{X}_{i,j}^{Poi}$ and Poisson reserves
Remark. The expected reserve is the same as in the chain-ladder model on cumulative data (see Lemma 2.16 below).

2.3.1 Poisson derivation of the chain-ladder model

In this subsection we show that the Poisson model (Section 2.3) leads to the chain-ladder estimate for the reserves.

Lemma 2.16 The Chain-ladder Estimator 2.4 and the ML Estimator 2.14 in the Poisson model 2.12 lead to the same reserve.

In fact, the Poisson ML model/estimate defined in Section 2.3 leads to a chain-ladder model (see formula (2.39)); moreover, the ML estimators lead to estimators for the age-to-age factors which are the same as in the distribution-free chain-ladder model.
Proof. In the Poisson model 2.12 the estimate for $E[C_{i,j} \mid C_{i,j-1}]$ is given by
$$\hat{\mu}_i\, \hat{\gamma}_j + C_{i,j-1}. \qquad (2.36)$$
If we iterate this procedure we obtain for $i > I - J$
$$\widehat{C}_{i,J}^{Poi} = \widehat{E}[C_{i,J} \mid \mathcal{D}_I] = \hat{\mu}_i \sum_{j=j^*(i)+1}^{J} \hat{\gamma}_j + C_{i,I-i} = \hat{\mu}_i \sum_{j=j^*(i)+1}^{J} \hat{\gamma}_j + \sum_{j=0}^{j^*(i)} X_{i,j} = \hat{\mu}_i \sum_{j=0}^{J} \hat{\gamma}_j, \qquad (2.37)$$
where in the last step we have used (2.31). Using (2.31) once more we find that
$$\widehat{C}_{i,J}^{Poi} = \widehat{E}[C_{i,J} \mid \mathcal{D}_I] = C_{i,I-i}\; \frac{\sum_{j=0}^{J} \hat{\gamma}_j}{\sum_{j=0}^{j^*(i)} \hat{\gamma}_j}. \qquad (2.38)$$
This last formula can be rewritten by introducing additional factors:
$$\widehat{C}_{i,J}^{Poi} = C_{i,I-i}\; \frac{\sum_{j=0}^{J} \hat{\gamma}_j}{\sum_{j=0}^{j^*(i)} \hat{\gamma}_j} = C_{i,I-i}\; \frac{\sum_{j=0}^{j^*(i)+1} \hat{\gamma}_j}{\sum_{j=0}^{j^*(i)} \hat{\gamma}_j} \cdots \frac{\sum_{j=0}^{J} \hat{\gamma}_j}{\sum_{j=0}^{J-1} \hat{\gamma}_j}. \qquad (2.39)$$
If we use Lemma 2.17 below we see that on $\mathcal{D}_I$ we have that
$$\sum_{i=0}^{I-j} C_{i,j \wedge J} = \sum_{i=0}^{I-j} \hat{\mu}_i \sum_{k=0}^{j \wedge J} \hat{\gamma}_k. \qquad (2.40)$$

Moreover, using (2.32),
$$\sum_{i=0}^{I-j} C_{i,(j-1) \wedge J} = \sum_{i=0}^{I-j} \big(C_{i,j \wedge J} - X_{i,j}\, 1_{\{j \le J\}}\big) = \sum_{i=0}^{I-j} \hat{\mu}_i \sum_{k=0}^{(j-1) \wedge J} \hat{\gamma}_k. \qquad (2.41)$$
But (2.40)-(2.41) immediately imply for $j \le J$ that
$$\frac{\sum_{k=0}^{j} \hat{\gamma}_k}{\sum_{k=0}^{j-1} \hat{\gamma}_k} = \frac{\sum_{i=0}^{I-j} C_{i,j}}{\sum_{i=0}^{I-j} C_{i,j-1}} = \hat{f}_{j-1}. \qquad (2.42)$$

Hence from (2.39) we obtain
$$\widehat{C}_{i,J}^{Poi} = C_{i,I-i}\; \frac{\sum_{k=0}^{I-(j^*(i)+1)} C_{k,j^*(i)+1}}{\sum_{k=0}^{I-(j^*(i)+1)} C_{k,j^*(i)}} \cdots \frac{\sum_{k=0}^{I-J} C_{k,J}}{\sum_{k=0}^{I-J} C_{k,J-1}} = C_{i,I-i}\, \hat{f}_{I-i} \cdots \hat{f}_{J-1} = \widehat{C}_{i,J}^{CL}, \qquad (2.43)$$
which is the chain-ladder estimate (2.8). This finishes the proof of Lemma 2.16. □
Lemma 2.17 Under Model Assumptions 2.12 we have on $\mathcal{D}_I$ that
$$\sum_{i=0}^{I-j} C_{i,j \wedge J} = \sum_{i=0}^{I-j} \hat{\mu}_i \sum_{k=0}^{j \wedge J} \hat{\gamma}_k. \qquad (2.44)$$

Proof. We prove this by induction. Using (2.31) for $i = 0$ we have that
$$C_{0,I \wedge J} = \sum_{j=0}^{I \wedge J} X_{0,j} = \hat{\mu}_0 \sum_{j=0}^{I \wedge J} \hat{\gamma}_j, \qquad (2.45)$$
which is the starting point of our induction ($j = I$). Induction step $j \to j - 1$ (using (2.31)-(2.32)); in the last step we use the induction assumption:
$$\sum_{i=0}^{I-(j-1)} C_{i,(j-1) \wedge J} = \sum_{i=0}^{I-j} C_{i,j \wedge J} - \sum_{i=0}^{I-j} X_{i,j}\, 1_{\{j \le J\}} + C_{I-j+1,(j-1) \wedge J} = \sum_{i=0}^{I-j} \hat{\mu}_i \sum_{k=0}^{j \wedge J} \hat{\gamma}_k - \hat{\gamma}_j\, 1_{\{j \le J\}} \sum_{i=0}^{I-j} \hat{\mu}_i + \hat{\mu}_{I-j+1} \sum_{k=0}^{(j-1) \wedge J} \hat{\gamma}_k, \qquad (2.46)$$
where we have used (2.32) for the middle term and $C_{I-j+1,(j-1) \wedge J} = \sum_{k=0}^{(j-1) \wedge J} X_{I-j+1,k}$ together with (2.31) for the last term. Hence we find that
$$\sum_{i=0}^{I-(j-1)} C_{i,(j-1) \wedge J} = \sum_{i=0}^{I-j} \hat{\mu}_i \sum_{k=0}^{(j-1) \wedge J} \hat{\gamma}_k + \hat{\mu}_{I-j+1} \sum_{k=0}^{(j-1) \wedge J} \hat{\gamma}_k = \sum_{i=0}^{I-(j-1)} \hat{\mu}_i \sum_{k=0}^{(j-1) \wedge J} \hat{\gamma}_k, \qquad (2.47)$$
which proves the claim (2.44). □
Corollary 2.18 Under Model Assumptions 2.12 we have for all $j \in \{0, \ldots, J\}$ that (see also (2.25))
$$\sum_{k=0}^{j} \hat{\gamma}_k = \hat{\beta}_j^{(CL)} = \prod_{k=j}^{J-1} \frac{1}{\hat{f}_k}. \qquad (2.48)$$
Proof. From (2.38) and (2.43) we obtain for all $i \ge I - J$
$$C_{i,I-i}\; \frac{\sum_{j=0}^{J} \hat{\gamma}_j}{\sum_{j=0}^{I-i} \hat{\gamma}_j} = \widehat{C}_{i,J}^{Poi} = \widehat{C}_{i,J}^{CL} = C_{i,I-i}\, \hat{f}_{I-i} \cdots \hat{f}_{J-1}. \qquad (2.49)$$
Since $\sum_j \hat{\gamma}_j = 1$ is normalized we obtain that
$$1 = \sum_{j=0}^{I-i} \hat{\gamma}_j \prod_{j=I-i}^{J-1} \hat{f}_j = \sum_{j=0}^{I-i} \hat{\gamma}_j\; \Big(\hat{\beta}_{I-i}^{(CL)}\Big)^{-1}, \qquad (2.50)$$
which proves the claim. □
Remarks 2.19

• Corollary 2.18 says that the chain-ladder method and the Poisson model lead to the same cashflow pattern $\hat{\beta}_j^{(CL)}$ (and hence to the same Bornhuetter-Ferguson reserve if we use this cashflow pattern for the estimate of $\beta_j$). Henceforth, if we use the cashflow pattern $\hat{\beta}_j^{(CL)}$ for the BF method, the BF method and the Poisson model only differ in the choice of the expected ultimate claim $\mu_i$, since with (2.35) we obtain that
$$\widehat{C}_{i,J}^{Poi} = C_{i,I-i} + \Big(1 - \hat{\beta}_{I-i}^{(CL)}\Big)\, \hat{\mu}_i, \qquad (2.51)$$
where $\hat{\mu}_i$ is the ML estimate given in (2.31)-(2.32).

• Observe that we have to solve a system of linear equations (2.31)-(2.32) to obtain the ML estimates $\hat{\mu}_i$ and $\hat{\gamma}_j$. The solution can easily be obtained with the help of the chain-ladder factors $\hat{f}_j$ (see Corollary 2.18), namely
$$\hat{\gamma}_l = \hat{\beta}_l^{(CL)} - \hat{\beta}_{l-1}^{(CL)} = \prod_{k=l}^{J-1} \frac{1}{\hat{f}_k}\; \big(1 - 1/\hat{f}_{l-1}\big), \qquad (2.52)$$
and
$$\hat{\mu}_i = \sum_{j=0}^{(I-i) \wedge J} X_{i,j} \bigg/ \sum_{j=0}^{(I-i) \wedge J} \hat{\gamma}_j. \qquad (2.53)$$
Below we will see other ML methods and GLM models where the solution of the equations is more complicated, and where one applies algorithmic methods to find numerical solutions.
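The explicit solution (2.52)-(2.53) can be sketched in a few lines; the factors `f_hat` and the diagonal observations `latest` (holding $C_{i,I-i}$) are illustrative inputs, and $\hat{\mu}_i$ is computed as $C_{i,I-i} / \hat{\beta}_{I-i}^{(CL)}$, which agrees with (2.53) via (2.31):

```python
# Illustrative estimated chain-ladder factors f_hat[0..J-1]
# and latest diagonal observations C_{i,I-i}.
f_hat = [1.5, 1.1, 1.03]
latest = [170.0, 185.0, 175.0, 130.0]
J = len(f_hat)
I = len(latest) - 1

# beta_hat[j] = prod_{k=j}^{J-1} 1/f_hat[k], beta_hat[J] = 1  (Corollary 2.18)
beta_hat = [1.0] * (J + 1)
for j in range(J - 1, -1, -1):
    beta_hat[j] = beta_hat[j + 1] / f_hat[j]

# gamma_hat[l] = beta_hat[l] - beta_hat[l-1]  (2.52); gamma_hat[0] = beta_hat[0]
gamma_hat = [beta_hat[0]] + [beta_hat[l] - beta_hat[l - 1] for l in range(1, J + 1)]

# mu_hat[i] = C_{i,I-i} / beta_hat[I-i]  (equivalent to (2.53) by (2.31))
mu_hat = [latest[i] / beta_hat[I - i] for i in range(I + 1)]
```

The pattern $\hat{\gamma}$ telescopes to $\hat{\beta}_J = 1$, so the normalization constraint holds automatically.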


Chapter 3

Chain-ladder models

3.1 Mean square error of prediction

In the previous section we have only given an estimate for the mean/expected ultimate claim; of course we would also like to know how good this estimate is. To measure the quality of the estimate we consider second moments.
Assume that we have a random variable $X$ and a set of observations $\mathcal{D}$. Assume that $\widehat{X}$ is a $\mathcal{D}$-measurable estimator for $E[X \mid \mathcal{D}]$.

Definition 3.1 (Conditional Mean Square Error of Prediction) The conditional mean square error of prediction of the estimator $\widehat{X}$ is defined by
$$\mathrm{msep}_{X \mid \mathcal{D}}\big(\widehat{X}\big) = E\Big[\big(\widehat{X} - X\big)^2 \,\Big|\, \mathcal{D}\Big]. \qquad (3.1)$$
For a $\mathcal{D}$-measurable estimator $\widehat{X}$ we have
$$\mathrm{msep}_{X \mid \mathcal{D}}\big(\widehat{X}\big) = \mathrm{Var}(X \mid \mathcal{D}) + \big(\widehat{X} - E[X \mid \mathcal{D}]\big)^2. \qquad (3.2)$$

The first term on the right-hand side of (3.2) is the so-called process variance
(stochastic error), i.e. the variance which is within the stochastic model (pure
randomness which can not be eliminated). The second term on the right-hand
side of (3.2) is the parameter/estimation error. It reflects the uncertainty in the
estimation of the parameters and of the expectation, respectively. In general,
this estimation error becomes smaller the more observations we have. But pay
attention: In many practical situations it does not completely disappear, since we
try to predict future values with the help of past information, already a slight
change in the model over time causes lots of problems (this is also discussed below
in Section 3.3).
For the estimation error we would like to explicitly calculate the last term in (3.2).
However, this can only be done if E [X|D] is known, but of course this term is in
39

40

Chapter 3. Chain-ladder models

b Therefore, the derivation


general not known (we estimate it with the help of X).
of an estimate for the parameter error is more sophisticated: One is interested into
b therefore one studies the possible fluctuations of X
b
the quality of the estimate X,
around E [X|D].
• Case 1. We assume that $X$ is independent of $\mathcal{D}$. This is e.g. the case if we have i.i.d. experiments where we want to estimate their average outcome. In that case we have that
$$E[X \mid \mathcal{D}] = E[X] \qquad \text{and} \qquad \mathrm{Var}(X \mid \mathcal{D}) = \mathrm{Var}(X). \qquad (3.3)$$
If we consider the unconditional mean square error of prediction for the estimator $\widehat{X}$ we obtain
$$\mathrm{msep}_X\big(\widehat{X}\big) = E\big[\mathrm{msep}_{X \mid \mathcal{D}}\big(\widehat{X}\big)\big] = \mathrm{Var}(X) + E\Big[\big(\widehat{X} - E[X]\big)^2\Big], \qquad (3.4)$$
and if $\widehat{X}$ is an unbiased estimator for $E[X]$, i.e. $E\big[\widehat{X}\big] = E[X]$, we have
$$\mathrm{msep}_X\big(\widehat{X}\big) = E\big[\mathrm{msep}_{X \mid \mathcal{D}}\big(\widehat{X}\big)\big] = \mathrm{Var}(X) + \mathrm{Var}\big(\widehat{X}\big). \qquad (3.5)$$
Hence the parameter error is estimated by the variance of $\widehat{X}$.
Example. Assume $X$ and $X_1, \ldots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2 < \infty$. Then we have for the estimator $\widehat{X} = \sum_{i=1}^{n} X_i / n$ that
$$\mathrm{msep}_{X \mid \mathcal{D}}\big(\widehat{X}\big) = \sigma^2 + \Bigg(\mu - \frac{1}{n} \sum_{i=1}^{n} X_i\Bigg)^2. \qquad (3.6)$$
By the strong law of large numbers we know that the last term disappears a.s. for $n \to \infty$. In order to determine this term for finite $n$, one would like to explicitly calculate the distance between $\sum_{i=1}^{n} X_i / n$ and $\mu$. However, since in general $\mu$ is not known, we can only give an estimate for that distance. If we calculate the unconditional mean square error of prediction we obtain
$$\mathrm{msep}_X\big(\widehat{X}\big) = \sigma^2 + \sigma^2 / n. \qquad (3.7)$$
Henceforth, we can say that the deviation of $\sum_{i=1}^{n} X_i / n$ around $\mu$ is on average of order $\sigma / \sqrt{n}$. But unfortunately this doesn't tell us anything about the estimation error for a specific realisation of $\sum_{i=1}^{n} X_i / n$. We will further discuss this below.
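A tiny numeric sketch of the decomposition (3.7), with illustrative values for $\sigma^2$ and $n$: the estimation-error part $\sigma^2/n$ quickly becomes negligible relative to the irreducible process variance $\sigma^2$.

```python
# Unconditional MSEP of the sample mean (3.7): sigma^2 + sigma^2 / n.
sigma2, n = 4.0, 100          # illustrative known variance and sample size
msep = sigma2 + sigma2 / n    # total prediction error
process_part = sigma2 / msep  # fraction due to pure process variance
```

Here the process variance accounts for about 99% of the total prediction error already at $n = 100$.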
• Case 2. $X$ is not independent of the observations $\mathcal{D}$. We have several time series examples below where we do not have independence between different observations, e.g. in the distribution-free version of the chain-ladder method. In all these cases the situation is even more complicated. Observe that if we calculate the unconditional mean square error of prediction we obtain
$$\mathrm{msep}_X\big(\widehat{X}\big) = E\big[\mathrm{msep}_{X \mid \mathcal{D}}\big(\widehat{X}\big)\big] = E[\mathrm{Var}(X \mid \mathcal{D})] + E\Big[\big(\widehat{X} - E[X \mid \mathcal{D}]\big)^2\Big] \qquad (3.8)$$
$$= \mathrm{Var}(X) - \mathrm{Var}(E[X \mid \mathcal{D}]) + E\Big[\big(\widehat{X} - E[X \mid \mathcal{D}]\big)^2\Big] = \mathrm{Var}(X) + E\Big[\big(\widehat{X} - E[X]\big)^2\Big] - 2\, E\Big[\big(\widehat{X} - E[X]\big)\big(E[X \mid \mathcal{D}] - E[X]\big)\Big].$$
If the estimator $\widehat{X}$ is unbiased for $E[X]$ we obtain
$$\mathrm{msep}_X\big(\widehat{X}\big) = \mathrm{Var}(X) + \mathrm{Var}\big(\widehat{X}\big) - 2\, \mathrm{Cov}\big(\widehat{X},\, E[X \mid \mathcal{D}]\big). \qquad (3.9)$$
This again tells us something about the average estimation error, but it doesn't tell us anything about the quality of the estimate $\widehat{X}$ for a specific realization.

3.2 Chain-ladder method

We have already described the chain-ladder method in Subsection 2.1. The chain-ladder method can be applied to cumulative payments, to claims incurred, etc. It is the method which is most commonly applied because it is very simple, and, using appropriate estimates for the chain-ladder factors, one often obtains reliable claims reserves.
The main deficiencies of the chain-ladder method are:

• The homogeneity property needs to be satisfied, e.g. we should not have any trends in the development factors (otherwise we have to transform our observations).

• For estimating old development factors ($f_j$ for large $j$) there is only very little data available, which is maybe (in practice) no longer representative for younger accident years. E.g. assume that we have a claims development with $J = 20$ (years), and that $I = 2006$. Hence we estimate with today's information (accident years $< 2006$) what will happen with accident year 2006 in 20 years.

• For young accident years, very much weight is given to the observations, i.e. if we have an outlier on the diagonal, this outlier is projected right up to the ultimate claim, which is not always appropriate. Therefore, for younger accident years, sometimes the Bornhuetter-Ferguson method is preferred (see also the discussion in Subsection 4.2.4).

• In long-tailed branches/lines of business the difference between the chain-ladder method on cumulative payments and on claims incurred is often very large. This is mainly due to the fact that the homogeneity property is not fulfilled. Indeed, if we have new phenomena in the data, claims incurred methods usually overestimate such effects, whereas estimates on paid data underestimate the effects, since we only observe the new behavior over time. This is mainly due to the effect that the claims adjusters usually overestimate new phenomena (which is reflected in the claims incurred figures), whereas in claims paid figures one observes new phenomena only over time (when a claim is settled via payments).

• There is an extensive list of references on how the chain-ladder method should be applied in practice and where future research projects could be settled. We do not further discuss this here but only give two references, [46] and [40], which refer to such questions. Moreover, we mention that there is also literature on the appropriateness of the chain-ladder method for specific data sets, see e.g. Barnett-Zehnwirth [7] and Venter [77].

3.2.1 The Mack model

We define the chain-ladder model once more, but this time we extend the definition to the second moments, so that we are also able to give an estimate for the conditional mean square error of prediction of the chain-ladder estimator.
In the actuarial literature, the chain-ladder method is often understood as a purely computational algorithm, which leaves open the question which probabilistic model would lead to that algorithm. It is Mack's merit [49] to have given a first answer to that question (a first decisive step towards the formulas was done by Schnieper [69]).

Model Assumptions 3.2 (Chain-ladder, Mack [49])

• Different accident years $i$ are independent.

• $(C_{i,j})_j$ is a Markov chain with: there exist factors $f_0, \ldots, f_{J-1} > 0$ and variance parameters $\sigma_0^2, \ldots, \sigma_{J-1}^2 > 0$ such that for all $0 \le i \le I$ and $1 \le j \le J$ we have that
$$E[C_{i,j} \mid C_{i,j-1}] = f_{j-1}\, C_{i,j-1}, \qquad (3.10)$$
$$\mathrm{Var}(C_{i,j} \mid C_{i,j-1}) = \sigma_{j-1}^2\, C_{i,j-1}. \qquad (3.11)$$

□

Remark. In Mack [49] there are slightly weaker assumptions, namely the Markov chain assumption is replaced by weaker assumptions on the first two moments of $(C_{i,j})_j$.
We recall the results from Section 2.1 (see Lemma 2.5):

• Choose the following estimators for the parameters $f_j$ and $\sigma_j^2$:
$$\hat{f}_j = \frac{\sum_{i=0}^{i^*(j+1)} C_{i,j+1}}{\sum_{i=0}^{i^*(j+1)} C_{i,j}}, \qquad \hat{\sigma}_j^2 = \frac{1}{i^*(j+1)} \sum_{i=0}^{i^*(j+1)} C_{i,j} \bigg(\frac{C_{i,j+1}}{C_{i,j}} - \hat{f}_j\bigg)^2. \qquad (3.12)$$

• $\hat{f}_j$ is unconditionally and conditionally, given $\mathcal{B}_j$, unbiased for $f_j$.

• $\hat{f}_0, \ldots, \hat{f}_{J-1}$ are uncorrelated.

If we define the individual development factors by
$$F_{i,j+1} = \frac{C_{i,j+1}}{C_{i,j}}, \qquad (3.13)$$
then the age-to-age factor estimates $\hat{f}_j$ are weighted averages of the $F_{i,j+1}$, namely
$$\hat{f}_j = \sum_{i=0}^{i^*(j+1)} \frac{C_{i,j}}{\sum_{k=0}^{i^*(j+1)} C_{k,j}}\; F_{i,j+1}. \qquad (3.14)$$
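The variance estimator $\hat{\sigma}_j^2$ in (3.12) can be sketched as follows; the small triangle is illustrative (row $i$ holds $C_{i,0}, \ldots, C_{i,I-i}$), and the divisor is the number of observed development factors in column $j$ minus one:

```python
# Illustrative cumulative triangle.
triangle = [
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 168.0, 185.0],
    [120.0, 175.0],
    [130.0],
]

def mack_sigma2(triangle, j):
    """sigma_hat_j^2 = 1/(n-1) * sum_i C_{i,j} (C_{i,j+1}/C_{i,j} - f_hat_j)^2,
    summing over the n accident years observed in columns j and j+1  (3.12)."""
    rows = [r for r in triangle if len(r) > j + 1]
    f_hat = sum(r[j + 1] for r in rows) / sum(r[j] for r in rows)
    n = len(rows)
    return sum(r[j] * (r[j + 1] / r[j] - f_hat) ** 2 for r in rows) / (n - 1)
```

If all individual development factors $F_{i,j+1}$ in a column coincide, the estimate is zero, reflecting the weighted-variance form of (3.12).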

Lemma 3.3 Under Assumptions 3.2 the estimator $\hat{f}_j$ is the $\mathcal{B}_{j+1}$-measurable unbiased estimator for $f_j$ which has minimal conditional variance among all linear combinations of the unbiased estimators $(F_{i,j+1})_{0 \le i \le i^*(j+1)}$ for $f_j$, conditioned on $\mathcal{B}_j$, i.e.
$$\mathrm{Var}\big(\hat{f}_j \,\big|\, \mathcal{B}_j\big) = \min_{\alpha_i \in \mathbb{R}}\; \mathrm{Var}\Bigg(\sum_{i=0}^{i^*(j+1)} \alpha_i\, F_{i,j+1} \,\Bigg|\, \mathcal{B}_j\Bigg). \qquad (3.15)$$
The conditional variance of $\hat{f}_j$ is given by
$$\mathrm{Var}\big(\hat{f}_j \,\big|\, \mathcal{B}_j\big) = \sigma_j^2 \bigg/ \sum_{i=0}^{i^*(j+1)} C_{i,j}. \qquad (3.16)$$
We need the following lemma to prove the statement:


Lemma 3.4 Assume that $P_1, \ldots, P_H$ are stochastically independent unbiased estimators for $\mu$ with variances $\sigma_1^2, \ldots, \sigma_H^2$. Then the minimum variance unbiased linear combination of the $P_h$ is given by
$$P = \frac{\sum_{h=1}^{H} P_h / \sigma_h^2}{\sum_{h=1}^{H} 1 / \sigma_h^2}, \qquad (3.17)$$
with
$$\mathrm{Var}(P) = \Bigg(\sum_{h=1}^{H} 1 / \sigma_h^2\Bigg)^{-1}. \qquad (3.18)$$
Proof. See Proposition 12.1 in Taylor [75] (the proof is based on the method of Lagrange). □
Proof of Lemma 3.3. Consider the individual development factors
$$F_{i,j+1} = \frac{C_{i,j+1}}{C_{i,j}}. \qquad (3.19)$$
Conditioned on $\mathcal{B}_j$, the $F_{i,j+1}$ are unbiased and independent estimators for $f_j$ with
$$\mathrm{Var}(F_{i,j+1} \mid \mathcal{B}_j) = \mathrm{Var}(F_{i,j+1} \mid C_{i,j}) = \sigma_j^2 / C_{i,j}. \qquad (3.20)$$
With Lemma 3.4 the claim immediately follows, with
$$\mathrm{Var}\big(\hat{f}_j \,\big|\, \mathcal{B}_j\big) = \sigma_j^2 \bigg/ \sum_{i=0}^{i^*(j+1)} C_{i,j}. \qquad (3.21)$$
□
Lemma 3.5 Under Assumptions 3.2 we have:

a) $\hat{\sigma}_j^2$ is, given $\mathcal{B}_j$, an unbiased estimator for $\sigma_j^2$, i.e. $E\big[\hat{\sigma}_j^2 \,\big|\, \mathcal{B}_j\big] = \sigma_j^2$,

b) $\hat{\sigma}_j^2$ is (unconditionally) unbiased for $\sigma_j^2$, i.e. $E\big[\hat{\sigma}_j^2\big] = \sigma_j^2$.

Proof. b) easily follows from a). Hence we prove a). Consider
$$E\Bigg[\bigg(\frac{C_{i,k+1}}{C_{i,k}} - \hat{f}_k\bigg)^2 \,\Bigg|\, \mathcal{B}_k\Bigg] = E\Bigg[\bigg(\frac{C_{i,k+1}}{C_{i,k}} - f_k\bigg)^2 \,\Bigg|\, \mathcal{B}_k\Bigg] - 2\, E\bigg[\bigg(\frac{C_{i,k+1}}{C_{i,k}} - f_k\bigg)\big(\hat{f}_k - f_k\big) \,\bigg|\, \mathcal{B}_k\bigg] + E\Big[\big(\hat{f}_k - f_k\big)^2 \,\Big|\, \mathcal{B}_k\Big]. \qquad (3.22)$$
Hence we calculate the terms on the right-hand side of the equality above:
$$E\Bigg[\bigg(\frac{C_{i,k+1}}{C_{i,k}} - f_k\bigg)^2 \,\Bigg|\, \mathcal{B}_k\Bigg] = \mathrm{Var}\bigg(\frac{C_{i,k+1}}{C_{i,k}} \,\bigg|\, \mathcal{B}_k\bigg) = \frac{1}{C_{i,k}}\, \sigma_k^2. \qquad (3.23)$$
The next term is (using the independence of different accident years)
$$E\bigg[\bigg(\frac{C_{i,k+1}}{C_{i,k}} - f_k\bigg)\big(\hat{f}_k - f_k\big) \,\bigg|\, \mathcal{B}_k\bigg] = \mathrm{Cov}\bigg(\frac{C_{i,k+1}}{C_{i,k}},\, \hat{f}_k \,\bigg|\, \mathcal{B}_k\bigg) = \frac{C_{i,k}}{\sum_{l=0}^{i^*(k+1)} C_{l,k}}\, \mathrm{Var}\bigg(\frac{C_{i,k+1}}{C_{i,k}} \,\bigg|\, \mathcal{B}_k\bigg) = \frac{\sigma_k^2}{\sum_{l=0}^{i^*(k+1)} C_{l,k}}, \qquad (3.24)$$
whereas for the last term we obtain
$$E\Big[\big(\hat{f}_k - f_k\big)^2 \,\Big|\, \mathcal{B}_k\Big] = \mathrm{Var}\big(\hat{f}_k \,\big|\, \mathcal{B}_k\big) = \frac{\sigma_k^2}{\sum_{l=0}^{i^*(k+1)} C_{l,k}}. \qquad (3.25)$$
Putting all this together gives
$$E\Bigg[\bigg(\frac{C_{i,k+1}}{C_{i,k}} - \hat{f}_k\bigg)^2 \,\Bigg|\, \mathcal{B}_k\Bigg] = \sigma_k^2 \Bigg(\frac{1}{C_{i,k}} - \frac{1}{\sum_{l=0}^{i^*(k+1)} C_{l,k}}\Bigg). \qquad (3.26)$$
Hence we have that
$$E\big[\hat{\sigma}_k^2 \,\big|\, \mathcal{B}_k\big] = \frac{1}{i^*(k+1)} \sum_{i=0}^{i^*(k+1)} C_{i,k}\; E\Bigg[\bigg(\frac{C_{i,k+1}}{C_{i,k}} - \hat{f}_k\bigg)^2 \,\Bigg|\, \mathcal{B}_k\Bigg] = \sigma_k^2, \qquad (3.27)$$
which proves claim a). This finishes the proof of Lemma 3.5. □

The following equality plays an important role in the derivation of an estimator for the conditional estimation error:
$$E\big[\hat{f}_k^2 \,\big|\, \mathcal{B}_k\big] = \mathrm{Var}\big(\hat{f}_k \,\big|\, \mathcal{B}_k\big) + f_k^2 = \frac{\sigma_k^2}{\sum_{i=0}^{i^*(k+1)} C_{i,k}} + f_k^2. \qquad (3.28)$$

46

Chapter 3. Chain-ladder models

In Estimator 2.4 we have already seen how we estimate the ultimate claim $C_{i,J}$, given the information $\mathcal{D}_I$, in the chain-ladder model:
\[
\hat{C}_{i,J}^{CL} = \hat{E}\left[\, C_{i,J} \,\middle|\, \mathcal{D}_I \right] = C_{i,I-i} \cdot \hat{f}_{I-i} \cdots \hat{f}_{J-1}. \tag{3.29}
\]
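Estimator (3.29) is easy to sketch in code. The following is a minimal illustration (the triangle and all its values are hypothetical; `None` marks the unobserved cells below the diagonal):

```python
# Hypothetical cumulative claims triangle C[i][j]; None marks unobserved cells.
triangle = [
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 168.0, 185.0, None],
    [ 95.0, 142.0, None,  None],
    [125.0, None,  None,  None],
]

def cl_factors(C):
    """Chain-ladder factor estimates f_hat[j] = sum_i C[i][j+1] / sum_i C[i][j],
    summing over accident years with both cells observed."""
    J = len(C[0]) - 1
    f_hat = []
    for j in range(J):
        rows = [r for r in C if r[j + 1] is not None]
        f_hat.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return f_hat

def cl_ultimate(row, f_hat):
    """Estimator (3.29): last observed value times the remaining estimated factors."""
    j_last = max(j for j, v in enumerate(row) if v is not None)
    u = row[j_last]
    for j in range(j_last, len(f_hat)):
        u *= f_hat[j]
    return u

f_hat = cl_factors(triangle)
ultimates = [cl_ultimate(row, f_hat) for row in triangle]
reserves = []
for row, u in zip(triangle, ultimates):
    j_last = max(j for j, v in enumerate(row) if v is not None)
    reserves.append(u - row[j_last])
```

A fully developed accident year (the first row) has reserve zero; the youngest year is rolled forward through all remaining development factors.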

Our goal is to derive an estimate for the conditional mean square error of prediction (conditional MSEP) of $\hat{C}_{i,J}^{CL}$ for single accident years $i \in \{I-J+1, \ldots, I\}$,
\[
\mathrm{msep}_{C_{i,J} | \mathcal{D}_I}\bigl(\hat{C}_{i,J}^{CL}\bigr)
= E\left[\left.\bigl(\hat{C}_{i,J}^{CL} - C_{i,J}\bigr)^2 \right| \mathcal{D}_I\right]
= \mathrm{Var}\left(C_{i,J} \,\middle|\, \mathcal{D}_I\right) + \Bigl(\hat{C}_{i,J}^{CL} - E\left[C_{i,J} \middle| \mathcal{D}_I\right]\Bigr)^2, \tag{3.30}
\]
and for aggregated accident years we consider
\[
\mathrm{msep}_{\sum_i C_{i,J} | \mathcal{D}_I}\left(\sum_{i=I-J+1}^{I} \hat{C}_{i,J}^{CL}\right)
= E\left[\left.\left(\sum_{i=I-J+1}^{I} \hat{C}_{i,J}^{CL} - \sum_{i=I-J+1}^{I} C_{i,J}\right)^2 \right| \mathcal{D}_I\right]. \tag{3.31}
\]
From (3.30) we see that we need to give an estimate for the process variance and for the estimation error (coming from the fact that $f_j$ is estimated by $\hat{f}_j$).

3.2.2 Conditional process variance

We consider the first term on the right-hand side of (3.30), which is the conditional process variance. Assume $J > I-i$,
\[
\begin{aligned}
\mathrm{Var}\left(C_{i,J} \,\middle|\, \mathcal{D}_I\right) &= \mathrm{Var}\left(C_{i,J} \,\middle|\, C_{i,I-i}\right) \\
&= E\left[\left.\mathrm{Var}\left(C_{i,J} \middle| C_{i,J-1}\right) \right| C_{i,I-i}\right] + \mathrm{Var}\left(\left. E\left[C_{i,J} \middle| C_{i,J-1}\right] \right| C_{i,I-i}\right) \\
&= \sigma_{J-1}^2\, E\left[\, C_{i,J-1} \,\middle|\, C_{i,I-i}\right] + f_{J-1}^2\, \mathrm{Var}\left(\, C_{i,J-1} \,\middle|\, C_{i,I-i}\right) \\
&= \sigma_{J-1}^2\, C_{i,I-i} \prod_{j=I-i}^{J-2} f_j + f_{J-1}^2\, \mathrm{Var}\left(\, C_{i,J-1} \,\middle|\, C_{i,I-i}\right).
\end{aligned} \tag{3.32}
\]

Hence we obtain a recursive formula for the process variance. If we iterate this procedure, we find that
\[
\begin{aligned}
\mathrm{Var}\left(\, C_{i,J} \,\middle|\, C_{i,I-i}\right)
&= C_{i,I-i} \sum_{m=I-i}^{J-1} \prod_{n=m+1}^{J-1} f_n^2\; \sigma_m^2 \prod_{l=I-i}^{m-1} f_l \\
&= \sum_{m=I-i}^{J-1} \prod_{n=m+1}^{J-1} f_n^2\; \sigma_m^2\; E\left[C_{i,m} \middle| C_{i,I-i}\right] \\
&= E\left[C_{i,J} \middle| C_{i,I-i}\right]^2 \sum_{m=I-i}^{J-1} \frac{\sigma_m^2 / f_m^2}{E\left[C_{i,m} \middle| C_{i,I-i}\right]}.
\end{aligned} \tag{3.33}
\]
This gives the following Lemma:

Lemma 3.6 (Process variance for single accident years) Under Model Assumptions 3.2 the conditional process variance for the ultimate claim of a single accident year $i \in \{I-J+1, \ldots, I\}$ is given by
\[
\mathrm{Var}\left(C_{i,J} \,\middle|\, \mathcal{D}_I\right) = E\left[C_{i,J} \middle| C_{i,I-i}\right]^2 \sum_{m=I-i}^{J-1} \frac{\sigma_m^2 / f_m^2}{E\left[C_{i,m} \middle| C_{i,I-i}\right]}. \tag{3.34}
\]

Hence we estimate the conditional process variance for a single accident year $i$ by
\[
\widehat{\mathrm{Var}}\left(C_{i,J} \,\middle|\, \mathcal{D}_I\right)
= \hat{E}\left[\left.\bigl(C_{i,J} - E\left[C_{i,J} \middle| \mathcal{D}_I\right]\bigr)^2 \right| \mathcal{D}_I\right]
= \Bigl(\hat{C}_{i,J}^{CL}\Bigr)^2 \sum_{m=I-i}^{J-1} \frac{\hat{\sigma}_m^2 / \hat{f}_m^2}{\hat{C}_{i,m}^{CL}}. \tag{3.35}
\]
The estimator for the conditional process variance can be rewritten in a recursive form. We obtain for $i \in \{I-J+1, \ldots, I\}$
\[
\widehat{\mathrm{Var}}\left(C_{i,J} \,\middle|\, \mathcal{D}_I\right)
= \widehat{\mathrm{Var}}\left(C_{i,J-1} \,\middle|\, \mathcal{D}_I\right) \hat{f}_{J-1}^2 + \hat{C}_{i,J-1}^{CL}\, \hat{\sigma}_{J-1}^2, \tag{3.36}
\]
where $\widehat{\mathrm{Var}}\left(C_{i,I-i} \,\middle|\, \mathcal{D}_I\right) = 0$.
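The recursion (3.36) is straightforward to implement. A sketch, with all numerical inputs hypothetical; `c_last` plays the role of the last observed value $C_{i,I-i}$ and `j_last` the role of $I-i$:

```python
def process_variance(c_last, j_last, f_hat, sigma2):
    """Recursive process-variance estimator (3.36):
    V_{j+1} = V_j * f_hat[j]**2 + C_hat_j * sigma2[j], started at V = 0
    in development year j_last, where C_hat_j is the chain-ladder
    projection of C_{i,j}."""
    V, c = 0.0, c_last
    for j in range(j_last, len(f_hat)):
        V = V * f_hat[j] ** 2 + c * sigma2[j]
        c *= f_hat[j]  # roll the projection forward one development year
    return V
```

For a single remaining development step this reduces to $C_{i,I-i}\,\hat{\sigma}^2_{I-i}$, in line with the variance assumption of Model 3.2.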
Because different accident years are independent, we estimate the conditional process variance for aggregated accident years by
\[
\widehat{\mathrm{Var}}\left(\left.\sum_{i=I-J+1}^{I} C_{i,J} \,\right|\, \mathcal{D}_I\right) = \sum_{i=I-J+1}^{I} \widehat{\mathrm{Var}}\left(C_{i,J} \,\middle|\, \mathcal{D}_I\right). \tag{3.37}
\]

Example 3.7 (Chain-ladder method)

We come back to our example in Table 2.2 (see Example 2.7). Since we do not have enough data (i.e. we don't have $I > J$) we are not able to estimate the last variance parameter $\sigma_{J-1}^2$ with the estimator $\hat{\sigma}_{J-1}^2$ (cf. (3.12)). There is an extensive literature about the estimation of tail factors and variance estimates. We do not further discuss this here, but simply choose the extrapolation proposed by Mack [49]:
\[
\hat{\sigma}_{J-1}^2 = \min\left\{ \hat{\sigma}_{J-2}^4 / \hat{\sigma}_{J-3}^2 \,;\; \hat{\sigma}_{J-3}^2 \,;\; \hat{\sigma}_{J-2}^2 \right\} \tag{3.38}
\]
as estimate for $\sigma_{J-1}^2$. This estimate is motivated by the observation that the series $\hat{\sigma}_0, \ldots, \hat{\sigma}_{J-2}$ is usually decreasing (cf. Table 3.1). This gives the estimated conditional process standard deviations in Table 3.2.
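Rule (3.38) is a one-liner. A sketch (the function takes the squared estimates $\hat{\sigma}^2_{J-3}$ and $\hat{\sigma}^2_{J-2}$ as inputs):

```python
def sigma2_tail(s2_jm3, s2_jm2):
    """Mack's extrapolation (3.38) for the last variance parameter:
    the minimum of the log-linear extrapolation s2_jm2**2 / s2_jm3
    and the two preceding estimates."""
    return min(s2_jm2 ** 2 / s2_jm3, s2_jm3, s2_jm2)
```

Applied to the (rounded) values of Table 3.1, $\hat{\sigma}_{J-3} = 0.823$ and $\hat{\sigma}_{J-2} = 0.219$, the rule gives $\hat{\sigma}_{J-1} \approx 0.058$, consistent with the tabulated 0.059 up to rounding of the inputs.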
We define the estimated conditional variational coefficient for accident year $i$ relative to the estimated CL reserves as follows:
\[
\widehat{\mathrm{Vco}}_i = \widehat{\mathrm{Vco}}\left(\, C_{i,J} \,\middle|\, \mathcal{D}_I\right) = \frac{\widehat{\mathrm{Var}}\left(C_{i,J} \,\middle|\, \mathcal{D}_I\right)^{1/2}}{\hat{C}_{i,J}^{CL} - C_{i,I-i}}. \tag{3.39}
\]
If we take this variational coefficient as a measure for the uncertainty, we see that the uncertainty of the total CL reserves is about 7%.

 i\j        0        1        2        3        4        5        6        7        8
  0    1.6257   1.0926   1.0197   1.0192   1.0057   1.0060   1.0013   1.0010   1.0014
  1    1.5115   1.0754   1.0147   1.0065   1.0035   1.0050   1.0011   1.0011
  2    1.4747   1.0916   1.0260   1.0147   1.0062   1.0051   1.0008
  3    1.4577   1.0845   1.0206   1.0141   1.0092   1.0045
  4    1.4750   1.0767   1.0298   1.0244   1.0109
  5    1.4573   1.0635   1.0255   1.0107
  6    1.5166   1.0663   1.0249
  7    1.4614   1.0683
  8    1.4457

 f̂_j   1.4925   1.0778   1.0229   1.0148   1.0070   1.0051   1.0011   1.0010   1.0014
 σ̂_j 135.253   33.803   15.760   19.847    9.336    2.001    0.823    0.219    0.059

Table 3.1: Observed historical individual chain-ladder factors $F_{i,j+1}$, estimated chain-ladder factors $\hat{f}_j$ and estimated standard deviations $\hat{\sigma}_j$
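The estimates $\hat{f}_j$ and $\hat{\sigma}^2_j$ of Table 3.1 come from the column sums and the volume-weighted squared deviations of the individual factors. A sketch on a small hypothetical triangle (`None` marks unobserved cells):

```python
# Hypothetical cumulative claims triangle; values are illustrative only.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 168.0, None],
    [ 95.0, None,  None],
]

def cl_factor_and_sigma2(C, j):
    """f_hat[j] from column sums, and the unbiased variance estimate of
    Lemma 3.5; sigma2 needs at least two observed factors in column j,
    otherwise None is returned (cf. the tail rule (3.38))."""
    rows = [r for r in C if r[j + 1] is not None]
    f_hat = sum(r[j + 1] for r in rows) / sum(r[j] for r in rows)
    n = len(rows)
    if n < 2:
        return f_hat, None
    s2 = sum(r[j] * (r[j + 1] / r[j] - f_hat) ** 2 for r in rows) / (n - 1)
    return f_hat, s2

f0, s2_0 = cl_factor_and_sigma2(triangle, 0)
f1, s2_1 = cl_factor_and_sigma2(triangle, 1)
```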
  i     C_{i,I-i}   Ĉ^{CL}_{i,J}   CL reserves   V̂ar(C_{i,J}|D_I)^{1/2}   V̂co_i
  0      11148124       11148124             0
  1      10648192       10663318         15126           191     1.3%
  2      10635751       10662008         26257           742     2.8%
  3       9724068        9758606         34538          2669     7.7%
  4       9786916        9872218         85302          6832     8.0%
  5       9935753       10092247        156494         30478    19.5%
  6       9282022        9568143        286121         68212    23.8%
  7       8256211        8705378        449167         80077    17.8%
  8       7648729        8691971       1043242        126960    12.2%
  9       5675568        9626383       3950815        389783     9.9%
 Total                                 6047061        424379     7.0%

Table 3.2: Estimated chain-ladder reserves and estimated conditional process standard deviations

3.2.3 Estimation error for single accident years

Next we need to derive an estimate for the conditional parameter/estimation error, i.e. we want to get an estimate for the accuracy of our chain-ladder factor estimates $\hat{f}_j$. The parameter error for a single accident year in the chain-ladder estimate is given by (see (3.30), (2.3) and (2.8))
\[
\begin{aligned}
\Bigl(\hat{C}_{i,J}^{CL} - E\left[C_{i,J} \middle| \mathcal{D}_I\right]\Bigr)^2
&= C_{i,I-i}^2 \Bigl(\hat{f}_{I-i} \cdots \hat{f}_{J-1} - f_{I-i} \cdots f_{J-1}\Bigr)^2 \\
&= C_{i,I-i}^2 \left( \prod_{j=I-i}^{J-1} \hat{f}_j^2 + \prod_{j=I-i}^{J-1} f_j^2 - 2 \prod_{j=I-i}^{J-1} \hat{f}_j f_j \right).
\end{aligned} \tag{3.40}
\]
Hence we would like to calculate (3.40). Observe that the realizations of the estimators $\hat{f}_{I-i}, \ldots, \hat{f}_{J-1}$ are known at time $I$, but the true chain-ladder factors $f_{I-i}, \ldots, f_{J-1}$ are unknown. Hence (3.40) can not be calculated explicitly. In order to determine the conditional estimation error we will analyze how much the possible chain-ladder factors $\hat{f}_j$ fluctuate around $f_j$. We measure these volatilities of the estimates $\hat{f}_j$ by means of resampled observations for $\hat{f}_j$.
There are different approaches to resample these values: conditional ones and unconditional ones, see Buchwalder et al. [13]. For the explanation of these different approaches we fix accident year $i \in \{I-J+1, \ldots, I\}$. Then we see from the right-hand side of (3.40) that the main difficulty in the determination of the volatility in the estimates comes from the calculation of the squares of the estimated chain-ladder factors.
Therefore, we focus for the moment on these squares, i.e. we need to resample the following product of squared estimates
\[
\hat{f}_{I-i}^2 \cdots \hat{f}_{J-1}^2, \tag{3.41}
\]
the treatment of the last term in (3.40) is then straightforward.


To be able to distinguish the different resample approaches we define by
\[
\mathcal{D}_{I,i}^{O} = \left\{ C_{i,j} \in \mathcal{D}_I ;\; j > I-i \right\} \subset \mathcal{D}_I \tag{3.42}
\]
the upper right corner of the observations $\mathcal{D}_I$ with respect to development year $j = I-i+1$.

[Table 3.3 sketches the claims development trapezoid with accident years $0, \ldots, I$ and development years $j$; the observed cells with $j > I-i$ form the upper right corner $\mathcal{D}_{I,i}^{O}$.]

Table 3.3: The upper right corner $\mathcal{D}_{I,i}^{O}$
For the following explanation observe that $\hat{f}_j$ is $\mathcal{B}_{j+1}$-measurable.

Approach 1 (Unconditional resampling in $\mathcal{D}_{I,i}^{O}$). In this approach one calculates the expectation
\[
E\left[\left. \hat{f}_{I-i}^2 \cdots \hat{f}_{J-1}^2 \right| \mathcal{B}_{I-i}\right]. \tag{3.43}
\]
This is the complete averaging over the multidimensional distribution after time $I-i$. Since $\mathcal{D}_{I,i}^{O} \cap \mathcal{B}_{I-i} = \emptyset$ holds true, the value (3.43) does not depend on the observations in $\mathcal{D}_{I,i}^{O}$, i.e. the observed realizations in the upper corner $\mathcal{D}_{I,i}^{O}$ have no influence on the estimation of the parameter error. Therefore we call this the unconditional version because it gives the average/expected estimation error (independent of the observations in $\mathcal{D}_{I,i}^{O}$).
Approach 2 (Partial conditional resampling in $\mathcal{D}_{I,i}^{O}$). In this approach one calculates the value
\[
\hat{f}_{I-i}^2 \cdots \hat{f}_{J-2}^2\; E\left[\left. \hat{f}_{J-1}^2 \right| \mathcal{B}_{J-1}\right]. \tag{3.44}
\]
In this version the averaging is only done partially. However, $\mathcal{D}_{I,i}^{O} \cap \mathcal{B}_{J-1} \neq \emptyset$ holds, i.e. the value (3.44) depends on the observations in $\mathcal{D}_{I,i}^{O}$. If one decouples the problem of resampling in a smart way, one can even choose the position $j \in \{I-i, \ldots, J-1\}$ at which one wants to do the partial resampling.

Approach 3 (Conditional resampling in $\mathcal{D}_{I,i}^{O}$). Calculate the value
\[
E\left[\left. \hat{f}_{I-i}^2 \right| \mathcal{B}_{I-i}\right] \cdot E\left[\left. \hat{f}_{I-i+1}^2 \right| \mathcal{B}_{I-i+1}\right] \cdots E\left[\left. \hat{f}_{J-1}^2 \right| \mathcal{B}_{J-1}\right]. \tag{3.45}
\]
Unlike the approach (3.44) the averaging is now done in every position $j \in \{I-i, \ldots, J-1\}$ on the conditional structure. Since $\mathcal{D}_{I,i}^{O} \cap \mathcal{B}_j \neq \emptyset$ if $j > I-i$, the observed realizations in $\mathcal{D}_{I,i}^{O}$ have a direct influence on the estimate and (3.45) depends on the observations in $\mathcal{D}_{I,i}^{O}$. In contrast to (3.43) the averaging is only done over the conditional distributions and not over the multidimensional distribution after $I-i$. Therefore we call this the conditional version. From a numerical point of view it is important to note that Approach 3 allows for a multiplicative structure of the measure of volatility (see Figure 3.1).
Concluding, this means that we consider different probability measures for the resampling, conditional and unconditional ones. Observe that the estimated chain-ladder factors $\hat{f}_j$ are functions of $(C_{i,j+1})_{i=0,\ldots,I-j-1}$ and $(C_{i,j})_{i=0,\ldots,I-j-1}$, i.e.
\[
\hat{f}_j = \hat{f}_j\Bigl( (C_{i,j+1})_{i=0,\ldots,I-j-1},\, (C_{i,j})_{i=0,\ldots,I-j-1} \Bigr) = \frac{\sum_{i=0}^{I-j-1} C_{i,j+1}}{\sum_{i=0}^{I-j-1} C_{i,j}}. \tag{3.46}
\]
In the conditional resampling the denominator serves as a fixed volume measure, whereas in the unconditional resampling the denominator is also resampled.
Since our time series $(C_{k,j})_j$ is a Markov chain we can write its probability distribution (with the help of stochastic kernels $K_j$) as follows:
\[
\begin{aligned}
dP_k(x_0, \ldots, x_J)
&= K_0(dx_0)\, K_1(x_0, dx_1)\, K_2(x_0, x_1, dx_2) \cdots K_J(x_0, \ldots, x_{J-1}, dx_J) \\
&= K_0(dx_0)\, K_1(x_0, dx_1)\, K_2(x_1, dx_2) \cdots K_J(x_{J-1}, dx_J).
\end{aligned} \tag{3.47}
\]

In Approach 1 one considers a complete resampling on $\mathcal{D}_{I,i}^{O}$, i.e. one looks, given $\mathcal{B}_{I-i}$, at the measures
\[
\begin{aligned}
dP\bigl( (x_{k,j})_{k,j} \,\big|\, \mathcal{B}_{I-i} \bigr)
&= \prod_{k<i} dP_k\bigl( x_{k,I-i+1}, \ldots, x_{k,I-k} \,\big|\, C_{k,I-i} = x_{k,I-i} \bigr) \\
&= \prod_{k<i} K_{I-i+1}(x_{k,I-i}, dx_{k,I-i+1}) \cdots K_{I-k}(x_{k,I-k-1}, dx_{k,I-k}),
\end{aligned} \tag{3.48}
\]
for the resampling of the estimated chain-ladder factors
\[
\prod_{j \geq I-i} \hat{f}_j = \prod_{j \geq I-i} \hat{f}_j\Bigl( (x_{i,j+1})_{i=0,\ldots,I-j-1},\, (x_{i,j})_{i=0,\ldots,I-j-1} \Bigr). \tag{3.49}
\]
In Approach 3 we always keep fixed the set of actual observations $C_{i,j}$ and we only resample the next step in the time series, i.e. given $\mathcal{D}_I$ we consider the measures (see also Figure 3.1)
\[
dP_{\mathcal{D}_I}^{*}\bigl( (x_{k,j})_{k,j} \bigr)
= \prod_{k<i} K_{I-i+1}(C_{k,I-i}, dx_{k,I-i+1}) \cdots K_{I-k}(C_{k,I-k-1}, dx_{k,I-k}), \tag{3.50}
\]
for the resampling of
\[
\prod_{j \geq I-i} \hat{f}_j = \prod_{j \geq I-i} \hat{f}_j\Bigl( (x_{i,j+1})_{i=0,\ldots,I-j-1},\, (C_{i,j})_{i=0,\ldots,I-j-1} \Bigr). \tag{3.51}
\]
Hence in this context $C_{i,j}$ serves as a volume measure for the resampling of $C_{i,j+1}$. In Approach 1 this volume measure is also resampled, whereas in Approach 3 it is kept fixed.

Observe. The question as to which approach should be chosen is not a mathematical one and has led to extensive discussions in the actuarial community (see Buchwalder et al. [11], Mack et al. [52], Gisler [29] and Venter [78]). It depends on the circumstances of the specific practical problem which approach should be used. Hence, only the practitioner can choose the appropriate approach for his problems and questions.

Approach 1 (Unconditional resampling)

In the unconditional approach we have (due to the uncorrelatedness of the chain-ladder factors) that
\[
\begin{aligned}
& E\left[\left. \Bigl(\hat{C}_{i,J}^{CL} - E\left[C_{i,J} \middle| \mathcal{D}_I\right]\Bigr)^2 \right| \mathcal{B}_{I-i}\right] \\
&\quad = C_{i,I-i}^2\; E\left[\left. \prod_{j=I-i}^{J-1} \hat{f}_j^2 + \prod_{j=I-i}^{J-1} f_j^2 - 2 \prod_{j=I-i}^{J-1} \hat{f}_j f_j \right| \mathcal{B}_{I-i}\right] \\
&\quad = C_{i,I-i}^2 \left( E\left[\left. \prod_{j=I-i}^{J-1} \hat{f}_j^2 \right| \mathcal{B}_{I-i}\right] - \prod_{j=I-i}^{J-1} f_j^2 \right).
\end{aligned} \tag{3.52}
\]
Hence, to give an estimate for the estimation error with the unconditional version, we need to calculate the expectation in the last term of (3.52) (as described in Approach 1). This would be easy if the estimated chain-ladder factors $\hat{f}_j$ were independent. But they are only uncorrelated, see Lemma 2.5 and the following lemma (for a similar statement see also Mack et al. [52]):
Lemma 3.8 Under Model Assumptions 3.2 the squares of two successive chain-ladder estimators $\hat{f}_{j-1}$ and $\hat{f}_j$ are, given $\mathcal{B}_{j-1}$, negatively correlated, i.e.
\[
\mathrm{Cov}\left(\hat{f}_{j-1}^2,\, \hat{f}_j^2 \,\middle|\, \mathcal{B}_{j-1}\right) < 0 \tag{3.53}
\]
for $1 \leq j \leq J-1$.
Proof. Observe that $\hat{f}_{j-1}$ is $\mathcal{B}_j$-measurable. We define
\[
S_j = \sum_{i=0}^{I-(j+1)} C_{i,j}. \tag{3.54}
\]
Hence, we have that
\[
\begin{aligned}
& \mathrm{Cov}\left(\hat{f}_{j-1}^2,\, \hat{f}_j^2 \,\middle|\, \mathcal{B}_{j-1}\right) \\
&\quad = E\left[\left. \mathrm{Cov}\left(\hat{f}_{j-1}^2, \hat{f}_j^2 \,\middle|\, \mathcal{B}_j\right) \right| \mathcal{B}_{j-1}\right]
+ \mathrm{Cov}\left(E\left[\hat{f}_{j-1}^2 \middle| \mathcal{B}_j\right],\, E\left[\hat{f}_j^2 \middle| \mathcal{B}_j\right] \,\middle|\, \mathcal{B}_{j-1}\right) \\
&\quad = \mathrm{Cov}\left(\hat{f}_{j-1}^2,\, \frac{\sigma_j^2}{S_j} + f_j^2 \,\middle|\, \mathcal{B}_{j-1}\right) \\
&\quad = \sigma_j^2\, \mathrm{Cov}\left( \frac{1}{S_{j-1}^2} \Bigl( \sum_{i=0}^{I-j} C_{i,j} \Bigr)^2,\, \frac{1}{S_j} \,\middle|\, \mathcal{B}_{j-1}\right).
\end{aligned} \tag{3.55}
\]
Moreover, using
\[
\Bigl( \sum_{i=0}^{I-j} C_{i,j} \Bigr)^2 = S_j^2 + 2\, S_j\, C_{I-j,j} + C_{I-j,j}^2, \tag{3.56}
\]

the independence of different accident years and $E\left[C_{I-j,j} \middle| \mathcal{B}_{j-1}\right] = f_{j-1}\, C_{I-j,j-1}$ leads to
\[
\mathrm{Cov}\left(\hat{f}_{j-1}^2,\, \hat{f}_j^2 \,\middle|\, \mathcal{B}_{j-1}\right)
= \frac{\sigma_j^2}{S_{j-1}^2} \left[ \mathrm{Cov}\left(S_j^2,\, \frac{1}{S_j} \,\middle|\, \mathcal{B}_{j-1}\right) + 2\, f_{j-1}\, C_{I-j,j-1}\, \mathrm{Cov}\left(S_j,\, \frac{1}{S_j} \,\middle|\, \mathcal{B}_{j-1}\right) \right]. \tag{3.57}
\]
Finally, we need to calculate both covariance terms on the right-hand side of (3.57). Using Jensen's inequality we obtain for $\nu = 1, 2$
\[
\begin{aligned}
\mathrm{Cov}\left(S_j^{\nu},\, S_j^{-1} \,\middle|\, \mathcal{B}_{j-1}\right)
&= E\left[S_j^{\nu-1} \middle| \mathcal{B}_{j-1}\right] - E\left[S_j^{\nu} \middle| \mathcal{B}_{j-1}\right] E\left[S_j^{-1} \middle| \mathcal{B}_{j-1}\right] \\
&< E\left[S_j^{\nu-1} \middle| \mathcal{B}_{j-1}\right] - E\left[S_j^{\nu} \middle| \mathcal{B}_{j-1}\right] E\left[S_j \middle| \mathcal{B}_{j-1}\right]^{-1} \leq 0.
\end{aligned} \tag{3.58}
\]
Jensen's inequality is strict because we have assumed strictly positive variances $\sigma_{j-1}^2 > 0$, which implies that $S_j$ is not deterministic at time $j-1$. This finishes the proof of Lemma 3.8. $\Box$
Lemma 3.8 implies that the term
\[
E\left[\left. \prod_{j=I-i}^{J-1} \hat{f}_j^2 \right| \mathcal{B}_{I-i}\right] \tag{3.59}
\]
can not easily be calculated. Hence from this point of view Approach 1 is not a promising route for finding a closed formula for the estimation error.
Approach 3 (Conditional resampling)

In Approach 3 we explicitly resample the observed chain-ladder factors $\hat{f}_j$. To do the resampling we introduce stronger model assumptions, in the form of a time series model. Such time series models for the chain-ladder method can be found in several papers in the literature, see e.g. Murphy [55], Barnett-Zehnwirth [7] or Buchwalder et al. [13].
Model Assumptions 3.9 (Time series model)

• Different accident years $i$ are independent.
• There exist constants $f_j > 0$, $\sigma_j > 0$ and random variables $\varepsilon_{i,j+1}$ such that for all $i \in \{0, \ldots, I\}$ and $j \in \{0, \ldots, J-1\}$ we have that
\[
C_{i,j+1} = f_j\, C_{i,j} + \sigma_j\, \sqrt{C_{i,j}}\; \varepsilon_{i,j+1}, \tag{3.60}
\]
where, conditionally given $\mathcal{B}_0$, the $\varepsilon_{i,j+1}$ are independent with $E\left[\varepsilon_{i,j+1} \middle| \mathcal{B}_0\right] = 0$, $E\left[\varepsilon_{i,j+1}^2 \middle| \mathcal{B}_0\right] = 1$ and $P\left[C_{i,j+1} > 0 \middle| \mathcal{B}_0\right] = 1$ for all $i \in \{0, \ldots, I\}$ and $j \in \{0, \ldots, J-1\}$. $\Box$
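The time series (3.60) prescribes how additional observations can be simulated. A sketch of one simulation step, with Gaussian innovations as one hypothetical choice of $\varepsilon_{i,j+1}$ — the model itself only fixes the first two moments and the positivity condition:

```python
import random

def simulate_step(c, f_j, sigma_j, rng):
    """One step of the time series (3.60):
    C_{i,j+1} = f_j * C_{i,j} + sigma_j * sqrt(C_{i,j}) * eps.
    Gaussian eps is illustrative only: strictly it violates
    P[C_{i,j+1} > 0] = 1, so in practice a truncated or otherwise
    bounded-below innovation distribution would be needed."""
    return f_j * c + sigma_j * (c ** 0.5) * rng.gauss(0.0, 1.0)
```

A quick Monte Carlo check confirms that the simulated next value has conditional mean $f_j C_{i,j}$ and conditional variance $\sigma_j^2 C_{i,j}$, as Model 3.2 requires.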

Remarks 3.10

• The time series model defines an auto-regressive process. It is particularly useful for the derivation of the estimation error and reflects the mechanism of generating sets of other possible observations.
• The random variables $\varepsilon_{i,j+1}$ are defined conditionally, given $\mathcal{B}_0$, in order to ensure that the cumulative payments $C_{i,j+1}$ stay positive, $P\left[\,\cdot \,\middle|\, \mathcal{B}_0\right]$-a.s.
• It is easy to show that Model Assumptions 3.9 imply the Assumptions 3.2 of the classical stochastic chain-ladder model of Mack [49].
• The definition of the time series model in Buchwalder et al. [11] is slightly different. The difference lies in the fact that here we assume a.s. positivity of $C_{i,j}$. This could also be done with the help of conditional assumptions, i.e. the theory would also run through if we would assume that
\[
P\left[\, C_{i,j+1} > 0 \,\middle|\, C_{i,j}\right] = 1, \tag{3.61}
\]
for all $i$ and $j$.

In the sequel we use Approach 3, i.e. we do conditional resampling in the time series model. We therefore resample the observations for $\hat{f}_{I-i}, \ldots, \hat{f}_{J-1}$, given the upper trapezoid $\mathcal{D}_I$. Thereby we take into account that, given $\mathcal{D}_I$, the observations for $\hat{f}_j$ could have been different from the observed values. To account for this source of uncertainty we proceed as usual in statistics: Given $\mathcal{D}_I$, we generate for $i \in \{0, \ldots, I\}$ and $j \in \{0, \ldots, J-1\}$ a set of new observations $\widetilde{C}_{i,j+1}$ by the formula
\[
\widetilde{C}_{i,j+1} = f_j\, C_{i,j} + \sigma_j\, \sqrt{C_{i,j}}\; \widetilde{\varepsilon}_{i,j+1}, \tag{3.62}
\]
where $\sigma_j > 0$ and $\widetilde{\varepsilon}_{i,j+1}$, $\varepsilon_{i,j+1}$ are independent and identically distributed given $\mathcal{B}_0$ (cf. Model Assumptions 3.9). This means that $C_{i,j}$ acts as a fixed volume measure and we resample $\widetilde{C}_{i,j+1} \stackrel{(d)}{=} C_{i,j+1}$, given $\mathcal{B}_j$. In the language of stochastic kernels this means that we consider the distributions $K_{j+1}(C_{i,j}, dx_{j+1})$ (see (3.50)).

Remark. We have chosen a different notation ($\widetilde{C}_{i,j+1}$ vs. $C_{i,j+1}$) to clearly illustrate that we resample on the conditional structure, i.e. $\widetilde{C}_{i,j+1}$ are random variables and $C_{i,j}$ are (deterministic) volumes, given $\mathcal{B}_j$.


In the spirit of Approach 3 (cf. (3.45)) we resample the observations for $\hat{f}_j$ by only resampling the observations of development year $j+1$. Together with the resampling assumption (3.62) this leads to the following resampled representation for the estimates of the development factors
\[
\hat{f}_j = \frac{\sum_{i=0}^{I-(j+1)} \widetilde{C}_{i,j+1}}{\sum_{i=0}^{I-(j+1)} C_{i,j}}
= f_j + \frac{\sigma_j}{S_j} \sum_{i=0}^{I-(j+1)} \sqrt{C_{i,j}}\; \widetilde{\varepsilon}_{i,j+1} \qquad (0 \leq j \leq J-1), \tag{3.63}
\]
where
\[
S_j = \sum_{i=0}^{I-(j+1)} C_{i,j}. \tag{3.64}
\]

As in (3.50) we denote the probability measure of these resampled chain-ladder estimates by $P_{\mathcal{D}_I}^{*}$.
These resampled estimates of the development factors have, given $\mathcal{B}_j$, the same distribution as the original estimated chain-ladder factors. Unlike the observations $\{C_{i,j};\; i+j \leq I\}$, the observations $\{\widetilde{C}_{i,j};\; i+j \leq I\}$ and also the resampled estimates are random variables given $\mathcal{D}_I$. Furthermore the observations $C_{i,j}$ and the random variables $\widetilde{\varepsilon}_{i,j}$ are independent, given $\mathcal{B}_0$ and $\mathcal{D}_I$. This and (3.63) shows that

1) the estimators $\hat{f}_0, \ldots, \hat{f}_{J-1}$ are conditionally independent w.r.t. $P_{\mathcal{D}_I}^{*}$,
2) $E_{\mathcal{D}_I}^{*}\bigl[\hat{f}_j\bigr] = f_j$ for $0 \leq j \leq J-1$, and
3) $E_{\mathcal{D}_I}^{*}\bigl[\hat{f}_j^2\bigr] = f_j^2 + \dfrac{\sigma_j^2}{S_j}$ for $0 \leq j \leq J-1$.

Figure 3.1 illustrates the conditional resampling for two different possible observations $\mathcal{D}_I^{(1)}$ and $\mathcal{D}_I^{(2)}$ of the original data $\mathcal{D}_I$, which would give the two different chain-ladder estimates $\hat{C}_{i,J}^{(1)}$ and $\hat{C}_{i,J}^{(2)}$ for $E\left[C_{i,J} \middle| \mathcal{D}_I\right]$.
Therefore in Approach 3 we estimate the estimation error by (using 1)-3))
\[
\begin{aligned}
& E_{\mathcal{D}_I}^{*}\left[ \Bigl( C_{i,I-i} \bigl( \hat{f}_{I-i} \cdots \hat{f}_{J-1} - f_{I-i} \cdots f_{J-1} \bigr) \Bigr)^2 \right] \\
&\quad = C_{i,I-i}^2\; \mathrm{Var}_{P_{\mathcal{D}_I}^{*}}\Bigl( \hat{f}_{I-i} \cdots \hat{f}_{J-1} \Bigr) \\
&\quad = C_{i,I-i}^2 \left( \prod_{l=I-i}^{J-1} E_{\mathcal{D}_I}^{*}\bigl[ \hat{f}_l^2 \bigr] - \prod_{l=I-i}^{J-1} f_l^2 \right) \\
&\quad = C_{i,I-i}^2 \left( \prod_{l=I-i}^{J-1} \left( f_l^2 + \frac{\sigma_l^2}{S_l} \right) - \prod_{l=I-i}^{J-1} f_l^2 \right).
\end{aligned} \tag{3.65}
\]



[Figure 3.1 sketches the two resampling levels: at each development year the conditional second moment is $E\bigl[\hat{f}_j^2 \,\big|\, \mathcal{B}_j\bigr] = f_j^2 + \sigma_j^2 / S_j$, and the product over $j = I-i, \ldots, J-1$ gives the multiplicative volatility structure of the resampled chain-ladder estimates around $E\left[C_{i,J} \middle| \mathcal{D}_I\right]$.]

Figure 3.1: Conditional resampling in $\mathcal{D}_{I,i}^{O}$ (Approach 3)

Observe that this calculation is exact; the estimation has been done at the point where we have decided to use Approach 3 for the estimation error, i.e. the estimate was done choosing the conditional probability measure $P_{\mathcal{D}_I}^{*}$.
Next, we replace the parameters $\sigma_{I-i}^2, \ldots, \sigma_{J-1}^2$ and $f_{I-i}, \ldots, f_{J-1}$ with their estimators, and we obtain the following estimator for the conditional estimation error of accident year $i \in \{I-J+1, \ldots, I\}$
\[
\widehat{\mathrm{Var}}\left(\hat{C}_{i,J}^{CL} \,\middle|\, \mathcal{D}_I\right)
= \hat{E}_{\mathcal{D}_I}^{*}\left[ \Bigl( \hat{C}_{i,J}^{CL} - \hat{E}\left[C_{i,J} \middle| \mathcal{D}_I\right] \Bigr)^2 \right]
= C_{i,I-i}^2 \left( \prod_{l=I-i}^{J-1} \left( \hat{f}_l^2 + \frac{\hat{\sigma}_l^2}{S_l} \right) - \prod_{l=I-i}^{J-1} \hat{f}_l^2 \right). \tag{3.66}
\]

The estimator for the conditional estimation error can be written in a recursive form. We obtain for $i \in \{I-J+1, \ldots, I\}$
\[
\begin{aligned}
\widehat{\mathrm{Var}}\left(\hat{C}_{i,J}^{CL} \,\middle|\, \mathcal{D}_I\right)
&= \widehat{\mathrm{Var}}\left(\hat{C}_{i,J-1}^{CL} \,\middle|\, \mathcal{D}_I\right) \hat{f}_{J-1}^2
+ C_{i,I-i}^2\, \frac{\hat{\sigma}_{J-1}^2}{S_{J-1}} \prod_{l=I-i}^{J-2} \left( \hat{f}_l^2 + \frac{\hat{\sigma}_l^2}{S_l} \right) \\
&= \widehat{\mathrm{Var}}\left(\hat{C}_{i,J-1}^{CL} \,\middle|\, \mathcal{D}_I\right) \left( \hat{f}_{J-1}^2 + \frac{\hat{\sigma}_{J-1}^2}{S_{J-1}} \right)
+ C_{i,I-i}^2\, \frac{\hat{\sigma}_{J-1}^2}{S_{J-1}} \prod_{l=I-i}^{J-2} \hat{f}_l^2,
\end{aligned} \tag{3.67}
\]
where $\widehat{\mathrm{Var}}\left(\hat{C}_{i,I-i}^{CL} \,\middle|\, \mathcal{D}_I\right) = 0$.
Estimator 3.11 (MSEP for single accident years, conditional version)
Under Model Assumptions 3.9 we have the following estimator for the conditional mean square error of prediction of the ultimate claim of a single accident year $i \in \{I-J+1, \ldots, I\}$
\[
\begin{aligned}
\widehat{\mathrm{msep}}_{C_{i,J} | \mathcal{D}_I}\bigl(\hat{C}_{i,J}^{CL}\bigr)
&= \hat{E}\left[\left. \bigl( \hat{C}_{i,J}^{CL} - C_{i,J} \bigr)^2 \right| \mathcal{D}_I\right] \\
&= \underbrace{\Bigl(\hat{C}_{i,J}^{CL}\Bigr)^2 \sum_{l=I-i}^{J-1} \frac{\hat{\sigma}_l^2 / \hat{f}_l^2}{\hat{C}_{i,l}^{CL}}}_{\text{process variance}}
+ \underbrace{C_{i,I-i}^2 \left( \prod_{l=I-i}^{J-1} \left( \hat{f}_l^2 + \frac{\hat{\sigma}_l^2}{S_l} \right) - \prod_{l=I-i}^{J-1} \hat{f}_l^2 \right)}_{\text{estimation error}}.
\end{aligned} \tag{3.68}
\]
We can rewrite (3.68) as follows
\[
\widehat{\mathrm{msep}}_{C_{i,J} | \mathcal{D}_I}\bigl(\hat{C}_{i,J}^{CL}\bigr)
= \Bigl(\hat{C}_{i,J}^{CL}\Bigr)^2 \left( \sum_{l=I-i}^{J-1} \frac{\hat{\sigma}_l^2 / \hat{f}_l^2}{\hat{C}_{i,l}^{CL}}
+ \prod_{l=I-i}^{J-1} \left( \frac{\hat{\sigma}_l^2 / \hat{f}_l^2}{S_l} + 1 \right) - 1 \right). \tag{3.69}
\]
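Estimator 3.11 can be sketched directly from (3.68). All inputs below are hypothetical; `S[j]` is the column sum $S_j$ of (3.64), `c_last` is $C_{i,I-i}$ and `j_last` is $I-i$:

```python
def msep_single_year(c_last, j_last, f_hat, sigma2, S):
    """Estimator 3.11 / eq. (3.68): returns the pair
    (process variance, estimation error); their sum is the estimated
    conditional MSEP for one accident year."""
    J = len(f_hat)
    c_hat = [c_last]                 # chain-ladder projections C_hat_{i,j}
    for j in range(j_last, J):
        c_hat.append(c_hat[-1] * f_hat[j])
    ult = c_hat[-1]
    proc = ult ** 2 * sum((sigma2[j] / f_hat[j] ** 2) / c_hat[j - j_last]
                          for j in range(j_last, J))
    prod_f2 = prod_mix = 1.0
    for j in range(j_last, J):
        prod_f2 *= f_hat[j] ** 2
        prod_mix *= f_hat[j] ** 2 + sigma2[j] / S[j]
    est = c_last ** 2 * (prod_mix - prod_f2)
    return proc, est
```

For one remaining development step the process variance reduces to $C_{i,I-i}\,\hat{\sigma}^2$ and the estimation error to $C_{i,I-i}^2\,\hat{\sigma}^2/S$.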

We could also do a linear approximation to the estimation error:
\[
\prod_{l=I-i}^{J-1} \left( \hat{f}_l^2 + \frac{\hat{\sigma}_l^2}{S_l} \right) - \prod_{l=I-i}^{J-1} \hat{f}_l^2
\;\approx\; \prod_{l=I-i}^{J-1} \hat{f}_l^2\; \sum_{l=I-i}^{J-1} \frac{\hat{\sigma}_l^2 / \hat{f}_l^2}{S_l}. \tag{3.70}
\]
Observe that in fact the right-hand side of (3.70) is a lower bound for the left-hand side. This immediately gives the following estimate:
Estimator 3.12 (MSEP for single accident years)
Under Model Assumptions 3.9 we have the following estimator for the conditional mean square error of prediction of the ultimate claim of a single accident year $i \in \{I-J+1, \ldots, I\}$
\[
\widehat{\widehat{\mathrm{msep}}}_{C_{i,J} | \mathcal{D}_I}\bigl(\hat{C}_{i,J}^{CL}\bigr)
= \Bigl(\hat{C}_{i,J}^{CL}\Bigr)^2 \sum_{l=I-i}^{J-1} \frac{\hat{\sigma}_l^2}{\hat{f}_l^2} \left( \frac{1}{\hat{C}_{i,l}^{CL}} + \frac{1}{S_l} \right). \tag{3.71}
\]
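Formula (3.71) is a single sum and correspondingly simple in code. A sketch (inputs hypothetical; `S[j]` is the column sum $S_j$):

```python
def msep_mack(c_last, j_last, f_hat, sigma2, S):
    """Estimator 3.12 / eq. (3.71): Mack's linearly approximated MSEP,
    a lower bound on the Approach-3 estimate of Estimator 3.11."""
    J = len(f_hat)
    c_hat = [c_last]                 # chain-ladder projections C_hat_{i,j}
    for j in range(j_last, J):
        c_hat.append(c_hat[-1] * f_hat[j])
    ult = c_hat[-1]
    return ult ** 2 * sum(
        (sigma2[j] / f_hat[j] ** 2) * (1.0 / c_hat[j - j_last] + 1.0 / S[j])
        for j in range(j_last, J))
```

For a single remaining development step the linear approximation (3.70) is exact, so (3.71) then coincides with (3.68).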

The Mack [49] approach

Mack [49] gives yet a different approach to the estimation of the estimation error. Introduce for $j \in \{I-i, \ldots, J-1\}$
\[
T_j = \hat{f}_{I-i} \cdots \hat{f}_{j-1}\, \bigl( \hat{f}_j - f_j \bigr)\, f_{j+1} \cdots f_{J-1}. \tag{3.72}
\]
Observe that
\[
\Bigl( \hat{f}_{I-i} \cdots \hat{f}_{J-1} - f_{I-i} \cdots f_{J-1} \Bigr)^2 = \left( \sum_{j=I-i}^{J-1} T_j \right)^2. \tag{3.73}
\]

This implies that (see (3.40))
\[
\Bigl( \hat{C}_{i,J}^{CL} - E\left[C_{i,J} \middle| \mathcal{D}_I\right] \Bigr)^2
= C_{i,I-i}^2 \left( \sum_{j=I-i}^{J-1} T_j^2 + 2 \sum_{I-i \leq j < k \leq J-1} T_j\, T_k \right). \tag{3.74}
\]

Each term in the sums on the right-hand side of the equality above is now estimated by a slightly modified version of Approach 2: We estimate $T_j\, T_k$ for $j < k$ by
\[
\begin{aligned}
E\left[T_j\, T_k \middle| \mathcal{B}_k\right]
&= \left\{ \hat{f}_{I-i}^2 \cdots \hat{f}_{j-1}^2 \right\} \left\{ \bigl( \hat{f}_j - f_j \bigr) \hat{f}_j \right\} \left\{ f_{j+1} \hat{f}_{j+1} \cdots f_{k-1} \hat{f}_{k-1} \right\} \\
&\qquad \cdot\, f_k\, E\left[ \hat{f}_k - f_k \middle| \mathcal{B}_k \right] f_{k+1}^2 \cdots f_{J-1}^2 = 0,
\end{aligned} \tag{3.75}
\]
and $T_j^2$ is estimated by
\[
E\left[T_j^2 \middle| \mathcal{B}_j\right]
= \hat{f}_{I-i}^2 \cdots \hat{f}_{j-1}^2\; E\left[ \bigl( \hat{f}_j - f_j \bigr)^2 \middle| \mathcal{B}_j \right] f_{j+1}^2 \cdots f_{J-1}^2
= \hat{f}_{I-i}^2 \cdots \hat{f}_{j-1}^2\; \frac{\sigma_j^2}{S_j}\; f_{j+1}^2 \cdots f_{J-1}^2. \tag{3.76}
\]

Hence (3.40) is estimated by
\[
C_{i,I-i}^2 \sum_{j=I-i}^{J-1} \hat{f}_{I-i}^2 \cdots \hat{f}_{j-1}^2\; \frac{\sigma_j^2}{S_j}\; f_{j+1}^2 \cdots f_{J-1}^2. \tag{3.77}
\]
If we now replace the unknown parameters $\sigma_j^2$ and $f_j$ by their estimates, we exactly obtain the estimate $\widehat{\widehat{\mathrm{msep}}}_{C_{i,J} | \mathcal{D}_I}\bigl(\hat{C}_{i,J}^{CL}\bigr)$ for the conditional estimation error presented in Estimator 3.12.


Remarks 3.13

• We see that the Mack estimate for the conditional estimation error (also presented in Estimator 3.12) is a linear approximation and lower bound to the estimate coming from Approach 3.
• The difference comes from the fact that Mack [49] decouples the estimation error in an appropriate way (with the help of the terms $T_j$) and then applies a partial conditional resampling to each of the terms in the decoupling.
• The Time Series Model 3.9 has slightly stronger assumptions than the weighted average development (WAD) factor model studied in Murphy [55], Model IV. To obtain the crucial recursive formula for the conditional estimation error (Theorem 3 in Appendix C of [55]) Murphy assumes independence of the estimators of the chain-ladder factors. However, this assumption is inconsistent with the model assumptions: the chain-ladder factors are indeed uncorrelated (see Lemma 2.5c)), but the squares of two successive chain-ladder estimators are negatively correlated, as we can see from Lemma 3.8. The point is that by his assumptions Murphy [55] gets a multiplicative structure for the measure of volatility. In Approach 3 we get the multiplicative structure by the choice of the conditional resampling (probability measure $P_{\mathcal{D}_I}^{*}$) for the measure of the (conditional) volatility of the chain-ladder estimator (see the discussion in Section 3.2.3). This means that in Approach 3 we do not assume that the estimated chain-ladder factors are independent. Since in both estimators a multiplicative structure is used, it turns out that the recursive estimator (3.67) for the conditional estimation error is exactly the estimator presented in Theorem 3 of Murphy [55] (see also Appendix B in Barnett-Zehnwirth [7]).
Example 3.7 revisited

We come back to our example in Table 2.2. This gives the following error estimates: from Tables 3.4 and 3.5 we see that the differences in the estimates for the conditional estimation error coming from the linear approximation (Mack formula) are negligible. In all examples we have looked at we came to this conclusion.

3.2.4 Conditional MSEP in the chain-ladder model for aggregated accident years

Consider two different accident years $i < l$. From the model assumptions we know that the ultimate losses $C_{i,J}$ and $C_{l,J}$ are independent. Nevertheless we have to be

  i    Ĉ^{CL}_{i,J}   CL reserves   Var^{1/2}   V̂ar(Ĉ)^{1/2}   msep^{1/2}      (in % of CL reserves)
  0      11148124             0
  1      10663318         15126         191          187          267      1.3%    1.2%    1.8%
  2      10662008         26257         742          535          914      2.8%    2.0%    3.5%
  3       9758606         34538        2669         1493         3058      7.7%    4.3%    8.9%
  4       9872218         85302        6832         3392         7628      8.0%    4.0%    8.9%
  5      10092247        156494       30478        13517        33341     19.5%    8.6%   21.3%
  6       9568143        286121       68212        27286        73467     23.8%    9.5%   25.7%
  7       8705378        449167       80077        29675        85398     17.8%    6.6%   19.0%
  8       8691971       1043242      126960        43903       134337     12.2%    4.2%   12.9%
  9       9626383       3950815      389783       129770       410817      9.9%    3.3%   10.4%

(Var^{1/2} is the conditional process standard deviation $\mathrm{Var}(C_{i,J}|\mathcal{D}_I)^{1/2}$, V̂ar(Ĉ)^{1/2} the estimation error standard deviation $\widehat{\mathrm{Var}}(\hat{C}_{i,J}^{CL}|\mathcal{D}_I)^{1/2}$; the last three columns give these terms and msep^{1/2} relative to the CL reserves.)

Table 3.4: Estimated chain-ladder reserves and error terms according to Estimator 3.11
  i    Ĉ^{CL}_{i,J}   CL reserves   Var^{1/2}   V̂ar(Ĉ)^{1/2}   msep^{1/2}      (in % of CL reserves)
  0      11148124             0
  1      10663318         15126         191          187          267      1.3%    1.2%    1.8%
  2      10662008         26257         742          535          914      2.8%    2.0%    3.5%
  3       9758606         34538        2669         1493         3058      7.7%    4.3%    8.9%
  4       9872218         85302        6832         3392         7628      8.0%    4.0%    8.9%
  5      10092247        156494       30478        13517        33341     19.5%    8.6%   21.3%
  6       9568143        286121       68212        27286        73467     23.8%    9.5%   25.7%
  7       8705378        449167       80077        29675        85398     17.8%    6.6%   19.0%
  8       8691971       1043242      126960        43903       134337     12.2%    4.2%   12.9%
  9       9626383       3950815      389783       129769       410817      9.9%    3.3%   10.4%

Table 3.5: Estimated chain-ladder reserves and error terms according to Estimator 3.12
careful if we aggregate $\hat{C}_{i,J}^{CL}$ and $\hat{C}_{l,J}^{CL}$. The estimators are no longer independent since they use the same observations for estimating the age-to-age factors $f_j$. We have that
\[
\begin{aligned}
\mathrm{msep}_{C_{i,J}+C_{l,J} | \mathcal{D}_I}\bigl( \hat{C}_{i,J}^{CL} + \hat{C}_{l,J}^{CL} \bigr)
&= E\left[\left. \Bigl( \hat{C}_{i,J}^{CL} + \hat{C}_{l,J}^{CL} - (C_{i,J} + C_{l,J}) \Bigr)^2 \right| \mathcal{D}_I\right] \\
&= \mathrm{Var}\left( C_{i,J} + C_{l,J} \,\middle|\, \mathcal{D}_I \right)
+ \Bigl( \hat{C}_{i,J}^{CL} + \hat{C}_{l,J}^{CL} - E\left[ C_{i,J} + C_{l,J} \middle| \mathcal{D}_I \right] \Bigr)^2.
\end{aligned} \tag{3.78}
\]
Using the independence of the different accident years, we obtain for the first term
\[
\mathrm{Var}\left( C_{i,J} + C_{l,J} \,\middle|\, \mathcal{D}_I \right)
= \mathrm{Var}\left( C_{i,J} \,\middle|\, \mathcal{D}_I \right) + \mathrm{Var}\left( C_{l,J} \,\middle|\, \mathcal{D}_I \right), \tag{3.79}
\]
whereas for the second term we obtain
\[
\begin{aligned}
& \Bigl( \hat{C}_{i,J}^{CL} + \hat{C}_{l,J}^{CL} - E\left[ C_{i,J} + C_{l,J} \middle| \mathcal{D}_I \right] \Bigr)^2 \\
&\quad = \Bigl( \hat{C}_{i,J}^{CL} - E\left[ C_{i,J} \middle| \mathcal{D}_I \right] \Bigr)^2
+ \Bigl( \hat{C}_{l,J}^{CL} - E\left[ C_{l,J} \middle| \mathcal{D}_I \right] \Bigr)^2 \\
&\qquad + 2 \Bigl( \hat{C}_{i,J}^{CL} - E\left[ C_{i,J} \middle| \mathcal{D}_I \right] \Bigr) \Bigl( \hat{C}_{l,J}^{CL} - E\left[ C_{l,J} \middle| \mathcal{D}_I \right] \Bigr).
\end{aligned} \tag{3.80}
\]

Hence we have the following decomposition for the conditional prediction error of the sum of two accident years
\[
\begin{aligned}
& E\left[\left. \Bigl( \hat{C}_{i,J}^{CL} + \hat{C}_{l,J}^{CL} - (C_{i,J} + C_{l,J}) \Bigr)^2 \right| \mathcal{D}_I\right] \\
&\quad = E\left[\left. \Bigl( \hat{C}_{i,J}^{CL} - C_{i,J} \Bigr)^2 \right| \mathcal{D}_I\right]
+ E\left[\left. \Bigl( \hat{C}_{l,J}^{CL} - C_{l,J} \Bigr)^2 \right| \mathcal{D}_I\right] \\
&\qquad + 2 \Bigl( \hat{C}_{i,J}^{CL} - E\left[ C_{i,J} \middle| \mathcal{D}_I \right] \Bigr) \Bigl( \hat{C}_{l,J}^{CL} - E\left[ C_{l,J} \middle| \mathcal{D}_I \right] \Bigr).
\end{aligned} \tag{3.81}
\]
Hence we obtain
\[
\begin{aligned}
\mathrm{msep}_{C_{i,J}+C_{l,J} | \mathcal{D}_I}\bigl( \hat{C}_{i,J}^{CL} + \hat{C}_{l,J}^{CL} \bigr)
&= \mathrm{msep}_{C_{i,J} | \mathcal{D}_I}\bigl( \hat{C}_{i,J}^{CL} \bigr)
+ \mathrm{msep}_{C_{l,J} | \mathcal{D}_I}\bigl( \hat{C}_{l,J}^{CL} \bigr) \\
&\quad + 2 \Bigl( \hat{C}_{i,J}^{CL} - E\left[ C_{i,J} \middle| \mathcal{D}_I \right] \Bigr) \Bigl( \hat{C}_{l,J}^{CL} - E\left[ C_{l,J} \middle| \mathcal{D}_I \right] \Bigr).
\end{aligned} \tag{3.82}
\]

In addition to the conditional mean square error of prediction of single accident years, we need to average, similar to (3.40), over the possible values of $\hat{f}_j$ for the cross-products of the conditional estimation errors of the two accident years:
\[
\begin{aligned}
& \Bigl( \hat{C}_{i,J}^{CL} - E\left[ C_{i,J} \middle| \mathcal{D}_I \right] \Bigr) \Bigl( \hat{C}_{l,J}^{CL} - E\left[ C_{l,J} \middle| \mathcal{D}_I \right] \Bigr) \\
&\quad = C_{i,I-i} \Bigl( \hat{f}_{I-i} \cdots \hat{f}_{J-1} - f_{I-i} \cdots f_{J-1} \Bigr)\,
C_{l,I-l} \Bigl( \hat{f}_{I-l} \cdots \hat{f}_{J-1} - f_{I-l} \cdots f_{J-1} \Bigr).
\end{aligned} \tag{3.83}
\]
Now we could have the same discussions about resampling as above. Here we simply use Approach 3 for resampling, i.e. we choose the probability measure $P_{\mathcal{D}_I}^{*}$. Then we can explicitly calculate these cross-products. As in (3.65) we obtain as estimate for the cross-products
\[
\begin{aligned}
& C_{i,I-i}\, C_{l,I-l}\; E_{\mathcal{D}_I}^{*}\left[ \left( \prod_{j=I-i}^{J-1} \hat{f}_j - \prod_{j=I-i}^{J-1} f_j \right) \left( \prod_{j=I-l}^{J-1} \hat{f}_j - \prod_{j=I-l}^{J-1} f_j \right) \right] \\
&\quad = C_{i,I-i}\, C_{l,I-l}\; \mathrm{Cov}_{P_{\mathcal{D}_I}^{*}}\Bigl( \hat{f}_{I-i} \cdots \hat{f}_{J-1},\; \hat{f}_{I-l} \cdots \hat{f}_{J-1} \Bigr) \\
&\quad = C_{i,I-i}\, C_{l,I-l}\; f_{I-l} \cdots f_{I-i-1}\; \mathrm{Var}_{P_{\mathcal{D}_I}^{*}}\Bigl( \hat{f}_{I-i} \cdots \hat{f}_{J-1} \Bigr) \\
&\quad = C_{i,I-i}\, C_{l,I-l}\; f_{I-l} \cdots f_{I-i-1} \left( \prod_{j=I-i}^{J-1} E_{\mathcal{D}_I}^{*}\bigl[ \hat{f}_j^2 \bigr] - \prod_{j=I-i}^{J-1} f_j^2 \right) \\
&\quad = C_{i,I-i}\; E\left[ C_{l,I-i} \middle| \mathcal{D}_I \right] \left( \prod_{j=I-i}^{J-1} \left( f_j^2 + \frac{\sigma_j^2}{S_j} \right) - \prod_{j=I-i}^{J-1} f_j^2 \right).
\end{aligned} \tag{3.84}
\]

But then the estimation of the covariance term is straightforward from the estimate of a single accident year.

Estimator 3.14 (MSEP aggregated accident years, conditional version)
Under Model Assumptions 3.9 we have the following estimator for the conditional mean square error of prediction of the ultimate claim for aggregated accident years
\[
\begin{aligned}
\widehat{\mathrm{msep}}_{\sum_i C_{i,J} | \mathcal{D}_I}\left( \sum_{i=I-J+1}^{I} \hat{C}_{i,J}^{CL} \right)
&= \hat{E}\left[\left. \left( \sum_{i=I-J+1}^{I} \hat{C}_{i,J}^{CL} - \sum_{i=I-J+1}^{I} C_{i,J} \right)^2 \right| \mathcal{D}_I\right] \\
&= \sum_{i=I-J+1}^{I} \widehat{\mathrm{msep}}_{C_{i,J} | \mathcal{D}_I}\bigl( \hat{C}_{i,J}^{CL} \bigr) \\
&\quad + 2 \sum_{I-J+1 \leq i < l \leq I} C_{i,I-i}\, \hat{C}_{l,I-i}^{CL} \left( \prod_{j=I-i}^{J-1} \left( \hat{f}_j^2 + \frac{\hat{\sigma}_j^2}{S_j} \right) - \prod_{j=I-i}^{J-1} \hat{f}_j^2 \right).
\end{aligned} \tag{3.85}
\]
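The covariance part of Estimator 3.14 reuses the single-year estimation error, in the spirit of the rewriting given in Remarks 3.15 below. A sketch (inputs hypothetical; `years` lists the pairs $(C_{i,I-i}, I-i)$ ordered by increasing accident year, so the second component is decreasing; `S[j]` is the column sum $S_j$):

```python
def est_error(c_last, j_last, f_hat, sigma2, S):
    """Single-year conditional estimation error, eq. (3.66)."""
    prod_f2 = prod_mix = 1.0
    for j in range(j_last, len(f_hat)):
        prod_f2 *= f_hat[j] ** 2
        prod_mix *= f_hat[j] ** 2 + sigma2[j] / S[j]
    return c_last ** 2 * (prod_mix - prod_f2)

def covariance_term(years, f_hat, sigma2, S):
    """Cross terms of Estimator 3.14: for each pair i < l the bracket equals
    the estimation error of year i scaled by C_hat_{l,I-i} / C_{i,I-i}."""
    total = 0.0
    for a, (c_i, j_i) in enumerate(years):
        for c_l, j_l in years[a + 1:]:
            c_proj = c_l                 # project year l up to dev. year I-i
            for j in range(j_l, j_i):
                c_proj *= f_hat[j]
            total += 2.0 * (c_proj / c_i) * est_error(c_i, j_i, f_hat, sigma2, S)
    return total
```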

Remarks 3.15

• The last terms (covariance terms) from the result above can be rewritten as
\[
2 \sum_{I-J+1 \leq i < l \leq I} \frac{\hat{C}_{l,I-i}^{CL}}{C_{i,I-i}}\; \widehat{\mathrm{Var}}\left( \hat{C}_{i,J}^{CL} \,\middle|\, \mathcal{D}_I \right), \tag{3.86}
\]
where $\widehat{\mathrm{Var}}\bigl( \hat{C}_{i,J}^{CL} \,\big|\, \mathcal{D}_I \bigr)$ is the conditional estimation error of the single accident year $i$ (see (3.66)). This may be helpful in the implementation since it leads to matrix multiplications.
• We can again do a linear approximation and then we find the estimator presented in Mack [49].
Example 3.7 revisited
We come back to our example in Table 2.2. This gives the error estimates in Table
3.6.

3.3 Analysis of error terms

In this section we further analyze the conditional mean square error of prediction of the chain-ladder method. In fact, we consider three different kinds of error terms: a) conditional process error, b) conditional prediction error, c) conditional estimation error. To analyze these three terms we define a model which is different from the classical chain-ladder model. It is slightly more complicated than the classical


    i      Ĉ^{CL}_{i,J}   CL reserves   Var^{1/2}   V̂ar(Ĉ)^{1/2}   msep^{1/2}      (in % of CL reserves)
    0        11148124             0
    1        10663318         15126         191          187          267      1.3%    1.2%    1.8%
    2        10662008         26257         742          535          914      2.8%    2.0%    3.5%
    3         9758606         34538        2669         1493         3058      7.7%    4.3%    8.9%
    4         9872218         85302        6832         3392         7628      8.0%    4.0%    8.9%
    5        10092247        156494       30478        13517        33341     19.5%    8.6%   21.3%
    6         9568143        286121       68212        27286        73467     23.8%    9.5%   25.7%
    7         8705378        449167       80077        29675        85398     17.8%    6.6%   19.0%
    8         8691971       1043242      126960        43903       134337     12.2%    4.2%   12.9%
    9         9626383       3950815      389783       129770       410817      9.9%    3.3%   10.4%
 Cov. term                                             116811       116811
 Total                      6047061      424379       185026       462960      7.0%    3.1%    7.7%

Table 3.6: Estimated chain-ladder reserves and error terms (Estimator 3.14)
model, but therefore leads to a clear distinction between these error terms. The motivation for a clear distinction between the three error terms is that the sources of these error classes are rather different, and we believe that in the light of the solvency discussions (see e.g. SST [73], Sandström [67], Buchwalder et al. [11, 14] or Wüthrich [88]) we should clearly distinguish between the different risk factors. In this section we closely follow Wüthrich [90]. For a similar Bayesian approach we also refer to Gisler [29].

3.3.1 Classical chain-ladder model

The observed individual development factors were defined by (see also (3.19))
\[
F_{i,j} = \frac{C_{i,j}}{C_{i,j-1}}, \tag{3.87}
\]
then we have with Model Assumptions 3.2 that
\[
E\left[ F_{i,j} \middle| C_{i,j-1} \right] = f_{j-1}
\qquad \text{and} \qquad
\mathrm{Var}\left( F_{i,j} \middle| C_{i,j-1} \right) = \frac{\sigma_{j-1}^2}{C_{i,j-1}}. \tag{3.88}
\]
The conditional variational coefficients of the development factors $F_{i,j}$ are given by
\[
\mathrm{Vco}\left( F_{i,j} \middle| C_{i,j-1} \right) = \mathrm{Vco}\left( C_{i,j} \middle| C_{i,j-1} \right)
= \frac{\sigma_{j-1}}{f_{j-1}}\, C_{i,j-1}^{-1/2} \longrightarrow 0, \qquad \text{as } C_{i,j-1} \to \infty. \tag{3.89}
\]
Hence for increasing volume the conditional variational coefficients of Fi,j converge
to zero! It is exactly this property (3.89) which is crucial in risk management. If
we assume that risk is defined through these variational coefficients, it means that
the risk completely disappears for very large portfolios (law of large numbers). But
we all know that this is not the case in practice. There are always external factors, which influence a portfolio and which are not diversifiable, e.g. if jurisdiction

changes it is not helpful to have a large portfolio, etc. Also the experience of
recent years has shown that we have to be very careful about external factors and
parameter errors since they cannot be diversified. Therefore, almost all developments of new solvency guidelines and requirements pay a lot of attention
to these risks. The goal here is to define a chain-ladder model which also reflects
this kind of risk class.

3.3.2 Enhanced chain-ladder model

The approach in this section modifies (3.89) as follows. We assume that there exist
constants a_0^2, a_1^2, \ldots \ge 0 such that for all 1 \le j \le J we have that

    Vco^2(F_{i,j} | C_{i,j-1}) = \frac{\sigma_{j-1}^2}{f_{j-1}^2} \, \frac{1}{C_{i,j-1}} + a_{j-1}^2.    (3.90)

Hence

    Vco^2(F_{i,j} | C_{i,j-1}) > \lim_{C_{i,j-1} \to \infty} Vco^2(F_{i,j} | C_{i,j-1}) = a_{j-1}^2,    (3.91)

which is now bounded from below by a_{j-1}^2. This implies that we replace the chain-ladder condition on the variance by

    Var(C_{i,j} | C_{i,j-1}) = \sigma_{j-1}^2\, C_{i,j-1} + a_{j-1}^2\, f_{j-1}^2\, C_{i,j-1}^2.    (3.92)

This means that we add a quadratic term to ensure that the variational coefficient
does not disappear as the volume goes to infinity.
As above, we define the chain-ladder consistent time series model. This time series
model gives an algorithm for how to simulate additional observations. This
algorithm will be used for the calculation of the estimation error.
Model Assumptions 3.16 (Enhanced time series model)
- Different accident years i are independent.
- There exist constants f_j > 0, \sigma_j^2 > 0, a_j^2 \ge 0 and random variables \varepsilon_{i,j+1} such that for all i \in \{0, \ldots, I\} and j \in \{0, \ldots, J-1\} we have that

    C_{i,j+1} = f_j\, C_{i,j} + \left( \sigma_j^2 + a_j^2\, f_j^2\, C_{i,j} \right)^{1/2} \sqrt{C_{i,j}}\; \varepsilon_{i,j+1},    (3.93)

  where, conditionally given B_0, the \varepsilon_{i,j+1} are independent with E[\varepsilon_{i,j+1} | B_0] = 0, E[\varepsilon_{i,j+1}^2 | B_0] = 1 and P[C_{i,j+1} > 0 | B_0] = 1 for all i \in \{0, \ldots, I\} and j \in \{0, \ldots, J-1\}.
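The recursion (3.93) doubles as a simulation algorithm. The following sketch (with made-up one-step parameters f_j, \sigma_j^2, a_j^2, not taken from the text) simulates the transition C_{i,j} \to C_{i,j+1} and shows numerically that the coefficient of variation of the development factor is floored near a_j for large volumes, as claimed in (3.90)-(3.91):

```python
import random
import statistics

# Hypothetical one-step parameters (illustration only, not from the text).
f_j, sigma2_j, a2_j = 1.05, 400.0, 0.02 ** 2  # a_j = 2%

def step(c_ij, rng):
    """Simulate C_{i,j+1} given C_{i,j} according to (3.93) with Gaussian noise."""
    sd = (sigma2_j + a2_j * f_j ** 2 * c_ij) ** 0.5 * c_ij ** 0.5
    return f_j * c_ij + sd * rng.gauss(0.0, 1.0)

rng = random.Random(1)
for volume in (1e4, 1e8):
    ratios = [step(volume, rng) / volume for _ in range(20000)]
    vco = statistics.pstdev(ratios) / statistics.fmean(ratios)
    print(f"volume {volume:.0e}: empirical Vco = {vco:.4f}")
# The small-volume Vco is dominated by sigma_j; for the large volume it stays
# close to the floor a_j = 0.02 instead of vanishing as in the classical model.
```

The Gaussian choice for \varepsilon_{i,j+1} is one admissible distribution under the model assumptions; any standardized distribution keeping C_{i,j+1} positive would do.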

Remark. See Remarks 3.10.


Lemma 3.17 Model 3.16 satisfies Model Assumptions 3.2 with (3.11) replaced by
(3.92), i.e. the model satisfies the chain-ladder assumptions with a modified variance function. For a_j = 0 we obtain the Time Series Version 3.9.

3.3.3 Interpretation

In this subsection we give an interpretation of the variance term (3.92). Alternatively, we could use a model with latent variables \Theta_{i,j}. This is similar to the
Bayesian approaches such as used in Gisler [29], saying that the true chain-ladder
factors f_j are themselves random variables (depending on external/latent factors).
(A1) Conditionally, given \Theta_{i,j}, we have

    E[C_{i,j+1} | \Theta_{i,j}, C_{i,j}] = f_j(\Theta_{i,j})\, C_{i,j},    (3.94)
    Var(C_{i,j+1} | \Theta_{i,j}, C_{i,j}) = \sigma_j^2(\Theta_{i,j})\, C_{i,j}.    (3.95)

(A2) The \Theta_{i,j} are independent with

    E[f_j(\Theta_{i,j}) | C_{i,j}] = f_j,    (3.96)
    Var(f_j(\Theta_{i,j}) | C_{i,j}) = a_j^2\, f_j^2,    (3.97)
    E[\sigma_j^2(\Theta_{i,j}) | C_{i,j}] = \sigma_j^2.    (3.98)

Remark. The variables F_{i,j} = C_{i,j+1}/C_{i,j} satisfy the Bühlmann-Straub model
assumptions (see Bühlmann-Gisler [18] and Section 4.3 below).
For the variance term we obtain

    Var(C_{i,j+1} | C_{i,j}) = E[ Var(C_{i,j+1} | \Theta_{i,j}, C_{i,j}) | C_{i,j} ] + Var( E[C_{i,j+1} | \Theta_{i,j}, C_{i,j}] | C_{i,j} )    (3.99)
                             = \sigma_j^2\, C_{i,j} + a_j^2\, f_j^2\, C_{i,j}^2.    (3.100)

Moreover we see that

    Vco( f_j(\Theta_{i,j}) | C_{i,j} ) = a_j.    (3.101)

Hence we introduce the following terminology:

a) Conditional process error / conditional process variance. The conditional process error corresponds to the term

    \sigma_j^2\, C_{i,j}    (3.102)

and reflects the fact that the C_{i,j+1} are random variables which have to be predicted.
For increasing volume C_{i,j} the variational coefficient of this term disappears.

b) Conditional prediction error. The conditional prediction error corresponds
to the term

    a_j^2\, f_j^2\, C_{i,j}^2    (3.103)

and reflects the fact that we have to predict the future development factors f_j(\Theta_{i,j}).
These future development factors are themselves subject to some uncertainty, and hence may
be modelled stochastically (Bayesian point of view). The Mack formula and
Estimator 3.14 for the conditional mean square error of prediction do not consider
this kind of risk.

c) Conditional estimation error. There is a third kind of risk, namely the
risk which comes from the fact that we have to estimate the true parameters f_j in
(3.96) from the data. This error term will be called conditional estimation error.
It is also considered in the Mack model and in Estimator 3.14. For the derivation
of an estimate for this term we will use Approach 3, page 50. This derivation will
use the time series definition of the chain-ladder method.

3.3.4 Chain-ladder estimator in the enhanced model

Under Model Assumptions 3.16 we have that

    F_{i,j+1} = f_j + \left( \sigma_j^2\, C_{i,j}^{-1} + a_j^2\, f_j^2 \right)^{1/2} \varepsilon_{i,j+1},    (3.104)

with

    E[F_{i,j+1} | C_{i,j}] = f_j    and    Var(F_{i,j+1} | C_{i,j}) = \sigma_j^2\, C_{i,j}^{-1} + a_j^2\, f_j^2.    (3.105)

This immediately gives the following lemma:

Lemma 3.18 Under Model Assumptions 3.16 we have for i > I - J that

    E[C_{i,J} | D_I] = E[C_{i,J} | C_{i,I-i}] = C_{i,I-i} \prod_{j=I-i}^{J-1} f_j.    (3.106)

Proof. See proof of Lemma 2.3. □


Remark. As soon as we know the chain-ladder factors f_j we can calculate the
expected conditional ultimate C_{i,J}, given the information D_I. Of course, in general,
the chain-ladder factors f_j are not known and need to be estimated from the data.

3.3.5 Conditional process and prediction errors

We now derive the recursive formula for the conditional process and prediction error: Under Model Assumptions 3.16 we have for the ultimate claim C_{i,J} of accident
year i > I - J that

    Var(C_{i,J} | D_I) = Var(C_{i,J} | C_{i,I-i})    (3.107)
      = E[ Var(C_{i,J} | C_{i,J-1}) | C_{i,I-i} ] + Var( E[C_{i,J} | C_{i,J-1}] | C_{i,I-i} ).

For the first term on the right-hand side of (3.107) we obtain under Model Assumptions 3.16 that

    E[ Var(C_{i,J} | C_{i,J-1}) | C_{i,I-i} ]
      = E\left[ \sigma_{J-1}^2\, C_{i,J-1} + a_{J-1}^2\, f_{J-1}^2\, C_{i,J-1}^2 \,\middle|\, C_{i,I-i} \right]    (3.108)
      = \sigma_{J-1}^2 \prod_{j=I-i}^{J-2} f_j\; C_{i,I-i} + a_{J-1}^2\, f_{J-1}^2 \left( Var(C_{i,J-1} | D_I) + E[C_{i,J-1} | C_{i,I-i}]^2 \right)
      = C_{i,I-i}^2 \left( \frac{\sigma_{J-1}^2}{C_{i,I-i}} \prod_{j=I-i}^{J-2} f_j + a_{J-1}^2 \prod_{j=I-i}^{J-1} f_j^2 \right) + a_{J-1}^2\, f_{J-1}^2\, Var(C_{i,J-1} | D_I).

For the second term on the right-hand side of (3.107) we obtain under Model
Assumptions 3.16

    Var( E[C_{i,J} | C_{i,J-1}] | C_{i,I-i} ) = Var( f_{J-1}\, C_{i,J-1} | C_{i,I-i} ) = f_{J-1}^2\, Var(C_{i,J-1} | D_I).    (3.109)

This leads to the following recursive formula (compare this to (3.32))

    Var(C_{i,J} | D_I) = C_{i,I-i}^2 \left( \frac{\sigma_{J-1}^2}{C_{i,I-i}} \prod_{j=I-i}^{J-2} f_j + a_{J-1}^2 \prod_{j=I-i}^{J-1} f_j^2 \right) + (1 + a_{J-1}^2)\, f_{J-1}^2\, Var(C_{i,J-1} | D_I).    (3.110)

For a_{J-1}^2 = 0 it coincides with the formula given in (3.32).

This gives the following lemma:

Lemma 3.19 (Process/prediction errors for single accident years)
Under Model Assumptions 3.16 the conditional process variance and prediction
errors for the ultimate claim of a single accident year i \in \{I-J+1, \ldots, I\} are
given by

    Var(C_{i,J} | D_I)
      = C_{i,I-i}^2 \sum_{m=I-i}^{J-1} \left[ \prod_{n=m+1}^{J-1} (1 + a_n^2)\, f_n^2 \right] \left( \frac{\sigma_m^2}{C_{i,I-i}} \prod_{j=I-i}^{m-1} f_j + a_m^2 \prod_{j=I-i}^{m} f_j^2 \right)
      = E[C_{i,J} | D_I]^2 \sum_{m=I-i}^{J-1} \left[ \frac{\sigma_m^2 / f_m^2}{E[C_{i,m} | D_I]} + a_m^2 \right] \prod_{n=m+1}^{J-1} (1 + a_n^2).    (3.111)

Lemma 3.19 implies that the conditional variational coefficient of the ultimate C_{i,J}
is given by

    Vco(C_{i,J} | D_I) = \left[ \sum_{m=I-i}^{J-1} \left( \frac{\sigma_m^2 / f_m^2}{E[C_{i,m} | D_I]} + a_m^2 \right) \prod_{n=m+1}^{J-1} (1 + a_n^2) \right]^{1/2}.    (3.112)

Henceforth we see that the conditional prediction error of C_{i,J} corresponds to
(the conditional process error disappears for infinitely large volume C_{i,I-i})

    \lim_{C_{i,I-i} \to \infty} Vco(C_{i,J} | D_I) = \left[ \sum_{m=I-i}^{J-1} a_m^2 \prod_{n=m+1}^{J-1} (1 + a_n^2) \right]^{1/2},    (3.113)

and the conditional variational coefficient for the conditional process error of
C_{i,J} is given by

    \left[ \sum_{m=I-i}^{J-1} \frac{\sigma_m^2 / f_m^2}{E[C_{i,m} | D_I]} \prod_{n=m+1}^{J-1} (1 + a_n^2) \right]^{1/2}.    (3.114)
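The recursion and the closed form can be checked against each other numerically. The following sketch uses an invented three-step run-off (parameters are not from the text) and evaluates the recursion (3.110) step by step as well as the closed-form sum (3.111); both give the same conditional variance:

```python
# Toy run-off (invented parameters, for cross-checking (3.110) vs. (3.111) only).
f      = [1.30, 1.10, 1.02]     # f_j for the remaining development years
sigma2 = [300.0, 120.0, 30.0]   # sigma_j^2
a2     = [4e-4, 4e-4, 4e-4]     # a_j^2
C0, J = 10000.0, 3              # C_{i,I-i} and number of remaining steps

# Recursion (3.110), started at Var(C_{i,I-i} | D_I) = 0.
var, mean = 0.0, C0
for j in range(J):
    var = sigma2[j] * mean + a2[j] * f[j] ** 2 * mean ** 2 \
          + (1.0 + a2[j]) * f[j] ** 2 * var
    mean *= f[j]

# Closed form (3.111), using E[C_{i,m} | D_I] built from the factors.
means = [C0]
for j in range(J):
    means.append(means[-1] * f[j])
closed = 0.0
for m in range(J):
    tail = 1.0
    for n in range(m + 1, J):
        tail *= 1.0 + a2[n]
    closed += (sigma2[m] / f[m] ** 2 / means[m] + a2[m]) * tail
closed *= means[J] ** 2
print(round(var, 3), round(closed, 3))
```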

3.3.6 Chain-ladder factors and conditional estimation error

The conditional estimation error comes from the fact that we have to estimate the
f_j from the data.
Estimation Approach 1

From Lemma 3.4 we obtain the following lemma:

Lemma 3.20 Under Model Assumptions 3.16, the estimator

    \hat{F}_j = \frac{ \sum_{i=0}^{I-(j+1)} \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2 C_{i,j}}\, F_{i,j+1} }{ \sum_{i=0}^{I-(j+1)} \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2 C_{i,j}} } = \frac{ \sum_{i=0}^{I-(j+1)} \frac{C_{i,j+1}}{\sigma_j^2 + a_j^2 f_j^2 C_{i,j}} }{ \sum_{i=0}^{I-(j+1)} \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2 C_{i,j}} }    (3.115)

is the B_{j+1}-measurable unbiased estimator for f_j which has minimal conditional
variance among all linear combinations of the unbiased estimators (F_{i,j+1})_{0 \le i \le I-(j+1)}
for f_j, conditioned on B_j, i.e.

    Var(\hat{F}_j | B_j) = \min_{\alpha_i \in \mathbb{R}} Var\left( \sum_{i=0}^{I-(j+1)} \alpha_i\, F_{i,j+1} \,\middle|\, B_j \right).    (3.116)

The conditional variance is given by

    Var(\hat{F}_j | B_j) = \left( \sum_{i=0}^{I-(j+1)} \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2 C_{i,j}} \right)^{-1}.    (3.117)

Proof. From (3.105) we see that F_{i,j+1} is an unbiased estimator for f_j, conditioned
on B_j, with

    E[F_{i,j+1} | B_j] = E[F_{i,j+1} | C_{i,j}] = f_j,    (3.118)
    Var(F_{i,j+1} | B_j) = Var(F_{i,j+1} | C_{i,j}) = \sigma_j^2\, C_{i,j}^{-1} + a_j^2\, f_j^2.    (3.119)

Hence the proof follows from Lemma 3.4. □


Remark. For a_j = 0 we obtain the classical chain-ladder estimators (2.7). Moreover, observe that for calculating the estimate \hat{F}_j one needs to know the parameters
f_j, a_j and \sigma_j (see (3.115)). Of course this contradicts the fact that we need to
estimate f_j. One way out of this dilemma is to use an estimate for f_j which is not
optimal, i.e. has larger variance.
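To make the weighting in (3.115) concrete, here is a small numerical sketch with made-up triangle data in which f_j, \sigma_j^2 and a_j^2 are treated as known; for a_j^2 = 0 the weights become proportional to C_{i,j} and the classical chain-ladder estimator is recovered:

```python
# Illustrative column of a claims triangle (invented numbers, not the text's data).
C_j  = [2000.0, 3500.0, 5000.0]   # C_{i,j},   i = 0, ..., I-(j+1)
C_j1 = [2110.0, 3640.0, 5260.0]   # C_{i,j+1}
f_j, sigma2_j, a2_j = 1.04, 50.0, 0.02 ** 2

# Variance-minimal weights of Lemma 3.20: w_i = C_{i,j} / (sigma_j^2 + a_j^2 f_j^2 C_{i,j}).
w = [c / (sigma2_j + a2_j * f_j ** 2 * c) for c in C_j]
F_hat = sum(wi * c1 / c for wi, c1, c in zip(w, C_j1, C_j)) / sum(w)

# Classical chain-ladder estimator, i.e. the a_j = 0 special case.
F_hat0 = sum(C_j1) / sum(C_j)
print(round(F_hat, 5), round(F_hat0, 5))
```

Note how for large C_{i,j} the weight saturates at 1/(a_j^2 f_j^2): very large accident years no longer dominate the estimate, unlike in the volume-weighted classical estimator.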
Let us (in Estimation Approach 1) assume that we can calculate (3.115).
Estimator 3.21 (Chain-ladder estimator, enhanced time series model)
The CL estimator for E[C_{i,j} | D_I] in the Enhanced Model 3.16 is given by

    \hat{C}_{i,j}^{(CL,2)} = \hat{E}[C_{i,j} | D_I] = C_{i,I-i} \prod_{l=I-i}^{j-1} \hat{F}_l    (3.120)

for i + j > I.
We obtain the following lemma for the estimators in the enhanced time series
model:

Lemma 3.22 Under Assumptions 3.16 we have:

a) \hat{F}_j is, given B_j, an unbiased estimator for f_j, i.e. E[\hat{F}_j | B_j] = f_j,
b) \hat{F}_j is (unconditionally) unbiased for f_j, i.e. E[\hat{F}_j] = f_j,
c) \hat{F}_0, \ldots, \hat{F}_{J-1} are uncorrelated, i.e. E[\hat{F}_0 \cdots \hat{F}_{J-1}] = \prod_{j=0}^{J-1} E[\hat{F}_j],
d) \hat{C}_{i,J}^{(CL,2)} is, given C_{i,I-i}, an unbiased estimator for E[C_{i,J} | D_I], i.e.
   E[\hat{C}_{i,J}^{(CL,2)} | C_{i,I-i}] = E[C_{i,J} | D_I], and
e) \hat{C}_{i,J}^{(CL,2)} is (unconditionally) unbiased for E[C_{i,J}], i.e. E[\hat{C}_{i,J}^{(CL,2)}] = E[C_{i,J}].

Proof. See proof of Lemma 2.5. □
Single accident years

In the sequel of this subsection we assume that the parameters in (3.115) are known
in order to calculate \hat{F}_j.
Our goal is to estimate the conditional mean square error of prediction (conditional
MSEP) as in the classical chain-ladder model

    msep_{C_{i,J} | D_I}\left( \hat{C}_{i,J}^{(CL,2)} \right) = E\left[ \left( C_{i,J} - \hat{C}_{i,J}^{(CL,2)} \right)^2 \,\middle|\, D_I \right]    (3.121)
      = Var(C_{i,J} | D_I) + \left( E[C_{i,J} | D_I] - \hat{C}_{i,J}^{(CL,2)} \right)^2.

The first term is exactly the conditional process variance and the conditional prediction error obtained in Lemma 3.19; the second term is the conditional estimation
error. It is given by

    \left( E[C_{i,J} | D_I] - \hat{C}_{i,J}^{(CL,2)} \right)^2 = C_{i,I-i}^2 \left( \prod_{j=I-i}^{J-1} f_j - \prod_{j=I-i}^{J-1} \hat{F}_j \right)^2.    (3.122)

Observe that

    \hat{F}_j = \frac{ \sum_{i=0}^{I-(j+1)} \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2 C_{i,j}}\, F_{i,j+1} }{ \sum_{i=0}^{I-(j+1)} \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2 C_{i,j}} }
             = f_j + \left( \sum_{i=0}^{I-(j+1)} \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2 C_{i,j}} \right)^{-1} \sum_{i=0}^{I-(j+1)} \left( \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2 C_{i,j}} \right)^{1/2} \varepsilon_{i,j+1}.    (3.123)

Hence \hat{F}_j consists of a constant f_j and a stochastic error term (see also Lemma
3.20). In order to determine the conditional estimation error we now proceed as

in Section 3.2.3 for the Time Series Model 3.9. This means that we use Approach
3 (conditional resampling in D_{I,i}^O, page 50) to estimate the fluctuations of the
estimators \hat{F}_0, \ldots, \hat{F}_{J-1} around the chain-ladder factors f_0, \ldots, f_{J-1}, i.e. to get an
estimate for (3.122).
We therefore (conditionally) resample the observations \hat{F}_0, \ldots, \hat{F}_{J-1}, given D_I, and
use the resampled estimates to calculate an estimate for the conditional estimation error. For these resampled observations we again use the notation P_{D_I}^* for
the conditional measure (for a more detailed discussion we refer to Section 3.2.3).
Moreover, under P_{D_I}^*, the random variables \hat{F}_j are independent with

    E_{D_I}^*[\hat{F}_j] = f_j    and    E_{D_I}^*\left[ \left( \hat{F}_j \right)^2 \right] = f_j^2 + \left( \sum_{i=0}^{I-(j+1)} \frac{C_{i,j}}{\sigma_j^2 + a_j^2 f_j^2 C_{i,j}} \right)^{-1}    (3.124)

(cf. Section 3.2.3, Approach 3). This means that the conditional estimation error
(3.122) is estimated by

    E_{D_I}^*\left[ C_{i,I-i}^2 \left( \prod_{j=I-i}^{J-1} f_j - \prod_{j=I-i}^{J-1} \hat{F}_j \right)^2 \right]
      = C_{i,I-i}^2\; Var_{P_{D_I}^*}\left( \prod_{j=I-i}^{J-1} \hat{F}_j \right)
      = C_{i,I-i}^2 \left( \prod_{j=I-i}^{J-1} E_{D_I}^*\left[ (\hat{F}_j)^2 \right] - \prod_{j=I-i}^{J-1} f_j^2 \right)    (3.125)
      = C_{i,I-i}^2 \prod_{j=I-i}^{J-1} f_j^2 \left[ \prod_{j=I-i}^{J-1} \left( \left( \sum_{k=0}^{I-(j+1)} \frac{C_{k,j}}{\sigma_j^2 / f_j^2 + a_j^2\, C_{k,j}} \right)^{-1} + 1 \right) - 1 \right].

Finally, if we do a linear approximation to (3.125) we obtain

    E_{D_I}^*\left[ C_{i,I-i}^2 \left( \prod_{j=I-i}^{J-1} f_j - \prod_{j=I-i}^{J-1} \hat{F}_j \right)^2 \right] \approx C_{i,I-i}^2 \prod_{j=I-i}^{J-1} f_j^2 \sum_{j=I-i}^{J-1} \left( \sum_{k=0}^{I-(j+1)} \frac{C_{k,j}}{\sigma_j^2 / f_j^2 + a_j^2\, C_{k,j}} \right)^{-1}.    (3.126)

For a_j = 0 this is exactly the conditional estimation error in the Mack Model 3.2.
For an increasing number of observations (accident years i) this error term goes to
zero.
If we use the linear approximation (3.126) and if we replace the parameters in
(3.111) and (3.126) by their estimators (cf. Section 3.3.7) we obtain the following
estimator for the conditional mean square error of prediction (for the time being
we assume that \sigma_j^2 and a_j^2 are known).

Estimator 3.23 (MSEP for single accident years)
Under Model Assumptions 3.16 we have the following estimator for the conditional
mean square error of prediction for the ultimate claim of a single accident year
i \in \{I-J+1, \ldots, I\}

    \widehat{msep}_{C_{i,J} | D_I}\left( \hat{C}_{i,J}^{(CL,2)} \right) = \left( \hat{C}_{i,J}^{(CL,2)} \right)^2 \Bigg[ \sum_{j=I-i}^{J-1} \left( \frac{\hat{\sigma}_j^2}{\hat{F}_j^2\, \hat{C}_{i,j}^{(CL,2)}} + \hat{a}_j^2 \right) \prod_{n=j+1}^{J-1} (1 + \hat{a}_n^2)
      + \sum_{j=I-i}^{J-1} \left( \sum_{k=0}^{I-(j+1)} \frac{C_{k,j}}{\hat{\sigma}_j^2 / \hat{F}_j^2 + \hat{a}_j^2\, C_{k,j}} \right)^{-1} \Bigg].    (3.127)
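Once the parameter estimates are plugged in, (3.127) is evaluated mechanically. The following sketch uses invented inputs (two remaining development years, hypothetical estimates and column sums, none taken from the text's example) purely to illustrate the bookkeeping:

```python
# Hypothetical estimates for one accident year with two remaining steps.
F_hat      = [1.20, 1.05]              # \hat F_j
sigma2_hat = [100.0, 40.0]             # \hat sigma_j^2
a2_hat     = [1e-4, 1e-4]              # \hat a_j^2
C_hat      = [5000.0, 6000.0, 6300.0]  # \hat C_{i,j}^{(CL,2)}; C_hat[0] is observed
cols       = [[4000.0, 4500.0], [5200.0, 5900.0]]  # C_{k,j} of earlier years, per j

J = len(F_hat)
pp = 0.0   # process + prediction part of (3.127)
for j in range(J):
    tail = 1.0
    for n in range(j + 1, J):
        tail *= 1.0 + a2_hat[n]
    pp += (sigma2_hat[j] / (F_hat[j] ** 2 * C_hat[j]) + a2_hat[j]) * tail
est = 0.0  # estimation-error part, linear approximation (3.126)
for j in range(J):
    s = sum(c / (sigma2_hat[j] / F_hat[j] ** 2 + a2_hat[j] * c) for c in cols[j])
    est += 1.0 / s
msep = C_hat[-1] ** 2 * (pp + est)
print(round(msep, 1))
```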

Aggregated accident years

Consider two different accident years k < i. From our assumptions we know
that the ultimate losses C_{k,J} and C_{i,J} are independent. Nevertheless we have to
be careful if we aggregate \hat{C}_{k,J}^{(CL,2)} and \hat{C}_{i,J}^{(CL,2)}. The estimators are no longer
independent since they use the same observations for estimating the chain-ladder
factors f_j. We have

    E\left[ \left( \hat{C}_{k,J}^{(CL,2)} + \hat{C}_{i,J}^{(CL,2)} - (C_{k,J} + C_{i,J}) \right)^2 \,\middle|\, D_I \right]    (3.128)
      = Var(C_{k,J} + C_{i,J} | D_I) + \left( \hat{C}_{k,J}^{(CL,2)} + \hat{C}_{i,J}^{(CL,2)} - E[C_{k,J} + C_{i,J} | D_I] \right)^2.

Using the independence of the different accident years, we obtain for the first term

    Var(C_{k,J} + C_{i,J} | D_I) = Var(C_{k,J} | D_I) + Var(C_{i,J} | D_I).    (3.129)

This term is exactly the conditional process and prediction error from Lemma 3.19.
For the second term in (3.128) we obtain

    \left( \hat{C}_{k,J}^{(CL,2)} + \hat{C}_{i,J}^{(CL,2)} - E[C_{k,J} + C_{i,J} | D_I] \right)^2
      = \left( \hat{C}_{k,J}^{(CL,2)} - E[C_{k,J} | D_I] \right)^2 + \left( \hat{C}_{i,J}^{(CL,2)} - E[C_{i,J} | D_I] \right)^2    (3.130)
        + 2 \left( \hat{C}_{k,J}^{(CL,2)} - E[C_{k,J} | D_I] \right) \left( \hat{C}_{i,J}^{(CL,2)} - E[C_{i,J} | D_I] \right).

Hence we have the following decomposition for the conditional mean square error
of prediction of the sum of two accident years

    E\left[ \left( \hat{C}_{k,J}^{(CL,2)} + \hat{C}_{i,J}^{(CL,2)} - (C_{k,J} + C_{i,J}) \right)^2 \,\middle|\, D_I \right]
      = E\left[ \left( \hat{C}_{k,J}^{(CL,2)} - C_{k,J} \right)^2 \,\middle|\, D_I \right] + E\left[ \left( \hat{C}_{i,J}^{(CL,2)} - C_{i,J} \right)^2 \,\middle|\, D_I \right]    (3.131)
        + 2 \left( \hat{C}_{k,J}^{(CL,2)} - E[C_{k,J} | D_I] \right) \left( \hat{C}_{i,J}^{(CL,2)} - E[C_{i,J} | D_I] \right).

In addition to the conditional MSEP of single accident years (see Estimator 3.23),
we need to average the covariance terms over the possible values of \hat{F}_j similar to
(3.122):

    \left( \hat{C}_{k,J}^{(CL,2)} - E[C_{k,J} | D_I] \right) \left( \hat{C}_{i,J}^{(CL,2)} - E[C_{i,J} | D_I] \right)
      = C_{k,I-k} \left( \prod_{l=I-k}^{J-1} \hat{F}_l - \prod_{l=I-k}^{J-1} f_l \right) C_{i,I-i} \left( \prod_{l=I-i}^{J-1} \hat{F}_l - \prod_{l=I-i}^{J-1} f_l \right).    (3.132)

As in (3.125), using Approach 3, we obtain for the covariance term (3.132)

    E_{D_I}^*\left[ C_{k,I-k}\, C_{i,I-i} \left( \prod_{j=I-k}^{J-1} \hat{F}_j - \prod_{j=I-k}^{J-1} f_j \right) \left( \prod_{j=I-i}^{J-1} \hat{F}_j - \prod_{j=I-i}^{J-1} f_j \right) \right]
      = C_{k,I-k}\, C_{i,I-i} \prod_{j=I-i}^{I-k-1} f_j \left( \prod_{j=I-k}^{J-1} E_{D_I}^*\left[ (\hat{F}_j)^2 \right] - \prod_{j=I-k}^{J-1} f_j^2 \right)    (3.133)
      = C_{k,I-k}\, C_{i,I-i} \prod_{j=I-i}^{I-k-1} f_j \prod_{j=I-k}^{J-1} f_j^2 \left[ \prod_{j=I-k}^{J-1} \left( \left( \sum_{m=0}^{I-(j+1)} \frac{C_{m,j}}{\sigma_j^2 / f_j^2 + a_j^2\, C_{m,j}} \right)^{-1} + 1 \right) - 1 \right].

If we do the same linear approximation as in (3.126) the estimation of the covariance
term is straightforward from (3.117).
Estimator 3.24 (MSEP for aggregated accident years)
Under Model Assumptions 3.16 we have the following estimator for the conditional
mean square error of prediction of the ultimate claim for aggregated accident years

    \widehat{msep}_{\sum_i C_{i,J} | D_I}\left( \sum_{i=I-J+1}^{I} \hat{C}_{i,J}^{(CL,2)} \right) = \sum_{i=I-J+1}^{I} \widehat{msep}_{C_{i,J} | D_I}\left( \hat{C}_{i,J}^{(CL,2)} \right)    (3.134)
      + 2 \sum_{I-J+1 \le k < i \le I} \hat{C}_{k,J}^{(CL,2)}\, \hat{C}_{i,J}^{(CL,2)} \sum_{j=I-k}^{J-1} \left( \sum_{m=0}^{I-(j+1)} \frac{C_{m,j}}{\hat{\sigma}_j^2 / \hat{F}_j^2 + \hat{a}_j^2\, C_{m,j}} \right)^{-1}.
Estimation Approach 2

In the derivation of the estimator \hat{F}_j, see (3.115), we have seen that we face the
problem that the parameters already need to be known in order to estimate them.

We could also use a different (unbiased) estimator. We define

    \hat{F}_j^{(0)} = \frac{ \sum_{i=0}^{I-(j+1)} C_{i,j}\, F_{i,j+1} }{ \sum_{i=0}^{I-(j+1)} C_{i,j} } = \frac{ \sum_{i=0}^{I-(j+1)} C_{i,j+1} }{ \sum_{i=0}^{I-(j+1)} C_{i,j} }.    (3.135)

\hat{F}_j^{(0)} = \hat{f}_j is the classical chain-ladder estimator in the Mack Model 3.2. It is
optimal under the Mack variance condition, but it is not optimal under our variance
condition (3.90). Observe that

    Var\left( \hat{F}_j^{(0)} \,\middle|\, B_j \right) = \left( \sum_{i=0}^{I-(j+1)} C_{i,j} \right)^{-2} \sum_{i=0}^{I-(j+1)} Var(C_{i,j+1} | C_{i,j})    (3.136)
      = \frac{ \sigma_j^2 \sum_{i=0}^{I-(j+1)} C_{i,j} + a_j^2\, f_j^2 \sum_{i=0}^{I-(j+1)} C_{i,j}^2 }{ \left( \sum_{i=0}^{I-(j+1)} C_{i,j} \right)^2 }
      = \frac{\sigma_j^2}{ \sum_{i=0}^{I-(j+1)} C_{i,j} } + a_j^2\, f_j^2\, \frac{ \sum_{i=0}^{I-(j+1)} C_{i,j}^2 }{ \left( \sum_{i=0}^{I-(j+1)} C_{i,j} \right)^2 }.

This immediately gives the following corollary:


Corollary 3.25 Under Model Assumptions 3.16 we have for i > I - J that

    C_{i,I-i} \prod_{j=I-i}^{J-1} \hat{F}_j^{(0)}    (3.137)

defines a conditionally, given C_{i,I-i}, unbiased estimator for E[C_{i,J} | D_I]. The process variance and the prediction error are given by Lemma 3.19.
For the estimation error of a single accident year in Approach 3 we obtain the
estimate

    C_{i,I-i}^2 \left[ \prod_{j=I-i}^{J-1} \left( \frac{\sigma_j^2}{ \sum_{i=0}^{I-(j+1)} C_{i,j} } + f_j^2\, a_j^2\, \frac{ \sum_{i=0}^{I-(j+1)} C_{i,j}^2 }{ \left( \sum_{i=0}^{I-(j+1)} C_{i,j} \right)^2 } + f_j^2 \right) - \prod_{j=I-i}^{J-1} f_j^2 \right].    (3.138)

This expression is of course larger than the one obtained in (3.125).



3.3.7 Parameter estimation

We need to estimate three families of parameters f_j, \sigma_j and a_j. For Estimation
Approach 1 the estimation of f_j is given in (3.115), which gives only an implicit
expression for the estimation of f_j, since the chain-ladder factors appear also in
the weights. Therefore we propose an iterative estimation in Approach 1 (on the
other hand there is no difficulty in Estimation Approach 2).

Estimation of a_j. The sequence a_j can usually not be estimated from the data,
unless we have a very large portfolio (C_{i,j} \to \infty) such that the conditional process
error disappears. Hence a_j can only be obtained if we have data from the whole
insurance market. Considerations of this kind have been made for the determination
of the parameters for prediction errors in the Swiss Solvency Test (see e.g. Tables
6.4.4 and 6.4.7 in [73]). Unfortunately, the tables only give an overall estimate for
the conditional prediction error, not a sequence a_j (e.g. the variational coefficient of
the overall error (similar to (3.101)) for motor third party liability claims reserves
is 3.5%).
We reconstruct the a_j with the help of (3.113). Define for j = 1, \ldots, J

    V_j^2 = \sum_{m=j-1}^{J-1} a_m^2 \prod_{n=m+1}^{J-1} (1 + a_n^2).    (3.139)

Hence a_{j-1}^2 can be determined recursively from V_j^2 - V_{j+1}^2:

    a_{j-1}^2 = \left( V_j^2 - V_{j+1}^2 \right) \prod_{n=j}^{J-1} (1 + a_n^2)^{-1}.    (3.140)
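The backward recursion above can be sketched in a few lines. The V values below are invented for illustration; the list entry V[j] plays the role of V_{j+1} in (3.139), so the last entry corresponds to V_J and directly gives a_{J-1}^2:

```python
# Invented limiting coefficients; V[j] stands for V_{j+1}, j = 0, ..., J-1.
V = [0.050, 0.036, 0.025, 0.015, 0.008]
J = len(V)
a2 = [0.0] * J                  # a2[j] = a_j^2
a2[J - 1] = V[J - 1] ** 2       # V_J^2 = a_{J-1}^2
for j in range(J - 2, -1, -1):  # recursion (3.140), run backwards
    prod = 1.0
    for n in range(j + 1, J):
        prod *= 1.0 + a2[n]
    a2[j] = (V[j] ** 2 - V[j + 1] ** 2) / prod
print([round(x, 6) for x in a2])
```

Since (3.139) telescopes, re-inserting the recovered a_j^2 into (3.139) reproduces the prescribed V values exactly (up to floating-point error).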

Henceforth, we can estimate a_{j-1} as soon as we have an estimate for V_j. V_j corresponds to

    V_j = \lim_{C_{i,j-1} \to \infty} Vco(C_{i,J} | C_{i,j-1})    (3.141)

(cf. (3.113)). Hence we need to estimate the conditional prediction error of C_{i,J},
given the observation C_{i,j-1}. Since we do not really have a good idea/guess about
the conditional variational coefficient in (3.141) we express the conditional variational coefficient in terms of reserves

    Vco(C_{i,J} | C_{i,j-1}) = \frac{ Var(C_{i,J} | C_{i,j-1})^{1/2} }{ E[C_{i,J} | C_{i,j-1}] }    (3.142)
      = \frac{ Var(C_{i,J} - C_{i,j-1} | C_{i,j-1})^{1/2} }{ E[C_{i,J} - C_{i,j-1} | C_{i,j-1}] } \cdot \frac{ E[C_{i,J} - C_{i,j-1} | C_{i,j-1}] }{ E[C_{i,J} | C_{i,j-1}] }
      = Vco(C_{i,J} - C_{i,j-1} | C_{i,j-1}) \cdot \frac{ \prod_{l=j-1}^{J-1} f_l - 1 }{ \prod_{l=j-1}^{J-1} f_l }.

In our examples we assume that the conditional variational coefficient for the conditional prediction error of the reserves C_{i,J} - C_{i,j-1} is constant equal to r, and we
set

    \hat{V}_j = r\, \frac{ \prod_{l=j-1}^{J-1} \hat{F}_l^{(0)} - 1 }{ \prod_{l=j-1}^{J-1} \hat{F}_l^{(0)} }.    (3.143)

This immediately gives an estimate \hat{a}_j for the conditional prediction error a_j.
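As a small sketch, (3.143) only needs the classical development factors and the constant r. The factors below are of the same magnitude as the leading entries of Table 3.8 but the short list is truncated for illustration; the list index plays the role of l = j-1:

```python
# Constant coefficient of variation for the reserves, as in the example (r = 5%).
r = 0.05
F0 = [1.4416, 1.0278, 1.0112, 1.0057]   # \hat F_l^{(0)}, truncated illustration
V_hat = []
for j in range(len(F0)):
    prod = 1.0
    for f in F0[j:]:                    # product of the remaining factors
        prod *= f
    V_hat.append(r * (prod - 1.0) / prod)
print([round(v, 4) for v in V_hat])
```

The earlier the development year, the larger the outstanding part of the ultimate relative to the current payments, hence the larger \hat{V}_j.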
Estimation of \sigma_j. \sigma_j^2 is estimated iteratively from the data. A tedious calculation
on conditional expectations gives

    \frac{1}{I-(j+1)} \sum_{i=0}^{I-(j+1)} C_{i,j}\, E\left[ \left( F_{i,j+1} - \hat{F}_j^{(0)} \right)^2 \,\middle|\, B_j \right]
      = \sigma_j^2 + \frac{a_j^2\, f_j^2}{I-(j+1)} \left( \sum_{i=0}^{I-(j+1)} C_{i,j} - \frac{ \sum_{i=0}^{I-(j+1)} C_{i,j}^2 }{ \sum_{i=0}^{I-(j+1)} C_{i,j} } \right).    (3.144)

Hence we get the following iteration for the estimation of \sigma_j^2: For k \ge 1

    \left( \hat{\sigma}_j^2 \right)^{(k)} = \frac{1}{I-(j+1)} \sum_{i=0}^{I-(j+1)} C_{i,j} \left( F_{i,j+1} - \hat{F}_j^{(0)} \right)^2 - \frac{ \hat{a}_j^2 \left( \hat{F}_j^{(k-1)} \right)^2 }{I-(j+1)} \left( \sum_{i=0}^{I-(j+1)} C_{i,j} - \frac{ \sum_{i=0}^{I-(j+1)} C_{i,j}^2 }{ \sum_{i=0}^{I-(j+1)} C_{i,j} } \right).    (3.145)

If \left( \hat{\sigma}_j^2 \right)^{(k)} becomes negative, it is set to 0, i.e. we only have a conditional prediction
error and the conditional process error is equal to zero (the volume is sufficiently
large, such that the conditional process error disappears).

Estimation of \hat{F}_j. The estimators \hat{F}_j are then iteratively determined via (3.115).
For k \ge 1

    \hat{F}_j^{(k)} = \frac{ \sum_{i=0}^{I-(j+1)} \frac{ C_{i,j+1} }{ \left( \hat{\sigma}_j^2 \right)^{(k)} + \hat{a}_j^2 \left( \hat{F}_j^{(k-1)} \right)^2 C_{i,j} } }{ \sum_{i=0}^{I-(j+1)} \frac{ C_{i,j} }{ \left( \hat{\sigma}_j^2 \right)^{(k)} + \hat{a}_j^2 \left( \hat{F}_j^{(k-1)} \right)^2 C_{i,j} } }.    (3.146)

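The alternating iteration (3.145)-(3.146) for a single development year j can be sketched as follows. The data column is invented and \hat{a}_j^2 is assumed given (e.g. from market data, as discussed above); this is an illustrative sketch, not the authors' implementation:

```python
# Invented column of a triangle; a2 = \hat a_j^2 is assumed known externally.
C_j  = [1000.0, 1200.0, 900.0, 1500.0]   # C_{i,j}
C_j1 = [1060.0, 1250.0, 980.0, 1560.0]   # C_{i,j+1}
a2 = 0.0004
n = len(C_j) - 1                          # I - (j+1)

F0 = sum(C_j1) / sum(C_j)                 # starting value \hat F_j^{(0)}, eq. (3.135)
S, S2 = sum(C_j), sum(c * c for c in C_j)
raw = sum(c * (c1 / c - F0) ** 2 for c, c1 in zip(C_j, C_j1)) / n
F, sigma2 = F0, raw
for _ in range(3):                        # three iterations usually suffice (Remarks 3.26)
    sigma2 = max(raw - a2 * F ** 2 * (S - S2 / S) / n, 0.0)   # eq. (3.145)
    w = [c / (sigma2 + a2 * F ** 2 * c) for c in C_j]         # weights of eq. (3.146)
    F = sum(wi * c1 / c for wi, c1, c in zip(w, C_j1, C_j)) / sum(w)
print(round(F, 5), round(sigma2, 3))
```

The `max(..., 0.0)` implements the truncation rule: a negative variance estimate is set to zero, leaving only the conditional prediction error.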
Remarks 3.26
- In all examples we have looked at we have observed very fast convergence of
  \left( \hat{\sigma}_j^2 \right)^{(k)} and \hat{F}_j^{(k)}, in the sense that we have not observed any changes in the
  ultimates after three iterations for the \hat{F}_j.
- To determine \sigma_j^2 we could also choose a different unbiased estimator via

      1 = \frac{1}{I-(j+1)} \sum_{i=0}^{I-(j+1)} \frac{ C_{i,j} }{ \sigma_j^2 + a_j^2 f_j^2 C_{i,j} }\, E\left[ \left( F_{i,j+1} - \hat{F}_j \right)^2 \,\middle|\, B_j \right].    (3.147)

  The difficulty with (3.147) is that it again leads to an implicit expression for \hat{\sigma}_j^2.
- The formula for the MSEP, Estimator 3.24, was derived under the assumption
  that the underlying model parameters f_j, \sigma_j and a_j are known; if we replace
  these parameters by their estimates (as described via the iteration in this
  section) we obtain additional sources of estimation error! However, since the
  calculations get too tedious (or even impossible) we omit further derivations
  of the MSEP and take Estimator 3.24 as a first approximation.
We close this section with an example:
Example 3.27 (MSEP in the enhanced chain-ladder model)
We choose two portfolios: Portfolio A and Portfolio B. Both are of similar type
(i.e. they consider the same line of business); moreover, Portfolio B is contained in Portfolio A.

Portfolio A


i \ j      0        1        2        3        4        5        6        7        8        9       10
0     111551   154622   156159   156759   157583   158666   160448   160552   160568   160617   160621
1     116163   171449   175502   176533   176989   177269   178488   178556   178620   178621   178644
2     127615   189682   193823   196324   198632   200299   202740   203848   204168   205560   205562
3     147659   217342   220123   222731   222916   223320   223447   223566   227103   227127   227276
4     157495   212770   219680   220978   221276   223724   223743   223765   223669   223601   223558
5     154969   213352   219201   220469   222751   223958   224005   224030   223975   224048   224036
6     152833   209969   214692   220040   223467   223754   223752   223593   223585   223688   223697
7     144223   207644   212443   214108   214661   214610   214564   214484   214459   214459
8     145612   209604   214161   215982   217962   220783   221078   221614   221616
9     196695   282621   288676   290036   292206   294531   294671   294705
10    181381   260308   266497   269130   269404   269691   269720
11    177168   263130   268848   270787   271624   271688
12    156505   230607   237102   244847   245940
13    157839   239723   261213   264755
14    159429   233309   239800
15    169990   246019
16    173377

Table 3.7: Observed cumulative payments C_{i,j} in Portfolio A

j                0        1       2       3       4       5       6       7       8       9
\hat{f}_j        1.4416   1.0278  1.0112  1.0057  1.0048  1.0025  1.0008  1.0020  1.0010  1.0001
\hat{\sigma}_j  18.3478   8.7551  3.9082  2.2050  2.1491  2.0887  0.8302  2.4751  1.0757  0.1280

Table 3.8: Chain-ladder parameters in Mack's Model 3.2 for Portfolio A

This leads in Mack's Model 3.2 to the following reserves:
i       CL reserves    msep^{1/2}          Var(C_{i,J}|D_I)^{1/2}   \hat{Var}(\hat{C}_{i,J}^{CL}|D_I)^{1/2}
7            20           64  (322.0%)         59  (300.4%)               23  (115.8%)
8           231          543  (235.2%)        510  (220.8%)              187   (80.9%)
9           898         1582  (176.1%)       1468  (163.4%)              589   (65.5%)
10         1044         1573  (150.7%)       1470  (140.9%)              560   (53.7%)
11         1731         1957  (113.1%)       1838  (106.2%)              674   (38.9%)
12         2747         2169   (79.0%)       2055   (74.8%)              693   (25.2%)
13         4487         2563   (57.1%)       2426   (54.1%)              826   (18.4%)
14         6803         3169   (46.6%)       3030   (44.5%)              928   (13.6%)
15        14025         5663   (40.4%)       5443   (38.8%)             1564   (11.2%)
16        90809        10121   (11.1%)       9762   (10.8%)             2669    (2.9%)
Total    122795        13941   (11.4%)      12336   (10.0%)             6495    (5.3%)

Table 3.9: Reserves and conditional MSEP in Mack's Model 3.2 for Portfolio A (percentages relative to the reserves)

We now compare these results to the estimates in Model 3.16: We set r = 5%
and obtain the parameter estimates given below.
Remark. In practice a_j can only be determined with the help of external know-how
and market data. Therefore, e.g. for solvency purposes, a_j should be determined
a priori by the regulator. It answers the question "how good can an actuarial
estimate at most be?".

j                            0        1        2        3        4        5        6        7        8        9       10
\hat{F}_j^{(0)} = \hat{f}_j  1.4416   1.0278   1.0112   1.0057   1.0048   1.0025   1.0008   1.0020   1.0010   1.0001
\hat{V}_j                    1.7187%  0.2697%  0.1379%  0.0833%  0.0552%  0.0316%  0.0193%  0.0152%  0.0052%  0.0005%  0.0000%
\hat{a}_j                    1.6974%  0.2317%  0.1099%  0.0624%  0.0453%  0.0251%  0.0119%  0.0143%  0.0052%  0.0005%  0.0000%

Table 3.10: Estimated \hat{a}_j and \hat{V}_j in Model 3.16

j                        0         1        2        3        4        5        6        7        8        9
\hat{F}_j^{(1)}          1.44152   1.02784  1.01123  1.00572  1.00477  1.00249  1.00082  1.00200  1.00095  1.00009
\hat{F}_j^{(2)}          1.44152   1.02784  1.01123  1.00572  1.00477  1.00249  1.00082  1.00200  1.00095  1.00009
\hat{F}_j^{(3)}          1.44152   1.02784  1.01123  1.00572  1.00477  1.00249  1.00082  1.00200  1.00095  1.00009
\hat{\sigma}_j^{(1)}    15.82901   8.68855  3.87516  2.18642  2.13924  2.08568  0.82851  2.47435  1.07546  0.12802
\hat{\sigma}_j^{(2)}    15.82926   8.68856  3.87516  2.18642  2.13924  2.08568  0.82851  2.47435  1.07546  0.12802
\hat{\sigma}_j^{(3)}    15.82926   8.68856  3.87516  2.18642  2.13924  2.08568  0.82851  2.47435  1.07546  0.12802

Table 3.11: Estimated parameters in Model 3.16 for Portfolio A

Already after 3 iterations the parameters have sufficiently converged such that the reserves are stable.
ubingen)

i       CL reserves    msep^{1/2}          Var(C_{i,J}|D_I)^{1/2}   process error^{1/2}    prediction error^{1/2}   \hat{Var}(\hat{C}_{i,J}^{(CL,2)}|D_I)^{1/2}
7            20           64  (321.9%)         59  (300.4%)            59  (300.4%)             1   (5.0%)               23  (115.8%)
8           231          543  (235.1%)        510  (220.8%)           510  (220.7%)            12   (5.0%)              187   (80.9%)
9           898         1581  (176.0%)       1468  (163.4%)          1467  (163.3%)            45   (5.0%)              589   (65.5%)
10         1044         1573  (150.7%)       1470  (140.8%)          1469  (140.7%)            52   (5.0%)              560   (53.7%)
11         1731         1956  (113.0%)       1836  (106.1%)          1834  (106.0%)            87   (5.0%)              674   (38.9%)
12         2747         2165   (78.8%)       2051   (74.7%)          2046   (74.5%)           137   (5.0%)              693   (25.2%)
13         4489         2556   (56.9%)       2418   (53.9%)          2408   (53.6%)           224   (5.0%)              826   (18.4%)
14         6804         3153   (46.3%)       3013   (44.3%)          2994   (44.0%)           340   (5.0%)              928   (13.6%)
15        14024         5627   (40.1%)       5405   (38.5%)          5360   (38.2%)           701   (5.0%)             1565   (11.2%)
16        90796         9244   (10.2%)       8844    (9.7%)          7590    (8.4%)          4540   (5.0%)             2688    (3.0%)
Total    122784        13298   (10.8%)      11598    (9.4%)         10640    (8.7%)          4615   (3.8%)             6504    (5.3%)

Table 3.12: Reserves and conditional MSEP in Model 3.16 for Portfolio A

Comment. The resulting reserves are almost the same in the Mack Model 3.2 and in Model 3.16. We now obtain both a
conditional process error and a conditional prediction error term. The sum of these two terms has about the same size as the
conditional process error in Mack's method. This comes from the fact that we use the same data to estimate the parameters. But
the error term in the enhanced chain-ladder model is now bounded from below by the conditional prediction error, whereas the
conditional process error in the Mack model converges to zero for increasing volume.

Portfolio B

We now choose a second portfolio B, which includes similar business as our example given in Table 3.7 (Portfolio A). In fact,
Portfolio B is a sub-portfolio of Portfolio A, given in Table 3.13, containing exactly the same line of business. Therefore we assume
that the conditional prediction errors are the same as in Table 3.12.

i \ j      0        1        2        3        4        5        6        7        8        9       10
0      53095    73067    74548    75076    75894    76128    77904    78008    78022    78071    78075
1      59183    87679    89303    90033    90058    90303    91454    91472    91482    91483    91494
2      64640    95734    97648    99429   100462   101683   103549   104642   104917   105560   105560
3      72150   105349   106546   106919   106934   107144   107170   107225   107232   107232   107232
4      76272   105630   108406   108677   108838   110140   110110   110111   110155   110155   110110
5      75469   105987   108779   109093   111366   111390   111422   111448   111367   111369   111369
6      78835   108835   111455   116231   117896   118161   118157   117940   117940   117972   117974
7      70780    98753   101347   102624   102629   102587   102545   102500   102474   102474
8      73311   101911   103657   104516   105297   107749   107911   107949   107949
9     102741   144167   147211   147777   149506   149753   149865   149899
10     97797   143742   147683   149575   149710   149857   149890
11     98682   147042   151029   151960   152645   152682
12     86067   126032   129969   131858   131972
13     87013   131721   150062   152883
14     83678   124048   128322
15     90415   129970
16     86382

Table 3.13: Observed cumulative payments C_{i,j} in Portfolio B

j                        0         1         2        3        4        5        6        7        8        9
\hat{F}_j^{(1)}          1.43999   1.03310   1.01168  1.00632  1.00463  1.00415  1.00102  1.00026  1.00088  0.99996
\hat{F}_j^{(2)}          1.43999   1.03310   1.01168  1.00632  1.00463  1.00415  1.00102  1.00026  1.00088  0.99996
\hat{F}_j^{(3)}          1.43999   1.03310   1.01168  1.00632  1.00463  1.00415  1.00102  1.00026  1.00088  0.99996
\hat{\sigma}_j^{(1)}    11.16524  10.86582   3.52827  2.24269  2.35370  2.63740  1.10600  0.30292  0.69058  0.05593
\hat{\sigma}_j^{(2)}    11.16676  10.86582   3.52827  2.24269  2.35370  2.63740  1.10600  0.30292  0.69058  0.05593
\hat{\sigma}_j^{(3)}    11.16676  10.86582   3.52827  2.24269  2.35370  2.63740  1.10600  0.30292  0.69058  0.05593

Table 3.14: Estimated parameters in Model 3.16 for Portfolio B

i       CL reserves    msep^{1/2}           Var(C_{i,J}|D_I)^{1/2}   process error^{1/2}     prediction error^{1/2}   \hat{Var}(\hat{C}_{i,J}^{(CL,2)}|D_I)^{1/2}
7            -4           19  (-485.1%)         18  (-453.9%)           18  (-453.8%)             0  (-12.0%)               7  (-171.1%)
8            91          242   (265.6%)        228   (249.8%)          228   (249.7%)             6    (6.2%)              82    (90.5%)
9           166          318   (191.6%)        293   (176.4%)          292   (175.8%)            23   (13.7%)             124    (74.7%)
10          320          557   (174.4%)        519   (162.5%)          518   (162.2%)            29    (9.1%)             202    (63.3%)
11          961         1232   (128.3%)       1159   (120.6%)         1158   (120.5%)            49    (5.1%)             419    (43.7%)
12         1445         1453   (100.6%)       1381    (95.6%)         1379    (95.4%)            74    (5.1%)             452    (31.3%)
13         2650         1835    (69.2%)       1734    (65.4%)         1729    (65.3%)           130    (4.9%)             599    (22.6%)
14         3749         2144    (57.2%)       2051    (54.7%)         2043    (54.5%)           182    (4.9%)             625    (16.7%)
15         8224         4726    (57.5%)       4545    (55.3%)         4530    (55.1%)           373    (4.5%)            1295    (15.7%)
16        45878         5885    (12.8%)       5652    (12.3%)         5175    (11.3%)          2273    (5.0%)            1639     (3.6%)
Total     63480         8933    (14.1%)       7967    (12.6%)         7623    (12.0%)          2316    (3.6%)            4041     (6.4%)

Table 3.15: Reserves and conditional MSEP in Model 3.16 for Portfolio B

Comments.
- A more conservative model would be to assume total dependence for the conditional prediction errors between the accident
  years; then we would not observe any diversification of the conditional prediction error between the accident years.
- The conditional prediction errors in Portfolio A and Portfolio B slightly differ since we choose different development factors
  \hat{F}_j and since the relative weights C_{i,I-i} between the accident years i differ in Portfolio A and Portfolio B.
- The error terms between Portfolio A and Portfolio B are now directly comparable: The conditional prediction errors are
  the same for both portfolios. The conditional process error decreases from Portfolio B to Portfolio A by about a factor
  \sqrt{2}, since Portfolio A has about twice the size of Portfolio B. The conditional estimation error decreases from Portfolio B to
  Portfolio A since in Portfolio A we have more data to estimate the parameters.

84

Chapter 3. Chain-ladder models

c
2006
(M. W
uthrich, ETH Z
urich & M. Merz, Uni T
ubingen)

Chapter 4
Bayesian models

4.1 Introduction to credibility claims reserving methods

In the broadest sense, Bayesian methods for claims reserving can be considered
as methods in which one combines expert knowledge or existing a priori information with observations resulting in an estimate for the ultimate claim. In the
simplest case this a priori knowledge/information is given e.g. by a single value like
an a priori estimate for the ultimate claim or for the average loss ratio (see this
section and the following section). However, in a strict sense the a priori knowledge/information in Bayesian methods for claims reserving is given by an a priori
distribution of a random quantity such as the ultimate claim or a risk parameter.
The Bayesian inference is then understood to be the process of combining the a priori distribution of the random quantity with the data given in the upper trapezoid
via Bayes theorem. In this manner it is sometimes possible to obtain an analytic
expression for the a posteriori distribution of the ultimate claim that reflects the
change in the uncertainty due to the observations. The a posteriori expectation of
the ultimate claim is then called the Bayesian estimator for the ultimate claim
and minimizes the quadratic loss in the class of all estimators which are square integrable functions of the observations (see Section 4.2). In cases where we are not
able to explicitly calculate the a posteriori expectation of the ultimate claim, we restrict the search for the best estimator to the smaller class of estimators which are linear
functions of the observations (see Sections 4.3, 4.4 and 4.5).
4.1.1 Benktander-Hovinen method

This method goes back to Benktander [8] and Hovinen [37]. They have developed independently a method which leads to the same total estimated loss amount.

Choose i > I − J. Assume we have an a priori estimate μ̂_i for E[C_{i,J}] and that the claims development pattern (β_j)_{0≤j≤J} with E[C_{i,j}] = β_j E[C_{i,J}] is known. Since the Bornhuetter-Ferguson method completely ignores the observation C_{i,I−i} on the last observed diagonal, and the chain-ladder method completely ignores the a priori estimate μ̂_i at hand, one could consider a credibility mixture of these two methods (see (2.23)-(2.24)): for c ∈ [0, 1] we define the credibility mixture

  Ŝ_i(c) = c Ĉ_{i,J}^{CL} + (1 − c) μ̂_i        (4.1)

for I − J + 1 ≤ i ≤ I, where Ĉ_{i,J}^{CL} is the chain-ladder estimate for the ultimate claim. The parameter c should increase as C_{i,I−i} develops, since the information contained in C_{i,j} grows with increasing time. Benktander [8] proposed the choice c = β_{I−i}. This leads to the following estimator:
Estimator 4.1 (Benktander-Hovinen estimator) The BH estimator is given by

  Ĉ_{i,J}^{BH} = C_{i,I−i} + (1 − β_{I−i}) [ β_{I−i} Ĉ_{i,J}^{CL} + (1 − β_{I−i}) μ̂_i ]        (4.2)

for I − J + 1 ≤ i ≤ I.
Observe that we could again identify the claims development pattern (β_j)_{0≤j≤J} with the chain-ladder factors (f_j)_{0≤j<J}. This can be done if we use Model Assumptions 2.9 for the Bornhuetter-Ferguson motivation, see also (2.22). Henceforth we identify in the sequel of this section

  β_j = 1 / ∏_{k=j}^{J−1} f_k .        (4.3)

Since the development pattern β_j is known we also have (using (4.3)) known chain-ladder factors, which implies that we set

  f_j = f̂_j        (4.4)

for 0 ≤ j ≤ J − 1. Then the BH estimator can be written in the form

  Ĉ_{i,J}^{BH} = β_{I−i} Ĉ_{i,J}^{CL} + (1 − β_{I−i}) Ĉ_{i,J}^{BF}        (4.5)
             = C_{i,I−i} + (1 − β_{I−i}) Ĉ_{i,J}^{BF}        (4.6)

(cf. (2.18) and (2.24)).

Remarks 4.2
• Equation (4.6) shows that the Benktander-Hovinen estimator can be seen as an iterated Bornhuetter-Ferguson estimator, using the BF estimate as new a priori estimate.
• The following lemma shows that the weight β_{I−i} is not a fixed point of our iteration, since we have to evaluate the BH estimate at 1 − (1 − β_{I−i})².
Lemma 4.3 We have that

  Ĉ_{i,J}^{BH} = Ŝ_i( 1 − (1 − β_{I−i})² )        (4.7)

for I − J + 1 ≤ i ≤ I.
Proof. Using C_{i,I−i} = β_{I−i} Ĉ_{i,J}^{CL} (which holds by (4.3) and (4.4)), it holds that

  Ĉ_{i,J}^{BH} = C_{i,I−i} + (1 − β_{I−i}) [ β_{I−i} Ĉ_{i,J}^{CL} + (1 − β_{I−i}) μ̂_i ]
             = β_{I−i} Ĉ_{i,J}^{CL} + β_{I−i} (1 − β_{I−i}) Ĉ_{i,J}^{CL} + (1 − β_{I−i})² μ̂_i        (4.8)
             = [ 1 − (1 − β_{I−i})² ] Ĉ_{i,J}^{CL} + (1 − β_{I−i})² μ̂_i  =  Ŝ_i( 1 − (1 − β_{I−i})² ).

This finishes the proof of the lemma. □
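As a quick numerical illustration, the BH estimator and the identity of Lemma 4.3 can be coded directly. The following sketch is our own illustration in plain Python; the inputs are the rounded figures of accident year i = 9 from Example 4.4 below, so the results deviate slightly from Table 4.1.

```python
# Benktander-Hovinen estimator as a credibility mixture (sketch).
# Inputs for one accident year: diagonal observation C, a priori
# estimate mu, development pattern value beta = beta_{I-i}.
# Illustrative, rounded figures (accident year i = 9, Example 4.4);
# results therefore differ slightly from Table 4.1.

def bh_estimator(C, mu, beta):
    ccl = C / beta                           # chain-ladder estimate
    s_beta = beta * ccl + (1.0 - beta) * mu  # credibility mixture S_i(beta), see (4.1)
    return C + (1.0 - beta) * s_beta         # BH estimator (4.2)

def credibility_mixture(C, mu, beta, c):
    return c * (C / beta) + (1.0 - c) * mu   # S_i(c), see (4.1)

C, mu, beta = 5675568.0, 11618437.0, 0.59
bh = bh_estimator(C, mu, beta)
# Lemma 4.3: the BH estimator equals S_i(1 - (1 - beta)^2)
assert abs(bh - credibility_mixture(C, mu, beta, 1.0 - (1.0 - beta) ** 2)) < 1e-4
```

The assertion holds exactly (up to floating point), because the identity of Lemma 4.3 is algebraic and does not depend on the particular input figures.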
Example 4.4 (Benktander-Hovinen method)
We revisit the data set given in Examples 2.7 and 2.11.

 i      C_{i,I−i}     μ̂_i          β_{I−i}   Ĉ_{i,J}^{CL}   Ĉ_{i,J}^{BH}   CL reserves   BH reserves   BF reserves
 0      11148124      11653101      100.0%    11148124       11148124
 1      10648192      11367306       99.9%    10663318       10663319          15126         15127         16124
 2      10635751      10962965       99.8%    10662008       10662010          26257         26259         26998
 3       9724068      10616762       99.6%     9758606        9758617          34538         34549         37575
 4       9786916      11044881       99.1%     9872218        9872305          85302         85389         95434
 5       9935753      11480700       98.4%    10092247       10092581         156494        156828        178024
 6       9282022      11413572       97.0%     9568143        9569793         286121        287771        341305
 7       8256211      11126527       94.8%     8705378        8711824         449167        455612        574089
 8       7648729      10986548       88.0%     8691971        8725026        1043242       1076297       1318646
 9       5675568      11618437       59.0%     9626383        9961926        3950815       4286358       4768384
 Total                                                                       6047061       6424190       7356580

Table 4.1: Claims reserves from the Benktander-Hovinen method

We see that the Benktander-Hovinen reserves are in between the chain-ladder reserves and the Bornhuetter-Ferguson reserves. They are closer to the chain-ladder reserves because β_{I−i} is larger than 50% for all accident years i ∈ {0, . . . , I}.

The next theorem is due to Mack [51]. It says that if we further iterate the BF method, we arrive at the chain-ladder reserve:

Theorem 4.5 (Mack [51]) Choose Ĉ^{(0)} = μ̂_i and define for m ≥ 0

  Ĉ^{(m+1)} = C_{i,I−i} + (1 − β_{I−i}) Ĉ^{(m)}.        (4.9)

If β_{I−i} > 0 then

  lim_{m→∞} Ĉ^{(m)} = Ĉ_{i,J}^{CL}.        (4.10)

Proof. For m ≥ 1 we claim that

  Ĉ^{(m)} = [ 1 − (1 − β_{I−i})^m ] Ĉ_{i,J}^{CL} + (1 − β_{I−i})^m μ̂_i.        (4.11)

The claim is true for m = 1 (BF estimator) and for m = 2 (BH estimator, see Lemma 4.3). Hence we prove the claim by induction. Induction step m → m + 1:

  Ĉ^{(m+1)} = C_{i,I−i} + (1 − β_{I−i}) Ĉ^{(m)}        (4.12)
           = C_{i,I−i} + (1 − β_{I−i}) { [ 1 − (1 − β_{I−i})^m ] Ĉ_{i,J}^{CL} + (1 − β_{I−i})^m μ̂_i }
           = β_{I−i} Ĉ_{i,J}^{CL} + [ (1 − β_{I−i}) − (1 − β_{I−i})^{m+1} ] Ĉ_{i,J}^{CL} + (1 − β_{I−i})^{m+1} μ̂_i,

which proves (4.11). But from (4.11) the claim of the theorem immediately follows. □
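Theorem 4.5 is easy to illustrate numerically. The following sketch (our own illustration in plain Python, with the rounded inputs of accident year i = 9 from Example 4.4) iterates (4.9) and checks both the closed form (4.11) and the convergence to the chain-ladder estimate C/β:

```python
# Iterated Bornhuetter-Ferguson method, see (4.9)-(4.11).
# With beta_{I-i} > 0 the iterates converge geometrically
# (rate 1 - beta) to the chain-ladder estimate C / beta.

def iterate_bf(C, mu, beta, m):
    est = mu                             # C^(0) = a priori estimate
    for _ in range(m):
        est = C + (1.0 - beta) * est     # iteration (4.9)
    return est

C, mu, beta = 5675568.0, 11618437.0, 0.59   # rounded inputs, Example 4.4, i = 9
ccl = C / beta
for m in [1, 2, 5, 10, 20]:
    # closed form (4.11): C^(m) = (1-(1-beta)^m) * ccl + (1-beta)^m * mu
    closed = (1.0 - (1.0 - beta) ** m) * ccl + (1.0 - beta) ** m * mu
    assert abs(iterate_bf(C, mu, beta, m) - closed) < 1e-4
assert abs(iterate_bf(C, mu, beta, 50) - ccl) < 1.0   # converged to CL estimate
```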
Example 4.4, revisited
In view of Theorem 4.5 we have

 i    Ĉ^{(1)} = Ĉ^{BF}   Ĉ^{(2)} = Ĉ^{BH}   Ĉ^{(3)}      Ĉ^{(4)}      Ĉ^{(5)}     ...   Ĉ^{(∞)} = Ĉ^{CL}
 0    11148124           11148124           11148124     11148124     11148124    ...   11148124
 1    10664316           10663319           10663318     10663318     10663318    ...   10663318
 2    10662749           10662010           10662008     10662008     10662008    ...   10662008
 3     9761643            9758617            9758606      9758606      9758606    ...    9758606
 4     9882350            9872305            9872218      9872218      9872218    ...    9872218
 5    10113777           10092581           10092252     10092247     10092247    ...   10092247
 6     9623328            9569793            9568192      9568144      9568143    ...    9568143
 7     8830301            8711824            8705711      8705395      8705379    ...    8705378
 8     8967375            8725026            8695938      8692447      8692028    ...    8691971
 9    10443953            9961926            9764095      9682902      9649579    ...    9626383

Table 4.2: Iteration of the Bornhuetter-Ferguson method


4.1.2 Minimizing quadratic loss functions

We choose i > I − J and define the reserves for accident year i (see also (1.43))

  R_i = C_{i,J} − C_{i,I−i}.        (4.13)

Hence, under the assumption that the development pattern and the chain-ladder factors are known (and identified by (4.3) under Model Assumptions 2.9), the chain-ladder reserve and the Bornhuetter-Ferguson reserve are given by

  R̂_i^{CL} = Ĉ_{i,J}^{CL} − C_{i,I−i} = C_{i,I−i} ( ∏_{j=I−i}^{J−1} f_j − 1 ),        (4.14)

  R̂_i^{BF} = Ĉ_{i,J}^{BF} − C_{i,I−i} = (1 − β_{I−i}) μ̂_i.        (4.15)

If we mix the chain-ladder and Bornhuetter-Ferguson methods, we obtain for the credibility mixture (c ∈ [0, 1])

  c Ĉ_{i,J}^{CL} + (1 − c) Ĉ_{i,J}^{BF}        (4.16)

the following reserves:

  R̂_i(c) = c R̂_i^{CL} + (1 − c) R̂_i^{BF}
         = (1 − β_{I−i}) [ c Ĉ_{i,J}^{CL} + (1 − c) μ̂_i ]        (4.17)
         = Ĉ_{i,J}^{BF} − C_{i,I−i} + c ( Ĉ_{i,J}^{CL} − Ĉ_{i,J}^{BF} )

(see also (4.5)).

Question. Which c is optimal? Optimality is defined in the sense of minimizing the quadratic error function. This means: our goal is to minimize the (unconditional) mean square error of prediction for the reserves R̂_i(c),

  msep_{R_i}( R̂_i(c) ) = E[ ( R_i − R̂_i(c) )² ]        (4.18)

(see also Section 3.1). In order to do this minimization we need a proper stochastic model definition.
Model Assumptions 4.6
Assume that different accident years i are independent. There exists a sequence (β_j)_{0≤j≤J} with β_J = 1 such that we have for all j ∈ {0, . . . , J}

  E[C_{i,j}] = β_j E[C_{i,J}].        (4.19)

Moreover, we assume that U_i is a random variable which is unbiased for E[C_{i,J}], i.e. E[U_i] = E[C_{i,J}], and that U_i is independent of C_{i,I−i} and C_{i,J}. □

Remarks 4.7
• Model Assumptions 4.6 coincide with Model Assumptions 2.9 if we assume that U_i = μ̂_i is deterministic.
• Observe that we do not assume that the chain-ladder model is satisfied! The chain-ladder model satisfies Model Assumptions 4.6, but not necessarily vice versa. Assume that f_j is identified with β_j (via (4.3)) and that

  Ĉ_{i,J}^{CL} = C_{i,I−i} / β_{I−i}    and    Ĉ_{i,J}^{BF} = C_{i,I−i} + (1 − β_{I−i}) U_i.        (4.20)

Hence the reserves are given by

  R̂_i(c) = (1 − β_{I−i}) [ c Ĉ_{i,J}^{CL} + (1 − c) U_i ].        (4.21)

Under these model assumptions and definitions we minimize

  msep_{R_i}( R̂_i(c) ) = E[ ( R_i − R̂_i(c) )² ].        (4.22)

• Observe that even if we assumed that the chain-ladder model is satisfied, we could not directly compare this situation to the mean square error of prediction calculation in Chapter 3. For the derivation of a MSEP formula for the chain-ladder method we have always assumed that the chain-ladder factors f_j are not known. If they were known, the mean square error of prediction of the chain-ladder reserves would simply be given by (see (3.30))

  msep_{C_{i,J}}( Ĉ_{i,J}^{CL} ) = E[ E[ ( C_{i,J} − Ĉ_{i,J}^{CL} )² | D_I ] ]
                               = E[ msep_{C_{i,J}|D_I}( Ĉ_{i,J}^{CL} ) ]
                               = E[ Var( C_{i,J} | D_I ) ]        (4.23)
                               = Var(C_{i,J}) − Var( E[ C_{i,J} | D_I ] )
                               = Var(C_{i,J}) − Var(C_{i,I−i}) ∏_{j=I−i}^{J−1} f_j².


If we calculate (4.17) under Model Assumptions 4.6 and with (4.20), we obtain E[R̂_i(c)] = E[R_i] and

  msep_{R_i}( R̂_i(c) ) = Var(R_i) + E[ ( E[R_i] − R̂_i(c) )² ]
                             + 2 E[ ( R_i − E[R_i] ) ( E[R_i] − R̂_i(c) ) ]        (4.24)
                        = Var(R_i) + Var( R̂_i(c) ) − 2 Cov( R_i, R̂_i(c) ).

Then we have the following theorem:


Theorem 4.8 (Mack [51]) Under Model Assumptions 4.6 and (4.20), the optimal credibility factor c*_i which minimizes the (unconditional) mean square error of prediction (4.22) is given by

  c*_i = ( β_{I−i} / (1 − β_{I−i}) ) · [ Cov(C_{i,I−i}, R_i) + β_{I−i} (1 − β_{I−i}) Var(U_i) ] / [ Var(C_{i,I−i}) + β_{I−i}² Var(U_i) ].        (4.25)

Proof. We have that

  E[ ( R̂_i(c_i) − R_i )² ] = c_i² E[ ( R̂_i^{CL} − R̂_i^{BF} )² ] + E[ ( R̂_i^{BF} − R_i )² ]
                             − 2 c_i E[ ( R̂_i^{CL} − R̂_i^{BF} ) ( R_i − R̂_i^{BF} ) ].        (4.26)

Hence the optimal c*_i is given by

  c*_i = E[ ( R̂_i^{CL} − R̂_i^{BF} ) ( R_i − R̂_i^{BF} ) ] / E[ ( R̂_i^{CL} − R̂_i^{BF} )² ]        (4.27)
       = E[ ( (1/β_{I−i} − 1) C_{i,I−i} − (1 − β_{I−i}) U_i ) ( R_i − (1 − β_{I−i}) U_i ) ]
         / E[ ( (1/β_{I−i} − 1) C_{i,I−i} − (1 − β_{I−i}) U_i )² ]
       = ( β_{I−i} / (1 − β_{I−i}) ) · E[ ( C_{i,I−i} − β_{I−i} U_i ) ( R_i − (1 − β_{I−i}) U_i ) ] / E[ ( C_{i,I−i} − β_{I−i} U_i )² ].

Since E[β_{I−i} U_i] = E[C_{i,I−i}] and E[U_i] = E[C_{i,J}] we obtain

  c*_i = ( β_{I−i} / (1 − β_{I−i}) ) · Cov( C_{i,I−i} − β_{I−i} U_i , R_i − (1 − β_{I−i}) U_i ) / Var( C_{i,I−i} − β_{I−i} U_i )        (4.28)
       = ( β_{I−i} / (1 − β_{I−i}) ) · [ Cov(C_{i,I−i}, R_i) + β_{I−i} (1 − β_{I−i}) Var(U_i) ] / [ Var(C_{i,I−i}) + β_{I−i}² Var(U_i) ].

This finishes the proof. □
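Formula (4.25) can be sanity-checked numerically in a toy model satisfying Model Assumptions 4.6 and (4.20). The sketch below (our own illustration in plain Python; all distributional choices, parameter values and helper names are ours, not from the text) evaluates the quadratic msep(c) from closed-form moments and compares its grid minimizer with (4.25):

```python
# Numerical sanity check of (4.25) in a toy model satisfying Model
# Assumptions 4.6 and (4.20): C_J has mean mu and variance vC; the
# diagonal observation is C_d = beta * C_J * X with an independent
# disturbance X, E[X] = 1, Var(X) = vX; U is unbiased for E[C_J]
# and independent of (C_d, C_J). All choices are illustrative.

mu, beta = 100.0, 0.7
vC, vX, vU = 408.0, 0.01005, 64.0

ECJ2 = vC + mu ** 2                              # E[C_J^2]
vCd = beta ** 2 * (ECJ2 * (1 + vX) - mu ** 2)    # Var(C_d)
cov_Cd_CJ = beta * vC                            # Cov(C_d, C_J), X independent
cov_Cd_R = cov_Cd_CJ - vCd                       # Cov(C_d, R), with R = C_J - C_d

def msep(c):
    # msep(c) = Var(R - R_hat(c)) since all estimators are unbiased for E[R];
    # R_hat(c) = a * C_d + b * U with the weights below, see (4.21)
    a = (1 - beta) * c / beta
    b = (1 - beta) * (1 - c)
    vR = vC - 2 * cov_Cd_CJ + vCd
    return vR - 2 * a * cov_Cd_R + a ** 2 * vCd + b ** 2 * vU

c_grid = min((msep(k / 10000.0), k / 10000.0) for k in range(10001))[1]
c_star = (beta / (1 - beta)) * (cov_Cd_R + beta * (1 - beta) * vU) \
         / (vCd + beta ** 2 * vU)                # formula (4.25)
assert abs(c_star - c_grid) < 1e-3
```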

We would like to mention once more that we have not considered the estimation errors in the claims development pattern β_j and in f_j, respectively. In this sense Theorem 4.8 is a statement giving optimal credibility weights considering process variance and the uncertainty in the a priori estimate U_i.

Remark. To explicitly calculate c*_i in Theorem 4.8 we need to specify an explicit stochastic model. We will do this below in Section 4.1.4, and close this subsection for the moment.

4.1.3 Cape-Cod Model

One main deficiency of the chain-ladder model is that it completely depends on the last observation on the diagonal (see Chain-ladder Estimator 2.4). If this last observation is an outlier, this outlier will be projected to the ultimate claim (using the age-to-age factors). Often, in long-tailed lines of business, the first observations are not representative. One possibility to smooth outliers on the last observed diagonal is to combine the BF and CL methods, as e.g. in the Benktander-Hovinen method; another possibility is to robustify such observations. This is done in the Cape-Cod method, which goes back to Bühlmann [15].
Model Assumptions 4.9 (Cape-Cod method)
There exist parameters Π_0, . . . , Π_I > 0, κ > 0 and a claims development pattern (β_j)_{0≤j≤J} with β_J = 1 such that

  E[C_{i,j}] = κ Π_i β_j        (4.29)

for all i = 0, . . . , I. Moreover, different accident years i are independent. □
Observe that the Cape-Cod model assumptions coincide with Model Assumptions 2.9 if we set μ_i = κ Π_i. For the Cape-Cod method we have chosen this new parametrization to make clear that the parameter Π_i can be interpreted as the premium in year i, while κ reflects the average loss ratio. We assume that κ is independent of the accident year i, i.e. the premium level w.r.t. κ is the same for all accident years. Under (4.3) we can estimate the loss ratio for each accident year using the chain-ladder estimate:

  κ̂_i = Ĉ_{i,J}^{CL} / Π_i = C_{i,I−i} / ( ∏_{j=I−i}^{J−1} f_j^{−1} · Π_i ) = C_{i,I−i} / ( β_{I−i} Π_i ).        (4.30)


This is an unbiased estimate for κ, since

  E[κ̂_i] = (1/Π_i) E[ Ĉ_{i,J}^{CL} ] = 1/(Π_i β_{I−i}) · E[C_{i,I−i}] = (1/Π_i) E[C_{i,J}] = κ.        (4.31)

The robustified overall loss ratio is then estimated by

  κ̂^{CC} = ∑_{i=0}^{I} ( β_{I−i} Π_i / ∑_{k=0}^{I} β_{I−k} Π_k ) κ̂_i = ∑_{i=0}^{I} C_{i,I−i} / ∑_{i=0}^{I} β_{I−i} Π_i.        (4.32)

Observe that κ̂^{CC} is an unbiased estimate for κ.
A robustified value for C_{i,I−i} is then found by (i > I − J)

  Ĉ_{i,I−i}^{CC} = κ̂^{CC} Π_i β_{I−i}.        (4.33)

This leads to the Cape-Cod estimator:


Estimator 4.10 (Cape-Cod estimator) The CC estimator is given by

  Ĉ_{i,J}^{CC} = C_{i,I−i} − Ĉ_{i,I−i}^{CC} + ∏_{j=I−i}^{J−1} f_j · Ĉ_{i,I−i}^{CC}        (4.34)

for I − J + 1 ≤ i ≤ I.
Lemma 4.11 Under Model Assumptions 4.9 and (4.3), the estimator Ĉ_{i,J}^{CC} − C_{i,I−i} is unbiased for E[C_{i,J} − C_{i,I−i}] = κ Π_i (1 − β_{I−i}).

Proof. Observe that

  E[ Ĉ_{i,I−i}^{CC} ] = E[ κ̂^{CC} ] Π_i β_{I−i} = κ Π_i β_{I−i} = E[C_{i,I−i}].        (4.35)

Moreover, we have with (4.3) that

  Ĉ_{i,J}^{CC} − C_{i,I−i} = Ĉ_{i,I−i}^{CC} ( ∏_{j=I−i}^{J−1} f_j − 1 ) = κ̂^{CC} Π_i (1 − β_{I−i}).        (4.36)

This finishes the proof. □
Remarks 4.12
• The chain-ladder iteration is applied to the robustified diagonal value Ĉ_{i,I−i}^{CC}, but we still add the difference between the original observation C_{i,I−i} and the robustified diagonal value in order to calculate the ultimate.
• If we transform the Cape-Cod estimator we obtain (see also (4.36))

  Ĉ_{i,J}^{CC} = C_{i,I−i} + (1 − β_{I−i}) κ̂^{CC} Π_i,        (4.37)

which is a Bornhuetter-Ferguson type estimate with modified a priori estimate κ̂^{CC} Π_i.
• Observe that

  Var(κ̂_i) = 1/( Π_i² β_{I−i}² ) · Var(C_{i,I−i}).        (4.38)

Depending on the choice of the variance function of C_{i,j}, this may suggest that the robustification can also be done in another way (with smaller variance), see also Lemma 3.4.
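The Cape-Cod mechanics (4.30)-(4.34) are easy to code. The sketch below is our own illustration in plain Python; the inputs are the rounded figures of accident years i = 8, 9 from Example 4.13 below (a full run would use all accident years), so the results deviate slightly from Table 4.3. It also checks the Bornhuetter-Ferguson form (4.37).

```python
# Cape-Cod method (sketch): individual loss ratios (4.30), robustified
# overall loss ratio (4.32), robustified diagonal (4.33), CC estimator
# (4.34). Illustrative, rounded inputs (accident years i = 8, 9 of
# Example 4.13); results therefore differ slightly from Table 4.3.

C = [7648729.0, 5675568.0]        # diagonal observations C_{i,I-i}
Pi = [14461781.0, 15210363.0]     # premiums Pi_i
beta = [0.88, 0.59]               # development pattern values beta_{I-i}

kappa_i = [c / (b * p) for c, b, p in zip(C, beta, Pi)]     # (4.30)
kappa_cc = sum(C) / sum(b * p for b, p in zip(beta, Pi))    # (4.32)

reserves = []
for c, b, p in zip(C, beta, Pi):
    c_rob = kappa_cc * p * b            # robustified diagonal (4.33)
    c_ult = c - c_rob + c_rob / b       # CC estimator (4.34); prod f_j = 1/beta
    # BF-type representation (4.37) with a priori estimate kappa_cc * Pi_i
    assert abs(c_ult - (c + (1.0 - b) * kappa_cc * p)) < 1e-6
    reserves.append(c_ult - c)
```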
Example 4.13 (Cape-Cod method)
We revisit the data set given in Examples 2.7, 2.11 and 4.4.

 i      Π_i           κ̂_i      Ĉ_{i,I−i}^{CC}   Ĉ_{i,J}^{CC}   Cape-Cod reserves   CL reserves   BF reserves
 0      15473558      72.0%     10411192         11148124               0                 0             0
 1      14882436      71.7%      9999259         10662396           14204             15126         16124
 2      14456039      73.8%      9702614         10659704           23953             26257         26998
 3      14054917      69.4%      9423208          9757538           33469             34538         37575
 4      14525373      68.0%      9688771          9871362           84446             85302         95434
 5      15025923      67.2%      9953237         10092522          156769            156494        178024
 6      14832965      64.5%      9681735          9580464          298442            286121        341305
 7      14550359      59.8%      9284898          8761342          505131            449167        574089
 8      14461781      60.1%      8562549          8816611         1167882           1043242       1318646
 9      15210363      63.3%      6033871          9875801         4200233           3950815       4768384
 Total         κ̂^{CC} = 67.3%                                    6484530           6047061       7356580

Table 4.3: Claims reserves from the Cape-Cod method

[Figure residue: plot of the estimated loss ratios per accident year; vertical axis "loss ratio" from 50.0% to 75.0%, horizontal axis "accident years"; series "kappa i" and "kappa CC".]

Figure 4.1: Estimated individual loss ratios κ̂_i and estimated Cape-Cod loss ratio κ̂^{CC}

The example shows the robustified diagonal values Ĉ_{i,I−i}^{CC}, which lead to the Cape-Cod estimate. The Cape-Cod estimate Ĉ_{i,J}^{CC} is smaller than the Bornhuetter-Ferguson estimate Ĉ_{i,J}^{BF}. This comes from the fact that the a priori estimates μ̂_i used for the Bornhuetter-Ferguson method are rather pessimistic: the loss ratios μ̂_i/Π_i are all above 75%, whereas the Cape-Cod method gives loss ratios κ̂_i which are all below 75% (see Figure 4.1).

However, as Figure 4.1 shows, we have to be careful with the assumption of a constant loss ratio κ. The figure suggests that we have to consider underwriting cycles carefully. In soft markets, loss ratios are rather low (we are able to charge rather high premiums); if there is keen competition, we expect low profit margins. If possible, we should adjust our premium with underwriting cycle information. For this reason one finds modified versions of the Cape-Cod method in practice, where e.g. the smoothing of the last observed diagonal is only done over neighboring values.

4.1.4 A distributional example to credible claims reserving

To construct the Benktander-Hovinen estimate we have used a credibility weighting between the Bornhuetter-Ferguson method and the chain-ladder method. Theorem 4.8 gave a statement for the best weighted average (relative to the quadratic loss function). We now specify an explicit model in order to apply Theorem 4.8.
Model Assumptions 4.14 (Mack [51])
Different accident years i are independent. There exists a sequence (β_j)_{0≤j≤J} with β_J = 1 such that we have

  E[ C_{i,j} | C_{i,J} ] = β_j C_{i,J},        (4.39)
  Var( C_{i,j} | C_{i,J} ) = β_j (1 − β_j) α²(C_{i,J})        (4.40)

for all i = 0, . . . , I and j = 0, . . . , J. □
Remarks 4.15
• The spirit of this model is different from the Chain-ladder Model 3.2. In the chain-ladder model we have a forward iteration, i.e. we condition on the preceding observation. In the model above we rather have a backward iteration: conditioning on the ultimate C_{i,J}, we determine intermediate cumulative payment states, i.e. this is simply a refined definition of the development pattern.
• This model can also be viewed as a Bayesian approach, with latent variables which determine the ultimate claim C_{i,J}. This will be further discussed below.
• Observe that this model satisfies Model Assumptions 2.9 with μ_i = E[C_{i,J}]. Moreover, C_{i,j} satisfies the assumptions given in Model Assumptions 4.6. The chain-ladder model is in general not satisfied (see also Section 4.2.2, e.g. (4.68) below).
• Observe that the variance condition is such that it converges to zero for β_j → 1, i.e. if the expected outstanding payments are low, the uncertainty is also low.

In view of Theorem 4.8 we have the following corollary (use definitions (4.21) and (4.20)):

Corollary 4.16 Under the assumption that U_i is an unbiased estimator for E[C_{i,J}] which is independent of C_{i,I−i} and C_{i,J}, and under Model Assumptions 4.14, the optimal credibility factor c*_i which minimizes the (unconditional) mean square error of prediction (4.22) is given by

  c*_i = β_{I−i} / ( β_{I−i} + t_i )        (4.41)

with

  t_i = E[ α²(C_{i,J}) ] / ( Var(U_i) + Var(C_{i,J}) − E[ α²(C_{i,J}) ] )        (4.42)

for i ∈ {I − J + 1, . . . , I}.
Proof. From Theorem 4.8 we have

  c*_i = ( β_{I−i} / (1 − β_{I−i}) ) · [ Cov( C_{i,I−i}, C_{i,J} − C_{i,I−i} ) + β_{I−i} (1 − β_{I−i}) Var(U_i) ] / [ Var(C_{i,I−i}) + β_{I−i}² Var(U_i) ].

Now we need to calculate the elements of the equation above. We obtain

  Var(C_{i,I−i}) = E[ Var( C_{i,I−i} | C_{i,J} ) ] + Var( E[ C_{i,I−i} | C_{i,J} ] )
                 = β_{I−i} (1 − β_{I−i}) E[ α²(C_{i,J}) ] + β_{I−i}² Var(C_{i,J})        (4.43)

and

  Cov( C_{i,I−i}, C_{i,J} − C_{i,I−i} ) = Cov( C_{i,I−i}, C_{i,J} ) − Var(C_{i,I−i}).        (4.44)

Henceforth we need to calculate

  Cov( C_{i,I−i}, C_{i,J} ) = E[ Cov( C_{i,I−i}, C_{i,J} | C_{i,J} ) ] + Cov( E[ C_{i,I−i} | C_{i,J} ], E[ C_{i,J} | C_{i,J} ] )
                            = 0 + Cov( β_{I−i} C_{i,J}, C_{i,J} ) = β_{I−i} Var(C_{i,J}).        (4.45)

This implies that

  Cov( C_{i,I−i}, C_{i,J} − C_{i,I−i} ) = β_{I−i} Var(C_{i,J}) − Var(C_{i,I−i}).        (4.46)

Hence we obtain

  c*_i = ( β_{I−i} / (1 − β_{I−i}) ) · [ β_{I−i} Var(C_{i,J}) − Var(C_{i,I−i}) + β_{I−i} (1 − β_{I−i}) Var(U_i) ] / [ Var(C_{i,I−i}) + β_{I−i}² Var(U_i) ]
       = [ Var(C_{i,J}) − E[α²(C_{i,J})] + Var(U_i) ] / [ ( β_{I−i}^{−1} − 1 ) E[α²(C_{i,J})] + Var(C_{i,J}) + Var(U_i) ]        (4.47)
       = [ Var(C_{i,J}) − E[α²(C_{i,J})] + Var(U_i) ] / [ β_{I−i}^{−1} E[α²(C_{i,J})] + Var(C_{i,J}) − E[α²(C_{i,J})] + Var(U_i) ],

and dividing numerator and denominator by Var(C_{i,J}) − E[α²(C_{i,J})] + Var(U_i) gives c*_i = β_{I−i}/(β_{I−i} + t_i). This finishes the proof of the corollary. □
Corollary 4.17 Under the assumption that U_i is an unbiased estimator for E[C_{i,J}] which is independent of C_{i,I−i} and C_{i,J}, and under Model Assumptions 4.14, we find the following mean square errors of prediction (see also (4.21)-(4.22)):

  msep_{R_i}( R̂_i(c) )    = E[ α²(C_{i,J}) ] ( c²/β_{I−i} + 1/(1 − β_{I−i}) + (1 − c)²/t_i ) (1 − β_{I−i})²,
  msep_{R_i}( R̂_i(0) )    = E[ α²(C_{i,J}) ] ( 1/(1 − β_{I−i}) + 1/t_i ) (1 − β_{I−i})²,
  msep_{R_i}( R̂_i(1) )    = E[ α²(C_{i,J}) ] ( 1/β_{I−i} + 1/(1 − β_{I−i}) ) (1 − β_{I−i})²,
  msep_{R_i}( R̂_i(c*_i) ) = E[ α²(C_{i,J}) ] ( 1/(β_{I−i} + t_i) + 1/(1 − β_{I−i}) ) (1 − β_{I−i})²        (4.48)

for i ∈ {I − J + 1, . . . , I}.

Proof. Exercise. □
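The four expressions in (4.48) can be evaluated directly. The sketch below (our own illustration in plain Python; the parameter values are illustrative choices in the spirit of Example 4.19 below) also checks that the weight c*_i = β_{I−i}/(β_{I−i} + t_i) indeed gives the smallest value:

```python
# Mean square errors of prediction (4.48) as functions of the credibility
# weight c. Parameters are illustrative (in the spirit of Example 4.19).

E_alpha2 = 475369.0 ** 2      # E[alpha^2(C_{i,J})], illustrative value
beta, t = 0.59, 0.242         # beta_{I-i} and t_i, illustrative values

def msep(c):
    # general formula in (4.48)
    return E_alpha2 * (c ** 2 / beta + 1.0 / (1.0 - beta)
                       + (1.0 - c) ** 2 / t) * (1.0 - beta) ** 2

c_star = beta / (beta + t)                       # (4.41)
msep_bf, msep_cl, msep_opt = msep(0.0), msep(1.0), msep(c_star)

# closed forms in (4.48) for c = 1 and c = c*
assert abs(msep_cl - E_alpha2 * (1 / beta + 1 / (1 - beta)) * (1 - beta) ** 2) < 1.0
assert abs(msep_opt - E_alpha2 * (1 / (beta + t) + 1 / (1 - beta)) * (1 - beta) ** 2) < 1.0
# the optimal weight is at least as good as both extremes
assert msep_opt <= min(msep_bf, msep_cl)
```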
Remarks 4.18
• The reserve R̂_i(0) corresponds to the Bornhuetter-Ferguson reserve R̂_i^{BF} and the reserve R̂_i(1) corresponds to the chain-ladder reserve R̂_i^{CL}. However, msep_{R_i}(R̂_i(1)) and msep_{R_i}(R̂_i^{CL}) from Section 3 are not comparable, since a) we use a completely different model, which leads to different process error and prediction error terms; and b) in Corollary 4.17 we do not investigate the estimation error coming from the fact that we have to estimate f_j and β_j.
• From Corollary 4.17 we see that the Bornhuetter-Ferguson estimate in Model 4.14 is better than the chain-ladder estimate as long as

  t_i > β_{I−i}.        (4.49)

I.e., for years with small loss experience β_{I−i} one should take the BF estimate, whereas for older years one should take the CL estimate. Similar statements can be derived for the BH estimate.
Example 4.19 (Mack model, Model Assumptions 4.14)
An easy distributional example satisfying Model Assumptions 4.14 is the following. Assume that, conditionally given C_{i,J}, the ratio C_{i,j}/C_{i,J} has a Beta( a_i β_j , a_i (1 − β_j) ) distribution. Hence

  E[ C_{i,j} | C_{i,J} ] = C_{i,J} E[ C_{i,j}/C_{i,J} | C_{i,J} ] = β_j C_{i,J},        (4.50)

  Var( C_{i,j} | C_{i,J} ) = C_{i,J}² Var( C_{i,j}/C_{i,J} | C_{i,J} ) = C_{i,J}²/(1 + a_i) · β_j (1 − β_j)        (4.51)

for all i = 0, . . . , I and j = 0, . . . , J. See the appendix, Section B.2.4, for the definition of the Beta distribution and its moments.

We revisit the data set given in Examples 2.7, 2.11 and 4.4. Observe that

  E[ α²(C_{i,J}) ] = 1/(1 + a_i) · E[ C_{i,J}² ] = E[C_{i,J}]²/(1 + a_i) · ( Vco²(C_{i,J}) + 1 ).        (4.52)

As already mentioned before, we assume that the claims development pattern (β_j)_{0≤j≤J} is known. This means that in our estimates there is no estimation error term coming from the claims development parameters. We only have a process variance term and the uncertainty in the estimation of the ultimate, U_i. This corresponds to a prediction error term in the language of Section 3.3. We assume that an actuary is able to predict the true a priori estimate with an error of 5%, i.e. Vco(U_i) = 5%. Hence we assume that

  Vco(C_{i,J}) = ( Vco²(U_i) + r² )^{1/2},        (4.53)

where r = 6% corresponds to the pure process error. This leads to the results in Table 4.4. We already see from the choices of our parameters a_i, r and Vco(U_i) that it is rather difficult to apply this method in practice, since we have not estimated these parameters from the data available.
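With these choices, t_i and c*_i can be reproduced directly from (4.52), (4.42) and (4.41). A small sketch in plain Python (note that t_i does not depend on the scale E[C_{i,J}], which cancels, so we may set the a priori mean to 1):

```python
# Reproduce t_i and c*_i of Table 4.4 from (4.52), (4.42) and (4.41).
# With alpha^2(C) = C^2 / (1 + a_i), every term in (4.42) is proportional
# to E[C_{i,J}]^2, so the a priori mean cancels and can be set to 1.

a = 600.0
vco_U, r = 0.05, 0.06
vco_C2 = vco_U ** 2 + r ** 2              # Vco^2(C_{i,J}), see (4.53)

E_alpha2 = (vco_C2 + 1.0) / (1.0 + a)     # E[alpha^2] / E[C]^2, see (4.52)
var_U, var_C = vco_U ** 2, vco_C2         # Var(U_i)/E[C]^2, Var(C_{i,J})/E[C]^2
t = E_alpha2 / (var_U + var_C - E_alpha2)          # (4.42)

def c_star(beta):
    return beta / (beta + t)                        # (4.41)

assert abs(t - 0.242) < 0.001                # t_i = 24.2% for all i (Table 4.4)
assert abs(c_star(0.999) - 0.805) < 0.001    # accident year i = 1
assert abs(c_star(0.590) - 0.709) < 0.001    # accident year i = 9
```

The constant columns of Table 4.4 (Vco(C_{i,J}) = 7.8%, t_i = 24.2%) follow because a_i, Vco(U_i) and r are the same for all accident years; only β_{I−i} varies.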

 i      a_i    Vco(U_i)   r      Vco(C_{i,J})   t_i      c*_i     CL reserves   BF reserves   R̂_i(c*_i)
 0      600    5.0%       6.0%   7.8%           24.2%    80.5%           0             0             0
 1      600    5.0%       6.0%   7.8%           24.2%    80.5%       15126         16124         15320
 2      600    5.0%       6.0%   7.8%           24.2%    80.5%       26257         26998         26401
 3      600    5.0%       6.0%   7.8%           24.2%    80.5%       34538         37575         35131
 4      600    5.0%       6.0%   7.8%           24.2%    80.4%       85302         95434         87288
 5      600    5.0%       6.0%   7.8%           24.2%    80.3%      156494        178024        160738
 6      600    5.0%       6.0%   7.8%           24.2%    80.1%      286121        341305        297128
 7      600    5.0%       6.0%   7.8%           24.2%    79.7%      449167        574089        474538
 8      600    5.0%       6.0%   7.8%           24.2%    78.5%     1043242       1318646       1102588
 9      600    5.0%       6.0%   7.8%           24.2%    70.9%     3950815       4768384       4188531
 Total                                                             6047061       7356580       6387663

Table 4.4: Claims reserves from Model 4.14

 i      E[U_i]        E[α²(C_{i,J})]^{1/2}   msep^{1/2}(R̂_i(1))   msep^{1/2}(R̂_i(0))   msep^{1/2}(R̂_i(c*_i))
 0      11653101      476788                      0                     0                     0
 1      11367306      465094                  17529                 17568                 17527
 2      10962965      448551                  22287                 22373                 22282
 3      10616762      434386                  25888                 26031                 25879
 4      11044881      451902                  42189                 42751                 42153
 5      11480700      469734                  58952                 60340                 58862
 6      11413572      466987                  81990                 85604                 81745
 7      11126527      455243                 106183                113911                105626
 8      10986548      449515                 166013                190514                163852
 9      11618437      475369                 396616                500223                372199
 Total                                       457811                560159                435814

Table 4.5: Mean square errors of prediction according to Corollary 4.17

The prediction errors are given in Table 4.5. Observe that these mean square errors of prediction can not be compared to the mean square error of prediction obtained in the chain-ladder method (see Section 3). We do not know whether the model assumptions in this example imply the chain-ladder model assumptions. Moreover, we do not investigate the uncertainties in the parameter estimates, and the choice of the parameters was rather artificial, motivated by expert opinions. □

4.2 Exact Bayesian models

4.2.1 Motivation

Bayesian methods for claims reserving are methods in which one combines a priori information or expert knowledge with observations in the upper trapezoid DI .
Available information/knowledge is incorporated through an a priori distribution

of a random quantity such as the ultimate claim (see Sections 4.2.2 and 4.2.3) or a risk parameter (see Section 4.2.4), which must be modeled by the actuary. This distribution is then connected with the likelihood function via Bayes' theorem. If we use a smart choice for the distribution of the observations and the a priori distribution, such as the exponential dispersion family (EDF) and its associated conjugates (see Section 4.2.4), we are able to derive an analytic expression for the a posteriori distribution of the ultimate claim. This means that we can compute the a posteriori expectation E[C_{i,J}|D_I] of the ultimate claim C_{i,J}, which is called the Bayesian estimator for the ultimate claim, given the observations D_I. The Bayesian method is called exact since the Bayesian estimator E[C_{i,J}|D_I] is optimal in the sense that it minimizes the squared loss function (MSEP) in the class L²_{C_{i,J}}(D_I) of all estimators for C_{i,J} which are square integrable functions of the observations in D_I, i.e.

  E[ C_{i,J} | D_I ] = argmin_{Y ∈ L²_{C_{i,J}}(D_I)} E[ ( C_{i,J} − Y )² | D_I ].        (4.54)

For its conditional mean square error of prediction we have that

  msep_{C_{i,J}|D_I}( E[C_{i,J}|D_I] ) = Var( C_{i,J} | D_I ).        (4.55)

Of course, if there are unknown parameters in the underlying probabilistic model, we can not explicitly calculate E[C_{i,J}|D_I]. These parameters need to be estimated by D_I-measurable estimators. Hence we obtain a D_I-measurable estimator Ê[C_{i,J}|D_I] for E[C_{i,J}|D_I] (and C_{i,J}, resp.), which implies that

  msep_{C_{i,J}|D_I}( Ê[C_{i,J}|D_I] ) = Var( C_{i,J} | D_I ) + ( Ê[C_{i,J}|D_I] − E[C_{i,J}|D_I] )²,        (4.56)

and now we are in a similar situation as in the chain-ladder model, see (3.30).
We close this section with some remarks. For pricing and tariffication of insurance contracts, Bayesian ideas and techniques are well investigated and widely used in practice. For the claims reserving problem, Bayesian methods are less used, although we believe that they are very useful for answering practical questions (this has e.g. already been mentioned in de Alba [2]).

In the literature, exact Bayesian models have been studied e.g. in a series of papers by Verrall [79, 81, 82], de Alba [2, 4], de Alba-Corzo [3], Gogol [30], Haastrup-Arjas [32], Ntzoufras-Dellaportas [60] and the corresponding implementation by Scollnik [70]. Many of these results refer to explicit choices of distributions, e.g. the Poisson-gamma or the (log-)normal-normal cases are considered. Below, we give an approach which is suitable for rather general distributions (see Section 4.2.4).

4.2.2 Log-normal/Log-normal model

In this section we revisit Model 4.14. We make distributional assumptions on C_{i,J} and C_{i,j}|C_{i,J} which satisfy Model Assumptions 4.14 and hence would allow for the application of Corollary 4.16. However, in this section we don't follow that route: for Corollary 4.16 we have specified a second distribution for an a priori estimate U_i of E[C_{i,J}]. Here, we don't use a distribution for the a priori estimate, but we explicitly specify the distribution of C_{i,J}. The distributional assumptions will be such that we can determine the exact distribution of C_{i,J}|C_{i,j} via Bayes' theorem. It turns out that the best estimate for E[C_{i,J}|C_{i,I−i}] is a credibility mixture between the observation C_{i,I−i} and the a priori mean E[C_{i,J}]. Gogol [30] proposed the following model.
Model Assumptions 4.20 (Log-normal/Log-normal model)
• Different accident years i are independent.
• C_{i,J} is log-normally distributed with parameters μ(i) and σ_i² for i = 0, . . . , I.
• Conditionally, given C_{i,J}, C_{i,j} has a Log-normal distribution with parameters μ_j(C_{i,J}) and σ_j²(C_{i,J}) for i = 0, . . . , I and j = 0, . . . , J. □
Remark. In the appendix, Section B.2.2, we provide the definition of the Log-normal distribution. μ(i) and σ_i² denote the parameters of the Log-normal distribution of C_{i,J}, so that

  μ_i = E[C_{i,J}] = exp{ μ(i) + σ_i²/2 }        (4.57)

is the a priori mean of C_{i,J}.
If (C_{i,j})_{0≤j≤J} also satisfies Model Assumptions 4.14, we have that

  E[ C_{i,j} | C_{i,J} ] = exp{ μ_j + σ_j²/2 } = β_j C_{i,J},        (4.58)
  Var( C_{i,j} | C_{i,J} ) = exp{ 2 μ_j + σ_j² } ( exp{σ_j²} − 1 ) = β_j (1 − β_j) α²(C_{i,J}).

This implies that we have to choose

  σ_j² = σ_j²(C_{i,J}) = log( 1 + (1 − β_j)/β_j · α²(C_{i,J})/C_{i,J}² ),        (4.59)

  μ_j = μ_j(C_{i,J}) = log( β_j C_{i,J} ) − (1/2) log( 1 + (1 − β_j)/β_j · α²(C_{i,J})/C_{i,J}² ).        (4.60)


The joint distribution of (C_{i,j}, C_{i,J}) is given by

  f_{C_{i,j},C_{i,J}}(x, y) = f_{C_{i,j}|C_{i,J}}(x|y) · f_{C_{i,J}}(y)
  = (2π)^{−1/2} ( σ_j(y) x )^{−1} exp{ −(1/2) ( (log x − μ_j(y)) / σ_j(y) )² }        (4.61)
    × (2π)^{−1/2} ( σ_i y )^{−1} exp{ −(1/2) ( (log y − μ(i)) / σ_i )² }
  = ( 2π σ_i σ_j(y) x y )^{−1} exp{ −(1/2) ( (log x − μ_j(y)) / σ_j(y) )² − (1/2) ( (log y − μ(i)) / σ_i )² }.
Lemma 4.21 Model Assumptions 4.20, combined with Model Assumptions 4.14 with α²(c) = a² c² for some a ∈ R, imply the following equalities:

  σ_j²(c) = σ_j² = log( 1 + (1 − β_j)/β_j · a² ),        (4.62)
  μ_j(c) = log c + log β_j − σ_j²/2.        (4.63)

Moreover, the conditional distribution of C_{i,J} given C_{i,j} is again a Log-normal distribution with updated parameters

  μ_post(i,j) = ( 1 − σ_j²/(σ_i² + σ_j²) ) ( σ_j²/2 + log( C_{i,j}/β_j ) ) + σ_j²/(σ_i² + σ_j²) · μ(i),        (4.64)
  σ²_post(i,j) = σ_j²/(σ_i² + σ_j²) · σ_i².        (4.65)

Remarks 4.22

This example shows a typical Bayesian and credibility result: i) In this example of conjugated distributions we can exactly calculate the a posteriori distribution of the ultimate claim C_{i,J} given the information C_{i,j} (cf. Section 4.2.4 and see also Bühlmann-Gisler [18]). ii) We see that we need to update the parameter μ^{(i)} by choosing a credibility weighted average of the a priori parameter μ^{(i)} and the transformed observation σ_j^2/2 + log(C_{i,j}/β_j), where the credibility weight is given by

α_{i,j} = τ_i^2 / ( τ_i^2 + σ_j^2 ).                                                              (4.66)

This implies the updating of the a priori mean of the ultimate claim C_{i,J}

E[C_{i,J}] = exp{ μ^{(i)} + τ_i^2/2 }                                                             (4.67)

to the a posteriori mean of the ultimate claim C_{i,J}

E[C_{i,J} | C_{i,j}] = exp{ μ_post(i,j) + σ_post(i,j)^2 / 2 }
  = exp{ (1 − α_{i,j}) ( μ^{(i)} + τ_i^2/2 ) + α_{i,j} ( log(C_{i,j}/β_j) + σ_j^2/2 ) }            (4.68)
  = exp{ (1 − α_{i,j}) ( μ^{(i)} + τ_i^2/2 ) − α_{i,j} ( log(β_j) − σ_j^2/2 ) } · C_{i,j}^{τ_i^2/(τ_i^2+σ_j^2)},

see also (4.83) below.
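The posterior update (4.64)-(4.66) and the credibility form of the posterior mean (4.68) can be sketched in a few lines of Python. This is an illustration only; the function name and the numerical inputs are ours, not from the text, and the identity checked at the end is exactly the equality between the first and second line of (4.68).

```python
import math

def posterior_params(C_ij, beta_j, sigma2_j, mu_i, tau2_i):
    """A posteriori Log-normal parameters (4.64)-(4.65) of C_{i,J} given C_{i,j}."""
    alpha = tau2_i / (tau2_i + sigma2_j)          # credibility weight (4.66)
    obs = sigma2_j / 2 + math.log(C_ij / beta_j)  # transformed observation
    mu_post = alpha * obs + (1 - alpha) * mu_i
    sigma2_post = tau2_i * sigma2_j / (tau2_i + sigma2_j)
    return mu_post, sigma2_post

# illustrative (made-up) numbers of the order of magnitude used in this chapter
C_ij, beta_j = 9_500_000.0, 0.88
mu_i, tau2_i, sigma2_j = 16.2, 0.078**2, 0.015**2

mu_post, s2_post = posterior_params(C_ij, beta_j, sigma2_j, mu_i, tau2_i)
post_mean = math.exp(mu_post + s2_post / 2)

# credibility form (4.68) of the same posterior mean
alpha = tau2_i / (tau2_i + sigma2_j)
cred = math.exp((1 - alpha) * (mu_i + tau2_i / 2)
                + alpha * (math.log(C_ij / beta_j) + sigma2_j / 2))
assert math.isclose(post_mean, cred)
```

The assertion holds because (1 − α_{i,j}) τ_i^2/2 = α_{i,j} σ_j^2/2 = σ_post(i,j)^2/2, which is precisely the algebra behind (4.68).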


Observe that this model does in general not satisfy the chain-ladder assumptions (cf. the last expression in (4.68)). This has already been mentioned in Remarks 4.15.

Observe that in the current derivation we only consider one observation C_{i,j}. We could also consider the whole sequence of observations C_{i,0}, ..., C_{i,j}; then the a posteriori distribution of C_{i,J} is log-normally distributed with mean

μ_post(i,j) = ( Σ_{k=0}^j ( log(C_{i,k}) − log(β_k) + σ_k^2/2 ) / σ_k^2 + μ^{(i)} / τ_i^2 ) / ( Σ_{k=0}^j 1/σ_k^2 + 1/τ_i^2 )      (4.69)
  = α*_{i,j} · ( Σ_{k=0}^j 1/σ_k^2 )^{−1} Σ_{k=0}^j ( log(C_{i,k}) − log(β_k) + σ_k^2/2 ) / σ_k^2 + ( 1 − α*_{i,j} ) · μ^{(i)},

with

α*_{i,j} = ( Σ_{k=0}^j 1/σ_k^2 ) / ( Σ_{k=0}^j 1/σ_k^2 + 1/τ_i^2 ),                                (4.70)

and variance

σ_post(i,j)^2 = ( Σ_{k=0}^j 1/σ_k^2 + 1/τ_i^2 )^{−1}.                                              (4.71)

Observe that this is again a credibility weighted average between the a priori estimate μ^{(i)} and the observations C_{i,0}, ..., C_{i,j}. The credibility weights are given by α*_{i,j}. Moreover, observe that this model does not have the Markov property, which is in contrast to our chain-ladder assumptions.
Proof of Lemma 4.21. The equations (4.62)-(4.63) easily follow from (4.59)-(4.60). Hence we only need to calculate the conditional distribution of C_{i,J} given C_{i,j}. From (4.61) and (4.63) we see that the joint density of (C_{i,j}, C_{i,J}) is given by

f_{C_{i,j}, C_{i,J}}(x, y) = ( 2π τ_i σ_j x y )^{−1}                                               (4.72)
  · exp{ −(1/2) [ ( ( log(x) − log(y) − log(β_j) + σ_j^2/2 ) / σ_j )^2 + ( ( log(y) − μ^{(i)} ) / τ_i )^2 ] }.

Now we have that

( (z − c)/σ )^2 + ( (z − μ)/τ )^2 = ( z − ( τ^2 c + σ^2 μ ) / ( τ^2 + σ^2 ) )^2 / ( σ^2 τ^2 / ( σ^2 + τ^2 ) ) + ( μ − c )^2 / ( σ^2 + τ^2 ).   (4.73)

This implies that

f_{C_{i,j}, C_{i,J}}(x, y) = ( 2π τ_i σ_j x y )^{−1}                                               (4.74)
  · exp{ −(1/2) [ ( log(y) − ( τ_i^2 c(x) + σ_j^2 μ^{(i)} ) / ( τ_i^2 + σ_j^2 ) )^2 / ( τ_i^2 σ_j^2 / ( τ_i^2 + σ_j^2 ) ) + ( μ^{(i)} − c(x) )^2 / ( τ_i^2 + σ_j^2 ) ] },

where

c(x) = log(x) − log(β_j) + σ_j^2/2.                                                                (4.75)

From this we see that

f_{C_{i,J} | C_{i,j}}(y | x) = f_{C_{i,j}, C_{i,J}}(x, y) / f_{C_{i,j}}(x) = f_{C_{i,j}, C_{i,J}}(x, y) / ∫ f_{C_{i,j}, C_{i,J}}(x, y) dy      (4.76)

is the density of a Log-normal distribution with parameters

μ_post(i,j) = ( τ_i^2 c(C_{i,j}) + σ_j^2 μ^{(i)} ) / ( τ_i^2 + σ_j^2 ),                            (4.77)
σ_post(i,j)^2 = τ_i^2 σ_j^2 / ( τ_i^2 + σ_j^2 ).                                                   (4.78)

Finally, we rewrite μ_post(i,j) (cf. (4.75)):

μ_post(i,j) = ( τ_i^2 ( log(C_{i,j}) − log(β_j) + σ_j^2/2 ) + σ_j^2 μ^{(i)} ) / ( τ_i^2 + σ_j^2 ).  (4.79)

This finishes the proof of Lemma 4.21. □
Estimator 4.23 (Log-normal/Log-normal model, Gogol [30])
Under the assumptions of Lemma 4.21 we have the following estimator for the ultimate claim E[C_{i,J} | C_{i,I−i}]

Ĉ_{i,J}^{Go} = E[C_{i,J} | C_{i,I−i}]                                                              (4.80)
  = exp{ ( 1 − τ_i^2/(τ_i^2 + σ_{I−i}^2) ) ( μ^{(i)} + τ_i^2/2 ) + τ_i^2/(τ_i^2 + σ_{I−i}^2) ( log(C_{i,I−i}/β_{I−i}) + σ_{I−i}^2/2 ) }

for I − J + 1 ≤ i ≤ I.

Observe that we only condition on the last observation C_{i,I−i}; see also Remarks 4.22 on the Markov property.

Remark. We could also consider

Ĉ_{i,J}^{Go,2} = C_{i,I−i} + (1 − β_{I−i}) Ĉ_{i,J}^{Go}.                                           (4.81)

From a practical point of view Ĉ_{i,J}^{Go,2} is more useful if we have an outlier on the diagonal. However, both estimators are not easily obtained in practice, since there are too many parameters which are difficult to estimate.
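Estimator (4.80) can be sketched as follows. The inputs below are rounded from Tables 4.6 and 4.7 (accident year 9), so the result only approximates the tabulated value 9 925 132; the function name is ours.

```python
import math

def gogol_ultimate(C_last, beta, sigma2, mu_i, tau2_i):
    """Estimator (4.80): posterior mean of C_{i,J} given the last diagonal
    observation C_{i,I-i} in the Log-normal/Log-normal model."""
    alpha = tau2_i / (tau2_i + sigma2)  # credibility weight (4.66)
    return math.exp((1 - alpha) * (mu_i + tau2_i / 2)
                    + alpha * (math.log(C_last / beta) + sigma2 / 2))

# rounded inputs for accident year 9 (Tables 4.6 and 4.7)
C_hat = gogol_ultimate(C_last=5_675_568.0, beta=0.59,
                       sigma2=0.034**2, mu_i=16.27, tau2_i=0.078**2)
```

Because σ_{I−i} is small relative to τ_i, the credibility weight on the chain-ladder value C_last/beta is large (about 84% here, matching 1 − α = 16.0% in Table 4.7), and C_hat lies between the chain-ladder estimate and the a priori mean.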

Example 4.24 (Model Gogol [30], Assumptions of Lemma 4.21)

We revisit the data set given in Example 2.7. For the parameters we make the same choices as in Example 4.19 (see Table 4.4). I.e. we set Vco(C_{i,J}) equal to the value obtained in (4.53). In formula (4.53) this variational coefficient was decomposed into process error and parameter uncertainties; here we only use the overall uncertainty. Moreover, we choose a^2 = 1/(1 + 600) ≈ 0.17%. Using (4.62), (4.57) and

τ_i^2 = ln( Vco^2(C_{i,J}) + 1 )                                                                   (4.82)

(cf. appendix, Table B.5) leads to Table 4.6.

 i   E[C_{i,J}]   Vco(C_{i,J})   μ^{(i)}   τ_i     β_{I−i}   a^2     σ_{I−i}
 0   11653101     7.8%           16.27     7.80%   100.0%    0.17%   0.0%
 1   11367306     7.8%           16.24     7.80%    99.9%    0.17%   0.2%
 2   10962965     7.8%           16.21     7.80%    99.8%    0.17%   0.2%
 3   10616762     7.8%           16.17     7.80%    99.6%    0.17%   0.2%
 4   11044881     7.8%           16.21     7.80%    99.1%    0.17%   0.4%
 5   11480700     7.8%           16.25     7.80%    98.4%    0.17%   0.5%
 6   11413572     7.8%           16.25     7.80%    97.0%    0.17%   0.7%
 7   11126527     7.8%           16.22     7.80%    94.8%    0.17%   1.0%
 8   10986548     7.8%           16.21     7.80%    88.0%    0.17%   1.5%
 9   11618437     7.8%           16.27     7.80%    59.0%    0.17%   3.4%

Table 4.6: Parameter choice for the Log-normal/Log-normal model

We obtain the credibility weights and estimates for the ultimates in Table 4.7. Using (4.66), (4.57) and Ĉ_{i,J}^{CL} = C_{i,I−i}/β_{I−i} we obtain for Estimator 4.23 the following representation:

Ĉ_{i,J}^{Go} = exp{ (1 − α_{i,I−i}) ( μ^{(i)} + τ_i^2/2 ) + α_{i,I−i} ( log(C_{i,I−i}/β_{I−i}) + σ_{I−i}^2/2 ) }
  = μ_i^{1−α_{i,I−i}} · exp{ α_{i,I−i} ( log(Ĉ_{i,J}^{CL}) + σ_{I−i}^2/2 ) }.                      (4.83)

estimated reserves
 i       C_{i,I−i}   1−α_{i,I−i}   μ_post(i,I−i)   σ_post(i,I−i)   Ĉ_{i,J}^{Go}   Go        CL        BF
 0       11148124    0.0%          16.23           0.00%           11148124       0         0         0
 1       10648192    0.0%          16.18           0.15%           10663595       15403     15126     16124
 2       10635751    0.1%          16.18           0.20%           10662230       26479     26257     26998
 3        9724068    0.1%          16.09           0.24%            9759434       35365     34538     37575
 4        9786916    0.2%          16.11           0.38%            9874925       88009     85302     95434
 5        9935753    0.4%          16.13           0.51%           10097962       162209    156494    178024
 6        9282022    0.8%          16.08           0.71%            9582510       300487    286121    341305
 7        8256211    1.5%          15.98           0.94%            8737154       480942    449167    574089
 8        7648729    3.6%          15.99           1.48%            8766487       1117758   1043242   1318646
 9        5675568    16.0%         16.11           3.12%            9925132       4249564   3950815   4768384
 total                                                                            6476218   6047061   7356580

Table 4.7: Estimated reserves in the model of Lemma 4.21

Hence we obtain a weighted average between the a priori estimate μ_i = E[C_{i,J}] and the chain-ladder estimate Ĉ_{i,J}^{CL} on the log-scale. This leads (together with the bias correction) to a multiplicative credibility formula. In Table 4.7 we see that the weights 1 − α_{i,I−i} given to the a priori mean μ_i are rather low.
For the conditional mean square error of prediction we have

msep_{C_{i,J} | C_{i,I−i}}( Ĉ_{i,J}^{Go} ) = Var( C_{i,J} | C_{i,I−i} )
  = exp{ 2 μ_post(i,I−i) + σ_post(i,I−i)^2 } ( exp{ σ_post(i,I−i)^2 } − 1 )                        (4.84)
  = E[C_{i,J} | C_{i,I−i}]^2 ( exp{ σ_post(i,I−i)^2 } − 1 )
  = ( Ĉ_{i,J}^{Go} )^2 ( exp{ σ_post(i,I−i)^2 } − 1 ).

This holds under the assumption that the parameters β_j, μ^{(i)}, τ_i and a^2 are known. Hence it is not directly comparable to the mean square error of prediction obtained from the chain-ladder model, since we have no canonical model for the estimation of these parameters and hence we cannot quantify the estimation error.
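Formula (4.84) is a one-liner. A small sketch (function name is ours), using the rounded values for accident year 9 from Tables 4.7 and 4.8, gives a prediction standard error close to the 309 586 reported in Table 4.8, up to rounding of the inputs:

```python
import math

def gogol_msep_conditional(C_hat_go, sigma2_post):
    """Conditional MSEP (4.84) of the Gogol estimator, parameters known."""
    return C_hat_go**2 * math.expm1(sigma2_post)

# rounded inputs for accident year 9 (Table 4.7)
msep = gogol_msep_conditional(C_hat_go=9_925_132.0, sigma2_post=0.0312**2)
rmse = math.sqrt(msep)   # prediction standard error, roughly 3.1e5
```

Note that for small σ_post(i,I−i)^2 we have exp{σ_post^2} − 1 ≈ σ_post^2, so the prediction standard error is approximately σ_post · Ĉ_{i,J}^{Go}.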
If we want to compare this mean square error of prediction to the ones obtained in Corollary 4.17 we need to calculate the unconditional version:

msep_{C_{i,J}}( Ĉ_{i,J}^{Go} ) = E[ ( C_{i,J} − Ĉ_{i,J}^{Go} )^2 ] = E[ Var( C_{i,J} | C_{i,I−i} ) ]
  = E[ ( Ĉ_{i,J}^{Go} )^2 ] ( exp{ σ_post(i,I−i)^2 } − 1 ).                                        (4.85)

Hence we need the distribution of Ĉ_{i,J}^{CL} = C_{i,I−i}/β_{I−i} (cf. (4.83)). Using (4.74) we obtain

f_{C_{i,I−i}}(x) = ∫_{R+} f_{C_{i,I−i}, C_{i,J}}(x, y) dy                                           (4.86)
  = ( 2π ( τ_i^2 + σ_{I−i}^2 ) )^{−1/2} x^{−1} exp{ −(1/2) ( log(x/β_{I−i}) + σ_{I−i}^2/2 − μ^{(i)} )^2 / ( τ_i^2 + σ_{I−i}^2 ) },

where the integral over y of the first (Gaussian) factor in (4.74) equals 1. This shows that the estimator Ĉ_{i,J}^{CL} = C_{i,I−i}/β_{I−i} is log-normally distributed with parameters μ^{(i)} − σ_{I−i}^2/2 and τ_i^2 + σ_{I−i}^2. Moreover, the multiplicative reproductiveness of the Log-normal distribution implies that for γ > 0

( Ĉ_{i,J}^{CL} )^γ  ~(d)  LN( γ ( μ^{(i)} − σ_{I−i}^2/2 ), γ^2 ( τ_i^2 + σ_{I−i}^2 ) ).            (4.87)

Using (4.83) and (4.57) this leads to

msep_{C_{i,J}}( Ĉ_{i,J}^{Go} ) = E[ ( Ĉ_{i,J}^{Go} )^2 ] ( exp{ σ_post(i,I−i)^2 } − 1 )
  = μ_i^{2(1−α_{i,I−i})} exp{ α_{i,I−i} σ_{I−i}^2 } ( exp{ σ_post(i,I−i)^2 } − 1 ) E[ ( Ĉ_{i,J}^{CL} )^{2 α_{i,I−i}} ]   (4.88)
  = exp{ 2 μ^{(i)} + (1 − α_{i,I−i}) τ_i^2 + α_{i,I−i} σ_{I−i}^2 }
    · exp{ −α_{i,I−i} σ_{I−i}^2 + 2 α_{i,I−i}^2 ( τ_i^2 + σ_{I−i}^2 ) } ( exp{ σ_post(i,I−i)^2 } − 1 ).

Observe that

α_{i,I−i} ( τ_i^2 + σ_{I−i}^2 ) = τ_i^2                                                            (4.89)

(cf. (4.66)). This immediately implies the following corollary:


Corollary 4.25 Under the assumptions of Lemma 4.21 we have

msep_{C_{i,J}}( Ĉ_{i,J}^{Go} ) = exp{ 2 μ^{(i)} + (1 + α_{i,I−i}) τ_i^2 } ( exp{ σ_post(i,I−i)^2 } − 1 )   (4.90)

for all I − J + 1 ≤ i ≤ I.
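The simplification from (4.88) to (4.90) rests on the identity (4.89). The following sketch (function names and numbers are ours) checks numerically that the closed form (4.90) agrees with the direct computation of E[(Ĉ^{Go})^2] via the Log-normal moments (4.87):

```python
import math

def msep_corollary(mu_i, tau2, alpha, s2_post):
    """Unconditional MSEP of the Gogol estimator, closed form (4.90)."""
    return math.exp(2 * mu_i + (1 + alpha) * tau2) * math.expm1(s2_post)

def msep_direct(mu_i, tau2, sigma2, alpha, s2_post):
    """Same quantity via (4.88): E[(C_hat_Go)^2] * (exp{s2_post} - 1)."""
    e_second = math.exp(2 * (1 - alpha) * (mu_i + tau2 / 2)   # mu_i^{2(1-alpha)}
                        + alpha * sigma2                       # bias-correction term
                        + 2 * alpha * (mu_i - sigma2 / 2)      # E[(C_hat_CL)^{2 alpha}], mean part
                        + 2 * alpha**2 * (tau2 + sigma2))      # ... variance part (4.87)
    return e_second * math.expm1(s2_post)

mu_i, tau2, sigma2 = 16.11, 0.078**2, 0.034**2
alpha = tau2 / (tau2 + sigma2)
s2_post = tau2 * sigma2 / (tau2 + sigma2)
assert math.isclose(msep_corollary(mu_i, tau2, alpha, s2_post),
                    msep_direct(mu_i, tau2, sigma2, alpha, s2_post))
```

The two expressions coincide exactly because 2 α^2 (τ^2 + σ^2) = 2 α τ^2 by (4.89).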

 i       msep^{1/2}_{C_{i,J}|C_{i,I−i}}(Ĉ_{i,J}^{Go})   msep^{1/2}_{C_{i,J}}(Ĉ_{i,J}^{Go})   msep^{1/2}(R̂_i(ĉ*))
 1       16391                                           17526                                 17527
 2       21602                                           22279                                 22282
 3       23714                                           25875                                 25879
 4       37561                                           42139                                 42153
 5       51584                                           58825                                 58862
 6       68339                                           81644                                 81745
 7       82516                                           105397                                105626
 8       129667                                          162982                                163852
 9       309586                                          363331                                372199
 total   359869                                          427850                                435814

Table 4.8: Mean square errors of prediction under the assumptions of Lemma 4.21 and in Model 4.14

4.2.3 Overdispersed Poisson model with gamma a priori distribution

In the next subsections we will consider a different class of Bayesian models. In Model Assumptions 4.14 we had a distributional assumption on C_{i,j} given the ultimate claim C_{i,J} (which can be seen as a backward iteration). Now we introduce a latent variable Θ_i. Conditioned on Θ_i, we make distributional assumptions on the cumulative sizes C_{i,j} and incremental quantities X_{i,j}, respectively. Θ_i describes the risk characteristics of accident year i (e.g. was it a good or a bad year). C_{i,J} is then a random variable with parameters which depend on Θ_i. In the spirit of the previous chapters, Θ_i reflects the prediction uncertainties.

We start with the overdispersed Poisson model. The overdispersed Poisson model differs from the Poisson Model 2.12 in that the variance is not equal to the mean. This model was introduced for claims reserving in a Bayesian context by Verrall [79, 81, 82] and Renshaw-Verrall [65]. Furthermore, the overdispersed Poisson model is also used in a generalized linear model context (see McCullagh-Nelder [53], England-Verrall [25] and references therein, and Chapter 5 below). The overdispersed Poisson model as considered below can be generalized to the exponential dispersion family; this is done in Subsection 4.2.4.

We start with the overdispersed Poisson model with Gamma a priori distribution (cf. Verrall [81, 82]).

Model Assumptions 4.26 (Overdispersed Poisson-gamma model)
There exist random variables Θ_i and Z_{i,j} as well as constants φ_i > 0 and γ_0, ..., γ_J > 0 with Σ_{j=0}^J γ_j = 1 such that for all i ∈ {0, ..., I} and j ∈ {0, ..., J} we have:

Conditionally, given Θ_i, the Z_{i,j} are independent and Poisson distributed, and the incremental variables X_{i,j} = φ_i Z_{i,j} satisfy

E[X_{i,j} | Θ_i] = Θ_i γ_j   and   Var(X_{i,j} | Θ_i) = φ_i Θ_i γ_j.                               (4.91)

The pairs ( Θ_i, (X_{i,0}, ..., X_{i,J}) ) (i = 0, ..., I) are independent and Θ_i is Gamma distributed with shape parameter a_i and scale parameter b_i.
Remarks 4.27

See appendix, Sections B.1.2 and B.2.3 for the definition of the Poisson and Gamma distributions.

Observe that, given Θ_i, the expectation and variance of Z_{i,j} satisfy

E[Z_{i,j} | Θ_i] = Var(Z_{i,j} | Θ_i) = Θ_i γ_j / φ_i.                                             (4.92)

The a priori expectation of the increments X_{i,j} is given by

E[X_{i,j}] = E[ E[X_{i,j} | Θ_i] ] = γ_j E[Θ_i] = γ_j a_i / b_i.                                   (4.93)

For the cumulative ultimate claim we obtain

C_{i,J} = φ_i Σ_{j=0}^J Z_{i,j}.                                                                   (4.94)

This implies that, conditionally given Θ_i,

C_{i,J} / φ_i  ~(d)  Poisson( Θ_i / φ_i )   and   E[C_{i,J} | Θ_i] = Θ_i,                          (4.95)

this means that Θ_i plays the role of the (unknown) total expected claim amount of accident year i. The Bayesian approach chosen tells us how we should combine the a priori expectation E[C_{i,J}] = a_i/b_i and the information D_I.

This model is sometimes problematic in practical applications. It assumes that we have no negative increments X_{i,j}. If we count the number of reported claims this may hold true. However, if X_{i,j} denotes incremental payments, we can have negative values. E.g. in motor hull insurance in old development periods one gets more money (via subrogation and repayments of deductibles) than one spends.

Observe that we have assumed that the claims development pattern γ_j is known.

Observe that in the overdispersed Poisson model, in general, C_{i,j} is not a natural number. Hence, if we work with claims counts with dispersion φ_i ≠ 1 there is not really an interpretation for this model.
Lemma 4.28 Under Model Assumptions 4.26 the a posteriori distribution of Θ_i, given (X_{i,0}, ..., X_{i,j}), is a Gamma distribution with updated parameters

a_{i,j}^{post} = a_i + C_{i,j} / φ_i,                                                              (4.96)
b_{i,j}^{post} = b_i + Σ_{k=0}^j γ_k / φ_i = b_i + β_j / φ_i,                                      (4.97)

where β_j = Σ_{k=0}^j γ_k.

Remarks 4.29

Since accident years are independent it suffices to consider (X_{i,0}, ..., X_{i,j}) for the calculation of the a posteriori distribution of Θ_i.

We assume that a priori all accident years are equal (the Θ_i are i.i.d.). After we have a set of observations D_I, we obtain a posteriori risk characteristics which differ according to the observations.

Model 4.26 belongs to the well-known class of exponential dispersion models with associated conjugates (see e.g. Bühlmann-Gisler [18], Subsection 2.5.1, and Subsection 4.2.4 below).

Using Lemma 4.28 we obtain for the a posteriori expectation

E[Θ_i | D_I] = a_{i,I−i}^{post} / b_{i,I−i}^{post} = ( a_i + C_{i,I−i}/φ_i ) / ( b_i + β_{I−i}/φ_i )   (4.98)
  = b_i / ( b_i + β_{I−i}/φ_i ) · a_i/b_i + ( 1 − b_i / ( b_i + β_{I−i}/φ_i ) ) · C_{i,I−i}/β_{I−i},

which is a credibility weighted average between the a priori expectation E[Θ_i] = a_i/b_i and the observation C_{i,I−i}/β_{I−i} (see the next section and Bühlmann-Gisler [18] for more detailed discussions).

In fact we can specify the a posteriori distribution of (C_{i,J} − C_{i,I−i})/φ_i, given D_I. It holds for k ∈ {0, 1, ...} that

P[ (C_{i,J} − C_{i,I−i})/φ_i = k | D_I ]
  = ∫_{R+} e^{−(1−β_{I−i}) θ} ( (1 − β_{I−i}) θ )^k / k! · ( b_{i,I−i}^{post} )^{a_{i,I−i}^{post}} / Γ(a_{i,I−i}^{post}) · θ^{a_{i,I−i}^{post} − 1} e^{−b_{i,I−i}^{post} θ} dθ
  = ( b_{i,I−i}^{post} )^{a_{i,I−i}^{post}} (1 − β_{I−i})^k / ( Γ(a_{i,I−i}^{post}) k! ) · ∫_{R+} θ^{k + a_{i,I−i}^{post} − 1} e^{−( b_{i,I−i}^{post} + 1 − β_{I−i} ) θ} dθ
  = Γ( k + a_{i,I−i}^{post} ) / ( Γ(a_{i,I−i}^{post}) k! ) · ( b_{i,I−i}^{post} / ( b_{i,I−i}^{post} + 1 − β_{I−i} ) )^{a_{i,I−i}^{post}} ( (1 − β_{I−i}) / ( b_{i,I−i}^{post} + 1 − β_{I−i} ) )^k     (4.99)
  = ( k + a_{i,I−i}^{post} − 1 choose k ) ( b_{i,I−i}^{post} / ( b_{i,I−i}^{post} + 1 − β_{I−i} ) )^{a_{i,I−i}^{post}} ( (1 − β_{I−i}) / ( b_{i,I−i}^{post} + 1 − β_{I−i} ) )^k,

where the last integral is (up to normalization) the density of a Gamma( k + a_{i,I−i}^{post}, b_{i,I−i}^{post} + 1 − β_{I−i} ) distribution. This is a Negative binomial distribution with parameters r = a_{i,I−i}^{post} and p = b_{i,I−i}^{post} / ( b_{i,I−i}^{post} + 1 − β_{I−i} ) (see appendix, Section B.1.3).
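The posterior update (4.96)-(4.97) and the credibility form (4.98) can be sketched as follows. The function name is ours and the inputs are rounded from Tables 4.9 and 4.10 (accident year 9); the assertion checks the algebraic identity behind (4.98).

```python
def poisson_gamma_posterior(a, b, C_paid, beta, phi):
    """Posterior Gamma parameters (4.96)-(4.97) for Theta_i given the
    observed cumulative claim C_{i,I-i} with development degree beta."""
    a_post = a + C_paid / phi
    b_post = b + beta / phi
    return a_post, b_post

# rounded inputs for accident year 9 (Tables 4.9 and 4.10)
a, b, phi = 400.0, 0.0000344, 41_826.0
C_paid, beta = 5_675_568.0, 0.59

a_post, b_post = poisson_gamma_posterior(a, b, C_paid, beta, phi)
post_mean = a_post / b_post

# credibility form (4.98): weighted average of prior mean and C/beta
w = (beta / phi) / (b + beta / phi)
cred = (1 - w) * (a / b) + w * (C_paid / beta)
assert abs(post_mean - cred) / cred < 1e-12
```

With these inputs the credibility weight w is about 29%, which is consistent with the weight α_{9,0} = 29.0% shown in Table 4.10 below.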
Proof. Using (4.92) we obtain for the conditional density of (X_{i,0}, ..., X_{i,j}), given Θ_i, that

f_{X_{i,0}, ..., X_{i,j} | Θ_i}(x_0, ..., x_j | θ) = Π_{k=0}^j exp{ −θ γ_k / φ_i } ( θ γ_k / φ_i )^{x_k/φ_i} / ( x_k/φ_i )!.   (4.100)

Hence the joint distribution of Θ_i and (X_{i,0}, ..., X_{i,j}) is given by

f_{Θ_i, X_{i,0}, ..., X_{i,j}}(θ, x_0, ..., x_j) = f_{X_{i,0}, ..., X_{i,j} | Θ_i}(x_0, ..., x_j | θ) · f_{Θ_i}(θ)   (4.101)
  = Π_{k=0}^j exp{ −θ γ_k / φ_i } ( θ γ_k / φ_i )^{x_k/φ_i} / ( x_k/φ_i )! · b_i^{a_i} / Γ(a_i) · θ^{a_i − 1} e^{−b_i θ}.

This shows that the a posteriori distribution of Θ_i, given (X_{i,0}, ..., X_{i,j}), is again a Gamma distribution with updated parameters

a_{i,j}^{post} = a_i + C_{i,j} / φ_i,                                                              (4.102)
b_{i,j}^{post} = b_i + Σ_{k=0}^j γ_k / φ_i.                                                        (4.103)

This finishes the proof of the lemma. □

Using the conditional independence of the X_{i,j}, given Θ_i, and (4.91) we obtain

E[C_{i,J} | D_I] = E[ E[C_{i,J} | Θ_i, D_I] | D_I ]
  = E[ E[ Σ_{j=0}^{I−i} X_{i,j} | Θ_i, D_I ] | D_I ] + E[ E[ Σ_{j=I−i+1}^J X_{i,j} | Θ_i, D_I ] | D_I ]
  = C_{i,I−i} + E[ E[ Σ_{j=I−i+1}^J X_{i,j} | Θ_i ] | D_I ]                                        (4.104)
  = C_{i,I−i} + (1 − β_{I−i}) E[Θ_i | D_I].

Together with (4.98) this motivates the following estimator:

Estimator 4.30 (Poisson-gamma model, Verrall [81, 82]) Under Model Assumptions 4.26 we have the following estimator for the ultimate claim E[C_{i,J} | D_I]

Ĉ_{i,J}^{PoiGa} = C_{i,I−i} + (1 − β_{I−i}) [ φ_i b_i / ( φ_i b_i + β_{I−i} ) · a_i/b_i + ( 1 − φ_i b_i / ( φ_i b_i + β_{I−i} ) ) · C_{i,I−i}/β_{I−i} ]   (4.105)

for I − J + 1 ≤ i ≤ I.
Example 4.31 (Poisson-gamma model)
We revisit the data set given in Example 2.7. For the a priori parameters we make the same choices as in Example 4.19 (see Table 4.4). Since Θ_i is Gamma distributed with shape parameter a_i and scale parameter b_i we have

E[Θ_i] = a_i / b_i,                                                                                (4.106)
Vco(Θ_i) = a_i^{−1/2},                                                                             (4.107)

and, using (4.91), we obtain

Var(C_{i,J}) = E[ Var(C_{i,J} | Θ_i) ] + Var( E[C_{i,J} | Θ_i] )
  = φ_i E[Θ_i] + Var(Θ_i)
  = ( φ_i + b_i^{−1} ) a_i / b_i.                                                                  (4.108)

This leads to Table 4.9.

 i   E[Θ_i]     Vco(Θ_i)   Vco(C_{i,J})   a_i   b_i        φ_i
 0   11653101   5.00%      7.8%           400   0.00343%   41951
 1   11367306   5.00%      7.8%           400   0.00352%   40922
 2   10962965   5.00%      7.8%           400   0.00365%   39467
 3   10616762   5.00%      7.8%           400   0.00377%   38220
 4   11044881   5.00%      7.8%           400   0.00362%   39762
 5   11480700   5.00%      7.8%           400   0.00348%   41331
 6   11413572   5.00%      7.8%           400   0.00350%   41089
 7   11126527   5.00%      7.8%           400   0.00360%   40055
 8   10986548   5.00%      7.8%           400   0.00364%   39552
 9   11618437   5.00%      7.8%           400   0.00344%   41826

Table 4.9: Parameter choice for the Poisson-gamma model

We define

α_{i,I−i} = ( β_{I−i}/φ_i ) / ( b_i + β_{I−i}/φ_i ),                                               (4.109)

which is the credibility weight given to the observation C_{i,I−i}/β_{I−i} (cf. (4.98)). The credibility weights and estimates for the ultimates are given in Table 4.10. Observe that

α_{i,I−i} = β_{I−i} / ( β_{I−i} + φ_i b_i ) = β_{I−i} / ( β_{I−i} + E[ Var(X_{i,I−i} | Θ_i) ] / ( γ_{I−i} Var(Θ_i) ) ).   (4.110)

The term E[ Var(X_{i,I−i} | Θ_i) ] / ( γ_{I−i} Var(Θ_i) ) is the so-called credibility coefficient (see also Remark 4.38).

estimated reserves
 i       C_{i,I−i}   β_{I−i}   α_{i,I−i}   a_{i,I−i}^{post}/b_{i,I−i}^{post}   Ĉ_{i,J}^{PoiGa}   PoiGa     CL        BF
 0       11148124    100.0%    41.0%       11446143                            11148124          0         0         0
 1       10648192     99.9%    40.9%       11079028                            10663907          15715     15126     16124
 2       10635751     99.8%    40.9%       10839802                            10662446          26695     26257     26998
 3        9724068     99.6%    40.9%       10265794                             9760401          36333     34538     37575
 4        9786916     99.1%    40.8%       10566741                             9878219          91303     85302     95434
 5        9935753     98.4%    40.6%       10916902                            10105034          169281    156494    178024
 6        9282022     97.0%    40.3%       10670762                             9601115          319093    286121    341305
 7        8256211     94.8%    39.7%       10165120                             8780696          524484    449167    574089
 8        7648729     88.0%    37.9%       10116206                             8862913          1214184   1043242   1318646
 9        5675568     59.0%    29.0%       11039755                            10206452          4530884   3950815   4768384
 total                                                                                           6927973   6047061   7356580

Table 4.10: Estimated reserves in the Poisson-gamma model

The conditional mean square error of prediction is given by

msep_{C_{i,J} | D_I}( Ĉ_{i,J}^{PoiGa} ) = E[ ( C_{i,J} − Ĉ_{i,J}^{PoiGa} )^2 | D_I ]
  = E[ ( Σ_{j=I−i+1}^J X_{i,j} − (1 − β_{I−i}) E[Θ_i | D_I] )^2 | D_I ]
  = E[ ( Σ_{j=I−i+1}^J X_{i,j} − Σ_{j=I−i+1}^J γ_j E[Θ_i | D_I] )^2 | D_I ]                        (4.111)

(cf. (4.104)-(4.105)). Since for j > I − i

E[X_{i,j} | D_I] = E[ E[X_{i,j} | Θ_i, D_I] | D_I ] = E[ E[X_{i,j} | Θ_i] | D_I ] = γ_j E[Θ_i | D_I],   (4.112)

we have that

msep_{C_{i,J} | D_I}( Ĉ_{i,J}^{PoiGa} ) = Var( Σ_{j=I−i+1}^J X_{i,j} | D_I ).                      (4.113)

This last expression can be calculated. We do the complete calculation, but we could also argue with the help of the negative binomial distribution. Using the conditional independence of the X_{i,j}, given Θ_i, and (4.91) we obtain

Var( Σ_{j=I−i+1}^J X_{i,j} | D_I )
  = E[ Var( Σ_{j=I−i+1}^J X_{i,j} | Θ_i ) | D_I ] + Var( E[ Σ_{j=I−i+1}^J X_{i,j} | Θ_i ] | D_I )
  = E[ Σ_{j=I−i+1}^J φ_i Θ_i γ_j | D_I ] + Var( Θ_i (1 − β_{I−i}) | D_I )                          (4.114)
  = φ_i (1 − β_{I−i}) E[Θ_i | D_I] + (1 − β_{I−i})^2 Var( Θ_i | D_I ).

With Lemma 4.28 this leads to the following corollary:
Corollary 4.32 Under Model Assumptions 4.26 the conditional mean square error of prediction is given by

msep_{C_{i,J} | D_I}( Ĉ_{i,J}^{PoiGa} ) = φ_i (1 − β_{I−i}) a_{i,I−i}^{post} / b_{i,I−i}^{post} + (1 − β_{I−i})^2 a_{i,I−i}^{post} / ( b_{i,I−i}^{post} )^2   (4.115)

for I − J + 1 ≤ i ≤ I.

Remark. Observe that we have assumed that the parameters a_i, b_i, φ_i and γ_j are known. If these need to be estimated we obtain an additional term in the MSEP calculation which corresponds to the parameter estimation error.
The unconditional mean square error of prediction can then easily be calculated. We have

msep_{C_{i,J}}( Ĉ_{i,J}^{PoiGa} ) = E[ msep_{C_{i,J} | D_I}( Ĉ_{i,J}^{PoiGa} ) ]                   (4.116)
  = φ_i (1 − β_{I−i}) E[ a_{i,I−i}^{post} ] / b_{i,I−i}^{post} + (1 − β_{I−i})^2 E[ a_{i,I−i}^{post} ] / ( b_{i,I−i}^{post} )^2,

and using E[C_{i,I−i}] = β_{I−i} a_i / b_i (cf. (4.93)) we obtain

msep_{C_{i,J}}( Ĉ_{i,J}^{PoiGa} ) = φ_i (1 − β_{I−i}) · a_i/b_i · ( 1 + φ_i b_i ) / ( φ_i b_i + β_{I−i} ).   (4.117)

Hence we obtain Table 4.11 for the prediction errors.
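Formulas (4.115) and (4.117) can be sketched as follows (function names are ours). With inputs rounded from Tables 4.9 and 4.10 (accident year 9) the square roots come out close to the 477 318 and 489 668 reported in Table 4.11, up to rounding:

```python
import math

def poiga_msep_conditional(phi, beta, a_post, b_post):
    """Conditional MSEP (4.115): process part + posterior parameter part."""
    process = phi * (1 - beta) * a_post / b_post
    parameter = (1 - beta) ** 2 * a_post / b_post**2
    return process + parameter

def poiga_msep_unconditional(phi, beta, a, b):
    """Unconditional MSEP (4.117)."""
    return phi * (1 - beta) * (a / b) * (1 + phi * b) / (phi * b + beta)

# rounded inputs for accident year 9 (Tables 4.9 and 4.10)
phi, beta, a, b = 41_826.0, 0.59, 400.0, 0.0000344
a_post = a + 5_675_568.0 / phi
b_post = b + beta / phi
cond = poiga_msep_conditional(phi, beta, a_post, b_post)
uncond = poiga_msep_unconditional(phi, beta, a, b)
```

Note that here the unconditional prediction error exceeds the conditional one evaluated at this particular observation.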
           msep^{1/2}_{C_{i,J}|C_{i,I−i}}           msep^{1/2}_{C_{i,J}}
 i         Ĉ_{i,J}^{PoiGa}   Ĉ_{i,J}^{Go}          Ĉ_{i,J}^{PoiGa}   Ĉ_{i,J}^{Go}          msep^{1/2}(R̂_i(ĉ*))
 1         25367             16391                 25695             18832                 17527
 2         32475             21602                 32659             23940                 22282
 3         37292             23714                 37924             27804                 25879
 4         60359             37561                 61710             45276                 42153
 5         83912             51584                 86052             63200                 58862
 6         115212            68339                 119155            87704                 81745
 7         146500            82516                 153272            113195                105626
 8         224738            129667                234207            174906                163852
 9         477318            309586                489668            388179                372199
 total     571707            359869                588809            457739                435814

Table 4.11: Mean square errors of prediction in the Poisson-gamma model, the Log-normal/Log-normal model and in Model 4.14

We have already seen in Table 4.10 that the Poisson-gamma reserves are closer to the Bornhuetter-Ferguson estimate (this stands in contrast to the other methods presented in this chapter). Table 4.11 shows that the prediction error is substantially larger in the Poisson-gamma model than in the other models (comparable to the estimation error in R̂_i(0) in Table 4.5). This suggests that in the present case the Poisson-gamma method is not an appropriate method.

4.2.4 Exponential dispersion family with its associated conjugates

In the subsection above we have seen that in the Poisson-gamma model the a posteriori distribution of Θ_i is again a Gamma distribution with updated parameters. This indicates that, using a smart choice of the distributions, we were able to calculate the a posteriori distribution. We generalize the Poisson-gamma model to the Exponential dispersion family (EDF), and we look for its associated conjugates. These are standard models in Bayesian inference; for literature we refer e.g. to Bernardo-Smith [9]. Similar ideas have been applied for tariffication and pricing (see Bühlmann-Gisler [18], Chapter 2); we transfer these ideas to the reserving context (see also Wüthrich [89]).

Model Assumptions 4.33 (Exponential dispersion model)
There exists a claims development pattern (β_j)_{0≤j≤J} with β_J = 1, γ_0 = β_0 ≠ 0 and γ_j = β_j − β_{j−1} ≠ 0 for j ∈ {1, ..., J}.

Conditionally, given Θ_i, the X_{i,j} (0 ≤ i ≤ I, 0 ≤ j ≤ J) are independent with

X_{i,j} / ( γ_j μ_i )  ~(d)  dF_{i,j}^{(Θ_i)}(x) = a( x, σ^2/w_{i,j} ) exp{ ( x Θ_i − b(Θ_i) ) / ( σ^2/w_{i,j} ) } dν(x),   (4.118)

where ν is a suitable σ-finite measure on R, b(·) is some real-valued twice-differentiable function of Θ_i, and μ_i > 0, σ^2 and w_{i,j} > 0 are some real-valued constants; F_{i,j}^{(Θ_i)} is a probability distribution on R.

The random vectors ( Θ_i, (X_{i,0}, ..., X_{i,J}) ) (i = 0, ..., I) are independent and the Θ_i are real-valued random variables with densities (w.r.t. the Lebesgue measure)

u_{x_0, τ^2}(θ) = d(x_0, τ^2) exp{ ( x_0 θ − b(θ) ) / τ^2 },                                       (4.119)

with x_0 = 1 and τ^2 > 0.
Remarks 4.34

In the following the measure ν is given by the Lebesgue measure or by the counting measure.

A distribution of the type (4.118) is said to be a (one parametric) Exponential dispersion family (EDF). The class of (one parametric) Exponential dispersion families covers a large class of families of distributions, e.g. the families of the Poisson, Bernoulli, Gamma, Normal and Inverse-gaussian distributions (cf. Bühlmann-Gisler [18], Section 2.5).

The first assumption implies that the scaled sizes Y_{i,j} = X_{i,j}/(γ_j μ_i) have, given Θ_i, a distribution F_{i,j}^{(Θ_i)} which belongs to the EDF. A priori they are all the same, which is described by the fact that x_0 and τ^2 do not depend on i.

For the time being we assume that all parameters of the underlying distributions are known; w_{i,j} is a known volume measure which will be further specified below.

For the moment we could also concentrate on a single accident year i, i.e. we only need the Model Assumptions 4.33 for a fixed accident year i.

A pair of distributions given by (4.118) and (4.119) is said to be a (one parametric) Exponential dispersion family with associated conjugates. Examples are (see Bühlmann-Gisler [18], Section 2.5): the Poisson-gamma case (see also Verrall [81, 82] and Subsection 4.2.3), the Binomial-beta case, the Gamma-gamma case and the Normal-normal case.

We have the following lemma:
Lemma 4.35 (Associated Conjugate) Under Model Assumptions 4.33 the conditional distribution of Θ_i, given X_{i,0}, ..., X_{i,j}, has the density u_{x_{post,j}^{(i)}, τ_{post,j}^2}(·) with the a posteriori parameters

τ_{post,j}^2 = σ^2 [ σ^2/τ^2 + Σ_{k=0}^j w_{i,k} ]^{−1},                                           (4.120)
x_{post,j}^{(i)} = ( τ_{post,j}^2 / σ^2 ) [ σ^2/τ^2 · 1 + Σ_{k=0}^j w_{i,k} · Y_i^{(j)} ],         (4.121)

where

Y_i^{(j)} = Σ_{k=0}^j ( w_{i,k} / Σ_{l=0}^j w_{i,l} ) · X_{i,k} / ( γ_k μ_i ).                     (4.122)

Proof. Define Y_{i,j} = X_{i,j}/(γ_j μ_i). The joint distribution of (Θ_i, Y_{i,0}, ..., Y_{i,j}) is given by

f_{Θ_i, Y_{i,0}, ..., Y_{i,j}}(θ, y_0, ..., y_j) = f_{Y_{i,0}, ..., Y_{i,j} | Θ_i}(y_0, ..., y_j | θ) · u_{1, τ^2}(θ)
  = d(1, τ^2) exp{ ( 1 · θ − b(θ) ) / τ^2 }                                                        (4.123)
    · Π_{k=0}^j a( y_k, σ^2/w_{i,k} ) exp{ ( y_k θ − b(θ) ) / ( σ^2/w_{i,k} ) }.

Hence the conditional distribution of Θ_i, given X_{i,0}, ..., X_{i,j}, is proportional to

exp{ θ [ 1/τ^2 + Σ_{k=0}^j w_{i,k} X_{i,k} / ( γ_k μ_i σ^2 ) ] − b(θ) [ 1/τ^2 + Σ_{k=0}^j w_{i,k} / σ^2 ] }.   (4.124)

This finishes the proof of the lemma. □
Remarks 4.36

Lemma 4.35 states that the distribution defined by density (4.119) is a conjugated distribution to the distribution given by (4.118). This means that the a posteriori distribution of Θ_i, given X_{i,0}, ..., X_{i,j}, is again of the type (4.119) with updated parameters x_{post,j}^{(i)} and τ_{post,j}^2.

From Lemma 4.35 we can calculate the distribution of (Y_{i,I−i+1}, ..., Y_{i,J}), given D_I. First we remark that different accident years are independent, hence we can restrict ourselves to the observations Y_{i,0}, ..., Y_{i,I−i}; then we have that the a posteriori distribution is given by

∫ Π_{j=I−i+1}^J F_{i,j}^{(θ)}(y_j) · u_{x_{post,I−i}^{(i)}, τ_{post,I−i}^2}(θ) dθ.                 (4.125)

In the Poisson-gamma case this is a negative binomial distribution. Observe that for the EDF with its associated conjugates we can determine the explicit distributions, not only estimates for the first two moments.
Theorem 4.37 Under the Model Assumptions 4.33 we have for i = 0, ..., I and j = 0, ..., J:

1. The conditional moments of the standardized observations X_{i,j}/(γ_j μ_i) are given by

μ(Θ_i) := E[ X_{i,j} / ( γ_j μ_i ) | Θ_i ] = b'(Θ_i),                                              (4.126)
Var( X_{i,j} / ( γ_j μ_i ) | Θ_i ) = σ^2 b''(Θ_i) / w_{i,j}.                                       (4.127)

2. If exp{ ( x_0 θ − b(θ) ) / τ^2 } disappears on the boundary of Θ_i for all x_0, τ^2, then

E[X_{i,j}] = γ_j μ_i E[ μ(Θ_i) ] = γ_j μ_i,                                                        (4.128)
E[ μ(Θ_i) | X_{i,0}, ..., X_{i,j} ] = α_{i,j} Y_i^{(j)} + (1 − α_{i,j}) · 1,                       (4.129)

where

α_{i,j} = Σ_{k=0}^j w_{i,k} / ( Σ_{k=0}^j w_{i,k} + σ^2/τ^2 ).                                     (4.130)

Proof. See Lemma 5 below, Theorem 2.20 in Bühlmann-Gisler [18] or Bernardo-Smith [9]. □
Remarks 4.38

In Model Assumptions 4.33 and in Theorem 4.37 we study the standardized version of the observations X_{i,j}. If μ_i is equal for all i, the standardization is not necessary. If they are not equal, the standardized version is straightforward for comparisons between accident years.

Theorem 4.37 says that the a posteriori mean of μ(Θ_i), given the observations X_{i,0}, ..., X_{i,j}, is a credibility weighted average between the a priori mean E[μ(Θ_i)] = 1 and the weighted average Y_i^{(j)} of the standardized observations. The larger the individual variation σ^2, the smaller the credibility weight α_{i,j}; the larger the collective variability τ^2, the larger the credibility weight α_{i,j}.

For a detailed discussion of the credibility coefficient

κ = σ^2 / τ^2                                                                                      (4.131)

we refer to Bühlmann-Gisler [18].
Estimator 4.39 Under Model Assumptions 4.33 we have the following estimators for the increments E[X_{i,I−i+k} | D_I] and the ultimate claims E[C_{i,J} | D_I]

X̂_{i,I−i+k}^{EDF} = γ_{I−i+k} μ_i μ̂(Θ_i),                                                         (4.132)
Ĉ_{i,J}^{EDF} = C_{i,I−i} + (1 − β_{I−i}) μ_i μ̂(Θ_i)                                              (4.133)

for I − J + 1 ≤ i ≤ I and k ∈ {1, ..., J − I + i}, where

μ̂(Θ_i) = E[ μ(Θ_i) | D_I ] = α_{i,I−i} Y_i^{(I−i)} + (1 − α_{i,I−i}) · 1.                         (4.134)

Theorem 4.40 (Bayesian estimator) Under Model Assumptions 4.33 the estimators μ̂(Θ_i), X̂_{i,I−i+k}^{EDF} and Ĉ_{i,J}^{EDF} are D_I-measurable and minimize the conditional mean square errors msep_{μ(Θ_i)|D_I}(·), msep_{X_{i,I−i+k}|D_I}(·) and msep_{C_{i,J}|D_I}(·), respectively, for I − J + 1 ≤ i ≤ I. I.e. these estimators are Bayesian w.r.t. D_I and minimize the quadratic loss function (L^2(P)-norm).

Proof. The D_I-measurability is clear. But then the claim for μ̂(Θ_i) is clear, since the conditional expectation minimizes the mean square error given D_I (see Theorem 2.5 in [18]). Due to our independence assumptions we have

E[X_{i,I−i+k} | D_I] = E[ E[X_{i,I−i+k} | Θ_i] | D_I ] = γ_{I−i+k} μ_i μ̂(Θ_i),                    (4.135)
E[C_{i,J} | D_I] = C_{i,I−i} + (1 − β_{I−i}) μ_i μ̂(Θ_i),                                          (4.136)

which finishes the proof of the theorem. □
Explicit choice of weights.
W.l.o.g. we may and will assume that

m_b = E[ b''(Θ_i) ] = 1.                                                                           (4.137)

Otherwise we simply multiply σ^2 and τ^2 by m_b, which in our context of the EDF with associated conjugates leads to the same model with b(·) replaced by b_{(1)}(·) = m_b b(·/m_b). This rescaled model has then

Var( X_{i,j} / ( γ_j μ_i ) | Θ_i ) = m_b σ^2 b''_{(1)}(Θ_i) / w_{i,j},   with   E[ b''_{(1)}(Θ_i) ] = 1,   (4.138)
Var( b'_{(1)}(Θ_i) ) = m_b τ^2.                                                                    (4.139)

Since both σ^2 and τ^2 are multiplied by m_b, the credibility weights α_{i,j} do not change under this transformation. Hence we assume (4.137) for the rest of this work.

In Section 4.2.4 we have not yet specified the weights w_{i,j}. In Mack [47] there is a discussion of choosing appropriate weights (Assumption (A4) in Mack [47]). In fact we could choose a design matrix (w_{i,j})_{i,j} which gives a whole family of models. We do not further discuss this here; we make a canonical choice (which is favoured in many applications) that has the nice side effect that we obtain a natural mixture between the chain-ladder estimate and the Bornhuetter-Ferguson estimate.
Model Assumptions 4.41
In addition to Model Assumptions 4.33 and (4.137) we assume that w_{i,j} = γ_j μ_i for all i = 0, ..., I and j = 0, ..., J, and that exp{ ( x_0 θ − b(θ) ) / τ^2 } disappears on the boundary of Θ_i for all x_0 and τ^2.

Hence, we have Σ_{k=0}^j w_{i,k} = β_j μ_i. This immediately implies:

Corollary 4.42 Under Model Assumptions 4.41 we have for all i = 0, ..., I that

μ̂(Θ_i) = α_{i,I−i} · C_{i,I−i} / ( β_{I−i} μ_i ) + (1 − α_{i,I−i}) · 1,                           (4.140)

where   α_{i,I−i} = β_{I−i} / ( β_{I−i} + σ^2 / ( τ^2 μ_i ) ).                                     (4.141)

Remark. Compare the weight α_{i,I−i} from (4.141) to α_{i,I−i} from (4.110): In the notation of Subsection 4.2.3 (see (4.110)) we have

κ_i = φ_i b_i = E[ Var(X_{i,I−i} | Θ_i) ] / ( γ_{I−i} Var(Θ_i) ),                                  (4.142)

and in the notation of this subsection we have

κ_i = σ^2 / ( τ^2 μ_i ) = E[ Var(X_{i,I−i} | Θ_i) ] / ( γ_{I−i} Var( μ_i μ(Θ_i) ) ).               (4.143)

This shows that the estimators Ĉ_{i,J}^{PoiGa} and Ĉ_{i,J}^{EDF} give the same estimated reserve (the Poisson-gamma model is an example of the Exponential dispersion family with associated conjugates).
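The equality of the two parametrizations of the credibility coefficient can be checked numerically. The sketch below (function name is ours) computes κ_i = φ_i b_i from the rounded Poisson-gamma parameters of accident year 9 in Table 4.9; the resulting values reproduce, up to rounding, the κ_i ≈ 1.44 and the 29.0% weight that appear in Tables 4.10 and 4.12 below.

```python
def cred_weight(beta, kappa):
    """Credibility weight alpha_{i,I-i} = beta / (beta + kappa), cf. (4.141)."""
    return beta / (beta + kappa)

# accident year 9, parameters rounded from Table 4.9
phi, b, beta = 41_826.0, 0.0000344, 0.59
kappa = phi * b                    # (4.142): roughly 1.44
alpha = cred_weight(beta, kappa)   # roughly 0.29
```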

Example 4.43 (Exponential dispersion model with associated conjugate)

We revisit the data set given in Example 2.7. For the a priori parameters we make the same choices as in Example 4.19 (see Table 4.4).

Observe that the credibility weight of the reserves does not depend on the choice of σ^2 for given Vco(C_{i,J}): Using the conditional independence of the increments X_{i,j}, given Θ_i, and (4.126), (4.127) and (4.137) leads to

Var(C_{i,J}) = E[ Var( Σ_{j=0}^J X_{i,j} | Θ_i ) ] + Var( E[ Σ_{j=0}^J X_{i,j} | Θ_i ] )
  = E[ Σ_{j=0}^J γ_j^2 μ_i^2 Var( X_{i,j}/(γ_j μ_i) | Θ_i ) ] + Var( Σ_{j=0}^J γ_j μ_i E[ X_{i,j}/(γ_j μ_i) | Θ_i ] )
  = Σ_{j=0}^J γ_j^2 μ_i^2 σ^2 / w_{i,j} + μ_i^2 Var( b'(Θ_i) )                                     (4.144)
  = μ_i σ^2 + μ_i^2 τ^2.

Hence, we have that

Vco^2(C_{i,J}) = σ^2 / μ_i + τ^2.                                                                  (4.145)

This implies that

α_{i,I−i} = β_{I−i} / ( β_{I−i} + σ^2/(τ^2 μ_i) ) = β_{I−i} / ( β_{I−i} + Vco^2(C_{i,J})/τ^2 − 1 ).   (4.146)

For simplicity we have chosen the same values of τ and σ^2/μ_i for all accident years in Table 4.12, which implies that the credibility coefficient κ_i = σ^2/(τ^2 μ_i) does not depend on i.


 i       τ       (σ^2/μ_i)^{1/2}   κ_i      α_{i,I−i}   μ̂(Θ_i)   reserves EDF
 0       5.00%   6.00%             1.4400   41.0%       0.9822    0
 1       5.00%   6.00%             1.4400   40.9%       0.9746    15715
 2       5.00%   6.00%             1.4400   40.9%       0.9888    26695
 3       5.00%   6.00%             1.4400   40.9%       0.9669    36333
 4       5.00%   6.00%             1.4400   40.8%       0.9567    91303
 5       5.00%   6.00%             1.4400   40.6%       0.9509    169281
 6       5.00%   6.00%             1.4400   40.3%       0.9349    319093
 7       5.00%   6.00%             1.4400   39.7%       0.9136    524484
 8       5.00%   6.00%             1.4400   37.9%       0.9208    1214184
 9       5.00%   6.00%             1.4400   29.0%       0.9502    4530884
 total                                                            6927973

Table 4.12: Estimated reserves in the Exponential dispersion model with associated conjugate

The estimates in Table 4.10 and Table 4.12 lead to the same result. Moreover, we see that the Bayesian estimate μ̂(Θ_i) is below 1 for all accident years i (see Table 4.12). This suggests (once more) that the choices of the a priori estimates μ_i for the ultimate claims were too conservative.
Conclusion 1. Corollary 4.42 implies that the estimator \widehat{C}^{EDF}_{i,J} gives the optimal mixture between the Bornhuetter-Ferguson and the chain-ladder estimates in the EDF with associate conjugate: Assume that \gamma_j and f_j are identified by (4.3) and set \widehat{C}^{CL}_{i,J} = C_{i,I-i}/\gamma_{I-i}. Then we have that

\widehat{C}^{EDF}_{i,J} = C_{i,I-i} + (1-\gamma_{I-i})\,\Big[\alpha_{i,I-i}\,\frac{C_{i,I-i}}{\gamma_{I-i}} + (1-\alpha_{i,I-i})\,\mu_i\Big]
= C_{i,I-i} + (1-\gamma_{I-i})\,\Big[\alpha_{i,I-i}\,\widehat{C}^{CL}_{i,J} + (1-\alpha_{i,I-i})\,\mu_i\Big]   (4.147)
= C_{i,I-i} + (1-\gamma_{I-i})\,S_i(\alpha_{i,I-i}),

where S_i(\cdot) is the function defined in (4.1). Hence we have the mixture
\widehat{C}^{EDF}_{i,J} = \alpha_{i,I-i}\,\widehat{C}^{CL}_{i,J} + (1-\alpha_{i,I-i})\,\widehat{C}^{BF}_{i,J}   (4.148)

between the CL estimate and the BF estimate. Moreover, it minimizes the conditional MSEP in the exponential dispersion model with associate conjugate. Observe that

\alpha_{i,I-i} = \frac{\gamma_{I-i}}{\gamma_{I-i} + \beta_i},   (4.149)

where the credibility coefficient \beta_i was defined in (4.143). If we choose \beta_i = 0 we obtain the chain-ladder estimate, and if we choose \beta_i = \infty we obtain the Bornhuetter-Ferguson reserve.
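The mixture (4.148) can be sketched in a few lines. The claim amounts below are illustrative assumptions, not data from the text; the limiting cases \beta_i = 0 (chain-ladder) and \beta_i \to \infty (Bornhuetter-Ferguson) are made explicit:

```python
# Sketch of the credibility mixture (4.148): the EDF ultimate is a weighted
# average of the chain-ladder and Bornhuetter-Ferguson ultimates with weight
# alpha = gamma / (gamma + beta). All numbers below are illustrative assumptions.

def edf_ultimate(c_paid, gamma, mu_prior, beta):
    """Mix the CL and BF ultimates with credibility weight alpha."""
    alpha = gamma / (gamma + beta)
    cl = c_paid / gamma                           # chain-ladder ultimate, cf. (4.147)
    bf = c_paid + (1.0 - gamma) * mu_prior        # Bornhuetter-Ferguson ultimate
    return alpha * cl + (1.0 - alpha) * bf

# beta -> 0 recovers the chain-ladder estimate, beta -> infinity the BF estimate:
print(edf_ultimate(8_000_000, 0.8, 11_000_000, 0.0))    # CL: 10'000'000
print(edf_ultimate(8_000_000, 0.8, 11_000_000, 1e12))   # ~BF: 10'200'000
```

This makes the role of \beta_i as a dial between the two classical reserving methods concrete.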
Conclusion 2. Using (4.135) we find for all I-i \le j < J that

E[C_{i,j+1} \mid C_{i,0},\ldots,C_{i,I-i}] = C_{i,I-i} + E\Big[\sum_{l=I-i+1}^{j+1} X_{i,l} \,\Big|\, C_{i,0},\ldots,C_{i,I-i}\Big]
= C_{i,I-i} + (\gamma_{j+1}-\gamma_{I-i})\,\mu_i\,\widehat{\mu}(\Theta_i)   (4.150)
= \Big(1 + \frac{\gamma_{j+1}-\gamma_{I-i}}{\gamma_{I-i}}\,\alpha_{i,I-i}\Big)\,C_{i,I-i} + (\gamma_{j+1}-\gamma_{I-i})\,(1-\alpha_{i,I-i})\,\mu_i.

In the second step we explicitly use that we have an exact Bayesian estimator; (4.150) does not hold true in the Bühlmann-Straub model (see Section 4.3 below). Formula (4.150) suggests that the EDF with associate conjugate is a linear mixture of the chain-ladder model and the Bornhuetter-Ferguson model. If we choose the credibility coefficient \beta_i = 0, we obtain

E[C_{i,j+1} \mid C_{i,0},\ldots,C_{i,j}] = \Big(1 + \frac{\gamma_{j+1}-\gamma_j}{\gamma_j}\Big)\,C_{i,j} = f_j\,C_{i,j},   (4.151)

if we assume (4.3). This is exactly the chain-ladder assumption (2.1). If we choose \beta_i = \infty then \alpha_{i,I-i} = 0 and we obtain

E[C_{i,J} \mid C_{i,0},\ldots,C_{i,I-i}] = C_{i,I-i} + (1-\gamma_{I-i})\,\mu_i,   (4.152)

which is Model 2.8 that we have used to motivate the Bornhuetter-Ferguson estimate \widehat{C}^{BF}_{i,J}.
Under Model Assumptions 4.41 we obtain for the conditional mean square error of prediction

\mathrm{msep}_{\mu(\Theta_i)\mid\mathcal{D}_I}\big(\widehat{\mu}(\Theta_i)\big) = E\Big[\big(\widehat{\mu}(\Theta_i) - \mu(\Theta_i)\big)^2 \,\Big|\, \mathcal{D}_I\Big] = \mathrm{Var}\big(\mu(\Theta_i) \mid \mathcal{D}_I\big),   (4.153)

and hence we have that

\mathrm{msep}_{\mu(\Theta_i)}\big(\widehat{\mu}(\Theta_i)\big) = E\big[\mathrm{Var}\big(\mu(\Theta_i) \mid \mathcal{D}_I\big)\big].   (4.154)

If we plug in the estimator (4.140) we obtain

\mathrm{msep}_{\mu(\Theta_i)}\big(\widehat{\mu}(\Theta_i)\big)
= E\bigg[\Big(\alpha_{i,I-i}\,\frac{C_{i,I-i}}{\gamma_{I-i}\,\mu_i} + (1-\alpha_{i,I-i})\cdot 1 - \mu(\Theta_i)\Big)^2\bigg]
= E\bigg[\Big(\alpha_{i,I-i}\,\Big(\frac{C_{i,I-i}}{\gamma_{I-i}\,\mu_i} - \mu(\Theta_i)\Big) - (1-\alpha_{i,I-i})\,\big(\mu(\Theta_i) - 1\big)\Big)^2\bigg]   (4.155)
= (\alpha_{i,I-i})^2\,E\Big[\mathrm{Var}\Big(\frac{C_{i,I-i}}{\gamma_{I-i}\,\mu_i} \,\Big|\, \Theta_i\Big)\Big] + (1-\alpha_{i,I-i})^2\,\mathrm{Var}\big(\mu(\Theta_i)\big)
= (\alpha_{i,I-i})^2\,\frac{\sigma^2}{\gamma_{I-i}\,\mu_i} + (1-\alpha_{i,I-i})^2\,\tau^2
= (1-\alpha_{i,I-i})\,\tau^2,

where in the last step we have used \tau^2\,(1-\alpha_{i,I-i}) = \alpha_{i,I-i}\,\dfrac{\sigma^2}{\gamma_{I-i}\,\mu_i} (cf. (4.141)).
From this we derive the unconditional mean square error of prediction for the estimate of C_{i,J}:

\mathrm{msep}_{C_{i,J}}\big(\widehat{C}^{EDF}_{i,J}\big) = E\Big[\big((1-\gamma_{I-i})\,\mu_i\,\widehat{\mu}(\Theta_i) - (C_{i,J}-C_{i,I-i})\big)^2\Big]   (4.156)
= \mu_i^2\,E\bigg[\Big((1-\gamma_{I-i})\,\big(\widehat{\mu}(\Theta_i)-\mu(\Theta_i)\big) - \Big(\frac{C_{i,J}-C_{i,I-i}}{\mu_i} - (1-\gamma_{I-i})\,\mu(\Theta_i)\Big)\Big)^2\bigg]
= (1-\gamma_{I-i})^2\,\mu_i^2\,\mathrm{msep}_{\mu(\Theta_i)}\big(\widehat{\mu}(\Theta_i)\big) + \sum_{k=I-i+1}^{J} E\big[\mathrm{Var}(X_{i,k} \mid \Theta_i)\big]
= \mu_i^2\,\Big[(1-\gamma_{I-i})^2\,(1-\alpha_{i,I-i})\,\tau^2 + (1-\gamma_{I-i})\,\sigma^2/\mu_i\Big].

Moreover, using (4.146) we obtain

\mathrm{msep}_{C_{i,J}}\big(\widehat{C}^{EDF}_{i,J}\big) = (1-\gamma_{I-i})\,\mu_i\,\sigma^2\,\bigg(1 + \frac{(1-\gamma_{I-i})\,\mu_i\,\tau^2}{\sigma^2 + \gamma_{I-i}\,\mu_i\,\tau^2}\bigg).   (4.157)

This is the same value as for the Poisson-gamma case, see (4.116) and Table 4.11.
For the conditional mean square error of prediction for the estimate of C_{i,J}, one needs to calculate

\mathrm{Var}\big(\mu(\Theta_i) \mid \mathcal{D}_I\big) = \mathrm{Var}\big(b'(\Theta_i) \mid \mathcal{D}_I\big),   (4.158)

where \Theta_i, given \mathcal{D}_I, has the a posteriori distribution u_{x^{(i)}_{post,j},\,\tau^2_{post,j}}(\theta) given by Lemma 4.35. We omit its further calculation.

Remarks on parameter estimation. So far we have always assumed that \mu_i, \gamma_j, \sigma^2 and \tau^2 are known. Under these assumptions we have calculated the Bayesian estimator, which was optimal in the sense that it minimizes the MSEP. If the parameters are not known, the problem becomes substantially more difficult and in general one loses the optimality results.

Estimation of \gamma_j. At the moment we do not have a canonical way in which the claims development pattern should be estimated. In practice one often chooses the chain-ladder estimate \widehat{\beta}^{(CL)}_j provided in (2.25) and then sets

\widehat{\gamma}^{(CL)}_j = \widehat{\beta}^{(CL)}_j - \widehat{\beta}^{(CL)}_{j-1}.   (4.159)

At the current stage we cannot say anything about the optimality of this estimator. However, observe that for the Poisson-gamma model this estimator is natural in the sense that it coincides with the maximum likelihood estimator provided in the Poisson model (see Corollary 2.18). For more on this topic we refer to Subsection 4.2.5.
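The plug-in estimate (4.159) is easy to compute from chain-ladder factors. The sketch below uses the estimated factors \widehat{f}_j of Table 4.13; the backward recursion \widehat{\beta}_J = 1, \widehat{\beta}_{j} = \widehat{\beta}_{j+1}/\widehat{f}_j is the standard one:

```python
# Sketch of the plug-in pattern estimate (4.159): the cumulative pattern beta_j
# is obtained from the chain-ladder factors (beta_J = 1, beta_j = beta_{j+1}/f_j)
# and differenced to give the incremental pattern gamma_j.

def development_pattern(f):
    """Return (beta, gamma) from chain-ladder factors f_0, ..., f_{J-1}."""
    beta = [1.0]
    for fj in reversed(f):
        beta.insert(0, beta[0] / fj)   # beta_j = beta_{j+1} / f_j
    gamma = [beta[0]] + [beta[j] - beta[j - 1] for j in range(1, len(beta))]
    return beta, gamma

# estimated chain-ladder factors f_j of Table 4.13
f = [1.4925, 1.0778, 1.0229, 1.0148, 1.0070, 1.0051, 1.0011, 1.0010, 1.0014]
beta, gamma = development_pattern(f)
print(round(beta[0], 3), round(gamma[0], 3))   # gamma_0 ~ 0.59, cf. Table 4.14
```

The incremental weights \widehat{\gamma}_j sum to one by construction.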
Estimation of \mu_i. Usually, one takes a plan value, a budget value or the value used for the premium calculation (as in the BF method).

Estimation of \sigma^2 and \tau^2. For known \gamma_j and \mu_i one can give unbiased estimators for these variance parameters. For the moment we omit their formulation, because in Section 4.3 we will see that the exponential dispersion model with its associate conjugates satisfies the assumptions of the Bühlmann-Straub model. Hence we can take the same estimators as in the Bühlmann-Straub model, and these are provided in Subsection 4.3.1.

4.2.5 Poisson-gamma case, revisited

In Model Assumptions 4.26 and 4.33 we have assumed that the claims development pattern \gamma_j is known. Of course, in general this is not the case, and in practice one usually uses estimate (4.159) for the claims development pattern. In Verrall [82] this is called the plug-in estimate (which leads to the CL and BF mixture).
However, in a full Bayesian approach one should also estimate this parameter in a Bayesian way (since usually it is not known). This means that we should also give an a priori distribution to the claims development pattern. For simplicity, we only treat the Poisson-gamma case (which was also considered in Verrall [82]). We have the following assumptions.

Model Assumptions 4.44 (Poisson-gamma model)

There exist positive random vectors \Theta = (\Theta_i)_i and \gamma = (\gamma_j)_j with \sum_{j=0}^{J} \gamma_j = 1 such that for all i \in \{0,\ldots,I\} and j \in \{0,\ldots,J\} we have:

- conditionally, given \Theta and \gamma, the X_{i,j} are independent and Poisson distributed with mean \Theta_i\,\gamma_j;
- \Theta and \gamma are independent, the \Theta_i are independent Gamma distributed with shape parameter a_i and scale parameter b_i, and \gamma is f_\gamma distributed.

As before, we can calculate the joint distribution of \{X_{i,j};\ i+j \le I\}, \Theta and \gamma, which is given by

f\big((x_{i,j})_{i+j\le I}, \Theta, \gamma\big) = \prod_{i+j\le I} e^{-\Theta_i\gamma_j}\,\frac{(\Theta_i\gamma_j)^{x_{i,j}}}{x_{i,j}!}\ \prod_{i=0}^{I} f_{a_i,b_i}(\Theta_i)\ f_\gamma(\gamma).   (4.160)

The posterior distribution of (\Theta, \gamma), given the observations \mathcal{D}_I, is proportional to

\prod_{i=0}^{I} f_{a^{post}_{i,I-i},\,b^{post}_{i,I-i}}(\Theta_i)\ \prod_{j=0}^{(I-i)\wedge J} \gamma_j^{X_{i,j}}\ f_\gamma(\gamma),   (4.161)

with

a^{post}_{i,j} = a_i + \sum_{k=0}^{j\wedge J} X_{i,k} \quad\text{and}\quad b^{post}_{i,j} = b_i + \sum_{k=0}^{j\wedge J} \gamma_k,   (4.162)

see also Lemma 4.28. From this we immediately see that one cannot analytically calculate the posterior distribution of (\Theta, \gamma), given the observations \mathcal{D}_I, and this also implies that we cannot easily calculate the conditional distribution of X_{k,l}, k+l > I, given the observations \mathcal{D}_I. Hence these Bayesian models can only be implemented with the help of numerical simulations, e.g. the Markov chain Monte Carlo (MCMC) approach. The implementation using simulation-based MCMC is discussed in de Alba [2, 4] and Scollnik [70].
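A simulation scheme of this kind can be sketched in a few dozen lines. The following Metropolis-within-Gibbs sampler is a hedged illustration of the idea, not the implementation of de Alba or Scollnik: the 3×3 toy triangle, the priors and the proposal scale are assumptions, and f_\gamma is taken flat on the simplex.

```python
import math
import random

# Minimal Metropolis-within-Gibbs sketch for the full Bayesian Poisson-gamma
# model (Model Assumptions 4.44): Theta_i | gamma, D_I is conjugate Gamma with
# the parameters (4.162); gamma is updated by random-walk Metropolis on the
# simplex. Triangle, priors and proposal scale are illustrative assumptions.

X = [[120.0, 60.0, 20.0],        # increments X_{i,j}, observed for i + j <= I
     [110.0, 55.0, None],
     [130.0, None, None]]
a = [2.0, 2.0, 2.0]               # assumed Gamma shape parameters a_i
b = [0.01, 0.01, 0.01]            # assumed Gamma rate parameters b_i

def log_post_gamma(g, theta):
    """Log posterior of gamma up to a constant (flat f_gamma on the simplex)."""
    return sum(x * math.log(g[j]) - theta[i] * g[j]
               for i, row in enumerate(X)
               for j, x in enumerate(row) if x is not None)

def mcmc(n_iter=2000, seed=1):
    rng = random.Random(seed)
    g, theta, draws = [1.0 / 3] * 3, [200.0] * 3, []
    for _ in range(n_iter):
        # Gibbs: Theta_i | gamma, D_I ~ Gamma(a_i + sum_j X_ij, b_i + sum_j gamma_j)
        for i, row in enumerate(X):
            shape = a[i] + sum(x for x in row if x is not None)
            rate = b[i] + sum(g[j] for j, x in enumerate(row) if x is not None)
            theta[i] = rng.gammavariate(shape, 1.0 / rate)
        # Metropolis: perturb gamma, renormalize to the simplex, accept/reject
        prop = [max(gj + rng.gauss(0.0, 0.02), 1e-6) for gj in g]
        total = sum(prop)
        prop = [p / total for p in prop]
        if math.log(rng.random()) < log_post_gamma(prop, theta) - log_post_gamma(g, theta):
            g = prop
        draws.append((list(g), list(theta)))
    return draws

draws = mcmc()
burn = draws[500:]
g_mean = [sum(d[0][j] for d in burn) / len(burn) for j in range(3)]
print([round(v, 2) for v in g_mean])   # posterior mean of the payout pattern
```

Predictive reserves would then be obtained by simulating the unobserved X_{i,j}, i+j > I, from each posterior draw of (\Theta, \gamma).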

4.3 Bühlmann-Straub Credibility Model

In the last section we have seen an exact Bayesian approach to the claims reserving problem. The Bayesian estimator

\widehat{\mu}(\Theta_i) = E[\mu(\Theta_i) \mid X_{i,0},\ldots,X_{i,I-i}]   (4.163)

is the best estimator for \mu(\Theta_i) in the class of all estimators which are square integrable functions of the observations X_{i,0},\ldots,X_{i,I-i}. The crucial point in the calculation was that in the EDF with its associate conjugates we were able to calculate the a posteriori distribution of \mu(\Theta_i) explicitly. Moreover, the parameters of the a posteriori distribution and the Bayesian estimator were linear in the observations. However, in most of the Bayesian models we are not in the situation where we are able to calculate the a posteriori distribution, and therefore the Bayesian estimator cannot be expressed in a closed analytical form. I.e. in general the Bayesian estimator does not meet the practical requirements of simplicity and intuitiveness, and it can only be calculated by numerical procedures such as Markov chain Monte Carlo (MCMC) methods.
In cases where we are not able to derive the Bayesian estimator, we restrict the class of possible estimators to a smaller class, namely the estimators which are linear functions of the observations X_{i,0},\ldots,X_{i,I-i}. This means that we try to get an estimator which minimizes the quadratic loss function among all estimators which are linear combinations of the observations X_{i,0},\ldots,X_{i,I-i}. The result will be an estimator which is practicable and intuitive by definition. This approach is well known in actuarial science as credibility theory, and since "best" is also to be understood in the Bayesian sense, credibility estimators are linear Bayes estimators (see Bühlmann-Gisler [18]).
In claims reserving, credibility theory was used e.g. by De Vylder [84], Neuhaus [56] and Mack [51] in the Bühlmann-Straub context.
In the sequel we always assume that the incremental loss development pattern (\gamma_j)_{j=0,\ldots,J}, given by

\gamma_0 = \beta_0 \quad\text{and}\quad \gamma_j = \beta_j - \beta_{j-1} \ \text{ for } j = 1,\ldots,J,   (4.164)

is known, as in the previous sections on Bayesian estimates.


Model Assumptions 4.45 (Bühlmann-Straub model [18])

- Conditionally, given \Theta_i, the increments X_{i,0},\ldots,X_{i,J} are independent with

E[X_{i,j}/\gamma_j \mid \Theta_i] = \mu(\Theta_i),   (4.165)
\mathrm{Var}(X_{i,j}/\gamma_j \mid \Theta_i) = \sigma^2(\Theta_i)/\gamma_j   (4.166)

for all i = 0,\ldots,I and j = 0,\ldots,J.

- The pairs (\Theta_i, X_i) (i = 0,\ldots,I) are independent, and the \Theta_i are independent and identically distributed.

For the cumulative claim amount we obtain

E[C_{i,j} \mid \Theta_i] = \beta_j\,\mu(\Theta_i),   (4.167)
\mathrm{Var}(C_{i,j} \mid \Theta_i) = \beta_j\,\sigma^2(\Theta_i).   (4.168)

The latter equation shows that this model is different from Model 4.14: the term (1-\beta_j)\,\sigma^2(C_{i,J}) is replaced by \sigma^2(\Theta_i). On the other hand, the Bühlmann-Straub model is very much in the spirit of the EDF with its associate conjugates. The parameter \Theta_i plays the role of the underlying risk characteristics, i.e. the parameter \Theta_i is unknown and tells us whether we have a good or a bad accident year. For a more detailed explanation in the framework of tariffication and pricing we refer to Bühlmann-Gisler [18].
In linear credibility theory one looks for an estimate \widehat{\mu}(\Theta_i) of \mu(\Theta_i) which minimizes the quadratic loss function among all estimators which are linear in the observations X_{i,j} (see also [18], Definition 3.8). I.e. one has to solve the optimization problem

\widehat{\mu}(\Theta_i)^{cred} = \underset{\widetilde{\mu}\,\in\,L(X,1)}{\operatorname{argmin}}\ E\big[(\mu(\Theta_i) - \widetilde{\mu})^2\big],   (4.169)

where

L(X,1) = \Big\{\widetilde{\mu};\ \widetilde{\mu} = a_0 + \sum_{i=0}^{I}\sum_{j=0}^{(I-i)\wedge J} a_{i,j}\,X_{i,j}\ \text{with } a_{i,j} \in \mathbb{R}\Big\}.   (4.170)

Remarks 4.46

- Observe that the credibility estimator \widehat{\mu}(\Theta_i)^{cred} is linear in the observations X_{i,j} by definition. We could also allow for general real-valued, square integrable functions of the observations X_{i,j}. In that case we simply obtain the Bayesian estimator, since the conditional a posteriori expectation minimizes the quadratic loss function among all estimators which are square integrable functions of the observations.

- Credibility estimators can also be constructed using Hilbert space theory. Indeed, (4.169) asks for a minimization in an L^2-sense, which corresponds to orthogonal projections in Hilbert spaces. For more on this topic we refer to Bühlmann-Gisler [18].
We define the structural parameters

\mu_0 = E[\mu(\Theta_i)],   (4.171)
\sigma^2 = E[\sigma^2(\Theta_i)],   (4.172)
\tau^2 = \mathrm{Var}(\mu(\Theta_i)).   (4.173)

Theorem 4.47 (inhomogeneous Bühlmann-Straub estimator)
Under Model Assumptions 4.45 the optimal linear inhomogeneous estimator of \mu(\Theta_i), given the observations \mathcal{D}_I, is given by

\widehat{\mu}(\Theta_i)^{cred} = \alpha_i\,Y_i + (1-\alpha_i)\,\mu_0   (4.174)

for I-J+1 \le i \le I, where

\alpha_i = \frac{\beta_{I-i}}{\beta_{I-i} + \sigma^2/\tau^2},   (4.175)

Y_i = \sum_{j=0}^{(I-i)\wedge J} \frac{\gamma_j}{\beta_{I-i}}\,\frac{X_{i,j}}{\gamma_j} = \frac{C_{i,(I-i)\wedge J}}{\beta_{I-i}}.   (4.176)

In credibility theory the a priori mean \mu_0 can also be estimated from the data. This leads to the homogeneous credibility estimator.

Theorem 4.48 (homogeneous Bühlmann-Straub estimator)
Under Model Assumptions 4.45 the optimal linear homogeneous estimator of \mu(\Theta_i), given the observations \mathcal{D}_I, is given by

\widehat{\mu}(\Theta_i)^{hom} = \alpha_i\,Y_i + (1-\alpha_i)\,\widehat{\mu}_0   (4.177)

for I-J+1 \le i \le I, where \alpha_i and Y_i are given in Theorem 4.47 and

\widehat{\mu}_0 = \sum_{i=0}^{I} \frac{\alpha_i}{\alpha_\bullet}\,Y_i, \quad\text{with}\quad \alpha_\bullet = \sum_{i=0}^{I} \alpha_i.   (4.178)

Proof of Theorem 4.47 and Theorem 4.48. We refer to Theorems 4.2 and 4.4 in Bühlmann-Gisler [18].
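The two theorems above translate directly into code. The following sketch computes the homogeneous Bühlmann-Straub estimator (4.177)-(4.178) and the resulting reserves; the three-year portfolio and the variance parameters are illustrative assumptions, not data from the text:

```python
# Sketch of the homogeneous Bühlmann-Straub estimator (Theorems 4.47/4.48):
# Y_i = C_{i,(I-i)^J} / beta_{I-i}, alpha_i = beta_{I-i}/(beta_{I-i} + sigma^2/tau^2),
# and mu_0 is estimated as the alpha-weighted mean of the Y_i (4.178).
# All inputs below are illustrative assumptions.

def buhlmann_straub(latest_cum, beta_latest, sigma2, tau2):
    """Return (mu0_hat, list of (alpha_i, Y_i, mu_hat_i)) per accident year."""
    kappa = sigma2 / tau2
    Y = [c / b for c, b in zip(latest_cum, beta_latest)]
    alpha = [b / (b + kappa) for b in beta_latest]
    mu0 = sum(a * y for a, y in zip(alpha, Y)) / sum(alpha)        # (4.178)
    mu_hat = [a * y + (1 - a) * mu0 for a, y in zip(alpha, Y)]     # (4.177)
    return mu0, list(zip(alpha, Y, mu_hat))

# three accident years: latest cumulative claims and development states beta_{I-i}
beta_latest = [1.0, 0.95, 0.60]
mu0, per_year = buhlmann_straub([980.0, 940.0, 560.0], beta_latest,
                                sigma2=40.0, tau2=200.0)
reserves = [(1 - b) * m for (_, _, m), b in zip(per_year, beta_latest)]
print(round(mu0, 1), [round(r, 1) for r in reserves])
```

Less developed accident years receive smaller credibility weights, so their estimates are pulled more strongly towards \widehat{\mu}_0.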
Remarks 4.49

- If the a priori mean \mu_0 is known, we choose the inhomogeneous credibility estimator \widehat{\mu}(\Theta_i)^{cred} from Theorem 4.47. This estimator minimizes the quadratic loss function given in (4.169) among all estimators given in (4.170).

- If the a priori mean \mu_0 is unknown, we estimate its value from the data as well. This is done by switching to the homogeneous credibility estimator \widehat{\mu}(\Theta_i)^{hom} given in Theorem 4.48. The crucial part is that we have to slightly change the set of possible estimators given in (4.170) towards

L_e(X) = \Big\{\widetilde{\mu};\ \widetilde{\mu} = \sum_{i=0}^{I}\sum_{j=0}^{(I-i)\wedge J} a_{i,j}\,X_{i,j}\ \text{with } a_{i,j} \in \mathbb{R},\ E[\widetilde{\mu}] = \mu_0\Big\}.   (4.179)

The homogeneous credibility estimator minimizes the quadratic loss function among all estimators from the set L_e(X), i.e.

\widehat{\mu}(\Theta_i)^{hom} = \underset{\widetilde{\mu}\,\in\,L_e(X)}{\operatorname{argmin}}\ E\big[(\mu(\Theta_i) - \widetilde{\mu})^2\big].   (4.180)

- The crucial point in the credibility estimators (4.174) and (4.177) is that we take a weighted average between the individual observations Y_i of accident year i and the a priori mean \mu_0 or its estimator \widehat{\mu}_0, respectively. Observe that the weighted average Y_i only depends on the observations of accident year i. This is a consequence of the independence assumption between the accident years. However, the estimator \widehat{\mu}_0 uses the observations of all accident years, since the a priori mean \mu_0 holds for all accident years. The credibility weight \alpha_i \in [0,1] of the individual observations Y_i becomes small when the expected fluctuations within the accident years, \sigma^2, are large, and becomes large if the fluctuations between the accident years, \tau^2, are large.

- The estimator (4.174) is exactly the same as the one from the exponential dispersion model with associate conjugates (Corollary 4.42) if we assume that all a priori means \mu_i are equal and the scaling exponent \alpha = 0 (cf. (4.183) below).
- Since the inhomogeneous estimator \widehat{\mu}(\Theta_i)^{cred} contains a constant, it is automatically an unbiased estimator for the a priori mean \mu_0. In contrast to \widehat{\mu}(\Theta_i)^{cred}, the homogeneous estimator \widehat{\mu}(\Theta_i)^{hom} is unbiased for \mu_0 by definition.
- The weights \gamma_j in the model assumptions could be replaced by weights \gamma_{i,j}; the Bühlmann-Straub result then still holds true. Indeed, one could choose a design matrix \gamma_{i,j} = \gamma_i(j) to apply the Bühlmann-Straub model (see Taylor [75] and Mack [47]), and the variance condition is then replaced by

\mathrm{Var}(X_{i,j}/\gamma_{j,i} \mid \Theta_i) = \frac{\sigma^2(\Theta_i)}{V_i\,\gamma_{j,i}^{\alpha}},   (4.181)

where V_i > 0 is an appropriate measure for the volume and \alpha > 0. \alpha = 1 is the model favoured by Mack [47], whereas De Vylder [84] has chosen \alpha = 2. For \alpha = 0 we obtain a condition which is independent of j (credibility model of Bühlmann, see [18]).
Different a priori means \mu_i. If X_{i,j}/\gamma_j has different a priori means \mu_i for different accident years i, we modify the Bühlmann-Straub assumptions (4.165)-(4.166) to

E\Big[\frac{X_{i,j}}{\gamma_j\,\mu_i} \,\Big|\, \Theta_i\Big] = \mu(\Theta_i),   (4.182)

\mathrm{Var}\Big(\frac{X_{i,j}}{\gamma_j\,\mu_i} \,\Big|\, \Theta_i\Big) = \frac{\sigma^2(\Theta_i)}{\gamma_j\,\mu_i^{\alpha}},   (4.183)

for an appropriate choice \alpha \ge 0. In this case we have E[\mu(\Theta_i)] = 1, and the inhomogeneous and homogeneous credibility estimators are given by

\widehat{\mu}(\Theta_i)^{cred} = \alpha_i\,Y_i + (1-\alpha_i)\cdot 1,   (4.184)

and

\widehat{\mu}(\Theta_i)^{hom} = \alpha_i\,Y_i + (1-\alpha_i)\,\widehat{\mu}_0,   (4.185)

respectively, where

Y_i = \frac{C_{i,(I-i)\wedge J}}{\mu_i\,\beta_{I-i}}, \qquad \alpha_i = \frac{\beta_{I-i}}{\beta_{I-i} + \beta_i} \quad\text{with}\quad \beta_i = \frac{\sigma^2}{\mu_i^{\alpha}\,\tau^2}.   (4.186)

Observe that this now gives exactly the same estimator as in the exponential dispersion family with its associate conjugates (see Corollary 4.42).
This immediately gives the following estimators:

Estimator 4.50 (Bühlmann-Straub credibility reserving estimator)
In the Bühlmann-Straub model 4.45 with generalized assumptions (4.182)-(4.183) we have the estimators

\widehat{C}^{cred}_{i,J} = \widehat{E}[C_{i,J} \mid \mathcal{D}_I] = C_{i,I-i} + (1-\beta_{I-i})\,\mu_i\,\widehat{\mu}(\Theta_i)^{cred},   (4.187)

\widehat{C}^{hom}_{i,J} = \widehat{E}[C_{i,J} \mid \mathcal{D}_I] = C_{i,I-i} + (1-\beta_{I-i})\,\mu_i\,\widehat{\mu}(\Theta_i)^{hom}   (4.188)

for I-J+1 \le i \le I.
Lemma 4.51 In the Bühlmann-Straub model 4.45 the quadratic losses of the credibility estimators are given by

E\Big[\big(\widehat{\mu}(\Theta_i)^{cred} - \mu(\Theta_i)\big)^2\Big] = \tau^2\,(1-\alpha_i),   (4.189)

E\Big[\big(\widehat{\mu}(\Theta_i)^{hom} - \mu(\Theta_i)\big)^2\Big] = \tau^2\,(1-\alpha_i)\,\Big(1 + \frac{1-\alpha_i}{\sum_{k=0}^{I}\alpha_k}\Big)   (4.190)

for I-J+1 \le i \le I.

Proof. We refer to Theorems 4.3 and 4.6 in Bühlmann-Gisler [18].
Corollary 4.52 In the Bühlmann-Straub model 4.45 with generalized assumptions (4.182)-(4.183) the mean square errors of prediction of the inhomogeneous and homogeneous credibility reserving estimators are given by

\mathrm{msep}_{C_{i,J}}\big(\widehat{C}^{cred}_{i,J}\big) = \mu_i^2\,\Big[(1-\beta_{I-i})\,\frac{\sigma^2}{\mu_i^{\alpha}} + (1-\beta_{I-i})^2\,\tau^2\,(1-\alpha_i)\Big]   (4.191)

and

\mathrm{msep}_{C_{i,J}}\big(\widehat{C}^{hom}_{i,J}\big) = \mathrm{msep}_{C_{i,J}}\big(\widehat{C}^{cred}_{i,J}\big) + \mu_i^2\,(1-\beta_{I-i})^2\,\tau^2\,\frac{(1-\alpha_i)^2}{\sum_{k=0}^{I}\alpha_k},   (4.192)

respectively, for I-J+1 \le i \le I.
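The error decomposition can be sketched numerically. The code below assumes the reconstructed formulas (4.191)-(4.192) with scaling exponent \alpha = 1 and uses illustrative parameter values, not data from the examples:

```python
import math

# Sketch of the prediction error split in Corollary 4.52 (scaling exponent
# alpha = 1 assumed): process variance (1-beta)*mu_i*sigma^2 plus parameter
# prediction error (1-beta)^2*mu_i^2*tau^2*(1-alpha_i); the homogeneous
# estimator adds the uncertainty of mu0_hat. All inputs are assumptions.

def msep_credibility(mu_i, beta_latest, alpha_i, sum_alpha, sigma2, tau2):
    process = (1 - beta_latest) * mu_i * sigma2
    prediction = (1 - beta_latest) ** 2 * mu_i ** 2 * tau2 * (1 - alpha_i)
    inhom = process + prediction                                       # (4.191)
    hom = inhom + ((1 - beta_latest) ** 2 * mu_i ** 2 * tau2
                   * (1 - alpha_i) ** 2 / sum_alpha)                   # (4.192)
    return math.sqrt(inhom), math.sqrt(hom)

inhom, hom = msep_credibility(mu_i=1.0, beta_latest=0.6, alpha_i=0.75,
                              sum_alpha=8.0, sigma2=0.004, tau2=0.0025)
print(round(inhom, 4), round(hom, 4))  # hom >= inhom, as in Tables 4.16/4.18
```

The extra term in (4.192) is small when many accident years contribute to \widehat{\mu}_0, which is why the two columns of Table 4.16 are so close.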
Remarks 4.53

- The first term on the right-hand side of the above equalities again stands for the process error, whereas the second terms stand for the parameter/prediction errors (how well an actuary can predict the mean). Observe again that we assume that the incremental loss development pattern (\gamma_j)_{j=0,\ldots,J} is known, and hence we do not estimate the estimation error in the claims development pattern.

- Observe that the MSEP formula for the credibility estimator coincides with the one for the exponential dispersion family, see (4.156).
Proof. We separate the mean square error of prediction as follows:

\mathrm{msep}_{C_{i,J}}\big(\widehat{C}^{cred}_{i,J}\big) = E\Big[\big((1-\beta_{I-i})\,\mu_i\,\widehat{\mu}(\Theta_i)^{cred} - (C_{i,J}-C_{i,I-i})\big)^2\Big].   (4.193)

Conditionally, given \Theta = (\Theta_0,\ldots,\Theta_I), the increments X_{i,j} are independent. But this immediately implies that the expression in (4.193) is equal to

E\Big[E\Big[\big((1-\beta_{I-i})\,\mu_i\,\big(\widehat{\mu}(\Theta_i)^{cred} - \mu(\Theta_i)\big)\big)^2 \,\Big|\, \Theta\Big]\Big]   (4.194)
\; + \; E\Big[E\Big[\big((1-\beta_{I-i})\,\mu_i\,\mu(\Theta_i) - (C_{i,J}-C_{i,I-i})\big)^2 \,\Big|\, \Theta\Big]\Big]
= (1-\beta_{I-i})^2\,\mu_i^2\,\mathrm{msep}_{\mu(\Theta_i)}\big(\widehat{\mu}(\Theta_i)^{cred}\big) + E\big[\mathrm{Var}(C_{i,J}-C_{i,I-i} \mid \Theta)\big].

But then the claim follows from Lemma 4.51 and

\mathrm{Var}(C_{i,J}-C_{i,I-i} \mid \Theta) = (1-\beta_{I-i})\,\mu_i^{2-\alpha}\,\sigma^2(\Theta_i).   (4.195)

4.3.1 Parameter estimation

So far (in the example) the choice of the variance parameters was rather artificial. In this subsection we provide estimators for \sigma^2 and \tau^2. In practical applications it is often convenient to eliminate outliers for the estimation of \sigma^2 and \tau^2, since the estimators are often not very robust.
Before we start with the parameter estimation, we would like to mention that essentially the same remarks apply in this section as the ones mentioned on page 125. We need to estimate \gamma_j, \sigma^2 and \tau^2. For the weights \gamma_j we proceed as in (4.159): estimate the claims development pattern \beta_j from (2.25); the incremental loss development pattern \gamma_j is then estimated by (4.164).
We define

S_i = \frac{1}{(I-i)\wedge J}\,\sum_{j=0}^{(I-i)\wedge J} \gamma_j\,\Big(\frac{X_{i,j}}{\gamma_j} - Y_i\Big)^2.   (4.196)

Then S_i is an unbiased estimator for \sigma^2 (see [18], (4.22)). Hence \sigma^2 is estimated by the unbiased estimator

\widehat{\sigma}^2 = \frac{1}{I}\,\sum_{i=0}^{I-1} S_i.   (4.197)

For the estimation of \tau^2 we define

T = \sum_{i=0}^{I} \frac{\beta_{I-i}}{\sum_k \beta_{I-k}}\,\big(Y_i - \overline{Y}\big)^2,   (4.198)

where

\overline{Y} = \frac{\sum_i \beta_{I-i}\,Y_i}{\sum_i \beta_{I-i}} = \frac{\sum_i C_{i,(I-i)\wedge J}}{\sum_i \beta_{I-i}}.   (4.199)

Then an unbiased estimator for \tau^2 is given by (see [18], (4.26))

\widehat{\tau}^2 = c\,\Big(T - \frac{I\,\widehat{\sigma}^2}{\sum_i \beta_{I-i}}\Big),   (4.200)

with

c = \Bigg\{\sum_{i=0}^{I} \frac{\beta_{I-i}}{\sum_k \beta_{I-k}}\,\Big(1 - \frac{\beta_{I-i}}{\sum_k \beta_{I-k}}\Big)\Bigg\}^{-1}.   (4.201)

If \widehat{\tau}^2 is negative, it is set to zero.
If we work with different \mu_i, we have to change the estimators slightly (see Bühlmann-Gisler [18], Section 4.8).
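These estimators can be sketched as follows. The code assumes the reconstructed formulas (4.196)-(4.201); the toy triangle of scaled increments is an illustrative assumption:

```python
# Sketch of the variance parameter estimators (4.196)-(4.201): S_i is the
# within-year weighted sample variance and tau^2 is estimated from the
# between-year dispersion of the Y_i. The toy data are illustrative assumptions.

def estimate_variances(scaled_inc, gamma):
    """scaled_inc[i][j] = X_{i,j}/gamma_j for the observed cells of year i."""
    I = len(scaled_inc) - 1
    beta = [sum(gamma[: len(row)]) for row in scaled_inc]      # beta_{I-i}
    Y = [sum(g * x for g, x in zip(gamma, row)) / b
         for row, b in zip(scaled_inc, beta)]
    # within-year estimator S_i, cf. (4.196); needs at least two observations
    S = [sum(g * (x - y) ** 2 for g, x in zip(gamma, row)) / (len(row) - 1)
         for row, y in zip(scaled_inc, Y) if len(row) > 1]
    sigma2 = sum(S) / len(S)                                   # (4.197)
    w = sum(beta)
    Ybar = sum(b * y for b, y in zip(beta, Y)) / w             # (4.199)
    T = sum(b / w * (y - Ybar) ** 2 for b, y in zip(beta, Y))  # (4.198)
    c = 1.0 / sum(b / w * (1 - b / w) for b in beta)           # (4.201)
    tau2 = max(c * (T - I * sigma2 / w), 0.0)                  # (4.200), floored
    return sigma2, tau2

sigma2, tau2 = estimate_variances(
    [[10.0, 10.4, 9.8], [9.5, 9.9], [11.0]], gamma=[0.6, 0.3, 0.1])
print(sigma2 > 0, tau2 >= 0)
```

Note the flooring of \widehat{\tau}^2 at zero, exactly as prescribed in the text.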
Example 4.54 (Bühlmann-Straub model, constant \mu_i)

We revisit the data given in Example 2.7. We recall that we have set \mathrm{Vco}(\mu(\Theta_i)) = 5\% and \mathrm{Vco}(C_{i,J}) = 7.8\%, using external know-how only (see Tables 4.9 and 4.4). For this example we assume that all a priori expectations \mu_i are equal, and we use the homogeneous credibility estimator. We have the following observations, where the incremental claims development pattern \gamma_j is estimated via the chain-ladder method.

 i\j        0        1         2         3         4         5         6         7         8         9
 0    5946975  9668212  10563929  10771690  10978394  11040518  11106331  11121181  11132310  11148124
 1    6346756  9593162  10316383  10468180  10536004  10572608  10625360  10636546  10648192
 2    6269090  9245313  10092366  10355134  10507837  10573282  10626827  10635751
 3    5863015  8546239   9268771   9459424   9592399   9680740   9724068
 4    5778885  8524114   9178009   9451404   9681692   9786916
 5    6184793  9013132   9585897   9830796   9935753
 6    5600184  8493391   9056505   9282022
 7    5288066  7728169   8256211
 8    5290793  7648729
 9    5675568
 f_j   1.4925   1.0778    1.0229    1.0148    1.0070    1.0051    1.0011    1.0010    1.0014

Table 4.13: Observed historical cumulative payments C_{i,j} and estimated chain-ladder factors \widehat{f}_j, see Table 2.2

 i\j         0         1         2         3         4         5         6         7         8         9
 0    10086719  12814544  13090078   9577303  14357308   9048371  12901245  13793367  10658637  11148124
 1    10764791  11179404  10569215   6997504   4710946   5331290  10340861  10390677  11152804
 2    10633061  10248997  12378890  12113052  10606498   9531934  10496344   8289406
 3     9944313   9240019  10559131   8788679   9236259  12866767   8493606
 4     9801620   9453540   9556060  12602942  15995382  15325912
 5    10490085   9739738   8370426  11289335   7290128
 6     9498524   9963120   8229388  10395847
 7     8969136   8402801   7716853
 8     8973762   8119848
 9     9626383
 g_j     59.0%     29.0%      6.8%      2.2%      1.4%      0.7%      0.5%      0.1%      0.1%      0.1%

Table 4.14: Observed scaled incremental payments X_{i,j}/\widehat{\gamma}_j and estimated incremental claims development pattern \widehat{\gamma}_j (last row)

Hence we obtain the following estimators:

c = 1.11316,   (4.202)
\overline{Y} = 9'911'975,   (4.203)
\widehat{\sigma} = 337'289,   (4.204)
\widehat{\tau} = 734'887,   (4.205)
\widehat{\mu}_0 = 9'885'584.   (4.206)

This leads with \widehat{\sigma}^2/\widehat{\tau}^2 = 21.1\%, \widehat{\mathrm{Vco}}(\mu(\Theta_i)) = \widehat{\tau}/\widehat{\mu}_0 = 7.4\% and \widehat{\mathrm{Vco}}(C_{i,J}) = (\widehat{\sigma}^2 + \widehat{\tau}^2)^{1/2}/\widehat{\mu}_0 = 8.2\% to the following reserves:

 i      \alpha_i   \widehat{C}^{hom}_{i,J}   reserves CL   reserves hom. cred.
 0      82.6%      11148124                  0             0
 1      82.6%      10663125                  15126         14934
 2      82.6%      10661675                  26257         25924
 3      82.5%       9758685                  34538         34616
 4      82.5%       9872238                  85302         85322
 5      82.4%      10091682                  156494        155929
 6      82.2%       9569836                  286121        287814
 7      81.8%       8716445                  449167        460234
 8      80.7%       8719642                  1043242       1070913
 9      73.7%       9654386                  3950815       3978818
 total                                       6047061       6114503

Table 4.15: Estimated reserves in the homogeneous Bühlmann-Straub model (constant \mu_i)

We see that the estimates are close to the chain-ladder method. This comes from the fact that the credibility weights are rather large: since \widehat{\sigma}^2/\widehat{\tau}^2 is rather small compared to \beta_{I-i}, we obtain credibility weights which are all larger than 70\%.
For the mean square errors of prediction we obtain the values in Table 4.16.

Example 4.55 (Bühlmann-Straub model, varying \mu_i)
We revisit the data set given in Example 2.7 and Example 4.54. This time we assume that an a priori differentiation \mu_i is given by Table 4.6 (a priori means for the Bornhuetter-Ferguson method). We apply the scaled model (4.182)-(4.183) for \alpha = 0, 1, 2 and obtain the reserves in Table 4.17.
We see that the estimates for the different \alpha's do not differ too much, and they are still close to the chain-ladder method. However, they differ from the estimates for the constant \mu_i case (see Table 4.15).
For the estimated coefficient of variation we have, for \alpha = 0, 1, 2,

\widehat{\mathrm{Vco}}(\mu(\Theta_i)) \approx 6.8\%.   (4.207)

 i      msep^{1/2}(\widehat{C}^{cred}_{i,J})   msep^{1/2}(\widehat{C}^{hom}_{i,J})
 0      0                                      0
 1      12711                                  12711
 2      16755                                  16755
 3      20095                                  20096
 4      31465                                  31467
 5      42272                                  42278
 6      59060                                  59076
 7      78301                                  78339
 8      123114                                 123259
 9      265775                                 267229
 total  314699                                 315998

Table 4.16: Mean square error of prediction in the Bühlmann-Straub model (constant \mu_i)

        credibility weights \alpha_i      reserves     credibility reserves
 i      \alpha=0  \alpha=1  \alpha=2      CL           \alpha=0    \alpha=1    \alpha=2
 0      80.2%     80.6%     81.1%         0            0           0           0
 1      80.1%     80.2%     80.3%         15126        14943       14944       14944
 2      80.1%     79.6%     79.1%         26257        25766       25753       25740
 3      80.1%     79.1%     78.0%         34538        34253       34238       34222
 4      80.0%     79.7%     79.3%         85302        85056       85051       85046
 5      79.9%     80.2%     80.4%         156494       156562      156561      156559
 6      79.7%     79.8%     80.0%         286121       289078      289056      289035
 7      79.3%     79.0%     78.8%         449167       460871      461021      461180
 8      78.1%     77.6%     77.0%         1043242      1069227     1069815     1070427
 9      70.4%     71.0%     71.5%         3950815      4024687     4023270     4021903
 total                                    6047061      6160443     6159709     6159056

with \widehat{\mu}_0 = 0.8810 (\alpha=0), 0.8809 (\alpha=1), 0.8809 (\alpha=2).

Table 4.17: Estimated reserves in the homogeneous Bühlmann-Straub model (varying \mu_i)

This describes the accuracy of the estimate of the true expected mean by the actuary. Observe that we have chosen 5\% in Example 4.54.
Moreover, we see (once more) that the a priori estimates \mu_i seem to be rather pessimistic, since \widehat{\mu}_0 is substantially smaller than 1 (for all \alpha).
For the mean square error of prediction we obtain the values in Table 4.18.

4.4 Multidimensional credibility models

In Section 4.3 we have assumed that the incremental payments have the following form:

E[X_{i,j} \mid \Theta_i] = \gamma_j\,\mu(\Theta_i).   (4.208)
 i      msep^{1/2}(\widehat{C}^{hom}_{i,J})
        \alpha=0   \alpha=1   \alpha=2
 1      12835      12771      12711
 2      16317      16532      16755
 3      18952      19511      20094
 4      30871      31161      31464
 5      43110      42682      42272
 6      59876      59456      59059
 7      77383      77819      78282
 8      120119     121536     123008
 9      273931     269926     266054
 total  320377     317540     314889

Table 4.18: Mean square error of prediction in the Bühlmann-Straub model (varying \mu_i)

The constant \gamma_j denotes the payment ratio in period j. If we rewrite this in vector form, we obtain

E[X_i \mid \Theta_i] = \gamma\,\mu(\Theta_i),   (4.209)

where X_i = (X_{i,0},\ldots,X_{i,J})' and \gamma = (\gamma_0,\ldots,\gamma_J)'.
We see that the stochastic term \mu(\Theta_i) can only act as a scalar. Sometimes we would like to have more flexibility, i.e. we replace \mu(\Theta_i) by a vector. This leads to a generalization of the Bühlmann-Straub model.

4.4.1 Hachemeister regression model

Model Assumptions 4.56 (Hachemeister regression model [31])

- There exist p-dimensional design vectors \gamma_j(i) = (\gamma_{j,1}(i),\ldots,\gamma_{j,p}(i))' and random vectors \mu(\Theta_i) = (\mu_1(\Theta_i),\ldots,\mu_p(\Theta_i))' (p \le J+1) such that we have

E[X_{i,j} \mid \Theta_i] = \gamma_j(i)'\,\mu(\Theta_i),   (4.210)
\mathrm{Cov}(X_{i,j}, X_{i,k} \mid \Theta_i) = \Sigma_{j,k,i}(\Theta_i)   (4.211)

for all i \in \{0,\ldots,I\} and j \in \{0,\ldots,J\}.

- The (J+1)\times p matrix \Gamma_i = (\gamma_0(i),\ldots,\gamma_J(i))' has rank p, and the components \mu_1(\Theta_i),\ldots,\mu_p(\Theta_i) of \mu(\Theta_i) are linearly independent.

- The pairs (\Theta_i, X_i) (i = 0,\ldots,I) are independent, and the \Theta_i are independent and identically distributed.
Remarks 4.57

- We are now in the credibility regression case, see Bühlmann-Gisler [18], Section 8.3, where \mu(\Theta_i) = (\mu_1(\Theta_i),\ldots,\mu_p(\Theta_i))' is a p-dimensional vector which we would like to estimate.

- \Gamma_i is a known (J+1)\times p design matrix.

We define the following parameters

\mu = E[\mu(\Theta_i)],   (4.212)
S_{j,k,i} = E[\Sigma_{j,k,i}(\Theta_i)],   (4.213)
T = \mathrm{Cov}\big(\mu(\Theta_i), \mu(\Theta_i)\big),   (4.214)
S_i = (S_{j,k,i})_{j,k=0,\ldots,J}   (4.215)

for i \in \{0,\ldots,I\} and j,k \in \{0,\ldots,J\}. Hence T is a p\times p covariance matrix for the variability between the different accident years, and S_i is a (J+1)\times(J+1) matrix that describes the variability within accident year i. An important special case for S_i is given by

S_i = \sigma^2\,W_i^{-1} = \sigma^2\,\mathrm{diag}\big(w_{i,0}^{-1},\ldots,w_{i,J}^{-1}\big),   (4.216)

for appropriate weights w_{i,j} > 0 and a scalar \sigma^2 > 0.


Theorem 4.58 (Hachemeister estimator)
Under Model Assumptions 4.56 the optimal linear inhomogeneous estimator of \mu(\Theta_i) is given by

\widehat{\mu}(\Theta_i)^{cred} = A_i\,B_i + (1 - A_i)\,\mu,   (4.217)

with

A_i = T\,\Big(T + \big(\Gamma_i^{[I-i]\,\prime}\,S_i^{-1}\,\Gamma_i^{[I-i]}\big)^{-1}\Big)^{-1},   (4.218)

B_i = \big(\Gamma_i^{[I-i]\,\prime}\,S_i^{-1}\,\Gamma_i^{[I-i]}\big)^{-1}\,\Gamma_i^{[I-i]\,\prime}\,S_i^{-1}\,X_i^{[I-i]},   (4.219)

where

\Gamma_i^{[I-i]} = \big(\gamma_0(i),\ldots,\gamma_{(I-i)\wedge J}(i), 0,\ldots,0\big)',   (4.220)
X_i^{[I-i]} = \big(X_{i,0},\ldots,X_{i,(I-i)\wedge J}, 0,\ldots,0\big)'   (4.221)

for I-J+1 \le i \le I with p \le I-i+1. The quadratic loss matrix of the credibility estimator is given by

E\Big[\big(\widehat{\mu}(\Theta_i)^{cred} - \mu(\Theta_i)\big)\,\big(\widehat{\mu}(\Theta_i)^{cred} - \mu(\Theta_i)\big)'\Big] = (1 - A_i)\,T.   (4.222)

Proof. See Theorem 8.7 in Bühlmann-Gisler [18].
We have the following corollary:

Corollary 4.59 (standard regression) Under Model Assumptions 4.56 with S_i given by (4.216) we have

A_i = T\,\Big(T + \sigma^2\,\big(\Gamma_i^{[I-i]\,\prime}\,W_i\,\Gamma_i^{[I-i]}\big)^{-1}\Big)^{-1},   (4.223)

B_i = \big(\Gamma_i^{[I-i]\,\prime}\,W_i\,\Gamma_i^{[I-i]}\big)^{-1}\,\Gamma_i^{[I-i]\,\prime}\,W_i\,X_i^{[I-i]}   (4.224)

for I-J+1 \le i \le I with p \le I-i+1.
This leads to the following reserving estimator:

Estimator 4.60 (Hachemeister credibility reserving estimator)
In the Hachemeister regression model 4.56 the estimator is given by

\widehat{C}^{cred}_{i,J} = C_{i,I-i} + \sum_{j=I-i+1}^{J} \gamma_j(i)'\,\widehat{\mu}(\Theta_i)^{cred}   (4.225)

for I-J+1 \le i \le I with p \le I-i+1.
Remarks 4.61

- If \mu is not known, then (4.217) can be replaced by the homogeneous credibility estimator for \mu(\Theta_i) using

\widehat{\mu} = \Big(\sum_{i=0}^{I} A_i\Big)^{-1}\,\sum_{i=0}^{I} A_i\,B_i.   (4.226)

In that case the right-hand side of (4.222) needs to be replaced by

(1 - A_i)\,T\,\Big(1 + \Big(\sum_{i=0}^{I} A_i'\Big)^{-1}\,(1 - A_i')\Big).   (4.227)

- Term (4.219) gives the formula for the data compression (see also Theorem 8.6 in Bühlmann-Gisler [18]). We already see from this that for p > 1 we have some difficulties with considering the youngest years, since the dimension of \mu is larger than the available number of observations if p > I-i+1. Observe that

E[B_i \mid \Theta_i] = \mu(\Theta_i),   (4.228)

E\Big[\big(B_i - \mu(\Theta_i)\big)\,\big(B_i - \mu(\Theta_i)\big)'\Big] = \big(\Gamma_i^{[I-i]\,\prime}\,S_i^{-1}\,\Gamma_i^{[I-i]}\big)^{-1}.   (4.229)

- Choices of the design matrix \Gamma_i. There are various possibilities to choose the design matrix \Gamma_i. One possibility which is used is the so-called Hoerl curve (see De Jong-Zehnwirth [42] and Zehnwirth [92]): set p = 3 and

\gamma_j(i) = \big(1,\ \log(j+1),\ j\big)'.   (4.230)
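The estimator (4.217)-(4.219) with the Hoerl design (4.230) can be sketched compactly with matrix algebra. All numerical inputs below (the observed increments, T, \mu and \sigma^2) are illustrative assumptions, not data from the text:

```python
import numpy as np

# Sketch of the Hachemeister estimator (4.217)-(4.219) with Hoerl design
# vectors (4.230): gamma_j(i) = (1, log(j+1), j)'. All numbers are assumptions.

J, p = 4, 3
Gamma = np.array([[1.0, np.log(j + 1.0), float(j)] for j in range(J + 1)])  # (J+1) x p

def hachemeister(x_obs, T, sigma2, mu):
    """Credibility estimate of mu(Theta_i) from the observed increments x_obs."""
    n = len(x_obs)                       # observed development periods (needs n >= p)
    G = Gamma[:n]                        # truncated design matrix Gamma_i^{[I-i]}
    W = np.eye(n) / sigma2               # S_i^{-1} with S_i = sigma^2 * Id, cf. (4.216)
    M = np.linalg.inv(G.T @ W @ G)       # (Gamma' S^{-1} Gamma)^{-1}
    B = M @ G.T @ W @ np.array(x_obs)    # data compression, (4.219)
    A = T @ np.linalg.inv(T + M)         # credibility matrix, (4.218)
    return A @ B + (np.eye(p) - A) @ mu  # (4.217)

T = np.diag([400.0, 100.0, 25.0])        # between-year covariance (assumed)
mu = np.array([900.0, -300.0, -60.0])    # a priori regression parameter (assumed)
est = hachemeister([950.0, 420.0, 150.0, 80.0], T, sigma2=2500.0, mu=mu)
print(np.round(est, 1))
```

The reserve then follows by projecting the fitted curve onto the unobserved periods, as in (4.225).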

- Parameter estimation. It is rather difficult to obtain good parameter estimates in this model for p > 1. If we assume that the covariance matrix (\Sigma_{j,k,i}(\Theta_i))_{j,k=0,\ldots,J} is diagonal with mean S_i given by (4.216), we can estimate S_i with the help of the one-dimensional Bühlmann-Straub model (see Subsection 4.3.1). An unbiased estimator for the covariance matrix T is given by

\widehat{T} = \frac{1}{I-p}\,\sum_{i=0}^{I-p} \big(B_i - \overline{B}\big)\,\big(B_i - \overline{B}\big)' - \frac{1}{I-p+1}\,\sum_{i=0}^{I-p} \big(\Gamma_i^{[I-i]\,\prime}\,S_i^{-1}\,\Gamma_i^{[I-i]}\big)^{-1},   (4.231)

with

\overline{B} = \frac{1}{I-p+1}\,\sum_{i=0}^{I-p} B_i.   (4.232)

- Examples. In all examples we have looked at, it was rather difficult to obtain reasonable estimates for the claims reserves. This has various reasons: 1) There is no obvious choice for a good design matrix \Gamma_i; in our examples the Hoerl curve did not behave well. 2) The estimation of the structural parameters S_i and T is always difficult; moreover, the estimators are not robust against outliers. 3) Already slight perturbations of the data had a large effect on the resulting reserves. For all these reasons we do not give a real data example; i.e. the Hachemeister model is very interesting from a theoretical point of view, but from a practical point of view it is rather difficult to apply it to real data.

4.4.2 Other credibility models

In the Bühlmann-Straub credibility model we had a deterministic cashflow pattern \gamma_j, and we have estimated the exposure \mu(\Theta_i) of the accident years. We could also exchange the roles of these two parameters.

Model Assumptions 4.62

- There exist scalars \mu_i (i = 0,\ldots,I) such that, conditionally, given \Theta_i, we have for all j \in \{0,\ldots,J\}

E[X_{i,j} \mid \Theta_i] = \gamma_j(\Theta_i)\,\mu_i.   (4.233)

- The pairs (\Theta_i, X_i) (i = 0,\ldots,I) are independent, and the \Theta_i are independent and identically distributed.
Remarks 4.63
Now the whole vector γ(Θ_i) = (γ_0(Θ_i), ..., γ_J(Θ_i))' is a random drawing with

E[γ_j(Θ_i)] = γ_j,    (4.234)
Cov(γ_j(Θ_i), γ_k(Θ_i)) = T_{j,k},    (4.235)
Cov(X_{i,j}, X_{i,k} | Θ_i) = Σ_{j,k,i}(Θ_i).    (4.236)

The difficulty in this model is that we have observations X_{i,0}, ..., X_{i,I−i} for γ_0(Θ_i), ..., γ_{I−i}(Θ_i) and we need to estimate γ_{I−i+1}(Θ_i), ..., γ_J(Θ_i). This is slightly different from classical one-dimensional credibility applications. From this it is clear that a crucial role is played by the covariance structure, which projects past observations onto the future.
For general covariance structures it is difficult to give nice formulas. Special cases were studied by Jewell [38] and Hesselager-Witting [35]. Hesselager-Witting [35] assume that the vectors

(γ_0(Θ_i), ..., γ_J(Θ_i))    (4.237)

are i.i.d. Dirichlet distributed with parameters a_0, ..., a_J. Define a = Σ_{j=0}^J a_j; then we have (see Hesselager-Witting [35], formula (3))

E[γ_j(Θ_i)] = γ_j = a_j / a,    (4.238)
Cov(γ_j(Θ_i), γ_k(Θ_i)) = T_{j,k} = (1_{{j=k}} γ_j − γ_j γ_k) / (1 + a).    (4.239)

If we then choose a specific form for the covariance structure Σ_{j,k,i}(Θ_i) we can work out a credibility formula for the expected ultimate claim.
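The Dirichlet moment formulas (4.238)-(4.239) can be checked numerically; a small sketch (function name our own) that also verifies that each row of the covariance matrix sums to zero, as it must since the γ_j(Θ_i) sum to 1:

```python
def dirichlet_moments(a):
    """Mean and covariance of a Dirichlet(a_0, ..., a_J) vector, i.e. the
    formulas (4.238)-(4.239): E[gamma_j] = a_j / a and
    Cov(gamma_j, gamma_k) = (1{j=k} gamma_j - gamma_j gamma_k) / (1 + a)."""
    a_tot = sum(a)
    mean = [aj / a_tot for aj in a]
    cov = [[((mean[j] if j == k else 0.0) - mean[j] * mean[k]) / (1.0 + a_tot)
            for k in range(len(a))] for j in range(len(a))]
    return mean, cov

mean, cov = dirichlet_moments([2.0, 1.0, 0.5, 0.5])
# since the components sum to 1, each covariance row sums to 0
row_sums = [sum(row) for row in cov]
```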
Of course there is a large variety of other credibility models, such as e.g. hierarchical credibility models, see Hesselager [36]. We do not discuss them further here.

4.5 Kalman filter

Kalman filters are an enhancement of credibility models. We treat only the one-dimensional case, since already in the multivariate credibility context we have seen that it becomes difficult to go to higher dimensions.
Kalman filters are evolutionary credibility models. If we take e.g. the Bühlmann-Straub model, then it is assumed that the Θ_i (i = 0, ..., I) are independent and identically distributed (see Model 4.45). If we go back to Example 4.54 we obtain the following picture for the observations Y_0, ..., Y_I and the estimate μ̂_0 for the a priori mean μ_0 (cf. (4.176) and (4.178), respectively):

Figure 4.2: Observations Y_i and estimate μ̂_0


From Figure 4.2 it is not obvious that = (0 , 1 , . . .) is a process of identically
distributed random variables. We could also have underwriting cycles which would
rather suggest, that neighboring i s are dependent. Hence, we assume that =
(0 , 1 , . . .) is a stochastic process of random variables which are not necessarily
independent and identically distributed.
Model Assumptions 4.64 (Kalman filter)
Θ = (Θ_0, Θ_1, ...) is a stochastic process.
Conditionally, given Θ, the increments X_{i,j} are independent with, for all i, j,

E[X_{i,j}/γ_j | Θ] = μ(Θ_i),    (4.240)
Cov(X_{i,j}/γ_j, X_{k,l}/γ_l | Θ) = 1_{{i=k, j=l}} σ²(Θ_i)/γ_j.    (4.241)

(μ(Θ_i))_{i≥0} is a martingale.
Remarks 4.65
The assumption (4.241) can be relaxed in the sense that we only need conditional uncorrelatedness on average (over Θ). Assumption (4.241) implies that we obtain an updating procedure which is recursive.
The martingale assumption implies that we have uncorrelated centered increments μ(Θ_{i+1}) − μ(Θ_i) (see also (1.25)),

E[μ(Θ_{i+1}) | μ(Θ_0), ..., μ(Θ_i)] = μ(Θ_i).    (4.242)

In Hilbert space language this reads as follows: the projection of μ(Θ_{i+1}) onto the subspace of all square integrable functions of μ(Θ_0), ..., μ(Θ_i) is simply μ(Θ_i), i.e. the process (μ(Θ_i))_{i≥0} has centered orthogonal increments. This last assumption could be generalized to linear transformations (see Corollary 9.5 in Bühlmann-Gisler [18]).
We introduce the following notation (motivated by the usual terminology from state space models, see e.g. Abraham-Ledolter [1]):

Y_i = (X_{i,0}/γ_0, ..., X_{i,I−i}/γ_{I−i}),    (4.243)
μ_{i|i−1} = argmin_{μ̃ ∈ L(Y_0,...,Y_{i−1},1)} E[(μ(Θ_i) − μ̃)²],    (4.244)
μ_{i|i} = argmin_{μ̃ ∈ L(Y_0,...,Y_i,1)} E[(μ(Θ_i) − μ̃)²]    (4.245)

(cf. (4.170)). μ_{i|i−1} is the best linear forecast for μ(Θ_i) based on the information Y_0, ..., Y_{i−1}, whereas μ_{i|i} is the best linear forecast for μ(Θ_i) which is also based on Y_i. Hence there are two updating procedures: 1) updating from μ_{i|i−1} to μ_{i|i} on the basis of the newest observation Y_i, and 2) updating from μ_{i|i} to μ_{i+1|i} due to the parameter movement from μ(Θ_i) to μ(Θ_{i+1}).
We define the following structural parameters:

σ² = E[σ²(Θ_i)],    (4.246)
τ_i² = Var(μ(Θ_i) − μ(Θ_{i−1})),    (4.247)
q_{i|i−1} = E[(μ_{i|i−1} − μ(Θ_i))²],    (4.248)
q_{i|i} = E[(μ_{i|i} − μ(Θ_i))²].    (4.249)
Theorem 4.66 (Kalman filter recursion formula, Theorem 9.6 in [18])
Under Model Assumptions 4.64 we have


1. Anchoring (i = 0):

μ_{0|−1} = μ_0 = E[μ(Θ_0)]  and  q_{0|−1} = τ_0² = Var(μ(Θ_0)).    (4.250)

2. Recursion (i ≥ 0):

(a) Observation update:

μ_{i|i} = α_i Ȳ_i + (1 − α_i) μ_{i|i−1},    (4.251)
q_{i|i} = (1 − α_i) q_{i|i−1},    (4.252)

with

α_i = γ^{(I−i)∧J} / (γ^{(I−i)∧J} + σ²/q_{i|i−1}),    (4.254)
Ȳ_i = Σ_{j=0}^{(I−i)∧J} (γ_j / γ^{(I−i)∧J}) (X_{i,j}/γ_j) = C_{i,(I−i)∧J} / γ^{(I−i)∧J},    (4.255)

where γ^{(I−i)∧J} = Σ_{j=0}^{(I−i)∧J} γ_j denotes the cumulative cashflow pattern.

(b) Parameter update:

μ_{i+1|i} = μ_{i|i}  and  q_{i+1|i} = q_{i|i} + τ²_{i+1}.    (4.256)

Proof. For the proof we refer to Theorem 9.6 in Bühlmann-Gisler [18]. □
This leads to the following reserving estimator:

Estimator 4.67 (Kalman filter reserving estimator)
In the Kalman filter model 4.64 the estimator is given by

Ĉ^{Ka}_{i,J} = Ê[C_{i,J} | D_I] = C_{i,I−i} + (1 − γ^{(I−i)}) μ_{i|i}    (4.257)

for I − J + 1 ≤ i ≤ I.
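The recursion of Theorem 4.66 is easy to implement. A minimal sketch, assuming the observation averages Ȳ_i, the cumulative pattern weights and the structural parameters are already estimated (all names are our own, not from the text):

```python
def kalman_filter(Y_bar, w, mu0, q0, sigma2, tau2):
    """One-dimensional Kalman filter recursion of Theorem 4.66.
    Y_bar[i] : observation average of accident year i, cf. (4.255),
    w[i]     : cumulative pattern weight gamma^{(I-i) ^ J},
    mu0, q0  : anchoring values (4.250),
    sigma2   : within-year variance parameter,
    tau2     : innovation variance (constant, Gerber-Jones type).
    Returns the filtered means mu_{i|i} and variances q_{i|i}."""
    mu_pred, q_pred = mu0, q0
    mu_filt, q_filt = [], []
    for Yb, wi in zip(Y_bar, w):
        alpha = wi / (wi + sigma2 / q_pred)       # credibility weight (4.254)
        mu = alpha * Yb + (1 - alpha) * mu_pred   # observation update (4.251)
        q = (1 - alpha) * q_pred                  # observation update (4.252)
        mu_filt.append(mu)
        q_filt.append(q)
        mu_pred, q_pred = mu, q + tau2            # parameter update (4.256)
    return mu_filt, q_filt

# toy run: with tau2 = 0 the prediction uncertainty q_{i|i} shrinks in i
mu_filt, q_filt = kalman_filter([10.0, 12.0, 11.0], [1.0, 1.0, 1.0],
                                mu0=10.0, q0=4.0, sigma2=1.0, tau2=0.0)
```

The filtered mean μ_{i|i} would then feed into the reserve estimator (4.257).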
Remarks 4.68
In practice we face two difficulties: 1) We need to estimate all the parameters. 2) We need good estimates for the starting values μ_0 and τ_0² of the iteration.
Parameter estimation: For the estimation of σ² we choose σ̂² as in the Bühlmann-Straub model (see (4.197)). The estimation of τ_i² is less straightforward; in fact we need to define a special case of the Model Assumptions 4.64.

Model Assumptions 4.69 (Gerber-Jones [28])
Model Assumptions 4.64 hold.
There exists a sequence (δ_i)_{i≥1} of independent random variables with E[δ_i] = 0 and Var(δ_i) = τ² such that

μ(Θ_i) = μ(Θ_{i−1}) + δ_i    (4.258)

for all i ≥ 1.
μ(Θ_0) and δ_i are independent for all i ≥ 1.

Remark. In this model we have τ_i² = Var(μ(Θ_i) − μ(Θ_{i−1})) = Var(δ_i) = τ².
Let us first calculate the variances and covariances of the Ȳ_i defined in (4.255).

Var(Ȳ_i) = Var(E[Ȳ_i | Θ]) + E[Var(Ȳ_i | Θ)]
  = Var(μ(Θ_i)) + E[ Σ_{j=0}^{(I−i)∧J} (γ_j/γ^{(I−i)∧J})² Var(X_{i,j}/γ_j | Θ) ]
  = Var(μ(Θ_0)) + i τ² + σ²/γ^{(I−i)∧J}.    (4.259)

Assume that i > l; then

Cov(Ȳ_i, Ȳ_l) = Cov(E[Ȳ_i | Θ], E[Ȳ_l | Θ]) + E[Cov(Ȳ_i, Ȳ_l | Θ)]
  = Cov(μ(Θ_i), μ(Θ_l))
  = Cov(μ(Θ_l) + Σ_{k=l+1}^i δ_k, μ(Θ_l))
  = Var(μ(Θ_0)) + l τ².    (4.260)
We define Ȳ as in (4.199) with γ = Σ_{i=0}^I γ^{(I−i)∧J}. Hence

Σ_{i=0}^I γ^{(I−i)∧J} E[(Ȳ_i − Ȳ)²]
  = Σ_{i=0}^I γ^{(I−i)∧J} Var(Ȳ_i) − Σ_{k,l=0}^I (γ^{(I−k)∧J} γ^{(I−l)∧J}/γ) Cov(Ȳ_k, Ȳ_l)
  = (I+1) σ² − σ² + τ² [ Σ_{i=0}^I γ^{(I−i)∧J} i − Σ_{i,k=0}^I (γ^{(I−k)∧J} γ^{(I−i)∧J}/γ) min{i, k} ]
  = I σ² + τ² Σ_{i=0}^I Σ_{k=0}^{i−1} (i−k) γ^{(I−k)∧J} γ^{(I−i)∧J}/γ.    (4.261)


This motivates the following unbiased estimator for τ² (see also (4.198)):

τ̂² = ( Σ_{i=0}^I Σ_{k=0}^{i−1} (i−k) γ^{(I−k)∧J} γ^{(I−i)∧J}/γ )^{−1} ( Σ_{i=0}^I γ^{(I−i)∧J} (Ȳ_i − Ȳ)² − I σ̂² )
  = c_τ ( T̂ − I σ̂² ),    (4.262)

with

c_τ = ( Σ_{i,k=0}^I max{i−k, 0} γ^{(I−i)∧J} γ^{(I−k)∧J}/γ )^{−1}.    (4.263)

Observe that expression (4.262) is similar to the estimator of σ² in the Bühlmann-Straub model (4.200); the difference lies in the constant.
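The constant c_τ of (4.263) depends only on the weights; a small sketch of its computation (function name our own):

```python
def c_tau(w):
    """Normalizing constant c_tau of (4.263):
    c_tau = ( sum_{i,k} max(i - k, 0) * w_i * w_k / w_tot )^{-1},
    where w_i are the cumulative pattern weights and w_tot = sum_i w_i."""
    w_tot = sum(w)
    s = sum(max(i - k, 0) * w[i] * w[k] / w_tot
            for i in range(len(w)) for k in range(len(w)))
    return 1.0 / s

# for two equal weights only the pair (i, k) = (1, 0) contributes,
# giving s = 1/2 and hence c_tau = 2
c2 = c_tau([1.0, 1.0])
```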
Example 4.70 (Kalman filter)
We revisit Example 4.54. We have the following parameters and estimates:

c_τ = 0.62943,    (4.264)
Ȳ = 9'911'975,    (4.265)
σ̂ = 337'289,    (4.266)
τ̂ = 545'637.    (4.267)

We start the iteration with the estimates

μ̂_0 = 9'885'584  and  τ̂_0 = τ̂ = 545'637    (4.268)

(see also (4.178)).


 i   μ_{i|i−1}   q_{i|i−1}^{1/2}   α_i     Ȳ_i        μ_{i|i}    q_{i|i}^{1/2}   μ_{i+1|i}   q_{i+1|i}^{1/2}
 0   9885584     545637            72.4%   11148123   10799066   286899          10799066    616466
 1   10799066    616466            76.9%   10663316   10694625   296057          10694625    620781
 2   10694625    620781            77.2%   10662005   10669454   296651          10669454    621064
 3   10669454    621064            77.2%   9758602    9966628    296805          9966628     621138
 4   9966628     621138            77.1%   9872213    9893857    297401          9893857     621423
 5   9893857     621423            77.0%   10092241   10046550   298230          10046550    621820
 6   10046550    621820            76.7%   9568136    9679468    299967          9679468     622655
 7   9679468     622655            76.4%   8705370    8935539    302670          8935539     623962
 8   8935539     623962            75.1%   8691961    8752681    311533          8752681     628309
 9   8752681     628309            67.2%   9626366    9339528    360009          9339528     653702

Table 4.19: Iteration in the Kalman filter


We see that in this example the credibility weights α_i are smaller compared to the Bühlmann-Straub model (see Table 4.15). However, they are still rather high, which means that the a priori value μ_{i|i−1} moves rather closely with the observations Ȳ_{i−1}. Hence we are now able to model dependent time series, where the a priori value incorporates the past observed loss ratios; see Figure 4.3.

Figure 4.3: Observations Y_i, estimate μ̂_0 and estimates μ_{i|i−1}

estimated reserves:

 i       Ĉ^{Ka}_{i,J}   CL        hom. cred.   Kalman
 0       11148123       0         0            0
 1       10663360       15126     14934        15170
 2       10662023       26257     25924        26275
 3       9759339        34538     34616        35274
 4       9872400        85302     85322        85489
 5       10091532       156494    155929       155785
 6       9571465        286121    287814       289450
 7       8717246        449167    460234       461042
 8       8699249        1043242   1070913      1050529
 9       9508643        3950815   3978818      3833085
 total                  6047061   6114503      5952100

Table 4.20: chain-ladder reserves, homogeneous Bühlmann-Straub reserves and Kalman filter reserves


Chapter 5
Outlook

Several topics on stochastic claims reserving methods need to be added to the current version of this manuscript, e.g.:
- explicit distributional models and methods, such as the log-normal model or Tweedie's compound Poisson model
- generalized linear model methods
- bootstrapping methods
- multivariate methods
- Munich chain-ladder method
- etc.


Appendix A
Unallocated loss adjustment expenses

A.1 Motivation

In this section we describe the New York-method for the estimation of unallocated loss adjustment expenses (ULAE). The New York-method for estimating ULAE is, unfortunately, only poorly documented in the literature (e.g. as footnotes in Feldblum [26] and Foundation CAS [19]).
In non-life insurance there are usually two different kinds of claims handling costs, external ones and internal ones. External costs, like costs for external lawyers or for an external expertise, are usually allocated to single claims and are therefore contained in the usual claims payments and loss development figures. These payments are called allocated loss adjustment expenses (ALAE). Typically, internal loss adjustment expenses (income of the claims handling department, maintenance of the claims handling system, etc.) are not contained in the claims figures and therefore have to be estimated separately. These internal costs can usually not be allocated to single claims; we call them unallocated loss adjustment expenses (ULAE). From a regulatory point of view, we should also build reserves for these costs/expenses because they are part of the claims handling process which guarantees that an insurance company is able to meet all its obligations. I.e. ULAE reserves should guarantee the smooth run-off of the old insurance liabilities without pay-as-you-go funding from new business/premium for the internal claims handling processes.

A.2 Pure claims payments

Usually, claims development figures only consist of pure claims payments not containing ULAE charges. They are usually studied in loss development triangles or trapezoids as above (see Section 1.3).
In this section we denote by X^{(pure)}_{i,j} the pure incremental payments for accident year i (0 ≤ i ≤ I) in development year j (0 ≤ j ≤ J). 'Pure' always means that these quantities do not contain ULAE (this is exactly the quantity studied in Section 1.3). The cumulative pure payments for accident year i after development period j are denoted by (see (1.41))

C^{(pure)}_{i,j} = Σ_{k=0}^j X^{(pure)}_{i,k}.    (A.1)

We assume that X^{(pure)}_{i,j} = 0 for all j > J, i.e. the ultimate pure cumulative loss is given by C^{(pure)}_{i,J}.
We have observations for D_I = {X^{(pure)}_{i,j}; 0 ≤ i ≤ I and 0 ≤ j ≤ min{J, I−i}}, and the complement of D_I needs to be predicted.
For the New York-method we also need a second type of development trapezoid, namely a reporting trapezoid: for accident year i, Z^{(pure)}_{i,j} denotes the pure cumulative ultimate claim amount for all those claims which are reported up to (and including) development year j. Hence Z^{(pure)}_{i,0}, Z^{(pure)}_{i,1}, ..., with Z^{(pure)}_{i,J} = C^{(pure)}_{i,J}, describes how the pure ultimate claim C^{(pure)}_{i,J} is reported over time at the insurance company. Of course, this reporting pattern is much more delicate, because the sizes which are reported in the upper set D̃_I = {Z^{(pure)}_{i,j}; 0 ≤ i ≤ I and 0 ≤ j ≤ min{J, I−i}} are still developing, since usually it takes quite some time between the reporting and the final settlement of a claim. In general, the final value of Z^{(pure)}_{i,j} is only known at time i + J.

Remark: Since the New York-method is an algorithm based on deterministic numbers, we assume that all our variables are deterministic. Stochastic variables are replaced by the best estimate for their conditional mean at time I. We think that for the current presentation (to explain the New York-method) it is not helpful to work in a stochastic framework.

A.3 ULAE charges

The cumulative ULAE payments for accident year i until development period j are denoted by C^{(ULAE)}_{i,j}. And finally, the total cumulative payments (pure and ULAE) are denoted by

C_{i,j} = C^{(pure)}_{i,j} + C^{(ULAE)}_{i,j}.    (A.2)

The cumulative ULAE payments C^{(ULAE)}_{i,j} and the incremental ULAE charges

X^{(ULAE)}_{i,j} = C^{(ULAE)}_{i,j} − C^{(ULAE)}_{i,j−1}    (A.3)

need to be estimated. The main difficulty is that for each accounting year t ≤ I we usually have only one aggregated observation

X^{(ULAE)}_t = Σ_{i+j=t, 0≤j≤J} X^{(ULAE)}_{i,j}    (sum over the t-diagonal).    (A.4)

I.e. ULAE payments are usually not available for single accident years; rather, we have a position 'Total ULAE Expenses' for each accounting year t (in general ULAE charges are contained in the position 'Administrative Expenses' of the annual profit-and-loss statement).
Hence, for the estimation of future ULAE payments we first need to define an appropriate model in order to split the aggregated observations X^{(ULAE)}_t into the different accident years X^{(ULAE)}_{i,j}.

A.4 New York-method

The New York-method assumes that one part of the ULAE charge is proportional to the claims registration (denote this proportion by r ∈ [0, 1]) and the other part is proportional to the settlement (payments) of the claims (proportion 1 − r).

Assumption A.1 We assume that there are two development patterns (γ_j)_{j=0,...,J} and (δ_j)_{j=0,...,J} with γ_j ≥ 0, δ_j ≥ 0 for all j, and Σ_{j=0}^J γ_j = Σ_{j=0}^J δ_j = 1, such that (cashflow or payout pattern)

X^{(pure)}_{i,j} = γ_j C^{(pure)}_{i,J}    (A.5)

and (reporting pattern)

Z^{(pure)}_{i,j} = Σ_{l=0}^j δ_l C^{(pure)}_{i,J}    (A.6)

for all i and j.

Remarks:
Equation (A.5) describes how the pure ultimate claim C^{(pure)}_{i,J} is paid over time. In fact, γ_j gives the cashflow pattern for the pure ultimate claim C^{(pure)}_{i,J}. We propose that γ_j is estimated via the classical chain-ladder factors f_j, see (3.12):

γ̂_j^{CL} = (1 / (f̂_j · · · f̂_{J−1})) (1 − 1/f̂_{j−1}).    (A.7)

The estimation of the claims reporting pattern δ_j in (A.6) is more delicate. As we have seen, there are not many claims reserving methods which give a reporting pattern δ_j. Such a pattern can only be obtained if one separates the claims estimates for reported claims and IBNyR claims (incurred but not yet reported).
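The payout pattern implied by (A.7) is easily computed from estimated chain-ladder factors; a small sketch (names our own), written via the cumulative pattern β_j = 1/(f̂_j · · · f̂_{J−1}), which gives the same increments as (A.7):

```python
def payout_pattern(f):
    """Incremental payout pattern (gamma_j) implied by chain-ladder factors
    f = (f_0, ..., f_{J-1}): the cumulative pattern is
    beta_j = 1 / (f_j * ... * f_{J-1}) with beta_J = 1, and the increments
    gamma_j = beta_j - beta_{j-1} (gamma_0 = beta_0) match (A.7)."""
    J = len(f)
    beta = [1.0] * (J + 1)
    for j in range(J - 1, -1, -1):
        beta[j] = beta[j + 1] / f[j]
    return [beta[0]] + [beta[j] - beta[j - 1] for j in range(1, J + 1)]

gamma = payout_pattern([2.0, 1.25, 1.1])
# the increments sum to 1 by construction
```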
Model Assumptions A.2 Assume that there exists r ∈ [0, 1] such that the incremental ULAE payments satisfy, for all i and all j,

X^{(ULAE)}_{i,j} = (r δ_j + (1 − r) γ_j) C^{(ULAE)}_{i,J}.    (A.8)

Henceforth, we assume that one part (r) of the ULAE charge is proportional to the reporting pattern (one has loss adjustment expenses at the registration of the claim), and the other part (1 − r) of the ULAE charge is proportional to the claims settlement (measured by the payout pattern).
Definition A.3 (Paid-to-paid ratio) We define for all t

π_t = X^{(ULAE)}_t / X^{(pure)}_t = ( Σ_{i+j=t, 0≤j≤J} X^{(ULAE)}_{i,j} ) / ( Σ_{i+j=t, 0≤j≤J} X^{(pure)}_{i,j} ).    (A.9)

The paid-to-paid ratio measures the ULAE payments relative to the pure claims payments in each accounting year t.
Lemma A.4 Assume there exists π > 0 such that for all accident years i we have

C^{(ULAE)}_{i,J} / C^{(pure)}_{i,J} = π.    (A.10)

Under Assumption A.1 and Model A.2 we have for all accounting years t

π_t = π,    (A.11)

whenever C^{(pure)}_{i,J} is constant in i.



Proof of Lemma A.4. We have

π_t = ( Σ_{i+j=t, 0≤j≤J} X^{(ULAE)}_{i,j} ) / ( Σ_{i+j=t, 0≤j≤J} X^{(pure)}_{i,j} )
  = ( Σ_{j=0}^J (r δ_j + (1−r) γ_j) C^{(ULAE)}_{t−j,J} ) / ( Σ_{j=0}^J γ_j C^{(pure)}_{t−j,J} )
  = π ( Σ_{j=0}^J (r δ_j + (1−r) γ_j) C^{(pure)}_{t−j,J} ) / ( Σ_{j=0}^J γ_j C^{(pure)}_{t−j,J} )
  = π,    (A.12)

where in the last step we use that C^{(pure)}_{t−j,J} is constant in j and that Σ_j (r δ_j + (1−r) γ_j) = Σ_j γ_j = 1. This finishes the proof. □
We define the following split of the claims reserves for accident year i at time j:

R^{(pure)}_{i,j} = Σ_{l>j} X^{(pure)}_{i,l} = Σ_{l>j} γ_l C^{(pure)}_{i,J}    (total pure claims reserves),
R^{(IBNyR)}_{i,j} = Σ_{l>j} δ_l C^{(pure)}_{i,J}    (IBNyR reserves, incurred but not yet reported),
R^{(rep)}_{i,j} = R^{(pure)}_{i,j} − R^{(IBNyR)}_{i,j}    (reserves for reported claims).

Estimator A.5 (New York-method) Under the assumptions of Lemma A.4 we can predict π using the observations π_t (accounting year data). The reserves for ULAE charges for accident year i after development year j, R^{(ULAE)}_{i,j} = Σ_{l>j} X^{(ULAE)}_{i,l}, are estimated by

R̂^{(ULAE)}_{i,j} = π (r R^{(IBNyR)}_{i,j} + (1−r) R^{(pure)}_{i,j})
  = π (R^{(IBNyR)}_{i,j} + (1−r) R^{(rep)}_{i,j}).    (A.13)

Explanation of Result A.5. We have under the assumptions of Lemma A.4 for all i, j that

R^{(ULAE)}_{i,j} = Σ_{l>j} (r δ_l + (1−r) γ_l) C^{(ULAE)}_{i,J}
  = π Σ_{l>j} (r δ_l + (1−r) γ_l) C^{(pure)}_{i,J}
  = π (r R^{(IBNyR)}_{i,j} + (1−r) R^{(pure)}_{i,j}).    (A.14)

Remarks:


In practice one assumes the stationarity condition π_t = π for all t. This implies that π can be estimated from the accounting data of the annual profit-and-loss statements. Pure claims payments are directly contained in the profit-and-loss statements, whereas ULAE payments are often contained in the administrative expenses. Hence one needs to divide this position into further subpositions (e.g. with the help of an activity-based cost allocation split).
Result A.5 gives an easy formula for estimating ULAE reserves. If we are interested in the total ULAE reserves after accounting year t we simply have

R̂^{(ULAE)}_t = Σ_{i+j=t} R̂^{(ULAE)}_{i,j} = π ( Σ_{i+j=t} R^{(IBNyR)}_{i,j} + (1−r) Σ_{i+j=t} R^{(rep)}_{i,j} ),    (A.15)

i.e. all we need to know is how to split the total pure claims reserves into reserves for IBNyR claims and reserves for reported claims.
The assumptions for the New York-method are rather restrictive in the sense that the pure cumulative ultimate claim C^{(pure)}_{i,J} must be constant in i (see Lemma A.4). Otherwise the paid-to-paid ratio π_t for accounting years is not the same as the ratio C^{(ULAE)}_{i,J}/C^{(pure)}_{i,J}, even if the latter is assumed to be constant. Of course, in practice the assumption of equal pure cumulative ultimate claims is never fulfilled. If we relax this condition we obtain the following lemma.
Lemma A.6 Assume there exists π > 0 such that for all accident years i we have

C^{(ULAE)}_{i,J} / C^{(pure)}_{i,J} = π Γ (r Δ + (1−r) Γ)^{−1},    (A.16)

with

Δ = ( Σ_{j=0}^J δ_j C^{(pure)}_{t−j,J} ) / ( Σ_{j=0}^J C^{(pure)}_{t−j,J} )  and  Γ = ( Σ_{j=0}^J γ_j C^{(pure)}_{t−j,J} ) / ( Σ_{j=0}^J C^{(pure)}_{t−j,J} ).    (A.17)

Under Assumption A.1 and Model A.2 we have for all accounting years t

π_t = π.    (A.18)
t = .
Proof of Lemma A.6. As in Lemma A.4 we obtain
J
 (pure)
P
r j + (1 r) j Ctj,J

1

j=0
t = r + (1 r)

= .
J
P

(pure)
j Ctj,J
j=0

This finishes the proof.


c
2006
(M. W
uthrich, ETH Z
urich & M. Merz, Uni T
ubingen)

(A.18)

(A.19)

Appendix A. Unallocated loss adjustment expenses

157
2

Remarks:
If all pure cumulative ultimates are equal then Δ = Γ = 1/(J+1) (apply Lemma A.4).
Assume that there exists a constant i^{(p)} > 0 such that for all i ≥ 0 we have C^{(pure)}_{i+1,J} = (1 + i^{(p)}) C^{(pure)}_{i,J}, i.e. constant growth i^{(p)}. If we blindly apply (A.11) of Lemma A.4 (i.e. we do not apply the correction factor in (A.16)) and estimate the incremental ULAE payments by (A.13) and (A.15), we obtain

Σ_{i+j=t} X̂^{(ULAE)}_{i,j} = ( X^{(ULAE)}_t / X^{(pure)}_t ) Σ_{j=0}^J (r δ_j + (1−r) γ_j) C^{(pure)}_{t−j,J}    (A.20)
  = Σ_{i+j=t} X^{(ULAE)}_{i,j} · [ r ( Σ_{j=0}^J δ_j (1+i^{(p)})^{J−j} ) / ( Σ_{j=0}^J γ_j (1+i^{(p)})^{J−j} ) + (1−r) ]
  > Σ_{i+j=t} X^{(ULAE)}_{i,j},

where the last inequality in general holds true for i^{(p)} > 0, since usually the reporting pattern (δ_j)_j is more concentrated than the payout pattern (γ_j)_j, i.e. we usually have

Σ_{l=0}^j δ_l > Σ_{l=0}^j γ_l    for j = 0, ..., J−1.    (A.21)

This comes from the fact that claims are reported before they are paid. I.e. if we blindly apply the New York-method under constant positive growth, then the ULAE reserves are too high (for constant negative growth we obtain the opposite sign). This implies that we always have a positive loss experience on ULAE reserves under constant positive growth.
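The multiplicative distortion appearing in (A.20) can be computed explicitly; a sketch of the bias factor r (Σ_j δ_j (1+g)^{J−j}) / (Σ_j γ_j (1+g)^{J−j}) + (1−r) (function and variable names are our own, g stands for the growth rate i^{(p)}):

```python
def blind_ny_bias(delta, gamma, r, g):
    """Factor by which the blindly applied New York method scales the true
    accounting-year ULAE payments under constant growth g, cf. (A.20)."""
    J = len(delta) - 1
    num = sum(d * (1 + g) ** (J - j) for j, d in enumerate(delta))
    den = sum(c * (1 + g) ** (J - j) for j, c in enumerate(gamma))
    return r * num / den + (1 - r)

# patterns as in Example A.5: reporting runs ahead of payment, so for
# g > 0 the factor exceeds 1 (reserves too high); for g = 0 it equals 1
delta = [0.9, 0.1, 0.0, 0.0, 0.0]
gamma = [0.3, 0.2, 0.2, 0.2, 0.1]
b_pos = blind_ny_bias(delta, gamma, r=0.5, g=0.05)
b_zero = blind_ny_bias(delta, gamma, r=0.5, g=0.0)
```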

A.5 Example

We assume that the observations for π_t are generated by i.i.d. random variables X^{(ULAE)}_t / X^{(pure)}_t. Hence we can estimate π from this sequence. Assume π = 10%. Moreover, i^{(p)} = 0, and we set r = 50% (this is the usual choice, also done in the SST [73]).

Moreover we assume that we have the following reporting and cashflow patterns (J = 4):

(δ_0, ..., δ_4) = (90%, 10%, 0%, 0%, 0%),    (A.22)
(γ_0, ..., γ_4) = (30%, 20%, 20%, 20%, 10%).    (A.23)

Assume that C^{(pure)}_{i,J} = 1'000. Then the ULAE reserves for accident year i are given by

(R̂^{(ULAE)}_{i,−1}, ..., R̂^{(ULAE)}_{i,3}) = (100, 40, 25, 15, 5),    (A.24)

which implies for the estimated incremental ULAE payments

(X̂^{(ULAE)}_{i,0}, ..., X̂^{(ULAE)}_{i,4}) = (60, 15, 10, 10, 5).    (A.25)

Hence for the total estimated payments X̂_{i,j} = X^{(pure)}_{i,j} + X̂^{(ULAE)}_{i,j} we have

(X̂_{i,0}, ..., X̂_{i,4}) = (360, 215, 210, 210, 105).    (A.26)
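The numbers of this example follow directly from Model A.2 and Estimator A.5; a minimal sketch (names our own):

```python
def ny_ulae(delta, gamma, pi, r, C_ult):
    """New York method: incremental ULAE payments
    X_j = pi * (r * delta_j + (1 - r) * gamma_j) * C_ult  (Model A.2 with
    constant paid-to-paid ratio pi), and the ULAE reserves
    R[j] = sum of the still outstanding increments X_j, X_{j+1}, ..."""
    X = [pi * (r * d + (1 - r) * c) * C_ult for d, c in zip(delta, gamma)]
    R = [sum(X[l:]) for l in range(len(X))]
    return X, R

delta = [0.9, 0.1, 0.0, 0.0, 0.0]   # reporting pattern (A.22)
gamma = [0.3, 0.2, 0.2, 0.2, 0.1]   # payout pattern (A.23)
X_ulae, R_ulae = ny_ulae(delta, gamma, pi=0.10, r=0.50, C_ult=1000.0)
# X_ulae = (60, 15, 10, 10, 5) and R_ulae = (100, 40, 25, 15, 5)
total = [g * 1000.0 + x for g, x in zip(gamma, X_ulae)]
# total = (360, 215, 210, 210, 105)
```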

Appendix B
Distributions

B.1 Discrete distributions

B.1.1 Binomial distribution

For n ∈ N and p ∈ (0, 1) the Binomial distribution Bin(n, p) is defined to be the discrete distribution with probability function

f_{n,p}(x) = (n choose x) p^x (1−p)^{n−x}    (B.1)

for all x ∈ {0, ..., n}.


E(X) = n p,    Var(X) = n p (1−p),    Vco(X) = ((1−p)/(n p))^{1/2}

Table B.1: Expectation, variance and variational coefficient of a Bin(n, p)-distributed random variable X

B.1.2 Poisson distribution

For λ ∈ (0, ∞) the Poisson distribution Poisson(λ) is defined to be the discrete distribution with probability function

f_λ(x) = e^{−λ} λ^x / x!    (B.2)

for all x ∈ N_0.

E(X) = λ,    Var(X) = λ,    Vco(X) = λ^{−1/2}

Table B.2: Expectation, variance and variational coefficient of a Poisson(λ)-distributed random variable X

B.1.3 Negative binomial distribution

For r ∈ (0, ∞) and p ∈ (0, 1) the Negative binomial distribution NB(r, p) is defined to be the discrete distribution with probability function

f_{r,p}(x) = (r+x−1 choose x) p^r (1−p)^x    (B.3)

for all x ∈ N_0.
For α ∈ R and n ∈ N_0, the generalized binomial coefficient is defined to be

(α choose n) = α (α−1) · · · (α−n+1) / n! = Π_{k=1}^n (α−k+1)/k.    (B.4)

E(X) = r (1−p)/p,    Var(X) = r (1−p)/p²,    Vco(X) = (r (1−p))^{−1/2}

Table B.3: Expectation, variance and variational coefficient of a NB(r, p)-distributed random variable X

B.2 Continuous distributions

B.2.1 Normal distribution

For μ ∈ R and σ² > 0 the Normal distribution N(μ, σ²) is defined to be the continuous distribution with density

f_{μ,σ²}(x) = (2πσ²)^{−1/2} exp(−(x−μ)²/(2σ²)) 1_R(x).    (B.5)

B.2.2 Log-normal distribution

For μ ∈ R and σ² > 0 the Log-normal distribution LN(μ, σ²) is defined to be the continuous distribution with density

f_{μ,σ²}(x) = (2πσ²)^{−1/2} (1/x) exp(−(ln x − μ)²/(2σ²)) 1_{(0,∞)}(x).    (B.6)

E(X) = μ,    Var(X) = σ²,    Vco(X) = σ/μ

Table B.4: Expectation, variance and variational coefficient of a N(μ, σ²)-distributed random variable X

E(X) = exp(μ + σ²/2),    Var(X) = exp(2μ + σ²) (exp(σ²) − 1),    Vco(X) = (exp(σ²) − 1)^{1/2}

Table B.5: Expectation, variance and variational coefficient of a LN(μ, σ²)-distributed random variable X

B.2.3 Gamma distribution

For γ, c ∈ (0, ∞) the Gamma distribution Γ(γ, c) is defined to be the continuous distribution with density

f_{γ,c}(x) = (c^γ / Γ(γ)) x^{γ−1} e^{−c x} 1_{(0,∞)}(x).    (B.7)

The map Γ : (0, ∞) → (0, ∞) given by

Γ(γ) = ∫_0^∞ u^{γ−1} e^{−u} du    (B.8)

is called the Gamma function. The parameters γ and c are called shape and scale, respectively.
The Gamma function has the following properties:
1) Γ(1) = 1.
2) Γ(1/2) = √π.
3) Γ(γ + 1) = γ Γ(γ).

E(X) = γ/c,    Var(X) = γ/c²,    Vco(X) = γ^{−1/2}

Table B.6: Expectation, variance and variational coefficient of a Γ(γ, c)-distributed random variable X

B.2.4 Beta distribution

For a, b ∈ (0, ∞) the Beta distribution Beta(a, b) is defined to be the continuous distribution with density

f_{a,b}(x) = (1/B(a, b)) x^{a−1} (1−x)^{b−1} 1_{(0,1)}(x).    (B.9)

The map B : (0, ∞) × (0, ∞) → (0, ∞) given by

B(a, b) = ∫_0^1 u^{a−1} (1−u)^{b−1} du = Γ(a) Γ(b) / Γ(a+b)    (B.10)

is called the Beta function.

E(X) = a/(a+b),    Var(X) = a b / ((a+b)² (a+b+1)),    Vco(X) = (b / (a (a+b+1)))^{1/2}

Table B.7: Expectation, variance and variational coefficient of a Beta(a, b)-distributed random variable X

Bibliography
[1] Abraham, B., Ledolter, J. (1983), Statistical Methods for Forecasting. John Wiley
and Sons, NY.
[2] Alba de, E. (2002), Bayesian estimation of outstanding claim reserves. North American Act. J. 6/4, 1-20.
[3] Alba de, E., Corzo, M.A.R. (2006), Bayesian claims reserving when there are negative values in the runoff triangle. Actuarial Research Clearing House, Jan 01, 2006.
[4] Alba de, E. (2006), Claims reserving when there are negative values in the runoff
triangle: Bayesian analysis using the three-parameter log-normal distribution. North
American Act. J. 10/3, 45-59.
[5] Arjas, E. (1989), The claims reserving problem in non-life insurance: some structural
ideas. ASTIN Bulletin 19/2, 139-152.
[6] Barnett, G., Zehnwirth, B. (1998), Best estimate reserves. CAS Forum, 1-54.
[7] Barnett, G., Zehnwirth, B. (2000), Best estimates for reserves. Proc. CAS, Vol.
LXXXII, 245-321.
[8] Benktander, G. (1976), An approach to credibility in calculating IBNR for casualty
excess reinsurance. The Actuarial Review, April 1976, p.7.
[9] Bernardo, J.M., Smith, A.F.M. (1994), Bayesian Theory. John Wiley and Sons, NY.
[10] Bornhuetter, R.L., Ferguson, R.E. (1972), The actuary and IBNR. Proc. CAS,
Vol. LIX, 181-195.
[11] Buchwalder, M., Bühlmann, H., Merz, M., Wüthrich, M.V. (2005), Legal valuation portfolio in non-life insurance. Conference paper, presented at the 36th International ASTIN Colloquium, 4-7 September 2005, ETH Zürich. www.astin2005.ch
[12] Buchwalder, M., Bühlmann, H., Merz, M., Wüthrich, M.V. (2006), Estimation of unallocated loss adjustment expenses. Bulletin SAA 2006/1, 43-53.
[13] Buchwalder, M., Bühlmann, H., Merz, M., Wüthrich, M.V. (2006), The mean square error of prediction in the chain ladder reserving method (Mack and Murphy revisited). To appear in ASTIN Bulletin 36/2.


[14] Buchwalder, M., Bühlmann, H., Merz, M., Wüthrich, M.V. (2006), Valuation portfolio in non-life insurance. Preprint.
[15] Bühlmann, H. (1983), Chain ladder, Cape Cod and complementary loss ratio. International Summer School 1983, unpublished.
[16] Bühlmann, H. (1992), Stochastic discounting. Insurance: Math. Econom. 11, 113-127.
[17] Bühlmann, H. (1995), Life insurance with stochastic interest rates. In: Financial Risk in Insurance, G. Ottaviani (Ed.), Springer.
[18] Bühlmann, H., Gisler, A. (2005), A Course in Credibility Theory and its Applications. Springer Universitext.
[19] Casualty Actuarial Society (CAS) (1990). Foundations of Casualty Actuarial Science, fourth edition.
[20] Clark, D.R. (2003), LDF curve-fitting and stochastic reserving: a maximum likelihood approach. CAS Forum (Fall), 41-92.
[21] Efron, B. (1979), Bootstrap methods: another look at the jackknife. Ann. Statist. 7/1, 1-26.

[22] Efron, B., Tibshirani, R.J. (1995), An Introduction to the Bootstrap. Chapman &
Hall, NY.
[23] England, P.D., Verrall, R.J. (1999), Analytic and bootstrap estimates of prediction
errors in claims reserving. Insurance: Math. Econom. 25, 281-293.
[24] England, P.D., Verrall, R.J. (2001), A flexible framework for stochastic claims reserving. Proc. CAS, Vol. LXXXIII, 1-18.
[25] England, P.D., Verrall, R.J. (2002), Stochastic claims reserving in general insurance.
British Act. J. 8/3, 443-518.
[26] Feldblum, S. (2002), Completing and using schedule P. CAS Forum, 353-590.
[27] Finney, D.J. (1941), On the distribution of a variate whose logarithm is normally distributed. JRSS Suppl. 7, 155-161.
[28] Gerber, H.U., Jones, D.A. (1975), Credibility formulas of the updating type. In:
Credibility: Theory and Applications, P.M. Kahn (Ed.), Academic Press, NY.
[29] Gisler, A. (2006), The estimation error in the chain-ladder reserving method: a
Bayesian approach. To appear in ASTIN Bulletin 36/2.
[30] Gogol, D. (1993), Using expected loss ratios in reserving. Insurance: Math. Econom. 12, 297-299.


[31] Hachemeister, C.A. (1975), Credibility for regression models with application to
trend. In: Credibility: Theory and Applications, P.M. Kahn (Ed.), Academic Press,
NY.
[32] Haastrup, S., Arjas, E. (1996), Claims reserving in continuous time; a nonparametric
Bayesian approach. ASTIN Bulletin 26/2, 139-164.
[33] Herbst, T. (1999), An application of randomly truncated data models in reserving
IBNR claims. Insurance: Math. Econom. 25, 123-131.
[34] Hertig, J. (1985), A statistical approach to the IBNR-reserves in marine insurance.
ASTIN Bulletin 15, 171-183.
[35] Hesselager, O., Witting, T. (1988), A credibility model with random fluctuations in delay probabilities for the prediction of IBNR claims. ASTIN Bulletin 18, 79-90.
[36] Hesselager, O. (1991), Prediction of outstanding claims: A hierarchical credibility
approach. Scand. Act. J. 1991, 25-47.
[37] Hovinen, E. (1981), Additive and continuous IBNR. ASTIN Colloquium Loen, Norway.
[38] Jewell, W.S. (1976), Two classes of covariance matrices giving simple linear forecasts.
Scand. Act. J. 1976, 15-29.
[39] Jewell, W.S. (1989), Predicting IBNyR events and delays. ASTIN Bulletin 19/1,
25-55.
[40] Jones, A.R., Copeman, P.J., Gibson, E.R., Line, N.J.S., Lowe, J.A., Martin, P.,
Matthews, P.N., Powell, D.S. (2006), A change agenda for reserving. Report of
the general insurance reserving issues taskforce (GRIT). Institute of Actuaries and
Faculty of Actuaries.
[41] Jong de, P. (2006), Forecasting runoff triangles. North American Act. J. 10/2, 28-38.
[42] Jong de, P., Zehnwirth, B. (1983), Claims reserving, state-space models and the
Kalman filter. J.I.A. 110, 157-182.
[43] Jørgensen, B., de Souza, M.C.P. (1994), Fitting Tweedie's compound Poisson model to insurance claims data. Scand. Act. J. 1994, 69-93.
[44] Kremer, E. (1982), IBNR claims and the two way model of ANOVA.
Scand. Act. J. 1982, 47-55.
[45] Larsen, C.R. (2005), A dynamic claims reserving model. Conference paper, presented at the 36th International ASTIN Colloquium, 4-7 September 2005, ETH Zürich. www.astin2005.ch

[46] Lyons, G., Forster, W., Kedney, P., Warren, R., Wilkinson, H. (2002),
Claims reserving working party paper. General Insurance Convention 2002.
http://www.actuaries.org.uk
[47] Mack, T. (1990), Improved estimation of IBNR claims by credibility. Insurance:
Math. Econom. 9, 51-57.
[48] Mack, T. (1991), A simple parametric model for rating automobile insurance or
estimating IBNR claims reserves. ASTIN Bulletin 21/1, 93-109.
[49] Mack, T. (1993), Distribution-free calculation of the standard error of chain ladder
reserve estimates. ASTIN Bulletin 23/2, 213-225.
[50] Mack, T. (1994), Measuring the variability of chain ladder reserve estimates. CAS
Forum (Spring), 101-182.
[51] Mack, T. (2000), Credible claims reserves: The Benktander method. ASTIN Bulletin
30/2, 333-347.
[52] Mack, T., Quarg, G., Braun, C. (2006), The mean square error of prediction in the
chain ladder reserving method - a comment. To appear in ASTIN Bulletin 36/2.
[53] McCullagh, P., Nelder, J.A. (1989), Generalized Linear Models. 2nd edition, Chapman & Hall, London.
[54] Merz, M., Wüthrich, M.V. (2006), A credibility approach to the Munich chain-ladder
method. To appear in Blätter DGVFM.
[55] Murphy, D.M. (1994), Unbiased loss development factors. Proc. CAS, Vol. LXXXI,
154-222.
[56] Neuhaus, W. (1992), Another pragmatic loss reserving method or Bornhuetter/Ferguson revisited. Scand. Act. J. 1992, 151-162.
[57] Neuhaus, W. (2004), On the estimation of outstanding claims. Conference paper,
presented at the 35th International ASTIN Colloquium 2004, Bergen, Norway.
[58] Norberg, R. (1993), Prediction of outstanding liabilities in non-life insurance. ASTIN
Bulletin 23/1, 95-115.
[59] Norberg, R. (1999), Prediction of outstanding liabilities II. Model variations and
extensions. ASTIN Bulletin 29/1, 5-25.
[60] Ntzoufras, I., Dellaportas, P. (2002), Bayesian modelling of outstanding liabilities
incorporating claim count uncertainty. North American Act. J. 6/1, 113-128.
[61] Partrat, C., Pey, N., Schilling, J. (2005), Delta method and reserving. Conference
paper, presented at the 36th International ASTIN Colloquium, 4-7 September 2005,
ETH Zürich. www.astin2005.ch
[62] Quarg, G., Mack, T. (2004), Munich Chain Ladder. Blätter DGVFM, Band XXVI,
597-630.
[63] Radtke, M., Schmidt, K.D. (2004), Handbuch zur Schadenreservierung. Verlag Versicherungswirtschaft, Karlsruhe.
[64] Renshaw, A.E. (1994), Claims reserving by joint modelling. Actuarial research paper
no. 72, Department of Actuarial Sciences and Statistics, City University, London.
[65] Renshaw, A.E., Verrall, R.J. (1998), A stochastic model underlying the chain ladder
technique. British Act. J. 4/4, 903-923.
[66] Ross, S.M. (1985), Introduction to Probability Models, 3rd edition. Academic Press,
Orlando Florida.
[67] Sandström, A. (2006), Solvency: Models, Assessment and Regulation. Chapman
and Hall, CRC.
[68] Schmidt, K.D., Schaus, A. (1996), An extension of Mack's model for the chain-ladder
method. ASTIN Bulletin 26, 247-262.
[69] Schnieper, R. (1991), Separating true IBNR and IBNER claims. ASTIN Bulletin 21,
111-127.
[70] Scollnik, D.P.M. (2002), Implementation of four models for outstanding liabilities in
WinBUGS: A discussion of a paper by Ntzoufras and Dellaportas. North American
Act. J. 6/1, 113-128.
[71] Smyth, G.K., Jørgensen, B. (2002), Fitting Tweedie's compound Poisson model to
insurance claims data: dispersion modelling. ASTIN Bulletin 32, 143-157.
[72] Srivastava, V.K., Giles, D.E. (1987), Seemingly Unrelated Regression Equation Models: Estimation and Inference. Marcel Dekker, NY.
[73] Swiss Solvency Test (2005), BPV SST Technisches Dokument, version of 22 June 2005.
Available under www.sav-ausbildung.ch
[74] Taylor, G. (1987), Regression models in claims analysis I: theory. Proc. CAS,
Vol. XLVI, 354-383.
[75] Taylor, G. (2000), Loss Reserving: An Actuarial Perspective. Kluwer Academic
Publishers.
[76] Taylor, G., McGuire, G. (2005), Synchronous bootstrapping of seemingly unrelated
regressions. Conference paper, presented at the 36th International ASTIN Colloquium,
4-7 September 2005, ETH Zürich. www.astin2005.ch
[77] Venter, G.G. (1998), Testing the assumptions of age-to-age factors. Proc. CAS,
Vol. LXXXV, 807-847.
[78] Venter, G.G. (2006), Discussion of MSEP in the CLRM (MMR). To appear in
ASTIN Bulletin 36/2.
[79] Verrall, R.J. (1990), Bayesian and empirical Bayes estimation for the chain ladder
model. ASTIN Bulletin 20/2, 217-238.
[80] Verrall, R.J. (1991), On the estimation of reserves from loglinear models. Insurance:
Math. Econom. 10, 75-80.
[81] Verrall, R.J. (2000), An investigation into stochastic claims reserving models and
the chain-ladder technique. Insurance: Math. Econom. 26, 91-99.
[82] Verrall, R.J. (2004), A Bayesian generalized linear model for the Bornhuetter-Ferguson
method of claims reserving. North American Act. J. 8/3, 67-89.
[83] Verdier, B., Klinger, A. (2005), JAB Chain: A model-based calculation of paid and
incurred loss development factors. Conference paper, presented at the 36th International
ASTIN Colloquium, 4-7 September 2005, ETH Zürich. www.astin2005.ch
[84] Vylder De, F. (1982), Estimation of IBNR claims by credibility theory. Insurance:
Math. Econom. 1, 35-40.
[85] Vylder De, F., Goovaerts, M.J. (1979), Proceedings of the first meeting of
the contact group Actuarial Sciences. KU Leuven, nl. 7904B wettelijk report:
D/1979/23761/5.
[86] Wright, T.S. (1990), A stochastic method for claims reserving in general insurance.
J.I.A. 117, 677-731.
[87] Wüthrich, M.V. (2003), Claims reserving using Tweedie's compound Poisson model.
ASTIN Bulletin 33/2, 331-346.
[88] Wüthrich, M.V. (2006), Premium liability risks: modeling small claims. Bulletin
SAA 2006/1, 27-38.
[89] Wüthrich, M.V. (2006), Using a Bayesian approach for claims reserving. Accepted
for publication in CAS Journal.
[90] Wüthrich, M.V. (2006), Prediction error in the chain ladder method. Preprint.
[91] Wüthrich, M.V., Bühlmann, H., Furrer, H. (2006), Lecture notes on market consistent
actuarial valuation. ETH Zürich, Summer Term 2006.
[92] Zehnwirth, B. (1998), ICRFS-Plus 8.3 Manual. Insureware Pty Ltd. St. Kilda, Australia.
[93] Zellner, A. (1962), An efficient method of estimating seemingly unrelated regressions
and tests for aggregation bias. J. American Stat. Assoc. 57, 346-368.
