is the set
of all model point indices. Furthermore, let the number of model point policies be defined as
$$m = |M'|. \tag{1}$$
There are of course limits to the number of policies we can group, since we still need to ensure that our set of model points $X'$ satisfies
$$\Big|\sum_{i\in M} C(x_i) - \sum_{i\in M} C(x^{k})\Big| \le p \qquad \forall s \in S, \tag{2}$$
where $x^k$ is the model point that represents policy $i$ and the tolerance $p$ can be set by each company. To be able to solve the problem mentioned above, we need to know $\sum_{i\in M} C(x_i)$, which we will call the base run calculation. Furthermore, we need to construct the set of model points $X'$ and obtain $\sum_{k\in M'} C(x^k)$, where $x^k$ is a model point policy. We would also like to know, before we group, in which way we should group in order to make the error as small as possible. This is called the estimation of our errors, and we denote the error per model point policy by $\varepsilon_k$. This means that there are basically three actions we need to perform:
1. Calculate the base run (we still need this at first, to check whether our estimation works)
2. Calculate the estimated errors of the model points compared to the base run
3. Calculate the empirical discounted cash flows resulting from the model points (to check against the base run)
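The three actions above can be sketched in a few lines of Python with NumPy. This is a minimal sketch, not the procedure used in this paper: it assumes that the present values per scenario are already available as arrays, weights every model point by its group size $|G_k|$ (as in Section 3.2), and measures accuracy as one minus the absolute relative deviation from the base run. The function names and all numbers are hypothetical.

```python
import numpy as np

def model_point_run(mp_pv, group_sizes):
    """Scale each model point's present value C(x^k) by its group size |G_k|.

    mp_pv:       array of shape (m, S), present value per model point and scenario
    group_sizes: array of shape (m,), number of policies represented by each model point
    """
    return np.asarray(group_sizes, dtype=float) @ np.asarray(mp_pv, dtype=float)

def accuracy_per_scenario(base_pv, grouped_pv):
    """Assumed accuracy measure: 1 - |base - grouped| / |base|, per scenario."""
    base_pv = np.asarray(base_pv, dtype=float)
    return 1.0 - np.abs(base_pv - grouped_pv) / np.abs(base_pv)

def within_tolerance(base_pv, grouped_pv, p):
    """Condition (2): the absolute deviation is at most p in every scenario s."""
    return bool(np.all(np.abs(np.asarray(base_pv, dtype=float) - grouped_pv) <= p))

# Hypothetical numbers: two model points, two scenarios.
base = np.array([-100.0, -200.0])            # base run: sum_i C(x_i) per scenario
mp = np.array([[-10.0, -21.0], [-30.0, -59.0]])
grouped = model_point_run(mp, [4, 2])        # -100 and -202
print(accuracy_per_scenario(base, grouped))
print(within_tolerance(base, grouped, p=5.0))
```

Keeping the base run and the grouped run as separate arrays makes it easy to re-check condition (2) for a different tolerance $p$ without recomputing anything.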
3.2 Minimizing the error induced by making use of model points
So on which criteria should we group, and what error do we create by grouping in this way? Can we upper bound this error before we even group? Let us first define our groups $G_k$, $k\in M'$, with $G_k \cap G_{k'} = \emptyset$ for $k \neq k'$. We define the averages of the policies in each group as our model points. Then we can express the distance of a policy to its model point as $x - x^k$, where we define
$$x^k = \left[\frac{\sum_{i\in G_k} x_{i1}}{|G_k|},\; \dots,\; \frac{\sum_{i\in G_k} x_{in}}{|G_k|}\right],$$
which is the vector of averages of every attribute $j$ over all policies in group $k$.
Now, how much does the projected present value of the cash flows deviate when we represent our policy $x$ by a single model point $x^k$? To approximate this, we will make use of the Taylor series expansion. Let us rewrite $C(x)$ by adding and subtracting the model point $x^k$:
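$$C(x) = C(x + x^k - x^k) = C(x^k) + \frac{dC(x^k)}{dx}\left(x - x^k\right) + \frac{1}{2}\left(x - x^k\right)^T\frac{d^2C(x^k)}{dx^2}\left(x - x^k\right) + \sum_{q=3}^{\infty}\frac{1}{q!}\frac{d^{(q)}C}{dx^{(q)}}\left(x - x^k\right)^{(q)} \tag{3}$$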
where $\sum_{q=3}^{\infty}\frac{1}{q!}\frac{d^{(q)}C}{dx^{(q)}}$ contains the q-th order derivatives².
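Furthermore, the Jacobian is
$$\frac{dC}{dx} = \left(\frac{\partial C}{\partial x_1}, \frac{\partial C}{\partial x_2}, \dots, \frac{\partial C}{\partial x_n}\right)$$
and the Hessian is
$$H = \frac{d^2C}{dx^2} = \begin{pmatrix} \frac{\partial^2 C}{\partial x_1^2} & \frac{\partial^2 C}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2 C}{\partial x_1\partial x_n}\\ \frac{\partial^2 C}{\partial x_2\partial x_1} & \frac{\partial^2 C}{\partial x_2^2} & \cdots & \frac{\partial^2 C}{\partial x_2\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial^2 C}{\partial x_n\partial x_1} & \frac{\partial^2 C}{\partial x_n\partial x_2} & \cdots & \frac{\partial^2 C}{\partial x_n^2} \end{pmatrix}.$$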
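If we now sum equation (3) over all policies, we get the following formula:
$$\sum_{i\in M} C(x_i) = \sum_{i\in M} C(x^k) + \frac{dC(x^k)}{dx}\sum_{i\in M}\left(x_i - x^k\right) + \frac{1}{2}\sum_{i\in M}\left(x_i - x^k\right)^T\frac{d^2C(x^k)}{dx^2}\left(x_i - x^k\right) + \sum_{i\in M}\sum_{q=3}^{\infty}\frac{1}{q!}\frac{d^{(q)}C}{dx^{(q)}}\left(x_i - x^k\right)^{(q)}$$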
Because the Jacobian, the Hessian and the higher-order derivatives are all evaluated at the same model point $x^k$ for every policy $i$, we can take these terms out of the sum.
For simplicity, suppose first that $\frac{1}{q!}\frac{d^{(q)}C}{dx^{(q)}} = 0$ for $q = 2$, which means that we suppose $C(x)$ is linear. Then the following set of equations holds³:
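$$\sum_{i\in M} C(x_i) = N\,C(x^k) + \frac{dC(x^k)}{dx}\sum_{i\in M}\left(x_i - x^k\right) = N\,C(x^k) + \frac{dC(x^k)}{dx}\underbrace{\left(\sum_{i\in M} x_i - N\,\frac{\sum_{i\in M} x_i}{N}\right)}_{=\,0} = N\,C(x^k), \tag{4}$$
where $N = |M|$ is the number of policies and the single model point $x^k = \frac{1}{N}\sum_{i\in M} x_i$ is the average of all policies.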
This states that we do not make any error at all! For grouping this is a really powerful result, because it means that we can use a single bucket for an attribute on which the present value of the cash flows depends linearly. In other words, we can group this attribute away.
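A quick numerical check of equation (4), as a sketch with hypothetical random policies and a hypothetical linear $C(x)$: replacing all policies by their average leaves the total unchanged up to round-off, while a convex (non-linear) $C$ produces a strictly positive gap.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
N, n = 243, 5
X = rng.uniform(0.0, 1.0, size=(N, n))   # hypothetical policies, one row per policy

a = rng.normal(size=n)                   # coefficients of a hypothetical linear C

def C_linear(x):
    return -3.0 + x @ a                  # C(x) = c0 + a^T x

def C_convex(x):
    return (x ** 2).sum(axis=-1)         # a non-linear (convex) counterpart

x_k = X.mean(axis=0)                     # the single model point: average of all policies

linear_error = C_linear(X).sum() - N * C_linear(x_k)   # equation (4): zero up to round-off
convex_error = C_convex(X).sum() - N * C_convex(x_k)   # strictly positive for convex C
print(linear_error, convex_error)
```

For the convex function the gap equals the sum of squared distances $\sum_i \|x_i - x^k\|^2$, which is exactly the kind of second-order term studied next.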
Suppose now that $\frac{1}{q!}\frac{d^{(q)}C}{dx^{(q)}} \neq 0$. If we now rewrite the sum of $C(x_i)$ over all policies group by group, this gives:
² Note that this is a non-standard use of a Taylor representation. If we replaced $x - x^k$ by $h$, we would see the familiar Taylor expression. Letting $h \to 0$ is equivalent to an increase in the number of model points.
³ This implicitly sets all higher-order derivatives equal to 0 as well.
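$$\begin{aligned}\sum_{i\in M} C(x_i) &= \sum_{k\in M'}\sum_{i\in G_k} C(x_i)\\ &\overset{(*)}{=} \sum_{k\in M'}|G_k|\,C\!\left(\begin{bmatrix}\frac{1}{|G_k|}\sum_{i\in G_k} x_{i1}\\ \frac{1}{|G_k|}\sum_{i\in G_k} x_{i2}\\ \vdots\\ \frac{1}{|G_k|}\sum_{i\in G_k} x_{in}\end{bmatrix}\right) + \sum_{k\in M'}\sum_{i\in G_k}\frac{1}{2}\left(x_i - x^k\right)^T\,{}^kH^{mp}\left(x_i - x^k\right)\\ &= \sum_{k\in M'}|G_k|\,C\!\left(\begin{bmatrix}x^k_1\\ x^k_2\\ \vdots\\ x^k_n\end{bmatrix}\right) + \sum_{k\in M'}\sum_{i\in G_k}\frac{1}{2}\left(x_i - x^k\right)^T\,{}^kH^{mp}\left(x_i - x^k\right)\end{aligned}$$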
where in (*) we used our result from (4), applied to every group $G_k$, and ${}^kH^{mp}$ denotes the Hessian evaluated at model point $k$. However, this only holds whenever $\frac{1}{q!}\frac{d^{(q)}C}{dx^{(q)}} = 0$ for $q \ge 3$. From now on we will assume that higher-order effects are negligible, so that the error we make lies solely in the second-order derivatives. Note that we can only consider them negligible whenever the distance $x_i - x^k$, $k\in M'$, $i\in G_k$, is small.
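Considering now only one group $k$, we can express the error made for this group as follows:
$$\begin{aligned}\varepsilon_k &= \sum_{i\in G_k} C(x_i) - |G_k|\,C(x^k)\\ &= \sum_{i\in G_k}\left(\frac{1}{2}\left(x_i - x^k\right)^T\,{}^kH^{mp}\left(x_i - x^k\right)\right)\\ &= \sum_{i\in G_k}\frac{1}{2}\left(x_i - x^k\right)^T\begin{bmatrix}\sum_{j\in\mathcal N}\frac{\partial^2 C(x^k)}{\partial x_1\partial x_j}\left(x_{ij} - x^k_j\right)\\ \sum_{j\in\mathcal N}\frac{\partial^2 C(x^k)}{\partial x_2\partial x_j}\left(x_{ij} - x^k_j\right)\\ \vdots\\ \sum_{j\in\mathcal N}\frac{\partial^2 C(x^k)}{\partial x_n\partial x_j}\left(x_{ij} - x^k_j\right)\end{bmatrix}\end{aligned}$$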
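Now, for $j_1, j_2 \in \mathcal N$ we have
$$\varepsilon_k = \sum_{i\in G_k}\sum_{j_1\in\mathcal N}\sum_{j_2\in\mathcal N}\frac{1}{2}\frac{\partial^2 C(x^k)}{\partial x_{j_1}\partial x_{j_2}}\left(x_{ij_1} - x^k_{j_1}\right)\left(x_{ij_2} - x^k_{j_2}\right) \tag{5}$$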
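Suppose now that the cross derivatives are equal to zero, $\frac{\partial^2 C(x^k)}{\partial x_{j_1}\partial x_{j_2}} = 0$ for $j_1 \neq j_2$; then the error per model point reduces to
$$\varepsilon_k = \sum_{i\in G_k}\sum_{j\in\mathcal N}\frac{1}{2}\frac{\partial^2 C(x^k)}{\partial x_j^2}\left(x_{ij} - x^k_j\right)^2.$$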
This now defines our grouping method. The grouping should be done as follows: if $\frac{\partial^2 C}{\partial x_j^2}$ is large in a certain area, then we need $|x_{ij} - x^k_j|$ to be small. This means that we need many groups in those areas where the second-order derivatives are large.
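The error terms above can be evaluated directly. The sketch below, with hypothetical function names, computes $\varepsilon_k$ from equation (5) and from its diagonal version, and checks equation (5) against the exact difference $\sum_{i\in G_k} C(x_i) - |G_k|\,C(x^k)$ for a hypothetical quadratic $C(x)$, for which the second-order expansion around the group mean is exact.

```python
import numpy as np

def group_error(X_group, H):
    """Second-order error eps_k of one group, equation (5).

    X_group: (|G_k|, n) attributes x_i of the policies in G_k
    H:       (n, n) Hessian of C evaluated at the model point x^k
    """
    D = X_group - X_group.mean(axis=0)           # rows are x_i - x^k
    # 0.5 * sum_i sum_{j1, j2} H[j1, j2] * D[i, j1] * D[i, j2]
    return 0.5 * np.einsum("ia,ab,ib->", D, H, D)

def group_error_diagonal(X_group, H):
    """Same error when the cross derivatives are zero: only diag(H) is used."""
    D = X_group - X_group.mean(axis=0)
    return 0.5 * (np.diag(H) * D ** 2).sum()

# Check on a hypothetical quadratic C, for which the expansion is exact.
rng = np.random.default_rng(seed=2)
n = 3
B = rng.normal(size=(n, n))
A = B + B.T                                      # symmetric Hessian of C
b = rng.normal(size=n)

def C(x):
    return 0.5 * np.einsum("...a,ab,...b->...", x, A, x) + x @ b

G = rng.uniform(0.0, 1.0, size=(20, n))          # one hypothetical group
exact = C(G).sum() - len(G) * C(G.mean(axis=0))  # sum_i C(x_i) - |G_k| C(x^k)
print(exact, group_error(G, A))                  # agree up to round-off
```

The first-order term drops out because the model point is the group mean, so the sum of $x_i - x^k$ over the group is zero.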
But what if the cross derivatives are not equal to zero? By determining the norm of the Hessian $H$ we can define our grouping method. Let us denote by ${}^kH^{mp}$ the Hessian evaluated at model point $k$. If we were to calculate $\|H\|_2$, we would have to calculate the eigenvalues $\lambda$ of $H$ and consequently solve the system of linear equations $(H - \lambda I)\,u = 0$, where $u$ is the eigenvector corresponding to the eigenvalue $\lambda$. By Cramer's rule, this homogeneous system has non-trivial solutions if and only if its determinant vanishes, which means that the eigenvalues are given by $\det(H - \lambda I) = 0$, the characteristic equation of $H$. This, however, involves solving a polynomial equation of order $n$, $p(\lambda) = \sum_{j=0}^{n}(-1)^j S_j\,\lambda^{n-j}$, where $S_0 = 1$ and $S_j$ is the sum of the $j \times j$ principal minors of $H$. Since there is no general closed-form solution for polynomial equations of degree $n > 4$, one has to resort to root-finding algorithms such as Newton's method. In packages such as Mathematica or Matlab these algorithms are readily available, but this is very costly and not desirable in practice. What we can do instead of using the 2-norm is to use the ∞-norm⁴,
$$\|H\|_\infty = \max_{j\in\mathcal N}\sum_{j'\in\mathcal N}\left|H_{jj'}\right|,$$
where $j$ indexes the rows of $H$, $j'$ the columns, and $H_{jj'}$ is the element of $H$ in row $j$ and column $j'$. This is an easy calculation and can therefore be done in practice. We can now upper bound the error by
$$|\varepsilon_k| \le \left\|{}^kH^{mp}\right\|_\infty\sum_{i\in G_k}\sum_{j_1\in\mathcal N}\sum_{j_2\in\mathcal N}\left|x_{ij_1} - x^k_{j_1}\right|\left|x_{ij_2} - x^k_{j_2}\right|, \qquad k\in M', \tag{6}$$
which again states that we should create more buckets whenever the infinity norm is larger. Whenever $C(x)$ is linear in all its attributes, the infinity norm equals zero; and whenever $C(x)$ is linear in attribute $j$ with vanishing cross derivatives involving $j$, row $j$ of the Hessian is zero, so that again one bucket suffices for that attribute.
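A minimal sketch of the two norms and of bound (6), assuming a symmetric Hessian and hypothetical random data: the ∞-norm needs only absolute row sums, whereas the 2-norm needs an eigenvalue solver (here NumPy's `eigvalsh`). For a symmetric matrix the ∞-norm is never smaller than the 2-norm, and the right-hand side of (6) bounds $|\varepsilon_k|$. The function names are hypothetical.

```python
import numpy as np

def infinity_norm(H):
    """||H||_inf: maximum absolute row sum, max_j sum_j' |H_jj'|."""
    return np.abs(H).sum(axis=1).max()

def spectral_norm(H):
    """||H||_2 of a symmetric H: largest |eigenvalue| (needs an eigen-solver)."""
    return np.abs(np.linalg.eigvalsh(H)).max()

def error_bound(X_group, H):
    """Right-hand side of bound (6) for one group G_k."""
    D = np.abs(X_group - X_group.mean(axis=0))   # |x_ij - x^k_j|
    # sum_j1 sum_j2 |d_j1| |d_j2| equals (sum_j |d_j|)^2 for each policy
    return infinity_norm(H) * (D.sum(axis=1) ** 2).sum()

rng = np.random.default_rng(seed=3)
B = rng.normal(size=(4, 4))
H = B + B.T                                      # hypothetical symmetric Hessian at x^k
G = rng.uniform(0.0, 1.0, size=(30, 4))          # hypothetical group of policies
D = G - G.mean(axis=0)
eps_k = 0.5 * np.einsum("ia,ab,ib->", D, H, D)   # second-order error, equation (5)
print(spectral_norm(H), infinity_norm(H))        # the infinity norm is never smaller
print(abs(eps_k), error_bound(G, H))             # the bound holds
```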
3.3 Numerical model point creation
Before we move on, we will first solve some practical issues with the method of Section 3.2. So far we have assumed that the function $C(x)$ is known, and we have shown that whenever we know this function we can differentiate it analytically. Furthermore, we can obtain a grouping strategy, as well as an upper bound on the error made, as shown in equation (6). However, we do not always have access to this function. Therefore we will also define a numerical way to obtain a grouping strategy.
3.3.1 Grid construction
To obtain information about the function $C(x)$, we need to explore the function over the range of the attributes. In order to do so, we create a grid, which we will call our exploration grid to avoid confusion later on, when we describe the creation of the model points in Section 3.3.3.
The exploration grid measures the value of $C(x)$ at different values of $x_j$ over the range of every attribute $j$. Constructing such a grid can be a time-consuming task, especially if we want it to be very precise; we discuss this in Appendix A.1. This, however, is a one-time investment: once we know the landscape in which our function $C(x)$ lives, we do not have to perform this action again. In this paper we assume that we have created a grid in a nested way as in Appendix A.1. Denote by $L$ the set of grid points and let a single grid point be denoted by $l \in L$, so that the number of grid points is $|L|$. These grid points are constructed by defining, per attribute $j$, a number of buckets $b^e_j$, which we will call our exploration buckets. The feasible range of every attribute $j$ is then divided into $b^e_j$ buckets. This implies a division of the space $D$ into $|L| = \prod_{j\in\mathcal N} b^e_j$ hypercubes, which we index by $l\in L$. Each $D_l$ is then a hypercube, where $\bigcup_{l\in L} D_l = D$ and $D_l \cap D_{l'} = \emptyset$ for $l \neq l'$, $l, l' \in L$. In Appendix A.1 we describe two grid construction methods. For more advanced grids we refer to [10], where an adaptive grid is discussed.
⁴ If the cross derivatives are 0, we simply have only the diagonal elements of $H$ in every sum.
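The nested grid described above can be sketched as follows, assuming equal-width exploration buckets and taking the bucket midpoints as grid points; the function name and the ranges are hypothetical. The number of rows equals $|L| = \prod_j b^e_j$.

```python
import itertools
import numpy as np

def nested_grid(lower, upper, buckets):
    """Centers of the |L| = prod_j b^e_j hypercubes D_l of a nested grid.

    lower, upper: feasible range per attribute j
    buckets:      number of exploration buckets b^e_j per attribute
    """
    axes = []
    for lo, hi, b in zip(lower, upper, buckets):
        edges = np.linspace(lo, hi, b + 1)              # b equal-width buckets
        axes.append(0.5 * (edges[:-1] + edges[1:]))     # bucket midpoints
    # every combination of midpoints is one grid point l
    return np.array(list(itertools.product(*axes)))

grid = nested_grid([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [2, 3, 4])
print(grid.shape)   # (24, 3): |L| = 2 * 3 * 4 grid points in n = 3 dimensions
```

Because every combination is enumerated, $|L|$ grows multiplicatively with each extra attribute, which is what makes the nested grid expensive.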
If we move back to our Hessian $H$, as discussed in Section 3.2, it now has to be evaluated at every grid point $l \in L$, each of which is the center of a hypercube $D_l$. Let us distinguish between those Hessians by introducing the notation ${}^lH^e$ for the exploration Hessian evaluated at grid point $l$. Since we need to compute the Hessian numerically with the central difference formula, we need to evaluate the function $C(x)$ three times, $C(x_1,\dots,x_j,\dots,x_n)$, $C(x_1,\dots,x_j+\delta_j,\dots,x_n)$ and $C(x_1,\dots,x_j-\delta_j,\dots,x_n)$, per entry of ${}^lH^e$. This means that for every ${}^lH^e$, being a symmetric $n\times n$ matrix, we need to call $C(x)$ $3\,\frac{n(n+1)}{2}$ times, which in total makes $|L|\cdot 3\,\frac{n(n+1)}{2}$ function calls. (Strictly, the three evaluations above are those for a diagonal entry; the usual central difference for a mixed entry $\partial^2 C/\partial x_j\partial x_{j'}$ perturbs two attributes at once and uses four evaluations, so this count indicates the order of magnitude.) Whenever we construct such a grid, this should really be taken into consideration, since the number of function calls becomes very large, even for a small number of grid points.
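A sketch of the central-difference Hessian with a call counter, so that the cost of one exploration Hessian can be checked directly. It assumes a scalar-valued $C$ (one scenario) and uses the three-point formula for the diagonal and the standard four-point formula for the mixed entries, reusing the center evaluation; with this scheme one Hessian costs $2n^2 + 1$ calls. The function names and the test function are hypothetical.

```python
import numpy as np

def central_difference_hessian(C, x, delta):
    """Approximate the exploration Hessian of C at grid point x.

    C:     callable returning the present value for one attribute vector
    x:     grid point (length n)
    delta: perturbation delta_j per attribute (length n)
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=float)
    n = x.size
    H = np.empty((n, n))
    c0 = C(x)                                  # shared center evaluation
    steps = np.diag(delta)                     # row j is delta_j * e_j
    for j in range(n):
        ej = steps[j]
        # Diagonal: (C(x + d e_j) - 2 C(x) + C(x - d e_j)) / d^2
        H[j, j] = (C(x + ej) - 2.0 * c0 + C(x - ej)) / delta[j] ** 2
        for jp in range(j + 1, n):
            ejp = steps[jp]
            # Mixed entry: standard four-point central difference
            mixed = (C(x + ej + ejp) - C(x + ej - ejp)
                     - C(x - ej + ejp) + C(x - ej - ejp))
            H[j, jp] = H[jp, j] = mixed / (4.0 * delta[j] * delta[jp])
    return H

A = np.array([[2.0, 0.5], [0.5, -1.0]])       # hypothetical true Hessian
calls = {"n": 0}

def C_test(x):
    calls["n"] += 1                            # count calls of C(x)
    return 0.5 * x @ A @ x + x.sum()

H_est = central_difference_hessian(C_test, [1.0, 2.0], [1e-3, 1e-3])
print(H_est)          # close to A
print(calls["n"])     # 1 + 2n + 2n(n - 1) = 2n^2 + 1 calls for n = 2
```

In a real run every call of `C_test` would be one projection of a virtual policy, so multiplying this count by $|L|$ and the time per policy gives the cost of the exploration grid.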
3.3.2 Determining the number of buckets
Let us now define, for every attribute $j$, the maximum over all grid points $l \in L$ of the absolute row sums of ${}^lH^e$:
$$h^e_j = \max_{l\in L}\sum_{j'\in\mathcal N}\left|{}^lH^e_{jj'}\right|, \qquad j\in\mathcal N.$$
We can now put all these $h^e_j$ in a column vector $h^e\in\mathbb R^n$. The most important attribute is then the attribute $\hat{j}$ with the largest value $h^e_{\hat{j}}$.
Let $b^{mp}_j$ denote the number of model point buckets for attribute $j$. Then we can calculate $b^{mp}_j$ with the formula
$$b^{mp}_j = \left\lceil b^{mp}_{\hat{j}}\left(\frac{h^e_j}{h^e_{\hat{j}}}\right)^{\alpha}\right\rceil, \qquad j\in\mathcal N\setminus\{\hat{j}\},$$
where $\alpha\in[0, 1]$ is a parameter that can be chosen freely to adjust the number of buckets $b^{mp}_j$. Note that the number of buckets $b^{mp}_j$, $j\in\mathcal N$, can never be more than $b^{mp}_{\hat{j}}$.
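The two formulas of this section can be sketched as below. The rule for the number of buckets is an assumption, taken here as $b^{mp}_j = \max\left(1, \left\lceil b^{mp}_{\hat{j}}\,(h^e_j/h^e_{\hat{j}})^{\alpha}\right\rceil\right)$, so that every attribute keeps at least one bucket; the function names and inputs are hypothetical.

```python
import math
import numpy as np

def attribute_importance(exploration_hessians):
    """h^e_j = max over grid points l of sum_j' |lH^e_jj'| (largest absolute row sum)."""
    H = np.abs(np.asarray(exploration_hessians, dtype=float))   # shape (|L|, n, n)
    return H.sum(axis=2).max(axis=0)

def model_point_buckets(h, b_top, alpha):
    """Buckets per attribute under the ASSUMED rule
    b_j = max(1, ceil(b_top * (h_j / h_top) ** alpha)), with b_top for the top attribute."""
    h = np.asarray(h, dtype=float)
    top = int(np.argmax(h))                    # most important attribute j-hat
    if h[top] == 0.0:                          # C looks linear everywhere: one bucket each
        return top, [1] * h.size
    buckets = [max(1, math.ceil(b_top * (hj / h[top]) ** alpha)) for hj in h]
    buckets[top] = b_top
    return top, buckets

hessians = [[[1.0, -2.0], [0.0, 0.5]],
            [[-3.0, 0.0], [1.0, 1.0]]]         # two hypothetical 2 x 2 exploration Hessians
print(attribute_importance(hessians))          # row-sum maxima per attribute
print(model_point_buckets([0.8, 0.1, 0.4, 0.0], b_top=3, alpha=1.0))
```

With $\alpha = 0$ every attribute receives $b^{mp}_{\hat{j}}$ buckets, while $\alpha = 1$ scales the buckets proportionally to the importance ratio.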
3.3.3 Creating the model points
The buckets of every attribute $j$ can be spread over its range in various ways. We could spread them evenly for every attribute $j$, or be somewhat more sophisticated and let the population distribution decide on the cut-off points. Dividing the range of every attribute $j$ in any way, we obtain $\prod_{j\in\mathcal N} b^{mp}_j$ hypercubes in $\mathbb R^n$. Note that some of these hypercubes may not contain any policies. We will not create any model points in empty sets, which means that the number of model points constructed in this way is $m \le \prod_{j\in\mathcal N} b^{mp}_j$, where $k$ indexes the model point of group $G_k$, just as in Section 3.2.
Whenever the number of model points is considered too large, one should decrease $b^{mp}_{\hat{j}}$ and recompute $b^{mp}_j$, $j\neq\hat{j}$, until the desired number of groups is formed.
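A sketch of the bucketing and grouping steps, assuming cut-off points at the population quantiles $q/b^{mp}_j$ (the option of letting the population distribution decide) and model points taken as group averages; empty hypercubes never appear in the result. The function name and data are hypothetical.

```python
import numpy as np

def build_model_points(X, buckets):
    """Group policies into occupied hypercubes and average them into model points.

    X:       (N, n) attribute matrix, one row per policy
    buckets: number of model point buckets b^mp_j per attribute
    Returns (model_points, group_sizes, labels), labels[i] = k for i in G_k.
    """
    X = np.asarray(X, dtype=float)
    N, n = X.shape
    codes = np.zeros((N, n), dtype=int)              # bucket index per policy and attribute
    for j, b in enumerate(buckets):
        if b > 1:
            cutoffs = np.quantile(X[:, j], np.arange(1, b) / b)   # b - 1 interior cut-offs
            codes[:, j] = np.searchsorted(cutoffs, X[:, j], side="right")
    # Only hypercubes that contain policies become groups G_k
    cells, labels = np.unique(codes, axis=0, return_inverse=True)
    labels = labels.reshape(-1)
    sizes = np.bincount(labels, minlength=len(cells))
    model_points = np.zeros((len(cells), n))
    np.add.at(model_points, labels, X)               # sum of x_i per group
    model_points /= sizes[:, None]                   # averages x^k
    return model_points, sizes, labels

rng = np.random.default_rng(seed=4)
X = rng.uniform(0.0, 1.0, size=(243, 3))             # hypothetical policies
mps, sizes, labels = build_model_points(X, [3, 2, 1])
print(len(mps), sizes.sum())                          # m <= 3 * 2 * 1 groups, 243 policies
```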
3.3.4 Assigning the policies to the model points
The next step is to assign a group $G_k$ to every policy $i$. This is done by checking whether the policy lies within the range of group $G_k$ in every one of the $n$ dimensions. Once we know, for every $i\in M$, the group $G_k$ with $i\in G_k$, we can compute the final model point policies $x^k$, $k\in M'$, as follows:
$$x^k = \left[\frac{\sum_{i\in G_k} x_{i1}}{|G_k|},\; \dots,\; \frac{\sum_{i\in G_k} x_{in}}{|G_k|}\right]^T, \qquad k\in M'.$$
We then need to compute ${}^kH^{mp}$, $k\in M'$, to obtain an exact upper bound on the error made by our grouping strategy. This again involves calling the function $C(x)$ $m\cdot 3\,\frac{n(n+1)}{2}$ times. Note that this is not a problem whenever $m$ is small compared to $|L|$. Otherwise we can fall back on the exploration Hessians ${}^lH^e$, $l\in L$, which we have already calculated, so that this does not cost any extra function calls. We can approximate ${}^kH^{mp}$ by considering the distance from the model point to every grid point, $x_l - x^k$, and using either the maximum or the closest ${}^lH^e$. Another, somewhat more sophisticated, method would use interpolation techniques as in [1]. We will describe an interpolation method that looks at the grid points that are closest in every direction. In $n$ dimensions we have $2^n$ closest grid points, because in every dimension we can go two ways. These are all incorporated in the estimate of $\|{}^kH^{mp}\|_\infty$:
$$\left\|{}^kH^{mp}\right\|_\infty \approx \frac{\left\|{}^{l_1}H^e\right\|_\infty\left\|x_{l_1} - x^k\right\| + \dots + \left\|{}^{l_{2^n}}H^e\right\|_\infty\left\|x_{l_{2^n}} - x^k\right\|}{\sum_{z=1}^{2^n}\left\|x_{l_z} - x^k\right\|}$$
Of course, since this is only an indication, it cannot guarantee a certain level of error, but it is still important from a practical point of view.
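The interpolation estimate above can be sketched as follows. It assumes Euclidean distances for $\|x_{l_z} - x^k\|$ and that the $2^n$ surrounding grid points and their norms $\|{}^{l_z}H^e\|_\infty$ have already been found; the function name and inputs are hypothetical.

```python
import numpy as np

def interpolate_hessian_norm(x_k, grid_points, grid_norms):
    """Distance-weighted estimate of ||kH^mp||_inf from surrounding grid points.

    x_k:         model point, length n
    grid_points: (2**n, n) coordinates x_{l_z} of the closest grid points
    grid_norms:  (2**n,) values ||lH^e||_inf at those grid points
    """
    d = np.linalg.norm(np.asarray(grid_points, dtype=float)
                       - np.asarray(x_k, dtype=float), axis=1)   # ||x_{l_z} - x^k||
    if d.sum() == 0.0:                    # all grid points coincide with x^k
        return float(np.mean(grid_norms))
    return float(np.dot(grid_norms, d) / d.sum())

corners = [[0, 0], [1, 0], [0, 1], [1, 1]]       # 2**2 hypothetical grid points
print(interpolate_hessian_norm([0.5, 0.5], corners, [1.0, 2.0, 3.0, 4.0]))  # equal weights
```

As written, the weights grow with the distance, so the farthest grid points count most; an inverse-distance weighting (weights $1/\|x_{l_z} - x^k\|$) is the more common choice and would be a one-line change.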
4 A real life example
4.1 Describing the product
The product of focus is a so-called lijfrente polis (a Dutch annuity policy), and the policies that we will look at are already in force, which means that the policyholder is currently insured. The product consists of a single premium payment. We take a snapshot at a certain moment in time and look at which policies are captured by the interval between the start date and end date⁵. This implicitly means that the premium payment has already occurred, and we will therefore only look at cash outflows. The payment takes place at the start date, and the amount is equal to the gross premium. For some reason, this is not available in the database, so the gross premium is calculated by discounting the insured amount. The insurance company invests the gross premium in stocks and bonds and, while doing so, guarantees a return on the client's investment equal to one interest rate over the first five years and another interest rate over the rest of the period. For this product there are three types of benefits for the insured: the surrender, death and maturity benefits.
Surrendering means that you have the option of withdrawing money before the end date. Whenever someone surrenders, the surrender value that he wishes to withdraw is used to buy an immediate annuity. The market value of this immediate annuity is scenario dependent and is paid out as the surrender benefit. The minimal annuity duration is based on the age and sex of the policyholder and is looked up in a specific table. The surrender cash flow is then the total discounted value of all future annuity payments. The probability associated with a surrender is scenario dependent.
Furthermore, we have a death benefit, which is equal to the insured amount. This is the money that is paid in case of death.
Finally, the product has a maturity benefit, which is equal to part of the insured amount. The maturity benefit is the money you get upon expiration of the policy.
Annuities that were already in force were also in our data set. However, since these annuities are modeled deterministically and are therefore not scenario dependent, we do not consider them. Furthermore, there were 500 scenarios, for which we knew neither the rationale nor the likelihood of occurrence.
4.2 Selecting the attributes
The first thing we need to do is get a feeling for the attributes before we can compute our Hessian. What are they, and on what scale are they projected? This becomes really important when we perturb them with our $\delta_j$.
⁵ Attributes are denoted in monospace.
In our model there are 32 policy attributes, which we have to examine. We should be careful here, as some of them are not real valued (e.g. product type), which is why we will look at one specific product type. Some other attributes have only a few outcomes over their range, which means there is not much grouping to be done. There are also many attributes that are calculated by the model, and attributes that are not used for our specific product. So first we produce the descriptive statistics of some of the attributes in Table 1. From these statistics we get a broad idea of which attributes might be interesting to group. We can already specify one on which we will not group: sex, since it has only two possible values, which are integers. After a thorough investigation of the total set of attributes, we concluded with five attributes that were thought to be the main determinants. All others were either dependent on those five or were considered critical, which means that they will not be grouped upon. The resulting attributes are presented below:
1. date of birth
2. start date
3. end date
4. insured amount
5. interest
4.3 Exploring the function
Now that we have found our set of interesting variables, we want to measure their influence on the projected cash flow under every scenario. For this, we follow our method as described in Section 3.3 and construct a grid of virtual policies. They are virtual in the sense that they do not come from the data set X; they are made using the feasible input range of every attribute. This feasible input range is assumed to accompany every program. In our program, however, we did not have this feasible input range, and we constructed it by looking at the descriptive statistics of the data set, which we present in Table 1. The feasible input range of an attribute is then determined by the difference between its maximum and minimum value.
There are various options to create such a grid, and we discuss two methods in Appendix A.1. Let us first consider a nested grid. This would require at least $(2\cdot 3)^n = 6^n$ function calls when we consider only 2 grid points per attribute (one at its minimum and one at its maximum), each also perturbed by $+\delta_j$ and $-\delta_j$. For $n = 5$ this is already 7776 calls of $C(x)$. Since a policy takes on average 70 seconds to calculate on our hardware, this would take more than 6 days, which we considered too long. Therefore we use a sequential grid construction, which means that for every attribute we move from its minimum to its maximum in 10 steps and set all other interesting attributes equal to their averages. These are then the virtual policies. We are aware that we lose a lot of accuracy in this way, and we can only hope that the averages are good representatives of the whole function. This also means that we have only one entry in the matrix $H(x_l)$, $l\in L$, namely the second-order partial derivative with respect to the attribute along whose axis we created our grid. This entry is immediately the ∞-norm of the matrix ${}^lH^e$, $l\in L$. Before we continue, we should check whether the constructed grids represent feasible combinations of the attributes. This means that no start date can fall after an end date, and that the date of birth of a person should be before the start date of a policy.
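The sequential grid and the call count of the nested alternative can be sketched as below. The attribute ranges are hypothetical, dates are encoded as plain years for the feasibility check, and the function names are illustrative only.

```python
import numpy as np

def sequential_grid(lower, upper, means, steps=10):
    """Virtual policies of a sequential grid.

    For every attribute j, move from its minimum to its maximum in `steps`
    points while all other attributes stay at their averages.
    Returns one (steps, n) array of virtual policies per attribute.
    """
    lower, upper, means = (np.asarray(v, dtype=float) for v in (lower, upper, means))
    grids = []
    for j in range(means.size):
        points = np.tile(means, (steps, 1))          # every row starts at the averages
        points[:, j] = np.linspace(lower[j], upper[j], steps)
        grids.append(points)
    return grids

def is_feasible(policy, birth=0, start=1, end=2):
    """Hypothetical check: date of birth < start date < end date (dates as years)."""
    return policy[birth] < policy[start] < policy[end]

# Hypothetical ranges for (date of birth, start date, end date, insured amount, interest)
lower = [1940, 1990, 1995, 1_000.0, 0.02]
upper = [1980, 2008, 2040, 100_000.0, 0.05]
means = [1960, 1999, 2020, 25_000.0, 0.03]
grids = sequential_grid(lower, upper, means)
feasible = [g[np.array([is_feasible(p) for p in g], dtype=bool)] for g in grids]

# Nested grid: 2 points per attribute, each also perturbed by +delta and -delta.
n, seconds_per_call = 5, 70
nested_calls = (2 * 3) ** n
print(nested_calls, nested_calls * seconds_per_call / 86_400)  # 7776 calls, roughly 6.3 days
```

Here the first end date of the end-date axis falls before the average start date, so that virtual policy is dropped, which is exactly the kind of infeasible combination described above.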
After constructing the grid points, we can feed them one by one into the model and obtain the resulting discounted cash flows for every scenario.
4.4 Interpreting the function
Now that we have projected the cash flows of every virtual policy for 500 scenarios, we first investigate how the cash flow moves over the ten grid points. We look at scenario 1, which is plotted in Figure 2.
A first step is interpreting these graphs. Starting with the cash flows for the date of birth, we see that the later you are born, the more negative the projected cash flow becomes. This can be explained by the fact that the later you were born, the younger you are now and the longer you are expected to live. The insured amount is built up of two components, the death benefit insured amount and the surrender benefit insured amount. The ratio between the death benefit and the surrender benefit is on average 1:4, which makes the surrender benefit the strongest determinant of the projected cash flows. Once you have died, there is no longer any possibility to surrender. This means that the longer you live, the more surrender benefit you will get, possibly in addition to the death benefit.
The start date fluctuates a lot, and one can hardly tell what drives this function. However, if we look at the scale, it is not an interesting variable at all. We can therefore simply put this attribute in one bucket without making a significant error.
Looking at the end date, we see that when we move the end date to a later moment, the cash flow increases, which means that the liability decreases (the cash flows are all negative). Since we expected the liability to increase when a policy has a longer duration, this result is really counterintuitive. However, looking at the actuarial model, it can easily be explained. By shifting the end date to a later moment, we increase the duration. When the investment premium is calculated, it makes use of the insured amount, the duration and the interest rate. Discounting the insured amount at the same interest rate over a longer term results in a lower investment premium. In turn, the investment premium is used to calculate the possible amount of surrender cash flows, which is now of course lower.
Considering the insured amount, we see a linear relationship, at the chosen grid points, between the projected cash flows and the insured amount. However, this is only an indication; the function may still fluctuate between the grid points. Nevertheless, when making the buckets, we expect that we can put this attribute in one bucket without making any significant error.
Discounting at a higher interest rate results in a lower investment premium, which again causes the discounted cash flows to be lower.
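Both effects can be illustrated with a simplified premium calculation. This is a hypothetical sketch, not the actuarial model used here: it discounts the insured amount with annual compounding at one guaranteed rate for the first five years and another rate afterwards, and all inputs are made up.

```python
def investment_premium(insured_amount, rate_first, rate_rest, duration, switch=5):
    """Hypothetical simplification: discount the insured amount at one guaranteed
    rate for the first `switch` years and another rate afterwards (annual compounding)."""
    first = min(duration, switch)
    rest = max(duration - switch, 0)
    return insured_amount / ((1 + rate_first) ** first * (1 + rate_rest) ** rest)

short = investment_premium(10_000, 0.03, 0.02, duration=10)
longer = investment_premium(10_000, 0.03, 0.02, duration=20)   # later end date
high = investment_premium(10_000, 0.04, 0.03, duration=10)     # higher interest rates
print(round(short, 2), round(longer, 2), round(high, 2))
```

A longer duration and a higher rate both enlarge the discount factor, so both lower the premium and, through it, the surrender cash flows.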
What we can see from these functions is that they are either monotonically increasing or monotonically decreasing, except for the start date, which was not considered important because of its scale. Had its scale been very large, however, the number of buckets needed would have been very large, which would cause issues. All the functions seem to be not far from linear, and we therefore do not expect large errors at all, even if we group them all into one large model point (we will discuss this further in Section 4.6).
To estimate the second-order partial derivatives for every attribute, we use the central difference formula. The graphs are presented in Figure 3. Looking at them, one can first of all see that all second-order derivatives are close to zero, which indicates linearity. Since we did not know the whole cash flow function, a function like that of the end date may fluctuate a little between the grid points; this can simply be seen as numerical noise. Using the recipe presented in Section 3.3, we take the absolute second-order partial derivative at every grid point and use the maximum of these to determine the number of buckets for the attribute. The results are presented in Table 3, sorted by importance. We now see that the end date is considered the most important attribute. Setting the number of buckets for the end date automatically induces the number of buckets for the other attributes, as defined in Section 3.3.2.
4.5 Generating model points
Until now, we have not yet used any policy data. The projected cash flow functions were solely based on the feasible range of the attributes, as discussed in Section 4.3. The policy data that we will use comes from an existing insurance company; however, for privacy and computational reasons we have modified the data. We now use these policies for the bucketing of the attributes, by spreading the population evenly over the buckets as described in Section 3.3.3. In Microsoft Visual Basic for Applications (VBA) we developed a script that produces a cumulative distribution function, which gave us the cut-off points for the buckets at multiples of $1/b_j$. Now that we have set our buckets, we need to find out which policies are in which group. We do this using the method described in Section 3.3.4. For this we again used VBA, checking whether a policy lies within the range of every bucket's cut-off points and coding every policy with a group number. The model points are then the averages of every attribute over the policies within the group.
4.6 Results
We first produce a base run, in which we calculate the cash flows using all single policies. Our resulting data set contains 243 policies. The important statistics we need are the calculation time and the resulting present value per scenario, which we present in Table 2. As shown in Table 2, calculating the cash flows for all single policies took 160 minutes.
4.6.1 Grouping
To show our method at work, we explain step by step what we can achieve. Table 3 is produced as described in Section 3.3.2, and from it we can see that the end date requires the most groups. We initially set the end date to 3 groups, which, according to our formula in Section 3.3.2, gives the date of birth 2 groups and the insured amount and the interest rate only 1. Therefore we go from 243 individual policies to 6 model point policies. However, let us start bottom up to show the algorithm at work. We start by grouping the insured amount into one bucket. Since this attribute looked linear, we do not expect any error apart from the error that we made by assuming that the cross derivatives are zero. When grouping this attribute away, we could decrease the number of policies by a factor of 3, i.e. from 243 policies to only 81 model point policies. Computation time dropped from 160 minutes to only 62 minutes. If we now look at our cash flow deviation, we still have an accuracy of 99.9995%, which is in line with our theory.
Next, as we have seen, we can eliminate the start date attribute, since it does not contribute much to the discounted cash flow. By doing this, we effectively group this attribute away as well and set its value equal to its average. Again the number of policies is reduced by a factor of 3, i.e. from 81 model point policies to 27. Our accuracy drops slightly, but we are still 99.83% accurate. Solving this took only 20 minutes.
The interest rate is then grouped into one bucket, which leaves us with only 9 model point policies. Computation time is now only 7 minutes and, quite unexpectedly, the accuracy is only 97.15%, which is still fair for most life insurers, since they aim for an accuracy between 95% and 98%. The result was unexpected given that the interest rate looked linear. However, recall that we only gained very local insight, by assuming that the cross derivatives are zero and by having at every grid point only the second-order partial derivative in one direction. Although the interest rate is nearly linear on this local part of the domain, it apparently fluctuates more elsewhere.
We therefore check whether the date of birth should perhaps have been next on the list instead of the interest rate. When we now group, in addition to the insured amount and the start date, the date of birth into one model point, we are 98.25% accurate. This is also unexpected, but it can be explained by the weakness of our local grid (see Appendix A.1).
However, let us continue in the way our algorithm predicted. In the end, this leaves us with 6 model points, which are 95.91% accurate and took only 3.3 minutes to compute. So what would happen if we grouped everything into a single model point? Doing this still leaves us with 95.72% accuracy. Again we are confronted with the limited reliability of the local insight of a sequential grid. The function $C(x)$ seems to be very flat and does not seem to fluctuate much over the whole domain. Since insurance companies really do have trouble grouping policies while maintaining accuracy, this is generally not the case. It could be due to the chosen product or the restriction on the input parameters. The product was, however, not very well suited for demonstrating our method.
⁵ All runs were performed on an IBM Lenovo T61 with an Intel Core 2 Duo T7500 2.2 GHz processor and 2.0 GB of RAM. Microsoft Visual Basic for Applications was run on the Windows XP operating system.
4.6.2 Approximate deviation from the base run
For the insured amount we looked at the error that we can predict when grouping this attribute into one bucket. Using the interpolation method described in Section 3.3.5, our model estimates an accuracy of 99.999999%, which is higher than the actual accuracy, as we can see in Table 3. Although this does not look too bad compared to the actual 99.9995%, one has to put it in perspective relative to the total error range we are looking at: we will never be less accurate than 95.72%, since this is the accuracy obtained when we group all policies into one model point. Again, we cannot be conclusive, and the lack of a good grid disturbs the outcomes.
5 Conclusion and future work
In this paper we have provided a solution for estimating the error induced by the use of model points, a problem that insurance companies were not even aware of. They estimated the errors of their grouping strategy based on an outdated base run. We have shown how we can correctly upper bound these errors without having to calculate the base run again.
Furthermore, we have defined a way in which insurance companies could group their policies. Whenever a linear attribute is encountered, it can be grouped away without making any error at all. If an attribute is non-linear, it should be grouped according to the Hessian in a certain area of the domain of the function that discounts the cash flows.
Even if the exact analytical function is not available, we have shown numerical ways to group the policies. We made use of grids to explore the function that calculates the present value of the cash flows. These exploration grids do consume a lot of time; however, if one is willing to make the one-time investment of exploring this function, we can upper bound the error made. Moreover, if one does not want to invest too much in the construction of the grid, we can still provide an approximate upper bound.
We have illustrated the method with a simple but real-life example. We have shown the trade-off and drawbacks of a fast, local exploration of the landscape with a sequential grid versus a slow nested grid construction. The somewhat weak results can be attributed to the unfortunate choice of product and the lack of computing power for a good grid.
Improvements can be made by distributing the buckets better over an attribute. This could be done by distributing them according to the Hessian over the range of the attribute, instead of uniformly. Regarding the grids, a more sophisticated grid, such as an adaptive one, may be of great help. However, if we know the analytical function, we might not need the exploration grids at all and can perfectly upper bound the errors. The exact calculations can be performed using software such as Mathematica or Maple.
There is still a lot of work to be done in this area; however, a first step has been made which can greatly benefit insurance companies.
References
[1] Robert S. Anderssen and Markus Hegland. For numerical differentiation, dimensionality can be a blessing. Mathematics of Computation, 68(227):1121–1141, February 1999.
[2] J. Ghosh, A. Strehl and G. K. Gupta. Distance based clustering of association rules. Department of Electrical and Computer Engineering, 1991.
[3] Prof. dr. A. Oosenbrug RA AAG. Levensverzekering in Nederland. Shaker Publishing, 1999.
[4] S. Z. Wang, G. Nakamura and Y. B. Wang. Numerical differentiation for the second order derivative of functions with several variables. Mathematics Subject Classification, 1991.
[5] Hans U. Gerber. Life Insurance Mathematics. Springer, 1997.
[6] R. Bulirsch and J. Stoer. Introduction to Numerical Analysis. Springer-Verlag, 2nd edition, 1991.
[7] J. M. Mulvey and H. P. Crowder. Cluster analysis: An application of Lagrangian relaxation. 1979.
[8] J. M. Mulvey and H. P. Crowder. Impact of similarity measures on web-page clustering. 2000.
[9] C. M. Procopiuc and P. K. Agarwal. Exact and approximation algorithms for clustering. Management Science, 25(4):329–340, 1997.
[10] D. S. McRae, R. K. Srivastava and M. T. Odmany. An adaptive grid algorithm for air-quality modeling. Journal of Computational Physics, (165):437–472, 2000.
[11] Vladimir I. Rotar. Actuarial Models: The Mathematics of Insurance. Chapman & Hall, 2006.
[12] J. Rowland and D. Dullaway. Smart modeling for a stochastic world. Emphasis Magazine, 2004.
[13] M. Sarjeant and S. Morrison. Advances in risk management systems of life insurers. Information Technology, September 2007.
[14] HM Treasury. Solvency II: A new framework for prudential regulation of insurance in the EU. Crown, February 2006.
[15] HM Treasury. Supervising insurance groups under Solvency II: A discussion paper. Crown, February 2006.
[16] G. R. Wood and B. P. Zhang. Estimation of the Lipschitz constant of a function. Journal of Global Optimization, 8(1):91–103, January 1996.
A Appendix
A.1 Grid construction
When we do not have an analytical model, we rely on numerical methods. To
gain insight into our cash flow function, in order to later construct the
Hessian, we need to know the value of C(x) evaluated at different points. A
correct grid can only be constructed when we can isolate an attribute. This
means we fix all attributes at a value except one, which we let increase over
its range in a number of steps determined by the chosen grid size. We can do
this for every attribute. However, we need to check that the policies
generated in this fashion are feasible: no end date may fall before a start
date, etc.⁶ There are many ways to construct such a grid, see [10], but we
will present two: sequential and nested grid construction. A nested grid
construction considers all possible combinations of the attribute values. In
contrast, a sequential grid construction calculates a grid per attribute
while fixing the other attributes at a certain value. Therefore the set of
points obtained by the sequential construction is a subset of the nested
version. Although the nested version is far more accurate, its computational
complexity is overwhelming [1].
⁶ If we let our grid size go to infinity, we obtain the exact n-dimensional landscape.
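The feasibility requirement described above (no end date before a start date) can be applied as a filter on the generated grid points before any cash flows are computed. In this minimal sketch the column positions `START` and `END` and the rule `end >= start` are hypothetical placeholders; a real product will have its own set of constraints.

```python
import numpy as np

# Hypothetical column positions of the (scaled) start and end date attributes.
START, END = 0, 1

def feasible(points: np.ndarray) -> np.ndarray:
    """Boolean mask: True where the end date does not fall before the start date."""
    return points[:, END] >= points[:, START]

grid = np.array([[0.2, 0.5, 0.3],
                 [0.6, 0.4, 0.9],
                 [0.5, 0.5, 0.1]])
valid = grid[feasible(grid)]  # keeps rows 0 and 2; row 1 ends before it starts
```

Filtering before evaluation matters because every discarded point saves a full cash flow projection.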
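Algorithm A.1: Sequential grid construction(C(x))
  $\varepsilon_j = [0 \, \dots \, 0 \;\; \varepsilon_j \;\; 0 \, \dots \, 0]^T$ (nonzero entry in position $j$)
  for $r \leftarrow 0$ to $|R|$ do
    $x_{r1} = \min_i x^i_1 + r \, \frac{\max_i x^i_1 - \min_i x^i_1}{|R|}$
    $x_{r2} = \bar{x}_2, \; x_{r3} = \bar{x}_3, \; \dots, \; x_{rn} = \bar{x}_n$
    $C(x_r)$, $C(x_r + \varepsilon_1)$, $C(x_r - \varepsilon_1)$
  for $r \leftarrow 0$ to $|R|$ do
    $x_{r2} = \min_i x^i_2 + r \, \frac{\max_i x^i_2 - \min_i x^i_2}{|R|}$
    $x_{r1} = \bar{x}_1, \; x_{r3} = \bar{x}_3, \; \dots, \; x_{rn} = \bar{x}_n$
    $C(x_r)$, $C(x_r + \varepsilon_2)$, $C(x_r - \varepsilon_2)$
  ⋮
  for $r \leftarrow 0$ to $|R|$ do
    $x_{rn} = \min_i x^i_n + r \, \frac{\max_i x^i_n - \min_i x^i_n}{|R|}$
    $x_{r1} = \bar{x}_1, \; x_{r2} = \bar{x}_2, \; \dots, \; x_{r,n-1} = \bar{x}_{n-1}$
    $C(x_r)$, $C(x_r + \varepsilon_n)$, $C(x_r - \varepsilon_n)$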
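Algorithm A.1 can be sketched in Python as follows. We assume that `C` accepts a NumPy vector of n attributes, that the policies are the rows of an array `X`, and that the non-varied attributes are fixed at their averages, the choice made in this appendix. The helper `diag_hessian` applies the standard central second difference $(C(x+\varepsilon_j) - 2C(x) + C(x-\varepsilon_j))/\varepsilon^2$ to turn the three evaluations per grid point into an estimate of $H_{jj}$; this is our assumption about how the evaluations feed the Hessian, and both function names are illustrative.

```python
import numpy as np

def sequential_grid(C, X, R, eps):
    """Sketch of Algorithm A.1.

    C   : callable mapping an attribute vector of length n to a present value
    X   : (m, n) array of policies, one row per policy
    R   : number of buckets |R| per attribute
    eps : step size of the perturbation vectors eps_j
    Returns {j: [(x_r, C(x_r), C(x_r + eps_j), C(x_r - eps_j)), ...]}.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    mean = X.mean(axis=0)  # the other attributes are fixed at their averages
    out = {}
    for j in range(n):
        e_j = np.zeros(n)
        e_j[j] = eps
        rows = []
        for r in range(R + 1):  # r = 0, ..., |R| as in the algorithm
            x_r = mean.copy()
            x_r[j] = lo[j] + r * (hi[j] - lo[j]) / R
            rows.append((x_r, C(x_r), C(x_r + e_j), C(x_r - e_j)))
        out[j] = rows
    return out

def diag_hessian(rows, eps):
    """Central second difference estimate of H_jj at every grid point of one attribute."""
    return [(c_plus - 2.0 * c0 + c_minus) / eps**2 for _, c0, c_plus, c_minus in rows]
```

Because `r` runs from 0 to |R| inclusive, this performs $3n(|R|+1)$ evaluations, one grid point per attribute more than the $3n|R|$ count used in the cost estimate. For a quadratic test function the second differences are exact up to rounding, which makes it a convenient sanity check.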
As an illustration, let R be the set of buckets for every attribute j, and
$r \in R$ a bucket. Consider constructing $|R|$ buckets for every attribute j.
The nested construction of Algorithm A.2 then visits $|R|^n$ grid points and
evaluates each of them together with its $2n$ perturbations $\pm\varepsilon_j$,
so it computes $|R|^n(1+2n)$ policies in total, which is exponential in the
number of attributes n.
Suppose now that computing the cash flow of one policy takes t seconds. The
nested grid construction then takes $t|R|^n(1+2n)$ seconds. To visualize this,
consider 4 attributes, for each of which we would like to compute 10 grid
points, and suppose a single policy takes 70 seconds to compute on this
computer. This would take $70 \cdot 10^4 \cdot 9 = 6.3 \cdot 10^6$ seconds, or
about 73 days. For the purpose of this paper this was considered too long, so
we look at the sequential grid instead, which takes $3nt|R|$ seconds and is
faster by a factor of
$$\frac{|R|^{n-1}(1+2n)}{3n}.$$
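The cost figures above can be reproduced with a few lines of arithmetic, using the counts of $|R|^n(1+2n)$ evaluations for the nested grid and $3n|R|$ for the sequential grid. The function names are illustrative.

```python
def nested_cost(t: float, R: int, n: int) -> float:
    """Seconds for the nested grid: |R|^n grid points, 1 + 2n evaluations each."""
    return t * R**n * (1 + 2 * n)

def sequential_cost(t: float, R: int, n: int) -> float:
    """Seconds for the sequential grid: n attributes x |R| points x 3 evaluations."""
    return 3 * n * t * R

def speedup(R: int, n: int) -> float:
    """Closed-form ratio |R|^(n-1) (1 + 2n) / (3n) of the two costs."""
    return R ** (n - 1) * (1 + 2 * n) / (3 * n)

t, R, n = 70, 10, 4  # seconds per policy, grid points per attribute, attributes
print(nested_cost(t, R, n) / 86_400)     # about 72.9 days
print(sequential_cost(t, R, n) / 3_600)  # about 2.3 hours
print(speedup(R, n))                     # 750.0
```

The sequential grid for the same example therefore needs 8,400 seconds instead of 6,300,000, a factor of 750, in line with the closed-form ratio.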
For the sequential version we need to choose at which values to fix the other
attributes. For now we fix them at their averages, but we are well aware that
this might not be the correct choice. In addition, the few points created in
the sequential loops only give very local insight into the true n-dimensional
space.
Algorithm A.2: Nested grid construction(C(x))
  for $r_1 \leftarrow 0$ to $|R|$ do
    $x_{r_1,1} = \min_i x^i_1 + r_1 \, \frac{\max_i x^i_1 - \min_i x^i_1}{|R|}$
    for $r_2 \leftarrow 0$ to $|R|$ do
      $x_{r_2,2} = \min_i x^i_2 + r_2 \, \frac{\max_i x^i_2 - \min_i x^i_2}{|R|}$
        ⋮
        for $r_n \leftarrow 0$ to $|R|$ do
          $x_{r_n,n} = \min_i x^i_n + r_n \, \frac{\max_i x^i_n - \min_i x^i_n}{|R|}$
          $C(x_{r_1,1}, x_{r_2,2}, \dots, x_{r_n,n})$
          $C(x_{r_1,1} + \varepsilon_1, x_{r_2,2}, \dots, x_{r_n,n})$
          ⋮
          $C(x_{r_1,1}, x_{r_2,2}, \dots, x_{r_n,n} + \varepsilon_n)$
          $C(x_{r_1,1} - \varepsilon_1, x_{r_2,2}, \dots, x_{r_n,n})$
          ⋮
          $C(x_{r_1,1}, x_{r_2,2}, \dots, x_{r_n,n} - \varepsilon_n)$
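A direct Python sketch of Algorithm A.2 can replace the n explicit nested loops with `itertools.product` over the per-attribute bucket values. As before, we assume `C` accepts a NumPy vector of n attributes and that the policies are the rows of `X`; the name `nested_grid` is illustrative. Because each loop runs from 0 to |R| inclusive as written, this visits $(|R|+1)^n$ grid points rather than the $|R|^n$ used in the cost estimate.

```python
import itertools
import numpy as np

def nested_grid(C, X, R, eps):
    """Sketch of Algorithm A.2: evaluate C at every combination of bucket values
    and at the 2n perturbed points x + eps_j and x - eps_j around each of them."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    lo, hi = X.min(axis=0), X.max(axis=0)
    # axes[j] holds the |R| + 1 bucket values of attribute j.
    axes = [lo[j] + np.arange(R + 1) * (hi[j] - lo[j]) / R for j in range(n)]
    E = eps * np.eye(n)  # row j is the perturbation vector eps_j
    results = []
    for combo in itertools.product(*axes):  # every (x_{r1,1}, ..., x_{rn,n})
        x = np.array(combo)
        plus = [C(x + E[j]) for j in range(n)]
        minus = [C(x - E[j]) for j in range(n)]
        results.append((x, C(x), plus, minus))
    return results
```

Counting the calls to `C` makes the exponential growth visible: with n = 2 and |R| = 3 there are 16 grid points and 16 · 5 = 80 evaluations, and each extra attribute multiplies the grid by another factor |R| + 1.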
Faster hardware or splitting the workload over multiple processing units could
help speed up the grid construction; however, this does not remove the
exponential dependence on n, which is the main driver of the long runtime. The
bottom line is that one should really make use of adaptive grids.
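As a hedged illustration of what an adaptive grid can look like in one dimension, the sketch below refines an interval only where the cash flow function bends: it bisects an interval whenever the value at the midpoint deviates from the straight line through the end points by more than a tolerance `tol`. This midpoint-versus-chord rule is a simple stand-in chosen for illustration; it is not the adaptive algorithm of [10], and `max_depth` is an assumed safeguard against unbounded refinement.

```python
def adaptive_points(f, a, b, tol, max_depth=12):
    """Return sorted sample points on [a, b], bisecting an interval whenever the
    midpoint value differs from the chord through its end points by more than tol."""
    points = {a, b}

    def refine(left, f_left, right, f_right, depth):
        mid = 0.5 * (left + right)
        f_mid = f(mid)
        # Stop when the function is close to linear here (low curvature).
        if depth >= max_depth or abs(f_mid - 0.5 * (f_left + f_right)) <= tol:
            return
        points.add(mid)
        refine(left, f_left, mid, f_mid, depth + 1)
        refine(mid, f_mid, right, f_right, depth + 1)

    refine(a, f(a), b, f(b), 0)
    return sorted(points)
```

A linear function keeps only its two end points, while a function that is flat on one half and curved on the other receives almost all of its points on the curved half, which is exactly the behaviour the uniform exploration grid lacks.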
A.2 Notations
1. $D = [0,1]^n$: the n-dimensional space in which the scaled function C(x) lives
2. $M = \{1, \dots, m\}$: represents the policies, where $i \in M$ is a policy
3. $M' = \{1, \dots, m'\}$: represents the model points, where $k \in M'$ is a model point
4. $N = \{1, \dots, n\}$: represents the policy attributes, where $j \in N$ is an attribute
5. $L$: represents the grid points, where $l \in L$ is a grid point
6. $G_k$: group of policies represented by model point k, with $G_k \subseteq M$, $G_k \cap G_{k'} = \emptyset$ for all $k \neq k'$, and $\bigcup_k G_k = M$
7. $x \in \mathbb{R}^n$: a generic policy consisting of n attributes
8. $x^i \in \mathbb{R}^n$: a specific policy i
9. $x_k \in \mathbb{R}^n$: model point policy k
10. $x^i_j \in [0,1]$: the value of attribute j for policy i
11. $C(x): \mathbb{R}^n \to \mathbb{R}$: the function which discounts the future cash flows of policy x
12. $b^e_j$: the number of buckets for attribute j in the exploration grid
13. $b^{mp}_j$: the number of buckets for attribute j in the model point grid, with $\prod_{j \in N} b^{mp}_j = m'$
14. $H \in \mathbb{R}^{n \times n}$: the general Hessian
15. $H_j \in \mathbb{R}^n$: the jth row in H
16. $H_{jj} \in \mathbb{R}$: the element at row j and column j of the Hessian
17. H
N
l
H
jj
: the maximum sum over the rows in every
exploration grid point l, for every attribute j
21. $h^e \in \mathbb{R}^n = [h^e_1 \; h^e_2 \; \dots \; h^e_m]^T$
22. h
e

= H