
Supplementary notes for Exam C

Overview
1.1 Introduction

In July and August 2013, the SoA added a number of questions to the sample exam questions
document for Exam C on the Be-an-Actuary website. These were to cover syllabus items recently
added to Exam C. The attached note covers the additional material needed for these syllabus
items.

There are three sections in this note. The first looks at the idea of an extreme value distribution.
The second describes an alternative approach to dealing with large data sets. The final section
introduces a number of additional simulation techniques in various situations.

As you read this material, you should have in mind as far as possible the material in Chapter 2
for the first section, that in Chapter 7 for the second section, and that in Chapter 12 for the final
section.

We have given the syllabus items themselves in an appendix to this note.


1.2 Extreme value distributions

The following section should be read in conjunction with Chapter 2 of the textbook.

There are some areas of insurance work where it is useful to model quantities using distributions
with particularly heavy tails. One example of a situation like this would be when constructing a
model for the largest value in a set of independent and identically distributed random variables.
If we are trying to model the maximum value from a random sample, intuitively this maximum
value is likely to be in some sense large. We may therefore want to model it with a distribution
which is heavy-tailed.

Distributions of this type are known as extreme value distributions. In the situation outlined
above, the inverse Weibull distribution is often used as a model. This is related to the Weibull
distribution studied earlier as follows.

Inverse Weibull distribution (Fréchet distribution)

If a random variable X has a Weibull distribution, then the random variable Y = 1/X is said to
have an inverse Weibull distribution. The inverse Weibull distribution has the following
attributes:

PDF: f(x) = τ (θ/x)^τ e^{-(θ/x)^τ} / x

CDF: F(x) = e^{-(θ/x)^τ}

Moments: E[X^k] = θ^k Γ(1 - k/τ), k < τ

This distribution is sometimes known as the Fréchet distribution. More details for the inverse
Weibull distribution are given in the Tables for Exam C.

The process of finding the distribution for the random variable Y = 1/X can be applied to other
distributions. Examples of the inverse exponential distribution and the inverse gamma
distribution are given in the Tables for Exam C.

Other distributions which are sometimes used in this context are the Gumbel distribution, and
the Weibull distribution itself (without inversion). Details for the Gumbel distribution are given
below.

Gumbel distribution

A random variable X is said to have a Gumbel distribution if:

f(x) = (1/θ) e^{-y} exp(-e^{-y})   where y = (x - μ)/θ, and -∞ < x < ∞

The distribution function is:

F(x) = exp(-e^{-y})

The details for the Weibull distribution itself are given in Chapter 2 of the textbook. The Pareto
distribution also has a thick tail, and can sometimes be used in these situations.


1.3 Large data sets – an alternative approach

The following section should be read in conjunction with Chapter 7 of the textbook.

We have seen in Chapter 7 how the Kaplan-Meier method can be adapted for use with large data
sets. Here we look at another approach for calculating mortality rates when large numbers of
lives are involved. To illustrate the main features of the method, we shall use a small sample of
six lives.

The exact exposure method

A company is trying to estimate mortality rates for the holders of a certain type of policy. It has
the following information about a group of 6 lives, who all hold a policy of this type. The
investigation ran for a three-year period, from Jan 1 2010 to Dec 31 2012.

Life Date of birth Date of purchase Mode of exit Date of exit

1 Mar 1 1965 Jul 1 2009 Alive Dec 31 2012

2 Jul 1 1965 Nov 1 2009 Death Mar 1 2011

3 Aug 1 1965 Apr 1 2010 Surrender Feb 1 2012

4 Apr 1 1965 Jun 1 2011 Alive Dec 31 2012

5 May 1 1965 Aug 1 2010 Surrender Jun 1 2012

6 Oct 1 1965 May 1 2010 Death Apr 1 2012

We see that of the 6 lives, two survived within the population to the end of the investigation, two
of the lives surrendered their policies while the investigation was in progress, and two died
during the period of the investigation. We wish to use the information in the table above to
estimate mortality rates at various ages. We shall assume that each month is exactly one-twelfth
of a year, to simplify the calculations.

We start by finding the age at which each life started to be observed, and the age at which each
life ceased to be observed. Note that although Life 1 purchased his policy on July 1 2009, the
investigation had not started at that point. So the date on which Life 1 is first observed is
January 1 2010. Life 2 is also first observed on this date.


This gives us the following table of ages:

Life   Age at first observation   Age at last observation

1      44 10/12                   47 10/12
2      44 6/12                    45 8/12
3      44 8/12                    46 6/12
4      46 2/12                    47 9/12
5      45 3/12                    47 1/12
6      44 7/12                    46 6/12

In order to estimate the mortality rates, we need to find out the length of time for which each life
was alive, and for which they were a member of the investigation. We need to subdivide these
periods by age last birthday. So, for example, we shall use e44 for the period of time during
which a life (or group of lives) was aged 44 last birthday, e45 for the corresponding period of
time for which lives were aged 45 last birthday, and so on.

From the table above, we can now find the contribution of each life to each of e44, e45, e46 and
e47. This gives us the following table of figures (in months):

Life   Age at first observation   Age at last observation   e44   e45   e46   e47

1      44 10/12                   47 10/12                  2     12    12    10
2      44 6/12                    45 8/12                   6     8     -     -
3      44 8/12                    46 6/12                   4     12    6     -
4      46 2/12                    47 9/12                   -     -     10    9
5      45 3/12                    47 1/12                   -     9     12    1
6      44 7/12                    46 6/12                   5     12    6     -

This gives us totals in each of the e_k columns of 17, 53, 46 and 20 respectively.


We can now use these to calculate estimates of the mortality rates. It can be shown that d_j/e_j
provides us with the maximum likelihood estimate of the hazard rate at each age. Noting that
Life 2 dies at age 45 last birthday, and that Life 6 dies aged 46 last birthday, we can find estimates
of the hazard rates at these two ages:

ĥ45 = 1/(53/12) = 0.22642   and:   ĥ46 = 1/(46/12) = 0.26087

Note that we do not have enough data to provide estimates of the hazard rates at any other age.
Alternatively we could claim, without much conviction, that our estimates of the mortality rates
at ages 44 and 47 were zero, based on this very small sample of data.

If we wish to find the values of the corresponding q-type mortality rates, we use the
relationship:

q̂x = 1 - e^(-ĥx)

In this case we obtain corresponding q-type rates of 0.20261 and 0.22962 respectively.

The method we have used here is called the exact exposure method. We have calculated the
exact period of time for which a group of lives has been exposed to the risk of death for a
particular age.
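The exact exposure calculation above can be sketched in code. The following is our own minimal illustration (the function and variable names are ours, not the textbook's): ages are held in months, and each life's observed period is split month by month according to age last birthday.

```python
from math import exp

# Ages at first and last observation, in months (e.g. 44 10/12 -> 44*12 + 10),
# plus mode of exit: 'D' = death, 'W' = withdrawal or survival to the end.
lives = [
    (44*12 + 10, 47*12 + 10, 'W'),   # Life 1
    (44*12 + 6,  45*12 + 8,  'D'),   # Life 2
    (44*12 + 8,  46*12 + 6,  'W'),   # Life 3
    (46*12 + 2,  47*12 + 9,  'W'),   # Life 4
    (45*12 + 3,  47*12 + 1,  'W'),   # Life 5
    (44*12 + 7,  46*12 + 6,  'D'),   # Life 6
]

def exact_exposure(lives):
    """Months of exposure and death counts, split by age last birthday."""
    exposure, deaths = {}, {}
    for start, end, mode in lives:
        for m in range(start, end):          # one month of exposure at a time
            age = m // 12                    # age last birthday for that month
            exposure[age] = exposure.get(age, 0) + 1
        if mode == 'D':
            age_at_death = end // 12
            deaths[age_at_death] = deaths.get(age_at_death, 0) + 1
    return exposure, deaths

exposure, deaths = exact_exposure(lives)
h45 = deaths[45] / (exposure[45] / 12)   # hazard rate estimate at age 45
q45 = 1 - exp(-h45)                      # corresponding q-type rate
```

Running this reproduces the column totals 17, 53, 46 and 20 months and the estimates above.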

The actuarial exposure method

An alternative approach is to use what is called the actuarial exposure method. This provides us
with a direct estimate of the q-type mortality rates, but it is perhaps not so intuitively appealing.
We proceed as follows:

(1) Calculate the contribution of each life to each of the e_j figures, as above.

(2) For each of the lives that die (and only for the deaths), add in the period of time from the
date of death until the end of the year of age (ie the period of time until the life would
have achieved its next birthday). This increases the contribution from the deaths to one
(or sometimes two) of the e_j figures.

(3) The q-type rates are now given directly by d_j/e_j.

If we apply this method to the data given above, we have the following alterations:

(a) Life 2 now contributes 12 months to e45 .

(b) Life 6 now contributes 12 months to e46 .

All the other figures in the table are unchanged. The column totals are now 17, 57, 52 and 20
respectively. If we recalculate our mortality estimates, we find that

q̂45 = 1/(57/12) = 0.21053   and:   q̂46 = 1/(52/12) = 0.23077
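The extra exposure credited to the deaths can be computed directly. Here is a small sketch of our own (ages are again held in months): each death contributes the months from the date of death to the next birthday, at the age last birthday at death.

```python
# Ages at death in months: Life 2 dies at age 45 8/12, Life 6 at 46 6/12.
death_ages_months = [45*12 + 8, 46*12 + 6]

def actuarial_additions(death_ages_months):
    """Extra months of exposure under the actuarial exposure method:
    from the month of death up to the next birthday, credited to the
    age last birthday at death."""
    extra = {}
    for m in death_ages_months:
        age = m // 12                              # age last birthday at death
        extra[age] = extra.get(age, 0) + (age + 1) * 12 - m
    return extra

extra = actuarial_additions(death_ages_months)
# Adding these to the exact-exposure totals 17, 53, 46, 20 gives the
# actuarial totals 17, 57, 52, 20 used above.
```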


Although we have used a sample of only 6 lives in these examples, you should be able to see that
the method generalizes easily, and can cope with large data sets without any real increase in the
level of difficulty of the calculation.

The approaches outlined above are sometimes called seriatim methods. This refers to the fact
that the data points are analyzed as a series of independent observations.

Insuring ages

A variation on this idea is to use the concept of insuring ages. In this case, an insurer will
designate each policyholder to have their birthday on the date on which the policy was first taken
out. So, for example, if a person is aged 45 last birthday when he takes out his policy, we treat
him as if he is aged exactly 45 on the issue date. This means that some of the elements of the
exposure will be assigned to younger ages than would be the case when using the policyholder’s
true birthday.
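The shift of the birthday can be written down explicitly. The following is a minimal sketch of our own (the function name is ours): the insuring birthday keeps the policyholder's age last birthday at purchase, but moves the birthday to the purchase date's month and day.

```python
from datetime import date

def insuring_birthday(dob, purchase):
    """New (insuring) date of birth under an age-last-birthday rule:
    the policyholder is treated as attaining an exact integer age,
    equal to the age last birthday, on the purchase date."""
    age_last = purchase.year - dob.year
    if (purchase.month, purchase.day) < (dob.month, dob.day):
        age_last -= 1                    # birthday not yet reached that year
    # shift the birthday to the purchase date's month and day
    return date(purchase.year - age_last, purchase.month, purchase.day)

# Life 3 from the table: born Aug 1 1965, purchased Apr 1 2010
new_dob = insuring_birthday(date(1965, 8, 1), date(2010, 4, 1))
```

For Life 3 this gives a new birthday of Apr 1 1966, matching the table in the solution below.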

Example 1.1

Reanalyze the data given above for the six lives, using insuring ages by age last birthday.
Recalculate the estimates of the hazard rates at ages 45 and 46.

Solution

We now have the following table.

Life   Date of birth   Date of purchase   New birthday   Age at first observation   Age at last observation

1      Mar 1 1965      Jul 1 2009         Jul 1 1965     44 6/12                    47 6/12
2      Jul 1 1965      Nov 1 2009         Nov 1 1965     44 2/12                    45 4/12
3      Aug 1 1965      Apr 1 2010         Apr 1 1966     44                         45 10/12
4      Apr 1 1965      Jun 1 2011         Jun 1 1965     46                         47 7/12
5      May 1 1965      Aug 1 2010         Aug 1 1965     45                         46 10/12
6      Oct 1 1965      May 1 2010         May 1 1966     44                         45 11/12

Note that again, Lives 1 and 2 are not observed until the start of the investigation on January 1
2010.

Notice that by using insuring ages last birthday, the birthday is always moved forward in time,
so that lives become younger than they really are. Also, lives whose policy purchase occurs
within the period of the investigation will now be observed for the first time at an integer age.


This now gives us the following table of exposures:

Life   Age at first observation   Age at last observation   e44   e45   e46   e47

1      44 6/12                    47 6/12                   6     12    12    6
2      44 2/12                    45 4/12                   10    4     -     -
3      44                         45 10/12                  12    10    -     -
4      46                         47 7/12                   -     -     12    7
5      45                         46 10/12                  -     12    10    -
6      44                         45 11/12                  12    11    -     -

The total contribution of each life (ie the total of the exposures in each row) is the same as before,
but the distribution is different. We now have column totals of 40, 49, 34 and 13. Using the exact
exposure method, we find that:

ĥ45 = 1/(49/12) = 0.24490   and:   ĥ46 = 1/(34/12) = 0.35294

We can calculate q-type rates from these as before.

Anniversary-based studies

In the study outlined above, we had a three-year period of investigation, which ran from
January 1 2010 to December 31 2012.

An alternative approach (which can simplify the numbers obtained) is to use an anniversary-
based study. In a study of this type, each life enters the investigation on the first policy
anniversary during the period of the investigation. Lives will also exit on the last policy
anniversary within the period of the investigation, if they are still active lives at this point. The
amount of exposure is reduced (which reduces the amount of information we are using), but the
numbers may be simplified, particularly if we use this method in conjunction with insuring ages.

Let’s see how we can apply this method to the example data given earlier.


Example 1.2

Using the data for the 6 lives given above, calculate the exposures that would be obtained in an
anniversary-based study, using insuring ages last birthday.

Solution

Although the overall period of the investigation is from January 1 2010 to December 31 2012, each
life will enter the investigation on the policy anniversary following January 1 2010, and will leave
on the policy anniversary preceding December 31 2012, if they are still active at this point. So, for
example, Life 1 enters the investigation on July 1 2010, at which point the life has insuring age 45.
We obtain the following new table of values.

Life   Date of birth   Date of purchase   Date of entry   Insuring age at entry   Date of exit   Insuring age at exit

1      Mar 1 1965      Jul 1 2009         Jul 1 2010      45                      Jul 1 2012     47
2      Jul 1 1965      Nov 1 2009         Nov 1 2010      45                      Mar 1 2011     45 4/12
3      Aug 1 1965      Apr 1 2010         Apr 1 2010      44                      Feb 1 2012     45 10/12
4      Apr 1 1965      Jun 1 2011         Jun 1 2011      46                      Jun 1 2012     47
5      May 1 1965      Aug 1 2010         Aug 1 2010      45                      Jun 1 2012     46 10/12
6      Oct 1 1965      May 1 2010         May 1 2010      44                      Apr 1 2012     45 11/12

So the exposures are now as follows.

Life   Age at first observation   Age at last observation   e44   e45   e46   e47

1      45                         47                        -     12    12    -
2      45                         45 4/12                   -     4     -     -
3      44                         45 10/12                  12    10    -     -
4      46                         47                        -     -     12    -
5      45                         46 10/12                  -     12    10    -
6      44                         45 11/12                  12    11    -     -

The total exposures at each age are now 24, 49, 34 and zero (working in months as before).


Note that in an investigation of this type, all lives who are active at the end of the investigation
will contribute a whole number of years to the exposures. Only lives who die or surrender will
contribute at fractional ages. In a large investigation, it may be that most of the lives are active
lives. So the amount of calculation needed may be reduced significantly using this method.

Interval-based methods

An alternative approach is not to record the exact time or age at which an event takes place, but
just to record the number of events of each type in each year of age. If we do this we will lose
some accuracy, but the calculations will be simplified. In a large actuarial study, provided that
there are many lives contributing to each age group, the loss of accuracy is likely to be small.

We will need to record the number of lives in the investigation at the start of each year of age,
together with the numbers entering, dying and leaving during the course of the year. We can
then use a table of these values to estimate the exposure within each particular age group.

Let’s see how we might apply these ideas to the group of six lives studied earlier.

Example 1.3

Using the data for the six lives given previously, construct a table of the numbers of decrements
in each year of age, and calculate the exact exposure for each of the relevant age groups.

Solution

Using exact ages, we have previously constructed the following table of data:

Life   Age at first observation   Age at last observation   Mode of exit

1      44 10/12                   47 10/12                  Withdrawal
2      44 6/12                    45 8/12                   Death
3      44 8/12                    46 6/12                   Withdrawal
4      46 2/12                    47 9/12                   Withdrawal
5      45 3/12                    47 1/12                   Withdrawal
6      44 7/12                    46 6/12                   Death

Withdrawal here includes both lives who surrendered, and lives who were active at the end of
the investigation period.


We can see that:

(a) Four lives entered at age 44 last birthday, one at 45 last birthday and one at 46 last
birthday.

(b) One death occurred at age 45 last birthday, and one at age 46 last birthday.

(c) Of the survivors, one exited at age 46 last birthday, and three at age 47 last birthday.

This leads to the following table of decrements:

Age   Population at start of year   Number entering during the year   Number dying during the year   Number leaving during the year   Population at year end

44    0                             4                                 0                              0                                4
45    4                             1                                 1                              0                                4
46    4                             1                                 1                              1                                3
47    3                             0                                 0                              3                                0

We can now calculate estimates of the exact exposure, and the actuarial exposure. We are
assuming that we now do not have the exact information about entrances and exits, but only
have information in the form of the table above. We will now have to approximate the exposure.

The numbers in the population at the start of the year will contribute a full year to the exposure.
The numbers entering, dying and leaving are assumed to be distributed uniformly over the year.
This leads to the following formula for the exposure:

e_j = P_j + (n_j - d_j - w_j)/2

where P_j is the population at the start of the year, n_j is the number of lives entering during the
year, d_j is the number of deaths during the year and w_j is the number leaving the population
during the year.

If we apply this to the figures in the table above, we obtain an estimate of the exact exposure at
age 44 of:

e44 = P44 + (n44 - d44 - w44)/2 = 0 + (4 - 0 - 0)/2 = 2

Similarly, if we apply the formula at the other ages, we obtain exposures of 4, 3.5 and 1.5
respectively. We can then calculate estimates of the hazard rate at each age:

ĥ45 = 1/4 = 0.25   and:   ĥ46 = 1/3.5 = 0.2857

Of course, given the small sample of lives, these estimates are different from the ones we
obtained earlier. However, using a large data sample, the loss of accuracy may not be great.


If we wish to use the actuarial method, the deaths count a full year in the exposure. We therefore
do not need to deduct half the number of deaths in the exposure formula, which now becomes:

e_j = P_j + (n_j - w_j)/2
We now have figures for the exposure in each year of 2, 4.5, 4 and 1.5. So, for example, our
estimate for q45 using the actuarial method now becomes:

q̂45 = 1/4.5 = 0.2222
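Both exposure formulas are easy to apply mechanically. The following sketch (our own; the row layout is ours) computes the exact and actuarial exposures from the decrement table above:

```python
# Decrement table rows: (age, P_j, n_j, d_j, w_j), taken from the text.
table = [
    (44, 0, 4, 0, 0),
    (45, 4, 1, 1, 0),
    (46, 4, 1, 1, 1),
    (47, 3, 0, 0, 3),
]

def exact_exposure(P, n, d, w):
    # entrants, deaths and withdrawals assumed uniform over the year
    return P + (n - d - w) / 2

def actuarial_exposure(P, n, d, w):
    # deaths contribute a full year, so they are not deducted
    return P + (n - w) / 2

exact = {age: exact_exposure(P, n, d, w) for age, P, n, d, w in table}
actuarial = {age: actuarial_exposure(P, n, d, w) for age, P, n, d, w in table}
h45 = 1 / exact[45]        # exact-exposure hazard estimate at age 45
q45 = 1 / actuarial[45]    # actuarial q-type estimate at age 45
```

This reproduces the exposures 2, 4, 3.5, 1.5 (exact) and 2, 4.5, 4, 1.5 (actuarial) found above.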

Variance of the estimators

We have seen that ĥ = d/e can be used as an estimate for the hazard rate h, and that, using the
actuarial approach to finding e, q̂ = d/e can be used as an estimate for the mortality rate q.
Note that e is calculated differently in the two cases.

It can also be shown that, under certain assumptions, these estimates are actually maximum
likelihood estimates for h and q. We shall not prove this here. However, if we make the
assumption that h is constant over the period during which we are observing the lives, then
these estimates are maximum likelihood estimates.

Here is a formal statement of this result.

MLEs for h and q

Suppose that a group of lives is observed from age a to age b, b > a. Assuming that the hazard
rate h is constant over the interval, then:

ĥ = d/e   and:   q̂ = 1 - e^(-d/e)

are maximum likelihood estimates for h and q. Here, e is the exact exposure for the group of
lives over the age interval.

We can show these results in the usual way, by constructing the likelihood function, taking logs,
differentiating with respect to h, setting the result equal to zero and solving the resulting
equation.
In fact we can go further than this. Recall from Chapter 5 that the Cramér-Rao lower bound can
be found for the variance of an estimator. In this case we can use the CRLB to find the variance
of the estimator for the hazard rate; it turns out to be var(ĥ) = d/e². With this result, and using
the delta method from Chapter 5, we can also find the variance of the estimator for q, which
turns out to be var(q̂) = (1 - q̂)² d/e².

We are assuming here that q represents the probability of death in a single time period. In the
more general case, where q is the estimate of the probability of death over a longer period than
one year, we have the corresponding result that:

var(q̂) = (1 - q̂)² (b - a)² d/e²

where q is now the probability that a life dies between age a and age b.
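The delta-method step behind the last result can be sketched as follows (this is our own derivation sketch, using the variance of ĥ quoted above):

```latex
% q as a smooth function of the constant hazard rate h
q = 1 - e^{-(b-a)h}
\qquad\Longrightarrow\qquad
\frac{dq}{dh} = (b-a)\,e^{-(b-a)h} = (b-a)(1-q)

% delta method: \operatorname{var}\bigl(g(\hat h)\bigr) \approx [g'(h)]^2 \operatorname{var}(\hat h)
\operatorname{var}(\hat q)
  \approx (b-a)^2 (1-\hat q)^2 \operatorname{var}(\hat h)
  = (1-\hat q)^2 (b-a)^2 \, d/e^2
```

Setting b - a = 1 recovers the single-year result.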


1.4 Simulation

The following section should be read in conjunction with Chapter 12 of the textbook.

Simulation methods for normal and lognormal distributions

To simulate values from a normal distribution, the inversion method can be used as usual. The
procedure would be:

1 Simulate a value u1 from a U(0,1) distribution.

2 Use tables of the standard normal distribution to find z1 where Φ(z1) = u1.

3 z1 is now a simulated value from a N(0,1) distribution. To find a simulated value x1
from a general N(μ, σ²) distribution, use the transformation x1 = μ + σz1.

4 To find a simulated value from a lognormal distribution with parameters μ and σ², use
the transformation x1 = e^(μ + σz1).

5 Repeat the process to obtain as many simulated values as are required.
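The steps above can be sketched in a few lines; this is our own illustration (the function names are ours), using the standard library's inverse normal CDF in place of tables:

```python
from math import exp
from statistics import NormalDist

def simulate_normal(u, mu=0.0, sigma=1.0):
    """Inversion method: z = Phi^{-1}(u), then scale and shift."""
    z = NormalDist().inv_cdf(u)
    return mu + sigma * z

def simulate_lognormal(u, mu, sigma):
    """Lognormal with parameters mu and sigma^2: exponentiate the normal."""
    return exp(mu + sigma * NormalDist().inv_cdf(u))

x1 = simulate_normal(0.273, mu=100, sigma=20)   # about 87.9
x2 = simulate_normal(0.518, mu=100, sigma=20)   # about 100.9
```

These two values match the inversion-method figures in Example 1.4 below.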

However, there are a number of other methods that can be used to simulate values from normal
distributions. We give two methods here.

The Box-Muller method

An alternative approach is to use the Box-Muller method. This uses pairs of independent U (0,1)
simulated values to obtain pairs of independent standard normal values. The procedure is as
follows.

1 Generate 2 independent U(0,1) random numbers, u1 and u2.

2 Then:

z1 = √(-2 log u1) cos(2π u2)   and:   z2 = √(-2 log u1) sin(2π u2)

are independent values from an N(0,1) distribution.

Note that you should set your calculator to ensure that the trigonometric functions are calculated
in radian mode.
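As a sketch (our own, with our own function name), the Box-Muller transform is:

```python
from math import cos, log, pi, sin, sqrt

def box_muller(u1, u2):
    """One pair of independent N(0,1) values from one U(0,1) pair."""
    r = sqrt(-2 * log(u1))    # radial part
    theta = 2 * pi * u2       # angle, in radians
    return r * cos(theta), r * sin(theta)

z1, z2 = box_muller(0.273, 0.518)   # about -1.601 and -0.182
```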


The polar method

The polar method also starts by calculating two independent values from U (0,1) . The method is
as follows:

1 Generate two independent U(0,1) numbers, u1 and u2.

2 Calculate x1 = 2u1 - 1 and x2 = 2u2 - 1.

3 Calculate the value of w = x1² + x2². If w > 1, reject the values and start again.

4 Calculate y = √((-2 log w)/w).

5 Calculate z1 = x1 y and z2 = x2 y. z1 and z2 are the required independent N(0,1)
variables.
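A minimal sketch of one attempt of the polar method (our own; in practice the caller loops until a pair is accepted):

```python
from math import log, sqrt

def polar(u1, u2):
    """One attempt of the polar method; returns None on rejection."""
    x1, x2 = 2 * u1 - 1, 2 * u2 - 1
    w = x1 * x1 + x2 * x2
    if w > 1:
        return None            # reject: retry with fresh random numbers
    y = sqrt(-2 * log(w) / w)
    return x1 * y, x2 * y

pair = polar(0.273, 0.518)     # accepted, since w = 0.207412 < 1
```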

Let’s see how to simulate values from a normal distribution using each of the methods given
above.

Example 1.4

Use each of the three methods given above (including the inversion method) and the random
numbers u1 = 0.273 and u2 = 0.518 to generate values from a normal distribution with mean 100
and standard deviation 20.

Solution

First we use the inversion method. We need to find the values z1, z2 from the standard normal
distribution such that Φ(z1) = 0.273 and Φ(z2) = 0.518. Since the first random number is less
than 0.5, we will use the equivalent result Φ(-z1) = 1 - 0.273 = 0.727. From the tables of the
standard normal distribution, we find that z1 = -0.604. Similarly, we find that z2 = 0.045. To
find values from a normal distribution with the given mean and standard deviation, we use the
relationships x1 = 100 + 20z1 = 87.92 and x2 = 100 + 20z2 = 100.90. These are our two simulated
values from a N(100, 20²) distribution.

Using the Box-Muller method, we obtain the values:

z1 = √(-2 log 0.273) cos(2π × 0.518) = -1.60109

and: z2 = √(-2 log 0.273) sin(2π × 0.518) = -0.18186

Multiplying by 20 and adding 100, we obtain simulated values of 67.98 and 96.36.

We need to be careful here about the order in which we use the random numbers. If we switch
around u1 and u2, we will of course end up with different simulated normal values.


Finally, using the polar method, we have x1 = -0.454 and x2 = 0.036. So w = 0.207412, and we
can use these values in the process since w < 1. Using the formula given above for y, we find
that y = 3.89466, and our standard normal values are -1.76817 and 0.14021. Multiplying by 20
and adding 100 as before, we obtain the numbers 64.64 and 102.80.

Simulation of a discrete mixture

Consider the distribution whose distribution function is given by:

F(x) = 0.4(1 - e^(-0.03x)) + 0.3(1 - e^(-0.02x)) + 0.3(1 - e^(-0.05x))

This random variable is a discrete mixture of three exponential distributions.

Inverting this distribution function as it stands will not be very easy. However, an alternative
approach to simulating values from this type of distribution is as follows:

1 Use a random number to determine which individual exponential distribution to
simulate.

2 Use another random number to simulate a value from the correct exponential
distribution.

Here is an example.

Example 1.5

Use the random numbers 0.28, 0.57, 0.81 and 0.73 to simulate two values from the distribution
whose CDF is given above.

Solution

We subdivide the interval (0, 1) into three sub-intervals, (0, 0.4), (0.4, 0.7) and (0.7, 1).
Observing which of these sub-intervals contains our first random number will determine which
exponential distribution we use in the simulation.

Here our first random number is 0.28. Since this falls into the first sub-interval, we simulate from
an exponential distribution with parameter 0.03. Using the second random number in the
inversion process:

0.57 = 1 - e^(-0.03 x1)  ⟹  x1 = -(1/0.03) log(1 - 0.57) = 28.13

Repeating the process, our next random number 0.81 falls into the third sub-interval, so we
simulate from an exponential distribution with parameter 0.05, using the fourth random number:

0.73 = 1 - e^(-0.05 x2)  ⟹  x2 = -(1/0.05) log(1 - 0.73) = 26.19

In this way we avoid having to invert the rather complicated expression for the CDF of the
mixture distribution.
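The two-step procedure can be sketched as follows (our own illustration; the names are ours), using the cumulative mixing weights to pick the component:

```python
from bisect import bisect_right
from math import log

# Mixing weights and exponential parameters, from the CDF above
cum_weights = [0.4, 0.7, 1.0]     # cumulative mixing weights
rates = [0.03, 0.02, 0.05]

def simulate_mixture(u_pick, u_value):
    """First random number picks the component; second inverts its CDF."""
    i = bisect_right(cum_weights, u_pick)       # index of the sub-interval
    return -log(1 - u_value) / rates[i]

x1 = simulate_mixture(0.28, 0.57)   # about 28.13
x2 = simulate_mixture(0.81, 0.73)   # about 26.19
```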

Simulation using a stochastic process

We have already seen methods for simulating values from an (a, b, 0) distribution, using the
inversion method. However, this method is not always very efficient. In this section we look at
an alternative approach to simulating values from a Poisson, binomial or negative binomial
distribution.


Rather than trying to simulate directly the number of observations from the distribution, we will
consider the underlying process in time. If, for example, we want to simulate the number of
claims in one year, and we know that the claim distribution is Poisson with mean 3.4 per year, we
can simulate values from a Poisson distribution with mean 3.4. However, an alternative
approach would be to simulate the times at which these Poisson events occur, total these times,
and see how many occur before time one (year). This may seem like a longer process, but it can
in some situations be more efficient to program on a computer.

This method can be used for any of the three discrete distributions mentioned above. It can be
shown that the time to the next event always has an exponential distribution. However, we need
to be careful to use the correct exponential parameter, depending on the form of the distribution
we are trying to simulate. If the events we are trying to simulate occur according to a Poisson
process, then the time to the next event is exponential with the (constant) Poisson parameter. If
the events we are trying to simulate are binomial, then it can be shown that the time to the next
event is still exponential, but with a parameter that varies as the events occur. Similarly, if the
events we are trying to simulate are negative binomial, then the time to the next event has an
exponential distribution, but again the underlying parameter varies as the events occur.
Here are the key results that we will need for each of the three distributions.

Simulating a Poisson distribution

Time to the next event: The time to the next event, if events have a Poisson distribution
with parameter λ, is exponential with parameter λ (and mean
1/λ).

Exponential distribution: We simulate the time to the next event as an exponential
random variable using s_k = -log(1 - u_k)/λ.

Simulated value: We can now determine the number of events happening in one
time unit, by summing up the s_k's. The total time is
t_k = t_{k-1} + s_k. The number of events occurring before time 1 is
our simulated value.

Simulating a binomial distribution

Time to the next event: The time between events, if events have a binomial distribution
with parameters m and q, is exponential with parameter λ_k.
The value of λ_k is given by λ_k = c - dk, where d = -log(1 - q)
and c = md.

Exponential distribution: We simulate the time between events as an exponential random
variable using s_k = -log(1 - u_k)/λ_k.

Simulated value: We can now determine the number of events happening in one
time unit, by summing up the s_k's using t_k = t_{k-1} + s_k. The
number of events occurring before time 1 is our simulated
value.


Simulating a negative binomial distribution

Time to the next event: The time between events, if events have a negative binomial
distribution with parameters r and β, is exponential with
parameter λ_k. The value of λ_k is given by λ_k = c + dk, where
d = log(1 + β) and c = rd.

Exponential distribution: We simulate the time between events as an exponential random
variable using s_k = -log(1 - u_k)/λ_k.

Simulated value: We can now determine the number of events happening in one
time unit, by summing up the s_k's using t_k = t_{k-1} + s_k. The
number of events occurring before time 1 is our simulated
value.

Note the convention in use here. We shall use our first random number, u0, to determine λ0, the
exponential parameter that we shall use to simulate the time from time zero to the first event,
s0 = t0. Then u1 will be used to determine λ1, the exponential parameter of the distribution of
the time from the first to the second event, s1, and now t1 = s0 + s1 is the total time to the second
event. t2 will be the total time until the third event, and so on.
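All three cases share the same counting loop; only the rate sequence λ_k changes. The following sketch is our own (the function and variable names are ours):

```python
from math import log

def simulate_count(us, rate):
    """Count events occurring before time 1; rate(k) is the exponential
    parameter for the waiting time preceding event k (k = 0, 1, 2, ...)."""
    t, k = 0.0, 0
    for u in us:
        t += -log(1 - u) / rate(k)     # next inter-event time
        if t > 1:
            return k
        k += 1
    return k

us = [0.14, 0.28, 0.73, 0.82, 0.44, 0.61]

# Poisson with mean 1.6: constant rate
n_poisson = simulate_count(us, lambda k: 1.6)

# Binomial m = 40, q = 0.04: rate_k = c - d*k
d_b = -log(1 - 0.04); c_b = 40 * d_b
n_binom = simulate_count(us, lambda k: c_b - d_b * k)

# Negative binomial r = 120, beta = 0.014: rate_k = c + d*k
d_nb = log(1 + 0.014); c_nb = 120 * d_nb
n_negbin = simulate_count(us, lambda k: c_nb + d_nb * k)
```

With the random numbers of Example 1.6 below, all three calls return a simulated value of 2.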

Let’s see how this process works in practice using an example.

Example 1.6

Simulate values from each of the three distributions given below using as many of the following
random numbers as necessary:

u0 = 0.14   u1 = 0.28   u2 = 0.73   u3 = 0.82   u4 = 0.44   u5 = 0.61

(a) a Poisson distribution with mean 1.6

(b) a binomial distribution with parameters m = 40 and q = 0.04

(c) a negative binomial distribution with parameters r = 120 and β = 0.014


Solution

(a) Poisson distribution

We have λ = 1.6. So, using our first random number, we have t0 = -log(1 - 0.14)/1.6 = 0.0943.
So the time to our first event is 0.0943 time units. We now use the same formula but with the next
random number to find the time from the first to the second event:

s1 = -log(1 - u1)/1.6 = 0.2053  ⟹  t1 = 0.0943 + 0.2053 = 0.2996

The total time to the second event is 0.2996 time units.

Repeating the process again, we have:

s2 = -log(1 - u2)/1.6 = 0.8183  ⟹  t2 = 0.2996 + 0.8183 = 1.1179

So the third event occurs after the end of the time period, and two events have occurred within
the time interval (0, 1). The simulated value is 2.

Note that, with the notation above, t_k is actually the total time to the (k + 1)th event.

(b) Binomial distribution

We now need the values of c and d:

d = -log(1 - q) = -log 0.96 = 0.04082   and:   c = md = 1.63288

We can now calculate the appropriate values of λ_k:

λ0 = c = 1.63288   λ1 = c - d = 1.59206   λ2 = c - 2d = 1.55124

Now we can find the times of the various events:

t0 = s0 = -log(1 - u0)/1.63288 = 0.0924

s1 = -log(1 - u1)/1.59206 = 0.2063  ⟹  t1 = 0.0924 + 0.2063 = 0.2987

s2 = -log(1 - u2)/1.55124 = 0.8441  ⟹  t2 = 0.2987 + 0.8441 = 1.1428

So again, the third simulated event occurs after the end of the time interval, and our simulated
value is 2.

(c) Negative binomial distribution

Again we need the values of c and d:

d = log(1 + β) = 0.01390   and:   c = rd = 1.66835

We can now calculate the appropriate values of λ_k:

λ0 = c = 1.66835   λ1 = c + d = 1.68225   λ2 = c + 2d = 1.69615

Now we can find the times of the various events:

t0 = s0 = -log(1 - u0)/1.66835 = 0.0904

s1 = -log(1 - u1)/1.68225 = 0.1953  ⟹  t1 = 0.0904 + 0.1953 = 0.2857

s2 = -log(1 - u2)/1.69615 = 0.7719  ⟹  t2 = 0.2857 + 0.7719 = 1.0576

So again, the third simulated event occurs after the end of the time interval, and our simulated
value is again 2.

Simulation from a decrement table

When following the progress of a group of policyholders, it may be necessary to simulate the
outcomes for the group. The group may be subject to a variety of different decrements, for
example death, retirement, withdrawal and so on.

Consider a group of 1,000 identical policyholders, all aged 60 exact. Let us assume that they are
subject to three decrements, death, age retirement and ill-health retirement. The probabilities for
each of these decrements at each age might be as follows:

Age   Probability of death   Probability of age retirement   Probability of ill-health retirement

60    0.04                   0.12                            0.09

61    0.05                   0.15                            0.10

We want to simulate the progress of this group of policyholders, identifying the numbers of lives
who will leave the group via each decrement at each age. To do this, we will need to simulate
values from various binomial distributions. We might proceed as follows.

Consider first the number of deaths at age 60. This has a binomial distribution with parameters
1,000 and 0.04. So we first simulate a value from this binomial distribution to determine the
number of deaths during the year. Suppose that our simulated value is 28.

We now have a sample of 1,000 - 28 = 972 lives remaining. To determine the simulated number
of age retirements during the year, we now need a value from a binomial distribution with
parameters 972 and the conditional probability of age retirement, given that a life is still
alive. This conditional probability is 0.12/(1 - 0.04) = 0.125. We can simulate a value from the
binomial distribution with these parameters using any of the methods given previously for the
binomial distribution. Suppose that our simulated value is 102.

We now have 972 - 102 = 870 lives remaining. To simulate the number of ill-health retirements,
we need the conditional probability of taking ill-health retirement, given that a life has not died
or taken age retirement. This is:

0.09/(1 - 0.04 - 0.12) = 0.10714


We now need a simulated value from a binomial distribution with parameters 870 and 0.10714.
Suppose that our simulated value is 62. We now have 870 - 62 = 808 lives surviving in the
population until age 61.

We can continue the process for as long as necessary, simulating the observed numbers of lives
exiting by each decrement at each age. We may need to carry out the process on a computer if we
want a large number of repeated simulations. But the underlying method is fairly
straightforward.
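The procedure above can be sketched in a few lines of Python. The helper names and the seed are our own choices, and a production implementation would use a library binomial generator rather than the Bernoulli-sum draw shown here:

```python
import random

def draw_binomial(rng, n, p):
    # direct Bernoulli-sum draw from Binomial(n, p); fine for modest n
    return sum(rng.random() < p for _ in range(n))

def simulate_year(rng, n_lives, probs):
    """Simulate one year of a multiple-decrement table.

    probs holds the unconditional decrement probabilities in the order
    they are applied; each draw uses the conditional probability given
    survival of the decrements applied earlier."""
    remaining, used, counts = n_lives, 0.0, []
    for p in probs:
        cond_p = p / (1.0 - used)   # e.g. 0.12/(1 - 0.04) = 0.125
        exits = draw_binomial(rng, remaining, cond_p)
        counts.append(exits)
        remaining -= exits
        used += p
    return counts, remaining
```

For the age-60 year above, `simulate_year(rng, 1000, [0.04, 0.12, 0.09])` returns the simulated numbers of deaths, age retirements and ill-health retirements, together with the number of survivors reaching age 61.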


Supplementary Note Practice Questions

Question 1.1
Use the random number u = 0.845 to generate a random number from a negative binomial
distribution with mean 0.264 and variance 0.3.

Question 1.2
Using the inversion method, use the random number u1 = 0.42 to generate a single observation
from a lognormal distribution with mean 5,000 and standard deviation 400.

Question 1.3
Use the random numbers 0.81, 0.95, 0.09, 0.22 and the polar method to generate two random
numbers from the standard normal distribution.

Question 1.4
Use the random numbers u1 = 0.73 and u2 = 0.28 and the Box-Muller method to generate two
random numbers from a normal distribution with mean 100 and standard deviation 10.

Question 1.5
Use a stochastic process to generate a random observation from a binomial distribution with
parameters m = 50 and q = 0.01. Use as many of these random numbers as are needed:

u0  0.423 u1  0.796 u2  0.522 u3  0.637 u4  0.992

Question 1.6
Use a stochastic process to generate values from a negative binomial distribution with
parameters r = 100 and β = 0.08. Use the same random numbers as in the previous question.

Question 1.7
Use the first two random numbers from the previous question to generate a random observation
from the mixture of Pareto distributions with distribution function:

F(x) = 0.6[1 - (200/(x + 200))^3] + 0.4[1 - (300/(x + 300))^4]


You are given the following information about a sample of lives:

Life Date of birth Date of purchase Mode of exit Date of exit

1 Apr 15 1950 Jan 1 2011 Died May 15 2011

2 Jul 15 1950 Apr 1 2011 Surrendered Mar 15 2012

3 Oct 15 1950 Oct 1 2011 Alive -

4 Jan 15 1950 Feb 1 2011 Alive -

5 Feb 15 1951 Mar 1 2011 Died Aug 15 2011

These lives are subject to a 2-year investigation, running from July 1 2010 to June 30 2012.
Assume in each of the following questions that each half-month period is exactly one
twenty-fourth of a year.

Question 1.8
Using the exact exposure method, estimate h61 .

Question 1.9
Using the actuarial exposure method, estimate q60 and h60 .

Question 1.10
Using the exact exposure method with insuring ages last birthday, estimate q60 .

Question 1.11
Using the actuarial exposure method with insuring ages last birthday, estimate q60 .

Question 1.12
Explain how your answer to Question 1.10 would alter if you were using an anniversary-based
study to estimate q60 and q61 .

Question 1.13
Using an interval-based method and the table of lives given above, construct a table of
decrements, and hence estimate q60 using the actuarial method.


Question 1.14
Find the estimated variance of your estimator in the previous question.


Solutions to Supplementary Note Practice Questions

Question 1.1
We first need the parameters of the negative binomial distribution. Using the formulae for the
mean and variance:

rβ = 0.264 and: rβ(1 + β) = 0.3

Solving these simultaneous equations, we obtain r = 1.936 and β = 0.13636.

The question does not require us to use a stochastic process, so it is probably quickest just to use
the inversion method as normal. Calculating the first few negative binomial probabilities:

p0 = (1 + β)^(-r) = 0.780762

p1 = rβ/(1 + β)^(r+1) = 0.18139

So the inversion method will transform random numbers in the interval (0, p0) to a simulated
value of zero, and random numbers in the range (p0, p0 + p1) to a simulated value of 1. Our
random number lies in this second interval, so our simulated value is 1.
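A minimal sketch of this inversion step in Python (the function names are ours; math.gamma supplies the generalized binomial coefficient for non-integer r):

```python
import math

def negbin_pmf(k, r, beta):
    # P(N = k) for the negative binomial with parameters r and beta
    coeff = math.gamma(r + k) / (math.gamma(r) * math.factorial(k))
    return coeff * beta**k / (1.0 + beta)**(r + k)

def invert_discrete(u, pmf, max_k=1000):
    """Inversion method: return the smallest k with F(k) > u."""
    cum = 0.0
    for k in range(max_k):
        cum += pmf(k)
        if u < cum:
            return k
    raise ValueError("cumulative probability never exceeded u")
```

With r = 1.936, β = 0.13636 and u = 0.845 this returns 1, agreeing with the hand calculation above.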

Question 1.2
First we need the parameters of the lognormal distribution. Using the formulae for the mean and
variance of the lognormal, we have:

e^(μ + σ²/2) = 5,000 and: e^(2μ + σ²)(e^(σ²) - 1) = 400²

Solving these simultaneous equations, we find that μ = 8.514003 and σ² = 0.0063796.

We now find a simulated N (0,1) value by using the normal tables:

Φ(z1) = 0.42, so Φ(-z1) = 0.58, giving z1 = -0.2019

We can now find a simulated value from the lognormal distribution:

x1  e   z1  e8.514 0.2019 0.0063796


 4,904


Question 1.3
First we find x1 = 2u1 - 1 = 0.62 and x2 = 2u2 - 1 = 0.90. Applying the check, we find that
w = x1² + x2² = 1.19. Since w > 1, we reject these values and start the process again using the
other random numbers.

Now we have x3 = 2u3 - 1 = -0.82 and x4 = 2u4 - 1 = -0.56. Since (-0.82)² + (-0.56)² = 0.986 < 1,
we can proceed. So:

y = √(-2 log(0.986)/0.986) = 0.16911

and we have:

z1  x3 y  0.82  0.16911  0.1387

and: z2  x 4 y  0.56  0.16911  0.0947

These are our simulated values from the standard normal distribution.
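The polar (Marsaglia) method consumes uniforms in pairs, rejecting any pair that falls outside the unit circle, exactly as in the solution above. A sketch in Python (the function name is ours):

```python
import math

def polar_pairs(uniforms):
    """Marsaglia polar method: return one pair of independent N(0,1)
    values, rejecting candidate pairs with w = x1^2 + x2^2 >= 1."""
    it = iter(uniforms)
    for u1, u2 in zip(it, it):
        x1, x2 = 2.0 * u1 - 1.0, 2.0 * u2 - 1.0
        w = x1 * x1 + x2 * x2
        if 0.0 < w < 1.0:                     # accept this pair
            y = math.sqrt(-2.0 * math.log(w) / w)
            return x1 * y, x2 * y
    raise ValueError("all supplied pairs were rejected")
```

Feeding in the four uniforms from the question reproduces the two simulated values found above: the first pair is rejected (w = 1.19) and the second pair yields approximately -0.1387 and -0.0947.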

Question 1.4
Using the standard Box-Muller formula, we have:

z1  2 log u1 cos  2 u2   2 log  0.73  cos  2  0.28   0.148661

and: z2  2 log u1 sin  2 u2   2 log  0.73  sin  2  0.28   0.779308

These are independent N (0,1) observations. The corresponding values from the normal
distribution with mean 100 and standard deviation 10 are:

x1  100  10 z1  98.51 and: x2  100  10 z2  107.79

Question 1.5
We first need the parameters c and d :

d  log  1  q   log 0.99  0.010050 and: c  md  0.502517

We now use the formula λk = c - dk to generate the successive exponential parameters:

s0 = -log(1 - u0)/λ0 = -log 0.577/0.502517 = 1.0943

Since this value is greater than one, the first event occurs after time one, and so there are no
observed events in a unit time period. The simulated value from the distribution is zero.


Question 1.6
First we need the values of the parameters c and d :

d  log  1     0.076961 and: c  rd  7.696104

We use the formula λk = c + dk to generate the successive exponential parameters. So the times
between events are:

t0   log  1  0.423  /7.696104  0.071453

s1   log  1  0.796  /7.773065  0.204506

s2   log  1  0.522  /7.850026  0.094031

s3   log  1  0.637  /7.926987  0.127836

s4   log  1  0.992  /8.003948  0.603242

We see that t4 = t0 + s1 + s2 + s3 + s4 = 1.1011 is the first value of t which is greater than one.
So the fifth event occurs after time 1, and there are 4 events in the unit time period. The
simulated value is 4.

Question 1.7
Since u0 = 0.423 and 0 ≤ 0.423 < 0.6, we use the first of the two Pareto distributions to simulate.
Using our second random number:

0.796 = 1 - (200/(x + 200))^3, so x = 139.75


Question 1.8
We start by calculating the age at which each life was first observed, and last observed. Treating
a half month as being equal to 1/24th of a year, we obtain the following table of ages and
exposures (the exposure unit is also one twenty-fourth of a year):

Life   Age at first observation   Age at last observation   e59   e60   e61   e62

1      60 17/24                   61 2/24                   -     7     2     -

2      60 17/24                   61 16/24                  -     7     16    -

3      60 23/24                   61 17/24                  -     1     17    -

4      61 1/24                    62 11/24                  -     -     23    11

5      60 1/24                    60 12/24                  -     11    -     -

The total exposure at age 61 is 58 twenty-fourths of a year. We have one death at age 61 last
birthday (Life 1 dies at age 61 2/24). So our estimate is:

ĥ61 = 1/(58/24) = 0.41379

Question 1.9
Using the actuarial exposure method, we need to allow for extra exposure for the deaths. We are
now looking at age 60, and Life 5 dies aged 60 12/24. So there is extra exposure of 12/24ths of a
year (from age 60 12/24 to age 61), and the total exposure at age 60 goes up by 12, from 26 to 38
(twenty-fourths of a year). So we now have:

q̂60 = 1/(38/24) = 0.63158

To find the estimate for the hazard rate, we note that q = 1 - e^(-h), ie that h = -log(1 - q). So we have:
have:

ĥ60 = -log(1 - 0.63158) = 0.99853


Question 1.10
We now want to use insuring ages last birthday. We have the following new table of dates:

Life   Date of birth   Date of purchase   New date of birth   Insuring age at entry   Date of exit   Insuring age at exit

1      Apr 15 1950     Jan 1 2011         Jan 1 1951          60                      May 15 2011    60 9/24

2      Jul 15 1950     Apr 1 2011         Apr 1 1951          60                      Mar 15 2012    60 23/24

3      Oct 15 1950     Oct 1 2011         Oct 1 1951          60                      Jun 30 2012    60 18/24

4      Jan 15 1950     Feb 1 2011         Feb 1 1950          61                      Jun 30 2012    62 10/24

5      Feb 15 1951     Mar 1 2011         Mar 1 1951          60                      Aug 15 2011    60 11/24

We now check the ages at death. Using insuring ages, we find that Life 1 now dies at age 60 9/24,
and Life 5 dies at age 60 11/24. So we now have two deaths at age 60 last birthday.

The contributions to the exposures at each age are as follows (in units of one twenty-fourth of a
year):

Life   Age at first observation   Age at last observation   e60   e61   e62

1      60                         60 9/24                   9     -     -

2      60                         60 23/24                  23    -     -

3      60                         60 18/24                  18    -     -

4      61                         62 10/24                  -     24    10

5      60                         60 11/24                  11    -     -

So the exposure at age 60 is now 61 twenty-fourths of a year, and there are two deaths. So using
the exact exposure method, the estimate of the hazard rate at age 60 is:

ĥ60 = 2/(61/24) = 0.78689

And so:

q̂60 = 1 - e^(-0.78689) = 0.54474


Question 1.11
Using the actuarial exposure method, we increase the exposure for lives 1 and 5 to a whole year.
So we now have:

e60  24  23  18  24  89

and the estimate for q60 is now:

q̂60 = 2/(89/24) = 0.53933

Question 1.12
The figures would be different for the two lives who remain until the end of the investigation.

Life 3 enters the investigation on 1 Oct 2011. At this point there is less than a full year until the
end of the investigation. So life 3 cannot contribute even one year to the exposure, and so would
not contribute at all. Life 4 enters on 1 Feb 2011, so can contribute for a full year from 1 Feb 2011
to 1 Feb 2012. So the contributions to the exposure of these two lives will be zero for Life 3, and
one full year of exposure in e61 only for Life 4.

Question 1.13
We obtain the following figures:

Age   Pj   nj   dj   wj   Pj+1

60    0    4    1    0    3

61    3    1    1    2    1

62    1    0    0    1    0

We can now calculate the exposure at age 60 using the actuarial method:

e60  P60   n60  w60  / 2  0   4  0  / 2  2

So the exposure is 2 years, and our estimate is q̂60 = 1/2 = 0.5.

Question 1.14
Using the formula given in the text, we have:

var(q̂60) = (1 - q̂60)² × d/e² = 0.5² × 1/2² = 0.0625


Appendix – Syllabus changes

In 2013 the Society of Actuaries added a small number of syllabus items to the examination
syllabus for Exam C. The new syllabus items are listed here, together with details of the material
which covers them.

A8 Identify and describe two extreme value distributions.

A very brief introduction to the study of extreme value distributions is given in Section 1.2 of this
study note.

G Estimation of decrement probabilities from large samples

1 Estimate decrement probabilities using both parametric and non-parametric approaches for
both individual and interval data

2 Approximate the variance of the estimators.

Some methods for dealing with large samples are covered in Chapter 7 of the BPP textbook.
However, some additional ideas are covered in Section 1.3 of this study note, which gives an
alternative approach to these ideas.

J2 Simulate from discrete mixtures, decrement tables, the (a, b, 0) class, and the normal and
lognormal distributions using methods designed for those distributions

The basic simulation ideas are covered in Chapter 12 of the BPP textbook. A small number of
additional methods, which have been added to the syllabus, are covered in Section 1.4 of this
study note.

The SoA has now added 10 additional questions to the end of the Exam C Sample Questions
document (these are currently Questions 290-299). You can find this by searching on the web for
“Be an actuary Exam C Syllabus” and clicking on the link to the syllabus – the questions and
solutions links are at the end of the syllabus document. You should test your understanding of
the material in this note by completing these additional questions. They all relate to the material
in this study note.
