
Deeper Understanding, Faster Calculation

--Exam C Insights & Shortcuts

6th Edition

by Yufeng Guo

Fall 2009

The Missing Manual

This electronic book is intended for individual buyer use for the sole purpose of preparing for
Exam C. This book can NOT be resold to others or shared with others. No part of this publication
may be reproduced for resale or multiple copy distribution without the express written permission
of the author.

© 2009, 2010 by Yufeng Guo

Guo Fall 2009 C, Page 1 / 284


Table of Contents

Introduction 4
Chapter 1 Doing calculations 100% correct 100% of the time ....... 5
6 strategies for improving calculation accuracy ............................................................. 5
6 powerful calculator shortcuts....................................................................................... 6
#1 Solve ax 2 + bx + c = 0 . .................................................................................... 6
#2 Keep track of your calculation...................................................................... 10
#3 Calculate mean and variance of a discrete random variable......................... 21
#4 Calculate the sample variance....................................................................... 29
#5 Find the conditional mean and conditional variance .................................... 30
#6 Do the least squares regression ..................................................................... 36
#7 Do linear interpolation .................................................................................. 46
Chapter 2 Maximum likelihood estimator ......................................... 52
Basic idea ...................................................................................................................... 52
General procedure to calculate the maximum likelihood estimator ............................. 53
Fisher Information ........................................................................................................ 58
The Cramer-Rao theorem ............................................................................................. 62
Delta method................................................................................................................. 66
Chapter 3 Kernel smoothing................................................................ 75
Essence of kernel smoothing ........................................................................................ 75
Uniform kernel.............................................................................................................. 77
Triangular kernel........................................................................................................... 82
Gamma kernel............................................................................................................... 90
Chapter 4 Bootstrap.............................................................................. 95
Essence of bootstrapping .............................................................................................. 95
Recommended supplemental reading ........................................................................... 96
Chapter 5 Bühlmann credibility model ............................................ 102
Trouble with black-box formulas................................................................ 102
Rating challenges facing insurers ............................................................... 102
3 preliminary concepts for deriving the Bühlmann premium formula ....................... 106
Preliminary concept #1 Double expectation ....................................................... 106
Preliminary concept #2 Total variance formula.................................................. 108
Preliminary concept #3 Linear least squares regression ..................................... 111
Derivation of Bühlmann's Credibility Formula.......................................... 112
Summary of how to derive the Bühlmann credibility premium formulas .................. 117
Special case................................................................................................. 122
How to tackle Bühlmann credibility problems ........................................... 123
An example illustrating how to calculate the Bühlmann credibility premium ........... 123
Shortcut ....................................................................................................................... 126
Practice problems........................................................................................................ 126
Chapter 6 Bühlmann-Straub credibility model ............................... 148
Context of the Bühlmann-Straub credibility model.................................... 148
Assumptions of the Bühlmann-Straub credibility model............................ 149
Summary of the Bühlmann-Straub credibility model................................. 154
General Bühlmann-Straub credibility model (more realistic) .................... 155
How to tackle the Bühlmann-Straub premium problem ............................. 158
Chapter 7 Empirical Bayes estimate for the Bühlmann model...... 168
Empirical Bayes estimate for the Bühlmann model ................................... 168
Summary of the estimation process for the empirical Bayes estimate for the
Bühlmann model..................................................................................... 170
Empirical Bayes estimate for the Bühlmann-Straub model........................ 173
Semi-parametric Bayes estimate................................................................................. 182
Chapter 8 Limited fluctuation credibility ........................................ 187
General credibility model for the aggregate loss of r insureds ................................. 188
Key interim formula: credibility for the aggregate loss............................................. 190
Final formula you need to memorize .......................................................................... 191
Special case................................................................................................................. 192
Chapter 9 Bayesian estimate ......................................................... 202
Intuitive review of Bayes' Theorem ........................................... 202
How to calculate the discrete posterior probability .................................................... 206
Framework for calculating the discrete posterior probability..................................... 208
How to calculate the continuous posterior probability ............................................... 213
Framework for calculating discrete-prior Bayesian premiums................................... 219
Calculate Bayesian premiums when the prior probability is continuous.................... 251
Poisson-gamma model ................................................................................................ 260
Binomial-beta model................................................................................................... 264
Chapter 10 Claim payment per payment ........................................... 268
Chapter 11 LER (loss elimination ratio)............................................. 274
Chapter 12 Find E(Y-M)+.................................................................... 276
About the author .................................................................................... 284



Introduction
This manual is intended to be a missing manual. It skips what other manuals explain well.
It focuses on what other manuals don't explain or don't explain well. This way, you get
your money's worth.

Chapter 1 teaches you how to do manual calculations quickly and accurately. If you
studied hard but failed Exam C repeatedly, chances are that you are concept strong,
calculation weak. The calculator techniques will improve your calculation accuracy.

Chapter 2 focuses on the variance of a maximum likelihood estimator (MLE), a difficult
topic for many.

Chapter 3 explains the essence of kernel smoothing and teaches you how to derive
complex kernel smoothing formulas for k_y(x) and K_y(x). You shouldn't have any
trouble memorizing complex kernel smoothing formulas after this chapter.

Many candidates don't know the essence of the bootstrap. Chapter 4 is about the bootstrap.

Chapter 5 explains the core theory behind the Bühlmann credibility model.

Chapter 6 compares and contrasts the Bühlmann-Straub credibility model with the
Bühlmann credibility model.

Many candidates are afraid of empirical Bayes estimate problems. The formulas are just
too hard to remember. Chapter 7 will relieve your pain.

Many candidates find that there are just too many limited fluctuation credibility formulas
to memorize. To address this, Chapter 8 gives you a unified formula.

Chapter 9 presents a framework for quickly calculating the posterior probability (discrete
or continuous) and the posterior mean (discrete or continuous). Many candidates can
recite Bayes' theorem but can't solve related problems under exam conditions: their
calculations are long, tedious, and prone to errors. This chapter will drastically improve
your calculation efficiency.

Chapter 10 is about claim payment per payment.


Chapter 11 is about loss elimination ratio.
Chapter 12 is about how to quickly calculate E(Y − M)+.



Chapter 1 Doing calculations 100% correct 100% of
the time
>From: Exam C candidate (name removed)
>To: yufeng_guo@msn.com
>Subject: Help..
>Date: someday in 2006
>
>Hello Mr. Guo.
>
> I tried Exam C problems under exam-like conditions. To my surprise, I found that I
>made too many mistakes; one mistake was 1+1=3. How can I improve my accuracy?

6 strategies for improving calculation accuracy


1. Gain a deeper understanding of a core concept. People tend to make errors if they
memorize a black-box formula without understanding the formula. To reduce
errors, try to understand core concepts and formulas.

2. Learn how to solve a problem faster. Many exam candidates solve hundreds of
practice problems yet fail Exam C miserably. One major cause is that their
solutions are inefficient. Typically, these candidates copy solutions presented in
textbooks and study manuals. Authors of textbooks and many study manuals
generally use software to do the calculations. To solve a messy calculation, they
just type up the formula and click the "Compute" button. However, when you take
the exam, you have to calculate the answer manually. A solution that looks clean
and easy in a textbook may be a nightmare in the exam. When you prepare for
Exam C, don't copy textbook solutions. Improve them. Learn how to do manual
calculations faster.

3. Build solution frameworks and avoid reinventing the wheel. If you analyze Exam
C problems tested in the past, you'll see that the SOA pretty much tests the same
things over and over. For example, the Poisson-gamma model is tested over and
over. When preparing for Exam C, come up with a ready-to-use solution
framework for each of the commonly tested problem types. This way, when
you walk into the exam room and see a commonly tested problem, you don't need
to solve it from scratch. You can use your pre-built solution framework
and solve it quickly and accurately.

4. Keep an error log. Whenever you solve some practice problems, record your
errors in a notebook. Analyze why you made errors. Try to solve a problem
differently to avoid the error. Review your error log from time to time. Using an
error log helps you avoid making the same calculation errors over and over.

5. Avoid doing mental math in the exam, even for the simplest calculations. Even if
you are solving a simple problem like 2+3, use your calculator: simply enter
2 + 3 = in your calculator. This will reduce your silly errors.

6. Learn some calculator tricks.

6 powerful calculator shortcuts


Fast and safe techniques for common calculations.

#1 Solve ax^2 + bx + c = 0

The formula x = [−b ± √(b^2 − 4ac)] / (2a) is OK when a, b, and c are nice and small
numbers. However, when a, b, and c have many decimals or are large numbers and we are
under pressure, the standard solution often falls apart in the heat of the exam.

Example 1. Solve 0.3247x^2 − 89.508x + 0.752398 = 0 in 15 seconds.

If candidates need to solve this equation in the exam, many will be flustered. The standard
approach x = [−b ± √(b^2 − 4ac)] / (2a) is labor intensive and prone to errors when a, b,
and c are messy.

To solve this equation 100% right under pressure and in a hurry, we'll use a little trick.
First, we set x = v = 1/(1 + r). So we treat x as a dummy discount factor. The original
equation becomes:

0.3247v^2 − 89.508v + 0.752398 = 0

If we can find r, the dummy interest rate, we'll be able to find x.

Finding r is a concept you learned in Exam FM. We first convert the equation to the
following cash flow diagram:

Time t        0            1            2

Cash flow     $0.752398    −$89.508     $0.3247



So at time zero, you receive $0.752398. At time one, you pay $89.508. Finally, at time
two, you receive $0.3247. What's your IRR?

To find r (the IRR), we simply use the Cash Flow Worksheet in the BA II Plus or
BA II Plus Professional.

Enter the following cash flows into the Cash Flow Worksheet:

Cash flow     CF0 = 0.752398    C01 = −89.508    C02 = 0.3247
Frequency                       F01 = 1          F02 = 1

Because the cash flow frequency is one for both C01 and C02, we don't need to enter
F01 = 1 and F02 = 1. If we don't enter a cash flow frequency, the BA II Plus and BA II
Plus Professional use one as the default cash flow frequency.

Using the IRR function, we find that IRR = −99.63722807. Remember this is a
percentage. So r = −99.63722807%.

x1 = 1/(1 + r) = 1/(1 − 99.63722807%) = 275.6552834

How are we going to find the second root? We'll use the following formula:
If x1 and x2 are the two roots of ax^2 + bx + c = 0, then

x1 · x2 = c/a,   so   x2 = (1/x1)(c/a)

x2 = (1/x1)(c/a) = (1/275.6552834)(0.752398/0.3247) = 0.00840619



Keystrokes in BA II Plus / BA II Plus Professional
(Assume the calculator is set to display 8 decimal places.)

Procedure                                Keystroke               Display
Use the Cash Flow Worksheet              CF                      CF0 = (old content)
Clear the worksheet                      2nd [CLR WORK]          CF0 = 0.00000000
Enter the cash flow at t = 0             0.752398 Enter          CF0 = 0.75239800
Enter the cash flow at t = 1             ↓ 89.508 +/− Enter      C01 = −89.50800000
Enter the # of cash flows for C01        ↓                       F01 = 1.00000000
(the default is 1, so no need to
enter anything)
Enter the cash flow at t = 2             ↓ 0.3247 Enter          C02 = 0.32470000
Calculate IRR                            IRR                     IRR = 0.00000000
                                         CPT                     IRR = −99.63722807
Convert the IRR to a decimal             ÷ 100 =                 −0.99637228 (this is the
                                                                 dummy interest rate r)
Find the dummy discount factor           + 1 =                   0.00362772 (this is 1 + r)
x1 = 1/(1 + r)                           1/x                     275.65528324 (this is x1)
Store x1 in Memory 0; this leaves        STO 0                   275.65528324
an auditing trail
Find the 2nd root                        1/x × 0.752398          0.00840619 (this is x2)
x2 = (1/x1)(c/a)                         ÷ 0.3247 =
Store x2 in Memory 1; this leaves        STO 1                   0.00840619
an auditing trail



You can always double-check your calculations. Retrieve x1 and x2 from the calculator
memory and plug them into 0.3247x^2 − 89.508x + 0.752398. You should get a value close
to zero. For example, plugging in x1 = 275.6552834:

0.3247x^2 − 89.508x + 0.752398 = 0.00000020 (OK)

Plugging in x2 = 0.00840619:

0.3247x^2 − 89.508x + 0.752398 = 6.2 × 10^−12 (OK)

We didn't get exactly zero due to rounding.

Does this look like a lot of work? It does the first time. Once you are familiar with the
process, it takes about 15 seconds to calculate x1 and x2 and double-check that they are
right.

Quick and error-free solution process for ax^2 + bx + c = 0

Step 1    Rearrange ax^2 + bx + c = 0 into c + bx + ax^2 = 0.

Step 2    Use the BA II Plus/BA II Plus Professional Cash Flow Worksheet to find the IRR:

CF0 = c (cash flow at time zero)
C01 = b (cash flow at time one)
C02 = a (cash flow at time two)

Time t        0    1    2

Cash flow     c    b    a

Step 3    Find x1 and x2:

x1 = 1 / (1 + IRR/100),    x2 = (1/x1)(c/a)

Step 4    Plug in x1 and x2. Check whether ax^2 + bx + c = 0.



In the exam, if an equation is overly simple, just try out the answers. If an equation is not
overly simple, always use the above process to solve ax^2 + bx + c = 0.

For example, if you see x^2 − 2x − 3 = 0, you can guess that x1 = −1 and x2 = 3. However,
if you see x^2 − 2x − 7.3 = 0, use the Cash Flow Worksheet to solve it.

Exercise

#1 Solve 10,987x^2 + 65,864x − 98,321 = 0
Answer: x1 = −7.2321003 and x2 = 1.23737899

#2 Solve x^2 − 2x − 7.3 = 0
Answer: x1 = 3.88097206 and x2 = −1.88097206

#3 Solve 0.9080609x^2 − 0.00843021x − 0.99554743 = 0
Answer: x1 = 1.0517168 and x2 = −1.04243305

#4 Solve x^2 − 2x + 3 = 0
Answer: you'll get an error message if you try to calculate the IRR. There's no real
solution: x^2 − 2x + 3 = (x − 1)^2 + 2 ≥ 2 > 0.
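If you want to double-check answers like these while practicing at home, the product-of-roots idea is easy to code. The following Python sketch is my own illustration, not part of the exam toolkit (you obviously can't run code in the exam room), and `solve_quadratic` is a hypothetical helper name:

```python
import math

def solve_quadratic(a, b, c):
    """Solve a*x^2 + b*x + c = 0 for real roots.

    Computes one root from the standard formula, then recovers the
    other from the product of roots x1*x2 = c/a, the same trick the
    shortcut above uses; this is also numerically safer when b^2 >> 4ac.
    """
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # no real solution, as in exercise #4
    # Choose the sign that avoids subtracting nearly equal numbers.
    q = -(b + math.copysign(math.sqrt(disc), b)) / 2
    return q / a, c / q

print(solve_quadratic(0.3247, -89.508, 0.752398))
# prints the two roots found above, roughly 275.655 and 0.0084
```

Running it on the exercises reproduces the answers, including `None` for #4, which has no real solution.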

#2 Keep track of your calculation

Example 1

A group of 23 highly talented actuarial students in a large insurance company are taking
SOA Exam C at the next exam sitting. The probability of each candidate passing Exam C
is 0.73, independent of other students passing or failing the exam. The company
promises to give each actuarial student who passes Exam C a raise of $2,500. What's the
probability that the insurance company will spend at least $50,000 on raises associated
with passing Exam C?

Solution

If the company spends at least $50,000 on exam-related raises, then the number of
students who will pass Exam C must be at least 50,000/2,500=20. So we need to find the
probability of having at least 20 students pass Exam C.

Let X = the number of students who will pass Exam C. The problem does not specify the
distribution of X, so possibly X has a binomial distribution. Let's check the conditions
for a binomial distribution:



• There are only two outcomes for each student taking the exam: either Pass or Fail.
• The probability of Pass (0.73) or Fail (0.27) remains constant from one student
to another.
• The exam result of one student does not affect that of another student.

X satisfies the requirements of a binomial random variable with parameters n = 23 and
p = 0.73. We need to find the probability that x ≥ 20.

Pr(x ≥ 20) = Pr(x = 20) + Pr(x = 21) + Pr(x = 22) + Pr(x = 23)

Applying the formula f_X(x) = C(n,x) p^x (1 − p)^(n−x), we have:

f(x ≥ 20) = C(23,20)(.73)^20(.27)^3 + C(23,21)(.73)^21(.27)^2
          + C(23,22)(.73)^22(.27) + C(23,23)(.73)^23 = 0.09608

Therefore, there is a 9.6% chance that the company will have to spend at least $50,000
on exam-related raises.
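When practicing at home, you can confirm the 0.09608 figure with a few lines of Python (an at-home check of the calculator work, not an exam technique):

```python
from math import comb

# P(X >= 20) for X ~ Binomial(n = 23, p = 0.73)
p = 0.73
prob = sum(comb(23, k) * p**k * (1 - p)**(23 - k) for k in range(20, 24))
print(round(prob, 5))  # 0.09608
```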

Calculator key sequence for BA II Plus:

Method #1: direct calculation without using the memories

Procedure                              Keystroke                        Display
Set the display to 8 decimal places    2nd [FORMAT] 8 Enter             DEC = 8.00000000
(4 decimal places are sufficient,
but assume you want to see more
decimals)
Set AOS (Algebraic operating           2nd [FORMAT], keep pressing ↓    AOS
system)                                until you see "Chn", then
                                       press 2nd [ENTER]. (If you
                                       see "AOS", your calculator is
                                       already in AOS, in which case
                                       press [CLR Work].)
Calculate C(23,20)(.73)^20(.27)^3
  Calculate C(23,20)                   23 2nd nCr 20 ×                  1,771.00000000
  Multiply by (.73)^20                 .73 y^x 20 ×                     3.27096399
  Multiply by (.27)^3 and add          .27 y^x 3 +                      0.06438238
Calculate C(23,21)(.73)^21(.27)^2
  Calculate C(23,21)                   23 2nd nCr 21 ×                  253.00000000
  Multiply by (.73)^21                 .73 y^x 21 ×                     0.34111482
  Multiply by (.27)^2 and add          .27 x² +                         0.08924965
Calculate C(23,22)(.73)^22(.27)
  Calculate C(23,22)                   23 2nd nCr 22 ×                  23.00000000
  Multiply by (.73)^22                 .73 y^x 22 ×                     0.02263762
  Multiply by (.27) and add            .27 +                            0.09536181
Calculate C(23,23)(.73)^23
  Calculate C(23,23)                   23 2nd nCr 23 ×                  1.00000000
  Multiply by (.73)^23 to get the      .73 y^x 23 =                     0.09608031
  final result

Method #2: store intermediate values in the calculator's memories

Procedure                              Keystroke                        Display
Set the display to 8 decimal places    2nd [FORMAT] 8 Enter             DEC = 8.00000000
Set AOS                                (same as in Method #1)           AOS
Clear the memories                     2nd MEM 2nd [CLR Work]           M0 = 0.00000000
Get back to calculation mode           CE/C                             0.00000000
Calculate C(23,20)(.73)^20(.27)^3 and store it in Memory 0
  Calculate C(23,20)                   23 2nd nCr 20 ×                  1,771.00000000
  Multiply by (.73)^20                 .73 y^x 20 ×                     3.27096399
  Multiply by (.27)^3                  .27 y^x 3 =                      0.06438238
  Store the result in Memory 0         STO 0                            0.06438238
  Get back to calculation mode         CE/C                             0.00000000
Calculate C(23,21)(.73)^21(.27)^2 and store it in Memory 1
  Calculate C(23,21)                   23 2nd nCr 21 ×                  253.00000000
  Multiply by (.73)^21                 .73 y^x 21 ×                     0.34111482
  Multiply by (.27)^2                  .27 x² =                         0.02486727
  Store the result in Memory 1         STO 1                            0.02486727
  Get back to calculation mode         CE/C                             0.00000000
Calculate C(23,22)(.73)^22(.27) and store it in Memory 2
  Calculate C(23,22)                   23 2nd nCr 22 ×                  23.00000000
  Multiply by (.73)^22                 .73 y^x 22 ×                     0.02263762
  Multiply by (.27)                    .27 =                            0.00611216
  Store the result in Memory 2         STO 2                            0.00611216
  Get back to calculation mode         CE/C                             0.00000000
Calculate C(23,23)(.73)^23 and store it in Memory 3
  Calculate C(23,23)                   23 2nd nCr 23 ×                  1.00000000
  Multiply by (.73)^23                 .73 y^x 23 =                     0.00071850
  Store the result in Memory 3         STO 3                            0.00071850
Recall the values stored in Memories 0 through 3 and sum them up
                                       RCL 0                            0.06438238
                                       + RCL 1                          0.02486727
                                       + RCL 2                          0.00611216
                                       + RCL 3 =                        0.09608031

Comparing Method #1 with Method #2:

Method #1 is quicker but riskier. Because you don't have an audit history, if you
miscalculate one item, you'll need to recalculate everything from scratch.

Method #2 is slower but leaves a good auditing trail by storing all your intermediate
values in the calculator's memories. If you miscalculate one item, you need to
recalculate only that item and can reuse the results of the other calculations (which are
correct).

For example, suppose that instead of calculating C(23,20)(.73)^20(.27)^3 as you should,
you calculated C(23,20)(.73)^3(.27)^20. To correct this error under Method #1, you have
to start from scratch and calculate each of the following four items:

C(23,20)(.73)^20(.27)^3, C(23,21)(.73)^21(.27)^2, C(23,22)(.73)^22(.27), and C(23,23)(.73)^23

In contrast, correcting this error under Method #2 is a lot easier. You just need to
recalculate C(23,20)(.73)^20(.27)^3; you don't need to recalculate any of the following
three items:

C(23,21)(.73)^21(.27)^2, C(23,22)(.73)^22(.27), and C(23,23)(.73)^23

You can easily retrieve the above three items from the calculator's memories and
calculate the final result:

C(23,20)(.73)^20(.27)^3 + C(23,21)(.73)^21(.27)^2 + C(23,22)(.73)^22(.27)
+ C(23,23)(.73)^23 = 0.09608

Example 2 (a reserve example for Exam C)

Given:
l20 = 9,617,802
l30 = 9,501,381
l50 = 8,950,901
A50 = 0.24905
a20 = 16.5133
a30 = 15.8561
a50 = 13.2668
Interest rate: 6%

Calculate

V = A50 · v^20 · [ a20 − (l30/l20) · v^10 · a30 ] / { (l30/l50) · [ a20 − (l50/l20) · v^30 · a50 ] }

Solution

This calculation is complex. Unless you use a systematic method, you'll make mistakes.

Calculation steps using the BA II Plus/BA II Plus Professional

Step 1    Simplify the calculation.

Dividing both the numerator and the denominator by l30 cancels the factor l30/l50:

V = A50 · v^20 · [ a20/l30 − (a30/l20) · v^10 ] / [ a20/l50 − (a50/l20) · v^30 ]

With v = 1.06^−1, this becomes:

V = A50 · 1.06^−20 · [ a20/l30 − (a30/l20) · 1.06^−10 ] / [ a20/l50 − (a50/l20) · 1.06^−30 ]

Make sure you don't make mistakes in the simplification. If you are afraid of making
mistakes, don't simplify; just do your calculations using the original equation.

Step 2    Assign a memory to each input in the formula above.

Input    Memory    Value
l20      M0        9,617,802
l30      M1        9,501,381
l50      M2        8,950,901
A50      M3        0.24905
a20      M4        16.5133
a30      M5        15.8561
a50      M6        13.2668

After you assign a memory to each input, the formula becomes:

V = (M3) · 1.06^−20 · [ M4/M1 − (M5/M0) · 1.06^−10 ] / [ M4/M2 − (M6/M0) · 1.06^−30 ]

Calculator key sequence to assign memories to the inputs:

Procedure                              Keystroke                    Display
Set the display to 8 decimal places    2nd [FORMAT] 8 Enter         DEC = 8.00000000
Set AOS                                (same as before)             AOS
Clear existing numbers from the        2nd MEM 2nd [CLR Work]       M0 = 0.00000000
memories
Enter 9,617,802 in M0                  9,617,802 Enter              M0 = 9,617,802.000
Move to the next memory                ↓                            M1 = 0.00000000
Enter 9,501,381 in M1                  9,501,381 Enter              M1 = 9,501,381.000
Move to the next memory                ↓                            M2 = 0.00000000
Enter 8,950,901 in M2                  8,950,901 Enter              M2 = 8,950,901.000
Move to the next memory                ↓                            M3 = 0.00000000
Enter 0.24905 in M3                    0.24905 Enter                M3 = 0.24905000
Move to the next memory                ↓                            M4 = 0.00000000
Enter 16.5133 in M4                    16.5133 Enter                M4 = 16.51330000
Move to the next memory                ↓                            M5 = 0.00000000
Enter 15.8561 in M5                    15.8561 Enter                M5 = 15.85610000
Move to the next memory                ↓                            M6 = 0.00000000
Enter 13.2668 in M6                    13.2668 Enter                M6 = 13.26680000
Leave the memory workbook and get      CE/C
back to the normal calculation
mode (this is the button on the
bottom left corner, the same
button as CLR Work)

Step 3    Double-check the data entry.

Don't bypass this step; it's easy to enter wrong data.

Keystrokes: press 2nd MEM, then keep pressing the down-arrow key to view all the data
you entered in the memories. Make sure all the numbers are entered correctly.

Step 4    Do the final calculation.

V = (M3) · 1.06^−20 · [ M4/M1 − (M5/M0) · 1.06^−10 ] / [ M4/M2 − (M6/M0) · 1.06^−30 ]

We'll break the calculation down into two pieces:

M4/M1 − (M5/M0) · 1.06^−10 = M7    (store the result in M7)
M4/M2 − (M6/M0) · 1.06^−30 = M8    (store the result in M8)
V = (M3) · 1.06^−20 · M7/M8

Procedure                                 Keystroke                          Display
Calculate                                 RCL 4 ÷ RCL 1 − RCL 5 ÷ RCL 0      0.00000082
M4/M1 − (M5/M0) · 1.06^−10                × 1.06 y^x 10 +/− =
Store the result in M7; go back to        STO 7 CE/C
the normal calculation mode
Calculate                                 RCL 4 ÷ RCL 2 − RCL 6 ÷ RCL 0      0.00000160
M4/M2 − (M6/M0) · 1.06^−30                × 1.06 y^x 30 +/− =
Store the result in M8; go back to        STO 8 CE/C
the normal calculation mode
Calculate                                 RCL 3 × 1.06 y^x 20 +/− ×          0.03955560
V = (M3) · 1.06^−20 · M7/M8               RCL 7 ÷ RCL 8 =

So V = 0.0395556 ≈ 0.04.

Though this calculation process looks long, once you get used to it, you can do it in less
than one minute.
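As an at-home check of Example 2 (my own sketch, not part of the calculator procedure), the whole computation fits in a few lines of Python; the names M7 and M8 mirror the calculator memories used above:

```python
# Reserve calculation from Example 2, done in plain Python so you can
# verify the calculator work. Inputs follow the memory assignments.
l20, l30, l50 = 9_617_802, 9_501_381, 8_950_901
A50 = 0.24905
a20, a30, a50 = 16.5133, 15.8561, 13.2668
v = 1 / 1.06

M7 = a20 / l30 - (a30 / l20) * v**10   # ~0.00000082, as stored in M7
M8 = a20 / l50 - (a50 / l20) * v**30   # ~0.00000160, as stored in M8
V = A50 * v**20 * M7 / M8
print(round(V, 2))  # 0.04
```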

Advantages of this calculation process:



• Inputs are entered only once. In this problem, l20 and a20 are each used twice in the
formula, but we enter them into the memories only once. This reduces data entry
errors.

• The process gives us a good auditing trail, enabling us to check the data entry and
the calculations.

• We can isolate errors. For example, if a wrong value of l30 is entered into the
memory, we can re-enter l30, recalculate a20/l30 − (a30/l20) · 1.06^−10, store the
recalculated value in M7, and then recalculate V = (M3) · 1.06^−20 · M7/M8.

Bottom line: I recommend that you master this calculation method. It costs you extra
work, but it enables you to do messy calculations 100% right in the exam.

When exams get tough and calculations get messy, many candidates who know as much
as you do will make calculation errors here and there and fail the exam. In contrast,
you'll stand above the crowd, make no errors, and pass another exam.

Problem 3 (Reserve example revisited)

In Example 2, you calculated V = 0.04. However, none of the answer choices given
is 0.04. Suspecting that you made a calculation error, you decided to redo the
calculation. First, you scrolled through the memories and gladly found no error in the
data entry. Next, you recalculated M4/M1 − (M5/M0) · 1.06^−10 = M7 and
M4/M2 − (M6/M0) · 1.06^−30 = M8.
Once again, you found your previous calculations were right. Finally, you recalculated
V = (M3) · 1.06^−20 · M7/M8. Once again, you got V = 0.04.

You have already spent four minutes on this problem. You decided to spend two more
minutes on it. If you couldn't figure out the right answer, you would just have to give it
up and move on to the next problem.

So you quickly read the problem again. Oops! You found that your formula was wrong.
Your original formula was:

V = A50 · v^20 · [ a20 − (l30/l20) · v^10 · a30 ] / { (l30/l50) · [ a20 − (l50/l20) · v^30 · a50 ] }

The correct formula should be:

V = a50 · v^20 · [ a20 − (l30/l20) · v^10 · a30 ] / { (l30/l50) · [ a20 − (l50/l20) · v^30 · a50 ] }

How could you find the answer quickly, using the correct formula?

Solution

The situation described here sometimes happens in the actual exam. If you don't use a
systematic method to do calculations, you won't leave a good auditing trail. In that case,
all your previous calculations are gone and you have to redo them from scratch. This is
awful.

Fortunately, you left a good auditing trail, and correcting the error is easy.

Your previous formula, after assigning memories to the inputs, was:

V = A50 · v^20 · [ ... ] = (M3) · 1.06^−20 · M7/M8

The correct formula is:

V = a50 · v^20 · [ ... ] = (M6) · 1.06^−20 · M7/M8

Remember a50 = M6. You simply reuse M7 and M8 and calculate:

V = (M6) · 1.06^−20 · M7/M8 = 2.10713362 ≈ 2.11

Now you look at the answer choices again. Good. 2.11 is there!
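A quick Python check (my own sketch, not the manual's calculator procedure) confirms how cheap the fix is: the intermediate values M7 and M8 are unchanged, and only the leading factor A50 is swapped for a50:

```python
# Problem 3: only the leading factor changes (A50 -> a50), so the stored
# intermediate values M7 and M8 are reused unchanged.
l20, l30, l50 = 9_617_802, 9_501_381, 8_950_901
a20, a30, a50 = 16.5133, 15.8561, 13.2668
v = 1 / 1.06

M7 = a20 / l30 - (a30 / l20) * v**10
M8 = a20 / l50 - (a50 / l20) * v**30
V = a50 * v**20 * M7 / M8
print(round(V, 2))  # 2.11
```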

#3 Calculate mean and variance of a discrete random variable

There are two approaches:

• Use the TI-30 IIS (using the redo capability of the TI-30 IIS)
• Use the BA II Plus/BA II Plus Professional 1-V Statistics Worksheet

Example #1 (#8, Course 1, May 2000) A probability distribution of the claim sizes for
an auto insurance policy is given in the table below:

Claim Size    Probability
20            0.15
30            0.10
40            0.05
50            0.20
60            0.10
70            0.10
80            0.30

What percentage of the claims are within one standard deviation of the mean claim size?

(A) 45%   (B) 55%   (C) 68%   (D) 85%   (E) 100%

Solution

This problem is conceptually easy but calculation-intensive, and it is easy to make
calculation errors. Always let the calculator do the calculations for you.

One critical thing to remember about the BA II Plus and BA II Plus Professional
Statistics Worksheet is that you cannot directly enter the probability mass function f(x_i)
into the calculator to find E(X) and Var(X). The BA II Plus and BA II Plus Professional
1-V Statistics Worksheet accepts only scaled-up probabilities that are positive integers. If
you enter a non-integer value into the Statistics Worksheet, you will get an error when
attempting to retrieve E(X) and Var(X).

To overcome this constraint, first scale up f(x_i) to an integer by multiplying f(x_i) by a
common integer.
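The scaling is harmless because multiplying every weight by the same constant changes neither the weighted mean nor the variance. A short Python sketch (my own illustration; the exam tool is the calculator) shows the scaled integer weights reproducing the true mean and standard deviation:

```python
# Integer weights 100*f(x) give the same mean and (population) standard
# deviation as the true probabilities f(x), because the common scaling
# factor cancels out of every weighted average.
xs      = [20, 30, 40, 50, 60, 70, 80]
weights = [15, 10,  5, 20, 10, 10, 30]   # 100 * probability

n = sum(weights)                                    # 100, the scaling factor
mean = sum(x * w for x, w in zip(xs, weights)) / n
var = sum(x**2 * w for x, w in zip(xs, weights)) / n - mean**2
print(mean, round(var**0.5, 4))  # 55.0 21.7945
```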



Claim Size x    Probability Pr(x)    Scaled-up probability = 100 Pr(x)
20              0.15                 15
30              0.10                 10
40              0.05                  5
50              0.20                 20
60              0.10                 10
70              0.10                 10
80              0.30                 30
Total           1.00                100

Next, enter the 7 data pairs (claim size, scaled-up probability) into the BA II Plus
Statistics Worksheet to get E(X) and σx.

BA II Plus and BA II Plus Professional calculator key sequences:


Procedure Keystrokes Display
Set the calculator to display
4 decimal places 2nd [FORMAT] 4 ENTER DEC=4.0000

Set AOS (Algebraic


operating system) 2nd [FORMAT],
keep pressing multiple
times until you see Chn.
Press 2nd [ENTER]
AOS
(if you see AOS, your
calculator is already in
AOS, in which case press
[CLR Work] )
Select data entry portion of
Statistics worksheet 2nd [Data] X01 (old contents)

Clear worksheet 2nd [CLR Work] X01 0.0000


Enter data set
20 ENTER X01=20.0000
Y01=15.0000
15 ENTER

30 ENTER X02=30.0000
Y02=10.0000
10 ENTER

40 ENTER X03=40.0000
Y03=5.0000
5 ENTER
50 ENTER
X04=50.0000
20 ENTER Y04=20.0000

60 ENTER X05=60.0000
Y05=10.0000
10 ENTER

70 ENTER X06=70.0000
Y06=10.0000
10 ENTER

80 ENTER X07=80.0000
Y07=30.0000
30 ENTER
Select statistical calculation
portion of Statistics 2nd [Stat] Old content
worksheet
Select one-variable
calculation method Keep pressing 2nd SET 1-V
until you see 1-V
View the sum of the scaled- n=100.0000 (Make sure the
up probabilities sum of the scaled-up
probabilities is equal to the
scaled-up common factor,
which in this problem is
100. If n is not equal to the
common factor, you've
made a data entry error.)
View mean                        X̄ = 55.0000
View sample standard             S_X = 21.9043 (this is a
deviation                        sample standard deviation;
                                 don't use this value). Note that
                                 S_X = sqrt[ (1/(n - 1)) Σ (Xᵢ - X̄)² ]
View standard deviation          σ_X = 21.7945

View ΣX                          ΣX = 5,500.0000 (not
                                 needed for this problem)

View ΣX²                         ΣX² = 350,000.0000 (not
                                 needed for this problem,
                                 though this function might
                                 be useful for other
                                 calculations)

You should always double check (using the up and down arrow keys to scroll through the data pairs of X and
Y) that your data entry is correct before accepting E(X) and σ_X generated by BA II
Plus.

If you have made an error in data entry, you can press 2nd DEL to delete a data pair (X, Y) or
2nd INS to insert a data pair (X, Y). If you typed a wrong number, you can delete
the wrong number and then re-enter the correct number. Refer to the BA II Plus
guidebook for details on how to correct data entry errors.

If this procedure of calculating E(X) and σ_X seems more time-consuming than the
formula-driven approach, it could be because you are not familiar with the BA II Plus
Statistics Worksheet yet. With practice, you will find that using the calculator is quicker
than manually calculating with formulas.

Then, we have

(X̄ - σ_X, X̄ + σ_X) = (55 - 21.7945, 55 + 21.7945) = (33.21, 76.79)

Finally, you find

Pr(33.21 ≤ X ≤ 76.79) = Pr(X = 40) + Pr(X = 50) + Pr(X = 60) + Pr(X = 70)
= 0.05 + 0.20 + 0.10 + 0.10 = 0.45

The answer is A.
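If you have a computer handy while studying, the whole calculation is easy to double check in a few lines. This is just a study-aid sketch (the variable names are my own), not something available in the exam room:

```python
# Claim-size distribution from the problem.
sizes = [20, 30, 40, 50, 60, 70, 80]
probs = [0.15, 0.10, 0.05, 0.20, 0.10, 0.10, 0.30]

mean = sum(x * p for x, p in zip(sizes, probs))                 # E(X)
var = sum(x * x * p for x, p in zip(sizes, probs)) - mean ** 2  # E(X^2) - E(X)^2
sd = var ** 0.5

# Total probability of claim sizes within one standard deviation of the mean.
within = sum(p for x, p in zip(sizes, probs) if mean - sd <= x <= mean + sd)
print(round(mean, 4), round(sd, 4), round(within, 4))   # 55.0 21.7945 0.45
```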

Using TI-30X IIS

First, calculate E(X) using E(X) = Σ x f(x). Then modify the formula

Σ x f(x) to Σ x² f(x) to calculate Var(X) without re-entering f(x).

To find E ( X ) , we type:
20*.15+30*.1+40*.05+50*.2+60*.1+70*.1+80*.3

Then press Enter. E ( X ) =55.


Next we modify the formula

20×.15+30×.1+40×.05+50×.2+60×.1+70×.1+80×.3

to

20²×.15+30²×.1+40²×.05+50²×.2+60²×.1+70²×.1+80²×.3

To change 20× to 20²×, move the cursor immediately to the right of the number 20 so
your cursor is blinking on top of the multiplication sign ×. Press 2nd INS x².

You find that

20²×.15+30²×.1+40²×.05+50²×.2+60²×.1+70²×.1+80²×.3 = 3,500

So E(X²) = 3,500

Var(X) = E(X²) - E²(X) = 3,500 - 55² = 475.

Finally, you can calculate σ_X = √475 = 21.7945 and the range (X̄ - σ_X, X̄ + σ_X).

Keep in mind that you can enter up to 88 digits for a formula in the TI-30X IIS. If your
formula exceeds 88 digits, the TI-30X IIS will ignore the digits entered after the 88th digit.

Example 2 (#19, Course 1 November 2001)

A baseball team has scheduled its opening game for April 1. If it rains on April 1, the
game is postponed and will be played on the next day that it does not rain. The team
purchases insurance against rain. The policy will pay 1,000 for each day, up to 2 days,
that the opening game is postponed. The insurance company determines that the number
of consecutive days of rain beginning on April 1 is a Poisson random variable with a 0.6
mean. What is the standard deviation of the amount the insurance company will have to
pay?

(A) 668, (B) 699, (C) 775, (D) 817, (E) 904

Solution

Let N = # of days it rains consecutively. N can be 0, 1, 2, or any non-negative integer.

Pr(N = n) = e^(-λ) λⁿ / n! = e^(-0.6) (0.6)ⁿ / n!    (n = 0, 1, 2, ...)


Let X = payment by the insurance company. According to the insurance contract, if there
is no rain (n=0), X=0. If it rains for only 1 day, X=$1,000. If it rains for two or more days
in a row, X is always $2,000. We are asked to calculate σ_X.

If a problem asks you to calculate the mean, standard deviation, or other statistics of a
discrete random variable, it is always a good idea to list the variable's values and their
corresponding probabilities in a table before doing the calculation to organize your data.
So let's list the data pairs (X, probability) in a table:

Payment X    Probability of receiving X
0            Pr(N = 0) = e^(-0.6) (0.6)⁰ / 0! = e^(-0.6)
1,000        Pr(N = 1) = e^(-0.6) (0.6)¹ / 1! = 0.6e^(-0.6)
2,000        Pr(N ≥ 2) = Pr(N = 2) + Pr(N = 3) + ...
             = 1 - [Pr(N = 0) + Pr(N = 1)] = 1 - 1.6e^(-0.6)

Once you set up the table above, you can use BA II Plus's Statistics Worksheet or the TI-30X
IIS to find the mean and variance.

Calculation Method 1 --- Using TI-30X IIS

First we calculate the mean by typing:

1000*.6e^(-.6)+2000(1-1.6e^(-.6

When typing e^(-.6) for e^(-0.6), you need to use the negative sign, not the minus sign, to
get -.6. If you type the minus sign in e^(-.6), you will get an error message.

Additionally, for 0.6e^(-0.6), you do not need to type 0.6*e^(-.6); just type .6e^(-.6). Also,
to calculate 2000(1 - 1.6e^(-0.6)), you do not need to type 2000*(1-1.6*(e^(-.6))). Simply
type

2000(1-1.6e^(-.6

Your calculator understands you are trying to calculate 2000(1 - 1.6e^(-0.6)). However, the
omission of the closing parentheses works only for the last item in your formula. In other
words, if your equation is

2000(1 - 1.6e^(-0.6)) + 1000(0.6e^(-0.6))



you have to type the first item in its full parenthesis, but can skip typing the closing
parenthesis in the 2nd item:

2000(1-1.6e^(-.6)) + 1000*.6e^(-.6

If you type

2000(1-1.6e^(-.6 + 1000*.6e^(-.6

your calculator will interpret this as

2000(1-1.6e^(-.6 + 1000*.6e^(-.6) ) )

Of course, this is not your intention.

Lets come back to the calculation. After you type

1000*.6e^(-.6)+2000(1-1.6e^(-.6

press ENTER. You should get E(X) = 573.0897. This is an intermediate value. You
can store it on your scrap paper or in one of your calculator's memories.

Next, modify your formula to get E(X²) by typing:

1000²×.6e^(-.6)+2000²(1-1.6e^(-.6

You will get 816,892.5107. This is E(X²). Next, calculate Var(X):

Var(X) = E(X²) - E²(X) = 488,460.6535
σ_X = √Var(X) = 698.8996
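When practicing at home, you can verify the whole chain of arithmetic with a short script. This is my own sketch (the variable names are assumptions, not part of the problem):

```python
import math

p0 = math.exp(-0.6)          # Pr(N = 0)
p1 = 0.6 * math.exp(-0.6)    # Pr(N = 1)
p2 = 1.0 - p0 - p1           # Pr(N >= 2)

payments = [0, 1000, 2000]
probs = [p0, p1, p2]

ex = sum(x * p for x, p in zip(payments, probs))        # E(X)
ex2 = sum(x * x * p for x, p in zip(payments, probs))   # E(X^2)
sd = math.sqrt(ex2 - ex * ex)                           # standard deviation

print(round(ex, 4), round(sd, 4))   # 573.0897 698.8996
```

This matches answer choice (B) 699 after rounding.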

Calculation Method 2 --Using BA II Plus/ BA II Plus Professional

First, please note that you can always calculate σ_X without using the BA II Plus built-in
Statistics Worksheet. You can calculate E(X), E(X²), and Var(X) on the BA II Plus as you do
any other calculations without using the built-in worksheet.

In this problem, the equations used to calculate σ_X are:

E(X) = 0 × e^(-0.6) + 1,000(0.6e^(-0.6)) + 2,000(1 - 1.6e^(-0.6))



E(X²) = 0² × e^(-0.6) + 1,000²(0.6e^(-0.6)) + 2,000²(1 - 1.6e^(-0.6))

Var(X) = E(X²) - E²(X),    σ_X = √Var(X)

You simply calculate each item in the above equations with BA II Plus. This will give
you the required standard deviation.

However, we do not want to do this hard-core calculation in an exam. BA II Plus already


has a built-in statistics worksheet and we should utilize it.

The key to using the BA II Plus Statistics Worksheet is to scale up the probabilities to
integers. To scale the three probabilities

(e^(-0.6), 0.6e^(-0.6), 1 - 1.6e^(-0.6))

is a bit challenging, but there is a way:

Payment X    Probability (assuming you set         Scaled-up probability (multiply
             your BA II Plus to display 4          the original probability
             decimal places)                       by 10,000)
0            e^(-0.6) = 0.5488                     5,488
1,000        0.6e^(-0.6) = 0.3293                  3,293
2,000        1 - 1.6e^(-0.6) = 0.1219              1,219
Total        1.0                                   10,000

Then we just enter the following data pairs into BA II Plus's statistics worksheet:

X01=0 Y01=5,488;
X02=1,000 Y02=3,293;
X03=2,000 Y03=1,219.

Then the calculator will give you σ_X = 698.8966

Make sure your calculator gives you n that matches the sum of the scaled-up
probabilities. In this problem, the sum of your scaled-up probabilities is 10,000, so you
should get n=10,000. If your calculator gives you n that is not 10,000, you know that at
least one of the scaled-up probabilities is wrong.

Of course, you can scale up the probabilities with better precision (more closely
resembling the original probabilities). For example, you can scale them up this way
(assuming you set your calculator to display 8 decimal places):



Payment X    Probability                           Scaled-up probability, more precise
                                                   (multiply the original probability
                                                   by 100,000,000)
0            e^(-0.6) = 0.54881164                 54,881,164
1,000        0.6e^(-0.6) = 0.32928698              32,928,698
2,000        1 - 1.6e^(-0.6) = 0.12190138          12,190,138
Total                                              100,000,000

Then we just enter the following data pairs into BA II Plus's statistics worksheet:

X01=0 Y01=54,881,164;
X02=1,000 Y02=32,928,698;
X03=2,000 Y03=12,190,138.

Then the calculator will give you σ_X = 698.8995993 (remember to check that
n = 100,000,000).

For exam problems, scaling up the original probabilities by multiplying them by 10,000
is good enough to give you the correct answer. Under exam conditions it is unnecessary
to scale the probability up by multiplying by 100,000,000.
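The effect of the scaling granularity is easy to see in a quick sketch (my own code, not part of the manual's solution): integer frequencies stand in for probabilities, and even the coarser 10,000 scale is accurate to about a tenth of a unit.

```python
import math

def sd_from_freqs(freqs):
    """Population standard deviation of a {value: integer frequency} table."""
    n = sum(freqs.values())
    mean = sum(x * f for x, f in freqs.items()) / n
    ex2 = sum(x * x * f for x, f in freqs.items()) / n
    return math.sqrt(ex2 - mean * mean)

coarse = sd_from_freqs({0: 5488, 1000: 3293, 2000: 1219})            # scale 10,000
fine = sd_from_freqs({0: 54881164, 1000: 32928698, 2000: 12190138})  # scale 100,000,000
print(round(coarse, 4), round(fine, 4))   # 698.8966 698.8996
```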

#4 Calculate the sample variance

May 2000 #33


The number of claims a driver has during the year is assumed to be Poisson distributed
with an unknown mean that varies by driver.

The experience for 100 drivers is as follows:

# of claims during the year # of drivers


0 54
1 33
2 10
3 2
4 1
Total 100

Determine the credibility of one year's experience for a single driver using
semiparametric empirical Bayes estimation.

Solution



For now don't worry about credibility and focus on calculating the sample mean and
sample variance.

Standard calculation not using 1-V Statistics Worksheet

Let X represent the # of claims in a year, then

X̄ = [54(0) + 33(1) + 10(2) + 2(3) + 1(4)] / (54 + 33 + 10 + 2 + 1) = 63/100 = 0.63

Var(X) = [1/(n - 1)] Σ (Xᵢ - X̄)², summed over the n = 100 drivers

= [54(0 - 0.63)² + 33(1 - 0.63)² + 10(2 - 0.63)² + 2(3 - 0.63)² + 1(4 - 0.63)²] / (100 - 1)

= 0.68

Use 1-V Statistics Worksheet:

Enter

X01=0, Y01=54
X02=1, Y02=33
X03=2, Y03=10
X04=3, Y04=2
X05=4, Y05=1

You should get:


X = 0.63
S_X = 0.82455988 (this is the unbiased sample standard deviation)

While your calculator displays S_X = 0.82455988, press the x² key of your calculator.
You should get 0.67989899. This is Var(X) = S_X². So Var(X) = 0.67989899 ≈ 0.68
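If you want a second check on the unbiased sample variance while studying, a few lines of code reproduce it from the frequency table (my own sketch):

```python
# Claim-count frequency table from the problem.
counts = {0: 54, 1: 33, 2: 10, 3: 2, 4: 1}

n = sum(counts.values())                                   # 100 drivers
mean = sum(x * c for x, c in counts.items()) / n           # sample mean
ss = sum(c * (x - mean) ** 2 for x, c in counts.items())   # sum of squared deviations
sample_var = ss / (n - 1)                                  # unbiased sample variance

print(mean, round(sample_var, 8))   # 0.63 0.67989899
```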

#5 Find the conditional mean and conditional variance

Example

For an insurance:

A policyholders annual losses can be 100, 200, 300, and 400 with respective
probabilities 0.1, 0.2, 0.3, and 0.4.



The insurance has an annual deductible of $250 per loss.

Calculate the mean and the variance of the annual payment made by the insurer to the
policyholder, given there's a payment.

Solution

Let X represent the annual loss. Let Y represent the claim payment by the insurer to the
policyholder.

Then Y = 0 if X ≤ 250, and Y = X - 250 if X > 250.

We are asked to find E(Y | X > 250) and Var(Y | X > 250).

Standard solution

X         100     200     300     400
Y         0       0       50      150
P(X)      0.1     0.2     0.3     0.4

P(X > 250) = P(X = 300) + P(X = 400) = 0.3 + 0.4 = 0.7

Conditional on X > 250, X = 300 has probability 0.3/0.7 = 3/7 and X = 400 has
probability 0.4/0.7 = 4/7. So

E(X - 250 | X > 250) = 50(3/7) + 150(4/7) = 107.1428571

E[(X - 250)² | X > 250] = 50²(3/7) + 150²(4/7) = 13,928.57143

Var(X - 250 | X > 250) = 13,928.57143 - 107.1428571² = 2,448.98

Fast solution using BA II Plus/BA II Plus Professional 1-V Statistics Worksheet


X                        100          200          300        400
X > 250?                 No. Discard  No. Discard  Yes. Keep  Yes. Keep
(If yes, keep; if no,
discard.)

New table after discarding X ≤ 250:

X                                   300     400
Y = X - 250                         50      150
P(X)                                0.3     0.4
10P(X) (scaled-up probability)      3       4

Enter the following into 1-V Statistics Worksheet:

X01=50, Y01=3; X02=150, Y02=4

BA II Plus or BA II Plus Professional should give you:

n = 7, X̄ = 107.14, σ_X = 49.48716593

Var = σ² = 2,448.98
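The discard-and-renormalize logic is easy to mirror in code if you want to check it at home (a sketch with my own names):

```python
# Annual loss distribution and the per-loss deductible.
losses = [(100, 0.1), (200, 0.2), (300, 0.3), (400, 0.4)]
deductible = 250

# Keep only the losses that generate a payment, then renormalize.
kept = [(x - deductible, p) for x, p in losses if x > deductible]
total = sum(p for _, p in kept)                       # P(X > 250) = 0.7

mean = sum(y * p for y, p in kept) / total            # E(Y | payment)
ey2 = sum(y * y * p for y, p in kept) / total
var = ey2 - mean * mean                               # Var(Y | payment)

print(round(mean, 2), round(var, 2))   # 107.14 2448.98
```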

This is how BA II Plus/Professional 1-V Statistics Worksheet works. After you enter
X01=50, Y01=3,X02=150, Y02=4, BA II Plus/Professional knows that your random
variable X takes on two values: 50 (with frequency of 3) and 150 (with frequency 4).
Next, BA II Plus/Professional sets up the following table for statistics calculation:

X = 50 with probability 3/(3 + 4) = 3/7
X = 150 with probability 4/(3 + 4) = 4/7

Then, BA II Plus/Professional calculates the mean and variance:

E(X) = 50(3/7) + 150(4/7),

E(X²) = 50²(3/7) + 150²(4/7),

Var(X) = E(X²) - E²(X)

Compare BA II Plus/Professional calculation with our manual calculation presented


earlier:

E(X - 250 | X > 250) = 50(3/7) + 150(4/7)

E[(X - 250)² | X > 250] = 50²(3/7) + 150²(4/7)



Var(X - 250 | X > 250) = 13,928.57143 - 107.1428571² = 2,448.98

Now you see that BA II Plus/Professional correctly calculates the mean and variance.

In the BA II Plus/Professional 1-V Statistics Worksheet, what's important is the relative data
frequency, not the absolute data frequency.

The following entries produce identical mean, sample mean, and variance:

Entry One: X01=50, Y01=3; X02=150, Y02=4,


Entry Two: X01=50, Y01=6, X02=150, Y02=8,
Entry Three: X01=50, Y01=30, X02=150, Y02=40,

In each entry, BA II Plus/Professional produces the following table for calculation:

X = 50 with probability 3/7
X = 150 with probability 4/7

General procedure to calculate E[Y(x) | X > a] using the BA II Plus and BA II Plus
Professional 1-V Statistics Worksheet:
Throw away all the data pairs (Yᵢ, Xᵢ) where the condition X > a is NOT met.
Use the remaining data pairs to calculate E(Y) and Var(Y).

General procedure to calculate E[Y(x) | X < a] using the BA II Plus and BA II Plus
Professional 1-V Statistics Worksheet:
Throw away all the data pairs (Yᵢ, Xᵢ) where the condition X < a is NOT met.
Use the remaining data pairs to calculate E(Y) and Var(Y).

Example

You are given the following information (where k is a constant)

X = x    p_X(x)
0.5      (4/6)(0.5⁴)k
0.25     (1/6)(0.25³)(0.75)k
0.75     (1/6)(0.75³)(0.25)k

Calculate E(X) using the BA II Plus shortcut.

Solution

Please note that you don't need to calculate k.

X = x    p_X(x)                              Scaled-up p_X(x) (multiply
                                             p_X(x) by 1,000,000/k)
0.5      (4/6)(0.5⁴)k = 0.041667k            41,667
0.25     (1/6)(0.25³)(0.75)k = 0.001953k     1,953
0.75     (1/6)(0.75³)(0.25)k = 0.017578k     17,578

Next, we enter the following into BA II Plus/Professional 1-V Statistics Worksheet:

X01=0.5, Y01=41,667
X02=0.25, Y02= 1,953
X03=0.75, Y03=17,578

You should get: n = 61,198, X̄ = 0.56382970. So E(X) = 0.56382970.
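Because only the relative weights matter, the constant k drops out entirely; a short sketch (my own code) confirms the calculator result:

```python
# Scaled-up integer weights from the table; the common factor k has canceled.
weights = {0.5: 41667, 0.25: 1953, 0.75: 17578}

n = sum(weights.values())                          # 61,198
ex = sum(x * w for x, w in weights.items()) / n    # E(X)
print(n, round(ex, 4))   # 61198 0.5638
```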

Exam C Nov 2002 #29


You are given the following joint distribution:

         θ = 0    θ = 1
X = 0    0.4      0.1
X = 1    0.1      0.2
X = 2    0.1      0.1

For a given value of θ and a sample of size 10 for X:  Σ Xᵢ = 10 (summing i = 1 to 10)

Determine the Bühlmann credibility premium.

Solution



Don't worry about the Bühlmann credibility premium for now. All you need to do right
now is to calculate the following 7 items:

E(X | θ = 0), Var(X | θ = 0), E(X | θ = 1), Var(X | θ = 1),

E[E(X | θ)], Var[E(X | θ)], E[Var(X | θ)]

First, let's calculate E(X | θ = 0) and Var(X | θ = 0).

X given θ = 0    P(X, θ = 0)    10P(X, θ = 0)
0                0.4            4
1                0.1            1
2                0.1            1

Enter the following into 1-V Statistics Worksheet:

X01=0, Y01=4; X02=1, Y02=1; X03=2, Y03=1

BA II Plus or BA II Plus Professional should give you:


n = 6, X̄ = 0.5, σ_X = 0.76376262, Var = σ² = 0.58333333 = 7/12

E(X | θ = 0) = 0.5,  Var(X | θ = 0) = 0.58333333 = 7/12

Next, let's calculate E(X | θ = 1) and Var(X | θ = 1).

X given θ = 1    P(X, θ = 1)    10P(X, θ = 1)
0                0.1            1
1                0.2            2
2                0.1            1

Enter the following into 1-V Statistics Worksheet:

X01=0, Y01=1; X02=1, Y02=2; X03=2, Y03=1

BA II Plus or BA II Plus Professional should give you:

n = 4, X̄ = 1, σ_X = 0.70710678, Var = σ² = 0.70710678² = 0.5

E(X | θ = 1) = 1,  Var(X | θ = 1) = 0.5



Next, let's calculate E[E(X | θ)] and Var[E(X | θ)].

E(X | θ = 0) = 0.5,  P(θ = 0) = 0.4 + 0.1 + 0.1 = 0.6,  10P(θ = 0) = 6
E(X | θ = 1) = 1,    P(θ = 1) = 0.1 + 0.2 + 0.1 = 0.4,  10P(θ = 1) = 4

Enter the following into 1-V Statistics Worksheet:

X01=0.5, Y01=6; X02=1, Y02=4

BA II Plus or BA II Plus Professional should give you:

n = 10, X̄ = 0.7, σ_X = 0.24494897, Var = σ² = 0.24494897² = 0.06

E[E(X | θ)] = 0.7,  Var[E(X | θ)] = 0.06

Finally, let's calculate E[Var(X | θ)].

Var(X | θ = 0) = 7/12,  P(θ = 0) = 0.4 + 0.1 + 0.1 = 0.6,  10P(θ = 0) = 6
Var(X | θ = 1) = 0.5,   P(θ = 1) = 0.1 + 0.2 + 0.1 = 0.4,  10P(θ = 1) = 4

Enter the following into 1-V Statistics Worksheet:

X01=7/12, Y01=6; X02=0.5, Y02=4

BA II Plus or BA II Plus Professional should give you:

n = 10, X̄ = 0.55, σ_X = 0.04085483

E[Var(X | θ)] = 0.55
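All seven quantities can be checked directly from the joint distribution with a short script (my own sketch; the helper name is an assumption, not part of the exam solution):

```python
# joint[(x, theta)] = probability, from the table.
joint = {(0, 0): 0.4, (1, 0): 0.1, (2, 0): 0.1,
         (0, 1): 0.1, (1, 1): 0.2, (2, 1): 0.1}

def cond_moments(theta):
    """Marginal probability of theta, plus conditional mean and variance of X."""
    p_theta = sum(p for (x, t), p in joint.items() if t == theta)
    ex = sum(x * p for (x, t), p in joint.items() if t == theta) / p_theta
    ex2 = sum(x * x * p for (x, t), p in joint.items() if t == theta) / p_theta
    return p_theta, ex, ex2 - ex * ex

p0, m0, v0 = cond_moments(0)    # 0.6, 0.5, 7/12
p1, m1, v1 = cond_moments(1)    # 0.4, 1.0, 0.5

mean_of_means = p0 * m0 + p1 * m1                            # E[E(X|theta)] = 0.7
var_of_means = p0 * m0**2 + p1 * m1**2 - mean_of_means**2    # Var[E(X|theta)] = 0.06
mean_of_vars = p0 * v0 + p1 * v1                             # E[Var(X|theta)] = 0.55
```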

#6 Do the least squares regression

One useful yet neglected feature of BA II Plus/BA II Plus Professional is the linear least
squares regression functionality. This feature can help you quickly solve a tricky problem
with a few simple keystrokes. Unfortunately, 99.9% of the exam candidates don't know
of this feature. Even SOA doesn't know.



Let me quickly walk through the basic formula behind the linear least squares regression.
This part is also explained in the chapter on the Bühlmann credibility premium. So I will
just repeat what I said in that chapter.

In a regression analysis, you try to fit a line (or a function) through a set of points. With
least squares regression, you want to get a better fit by minimizing the distance squared
of each point to the fitted line. You then use the fitted line to project where the data point
is most likely to be.

Say you want to find out how one's income level affects how much life insurance he
buys. Let X represent one's income. Let Y represent the amount of life insurance this
person buys. You have collected some data pairs of (X, Y) from a group of consumers.
You suspect there's a linear relationship between X and Y. So you want to predict
Y using the function a + bX, where a and b are constants. With least squares regression,
you want to minimize the following:

Q = E[(a + bX - Y)²]

Next, we'll derive a and b.

∂Q/∂a = ∂/∂a E[(a + bX - Y)²] = E[ ∂/∂a (a + bX - Y)² ] = E[2(a + bX - Y)]

= 2E(a + bX - Y) = 2[a + bE(X) - E(Y)]

Setting ∂Q/∂a = 0:   a + bE(X) - E(Y) = 0   (Equation I)

∂Q/∂b = ∂/∂b E[(a + bX - Y)²] = E[ ∂/∂b (a + bX - Y)² ] = E[2(a + bX - Y)X]

= 2E[(a + bX - Y)X] = 2[aE(X) + bE(X²) - E(XY)]

Setting ∂Q/∂b = 0:   aE(X) + bE(X²) - E(XY) = 0   (Equation II)

(Equation II) - (Equation I) × E(X):

b[E(X²) - E²(X)] = E(XY) - E(X)E(Y)

However, E(X²) - E²(X) = Var(X) and E(XY) - E(X)E(Y) = Cov(X, Y).


b = Cov(X, Y) / Var(X),    a = E(Y) - bE(X)

where
Var(X) = E(X²) - E²(X),  E(X) = Σ pᵢxᵢ,  E(X²) = Σ pᵢxᵢ²

Cov(X, Y) = E(XY) - E(X)E(Y),  E(XY) = Σ pᵢxᵢyᵢ,  E(Y) = Σ pᵢyᵢ

pᵢ represents the probability that the data pair (xᵢ, yᵢ) occurs.

Example 1. For the following data pair ( xi , yi ) , find the linear least squares regression
line a + bX :

i    pᵢ(xᵢ, yᵢ)    X = xᵢ    Y = yᵢ
1    1/3           0         1
2    1/3           3         6
3    1/3           12        8

Also, calculate a + bX when X =0, 3, 12 respectively.

Solution

E(X) = (1/3)(0 + 3 + 12) = 5,  E(X²) = (1/3)(0² + 3² + 12²) = 51

Var(X) = 51 - 5² = 26

E(Y) = (1/3)(1 + 6 + 8) = 5,  E(XY) = (1/3)(0 × 1 + 3 × 6 + 12 × 8) = 38

Cov(X, Y) = E(XY) - E(X)E(Y) = 38 - 5 × 5 = 13

b = Cov(X, Y) / Var(X) = 13/26 = 0.5,  a = E(Y) - bE(X) = 5 - 0.5 × 5 = 2.5

So the least squares regression line is 2.5 + 0.5X.

Next, we'll calculate a + bX when X = 0, 3, 12.

If X = 0, 2.5 + 0.5X = 2.5 + 0.5(0) = 2.5;
If X = 3, 2.5 + 0.5X = 2.5 + 0.5(3) = 4;
If X = 12, 2.5 + 0.5X = 2.5 + 0.5(12) = 8.5.
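Since everything here reduces to moments, a few lines of code reproduce a and b from the formulas above (my own sketch; the names are assumptions):

```python
# Equally weighted data pairs from Example 1: (probability, x, y).
pts = [(1/3, 0, 1), (1/3, 3, 6), (1/3, 12, 8)]

ex = sum(p * x for p, x, _ in pts)
ey = sum(p * y for p, _, y in pts)
ex2 = sum(p * x * x for p, x, _ in pts)
exy = sum(p * x * y for p, x, y in pts)

b = (exy - ex * ey) / (ex2 - ex * ex)   # Cov(X,Y) / Var(X)
a = ey - b * ex

print(round(a, 4), round(b, 4), round(a + b * 12, 4))   # 2.5 0.5 8.5
```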

Now you understand the linear least squares regression. Next, let's talk about how to use
BA II Plus/BA II Plus Professional to find a and b and calculate a + bX when X = 0, 3,
12.

Example 2. For the following data pair ( xi , yi ) , find the linear least squares regression
line a + bX using BA II Plus/BA II Plus Professional.

i    pᵢ(xᵢ, yᵢ)    X = xᵢ    Y = yᵢ
1    1/3           0         1
2    1/3           3         6
3    1/3           12        8

Also, calculate a + bX when X =0, 3, 12 respectively.

Solution

In BA II Plus/Professional, the linear least squares regression functionality is called LIN.

The keystrokes to find a + bX using BA II Plus/Professional:

2nd Data (activate statistics worksheet)


2nd CLR Work (clear the old contents)
X01=0, Y01=1
X02=3, Y02=6
X03=12, Y03=8
2nd STAT (keep pressing 2nd Enter, 2nd Enter, , until your calculator displays
LIN)

Press the down arrow key; you'll see n = 3
Press the down arrow key; you'll see X̄ = 5
Press the down arrow key; you'll see S_X = 6.24499800 (sample standard deviation)
Press the down arrow key; you'll see σ_X = 5.09901951 (standard deviation)
Press the down arrow key; you'll see Ȳ = 5
Press the down arrow key; you'll see S_Y = 3.60555128 (sample standard deviation)
Press the down arrow key; you'll see σ_Y = 2.94392029 (standard deviation)
Press the down arrow key; you'll see a = 2.5
Press the down arrow key; you'll see b = 0.5
Press the down arrow key; you'll see r = 0.8660254 (the correlation coefficient)
Press the down arrow key; you'll see X' = 0
Enter X' = 0 (to do this, press 0 ENTER)
Press the down arrow key.
Press CPT. You'll get Y' = 2.5 (this is a + bX when X = 0)
Press the up arrow key; you'll see X' = 0
Enter X' = 3 (to do this, press 3 ENTER)
Press the down arrow key.
Press CPT. You'll get Y' = 4 (this is a + bX when X = 3)
Press the up arrow key; you'll see X' = 3
Enter X' = 12 (to do this, press 12 ENTER)
Press the down arrow key.
Press CPT. You'll get Y' = 8.5 (this is a + bX when X = 12)

You see that using BA II Plus/Professional LIN Statistics Worksheet, we get the same
result.

You might wonder why we didn't use the probability pᵢ(xᵢ, yᵢ). Here is an important
point: the BA II Plus/Professional Statistics Worksheet (including LIN) can't directly handle
probabilities. To use the Statistics Worksheet, you have to first convert the probabilities to
the # of occurrences. In this problem, pᵢ(xᵢ, yᵢ) = 1/3 for i = 1, 2, and 3. So we have 3 data
pairs of (xᵢ, yᵢ) and each data pair is equally likely to occur. So we arbitrarily let each
data pair occur only once. This way, BA II Plus/Professional knows that each of the
three data pairs has a 1/3 chance of occurring. Later I will show you how to use LIN when
pᵢ(xᵢ, yᵢ) is not uniform.

Some of you might complain: "I can easily use my pen and find the answers. Why do I
need to bother using LIN?" There are several reasons why you might want to use LIN to
find the regression line a + bX and calculate various values of a + bX:

In the heat of the exam, it's easy for you to be brain dead and forget the formulas
b = Cov(X, Y) / Var(X),    a = E(Y) - bE(X)

Even if you are not brain dead, you can easily make mistakes calculating a + bX
from scratch. In contrast, if you have entered your data pairs (xᵢ, yᵢ) correctly, BA
II Plus/Professional will generate the results 100% right.

Even if you want to calculate a + bX from scratch, it's good to use LIN to
double check your work.



Example 3. For the following data pair ( xi , yi ) , find the linear least squares regression
line a + bX using BA II Plus/BA II Plus Professional.

i    pᵢ(xᵢ, yᵢ)    X = xᵢ    Y = yᵢ
1    1/6           0         1
2    1/3           3         6
3    1/2           12        8

Also, calculate a + bX when X =0, 3, 12 respectively.

Solution

Here pᵢ(xᵢ, yᵢ) is not uniform. To convert probabilities to the # of occurrences, let's
assume we have a total of 6 occurrences. Then (x₁, y₁) occurs once; (x₂, y₂) occurs
twice; and (x₃, y₃) occurs three times. When calculating a + bX, the LIN Statistics
Worksheet automatically figures out that p₁(x₁, y₁) = 1/6, p₂(x₂, y₂) = 2/6 = 1/3, and
p₃(x₃, y₃) = 3/6 = 1/2.

Of course, you can also assume that the total # of occurrences is 60. Then ( x1 , y1 ) occurs
10 times; ( x2 , y2 ) occurs 20 times; and ( x3 , y3 ) occurs 30 times. However, this approach
will make your data entry difficult.

The following calculation assumes the total # of occurrences is 6.

When using LIN Statistics Worksheet, we enter the following data:

X01=0, Y01=1

X02=3, Y02=6
X03=3, Y03=6

X04=12, Y04=8
X05=12, Y05=8
X06=12, Y06=8

Your calculator should give you:


n = 6, X̄ = 7, S_X = 5.58569602, σ_X = 5.09901951,
Ȳ = 6.16666667, S_Y = 2.71416040, σ_Y = 2.47767812
a = 3.25, b = 0.41666667, r = 0.85749293
a + bX = 3.25 + 0.41666667 X

Set X ' = 0 . Press CPT .You should get Y ' = 3.25


Set X ' = 3 . Press CPT . You should get Y ' = 4.5
Set X ' = 12 . Press CPT . You should get Y ' = 8.25

Double checking BA II Plus/Professional LIN functionality:


i    pᵢ(xᵢ, yᵢ)    X = xᵢ    Y = yᵢ
1    1/6           0         1
2    1/3           3         6
3    1/2           12        8

E(X) = (1/6)(0) + (1/3)(3) + (1/2)(12) = 7,  E(X²) = (1/6)(0²) + (1/3)(3²) + (1/2)(12²) = 75

Var(X) = 75 - 7² = 26

E(Y) = (1/6)(1) + (1/3)(6) + (1/2)(8) = 6.1667

E(XY) = (1/6)(0 × 1) + (1/3)(3 × 6) + (1/2)(12 × 8) = 54

Cov(X, Y) = E(XY) - E(X)E(Y) = 54 - 7 × 6.1667 = 10.8331

b = Cov(X, Y) / Var(X) = 10.8331/26 = 0.41666

a = E(Y) - bE(X) = 6.1667 - 0.41666(7) = 3.25

a + bX = 3.25 + 0.41666 X

If X = 0 , then Y ' = a + bX = 3.25 + 0.41666 ( 0 ) = 3.25


If X = 3 , then Y ' = a + bX = 3.25 + 0.41666 ( 3) = 4.5
If X = 12 , then Y ' = a + bX = 3.25 + 0.41666 (12 ) = 8.25

Now you should be convinced that LIN Statistics Worksheet produces the correct result.
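The same moment formulas handle non-uniform weights directly, which makes a handy check on the repeated-entry trick (my own sketch):

```python
# Weighted data pairs from Example 3: (probability, x, y).
pts = [(1/6, 0, 1), (1/3, 3, 6), (1/2, 12, 8)]

ex = sum(p * x for p, x, _ in pts)       # 7
ey = sum(p * y for p, _, y in pts)       # 6.1667
ex2 = sum(p * x * x for p, x, _ in pts)  # 75
exy = sum(p * x * y for p, x, y in pts)  # 54

b = (exy - ex * ey) / (ex2 - ex * ex)    # 0.41666...
a = ey - b * ex                          # 3.25

print(round(a, 4), round(b, 4))   # 3.25 0.4167
```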

Application of LIN Statistics Worksheet in Exam C



There are at least two places you can use LIN. One is to calculate the Bühlmann credibility
premium as the least squares regression of the Bayesian premium. Another situation is to use
LIN for linear interpolation. I'll walk you through both.

Bühlmann credibility premium as the least squares regression of the Bayesian premium

Example 4. (old SOA problem)

Let X₁ represent the outcome of a single trial and let E(X₂ | X₁) represent the expected
value of the outcome of a 2nd trial, as described in the table below:

Outcome k    Initial probability of outcome    Bayesian Estimate E(X₂ | X₁ = k)
0            1/3                               1
3            1/3                               6
12           1/3                               8

Calculate the Bühlmann credibility premium corresponding to the Bayesian estimates (1, 6, 8).

Solution

The Bühlmann credibility premium is P = a + ZX₁, which minimizes

E[(a + ZX₁ - Y)²],  where Y = E(X₂ | X₁).

Since the probability of each data pair is uniformly 1/3, we enter the following data in LIN:

X01=0, Y01=1
X02=3, Y02=6
X03=12, Y03=8

We should get:

a = 2.5, b = 0.5
Enter X' = 0. Press CPT. You'll get Y' = 2.5 (this is a + bX when X = 0)
Enter X' = 3. Press CPT. You'll get Y' = 4 (this is a + bX when X = 3)
Enter X' = 12. Press CPT. You'll get Y' = 8.5 (this is a + bX when X = 12)

So the Bühlmann credibility premium corresponding to the Bayesian estimates (1, 6, 8) is (2.5,
4, 8.5).



Example 5 (another old SOA problem)
You are given the following information about an insurance coverage:

# of losses n    Probability    Bayesian Premium E(X₂ | X₁ = n)
0                1/4            0.5
1                1/2            0.9
2                1/4            1.7

Determine the Bühlmann credibility factor for this experience.

Solution
The probability is not uniform. Assume the total # of occurrences is 4. Then the data pair
[n = 0, E(X₂ | X₁ = 0) = 0.5] occurs once, [n = 1, E(X₂ | X₁ = 1) = 0.9] occurs twice, and
[n = 2, E(X₂ | X₁ = 2) = 1.7] occurs once.

So we enter the following data into LIN:

X01=0, Y01=0.5
X02=1, Y02=0.9
X03=1, Y03=0.9
X04=2, Y04=1.7

We should get:
a = 0.4, b = 0.6. So the Bühlmann credibility factor is Z = b = 0.6.
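You can also verify Z directly from the probabilities without the repeated-entry trick; this sketch of the weighted least squares slope is my own code:

```python
# (probability, n, Bayesian premium) from Example 5.
pts = [(0.25, 0, 0.5), (0.50, 1, 0.9), (0.25, 2, 1.7)]

ex = sum(p * x for p, x, _ in pts)
ey = sum(p * y for p, _, y in pts)
ex2 = sum(p * x * x for p, x, _ in pts)
exy = sum(p * x * y for p, x, y in pts)

z = (exy - ex * ey) / (ex2 - ex * ex)   # Buhlmann credibility factor
a = ey - z * ex

print(round(a, 4), round(z, 4))   # 0.4 0.6
```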

Example 6 (old SOA problem)

Outcome Rᵢ    Probability Pᵢ    Bayesian Estimate Eᵢ given outcome Rᵢ
0             2/3               7/4
2             2/9               55/24
14            1/9               35/12

The Bühlmann credibility factor after one experiment is 1/12. Calculate a and b that
minimize the following expression:

Σ Pᵢ(a + bRᵢ - Eᵢ)², summing i = 1 to 3

Solution

SOA makes your life easier by giving you b = 1/12. However, to solve this problem, you
really don't need to know b. Once again, we'll use LIN to solve the problem. Let's
assume the total # of occurrences of data pairs (Rᵢ, Eᵢ) is 9. Then (0, 7/4) occurs 6
times; (2, 55/24) occurs 2 times; and (14, 35/12) occurs one time.

Enter the following into LIN:

X01=0, Y01= 7/4 = 1.75
X02=0, Y02=1.75
X03=0, Y03=1.75
X04=0, Y04=1.75
X05=0, Y05=1.75
X06=0, Y06=1.75

X07=2, Y07= 55/24
X08=2, Y08= 55/24

X09=14, Y09= 35/12

We should get:
a = 1.8333, b = 0.08333 = 1/12.
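As a check on Example 6, the same weighted-moment arithmetic recovers a and b exactly (a sketch using exact fractions; the code and names are my own):

```python
from fractions import Fraction as F

# (probability, R, E) from Example 6, kept as exact fractions.
pts = [(F(2, 3), F(0), F(7, 4)),
       (F(2, 9), F(2), F(55, 24)),
       (F(1, 9), F(14), F(35, 12))]

ex = sum(p * x for p, x, _ in pts)
ey = sum(p * y for p, _, y in pts)
ex2 = sum(p * x * x for p, x, _ in pts)
exy = sum(p * x * y for p, x, y in pts)

b = (exy - ex * ey) / (ex2 - ex * ex)   # Cov/Var as exact fractions
a = ey - b * ex

print(a, b)   # 11/6 1/12
```

Note that 11/6 = 1.8333 and 1/12 = 0.08333, matching the calculator output.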

Does this solution sound like too much data entry? Not to me. Yes, I can figure out the
answers using the equations:

b = Cov(X, Y) / Var(X),    a = E(Y) - bE(X)

I might solve this problem using the above equations when I'm not taking the exam.
However, in the exam room, you bet I won't bother using these equations. I will enter 18
numbers into the calculator and let the calculator do the math for me. This way, I don't
have to think. I just enter the numbers and the calculator will spit out the answer for me.
And I know that my result is 100% right.



#7 Do linear interpolation

Another use of LIN is to do linear interpolation. You are given two data pairs ( x1 , y1 ) and
( x2 , y2 ) . Then you are given a single value x3 . You need to find y3 using linear
interpolation.

The equation for linear interpolation is this:

(y₃ - y₁)/(x₃ - x₁) = (y₂ - y₁)/(x₂ - x₁) = slope of the line through (x₁, y₁) and (x₂, y₂)

y₃ = [(y₂ - y₁)/(x₂ - x₁)](x₃ - x₁) + y₁

Under exam conditions, this standard approach is often prone to errors.

To use LIN for linear interpolation, please note that the least squares regression line for
two data points ( x1 , y1 ) and ( x2 , y2 ) is just an ordinary straight line connecting ( x1 , y1 )
and ( x2 , y2 ) . To find y3 , we simply find the least squares regression line a + bX for
( x1 , y1 ) and ( x2 , y2 ) . Then we enter x3 into LIN. Then LIN will produce y3 .

Example 1. (May 2000, #2)

You are given the following random sample of 10 claims:

46 121 493 738 775


1078 1452 2054 2199 3207

Determine the smoothed empirical estimate of the 90th percentile, as defined in Klugman,
Panjer, and Willmot.

Solution

To find the smoothed empirical estimate, we arrange the n observations in ascending
order. Then the k-th number is the 100k/(n + 1) percentile. For example, the 1st observation, 46,
is the 100(1)/(10 + 1) = 9.09 percentile; the 2nd observation, 121, is the 100(2)/(10 + 1) = 18.18
percentile. So on and so forth.



To find the smoothed estimate of the 90-th percentile, we linearly interpolate between the
9-th observation, which is the 100(9)/(10 + 1) = 81.82-th percentile, and the 10-th observation,
which is the 100(10)/(10 + 1) = 90.91-th percentile.

x₈₁.₈₂ = 2,199    x₉₀ = ?    x₉₀.₉₁ = 3,207
81.82             90         90.91   (percentile)

x₉₀ = x₈₁.₈₂ + [(90 - 81.82)/(90.91 - 81.82)](x₉₀.₉₁ - x₈₁.₈₂)

= 2,199 + [(90 - 81.82)/(90.91 - 81.82)](3,207 - 2,199) = 3,106.09

The above is the standard solution, which is prone to errors.

Next, Ill show you two shortcuts. One is without using LIN; the other with using LIN.

Shortcut without LIN:

Since the k-th number is the 100k/(n + 1) percentile, the m-th percentile corresponds to the
m(n + 1)/100-th observation. For example, the 81.82-th percentile corresponds to the
81.82(10 + 1)/100 = 9-th observation; the 90.91-th percentile corresponds to the
90.91(10 + 1)/100 = 10-th observation.

Important Rules:

The k-th observation is the 100k/(n + 1) percentile.

The m-th percentile is the m(n + 1)/100-th observation.

Once you understand the above two rules, you can quickly find the 90-th percentile.

Set m = 90: k = m(n+1)/100 = 90(10+1)/100 = 9.9. So the 9.9-th observation is what we are
looking for.

Of course, there isn't a 9.9-th observation. So we need to find it using linear interpolation.

2,199          x90          3,207
9              9.9          10       (observation number)

x90 = 2,199 + [(9.9 − 9) / (10 − 9)] (3,207 − 2,199) = 3,106.2

You see that this linear interpolation is much faster than the previous linear interpolation.

Shortcut using LIN

We have two data pairs (9, 2,199) and (10, 3,207). As said before, if you have only two
points, then the least squares line is just the ordinary line connecting the two points. We
are interested in finding the ordinary straight line connecting (9, 2,199) and (10, 3,207).
So well use the LIN function to find the least squares line, which is the ordinary line.

Enter the following into LIN:

X01=9, Y01=2199
X02=10, Y02=3207

You'll find that: a = −6,873 , b = 1,008 , r = 1 . The correlation coefficient should be one
because we have only two data pairs. Two data points always produce a perfectly linear
relationship. So if your r is not equal to one, you did something wrong.

Next, set X ' = 9.9 . Press CPT. You should get: Y ' = 3,106.2 . This is the 90th percentile
you are looking for.
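The two rules above can also be checked off-calculator with a short script. This is only a sketch of the smoothed-empirical definition used in this example; the function name smoothed_percentile is mine, not from any library:

```python
def smoothed_percentile(sample, p):
    """Smoothed empirical percentile: the k-th order statistic is treated as
    the 100k/(n+1)-th percentile; interpolate linearly between order statistics."""
    xs = sorted(sample)
    n = len(xs)
    pos = p * (n + 1) / 100.0   # the p-th percentile is the pos-th observation
    k = int(pos)                # lower order statistic
    if k < 1 or k >= n:
        raise ValueError("percentile outside the interpolation range")
    return xs[k - 1] + (pos - k) * (xs[k] - xs[k - 1])

claims = [46, 121, 493, 738, 775, 1078, 1452, 2054, 2199, 3207]
print(smoothed_percentile(claims, 90))  # the 9.9-th observation, 3,106.2
```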



Example 2

You are given the following values of the cdf of a standard normal distribution:

Φ(0.4) = 0.6554 , Φ(0.5) = 0.6915

Using linear interpolation, calculate Φ(0.443).

Solution

The standard solution is

Φ(0.443) ≈ [(0.5 − 0.443) / (0.5 − 0.4)] Φ(0.4) + [(0.443 − 0.4) / (0.5 − 0.4)] Φ(0.5)

= 0.57 Φ(0.4) + 0.43 Φ(0.5)

= 0.57 (0.6554) + 0.43 (0.6915) = 0.6709

This approach is prone to errors. The math logic is simple, but there are simply too many
numbers to calculate. And it's very easy to make a mistake, especially in the heat of the
exam.

To quickly solve this problem, well use LIN. Enter the following data:

X01=0.4, Y01=0.6554
X02=0.5, Y02=0.6915

2nd STAT (keep pressing 2nd Enter until you see LIN)
Press the down arrow key, you'll see n = 2
Press the down arrow key, you'll see X-bar = 0.45
Press the down arrow key, you'll see SX = 0.07071068
Press the down arrow key, you'll see σX = 0.05
Press the down arrow key, you'll see Y-bar = 0.67345
Press the down arrow key, you'll see SY = 0.02552655
Press the down arrow key, you'll see σY = 0.01805
Press the down arrow key, you'll see a = 0.511
Press the down arrow key, you'll see b = 0.361
Press the down arrow key, you'll see r = 1 (this is the correlation coefficient)
Press the down arrow key, you'll see X' = 0.00
Enter X' = 0.443



Press the down arrow key.
Press CPT. You'll get Y' = 0.670923

So Φ(0.443) = 0.670923

In the above example, suppose that after generating Φ(0.443) = 0.670923 you want to
generate Φ(0.412345). This is what you do:
Enter X' = 0.412345
Press the down arrow key.
Press CPT. You'll get Y' = 0.65985655. This is Φ(0.412345).

If you want to generate Φ(0.46789), this is what you do:

Enter X' = 0.46789
Press the down arrow key.
Press CPT. You'll get Y' = 0.67990829. This is Φ(0.46789).

General procedure
Given two data pairs (c1, d1) and (c2, d2) and a single data point c3, to use the BA II Plus
and BA II Plus Professional LIN Worksheet to generate d3, enter:

X01 = c1 , Y01 = d1
X02 = c2 , Y02 = d2
X' = c3

In other words, the independent variables c1, c2, c3 must be entered as X's and d1, d2
must be entered as Y's.

Example 3
You are given the following values of the cdf of a standard normal distribution:

Φ(0.4) = 0.6554 , Φ(0.5) = 0.6915

Using linear interpolation, find a, b, c, and d (all these are positive numbers) such that

Φ(a) = 0.6666
Φ(b) = 0.6777
Φ(c) = 0.6888
Φ(d) = 0.6999

Solution

In BA II Plus and BA II Plus Professional LIN Statistics Worksheet, enter

X01=0.6554, Y01=0.4
X02=0.6915, Y02=0.5

Enter X ' = 0.6666 . Then the calculator will generate Y ' = 0.43102493 .
So a = 0.43102493 .

Enter X ' = 0.6777 . Then the calculator will generate Y ' = 0.46177285
So b = 0.46177285 .

Enter X' = 0.6888. Then the calculator will generate Y' = 0.49252078
So c = 0.49252078

Enter X ' = 0.6999 . Then the calculator will generate Y ' = 0.52326870
So d = 0.52326870

Example 4
The population of a survivor group is assumed to be linear between two consecutive ages.
You are given the following:

Age # of people alive at this age


50 598
51 534

Calculate the # of people alive at the following fractional ages:


50.2, 50.5, 50.7, 50.9

Solution

In BA II Plus and BA II Plus Professional LIN Statistics Worksheet, enter

X01=50, Y01=598
X02=51, Y02=534

Enter X' = 50.2 . Then the calculator will generate Y' = 585.2
Enter X' = 50.5 . Then the calculator will generate Y' = 566
Enter X' = 50.7 . Then the calculator will generate Y' = 553.2
Enter X' = 50.9 . Then the calculator will generate Y' = 540.4
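If you want to replay Examples 2 through 4 off-calculator, a two-point line is all the LIN worksheet is computing here. The helper name lin2 below is mine, shown only as a sketch of what the worksheet's Y' returns:

```python
def lin2(x1, y1, x2, y2, x):
    """Straight line through (x1, y1) and (x2, y2), evaluated at x --
    the same value the LIN worksheet's Y' produces for two data pairs."""
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

# Example 4: survivors at fractional ages between 50 and 51
for age in (50.2, 50.5, 50.7, 50.9):
    print(age, lin2(50, 598, 51, 534, age))
```

The same call reproduces Example 2: lin2(0.4, 0.6554, 0.5, 0.6915, 0.443) gives 0.670923.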



Chapter 2 Maximum likelihood estimator
Basic idea

An urn has two coins, one fair and the other biased. In one flip, the fair coin has 50%
chance of landing with heads, while the biased one has 90% chance of landing with
heads. Now a coin is randomly chosen from the urn and is tossed. The outcome is a head.

Question: Which coin was chosen from the urn? The fair coin or the biased coin?

Imagine you have entered a bet. If your guess is correct, youll earn $10. If your guess is
wrong, youll lose $10. How would you guess?

Most people will guess that the coin chosen from the urn was the biased coin; the biased
coin is far more likely to land on heads.

This simple example illustrates the intuition behind the maximum likelihood estimator. If
we have to estimate a parameter from an n -size sample X 1 , X 2 ,, X n , we can choose a
parameter that has the highest probability to be observed.

Example. You flip a coin 9 times and observe HTTTHHHTH. You don't know whether
the coin is fair, and you need to estimate the probability of getting H in one flip.

Let p represent the probability of getting a head in one flip. The probability for us to
observe HTTTHHHTH is

P( HTTTHHHTH | p ) = p^5 (1 − p)^4

This is called the likelihood function L(p).

Sample values of p and the corresponding likelihood function are:

p        P( HTTTHHHTH | p ) = p^5 (1 − p)^4
0 0.000000000
0.1 0.000006561
0.2 0.000131072
0.3 0.000583443
0.4 0.001327104
0.5 0.001953125
0.6 0.001990656
0.7 0.001361367
0.8 0.000524288
0.9 0.000059049
1 0.000000000
If we have to guess p among the possible values 0, 0.1, 0.2, ..., we might guess p = 0.6,
which has the highest probability of producing the observed outcome HTTTHHHTH.
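As a rough check of the table above, a few lines of Python can evaluate the likelihood on the same grid and pick the maximizer. This is only an illustration of the table, not a replacement for the calculus that follows:

```python
# Likelihood of observing HTTTHHHTH (5 heads, 4 tails) as a function of p
def likelihood(p):
    return p ** 5 * (1 - p) ** 4

grid = [i / 10 for i in range(11)]   # p = 0, 0.1, ..., 1
best = max(grid, key=likelihood)
print(best, likelihood(best))        # 0.6 maximizes over this grid
```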

General procedure to calculate the maximum likelihood estimator

A coin is tossed n times and x number of heads are observed. Let p represent the
probability that a head shows up in one flip of coin. Calculate the maximum likelihood
estimator of p .

Step One Write the probability that the observed event happens (the likelihood
function)

The probability for us to observe x heads out of n flips of a coin is:

P( getting x heads out of n flips | p ) = C(n,x) p^x (1 − p)^(n − x)

Step Two    Take the logarithm of the likelihood function (called the log-likelihood
function). This step simplifies our calculation (as you'll see soon).

ln P( getting x heads out of n flips | p ) = ln C(n,x) + x ln p + (n − x) ln(1 − p)

Step Three    Take the 1st derivative of the log-likelihood function regarding the
parameter. Set the 1st derivative to zero.

(d/dp) ln P( getting x heads out of n flips | p ) = 0

(d/dp) [ ln C(n,x) + x ln p + (n − x) ln(1 − p) ] = 0

(d/dp) ln C(n,x) + (d/dp)( x ln p ) + (d/dp)[ (n − x) ln(1 − p) ] = 0

In the above equation, the variable is p; n and x are constants.

(d/dp) ln C(n,x) = 0 ,   (d/dp)( x ln p ) = x/p ,   (d/dp)[ (n − x) ln(1 − p) ] = − (n − x)/(1 − p)

x/p − (n − x)/(1 − p) = 0 ,   (1 − p)/p = (n − x)/x ,   1/p = n/x ,   p = x/n
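The closed form p = x/n can be sanity-checked numerically by maximizing the log-likelihood on a fine grid (the constant ln C(n,x) does not affect the maximizer, so it is dropped). This is only a sketch; the grid resolution is arbitrary:

```python
import math

# Log-likelihood of x heads in n flips, up to the constant ln C(n, x)
def loglik(p, n, x):
    return x * math.log(p) + (n - x) * math.log(1 - p)

n, x = 9, 5
grid = [i / 1000 for i in range(1, 1000)]       # avoid p = 0 and p = 1
p_hat = max(grid, key=lambda p: loglik(p, n, x))
print(p_hat)                                    # close to x/n = 5/9
```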

Nov 2000 #6
You have observed the following claim severities:

11.0, 15.2, 18.0, 21.0, 25.8

You fit the following probability density function to the data:

f(x) = [ 1 / √(2πx) ] exp[ −(x − θ)² / (2x) ] ,   x > 0 ,  θ > 0

Determine the maximum likelihood estimator of θ.

Solution

First, make sure you understand the theoretical framework.

Here we take a random sample of 5 claims X1, X2, X3, X4, and X5. We assume that
X1, X2, X3, X4, and X5 are independent identically distributed with a common pdf

f(x) = [ 1 / √(2πx) ] exp[ −(x − θ)² / (2x) ]

The joint density of X1, X2, X3, X4, and X5 is:

f(x1, x2, x3, x4, x5) = f(x1) f(x2) f(x3) f(x4) f(x5)

= ∏(i=1 to 5) [ 1 / √(2πxi) ] exp[ −(xi − θ)² / (2xi) ]

The probability that we observe X1, X2, X3, X4, and X5 is:

P( x1 ≤ X1 ≤ x1 + dx1 , ... , x5 ≤ X5 ≤ x5 + dx5 )

= f(x1) f(x2) f(x3) f(x4) f(x5) dx1 dx2 dx3 dx4 dx5



Our goal is to find a parameter θ that will maximize our chance of observing X1, X2,
X3, X4, and X5. To maximize our chance of observing X1, ..., X5 is to maximize the joint
pdf f(x1, x2, x3, x4, x5). To maximize the joint pdf, we can set its 1st derivative regarding
θ equal to zero:

(d/dθ) f(x1, x2, x3, x4, x5) = 0

Though we can solve the above equation by pure hard work, an easier approach is to find
a parameter θ that will maximize the log-likelihood of us observing X1, X2, X3, X4, and X5:

ln f(x1, x2, x3, x4, x5)

If ln f(x1, x2, x3, x4, x5) is maximized, f(x1, x2, x3, x4, x5) will surely be maximized.
So the task boils down to finding θ such that the 1st derivative of the log pdf is zero:

(d/dθ) ln f(x1, x2, x3, x4, x5) = 0

ln f(x1, x2, x3, x4, x5) = Σ(i=1 to 5) ln{ [ 1 / √(2πxi) ] exp[ −(xi − θ)² / (2xi) ] }

= Σ(i=1 to 5) { ln[ 1 / √(2πxi) ] − (xi − θ)² / (2xi) }

Setting the 1st derivative of the log joint pdf to zero:

(d/dθ) Σ(i=1 to 5) { ln[ 1 / √(2πxi) ] − (xi − θ)² / (2xi) } = 0

In the above equation, the variable is θ; x1, x2, x3, x4, and x5 are constants. So
1 / √(2πxi) is a constant and its derivative regarding θ is zero.

(d/dθ) Σ(i=1 to 5) (xi − θ)² / (2xi) = 0 ,   Σ(i=1 to 5) (d/dθ) (xi − θ)² / xi = 0

Σ(i=1 to 5) (d/dθ) (xi − θ)² / xi = −2 Σ(i=1 to 5) (xi − θ) / xi = −2 Σ(i=1 to 5) (1 − θ/xi) = 0

Σ(i=1 to 5) (1 − θ/xi) = 0 ,   5 − θ (1/x1 + 1/x2 + 1/x3 + 1/x4 + 1/x5) = 0

θ = 5 / (1/x1 + 1/x2 + 1/x3 + 1/x4 + 1/x5) = 5 / (1/11 + 1/15.2 + 1/18 + 1/21 + 1/25.8) = 16.74

After understanding the theoretical framework and the detailed calculation, we are ready to
use a shortcut. First, let's isolate the variable θ:

f(x) = [ 1 / √(2πx) ] exp[ −(x − θ)² / (2x) ]  ∝  exp[ −(x − θ)² / (2x) ]

f(x1, x2, x3, x4, x5)  ∝  ∏(i=1 to 5) exp[ −(xi − θ)² / (2xi) ]

ln f(x1, x2, x3, x4, x5) = constant − Σ(i=1 to 5) (xi − θ)² / (2xi)

(d/dθ) ln f(x1, x2, x3, x4, x5) = 0   ⇒   Σ(i=1 to 5) (xi − θ) / xi = 0

θ = 5 / (1/x1 + 1/x2 + 1/x3 + 1/x4 + 1/x5) = 5 / (1/11 + 1/15.2 + 1/18 + 1/21 + 1/25.8) = 16.74
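The closed form θ = n / Σ(1/xi) derived above can be cross-checked numerically. The grid search below is only a rough sketch with an arbitrary step size:

```python
xs = [11, 15.2, 18, 21, 25.8]

# Closed form: setting sum((x_i - theta)/x_i) = 0 gives theta = n / sum(1/x_i)
theta = len(xs) / sum(1 / x for x in xs)
print(round(theta, 2))  # 16.74

# Brute-force check: theta should maximize sum(-(x - t)^2 / (2x))
grid = [t / 100 for t in range(1000, 2500)]
t_best = max(grid, key=lambda t: sum(-(x - t) ** 2 / (2 * x) for x in xs))
print(t_best)           # agrees with the closed form to grid precision
```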

May 2000 #21

You are given the following five observations:

521 658 702 819 1217

You use the single-parameter Pareto with cumulative distribution function:

F(x) = 1 − (500 / x)^α ,   x > 500 ,  α > 0



Calculate the maximum likelihood estimate of the parameter α.

Solution

From the Exam C Table, you should be able to find:

f(x) = α 500^α / x^(α + 1)

The joint pdf of having 5 observations is:

f(x1, x2, x3, x4, x5) = ∏(i=1 to 5) α 500^α / xi^(α + 1) = α^5 500^(5α) / (x1 x2 x3 x4 x5)^(α + 1)

ln f(x1, x2, x3, x4, x5) = 5 ln α + 5α ln 500 − (α + 1) ln(x1 x2 x3 x4 x5)

(d/dα) ln f(x1, x2, x3, x4, x5) = 5/α + 5 ln 500 − ln(x1 x2 x3 x4 x5) = 0

5/α + 5 ln 500 − ln(521 × 658 × 702 × 819 × 1217) = 0 ,   α = 2.453
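The derivation above gives the closed form α = n / [ Σ ln xi − n ln 500 ], which a short script can verify; this is just a sketch of the arithmetic:

```python
import math

xs = [521, 658, 702, 819, 1217]
n = len(xs)

# Setting d/d(alpha) ln L = n/alpha + n*ln(500) - sum(ln x_i) = 0 gives:
alpha = n / (sum(math.log(x) for x in xs) - n * math.log(500))
print(round(alpha, 3))  # 2.453
```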

Nov 2000 #22

You are given the following information about a random sample:
  The sample size equals five
  The sample is from a Weibull distribution with τ = 2
  Two of the sample observations are known to exceed 50, and the remaining three
  observations are 20, 30, and 45

Calculate the maximum likelihood estimator of θ.

Solution

From the Exam C table, you'll find the Weibull pdf, cdf, and survival function. With τ = 2:

f(x) = τ (x/θ)^τ e^(−(x/θ)^τ) / x = (2x / θ²) e^(−(x/θ)²) ,
F(x) = 1 − e^(−(x/θ)²) ,   S(x) = e^(−(x/θ)²)

We have observed the following:

x1 > 50 ,  x2 > 50 ,  x3 = 20 ,  x4 = 30 ,  x5 = 45

The likelihood function is:

L(θ) = f(20) f(30) f(45) S(50) S(50)

= [2(20)/θ²] exp[−(20/θ)²] × [2(30)/θ²] exp[−(30/θ)²] × [2(45)/θ²] exp[−(45/θ)²]
  × exp[−(50/θ)²] × exp[−(50/θ)²]

L(θ) ∝ (1/θ⁶) e^(−8,325/θ²) ,   ln L(θ) = k − 6 ln θ − 8,325/θ² , where k is a constant

(d/dθ) ln L(θ) = 2(8,325)/θ³ − 6/θ = 0 ,   θ = 52.7
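Solving −6/θ + 2(8,325)/θ³ = 0 gives θ = √(8,325/3). Here is a minimal numeric check of that arithmetic, with the two censored points contributing only their squared values:

```python
import math

uncensored = [20, 30, 45]
censored = [50, 50]                             # known only to exceed 50
s = sum(x * x for x in uncensored + censored)   # 8,325

# From -6/theta + 2*s/theta^3 = 0: theta^2 = s / (number uncensored)
theta = math.sqrt(s / len(uncensored))
print(round(theta, 1))  # 52.7
```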

Fisher Information

One key theorem you need to memorize for Exam C is that the maximum likelihood
estimator θ̂ is approximately normally distributed with mean θ0 and variance 1/I(θ):

θ̂ ~ N( θ0 , 1/I(θ) )

Here θ0 is the true parameter. I(θ), called Fisher information or simply information, is the
variance of (d/dθ) ln L(x,θ):

I(θ) = Var_X[ (d/dθ) ln L(x,θ) ] = E_X[ ( (d/dθ) ln L(x,θ) )² ] = −E_X[ (d²/dθ²) ln L(x,θ) ]

Please note that in the above equation, the expectation and variance are regarding X.

It takes quite a bit of math to prove that θ̂ ~ N( θ0 , 1/I(θ) ), so I won't show you the proof.
You'll just need to memorize it. However, I'll show you why

I(θ) = Var_X[ (d/dθ) ln L(x,θ) ] = E_X[ ( (d/dθ) ln L(x,θ) )² ] = −E_X[ (d²/dθ²) ln L(x,θ) ]

First, let me introduce a new concept to you called the score. The term score is not in the
syllabus. However, it's a building block for Fisher information. So let's take a look.

Assume we have observed x1, x2, ..., xn. Let L(x,θ) represent the likelihood function:

L(x,θ) = ∏(i=1 to n) f(xi,θ) , where θ is the unobservable parameter of the density function.

When calculating the maximum likelihood estimator θ̂, we often use the log-likelihood
function ln L(x,θ). The derivative of the log-likelihood function regarding the parameter θ,
(d/dθ) ln L(x,θ), is called the score of the log-likelihood function. Let's find the mean and
variance of the score.

(d/dθ) ln L(x,θ) = [ 1/L(x,θ) ] (d/dθ) L(x,θ)

Using the standard formula E_X[g(x)] = ∫ g(x) f(x) dx, we have:

E_X[ (d/dθ) ln L(x,θ) ] = E_X{ [ 1/L(x,θ) ] (d/dθ) L(x,θ) }

= ∫ [ 1/L(x,θ) ] [ (d/dθ) L(x,θ) ] L(x,θ) dx = ∫ (d/dθ) L(x,θ) dx = (d/dθ) ∫ L(x,θ) dx

However, ∫ L(x,θ) dx = 1 (property of a pdf). So we have:

E_X[ (d/dθ) ln L(x,θ) ] = E_X{ [ 1/L(x,θ) ] (d/dθ) L(x,θ) } = (d/dθ) 1 = 0

Next, let me explain why E[ ( (d/dθ) ln L(x,θ) )² ] = −E[ (d²/dθ²) ln L(x,θ) ].



We know that E_X[ (d/dθ) ln L(x,θ) ] = ∫ [ (d/dθ) ln L(x,θ) ] L(x,θ) dx = 0

Taking the derivative regarding θ at both sides of the above equation:

(d/dθ) ∫ [ (d/dθ) ln L(x,θ) ] L(x,θ) dx = (d/dθ) 0 = 0

Moving d/dθ inside the integration, we have:

∫ (d/dθ){ [ (d/dθ) ln L(x,θ) ] L(x,θ) } dx = 0

Using the product rule (d/dx)[ u(x) v(x) ] = u(x) (d/dx) v(x) + v(x) (d/dx) u(x), we have:

(d/dθ){ [ (d/dθ) ln L(x,θ) ] L(x,θ) }

= L(x,θ) (d/dθ)[ (d/dθ) ln L(x,θ) ] + [ (d/dθ) ln L(x,θ) ] (d/dθ) L(x,θ)

However, (d/dθ) ln L(x,θ) = [ 1/L(x,θ) ] (d/dθ) L(x,θ), so
(d/dθ) L(x,θ) = L(x,θ) (d/dθ) ln L(x,θ).

So we have:

(d/dθ){ [ (d/dθ) ln L(x,θ) ] L(x,θ) }

= L(x,θ) (d²/dθ²) ln L(x,θ) + [ (d/dθ) ln L(x,θ) ] L(x,θ) (d/dθ) ln L(x,θ)

= L(x,θ) { (d²/dθ²) ln L(x,θ) + [ (d/dθ) ln L(x,θ) ]² }



Then ∫ (d/dθ){ [ (d/dθ) ln L(x,θ) ] L(x,θ) } dx = 0 becomes:

∫ [ (d²/dθ²) ln L(x,θ) ] L(x,θ) dx + ∫ [ (d/dθ) ln L(x,θ) ]² L(x,θ) dx = 0

However, ∫ [ (d²/dθ²) ln L(x,θ) ] L(x,θ) dx = E[ (d²/dθ²) ln L(x,θ) ] ,

∫ [ (d/dθ) ln L(x,θ) ]² L(x,θ) dx = E[ ( (d/dθ) ln L(x,θ) )² ]

Then it follows that E[ (d²/dθ²) ln L(x,θ) ] + E[ ( (d/dθ) ln L(x,θ) )² ] = 0.

Since we know that E[ (d/dθ) ln L(x,θ) ] = 0, it follows that

Var[ (d/dθ) ln L(x,θ) ] = E[ ( (d/dθ) ln L(x,θ) )² ] = −E[ (d²/dθ²) ln L(x,θ) ]

The score (d/dθ) ln L(x,θ) has
zero mean and variance E[ ( (d/dθ) ln L(x,θ) )² ] = −E[ (d²/dθ²) ln L(x,θ) ].
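The two score identities (zero mean; variance equal to the negative expected second derivative) can be verified exactly for a one-observation Bernoulli model, where the expectations are finite sums. This toy check is my own illustration, not part of the manual:

```python
# Bernoulli(p): ln f = x ln p + (1-x) ln(1-p)
# score = x/p - (1-x)/(1-p); second derivative = -x/p^2 - (1-x)/(1-p)^2
p = 0.3
probs = {1: p, 0: 1 - p}
score = {x: x / p - (1 - x) / (1 - p) for x in (0, 1)}
second = {x: -x / p ** 2 - (1 - x) / (1 - p) ** 2 for x in (0, 1)}

mean_score = sum(probs[x] * score[x] for x in (0, 1))
var_score = sum(probs[x] * score[x] ** 2 for x in (0, 1))
neg_e_second = -sum(probs[x] * second[x] for x in (0, 1))
print(mean_score, var_score, neg_e_second)  # 0, 1/(p(1-p)), 1/(p(1-p))
```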

Nov 2003 #18

The information associated with the maximum likelihood estimator of a parameter θ is
4n, where n is the number of observations.

Calculate the asymptotic variance of the maximum likelihood estimator of 2θ.

Solution

Var(θ̂) is the inverse of the information. So Var(θ̂) = 1/(4n).

Var(2θ̂) = 4 Var(θ̂) = 4 × 1/(4n) = 1/n .



The Cramér-Rao theorem

Suppose the random variable X has density function f(x,θ). If g(x) is any unbiased
estimator of θ, then Var[g(x)] ≥ 1 / Var[ (d/dθ) ln f(x,θ) ]. The proof is as follows:

E[g(x)] = ∫ g(x) f(x,θ) dx. Since g(x) is an unbiased estimator of θ, E[g(x)] = θ:

∫ g(x) f(x,θ) dx = θ .
Taking the derivative regarding θ at both sides of the above equation:

(d/dθ) ∫ g(x) f(x,θ) dx = (d/dθ) θ = 1

Moving d/dθ inside the integration:

(d/dθ) ∫ g(x) f(x,θ) dx = ∫ (d/dθ)[ g(x) f(x,θ) ] dx = 1

g(x) is a constant if the derivative is regarding θ. So we have:

∫ (d/dθ)[ g(x) f(x,θ) ] dx = ∫ g(x) (d/dθ) f(x,θ) dx = 1

However, (d/dθ) f(x,θ) = f(x,θ) (d/dθ) ln f(x,θ). So we have

∫ g(x) (d/dθ) f(x,θ) dx = ∫ g(x) [ (d/dθ) ln f(x,θ) ] f(x,θ) dx = 1

However, ∫ g(x) [ (d/dθ) ln f(x,θ) ] f(x,θ) dx = E[ g(x) (d/dθ) ln f(x,θ) ].

E_X[ g(x) (d/dθ) ln f(x,θ) ] = 1 .



Next, consider the covariance Cov[ g(x) , (d/dθ) ln f(x,θ) ].

Cov[ g(x) , (d/dθ) ln f(x,θ) ]
= E_X( { g(x) − E g(x) } { (d/dθ) ln f(x,θ) − E[ (d/dθ) ln f(x,θ) ] } )

The above is just the standard formula Cov(X,Y) = E{ [X − E(X)] [Y − E(Y)] }.

However, E_X[g(x)] = θ, and E_X[ (d/dθ) ln f(x,θ) ] = 0 because (d/dθ) ln f(x,θ) is the
score and has zero mean. Then it follows:

Cov[ g(x) , (d/dθ) ln f(x,θ) ] = E_X{ [ g(x) − θ ] (d/dθ) ln f(x,θ) }

= E_X[ g(x) (d/dθ) ln f(x,θ) − θ (d/dθ) ln f(x,θ) ]

= E_X[ g(x) (d/dθ) ln f(x,θ) ] − E_X[ θ (d/dθ) ln f(x,θ) ]

= E_X[ g(x) (d/dθ) ln f(x,θ) ] − θ E_X[ (d/dθ) ln f(x,θ) ]

= 1 − θ × 0 = 1

Cov[ g(x) , (d/dθ) ln f(x,θ) ] = 1

Next, applying the general rule:

Cov(X,Y) = ρ_XY σ_X σ_Y , where ρ_XY is the correlation coefficient. Because |ρ_XY| ≤ 1, we
have:

Cov²(X,Y) = ρ²_XY σ²_X σ²_Y ≤ [ σ_X σ_Y ]² = Var(X) Var(Y)

1 = Cov²[ g(x) , (d/dθ) ln f(x,θ) ] ≤ Var[g(x)] Var[ (d/dθ) ln f(x,θ) ]

Var[g(x)] ≥ 1 / Var[ (d/dθ) ln f(x,θ) ]

The above formula means this:

For an unbiased estimator g(x), its variance is no less than the reciprocal of the variance
of the score (d/dθ) ln f(x,θ).

Var[g(x)] ≥ 1 / Var[ (d/dθ) ln f(x,θ) ] is a generic formula. When we use the maximum
likelihood estimator, the density function is:

f(x,θ) = f(x1,θ) f(x2,θ) ... f(xn,θ) = L(x,θ)

When (d/dθ) ln f(x,θ) meets certain conditions, Var[g(x)] = 1 / Var[ (d/dθ) ln f(x,θ) ]. We
are not going to worry about what these conditions are. All we need to know is that for
the maximum likelihood estimator g(x), when n, the sample size of the observed data
X1, X2, ..., Xn, approaches infinity, the variance of g(x) approaches

1 / Var[ (d/dθ) ln L(x,θ) ]

For a single maximum likelihood estimator θ̂,

Var(θ̂) → 1 / Var[ (d/dθ) ln L(x,θ) ]   as the sample size n approaches infinity.

Extending the above result to a series of maximum likelihood estimators (presented
without proof):

Assume that random variable X has density f(x; θ1, θ2, ..., θk). The covariance
Cov(θ̂i, θ̂j) between two maximum likelihood estimators θ̂i and θ̂j, as the sample size n
approaches infinity, is equal to the (i,j) entry of the inverse of the Fisher Information
matrix, whose entries are:

I_ij = −E[ ∂² ln f(x; θ1, θ2, ..., θk) / ∂θi ∂θj ] = −E[ ∂² ln L(x; θ1, θ2, ..., θk) / ∂θi ∂θj ]

For two maximum likelihood estimators, the Fisher Information matrix is:

I = [ −E[ ∂²/∂θ1² ln L(x; θ1,θ2) ]       −E[ ∂²/∂θ1∂θ2 ln L(x; θ1,θ2) ] ]
    [ −E[ ∂²/∂θ1∂θ2 ln L(x; θ1,θ2) ]     −E[ ∂²/∂θ2² ln L(x; θ1,θ2) ]   ]

where I_12 = I_21 = −E[ ∂²/∂θ1∂θ2 ln L(x; θ1,θ2) ]

Then

[ Cov(θ̂1,θ̂1) = Var(θ̂1)     Cov(θ̂1,θ̂2)               ]
[ Cov(θ̂2,θ̂1)               Cov(θ̂2,θ̂2) = Var(θ̂2)    ]  = I⁻¹

Nov 2000 #13

A sample of ten observations comes from a parametric family f(x, y; θ1, θ2) with log-
likelihood function

ln L(θ1,θ2) = Σ(i=1 to 10) ln f(xi, yi; θ1, θ2) = −2.5θ1² − 3θ1θ2 − θ2² + 5θ1 + 2θ2 + k

where k is a constant.

Determine the estimated covariance matrix of the maximum likelihood estimator (θ̂1, θ̂2).

Solution
−E[ ∂²/∂θ1² ln L(θ1,θ2) ] = −E[ ∂²/∂θ1² ( −2.5θ1² − 3θ1θ2 − θ2² + 5θ1 + 2θ2 + k ) ] = −E(−5) = 5

−E[ ∂²/∂θ2² ln L(θ1,θ2) ] = −E[ ∂²/∂θ2² ( −2.5θ1² − 3θ1θ2 − θ2² + 5θ1 + 2θ2 + k ) ] = −E(−2) = 2

−E[ ∂²/∂θ1∂θ2 ln L(θ1,θ2) ] = −E[ ∂²/∂θ1∂θ2 ( −2.5θ1² − 3θ1θ2 − θ2² + 5θ1 + 2θ2 + k ) ] = −E(−3) = 3

Fisher Information is:

I = [ 5  3 ]
    [ 3  2 ]

The general formula for inverting a 2×2 matrix is:

[ a  b ]⁻¹ = 1/(ad − bc) [  d  −b ] ,  if ad − bc ≠ 0
[ c  d ]                 [ −c   a ]

[ Var(θ̂1)        Cov(θ̂1,θ̂2) ] = I⁻¹ = [ 5  3 ]⁻¹ = 1/(5×2 − 3×3) [  2  −3 ] = [  2  −3 ]
[ Cov(θ̂1,θ̂2)    Var(θ̂2)     ]          [ 3  2 ]                   [ −3   5 ]   [ −3   5 ]
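The 2×2 inversion used above is easy to script; inv2 is a hypothetical helper name, shown only to confirm the covariance matrix:

```python
def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]], assuming ad - bc != 0."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

info = [[5, 3], [3, 2]]   # Fisher information from Nov 2000 #13
print(inv2(info))         # covariance matrix [[2, -3], [-3, 5]]
```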

The Fisher Information matrix is good for estimating the variance and covariance of a series
of maximum likelihood estimators. What if we need to estimate the variance and
covariance of a function of a series of maximum likelihood estimators? We can use the
delta method.

Delta method

Assume that random variable X has mean μX and variance σX². Define a new function
Y = f(X). Assuming that f(X) is differentiable, we have:

f(X) ≈ f(μX) + f′(μX)(X − μX)

Take variance at both sides and notice that f(μX) and f′(μX) are constants:

Var[f(X)] ≈ Var[ f(μX) + f′(μX)(X − μX) ]

= [ f′(μX) ]² Var(X − μX) = [ f′(μX) ]² Var(X)

We get the delta formula Var[f(X)] ≈ [ f′(μX) ]² Var(X).

Example. Y = √X. Then Var(√X) ≈ [ (d/dx)√x at x = μX ]² Var(X) = [ 1/(2√μX) ]² Var(X).

To get a feel of this formula, set Y = f(X) = cX, where c is a constant. Then the delta
formula becomes: Var[cX] ≈ c² Var(X).

We can rewrite the formula Var[f(X)] ≈ [ f′(μX) ]² Var(X) as

Var[f(X)] ≈ f′(μX) Var(X) f′(μX)

Suppose we want to find the variance of f(θ̂), where θ̂ is an estimator of a true
parameter θ. Please note that θ̂ is a random variable. For example, if θ̂ is the maximum
likelihood estimator, θ̂ varies depending on the sample size and on the sample data we
have observed. Also assume that, based on the sample data we have, we get one estimate θ0.
Set X = θ̂ and E(X) = E(θ̂):

Var[f(θ̂)] ≈ [ f′( E θ̂ ) ]² Var(θ̂)

If θ̂ is the MLE of an unobservable true parameter θ, then θ̂ is unbiased and E(θ̂) = θ.
However, we don't know the true value of θ. Nor do we know f′( E θ̂ ). Assume that,
based on your sample data on hand, the maximum likelihood estimate of the true
parameter θ is a. Then we might want to set θ̂ ≈ a. Then we have:

Var[f(θ̂)] ≈ [ f′(a) ]² Var(θ̂)
()
Variance of a function of two random variables

X has mean μX and variance σX²; random variable Y has mean μY and variance σY².
Define a new function Z = f(X,Y). Assuming that f is differentiable, we have:

f(X,Y) ≈ f(μX,μY) + f′X(μX,μY)(X − μX) + f′Y(μX,μY)(Y − μY)

Take variance at both sides of the equation and notice that μX, μY, f(μX,μY),
f′X(μX,μY), and f′Y(μX,μY) are all constants:

Var[f(X,Y)]

≈ [ f′X(μX,μY) ]² Var(X − μX) + [ f′Y(μX,μY) ]² Var(Y − μY)
  + 2 f′X(μX,μY) f′Y(μX,μY) Cov( X − μX , Y − μY )

= [ f′X(μX,μY) ]² Var(X) + [ f′Y(μX,μY) ]² Var(Y) + 2 f′X(μX,μY) f′Y(μX,μY) Cov(X,Y)

Express this formula in a matrix:

Var[f(X,Y)] ≈ [ f′X(μX,μY)  f′Y(μX,μY) ] [ Var(X)      Cov(X,Y) ] [ f′X(μX,μY) ]
                                          [ Cov(X,Y)    Var(Y)   ] [ f′Y(μX,μY) ]

Many times we are interested in finding the variance of a function of maximum
likelihood estimators. As a simple case, say we have two maximum likelihood estimators
θ̂1 and θ̂2. We want to find the variance of f(θ̂1,θ̂2). Setting X = θ̂1, Y = θ̂2,
μX = E(θ̂1), μY = E(θ̂2), we have:

Var[f(θ̂1,θ̂2)] ≈

[ f′θ1( E θ̂1 , E θ̂2 ) ]² Var(θ̂1) + [ f′θ2( E θ̂1 , E θ̂2 ) ]² Var(θ̂2)
+ 2 f′θ1( E θ̂1 , E θ̂2 ) f′θ2( E θ̂1 , E θ̂2 ) Cov(θ̂1,θ̂2)

If θ̂1 and θ̂2 are MLEs of the true unobservable parameters θ1 and θ2, then E(θ̂1) = θ1
and E(θ̂2) = θ2. Then

Var[f(θ̂1,θ̂2)] ≈

[ f′θ1(θ1,θ2) ]² Var(θ̂1) + [ f′θ2(θ1,θ2) ]² Var(θ̂2) + 2 f′θ1(θ1,θ2) f′θ2(θ1,θ2) Cov(θ̂1,θ̂2)


However, we don't know the true values of θ1 and θ2. Nor do we know f′θ1(θ1,θ2) and
f′θ2(θ1,θ2). Assume that, based on your sample data on hand, the maximum likelihood
estimates of the true parameters θ1 and θ2 are a and b respectively. Then we might
want to set

f′θ1(θ1,θ2) = ∂f(θ1,θ2)/∂θ1 ≈ [ ∂f(θ1,θ2)/∂θ1 ] evaluated at θ1 = a ,

f′θ2(θ1,θ2) = ∂f(θ1,θ2)/∂θ2 ≈ [ ∂f(θ1,θ2)/∂θ2 ] evaluated at θ2 = b

Then we have:

Var[f(θ̂1,θ̂2)]

≈ [ ∂f(θ1,θ2)/∂θ1 at θ1 = a ]² Var(θ̂1) + [ ∂f(θ1,θ2)/∂θ2 at θ2 = b ]² Var(θ̂2)

+ 2 [ ∂f(θ1,θ2)/∂θ1 at θ1 = a ] [ ∂f(θ1,θ2)/∂θ2 at θ2 = b ] Cov(θ̂1,θ̂2)
%1 = a %2 =b

To simplify the notation, we'll rewrite the symbol

[ ∂f(θ1,θ2)/∂θ1 ] evaluated at θ1 = a   as   [ ∂f(θ1,θ2)/∂θ1 ]θ1

and [ ∂f(θ1,θ2)/∂θ2 ] evaluated at θ2 = b   as   [ ∂f(θ1,θ2)/∂θ2 ]θ2 . Then

Var[f(θ̂1,θ̂2)]

≈ [ ∂f(θ1,θ2)/∂θ1 ]θ1² Var(θ̂1) + [ ∂f(θ1,θ2)/∂θ2 ]θ2² Var(θ̂2)

+ 2 [ ∂f(θ1,θ2)/∂θ1 ]θ1 [ ∂f(θ1,θ2)/∂θ2 ]θ2 Cov(θ̂1,θ̂2)

However, you'll need to remember that [ ∂f(θ1,θ2)/∂θ1 ]θ1 really means
[ ∂f(θ1,θ2)/∂θ1 ] evaluated at θ1 = a, and that [ ∂f(θ1,θ2)/∂θ2 ]θ2 really means
[ ∂f(θ1,θ2)/∂θ2 ] evaluated at θ2 = b.

Otherwise, you'll get into a conceptual mess: θ1 in the function f(θ1,θ2) is a random
variable, yet θ1 in the symbol [ ]θ1 is not a random variable but a fixed maximum
likelihood estimate.

Expressing the above formula in a matrix:

Var[f(θ̂1,θ̂2)]

≈ [ f′θ1(θ1,θ2)  f′θ2(θ1,θ2) ] [ Var(θ̂1)        Cov(θ̂1,θ̂2) ] [ f′θ1(θ1,θ2) ]
                               [ Cov(θ̂1,θ̂2)    Var(θ̂2)     ] [ f′θ2(θ1,θ2) ]

Please note that

[ Var(θ̂1)        Cov(θ̂1,θ̂2) ] = I⁻¹ ,  where I is the Fisher Information matrix.
[ Cov(θ̂1,θ̂2)    Var(θ̂2)     ]

May 2000 #25

You model a loss function using the lognormal distribution with parameters μ and σ. You
are given:
  The maximum likelihood estimates of μ and σ are
    μ̂ = 4.215
    σ̂ = 1.093

  The estimated covariance matrix of μ̂ and σ̂ is:

    [ 0.1195   0      ]
    [ 0        0.0597 ]

  The mean of the lognormal distribution is exp( μ + σ²/2 )

Estimate the variance of the maximum likelihood estimate of the mean of the lognormal
distribution, using the delta method.
Solution

The mean function is f(μ,σ) = exp( μ + σ²/2 ). The maximum likelihood estimator of
f(μ,σ) is f(μ̂,σ̂) = exp( μ̂ + σ̂²/2 ), where μ̂ and σ̂ are the maximum likelihood
estimators of μ and σ respectively.

We are asked to find Var[ f(μ̂,σ̂) ] = Var[ exp( μ̂ + σ̂²/2 ) ].

Using a Taylor series approximation around (μ,σ), we have:

f(μ̂,σ̂) ≈ f(μ,σ) + [ ∂f(μ,σ)/∂μ ] (μ̂ − μ) + [ ∂f(μ,σ)/∂σ ] (σ̂ − σ)

Taking variance at both sides of the equation:

Var[f(μ̂,σ̂)] ≈ [ ∂f(μ,σ)/∂μ ]² Var(μ̂) + [ ∂f(μ,σ)/∂σ ]² Var(σ̂)

+ 2 [ ∂f(μ,σ)/∂μ ] [ ∂f(μ,σ)/∂σ ] Cov(μ̂,σ̂)

We are told that the estimated covariance matrix of μ̂ and σ̂ is:

[ 0.1195   0      ]
[ 0        0.0597 ]

So Var(μ̂) ≈ 0.1195 , Var(σ̂) ≈ 0.0597 , Cov(μ̂,σ̂) ≈ 0.

Var[f(μ̂,σ̂)] ≈ [ ∂f(μ,σ)/∂μ ]² (0.1195) + [ ∂f(μ,σ)/∂σ ]² (0.0597)

However, we don't know μ and σ. Nor do we know ∂f(μ,σ)/∂μ and ∂f(μ,σ)/∂σ.

Consequently, we set



∂f(μ,σ)/∂μ ≈ [ ∂f(μ,σ)/∂μ ] evaluated at (μ̂,σ̂) ,   ∂f(μ,σ)/∂σ ≈ [ ∂f(μ,σ)/∂σ ] evaluated at (μ̂,σ̂)

∂f(μ,σ)/∂μ = (∂/∂μ) exp( μ + σ²/2 ) = exp( μ + σ²/2 )

∂f(μ,σ)/∂σ = (∂/∂σ) exp( μ + σ²/2 ) = σ exp( μ + σ²/2 )

∂f(μ,σ)/∂μ ≈ exp( μ̂ + σ̂²/2 ) = exp( 4.215 + 1.093²/2 ) = 123.02

∂f(μ,σ)/∂σ ≈ σ̂ exp( μ̂ + σ̂²/2 ) = 1.093 exp( 4.215 + 1.093²/2 ) = 134.46

Var[f(μ̂,σ̂)] ≈ 123.02² (0.1195) + 134.46² (0.0597) = 2,888
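The arithmetic in this solution can be replayed in a few lines; the variable names below are mine and the values come straight from the problem:

```python
import math

mu, sigma = 4.215, 1.093
var_mu, var_sigma = 0.1195, 0.0597     # covariance term is 0

mean = math.exp(mu + sigma ** 2 / 2)   # the lognormal mean
d_mu = mean                            # partial wrt mu equals the mean itself
d_sigma = sigma * mean                 # partial wrt sigma
var = d_mu ** 2 * var_mu + d_sigma ** 2 * var_sigma
print(round(d_mu, 2), round(d_sigma, 2), round(var))  # 123.02, 134.46, 2888
```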

Please note that you can also solve this problem using the black-box formula

Var[f(θ̂1,θ̂2)]

≈ [ ∂f(θ1,θ2)/∂θ1 ]θ1² Var(θ̂1) + [ ∂f(θ1,θ2)/∂θ2 ]θ2² Var(θ̂2)

+ 2 [ ∂f(θ1,θ2)/∂θ1 ]θ1 [ ∂f(θ1,θ2)/∂θ2 ]θ2 Cov(θ̂1,θ̂2)

However, I recommend that you first solve the problem using the Taylor series
approximation. This forces you to understand the logic behind the messy formula. Once
you understand the formula, next time you can use the memorized formula for
Var[f(θ̂1,θ̂2)] and quickly solve the problem.

May 2005 #9, #10


The time to an accident follows an exponential distribution. A random sample of size two
has a mean time of 6. Let Y represent the mean of a new sample of size two.

Determine the maximum likelihood estimator of Pr ( Y > 10 ) .



Use the delta method to approximate the variance of the maximum likelihood estimator
of FY (10 ) .

Solution

The time to an accident follows an exponential distribution. Assume θ is the mean of
this exponential distribution. If X1 and X2 are two random samples of time-to-accident,
then the maximum likelihood estimator of θ is just the sample mean. So θ̂ = 6.

Pr(Y > 10) = Pr[ (X1 + X2)/2 > 10 ] = Pr(X1 + X2 > 20)

X1 + X2 is gamma with parameters α = 2 and θ̂ = 6. Then

Pr(X1 + X2 > 20) = ∫(20 to ∞) [ t e^(−t/6) / 36 ] dt

To calculate ∫(20 to ∞) [ t e^(−t/6) / 36 ] dt, you'll need to memorize the following shortcut:

∫(a to +∞) x (1/θ) e^(−x/θ) dx = (a + θ) e^(−a/θ)

∫(a to +∞) x² (1/θ) e^(−x/θ) dx = [ (a + θ)² + θ² ] e^(−a/θ)

If interested, you can download the proof of this shortcut from my website
http://www.guo.coursehost.com. The shortcut and the proof are in the sample chapter of
my P manual. Just download the sample chapter of the P manual and you'll get the proof and
more worked-out examples using this shortcut.

∫(20 to ∞) [ t e^(−t/6) / 36 ] dt = (1/6) ∫(20 to ∞) t (1/6) e^(−t/6) dt = (1/6)(20 + 6) e^(−20/6) = 0.1546

If two new samples X1 and X2 are taken, then

FY(10) = Pr(X1 + X2 ≤ 20) = ∫(0 to 20) [ t e^(−t/θ) / θ² ] dt ,
F̂Y(10) = ∫(0 to 20) [ t e^(−t/θ̂) / θ̂² ] dt



Var[ F̂Y(10) ] = Var[ ∫(0 to 20) ( t e^(−t/θ̂) / θ̂² ) dt ]

≈ { (∂/∂θ) ∫(0 to 20) ( t e^(−t/θ) / θ² ) dt , evaluated at θ = E(θ̂) ≈ 6 }² Var(θ̂)

Var(θ̂) = Var(X̄) = Var[ (X1 + X2)/2 ] = (1/4)(2) Var(X) = θ²/2 ≈ 6²/2 = 18

Please note that the two samples X1 and X2 are independent identically distributed with
a common variance Var(X) = θ².

Next, we need to calculate (∂/∂θ) ∫(0 to 20) [ t e^(−t/θ) / θ² ] dt.

∫(0 to 20) [ t e^(−t/θ) / θ² ] dt = (1/θ) ∫(0 to 20) t (1/θ) e^(−t/θ) dt

= (1/θ) [ ∫(0 to ∞) t (1/θ) e^(−t/θ) dt − ∫(20 to ∞) t (1/θ) e^(−t/θ) dt ]

= (1/θ) [ θ − (20 + θ) e^(−20/θ) ] = 1 − (1 + 20/θ) e^(−20/θ)

(∂/∂θ) [ 1 − (1 + 20/θ) e^(−20/θ) ] = −(∂/∂θ) [ (1 + 20/θ) e^(−20/θ) ] = −(400/θ³) e^(−20/θ)

(∂/∂θ) ∫(0 to 20) [ t e^(−t/θ) / θ² ] dt , at θ = E(θ̂) ≈ 6: −(400/6³) exp(−20/6) = −0.066

Var[ F̂Y(10) ] ≈ { (∂/∂θ) ∫(0 to 20) ( t e^(−t/θ) / θ² ) dt at θ = 6 }² Var(θ̂) = 0.066² (18) = 0.078
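Both numeric answers can be checked with the closed forms derived above: the Gamma(2, θ) survival probability at 20 and the delta-method slope −(400/θ³)e^(−20/θ). Note the manual rounds the slope to 0.066 before squaring, so its 0.078 differs slightly from the unrounded 0.0786:

```python
import math

theta = 6.0

# Gamma(alpha=2, theta): P(X1 + X2 > 20) = (1 + 20/theta) * exp(-20/theta)
p = (1 + 20 / theta) * math.exp(-20 / theta)
print(round(p, 4))  # 0.1546

# Delta method: dF/d(theta) = -(400/theta^3) exp(-20/theta), Var(theta-hat) = 18
slope = -(400 / theta ** 3) * math.exp(-20 / theta)
var = slope ** 2 * 18
print(slope, var)   # about -0.0661 and 0.0786
```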



Chapter 3 Kernel smoothing
Essence of kernel smoothing

Kernel smoothing
=Set your point estimate equal to the average of a neighborhood
=Recalculate at every point by averaging this point and the nearby points

Let me illustrate this with a story. You want to buy a house. After looking at many
houses, you find one house you like most. You go to the current owner of the house and ask
for the price. The current owner tells you, "I'm asking for $210,000. Make me an offer."

What are you going to offer? $200,000? $203,000? $205,000? Or something else? You are
not sure. And you know the danger: if your offer is too high, the seller accepts your offer
and you'll overpay for the house; if your offer is too low, you'll look stupid and the seller
may refuse to deal with you anymore. So in your best interest, you'll want to make your
offer reasonable, not too high, not too low.

If you talk to someone experienced in the real estate market, he'll tell you how (and this
works): instead of making a random offer, you can make your offering price roughly
the average selling price of the similar houses sold in the same neighborhood.

Say four similar houses in the same neighborhood were sold this year. Their prices were
$198,000, $200,000, $201,000, and $202,000. So the average selling price is $200,250. If
the house you want to buy is truly similar to these four houses, then the seller is asking for
too much. You can offer around $200,250 and explain to the seller that your offer
is in line with the selling prices of the similar houses in the same neighborhood. A reasonable
seller will be willing to lower his asking price.

What advantage do we gain by looking at a neighborhood? A smoothed, better estimate.


If we focus on one house alone, its selling price appears random. However, when we
broaden our view and look at many similar houses nearby, we'll remove the randomness
of the asking price and see a more reasonable price.

This simple story illustrates the spirit of kernel smoothing. Suppose we want to estimate
f_X(x), the probability density of a random variable X at the point x. Instead of looking at
one point x alone and setting f_X(x) = p(x) = (# of x's in the sample) / (sample size n),
we may want to look at x's neighborhood. For example, we may look at the 3 data points
x − b, x, and x + b, where b is a constant. Then we calculate the average of the empirical
densities at x − b, x, and x + b and use it as an estimator of f_X(x):



f̂_X(x) = (1/3) p(x − b) + (1/3) p(x) + (1/3) p(x + b)

(calculate f̂(x) by averaging the empirical densities of the neighborhood x − b, x, x + b)

Please note the analogy to determining the house price is not perfect. There's one small
difference between how we estimate the price of a house located at x and how we
estimate f_X(x). When we estimate the fair price of a house located at x, we exclude the
data point x because we don't know the value of the house located at x:

Value of a house located at x
= 0.5 × value of the house located at x − b + 0.5 × value of the house located at x + b

In contrast, when we estimate the density at x, we include the empirical density p(x) in
our estimate:

f̂_X(x) = (1/3) p(x − b) + (1/3) p(x) + (1/3) p(x + b)

We include p(x) in our f̂_X(x) calculation because f̂_X(x) by itself is an estimate of
p(x). Stated differently, in kernel smoothing, we estimate f_X(x) twice. The first time,
we use the empirical density p(x) = (# of x's in the sample) / (sample size n) to estimate
f_X(x). The 2nd time, we refine our estimate f̂_X(x) = p(x) by taking the average of the
empirical densities of x and its nearby points x − b and x + b. This is why kernel
smoothing recalculates at every point by averaging this point and its nearby points.

Of course, we can expand our neighborhood. Instead of looking at only two nearby points,
we may look at 4 nearby points and calculate the average empirical density of a 5-point
neighborhood:

f̂_X(x) = (1/5) p(x − 2b) + (1/5) p(x − b) + (1/5) p(x) + (1/5) p(x + b) + (1/5) p(x + 2b)

(calculate f̂(x) by averaging the empirical densities of the neighborhood x − 2b, x − b, x, x + b, x + 2b)

In addition, we don't need to use equal weighting. We can assign more weight to the data
points near x. For example, we can set

f̂_X(x) = (1/10) p(x − 2b) + (2/10) p(x − b) + (4/10) p(x) + (2/10) p(x + b) + (1/10) p(x + 2b)



Now you understand the essence of kernel smoothing. Let's talk about the two major
issues to think about if you want to use kernel smoothing:

How big is the neighborhood? This is called the bandwidth. The bigger the
neighborhood, the greater the smoothing. However, if your neighborhood is too big,
you may run the risk of over-smoothing and finding false patterns.

How much weight do you give to each data point in the neighborhood? For
example, you can assign equal weight to each data point in the neighborhood.
You can also give more weight to the data points closer to the point whose density
you want to estimate. There are many weighting methods out there for you to use.
The weighting method is called the kernel.

Of these two factors, the bandwidth is typically more important than the weighting
method. Your final result may not change much if you use a different weighting method.
However, if you change the bandwidth, your estimated density may change widely.
There's some literature out there explaining in more detail how to choose a proper
bandwidth and a proper weighting method. However, for the purpose of passing Exam C,
you don't need to know that much.

3 kernels you need to know

Loss Models explains three kernels. You'll need to understand them.

Uniform kernel. This is one of the easiest weighting methods. If you use this
method to estimate density, you'll assign equal weight to each data point in the
neighborhood.

Triangular kernel. Under this weighting method, you give more weight to the
data points that are closer to the point for which you are estimating density.

Gamma kernel. This is more complex but less important than the uniform kernel
and the triangular kernel. If you want to cut some corners, you can skip the
gamma kernel.

Now let's look at the math formulas. Let's focus on the uniform kernel first.

Uniform kernel

The uniform kernel for estimating the density function:

k_y(x) =  0        if x < y − b
          1/(2b)   if y − b ≤ x ≤ y + b
          0        if x > y + b



Let's look at the symbol k_y(x). Here x is your target data point (the location of the house
you want to buy) for which you want to estimate the density (the fair price of the house
you want to buy). y is a data point in the neighborhood (the location of a similar house in the
neighborhood). k_y(x) is y's weight for estimating the density function at x.

The uniform kernel estimator of the density function at x:

f̂(x) = Σ over all y_i of p(y_i) k_{y_i}(x)

where p(y_i) is the empirical density of y_i and k_{y_i}(x) is y_i's weight. In other words,
we calculate the density at x by taking a weighted average of the empirical densities of the
nearby points y_i.

The uniform kernel for estimating the distribution function:

K_y(x) =  0                  if x < y − b
          (x − y + b)/(2b)   if y − b ≤ x ≤ y + b
          1                  if x > y + b

The uniform kernel estimator of the distribution function at x:

F̂(x) = Σ over all y_i of p(y_i) K_{y_i}(x)

where K_{y_i}(x) is y_i's weight. We calculate the distribution function at x by taking a
weighted average of the empirical densities of the nearby points y_i.

Now let's look at the formula for k_y(x). The formula looks intimidating. The good news
is that you really don't need to memorize it. You just need to understand the essence of
the uniform weighting method. Once you understand the essence, you can derive the
formula effortlessly on the spot.

Let's rewrite the uniform kernel formula as:

k_y(x) =  0        if |y − x| > b
          1/(2b)   if |y − x| ≤ b



To help us remember the formula, let's draw a neighborhood diagram:

        A                              B
  y1   x−b      y3      x      y4    x+b      y2
        D                              C

Here your neighborhood is [x − b, x + b]. b is called the bandwidth, which is half of the
width of the neighborhood you have chosen. Now the formula for k_y(x) becomes:

k_y(x) =  0        if y is OUT of the neighborhood [x − b, x + b]
          1/(2b)   if y is in the neighborhood [x − b, x + b]

If the data point y is out of the neighborhood [x − b, x + b], its weight is zero. We throw
this data point away and do not use it in our estimation. And this should make intuitive sense.
In the neighborhood diagram, data points y1 and y2 are discarded.

If the data point y is in the neighborhood [x − b, x + b], we'll use this data point in our
estimation and assign it a weight of 1/(2b). In the neighborhood diagram, data points y3 and
y4 are used in the estimation and each gets a weight of 1/(2b).

This is how we get 1/(2b). Area ABCD represents the total weight we can possibly assign
to all the data points in the neighborhood. So we'll want the total area ABCD to equal
one:

Area ABCD = AB × BC = (2b) × BC = 1, so BC = 1/(2b).

So each data point that falls in the neighborhood AB gets weight BC = 1/(2b), and each
data point that falls outside the neighborhood AB gets weight zero.

Now you shouldn't have trouble memorizing the uniform kernel formula for k_y(x).

Next, let's look at the formula for K_y(x), the weighting factor for the distribution
function at x:



K_y(x) =  0                  if x < y − b
          (x − y + b)/(2b)   if y − b ≤ x ≤ y + b
          1                  if x > y + b

It's quite complex to derive K_y(x), so let's not worry about the derivation and instead
find an easy way to memorize the formula. Once again, let's draw a neighborhood diagram:

        A          F                   B
       x−b         y         x        x+b
        D          E                   C

To find how much weight to give the data point y toward calculating F̂(x), draw
a vertical line at the data point y (line EF). Next, imagine that you use a pair of scissors
to cut off what's to the left of line EF while keeping what's to the right of line EF. Next,
calculate the area of the neighborhood rectangle ABCD that remains after the cut.
This remaining area of the neighborhood rectangle ABCD that survives the cut is
K_y(x). Let's walk through this rule.

Situation One: If |x − y| ≤ b (see the diagram below), we draw a vertical line EF at
the data point y.

        A          F                   B
       x−b         y         x        x+b
        D          E                   C

Next, we use a pair of scissors and cut off what's to the left of line EF. Now the diagram
becomes:

        F                   B
        y         x        x+b
        E                   C



Next, we calculate the area of the neighborhood rectangle ABCD that survives the cut.
After the cut, the original neighborhood rectangle ABCD shrinks to the rectangle
EFBC. The surviving area is:

EFBC = EF × EC = [1/(2b)] (x + b − y) = (x − y + b)/(2b)

This is the weight assigned to the data point y toward calculating F̂(x).

Situation Two: If y < x − b (see the diagram below), we draw a vertical line EF at
the data point y.

   F    A                              B
   y   x−b                x           x+b
   E    D                              C

Next, we use a pair of scissors and cut off what's to the left of line EF. The diagram
is unchanged:

   F    A                              B
   y   x−b                x           x+b
   E    D                              C

The original neighborhood rectangle ABCD completely survives the cut. So we'll set
K_y(x) = area ABCD = 1.

Situation Three: If y > x + b (see the diagram below), we draw a vertical line EF at
the data point y.

        A                              B    F
       x−b                x           x+b   y
        D                              C    E

Next, we use a pair of scissors and cut off what's to the left of line EF. Now the diagram
is as follows:



   F
   y
   E

The original neighborhood rectangle ABCD is completely cut off. So we'll set
K_y(x) = 0.

Now you see that you really don't need to memorize the ugly K_y(x) formula. Just draw
a neighborhood diagram, draw the vertical cutting line at the data point y, cut off
the left side of the diagram, and then calculate the surviving area of the
neighborhood rectangle. The surviving area is K_y(x).

Triangular kernel

In the uniform kernel, every data point in the neighborhood gets an identical weight of
1/(2b). Say we have two data points in the neighborhood, y3 and y4, where y4 is closer to x
and y3 is farther away from x (see the diagram below).

        A                              B
       x−b      y3      x      y4    x+b
        D                              C

The uniform kernel gives weight 1/(2b) to both y3 and y4.

However, oftentimes it makes sense for us to give y4 more weight than y3. For example,
x is the location of the house you want to buy; y3 and y4 are the locations of the two
similar houses in your neighborhood. It makes intuitive sense for us to give more weight
to the house located at y4 than the one located at y3. If the house located at y3 was sold
at $200,000 and the house located at y4 was once sold at $198,000, we might want to
assign 40% weight to the house located at y3 and 60% to the one located at y4. Then the
estimated fair price of the house located at x is:

60% × price of the house located at y4 + 40% × price of the house located at y3
= 60% × 198,000 + 40% × 200,000 = $198,800



Here comes the kernel. Kernel smoothing can assign more weight to a data point
closer to the point for which we need to estimate the density, and less weight to a
data point farther away from that point.

Let's make sense of the triangular kernel formulas for k_y(x) and K_y(x). First, let's look
at k_y(x):

k_y(x) =  0               if x < y − b
          (b + x − y)/b²  if y − b ≤ x ≤ y
          (b + y − x)/b²  if y ≤ x ≤ y + b
          0               if x > y + b

Let's rewrite this formula as:

k_y(x) =  0               if |x − y| > b
          (b + x − y)/b²  if x ≤ y ≤ x + b
          (b + y − x)/b²  if x − b ≤ y ≤ x

Please note that y − b ≤ x ≤ y is equivalent to x ≤ y ≤ x + b, and y ≤ x ≤ y + b is
equivalent to x − b ≤ y ≤ x.

To make sense of the k_y(x) formula, let's draw a neighborhood diagram:

                              H
                 F
        A        E       C    G       B
  y1   x−b       y2      x    y3     x+b    y4



The neighborhood is [A, B] = [x − b, x + b]. Now the k_y(x) formula becomes:

k_y(x) =  0               if y is OUT of the neighborhood [x − b, x + b]
          (b + x − y)/b²  if y is in the right-half neighborhood, that is y ∈ [x, x + b]
          (b + y − x)/b²  if y is in the left-half neighborhood, that is y ∈ [x − b, x]

It makes sense that k_y(x) = 0 if y is out of the neighborhood [x − b, x + b]. Data points
y1 and y4 are out of the neighborhood and have zero weight.

Now let's find k_y(x) when the data point y is in the neighborhood [x − b, x + b]. Data points
y2 and y3 are in the neighborhood, and their weights are equal to the heights EF and GH
respectively.
respectively.

Before calculating EF and GH, let me give you a preliminary high school math formula.
This formula is used over and over in triangular kernel smoothing:

In a triangle ABC, ∠B = 90 degrees and DE is parallel to AB. Then

DE/AB = EC/BC,   so DE = (EC/BC) × AB

area DEC / area ABC = [(1/2) DE × EC] / [(1/2) AB × BC] = (DE/AB)(EC/BC) = (EC/BC)²,

so area DEC = (EC/BC)² × area ABC

where DEC represents the area of triangle DEC and ABC the area of triangle ABC.



        A
        |\
        | \  D
        |  \ |\
        B---E--C

If you don't understand why DE/AB = EC/BC and area DEC / area ABC = (EC/BC)²,
you'll want to review high school geometry.

Now let's come back to the following diagram and calculate EF and GH. EF is the weight
assigned to the data point y2; GH is the weight assigned to the data point y3.

                              H
                 F
        A        E       C    G       B
  y1   x−b       y2      x    y3     x+b    y4

First, please note that the area of the triangle ABD represents the total weight assigned to
all the data points in the neighborhood [A, B]. So the area of the triangle ABD should be
one:

area ABD = 0.5 × AB × CD = 1. However, AB = 2b, so 0.5 × 2b × CD = 1 and CD = 1/b.

EF/CD = AE/AC,   EF = (AE/AC) × CD = {[y2 − (x − b)]/b} × (1/b) = (b + y2 − x)/b²   if y2 ∈ [x − b, x];

GH/CD = BG/BC,   GH = (BG/BC) × CD = [(x + b − y3)/b] × (1/b) = (b + x − y3)/b²   if y3 ∈ [x, x + b]



So we have:

k_y(x) =  0               if y is OUT of the neighborhood [x − b, x + b]
          (b + x − y)/b²  if y is in the right-half neighborhood, that is y ∈ [x, x + b]
          (b + y − x)/b²  if y is in the left-half neighborhood, that is y ∈ [x − b, x]

Next, let's look at the triangular kernel formula for K_y(x):

K_y(x) =  0                        if x < y − b
          (b + x − y)²/(2b²)       if y − b ≤ x ≤ y
          1 − (b + y − x)²/(2b²)   if y ≤ x ≤ y + b
          1                        if x > y + b

Let's rewrite this formula as:

K_y(x) =  1                        if y ∈ (−∞, x − b)
          1 − (b + y − x)²/(2b²)   if y ∈ [x − b, x]
          (b + x − y)²/(2b²)       if y ∈ [x, x + b]
          0                        if y ∈ (x + b, +∞)

Please note that y − b ≤ x ≤ y is equivalent to y ∈ [x, x + b], and y ≤ x ≤ y + b is equivalent
to y ∈ [x − b, x].

To make sense of the K_y(x) formula, we'll apply the scissors-cut rule.


                         D
                              H
                 F
        A        E       C    G       B
  y1   x−b       y2      x    y3     x+b    y4

Situation One: If y ∈ [x, x + b]
Draw a vertical line at the data point y (line GH). Next, imagine that you use a pair of
scissors and cut off what's to the left of line GH while keeping what's to the right of
line GH. Next, calculate the area of the triangle ABD remaining after the cut. This
remaining area after the cut is K_y(x).

        A                C       G        B
       x−b               x       y       x+b

After the cut:

        G        B
        y       x+b



K_y(x) = area BGH = (BG/BC)² × area BDC = (1/2) [(x + b − y)/b]² = (x + b − y)²/(2b²)

Situation Two: If y ∈ [x − b, x]

                         D
        A       E        C                B
       x−b      y        x               x+b

Draw a vertical line at the data point y (line EF). Cut off what's to the left of EF.

After the cut:

        E        C                B
        y        x               x+b

K_y(x) = area BDFE = 1 − area AEF

area AEF = (AE/AC)² × area ACD = [(y − x + b)/b]² × (1/2) = (b + y − x)²/(2b²)

K_y(x) = area BDFE = 1 − (b + y − x)²/(2b²)



Situation Three: If y ∈ (−∞, x − b)

                         D
   N
   M    A                C                B
   y   x−b               x               x+b

Draw a vertical line MN at the data point y. Cut off what's to the left of line MN. Now the
whole area ABD survives the cut. So K_y(x) = 1.

Situation Four: If y ∈ (x + b, +∞)

        A                C                B     R
       x−b               x               x+b    y

Draw a vertical line RS at the data point y. Cut off what's to the left of line RS. Now the
whole area ABD is cut off. So K_y(x) = 0.

Now you see that you really don't need to memorize the complex formulas for K_y(x).
Just draw a diagram and directly calculate K_y(x).
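The piecewise triangular formulas are easy to mis-remember, so here is a small sketch that encodes them. The helper names k_tri and K_tri are mine, not from Loss Models, and the spot-checked values (bandwidth b = 2, estimation point x = 6) follow directly from the formulas.

```python
def k_tri(y, x, b):
    # triangular density weight: (b - |x - y|) / b^2 inside the neighborhood, 0 outside
    u = abs(x - y)
    return (b - u) / b**2 if u <= b else 0.0

def K_tri(y, x, b):
    # triangular distribution weight: the area of the triangle centered at y
    # that lies to the left of x (the "surviving area" after the scissors cut)
    if x <= y - b:
        return 0.0
    if x >= y + b:
        return 1.0
    if x <= y:
        return (b + x - y) ** 2 / (2 * b**2)
    return 1.0 - (b + y - x) ** 2 / (2 * b**2)

b = 2
w = k_tri(5, 6, b)   # a point one unit from x = 6 gets density weight (2 - 1)/4 = 1/4
# distribution weights at x = 6 for y = 5, 6, 7: 7/8, 1/2, 1/8
K5, K6, K7 = K_tri(5, 6, b), K_tri(6, 6, b), K_tri(7, 6, b)
```

Note how a data point exactly at y = x always contributes distribution weight 1/2: half of its triangle lies on each side.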

Finally, let's look at the gamma kernel.



Gamma kernel

k_y(x) = x^(α−1) e^(−xα/y) / [ (y/α)^α Γ(α) ],   where x > 0

To understand the gamma kernel, you'll need to know this: in kernel smoothing, all the
weights should add up to one. Because of this, for convenience, we can use a density
function as the weights. This way, the weights automatically add up to one.

In the gamma kernel, we just use the gamma pdf (x/θ)^α e^(−x/θ) / [x Γ(α)]. However, we set
θ = y/α, so that the kernel is a gamma density with mean αθ = y:

k_y(x) = (x/θ)^α e^(−x/θ) / [x Γ(α)] = x^(α−1) e^(−x/θ) / [θ^α Γ(α)] = x^(α−1) e^(−xα/y) / [ (y/α)^α Γ(α) ]

The simplest gamma pdf is the one with α = 1 (i.e., the exponential pdf). So the simplest
gamma kernel is an exponential kernel:

k_y(x) = (1/y) e^(−x/y),   where x > 0

If you need the exponential kernel for F(x), then K_y(x) = ∫₀ˣ k_y(t) dt = 1 − e^(−x/y).

This is all you need to know about the gamma kernel.

Problem 1

A random sample of size 12 gives us the following data:

1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12

Use the uniform kernel with bandwidth 2 to calculate f̂(6) and F̂(6).

Solution

uniform kernel with bandwidth 2



The neighborhood is from 6 − b = 6 − 2 = 4 to 6 + b = 6 + 2 = 8. When calculating f̂(6), we
discard any data points that are outside of the neighborhood [4, 8]. So 1, 2, 3, 3, 9, 9, 11,
12 are discarded. We only consider 5, 6, 7, 8. Each of these four data points has a weight
of 1/(2b) = 1/4.

So f̂(6) = Σ p(y) k_y(6) = (1/12)(1/4) + (1/12)(1/4) + (1/12)(1/4) + (1/12)(1/4) = 1/12

In the calculation of F̂(6), any data point that falls below the lower bound of the
neighborhood [4, 8], or touches the lower bound, gets a full weight of 1. Data points 1, 2, 3, 3
are below the lower bound of the neighborhood [4, 8] and each gets a weight of 1. Any data
point that stays above the upper bound of the neighborhood [4, 8], or touches the upper bound,
gets zero weight. So 8 (touching the upper bound) and 9, 9, 11, 12 (staying above the
upper bound) each get zero weight.

Data points y = 5, 6, 7 are in the neighborhood [4, 8]. If you draw a diagram, you'll
find that the weights for y = 5, 6, 7 are:

K_5(6) = 3/4,   K_6(6) = 2/4,   K_7(6) = 1/4

F̂(6) = Σ p(y) K_y(6)
= (1/12)(1) + (1/12)(1) + (1/12)(1) + (1/12)(1) + (1/12)(3/4) + (1/12)(2/4) + (1/12)(1/4) ≈ 0.4583

y        1     2     3     3     5     6     7     8     9     9     11    12
p(y)     1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12
k_y(6)   0     0     0     0     1/4   1/4   1/4   1/4   0     0     0     0
K_y(6)   1     1     1     1     3/4   2/4   1/4   0     0     0     0     0
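As a quick check, the uniform-kernel numbers in this problem can be reproduced in a few lines of code. The sketch is mine, not the manual's; the helper names k_unif and K_unif are made up.

```python
def k_unif(y, x, b):
    # density weight: 1/(2b) inside the neighborhood [x - b, x + b], else 0
    return 1 / (2 * b) if abs(x - y) <= b else 0.0

def K_unif(y, x, b):
    # distribution weight: the surviving area (x - y + b)/(2b), capped to [0, 1]
    if x < y - b:
        return 0.0
    if x > y + b:
        return 1.0
    return (x - y + b) / (2 * b)

sample = [1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12]
n, b = len(sample), 2

f6 = sum(k_unif(y, 6, b) for y in sample) / n   # 1/12
F6 = sum(K_unif(y, 6, b) for y in sample) / n   # 5.5/12, about 0.4583
```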

Problem 2

A random sample of size 12 gives us the following data:

1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12

Use the triangular kernel with bandwidth 2 to calculate f̂(6) and F̂(6).

Solution



If you draw the diagram, you should get:

y        1     2     3     3     5     6     7     8     9     9     11    12
p(y)     1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12
k_y(6)   0     0     0     0     1/4   1/2   1/4   0     0     0     0     0
K_y(6)   1     1     1     1     7/8   1/2   1/8   0     0     0     0     0

f̂(6) = Σ p(y) k_y(6) = (1/12)(1/4) + (1/12)(1/2) + (1/12)(1/4) = 1/12

F̂(6) = Σ p(y) K_y(6)
= (1/12)(1) + (1/12)(1) + (1/12)(1) + (1/12)(1) + (1/12)(7/8) + (1/12)(1/2) + (1/12)(1/8) ≈ 0.4583

Problem 3

A random sample of size 12 gives us the following data:

1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12

Use the gamma kernel with α = 1 to calculate f̂(6) and F̂(6).

Solution
Gamma kernel with α = 1 (the exponential kernel):

k_y(x) = (1/y) e^(−x/y),   K_y(x) = ∫₀ˣ k_y(t) dt = 1 − e^(−x/y)

f̂(6) = Σ p(y) k_y(6)
= (1/12)(1/1)e^(−6/1) + (1/12)(1/2)e^(−6/2) + (1/12)(1/3)e^(−6/3) + (1/12)(1/3)e^(−6/3) + (1/12)(1/5)e^(−6/5) + (1/12)(1/6)e^(−6/6)
+ (1/12)(1/7)e^(−6/7) + (1/12)(1/8)e^(−6/8) + (1/12)(1/9)e^(−6/9) + (1/12)(1/9)e^(−6/9) + (1/12)(1/11)e^(−6/11) + (1/12)(1/12)e^(−6/12)
≈ 0.0480

F̂(6) = Σ p(y) K_y(6)
= (1/12)(1 − e^(−6/1)) + (1/12)(1 − e^(−6/2)) + (1/12)(1 − e^(−6/3)) + (1/12)(1 − e^(−6/3)) + (1/12)(1 − e^(−6/5)) + (1/12)(1 − e^(−6/6))
+ (1/12)(1 − e^(−6/7)) + (1/12)(1 − e^(−6/8)) + (1/12)(1 − e^(−6/9)) + (1/12)(1 − e^(−6/9)) + (1/12)(1 − e^(−6/11)) + (1/12)(1 − e^(−6/12))
≈ 0.658
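These sums are tedious by hand; a short sketch (my own code, not the manual's) reproduces both estimates for the exponential kernel.

```python
import math

sample = [1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12]
n = len(sample)

def k_exp(y, x):
    # exponential (gamma, alpha = 1) kernel density weight
    return math.exp(-x / y) / y

def K_exp(y, x):
    # its distribution function: 1 - e^(-x/y)
    return 1 - math.exp(-x / y)

f6 = sum(k_exp(y, 6) for y in sample) / n   # about 0.048
F6 = sum(K_exp(y, 6) for y in sample) / n   # about 0.658
```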



Nov 2003 #4

You study five lives to estimate the time from the onset of a disease to death. The times
to death are:

2 3 3 3 7

Using a triangular kernel with bandwidth 2, estimate the density function at 2.5.

Solution

The neighborhood is [0.5, 4.5]. If you draw a neighborhood diagram, you should get:

y           2      3      3      3      7
p(y)        1/5    1/5    1/5    1/5    1/5
k_y(2.5)    1.5/4  1.5/4  1.5/4  1.5/4  0

f̂(2.5) = Σ p(y) k_y(2.5) = (1/5)(1.5/4) + (1/5)(1.5/4) + (1/5)(1.5/4) + (1/5)(1.5/4) = 0.3

Nov 2004 #20


From a population having distribution function F , you are given the following sample:

2.0 3.3 3.3 4.0 4.0 4.7 4.7 4.7

Calculate the kernel density estimate of F(4), using the uniform kernel with bandwidth 1.4.

Solution

The neighborhood is [4 − 1.4, 4 + 1.4] = [2.6, 5.4] = [B, F]

   G      H      I      J      K      L

   A      B      C      D      E      F
   2     2.6    3.3     4     4.7    5.4



If you use scissors to cut what's to the left of line AG at y = 2, the neighborhood
rectangle BFLH completely survives the cut. So K_{y=2}(4) = 1.

If you use scissors to cut what's to the left of line CI at y = 3.3, the surviving area is CFLI,
which is 0.75 of the neighborhood rectangle. So K_{y=3.3}(4) = 0.75.

If you use scissors to cut what's to the left of line DJ at y = 4, the surviving area is DFLJ,
which is 0.5. So K_{y=4}(4) = 0.5.

If you use scissors to cut what's to the left of line EK at y = 4.7, the surviving area is EFLK,
which is 0.25. So K_{y=4.7}(4) = 0.25.

y        2.0   3.3   3.3   4.0   4.0   4.7   4.7   4.7
p(y)     1/8   1/8   1/8   1/8   1/8   1/8   1/8   1/8
K_y(4)   1     0.75  0.75  0.5   0.5   0.25  0.25  0.25

F̂(4) = Σ p(y) K_y(4) = (1/8)(1) + (1/8)(0.75)(2) + (1/8)(0.5)(2) + (1/8)(0.25)(3) = 0.53125



Chapter 4 Bootstrap
Essence of bootstrapping

Loss Models doesn't explain the bootstrap much. As a result, many candidates just memorize
a black-box formula without understanding the essence of the bootstrap.

Let me explain the bootstrap with an example. Suppose you want to find out the mean and
variance of the GRE scores of a group of 5,000 students. One way to do so is to take a lot of
random samples. For example, you can sample 20 students' GRE scores and calculate the
mean and variance of the GRE scores. Here you have one sample of size 20. Of course,
you want to take many samples. For example, you can take 30 samples, each sample
consisting of 20 students' GRE scores. For each of the 30 samples, you can calculate the
mean and variance of the GRE scores.

As you can see, taking 30 samples of size 20 takes a lot of time and money. As a research
scientist, you are short of grant money. And your life is busy. Is there any way you can
cut some corners?

You can cut corners this way. Instead of taking 30 samples of size 20, you just take
one sample of size 20 and collect 20 students' GRE scores. These 20 scores are X1,
X2, ..., X20. You bring these 20 scores home. Your data collection is done.

Next, you reproduce 30 samples of size 20 each from this one sample of size 20. How? Just
resample from your one sample of 20 scores. You randomly select 20 scores with
replacement from the 20 scores you have. This is your 1st resample. Next, you randomly
select another 20 scores with replacement from the 20 scores you have. This is your 2nd resample.

If you repeat this process 30 times, you'll get 30 resamples of size 20 each. If you repeat
this process 100 times, you'll get 100 resamples of size 20 each. Now your original
sample gives birth to many resamples. How wonderful.

The rest is easy. If you have 30 resamples, you can calculate the mean and variance of the
GRE scores for each resample. This should give you a good idea of the mean and variance
of the GRE scores.

Does this sound like a fraud? Not really. Your original sample of size 20, X1, X2, ..., X20,
reflects the population. As a result, resamples from this sample are pretty much what you
would get if you took many samples from the population. (By the way, the term "bootstrap"
comes from the phrase "to pull oneself up by one's bootstraps.")

To use the bootstrap, you'll need a computer and some bootstrapping software to
quickly create a great number (such as 10,000) of resamples and to calculate the statistics
of the resamples. The bootstrap is a computer-intensive technique.
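The resampling procedure described above can be sketched in a few lines. Everything here is illustrative: the 20 "GRE scores" are simulated, and all function and variable names are mine.

```python
import random

random.seed(0)  # make the sketch reproducible

# one real sample of 20 GRE scores (simulated here for illustration)
original = [random.gauss(600, 100) for _ in range(20)]

def bootstrap_resamples(sample, num_resamples):
    # each resample draws len(sample) values WITH replacement from the sample
    return [random.choices(sample, k=len(sample)) for _ in range(num_resamples)]

resamples = bootstrap_resamples(original, 30)

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# a mean and variance estimate from every resample, at no extra collection cost
boot_means = [mean(r) for r in resamples]
boot_vars = [variance(r) for r in resamples]
```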



To summarize, the bootstrap reduces researchers' time and money spent on data collection.
Researchers just need to collect one good sample and bring it home. Then they can use
computers to create resamples and calculate statistics.

Recommended supplemental reading

For more information on bootstrap, you can download the free PDF file at
http://bcs.whfreeman.com/pbs/cat_160/PBS18.pdf

May 2000 #17

You are given a random sample of two values from a distribution F :

1 3

You estimate θ(F) = Var(X) using the estimator

g(X1, X2) = (1/2) Σ_{i=1}^{2} (Xi − X̄)²,   where X̄ = (X1 + X2)/2.

Determine the bootstrap approximation to the mean square error.

Solution

Your original sample is (1, 3). The variance of your original sample is

Var(X) = E(X²) − E²(X) = (1/2)(1² + 3²) − [(1/2)(1 + 3)]² = 1
2 2

Under the bootstrap method, you resample from your original sample with replacement.
Your resamples are (1,1), (1,3), (3,1), and (3,3), each having probability 1/4.

For each resample, you calculate g(X1, X2) = (1/2) Σ_{i=1}^{2} (Xi − X̄)². Then the mean square
error is

MSE = E{ [g(X1, X2) − Var(X)]² }



Resample (X1, X2)   X̄ = (X1 + X2)/2   g(X1, X2) = (1/2) Σ (Xi − X̄)²
(1,1)               1                 (1/2)[(1 − 1)² + (1 − 1)²] = 0
(1,3)               2                 (1/2)[(1 − 2)² + (3 − 2)²] = 1
(3,1)               2                 (1/2)[(3 − 2)² + (1 − 2)²] = 1
(3,3)               3                 (1/2)[(3 − 3)² + (3 − 3)²] = 0

MSE = E{ [g(X1, X2) − Var(X)]² } = Σ P(X1, X2) [g(X1, X2) − Var(X)]²

= (1/4)(0 − 1)² + (1/4)(1 − 1)² + (1/4)(1 − 1)² + (1/4)(0 − 1)² = 1/2
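Because each resample is equally likely, the bootstrap MSE can be computed by enumerating all n^n = 4 resamples. Here is a sketch of that calculation (the names are mine, not SOA's):

```python
from itertools import product

sample = [1, 3]
n = len(sample)

def g(xs):
    # the estimator (1/2) * sum of (X_i - Xbar)^2
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs) / 2

theta = g(sample)  # variance of the original sample: 1

# all 4 equally likely resamples drawn with replacement
mse = sum((g(r) - theta) ** 2 for r in product(sample, repeat=n)) / n**n
```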

Nov 2006 #26

You are given a random sample of two values from a distribution F:

1    3

You estimate θ(F) = Var(X) using the estimator

g(X1, X2) = Σ_{i=1}^{2} (Xi − X̄)²,   where X̄ = (X1 + X2)/2.

Determine the bootstrap approximation to the mean square error.

Solution

The only difference between this problem and the previous problem (May 2000 #17)
is the definition of g(X1, X2). In this problem, g(X1, X2) = Σ_{i=1}^{2} (Xi − X̄)²; in the
previous problem, g(X1, X2) = (1/2) Σ_{i=1}^{2} (Xi − X̄)².

Your original sample is (1, 3). The variance of your original sample is

Var(X) = E(X²) − E²(X) = (1/2)(1² + 3²) − [(1/2)(1 + 3)]² = 1



Under the bootstrap method, you resample from your original sample with replacement.
Your resamples are (1,1), (1,3), (3,1), and (3,3), each having probability 1/4.

For each resample, you calculate g(X1, X2) = Σ_{i=1}^{2} (Xi − X̄)². Then the mean square error
is

MSE = E{ [g(X1, X2) − Var(X)]² }

Resample (X1, X2)   X̄ = (X1 + X2)/2   g(X1, X2) = Σ (Xi − X̄)²
(1,1)               1                 (1 − 1)² + (1 − 1)² = 0
(1,3)               2                 (1 − 2)² + (3 − 2)² = 2
(3,1)               2                 (3 − 2)² + (1 − 2)² = 2
(3,3)               3                 (3 − 3)² + (3 − 3)² = 0

MSE = E{ [g(X1, X2) − Var(X)]² } = Σ P(X1, X2) [g(X1, X2) − Var(X)]²

= (1/4)(0 − 1)² + (1/4)(2 − 1)² + (1/4)(2 − 1)² + (1/4)(0 − 1)² = 1

May 2005 #4

Three observed values of the random variable X are: 1  1  4.

You estimate the 3rd central moment of X using the estimator

g(X1, X2, X3) = (1/3) Σ_{i=1}^{3} (Xi − X̄)³

Determine the bootstrap estimate of the mean squared error of g.

Solution

First, you need to understand that the n-th central moment is E{[X − E(X)]ⁿ}.

For example, the 1st central moment is

E[X − E(X)] = E(X) − E[E(X)] = E(X) − E(X) = 0

The 2nd central moment is E{[X − E(X)]²} = Var(X).

The 3rd central moment is E{[X − E(X)]³}.
Your original sample is (1, 1, 4). The 3rd central moment of this sample is calculated as
follows:

X̄ = (1 + 1 + 4)/3 = 2,   E{[X − E(X)]³} = (1/3)[(1 − 2)³ + (1 − 2)³ + (4 − 2)³] = 2

The third central moment of this original sample is used to approximate the true 3rd
central moment of the population. So the true parameter is θ = 2.

Next, you need to understand the bootstrap. Under the bootstrap, you resample from the original
sample with replacement. Imagine you have 3 boxes to fill from left to right. The 1st box
can be filled with any number from your original sample (1, 1, 4); so can the 2nd box and
the 3rd box. The number of resamples is 3³ = 27. (This is a counting concept from Exam P.)

For each resample (X1, X2, X3), you calculate θ̂ = g(X1, X2, X3) = (1/3) Σ_{i=1}^{3} (Xi − X̄)³.

Your resamples are:

(1) Three 1's. The number of permutations is 8. To understand why, let's denote the
original sample as (a, b, c) with a = 1, b = 1, and c = 4. Then the following 8 resamples will
produce (1,1,1): aaa, aab, aba, baa, bba, bab, abb, bbb. For the resample (1,1,1),

X̄ = (1 + 1 + 1)/3 = 1,   θ̂ = (1/3)[(1 − 1)³ + (1 − 1)³ + (1 − 1)³] = 0,   (θ̂ − θ)² = (0 − 2)² = 4

(2) Two 1's and one 4. The following 12 permutations will produce two 1's and one 4:
aac, aca, caa, bbc, bcb, cbb, abc, acb, cab, bac, bca, cba.

X̄ = (1 + 1 + 4)/3 = 2,   θ̂ = (1/3)[(1 − 2)³ + (1 − 2)³ + (4 − 2)³] = 2,   (θ̂ − θ)² = (2 − 2)² = 0
2

(3) Two 4's and one 1. The following 6 permutations will produce two 4's and one 1:


cca, cac, acc, ccb, cbc, bcc.

X̄ = (1 + 4 + 4)/3 = 3,   θ̂ = (1/3)[(1 − 3)³ + (4 − 3)³ + (4 − 3)³] = −2,   (θ̂ − θ)² = (−2 − 2)² = 16

(4) Three 4's. The following single permutation will produce (4,4,4): ccc.

X̄ = (4 + 4 + 4)/3 = 4,   θ̂ = (1/3)[(4 − 4)³ + (4 − 4)³ + (4 − 4)³] = 0,   (θ̂ − θ)² = (0 − 2)² = 4

Finally, the mean squared error is:

E[(θ̂ − θ)²] = (8/27)(4) + (12/27)(0) + (6/27)(16) + (1/27)(4) ≈ 4.9
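Enumerating all 27 resamples reproduces the permutation counts (8, 12, 6, 1) implicitly and confirms the answer. This sketch is mine, not the manual's:

```python
from itertools import product

sample = [1, 1, 4]
n = len(sample)

def g(xs):
    # estimator of the 3rd central moment: (1/3) * sum of (X_i - Xbar)^3
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 3 for x in xs) / len(xs)

theta = g(sample)  # 3rd central moment of the original sample: 2

resamples = list(product(sample, repeat=n))   # all 3^3 = 27 resamples
mse = sum((g(r) - theta) ** 2 for r in resamples) / len(resamples)  # 132/27
```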

Nov 2004 #16

A sample of claim amounts is {300, 600, 1500}. By applying the deductible to this
sample, the loss elimination ratio for a deductible of 100 per claim is estimated to be
0.125.

You are given the following simulations from the sample:

Simulation Claim Amounts


1 600 600 1500
2 1500 300 1500
3 1500 300 600
4 600 600 300
5 600 300 1500
6 600 600 1500
7 1500 1500 1500
8 1500 300 1500
9 300 600 300
10 600 600 600

Determine the bootstrap approximation to the mean square error of the estimate.

Solution

Your original sample is {300, 600, 1500}. If you resample this sample with replacement,
you'll get 3³ = 27 resamples. However, calculating the mean square error based on 27
resamples is too much work under exam conditions. That's why the SOA gives you only
10 resamples.

The loss elimination ratio is LER_X(d) = E[min(X, d)] / E(X).

The loss elimination ratio for the original sample {300, 600, 1500} with a 100 deductible is
0.125. The SOA already gives the loss elimination ratio; if we had to calculate it, this is how:

For the loss amount 300, the insurer pays only 200, saving 100.
For the loss amount 600, the insurer pays only 500, saving 100.
For the loss amount 1500, the insurer pays only 1400, saving 100.

The expected saving due to the 100 deductible is (1/3)(100 + 100 + 100) = 100.
The expected loss amount is (1/3)(300 + 600 + 1500) = 100 + 200 + 500 = 800.
So the loss elimination ratio is 100 / 800 = 0.125.

Next, for each of the 10 resamples, we calculate the loss elimination ratio as we did for the original sample. To speed up the calculation, let's set $100 as one unit of money. Then the deductible is one.
Resample   X1   X2   X3    LER     (LER − 0.125)²
1           6    6   15    1/9     0.000193
2          15    3   15    1/11    0.001162
3          15    3    6    1/8     0
4           6    6    3    1/5     0.005625
5           6    3   15    1/8     0
6           6    6   15    1/9     0.000193
7          15   15   15    1/15    0.003403
8          15    3   15    1/11    0.001162
9           3    6    3    1/4     0.015625
10          6    6    6    1/6     0.001736
Total                              0.0291

For example, for the 1st resample {6, 6, 15}, the claim payment after the deductible of 1 is {5, 5, 14}. So the LER is (1 + 1 + 1) / (6 + 6 + 15) = 3/27 = 1/9.

The MSE = (1/10) Σ_{i=1}^{10} (LER_i − 0.125)² = 0.0291 / 10 ≈ 0.0029
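The whole table can be reproduced in a few lines. Here is a sketch (plain Python; the 10 resamples, in units of $100, and the 0.125 target are taken from the problem):

```python
# the 10 simulated resamples, in units of $100 (so the deductible is 1)
RESAMPLES = [
    (6, 6, 15), (15, 3, 15), (15, 3, 6), (6, 6, 3), (6, 3, 15),
    (6, 6, 15), (15, 15, 15), (15, 3, 15), (3, 6, 3), (6, 6, 6),
]
TARGET = 0.125   # LER of the original sample at a deductible of 100

def ler(xs, d=1):
    """Loss elimination ratio of a resample: E[min(X, d)] / E[X]."""
    return sum(min(x, d) for x in xs) / sum(xs)

mse = sum((ler(r) - TARGET) ** 2 for r in RESAMPLES) / len(RESAMPLES)
print(round(mse, 4))   # 0.0029
```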



Chapter 5 Bühlmann credibility model
Trouble with black-box formulas
The Bühlmann credibility premium formula is tested over and over in Course 4 and Exam C. However, many candidates don't have a good understanding of the inner workings of the Bühlmann credibility premium model. They just memorize a series of black-box formulas:

Z = n/(n + k),   k = E[Var(X | Θ)] / Var[E(X | Θ)],   and P = (1 − Z)μ + Z X̄
Rote memorization of a formula without fully grasping the concepts is tedious, difficult,
and prone to errors. Additionally, a memorized formula will not yield the needed
understanding to grapple with difficult problems.

In this chapter, we're going to dig deep into Bühlmann's credibility premium formula and gain a crystal-clear understanding of the concepts.

Rating challenges facing insurers

Let's start with a simple example to illustrate one major challenge an insurance company faces when determining premium rates. Imagine you are the founder and the actuary of an auto insurance company. Your company's specialty is to provide auto insurance for taxi drivers.

Before you open your business, there are half a dozen insurance companies in your area that offer auto insurance to taxi drivers. The world has been going on fine for many years without your start-up, and it can continue going on without it, so it's tough for you to get customers. Finally, you take out a big portion of your savings account and buy TV advertising, which brings in your first three customers: Adam, Bob, and Colleen. Since your corporate office is your garage and you have only one employee (you), you decide that three customers are good enough for you to start your business.

When you open your business at t = 0 , you sell three auto insurance policies to Adam,
Bob, and Colleen. The contract of your insurance policy says that the premium rate is
guaranteed for only two years. Once the two-year guarantee period is over, you have the
right to set the renewal premium, which can be higher than the guaranteed initial
premium.

When you set your premium rate at t = 0 , you notice that Adam, Bob, and Colleen are
similar in many ways. They are all taxicab drivers. They work at the same taxi company
in the same city. They are all 35 years old. They all graduated from the same high school.



They are all careful drivers. Therefore, at t = 0 you treat Adam, Bob, and Colleen as
identical risks and charge the same premium for the first two years.

To actually set the initial premium for the first two years, you decide to buy a rate book
from a consulting firm. This consulting firm is well-known in the industry. Each year it
publishes a rate manual that lists the average claim cost of a taxi driver by city, by
mileage and by several other criteria. Based on this rate manual, you estimate that Adam,
Bob, and Colleen may each incur $4 claim cost per year. So at t = 0 , you charge Adam,
Bob, and Colleen $4 each. This premium rate is guaranteed for two years.

During the 2-year guaranteed period, Adam, Bob, and Colleen have incurred the
following claims:
Year 1 Year 2 Total Claim Average claim
Claim Claim per insured per year
Adam $0 $0 $0 $0 / 2 = $0
Bob $1 $7 $8 $8 / 2 = $4
Colleen $4 $9 $13 $13 / 2 =$6.5
Grand Total $21

Average claim per person per year (for the 3-person group): $21 / (3 × 2) = $3.50

Now the two-year guarantee period is over. You need to determine the renewal premium rates for Adam, Bob, and Colleen for the third year. Once you have determined the premium rates, you will need to file them with the insurance department of the state where you do business (called the domicile state).

Question: How do you determine the renewal premium rate for the third year for Adam,
Bob, and Colleen respectively?

One simple approach is to charge Adam, Bob, and Colleen a uniform rate (i.e. the group
premium rate). After all, Adam, Bob, and Colleen are similar risks; they form a
homogeneous group. As such, they should pay a uniform group premium rate, even
though their actual claim patterns for the past two years are different. You can continue
charging them the old rate of $4 per insured per year. However, since the average claim
cost for the past two years is $3.50 per insured per year, you can charge them $3.50 per
person for year three.

Under the uniform group rate of $3.50, Bob and Colleen will probably underpay their
premiums; their actual average annual claim for the past two years exceeds this group
premium rate. Adam, on the other hand, may overpay his premiums; his average annual
claim for the past two years is below the group premium rate. When you charge each
policyholder the uniform group premium rate, low-risk policyholders will overpay their
premiums and the high-risk policyholders will underpay their premiums. Your business
as a whole, however, will collect just enough premiums to pay the claim costs.



However, in the real world, you most likely won't be able to charge Adam, Bob, and Colleen a uniform rate of $3.50. Any of your customers can easily shop around, compare premium rates, and buy an insurance policy elsewhere at a better rate. For example, Adam can easily find another insurer who sells a similar insurance policy for less than your $3.50 group rate. Additionally, the commissioner of your state insurance department is unlikely to approve your uniform rate. The department will want to see that your low-risk customers pay lower premiums.

Key points to remember:


Under the classical theory of insurance, people with similar risks form a homogeneous group to share the risk. Members of a homogeneous group are photocopies of each other. The claim random variables of the members are independent and identically distributed with a common density function f_X(x). The uniform pure premium rate is E(X); each member of the homogeneous group should pay E(X).

In reality, however, there's no such thing as a homogeneous group. No two policyholders, however similar, have exactly the same risks. If you as an insurer charge everybody a uniform group rate, then low-risk policyholders will leave and buy insurance elsewhere.

To stay in business, you have no choice but to charge individualized premium rates that are proportional to policyholders' risks.

Now let's come back to our simple case. We know that uniform rating won't work in the real world. We'll want to set up a mathematical model to calculate the fair renewal premium rate for Adam, Bob, and Colleen respectively. Our model should reflect the following observations and intuition:

Adam, Bob, and Colleen are largely similar risks. We'll need to treat them as one rating group. This way, our renewal rates for Adam, Bob, and Colleen are somewhat related.

On the other hand, we need to differentiate between Adam, Bob, and Colleen. We
might want to treat Adam, Bob, and Colleen as potentially different sub-risks
within a largely similar rate group. This way, our model will produce different
renewal rates. We hope the renewal rate calculated from our model will agree
with our intuition that Adam deserves the lowest renewal rate, Bob a higher rate,
and Colleen the highest rate.

To reflect the idea that Adam, Bob, and Colleen are different sub-risks within a
largely similar rate group, we may want to divide the largely similar rate group
into four sub-risks (or more sub-risks if you like): super preferred, preferred,
standard, and sub-standard. So the rate group actually consists of four sub-risks.
Adam or Bob or Colleen can be any one of the four sub-risks.



Here comes a critical point: we don't know who belongs to which sub-risk. We don't know whether Adam is a super preferred, a preferred, a standard, or a substandard sub-risk. Nor do we know to which sub-risk Bob or Colleen belongs. This is so even though we have Adam's two-year claim data. Judged from his 2-year claim history, Adam seems to be a super preferred or at least a preferred sub-risk. However, a bad driver can have no accidents for a while due to good luck; a good driver can have several big accidents in a row due to bad luck. So we really can't say for sure that Adam is indeed a better risk. All we know is that Adam's sub-risk class is a random variable with 4 possible values: super preferred, preferred, standard, and substandard.

To visualize that Adam's sub-risk class is a random variable, think about rolling a 4-sided die. One side of the die is marked with the letters "SP" (super preferred); another side is marked with "PF" (preferred); the third side is marked with "STD" (standard); and the fourth side is marked with "SUB" (substandard). To determine which sub-class Adam belongs to, we'll roll the die. If the result is "SP", we'll assign Adam to the super preferred class. If the result is "PF", we'll assign him to the preferred class. And so on and so forth. Similarly, we can roll the die and randomly assign Bob or Colleen to one of the four sub-classes: SP, PF, STD, and SUB.

Now we are ready to come up with a model to calculate the renewal premium rate:

Let the random variable X_{jt} represent the claim cost incurred in year t by the j-th insured, where t = 1, 2, …, n, n+1 and j = 1, 2, …, m. Here in our example, n = 2 (we have two years of claim data) and m = 3 (corresponding to Adam, Bob, and Colleen).

For any j = 1, 2, …, m, the claims X_{j1}, X_{j2}, …, X_{jn}, and X_{j,n+1} are identically distributed with a common density function f(x, θ), a common mean μ = E(X_{jt}), and a common variance σ² = Var(X_{jt}). What we are saying here is that all policyholders j = 1, 2, …, m have an identical mean claim μ and an identical claim variance σ².

θ is a realization of Θ, a random variable (or a vector of random variables) representing the presence of multiple sub-risks. X_{j1}, X_{j2}, …, X_{jn}, and X_{j,n+1}, which represent the claim costs incurred by the same policyholder, belong to the same sub-risk class θ.

However, θ is unknown to us. All we know is that θ is a random realization of Θ. Here in our example, Θ takes values in {SP, PF, STD, SUB}. When we say that θ is a realization of Θ, we mean that with probability p₁, θ = SP; with probability p₂, θ = PF; with probability p₃, θ = STD; and with probability p₄ = 1 − (p₁ + p₂ + p₃), θ = SUB.



Because X_{j1}, X_{j2}, …, X_{jn}, and X_{j,n+1} are claims generated from the same (unknown) sub-risk class, we assume that given Θ = θ, they are independent and identically distributed. That is, X_{j1} | θ, X_{j2} | θ, …, X_{jn} | θ, X_{j,n+1} | θ are independent and identically distributed with a common conditional mean E(X_{jt} | θ) = μ(θ) and a common conditional variance Var(X_{jt} | θ).

We have observed X_{j1}, X_{j2}, …, X_{jn}. Our goal is to estimate X_{j,n+1}, the claim cost in year n+1 by the j-th insured, using his prior n-year average claim cost X̄_j = (1/n) Σ_{t=1}^{n} X_{jt}.

The estimated value of X_{j,n+1} is the pure renewal premium for year n+1. Bühlmann's approach is to use a + Z X̄_j to approximate X_{j,n+1}, subject to the condition that E[(a + Z X̄_j − X_{j,n+1})²] is minimized.

The final result:

a + Z X̄_j = (1 − Z)μ + Z X̄_j,

Z = n/(n + k),   k = E[Var(X_{jt} | Θ)] / Var[E(X_{jt} | Θ)] = E[Var(X_{jt} | Θ)] / Var[μ(Θ)]

μ = E(X_{jt}) = E[E(X_{jt} | Θ)] = E[μ(Θ)]

Next, we'll derive the above formulas. But before we do, let's go over some preliminary concepts.

3 preliminary concepts for deriving the Bühlmann premium formula

Preliminary concept #1 Double expectation

E(X) = E_Θ[E(X | Θ)]

If X is discrete, E(X) = E_Θ[E(X | Θ)] = Σ_θ p(θ) E(X | θ).

If X is continuous, E(X) = E_Θ[E(X | Θ)] = ∫ E(X | θ) f(θ) dθ


I'll explain the double expectation theorem assuming X is discrete. However, the same logic applies when X is continuous.

Let's use a simple example to understand the meaning behind the above formula. A class has 6 boys and 4 girls. These 10 students take a final. The average score of the 6 boys is 80; the average score of the 4 girls is 85. What's the average score of the whole class?

This is an elementary-level math problem. The average score of the whole class is:

Average score = Total score / # of students = [6(80) + 4(85)] / 10 = 820/10 = 82

Now let's rearrange the above equation:

Average score = (6/10)(80) + (4/10)(85)

If we express the above calculation using the double expectation theorem, then we have:

E(Score) = E_Gender[E(Score | Gender)] = Σ P(Gender) E(Score | Gender)
= P(boy) E(Score | boy) + P(girl) E(Score | girl)
= (6/10)(80) + (4/10)(85) = 82

So instead of directly calculating the average score for the whole class, we first break
down the whole class into two groups based on gender. We then calculate the average
score of these two groups: boys and girls. Next, we calculate the weighted average of
these two group averages. This weighted average is the average of the whole class. If you
understand this formula, you have understood the essence of the double expectation
theorem.

The Double Expectation Theorem in plain English:


Instead of directly calculating the mean of the whole population, you first break down the
population into several groups based on one standard (such as gender). You calculate the
mean of each group. Next, you calculate the mean of all the group means. This is the
mean of the whole population.

Problem. A group of 20 graduate students (12 non-math majors and 8 math majors) has a total GRE score of 12,940. The GRE score distribution by major is as follows:



Total GRE scores of the 12 non-math majors: 7,740
Total GRE scores of the 8 math majors: 5,200
Total GRE score: 12,940

Find the average GRE score twice: first without using the double expectation theorem, then using it. Show that you get the same result.

Solution

(1) Find the mean without using the double expectation theorem. The average GRE score for the 20 graduate students is:

Average score = Total score / # of students = 12,940/20 = 647

(2) Find the mean using the double expectation theorem.

E(GRE) = E_Major[E(GRE | Major)] = Σ P(Major) E(GRE | Major)
= P(non-math) E(GRE | non-math) + P(math) E(GRE | math)
= (12/20)(7,740/12) + (8/20)(5,200/8) = 647

You can see the two methods produce an identical result.
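The same two computations are easy to script. A small sketch (plain Python; the group counts and totals are taken from the problem):

```python
# (number of students, total GRE score) by major
GROUPS = {"non-math": (12, 7740), "math": (8, 5200)}
n = sum(count for count, _ in GROUPS.values())      # 20 students

# direct average: total score / number of students
direct = sum(total for _, total in GROUPS.values()) / n

# double expectation: weight each group's mean by P(group)
double_exp = sum((count / n) * (total / count)
                 for count, total in GROUPS.values())
print(direct, double_exp)   # both are 647 (up to float rounding)
```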

Preliminary concept #2 Total variance formula

Var(X) = E_Y[Var(X | Y)] + Var_Y[E(X | Y)]

Proof.

Var(X) = E(X²) − E²(X)

Put the double expectation theorem to use:

E(X) = E_Y[E(X | Y)],   E(X²) = E_Y[E(X² | Y)]

However, E(X² | Y) = Var(X | Y) + E²(X | Y).

Var(X) = E(X²) − E²(X) = E_Y[Var(X | Y) + E²(X | Y)] − {E_Y[E(X | Y)]}²

= E_Y[Var(X | Y)] + {E_Y[E²(X | Y)] − (E_Y[E(X | Y)])²}

= E_Y[Var(X | Y)] + Var_Y[E(X | Y)]

If X is the loss amount of a policyholder and Y is the risk class of the policyholder, then Var(X) = E_Y[Var(X | Y)] + Var_Y[E(X | Y)] means that the total variance of the loss consists of two components:

E_Y[Var(X | Y)], the average variance by risk class

Var_Y[E(X | Y)], the variance of the average loss by risk class

E_Y[Var(X | Y)] is called the expected value of the process variance. Var_Y[E(X | Y)] is called the variance of the hypothetical means.

Var(X) = E_Y[Var(X | Y)] + Var_Y[E(X | Y)]
Total variance = expected process variance + variance of hypothetical means

Next, let's look at a comprehensive example using double expectation and total variance.

Example. The number of claims, N, incurred by a policyholder has the following distribution:

P(n) = [3! / (n!(3 − n)!)] pⁿ(1 − p)^(3−n)

P is uniformly distributed over [0, 1]. Find E(N) and Var(N).

Solution

If p were a constant, N would have a binomial distribution with mean and variance:

E(N) = 3p,   Var(N) = 3p(1 − p)

However, p is also a random variable, so we cannot directly use the above formulas.



To find E(N), we divide N into different groups by p, just as we divided the class into boys and girls. The only difference is that this time we have an infinite number of groups (p is a continuous random variable). Let's consider a small group [p, p + dp].

Each value of p is a separate group. For each group, we calculate its mean. Then we find the weighted average of all the group means, with the weight being the probability of each group's p value. The result is E(N).

E(N) = E_P[E(N | p)] = ∫₀¹ E(N | p) f_P(p) dp = ∫₀¹ 3p dp = (3/2)p² |₀¹ = 3/2

Please note that p is uniform over [0, 1]. Consequently, f_P(p) = 1.

Alternatively, E(N) = E_P[E(N | p)] = E_P[3p] = 3E(P) = 3(1/2) = 3/2

Next, we'll calculate Var(N). One method is to calculate Var(N) from scratch using the standard formula Var(N) = E(N²) − E²(N). We'll use the double expectation theorem to calculate E(N²).

E(N²) = E_P[E(N² | p)] = ∫₀¹ E(N² | p) f(p) dp

E(N² | p) = E²(N | p) + Var(N | p) = (3p)² + 3p(1 − p) = 6p² + 3p

E(N²) = ∫₀¹ (6p² + 3p) dp = 2p³ + (3/2)p² |₀¹ = 7/2

Var(N) = E(N²) − E²(N) = 7/2 − (3/2)² = 5/4

Alternatively, you can use the total variance formula to calculate the variance:

Var(N) = E_P[Var(N | p)] + Var_P[E(N | p)]


Because N | p is binomial with parameters 3 and p, we have:

E(N | p) = 3p,   Var(N | p) = 3p(1 − p)

E_P[Var(N | p)] = E_P[3p(1 − p)] = E_P(3p − 3p²) = 3E_P(p) − 3E_P(p²)

Var_P[E(N | p)] = Var_P(3p) = 9Var(P)

Var(N) = E_P[Var(N | p)] + Var_P[E(N | p)] = 3E_P(p) − 3E_P(p²) + 9Var(P)

Applying the general formula:

If X is uniform over [a, b], then E(X) = (a + b)/2, Var(X) = (b − a)²/12

We have:

E(P) = (0 + 1)/2 = 1/2,   Var(P) = (1 − 0)²/12 = 1/12

E(P²) = E²(P) + Var(P) = 1/4 + 1/12 = 4/12

Var(N) = 3E_P(p) − 3E_P(p²) + 9Var(P) = 3(1/2) − 3(4/12) + 9(1/12) = 5/4
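Both routes to Var(N) can be checked with exact rational arithmetic. Here is a sketch (plain Python fractions; the moments of the uniform distribution are hard-coded):

```python
from fractions import Fraction as F

# moments of P ~ Uniform(0, 1)
E_P, E_P2 = F(1, 2), F(1, 3)
VAR_P = E_P2 - E_P ** 2          # 1/12

# N | p ~ Binomial(3, p): E(N|p) = 3p, Var(N|p) = 3p(1-p)
E_N = 3 * E_P                    # double expectation: 3/2

EPV = 3 * E_P - 3 * E_P2         # E_P[3p(1-p)] = 1/2
VHM = 9 * VAR_P                  # Var_P(3p) = 3/4
VAR_N = EPV + VHM                # total variance formula: 5/4

print(E_N, VAR_N)   # 3/2 5/4
```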

Preliminary concept #3 Linear least squares regression

In a regression analysis, you try to fit a line (or a function) through a set of points. With least squares regression, you get the best fit by minimizing the sum of the squared distances from the points to the fitted line.

Let's say you want to find out how a person's income level affects how much life insurance he buys. Let X represent income. Let Y represent the amount of life insurance this person buys. You have collected data pairs (X, Y) from a group of consumers. You suspect there's a linear relationship between X and Y. You want to predict Y using the function a + bX, where a and b are constants. With least squares regression, you want to minimize the following:

Q = E[(a + bX − Y)²]

Next, well derive a and b .

Q ! 2 "
= ( a + bX Y) = E# ( a + bX Y) $
2
E
a a % a &

= 2 E ( a + bX Y ) = 2 a + bE ( X ) E (Y )

Q
Setting = 0. a + bE ( X ) E (Y ) = 0 ( Equation I )
a

Q ! 2 "
= ( a + bX Y) = E# ( a + bX Y) $
2
E
b b % b &

= 2E ( a + bX Y ) X = 2 aE ( X ) + bE ( X 2 ) E ( X Y )

aE ( X ) + bE ( X 2 ) E ( X Y ) = 0
Q
Setting = 0. (Equation II )
b

(Equation II) - (Equation I) E ( X ) :

b E ( X 2 ) E 2 ( X ) = E ( X Y ) E ( X ) E (Y )

However, E ( X 2 ) E 2 ( X ) = Var ( X ) , E ( X Y ) E ( X ) E ( Y ) = Cov ( X , Y ) .

Cov ( X , Y )
b= , a = E (Y ) bE ( X )
Var ( X )
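These two formulas are easy to sanity-check on data. A minimal sketch (plain Python; the data points are made up for illustration and roughly follow y = 1 + 2x):

```python
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]   # made-up points near y = 1 + 2x

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n

b = cov_xy / var_x          # b = Cov(X, Y) / Var(X)
a = mean_y - b * mean_x     # a = E(Y) - b E(X)
print(a, b)                 # close to 1.15 and 1.94
```

On these points the intercept and slope come out near a ≈ 1.15 and b ≈ 1.94, close to the generating line.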

Derivation of Bühlmann's Credibility Formula

Now I'm ready to give you a quick proof of the Bühlmann credibility formula. To simplify notation, I'm going to fix on one particular insured (such as Adam) and change the symbol X_{jt} to X_t. Remember, our goal is to estimate X_{n+1}, the individualized premium rate for year n+1, using a + Z X̄. Z is the credibility factor assigned to the mean of past claims X̄ = (1/n)(X₁ + X₂ + … + X_n). We'll want to find the a and Z that minimize the following:

E[(a + Z X̄ − X_{n+1})²]

Please note that X₁, X₂, …, X_n, and X_{n+1} are claims incurred by the same policyholder (whose risk class θ is unknown to us) during years 1, 2, …, n, and n+1.

Applying the formula developed in preliminary concept #3, we have:

Z = Cov(X̄, X_{n+1}) / Var(X̄)

Cov(X̄, X_{n+1}) = Cov[(1/n)(X₁ + X₂ + … + X_n), X_{n+1}]
= (1/n)[Cov(X₁, X_{n+1}) + Cov(X₂, X_{n+1}) + … + Cov(X_n, X_{n+1})]

One common mistake is to assume that X₁, X₂, …, X_n, X_{n+1} are independent and identically distributed. If they were, we would have

Cov(X₁, X_{n+1}) = Cov(X₂, X_{n+1}) = … = Cov(X_n, X_{n+1}) = 0

Z = Cov(X̄, X_{n+1}) / Var(X̄) = 0

The result Z = 0 simply doesn't make sense. What went wrong is the assumption that X₁, X₂, …, X_n, X_{n+1} are independent identically distributed. The correct statement is that X₁, X₂, …, X_n, and X_{n+1} are identically distributed with a common density function f(x, θ), where θ is unknown to us.

Or stated differently, X₁, X₂, …, X_n, and X_{n+1} are independent and identically distributed given the risk class Θ = θ. In other words, if we fix the sub-class variable at Θ = θ, then all the claims incurred by a policyholder who belongs to sub-class θ are independent and identically distributed. Mathematically, this means that X₁ | θ, X₂ | θ, …, X_n | θ, and X_{n+1} | θ are independent and identically distributed.

Here is an intuitive way to see why X_i and X_j have non-zero covariance. X_i and X_j represent the claim amounts incurred at times i and j by a policyholder whose sub-class θ is unknown to us. So X_i and X_j are controlled by the same risk-class factor θ. If θ is a low risk, then X_i and X_j both tend to be small. On the other hand, if θ is a high risk, then X_i and X_j both tend to be big. So X_i and X_j are correlated and have a non-zero covariance.

Next, let's derive the formula:

Cov(X_i, X_j) = E(X_i X_j) − E(X_i)E(X_j) = Var[μ(Θ)],   where i ≠ j.

Using the double expectation theorem, we have E(X_i X_j) = E_Θ[E(X_i X_j | Θ)].

Because X_i | Θ and X_j | Θ are independent and identically distributed with a common conditional mean μ(Θ), we have:

E(X_i X_j | Θ) = E(X_i | Θ)E(X_j | Θ) = μ(Θ)μ(Θ) = μ²(Θ)

E(X_i X_j) = E_Θ[E(X_i X_j | Θ)] = E[μ²(Θ)]

E(X_i)E(X_j) = E[μ(Θ)]E[μ(Θ)] = {E[μ(Θ)]}²

Cov(X_i, X_j) = E[μ²(Θ)] − {E[μ(Θ)]}² = Var[μ(Θ)]

Cov(X̄, X_{n+1}) = (1/n)Cov[(X₁ + X₂ + … + X_n), X_{n+1}]
= (1/n)[Cov(X₁, X_{n+1}) + Cov(X₂, X_{n+1}) + … + Cov(X_n, X_{n+1})]
= (1/n){n Var[μ(Θ)]} = Var[μ(Θ)]
Next, we'll calculate Var(X̄).

Var(X̄) = Var[(1/n)(X₁ + X₂ + … + X_n)] = (1/n²)Var(X₁ + X₂ + … + X_n)

Once again, we have to be careful here. One temptation is to write:

Var(X₁ + X₂ + … + X_n) = Var(X₁) + Var(X₂) + … + Var(X_n)   Wrong!

This is wrong because X₁, X₂, …, X_n are not independent. Instead, X₁ | θ, X₂ | θ, …, X_n | θ are independent. So we have to include the covariances among X₁, X₂, …, X_n. The correct expression is:

Var(X₁ + X₂ + … + X_n)
= Var(X₁) + Var(X₂) + … + Var(X_n)
+ 2Cov(X₁, X₂) + 2Cov(X₁, X₃) + … + 2Cov(X_{n−1}, X_n)

So we have n variance terms. Though X₁, X₂, …, X_n are not independent, they have a common mean μ = E(X) and a common variance Var(X).

Var(X₁) + Var(X₂) + … + Var(X_n) = nVar(X).

Next, let's look at the covariance terms:

2Cov(X₁, X₂) + 2Cov(X₁, X₃) + … + 2Cov(X_{n−1}, X_n).

Out of X₁, X₂, …, X_n, if you take out any two items X_i and X_j where i ≠ j, you'll get a covariance Cov(X_i, X_j) = Var[μ(Θ)]. Since there are C(n, 2) = n(n − 1)/2 ways of taking out two items X_i and X_j where i ≠ j, the sum of the covariance terms becomes:

2Cov(X₁, X₂) + 2Cov(X₁, X₃) + … + 2Cov(X_{n−1}, X_n)
= {2Var[μ(Θ)]} C(n, 2) = 2Var[μ(Θ)] × (1/2)n(n − 1) = n(n − 1)Var[μ(Θ)]

Var(X̄) = (1/n²)Var(X₁ + X₂ + … + X_n)
= (1/n²){nVar(X) + n(n − 1)Var[μ(Θ)]}
= (1/n){Var(X) + (n − 1)Var[μ(Θ)]}
= [Var(X) − Var[μ(Θ)]]/n + Var[μ(Θ)]

Using the total variance formula, we have:

Var(X) = E[Var(X | Θ)] + Var[E(X | Θ)]

Var(X) − Var[μ(Θ)] = E[Var(X | Θ)]

Var(X̄) = Var[μ(Θ)] + (1/n)E[Var(X | Θ)]
Finally, we have:

Z = Cov(X̄, X_{n+1})/Var(X̄) = Var[μ(Θ)] / {Var[μ(Θ)] + (1/n)E[Var(X | Θ)]}
= n / {n + E[Var(X | Θ)]/Var[μ(Θ)]}

Let k = E[Var(X | Θ)]/Var[μ(Θ)]. Then Z = n/(n + k).

Next, we need to find a = E(X_{n+1}) − Z E(X̄). Remember, X₁, X₂, …, X_n, though not independent, have a common mean E(X) = μ and a common variance Var(X).

E(X̄) = E[(1/n)(X₁ + X₂ + … + X_n)] = (1/n)E(X₁ + X₂ + … + X_n) = (1/n)(nμ) = μ

E(X_{n+1}) = μ

a = E(X_{n+1}) − Z E(X̄) = μ − Zμ = (1 − Z)μ

a + Z X̄ = (1 − Z)μ + Z X̄ = Z X̄ + (1 − Z)μ,   where Z = n/(n + k)
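To make the final formulas concrete, here is a small made-up numerical example (not from the text): two equally likely sub-risk classes with conditional means 1 and 3 and conditional variances 1 and 3, and n = 2 observed years with sample mean X̄ = 1.5.

```python
from fractions import Fraction as F

# made-up example: (probability, conditional mean, conditional variance) per class
CLASSES = [(F(1, 2), F(1), F(1)),
           (F(1, 2), F(3), F(3))]

mu  = sum(p * m for p, m, _ in CLASSES)                  # E[mu(Theta)] = 2
EPV = sum(p * v for p, _, v in CLASSES)                  # E[Var(X|Theta)] = 2
VHM = sum(p * m ** 2 for p, m, _ in CLASSES) - mu ** 2   # Var[mu(Theta)] = 1

n, xbar = 2, F(3, 2)          # two observed years averaging 1.5
k = EPV / VHM                 # k = 2
Z = F(n) / (n + k)            # Z = n / (n + k) = 1/2
premium = Z * xbar + (1 - Z) * mu
print(Z, premium)             # 1/2 7/4
```

The renewal premium 7/4 sits between the policyholder's own sample mean (3/2) and the global mean (2), weighted by the credibility factor Z.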



Summary of how to derive the Bühlmann credibility premium formulas

Z = Cov(X̄, X_{n+1})/Var(X̄),   a = (1 − Z)μ

Cov(X̄, X_{n+1}) = Cov(X_i, X_j) = Var[μ(Θ)] = VE,   where i ≠ j

Var(X̄) = Var(X₁ + X₂ + … + X_n)/n²

Var(X₁ + X₂ + … + X_n) = nVar(X) + n(n − 1)Cov(X_i, X_j)
= nVar(X) + n(n − 1)Var[μ(Θ)]
= n{Var(X) − Var[μ(Θ)]} + n²Var[μ(Θ)]
= nE[Var(X | Θ)] + n²Var[μ(Θ)]

Var(X̄) = Var(X₁ + X₂ + … + X_n)/n² = Var[μ(Θ)] + (1/n)E[Var(X | Θ)] = VE + (1/n)EV

Z = Cov(X̄, X_{n+1})/Var(X̄) = Var[μ(Θ)] / {(1/n)[E[Var(X | Θ)] + nVar[μ(Θ)]]}
= n / {n + E[Var(X | Θ)]/Var[μ(Θ)]} = n/(n + k)

Or Z = Cov(X̄, X_{n+1})/Var(X̄) = VE / [VE + (1/n)EV] = n/(n + EV/VE)

Here VE = Var[E(X | Θ)], the variance of hypothetical means, and EV = E[Var(X | Θ)], the expected process variance.

P = a + Z X̄ = (1 − Z)μ + Z X̄

Let's look at the final formula:

P = Z X̄ + (1 − Z)μ

where P is the renewal premium, X̄ is the risk-specific sample mean, and μ is the global mean.


Here P is the renewal premium rate during year n+1 for a policyholder whose sub-risk θ is unknown to us. X̄ is the sample mean of the claims incurred by the same policyholder (hence the same sub-risk class) during years 1, 2, …, n. μ is the mean claim cost of all the sub-risks combined.

If we apply this formula to set the renewal premium rate for Adam for Year 3, then the formula becomes:

P^Adam = Z X̄^Adam + (1 − Z)μ^(Adam, Bob, Colleen)

where P^Adam is Adam's renewal premium, X̄^Adam is his risk-specific sample mean, and μ^(Adam, Bob, Colleen) is the global mean over the whole group.

At first, the above formula may seem counter-intuitive. If we are interested only in Adam's claim cost in Year 3, why not set Adam's renewal premium for Year 3 equal to his prior two-year average claim X̄ (so P = X̄)? Why do we need to drag in μ, the global average, which includes the claim costs incurred by Bob and Colleen?

Actually, it's a blessing that the renewal premium formula includes μ. X̄ varies widely based on your sample size. However, the state insurance departments generally want the renewal premium to be stable yet responsive to the past claim data. If your renewal premium P is set to X̄, then P will fluctuate wildly depending on the sample size. Then you'll have a difficult time getting your renewal rates approved by state insurance departments.

In addition, you may have P = X̄ = 0; this is the case for Adam. You'll provide free insurance to a policyholder who has not incurred any claims yet. This certainly doesn't make any sense.

By including the global mean μ, the renewal premium P = (1 − Z)μ + Z X̄ is stabilized. At the same time, P is still responsive to X̄. Since X̄^Adam < X̄^Bob, the renewal premium formula P = (1 − Z)μ + Z X̄ will produce P^Adam < P^Bob.

There are other ways to derive the Bühlmann credibility formula. For example, instead of minimizing E[(a + Z X̄ − X_{n+1})²], we can minimize

E{[a + Z X̄ − μ(Θ)]²}

Please note that μ(Θ) = E(X | Θ) is a random variable. In our taxi driver insurance example, μ(Θ) has four possible values:

E(X | SP), E(X | PF), E(X | STD), and E(X | SUB)

The idea behind E{[a + Z X̄ − μ(Θ)]²} is this. If we knew that a policyholder belongs to sub-risk θ, then we could set our renewal premium for year n+1 equal to his conditional mean claim cost μ(θ) = E(X_{n+1} | θ) = E(X₁ | θ) = E(X₂ | θ) = … = E(X_n | θ). However, we don't know θ. As a result, we list all the possible values of μ(Θ) and find the least-squares linear estimator of μ(Θ) by minimizing E{[a + Z X̄ − μ(Θ)]²}.

Then using Preliminary Concept #3, we have:

Z = Cov[X̄, μ(Θ)] / Var(X̄)

Cov[X̄, μ(Θ)] = Cov[(1/n)(X₁ + X₂ + … + X_n), μ(Θ)]
= (1/n){Cov[X₁, μ(Θ)] + Cov[X₂, μ(Θ)] + … + Cov[X_n, μ(Θ)]}

For i = 1, 2, …, n, we have:

Cov[X_i, μ(Θ)] = E[X_i μ(Θ)] − E(X_i)E[μ(Θ)]

E[X_i μ(Θ)] = E_Θ{E[X_i μ(Θ) | Θ]}

For a fixed Θ, μ(Θ) is a constant. Hence E[X_i μ(Θ) | Θ] = μ(Θ)E(X_i | Θ) = μ²(Θ)

E[X_i μ(Θ)] = E_Θ{E[X_i μ(Θ) | Θ]} = E[μ²(Θ)]

E(X_i)E[μ(Θ)] = E[μ(Θ)]E[μ(Θ)] = {E[μ(Θ)]}²

Cov[X_i, μ(Θ)] = E[μ²(Θ)] − {E[μ(Θ)]}² = Var[μ(Θ)]

Cov[X₁, μ(Θ)] + Cov[X₂, μ(Θ)] + … + Cov[X_n, μ(Θ)] = nVar[μ(Θ)]

Cov[X̄, μ(Θ)] = (1/n){Cov[X₁, μ(Θ)] + … + Cov[X_n, μ(Θ)]} = Var[μ(Θ)]

Var(X̄) is the same whether E[(a + Z X̄ − X_{n+1})²] or E{[a + Z X̄ − μ(Θ)]²} is to be minimized:

Var(X̄) = (1/n){E[Var(X | Θ)] + nVar[μ(Θ)]}

Once again, we get:

Z = Cov[X̄, μ(Θ)]/Var(X̄) = Var[μ(Θ)]/Var(X̄) = n / {n + E[Var(X | Θ)]/Var[μ(Θ)]} = n/(n + k)

a = E(X_{n+1}) − Z E(X̄) = μ − Zμ = (1 − Z)μ

a + Z X̄ = (1 − Z)μ + Z X̄ = Z X̄ + (1 − Z)μ

There's a third approach to deriving Bühlmann's credibility formula. Instead of minimizing

E[(a + Z X̄ − X_{n+1})²] or E{[a + Z X̄ − μ(Θ)]²},

we can minimize E{[a + Z X̄ − E(X_{n+1} | X₁, X₂, …, X_n)]²}.

Here X_{n+1} | X₁, X₂, …, X_n represents the claim cost in year n+1 of the policyholder who incurred claims X₁, X₂, …, X_n in years 1, 2, …, n. The notation X_{n+1} | X₁, X₂, …, X_n emphasizes that the claim amounts X₁, X₂, …, X_n, X_{n+1} are from the same sub-class θ. This condition must hold for the Bühlmann credibility formula to be valid. For example, if X_{n+1} comes from sub-class θ₁ and X₁, X₂, …, X_n from sub-class θ₂, then the Bühlmann credibility formula will not hold true.

However, the requirement that the claim amounts X₁, X₂, …, X_n, X_{n+1} are from the same sub-class shouldn't bother us at all. At the very beginning, when we presented the Bühlmann credibility formula, we already used X₁, X₂, …, X_n, X_{n+1} to refer to the claims incurred by the same policyholder, whose sub-risk is θ. As a result,

E(X_{n+1} | X₁, X₂, …, X_n) = E(X | θ) = μ(θ)

So E{[a + Z X̄ − E(X_{n+1} | X₁, X₂, …, X_n)]²} = E{[a + Z X̄ − μ(Θ)]²}

Key Points

We can derive the Bhlmann credibility formula by minimizing any of the following
three terms:

( ) ( ) E ( X n +1 X 1 , X 2 ,..., X n )
2 2 2
E a+ZX X n +1 , E a+ZX , E a+ZX .

The Bühlmann credibility premium is the least squares linear estimator of any of the
following three quantities:

X_{n+1}, the claim amount in year n + 1 incurred by the policyholder who has
claims X_1, X_2, ..., X_n in years 1, 2, ..., n.

μ(θ), the mean claim amount of the sub-class θ that has generated X_1, X_2, ..., X_n.

E(X_{n+1} | X_1, X_2, ..., X_n), the Bayesian posterior estimate of the mean claim in year n + 1,
given that we have observed the same policyholder incur claim costs X_1, X_2, ..., X_n
in years 1, 2, ..., n respectively.

Even though we derived the Bühlmann credibility formula assuming X is the claim
cost, the formula works if X is any other quantity, such as the loss ratio, the
aggregate loss amount, or the number of claims.

Popularity of the Bühlmann credibility formula

The Bühlmann credibility formula is popular due to its simplicity. The renewal premium
is the weighted average of the uniform group rate and the sample mean of the past claims.
The renewal premium is easy to calculate and easy to explain to clients.

In contrast, Bayesian premiums (the posterior means) are often difficult to calculate,
requiring knowledge of prior distributions and involving complex integrations.

Next, let's derive a special case of the Bühlmann credibility formula. This special case is
presented in Loss Models.


Special case

If E(X_i) = μ, Var(X_i) = σ², and Cov(X_i, X_j) = ρσ² for i ≠ j, where the correlation
coefficient ρ satisfies −1 < ρ < 1, determine the Bühlmann credibility premium.

Once again, the credibility premium is a + Z X̄.

Z = Cov(X̄, X_{n+1}) / Var(X̄),   where Cov(X̄, X_{n+1}) = Cov(X_i, X_j) = ρσ²

Var(X̄) = Var(X_1 + X_2 + ... + X_n) / n²

Var(X_1 + X_2 + ... + X_n) = n Var(X) + n(n − 1) Cov(X_i, X_j) = nσ² + n(n − 1)ρσ²

Z = Cov(X̄, X_{n+1}) / Var(X̄) = ρσ² / { [nσ² + n(n − 1)ρσ²] / n² } = nρ / [1 + (n − 1)ρ]

a = (1 − Z) μ = {1 − nρ / [1 + (n − 1)ρ]} μ = (1 − ρ) μ / [1 + (n − 1)ρ]

The Bühlmann credibility premium is

Z X̄ + (1 − Z) μ = nρ X̄ / [1 + (n − 1)ρ] + (1 − ρ) μ / [1 + (n − 1)ρ]
                = ρ (X_1 + X_2 + ... + X_n) / (1 − ρ + nρ) + (1 − ρ) μ / (1 − ρ + nρ)

You don't need to memorize the Bühlmann credibility premium formula for this special
case. If you understand how to derive the general Bühlmann credibility premium formula,
you can derive the special-case formula any time by setting Cov(X_i, X_j) = ρσ².
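As a quick sanity check of the special-case result Z = nρ / [1 + (n − 1)ρ], the sketch below (my own, not from the text) simulates exchangeable claims with the required covariance structure and recovers Z as the regression slope of X_{n+1} on X̄. The values n = 5, ρ = 0.3, σ² = 4, μ = 10 are arbitrary, and the construction X_i = μ + σ(√ρ·W + √(1−ρ)·ε_i) is one way to get Cov(X_i, X_j) = ρσ² when ρ ≥ 0.

```python
import random

random.seed(2009)

# Hypothetical inputs for the check; any n, 0 <= rho < 1, sigma2, mu will do.
n, rho, sigma2, mu = 5, 0.3, 4.0, 10.0
sd = sigma2 ** 0.5
trials = 100_000

xbars, nexts = [], []
for _ in range(trials):
    w = random.gauss(0, 1)          # shared shock creates correlation rho
    xs = [mu + sd * (rho ** 0.5 * w + (1 - rho) ** 0.5 * random.gauss(0, 1))
          for _ in range(n + 1)]
    xbars.append(sum(xs[:n]) / n)   # X-bar from the first n claims
    nexts.append(xs[n])             # X_{n+1}

mx = sum(xbars) / trials
my = sum(nexts) / trials
cov = sum((a - mx) * (b - my) for a, b in zip(xbars, nexts)) / trials
var = sum((a - mx) ** 2 for a in xbars) / trials

Z_simulated = cov / var                      # regression slope of X_{n+1} on X-bar
Z_formula = n * rho / (1 + (n - 1) * rho)    # the special-case result
```

With these inputs, both numbers come out near 15/22 ≈ 0.68.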

Next, let's turn our attention to how to solve Bühlmann credibility problems on
the exam.


How to tackle Bühlmann credibility problems

Step 1   Divide the policyholders into sub-classes θ_1, θ_2, θ_3, ...

Step 2   For each sub-class θ, calculate the mean claim cost (or loss ratio,
         aggregate claim, etc.) μ(θ) = E(X | θ); calculate the variance of the
         claim cost Var(X | θ).

Step 3   Calculate EV = E[Var(X | Θ)], the average variance for all sub-classes
         combined. Calculate VE = Var[E(X | Θ)], the variance of the average
         claim for all sub-classes combined.

Step 4   Calculate k = EV / VE and Z = n / (n + k).

Step 5   Calculate μ = E[E(X | Θ)], the average claim cost for all sub-classes
         combined. This is the uniform group premium rate you would charge
         under the classical theory of insurance.

Step 6   Calculate the sample mean of the past claim data, X̄ = (1/n) Σ_{i=1}^n X_i.

Step 7   Calculate the Bühlmann credibility premium Z X̄ + (1 − Z) μ. This is the
         weighted average of the sample mean and the uniform group rate.
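For the common exam setting of finitely many sub-classes, the seven steps condense into a few lines of code. This is an illustrative sketch of my own; the function name and the example numbers are made up.

```python
def buhlmann_premium(classes, x_bar, n):
    """classes: list of (P(theta), E(X|theta), Var(X|theta)) tuples.
    x_bar: sample mean of the n past observations."""
    mu = sum(p * m for p, m, _ in classes)                  # Step 5: global mean
    ev = sum(p * v for p, _, v in classes)                  # Step 3: E[Var(X|Theta)]
    ve = sum(p * m * m for p, m, _ in classes) - mu ** 2    # Step 3: Var[E(X|Theta)]
    k = ev / ve                                             # Step 4
    z = n / (n + k)
    return z * x_bar + (1 - z) * mu                         # Step 7

# Example with two made-up sub-classes, three years of data averaging 4 claims:
premium = buhlmann_premium([(0.3, 2.0, 1.5), (0.7, 5.0, 2.0)], x_bar=4.0, n=3)
```

Note that when X̄ happens to equal the global mean μ, the premium is μ regardless of Z.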

An example illustrating how to calculate the Bühlmann credibility premium

(Nov 2003 #23)

You are given:

Two risks have the following severity distributions:

Amount of Claim    Probability of Claim      Probability of Claim
                   Amount for Risk 1         Amount for Risk 2
   250                   0.5                       0.7
 2,500                   0.3                       0.2
60,000                   0.2                       0.1

Risk 1 is twice as likely to be observed as Risk 2.

A claim of 250 is observed.
Determine the Bühlmann credibility estimate of the second claim amount from the same
risk.

Solution

This is a typical problem for Exam C. Here policyholders are from two risk classes. Even
though the problem doesn't say that Risk 1 and Risk 2 are two sub-risks of a similar
bigger risk group (i.e. a seemingly homogeneous group), we should assume so. Otherwise, the
Bühlmann credibility formula won't work. Remember that the Bühlmann credibility premium
is the weighted average of the uniform group rate μ and the risk-specific sample mean
X̄. If Risk 1 and Risk 2 are not sub-risks of a homogeneous group, then the uniform
group rate μ doesn't exist; we have no way of calculating Z X̄ + (1 − Z) μ.

The problem says that a claim of 250 is observed. This means that a policyholder of an
unknown sub-class has incurred a claim of X_1 = $250. Since Risk 1 is twice as likely as
Risk 2, the $250 claim has a 2/3 chance of coming from Risk 1 and a 1/3 chance of
coming from Risk 2. The question asks us to estimate the next claim amount X_2 incurred
by the same policyholder.

Amount of Claim    Probability of Claim      Probability of Claim
                   Amount for Risk 1         Amount for Risk 2
   250                   0.5                       0.7
 2,500                   0.3                       0.2
60,000                   0.2                       0.1

Let X represent the dollar amount of a randomly chosen claim.

E(X | risk 1) = 250(0.5) + 2,500(0.3) + 60,000(0.2) = 12,875

E(X | risk 2) = 250(0.7) + 2,500(0.2) + 60,000(0.1) = 6,675

The uniform group rate is:

μ = E(X) = P(X from risk 1) E(X | risk 1) + P(X from risk 2) E(X | risk 2)
  = (2/3)(12,875) + (1/3)(6,675) = 10,808.33

The variance of the conditional mean is:

VE = P(X from risk 1) E²(X | risk 1) + P(X from risk 2) E²(X | risk 2) − μ²
   = (2/3)(12,875²) + (1/3)(6,675²) − 10,808.33² = 8,542,222.22

E(X² | risk 1) = 250²(0.5) + 2,500²(0.3) + 60,000²(0.2) = 721,906,250

E(X² | risk 2) = 250²(0.7) + 2,500²(0.2) + 60,000²(0.1) = 361,293,750

Var(X | risk 1) = E(X² | risk 1) − E²(X | risk 1) = 721,906,250 − 12,875² = 556,140,625

Var(X | risk 2) = E(X² | risk 2) − E²(X | risk 2) = 361,293,750 − 6,675² = 316,738,125

The average conditional variance is:

EV = P(X from risk 1) Var(X | risk 1) + P(X from risk 2) Var(X | risk 2)
   = (2/3)(556,140,625) + (1/3)(316,738,125) = 476,339,791.67

k = EV / VE = 476,339,791.67 / 8,542,222.22 = 55.76

Z = n / (n + k) = 1 / (1 + 55.76) = 1.76%

P = Z X̄ + (1 − Z) μ = 1.76%(250) + (1 − 1.76%)(10,808.33) = 10,622.50
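The arithmetic above is easy to double-check in code. This is my own verification, not part of the original solution; the small difference from 10,622.50 comes from the book rounding Z to 1.76%.

```python
amounts = [250, 2_500, 60_000]
p1 = [0.5, 0.3, 0.2]            # severity distribution for Risk 1
p2 = [0.7, 0.2, 0.1]            # severity distribution for Risk 2
w1, w2 = 2 / 3, 1 / 3           # Risk 1 is twice as likely as Risk 2

def mean(p):
    return sum(a * q for a, q in zip(amounts, p))

def var(p):
    return sum(a * a * q for a, q in zip(amounts, p)) - mean(p) ** 2

mu = w1 * mean(p1) + w2 * mean(p2)                       # global mean, 10,808.33
ve = w1 * mean(p1) ** 2 + w2 * mean(p2) ** 2 - mu ** 2   # 8,542,222.22
ev = w1 * var(p1) + w2 * var(p2)                         # 476,339,791.67
k = ev / ve                                              # 55.76
z = 1 / (1 + k)                                          # n = 1 observed claim
premium = z * 250 + (1 - z) * mu                         # approx. 10,622
```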

Next, I want to emphasize an important point.

In the Bühlmann credibility premium formula, what matters is the sample mean

X̄ = (1/n)(X_1 + X_2 + ... + X_n), not the individual claims data X_1, X_2, ..., X_n.

For example, for n = 3, (X_1, X_2, X_3) = (0, 3, 6), (X_1, X_2, X_3) = (1, 7, 1), and
(X_1, X_2, X_3) = (2, 2, 5) all have the same X̄ = 3 and will produce the same renewal
premium P = Z X̄ + (1 − Z) μ = 3Z + (1 − Z) μ.



Shortcut
We can rewrite the Bühlmann credibility premium formula as

P = Z X̄ + (1 − Z) μ = [n / (n + k)] X̄ + [k / (n + k)] μ = (k μ + n X̄) / (n + k)
  = (k μ + Σ_{i=1}^n X_i) / (n + k)

We can interpret k = EV / VE as a number of samples taken from the global mean μ.
Imagine we have two urns, A and B. Urn A contains an infinite number of identical balls,
each marked with the number μ. Urn B contains an infinite number of identical balls,
each marked with the number X̄. You take k balls from Urn A and n balls from Urn B.

Then the average value per ball is P = (k μ + n X̄) / (n + k) = (k μ + Σ_{i=1}^n X_i) / (n + k).

This is the renewal premium for year n + 1.

Practice problems

Q1   You are an actuary working on group health insurance pricing. You want to use the
Bühlmann credibility premium formula P = Z X̄ + (1 − Z) μ to set the renewal premium
rate for a policy. One day the vice president of your company stops by. He has a Ph.D.
in statistics and is widely regarded as an expert on the central limit theorem. He
asks you to throw the formula P = Z X̄ + (1 − Z) μ into the trash can and focus on μ.
"All we care about is μ. As long as we charge each policyholder μ, we'll be okay," the
vice president says. "The fundamental concept of insurance is that many people form a
group to share the risk. If we charge μ, the law of large numbers will work its magic and
we'll be able to collect enough premiums to pay our guarantees."

Comment on the vice president's remarks.

Solution

According to the law of large numbers, for a homogeneous group of policyholders, we
can set the premium rate equal to the average claim cost μ = E(X). Some policyholders
will suffer losses greater than E(X), while others will suffer losses less than E(X).
On average, however, the insurance company will have collected just enough premiums to
offset the losses. As long as each policyholder pays μ, the insurer will stay solvent.

In practice, however, insurance companies can't charge everyone μ. Members of a so-called
homogeneous risk group are really different risks. Policyholders of different risks can shop
around and compare premium rates. If any policyholder believes that his premium is too
high, he can terminate his policy and buy cheaper insurance elsewhere.

If an insurer charges μ to similar yet different risks, good risks will stop doing business
with the insurer and buy cheaper insurance elsewhere; only bad risks will remain in the
insurer's book of business. As more and more good risks leave the insurer's book of
business, the actual expected claim cost will exceed the original average premium rate μ.
Then the insurer has to increase μ, causing more policyholders to terminate their
policies. Gradually, the insurer's customer base will shrink and the insurer will go
bankrupt.

Q2   Compare and contrast the classical theory of insurance and the credibility theory
of insurance.

Solution


The classical theory vs. the credibility theory:

Is there a homogeneous group?
    Classical theory: Yes. This is the foundation of insurance. Identical risks form a
    homogeneous group to share risks.
    Credibility theory: Each member of a seemingly homogeneous group belongs to a
    sub-class. The insurer doesn't know who belongs to which sub-class.

Are the claim random variables X of different members of a group independent
identically distributed?
    Classical theory: Yes. Since each member of a homogeneous group has identical
    risk, each member's claim random variable is independent identically distributed
    at all times.
    Credibility theory: No. Since members of a similar risk group are actually of
    different sub-risk classes, only claims incurred by the same sub-class are
    independent identically distributed.

What's the fair premium rate?
    Classical theory: The fair premium is E(X) = μ, where X is the random loss
    variable of any policyholder in the homogeneous group. Every member of a
    homogeneous group needs to pay μ, the uniform group pure premium rate.
    Credibility theory: The fair premium is E(X | Θ = θ) = μ(θ), the mean claim cost
    of the sub-class θ. Every member of the same sub-class θ needs to pay μ(θ).

Q3   One day you visited your college statistics professor. He asked what you were
doing in your job. You told him that you used the Bühlmann credibility premium formula
to set the renewal premium for group health insurance policies. The Bühlmann credibility
theory was new to the professor. After listening to your explanation of the formula
P = Z X̄ + (1 − Z) μ, he looked puzzled. He told you that for 20 years he had been telling
his students that X̄ is the unbiased estimator of E(X). "I don't get it. Why don't you
just set P = X̄?"

Explain why it's not a good idea to set P = X̄.

Solution

Your stats professor is perfectly correct in saying that the sample mean is an unbiased
estimator of the population mean. If the number of observations n is large (so we have
observed many claims X_1, X_2, ..., X_n), then for any policyholder, setting his renewal
premium equal to his average past claim X̄ is a good idea.

In reality, however, it's hard to implement the idea P = X̄. Often you, as an insurer,
have to set the renewal premium with limited data (so n may be small). For a small n,
X̄ may not be a good estimate of E(X). In addition, we may have a weird situation
where X̄ = 0. In our taxi driver insurance example, if you use P = X̄ to set the renewal
premium for Adam, you'll get P = 0. This clearly doesn't make any sense.

Q4   Nov 2005 #26

For each policyholder, losses X_1, X_2, ..., X_n, conditional on Θ, are independent
identically distributed with mean

μ(θ) = E(X_j | Θ = θ),  j = 1, 2, ..., n

and variance

v(θ) = Var(X_j | Θ = θ),  j = 1, 2, ..., n

You are given:

The Bühlmann credibility factor assigned for estimating X_5 based on X_1, X_2,
X_3, X_4 is Z = 0.4.

The expected value of the process variance is known to be 8.

Calculate Cov(X_i, X_j), i ≠ j.

Solution

Z = Cov(X̄, X_{n+1}) / Var(X̄) = Var[μ(Θ)] / Var(X̄) = VE / (VE + EV/n)

We are told that n = 4 (we have four years of claim data), Z = 0.4, and EV = 8.

0.4 = VE / (VE + 8/4) = VE / (VE + 2),   VE = 1.33

So Cov(X_i, X_j) = Cov(X̄, X_{n+1}) = Var[μ(Θ)] = VE = 1.33
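The algebra here can be confirmed with exact arithmetic (my own check, using Python's fractions module):

```python
from fractions import Fraction

n, Z, EV = 4, Fraction(2, 5), 8       # given: n = 4, Z = 0.4, EV = 8

# Z = VE / (VE + EV/n)  =>  VE = Z * (EV/n) / (1 - Z)
VE = Z * Fraction(EV, n) / (1 - Z)    # exactly 4/3, i.e. 1.33
```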


Q5   Nov 2005 #19
For a portfolio of independent risks, the number of claims for each risk in a year follows
a Poisson distribution with means given in the following table:

Class    Mean # of claims per risk    # of risks
  1                  1                    900
  2                 10                     90
  3                 20                     10

You observe x claims in Year 1 for a randomly selected risk.

The Bühlmann credibility estimate of the number of claims for the same risk in Year 2 is
11.983. Determine x.

Solution

The problem states that x claims in Year 1 have been observed for a randomly selected
risk. The wording "a randomly selected risk" is needed because in order for the
Bühlmann credibility formula to work, the risk class must be unknown to us. If we
already knew the risk class, we could calculate the expected number of claims in Year 2
directly; we wouldn't need to estimate it any more.

Please also pay attention to the wording "the Bühlmann credibility estimate of the
number of claims for the same risk in Year 2 is ..." In order for the Bühlmann credibility
formula to work, the renewal premium (or the expected number of claims in this
problem) in year n + 1 and the prior n years of claims X_1, X_2, ..., X_n must refer to the
same (unknown) risk class.

And now back to the problem. Let Y represent the number of claims incurred in a year
by a randomly chosen risk. Since Y | Θ = θ is a Poisson random variable,
E(Y | θ) = Var(Y | θ).

Class    θ = E(Y | θ) = Var(Y | θ)    P(Θ = θ)    # of risks
  1                  1                  90%           900
  2                 10                   9%            90
  3                 20                   1%            10
Total                                  100%         1,000

The global mean (using the double expectation theorem):

μ = E(Y) = E[E(Y | Θ)] = Σ P(θ) E(Y | θ) = 1(90%) + 10(9%) + 20(1%) = 2

The average conditional variance:

EV = Σ P(θ) Var(Y | θ) = 1(90%) + 10(9%) + 20(1%) = 2

The variance of the conditional means:

VE = Var[E(Y | Θ)] = E{[E(Y | Θ)]²} − {E[E(Y | Θ)]}²
   = 1²(90%) + 10²(9%) + 20²(1%) − 2² = 9.9

Alternatively, VE = Var[E(Y | Θ)] = E{[E(Y | Θ) − μ]²}
   = (1 − 2)²(90%) + (10 − 2)²(9%) + (20 − 2)²(1%) = 9.9

k = EV / VE = 2 / 9.9

P = (Σ Y_i + k μ) / (n + k) = [x + (2/9.9)(2)] / (1 + 2/9.9) = 11.983,   x = 14
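Inverting P = (x + kμ)/(1 + k) recovers x numerically (my own check, not part of the original solution):

```python
classes = [(900, 1), (90, 10), (10, 20)]    # (# of risks, Poisson mean)
total = sum(nr for nr, _ in classes)

mu = sum(nr / total * lam for nr, lam in classes)                  # global mean = 2
ev = mu                                                            # Poisson: Var = mean
ve = sum(nr / total * lam ** 2 for nr, lam in classes) - mu ** 2   # 9.9
k = ev / ve

x = 11.983 * (1 + k) - k * mu      # invert P = (x + k*mu) / (1 + k)
```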

Q6   Nov 2005 #7
For a portfolio of policies, you are given:
The annual claim amount on a policy has probability density function

f(x | θ) = 2x / θ²,   0 < x < θ

The prior distribution of θ has density function:

π(θ) = 4θ³,   0 < θ < 1

A randomly selected policy had claim amount 0.1 in Year 1.

Determine the Bühlmann credibility estimate of the claim amount for the selected risk in
Year 2.

Solution

The conditional mean is:

E(X | θ) = ∫_0^θ x f(x | θ) dx = ∫_0^θ x (2x/θ²) dx = (2/θ²) ∫_0^θ x² dx = (2/θ²)(θ³/3) = 2θ/3

Please don't write:

E(X | θ) = ∫_0^∞ x f(x | θ) dθ     Wrong!

E(X | θ) is the expected value of X when we fix the random variable Θ = θ. So θ should
be treated as a constant, and dθ = 0. The correct calculation is to integrate x f(x | θ)
with respect to x, not θ.

Conditional variance: Var(X | θ) = E(X² | θ) − E²(X | θ)

E(X² | θ) = ∫_0^θ x² f(x | θ) dx = ∫_0^θ x² (2x/θ²) dx = (2/θ²)(θ⁴/4) = θ²/2

Var(X | θ) = E(X² | θ) − E²(X | θ) = θ²/2 − (2θ/3)² = θ²/18

The global mean: μ = E(X) = E[E(X | Θ)] = E(2Θ/3) = (2/3) E(Θ)

The expected conditional variance: EV = E[Var(X | Θ)] = E(Θ²/18) = (1/18) E(Θ²)

The variance of the conditional mean: VE = Var[E(X | Θ)] = Var(2Θ/3) = (4/9) Var(Θ)

E(Θ) = ∫_0^1 θ π(θ) dθ = ∫_0^1 θ (4θ³) dθ = ∫_0^1 4θ⁴ dθ = 4/5

E(Θ²) = ∫_0^1 θ² π(θ) dθ = ∫_0^1 θ² (4θ³) dθ = ∫_0^1 4θ⁵ dθ = 4/6 = 2/3

Var(Θ) = E(Θ²) − E²(Θ) = 2/3 − (4/5)² = 2/75

EV = (1/18)(2/3) = 1/27,   VE = (4/9)(2/75) = 8/675

μ = (2/3)(4/5) = 8/15,   k = EV / VE = (1/27) / (8/675) = 3.125


The fractions above are messy. We don't want to bother expressing k as a neat fraction;
trying to express k as a neat fraction is prone to errors.

P = (k μ + Σ_{i=1}^n X_i) / (n + k) = (k μ + X_1) / (1 + k) = [3.125(8/15) + 0.1] / (1 + 3.125) = 0.428

Alternative method of calculation. This method is more complex.

E(X | θ) = ∫_0^θ x f(x | θ) dx = (2/θ²) ∫_0^θ x² dx = 2θ/3 (as before)

μ = E(X) = E[E(X | Θ)] = ∫_0^1 E(X | θ) π(θ) dθ = ∫_0^1 (2θ/3)(4θ³) dθ = (8/3)(1/5) = 8/15

VE = Var[E(X | Θ)] = E{[E(X | Θ)]²} − {E[E(X | Θ)]}²

E{[E(X | Θ)]²} = ∫_0^1 [E(X | θ)]² π(θ) dθ = ∫_0^1 (2θ/3)²(4θ³) dθ = (16/9) ∫_0^1 θ⁵ dθ
             = (16/9)(1/6) = 8/27

VE = 8/27 − (8/15)² = 0.01185

EV = E[Var(X | Θ)],   Var(X | θ) = E(X² | θ) − E²(X | θ)

E(X² | θ) = ∫_0^θ x² f(x | θ) dx = (2/θ²) ∫_0^θ x³ dx = (2/θ²)(θ⁴/4) = θ²/2

Var(X | θ) = θ²/2 − (2θ/3)² = θ²/18

EV = ∫_0^1 Var(X | θ) π(θ) dθ = ∫_0^1 (θ²/18)(4θ³) dθ = (4/18)(1/6) = 1/27

k = EV / VE = (1/27) / 0.01185 = 3.125

P = (k μ + Σ_{i=1}^n X_i) / (n + k) = (k μ + X_1) / (1 + k) = [3.125(8/15) + 0.1] / (1 + 3.125) = 0.428

Q7   May 2005, #20

For a particular policy, the conditional probability of the annual number of claims given
Θ = θ, and the probability distribution of Θ, are as follows:

# of claims      0      1        2
Probability     2θ      θ     1 − 3θ

   θ           0.05    0.30
Probability    0.80    0.20

Two claims are observed in Year 1.

Calculate the Bühlmann credibility estimate of the number of claims in Year 2.

Solution

Let X represent the annual number of claims.

     X           0      1        2
Probability     2θ      θ     1 − 3θ

E(X | θ) = 0(2θ) + 1(θ) + 2(1 − 3θ) = 2 − 5θ

E(X² | θ) = 0²(2θ) + 1²(θ) + 2²(1 − 3θ) = 4 − 11θ

Var(X | θ) = 4 − 11θ − (2 − 5θ)² = 9θ − 25θ²

μ = E[E(X | Θ)] = E(2 − 5Θ) = 2 − 5 E(Θ)
VE = Var[E(X | Θ)] = Var(2 − 5Θ) = Var(5Θ) = 25 Var(Θ)
EV = E[Var(X | Θ)] = E(9Θ − 25Θ²) = 9 E(Θ) − 25 E(Θ²)

   θ           0.05    0.30
Probability    0.80    0.20

E(Θ) = 0.05(0.8) + 0.3(0.2) = 0.1

E(Θ²) = 0.05²(0.8) + 0.3²(0.2) = 0.02

Var(Θ) = 0.02 − 0.1² = 0.01

μ = 2 − 5 E(Θ) = 2 − 5(0.1) = 1.5
VE = 25 Var(Θ) = 25(0.01) = 0.25
EV = 9 E(Θ) − 25 E(Θ²) = 9(0.1) − 25(0.02) = 0.4

k = EV / VE = 0.4 / 0.25 = 1.6

P = (k μ + Σ_{i=1}^n X_i) / (n + k) = (k μ + X_1) / (1 + k) = [1.6(1.5) + 2] / (1 + 1.6) = 1.69
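The same numbers fall out of a few lines of code (my own check, not part of the original solution):

```python
thetas = [(0.8, 0.05), (0.2, 0.30)]    # (probability, theta)

E_t = sum(p * t for p, t in thetas)
E_t2 = sum(p * t * t for p, t in thetas)

mu = 2 - 5 * E_t                       # E[2 - 5*Theta] = 1.5
ve = 25 * (E_t2 - E_t ** 2)            # Var[2 - 5*Theta] = 0.25
ev = 9 * E_t - 25 * E_t2               # E[9*Theta - 25*Theta^2] = 0.4
k = ev / ve

P = (k * mu + 2) / (1 + k)             # two claims observed in Year 1
```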

Q8   May 2005, #17

You are given:
The annual number of claims on a given policy has a geometric distribution with
parameter β.
The prior distribution of β has the Pareto density function

π(β) = α / (β + 1)^(α+1),   0 < β < ∞

where α is a known constant greater than 2.

A randomly selected policy has x claims in Year 1.

Calculate the Bühlmann credibility estimate of the number of claims for the selected
policy in Year 2.
Solution

Let X represent the annual number of claims on a randomly selected policy. Here the
risk factor is - . The conditional random variable X - has geometric distribution. If you
look up Tables for Exam C/4, youll find geometric random variable N with parameter
- has mean and variance as follows:

E(N) = - , Var ( N ) = - (1 + - )

Applying the above mean and variance formula, we have:


The conditional mean: E ( X - ) = -
The conditional variance: Var ( X - ) = - (1 + - )

Guo Fall 2009 C, Page 135 / 284


EV = E- Var ( X - ) = E- - (1 + - ) = E- ( - ) + E- ( - 2 ) . Typically, we write E- ( - )
as E ( - ) and E- ( - 2 ) as E ( - 2 ) . So

EV = E- Var ( X - ) = E- - (1 + - ) = E ( - ) + E ( - 2 )
VE = V- E ( X - ) = V ( - )

The global mean is: = E- E ( X - ) = E ( - )

We are told that the prior distribution of β has the Pareto density function

π(β) = α / (β + 1)^(α+1),   0 < β < ∞

Here the phrase "prior distribution" refers to the fact that we know π(β) prior to our
observation of x claims in Year 1. In other words, π(β) hasn't incorporated our
observation of x claims in Year 1 yet. Please note that the prior distribution, not the
posterior distribution, is used in the Bühlmann credibility estimate.

Frankly, I think SOA's emphasis that π(β) is the prior (as opposed to posterior) distribution
is unnecessary and really meant to scare exam candidates. When we talk about a density
function, we always refer to the prior distribution, so there's never a need to say "prior
distribution." If we want to refer to a distribution that has incorporated our recent
observations, only then do we say "posterior distribution."

Back to the problem. We are told that β has a Pareto distribution. Is it a one-parameter
Pareto or a two-parameter Pareto? Many candidates have trouble knowing which one to
use. Here is a simple rule:

To decide whether to use the one-parameter Pareto or the two-parameter Pareto, look at your
random variable X. If X is greater than a positive constant, then use the single-parameter
Pareto. If X is greater than zero, then use the two-parameter Pareto:

If X > θ (a positive constant), then use the single-parameter Pareto f(x) = α θ^α / x^(α+1);

If X > 0, then use the two-parameter Pareto f(x) = α θ^α / (x + θ)^(α+1).

In this problem, the Pareto random variable β > 0. So we should use the two-parameter
Pareto formulas in Tables for Exam C/4.



E(X^k) = θ^k k! / [(α − 1)(α − 2) ... (α − k)]

Please note that the denominator has k factors.

E(X) = θ / (α − 1),   E(X²) = 2θ² / [(α − 1)(α − 2)]

Var(X) = E(X²) − E²(X) = 2θ² / [(α − 1)(α − 2)] − [θ / (α − 1)]² = α θ² / [(α − 1)²(α − 2)]

Since the two-parameter Pareto is frequently tested in Exam C, you might want to
memorize the following formulas:

E(X) = θ / (α − 1),   E(X²) = 2θ² / [(α − 1)(α − 2)],   Var(X) = α θ² / [(α − 1)²(α − 2)]

β is a two-parameter Pareto random variable with pdf π(β) = α / (β + 1)^(α+1). So the two
parameters are α and θ = 1. So we have:

E(β) = 1 / (α − 1),   E(β²) = 2 / [(α − 1)(α − 2)],   Var(β) = α / [(α − 1)²(α − 2)]

EV = E(β) + E(β²) = 1 / (α − 1) + 2 / [(α − 1)(α − 2)] = α / [(α − 1)(α − 2)]

VE = Var(β) = α / [(α − 1)²(α − 2)],   μ = E(β) = 1 / (α − 1)

k = EV / VE = {α / [(α − 1)(α − 2)]} / {α / [(α − 1)²(α − 2)]} = α − 1

P = (k μ + Σ_{i=1}^n X_i) / (n + k) = (k μ + X_1) / (1 + k) = [(α − 1) · 1/(α − 1) + x] / (1 + α − 1)
  = (x + 1) / α


Q9   May 2005 #11
You are given:
The number of claims in a year for a selected risk follows a Poisson distribution
with mean λ.
The severity of claims for the selected risk follows an exponential distribution with
mean θ.
The number of claims is independent of the severity of claims.
The prior distribution of λ is exponential with mean 1.
The prior distribution of θ is Poisson with mean 1.
A priori, λ and θ are independent.

Using the Bühlmann credibility for aggregate losses, determine k.

Let N represent the annual number of claims for a randomly selected risk.
Let X represent the loss dollar amount per loss incident.
Let S represent the aggregate annual claim dollar amount incurred by a risk.

N
Then S = X i = X 1 + X 2 + ... + X N .
i =1

0n
N is a Poisson random variable with mean 0 . So f N ( n 0 ) = e 0
( N = 0,1, 2,... ).
n!
Here 0 is an exponential random variable with pdf f ( 0 ) = e 0 . We have E ( 0 ) = 1 ,
Var ( 0 ) = 12 = 1 , and E ( 0 2 ) = E 2 ( 0 ) + Var ( 0 ) = 12 + 1 = 2

X 1 , X 2 ,, X N are independent identically distributed with a common pdf

f X ( x ) = e x . Here is a Poisson random variable with pdf f ( ) = e


1 1 1
. We
!
have E ( ) = Var ( ) = 1 . Hence E( 2
) = E ( ) + Var ( ) = 1
2 2
+1 = 2 .

Here the risk parameters are ( 0 , ).

E (S ) = E (N ) E ( X ) , Var ( S ) = E ( N ) Var ( X ) + Var ( N ) E 2 ( X )


To remember that you need to use E 2 ( X ) , not E ( X ) , in the Var ( S ) formula, please
note that Var ( S ) is dollar squared. If you use Var ( N ) E ( X ) , youll get dollar, not dollar
squared. As a result, you need to use Var ( N ) E 2 ( X ) .

For a fixed pair of ( 0 , ), the conditional mean is:

Guo Fall 2009 C, Page 138 / 284


E S (0, ) = E(N 0) E( X )=0
The conditional variance is:
Var S ( 0 , ) = E ( N 0 ) Var ( X ) + Var ( N 0 ) E ( X ) = 02 2
+0 2
= 20 2

Please note that N 0 is a Poisson random variable with mean 0 ; X is an exponential


random variable with mean .

{
EV = E0 , Var S ( 0 , ) } = E ( 20 ) . 0,
2

Since 0 and are independent, we have:


EV = E0 , ( 20 2 ) = 2 E ( 0 ) E ( 2 ) = 2 (1)( 2 ) = 4

{E S (0, ) } = Var (0 ) = E 0, (0 ) E 0, (0 )
2
VE = Var 0 ,
2
0,

(0 ) = E 0, (0 2 2 ) = E (0 2 ) E ( ) = 2 ( 2) = 4
2 2
E 0,
E 0, ( 0 ) = E ( 0 ) E ( ) = 1(1) = 1
(0 ) E 0, (0 )
2
VE = E 0 , = 4 12 = 3
2

EV 4
k= =
VE 3
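The moment calculation can be double-checked by summing the Poisson pmf directly (my own check; the exponential moments are plugged in as known constants):

```python
import math

# theta ~ Poisson(1): moments computed from the pmf (the series converges fast)
pmf = lambda t: math.exp(-1) / math.factorial(t)
E_th = sum(t * pmf(t) for t in range(40))
E_th2 = sum(t * t * pmf(t) for t in range(40))

# lambda ~ Exponential(mean 1): E(lambda) = 1, E(lambda^2) = Var + E^2 = 2
E_lam, E_lam2 = 1.0, 2.0

ev = 2 * E_lam * E_th2                        # E[2*lambda*theta^2] = 4
ve = E_lam2 * E_th2 - (E_lam * E_th) ** 2     # Var(lambda*theta) = 3
k = ev / ve                                   # 4/3
```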

Q10   May 2005 #6

You are given:
Claims are conditionally independent and identically Poisson distributed with
mean λ.

The prior distribution of λ is:

F(λ) = 1 − [1 / (1 + λ)]^2.6,   λ > 0

Five claims are observed.

Determine the Bühlmann credibility factor.

Solution

Let X represent the number of claims. The risk factor is λ. We are told that X | λ is a
Poisson random variable with mean λ.

The conditional mean is: E(X | λ) = λ
The conditional variance is: Var(X | λ) = λ

EV = E[Var(X | λ)] = E(λ),   VE = Var[E(X | λ)] = Var(λ)

k = EV / VE = E(λ) / Var(λ)

To quickly calculate E(λ) and Var(λ), you'll need to recognize that:

the cdf of a two-parameter Pareto random variable is F(x) = 1 − [θ / (x + θ)]^α, where x > 0;

E(X) = θ / (α − 1),   E(X²) = 2θ² / [(α − 1)(α − 2)],   Var(X) = α θ² / [(α − 1)²(α − 2)]

Here we are given F(λ) = 1 − [1 / (λ + 1)]^2.6. So λ is a two-parameter Pareto random
variable with parameters θ = 1 and α = 2.6.

So E(λ) = 1 / (2.6 − 1) = 1/1.6 and Var(λ) = 2.6 / [(2.6 − 1)²(2.6 − 2)] = 2.6 / [1.6²(0.6)]

k = E(λ) / Var(λ) = (1/1.6) / {2.6 / [1.6²(0.6)]} = 1.6(0.6) / 2.6 = 0.369

Z = n / (n + k) = 5 / (5 + 0.369) = 0.93
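A quick numeric check with the two-parameter Pareto moment formulas (my own verification):

```python
alpha, theta = 2.6, 1.0        # lambda ~ two-parameter Pareto(alpha, theta)

E_lam = theta / (alpha - 1)
Var_lam = alpha * theta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))

k = E_lam / Var_lam            # EV / VE = E(lambda) / Var(lambda) for Poisson claims
Z = 5 / (5 + k)                # five observations
```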

Q11   Nov 2004 #29

You are given:
Claim counts follow a Poisson distribution with mean λ.
Claim sizes follow a lognormal distribution with parameters μ and σ.
Claim counts and claim amounts are independent.
The prior distribution has joint pdf:

f(λ, μ, σ) = 2σ,   0 < λ < 1,  0 < μ < 1,  0 < σ < 1

Calculate Bühlmann's credibility k for aggregate losses.

Solution

Let N represent the claim count, X_i the dollar amount of the i-th claim, and S the
aggregate losses. N | λ has a Poisson distribution with mean λ. X_i | μ, σ has a lognormal
distribution with parameters μ and σ. In addition, for i = 1 to N, the X_i | μ, σ are
independent identically distributed.

The aggregate loss is:

S = Σ_{i=1}^N X_i

E(S) = E(N) E(X),   Var(S) = E(N) Var(X) + Var(N) E²(X)

The risk parameters are λ, μ, and σ. If we fix λ, μ, and σ, then

E(S | λ, μ, σ) = E(N | λ) E(X | μ, σ) = λ E(X | μ, σ)

Var(S | λ, μ, σ) = E(N | λ) Var(X | μ, σ) + Var(N | λ) E²(X | μ, σ)
                = λ Var(X | μ, σ) + λ E²(X | μ, σ)
                = λ E(X² | μ, σ)

From Tables for Exam C/4, the lognormal distribution has the following moments:

E(X^k) = exp(kμ + k²σ²/2),   so   E(X) = exp(μ + σ²/2),   E(X²) = exp(2μ + 2σ²)

E(S | λ, μ, σ) = λ E(X | μ, σ) = λ exp(μ + σ²/2)

Var(S | λ, μ, σ) = λ E(X² | μ, σ) = λ exp(2μ + 2σ²)

The global mean (written μ_S to avoid confusion with the lognormal parameter μ) is:

μ_S = E[E(S | λ, μ, σ)] = E[λ exp(μ + σ²/2)]
    = ∫_0^1 ∫_0^1 ∫_0^1 λ exp(μ + σ²/2) (2σ) dλ dμ dσ
    = (∫_0^1 λ dλ) (∫_0^1 e^μ dμ) (∫_0^1 2σ exp(σ²/2) dσ)

∫_0^1 λ dλ = 1/2,   ∫_0^1 e^μ dμ = e − 1

For the last integral, set σ²/2 = y, so σ dσ = dy:

∫_0^1 2σ exp(σ²/2) dσ = 2 ∫_0^{0.5} e^y dy = 2(e^{0.5} − 1)

μ_S = (1/2)(e − 1)(2)(e^{0.5} − 1) = (e − 1)(e^{0.5} − 1)

VE = Var[E(S | λ, μ, σ)] = E{[E(S | λ, μ, σ)]²} − {E[E(S | λ, μ, σ)]}²

E{[E(S | λ, μ, σ)]²} = E[λ² exp(2μ + σ²)]
    = (∫_0^1 λ² dλ) (∫_0^1 e^{2μ} dμ) (∫_0^1 2σ exp(σ²) dσ)

∫_0^1 λ² dλ = 1/3,   ∫_0^1 e^{2μ} dμ = (e² − 1)/2

Set σ² = y, so 2σ dσ = dy:

∫_0^1 2σ exp(σ²) dσ = ∫_0^1 e^y dy = e − 1

E{[E(S | λ, μ, σ)]²} = (1/3) [(e² − 1)/2] (e − 1) = (1/6)(e² − 1)(e − 1)

{E[E(S | λ, μ, σ)]}² = μ_S² = (e − 1)²(e^{0.5} − 1)²

VE = (1/6)(e² − 1)(e − 1) − (e − 1)²(e^{0.5} − 1)² = 0.5872

EV = E[Var(S | λ, μ, σ)] = E[λ exp(2μ + 2σ²)]
    = (∫_0^1 λ dλ) (∫_0^1 e^{2μ} dμ) (∫_0^1 2σ exp(2σ²) dσ)

Set 2σ² = y, so 4σ dσ = dy:

∫_0^1 2σ exp(2σ²) dσ = (1/2) ∫_0^2 e^y dy = (e² − 1)/2

EV = (1/2) [(e² − 1)/2] [(e² − 1)/2] = (e² − 1)²/8 = 5.103

k = EV / VE = 5.103 / 0.5872 = 8.69
VE 0.5872

Shortcut to avoid the hard-core integration seen above.

The joint pdf is f(λ, μ, σ) = 2σ = a(λ) b(μ) c(σ), where a(λ) = 1, b(μ) = 1, and
c(σ) = 2σ. In addition, λ, μ, and σ lie in the cube 0 < λ < 1, 0 < μ < 1, 0 < σ < 1.

Consequently, λ, μ, and σ are independent random variables with the following
marginal pdfs:

f_λ(λ) = 1,   0 < λ < 1;
f_μ(μ) = 1,   0 < μ < 1;
f_σ(σ) = 2σ,   0 < σ < 1.

The global mean (μ_S denotes the global mean) is:

μ_S = E[λ exp(μ + σ²/2)] = E(λ) E(e^μ) E(e^{σ²/2})

E(λ) = ∫_0^1 λ dλ = 1/2,   E(e^μ) = ∫_0^1 e^μ dμ = e − 1,
E(e^{σ²/2}) = ∫_0^1 e^{σ²/2} (2σ) dσ = 2(e^{0.5} − 1)

μ_S = (1/2)(e − 1)(2)(e^{0.5} − 1) = (e − 1)(e^{0.5} − 1)

EV = E[λ exp(2μ + 2σ²)] = E(λ) E(e^{2μ}) E(e^{2σ²})

E(e^{2μ}) = ∫_0^1 e^{2μ} dμ = (e² − 1)/2

E(e^{2σ²}) = ∫_0^1 e^{2σ²} (2σ) dσ = (e² − 1)/2

EV = (1/2) [(e² − 1)/2] [(e² − 1)/2] = (e² − 1)²/8 = 5.103

VE = E{[E(S | λ, μ, σ)]²} − {E[E(S | λ, μ, σ)]}²

E{[E(S | λ, μ, σ)]²} = E[λ² exp(2μ + σ²)] = E(λ²) E(e^{2μ}) E(e^{σ²})

E(λ²) = ∫_0^1 λ² dλ = 1/3,   E(e^{σ²}) = ∫_0^1 e^{σ²} (2σ) dσ = e − 1

VE = (1/3) [(e² − 1)/2] (e − 1) − (e − 1)²(e^{0.5} − 1)² = 0.5872

k = EV / VE = 5.103 / 0.5872 = 8.69



Please note
The joint pdf f(λ, μ, σ) = a(λ) b(μ) c(σ) alone doesn't guarantee that λ, μ, and σ are
independent. The additional requirement for λ, μ, and σ to be independent is that λ, μ,
and σ lie in a cube A < λ < B, C < μ < D, E < σ < F, where A, B, C, D, E, and F are
constants. For example, suppose A < λ < B, C < μ < D, and e(λ) < σ < f(λ). Then even if
f(λ, μ, σ) = a(λ) b(μ) c(σ), the variables λ, μ, and σ are NOT independent.

Q12   Nov 2004 #25

You are given:
A portfolio of independent risks is divided into two classes.
Each class contains the same number of risks.
For each risk in Class 1, the number of claims per year follows a Poisson
distribution with mean 5.
For each risk in Class 2, the number of claims per year follows a binomial
distribution with parameters m = 8 and q = 0.55.
A randomly selected risk has three claims in Year 1, r claims in Year 2, and four
claims in Year 3.

The Bühlmann credibility estimate for the number of claims in Year 4 for this risk is
4.6019.

Determine r.

Solution

Risk    P(Risk)    X | Risk                          E(X | Risk)       Var(X | Risk)
#1        0.5      Poisson with mean 5                    5                  5
#2        0.5      Binomial, m = 8, q = 0.55        8(0.55) = 4.4    8(0.55)(0.45) = 1.98

μ = (1/2)(5 + 4.4) = 4.7,   EV = (1/2)(5 + 1.98) = 3.49

VE = (5 − 4.4)²(0.5)(0.5) = 0.09

k = EV / VE = 3.49 / 0.09 = 38.78,   P = (k μ + Σ_{i=1}^n X_i) / (n + k)

4.6019 = [38.78(4.7) + (3 + r + 4)] / (3 + 38.78),   r = 3
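Solving the premium equation for r confirms the answer (my own check, not part of the original solution):

```python
mu = (5 + 4.4) / 2                    # equally likely classes
ev = (5 + 1.98) / 2
ve = 0.5 * 0.5 * (5 - 4.4) ** 2       # variance of the two class means
k = ev / ve                           # 38.78

# 4.6019 = (k*mu + 3 + r + 4) / (3 + k)  =>  solve for r
r = 4.6019 * (3 + k) - k * mu - 7
```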



Q13   Nov 2001 #23
You are given the following information on claim frequency of auto accidents for
individual drivers:

              Business Use                       Pleasure Use
        Expected claims  Claim variance    Expected claims  Claim variance
Rural        1.0              0.5               1.5              0.8
Urban        2.0              1.0               2.5              1.0
Total        1.8              1.06              2.3              1.12

You are given:

Each driver's claim experience is independent of every other driver's.
There are an equal number of business and pleasure use drivers.

Determine the Bühlmann credibility factor for a single driver.

Solution

The key to solving this problem is correctly identifying the risk classes. There are four
risk classes:

θ = (BR, BU, PR, PU)

BR = Business & Rural Use,   BU = Business & Urban Use
PR = Pleasure & Rural Use,   PU = Pleasure & Urban Use

Next, we need to calculate the probability of Rural Use and Urban Use.

         Expected claims
Rural         1.0
Urban         2.0
Total         1.8

P(R)(1.0) + P(U)(2.0) = 1.8;   P(R) + P(U) = 1

Solving these two equations, we get P(R) = 0.2 and P(U) = 0.8.

Next, we list the probability of each class:

                Business Use 0.5            Pleasure Use 0.5
Rural 0.2    P(BR) = 0.2(0.5) = 0.1     P(PR) = 0.2(0.5) = 0.1
Urban 0.8    P(BU) = 0.8(0.5) = 0.4     P(PU) = 0.8(0.5) = 0.4


Let X represent the claim frequency of auto accidents of a randomly selected driver.

θ      E(X | θ)   Var(X | θ)   P(θ)   E²(X | θ)
BR     1.0        0.5          0.1    1.0
BU     2.0        1.0          0.4    4.0
PR     1.5        0.8          0.1    2.25
PU     2.5        1.0          0.4    6.25

μ = E[E(X | θ)] = 1.0(0.1) + 2.0(0.4) + 1.5(0.1) + 2.5(0.4) = 2.05

EV = E[Var(X | θ)] = 0.5(0.1) + 1.0(0.4) + 0.8(0.1) + 1.0(0.4) = 0.93

VE = Var[E(X | θ)] = E[E²(X | θ)] - {E[E(X | θ)]}²

E[E²(X | θ)] = 1.0(0.1) + 4.0(0.4) + 2.25(0.1) + 6.25(0.4) = 4.425

VE = 4.425 - 2.05² = 0.2225

k = EV/VE = 0.93/0.2225 = 4.18

The Bühlmann credibility factor for a single driver is:

Z = n/(n + k) = 1/(1 + 4.18) = 0.193
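A quick numerical check of the four-class calculation above:

```python
# (probability, hypothetical mean, process variance) for each risk class
classes = {
    'BR': (0.1, 1.0, 0.5),
    'BU': (0.4, 2.0, 1.0),
    'PR': (0.1, 1.5, 0.8),
    'PU': (0.4, 2.5, 1.0),
}
mu = sum(p * m for p, m, v in classes.values())             # 2.05
EV = sum(p * v for p, m, v in classes.values())             # 0.93
VE = sum(p * m ** 2 for p, m, v in classes.values()) - mu ** 2  # 0.2225

k = EV / VE
Z = 1 / (1 + k)      # one year of observation: Z = n/(n+k) with n = 1
print(round(Z, 3))   # 0.193
```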



Chapter 6   Bühlmann-Straub credibility model
In the Bühlmann credibility model, we focus on one policyholder. We know that this
policyholder has incurred claim amounts X_1, X_2, ..., X_n in Years 1, 2, ..., n
respectively. We want to estimate his conditional mean claim amount in Year n + 1:

E(X_{n+1} | X_1, X_2, ..., X_n)

We find that the conditional mean claim is E(X_{n+1} | X_1, X_2, ..., X_n) ≈ (1 - Z)μ + Z X̄.

Now we move from the Bühlmann credibility world to a more complex one, the Bühlmann-
Straub credibility world. Instead of looking at only one policyholder, we look at a group
of policyholders.

Context of the Bühlmann-Straub credibility model

In Year 1, there are m_1 policyholders. The 1st policyholder has incurred X(1, t=1) claim.
The 2nd policyholder has incurred X(2, t=1) claim. And the m_1-th policyholder has
incurred X(m_1, t=1) claim dollar amount.

In Year 2, there are m_2 policyholders. The 1st policyholder has incurred X(1, t=2)
claim. The 2nd policyholder has incurred X(2, t=2) claim. And the m_2-th
policyholder has incurred X(m_2, t=2) claim amount.

In Year t, there are m_t policyholders. The 1st policyholder has incurred X(1, t) claim.
The 2nd policyholder has incurred X(2, t) claim. And the m_t-th policyholder has
incurred X(m_t, t) claim amount.

In Year n, there are m_n policyholders. The 1st policyholder has incurred X(1, t=n)
claim. The 2nd policyholder has incurred X(2, t=n) claim. And the m_n-th
policyholder has incurred X(m_n, t=n) claim amount.

In Year n + 1, there are m_{n+1} policyholders.

Question: In Year n + 1, how much renewal premium should each of the m_{n+1}
policyholders pay?



Assumptions of the Bühlmann-Straub credibility model
All the observed policyholders belong to the same sub-risk class θ. That is, the m_1
policyholders in Year 1, the m_2 policyholders in Year 2, ..., the m_n policyholders in Year n, and
the m_{n+1} policyholders in Year n + 1 all belong to the same sub-risk θ.

We don't know the specific value of θ. All we know is that θ takes on a random value
from Θ = {θ_1, θ_2, ...}.

Given θ, all the claims throughout the n + 1 years, X(1, t=1), X(2, t=1), ...,
X(m_n, t=n), X(m_{n+1}, t=n+1), are independent identically distributed with a common
conditional mean E[X(i, t) | θ] = μ(θ) and a common conditional variance
Var[X(i, t) | θ] = σ²(θ).

One approach is to calculate the renewal premium for Year n + 1 from scratch. An
easier approach is to convert the Bühlmann-Straub credibility problem into a standard
Bühlmann credibility problem. I'll do both.

First, let's look at the problem from the Bühlmann world. In Year 1, the m_1 policyholders
have incurred a total of Σ_{i=1}^{m_1} X(i, t=1) claim amount. Because these m_1 policyholders
belong to the same, unknown, sub-risk θ, there's no distinction between any two of these
m_1 policyholders. All these m_1 policyholders are just photocopies of one another.

So

In Year 1, m_1 policyholders have incurred a total of Σ_{i=1}^{m_1} X(i, t=1) claim amount.

Is the same as

In the first m_1 years, one policyholder has incurred a total of Σ_{i=1}^{m_1} X(i, t=1) claim amount.

In either case, the total claim amount is Σ_{i=1}^{m_1} X(i, t=1); the average claim per policyholder
per year is (1/m_1) Σ_{i=1}^{m_1} X(i, t=1).



Similarly,

In Year 2, m_2 policyholders have incurred a total of Σ_{i=1}^{m_2} X(i, t=2) claim amount.

Is the same as

In the next m_2 years, the policyholder (who has incurred Σ_{i=1}^{m_1} X(i, t=1) in the first m_1
years) has incurred total Σ_{i=1}^{m_2} X(i, t=2) claim.

So on and so forth.

Now the original Bühlmann-Straub problem becomes a standard Bühlmann problem:

In the first m_1 years, one policyholder has incurred total Σ_{i=1}^{m_1} X(i, t=1) claim.
In the next m_2 years, the policyholder has incurred total Σ_{i=1}^{m_2} X(i, t=2) claim.
In the next m_3 years, the policyholder has incurred total Σ_{i=1}^{m_3} X(i, t=3) claim.
...
In the next m_n years, the policyholder has incurred total Σ_{i=1}^{m_n} X(i, t=n) claim.

This is the same as:

In m = m_1 + m_2 + ... + m_n years, one policyholder has incurred total Σ_{t=1}^{n} Σ_{i=1}^{m_t} X(i, t) claim.

Then the expected claim cost in Year m + 1 for one policyholder can be calculated using
the Bühlmann credibility formula:

P = Z X̄ + (1 - Z)μ, where

X̄ = (Total observed claims)/(Total # of observed years) = (1/m) Σ_{t=1}^{n} Σ_{i=1}^{m_t} X(i, t)

Z = (# of observation years)/(# of observation years + k) = m/(m + k),   k = E[σ²(θ)]/Var[μ(θ)]



There's nothing new under the sun in the Bühlmann-Straub credibility model. Every
problem about the Bühlmann-Straub credibility model can be solved using the Bühlmann
credibility model.

Actually, we can have a unified formula for the Bühlmann-Straub and the Bühlmann
credibility models:
P = Z X̄ + (1 - Z)μ

X̄ = (observed claim dollar amounts)/(# of observed exposures, measured on the insured-year basis)

Z = (# of observed exposures)/(# of observed exposures + k),   k = E[σ²(θ)]/Var[μ(θ)]

In this unified formula, the observed exposure is measured on the insured-year basis. For
example, if one policyholder has incurred a $500 claim in one year, the exposure is:

1 insured × 1 year = 1 insured-year

If the policyholder has incurred a $500 claim over a 2-year period, then the exposure is:

1 insured × 2 years = 2 insured-years

Let's see how the unified formula works for the Bühlmann and the Bühlmann-Straub
credibility models. In the Bühlmann model, we have an n-year claim history of one
policyholder. So the observed exposure is:

1 insured × n years = n insured-years.

Then the formula becomes:

Z = n/(n + k),   X̄ = (X_1 + X_2 + ... + X_n)/n

In the Bühlmann-Straub model, we have observed m = m_1 + m_2 + ... + m_n policyholders.
However, we have only 1-year claim data for each of these m policyholders. So the total
# of exposures is:

m insureds × 1 year = m insured-years

Then the unified formula becomes:



Z = m/(m + k),   X̄ = (1/m) Σ_{t=1}^{n} Σ_{i=1}^{m_t} X(i, t)

Now you know how to convert a Bühlmann-Straub problem into a Bühlmann problem
and how to use a unified formula for the Bühlmann-Straub model and the Bühlmann
model. Next, I'll derive the Bühlmann credibility formula from scratch. First, let's
create an average policyholder and reorganize each year's claim data from the viewpoint
of this average policyholder.
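The unified formula lends itself to a small helper function. This is only a sketch under the notation above; the function name and the example numbers are my own:

```python
def credibility_premium(total_claims, exposures, mu, EV, VE):
    """Unified Buhlmann / Buhlmann-Straub premium per exposure.

    exposures is measured in insured-years; mu is the global mean,
    EV the expected process variance, VE the variance of hypothetical means.
    """
    k = EV / VE
    Z = exposures / (exposures + k)
    x_bar = total_claims / exposures
    return Z * x_bar + (1 - Z) * mu

# illustrative (made-up) numbers: 12 in claims over 10 insured-years
print(round(credibility_premium(12, 10, 1.0, 2.0, 0.5), 4))   # 1.1429
```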

Let's look at the claim history data in the Bühlmann-Straub model from the average
policyholder's point of view:

In Year 1, m_1 policyholders have incurred total Σ_{i=1}^{m_1} X(i, t=1) claim. So the average
policyholder has incurred X̄_1 = (1/m_1) Σ_{i=1}^{m_1} X(i, t=1) claim.

In Year 2, the average policyholder has incurred total X̄_2 = (1/m_2) Σ_{i=1}^{m_2} X(i, t=2) claim.

In Year t, the average policyholder has incurred total X̄_t = (1/m_t) Σ_{i=1}^{m_t} X(i, t) claim.

In Year n, the average policyholder has incurred total X̄_n = (1/m_n) Σ_{i=1}^{m_n} X(i, t=n) claim.

Our goal is to estimate E(X_{n+1}), the average claim in Year n + 1. We'll use a + Z X̄ to
approximate E(X_{n+1}), where X̄ = Σ_{i=1}^{n} (m_i/m) X̄_i. We'll minimize E{[a + Z X̄ - X_{n+1}]²}.

Z = Cov(X̄, X_{n+1})/Var(X̄),   a = (1 - Z)μ

E(X̄_t | θ) = E[(1/m_t) Σ_{i=1}^{m_t} X(i, t) | θ] = (1/m_t) Σ_{i=1}^{m_t} E[X(i, t) | θ] = (1/m_t) Σ_{i=1}^{m_t} μ(θ) = μ(θ)



Var(X̄_t | θ) = Var[(1/m_t) Σ_{i=1}^{m_t} X(i, t) | θ] = (1/m_t²) Σ_{i=1}^{m_t} Var[X(i, t) | θ]
             = (1/m_t²) Σ_{i=1}^{m_t} σ²(θ) = (1/m_t²)(m_t)σ²(θ) = σ²(θ)/m_t

E(X̄ | θ) = E[Σ_{i=1}^{n} (m_i/m) X̄_i | θ] = (1/m) Σ_{i=1}^{n} m_i E(X̄_i | θ) = (1/m) Σ_{i=1}^{n} m_i μ(θ) = (μ(θ)/m) Σ_{i=1}^{n} m_i = μ(θ)

Var(X̄ | θ) = Var[Σ_{i=1}^{n} (m_i/m) X̄_i | θ] = (1/m²) Σ_{i=1}^{n} m_i² Var(X̄_i | θ) = (1/m²) Σ_{i=1}^{n} m_i² σ²(θ)/m_i
           = (σ²(θ)/m²) Σ_{i=1}^{n} m_i = (σ²(θ)/m²)(m) = σ²(θ)/m

Here's a quick way to find E(X̄ | θ) = μ(θ) and Var(X̄ | θ) = σ²(θ)/m without using the
complex summation symbols above. The # of policyholders observed is
m = m_1 + m_2 + ... + m_n. The claims incurred by these m policyholders, given θ, are
independent identically distributed with a common mean μ(θ) and a common variance
σ²(θ). As a result, the average claim amount incurred by these m policyholders, given
θ, has mean μ(θ) and variance (1/m)σ²(θ).

Cov(X̄, X_{n+1}) = E(X̄ X_{n+1}) - E(X̄) E(X_{n+1})

E(X̄ X_{n+1}) = E{E(X̄ X_{n+1} | θ)} = E{E(X̄ | θ) E(X_{n+1} | θ)} = E[μ²(θ)]

E(X̄) = E{E(X̄ | θ)} = E[μ(θ)]

E(X_{n+1}) = E{E(X_{n+1} | θ)} = E[μ(θ)]

Cov(X̄, X_{n+1}) = E[μ²(θ)] - E²[μ(θ)] = Var[μ(θ)]



Z = Cov(X̄, X_{n+1})/Var(X̄) = Var[μ(θ)] / {Var[μ(θ)] + E[σ²(θ)]/m} = m / {m + E[σ²(θ)]/Var[μ(θ)]}
m Var ( )

Summary of the Bühlmann-Straub credibility model

                                         Period 1    ...    Period n
Exposure                                 m_1         ...    m_n
Hypothetical mean for risk θ             μ(θ) = E(X̄_1 | θ) = E(X̄_2 | θ) = ... = E(X̄_n | θ)
  per unit exposure
Process variance for risk θ              Var(X̄_1 | θ) = σ²(θ)/m_1   ...   Var(X̄_n | θ) = σ²(θ)/m_n

Then

E(X_{n+1} | X̄_1, X̄_2, ..., X̄_n) ≈ Z X̄ + (1 - Z)μ

X̄ = Σ_{i=1}^{n} (m_i/m) X̄_i,   Z = m/(m + k),   k = E[σ²(θ)]/Var[μ(θ)]

Remember that X̄_1, X̄_2, ..., X̄_n are claims incurred by an artificially created average
policyholder in Years 1, 2, ..., n respectively. In addition, Z ≠ n/(n + k). Don't make the
common mistake of writing Z = n/(n + k). The formula Z = n/(n + k) is not good for the
Bühlmann-Straub credibility model.

Key point
In the Bühlmann-Straub credibility model, what matters is the total exposure m and the
historical average claim per exposure X̄. The individual claim amounts X(i, t) don't
matter.

For example, everything else being equal, the following two cases have the same
Bühlmann-Straub credibility estimate.



Case #1
m_1 = 2, X(1, t=1) = 7, X(2, t=1) = 1;
m_2 = 3, X(1, t=2) = 0, X(2, t=2) = 4, X(3, t=2) = 2.

Case #2
m_1 = 1, X(1, t=1) = 9;
m_2 = 4, X(1, t=2) = 3, X(2, t=2) = 0.6, X(3, t=2) = 1, X(4, t=2) = 0.4.

In both cases,
the total exposure is m_1 + m_2 = 5;
the total claim dollar amount is 14 = 7+1+0+4+2 = 9+3+0.6+1+0.4;
the average claim per insured per year is 14/5 = 2.8.
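A quick check of the key point: both cases reduce to the same exposure, total claim, and average claim:

```python
# Each list holds the individual claim amounts; one entry = one insured-year.
case1 = [7.0, 1.0, 0.0, 4.0, 2.0]   # 2 insureds in year 1, 3 in year 2
case2 = [9.0, 3.0, 0.6, 1.0, 0.4]   # 1 insured in year 1, 4 in year 2
for claims in (case1, case2):
    m = len(claims)                           # total insured-years
    print(m, sum(claims), sum(claims) / m)    # 5 14.0 2.8
```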

General Bühlmann-Straub credibility model (more realistic)

This is a minor point. If you don't care, just skip it.

Loss Models mentions Hewitt's version of the Bühlmann-Straub credibility model.
This model assumes that the X̄_i, the average claims, given the sub-risk class θ, are
independently distributed with a common mean E(X̄_i | θ) = μ(θ) and a variance
Var(X̄_i | θ) = w(θ) + σ²(θ)/m_i.

So the difference between the general and the standard Bühlmann-Straub model is the
conditional variance assumption. Hewitt's assumption is Var(X̄_i | θ) = w(θ) + σ²(θ)/m_i;
the standard Bühlmann-Straub assumption is Var(X̄_i | θ) = σ²(θ)/m_i.

Then Loss Models derives the formulas:

w = E[w(θ)],   v = E[v(θ)],   w + v/m_j = E[Var(X̄_j | θ)]

m* = Σ_{j=1}^{n} m_j/(v + w m_j) = Σ_{j=1}^{n} 1/(w + v/m_j) = Σ_{j=1}^{n} 1/E[Var(X̄_j | θ)]

P = Z X̄ + (1 - Z)μ,   Z = am*/(1 + am*),

X̄ = [Σ_{j=1}^{n} X̄_j / E[Var(X̄_j | θ)]] / [Σ_{j=1}^{n} 1/E[Var(X̄_j | θ)]]

If m_1 = m_2 = ... = m_n = m, then

m* = Σ_{j=1}^{n} 1/(w + v/m_j) = Σ_{j=1}^{n} 1/(w + v/m) = n/(w + v/m),

Z = am*/(1 + am*) = 1/[1 + 1/(am*)] = 1/[1 + (1/a)(w + v/m)/n] = n/[n + (w + v/m)/a]
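A sketch of the equal-exposure formula just derived; the function name and the numbers are made up purely for illustration:

```python
def hewitt_Z(n, m, w, v, a):
    """Credibility factor in Hewitt's model with equal exposure m each year."""
    m_star = n / (w + v / m)
    return a * m_star / (1 + a * m_star)

# sanity check against the algebraic form n / (n + (w + v/m)/a)
n, m, w, v, a = 4, 10, 2.0, 30.0, 5.0
assert abs(hewitt_Z(n, m, w, v, a) - n / (n + (w + v / m) / a)) < 1e-12
print(round(hewitt_Z(n, m, w, v, a), 4))   # 0.8
```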

Two points to remember:

First, X̄ ≠ Σ_{j} (m_j/m) X̄_j. Here X̄_j's weight is inversely proportional to E[Var(X̄_j | θ)],
the expected process variance. The higher the expected process variance of X̄_j, the less
weight is assigned to X̄_j. This way, X̄ will have the minimum variance. This point is
explained in the study note by Curtis Gary Dean. Refer to that study note if you want to
find out more.

Next, let's look at the crazy formula Z = am*/(1 + am*). To get comfortable with this
formula, look at the basic formula Z = n/(n + v/a) = 1/[1 + (1/a)(v/n)]. Let's compare
these two formulas:

Z = n/(n + v/a) = 1/[1 + (1/a)(v/n)],   Z = am*/(1 + am*) = 1/[1 + (1/a)(1/m*)]



Now you see that these two formulas are similar. If Var(X̄_i | θ) = σ²(θ) and m_j = 1 as in
the Bühlmann model, then

m* = Σ_{j=1}^{n} 1/E[Var(X̄_j | θ)] = Σ_{j=1}^{n} 1/v = n/v,   Z = 1/[1 + (1/a)(1/m*)] = 1/[1 + (1/a)(v/n)]

This is the Bühlmann credibility premium formula Z = 1/[1 + (1/a)(v/n)] = n/(n + v/a).

The third point. Loss Models points out that in this version of the model, as m_j
approaches infinity, the credibility factor Z won't approach one. Let's take a look at
this.

Var(X̄_i | θ) = w(θ) + σ²(θ)/m_i

When m_j → ∞,   Var(X̄_i | θ) = w(θ) + σ²(θ)/m_i → w(θ).

m* = Σ_{j=1}^{n} 1/w = n/w,   Z = 1/[1 + (1/a)(1/m*)] = 1/[1 + (1/a)(w/n)] = n/(n + w/a) < 1

Compare this with the Bühlmann model or the Bühlmann-Straub model. In the Bühlmann
model, as the number of exposures n approaches ∞,

Z = n/(n + v/a) = 1/[1 + (1/a)(v/n)] → 1

In the Bühlmann-Straub model, w = 0. So as m_j → ∞,

Var(X̄_i | θ) = σ²(θ)/m_i → 0,   m* = Σ_{j=1}^{n} 1/E[Var(X̄_j | θ)] → ∞

Z = 1/[1 + (1/a)(1/m*)] → 1



Finally, Loss Models has a special case of the general Bühlmann-Straub model. In this
special case, Var(X̄_i | θ) = w(θ) + σ²(θ)/m_i as in Hewitt's general model above.
What's new is Var[μ(Θ)] = a + b/m, as opposed to Var[μ(Θ)] = a in the Bühlmann
model and the Bühlmann-Straub model. Here m = Σ_{j=1}^{n} m_j represents the total exposure.

As you can see, this special case just changes Var[μ(Θ)] = a to Var[μ(Θ)] = a + b/m. In
other words, this special case just changes a to a + b/m. Loss Models points out that to
find the credibility factor for this special case, we just need to change a to a + b/m:

Z = am*/(1 + am*)   becomes   Z = (a + b/m)m* / [1 + (a + b/m)m*]

How to tackle the Bühlmann-Straub premium problem

Most likely, Exam C won't have problems on the generalized version of the Bühlmann-
Straub model. So you should focus on the standard Bühlmann-Straub model. To tackle
the standard Bühlmann-Straub model, you can use any of the following 3 approaches:

Use the Bühlmann-Straub model formula Z = m/(m + k)

Convert the Bühlmann-Straub problem into a Bühlmann problem. So instead of
having m = Σ m_j policyholders, we have m years of observation of one
policyholder. Then Z = (# of observation years)/(# of observation years + k) = m/(m + k)

Use the unified formula (without converting into the Bühlmann model):
Z = (# of observed exposures)/(# of observed exposures + k) = m/(m + k)



Sample SOA Problems

Nov 2001 #26


You are given the following data on large business policyholders:
Losses for each employee of a given policyholder are independent and have a
common mean and variance.

The overall average loss per employee for all policyholders is 20.

The variance of the hypothetical mean is 40.

The expected value of the process variance is 8,000.

The following experience is observed for a randomly selected policyholder:

Year Average loss per employee Number of employees


1 15 800
2 10 600
3 5 400

Determine the Bühlmann-Straub credibility premium per employee for this policyholder.

Solution

Method 1   Use the Bühlmann-Straub credibility premium formula

The global mean: μ = 20.
The expected process variance: EV = 8,000
The variance of the hypothetical means: VE = 40

So k = EV/VE = 8,000/40 = 200

m = m_1 + m_2 + m_3 = 800 + 600 + 400 = 1800

Z = m/(m + k) = 1800/(1800 + 200) = 0.9

X̄ = (1/m) Σ_{i=1}^{3} m_i X̄_i = [800(15) + 600(10) + 400(5)]/1800 = 11.111

P = Z X̄ + (1 - Z)μ = 0.9(11.111) + 0.1(20) = 12

Alternatively,

P = [kμ + Σ_{i=1}^{3} m_i X̄_i]/(m + k) = [200(20) + 800(15) + 600(10) + 400(5)]/(1800 + 200) = 12
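The Method 1 arithmetic can be verified with a short sketch:

```python
# Nov 2001 #26: global mean, expected process variance, variance of
# hypothetical means, then the Buhlmann-Straub premium per employee.
mu, EV, VE = 20.0, 8000.0, 40.0
exposures = [800, 600, 400]
avg_loss = [15.0, 10.0, 5.0]

k = EV / VE                   # 200
m = sum(exposures)            # 1800
Z = m / (m + k)               # 0.9
x_bar = sum(e, ) if False else sum(e * x for e, x in zip(exposures, avg_loss)) / m
P = Z * x_bar + (1 - Z) * mu
print(round(P, 4))            # 12.0
```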

Method 2   Convert the Bühlmann-Straub problem into a Bühlmann problem

We convert this table

Year   Average loss per employee   Number of employees
1      15                          800
2      10                          600
3      5                           400

into

Period            Total loss   Number of employees
First 800 years   15(800)      1
Next 600 years    10(600)      1
Next 400 years    5(400)       1

The above two tables are essentially the same. In both tables, the average loss per
employee per year is

X̄ = [800(15) + 600(10) + 400(5)]/1800 = 11.111

After the conversion, the # of observation years is n = 800 + 600 + 400 = 1800. This seems
crazy, but it is merely a conceptual tool for us to transform a Bühlmann-Straub problem
into a Bühlmann problem.

Using the Bühlmann premium formulas, we have:

Z = n/(n + k) = 1800/(1800 + 200) = 0.9,   P = Z X̄ + (1 - Z)μ = 0.9(11.111) + 0.1(20) = 12

Method 3   Use the unified credibility premium formula

In this method, we don't care about the distinction between the Bühlmann and the
Bühlmann-Straub models. We just use the following unified formulas:



P = Z X̄ + (1 - Z)μ

X̄ = (observed claims)/(# of observed exposures, measured on the insured-year basis),

Z = (# of observed exposures)/(# of observed exposures + k),   k = E[σ²(θ)]/Var[μ(θ)]

Z = 1800/(1800 + 200) = 0.9

X̄ = [800(15) + 600(10) + 400(5)]/1800 = 11.111

P = Z X̄ + (1 - Z)μ = 0.9(11.111) + 0.1(20) = 12

May 2001 #23

You are given the following information about a single risk:
The risk has m exposures in each year
The risk is observed for n years
The variance of the hypothetical means is a
The expected value of the annual process variance is w + v/m

Determine the limit of the Bühlmann-Straub credibility factor as m approaches infinity.

Solution

A naïve approach is to use the Bühlmann credibility formula:

Z = n/(n + k) = n/(n + EV/VE) = n/[n + (w + v/m)/a]

Then as m approaches infinity, w + v/m approaches w and Z approaches Z = n/(n + w/a).

Incidentally, this leads to the correct answer. However, this line of thinking is
problematic. As explained earlier, in the Bühlmann-Straub model, the credibility factor is



Z = m/(m + k) = Σ_{i=1}^{n} m_i / (Σ_{i=1}^{n} m_i + k),   not Z = n/(n + k).

The correct logic is to realize that this problem involves a special Bühlmann-Straub
credibility model where Var(X̄_i | θ) = w(θ) + σ²(θ)/m_i. We are told that m_1 = m_2 = ... = m_n = m.

As derived earlier, when m_1 = m_2 = ... = m_n = m, we have:

Z = am*/(1 + am*) = 1/[1 + (1/a)(1/m*)] = 1/[1 + (1/a)(w + v/m)/n] = n/[n + (w + v/m)/a]

As m → ∞,   v/m → 0,   Z = n/[n + (w + v/m)/a] → n/(n + w/a)
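A numerical check of the limit argument, with made-up values of n, w, v, and a:

```python
# Z = n/(n + (w + v/m)/a) should approach n/(n + w/a) as m grows.
n, w, v, a = 5, 2.0, 100.0, 4.0

def Z(m):
    return n / (n + (w + v / m) / a)

limit = n / (n + w / a)
print(round(Z(10), 3), round(Z(10 ** 6), 4), round(limit, 4))   # 0.625 0.9091 0.9091
```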

Nov 2004 #9
Members of three classes of insureds can have 0, 1, or 2 claims, with the following
probabilities:

# of claims
Class 0 1 2
I 0.9 0.0 0.1
II 0.8 0.1 0.1
III 0.7 0.2 0.1

A class is chosen at random, and varying # of insureds from that class are observed over
2 years, as shown below:

Year # of insureds # of claims


1 20 7
2 30 10

Determine the Bühlmann-Straub credibility estimate of the number of claims in Year 3
for 35 insureds.

Solution



Method 1   Use the Bühlmann-Straub credibility premium formula

Class   P(θ)   E(X | θ) (1)   Var(X | θ) (2)
I       1/3    0.2            0.36
II      1/3    0.3            0.41
III     1/3    0.4            0.44

Note   (1) 0.2 = 0(0.9) + 1(0.0) + 2(0.1)
       (2) 0.36 = 0²(0.9) + 1²(0.0) + 2²(0.1) - 0.2²

m = m_1 + m_2 = 20 + 30 = 50,   μ = (1/3)(0.2 + 0.3 + 0.4) = 0.3,

VE = Var[E(X | θ)] = (1/3)(0.2² + 0.3² + 0.4²) - 0.3² = 0.00667

EV = E[Var(X | θ)] = (1/3)(0.36 + 0.41 + 0.44) = 0.4033

k = EV/VE = 0.4033/0.00667 = 60.5,   Z = m/(m + k) = 50/(50 + 60.5) = 0.4525,   X̄ = (7 + 10)/50 = 0.34

P = 0.4525(0.34) + (1 - 0.4525)(0.3) = 0.318

Alternatively, after we find k, we can skip Z:

P = [kμ + Σ_{i=1}^{2} m_i X̄_i]/(m + k) = [60.5(0.3) + (7 + 10)]/(50 + 60.5) = 0.318
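Method 1 can be checked numerically; this sketch mirrors the steps above:

```python
# Nov 2004 #9: P(n claims | class) for classes I, II, III.
probs = {0: [0.9, 0.8, 0.7], 1: [0.0, 0.1, 0.2], 2: [0.1, 0.1, 0.1]}

means, variances = [], []
for c in range(3):
    m1 = sum(n * probs[n][c] for n in probs)        # E(X | class)
    m2 = sum(n ** 2 * probs[n][c] for n in probs)   # E(X^2 | class)
    means.append(m1)
    variances.append(m2 - m1 ** 2)

mu = sum(means) / 3                          # 0.3
EV = sum(variances) / 3                      # 0.4033
VE = sum(v ** 2 for v in means) / 3 - mu ** 2   # 0.00667

k = EV / VE                                  # 60.5
m, x_bar = 50, 17 / 50
P = m / (m + k) * x_bar + k / (m + k) * mu
print(round(P, 3), round(35 * P, 1))         # 0.318 11.1
```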

Method 2   Convert the Bühlmann-Straub problem into a Bühlmann problem

We convert this table

Year   # of insureds   # of claims
1      20              7
2      30              10

into

Period           Total # of claims   Number of insureds
First 20 years   7                   1
Next 30 years    10                  1

The above two tables are essentially the same. In both tables, the average loss per insured
per year is

X̄ = (7 + 10)/50 = 0.34

After the conversion, the # of observation years is n = 20 + 30 = 50. Using the Bühlmann
premium formulas, we have:

Z = n/(n + k) = 50/(50 + 60.5) = 0.4525,   P = 0.4525(0.34) + (1 - 0.4525)(0.3) = 0.318

Method 3   Use the unified credibility premium formula

In this method, we don't care about the distinction between the Bühlmann and the
Bühlmann-Straub models.

Z = (# of observed exposures)/(# of observed exposures + k) = 50/(50 + 60.5) = 0.4525

X̄ = (7 + 10)/50 = 0.34

P = 0.4525(0.34) + (1 - 0.4525)(0.3) = 0.318

The estimated number of claims for 35 insureds in Year 3 is 35(0.318) = 11.1.

Nov 2002 #32

You are given four classes of insureds, each of whom may have zero or one claim, with
the following probabilities:

         # of claims
Class    0      1
I        0.9    0.1
II       0.8    0.2
III      0.5    0.5
IV       0.1    0.9

A class is selected at random, and four insureds are selected at random from the class.
The total number of claims is two. If five insureds are selected at random from the same
class, estimate the total number of claims using Bühlmann-Straub credibility.

Solution



You can use any one of the three methods. Here I use the Bühlmann-Straub credibility
formula Z = m/(m + k).

Class   P(θ)   E(X | θ)   Var(X | θ) (1)
I       1/4    0.1        0.09
II      1/4    0.2        0.16
III     1/4    0.5        0.25
IV      1/4    0.9        0.09

Note   (1) 0.09 = (1 - 0)²(0.9)(0.1). Use the following shortcut:

If X = a with probability p, and X = b with probability q = 1 - p, then
E(X) = ap + bq,   Var(X) = (a - b)² pq

μ = (1/4)(0.1 + 0.2 + 0.5 + 0.9) = 0.425

EV = (1/4)(0.09 + 0.16 + 0.25 + 0.09) = 0.1475

VE = (1/4)(0.1² + 0.2² + 0.5² + 0.9²) - 0.425² = 0.096875

k = EV/VE = 0.1475/0.096875 = 1.5226,   Z = m/(m + k) = 4/(4 + 1.5226) = 0.7243,   X̄ = 2/4 = 0.5

P = 0.7243(0.5) + (1 - 0.7243)(0.425) = 0.4793 (this is for one insured)

The credibility premium for the 5 insureds is 0.4793(5) = 2.4
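A sketch of the solution above using the two-point shortcut; the variable names are mine:

```python
# Two-point shortcut with a = 1, b = 0: E(X) = q and Var(X) = (1-0)^2 q(1-q).
qs = [0.1, 0.2, 0.5, 0.9]                 # P(1 claim) for classes I-IV
means = qs
variances = [(1 - 0) ** 2 * q * (1 - q) for q in qs]

mu = sum(means) / 4                       # 0.425
EV = sum(variances) / 4                   # 0.1475
VE = sum(q ** 2 for q in qs) / 4 - mu ** 2    # 0.096875

k = EV / VE
m, x_bar = 4, 2 / 4
P = m / (m + k) * x_bar + k / (m + k) * mu    # premium per insured
print(round(5 * P, 1))                        # 2.4
```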

Nov 2003 #27

You are given:
The # of claims incurred in a month by any insured has a Poisson distribution
with mean λ
The claim frequencies of different insureds are independent.
The prior distribution is gamma with probability density function

f(λ) = (100λ)⁶ e^{-100λ} / (120λ)



Month   # of insureds   # of claims
1       100             6
2       150             8
3       200             11
4       300             ?

Determine the Bühlmann-Straub credibility estimate of the # of claims in Month 4.

Solution

This time, let's solve it by converting the Bühlmann-Straub credibility problem into a
Bühlmann credibility problem.

This table

Month   # of insureds   # of claims
1       100             6
2       150             8
3       200             11

Is the same as

Period      # of insureds   # of claims
First 100   1               6
Next 150    1               8
Next 200    1               11

So the total number of observation years is n = 100 + 150 + 200 = 450. The total # of
observed claims is 6 + 8 + 11 = 25. So X̄ = 25/450.

Let N represent the # of claims incurred in a month by a randomly chosen policyholder.
Then N | λ is Poisson with mean λ. So the risk random variable is λ.

E(N | λ) = Var(N | λ) = λ

The prior is gamma with α = 6 and θ = 1/100, so:

μ = EV = E_λ[Var(N | λ)] = E_λ[λ] = αθ = 6(1/100) = 0.06

VE = Var_λ[E(N | λ)] = Var_λ[λ] = E[λ²] - E²[λ] = α(α + 1)θ² - (αθ)² = 6(7)(1/100)² - 0.06² = 0.0006

k = EV/VE = 0.06/0.0006 = 100,   Z = n/(n + k) = 450/(450 + 100) = 0.818



P = Z X̄ + (1 - Z)μ = 0.818(25/450) + (1 - 0.818)(0.06) = 0.0564

The estimated # of claims in Month 4 is 300(0.0564) = 16.9
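The gamma-Poisson arithmetic above can be checked as follows:

```python
# Poisson claims with a Gamma(alpha=6, theta=1/100) prior on lambda.
alpha, theta = 6, 1 / 100
mu = EV = alpha * theta          # E(lambda) = 0.06
VE = alpha * theta ** 2          # Var(lambda) = 0.0006

k = EV / VE                      # 100
n = 100 + 150 + 200              # 450 insured-months observed
x_bar = 25 / n
Z = n / (n + k)                  # 0.818
P = Z * x_bar + (1 - Z) * mu     # per insured per month
print(round(300 * P, 1))         # 16.9
```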

Nov 2005 #22


You are given:
A region is comprised of 3 territories. Claims experience for Year 1 is as follows:

Territory # of insureds # of claims


A 10 4
B 20 5
C 30 3

The # of claims for each insured each year has a Poisson distribution.
Each insured in a territory has the same expected claim frequency.
The # of insureds is constant over time for each territory.

Determine the Bühlmann-Straub empirical Bayes estimate of the credibility factor Z for
Territory A.

Solution

Territory   P(θ)   E(X | θ) = Var(X | θ)  (Poisson)
A           1/6    4/10 = 0.4
B           2/6    5/20 = 0.25
C           3/6    3/30 = 0.1

μ = E[E(X | θ)] = E[Var(X | θ)] = EV = (1/6)(0.4) + (2/6)(0.25) + (3/6)(0.1) = 0.2

VE = Var[E(X | θ)] = E[E²(X | θ)] - μ² = (1/6)(0.4²) + (2/6)(0.25²) + (3/6)(0.1²) - 0.2² = 0.0125

k = EV/VE = 0.2/0.0125 = 16,   Z = n/(n + k) = 10/(10 + 16) = 0.385
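A quick numerical check of the territory calculation above:

```python
# Poisson per-insured frequency: mean = variance, so mu = EV.
mu = EV = (1/6) * 0.4 + (2/6) * 0.25 + (3/6) * 0.1                  # 0.2
VE = (1/6) * 0.4**2 + (2/6) * 0.25**2 + (3/6) * 0.1**2 - mu**2      # 0.0125

k = EV / VE                 # 16
Z_A = 10 / (10 + k)         # Territory A has 10 insureds
print(round(Z_A, 3))        # 0.385
```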



Chapter 7   Empirical Bayes estimate for the
Bühlmann model
Dean's study note has a good explanation of the formulas and worked-out problems.
Read Dean's study note along with my explanation.

This topic is among the least interesting ones in Exam C. However, it is repeatedly
tested in Exam C. The exam problems on this topic are easy. The difficulty is
memorizing the formulas. In this chapter, I will show you some ideas behind the formulas
to help you memorize them.

Empirical Bayes estimate for the Bühlmann model

We have n-year claim data about r risks. For each risk, we have its claim amount in
Year 1, Year 2, ..., Year n. Let X_ij represent the claim incurred by the i-th
policyholder in Year j. This is what we know:

Risk   Year 1   Year 2   ...   Year n
1      X_11     X_12     ...   X_1n
2      X_21     X_22     ...   X_2n
...
r      X_r1     X_r2     ...   X_rn

The issue here is that we don't know the probability distribution of the conditional claim
random variable X | θ or the probability distribution of the risk variable θ. As a result, we
can't calculate the two inputs for the credibility factor Z: the expected process variance
EV = E[Var(X | θ)] and the variance of the hypothetical means VE = Var[E(X | θ)].
So we need to estimate EV and VE from the past claim data given to us.

It's easy to estimate EV = E[Var(X | θ)]. We can estimate Var(X | θ) for each risk
using the formula σ̂_i² = [1/(n-1)] Σ_{t=1}^{n} (X_it - X̄_i)². Then we'll take the average and find
EV = E[Var(X_ij | θ)]. This estimation process can be summarized as follows:



Risk   Year 1 ... Year n   Sample mean                         Sample variance
1      X_11 ... X_1n       X̄_1 = (1/n) Σ_{t=1}^{n} X_1t       σ̂_1² = [1/(n-1)] Σ_{t=1}^{n} (X_1t - X̄_1)²
2      X_21 ... X_2n       X̄_2 = (1/n) Σ_{t=1}^{n} X_2t       σ̂_2² = [1/(n-1)] Σ_{t=1}^{n} (X_2t - X̄_2)²
...
r      X_r1 ... X_rn       X̄_r = (1/n) Σ_{t=1}^{n} X_rt       σ̂_r² = [1/(n-1)] Σ_{t=1}^{n} (X_rt - X̄_r)²

Then the expected process variance is estimated as:

EV = (1/r) Σ_{i=1}^{r} σ̂_i² = [1/(r(n-1))] Σ_{i=1}^{r} Σ_{t=1}^{n} (X_it - X̄_i)²

Next, we need to estimate VE = Var[E(X_ij | θ)]. We don't directly estimate VE.
Instead, we estimate VE using the following equation:

Var(X̄) = Var[μ(θ)] + (1/n) E[Var(X | θ)] = VE + (1/n) EV

We derived this equation in the chapter on the Bühlmann model. Then:

VE = Var[μ(θ)] = Var(X̄) - (1/n) E[Var(X | θ)] = Var(X̄) - (1/n) EV

VE = V̂ar(X̄) - (1/n) EV

V̂ar(X̄) is simple to calculate: V̂ar(X̄) = [1/(r-1)] Σ_{i=1}^{r} (X̄_i - X̄)², where X̄ is the grand mean.

So VE = V̂ar(X̄) - (1/n) EV = [1/(r-1)] Σ_{i=1}^{r} (X̄_i - X̄)² - (1/n)[1/(r(n-1))] Σ_{i=1}^{r} Σ_{t=1}^{n} (X_it - X̄_i)²

In some situations, VE calculated this way may be negative. If VE is negative, we set
VE = 0. If VE = 0, then k = EV/VE = ∞ and Z = 0.



Summary of the estimation process for the empirical Bayes estimate for the
Bühlmann model
Step 1   Calculate the sample variance for each risk and the expected process variance for
all risks combined:

σ̂_i² = [1/(n-1)] Σ_{t=1}^{n} (X_it - X̄_i)²,   EV = (1/r) Σ_{i=1}^{r} σ̂_i² = [1/(r(n-1))] Σ_{i=1}^{r} Σ_{t=1}^{n} (X_it - X̄_i)²

Step 2   Calculate V̂ar(X̄) = [1/(r-1)] Σ_{i=1}^{r} (X̄_i - X̄)²

Step 3   Use the equation Var(X̄) = VE + (1/n) EV. Find VE = V̂ar(X̄) - (1/n) EV

May 2000 #15

An insurer has data on losses for four policyholders for seven years. X_ij is the loss from
the i-th policyholder for year j. You are given:

Σ_{i=1}^{4} Σ_{j=1}^{7} (X_ij - X̄_i)² = 33.6

Σ_{i=1}^{4} (X̄_i - X̄)² = 3.3

Calculate the Bühlmann credibility factor for an individual policyholder using
nonparametric empirical Bayes estimation.

Solution

Here the # of risks is r = 4; the # of observation years is n = 7.

Step 1   EV = [1/(r(n-1))] Σ_{i=1}^{r} Σ_{t=1}^{n} (X_it - X̄_i)² = 33.6/[4(7-1)] = 1.4

Step 2   V̂ar(X̄) = [1/(r-1)] Σ_{i=1}^{r} (X̄_i - X̄)² = 3.3/(4-1) = 1.1

Step 3   Var(X̄) = VE + (1/n) EV,   so VE = 1.1 - 1.4/7 = 0.9



k = EV/VE = 1.4/0.9,   Z = n/(n + k) = 7/(7 + 1.4/0.9) = 0.818

Nov 2002 #11

An insurer has data on losses for four policyholders for seven years. X_ij is the loss from
the i-th policyholder for year j. You are given:

Σ_{i=1}^{4} Σ_{j=1}^{7} (X_ij - X̄_i)² = 33.6

Σ_{i=1}^{4} (X̄_i - X̄)² = 3.3

Using nonparametric empirical Bayes estimation, calculate the Bühlmann credibility
factor for an individual policyholder.

Solution

This is the same as May 2000 #15.

Nov 2003 #15

You are given total claims for two policyholders:

              Year
Policyholder  1    2    3    4
X             730  800  650  700
Y             655  650  625  750

Using nonparametric empirical Bayes estimation, calculate the Bühlmann credibility
factor for Policyholder Y.

Solution

r = 2,   n = 4.

Step 1   Calculate the sample conditional variance for each risk and the mean:

σ̂_i² = [1/(n-1)] Σ_{t=1}^{n} (X_it - X̄_i)²,   EV = (1/r) Σ_{i=1}^{r} σ̂_i²



X̄ = (730 + 800 + 650 + 700)/4 = 720

Ȳ = (655 + 650 + 625 + 750)/4 = 670

V̂ar(X) = [1/(4-1)][(730 - 720)² + (800 - 720)² + (650 - 720)² + (700 - 720)²] = 3,933.33

V̂ar(Y) = [1/(4-1)][(655 - 670)² + (650 - 670)² + (625 - 670)² + (750 - 670)²] = 3,016.67

EV = (1/2)(3,933.33 + 3,016.67) = 3,475

Step 2   Calculate V̂ar(X̄) = [1/(r-1)] Σ_{i=1}^{r} (X̄_i - μ̂)²

The global mean: μ̂ = (1/2)(X̄ + Ȳ) = (1/2)(720 + 670) = 695

V̂ar(X̄) = [1/(2-1)][(720 - 695)² + (670 - 695)²] = 1,250

Step 3   Use the equation Var(X̄) = VE + (1/n) EV. Find VE = V̂ar(X̄) - (1/n) EV

VE = 1,250 - (1/4)(3,475) = 381.25

The final result:

k = EV/VE = 3,475/381.25 = 9.115,   Z_Y = m_Y/(m_Y + k) = 4/(4 + 9.115) = 0.305

P_Y = Z_Y Ȳ + (1 - Z_Y)μ̂ = 0.305(670) + (1 - 0.305)(695) = 687.4



Empirical Bayes estimate for the Bühlmann-Straub model

Here the # of policyholders varies from risk to risk and year to year. For risk 1, m_11
policyholders have incurred X_11 claim amount in Year 1; m_12 policyholders have
incurred X_12 claim amount in Year 2; ...; m_{1 n_1} policyholders have incurred X_{1 n_1}
claim amount in Year n_1.

For risk 2, m_21 policyholders have incurred X_21 claim amount in Year 1; m_22
policyholders have incurred X_22 claim amount in Year 2; ...; m_{2 n_2} policyholders
have incurred X_{2 n_2} claim amount in Year n_2. So on and so forth.

This is the information given to you:

Risk   Year 1         Year 2         ...   Year n_i
1      m_11, X_11     m_12, X_12     ...   m_{1 n_1}, X_{1 n_1}
2      m_21, X_21     m_22, X_22     ...   m_{2 n_2}, X_{2 n_2}
...
r      m_r1, X_r1     m_r2, X_r2     ...   m_{r n_r}, X_{r n_r}

How to estimate:

Risk    Total exposure               Sample mean                                   Sample variance
1       m_1 = Σ_{t=1}^{n_1} m_1t     X̄_1 = (1/m_1) Σ_{t=1}^{n_1} m_1t X_1t        σ̂_1² = [1/(n_1 - 1)] Σ_{t=1}^{n_1} m_1t (X_1t - X̄_1)²
2       m_2 = Σ_{t=1}^{n_2} m_2t     X̄_2 = (1/m_2) Σ_{t=1}^{n_2} m_2t X_2t        σ̂_2² = [1/(n_2 - 1)] Σ_{t=1}^{n_2} m_2t (X_2t - X̄_2)²
...
r       m_r = Σ_{t=1}^{n_r} m_rt     X̄_r = (1/m_r) Σ_{t=1}^{n_r} m_rt X_rt        σ̂_r² = [1/(n_r - 1)] Σ_{t=1}^{n_r} m_rt (X_rt - X̄_r)²

Total   m = Σ_{i=1}^{r} m_i          X̄ = (1/m) Σ_{i=1}^{r} m_i X̄_i               EV = Σ_{i=1}^{r} (n_i - 1) σ̂_i² / Σ_{i=1}^{r} (n_i - 1)



Step 1   Calculate the sample variance for each risk and the expected process variance for
all risks combined:

σ̂_i² = [1/(n_i - 1)] Σ_{t=1}^{n_i} m_it (X_it - X̄_i)²,

EV = Σ_{i=1}^{r} (n_i - 1) σ̂_i² / Σ_{i=1}^{r} (n_i - 1) = Σ_{i=1}^{r} Σ_{t=1}^{n_i} m_it (X_it - X̄_i)² / Σ_{i=1}^{r} (n_i - 1)

Step 2   Calculate VE

VE = [Σ_{i=1}^{r} m_i (X̄_i - X̄)² - (r - 1) EV] / [m - (1/m) Σ_{i=1}^{r} m_i²]
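The two estimators above can be written as one function; the function name is mine, and the illustrative data are small made-up exposures and per-exposure averages:

```python
def bs_empirical_bayes(m, x):
    """Buhlmann-Straub empirical Bayes EV and VE.

    m[i][t]: # of exposures for risk i in year t;
    x[i][t]: average claim per exposure for risk i in year t.
    """
    r = len(m)
    mi = [sum(row) for row in m]              # total exposure per risk
    M = sum(mi)                               # grand total exposure
    xbar_i = [sum(mt * xt for mt, xt in zip(m[i], x[i])) / mi[i]
              for i in range(r)]
    xbar = sum(mi[i] * xbar_i[i] for i in range(r)) / M
    # Step 1: expected process variance
    num_ev = sum(m[i][t] * (x[i][t] - xbar_i[i]) ** 2
                 for i in range(r) for t in range(len(m[i])))
    EV = num_ev / sum(len(m[i]) - 1 for i in range(r))
    # Step 2: variance of hypothetical means (unbiased form)
    num_ve = sum(mi[i] * (xbar_i[i] - xbar) ** 2 for i in range(r)) \
             - (r - 1) * EV
    VE = num_ve / (M - sum(v ** 2 for v in mi) / M)
    return EV, VE

ev, ve = bs_empirical_bayes([[8, 12, 5], [25, 30, 20]],
                            [[96, 91, 113], [113, 111, 116]])
print(round(ev, 1), round(ve, 2))   # 505.0 114.53
```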

This formula is counter-intuitive and very hard to remember. However, you'll just have
to memorize it. Perhaps Dean's explanation might help you a little bit. He says that the
crude estimate for VE is

VE = Σ_{i=1}^{r} m_i (X̄_i - X̄)² / (r - 1)

However, this estimate is biased. To have an unbiased estimator, we need to change the
above estimate to

VE = [Σ_{i=1}^{r} m_i (X̄_i - X̄)² - (r - 1) EV] / [m - (1/m) Σ_{i=1}^{r} m_i²]

This isn't a big help in memorizing the formula. This formula is hard. You'll just
have to memorize it.

Final point. Loss Models mentions the concept of the credibility-weighted average premium.
It proves that the total loss will be equal to the total premium if we set

μ̂ = Σ_{i=1}^{r} Z_i X̄_i / Σ_{i=1}^{r} Z_i

Refer to Loss Models to understand the proof.

Nov 2000 #27


You are given the following information on towing losses for two classes of insured,
adults and youths:

Exposures
Year Adult Youth Total
1996 2000 450 2450
1997 1000 250 1250
1998 1000 175 1175
1999 1000 125 1125
Total 5000 1000 6000

Pure Premium
Year Adult Youth Total
1996 0 15 2.755
1997 5 2 4.400
1998 6 15 7.340
1999 4 1 3.667
Weighted 3 10 4.167
Average

You are also given that the estimated variance of the hypothetical means is 17.125.
Determine the non-parametric empirical Bayes credibility premium for the youth class,
using the method that preserves the total losses.

Solution

We have two risk groups: adults and youths. So r = 2 .

\widehat{EV} = \frac{\sum_{i=1}^{r}\sum_{t=1}^{n_i} m_{it}\left(X_{it} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)}

= \frac{2{,}000(0 - 3)^2 + 1{,}000(5 - 3)^2 + 1{,}000(6 - 3)^2 + 1{,}000(4 - 3)^2 + 450(15 - 10)^2 + 250(2 - 10)^2 + 175(15 - 10)^2 + 125(1 - 10)^2}{(4 - 1) + (4 - 1)}

= 12{,}291.7

The calculation of \widehat{VE} is complex. Fortunately, we are given that \hat{a} = \widehat{VE} = 17.125.

(Thank you SOA!)

Z_A = \frac{5{,}000}{5{,}000 + \frac{12{,}291.7}{17.125}} = 0.874 , \qquad Z_Y = \frac{1{,}000}{1{,}000 + \frac{12{,}291.7}{17.125}} = 0.582

The credibility weighted average global mean is:

\hat{\mu} = \frac{\sum_{i=1}^{r} Z_i \bar{X}_i}{\sum_{i=1}^{r} Z_i} = \frac{0.874(3) + 0.582(10)}{0.874 + 0.582} = 5.8

The non-parametric empirical Bayes credibility premium for the youth class is:

Z_Y \bar{X}_Y + (1 - Z_Y)\hat{\mu} = 0.582(10) + (1 - 0.582)(5.8) = 8.24

Lets verify that the total credibility premium is equal to the total loss:

The non-parametric empirical Bayes credibility premium for the adult class is:

Z_A \bar{X}_A + (1 - Z_A)\hat{\mu} = 0.874(3) + (1 - 0.874)(5.8) = 3.35

The total credibility premium is:

1,000(8.24)+5,000(3.35)=25,000

The total loss is:


Adult: 2,000(0)+1,000(5)+1,000(6)+1,000(4)=15,000
Or 5,000 (total exposure) * (3 average premium per exposure) = 15,000

Youth: 450(15)+250(2)+175(15)+125(1)=10,000
Or 1,000 (total exposure) * (10 average premium per exposure) = 10,000

Total: 25,000
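The balance check is easy to automate. This sketch (variable names are mine) redoes the arithmetic above and confirms that using the credibility-weighted global mean makes total premium reproduce the total loss exactly:

```python
# Nov 2000 #27 figures: with mu-hat equal to the credibility weighted
# average of the class means, total premium equals total loss.
k = 12291.7 / 17.125                      # EV-hat / VE-hat
z_a = 5000 / (5000 + k)                   # adult credibility, about 0.874
z_y = 1000 / (1000 + k)                   # youth credibility, about 0.582
mu = (z_a * 3 + z_y * 10) / (z_a + z_y)   # credibility weighted mean, about 5.8

prem_a = z_a * 3 + (1 - z_a) * mu         # about 3.35 per adult exposure
prem_y = z_y * 10 + (1 - z_y) * mu        # about 8.24 per youth exposure
total_premium = 5000 * prem_a + 1000 * prem_y   # matches the 25,000 total loss
```

The match is exact (not just to rounding) because m_i (1 - Z_i) = k Z_i , so the shortfalls and excesses across classes cancel by construction.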
May 2001 #32
You are given the following experience for two insured groups:
                              Year
Group                   1      2      3    Total
1     # of members      8     12      5      25
      Average loss
      per member       96     91    113      97
2     # of members     25     30     20      75
      Average loss
      per member      113    111    116     113
Total # of members                          100
      Average loss
      per member                            109

\sum_{i=1}^{2}\sum_{j=1}^{3} m_{ij}\left(x_{ij} - \bar{x}_i\right)^2 = 2020

\sum_{i=1}^{2} m_i\left(\bar{x}_i - \bar{x}\right)^2 = 4800
i =1

Determine the nonparametric Empirical Bayes credibility premium for group 1, using the
method that preserves the total loss.

Solution

\widehat{EV} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}\left(X_{ij} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)} = \frac{2020}{2 + 2} = 505

\widehat{VE} = \frac{\sum_{i=1}^{r} m_i\left(\bar{X}_i - \bar{X}\right)^2 - (r - 1)\widehat{EV}}{m - \frac{1}{m}\sum_{i=1}^{r} m_i^2} = \frac{4800 - (2 - 1)(505)}{100 - \frac{1}{100}\left(25^2 + 75^2\right)} = 114.533

k = \frac{\widehat{EV}}{\widehat{VE}} = \frac{505}{114.533} = 4.409

Z_1 = \frac{m_1}{m_1 + k} = \frac{25}{25 + 4.409} = 0.85 , \qquad Z_2 = \frac{m_2}{m_2 + k} = \frac{75}{75 + 4.409} = 0.944

Please don't write Z_1 = \frac{n}{n + k} . As mentioned before, in the Bühlmann-Straub model, Z = \frac{m}{m + k} , not Z = \frac{n}{n + k} .

\hat{\mu} = \frac{\sum_{i=1}^{r} Z_i \bar{X}_i}{\sum_{i=1}^{r} Z_i} = \frac{0.85(97) + 0.944(113)}{0.85 + 0.944} = 105.42

Z_1 \bar{X}_1 + (1 - Z_1)\hat{\mu} = 0.85(97) + (1 - 0.85)(105.42) = 98.26

Nov 2001 #30


You are making credibility estimates for regional rating factors. You observe that the Bühlmann-Straub nonparametric empirical Bayes method can be applied, with the rating factor playing the role of pure premium. X_{ij} denotes the rating factor for region i and year j , where i = 1, 2, 3 and j = 1, 2, 3, 4 . Corresponding to each rating factor is the number of reported claims, m_{ij} , measuring exposure.

You are given:

i    m_i = \sum_{j=1}^{4} m_{ij}    \bar{X}_i    \hat{v}_i = \frac{1}{3}\sum_{j=1}^{4} m_{ij}\left(X_{ij} - \bar{X}_i\right)^2    m_i\left(\bar{X}_i - \bar{X}\right)^2
1    50                             1.406        0.536                                                                         0.887
2    300                            1.298        0.125                                                                         0.191
3    150                            1.178        0.172                                                                         1.348

Determine the credibility estimate of the rating factor for region 1 using the method that preserves \sum_{i=1}^{3} m_i \bar{X}_i .
i =1

Solution

\widehat{EV} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}\left(X_{ij} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)} = \frac{3(0.536) + 3(0.125) + 3(0.172)}{(4 - 1) + (4 - 1) + (4 - 1)} = 0.2777

\widehat{VE} = \frac{\sum_{i=1}^{r} m_i\left(\bar{X}_i - \bar{X}\right)^2 - (r - 1)\widehat{EV}}{m - \frac{1}{m}\sum_{i=1}^{r} m_i^2} = \frac{0.887 + 0.191 + 1.348 - 0.2777(2)}{500 - \frac{1}{500}\left(50^2 + 300^2 + 150^2\right)} = \frac{1.8706}{270} = 0.006928

k = \frac{\widehat{EV}}{\widehat{VE}} = \frac{0.2777}{0.006928} = 40.08

Z_1 = \frac{m_1}{m_1 + k} = \frac{50}{50 + 40.08} = 0.555

Z_2 = \frac{m_2}{m_2 + k} = \frac{300}{300 + 40.08} = 0.882

Z_3 = \frac{m_3}{m_3 + k} = \frac{150}{150 + 40.08} = 0.789

\hat{\mu} = \frac{\sum_{i=1}^{3} Z_i \bar{X}_i}{\sum_{i=1}^{3} Z_i} = \frac{0.555(1.406) + 0.882(1.298) + 0.789(1.178)}{0.555 + 0.882 + 0.789} = 1.2824

Z_1 \bar{X}_1 + (1 - Z_1)\hat{\mu} = 0.555(1.406) + (1 - 0.555)(1.2824) = 1.35

Nov 2004 #17


You are given the following commercial automobile policy experience:

Company                       Year 1     Year 2     Year 3
I     Losses                  50,000     50,000        ?
      # of automobiles           100        200        ?
II    Losses                       ?    150,000    150,000
      # of automobiles             ?        500        300
III   Losses                 150,000          ?    150,000
      # of automobiles            50          ?        150

Determine the nonparametric Bayes credibility factor, Z , for Company III.



Solution

Company     X_{i1}                       X_{i2}                        \bar{X}_i
I           50,000/100 = 500             50,000/200 = 250              100,000/300 = 333.33
II          150,000/500 = 300            150,000/300 = 500             300,000/800 = 375
III         150,000/50 = 3,000           150,000/150 = 1,000           300,000/200 = 1,500

\bar{X} = \frac{100{,}000 + 300{,}000 + 300{,}000}{300 + 800 + 200} = 538.46

\widehat{EV} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}\left(X_{ij} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)}

= \frac{100(500 - 333.33)^2 + 200(250 - 333.33)^2 + 500(300 - 375)^2 + 300(500 - 375)^2 + 50(3{,}000 - 1{,}500)^2 + 150(1{,}000 - 1{,}500)^2}{(2 - 1) + (2 - 1) + (2 - 1)}

= 53{,}888{,}889

\widehat{VE} = \frac{\sum_{i=1}^{r} m_i\left(\bar{X}_i - \bar{X}\right)^2 - (r - 1)\widehat{EV}}{m - \frac{1}{m}\sum_{i=1}^{r} m_i^2}

= \frac{300(333.33 - 538.46)^2 + 800(375 - 538.46)^2 + 200(1{,}500 - 538.46)^2 - 53{,}888{,}889(3 - 1)}{1{,}300 - \frac{1}{1{,}300}\left(300^2 + 800^2 + 200^2\right)}

= 157{,}035.6

k = \frac{\widehat{EV}}{\widehat{VE}} = \frac{53{,}888{,}889}{157{,}035.6} = 343.16 , \qquad Z = \frac{200}{200 + 343.16} = 0.368
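A sketch of the same computation (the container layout is my own): each company contributes only the years in which it was actually observed, which is why the denominator of \widehat{EV} uses n_i - 1 per company.

```python
# Nov 2004 #17 sketch: each cell holds (exposure, average loss per exposure)
# for whichever two years each company was observed.
data = {
    "I":   [(100, 500), (200, 250)],
    "II":  [(500, 300), (300, 500)],
    "III": [(50, 3000), (150, 1000)],
}
m_i = {c: sum(m for m, _ in cells) for c, cells in data.items()}
xb_i = {c: sum(m * x for m, x in cells) / m_i[c] for c, cells in data.items()}
m_tot = sum(m_i.values())
xb = sum(m_i[c] * xb_i[c] for c in data) / m_tot          # about 538.46

ev = sum(m * (x - xb_i[c]) ** 2
         for c, cells in data.items() for m, x in cells) \
     / sum(len(cells) - 1 for cells in data.values())     # about 53.9 million
ve = (sum(m_i[c] * (xb_i[c] - xb) ** 2 for c in data) - (3 - 1) * ev) \
     / (m_tot - sum(v ** 2 for v in m_i.values()) / m_tot)
z_iii = m_i["III"] / (m_i["III"] + ev / ve)               # about 0.368
```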
May 2005 #25

You are given:


Group                  Year 1     Year 2     Year 3     Total
1     Total Claims     10,000     15,000                25,000
      # in Group           50         60                   110
      Average             200        250                227.27
2     Total Claims     16,000     18,000                34,000
      # in Group          100         90                   190
      Average             160        200                178.95
      Total Claims                                      59,000
      # in Group                                           300
      Average                                           196.67

You are also given a = 651.03

Use the nonparametric empirical Bayes method to estimate the credibility factor for
Group 1.

Solution

\widehat{EV} = \frac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}\left(X_{ij} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)}

= \frac{50(200 - 227.27)^2 + 60(250 - 227.27)^2 + 100(160 - 178.95)^2 + 90(200 - 178.95)^2}{(2 - 1) + (2 - 1)}

= 71{,}985.65

Z_1 = \frac{110}{110 + \frac{71{,}985.65}{651.03}} = 0.5

Semi-parametric Bayes estimate

We have a parametric model for X given \theta , but we don't have a parametric model for \theta (hence the name semi-parametric). Typically, a problem will tell us that X given \theta is a Poisson random variable with mean \theta .

This is how to find EV and VE. Since X given \theta is Poisson, we have:

E(X \mid \theta) = Var(X \mid \theta) = \theta \implies E_\theta\left[E(X \mid \theta)\right] = E_\theta\left[Var(X \mid \theta)\right] , i.e. \mu = EV .

However, Var(X) = E_\theta\left[Var(X \mid \theta)\right] + Var_\theta\left[E(X \mid \theta)\right] = EV + VE

\implies VE = Var(X) - EV .
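The semi-parametric recipe fits in one small helper. This is a minimal sketch (the function name is mine) for one year of experience on a single insured, so n = 1 in the credibility factor:

```python
def semiparametric_poisson(counts):
    """counts[k] = number of insureds observed with k claims.
    Returns (mu-hat, VE-hat, Z) using mu-hat = EV-hat = sample mean and
    VE-hat = unbiased sample variance minus sample mean, with n = 1."""
    n = sum(counts.values())
    mean = sum(k * c for k, c in counts.items()) / n
    var = sum(c * (k - mean) ** 2 for k, c in counts.items()) / (n - 1)
    ve = var - mean              # Var(X) = EV + VE with EV = mean
    z = 1 / (1 + mean / ve)      # Z = n / (n + EV/VE), n = 1
    return mean, ve, z
```

On the May 2000 #33 data worked next, this returns a credibility factor of about 0.073.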

May 2000 #33


The number of claims a driver has during the year is assumed to be Poisson distributed
with an unknown mean that varies by driver.

The experience for 100 drivers is as follows:

# of claims during the year # of drivers


0 54
1 33
2 10
3 2
4 1

Determine the credibility of one year's experience for a single driver using semiparametric empirical Bayes estimation.

Solution

Let X represent the # of claims in a year and \theta represent the mean of X . We are told that X given \theta is a Poisson random variable.

\mu = E(X) = E_\theta\left[E(X \mid \theta)\right] = E(\theta) , \qquad EV = E_\theta\left[Var(X \mid \theta)\right] = E(\theta)
\hat{\mu} = \bar{X} = \frac{54(0) + 33(1) + 10(2) + 2(3) + 1(4)}{54 + 33 + 10 + 2 + 1} = \frac{63}{100} = 0.63 , \qquad \widehat{EV} = \hat{\mu} = 0.63

\widehat{Var}(X) = \frac{1}{100 - 1}\sum_{i=1}^{100}\left(X_i - \bar{X}\right)^2 = \frac{54(0 - 0.63)^2 + 33(1 - 0.63)^2 + 10(2 - 0.63)^2 + 2(3 - 0.63)^2 + 1(4 - 0.63)^2}{100 - 1} = 0.68

Var(X) = E_\theta\left[Var(X \mid \theta)\right] + Var_\theta\left[E(X \mid \theta)\right] = EV + VE

\widehat{VE} = \widehat{Var}(X) - \widehat{EV} = 0.68 - 0.63 = 0.05

We need to calculate Z for a single driver. So n = 1 .

Z = \frac{n}{n + \frac{EV}{VE}} = \frac{1}{1 + \frac{0.63}{0.05}} = 0.073

When taking the exam, you should use the BA II Plus/Professional 1-V Statistics Worksheet to quickly calculate the sample mean and the sample variance.

Nov 2000 #7
The following information comes from a study of robberies of convenience stores over
the course of a year:

X_i is the number of robberies of the i -th store, with i = 1, 2, ..., 500
\sum X_i = 50
\sum X_i^2 = 220
The number of robberies of a given convenience store during the year is assumed to be Poisson distributed with an unknown mean that varies by store.

Determine the semiparametric empirical Bayes estimate of the expected number of robberies next year of a store that reported no robberies during the studied year.

Solution



\hat{\mu} = \widehat{EV} = \bar{X} = \frac{50}{500} = 0.1

\widehat{Var}(X) = \frac{1}{n - 1}\sum_{i=1}^{500}\left(X_i - \bar{X}\right)^2

A general formula you should memorize:

\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2

To see why this formula works, notice that the (biased) sample variance is:

\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = E(X^2) - E^2(X) = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2

\implies \sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2

\widehat{Var}(X) = \frac{1}{500 - 1}\sum_{i=1}^{500}\left(X_i - \bar{X}\right)^2 = \frac{1}{500 - 1}\left(\sum_{i=1}^{500} X_i^2 - 500\bar{X}^2\right) = \frac{220 - 5}{499} = 0.43086

\widehat{VE} = \widehat{Var}(X) - \widehat{EV} = 0.43086 - 0.1 = 0.33086

For a single policy, n = 1 :

Z = \frac{n}{n + \frac{EV}{VE}} = \frac{1}{1 + \frac{0.1}{0.33086}} = 0.768

The single store didn't have any robberies during the studied year. So its sample mean is zero:

P = Z\bar{X} + (1 - Z)\hat{\mu} = (1 - Z)\hat{\mu} = (1 - 0.768)(0.1) = 0.0232

Nov 2004 #37


For a portfolio of motorcycle insurance policyholders, you are given:
The number of claims for each policyholder has a conditional Poisson distribution
For Year 1, the following data are observed:

Number of claims Number of Policyholders
0 2000
1 600
2 300
3 80
4 20
Total 3000

Determine the credibility factor Z for Year 2.

Solution

Enter the following in BA II Plus/Professional 1-V Statistics Worksheet:

X01=0, Y01=2000
X02=1, Y02= 600
X03=2, Y03= 300
X04=3, Y04= 80
X05=4, Y05= 20

You should get:

The sample mean is \bar{X} = 0.50666667 \approx 0.507 . This is \hat{\mu} and \widehat{EV} .
The sample standard deviation is S_X = 0.83077411 .
The sample variance is S_X^2 = 0.83077411^2 = 0.69018562 \approx 0.69019 . This is \widehat{Var}(X) .

So \widehat{VE} = \widehat{Var}(X) - \widehat{EV} = 0.69019 - 0.507 = 0.183

Z = \frac{n}{n + \frac{EV}{VE}} = \frac{1}{1 + \frac{0.507}{0.183}} = 0.265

Please note that n = 1 (we have only one year's data).

May 2005 #28

During a 2-year period, 100 policies had the following claim experience:
Number of claims in Year 1 and Year 2 Number of Policyholders
0 50
1 30
2 15
3 4
4 1
The number of claims per year follows a Poisson distribution.
Each policyholder was insured for the entire 2-year period.

A randomly selected policyholder had one claim over the 2-year period.

Using semiparametric empirical Bayes estimation, determine the Bühlmann estimate for the number of claims in Year 3 for the same policyholder.

Solution

We'll use a 2-year period as one unit of time. So we'll calculate the Bühlmann estimate of the number of claims in Years 3 and 4 combined. Then half of this amount will be the Bühlmann estimate for the number of claims in Year 3.

Enter the following in BA II Plus/Professional 1-V Statistics Worksheet:

X01=0, Y01=50
X02=1, Y02=30
X03=2, Y03=15
X04=3, Y04= 4
X05=4, Y05= 1

You should get:

The sample mean is \bar{X} = 0.76 . This is \hat{\mu} and \widehat{EV} .
The sample standard deviation is S_X = 0.92244734 .
The sample variance is S_X^2 = 0.92244734^2 = 0.85090909 \approx 0.851 . This is \widehat{Var}(X) .

\widehat{VE} = \widehat{Var}(X) - \widehat{EV} = 0.851 - 0.76 = 0.091

Z = \frac{n}{n + \frac{EV}{VE}} = \frac{1}{1 + \frac{0.76}{0.091}} = 0.107

A randomly selected policyholder had one claim over the 2-year period. So the sample claim frequency for the 2-year period is \bar{X} = 1 .

P = Z\bar{X} + (1 - Z)\hat{\mu} = 0.107(1) + (1 - 0.107)(0.76) \approx 0.786

\frac{1}{2}P = \frac{1}{2}(0.786) = 0.393
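The two-year device can be scripted directly; this sketch (all names are mine) treats each policyholder's 2-year count as a single observation and then halves the credibility estimate:

```python
# May 2005 #28 sketch: claim counts over the 2-year period per policyholder.
counts = {0: 50, 1: 30, 2: 15, 3: 4, 4: 1}
n = sum(counts.values())                                  # 100 policies
mean = sum(k * c for k, c in counts.items()) / n          # 0.76
var = sum(c * (k - mean) ** 2 for k, c in counts.items()) / (n - 1)
z = 1 / (1 + mean / (var - mean))                         # about 0.107
two_year_estimate = z * 1 + (1 - z) * mean                # 1 claim observed
year3_estimate = two_year_estimate / 2                    # about 0.393
```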

Chapter 8 Limited fluctuation credibility
The study note titled "Chapter 8 Credibility," jointly written by Mahler and Dean, provides an excellent explanation of the limited fluctuation credibility theory. Please read this study note along with my explanation.

The goal of the limited fluctuation credibility model is the same as the goal of the Bühlmann credibility model. We observe that a policyholder has incurred claim dollar amounts S_1, S_2, ..., S_n in Years 1, 2, ..., n respectively. We want to estimate the policyholder's renewal premium in Year n + 1 . The renewal premium in Year n + 1 is E(S_{n+1} \mid S_1, S_2, ..., S_n) , the expected claim dollar amount in Year n + 1 .

Here I wrote the past n years' claim amounts as S_1, S_2, ..., S_n instead of X_1, X_2, ..., X_n as in the Bühlmann credibility model. There's a reason for using a different notation. In the limited fluctuation credibility model, we typically break down the annual claim dollar amount S into two components:

the number of claims incurred by a policyholder in a year (loss frequency)
the claim dollar amount per loss incurred by a policyholder in a year (loss severity)

Mathematically, S = \sum_{i=1}^{N} X_i . Here N is the total number of claims incurred in a year (loss frequency) by a policyholder. X_i is the claim dollar amount of the i -th claim (loss severity) incurred by the policyholder. S is the total claim dollar amount incurred in a year (also called the annual aggregate claim) by the policyholder. In contrast, in the Bühlmann credibility model, we don't break down the annual claim dollar amount into loss frequency and loss severity.

In the limited fluctuation credibility model, we assume, as in the Bühlmann credibility model, that the renewal premium is the weighted average of the global premium rate \mu (called the manual rate) and the sample mean \bar{S} = \frac{1}{n}(S_1 + S_2 + ... + S_n) :

P = E(S_{n+1} \mid S_1, S_2, ..., S_n) = Z\bar{S} + (1 - Z)\mu

Here P is the renewal premium, \bar{S} is the policyholder-specific sample mean, and \mu is the global mean (manual rate). \bar{S} is specific to a policyholder: different policyholders have different claim amounts S_1, S_2, ..., S_n and hence different \bar{S} . However, \mu is the same for all policyholders regardless of their different claim histories.

The limited fluctuation credibility model assumes that the above renewal premium equation automatically holds true without any proof. This equation is the starting point for the limited fluctuation credibility. So when you study the limited fluctuation credibility, you'll need to accept the above equation without demanding proof.

In contrast, the Bühlmann credibility theory doesn't assume the above equation holds true automatically. It derives this equation using basic probability theories.

Next, we need to calculate the weighting factor Z ( 0 \le Z \le 1 ), which is the credibility assigned to the sample mean \bar{S} . The limited fluctuation credibility calculates Z as follows:

Z = \sqrt{\frac{\text{\# of observations you actually have}}{\text{expected \# of observations needed to make } Z = 1}} = \sqrt{\frac{\text{your } n}{E(N) \text{ to make } Z = 1}}

If Z calculated above exceeds one, we'll set Z = 1 .

Once again, the limited fluctuation credibility assumes that this formula for Z holds true automatically without the need to prove it. So you need to accept it without demanding any proof. The core theory of the limited fluctuation credibility is to calculate E(N) to make Z = 1 .

General credibility model for the aggregate loss of r insureds

We first derive a model for r insureds. Then to calculate the renewal premium for one insured, we just set r = 1 .

The aggregate annual loss for r insureds is:

S = \sum_{i=1}^{M} X_i = X_1 + X_2 + ... + X_M

Here X_i is the dollar amount of the i -th claim. M = \sum_{j=1}^{r} N_j = N_1 + N_2 + ... + N_r is the total # of annual claims for r insureds; N_j is the number of claims incurred by the j -th insured.

We assume that X_1, X_2, ..., X_M are independent identically distributed with a common pdf f_X(x) ; N_1, N_2, ..., N_r are independent identically distributed with a common pdf f_N(n) .

We arbitrarily set Z = 1 if E(M) satisfies the following equation:

P\left[\left|S - E(S)\right| \le k\,E(S)\right] \ge p \iff P\left[\frac{\left|S - E(S)\right|}{\sigma_S} \le \frac{k\,E(S)}{\sigma_S}\right] \ge p

A simplifying assumption is that \frac{S - E(S)}{\sigma_S} is approximately normal. Set Z = \frac{S - E(S)}{\sigma_S} . Then Z is approximately a standard normal random variable, and the condition becomes

P\left[\left|Z\right| \le \frac{k\,E(S)}{\sigma_S}\right] \ge p

Please note that for a non-negative a ,

P\left(\left|Z\right| \le a\right) = P(-a \le Z \le a) = \Phi(a) - \Phi(-a) .

However, \Phi(a) + \Phi(-a) = 1 . This holds whether a is positive, negative, or zero. So

P\left(\left|Z\right| \le a\right) = 2\Phi(a) - 1 , \qquad P\left[\left|Z\right| \le \frac{k\,E(S)}{\sigma_S}\right] = 2\Phi\left(\frac{k\,E(S)}{\sigma_S}\right) - 1 \ge p

Let's consider the worst case P\left[\left|S - E(S)\right| \le k\,E(S)\right] = p , or 2\Phi\left(\frac{k\,E(S)}{\sigma_S}\right) - 1 = p . We still set Z = 1 for this worst case. Then

2\Phi\left(\frac{k\,E(S)}{\sigma_S}\right) = 1 + p , \qquad \Phi\left(\frac{k\,E(S)}{\sigma_S}\right) = \frac{1 + p}{2}

Define CV_S = \frac{\sigma_S}{E(S)} as the coefficient of variation. It's the standard deviation divided by the mean. Then

\frac{k}{CV_S} = \Phi^{-1}\left(\frac{1 + p}{2}\right) , \qquad k = \Phi^{-1}\left(\frac{1 + p}{2}\right) CV_S

Next, define \Phi(y) = \frac{1 + p}{2} , or y = \Phi^{-1}\left(\frac{1 + p}{2}\right) . Then k = y\,CV_S .

Key interim formula: credibility for the aggregate loss

As actuaries, we set k and p . Then we find E(N) to make Z = 1 by solving the equation \frac{k}{CV_S} = \Phi^{-1}\left(\frac{1 + p}{2}\right) , or k = \Phi^{-1}\left(\frac{1 + p}{2}\right) CV_S = y\,CV_S .

Next, let's derive the full formula.

E(M) = E(N_1 + N_2 + ... + N_r) = r\,E(N)
Var(M) = Var(N_1 + N_2 + ... + N_r) = r\,Var(N)

E(S) = E(X_1 + X_2 + ... + X_M) = E(M)\,E(X) = r\,E(N)\,E(X)
Var(S) = Var(X_1 + X_2 + ... + X_M) = E(M)\,Var(X) + Var(M)\,E^2(X) = r\left[E(N)\,Var(X) + Var(N)\,E^2(X)\right]

CV_S = \frac{\sigma_S}{E(S)} = \frac{\sqrt{r\left[E(N)\,Var(X) + Var(N)\,E^2(X)\right]}}{r\,E(N)\,E(X)}

k = y\,CV_S = y\,\frac{\sqrt{r\left[E(N)\,Var(X) + Var(N)\,E^2(X)\right]}}{r\,E(N)\,E(X)}

\left(\frac{k}{y}\right)^2 = \frac{r\left[E(N)\,Var(X) + Var(N)\,E^2(X)\right]}{\left[r\,E(N)\,E(X)\right]^2} = \frac{E(N)\,Var(X) + Var(N)\,E^2(X)}{r\,E^2(N)\,E^2(X)}

\left(\frac{k}{y}\right)^2 = \frac{1}{r\,E(N)}\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] , \qquad r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]

Final formula you need to memorize (you also know how to derive it from scratch; this is the mother of all the formulas for the limited fluctuation credibility model):

If r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] = \left(\frac{y}{k}\right)^2\left[CV_X^2 + \frac{Var(N)}{E(N)}\right] , then Z = 1 . Here y = \Phi^{-1}\left(\frac{1 + p}{2}\right) .

Please note that r is the number of insureds needed to achieve full credibility and E(N) is the expected number of annual claims per insured. So r\,E(N) , the number of insureds in the book of business times the expected number of claims per insured, represents the expected number of claims the insurer needs to have in its book of business to have full credibility.

One easy mistake made by many is to write

r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E(X)} + \frac{Var(N)}{E(N)}\right] Wrong!

To remember the term \frac{Var(X)}{E^2(X)} , please note that X is the claim dollar amount. So E(X) is in dollars and Var(X) is in dollars squared. To have a meaningful ratio, we need to square E(X) so the numerator and denominator are both in dollars squared.

Please also note that \frac{Var(N)}{E(N)} is fine as it stands. Here N is the claim count. So Var(N) is a pure number and E(N) is a pure number, and the ratio \frac{Var(N)}{E(N)} is fine.

Once again, remember that X is the dollar amount of a single claim incurred by one policyholder and that N is the annual number of claims incurred by the policyholder.
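The mother formula and the square-root rule for partial credibility fit in two small helpers. This is a hedged sketch (function names are mine); `y` is the standard normal quantile \Phi^{-1}\left(\frac{1+p}{2}\right) , e.g. 1.645 for p = 90% and 1.96 for p = 95%:

```python
from math import sqrt

def full_credibility_claims(y, k, cv_x_sq, var_over_mean_n):
    """Expected # of claims for Z = 1: (y/k)^2 (CV_X^2 + Var(N)/E(N))."""
    return (y / k) ** 2 * (cv_x_sq + var_over_mean_n)

def partial_z(n, n_full):
    """Square-root rule used in the exam solutions below, capped at 1."""
    return min(sqrt(n / n_full), 1.0)
```

For Poisson claim counts, Var(N)/E(N) = 1; for a frequency-only standard, set cv_x_sq = 0.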

Special case

Credibility formulas for the aggregate loss for one insured (credibility in terms of the expected number of annual claims)

Set r = 1 .

If E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] with y = \Phi^{-1}\left(\frac{1 + p}{2}\right) , then Z = 1 .

Z = \min\left(\sqrt{\frac{\text{your } n}{E(N) \text{ to make } Z = 1}},\ 1\right) = \min\left(\sqrt{\frac{\text{your } n}{n_0\left[CV_X^2 + \frac{Var(N)}{E(N)}\right]}},\ 1\right) , \qquad n_0 = \left(\frac{y}{k}\right)^2

May 2000 #26


You are given:
Claim counts follow a Poisson distribution
Claim sizes follow a lognormal distribution with coefficient of variation of 3
Claim sizes and claim counts are independent
The number of claims in the 1st year is 1,000
The aggregate loss in the 1st year was 6.75 million
The manual premium for the 1st year was 5 million
The exposure in the 2nd year is identical to the exposure in the 1st year
The full credibility standard is to be within 5% of the expected aggregate loss
95% of the time

Determine the limited fluctuation credibility net premium (in millions) for the 2nd year.

Solution

We are asked to find the limited fluctuation credibility renewal net premium for Year 2.
So we are just concerned with one policy (or one insured). Set r = 1 .

Credibility for the aggregate loss



E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] = \left(\frac{\Phi^{-1}\left(\frac{1 + 95\%}{2}\right)}{5\%}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]

We are told that the claim size X is lognormal with a coefficient of variation of 3. The information that X is lognormal is not needed; SOA just wants to scare us. What matters is CV_X = 3 . In addition, we know that N is Poisson. So \frac{Var(N)}{E(N)} = 1 .

So to have full credibility Z = 1 , the expected number of claims is:

E(N) = \left(\frac{1.96}{5\%}\right)^2\left(3^2 + 1\right) = 10\left(\frac{1.96}{5\%}\right)^2 = 15{,}366.4

Z = \min\left(\sqrt{\frac{\text{your } n}{E(N) \text{ to make } Z = 1}},\ 1\right) = \min\left(\sqrt{\frac{1{,}000}{10\left(\frac{1.96}{5\%}\right)^2}},\ 1\right) = 0.255

P = E(S_{n+1} \mid S_1, S_2, ..., S_n) = Z\bar{S} + (1 - Z)\mu = 0.255(6.75) + (1 - 0.255)(5) = 5.446
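A quick sketch of this solution's arithmetic (the square-root rule applies because 1,000 claims falls short of the full-credibility standard):

```python
from math import sqrt

# May 2000 #26: Poisson counts (Var(N)/E(N) = 1), severity CV of 3.
n_full = (1.96 / 0.05) ** 2 * (3 ** 2 + 1)   # 15,366.4 claims for Z = 1
z = min(sqrt(1000 / n_full), 1.0)            # about 0.255
premium = z * 6.75 + (1 - z) * 5.0           # millions; about 5.45
```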

Nov 2000 #14

For an insurance policy, you are given:


For each individual insured, the number of claims follows a Poisson distribution
The mean claim count varies by insured, and the distribution of mean claim
counts follow a gamma distribution
For a random sample of 1000 insureds, the observed claim counts are as follows:

# of claims, n        0      1      2      3      4      5
# of insureds, f_n    512    307    123    41     11     6

\sum n f_n = 750 , \qquad \sum n^2 f_n = 1494

Claim sizes follow a Pareto distribution with mean 1500 and variance 6,750,000.
Claim sizes and claim counts are independent
The full credibility standard is to be within 5% of the expected aggregate loss 95% of the time.

Determine the minimum number of insureds needed for the aggregate loss to be fully
credible.

Solution
r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] \implies r = \frac{1}{E(N)}\left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]

y = \Phi^{-1}\left(\frac{1 + 95\%}{2}\right) = 1.96 , \qquad k = 5\%

CV_X^2 = \frac{Var(X)}{E^2(X)} = \frac{6{,}750{,}000}{1{,}500^2} = 3

We can use the method of moments to estimate \frac{Var(N)}{E(N)} :

E(N) = \frac{\sum n f_n}{\sum f_n} = \frac{750}{1{,}000} = 0.75 , \qquad E(N^2) = \frac{\sum n^2 f_n}{\sum f_n} = \frac{1{,}494}{1{,}000} = 1.494

Var(N) = 1.494 - 0.75^2 = 0.9315 , \qquad \frac{Var(N)}{E(N)} = \frac{0.9315}{0.75} = 1.242

r = \frac{1}{E(N)}\left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] = \frac{1}{0.75}\left(\frac{1.96}{5\%}\right)^2(3 + 1.242) = \frac{6{,}518.43}{0.75} = 8{,}691.24

Nov 2001 #15

You are given the following information about a general liability book of business
comprised of 2500 insureds:
X_i = \sum_{j=1}^{N_i} Y_{ij} is a random variable representing the annual loss of the i -th insured.
N_1, N_2, ..., N_{2500} are independent and identically distributed random variables following a negative binomial distribution with parameters r = 2 and \beta = 0.2 .
Y_{i1}, Y_{i2}, ..., Y_{iN_i} are independent and identically distributed random variables following a Pareto distribution with parameters \alpha = 3 and \theta = 1000 .
The full credibility standard is to be within 5% of the expected aggregate losses 90% of the time.

Using classical credibility theory, determine the partial credibility of the annual loss
experience for the book of business.

Solution

First, let's calculate the # of insureds needed to have full credibility.

r = \frac{1}{E(N)}\left(\frac{y}{k}\right)^2\left[\frac{Var(Y)}{E^2(Y)} + \frac{Var(N)}{E(N)}\right]

However, \frac{Var(Y)}{E^2(Y)} = \frac{E(Y^2) - E^2(Y)}{E^2(Y)} = \frac{E(Y^2)}{E^2(Y)} - 1 , so

r = \frac{1}{E(N)}\left(\frac{y}{k}\right)^2\left[\frac{E(Y^2)}{E^2(Y)} - 1 + \frac{Var(N)}{E(N)}\right]

N is negative binomial with parameters r = 2 and \beta = 0.2 .

E(N) = r\beta = 2(0.2) = 0.4 , \qquad Var(N) = r\beta(1 + \beta) , \qquad \frac{Var(N)}{E(N)} = 1 + \beta = 1 + 0.2 = 1.2

Y is a 2-parameter Pareto with \alpha = 3 and \theta = 1000 .
E(Y^k) = \frac{\theta^k\,k!}{(\alpha - 1)(\alpha - 2)...(\alpha - k)} , \qquad E(Y) = \frac{\theta}{\alpha - 1} , \qquad E(Y^2) = \frac{2\theta^2}{(\alpha - 1)(\alpha - 2)}

\frac{E(Y^2)}{E^2(Y)} = \frac{\frac{2\theta^2}{(\alpha - 1)(\alpha - 2)}}{\left(\frac{\theta}{\alpha - 1}\right)^2} = \frac{2(\alpha - 1)}{\alpha - 2} = \frac{2(3 - 1)}{3 - 2} = 4

You'll want to memorize that if Y is a 2-parameter Pareto, then \frac{E(Y^2)}{E^2(Y)} = \frac{2(\alpha - 1)}{\alpha - 2} .

r = \frac{1}{E(N)}\left(\frac{y}{k}\right)^2\left[\frac{E(Y^2)}{E^2(Y)} - 1 + \frac{Var(N)}{E(N)}\right] = \frac{1}{0.4}\left(\frac{1.645}{5\%}\right)^2(4 - 1 + 1.2) = \frac{4.2}{0.4}\left(\frac{1.645}{5\%}\right)^2 = 10.5\left(\frac{1.645}{5\%}\right)^2

Here y = \Phi^{-1}\left(\frac{1 + 90\%}{2}\right) = 1.645 .

Please note that many times it's advantageous not to expand \left(\frac{y}{k}\right)^2 . For example, in this problem, it's not necessary to calculate

10.5\left(\frac{1.645}{5\%}\right)^2 = 11{,}365.305

Let's continue. 10.5\left(\frac{1.645}{5\%}\right)^2 is the number of insureds needed to get full credibility. However, the number of insureds in the book of business is 2500.

Z = \sqrt{\frac{\text{your } r}{r \text{ to make } Z = 1}} = \sqrt{\frac{2{,}500}{10.5\left(\frac{1.645}{5\%}\right)^2}} = \frac{50}{\sqrt{10.5}\left(\frac{1.645}{5\%}\right)} = 0.469
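The same arithmetic as a sketch (the moment ratios come from the Exam C distribution table; variable names are mine):

```python
from math import sqrt

# Nov 2001 #15: negative binomial counts (r=2, beta=0.2) and Pareto
# severity (alpha=3), where E(Y^2)/E(Y)^2 = 2(alpha-1)/(alpha-2).
alpha, r_nb, beta = 3, 2, 0.2
e_n = r_nb * beta                                  # 0.4 claims per insured-year
sev_ratio = 2 * (alpha - 1) / (alpha - 2)          # 4
insureds_full = (1.645 / 0.05) ** 2 * (sev_ratio - 1 + (1 + beta)) / e_n
z = sqrt(2500 / insureds_full)                     # about 0.469
```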

Nov 2002 #14


You are given the following information about a commercial auto liability book of business:

Each insured's claim count has a Poisson distribution with mean \lambda , where \lambda has a gamma distribution with \alpha = 1.5 and \theta = 0.2

Individual claim size amounts are independent and exponentially distributed with mean 5000

The full credibility standard is for the aggregate losses to be within 5% of the expected with probability 0.9

Using classical credibility, determine the expected number of claims required for full
credibility.

Solution
r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]

is the expected # of claims the insurer needs to have in its book of business to have full credibility.

y = \Phi^{-1}\left(\frac{1 + 90\%}{2}\right) = 1.645 , \qquad k = 5\%

N is the annual number of claims incurred by one insured. N given \lambda is Poisson and \lambda is gamma with parameters \alpha = 1.5 and \theta = 0.2 . So N is negative binomial with parameters r = \alpha = 1.5 and \beta = \theta = 0.2 .

E(N) = r\beta , \qquad Var(N) = r\beta(\beta + 1) , \qquad \frac{Var(N)}{E(N)} = 1 + \beta = 1 + 0.2 = 1.2

X is exponentially distributed, so \frac{Var(X)}{E^2(X)} = 1 .

r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] = \left(\frac{1.645}{5\%}\right)^2(1 + 1.2) = 2{,}381.3

So the insurer needs to have at least 2,381 expected claims in a year to have full credibility.

Please note that the following information is not necessary for us to solve the problem:

\alpha = 1.5 (one parameter of the gamma distribution). If N is negative binomial, then \frac{Var(N)}{E(N)} = 1 + \beta regardless of r .

The mean 5000 for the individual claim size random variable. If X is exponential, then \frac{Var(X)}{E^2(X)} = 1 regardless of the mean.

Nov 2003 #3
You are given:
The number of claims has a Poisson distribution
Claim sizes have a Pareto distribution with parameters \theta = 0.5 and \alpha = 6
The number of claims and claim sizes are independent
The observed pure premium should be within 2% of the expected pure premium
90% of the time.

Determine the expected number of claims needed for full credibility.

Solution

The pure premium is the expected total annual claim dollar amount incurred by one policyholder. Setting r = 1 , we have:

E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] = \left(\frac{y}{k}\right)^2\left[\frac{E(X^2)}{E^2(X)} - 1 + \frac{Var(N)}{E(N)}\right]

The claim size X has a Pareto distribution with parameters \theta = 0.5 and \alpha = 6 :

\frac{E(X^2)}{E^2(X)} = \frac{2(\alpha - 1)}{\alpha - 2} = \frac{2(6 - 1)}{6 - 2} = 2.5

N is Poisson. So \frac{Var(N)}{E(N)} = 1 .

E(N) = \left(\frac{y}{k}\right)^2\left[\frac{E(X^2)}{E^2(X)} - 1 + \frac{Var(N)}{E(N)}\right] = \left(\frac{1.645}{2\%}\right)^2(2.5 - 1 + 1) = \left(\frac{1.645}{2\%}\right)^2(2.5) = 16{,}912.66
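As a sketch: with Poisson counts the bracket collapses to E(X^2)/E^2(X), which for a two-parameter Pareto is 2(alpha-1)/(alpha-2):

```python
# Nov 2003 #3: pure premium standard, Poisson frequency, Pareto severity.
alpha = 6
sev_ratio = 2 * (alpha - 1) / (alpha - 2)    # E(X^2)/E(X)^2 = 2.5
n_full = (1.645 / 0.02) ** 2 * sev_ratio     # about 16,913 expected claims
```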

Nov 2004 #21


You are given:

The number of claims has probability function:

p(x) = \binom{m}{x} q^x (1 - q)^{m - x} , \qquad x = 0, 1, 2, ..., m

The actual number of claims must be within 1% of the expected number of claims with probability 0.95.

The expected number of claims for full credibility is 34,574.

Determine q .

Solution

r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right]

This problem is concerned only with loss frequency. So in the aggregate loss model S = \sum_{i=1}^{N} X_i , we set X_i = 1 . This way, S = N becomes the total number of claims. Setting Var(X) = Var(1) = 0 , we have:

r\,E(N) = \left(\frac{y}{k}\right)^2\,\frac{Var(N)}{E(N)}

Plugging in the numbers: p = 95\% , k = 1\% , \frac{Var(N)}{E(N)} = \frac{mq(1 - q)}{mq} = 1 - q

r\,E(N) = \left(\frac{1.96}{1\%}\right)^2(1 - q) = 34{,}574 \implies 1 - q = \frac{34{,}574}{38{,}416} = 0.9 \implies q = 0.1

May 2005 #2
You are given:

The number of claims follows a negative binomial distribution with parameters r and \beta = 3 .

Claim severity has the following distribution:

Claim Size      Probability
1               0.4
10              0.4
100             0.2

The number of claims is independent of the severity of claims.

Determine the expected number of claims needed for aggregate losses to be within 10%
of the expected aggregate losses with 95% probability.

Solution

Claim Size X    Probability
1               0.4
10              0.4
100             0.2

You can verify that E(X) = 24.4 , Var(X) = 1{,}445.04 .

The claim count N is negative binomial. So \frac{Var(N)}{E(N)} = 1 + \beta = 1 + 3 = 4 .

r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] = \left(\frac{1.96}{10\%}\right)^2\left(\frac{1{,}445.04}{24.4^2} + 4\right) = 2{,}469.06

Nov 2005 #35
You are given:

The number of claims follows a Poisson distribution

Claim sizes follow a gamma distribution with parameters \alpha (unknown) and \theta = 10{,}000

The number of claims and claim sizes are independent

The full credibility standard has been selected so that actual aggregate losses will be within 10% of the expected aggregate losses 95% of the time

Using limited fluctuation (classical) credibility, determine the expected number of claims
required for full credibility.

Solution

r\,E(N) = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)}\right] = \left(\frac{y}{k}\right)^2\left[\frac{Var(X)}{E^2(X)} + 1\right] = \left(\frac{y}{k}\right)^2\,\frac{Var(X) + E^2(X)}{E^2(X)} = \left(\frac{y}{k}\right)^2\,\frac{E(X^2)}{E^2(X)}

X is gamma. From the Exam C table, we know E(X) = \alpha\theta and E(X^2) = \alpha(\alpha + 1)\theta^2 .

r\,E(N) = \left(\frac{y}{k}\right)^2\,\frac{E(X^2)}{E^2(X)} = \left(\frac{1.96}{10\%}\right)^2\,\frac{\alpha + 1}{\alpha}

Since \alpha is unknown, we don't have enough information to find r\,E(N) .

Chapter 9 Bayesian estimate
Exam C routinely tests Bayesian premium problems. Though many seem to understand the theory behind Bayesian premiums, they have trouble calculating Bayesian premiums. Most candidates are weak in the following two areas:

When the prior probability is continuous, many candidates don't know how to calculate the posterior probability or how to find the Bayesian premium. Continuous-prior problems are typically harder than discrete-prior problems.

When the prior probability is discrete and the calculation is messy, many candidates don't know how to solve the problem in a few minutes. Many candidates have inefficient calculation methods that are long and prone to errors.

In this chapter, I will first give you an intuitive review of Bayes' Theorem. Next, I will give you a framework for quickly solving Bayesian premium problems whether the prior probability is discrete or continuous. In addition, I will give you a BA II Plus/BA II Plus Professional shortcut for calculating Bayesian premiums when the prior probability is discrete.

Even if you are proficient in Bayes' Theorem, I recommend that you still go over the review. It is the foundation for the framework and shortcut to be presented later.

Intuitive review of Bayes' Theorem

Prior probability. Before anything happens, as our baseline analysis, we believe (based on existing information we have up to now or using purely subjective judgment) that our total risk pool consists of several homogenous groups. As a part of our baseline analysis, we also assume that these homogenous groups have different sizes. Any insured person randomly chosen from the population is charged a weighted average premium.

As an over-simplified example, we can divide, by the aggressiveness of a person's driving habits, all insureds into two homogenous groups: aggressive drivers and non-aggressive drivers. In regards to the sizes of these two groups, we assume (based on existing information we have up to now or using purely subjective judgment) that the aggressive insureds account for 40% of the total insureds and the non-aggressive insureds account for the remaining 60%.

So for an average driver randomly chosen from the population, we charge a weighted average premium rate (we believe that an average driver has some aggressiveness and some non-aggressiveness):

Premium charged on a person randomly chosen from the population
= 40% * premium rate for an aggressive driver
+ 60% * premium rate for a non-aggressive driver

Posterior probability. Then after a year, an event changed our belief about the makeup
of the homogeneous groups for a specific insured. For example, we found that in one year
one particular insured had three car accidents while an average driver had only one accident
in the same time period. So the three-accident insured definitely involved more risk than
did the average driver randomly chosen from the population. As a result, the premium
rate for the three-accident insured should be higher than an average driver's premium
rate.

The new premium rate we will charge is still a weighted average of the rates for the two
homogeneous groups, except that we use a higher weighting factor for an aggressive
driver's rate and a lower weighting factor for a non-aggressive driver's rate.

For example, we can charge the following new premium rate:

Premium rate for a driver who had 3 accidents last year
= 67% × an aggressive driver's premium rate
+ 33% × a non-aggressive driver's premium rate

In other words, we still think this particular driver's risk consists of two risk groups,
aggressive and non-aggressive, but we alter the sizes of these two risk groups for this
specific insured. So instead of assuming that this person's risk consists of 40% of an
aggressive driver's risk and 60% of a non-aggressive driver's risk, we assume that his
risk consists of 67% of an aggressive driver's risk and 33% of a non-aggressive driver's
risk.

How do we come up with the new group sizes (or the new weighting factors)? There is a
specific formula for calculating the new group sizes:

For any given group,

Group size after an event
= K × (the group size before the event) × (this group's probability to make the event happen)

K is a scaling factor that makes the sum of the new sizes for all groups equal to 100%.

In our example above, this is how we got the new size for the aggressive group and the
new size for the non-aggressive group. Suppose we know that the probability for an
aggressive driver to have 3 car accidents in a year is 15%; the probability for a non-
aggressive driver to have 3 car accidents in a year is 5%. Then for the driver who has 3
accidents in a year,

the size of the aggressive risk for someone who had 3 accidents in a year
= K × (prior size of the aggressive risk)
× (probability of an aggressive driver having 3 car accidents in a year)
= K (40%)(15%)

the size of the non-aggressive risk for someone who had 3 accidents in a year
= K × (prior size of the non-aggressive risk)
× (probability of a non-aggressive driver having 3 car accidents in a year)
= K (60%)(5%)

K is a scaling factor such that the sum of the posterior sizes is equal to one. So

K (40%)(15%) + K (60%)(5%) = 1,  K = 1 / [40%(15%) + 60%(5%)] = 1/0.09 = 11.11

the size of the aggressive risk for someone who had 3 accidents in a year
= 11.11 (40%)(15%) = 66.67%

the size of the non-aggressive risk for someone who had 3 accidents in a year
= 11.11 (60%)(5%) = 33.33%
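The whole update can be checked in a few lines of Python (a verification sketch I added; the dictionary names are mine, the numbers come from the example above):

```python
# Two-group Bayes update: prior size x likelihood, then rescale by K.
priors = {"aggressive": 0.40, "non-aggressive": 0.60}
likelihoods = {"aggressive": 0.15, "non-aggressive": 0.05}  # P(3 accidents | group)

raw = {g: priors[g] * likelihoods[g] for g in priors}  # 0.06 and 0.03
k = 1 / sum(raw.values())                              # scaling factor K = 1/0.09
posterior = {g: k * raw[g] for g in raw}

print(round(k, 2))                            # 11.11
print(round(posterior["aggressive"], 4))      # 0.6667
print(round(posterior["non-aggressive"], 4))  # 0.3333
```

The posterior sizes automatically sum to one because of the scaling factor.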

The above logic should make intuitive sense. The bigger the size of the group prior to the
event, the higher the contribution this group will make to the event's occurrence; the bigger
the probability for this group to make the event happen, the higher the contribution this
group will make to the event's occurrence. So the product of the prior size of the group
and the group's probability to make the event happen captures this group's total
contribution to the event's occurrence.

If we assign the post-event size of a group proportional to the product of the prior size
and the group's probability to make the event happen, we are really assigning the post-event
size of a group proportional to this group's total contribution to the event's
occurrence. Again, this should make sense.

Let's summarize the logic for finding the new size of each group in the following table:



Event: An insured had 3 accidents in a year.

  A: Homogeneous     B: Before-event   C: Group's probability    D = (scaling factor K) × B × C:
  group (segment)    group size        to make the event happen  post-event group size
  Aggressive         40%               15%                       K(40%)(15%) = (40% × 15%) / (40% × 15% + 60% × 5%) = 66.67%
  Non-aggressive     60%               5%                        K(60%)(5%) = (60% × 5%) / (40% × 15% + 60% × 5%) = 33.33%

We can translate the above rule into a formal theorem:


If we divide the population into n non-overlapping groups G1,G 2, ...,Gn such that each
element in the population belongs to one and only one group, then after the event E
occurs,

Pr(Gi | E ) = K Pr(Gi ) Pr( E | Gi )

K is a scaling factor such that

Pr(G1 | E) + Pr(G2 | E) + ... + Pr(Gn | E) = 1

That is, K [Pr(G1) Pr(E | G1) + Pr(G2) Pr(E | G2) + ... + Pr(Gn) Pr(E | Gn)] = 1

So K = 1 / [Pr(G1) Pr(E | G1) + Pr(G2) Pr(E | G2) + ... + Pr(Gn) Pr(E | Gn)]

And Pr(Gi | E) = Pr(Gi) Pr(E | Gi) / [Pr(G1) Pr(E | G1) + Pr(G2) Pr(E | G2) + ... + Pr(Gn) Pr(E | Gn)]

Pr(Gi | E ) is the conditional probability that Gi will happen given the event E happened,
so it is called the posterior probability. Pr(Gi | E ) can be conveniently interpreted as the
new size of Group Gi after the event E happened. Intuitively, probability can often be
interpreted as a group size.

For example, if 55% of the candidates who pass Course 4 are female and 45% are male,
we can say that the total pool of the passing candidates consists of 2 groups, female and
male, with respective sizes of 55% and 45%.
Pr(Gi ) is the probability that Gi will happen prior to the event E's occurrence, so it's
called the prior probability. Pr(Gi ) can be conveniently interpreted as the size of group Gi
prior to the occurrence of E.

Pr( E | Gi ) is the conditional probability that E will happen given Gi has happened. It is
Group Gi's probability of making the event E happen. For example, say a candidate who
has passed Course 3 has a 50% chance of passing Course 4; that is to say:

Pr(passing Course 4 | passing Course 3) = 50%

We can say that the people who passed Course 3 have a 50% chance of passing Course 4.

How to calculate the discrete posterior probability

Before we jump into the formula, let's look at a sixth-grade level math problem, which
requires zero knowledge of probability. If you understand this problem, you should
have no trouble understanding Bayes' Theorem.

Problem 1
A rock is found to contain gold. It has 3 layers, each with a different density of gold. You
are given:

The top layer, which accounts for 80% of the mass of the rock, has a gold density
of only 10% (i.e. the amount of gold contained in the top layer is equal to 10% of
the mass of the top layer).

The middle layer, which accounts for 15% of the rock's mass, has a gold density
of 5%.

The bottom layer, which accounts for only 5% of the rock's mass, has a gold
density of 0.2%.

Questions
What is the rock's density of gold (i.e. what % of the rock's mass is gold)?

Of the total amount of gold contained in the rock, what % of the gold comes from the top
layer? What % comes from the middle layer? What % comes from the bottom layer?

Solution

Let's set up a table to solve the problem. Assume that the mass of the rock is one (it can be
1 pound, 1 gram, 1 ton; it doesn't matter).
     A        B            C               D = B × C        E = D / 0.0876
  1  Layer    Mass of      Density of      Mass of gold     Of the total amount of gold in the
              the layer    gold in the     contained in     rock, what % comes from this layer?
                           layer           the layer
  2  Top      0.80         10.0%           0.0800           91.3%
  3  Middle   0.15         5.0%            0.0075           8.6%
  4  Bottom   0.05         0.2%            0.0001           0.1%
  5  Total    1.00                         0.0876           100%

As an example of the calculations in the above table,

Cell(D,2) = 0.8 × 10% = 0.08,
Cell(D,5) = 0.0800 + 0.0075 + 0.0001 = 0.0876,
Cell(E,2) = 0.08 / 0.0876 = 91.3%.

So the rock has a gold density of 0.0876 (i.e. 8.76% of the mass of the rock is gold).

Of the total amount of gold contained in the rock, 91.3% of the gold comes from the top
layer, 8.6% comes from the middle layer, and the remaining 0.1% comes from the
bottom layer. In other words, the top layer contributes 91.3% of the gold in the rock,
the middle layer 8.6%, and the bottom layer 0.1%.
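The table above is easy to reproduce numerically; here is a small Python sketch (my own, for checking the arithmetic, not part of the original solution):

```python
# Rock example: each layer's share of the total gold (columns D and E).
masses = [0.80, 0.15, 0.05]       # top, middle, bottom layers
densities = [0.10, 0.05, 0.002]   # gold density of each layer

gold = [m * d for m, d in zip(masses, densities)]  # column D
total = sum(gold)                                  # rock's overall gold density
shares = [g / total for g in gold]                 # column E

print(round(total, 4))                # 0.0876
print([round(s, 3) for s in shares])  # [0.913, 0.086, 0.001]
```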

The logic behind this simple math problem is exactly the same logic behind Bayes'
Theorem.

Now let's change the problem into one about prior and posterior probabilities.

Problem 2

In underwriting life insurance applications for nonsmokers, an insurance company
believes that there's an 80% chance that an applicant for life insurance qualifies for the
standard nonsmoker class (which has the standard underwriting criteria and the standard
premium rate); there's a 15% chance that an applicant qualifies for the preferred nonsmoker
class (which has more stringent qualifying standards and a lower premium rate than the
standard nonsmoker class); and there's a 5% chance that the applicant qualifies for the
super preferred class (which has the highest underwriting standards and the lowest
premium rate among nonsmokers).

According to medical statistics, different nonsmoker classes have different probabilities
of having a specific heart-related illness:

The standard nonsmoker class has a 10% chance of getting the specific heart
disease.
The preferred nonsmoker class has a 5% chance of getting the specific heart
disease.
The super preferred nonsmoker class has a 0.2% chance of getting the specific
heart disease.

If a nonsmoking applicant was found to have this specific heart-related illness, what is
the probability of this applicant coming from the standard risk class? What is the
probability of this applicant coming from the preferred risk class? What is the probability
of this applicant coming from the super preferred risk class?

Solution

The solution to this problem is exactly the same as the one to the rock problem.

Event: the applicant was found to have the specific heart disease.

     A                B              C                  D = B × C          E = D / 0.0876
  1  Group            Before-event   This group's       After-event size   After-event size of the
     (or segment)     size of the    probability of     of the group       group (scaled; the scaling
                      group          having the          (not yet scaled)  factor = 1/0.0876)
                                     specific heart
                                     illness
  2  Standard         0.80           10.0%              0.0800             91.3%
  3  Preferred        0.15           5.0%               0.0075             8.6%
  4  Super Preferred  0.05           0.2%               0.0001             0.1%
  5  Total            1.00                              0.0876             100%

So if the applicant was found to have the specific heart disease, then

There's a 91.3% chance he comes from the standard risk class;
There's an 8.6% chance he comes from the preferred risk class;
There's a 0.1% chance he comes from the super preferred risk class.

Framework for calculating the discrete posterior probability


When calculating the discrete posterior probability, if the problem is tricky, try to set up
the table as we did in Problem 1 and Problem 2. Use this table to help you keep track of
your data and work.

Problem 3

1% of the women at age 45 who participate in a study are found to have breast cancer.
80% of women with breast cancer will have a positive mammogram. 10% of women
without breast cancer will also have a positive mammogram. One woman aged 45 who
participated in the study was found to have a positive mammogram.
Calculate the probability that this woman has breast cancer.

Solution
This problem is tricky and many folks won't be able to solve it correctly.
To solve this problem, we need to correctly identify the following 3 items:

What's the event?

What are the distinct causes (i.e. segments) that can possibly produce the event?
Make sure your causes are mutually exclusive (i.e. no two causes can happen
simultaneously) and collectively exhaustive (i.e. there are no other causes).

What is each cause's probability to produce the event?

Event: a woman (who participated in the study) is found to have a positive mammogram.

Causes of this event: two distinct causes, women with breast cancer and women without
breast cancer. These are the two segments. In terms of the size of each segment, women
with breast cancer account for 1% of the participants, and women without breast cancer
account for 99%.

Each cause's probability to produce the event: women with breast cancer have an 80%
chance of having a positive mammogram; women without breast cancer have a 10%
chance of having a positive mammogram.

Next, we set up the following table:

Event: a woman in the study is found to have a positive mammogram.

  Segment              Segment's   Segment's            Segment's              Segment's contribution %
  (distinct causes)    size        probability to       contribution amount    to the event (posterior
                                   produce the event    to the event           probability)
  women with breast    1%          80%                  1%(80%) = 0.008        0.008/0.107 = 7.48%
  cancer
  women without        99%         10%                  99%(10%) = 0.099       0.099/0.107 = 92.52%
  breast cancer
  Total                100%                             0.107                  100%

So if a woman aged 45 who participated in the study is found to have a positive
mammogram, then she has a 7.48% chance of actually having breast cancer.
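The same arithmetic in Python (a verification sketch of mine, not part of the original solution):

```python
# Mammogram example: P(breast cancer | positive mammogram).
p_cancer = 0.01
p_pos_given_cancer = 0.80
p_pos_given_no_cancer = 0.10

contrib_cancer = p_cancer * p_pos_given_cancer       # 0.008
contrib_no = (1 - p_cancer) * p_pos_given_no_cancer  # 0.099
posterior = contrib_cancer / (contrib_cancer + contrib_no)

print(round(posterior, 4))   # 0.0748
```

Note how the 99% healthy segment dominates the denominator, which is exactly why the posterior is so small despite the 80% test sensitivity.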



Problem 4 (SOA May 2003, Course 1, #31)
A health study tracked a group of persons for five years. At the beginning of the study,
20% were classified as heavy smokers, 30% as light smokers, and 50% as nonsmokers.

Results of the study showed that light smokers were twice as likely as nonsmokers to die
during the five-year study, but only half as likely as heavy smokers.

A randomly selected participant from the study died over the five-year period.

Calculate the probability that the participant was a heavy smoker.

Solution

Let p =the probability that a non-smoker will die during the next 5 years. Then,

The probability that a light smoker will die during the next 5 years is 2 p
The probability that a heavy smoker will die during the next 5 years is 4 p

Please note that we don't have enough information to calculate p . This shouldn't bother us.
We do not need to know the value of p to solve the problem.

Event: A participant died during the 5-year period.

  Segment        Segment   Segment's probability   Segment's              Segment's
                 size      to produce the event    contribution amount    contribution %
  Heavy smoker   20%       4p                      20%(4p) = 0.8p         0.8p/1.9p = 42.11%
  Light smoker   30%       2p                      30%(2p) = 0.6p         0.6p/1.9p = 31.58%
  Non-smoker     50%       p                       50%(p) = 0.5p          0.5p/1.9p = 26.32%
  Total          100%                              1.9p                   100.00%

The probability that the participant was a heavy smoker is 42.11%.
The probability that the participant was a light smoker is 31.58%.
The probability that the participant was a non-smoker is 26.32%.

Moral of this problem

In problems related to Bayes' Theorem, the absolute size of each segment doesn't
matter; only the ratio of the segment sizes matters. Similarly, the absolute
probability for each segment to produce the event doesn't matter; only the ratio of
the probabilities matters.

If we are to solve this problem quickly, we can set up the following table:



Event: A participant died during the 5-year period.

  Segment        Segment   Segment's probability   Segment's              Segment's
                 size      to produce the event    contribution amount    contribution %
  Heavy smoker   2         4                       2(4) = 8               8/19 = 42.11%
  Light smoker   3         2                       3(2) = 6               6/19 = 31.58%
  Non-smoker     5         1                       5(1) = 5               5/19 = 26.32%
  Total          10                                19                     100%

In the above table, we changed the segment sizes from 20%, 30%, and 50% to 2, 3, and 5.
Similarly, we changed the segments' probabilities from 4p, 2p, and p to 4, 2, and 1.
This speeds up our calculations. You can use this technique when taking the exam.
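The scaled-ratio trick is easy to verify in code; this sketch (mine, not from the original text) uses the integer ratios from the table:

```python
# Smoker example with scaled segment sizes and scaled likelihood ratios.
sizes = {"heavy": 2, "light": 3, "non": 5}   # 20% : 30% : 50%
ratios = {"heavy": 4, "light": 2, "non": 1}  # death probabilities 4p : 2p : p

contrib = {g: sizes[g] * ratios[g] for g in sizes}
total = sum(contrib.values())                # 19
posterior = {g: contrib[g] / total for g in contrib}

print(round(posterior["heavy"], 4))   # 0.4211
print(round(posterior["light"], 4))   # 0.3158
print(round(posterior["non"], 4))     # 0.2632
```

Because only ratios matter, the unknown p and the common scaling factors cancel out of the final answer.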

Problem 5 (May 2000, #22)


You are given:

A portfolio of independent risks is divided into two classes, Class A and Class B.
There are twice as many risks in Class A as in Class B.
The number of claims for each insured during a single year follows a Bernoulli
distribution.
Class A and B have claim size distributions as follows:

  Claim Size   Class A   Class B
  50,000       0.60      0.36
  100,000      0.40      0.64

The expected number of claims per year is 0.22 for Class A and 0.11 for Class B.

One insured is chosen at random. The insured's loss for the two years combined is 100,000.
Calculate the probability that the selected insured belongs to Class A.

Solution

This time, we'll use a formula-driven approach without a table. Let S represent the
total claim dollar amount incurred by the randomly chosen insured during the 2-year period.

We observe that S = 100,000. We are asked to find P(A | S = 100,000), the posterior
probability that the insured belongs to Class A, given that a total loss of $100,000 was
incurred during the 2-year period.

Using either the conditional probability formula or Bayes' Theorem, we have:

P(A | S = 100,000) = P(A) P(S = 100,000 | A) / P(S = 100,000)

= P(A) P(S = 100,000 | A) / [P(A) P(S = 100,000 | A) + P(B) P(S = 100,000 | B)]

There are twice as many risks in Class A as in Class B. So P(A) = 2 P(B), and

P(A | S = 100,000)
= 1 / [1 + P(B) P(S = 100,000 | B) / (P(A) P(S = 100,000 | A))]
= 1 / [1 + (1/2) P(S = 100,000 | B) / P(S = 100,000 | A)]

Once again, you see that the posterior probability depends on

the ratio of P(A) and P(B), not their absolute amounts
the ratio of P(S = 100,000 | A) and P(S = 100,000 | B), not their absolute amounts

So we need to find the ratio P(S = 100,000 | B) / P(S = 100,000 | A).

P(S = 100,000 | A) is the probability that Class A produces the observation (i.e. a Class
A insured incurs $100,000 of losses in 2 years).

We are told that the # of claims for Class A and B is a Bernoulli random variable.
Remember that a Bernoulli random variable is just a binomial random variable with n = 1
(only one trial). Let X represent the # of claims incurred by the insured. Let p represent
the probability for the insured to have a claim. Then E(X) = p. We are told that
E(X | A) = 0.22. So pA = 0.22. Similarly, E(X | B) = pB = 0.11.

So each year, Class A can have either zero claims (with probability 0.78) or one claim
(0.22). The claim amount is either 50,000 (probability 0.6) or 100,000 (probability 0.4).

Each year, Class B can have either zero claims (with probability 0.89) or one claim (0.11).
The claim amount is either 50,000 (probability 0.36) or 100,000 (probability 0.64).



There are only 3 ways for Class A or B to produce $100,000 of claims in two years:
a $50,000 claim in Year 1 and a $50,000 claim in Year 2;
a $100,000 claim in Year 1 and no claim in Year 2;
no claim in Year 1 and a $100,000 claim in Year 2.

P(S = 100,000 | A) = (0.22²)(0.6²) + 2(0.22)(0.78)(0.4) = 0.1547

P(S = 100,000 | B) = (0.11²)(0.36²) + 2(0.11)(0.89)(0.64) = 0.1269

P(A | S = 100,000) = 1 / [1 + (1/2)(0.1269 / 0.1547)] = 0.709
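The two claim-size computations and the final ratio can be checked with a short Python sketch (the function name is my own):

```python
# May 2000 #22: probability of a combined 2-year loss of 100,000, by class.
def p_total_100k(p_claim, p_50k, p_100k):
    # two 50k claims, or one 100k claim in one year and no claim in the other
    return (p_claim * p_50k) ** 2 + 2 * (p_claim * p_100k) * (1 - p_claim)

pA = p_total_100k(0.22, 0.60, 0.40)  # about 0.1547
pB = p_total_100k(0.11, 0.36, 0.64)  # about 0.1269
posterior_A = 1 / (1 + 0.5 * pB / pA)

print(round(posterior_A, 3))   # 0.709
```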

How to calculate the continuous posterior probability

Problem 6 (continuous random variable)

You are tossing a coin. Not knowing p, the probability of heads showing up in one
toss of the coin, you subjectively assume that p is uniformly distributed over [0,1]. Next,
you do an experiment by tossing the coin 3 times. You find that, in this experiment, 2 out
of 3 tosses are heads.

Calculate the posterior probability density of p.

Solution

Event: getting 2 heads out of 3 tosses.

  A: Group          B: Before-event   C: This group's             D = B × C: After-event    E = D × scaling factor:
                    size of the       probability to make         size of the group         after-event size of the
                    group             the event happen            (not yet scaled)          group (scaled)
  Any p in [0,1]    1                 C(3,2) p²(1−p)              C(3,2) p²(1−p)            C(3,2) p²(1−p) / ∫₀¹ C(3,2) p²(1−p) dp
  Total             1                                             ∫₀¹ C(3,2) p²(1−p) dp     100%

The key to solving this problem is to understand that we have an infinite number of
groups. Each value of p (0 ≤ p ≤ 1) is a group. Because p is uniform over
[0,1], f(p) = 1. As a result, for a given group of p, the before-event size is one. And for
a given group of p, this group's probability to make the event "getting 2 heads out of 3
tosses" happen is binomial: C(3,2) p²(1−p). So the after-event size is

After-event size of the group
= k × 1 × C(3,2) p²(1−p)
  (scaling factor) × (before-event group size) × (the group's probability to have 2 heads out of 3 tosses)

k is a scaling factor such that the sum of the after-event sizes for all the groups is equal to
one. Since we have an infinite number of groups, we have to use integration to sum up
the after-event sizes of all the groups:

k ∫₀¹ C(3,2) p²(1−p) dp = 1,  so  k = 1 / ∫₀¹ C(3,2) p²(1−p) dp

Then the after-event size (or posterior probability) is:

k C(3,2) p²(1−p) = C(3,2) p²(1−p) / ∫₀¹ C(3,2) p²(1−p) dp = p²(1−p) / ∫₀¹ p²(1−p) dp

It turns out that the posterior probability we just calculated is a Beta distribution. (Since
∫₀¹ p²(1−p) dp = 1/12, the posterior density is 12 p²(1−p), a Beta distribution with a = 3 and b = 2.)

Key point
The process for calculating the continuous posterior probability is the same for
calculating the discrete posterior probability. The only difference is this: you use
integration for continuous posterior probability; you use summation for discrete posterior
probability.
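As a numerical illustration of the continuous case, a crude midpoint-rule integration (my sketch, not from the original text) reproduces the normalizing constant and the Beta(3, 2) density:

```python
# Coin example: normalize the likelihood C(3,2) p^2 (1-p) over the uniform prior.
from math import comb

n = 100_000
grid = [(i + 0.5) / n for i in range(n)]           # midpoints on [0, 1]
lik = [comb(3, 2) * p**2 * (1 - p) for p in grid]  # P(2 heads in 3 tosses | p)
norm = sum(lik) / n                                # ~ integral of the likelihood

print(round(norm, 4))   # 0.25
# posterior density at p = 0.5 should equal 12 * 0.5^2 * (1 - 0.5) = 1.5
p = 0.5
print(round(comb(3, 2) * p**2 * (1 - p) / norm, 4))   # 1.5
```

The integration step is what replaces the "Total" row of the discrete tables.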

Problem 7 (May 2000 #10)

The size of a claim for an individual insured follows an inverse exponential distribution
with the following probability density function:

f(x | θ) = θ e^(−θ/x) / x²,  x > 0

The parameter θ has the prior distribution with the following probability density
function:

g(θ) = e^(−θ/4) / 4,  θ > 0

One claim of size 2 has been observed for a particular insured. Which of the following is
proportional to the posterior distribution of θ?

Solution

The observation is x = 2. We need to find g(θ | x = 2).

g(θ | x = 2) = k g(θ) f(x = 2 | θ)
  (posterior density) = (scaling factor) × (prior density) × (this group's density to make the event happen)

g(θ | x = 2) = k g(θ) f(x = 2 | θ) = k [e^(−θ/4) / 4] [θ e^(−θ/x) / x²] evaluated at x = 2

g(θ | x = 2) = k [e^(−θ/4) / 4] [θ e^(−θ/2) / 2²] = (k/16) θ e^(−3θ/4)

So the posterior distribution of θ is proportional to θ e^(−3θ/4).

Here the problem didn't ask you to find the full posterior probability. If you have to find
it, this is how. One way is to do integration. Assume g(θ | x = 2) = K θ e^(−3θ/4). Because the
total posterior probability should be one, we have:

∫₀^∞ g(θ | x = 2) dθ = 1,  K ∫₀^∞ θ e^(−3θ/4) dθ = 1,  K = 1 / ∫₀^∞ θ e^(−3θ/4) dθ

To calculate ∫₀^∞ θ e^(−3θ/4) dθ, set 3θ/4 = y. Then θ = (4/3) y.

∫₀^∞ θ e^(−3θ/4) dθ = ∫₀^∞ (4/3) y e^(−y) (4/3) dy = (4/3)² ∫₀^∞ y e^(−y) dy = (4/3)²

Here y e^(−y) is a simple gamma pdf. So ∫₀^∞ y e^(−y) dy = 1, K = (3/4)² = 9/16, and
g(θ | x = 2) = (9/16) θ e^(−3θ/4).



Another, quicker, way to find the full expression of g(θ | x = 2) is to notice that θ e^(−3θ/4)
is proportional to a gamma distribution with parameters α = 2 and β = 4/3. If you look at
the table for Exam C, you'll see the gamma pdf:

f(x) = x^(α−1) e^(−x/β) / [Γ(α) β^α] = x^(2−1) e^(−3x/4) / [Γ(2)(4/3)²] = (9/16) x e^(−3x/4)
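A quick numerical check of K = 9/16 (my sketch; the integral is truncated at a large upper limit where the integrand is negligible):

```python
# May 2000 #10: integrate theta * exp(-3*theta/4) and invert to get K.
from math import exp

n, upper = 200_000, 80.0
h = upper / n
integral = sum(((i + 0.5) * h) * exp(-0.75 * (i + 0.5) * h) * h for i in range(n))

print(round(integral, 4))       # 1.7778  (= (4/3)^2)
print(round(1 / integral, 4))   # 0.5625  (= 9/16)
```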

Problem (Nov 2004, #33)


You are given:
In a portfolio of risks, each policyholder can have at most one claim per year.
The probability of a claim for a policyholder during a year is q.
The prior density is π(q) = q³ / 0.07,  0.6 < q < 0.8

A randomly selected policyholder has one claim in Year 1 and zero claims in Year 2.

For this policyholder, determine the posterior probability that 0.7 < q < 0.8 .

Solution

The observation is N₁ = 1 and N₂ = 0. We are asked to find the posterior probability

P(0.7 < q < 0.8 | N₁ = 1, N₂ = 0) = ∫ from 0.7 to 0.8 of f(q | N₁ = 1, N₂ = 0) dq

f(q | N₁ = 1, N₂ = 0) = f(q) P(N₁ = 1, N₂ = 0 | q) / P(N₁ = 1, N₂ = 0)
= f(q) P(N₁ = 1, N₂ = 0 | q) / ∫ from 0.6 to 0.8 of f(q) P(N₁ = 1, N₂ = 0 | q) dq

P(N₁ = 1, N₂ = 0 | q) = P(N₁ = 1 | q) P(N₂ = 0 | q) = q(1 − q). We assume N₁ and N₂ are
independent.

f(q) P(N₁ = 1, N₂ = 0 | q) = (q³ / 0.07) q(1 − q) = (q⁴ − q⁵) / 0.07



f(q | N₁ = 1, N₂ = 0) = [(q⁴ − q⁵)/0.07] / ∫ from 0.6 to 0.8 of [(q⁴ − q⁵)/0.07] dq
= (q⁴ − q⁵) / ∫ from 0.6 to 0.8 of (q⁴ − q⁵) dq

P(0.7 < q < 0.8 | N₁ = 1, N₂ = 0)
= [∫ from 0.7 to 0.8 of (q⁴ − q⁵) dq] / [∫ from 0.6 to 0.8 of (q⁴ − q⁵) dq]

= [(1/5)q⁵ − (1/6)q⁶ evaluated from 0.7 to 0.8] / [(1/5)q⁵ − (1/6)q⁶ evaluated from 0.6 to 0.8]

= [(1/5)(0.8⁵ − 0.7⁵) − (1/6)(0.8⁶ − 0.7⁶)] / [(1/5)(0.8⁵ − 0.6⁵) − (1/6)(0.8⁶ − 0.6⁶)] = 0.5572
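The final ratio is simple enough to verify exactly (a sketch I added; `antideriv` is my own helper name):

```python
# Nov 2004 #33: P(0.7 < q < 0.8 | one claim in Year 1, none in Year 2).
def antideriv(q):
    # antiderivative of q^4 - q^5
    return q**5 / 5 - q**6 / 6

num = antideriv(0.8) - antideriv(0.7)
den = antideriv(0.8) - antideriv(0.6)

print(round(num / den, 4))   # 0.5572
```

The constant 0.07 in the prior cancels in the ratio, so it never needs to be carried through.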

Nov 2001 #34

You are given:

The # of claims for each policyholder follows a Poisson distribution with mean λ.
The distribution of λ across all policyholders has probability density function
f(λ) = λ e^(−λ),  λ > 0

∫₀^∞ λ e^(−nλ) dλ = 1/n²

A randomly selected policyholder is known to have had at least one claim last year.

Determine the posterior probability that this same policyholder will have at least one
claim this year.

Solution

The observation is N₁ ≥ 1. We are asked to find P(N₂ ≥ 1 | N₁ ≥ 1). If we ignore N₁ ≥ 1,
then by conditioning on λ, we have:

P(N₂ ≥ 1) = ∫₀^∞ P(N₂ ≥ 1 | λ) f(λ) dλ



N₂ is a Poisson random variable with mean λ. So

P(N₂ ≥ 1 | λ) = 1 − P(N₂ = 0 | λ) = 1 − e^(−λ)

P(N₂ ≥ 1) = ∫₀^∞ (1 − e^(−λ)) f(λ) dλ

The observation N₁ ≥ 1 will change the above equation to

P(N₂ ≥ 1 | N₁ ≥ 1) = ∫₀^∞ (1 − e^(−λ)) f(λ | N₁ ≥ 1) dλ

Next, we have:

f(λ | N₁ ≥ 1) = f(λ) P(N₁ ≥ 1 | λ) / ∫₀^∞ f(λ) P(N₁ ≥ 1 | λ) dλ
= λ e^(−λ)(1 − e^(−λ)) / ∫₀^∞ λ e^(−λ)(1 − e^(−λ)) dλ

∫₀^∞ λ e^(−λ)(1 − e^(−λ)) dλ = ∫₀^∞ λ e^(−λ) dλ − ∫₀^∞ λ e^(−2λ) dλ = 1/1² − 1/2² = 3/4

f(λ | N₁ ≥ 1) = (4/3) λ e^(−λ)(1 − e^(−λ))

P(N₂ ≥ 1 | N₁ ≥ 1) = ∫₀^∞ (1 − e^(−λ)) f(λ | N₁ ≥ 1) dλ = (4/3) ∫₀^∞ λ e^(−λ)(1 − e^(−λ))² dλ

= (4/3) ∫₀^∞ λ e^(−λ)(1 − 2e^(−λ) + e^(−2λ)) dλ
= (4/3) [∫₀^∞ λ e^(−λ) dλ − 2 ∫₀^∞ λ e^(−2λ) dλ + ∫₀^∞ λ e^(−3λ) dλ]

= (4/3) [1/1² − 2(1/2²) + 1/3²] = 0.8148
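The same answer falls out of a brute-force numerical integration (my sketch; the upper limit truncates a negligible tail):

```python
# Nov 2001 #34: P(N2 >= 1 | N1 >= 1) with prior f(lambda) = lambda * exp(-lambda).
from math import exp

n, upper = 200_000, 60.0
h = upper / n

def integ(f):
    # midpoint-rule integral of f over (0, upper)
    return sum(f((i + 0.5) * h) * h for i in range(n))

den = integ(lambda l: l * exp(-l) * (1 - exp(-l)))      # = 3/4
num = integ(lambda l: l * exp(-l) * (1 - exp(-l)) ** 2)

print(round(num / den, 4))   # 0.8148
```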



Calculate Bayesian premium when the prior probability is discrete
Next, I'll give you a framework for how to solve Bayesian premium problems. As I explain my
framework, I will also give you a shortcut.

Framework for calculating discrete-prior Bayesian premiums


Step 1: Determine the observation.

Step 2: Discard the observation. Set up your partition equation.

Step 3: Consider the observation. Modify your partition equation obtained in
Step 2. Change the prior probability to the posterior probability.

Step 4: Use Bayes' Theorem and calculate the posterior probability.

Step 5: Calculate the final answer.

Let me use examples to illustrate this solution framework.

Problem 6 (Nov 2001 #7)

You are given the following information about six coins:

  Coin   Probability of Heads
  1-4    0.50
  5      0.25
  6      0.75

A coin is selected at random and then flipped repeatedly. Xᵢ denotes the outcome of the
i-th flip, where 1 indicates heads and 0 indicates tails. The following sequence is
obtained:

S = {X₁, X₂, X₃, X₄} = {1,1,0,1}

Determine E ( X 5 S ) using Bayesian analysis.

Solution

Step 1: Determine the observation. This is easy; we are already told the observation is

S = {X₁, X₂, X₃, X₄} = {1,1,0,1}

Step 2: Discard the observation. Set up the partition equation.


Now we're going to simplify the problem by purposely discarding the observation. So
instead of calculating E(X₅ | S), we'll just calculate E(X₅). X₅ is the # of heads
showing up in the fifth flip of the coin randomly chosen. X₅ is a binomial random
variable with parameters n = 1 (one flip of the coin) and p (the probability of a head
showing up). Using the binomial distribution formula, we have:

E(X₅) = n p = p

However, the parameter p varies by coin type. For Coins 1-4, p = 0.5; for Coin 5,
p = 0.25; and for Coin 6, p = 0.75. Because the coin is randomly chosen from Coins 1, 2,
3, 4, 5, and 6, we don't know which coin was chosen. So we'll need to partition E(X₅)
over coin types:

E ( X5 )
= E ( X 5 Coin 1-4 ) P ( Coin 1-4 ) + E ( X 5 Coin 5 ) P ( Coin 5 ) + E ( X 5 Coin 6 ) P ( Coin 6 )

We already know that

E ( X 5 Coin 1-4 ) = P ( Coin 1-4 showing a head in one flip ) = 0.5


E ( X 5 Coin 5 ) = P ( Coin 5 showing a head in one flip ) = 0.25
E ( X 5 Coin 6 ) = P ( Coin 6 showing a head in one flip ) = 0.75

E ( X 5 ) = 0.5P ( Coin 1-4 ) + 0.25P ( Coin 5 ) + 0.75P ( Coin 6 )

We can go one step further and calculate E(X₅). Though the problem doesn't
specifically tell us P(Coin 1-4), P(Coin 5), and P(Coin 6), we assume that the coins are
uniformly distributed so each coin is equally likely to be chosen. So

P(Coin 1-4) = 4/6,  P(Coin 5) = 1/6,  P(Coin 6) = 1/6

E(X₅) = 0.5 (4/6) + 0.25 (1/6) + 0.75 (1/6) = 0.5

Of course, this problem isn't as simple as this. Otherwise, everyone who has passed
Exam P would pass Exam C.

Step 3: Consider the observation. Modify the equation obtained in Step 2. Change
the prior probabilities to posterior probabilities.

We have found E(X₅). The real problem, however, is to find E(X₅ | S). So we'll need
to modify our equation obtained in Step 2. The original partition equation (if we discard
the observation) is:

E ( X5 )
= E ( X 5 Coin 1-4 ) P ( Coin 1-4 ) + E ( X 5 Coin 5 ) P ( Coin 5 ) + E ( X 5 Coin 6 ) P ( Coin 6 )

How to modify:
E(X₅) → E(X₅ | S)
P(Coin 1-4) → P(Coin 1-4 | S)
P(Coin 5) → P(Coin 5 | S)
P(Coin 6) → P(Coin 6 | S)

Here the observation S = {X₁, X₂, X₃, X₄} = {1,1,0,1} changes our equation. Because of
this observation, we can no longer assume that the coin randomly chosen has a 4/6 chance
of being Coin 1-4, a 1/6 chance of being Coin 5, and a 1/6 chance of being Coin 6; these
probabilities would have been fine if we hadn't observed S = {X₁, X₂, X₃, X₄} = {1,1,0,1}. Now
that we have this new information, we will need to
reevaluate the probability of the coin belonging to each type. So we'll replace the prior
probabilities P(Coin 1-4), P(Coin 5), and P(Coin 6) with the posterior probabilities
P(Coin 1-4 | S), P(Coin 5 | S), and P(Coin 6 | S) respectively.

In addition, we'll need to change E(X₅) to E(X₅ | S) to indicate that we are calculating
the conditional expectation.

Now the new equation is:

E ( X5 S )
= E ( X 5 Coin 1- 4 ) P ( Coin 1- 4 S ) + E ( X 5 Coin 5 ) P ( Coin 5 S ) + E ( X 5 Coin 6 ) P ( Coin 6 S )
= 0.5 P ( Coin 1- 4 S ) + 0.25 P ( Coin 5 S ) + 0.75 P ( Coin 6 S )

E ( X 5 S ) = 0.5 P ( Coin 1- 4 S ) + 0.25 P ( Coin 5 S ) + 0.75 P ( Coin 6 S )



Please note that our observation S = {X₁, X₂, X₃, X₄} = {1,1,0,1} doesn't change how
likely each coin actually produces a head in one flip. So the following three items are
fixed regardless of our observation:

E ( X 5 Coin 1-4 ) = P ( Coin 1-4 showing a head in one flip ) = 0.5


E ( X 5 Coin 5 ) = P ( Coin 5 showing a head in one flip ) = 0.25
E ( X 5 Coin 6 ) = P ( Coin 6 showing a head in one flip ) = 0.75

Step 4: Calculate the posterior probabilities using Bayes' Theorem.

P(Coin 1-4 | S) = P(S ∩ Coin 1-4) / P(S) = P(Coin 1-4) P(S | Coin 1-4) / P(S)

P(Coin 5 | S) = P(S ∩ Coin 5) / P(S) = P(Coin 5) P(S | Coin 5) / P(S)

P(Coin 6 | S) = P(S ∩ Coin 6) / P(S) = P(Coin 6) P(S | Coin 6) / P(S)

Where
P(S) = P(Coin 1-4) P(S | Coin 1-4) + P(Coin 5) P(S | Coin 5) + P(Coin 6) P(S | Coin 6)

Detailed calculation:

P(S | Coin 1-4) = P(1,1,0,1 | Coin 1-4) = 0.5(0.5)(0.5)(0.5) = 0.5⁴

P(S | Coin 5) = P(1,1,0,1 | Coin 5) = 0.25(0.25)(0.75)(0.25) = 0.25³(0.75)

P(S | Coin 6) = P(1,1,0,1 | Coin 6) = 0.75(0.75)(0.25)(0.75) = 0.75³(0.25)

P(S ∩ Coin 1-4) = P(Coin 1-4) P(S | Coin 1-4) = (4/6) 0.5⁴

P(S ∩ Coin 5) = P(Coin 5) P(S | Coin 5) = (1/6) 0.25³(0.75)

P(S ∩ Coin 6) = P(Coin 6) P(S | Coin 6) = (1/6) 0.75³(0.25)

P(S) = (4/6)(0.5⁴) + (1/6)(0.25³)(0.75) + (1/6)(0.75³)(0.25)

P(Coin 1-4 | S) = (4/6)(0.5⁴) / [(4/6)(0.5⁴) + (1/6)(0.25³)(0.75) + (1/6)(0.75³)(0.25)] = 0.681

P(Coin 5 | S) = (1/6)(0.25³)(0.75) / [(4/6)(0.5⁴) + (1/6)(0.25³)(0.75) + (1/6)(0.75³)(0.25)] = 0.032

P(Coin 6 | S) = (1/6)(0.75³)(0.25) / [(4/6)(0.5⁴) + (1/6)(0.25³)(0.75) + (1/6)(0.75³)(0.25)] = 0.287

Step 5: The final result:

E ( X5 S )
= 0.5 P ( Coin 1- 4 S ) + 0.25 P ( Coin 5 S ) + 0.75 P ( Coin 6 S )
= 0.5 (0.681) + 0.25 (0.032) + 0.75 (0.287) = 0.564
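All five steps collapse into a few lines of Python (a sketch of mine; unnormalized weights would also work because the common factor cancels, but the full normalization is shown for clarity):

```python
# Nov 2001 #7: E(X5 | S) for the observed sequence S = (1, 1, 0, 1).
coins = {"1-4": (4/6, 0.50), "5": (1/6, 0.25), "6": (1/6, 0.75)}  # (prior, P(head))
flips = [1, 1, 0, 1]

def likelihood(p, seq):
    out = 1.0
    for x in seq:
        out *= p if x == 1 else 1 - p
    return out

raw = {c: prior * likelihood(p, flips) for c, (prior, p) in coins.items()}
total = sum(raw.values())
posterior = {c: raw[c] / total for c in raw}
e_x5 = sum(posterior[c] * coins[c][1] for c in coins)

print(round(e_x5, 3))   # 0.564
```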

I recommend that initially you use the 5-step framework to calculate discrete-prior
Bayesian premiums. Just copy what I did. Explicitly write out each of the 5 steps; don't
skip any steps. Solve as many problems as you need until you are proficient with the
framework.

Once you are familiar with the 5-step process, let's learn how to improve it. We'll focus
on improving Step 4 (calculating the posterior probabilities). If you have ever solved a Bayesian
premium problem, you'll have discovered that Step 4 is long, tedious, and prone to errors.

Take a look at Step 4 in Problem 6. See how involved the calculation is. When taking the
exam, you are really stressed. In addition, you have only 3 minutes to solve a problem. If
you follow the standard solution approach, chances are high that you'll mess up at least
one step of your calculation. Then all your hard work is ruined. You won't be able to
score a point.

Most exam candidates will mess up in Step 4. Let's find a better way to do Step 4.



What are we doing in Step 4? Two things. First, we calculate the raw posterior
probabilities:

P(S, Coin 1-4) = P(Coin 1-4) P(S | Coin 1-4) = (4/6)(0.5^4)

P(S, Coin 5) = P(Coin 5) P(S | Coin 5) = (1/6)(0.25^3)(0.75)

P(S, Coin 6) = P(Coin 6) P(S | Coin 6) = (1/6)(0.75^3)(0.25)

Next, we normalize these raw posterior probabilities. We do so by using a normalizing
constant

k = 1/P(S)
  = 1/[P(Coin 1-4) P(S | Coin 1-4) + P(Coin 5) P(S | Coin 5) + P(Coin 6) P(S | Coin 6)]

After multiplying each raw posterior probability by this constant, the three posterior
probabilities will nicely add up to one. Normalization is necessary; it's a part of Bayes'
Theorem. However, it is a messy calculation. So ideally, we'll want to avoid it.

It turns out that we really can avoid normalizing the raw posterior probabilities. To
understand how to avoid normalization, let's formally present the question:

Problem -- Calculate E(X) given the following information:

X = x     p_X(x)
0.5       (4/6)(0.5^4) k
0.25      (1/6)(0.25^3)(0.75) k
0.75      (1/6)(0.75^3)(0.25) k

Please note that E(X) is exactly E(X5 | S) in this problem.



We have seen this problem in the chapter on how to use the BA II Plus/Professional 1-V
Statistics Worksheet. This is how we solved it without calculating k.

X = x     p_X(x)                                 Scaled-up p_X(x):
                                                 multiply p_X(x) by 1,000,000/k
0.5       (4/6)(0.5^4) k = 0.041667 k            41,667
0.25      (1/6)(0.25^3)(0.75) k = 0.001953 k      1,953
0.75      (1/6)(0.75^3)(0.25) k = 0.017578 k     17,578
Next, we enter the following into BA II Plus/Professional 1-V Statistics Worksheet:

X01=0.5, Y01=41,667
X02=0.25, Y02= 1,953
X03=0.75, Y03=17,578

Next, set your BA II Plus/Professional to the 1-V Statistics Worksheet. You do this by
pressing 2ND Stat and then pressing ENTER repeatedly until your calculator displays
1-V.

Press the down arrow key. You should get: n = 61,198
Press the down arrow key. You should get: X = 0.56382970

So E(X5 | S) ≈ X = 0.564

This result, calculated using the BA II Plus/Professional 1-V Statistics Worksheet, matches
what we calculated in the 5-step process.

Now it's time for me to present my shortcut.



Event: the coin produces HHTH

A        B           C                D = B x C                        E = 1,000,000 x D   F
Group    Before-     This group's     After-event size of the group    Scaled-up raw       Conditional
(Coin    event size  probability to   (raw posterior probability)      posterior           mean
Type)    of the      produce HHTH                                      probability
         group
1-4      4/6         0.5^4            (4/6)(0.5^4) = 0.041667          41,667              0.50
5        1/6         0.25^3 (0.75)    (1/6)(0.25^3)(0.75) = 0.001953    1,953              0.25
6        1/6         0.75^3 (0.25)    (1/6)(0.75^3)(0.25) = 0.017578   17,578              0.75

Enter the following into BA II Plus/Professional 1-V Statistics Worksheet:


X01=0.5, Y01=41,667
X02=0.25, Y02= 1,953
X03=0.75, Y03=17,578

Next, set your BA II Plus/Professional to the 1-V Statistics Worksheet. You do this by
pressing 2ND Stat and then pressing ENTER repeatedly until your calculator displays
1-V.

Press the down arrow key. You should get: n = 61,198
Press the down arrow key. You should get: X = 0.56382970

So E(X5 | S) ≈ X = 0.564



Better yet, we can round Column D to 4 decimal places. This is even faster:

Event: the coin produces HHTH

A        B           C                D = B x C                     E = 10,000 x D   F
Group    Before-     This group's     After-event size of the group Scaled-up raw    Conditional
(Coin    event size  probability to   (raw posterior probability)   posterior        mean
Type)    of the      produce HHTH                                   probability
         group
I        4/6         0.5^4            (4/6)(0.5^4) = 0.0417         417              0.50
II       1/6         0.25^3 (0.75)    (1/6)(0.25^3)(0.75) = 0.0020   20              0.25
III      1/6         0.75^3 (0.25)    (1/6)(0.75^3)(0.25) = 0.0176  176              0.75

Enter the following into BA II Plus/Professional 1-V Statistics Worksheet:


X01=0.5, Y01=417
X02=0.25, Y02= 20
X03=0.75, Y03=176

Using the 1-V Statistics Worksheet, you should get: n = 613, X = 0.56362153 ≈ 0.564
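The reason the scaled counts work is that a weighted mean is unchanged when every weight is multiplied by the same constant; rounding the weights only nudges the answer in the fourth decimal place. A quick sketch (plain Python; the helper name is my own):

```python
def weighted_mean(xs, ws):
    # Weighted average; any common scaling of ws cancels out.
    return sum(x * w for x, w in zip(xs, ws)) / sum(ws)

xs = [0.5, 0.25, 0.75]                 # conditional means
raw = [0.041667, 0.001953, 0.017578]   # raw posterior weights
scaled = [41667, 1953, 17578]          # raw weights times 1,000,000
rounded = [417, 20, 176]               # 4-decimal weights times 10,000

print(weighted_mean(xs, raw))       # about 0.56383
print(weighted_mean(xs, scaled))    # identical to the line above
print(weighted_mean(xs, rounded))   # about 0.56362 (rounding effect only)
```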

Next, we'll practice this shortcut.

Problem 7 (May 2000 #7)


You are given the following information about two classes of risks:

Risks in Class A have a Poisson claim count distribution with a mean of 1.0 per year.
Risks in Class B have a Poisson claim count distribution with a mean of 3.0 per year.
Risks in Class A have an exponential severity distribution with a mean of 1.0.
Risks in Class B have an exponential severity distribution with a mean of 3.0.
Each class has the same number of risks.
Within each class, severities and claim counts are independent.

A risk is randomly selected and observed to have 2 claims during one year. The observed
claim amounts were 1.0 and 3.0. Calculate the posterior expected value of the aggregate
loss for this risk during the next year.

Solution

This is Bayes' Theorem applied in the context of compound loss distributions.


Conceptual framework

Let
S represent the aggregate claim dollar amount.
X represent the individual claim dollar amount
N represent the # of claims

Then S = X1 + X2 + ... + XN (the sum of the N individual claims). We are told that N and X
are independent. In addition, the Xi are independent identically distributed. We have
observed {N = 2, X1 = 1, X2 = 3}.

We are asked to calculate E(S | N = 2, X1 = 1, X2 = 3).

First, let's make things simple and forget about the condition N = 2, X1 = 1, X2 = 3. Then
E(S) = E(N) E(X). Since the risk is randomly chosen from Class A and Class B, we
have:

E(S) = E(S | A) P(A) + E(S | B) P(B)

The above formula is an Exam P concept. You shouldn't have trouble understanding it.
Here P(A) and P(B) are prior probabilities, which are probabilities prior to our
observation {N = 2, X1 = 1, X2 = 3}.

Next,

E(S | A) = E(N | A) E(X | A) = λA θA = 1(1) = 1
E(S | B) = E(N | B) E(X | B) = λB θB = 3(3) = 9

Here λA and λB are the Poisson means for claim counts for Class A and B respectively.
And θA and θB are the exponential mean claim amounts for Class A and B respectively.

E(S) = 1 x P(A) + 9 x P(B)

Now let's move to the complex concept E(S | N = 2, X1 = 1, X2 = 3). To calculate this
amount, we'll still use the formula E(S) = 1 x P(A) + 9 x P(B). However, we'll replace the
prior probabilities P(A) and P(B) with posterior probabilities:



P(A | N = 2, X1 = 1, X2 = 3),   P(B | N = 2, X1 = 1, X2 = 3)

Our observation {N = 2, X1 = 1, X2 = 3} has changed our belief of the likelihood that the
risk is from Class A and Class B. So we'll no longer use the prior probabilities P(A) and
P(B) to calculate E(S).

In addition, we'll replace E(S) with E(S | N = 2, X1 = 1, X2 = 3) to indicate that the
expected aggregate claim amount is based on the observation {N = 2, X1 = 1, X2 = 3}.

Then our original partition equation becomes:

E(S | N = 2, X1 = 1, X2 = 3)
= 1 x P(A | N = 2, X1 = 1, X2 = 3) + 9 x P(B | N = 2, X1 = 1, X2 = 3)

Next, we'll need to use Bayes' Theorem to calculate the posterior probabilities
P(A | N = 2, X1 = 1, X2 = 3) and P(B | N = 2, X1 = 1, X2 = 3):

P(A | N = 2, X1 = 1, X2 = 3) = P(A) P(N = 2, X1 = 1, X2 = 3 | A) / P(N = 2, X1 = 1, X2 = 3)
= P(A) P(N = 2, X1 = 1, X2 = 3 | A)
  / [P(A) P(N = 2, X1 = 1, X2 = 3 | A) + P(B) P(N = 2, X1 = 1, X2 = 3 | B)]

P(B | N = 2, X1 = 1, X2 = 3) = P(B) P(N = 2, X1 = 1, X2 = 3 | B) / P(N = 2, X1 = 1, X2 = 3)
= P(B) P(N = 2, X1 = 1, X2 = 3 | B)
  / [P(A) P(N = 2, X1 = 1, X2 = 3 | A) + P(B) P(N = 2, X1 = 1, X2 = 3 | B)]

If you understand my logic so far, you are in good shape. The remaining work is just
the calculation.

Standard calculation

We'll calculate the probability for a Class A risk and a Class B risk to each produce the
observed outcome {N = 2, X1 = 1, X2 = 3}:



P({N = 2, X1 = 1, X2 = 3} | A) = P(N = 2 | A) f(1 | A) f(3 | A)
= [e^(-λA) (λA)^2 / 2!] [(1/θA) e^(-1/θA)] [(1/θA) e^(-3/θA)]
= e^(-1) (1/2!) (e^(-1))(e^(-3)) = (1/2) e^(-5) = 0.00337

P({N = 2, X1 = 1, X2 = 3} | B) = P(N = 2 | B) f(1 | B) f(3 | B)
= [e^(-λB) (λB)^2 / 2!] [(1/θB) e^(-1/θB)] [(1/θB) e^(-3/θB)]
= e^(-3) (3^2/2!) [(1/3) e^(-1/3)] [(1/3) e^(-1)] = (1/2) e^(-13/3) = 0.00656

Next, well calculate the posterior probability:

P(A | N = 2, X1 = 1, X2 = 3)
= P(A) P(N = 2, X1 = 1, X2 = 3 | A)
  / [P(A) P(N = 2, X1 = 1, X2 = 3 | A) + P(B) P(N = 2, X1 = 1, X2 = 3 | B)]
= 0.5(0.00337) / [0.5(0.00337) + 0.5(0.00656)] = 0.339

Similarly,

P(B | N = 2, X1 = 1, X2 = 3)
= P(B) P(N = 2, X1 = 1, X2 = 3 | B)
  / [P(A) P(N = 2, X1 = 1, X2 = 3 | A) + P(B) P(N = 2, X1 = 1, X2 = 3 | B)]
= 0.5(0.00656) / [0.5(0.00337) + 0.5(0.00656)] = 0.661

Finally,

E(S | N = 2, X1 = 1, X2 = 3)
= 1 x P(A | N = 2, X1 = 1, X2 = 3) + 9 x P(B | N = 2, X1 = 1, X2 = 3)
= 1(0.339) + 9(0.661) = 6.29
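As a sanity check, the entire Problem 7 calculation can be scripted. This is a sketch under the stated assumptions (Poisson counts, exponential severities; the function and variable names are my own):

```python
import math

def likelihood(lam, theta):
    # Poisson probability of 2 claims times exponential densities at 1 and 3.
    p_n = math.exp(-lam) * lam ** 2 / math.factorial(2)
    f = lambda x: math.exp(-x / theta) / theta
    return p_n * f(1.0) * f(3.0)

classes = {"A": (1.0, 1.0), "B": (3.0, 3.0)}          # (lambda, theta)
raw = {c: 0.5 * likelihood(lam, th) for c, (lam, th) in classes.items()}
total = sum(raw.values())

# E(S | class) = lambda * theta; weight by the posterior probabilities.
premium = sum(lam * th * raw[c] / total for c, (lam, th) in classes.items())
print(round(premium, 2))   # about 6.29
```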



Shortcut

When taking the exam, you'll still need to understand the conceptual framework
explained in the beginning of the solution. However, you'll skip the normalizing step and
avoid the need to manually calculate the mean.

This is what you need when solving this problem under exam conditions:

Event: {N = 2, X1 = 1, X2 = 3}

Group   Before-     This group's probability    After-event size   Scaled-up raw    Conditional
        event size  to produce the event        of the group       posterior        mean
        of the                                  (raw posterior     probability
        group                                   probability)       (multiply the
                                                                   raw probability
                                                                   by 200,000)
A       0.5         (1/2) e^(-5) = 0.00337      0.5(0.00337)       337              λA θA = 1(1) = 1
B       0.5         (1/2) e^(-13/3) = 0.00656   0.5(0.00656)       656              λB θB = 3(3) = 9

Enter the following into BA II Plus/Professional 1-V Statistics Worksheet:


X01=1, Y01=337
X02=9, Y02=656

You should get: n = 993, X ≈ 6.28. So E(S | N = 2, X1 = 1, X2 = 3) ≈ 6.28.

Problem 8 (Nov 2000 #3)

You are given the following for a dental insurance:

Claim counts for individual insureds follow a Poisson distribution.


Half of the insureds are expected to have 2 claims per year.
The other half of the insureds are expected to have 4 claims per year.



A randomly selected insured has made 4 claims in each of the first two policy years.
Determine the Bayesian estimate of this insured's claim count in the next (third) policy
year.

Solution

The observation is {N1 = 4, N2 = 4}. We are asked to find E(N3 | N1 = 4, N2 = 4).

However, the insured can belong to either Class A with λA = 2 or Class B with λB = 4.
So if we don't worry about the observation {N1 = 4, N2 = 4}, we have:

E(N3) = E(N3 | A) P(A) + E(N3 | B) P(B)
= λA P(A) + λB P(B) = 2 P(A) + 4 P(B)

Next, we'll modify the above partition equation by considering the observation
{N1 = 4, N2 = 4}. We'll change the prior probabilities to posterior probabilities:

E(N3 | N1 = 4, N2 = 4) = 2 P(A | N1 = 4, N2 = 4) + 4 P(B | N1 = 4, N2 = 4)

Next, we need to calculate the posterior probabilities:

P(A | N1 = 4, N2 = 4) = P(A, N1 = 4, N2 = 4) / P(N1 = 4, N2 = 4)
= P(A) P(N1 = 4, N2 = 4 | A)
  / [P(A) P(N1 = 4, N2 = 4 | A) + P(B) P(N1 = 4, N2 = 4 | B)]

Similarly,

P(B | N1 = 4, N2 = 4) = P(B) P(N1 = 4, N2 = 4 | B)
  / [P(A) P(N1 = 4, N2 = 4 | A) + P(B) P(N1 = 4, N2 = 4 | B)]

Detailed calculations (if you use my shortcut, you'll avoid most of these calculations):

P(N1 = 4, N2 = 4 | A) = P(N1 = 4 | A) P(N2 = 4 | A) = [e^(-λA) (λA)^4 / 4!]^2 = [e^(-2) 2^4 / 4!]^2

P(N1 = 4, N2 = 4 | B) = P(N1 = 4 | B) P(N2 = 4 | B) = [e^(-λB) (λB)^4 / 4!]^2 = [e^(-4) 4^4 / 4!]^2

P(A | N1 = 4, N2 = 4)
= 0.5 [e^(-2) 2^4 / 4!]^2 / {0.5 [e^(-2) 2^4 / 4!]^2 + 0.5 [e^(-4) 4^4 / 4!]^2} = 0.176

P(B | N1 = 4, N2 = 4)
= 0.5 [e^(-4) 4^4 / 4!]^2 / {0.5 [e^(-2) 2^4 / 4!]^2 + 0.5 [e^(-4) 4^4 / 4!]^2} = 0.824

The above two calculations are nasty and prone to errors. Many candidates will mess up
these calculations and won't score a point. Assuming you have done your calculation
right, you should get:

E(N3 | N1 = 4, N2 = 4)
= 2 P(A | N1 = 4, N2 = 4) + 4 P(B | N1 = 4, N2 = 4)
= 2(0.176) + 4(0.824) = 3.648
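The same kind of check works here; a short sketch (plain Python, names my own):

```python
import math

def poisson_pmf(lam, k):
    # P(N = k) for a Poisson random variable with mean lam.
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Equal priors; likelihood of 4 claims in each of two years is the pmf squared.
raw = {lam: 0.5 * poisson_pmf(lam, 4) ** 2 for lam in (2.0, 4.0)}
total = sum(raw.values())

answer = sum(lam * w / total for lam, w in raw.items())
print(round(answer, 3))   # about 3.648
```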

What you should do in the exam room

Just set up the following table and let the BA II Plus/Professional 1-V Statistics
Worksheet do the magic for you. Watch and relax.

Event: {N1 = 4, N2 = 4}

Group   Before-     This group's probability   After-event size of the group   Conditional
        event size  to produce the event       (raw posterior probability)     mean
        of the
        group
A       0.5         [e^(-2) 2^4 / 4!]^2        0.5 [e^(-2) 2^4 / 4!]^2         λA = 2
B       0.5         [e^(-4) 4^4 / 4!]^2        0.5 [e^(-4) 4^4 / 4!]^2         λB = 4
Next, we'll need to scale the raw posterior probabilities up. We'll want to avoid the error-
prone calculation of the following two raw posterior probabilities:

0.5 [e^(-2) 2^4 / 4!]^2,   0.5 [e^(-4) 4^4 / 4!]^2

Remember what I said earlier when I was explaining Bayes' Theorem to you:
What matters is the ratio of these two (or more) raw posterior probabilities, not their
absolute amounts.

So what matters is the ratio of 0.5 [e^(-2) 2^4 / 4!]^2 to 0.5 [e^(-4) 4^4 / 4!]^2.

So we'll change these two raw posterior probabilities to:

0.5 [e^(-2) 2^4 / 4!]^2 / {0.5 [e^(-2) 2^4 / 4!]^2} = 1

0.5 [e^(-4) 4^4 / 4!]^2 / {0.5 [e^(-2) 2^4 / 4!]^2} = (4^4 / 2^4)^2 (e^(-4) / e^(-2))^2
= 16^2 e^(-4) = 256 e^(-4) = 4.689
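A quick numerical check of this ratio (plain Python; the 0.5 factors and the 4!^2 terms cancel before anything is typed in):

```python
import math

# Ratio of the two raw posterior weights after cancellation.
ratio = (4 ** 4 / 2 ** 4) ** 2 * (math.exp(-4) / math.exp(-2)) ** 2
print(round(ratio, 3))   # 256 * e^(-4), about 4.689
```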

New Table



Event: {N1 = 4, N2 = 4}

Group   Before-     This group's          Raw posterior             Raw posterior       Scaled-up raw   Conditional
        event size  probability to        probability               after               posterior       mean
        of the      produce the event                               simplification      (multiply by
        group                                                                           1,000)
A       0.5         [e^(-2) 2^4 / 4!]^2   0.5 [e^(-2) 2^4 / 4!]^2   1                   1,000           λA = 2
B       0.5         [e^(-4) 4^4 / 4!]^2   0.5 [e^(-4) 4^4 / 4!]^2   256 e^(-4) = 4.689  4,689           λB = 4

Enter the following into BA II Plus/Professional 1-V Statistics Worksheet:


X01=2, Y01=1,000
X02=4, Y02=4,689

You should get: n = 5,689, X ≈ 3.648. So E(N3 | N1 = 4, N2 = 4) ≈ 3.648.

Problem 9 (Nov 2000 #28)

Prior to observing any claims, you believed that claim sizes followed a Pareto distribution
with parameters θ = 10 and α = 1, 2, or 3, with each value equally likely. You then
observe one claim of 20 for a randomly selected risk. Determine the posterior probability
that the next claim for this risk will be greater than 30.

Solution

The observation is X1 = 20. If we don't bother with this new information, then

P(X2 > 30)
= P(X2 > 30 | α = 1) P(α = 1) + P(X2 > 30 | α = 2) P(α = 2) + P(X2 > 30 | α = 3) P(α = 3)



If you look at the Tables for Exam C/4, you'll see that the survival function of a (2-
parameter) Pareto distribution with parameters α and θ is S(x) = [θ/(x + θ)]^α. Here the
problem doesn't say whether the Pareto is one-parameter or two-parameter. One quick
way to determine whether to use the one-parameter or two-parameter Pareto is this:

If the random variable is greater than zero, then use the two-parameter Pareto.
If the random variable is greater than a positive constant, then use the one-parameter Pareto.

The problem just vaguely says that claim sizes follow a Pareto distribution. Here the
claim size (i.e. claim dollar amount) must be greater than zero. There's no reason for us
to think that the claim dollar amount must exceed a positive constant (such as $500). As a
result, we'll use the 2-parameter Pareto.

Then for θ = 10, using the 2-parameter Pareto survival function, we have:

P(X2 > 30 | α) = S(30) = [10/(30 + 10)]^α = (1/4)^α,

P(X2 > 30) = (1/4)^1 P(α = 1) + (1/4)^2 P(α = 2) + (1/4)^3 P(α = 3)
= (1/4) P(α = 1) + (1/16) P(α = 2) + (1/64) P(α = 3)

Now the observation X1 = 20 will change the above calculation to:

P(X2 > 30 | X1 = 20)
= (1/4) P(α = 1 | X1 = 20) + (1/16) P(α = 2 | X1 = 20) + (1/64) P(α = 3 | X1 = 20)

Next, we'll calculate the posterior probabilities. If you look at the Tables for Exam C/4,
you'll find the density function of a 2-parameter Pareto distribution with parameters α and θ is:

f(x | α) = α θ^α / (x + θ)^(α+1) = (α/θ) [θ/(x + θ)]^(α+1)

Then for θ = 10:   f(20 | α) = (α/10) [10/(20 + 10)]^(α+1) = (α/10) (1/3)^(α+1)
Then the posterior probabilities are:



P(α = 1 | X1 = 20) = P(α = 1) f(20 | α = 1) / f(20)

P(α = 2 | X1 = 20) = P(α = 2) f(20 | α = 2) / f(20)

P(α = 3 | X1 = 20) = P(α = 3) f(20 | α = 3) / f(20)

f(20) = P(α = 1) f(20 | α = 1) + P(α = 2) f(20 | α = 2) + P(α = 3) f(20 | α = 3)

Apply the formula f(20 | α) = (α/10) (1/3)^(α+1) with P(α = i) = 1/3 for each i. Assuming
you do the above calculation right, you'll find (each term below already includes the
1/3 prior weight):

f(20) = 0.3704% + 0.2469% + 0.1235% = 0.7407%

P(α = 1 | X1 = 20) = 0.3704% / 0.7407% = 1/2

P(α = 2 | X1 = 20) = 0.2469% / 0.7407% = 1/3

P(α = 3 | X1 = 20) = 0.1235% / 0.7407% = 1/6

Then
P(X2 > 30 | X1 = 20)
= (1/4) P(α = 1 | X1 = 20) + (1/16) P(α = 2 | X1 = 20) + (1/64) P(α = 3 | X1 = 20)
= (1/4)(1/2) + (1/16)(1/3) + (1/64)(1/6) = 0.148
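A sketch of this Pareto calculation in code (plain Python; the density and survival functions follow the two-parameter Pareto formulas quoted above, and the names are my own):

```python
# Two-parameter Pareto with theta = 10; alpha is 1, 2, or 3, equally likely.
theta = 10.0

def pareto_pdf(x, alpha):
    return alpha * theta ** alpha / (x + theta) ** (alpha + 1)

def pareto_sf(x, alpha):
    # Survival function S(x) = [theta/(x + theta)]^alpha
    return (theta / (x + theta)) ** alpha

raw = {a: pareto_pdf(20.0, a) / 3 for a in (1, 2, 3)}   # prior 1/3 each
total = sum(raw.values())

answer = sum(pareto_sf(30.0, a) * w / total for a, w in raw.items())
print(round(answer, 4))   # about 0.1484
```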

If you ever try to reproduce my answers, you'll find the calculation outlined above is
an absolute nightmare. In addition, I must acknowledge that I used an Excel spreadsheet
to help me do the above calculations when I was preparing this manual. I must also
acknowledge that there's little chance that I would be able to do the calculation right in
the heat of the exam.

In the exam, I'll never use the above standard approach, which is prone to errors.
Instead, I will dramatically reduce the complexity of the calculations. This is what you
should do in the exam:
What you should do in the exam room

Event: X1 = 20

A        B           C                        D = B x C               E               F
Group    Before-     This group's density     After-event size of     Scaled-up raw   P(X2 > 30 | α)
         event size  to produce the event     the group (raw          posterior       = (1/4)^α
         of the                               posterior probability)  probability
         group                                                        (multiply the
                                                                      raw probability
                                                                      by 810)
α = 1    1/3         (1/10)(1/3)^2 = 1/90     (1/3)(1/90) = 1/270     3               1/4
α = 2    1/3         (2/10)(1/3)^3 = 1/135    (1/3)(1/135) = 1/405    2               (1/4)^2
α = 3    1/3         (3/10)(1/3)^4 = 1/270    (1/3)(1/270) = 1/810    1               (1/4)^3

Enter the following into BA II Plus/Professional 1-V Statistics Worksheet:

X01 = 1/4 = 0.25, Y01 = 3
X02 = (1/4)^2 = 0.0625, Y02 = 2
X03 = (1/4)^3 = 0.015625, Y03 = 1

You should get: n = 6, X = 0.14843750. So P(X2 > 30 | X1 = 20) = 0.1484

You see how nice and easy the shortcut calculation is.



May 2001 #10
The claim count and claim size distribution for risks of Type A are:

# of claims Probability Claim size Probability


0 4/9 500 1/3
1 4/9 1235 2/3
2 1/9

The claim count and claim size distributions for risks of Type B are:

# of claims Probability Claim size Probability


0 1/9 250 2/3
1 4/9 328 1/3
2 4/9

Risks are equally likely to be type A and type B.
Claim counts and claim sizes are independent within each risk type.
The variance of the total losses is 296,962.

A randomly selected risk is observed to have total annual losses of 500.

Determine the Bayesian premium for the next year for this same risk.

Solution

Let S = X1 + X2 + ... + XN represent the total annual loss. The observation is S1 = 500.
We are asked to find E(S2 | S1 = 500). If we ignore the observation S1 = 500, then the
problem becomes finding E(S2). Since the risk can be from either Type A or Type B,
we'll condition S2 on risk types.

E(S2) = E(S2 | A) P(A) + E(S2 | B) P(B)

E (S ) = E (N ) E ( X ) ,

E ( S2 A ) = E ( N 2 A) E ( X A ) , E ( S2 B ) = E ( N 2 B ) E ( X B )

E ( N 2 A) = 0 = , E ( N2 B ) = 0
4 4 1 6 1 4 4 12
+1 +2 +1 +2 =
9 9 9 9 9 9 9 9

E ( X A ) = 500 = 990 , E ( X B ) = 250


1 2 2 1
+ 1235 + 328 = 276
3 3 3 3
Guo Fall 2009 C, Page 239 / 284
E ( S2 A ) = E ( N 2 A ) E ( X A ) =
6
( 990 ) = 660
9
E ( S 2 B ) = E ( N 2 B ) E ( X B ) = ( 276 ) = 368
12
9
E ( S 2 ) = E ( S 2 A ) P ( A ) + E ( S 2 B ) P ( B ) = 660 P ( A) + 368 P ( B )

The observation S1 = 500 will change the above equation to:

E(S2 | S1 = 500) = 660 P(A | S1 = 500) + 368 P(B | S1 = 500)

P(A | S1 = 500) = P(A) P(S1 = 500 | A) / P(S1 = 500),
P(B | S1 = 500) = P(B) P(S1 = 500 | B) / P(S1 = 500)

We'll calculate the ratio:

P(A | S1 = 500) / P(B | S1 = 500) = [P(A) P(S1 = 500 | A)] / [P(B) P(S1 = 500 | B)]

The only way for Type A to incur a total loss of 500 in Year 1 is to have one claim of
500. The only way for Type B to incur a total loss of 500 in Year 1 is to have two claims
of 250 each.

So P(S1 = 500 | A) = (4/9)(1/3),   P(S1 = 500 | B) = (4/9)(2/3)^2.

We are told that P(A) = P(B) = 0.5.

P(A | S1 = 500) / P(B | S1 = 500) = [P(A) P(S1 = 500 | A)] / [P(B) P(S1 = 500 | B)]
= [0.5 (4/9)(1/3)] / [0.5 (4/9)(2/3)^2] = 3/4

Because P(A | S1 = 500) + P(B | S1 = 500) = 1, we have:

P(A | S1 = 500) = 3/7,   P(B | S1 = 500) = 4/7

Finally, E(S2 | S1 = 500) = 660 P(A | S1 = 500) + 368 P(B | S1 = 500)
= 660(3/7) + 368(4/7) = 493.14
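The posterior premium can be verified with exact fractions; a sketch (plain Python, names my own):

```python
from fractions import Fraction as F

# P(S1 = 500 | A): one claim of 500; P(S1 = 500 | B): two claims of 250 each.
raw = {"A": F(1, 2) * F(4, 9) * F(1, 3),
       "B": F(1, 2) * F(4, 9) * F(2, 3) ** 2}
means = {"A": 660, "B": 368}               # E(S2 | type), computed above
total = sum(raw.values())

premium = sum(means[t] * w / total for t, w in raw.items())
print(float(premium))   # 3452/7, about 493.14
```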
What you should do in the exam room

Event: S1 = 500

A        B           C                   D = B x C               E               F
Group    Before-     This group's        After-event size of     Scaled-up raw   E(S2 | Type)
         event size  probability to      the group (raw          posterior
         of the      produce the event   posterior probability)  probability
         group                                                   (multiply the
                                                                 raw probability
                                                                 by 81/2)
Type A   0.5         (4/9)(1/3)          0.5 (4/9)(1/3)          3               660
Type B   0.5         (4/9)(2/3)^2        0.5 (4/9)(2/3)^2        4               368

Enter the following into BA II Plus/Professional 1-V Statistics Worksheet:

X01=660, Y01=3; X02=368, Y02=4.

You should get: n = 7 , X = 493.14 . So E ( S 2 S1 = 500 ) = 493.14

Nov 2002 #39

You are given:

Class   # of insureds   Claim Count Probabilities
                        0     1     2     3     4
1       3000            1/3   1/3   1/3   0     0
2       2000            0     1/6   2/3   1/6   0
3       1000            0     0     1/6   2/3   1/6

A randomly selected insured has one claim in Year 1.

Determine the expected number of claims in Year 2 for that insured.

Solution
Conceptual framework

The observation is N1 = 1. We are asked to find E(N2 | N1 = 1). If we ignore the
observation N1 = 1, then the problem becomes finding E(N2). Since N2 can be
generated from each of the three classes, we'll condition N2 on classes:

E(N2) = E(N2 | Class 1) P(Class 1) + E(N2 | Class 2) P(Class 2) + E(N2 | Class 3) P(Class 3)

E(N2 | Class 1) = 0(1/3) + 1(1/3) + 2(1/3) = 1

E(N2 | Class 2) = 1(1/6) + 2(2/3) + 3(1/6) = 2

E(N2 | Class 3) = 2(1/6) + 3(2/3) + 4(1/6) = 3

E(N2) = 1 P(Class 1) + 2 P(Class 2) + 3 P(Class 3)

The observation N1 = 1 will change the above equation into:

E(N2 | N1 = 1) = P(Class 1 | N1 = 1) + 2 P(Class 2 | N1 = 1) + 3 P(Class 3 | N1 = 1)

P(Class 1 | N1 = 1) = P(Class 1) P(N1 = 1 | Class 1) / P(N1 = 1) = (3/6)(1/3) / P(N1 = 1)

P(Class 2 | N1 = 1) = P(Class 2) P(N1 = 1 | Class 2) / P(N1 = 1) = (2/6)(1/6) / P(N1 = 1)

P(Class 3 | N1 = 1) = P(Class 3) P(N1 = 1 | Class 3) / P(N1 = 1) = (1/6)(0) / P(N1 = 1) = 0

P(N1 = 1) = (3/6)(1/3) + (2/6)(1/6) + (1/6)(0) = 2/9

P(Class 1 | N1 = 1) = (3/6)(1/3) / (2/9) = 3/4

P(Class 2 | N1 = 1) = (2/6)(1/6) / (2/9) = 1/4

E(N2 | N1 = 1) = P(Class 1 | N1 = 1) + 2 P(Class 2 | N1 = 1) + 3 P(Class 3 | N1 = 1)
= 1(3/4) + 2(1/4) + 3(0) = 1.25
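A compact check with exact fractions (a sketch; the names are my own):

```python
from fractions import Fraction as F

# Priors from class sizes; P(N1 = 1 | class); conditional means E(N2 | class).
priors = [F(3, 6), F(2, 6), F(1, 6)]
p_one = [F(1, 3), F(1, 6), F(0)]
means = [1, 2, 3]

raw = [pr * p for pr, p in zip(priors, p_one)]
answer = sum(m * w for m, w in zip(means, raw)) / sum(raw)
print(float(answer))   # 1.25
```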

This is what you should do in the exam room

Event: N1 = 1

A        B           C                   D = B x C               E               F
Group    Before-     This group's        After-event size of     Scaled-up raw   E(N2 | Class)
(class)  event size  probability to      the group (raw          posterior
         of the      produce the event   posterior probability)  probability
         group                                                   (multiply the
                                                                 raw probability
                                                                 by 18)
1        3/6         1/3                 (3/6)(1/3) = 1/6        3               1
2        2/6         1/6                 (2/6)(1/6) = 1/18       1               2
3        1/6         0                   0                       0               3

Because the posterior probability is zero for Class 3 to produce N1 = 1, we can delete the
last row.

Enter the following into BA II Plus/Professional 1-V Statistics Worksheet:

X01=1, Y01=3; X02=2, Y02=1.

You should get: n = 4 , X = 1.25 . So E ( N 2 N1 = 1) = 1.25



Nov 2000 #33
A car manufacturer is testing the ability of safety devices to limit damage in car
accidents. You are given:

A test car has either front air bags or side air bags (but not both), each type being
equally likely
The test car will be driven into either a wall or a lake, with each accident type
being equally likely
The manufacturer randomly selects 1, 2, 3, or 4 crash test dummies to put into a
car with front air bags.
The manufacturer randomly selects 2, or 4 crash test dummies to put into a car
with side air bags.
Each crash test dummy in a wall-impact accident suffers damage randomly equal
to either 0.5 or 1, with damage to each dummy being independent of damage to
the others.
Each crash test dummy in a lake-impact accident suffers damage randomly equal
to either 1 or 2, with damage to each dummy being independent of damage to the
others.

One test car is selected at random, and a test dummy accident produces total damage of 1.

Determine the expected value of the total damage for the next accident, given that the
kind of safety device (front or side air bags) and accident type (wall or lake) remain the
same.

Solution

This is one of the most feared exam problems. If you use the framework and shortcut,
however, you should do just fine.

Conceptual framework

Damage S = X1 + X2 + ... + XN, where X is the damage incurred by one test dummy and
N is the number of dummies chosen for the crash testing. The observation is S1 = 1. We
are asked to find E(S2 | S1 = 1).

To simplify the problem, let's first discard the observation. Then the problem becomes
finding E(S2). The crash testing falls into four types:

Front air bag, wall collision (FW)
Front air bag, lake collision (FL)
Side air bag, wall collision (SW)
Side air bag, lake collision (SL)
Next, we set up the partition equation:

E ( S2 )
= E ( S 2 FW ) P ( FW ) + E ( S 2 FL ) P ( FL ) + E ( S 2 SW ) P ( SW ) + E ( S 2 SL ) P ( SL )

Next, let's calculate E(S2 | FW). The manufacturer randomly selects 1, 2, 3, or 4 crash
test dummies to put into a car with front air bags. Each count is equally likely to be
chosen. So the expected number of dummies used for crash testing under FW is:

E(N | FW) = E(N | F) = (1 + 2 + 3 + 4)/4 = 2.5

If the car is tested in a wall collision, then the damage to a tested dummy can be either
0.5 or 1, with each damage amount equally likely:

E(X | FW) = E(X | W) = (0.5 + 1)/2 = 0.75

E(S2 | FW) = E(N | FW) E(X | FW) = 2.5(0.75)

Similarly,

E(S2 | FL) = E(N | FL) E(X | FL) = [(1 + 2 + 3 + 4)/4][(1 + 2)/2] = 2.5(1.5)

E(S2 | SW) = E(N | SW) E(X | SW) = [(2 + 4)/2][(0.5 + 1)/2] = 3(0.75)

E(S2 | SL) = E(N | SL) E(X | SL) = [(2 + 4)/2][(1 + 2)/2] = 3(1.5)

E(S2) = 2.5(0.75) P(FW) + 2.5(1.5) P(FL) + 3(0.75) P(SW) + 3(1.5) P(SL)

If we want to complete the above calculation, well plug in

P ( FW ) = P ( FL ) = P ( SW ) = P ( SL ) = 0.25

This will produce the prior mean.



However, we are interested in finding the posterior mean E(S2 | S1 = 1). So we need to
consider the impact of the observation S1 = 1. This observation will change the partition
equation into:

E(S2 | S1 = 1) = 2.5(0.75) P(FW | S1 = 1) + 2.5(1.5) P(FL | S1 = 1)
+ 3(0.75) P(SW | S1 = 1) + 3(1.5) P(SL | S1 = 1)

Using Bayes' Theorem, we have:

P(FW | S1 = 1) = P(FW) P(S1 = 1 | FW) / P(S1 = 1),   P(FL | S1 = 1) = P(FL) P(S1 = 1 | FL) / P(S1 = 1)

P(SW | S1 = 1) = P(SW) P(S1 = 1 | SW) / P(S1 = 1),   P(SL | S1 = 1) = P(SL) P(S1 = 1 | SL) / P(S1 = 1)

Where

P(S1 = 1) = P(FW) P(S1 = 1 | FW) + P(FL) P(S1 = 1 | FL)
+ P(SW) P(S1 = 1 | SW) + P(SL) P(S1 = 1 | SL)

The key is to calculate P(S1 = 1 | FW). In a front air bag, wall collision test, the number
of dummies can be 1, 2, 3, or 4; the damage per dummy can be 0.5 or 1. So there are only
2 ways for FW to produce S1 = 1:
Two dummies were chosen, each having 0.5 damage. Probability: 0.25(0.5)(0.5)
One dummy was chosen, having 1 damage. Probability: 0.25(0.5)
Total probability: P(S1 = 1 | FW) = 0.25(0.5)(0.5) + 0.25(0.5) = 0.1875

We can apply the same logic and find (please verify my calculation):
P ( S1 = 1 FL ) = 0.125 , P ( S1 = 1 SW ) = 0.125 , P ( S1 = 1 SL ) = 0

We are given that P ( FW ) = P ( FL ) = P ( SW ) = P ( SL ) = 0.25

Finally,

P(S1 = 1) = 0.25 (0.1875 + 0.125 + 0.125)

P(FW | S1 = 1) = 0.25(0.1875) / [0.25 (0.1875 + 0.125 + 0.125)] = 3/7

P(FL | S1 = 1) = 0.25(0.125) / [0.25 (0.1875 + 0.125 + 0.125)] = 2/7

P(SW | S1 = 1) = 0.25(0.125) / [0.25 (0.1875 + 0.125 + 0.125)] = 2/7

P(SL | S1 = 1) = 0.25(0) / [0.25 (0.1875 + 0.125 + 0.125)] = 0

Finally,

E(S2 | S1 = 1) = 2.5(0.75) P(FW | S1 = 1) + 2.5(1.5) P(FL | S1 = 1)
+ 3(0.75) P(SW | S1 = 1) + 3(1.5) P(SL | S1 = 1)

= 2.5(0.75)(3/7) + 2.5(1.5)(2/7) + 3(0.75)(2/7) = 2.518
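A numerical check of this posterior mean (a sketch; the group labels follow the FW/FL/SW/SL notation above, and the code is my own):

```python
# Priors are 1/4 for each test type; likelihoods and E(S2 | type) from above.
groups = {
    "FW": (0.1875, 2.5 * 0.75),
    "FL": (0.125,  2.5 * 1.5),
    "SW": (0.125,  3.0 * 0.75),
    "SL": (0.0,    3.0 * 1.5),
}
raw = {g: 0.25 * p for g, (p, _m) in groups.items()}
total = sum(raw.values())

answer = sum(m * raw[g] / total for g, (_p, m) in groups.items())
print(round(answer, 3))   # about 2.518
```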

This is what you should do in the exam room

Event: S1 = 1

A        B           C                   D = B x C               E               F
Group    Before-     This group's        After-event size of     Scaled-up raw   E(S2 | Group)
         event size  probability to      the group (raw          posterior
         of the      produce the event   posterior probability)  probability
         group                                                   (multiply the
                                                                 raw probability
                                                                 by 40,000)
FW       1/4         0.1875              (1/4)(0.1875)           1875            2.5(0.75)
FL       1/4         0.125               (1/4)(0.125)            1250            2.5(1.5)
SW       1/4         0.125               (1/4)(0.125)            1250            3(0.75)
SL       1/4         0                   0                       0               3(1.5)

Because the posterior probability is zero for group SL to produce S1 = 1, we can delete
the last row.
Enter the following into BA II Plus/Professional 1-V Statistics Worksheet:

X01 = 2.5(0.75), Y01 = 1875
X02 = 2.5(1.5), Y02 = 1250
X03 = 3(0.75), Y03 = 1250

You should get: n = 4375, X = 2.518. So E(S2 | S1 = 1) = 2.518

Problem 10 (May 2005 #35)

The # of claims on a given policy has the geometric distribution with parameter β.
One-third of the policies have β = 2; the remaining two-thirds have β = 5.

A randomly selected policy had two claims in Year 1.

Calculate the Bayesian expected # of claims for the selected policy in Year 2.

Solution

The observation is N1 = 2. We are asked to find E(N2 | N1 = 2). If we don't worry about
the observation N1 = 2, then

E(N2) = E(N2 | β = 2) P(β = 2) + E(N2 | β = 5) P(β = 5)

Because N2 has a geometric distribution, we have E(N2 | β) = β:

E(N2) = 2 P(β = 2) + 5 P(β = 5)

The observation N1 = 2 will change the above equation to

E(N2 | N1 = 2) = 2 P(β = 2 | N1 = 2) + 5 P(β = 5 | N1 = 2)

Next, we'll calculate the raw (un-normalized) posterior probabilities:



Event: A policy has 2 claims in Year 1.

A        B           C                         D = B x C                  E               F
Group    Before-     This group's              After-event size of        Scaled-up raw   E(N2 | β) = β
         event size  probability to            the group (raw             posterior
         of the      produce the event         posterior probability)     probability
         group       (a geometric                                         (multiply the
                     distribution):                                       raw probability
                     P(N1 = 2 | β)                                        by 100,000)
                     = β^2/(1 + β)^3
β = 2    1/3         2^2/(1 + 2)^3 = 4/27      (1/3)(4/27) = 0.04938      4,938           2
β = 5    2/3         5^2/(1 + 5)^3 = 25/216    (2/3)(25/216) = 0.07716    7,716           5

Enter the following into BA II Plus/Professional 1-V Statistics Worksheet:


X01=2, Y01=4,938; X02=5, Y02=7,716.

You should get: X ≈ 3.83. So E(N2 | N1 = 2) ≈ 3.83
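A check using the geometric pmf from the Exam C tables, P(N = k) = β^k/(1 + β)^(k+1) (a sketch with exact fractions; names my own):

```python
from fractions import Fraction as F

def geometric_pmf(beta, k):
    # Geometric distribution parameterized by its mean beta (Exam C tables form).
    return F(beta) ** k / (1 + F(beta)) ** (k + 1)

raw = {2: F(1, 3) * geometric_pmf(2, 2),
       5: F(2, 3) * geometric_pmf(5, 2)}
answer = sum(b * w for b, w in raw.items()) / sum(raw.values())
print(float(answer))   # about 3.83
```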

Problem 11 (Nov 2005, #15)

For a particular policy, the conditional probability of the annual number of claims given
Θ = θ, and the probability distribution of Θ, are as follows:

# of claims     0        1       2
Probability     2θ       θ       1 - 3θ

θ               0.10     0.30
Probability     0.80     0.20

One claim was observed in Year 1.

Calculate the Bayesian estimate of the expected # of claims in Year 2.

Solution



The observation is X1 = 1. We are asked to find E(X2 | X1 = 1).

Ignoring this observation, we have:

E(X2) = E(X2 | θ = 0.1) P(θ = 0.1) + E(X2 | θ = 0.3) P(θ = 0.3)

E(X2 | θ)
= 0 P(X2 = 0 | θ) + 1 P(X2 = 1 | θ) + 2 P(X2 = 2 | θ)
= 0(2θ) + 1(θ) + 2(1 - 3θ) = 2 - 5θ

E(X2 | θ = 0.1) = 2 - 5(0.1) = 1.5,   E(X2 | θ = 0.3) = 2 - 5(0.3) = 0.5

E(X2) = 1.5 P(θ = 0.1) + 0.5 P(θ = 0.3)

Considering the observation X1 = 1, we have:

E(X2 | X1 = 1) = 1.5 P(θ = 0.1 | X1 = 1) + 0.5 P(θ = 0.3 | X1 = 1)

Event: X1 = 1

A         B           C                  D = B x C            E               F
Group     Before-     This group's       After-event size     Scaled-up raw   E(X2 | θ)
          event size  probability to     of the group         posterior
          of the      produce the        (raw posterior       probability
          group       event (the         probability)         (multiply the
                      probability of                          raw probability
                      one claim is θ)                         by 100)
θ = 0.1   0.8         0.1                0.8(0.1) = 0.08      8               1.5
θ = 0.3   0.2         0.3                0.2(0.3) = 0.06      6               0.5

Enter the following into BA II Plus/Professional 1-V Statistics Worksheet:


X01=1.5, Y01=8; X02=0.5, Y02=6.

You should get: X̄ = 1.07142857. So E(X2 | X1 = 1) ≈ 1.07.



Calculate Bayesian premiums when the prior probability is continuous

The solution process for continuous-prior problems is similar to the process for the discrete-prior problems. There are two major differences:

We'll use integration for the continuous-prior problems; we'll use summation for the discrete-prior problems.

You can't use the BA II Plus/Professional 1-V Statistics Worksheet shortcut any more to solve a continuous-prior premium problem. In contrast, you can use the BA II Plus/Professional 1-V Statistics Worksheet shortcut to solve a discrete-prior premium problem.

Calculate the Bayesian premium when the prior probability is continuous


Step 1 Determine the observation.

Step 2 Discard the observation. Set up your partition equation.

Step 3 Consider the observation. Modify your partition equation obtained in

Step 2. Change the prior probability to the posterior probability.

Step 4 Use Bayes Theorem and calculate the posterior probability.

Step 5 Calculate the final answer.

I'll illustrate this process with examples.

Problem 1 (May 2001 #37)


You are given the following information about workers compensation coverage:

The # of claims from an employee during the year follows a Poisson distribution with mean (100 − p)/100, where p is the salary (in thousands) for the employee.

The distribution of p is uniform on the interval [0, 100].

An employee is selected at random. No claims were observed for this employee during
the year. Determine the posterior probability that the selected employee has a salary
greater than 50.

Solution

Step 1 Determine the observation. This is N = 0. We are asked to find P(p > 50 | N = 0).

Please note we are NOT asked to find P(N2 > 50 | N1 = 0).



Step 2 Ignore the observation. Set up your partition equation.

If we ignore the observation, we just need to find P(p > 50). Since p is uniform on the interval [0, 100], we have:

P(p > 50) = ∫_50^100 f(p) dp

Step 3 Consider the observation. Modify the equation.

P(p > 50 | N = 0) = ∫_50^100 f(p | N = 0) dp

Step 4 Calculate the posterior probability

f(p | N = 0) = f(p) P(N = 0 | p) / P(N = 0) = f(p) P(N = 0 | p) / ∫_0^100 f(p) P(N = 0 | p) dp

N | p is a Poisson random variable with mean λ = (100 − p)/100 = 1 − 0.01p. So

P(N = 0 | p) = e^(0.01p − 1),  f(p) P(N = 0 | p) = 0.01 e^(0.01p − 1)

P(N = 0) = ∫_0^100 f(p) P(N = 0 | p) dp = ∫_0^100 0.01 e^(0.01p − 1) dp = e⁻¹ ∫_0^100 0.01 e^(0.01p) dp = e⁻¹(e − 1) = 1 − e⁻¹

f(p | N = 0) = f(p) P(N = 0 | p) / P(N = 0) = 0.01 e^(0.01p − 1) / (1 − e⁻¹) = [0.01/(e − 1)] e^(0.01p)

Step 5 Calculate the final answer

P(p > 50 | N = 0) = ∫_50^100 f(p | N = 0) dp = ∫_50^100 [0.01/(e − 1)] e^(0.01p) dp = (e − e^0.5)/(e − 1) = 0.622



Shortcut

Since N | p is a Poisson random variable with mean (100 − p)/100, we naturally set λ = (100 − p)/100. Since p is uniform over [0, 100], 100 − p is also uniform over [0, 100], and λ = (100 − p)/100 is uniform over [0, 1]. So f(λ) = 1.

f(λ | N = 0) = f(λ) P(N = 0 | λ) / P(N = 0) = e^(−λ) / ∫_0^1 e^(−λ) dλ = e^(−λ) / (1 − e⁻¹)

λ = (100 − p)/100 = 1 − p/100,  p = 100(1 − λ)

p > 50 ⟺ 100(1 − λ) > 50 ⟺ λ < 0.5

P(p > 50 | N = 0) = P(λ < 0.5 | N = 0) = ∫_0^0.5 f(λ | N = 0) dλ = ∫_0^0.5 e^(−λ)/(1 − e⁻¹) dλ = (1 − e^(−0.5))/(1 − e⁻¹) = 0.6225

Problem 13 (Nov 2005 #32)


You are given:
In a portfolio of risks, each policyholder can have at most two claims per year.
For each year, the distribution of the number of claims is:

# of claims Probability
0 0.1
1 0.9 q
2 q

The prior density is π(q) = q²/0.039, 0.2 < q < 0.5

A randomly selected policyholder had two claims in Year 1 and two claims in Year 2.
For this insured, determine the Bayesian estimate of the expected number of claims in
Year 3.

Solution



Continuous-prior problems are harder than discrete-prior ones and many candidates are scared of them. However, if you can follow the 5-step framework, you'll be on the right track.

The observation is (N1 = 2, N2 = 2). We are asked to find E(N3 | N1 = 2, N2 = 2).

Let's simplify the problem by discarding the observation (N1 = 2, N2 = 2). Then our task is to find the prior mean E(N3). This is an Exam P problem.

N3 is distributed as follows:

N3 = 0 with probability 0.1
     1 with probability 0.9 − q
     2 with probability q

Here q is a random variable with pdf π(q) = q²/0.039, 0.2 < q < 0.5. If q is fixed, then the prior mean given q is:

E(N3 | q) = 0(0.1) + 1(0.9 − q) + 2(q) = q + 0.9

Next, we take the expectation of the above equation with respect to q:

E_q[ E(N3 | q) ] = E_q(q + 0.9) = E(q) + 0.9

However, E_q[ E(N3 | q) ] = E(N3) -- this is the double expectation theorem.

E(N3) = E(q) + 0.9

E(q) = ∫_0.2^0.5 q π(q) dq = ∫_0.2^0.5 q · q²/0.039 dq = 0.39

E(N3) = E(q) + 0.9 = 0.9 + 0.39 = 1.29

So the mean prior to the observation is 1.29. Please note that we don't need to calculate the prior mean. I calculated it just to show you this: if you discard the observation, the problem becomes an Exam P problem.

Next, let's add in the observation. The observation (N1 = 2, N2 = 2) will change the equation from E(N3) = E(q) + 0.9 to

E(N3 | N1 = 2, N2 = 2) = E(q | N1 = 2, N2 = 2) + 0.9

E(q | N1 = 2, N2 = 2) = ∫_0.2^0.5 q f(q | N1 = 2, N2 = 2) dq

f(q | N1 = 2, N2 = 2) = π(q) P(N1 = 2, N2 = 2 | q) / P(N1 = 2, N2 = 2) = π(q) P(N1 = 2, N2 = 2 | q) / ∫_0.2^0.5 π(q) P(N1 = 2, N2 = 2 | q) dq

P(N1 = 2, N2 = 2 | q) = q²,  π(q) = q²/0.039.

f(q | N1 = 2, N2 = 2) = (q²/0.039)(q²) / ∫_0.2^0.5 (q²/0.039)(q²) dq = q⁴ / ∫_0.2^0.5 q⁴ dq

E(q | N1 = 2, N2 = 2) = ∫_0.2^0.5 q⁵ dq / ∫_0.2^0.5 q⁴ dq = [q⁶/6]_0.2^0.5 / [q⁵/5]_0.2^0.5 = 0.419

E(N3 | N1 = 2, N2 = 2) = E(q | N1 = 2, N2 = 2) + 0.9 = 0.419 + 0.9 = 1.32
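The continuous-prior computation above can be verified with a quick numerical sketch (helper names are mine):

```python
# Numerical check of the continuous-prior example above (Nov 2005 #32), a sketch.
# Prior: pi(q) = q**2 / 0.039 on (0.2, 0.5); the probability of two claims in a
# year is q, so the likelihood of (N1 = 2, N2 = 2) is q**2.

def integrate(g, a, b, n=100_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

prior = lambda q: q**2 / 0.039
likelihood = lambda q: q**2          # P(N1 = 2 | q) * P(N2 = 2 | q)

evidence = integrate(lambda q: prior(q) * likelihood(q), 0.2, 0.5)
post_mean_q = integrate(lambda q: q * prior(q) * likelihood(q), 0.2, 0.5) / evidence
print(round(post_mean_q + 0.9, 2))  # 1.32
```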

Problem (Nov 2000 #23)


You are given:
The parameter Λ has an inverse gamma distribution with probability density function

g(λ) = 500 λ⁻⁴ e^(−10/λ),  λ > 0

The size of a claim has an exponential distribution with probability density function

f(x | Λ = λ) = λ⁻¹ e^(−x/λ),  x > 0, λ > 0

For a single insured, two claims were observed that totaled 50. Determine the expected
value of the next claim from the same insured.

Solution
We are asked to find E(X3 | X1 + X2 = 50). If we ignore the observation X1 + X2 = 50, then the problem becomes

E(X3) = ∫_0^∞ x f(x) dx,  where f(x) = ∫_0^∞ f(x | λ) g(λ) dλ = ∫_0^∞ (λ⁻¹ e^(−x/λ)) g(λ) dλ

If we consider the observation, we'll need to change the prior density g(λ) to the posterior density g(λ | X1 + X2 = 50):

E(X3 | X1 + X2 = 50) = ∫_0^∞ x [ ∫_0^∞ (λ⁻¹ e^(−x/λ)) g(λ | X1 + X2 = 50) dλ ] dx

Equivalently, since E(X3 | λ) = λ, this is E(λ | X1 + X2 = 50). Now

g(λ | X1 + X2 = 50) ∝ g(λ) (λ⁻¹ e^(−x1/λ))(λ⁻¹ e^(−x2/λ)) ∝ λ⁻⁴ e^(−10/λ) · λ⁻² e^(−50/λ) = λ⁻⁶ e^(−60/λ)

This is an inverse gamma density with α* = 5 and θ* = 60. So

E(X3 | X1 + X2 = 50) = E(λ | X1 + X2 = 50) = θ*/(α* − 1) = 60/4 = 15

Nov 2001 #14


For a group of insureds, you are given:
The amount of claim is uniformly distributed but will not exceed a certain unknown limit θ
The prior distribution of θ is π(θ) = 500/θ², θ > 500

Two independent claims of 400 and 600 are observed.

Determine the probability that the next claim will exceed 550.

Solution

The observation is X1 = 400, X2 = 600. We are asked to find

P(X3 > 550 | X1 = 400, X2 = 600)

If we ignore the observation, then P(X3 > 550) = ∫ P(X3 > 550 | θ) f(θ) dθ.

X3 | θ is uniformly distributed over [0, θ]. So P(X3 > 550 | θ) = 1 − 550/θ for θ > 550, and

P(X3 > 550) = ∫_550^∞ (1 − 550/θ) f(θ) dθ

Since we have the observation X1 = 400, X2 = 600, we will modify the above equation by changing the prior density f(θ) to the posterior density f(θ | X1 = 400, X2 = 600):

P(X3 > 550 | X1 = 400, X2 = 600) = ∫_600^∞ (1 − 550/θ) f(θ | X1 = 400, X2 = 600) dθ

Please note that we've also changed ∫_550^∞ to ∫_600^∞ because we've observed X2 = 600, so θ > 600.

f(θ | X1 = 400, X2 = 600) ∝ f(θ) P(X1 = 400 | θ) P(X2 = 600 | θ) = (500/θ²)(1/θ)(1/θ) = 500/θ⁴,  where θ > 600

Next, we'll find the normalizing constant:

f(θ | X1 = 400, X2 = 600) = k/θ⁴,  where θ > 600

∫_600^∞ k θ⁻⁴ dθ = k/(3 · 600³) = 1,  so k = 3(600³)

f(θ | X1 = 400, X2 = 600) = 3(600³)/θ⁴

P(X3 > 550 | X1 = 400, X2 = 600) = ∫_600^∞ (1 − 550/θ) · 3(600³)/θ⁴ dθ

= 3(600³) [ ∫_600^∞ θ⁻⁴ dθ − 550 ∫_600^∞ θ⁻⁵ dθ ] = 3(600³) [ 1/(3 · 600³) − 550/(4 · 600⁴) ]

= 1 − (3/4)(550/600) = 0.3125
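As a sanity check, the answer can be reproduced by integrating against the posterior numerically (a sketch; helper names are mine):

```python
# Numerical check of P(X3 > 550 | X1 = 400, X2 = 600) = 0.3125 (a sketch).
# Posterior: f(theta | data) = 3 * 600**3 / theta**4 for theta > 600; given
# theta, X3 is uniform on [0, theta], so P(X3 > 550 | theta) = 1 - 550/theta.

def integrate(g, a, b, n=200_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

posterior = lambda t: 3 * 600**3 / t**4
# Truncate the improper integral at a large upper limit; the tail is negligible.
prob = integrate(lambda t: (1 - 550 / t) * posterior(t), 600, 1_000_000)
print(round(prob, 4))  # 0.3125
```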
4 600

Nov 2002 #24


You are given:
The amount of a claim, X, is uniformly distributed on the interval [0, θ]
The prior distribution of θ is π(θ) = 500/θ², θ > 500

Two claims, x1 = 400 and x2 = 600, are observed. You calculate the posterior distribution as:

f(θ | x1, x2) = 3(600³)/θ⁴,  θ > 600

Calculate the Bayesian premium E(X3 | x1, x2).

Solution

This problem is a recycled version of Nov 2001 #14.

E(X3 | x1, x2) = ∫_600^∞ E(X3 | θ) f(θ | x1, x2) dθ

X3 | θ is uniform over [0, θ]. So E(X3 | θ) = θ/2.

E(X3 | x1, x2) = ∫_600^∞ (θ/2) · 3(600³)/θ⁴ dθ = (3/2)(600³) ∫_600^∞ θ⁻³ dθ = (3/2)(600³) · 1/(2 · 600²) = 450

May 2001 #18


You are given:
An individual automobile insured has annual claim frequencies that follow a Poisson distribution with mean λ
An actuary's prior distribution for the parameter λ has probability density function

π(λ) = (0.5)(5e^(−5λ)) + (0.5)(1/5)e^(−λ/5)

In the first policy year, no claims were observed for the insured.

Determine the expected # of claims in the 2nd policy year.

Solution

The observation is N1 = 0. We are asked to find E(N2 | N1 = 0). If we ignore the observation N1 = 0, then the problem becomes finding E(N2). Using the double expectation theorem, we have:

E(N2) = E_λ[ E(N2 | λ) ] = E(λ) = ∫_0^∞ λ π(λ) dλ

If we consider the observation N1 = 0, the above equation becomes:

E(N2 | N1 = 0) = E(λ | N1 = 0) = ∫_0^∞ λ π(λ | N1 = 0) dλ

So the key is to find the posterior distribution π(λ | N1 = 0).

π(λ | N1 = 0) ∝ π(λ) P(N1 = 0 | λ) = [ (0.5)(5e^(−5λ)) + (0.5)(1/5)e^(−λ/5) ] e^(−λ)

π(λ | N1 = 0) = k [ (0.5)(5e^(−6λ)) + (0.5)(1/5)e^(−6λ/5) ]

= k [ (5(0.5)/6)(6e^(−6λ)) + (0.5/6)(6/5)e^(−6λ/5) ]

So π(λ | N1 = 0) is a mixture of two exponential distributions.

Next, we'll need to find the normalizing constant k. The total probability should be one. We have:

∫_0^∞ π(λ | N1 = 0) dλ = k ∫_0^∞ [ (5(0.5)/6)(6e^(−6λ)) + (0.5/6)(6/5)e^(−6λ/5) ] dλ = 1

∫_0^∞ [ (5(0.5)/6)(6e^(−6λ)) + (0.5/6)(6/5)e^(−6λ/5) ] dλ = 5(0.5)/6 + 0.5/6 = 0.5,  so k = 2

π(λ | N1 = 0) = 2 [ (5(0.5)/6)(6e^(−6λ)) + (0.5/6)(6/5)e^(−6λ/5) ] = (5/6)(6e^(−6λ)) + (1/6)(6/5)e^(−6λ/5)

Since 6e^(−6λ) is an exponential pdf with mean 1/6 and (6/5)e^(−6λ/5) is an exponential pdf with mean 5/6,

E(N2 | N1 = 0) = E(λ | N1 = 0) = ∫_0^∞ λ π(λ | N1 = 0) dλ = (5/6)(1/6) + (1/6)(5/6) = 0.278
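The mixture answer above can be confirmed numerically without identifying the posterior as a mixture at all (a sketch; helper names are mine):

```python
import math

# Numerical check of E(lambda | N1 = 0) = 0.278 for the two-component
# exponential mixture prior above (a sketch).
# Prior: pi(lam) = 0.5 * 5*exp(-5*lam) + 0.5 * (1/5)*exp(-lam/5).

def integrate(g, a, b, n=200_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

prior = lambda lam: 0.5 * 5 * math.exp(-5 * lam) + 0.5 * 0.2 * math.exp(-lam / 5)
like0 = lambda lam: math.exp(-lam)            # P(N1 = 0 | lam)

# Truncate the improper integrals at 60; the exponential tails are negligible.
evidence = integrate(lambda lam: prior(lam) * like0(lam), 0, 60)
post_mean = integrate(lambda lam: lam * prior(lam) * like0(lam), 0, 60) / evidence
print(round(post_mean, 3))  # 0.278
```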



Poisson-gamma model

Problem (May 2000, #30)


You are given:
An individual automobile insured has an annual claim frequency distribution that
follows a Poisson distribution with mean "
" follows a gamma distribution with parameter and
The 1 actuary assumes that = 1 and = 1 6
st

The 2nd actuary assumes the same mean for the gamma distribution, but only half
the variance
A total of one claim is observed for the insured over a 3-year period
Both actuaries determine the Bayesian premium for the expected number of
claims in the next year using their model assumptions

Determine the ratio of the Bayesian premium that the 1st actuary calculates to the
Bayesian premium that the 2nd actuary calculates.

Solution

If

N | λ is Poisson with mean λ
λ follows a gamma distribution with parameters α and θ
n1, n2, ..., nk claims are observed in Year 1, Year 2, ..., Year k respectively

Then

the conditional random variable λ | n1, n2, ..., nk also follows a gamma distribution, with parameters

α* = α + n1 + n2 + ... + nk = α + total # of claims observed

1/θ* = 1/θ + k = 1/θ + # of observation years

The Bayesian premium for the next year, Year k+1, is

E(N_{k+1} | n1, n2, ..., nk) = E(λ | n1, n2, ..., nk) = α*θ* = (α + total # of claims observed)/(1/θ + # of observation years)

This theorem is tested over and over and you should memorize it. If you want to find the proof of this theorem, refer to the textbook Loss Models.


In this problem,
the observation period = 3 years
# of claims observed = 1

1st actuary: α = 1, θ = 1/6. The Bayesian premium for the 4th year is

(α + total # of claims observed)/(1/θ + # of observation years) = (1 + 1)/(6 + 3) = 2/9

2nd actuary: You need to know that a gamma distribution with parameters α and θ has mean αθ and variance αθ². We are told that the two actuaries get the same mean but the 2nd actuary gets half the variance of the 1st one.

αθ = 1(1/6) = 1/6,  αθ² = (1/2)(1)(1/6)²,  so θ = 1/12 and α = 2

The Bayesian premium for the 4th year is

(α + total # of claims observed)/(1/θ + # of observation years) = (2 + 1)/(12 + 3) = 1/5

So the ratio is (2/9)/(1/5) = 10/9

Nov 2001 #3
You are given:
The # of claims per auto insured follows a Poisson distribution with mean λ
The prior distribution for λ has the following probability density function:

f(λ) = (500λ)⁵⁰ e^(−500λ) / (λ Γ(50))

A company observes the following claims experience:

                      Year 1    Year 2
# of claims           75        210
# of autos insured    600       900

The company expects to insure 1,100 autos in Year 3.

Determine the expected # of claims in Year 3.

Solution



The observation is N1 = 75, N2 = 210, where N1 is the # of claims in Year 1 for the 600 auto policies and N2 is the # of claims in Year 2 for the 900 auto policies. N1 has a Poisson distribution with mean 600λ. N2 has a Poisson distribution with mean 900λ.

We need to find E(N3 | N1 = 75, N2 = 210), where N3 is the # of claims in Year 3 for one auto policy. Then the expected # of auto claims in Year 3 for 1,100 auto policies is simply

1,100 E(N3 | N1 = 75, N2 = 210)

If we ignore the observation N1 = 75, N2 = 210, then E(N3) = E_λ[ E(N3 | λ) ] = E(λ).

We are told that

f(λ) = (500λ)⁵⁰ e^(−500λ) / (λ Γ(50))

If you look at the Table for Exam C, you'll find the gamma pdf is:

f(x) = (x/θ)^α e^(−x/θ) / (x Γ(α))

Writing c = 1/θ for the rate, this is f(x) = (cx)^α e^(−cx) / (x Γ(α)).

You should immediately recognize that f(λ) is a gamma pdf with parameters α = 50 and rate 1/θ = 500, i.e. θ = 1/500. Then using the gamma distribution formulas listed in the Table for Exam C, we have

E(N3) = E(λ) = αθ = 50/500 = 0.1

If we consider the observation N1 = 75, N2 = 210, then we need to modify the formula E(N3) = E(λ) to E(N3 | N1 = 75, N2 = 210) = E(λ | N1 = 75, N2 = 210).

f(λ | N1 = 75, N2 = 210) ∝ f(λ) P(N1 = 75 | λ) P(N2 = 210 | λ)

∝ (λ⁴⁹ e^(−500λ)) · [e^(−600λ)(600λ)⁷⁵/75!] · [e^(−900λ)(900λ)²¹⁰/210!]

∝ λ^(49+75+210) e^(−(500+600+900)λ) = λ³³⁴ e^(−2000λ)

So λ | N1 = 75, N2 = 210 is a gamma distribution with parameters α* = 335 and θ* = 1/2000.

E(N3 | N1 = 75, N2 = 210) = E(λ | N1 = 75, N2 = 210) = α*θ* = 335/2000

Then the expected # of auto claims in Year 3 for 1,100 auto policies is simply

1,100 × 335/2000 = 184.25

May 2001 #2

You are given:


Annual claim counts follow a Poisson distribution with mean "
The parameter " has prior distribution with probability density function

1
f (" ) = e " 3
, " >0
3

Two claims were observed during the 1st year. Determine the variance of the posterior
mean.

Solution

Please note that exponential distribution is a gamma distribution with parameter =1.
So this is the Poisson-gamma model.

The observation is N1 = 2 . We are asked to find the variance Var ( " N1 = 2 ) . We are told
that N " is Poisson with mean " , yet " is gamma with =1, = 3.

Then " N1 = 2 is also gamma with updated parameters


*
= + # of observed claims = + N1
= ( # of observation periods + ) = (1 + 3 )
1 1
* 1 1
= 0.75

Then Var ( " N1 = 2 ) = *


( )
* 2
= 3 ( 0.75 ) = 1.6875
2



Binomial-beta model

Problem (Nov 2000, #11)


For a risk, you are given:
The # of claims during a single year follows a Bernoulli distribution with mean p
The prior distribution for p is uniform on the interval [0, 1]
The claims experience is observed for a number of years
The Bayesian premium is calculated as 1/5 based on the observed claims

Which of the following observed claims data could have yielded this calculation?

(A) 0 claims during 3 years
(B) 0 claims during 4 years
(C) 0 claims during 5 years
(D) 1 claim during 4 years
(E) 1 claim during 5 years

Solution
Please note that a uniform distribution is a special case of the beta distribution with parameters a = b = θ = 1. In addition, a Bernoulli distribution is a special case of the binomial distribution with n = 1.

Next, I'll give you the general binomial-beta formula.

If

X | p has a binomial distribution with parameters n and p
p has a beta distribution with parameters a and b
x1, x2, ..., xk claims are observed in Year 1, Year 2, ..., Year k respectively (where each xi can be 0, 1, ..., n)

Then

the conditional random variable p | x1, x2, ..., xk also has a beta distribution, with parameters

a* = a + x1 + x2 + ... + xk = a + total # of claims observed

b* = b + kn − (x1 + x2 + ... + xk) = b + kn − total # of claims observed

The Bayesian premium for Year k+1 is:

E(X_{k+1} | x1, x2, ..., xk) = n E(p | x1, x2, ..., xk) = n a*/(a* + b*)
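The update rule can be sketched as a small function (the function name `binomial_beta_premium` is mine, not the text's):

```python
# A sketch of the binomial-beta update rule above (assumed helper name).
def binomial_beta_premium(a, b, n, total_claims, k):
    """Bayesian premium for year k+1 after observing `total_claims` claims
    in k years, with X | p ~ Binomial(n, p) and p ~ Beta(a, b)."""
    a_star = a + total_claims
    b_star = b + k * n - total_claims
    return n * a_star / (a_star + b_star)

# Nov 2000 #11 above: Bernoulli (n = 1), uniform prior (a = b = 1),
# 0 claims in 3 years gives premium 1/5.
print(binomial_beta_premium(1, 1, 1, 0, 3))  # 0.2
```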



Proof.

f(p | x1, x2, ..., xk) = f(p) P(x1, x2, ..., xk | p) / ∫_0^1 f(p) P(x1, x2, ..., xk | p) dp

where ∫_0^1 f(p) P(x1, x2, ..., xk | p) dp is a normalizing constant. So f(p | x1, x2, ..., xk) is proportional to f(p) P(x1, x2, ..., xk | p).

Next, let's find the beta pdf f(p). If you look at the Exam C table, you'll see that the beta distribution has the following pdf:

f(x) = [Γ(a+b)/(Γ(a)Γ(b))] u^a (1 − u)^(b−1) / x,  0 < x < θ,  u = x/θ

This pdf is really annoying. It has variables u and x. To simplify the pdf, set θ = 1. Then u = x and 0 < x < 1. The pdf becomes:

f(x) = [Γ(a+b)/(Γ(a)Γ(b))] x^a (1 − x)^(b−1) / x = [Γ(a+b)/(Γ(a)Γ(b))] x^(a−1) (1 − x)^(b−1),  0 < x < 1.

This is the most commonly used beta pdf. This is the one you should use for Exam C.

Back to the problem. Since p has a beta distribution with parameters a and b, the pdf is

f(p) = [Γ(a+b)/(Γ(a)Γ(b))] p^(a−1) (1 − p)^(b−1),  which is proportional to p^(a−1)(1 − p)^(b−1).

Next, let's look at P(x1, x2, ..., xk | p). P(x1, x2, ..., xk | p) = P(x1 | p) P(x2 | p) ... P(xk | p).

This is so because x1, x2, ..., xk are independent identically distributed given p. For i = 1 to k, xi | p is binomial with parameters n and p. So P(xi | p) = C(n, xi) p^(xi) (1 − p)^(n − xi).

So P(x1, x2, ..., xk | p) is proportional to

p^(x1)(1 − p)^(n − x1) · p^(x2)(1 − p)^(n − x2) · ... · p^(xk)(1 − p)^(n − xk) = p^(Σxi) (1 − p)^(kn − Σxi)

f(p | x1, x2, ..., xk) is proportional to f(p) p^(Σxi) (1 − p)^(kn − Σxi), which is proportional to

p^(a−1)(1 − p)^(b−1) p^(Σxi) (1 − p)^(kn − Σxi) = p^(a + Σxi − 1) (1 − p)^(b + kn − Σxi − 1)

We now see that f(p | x1, x2, ..., xk) is a beta distribution with parameters

a* = a + x1 + x2 + ... + xk,  b* = b + kn − (x1 + x2 + ... + xk)

Next, we'll calculate E(X_{k+1} | x1, x2, ..., xk), the Bayesian estimate for Year k+1, using the 5-step framework.

We first discard the observation x1, x2, ..., xk. Then E(X_{k+1} | x1, x2, ..., xk) becomes E(X_{k+1}). Using the double expectation theorem, we have:

E(X_{k+1}) = E_p[ E(X_{k+1} | p) ] = E_p[ np ] = n E(p)

Next, we consider the observation x1, x2, ..., xk. We'll modify the above equation by changing the prior mean E(p) to the posterior mean E(p | x1, x2, ..., xk). We already know that p | x1, x2, ..., xk has a beta distribution with parameters

a* = a + x1 + x2 + ... + xk,  b* = b + kn − (x1 + x2 + ... + xk)

Looking up the beta expectation formula from the Exam C table, we have:

E(p | x1, x2, ..., xk) = a*/(a* + b*)

Finally, we have:

E(X_{k+1} | x1, x2, ..., xk) = n E(p | x1, x2, ..., xk) = n a*/(a* + b*)

Now let's apply the binomial-beta formula to this problem. We are told that the # of claims in a year is a Bernoulli random variable. So the number of trials is n = 1. In addition, the prior distribution of p is uniform over [0, 1], which is a beta distribution with parameters a = b = 1.

Assume we have observed a total of Σxi claims in k years. Then the Bayesian premium for the next year is:

E(X_{k+1} | x1, x2, ..., xk) = n(a + Σxi)/(a + b + kn) = (1)(1 + Σxi)/(1 + 1 + k(1)) = (1 + Σxi)/(2 + k)

We are told that E(X_{k+1} | x1, x2, ..., xk) = 1/5:

(1 + Σxi)/(2 + k) = 1/5

We have two unknowns in one equation, so we can't solve it directly. One way to find the right answer is to test each answer choice. If Σxi = 0 and k = 3, we'll have (1 + Σxi)/(2 + k) = 1/5. So zero claims during 3 years is the right answer.

Also see Problem #15, May 2007.



Chapter 10 Claim payment per payment

2005 Exam M May #32

For an insurance:

Losses can be 100, 200 or 300 with respective probabilities 0.2, 0.2, and 0.6.

The insurance has an ordinary deductible of 150 per loss.

Y P is the claim payment per payment random variable.

Calculate Var (Y P ) .

(A) 1500 (B) 1875 (C) 2250 (D) 2625 (E) 3000

Core concepts:
Ground up loss
Ordinary deductible
Claim payment
Claim payment per payment

Explanation

Let X represent the ground-up loss amount (the actual loss incurred by the policyholder). Let d, where d ≥ 0, represent the deductible.

Amount paid by the insurer (called the claim payment):

(X − d)+ = max(X − d, 0) = 0 if X ≤ d;  X − d if X > d

Amount the insured needs to pay out of his own pocket:

(X ∧ d) = min(X, d) = X if X ≤ d;  d if X > d

Please note that

X = (X − d)+ + (X ∧ d)

ground-up loss = amount paid by the insurance company + amount paid by the insured out of his own pocket



Example. Your deductible for your car insurance is $500. If you have an accident and the loss is $600, you pay $500 out of your own pocket and your insurance company pays you $100. In this case,

600 (ground-up loss) = 100 (amount paid by the insurance company) + 500 (amount paid by the insured out of his own pocket)

However, if the loss is $400, then you pay the entire loss and the insurance company pays zero.

400 (ground-up loss) = 0 (amount paid by the insurance company) + 400 (amount paid by the insured out of his own pocket)

Claim payment per payment


Let Y represent the claim payment. Then Y = (X − d)+. Claim payment per payment means (Y | Y > 0). Evidently, if X ≤ d, then Y = 0. In this case, the insured will cover the entire loss with his own money and won't need to report the loss to the insurance company. So the insurance company may not even know that a loss has occurred. For the insurance company to pay any claim, Y must be positive. This is why the claim payment per payment is (Y | Y > 0).

Full solution

Let X represent the ground-up loss. Let Y represent the claim payment. The deductible is d = 150.

Y = (X − 150)+ = max(X − 150, 0)

Y^P = Y | Y > 0

We are asked to find Var(Y^P).

Var(Y^P) = Var(Y | Y > 0) = Var[(X − 150)+ | X > 150]

= E[ ((X − 150)+)² | X > 150 ] − E²[ (X − 150)+ | X > 150 ]

Please note that writing E(X − 150 | X > 150)² is not appropriate notation -- it is ambiguous between the expectation of the square and the square of the expectation -- which is why the two terms above are written out explicitly; E²(·) means the square of the expectation.

X              100    200    300
(X − 150)+     0      50     150
P(X)           0.2    0.2    0.6

P(X > 150) = P(X = 200) + P(X = 300) = 0.8

P(X)/P(X > 150)    0.2/0.8    0.2/0.8    0.6/0.8

E[(X − 150)+ | X > 150] = 0(0.2/0.8) + 50(0.2/0.8) + 150(0.6/0.8) = 125

E[((X − 150)+)² | X > 150] = 0²(0.2/0.8) + 50²(0.2/0.8) + 150²(0.6/0.8) = 17,500

Var[(X − 150)+ | X > 150] = 17,500 − 125² = 1,875

We'll use the BA II Plus or BA II Plus Professional 1-V Statistics Worksheet to calculate Var[(X − 150)+ | X > 150].

As explained in the chapter on calculators, when using the BA II Plus or BA II Plus Professional 1-V Statistics Worksheet, we can simply discard the data that falls outside the conditional probability and calculate the mean/variance on the remaining data.

X              100                 200               300
Is X > 150?    No, so discard      Yes. Keep this    Yes. Keep this
               this data.          data.             data.

After we discard X = 100, the remaining data is:

X                                   200    300
(X − 150)+                          50     150
P(X)                                0.2    0.6
10 P(X) -- scaled-up probability    2      6

Enter the following into the Statistics Worksheet (the X entries are the (X − 150)+ values):

X01=50, Y01=2; X02=150, Y02=6

BA II Plus or BA II Plus Professional should give you:

n = 8, X̄ = 125, σ_X = 43.30127019

Var = σ_X² = 1,875
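The discard-and-renormalize trick above is easy to express in code (a sketch; the function name is mine):

```python
# A sketch verifying Var(Y^P) = 1,875 for the example above.
# Losses 100/200/300 with probs 0.2/0.2/0.6, ordinary deductible 150;
# condition on a positive payment (loss > deductible) and renormalize.

def per_payment_variance(losses, probs, d):
    kept = [(x - d, p) for x, p in zip(losses, probs) if x > d]
    total = sum(p for _, p in kept)               # P(X > d)
    mean = sum(y * p for y, p in kept) / total
    second = sum(y * y * p for y, p in kept) / total
    return second - mean**2

print(round(per_payment_variance([100, 200, 300], [0.2, 0.2, 0.6], 150), 2))  # 1875.0
```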

Additional practice problems

#1 For an insurance policy:

Losses can be 100, 200, 300, and 400 with respective probabilities 0.1, 0.2, 0.3, and 0.4.

The insurance has an ordinary deductible of 250 per loss.

Y P is the claim payment per payment random variable.

Calculate Var (Y P ) .

Solution

Fast solution

Ground-up loss X    100           200           300          400
Is X > 250?         No. Discard.  No. Discard.  Yes. Keep.   Yes. Keep.

New table after discarding X ≤ 250:

X                                   300    400
(X − 250)+                          50     150
P(X)                                0.3    0.4
10 P(X) -- scaled-up probability    3      4

Enter the following into the 1-V Statistics Worksheet:

X01=50, Y01=3; X02=150, Y02=4

BA II Plus or BA II Plus Professional should give you:

n = 7, X̄ = 107.14, σ_X = 49.48716593

Var = σ_X² = 2,448.98

Standard solution

X              100    200    300    400
(X − 250)+     0      0      50     150
P(X)           0.1    0.2    0.3    0.4

P(X > 250) = P(X = 300) + P(X = 400) = 0.3 + 0.4 = 0.7

Given X > 250, only X = 300 and X = 400 are possible, with conditional probabilities 0.3/0.7 = 3/7 and 0.4/0.7 = 4/7.

E[(X − 250)+ | X > 250] = 50(3/7) + 150(4/7) = 107.1428571

E[((X − 250)+)² | X > 250] = 50²(3/7) + 150²(4/7) = 13,928.57143

Var[(X − 250)+ | X > 250] = 13,928.57143 − 107.1428571² = 2,448.98

#2 For an insurance policy:

Losses can be 1,000, 4,000, 5,000, 9,000, and 12,000 with respective probabilities 0.11,
0.17, 0.24, 0.36, and 0.12.

The insurance has an ordinary deductible of 900 per loss.

Y P is the claim payment per payment random variable.

Calculate Var (Y P ) .

Solution



To speed up calculations, we set one unit of money equal to $1,000.

Ground-up loss X    1       4       5       9       12
Is X > 0.9?         Yes.    Yes.    Yes.    Yes.    Yes. (keep all)
(X − 0.9)+          0.1     3.1     4.1     8.1     11.1
P(X)                0.11    0.17    0.24    0.36    0.12
100 P(X) -- scaled  11      17      24      36      12
up probability

Enter the following into the 1-V Statistics Worksheet:

X01=0.1, Y01=11; X02=3.1, Y02=17;
X03=4.1, Y03=24; X04=8.1, Y04=36;
X05=11.1, Y05=12

BA II Plus or BA II Plus Professional should give you:

n = 100, X̄ = 5.77, σ_X = 3.28345854

Var = σ_X² = 10.7811 (in units of $1,000 squared) = 10.7811 × 1,000² = 10,781,100 $²



Chapter 11 LER (loss elimination ratio)
Exam M Sample #27

You are given:

Losses follow an exponential distribution with the same mean in all years.
The loss elimination ratio this year is 70%.
The ordinary deductible for the coming year is 4/3 of the current deductible.

Compute the loss elimination ratio for the coming year.

Core concept:

Loss elimination ratio (LER)

LER = Expected loss amount paid by the insured / Expected loss amount = E(X ∧ d) / E(X)

LER answers the question, "What % of the expected loss amount is absorbed by the policyholder due to the deductible?"

How to calculate LER:

E(X) = ∫_0^∞ x f(x) dx = ∫_0^∞ s(x) dx

(X ∧ d) = min(X, d) = X if X ≤ d;  d if X > d

E(X ∧ d) = ∫_0^d x f(x) dx + d ∫_d^∞ f(x) dx  (intuitive formula)

Alternatively,

E(X ∧ d) = ∫_0^d s(x) dx = ∫_0^d [1 − F_X(x)] dx

You can find the proof of the 2nd formula in Loss Models.



To help memorize the above formulas, notice that if we set d = +∞, then

E(X) = E(X ∧ ∞) = ∫_0^∞ s(x) dx

Solution to Sample #27

Ground-up loss X has an exponential distribution with mean θ:

f(x) = (1/θ) e^(−x/θ),  s(x) = 1 − F(x) = e^(−x/θ),  E(X) = θ

E(X ∧ d) = ∫_0^d s(x) dx = ∫_0^d e^(−x/θ) dx = θ(1 − e^(−d/θ))

LER = E(X ∧ d)/E(X) = 1 − e^(−d/θ)  (you might want to memorize this result)

Under the original deductible, LER = 70%:

1 − e^(−d/θ) = 0.7,  so e^(−d/θ) = 0.3

Under the new deductible (which is 4/3 of the original deductible),

LER' = 1 − e^(−(4/3)d/θ) = 1 − (e^(−d/θ))^(4/3) = 1 − 0.3^(4/3) = 0.799
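The exponential LER shortcut can be checked with a couple of lines (a sketch; variable names are mine):

```python
import math

# Quick check of the LER result above (a sketch): for exponential losses,
# LER(d) = 1 - exp(-d/theta), so LER(4d/3) = 1 - 0.3**(4/3) when LER(d) = 0.7.

ratio = 0.3        # e^(-d/theta), implied by LER = 0.7
new_ler = 1 - ratio ** (4 / 3)
print(round(new_ler, 3))  # 0.799

# Cross-check by picking concrete numbers: theta = 1, d = -ln(0.3).
theta, d = 1.0, -math.log(0.3)
assert abs((1 - math.exp(-(4 / 3) * d / theta)) - new_ler) < 1e-12
```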



http://www.guo.coursehost.com

Chapter 12 Find E(Y − m)+

E(Y − m)+ = E(Y) − m + m f_Y(0) + (m−1) f_Y(1) + (m−2) f_Y(2) + ... + 1 f_Y(m−1)

where Y takes non-negative integer values and m is a non-negative integer.

The above formula works whether Y is a simple random variable or a compound random variable Y = Σ_{i=1}^n X_i. If Y = Σ_{i=1}^n X_i, make sure you write

E(Y − m)+ = E(Y) − m + m f_Y(0) + (m−1) f_Y(1) + (m−2) f_Y(2) + ... + 1 f_Y(m−1)

Don't write

E(Y − m)+ = E(Y) − m + m f_X(0) + (m−1) f_X(1) + (m−2) f_X(2) + ... + 1 f_X(m−1)

In other words, the pdf on the right-hand side must match the random variable on the left-hand side. If the random variable on the left-hand side is Y = Σ_{i=1}^n X_i, you need to use f_Y(y) on the right-hand side and write the following equation:

E(Y − m)+ = E(Y) − m + m f_Y(0) + (m−1) f_Y(1) + ... + 1 f_Y(m−1)

If the random variable on the left-hand side is X, then you need to write

E(X − m)+ = E(X) − m + m f_X(0) + (m−1) f_X(1) + ... + 1 f_X(m−1)

To use the above formula in the heat of the exam, we rewrite it with two columns:

                           f_Y(0)      m
                           f_Y(1)      m−1
E(Y − m)+ = E(Y) − m +     f_Y(2)      m−2
                           ...         ...
                           f_Y(m−1)    1

In the above formula, the two columns stand for the sum of the row-by-row products:

f_Y(0)      m
f_Y(1)      m−1
f_Y(2)      m−2      = m f_Y(0) + (m−1) f_Y(1) + (m−2) f_Y(2) + ... + 1 f_Y(m−1)
...         ...
f_Y(m−1)    1

This is not standard notation. However, we use it anyway to help us memorize the formula. In the exam, you just write these 2 columns. Then you simply take each element in the 1st column and multiply it by the corresponding element in the 2nd column. Next, sum everything up.

Please note that if you take an element f_Y(k) (where 0 ≤ k ≤ m−1) from the 1st column, then you need to multiply it by m − k from the 2nd column so that (m − k) + k = m holds.

The proof of this formula is simple.

The standard formula is:

E(S − d)+ = E(S) − Σ_{s=0}^{d−1} [1 − F_S(s)]

Please note that I didn't write the formula as

E(S − d)+ = E(S) − Σ_{s=0}^{d−1} [1 − F_S(x)]

That version is confusing: F_S(x) is not good notation because S and x don't match. The right notation is F_S(s).

Let's move on from the formula E(S − d)+ = E(S) − Σ_{s=0}^{d−1} [1 − F_S(s)]. To make our proof simple, let's set d = 3. The proof is the same if d is bigger.

E(S − 3)+ = E(S) − Σ_{s=0}^{2} [1 − F_S(s)]

Σ_{s=0}^{2} [1 − F_S(s)] = [1 − F_S(0)] + [1 − F_S(1)] + [1 − F_S(2)] = 3 − [F_S(0) + F_S(1) + F_S(2)]

F_S(0) = P(S ≤ 0) = P(S = 0) = f_S(0)
F_S(1) = P(S ≤ 1) = P(S = 0) + P(S = 1) = f_S(0) + f_S(1)
F_S(2) = P(S ≤ 2) = P(S = 0) + P(S = 1) + P(S = 2) = f_S(0) + f_S(1) + f_S(2)

F_S(0) + F_S(1) + F_S(2) = 3 f_S(0) + 2 f_S(1) + f_S(2)

E(S − 3)+ = E(S) − 3 + 3 f_S(0) + 2 f_S(1) + f_S(2)

Now you should be convinced that the following formula is correct:

E(Y − m)+ = E(Y) − m + m f_Y(0) + (m−1) f_Y(1) + (m−2) f_Y(2) + ... + 1 f_Y(m−1)
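The formula can be sketched in code (the helper name `stop_loss` is mine, not the text's):

```python
# A sketch of the stop-loss formula above:
# E(Y - m)+ = E(Y) - m + sum over k < m of (m - k) * f_Y(k).
def stop_loss(mean_y, pmf, m):
    """pmf maps integer values of Y to probabilities; only k < m are needed."""
    return mean_y - m + sum((m - k) * pmf.get(k, 0.0) for k in range(m))

# Tiny sanity check: Y uniform on {0, 1, 2, 3}, E(Y) = 1.5, deductible m = 2,
# so E(Y - 2)+ = 1 * P(Y = 3) = 0.25.
print(stop_loss(1.5, {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}, 2))  # 0.25
```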

Problem 1 # 11 May 2000 Course 3

A company provides insurance to a concert hall for losses due to power failure. You are
given:

The number of power failures in a year has a Poisson distribution with mean 1.

The distribution of ground up losses due to a single power failure is

x Probability of x
10 0.3
20 0.3
50 0.4

The number of power failures and the amounts of losses are independent.

There is an annual deductible of 30.

Calculate the expected amount of claims paid by the insurer in one year.

Solution

Let N = # of power failures, S = total claim dollar amount before deductible.


Then S = Σ_{i=1}^N X_i.

The total claim dollar amount after the deductible of $30 is:

(S − 30)+ = ( Σ_{i=1}^N X_i − 30 )+

Applying the formula, we have:

E(S − 30)+ = E(S) − 30 +   f_S(0)     30
                           f_S(1)     29
                           f_S(2)     28
                           ...        ...
                           f_S(29)    1

It seems like we have an awful lot of work to do on the two matrices. Before you start to panic, please note that many of the values f_S(0), f_S(1), ..., f_S(29) will be zero. This is because X has only 3 distinct values: 10, 20, and 50, with probabilities of 0.3, 0.3, and 0.4 respectively. Evidently, we can throw away X = 50. If X = 50, then S is at least 50 and is out of the range S ≤ 29.

Please also note that S = Σ_{i=1}^{N} X_i, where N is a Poisson random variable with mean λ = 1:

P(N = n) = e^(-1) · 1^n / n! = e^(-1) / n!

So for S ≤ 29, the possible values of S are:

N    P(N)           X                       P(X_1, X_2, ..., X_N)    S = ΣX_i    P(S)
0    e^(-1)         —                       —                        0           e^(-1)
1    e^(-1)         X = 10                  0.3                      10          0.3 e^(-1)
                    X = 20                  0.3                      20          0.3 e^(-1)
2    (1/2) e^(-1)   (X_1, X_2) = (10, 10)   0.3^2                    20          (1/2)(0.3^2) e^(-1)


Next, we consolidate the probabilities:

S = ΣX_i    P(S)
0           e^(-1)
10          0.3 e^(-1)
20          0.3 e^(-1)
20          (1/2)(0.3^2) e^(-1)

After consolidation:

S = ΣX_i    P(S)
0           e^(-1)
10          0.3 e^(-1)
20          0.3 e^(-1) + (1/2)(0.3^2) e^(-1) = 0.345 e^(-1)

E[(S - 30)+] = E(S) - 30 +

    f_S(0)      30
    f_S(10)     20
    f_S(20)     10

In the actual exam, to help remember the two matrices, you can write only the 1st matrix:

    f_S(0)      a
    f_S(10)     b
    f_S(20)     c

As said earlier, the sum of the two elements in each row needs to be m (or 30 in this problem). As a result:

    0 + a = 30,  so a = 30
    10 + b = 30, so b = 20
    20 + c = 30, so c = 10

Then, you can fill out the 2nd matrix:


    f_S(0)      a          f_S(0)      30
    f_S(10)     b    =     f_S(10)     20
    f_S(20)     c          f_S(20)     10

f_S(0)·30 + f_S(10)·20 + f_S(20)·10 = e^(-1)·30 + 0.3e^(-1)·20 + 0.345e^(-1)·10 = 39.45 e^(-1)

S = Σ_{i=1}^{N} X_i,  so E(S) = E(N) E(X)

E(N) = 1,  E(X) = 10(0.3) + 20(0.3) + 50(0.4) = 29

E(S) = E(N) E(X) = 29

E[(S - 30)+] = E(S) - 30 + 39.45 e^(-1) = 29 - 30 + 39.45 e^(-1) = 13.5128
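As a sanity check, the following Python sketch rebuilds f_S(s) for s ≤ 29 exactly by convolving the severity pmf with itself (only N = 0, 1, 2 can keep S below 30) and reproduces the answer:

```python
import math

# Sanity check for Problem 1: N ~ Poisson(1), X in {10, 20, 50} with
# probabilities 0.3, 0.3, 0.4, and an aggregate deductible of 30.

sev = {10: 0.3, 20: 0.3, 50: 0.4}
lam, d = 1.0, 30

fS = {0: math.exp(-lam)}          # N = 0 gives S = 0
conv = {0: 1.0}                   # pmf of X_1 + ... + X_n, starting at n = 0
for n in range(1, 3):             # only n = 1, 2 can produce S <= 29
    nxt = {}
    for s, p in conv.items():
        for x, q in sev.items():
            nxt[s + x] = nxt.get(s + x, 0.0) + p * q
    conv = nxt
    pN = math.exp(-lam) * lam**n / math.factorial(n)
    for s, p in conv.items():
        if s < d:
            fS[s] = fS.get(s, 0.0) + pN * p

ES = lam * sum(x * q for x, q in sev.items())     # E(S) = E(N) E(X) = 29
answer = ES - d + sum((d - s) * p for s, p in fS.items())
print(round(answer, 4))   # 13.5128
```

The convolution confirms f_S(20) = 0.345 e^(-1), exactly the consolidated value in the table above.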

Problem 2 (#18, May 2005, Exam M)

For a collective risk model:

The number of losses has a Poisson distribution with λ = 2

The common distribution of the individual losses is:

x    f_X(x)
1    0.6
2    0.4

An insurance covers aggregate losses subject to a deductible of 3.

Calculate the expected aggregate payments of the insurance.

Solution


S = Σ_{i=1}^{N} X_i, where S is the aggregate loss and X is the individual loss dollar amount.

We are asked to find E[(S - 3)+].

E[(S - 3)+] = E(S) - 3 +

    f_S(0)      3
    f_S(1)      2
    f_S(2)      1

where E(S) = E(N) E(X) = 2 [1(0.6) + 2(0.4)] = 2.8

Next, we need to find f_S(0), f_S(1), and f_S(2).

N    P(N)                       X                     P(X_1, X_2, ..., X_N)    S = ΣX_i    P(S)
0    e^(-2)                     —                     —                        0           e^(-2)
1    2e^(-2)                    X = 1                 0.6                      1           (0.6) 2e^(-2)
                                X = 2                 0.4                      2           (0.4) 2e^(-2)
2    (2^2/2!) e^(-2) = 2e^(-2)  (X_1, X_2) = (1, 1)   0.6^2                    2           (0.6^2) 2e^(-2)

Next, we consolidate the table into:

S = ΣX_i    P(S)
0           e^(-2)
1           (0.6) 2e^(-2) = 1.2 e^(-2)
2           (0.4) 2e^(-2) + (0.6^2) 2e^(-2) = 1.52 e^(-2)

E[(S - 3)+] = E(S) - 3 + [ f_S(0)·3 + f_S(1)·2 + f_S(2)·1 ]
            = 2.8 - 3 + [ e^(-2)·3 + 1.2e^(-2)·2 + 1.52e^(-2)·1 ]
            = 2.8 - 3 + 6.92 e^(-2) = 0.73652
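The same answer falls out of a few lines of Python that mirror the table above:

```python
import math

# Sanity check for Problem 2: N ~ Poisson(2), X = 1 or 2 with
# probabilities 0.6 and 0.4, and an aggregate deductible of 3.

lam = 2.0
e2 = math.exp(-lam)

fS0 = e2                                             # N = 0
fS1 = 0.6 * lam * e2                                 # N = 1, X = 1
fS2 = 0.4 * lam * e2 + 0.6**2 * (lam**2 / 2) * e2    # N = 1, X = 2  plus  N = 2, (1, 1)

ES = lam * (1 * 0.6 + 2 * 0.4)                       # E(S) = E(N) E(X) = 2.8
answer = ES - 3 + (3 * fS0 + 2 * fS1 + 1 * fS2)
print(round(answer, 5))   # 0.73652
```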

Problem 3 (#45, Sample Exam M)

Prescription drug losses, S, are modeled assuming the number of claims has a geometric
distribution with mean 4, and the amount of each prescription is 40.

Calculate E[(S - 100)+].

I'll leave this problem for you to solve.
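Once you have worked it by hand, a short Python sketch can confirm your answer. Since every prescription costs exactly 40, S = 40N. I assume the Loss Models geometric parameterization P(N = n) = β^n / (1 + β)^(n+1) for n = 0, 1, 2, ..., with mean β = 4:

```python
# Check for Problem 3 (run it only after trying the problem yourself).
# S = 40 N, where N is geometric with mean beta = 4:
# P(N = n) = (1/(1+beta)) * (beta/(1+beta))^n, n = 0, 1, 2, ...
# Truncating the infinite sum at a large n approximates E[(S - 100)+].

beta = 4.0
p0 = 1 / (1 + beta)        # P(N = 0) = 0.2
r = beta / (1 + beta)      # common ratio 0.8
answer = sum(max(40 * n - 100, 0) * p0 * r**n for n in range(2000))
print(round(answer, 2))
```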



About the author
Yufeng Guo was born in central China. After receiving his Bachelor's degree in physics at Zhengzhou University, he attended Beijing Law School and received his Master of Laws. He was an attorney and law school lecturer in China before immigrating to the United States. He received his Master of Accounting at Indiana University. He has pursued a life actuarial career and passed exams 1, 2, 3, 4, 5, 6, and 7 in rapid succession after discovering a successful study strategy.

Mr. Guo's exam records are as follows:


Fall 2002 Passed Course 1
Spring 2003 Passed Courses 2, 3
Fall 2003 Passed Course 4
Spring 2004 Passed Course 6
Fall 2004 Passed Course 5
Spring 2005 Passed Course 7

Mr. Guo currently teaches online prep courses for Exams P, FM, MFE, and MLC. For more information, visit http://actuary88.com/.

If you have any comments or suggestions, you can contact Mr. Guo at
yufeng_guo@msn.com.
