
6th Edition

by Yufeng Guo

Fall 2009

This electronic book is intended for individual buyer use for the sole purpose of preparing for

Exam C. This book can NOT be resold to others or shared with others. No part of this publication

may be reproduced for resale or multiple copy distribution without the express written permission

of the author.

Table of Contents

Introduction
Chapter 1 Doing calculations 100% correct 100% of the time
    6 strategies for improving calculation accuracy
    6 powerful calculator shortcuts
        #1 Solve \(ax^2 + bx + c = 0\)
        #2 Keep track of your calculation
        #3 Calculate mean and variance of a discrete random variable
        #4 Calculate the sample variance
        #5 Find the conditional mean and conditional variance
        #6 Do the least squares regression
        #7 Do linear interpolation
Chapter 2 Maximum likelihood estimator
    Basic idea
    General procedure to calculate the maximum likelihood estimator
    Fisher Information
    The Cramer-Rao theorem
    Delta method
Chapter 3 Kernel smoothing
    Essence of kernel smoothing
    Uniform kernel
    Triangular kernel
    Gamma kernel
Chapter 4 Bootstrap
    Essence of bootstrapping
    Recommended supplemental reading
Chapter 5 Bühlmann credibility model
    Trouble with black-box formulas
    Rating challenges facing insurers
    3 preliminary concepts for deriving the Bühlmann premium formula
        Preliminary concept #1 Double expectation
        Preliminary concept #2 Total variance formula
        Preliminary concept #3 Linear least squares regression
    Derivation of Bühlmann's Credibility Formula
    Summary of how to derive the Bühlmann credibility premium formulas
    Special case
    How to tackle Bühlmann credibility problems
    An example illustrating how to calculate the Bühlmann credibility premium
    Shortcut
    Practice problems
Chapter 6 Bühlmann-Straub credibility model
    Context of the Bühlmann-Straub credibility model
    Assumptions of the Bühlmann-Straub credibility model
    Summary of the Bühlmann-Straub credibility model

Guo Fall 2009 C, Page 2 / 284

    General Bühlmann-Straub credibility model (more realistic)
    How to tackle the Bühlmann-Straub premium problem
Chapter 7 Empirical Bayes estimate for the Bühlmann model
    Empirical Bayes estimate for the Bühlmann model
    Summary of the estimation process for the empirical Bayes estimate for the Bühlmann model
    Empirical Bayes estimate for the Bühlmann-Straub model
    Semi-parametric Bayes estimate
Chapter 8 Limited fluctuation credibility
    General credibility model for the aggregate loss of r insureds
    Key interim formula: credibility for the aggregate loss
    Final formula you need to memorize
    Special case
Chapter 9 Bayesian estimate
    Intuitive review of Bayes' Theorem
    How to calculate the discrete posterior probability
    Framework for calculating the discrete posterior probability
    How to calculate the continuous posterior probability
    Framework for calculating discrete-prior Bayesian premiums
    Calculate Bayesian premiums when the prior probability is continuous
    Poisson-gamma model
    Binomial-beta model
Chapter 10 Claim payment per payment
Chapter 11 LER (loss elimination ratio)
Chapter 12 Find \(E(Y-M)_+\)
About the author

Introduction

This manual is intended to be a "missing manual." It skips what other manuals explain well and focuses on what they don't explain or don't explain well. This way, you get your money's worth.

Chapter 1 teaches you how to do manual calculations quickly and accurately. If you studied hard but failed Exam C repeatedly, chances are that you are "concept strong, calculation weak." The calculator techniques will improve your calculation accuracy.

topic for many.

Chapter 3 explains the essence of kernel smoothing and teaches you how to derive complex kernel smoothing formulas for \(k_y(x)\) and \(K_y(x)\). You shouldn't have any trouble memorizing complex kernel smoothing formulas after this chapter.

Many candidates don't know the essence of bootstrap. Chapter 4 is about bootstrap.

Chapter 5 explains the core theory behind the Bühlmann credibility model.

Chapter 6 compares and contrasts the Bühlmann-Straub credibility model with the Bühlmann credibility model.

Many candidates are afraid of empirical Bayes estimate problems. The formulas are just too hard to remember. Chapter 7 will relieve your pain.

Many candidates find that there are just too many limited fluctuation credibility formulas to memorize. To address this, Chapter 8 gives you a unified formula.

Chapter 9 presents a framework for quickly calculating the posterior probability (discrete or continuous) and the posterior mean (discrete or continuous). Many candidates can recite Bayes' theorem but can't solve related problems under exam conditions. Their calculations are long, tedious, and prone to errors. This chapter will drastically improve your calculation efficiency.

Chapter 11 is about the loss elimination ratio.

Chapter 12 is about how to quickly calculate \(E(Y-M)_+\).

Chapter 1 Doing calculations 100% correct 100% of the time

>From: Exam C candidate (name removed)

>To: yufeng_guo@msn.com

>Subject: Help..

>Date: someday in 2006

>

>Hello Mr. Guo.

>

> I tried Exam C problems under the exam-like condition. To my surprise, I found that I

>made too many mistakes; one mistake is 1+1=3. How can I improve my accuracy?

1. Gain a deeper understanding of a core concept. People tend to make errors if they

memorize a black-box formula without understanding the formula. To reduce

errors, try to understand core concepts and formulas.

2. Learn how to solve a problem faster. Many exam candidates solve hundreds of

practice problems yet fail Exam C miserably. One major cause is that their

solutions are inefficient. Typically, these candidates copy solutions presented in a

textbook and study manuals. Authors of textbooks and many study manuals

generally use software to do the calculations. To solve a messy calculation, they

just type up the formula and click the "Compute" button. However, when you take

the exam, you have to calculate the answer manually. A solution that looks clean

and easy in a textbook may be a nightmare in the exam. When you prepare for

Exam C, don't copy textbook solutions. Improve them. Learn how to do manual

calculation faster.

3. Build solution frameworks and avoid reinventing the wheel. If you analyze Exam

C problems tested in the past, you'll see that the SOA pretty much tests the same

things over and over. For example, the Poisson-gamma model is tested over and

over. When preparing for Exam C, come up with a ready-to-use solution

framework for each of the commonly tested problems in Exam C. This way, when

you walk into the exam room and see a commonly tested problem, you don't need

to solve the problem from scratch. You can use your pre-built solution framework

and solve it quickly and accurately.

4. Keep an error log. Whenever you solve some practice problems, record your

errors in a notebook. Analyze why you made errors. Try to solve a problem

differently to avoid the error. Review your error log from time to time. Using an

error log helps you avoid making the same calculation errors over and over.

5. Avoid doing mental math in the exam even for the simplest calculations. Even if

you are solving a simple problem like 2+3, use your calculator to solve the

problem. Simply enter 2 + 3 in your calculator. This will reduce your silly

errors.

Fast and safe techniques for common calculations.

#1 Solve \(ax^2 + bx + c = 0\).

The formula \(x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\) is OK when a, b, and c are nice, small numbers. However, when a, b, and c have many decimals or are large numbers, this standard approach is labor intensive and prone to errors, and it often falls apart in the heat of the exam. Many candidates get flustered if they need to solve such an equation under pressure.

To solve this equation 100% right under pressure and in a hurry, we'll use a little trick. Take the equation \(0.3247x^2 - 89.508x + 0.752398 = 0\) as an example.

First, we set \(x = v = \frac{1}{1+r}\). So we treat x as a dummy discount factor. The original equation becomes:

\[ 0.752398 - 89.508v + 0.3247v^2 = 0 \]

Finding r is a concept you learned in Exam FM. We first convert the equation to the following cash flow diagram:

Time t | 0 | 1 | 2
Cash flow | 0.752398 | -89.508 | 0.3247

So at time zero, you receive $0.752398. At time one, you pay $89.508. Finally, at time two, you receive $0.3247. What's your IRR?

To find r (the IRR), we simply use the Cash Flow Worksheet in BA II Plus or BA II Plus Professional.

Cash flow | CF0 | C01 | C02
Amount | 0.752398 | -89.508 | 0.3247
Frequency | | F01 = 1 | F02 = 1

Because the cash flow frequency is one for both C01 and C02, we don't need to enter F01 = 1 and F02 = 1. If we don't enter a cash flow frequency, BA II Plus and BA II Plus Professional use one as the default cash flow frequency.

Using the IRR function, we find that IRR = -99.63722807. Remember this is a percentage. So r = -99.63722807%.

\[ x_1 = \frac{1}{1+r} = \frac{1}{1 - 99.63722807\%} = 275.6552834 \]

How are we going to find the second root? We'll use the following formula. If \(x_1\) and \(x_2\) are the two roots of \(ax^2 + bx + c = 0\), then

\[ x_1 x_2 = \frac{c}{a} \quad\Rightarrow\quad x_2 = \frac{1}{x_1}\cdot\frac{c}{a} \]

\[ x_2 = \frac{1}{x_1}\cdot\frac{c}{a} = \frac{1}{275.6552834}\cdot\frac{0.752398}{0.3247} = 0.00840619 \]

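Keystrokes in BA II Plus / BA II Plus Professional

Procedure | Keystroke | Display
Assume we set the calculator to display 8 decimal places. | |
Use Cash Flow Worksheet | CF | CF0=(old content)
Clear Worksheet | 2nd [CLR WORK] | CF0=0.00000000
Enter the cash flow at t = 0. | .752398 ENTER | CF0=0.752398
Enter the cash flow at t = 1. | ↓ 89.508 +/- ENTER | C01=-89.508
Skip the frequency. The default # is 1, so no need to enter anything. | ↓ | F01=1.00000000
Enter the cash flow at t = 2. | ↓ .3247 ENTER | C02=0.3247
Calculate IRR | IRR CPT | IRR=-99.63722807
Convert the IRR % to a decimal (this is the dummy interest) | ÷ 100 = | -0.9963722807
Add 1 | + 1 = | 0.00362772
Find the dummy discount factor \(x_1 = \frac{1}{1 + IRR\%}\) | 1/x | 275.65528324 (this is \(x_1\))
Store \(x_1\) in a memory. This leaves a good auditing trail. | STO 0 |
Find the 2nd root, \(x_2 = \frac{1}{x_1}\cdot\frac{c}{a}\) | 1/x × .752398 ÷ .3247 = | 0.00840619
Store \(x_2\) in a memory. This leaves a good auditing trail. | STO 1 |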

You can always double check your calculations. Retrieve \(x_1\) and \(x_2\) from the calculator memory and plug them into \(0.3247x^2 - 89.508x + 0.752398\). You should get a value close to zero. Plugging in \(x_1 = 275.6552834\) and then \(x_2 = 0.00840619\), each result should be approximately zero (OK).

Does this look like a lot of work? Yes, the first time. Once you get familiar with this process, it takes you 15 seconds to finish calculating \(x_1\) and \(x_2\) and double checking that they are right.

Step 1 Rearrange \(ax^2 + bx + c = 0\) to \(c + bx + ax^2 = 0\).

Step 2 Use the BA II Plus/BA II Plus Professional Cash Flow Worksheet to find the IRR.

CF0 = c (cash flow at time zero)
C01 = b (cash flow at time one)
C02 = a (cash flow at time two)

Time t | 0 | 1 | 2
Cash flow | c | b | a

Step 3 Find the two roots:

\[ x_1 = \frac{1}{1 + \frac{IRR}{100}}, \qquad x_2 = \frac{1}{x_1}\cdot\frac{c}{a} \]

In the exam, if an equation is overly simple, just try out the answer. If an equation is not overly simple, always use the above process to solve \(ax^2 + bx + c = 0\).

For example, if you see \(x^2 - 2x - 3 = 0\), you can guess that \(x_1 = -1\) and \(x_2 = 3\). However, if you see \(x^2 - 2x - 7.3 = 0\), use the Cash Flow Worksheet to solve it.
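As a cross-check away from the calculator, the procedure above can be sketched in Python. This is a minimal sketch under stated assumptions, not the BA II Plus algorithm: the function name `solve_quadratic` is hypothetical, and it takes the first root from the quadratic formula instead of the IRR worksheet. It then reuses the product-of-roots identity \(x_1 x_2 = c/a\) for the second root and plugs both roots back into the equation.

```python
import math

def solve_quadratic(a, b, c):
    """Return two real roots of a*x^2 + b*x + c = 0 (hypothetical helper).

    Assumes a != 0 and that the first root is nonzero.
    """
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots")
    # First root from the quadratic formula, arranged to avoid cancellation.
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    x1 = q / a
    # Second root from the product of roots: x1 * x2 = c / a.
    x2 = (c / a) / x1
    return x1, x2

a, b, c = 0.3247, -89.508, 0.752398
x1, x2 = solve_quadratic(a, b, c)
r = 1 / x1 - 1  # dummy interest rate implied by x1 = 1 / (1 + r)
residuals = [a * x * x + b * x + c for x in (x1, x2)]  # plug-back check

print(f"x1 = {x1:.7f}, x2 = {x2:.8f}, r = {100 * r:.8f}%")
print("residuals:", residuals)
```

The plug-back residuals play the same role as the double check on the calculator: if either one is not close to zero, a root is wrong.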

Exercise

Answer: x1 = 7.2321003 and x2 = 1.23737899

#2 Solve \(x^2 - 2x - 7.3 = 0\).

Answer: \(x_1 = 3.88097206\) and \(x_2 = -1.88097206\)

#3 Solve \(0.9080609x^2 - 0.00843021x - 0.99554743 = 0\).

Answer: \(x_1 = 1.0517168\) and \(x_2 = -1.04243305\)

#4 Solve \(x^2 - 2x + 3 = 0\).

Answer: you'll get an error message if you try to calculate the IRR. Because \(x^2 - 2x + 3 = (x-1)^2 + 2 \ge 2\), there's no real solution.

Example 1

SOA Exam C at the next exam sitting. The probability for each candidate to pass Course

2 is 0.73, independent of other students passing or failing the exam. The company

promises to give each actuary student who passes Exam C a raise of $2,500. What's the

probability that the insurance company will spend at least $50,000 on raises associated

with passing Exam C?

Solution

If the company spends at least $50,000 on exam-related raises, then the number of

students who will pass Exam C must be at least 50,000/2,500=20. So we need to find the

probability of having at least 20 students pass Exam C.

Let X = the number of students who will pass Exam C. The problem does not specify the

distribution of X. So possibly X has a binomial distribution. Let's check the conditions for a binomial distribution:

• There are only two outcomes for each student taking the exam: either Pass or Fail.
• The probability of Pass (0.73) or Not Pass (0.27) remains constant from one student to another.
• The exam result of one student does not affect that of another student.

X satisfies the requirements of a binomial random variable with parameters n = 23 and p = 0.73. We need to find the probability that \(x \ge 20\):

\[ \Pr(x \ge 20) = \Pr(x = 20) + \Pr(x = 21) + \Pr(x = 22) + \Pr(x = 23) \]

Applying the formula \(f(x) = C_n^x\, p^x (1-p)^{n-x}\), we have:

\[ f(x \ge 20) = C_{23}^{20}(.73)^{20}(.27)^3 + C_{23}^{21}(.73)^{21}(.27)^2 + C_{23}^{22}(.73)^{22}(.27) + C_{23}^{23}(.73)^{23} = .09608 \]

Therefore, there is a 9.6% chance that the company will have to spend at least $50,000 to pay for exam-related raises.
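The same sum can be checked with a few lines of Python. This is only a sketch for verifying the arithmetic; the variable names are illustrative and nothing here is available in the exam room. Each term is kept separately so that a single wrong term can be spotted and recomputed.

```python
from math import comb

n, p = 23, 0.73

# One term per passing count, kept separately so each can be audited.
terms = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(20, n + 1)}
prob_at_least_20 = sum(terms.values())

for k, value in terms.items():
    print(f"Pr(X = {k}) = {value:.8f}")
print(f"Pr(X >= 20) = {prob_at_least_20:.8f}")
```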

Method #1

Procedure | Keystroke | Display
Set to display 8 decimal places (4 decimal places are sufficient, but assume you want to see more decimals) | 2nd [FORMAT] 8 ENTER | DEC=8.00000000
Set AOS (Algebraic Operating System) | 2nd [FORMAT], keep pressing ↓ until you see Chn, then press 2nd [ENTER] (if you see AOS, your calculator is already in AOS, in which case press [CLR Work]) | AOS
Calculate \(C_{23}^{20}(.73)^{20}(.27)^3\): first \(C_{23}^{20}\) | 23 2nd [nCr] 20 | 1,771.000000
Multiply by \((.73)^{20}\) | × .73 y^x 20 | 3.27096399
Multiply by \((.27)^3\) and add | × .27 y^x 3 + |
Calculate \(C_{23}^{21}(.73)^{21}(.27)^2\): first \(C_{23}^{21}\) | 23 2nd [nCr] 21 | 253.0000000
Multiply by \((.73)^{21}\) | × .73 y^x 21 | 0.34111482
Multiply by \((.27)^2\) and add | × .27 x² + |
Calculate \(C_{23}^{22}(.73)^{22}(.27)\): first \(C_{23}^{22}\) | 23 2nd [nCr] 22 | 23.00000000
Multiply by \((.73)^{22}\) | × .73 y^x 22 | 0.02263762
Multiply by .27 and add | × .27 + |
Calculate \(C_{23}^{23}(.73)^{23}\): first \(C_{23}^{23}\) | 23 2nd [nCr] 23 | 1.00000000
Multiply by \((.73)^{23}\) and get the final result | × .73 y^x 23 = | 0.09608031

Method #2

Procedure | Keystroke | Display
Set to display 8 decimal places (4 decimal places are sufficient, but assume you want to see more decimals) | 2nd [FORMAT] 8 ENTER | DEC=8.00000000
Set AOS (Algebraic Operating System) | 2nd [FORMAT], keep pressing ↓ until you see Chn, then press 2nd [ENTER] (if you see AOS, your calculator is already in AOS, in which case press [CLR Work]) | AOS
Clear memories | 2nd [MEM] 2nd [CLR Work] | M0=0.00000000
Get back to calculation mode | CE/C | 0.00000000
Calculate \(C_{23}^{20}(.73)^{20}(.27)^3\) and store it in the first memory (M0): first \(C_{23}^{20}\) | 23 2nd [nCr] 20 | 1,771.000000
Multiply by \((.73)^{20}\) | × .73 y^x 20 | 3.27096399
Multiply by \((.27)^3\) | × .27 y^x 3 = | 0.06438238
Store the result | STO 0 | 0.06438238
Get back to calculation mode | CE/C | 0.00000000
Calculate \(C_{23}^{21}(.73)^{21}(.27)^2\) and store it in the second memory (M1): first \(C_{23}^{21}\) | 23 2nd [nCr] 21 | 253.0000000
Multiply by \((.73)^{21}\) | × .73 y^x 21 | 0.34111482
Multiply by \((.27)^2\) | × .27 x² = | 0.02486727
Store the result | STO 1 | 0.02486727
Get back to calculation mode | CE/C | 0.00000000
Calculate \(C_{23}^{22}(.73)^{22}(.27)\) and store it in the third memory (M2): first \(C_{23}^{22}\) | 23 2nd [nCr] 22 | 23.00000000
Multiply by \((.73)^{22}\) | × .73 y^x 22 | 0.02263762
Multiply by .27 | × .27 = | 0.00611216
Store the result | STO 2 | 0.00611216
Calculate \(C_{23}^{23}(.73)^{23}\) and store it in the fourth memory (M3): first \(C_{23}^{23}\) | 23 2nd [nCr] 23 | 1.00000000
Multiply by \((.73)^{23}\) | × .73 y^x 23 = | 0.00071850
Store the result | STO 3 | 0.00071850
Recall the four stored values and sum them up | RCL 0 | 0.06438238
| + RCL 1 | 0.02486727
| + RCL 2 | 0.00611216
| + RCL 3 = | 0.09608031

Method #1 is quicker but riskier. Because you don't have an audit history, if you miscalculate one item, you'll need to recalculate everything from scratch.

Method #2 is slower but leaves a good auditing trail by storing all your intermediate values in your calculator's memories. If you miscalculate one item, you need to recalculate that item alone and can reuse the results of the other calculations (which are correct).

For example, instead of calculating \(C_{23}^{20}(.73)^{20}(.27)^3\) as you should, you calculated \(C_{23}^{20}(.73)^{3}(.27)^{20}\). To correct this error under Method #1, you have to start from scratch and calculate each of the following four items:

\(C_{23}^{20}(.73)^{20}(.27)^3\), \(C_{23}^{21}(.73)^{21}(.27)^2\), \(C_{23}^{22}(.73)^{22}(.27)\), and \(C_{23}^{23}(.73)^{23}\)

In contrast, correcting this error under Method #2 is a lot easier. You just need to recalculate \(C_{23}^{20}(.73)^{20}(.27)^3\); you don't need to recalculate any of the following three items:

\(C_{23}^{21}(.73)^{21}(.27)^2\), \(C_{23}^{22}(.73)^{22}(.27)\), and \(C_{23}^{23}(.73)^{23}\)

You can easily retrieve the above three items from your calculator's memories and calculate the final result:

\[ C_{23}^{20}(.73)^{20}(.27)^3 + C_{23}^{21}(.73)^{21}(.27)^2 + C_{23}^{22}(.73)^{22}(.27) + C_{23}^{23}(.73)^{23} = .09608 \]

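Given:

Quantity | Value
\(l_{20}\) | 9,617,802
\(l_{30}\) | 9,501,381
\(l_{50}\) | 8,950,901
\(A_{50}\) | 0.24905
\(a_{20}\) | 16.5133
\(a_{30}\) | 15.8561
\(a_{50}\) | 13.2668
Interest rate | 6%

Calculate

\[ V = \frac{l_{50}}{l_{30}}\,A_{50}\,v^{20}\cdot\frac{a_{20} - \dfrac{l_{30}}{l_{20}}\,v^{10}\,a_{30}}{a_{20} - \dfrac{l_{50}}{l_{20}}\,v^{30}\,a_{50}} \]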

Solution

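This calculation is complex. Unless you use a systematic method, you'll make mistakes.

First, simplify the formula by moving \(l_{30}\) and \(l_{50}\) inside the numerator and the denominator:

\[ \frac{l_{50}}{l_{30}}\,A_{50}\,v^{20}\cdot\frac{a_{20} - \dfrac{l_{30}}{l_{20}}v^{10}a_{30}}{a_{20} - \dfrac{l_{50}}{l_{20}}v^{30}a_{50}} = A_{50}\,v^{20}\cdot\frac{\dfrac{1}{l_{30}}\left(a_{20} - \dfrac{l_{30}}{l_{20}}v^{10}a_{30}\right)}{\dfrac{1}{l_{50}}\left(a_{20} - \dfrac{l_{50}}{l_{20}}v^{30}a_{50}\right)} = A_{50}\,v^{20}\cdot\frac{\dfrac{a_{20}}{l_{30}} - \dfrac{a_{30}}{l_{20}}v^{10}}{\dfrac{a_{20}}{l_{50}} - \dfrac{a_{50}}{l_{20}}v^{30}} \]

With \(v = 1.06^{-1}\):

\[ V = A_{50}\,1.06^{-20}\cdot\frac{\dfrac{a_{20}}{l_{30}} - \dfrac{a_{30}}{l_{20}}1.06^{-10}}{\dfrac{a_{20}}{l_{50}} - \dfrac{a_{50}}{l_{20}}1.06^{-30}} \]

Make sure you don't make mistakes in simplification. If you are afraid of making mistakes, don't simplify and just do your calculations using the original equation:

\[ V = \frac{l_{50}}{l_{30}}\,A_{50}\,v^{20}\cdot\frac{a_{20} - \dfrac{l_{30}}{l_{20}}\,v^{10}\,a_{30}}{a_{20} - \dfrac{l_{50}}{l_{20}}\,v^{30}\,a_{50}} \]

Symbol | Memory | Value
\(l_{20}\) | M0 | 9,617,802
\(l_{30}\) | M1 | 9,501,381
\(l_{50}\) | M2 | 8,950,901
\(A_{50}\) | M3 | 0.24905
\(a_{20}\) | M4 | 16.5133
\(a_{30}\) | M5 | 15.8561
\(a_{50}\) | M6 | 13.2668

\[ V = A_{50}\,1.06^{-20}\cdot\frac{\dfrac{a_{20}}{l_{30}} - \dfrac{a_{30}}{l_{20}}1.06^{-10}}{\dfrac{a_{20}}{l_{50}} - \dfrac{a_{50}}{l_{20}}1.06^{-30}} = (M3)\,1.06^{-20}\cdot\frac{\dfrac{M4}{M1} - \dfrac{M5}{M0}1.06^{-10}}{\dfrac{M4}{M2} - \dfrac{M6}{M0}1.06^{-30}} \]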

Procedure | Keystroke | Display
Set to display 8 decimal places | 2nd [FORMAT] 8 ENTER | DEC=8.00000000
Set AOS (Algebraic Operating System) | 2nd [FORMAT], keep pressing ↓ until you see Chn, then press 2nd [ENTER] (if you see AOS, your calculator is already in AOS, in which case press [CLR Work]) | AOS
Clear existing numbers from the memories | 2nd [MEM] 2nd [CLR Work] | M0=0.00000000
Enter 9,617,802 in M0 | 9,617,802 ENTER | M0=9,617,802.000
Move to the next memory | ↓ | M1=0.00000000
Enter 9,501,381 in M1 | 9,501,381 ENTER | M1=9,501,381.000
Move to the next memory | ↓ | M2=0.00000000
Enter 8,950,901 in M2 | 8,950,901 ENTER | M2=8,950,901.000
Move to the next memory | ↓ | M3=0.00000000
Enter 0.24905 in M3 | 0.24905 ENTER | M3=0.24905000
Move to the next memory | ↓ | M4=0.00000000
Enter 16.5133 in M4 | 16.5133 ENTER | M4=16.51330000
Move to the next memory | ↓ | M5=0.00000000
Enter 15.8561 in M5 | 15.8561 ENTER | M5=15.85610000
Move to the next memory | ↓ | M6=0.00000000
Enter 13.2668 in M6 | 13.2668 ENTER | M6=13.26680000
Leave the memory worksheet and get back to the normal calculation mode | CE/C (this is the button on the bottom left corner, the same button as CLR Work) |

Keystrokes: press 2nd MEM. Then keep pressing the down-arrow key to view all the

data you entered in the memories. Make sure all the correct numbers are entered.

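\[ V = (M3)\,1.06^{-20}\cdot\frac{\dfrac{M4}{M1} - \dfrac{M5}{M0}1.06^{-10}}{\dfrac{M4}{M2} - \dfrac{M6}{M0}1.06^{-30}} \]

We'll break down the calculation into two pieces:

\[ \frac{M4}{M1} - \frac{M5}{M0}1.06^{-10} = M7 \quad\text{(store the result in M7)} \]

\[ \frac{M4}{M2} - \frac{M6}{M0}1.06^{-30} = M8 \quad\text{(store the result in M8)} \]

\[ V = (M3)\,1.06^{-20}\,\frac{M7}{M8} \]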

Procedure | Keystroke | Display
Calculate \(\frac{M4}{M1} - \frac{M5}{M0}1.06^{-10}\) | RCL 4 ÷ RCL 1 - RCL 5 ÷ RCL 0 × 1.06 y^x 10 +/- = | 0.00000082
Store the result in M7. Go back to the normal calculation mode. | STO 7 CE/C |
Calculate \(\frac{M4}{M2} - \frac{M6}{M0}1.06^{-30}\) | RCL 4 ÷ RCL 2 - RCL 6 ÷ RCL 0 × 1.06 y^x 30 +/- = | 0.00000160
Store the result in M8. Go back to the normal calculation mode. | STO 8 CE/C |
Calculate \(V = (M3)\,1.06^{-20}\,\frac{M7}{M8}\) | RCL 3 × 1.06 y^x 20 +/- × RCL 7 ÷ RCL 8 = | 0.03955601

So \(V = 0.0395560 \approx 0.04\).

Though this calculation process looks long, once you get used to it, you can do it in less

than one minute.
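The memory layout can be mirrored in a short Python sketch to confirm the intermediate values. This is an illustration only: the dictionary `M` is a hypothetical stand-in for the calculator memories M0 to M8, and the formula is the simplified one stated in the text.

```python
# Inputs keyed by the memory slots used in the text (M0..M6).
M = {
    0: 9_617_802,  # l20
    1: 9_501_381,  # l30
    2: 8_950_901,  # l50
    3: 0.24905,    # A50
    4: 16.5133,    # a20
    5: 15.8561,    # a30
    6: 13.2668,    # a50
}
v = 1 / 1.06  # discount factor at 6% interest

# The two intermediate pieces, stored like M7 and M8 on the calculator.
M[7] = M[4] / M[1] - M[5] / M[0] * v**10
M[8] = M[4] / M[2] - M[6] / M[0] * v**30

V = M[3] * v**20 * M[7] / M[8]
print(f"M7 = {M[7]:.8f}, M8 = {M[8]:.8f}, V = {V:.8f}")
```

Because every input and intermediate value stays in `M`, any one of them can be inspected or replaced without redoing the rest, which is the point of the audit trail.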

• Inputs are entered only once. In this problem, \(l_{20}\) and \(a_{20}\) are used twice in the formula

\[ V = A_{50}\,1.06^{-20}\cdot\frac{\dfrac{a_{20}}{l_{30}} - \dfrac{a_{30}}{l_{20}}1.06^{-10}}{\dfrac{a_{20}}{l_{50}} - \dfrac{a_{50}}{l_{20}}1.06^{-30}} \]

However, we enter \(l_{20}\) and \(a_{20}\) into memories only once. This reduces data entry errors.

• This process gives us a good auditing trail, enabling us to check the data entry and calculations.

• We can isolate errors. For example, if a wrong value of \(l_{30}\) is entered into the memory, we can reenter \(l_{30}\), recalculate \(\frac{a_{20}}{l_{30}} - \frac{a_{30}}{l_{20}}1.06^{-10}\), and store the calculated value into M7. Next, we recalculate \(V = (M3)\,1.06^{-20}\,\frac{M7}{M8}\).

Bottom line: I recommend that you master this calculation method. It costs you extra

work, but it enables you to do messy calculations 100% right in the exam.

When exams get tough and calculations get messy, many candidates who know as much as you do will make calculation errors here and there and fail the exam. In contrast, you'll stand above the crowd and make no errors, passing another exam.

In Example 2, you calculated that V = 0.04. However, none of the answer choices given is 0.04. Suspecting that you made an error in calculations, you decided to redo the calculation. First, you scrolled over the memories and gladly found no error in data entry. Next, you recalculated \(\frac{M4}{M1} - \frac{M5}{M0}1.06^{-10} = M7\) and \(\frac{M4}{M2} - \frac{M6}{M0}1.06^{-30} = M8\). Once again, you found your previous calculations were right. Finally, you recalculated \(V = (M3)\,1.06^{-20}\,\frac{M7}{M8}\). Once again, you got V = 0.04.

You had already spent four minutes on this problem. You decided to spend two more minutes on it. If you couldn't figure out the right answer, you would just have to give up and move on to the next problem.

So you quickly read the problem again. Oops! You found that your formula was wrong. Your original formula was:

\[ V = \frac{l_{50}}{l_{30}}\,A_{50}\,v^{20}\cdot\frac{a_{20} - \dfrac{l_{30}}{l_{20}}\,v^{10}\,a_{30}}{a_{20} - \dfrac{l_{50}}{l_{20}}\,v^{30}\,a_{50}} \]

The correct formula has \(a_{50}\) in place of \(A_{50}\):

\[ V = \frac{l_{50}}{l_{30}}\,a_{50}\,v^{20}\cdot\frac{a_{20} - \dfrac{l_{30}}{l_{20}}\,v^{10}\,a_{30}}{a_{20} - \dfrac{l_{50}}{l_{20}}\,v^{30}\,a_{50}} \]

How could you find the answer quickly, using the correct formula?

Solution

The situation described here sometimes happens in the actual exam. If you don't use a systematic method to do calculations, you won't leave a good auditing trail. In that case, all your previous calculations are gone and you have to redo the calculations from scratch. This is awful.

Fortunately, you left a good auditing trail and correcting errors was easy. Your original formula was:

\[ V = \frac{l_{50}}{l_{30}}\,A_{50}\,v^{20}\cdot\frac{a_{20} - \dfrac{l_{30}}{l_{20}}\,v^{10}\,a_{30}}{a_{20} - \dfrac{l_{50}}{l_{20}}\,v^{30}\,a_{50}} = (M3)\,1.06^{-20}\,\frac{M7}{M8} \]

The correct formula is:

\[ V = \frac{l_{50}}{l_{30}}\,a_{50}\,v^{20}\cdot\frac{a_{20} - \dfrac{l_{30}}{l_{20}}\,v^{10}\,a_{30}}{a_{20} - \dfrac{l_{50}}{l_{20}}\,v^{30}\,a_{50}} = (M6)\,1.06^{-20}\,\frac{M7}{M8} \]

Remember \(a_{50} = M6\).

\[ V = (M6)\,1.06^{-20}\,\frac{M7}{M8} = 2.10713362 \approx 2.11 \]

Now you look at the answer choices again. Good. 2.11 is there!
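The correction can be sketched in Python as follows. This is a hedged illustration with hypothetical variable names: the intermediate values that play the roles of M7 and M8 are computed once, and only the leading factor changes between the wrong and the corrected formula.

```python
v = 1 / 1.06  # discount factor at 6% interest
l20, l30, l50 = 9_617_802, 9_501_381, 8_950_901
A50, a20, a30, a50 = 0.24905, 16.5133, 15.8561, 13.2668

# Intermediate values, computed once and kept (the roles of M7 and M8).
m7 = a20 / l30 - a30 / l20 * v**10
m8 = a20 / l50 - a50 / l20 * v**30
common = v**20 * m7 / m8

V_wrong = A50 * common  # original formula, leading factor A50 (M3)
V_fixed = a50 * common  # corrected formula, leading factor a50 (M6)
print(f"V_wrong = {V_wrong:.8f}, V_fixed = {V_fixed:.8f}")
```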

Use TI-30X IIS (using the redo capability of the TI-30X IIS)

Use BA II Plus/BA II Plus Professional 1-V Statistics Worksheet

Exam #1 (#8 Course 1 May 2000) A probability distribution of the claim sizes for

an auto insurance policy is given in the table below:

Claim size | Probability
20 | 0.15
30 | 0.10
40 | 0.05
50 | 0.20
60 | 0.10
70 | 0.10
80 | 0.30

What percentage of the claims are within one standard deviation of the mean claim size?

Solution

errors. Always let the calculator do all the calculations for you.

One critical thing to remember about the BA II Plus and BA II Plus Professional Statistics Worksheet is that you cannot directly enter the probability mass function \(f(x_i)\) into the calculator to find \(E(X)\) and \(Var(X)\). The BA II Plus and BA II Plus Professional 1-V Statistics Worksheet accepts only scaled-up probabilities that are positive integers. If you enter a non-integer value into the statistics worksheet, you will get an error when attempting to retrieve \(E(X)\) and \(Var(X)\).

common integer.

Claim size x | Probability Pr(x) | Scaled-up probability = 100 Pr(x)
20 | 0.15 | 15
30 | 0.10 | 10
40 | 0.05 | 5
50 | 0.20 | 20
60 | 0.10 | 10
70 | 0.10 | 10
80 | 0.30 | 30
Total | 1.00 | 100
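The scaling step can be sketched in Python. The helper name `scale_probabilities` is hypothetical, and the sketch assumes the probabilities are exact decimals. It finds the smallest common multiplier that turns every probability into an integer; the table above uses 100, which works equally well because only the relative weights matter.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def scale_probabilities(probs):
    """Return (scale, integer weights) with weight_i = scale * prob_i."""
    fracs = [Fraction(str(p)) for p in probs]
    # Least common multiple of all denominators.
    scale = reduce(lambda acc, f: acc * f.denominator // gcd(acc, f.denominator), fracs, 1)
    weights = [int(f * scale) for f in fracs]
    return scale, weights

probs = [0.15, 0.10, 0.05, 0.20, 0.10, 0.10, 0.30]
scale, weights = scale_probabilities(probs)
weights_100 = [int(Fraction(str(p)) * 100) for p in probs]  # scaling used in the table

print(scale, weights)
print(100, weights_100)
```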

Next, enter the 7 data pairs of (claim size and scaled-up probability) into the BA II Plus Statistics Worksheet to get \(E(X)\) and \(\sigma_X\).

Procedure | Keystrokes | Display
Set the calculator to display 4 decimal places | 2nd [FORMAT] 4 ENTER | DEC=4.0000
Set AOS (Algebraic Operating System) | 2nd [FORMAT], keep pressing ↓ until you see Chn, then press 2nd [ENTER] (if you see AOS, your calculator is already in AOS, in which case press [CLR Work]) | AOS
Select data entry portion of Statistics worksheet | 2nd [DATA] | X01 (old contents)
Enter data set | 20 ENTER ↓ 15 ENTER ↓ | X01=20.0000, Y01=15.0000
| 30 ENTER ↓ 10 ENTER ↓ | X02=30.0000, Y02=10.0000
| 40 ENTER ↓ 5 ENTER ↓ | X03=40.0000, Y03=5.0000
| 50 ENTER ↓ 20 ENTER ↓ | X04=50.0000, Y04=20.0000
| 60 ENTER ↓ 10 ENTER ↓ | X05=60.0000, Y05=10.0000
| 70 ENTER ↓ 10 ENTER ↓ | X06=70.0000, Y06=10.0000
| 80 ENTER ↓ 30 ENTER | X07=80.0000, Y07=30.0000
Select statistical calculation portion of Statistics worksheet | 2nd [STAT] | Old content
Select one-variable calculation method | Keep pressing 2nd [SET] until you see 1-V | 1-V
View the sum of the scaled-up probabilities | ↓ | n=100.0000 (Make sure the sum of the scaled-up probabilities is equal to the scaled-up common factor, which in this problem is 100. If n is not equal to the common factor, you've made a data entry error.)
View mean | ↓ | \(\bar{x}\)=55.0000
View sample standard deviation | ↓ | \(S_x\)=21.9043 (this is a sample standard deviation; don't use this value). Note that \(S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}\)
View standard deviation | ↓ | \(\sigma_X\)=21.7945
View \(\sum X^2\) | ↓ until you see \(\sum X^2\) | \(\sum X^2\)=350,000.0000 (not needed for this problem, though this function might be useful for other calculations)

You should always double check (using ↑ or ↓ to scroll up or down the data pairs of X and Y) that your data entry is correct before accepting \(E(X)\) and \(\sigma_X\) generated by BA II Plus.

If you have made an error in data entry, you can press 2nd DEL to delete a data pair (X, Y) or 2nd INS to insert a data pair (X, Y). If you typed a wrong number, you can use → to delete the wrong number and then re-enter the correct number. Refer to the BA II Plus guidebook for details on how to correct data entry errors.

formula-driven approach, it could be because you are not familiar with the BA II Plus

Statistics Worksheet yet. With practice, you will find that using the calculator is quicker

than manually calculating with formulas.

Then, we have

\[ (E(X) - \sigma_X,\ E(X) + \sigma_X) = (55 - 21.7945,\ 55 + 21.7945) = (33.21,\ 76.79) \]

\[ \Pr(33.21 \le X \le 76.79) = \Pr(X = 40) + \Pr(X = 50) + \Pr(X = 60) + \Pr(X = 70) \]

= 0.05 + 0.20 + 0.10 + 0.10 = 0.45
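For readers who want to verify the worksheet output, here is a minimal Python sketch of the same calculation. The variable names are illustrative; it uses the population formula \(Var(X) = E(X^2) - E^2(X)\), matching \(\sigma_X\) rather than the sample standard deviation \(S_x\).

```python
import math

claims = [20, 30, 40, 50, 60, 70, 80]
probs = [0.15, 0.10, 0.05, 0.20, 0.10, 0.10, 0.30]

mean = sum(x * p for x, p in zip(claims, probs))
second_moment = sum(x * x * p for x, p in zip(claims, probs))
sd = math.sqrt(second_moment - mean**2)  # population standard deviation

# Probability mass within one standard deviation of the mean.
low, high = mean - sd, mean + sd
within = sum(p for x, p in zip(claims, probs) if low <= x <= high)

print(f"E(X) = {mean:.4f}, E(X^2) = {second_moment:.4f}, sd = {sd:.4f}")
print(f"interval = ({low:.2f}, {high:.2f}), probability = {within:.2f}")
```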

To find \(E(X)\), we type:

20*.15+30*.1+40*.05+50*.2+60*.1+70*.1+80*.3

Next, we modify the formula to

20²*.15+30²*.1+40²*.05+50²*.2+60²*.1+70²*.1+80²*.3

To insert each square, move the cursor until it is blinking on top of the multiplication sign. Press 2nd INS x². After all seven squares are inserted, press ENTER:

=3500

So \(E(X^2) = 3{,}500\).

Keep in mind that you can enter up to 88 digits for a formula in the TI-30X IIS. If your formula exceeds 88 digits, the TI-30X IIS will ignore the digits entered after the 88th digit.

A baseball team has scheduled its opening game for April 1. If it rains on April 1, the

game is postponed and will be played on the next day that it does not rain. The team

purchases insurance against rain. The policy will pay 1,000 for each day, up to 2 days,

that the opening game is postponed. The insurance company determines that the number

of consecutive days of rain beginning on April 1 is a Poisson random variable with a 0.6

mean. What is the standard deviation of the amount the insurance company will have to

pay?

(A) 668, (B) 699, (C) 775, (D) 817, (E) 904

Solution

\[ \Pr(N = n) = e^{-\lambda}\frac{\lambda^n}{n!} = e^{-0.6}\frac{0.6^n}{n!} \qquad (n = 0, 1, 2, \ldots) \]

Let X = payment by the insurance company. According to the insurance contract, if there is no rain (n = 0), X = 0. If it rains for only 1 day, X = $1,000. If it rains for two or more days in a row, X is always $2,000. We are asked to calculate \(\sigma_X\).

If a problem asks you to calculate the mean, standard deviation, or other statistics of a discrete random variable, it is always a good idea to list the variable's values and their corresponding probabilities in a table before doing the calculation to organize your data. So let's list the data pair (X, probability) in a table:

Payment X | Probability
0 | \(\Pr(N = 0) = e^{-0.6}\frac{0.6^0}{0!} = e^{-0.6}\)
1,000 | \(\Pr(N = 1) = e^{-0.6}\frac{0.6^1}{1!} = 0.6e^{-0.6}\)
2,000 | \(\Pr(N \ge 2) = \Pr(N = 2) + \Pr(N = 3) + \ldots = 1 - [\Pr(N = 0) + \Pr(N = 1)] = 1 - 1.6e^{-0.6}\)

Once you set up the table above, you can use BA II Plus's Statistics Worksheet or TI-30X IIS to find the mean and variance.

1000*.6e^(-.6)+2000(1-1.6e^(-.6

When typing e^(-.6) for \(e^{-0.6}\), you need to use the negative sign key, not the minus sign key, to get -.6. If you type the minus sign in e^(-.6), you will get an error message.

Additionally, for \(0.6e^{-0.6}\), you do not need to type 0.6*e^(-.6), just type .6e^(-.6). Also, to calculate \(2000(1 - 1.6e^{-0.6})\), you do not need to type 2000*(1-1.6*(e^(-.6))). Simply

type

2000(1-1.6e^(-.6

Your calculator understands you are trying to calculate \(2000(1 - 1.6e^{-0.6})\). However, omitting the closing parentheses works only for the last item in your formula. In other words, if your equation is

\[2000(1 - 1.6e^{-0.6}) + 1000(0.6e^{-0.6})\]

you have to type the first item with all of its parentheses, but can skip typing the closing parenthesis in the 2nd item:

2000(1-1.6e^(-.6)) + 1000*.6e^(-.6

If you type

2000(1-1.6e^(-.6 + 1000*.6e^(-.6

your calculator will interpret it as

2000(1-1.6e^(-.6 + 1000*.6e^(-.6) ) )

which is not the expression you want. To find E(X), type

1000*.6e^(-.6)+2000(1-1.6e^(-.6

and press ENTER. You should get E(X) = 573.0897. This is an intermediate value. You can store it on your scrap paper or in one of your calculator's memories.

\[Var(X) = E(X^2) - E^2(X) = 488{,}460.6535\]

\[\sigma_X = \sqrt{Var(X)} = 698.8996\]

First, please note that you can always calculate \(\sigma_X\) without using the BA II Plus built-in Statistics Worksheet. You can calculate \(E(X)\), \(E(X^2)\), \(Var(X)\) in BA II Plus as you do any other calculations without using the built-in worksheet.

\[E(X) = 0 \cdot e^{-0.6} + 1{,}000\,(0.6e^{-0.6}) + 2{,}000\,(1 - 1.6e^{-0.6})\]

\[E(X^2) = 0^2 \cdot e^{-0.6} + 1{,}000^2\,(0.6e^{-0.6}) + 2{,}000^2\,(1 - 1.6e^{-0.6})\]

\[Var(X) = E(X^2) - E^2(X), \quad \sigma_X = \sqrt{Var(X)}\]

You simply calculate each item in the above equations with BA II Plus. This will give

you the required standard deviation.
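As a cross-check away from the exam room, the same three-point calculation can be sketched in a few lines of Python. This is only an illustration of the formulas above; the variable names are my own and are not part of the calculator procedure.

```python
import math

# Payment X and its probability when the number of rainy days is Poisson(0.6).
lam = 0.6
p0 = math.exp(-lam)          # Pr(N = 0): no payment
p1 = lam * math.exp(-lam)    # Pr(N = 1): pay 1,000
p2_plus = 1.0 - p0 - p1      # Pr(N >= 2): pay 2,000 (policy limit of 2 days)

payments = [0.0, 1000.0, 2000.0]
probs = [p0, p1, p2_plus]

mean = sum(x * p for x, p in zip(payments, probs))
second_moment = sum(x * x * p for x, p in zip(payments, probs))
variance = second_moment - mean ** 2
std_dev = math.sqrt(variance)

print(round(mean, 4), round(std_dev, 4))
```

The printed standard deviation points to answer choice (B).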

However, BA II Plus has a built-in statistics worksheet and we should utilize it.

The key to using the BA II Plus Statistics Worksheet is to scale up the probabilities to integers. To scale the three probabilities

\[(e^{-0.6}, \; 0.6e^{-0.6}, \; 1 - 1.6e^{-0.6})\]

to integers, multiply each one by 10,000:

Payment X    Probability (assuming you set your BA II Plus to display 4 decimal places)    Scaled-up probability (multiply the original probability by 10,000)
0            \(e^{-0.6} = 0.5488\)                5,488
1,000        \(0.6e^{-0.6} = 0.3293\)             3,293
2,000        \(1 - 1.6e^{-0.6} = 0.1219\)         1,219
Total        1.0                                  10,000

Then we just enter the following data pairs into BA II Plus's statistics worksheet:

X01=0 Y01=5,488;

X02=1,000 Y02=3,293;

X03=2,000 Y03=1,219.

Make sure your calculator gives you n that matches the sum of the scaled-up

probabilities. In this problem, the sum of your scaled-up probabilities is 10,000, so you

should get n=10,000. If your calculator gives you n that is not 10,000, you know that at

least one of the scaled-up probabilities is wrong.

Of course, you can scale up the probabilities with better precision (more closely

resembling the original probabilities). For example, you can scale them up this way

(assuming you set your calculator to display 8 decimal places):

Payment X    Probability                         Scaled-up probability (multiply the original probability by 100,000,000)
0            \(e^{-0.6} = 0.54881164\)            54,881,164
1,000        \(0.6e^{-0.6} = 0.32928698\)         32,928,698
2,000        \(1 - 1.6e^{-0.6} = 0.12190138\)     12,190,138
Total                                             100,000,000

Then we just enter the following data pairs into BA II Plus's statistics worksheet:

X01=0 Y01=54,881,164;

X02=1,000 Y02=32,928,698;

X03=2,000 Y03=12,190,138.

Then the calculator will give you \(\sigma_X = 698.8995993\) (remember to check that n=100,000,000).

For exam problems, scaling up the original probabilities by multiplying them by 10,000

is good enough to give you the correct answer. Under exam conditions it is unnecessary

to scale the probability up by multiplying by 100,000,000.

The number of claims a driver has during the year is assumed to be Poisson distributed

with an unknown mean that varies by driver.

# of claims    # of drivers
0              54
1              33
2              10
3              2
4              1
Total          100

Determine the credibility of one year's experience for a single driver using semiparametric empirical Bayes estimation.

Solution

For now don't worry about credibility and focus on calculating the sample mean and sample variance.

\[\bar X = \frac{54(0) + 33(1) + 10(2) + 2(3) + 1(4)}{54 + 33 + 10 + 2 + 1} = \frac{63}{100} = 0.63\]

\[Var(X) = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar X\right)^2 = \frac{1}{100-1}\sum_{i=1}^{100}\left(X_i - \bar X\right)^2\]

\[= \frac{54(0-0.63)^2 + 33(1-0.63)^2 + 10(2-0.63)^2 + 2(3-0.63)^2 + 1(4-0.63)^2}{100-1} = 0.68\]

Enter

X01=0, Y01=54

X02=1, Y02=33

X03=2, Y03=10

X04=3, Y04=2

X05=4, Y05=1

\(\bar X = 0.63\)

\(S_X = 0.82455988\) (this is the unbiased sample standard deviation)

While your calculator displays \(S_X = 0.82455988\), press the \(x^2\) key of your calculator. You should get: 0.67989899. This is \(Var(X) = S_X^2\). So \(Var(X) = 0.67989899 \approx 0.68\).
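The frequency-table calculation that the Statistics Worksheet performs can be sketched in Python as follows. This is an illustrative check only; the list names are my own.

```python
# Number of claims and number of drivers with that many claims.
claims = [0, 1, 2, 3, 4]
drivers = [54, 33, 10, 2, 1]

n = sum(drivers)
sample_mean = sum(c * d for c, d in zip(claims, drivers)) / n

# Unbiased sample variance: divide by n - 1, not n.
sum_sq_dev = sum(d * (c - sample_mean) ** 2 for c, d in zip(claims, drivers))
sample_var = sum_sq_dev / (n - 1)

print(n, sample_mean, round(sample_var, 8))
```

Dividing by n - 1 rather than n is what makes this match \(S_X^2\) rather than \(\sigma_X^2\) on the calculator.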

Example

For an insurance:

A policyholder's annual losses can be 100, 200, 300, and 400 with respective probabilities 0.1, 0.2, 0.3, and 0.4.

The insurance has an annual deductible of $250 per loss.

Calculate the mean and the variance of the annual payment made by the insurer to the policyholder, given there's a payment.

Solution

Let X represent the annual loss. Let Y represent the claim payment by the insurer to the

policyholder.

Then
\[Y = \begin{cases} 0 & \text{if } X \le 250 \\ X - 250 & \text{if } X > 250 \end{cases}\]

Standard solution

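X       100    200    300    400
Y       0      0      50     150
P(X)    0.1    0.2    0.3    0.4

\[P(X > 250) = P(X = 300) + P(X = 400) = 0.3 + 0.4 = 0.7\]

Given that a payment is made (X > 250), only X = 300 and X = 400 are possible, with conditional probabilities

\[P(X = 300 \mid X > 250) = \frac{0.3}{0.7} = \frac{3}{7}, \quad P(X = 400 \mid X > 250) = \frac{0.4}{0.7} = \frac{4}{7}\]

\[E\left[(X - 250)_+ \mid X > 250\right] = 50\cdot\frac{3}{7} + 150\cdot\frac{4}{7} = 107.1428571\]

\[E\left[(X - 250)_+^2 \mid X > 250\right] = 50^2\cdot\frac{3}{7} + 150^2\cdot\frac{4}{7} = 13{,}928.57143\]

\[Var\left[(X - 250)_+ \mid X > 250\right] = 13{,}928.57143 - 107.1428571^2 = 2{,}448.98\]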

X                                        100            200            300           400
X > 250? If Yes, keep; if No, discard.   No. Discard.   No. Discard.   Yes. Keep.    Yes. Keep.

X                                   300    400
Y                                   50     150
10 P(X) (scaled-up probability)     3      4

\[n = 7, \quad \bar X = 107.14, \quad \sigma_X = 49.48716593, \quad Var = \sigma_X^2 = 2{,}448.98\]

This is how BA II Plus/Professional 1-V Statistics Worksheet works. After you enter

X01=50, Y01=3,X02=150, Y02=4, BA II Plus/Professional knows that your random

variable X takes on two values: 50 (with frequency of 3) and 150 (with frequency 4).

Next, BA II Plus/Professional sets up the following table for statistics calculation:

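\[X = \begin{cases} 50 & \text{with probability } \dfrac{3}{3+4} = \dfrac{3}{7} \\[4pt] 150 & \text{with probability } \dfrac{4}{3+4} = \dfrac{4}{7} \end{cases}\]

\[E(X) = 50\cdot\frac{3}{7} + 150\cdot\frac{4}{7}, \quad E(X^2) = 50^2\cdot\frac{3}{7} + 150^2\cdot\frac{4}{7}, \quad Var(X) = E(X^2) - E^2(X)\]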

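These are the same as the results we calculated earlier:

\[E\left[(X - 250)_+ \mid X > 250\right] = 50\cdot\frac{3}{7} + 150\cdot\frac{4}{7} = 107.1428571\]

\[E\left[(X - 250)_+^2 \mid X > 250\right] = 50^2\cdot\frac{3}{7} + 150^2\cdot\frac{4}{7} = 13{,}928.57143\]

\[Var\left[(X - 250)_+ \mid X > 250\right] = 13{,}928.57143 - 107.1428571^2 = 2{,}448.98\]

Now you see that BA II Plus/Professional correctly calculates the mean and variance.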

frequency, not the absolute data frequency.

The following entries produce identical mean and variance:

Entry One: X01=50, Y01=3, X02=150, Y02=4,
Entry Two: X01=50, Y01=6, X02=150, Y02=8,
Entry Three: X01=50, Y01=30, X02=150, Y02=40,

\[X = \begin{cases} 50 & \text{with probability } \dfrac{3}{7} \\[4pt] 150 & \text{with probability } \dfrac{4}{7} \end{cases}\]

Professional 1-V Statistics Worksheet:

Throw away all the data pairs (Yi , X i ) where the condition X > a is NOT met.

Using the remaining data pairs to calculate E (Y ) and Var (Y ) .

Professional 1-V Statistics Worksheet:

Throw away all the data pairs (Yi , X i ) where the condition X < a is NOT met.

Using the remaining data pairs to calculate E (Y ) and Var (Y ) .
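The "throw away, then calculate" procedure can be sketched in Python for the deductible example above. This is an illustrative check with names of my own choosing; exact fractions are used so the result can be compared directly with 3/7 and 4/7.

```python
from fractions import Fraction

losses = [100, 200, 300, 400]
weights = [1, 2, 3, 4]        # probabilities 0.1..0.4 scaled up by 10
deductible = 250

# Keep only the pairs where the condition X > deductible is met.
kept = [(x - deductible, w) for x, w in zip(losses, weights) if x > deductible]

total_weight = sum(w for _, w in kept)
cond_mean = Fraction(sum(y * w for y, w in kept), total_weight)
cond_second_moment = Fraction(sum(y * y * w for y, w in kept), total_weight)
cond_var = cond_second_moment - cond_mean ** 2

print(float(cond_mean), float(cond_var))
```

Only the relative weights of the kept pairs matter, which is why entering 3 and 4 (or 6 and 8, or 30 and 40) gives the same mean and variance.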

Example

X = x    \(p_X(x)\)
0.5      \(\frac{4}{6}\,(0.5^4)\,k\)
0.25     \(\frac{1}{6}\,(0.25^3)(0.75)\,k\)
0.75     \(\frac{1}{6}\,(0.75^3)(0.25)\,k\)

Solution

X = x    \(p_X(x)\)                                             Scaled-up \(p_X(x)\) (multiply \(p_X(x)\) by \(\frac{1{,}000{,}000}{k}\))
0.5      \(\frac{4}{6}\,(0.5^4)\,k = 0.041667\,k\)              41,667
0.25     \(\frac{1}{6}\,(0.25^3)(0.75)\,k = 0.001953\,k\)       1,953
0.75     \(\frac{1}{6}\,(0.75^3)(0.25)\,k = 0.017578\,k\)       17,578

X01=0.5, Y01=41,667

X02=0.25, Y02= 1,953

X03=0.75, Y03=17,578

You are given the following joint distribution:

         \(\Theta = 0\)    \(\Theta = 1\)
X = 0    0.4               0.1
X = 1    0.1               0.2
X = 2    0.1               0.1

For a given value of \(\Theta\) and a sample of size 10 for X: \(\sum_{i=1}^{10} X_i = 10\)

Solution

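Don't worry about the Bühlmann credibility premium for now. All you need to do right now is to calculate the following three items:

\[E\left[E(X \mid \Theta)\right], \quad Var\left[E(X \mid \Theta)\right], \quad E\left[Var(X \mid \Theta)\right]\]

X    \(P(X, \Theta = 0)\)    \(10\,P(X, \Theta = 0)\)
0    0.4                     4
1    0.1                     1
2    0.1                     1

\[n = 6, \quad \bar X = 0.5, \quad \sigma_X = 0.76376262, \quad Var = \sigma_X^2 = 0.58333333 = \frac{7}{12}\]

\[E(X \mid \Theta = 0) = 0.5, \quad Var(X \mid \Theta = 0) = 0.58333333 = \frac{7}{12}\]

X    \(P(X, \Theta = 1)\)    \(10\,P(X, \Theta = 1)\)
0    0.1                     1
1    0.2                     2
2    0.1                     1

\[n = 4, \quad \bar X = 1, \quad \sigma_X = 0.70710678, \quad Var = \sigma_X^2 = 0.70710678^2 = 0.5\]

Next, let's calculate \(E\left[E(X \mid \Theta)\right]\) and \(Var\left[E(X \mid \Theta)\right]\).

\(E(X \mid \Theta)\)              \(P(\Theta)\)                                  \(10\,P(\Theta)\)
\(E(X \mid \Theta = 0) = 0.5\)    \(P(\Theta = 0) = 0.4 + 0.1 + 0.1 = 0.6\)      6
\(E(X \mid \Theta = 1) = 1\)      \(P(\Theta = 1) = 0.1 + 0.2 + 0.1 = 0.4\)      4

\[Var\left[E(X \mid \Theta)\right] = \sigma_X^2 = 0.24494897^2 = 0.06\]

\(Var(X \mid \Theta)\)                       \(P(\Theta)\)                                  \(10\,P(\Theta)\)
\(Var(X \mid \Theta = 0) = \frac{7}{12}\)    \(P(\Theta = 0) = 0.4 + 0.1 + 0.1 = 0.6\)      6
\(Var(X \mid \Theta = 1) = 0.5\)             \(P(\Theta = 1) = 0.1 + 0.2 + 0.1 = 0.4\)      4

X01=7/12, Y01=6; X02=0.5, Y02=4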
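The three quantities above can be cross-checked with a short Python sketch. It assumes the joint table is indexed as (x, theta), with theta standing for the conditioning variable in the columns; all names are illustrative.

```python
from fractions import Fraction as F

# Joint probabilities P(X = x, Theta = t), keyed by (x, t).
joint = {
    (0, 0): F(4, 10), (1, 0): F(1, 10), (2, 0): F(1, 10),
    (0, 1): F(1, 10), (1, 1): F(2, 10), (2, 1): F(1, 10),
}

thetas = sorted({t for _, t in joint})
p_theta, cond_mean, cond_var = {}, {}, {}
for t in thetas:
    column = [(x, p) for (x, tt), p in joint.items() if tt == t]
    p_theta[t] = sum(p for _, p in column)
    cond_mean[t] = sum(x * p for x, p in column) / p_theta[t]
    second_moment = sum(x * x * p for x, p in column) / p_theta[t]
    cond_var[t] = second_moment - cond_mean[t] ** 2

# E[E(X|Theta)], Var[E(X|Theta)], E[Var(X|Theta)]
mean_of_means = sum(cond_mean[t] * p_theta[t] for t in thetas)
var_of_means = sum(cond_mean[t] ** 2 * p_theta[t] for t in thetas) - mean_of_means ** 2
mean_of_vars = sum(cond_var[t] * p_theta[t] for t in thetas)

print(mean_of_means, var_of_means, mean_of_vars)
```

Dividing each column by its total is the same step as entering the scaled-up column (4, 1, 1 or 1, 2, 1) as frequencies in the worksheet.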

One useful yet neglected feature of BA II Plus/BA II Plus Professional is the linear least squares regression functionality. This feature can help you quickly solve a tricky problem with a few simple key strokes. Unfortunately, 99.9% of the exam candidates don't know of this feature. Even SOA doesn't know.

Let me quickly walk through the basic formula behind the linear least squares regression. This part is also explained in the chapter on the Bühlmann credibility premium. So I will just repeat what I said in that chapter.

In a regression analysis, you try to fit a line (or a function) through a set of points. With

least squares regression, you want to get a better fit by minimizing the distance squared

of each point to the fitted line. You then use the fitted line to project where the data point

is most likely to be.

Say you want to find out how one's income level affects how much life insurance he buys. Let X represent one's income. Let Y represent the amount of life insurance this person buys. You have collected some data pairs of (X, Y) from a group of consumers. You suspect there's a linear relationship between X and Y. So you want to predict Y using the function a + bX, where a and b are constants. With least squares regression, you want to minimize the following:

\[Q = E\left[(a + bX - Y)^2\right]\]

\[\frac{\partial Q}{\partial a} = \frac{\partial}{\partial a}E\left[(a + bX - Y)^2\right] = E\left[\frac{\partial}{\partial a}(a + bX - Y)^2\right] = E\left[2(a + bX - Y)\right]\]

\[= 2E\left[a + bX - Y\right] = 2\left[a + bE(X) - E(Y)\right]\]

Setting \(\frac{\partial Q}{\partial a} = 0\):

\[a + bE(X) - E(Y) = 0 \quad \text{(Equation I)}\]

\[\frac{\partial Q}{\partial b} = \frac{\partial}{\partial b}E\left[(a + bX - Y)^2\right] = E\left[\frac{\partial}{\partial b}(a + bX - Y)^2\right] = E\left[2(a + bX - Y)X\right]\]

\[= 2E\left[(a + bX - Y)X\right] = 2\left[aE(X) + bE(X^2) - E(XY)\right]\]

Setting \(\frac{\partial Q}{\partial b} = 0\):

\[aE(X) + bE(X^2) - E(XY) = 0 \quad \text{(Equation II)}\]

(Equation II) - (Equation I) \(\times\) E(X):

\[b\left[E(X^2) - E^2(X)\right] = E(XY) - E(X)E(Y)\]


\[b = \frac{Cov(X, Y)}{Var(X)}, \quad a = E(Y) - bE(X)\]

Where

\[Var(X) = E(X^2) - E^2(X), \quad E(X) = \sum p_i x_i, \quad E(X^2) = \sum p_i x_i^2\]

\[Cov(X, Y) = E(XY) - E(X)E(Y), \quad E(XY) = \sum p_i x_i y_i, \quad E(Y) = \sum p_i y_i\]

Example 1. For the following data pairs \((x_i, y_i)\), find the linear least squares regression line a + bX:

i    \(p_i(x_i, y_i)\)    \(X = x_i\)    \(Y = y_i\)
1    1/3                  0              1
2    1/3                  3              6
3    1/3                  12             8

Solution

\[E(X) = \frac{1}{3}(0 + 3 + 12) = 5, \quad E(X^2) = \frac{1}{3}(0^2 + 3^2 + 12^2) = 51\]

\[Var(X) = 51 - 5^2 = 26\]

\[E(Y) = \frac{1}{3}(1 + 6 + 8) = 5, \quad E(XY) = \frac{1}{3}(0 \times 1 + 3 \times 6 + 12 \times 8) = 38\]

\[Cov(X, Y) = E(XY) - E(X)E(Y) = 38 - 5 \times 5 = 13\]

\[b = \frac{Cov(X, Y)}{Var(X)} = \frac{13}{26} = 0.5, \quad a = E(Y) - bE(X) = 5 - 0.5 \times 5 = 2.5\]

If X =0, 2.5 + 0.5 X = 2.5 + 0.5 ( 0 ) = 2.5 ;

If X =3, 2.5 + 0.5 X = 2.5 + 0.5 ( 3) = 4 ;

If X =12, 2.5 + 0.5 X = 2.5 + 0.5 (12 ) = 8.5 ;
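The formulas \(b = Cov(X,Y)/Var(X)\) and \(a = E(Y) - bE(X)\) translate directly into a small Python sketch. The function name `fit_line` and its argument layout are my own choices for illustration.

```python
def fit_line(xs, ys, ps):
    """Return (a, b) of the least squares line a + b*x for points (xs, ys)
    that occur with probabilities ps (assumed to sum to 1)."""
    ex = sum(p * x for x, p in zip(xs, ps))
    ey = sum(p * y for y, p in zip(ys, ps))
    exx = sum(p * x * x for x, p in zip(xs, ps))
    exy = sum(p * x * y for x, y, p in zip(xs, ys, ps))
    b = (exy - ex * ey) / (exx - ex ** 2)   # Cov(X, Y) / Var(X)
    a = ey - b * ex
    return a, b

xs, ys = [0, 3, 12], [1, 6, 8]
a, b = fit_line(xs, ys, [1 / 3, 1 / 3, 1 / 3])
fitted = [a + b * x for x in xs]
print(round(a, 6), round(b, 6), [round(v, 6) for v in fitted])
```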

Now you understand the linear least squares regression. Next, let's talk about how to use BA II Plus/BA II Plus Professional to find a and b and calculate a + bX when X = 0, 3, 12.

Example 2. For the following data pairs \((x_i, y_i)\), find the linear least squares regression line a + bX using BA II Plus/BA II Plus Professional.

i    \(p_i(x_i, y_i)\)    \(X = x_i\)    \(Y = y_i\)
1    1/3                  0              1
2    1/3                  3              6
3    1/3                  12             8

Solution

2nd CLR Work (clear the old contents)

X01=0, Y01=1

X02=3, Y02=6

X03=12, Y03=8

2nd STAT (keep pressing 2nd ENTER, 2nd ENTER, ..., until your calculator displays LIN)

Press the down arrow key, you'll see \(n = 3\)
Press the down arrow key, you'll see \(\bar X = 5\)
Press the down arrow key, you'll see \(S_X = 6.24499800\) (sample standard deviation)
Press the down arrow key, you'll see \(\sigma_X = 5.09901951\) (standard deviation)
Press the down arrow key, you'll see \(\bar Y = 5\)
Press the down arrow key, you'll see \(S_Y = 3.60555128\) (sample standard deviation)
Press the down arrow key, you'll see \(\sigma_Y = 2.94392029\) (standard deviation)
Press the down arrow key, you'll see \(a = 2.5\)
Press the down arrow key, you'll see \(b = 0.5\)
Press the down arrow key, you'll see \(r = 0.8660254\) (the correlation coefficient)


Press the down arrow key, you'll see X' = 0
Enter X' = 0 (To do this, press 0 ENTER)
Press the down arrow key.
Press CPT. You'll get Y' = 2.5 (this is a + bX when X = 0)

Press the up arrow key, you'll see X' = 0
Enter X' = 3 (To do this, press 3 ENTER)
Press the down arrow key.
Press CPT. You'll get Y' = 4 (this is a + bX when X = 3)

Press the up arrow key, you'll see X' = 3
Enter X' = 12 (To do this, press 12 ENTER)
Press the down arrow key.
Press CPT. You'll get Y' = 8.5 (this is a + bX when X = 12)

You see that using BA II Plus/Professional LIN Statistics Worksheet, we get the same result.

You might wonder why we didn't use the probability \(p_i(x_i, y_i)\). Here is an important point. BA II Plus/Professional Statistics Worksheet (including LIN) can't directly handle probabilities. To use Statistics Worksheet, you have to first convert the probabilities to the # of occurrences. In this problem, \(p_i(x_i, y_i) = \frac{1}{3}\) for i = 1, 2, and 3. So we have 3 data pairs of \((x_i, y_i)\) and each data pair is equally likely to occur. So we arbitrarily let each data pair occur only once. This way, BA II Plus/Professional knows that each of the three data pairs has a 1/3 chance of occurring. Later I will show you how to use LIN when \(p_i(x_i, y_i)\) is not uniform.

Some of you might complain: I can easily use my pen and find the answers. Why do I need to bother using LIN? There are several reasons why you might want to use LIN to find the regression line a + bX and calculate various values of a + bX:

In the heat of the exam, it's easy for you to be brain dead and forget the formulas

\[b = \frac{Cov(X, Y)}{Var(X)}, \quad a = E(Y) - bE(X)\]

Even if you are not brain dead, you can easily make mistakes calculating a + bX from scratch. In contrast, if you have entered your data pairs \((x_i, y_i)\) correctly, BA II Plus/Professional will generate the right results.

Even if you want to calculate a + bX from scratch, it's good to use LIN to double check your work.

Example 3. For the following data pairs \((x_i, y_i)\), find the linear least squares regression line a + bX using BA II Plus/BA II Plus Professional.

i    \(p_i(x_i, y_i)\)    \(X = x_i\)    \(Y = y_i\)
1    1/6                  0              1
2    1/3                  3              6
3    1/2                  12             8

Solution

Since the probabilities are not uniform, assume we have a total of 6 occurrences. Then \((x_1, y_1)\) occurs once; \((x_2, y_2)\) occurs twice; and \((x_3, y_3)\) occurs three times. When calculating a + bX, LIN Statistics Worksheet automatically figures out that \(p_1(x_1, y_1) = \frac{1}{6}\), \(p_2(x_2, y_2) = \frac{1}{3}\), and \(p_3(x_3, y_3) = \frac{1}{2}\).

Of course, you can also assume that the total # of occurrences is 60. Then \((x_1, y_1)\) occurs 10 times; \((x_2, y_2)\) occurs 20 times; and \((x_3, y_3)\) occurs 30 times. However, this approach will make your data entry difficult.

X01=0, Y01=1
X02=3, Y02=6
X03=3, Y03=6
X04=12, Y04=8
X05=12, Y05=8
X06=12, Y06=8

\(n = 6\), \(\bar X = 7\), \(S_X = 5.58569602\), \(\sigma_X = 5.09901951\),
\(\bar Y = 6.16666667\), \(S_Y = 2.71416040\), \(\sigma_Y = 2.47767812\)
\(a = 3.25\), \(b = 0.41666667\), \(r = 0.85749293\)


a + bX = 3.25 + 0.41666667 X

Set X ' = 3 . Press CPT . You should get Y ' = 4.5

Set X ' = 12 . Press CPT . You should get Y ' = 8.25

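i    \(p_i(x_i, y_i)\)    \(X = x_i\)    \(Y = y_i\)
1    1/6                  0              1
2    1/3                  3              6
3    1/2                  12             8

\[E(X) = \frac{1}{6}(0) + \frac{1}{3}(3) + \frac{1}{2}(12) = 7, \quad E(X^2) = \frac{1}{6}(0^2) + \frac{1}{3}(3^2) + \frac{1}{2}(12^2) = 75\]

\[Var(X) = 75 - 7^2 = 26\]

\[E(Y) = \frac{1}{6}(1) + \frac{1}{3}(6) + \frac{1}{2}(8) = 6.1667\]

\[E(XY) = \frac{1}{6}(0 \times 1) + \frac{1}{3}(3 \times 6) + \frac{1}{2}(12 \times 8) = 54\]

\[Cov(X, Y) = E(XY) - E(X)E(Y) = 54 - 7(6.1667) = 10.8333\]

\[b = \frac{Cov(X, Y)}{Var(X)} = \frac{10.8333}{26} = 0.41666, \quad a = E(Y) - bE(X) = 6.1667 - 0.41666(7) = 3.25\]

\[a + bX = 3.25 + 0.41666X\]

If X = 3, then Y' = a + bX = 3.25 + 0.41666(3) = 4.5
If X = 12, then Y' = a + bX = 3.25 + 0.41666(12) = 8.25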

Now you should be convinced that LIN Statistics Worksheet produces the correct result.
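The claim that entering repeated data pairs is equivalent to weighting by probability can be checked with a short Python sketch. The helper `fit_line` is my own illustrative function; exact fractions are used so the two routes can be compared for equality.

```python
from fractions import Fraction as F

def fit_line(points):
    """points: list of (x, y, weight). Weights may be counts or probabilities;
    they are normalized by their total before computing the moments."""
    total = sum(F(w) for _, _, w in points)
    ex = sum(F(w) * x for x, _, w in points) / total
    ey = sum(F(w) * y for _, y, w in points) / total
    exx = sum(F(w) * x * x for x, _, w in points) / total
    exy = sum(F(w) * x * y for x, y, w in points) / total
    b = (exy - ex * ey) / (exx - ex ** 2)
    a = ey - b * ex
    return a, b

# Route 1: occurrence counts, as entered into LIN (1, 2, and 3 occurrences).
by_counts = fit_line([(0, 1, 1), (3, 6, 2), (12, 8, 3)])
# Route 2: the original probabilities 1/6, 1/3, 1/2.
by_probs = fit_line([(0, 1, F(1, 6)), (3, 6, F(1, 3)), (12, 8, F(1, 2))])

print(by_counts, by_probs)
```

Both routes give a = 13/4 = 3.25 and b = 5/12 = 0.41666..., because only the relative weights enter the moments.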

There are at least two places you can use LIN. One is to calculate the Bühlmann credibility premium as the least squares regression of the Bayesian premium. Another situation is to use LIN for linear interpolation. I'll walk you through both.

Let \(X_1\) represent the outcome of a single trial and let \(E(X_2 \mid X_1)\) represent the expected value of the outcome of a 2nd trial as described in the table below:

Outcome k    Probability of outcome    \(E(X_2 \mid X_1 = k)\)
0            1/3                       1
3            1/3                       6
12           1/3                       8

Solution

\[E\left[(a + ZX_1 - Y)^2\right]\]

where \(Y = E(X_2 \mid X_1)\).

Since the probability of each data pair is uniformly 1/3, we enter the following data in LIN:

X01=0, Y01=1

X02=3, Y02=6

X03=12, Y03=8

We should get:

a = 2.5 , b = 0.5

Enter X' = 0. Press CPT. You'll get Y' = 2.5 (this is a + bX when X = 0)
Enter X' = 3. Press CPT. You'll get Y' = 4 (this is a + bX when X = 3)
Enter X' = 12. Press CPT. You'll get Y' = 8.5 (this is a + bX when X = 12)

4, 8.5).

Example 5 (another old SOA problem)

You are given the following information about insurance coverage:

n    Probability    \(E(X_2 \mid X_1 = n)\)
0    1/4            0.5
1    1/2            0.9
2    1/4            1.7

Solution

The probability is not uniform. Assume the total # of occurrences is 4. Then the data pair \(\left[n = 0, E(X_2 \mid X_1 = 0) = 0.5\right]\) occurs once, \(\left[n = 1, E(X_2 \mid X_1 = 1) = 0.9\right]\) occurs twice, and \(\left[n = 2, E(X_2 \mid X_1 = 2) = 1.7\right]\) occurs once.

X01=0, Y01=0.5
X02=1, Y02=0.9
X03=1, Y03=0.9
X04=2, Y04=1.7

We should get:

a = 0.4, b = 0.6. So the Bühlmann credibility factor is Z = b = 0.6.

\(R_i\)    \(P_i\)    \(E_i\) (given outcome \(R_i\))
0          2/3        7/4
2          2/9        55/24
14         1/9        35/12

The Bühlmann credibility factor after one experiment is \(\frac{1}{12}\). Calculate a and b that minimize the following expression:

\[\sum_{i=1}^{3} P_i\left(a + bR_i - E_i\right)^2\]

Solution

SOA makes your life easier by giving you \(b = \frac{1}{12}\). However, to solve this problem, you really don't need to know b. Once again, we'll use LIN to solve the problem. Let's assume the total # of occurrences of data pairs \((R_i, E_i)\) is 9. Then (0, 7/4) occurs 6 times; (2, 55/24) occurs 2 times; and (14, 35/12) occurs one time.

X01=0, Y01=1.75
X02=0, Y02=1.75
X03=0, Y03=1.75
X04=0, Y04=1.75
X05=0, Y05=1.75
X06=0, Y06=1.75
X07=2, Y07=55/24
X08=2, Y08=55/24
X09=14, Y09=35/12

We should get:

\(a = 1.8333\), \(b = 0.08333 = \frac{1}{12}\).

Does this solution sound like too much data entry? Not to me. Yes, I can figure out the answers using the equations:

\[b = \frac{Cov(X, Y)}{Var(X)}, \quad a = E(Y) - bE(X)\]

I might solve this problem using the above equations when I'm not taking the exam. However, in the exam room, you bet I won't bother using these equations. I will enter 18 numbers into the calculator and let the calculator do the math for me. This way, I don't have to think. I just enter the numbers and the calculator will spit out the answer for me. And I know that my result is 100% right.

#7 Do linear interpolation

Another use of LIN is to do linear interpolation. You are given two data pairs ( x1 , y1 ) and

( x2 , y2 ) . Then you are given a single value x3 . You need to find y3 using linear

interpolation.

\[\frac{y_3 - y_1}{x_3 - x_1} = \frac{y_2 - y_1}{x_2 - x_1} = \text{slope of the line through } (x_1, y_1) \text{ and } (x_2, y_2)\]

\[y_3 = \frac{y_2 - y_1}{x_2 - x_1}(x_3 - x_1) + y_1\]

To use LIN for linear interpolation, please note that the least squares regression line for

two data points ( x1 , y1 ) and ( x2 , y2 ) is just an ordinary straight line connecting ( x1 , y1 )

and ( x2 , y2 ) . To find y3 , we simply find the least squares regression line a + bX for

( x1 , y1 ) and ( x2 , y2 ) . Then we enter x3 into LIN. Then LIN will produce y3 .

1078 1452 2054 2199 3207

Determine the smoothed empirical estimate of the 90th percentile, as defined in Klugman,

Panjer, and Willmot.

Solution

Arrange the n observations in ascending order. Then the k-th number is the \(\frac{100k}{n+1}\) percentile. For example, the 1st observation 46 is the \(\frac{100(1)}{10+1} = 9.09\) percentile; the 2nd observation 121 is the \(\frac{100(2)}{10+1} = 18.18\) percentile. So on and so forth.

To find the smoothed estimate of the 90-th percentile, we linearly interpolate between the 9-th observation, which is the \(\frac{100(9)}{10+1} = 81.82\)-th percentile, and the 10th observation, which is the \(\frac{100(10)}{10+1} = 90.91\)-th percentile.

\[x_{90} = x_{81.82} + \frac{90 - 81.82}{90.91 - 81.82}\left(x_{90.91} - x_{81.82}\right)\]

\[= 2{,}199 + \frac{90 - 81.82}{90.91 - 81.82}(3{,}207 - 2{,}199) = 3{,}106.09\]

Next, I'll show you two shortcuts. One is without using LIN; the other with using LIN.

Since the k-th number is the \(\frac{100k}{n+1}\) percentile, the \(m = \frac{100k}{n+1}\) percentile corresponds to the \(\frac{m(n+1)}{100}\)-th observation. For example, the 81.82-th percentile corresponds to the \(\frac{81.82(10+1)}{100} = 9\)-th observation; the 90.91-th percentile corresponds to the \(\frac{90.91(10+1)}{100} = 10\)-th observation.

Important Rules:

The k-th observation is the \(\frac{100k}{n+1}\) percentile.

The m-th percentile is the \(\frac{m(n+1)}{100}\)-th observation.

Once you understand the above two rules, you can quickly find the 90-th percentile.

Set m = 90: \(k = \frac{m(n+1)}{100} = \frac{90(10+1)}{100} = 9.9\). So the 9.9-th observation is what we are looking for.

Of course, there isn't a 9.9-th observation. So we need to find it using linear interpolation.

Observation #    9        9.9           10
Value            2,199    \(x_{90}\)    3,207

\[x_{90} = 2{,}199 + \frac{9.9 - 9}{10 - 9}(3{,}207 - 2{,}199) = 3{,}106.2\]

You see that this linear interpolation is much faster than the previous linear interpolation.

We have two data pairs (9, 2,199) and (10, 3,207). As said before, if you have only two points, then the least squares line is just the ordinary line connecting the two points. We are interested in finding the ordinary straight line connecting (9, 2,199) and (10, 3,207). So we'll use the LIN function to find the least squares line, which is the ordinary line.

X01=9, Y01=2199
X02=10, Y02=3207

You'll find that: a = -6,873, b = 1,008, r = 1. The correlation coefficient should be one because we have only two data pairs and the line slopes upward. Two data points always produce a perfectly linear relationship. So if your r is not equal to one, you did something wrong.

Next, set X' = 9.9. Press CPT. You should get: Y' = 3,106.2. This is the 90th percentile you are looking for.
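The two rules and the interpolation step can be sketched in Python as below. The function names are illustrative; only the 9th and 10th observations from the example are used, since those are the only two values the interpolation needs.

```python
def percentile_position(m, n):
    """Position k such that the k-th ordered observation is the m-th percentile."""
    return m * (n + 1) / 100

def interpolate(x1, y1, x2, y2, x3):
    """Value at x3 on the straight line through (x1, y1) and (x2, y2)."""
    slope = (y2 - y1) / (x2 - x1)
    return y1 + slope * (x3 - x1)

n = 10
k = percentile_position(90, n)               # 9.9, between the 9th and 10th values
x90 = interpolate(9, 2199, 10, 3207, k)

# The same line in a + b*X form, as reported by LIN.
b = (3207 - 2199) / (10 - 9)
a = 2199 - b * 9
print(round(k, 10), round(x90, 1), a, b)
```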

Example 2

You are given the following values of the cdf of a standard normal distribution:

Solution

\[\Phi(0.443) = \frac{0.5 - 0.443}{0.5 - 0.4}\,\Phi(0.4) + \frac{0.443 - 0.4}{0.5 - 0.4}\,\Phi(0.5)\]

\[= 0.57(0.6554) + 0.43(0.6915) = 0.6709\]

This approach is prone to errors. The math logic is simple, but there are simply too many numbers to calculate. And it's very easy to make a mistake, especially in the heat of the exam.

To quickly solve this problem, we'll use LIN. Enter the following data:

X01=0.4, Y01=0.6554

X02=0.5, Y02=0.6915

2nd STAT (keep pressing 2nd ENTER until you see LIN)

Press the down arrow key, you'll see \(n = 2\)
Press the down arrow key, you'll see \(\bar X = 0.45\)
Press the down arrow key, you'll see \(S_X = 0.07071068\)
Press the down arrow key, you'll see \(\sigma_X = 0.05\)
Press the down arrow key, you'll see \(\bar Y = 0.67345\)
Press the down arrow key, you'll see \(S_Y = 0.02552655\)
Press the down arrow key, you'll see \(\sigma_Y = 0.01805\)
Press the down arrow key, you'll see \(a = 0.511\)
Press the down arrow key, you'll see \(b = 0.361\)
Press the down arrow key, you'll see \(r = 1\) (this is the correlation coefficient)
Press the down arrow key, you'll see X' = 0.00
Enter X' = 0.443
Press the down arrow key.
Press CPT. You'll get Y' = 0.670923

So \(\Phi(0.443) = 0.670923\)

In the above example, after generating \(\Phi(0.443) = 0.670923\), if you want to generate \(\Phi(0.412345)\), this is what you do:

Enter X' = 0.412345
Press the down arrow key.
Press CPT. You'll get Y' = 0.65985655. This is \(\Phi(0.412345)\).

Enter X' = 0.46789
Press the down arrow key.
Press CPT. You'll get Y' = 0.67990829. This is \(\Phi(0.46789)\).

General procedure

Given two data pairs ( c1 , d1 ) and ( c2 , d 2 ) and a single data c3 , to use BA II Plus and BA

II Plus Professional LIN Worksheet to generate d3 , enter

X01= c1 , Y01= d1

X02= c2 , Y02= d 2

X ' = c3

must be entered as Y ' s .

Example 3

You are given the following values of the cdf of a standard normal distribution:

Using linear interpolation, find a, b, c, and d (all these are positive numbers) such that

\(\Phi(a) = 0.6666\)
\(\Phi(b) = 0.6777\)
\(\Phi(c) = 0.6888\)
\(\Phi(d) = 0.6999\)

Solution

X01=0.6554, Y01=0.4

X02=0.6915, Y02=0.5

Enter X ' = 0.6666 . Then the calculator will generate Y ' = 0.43102493 .

So a = 0.43102493 .

Enter X ' = 0.6777 . Then the calculator will generate Y ' = 0.46177285

So b = 0.46177285 .

Enter X ' = 0.6888 . Then the calculator will generate Y ' = 0.49252078

So c = 0.49252078

Enter X ' = 0.6999 . Then the calculator will generate Y ' = 0.52326870

So d = 0.52326870
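The trick in this example is to swap the roles of the two columns: the cdf values are entered as X and the arguments as Y. A minimal Python sketch of the same inverse lookup follows; the function name is my own.

```python
def interpolate(x1, y1, x2, y2, x3):
    """Value at x3 on the straight line through (x1, y1) and (x2, y2)."""
    return y1 + (y2 - y1) * (x3 - x1) / (x2 - x1)

# Inverse lookup: cdf value goes in as x, the argument comes out as y.
targets = [0.6666, 0.6777, 0.6888, 0.6999]
arguments = [interpolate(0.6554, 0.4, 0.6915, 0.5, t) for t in targets]

# Forward lookup for comparison: argument in, cdf value out.
phi_0443 = interpolate(0.4, 0.6554, 0.5, 0.6915, 0.443)

print([round(v, 6) for v in arguments], round(phi_0443, 6))
```

Note that the last target, 0.6999, lies outside the interval between the two tabulated cdf values, so that result is a linear extrapolation rather than an interpolation.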

Example 4

The population of a survivor group is assumed to be linear between two consecutive ages.

You are given the following:

50 598

51 534

50.2, 50.5, 50.7, 50.9

Solution

X01=50, Y01=598

X02=51, Y02=534

Enter X' = 50.2 . Then the calculator will generate Y' = 585.2

Enter X' = 50.5 . Then the calculator will generate Y' = 566

Enter X' = 50.7 . Then the calculator will generate Y' = 553.2

Enter X' = 50.9 . Then the calculator will generate Y' = 540.4

Chapter 2 Maximum likelihood estimator

Basic idea

An urn has two coins, one fair and the other biased. In one flip, the fair coin has 50%

chance of landing with heads, while the biased one has 90% chance of landing with

heads. Now a coin is randomly chosen from the urn and is tossed. The outcome is a head.

Question: Which coin was chosen from the urn? The fair coin or the biased coin?

Imagine you have entered a bet. If your guess is correct, you'll earn $10. If your guess is wrong, you'll lose $10. How would you guess?

Most people will guess that the coin chosen from the urn was the biased coin; the biased coin is far more likely to land on heads.

This simple example illustrates the intuition behind the maximum likelihood estimator. If we have to estimate a parameter from an n-size sample \(X_1, X_2, \ldots, X_n\), we choose the parameter value under which the observed sample has the highest probability.

Example. You flip a coin 9 times and observe HTTTHHHTH. You don't know whether the coin is fair and you need to estimate the probability of getting H in one flip.

Let p represent the probability of getting a head in one flip. The probability for us to

observe HTTTHHHTH is

\[P(HTTTHHHTH \mid p) = p^5(1 - p)^4\]

p      \(p^5(1 - p)^4\)

0 0.000000000

0.1 0.000006561

0.2 0.000131072

0.3 0.000583443

0.4 0.001327104

0.5 0.001953125

0.6 0.001990656

0.7 0.001361367

0.8 0.000524288

0.9 0.000059049

1 0.000000000


If we have to guess p among the possible values 0, 0.1, 0.2, , we might guess p = 0.6 ,

which has the highest probability to produce the outcome of HTTTHHHTH.
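The table above can be reproduced with a short Python sketch that evaluates the likelihood on the same grid and picks the value of p with the largest likelihood. The grid and names are illustrative.

```python
# Likelihood of observing HTTTHHHTH (5 heads, 4 tails) for a given p.
def likelihood(p):
    return p ** 5 * (1 - p) ** 4

grid = [i / 10 for i in range(11)]          # 0, 0.1, ..., 1
table = {p: likelihood(p) for p in grid}
best_p = max(table, key=table.get)          # grid point with the highest likelihood

for p in grid:
    print(f"{p:.1f}  {table[p]:.9f}")
print("best grid value:", best_p)
```

The search is restricted to the eleven grid values, so it identifies the best candidate among them rather than the maximizer over all p between 0 and 1.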

A coin is tossed n times and x number of heads are observed. Let p represent the

probability that a head shows up in one flip of coin. Calculate the maximum likelihood

estimator of p .

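Step One Write the probability that the observed event happens (the likelihood function)

\[L(p) = C_n^x\, p^x (1 - p)^{n - x}\]

Step Two Take the log of the likelihood function (the log-likelihood function). This step simplifies our calculation (as you'll see soon).

\[\ln L(p) = \ln C_n^x + x\ln p + (n - x)\ln(1 - p)\]

Step Three Take the 1st derivative of the log-likelihood function regarding the parameter. Set the 1st derivative to zero.

\[\frac{d}{dp}\ln L(p) = \frac{d}{dp}\left[\ln C_n^x + x\ln p + (n - x)\ln(1 - p)\right] = 0\]

\[\frac{d}{dp}\ln C_n^x + \frac{d}{dp}(x\ln p) + \frac{d}{dp}\left[(n - x)\ln(1 - p)\right] = 0\]

\[\frac{d}{dp}\ln C_n^x = 0, \quad \frac{d}{dp}(x\ln p) = x\frac{d}{dp}(\ln p) = \frac{x}{p}\]

\[\frac{d}{dp}\left[(n - x)\ln(1 - p)\right] = (n - x)\frac{d}{dp}\ln(1 - p) = -\frac{n - x}{1 - p}\]

\[\frac{x}{p} - \frac{n - x}{1 - p} = 0, \quad \frac{1 - p}{p} = \frac{n - x}{x}, \quad \frac{1}{p} = \frac{n}{x}, \quad p = \frac{x}{n}\]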

Nov 2000 #6

You have observed the following claim severities:

\[f(x) = \frac{1}{\sqrt{2\pi x}}\exp\left[-\frac{1}{2x}(x - \mu)^2\right], \quad x > 0, \; \mu > 0\]

Solution

\(X_1, X_2, X_3, X_4\), and \(X_5\) are independent identically distributed with a common pdf

\[f(x) = \frac{1}{\sqrt{2\pi x}}\exp\left[-\frac{1}{2x}(x - \mu)^2\right]\]

\[f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5) = f_X(x_1)\, f_X(x_2)\, f_X(x_3)\, f_X(x_4)\, f_X(x_5)\]

\[= \prod_{i=1}^{5}\frac{1}{\sqrt{2\pi x_i}}\exp\left[-\frac{1}{2x_i}(x_i - \mu)^2\right]\]

The probability of observing the sample in a small neighborhood of \((x_1, x_2, x_3, x_4, x_5)\) is

\[f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5)\,dx_1 dx_2 dx_3 dx_4 dx_5 = f_X(x_1)\, f_X(x_2)\, f_X(x_3)\, f_X(x_4)\, f_X(x_5)\,dx_1 dx_2 dx_3 dx_4 dx_5\]

Our goal is to find a parameter \(\mu\) that will maximize our chance of observing \(X_1, X_2, X_3, X_4\), and \(X_5\). To maximize our chance of observing \(X_1, X_2, X_3, X_4\), and \(X_5\) is to maximize the joint pdf \(f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5)\). To maximize the joint pdf, we can set the 1st derivative of the joint pdf regarding \(\mu\) equal to zero:

\[\frac{d}{d\mu} f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5) = 0\]

Though we can solve the above equation by pure hard work, an easier approach is to find a parameter \(\mu\) that will maximize the log-likelihood of us observing \(X_1, X_2, X_3, X_4\), and \(X_5\):

\[\ln f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5)\]

If \(\ln f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5)\) is maximized, \(f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5)\) will surely be maximized. So the task boils down to finding \(\mu\) such that the 1st derivative of the log pdf is zero:

\[\frac{d}{d\mu}\ln f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5) = 0\]

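\[\ln f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5) = \sum_{i=1}^{5}\ln\left\{\frac{1}{\sqrt{2\pi x_i}}\exp\left[-\frac{1}{2x_i}(x_i - \mu)^2\right]\right\} = \sum_{i=1}^{5}\left[\ln\frac{1}{\sqrt{2\pi x_i}} - \frac{1}{2x_i}(x_i - \mu)^2\right]\]

\[\frac{d}{d\mu}\sum_{i=1}^{5}\left[\ln\frac{1}{\sqrt{2\pi x_i}} - \frac{1}{2x_i}(x_i - \mu)^2\right] = 0\]

\(\ln\frac{1}{\sqrt{2\pi x_i}}\) is a constant and its derivative regarding \(\mu\) is zero.

\[\frac{d}{d\mu}\sum_{i=1}^{5}\frac{1}{2x_i}(x_i - \mu)^2 = 0, \quad \frac{d}{d\mu}\sum_{i=1}^{5}\frac{(x_i - \mu)^2}{x_i} = 0\]

\[\frac{d}{d\mu}\sum_{i=1}^{5}\frac{(x_i - \mu)^2}{x_i} = \sum_{i=1}^{5}\frac{1}{x_i}\frac{d}{d\mu}(x_i - \mu)^2 = -2\sum_{i=1}^{5}\frac{x_i - \mu}{x_i} = -2\sum_{i=1}^{5}\left(1 - \frac{\mu}{x_i}\right) = 0\]

\[\sum_{i=1}^{5}\left(1 - \frac{\mu}{x_i}\right) = 0, \quad 5 - \mu\left(\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \frac{1}{x_4} + \frac{1}{x_5}\right) = 0\]

\[\mu = \frac{5}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \frac{1}{x_4} + \frac{1}{x_5}} = \frac{5}{\frac{1}{11} + \frac{1}{15.2} + \frac{1}{18} + \frac{1}{21} + \frac{1}{25.8}} = 16.74\]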

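After understanding the theoretical framework and detailed calculation, we are ready to use a shortcut. First, let's isolate the variable \(\mu\):

\[f(x) = \frac{1}{\sqrt{2\pi x}}\exp\left[-\frac{1}{2x}(x - \mu)^2\right] \propto \exp\left[-\frac{1}{2x}(x - \mu)^2\right]\]

\[f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5) \propto \prod_{i=1}^{5}\exp\left[-\frac{1}{2x_i}(x_i - \mu)^2\right]\]

\[\ln f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5) = \text{constant} - \frac{1}{2}\sum_{i=1}^{5}\frac{(x_i - \mu)^2}{x_i}\]

\[\frac{d}{d\mu}\ln f_{X_1, X_2, X_3, X_4, X_5}(x_1, x_2, x_3, x_4, x_5) = 0 \;\Rightarrow\; \sum_{i=1}^{5}\frac{d}{d\mu}\frac{(x_i - \mu)^2}{x_i} = -2\sum_{i=1}^{5}\frac{x_i - \mu}{x_i} = 0\]

\[\mu = \frac{5}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \frac{1}{x_4} + \frac{1}{x_5}} = \frac{5}{\frac{1}{11} + \frac{1}{15.2} + \frac{1}{18} + \frac{1}{21} + \frac{1}{25.8}} = 16.74\]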
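The closed-form result can be checked numerically with a small Python sketch. It assumes the density has the form \(\frac{1}{\sqrt{2\pi x}}\exp\left[-\frac{(x-\mu)^2}{2x}\right]\) with \(\mu\) as the unknown parameter, and it uses the five severities that appear in the final calculation; the names are illustrative.

```python
import math

severities = [11, 15.2, 18, 21, 25.8]

def log_likelihood(mu, xs):
    # Sum of ln f(x) for the assumed density with parameter mu.
    return sum(-0.5 * math.log(2 * math.pi * x) - (x - mu) ** 2 / (2 * x) for x in xs)

# Closed form: mu = n / sum(1 / x_i)
mu_hat = len(severities) / sum(1 / x for x in severities)

# Sanity check: the log-likelihood is lower a little to either side of mu_hat.
at_peak = log_likelihood(mu_hat, severities)
is_local_max = (at_peak > log_likelihood(mu_hat - 0.01, severities)
                and at_peak > log_likelihood(mu_hat + 0.01, severities))
print(round(mu_hat, 2), is_local_max)
```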

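\[F(x) = 1 - \left(\frac{500}{x}\right)^{\alpha}, \quad x > 500, \; \alpha > 0\]

Calculate the maximum likelihood estimate of the parameter \(\alpha\).

Solution

\[f(x) = \frac{\alpha\, 500^{\alpha}}{x^{\alpha + 1}}\]

\[f(x_1, x_2, x_3, x_4, x_5) = \prod_{i=1}^{5}\frac{\alpha\, 500^{\alpha}}{x_i^{\alpha + 1}} = \frac{\alpha^5\, 500^{5\alpha}}{(x_1 x_2 x_3 x_4 x_5)^{\alpha + 1}}\]

\[\ln f(x_1, x_2, x_3, x_4, x_5) = \ln\frac{\alpha^5\, 500^{5\alpha}}{(x_1 x_2 x_3 x_4 x_5)^{\alpha + 1}} = 5\ln\alpha + 5\alpha\ln 500 - (\alpha + 1)\ln(x_1 x_2 x_3 x_4 x_5)\]

\[\frac{d}{d\alpha}\ln f(x_1, x_2, x_3, x_4, x_5) = \frac{5}{\alpha} + 5\ln 500 - \ln(x_1 x_2 x_3 x_4 x_5) = 0\]

\[\frac{5}{\alpha} + 5\ln 500 - \ln(521 \times 658 \times 702 \times 819 \times 1217) = 0, \quad \alpha = 2.453\]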
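Solving the last equation for the parameter gives \(n / \sum \ln(x_i / 500)\), which the Python sketch below evaluates. It takes the five observations from the final line of the solution and treats 500 as the known lower bound; the variable names are my own.

```python
import math

observations = [521, 658, 702, 819, 1217]
lower_bound = 500

# From 5/alpha + 5 ln 500 - ln(product of x_i) = 0:
# alpha = n / sum(ln(x_i / 500))
log_ratios = [math.log(x / lower_bound) for x in observations]
alpha_hat = len(observations) / sum(log_ratios)

# The derivative of the log-likelihood should vanish at alpha_hat.
score = (len(observations) / alpha_hat
         + len(observations) * math.log(lower_bound)
         - sum(math.log(x) for x in observations))
print(round(alpha_hat, 3), round(score, 9))
```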

You are given the following information about a random sample:

The sample size equals five

The sample is from a Weibull distribution with \(\tau = 2\)

Two of the sample observations are known to exceed 50, and the remaining three

observations are 20, 30, and 45

Solution

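From Exam C table, you'll find the Weibull pdf and cdf:

\[f(x) = \frac{\tau\left(\frac{x}{\theta}\right)^{\tau} e^{-\left(\frac{x}{\theta}\right)^{\tau}}}{x} = \frac{2x}{\theta^2}\, e^{-\left(\frac{x}{\theta}\right)^2}, \quad F(x) = 1 - e^{-\left(\frac{x}{\theta}\right)^2}, \quad S(x) = e^{-\left(\frac{x}{\theta}\right)^2}\]

\[x_1 > 50, \quad x_2 > 50, \quad x_3 = 20, \quad x_4 = 30, \quad x_5 = 45\]

\[L(\theta) = f(20)\, f(30)\, f(45)\, S(50)\, S(50)\]

\[= \frac{2(20)}{\theta^2}\exp\left[-\left(\frac{20}{\theta}\right)^2\right] \cdot \frac{2(30)}{\theta^2}\exp\left[-\left(\frac{30}{\theta}\right)^2\right] \cdot \frac{2(45)}{\theta^2}\exp\left[-\left(\frac{45}{\theta}\right)^2\right] \cdot \exp\left[-\left(\frac{50}{\theta}\right)^2\right] \cdot \exp\left[-\left(\frac{50}{\theta}\right)^2\right]\]

Since \(20^2 + 30^2 + 45^2 + 50^2 + 50^2 = 8{,}325\):

\[L(\theta) \propto \frac{1}{\theta^6}\, e^{-\frac{8{,}325}{\theta^2}}, \quad \ln L(\theta) = k - 6\ln\theta - \frac{8{,}325}{\theta^2}, \text{ where } k \text{ is a constant}\]

\[\frac{d}{d\theta}\ln L(\theta) = \frac{2(8{,}325)}{\theta^3} - \frac{6}{\theta} = 0, \quad \theta = 52.7\]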
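Setting the derivative to zero gives \(\theta^2 = 8{,}325 / 3\): the sum of squares runs over all five observations (censored ones at 50), while the divisor counts only the three exact observations. A minimal Python sketch of that calculation, assuming shape parameter 2 and scale parameter theta as in the solution:

```python
import math

exact = [20, 30, 45]        # fully observed values
censored_at = [50, 50]      # observations known only to exceed 50
shape = 2

# Log-likelihood up to a constant: -shape*len(exact)*ln(theta) - sum(x^shape)/theta^shape
sum_powers = sum(x ** shape for x in exact + censored_at)

# Setting the derivative to zero gives theta^shape = sum_powers / len(exact).
theta_hat = (sum_powers / len(exact)) ** (1 / shape)

def log_likelihood(theta):
    return -shape * len(exact) * math.log(theta) - sum_powers / theta ** shape

is_local_max = (log_likelihood(theta_hat) > log_likelihood(theta_hat - 0.1)
                and log_likelihood(theta_hat) > log_likelihood(theta_hat + 0.1))
print(sum_powers, round(theta_hat, 1), is_local_max)
```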

Fisher Information

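One key theorem you need to memorize for Exam C is that the maximum likelihood estimator $\hat\theta$ is approximately normally distributed with mean $\theta_0$ and variance $\frac{1}{I(\theta)}$:

$$\hat\theta \sim N\left(\theta_0, \frac{1}{I(\theta)}\right)$$

Here $\theta_0$ is the true parameter. $I(\theta)$, called the Fisher information or simply the information, is the variance of $\frac{d}{d\theta}\ln L(x,\theta)$:

$$I(\theta) = Var_X\left[\frac{d}{d\theta}\ln L(x,\theta)\right] = E_X\left[\left(\frac{d}{d\theta}\ln L(x,\theta)\right)^2\right] = -E_X\left[\frac{d^2}{d\theta^2}\ln L(x,\theta)\right]$$

Please note that in the above equation, the expectation and variance are taken with respect to $X$.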

1

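It takes quite a bit of math to prove that $\hat\theta \sim N\left(\theta_0, \frac{1}{I(\theta)}\right)$, so I won't show you the proof. You'll just need to memorize it. However, I'll show you why

$$I(\theta) = Var_X\left[\frac{d}{d\theta}\ln L(x,\theta)\right] = E_X\left[\left(\frac{d}{d\theta}\ln L(x,\theta)\right)^2\right] = -E_X\left[\frac{d^2}{d\theta^2}\ln L(x,\theta)\right]$$

First, let me introduce a new concept called the score. The term score is not on the syllabus. However, it's a building block for Fisher information, so let's take a look.

The likelihood function is

$$L(x,\theta) = \prod_{i=1}^{n} f(x_i,\theta),$$

where $\theta$ is the unobservable parameter of the density function.

When calculating the maximum likelihood estimator $\hat\theta$, we often use the log-likelihood function. So let's consider the log-likelihood function, $\ln L(x,\theta)$. The derivative of the log-likelihood function with respect to the parameter $\theta$, $\frac{d}{d\theta}\ln L(x,\theta)$, is called the score. Let's find the mean and variance of the score.

$$\frac{d}{d\theta}\ln L(x,\theta) = \frac{1}{L(x,\theta)}\frac{d}{d\theta}L(x,\theta)$$

$$E_X\left[\frac{d}{d\theta}\ln L(x,\theta)\right] = E_X\left[\frac{1}{L(x,\theta)}\frac{d}{d\theta}L(x,\theta)\right] = \int \underbrace{\frac{1}{L(x,\theta)}\frac{d}{d\theta}L(x,\theta)}_{\text{random variable}}\ \underbrace{L(x,\theta)}_{\text{density}}\, dx = \int \frac{d}{d\theta}L(x,\theta)\, dx = \frac{d}{d\theta}\int L(x,\theta)\, dx$$

Because $L(x,\theta)$ is a joint density, $\int L(x,\theta)\, dx = 1$. So

$$E_X\left[\frac{d}{d\theta}\ln L(x,\theta)\right] = E_X\left[\frac{1}{L(x,\theta)}\frac{d}{d\theta}L(x,\theta)\right] = \frac{d}{d\theta}1 = 0$$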

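Next, let me explain why $E\left[\left(\frac{d}{d\theta}\ln L(x,\theta)\right)^2\right] = -E\left[\frac{d^2}{d\theta^2}\ln L(x,\theta)\right]$.

We know that

$$E_X\left[\frac{d}{d\theta}\ln L(x,\theta)\right] = \int \left[\frac{d}{d\theta}\ln L(x,\theta)\right] L(x,\theta)\, dx = 0$$

Taking the derivative with respect to $\theta$ on both sides:

$$\frac{d}{d\theta}\int \left[\frac{d}{d\theta}\ln L(x,\theta)\right] L(x,\theta)\, dx = \frac{d}{d\theta}0 = 0$$

Moving $\frac{d}{d\theta}$ inside the integration, we have:

$$\frac{d}{d\theta}\int \left[\frac{d}{d\theta}\ln L(x,\theta)\right] L(x,\theta)\, dx = \int \frac{d}{d\theta}\left\{\left[\frac{d}{d\theta}\ln L(x,\theta)\right] L(x,\theta)\right\} dx$$

Using the formula $\frac{d}{dx}\left[u(x)v(x)\right] = u(x)\frac{d}{dx}v(x) + v(x)\frac{d}{dx}u(x)$, we have:

$$\frac{d}{d\theta}\left\{\left[\frac{d}{d\theta}\ln L(x,\theta)\right] L(x,\theta)\right\} = L(x,\theta)\frac{d}{d\theta}\left[\frac{d}{d\theta}\ln L(x,\theta)\right] + \left[\frac{d}{d\theta}\ln L(x,\theta)\right]\frac{d}{d\theta}L(x,\theta)$$

However, $\frac{d}{d\theta}\ln L(x,\theta) = \frac{1}{L(x,\theta)}\frac{d}{d\theta}L(x,\theta)$, so $\frac{d}{d\theta}L(x,\theta) = L(x,\theta)\frac{d}{d\theta}\ln L(x,\theta)$. So we have:

$$\frac{d}{d\theta}\left\{\left[\frac{d}{d\theta}\ln L(x,\theta)\right] L(x,\theta)\right\} = L(x,\theta)\frac{d^2}{d\theta^2}\ln L(x,\theta) + L(x,\theta)\left[\frac{d}{d\theta}\ln L(x,\theta)\right]^2$$

$$= L(x,\theta)\left\{\frac{d^2}{d\theta^2}\ln L(x,\theta) + \left[\frac{d}{d\theta}\ln L(x,\theta)\right]^2\right\}$$

Then $\int \frac{d}{d\theta}\left\{\left[\frac{d}{d\theta}\ln L(x,\theta)\right] L(x,\theta)\right\} dx = 0$ becomes:

$$\int \left[\frac{d^2}{d\theta^2}\ln L(x,\theta)\right] L(x,\theta)\, dx + \int \left[\frac{d}{d\theta}\ln L(x,\theta)\right]^2 L(x,\theta)\, dx = 0$$

However,

$$\int \left[\frac{d^2}{d\theta^2}\ln L(x,\theta)\right] L(x,\theta)\, dx = E\left[\frac{d^2}{d\theta^2}\ln L(x,\theta)\right], \quad \int \left[\frac{d}{d\theta}\ln L(x,\theta)\right]^2 L(x,\theta)\, dx = E\left[\left(\frac{d}{d\theta}\ln L(x,\theta)\right)^2\right]$$

Then it follows that

$$E\left[\frac{d^2}{d\theta^2}\ln L(x,\theta)\right] + E\left[\left(\frac{d}{d\theta}\ln L(x,\theta)\right)^2\right] = 0$$

Since we know that $E\left[\frac{d}{d\theta}\ln L(x,\theta)\right] = 0$, it follows that

$$Var\left[\frac{d}{d\theta}\ln L(x,\theta)\right] = E\left[\left(\frac{d}{d\theta}\ln L(x,\theta)\right)^2\right] = -E\left[\frac{d^2}{d\theta^2}\ln L(x,\theta)\right]$$

The score $\frac{d}{d\theta}\ln L(x,\theta)$ has zero mean and variance $E\left[\left(\frac{d}{d\theta}\ln L(x,\theta)\right)^2\right] = -E\left[\frac{d^2}{d\theta^2}\ln L(x,\theta)\right]$.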
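These identities can be sanity-checked numerically. The sketch below assumes a single observation from an exponential distribution with mean $\theta = 6$ (an arbitrary choice for illustration) and uses a plain midpoint rule, so the helper `integrate` and the truncation point `upper` are assumptions of this example rather than anything from the manual.

```python
import math

def integrate(g, lo, hi, steps=200_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (i + 0.5) * h) for i in range(steps))

theta = 6.0          # assumed true mean of the exponential
upper = 60 * theta   # the tail beyond this point is negligible

def density(x):
    return math.exp(-x / theta) / theta

def score(x):
    # d/d(theta) of ln f(x; theta) = -ln(theta) - x/theta
    return -1 / theta + x / theta ** 2

def second_derivative(x):
    # d^2/d(theta)^2 of ln f(x; theta)
    return 1 / theta ** 2 - 2 * x / theta ** 3

mean_score = integrate(lambda x: score(x) * density(x), 0, upper)
second_moment = integrate(lambda x: score(x) ** 2 * density(x), 0, upper)
minus_curvature = -integrate(lambda x: second_derivative(x) * density(x), 0, upper)

# All three of the last values should agree with 1 / theta^2.
print(mean_score, second_moment, minus_curvature, 1 / theta ** 2)
```

For this density the information per observation is $1/\theta^2$, so a sample of size $n$ has information $n/\theta^2$.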

The information associated with the maximum likelihood estimator of a parameter % is

4n , where n is the number of observations.

Solution

$Var(\hat\theta)$ is the inverse of the information. So $Var(\hat\theta) = \frac{1}{4n}$ and

$$Var(2\hat\theta) = 4\,Var(\hat\theta) = 4\left(\frac{1}{4n}\right) = \frac{1}{n}$$

The Cramer-Rao theorem

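Suppose the random variable $X$ has density function $f(x,\theta)$. If $g(x)$ is any unbiased estimator of $\theta$, then

$$Var\left[g(x)\right] \ge \frac{1}{Var\left[\frac{d}{d\theta}\ln f(x,\theta)\right]}$$

The proof is as follows. Because $g(x)$ is unbiased:

$$\int g(x)\, f(x,\theta)\, dx = \theta$$

Taking the derivative with respect to $\theta$ on both sides of the above equation:

$$\frac{d}{d\theta}\int g(x)\, f(x,\theta)\, dx = \frac{d}{d\theta}\theta = 1$$

Moving $\frac{d}{d\theta}$ inside the integration:

$$\frac{d}{d\theta}\int g(x)\, f(x,\theta)\, dx = \int \frac{d}{d\theta}\left[g(x)\, f(x,\theta)\right] dx = \int g(x)\frac{d}{d\theta}f(x,\theta)\, dx = 1$$

However, $\frac{d}{d\theta}f(x,\theta) = f(x,\theta)\frac{d}{d\theta}\ln f(x,\theta)$. So we have

$$\int g(x)\frac{d}{d\theta}f(x,\theta)\, dx = \int g(x)\left[\frac{d}{d\theta}\ln f(x,\theta)\right] f(x,\theta)\, dx = 1$$

However, $\int g(x)\left[\frac{d}{d\theta}\ln f(x,\theta)\right] f(x,\theta)\, dx = E_X\left[g(x)\frac{d}{d\theta}\ln f(x,\theta)\right]$. So

$$E_X\left[g(x)\frac{d}{d\theta}\ln f(x,\theta)\right] = 1$$

Next, consider the covariance $Cov\left[g(x), \frac{d}{d\theta}\ln f(x,\theta)\right]$. Using $Cov(X,Y) = E\left\{\left[X - E(X)\right]\left[Y - E(Y)\right]\right\}$:

$$Cov\left[g(x), \frac{d}{d\theta}\ln f(x,\theta)\right] = E_X\left\{\left[g(x) - E\,g(x)\right]\left[\frac{d}{d\theta}\ln f(x,\theta) - E\frac{d}{d\theta}\ln f(x,\theta)\right]\right\}$$

However, $E_X\left[g(x)\right] = \theta$ and $E_X\left[\frac{d}{d\theta}\ln f(x,\theta)\right] = 0$, because $\frac{d}{d\theta}\ln f(x,\theta)$ is the score and has zero mean. Then it follows:

$$Cov\left[g(x), \frac{d}{d\theta}\ln f(x,\theta)\right] = E_X\left\{\left[g(x) - \theta\right]\frac{d}{d\theta}\ln f(x,\theta)\right\}$$

$$= E_X\left[g(x)\frac{d}{d\theta}\ln f(x,\theta)\right] - \theta\, E_X\left[\frac{d}{d\theta}\ln f(x,\theta)\right] = 1 - \theta \times 0 = 1$$

Because the correlation coefficient satisfies $\rho_{X,Y}^2 \le 1$, for any two random variables we have:

$$\left[Cov(X,Y)\right]^2 = \rho_{X,Y}^2\, \sigma_X^2\, \sigma_Y^2 \le \sigma_X^2\, \sigma_Y^2$$

$$1 = \left\{Cov\left[g(x), \frac{d}{d\theta}\ln f(x,\theta)\right]\right\}^2 \le Var\left[g(x)\right] Var\left[\frac{d}{d\theta}\ln f(x,\theta)\right]$$

$$Var\left[g(x)\right] \ge \frac{1}{Var\left[\frac{d}{d\theta}\ln f(x,\theta)\right]}$$

For an unbiased estimator $g(x)$, its variance is no less than the reciprocal of the variance of the score $\frac{d}{d\theta}\ln f(x,\theta)$.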

$Var\left[g(x)\right] \ge \dfrac{1}{Var\left[\frac{d}{d\theta}\ln f(x,\theta)\right]}$ is a generic formula. When we use the maximum likelihood estimator, the density function is the joint density of the sample:

$$f(x,\theta) = f(x_1,\theta)\, f(x_2,\theta) \cdots f(x_n,\theta) = L(x,\theta)$$

When $\frac{d}{d\theta}\ln f(x,\theta)$ meets certain conditions, $Var\left[g(x)\right] = \dfrac{1}{Var\left[\frac{d}{d\theta}\ln f(x,\theta)\right]}$. We are not going to worry about what these conditions are. All we need to know is that for the maximum likelihood estimator $g(x)$, when $n$, the sample size of the observed data $X_1, X_2, ..., X_n$, approaches infinity, the variance of $g(x)$ approaches $\dfrac{1}{Var\left[\frac{d}{d\theta}\ln L(x,\theta)\right]}$:

$$Var(\hat\theta) \to \frac{1}{Var\left[\frac{d}{d\theta}\ln L(x,\theta)\right]} = \frac{1}{I(\theta)} \quad \text{as the sample size } n \text{ approaches infinity.}$$

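The same result extends to several parameters (stated here without proof):

Assume that random variable $X$ has density $f(x;\theta_1,\theta_2,...,\theta_k)$. The covariance $Cov(\hat\theta_i, \hat\theta_j)$ between two maximum likelihood estimators $\hat\theta_i$ and $\hat\theta_j$, as the sample size $n$ approaches infinity, is equal to the $(i,j)$ entry of the inverse of the Fisher information matrix $I$, where

$$I_{i,j} = E\left[\frac{\partial \ln L}{\partial\theta_i}\frac{\partial \ln L}{\partial\theta_j}\right] = -E\left[\frac{\partial^2 \ln L}{\partial\theta_i\,\partial\theta_j}\right]$$

For two parameters:

$$I = \begin{pmatrix} -E\left[\dfrac{\partial^2}{\partial\theta_1^2}\ln L(x;\theta_1,\theta_2)\right] & -E\left[\dfrac{\partial^2}{\partial\theta_1\partial\theta_2}\ln L(x;\theta_1,\theta_2)\right] \\ -E\left[\dfrac{\partial^2}{\partial\theta_1\partial\theta_2}\ln L(x;\theta_1,\theta_2)\right] & -E\left[\dfrac{\partial^2}{\partial\theta_2^2}\ln L(x;\theta_1,\theta_2)\right] \end{pmatrix}$$

where $I_{1,2} = I_{2,1} = -E\left[\dfrac{\partial^2}{\partial\theta_1\partial\theta_2}\ln L(x;\theta_1,\theta_2)\right]$.

Then

$$\begin{pmatrix} Var(\hat\theta_1) & Cov(\hat\theta_1,\hat\theta_2) \\ Cov(\hat\theta_1,\hat\theta_2) & Var(\hat\theta_2) \end{pmatrix} = I^{-1}$$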

A sample of ten observations comes from a parametric family $f(x,y;\theta_1,\theta_2)$ with log-likelihood function

$$\ln L(\theta_1,\theta_2) = \sum_{i=1}^{10}\ln f(x_i,y_i;\theta_1,\theta_2) = -2.5\theta_1^2 - 3\theta_1\theta_2 - \theta_2^2 + 5\theta_1 + 2\theta_2 + k$$

where $k$ is a constant.

Determine the estimated covariance matrix of the maximum likelihood estimator $\begin{pmatrix}\hat\theta_1 \\ \hat\theta_2\end{pmatrix}$.

Solution


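$$-E\left[\frac{\partial^2}{\partial\theta_1^2}\ln L(\theta_1,\theta_2)\right] = -E\left[\frac{\partial^2}{\partial\theta_1^2}\left(-2.5\theta_1^2 - 3\theta_1\theta_2 - \theta_2^2 + 5\theta_1 + 2\theta_2 + k\right)\right] = -E(-5) = 5$$

$$-E\left[\frac{\partial^2}{\partial\theta_2^2}\ln L(\theta_1,\theta_2)\right] = -E\left[\frac{\partial^2}{\partial\theta_2^2}\left(-2.5\theta_1^2 - 3\theta_1\theta_2 - \theta_2^2 + 5\theta_1 + 2\theta_2 + k\right)\right] = -E(-2) = 2$$

$$-E\left[\frac{\partial^2}{\partial\theta_1\partial\theta_2}\ln L(\theta_1,\theta_2)\right] = -E\left[\frac{\partial^2}{\partial\theta_1\partial\theta_2}\left(-2.5\theta_1^2 - 3\theta_1\theta_2 - \theta_2^2 + 5\theta_1 + 2\theta_2 + k\right)\right] = -E(-3) = 3$$

$$I = \begin{pmatrix} 5 & 3 \\ 3 & 2 \end{pmatrix}$$

The general formula for the inverse of a 2 x 2 matrix is:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \quad \text{if } ad - bc \ne 0$$

$$\begin{pmatrix} Var(\hat\theta_1) & Cov(\hat\theta_1,\hat\theta_2) \\ Cov(\hat\theta_1,\hat\theta_2) & Var(\hat\theta_2) \end{pmatrix} = I^{-1} = \begin{pmatrix} 5 & 3 \\ 3 & 2 \end{pmatrix}^{-1} = \frac{1}{5 \times 2 - 3 \times 3}\begin{pmatrix} 2 & -3 \\ -3 & 5 \end{pmatrix} = \begin{pmatrix} 2 & -3 \\ -3 & 5 \end{pmatrix}$$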
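The inversion step is easy to check by machine. The following sketch uses a small hand-written helper, `inverse_2x2` (a name chosen for this example), rather than any particular linear algebra library.

```python
def inverse_2x2(m):
    """Invert [[a, b], [c, d]] using 1/(ad - bc) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

information = [[5, 3], [3, 2]]
covariance = inverse_2x2(information)
print(covariance)  # [[2.0, -3.0], [-3.0, 5.0]]
```

So the estimated variances are 2 and 5, and the estimated covariance is -3.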

Fisher Information matrix is good for estimating the variance and covariance of a series

of maximum likelihood estimators. What if we need to estimate the variance and

covariance of a function of a series of maximum likelihood estimators? We can use the

delta method.

Delta method

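Assume that random variable $X$ has mean $\mu_X$ and variance $\sigma_X^2$. Define a new function $Y = f(X)$. Assuming that $f(X)$ is differentiable, a first-order Taylor expansion around $\mu_X$ gives:

$$f(X) \approx f(\mu_X) + f'(\mu_X)(X - \mu_X)$$

Take the variance of both sides and notice that $f(\mu_X)$ and $f'(\mu_X)$ are constants:

$$Var\left[f(X)\right] \approx Var\left[f(\mu_X) + f'(\mu_X)(X - \mu_X)\right] = \left[f'(\mu_X)\right]^2 Var(X - \mu_X) = \left[f'(\mu_X)\right]^2 Var(X)$$

For example, if $f(X) = \sqrt{X}$:

$$Var\left(\sqrt{X}\right) \approx \left[\frac{d}{dx}\sqrt{x}\,\Big|_{x=\mu_X}\right]^2 Var(X) = \left[\frac{1}{2\sqrt{\mu_X}}\right]^2 Var(X)$$

To get a feel for this formula, set $Y = f(X) = cX$, where $c$ is a constant. Then the delta formula becomes $Var\left[cX\right] \approx c^2\, Var(X)$, which is the familiar exact result.

The formula can also be written in a form that carries over to several variables:

$$Var\left[f(X)\right] \approx f'(\mu_X)\, Var(X)\, f'(\mu_X)$$

Suppose we want to find the variance of $f(\hat\theta)$, where $\hat\theta$ is an estimator of a true parameter $\theta$. Please note that $\hat\theta$ is a random variable. For example, if $\hat\theta$ is the maximum likelihood estimator, $\hat\theta$ varies depending on the sample size and on the sample data we have observed. From the one sample we actually have, we get one particular estimate. Set $X = \hat\theta$ and $\mu_X = E(\hat\theta)$:

$$Var\left[f(\hat\theta)\right] \approx \left\{f'\left[E(\hat\theta)\right]\right\}^2 Var(\hat\theta)$$

However, we don't know the true value of $\theta$. Nor do we know $f'\left[E(\hat\theta)\right]$. Assume that, based on your sample data on hand, the maximum likelihood estimate of the true parameter $\theta$ is $a$. Then we might want to set $E(\hat\theta) \approx a$. Then we have:

$$Var\left[f(\hat\theta)\right] \approx \left[f'(a)\right]^2 Var(\hat\theta)$$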
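The one-parameter formula is short enough to code directly. The sketch below is illustrative only: `delta_variance` is a hypothetical helper that approximates $f'(a)$ with a central finite difference instead of an analytic derivative, and the numbers in the examples are made up.

```python
import math

def delta_variance(f, a, var_theta, h=1e-6):
    """Approximate Var[f(theta_hat)] as f'(a)^2 * Var(theta_hat).

    f         -- function of the estimator
    a         -- observed estimate, used in place of E(theta_hat)
    var_theta -- estimated variance of theta_hat
    """
    slope = (f(a + h) - f(a - h)) / (2 * h)  # numerical f'(a)
    return slope ** 2 * var_theta

# f(X) = cX with c = 3: the delta formula gives c^2 * Var(X) = 9 * 0.5
print(delta_variance(lambda t: 3 * t, 2.0, 0.5))

# f(X) = sqrt(X) at a = 4: (1 / (2 * sqrt(4)))^2 * 0.5 = 0.03125
print(delta_variance(math.sqrt, 4.0, 0.5))
```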

Variance of a function of two random variables

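Random variable $X$ has mean $\mu_X$ and variance $\sigma_X^2$; random variable $Y$ has mean $\mu_Y$ and variance $\sigma_Y^2$. Define a new function $Z = f(X,Y)$. Assuming that $f(X,Y)$ is differentiable, we have:

$$f(X,Y) \approx f(\mu_X,\mu_Y) + f'_X(\mu_X,\mu_Y)(X - \mu_X) + f'_Y(\mu_X,\mu_Y)(Y - \mu_Y)$$

Take the variance of both sides and notice that $f(\mu_X,\mu_Y)$, $f'_X(\mu_X,\mu_Y)$, and $f'_Y(\mu_X,\mu_Y)$ are all constants:

$$Var\left[f(X,Y)\right] \approx \left[f'_X(\mu_X,\mu_Y)\right]^2 Var(X - \mu_X) + \left[f'_Y(\mu_X,\mu_Y)\right]^2 Var(Y - \mu_Y) + 2 f'_X(\mu_X,\mu_Y)\, f'_Y(\mu_X,\mu_Y)\, Cov(X - \mu_X, Y - \mu_Y)$$

$$= \left[f'_X(\mu_X,\mu_Y)\right]^2 Var(X) + \left[f'_Y(\mu_X,\mu_Y)\right]^2 Var(Y) + 2 f'_X(\mu_X,\mu_Y)\, f'_Y(\mu_X,\mu_Y)\, Cov(X,Y)$$

In matrix form:

$$Var\left[f(X,Y)\right] \approx \begin{pmatrix} f'_X(\mu_X,\mu_Y) & f'_Y(\mu_X,\mu_Y) \end{pmatrix} \begin{pmatrix} Var(X) & Cov(X,Y) \\ Cov(X,Y) & Var(Y) \end{pmatrix} \begin{pmatrix} f'_X(\mu_X,\mu_Y) \\ f'_Y(\mu_X,\mu_Y) \end{pmatrix}$$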
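The two-variable formula is a quadratic form in the gradient, which the sketch below evaluates. The helper name `delta_variance_2` and the numbers in the example (a product $f(X,Y) = XY$ with made-up means and covariances) are assumptions for illustration.

```python
def delta_variance_2(grad, cov):
    """Quadratic form grad^T * cov * grad for a function of two variables.

    grad -- (f_X, f_Y), partial derivatives evaluated at the means
    cov  -- [[Var(X), Cov(X,Y)], [Cov(X,Y), Var(Y)]]
    """
    gx, gy = grad
    (vxx, vxy), (vyx, vyy) = cov
    return gx * gx * vxx + gy * gy * vyy + gx * gy * (vxy + vyx)

# f(X, Y) = X * Y with mu_X = 2, mu_Y = 3, so f_X = mu_Y = 3 and f_Y = mu_X = 2
approx = delta_variance_2((3.0, 2.0), [[0.1, 0.05], [0.05, 0.2]])
print(approx)  # 9(0.1) + 4(0.2) + 2(3)(2)(0.05), about 2.3
```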

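This result carries over to functions of maximum likelihood estimators. As a simple case, say we have two maximum likelihood estimators $\hat\theta_1$ and $\hat\theta_2$. We want to find the variance of $f(\hat\theta_1,\hat\theta_2)$. Setting $X = \hat\theta_1$, $Y = \hat\theta_2$, $\mu_X = E(\hat\theta_1)$, $\mu_Y = E(\hat\theta_2)$, we have:

$$Var\left[f(\hat\theta_1,\hat\theta_2)\right] \approx \left[f'_{\theta_1}\left(E\hat\theta_1, E\hat\theta_2\right)\right]^2 Var(\hat\theta_1) + \left[f'_{\theta_2}\left(E\hat\theta_1, E\hat\theta_2\right)\right]^2 Var(\hat\theta_2) + 2 f'_{\theta_1}\left(E\hat\theta_1, E\hat\theta_2\right) f'_{\theta_2}\left(E\hat\theta_1, E\hat\theta_2\right) Cov(\hat\theta_1,\hat\theta_2)$$

For large samples we can take $E(\hat\theta_1) = \theta_1$ and $E(\hat\theta_2) = \theta_2$. Then

$$Var\left[f(\hat\theta_1,\hat\theta_2)\right] \approx \left[f'_{\theta_1}(\theta_1,\theta_2)\right]^2 Var(\hat\theta_1) + \left[f'_{\theta_2}(\theta_1,\theta_2)\right]^2 Var(\hat\theta_2) + 2 f'_{\theta_1}(\theta_1,\theta_2)\, f'_{\theta_2}(\theta_1,\theta_2)\, Cov(\hat\theta_1,\hat\theta_2)$$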

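However, we don't know the true values of $\theta_1$ and $\theta_2$. Nor do we know $f'_{\theta_1}(\theta_1,\theta_2)$ and $f'_{\theta_2}(\theta_1,\theta_2)$. Assume that, based on your sample data on hand, the maximum likelihood estimates of the true parameters $\theta_1$ and $\theta_2$ are $a$ and $b$ respectively. Then we might want to set

$$f'_{\theta_1}(\theta_1,\theta_2) = \frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2) \approx \frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2)\Big|_{\theta_1=a,\ \theta_2=b},$$

$$f'_{\theta_2}(\theta_1,\theta_2) = \frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2) \approx \frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2)\Big|_{\theta_1=a,\ \theta_2=b}$$

Then we have:

$$Var\left[f(\hat\theta_1,\hat\theta_2)\right] \approx \left[\frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2)\Big|_{\theta_1=a,\ \theta_2=b}\right]^2 Var(\hat\theta_1) + \left[\frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2)\Big|_{\theta_1=a,\ \theta_2=b}\right]^2 Var(\hat\theta_2)$$

$$+\ 2\left[\frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2)\Big|_{\theta_1=a,\ \theta_2=b}\right]\left[\frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2)\Big|_{\theta_1=a,\ \theta_2=b}\right] Cov(\hat\theta_1,\hat\theta_2)$$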

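For brevity, write $\frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2)\Big|_{\theta_1=a,\ \theta_2=b}$ as $\left[\frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2)\right]_{\hat\theta_1}$ and $\frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2)\Big|_{\theta_1=a,\ \theta_2=b}$ as $\left[\frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2)\right]_{\hat\theta_2}$. Then

$$Var\left[f(\hat\theta_1,\hat\theta_2)\right] \approx \left[\frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2)\right]_{\hat\theta_1}^2 Var(\hat\theta_1) + \left[\frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2)\right]_{\hat\theta_2}^2 Var(\hat\theta_2)$$

$$+\ 2\left[\frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2)\right]_{\hat\theta_1}\left[\frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2)\right]_{\hat\theta_2} Cov(\hat\theta_1,\hat\theta_2)$$

However, you'll need to remember that $\left[\frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2)\right]_{\hat\theta_1}$ really means $\frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2)\Big|_{\theta_1=a,\ \theta_2=b}$ and that $\left[\frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2)\right]_{\hat\theta_2}$ really means $\frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2)\Big|_{\theta_1=a,\ \theta_2=b}$. Normally $\hat\theta_1$ is a random variable, yet $\hat\theta_1$ in the symbol $[\ ]_{\hat\theta_1}$ is not a random variable but a fixed maximum likelihood estimate.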

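In matrix form:

$$Var\left[f(\hat\theta_1,\hat\theta_2)\right] \approx \begin{pmatrix} f'_{\theta_1}(\theta_1,\theta_2) & f'_{\theta_2}(\theta_1,\theta_2) \end{pmatrix} \begin{pmatrix} Var(\hat\theta_1) & Cov(\hat\theta_1,\hat\theta_2) \\ Cov(\hat\theta_1,\hat\theta_2) & Var(\hat\theta_2) \end{pmatrix} \begin{pmatrix} f'_{\theta_1}(\theta_1,\theta_2) \\ f'_{\theta_2}(\theta_1,\theta_2) \end{pmatrix}$$

$$\begin{pmatrix} Var(\hat\theta_1) & Cov(\hat\theta_1,\hat\theta_2) \\ Cov(\hat\theta_1,\hat\theta_2) & Var(\hat\theta_2) \end{pmatrix} = I^{-1}, \text{ where } I \text{ is the Fisher information.}$$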

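You model losses using a lognormal distribution with parameters $\mu$ and $\sigma$. You are given:

• The maximum likelihood estimates of $\mu$ and $\sigma$ are $\hat\mu = 4.215$ and $\hat\sigma = 1.093$

• The estimated covariance matrix of $\hat\mu$ and $\hat\sigma$ is $\begin{pmatrix} 0.1195 & 0 \\ 0 & 0.0597 \end{pmatrix}$

• The mean of the lognormal distribution is $\exp\left(\mu + \frac{1}{2}\sigma^2\right)$

Estimate the variance of the maximum likelihood estimate of the mean of the lognormal distribution, using the delta method.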


Solution

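The mean function is $f(\mu,\sigma) = \exp\left(\mu + \frac{1}{2}\sigma^2\right)$. The maximum likelihood estimator of $f(\mu,\sigma)$ is $f(\hat\mu,\hat\sigma) = \exp\left(\hat\mu + \frac{1}{2}\hat\sigma^2\right)$, where $\hat\mu$ and $\hat\sigma$ are the maximum likelihood estimators of $\mu$ and $\sigma$ respectively.

We are asked to find $Var\left[f(\hat\mu,\hat\sigma)\right] = Var\left[\exp\left(\hat\mu + \frac{1}{2}\hat\sigma^2\right)\right]$.

$$f(\hat\mu,\hat\sigma) \approx f(\mu,\sigma) + \left[\frac{\partial}{\partial\mu}f(\mu,\sigma)\right](\hat\mu - \mu) + \left[\frac{\partial}{\partial\sigma}f(\mu,\sigma)\right](\hat\sigma - \sigma)$$

$$Var\left[f(\hat\mu,\hat\sigma)\right] \approx \left[\frac{\partial}{\partial\mu}f(\mu,\sigma)\right]^2 Var(\hat\mu) + \left[\frac{\partial}{\partial\sigma}f(\mu,\sigma)\right]^2 Var(\hat\sigma) + 2\left[\frac{\partial}{\partial\mu}f(\mu,\sigma)\right]\left[\frac{\partial}{\partial\sigma}f(\mu,\sigma)\right] Cov(\hat\mu,\hat\sigma)$$

We are given the covariance matrix $\begin{pmatrix} 0.1195 & 0 \\ 0 & 0.0597 \end{pmatrix}$. So $Var(\hat\mu) \approx 0.1195$, $Var(\hat\sigma) \approx 0.0597$, $Cov(\hat\mu,\hat\sigma) \approx 0$.

$$Var\left[f(\hat\mu,\hat\sigma)\right] \approx \left[\frac{\partial}{\partial\mu}f(\mu,\sigma)\right]^2 (0.1195) + \left[\frac{\partial}{\partial\sigma}f(\mu,\sigma)\right]^2 (0.0597)$$

We don't know the true values of $\mu$ and $\sigma$, so we can't evaluate $\frac{\partial}{\partial\mu}f(\mu,\sigma)$ and $\frac{\partial}{\partial\sigma}f(\mu,\sigma)$ exactly. Consequently, we set

$$\frac{\partial}{\partial\mu}f(\mu,\sigma) \approx \left[\frac{\partial}{\partial\mu}f(\mu,\sigma)\right]_{\hat\mu}, \quad \frac{\partial}{\partial\sigma}f(\mu,\sigma) \approx \left[\frac{\partial}{\partial\sigma}f(\mu,\sigma)\right]_{\hat\sigma}$$

$$\frac{\partial}{\partial\mu}f(\mu,\sigma) = \frac{\partial}{\partial\mu}\exp\left(\mu + \frac{1}{2}\sigma^2\right) = \exp\left(\mu + \frac{1}{2}\sigma^2\right)$$

$$\frac{\partial}{\partial\sigma}f(\mu,\sigma) = \frac{\partial}{\partial\sigma}\exp\left(\mu + \frac{1}{2}\sigma^2\right) = \sigma\exp\left(\mu + \frac{1}{2}\sigma^2\right)$$

$$\left[\frac{\partial}{\partial\mu}f(\mu,\sigma)\right]_{\hat\mu} \approx \exp\left(\hat\mu + \frac{1}{2}\hat\sigma^2\right) = \exp\left(4.215 + \frac{1}{2}(1.093)^2\right) = 123.02$$

$$\left[\frac{\partial}{\partial\sigma}f(\mu,\sigma)\right]_{\hat\sigma} \approx \hat\sigma\exp\left(\hat\mu + \frac{1}{2}\hat\sigma^2\right) = 1.093\exp\left(4.215 + \frac{1}{2}(1.093)^2\right) = 134.46$$

$$Var\left[f(\hat\mu,\hat\sigma)\right] \approx 123.02^2\,(0.1195) + 134.46^2\,(0.0597) \approx 2{,}888$$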
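The arithmetic in this solution can be reproduced in a few lines. The sketch below simply plugs the given estimates into the delta formula; the variable names are this example's own.

```python
import math

mu_hat, sigma_hat = 4.215, 1.093
var_mu, var_sigma, cov_mu_sigma = 0.1195, 0.0597, 0.0

# Estimated lognormal mean, exp(mu + sigma^2 / 2)
mean_hat = math.exp(mu_hat + 0.5 * sigma_hat ** 2)

# Partial derivatives of the mean function, evaluated at the estimates
d_mu = mean_hat
d_sigma = sigma_hat * mean_hat

approx_var = (d_mu ** 2 * var_mu
              + d_sigma ** 2 * var_sigma
              + 2 * d_mu * d_sigma * cov_mu_sigma)

print(round(d_mu, 2), round(d_sigma, 2))  # about 123.02 and 134.46
print(approx_var)                         # roughly 2,888
```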

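Please note that you can also solve this problem using the black-box formula

$$Var\left[f(\hat\theta_1,\hat\theta_2)\right] \approx \left[\frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2)\right]_{\hat\theta_1}^2 Var(\hat\theta_1) + \left[\frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2)\right]_{\hat\theta_2}^2 Var(\hat\theta_2) + 2\left[\frac{\partial}{\partial\theta_1}f(\theta_1,\theta_2)\right]_{\hat\theta_1}\left[\frac{\partial}{\partial\theta_2}f(\theta_1,\theta_2)\right]_{\hat\theta_2} Cov(\hat\theta_1,\hat\theta_2)$$

However, I recommend that you first solve the problem using the Taylor series approximation. This forces you to understand the logic behind the messy formula. Once you understand the formula, next time you can use the memorized formula for $Var\left[f(\hat\theta_1,\hat\theta_2)\right]$ and quickly solve the problem.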

The time to an accident follows an exponential distribution. A random sample of size two

has a mean time of 6. Let Y represent the mean of a new sample of size two.

Use the delta method to approximate the variance of the maximum likelihood estimator

of FY (10 ) .

Solution

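The time to an accident follows an exponential distribution. Assume $\theta$ is the mean of this exponential distribution. If $X_1$ and $X_2$ are two random observations of time-to-accident, then the maximum likelihood estimator of $\theta$ is just the sample mean. So $\hat\theta = 6$.

$$\Pr(Y > 10) = \Pr\left(\frac{X_1 + X_2}{2} > 10\right) = \Pr(X_1 + X_2 > 20)$$

The sum $X_1 + X_2$ of two independent exponential random variables with mean $\theta$ is gamma distributed with density $\frac{t\, e^{-t/\theta}}{\theta^2}$. With $\theta = 6$:

$$\Pr(X_1 + X_2 > 20) = \int_{20}^{+\infty}\frac{t\, e^{-t/6}}{36}\, dt$$

To calculate $\int_{20}^{+\infty}\frac{t\, e^{-t/6}}{36}\, dt$, you'll need to memorize the following shortcut:

$$\int_a^{+\infty} x\,\frac{1}{\theta}e^{-x/\theta}\, dx = (a + \theta)\, e^{-a/\theta}$$

$$\int_a^{+\infty} x^2\,\frac{1}{\theta}e^{-x/\theta}\, dx = \left[(a + \theta)^2 + \theta^2\right] e^{-a/\theta}$$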

If interested, you can download the proof of this shortcut from my website http://www.guo.coursehost.com. The shortcut and the proof are in the sample chapter of my P manual. Just download the sample chapter of the P manual and you'll get the proof and more worked-out examples using this shortcut.

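$$\int_{20}^{+\infty}\frac{t\, e^{-t/6}}{36}\, dt = \frac{1}{6}\int_{20}^{+\infty} t\,\frac{1}{6}e^{-t/6}\, dt = \frac{1}{6}\left[20 + 6\right] e^{-20/6} = 0.1546$$

$$F_Y(10) = \Pr(X_1 + X_2 \le 20) = \int_0^{20}\frac{t\, e^{-t/\theta}}{\theta^2}\, dt, \quad \hat F_Y(10) = \int_0^{20}\frac{t\, e^{-t/\hat\theta}}{\hat\theta^2}\, dt$$

$$Var\left[\hat F_Y(10)\right] = Var\left[\int_0^{20}\frac{t\, e^{-t/\hat\theta}}{\hat\theta^2}\, dt\right] \approx \left[\frac{\partial}{\partial\theta}\int_0^{20}\frac{t\, e^{-t/\theta}}{\theta^2}\, dt\right]^2_{\theta = E(\hat\theta) \approx 6} Var(\hat\theta)$$

$$Var(\hat\theta) = Var(\bar X) = Var\left(\frac{X_1 + X_2}{2}\right) = \frac{1}{4}(2)\, Var(X) = \frac{1}{2}\theta^2 \approx \frac{1}{2}\left(6^2\right) = 18$$

Please note that the two observations $X_1$ and $X_2$ are independent and identically distributed with a common variance $Var(X) = \theta^2$.

Next, we need to calculate $\frac{\partial}{\partial\theta}\int_0^{20}\frac{t\, e^{-t/\theta}}{\theta^2}\, dt$.

$$\int_0^{20}\frac{t\, e^{-t/\theta}}{\theta^2}\, dt = \frac{1}{\theta}\int_0^{20} t\,\frac{1}{\theta}e^{-t/\theta}\, dt = \frac{1}{\theta}\left[\int_0^{+\infty} t\,\frac{1}{\theta}e^{-t/\theta}\, dt - \int_{20}^{+\infty} t\,\frac{1}{\theta}e^{-t/\theta}\, dt\right]$$

$$= \frac{1}{\theta}\left[\theta - (20 + \theta)\, e^{-20/\theta}\right] = 1 - \left(1 + \frac{20}{\theta}\right) e^{-20/\theta}$$

$$\frac{\partial}{\partial\theta}\left[1 - \left(1 + \frac{20}{\theta}\right) e^{-20/\theta}\right] = -\frac{\partial}{\partial\theta}\left[\left(1 + \frac{20}{\theta}\right) e^{-20/\theta}\right] = -\frac{400}{\theta^3}\, e^{-20/\theta}$$

$$\left[\frac{\partial}{\partial\theta}\int_0^{20}\frac{t\, e^{-t/\theta}}{\theta^2}\, dt\right]_{\theta = 6} = -\frac{400}{6^3}\exp\left(-\frac{20}{6}\right) = -0.066$$

$$Var\left[\hat F_Y(10)\right] \approx \left[\frac{\partial}{\partial\theta}\int_0^{20}\frac{t\, e^{-t/\theta}}{\theta^2}\, dt\right]^2_{\theta = E(\hat\theta) \approx 6} Var(\hat\theta) = (-0.066)^2\,(18) = 0.078$$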
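A numerical check of this solution is sketched below. It uses the closed form $F_Y(y) = 1 - \left(1 + \frac{2y}{\theta}\right)e^{-2y/\theta}$ derived above and compares the analytic derivative with a central finite difference; `cdf_mean_of_two` is a name invented for this example.

```python
import math

def cdf_mean_of_two(y, theta):
    """F_Y(y) for Y = (X1 + X2) / 2, with X_i exponential with mean theta."""
    s = 2 * y
    return 1 - (1 + s / theta) * math.exp(-s / theta)

theta_hat = 6.0
var_theta_hat = theta_hat ** 2 / 2   # Var of the sample mean of two observations

prob_exceed = 1 - cdf_mean_of_two(10, theta_hat)   # Pr(Y > 10), about 0.1546

# Derivative of F_Y(10) with respect to theta: analytic and numerical versions
closed_form = -400 / theta_hat ** 3 * math.exp(-20 / theta_hat)
h = 1e-5
numeric = (cdf_mean_of_two(10, theta_hat + h)
           - cdf_mean_of_two(10, theta_hat - h)) / (2 * h)

approx_var = closed_form ** 2 * var_theta_hat
print(prob_exceed, closed_form, numeric, approx_var)
```

Carrying full precision gives a variance near 0.0786; the 0.078 in the text comes from rounding the derivative to 0.066 before squaring.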

Chapter 3 Kernel smoothing

Essence of kernel smoothing

Kernel smoothing

= Set your point estimate equal to the average of a neighborhood

= Recalculate at every point by averaging this point and the nearby points

Let me illustrate this with a story. You want to buy a house. After looking at many houses, you find the one you like most. You go to the current owner of the house and ask for the price. The current owner tells you, "I'm asking for $210,000. Make me an offer." What are you going to offer? $200,000? $203,000? $205,000? Or something else? You are not sure. And you know the danger: if your offer is too high, the seller accepts it and you'll overpay for the house; if your offer is too low, you'll look stupid and the seller may refuse to deal with you anymore. So in your best interest, you'll want to make your offer reasonable, not too high and not too low.

If you talk to someone experienced in the real estate market, he'll tell you how (and this works): instead of making a random offer, make your offering price close to the average selling price of similar houses sold in the same neighborhood.

Say four similar houses in the same neighborhood were sold this year. Their prices are $198,000, $200,000, $201,000, and $202,000. So the average selling price is $200,250. If the house you want to buy is truly similar to these four houses, then the seller is asking for too much. You can offer around $200,250 and explain to the seller that your offer is very similar to the selling prices of the houses in the same neighborhood. A reasonable seller will be willing to lower his asking price.

If we focus on one house alone, its selling price appears random. However, when we broaden our view and look at many similar houses nearby, we'll remove the randomness of the asking price and see a more reasonable price.

This simple story illustrates the spirit of kernel smoothing. Suppose we want to estimate $f_X(x)$, the probability density of a random variable $X$ at point $x$. Instead of looking at the one point $x$ and saying

$$\hat f_X(x) = p(x) = \frac{\text{\# of } x\text{'s in the sample}}{\text{sample size } n},$$

we may want to look at $x$'s neighborhood. For example, we may want to look at 3 data points $x - b$, $x$, and $x + b$, where $b$ is a constant. Then we calculate the average of the empirical densities at $x - b$, $x$, and $x + b$ and use it as an estimator of $f_X(x)$:

$$\hat f_X(x) = \frac{1}{3}p(x - b) + \frac{1}{3}p(x) + \frac{1}{3}p(x + b)$$

That is, we calculate $\hat f(x)$ by averaging the empirical densities over the neighborhood $x - b$, $x$, $x + b$.

Please note that the analogy with the house price is not perfect. There's one small difference between how we estimate the price of a house located at $x$ and how we estimate $f_X(x)$. When we estimate the fair price of a house located at $x$, we exclude the data point $x$ because we don't know the value of the house located at $x$:

value of the house located at $x$ = 0.5 * value of the houses located at $x - b$ + 0.5 * value of the houses located at $x + b$

In contrast, when we estimate $f_X(x)$, we include the data point $x$ in our estimate:

$$\hat f_X(x) = \frac{1}{3}p(x - b) + \frac{1}{3}p(x) + \frac{1}{3}p(x + b)$$

We can do this because we already know the empirical density $p(x)$. Stated differently, in kernel smoothing, we estimate $f_X(x)$ twice. The first time, we use the empirical density $p(x) = \frac{\text{\# of } x\text{'s in the sample}}{\text{sample size } n}$ to estimate $f_X(x)$. The second time, we refine our estimate $\hat f_X(x) = p(x)$ by taking the average of the empirical densities of $x$ and its nearby points $x - b$ and $x + b$. This is why kernel smoothing recalculates at every point by averaging this point and its nearby points.

Of course, we can expand our neighborhood. Instead of looking at only two nearby points, we may look at 4 nearby points and calculate the average empirical density of a 5-point neighborhood:

$$\hat f_X(x) = \frac{1}{5}p(x - 2b) + \frac{1}{5}p(x - b) + \frac{1}{5}p(x) + \frac{1}{5}p(x + b) + \frac{1}{5}p(x + 2b)$$

That is, we calculate $\hat f(x)$ by averaging the empirical densities over the neighborhood $x - 2b$, $x - b$, $x$, $x + b$, $x + 2b$.

In addition, we don't need to use equal weighting. We can assign more weight to the data points near $x$. For example, we can set

$$\hat f_X(x) = \frac{1}{10}p(x - 2b) + \frac{2}{10}p(x - b) + \frac{4}{10}p(x) + \frac{2}{10}p(x + b) + \frac{1}{10}p(x + 2b)$$
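The neighborhood averages above can be sketched in a few lines. Everything here is illustrative: the sample, the spacing $b = 1$, and the helper names `empirical_p` and `neighborhood_average` are assumptions made for this example.

```python
from collections import Counter

def empirical_p(sample):
    """Return p(y) = (# of y's in the sample) / (sample size)."""
    counts = Counter(sample)
    n = len(sample)
    return lambda y: counts.get(y, 0) / n

def neighborhood_average(p, x, b, weights):
    """Weighted average of p over x - kb, ..., x, ..., x + kb (odd number of weights)."""
    half = len(weights) // 2
    return sum(w * p(x + (k - half) * b) for k, w in enumerate(weights))

sample = [1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12]   # illustrative data
p = empirical_p(sample)

equal = neighborhood_average(p, 3, 1, [0.2, 0.2, 0.2, 0.2, 0.2])
weighted = neighborhood_average(p, 3, 1, [0.1, 0.2, 0.4, 0.2, 0.1])
print(equal, weighted)  # about 1/12 and 0.1
```

With unequal weights the double observation at 3 counts for more, so the weighted estimate at $x = 3$ is higher than the equally weighted one.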

Now you understand the essence of kernel smoothing. Let's talk about the two major issues to think about if you want to use kernel smoothing:

• How big is the neighborhood? This is called the bandwidth. The bigger the neighborhood, the greater the smoothing. However, if your neighborhood is too big, you run the risk of over-smoothing and washing out real patterns in the data.

• How much weight do you give to each data point in the neighborhood? For example, you can assign equal weight to each data point in the neighborhood. You can also give more weight to the data points closer to the point whose density you want to estimate. There are many weighting methods out there for you to use. The weighting method is called the kernel.

Of these two factors, the bandwidth is typically more important than the weighting method. Your final result may not change much if you use a different weighting method. However, if you change the bandwidth, your estimated density may change widely.

There's some literature out there explaining in more detail how to choose a proper bandwidth and a proper weighting method. However, for the purpose of passing Exam C, you don't need to know that much.

• Uniform kernel. This is one of the easiest weighting methods. If you use this method to estimate density, you'll assign equal weight to each data point in the neighborhood.

• Triangular kernel. Under this weighting method, you give more weight to the data points that are closer to the point for which you are estimating density.

• Gamma kernel. This is more complex but less important than the uniform kernel and the triangular kernel. If you want to cut some corners, you can skip the gamma kernel.

Now let's look at the math formulas. Let's focus on the uniform kernel first.

Uniform kernel

$$k_y(x) = \begin{cases} 0 & \text{if } x < y - b \\ \dfrac{1}{2b} & \text{if } y - b \le x \le y + b \\ 0 & \text{if } x > y + b \end{cases}$$

Let's look at the symbol $k_y(x)$. Here $x$ is your target data point (the location of the house you want to buy) for which you want to estimate the density (the fair price of the house you want to buy). $y$ is a data point in the neighborhood (the location of a similar house in the neighborhood). $k_y(x)$ is $y$'s weight for estimating the density function at $x$.

$$\underbrace{\hat f(x)}_{\text{kernel estimator of the density function at } x} = \sum_{\text{all } y_i}\ \underbrace{p(y_i)}_{\text{empirical density of } y_i}\ \underbrace{k_{y_i}(x)}_{y_i\text{'s weight}}$$

Calculate the density at $x$ by taking the weighted average of the empirical densities of the nearby points $y_i$.

$$K_y(x) = \begin{cases} 0 & \text{if } x < y - b \\ \dfrac{x - y + b}{2b} & \text{if } y - b \le x \le y + b \\ 1 & \text{if } x > y + b \end{cases}$$

$$\underbrace{\hat F(x)}_{\text{kernel estimator of the distribution function at } x} = \sum_{\text{all } y_i}\ \underbrace{p(y_i)}_{\text{empirical density of } y_i}\ \underbrace{K_{y_i}(x)}_{y_i\text{'s weight}}$$

Calculate the distribution function at $x$ by taking the weighted average of the empirical densities of the nearby points $y_i$.

Now let's look at the formula for $k_y(x)$. The formula looks intimidating. The good news is that you really don't need to memorize it. You just need to understand the essence of the uniform weighting method. Once you understand the essence, you can derive the formula effortlessly on the spot.

$$k_y(x) = \begin{cases} 0 & \text{if } x < y - b \\ \dfrac{1}{2b} & \text{if } y - b \le x \le y + b \\ 0 & \text{if } x > y + b \end{cases} \quad \Longleftrightarrow \quad k_y(x) = \begin{cases} 0 & \text{if } |y - x| > b \\ \dfrac{1}{2b} & \text{if } |y - x| \le b \end{cases}$$

To help us remember the formula, let's draw a neighborhood diagram:

          A                             B
    y1   x-b     y3      x      y4     x+b    y2
          D                             C

Here your neighborhood is $[x - b, x + b]$. $b$ is called the bandwidth, which is half of the width of the neighborhood you have chosen. Now the formula for $k_y(x)$ becomes:

$$k_y(x) = \begin{cases} 0 & \text{if } |y - x| > b \\ \dfrac{1}{2b} & \text{if } |y - x| \le b \end{cases} \quad \Longleftrightarrow \quad k_y(x) = \begin{cases} 0 & \text{if } y \text{ is OUT of the neighborhood } [x - b, x + b] \\ \dfrac{1}{2b} & \text{if } y \text{ is in the neighborhood } [x - b, x + b] \end{cases}$$

If the data point $y$ is out of the neighborhood $[x - b, x + b]$, its weight is zero. We throw this data point away and do not use it in our estimation. This should make intuitive sense. In the neighborhood diagram, data points $y_1$ and $y_2$ are discarded.

If the data point $y$ is in the neighborhood $[x - b, x + b]$, we'll use this data point in our estimation and assign it a weight of $\frac{1}{2b}$. In the neighborhood diagram, data points $y_3$ and $y_4$ are used in the estimation and each gets a weight of $\frac{1}{2b}$.

This is how we get $\frac{1}{2b}$. Area ABCD represents the total weight we can possibly assign to all the data points in the neighborhood. So we'll want the total area ABCD to equal one.

$$\text{Area ABCD} = AB \times BC = (2b) \times BC = 1, \text{ so } BC = \frac{1}{2b}$$

So for each data point that falls in the neighborhood AB, its weight is $BC = \frac{1}{2b}$. For each data point that falls out of the neighborhood AB, its weight is zero.

Now you shouldn't have trouble memorizing the uniform kernel formula for $k_y(x)$.
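The in-or-out rule translates directly into code. This is a minimal sketch; `uniform_k` is a name chosen for this example.

```python
def uniform_k(x, y, b):
    """Uniform kernel weight of data point y for the density at x."""
    return 1.0 / (2 * b) if abs(y - x) <= b else 0.0

# With bandwidth b = 2 and target x = 6, the neighborhood is [4, 8].
print(uniform_k(6, 5, 2))  # 0.25, inside the neighborhood
print(uniform_k(6, 8, 2))  # 0.25, on the boundary still counts
print(uniform_k(6, 9, 2))  # 0.0, outside the neighborhood
```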

Next, let's look at the formula for $K_y(x)$, the weighting factor for the distribution function at $x$:

$$K_y(x) = \begin{cases} 0 & \text{if } x < y - b \\ \dfrac{x - y + b}{2b} & \text{if } y - b \le x \le y + b \\ 1 & \text{if } x > y + b \end{cases}$$

It's quite complex to derive $K_y(x)$. So let's not worry about how to derive the formula. Let's just find an easy way to memorize it. Once again, let's draw a neighborhood diagram:

     A       F              B
    x-b      y      x      x+b
     D       E              C

To find how much weight to give to the data point $y$ toward calculating $\hat F(x)$, draw a vertical line at the data point $y$ (Line EF). Next, imagine that you use a pair of scissors to cut off what's to the left of Line EF while keeping what's to the right of Line EF. Next, calculate the area of the neighborhood rectangle ABCD that remains after the cut. This remaining area of the neighborhood rectangle ABCD that survives the cut is $K_y(x)$. Let's walk through this rule.

Situation One: If $x - b \le y \le x + b$ (see the diagram below), we draw a vertical line EF at the data point $y$.

     A       F              B
    x-b      y      x      x+b
     D       E              C

Next, we use a pair of scissors and cut off what's to the left of Line EF. Now the diagram becomes:

             F              B
             y      x      x+b
             E              C

Next, we calculate the area of the neighborhood rectangle ABCD that survives the cut. After the cut, the original neighborhood rectangle ABCD shrinks to the rectangle EFBC. The surviving area is:

$$EFBC = EF \times EC = \frac{1}{2b}(x + b - y) = \frac{x - y + b}{2b}$$

Situation Two: If $y < x - b$ (see the diagram below), we draw a vertical line EF at the data point $y$.

     F       A              B
     y      x-b     x      x+b
     E       D              C

Next, we use a pair of scissors and cut off what's to the left of Line EF. The original neighborhood rectangle ABCD lies entirely to the right of Line EF, so it completely survives the cut. So we'll set $K_y(x) = ABCD = 1$.

Situation Three: If $y > x + b$ (see the diagram below), we draw a vertical line EF at the data point $y$.

     A              B       F
    x-b     x      x+b      y
     D              C       E

Next, we use a pair of scissors and cut off what's to the left of Line EF. The original neighborhood rectangle ABCD lies entirely to the left of Line EF, so it is completely cut off. So we'll set $K_y(x) = 0$.

Now you see that you really don't need to memorize the ugly $K_y(x)$ formula. Just draw a neighborhood diagram, use a pair of scissors, choose $y$ as the cutting point, and cut off the left side of the diagram. Then you just calculate the surviving area of the neighborhood rectangle. The surviving area is $K_y(x)$.
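The scissors rule can be written as a three-way branch. This is a minimal sketch; `uniform_K` is a name chosen for this example.

```python
def uniform_K(x, y, b):
    """Uniform kernel weight of data point y for the distribution function at x.

    Equals the area of the rectangle over [x - b, x + b] (height 1/(2b))
    that lies to the right of the cut at y.
    """
    if y > x + b:        # whole rectangle is left of the cut
        return 0.0
    if y < x - b:        # whole rectangle is right of the cut
        return 1.0
    return (x + b - y) / (2 * b)

print(uniform_K(6, 5, 2), uniform_K(6, 6, 2), uniform_K(6, 7, 2))  # 0.75 0.5 0.25
```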

Triangular kernel

In the uniform kernel, every data point in the neighborhood gets an identical weight of $\frac{1}{2b}$. Say we have two data points in the neighborhood, $y_3$ and $y_4$, but $y_4$ is closer to $x$ and $y_3$ is farther away from $x$ (see the diagram below).

     A                              B
    x-b   y3          x   y4       x+b
     D                              C

Oftentimes it makes sense for us to give $y_4$ more weight than $y_3$. For example, $x$ is the location of the house you want to buy; $y_3$ and $y_4$ are the locations of two similar houses in your neighborhood. It makes intuitive sense for us to give more weight to the house located at $y_4$ than the one located at $y_3$. If the house located at $y_3$ was sold at $200,000 and the house located at $y_4$ was sold at $198,000, we might want to assign 40% weight to the house located at $y_3$ and 60% to the one located at $y_4$. Then the estimated fair price of the house located at $x$ is:

60% * Price of the house located at $y_4$ + 40% * Price of the house located at $y_3$
= 60% * 198,000 + 40% * 200,000 = $198,800

Here comes the triangular kernel. The triangular kernel assigns more weight to a data point closer to the point for which we need to estimate the density. It assigns less weight to a data point farther away from the point for which we need to estimate the density.

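Let's make sense of the triangular kernel formulas for $k_y(x)$ and $K_y(x)$. First, let's look at $k_y(x)$:

$$k_y(x) = \begin{cases} 0 & \text{if } x < y - b \\ \dfrac{b + x - y}{b^2} & \text{if } y - b \le x \le y \\ \dfrac{b + y - x}{b^2} & \text{if } y \le x \le y + b \\ 0 & \text{if } x > y + b \end{cases} \quad \Longleftrightarrow \quad k_y(x) = \begin{cases} 0 & \text{if } |x - y| > b \\ \dfrac{b + x - y}{b^2} & \text{if } x \le y \le x + b \\ \dfrac{b + y - x}{b^2} & \text{if } x - b \le y \le x \end{cases}$$

Please note that $y - b \le x \le y$ is equivalent to $x \le y \le x + b$, and $y \le x \le y + b$ is equivalent to $x - b \le y \le x$.

                      D
                           H
            F
     A      E         C    G        B
y1  x-b     y2        x    y3      x+b   y4

The neighborhood is $[A, B] = [x - b, x + b]$. Now the $k_y(x)$ formula becomes:

$$k_y(x) = \begin{cases} 0 & \text{if } y \text{ is OUT of the neighborhood } [x - b, x + b] \\ \dfrac{b + x - y}{b^2} & \text{if } y \text{ is in the right-half neighborhood, that is } y \in [x, x + b] \\ \dfrac{b + y - x}{b^2} & \text{if } y \text{ is in the left-half neighborhood, that is } y \in [x - b, x] \end{cases}$$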

$y_1$ and $y_4$ are out of the neighborhood and have zero weight.

Now let's find $k_y$ when the data point $y$ is in the neighborhood $[x - b, x + b]$. Data points $y_2$ and $y_3$ are in the neighborhood and their weights are equal to the heights EF and GH respectively.

Before calculating EF and GH, let me give you a preliminary high school math formula. This formula is used over and over in triangular kernel smoothing. In the triangle below, D lies on AC, and DE and AB are both perpendicular to BC, so triangles DEC and ABC are similar:

$$\frac{DE}{AB} = \frac{EC}{BC}, \quad DE = \frac{EC}{BC}\, AB$$

$$\frac{DEC}{ABC} = \frac{\frac{1}{2}\, DE \times EC}{\frac{1}{2}\, AB \times BC} = \frac{DE}{AB} \times \frac{EC}{BC} = \left(\frac{EC}{BC}\right)^2, \quad DEC = \left(\frac{EC}{BC}\right)^2 ABC$$

where DEC represents the area of triangle DEC and ABC the area of triangle ABC.

    A
           D
    B      E      C

If you don't understand why $\frac{DE}{AB} = \frac{EC}{BC}$ and $\frac{DEC}{ABC} = \left(\frac{EC}{BC}\right)^2$, you'll want to review high school geometry.

Now let's come back to the following diagram and calculate EF and GH. EF is the weight assigned to the data point $y_2$. GH is the weight assigned to the data point $y_3$.

                      D
                           H
            F
     A      E         C    G        B
y1  x-b     y2        x    y3      x+b   y4

First, please note that the area of the triangle ABD represents the total weight assigned to all the data points in the neighborhood $[A, B]$. So the area of the triangle ABD should be one:

$$ABD = 0.5 \times AB \times CD = 1. \text{ However, } AB = 2b, \text{ so } 0.5 \times 2b \times CD = 1, \quad CD = \frac{1}{b}$$

$$\frac{EF}{CD} = \frac{AE}{AC}, \quad EF = \frac{AE}{AC}\, CD = \frac{y_2 - (x - b)}{b} \times \frac{1}{b} = \frac{b + y_2 - x}{b^2} \quad \text{if } y_2 \in [x - b, x];$$

$$\frac{GH}{CD} = \frac{BG}{BC}, \quad GH = \frac{BG}{BC}\, CD = \frac{x + b - y_3}{b} \times \frac{1}{b} = \frac{b + x - y_3}{b^2} \quad \text{if } y_3 \in [x, x + b]$$

So we have:

$$k_y(x) = \begin{cases} \dfrac{b + x - y}{b^2} & \text{if } y \text{ is in the right-half neighborhood, that is } y \in [x, x + b] \\ \dfrac{b + y - x}{b^2} & \text{if } y \text{ is in the left-half neighborhood, that is } y \in [x - b, x] \end{cases}$$
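Both halves of the triangular formula collapse into one expression, $\frac{b - |x - y|}{b^2}$ inside the neighborhood and zero outside. The sketch below uses that form; `triangular_k` is a name chosen for this example.

```python
def triangular_k(x, y, b):
    """Triangular kernel weight of data point y for the density at x.

    Height of the triangle with base [x - b, x + b] and peak 1/b at x,
    read off at the point y.
    """
    return max(0.0, b - abs(x - y)) / b ** 2

print(triangular_k(6, 6, 2))  # 0.5, the peak height 1/b
print(triangular_k(6, 5, 2))  # 0.25
print(triangular_k(6, 7, 2))  # 0.25, symmetric around x
print(triangular_k(6, 8, 2))  # 0.0 at the edge of the neighborhood
```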

Next, let's look at the triangular kernel formula for $K_y(x)$:

$$K_y(x) = \begin{cases} 0 & \text{if } x < y - b \\ \dfrac{(b + x - y)^2}{2b^2} & \text{if } y - b \le x \le y \\ 1 - \dfrac{(b + y - x)^2}{2b^2} & \text{if } y \le x \le y + b \\ 1 & \text{if } x > y + b \end{cases} \quad \Longleftrightarrow \quad K_y(x) = \begin{cases} 0 & \text{if } y \in (x + b, +\infty) \\ \dfrac{(b + x - y)^2}{2b^2} & \text{if } y \in [x, x + b] \\ 1 - \dfrac{(b + y - x)^2}{2b^2} & \text{if } y \in [x - b, x] \\ 1 & \text{if } y \in (-\infty, x - b) \end{cases}$$

Please note that $y - b \le x \le y$ is equivalent to $y \in [x, x + b]$, and $y \le x \le y + b$ is equivalent to $y \in [x - b, x]$.

                      D
                           H
            F
     A      E         C    G        B
y1  x-b     y2        x    y3      x+b   y4

Situation One: If $y \in [x, x + b]$

Draw a vertical line at the data point $y$ (Line GH). Next, imagine that you use a pair of scissors and cut off what's to the left of Line GH while keeping what's to the right of Line GH. Next, calculate the area of the triangle ABD remaining after the cut. This remaining area after the cut is $K_y(x)$.

                      D
                           H
     A                C    G        B
    x-b               x    y       x+b

After the cut:

                           H
                           G        B
                           y       x+b

$$K_y(x) = BGH = \left(\frac{BG}{BC}\right)^2 BDC = \left(\frac{x + b - y}{b}\right)^2 \times \frac{1}{2} = \frac{(x + b - y)^2}{2b^2}$$

Situation Two: If $y \in [x - b, x]$

                      D
            F
     A      E         C             B
    x-b     y         x            x+b

Draw a vertical line at data point $y$ (Line EF). Cut off what's to the left of EF.

                      D
            F
            E         C             B
            y         x            x+b

$$K_y(x) = BDFE = 1 - AEF$$

$$AEF = \left(\frac{AE}{AC}\right)^2 ACD = \left(\frac{y - (x - b)}{b}\right)^2 \times \frac{1}{2} = \frac{(b + y - x)^2}{2b^2}$$

$$K_y(x) = BDFE = 1 - \frac{(b + y - x)^2}{2b^2}$$

Situation Three: If $y \in (-\infty, x - b)$

                          D
     N
     M      A             C             B
     y     x-b            x            x+b

Draw a vertical line MN at data point $y$. Cut off what's to the left of line MN. Now the whole area ABD will survive the cut. So $K_y(x) = 1$.

Situation Four: If $y \in (x + b, +\infty)$

                 D
                                   S
     A           C           B     R
    x-b          x          x+b    y

Draw a vertical line RS at data point $y$. Cut off what's to the left of line RS. Now the whole area ABD will be cut off. So $K_y(x) = 0$.

Now you see that you really don't need to memorize the complex formulas for $K_y(x)$. Just draw a diagram and directly calculate $K_y(x)$.
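The four situations map onto four branches. This is a minimal sketch of the surviving-area rule; `triangular_K` is a name chosen for this example.

```python
def triangular_K(x, y, b):
    """Triangular kernel weight of data point y for the distribution function at x.

    Area of the unit-area triangle over [x - b, x + b] to the right of the cut at y.
    """
    if y > x + b:                      # Situation Four: everything is cut off
        return 0.0
    if y < x - b:                      # Situation Three: everything survives
        return 1.0
    if y >= x:                         # Situation One: small right-hand triangle
        return (x + b - y) ** 2 / (2 * b ** 2)
    return 1 - (b + y - x) ** 2 / (2 * b ** 2)   # Situation Two

print(triangular_K(6, 7, 2))  # 0.125
print(triangular_K(6, 6, 2))  # 0.5
print(triangular_K(6, 5, 2))  # 0.875
```

The weights at $y = x - d$ and $y = x + d$ add up to one, which reflects the symmetry of the triangle.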

Gamma kernel

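$$k_y(x) = \frac{\left(\dfrac{\alpha x}{y}\right)^{\alpha} e^{-\alpha x / y}}{x\,\Gamma(\alpha)}, \quad \text{where } x > 0$$

To understand the gamma kernel, you'll need to know this: in kernel smoothing, all the weights should add up to one. Because of this, for convenience, we can use a density function as weights. This way, the weights automatically add up to one.

In the gamma kernel, we just use the gamma pdf $\dfrac{\left(\frac{x}{\theta}\right)^{\alpha} e^{-x/\theta}}{x\,\Gamma(\alpha)}$. However, we set $\theta = \dfrac{y}{\alpha}$:

$$k_y(x) = \frac{\left(\frac{x}{\theta}\right)^{\alpha} e^{-x/\theta}}{x\,\Gamma(\alpha)} = \frac{x^{\alpha - 1} e^{-x/\theta}}{\theta^{\alpha}\,\Gamma(\alpha)} = \frac{x^{\alpha - 1} e^{-x\alpha / y}}{\left(\dfrac{y}{\alpha}\right)^{\alpha}\Gamma(\alpha)}$$

The simplest gamma pdf is when $\alpha = 1$ (i.e. the exponential pdf). So the simplest gamma kernel is an exponential kernel:

$$k_y(x) = \frac{1}{y}\, e^{-x/y}, \quad \text{where } x > 0$$

If you need to find the exponential kernel for $F(x)$, then $K_y(x) = \int_0^x k_y(t)\, dt = 1 - e^{-x/y}$.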

Problem 1

1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12

Solution

The neighborhood is from 6 − b = 6 − 2 = 4 to 6 + b = 6 + 2 = 8. When calculating f(6), we discard any data points that are outside the neighborhood [4, 8]. So 1, 2, 3, 3, 9, 9, 11, 12 are discarded. We only consider 5, 6, 7, 8. Each of these four data points has a weight of 1 / (2b) = 1/4.

\[ f(6) = \sum p(y)\,k_y(6) = \frac{1}{12}\cdot\frac{1}{4} + \frac{1}{12}\cdot\frac{1}{4} + \frac{1}{12}\cdot\frac{1}{4} + \frac{1}{12}\cdot\frac{1}{4} = \frac{1}{12} \]

In the calculation of F(6), any data point that falls below or touches the lower bound of the neighborhood [4, 8] gets a full weight of 1. Data points 1, 2, 3, 3 are below the lower bound of the neighborhood [4, 8], so they each get a weight of 1. Any data point that falls above or touches the upper bound of the neighborhood [4, 8] gets zero weight. So 8 (touching the upper bound) and 9, 9, 11, 12 (above the upper bound) each get zero weight.

Data points y = 5, 6, 7 are inside the neighborhood [4, 8]. If you draw a diagram, you'll find that the weights for y = 5, 6, 7 are:

\[ K_5(6) = \frac{3}{4}, \quad K_6(6) = \frac{2}{4}, \quad K_7(6) = \frac{1}{4} \]

\[ F(6) = \sum p(y)\,K_y(6) = \frac{1}{12}(1) + \frac{1}{12}(1) + \frac{1}{12}(1) + \frac{1}{12}(1) + \frac{1}{12}\cdot\frac{3}{4} + \frac{1}{12}\cdot\frac{2}{4} + \frac{1}{12}\cdot\frac{1}{4} \approx 0.4583 \]

y        1     2     3     3     5     6     7     8     9     9     11    12
p(y)     1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12
k_y(6)   0     0     0     0     1/4   1/4   1/4   1/4   0     0     0     0
K_y(6)   1     1     1     1     3/4   2/4   1/4   0     0     0     0     0

Problem 2

1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12

Solution

If you draw the diagram, you should get:

y        1     2     3     3     5     6     7     8     9     9     11    12
p(y)     1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12
k_y(6)   0     0     0     0     1/4   1/2   1/4   0     0     0     0     0
K_y(6)   1     1     1     1     7/8   1/2   1/8   0     0     0     0     0

\[ f(6) = \sum p(y)\,k_y(6) = \frac{1}{12}\cdot\frac{1}{4} + \frac{1}{12}\cdot\frac{1}{2} + \frac{1}{12}\cdot\frac{1}{4} = \frac{1}{12} \]

\[ F(6) = \sum p(y)\,K_y(6) = \frac{1}{12}(1) + \frac{1}{12}(1) + \frac{1}{12}(1) + \frac{1}{12}(1) + \frac{1}{12}\cdot\frac{7}{8} + \frac{1}{12}\cdot\frac{1}{2} + \frac{1}{12}\cdot\frac{1}{8} = \frac{5.5}{12} \approx 0.4583 \]

Problem 3

1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12

Solution

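Gamma kernel with \( \alpha = 1 \):

\[ k_y(x) = \frac{1}{y} e^{-x/y}, \qquad K_y(x) = \int_0^x k_y(t)\,dt = 1 - e^{-x/y} \]

\[ f(6) = \sum p(y)\,k_y(6) = \frac{1}{12}\left( \frac{1}{1}e^{-6/1} + \frac{1}{2}e^{-6/2} + \frac{1}{3}e^{-6/3} + \frac{1}{3}e^{-6/3} + \frac{1}{5}e^{-6/5} + \frac{1}{6}e^{-6/6} + \frac{1}{7}e^{-6/7} + \frac{1}{8}e^{-6/8} + \frac{1}{9}e^{-6/9} + \frac{1}{9}e^{-6/9} + \frac{1}{11}e^{-6/11} + \frac{1}{12}e^{-6/12} \right) \approx 0.0480 \]

\[ F(6) = \sum p(y)\,K_y(6) = \frac{1}{12}\Big[ \left(1 - e^{-6/1}\right) + \left(1 - e^{-6/2}\right) + \left(1 - e^{-6/3}\right) + \left(1 - e^{-6/3}\right) + \left(1 - e^{-6/5}\right) + \left(1 - e^{-6/6}\right) + \left(1 - e^{-6/7}\right) + \left(1 - e^{-6/8}\right) + \left(1 - e^{-6/9}\right) + \left(1 - e^{-6/9}\right) + \left(1 - e^{-6/11}\right) + \left(1 - e^{-6/12}\right) \Big] \approx 0.658 \]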
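The three kernel calculations above can be checked numerically. The following is a minimal Python sketch, not part of the original manual: the function names (`uniform_k`, `triangular_K`, `smooth`, and so on) are made up for illustration, and the formulas are the piecewise kernel definitions given earlier in this chapter, with every data point weighted 1/n.

```python
import math

def uniform_k(x, y, b):
    """Uniform kernel density k_y(x): height 1/(2b) on [y - b, y + b]."""
    return 1.0 / (2 * b) if abs(x - y) <= b else 0.0

def uniform_K(x, y, b):
    """Uniform kernel distribution function K_y(x)."""
    if x < y - b:
        return 0.0
    if x > y + b:
        return 1.0
    return (x - y + b) / (2 * b)

def triangular_k(x, y, b):
    """Triangular kernel density: peak 1/b at x = y, zero beyond distance b."""
    return max(0.0, b - abs(x - y)) / b ** 2

def triangular_K(x, y, b):
    """Triangular kernel distribution function K_y(x)."""
    if x < y - b:
        return 0.0
    if x <= y:
        return (b + x - y) ** 2 / (2 * b ** 2)
    if x <= y + b:
        return 1.0 - (b + y - x) ** 2 / (2 * b ** 2)
    return 1.0

def exponential_k(x, y):
    """Gamma kernel with alpha = 1 (exponential with mean y)."""
    return math.exp(-x / y) / y

def exponential_K(x, y):
    return 1.0 - math.exp(-x / y)

def smooth(kernel, x, sample, *params):
    """Kernel-smoothed estimate at x; each sample point has weight 1/n."""
    return sum(kernel(x, y, *params) for y in sample) / len(sample)

data = [1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12]

f_uniform = smooth(uniform_k, 6, data, 2)         # Problem 1, f(6)
F_uniform = smooth(uniform_K, 6, data, 2)         # Problem 1, F(6)
f_triangular = smooth(triangular_k, 6, data, 2)   # Problem 2, f(6)
F_triangular = smooth(triangular_K, 6, data, 2)   # Problem 2, F(6)
f_exponential = smooth(exponential_k, 6, data)    # Problem 3, f(6)
F_exponential = smooth(exponential_K, 6, data)    # Problem 3, F(6)

for name, value in [("uniform f", f_uniform), ("uniform F", F_uniform),
                    ("triangular f", f_triangular), ("triangular F", F_triangular),
                    ("exponential f", f_exponential), ("exponential F", F_exponential)]:
    print(f"{name}(6) = {value:.4f}")
```

On the exam you would still draw the diagram by hand; a script like this is only useful for checking your practice answers.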

Nov 2003 #4

You study five lives to estimate the time from the onset of a disease to death. The times

to death are:

2 3 3 3 7

Using a triangular kernel with bandwidth 2, estimate the density function at 2.5.

Solution

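The neighborhood is [0.5, 4.5]. If you draw a neighborhood diagram, you should get:

y          2       3       3       3       7
p(y)       1/5     1/5     1/5     1/5     1/5
k_y(2.5)   1.5/4   1.5/4   1.5/4   1.5/4   0

\[ f(2.5) = \sum p(y)\,k_y(2.5) = \frac{1}{5}\cdot\frac{1.5}{4} + \frac{1}{5}\cdot\frac{1.5}{4} + \frac{1}{5}\cdot\frac{1.5}{4} + \frac{1}{5}\cdot\frac{1.5}{4} = 0.3 \]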

From a population having distribution function F , you are given the following sample:

Calculate the kernel density estimate F ( 4 ) , using the uniform kernel with bandwidth 1.4.

Solution

[Figure: points A, B, C, D, E, F on the horizontal axis at 2, 2.6, 3.3, 4, 4.7, 5.4, with points G, H, I, J, K, L directly above them. The neighborhood of x = 4 is the rectangle BFLH over [2.6, 5.4], which has area 1.]

If you use scissors to cut off what's to the left of the line AG at y = 2, the neighborhood rectangle BFLH completely survives the cut. So \( K_{y=2}(4) = 1 \).

If you use scissors to cut off what's to the left of the line CI at y = 3.3, the surviving area is CFLI. Area CFLI = 0.75. So \( K_{y=3.3}(4) = 0.75 \).

If you use scissors to cut off what's to the left of the line DJ at y = 4, the surviving area is DFLJ, which is 0.5. So \( K_{y=4}(4) = 0.5 \).

If you use scissors to cut off what's to the left of the line EK at y = 4.7, the surviving area is EFLK, which is 0.25. So \( K_{y=4.7}(4) = 0.25 \).

y        2     3.3   3.3   4     4     4.7   4.7   4.7
p(y)     1/8   1/8   1/8   1/8   1/8   1/8   1/8   1/8
K_y(4)   1     0.75  0.75  0.5   0.5   0.25  0.25  0.25

\[ F(4) = \sum p(y)\,K_y(4) = \frac{1}{8}(1) + \frac{1}{8}(0.75)(2) + \frac{1}{8}(0.5)(2) + \frac{1}{8}(0.25)(3) = 0.53125 \]

Chapter 4 Bootstrap

Essence of bootstrapping

Loss Models doesn't explain bootstrap much. As a result, many candidates just memorize a black-box formula without understanding the essence of bootstrap.

Let me explain bootstrap with an example. Suppose you want to find out the mean and variance of the GRE scores of a group of 5,000 students. One way to do so is to take a lot of random samples. For example, you can sample 20 students' GRE scores and calculate the mean and variance of those scores. Here you have one sample of size 20. Of course, you want to take many samples. For example, you can take 30 samples, each consisting of 20 students' GRE scores. For each of the 30 samples, you can calculate the mean and variance of the GRE scores.

As you can see, taking 30 samples of size 20 takes a lot of time and money. As a research scientist, you are short of research grants. And your life is busy. Is there any way you can cut some corners?

You can cut corners this way. Instead of taking 30 samples of size 20, you just take one sample of size 20 and collect 20 students' GRE scores. These 20 scores are \( X_1, X_2, \dots, X_{20} \). You bring these 20 scores home. Your data collection is done.

Next, you reproduce 30 samples of size 20 each from your one sample of size 20. How? Just resample from your one sample of 20 scores. You randomly select 20 scores with replacement from the 20 scores you have. This is your 1st resample. Next, you again randomly select 20 scores with replacement from the 20 scores you have. This is your 2nd resample. If you repeat this process 30 times, you'll get 30 resamples of size 20 each. If you repeat this process 100 times, you'll get 100 resamples of size 20 each. Now your original one sample gives birth to many resamples. How wonderful.

The rest is easy. If you have 30 resamples, you can calculate the mean and variance of the GRE scores for each resample. This should give you a good idea of the mean and variance of the GRE scores.
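The resampling procedure just described can be sketched in a few lines of Python. This is only an illustration: the 20 scores below are hypothetical numbers invented for the example, and the function name `bootstrap_stats` is not from any particular bootstrapping package.

```python
import random
import statistics

def bootstrap_stats(sample, n_resamples, seed=0):
    """Draw n_resamples resamples (with replacement, same size as the
    original sample) and return a list of (mean, variance) pairs."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    n = len(sample)
    results = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)   # sampling WITH replacement
        results.append((statistics.mean(resample),
                        statistics.variance(resample)))
    return results

# Hypothetical GRE scores of the 20 students in the one sample you collected.
scores = [310, 325, 298, 301, 317, 330, 305, 312, 322, 299,
          315, 308, 327, 303, 319, 311, 296, 324, 306, 314]

stats_30 = bootstrap_stats(scores, 30)
resample_means = [m for m, _ in stats_30]

print("original sample mean:", statistics.mean(scores))
print("average of the 30 resample means:", statistics.mean(resample_means))
print("spread of the 30 resample means:", statistics.stdev(resample_means))
```

The spread of the resample means is what the bootstrap uses as a stand-in for the sampling variability you would see if you could afford to draw 30 fresh samples from the population.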

Does this sound like a fraud? Not really. Your original sample of size 20, \( X_1, X_2, \dots, X_{20} \), reflects the population. As a result, resamples from this sample are pretty much what you would get if you took many samples from the population. (By the way, the name bootstrap comes from the phrase "to pull oneself up by one's bootstraps.")

To use bootstrap, you'll need a computer and some bootstrapping software to quickly create a great number (such as 10,000) of resamples and to calculate the statistics of the resamples. Bootstrap is a computer-intensive technique.

To summarize, bootstrap reduces the time and money researchers spend on data collection. Researchers just need to collect one good sample and bring it home. Then they can use computers to create resamples and calculate statistics.

For more information on bootstrap, you can download the free PDF file at

http://bcs.whfreeman.com/pbs/cat_160/PBS18.pdf

1 3

You estimate \( \theta(F) = Var(X) \) using the estimator \( g(X_1, X_2) = \dfrac{1}{2}\sum_{i=1}^{2}\left(X_i - \bar{X}\right)^2 \), where \( \bar{X} = \dfrac{X_1 + X_2}{2} \). Determine the bootstrap approximation to the mean square error.

Solution

\[ Var(X) = E\left(X^2\right) - E^2(X) = \frac{1}{2}\left(1^2 + 3^2\right) - \left[\frac{1}{2}(1 + 3)\right]^2 = 5 - 4 = 1 \]

Under the bootstrap method, you resample from your original sample with replacement. Your resamples are: (1,1), (1,3), (3,1), and (3,3), each having a probability of 1/4.

For each resample, you calculate \( g(X_1, X_2) = \dfrac{1}{2}\sum_{i=1}^{2}\left(X_i - \bar{X}\right)^2 \). Then the mean square error is \( E\left\{\left[g(X_1, X_2) - \theta\right]^2\right\} \), with \( \theta = 1 \).

Resample (X1, X2)   X-bar = (X1 + X2)/2   g(X1, X2) = (1/2) * sum of (Xi - X-bar)^2
(1,1)               1                     (1/2)[(1 - 1)^2 + (1 - 1)^2] = 0
(1,3)               2                     (1/2)[(1 - 2)^2 + (3 - 2)^2] = 1
(3,1)               2                     (1/2)[(3 - 2)^2 + (1 - 2)^2] = 1
(3,3)               3                     (1/2)[(3 - 3)^2 + (3 - 3)^2] = 0

\[ E\left\{\left[g(X_1, X_2) - \theta\right]^2\right\} = \frac{1}{4}(0 - 1)^2 + \frac{1}{4}(1 - 1)^2 + \frac{1}{4}(1 - 1)^2 + \frac{1}{4}(0 - 1)^2 = \frac{1}{2} \]

2 3

You estimate \( \theta(F) = Var(X) \) using the estimator \( g(X_1, X_2) = \sum_{i=1}^{2}\left(X_i - \bar{X}\right)^2 \), where \( \bar{X} = \dfrac{X_1 + X_2}{2} \). Determine the bootstrap approximation to the mean square error.

Solution

The only difference between this problem and the previous problem (May 2000 #17) is the definition of \( g(X_1, X_2) \). In this problem, \( g(X_1, X_2) = \sum_{i=1}^{2}\left(X_i - \bar{X}\right)^2 \); in the previous problem, \( g(X_1, X_2) = \dfrac{1}{2}\sum_{i=1}^{2}\left(X_i - \bar{X}\right)^2 \).

\[ Var(X) = E\left(X^2\right) - E^2(X) = \frac{1}{2}\left(1^2 + 3^2\right) - \left[\frac{1}{2}(1 + 3)\right]^2 = 5 - 4 = 1 \]

Under the bootstrap method, you resample from your original sample with replacement. Your resamples are: (1,1), (1,3), (3,1), and (3,3), each having a probability of 1/4.

For each resample, you calculate \( g(X_1, X_2) = \sum_{i=1}^{2}\left(X_i - \bar{X}\right)^2 \). Then the mean square error is \( E\left\{\left[g(X_1, X_2) - \theta\right]^2\right\} \), with \( \theta = 1 \).

Resample (X1, X2)   X-bar = (X1 + X2)/2   g(X1, X2) = sum of (Xi - X-bar)^2
(1,1)               1                     (1 - 1)^2 + (1 - 1)^2 = 0
(1,3)               2                     (1 - 2)^2 + (3 - 2)^2 = 2
(3,1)               2                     (3 - 2)^2 + (1 - 2)^2 = 2
(3,3)               3                     (3 - 3)^2 + (3 - 3)^2 = 0

\[ E\left\{\left[g(X_1, X_2) - \theta\right]^2\right\} = \frac{1}{4}(0 - 1)^2 + \frac{1}{4}(2 - 1)^2 + \frac{1}{4}(2 - 1)^2 + \frac{1}{4}(0 - 1)^2 = 1 \]
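Because the original sample has only two values, all four resamples can be enumerated by brute force. The sketch below is an illustration only (the helper names `bootstrap_mse`, `half_sum_of_squares`, and `sum_of_squares` are invented here); it reproduces the bootstrap mean square error for both definitions of g, using exact fractions so no rounding is involved.

```python
from fractions import Fraction
from itertools import product

def half_sum_of_squares(xs):
    """g = (1/n) * sum (x_i - mean)^2; with n = 2 this is the 1/2 version."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / n

def sum_of_squares(xs):
    """g = sum (x_i - mean)^2, the estimator without the 1/2 factor."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs)

def bootstrap_mse(sample, estimator, theta):
    """Average of (g - theta)^2 over every equally likely resample."""
    resamples = list(product(sample, repeat=len(sample)))
    return sum((estimator(r) - theta) ** 2 for r in resamples) / len(resamples)

sample = (1, 3)
# theta is the variance of the empirical distribution of the original sample.
theta = half_sum_of_squares(sample)

mse_with_half = bootstrap_mse(sample, half_sum_of_squares, theta)
mse_without_half = bootstrap_mse(sample, sum_of_squares, theta)

print("theta =", theta)
print("MSE, g with the 1/2 factor    =", mse_with_half)
print("MSE, g without the 1/2 factor =", mse_without_half)
```

The same `bootstrap_mse` helper works for any small sample, since `itertools.product` lists every ordered resample with replacement.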

May 2005 #4

\[ g(X_1, X_2, X_3) = \frac{1}{3}\sum_{i=1}^{3}\left(X_i - \bar{X}\right)^3 \]

Solution

First, you need to understand that the n-th central moment is \( E\left\{\left[X - E(X)\right]^n\right\} \).

For example, the 1st central moment is \( E\left\{X - E(X)\right\} = 0 \).

The 2nd central moment is \( E\left\{\left[X - E(X)\right]^2\right\} = Var(X) \).

The 3rd central moment is \( E\left\{\left[X - E(X)\right]^3\right\} \).

Your original sample is (1,1,4). The 3rd central moment of this sample is calculated as follows:

\[ \bar{X} = \frac{1 + 1 + 4}{3} = 2, \qquad E\left\{\left[X - E(X)\right]^3\right\} = \frac{1}{3}(1 - 2)^3 + \frac{1}{3}(1 - 2)^3 + \frac{1}{3}(4 - 2)^3 = 2 \]

The third central moment of this original sample is used to approximate the true 3rd central moment of the population. So the true parameter is \( \theta = 2 \).

Next, you need to understand bootstrap. Under bootstrap, you resample from the original sample with replacement. Imagine you have 3 boxes to fill from left to right. The 1st box can be filled with any number of your original sample (1,1,4); the 2nd box can be filled with any number of your original sample (1,1,4); and the 3rd box can be filled with any number of your original sample (1,1,4). The number of resamples is \( 3^3 = 27 \). This is a concept from Exam P.

For each resample, you calculate \( \hat{\theta} = g(X_1, X_2, X_3) = \dfrac{1}{3}\sum_{i=1}^{3}\left(X_i - \bar{X}\right)^3 \).

(1) Three 1's. The number of permutations is 8. To understand why, let's denote the original sample as (a,b,c) with a=1, b=1, and c=4. Then the following 8 resamples will produce (1,1,1): aaa, aab, aba, baa, bba, bab, abb, bbb. For the resample (1,1,1),

\[ \bar{X} = \frac{1 + 1 + 1}{3} = 1, \quad \hat{\theta} = \frac{1}{3}(1 - 1)^3 + \frac{1}{3}(1 - 1)^3 + \frac{1}{3}(1 - 1)^3 = 0, \quad \left(\hat{\theta} - \theta\right)^2 = (0 - 2)^2 = 4 \]

(2) Two 1's and one 4. The following 12 permutations will produce two 1's and one 4: aac, aca, caa, bbc, bcb, cbb, abc, acb, cab, bac, bca, cba.

\[ \bar{X} = \frac{1 + 1 + 4}{3} = 2, \quad \hat{\theta} = \frac{1}{3}(1 - 2)^3 + \frac{1}{3}(1 - 2)^3 + \frac{1}{3}(4 - 2)^3 = 2, \quad \left(\hat{\theta} - \theta\right)^2 = (2 - 2)^2 = 0 \]

(3) Two 4's and one 1. The following 6 permutations will produce two 4's and one 1: cca, cac, acc, ccb, cbc, bcc.

\[ \bar{X} = \frac{1 + 4 + 4}{3} = 3, \quad \hat{\theta} = \frac{1}{3}(1 - 3)^3 + \frac{1}{3}(4 - 3)^3 + \frac{1}{3}(4 - 3)^3 = -2, \quad \left(\hat{\theta} - \theta\right)^2 = (-2 - 2)^2 = 16 \]

(4) Three 4's. The following 1 permutation will produce three 4's: ccc.

\[ \bar{X} = \frac{4 + 4 + 4}{3} = 4, \quad \hat{\theta} = \frac{1}{3}(4 - 4)^3 + \frac{1}{3}(4 - 4)^3 + \frac{1}{3}(4 - 4)^3 = 0, \quad \left(\hat{\theta} - \theta\right)^2 = (0 - 2)^2 = 4 \]

The mean square error is:

\[ E\left[\left(\hat{\theta} - \theta\right)^2\right] = \frac{8}{27}(4) + \frac{12}{27}(0) + \frac{6}{27}(16) + \frac{1}{27}(4) = \frac{132}{27} \approx 4.9 \]
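The counting argument above (8, 12, 6, and 1 permutations) and the final mean square error can be verified by listing all 27 ordered resamples. This is a small Python sketch written for this purpose, with the hypothetical helper name `third_central_moment`; it assumes, as in the solution, that each of the 27 ordered resamples is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def third_central_moment(xs):
    """(1/n) * sum (x_i - mean)^3, computed exactly with fractions."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 3 for x in xs) / n

sample = (1, 1, 4)
theta = third_central_moment(sample)       # parameter of the empirical distribution

# Every way to fill three boxes from positions a, b, c of the original sample.
resamples = list(product(sample, repeat=3))

# How many ordered resamples fall into each type (three 1's, two 1's and a 4, ...).
type_counts = Counter(tuple(sorted(r)) for r in resamples)

mse = sum((third_central_moment(r) - theta) ** 2 for r in resamples) / len(resamples)

print("theta =", theta)
for kind, count in sorted(type_counts.items()):
    print(kind, "appears", count, "times, estimate =", third_central_moment(kind))
print("bootstrap MSE =", mse, "=", float(mse))
```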

A sample of claim amounts is {300, 600, 1500}. By applying the deductible to this

sample, the loss elimination ratio for a deductible of 100 per claim is estimated to be

0.125.

Resample   X1     X2     X3
1          600    600    1500
2          1500   300    1500
3          1500   300    600
4          600    600    300
5          600    300    1500
6          600    600    1500
7          1500   1500   1500
8          1500   300    1500
9          300    600    300
10         600    600    600

Determine the bootstrap approximation to the mean square error of the estimate.

Solution

Your original sample is {300, 600, 1500}. If you resample this sample with replacement, you'll get \( 3^3 = 27 \) resamples. However, calculating the mean square error based on 27 resamples is too much work under exam conditions. That's why SOA gives you only 10 resamples.

The loss elimination ratio is \( LER_X(d) = \dfrac{E\left[\min(X, d)\right]}{E(X)} \).

The loss elimination ratio for the original sample {300, 600, 1500} with a deductible of 100 is 0.125. SOA already gives the loss elimination ratio. If we need to calculate it, this is how:

For the loss amount 300, the insurer pays only 200, saving 100.
For the loss amount 600, the insurer pays only 500, saving 100.
For the loss amount 1500, the insurer pays only 1400, saving 100.

The expected saving due to the 100 deductible is: \( \dfrac{1}{3}(100 + 100 + 100) = 100 \)

The expected loss amount is: \( \dfrac{1}{3}(300 + 600 + 1500) = 100 + 200 + 500 = 800 \)

So the loss elimination ratio is: 100 / 800 = 0.125

Next, for each of the 10 resamples, you calculate the loss elimination ratio as we did for the original sample. To speed up the calculation, let's set $100 as one unit of money. Then the deductible is one.

Resample   X1   X2   X3   LER    (LER - 0.125)^2
1          6    6    15   1/9    0.000193
2          15   3    15   1/11   0.001162
3          15   3    6    1/8    0
4          6    6    3    1/5    0.005625
5          6    3    15   1/8    0
6          6    6    15   1/9    0.000193
7          15   15   15   1/15   0.003403
8          15   3    15   1/11   0.001162
9          3    6    3    1/4    0.015625
10         6    6    6    1/6    0.001736
Total                            0.0291

For example, for the 1st resample {6,6,15}, the claim payment after the deductible of 1 is {5,5,14}. So the LER is (1+1+1) / (6+6+15) = 3/27 = 1/9.

\[ MSE = \frac{1}{10}\sum_{i=1}^{10}\left(LER_i - 0.125\right)^2 = \frac{0.0291}{10} = 0.0029 \]
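The table above is tedious to fill in by hand, so here is a short Python sketch that recomputes it from the 10 resamples given in the problem. The function name `loss_elimination_ratio` is made up for this example; the calculation itself is just E[min(X, d)] / E(X) applied to each resample.

```python
from fractions import Fraction

def loss_elimination_ratio(claims, deductible):
    """Empirical LER: total amount eliminated by the deductible / total loss."""
    eliminated = sum(min(x, deductible) for x in claims)
    return Fraction(eliminated, sum(claims))

original = (300, 600, 1500)
resamples = [
    (600, 600, 1500), (1500, 300, 1500), (1500, 300, 600), (600, 600, 300),
    (600, 300, 1500), (600, 600, 1500), (1500, 1500, 1500), (1500, 300, 1500),
    (300, 600, 300), (600, 600, 600),
]

ler_original = loss_elimination_ratio(original, 100)
squared_errors = [(loss_elimination_ratio(r, 100) - ler_original) ** 2
                  for r in resamples]
mse = sum(squared_errors) / len(resamples)

print("LER of the original sample:", ler_original)
for i, (r, err) in enumerate(zip(resamples, squared_errors), start=1):
    print(i, loss_elimination_ratio(r, 100), round(float(err), 6))
print("bootstrap MSE:", round(float(mse), 4))
```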

Chapter 5 Bühlmann credibility model

Trouble with black-box formulas

The Bühlmann credibility premium formula is tested over and over in Course 4 and Exam C. However, many candidates don't have a good understanding of the inner workings of the Bühlmann credibility premium model. They just memorize a series of black-box formulas:

\[ Z = \frac{n}{n + k}, \quad k = \frac{E\left[Var(X \mid \Theta)\right]}{Var\left[E(X \mid \Theta)\right]}, \quad \text{and} \quad P = (1 - Z)\mu + Z\bar{X} \]

Rote memorization of a formula without fully grasping the concepts is tedious, difficult,

and prone to errors. Additionally, a memorized formula will not yield the needed

understanding to grapple with difficult problems.

In this chapter, we're going to dig deep into Bühlmann's credibility premium formula and gain a crystal clear understanding of the concepts.

Let's start with a simple example to illustrate one major challenge an insurance company faces when determining premium rates. Imagine you are the founder and the actuary of an auto insurance company. Your company's specialty is to provide auto insurance for taxi drivers.

Before you open your business, there are half a dozen insurance companies in your area that offer auto insurance to taxi drivers. The world has been going on fine for many years without your start-up. It can continue going on without your start-up. So it's tough for you to get customers. Finally, you take a big portion of your savings account and buy TV advertising, which brings in your first three customers: Adam, Bob, and Colleen. Since your corporate office is your garage and you have only one employee (you), you decide that three customers are good enough for you to start your business.

When you open your business at t = 0 , you sell three auto insurance policies to Adam,

Bob, and Colleen. The contract of your insurance policy says that the premium rate is

guaranteed for only two years. Once the two-year guarantee period is over, you have the

right to set the renewal premium, which can be higher than the guaranteed initial

premium.

When you set your premium rate at t = 0 , you notice that Adam, Bob, and Colleen are

similar in many ways. They are all taxicab drivers. They work at the same taxi company

in the same city. They are all 35 years old. They all graduated from the same high school.

They are all careful drivers. Therefore, at t = 0 you treat Adam, Bob, and Colleen as

identical risks and charge the same premium for the first two years.

To actually set the initial premium for the first two years, you decide to buy a rate book

from a consulting firm. This consulting firm is well-known in the industry. Each year it

publishes a rate manual that lists the average claim cost of a taxi driver by city, by

mileage and by several other criteria. Based on this rate manual, you estimate that Adam,

Bob, and Colleen may each incur $4 claim cost per year. So at t = 0 , you charge Adam,

Bob, and Colleen $4 each. This premium rate is guaranteed for two years.

During the 2-year guaranteed period, Adam, Bob, and Colleen have incurred the

following claims:

              Year 1 claim   Year 2 claim   Total claim   Average claim per insured per year
Adam          $0             $0             $0            $0 / 2 = $0
Bob           $1             $7             $8            $8 / 2 = $4
Colleen       $4             $9             $13           $13 / 2 = $6.50
Grand total                                 $21

Average claim per person per year (for the 3-person group): $21 / (3 × 2) = $3.50

Now the two-year guarantee period is over. You need to determine the renewal premium

rate for Adam, Bob, and Colleen respectively for the third year. Once you have

determined the premium rates, you will need to file these rates with the insurance

department of the state where you do business (called domicile state).

Question: How do you determine the renewal premium rate for the third year for Adam,

Bob, and Colleen respectively?

One simple approach is to charge Adam, Bob, and Colleen a uniform rate (i.e. the group

premium rate). After all, Adam, Bob, and Colleen are similar risks; they form a

homogeneous group. As such, they should pay a uniform group premium rate, even

though their actual claim patterns for the past two years are different. You can continue

charging them the old rate of $4 per insured per year. However, since the average claim

cost for the past two years is $3.50 per insured per year, you can charge them $3.50 per

person for year three.

Under the uniform group rate of $3.50, Bob and Colleen will probably underpay their

premiums; their actual average annual claim for the past two years exceeds this group

premium rate. Adam, on the other hand, may overpay his premiums; his average annual

claim for the past two years is below the group premium rate. When you charge each

policyholder the uniform group premium rate, low-risk policyholders will overpay their

premiums and the high-risk policyholders will underpay their premiums. Your business
as a whole, however, will collect just enough premiums to pay the claim costs.

However, in the real world, most likely you won't be able to charge Adam, Bob, and
Colleen a uniform rate of $3.50. Any of your customers can easily shop around, compare

premium rates, and buy an insurance policy elsewhere with a better rate. For example,

Adam can easily find another insurer who sells a similar insurance policy for less than

your $3.50 group rate. Additionally, the commissioner of your state insurance department

is unlikely to approve your uniform rate. The department will want to see that your low

risk customers pay lower premiums.

Under the classical theory of insurance, people with similar risks form a homogeneous

group to share the risk. Members of a homogeneous group are photocopies of each other.

The claim random variable for each member is independent identically distributed with a

common density function f X ( x ) . The uniform pure premium rate is E ( X ) . Each member

of the homogeneous group should pay E ( X ) .

In reality, no two policyholders, however similar, have exactly the same risks. If you as an insurer charge everybody a uniform group rate, then low-risk policyholders will leave and buy insurance elsewhere.

To stay in business, you have no choice but to charge individualized premium rates that are proportional to policyholders' risks.

Now let's come back to our simple case. We know that uniform rating won't work in the real world. We'll want to set up a mathematical model to calculate the fair renewal premium rate for Adam, Bob, and Colleen respectively. Our model should reflect the following observations and intuition:

Adam, Bob, and Colleen are largely similar risks. We'll need to treat them as a rating group. This way, our renewal rates for Adam, Bob, and Colleen are somewhat related.

On the other hand, we need to differentiate between Adam, Bob, and Colleen. We might want to treat Adam, Bob, and Colleen as potentially different sub-risks within a largely similar rate group. This way, our model will produce different renewal rates. We hope the renewal rates calculated from our model will agree with our intuition that Adam deserves the lowest renewal rate, Bob a higher rate, and Colleen the highest rate.

To reflect the idea that Adam, Bob, and Colleen are different sub-risks within a largely similar rate group, we may want to divide the largely similar rate group into four sub-risks (or more sub-risks if you like): super preferred, preferred, standard, and sub-standard. So the rate group actually consists of four sub-risks. Adam or Bob or Colleen can be any one of the four sub-risks.

Here comes a critical point: we don't know who belongs to which sub-risk. We don't know whether Adam is a super preferred sub-risk, a preferred sub-risk, a standard sub-risk, or a sub-standard sub-risk. Nor do we know to which sub-risk Bob or Colleen belongs. This is so even if we have Adam's two-year claim data. Judging from his 2-year claim history, Adam seems to be a super preferred or at least a preferred sub-risk. However, a bad driver can have no accidents for a while due to good luck; a good driver can have several big accidents in a row due to bad luck. So we really can't say for sure that Adam is indeed a better risk. All we know is that Adam's sub-risk class is a random variable with 4 possible values: super preferred, preferred, standard, and sub-standard.

To visualize that Adam's sub-risk class is a random variable, think about rolling a 4-sided die. One side of the die is marked with the letters "SP" (super preferred); another side is marked with "PF" (preferred); the third side is marked with "STD" (standard); and the fourth side is marked with "SUB" (sub-standard). To determine which sub-class Adam belongs to, we'll roll the die. If the result is "SP," then we'll assign Adam to the super preferred class. If the result is "PF," we'll assign him to the preferred class. And so on and so forth. Similarly, we can roll the die and randomly assign Bob or Colleen to one of the four sub-classes: SP, PF, STD, and SUB.

Now we are ready to come up with a model to calculate the renewal premium rate:

Let random variable \( X_{jt} \) represent the claim cost incurred in year \( t \) by the \( j \)-th insured, where \( t = 1, 2, \dots, n, n+1 \) and \( j = 1, 2, \dots, m \). Here in our example, \( n = 2 \) (we have two years of claim data) and \( m = 3 \) (\( j = 1, 2, 3 \) corresponds to Adam, Bob, and Colleen).

For any \( j \) and \( t \), \( X_{jt} \) has a common density function \( f_{X,\Theta}(x, \theta) \), a common mean \( \mu = E\left(X_{jt}\right) \), and a common variance \( \sigma^2 = Var\left(X_{jt}\right) \). What we are saying here is that all policyholders \( j = 1, 2, \dots, m \) have an identical mean claim \( \mu \) and an identical claim variance \( \sigma^2 \).

\( \theta \) is a realization of a random variable \( \Theta \) representing the presence of multiple sub-risks. \( X_{j1}, X_{j2}, \dots, X_{jn} \), and \( X_{j,n+1} \), which represent the claim costs incurred by the same policyholder, belong to the same sub-risk class \( \theta \).

In our example, \( \Theta = \{SP, PF, STD, SUB\} \). When we say that \( \theta \) is a realization of \( \Theta \), we mean that with probability \( p_1 \), \( \theta = SP \); with probability \( p_2 \), \( \theta = PF \); with probability \( p_3 \), \( \theta = STD \); with probability \( p_4 = 1 - (p_1 + p_2 + p_3) \), \( \theta = SUB \).

Because \( X_{j1}, X_{j2}, \dots, X_{jn} \), and \( X_{j,n+1} \) are claims generated from the same (unknown) sub-risk class, we assume that given \( \theta \), \( X_{j1}, X_{j2}, \dots, X_{jn} \), and \( X_{j,n+1} \) are independent identically distributed. That is, \( X_{j1} \mid \theta,\ X_{j2} \mid \theta,\ \dots,\ X_{jn} \mid \theta,\ X_{j,n+1} \mid \theta \) are independent identically distributed with a common conditional mean \( E\left(X_{jt} \mid \theta\right) = \mu(\theta) \) and a common conditional variance \( Var\left(X_{jt} \mid \theta\right) \).

We have observed \( X_{j1}, X_{j2}, \dots, X_{jn} \). Our goal is to estimate \( X_{j,n+1} \), the claim cost in year \( n + 1 \) by the \( j \)-th insured, using his prior \( n \)-year average claim cost \( \bar{X}_j = \dfrac{1}{n}\sum_{t=1}^{n} X_{jt} \). The estimated value of \( X_{j,n+1} \) is the pure renewal premium for year \( n + 1 \). Bühlmann's approach is to use \( a + Z\bar{X}_j \) to approximate \( X_{j,n+1} \) subject to the condition that \( E\left[\left(a + Z\bar{X}_j - X_{j,n+1}\right)^2\right] \) is minimized.

It turns out that:

\[ a + Z\bar{X}_j = (1 - Z)\mu + Z\bar{X}_j, \]

\[ Z = \frac{n}{n + k}, \quad k = \frac{E\left[Var\left(X_{jt} \mid \Theta\right)\right]}{Var\left[E\left(X_{jt} \mid \Theta\right)\right]} = \frac{E\left[Var\left(X_{jt} \mid \Theta\right)\right]}{Var\left[\mu(\Theta)\right]} \]

\[ \mu = E\left(X_{jt}\right) = E\left[E\left(X_{jt} \mid \Theta\right)\right] = E\left[\mu(\Theta)\right]. \]

Next, we'll derive the above formulas. However, before we derive the Bühlmann premium formulas, let's go over some preliminary concepts.

Preliminary concept #1 Double expectation

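\[ E(X) = E_{\Theta}\left[E(X \mid \Theta)\right] \]

If \( X \) is discrete, \( E(X) = E_{\Theta}\left[E(X \mid \Theta)\right] = \sum_{\text{all } \theta} p(\theta)\,E(X \mid \theta) \).

If \( X \) is continuous, \( E(X) = E_{\Theta}\left[E(X \mid \Theta)\right] = \int_{-\infty}^{+\infty} E(X \mid \theta)\,f(\theta)\,d\theta \).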

I'll explain the double expectation theorem assuming \( X \) is discrete. However, the same logic applies when \( X \) is continuous.

Let's use a simple example to understand the meaning behind the above formula. A class has 6 boys and 4 girls. These 10 students take a final. The average score of the 6 boys is 80; the average score of the 4 girls is 85. What's the average score of the whole class?

This is an elementary level math problem. The average score of the whole class is:

\[ \text{Average score} = \frac{\text{Total score}}{\text{\# of students}} = \frac{6(80) + 4(85)}{10} = \frac{820}{10} = 82 \]

We can rewrite this as a weighted average of the two group averages:

\[ \text{Average score} = \frac{6}{10}(80) + \frac{4}{10}(85) \]

If we express the above calculation using the double expectation theorem, then we have:

\[ E(\text{score}) = P(\text{boy})\,E(\text{score} \mid \text{boy}) + P(\text{girl})\,E(\text{score} \mid \text{girl}) = \frac{6}{10}(80) + \frac{4}{10}(85) = 82 \]

So instead of directly calculating the average score for the whole class, we first break

down the whole class into two groups based on gender. We then calculate the average

score of these two groups: boys and girls. Next, we calculate the weighted average of

these two group averages. This weighted average is the average of the whole class. If you

understand this formula, you have understood the essence of the double expectation

theorem.

Instead of directly calculating the mean of the whole population, you first break down the

population into several groups based on one standard (such as gender). You calculate the

mean of each group. Next, you calculate the mean of all the group means. This is the

mean of the whole population.

Problem: A group of 20 graduate students (12 non-math majors and 8 math majors) has a total GRE score of 12,940. The GRE score distribution by major is as follows:

Total GRE scores of 12 non-math majors   7,740
Total GRE scores of 8 math majors        5,200
Total GRE score                          12,940

Find the average GRE score twice. The first time, do not use the double expectation theorem. The second time, use the double expectation theorem. Show that you get the same result.

Solution

(1) Find the mean without using the double expectation theorem. The average GRE score for the 20 graduate students is:

\[ \text{Average score} = \frac{\text{Total score}}{\text{\# of students}} = \frac{12{,}940}{20} = 647 \]

(2) Find the mean using the double expectation theorem:

\[ \text{Average score} = \frac{12}{20}\cdot\frac{7{,}740}{12} + \frac{8}{20}\cdot\frac{5{,}200}{8} = 647 \]

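Proof.

\[ Var(X) = E\left(X^2\right) - E^2(X) \]

\[ E(X) = E_Y\left[E(X \mid Y)\right], \qquad E\left(X^2\right) = E_Y\left[E\left(X^2 \mid Y\right)\right] \]

However, \( E\left(X^2 \mid Y\right) = Var(X \mid Y) + E^2(X \mid Y) \).

\[ Var(X) = E\left(X^2\right) - E^2(X) = E_Y\left[Var(X \mid Y) + E^2(X \mid Y)\right] - \left\{E_Y\left[E(X \mid Y)\right]\right\}^2 \]

\[ = E_Y\left[Var(X \mid Y)\right] + \left\{ E_Y\left[E^2(X \mid Y)\right] - \left(E_Y\left[E(X \mid Y)\right]\right)^2 \right\} \]

\[ = E_Y\left[Var(X \mid Y)\right] + Var_Y\left[E(X \mid Y)\right] \]

If \( X \) is the loss amount of a policyholder and \( Y \) is the risk class of the policyholder, then \( Var(X) = E_Y\left[Var(X \mid Y)\right] + Var_Y\left[E(X \mid Y)\right] \) means that the total variance of the loss consists of two components:

\( E_Y\left[Var(X \mid Y)\right] \), the average of the loss variances within each risk class. This is called the expected process variance.

\( Var_Y\left[E(X \mid Y)\right] \), the variance of the average loss by risk class. This is called the variance of hypothetical mean.

Total variance = expected process variance + variance of hypothetical mean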

Next, let's look at a comprehensive example using double expectation and total variance.

distribution:

\[ P(n) = \frac{3!}{n!\,(3 - n)!}\,p^n (1 - p)^{3 - n}. \]

Solution

For a fixed \( p \), \( E(N) = 3p \) and \( Var(N) = 3p(1 - p) \).

However, \( p \) is also a random variable. So we cannot directly use the above formulas.

To find \( E(N) \), we divide \( N \) into different groups by \( p \), just as we divided the class into boys and girls. The only difference is that this time we have an infinite number of groups (\( p \) is a continuous random variable).

Each value of \( p \) is a separate group. For each group, we will calculate its mean. Then we will find the weighted average of the means of all the groups, with the weight being the probability of each group's \( p \) value. The result should be \( E(N) \).

\[ E(N) = E_P\left[E(N \mid p)\right] = \int_{p=0}^{1} E(N \mid p)\,f_P(p)\,dp = \int_{p=0}^{1} 3p\,dp = \left.\frac{3}{2}p^2\right|_0^1 = \frac{3}{2} \]

Alternatively, \( E(N) = E_P\left[E(N \mid p)\right] = E_P[3p] = 3E(P) = 3\left(\dfrac{1}{2}\right) = \dfrac{3}{2} \).

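Next, we'll calculate \( Var(N) \). One method is to calculate \( Var(N) \) from scratch using the standard formula \( Var(N) = E\left(N^2\right) - E^2(N) \). We'll use the double expectation theorem to calculate \( E\left(N^2\right) \) and \( E(N) \).

\[ E\left(N^2\right) = E_P\left[E\left(N^2 \mid p\right)\right] = \int_0^1 E\left(N^2 \mid p\right) f(p)\,dp \]

\[ E\left(N^2 \mid p\right) = E^2(N \mid p) + Var(N \mid p) = (3p)^2 + 3p(1 - p) = 6p^2 + 3p \]

\[ E\left(N^2\right) = \int_0^1 E\left(N^2 \mid p\right) f(p)\,dp = \int_0^1 \left(6p^2 + 3p\right)dp = \left[2p^3 + \frac{3}{2}p^2\right]_0^1 = \frac{7}{2} \]

\[ Var(N) = E\left(N^2\right) - E^2(N) = \frac{7}{2} - \left(\frac{3}{2}\right)^2 = \frac{5}{4} \]

Alternatively, you can use the following formula to calculate the variance:

\[ Var(N) = E_P\left[Var(N \mid p)\right] + Var_P\left[E(N \mid p)\right] \]

Because \( N \mid p \) is binomial with parameters 3 and \( p \), we have:

\[ E(N \mid p) = 3p, \qquad Var(N \mid p) = 3p(1 - p) \]

\[ E_P\left[Var(N \mid p)\right] = E_P\left[3p(1 - p)\right] = E_P\left(3p - 3p^2\right) = 3E_P(p) - 3E_P\left(p^2\right) \]

\[ Var_P\left[E(N \mid p)\right] = Var_P(3p) = 9\,Var(P) \]

If \( X \) is uniform over \( [a, b] \), then \( E(X) = \dfrac{a + b}{2} \) and \( Var(X) = \dfrac{(b - a)^2}{12} \).

We have:

\[ E(P) = \frac{0 + 1}{2} = \frac{1}{2}, \qquad Var(P) = \frac{(1 - 0)^2}{12} = \frac{1}{12} \]

\[ E\left(P^2\right) = E^2(P) + Var(P) = \left(\frac{1}{2}\right)^2 + \frac{1}{12} = \frac{4}{12} \]

\[ Var(N) = 3\left(\frac{1}{2}\right) - 3\left(\frac{4}{12}\right) + 9\left(\frac{1}{12}\right) = \frac{5}{4} \]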
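Both routes to Var(N) can be cross-checked with exact arithmetic. The sketch below is only an illustration: it assumes, as the solution does, that p is uniform on [0, 1], so that E(P^k) = 1/(k + 1), and the variable names (`epv`, `vhm`, and so on) are chosen here to mirror the terms "expected process variance" and "variance of hypothetical mean."

```python
from fractions import Fraction

def uniform01_moment(k):
    """k-th raw moment of a uniform(0, 1) variable: E[P^k] = 1 / (k + 1)."""
    return Fraction(1, k + 1)

m = 3                                   # number of binomial trials
EP = uniform01_moment(1)                # E(P)   = 1/2
EP2 = uniform01_moment(2)               # E(P^2) = 1/3

# Double expectation: E(N) = E[E(N | P)] = E(mP)
mean_N = m * EP

# Method 1: Var(N) = E(N^2) - E(N)^2, with E(N^2 | p) = (mp)^2 + mp(1 - p)
EN2 = (m * m - m) * EP2 + m * EP
var_direct = EN2 - mean_N ** 2

# Method 2: total variance = expected process variance + variance of hypothetical mean
epv = m * EP - m * EP2                  # E[Var(N | P)] = E[mP(1 - P)]
vhm = m * m * (EP2 - EP ** 2)           # Var[E(N | P)] = Var(mP) = m^2 Var(P)
var_total = epv + vhm

print("E(N) =", mean_N)
print("Var(N), direct method  =", var_direct)
print("Var(N), EPV + VHM      =", var_total, "=", epv, "+", vhm)
```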

In a regression analysis, you try to fit a line (or a function) through a set of points. With least squares regression, you get a better fit by minimizing the squared distance of each point to the fitted line.

Let's say you want to find out how a person's income level affects how much life insurance he buys. Let \( X \) represent income. Let \( Y \) represent the amount of life insurance this person buys. You have collected some data pairs of \( (X, Y) \) from a group of consumers. You suspect there's a linear relationship between \( X \) and \( Y \). You want to predict \( Y \) using the function \( a + bX \), where \( a \) and \( b \) are constants. With least squares regression, you want to minimize the following:

\[ Q = E\left[(a + bX - Y)^2\right] \]

\[ \frac{\partial Q}{\partial a} = \frac{\partial}{\partial a}E\left[(a + bX - Y)^2\right] = E\left[\frac{\partial}{\partial a}(a + bX - Y)^2\right] = 2E(a + bX - Y) = 2\left[a + bE(X) - E(Y)\right] \]

Setting \( \dfrac{\partial Q}{\partial a} = 0 \):

\[ a + bE(X) - E(Y) = 0 \qquad \text{(Equation I)} \]

\[ \frac{\partial Q}{\partial b} = \frac{\partial}{\partial b}E\left[(a + bX - Y)^2\right] = E\left[\frac{\partial}{\partial b}(a + bX - Y)^2\right] = 2E\left[(a + bX - Y)X\right] = 2\left[aE(X) + bE\left(X^2\right) - E(XY)\right] \]

Setting \( \dfrac{\partial Q}{\partial b} = 0 \):

\[ aE(X) + bE\left(X^2\right) - E(XY) = 0 \qquad \text{(Equation II)} \]

Subtracting \( E(X) \) times Equation I from Equation II:

\[ b\left[E\left(X^2\right) - E^2(X)\right] = E(XY) - E(X)E(Y) \]

\[ b = \frac{Cov(X, Y)}{Var(X)}, \qquad a = E(Y) - bE(X) \]
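As a quick sanity check of the result b = Cov(X, Y) / Var(X) and a = E(Y) − bE(X), the following Python sketch applies the two formulas to a small made-up data set. The data and the function name `fit_line` are hypothetical; the expectations are taken over the empirical distribution, with each data pair weighted 1/n.

```python
def fit_line(xs, ys):
    """Least squares line a + b*x using b = Cov(X, Y) / Var(X), a = E(Y) - b*E(X).
    Expectations are empirical averages with weight 1/n per data pair."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y   # E(XY) - E(X)E(Y)
    var_x = sum(x * x for x in xs) / n - mean_x ** 2                    # E(X^2) - E^2(X)
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data lying exactly on the line y = 2 + 3x,
# so the fit should recover a = 2 and b = 3.
incomes = [1, 2, 3, 4]
insurance = [5, 8, 11, 14]

a, b = fit_line(incomes, insurance)
print("a =", a, " b =", b)
```

The Bühlmann credibility factor is obtained from exactly this formula for b, with the past average claim in the role of X and next year's claim in the role of Y.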

Now I'm ready to give you a quick proof of the Bühlmann credibility formula. To simplify notation, I'm going to fix on one particular insured (such as Adam) and change the symbol \( X_{jt} \) to \( X_t \). Remember, our goal is to estimate \( X_{n+1} \), the individualized premium rate for year \( n + 1 \), using \( a + Z\bar{X} \). \( Z \) is the credibility factor assigned to the mean of past claims \( \bar{X} = \dfrac{1}{n}\left(X_1 + X_2 + \dots + X_n\right) \). We'll want to find \( a \) and \( Z \) that minimize the following:

\[ E\left[\left(a + Z\bar{X} - X_{n+1}\right)^2\right] \]

Please note that \( X_1, X_2, \dots, X_n \), and \( X_{n+1} \) are claims incurred by the same policyholder (whose risk class is unknown to us) during years \( 1, 2, \dots, n \), and \( n + 1 \).

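Applying the least squares regression result \( b = \dfrac{Cov(X, Y)}{Var(X)} \), with \( \bar{X} \) in the role of \( X \) and \( X_{n+1} \) in the role of \( Y \), we have:

\[ Z = \frac{Cov\left(\bar{X}, X_{n+1}\right)}{Var\left(\bar{X}\right)} \]

\[ Cov\left(\bar{X}, X_{n+1}\right) = Cov\left[\frac{1}{n}\left(X_1 + X_2 + \dots + X_n\right), X_{n+1}\right] = \frac{1}{n}Cov\left[\left(X_1 + X_2 + \dots + X_n\right), X_{n+1}\right] \]

\[ = \frac{1}{n}\left[Cov\left(X_1, X_{n+1}\right) + Cov\left(X_2, X_{n+1}\right) + \dots + Cov\left(X_n, X_{n+1}\right)\right] \]

It is tempting to assume that \( X_1, X_2, \dots, X_n, X_{n+1} \) are independent identically distributed. If indeed \( X_1, X_2, \dots, X_n, X_{n+1} \) were independent identically distributed, every covariance term above would be zero and we would have

\[ Z = \frac{Cov\left(\bar{X}, X_{n+1}\right)}{Var\left(\bar{X}\right)} = 0 \]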

The result $Z = 0$ simply doesn't make sense. What went wrong is the assumption that $X_1, X_2, \dots, X_n, X_{n+1}$ are independent identically distributed. The correct statement is that $X_1, X_2, \dots, X_n$, and $X_{n+1}$ are identically distributed with a common density function $f(x, \theta)$, where $\theta$ is unknown to us.

They are independent only given the risk class $\theta$. In other words, if we fix the sub-class variable $\Theta$ at $\theta$, then all the claims incurred by the policyholder who belongs to sub-class $\theta$ are independent identically distributed. Mathematically, this means that $X_1 \mid \theta$, $X_2 \mid \theta$, $\dots$, $X_n \mid \theta$, and $X_{n+1} \mid \theta$ are independent identically distributed.

Here is an intuitive way to see why $X_i$ and $X_j$ have non-zero covariance. $X_i$ and $X_j$ represent the claim amounts incurred at times $i$ and $j$ by the policyholder whose sub-class $\theta$ is unknown to us. So $X_i$ and $X_j$ are controlled by the same risk-class factor $\theta$. If $\theta$ is a low risk, then $X_i$ and $X_j$ both tend to be small. On the other hand, if $\theta$ is a high risk, then $X_i$ and $X_j$ both tend to be big. So $X_i$ and $X_j$ are correlated and have a non-zero covariance.

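Because $X_i \mid \theta$ and $X_j \mid \theta$ are independent identically distributed with a common conditional mean $\mu(\theta)$, we have:

$$E(X_i X_j \mid \theta) = E(X_i \mid \theta)\,E(X_j \mid \theta) = \mu(\theta)\,\mu(\theta) = \left[\mu(\theta)\right]^2$$

$$E(X_i X_j) = E_\theta\left[E(X_i X_j \mid \theta)\right] = E_\theta\left[E(X_i \mid \theta)\,E(X_j \mid \theta)\right] = E_\theta\left\{\left[\mu(\theta)\right]^2\right\}$$

$$\mathrm{Cov}(X_i, X_j) = E\left\{\left[\mu(\theta)\right]^2\right\} - \left\{E\left[\mu(\theta)\right]\right\}^2 = \mathrm{Var}\left[\mu(\theta)\right]$$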

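$$\begin{aligned}
\mathrm{Cov}(\bar{X}, X_{n+1}) &= \frac{1}{n}\mathrm{Cov}\left[(X_1 + X_2 + \dots + X_n),\ X_{n+1}\right] \\
&= \frac{1}{n}\left[\mathrm{Cov}(X_1, X_{n+1}) + \mathrm{Cov}(X_2, X_{n+1}) + \dots + \mathrm{Cov}(X_n, X_{n+1})\right] \\
&= \frac{1}{n}\left\{n\,\mathrm{Var}\left[\mu(\theta)\right]\right\} = \mathrm{Var}\left[\mu(\theta)\right]
\end{aligned}$$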
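To see $\mathrm{Cov}(X_i, X_j) = \mathrm{Var}[\mu(\theta)]$ in action, here is a small sketch that enumerates every outcome of a made-up two-class model. The class weights and claim probabilities are hypothetical (they are not from any problem in this chapter); claims are Bernoulli and independent only given the class.

```python
from itertools import product

# Hypothetical model: (P(theta), P(X = 1 | theta)) for each sub-class
classes = [(0.75, 0.1), (0.25, 0.5)]

def joint_expectation(func, n_years=2):
    """E[func(x_1, ..., x_n)] where claims are i.i.d. Bernoulli given the class."""
    total = 0.0
    for weight, q in classes:
        for xs in product((0, 1), repeat=n_years):
            prob = weight
            for x in xs:
                prob *= q if x == 1 else 1 - q
            total += prob * func(*xs)
    return total

mean_x = joint_expectation(lambda x1, x2: x1)
# Unconditional covariance between two different years
cov_12 = joint_expectation(lambda x1, x2: x1 * x2) - mean_x ** 2
# Variance of the conditional mean mu(theta) = q
ve = sum(w * q * q for w, q in classes) - sum(w * q for w, q in classes) ** 2
print(cov_12, ve)
```

The two printed numbers should agree up to rounding, even though the claims are independent within each class.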

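Next, we'll calculate $\mathrm{Var}(\bar{X})$.

$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{n^2}\mathrm{Var}(X_1 + X_2 + \dots + X_n)$$

Please note that $\mathrm{Var}(X_1 + X_2 + \dots + X_n) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n)$ holds only if $X_1, X_2, \dots, X_n$ are independent, which is not the case here. So we have to include the covariances among $X_1, X_2, \dots, X_n$. The correct expression is:

$$\begin{aligned}
\mathrm{Var}(X_1 + X_2 + \dots + X_n) = {} & \mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \dots + \mathrm{Var}(X_n) \\
& + 2\,\mathrm{Cov}(X_1, X_2) + 2\,\mathrm{Cov}(X_1, X_3) + \dots + 2\,\mathrm{Cov}(X_{n-1}, X_n)
\end{aligned}$$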

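$X_1, X_2, \dots, X_n$ are identically distributed with a common mean $\mu = E(X)$ and a common variance $\mathrm{Var}(X)$, so the variance terms add up to $n\,\mathrm{Var}(X)$.

Out of $X_1, X_2, \dots, X_n$, if you take out any two items $X_i$ and $X_j$ where $i \ne j$, you'll get a covariance $\mathrm{Cov}(X_i, X_j) = \mathrm{Var}[\mu(\theta)]$. Since there are $C_n^2 = \frac{n(n-1)}{2}$ ways of taking out two items $X_i$ and $X_j$ where $i \ne j$, the sum of the covariance terms becomes:

$$\left\{2\,\mathrm{Var}\left[\mu(\theta)\right]\right\} C_n^2 = 2\,\mathrm{Var}\left[\mu(\theta)\right] \times \frac{1}{2}n(n-1) = n(n-1)\,\mathrm{Var}\left[\mu(\theta)\right]$$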

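$$\begin{aligned}
\mathrm{Var}(\bar{X}) &= \frac{1}{n^2}\mathrm{Var}(X_1 + X_2 + \dots + X_n) \\
&= \frac{1}{n^2}\left\{\mathrm{Var}(X_1) + \dots + \mathrm{Var}(X_n) + 2\,\mathrm{Cov}(X_1, X_2) + 2\,\mathrm{Cov}(X_1, X_3) + \dots + 2\,\mathrm{Cov}(X_{n-1}, X_n)\right\} \\
&= \frac{1}{n^2}\left\{n\,\mathrm{Var}(X) + n(n-1)\,\mathrm{Var}\left[\mu(\theta)\right]\right\} = \frac{1}{n}\left\{\mathrm{Var}(X) + (n-1)\,\mathrm{Var}\left[\mu(\theta)\right]\right\} \\
&= \frac{\mathrm{Var}(X) - \mathrm{Var}\left[\mu(\theta)\right]}{n} + \mathrm{Var}\left[\mu(\theta)\right]
\end{aligned}$$

By the law of total variance, $\mathrm{Var}(X) = E[\mathrm{Var}(X \mid \theta)] + \mathrm{Var}[\mu(\theta)]$, so

$$\mathrm{Var}(X) - \mathrm{Var}\left[\mu(\theta)\right] = E\left[\mathrm{Var}(X \mid \theta)\right]$$

$$\mathrm{Var}(\bar{X}) = \mathrm{Var}\left[\mu(\theta)\right] + \frac{1}{n}E\left[\mathrm{Var}(X \mid \theta)\right]$$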

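Finally, we have:

$$Z = \frac{\mathrm{Cov}(\bar{X}, X_{n+1})}{\mathrm{Var}(\bar{X})} = \frac{\mathrm{Var}\left[\mu(\theta)\right]}{\mathrm{Var}(\bar{X})} = \frac{\mathrm{Var}\left[\mu(\theta)\right]}{\mathrm{Var}\left[\mu(\theta)\right] + \frac{1}{n}E\left[\mathrm{Var}(X \mid \theta)\right]} = \frac{n}{n + \dfrac{E\left[\mathrm{Var}(X \mid \theta)\right]}{\mathrm{Var}\left[\mu(\theta)\right]}}$$

Let $k = \dfrac{E\left[\mathrm{Var}(X \mid \theta)\right]}{\mathrm{Var}\left[\mu(\theta)\right]}$. Then $Z = \dfrac{\mathrm{Var}\left[\mu(\theta)\right]}{\mathrm{Var}(\bar{X})} = \dfrac{n}{n+k}$.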

Next, we need to find $a = E(X_{n+1}) - Z\,E(\bar{X})$. Remember, $X_1, X_2, \dots, X_n$, though not independent, have a common mean $E(X) = \mu$ and a common variance $\mathrm{Var}(X)$.

$$E(\bar{X}) = E\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{n}E(X_1 + X_2 + \dots + X_n) = \frac{1}{n}(n\mu) = \mu$$

$$E(X_{n+1}) = \mu$$

$$a = E(X_{n+1}) - Z\,E(\bar{X}) = \mu - Z\mu = (1 - Z)\mu$$

$$a + Z\bar{X} = (1 - Z)\mu + Z\bar{X} = Z\bar{X} + (1 - Z)\mu, \quad \text{where } Z = \frac{n}{n+k}$$

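Summary of how to derive the Bühlmann credibility premium formulas

$$Z = \frac{\mathrm{Cov}(\bar{X}, X_{n+1})}{\mathrm{Var}(\bar{X})}, \qquad a = (1 - Z)\mu$$

$$\mathrm{Cov}(\bar{X}, X_{n+1}) = \mathrm{Cov}(X_i, X_j) = \mathrm{Var}\left[\mu(\theta)\right] = VE, \quad \text{where } i \ne j$$

$$\mathrm{Var}(\bar{X}) = \frac{\mathrm{Var}(X_1 + X_2 + \dots + X_n)}{n^2}$$

$$\begin{aligned}
\mathrm{Var}(X_1 + X_2 + \dots + X_n) &= n\,\mathrm{Var}(X) + n(n-1)\,\mathrm{Var}\left[\mu(\theta)\right] \\
&= n\left\{\mathrm{Var}(X) - \mathrm{Var}\left[\mu(\theta)\right]\right\} + n^2\,\mathrm{Var}\left[\mu(\theta)\right] \\
&= n\,E\left[\mathrm{Var}(X \mid \theta)\right] + n^2\,\mathrm{Var}\left[\mu(\theta)\right]
\end{aligned}$$

$$\mathrm{Var}(\bar{X}) = \frac{\mathrm{Var}(X_1 + X_2 + \dots + X_n)}{n^2} = \mathrm{Var}\left[\mu(\theta)\right] + \frac{1}{n}E\left[\mathrm{Var}(X \mid \theta)\right] = VE + \frac{1}{n}EV$$

$$Z = \frac{\mathrm{Cov}(\bar{X}, X_{n+1})}{\mathrm{Var}(\bar{X})} = \frac{\mathrm{Var}\left[\mu(\theta)\right]}{\mathrm{Var}(\bar{X})} = \frac{\mathrm{Var}\left[\mu(\theta)\right]}{\frac{1}{n}\left\{E\left[\mathrm{Var}(X \mid \theta)\right] + n\,\mathrm{Var}\left[\mu(\theta)\right]\right\}} = \frac{n}{n + \dfrac{E\left[\mathrm{Var}(X \mid \theta)\right]}{\mathrm{Var}\left[\mu(\theta)\right]}} = \frac{n}{n+k}$$

Or

$$Z = \frac{\mathrm{Cov}(\bar{X}, X_{n+1})}{\mathrm{Var}(\bar{X})} = \frac{\mathrm{Var}\left[\mu(\theta)\right]}{\mathrm{Var}(\bar{X})} = \frac{VE}{VE + \frac{1}{n}EV} = \frac{n}{n + \dfrac{EV}{VE}}$$

$$P = a + Z\bar{X} = (1 - Z)\mu + Z\bar{X}$$

$$\underbrace{P}_{\text{renewal premium}} = Z \underbrace{\bar{X}}_{\text{risk-specific sample mean}} + (1 - Z)\underbrace{\mu}_{\text{global mean}}$$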

Here $P$ is the renewal premium rate during year $n+1$ for a policyholder whose sub-risk $\theta$ is unknown to us. $\bar{X}$ is the sample mean of the claims incurred by the same policyholder (hence the same sub-risk class) during years $1, 2, \dots, n$. $\mu$ is the mean claim cost of all the sub-risks combined.

If we apply this formula to set the renewal premium rate for Adam for Year 3, then the formula becomes:

$$\underbrace{P^{\text{Adam}}}_{\text{renewal premium}} = Z \underbrace{\bar{X}^{\text{Adam}}}_{\text{risk-specific sample mean}} + (1 - Z)\underbrace{\mu^{\text{Adam, Bob, Colleen}}}_{\text{global mean}}$$

At first, the above formula may seem counter-intuitive. If we are interested only in Adam's claim cost in Year 3, why not set Adam's renewal premium for Year 3 equal to his prior two-year average claim $\bar{X}$ (so $P = \bar{X}$)? Why do we need to drag in $\mu$, the global average, which includes the claim costs incurred by Bob and Colleen?

Actually, it's a blessing that the renewal premium formula includes $\mu$. $\bar{X}$ varies widely based on your sample size. However, the state insurance departments generally want the renewal premium to be stable and responsive to the past claim data. If your renewal premium $P$ is set to $\bar{X}$, then $P$ will fluctuate wildly depending on the sample size. Then you'll have a difficult time getting your renewal rates approved by state insurance departments.

In addition, you may have $P = \bar{X} = 0$; this is the case for Adam. You'd be providing free insurance to a policyholder who has not incurred any claim yet. This certainly doesn't make any sense.

At the same time, $P$ is still responsive to $\bar{X}$. Since $\bar{X}^{\text{Adam}} < \bar{X}^{\text{Bob}}$, the renewal premium formula $P = (1 - Z)\mu + Z\bar{X}$ will produce $P^{\text{Adam}} < P^{\text{Bob}}$.

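There are other ways to derive the Bühlmann credibility formula. For example, instead of minimizing $E\left[(a + Z\bar{X} - X_{n+1})^2\right]$, we can minimize

$$E\left\{\left[a + Z\bar{X} - \mu(\theta)\right]^2\right\}$$

Here $\mu(\theta) = E(X \mid \theta)$ is the conditional mean claim cost of sub-class $\theta$. For example, with four sub-classes, $\mu(\theta)$ has four possible values:

$$E(X \mid SP), \quad E(X \mid PF), \quad E(X \mid STD), \quad \text{and} \quad E(X \mid SUB)$$

The idea behind $E\left\{\left[a + Z\bar{X} - \mu(\theta)\right]^2\right\}$ is this. If we know that a policyholder belongs to sub-risk $\theta$, then we can set our renewal premium for year $n+1$ equal to his conditional mean claim cost $\mu(\theta) = E(X_{n+1} \mid \theta) = E(X_1 \mid \theta) = E(X_2 \mid \theta) = \dots = E(X_n \mid \theta)$. However, we don't know $\theta$. As a result, we list all the possible values of $\mu(\theta)$ and find the least mean squared error estimator of $\mu(\theta)$ by minimizing $E\left\{\left[a + Z\bar{X} - \mu(\theta)\right]^2\right\}$.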

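$$Z = \frac{\mathrm{Cov}\left[\bar{X}, \mu(\theta)\right]}{\mathrm{Var}(\bar{X})}$$

$$\begin{aligned}
\mathrm{Cov}\left[\bar{X}, \mu(\theta)\right] &= \mathrm{Cov}\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n),\ \mu(\theta)\right] = \frac{1}{n}\mathrm{Cov}\left[(X_1 + X_2 + \dots + X_n),\ \mu(\theta)\right] \\
&= \frac{1}{n}\left\{\mathrm{Cov}\left[X_1, \mu(\theta)\right] + \mathrm{Cov}\left[X_2, \mu(\theta)\right] + \dots + \mathrm{Cov}\left[X_n, \mu(\theta)\right]\right\}
\end{aligned}$$

For $i = 1, 2, \dots, n$, we have:

$$\mathrm{Cov}\left[X_i, \mu(\theta)\right] = E\left[X_i\,\mu(\theta)\right] - E(X_i)\,E\left[\mu(\theta)\right]$$

$$E\left[X_i\,\mu(\theta)\right] = E_\theta\left\{E\left[X_i\,\mu(\theta) \mid \theta\right]\right\}$$

For a fixed $\theta$, $\mu(\theta)$ is a constant. Hence $E\left[X_i\,\mu(\theta) \mid \theta\right] = \mu(\theta)\,E(X_i \mid \theta) = \left[\mu(\theta)\right]^2$.

$$E\left[X_i\,\mu(\theta)\right] = E_\theta\left\{E\left[X_i\,\mu(\theta) \mid \theta\right]\right\} = E\left\{\left[\mu(\theta)\right]^2\right\}$$

$$E(X_i)\,E\left[\mu(\theta)\right] = E_\theta\left[E(X_i \mid \theta)\right]E\left[\mu(\theta)\right] = \left\{E\left[\mu(\theta)\right]\right\}^2$$

$$\mathrm{Cov}\left[X_i, \mu(\theta)\right] = E\left\{\left[\mu(\theta)\right]^2\right\} - \left\{E\left[\mu(\theta)\right]\right\}^2 = \mathrm{Var}\left[\mu(\theta)\right]$$

$$\mathrm{Cov}\left[\bar{X}, \mu(\theta)\right] = \frac{1}{n}\left\{\mathrm{Cov}\left[X_1, \mu(\theta)\right] + \mathrm{Cov}\left[X_2, \mu(\theta)\right] + \dots + \mathrm{Cov}\left[X_n, \mu(\theta)\right]\right\} = \mathrm{Var}\left[\mu(\theta)\right]$$

$\mathrm{Var}(\bar{X})$ is the same whether $E\left[(a + Z\bar{X} - X_{n+1})^2\right]$ or $E\left\{\left[a + Z\bar{X} - \mu(\theta)\right]^2\right\}$ is to be minimized:

$$\mathrm{Var}(\bar{X}) = \frac{1}{n}\left\{E\left[\mathrm{Var}(X \mid \theta)\right] + n\,\mathrm{Var}\left[\mu(\theta)\right]\right\}$$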

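Once again, we get:

$$Z = \frac{\mathrm{Cov}\left[\bar{X}, \mu(\theta)\right]}{\mathrm{Var}(\bar{X})} = \frac{\mathrm{Var}\left[\mu(\theta)\right]}{\mathrm{Var}(\bar{X})} = \frac{n}{n + \dfrac{E\left[\mathrm{Var}(X \mid \theta)\right]}{\mathrm{Var}\left[\mu(\theta)\right]}} = \frac{n}{n+k}$$

$$a = E\left[\mu(\theta)\right] - Z\,E(\bar{X}) = \mu - Z\mu = (1 - Z)\mu$$

$$a + Z\bar{X} = (1 - Z)\mu + Z\bar{X} = Z\bar{X} + (1 - Z)\mu$$

In addition to minimizing $E\left[(a + Z\bar{X} - X_{n+1})^2\right]$ or $E\left\{\left[a + Z\bar{X} - \mu(\theta)\right]^2\right\}$, we can minimize $E\left\{\left[a + Z\bar{X} - E(X_{n+1} \mid X_1, X_2, \dots, X_n)\right]^2\right\}$.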

Here $X_{n+1} \mid X_1, X_2, \dots, X_n$ represents the claim cost in year $n+1$ of the policyholder who incurred claims $X_1, X_2, \dots, X_n$ in years $1, 2, \dots, n$. The notation $X_{n+1} \mid X_1, X_2, \dots, X_n$ emphasizes that the claim amounts $X_1, X_2, \dots, X_n, X_{n+1}$ are from the same sub-class $\theta$. This condition must hold for the Bühlmann credibility formula to be valid. For example, if $X_{n+1}$ comes from sub-class $\theta_1$ and $X_1, X_2, \dots, X_n$ from sub-class $\theta_2$, then the Bühlmann credibility formula will not hold.

However, the requirement that the claim amounts $X_1, X_2, \dots, X_n, X_{n+1}$ are from the same sub-class $\theta$ shouldn't bother us at all. At the very beginning when we presented the Bühlmann credibility formula, we already used $X_1, X_2, \dots, X_n, X_{n+1}$ to refer to the claims incurred by the same policyholder whose sub-risk is $\theta$. As a result,

$$E(X_{n+1} \mid X_1, X_2, \dots, X_n) = E(X \mid \theta) = \mu(\theta)$$

So $E\left\{\left[a + Z\bar{X} - E(X_{n+1} \mid X_1, X_2, \dots, X_n)\right]^2\right\} = E\left\{\left[a + Z\bar{X} - \mu(\theta)\right]^2\right\}$.

Key Points

We can derive the Bühlmann credibility formula by minimizing any of the following three terms:

$$E\left[(a + Z\bar{X} - X_{n+1})^2\right], \quad E\left\{\left[a + Z\bar{X} - \mu(\theta)\right]^2\right\}, \quad E\left\{\left[a + Z\bar{X} - E(X_{n+1} \mid X_1, X_2, \dots, X_n)\right]^2\right\}$$

The Bühlmann credibility premium is the least squares linear estimator of any of the following three terms:

$X_{n+1}$, the claim amount in year $n+1$ incurred by the policyholder who has incurred $X_1, X_2, \dots, X_n$ claims in years $1, 2, \dots, n$.

$\mu(\theta)$, the mean claim amount of the sub-class that has generated $X_1, X_2, \dots, X_n$.

$E(X_{n+1} \mid X_1, X_2, \dots, X_n)$, the expected claim amount in year $n+1$ given we have observed that the same policyholder has $X_1, X_2, \dots, X_n$ claim costs in years $1, 2, \dots, n$ respectively.

Even though we have derived the Bühlmann credibility formula assuming $X$ is the claim cost, the formula works if $X$ is any other quantity such as the loss ratio, the aggregate loss amount, or the number of claims.

The Bühlmann credibility formula is popular due to its simplicity. The renewal premium is the weighted average of the uniform group rate and the sample mean of the past claims. The renewal premium is easy to calculate and easy to explain to clients.

In contrast, Bayesian premiums (the posterior means) are often difficult to calculate, requiring knowledge of prior distributions and involving complex integrations.

Next, let's derive a special case of the Bühlmann credibility formula. This special case is presented in Loss Models.

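Special case

If $E(X_i) = \mu$, $\mathrm{Var}(X_i) = \sigma^2$, and for $i \ne j$ $\mathrm{Cov}(X_i, X_j) = \rho\sigma^2$, where the correlation coefficient $\rho$ satisfies $-1 < \rho < 1$, determine the Bühlmann credibility premium.

$$Z = \frac{\mathrm{Cov}(\bar{X}, X_{n+1})}{\mathrm{Var}(\bar{X})}, \qquad \mathrm{Cov}(\bar{X}, X_{n+1}) = \mathrm{Var}\left[\mu(\theta)\right] = \mathrm{Cov}(X_i, X_j) = \rho\sigma^2$$

$$\mathrm{Var}(\bar{X}) = \frac{\mathrm{Var}(X_1 + X_2 + \dots + X_n)}{n^2} = \frac{n\sigma^2 + n(n-1)\rho\sigma^2}{n^2}$$

$$Z = \frac{\mathrm{Cov}(\bar{X}, X_{n+1})}{\mathrm{Var}(\bar{X})} = \frac{\rho\sigma^2}{\dfrac{1}{n^2}\left[n\sigma^2 + n(n-1)\rho\sigma^2\right]} = \frac{n\rho}{1 + (n-1)\rho}$$

$$a = (1 - Z)\mu = \left[1 - \frac{n\rho}{1 + (n-1)\rho}\right]\mu = \frac{(1 - \rho)\mu}{1 + (n-1)\rho}$$

$$Z\bar{X} + (1 - Z)\mu = \frac{n\rho}{1 + (n-1)\rho}\bar{X} + \frac{(1 - \rho)\mu}{1 + (n-1)\rho} = \frac{\rho}{1 - \rho + n\rho}\sum_{i=1}^{n} X_i + \frac{(1 - \rho)\mu}{1 - \rho + n\rho}$$

You don't need to memorize the Bühlmann credibility premium formula for this special case. If you understand how to derive the general Bühlmann credibility premium formula, you can derive the special case formula any time by setting $\mathrm{Cov}(X_i, X_j) = \rho\sigma^2$.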
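As a quick sanity check, the sketch below compares the special-case factor $Z = \frac{n\rho}{1 + (n-1)\rho}$ with the general $Z = \frac{n}{n+k}$, using $VE = \rho\sigma^2$ and $EV = \sigma^2 - \rho\sigma^2$. The function names and the numbers ($\sigma^2 = 4$, $\rho = 0.25$, $n = 3$) are made up for illustration.

```python
def special_case_z(n, rho):
    """Credibility factor when every pair of years has covariance rho * sigma^2."""
    return n * rho / (1 + (n - 1) * rho)

def general_z(n, ev, ve):
    """Credibility factor Z = n / (n + k) with k = EV / VE."""
    k = ev / ve
    return n / (n + k)

# Hypothetical inputs
sigma2, rho, n = 4.0, 0.25, 3
ve = rho * sigma2      # VE = Cov(X_i, X_j)
ev = sigma2 - ve       # EV = Var(X) - VE

print(special_case_z(n, rho), general_z(n, ev, ve))
```

Both routes should give the same credibility factor, which is just the general derivation specialized to $\mathrm{Cov}(X_i, X_j) = \rho\sigma^2$.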

Next, let's turn our attention to how to solve Bühlmann credibility problems on the exam.

How to tackle Bühlmann credibility problems

Step 1 Divide the policyholders into sub-classes $\theta_1, \theta_2, \theta_3, \dots$

Step 2 For each sub-class $\theta$, calculate the average claim cost (or loss ratio, aggregate claim, etc.) $\mu(\theta) = E(X \mid \theta)$; calculate the variance of the claim cost $\mathrm{Var}(X \mid \theta)$.

Step 3 Calculate $EV = E\left[\mathrm{Var}(X \mid \theta)\right]$, the average variance for all sub-classes combined. Calculate $VE = \mathrm{Var}\left[E(X \mid \theta)\right]$, the variance of the average claim for all sub-classes combined.

Step 4 Calculate $k = \dfrac{EV}{VE}$, $Z = \dfrac{n}{n+k}$.

Step 5 Calculate $\mu = E\left[E(X \mid \theta)\right]$, the average claim cost for all sub-classes combined. This is the uniform group premium rate you would charge under the classical theory of insurance.

Step 6 Calculate the sample mean of the past claims data $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$.

Step 7 Calculate the Bühlmann credibility premium $P = Z\bar{X} + (1 - Z)\mu$, the weighted average of the sample mean and the uniform group rate.
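The steps above can be sketched as a small Python helper for the case of a finite number of sub-classes. The function name, its argument layout, and the sample numbers are assumptions made for illustration; each sub-class is described by its probability, conditional mean, and conditional variance.

```python
def buhlmann_premium(classes, observed):
    """classes: list of (probability, conditional mean, conditional variance) per sub-class.
    observed: past claims X_1..X_n of one policyholder.
    Returns (k, Z, P)."""
    mu = sum(p * m for p, m, _ in classes)                 # Step 5: global mean
    ev = sum(p * v for p, _, v in classes)                 # Step 3: EV
    ve = sum(p * m * m for p, m, _ in classes) - mu ** 2   # Step 3: VE
    n = len(observed)
    k = ev / ve                                            # Step 4
    z = n / (n + k)
    x_bar = sum(observed) / n                              # Step 6
    return k, z, z * x_bar + (1 - z) * mu                  # Step 7

# Hypothetical two-class portfolio with two years of claims
k, z, premium = buhlmann_premium([(0.6, 100.0, 400.0), (0.4, 300.0, 900.0)], [150.0, 250.0])
print(k, z, premium)
```

The premium always lands between the sample mean and the global mean, moving toward the sample mean as $n$ grows.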

Amount of claim    Probability of claim amount for Risk 1    Probability of claim amount for Risk 2
250                0.5                                       0.7
2,500              0.3                                       0.2
60,000             0.2                                       0.1

A claim of 250 is observed.

Determine the Bühlmann credibility estimate of the second claim amount from the same risk.

Solution

This is a typical problem for Exam C. Here policyholders are from two risk classes. Even though the problem doesn't say that Risk 1 and Risk 2 are two sub-risks of a similar bigger risk group (i.e. a homogeneous group), we should assume so. Otherwise, the Bühlmann credibility formula won't work. Remember, the Bühlmann credibility premium is the weighted average of the uniform group rate $\mu$ and the risk-specific sample mean $\bar{X}$. If Risk 1 and Risk 2 are not sub-risks of a homogeneous group, then the uniform group rate $\mu$ doesn't exist; we have no way of calculating $Z\bar{X} + (1 - Z)\mu$.

The problem says that a claim of 250 is observed. This means that a policyholder of an unknown sub-class has incurred a claim of $X_1 = 250$. Since Risk 1 is twice as likely as Risk 2, the claim of 250 has a $\frac{2}{3}$ chance of coming from Risk 1 and a $\frac{1}{3}$ chance of coming from Risk 2. The question asks us to estimate the next claim amount $X_2$ incurred by the same policyholder.

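Amount of claim    Probability of claim amount for Risk 1    Probability of claim amount for Risk 2
250                0.5                                       0.7
2,500              0.3                                       0.2
60,000             0.2                                       0.1

$$E(X \mid \text{risk 1}) = 250(0.5) + 2{,}500(0.3) + 60{,}000(0.2) = 12{,}875$$

$$E(X \mid \text{risk 2}) = 250(0.7) + 2{,}500(0.2) + 60{,}000(0.1) = 6{,}675$$

$$\mu = E(X) = P(\text{risk 1})\,E(X \mid \text{risk 1}) + P(\text{risk 2})\,E(X \mid \text{risk 2}) = \frac{2}{3}(12{,}875) + \frac{1}{3}(6{,}675) = 10{,}808.33$$

$$VE = \frac{2}{3}(12{,}875)^2 + \frac{1}{3}(6{,}675)^2 - 10{,}808.33^2 = 8{,}542{,}222.22$$

$$E(X^2 \mid \text{risk 1}) = 250^2(0.5) + 2{,}500^2(0.3) + 60{,}000^2(0.2) = 721{,}906{,}250$$

$$E(X^2 \mid \text{risk 2}) = 250^2(0.7) + 2{,}500^2(0.2) + 60{,}000^2(0.1) = 361{,}293{,}750$$

$$\mathrm{Var}(X \mid \text{risk 1}) = E(X^2 \mid \text{risk 1}) - E^2(X \mid \text{risk 1}) = 721{,}906{,}250 - 12{,}875^2 = 556{,}140{,}625$$

$$\mathrm{Var}(X \mid \text{risk 2}) = E(X^2 \mid \text{risk 2}) - E^2(X \mid \text{risk 2}) = 361{,}293{,}750 - 6{,}675^2 = 316{,}738{,}125$$

$$EV = \frac{2}{3}(556{,}140{,}625) + \frac{1}{3}(316{,}738{,}125) = 476{,}339{,}791.67$$

$$k = \frac{EV}{VE} = \frac{476{,}339{,}791.67}{8{,}542{,}222.22} = 55.76$$

$$Z = \frac{n}{n+k} = \frac{1}{1 + 55.76} = 1.76\%$$

$$P = Z\bar{X} + (1 - Z)\mu = 1.76\%\,(250) + (1 - 1.76\%)(10{,}808.33) \approx 10{,}622$$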
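If you'd like to re-run the arithmetic for this problem, here is a minimal Python sketch. It takes the severity table and the 2/3 and 1/3 class weights from the solution above; the variable names are my own.

```python
amounts = [250, 2_500, 60_000]
risks = {
    "risk 1": {"prior": 2 / 3, "probs": [0.5, 0.3, 0.2]},
    "risk 2": {"prior": 1 / 3, "probs": [0.7, 0.2, 0.1]},
}

def conditional_moments(probs):
    """Mean and variance of the claim amount for one risk class."""
    mean = sum(p * x for p, x in zip(probs, amounts))
    second = sum(p * x * x for p, x in zip(probs, amounts))
    return mean, second - mean ** 2

mu = ev = second_moment_of_means = 0.0
for risk in risks.values():
    mean, var = conditional_moments(risk["probs"])
    mu += risk["prior"] * mean                        # global mean
    ev += risk["prior"] * var                         # EV
    second_moment_of_means += risk["prior"] * mean ** 2
ve = second_moment_of_means - mu ** 2                 # VE

k = ev / ve
z = 1 / (1 + k)                                       # one observed claim, n = 1
premium = z * 250 + (1 - z) * mu
print(round(k, 2), round(z, 4), round(premium, 2))
```

With so little credibility given to a single claim, the estimate stays close to the global mean.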

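Please note that the Bühlmann credibility premium depends only on the sample mean $\bar{X} = \frac{1}{n}(X_1 + X_2 + \dots + X_n)$, not the individual claims data $X_1, X_2, \dots, X_n$. Any two claim histories with the same sample mean, such as $(X_1, X_2, X_3) = (2, 2, 5)$ with $\bar{X} = 3$, will produce the same renewal premium $P = Z\bar{X} + (1 - Z)\mu = 3Z + (1 - Z)\mu$.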

Shortcut

We can rewrite the Bühlmann credibility premium formula as

$$P = Z\bar{X} + (1 - Z)\mu = \frac{n}{n+k}\bar{X} + \frac{k}{n+k}\mu = \frac{k\mu + n\bar{X}}{n+k} = \frac{k\mu + \sum_{i=1}^{n} X_i}{n+k}$$

We can interpret $k = \dfrac{EV}{VE}$ as the number of samples taken out of the global mean $\mu$. Imagine we have two urns, A and B. A contains an infinite number of identical balls with each ball marked with the number $\mu$. B contains an infinite number of identical balls with each ball marked with the number $\bar{X}$. You take out $k$ balls from Urn A and $n$ balls from Urn B.

Then the average value per ball is:

$$P = \frac{k\mu + n\bar{X}}{n+k} = \frac{k\mu + \sum_{i=1}^{n} X_i}{n+k}$$

[Figure: Urn A, from which $k$ balls marked $\mu$ are drawn, and Urn B, from which $n$ balls marked $\bar{X}$ are drawn.]

Practice problems

Q1 You are an actuary on group health insurance pricing. You want to use the Bühlmann credibility premium formula $P = Z\bar{X} + (1 - Z)\mu$ to set the renewal premium rate for a policy. One day the vice president of your company stops by. He has a Ph.D. degree in statistics and is widely regarded as an expert on the central limit theorem. He asks you to throw the formula $P = Z\bar{X} + (1 - Z)\mu$ into the trash can and focus on $\mu$.

"All we care about is $\mu$. As long as we charge each policyholder $\mu$, we'll be okay," the vice president says. "The fundamental concept of insurance is that many people form a group to share the risk. If we charge $\mu$, the law of large numbers will work its magic and we'll be able to collect enough premiums to pay our guarantees."

Comment on the vice president's remarks.

Solution

If the policyholders truly form a homogeneous group, the insurer can set the premium rate equal to the average claim cost $\mu = E(X)$. Some policyholders will suffer losses greater than $E(X)$, while others will suffer losses less than $E(X)$. However, on average, insurance companies will have collected just enough premiums to offset the loss. As long as each policyholder pays $\mu$, the insurer will be solvent.

In reality, however, members of a seemingly homogeneous risk group are really different risks. Policyholders of different risks can shop around and compare premium rates. If any policyholder believes that his premium is too high, he can terminate his policy and buy cheaper insurance elsewhere.

If an insurer charges $\mu$ to similar yet different risks, good risks will stop doing business with the insurer and buy cheaper insurance elsewhere; only bad risks will remain in the insurer's book of business. As more and more good risks leave the insurer's book of business, the actual expected claim cost will exceed the original average premium rate $\mu$. Then the insurer has to increase $\mu$, causing more policyholders to terminate their policies. Gradually, the insurer's customer base will shrink and the insurer will go bankrupt.

Q2 Compare and contrast the classical theory of insurance and the credibility theory of insurance.

Solution

The classical theory vs. the credibility theory:

Is there a homogeneous group?
Classical theory of insurance: Yes. This is the foundation of insurance. Identical risks form a homogeneous group to share risks.
Credibility theory: Each member of a seemingly homogeneous group belongs to a sub-class. The insurer doesn't know who belongs to which sub-class.

Are claim random variables $X$ of different members of a group independent identically distributed?
Classical theory of insurance: Yes. Since each member of a homogeneous group has identical risk, each member's claim random variable is independent identically distributed at all times.
Credibility theory: No. Since members of a similar risk group are actually of different sub-risk classes, only claims incurred by the same sub-class are independent identically distributed.

What's the fair premium rate?
Classical theory of insurance: The fair premium is $E(X) = \mu$, where $X$ is the random loss variable of any policyholder in the homogeneous group. Every member of a homogeneous group needs to pay $\mu$, the uniform group pure premium rate.
Credibility theory: The fair premium is $E(X \mid \Theta = \theta) = \mu(\theta)$, which is the mean claim cost of the sub-class $\theta$. Every member of the same sub-class $\theta$ needs to pay $\mu(\theta)$.

Q3 One day you visited your college statistics professor. He asked what you were doing in your job. You told him that you used the Bühlmann credibility premium formula to set the renewal premium for group health insurance policies. The Bühlmann credibility theory was new to the professor. After listening to your explanation of the formula $P = Z\bar{X} + (1 - Z)\mu$, he looked puzzled. He told you that for 20 years he had been telling his students that $\bar{X}$ is the unbiased estimator of $E(X)$. "I don't get it. Why don't you just set $P = \bar{X}$?"

Solution

Your stats professor is perfectly correct in saying that the sample mean is an unbiased estimator of the population mean. If the number of observations $n$ is large (so we have observed $X_1, X_2, \dots, X_n$ claims), for any policyholder, setting his renewal premium equal to his prior average claim is a good idea.

In reality, however, it's hard to implement the idea $P = \bar{X}$. Often you, as an insurer, have to set the renewal premium with limited data (so $n$ may be small). For a small $n$, $\bar{X}$ may not be a good estimate of $E(X)$. In addition, we may have a weird situation where $\bar{X} = 0$. In our taxi driver insurance example, if you use $P = \bar{X}$ to set the renewal premium for Adam, you'll get $P = 0$. This clearly doesn't make any sense.

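Q4 For each policyholder, losses $X_1, X_2, \dots, X_n$, conditional on $\Theta$, are independent identically distributed with mean

$$\mu(\theta) = E(X_j \mid \Theta = \theta), \quad j = 1, 2, \dots, n$$

and variance

$$v(\theta) = \mathrm{Var}(X_j \mid \Theta = \theta), \quad j = 1, 2, \dots, n$$

The Bühlmann credibility factor assigned for estimating $X_5$ based on $X_1, X_2, X_3, X_4$ is $Z = 0.4$.

Solution

$$Z = \frac{\mathrm{Cov}(\bar{X}, X_{n+1})}{\mathrm{Var}(\bar{X})} = \frac{\mathrm{Var}\left[\mu(\theta)\right]}{\mathrm{Var}(\bar{X})} = \frac{VE}{VE + \frac{1}{n}EV}$$

We are told that $n = 4$ (we have four years of claim data), $Z = 0.4$, and $EV = 8$.

$$0.4 = \frac{VE}{VE + \frac{8}{4}} = \frac{VE}{VE + 2}, \qquad VE = 1.33$$

So $\mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(\bar{X}, X_{n+1}) = \mathrm{Var}\left[\mu(\theta)\right] = VE = 1.33$.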

Q5 Nov 2005 #19

For a portfolio of independent risks, the number of claims for each risk in a year follows a Poisson distribution with means given in the following table:

Class    Mean number of claims per risk    Number of risks
1        1                                 900
2        10                                90
3        20                                10

You observe $x$ claims in Year 1 for a randomly selected risk. The Bühlmann credibility estimate of the number of claims for the same risk in Year 2 is 11.983. Determine $x$.

Solution

The problem states that $x$ claims in Year 1 have been observed for a randomly selected risk. The wording "a randomly selected risk" is needed because in order for the Bühlmann credibility formula to work, the risk class must be unknown to us. If we already know the risk class, we can calculate the expected number of claims in Year 2; we don't need to estimate any more.

Please also pay attention to the wording "the Bühlmann credibility estimate of the number of claims for the same risk in Year 2 is …" In order for the Bühlmann credibility formula to work, the renewal premium (or the expected number of claims in this problem) in year $n+1$ and the prior $n$ years of claims $X_1, X_2, \dots, X_n$ must refer to the same (unknown) risk class.

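And now back to the problem. Let $Y$ represent the number of claims incurred in a year by a randomly chosen class. Since $Y \mid \theta$ is a Poisson random variable, $E(Y \mid \theta) = \mathrm{Var}(Y \mid \theta)$.

Class $\theta$    Mean number of claims per risk, $E(Y \mid \theta) = \mathrm{Var}(Y \mid \theta)$    $P(\theta)$    Number of risks
1                 1                                                                                    90%            900
2                 10                                                                                   9%             90
3                 20                                                                                   1%             10
Total                                                                                                  100%           1,000

$$\mu = E(Y) = E_\theta\left[E(Y \mid \theta)\right] = \sum P(\theta)\,E(Y \mid \theta) = 1(90\%) + 10(9\%) + 20(1\%) = 2$$

$$EV = \sum P(\theta)\,\mathrm{Var}(Y \mid \theta) = 1(90\%) + 10(9\%) + 20(1\%) = 2$$

$$VE = \mathrm{Var}\left[E(Y \mid \theta)\right] = E\left\{\left[E(Y \mid \theta)\right]^2\right\} - \left\{E\left[E(Y \mid \theta)\right]\right\}^2 = 1^2(90\%) + 10^2(9\%) + 20^2(1\%) - 2^2 = 9.9$$

Alternatively,

$$VE = \mathrm{Var}\left[E(Y \mid \theta)\right] = E\left\{\left[E(Y \mid \theta) - E(Y)\right]^2\right\} = (1-2)^2(90\%) + (10-2)^2(9\%) + (20-2)^2(1\%) = 9.9$$

$$k = \frac{EV}{VE} = \frac{2}{9.9}$$

$$P = \frac{n\bar{Y} + k\mu}{n+k} = \frac{\sum_{i=1}^{n} Y_i + k\mu}{n+k} = \frac{x + \frac{2}{9.9}(2)}{1 + \frac{2}{9.9}} = 11.983, \qquad x = 14$$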
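Solving backwards for $x$ is easy to get wrong under exam pressure, so here is a minimal sketch of the rearrangement $\sum Y_i = P(n+k) - k\mu$. The function name is hypothetical; the inputs $\mu = 2$, $EV = 2$, $VE = 9.9$ and $P = 11.983$ are the values from this problem.

```python
def solve_first_year_claims(estimate, mu, ev, ve, n=1):
    """Invert P = (sum_of_claims + k*mu) / (n + k) for the observed claim total."""
    k = ev / ve
    return estimate * (n + k) - k * mu

x = solve_first_year_claims(11.983, mu=2, ev=2, ve=9.9)
print(round(x))
```

Because the stated estimate 11.983 is rounded, the unrounded result is only approximately an integer.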

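Q6 Nov 2005 #7

For a portfolio of policies, you are given:

The annual claim amount on a policy has probability density function

$$f(x \mid \theta) = \frac{2x}{\theta^2}, \quad 0 < x < \theta$$

The prior distribution of $\theta$ has density function

$$\pi(\theta) = 4\theta^3, \quad 0 < \theta < 1$$

A randomly selected policy had a claim amount of 0.1 in Year 1.

Determine the Bühlmann credibility estimate of the claim amount for the selected risk in Year 2.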

Solution

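$$E(X \mid \theta) = \int_0^\theta x f(x \mid \theta)\,dx = \int_0^\theta x\,\frac{2x}{\theta^2}\,dx = \frac{2}{\theta^2}\int_0^\theta x^2\,dx = \frac{2}{\theta^2}\left[\frac{1}{3}x^3\right]_0^\theta = \frac{2}{3}\theta$$

Please note that the following calculation is wrong:

$$E(X \mid \theta) = \int x f(x \mid \theta)\,d\theta \qquad \text{Wrong!}$$

In $X \mid \theta$, $\theta$ should be treated as a constant and $d\theta = 0$. The correct calculation is to integrate $x f(x \mid \theta)$ with respect to $x$, not $\theta$.

$$\mathrm{Var}(X \mid \theta) = E(X^2 \mid \theta) - E^2(X \mid \theta)$$

$$E(X^2 \mid \theta) = \int_0^\theta x^2 f(x \mid \theta)\,dx = \int_0^\theta x^2\,\frac{2x}{\theta^2}\,dx = \frac{2}{\theta^2}\int_0^\theta x^3\,dx = \frac{2}{\theta^2}\left[\frac{1}{4}x^4\right]_0^\theta = \frac{1}{2}\theta^2$$

$$\mathrm{Var}(X \mid \theta) = E(X^2 \mid \theta) - E^2(X \mid \theta) = \frac{1}{2}\theta^2 - \left(\frac{2}{3}\theta\right)^2 = \frac{1}{18}\theta^2$$

The global mean: $\mu = E(X) = E_\theta\left[E(X \mid \theta)\right] = E\left(\frac{2}{3}\theta\right) = \frac{2}{3}E(\theta)$

The expected conditional variance: $EV = E\left[\mathrm{Var}(X \mid \theta)\right] = E\left(\frac{1}{18}\theta^2\right) = \frac{1}{18}E(\theta^2)$

The variance of the conditional mean: $VE = \mathrm{Var}\left[E(X \mid \theta)\right] = \mathrm{Var}\left(\frac{2}{3}\theta\right) = \left(\frac{2}{3}\right)^2\mathrm{Var}(\theta)$

$$E(\theta) = \int_0^1 \theta\,\pi(\theta)\,d\theta = \int_0^1 \theta\,(4\theta^3)\,d\theta = \int_0^1 4\theta^4\,d\theta = \frac{4}{5}$$

$$E(\theta^2) = \int_0^1 \theta^2\,\pi(\theta)\,d\theta = \int_0^1 \theta^2\,(4\theta^3)\,d\theta = \int_0^1 4\theta^5\,d\theta = \frac{4}{6}$$

$$\mathrm{Var}(\theta) = E(\theta^2) - E^2(\theta) = \frac{4}{6} - \left(\frac{4}{5}\right)^2 = \frac{2}{75}$$

$$EV = \frac{1}{18}E(\theta^2) = \frac{1}{18}\cdot\frac{4}{6}, \qquad VE = \left(\frac{2}{3}\right)^2\mathrm{Var}(\theta) = \left(\frac{2}{3}\right)^2\frac{2}{75}$$

$$\mu = \frac{2}{3}E(\theta) = \frac{2}{3}\cdot\frac{4}{5} = \frac{8}{15}, \qquad k = \frac{EV}{VE} = \frac{\frac{1}{18}\cdot\frac{4}{6}}{\left(\frac{2}{3}\right)^2\frac{2}{75}} = 3.125$$

The above fraction is complex. We don't want to bother expressing $k$ as a neat fraction; trying to do so is prone to errors.

$$P = \frac{k\mu + \sum_{i=1}^{n} X_i}{n+k} = \frac{k\mu + X_1}{1+k} = \frac{3.125\left(\frac{8}{15}\right) + 0.1}{1 + 3.125} = 0.428$$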
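If you do want exact fractions without the risk of arithmetic slips, Python's `fractions` module can carry them for you. This is only a checking sketch: the helper `theta_moment` is my own name, and it hard-codes $E(\theta^m) = \int_0^1 4\theta^{m+3}\,d\theta = \frac{4}{m+4}$ for the prior $\pi(\theta) = 4\theta^3$.

```python
from fractions import Fraction

def theta_moment(m):
    """E(theta^m) under the prior density 4*theta^3 on (0, 1)."""
    return Fraction(4, m + 4)

mu = Fraction(2, 3) * theta_moment(1)                       # (2/3) E(theta)
ev = Fraction(1, 18) * theta_moment(2)                      # (1/18) E(theta^2)
ve = Fraction(2, 3) ** 2 * (theta_moment(2) - theta_moment(1) ** 2)
k = ev / ve
premium = (k * mu + Fraction(1, 10)) / (1 + k)              # observed X_1 = 0.1
print(k, float(k), premium, float(premium))
```

The exact value of $k$ turns out to be a simple fraction after all, which matches the decimal 3.125 used above.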

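Alternatively, we can calculate $\mu$, $VE$, and $EV$ by integrating against the prior density $\pi(\theta)$ directly.

$$E(X \mid \theta) = \int_0^\theta x f(x \mid \theta)\,dx = \int_0^\theta x\,\frac{2x}{\theta^2}\,dx = \frac{2}{\theta^2}\int_0^\theta x^2\,dx = \frac{2}{\theta^2}\left[\frac{1}{3}x^3\right]_0^\theta = \frac{2}{3}\theta \quad \text{(as before)}$$

$$\mu = E_\theta\left[E(X \mid \theta)\right] = \int_0^1 E(X \mid \theta)\,\pi(\theta)\,d\theta = \int_0^1 \frac{2}{3}\theta\,(4\theta^3)\,d\theta = \frac{8}{3}\left[\frac{1}{5}\theta^5\right]_0^1 = \frac{8}{15}$$

$$VE = \mathrm{Var}\left[E(X \mid \theta)\right] = E\left\{\left[E(X \mid \theta)\right]^2\right\} - \left\{E\left[E(X \mid \theta)\right]\right\}^2$$

$$E\left\{\left[E(X \mid \theta)\right]^2\right\} = \int_{\theta=0}^1 \left[E(X \mid \theta)\right]^2\pi(\theta)\,d\theta = \int_{\theta=0}^1 \left(\frac{2}{3}\theta\right)^2 4\theta^3\,d\theta = \frac{16}{9}\int_{\theta=0}^1 \theta^5\,d\theta = \frac{16}{9}\cdot\frac{1}{6} = \frac{8}{27}$$

$$VE = E\left\{\left[E(X \mid \theta)\right]^2\right\} - \left\{E\left[E(X \mid \theta)\right]\right\}^2 = \frac{8}{27} - \left(\frac{8}{15}\right)^2 = 0.01185$$

$$\mathrm{Var}(X \mid \theta) = E(X^2 \mid \theta) - E^2(X \mid \theta)$$

$$E(X^2 \mid \theta) = \int_0^\theta x^2 f(x \mid \theta)\,dx = \int_0^\theta x^2\,\frac{2x}{\theta^2}\,dx = \frac{2}{\theta^2}\int_0^\theta x^3\,dx = \frac{2}{\theta^2}\left[\frac{1}{4}x^4\right]_0^\theta = \frac{1}{2}\theta^2$$

$$\mathrm{Var}(X \mid \theta) = E(X^2 \mid \theta) - E^2(X \mid \theta) = \frac{1}{2}\theta^2 - \left(\frac{2}{3}\theta\right)^2 = \frac{1}{18}\theta^2$$

$$EV = E\left[\mathrm{Var}(X \mid \theta)\right] = \int_0^1 \mathrm{Var}(X \mid \theta)\,\pi(\theta)\,d\theta = \int_0^1 \frac{1}{18}\theta^2\,(4\theta^3)\,d\theta = \frac{1}{18}\cdot\frac{4}{6}$$

$$k = \frac{EV}{VE} = \frac{\frac{1}{18}\cdot\frac{4}{6}}{0.01185} = 3.125$$

$$P = \frac{k\mu + \sum_{i=1}^{n} X_i}{n+k} = \frac{k\mu + X_1}{1+k} = \frac{3.125\left(\frac{8}{15}\right) + 0.1}{1 + 3.125} = 0.428$$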

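Q7 For a particular policy, the conditional probability of the annual number of claims given $\Theta = \theta$, and the probability distribution of $\Theta$, are as follows:

Number of claims    0            1           2
Probability         $2\theta$    $\theta$    $1 - 3\theta$

$\theta$       0.05    0.30
Probability    0.80    0.20

Two claims are observed in Year 1. Calculate the Bühlmann credibility estimate of the number of claims in Year 2.

Solution

$X$            0            1           2
Probability    $2\theta$    $\theta$    $1 - 3\theta$

$$E(X \mid \theta) = 0(2\theta) + 1(\theta) + 2(1 - 3\theta) = 2 - 5\theta$$

$$E(X^2 \mid \theta) = 0^2(2\theta) + 1^2(\theta) + 2^2(1 - 3\theta) = 4 - 11\theta$$

$$\mathrm{Var}(X \mid \theta) = 4 - 11\theta - (2 - 5\theta)^2 = 9\theta - 25\theta^2$$

$$\mu = E\left[E(X \mid \theta)\right] = E(2 - 5\theta) = 2 - 5E(\theta)$$

$$VE = \mathrm{Var}\left[E(X \mid \theta)\right] = \mathrm{Var}(2 - 5\theta) = \mathrm{Var}(5\theta) = 25\,\mathrm{Var}(\theta)$$

$$EV = E\left[\mathrm{Var}(X \mid \theta)\right] = E(9\theta - 25\theta^2) = 9E(\theta) - 25E(\theta^2)$$

$\theta$       0.05    0.30
Probability    0.80    0.20

$$E(\theta) = 0.05(0.8) + 0.3(0.2) = 0.1$$

$$E(\theta^2) = 0.05^2(0.8) + 0.3^2(0.2) = 0.02$$

$$\mathrm{Var}(\theta) = 0.02 - 0.1^2 = 0.01$$

$$\mu = 2 - 5E(\theta) = 2 - 5(0.1) = 1.5$$

$$VE = 25\,\mathrm{Var}(\theta) = 25(0.01) = 0.25$$

$$EV = 9E(\theta) - 25E(\theta^2) = 9(0.1) - 25(0.02) = 0.4$$

$$k = \frac{EV}{VE} = \frac{0.4}{0.25} = 1.6, \qquad P = \frac{k\mu + \sum_{i=1}^{n} X_i}{n+k} = \frac{k\mu + X_1}{1+k} = \frac{1.6(1.5) + 2}{1 + 1.6} = 1.69$$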

Q8 You are given:

The annual number of claims on a given policy has a geometric distribution with parameter $\beta$.

The prior distribution of $\beta$ has the Pareto density function

$$\pi(\beta) = \frac{\alpha}{(\beta + 1)^{\alpha+1}}, \quad 0 < \beta < \infty$$

A randomly selected policy had $x$ claims in Year 1.

Calculate the Bühlmann credibility estimate of the number of claims for the selected policy in Year 2.

Solution

Let $X$ represent the annual number of claims on a randomly selected policy. Here the risk factor is $\beta$. The conditional random variable $X \mid \beta$ has a geometric distribution. If you look up Tables for Exam C/4, you'll find that a geometric random variable $N$ with parameter $\beta$ has mean and variance as follows:

$$E(N) = \beta, \qquad \mathrm{Var}(N) = \beta(1 + \beta)$$

The conditional mean: $E(X \mid \beta) = \beta$

The conditional variance: $\mathrm{Var}(X \mid \beta) = \beta(1 + \beta)$

$EV = E_\beta\left[\mathrm{Var}(X \mid \beta)\right] = E_\beta\left[\beta(1 + \beta)\right] = E_\beta(\beta) + E_\beta(\beta^2)$. Typically, we write $E_\beta(\beta)$ as $E(\beta)$ and $E_\beta(\beta^2)$ as $E(\beta^2)$. So

$$EV = E_\beta\left[\mathrm{Var}(X \mid \beta)\right] = E_\beta\left[\beta(1 + \beta)\right] = E(\beta) + E(\beta^2)$$

$$VE = \mathrm{Var}_\beta\left[E(X \mid \beta)\right] = \mathrm{Var}(\beta)$$

We are told that the prior distribution of $\beta$ has the Pareto density function

$$\pi(\beta) = \frac{\alpha}{(\beta + 1)^{\alpha+1}}, \quad 0 < \beta < \infty$$

Here the phrase "prior distribution" refers to the fact that we know $\pi(\beta)$ prior to our observation of $x$ claims in Year 1. In other words, $\pi(\beta)$ hasn't incorporated our observation of $x$ claims in Year 1 yet. Please note that the prior distribution, not the posterior distribution, is used in Bühlmann's credibility estimate.

Frankly, I think SOA's emphasis that $\pi(\beta)$ is a prior (as opposed to posterior) distribution is unnecessary and really meant to scare exam candidates. When we talk about a density function, we always refer to the prior distribution. So there's never a need to say "prior distribution." If we want to refer to a distribution that has incorporated our recent observations, at that time we say "posterior distribution."

Back to the problem. We are told that $\beta$ has a Pareto distribution. Is it a one-parameter Pareto or a two-parameter Pareto? Many candidates have trouble knowing which one to use. Here is a simple rule:

Look at the range of the Pareto random variable $X$. If $X$ is greater than a positive number, then use the single-parameter Pareto. If $X$ is greater than zero, then use the two-parameter Pareto:

If $X > \theta$ for a positive constant $\theta$, then use the single-parameter Pareto $f(x) = \dfrac{\alpha\theta^\alpha}{x^{\alpha+1}}$;

If $X > 0$, then use the two-parameter Pareto $f(x) = \dfrac{\alpha\theta^\alpha}{(x + \theta)^{\alpha+1}}$.

In this problem, the Pareto random variable $\beta > 0$. So we should use the two-parameter Pareto formula in Tables for Exam C/4.

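$$E(X^k) = \frac{\theta^k\,k!}{(\alpha - 1)(\alpha - 2)\cdots(\alpha - k)}$$

$$E(X) = \frac{\theta}{\alpha - 1}, \qquad E(X^2) = \frac{2\theta^2}{(\alpha - 1)(\alpha - 2)}$$

$$\mathrm{Var}(X) = E(X^2) - E^2(X) = \frac{2\theta^2}{(\alpha - 1)(\alpha - 2)} - \left(\frac{\theta}{\alpha - 1}\right)^2 = \frac{\alpha\theta^2}{(\alpha - 1)^2(\alpha - 2)}$$

Since the two-parameter Pareto is frequently tested in Exam C, you might want to memorize the following formulas:

$$E(X) = \frac{\theta}{\alpha - 1}, \qquad E(X^2) = \frac{2\theta^2}{(\alpha - 1)(\alpha - 2)}, \qquad \mathrm{Var}(X) = \frac{\alpha\theta^2}{(\alpha - 1)^2(\alpha - 2)}$$

$\beta$ is a two-parameter Pareto random variable with pdf $\pi(\beta) = \dfrac{\alpha}{(\beta + 1)^{\alpha+1}}$. So the two parameters are $\alpha$ and $\theta = 1$.

$$E(\beta) = \frac{1}{\alpha - 1}, \qquad E(\beta^2) = \frac{2}{(\alpha - 1)(\alpha - 2)}, \qquad \mathrm{Var}(\beta) = \frac{\alpha}{(\alpha - 1)^2(\alpha - 2)}$$

$$EV = E(\beta) + E(\beta^2) = \frac{1}{\alpha - 1} + \frac{2}{(\alpha - 1)(\alpha - 2)} = \frac{\alpha}{(\alpha - 1)(\alpha - 2)}$$

$$VE = \mathrm{Var}(\beta) = \frac{\alpha}{(\alpha - 1)^2(\alpha - 2)}, \qquad \mu = E(\beta) = \frac{1}{\alpha - 1}$$

$$k = \frac{EV}{VE} = \frac{\dfrac{\alpha}{(\alpha - 1)(\alpha - 2)}}{\dfrac{\alpha}{(\alpha - 1)^2(\alpha - 2)}} = \alpha - 1$$

$$P = \frac{k\mu + \sum_{i=1}^{n} X_i}{n+k} = \frac{k\mu + X_1}{1+k} = \frac{(\alpha - 1)\dfrac{1}{\alpha - 1} + x}{1 + (\alpha - 1)} = \frac{x + 1}{\alpha}$$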
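The symbolic answers $k = \alpha - 1$ and $P = \frac{x+1}{\alpha}$ can be spot-checked with exact fractions. The function below is a hypothetical helper, and the test values ($\alpha = 4$, $x = 3$) are made up; the moment formulas require $\alpha > 2$.

```python
from fractions import Fraction

def geometric_pareto_estimate(alpha, x):
    """Buhlmann k and estimate when X|beta is geometric and beta is Pareto(alpha, theta=1)."""
    alpha = Fraction(alpha)
    e_beta = 1 / (alpha - 1)                       # E(beta)
    e_beta2 = 2 / ((alpha - 1) * (alpha - 2))      # E(beta^2)
    ev = e_beta + e_beta2                          # E[beta(1 + beta)]
    ve = e_beta2 - e_beta ** 2                     # Var(beta)
    k = ev / ve
    return k, (k * e_beta + x) / (1 + k)

k, estimate = geometric_pareto_estimate(4, 3)
print(k, estimate)   # k should equal alpha - 1 and the estimate (x + 1) / alpha
```

Trying a few other values of $\alpha$ and $x$ is a cheap way to catch an algebra mistake in the closed form.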

Q9 May 2005 #11

You are given:

The number of claims in a year for a selected risk follows a Poisson distribution with mean $\lambda$.

The severity of claims for the selected risk follows an exponential distribution with mean $\theta$.

The number of claims is independent of the severity of claims.

The prior distribution of $\lambda$ is exponential with mean 1.

The prior distribution of $\theta$ is Poisson with mean 1.

A priori, $\lambda$ and $\theta$ are independent.

Determine the Bühlmann credibility parameter $k$ for aggregate losses.

Solution

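Let $N$ represent the annual number of claims for a randomly selected risk.

Let $X$ represent the loss dollar amount per loss incident.

Let $S$ represent the aggregate annual claim dollar amount incurred by a risk.

Then $S = \sum_{i=1}^{N} X_i = X_1 + X_2 + \dots + X_N$.

$N$ is a Poisson random variable with mean $\lambda$. So $f_N(n \mid \lambda) = e^{-\lambda}\dfrac{\lambda^n}{n!}$ ($n = 0, 1, 2, \dots$).

Here $\lambda$ is an exponential random variable with pdf $f(\lambda) = e^{-\lambda}$. We have $E(\lambda) = 1$, $\mathrm{Var}(\lambda) = 1^2 = 1$, and $E(\lambda^2) = E^2(\lambda) + \mathrm{Var}(\lambda) = 1^2 + 1 = 2$.

$\theta$ is a Poisson random variable with mean 1, so $f(\theta) = e^{-1}\dfrac{1^\theta}{\theta!}$. We have $E(\theta) = \mathrm{Var}(\theta) = 1$. Hence $E(\theta^2) = E^2(\theta) + \mathrm{Var}(\theta) = 1^2 + 1 = 2$.

For the aggregate loss $S$, $E(S) = E(N)\,E(X)$ and $\mathrm{Var}(S) = E(N)\,\mathrm{Var}(X) + \mathrm{Var}(N)\,E^2(X)$. To remember that you need to use $E^2(X)$, not $E(X)$, in the $\mathrm{Var}(S)$ formula, please note that $\mathrm{Var}(S)$ is dollar squared. If you use $\mathrm{Var}(N)\,E(X)$, you'll get dollar, not dollar squared. As a result, you need to use $\mathrm{Var}(N)\,E^2(X)$.

The conditional mean is:

$$E(S \mid \lambda, \theta) = E(N \mid \lambda)\,E(X \mid \theta) = \lambda\theta$$

The conditional variance is:

$$\mathrm{Var}(S \mid \lambda, \theta) = E(N \mid \lambda)\,\mathrm{Var}(X \mid \theta) + \mathrm{Var}(N \mid \lambda)\,E^2(X \mid \theta) = \lambda\theta^2 + \lambda\theta^2 = 2\lambda\theta^2$$

Here we use $\mathrm{Var}(X \mid \theta) = \theta^2$, because $X \mid \theta$ is an exponential random variable with mean $\theta$.

$$EV = E_{\lambda,\theta}\left[\mathrm{Var}(S \mid \lambda, \theta)\right] = E_{\lambda,\theta}(2\lambda\theta^2)$$

Because $\lambda$ and $\theta$ are independent:

$$EV = E_{\lambda,\theta}(2\lambda\theta^2) = 2E(\lambda)\,E(\theta^2) = 2(1)(2) = 4$$

$$VE = \mathrm{Var}_{\lambda,\theta}\left[E(S \mid \lambda, \theta)\right] = \mathrm{Var}_{\lambda,\theta}(\lambda\theta) = E_{\lambda,\theta}\left[(\lambda\theta)^2\right] - \left[E_{\lambda,\theta}(\lambda\theta)\right]^2$$

$$E_{\lambda,\theta}\left[(\lambda\theta)^2\right] = E_{\lambda,\theta}(\lambda^2\theta^2) = E(\lambda^2)\,E(\theta^2) = 2(2) = 4$$

$$E_{\lambda,\theta}(\lambda\theta) = E(\lambda)\,E(\theta) = 1(1) = 1$$

$$VE = E_{\lambda,\theta}\left[(\lambda\theta)^2\right] - \left[E_{\lambda,\theta}(\lambda\theta)\right]^2 = 4 - 1^2 = 3$$

$$k = \frac{EV}{VE} = \frac{4}{3}$$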

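Q10 You are given:

Claims are conditionally independent and identically Poisson distributed with mean $\Theta$.

The prior distribution function of $\Theta$ is:

$$F(\theta) = 1 - \left(\frac{1}{1 + \theta}\right)^{2.6}, \quad \theta > 0$$

Solution

Let $X$ represent the number of claims. The risk factor is $\Theta$. We are told that $X \mid \Theta = \theta$ is a Poisson random variable with mean $\theta$.

The conditional mean is: $E(X \mid \Theta = \theta) = \theta$

The conditional variance is: $\mathrm{Var}(X \mid \Theta = \theta) = \theta$

$$EV = E\left[\mathrm{Var}(X \mid \Theta)\right] = E(\Theta), \qquad VE = \mathrm{Var}\left[E(X \mid \Theta)\right] = \mathrm{Var}(\Theta)$$

$$k = \frac{EV}{VE} = \frac{E(\Theta)}{\mathrm{Var}(\Theta)}$$

From Tables for Exam C/4, a two-parameter Pareto random variable $X$ with parameters $\alpha$ and $\theta$ has:

$$F(x) = 1 - \left(\frac{\theta}{x + \theta}\right)^\alpha$$

$$E(X) = \frac{\theta}{\alpha - 1}, \qquad E(X^2) = \frac{2\theta^2}{(\alpha - 1)(\alpha - 2)}, \qquad \mathrm{Var}(X) = \frac{\alpha\theta^2}{(\alpha - 1)^2(\alpha - 2)}$$

Here we are given that $F(\theta) = 1 - \left(\dfrac{1}{\theta + 1}\right)^{2.6}$. So $\Theta$ is a two-parameter Pareto random variable with scale parameter 1 and $\alpha = 2.6$.

So $E(\Theta) = \dfrac{1}{2.6 - 1} = \dfrac{1}{1.6}$ and $\mathrm{Var}(\Theta) = \dfrac{2.6}{(2.6 - 1)^2(2.6 - 2)} = \dfrac{2.6}{1.6^2(0.6)}$.

$$k = \frac{EV}{VE} = \frac{E(\Theta)}{\mathrm{Var}(\Theta)} = \frac{\dfrac{1}{1.6}}{\dfrac{2.6}{1.6^2(0.6)}} = \frac{1.6(0.6)}{2.6} = 0.369$$

With $n = 5$:

$$Z = \frac{n}{n+k} = \frac{5}{5 + 0.369} = 0.93$$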
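Here is a short sketch that plugs the two-parameter Pareto mean and variance formulas into $k = E(\Theta)/\mathrm{Var}(\Theta)$ and $Z = n/(n+k)$. The function name is hypothetical; the inputs $\alpha = 2.6$, scale 1, and $n = 5$ are the values used in the solution above.

```python
def pareto_mean_var(alpha, theta):
    """Mean and variance of a two-parameter Pareto (requires alpha > 2)."""
    mean = theta / (alpha - 1)
    var = alpha * theta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))
    return mean, var

mean, var = pareto_mean_var(alpha=2.6, theta=1.0)
k = mean / var          # EV / VE for a Poisson claim count with random mean
z = 5 / (5 + k)
print(round(k, 3), round(z, 3))
```

Since the conditional distribution is Poisson, $EV$ is just the prior mean and $VE$ the prior variance, which is why only the two Pareto moments are needed.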

Q11 You are given:

Claim counts follow a Poisson distribution with mean $\lambda$.

Claim sizes follow a lognormal distribution with parameters $\mu$ and $\sigma$.

Claim counts and claim amounts are independent.

The prior distribution has joint pdf:

$$f(\lambda, \mu, \sigma) = 2\sigma, \quad 0 < \lambda < 1,\ 0 < \mu < 1,\ 0 < \sigma < 1$$

Calculate Bühlmann's $k$ for aggregate losses.

Solution

Let $N$ represent the claim counts, $X_i$ the dollar amount of the $i$-th claim, and $S$ the aggregate losses. $N \mid \lambda$ has a Poisson distribution with mean $\lambda$. $X_i \mid \mu, \sigma$ has a lognormal distribution with parameters $\mu$ and $\sigma$. In addition, for $i = 1$ to $N$, the $X_i \mid \mu, \sigma$ are independent identically distributed.

$$S = \sum_{i=1}^{N} X_i$$

E ( S 0, , ) = E(N 0) E( X , ) = 0E ( X , )

Var ( S 0 , , ) = E ( N 0 )Var ( X , ) + Var ( N 0 ) E ( X 2

, )

= 0 Var ( X , ) + 0 E ( X , )

2

= 0 E ( X , )

2

From Tables for Exam C/4, we know the lognormal distribution has the following

moments:

E ( X k ) = exp k + k 2

1 2

, E ( X 2 ) = exp 2 + 22 = exp ( 2 + 2 )

1 1

E ( X ) = exp + 2 2 2

2 2

E ( S 0, , ) = 0E ( X ) = 0 exp 1

, + 2

Var ( S 0 , , ) = 0E ( X ) = 0 exp ( 2 + 2 )

2

, 2

E ( S 0, , ) 1

= E0 , , = E0 , , 0 exp + 2

1 1 1

1

= 0 exp + 2

f (0, , ) d0 d d

= 0 = 0 0 =0

2

1 1 1 1 1 1

1

= 0 exp + 2 d0 d d = e 0d 0 d d

2

2

2 e0.5

= 0 = 0 0 =0

2 =0 =0 0 =0

1 1 1

1 1

= e d d = ( e 1)

2 2

2 e0.5 2 e0.5 d

2 =0 =0 2 =0

Set 0.5 2

= y . Then d = dy .

1 0.5

= ( e 1) e0.5 d = ( e 1) e y dy = ( e 1) ( e0.5 1)

2

=0 y =0

E ( S 0, , ) E ( S 0, , ) {E E ( S 0, , )}

2 2

VE = Var 0 , , = E 0 , , 0 , ,

0 2 exp ( 2 + )

1

E 0 , , 0 exp + 2

= E 0 , , 2

1 1 1

= 0 2 exp ( 2 + 2

) f ( 0, , ) d 0 d d

= 0 = 0 0 =0

1 1 1

= 0 2 exp ( 2 + 2

)2 d0 d d

= 0 = 0 0 =0

1 1 1 1 1

2 exp ( ) 2 exp ( )

1

= 2

exp ( 2 ) 0 2d 0 d d = 2

exp ( 2 ) d d

=0 =0 0 =0 3 =0 =0

1 1

2 exp ( ) 12 ( e 1) d ( e 1) exp ( )d

1 1 2

= 2 2

= 2

3 =0

3 =0

Set 2

= y . Then 2 d = dy .

2 1 1

E 0 , , 0 exp +

1

2

2

=

3

( e 1)

1 2

exp ( 2

)d =

3

(

1 2

e 1)

1 y

2

e dy

=0 y =0

=

1 1

3 2

(e 2

1) ( e 1) = ( e 1) ( e 1)

1 2

6

{E E ( S 0, , )} = 2 = ( e 1) ( e0.5 1)

2 2 2

0 , ,

E ( S 0, , ) {E E ( S 0, , )}

2 2

VE = E 0 , , 0 , ,

( e 1) ( e 1)

1 2

( e 1) (e 1) = 0.5872

2

=

2 0.5

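\[ EV = E_{\lambda,\mu,\sigma}\left[Var(S \mid \lambda,\mu,\sigma)\right] = E_{\lambda,\mu,\sigma}\left[\lambda\exp\left(2\mu + 2\sigma^2\right)\right] = \int_{\sigma=0}^{1}\int_{\mu=0}^{1}\int_{\lambda=0}^{1} \lambda\exp\left(2\mu + 2\sigma^2\right) 2\sigma\,d\lambda\,d\mu\,d\sigma \]

\[ = \frac{1}{2}\int_{\sigma=0}^{1}\int_{\mu=0}^{1} 2\sigma e^{2\sigma^2} e^{2\mu}\,d\mu\,d\sigma = \frac{1}{2}\cdot\frac{1}{2}\left(e^2-1\right)\int_{\sigma=0}^{1} 2\sigma e^{2\sigma^2}\,d\sigma \]

\[ = \frac{1}{2}\cdot\frac{1}{2}\left(e^2-1\right)\cdot\frac{1}{2}\left(e^2-1\right) = \frac{1}{8}\left(e^2-1\right)^2 = 5.103 \]

\[ k = \frac{EV}{VE} = \frac{5.103}{0.5872} = 8.69 \]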

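Alternatively, notice that the joint pdf factors as \(f(\lambda,\mu,\sigma) = a(\lambda)\,b(\mu)\,c(\sigma)\), where \(a(\lambda) = 1\), \(b(\mu) = 1\), and \(c(\sigma) = 2\sigma\). In addition, \(\lambda\), \(\mu\), and \(\sigma\) lie in the cube \(0<\lambda<1\), \(0<\mu<1\), \(0<\sigma<1\). Consequently, \(\lambda\), \(\mu\), and \(\sigma\) are independent random variables with the following marginal pdfs:

\[ f_\lambda(\lambda) = 1,\ 0<\lambda<1; \qquad f_\mu(\mu) = 1,\ 0<\mu<1; \qquad f_\sigma(\sigma) = 2\sigma,\ 0<\sigma<1. \]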

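\[ E(S) = E_{\lambda,\mu,\sigma}\left[E(S \mid \lambda,\mu,\sigma)\right] = E_{\lambda,\mu,\sigma}\left[\lambda\exp\left(\mu + \tfrac{1}{2}\sigma^2\right)\right] = E(\lambda)\,E\left(e^{\mu}\right)E\left(e^{0.5\sigma^2}\right) \]

\[ E(\lambda) = \frac{1}{2}, \qquad E\left(e^{\mu}\right) = \int_0^1 e^{\mu}\,d\mu = e-1, \qquad E\left(e^{0.5\sigma^2}\right) = \int_0^1 e^{0.5\sigma^2}\,2\sigma\,d\sigma = 2\left[e^{0.5\sigma^2}\right]_0^1 = 2\left(e^{0.5}-1\right) \]

\[ E(S) = E(\lambda)\,E\left(e^{\mu}\right)E\left(e^{0.5\sigma^2}\right) = (e-1)\left(e^{0.5}-1\right) \]

\[ EV = E_{\lambda,\mu,\sigma}\left[Var(S \mid \lambda,\mu,\sigma)\right] = E_{\lambda,\mu,\sigma}\left[\lambda\exp\left(2\mu + 2\sigma^2\right)\right] = E(\lambda)\,E\left(e^{2\mu}\right)E\left(e^{2\sigma^2}\right) \]

\[ E\left(e^{2\mu}\right) = \int_0^1 e^{2\mu}\,d\mu = \frac{1}{2}\left[e^{2\mu}\right]_0^1 = \frac{1}{2}\left(e^2-1\right), \qquad E\left(e^{2\sigma^2}\right) = \int_0^1 e^{2\sigma^2}\,2\sigma\,d\sigma = \frac{1}{2}\left[e^{2\sigma^2}\right]_0^1 = \frac{1}{2}\left(e^2-1\right) \]

\[ EV = \frac{1}{2}\cdot\frac{1}{2}\left(e^2-1\right)\cdot\frac{1}{2}\left(e^2-1\right) = \frac{1}{8}\left(e^2-1\right)^2 = 5.103 \]

\[ VE = Var_{\lambda,\mu,\sigma}\left[E(S \mid \lambda,\mu,\sigma)\right] = E_{\lambda,\mu,\sigma}\left[E^2(S \mid \lambda,\mu,\sigma)\right] - \left\{E_{\lambda,\mu,\sigma}\left[E(S \mid \lambda,\mu,\sigma)\right]\right\}^2 \]

\[ E_{\lambda,\mu,\sigma}\left[E^2(S \mid \lambda,\mu,\sigma)\right] = E_{\lambda,\mu,\sigma}\left[\lambda^2\exp\left(2\mu + \sigma^2\right)\right] = E\left(\lambda^2\right)E\left(e^{2\mu}\right)E\left(e^{\sigma^2}\right) \]

\[ E\left(\lambda^2\right) = \int_0^1 \lambda^2\,d\lambda = \frac{1}{3}, \qquad E\left(e^{2\mu}\right) = \frac{1}{2}\left(e^2-1\right), \qquad E\left(e^{\sigma^2}\right) = \int_0^1 e^{\sigma^2}\,2\sigma\,d\sigma = \left[e^{\sigma^2}\right]_0^1 = e-1 \]

\[ VE = \frac{1}{6}\left(e^2-1\right)(e-1) - (e-1)^2\left(e^{0.5}-1\right)^2 = 0.5872 \]

\[ k = \frac{EV}{VE} = \frac{5.103}{0.5872} = 8.69 \]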
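The arithmetic above can be double-checked numerically. The following is a minimal sketch, assuming the three parameters are independent with the marginal pdfs stated above; the variable names are illustrative and do not come from the original text.

```python
import math

# Assumed priors: lambda ~ U(0,1), mu ~ U(0,1), sigma with pdf 2*sigma on (0,1).
e = math.e
E_lam, E_lam2 = 1 / 2, 1 / 3            # E[lambda], E[lambda^2]
E_exp_mu = e - 1                        # E[e^mu]
E_exp_2mu = (e**2 - 1) / 2              # E[e^(2 mu)]
E_exp_half_s2 = 2 * (math.sqrt(e) - 1)  # E[e^(0.5 sigma^2)]
E_exp_s2 = e - 1                        # E[e^(sigma^2)]
E_exp_2s2 = (e**2 - 1) / 2              # E[e^(2 sigma^2)]

overall_mean = E_lam * E_exp_mu * E_exp_half_s2           # E[E(S | params)]
EV = E_lam * E_exp_2mu * E_exp_2s2                        # expected process variance
VE = E_lam2 * E_exp_2mu * E_exp_s2 - overall_mean**2      # variance of hypothetical means
k = EV / VE

print(round(EV, 3), round(VE, 4), round(k, 2))
```

Each expectation is the closed form of a one-dimensional integral, so no numerical integration library is needed.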

Please note

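The factored joint pdf \(f(\lambda,\mu,\sigma) = a(\lambda)\,b(\mu)\,c(\sigma)\) alone doesn't guarantee that \(\lambda\), \(\mu\), and \(\sigma\) are independent. The additional requirement for \(\lambda\), \(\mu\), and \(\sigma\) to be independent is that they lie in a cube \(A<\lambda<B\), \(C<\mu<D\), \(E<\sigma<F\), where A, B, C, D, E, and F are constants. For example, say \(A<\lambda<B\), \(C<\mu<D\), \(e(\lambda)<\sigma<f(\lambda)\). Then even if \(f(\lambda,\mu,\sigma) = a(\lambda)\,b(\mu)\,c(\sigma)\), the variables \(\lambda\), \(\mu\), and \(\sigma\) are NOT independent, because the range of \(\sigma\) depends on \(\lambda\).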

You are given:

A portfolio of independent risks is divided into two classes.

Each class contains the same number of risks.

For each risk in Class 1, the number of claims per year follows a Poisson

distribution with mean 5.

For each risk in Class 2, the number of claims per year follows a binomial

distribution with parameters m = 8 and q = 0.55.

A randomly selected risk has three claims in Year 1, r claims in Year 2, and four

claims in Year 3.

The Bühlmann credibility estimate for the number of claims in Year 4 for this risk is

4.6019.

Determine r .

Solution

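Class  Probability  Claim count distribution           Hypothetical mean  Process variance
#1     0.5          Poisson with mean 5                5                  5
#2     0.5          Binomial with m = 8 and q = 0.55   8(0.55) = 4.4      8(0.55)(0.45) = 1.98

\[ \mu = \frac{1}{2}(5 + 4.4) = 4.7, \qquad EV = \frac{1}{2}(5 + 1.98) = 3.49 \]

\[ VE = (5 - 4.4)^2\,(0.5)^2 = 0.09 \]

\[ k = \frac{EV}{VE} = \frac{3.49}{0.09} = 38.78, \qquad P = \frac{k\mu + \sum_{i=1}^{n} X_i}{n + k} \]

\[ 4.6019 = \frac{38.78(4.7) + (3 + r + 4)}{3 + 38.78}, \qquad r = 3 \]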
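As a cross-check, the equation for r can be solved directly. This is a small sketch using only the numbers given in the problem; the variable names are illustrative.

```python
# Class 1: Poisson(5); Class 2: binomial(m=8, q=0.55); each class has prior weight 0.5.
means = [5.0, 8 * 0.55]
variances = [5.0, 8 * 0.55 * 0.45]

mu = sum(means) / 2                          # overall mean
EV = sum(variances) / 2                      # expected process variance
VE = sum(x * x for x in means) / 2 - mu**2   # variance of hypothetical means
k = EV / VE

n, estimate = 3, 4.6019
# estimate = (k*mu + 3 + r + 4) / (n + k), so solve for r
r = estimate * (n + k) - k * mu - 7
print(round(r, 2))
```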

Q13 Nov 2001 #23

You are given the following information on claim frequency of auto accidents for

individual drivers:

        Business Use                     Pleasure Use
        Expected claims  Claim variance  Expected claims  Claim variance
Rural   1.0              0.5             1.5              0.8
Urban   2.0              1.0             2.5              1.0
Total   1.8              1.06            2.3              1.12

Each driver's claim experience is independent of every other driver's.

There are an equal number of business and pleasure use drivers.

Solution

The key to solving this problem is correctly identifying risk classes. There are four risk

classes:

\[ \Theta = (BR, BU, PR, PU) \]

BR = Business & Rural Use, BU = Business & Urban Use,
PR = Pleasure & Rural Use, PU = Pleasure & Urban Use

Next, we need to calculate the probability of Rural Use and Urban Use. Look at the Business Use column:

        Expected claims
Rural   1.0
Urban   2.0
Total   1.8

\[ P(R) + P(U) = 1, \qquad 1.0\,P(R) + 2.0\,P(U) = 1.8 \]

Solving these two equations, we get: \(P(R) = 0.2\), \(P(U) = 0.8\).

            Business Use 0.5           Pleasure Use 0.5
Rural 0.2   P(BR) = 0.2(0.5) = 0.1     P(PR) = 0.2(0.5) = 0.1
Urban 0.8   P(BU) = 0.8(0.5) = 0.4     P(PU) = 0.8(0.5) = 0.4

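Let X represent the claim frequency of auto accidents of a randomly selected driver.

Class  E(X | class)  Var(X | class)  P(class)  E^2(X | class)
BR     1.0           0.5             0.1       1.0
BU     2.0           1.0             0.4       4.0
PR     1.5           0.8             0.1       2.25
PU     2.5           1.0             0.4       6.25

\[ EV = E\left[Var(X \mid \Theta)\right] = 0.5(0.1) + 1.0(0.4) + 0.8(0.1) + 1.0(0.4) = 0.93 \]

\[ VE = Var\left[E(X \mid \Theta)\right] = E\left[E^2(X \mid \Theta)\right] - \left\{E\left[E(X \mid \Theta)\right]\right\}^2 \]

\[ E\left[E(X \mid \Theta)\right] = 1.0(0.1) + 2.0(0.4) + 1.5(0.1) + 2.5(0.4) = 2.05 \]

\[ E\left[E^2(X \mid \Theta)\right] = 1.0(0.1) + 4.0(0.4) + 2.25(0.1) + 6.25(0.4) = 4.425 \]

\[ VE = 4.425 - 2.05^2 = 0.2225 \]

\[ k = \frac{EV}{VE} = \frac{0.93}{0.2225} = 4.18, \qquad Z = \frac{n}{n + k} = \frac{1}{1 + 4.18} = 0.193 \]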
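The four-class calculation can be reproduced in a few lines. This is a sketch that assumes the class probabilities derived above (0.1, 0.4, 0.1, 0.4); the dictionary layout is purely illustrative.

```python
# (hypothetical mean, process variance, probability) for each risk class
classes = {
    "BR": (1.0, 0.5, 0.1),
    "BU": (2.0, 1.0, 0.4),
    "PR": (1.5, 0.8, 0.1),
    "PU": (2.5, 1.0, 0.4),
}

EV = sum(var * p for _, var, p in classes.values())
mean = sum(m * p for m, _, p in classes.values())
VE = sum(m * m * p for m, _, p in classes.values()) - mean**2
k = EV / VE
Z = 1 / (1 + k)   # one driver observed for one year
print(round(EV, 2), round(VE, 4), round(k, 2), round(Z, 3))
```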

Chapter 6 Bühlmann-Straub credibility model

In the Bühlmann credibility model, we focus on one policyholder. We know that this policyholder has incurred claim amounts \(X_1, X_2, \ldots, X_n\) in Years 1, 2, ..., n respectively. We want to estimate his conditional mean claim amount in Year n + 1:

\[ E(X_{n+1} \mid X_1, X_2, \ldots, X_n) \]

Now we move from the Bühlmann credibility world to the more complex Bühlmann-Straub credibility world. Instead of looking at only one policyholder, we look at a group of policyholders.

In Year 1, there are \(m_1\) policyholders. The 1st policyholder has incurred claim amount \(X(1, t=1)\). The 2nd policyholder has incurred \(X(2, t=1)\). And the \(m_1\)-th policyholder has incurred \(X(m_1, t=1)\).

In Year 2, there are \(m_2\) policyholders. The 1st policyholder has incurred claim amount \(X(1, t=2)\). The 2nd policyholder has incurred \(X(2, t=2)\). And the \(m_2\)-th policyholder has incurred \(X(m_2, t=2)\).

In Year t, there are \(m_t\) policyholders. The 1st policyholder has incurred claim amount \(X(1, t)\). The 2nd policyholder has incurred \(X(2, t)\). And the \(m_t\)-th policyholder has incurred \(X(m_t, t)\).

In Year n, there are \(m_n\) policyholders. The 1st policyholder has incurred claim amount \(X(1, t=n)\). The 2nd policyholder has incurred \(X(2, t=n)\). And the \(m_n\)-th policyholder has incurred \(X(m_n, t=n)\).

policyholders pay?

Assumptions of the Bühlmann-Straub credibility model

All the observed policyholders belong to the same sub-risk class \(\theta\). That is, the \(m_1\) policyholders in Year 1, the \(m_2\) policyholders in Year 2, ..., the \(m_n\) policyholders in Year n, and the \(m_{n+1}\) policyholders in Year n + 1 all belong to the same sub-risk \(\theta\).

We don't know the specific value of \(\theta\). All we know is that \(\theta\) takes on a random value from \(\Theta = \{\theta_1, \theta_2, \ldots\}\).

Given \(\theta\), the individual claims \(X(1, t=1), \ldots, X(m_n, t=n), X(m_{n+1}, t=n+1)\) are independent and identically distributed with a common conditional mean \(E[X(i,t) \mid \theta] = \mu(\theta)\) and a common conditional variance \(Var[X(i,t) \mid \theta] = \sigma^2(\theta)\).

One approach is to calculate the renewal premium for Year n + 1 from scratch. An easier approach is to convert the Bühlmann-Straub credibility problem into a standard Bühlmann credibility problem. I'll do both.

First, let's look at the problem from the Bühlmann world. In Year 1, \(m_1\) policyholders have incurred a total claim amount of \(\sum_{i=1}^{m_1} X(i, t=1)\). Because these \(m_1\) policyholders belong to the same, unknown, sub-risk \(\theta\), there's no distinction between any two of them. All these \(m_1\) policyholders are just photocopies of one another.

So the statement

"In Year 1, \(m_1\) policyholders have incurred a total claim amount of \(\sum_{i=1}^{m_1} X(i, t=1)\)"

is the same as

"In the first \(m_1\) years, one policyholder has incurred a total claim amount of \(\sum_{i=1}^{m_1} X(i, t=1)\)."

In either case, the total claim amount is \(\sum_{i=1}^{m_1} X(i, t=1)\); the average claim per policyholder per year is \(\frac{1}{m_1}\sum_{i=1}^{m_1} X(i, t=1)\).

Similarly,

"In Year 2, \(m_2\) policyholders have incurred a total claim amount of \(\sum_{i=1}^{m_2} X(i, t=2)\)"

is the same as

"In the next \(m_2\) years, the policyholder (who has incurred \(\sum_{i=1}^{m_1} X(i, t=1)\) in the first \(m_1\) years) has incurred a total claim amount of \(\sum_{i=1}^{m_2} X(i, t=2)\)."

So on and so forth. To summarize:

In the first \(m_1\) years, one policyholder has incurred total claims of \(\sum_{i=1}^{m_1} X(i, t=1)\).

In the next \(m_2\) years, the policyholder has incurred total claims of \(\sum_{i=1}^{m_2} X(i, t=2)\).

In the next \(m_3\) years, the policyholder has incurred total claims of \(\sum_{i=1}^{m_3} X(i, t=3)\).

...

In the next \(m_n\) years, the policyholder has incurred total claims of \(\sum_{i=1}^{m_n} X(i, t=n)\).

In \(m = m_1 + m_2 + \ldots + m_n\) years, one policyholder has incurred total claims of \(\sum_{t=1}^{n}\sum_{i=1}^{m_t} X(i, t)\).

Then the expected claim cost in Year m + 1 for one policyholder can be calculated using the Bühlmann credibility formula:

\[ P = Z\bar{X} + (1 - Z)\mu, \quad \text{where} \]

\[ \bar{X} = \frac{\text{Total observed claims}}{\text{Total \# of observed years}} = \frac{1}{m}\sum_{t=1}^{n}\sum_{i=1}^{m_t} X(i, t) \]

\[ Z = \frac{\text{\# of observation years}}{\text{\# of observation years} + k} = \frac{m}{m + k}, \qquad k = \frac{E\left[\sigma^2(\theta)\right]}{Var\left[\mu(\theta)\right]} \]

There's nothing new under the sun in the Bühlmann-Straub credibility model. Every problem about the Bühlmann-Straub credibility model can be solved using the Bühlmann credibility model.

Actually, we can have a unified formula for the Bühlmann-Straub and the Bühlmann credibility models:

\[ P = Z\bar{X} + (1 - Z)\mu \]

\[ \bar{X} = \frac{\text{observed claims}}{\text{\# of observed exposures (measured on the insured-year basis)}} \]

\[ Z = \frac{\text{\# of observed exposures}}{\text{\# of observed exposures} + k}, \qquad k = \frac{E\left[\sigma^2(\theta)\right]}{Var\left[\mu(\theta)\right]} \]

In this unified formula, the observed exposure is measured on the insured-year basis. For example, if one policyholder has incurred a $500 claim in one year, the exposure is 1 insured-year. If the policyholder has incurred a $500 claim in a 2-year period, then the exposure is 2 insured-years.

Let's see how the unified formula works for the Bühlmann and the Bühlmann-Straub credibility models. In the Bühlmann model, we have an n-year claim history of one policyholder. So the observed exposure is n insured-years:

\[ Z = \frac{n}{n + k}, \qquad \bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n} \]

In the Bühlmann-Straub model, we observe \(m = m_1 + m_2 + \ldots + m_n\) policyholders. However, we have only 1-year claim data for each of these m policyholders. So the total # of exposures is m insured-years:

\[ Z = \frac{m}{m + k}, \qquad \bar{X} = \frac{1}{m}\sum_{t=1}^{n}\sum_{i=1}^{m_t} X(i, t) \]

Now you know how to convert a Bühlmann-Straub problem into a Bühlmann problem and how to use a unified formula for the Bühlmann-Straub model and the Bühlmann model. Next, I'll derive the Bühlmann-Straub credibility formula from scratch. First, let's create an average policyholder and reorganize each year's claim data from the viewpoint of this average policyholder.

Let's look at the claim history data in the Bühlmann-Straub model from the average policyholder's point of view:

In Year 1, \(m_1\) policyholders have incurred total claims of \(\sum_{i=1}^{m_1} X(i, t=1)\). So the average policyholder has incurred \(\bar{X}_1 = \frac{1}{m_1}\sum_{i=1}^{m_1} X(i, t=1)\).

In Year 2, the average policyholder has incurred \(\bar{X}_2 = \frac{1}{m_2}\sum_{i=1}^{m_2} X(i, t=2)\).

In Year t, the average policyholder has incurred \(\bar{X}_t = \frac{1}{m_t}\sum_{i=1}^{m_t} X(i, t)\).

In Year n, the average policyholder has incurred \(\bar{X}_n = \frac{1}{m_n}\sum_{i=1}^{m_n} X(i, t=n)\).

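We use \(a + Z\bar{X}\) to approximate \(E(X_{n+1} \mid \theta)\), where \(\bar{X} = \sum_{i=1}^{n}\frac{m_i}{m}\bar{X}_i\). We'll minimize \(E\left[a + Z\bar{X} - E(X_{n+1} \mid \theta)\right]^2\). The result is:

\[ Z = \frac{Cov\left(\bar{X}, X_{n+1}\right)}{Var\left(\bar{X}\right)}, \qquad a = (1 - Z)\mu \]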

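\[ E\left(\bar{X}_t \mid \theta\right) = E\left[\frac{1}{m_t}\sum_{i=1}^{m_t} X(i,t) \,\Big|\, \theta\right] = \frac{1}{m_t}\sum_{i=1}^{m_t} E\left[X(i,t) \mid \theta\right] = \frac{1}{m_t}\sum_{i=1}^{m_t}\mu(\theta) = \mu(\theta) \]

\[ Var\left(\bar{X}_t \mid \theta\right) = Var\left[\frac{1}{m_t}\sum_{i=1}^{m_t} X(i,t) \,\Big|\, \theta\right] = \frac{1}{m_t^2}\sum_{i=1}^{m_t} Var\left[X(i,t) \mid \theta\right] = \frac{1}{m_t^2}\,m_t\,\sigma^2(\theta) = \frac{\sigma^2(\theta)}{m_t} \]

\[ E\left(\bar{X} \mid \theta\right) = E\left[\sum_{i=1}^{n}\frac{m_i}{m}\bar{X}_i \,\Big|\, \theta\right] = \frac{1}{m}\sum_{i=1}^{n} m_i\,E\left(\bar{X}_i \mid \theta\right) = \frac{1}{m}\sum_{i=1}^{n} m_i\,\mu(\theta) = \frac{\mu(\theta)}{m}\sum_{i=1}^{n} m_i = \mu(\theta) \]

\[ Var\left(\bar{X} \mid \theta\right) = Var\left[\sum_{i=1}^{n}\frac{m_i}{m}\bar{X}_i \,\Big|\, \theta\right] = \frac{1}{m^2}\sum_{i=1}^{n} m_i^2\,Var\left(\bar{X}_i \mid \theta\right) = \frac{1}{m^2}\sum_{i=1}^{n} m_i^2\,\frac{\sigma^2(\theta)}{m_i} = \frac{\sigma^2(\theta)}{m^2}\sum_{i=1}^{n} m_i = \frac{\sigma^2(\theta)}{m} \]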

We can also find \(E(\bar{X} \mid \theta)\) and \(Var(\bar{X} \mid \theta)\) without using the complex summation symbols above. The # of policyholders observed is \(m = m_1 + m_2 + \ldots + m_n\). The claims incurred by these m policyholders, given \(\theta\), are independent and identically distributed with a common mean \(\mu(\theta)\) and a common variance \(\sigma^2(\theta)\). As a result, the average claim amount incurred by these m policyholders, given \(\theta\), has mean \(\mu(\theta)\) and variance \(\frac{1}{m}\sigma^2(\theta)\).

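\[ Cov\left(\bar{X}, X_{n+1}\right) = E\left(\bar{X}X_{n+1}\right) - E\left(\bar{X}\right)E\left(X_{n+1}\right) \]

\[ E\left(\bar{X}X_{n+1}\right) = E\left[E\left(\bar{X}X_{n+1} \mid \theta\right)\right] = E\left[E\left(\bar{X} \mid \theta\right)E\left(X_{n+1} \mid \theta\right)\right] = E\left[\mu^2(\theta)\right] \]

\[ E\left(\bar{X}\right) = E\left[E\left(\bar{X} \mid \theta\right)\right] = E\left[\mu(\theta)\right], \qquad E\left(X_{n+1}\right) = E\left[E\left(X_{n+1} \mid \theta\right)\right] = E\left[\mu(\theta)\right] \]

\[ Cov\left(\bar{X}, X_{n+1}\right) = E\left[\mu^2(\theta)\right] - E^2\left[\mu(\theta)\right] = Var\left[\mu(\theta)\right] \]

\[ Var\left(\bar{X}\right) = Var\left[E\left(\bar{X} \mid \theta\right)\right] + E\left[Var\left(\bar{X} \mid \theta\right)\right] = Var\left[\mu(\theta)\right] + \frac{E\left[\sigma^2(\theta)\right]}{m} \]

\[ Z = \frac{Cov\left(\bar{X}, X_{n+1}\right)}{Var\left(\bar{X}\right)} = \frac{Var\left[\mu(\theta)\right]}{Var\left[\mu(\theta)\right] + \dfrac{E\left[\sigma^2(\theta)\right]}{m}} = \frac{m}{m + \dfrac{E\left[\sigma^2(\theta)\right]}{Var\left[\mu(\theta)\right]}} \]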

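To summarize, in the Bühlmann-Straub model:

                                  Period 1                                   ...  Period n
Exposure                          m_1                                        ...  m_n
Average claim per unit exposure   \bar{X}_1                                  ...  \bar{X}_n
Process variance                  Var(\bar{X}_1 | θ) = σ^2(θ) / m_1          ...  Var(\bar{X}_n | θ) = σ^2(θ) / m_n

Then

\[ E\left(X_{n+1} \mid \bar{X}_1, \bar{X}_2, \ldots, \bar{X}_n\right) = Z\bar{X} + (1 - Z)\mu \]

\[ \bar{X} = \sum_{i=1}^{n}\frac{m_i}{m}\bar{X}_i, \qquad Z = \frac{m}{m + k}, \qquad k = \frac{E\left[\sigma^2(\theta)\right]}{Var\left[\mu(\theta)\right]} \]

Here \(\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_n\) are the average claims per policyholder in Years 1, 2, ..., n respectively. In addition, \(Z \neq \frac{n}{n+k}\). Don't make the common mistake of writing \(Z = \frac{n}{n+k}\). The formula \(Z = \frac{n}{n+k}\) does not apply to the Bühlmann-Straub credibility model.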

Key point

In the Bühlmann-Straub credibility model, what matters is the total exposure m and the historical average claim per exposure \(\bar{X}\). The individual claim amount \(X(i, t)\) doesn't matter.

For example, everything else being equal, the following two cases have the same Bühlmann-Straub credibility estimate.

Case #1

\(m_1 = 2\), \(X(1, t=1) = 7\), \(X(2, t=1) = 1\);

\(m_2 = 3\), \(X(1, t=2) = 0\), \(X(2, t=2) = 4\), \(X(3, t=2) = 2\).

Case #2

\(m_1 = 1\), \(X(1, t=1) = 9\);

\(m_2 = 4\), \(X(1, t=2) = 3\), \(X(2, t=2) = 0.6\), \(X(3, t=2) = 1\), \(X(4, t=2) = 0.4\).

In both cases,

the total exposure is \(m_1 + m_2 = 5\);

the total claim dollar amount is 14 = 7+1+0+4+2 = 9+3+0.6+1+0.4;

the average claim per insured per year is 14/5=2.8.
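The point can be checked with a short sketch. The helper name `summary` and the values of k and μ are hypothetical (the text fixes neither); what matters is that both cases produce the same total exposure and the same average, and therefore the same estimate.

```python
def summary(claims_by_year):
    """Return (total exposure, total claims, average claim per exposure)."""
    m = sum(len(year) for year in claims_by_year)
    total = sum(sum(year) for year in claims_by_year)
    return m, total, total / m

case1 = [[7, 1], [0, 4, 2]]
case2 = [[9], [3, 0.6, 1, 0.4]]

k, mu = 10.0, 2.0   # hypothetical values, used only to compare the two estimates

def estimate(claims_by_year):
    m, _, x_bar = summary(claims_by_year)
    Z = m / (m + k)
    return Z * x_bar + (1 - Z) * mu

print(summary(case1), summary(case2))
print(estimate(case1), estimate(case2))
```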

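Loss Models mentions Hewitt's version of the Bühlmann-Straub credibility model. This model assumes that the average claims \(X_i\), given the sub-risk class \(\theta\), are independently distributed with a common mean \(E(X_i \mid \theta) = \mu(\theta)\) and a variance

\[ Var(X_i \mid \theta) = w(\theta) + \frac{v(\theta)}{m_i} \]

So the difference between the general and the standard Bühlmann-Straub model is about the conditional variance:

General model: \(Var(X_i \mid \theta) = w(\theta) + \dfrac{v(\theta)}{m_i}\);

Standard model: \(Var(X_i \mid \theta) = \dfrac{\sigma^2(\theta)}{m_i}\).

Let \(w = E[w(\theta)]\) and \(v = E[v(\theta)]\). Then \(w + \dfrac{v}{m_j} = E\left[Var(X_j \mid \theta)\right]\).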

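With \(a = Var[\mu(\theta)]\), define

\[ m^* = \sum_{j=1}^{n}\frac{m_j}{v + w m_j} = \sum_{j=1}^{n}\frac{1}{w + \dfrac{v}{m_j}} = \sum_{j=1}^{n}\frac{1}{E\left[Var(X_j \mid \theta)\right]} \]

\[ P = Z\bar{X} + (1 - Z)\mu, \qquad Z = \frac{a m^*}{1 + a m^*}, \qquad \bar{X} = \frac{\displaystyle\sum_{j=1}^{n}\frac{X_j}{E\left[Var(X_j \mid \theta)\right]}}{\displaystyle\sum_{j=1}^{n}\frac{1}{E\left[Var(X_j \mid \theta)\right]}} \]

If \(m_1 = m_2 = \ldots = m_n = m\), then

\[ m^* = \sum_{j=1}^{n}\frac{1}{w + \dfrac{v}{m_j}} = \sum_{j=1}^{n}\frac{1}{w + \dfrac{v}{m}} = \frac{n}{w + \dfrac{v}{m}} \]

\[ Z = \frac{a m^*}{1 + a m^*} = \frac{1}{1 + \dfrac{1}{a}\cdot\dfrac{1}{m^*}} = \frac{1}{1 + \dfrac{1}{a}\cdot\dfrac{w + v/m}{n}} = \frac{n}{n + \dfrac{w + v/m}{a}} \]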

A few points about these formulas. First, \(\bar{X}\) is in general not the exposure-weighted average \(\dfrac{\sum m_j X_j}{\sum m_j}\). Here the weight of \(X_j\) is inversely proportional to \(E\left[Var(X_j \mid \theta)\right]\), the expected process variance. The higher the expected process variance of \(X_j\), the less weight is assigned to \(X_j\). This way, \(\bar{X}\) will have the minimum variance. This point is explained in the study note by Curtis Gary Dean. Refer to this study note if you want to find out more.

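Next, let's look at the crazy formula \(Z = \dfrac{a m^*}{1 + a m^*}\). To get comfortable with this formula, look at the basic formula \(Z = \dfrac{n}{n + \dfrac{v}{a}} = \dfrac{1}{1 + \dfrac{v}{n a}}\). Let's compare these two formulas:

\[ Z = \frac{n}{n + \dfrac{v}{a}} = \frac{1}{1 + \dfrac{1}{a}\cdot\dfrac{v}{n}}, \qquad Z = \frac{a m^*}{1 + a m^*} = \frac{1}{1 + \dfrac{1}{a}\cdot\dfrac{1}{m^*}} \]

Now you see that these two formulas are similar. If \(Var(X_i \mid \theta) = \sigma^2(\theta)\) and \(m_j = 1\) as in the Bühlmann model, then

\[ m^* = \sum_{j=1}^{n}\frac{1}{E\left[Var(X_j \mid \theta)\right]} = \sum_{j=1}^{n}\frac{1}{v} = \frac{n}{v}, \qquad Z = \frac{1}{1 + \dfrac{1}{a}\cdot\dfrac{1}{m^*}} = \frac{1}{1 + \dfrac{1}{a}\cdot\dfrac{v}{n}} \]

This is the Bühlmann credibility factor \(Z = \dfrac{1}{1 + \dfrac{1}{a}\cdot\dfrac{v}{n}} = \dfrac{n}{n + \dfrac{v}{a}}\).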

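The third point. Loss Models points out that in this version of the model, as \(m_j\) approaches infinity, the credibility factor Z won't approach one. Let's take a look at this.

When \(m_i \to \infty\),

\[ Var(X_i \mid \theta) = w(\theta) + \frac{v(\theta)}{m_i} \to w(\theta) \]

\[ m^* = \sum_{j=1}^{n}\frac{1}{w} = \frac{n}{w}, \qquad Z = \frac{1}{1 + \dfrac{1}{a}\cdot\dfrac{1}{m^*}} = \frac{1}{1 + \dfrac{1}{a}\cdot\dfrac{w}{n}} < 1 \]

Compare this with the Bühlmann model or the Bühlmann-Straub model. In the Bühlmann model, as the number of exposures n approaches \(\infty\),

\[ Z = \frac{n}{n + \dfrac{v}{a}} = \frac{1}{1 + \dfrac{1}{a}\cdot\dfrac{v}{n}} \to 1 \]

In the Bühlmann-Straub model, as \(m_i \to \infty\),

\[ Var(X_i \mid \theta) = \frac{\sigma^2(\theta)}{m_i} \to 0, \qquad m^* = \sum_{j=1}^{n}\frac{1}{E\left[Var(X_j \mid \theta)\right]} \to \infty, \qquad Z = \frac{1}{1 + \dfrac{1}{a}\cdot\dfrac{1}{m^*}} \to 1 \]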
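The limiting behaviour can be illustrated numerically. The function below is a minimal sketch of the generalized formula Z = a m* / (1 + a m*); the function name and the parameter values a = 2, w = 3, v = 10 are made up for illustration and do not come from Loss Models or this manual.

```python
def generalized_z(a, w, v, exposures):
    """Credibility factor Z = a*m_star / (1 + a*m_star),
    with m_star = sum of m_j / (v + w*m_j)."""
    m_star = sum(m / (v + w * m) for m in exposures)
    return a * m_star / (1 + a * m_star)

a, w, v, n = 2.0, 3.0, 10.0, 4   # hypothetical parameters

# As every m_j grows, Z approaches n / (n + w/a), which is below 1 when w > 0.
for m in (1, 10, 1_000, 1_000_000):
    print(m, round(generalized_z(a, w, v, [m] * n), 4))
print("limit:", round(n / (n + w / a), 4))

# With w = 0 the formula collapses to the standard Z = m / (m + v/a).
print(generalized_z(a, 0.0, v, [5, 7]), 12 / (12 + v / a))
```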

Finally, Loss Models has a special case of the general Bühlmann-Straub model. In this special case, what's new is \(Var[\mu(\theta)] = a + \dfrac{b}{m}\), as opposed to \(Var[\mu(\theta)] = a\) in the Bühlmann model and the Bühlmann-Straub model. Here \(m = \sum_{j=1}^{n} m_j\) represents the total exposure.

As you can see, this special case just changes \(Var[\mu(\theta)] = a\) to \(Var[\mu(\theta)] = a + \dfrac{b}{m}\). In other words, this special case just changes \(a\) to \(a + \dfrac{b}{m}\). Loss Models points out that to find the credibility factor for this special case, we just need to change \(a\) to \(a + \dfrac{b}{m}\):

\[ Z = \frac{a m^*}{1 + a m^*} \quad\longrightarrow\quad Z = \frac{\left(a + \dfrac{b}{m}\right) m^*}{1 + \left(a + \dfrac{b}{m}\right) m^*} \]

Most likely, Exam C won't have problems on the generalized version of the Bühlmann-Straub model. So you should focus on the standard Bühlmann-Straub model. To tackle the standard Bühlmann-Straub model, you can use any of the following 3 approaches:

1. Use the Bühlmann-Straub model formula \(Z = \dfrac{m}{m + k}\).

2. Convert the problem into a Bühlmann problem. Instead of having \(m = \sum m_j\) policyholders, we have m years of observation of one policyholder. Then \(Z = \dfrac{\text{\# of observation years}}{\text{\# of observation years} + k} = \dfrac{m}{m + k}\).

3. Use the unified formula (without converting into the Bühlmann model): \(Z = \dfrac{\text{\# of observed exposures}}{\text{\# of observed exposures} + k} = \dfrac{m}{m + k}\).

Sample SOA Problems

You are given the following data on large business policyholders:

Losses for each employee of a given policyholder are independent and have a

common mean and variance.

The overall average loss per employee for all policyholders is 20.

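Year  Average loss per employee  # of employees
1     15                         800
2     10                         600
3     5                          400

Determine the Bühlmann-Straub credibility premium per employee for this policyholder.

Solution

The expected process variance: EV = 8,000

The variance of the hypothetical means: VE = 40

\[ k = \frac{EV}{VE} = \frac{8{,}000}{40} = 200, \qquad Z = \frac{m}{m + k} = \frac{1800}{1800 + 200} = 0.9 \]

\[ \bar{X} = \frac{1}{m}\sum_{i=1}^{3} m_i X_i = \frac{800(15) + 600(10) + 400(5)}{1800} = 11.111 \]

\[ P = Z\bar{X} + (1 - Z)\mu = 0.9(11.111) + 0.1(20) = 12 \]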

Alternatively,

Guo Fall 2009 C, Page 159 / 284

\[ P = \frac{k\mu + \sum_{i=1}^{3} m_i X_i}{m + k} = \frac{200(20) + 800(15) + 600(10) + 400(5)}{1800 + 200} = 12 \]

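We can also convert this Bühlmann-Straub problem into a Bühlmann problem. Convert the table

Year  Average loss per employee  # of employees
1     15                         800
2     10                         600
3     5                          400

into

Period          Total loss  # of employees
First 800 years 15*800      1
Next 600 years  10*600      1
Next 400 years  5*400       1

The above two tables are essentially the same. In both tables, the average loss per employee per year is

\[ \bar{X} = \frac{800(15) + 600(10) + 400(5)}{1800} = 11.111 \]

After the conversion, the # of observation years is n = 800 + 600 + 400 = 1800. This seems crazy, but it is merely a conceptual tool for us to transform a Bühlmann-Straub problem into a Bühlmann problem.

\[ Z = \frac{n}{n + k} = \frac{1800}{1800 + 200} = 0.9, \qquad P = Z\bar{X} + (1 - Z)\mu = 0.9(11.111) + 0.1(20) = 12 \]

Finally, we can use the unified formula. In this method, we don't care about the distinction between the Bühlmann and the Bühlmann-Straub models. We just use the following unified formulas:

\[ P = Z\bar{X} + (1 - Z)\mu \]

\[ \bar{X} = \frac{\text{observed claims}}{\text{\# of observed exposures (measured on the insured-year basis)}} \]

\[ Z = \frac{\text{\# of observed exposures}}{\text{\# of observed exposures} + k}, \qquad k = \frac{E\left[\sigma^2(\theta)\right]}{Var\left[\mu(\theta)\right]} \]

\[ Z = \frac{\text{\# of observed exposures}}{\text{\# of observed exposures} + k} = \frac{1800}{1800 + 200} = 0.9, \qquad \bar{X} = \frac{800(15) + 600(10) + 400(5)}{1800} = 11.111 \]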
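All three approaches reduce to the same arithmetic, which can be sketched as follows. The variable names are illustrative; the inputs are the figures quoted in the solution.

```python
losses_per_employee = [15, 10, 5]
employees = [800, 600, 400]
mu, EV, VE = 20, 8000, 40    # overall mean, expected process variance, variance of hypothetical means

m = sum(employees)           # total exposure in employee-years
k = EV / VE
Z = m / (m + k)
x_bar = sum(x * e for x, e in zip(losses_per_employee, employees)) / m
P = Z * x_bar + (1 - Z) * mu
print(k, Z, round(x_bar, 3), round(P, 3))
```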

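You are given the following information about a single risk:

The risk has m exposures in each year.

The risk is observed for n years.

The variance of the hypothetical means is a.

The expected value of the annual process variance is \(w + \dfrac{v}{m}\).

Solution

One line of thinking is:

\[ Z = \frac{n}{n + k} = \frac{n}{n + \dfrac{EV}{VE}} = \frac{n}{n + \dfrac{w + v/m}{a}} \]

Then as m approaches infinity, \(w + \dfrac{v}{m}\) approaches w and Z approaches \(\dfrac{n}{n + \dfrac{w}{a}}\).

Incidentally, this leads to the correct answer. However, this line of thinking is problematic. As explained earlier, in the Bühlmann-Straub model, the credibility factor is

\[ Z = \frac{m}{m + k} = \frac{\sum_{i=1}^{n} m_i}{\sum_{i=1}^{n} m_i + k}, \quad \text{not } Z = \frac{n}{n + k}. \]

The correct logic is to realize that this problem involves a special Bühlmann-Straub model where \(Var(X_i \mid \theta) = w(\theta) + \dfrac{v(\theta)}{m_i}\). We are told that \(m_1 = m_2 = \ldots = m_n = m\).

\[ Z = \frac{a m^*}{1 + a m^*} = \frac{1}{1 + \dfrac{1}{a}\cdot\dfrac{1}{m^*}} = \frac{1}{1 + \dfrac{1}{a}\cdot\dfrac{w + v/m}{n}} = \frac{n}{n + \dfrac{w + v/m}{a}} \]

As \(m \to \infty\), \(\dfrac{v}{m} \to 0\), and

\[ Z = \frac{n}{n + \dfrac{w + v/m}{a}} \to \frac{n}{n + \dfrac{w}{a}} \]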

Nov 2004 #9

Members of three classes of insureds can have 0, 1, or 2 claims, with the following

probabilities:

# of claims

Class 0 1 2

I 0.9 0.0 0.1

II 0.8 0.1 0.1

III 0.7 0.2 0.1

A class is chosen at random, and varying # of insureds from that class are observed over 2 years, as shown below:

Year  # of insureds  # of claims
1     20             7
2     30             10

for 35 insureds.

Solution

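Method 1 Use the Bühlmann-Straub credibility premium formula

Class  Probability  E(X | class) (1)  Var(X | class) (2)
I      1/3          0.2               0.36
II     1/3          0.3               0.41
III    1/3          0.4               0.44

(1) 0.2 = 0(0.9) + 1(0.0) + 2(0.1)

(2) 0.36 = 0^2(0.9) + 1^2(0.0) + 2^2(0.1) - 0.2^2

\[ m = m_1 + m_2 = 20 + 30 = 50, \qquad \mu = \frac{1}{3}(0.2 + 0.3 + 0.4) = 0.3 \]

\[ VE = Var\left[E(X \mid \theta)\right] = \frac{1}{3}\left(0.2^2 + 0.3^2 + 0.4^2\right) - 0.3^2 = 0.00667 \]

\[ EV = E\left[Var(X \mid \theta)\right] = \frac{1}{3}(0.36 + 0.41 + 0.44) = 0.4033 \]

\[ k = \frac{EV}{VE} = \frac{0.4033}{0.00667} = 60.5, \qquad Z = \frac{m}{m + k} = \frac{50}{50 + 60.5} = 0.4525, \qquad \bar{X} = \frac{7 + 10}{50} = 0.34 \]

\[ P = \frac{k\mu + \sum_{i=1}^{2} m_i X_i}{m + k} = \frac{60.5(0.3) + (10 + 7)}{50 + 60.5} = 0.318 \]

Method 2 Convert the Bühlmann-Straub problem into a Bühlmann problem

Convert the table

Year  # of insureds  # of claims
1     20             7
2     30             10

into

Period          # of claims  # of insureds
First 20 years  7            1
Next 30 years   10           1

The above two tables are essentially the same. In both tables, the average loss per insured per year is

\[ \bar{X} = \frac{7 + 10}{50} = 0.34 \]

After the conversion, the # of observation years is n = 20 + 30 = 50. Using the Bühlmann credibility premium formulas, we have:

\[ Z = \frac{n}{n + k} = \frac{50}{50 + 60.5} = 0.4525, \qquad P = 0.4525(0.34) + (1 - 0.4525)(0.3) = 0.318 \]

Method 3 Use the unified formula

In this method, we don't care about the distinction between the Bühlmann and the Bühlmann-Straub models.

\[ Z = \frac{\text{\# of observed exposures}}{\text{\# of observed exposures} + k} = \frac{50}{50 + 60.5} = 0.4525, \qquad \bar{X} = \frac{7 + 10}{50} = 0.34 \]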
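The class moments and the premium per insured can be recomputed from the probability table. This is a sketch with illustrative names; the final line, which scales the per-insured premium to 35 insureds, is this sketch's own extension of the arithmetic shown above.

```python
# Claim count probabilities for 0, 1, 2 claims in each class (equal prior weights).
probs = {"I": [0.9, 0.0, 0.1], "II": [0.8, 0.1, 0.1], "III": [0.7, 0.2, 0.1]}

means = [sum(n * p for n, p in enumerate(ps)) for ps in probs.values()]
variances = [sum(n * n * p for n, p in enumerate(ps)) - mean**2
             for ps, mean in zip(probs.values(), means)]

mu = sum(means) / 3
EV = sum(variances) / 3
VE = sum(x * x for x in means) / 3 - mu**2
k = EV / VE

insureds, claims = [20, 30], [7, 10]
m = sum(insureds)
Z = m / (m + k)
x_bar = sum(claims) / m
P = Z * x_bar + (1 - Z) * mu          # expected claims per insured
print(round(k, 1), round(Z, 4), round(P, 3), round(35 * P, 2))
```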

You are given four classes of insured, each of whom may have zero or one claim, with the following probabilities:

       # of claims
Class  0    1
I      0.9  0.1
II     0.8  0.2
III    0.5  0.5
IV     0.1  0.9

A class is selected at random, and four insureds are selected at random from the class. The total number of claims is two. If five insureds are selected at random from the same class, estimate the total number of claims using the Bühlmann-Straub credibility.

Solution

You can use any one of the three methods. Here I use the Bühlmann-Straub credibility formula \(Z = \dfrac{m}{m + k}\).

Class  Probability  E(X | class)  Var(X | class) (1)
I      1/4          0.1           0.09
II     1/4          0.2           0.16
III    1/4          0.5           0.25
IV     1/4          0.9           0.09

(1) If \(X = a\) with probability \(p\) and \(X = b\) with probability \(q = 1 - p\), then \(E(X) = ap + bq\) and \(Var(X) = (a - b)^2 pq\).

\[ \mu = \frac{1}{4}(0.1 + 0.2 + 0.5 + 0.9) = 0.425 \]

\[ EV = \frac{1}{4}(0.09 + 0.16 + 0.25 + 0.09) = 0.1475 \]

\[ VE = \frac{1}{4}\left(0.1^2 + 0.2^2 + 0.5^2 + 0.9^2\right) - 0.425^2 = 0.096875 \]

\[ k = \frac{EV}{VE} = \frac{0.1475}{0.096875} = 1.5226, \qquad Z = \frac{m}{m + k} = \frac{4}{4 + 1.5226} = 0.7243, \qquad \bar{X} = \frac{2}{4} = 0.5 \]
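The figures above can be reproduced as follows. The solution text stops at Z and the sample mean, so the last two lines (the premium per insured and the estimate for five insureds) are this sketch's own continuation using P = Z X̄ + (1 − Z)μ, not a quotation from the original.

```python
claim_probs = [0.1, 0.2, 0.5, 0.9]    # P(one claim) in classes I-IV, equal prior weights

mu = sum(claim_probs) / 4
EV = sum(q * (1 - q) for q in claim_probs) / 4          # Bernoulli variance q(1-q)
VE = sum(q * q for q in claim_probs) / 4 - mu**2
k = EV / VE

m, total_claims = 4, 2
Z = m / (m + k)
x_bar = total_claims / m

per_insured = Z * x_bar + (1 - Z) * mu   # continuation, not shown in the text
estimate_for_five = 5 * per_insured
print(round(k, 4), round(Z, 4), round(estimate_for_five, 2))
```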

You are given:

The # of claims incurred in a month by any insured has a Poisson distribution with mean \(\lambda\).

The claim frequencies of different insureds are independent.

The prior distribution is gamma with probability density function

\[ f(\lambda) = \frac{(100\lambda)^6\,e^{-100\lambda}}{120\lambda} \]

Month  # of insureds  # of claims
1      100            6
2      150            8
3      200            11
4      300            ?

Solution

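This time, let's solve it by converting the Bühlmann-Straub credibility problem into a Bühlmann credibility problem.

This table

Month  # of insureds  # of claims
1      100            6
2      150            8
3      200            11

is the same as

Period     # of insureds  # of claims
First 100  1              6
Next 150   1              8
Next 200   1              11

So the total number of observation years is n = 100 + 150 + 200 = 450. The total # of observed claims is 6 + 8 + 11 = 25. So \(\bar{X} = \dfrac{25}{450}\).

Let N represent the # of claims incurred in a month by one insured. Then \(N \mid \lambda\) is Poisson with mean \(\lambda\). So the risk random variable is \(\lambda\).

\[ E(N \mid \lambda) = Var(N \mid \lambda) = \lambda \]

The prior is gamma with \(\alpha = 6\) and \(\theta = \dfrac{1}{100}\).

\[ \mu = EV = E_\lambda\left[Var(N \mid \lambda)\right] = E_\lambda[\lambda] = \alpha\theta = 6\left(\frac{1}{100}\right) = 0.06 \]

\[ VE = Var(\lambda) = \alpha(\alpha + 1)\theta^2 - (\alpha\theta)^2 = 6(7)\left(\frac{1}{100}\right)^2 - 0.06^2 = 0.0006 \]

\[ k = \frac{EV}{VE} = \frac{0.06}{0.0006} = 100, \qquad Z = \frac{n}{n + k} = \frac{450}{450 + 100} = 0.818 \]

\[ P = Z\bar{X} + (1 - Z)\mu = 0.818\left(\frac{25}{450}\right) + (1 - 0.818)(0.06) = 0.0564 \]

The expected # of claims in Month 4 is 300 × 0.0564 = 16.9.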
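A quick recomputation, assuming the gamma prior has shape 6 and scale 1/100 as read off the density above (variable names are illustrative):

```python
alpha, theta = 6, 1 / 100          # gamma prior: shape and scale

EV = alpha * theta                 # E[lambda], also the overall mean for a Poisson model
VE = alpha * theta**2              # Var(lambda)
k = EV / VE

insureds, claims = [100, 150, 200], [6, 8, 11]
n = sum(insureds)                  # 450 insured-months of exposure
Z = n / (n + k)
x_bar = sum(claims) / n            # 25 / 450
P = Z * x_bar + (1 - Z) * EV       # expected claims per insured per month
expected_month_4 = 300 * P
print(round(k, 1), round(Z, 3), round(P, 4), round(expected_month_4, 1))
```

Note that the sample mean uses 450 in the denominator, matching the total exposure.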

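You are given:

A region is comprised of 3 territories. Claims experience for Year 1 is as follows:

Territory  # of insureds  # of claims
A          10             4
B          20             5
C          30             3

The # of claims for each insured each year has a Poisson distribution.

Each insured in a territory has the same expected claim frequency.

The # of insureds is constant over time for each territory.

Determine the Bühlmann-Straub empirical Bayes estimate of the credibility factor Z for Territory A.

Solution

Territory  Probability  Claim frequency
A          1/6          4/10 = 0.4
B          2/6          5/20 = 0.25
C          3/6          3/30 = 0.1

Because the claim count is Poisson, \(E(X \mid \theta) = Var(X \mid \theta)\), so

\[ \mu = E\left[E(X \mid \theta)\right] = E\left[Var(X \mid \theta)\right] = EV = \frac{1}{6}(0.4) + \frac{2}{6}(0.25) + \frac{3}{6}(0.1) = 0.2 \]

\[ VE = Var\left[E(X \mid \theta)\right] = E\left[E^2(X \mid \theta)\right] - \mu^2 = \frac{1}{6}\left(0.4^2\right) + \frac{2}{6}\left(0.25^2\right) + \frac{3}{6}\left(0.1^2\right) - 0.2^2 = 0.0125 \]

\[ k = \frac{EV}{VE} = \frac{0.2}{0.0125} = 16, \qquad Z = \frac{n}{n + k} = \frac{10}{10 + 16} = 0.385 \]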
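The same arithmetic as a sketch, reproducing the solution's approach of weighting each territory by its share of insureds (names are illustrative):

```python
insureds = {"A": 10, "B": 20, "C": 30}
claims = {"A": 4, "B": 5, "C": 3}

total = sum(insureds.values())
weights = {t: insureds[t] / total for t in insureds}     # 1/6, 2/6, 3/6
freq = {t: claims[t] / insureds[t] for t in insureds}    # 0.4, 0.25, 0.1

mu = sum(weights[t] * freq[t] for t in insureds)
EV = mu                                                  # Poisson: mean equals variance
VE = sum(weights[t] * freq[t] ** 2 for t in insureds) - mu**2
k = EV / VE
Z_A = insureds["A"] / (insureds["A"] + k)
print(round(mu, 3), round(VE, 4), round(k, 1), round(Z_A, 3))
```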

Chapter 7 Empirical Bayes estimate for the Bühlmann model

Dean's study note has a good explanation of the formulas and worked-out problems. Read Dean's study note along with my explanation.

This topic is among the least interesting ones in Exam C. However, it was repeatedly tested in Exam C. The exam problems on this topic are easy. The difficulty is to memorize the formulas. In this chapter, I will show you some ideas behind the formulas to help you memorize them.

We have n-year claim data about r risks. For each risk, we have its claim amount in Year 1, Year 2, ..., Year n. Let \(X_{ij}\) represent the claim incurred by the i-th policyholder in Year j. This is what we know:

Risk  Year 1  Year 2  ...  Year n
1     X_11    X_12    ...  X_1n
2     X_21    X_22    ...  X_2n
...
r     X_r1    X_r2    ...  X_rn

The issue here is that we don't know the probability distribution of the conditional claim random variable \(X \mid \theta\) or the probability distribution of the risk variable \(\theta\). As a result, we can't calculate the two inputs for the credibility factor Z: the expected process variance \(EV = E[Var(X \mid \theta)]\) and the variance of the hypothetical means \(VE = Var[E(X \mid \theta)]\).

So we need to estimate EV and VE from the past claim data given to us.

It's easy to estimate \(EV = E[Var(X \mid \theta)]\). We can estimate \(Var(X \mid \theta)\) for each risk using the formula \(\hat{\sigma}_i^2 = \dfrac{1}{n-1}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2\). Then we'll take the average to find \(\widehat{EV}\). This estimation process can be summarized as follows:

Risk  Year 1 ... Year n     Sample mean                                 Sample variance
1     X_11 ... X_1n         \bar{X}_1 = (1/n) \sum_{t=1}^{n} X_{1t}     \hat{\sigma}_1^2 = (1/(n-1)) \sum_{t=1}^{n} (X_{1t} - \bar{X}_1)^2
2     X_21 ... X_2n         \bar{X}_2 = (1/n) \sum_{t=1}^{n} X_{2t}     \hat{\sigma}_2^2 = (1/(n-1)) \sum_{t=1}^{n} (X_{2t} - \bar{X}_2)^2
...
r     X_r1 ... X_rn         \bar{X}_r = (1/n) \sum_{t=1}^{n} X_{rt}     \hat{\sigma}_r^2 = (1/(n-1)) \sum_{t=1}^{n} (X_{rt} - \bar{X}_r)^2

\[ \widehat{EV} = \frac{1}{r}\sum_{i=1}^{r}\hat{\sigma}_i^2 = \frac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2 \]

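Instead, we estimate VE using the following equation:

\[ Var\left(\bar{X}_i\right) = Var\left[\mu(\theta)\right] + \frac{1}{n}E\left[Var(X \mid \theta)\right] = VE + \frac{1}{n}EV \]

\[ VE = Var\left[\mu(\theta)\right] = Var\left(\bar{X}_i\right) - \frac{1}{n}E\left[Var(X \mid \theta)\right] = Var\left(\bar{X}_i\right) - \frac{1}{n}EV \]

\[ \widehat{VE} = \widehat{Var}\left(\bar{X}_i\right) - \frac{1}{n}\widehat{EV} \]

\(\widehat{Var}(\bar{X}_i)\) is simple to calculate: \(\widehat{Var}\left(\bar{X}_i\right) = \dfrac{1}{r-1}\sum_{i=1}^{r}\left(\bar{X}_i - \bar{X}\right)^2\), where \(\bar{X}\) is the average of the r sample means.

\[ \text{So } \widehat{VE} = \widehat{Var}\left(\bar{X}_i\right) - \frac{1}{n}\widehat{EV} = \frac{1}{r-1}\sum_{i=1}^{r}\left(\bar{X}_i - \bar{X}\right)^2 - \frac{1}{n}\cdot\frac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2 \]

If this estimate turns out to be negative, set \(\widehat{VE} = 0\). If \(\widehat{VE} = 0\), then \(k = \dfrac{\widehat{EV}}{\widehat{VE}} = \infty\) and Z = 0.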

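Summary of the estimation process for the empirical Bayes estimate for the Bühlmann model

Step 1 Calculate the sample variance for each risk and the expected process variance for all risks combined:

\[ \hat{\sigma}_i^2 = \frac{1}{n-1}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2, \qquad \widehat{EV} = \frac{1}{r}\sum_{i=1}^{r}\hat{\sigma}_i^2 = \frac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2 \]

Step 2 Calculate \(\widehat{Var}\left(\bar{X}_i\right) = \dfrac{1}{r-1}\sum_{i=1}^{r}\left(\bar{X}_i - \bar{X}\right)^2\)

Step 3 Calculate \(\dfrac{1}{n}\widehat{EV}\). Find \(\widehat{VE} = \widehat{Var}\left(\bar{X}_i\right) - \dfrac{1}{n}\widehat{EV}\)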

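An insurer has data on losses for four policyholders for seven years. \(X_{ij}\) is the loss from the i-th policyholder for year j. You are given:

\[ \sum_{i=1}^{4}\sum_{j=1}^{7}\left(X_{ij} - \bar{X}_i\right)^2 = 33.6 \]

\[ \sum_{i=1}^{4}\left(\bar{X}_i - \bar{X}\right)^2 = 3.3 \]

Using the nonparametric empirical Bayes estimation, calculate the Bühlmann credibility factor for an individual policyholder.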

Solution

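\[ \text{Step 1} \quad \widehat{EV} = \frac{1}{r}\sum_{i=1}^{r}\hat{\sigma}_i^2 = \frac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2 = \frac{1}{4(7-1)}(33.6) = 1.4 \]

\[ \text{Step 2} \quad \widehat{Var}\left(\bar{X}_i\right) = \frac{1}{r-1}\sum_{i=1}^{r}\left(\bar{X}_i - \bar{X}\right)^2 = \frac{3.3}{4-1} = 1.1 \]

\[ \text{Step 3} \quad Var\left(\bar{X}_i\right) = VE + \frac{1}{n}EV \quad\Rightarrow\quad \widehat{VE} = 1.1 - \frac{1.4}{7} = 0.9 \]

\[ k = \frac{\widehat{EV}}{\widehat{VE}} = \frac{1.4}{0.9}, \qquad Z = \frac{n}{n + k} = \frac{7}{7 + \dfrac{1.4}{0.9}} = 0.818 \]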


You are given total claims for two policyholders:

              Year
Policyholder  1    2    3    4
X             730  800  650  700
Y             655  650  625  750

Using the nonparametric empirical Bayes estimation, calculate the Bühlmann credibility factor for Policyholder Y.

Solution

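r = 2, n = 4.

Step 1 Calculate the sample conditional variance for each risk and the mean of these sample variances:

\[ \hat{\sigma}_i^2 = \frac{1}{n-1}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2, \qquad \widehat{EV} = \frac{1}{r}\sum_{i=1}^{r}\hat{\sigma}_i^2 = \frac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2 \]

\[ \bar{X} = \frac{730 + 800 + 650 + 700}{4} = 720, \qquad \bar{Y} = \frac{655 + 650 + 625 + 750}{4} = 670 \]

\[ \widehat{Var}(X) = \frac{1}{4-1}\left[(730 - 720)^2 + (800 - 720)^2 + (650 - 720)^2 + (700 - 720)^2\right] = 3{,}933.33 \]

\[ \widehat{Var}(Y) = \frac{1}{4-1}\left[(655 - 670)^2 + (650 - 670)^2 + (625 - 670)^2 + (750 - 670)^2\right] = 3{,}016.67 \]

\[ \widehat{EV} = \frac{1}{2}(3{,}933.33 + 3{,}016.67) = 3{,}475 \]

Step 2 Calculate the sample variance of the two sample means:

\[ \frac{1}{2}\left(\bar{X} + \bar{Y}\right) = \frac{1}{2}(720 + 670) = 695 \]

\[ \widehat{Var}\left(\bar{X}_i\right) = \frac{1}{2-1}\left[(720 - 695)^2 + (670 - 695)^2\right] = 1{,}250 \]

Step 3 Calculate \(\dfrac{1}{n}\widehat{EV}\) and find \(\widehat{VE}\):

\[ \widehat{VE} = \widehat{Var}\left(\bar{X}_i\right) - \frac{1}{n}\widehat{EV} = 1{,}250 - \frac{1}{4}(3{,}475) = 381.25 \]

\[ k = \frac{\widehat{EV}}{\widehat{VE}} = \frac{3{,}475}{381.25} = 9.115, \qquad Z_Y = \frac{n}{n + k} = \frac{4}{4 + 9.115} = 0.305 \]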
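The three steps translate directly into code. The function below is a minimal sketch of the nonparametric empirical Bayes estimate for the Bühlmann model (equal number of years per risk, one exposure per year); the function name and return layout are illustrative, not taken from the text.

```python
def empirical_bayes_buhlmann(data):
    """data[i][t] = claim of risk i in year t. Returns (EV, VE, Z)."""
    r, n = len(data), len(data[0])
    means = [sum(row) / n for row in data]
    grand_mean = sum(means) / r

    # Step 1: average of the r sample variances
    EV = sum((x - mean) ** 2 for row, mean in zip(data, means) for x in row) / (r * (n - 1))
    # Step 2: sample variance of the r sample means
    var_of_means = sum((mean - grand_mean) ** 2 for mean in means) / (r - 1)
    # Step 3: subtract EV/n; a negative estimate is set to zero
    VE = max(var_of_means - EV / n, 0.0)

    Z = 0.0 if VE == 0 else n / (n + EV / VE)
    return EV, VE, Z

EV, VE, Z = empirical_bayes_buhlmann([[730, 800, 650, 700], [655, 650, 625, 750]])
print(round(EV, 2), round(VE, 2), round(Z, 3))
```

Because both policyholders have four years of data, the same Z applies to Policyholder X and Policyholder Y.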

Empirical Bayes estimate for the Bühlmann-Straub model

Here the # of policyholders varies from risk to risk and year to year. For risk 1, \(m_{11}\) policyholders have incurred an average claim amount of \(X_{11}\) per policyholder in Year 1; \(m_{12}\) policyholders have incurred \(X_{12}\) in Year 2; ...; \(m_{1n_1}\) policyholders have incurred \(X_{1n_1}\) in Year \(n_1\).

For risk 2, \(m_{21}\) policyholders have incurred an average claim amount of \(X_{21}\) per policyholder in Year 1; \(m_{22}\) policyholders have incurred \(X_{22}\) in Year 2; ...; \(m_{2n_2}\) policyholders have incurred \(X_{2n_2}\) in Year \(n_2\). So on and so forth.

Risk  Year 1       Year 2       ...  Last year observed (Year n_i)
1     X_11, m_11   X_12, m_12   ...  X_{1 n_1}, m_{1 n_1}
2     X_21, m_21   X_22, m_22   ...  X_{2 n_2}, m_{2 n_2}
...
r     X_r1, m_r1   X_r2, m_r2   ...  X_{r n_r}, m_{r n_r}

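How to estimate:

For risk i (i = 1, 2, ..., r), observed for \(n_i\) periods:

\[ \text{Total exposure: } m_i = \sum_{t=1}^{n_i} m_{it} \]

\[ \text{Sample mean: } \bar{X}_i = \frac{1}{m_i}\sum_{t=1}^{n_i} m_{it}X_{it} \]

\[ \text{Sample variance: } \hat{\sigma}_i^2 = \frac{1}{n_i - 1}\sum_{t=1}^{n_i} m_{it}\left(X_{it} - \bar{X}_i\right)^2 \]

For all risks combined:

\[ m = \sum_{i=1}^{r} m_i, \qquad \bar{X} = \frac{1}{m}\sum_{i=1}^{r} m_i\bar{X}_i, \qquad \widehat{EV} = \frac{\sum_{i=1}^{r}(n_i - 1)\hat{\sigma}_i^2}{\sum_{i=1}^{r}(n_i - 1)} \]

Step 1 Calculate the sample variance for each risk and the expected process variance for all risks combined:

\[ \hat{\sigma}_i^2 = \frac{1}{n_i - 1}\sum_{t=1}^{n_i} m_{it}\left(X_{it} - \bar{X}_i\right)^2, \qquad \widehat{EV} = \frac{\sum_{i=1}^{r}(n_i - 1)\hat{\sigma}_i^2}{\sum_{i=1}^{r}(n_i - 1)} = \frac{\sum_{i=1}^{r}\sum_{t=1}^{n_i} m_{it}\left(X_{it} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)} \]

Step 2 Calculate VE

\[ \widehat{VE} = \frac{\sum_{i=1}^{r} m_i\left(\bar{X}_i - \bar{X}\right)^2 - (r - 1)\widehat{EV}}{m - \dfrac{1}{m}\sum_{i=1}^{r} m_i^2} \]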

This formula is counter-intuitive and very hard to remember. However, you'll just have to memorize it. Perhaps Dean's explanation might help you a little bit. He says that the crude estimate for VE is

\[ \widehat{VE} = \frac{\sum_{i=1}^{r} m_i\left(\bar{X}_i - \bar{X}\right)^2}{r - 1} \]

However, this estimate is biased. To have an unbiased estimator, we need to change the above estimate to

\[ \widehat{VE} = \frac{\sum_{i=1}^{r} m_i\left(\bar{X}_i - \bar{X}\right)^2 - (r - 1)\widehat{EV}}{m - \dfrac{1}{m}\sum_{i=1}^{r} m_i^2} \]

This isn't a big help on how to memorize the formula. This formula is hard. You'll just have to memorize it.

Final point. Loss Models mentions the concept of the credibility-weighted average premium. It proves that the total loss will be equal to the total premium if we set

\[ \hat{\mu} = \frac{\sum_{i=1}^{r} Z_i\bar{X}_i}{\sum_{i=1}^{r} Z_i} \]

You are given the following information on towing losses for two classes of insured, adults and youths:

Exposures

Year   Adult  Youth  Total
1996   2000   450    2450
1997   1000   250    1250
1998   1000   175    1175
1999   1000   125    1125
Total  5000   1000   6000

Pure Premium

Year              Adult  Youth  Total
1996              0      15     2.755
1997              5      2      4.400
1998              6      15     7.340
1999              4      1      3.667
Weighted Average  3      10     4.167

You are also given that the estimated variance of the hypothetical means is 17.125.

Determine the non-parametric empirical Bayes credibility premium for the youth class, using the method that preserves the total losses.

Solution

\[ \widehat{EV} = \frac{\sum_{i=1}^{r}(n_i - 1)\hat{\sigma}_i^2}{\sum_{i=1}^{r}(n_i - 1)} = \frac{\sum_{i=1}^{r}\sum_{t=1}^{n_i} m_{it}\left(X_{it} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)} \]

\[ = \frac{1}{(4-1) + (4-1)}\Big[2{,}000(0-3)^2 + 1{,}000(5-3)^2 + 1{,}000(6-3)^2 + 1{,}000(4-3)^2 \]
\[ \qquad + 450(15-10)^2 + 250(2-10)^2 + 175(15-10)^2 + 125(1-10)^2\Big] = 12{,}291.7 \]

The variance of the hypothetical means is given: VE = 17.125. (Thank you SOA!)

\[ Z_A = \frac{5{,}000}{5{,}000 + \dfrac{12{,}291.7}{17.125}} = 0.874, \qquad Z_Y = \frac{1{,}000}{1{,}000 + \dfrac{12{,}291.7}{17.125}} = 0.582 \]

\[ \hat{\mu} = \frac{\sum_{i=1}^{r} Z_i\bar{X}_i}{\sum_{i=1}^{r} Z_i} = \frac{0.874(3) + 0.582(10)}{0.874 + 0.582} = 5.8 \]

The non-parametric empirical Bayes credibility premium for the youth class is:

\[ Z_Y\bar{X}_Y + (1 - Z_Y)\hat{\mu} = 0.582(10) + (1 - 0.582)(5.8) = 8.24 \]

Let's verify that the total credibility premium is equal to the total loss:

The non-parametric empirical Bayes credibility premium for the adult class is:

\[ Z_A\bar{X}_A + (1 - Z_A)\hat{\mu} = 0.874(3) + (1 - 0.874)(5.8) = 3.35 \]

Total credibility premium: 1,000(8.24) + 5,000(3.35) ≈ 25,000

Total loss:

Adult: 2,000(0) + 1,000(5) + 1,000(6) + 1,000(4) = 15,000

Or 5,000 (total exposure) × 3 (average premium per exposure) = 15,000

Youth: 450(15) + 250(2) + 175(15) + 125(1) = 10,000

Or 1,000 (total exposure) × 10 (average premium per exposure) = 10,000

Total: 25,000
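The whole procedure, including the VE formula that the problem hands to us, can be sketched in code. The function below is an illustrative implementation of the nonparametric empirical Bayes steps for the Bühlmann-Straub model with the credibility-weighted mean; its name and return layout are assumptions made for this sketch. Applied to the towing data, it should reproduce the given VE of 17.125.

```python
def empirical_bayes_buhlmann_straub(exposures, averages):
    """exposures[i][t] and averages[i][t] for risk i, period t.
    Returns (EV, VE, Z list, credibility-weighted mean, premium list)."""
    r = len(exposures)
    m_i = [sum(ws) for ws in exposures]
    m = sum(m_i)
    xbar_i = [sum(w * x for w, x in zip(ws, xs)) / mi
              for ws, xs, mi in zip(exposures, averages, m_i)]
    xbar = sum(mi * xi for mi, xi in zip(m_i, xbar_i)) / m

    within = sum(w * (x - xi) ** 2
                 for ws, xs, xi in zip(exposures, averages, xbar_i)
                 for w, x in zip(ws, xs))
    EV = within / sum(len(ws) - 1 for ws in exposures)
    between = sum(mi * (xi - xbar) ** 2 for mi, xi in zip(m_i, xbar_i))
    VE = (between - (r - 1) * EV) / (m - sum(mi**2 for mi in m_i) / m)

    k = EV / VE
    Z = [mi / (mi + k) for mi in m_i]
    mu_hat = sum(z * xi for z, xi in zip(Z, xbar_i)) / sum(Z)
    premiums = [z * xi + (1 - z) * mu_hat for z, xi in zip(Z, xbar_i)]
    return EV, VE, Z, mu_hat, premiums

exposures = [[2000, 1000, 1000, 1000], [450, 250, 175, 125]]   # adult, youth
pure_premiums = [[0, 5, 6, 4], [15, 2, 15, 1]]
EV, VE, Z, mu_hat, premiums = empirical_bayes_buhlmann_straub(exposures, pure_premiums)
total_premium = 5000 * premiums[0] + 1000 * premiums[1]
print(round(EV, 1), round(VE, 3), [round(z, 3) for z in Z], round(premiums[1], 2), round(total_premium, 2))
```

The sketch does not guard against a negative or zero VE estimate; a production version would need to handle that case by setting Z to 0.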


May 2001 #32

You are given the following experience for two insured groups:

                                      Year
Group                                 1    2    3    Total
1      # of members                   8    12   5    25
       Average loss per member        96   91   113  97
2      # of members                   25   30   20   75
       Average loss per member        113  111  116  113
Total  # of members                                  100
       Average loss per member                       109

\[ \sum_{i=1}^{2}\sum_{j=1}^{3} m_{ij}\left(x_{ij} - \bar{x}_i\right)^2 = 2020 \]

\[ \sum_{i=1}^{2} m_i\left(\bar{x}_i - \bar{x}\right)^2 = 4800 \]

Determine the nonparametric empirical Bayes credibility premium for group 1, using the method that preserves the total loss.

Solution

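\[
EV = \frac{\sum_{i=1}^{r} (n_i - 1)\,\hat v_i}{\sum_{i=1}^{r} (n_i - 1)}
   = \frac{\sum_{i=1}^{r} \sum_{j=1}^{n_i} m_{ij} \left( X_{ij} - \bar X_i \right)^2}{\sum_{i=1}^{r} (n_i - 1)}
   = \frac{2020}{2 + 2} = 505
\]

\[
VE = \frac{\sum_{i=1}^{r} m_i \left( \bar X_i - \bar X \right)^2 - (r - 1)\,EV}{m - \dfrac{1}{m} \sum_{i=1}^{r} m_i^2}
   = \frac{4800 - (2 - 1)\,505}{100 - \dfrac{1}{100}\left( 25^2 + 75^2 \right)} = 114.533
\]

\[
k = \frac{EV}{VE} = \frac{505}{114.533} = 4.409
\]

\[
Z_1 = \frac{m_1}{m_1 + k} = \frac{25}{25 + 4.409} = 0.85, \qquad
Z_2 = \frac{m_2}{m_2 + k} = \frac{75}{75 + 4.409} = 0.944
\]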

Please don't write $Z_1 = \dfrac{n}{n + k}$. As mentioned before, in the Bühlmann-Straub model, $Z = \dfrac{m}{m + k}$, not $Z = \dfrac{n}{n + k}$.

\[
\hat\mu = \frac{\sum_{i=1}^{r} Z_i \bar X_i}{\sum_{i=1}^{r} Z_i}
        = \frac{0.85(97) + 0.944(113)}{0.85 + 0.944} = 105.42
\]

\[
Z_1 \bar X_1 + (1 - Z_1)\,\hat\mu = 0.85(97) + (1 - 0.85)(105.42) = 98.26
\]
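
As a cross-check, the same steps can be run from the summary statistics given in the problem. This is a minimal sketch; the variable names are illustrative and the inputs are exactly the figures quoted above.

```python
# Summary inputs as stated in the problem (May 2001 #32)
m = [25, 75]          # total members per group
xbar = [97, 113]      # average loss per member, by group
within_ss = 2020      # sum_i sum_j m_ij (x_ij - xbar_i)^2
between_ss = 4800     # sum_i m_i (xbar_i - xbar)^2
n_years = [3, 3]      # years of data per group

r = len(m)
m_total = sum(m)

EV = within_ss / sum(n - 1 for n in n_years)
VE = (between_ss - (r - 1) * EV) / (m_total - sum(mi ** 2 for mi in m) / m_total)
k = EV / VE

# Buhlmann-Straub: credibility uses exposure m, not the number of years n
Z = [mi / (mi + k) for mi in m]

# Credibility-weighted mean, so that total losses are preserved
mu_hat = sum(z * x for z, x in zip(Z, xbar)) / sum(Z)
premium_1 = Z[0] * xbar[0] + (1 - Z[0]) * mu_hat

print(EV, round(VE, 3), round(k, 3), [round(z, 3) for z in Z])
print(round(mu_hat, 2), round(premium_1, 2))
```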

You are making credibility estimates for regional rating factors. You observe that the Bühlmann-Straub nonparametric empirical Bayes method can be applied, with the rating factor playing the role of pure premium. $X_{ij}$ denotes the rating factor for region $i$ and year $j$, where $i = 1, 2, 3$ and $j = 1, 2, 3, 4$. Corresponding to each rating factor is the number of reported claims, $m_{ij}$, measuring exposure.

The table columns are:

\[
m_i = \sum_{j=1}^{4} m_{ij}, \qquad
\bar X_i = \frac{1}{m_i} \sum_{j=1}^{4} m_{ij} X_{ij}, \qquad
\hat v_i = \frac{1}{3} \sum_{j=1}^{4} m_{ij} \left( X_{ij} - \bar X_i \right)^2, \qquad
m_i \left( \bar X_i - \bar X \right)^2
\]

i    m_i    X̄_i      v̂_i      m_i (X̄_i − X̄)²
1    50     1.406    0.536    0.887
2    300    1.298    0.125    0.191
3    150    1.178    0.172    1.348

Determine the credibility estimate of the rating factor for region 1 using the method that preserves $\sum_{i=1}^{3} m_i \bar X_i$.

Solution

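\[
EV = \frac{\sum_{i=1}^{r} (n_i - 1)\,\hat v_i}{\sum_{i=1}^{r} (n_i - 1)}
   = \frac{\sum_{i=1}^{r} \sum_{j=1}^{n_i} m_{ij} \left( X_{ij} - \bar X_i \right)^2}{\sum_{i=1}^{r} (n_i - 1)}
   = \frac{3(0.536) + 3(0.125) + 3(0.172)}{(4-1) + (4-1) + (4-1)} = 0.2777
\]

\[
VE = \frac{\sum_{i=1}^{r} m_i \left( \bar X_i - \bar X \right)^2 - (r - 1)\,EV}{m - \dfrac{1}{m} \sum_{i=1}^{r} m_i^2}
   = \frac{0.887 + 0.191 + 1.348 - 0.2777(2)}{500 - \dfrac{1}{500}\left( 50^2 + 300^2 + 150^2 \right)} = 0.006928
\]

\[
k = \frac{EV}{VE} = \frac{0.2777}{0.006928} = 40.0829
\]

\[
Z_1 = \frac{m_1}{m_1 + k} = \frac{50}{50 + 40.0829} = 0.555
\]

\[
Z_2 = \frac{m_2}{m_2 + k} = \frac{300}{300 + 40.0829} = 0.8821
\]

\[
Z_3 = \frac{m_3}{m_3 + k} = \frac{150}{150 + 40.0829} = 0.789
\]

\[
\hat\mu = \frac{\sum_{i=1}^{3} Z_i \bar X_i}{\sum_{i=1}^{3} Z_i}
        = \frac{0.555(1.406) + 0.8821(1.298) + 0.789(1.178)}{0.555 + 0.8821 + 0.789} = 1.2824
\]

\[
Z_1 \bar X_1 + (1 - Z_1)\,\hat\mu = 0.555(1.406) + (1 - 0.555)(1.2824) = 1.35
\]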

You are given the following commercial automobile policy experience:

Policyholder
I      Losses              50,000     50,000     ?
       # of automobiles    100        200        ?
II     Losses              ?          150,000    150,000
       # of automobiles    ?          500        300
III    Losses              150,000    ?          150,000
       # of automobiles    50         ?          150

Solution

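I      X11 = 50,000/100 = 500       X12 = 50,000/200 = 250       X̄1 = 100,000/300 = 333.33
II     X21 = 150,000/500 = 300      X22 = 150,000/300 = 500      X̄2 = 300,000/800 = 375
III    X31 = 150,000/50 = 3,000     X32 = 150,000/150 = 1,000    X̄3 = 300,000/200 = 1,500

\[
\bar X = \frac{100{,}000 + 300{,}000 + 300{,}000}{300 + 800 + 200} = 538.46
\]

\[
EV = \frac{\sum_{i=1}^{r} (n_i - 1)\,\hat v_i}{\sum_{i=1}^{r} (n_i - 1)}
   = \frac{\sum_{i=1}^{r} \sum_{j=1}^{n_i} m_{ij} \left( X_{ij} - \bar X_i \right)^2}{\sum_{i=1}^{r} (n_i - 1)}
\]

\[
= \frac{1}{(2-1) + (2-1) + (2-1)} \Big[ 100(500 - 333.33)^2 + 200(250 - 333.33)^2
\]
\[
\qquad + 500(300 - 375)^2 + 300(500 - 375)^2
\]
\[
\qquad + 50(3{,}000 - 1{,}500)^2 + 150(1{,}000 - 1{,}500)^2 \Big] = 53{,}888{,}889
\]

\[
VE = \frac{\sum_{i=1}^{r} m_i \left( \bar X_i - \bar X \right)^2 - (r - 1)\,EV}{m - \dfrac{1}{m} \sum_{i=1}^{r} m_i^2}
\]

\[
= \frac{300(333.33 - 538.46)^2 + 800(375 - 538.46)^2 + 200(1{,}500 - 538.46)^2 - 53{,}888{,}889\,(3 - 1)}{1{,}300 - \dfrac{1}{1{,}300}\left( 300^2 + 800^2 + 200^2 \right)} = 157{,}035.6
\]

\[
k = \frac{53{,}888{,}889}{157{,}035.6} = 343.16, \qquad Z = \frac{200}{200 + 343.16} = 0.368
\]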
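
The same estimate can be computed directly from the raw losses and exposures. The sketch below lists only the observed years for each policyholder; the dictionary layout and names are illustrative assumptions, not the author's notation. Because it carries full precision, VE and k differ from the hand calculation in the later digits.

```python
# Losses and number of automobiles by policyholder; only observed years are listed.
losses = {"I": [50_000, 50_000], "II": [150_000, 150_000], "III": [150_000, 150_000]}
autos = {"I": [100, 200], "II": [500, 300], "III": [50, 150]}

m = {g: sum(autos[g]) for g in autos}                 # total exposure per policyholder
xbar = {g: sum(losses[g]) / m[g] for g in autos}      # average loss per automobile
m_total = sum(m.values())
x_all = sum(sum(v) for v in losses.values()) / m_total

# Within-policyholder sum of squares -> EV
within = sum(a * (l / a - xbar[g]) ** 2
             for g in autos
             for l, a in zip(losses[g], autos[g]))
EV = within / sum(len(autos[g]) - 1 for g in autos)

# Between-policyholder sum of squares -> VE
between = sum(m[g] * (xbar[g] - x_all) ** 2 for g in m)
VE = (between - (len(m) - 1) * EV) / (m_total - sum(v ** 2 for v in m.values()) / m_total)

k = EV / VE
Z_III = m["III"] / (m["III"] + k)
print(round(EV), round(VE, 1), round(k, 2), round(Z_III, 3))
```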


May 2005 #25

Group Year 1 Year 2 Year 3 Total

Total Claims 1 10,000 15,000 25,000

# in Group 50 60 110

Average 200 250 227.27

Total Claims 2 16,000 18,000 34,000

# in Group 100 90 190

Average 160 200 178.95

Total Claims 59,000

# in Group 300

Average 196.67

Use the nonparametric empirical Bayes method to estimate the credibility factor for

Group 1.

Solution

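\[
EV = \frac{\sum_{i=1}^{r} (n_i - 1)\,\hat v_i}{\sum_{i=1}^{r} (n_i - 1)}
   = \frac{\sum_{i=1}^{r} \sum_{j=1}^{n_i} m_{ij} \left( X_{ij} - \bar X_i \right)^2}{\sum_{i=1}^{r} (n_i - 1)}
\]

\[
= \frac{1}{(2-1) + (2-1)} \Big[ 50(200 - 227.27)^2 + 60(250 - 227.27)^2
\]
\[
\qquad + 100(160 - 178.95)^2 + 90(200 - 178.95)^2 \Big] = 71{,}985.65
\]

\[
Z_1 = \frac{110}{110 + \dfrac{71{,}985.65}{651.03}} = 0.5
\]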

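Semi-parametric Bayes estimate

We have a parametric model for $X$ given $\lambda$, but we don't have a parametric model for $\lambda$ (hence the name semi-parametric). Typically, a problem will tell us that $X$ is a Poisson random variable with mean $\lambda$. Because the Poisson mean and variance are equal,

\[
\mu = E(\lambda) = EV .
\]

Since $Var(X) = EV + VE$,

\[
VE = Var(X) - EV .
\]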

The number of claims a driver has during the year is assumed to be Poisson distributed

with an unknown mean that varies by driver.

0 54

1 33

2 10

3 2

4 1

Determine the credibility of one year's experience for a single driver using

semiparametric empirical Bayes estimation.

Solution

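Let $X$ represent the number of claims in a year and let $\lambda$ represent the mean of $X$. We are told that $X$ given $\lambda$ is a Poisson random variable.

\[
\mu = E(X) = E_\lambda\left[ E(X \mid \lambda) \right] = E_\lambda(\lambda) = E(\lambda)
\]

\[
EV = E_\lambda\left[ Var(X \mid \lambda) \right] = E_\lambda(\lambda) = E(\lambda)
\]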


\[
\hat\mu = \bar X = \frac{54(0) + 33(1) + 10(2) + 2(3) + 1(4)}{54 + 33 + 10 + 2 + 1} = \frac{63}{100} = 0.63
\]

\[
EV = \hat\mu = 0.63
\]

\[
Var(X) = \frac{1}{100 - 1} \sum_{i=1}^{100} \left( X_i - \bar X \right)^2
\]

\[
= \frac{54(0 - 0.63)^2 + 33(1 - 0.63)^2 + 10(2 - 0.63)^2 + 2(3 - 0.63)^2 + 1(4 - 0.63)^2}{100 - 1} = 0.68
\]

\[
Var(X) = EV + VE, \qquad VE = Var(X) - EV = 0.68 - 0.63 = 0.05
\]

\[
Z = \frac{n}{n + \dfrac{EV}{VE}} = \frac{1}{1 + \dfrac{0.63}{0.05}} = 0.073
\]
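
The semiparametric calculation is mechanical enough to script. The following is a sketch that assumes the frequency table from the problem (number of claims mapped to number of drivers); the names are illustrative. Because it does not round the variance to 0.68, the result differs slightly in the fourth decimal.

```python
# Frequency table: number of claims -> number of drivers
counts = {0: 54, 1: 33, 2: 10, 3: 2, 4: 1}

n_drivers = sum(counts.values())
mean = sum(x * f for x, f in counts.items()) / n_drivers

# Unbiased sample variance (divide by n - 1)
var = sum(f * (x - mean) ** 2 for x, f in counts.items()) / (n_drivers - 1)

EV = mean              # Poisson: E[Var(X | lambda)] = E[lambda], estimated by the sample mean
VE = var - EV          # because Var(X) = EV + VE
Z = 1 / (1 + EV / VE)  # credibility of one year of experience (n = 1)

print(round(mean, 2), round(var, 4), round(VE, 4), round(Z, 4))
```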

When taking the exam, you should use BA II Plus/ Professional 1-V Statistics Worksheet

to quickly calculate the sample mean and the sample variance.

Nov 2000 #7

The following information comes from a study of robberies of convenience stores over

the course of a year:

$\sum X_i = 50$

$\sum X_i^2 = 220$

The number of robberies of a given convenience store during the year is assumed

to be Poisson distributed with an unknown mean that varies by store.

robberies next year of a store that reported no robberies during the studied year.

Solution

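\[
EV = \hat\mu = \bar X = \frac{50}{500} = 0.1
\]

\[
Var(X) = \frac{1}{n - 1} \sum_{i=1}^{500} \left( X_i - \bar X \right)^2
\]

We use the shortcut

\[
\sum_{i=1}^{n} \left( X_i - \bar X \right)^2 = \sum_{i=1}^{n} X_i^2 - n \bar X^2
\]

To see why this formula works, notice that the (biased) sample variance is:

\[
\frac{1}{n} \sum_{i=1}^{n} \left( X_i - \bar X \right)^2 = E\left( X^2 \right) - E^2(X) = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar X^2
\]

Multiplying both sides by $n$:

\[
\sum_{i=1}^{n} \left( X_i - \bar X \right)^2 = \sum_{i=1}^{n} X_i^2 - n \bar X^2
\]

\[
Var(X) = \frac{1}{500 - 1} \sum_{i=1}^{500} \left( X_i - \bar X \right)^2
       = \frac{1}{500 - 1} \left( \sum_{i=1}^{500} X_i^2 - 500 \bar X^2 \right)
       = \frac{220 - 5}{499} = 0.43086
\]

\[
VE = Var(X) - EV = 0.43086 - 0.1 = 0.33086
\]

\[
Z = \frac{n}{n + \dfrac{EV}{VE}} = \frac{1}{1 + \dfrac{0.1}{0.33086}} = 0.768
\]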

The single store didn't have any robbery incidents during the studied year. So the sample mean is zero.

For a portfolio of motorcycle insurance policyholders, you are given:

The number of claims for each policyholder has a conditional Poisson distribution

For Year 1, the following data are observed:

Number of claims Number of Policyholders

0 2000

1 600

2 300

3 80

4 20

Total 3000

Solution

X01=0, Y01=2000

X02=1, Y02= 600

X03=2, Y03= 300

X04=3, Y04= 80

X05=4, Y05= 20

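The sample standard deviation is $S_X = 0.83077411$.

The sample variance is $S_X^2 = 0.83077411^2 = 0.69018562 \approx 0.69019$. This is $Var(X)$.

\[
VE = Var(X) - EV = 0.69019 - 0.507 = 0.183
\]

\[
Z = \frac{n}{n + \dfrac{EV}{VE}} = \frac{1}{1 + \dfrac{0.507}{0.183}} = 0.265
\]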

During a 2-year period, 100 policies had the following claim experience:

Number of claims in Year 1 and Year 2 Number of Policyholders

0 50

1 30

2 15

3 4

4 1


The number of claims per year follows a Poisson distribution.

Each policyholder was insured for the entire 2-year period.

A randomly selected policyholder had one claim over the 2-year period.

Using semiparametric empirical Bayes estimation, determine the Bühlmann estimate for

the number of claims in Year 3 for the same policyholder.

Solution

We'll use a 2-year period as one unit of time. So we'll calculate the Bühlmann estimate of the number of claims in Year 3 and Year 4 combined. Then half of this amount will be the Bühlmann estimate for the number of claims in Year 3.

X01=0, Y01=50

X02=1, Y02=30

X03=2, Y03=15

X04=3, Y04= 4

X05=4, Y05= 1

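The sample mean is $\bar X = 0.76$. This is $\hat\mu$ and $EV$.

The sample standard deviation is $S_X = 0.92244734$.

The sample variance is $S_X^2 = 0.92244734^2 = 0.85090909 \approx 0.851$. This is $Var(X)$.

\[
VE = Var(X) - EV = 0.851 - 0.76 = 0.091
\]

\[
Z = \frac{n}{n + \dfrac{EV}{VE}} = \frac{1}{1 + \dfrac{0.76}{0.091}} = 0.107
\]

A randomly selected policyholder had one claim over the 2-year period. So the sample claim frequency for the 2-year unit is 1, and the Bühlmann estimate for Year 3 and Year 4 combined is

\[
P = Z(1) + (1 - Z)(0.76) = 0.107(1) + (1 - 0.107)(0.76) = 0.786
\]

The Bühlmann estimate for Year 3 alone is

\[
\frac{1}{2} P = \frac{1}{2}(0.786) = 0.393
\]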

Chapter 8 Limited fluctuation credibility

The study note titled "Chapter 8 Credibility," jointly written by Mahler and Dean, provides an excellent explanation of the limited fluctuation credibility theory. Please read this study note along with my explanation.

The goal of the limited fluctuation credibility model is the same as the goal of the Bühlmann credibility model. We observe that a policyholder has incurred $S_1, S_2, \ldots, S_n$ claim dollar amounts in Year $1, 2, \ldots, n$ respectively. We want to estimate the policyholder's renewal premium in Year $n + 1$. The renewal premium in Year $n + 1$ is $E(S_{n+1} \mid S_1, S_2, \ldots, S_n)$, the expected claim dollar amount in Year $n + 1$.

in the Bühlmann credibility model. There's a reason for using a different notation. In the limited fluctuation credibility model, we typically break down the annual claim dollar amount $S$ into two components:

the claim dollar amount per loss incurred by a policyholder in a year (loss severity)

Mathematically, $S = \sum_{i=1}^{N} X_i$. Here $N$ is the total number of claims incurred in a year (loss frequency) by a policyholder. $X_i$ is the claim dollar amount of the $i$-th claim (loss severity) incurred by the policyholder. $S$ is the total claim dollar amount incurred in a year (also called the annual aggregate claim) by the policyholder. In contrast, in the Bühlmann credibility model, we don't break down the annual claim dollar amount into loss frequency and loss severity.

model, that the renewal premium is the weighted average of the global premium rate $\mu$ (called the manual rate) and the sample mean $\bar S = \frac{1}{n}(S_1 + S_2 + \cdots + S_n)$:

\[
\underbrace{P = E(S_{n+1} \mid S_1, S_2, \ldots, S_n)}_{\text{renewal premium}}
= Z \underbrace{\bar S}_{\text{policyholder-specific sample mean}}
+ (1 - Z) \underbrace{\mu}_{\text{global mean (manual rate)}}
\]

amounts $S_1, S_2, \ldots, S_n$ and hence different $\bar S$. However, $\mu$ is the same for all policyholders regardless of their different claim history.

The limited fluctuation credibility assumes that the above renewal premium equation automatically holds true without any proof. This equation is the starting point for the limited fluctuation credibility. So when you study the limited fluctuation credibility, you'll need to accept the above equation without demanding proof.

In contrast, the Bühlmann credibility theory doesn't assume the above equation holds true automatically. It derives this equation using basic probability theory.

assigned to the prior sample mean $\bar S$. The limited fluctuation credibility calculates $Z$ as follows:

\[
Z = \sqrt{\frac{\text{number of observations you have}}{\text{expected number of observations needed to make } Z = 1}}
  = \sqrt{\frac{\text{your } n}{E(N) \text{ to make } Z = 1}}
\]

Once again, the limited fluctuation credibility assumes that $Z = \sqrt{\dfrac{\text{your } n}{E(N) \text{ to make } Z = 1}}$ holds true automatically without the need to prove it. So you need to accept it without demanding any proof. The core theory of the limited fluctuation credibility is to calculate $E(N)$ to make $Z = 1$.

We first derive a model for r insureds. Then to calculate the renewal premium for one

insured, we just set r = 1 .

\[
S = \sum_{i=1}^{M} X_i = X_1 + X_2 + \cdots + X_M
\]

Here $X_i$ is the dollar amount of the $i$-th claim. $M = \sum_{j=1}^{r} N_j = N_1 + N_2 + \cdots + N_r$ is the total number of annual claims for $r$ insureds; $N_j$ is the number of claims incurred by the $j$-th insured.

$X_1, X_2, \ldots, X_M$ are independent identically distributed with a common pdf $f_X(x)$; $N_1, N_2, \ldots, N_r$ are independent identically distributed with a common pdf $f_N(n)$.

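We arbitrarily set $Z = 1$ if $E(M)$ satisfies the following equation:

\[
P\Big( \left| S - E(S) \right| \le k\,E(S) \Big) \ge p
\quad\Longleftrightarrow\quad
P\left( \left| \frac{S - E(S)}{\sigma_S} \right| \le k\,\frac{E(S)}{\sigma_S} \right) \ge p
\]

A simplifying assumption is that $\dfrac{S - E(S)}{\sigma_S}$ is approximately normal. Set $Z = \dfrac{S - E(S)}{\sigma_S}$. Then $Z$ is approximately a standard normal random variable. (In this derivation only, $Z$ denotes the standardized variable, not the credibility factor.)

\[
P\left( |Z| \le k\,\frac{E(S)}{\sigma_S} \right) \ge p
\]

For any $a > 0$,

\[
P\left( |Z| \le a \right) = P(-a \le Z \le a) = \Phi(a) - \Phi(-a) = 2\Phi(a) - 1 ,
\]

so

\[
P\left( |Z| \le k\,\frac{E(S)}{\sigma_S} \right) = 2\Phi\left( k\,\frac{E(S)}{\sigma_S} \right) - 1 \ge p
\]

Let's consider the worst case $P\big( |S - E(S)| \le k\,E(S) \big) = p$, or $2\Phi\left( k\,\dfrac{E(S)}{\sigma_S} \right) - 1 = p$. We have

\[
2\Phi\left( k\,\frac{E(S)}{\sigma_S} \right) = 1 + p , \qquad
k\,\frac{E(S)}{\sigma_S} = \Phi^{-1}\left( \frac{1 + p}{2} \right)
\]

Define $CV_S = \dfrac{\sigma_S}{E(S)}$ as the coefficient of variation. It's the standard deviation divided by the mean. Then

\[
\frac{k}{CV_S} = \Phi^{-1}\left( \frac{1 + p}{2} \right) , \qquad
k = \Phi^{-1}\left( \frac{1 + p}{2} \right) CV_S .
\]

Next, define $\Phi(y) = \dfrac{1 + p}{2}$, or $y = \Phi^{-1}\left( \dfrac{1 + p}{2} \right)$. Then $k = y\,CV_S$.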
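
On the exam you read $y$ from the normal table, but it is easy to confirm the usual values numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `y_value` is just an illustrative choice.

```python
from statistics import NormalDist

def y_value(p: float) -> float:
    """Return y such that Phi(y) = (1 + p) / 2, i.e. P(|Z| <= y) = p for standard normal Z."""
    return NormalDist().inv_cdf((1 + p) / 2)

for p in (0.90, 0.95, 0.99):
    print(p, round(y_value(p), 3))
```

For p = 90% and p = 95% this gives the familiar 1.645 and 1.96.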

Key interim formula: credibility for the aggregate loss

As actuaries, we set $k$ and $p$. Then we find $E(N)$ to make $Z = 1$ by solving the equation $\dfrac{k}{CV_S} = \Phi^{-1}\left( \dfrac{1 + p}{2} \right)$, or $k = \Phi^{-1}\left( \dfrac{1 + p}{2} \right) CV_S = y\,CV_S$.

\[
E(M) = E(N_1 + N_2 + \cdots + N_r) = r\,E(N)
\]

\[
Var(M) = Var(N_1 + N_2 + \cdots + N_r) = r\,Var(N)
\]

\[
E(S) = E(X_1 + X_2 + \cdots + X_M) = E(M)\,E(X) = r\,E(N)\,E(X)
\]

\[
Var(S) = Var(X_1 + X_2 + \cdots + X_M) = E(M)\,Var(X) + Var(M)\,E^2(X)
\]
\[
= r\,E(N)\,Var(X) + r\,Var(N)\,E^2(X) = r \left[ E(N)\,Var(X) + Var(N)\,E^2(X) \right]
\]

\[
CV_S = \frac{\sigma_S}{E(S)} = \frac{\sqrt{r \left[ E(N)\,Var(X) + Var(N)\,E^2(X) \right]}}{r\,E(N)\,E(X)}
\]

\[
k = y\,CV_S = y\,\frac{\sqrt{r \left[ E(N)\,Var(X) + Var(N)\,E^2(X) \right]}}{r\,E(N)\,E(X)}
\]

\[
\left( \frac{k}{y} \right)^2 = \frac{r \left[ E(N)\,Var(X) + Var(N)\,E^2(X) \right]}{\left[ r\,E(N)\,E(X) \right]^2}
= \frac{1}{r\,E(N)} \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
\]

\[
r\,E(N) = \left( \frac{y}{k} \right)^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
\]

Final formula you need to memorize

You also need to know how to derive it from scratch. This is the mother of all the formulas for the limited fluctuation credibility model.

\[
\text{If } r\,E(N) = \left( \frac{y}{k} \right)^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
= \left( \frac{y}{k} \right)^2 \left[ CV_X^2 + \frac{Var(N)}{E(N)} \right], \text{ then } Z = 1 .
\]

Please note that $r$ is the number of insureds needed to achieve full credibility. $E(N)$ is the number of annual claims per insured. So $r\,E(N)$ represents the expected number of claims the insurer needs to have in its book of business to have full credibility.

\[
\underbrace{\underbrace{r}_{\substack{\text{number of insureds in} \\ \text{the book of business}}}
\times \underbrace{E(N)}_{\substack{\text{expected number of} \\ \text{claims per insured}}}}_{\substack{\text{the expected number of claims the insurer} \\ \text{needs to have in its book of business} \\ \text{to have full credibility}}}
= \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
\]

\[
r\,E(N) = \left( \frac{y}{k} \right)^2 \left[ \frac{Var(X)}{E(X)} + \frac{Var(N)}{E(N)} \right] \qquad \text{Wrong!}
\]

To remember the term $\dfrac{Var(X)}{E^2(X)}$, please note that $X$ is the claim dollar amount. So $E(X)$ is a dollar amount and $Var(X)$ is dollars squared. To have a meaningful ratio, we need to square $E(X)$ so the numerator and denominator are both in dollars squared.

Please also note that $\dfrac{Var(N)}{E(N)}$ is fine. Here $N$ is the claim number. So $Var(N)$ is a number; $E(N)$ is a number. So the ratio $\dfrac{Var(N)}{E(N)}$ is fine.

Once again, remember that X is the dollar amount of a single claim incurred by one

policyholder and that N is the annual number of claims incurred by the policyholder.
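
The full-credibility formula is a one-liner once $y$ is known. Here is a minimal sketch; the function name `full_credibility_claims` and its parameter names are illustrative assumptions, and $y$ is computed exactly rather than rounded to table values, so results can differ slightly from hand calculations that use 1.645 or 1.96.

```python
from statistics import NormalDist

def full_credibility_claims(k, p, cv_x_sq, var_to_mean_n):
    """Expected number of claims r * E(N) needed for Z = 1.

    k              -- allowed relative error (e.g. 0.05)
    p              -- required probability (e.g. 0.90)
    cv_x_sq        -- Var(X) / E(X)^2, squared coefficient of variation of severity
    var_to_mean_n  -- Var(N) / E(N), variance-to-mean ratio of frequency
    """
    y = NormalDist().inv_cdf((1 + p) / 2)
    return (y / k) ** 2 * (cv_x_sq + var_to_mean_n)

# Illustrative inputs: Poisson frequency (ratio 1) and constant severity (CV 0),
# so only the claim count fluctuates.
print(round(full_credibility_claims(0.05, 0.90, 0.0, 1.0), 1))

# Number of insureds needed if each insured is expected to have E(N) = 0.4 claims
# (a hypothetical value chosen only to show the division by E(N)).
print(round(full_credibility_claims(0.05, 0.90, 0.0, 1.0) / 0.4, 1))
```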

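Special case

Credibility formulas for the aggregate loss for one insured (credibility in terms of the expected number of annual claims)

Set $r = 1$.

\[
\text{If } E(N) = \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right], \text{ then } Z = 1 .
\]

\[
Z = \min\left( \sqrt{\frac{\text{your } n}{E(N) \text{ to make } Z = 1}},\ 1 \right)
  = \min\left( \sqrt{\frac{\text{your } n}{n_0 \left[ CV_X^2 + \dfrac{Var(N)}{E(N)} \right]}},\ 1 \right),
\qquad n_0 = \left( \frac{y}{k} \right)^2
\]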

You are given:

Claim counts follow a Poisson distribution

Claim sizes follow a lognormal distribution with coefficient of variation of 3

Claim sizes and claim counts are independent

The number of claims in the 1st year is 1,000

The aggregate loss in the 1st year was 6.75 million

The manual premium for the 1st year was 5 million

The exposure in the 2nd year is identical to the exposure in the 1st year

The full credibility standard is to be within 5% of the expected aggregate loss

95% of the time

Determine the limited fluctuation credibility net premium (in millions) for the 2nd year.

Solution

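We are asked to find the limited fluctuation credibility renewal net premium for Year 2. So we are just concerned with one policy (or one insured). Set $r = 1$.

\[
E(N) = \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
= \left[ \frac{\Phi^{-1}\left( \frac{1 + 95\%}{2} \right)}{5\%} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
\]

We are told that the claim size $X$ is lognormal with a coefficient of variation of 3. The information that $X$ is lognormal is not needed. SOA just wants to scare us. What matters is $CV_X$. We are told that $CV_X = 3$.

In addition, we know that $N$ is Poisson. So $\dfrac{Var(N)}{E(N)} = 1$.

\[
E(N) = \left( \frac{1.96}{5\%} \right)^2 \left( 3^2 + 1 \right) = 10 \left( \frac{1.96}{5\%} \right)^2
\]

\[
Z = \min\left( \sqrt{\frac{\text{your } n}{E(N) \text{ to make } Z = 1}},\ 1 \right)
  = \min\left( \sqrt{\frac{1000}{10 \left( \dfrac{1.96}{5\%} \right)^2}},\ 1 \right)
  = \min\left( \frac{10\,(5\%)}{1.96},\ 1 \right) = 0.255
\]

\[
\underbrace{P = E(S_{n+1} \mid S_1, S_2, \ldots, S_n)}_{\text{renewal premium}}
= Z \underbrace{\bar S}_{\text{policyholder-specific sample mean}}
+ (1 - Z) \underbrace{\mu}_{\text{global mean (manual rate)}}
\]

= 0.255 × 6.75 + (1 − 0.255) × 5 = 5.446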
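
The square-root rule and the premium blend can be reproduced in a few lines. This sketch plugs in the figures from the problem; the variable names are illustrative, and $y$ is computed exactly instead of being rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist

k, p = 0.05, 0.95          # full credibility standard: within 5%, 95% of the time
cv_x = 3.0                 # coefficient of variation of claim size
var_to_mean_n = 1.0        # Poisson frequency: Var(N) / E(N) = 1
observed_claims = 1000     # number of claims in the 1st year
observed_loss = 6.75       # aggregate loss in the 1st year (millions)
manual_premium = 5.0       # manual premium (millions)

y = NormalDist().inv_cdf((1 + p) / 2)
n_full = (y / k) ** 2 * (cv_x ** 2 + var_to_mean_n)   # expected claims for Z = 1

# Partial credibility by the square-root rule, capped at 1
Z = min(sqrt(observed_claims / n_full), 1.0)
premium = Z * observed_loss + (1 - Z) * manual_premium

print(round(n_full, 1), round(Z, 3), round(premium, 3))
```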

For each individual insured, the number of claims follows a Poisson distribution

The mean claim count varies by insured, and the distribution of mean claim

counts follow a gamma distribution

For a random sample of 1000 insureds, the observed claim counts are as follows:

# of claims, n          0      1      2      3     4     5
# of insureds, f_n      512    307    123    41    11    6

$\sum n f_n = 750$, $\sum n^2 f_n = 1494$

Claim sizes follow a Pareto distribution with mean 1500 and variance 6,750,000.


Claim sizes and claim counts are independent

The full credibility standard is to be within 5% of the expected aggregate loss 95% of the time.

Determine the minimum number of insureds needed for the aggregate loss to be fully

credible.

Solution

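\[
r\,E(N) = \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
\]

\[
r = \frac{1}{E(N)} \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
\]

\[
\frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} = \frac{\Phi^{-1}\left( \frac{1 + 95\%}{2} \right)}{5\%} = \frac{1.96}{5\%}
\]

We know that $CV_X = \dfrac{\sigma_X}{E(X)} = \dfrac{\sqrt{6{,}750{,}000}}{1500}$, so $CV_X^2 = \dfrac{6{,}750{,}000}{1500^2} = 3$.

We can use the method of moments to estimate $\dfrac{Var(N)}{E(N)}$.

\[
E(N) = \frac{\sum n f_n}{\sum f_n} = \frac{750}{1000} = 0.75 , \qquad
E\left( N^2 \right) = \frac{\sum n^2 f_n}{\sum f_n} = \frac{1{,}494}{1000} = 1.494
\]

\[
Var(N) = 1.494 - 0.75^2 = 0.9315 , \qquad \frac{Var(N)}{E(N)} = \frac{0.9315}{0.75} = 1.242
\]

\[
r = \frac{1}{E(N)} \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
= \frac{1}{0.75} \left( \frac{1.96}{5\%} \right)^2 (3 + 1.242) = \frac{6{,}518.42688}{0.75} = 8{,}691.24
\]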
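
Here is a sketch of the same method-of-moments calculation in code. The names are illustrative; note that the frequency variance divides by the number of insureds (not by one less), matching the solution above, and that $y$ is computed exactly, so the answer differs from 8,691.24 by a fraction of an insured.

```python
from statistics import NormalDist

freq = {0: 512, 1: 307, 2: 123, 3: 41, 4: 11, 5: 6}   # claims -> number of insureds
sev_mean, sev_var = 1500.0, 6_750_000.0               # claim size mean and variance
k, p = 0.05, 0.95

n = sum(freq.values())
e_n = sum(c * f for c, f in freq.items()) / n         # first moment of N
e_n2 = sum(c * c * f for c, f in freq.items()) / n    # second moment of N
var_n = e_n2 - e_n ** 2                               # method of moments (divide by n)

y = NormalDist().inv_cdf((1 + p) / 2)
claims_needed = (y / k) ** 2 * (sev_var / sev_mean ** 2 + var_n / e_n)
insureds_needed = claims_needed / e_n

print(e_n, round(var_n, 4), round(var_n / e_n, 3), round(insureds_needed, 1))
```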

Nov 2001 #15

You are given the following information about a general liability book of business

comprised of 2500 insureds:

$X_i = \sum_{j=1}^{N_i} Y_{ij}$ is a random variable representing the annual loss of the $i$-th insured.

$N_1, N_2, \ldots, N_{2500}$ are independent and identically distributed random variables following a negative binomial distribution with parameters $r = 2$ and $\beta = 0.2$.

$Y_{i1}, Y_{i2}, \ldots, Y_{iN_i}$ are independent and identically distributed random variables following a Pareto distribution with parameters $\alpha = 3$ and $\theta = 1000$.

The full credibility standard is to be within 5% of the expected aggregate losses

90% of the time.

Using classical credibility theory, determine the partial credibility of the annual loss

experience for the book of business.

Solution

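\[
r = \frac{1}{E(N)} \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(Y)}{E^2(Y)} + \frac{Var(N)}{E(N)} \right]
\]

However,

\[
\frac{Var(Y)}{E^2(Y)} = \frac{E\left( Y^2 \right) - E^2(Y)}{E^2(Y)} = \frac{E\left( Y^2 \right)}{E^2(Y)} - 1
\]

\[
r = \frac{1}{E(N)} \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{E\left( Y^2 \right)}{E^2(Y)} - 1 + \frac{Var(N)}{E(N)} \right]
\]

For the negative binomial claim count:

\[
E(N) = r\beta = 2(0.2) = 0.4 , \qquad Var(N) = r\beta(1 + \beta) , \qquad \frac{Var(N)}{E(N)} = 1 + \beta = 1 + 0.2 = 1.2
\]

For the Pareto claim size:

\[
E\left( Y^k \right) = \frac{\theta^k\,k!}{(\alpha - 1)(\alpha - 2) \cdots (\alpha - k)} , \qquad
E(Y) = \frac{\theta}{\alpha - 1} , \qquad
E\left( Y^2 \right) = \frac{2\theta^2}{(\alpha - 1)(\alpha - 2)}
\]

\[
\frac{E\left( Y^2 \right)}{E^2(Y)} = \frac{\dfrac{2\theta^2}{(\alpha - 1)(\alpha - 2)}}{\dfrac{\theta^2}{(\alpha - 1)^2}} = \frac{2(\alpha - 1)}{\alpha - 2} = \frac{2(3 - 1)}{3 - 2} = 4
\]

So if $Y$ is a 2-parameter Pareto, then $\dfrac{E\left( Y^2 \right)}{E^2(Y)} = \dfrac{2(\alpha - 1)}{\alpha - 2}$.

\[
r = \frac{1}{E(N)} \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{E\left( Y^2 \right)}{E^2(Y)} - 1 + \frac{Var(N)}{E(N)} \right]
= \frac{1}{0.4} \left[ \frac{\Phi^{-1}\left( \frac{1 + 90\%}{2} \right)}{5\%} \right]^2 (4 - 1 + 1.2)
\]

\[
= \frac{1}{0.4} \left( \frac{1.645}{5\%} \right)^2 (4 - 1 + 1.2) = \frac{4.2}{0.4} \left( \frac{1.645}{5\%} \right)^2 = 10.5 \left( \frac{1.645}{5\%} \right)^2
\]

Please note that many times it's advantageous not to expand $\left[ \dfrac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2$. For example, here

\[
r = 10.5 \left( \frac{1.645}{5\%} \right)^2 = 11{,}365.305
\]

Let's continue. $10.5 \left( \dfrac{1.645}{5\%} \right)^2$ is the number of insureds needed to get full credibility. However, the number of insureds is 2,500 in the book of business.

\[
Z = \sqrt{\frac{\text{your } r}{r \text{ to make } Z = 1}}
  = \sqrt{\frac{2500}{10.5 \left( \dfrac{1.645}{5\%} \right)^2}}
  = \frac{50}{\sqrt{10.5}\,\left( \dfrac{1.645}{5\%} \right)} = 0.469
\]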

You are given the following information about a commercial auto liability book of

business:

Each insured's claim count has a Poisson distribution with mean $\lambda$, where $\lambda$ has a gamma distribution with $\alpha = 1.5$ and $\theta = 0.2$

Individual claim size amounts are independent and exponentially distributed with

mean 5000

The full credibility standard is for the aggregate losses to be within 5% of the

expected with probability 0.9

Using classical credibility, determine the expected number of claims required for full

credibility.

Solution

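\[
\underbrace{r\,E(N)}_{\substack{\text{the expected number of claims the insurer} \\ \text{needs to have in its book of business} \\ \text{to have full credibility}}}
= \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
\]

\[
\frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} = \frac{\Phi^{-1}\left( \frac{1 + 90\%}{2} \right)}{5\%} = \frac{1.645}{5\%}
\]

The claim count $N$ given $\lambda$ is Poisson with mean $\lambda$, and $\lambda$ is gamma with parameters $\alpha = 1.5$ and $\theta = 0.2$. So $N$ is negative binomial with parameters $r = \alpha = 1.5$ and $\beta = \theta = 0.2$.

\[
E(N) = r\beta , \qquad Var(N) = r\beta(\beta + 1) , \qquad \frac{Var(N)}{E(N)} = 1 + \beta = 1 + 0.2 = 1.2
\]

$X$ is exponentially distributed, so $\dfrac{Var(X)}{E^2(X)} = 1$.

\[
r\,E(N) = \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
= \left( \frac{1.645}{5\%} \right)^2 (1 + 1.2) = 2381.302
\]

So the insurer needs to have at least 2,381 claims in a year to have full credibility.

Please note that the following information is not necessary for us to solve the problem:

The gamma parameter $\alpha = 1.5$. $\dfrac{Var(N)}{E(N)} = 1 + \beta$ regardless of $r$.

The mean 5000 for the individual claim size random variable. If $X$ is exponential, then $\dfrac{Var(X)}{E^2(X)} = 1$ regardless of the mean.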

Nov 2003 #3

You are given:

The number of claims has a Poisson distribution

Claim sizes have a Pareto distribution with parameters $\theta = 0.5$ and $\alpha = 6$

The number of claims and claim sizes are independent

The observed pure premium should be within 2% of the expected pure premium

90% of the time.

Solution

The pure premium is the expected total annual claim dollar amount incurred by one

policyholder. Set r = 1 , we have:

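\[
E(N) = \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
= \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{E\left( X^2 \right)}{E^2(X)} - 1 + \frac{Var(N)}{E(N)} \right]
\]

The claim size $X$ has a Pareto distribution with parameters $\theta = 0.5$ and $\alpha = 6$:

\[
\frac{E\left( X^2 \right)}{E^2(X)} = \frac{2(\alpha - 1)}{\alpha - 2} = \frac{2(6 - 1)}{6 - 2} = 2.5
\]

$N$ is Poisson. So $\dfrac{Var(N)}{E(N)} = 1$.

\[
E(N) = \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{E\left( X^2 \right)}{E^2(X)} - 1 + \frac{Var(N)}{E(N)} \right]
= \left[ \frac{\Phi^{-1}\left( \frac{1 + 90\%}{2} \right)}{2\%} \right]^2 (2.5 - 1 + 1)
\]

\[
= \left( \frac{1.645}{2\%} \right)^2 (2.5) = 16{,}912.66
\]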

You are given:

The number of claims has probability function:

\[
p(x) = \binom{m}{x} q^x (1 - q)^{m - x} , \qquad x = 0, 1, 2, \ldots, m
\]

The actual number of claims must be within 1% of the expected number of claims

with probability 0.95.

Determine q .

Solution

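\[
r\,E(N) = \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
\]

This problem is concerned only with loss frequency. So in the aggregate loss model $S = \sum_{i=1}^{N} X_i$, we set $X_i = 1$. This way, $S = N$ becomes the total number of claims. Setting $X_i = 1$ makes $Var(X) = 0$, so

\[
r\,E(N) = \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \frac{Var(N)}{E(N)}
\]

Plugging in the numbers: $p = 95\%$, $k = 1\%$, $\dfrac{Var(N)}{E(N)} = \dfrac{mq(1 - q)}{mq} = 1 - q$

\[
r\,E(N) = \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \frac{Var(N)}{E(N)}
= \left( \frac{1.96}{1\%} \right)^2 (1 - q) = 34{,}574
\]

so $1 - q = 0.9$ and $q = 0.1$.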

k E(N) 1%

May 2005 #2

You are given:

The number of claims follows a negative binomial distribution with parameters $r$ and $\beta = 3$.

Claim severity has the following distribution:

Claim Size    Probability
1             0.4
10            0.4
100           0.2

The number of claims is independent of the severity of claims.

Determine the expected number of claims needed for aggregate losses to be within 10%

of the expected aggregate losses with 95% probability.

Solution

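Claim Size    Probability
1             0.4
10            0.4
100           0.2

\[
E(X) = 1(0.4) + 10(0.4) + 100(0.2) = 24.4 , \qquad
E\left( X^2 \right) = 1^2(0.4) + 10^2(0.4) + 100^2(0.2) = 2{,}040.4
\]

\[
Var(X) = 2{,}040.4 - 24.4^2 = 1{,}445.04
\]

The claim count $N$ is negative binomial. So $\dfrac{Var(N)}{E(N)} = 1 + \beta = 1 + 3 = 4$.

\[
r\,E(N) = \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
= \left( \frac{1.96}{10\%} \right)^2 \left( \frac{1{,}445.04}{24.4^2} + 4 \right) = 2469.06
\]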
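
With a discrete severity distribution, the moments are simple weighted sums. The sketch below mirrors the solution and uses the rounded table value $y = 1.96$ so that the result matches the hand calculation; the variable names are illustrative.

```python
severity = {1: 0.4, 10: 0.4, 100: 0.2}   # claim size -> probability
beta = 3.0                               # negative binomial parameter
k, y = 0.10, 1.96                        # within 10%, with y = 1.96 for p = 95%

e_x = sum(x * pr for x, pr in severity.items())
e_x2 = sum(x * x * pr for x, pr in severity.items())
var_x = e_x2 - e_x ** 2

# Negative binomial: Var(N) / E(N) = 1 + beta, regardless of r
claims_needed = (y / k) ** 2 * (var_x / e_x ** 2 + (1 + beta))

print(round(e_x, 1), round(var_x, 2), round(claims_needed, 2))
```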

Nov 2005 #35

You are given:

The number of claims follows a Poisson distribution

Claim sizes follow a gamma distribution with parameters $\alpha$ (unknown) and $\theta = 10{,}000$

The number of claims and claim sizes are independent

The full credibility standard has been selected so that actual aggregate losses will

be within 10% of the expected aggregate losses 95% of the time

Using limited fluctuation (classical) credibility, determine the expected number of claims

required for full credibility.

Solution

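\[
r\,E(N) = \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + \frac{Var(N)}{E(N)} \right]
= \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left[ \frac{Var(X)}{E^2(X)} + 1 \right]
\]

\[
= \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \frac{Var(X) + E^2(X)}{E^2(X)}
= \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \frac{E\left( X^2 \right)}{E^2(X)}
\]

For a gamma distribution, $\dfrac{E\left( X^2 \right)}{E^2(X)} = \dfrac{\alpha(\alpha + 1)\theta^2}{(\alpha\theta)^2} = \dfrac{1}{\alpha} + 1$.

\[
r\,E(N) = \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \frac{E\left( X^2 \right)}{E^2(X)}
= \left[ \frac{\Phi^{-1}\left( \frac{1 + p}{2} \right)}{k} \right]^2 \left( \frac{1}{\alpha} + 1 \right)
= \left( \frac{1.96}{10\%} \right)^2 \left( \frac{1}{\alpha} + 1 \right)
\]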

Chapter 9 Bayesian estimate

Exam C routinely tests Bayesian premium problems. Though many candidates seem to understand the theory behind Bayesian premiums, they have trouble calculating Bayesian premiums. Most candidates are weak in the following two areas:

When the prior probability is continuous, many candidates don't know how to calculate the posterior probability or how to find the Bayesian premium. Continuous-prior problems are typically harder than discrete-prior problems.

When the prior probability is discrete and the calculation is messy, many candidates don't know how to solve the problem in a few minutes. Many candidates have inefficient calculation methods that are long and prone to errors.

In this chapter, I will first give you an intuitive review of Bayes' Theorem. Next, I will give you a framework for quickly solving Bayesian premium problems whether the prior probability is discrete or continuous. In addition, I will give you a BA II Plus/BA II Plus Professional shortcut for calculating Bayesian premiums when the prior probability is discrete.

Even if you are proficient in Bayes' Theorem, I recommend that you still go over the review. It is the foundation for the framework and shortcut to be presented later.

Prior probability. Before anything happens, as our baseline analysis, we believe (based on existing information we have up to now or using purely subjective judgment) that our total risk pool consists of several homogeneous groups. As a part of our baseline analysis, we also assume that these homogeneous groups have different sizes. Any insured person randomly chosen from the population is charged a weighted average premium.

driving habits, all insureds into two homogeneous groups: aggressive drivers and non-aggressive drivers. In regard to the sizes of these two groups, we assume (based on existing information we have up to now or using purely subjective judgment) that the aggressive insureds account for 40% of the total insureds and the non-aggressive insureds account for the remaining 60%.

So for an average driver randomly chosen from the population, we charge a weighted average premium rate (we believe that an average driver has some aggressiveness and some non-aggressiveness):

Premium charged on a person randomly chosen from the population

= 40% × premium rate for an aggressive driver

+ 60% × premium rate for a non-aggressive driver

Posterior probability. Then after a year, an event changed our belief about the makeup of the homogeneous groups for a specific insured. For example, we found that in one year one particular insured had three car accidents while an average driver had only one accident in the same time period. So the three-accident insured definitely involved more risk than did the average driver randomly chosen from the population. As a result, the premium rate for the three-accident insured should be higher than an average driver's premium rate.

The new premium rate we will charge is still a weighted average of the rates for the two homogeneous groups, except that we use a higher weighting factor for an aggressive driver's rate and a lower weighting factor for a non-aggressive driver's rate.

= 67% × premium rate for an aggressive driver

+ 33% × premium rate for a non-aggressive driver

In other words, we still think this particular driver's risk consists of two risk groups, aggressive and non-aggressive, but we alter the sizes of these two risk groups for this specific insured. So instead of assuming that this person's risk consists of 40% of an aggressive driver's risk and 60% of a non-aggressive driver's risk, we assume that his risk consists of 67% of an aggressive driver's risk and 33% of a non-aggressive driver's risk.

How do we come up with the new group sizes (or the new weighting factors)? There is a specific formula for calculating the new group sizes:

The group size after the event = K × (the group size before the event) × (this group's probability to make the event happen).

K is a scaling factor to make the sum of the new sizes for all groups equal to 100%.

In our example above, this is how we got the new size for the aggressive group and the

new size for the non-aggressive group. Suppose we know that the probability for an

aggressive driver to have 3 car accidents in a year is 15%; the probability for a non-

aggressive driver to have 3 car accidents in a year is 5%. Then for the driver who has 3

accidents in a year,

the size of the aggressive risk for someone who had 3 accidents in a year


= K × (prior size of the aggressive risk) × (probability of an aggressive driver having 3 car accidents in a year)

= K (40%)(15%)

the size of the non-aggressive risk for someone who had 3 accidents in a year

= K × (prior size of the non-aggressive risk) × (probability of a non-aggressive driver having 3 car accidents in a year)

= K (60%)(5%)

K is a scaling factor such that the sum of the posterior sizes is equal to one. So

\[
K(40\%)(15\%) + K(60\%)(5\%) = 1 , \qquad K = \frac{1}{40\%(15\%) + 60\%(5\%)} = 11.11
\]

the size of the aggressive risk for someone who had 3 accidents in a year

= 11.11 (40%)(15%) = 66.67%

the size of the non-aggressive risk for someone who had 3 accidents in a year

= 11.11 (60%)(5%) = 33.33%
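
The "prior size × probability, then rescale" recipe is easy to express in code. This is a minimal sketch; the function name `posterior` and the dictionary keys are illustrative choices.

```python
def posterior(prior, likelihood):
    """Return posterior group sizes and the scaling factor K.

    prior      -- group -> size of the group before the event
    likelihood -- group -> that group's probability of making the event happen
    """
    joint = {g: prior[g] * likelihood[g] for g in prior}
    K = 1 / sum(joint.values())          # rescale so the new sizes sum to 100%
    return {g: K * v for g, v in joint.items()}, K

prior = {"aggressive": 0.40, "non-aggressive": 0.60}
likelihood = {"aggressive": 0.15, "non-aggressive": 0.05}   # P(3 accidents in a year | group)

post, K = posterior(prior, likelihood)
print(round(K, 2), {g: round(s, 4) for g, s in post.items()})
```

Note that K comes out as 11.11 (a pure number, not a percentage), and the two posterior sizes sum to 1.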

The above logic should make intuitive sense. The bigger the size of the group prior to the event, the higher the contribution this group will make to the event's occurrence; the bigger the probability for this group to make the event happen, the higher the contribution this group will make to the event's occurrence. So the product of the prior size of the group and the group's probability to make the event happen captures this group's total contribution to the event's occurrence.

If we assign the post-event size of a group proportional to the product of the prior size and the group's probability to make the event happen, we are really assigning the post-event size of a group proportional to this group's total contribution to the event's occurrence. Again, this should make sense.

Let's summarize the logic for finding the new size of each group in the following table:

Event: An insured had 3 accidents in a year.

A: Homogeneous groups (also called segments, which are 2 components of a risk)
B: Before-event group size
C: Group's probability to make the event happen
D = (scaling factor K) × B × C: Post-event group size

A                 B      C      D
Aggressive        40%    15%    K × 40% × 15% = (40% × 15%) / (40% × 15% + 60% × 5%)
Non-aggressive    60%    5%     K × 60% × 5% = (60% × 5%) / (40% × 15% + 60% × 5%)

If we divide the population into $n$ non-overlapping groups $G_1, G_2, \ldots, G_n$ such that each element in the population belongs to one and only one group, then after the event $E$ occurs,

\[
\Pr(G_i \mid E) = K \times \Pr(G_i) \times \Pr(E \mid G_i)
\]

K is the scaling factor that makes the posterior probabilities sum to one.

\[
\text{So } K = \frac{1}{\Pr(G_1)\Pr(E \mid G_1) + \Pr(G_2)\Pr(E \mid G_2) + \cdots + \Pr(G_n)\Pr(E \mid G_n)}
\]

\[
\text{And } \Pr(G_i \mid E) = \frac{\Pr(G_i)\Pr(E \mid G_i)}{\Pr(G_1)\Pr(E \mid G_1) + \Pr(G_2)\Pr(E \mid G_2) + \cdots + \Pr(G_n)\Pr(E \mid G_n)}
\]

$\Pr(G_i \mid E)$ is the conditional probability that $G_i$ will happen given the event $E$ happened, so it is called the posterior probability. $\Pr(G_i \mid E)$ can be conveniently interpreted as the new size of group $G_i$ after the event $E$ happened. Intuitively, probability can often be interpreted as a group size.

For example, if the probability that a passing candidate of Course 4 is female is 55% and male 45%, we can say that the total pool of the passing candidates consists of 2 groups, female and male, with their respective sizes of 55% and 45%.


$\Pr(G_i)$ is the probability that $G_i$ will happen prior to the event $E$'s occurrence, so it's called the prior probability. $\Pr(G_i)$ can be conveniently interpreted as the size of group $G_i$ prior to the occurrence of $E$.

$\Pr(E \mid G_i)$ is the conditional probability that $E$ will happen given $G_i$ has happened. It is group $G_i$'s probability of making the event $E$ happen. For example, say a candidate who has passed Course 3 has a 50% chance of passing Course 4, that is to say:

Pr(passing Course 4 | passed Course 3) = 50%

We can say that the people who passed Course 3 have a 50% chance of passing Course 4.

Before we jump into the formula, let's look at a sixth-grade level math problem, which requires zero knowledge about probability. If you understand this problem, you should have no trouble understanding Bayes' Theorem.

Problem 1

A rock is found to contain gold. It has 3 layers, each with a different density of gold. You are given:

• The top layer, which accounts for 80% of the mass of the rock, has a gold density of only 10% (i.e. the amount of gold contained in the top layer is equal to 10% of the mass of the top layer).

• The middle layer, which accounts for 15% of the rock's mass, has a gold density of 5%.

• The bottom layer, which accounts for only 5% of the rock's mass, has a gold density of 0.2%.

Questions

• What is the rock's density of gold (i.e. what % of the rock's mass is gold)?

• Of the total amount of gold contained in the rock, what % of the gold comes from the top layer? What % comes from the middle layer? What % comes from the bottom layer?

Solution

Let's set up a table to solve the problem. Assume that the mass of the rock is one (it can be 1 pound, 1 gram, or 1 ton; it doesn't matter).

| Row | A: Layer | B: Mass of the layer | C: Density of gold in the layer | D = B × C: Mass of gold contained in the layer | E = D / 0.0876: Of the total amount of gold in the rock, what % comes from this layer? |
|---|---|---|---|---|---|
| 2 | Top | 0.80 | 10.0% | 0.0800 | 91.3% |
| 3 | Middle | 0.15 | 5.0% | 0.0075 | 8.6% |
| 4 | Bottom | 0.05 | 0.2% | 0.0001 | 0.1% |
| 5 | Total | 1.00 | | 0.0876 | 100% |

For example:

Cell(D,2) = 0.8 × 10% = 0.08,

Cell(D,5) = 0.0800 + 0.0075 + 0.0001 = 0.0876,

Cell(E,2) = 0.08 / 0.0876 = 91.3%.

So the rock has a gold density of 0.0876 (i.e. 8.76% of the mass of the rock is gold).

Of the total amount of gold contained in the rock, 91.3% comes from the top layer, 8.6% comes from the middle layer, and the remaining 0.1% comes from the bottom layer. In other words, the top layer contributes 91.3% of the gold in the rock, the middle layer 8.6%, and the bottom layer 0.1%.

The logic behind this simple math problem is exactly the same logic behind Bayes' Theorem.

Now let's change the problem into one about prior and posterior probabilities.

Problem 2

An insurer believes that there's an 80% chance that an applicant for life insurance qualifies for the standard nonsmoker class (which has the standard underwriting criteria and the standard premium rate); there's a 15% chance that an applicant qualifies for the preferred nonsmoker class (which has more stringent qualifying standards and a lower premium rate than the standard nonsmoker class); and there's a 5% chance that the applicant qualifies for the super preferred class (which has the highest underwriting standards and the lowest premium rate among nonsmokers).

Each class has the following chance of having a specific heart-related illness:

• The standard nonsmoker class has a 10% chance of getting the specific heart disease.

• The preferred nonsmoker class has a 5% chance of getting the specific heart disease.

• The super preferred nonsmoker class has a 0.2% chance of getting the specific heart disease.

If a nonsmoking applicant was found to have this specific heart-related illness, what is the probability of this applicant coming from the standard risk class? What is the probability of this applicant coming from the preferred risk class? What is the probability of this applicant coming from the super preferred risk class?

Solution

The solution to this problem is exactly the same as the one to the rock problem.

Event: the applicant was found to have the specific heart disease

| Row | A: Group (or segment) | B: Before-event size of the group | C: This group's probability of having the specific heart illness | D = B × C: After-event size of the group (not yet scaled) | E = D / 0.0876 (i.e. the scaling factor = 1/0.0876): After-event size of the group (scaled) |
|---|---|---|---|---|---|
| 2 | Standard | 0.80 | 10.0% | 0.0800 | 91.3% |
| 3 | Preferred | 0.15 | 5.0% | 0.0075 | 8.6% |
| 4 | Super Preferred | 0.05 | 0.2% | 0.0001 | 0.1% |
| 5 | Total | 1.00 | | 0.0876 | 100% |

So if the applicant was found to have the specific heart disease, then

• There's a 91.3% chance he comes from the standard risk class;

• There's an 8.6% chance he comes from the preferred risk class;

• There's a 0.1% chance he comes from the super preferred risk class.

When calculating the discrete posterior probability, if the problem is tricky, try to set up the table as we did in Problem 1 and Problem 2. Use this table to help you keep track of your data and work.
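The table in Problem 1 and Problem 2 can also be sketched in a few lines of Python. This is only an illustration of the same arithmetic, not part of the exam technique; the function name `posterior_table` is made up for this example.

```python
def posterior_table(sizes, likelihoods):
    """Return (raw after-event sizes, scaled posterior probabilities).

    sizes       -- column B: before-event size of each group
    likelihoods -- column C: each group's probability of producing the event
    """
    raw = [b * c for b, c in zip(sizes, likelihoods)]  # column D = B x C
    total = sum(raw)                                   # total of column D
    scaled = [d / total for d in raw]                  # column E = D / total
    return raw, scaled


# Problem 2: standard, preferred, super preferred nonsmoker classes
raw, post = posterior_table([0.80, 0.15, 0.05], [0.10, 0.05, 0.002])
print([round(d, 4) for d in raw])
print([round(100 * e, 1) for e in post])
```

The same call with the layer masses and gold densities reproduces the rock table, because the two problems share identical numbers.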

Problem 3

1% of the women at age 45 who participate in a study are found to have breast cancer.

80% of women with breast cancer will have a positive mammogram. 10% of women

without breast cancer will also have a positive mammogram. One woman aged 45 who

participated in the study was found to have a positive mammogram.


Calculate the probability that this woman has breast cancer.

Solution

This problem is tricky, and many people won't be able to solve it correctly.

To solve this problem, we need to correctly identify the following 3 items:

• The event. Here the event is that a woman has a positive mammogram.

• The distinct causes (i.e. segments) that can possibly produce the event. Make sure your causes are mutually exclusive (i.e. no two causes can happen simultaneously) and collectively exhaustive (i.e. there are no other causes). This event has two distinct causes: women with breast cancer and women without breast cancer. These are the two segments. In terms of the size of each segment, women with breast cancer account for 1% of the participants, and women without breast cancer account for 99%.

• Each cause's probability of producing the event. Women with breast cancer have an 80% chance of having a positive mammogram. Women without breast cancer have a 10% chance of having a positive mammogram.

| Segment (distinct causes) | Segment's size | Segment's probability to produce the event | Segment's contribution amount to the event | Segment's contribution % to the event (posterior probability) |
|---|---|---|---|---|
| Women with breast cancer | 1% | 80% | 1%(80%) = 0.008 | 0.008/0.107 = 7.48% |
| Women without breast cancer | 99% | 10% | 99%(10%) = 0.099 | 0.099/0.107 = 92.52% |
| Total | 100% | | 0.107 | 100% |

So if a woman in the study has a positive mammogram, then she has a 7.48% chance of actually having breast cancer.

Problem 4 (SOA May 2003, Course 1, #31)

A health study tracked a group of persons for five years. At the beginning of the study,

20% were classified as heavy smokers, 30% as light smokers, and 50% as nonsmokers.

Results of the study showed that light smokers were twice as likely as nonsmokers to die

during the five-year study, but only half as likely as heavy smokers.

A randomly selected participant from the study died over the five-year period. Calculate the probability that the participant was a heavy smoker.

Solution

Let \(p\) = the probability that a nonsmoker will die during the next 5 years. Then,

• The probability that a light smoker will die during the next 5 years is \(2p\).

• The probability that a heavy smoker will die during the next 5 years is \(4p\).

Please note that we don't have enough information to calculate \(p\). This shouldn't bother us. We don't need to know the value of \(p\) to solve the problem.

| Segment | Segment size | Segment's probability to produce the event | Segment's contribution amount | Segment's contribution % |
|---|---|---|---|---|
| Heavy smoker | 20% | 4p | 20%(4p) = 0.8p | 0.8p/1.9p = 42.11% |
| Light smoker | 30% | 2p | 30%(2p) = 0.6p | 0.6p/1.9p = 31.58% |
| Nonsmoker | 50% | p | 50%(p) = 0.5p | 0.5p/1.9p = 26.32% |
| Total | 100% | | 1.9p | 100.00% |

The probability that the participant was a heavy smoker is 42.11%.

The probability that the participant was a light smoker is 31.58%.

The probability that the participant was a nonsmoker is 26.32%.

In problems related to Bayes' Theorem, the absolute size of each segment doesn't matter; only the ratio of the segment sizes matters. Similarly, the absolute probability for each segment to produce the event doesn't matter; only the ratio of the probabilities matters.

If we are to solve this problem quickly, we can set up the following table:

Event: A participant died during the 5-year period

| Segment | Segment size | Segment's probability to produce the event | Segment's contribution amount | Segment's contribution % |
|---|---|---|---|---|
| Heavy smoker | 2 | 4 | 2(4) = 8 | 8/19 = 42.11% |
| Light smoker | 3 | 2 | 3(2) = 6 | 6/19 = 31.58% |
| Nonsmoker | 5 | 1 | 5(1) = 5 | 5/19 = 26.32% |
| Total | 10 | | 19 | 100% |

In the above table, we change the segment sizes from 20%, 30%, and 50% to 2, 3, and 5. Similarly, we change the segments' probabilities from \(4p\), \(2p\), and \(p\) to 4, 2, and 1. This speeds up our calculations. You can use this technique when taking the exam.
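To see that only the ratios matter, here is a small Python sketch that computes the posterior twice with exact fractions: once with the original percentages and an arbitrary value of p, and once with the simplified integers. The value p = 1/50 and the function name `posterior` are assumptions chosen purely for illustration; any positive p gives the same answer.

```python
from fractions import Fraction


def posterior(sizes, likelihoods):
    """Normalize size x likelihood so the results add up to one."""
    raw = [Fraction(s) * Fraction(l) for s, l in zip(sizes, likelihoods)]
    total = sum(raw)
    return [r / total for r in raw]


p = Fraction(1, 50)  # arbitrary illustrative value; it cancels out

# Original segment sizes 20%, 30%, 50% and death probabilities 4p, 2p, p
exact = posterior(
    [Fraction(20, 100), Fraction(30, 100), Fraction(50, 100)],
    [4 * p, 2 * p, p],
)

# Simplified sizes 2, 3, 5 and probabilities 4, 2, 1
quick = posterior([2, 3, 5], [4, 2, 1])

print(exact == quick)
print([float(x) for x in quick])
```

Both calls return 8/19, 6/19, and 5/19, which is why rescaling the segment sizes and the probabilities is safe.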

You are given:

• A portfolio of independent risks is divided into two classes, Class A and Class B.

• There are twice as many risks in Class A as in Class B.

• The number of claims for each insured during a single year follows a Bernoulli distribution.

• Class A and B have claim size distributions as follows:

| Claim size | Class A | Class B |
|---|---|---|
| 50,000 | 0.60 | 0.36 |
| 100,000 | 0.40 | 0.64 |

• The expected number of claims per year is 0.22 for Class A and 0.11 for Class B.

One insured is chosen at random. The insured's loss for two years combined is 100,000.

Calculate the probability that the selected insured belongs to Class A.

Solution

This time, we'll use a formula-driven approach without a table. Let \(S\) represent the total claim dollar amount incurred by the randomly chosen insured during the 2-year period. We observe that \(S = 100{,}000\). We are asked to find \(P(A \mid S = 100{,}000)\), which is the posterior probability that the insured belongs to Class A given that it incurred a total loss of $100,000 during the 2-year period.

Using either the conditional probability formula or Bayes' Theorem, we have:

\[ P(A \mid S = 100{,}000) = \frac{P(A \cap S = 100{,}000)}{P(S = 100{,}000)} = \frac{P(A)\,P(S = 100{,}000 \mid A)}{P(A)\,P(S = 100{,}000 \mid A) + P(B)\,P(S = 100{,}000 \mid B)} \]

Dividing the numerator and the denominator by \(P(A)\,P(S = 100{,}000 \mid A)\):

\[ P(A \mid S = 100{,}000) = \frac{1}{1 + \dfrac{P(B)}{P(A)} \cdot \dfrac{P(S = 100{,}000 \mid B)}{P(S = 100{,}000 \mid A)}} \]

Because there are twice as many risks in Class A as in Class B, \(P(B)/P(A) = 1/2\), so

\[ P(A \mid S = 100{,}000) = \frac{1}{1 + \dfrac{1}{2} \cdot \dfrac{P(S = 100{,}000 \mid B)}{P(S = 100{,}000 \mid A)}} \]

Notice that the posterior probability depends only on:

• The ratio of \(P(A)\) and \(P(B)\), not their absolute amounts

• The ratio of \(P(S = 100{,}000 \mid A)\) and \(P(S = 100{,}000 \mid B)\), not their absolute amounts

So we need to find the ratio \(\dfrac{P(S = 100{,}000 \mid B)}{P(S = 100{,}000 \mid A)}\).

\(P(S = 100{,}000 \mid A)\) is the probability that Class A produces the observation (i.e. a Class A insured incurs a $100,000 loss in 2 years).

We are told that the # of claims for Class A and B is a Bernoulli random variable. Remember that a Bernoulli random variable is just a binomial random variable with \(n = 1\) (only one trial). Let \(X\) represent the # of claims incurred by the insured. Let \(p\) represent the probability for the insured to have a claim. Then \(E(X) = p\). We are told that \(E(X \mid A) = 0.22\). So \(p_A = 0.22\). Similarly, \(E(X \mid B) = p_B = 0.11\).

So each year, a Class A insured can have either zero claims (with probability 0.78) or one claim (0.22). The claim amount is either 50,000 (probability 0.6) or 100,000 (probability 0.4).

Each year, a Class B insured can have either zero claims (with probability 0.89) or one claim (0.11). The claim amount is either 50,000 (probability 0.36) or 100,000 (probability 0.64).

There are only 3 ways for Class A or B to produce $100,000 of claims in two years:

• Have a $50,000 claim in Year 1 and a $50,000 claim in Year 2.

• Have a $100,000 claim in Year 1 and $0 of claims in Year 2.

• Have $0 of claims in Year 1 and a $100,000 claim in Year 2.

\[ P(S = 100{,}000 \mid A) = (0.22^2)(0.6^2) + 2(0.22)(0.78)(0.4) = 0.1547 \]

\[ P(S = 100{,}000 \mid B) = (0.11^2)(0.36^2) + 2(0.11)(0.89)(0.64) = 0.1269 \]

\[ P(A \mid S = 100{,}000) = \frac{1}{1 + \dfrac{1}{2} \cdot \dfrac{P(S = 100{,}000 \mid B)}{P(S = 100{,}000 \mid A)}} = \frac{1}{1 + \dfrac{1}{2} \cdot \dfrac{0.1269}{0.1547}} = 0.709 \]
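As a cross-check, the two likelihoods can be found by brute-force enumeration of the two years instead of listing the three cases by hand. The sketch below is illustrative only; the helper name `prob_total` is invented for this example.

```python
from itertools import product


def prob_total(p_claim, severity, target):
    """P(two-year total loss == target) with a Bernoulli claim count each year."""
    # One year's loss distribution: 0 with probability 1 - p_claim,
    # otherwise one claim whose size follows the severity table.
    year = {0: 1 - p_claim}
    for amount, prob in severity.items():
        year[amount] = p_claim * prob
    # The two years are independent, so multiply and keep matching totals.
    return sum(
        p1 * p2
        for (x1, p1), (x2, p2) in product(year.items(), repeat=2)
        if x1 + x2 == target
    )


like_a = prob_total(0.22, {50_000: 0.60, 100_000: 0.40}, 100_000)
like_b = prob_total(0.11, {50_000: 0.36, 100_000: 0.64}, 100_000)

# Class A is twice as large as Class B, so P(B) / P(A) = 1/2.
post_a = 1 / (1 + 0.5 * like_b / like_a)
print(round(like_a, 4), round(like_b, 4), round(post_a, 3))
```

Enumeration is slower than the formula, but it makes it harder to overlook one of the ways of reaching a total of 100,000.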

You are tossing a coin. Not knowing \(p\), the success rate of a head showing up in one toss of the coin, you subjectively assume that \(p\) is uniformly distributed over \([0, 1]\). Next, you do an experiment by tossing the coin 3 times. You find that, in this experiment, 2 out of 3 tosses have heads.

Calculate the posterior distribution of \(p\).

Solution

| Row | A: Group | B: Before-event size of the group | C: This group's probability to make the event happen | D = B × C: After-event size of the group (not yet scaled) | E = D × scaling factor: After-event size of the group (scaled) |
|---|---|---|---|---|---|
| 2 | Any \(p\) in [0,1] | 1 | \(C_3^2\, p^2 (1-p)\) | \(C_3^2\, p^2 (1-p)\) | \(\dfrac{C_3^2\, p^2 (1-p)}{\int_0^1 C_3^2\, p^2 (1-p)\,dp}\) |
| 3 | Total | 1 | | \(\int_0^1 C_3^2\, p^2 (1-p)\,dp\) | 100% |

The key to solving this problem is to understand that we have an infinite number of groups. Each value of \(p\) (\(0 \le p \le 1\)) is a group. Because \(p\) is uniform over \([0, 1]\), \(f(p) = 1\). As a result, for a given group \(p\), the before-event size is one. And for a given group \(p\), this group's probability to make the event "getting 2 heads out of 3 tosses" happen is the binomial probability \(C_3^2\, p^2 (1-p)\). So the after-event size is

\[ \underbrace{k}_{\text{scaling factor}} \times \underbrace{1}_{\text{before-event group size}} \times \underbrace{C_3^2\, p^2 (1-p)}_{\text{the group's probability to have 2 heads out of 3 tosses}} \]

\(k\) is a scaling factor such that the sum of the after-event sizes for all the groups is equal to one. Since we have an infinite number of groups, we have to use integration to sum up the after-event sizes of all the groups:

\[ \int_0^1 k\, C_3^2\, p^2 (1-p)\,dp = 1 \quad\Rightarrow\quad k = \frac{1}{\int_0^1 C_3^2\, p^2 (1-p)\,dp} \]

So the posterior density of \(p\) is

\[ k\, C_3^2\, p^2 (1-p) = \frac{C_3^2\, p^2 (1-p)}{\int_0^1 C_3^2\, p^2 (1-p)\,dp} = \frac{p^2 (1-p)}{\int_0^1 p^2 (1-p)\,dp} \]

It turns out that the posterior probability we just calculated is a Beta distribution.

Key point

The process for calculating the continuous posterior probability is the same as for calculating the discrete posterior probability. The only difference is this: you use integration for continuous posterior probability; you use summation for discrete posterior probability.
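The same "size times probability, then rescale" recipe can be sketched numerically for the coin example with the uniform prior. The code below is only an illustration: it uses a hand-written Simpson's rule (an assumption of this sketch, chosen to avoid external libraries) in place of the integral, and the names `integrate`, `raw`, and `posterior` are made up here.

```python
def integrate(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def prior(p):
    return 1.0  # uniform on [0, 1]: every group has before-event size 1


def likelihood(p):
    return 3 * p**2 * (1 - p)  # C(3,2) p^2 (1 - p): 2 heads in 3 tosses


def raw(p):
    return prior(p) * likelihood(p)  # after-event size, not yet scaled


k = 1 / integrate(raw, 0.0, 1.0)  # scaling factor


def posterior(p):
    return k * raw(p)  # after-event size, scaled


print(round(k, 6), round(posterior(0.5), 6))
```

Here the scaling factor comes out to 4, so the posterior density is 12 p^2 (1 - p), which is the Beta density with parameters 3 and 2.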

The size of a claim for an individual insured follows an inverse exponential distribution with the following probability density function:

\[ f(x \mid \theta) = \frac{\theta e^{-\theta / x}}{x^2}, \quad x > 0 \]

The parameter \(\theta\) has the prior distribution with the following probability density function:

\[ g(\theta) = \frac{e^{-\theta / 4}}{4}, \quad \theta > 0 \]

One claim of size 2 has been observed for a particular insured. Which of the following is proportional to the posterior distribution of \(\theta\)?

\[ \theta e^{-\theta/2}, \quad \theta e^{-3\theta/4}, \quad \theta e^{-\theta}, \quad \theta^2 e^{-\theta/2}, \quad \theta^2 e^{-9\theta/4} \]

Solution

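\[ \underbrace{g(\theta \mid x = 2)}_{\text{posterior density}} = \underbrace{k}_{\text{scaling factor}} \times \underbrace{g(\theta)}_{\text{prior density}} \times \underbrace{f(x = 2 \mid \theta)}_{\text{this group's density to make the event happen}} \]

\[ g(\theta \mid x = 2) = k\, g(\theta)\, f(x = 2 \mid \theta) = k \cdot \frac{e^{-\theta/4}}{4} \cdot \left. \frac{\theta e^{-\theta/x}}{x^2} \right|_{x = 2} = k \cdot \frac{e^{-\theta/4}}{4} \cdot \frac{\theta e^{-\theta/2}}{2^2} = \frac{k}{16}\, \theta e^{-3\theta/4} \]

So the posterior distribution of \(\theta\) is proportional to \(\theta e^{-3\theta/4}\).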

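Here the problem didn't ask you to find the full posterior density. If you have to find it, this is how. One way is to do integration. Assume \(g(\theta \mid x = 2) = K \theta e^{-3\theta/4}\). Because the total posterior probability should be one, we have:

\[ \int_0^{+\infty} g(\theta \mid x = 2)\,d\theta = K \int_0^{+\infty} \theta e^{-3\theta/4}\,d\theta = 1, \qquad K = \frac{1}{\int_0^{+\infty} \theta e^{-3\theta/4}\,d\theta} \]

To calculate \(\int_0^{+\infty} \theta e^{-3\theta/4}\,d\theta\), set \(\tfrac{3}{4}\theta = y\). Then \(\theta = \tfrac{4}{3} y\).

\[ \int_0^{+\infty} \theta e^{-3\theta/4}\,d\theta = \int_0^{+\infty} \frac{4}{3} y\, e^{-y}\, d\!\left(\frac{4}{3} y\right) = \left(\frac{4}{3}\right)^2 \int_0^{+\infty} y e^{-y}\,dy \]

Here \(y e^{-y}\) is a simple gamma density. So \(\int_0^{+\infty} y e^{-y}\,dy = 1\), \(K = \dfrac{1}{(4/3)^2} = \dfrac{9}{16}\), and

\[ g(\theta \mid x = 2) = \frac{9}{16}\, \theta e^{-3\theta/4} \]

Another, quicker, way to find the full expression of \(g(\theta \mid x = 2)\) is to notice that \(\theta e^{-3\theta/4}\) is proportional to a gamma density with shape parameter 2 and scale parameter \(\tfrac{4}{3}\). If you look at the table for Exam C, you'll see the gamma pdf, which the table writes in the variable \(x\) with shape \(\alpha\) and scale \(\theta\):

\[ f(x) = \frac{1}{\theta^{\alpha}\, \Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\theta} = \frac{1}{(4/3)^2\, \Gamma(2)}\, x^{2-1} e^{-x/(4/3)} = \frac{9}{16}\, x e^{-3x/4} \]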

You are given:

• In a portfolio of risks, each policyholder can have at most one claim per year.

• The probability of a claim for a policyholder during a year is \(q\).

• The prior density is \(\pi(q) = \dfrac{q^3}{0.07}\), \(0.6 < q < 0.8\).

A randomly selected policyholder has one claim in Year 1 and zero claims in Year 2.

For this policyholder, determine the posterior probability that \(0.7 < q < 0.8\).

Solution

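Let \(f(q)\) denote the prior density of \(q\).

\[ P(0.7 < q < 0.8 \mid N_1 = 1, N_2 = 0) = \int_{0.7}^{0.8} f(q \mid N_1 = 1, N_2 = 0)\,dq \]

\[ f(q \mid N_1 = 1, N_2 = 0) = \frac{f(q)\, P(N_1 = 1, N_2 = 0 \mid q)}{P(N_1 = 1, N_2 = 0)} = \frac{f(q)\, P(N_1 = 1, N_2 = 0 \mid q)}{\int_{0.6}^{0.8} f(q)\, P(N_1 = 1, N_2 = 0 \mid q)\,dq} \]

Given \(q\), the claim counts \(N_1\) and \(N_2\) are independent. So

\[ f(q)\, P(N_1 = 1, N_2 = 0 \mid q) = \frac{q^3}{0.07}\, q (1 - q) = \frac{q^4 - q^5}{0.07} \]

\[ f(q \mid N_1 = 1, N_2 = 0) = \frac{\dfrac{q^4 - q^5}{0.07}}{\int_{0.6}^{0.8} \dfrac{q^4 - q^5}{0.07}\,dq} = \frac{q^4 - q^5}{\int_{0.6}^{0.8} (q^4 - q^5)\,dq} \]

\[ P(0.7 < q < 0.8 \mid N_1 = 1, N_2 = 0) = \frac{\int_{0.7}^{0.8} (q^4 - q^5)\,dq}{\int_{0.6}^{0.8} (q^4 - q^5)\,dq} = \frac{\left[ \frac{1}{5} q^5 - \frac{1}{6} q^6 \right]_{0.7}^{0.8}}{\left[ \frac{1}{5} q^5 - \frac{1}{6} q^6 \right]_{0.6}^{0.8}} \]

\[ = \frac{\frac{1}{5}\left(0.8^5 - 0.7^5\right) - \frac{1}{6}\left(0.8^6 - 0.7^6\right)}{\frac{1}{5}\left(0.8^5 - 0.6^5\right) - \frac{1}{6}\left(0.8^6 - 0.6^6\right)} = 0.5572 \]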
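The final ratio is easy to mistype on a calculator, so here is a short Python check. It assumes the unnormalized posterior is q^4 - q^5 on (0.6, 0.8), which is the prior q^3 / 0.07 times q(1 - q) with the constant 0.07 dropped; the function names are made up for this sketch.

```python
def antiderivative(q):
    """Antiderivative of the unnormalized posterior q**4 - q**5."""
    return q**5 / 5 - q**6 / 6


def mass(a, b):
    """Unnormalized posterior mass on the interval (a, b)."""
    return antiderivative(b) - antiderivative(a)


# Posterior probability = mass on (0.7, 0.8) / mass on the whole range (0.6, 0.8)
prob = mass(0.7, 0.8) / mass(0.6, 0.8)
print(round(prob, 4))
```

Because the same constant multiplies both the numerator and the denominator, there is no need to carry 0.07 through the calculation.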

• The # of claims for each policyholder follows a Poisson distribution with mean \(\lambda\).

• The distribution of \(\lambda\) across all policyholders has probability density function

\[ f(\lambda) = \lambda e^{-\lambda}, \quad \lambda > 0 \]

• \(\displaystyle \int_0^{\infty} \lambda e^{-n\lambda}\,d\lambda = \frac{1}{n^2}\)

A randomly selected policyholder is known to have had at least one claim last year.

Determine the posterior probability that this same policyholder will have at least one claim this year.

Solution

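Let \(N_1\) and \(N_2\) represent the # of claims last year and this year respectively. If there were no observation, then by conditioning on \(\lambda\), we would have:

\[ P(N_2 \ge 1) = \int_0^{\infty} P(N_2 \ge 1 \mid \lambda)\, f(\lambda)\,d\lambda \]

\(N_2\) is a Poisson random variable with mean \(\lambda\). So

\[ P(N_2 \ge 1 \mid \lambda) = 1 - P(N_2 = 0 \mid \lambda) = 1 - e^{-\lambda} \]

\[ P(N_2 \ge 1) = \int_0^{\infty} \left(1 - e^{-\lambda}\right) f(\lambda)\,d\lambda \]

With the observation \(N_1 \ge 1\), we replace the prior density \(f(\lambda)\) with the posterior density \(f(\lambda \mid N_1 \ge 1)\):

\[ P(N_2 \ge 1 \mid N_1 \ge 1) = \int_0^{\infty} \left(1 - e^{-\lambda}\right) f(\lambda \mid N_1 \ge 1)\,d\lambda \]

Next, we have:

\[ f(\lambda \mid N_1 \ge 1) = \frac{f(\lambda)\, P(N_1 \ge 1 \mid \lambda)}{\int_0^{\infty} f(\lambda)\, P(N_1 \ge 1 \mid \lambda)\,d\lambda} = \frac{\lambda e^{-\lambda} \left(1 - e^{-\lambda}\right)}{\int_0^{\infty} \lambda e^{-\lambda} \left(1 - e^{-\lambda}\right) d\lambda} \]

\[ \int_0^{\infty} \lambda e^{-\lambda} \left(1 - e^{-\lambda}\right) d\lambda = \int_0^{\infty} \lambda e^{-\lambda}\,d\lambda - \int_0^{\infty} \lambda e^{-2\lambda}\,d\lambda = \frac{1}{1^2} - \frac{1}{2^2} = \frac{3}{4} \]

\[ f(\lambda \mid N_1 \ge 1) = \frac{4}{3}\, \lambda e^{-\lambda} \left(1 - e^{-\lambda}\right) \]

\[ P(N_2 \ge 1 \mid N_1 \ge 1) = \int_0^{\infty} \left(1 - e^{-\lambda}\right) \frac{4}{3}\, \lambda e^{-\lambda} \left(1 - e^{-\lambda}\right) d\lambda = \frac{4}{3} \int_0^{\infty} \lambda e^{-\lambda} \left(1 - 2e^{-\lambda} + e^{-2\lambda}\right) d\lambda \]

\[ = \frac{4}{3} \left( \int_0^{\infty} \lambda e^{-\lambda}\,d\lambda - 2 \int_0^{\infty} \lambda e^{-2\lambda}\,d\lambda + \int_0^{\infty} \lambda e^{-3\lambda}\,d\lambda \right) = \frac{4}{3} \left( \frac{1}{1^2} - \frac{2}{2^2} + \frac{1}{3^2} \right) = 0.8148 \]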
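The arithmetic in this solution uses nothing but the given identity, so it can be reproduced with exact fractions. The sketch below is an illustration only; the helper name `moment` is invented for this example and simply encodes the identity that the integral of λe^(-nλ) over (0, ∞) equals 1/n².

```python
from fractions import Fraction


def moment(n):
    """Integral of lam * exp(-n * lam) over (0, inf), which is 1 / n**2."""
    return Fraction(1, n * n)


# Normalizing constant of the posterior density: integral of lam e^-lam (1 - e^-lam)
norm = moment(1) - moment(2)

# Integral of lam e^-lam (1 - e^-lam)^2, expanded as 1 - 2 e^-lam + e^-2lam
numer = moment(1) - 2 * moment(2) + moment(3)

answer = numer / norm
print(norm, answer, round(float(answer), 4))
```

The exact answer is 22/27, which rounds to the 0.8148 shown above.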

Calculate Bayesian premium when the prior probability is discrete

Next, I'll give you a framework for solving Bayesian premium problems. As I explain my framework, I will also give you a shortcut.

Step 1 Determine the observation.

Step 2. Change the prior probability to posterior probability.

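You are given the following information about six coins:

| Coin | Probability of heads |
|---|---|
| 1-4 | 0.50 |
| 5 | 0.25 |
| 6 | 0.75 |

A coin is selected at random and then flipped repeatedly. \(X_i\) denotes the outcome of the \(i\)-th flip, where 1 indicates heads and 0 indicates tails. The following sequence is obtained:

\[ S = \{X_1, X_2, X_3, X_4\} = \{1, 1, 0, 1\} \]

Determine \(E(X_5 \mid S)\) using Bayesian analysis.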

Solution

Step 1 Determine the observation. This is easy; we are already told the observation is

\[ S = \{X_1, X_2, X_3, X_4\} = \{1, 1, 0, 1\} \]

Now we're going to simplify the problem by purposely discarding the observation. So instead of calculating \(E(X_5 \mid S)\), we'll just calculate \(E(X_5)\). \(X_5\) is the # of heads showing up in the fifth flip of the randomly chosen coin. \(X_5\) is a binomial random variable with parameters \(n = 1\) (one flip of the coin) and \(p\) (the probability of a head showing up). Using the binomial mean formula, we have:

\[ E(X_5) = np = p \]

However, the parameter \(p\) varies by coin type. For Coins 1-4, \(p = 0.5\); for Coin 5, \(p = 0.25\); and for Coin 6, \(p = 0.75\). Because the coin is randomly chosen from Coins 1, 2, 3, 4, 5, and 6, we don't know which coin is chosen. So we'll need to partition \(E(X_5)\) over coin types:

\[ E(X_5) = E(X_5 \mid \text{Coin 1-4})\, P(\text{Coin 1-4}) + E(X_5 \mid \text{Coin 5})\, P(\text{Coin 5}) + E(X_5 \mid \text{Coin 6})\, P(\text{Coin 6}) \]

\[ E(X_5 \mid \text{Coin 1-4}) = P(\text{Coin 1-4 showing a head in one flip}) = 0.5 \]

\[ E(X_5 \mid \text{Coin 5}) = P(\text{Coin 5 showing a head in one flip}) = 0.25 \]

\[ E(X_5 \mid \text{Coin 6}) = P(\text{Coin 6 showing a head in one flip}) = 0.75 \]

We can go one step further and calculate \(E(X_5)\). Though the problem doesn't specifically tell us \(P(\text{Coin 1-4})\), \(P(\text{Coin 5})\), and \(P(\text{Coin 6})\), we assume that each coin is equally likely to be chosen. So

\[ P(\text{Coin 1-4}) = \frac{4}{6}, \quad P(\text{Coin 5}) = \frac{1}{6}, \quad P(\text{Coin 6}) = \frac{1}{6} \]

\[ E(X_5) = 0.5 \left(\frac{4}{6}\right) + 0.25 \left(\frac{1}{6}\right) + 0.75 \left(\frac{1}{6}\right) = 0.5 \]

Of course, this problem isn't as simple as this. Otherwise, everyone who has passed Exam P would pass Exam C.

Step 3 Consider the observation. Modify the equation obtained in Step 2. Change the prior probabilities to posterior probabilities.

We now use the observation to modify our equation obtained in Step 2. The original partition equation (if we discard the observation) is:

\[ E(X_5) = E(X_5 \mid \text{Coin 1-4})\, P(\text{Coin 1-4}) + E(X_5 \mid \text{Coin 5})\, P(\text{Coin 5}) + E(X_5 \mid \text{Coin 6})\, P(\text{Coin 6}) \]

How to modify:

\[ E(X_5) \;\to\; E(X_5 \mid S) \]

\[ P(\text{Coin 1-4}) \;\to\; P(\text{Coin 1-4} \mid S) \]

\[ P(\text{Coin 5}) \;\to\; P(\text{Coin 5} \mid S) \]

\[ P(\text{Coin 6}) \;\to\; P(\text{Coin 6} \mid S) \]

Because of this observation, we can no longer assume that the randomly chosen coin has a 4/6 chance of being one of Coins 1-4, a 1/6 chance of being Coin 5, and a 1/6 chance of being Coin 6; these probabilities would have been fine if we hadn't observed \(S = \{X_1, X_2, X_3, X_4\} = \{1, 1, 0, 1\}\). Now we have this new information. We will need to reevaluate the probability that the coin belongs to each type. So we'll replace the prior probabilities \(P(\text{Coin 1-4})\), \(P(\text{Coin 5})\), and \(P(\text{Coin 6})\) with the posterior probabilities \(P(\text{Coin 1-4} \mid S)\), \(P(\text{Coin 5} \mid S)\), and \(P(\text{Coin 6} \mid S)\) respectively.

This gives the conditional expectation:

\[ E(X_5 \mid S) = E(X_5 \mid \text{Coin 1-4})\, P(\text{Coin 1-4} \mid S) + E(X_5 \mid \text{Coin 5})\, P(\text{Coin 5} \mid S) + E(X_5 \mid \text{Coin 6})\, P(\text{Coin 6} \mid S) \]

\[ = 0.5\, P(\text{Coin 1-4} \mid S) + 0.25\, P(\text{Coin 5} \mid S) + 0.75\, P(\text{Coin 6} \mid S) \]

Please note that our observation \(S = \{X_1, X_2, X_3, X_4\} = \{1, 1, 0, 1\}\) doesn't change how likely each coin is to produce a head in one flip. So the following three items are fixed regardless of our observation:

\[ E(X_5 \mid \text{Coin 1-4}) = P(\text{Coin 1-4 showing a head in one flip}) = 0.5 \]

\[ E(X_5 \mid \text{Coin 5}) = P(\text{Coin 5 showing a head in one flip}) = 0.25 \]

\[ E(X_5 \mid \text{Coin 6}) = P(\text{Coin 6 showing a head in one flip}) = 0.75 \]

Step 4 Calculate the posterior probabilities.

\[ P(\text{Coin 1-4} \mid S) = \frac{P(S \cap \text{Coin 1-4})}{P(S)} = \frac{P(\text{Coin 1-4})\, P(S \mid \text{Coin 1-4})}{P(S)} \]

\[ P(\text{Coin 5} \mid S) = \frac{P(S \cap \text{Coin 5})}{P(S)} = \frac{P(\text{Coin 5})\, P(S \mid \text{Coin 5})}{P(S)} \]

\[ P(\text{Coin 6} \mid S) = \frac{P(S \cap \text{Coin 6})}{P(S)} = \frac{P(\text{Coin 6})\, P(S \mid \text{Coin 6})}{P(S)} \]

Where

\[ P(S) = P(\text{Coin 1-4})\, P(S \mid \text{Coin 1-4}) + P(\text{Coin 5})\, P(S \mid \text{Coin 5}) + P(\text{Coin 6})\, P(S \mid \text{Coin 6}) \]

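Detailed calculation:

\[ P(S \mid \text{Coin 1-4}) = P(1, 1, 0, 1 \mid \text{Coin 1-4}) = 0.5 (0.5)(0.5)(0.5) = 0.5^4 \]

\[ P(S \mid \text{Coin 5}) = P(1, 1, 0, 1 \mid \text{Coin 5}) = 0.25 (0.25)(0.75)(0.25) = 0.25^3 (0.75) \]

\[ P(S \mid \text{Coin 6}) = P(1, 1, 0, 1 \mid \text{Coin 6}) = 0.75 (0.75)(0.25)(0.75) = 0.75^3 (0.25) \]

\[ P(S \cap \text{Coin 1-4}) = P(\text{Coin 1-4})\, P(S \mid \text{Coin 1-4}) = \frac{4}{6} \left(0.5^4\right) \]

\[ P(S \cap \text{Coin 5}) = P(\text{Coin 5})\, P(S \mid \text{Coin 5}) = \frac{1}{6} \left(0.25^3\right)(0.75) \]

\[ P(S \cap \text{Coin 6}) = P(\text{Coin 6})\, P(S \mid \text{Coin 6}) = \frac{1}{6} \left(0.75^3\right)(0.25) \]

\[ P(S) = \frac{4}{6} \left(0.5^4\right) + \frac{1}{6} \left(0.25^3\right)(0.75) + \frac{1}{6} \left(0.75^3\right)(0.25) \]

\[ P(\text{Coin 1-4} \mid S) = \frac{\frac{4}{6} \left(0.5^4\right)}{\frac{4}{6} \left(0.5^4\right) + \frac{1}{6} \left(0.25^3\right)(0.75) + \frac{1}{6} \left(0.75^3\right)(0.25)} = 0.681 \]

\[ P(\text{Coin 5} \mid S) = \frac{\frac{1}{6} \left(0.25^3\right)(0.75)}{\frac{4}{6} \left(0.5^4\right) + \frac{1}{6} \left(0.25^3\right)(0.75) + \frac{1}{6} \left(0.75^3\right)(0.25)} = 0.032 \]

\[ P(\text{Coin 6} \mid S) = \frac{\frac{1}{6} \left(0.75^3\right)(0.25)}{\frac{4}{6} \left(0.5^4\right) + \frac{1}{6} \left(0.25^3\right)(0.75) + \frac{1}{6} \left(0.75^3\right)(0.25)} = 0.287 \]

Finally,

\[ E(X_5 \mid S) = 0.5\, P(\text{Coin 1-4} \mid S) + 0.25\, P(\text{Coin 5} \mid S) + 0.75\, P(\text{Coin 6} \mid S) \]

\[ = 0.5 (0.681) + 0.25 (0.032) + 0.75 (0.287) = 0.564 \]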
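The whole calculation for the coin problem can be sketched in Python as a way to check hand work. This is an illustration under the same assumption used above (each of the six coins is equally likely to be chosen); the variable and function names are made up for this example.

```python
# Prior probabilities and heads probabilities by coin type
priors = {"1-4": 4 / 6, "5": 1 / 6, "6": 1 / 6}
p_heads = {"1-4": 0.50, "5": 0.25, "6": 0.75}
observed = [1, 1, 0, 1]  # the observation S: heads, heads, tails, heads


def likelihood(p, flips):
    """Probability that a coin with heads probability p produces the flips."""
    out = 1.0
    for x in flips:
        out *= p if x == 1 else 1 - p
    return out


# Raw posterior probabilities: prior x likelihood
raw = {c: priors[c] * likelihood(p_heads[c], observed) for c in priors}

# Normalize so that the posterior probabilities add up to one
p_s = sum(raw.values())
post = {c: r / p_s for c, r in raw.items()}

# Bayesian estimate: conditional means weighted by posterior probabilities
estimate = sum(p_heads[c] * post[c] for c in post)
print({c: round(v, 3) for c, v in post.items()}, round(estimate, 3))
```

The unrounded estimate is slightly different from the one obtained with three-decimal posterior probabilities, but both round to 0.564.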

I recommend that initially you use the 5-step framework to calculate discrete-prior Bayesian premiums. Just copy what I did. Explicitly write out each of the 5 steps; don't skip a step. Solve as many problems as you need until you are proficient with the framework.

Once you are familiar with the 5-step process, let's learn how to improve it. We'll focus on improving Step 4 (calculating the posterior probabilities). If you have ever solved a Bayesian premium problem, you'll have discovered that Step 4 is long, tedious, and prone to errors.

Take a look at Step 4 in Problem 4. See how involved the calculation is. When taking the exam, you are really stressed. In addition, you have only 3 minutes to solve a problem. If you follow the standard solution approach, chances are high that you'll mess up at least one step of your calculation. Then all your hard work is ruined. You won't be able to score a point.

Most exam candidates will mess up in Step 4. Let's find a better way to do Step 4.

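What are we doing in Step 4? Two things. First, we calculate the raw posterior probabilities:

\[ P(S \cap \text{Coin 1-4}) = P(\text{Coin 1-4})\, P(S \mid \text{Coin 1-4}) = \frac{4}{6} \left(0.5^4\right) \]

\[ P(S \cap \text{Coin 5}) = P(\text{Coin 5})\, P(S \mid \text{Coin 5}) = \frac{1}{6} \left(0.25^3\right)(0.75) \]

\[ P(S \cap \text{Coin 6}) = P(\text{Coin 6})\, P(S \mid \text{Coin 6}) = \frac{1}{6} \left(0.75^3\right)(0.25) \]

Second, we normalize the raw posterior probabilities by multiplying each of them by a normalizing constant

\[ k = \frac{1}{P(S)} = \frac{1}{P(\text{Coin 1-4})\, P(S \mid \text{Coin 1-4}) + P(\text{Coin 5})\, P(S \mid \text{Coin 5}) + P(\text{Coin 6})\, P(S \mid \text{Coin 6})} \]

After multiplying each raw posterior probability by this constant, the three posterior probabilities will nicely add up to one. Normalization is necessary; it's a part of Bayes' Theorem. However, it is a messy calculation. So ideally, we'll want to avoid it.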

It turns out that we really can avoid normalizing the raw posterior probabilities. To understand how to avoid normalization, let's formally present the question:

| \(X = x\) | \(p_X(x)\) |
|---|---|
| 0.5 | \(\frac{4}{6} \left(0.5^4\right) k\) |
| 0.25 | \(\frac{1}{6} \left(0.25^3\right)(0.75)\, k\) |
| 0.75 | \(\frac{1}{6} \left(0.75^3\right)(0.25)\, k\) |

We have seen this problem in the chapter on how to use the BA II Plus/Professional 1-V Statistics Worksheet. This is how we solved it without calculating \(k\).

| \(X = x\) | \(p_X(x)\) | Scaled-up \(p_X(x)\) (multiply \(p_X(x)\) by \(1{,}000{,}000 / k\)) |
|---|---|---|
| 0.5 | \(\frac{4}{6} \left(0.5^4\right) k = 0.041667\,k\) | 41,667 |
| 0.25 | \(\frac{1}{6} \left(0.25^3\right)(0.75)\, k = 0.001953\,k\) | 1,953 |
| 0.75 | \(\frac{1}{6} \left(0.75^3\right)(0.25)\, k = 0.017578\,k\) | 17,578 |

Enter the following data into the Statistics Worksheet:

X01=0.5, Y01=41,667

X02=0.25, Y02=1,953

X03=0.75, Y03=17,578

Then select the 1-V mode by pressing 2ND STAT and then pressing ENTER repeatedly until your calculator displays 1-V.

Press the down arrow key ↓. You should get: n = 61,198

Press the down arrow key ↓ again. You should get: \(\bar{X} = 0.56382970\)

So \(E(X_5 \mid S) = \bar{X} \approx 0.564\)

This result, calculated using the BA II Plus/Professional 1-V Statistics Worksheet, matches what we calculated in the 5-step process.

Event: the coin produces HHTH

| A: Group (coin type) | B: Before-event size of the group | C: This group's probability to produce HHTH | D = B × C: After-event size of the group (raw posterior probability) | E = 1,000,000 × D: Scaled-up raw posterior probability | F: Conditional mean |
|---|---|---|---|---|---|
| 1-4 | \(\frac{4}{6}\) | \(0.5^4\) | \(\frac{4}{6} \left(0.5^4\right) = 0.041667\) | 41,667 | 0.50 |
| 5 | \(\frac{1}{6}\) | \(0.25^3 (0.75)\) | \(\frac{1}{6} \left(0.25^3\right)(0.75) = 0.001953\) | 1,953 | 0.25 |
| 6 | \(\frac{1}{6}\) | \(0.75^3 (0.25)\) | \(\frac{1}{6} \left(0.75^3\right)(0.25) = 0.017578\) | 17,578 | 0.75 |

Enter the following data into the Statistics Worksheet:

X01=0.5, Y01=41,667

X02=0.25, Y02=1,953

X03=0.75, Y03=17,578

Then select the 1-V mode by pressing 2ND STAT and then pressing ENTER repeatedly until your calculator displays 1-V.

Press the down arrow key ↓. You should get: n = 61,198

Press the down arrow key ↓ again. You should get: \(\bar{X} = 0.56382970\)

So \(E(X_5 \mid S) = \bar{X} \approx 0.564\)

Better yet, we can round the numbers in Column D to 4 decimal places. This is even faster:

| A: Group (coin type) | B: Before-event size of the group | C: This group's probability to produce HHTH | D = B × C: After-event size of the group (raw posterior probability) | E = 10,000 × D: Scaled-up raw posterior probability | F: Conditional mean |
|---|---|---|---|---|---|
| I | \(\frac{4}{6}\) | \(0.5^4\) | \(\frac{4}{6} \left(0.5^4\right) = 0.0417\) | 417 | 0.50 |
| II | \(\frac{1}{6}\) | \(0.25^3 (0.75)\) | \(\frac{1}{6} \left(0.25^3\right)(0.75) = 0.0020\) | 20 | 0.25 |
| III | \(\frac{1}{6}\) | \(0.75^3 (0.25)\) | \(\frac{1}{6} \left(0.75^3\right)(0.25) = 0.0176\) | 176 | 0.75 |

X01=0.5, Y01=417

X02=0.25, Y02=20

X03=0.75, Y03=176

Using the 1-V Statistics Worksheet, you should get: n = 613, \(\bar{X} = 0.56362153 \approx 0.564\)
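What the 1-V worksheet does with the scaled-up numbers is simply a frequency-weighted mean. The short Python sketch below mimics that calculation so you can see why the normalizing constant never has to be computed; the function name `weighted_mean` is made up for this illustration and is not a calculator feature.

```python
def weighted_mean(values, frequencies):
    """Return (n, mean) the way a one-variable statistics worksheet would."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(values, frequencies)) / n
    return n, mean


conditional_means = [0.5, 0.25, 0.75]

# Raw posterior probabilities scaled up by 1,000,000
n6, mean6 = weighted_mean(conditional_means, [41667, 1953, 17578])

# Raw posterior probabilities rounded to 4 decimals and scaled up by 10,000
n4, mean4 = weighted_mean(conditional_means, [417, 20, 176])

print(n6, round(mean6, 8))
print(n4, round(mean4, 8))
```

Dividing by n inside the mean is exactly the normalization step, so any common multiple of the raw posterior probabilities gives the same answer up to rounding.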

You are given the following information about two classes of risks:

Risks in Class A have a Poisson claim count distribution with a mean of 1.0 per year.

Risks in Class B have a Poisson claim count distribution with a mean of 3.0 per year.

Risks in Class A have an exponential severity distribution with a mean of 1.0 per year.

Risks in Class B have an exponential severity distribution with a mean of 3.0 per year.

Each class has the same number of risks.

Within each class, severities and claim counts are independent.

A risk is randomly selected and observed to have 2 claims during one year. The observed

claim amounts were 1.0 and 3.0. Calculate the posterior expected value of the aggregate

loss for this risk during the next year.

Solution


Conceptual framework

Let

• \(S\) represent the aggregate claim dollar amount

• \(X\) represent the individual claim dollar amount

• \(N\) represent the # of claims

Then \(S = \sum_{i=1}^{N} X_i\). We are told that \(N\) and \(X\) are independent. In addition, the \(X_i\) are independent and identically distributed.

First, let's make things simple and forget about the condition \(N = 2, X_1 = 1, X_2 = 3\). Then \(E(S) = E(N)\, E(X)\). Since the risk is randomly chosen from Class A and Class B, we have:

\[ E(S) = E(S \mid A)\, P(A) + E(S \mid B)\, P(B) \]

The above formula is an Exam P concept. You shouldn't have trouble understanding it. Here \(P(A)\) and \(P(B)\) are prior probabilities, which are probabilities prior to our observation \(\{N = 2, X_1 = 1, X_2 = 3\}\).

Next,

\[ E(S \mid A) = E(N \mid A)\, E(X \mid A) = \lambda_A \theta_A = 1 (1) = 1 \]

\[ E(S \mid B) = E(N \mid B)\, E(X \mid B) = \lambda_B \theta_B = 3 (3) = 9 \]

Here \(\lambda_A\) and \(\lambda_B\) are the Poisson means for claim counts for Class A and B respectively. And \(\theta_A\) and \(\theta_B\) are the exponential mean claim amounts for Class A and B respectively. So

\[ E(S) = P(A) + 9 P(B) \]

To calculate the posterior expected aggregate claim amount, we'll still use the formula \(E(S) = P(A) + 9 P(B)\). However, we'll replace the prior probabilities \(P(A)\) and \(P(B)\) with the posterior probabilities:

\[ P(A \mid N = 2, X_1 = 1, X_2 = 3), \quad P(B \mid N = 2, X_1 = 1, X_2 = 3) \]

Our observation \(\{N = 2, X_1 = 1, X_2 = 3\}\) has changed our belief about the likelihood that the risk is from Class A or Class B. So we'll no longer use the prior probabilities \(P(A)\) and \(P(B)\) to calculate \(E(S)\).

The posterior expected aggregate claim amount is based on the observation \(\{N = 2, X_1 = 1, X_2 = 3\}\):

\[ E(S \mid N = 2, X_1 = 1, X_2 = 3) = P(A \mid N = 2, X_1 = 1, X_2 = 3) + 9 P(B \mid N = 2, X_1 = 1, X_2 = 3) \]

Next, we'll need to use Bayes' Theorem to calculate the posterior probabilities \(P(A \mid N = 2, X_1 = 1, X_2 = 3)\) and \(P(B \mid N = 2, X_1 = 1, X_2 = 3)\):

\[ P(A \mid N = 2, X_1 = 1, X_2 = 3) = \frac{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A)}{P(N = 2, X_1 = 1, X_2 = 3)} \]

\[ = \frac{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A)}{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A) + P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)} \]

\[ P(B \mid N = 2, X_1 = 1, X_2 = 3) = \frac{P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)}{P(N = 2, X_1 = 1, X_2 = 3)} \]

\[ = \frac{P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)}{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A) + P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)} \]

If you understand my logic so far, you are in good shape. The remaining work is just the calculation.

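Standard calculation

We'll calculate the probability for a Class A risk and a Class B risk to each produce the observed outcome \(\{N = 2, X_1 = 1, X_2 = 3\}\). Because the claim amounts are continuous, the severity factors below are exponential density values \(f(x) = \frac{1}{\theta} e^{-x/\theta}\).

\[ P(N = 2, X_1 = 1, X_2 = 3 \mid A) = P(N = 2 \mid A)\, f(1 \mid A)\, f(3 \mid A) \]

\[ = e^{-\lambda_A} \frac{(\lambda_A)^2}{2!} \cdot \frac{1}{\theta_A} e^{-1/\theta_A} \cdot \frac{1}{\theta_A} e^{-3/\theta_A} = e^{-1} \frac{1^2}{2!} \left(e^{-1}\right) \left(e^{-3}\right) = \frac{1}{2} e^{-5} = 0.00337 \]

\[ P(N = 2, X_1 = 1, X_2 = 3 \mid B) = P(N = 2 \mid B)\, f(1 \mid B)\, f(3 \mid B) \]

\[ = e^{-\lambda_B} \frac{(\lambda_B)^2}{2!} \cdot \frac{1}{\theta_B} e^{-1/\theta_B} \cdot \frac{1}{\theta_B} e^{-3/\theta_B} = e^{-3} \frac{3^2}{2!} \left(\frac{1}{3} e^{-1/3}\right) \left(\frac{1}{3} e^{-3/3}\right) = \frac{1}{2} e^{-13/3} = 0.00656 \]

\[ P(A \mid N = 2, X_1 = 1, X_2 = 3) = \frac{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A)}{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A) + P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)} \]

\[ = \frac{0.5 (0.00337)}{0.5 (0.00337) + 0.5 (0.00656)} = 0.339 \]

Similarly,

\[ P(B \mid N = 2, X_1 = 1, X_2 = 3) = \frac{P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)}{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A) + P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)} \]

\[ = \frac{0.5 (0.00656)}{0.5 (0.00337) + 0.5 (0.00656)} = 0.661 \]

Finally,

\[ E(S \mid N = 2, X_1 = 1, X_2 = 3) = P(A \mid N = 2, X_1 = 1, X_2 = 3) + 9 P(B \mid N = 2, X_1 = 1, X_2 = 3) \]

\[ = 1 (0.339) + 9 (0.661) = 6.29 \]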
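The standard calculation can be reproduced with a short Python sketch. It assumes, as in the solution above, that the likelihood of the observation is the Poisson probability of the claim count times the exponential density of each claim amount; the helper names are made up for this illustration.

```python
from math import exp, factorial


def poisson_pmf(n, lam):
    """P(N = n) for a Poisson claim count with mean lam."""
    return exp(-lam) * lam**n / factorial(n)


def expon_pdf(x, mean):
    """Exponential density with the given mean, evaluated at x."""
    return exp(-x / mean) / mean


classes = {"A": (1.0, 1.0), "B": (3.0, 3.0)}  # (Poisson mean, severity mean)
prior = {"A": 0.5, "B": 0.5}                  # each class has the same number of risks
claims = [1.0, 3.0]                           # observed claim amounts

raw = {}
for name, (lam, theta) in classes.items():
    like = poisson_pmf(len(claims), lam)      # probability of seeing 2 claims
    for x in claims:
        like *= expon_pdf(x, theta)           # density of each claim amount
    raw[name] = prior[name] * like            # raw posterior probability

total = sum(raw.values())
post = {name: r / total for name, r in raw.items()}

# Posterior expected aggregate loss: class means E(N)E(X) weighted by posteriors
premium = sum(post[name] * lam * theta for name, (lam, theta) in classes.items())
print({k: round(v, 3) for k, v in post.items()}, round(premium, 2))
```

Keeping full precision gives a premium slightly below the hand calculation with three-decimal posteriors, but both round to 6.29.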

Shortcut

When taking the exam, you'll still need to understand the conceptual framework explained in the beginning of the solution. However, you'll skip the normalizing step and avoid the need to manually calculate the mean.

This is what you need when solving this problem under exam conditions:

Event: \(\{N = 2, X_1 = 1, X_2 = 3\}\)

| Group | Before-event size of the group | This group's probability to produce the event | After-event size of the group (raw posterior probability) | Scaled-up raw posterior probability (multiply the raw probability by 200,000) | Conditional mean |
|---|---|---|---|---|---|
| A | 0.5 | \(e^{-1} \frac{1}{2!} \left(e^{-1}\right)\left(e^{-3}\right) = \frac{1}{2} e^{-5} = 0.00337\) | 0.5(0.00337) | 337 | \(\lambda_A \theta_A = 1 (1) = 1\) |
| B | 0.5 | \(e^{-3} \frac{3^2}{2!} \left(\frac{1}{3} e^{-1/3}\right)\left(\frac{1}{3} e^{-3/3}\right) = \frac{1}{2} e^{-13/3} = 0.00656\) | 0.5(0.00656) | 656 | \(\lambda_B \theta_B = 3 (3) = 9\) |

X01=1, Y01=337

X02=9, Y02=656

Half of the insureds are expected to have 2 claims per year.

The other half of the insureds are expected to have 4 claims per year.

A randomly selected insured has made 4 claims in each of the first two policy years.

Determine the Bayesian estimate of this insured's claim count in the next (third) policy year.

Solution

However, the insured can belong to either Class A with \(\lambda_A = 2\) or Class B with \(\lambda_B = 4\). So if we don't worry about the observation \(\{N_1 = 4, N_2 = 4\}\), we have:

\[ E(N_3) = E(N_3 \mid A)\, P(A) + E(N_3 \mid B)\, P(B) = \lambda_A P(A) + \lambda_B P(B) = 2 P(A) + 4 P(B) \]

Next, we'll modify the above partition equation by considering the observation \(\{N_1 = 4, N_2 = 4\}\). We'll change the prior probabilities to posterior probabilities:

\[ E(N_3 \mid N_1 = 4, N_2 = 4) = 2 P(A \mid N_1 = 4, N_2 = 4) + 4 P(B \mid N_1 = 4, N_2 = 4) \]

\[ P(A \mid N_1 = 4, N_2 = 4) = \frac{P(A \cap (N_1 = 4, N_2 = 4))}{P(N_1 = 4, N_2 = 4)} = \frac{P(A)\, P(N_1 = 4, N_2 = 4 \mid A)}{P(A)\, P(N_1 = 4, N_2 = 4 \mid A) + P(B)\, P(N_1 = 4, N_2 = 4 \mid B)} \]

Similarly,

\[ P(B \mid N_1 = 4, N_2 = 4) = \frac{P(B)\, P(N_1 = 4, N_2 = 4 \mid B)}{P(A)\, P(N_1 = 4, N_2 = 4 \mid A) + P(B)\, P(N_1 = 4, N_2 = 4 \mid B)} \]

Detailed calculations (if you use my shortcut, you'll avoid most of these calculations):

\[ P(N_1 = 4, N_2 = 4 \mid A) = P(N_1 = 4 \mid A)\, P(N_2 = 4 \mid A) = \left( e^{-\lambda_A} \frac{(\lambda_A)^4}{4!} \right)^2 = \left( e^{-2} \frac{2^4}{4!} \right)^2 \]

\[ P(N_1 = 4, N_2 = 4 \mid B) = P(N_1 = 4 \mid B)\, P(N_2 = 4 \mid B) = \left( e^{-\lambda_B} \frac{(\lambda_B)^4}{4!} \right)^2 = \left( e^{-4} \frac{4^4}{4!} \right)^2 \]

\[ P(A \mid N_1 = 4, N_2 = 4) = \frac{0.5 \left( e^{-2} \frac{2^4}{4!} \right)^2}{0.5 \left( e^{-2} \frac{2^4}{4!} \right)^2 + 0.5 \left( e^{-4} \frac{4^4}{4!} \right)^2} = 0.176 \]

\[ P(B \mid N_1 = 4, N_2 = 4) = \frac{0.5 \left( e^{-4} \frac{4^4}{4!} \right)^2}{0.5 \left( e^{-2} \frac{2^4}{4!} \right)^2 + 0.5 \left( e^{-4} \frac{4^4}{4!} \right)^2} = 0.824 \]

The above two calculations are nasty and prone to errors. Many candidates will mess up in these calculations and won't score a point. Assuming you have done your calculation right, you should get:

\[ E(N_3 \mid N_1 = 4, N_2 = 4) = 2 P(A \mid N_1 = 4, N_2 = 4) + 4 P(B \mid N_1 = 4, N_2 = 4) = 2 (0.176) + 4 (0.824) = 3.648 \]

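Just set up the following table and let the BA II Plus/Professional 1-V Statistics Worksheet do the magic for you. Watch and relax.

Event: $\{N_1 = 4, N_2 = 4\}$

| Group | Before-event size of the group (prior probability) | This group's probability to produce the event | After-event size of the group (raw posterior probability) | Scaled-up raw posterior probability | Conditional mean |
|---|---|---|---|---|---|
| A | 0.5 | $\left(e^{-2}\frac{2^4}{4!}\right)^2$ | $0.5\left(e^{-2}\frac{2^4}{4!}\right)^2$ | | $\lambda_A = 2$ |
| B | 0.5 | $\left(e^{-4}\frac{4^4}{4!}\right)^2$ | $0.5\left(e^{-4}\frac{4^4}{4!}\right)^2$ | | $\lambda_B = 4$ |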

Guo Fall 2009 C, Page 233 / 284

Next, we'll need to scale up the raw posterior probabilities. We want to avoid the error-prone calculation of the following two raw posterior probabilities:

$$0.5\left(e^{-2}\frac{2^4}{4!}\right)^2, \qquad 0.5\left(e^{-4}\frac{4^4}{4!}\right)^2$$

Remember what I said earlier when I was explaining Bayes' Theorem to you: what matters is the ratio of these two (or more) raw posterior probabilities, not their absolute amounts. So we divide each raw posterior probability by the first one:

$$\frac{0.5\left(e^{-2}\frac{2^4}{4!}\right)^2}{0.5\left(e^{-2}\frac{2^4}{4!}\right)^2} = 1, \qquad \frac{0.5\left(e^{-4}\frac{4^4}{4!}\right)^2}{0.5\left(e^{-2}\frac{2^4}{4!}\right)^2} = \frac{\left(4^4\right)^2\left(e^{-4}\right)^2}{\left(2^4\right)^2\left(e^{-2}\right)^2} = \frac{2^{16}e^{-8}}{2^{8}e^{-4}} = 256e^{-4} = 4.689$$

New Table

Event: $\{N_1 = 4, N_2 = 4\}$

| Group | Before-event size of the group (prior probability) | This group's probability to produce the event | After-event size of the group (raw posterior probability) | Raw posterior probability after simplification | Scaled-up raw posterior probability (multiply by 1,000) | Conditional mean |
|---|---|---|---|---|---|---|
| A | 0.5 | $\left(e^{-2}\frac{2^4}{4!}\right)^2$ | $0.5\left(e^{-2}\frac{2^4}{4!}\right)^2$ | 1 | 1,000 | $\lambda_A = 2$ |
| B | 0.5 | $\left(e^{-4}\frac{4^4}{4!}\right)^2$ | $0.5\left(e^{-4}\frac{4^4}{4!}\right)^2$ | $256e^{-4} = 4.689$ | 4,689 | $\lambda_B = 4$ |

X01=2, Y01=1,000

X02=4, Y02=4,689
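The calculator shortcut is just a weighted average of the class means, with the raw posterior probabilities as weights. The following Python sketch mirrors that idea for the two-class Poisson example above; the function names `poisson_pmf` and `posterior_mean` are illustrative choices, not part of the original manual.

```python
import math

def poisson_pmf(n, lam):
    """Probability of n claims for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**n / math.factorial(n)

def posterior_mean(priors, means, likelihoods):
    """Weighted average of the class means, weighted by raw posterior probabilities."""
    raw = [p * l for p, l in zip(priors, likelihoods)]
    return sum(m * r for m, r in zip(means, raw)) / sum(raw)

lams = [2.0, 4.0]          # Class A and Class B Poisson means
priors = [0.5, 0.5]        # before-event group sizes
# Event: 4 claims in Year 1 and 4 claims in Year 2 (independent given the class)
likelihoods = [poisson_pmf(4, lam) ** 2 for lam in lams]

estimate = posterior_mean(priors, lams, likelihoods)   # about 3.648
ratio = (priors[1] * likelihoods[1]) / (priors[0] * likelihoods[0])  # about 4.689
```

Only the ratio of the raw posterior probabilities matters, which is why scaling them by any common factor (such as 1,000) leaves the estimate unchanged.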

Prior to observing any claims, you believed that claim sizes followed a Pareto distribution with parameters $\theta = 10$ and $\alpha = 1$, 2, or 3, with each value equally likely. You then observe one claim of 20 for a randomly selected risk. Determine the posterior probability that the next claim for this risk will be greater than 30.

Solution

$$P(X_2 > 30) = P(X_2 > 30 \mid \alpha = 1)P(\alpha = 1) + P(X_2 > 30 \mid \alpha = 2)P(\alpha = 2) + P(X_2 > 30 \mid \alpha = 3)P(\alpha = 3)$$

If you look at Tables for Exam C/4, you'll see that the survival function of a (2-parameter) Pareto distribution is $S(x) = \left(\frac{\theta}{x + \theta}\right)^{\alpha}$. Here the problem doesn't say whether the Pareto is one-parameter or two-parameter. One quick way to determine which to use is this:

If the random variable is greater than zero, then use the two-parameter Pareto.

If the random variable is greater than a positive constant, then use the one-parameter Pareto.

The problem just vaguely says that claim sizes follow a Pareto distribution. Here the claim size (i.e. claim dollar amount) must be greater than zero. There's no reason for us to think that the claim dollar amount must exceed a positive constant (such as $500). As a result, we'll use the 2-parameter Pareto.

$$P(X_2 > 30 \mid \alpha) = S(30) = \left(\frac{10}{30 + 10}\right)^{\alpha} = \left(\frac{1}{4}\right)^{\alpha}$$

$$P(X_2 > 30) = \left(\frac{1}{4}\right)^{1}P(\alpha = 1) + \left(\frac{1}{4}\right)^{2}P(\alpha = 2) + \left(\frac{1}{4}\right)^{3}P(\alpha = 3) = \frac{1}{4}P(\alpha = 1) + \frac{1}{16}P(\alpha = 2) + \frac{1}{64}P(\alpha = 3)$$

Considering the observation $X_1 = 20$, we change the prior probabilities to posterior probabilities:

$$P(X_2 > 30 \mid X_1 = 20) = \frac{1}{4}P(\alpha = 1 \mid X_1 = 20) + \frac{1}{16}P(\alpha = 2 \mid X_1 = 20) + \frac{1}{64}P(\alpha = 3 \mid X_1 = 20)$$

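Next, we'll calculate the posterior probabilities. If you look at Tables for Exam C/4, you'll find the density function of a 2-parameter Pareto distribution with parameters $\alpha$ and $\theta$ is:

$$f(x) = \frac{\alpha\theta^{\alpha}}{(x + \theta)^{\alpha + 1}} = \frac{\alpha}{\theta}\left(\frac{\theta}{x + \theta}\right)^{\alpha + 1}$$

Then for $\theta = 10$:

$$f(20 \mid \alpha) = \frac{\alpha}{10}\left(\frac{10}{20 + 10}\right)^{\alpha + 1} = \frac{\alpha}{10}\left(\frac{1}{3}\right)^{\alpha + 1}$$

$$P(\alpha = 1 \mid X_1 = 20) = \frac{P(\alpha = 1)f(20 \mid \alpha = 1)}{f(20)}$$

$$P(\alpha = 2 \mid X_1 = 20) = \frac{P(\alpha = 2)f(20 \mid \alpha = 2)}{f(20)}$$

$$P(\alpha = 3 \mid X_1 = 20) = \frac{P(\alpha = 3)f(20 \mid \alpha = 3)}{f(20)}$$

$$f(20) = P(\alpha = 1)f(20 \mid \alpha = 1) + P(\alpha = 2)f(20 \mid \alpha = 2) + P(\alpha = 3)f(20 \mid \alpha = 3) = \sum_{\alpha = 1}^{3}\frac{1}{3}\cdot\frac{\alpha}{10}\left(\frac{1}{3}\right)^{\alpha + 1}$$

Assuming you can do the above calculation right, you'll find:

$$P(\alpha = 1 \mid X_1 = 20) = \frac{0.3704\%}{0.7408\%} = \frac{1}{2}$$

$$P(\alpha = 2 \mid X_1 = 20) = \frac{0.2469\%}{0.7408\%} = \frac{1}{3}$$

$$P(\alpha = 3 \mid X_1 = 20) = \frac{0.1235\%}{0.7408\%} = \frac{1}{6}$$

Then

$$P(X_2 > 30 \mid X_1 = 20) = \frac{1}{4}P(\alpha = 1 \mid X_1 = 20) + \frac{1}{16}P(\alpha = 2 \mid X_1 = 20) + \frac{1}{64}P(\alpha = 3 \mid X_1 = 20) = \frac{1}{4}\cdot\frac{1}{2} + \frac{1}{16}\cdot\frac{1}{3} + \frac{1}{64}\cdot\frac{1}{6} = 0.148$$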

If you ever try to reproduce my answers, you'll find that the calculation outlined above is an absolute nightmare. In addition, I must acknowledge that I used an Excel spreadsheet to help me do the above calculations when I was preparing this manual. I must also acknowledge that there's little chance that I would be able to do the calculation right in the heat of the exam.

In the exam, I'll never use the above standard approach, which is prone to errors. Instead, I will use the shortcut below, which dramatically reduces the complexity of the calculations. This is what you should do in the exam:


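What you should do in the exam room

Event: $X_1 = 20$

| A: Group | B: Before-event size of the group (prior probability) | C: This group's density to produce the event, $f(20 \mid \alpha) = \frac{\alpha}{30}\left(\frac{1}{3}\right)^{\alpha}$ | D = B × C: After-event size of the group (raw posterior probability) | E: Scaled-up raw posterior probability (multiply the raw prob by $3(30)(3^2)$) | F: $P(X_2 > 30 \mid \alpha) = \left(\frac{1}{4}\right)^{\alpha}$ |
|---|---|---|---|---|---|
| $\alpha = 1$ | $\frac{1}{3}$ | $\frac{1}{30}\left(\frac{1}{3}\right)$ | $\frac{1}{3}\cdot\frac{1}{30}\left(\frac{1}{3}\right)$ | 3 | $\frac{1}{4}$ |
| $\alpha = 2$ | $\frac{1}{3}$ | $\frac{2}{30}\left(\frac{1}{3}\right)^2$ | $\frac{1}{3}\cdot\frac{2}{30}\left(\frac{1}{3}\right)^2$ | 2 | $\left(\frac{1}{4}\right)^2$ |
| $\alpha = 3$ | $\frac{1}{3}$ | $\frac{3}{30}\left(\frac{1}{3}\right)^3$ | $\frac{1}{3}\cdot\frac{3}{30}\left(\frac{1}{3}\right)^3$ | 1 | $\left(\frac{1}{4}\right)^3$ |

$X01 = \frac{1}{4} = 0.25$, $Y01 = 3$

$X02 = \left(\frac{1}{4}\right)^2 = 0.0625$, $Y02 = 2$

$X03 = \left(\frac{1}{4}\right)^3 = 0.015625$, $Y03 = 1$

You see how nice and easy the shortcut calculation is.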
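As a cross-check of the table above, here is a small Python sketch that computes the posterior weights and the posterior survival probability with exact fractions. The helper names `pareto_pdf` and `pareto_sf` are hypothetical; the formulas are the two-parameter Pareto density and survival function quoted in the solution.

```python
from fractions import Fraction

theta = 10
alphas = [1, 2, 3]
prior = Fraction(1, 3)          # each alpha is equally likely

def pareto_pdf(x, alpha, theta):
    """Two-parameter Pareto density: alpha * theta^alpha / (x + theta)^(alpha + 1)."""
    return Fraction(alpha * theta**alpha, (x + theta) ** (alpha + 1))

def pareto_sf(x, alpha, theta):
    """Two-parameter Pareto survival function: (theta / (x + theta))^alpha."""
    return Fraction(theta, x + theta) ** alpha

# Raw posterior probabilities after observing a claim of 20
raw = [prior * pareto_pdf(20, a, theta) for a in alphas]
posterior = [r / sum(raw) for r in raw]          # 1/2, 1/3, 1/6

# Posterior probability that the next claim exceeds 30
answer = sum(w * pareto_sf(30, a, theta) for w, a in zip(posterior, alphas))
```

The exact result is 19/128, which rounds to the 0.148 found in the solution.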

May 2001 #10

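The claim count and claim size distributions for risks of Type A are:

| Number of claims | Probability | Claim size | Probability |
|---|---|---|---|
| 0 | 4/9 | 500 | 1/3 |
| 1 | 4/9 | 1235 | 2/3 |
| 2 | 1/9 | | |

The claim count and claim size distributions for risks of Type B are:

| Number of claims | Probability | Claim size | Probability |
|---|---|---|---|
| 0 | 1/9 | 250 | 2/3 |
| 1 | 4/9 | 328 | 1/3 |
| 2 | 4/9 | | |

Claim counts and claim sizes are independent within each risk type.

The variance of the total losses is 296,962.

Determine the Bayesian premium for the next year for this same risk.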

Solution

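Let $S = \sum_{i=1}^{N} X_i$ represent the total annual loss. The observation is $S_1 = 500$. We are asked to find $E(S_2 \mid S_1 = 500)$. If we ignore the observation $S_1 = 500$, then the problem becomes finding $E(S_2)$. Since the risk can be from either Type A or Type B, we'll condition $S_2$ on risk types.

$$E(S_2) = E(S_2 \mid A)P(A) + E(S_2 \mid B)P(B)$$

Since $E(S) = E(N)E(X)$,

$$E(S_2 \mid A) = E(N_2 \mid A)E(X \mid A), \qquad E(S_2 \mid B) = E(N_2 \mid B)E(X \mid B)$$

$$E(N_2 \mid A) = 0\left(\frac{4}{9}\right) + 1\left(\frac{4}{9}\right) + 2\left(\frac{1}{9}\right) = \frac{6}{9}, \qquad E(N_2 \mid B) = 0\left(\frac{1}{9}\right) + 1\left(\frac{4}{9}\right) + 2\left(\frac{4}{9}\right) = \frac{12}{9}$$

$$E(X \mid A) = 500\left(\frac{1}{3}\right) + 1235\left(\frac{2}{3}\right) = 990, \qquad E(X \mid B) = 250\left(\frac{2}{3}\right) + 328\left(\frac{1}{3}\right) = 276$$

$$E(S_2 \mid A) = E(N_2 \mid A)E(X \mid A) = \frac{6}{9}(990) = 660$$

$$E(S_2 \mid B) = E(N_2 \mid B)E(X \mid B) = \frac{12}{9}(276) = 368$$

$$E(S_2) = E(S_2 \mid A)P(A) + E(S_2 \mid B)P(B) = 660P(A) + 368P(B)$$

Considering the observation, we change the prior probabilities to posterior probabilities:

$$E(S_2 \mid S_1 = 500) = 660P(A \mid S_1 = 500) + 368P(B \mid S_1 = 500)$$

$$P(A \mid S_1 = 500) = \frac{P(A)P(S_1 = 500 \mid A)}{P(S_1 = 500)}, \qquad P(B \mid S_1 = 500) = \frac{P(B)P(S_1 = 500 \mid B)}{P(S_1 = 500)}$$

$$\frac{P(A \mid S_1 = 500)}{P(B \mid S_1 = 500)} = \frac{P(A)P(S_1 = 500 \mid A)}{P(B)P(S_1 = 500 \mid B)}$$

The only way for Type A to incur total losses of 500 in Year 1 is to have one claim of 500. The only way for Type B to incur total losses of 500 in Year 1 is to have two claims of 250 each.

So $P(S_1 = 500 \mid A) = \frac{4}{9}\cdot\frac{1}{3}$ and $P(S_1 = 500 \mid B) = \frac{4}{9}\left(\frac{2}{3}\right)^2$.

$$\frac{P(A \mid S_1 = 500)}{P(B \mid S_1 = 500)} = \frac{0.5\cdot\frac{4}{9}\cdot\frac{1}{3}}{0.5\cdot\frac{4}{9}\left(\frac{2}{3}\right)^2} = \frac{3}{4}$$

$$P(A \mid S_1 = 500) = \frac{3}{7}, \qquad P(B \mid S_1 = 500) = \frac{4}{7}$$

$$E(S_2 \mid S_1 = 500) = 660\left(\frac{3}{7}\right) + 368\left(\frac{4}{7}\right) = 493.14$$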


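What you should do in the exam room

Event: $S_1 = 500$

| A: Group | B: Before-event size of the group (prior probability) | C: This group's probability to produce the event | D = B × C: After-event size of the group (raw posterior probability) | E: Scaled-up raw posterior probability (multiply the raw prob by $\frac{3^2}{0.5\left(\frac{4}{9}\right)}$) | F: $E(S_2 \mid \text{Type})$ |
|---|---|---|---|---|---|
| Type A | 0.5 | $\frac{4}{9}\cdot\frac{1}{3}$ | $0.5\cdot\frac{4}{9}\cdot\frac{1}{3}$ | 3 | 660 |
| Type B | 0.5 | $\frac{4}{9}\left(\frac{2}{3}\right)^2$ | $0.5\cdot\frac{4}{9}\left(\frac{2}{3}\right)^2$ | 4 | 368 |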
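The Type A / Type B example can be checked by brute force. The sketch below, with hypothetical helper names `prob_total` and `mean_total`, enumerates the ways each risk type can produce total losses of 500 and then takes the posterior-weighted average of the conditional means. It assumes equal prior probabilities of 0.5 for the two types, as used in the solution.

```python
from fractions import Fraction as F

count = {"A": {0: F(4, 9), 1: F(4, 9), 2: F(1, 9)},
         "B": {0: F(1, 9), 1: F(4, 9), 2: F(4, 9)}}
size = {"A": {500: F(1, 3), 1235: F(2, 3)},
        "B": {250: F(2, 3), 328: F(1, 3)}}
prior = {"A": F(1, 2), "B": F(1, 2)}   # assumed equal priors, as in the solution

def prob_total(risk, total):
    """P(S = total | risk type), with at most two independent claims."""
    sizes = size[risk]
    p = count[risk][0] if total == 0 else F(0)
    p += count[risk][1] * sizes.get(total, F(0))
    p += count[risk][2] * sum(p1 * p2
                              for x1, p1 in sizes.items()
                              for x2, p2 in sizes.items()
                              if x1 + x2 == total)
    return p

def mean_total(risk):
    """E(S | risk type) = E(N | type) * E(X | type)."""
    mean_n = sum(n * p for n, p in count[risk].items())
    mean_x = sum(x * p for x, p in size[risk].items())
    return mean_n * mean_x

raw = {r: prior[r] * prob_total(r, 500) for r in prior}
posterior = {r: raw[r] / sum(raw.values()) for r in raw}
estimate = sum(posterior[r] * mean_total(r) for r in prior)   # 3452/7, about 493.14
```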

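Claim count probabilities by class:

| Class | Number of insureds | 0 claims | 1 claim | 2 claims | 3 claims | 4 claims |
|---|---|---|---|---|---|---|
| 1 | 3000 | 1/3 | 1/3 | 1/3 | 0 | 0 |
| 2 | 2000 | 0 | 1/6 | 2/3 | 1/6 | 0 |
| 3 | 1000 | 0 | 0 | 1/6 | 2/3 | 1/6 |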

Solution


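Conceptual framework

If we ignore the observation $N_1 = 1$, then the problem becomes finding $E(N_2)$. Since $N_2$ can be generated from each of the three classes, we'll condition $N_2$ on classes:

$$E(N_2) = \sum_{i=1}^{3} E(N_2 \mid \text{Class } i)P(\text{Class } i)$$

$$E(N_2 \mid \text{Class 1}) = 0\left(\frac{1}{3}\right) + 1\left(\frac{1}{3}\right) + 2\left(\frac{1}{3}\right) = 1$$

$$E(N_2 \mid \text{Class 2}) = 1\left(\frac{1}{6}\right) + 2\left(\frac{2}{3}\right) + 3\left(\frac{1}{6}\right) = 2$$

$$E(N_2 \mid \text{Class 3}) = 2\left(\frac{1}{6}\right) + 3\left(\frac{2}{3}\right) + 4\left(\frac{1}{6}\right) = 3$$

Considering the observation $N_1 = 1$, we change the prior probabilities to posterior probabilities:

$$P(\text{Class 1} \mid N_1 = 1) = \frac{P(\text{Class 1})P(N_1 = 1 \mid \text{Class 1})}{P(N_1 = 1)} = \frac{\frac{3}{6}\cdot\frac{1}{3}}{P(N_1 = 1)}$$

$$P(\text{Class 2} \mid N_1 = 1) = \frac{P(\text{Class 2})P(N_1 = 1 \mid \text{Class 2})}{P(N_1 = 1)} = \frac{\frac{2}{6}\cdot\frac{1}{6}}{P(N_1 = 1)}$$

$$P(\text{Class 3} \mid N_1 = 1) = \frac{P(\text{Class 3})P(N_1 = 1 \mid \text{Class 3})}{P(N_1 = 1)} = \frac{\frac{1}{6}\cdot 0}{P(N_1 = 1)} = 0$$

$$P(N_1 = 1) = \frac{3}{6}\cdot\frac{1}{3} + \frac{2}{6}\cdot\frac{1}{6} + \frac{1}{6}\cdot 0 = \frac{2}{9}$$

$$P(\text{Class 1} \mid N_1 = 1) = \frac{\frac{3}{6}\cdot\frac{1}{3}}{\frac{2}{9}} = \frac{3}{4}, \qquad P(\text{Class 2} \mid N_1 = 1) = \frac{\frac{2}{6}\cdot\frac{1}{6}}{\frac{2}{9}} = \frac{1}{4}$$

$$E(N_2 \mid N_1 = 1) = 1\left(\frac{3}{4}\right) + 2\left(\frac{1}{4}\right) + 3(0) = 1.25$$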

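Event: $N_1 = 1$

| A: Group (class) | B: Before-event size of the group (prior probability) | C: This group's probability to produce the event | D = B × C: After-event size of the group (raw posterior probability) | E: Scaled-up raw posterior probability (multiply the raw prob by 18) | F: $E(N_2 \mid \text{Class})$ |
|---|---|---|---|---|---|
| 1 | 3/6 | 1/3 | $\frac{3}{6}\cdot\frac{1}{3}$ | 3 | 1 |
| 2 | 2/6 | 1/6 | $\frac{2}{6}\cdot\frac{1}{6}$ | 1 | 2 |
| 3 | 1/6 | 0 | $\frac{1}{6}\cdot 0$ | 0 | 3 |

Because the posterior probability that Class 3 produces $N_1 = 1$ is zero, we can delete the last row.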

Nov 2000 #33

A car manufacturer is testing the ability of safety devices to limit damage in car

accidents. You are given:

A test car has either front air bags or side air bags (but not both), each type being

equally likely

The test car will be driven into either a wall or a lake, with each accident type

being equally likely

The manufacturer randomly selects 1, 2, 3, or 4 crash test dummies to put into a

car with front air bags.

The manufacturer randomly selects 2, or 4 crash test dummies to put into a car

with side air bags.

Each crash test dummy in a wall-impact accident suffers damage randomly equal

to either 0.5 or 1, with damage to each dummy being independent of damage to

the others.

Each crash test dummy in a lake-impact accident suffers damage randomly equal

to either 1 or 2, with damage to each dummy being independent of damage to the

others.

One test car is selected at random, and a test dummy accident produces total damage of 1.

Determine the expected value of the total damage for the next accident, given that the

kind of safety device (front or side air bags) and accident type (wall or lake) remain the

same.

Solution

This is one of the most feared exam problems. If you use the framework and shortcut,

however, you should do just fine.

Conceptual framework

Damage $S = \sum_{i=1}^{N} X_i$, where $X$ is the damage incurred by one test dummy and $N$ is the number of dummies chosen for the crash testing. The observation is $S_1 = 1$. We are asked to find $E(S_2 \mid S_1 = 1)$.

To simplify the problem, let's first discard the observation. Then the problem becomes finding $E(S_2)$. The crash testing falls into four types:

Front air bag, wall collision (FW)

Front air bag, lake collision (FL)

Side air bag, wall collision (SW)

Side air bag, lake collision (SL)

Next, we set up the partition equation:

$$E(S_2) = E(S_2 \mid FW)P(FW) + E(S_2 \mid FL)P(FL) + E(S_2 \mid SW)P(SW) + E(S_2 \mid SL)P(SL)$$

Under FW, the manufacturer randomly selects 1, 2, 3, or 4 crash test dummies to put into a car with front air bags. Each number of dummies is equally likely to be chosen. So the expected number of dummies used for crash testing under FW is:

$$E(N \mid FW) = E(N \mid F) = \frac{1 + 2 + 3 + 4}{4} = 2.5$$

If the car is tested for wall collision, then the damage to a tested dummy can be either 0.5 or 1, with each damage equally likely:

$$E(X \mid FW) = E(X \mid W) = \frac{0.5 + 1}{2} = 0.75$$

$$E(S_2 \mid FW) = E(N \mid FW)E(X \mid FW) = 2.5(0.75)$$

Similarly,

$$E(S_2 \mid FL) = E(N \mid FL)E(X \mid FL) = \frac{1 + 2 + 3 + 4}{4}\cdot\frac{1 + 2}{2} = 2.5(1.5)$$

$$E(S_2 \mid SW) = E(N \mid SW)E(X \mid SW) = \frac{2 + 4}{2}\cdot\frac{0.5 + 1}{2} = 3(0.75)$$

$$E(S_2 \mid SL) = E(N \mid SL)E(X \mid SL) = \frac{2 + 4}{2}\cdot\frac{1 + 2}{2} = 3(1.5)$$

$$P(FW) = P(FL) = P(SW) = P(SL) = 0.25$$

However, we are interested in finding the posterior mean $E(S_2 \mid S_1 = 1)$. So we need to consider the impact of the observation $S_1 = 1$. This observation will change the partition equation into:

$$E(S_2 \mid S_1 = 1) = 2.5(0.75)P(FW \mid S_1 = 1) + 2.5(1.5)P(FL \mid S_1 = 1) + 3(0.75)P(SW \mid S_1 = 1) + 3(1.5)P(SL \mid S_1 = 1)$$

$$P(FW \mid S_1 = 1) = \frac{P(FW)P(S_1 = 1 \mid FW)}{P(S_1 = 1)}, \qquad P(FL \mid S_1 = 1) = \frac{P(FL)P(S_1 = 1 \mid FL)}{P(S_1 = 1)}$$

$$P(SW \mid S_1 = 1) = \frac{P(SW)P(S_1 = 1 \mid SW)}{P(S_1 = 1)}, \qquad P(SL \mid S_1 = 1) = \frac{P(SL)P(S_1 = 1 \mid SL)}{P(S_1 = 1)}$$

where

$$P(S_1 = 1) = P(FW)P(S_1 = 1 \mid FW) + P(FL)P(S_1 = 1 \mid FL) + P(SW)P(S_1 = 1 \mid SW) + P(SL)P(S_1 = 1 \mid SL)$$

The key is to calculate $P(S_1 = 1 \mid FW)$. In a front air bag, wall collision test, the number of dummies can be 1, 2, 3, or 4; the damage per dummy can be 0.5 or 1. So there are only 2 ways for FW to produce $S_1 = 1$:

Two dummies were chosen, each having 0.5 damage. Probability: 0.25(0.5)(0.5)

One dummy was chosen, having 1 damage. Probability: 0.25(0.5)

Total probability: $P(S_1 = 1 \mid FW) = 0.25(0.5)(0.5) + 0.25(0.5) = 0.1875$

We can apply the same logic and find (please verify my calculation):

$$P(S_1 = 1 \mid FL) = 0.125, \qquad P(S_1 = 1 \mid SW) = 0.125, \qquad P(S_1 = 1 \mid SL) = 0$$

Then,

$$P(FW \mid S_1 = 1) = \frac{0.25(0.1875)}{0.25(0.1875 + 0.125 + 0.125)} = \frac{3}{7}$$

$$P(FL \mid S_1 = 1) = \frac{0.25(0.125)}{0.25(0.1875 + 0.125 + 0.125)} = \frac{2}{7}$$

$$P(SW \mid S_1 = 1) = \frac{0.25(0.125)}{0.25(0.1875 + 0.125 + 0.125)} = \frac{2}{7}$$

$$P(SL \mid S_1 = 1) = \frac{0.25(0)}{0.25(0.1875 + 0.125 + 0.125)} = 0$$

Finally,

$$E(S_2 \mid S_1 = 1) = 2.5(0.75)\left(\frac{3}{7}\right) + 2.5(1.5)\left(\frac{2}{7}\right) + 3(0.75)\left(\frac{2}{7}\right) + 3(1.5)(0) = 2.518$$

7 7 7

Event: $S_1 = 1$

| A: Group (class) | B: Before-event size of the group (prior probability) | C: This group's probability to produce the event | D = B × C: After-event size of the group (raw posterior probability) | E: Scaled-up raw posterior probability (multiply the raw prob by 40,000) | F: $E(S_2 \mid \text{class})$ |
|---|---|---|---|---|---|
| FW | 1/4 | 0.1875 | $\frac{1}{4}(0.1875)$ | 1875 | 2.5(0.75) |
| FL | 1/4 | 0.125 | $\frac{1}{4}(0.125)$ | 1250 | 2.5(1.5) |
| SW | 1/4 | 0.125 | $\frac{1}{4}(0.125)$ | 1250 | 3(0.75) |
| SL | 1/4 | 0 | 0 | 0 | 3(1.5) |

Because the posterior probability that class SL produces $S_1 = 1$ is zero, we can delete the last row.

Enter the following into BA II Plus/Professional 1-V Statistics Worksheet:

X01= 2.5 (0.75), Y01=1875

X02= 2.5 (1.5), Y02=1250

X03= 3 (0.75), Y03=1250
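The likelihoods in column C are easy to get wrong by hand, so here is a hedged Python sketch that enumerates every combination of dummy count and per-dummy damage for the four test types. The dictionary layout and the helper names `prob_total` and `mean_total` are illustrative assumptions; the problem data come from the exam question above.

```python
from fractions import Fraction
from itertools import product

dummies = {"F": [1, 2, 3, 4], "S": [2, 4]}            # equally likely dummy counts
damage = {"W": [Fraction(1, 2), Fraction(1)],          # equally likely damage per dummy
          "L": [Fraction(1), Fraction(2)]}

def prob_total(bag, accident, total):
    """P(S = total | air bag type, accident type) by direct enumeration."""
    counts, sizes = dummies[bag], damage[accident]
    p = Fraction(0)
    for n in counts:
        for combo in product(sizes, repeat=n):
            if sum(combo) == total:
                p += Fraction(1, len(counts)) * Fraction(1, len(sizes)) ** n
    return p

def mean_total(bag, accident):
    """E(S | type) = E(N | air bag type) * E(X | accident type)."""
    mean_n = Fraction(sum(dummies[bag]), len(dummies[bag]))
    mean_x = sum(damage[accident]) / len(damage[accident])
    return mean_n * mean_x

types = [(b, a) for b in "FS" for a in "WL"]           # FW, FL, SW, SL
raw = {t: Fraction(1, 4) * prob_total(*t, 1) for t in types}
total = sum(raw.values())
estimate = sum(raw[t] / total * mean_total(*t) for t in types)   # 141/56, about 2.518
```

Enumeration is affordable here because there are at most four dummies and two damage levels per dummy.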

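The # of claims on a given policy has the geometric distribution with parameter $\beta$.

One-third of the policies have $\beta = 2$; and the remaining two-thirds have $\beta = 5$.

Calculate the Bayesian expected # of claims for the selected policy in Year 2.

Solution

If we ignore the observation $N_1 = 2$, then

$$E(N_2) = E(N_2 \mid \beta = 2)P(\beta = 2) + E(N_2 \mid \beta = 5)P(\beta = 5)$$

$$E(N_2) = 2P(\beta = 2) + 5P(\beta = 5)$$

Considering the observation:

$$E(N_2 \mid N_1 = 2) = 2P(\beta = 2 \mid N_1 = 2) + 5P(\beta = 5 \mid N_1 = 2)$$

Event: A policy has 2 claims in Year 1.

| A: Group | B: Before-event size of the group (prior probability) | C: This group's probability to produce the event (a geometric distribution), $P(N_1 = 2 \mid \beta) = \frac{\beta^2}{(1 + \beta)^3}$ | D = B × C: After-event size of the group (raw posterior probability) | E: Scaled-up raw posterior probability (multiply the raw prob by 100,000) | F: $E(N_2 \mid \beta) = \beta$ |
|---|---|---|---|---|---|
| $\beta = 2$ | 1/3 | $\frac{2^2}{(1 + 2)^3} = \frac{4}{27}$ | $\frac{1}{3}\cdot\frac{4}{27} = 0.04938$ | 4,938 | 2 |
| $\beta = 5$ | 2/3 | $\frac{5^2}{(1 + 5)^3} = \frac{25}{216}$ | $\frac{2}{3}\cdot\frac{25}{216} = 0.07716$ | 7,716 | 5 |

X01=2, Y01=4,938; X02=5, Y02=7,716.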
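The calculator entries above amount to a weighted mean of the two $\beta$ values. A minimal Python sketch of that weighted mean, using exact fractions and a hypothetical helper `geometric_pmf`, is:

```python
from fractions import Fraction

def geometric_pmf(n, beta):
    """P(N = n) for a geometric distribution with mean beta: beta^n / (1 + beta)^(n + 1)."""
    return Fraction(beta**n, (1 + beta) ** (n + 1))

priors = {2: Fraction(1, 3), 5: Fraction(2, 3)}     # beta -> prior probability

# Raw posterior probabilities after observing 2 claims in Year 1
raw = {beta: p * geometric_pmf(2, beta) for beta, p in priors.items()}

# Posterior-weighted mean of beta, which is the Bayesian estimate for Year 2
estimate = sum(beta * r for beta, r in raw.items()) / sum(raw.values())
```

With these inputs the estimate is 157/41, about 3.83, which is what the Statistics Worksheet returns as the mean of X weighted by Y.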

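For a particular policy, the conditional probability of the annual number of claims given $\Theta = \theta$, and the probability distribution of $\Theta$ are as follows:

| # of claims | 0 | 1 | 2 |
|---|---|---|---|
| Probability | $2\theta$ | $\theta$ | $1 - 3\theta$ |

| $\theta$ | 0.10 | 0.30 |
|---|---|---|
| Probability | 0.80 | 0.20 |

Solution

The observation is $X_1 = 1$. We are asked to find $E(X_2 \mid X_1 = 1)$.

$$E(X_2 \mid \theta) = 0\cdot P(X_2 = 0 \mid \theta) + 1\cdot P(X_2 = 1 \mid \theta) + 2\cdot P(X_2 = 2 \mid \theta) = 0(2\theta) + 1(\theta) + 2(1 - 3\theta) = 2 - 5\theta$$

Event: $X_1 = 1$

| A: Group | B: Before-event size of the group (prior probability) | C: This group's probability to produce the event (the probability to have one claim is $\theta$) | D = B × C: After-event size of the group (raw posterior probability) | E: Scaled-up raw posterior probability (multiply the raw prob by 100) | F: $E(X_2 \mid \theta) = 2 - 5\theta$ |
|---|---|---|---|---|---|
| $\theta = 0.10$ | 0.80 | 0.10 | 0.80(0.10) | 8 | 1.5 |
| $\theta = 0.30$ | 0.20 | 0.30 | 0.20(0.30) | 6 | 0.5 |

X01=1.5, Y01=8; X02=0.5, Y02=6.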

Calculate Bayesian premiums when the prior probability is continuous

The solution process for continuous-prior problems is similar to the process for discrete-prior problems. There are two major differences:

We'll use integration for the continuous-prior problems; we'll use summation for the discrete-prior problems.

You can't use the BA II Plus/Professional 1-V Statistics Worksheet shortcut to solve a continuous-prior premium problem. In contrast, you can use the BA II Plus/Professional 1-V Statistics Worksheet shortcut to solve a discrete-prior premium problem.

Step 1 Determine the observation.

Step 2. Change the prior probability to posterior probability.

You are given the following information about workers' compensation coverage:

The # of claims from an employee during the year follows a Poisson distribution with mean $\frac{100 - p}{100}$, where $p$ is the salary (in thousands) for the employee.

The distribution of $p$ is uniform on the interval [0, 100].

An employee is selected at random. No claims were observed for this employee during the year. Determine the posterior probability that the selected employee has a salary greater than 50.

Solution

Please note we are NOT asked to find $P(N_2 > 50 \mid N_1 = 0)$.

Step 2 Ignore the observation. Set up your partition equation.

If we ignore the observation, we just need to find $P(p > 50)$. Since $p$ is uniform on the interval [0, 100], we have:

$$P(p > 50) = \int_{50}^{100} f(p)\,dp$$

Considering the observation, we change the prior density to the posterior density:

$$P(p > 50 \mid N = 0) = \int_{50}^{100} f(p \mid N = 0)\,dp$$

$$f(p \mid N = 0) = \frac{f(p)P(N = 0 \mid p)}{P(N = 0)} = \frac{f(p)P(N = 0 \mid p)}{\int_{0}^{100} f(p)P(N = 0 \mid p)\,dp}$$

$N \mid p$ is a Poisson random variable with mean $\lambda = \frac{100 - p}{100} = 1 - 0.01p$. So

$$P(N = 0 \mid p) = e^{-(1 - 0.01p)} = e^{0.01p - 1}, \qquad f(p)P(N = 0 \mid p) = 0.01e^{0.01p - 1}$$

$$P(N = 0) = \int_{0}^{100} f(p)P(N = 0 \mid p)\,dp = \int_{0}^{100} 0.01e^{0.01p - 1}\,dp = e^{-1}\int_{0}^{100} 0.01e^{0.01p}\,dp = e^{-1}(e - 1) = 1 - e^{-1}$$

$$f(p \mid N = 0) = \frac{0.01e^{0.01p - 1}}{P(N = 0)} = \frac{0.01e^{0.01p - 1}}{1 - e^{-1}} = \frac{0.01e^{0.01p}}{e - 1}$$

$$P(p > 50 \mid N = 0) = \int_{50}^{100} f(p \mid N = 0)\,dp = \int_{50}^{100} \frac{0.01e^{0.01p}}{e - 1}\,dp = \frac{e - e^{0.5}}{e - 1} = 0.622$$

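Shortcut

Since $N \mid p$ is a Poisson random variable with mean $\frac{100 - p}{100}$, we naturally set $\lambda = \frac{100 - p}{100}$. Since $p$ is uniform over [0, 100], $100 - p$ is also uniform over [0, 100] and $\lambda = \frac{100 - p}{100}$ is uniform over [0, 1]. So $f(\lambda) = 1$.

$$f(\lambda \mid N = 0) = \frac{f(\lambda)P(N = 0 \mid \lambda)}{P(N = 0)} = \frac{e^{-\lambda}}{\int_{0}^{1} e^{-\lambda}\,d\lambda} = \frac{e^{-\lambda}}{1 - e^{-1}}$$

$$\lambda = \frac{100 - p}{100} = 1 - \frac{p}{100}, \qquad p = 100(1 - \lambda)$$

$$P(p > 50 \mid N = 0) = P(\lambda < 0.5 \mid N = 0) = \int_{0}^{0.5} f(\lambda \mid N = 0)\,d\lambda = \int_{0}^{0.5} \frac{e^{-\lambda}}{1 - e^{-1}}\,d\lambda = \frac{1 - e^{-0.5}}{1 - e^{-1}} = 0.6225$$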

You are given:

In a portfolio of risks, each policyholder can have at most two claims per year.

For each year, the distribution of the number of claims is:

| # of claims | Probability |
|---|---|
| 0 | 0.1 |
| 1 | $0.9 - q$ |
| 2 | $q$ |

The prior density is $\pi(q) = \frac{q^2}{0.039}$, $0.2 < q < 0.5$

A randomly selected policyholder had two claims in Year 1 and two claims in Year 2. For this insured, determine the Bayesian estimate of the expected number of claims in Year 3.

Solution

Continuous-prior problems are harder than discrete-prior ones and many candidates are scared of them. However, if you can follow the 5-step framework, you'll be on the right track.

Let's simplify the problem by discarding the observation $(N_1 = 2, N_2 = 2)$. Then our task is to find the prior mean $E(N_3)$. This is an Exam P problem.

$N_3$ is distributed as follows:

$$N_3 = \begin{cases} 0 & \text{with probability } 0.1 \\ 1 & \text{with probability } 0.9 - q \\ 2 & \text{with probability } q \end{cases}$$

Here $q$ is a random variable with pdf $\pi(q) = \frac{q^2}{0.039}$, $0.2 < q < 0.5$. If $q$ is fixed, then the prior mean given $q$ is:

$$E(N_3 \mid q) = 0(0.1) + 1(0.9 - q) + 2q = q + 0.9$$

$$E(N_3) = E_q\left[E(N_3 \mid q)\right] = E_q(q + 0.9) = E(q) + 0.9$$

$$E(q) = \int_{0.2}^{0.5} q\,\pi(q)\,dq = \int_{0.2}^{0.5} q\,\frac{q^2}{0.039}\,dq = 0.39$$

So the mean prior to the observation is 1.29. Please note that we don't need to calculate the prior mean. I calculated it just to show you this: if you discard the observation, then the problem becomes an Exam P problem.

Next, let's add in the observation. The observation $(N_1 = 2, N_2 = 2)$ will change the equation from $E(N_3) = E(q) + 0.9$ to

$$E(N_3 \mid N_1 = 2, N_2 = 2) = E(q \mid N_1 = 2, N_2 = 2) + 0.9$$

$$E(q \mid N_1 = 2, N_2 = 2) = \int_{0.2}^{0.5} q\,f(q \mid N_1 = 2, N_2 = 2)\,dq$$

$$f(q \mid N_1 = 2, N_2 = 2) = \frac{f(q)P(N_1 = 2, N_2 = 2 \mid q)}{P(N_1 = 2, N_2 = 2)} = \frac{f(q)P(N_1 = 2, N_2 = 2 \mid q)}{\int_{0.2}^{0.5} f(q)P(N_1 = 2, N_2 = 2 \mid q)\,dq}$$

$$P(N_1 = 2, N_2 = 2 \mid q) = q^2, \qquad \pi(q) = \frac{q^2}{0.039}$$

$$f(q \mid N_1 = 2, N_2 = 2) = \frac{\frac{q^2}{0.039}\left(q^2\right)}{\int_{0.2}^{0.5} \frac{q^2}{0.039}\left(q^2\right)dq} = \frac{q^4}{\int_{0.2}^{0.5} q^4\,dq}$$

$$E(q \mid N_1 = 2, N_2 = 2) = \int_{0.2}^{0.5} q\,f(q \mid N_1 = 2, N_2 = 2)\,dq = \frac{\int_{0.2}^{0.5} q^5\,dq}{\int_{0.2}^{0.5} q^4\,dq} = \frac{\frac{1}{6}\left[q^6\right]_{0.2}^{0.5}}{\frac{1}{5}\left[q^5\right]_{0.2}^{0.5}} = 0.419$$

So the Bayesian estimate is $E(N_3 \mid N_1 = 2, N_2 = 2) = 0.419 + 0.9 = 1.319$.
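Because both the prior and the likelihood are powers of $q$, every integral in this example has the form $\int_{0.2}^{0.5} q^k\,dq$. The short Python sketch below exploits that; `power_integral` is a hypothetical helper name.

```python
def power_integral(k, lo=0.2, hi=0.5):
    """Integral of q**k from lo to hi."""
    return (hi ** (k + 1) - lo ** (k + 1)) / (k + 1)

# Prior density is proportional to q^2, so E(q) = int q^3 / int q^2
prior_mean_q = power_integral(3) / power_integral(2)

# Posterior density is proportional to q^2 * q^2 = q^4, so E(q | data) = int q^5 / int q^4
posterior_mean_q = power_integral(5) / power_integral(4)

prior_estimate = prior_mean_q + 0.9          # about 1.29
bayes_estimate = posterior_mean_q + 0.9      # about 1.319
```

Note that `power_integral(2)` reproduces the normalizing constant 0.039 in the prior density.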

You are given:

The parameter $\lambda$ has an inverse gamma distribution with probability density

function

, " >0

function

, x >0, " >0

For a single insured, two claims were observed that totaled 50. Determine the expected

value of the next claim from the same insured.

Solution

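We are asked to find $E(X_3 \mid X_1 + X_2 = 50)$. If we ignore the observation $X_1 + X_2 = 50$, then the problem becomes

$$E(X_3) = \int_{0}^{+\infty} x f(x)\,dx = \int_{0}^{+\infty}\int_{0}^{+\infty} x f(x \mid \lambda)\,g(\lambda)\,dx\,d\lambda = \int_{0}^{+\infty}\int_{0}^{+\infty} x\left(\lambda^{-1}e^{-x/\lambda}\right)g(\lambda)\,dx\,d\lambda$$

If we consider the observation, we'll need to change the prior density $g(\lambda)$ to the posterior density $g(\lambda \mid X_1 + X_2 = 50)$:

$$E(X_3 \mid X_1 + X_2 = 50) = \int_{0}^{+\infty}\int_{0}^{+\infty} x\left(\lambda^{-1}e^{-x/\lambda}\right)g(\lambda \mid X_1 + X_2 = 50)\,dx\,d\lambda$$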

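For a group of insureds, you are given:

The amount of a claim is uniformly distributed but will not exceed a certain unknown limit $\theta$.

The prior distribution of $\theta$ is $\pi(\theta) = \frac{500}{\theta^2}$, $\theta > 500$

Two independent claims of 400 and 600 are observed.

Determine the probability that the next claim will exceed 550.

Solution

$X_3$ is uniformly distributed over $[0, \theta]$. So $P(X_3 > 550 \mid \theta) = 1 - \frac{550}{\theta}$ for $\theta > 550$ (and zero otherwise). If we ignore the observation:

$$P(X_3 > 550) = \int_{550}^{\infty}\left(1 - \frac{550}{\theta}\right)f(\theta)\,d\theta$$

Since we have the observation $X_1 = 400, X_2 = 600$, we will modify the above equation by changing the prior density $f(\theta)$ to the posterior density $f(\theta \mid X_1 = 400, X_2 = 600)$:

$$P(X_3 > 550 \mid X_1 = 400, X_2 = 600) = \int_{600}^{\infty}\left(1 - \frac{550}{\theta}\right)f(\theta \mid X_1 = 400, X_2 = 600)\,d\theta$$

Please note that we've also changed the lower limit of integration from 550 to 600 because we've observed $X_2 = 600$, so $\theta$ must be at least 600.

The posterior density is proportional to the prior density times the density of each observation:

$$f(\theta \mid X_1 = 400, X_2 = 600) \propto \frac{500}{\theta^2}\cdot\frac{1}{\theta}\cdot\frac{1}{\theta} = \frac{500}{\theta^4}, \quad \text{where } \theta > 600$$

$$f(\theta \mid X_1 = 400, X_2 = 600) = \frac{k}{\theta^4}, \quad \text{where } \theta > 600$$

$$\int_{600}^{\infty}\frac{k}{\theta^4}\,d\theta = \frac{k}{3\left(600^3\right)} = 1, \qquad k = 3\left(600^3\right)$$

$$f(\theta \mid X_1 = 400, X_2 = 600) = \frac{3\left(600^3\right)}{\theta^4}$$

$$P(X_3 > 550 \mid X_1 = 400, X_2 = 600) = \int_{600}^{\infty}\left(1 - \frac{550}{\theta}\right)\frac{3\left(600^3\right)}{\theta^4}\,d\theta = 3\left(600^3\right)\int_{600}^{\infty}\left(\theta^{-4} - 550\,\theta^{-5}\right)d\theta$$

$$= 3\left(600^3\right)\left[\frac{1}{3\left(600^3\right)} - \frac{550}{4\left(600^4\right)}\right] = 1 - \frac{3}{4}\cdot\frac{550}{600} = 0.3125$$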

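You are given:

The amount of a claim, $X$, is uniformly distributed on the interval $[0, \theta]$

The prior distribution of $\theta$ is $\pi(\theta) = \frac{500}{\theta^2}$, $\theta > 500$

Two claims, $x_1 = 400$ and $x_2 = 600$, are observed. You calculate the posterior distribution as:

$$f(\theta \mid x_1, x_2) = 3\left(\frac{600^3}{\theta^4}\right), \quad \theta > 600$$

Calculate the Bayesian premium $E(X_3 \mid x_1, x_2)$.

Solution

$$E(X_3 \mid x_1, x_2) = \int_{600}^{\infty} E(X_3 \mid \theta)\,f(\theta \mid x_1, x_2)\,d\theta$$

$X_3$ is uniform over $[0, \theta]$. So $E(X_3 \mid \theta) = \frac{\theta}{2}$.

$$E(X_3 \mid x_1, x_2) = \int_{600}^{\infty}\frac{\theta}{2}\cdot 3\left(\frac{600^3}{\theta^4}\right)d\theta = \frac{3}{2}\left(600^3\right)\int_{600}^{\infty}\theta^{-3}\,d\theta = \frac{3}{2}\left(600^3\right)\left(\frac{1}{2}\cdot 600^{-2}\right) = 450$$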
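Both uniform-claim examples above share one structure: a prior of the form $500/\theta^2$ is a single-parameter Pareto density, and each observed uniform claim multiplies it by $1/\theta$, which raises the tail index by one and moves the lower bound to the largest observed value. The sketch below uses that observation with exact fractions; the variable names are illustrative, and the closed forms for $E(1/\theta)$ and $E(\theta)$ are standard results for a density $k\,m^k/\theta^{k+1}$ on $\theta > m$.

```python
from fractions import Fraction

prior_lower = 500            # prior density 500 / theta^2 for theta > 500
claims = [400, 600]          # observed claim amounts

m = max(prior_lower, *claims)     # posterior support is theta > m
k = 1 + len(claims)               # prior tail index 1, plus one per observed claim

# For the density k * m^k / theta^(k + 1) on theta > m:
mean_inv_theta = Fraction(k, (k + 1) * m)     # E(1 / theta)
mean_theta = Fraction(k * m, k - 1)           # E(theta), valid for k > 1

# P(X3 > 550 | data) = 1 - 550 * E(1 / theta), valid because 550 <= m
prob_exceed_550 = 1 - 550 * mean_inv_theta    # 5/16 = 0.3125

# E(X3 | data) = E(theta) / 2
bayes_premium = mean_theta / 2                # 450
```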

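You are given:

An individual automobile insured has annual claim frequencies that follow a Poisson distribution with mean $\lambda$

An actuary's prior distribution for the parameter $\lambda$ has probability density function

$$\pi(\lambda) = (0.5)\,5e^{-5\lambda} + (0.5)\,\frac{1}{5}e^{-\lambda/5}$$

In the first policy year, no claims were observed for the insured.

Solution

If we ignore the observation $N_1 = 0$, then the problem becomes finding $E(N_2)$. Using the double expectation theorem, we have:

$$E(N_2) = E_\lambda\left[E(N_2 \mid \lambda)\right] = E(\lambda) = \int_{0}^{\infty}\lambda\,\pi(\lambda)\,d\lambda$$

Considering the observation, we change the prior density to the posterior density:

$$E(N_2 \mid N_1 = 0) = E(\lambda \mid N_1 = 0) = \int_{0}^{\infty}\lambda\,\pi(\lambda \mid N_1 = 0)\,d\lambda$$

Since $P(N_1 = 0 \mid \lambda) = e^{-\lambda}$, the posterior density is

$$\pi(\lambda \mid N_1 = 0) = k\left[(0.5)\,5e^{-5\lambda} + (0.5)\,\frac{1}{5}e^{-\lambda/5}\right]e^{-\lambda} = k\left[(0.5)\,5e^{-6\lambda} + (0.5)\,\frac{1}{5}e^{-6\lambda/5}\right]$$

$$= k\left[\frac{5(0.5)}{6}\left(6e^{-6\lambda}\right) + \frac{0.5}{6}\left(\frac{6}{5}e^{-6\lambda/5}\right)\right]$$

Next, we'll need to find the normalizing constant $k$. The total probability should be one. Since $6e^{-6\lambda}$ and $\frac{6}{5}e^{-6\lambda/5}$ are both exponential densities, each integrates to one. We have:

$$\int_{0}^{\infty}\pi(\lambda \mid N_1 = 0)\,d\lambda = k\left[\frac{5(0.5)}{6} + \frac{0.5}{6}\right] = \frac{k}{2} = 1, \qquad k = 2$$

$$\pi(\lambda \mid N_1 = 0) = 2\left[\frac{5(0.5)}{6}\left(6e^{-6\lambda}\right) + \frac{0.5}{6}\left(\frac{6}{5}e^{-6\lambda/5}\right)\right] = \frac{5}{6}\left(6e^{-6\lambda}\right) + \frac{1}{6}\left(\frac{6}{5}e^{-6\lambda/5}\right)$$

The posterior is a mixture of two exponential distributions with means $\frac{1}{6}$ and $\frac{5}{6}$. So

$$E(N_2 \mid N_1 = 0) = \int_{0}^{\infty}\lambda\,\pi(\lambda \mid N_1 = 0)\,d\lambda = \frac{5}{6}\cdot\frac{1}{6} + \frac{1}{6}\cdot\frac{5}{6} = 0.278$$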
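The mixture-prior calculation can be organized component by component. For an exponential prior component with rate $r$, observing zero Poisson claims multiplies the density by $e^{-\lambda}$, which contributes a factor $r/(r+1)$ to the component's weight and turns the component into an exponential with rate $r + 1$. A minimal Python sketch of this bookkeeping, with illustrative variable names, is:

```python
from fractions import Fraction

# Prior: (weight, rate) pairs; rate 5 has mean 1/5, rate 1/5 has mean 5
components = [(Fraction(1, 2), Fraction(5)), (Fraction(1, 2), Fraction(1, 5))]

# Raw posterior weight of each component after observing N1 = 0:
# weight * integral of r * exp(-r * lam) * exp(-lam) d(lam) = weight * r / (r + 1)
raw = [w * r / (r + 1) for w, r in components]
post_weights = [x / sum(raw) for x in raw]               # 5/6 and 1/6

# Each posterior component is exponential with rate r + 1, i.e. mean 1 / (r + 1)
post_mean = sum(w / (r + 1) for w, (_, r) in zip(post_weights, components))
```

The result is 5/18, which is the 0.278 obtained above.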

Poisson-gamma model

You are given:

An individual automobile insured has an annual claim frequency distribution that follows a Poisson distribution with mean $\lambda$

$\lambda$ follows a gamma distribution with parameters $\alpha$ and $\theta$

The 1st actuary assumes that $\alpha = 1$ and $\theta = 1/6$

The 2nd actuary assumes the same mean for the gamma distribution, but only half the variance

A total of one claim is observed for the insured over a 3-year period

Both actuaries determine the Bayesian premium for the expected number of claims in the next year using their model assumptions

Determine the ratio of the Bayesian premium that the 1st actuary calculates to the Bayesian premium that the 2nd actuary calculates.

Solution

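If

$\lambda$ follows a gamma distribution with parameters $\alpha$ and $\theta$

$n_1, n_2, \ldots, n_k$ claims are observed in Year 1, Year 2, ..., Year $k$ respectively

Then

The conditional random variable $\lambda \mid n_1, n_2, \ldots, n_k$ also follows a gamma distribution with parameters

$$\alpha^* = \alpha + n_1 + n_2 + \ldots + n_k = \alpha + \text{total \# of claims observed}$$

$$\theta^* = \frac{\theta}{1 + k\theta}, \qquad \text{that is,} \quad \frac{1}{\theta^*} = \frac{1}{\theta} + k = \frac{1}{\theta} + \text{\# of observation years}$$

$$E(N_{k+1} \mid n_1, n_2, \ldots, n_k) = E(\lambda \mid n_1, n_2, \ldots, n_k) = \alpha^*\theta^* = \frac{\alpha + \text{total \# of claims observed}}{\frac{1}{\theta} + \text{\# of observation years}}$$

This theorem is tested over and over and you should memorize it. If you want to find the proof of this theorem, refer to the textbook Loss Models.

In this problem,

the observation period = 3 years

\# of claims observed = 1

1st actuary: $\alpha = 1$, $\theta = 1/6$. The Bayesian premium for the 4th year is

$$\frac{\alpha + \text{total \# of claims observed}}{\frac{1}{\theta} + \text{\# of observation years}} = \frac{1 + 1}{6 + 3} = \frac{2}{9}$$

2nd actuary: You need to know that a gamma distribution with parameters $\alpha$ and $\theta$ has mean $\alpha\theta$ and variance $\alpha\theta^2$. We are told that the two actuaries get the same mean but the 2nd actuary gets half the variance of the 1st one.

$$\alpha\theta = 1\left(\frac{1}{6}\right) = \frac{1}{6}, \qquad \alpha\theta^2 = \frac{1}{2}(1)\left(\frac{1}{6}\right)^2, \qquad \alpha = 2, \quad \theta = \frac{1}{12}$$

$$\frac{\alpha + \text{total \# of claims observed}}{\frac{1}{\theta} + \text{\# of observation years}} = \frac{2 + 1}{12 + 3} = \frac{1}{5}$$

So the ratio is $\frac{2}{9} \div \frac{1}{5} = \frac{10}{9}$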
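The Poisson-gamma theorem reduces to a one-line formula, which makes it a natural candidate for a quick sanity check. The sketch below wraps the formula in a hypothetical function `poisson_gamma_premium` and reproduces both actuaries' premiums with exact fractions.

```python
from fractions import Fraction

def poisson_gamma_premium(alpha, theta, total_claims, years):
    """Posterior mean of a Poisson rate with a gamma(alpha, theta) prior:
    (alpha + total claims) / (1/theta + number of observation years)."""
    return (alpha + total_claims) / (1 / theta + years)

# 1st actuary: alpha = 1, theta = 1/6
first = poisson_gamma_premium(Fraction(1), Fraction(1, 6), total_claims=1, years=3)

# 2nd actuary: same mean (alpha * theta = 1/6), half the variance (alpha * theta^2 = 1/72)
second = poisson_gamma_premium(Fraction(2), Fraction(1, 12), total_claims=1, years=3)

ratio = first / second
```

Here `first` is 2/9, `second` is 1/5, and `ratio` is 10/9, matching the solution.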

Nov 2001 #3

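You are given:

The # of claims per auto insured follows a Poisson distribution with mean $\lambda$

The prior distribution for $\lambda$ has the following probability density function:

$$f(\lambda) = \frac{(500\lambda)^{50}e^{-500\lambda}}{\lambda\,\Gamma(50)}$$

| | Year 1 | Year 2 |
|---|---|---|
| # of claims | 75 | 210 |
| # of autos insured | 600 | 900 |

1,100 autos are insured in Year 3.

Determine the expected # of claims in Year 3.

Solution

The observation is $N_1 = 75, N_2 = 210$, where $N_1$ is the # of claims in Year 1 for the 600 auto policies; $N_2$ is the # of claims in Year 2 for the 900 auto policies. $N_1$ has a Poisson distribution with mean $600\lambda$. $N_2$ has a Poisson distribution with mean $900\lambda$.

We need to find $E(N_3 \mid N_1 = 75, N_2 = 210)$, where $N_3$ is the # of claims in Year 3 for one auto policy. Then the expected # of auto claims in Year 3 for 1,100 auto policies is simply $1{,}100\,E(N_3 \mid N_1 = 75, N_2 = 210)$.

We are told that

$$f(\lambda) = \frac{(500\lambda)^{50}e^{-500\lambda}}{\lambda\,\Gamma(50)}$$

If you look at Table for Exam C, you'll find the gamma pdf is:

$$f(x) = \frac{\left(\frac{x}{\theta}\right)^{\alpha}e^{-x/\theta}}{x\,\Gamma(\alpha)} = \frac{(x\beta)^{\alpha}e^{-x\beta}}{x\,\Gamma(\alpha)}, \quad \text{where } \beta = \frac{1}{\theta}$$

You should immediately recognize that the prior is a gamma distribution with parameters $\alpha = 50$ and $\beta = 500$. Then using the gamma distribution formula listed in Table for Exam C, we have

$$E(N_3) = E(\lambda) = \alpha\theta = \frac{\alpha}{\beta} = \frac{50}{500} = 0.1$$

If we consider the observation $N_1 = 75, N_2 = 210$, then we need to modify the formula $E(N_3) = E(\lambda)$ to $E(N_3 \mid N_1 = 75, N_2 = 210) = E(\lambda \mid N_1 = 75, N_2 = 210)$.

The posterior density of $\lambda$ is proportional to the prior density times the probability of the observation:

$$f(\lambda \mid N_1 = 75, N_2 = 210) \propto \lambda^{49}e^{-500\lambda}\cdot e^{-600\lambda}\frac{(600\lambda)^{75}}{75!}\cdot e^{-900\lambda}\frac{(900\lambda)^{210}}{210!} \propto \lambda^{49 + 75 + 210}e^{-(500 + 600 + 900)\lambda} = \lambda^{334}e^{-2000\lambda}$$

So $\lambda \mid N_1 = 75, N_2 = 210$ is a gamma distribution with parameters $\alpha^* = 335$ and $\theta^* = \frac{1}{2{,}000}$.

$$E(\lambda \mid N_1 = 75, N_2 = 210) = \alpha^*\theta^* = \frac{335}{2{,}000}$$

Then the expected # of auto claims in Year 3 for 1,100 auto policies is simply

$$1{,}100\left(\frac{335}{2{,}000}\right) = 184.25$$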
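When the number of exposures differs by year, the Poisson-gamma update adds the observed claims to $\alpha$ and the exposures (autos insured) to the rate $1/\theta$. The following Python sketch applies that update to this problem; the variable names are illustrative.

```python
from fractions import Fraction

alpha, rate = 50, 500            # prior gamma: alpha = 50, rate = 1/theta = 500
claims = [75, 210]               # claims observed in Year 1 and Year 2
autos = [600, 900]               # autos insured in Year 1 and Year 2
autos_year3 = 1100

alpha_post = alpha + sum(claims)     # 335
rate_post = rate + sum(autos)        # 2,000

per_auto = Fraction(alpha_post, rate_post)      # posterior mean claim rate per auto
expected_claims = autos_year3 * per_auto        # 184.25
```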

May 2001 #2

Annual claim counts follow a Poisson distribution with mean $\lambda$

The parameter $\lambda$ has a prior distribution with probability density function

$$f(\lambda) = \frac{1}{3}e^{-\lambda/3}, \quad \lambda > 0$$

Two claims were observed during the 1st year. Determine the variance of the posterior distribution of $\lambda$.

Solution

Please note that the exponential distribution is a gamma distribution with parameter $\alpha = 1$. So this is the Poisson-gamma model.

The observation is $N_1 = 2$. We are asked to find the variance $Var(\lambda \mid N_1 = 2)$. We are told that $N \mid \lambda$ is Poisson with mean $\lambda$, yet $\lambda$ is gamma with $\alpha = 1$, $\theta = 3$.

$$\alpha^* = \alpha + \text{\# of observed claims} = \alpha + N_1 = 1 + 2 = 3$$

$$\theta^* = \left(\text{\# of observation periods} + \frac{1}{\theta}\right)^{-1} = \left(1 + \frac{1}{3}\right)^{-1} = 0.75$$

$$Var(\lambda \mid N_1 = 2) = \alpha^*\left(\theta^*\right)^2 = 3(0.75)^2 = 1.6875$$
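The same Poisson-gamma update gives the posterior variance directly. A minimal sketch with exact fractions, using illustrative variable names:

```python
from fractions import Fraction

alpha, theta = 1, Fraction(3)     # exponential prior with mean 3 = gamma(alpha=1, theta=3)
claims, periods = 2, 1            # two claims observed in one year

alpha_post = alpha + claims                    # 3
theta_post = 1 / (1 / theta + periods)         # 3/4
posterior_variance = alpha_post * theta_post**2
```

The variance of a gamma distribution is $\alpha\theta^2$, so `posterior_variance` equals 27/16, that is, 1.6875.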

Binomial-beta model

For a risk, you are given:

The # of claims during a single year follows a Bernoulli distribution with mean $p$

The prior distribution for $p$ is uniform on the interval [0, 1]

The claims experience is observed for a number of years

The Bayesian premium is calculated as $\frac{1}{5}$ based on the observed claims

Which of the following observed claims data could have yielded this calculation?

0 claims during 3 years

0 claims during 4 years

0 claims during 5 years

1 claim during 4 years

1 claim during 5 years

Solution

Please note that a uniform distribution on [0, 1] is a special case of the beta distribution with parameters $a = b = \theta = 1$. In addition, the Bernoulli distribution is a special case of the binomial distribution with $n = 1$.

If

$p$ has a beta distribution with parameters $a$ and $b$

$x_1, x_2, \ldots, x_k$ claims are observed in Year 1, Year 2, ..., Year $k$ respectively (where $x_i$ can be $0, 1, \ldots, n$)

Then

The conditional random variable $p \mid x_1, x_2, \ldots, x_k$ also has a beta distribution with parameters

$$a^* = a + x_1 + x_2 + \ldots + x_k, \qquad b^* = b + kn - (x_1 + x_2 + \ldots + x_k)$$

$$E(X_{k+1} \mid x_1, x_2, \ldots, x_k) = n\,E(p \mid x_1, x_2, \ldots, x_k) = n\,\frac{a^*}{a^* + b^*}$$

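Proof.

$$f(p \mid x_1, x_2, \ldots, x_k) = \frac{f(p)P(x_1, x_2, \ldots, x_k \mid p)}{\int_{0}^{1} f(p)P(x_1, x_2, \ldots, x_k \mid p)\,dp}$$

where $\int_{0}^{1} f(p)P(x_1, x_2, \ldots, x_k \mid p)\,dp$ is a normalizing constant. So $f(p \mid x_1, x_2, \ldots, x_k)$ is proportional to $f(p)P(x_1, x_2, \ldots, x_k \mid p)$.

Next, let's find the beta pdf $f(p)$. If you look at the Exam C table, you'll see that the beta distribution has the following pdf:

$$f(x) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}u^{a}(1 - u)^{b - 1}\frac{1}{x}, \quad 0 < x < \theta, \quad u = \frac{x}{\theta}$$

This pdf is really annoying. It has two variables, $u$ and $x$. To simplify the pdf, set $\theta = 1$. Then $u = x$ and $0 < x < 1$. The pdf becomes:

$$f(x) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}x^{a}(1 - x)^{b - 1}\frac{1}{x} = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}x^{a - 1}(1 - x)^{b - 1}, \quad 0 < x < 1$$

This is the most commonly used beta pdf. This is the one you should use for Exam C.

Back to the problem. Since $p$ has a beta distribution with parameters $a$ and $b$, the pdf is

$$f(p) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}p^{a - 1}(1 - p)^{b - 1}$$

which is proportional to $p^{a - 1}(1 - p)^{b - 1}$.

Next, $P(x_1, x_2, \ldots, x_k \mid p) = P(x_1 \mid p)P(x_2 \mid p)\cdots P(x_k \mid p)$. This is so because $x_1, x_2, \ldots, x_k$ are independent and identically distributed given $p$. For $i = 1$ to $k$, $x_i \mid p$ is binomial with parameters $n$ and $p$. So $P(x_i \mid p) = C_n^{x_i}p^{x_i}(1 - p)^{n - x_i}$.

So $P(x_1, x_2, \ldots, x_k \mid p)$ is proportional to

$$\left[p^{x_1}(1 - p)^{n - x_1}\right]\left[p^{x_2}(1 - p)^{n - x_2}\right]\cdots\left[p^{x_k}(1 - p)^{n - x_k}\right] = p^{x_1 + x_2 + \ldots + x_k}(1 - p)^{kn - (x_1 + x_2 + \ldots + x_k)} = p^{\sum_{i=1}^{k} x_i}(1 - p)^{kn - \sum_{i=1}^{k} x_i}$$

$f(p \mid x_1, x_2, \ldots, x_k)$ is proportional to $f(p)\,p^{\sum_{i=1}^{k} x_i}(1 - p)^{kn - \sum_{i=1}^{k} x_i}$, which is proportional to

$$p^{a - 1}(1 - p)^{b - 1}\,p^{\sum_{i=1}^{k} x_i}(1 - p)^{kn - \sum_{i=1}^{k} x_i} = p^{a + \sum_{i=1}^{k} x_i - 1}(1 - p)^{b + kn - \sum_{i=1}^{k} x_i - 1}$$

So $p \mid x_1, x_2, \ldots, x_k$ has a beta distribution with parameters

$$a^* = a + x_1 + x_2 + \ldots + x_k, \qquad b^* = b + kn - (x_1 + x_2 + \ldots + x_k)$$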

Next, we'll calculate $E(X_{k+1} \mid x_1, x_2, \ldots, x_k)$, the Bayesian estimate for Year $k + 1$, using the 5-step framework.

If we ignore the observation, then the problem becomes finding $E(X_{k+1})$. Using the double expectation theorem, we have:

$$E(X_{k+1}) = E_p\left[E(X_{k+1} \mid p)\right] = E_p[np] = n\,E(p)$$

Next, we consider the observation $x_1, x_2, \ldots, x_k$. We'll modify the above equation by changing the prior mean $E(p)$ to the posterior mean $E(p \mid x_1, x_2, \ldots, x_k)$. We already know that $p \mid x_1, x_2, \ldots, x_k$ has a beta distribution with parameters

$$a^* = a + x_1 + x_2 + \ldots + x_k, \qquad b^* = b + kn - (x_1 + x_2 + \ldots + x_k)$$

Looking up the beta expectation formula from the Exam C table, we have:

$$E(p \mid x_1, x_2, \ldots, x_k) = \frac{a^*}{a^* + b^*}$$

Finally, we have:

$$E(X_{k+1} \mid x_1, x_2, \ldots, x_k) = n\,E(p \mid x_1, x_2, \ldots, x_k) = n\,\frac{a^*}{a^* + b^*}$$

Now let's apply the binomial-beta formula to this problem. We are told that the # of claims in a year is a Bernoulli random variable. So the number of trials is $n = 1$. In addition, the prior distribution of $p$ is uniform over [0, 1], which is a beta distribution with parameters $a = b = 1$.

Assume we have observed a total of $\sum_{i=1}^{k} x_i$ claims in $k$ years. Then the Bayesian premium is

$$E(X_{k+1} \mid x_1, x_2, \ldots, x_k) = n\,\frac{a + \sum_{i=1}^{k} x_i}{a + b + kn} = (1)\frac{1 + \sum_{i=1}^{k} x_i}{1 + 1 + k} = \frac{1 + \sum_{i=1}^{k} x_i}{2 + k}$$

We are told that the Bayesian premium is $\frac{1}{5}$:

$$\frac{1 + \sum_{i=1}^{k} x_i}{2 + k} = \frac{1}{5}$$

We have two unknowns in one equation, so we can't solve it directly. One way to find the right answer is to test each answer choice. If $\sum_{i=1}^{k} x_i = 0$ and $k = 3$, we'll have $\frac{1 + \sum_{i=1}^{k} x_i}{2 + k} = \frac{1 + 0}{2 + 3} = \frac{1}{5}$. So zero claims during 3 years is the correct answer.

Chapter 10 Claim payment per payment

For an insurance:

Losses can be 100, 200 or 300 with respective probabilities 0.2, 0.2, and 0.6.

Calculate $\mathrm{Var}(Y^P)$.

(A) 1500 (B) 1875 (C) 2250 (D) 2625 (E) 3000

Core concepts:

Ground up loss

Ordinary deductible

Claim payment

Claim payment per payment

Explanation

Let $X$ represent the ground up loss amount (the actual loss incurred by the policyholder). Let $d$, where $d \ge 0$, represent the deductible.

$$(X-d)_+ = \max(X-d,\, 0) = \begin{cases} 0 & \text{if } X \le d \\ X-d & \text{if } X > d \end{cases}$$

$$X \wedge d = \min(X,\, d) = \begin{cases} X & \text{if } X \le d \\ d & \text{if } X > d \end{cases}$$

$$\underbrace{X}_{\text{ground up loss}} = \underbrace{(X-d)_+}_{\text{amount paid by the insurance company}} + \underbrace{(X \wedge d)}_{\text{amount paid by the insured out of his own pocket}}$$

Example. Your deductible for your car insurance is $500. If you have an accident and the loss is $600, you pay $500 out of your own pocket and your insurance company pays you $100. In this case,

$$\underbrace{600}_{\text{ground up loss}} = \underbrace{100}_{\text{amount paid by the insurance company}} + \underbrace{500}_{\text{amount paid by the insured out of his own pocket}}$$

However, if the loss is $400, then you pay the entire loss and the insurance company pays zero.

$$\underbrace{400}_{\text{ground up loss}} = \underbrace{0}_{\text{amount paid by the insurance company}} + \underbrace{400}_{\text{amount paid by the insured out of his own pocket}}$$

Let $Y$ represent the claim payment. Then $Y = (X-d)_+$. Claim payment per payment means $(Y \mid Y > 0)$. Evidently, if $X \le d$, then $Y = 0$. In this case, the insured covers the entire loss with his own money and won't need to report the loss to the insurance company, so the insurance company may not even know that a loss has occurred. For the insurance company to pay any claim, $Y$ must be positive. This is why the claim payment per payment is $(Y \mid Y > 0)$.

Full solution

Let $X$ represent the ground up loss. Let $Y$ represent the claim payment. The deductible is $d = 150$.

$$Y^P = Y \mid Y > 0$$

Guo Fall 2009 C, Page 269 / 284

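$$\mathrm{Var}\left[(X-150)_+ \mid X > 150\right] = E\left[(X-150)_+^2 \mid X > 150\right] - E^2\left[(X-150)_+ \mid X > 150\right]$$

X                    100       200       300
(X - 150)_+          0         50        150
P(X)                 0.2       0.2       0.6

$$P(X > 150) = P(X = 200) + P(X = 300) = 0.8$$

P(X) / P(X > 150)    0.2/0.8   0.2/0.8   0.6/0.8

(Strictly, $P(X = 100 \mid X > 150) = 0$. The first column is harmless because it multiplies a payment of 0.)

$$E\left[(X-150)_+ \mid X > 150\right] = 0\left(\frac{0.2}{0.8}\right) + 50\left(\frac{0.2}{0.8}\right) + 150\left(\frac{0.6}{0.8}\right) = 125$$

$$E\left[(X-150)_+^2 \mid X > 150\right] = 0^2\left(\frac{0.2}{0.8}\right) + 50^2\left(\frac{0.2}{0.8}\right) + 150^2\left(\frac{0.6}{0.8}\right) = 17{,}500$$

$$\mathrm{Var}\left[(X-150)_+ \mid X > 150\right] = 17{,}500 - 125^2 = 1{,}875.$$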

Professional 1-V Statistics Worksheet, we can simply discard the data that falls out of the

conditional probability and calculate the mean/variance on the remaining data.

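X                                   100                        200                    300
Is X > 150?                         No, so discard this data.  Yes. Keep this data.   Yes. Keep this data.
(X - 150)_+                         (discarded)                50                     150
P(X)                                (discarded)                0.2                    0.6
10 P(X) (scaled-up probability)     (discarded)                2                      6

Enter the following into Statistics Worksheet:

X01=50, Y01=2; X02=150, Y02=6

You should get:

$$n = 8, \quad \bar{X} = 125, \quad \sigma_X = 43.30127019$$

$$\mathrm{Var} = \sigma_X^2 = 1{,}875$$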
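The "discard, then take the mean and variance of what remains" shortcut can also be checked with a short Python sketch. The helper name `per_payment_mean_var` is made up for this illustration; it mirrors the worksheet steps rather than any particular library.

```python
def per_payment_mean_var(losses, probs, d):
    """Mean and variance of (X - d) given X > d, for a discrete loss X.

    Losses at or below the deductible d are discarded, and the remaining
    probabilities are rescaled by P(X > d), as in the worksheet shortcut.
    """
    kept = [(x - d, p) for x, p in zip(losses, probs) if x > d]
    prob_payment = sum(p for _, p in kept)                  # P(X > d)
    mean = sum(y * p for y, p in kept) / prob_payment
    second_moment = sum(y * y * p for y, p in kept) / prob_payment
    return mean, second_moment - mean ** 2


mean, var = per_payment_mean_var([100, 200, 300], [0.2, 0.2, 0.6], d=150)
print(round(mean, 4), round(var, 4))  # 125.0 1875.0
```

Rescaling the kept probabilities by $P(X > d)$ plays the same role as entering the scaled-up probabilities as frequencies in the worksheet.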

Losses can be 100, 200, 300, and 400 with respective probabilities 0.1, 0.2, 0.3, and 0.4.

Calculate $\mathrm{Var}(Y^P)$.

Solution

Fast solution

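X                                   100            200            300          400
Is X > 250?                         No. Discard.   No. Discard.   Yes. Keep.   Yes. Keep.
(X - 250)_+                         (discarded)    (discarded)    50           150
10 P(X) (scaled-up probability)     (discarded)    (discarded)    3            4

$$n = 7, \quad \bar{X} = 107.14, \quad \sigma_X = 49.48716593$$

$$\mathrm{Var} = \sigma_X^2 = 2{,}448.98$$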

Standard solution

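X                    100       200       300       400
(X - 250)_+          0         0         50        150
P(X)                 0.1       0.2       0.3       0.4

$$P(X > 250) = P(X = 300) + P(X = 400) = 0.3 + 0.4 = 0.7$$

P(X) / P(X > 250)    0.1/0.7   0.2/0.7   0.3/0.7   0.4/0.7

(Strictly, the conditional probabilities of $X = 100$ and $X = 200$ given $X > 250$ are 0. The first two columns are harmless because they multiply a payment of 0.)

$$E\left[(X-250)_+ \mid X > 250\right] = 0\left(\frac{1}{7}\right) + 0\left(\frac{2}{7}\right) + 50\left(\frac{3}{7}\right) + 150\left(\frac{4}{7}\right) = 107.1428571$$

$$E\left[(X-250)_+^2 \mid X > 250\right] = 0^2\left(\frac{1}{7}\right) + 0^2\left(\frac{2}{7}\right) + 50^2\left(\frac{3}{7}\right) + 150^2\left(\frac{4}{7}\right) = 13{,}928.57143$$

$$\mathrm{Var}\left[(X-250)_+ \mid X > 250\right] = 13{,}928.57143 - 107.1428571^2 = 2{,}448.98$$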

Losses can be 1,000, 4,000, 5,000, 9,000, and 12,000 with respective probabilities 0.11,

0.17, 0.24, 0.36, and 0.12.

Calculate $\mathrm{Var}(Y^P)$.

Solution

To speed up calculations, we set one unit of money equal to $1,000.

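Ground up loss X                     1            4            5            9            12
Is X > 0.9?                          Yes. Keep.   Yes. Keep.   Yes. Keep.   Yes. Keep.   Yes. Keep.
(X - 0.9)_+                          0.1          3.1          4.1          8.1          11.1
P(X)                                 0.11         0.17         0.24         0.36         0.12
100 P(X) (scaled-up probability)     11           17           24           36           12

Enter the following into Statistics Worksheet:

X01=0.1, Y01=11; X02=3.1, Y02=17; X03=4.1, Y03=24; X04=8.1, Y04=36; X05=11.1, Y05=12

The variance is then $\mathrm{Var} = \sigma_X^2$, where $\sigma_X$ is the standard deviation the worksheet reports (in units of $1,000).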

Chapter 11 LER (loss elimination ratio)

Exam M Sample #27

Losses follow an exponential distribution with the same mean in all years.

The loss elimination ratio this year is 70%.

The ordinary deductible for the coming year is 4/3 of the current deductible.

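Core concept:

$$\mathrm{LER} = \frac{\text{expected loss amount absorbed by the policyholder}}{\text{expected loss amount}} = \frac{E(X \wedge d)}{E(X)}$$

LER answers the question, "What % of the expected loss amount is absorbed by the policyholder due to the deductible?"

$$E(X) = \int_0^{+\infty} x f(x)\,dx = \int_0^{+\infty} s(x)\,dx$$

$$X \wedge d = \min(X,\, d) = \begin{cases} X & \text{if } X \le d \\ d & \text{if } X > d \end{cases}$$

$$E(X \wedge d) = \int_0^{d} x f(x)\,dx + d\int_d^{+\infty} f(x)\,dx \quad \text{(intuitive formula)}$$

Alternatively,

$$E(X \wedge d) = \int_0^{d} s(x)\,dx = \int_0^{d} \left[1 - F_X(x)\right] dx$$

You can find the proof of the 2nd formula in Loss Models.

To help memorize the above formulas, notice that if we set $d = +\infty$, then

$$E(X) = E(X \wedge \infty) = \int_0^{+\infty} s(x)\,dx$$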

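For an exponential distribution with mean $\theta$:

$$f(x) = \frac{1}{\theta}e^{-x/\theta}, \qquad s(x) = 1 - F(x) = 1 - \left(1 - e^{-x/\theta}\right) = e^{-x/\theta}, \qquad E(X) = \theta$$

$$E(X \wedge d) = \int_0^{d} s(x)\,dx = \int_0^{d} e^{-x/\theta}\,dx = \theta\left(1 - e^{-d/\theta}\right)$$

$$\mathrm{LER} = \frac{E(X \wedge d)}{E(X)} = 1 - e^{-d/\theta} \quad \text{(you might want to memorize this result)}$$

$$1 - e^{-d/\theta} = 0.7, \qquad e^{-d/\theta} = 0.3$$

Under the new deductible (which is $\frac{4}{3}$ of the original deductible),

$$\mathrm{LER}' = 1 - e^{-\frac{4}{3}d/\theta} = 1 - \left(e^{-d/\theta}\right)^{4/3} = 1 - 0.3^{4/3} = 0.799$$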
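The exponential LER result can be checked numerically with the sketch below. The function name `exponential_ler` and the choice $\theta = 1$ are assumptions made for illustration; the answer does not depend on $\theta$, because only the ratio $d/\theta$ enters the formula.

```python
import math


def exponential_ler(d, theta):
    """Loss elimination ratio 1 - exp(-d / theta) for an exponential loss with mean theta."""
    return 1.0 - math.exp(-d / theta)


theta = 1.0                      # arbitrary mean; any positive value gives the same answer
d = -theta * math.log(0.3)       # deductible that makes this year's LER equal to 70%
new_ler = exponential_ler(4 / 3 * d, theta)
print(round(exponential_ler(d, theta), 3), round(new_ler, 3))  # 0.7 0.799
```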

http://www.guo.coursehost.com

The above formula works whether $Y$ is a simple random variable or a compound random variable $Y = \sum_{i=1}^{n} X_i$. If $Y = \sum_{i=1}^{n} X_i$, make sure you write

$$E(Y-m)_+ = E(Y) - m + m f_Y(0) + (m-1) f_Y(1) + (m-2) f_Y(2) + \cdots + 1 \cdot f_Y(m-1)$$

Don't write

$$E(Y-m)_+ = E(Y) - m + m f_X(0) + (m-1) f_X(1) + (m-2) f_X(2) + \cdots + 1 \cdot f_X(m-1)$$

In other words, the pdf on the right-hand side must match the random variable on the left-hand side. If the random variable on the left-hand side is $Y = \sum_{i=1}^{n} X_i$, you need to use $f_Y$. If the random variable on the left-hand side is $X$, then you need to write

$$E(X-m)_+ = E(X) - m + m f_X(0) + (m-1) f_X(1) + (m-2) f_X(2) + \cdots + 1 \cdot f_X(m-1)$$

To use the above formula in the heat of the exam, we rewrite it as:

$$E(Y-m)_+ = E(Y) - m + \begin{pmatrix} f_Y(0) \\ f_Y(1) \\ f_Y(2) \\ \vdots \\ f_Y(m-1) \end{pmatrix} \begin{pmatrix} m \\ m-1 \\ m-2 \\ \vdots \\ 1 \end{pmatrix}$$

In the above formula,

$$\begin{pmatrix} f_Y(0) \\ f_Y(1) \\ f_Y(2) \\ \vdots \\ f_Y(m-1) \end{pmatrix} \begin{pmatrix} m \\ m-1 \\ m-2 \\ \vdots \\ 1 \end{pmatrix} = m f_Y(0) + (m-1) f_Y(1) + (m-2) f_Y(2) + \cdots + 1 \cdot f_Y(m-1)$$

This is not standard notation. However, we use it anyway to help us memorize the formula. In the exam, you just write these 2 matrices. Then you take each element in the 1st matrix and multiply it by the corresponding element in the 2nd matrix. Next, sum everything up.

Please note that if you take an element $f_Y(k)$ (where $0 \le k \le m-1$) from the 1st matrix, then you need to multiply it by $m-k$ from the 2nd matrix, so that $(m-k) + k = m$ holds.

$$E(S-d)_+ = E(S) - \sum_{s=0}^{d-1}\left[1 - F_S(s)\right]$$

$$E(S-d)_+ = E(S) - \sum_{s=0}^{d-1}\left[1 - F_S(x)\right]$$

The above formula is confusing. $F_S(x)$ is not good notation because $S$ and $x$ don't match. The right notation is $F_S(s)$.

Let's start from the formula $E(S-d)_+ = E(S) - \sum_{s=0}^{d-1}\left[1 - F_S(s)\right]$. To make our proof simple, let's set $d = 3$. The proof is the same if $d$ is bigger.

$$E(S-3)_+ = E(S) - \sum_{s=0}^{2}\left[1 - F_S(s)\right]$$

$$\sum_{s=0}^{2}\left[1 - F_S(s)\right] = \left[1 - F_S(0)\right] + \left[1 - F_S(1)\right] + \left[1 - F_S(2)\right] = 3 - \left[F_S(0) + F_S(1) + F_S(2)\right]$$

$$F_S(0) = P(S \le 0) = P(S = 0) = f_S(0)$$

$$F_S(1) = P(S \le 1) = P(S = 0) + P(S = 1) = f_S(0) + f_S(1)$$

$$F_S(2) = P(S \le 2) = P(S = 0) + P(S = 1) + P(S = 2) = f_S(0) + f_S(1) + f_S(2)$$

$$F_S(0) + F_S(1) + F_S(2) = 3 f_S(0) + 2 f_S(1) + f_S(2)$$

$$E(S-3)_+ = E(S) - 3 + 3 f_S(0) + 2 f_S(1) + f_S(2)$$

In general,

$$E(Y-m)_+ = E(Y) - m + \begin{pmatrix} f_Y(0) \\ f_Y(1) \\ f_Y(2) \\ \vdots \\ f_Y(m-1) \end{pmatrix} \begin{pmatrix} m \\ m-1 \\ m-2 \\ \vdots \\ 1 \end{pmatrix}$$
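As a sanity check, the shortcut formula above can be compared with the direct definition of $E(Y-m)_+$ in Python. The function names and the small distribution below are made up for this sketch, and the shortcut assumes $Y$ takes non-negative integer values.

```python
def stop_loss_shortcut(pmf, m):
    """E(Y - m)_+ = E(Y) - m + sum over k = 0..m-1 of (m - k) * f_Y(k)."""
    mean = sum(y * p for y, p in pmf.items())
    correction = sum((m - k) * pmf.get(k, 0.0) for k in range(m))
    return mean - m + correction


def stop_loss_direct(pmf, m):
    """E(Y - m)_+ computed straight from the definition max(Y - m, 0)."""
    return sum(max(y - m, 0) * p for y, p in pmf.items())


# Hypothetical distribution of Y: value -> probability.
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 5: 0.4}
print(stop_loss_shortcut(pmf, 3), stop_loss_direct(pmf, 3))
```

Both functions should agree up to floating-point rounding. The shortcut only needs the probabilities of values below $m$, which is why it is convenient when most of those probabilities are zero.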

A company provides insurance to a concert hall for losses due to power failure. You are

given:

The number of power failures in a year has a Poisson distribution with mean 1.

x       Probability of x
10      0.3
20      0.3
50      0.4

The number of power failures and the amounts of losses are independent.

Calculate the expected amount of claims paid by the insurer in one year.

Solution

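Let $N$ be the number of power failures in a year and $X_i$ the loss from the $i$-th power failure. Then $S = \sum_{i=1}^{N} X_i$.

The total claim dollar amount after the deductible of $30 is:

$$(S-30)_+ = \left(\sum_{i=1}^{N} X_i - 30\right)_+$$

$$E(S-30)_+ = E(S) - 30 + \begin{pmatrix} f_S(0) \\ f_S(1) \\ f_S(2) \\ \vdots \\ f_S(29) \end{pmatrix} \begin{pmatrix} 30 \\ 29 \\ 28 \\ \vdots \\ 1 \end{pmatrix}$$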

It seems like we have an awful lot of work to do with the two matrices. Before you start to panic, please note that many of the values $f_S(0), f_S(1), \ldots, f_S(29)$ will be zero. This is because $X$ has only 3 distinct values: 10, 20, and 50, with probabilities 0.3, 0.3, and 0.4 respectively. Evidently, we can throw away $X = 50$. If $X = 50$, then $S$ is at least 50 and is out of the range $S \le 29$.

Please also note that $S = \sum_{i=1}^{N} X_i$, where $N$ is a Poisson random variable with mean $\lambda = 1$.

$$P(N = n) = e^{-1}\,\frac{1}{n!}$$

N    P(N)            X values                  P(X_1, ..., X_N)    S = sum of X_i    P(S)
0    e^{-1}          (none)                    (none)              0                 e^{-1}
1    e^{-1}          X = 10                    0.3                 10                0.3 e^{-1}
1    e^{-1}          X = 20                    0.3                 20                0.3 e^{-1}
2    (1/2) e^{-1}    (X_1, X_2) = (10, 10)     0.3^2               20                (1/2) e^{-1} (0.3^2)

Keeping only the last two columns:

S = sum of X_i    P(S)
0                 e^{-1}
10                0.3 e^{-1}
20                0.3 e^{-1}
20                (1/2) e^{-1} (0.3^2)

After consolidation:

S = sum of X_i    P(S)
0                 e^{-1}
10                0.3 e^{-1}
20                0.3 e^{-1} + (1/2) e^{-1} (0.3^2) = 0.345 e^{-1}

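$$E(S-30)_+ = E(S) - 30 + \begin{pmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{pmatrix} \begin{pmatrix} 30 \\ 20 \\ 10 \end{pmatrix}$$

In the actual exam, to help remember the two matrices, you can write only the 1st matrix:

$$\begin{pmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix}$$

As said earlier, the sum of the two elements in each row needs to be $m$ (or 30 in this problem). As a result,

$$0 + a = 30 \;\Rightarrow\; a = 30$$

$$10 + b = 30 \;\Rightarrow\; b = 20$$

$$20 + c = 30 \;\Rightarrow\; c = 10$$

$$\begin{pmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{pmatrix} \begin{pmatrix} 30 \\ 20 \\ 10 \end{pmatrix}$$

$$\begin{pmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{pmatrix} \begin{pmatrix} 30 \\ 20 \\ 10 \end{pmatrix} = \begin{pmatrix} e^{-1} \\ 0.3e^{-1} \\ 0.345e^{-1} \end{pmatrix} \begin{pmatrix} 30 \\ 20 \\ 10 \end{pmatrix} = e^{-1} \begin{pmatrix} 1 \\ 0.3 \\ 0.345 \end{pmatrix} \begin{pmatrix} 30 \\ 20 \\ 10 \end{pmatrix} = 39.45e^{-1}$$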

$$S = \sum_{i=1}^{N} X_i \;\Rightarrow\; E(S) = E(N)\,E(X)$$

$$E(S) = E(N)\,E(X) = 1 \times \left[10(0.3) + 20(0.3) + 50(0.4)\right] = 29$$

$$E(S-30)_+ = E(S) - 30 + 39.45e^{-1} = 13.5128$$
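The table-building steps above can be automated with the following Python sketch. It is an illustration under stated assumptions, not the author's method: the function names are made up, the number of claims is assumed Poisson, and loss amounts are assumed to be positive integers. It builds $f_S(s)$ for $s < d$ by convolving the loss distribution one claim at a time, then applies the shortcut formula.

```python
import math


def compound_poisson_pmf_below(lam, severity, d):
    """Return {s: P(S = s)} for s < d, where S is the sum of N losses,
    N ~ Poisson(lam), and `severity` maps each loss amount to its probability."""
    pmf = {}
    conv = {0: 1.0}   # distribution of X_1 + ... + X_n restricted to totals below d (n = 0)
    n = 0
    while conv:
        p_n = math.exp(-lam) * lam ** n / math.factorial(n)   # P(N = n)
        for s, p in conv.items():
            pmf[s] = pmf.get(s, 0.0) + p_n * p
        nxt = {}
        for s, p in conv.items():                              # add one more loss
            for x, px in severity.items():
                if s + x < d:
                    nxt[s + x] = nxt.get(s + x, 0.0) + p * px
        conv = nxt
        n += 1
    return pmf


def compound_poisson_stop_loss(lam, severity, d):
    """E(S - d)_+ = E(S) - d + sum over s < d of (d - s) * f_S(s)."""
    mean_s = lam * sum(x * px for x, px in severity.items())   # E(S) = E(N) E(X)
    pmf = compound_poisson_pmf_below(lam, severity, d)
    return mean_s - d + sum((d - s) * p for s, p in pmf.items())


result = compound_poisson_stop_loss(1, {10: 0.3, 20: 0.3, 50: 0.4}, 30)
print(round(result, 4))  # 13.5128
```

The loop stops once every combination of losses reaches the deductible, which mirrors the observation that only a handful of $f_S$ values are non-zero.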

x       f_X(x)
1       0.6
2       0.4

Solution

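$S = \sum_{i=1}^{N} X_i$, where $S$ is the aggregate loss and $X$ is the individual loss dollar amount.

$$E(S-3)_+ = E(S) - 3 + \begin{pmatrix} f_S(0) \\ f_S(1) \\ f_S(2) \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}$$

Next, we need to find $f_S(0)$, $f_S(1)$, and $f_S(2)$.

N    P(N)                           X values                 P(X_1, ..., X_N)    S = sum of X_i    P(S)
0    e^{-2}                         (none)                   (none)              0                 e^{-2}
1    2 e^{-2}                       X = 1                    0.6                 1                 (0.6) 2 e^{-2}
1    2 e^{-2}                       X = 2                    0.4                 2                 (0.4) 2 e^{-2}
2    (2^2 / 2!) e^{-2} = 2 e^{-2}   (X_1, X_2) = (1, 1)      0.6^2               2                 (0.6^2) 2 e^{-2}

After consolidation:

S = sum of X_i    P(S)
0                 e^{-2}
1                 (0.6) 2 e^{-2} = 1.2 e^{-2}
2                 (0.4) 2 e^{-2} + (0.6^2) 2 e^{-2} = 1.52 e^{-2}

$$E(S-3)_+ = E(S) - 3 + \begin{pmatrix} f_S(0) \\ f_S(1) \\ f_S(2) \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = 2.8 - 3 + \begin{pmatrix} e^{-2} \\ 1.2e^{-2} \\ 1.52e^{-2} \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}$$

$$= 2.8 - 3 + e^{-2} \begin{pmatrix} 1 \\ 1.2 \\ 1.52 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = 2.8 - 3 + 6.92e^{-2} = 0.73652$$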

Prescription drug losses, S, are modeled assuming the number of claims has a geometric

distribution with mean 4, and the amount of each prescription is 40.

Calculate $E(S-100)_+$.

About the author

Yufeng Guo was born in central China. After receiving his Bachelor's degree in physics at Zhengzhou University, he attended Beijing Law School and received his Master's of law. He was an attorney and law school lecturer in China before immigrating to the United States. He received his Master's of accounting at Indiana University. He has pursued a life actuarial career and passed exams 1, 2, 3, 4, 5, 6, and 7 in rapid succession after discovering a successful study strategy.

Fall 2002 Passed Course 1

Spring 2003 Passed Courses 2, 3

Fall 2003 Passed Course 4

Spring 2004 Passed Course 6

Fall 2004 Passed Course 5

Spring 2005 Passed Course 7

Mr. Guo currently teaches an online prep course for Exam P, FM, MFE, and MLC. For

more information, visit http://actuary88.com/.

If you have any comments or suggestions, you can contact Mr. Guo at

yufeng_guo@msn.com.
