6th Edition
by Yufeng Guo
Fall 2009
This electronic book is intended for individual buyer use for the sole purpose of preparing for
Exam C. This book can NOT be resold to others or shared with others. No part of this publication
may be reproduced for resale or multiple copy distribution without the express written permission
of the author.
Introduction 4
Chapter 1 Doing calculations 100% correct 100% of the time.. 5
6 strategies for improving calculation accuracy ............................................................. 5
6 powerful calculator shortcuts....................................................................................... 6
#1 Solve $ax^2 + bx + c = 0$ .................................................................................... 6
#2 Keep track of your calculation...................................................................... 10
#3 Calculate mean and variance of a discrete random variable......................... 21
#4 Calculate the sample variance....................................................................... 29
#5 Find the conditional mean and conditional variance .................................... 30
#6 Do the least squares regression ..................................................................... 36
#7 Do linear interpolation .................................................................................. 46
Chapter 2 Maximum likelihood estimator ......................................... 52
Basic idea ...................................................................................................................... 52
General procedure to calculate the maximum likelihood estimator ............................. 53
Fisher Information ........................................................................................................ 58
The Cramer-Rao theorem ............................................................................................. 62
Delta method................................................................................................................. 66
Chapter 3 Kernel smoothing................................................................ 75
Essence of kernel smoothing ........................................................................................ 75
Uniform kernel.............................................................................................................. 77
Triangular kernel........................................................................................................... 82
Gamma kernel............................................................................................................... 90
Chapter 4 Bootstrap.............................................................................. 95
Essence of bootstrapping .............................................................................................. 95
Recommended supplemental reading ........................................................................... 96
Chapter 5 Bühlmann credibility model ............................................ 102
Trouble with black-box formulas................................................................................ 102
Rating challenges facing insurers ............................................................................... 102
3 preliminary concepts for deriving the Bühlmann premium formula ....................... 106
Preliminary concept #1 Double expectation ....................................................... 106
Preliminary concept #2 Total variance formula.................................................. 108
Preliminary concept #3 Linear least squares regression ..................................... 111
Derivation of Bühlmann's Credibility Formula.......................................................... 112
Summary of how to derive the Bühlmann credibility premium formulas .................. 117
Special case................................................................................................................. 122
How to tackle Bühlmann credibility problems ........................................................... 123
An example illustrating how to calculate the Bühlmann credibility premium ........... 123
Shortcut ....................................................................................................................... 126
Practice problems........................................................................................................ 126
Chapter 6 Bühlmann-Straub credibility model ............................... 148
Context of the Bühlmann-Straub credibility model.................................................... 148
Assumptions of the Bühlmann-Straub credibility model............................................ 149
Summary of the Bühlmann-Straub credibility model................................................. 154
Guo Fall 2009 C, Page 2 / 284
General Bühlmann-Straub credibility model (more realistic) .................................... 155
How to tackle the Bühlmann-Straub premium problem ............................................. 158
Chapter 7 Empirical Bayes estimate for the Bühlmann model...... 168
Empirical Bayes estimate for the Bühlmann model ................................................... 168
Summary of the estimation process for the empirical Bayes estimate for the
Bühlmann model..................................................................................................... 170
Empirical Bayes estimate for the Bühlmann-Straub model........................................ 173
Semi-parametric Bayes estimate................................................................................. 182
Chapter 8 Limited fluctuation credibility ........................................ 187
General credibility model for the aggregate loss of r insureds ................................. 188
Key interim formula: credibility for the aggregate loss............................................. 190
Final formula you need to memorize .......................................................................... 191
Special case................................................................................................................. 192
Chapter 9 Bayesian estimate ......................................................... 202
Intuitive review of Bayes Theorem ........................................................................... 202
How to calculate the discrete posterior probability .................................................... 206
Framework for calculating the discrete posterior probability..................................... 208
How to calculate the continuous posterior probability ............................................... 213
Framework for calculating discrete-prior Bayesian premiums................................... 219
Calculate Bayesian premiums when the prior probability is continuous.................... 251
Poisson-gamma model ................................................................................................ 260
Binomial-beta model................................................................................................... 264
Chapter 10 Claim payment per payment ........................................... 268
Chapter 11 LER (loss elimination ratio)............................................. 274
Chapter 12 Find E(Y-M)+.................................................................... 276
About the author .................................................................................... 284
Chapter 1 teaches you how to do manual calculation quickly and accurately. If you
studied hard but failed Exam C repeatedly, chances are that you are concept strong,
calculation weak. The calculator techniques will improve your calculation accuracy.
Chapter 3 explains the essence of kernel smoothing and teaches you how to derive
complex kernel smoothing formulas for $k_y(x)$ and $K_y(x)$. You shouldn't have any
trouble memorizing complex kernel smoothing formulas after this chapter.
Many candidates don't know the essence of bootstrap. Chapter 4 is about bootstrap.
Chapter 5 explains the core theory behind the Bühlmann credibility model.
Chapter 6 compares and contrasts the Bühlmann-Straub credibility model with the
Bühlmann credibility model.
Many candidates are afraid of empirical Bayes estimate problems. The formulas are just
too hard to remember. Chapter 7 will relieve your pain.
Many candidates find that there are just too many limited fluctuation credibility formulas
to memorize. To address this, Chapter 8 gives you a unified formula.
Chapter 9 presents a framework for quickly calculating the posterior probability (discrete
or continuous) and the posterior mean (discrete or continuous). Many candidates can
recite Bayes' theorem but can't solve related problems under exam conditions. Their
calculation is long, tedious, and prone to errors. This chapter will drastically improve
your calculation efficiency.
2. Learn how to solve a problem faster. Many exam candidates solve hundreds of
practice problems yet fail Exam C miserably. One major cause is that their
solutions are inefficient. Typically, these candidates copy solutions presented in
textbooks and study manuals. Authors of textbooks and many study manuals
generally use software to do the calculations. To solve a messy calculation, they
just type up the formula and click the "Compute" button. However, when you take
the exam, you have to calculate the answer manually. A solution that looks clean
and easy in a textbook may be a nightmare in the exam. When you prepare for
Exam C, don't copy textbook solutions. Improve them. Learn how to do manual
calculation faster.
3. Build solution frameworks and avoid reinventing the wheel. If you analyze Exam
C problems tested in the past, you'll see that SOA pretty much tests the same
things over and over. For example, the Poisson-gamma model is tested over and
over. When preparing for Exam C, come up with a ready-to-use solution
framework for each of the commonly tested problems in Exam C. This way, when
you walk into the exam room and see a commonly tested problem, you don't need
to solve the problem from scratch. You can use your pre-built solution framework
and solve it quickly and accurately.
4. Keep an error log. Whenever you solve some practice problems, record your
errors in a notebook. Analyze why you made errors. Try to solve a problem
differently to avoid the error. Review your error log from time to time. Using an
error log helps you avoid making the same calculation errors over and over.
5. Avoid doing mental math in the exam even for the simplest calculations. Even if
you are solving a simple problem like 2+3, use your calculator to solve the
#1 Solve $ax^2 + bx + c = 0$.

The formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

is OK when a, b, and c are nice and small numbers. However, when a, b, and c have many
decimals or are large numbers, this standard approach is labor intensive and prone to
errors, and it often falls apart in the heat of the exam.

To solve this equation 100% right under pressure and in a hurry, we'll do a little trick.
First, we set $x = v = \frac{1}{1+r}$. So we treat x as a dummy discount factor. The original
equation becomes:

$$c + bv + av^2 = 0$$
Finding r is a concept you learned in Exam FM. We first convert the equation to the
following cash flow diagram:
Time t       0           1          2
Cash flow    0.752398    -89.508    0.3247

To find r (the IRR), we simply use the Cash Flow Worksheet in the BA II Plus or BA II Plus
Professional.

Cash flow    CF0 = 0.752398    C01 = -89.508    C02 = 0.3247
Frequency                      F01 = 1          F02 = 1

Because the cash flow frequency is one for both C01 and C02, we don't need to enter
F01 = 1 and F02 = 1. If we don't enter a cash flow frequency, the BA II Plus and BA II Plus
Professional use one as the default cash flow frequency.
Using the IRR function, we find that IRR = -99.63722807. Remember this is a
percentage. So $r = -99.63722807\%$ and

$$x_1 = \frac{1}{1+r} = \frac{1}{1 - 99.63722807\%} = 275.6552834$$

How are we going to find the second root? We'll use the following fact:
if $x_1$ and $x_2$ are the two roots of $ax^2 + bx + c = 0$, then

$$x_1 x_2 = \frac{c}{a} \quad\Rightarrow\quad x_2 = \frac{1}{x_1}\cdot\frac{c}{a}$$

$$x_2 = \frac{1}{x_1}\cdot\frac{c}{a} = \frac{1}{275.6552834}\times\frac{0.752398}{0.3247} = 0.00840619$$
Calculate IRR
IRR IRR=0.00000000
Plugging in x2 = 0.00840619
Does this look like a lot of work? Yes, the first time. Once you get familiar with this
process, it takes you 15 seconds to finish calculating $x_1$ and $x_2$ and double-checking
that they are right.

Step 1 Rearrange $ax^2 + bx + c = 0$ to $c + bx + ax^2 = 0$.
Step 2 Use the BA II Plus/BA II Plus Professional Cash Flow Worksheet to find the IRR.

Time t       0    1    2
Cash flow    c    b    a

Then

$$x_1 = \frac{1}{1 + \frac{IRR}{100}}, \qquad x_2 = \frac{1}{x_1}\cdot\frac{c}{a}$$
For example, if you see $x^2 - 2x - 3 = 0$, you can guess that $x_1 = -1$ and $x_2 = 3$. However,
if you see $x^2 - 2x - 7.3 = 0$, use the Cash Flow Worksheet to solve it.

Exercise
#2 Solve $x^2 - 2x - 7.3 = 0$.
Answer: $x_1 = 3.88097206$ and $x_2 = -1.88097206$
#3 Solve $0.9080609x^2 - 0.00843021x - 0.99554743 = 0$.
Answer: $x_1 = 1.0517168$ and $x_2 = -1.04243305$
#4 Solve $x^2 - 2x + 3 = 0$.
Answer: you'll get an error message if you try to calculate the IRR.
$x^2 - 2x + 3 = (x - 1)^2 + 2 \ge 2$, so there's no real solution.
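To check answers like the ones in the exercises above away from the calculator, here is a minimal Python sketch. It is not the IRR keystroke method itself: it uses the standard quadratic formula for the first root and the product-of-roots relation $x_1 x_2 = c/a$ for the second. The function name `solve_quadratic` is an illustrative choice, not something from the text.

```python
import math

def solve_quadratic(a, b, c):
    """Return the two real roots of a*x^2 + b*x + c = 0, or None if none exist."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return None  # no real solution, as in exercise #4
    x1 = (-b + math.sqrt(disc)) / (2 * a)
    if x1 != 0:
        # Second root from the product of roots: x1 * x2 = c / a
        x2 = c / (a * x1)
    else:
        x2 = (-b - math.sqrt(disc)) / (2 * a)
    return x1, x2

roots = solve_quadratic(1, -2, -7.3)      # exercise #2
no_roots = solve_quadratic(1, -2, 3)      # exercise #4
print(roots, no_roots)
```

The same function applied to exercise #3 gives a quick cross-check of the signs of the two roots.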
Example 1
Solution
If the company spends at least $50,000 on exam-related raises, then the number of
students who will pass Exam C must be at least 50,000/2,500=20. So we need to find the
probability of having at least 20 students pass Exam C.
Let X = the number of students who will pass Exam C. The problem does not specify the
distribution of X. So possibly X has a binomial distribution. Checking the conditions
for a binomial distribution, we find that X satisfies the requirements of a binomial
random variable with parameters n = 23 and p = 0.73. We need to find the probability
that $X \ge 20$.

$$\Pr(X \ge 20) = \Pr(X = 20) + \Pr(X = 21) + \Pr(X = 22) + \Pr(X = 23)$$

$$= C_{23}^{20}(.73)^{20}(.27)^{3} + C_{23}^{21}(.73)^{21}(.27)^{2} + C_{23}^{22}(.73)^{22}(.27) + C_{23}^{23}(.73)^{23} = .09608$$

Therefore, there is a 9.6% chance that the company will have to spend at least $50,000
to pay for exam-related raises.
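The four-term sum above can be cross-checked with a short Python sketch. The dictionary `terms` keeps each term separately, in the same spirit as keeping an audit trail of intermediate values; the variable names are illustrative.

```python
from math import comb

n, p = 23, 0.73

# One entry per term C(23, k) * 0.73^k * 0.27^(23 - k), for k = 20, ..., 23
terms = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(20, n + 1)}
tail = sum(terms.values())  # Pr(X >= 20)

for k, t in terms.items():
    print(k, round(t, 8))
print("Pr(X >= 20) =", round(tail, 5))
```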
20 3.27096399
Calculate (.73)
x
.73 y 20
21 0.34111482
Calculate (.73)
x
.73 y 21
23 0.09608031
Calculate (.73) and get the final
x
result .73 y 23 =
( 0.73 )
20 3.27096399
Calculate x
.73 y 20
(.27 )
3 0.064328238
Calculate x
.27 y 3=
21 0.34111482
Calculate (.73)
x
.73 y 21
23 0.00071850
Calculate (.73) and get the final
x
result .73 y 23 =
Method #1 is quicker but more risky. Because you don't have an audit history, if you
miscalculate one item, you'll need to recalculate everything again from scratch.
Method #2 is slower but leaves a good auditing trail by storing all your intermediate
values in your calculator's memories. If you miscalculate one item, you need to
recalculate that item alone and can reuse the results of the other calculations (which are correct).

For example, instead of calculating $C_{23}^{20}(.73)^{20}(.27)^{3}$ as you should, you calculated
$C_{23}^{20}(.73)^{3}(.27)^{20}$. To correct this error under Method #1, you have to start from scratch
and calculate each of the four terms in the sum again.

In contrast, correcting this error under Method #2 is a lot easier. You just need to
recalculate $C_{23}^{20}(.73)^{20}(.27)^{3}$; you don't need to recalculate any of the other three
terms. You can easily retrieve those three items from your calculator's memories and
calculate the final result:

$$C_{23}^{20}(.73)^{20}(.27)^{3} + C_{23}^{21}(.73)^{21}(.27)^{2} + C_{23}^{22}(.73)^{22}(.27) + C_{23}^{23}(.73)^{23} = .09608$$
Solution
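This calculation is complex. Unless you use a systematic method, you'll make mistakes.

$$\frac{l_{50}}{l_{30}} A_{50} v^{20}\,\frac{a_{20} - \frac{l_{30}}{l_{20}} v^{10} a_{30}}{a_{20} - \frac{l_{50}}{l_{20}} v^{30} a_{50}}
= A_{50} v^{20}\,\frac{\frac{1}{l_{30}}\left(a_{20} - \frac{l_{30}}{l_{20}} v^{10} a_{30}\right)}{\frac{1}{l_{50}}\left(a_{20} - \frac{l_{50}}{l_{20}} v^{30} a_{50}\right)}
= A_{50} v^{20}\,\frac{\frac{1}{l_{30}} a_{20} - \frac{1}{l_{20}} v^{10} a_{30}}{\frac{1}{l_{50}} a_{20} - \frac{1}{l_{20}} v^{30} a_{50}}$$

With $v = 1.06^{-1}$:

$$V = A_{50}\,1.06^{-20}\,\frac{\frac{a_{20}}{l_{30}} - \frac{a_{30}}{l_{20}}\,1.06^{-10}}{\frac{a_{20}}{l_{50}} - \frac{a_{50}}{l_{20}}\,1.06^{-30}}$$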
Make sure you don't make mistakes in simplification. If you are afraid of making
mistakes, don't simplify and just do your calculations using the original equation:

$$V = \frac{l_{50}}{l_{30}} A_{50} v^{20}\,\frac{a_{20} - \frac{l_{30}}{l_{20}} v^{10} a_{30}}{a_{20} - \frac{l_{50}}{l_{20}} v^{30} a_{50}}$$
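Storing $l_{20}$, $l_{30}$, $l_{50}$, $A_{50}$, $a_{20}$, $a_{30}$, and $a_{50}$ in memories M0 through M6 respectively, we have:

$$V = A_{50}\,1.06^{-20}\,\frac{\frac{a_{20}}{l_{30}} - \frac{a_{30}}{l_{20}}\,1.06^{-10}}{\frac{a_{20}}{l_{50}} - \frac{a_{50}}{l_{20}}\,1.06^{-30}}
= (M3)\,1.06^{-20}\,\frac{\frac{M4}{M1} - \frac{M5}{M0}\,1.06^{-10}}{\frac{M4}{M2} - \frac{M6}{M0}\,1.06^{-30}}$$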
Keystrokes: press 2nd MEM. Then keep pressing the down-arrow key to view all the
data you entered in the memories. Make sure all the correct numbers are entered.
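$$V = (M3)\,1.06^{-20}\,\frac{\frac{M4}{M1} - \frac{M5}{M0}\,1.06^{-10}}{\frac{M4}{M2} - \frac{M6}{M0}\,1.06^{-30}}$$

$$\frac{M4}{M1} - \frac{M5}{M0}\,1.06^{-10} = M7 \quad \text{(store the result in M7)}$$

$$\frac{M4}{M2} - \frac{M6}{M0}\,1.06^{-30} = M8 \quad \text{(store the result in M8)}$$

$$V = (M3)\,1.06^{-20}\,\frac{M7}{M8}$$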
1.06 y x 10 +/- =
Store the result in M7.
Go back to the normal STO 7 CE/C
calculation mode.
0.00000160
Calculate Recall 4 Recall 1 - Recall 5
M4 M6 30 Recall 0
1.06
M2 M0
1.06 y x 10 +/- =
Store the result in M8.
Go back to the normal STO 8 CE/C
calculation mode.
0.0399556010
x
Calculate Recall 3 1.06 y 20 +/-
M7
V = ( M 3)1.06 20 Recall 7 Recall 8
M8
So $V = 0.0399556 \approx 0.04$
Though this calculation process looks long, once you get used to it, you can do it in less
than one minute.
This process gives us a good auditing trail, enabling us to check the data entry and
calculations.
We can isolate errors. For example, if a wrong value of $l_{30}$ is entered into the
memory, we can reenter $l_{30}$, recalculate $\frac{a_{20}}{l_{30}} - \frac{a_{30}}{l_{20}}\,1.06^{-10}$, and store the calculated
value into M7. Next, we recalculate $V = (M3)\,1.06^{-20}\,\frac{M7}{M8}$.

Bottom line: I recommend that you master this calculation method. It costs you extra
work, but it enables you to do messy calculations 100% right in the exam.
When exams get tough and calculations get messy, many candidates who know as much
as you do will make calculation errors here and there and fail the exam. In contrast,
you'll stand above the crowd and make no errors, passing another exam.
In Example 2, you calculated that $V = 0.04$. However, none of the answer choices given
is 0.04. Suspecting that you made an error in calculations, you decided to redo the
calculation. First, you scrolled through the memories and gladly found no error in data
entry. Next, you recalculated

$$\frac{M4}{M1} - \frac{M5}{M0}\,1.06^{-10} = M7 \quad\text{and}\quad \frac{M4}{M2} - \frac{M6}{M0}\,1.06^{-30} = M8.$$

Once again, you found your previous calculations were right. Finally, you recalculated
$V = (M3)\,1.06^{-20}\,\frac{M7}{M8}$. Once again, you got $V = 0.04$.

You had already spent four minutes on this problem. You decided to spend two more minutes
on it. If you couldn't figure out the right answer, you would just have to give it up and
move on to the next problem.
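So you quickly read the problem again. Oops! You found that your formula was wrong.
Your original formula used $A_{50}$ as the leading factor. The correct formula uses $a_{50}$ instead:

$$V = \frac{l_{50}}{l_{30}}\, a_{50}\, v^{20}\,\frac{a_{20} - \frac{l_{30}}{l_{20}} v^{10} a_{30}}{a_{20} - \frac{l_{50}}{l_{20}} v^{30} a_{50}}$$

How could you find the answer quickly, using the correct formula?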
Solution
The situation described here sometimes happens in the actual exam. If you don't use a
systematic method to do calculations, you won't leave a good auditing trail. In that case,
all your previous calculations are gone and you have to redo the calculations from scratch.
This is awful.
Fortunately, you left a good auditing trail and correcting errors was easy. Your original
calculation was:

$$V = \frac{l_{50}}{l_{30}}\, A_{50}\, v^{20}\,\frac{a_{20} - \frac{l_{30}}{l_{20}} v^{10} a_{30}}{a_{20} - \frac{l_{50}}{l_{20}} v^{30} a_{50}} = (M3)\,1.06^{-20}\,\frac{M7}{M8}$$

The correct calculation is:

$$V = \frac{l_{50}}{l_{30}}\, a_{50}\, v^{20}\,\frac{a_{20} - \frac{l_{30}}{l_{20}} v^{10} a_{30}}{a_{20} - \frac{l_{50}}{l_{20}} v^{30} a_{50}} = (M6)\,1.06^{-20}\,\frac{M7}{M8}$$

Remember $a_{50} = M6$.

$$V = (M6)\,1.06^{-20}\,\frac{M7}{M8} = 2.10713362 \approx 2.11$$
Now you look at the answer choices again. Good. 2.11 is there!
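The audit-trail idea can be mimicked in a few lines of Python. This is only a sketch: the dictionary `mem` plays the role of the calculator memories M0 to M8, and every input value below is hypothetical, chosen for illustration only (the actual inputs of Example 2 are not reproduced here). It shows that once M7 and M8 are stored, swapping the leading factor from M3 to M6 needs just one more line.

```python
# Hypothetical inputs, for illustration only (not the values of Example 2)
mem = {
    0: 9500.0,  # M0 = l20
    1: 9300.0,  # M1 = l30
    2: 8900.0,  # M2 = l50
    3: 0.25,    # M3 = A50
    4: 16.5,    # M4 = a20
    5: 15.9,    # M5 = a30
    6: 13.3,    # M6 = a50
}
v = 1 / 1.06

# Intermediate results are stored once and reused, like STO 7 and STO 8
mem[7] = mem[4] / mem[1] - mem[5] / mem[0] * v**10
mem[8] = mem[4] / mem[2] - mem[6] / mem[0] * v**30

V_wrong = mem[3] * v**20 * mem[7] / mem[8]   # formula with A50 as leading factor
V_fixed = mem[6] * v**20 * mem[7] / mem[8]   # corrected formula with a50

print(V_wrong, V_fixed)
```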
Exam #1 (#8 Course 1 May 2000) A probability distribution of the claim sizes for
an auto insurance policy is given in the table below:

Claim size    Probability
20            0.15
30            0.10
40            0.05
50            0.20
60            0.10
70            0.10
80            0.30

What percentage of the claims are within one standard deviation of the mean claim size?
Solution
One critical thing to remember about the BA II Plus and BA II Plus Professional
Statistics Worksheet is that you cannot directly enter the probability mass function $f(x_i)$
into the calculator to find $E(X)$ and $Var(X)$. The BA II Plus and BA II Plus Professional
1-V Statistics Worksheet accepts only scaled-up probabilities that are positive integers. If
you enter a non-integer value into the statistics worksheet, you will get an error when
attempting to retrieve $E(X)$ and $Var(X)$.
Here, multiplying each probability by 100 gives the integers 15, 10, 5, 20, 10, 10, and 30.
Next, enter the 7 data pairs (claim size, scaled-up probability) into the BA II Plus
Statistics Worksheet to get $E(X)$ and $\sigma_X$.
Keystrokes       Display
20 ENTER         X01=20.0000
15 ENTER         Y01=15.0000
30 ENTER         X02=30.0000
10 ENTER         Y02=10.0000
40 ENTER         X03=40.0000
5 ENTER          Y03=5.0000
50 ENTER         X04=50.0000
20 ENTER         Y04=20.0000
60 ENTER         X05=60.0000
10 ENTER         Y05=10.0000
70 ENTER         X06=70.0000
10 ENTER         Y06=10.0000
80 ENTER         X07=80.0000
30 ENTER         Y07=30.0000

Select the statistical calculation portion of the Statistics worksheet: press 2nd [Stat]. The display shows the old content.

Select the one-variable calculation method: keep pressing 2nd SET until you see 1-V.

View the sum of the scaled-up probabilities: n=100.0000. (Make sure the sum of the scaled-up probabilities is equal to the scaled-up common factor, which in this problem is 100. If n is not equal to the common factor, you've made a data entry error.)

View the mean: $\bar{x} = 55.0000$

View the sample standard deviation: $S_x = 21.9043$ (this is a sample standard deviation; don't use this value). Note that

$$S_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$

View the standard deviation: $\sigma_X = 21.7945$
You should always double check (using the up and down arrow keys to scroll through the data pairs of X and
Y) that your data entry is correct before accepting the $E(X)$ and $\sigma_X$ generated by the BA II
Plus.
If you have made an error in data entry, you can press 2nd DEL to delete a data pair (X, Y) or
2nd INS to insert a data pair (X, Y). If you typed a wrong number, you can use the → key to delete
the wrong number and then re-enter the correct number. Refer to the BA II Plus
guidebook for details on how to correct data entry errors.
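The interval within one standard deviation of the mean is $55 \pm 21.7945$, that is, from 33.21 to 76.79. Then, we have
$\Pr(33.21 \le X \le 76.79) = \Pr(X = 40) + \Pr(X = 50) + \Pr(X = 60) + \Pr(X = 70)$
$= 0.05 + 0.20 + 0.10 + 0.10 = 0.45$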
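The same answer can be reproduced with a short Python sketch that works directly with the probabilities (no scaling to integers is needed outside the calculator). The claim sizes and probabilities are the ones used in the TI-30X IIS formula in this section; the variable names are illustrative.

```python
import math

sizes = [20, 30, 40, 50, 60, 70, 80]
probs = [0.15, 0.10, 0.05, 0.20, 0.10, 0.10, 0.30]

mean = sum(x * p for x, p in zip(sizes, probs))
second_moment = sum(x * x * p for x, p in zip(sizes, probs))
sd = math.sqrt(second_moment - mean**2)   # population standard deviation

# Probability mass of claim sizes within one standard deviation of the mean
within = sum(p for x, p in zip(sizes, probs) if abs(x - mean) <= sd)

print(mean, second_moment, sd, within)
```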
To find $E(X)$, we type:
20*.15+30*.1+40*.05+50*.2+60*.1+70*.1+80*.3
This gives $E(X) = 55$. Typing the same formula with each claim size squared gives $E(X^2) = 3{,}500$.
Keep in mind that you can enter up to 88 digits for a formula in the TI-30X IIS. If your
formula exceeds 88 digits, the TI-30X IIS will ignore the digits entered after the 88th digit.
A baseball team has scheduled its opening game for April 1. If it rains on April 1, the
game is postponed and will be played on the next day that it does not rain. The team
purchases insurance against rain. The policy will pay 1,000 for each day, up to 2 days,
that the opening game is postponed. The insurance company determines that the number
of consecutive days of rain beginning on April 1 is a Poisson random variable with a 0.6
mean. What is the standard deviation of the amount the insurance company will have to
pay?
(A) 668, (B) 699, (C) 775, (D) 817, (E) 904
Solution
Let N be the number of consecutive days of rain beginning on April 1. Then

$$\Pr(N = n) = e^{-0.6}\,\frac{0.6^n}{n!} \qquad (n = 0, 1, 2, \ldots)$$

If a problem asks you to calculate the mean, standard deviation, or other statistics of a
discrete random variable, it is always a good idea to list the variable's values and their
corresponding probabilities in a table before doing the calculation to organize your data.
So let's list the data pairs (X, probability) in a table, where X is the payment:

Payment X    Probability
0            $\Pr(N = 0) = e^{-0.6}\,\frac{0.6^0}{0!} = e^{-0.6}$
1,000        $\Pr(N = 1) = e^{-0.6}\,\frac{0.6^1}{1!} = 0.6e^{-0.6}$
2,000        $\Pr(N \ge 2) = \Pr(N = 2) + \Pr(N = 3) + \ldots = 1 - [\Pr(N = 0) + \Pr(N = 1)] = 1 - 1.6e^{-0.6}$
Once you set up the table above, you can use the BA II Plus's Statistics Worksheet or the TI-30X
IIS to find the mean and variance. On the TI-30X IIS, type:

1000*.6e^(-.6)+2000(1-1.6e^(-.6

When typing e^(-.6) for $e^{-0.6}$, you need to use the negative sign, not the minus sign, to
get -.6. If you type the minus sign in e^(-.6), you will get an error message.
Additionally, for $0.6e^{-0.6}$, you do not need to type 0.6*e^(-.6); just type .6e^(-.6). Also,
to calculate $2000(1 - 1.6e^{-.6})$, you do not need to type 2000*(1-1.6*(e^(-.6))). Simply
type

2000(1-1.6e^(-.6

Your calculator understands you are trying to calculate $2000(1 - 1.6e^{-.6})$. However, the
omission of the closing parentheses works only for the last item in your formula. In other
words, if your equation is $2000(1 - 1.6e^{-.6}) + 1000(.6e^{-.6})$, you need to type

2000(1-1.6e^(-.6)) + 1000*.6e^(-.6

If you type

2000(1-1.6e^(-.6 + 1000*.6e^(-.6

your calculator will interpret it as

2000(1-1.6e^(-.6 + 1000*.6e^(-.6) ) )

After you type

1000*.6e^(-.6)+2000(1-1.6e^(-.6

press ENTER. You should get $E(X) = 573.0897$. This is an intermediate value. You
can store it on your scrap paper or in one of your calculator's memories.
$Var(X) = E(X^2) - E^2(X) = 488460.6535$
$\sigma_X = \sqrt{Var(X)} = 698.8996$.
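As a cross-check of the calculator work, here is a minimal Python sketch of the same calculation: the payment is 0, 1,000, or 2,000 with the three Poisson-based probabilities from the table. Variable names are illustrative.

```python
import math

lam = 0.6
p0 = math.exp(-lam)            # Pr(N = 0)
p1 = lam * math.exp(-lam)      # Pr(N = 1)
p2 = 1 - p0 - p1               # Pr(N >= 2) = 1 - 1.6 e^{-0.6}

payments = [0, 1000, 2000]
probs = [p0, p1, p2]

mean = sum(x * p for x, p in zip(payments, probs))
second_moment = sum(x * x * p for x, p in zip(payments, probs))
var = second_moment - mean**2
sd = math.sqrt(var)

print(round(mean, 4), round(var, 4), round(sd, 4))
```

The standard deviation rounds to 699, which is answer choice (B).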
First, please note that you can always calculate $\sigma_X$ without using the BA II Plus built-in
Statistics Worksheet. You can calculate $E(X)$, $E(X^2)$, and $Var(X)$ in the BA II Plus as you do
any other calculations without using the built-in worksheet.

$$E(X) = 0 \cdot e^{-.6} + 1{,}000\,(.6e^{-.6}) + 2{,}000\,(1 - 1.6e^{-.6})$$

$$E(X^2) = 0^2 \cdot e^{-.6} + 1{,}000^2\,(.6e^{-.6}) + 2{,}000^2\,(1 - 1.6e^{-.6})$$

$$Var(X) = E(X^2) - E^2(X), \qquad \sigma_X = \sqrt{Var(X)}$$
You simply calculate each item in the above equations with BA II Plus. This will give
you the required standard deviation.
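The key to using the BA II Plus Statistics Worksheet is to scale up the probabilities to
integers. To scale the three probabilities

$$\left(e^{-.6},\; 0.6e^{-.6},\; 1 - 1.6e^{-.6}\right)$$

we multiply each by 10,000 and round to the nearest integer, which gives 5,488, 3,293, and 1,219.
Then we just enter the following data pairs into the BA II Plus's statistics worksheet: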
X01=0 Y01=5,488;
X02=1,000 Y02=3,293;
X03=2,000 Y03=1,219.
Make sure your calculator gives you n that matches the sum of the scaled-up
probabilities. In this problem, the sum of your scaled-up probabilities is 10,000, so you
should get n=10,000. If your calculator gives you n that is not 10,000, you know that at
least one of the scaled-up probabilities is wrong.
Of course, you can scale up the probabilities with better precision (more closely
resembling the original probabilities). For example, you can multiply them by 100,000,000
(assuming you set your calculator to display 8 decimal places).
Then we just enter the following data pairs into the BA II Plus's statistics worksheet:
X01=0 Y01=54,881,164;
X02=1,000 Y02=32,928,698;
X03=2,000 Y03=12,190,138.
Then the calculator will give you $\sigma_X = 698.8995993$ (remember to check that
n=100,000,000).
For exam problems, scaling up the original probabilities by multiplying them by 10,000
is good enough to give you the correct answer. Under exam conditions it is unnecessary
to scale the probability up by multiplying by 100,000,000.
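To see how little the coarser scaling costs, the following Python sketch scales the three probabilities by a chosen factor, rounds them to integers, and computes the standard deviation from the resulting counts, which mirrors what the 1-V worksheet does with integer frequencies. The helper names `scaled_counts` and `sd_from_counts` are illustrative.

```python
import math

def scaled_counts(probs, scale):
    """Scale probabilities to integer counts, as done before calculator entry."""
    return [round(p * scale) for p in probs]

def sd_from_counts(values, counts):
    """Return (n, population standard deviation) for values with integer frequencies."""
    n = sum(counts)
    mean = sum(x * c for x, c in zip(values, counts)) / n
    var = sum(x * x * c for x, c in zip(values, counts)) / n - mean**2
    return n, math.sqrt(var)

e = math.exp(-0.6)
probs = [e, 0.6 * e, 1 - 1.6 * e]
values = [0, 1000, 2000]

coarse = scaled_counts(probs, 10_000)
fine = scaled_counts(probs, 100_000_000)
n_coarse, sd_coarse = sd_from_counts(values, coarse)
n_fine, sd_fine = sd_from_counts(values, fine)

print(coarse, n_coarse, sd_coarse)
print(fine, n_fine, sd_fine)
```

Checking that n equals the scale factor catches rounding or entry mistakes, just like the n check on the calculator.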
Determine the credibility of one year's experience for a single driver using
semiparametric empirical Bayes estimation.
Solution
$$\bar{X} = \frac{54(0) + 33(1) + 10(2) + 2(3) + 1(4)}{54 + 33 + 10 + 2 + 1} = \frac{63}{100} = 0.63$$

$$Var(X) = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{1}{100-1}\sum_{i=1}^{100}\left(X_i - \bar{X}\right)^2$$

$$= \frac{54(0 - .63)^2 + 33(1 - .63)^2 + 10(2 - .63)^2 + 2(3 - .63)^2 + 1(4 - .63)^2}{100 - 1} = 0.68$$
Enter
X01=0, Y01=54
X02=1, Y02=33
X03=2, Y03=10
X04=3, Y04=2
X05=4, Y05=1
While your calculator displays $S_X = 0.82455988$, press the $x^2$ key of your calculator.
You should get 0.67989899. This is $Var(X) = S_X^2$. So $Var(X) = 0.67989899 \approx 0.68$.
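The sample mean and sample variance from the frequency data above can be verified with a few lines of Python. This is a sketch of the same arithmetic with the $n - 1$ divisor; variable names are illustrative.

```python
values = [0, 1, 2, 3, 4]          # observed values
counts = [54, 33, 10, 2, 1]       # number of observations of each value

n = sum(counts)
mean = sum(v * c for v, c in zip(values, counts)) / n

# Sample variance: divide by n - 1, matching the calculator's S_X squared
sample_var = sum(c * (v - mean) ** 2 for v, c in zip(values, counts)) / (n - 1)

print(n, mean, sample_var)
```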
Example
For an insurance:
A policyholder's annual losses can be 100, 200, 300, and 400 with respective
probabilities 0.1, 0.2, 0.3, and 0.4.
Calculate the mean and the variance of the annual payment made by the insurer to the
policyholder, given there's a payment.
Solution
Let X represent the annual loss. Let Y represent the claim payment by the insurer to the
policyholder. Then

$$Y = \begin{cases} 0 & \text{if } X \le 250 \\ X - 250 & \text{if } X > 250 \end{cases}$$

Standard solution
$n = 7$, $\bar{X} = 107.14$, $\sigma_X = 49.48716593$
$Var = \sigma_X^2 = 2{,}448.98$
This is how the BA II Plus/Professional 1-V Statistics Worksheet works. After you enter
X01=50, Y01=3, X02=150, Y02=4, the BA II Plus/Professional knows that your random
variable X takes on two values: 50 (with frequency 3) and 150 (with frequency 4).
Next, the BA II Plus/Professional sets up the following table for the statistics calculation:

$$X = \begin{cases} 50 & \text{with probability } \frac{3}{3+4} = \frac{3}{7} \\ 150 & \text{with probability } \frac{4}{3+4} = \frac{4}{7} \end{cases}$$

$$E(X) = 50\left(\frac{3}{7}\right) + 150\left(\frac{4}{7}\right), \qquad E(X^2) = 50^2\left(\frac{3}{7}\right) + 150^2\left(\frac{4}{7}\right),$$

$$Var(X) = E(X^2) - E^2(X)$$
1 2 3 4
(X 150 ) + X > 150 ! = 02 + 02 + 502 + 150 2
2
E
" # 7 7 7 7
Now you see that BA II/Professional correctly calculates the mean and variance.
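The conditional mean and variance can also be checked with a short Python sketch. It keeps only the losses above 250 (taken from the definition of Y in this example), renormalizes their probabilities, and computes the moments of the payment. Variable names are illustrative.

```python
losses = [100, 200, 300, 400]
probs = [0.1, 0.2, 0.3, 0.4]
threshold = 250   # payment is X - 250 when X > 250, else 0

# (payment, probability) pairs for the outcomes that produce a payment
paid = [(x - threshold, p) for x, p in zip(losses, probs) if x > threshold]
total = sum(p for _, p in paid)   # Pr(payment > 0) = 0.7

cond_mean = sum(y * p for y, p in paid) / total
cond_var = sum(y * y * p for y, p in paid) / total - cond_mean**2

print(cond_mean, cond_var)
```

Dividing by `total` plays the same role as entering the frequencies 3 and 4 on the calculator: only the relative weights 3/7 and 4/7 matter.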
The following entries produce identical mean, sample mean, and variance:
3
$$50 with probability
7
X=
$150 4
with probability
$ 7
Example
X =x pX ( x )
0.5
4
6
( 0.54 ) k
0.25
1
6
( 0.253 ) ( 0.75 ) k
Solution
X =x pX ( x ) Scaled p X ( x ) up multiply
1, 000, 000
p X ( x ) by
k
0.5
4
6
( 0.54 ) k = 0.041667 k 41,667
0.25
1
6
( 0.253 ) ( 0.75 ) k = 0.001953 k 1,953
0.75
1
6
( 0.753 ) ( 0.25 ) k = 0.017578 k 17,578
X01=0.5, Y01=41,667
X02=0.25, Y02= 1,953
X03=0.75, Y03=17,578
&
X 0 1
0 0.4 0.1
1 0.1 0.2
2 0.1 0.1
10
For a given value of ' and a sample of size 10 for X : X i = 10
i =1
Solution
$n = 4$, $\bar{X} = 1$, $\sigma_X = 0.70710678$, $Var = \sigma_X^2 = 0.70710678^2 = 0.5$
7
X01= , Y01=6; X01=0.5, Y02=4
12
One useful yet neglected feature of the BA II Plus/BA II Plus Professional is the linear least
squares regression functionality. This feature can help you quickly solve a tricky problem
with a few simple keystrokes. Unfortunately, 99.9% of the exam candidates don't know
of this feature. Even SOA doesn't know.
In a regression analysis, you try to fit a line (or a function) through a set of points. With
least squares regression, you get the best fit by minimizing the squared distance
of each point to the fitted line. You then use the fitted line to project where a data point
is most likely to be.
Say you want to find out how one's income level affects how much life insurance he
buys. Let X represent one's income. Let Y represent the amount of life insurance this
person buys. You have collected some data pairs of (X, Y) from a group of consumers.
You suspect there's a linear relationship between X and Y. So you want to predict
Y using the function a + bX, where a and b are constants. With least squares regression,
you want to minimize the following:

$$Q = E\left[(a + bX - Y)^2\right]$$

$$\frac{\partial Q}{\partial a} = \frac{\partial}{\partial a} E\left[(a + bX - Y)^2\right] = E\left[\frac{\partial}{\partial a}(a + bX - Y)^2\right] = E\left[2(a + bX - Y)\right]$$
$$= 2E\left[a + bX - Y\right] = 2\left[a + bE(X) - E(Y)\right]$$

Setting $\frac{\partial Q}{\partial a} = 0$:

$$a + bE(X) - E(Y) = 0 \qquad \text{(Equation I)}$$

$$\frac{\partial Q}{\partial b} = \frac{\partial}{\partial b} E\left[(a + bX - Y)^2\right] = E\left[\frac{\partial}{\partial b}(a + bX - Y)^2\right] = E\left[2(a + bX - Y)X\right]$$
$$= 2E\left[(a + bX - Y)X\right] = 2\left[aE(X) + bE(X^2) - E(XY)\right]$$

Setting $\frac{\partial Q}{\partial b} = 0$:

$$aE(X) + bE(X^2) - E(XY) = 0 \qquad \text{(Equation II)}$$

(Equation II) $-$ (Equation I) $\times E(X)$:

$$b\left[E(X^2) - E^2(X)\right] = E(XY) - E(X)E(Y)$$

$$b = \frac{Cov(X, Y)}{Var(X)}, \qquad a = E(Y) - bE(X)$$

where

$$Var(X) = E(X^2) - E^2(X), \quad E(X) = \sum p_i x_i, \quad E(X^2) = \sum p_i x_i^2$$
$$Cov(X, Y) = E(XY) - E(X)E(Y), \quad E(XY) = \sum p_i x_i y_i, \quad E(Y) = \sum p_i y_i$$
Example 1. For the following data pairs $(x_i, y_i)$, find the linear least squares regression
line a + bX:

i    $p_i(x_i, y_i)$    $X = x_i$    $Y = y_i$
1    1/3                0            1
2    1/3                3            6
3    1/3                12           8

Solution

$$E(X) = \frac{1}{3}(0 + 3 + 12) = 5, \qquad E(X^2) = \frac{1}{3}\left(0^2 + 3^2 + 12^2\right) = 51$$

$$Var(X) = 51 - 5^2 = 26$$

$$E(Y) = \frac{1}{3}(1 + 6 + 8) = 5, \qquad E(XY) = \frac{1}{3}(0 \times 1 + 3 \times 6 + 12 \times 8) = 38$$

$$Cov(X, Y) = E(XY) - E(X)E(Y) = 38 - 5 \times 5 = 13$$

$$b = \frac{Cov(X, Y)}{Var(X)} = \frac{13}{26} = 0.5, \qquad a = E(Y) - bE(X) = 5 - 0.5 \times 5 = 2.5$$
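The formulas $b = Cov(X, Y)/Var(X)$ and $a = E(Y) - bE(X)$ translate directly into code. Below is a minimal Python sketch that works with the probabilities $p_i$ as given; the function name `fit_line` is an illustrative choice.

```python
def fit_line(xs, ys, ps):
    """Least squares line a + b*X for data pairs (x_i, y_i) with probabilities p_i."""
    ex = sum(p * x for x, p in zip(xs, ps))
    ey = sum(p * y for y, p in zip(ys, ps))
    ex2 = sum(p * x * x for x, p in zip(xs, ps))
    exy = sum(p * x * y for x, y, p in zip(xs, ys, ps))
    var_x = ex2 - ex**2
    cov_xy = exy - ex * ey
    b = cov_xy / var_x
    a = ey - b * ex
    return a, b

a, b = fit_line([0, 3, 12], [1, 6, 8], [1 / 3, 1 / 3, 1 / 3])
print(a, b)                                   # intercept and slope
print([a + b * x for x in (0, 3, 12)])        # fitted values at X = 0, 3, 12
```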
Now you understand linear least squares regression. Next, let's talk about how to use the
BA II Plus/BA II Plus Professional to find a and b and calculate a + bX when X = 0, 3,
12.
Example 2. For the following data pairs $(x_i, y_i)$, find the linear least squares regression
line a + bX using the BA II Plus/BA II Plus Professional.

i    $p_i(x_i, y_i)$    $X = x_i$    $Y = y_i$
1    1/3                0            1
2    1/3                3            6
3    1/3                12           8

Solution
You see that using the BA II Plus/Professional LIN Statistics Worksheet, we get the same
result.
You might wonder why we didn't use the probability $p_i(x_i, y_i)$. Here is an important
point. The BA II Plus/Professional Statistics Worksheet (including LIN) can't directly handle
probabilities. To use the Statistics Worksheet, you have to first convert the probabilities to
numbers of occurrences. In this problem, $p_i(x_i, y_i) = \frac{1}{3}$ for i = 1, 2, and 3. So we have 3 data
pairs $(x_i, y_i)$ and each data pair is equally likely to occur. So we arbitrarily let each
data pair occur only once. This way, the BA II Plus/Professional knows that each of the
three data pairs has a 1/3 chance of occurring. Later I will show you how to use LIN when
$p_i(x_i, y_i)$ is not uniform.
Some of you might complain: "I can easily use my pen and find the answers. Why do I
need to bother using LIN?" There are several reasons why you might want to use LIN to
find the regression line a + bX and calculate various values of a + bX:
• In the heat of the exam, it's easy for you to be brain dead and forget the formulas
$b = \frac{Cov(X, Y)}{Var(X)}$, $a = E(Y) - bE(X)$.
• Even if you are not brain dead, you can easily make mistakes calculating a + bX
from scratch. In contrast, if you have entered your data pairs $(x_i, y_i)$ correctly, the BA
II Plus/Professional will generate the results 100% right.
• Even if you want to calculate a + bX from scratch, it's good to use LIN to
double check your work.
i    p_i(x_i, y_i)    X = x_i    Y = y_i
1    1/6              0          1
2    1/3              3          6
3    1/2              12         8
Solution
Of course, you can also assume that the total # of occurrences is 60. Then ( x1 , y1 ) occurs
10 times; ( x2 , y2 ) occurs 20 times; and ( x3 , y3 ) occurs 30 times. However, this approach
will make your data entry difficult.
X01=0, Y01=1
X02=3, Y02=6
X03=3, Y03=6
X04=12, Y04=8
X05=12, Y05=8
X06=12, Y06=8
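$$E(X) = \tfrac{1}{6}(0) + \tfrac{1}{3}(3) + \tfrac{1}{2}(12) = 7, \qquad E(X^2) = \tfrac{1}{6}\left(0^2\right) + \tfrac{1}{3}\left(3^2\right) + \tfrac{1}{2}\left(12^2\right) = 75$$
$$\mathrm{Var}(X) = 75 - 7^2 = 26$$
$$E(Y) = \tfrac{1}{6}(1) + \tfrac{1}{3}(6) + \tfrac{1}{2}(8) = 6.1667$$
$$E(XY) = \tfrac{1}{6}(0 \times 1) + \tfrac{1}{3}(3 \times 6) + \tfrac{1}{2}(12 \times 8) = 54$$
$$\mathrm{Cov}(X,Y) = E(XY) - E(X)\,E(Y) = 54 - 7 \times 6.1667 = 10.8333$$
$$b = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} = \frac{10.8333}{26} = 0.41667, \qquad a = E(Y) - b\,E(X) = 6.1667 - 0.41667 \times 7 = 3.25$$
$$a + bX = 3.25 + 0.41667X$$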
Now you should be convinced that LIN Statistics Worksheet produces the correct result.
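The occurrence-count trick can be checked with a short sketch: replicate each data pair according to its count and run an ordinary unweighted fit, which is what the LIN worksheet is described as doing. The helper names `expand` and `fit_line` are hypothetical, and the code is an illustration rather than the calculator's actual algorithm.

```python
from fractions import Fraction


def fit_line(xs, ys):
    """Ordinary least squares on equally weighted pairs; returns (a, b)."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def expand(pairs_with_counts):
    """Turn [((x, y), count), ...] into repeated x and y lists."""
    xs, ys = [], []
    for (x, y), count in pairs_with_counts:
        xs += [x] * count
        ys += [y] * count
    return xs, ys


# Probabilities 1/6, 1/3, 1/2 become 1, 2, 3 occurrences out of 6 ...
a6, b6 = fit_line(*expand([((0, 1), 1), ((3, 6), 2), ((12, 8), 3)]))
# ... or 10, 20, 30 occurrences out of 60; the fitted line is the same.
a60, b60 = fit_line(*expand([((0, 1), 10), ((3, 6), 20), ((12, 8), 30)]))
print(a6, b6)  # 13/4 5/12
```

Here 13/4 = 3.25 and 5/12 = 0.41667, and scaling every count by the same factor leaves the line unchanged, which is why the smallest whole-number counts are the easiest to enter.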
Let $X_1$ represent the outcome of a single trial and let $E(X_2 \mid X_1)$ represent the expected
value of the outcome of a 2nd trial as described in the table below:
Solution
$$E\left[(a + Z X_1 - Y)^2\right]$$
where $Y = E(X_2 \mid X_1)$.
Since the probability of each data pair is uniformly 1/3, we enter the following data in LIN:
X01=0, Y01=1
X02=3, Y02=6
X03=12, Y03=8
We should get:
a = 2.5, b = 0.5
Enter X' = 0. Press CPT. You'll get Y' = 2.5 (this is a + bX when X = 0)
Enter X' = 3. Press CPT. You'll get Y' = 4 (this is a + bX when X = 3)
Enter X' = 12. Press CPT. You'll get Y' = 8.5 (this is a + bX when X = 12)
Solution
The probability is not uniform. Assume the total # of occurrences is 4. Then the data pair
$[n = 0,\ E(X_2 \mid X_1 = 0) = 0.5]$ occurs once, $[n = 1,\ E(X_2 \mid X_1 = 1) = 0.9]$ occurs twice, and
$[n = 2,\ E(X_2 \mid X_1 = 2) = 1.7]$ occurs once.
X01=0, Y01=0.5
X02=1, Y02=0.9
X03=1, Y03=0.9
X04=2, Y04=1.7
We should get:
a = 0.4, b = 0.6. So the Bühlmann credibility factor is Z = b = 0.6.
The Bühlmann credibility factor after one experiment is $\frac{1}{12}$. Calculate a and b that
minimize the following expression:
Guo Fall 2009 C, Page 44 / 284
$$\sum_{i=1}^{3} P_i \left(a + bR_i - E_i\right)^2$$
Solution
SOA makes your life easier by giving you $b = \frac{1}{12}$. However, to solve this problem, you
really don't need to know b. Once again, we'll use LIN to solve the problem. Let's
assume the total # of occurrences of data pairs $(R_i, E_i)$ is 9. Then (0, 7/4) occurs 6
times; (2, 55/24) occurs 2 times; and (14, 35/12) occurs one time.
X01 through X06 = 0, Y01 through Y06 = 7/4
X07=2, Y07=55/24
X08=2, Y08=55/24
X09=14, Y09=35/12
We should get:
a = 1.8333, b = 0.08333 = 1/12.
Does this solution sound like too much data entry? Not to me. Yes, I can figure out the
answers using the equations:
$$b = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}, \qquad a = E(Y) - b\,E(X)$$
I might solve this problem using the above equations when I'm not taking the exam.
However, in the exam room, you bet I won't bother using these equations. I will enter 18
numbers into the calculator and let the calculator do the math for me. This way, I don't
have to think. I just enter the numbers and the calculator will spit out the answer for me.
And I know that my result is 100% right.
Another use of LIN is to do linear interpolation. You are given two data pairs $(x_1, y_1)$ and
$(x_2, y_2)$. Then you are given a single value $x_3$. You need to find $y_3$ using linear
interpolation.
$$\frac{y_3 - y_1}{x_3 - x_1} = \frac{y_2 - y_1}{x_2 - x_1} = \text{slope of the line through } (x_1, y_1) \text{ and } (x_2, y_2)$$
$$y_3 = \frac{y_2 - y_1}{x_2 - x_1}\,(x_3 - x_1) + y_1$$
To use LIN for linear interpolation, please note that the least squares regression line for
two data points ( x1 , y1 ) and ( x2 , y2 ) is just an ordinary straight line connecting ( x1 , y1 )
and ( x2 , y2 ) . To find y3 , we simply find the least squares regression line a + bX for
( x1 , y1 ) and ( x2 , y2 ) . Then we enter x3 into LIN. Then LIN will produce y3 .
Determine the smoothed empirical estimate of the 90th percentile, as defined in Klugman,
Panjer, and Willmot.
Solution
$$x_{90} = x_{81.82} + \frac{90 - 81.82}{90.91 - 81.82}\left(x_{90.91} - x_{81.82}\right) = 2{,}199 + \frac{90 - 81.82}{90.91 - 81.82}\,(3{,}207 - 2{,}199) = 3{,}106.09$$
Next, I'll show you two shortcuts. One is without using LIN; the other with using LIN.
Since the $k$-th number is the $\frac{100k}{n+1}$ percentile, the $m = \frac{100k}{n+1}$ percentile corresponds to the $\frac{m(n+1)}{100}$-th observation. For example, the 81.82-th percentile corresponds to the $\frac{81.82\,(10+1)}{100} = 9$-th observation; the 90.91-th percentile corresponds to the $\frac{90.91\,(10+1)}{100} = 10$-th observation.
Important Rules:
• The $k$-th observation is the $\frac{100k}{n+1}$ percentile.
• The $m$-th percentile is the $\frac{m(n+1)}{100}$-th observation.
Once you understand the above two rules, you can quickly find the 90-th percentile.
Set $m = 90$: $k = \frac{m(n+1)}{100} = \frac{90\,(10+1)}{100} = 9.9$. So the 9.9-th observation is what we are looking for.
Of course, there isn't a 9.9-th observation. So we need to find it using linear interpolation between the 9th and 10th observations:
$$x_{90} = 2{,}199 + \frac{9.9 - 9}{10 - 9}\,(3{,}207 - 2{,}199) = 3{,}106.2$$
You see that this linear interpolation is much faster than the previous linear interpolation.
We have two data pairs (9, 2,199) and (10, 3,207). As said before, if you have only two
points, then the least squares line is just the ordinary line connecting the two points. We
are interested in finding the ordinary straight line connecting (9, 2,199) and (10, 3,207).
So we'll use the LIN function to find the least squares line, which is the ordinary line.
X01=9, Y01=2199
X02=10, Y02=3207
You'll find that: a = −6,873, b = 1,008, r = 1. The correlation coefficient should be one
because we have only two data pairs. Two data points always produce a perfectly linear
relationship. So if your r is not equal to one, you did something wrong.
Next, set X' = 9.9. Press CPT. You should get: Y' = 3,106.2. This is the 90th percentile
you are looking for.
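The two rules translate into a few lines of Python. This is a minimal sketch: the function name `smoothed_percentile` is an assumption, and every sample value except the 9th and 10th (2,199 and 3,207, taken from the text) is a made-up placeholder, since the full sample is not reproduced here.

```python
def smoothed_percentile(sorted_sample, m):
    """Smoothed empirical estimate of the m-th percentile.

    The m-th percentile is the k = m(n+1)/100-th observation;
    a fractional k is resolved by linear interpolation.
    """
    n = len(sorted_sample)
    k = m * (n + 1) / 100
    if k < 1 or k > n:
        raise ValueError("percentile is outside the range of the sample")
    lower = int(k)              # 1-based index of the observation at or below k
    frac = k - lower
    if frac == 0:
        return sorted_sample[lower - 1]
    x_lo = sorted_sample[lower - 1]
    x_hi = sorted_sample[lower]
    return x_lo + frac * (x_hi - x_lo)


# Hypothetical sample: only the last two values come from the text.
sample = [100, 200, 300, 400, 500, 600, 700, 800, 2199, 3207]
print(round(smoothed_percentile(sample, 90), 1))  # 3106.2
```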
You are given the following values of the cdf of a standard normal distribution:
Solution
This approach is prone to errors. The math logic is simple, but there are simply too many
numbers to calculate. And it's very easy to make a mistake, especially in the heat of the
exam.
To quickly solve this problem, we'll use LIN. Enter the following data:
X01=0.4, Y01=0.6554
X02=0.5, Y02=0.6915
2nd STAT (keep pressing 2nd Enter until you see LIN)
Press the down arrow key, you'll see n = 2
Press the down arrow key, you'll see $\bar{X}$ = 0.45
Press the down arrow key, you'll see $S_X$ = 0.07071068
Press the down arrow key, you'll see $\sigma_X$ = 0.05
Press the down arrow key, you'll see $\bar{Y}$ = 0.67345
Press the down arrow key, you'll see $S_Y$ = 0.02552655
Press the down arrow key, you'll see $\sigma_Y$ = 0.01805
Press the down arrow key, you'll see a = 0.511
Press the down arrow key, you'll see b = 0.361
Press the down arrow key, you'll see r = 1 (this is the correlation coefficient)
Press the down arrow key, you'll see X' = 0.00
Enter X' = 0.443
Press the down arrow key.
Press CPT. You'll get Y' = 0.670923.
So $\Phi(0.443) = 0.670923$
In the above example, after generating $\Phi(0.443) = 0.670923$, if you want to generate
$\Phi(0.412345)$, this is what you do:
Enter X' = 0.412345
Press the down arrow key.
Press CPT. You'll get Y' = 0.65985655. This is $\Phi(0.412345)$.
General procedure
Given two data pairs $(c_1, d_1)$ and $(c_2, d_2)$ and a single value $c_3$, to use the BA II Plus and BA
II Plus Professional LIN Worksheet to generate $d_3$, enter
X01= c1 , Y01= d1
X02= c2 , Y02= d 2
X ' = c3
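The general procedure is just two-point linear interpolation, which can be sketched as follows. The function name `interpolate` is an illustrative assumption; the data pairs are the normal cdf values used in the text.

```python
def interpolate(point_1, point_2, c3):
    """Return d3 on the straight line through (c1, d1) and (c2, d2) at c3."""
    (c1, d1), (c2, d2) = point_1, point_2
    slope = (d2 - d1) / (c2 - c1)
    return d1 + slope * (c3 - c1)


# Forward interpolation: cdf value at 0.443 from the values at 0.4 and 0.5.
phi = interpolate((0.4, 0.6554), (0.5, 0.6915), 0.443)
print(round(phi, 6))  # 0.670923

# Inverse interpolation: swap the roles of X and Y to find the point
# whose cdf value is 0.6666.
z = interpolate((0.6554, 0.4), (0.6915, 0.5), 0.6666)
print(round(z, 8))  # 0.43102493
```

Swapping the X and Y columns is the same move as entering the cdf values as X and the arguments as Y in the worksheet.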
Example 3
You are given the following values of the cdf of a standard normal distribution:
Using linear interpolation, find a, b, c, and d (all these are positive numbers) such that
$\Phi(a) = 0.6666$
$\Phi(b) = 0.6777$
$\Phi(c) = 0.6888$
$\Phi(d) = 0.6999$
Solution
X01=0.6554, Y01=0.4
X02=0.6915, Y02=0.5
Enter X' = 0.6666. Then the calculator will generate Y' = 0.43102493.
So a = 0.43102493.
Enter X' = 0.6777. Then the calculator will generate Y' = 0.46177285.
So b = 0.46177285.
Enter X' = 0.6888. Then the calculator will generate Y' = 0.49252078.
So c = 0.49252078.
Enter X' = 0.6999. Then the calculator will generate Y' = 0.52326870.
So d = 0.52326870.
Example 4
The population of a survivor group is assumed to be linear between two consecutive ages.
You are given the following:
Solution
X01=50, Y01=598
X02=51, Y02=534
Enter X' = 50.2 . Then the calculator will generate Y' = 585.2
Enter X' = 50.5 . Then the calculator will generate Y' = 566
Enter X' = 50.7 . Then the calculator will generate Y' = 553.2
Enter X' = 50.9 . Then the calculator will generate Y' = 540.4
An urn has two coins, one fair and the other biased. In one flip, the fair coin has 50%
chance of landing with heads, while the biased one has 90% chance of landing with
heads. Now a coin is randomly chosen from the urn and is tossed. The outcome is a head.
Question: Which coin was chosen from the urn? The fair coin or the biased coin?
Imagine you have entered a bet. If your guess is correct, you'll earn $10. If your guess is
wrong, you'll lose $10. How would you guess?
Most people will guess that the coin chosen from the urn was the biased coin; the biased
coin is far more likely to land on heads.
This simple example illustrates the intuition behind the maximum likelihood estimator. If
we have to estimate a parameter from an n-size sample $X_1, X_2, \ldots, X_n$, we can choose the
parameter value under which the observed sample has the highest probability of occurring.
Example. You flip a coin 9 times and observe HTTTHHHTH. You don't know whether
the coin is fair and you need to estimate the probability of getting H in one flip.
Let p represent the probability of getting a head in one flip. The probability for us to
observe HTTTHHHTH is
$$P(\text{HTTTHHHTH} \mid p) = p^5 (1 - p)^4$$

p      p^5 (1 - p)^4
0      0.000000000
0.1    0.000006561
0.2    0.000131072
0.3    0.000583443
0.4    0.001327104
0.5    0.001953125
0.6    0.001990656
0.7    0.001361367
0.8    0.000524288
0.9    0.000059049
1      0.000000000
If we have to guess p among the possible values 0, 0.1, 0.2, ..., 1, we might guess p = 0.6,
which has the highest probability to produce the outcome of HTTTHHHTH.
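The table lookup amounts to a grid search over the likelihood $p^5(1-p)^4$. A minimal sketch, with the function name `likelihood` chosen only for illustration:

```python
def likelihood(p, heads=5, tails=4):
    """Probability of one specific sequence with 5 heads and 4 tails."""
    return p ** heads * (1 - p) ** tails


grid = [i / 10 for i in range(11)]      # 0, 0.1, ..., 1
best = max(grid, key=likelihood)
print(best, round(likelihood(best), 9))  # 0.6 0.001990656
```

A finer grid would move the maximizer toward 5/9, the proportion of heads in the sample; 0.6 is simply the closest value on this coarse grid.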
A coin is tossed n times and x number of heads are observed. Let p represent the
probability that a head shows up in one flip of coin. Calculate the maximum likelihood
estimator of p .
Step One Write the probability that the observed event happens (the likelihood
function)
Step Three Take the 1st derivative of the log-likelihood function regarding the
parameter. Set the 1st derivative to zero.
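$$\frac{d}{dp}\left[\ln C_n^x + x \ln p + (n - x)\ln(1 - p)\right] = 0,$$
$$\frac{d}{dp}\ln C_n^x + \frac{d}{dp}\left(x \ln p\right) + \frac{d}{dp}\left[(n - x)\ln(1 - p)\right] = 0,$$
$$\frac{d}{dp}\ln C_n^x = 0, \qquad \frac{d}{dp}\left(x \ln p\right) = x\,\frac{d}{dp}\left(\ln p\right) = \frac{x}{p},$$
$$\frac{d}{dp}\left[(n - x)\ln(1 - p)\right] = (n - x)\,\frac{d}{dp}\ln(1 - p) = -\frac{n - x}{1 - p}$$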
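The final algebra step is not shown in this excerpt. Assuming the three derivative terms above are simply added and set to zero, the estimator follows in one line (here $\hat{p}$ denotes the maximum likelihood estimator):

```latex
\[
\frac{x}{p} - \frac{n - x}{1 - p} = 0
\;\Longrightarrow\; x\,(1 - p) = (n - x)\,p
\;\Longrightarrow\; \hat{p} = \frac{x}{n}
\]
```

For the earlier coin example with 5 heads in 9 flips, this gives $\hat{p} = 5/9 \approx 0.556$, consistent with 0.6 being the best value on the 0.1-spaced grid.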
Nov 2000 #6
You have observed the following claim severities:
$$f(x) = \frac{1}{\sqrt{2\pi x}}\,\exp\left[-\frac{1}{2x}(x - \mu)^2\right], \qquad x > 0,\ \mu > 0$$
Solution
$$f(x) = \frac{1}{\sqrt{2\pi x}}\,\exp\left[-\frac{1}{2x}(x - \mu)^2\right]$$
$$f_{X_1,\ldots,X_5}(x_1,\ldots,x_5) = f_X(x_1)\,f_X(x_2)\,f_X(x_3)\,f_X(x_4)\,f_X(x_5) = \prod_{i=1}^{5} \frac{1}{\sqrt{2\pi x_i}}\,\exp\left[-\frac{1}{2x_i}(x_i - \mu)^2\right]$$
$$\frac{d}{d\mu}\, f_{X_1,\ldots,X_5}(x_1,\ldots,x_5) = 0$$
Though we can solve the above equation by pure hard work, an easier approach is to find
a parameter $\mu$ that will maximize the log-likelihood of us observing $X_1, X_2, X_3, X_4$,
and $X_5$:
$$\ln f_{X_1,\ldots,X_5}(x_1,\ldots,x_5)$$
If $\ln f_{X_1,\ldots,X_5}(x_1,\ldots,x_5)$ is maximized, $f_{X_1,\ldots,X_5}(x_1,\ldots,x_5)$ will
surely be maximized. So the task boils down to finding $\mu$ such that the 1st derivative of
the log pdf is zero:
$$\frac{d}{d\mu}\,\ln f_{X_1,\ldots,X_5}(x_1,\ldots,x_5) = 0$$
$$\ln f_{X_1,\ldots,X_5}(x_1,\ldots,x_5) = \sum_{i=1}^{5} \ln\left\{\frac{1}{\sqrt{2\pi x_i}}\,\exp\left[-\frac{1}{2x_i}(x_i - \mu)^2\right]\right\} = \sum_{i=1}^{5}\left[\ln\frac{1}{\sqrt{2\pi x_i}} - \frac{1}{2x_i}(x_i - \mu)^2\right]$$
$$\frac{d}{d\mu}\sum_{i=1}^{5}\left[\ln\frac{1}{\sqrt{2\pi x_i}} - \frac{1}{2x_i}(x_i - \mu)^2\right] = 0$$
$$\frac{d}{d\mu}\sum_{i=1}^{5}\frac{1}{2x_i}(x_i - \mu)^2 = 0, \qquad \frac{d}{d\mu}\sum_{i=1}^{5}\frac{(x_i - \mu)^2}{x_i} = 0$$
$$\sum_{i=1}^{5}\left(1 - \frac{\mu}{x_i}\right) = 0, \qquad 5 - \mu\left(\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \frac{1}{x_4} + \frac{1}{x_5}\right) = 0$$
$$\mu = \frac{5}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dfrac{1}{x_3} + \dfrac{1}{x_4} + \dfrac{1}{x_5}} = \frac{5}{\dfrac{1}{11} + \dfrac{1}{15.2} + \dfrac{1}{18} + \dfrac{1}{21} + \dfrac{1}{25.8}} = 16.74$$
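After understanding the theoretical framework and detailed calculation, we are ready to
use a shortcut. First, let's isolate the variable $\mu$:
$$f(x) = \frac{1}{\sqrt{2\pi x}}\,\exp\left[-\frac{1}{2x}(x - \mu)^2\right] \propto \exp\left[-\frac{1}{2x}(x - \mu)^2\right]$$
$$f_{X_1,\ldots,X_5}(x_1,\ldots,x_5) \propto \exp\left[-\sum_{i=1}^{5}\frac{1}{2x_i}(x_i - \mu)^2\right]$$
$$\ln f_{X_1,\ldots,X_5}(x_1,\ldots,x_5) = -\frac{1}{2}\sum_{i=1}^{5}\frac{(x_i - \mu)^2}{x_i} + \text{constant}$$
$$\frac{d}{d\mu}\,\ln f_{X_1,\ldots,X_5}(x_1,\ldots,x_5) = 0 \;\Longrightarrow\; \sum_{i=1}^{5}\frac{d}{d\mu}\frac{(x_i - \mu)^2}{x_i} = -2\sum_{i=1}^{5}\frac{x_i - \mu}{x_i} = 0$$
$$\mu = \frac{5}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dfrac{1}{x_3} + \dfrac{1}{x_4} + \dfrac{1}{x_5}} = \frac{5}{\dfrac{1}{11} + \dfrac{1}{15.2} + \dfrac{1}{18} + \dfrac{1}{21} + \dfrac{1}{25.8}} = 16.74$$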
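The closed-form answer is easy to confirm numerically. In this sketch the five claim values are the ones appearing in the final formula; the log-likelihood keeps only the term that depends on $\mu$, and the names are illustrative assumptions.

```python
claims = [11, 15.2, 18, 21, 25.8]


def log_likelihood_kernel(mu):
    """Part of the log-likelihood that depends on mu: -sum (x - mu)^2 / (2x)."""
    return -sum((x - mu) ** 2 / (2 * x) for x in claims)


# Closed form: n divided by the sum of reciprocals.
mu_hat = len(claims) / sum(1 / x for x in claims)
print(round(mu_hat, 2))  # 16.74

# Sanity check: nudging mu either way lowers the log-likelihood.
assert log_likelihood_kernel(mu_hat) > log_likelihood_kernel(mu_hat + 0.1)
assert log_likelihood_kernel(mu_hat) > log_likelihood_kernel(mu_hat - 0.1)
```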
$$F(x) = 1 - \left(\frac{500}{x}\right)^{\alpha}, \qquad x > 500,\ \alpha > 0$$
Solution
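$$f(x) = \frac{\alpha\, 500^{\alpha}}{x^{\alpha + 1}}$$
$$f(x_1, x_2, x_3, x_4, x_5) = \prod_{i=1}^{5}\frac{\alpha\, 500^{\alpha}}{x_i^{\alpha + 1}} = \frac{\alpha^5\, 500^{5\alpha}}{(x_1 x_2 x_3 x_4 x_5)^{\alpha + 1}}$$
$$\ln f(x_1, x_2, x_3, x_4, x_5) = \ln\frac{\alpha^5\, 500^{5\alpha}}{(x_1 x_2 x_3 x_4 x_5)^{\alpha + 1}} = 5\ln\alpha + 5\alpha\ln 500 - (\alpha + 1)\ln(x_1 x_2 x_3 x_4 x_5)$$
$$\frac{d}{d\alpha}\,\ln f(x_1, x_2, x_3, x_4, x_5) = \frac{5}{\alpha} + 5\ln 500 - \ln(x_1 x_2 x_3 x_4 x_5) = 0$$
$$\frac{5}{\alpha} + 5\ln 500 - \ln(521 \times 658 \times 702 \times 819 \times 1217) = 0, \qquad \alpha = 2.453$$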
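Solving the last equation for $\alpha$ gives $\alpha = 5 / \sum \ln(x_i / 500)$, which a short script can evaluate. This is a sketch under the assumption that the five observations are the values in the product above.

```python
import math

claims = [521, 658, 702, 819, 1217]
lower_bound = 500  # the Pareto-type cdf is defined for x > 500

# 5/alpha + 5 ln 500 - ln(x1...x5) = 0  =>  alpha = n / sum(ln(x_i / 500))
alpha_hat = len(claims) / sum(math.log(x / lower_bound) for x in claims)
print(round(alpha_hat, 3))  # 2.453
```

Summing the logs of the ratios avoids forming the large product $521 \times 658 \times \cdots$ directly.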
Solution
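From the Exam C table, you'll find the Weibull pdf and cdf. With $\tau = 2$:
$$f(x) = \frac{\tau\left(\dfrac{x}{\theta}\right)^{\tau} e^{-(x/\theta)^{\tau}}}{x} = \frac{2x}{\theta^2}\,e^{-(x/\theta)^2}, \qquad F(x) = 1 - e^{-(x/\theta)^2}, \qquad S(x) = e^{-(x/\theta)^2}$$
$$L(\theta) = f(20)\,f(30)\,f(45)\,S(50)\,S(50)$$
$$= \frac{2(20)}{\theta^2}\exp\left[-\left(\frac{20}{\theta}\right)^2\right] \cdot \frac{2(30)}{\theta^2}\exp\left[-\left(\frac{30}{\theta}\right)^2\right] \cdot \frac{2(45)}{\theta^2}\exp\left[-\left(\frac{45}{\theta}\right)^2\right] \cdot \exp\left[-\left(\frac{50}{\theta}\right)^2\right] \cdot \exp\left[-\left(\frac{50}{\theta}\right)^2\right]$$
$$L(\theta) \propto \frac{1}{\theta^6}\,e^{-8{,}325/\theta^2}$$
$$\ln L(\theta) = k - 6\ln\theta - \frac{8{,}325}{\theta^2}, \quad \text{where } k \text{ is a constant}$$
$$\frac{d}{d\theta}\,\ln L(\theta) = \frac{2(8{,}325)}{\theta^3} - \frac{6}{\theta} = 0, \qquad \theta = 52.7$$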
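A numeric check of the censored-data likelihood, as a sketch. It assumes, as the likelihood $L(\theta) = f(20) f(30) f(45) S(50) S(50)$ suggests, three exact observations (20, 30, 45) and two observations censored at 50, with $\tau = 2$.

```python
import math

observed = [20, 30, 45]   # exact values, contribute f(x)
censored = [50, 50]       # censored values, contribute S(x)


def log_likelihood(theta):
    """ln L(theta) for a Weibull with tau = 2."""
    ll = sum(math.log(2 * x / theta ** 2) - (x / theta) ** 2 for x in observed)
    ll += sum(-(c / theta) ** 2 for c in censored)
    return ll


sum_of_squares = sum(x ** 2 for x in observed + censored)
# 2(8325)/theta^3 - 6/theta = 0  =>  theta^2 = 8325 / 3
theta_hat = math.sqrt(sum_of_squares / len(observed))
print(sum_of_squares, round(theta_hat, 1))  # 8325 52.7
```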
Fisher Information
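One key theorem you need to memorize for Exam C is that the maximum likelihood
estimator $\hat{\theta}$ is approximately normally distributed with mean $\theta_0$ and variance $\dfrac{1}{I(\theta)}$:
$$\hat{\theta} \sim N\left(\theta_0,\ \frac{1}{I(\theta)}\right)$$
Here $\theta_0$ is the true parameter. $I(\theta)$, called Fisher information or information, is the
variance of $\dfrac{d}{d\theta}\ln L(x, \theta)$:
$$I(\theta) = \mathrm{Var}_X\left[\frac{d}{d\theta}\ln L(x, \theta)\right] = E_X\left[\left(\frac{d}{d\theta}\ln L(x, \theta)\right)^2\right] = -E_X\left[\frac{d^2}{d\theta^2}\ln L(x, \theta)\right]$$
Please note in the above equation, the expectation and variance are regarding X.
It's quite a bit of math to prove that $\hat{\theta} \sim N\left(\theta_0,\ \dfrac{1}{I(\theta)}\right)$. So I won't show you the proof.
You'll just need to memorize it. However, I'll show you why the three expressions for $I(\theta)$ above are equal.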
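First, let me introduce a new concept to you called "score." The term "score" is not on the
syllabus. However, it's a building block for Fisher information. So let's take a look.
$$L(x, \theta) = \prod_{i=1}^{n} f(x_i, \theta), \quad \text{where } \theta \text{ is the unobservable parameter of the density function.}$$
When calculating the maximum likelihood estimator $\hat{\theta}$, we often use the log-likelihood
function. So let's consider the log-likelihood function, $\ln L(x, \theta)$. The derivative of the
log-likelihood function regarding the parameter $\theta$, $\dfrac{d}{d\theta}\ln L(x, \theta)$, is called the score of the
log-likelihood function. Let's find the mean and variance of the score.
$$\frac{d}{d\theta}\ln L(x, \theta) = \frac{1}{L(x, \theta)}\,\frac{d}{d\theta}L(x, \theta)$$
$$E_X\left[\frac{d}{d\theta}\ln L(x, \theta)\right] = E_X\left[\frac{1}{L(x, \theta)}\,\frac{d}{d\theta}L(x, \theta)\right] = \int \frac{1}{L(x, \theta)}\,\frac{d}{d\theta}L(x, \theta)\; L(x, \theta)\,dx = \int \frac{d}{d\theta}L(x, \theta)\,dx = \frac{d}{d\theta}\int L(x, \theta)\,dx$$
In the first integral, $\dfrac{1}{L(x, \theta)}\dfrac{d}{d\theta}L(x, \theta)$ is the random variable and $L(x, \theta)$ is the density. Since a density integrates to 1:
$$E_X\left[\frac{d}{d\theta}\ln L(x, \theta)\right] = E_X\left[\frac{1}{L(x, \theta)}\,\frac{d}{d\theta}L(x, \theta)\right] = \frac{d}{d\theta}\,1 = 0$$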
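Next, let me explain why $E\left[\left(\dfrac{d}{d\theta}\ln L(x, \theta)\right)^2\right] = -E\left[\dfrac{d^2}{d\theta^2}\ln L(x, \theta)\right]$.
Since the score has zero mean, $\int \dfrac{d}{d\theta}\ln L(x, \theta)\; L(x, \theta)\,dx = 0$. Taking the derivative regarding $\theta$ on both sides:
$$\frac{d}{d\theta}\int \frac{d}{d\theta}\ln L(x, \theta)\; L(x, \theta)\,dx = \frac{d}{d\theta}\,0 = 0$$
Moving $\dfrac{d}{d\theta}$ inside the integration, we have:
$$\frac{d}{d\theta}\int \frac{d}{d\theta}\ln L(x, \theta)\; L(x, \theta)\,dx = \int \frac{d}{d\theta}\left[\frac{d}{d\theta}\ln L(x, \theta)\; L(x, \theta)\right]dx$$
Using the formula $\dfrac{d}{dx}\left[u(x)\,v(x)\right] = u(x)\,\dfrac{d}{dx}v(x) + v(x)\,\dfrac{d}{dx}u(x)$, we have:
$$\frac{d}{d\theta}\left[\frac{d}{d\theta}\ln L(x, \theta)\; L(x, \theta)\right] = L(x, \theta)\,\frac{d}{d\theta}\left[\frac{d}{d\theta}\ln L(x, \theta)\right] + \frac{d}{d\theta}\ln L(x, \theta)\;\frac{d}{d\theta}L(x, \theta)$$
However, $\dfrac{d}{d\theta}\ln L(x, \theta) = \dfrac{1}{L(x, \theta)}\,\dfrac{d}{d\theta}L(x, \theta)$, so $\dfrac{d}{d\theta}L(x, \theta) = L(x, \theta)\,\dfrac{d}{d\theta}\ln L(x, \theta)$.
So we have:
$$\frac{d}{d\theta}\left[\frac{d}{d\theta}\ln L(x, \theta)\; L(x, \theta)\right] = L(x, \theta)\,\frac{d^2}{d\theta^2}\ln L(x, \theta) + L(x, \theta)\left[\frac{d}{d\theta}\ln L(x, \theta)\right]^2 = L(x, \theta)\left\{\frac{d^2}{d\theta^2}\ln L(x, \theta) + \left[\frac{d}{d\theta}\ln L(x, \theta)\right]^2\right\}$$
Integrating, we get:
$$\int \frac{d^2}{d\theta^2}\ln L(x, \theta)\; L(x, \theta)\,dx + \int \left[\frac{d}{d\theta}\ln L(x, \theta)\right]^2 L(x, \theta)\,dx = 0$$
However, $\int \dfrac{d^2}{d\theta^2}\ln L(x, \theta)\; L(x, \theta)\,dx = E\left[\dfrac{d^2}{d\theta^2}\ln L(x, \theta)\right]$ and
$\int \left[\dfrac{d}{d\theta}\ln L(x, \theta)\right]^2 L(x, \theta)\,dx = E\left[\left(\dfrac{d}{d\theta}\ln L(x, \theta)\right)^2\right]$.
Then it follows that $E\left[\dfrac{d^2}{d\theta^2}\ln L(x, \theta)\right] + E\left[\left(\dfrac{d}{d\theta}\ln L(x, \theta)\right)^2\right] = 0$.
Since we know that $E\left[\dfrac{d}{d\theta}\ln L(x, \theta)\right] = 0$, it follows that
$$\mathrm{Var}\left[\frac{d}{d\theta}\ln L(x, \theta)\right] = E\left[\left(\frac{d}{d\theta}\ln L(x, \theta)\right)^2\right] = -E\left[\frac{d^2}{d\theta^2}\ln L(x, \theta)\right]$$
The score $\dfrac{d}{d\theta}\ln L(x, \theta)$ has zero mean and variance $E\left[\left(\dfrac{d}{d\theta}\ln L(x, \theta)\right)^2\right] = -E\left[\dfrac{d^2}{d\theta^2}\ln L(x, \theta)\right]$.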
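Both facts about the score can be checked exactly on a small case. The sketch below uses a single Bernoulli trial with success probability p, which is an illustrative assumption and not an example from the manual; its log-likelihood is $x \ln p + (1-x)\ln(1-p)$.

```python
from fractions import Fraction


def score(x, p):
    """d/dp of x*ln(p) + (1 - x)*ln(1 - p)."""
    return Fraction(x) / p - Fraction(1 - x) / (1 - p)


def second_derivative(x, p):
    """d^2/dp^2 of x*ln(p) + (1 - x)*ln(1 - p)."""
    return -Fraction(x) / p ** 2 - Fraction(1 - x) / (1 - p) ** 2


def expectation(func, p):
    """Exact expectation over x in {0, 1} with P(x = 1) = p."""
    return (1 - p) * func(0, p) + p * func(1, p)


p = Fraction(3, 10)
mean_score = expectation(score, p)
var_score = expectation(lambda x, q: score(x, q) ** 2, p) - mean_score ** 2
information = -expectation(second_derivative, p)
print(mean_score, var_score, information)  # 0 100/21 100/21
```

The score has mean 0, and its variance equals the negative expected second derivative, both being $1/[p(1-p)] = 100/21$ here.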
Solution
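$\mathrm{Var}(\hat{\theta})$ is the inverse of the information. So $\mathrm{Var}(\hat{\theta}) = \dfrac{1}{4n}$.
$$\mathrm{Var}(2\hat{\theta}) = 4\,\mathrm{Var}(\hat{\theta}) = 4\left(\frac{1}{4n}\right) = \frac{1}{n}.$$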
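Suppose the random variable X has density function $f(x, \theta)$. If $g(x)$ is any unbiased
estimator of $\theta$, then $\mathrm{Var}\left[g(x)\right] \ge \dfrac{1}{\mathrm{Var}\left[\dfrac{d}{d\theta}\ln f(x, \theta)\right]}$. The proof is as follows:
Because $g(x)$ is an unbiased estimator of $\theta$:
$$\int g(x)\,f(x, \theta)\,dx = \theta.$$
Taking derivative regarding $\theta$ at both sides of the above equation:
$$\frac{d}{d\theta}\int g(x)\,f(x, \theta)\,dx = \frac{d}{d\theta}\,\theta = 1$$
Moving $\dfrac{d}{d\theta}$ inside the integration:
$$\frac{d}{d\theta}\int g(x)\,f(x, \theta)\,dx = \int \frac{d}{d\theta}\left[g(x)\,f(x, \theta)\right]dx = 1$$
$$\int \frac{d}{d\theta}\left[g(x)\,f(x, \theta)\right]dx = \int g(x)\,\frac{d}{d\theta}f(x, \theta)\,dx = 1$$
However, $\dfrac{d}{d\theta}f(x, \theta) = f(x, \theta)\,\dfrac{d}{d\theta}\ln f(x, \theta)$. So we have
$$\int g(x)\,\frac{d}{d\theta}f(x, \theta)\,dx = \int g(x)\,\frac{d}{d\theta}\ln f(x, \theta)\; f(x, \theta)\,dx = 1$$
However, $\int g(x)\,\dfrac{d}{d\theta}\ln f(x, \theta)\; f(x, \theta)\,dx = E\left[g(x)\,\dfrac{d}{d\theta}\ln f(x, \theta)\right]$. So
$$E_X\left[g(x)\,\frac{d}{d\theta}\ln f(x, \theta)\right] = 1.$$
$$\mathrm{Cov}\left[g(x),\ \frac{d}{d\theta}\ln f(x, \theta)\right] = E_X\left\{\left[g(x) - E\,g(x)\right]\left[\frac{d}{d\theta}\ln f(x, \theta) - E\,\frac{d}{d\theta}\ln f(x, \theta)\right]\right\}$$
However, $E_X\left[g(x)\right] = \theta$ and $E_X\left[\dfrac{d}{d\theta}\ln f(x, \theta)\right] = 0$. $\dfrac{d}{d\theta}\ln f(x, \theta)$ is the score and has
zero mean. Then it follows:
$$\mathrm{Cov}\left[g(x),\ \frac{d}{d\theta}\ln f(x, \theta)\right] = E_X\left\{\left[g(x) - \theta\right]\frac{d}{d\theta}\ln f(x, \theta)\right\}$$
$$= E_X\left[g(x)\,\frac{d}{d\theta}\ln f(x, \theta) - \theta\,\frac{d}{d\theta}\ln f(x, \theta)\right]$$
$$= E_X\left[g(x)\,\frac{d}{d\theta}\ln f(x, \theta)\right] - E_X\left[\theta\,\frac{d}{d\theta}\ln f(x, \theta)\right]$$
$$= E_X\left[g(x)\,\frac{d}{d\theta}\ln f(x, \theta)\right] - \theta\,E_X\left[\frac{d}{d\theta}\ln f(x, \theta)\right]$$
$$= 1 - \theta \times 0 = 1$$
$$\mathrm{Cov}\left[g(x),\ \frac{d}{d\theta}\ln f(x, \theta)\right] = 1$$
Because a squared covariance cannot exceed the product of the two variances:
$$1 = \left\{\mathrm{Cov}\left[g(x),\ \frac{d}{d\theta}\ln f(x, \theta)\right]\right\}^2 \le \mathrm{Var}\left[g(x)\right]\,\mathrm{Var}\left[\frac{d}{d\theta}\ln f(x, \theta)\right]$$
$\mathrm{Var}\left[g(x)\right] \ge \dfrac{1}{\mathrm{Var}\left[\dfrac{d}{d\theta}\ln f(x, \theta)\right]}$ is a generic formula. When we use the maximum
likelihood estimator, then the density function is:
$$f(x, \theta) = f(x_1, \theta)\,f(x_2, \theta) \cdots f(x_n, \theta) = L(x, \theta)$$
When $\dfrac{d}{d\theta}\ln f(x, \theta)$ meets certain conditions, $\mathrm{Var}\left[g(x)\right] = \dfrac{1}{\mathrm{Var}\left[\dfrac{d}{d\theta}\ln f(x, \theta)\right]}$. We
are not going to worry about what these conditions are. All we need to know is that for
the maximum likelihood estimator $g(x)$, when n, the sample size of the observed data
$X_1, X_2, \ldots, X_n$, approaches infinity, the variance of $g(x)$ approaches $\dfrac{1}{\mathrm{Var}\left[\dfrac{d}{d\theta}\ln L(x, \theta)\right]}$:
$$\mathrm{Var}(\hat{\theta}) \to \frac{1}{\mathrm{Var}\left[\dfrac{d}{d\theta}\ln L(x, \theta)\right]} \quad \text{as sample size } n \text{ approaches infinity.}$$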
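Where $I_{1,2} = I_{2,1} = -E\left[\dfrac{\partial^2}{\partial\theta_1\,\partial\theta_2}\ln L(x; \theta_1, \theta_2)\right]$
Then
$$\ln L(\theta_1, \theta_2) = \sum_{i=1}^{10}\ln f(x_i, y_i; \theta_1, \theta_2) = -2.5\theta_1^2 - 3\theta_1\theta_2 - \theta_2^2 + 5\theta_1 + 2\theta_2 + k$$
where k is a constant.
Determine the estimated covariance matrix of the maximum likelihood estimator $\begin{pmatrix}\hat{\theta}_1 \\ \hat{\theta}_2\end{pmatrix}$.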
Solution
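$$-E\left[\frac{\partial^2}{\partial\theta_1^2}\ln L(\theta_1, \theta_2)\right] = -E\left[\frac{\partial^2}{\partial\theta_1^2}\left(-2.5\theta_1^2 - 3\theta_1\theta_2 - \theta_2^2 + 5\theta_1 + 2\theta_2 + k\right)\right] = -E(-5) = 5$$
$$-E\left[\frac{\partial^2}{\partial\theta_2^2}\ln L(\theta_1, \theta_2)\right] = -E\left[\frac{\partial^2}{\partial\theta_2^2}\left(-2.5\theta_1^2 - 3\theta_1\theta_2 - \theta_2^2 + 5\theta_1 + 2\theta_2 + k\right)\right] = -E(-2) = 2$$
$$-E\left[\frac{\partial^2}{\partial\theta_1\,\partial\theta_2}\ln L(\theta_1, \theta_2)\right] = -E\left[\frac{\partial^2}{\partial\theta_1\,\partial\theta_2}\left(-2.5\theta_1^2 - 3\theta_1\theta_2 - \theta_2^2 + 5\theta_1 + 2\theta_2 + k\right)\right] = -E(-3) = 3$$
$$I = \begin{pmatrix} 5 & 3 \\ 3 & 2 \end{pmatrix}$$
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \quad \text{if } ad - bc \ne 0$$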
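Applying the 2×2 inverse formula to the information matrix gives the estimated covariance matrix. A minimal sketch, with the function name `inverse_2x2` chosen for illustration:

```python
from fractions import Fraction


def inverse_2x2(matrix):
    """Invert [[a, b], [c, d]] using 1/(ad - bc) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


information = [[5, 3], [3, 2]]
covariance = inverse_2x2(information)
print([[int(v) for v in row] for row in covariance])  # [[2, -3], [-3, 5]]
```

Under this reading, the estimated variance of the first estimator is 2, that of the second is 5, and their covariance is −3.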
Fisher Information matrix is good for estimating the variance and covariance of a series
of maximum likelihood estimators. What if we need to estimate the variance and
covariance of a function of a series of maximum likelihood estimators? We can use the
delta method.
Delta method
Assume that random variable X has mean $\mu_X$ and variance $\sigma_X^2$. Define a new function
$Y = f(X)$. Assume that $f(X)$ is differentiable, we have:
$$f(X) \approx f(\mu_X) + f'(\mu_X)\,(X - \mu_X)$$
Take variance at both sides and notice that $f(\mu_X)$ and $f'(\mu_X)$ are constants:
$$\mathrm{Var}\left[f(X)\right] \approx \mathrm{Var}\left[f(\mu_X) + f'(\mu_X)\,(X - \mu_X)\right] = \left[f'(\mu_X)\right]^2\,\mathrm{Var}(X)$$
To get a feel of this formula, set $Y = f(X) = cX$, where c is a constant. Then the delta
formula becomes: $\mathrm{Var}[cX] \approx c^2\,\mathrm{Var}(X)$.
$$\mathrm{Var}\left[f(X)\right] \approx f'(\mu_X)\;\mathrm{Var}(X)\;f'(\mu_X)$$
Suppose we want to find the variance of $f(\hat{\theta})$, where $\hat{\theta}$ is an estimator of a true
parameter $\theta$. Please note that $\hat{\theta}$ is a random variable. For example, if $\hat{\theta}$ is the maximum
likelihood estimator, $\hat{\theta}$ varies depending on the sample size and on the sample data we
have observed. Also assume based on the sample data we have, we get one estimate $\hat{\theta}_0$.
Set $X = \hat{\theta}$ and $E(X) = E(\hat{\theta})$:
$$\mathrm{Var}\left[f(\hat{\theta})\right] \approx \left[f'\!\left(E(\hat{\theta})\right)\right]^2\,\mathrm{Var}(\hat{\theta})$$
$$\mathrm{Var}\left[f(\hat{\theta})\right] \approx \left[f'(a)\right]^2\,\mathrm{Var}(\hat{\theta})$$
Variance of a function of two random variables
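X has mean $\mu_X$ and variance $\sigma_X^2$; random variable Y has mean $\mu_Y$ and variance $\sigma_Y^2$.
Define a new function $Z = f(X, Y)$. Assume that $f(X, Y)$ is differentiable, we have:
$$\mathrm{Var}\left[f(X, Y)\right] \approx \left[f'_X(\mu_X, \mu_Y)\right]^2\,\mathrm{Var}(X) + \left[f'_Y(\mu_X, \mu_Y)\right]^2\,\mathrm{Var}(Y) + 2\,f'_X(\mu_X, \mu_Y)\,f'_Y(\mu_X, \mu_Y)\,\mathrm{Cov}\left[(X - \mu_X),\ (Y - \mu_Y)\right]$$
$$= \left[f'_X(\mu_X, \mu_Y)\right]^2\,\mathrm{Var}(X) + \left[f'_Y(\mu_X, \mu_Y)\right]^2\,\mathrm{Var}(Y) + 2\,f'_X(\mu_X, \mu_Y)\,f'_Y(\mu_X, \mu_Y)\,\mathrm{Cov}(X, Y)$$
In matrix form:
$$\mathrm{Var}\left[f(X, Y)\right] \approx \begin{pmatrix} f'_X(\mu_X, \mu_Y) & f'_Y(\mu_X, \mu_Y) \end{pmatrix} \begin{pmatrix} \mathrm{Var}(X) & \mathrm{Cov}(X, Y) \\ \mathrm{Cov}(X, Y) & \mathrm{Var}(Y) \end{pmatrix} \begin{pmatrix} f'_X(\mu_X, \mu_Y) \\ f'_Y(\mu_X, \mu_Y) \end{pmatrix}$$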
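For two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$, with the derivatives evaluated at $\left(E(\hat{\theta}_1), E(\hat{\theta}_2)\right)$:
$$\mathrm{Var}\left[f(\hat{\theta}_1, \hat{\theta}_2)\right] \approx \left[f'_{\theta_1}\right]^2\,\mathrm{Var}(\hat{\theta}_1) + \left[f'_{\theta_2}\right]^2\,\mathrm{Var}(\hat{\theta}_2) + 2\,f'_{\theta_1}\,f'_{\theta_2}\,\mathrm{Cov}(\hat{\theta}_1, \hat{\theta}_2)$$
Evaluating the derivatives at the estimators instead:
$$\mathrm{Var}\left[f(\hat{\theta}_1, \hat{\theta}_2)\right] \approx \left[f'_{\theta_1}(\hat{\theta}_1, \hat{\theta}_2)\right]^2\,\mathrm{Var}(\hat{\theta}_1) + \left[f'_{\theta_2}(\hat{\theta}_1, \hat{\theta}_2)\right]^2\,\mathrm{Var}(\hat{\theta}_2) + 2\,f'_{\theta_1}(\hat{\theta}_1, \hat{\theta}_2)\,f'_{\theta_2}(\hat{\theta}_1, \hat{\theta}_2)\,\mathrm{Cov}(\hat{\theta}_1, \hat{\theta}_2)$$
$$f'_{\theta_1}(\hat{\theta}_1, \hat{\theta}_2) = \left.\frac{\partial}{\partial\theta_1}f(\theta_1, \theta_2)\right|_{\hat{\theta}_1} \approx \left.\frac{\partial}{\partial\theta_1}f(\theta_1, \theta_2)\right|_{\theta_1 = a}, \qquad f'_{\theta_2}(\hat{\theta}_1, \hat{\theta}_2) = \left.\frac{\partial}{\partial\theta_2}f(\theta_1, \theta_2)\right|_{\hat{\theta}_2} \approx \left.\frac{\partial}{\partial\theta_2}f(\theta_1, \theta_2)\right|_{\theta_2 = b}$$
Then we have:
$$\mathrm{Var}\left[f(\hat{\theta}_1, \hat{\theta}_2)\right] \approx \left[\left.\frac{\partial f}{\partial\theta_1}\right|_{\theta_1 = a}\right]^2\mathrm{Var}(\hat{\theta}_1) + \left[\left.\frac{\partial f}{\partial\theta_2}\right|_{\theta_2 = b}\right]^2\mathrm{Var}(\hat{\theta}_2) + 2\left[\left.\frac{\partial f}{\partial\theta_1}\right|_{\theta_1 = a}\right]\left[\left.\frac{\partial f}{\partial\theta_2}\right|_{\theta_2 = b}\right]\mathrm{Cov}(\hat{\theta}_1, \hat{\theta}_2)$$
To simplify the notation, we write $\left.\dfrac{\partial f}{\partial\theta_1}\right|_{\theta_1 = a}$ as $\left.\dfrac{\partial f}{\partial\theta_1}\right|_{\hat{\theta}_1}$ and $\left.\dfrac{\partial f}{\partial\theta_2}\right|_{\theta_2 = b}$ as $\left.\dfrac{\partial f}{\partial\theta_2}\right|_{\hat{\theta}_2}$. Then
$$\mathrm{Var}\left[f(\hat{\theta}_1, \hat{\theta}_2)\right] \approx \left[\left.\frac{\partial f}{\partial\theta_1}\right|_{\hat{\theta}_1}\right]^2\mathrm{Var}(\hat{\theta}_1) + \left[\left.\frac{\partial f}{\partial\theta_2}\right|_{\hat{\theta}_2}\right]^2\mathrm{Var}(\hat{\theta}_2) + 2\left[\left.\frac{\partial f}{\partial\theta_1}\right|_{\hat{\theta}_1}\right]\left[\left.\frac{\partial f}{\partial\theta_2}\right|_{\hat{\theta}_2}\right]\mathrm{Cov}(\hat{\theta}_1, \hat{\theta}_2)$$
Please remember that $\left.\dfrac{\partial f}{\partial\theta_1}\right|_{\hat{\theta}_1}$ really means $\left.\dfrac{\partial f}{\partial\theta_1}\right|_{\theta_1 = a}$ and that $\left.\dfrac{\partial f}{\partial\theta_2}\right|_{\hat{\theta}_2}$ really means $\left.\dfrac{\partial f}{\partial\theta_2}\right|_{\theta_2 = b}$.
In matrix form:
$$\mathrm{Var}\left[f(\hat{\theta}_1, \hat{\theta}_2)\right] \approx \begin{pmatrix} f'_{\theta_1}(\hat{\theta}_1, \hat{\theta}_2) & f'_{\theta_2}(\hat{\theta}_1, \hat{\theta}_2) \end{pmatrix} \begin{pmatrix} \mathrm{Var}(\hat{\theta}_1) & \mathrm{Cov}(\hat{\theta}_1, \hat{\theta}_2) \\ \mathrm{Cov}(\hat{\theta}_1, \hat{\theta}_2) & \mathrm{Var}(\hat{\theta}_2) \end{pmatrix} \begin{pmatrix} f'_{\theta_1}(\hat{\theta}_1, \hat{\theta}_2) \\ f'_{\theta_2}(\hat{\theta}_1, \hat{\theta}_2) \end{pmatrix}$$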
$$\begin{pmatrix} 0.1195 & 0 \\ 0 & 0.0597 \end{pmatrix}$$
The mean of the lognormal distribution is $\exp\left(\mu + \dfrac{1}{2}\sigma^2\right)$.
Estimate the variance of the maximum likelihood estimate of the mean of the lognormal
distribution, using the delta method.
Solution
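The mean function is $f(\mu, \sigma) = \exp\left(\mu + \dfrac{1}{2}\sigma^2\right)$. The maximum likelihood estimator of
$f(\mu, \sigma)$ is $f(\hat{\mu}, \hat{\sigma}) = \exp\left(\hat{\mu} + \dfrac{1}{2}\hat{\sigma}^2\right)$, where $\hat{\mu}$ and $\hat{\sigma}$ are the maximum likelihood
estimators of $\mu$ and $\sigma$ respectively.
We are asked to find $\mathrm{Var}\left[f(\hat{\mu}, \hat{\sigma})\right] = \mathrm{Var}\left[\exp\left(\hat{\mu} + \dfrac{1}{2}\hat{\sigma}^2\right)\right]$.
$$f(\hat{\mu}, \hat{\sigma}) \approx f(\mu, \sigma) + \left[\left.\frac{\partial}{\partial\mu}f(\mu, \sigma)\right|_{\hat{\mu}}\right](\hat{\mu} - \mu) + \left[\left.\frac{\partial}{\partial\sigma}f(\mu, \sigma)\right|_{\hat{\sigma}}\right](\hat{\sigma} - \sigma)$$
$$\mathrm{Var}\left[f(\hat{\mu}, \hat{\sigma})\right] \approx \left[\left.\frac{\partial f}{\partial\mu}\right|_{\hat{\mu}}\right]^2\mathrm{Var}(\hat{\mu}) + \left[\left.\frac{\partial f}{\partial\sigma}\right|_{\hat{\sigma}}\right]^2\mathrm{Var}(\hat{\sigma}) + 2\left[\left.\frac{\partial f}{\partial\mu}\right|_{\hat{\mu}}\right]\left[\left.\frac{\partial f}{\partial\sigma}\right|_{\hat{\sigma}}\right]\mathrm{Cov}(\hat{\mu}, \hat{\sigma})$$
The covariance matrix is
$$\begin{pmatrix} 0.1195 & 0 \\ 0 & 0.0597 \end{pmatrix}$$
So $\mathrm{Var}(\hat{\mu}) \approx 0.1195$, $\mathrm{Var}(\hat{\sigma}) \approx 0.0597$, $\mathrm{Cov}(\hat{\mu}, \hat{\sigma}) \approx 0$.
$$\mathrm{Var}\left[f(\hat{\mu}, \hat{\sigma})\right] \approx \left[\left.\frac{\partial f}{\partial\mu}\right|_{\hat{\mu}}\right]^2 0.1195 + \left[\left.\frac{\partial f}{\partial\sigma}\right|_{\hat{\sigma}}\right]^2 0.0597$$
Consequently, we set
$$\frac{\partial}{\partial\mu}f(\mu, \sigma) = \frac{\partial}{\partial\mu}\exp\left(\mu + \frac{1}{2}\sigma^2\right) = \exp\left(\mu + \frac{1}{2}\sigma^2\right)$$
$$\frac{\partial}{\partial\sigma}f(\mu, \sigma) = \frac{\partial}{\partial\sigma}\exp\left(\mu + \frac{1}{2}\sigma^2\right) = \sigma\exp\left(\mu + \frac{1}{2}\sigma^2\right)$$
$$\left.\frac{\partial f}{\partial\mu}\right|_{\hat{\mu}} \approx \exp\left(\hat{\mu} + \frac{1}{2}\hat{\sigma}^2\right) \approx \exp\left(4.215 + \frac{1}{2}\,1.093^2\right) = 123.02$$
$$\left.\frac{\partial f}{\partial\sigma}\right|_{\hat{\sigma}} \approx \hat{\sigma}\exp\left(\hat{\mu} + \frac{1}{2}\hat{\sigma}^2\right) \approx 1.093\exp\left(4.215 + \frac{1}{2}\,1.093^2\right) = 134.46$$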
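The remaining arithmetic can be finished with a short script. This is a sketch under stated assumptions: the estimates $\hat{\mu} = 4.215$ and $\hat{\sigma} = 1.093$ are the values that reproduce the figures 123.02 and 134.46 in the solution (the problem statement giving them is not part of this excerpt), and the variable names are illustrative.

```python
import math

# Assumed maximum likelihood estimates (they reproduce 123.02 and 134.46).
mu_hat, sigma_hat = 4.215, 1.093
# Entries of the covariance matrix given in the problem.
var_mu, var_sigma, cov_mu_sigma = 0.1195, 0.0597, 0.0

mean_hat = math.exp(mu_hat + 0.5 * sigma_hat ** 2)
d_mu = mean_hat                 # partial derivative with respect to mu
d_sigma = sigma_hat * mean_hat  # partial derivative with respect to sigma

variance = (d_mu ** 2 * var_mu
            + d_sigma ** 2 * var_sigma
            + 2 * d_mu * d_sigma * cov_mu_sigma)
print(round(d_mu, 2), round(d_sigma, 2), round(variance, 1))
```

With these inputs the delta-method variance of the estimated lognormal mean comes out at roughly 2,888.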
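Please note that you can also solve this problem using the black-box formula
$$\mathrm{Var}\left[f(\hat{\theta}_1, \hat{\theta}_2)\right] \approx \left[\left.\frac{\partial f}{\partial\theta_1}\right|_{\hat{\theta}_1}\right]^2\mathrm{Var}(\hat{\theta}_1) + \left[\left.\frac{\partial f}{\partial\theta_2}\right|_{\hat{\theta}_2}\right]^2\mathrm{Var}(\hat{\theta}_2) + 2\left[\left.\frac{\partial f}{\partial\theta_1}\right|_{\hat{\theta}_1}\right]\left[\left.\frac{\partial f}{\partial\theta_2}\right|_{\hat{\theta}_2}\right]\mathrm{Cov}(\hat{\theta}_1, \hat{\theta}_2)$$
However, I recommend that you first solve the problem using Taylor series
approximation. This forces you to understand the logic behind the messy formula. Once
you understand the formula, next time you can use the memorized formula for
$\mathrm{Var}\left[f(\hat{\theta}_1, \hat{\theta}_2)\right]$ and quickly solve the problem.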
Solution
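The time to an accident follows an exponential distribution. Assume $\theta$ is the mean for
this exponential distribution. If $X_1$ and $X_2$ are two random samples of time-to-accident,
then the maximum likelihood estimator of $\theta$ is just the sample mean. So $\hat{\theta} = 6$.
$$\Pr(Y > 10) = \Pr\left(\frac{X_1 + X_2}{2} > 10\right) = \Pr(X_1 + X_2 > 20)$$
$$\Pr(X_1 + X_2 > 20) = \int_{20}^{+\infty}\frac{t\,e^{-t/6}}{36}\,dt$$
To calculate $\int_{20}^{+\infty}\dfrac{t\,e^{-t/6}}{36}\,dt$, you'll need to memorize the following shortcut:
$$\int_a^{+\infty} x\,\frac{1}{\theta}\,e^{-x/\theta}\,dx = (a + \theta)\,e^{-a/\theta}$$
$$\int_a^{+\infty} x^2\,\frac{1}{\theta}\,e^{-x/\theta}\,dx = \left[(a + \theta)^2 + \theta^2\right]e^{-a/\theta}$$
If interested, you can download the proof of this shortcut from my website
http://www.guo.coursehost.com. The shortcut and the proof are in the sample chapter of
my P manual. Just download the sample chapter of the P manual and you'll get the proof and
more worked out examples using this shortcut.
$$\int_{20}^{+\infty}\frac{t\,e^{-t/6}}{36}\,dt = \frac{1}{6}\int_{20}^{+\infty} t\,\frac{1}{6}\,e^{-t/6}\,dt = \frac{1}{6}\,[20 + 6]\,e^{-20/6} = 0.1546$$
$$F_Y(10) = \Pr(X_1 + X_2 \le 20) = \int_0^{20}\frac{t\,e^{-t/\theta}}{\theta^2}\,dt, \qquad \hat{F}_Y(10) = \int_0^{20}\frac{t\,e^{-t/\hat{\theta}}}{\hat{\theta}^2}\,dt$$
$$\mathrm{Var}\left[\hat{F}_Y(10)\right] = \mathrm{Var}\left[\int_0^{20}\frac{t\,e^{-t/\hat{\theta}}}{\hat{\theta}^2}\,dt\right] \approx \left[\frac{\partial}{\partial\theta}\int_0^{20}\frac{t\,e^{-t/\theta}}{\theta^2}\,dt\right]^2_{\theta = E(\hat{\theta}) \approx 6}\mathrm{Var}(\hat{\theta})$$
$$\mathrm{Var}(\hat{\theta}) = \mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{X_1 + X_2}{2}\right) = \frac{1}{4}\,(2)\,\mathrm{Var}(X) = \frac{1}{2}\,\theta^2 \approx \frac{1}{2}\left(6^2\right) = 18$$
Please note that the two samples $X_1$ and $X_2$ are independent identically distributed with
a common variance $\mathrm{Var}(X) = \theta^2$.
Next, we need to calculate $\dfrac{\partial}{\partial\theta}\int_0^{20}\dfrac{t\,e^{-t/\theta}}{\theta^2}\,dt$.
$$\int_0^{20}\frac{t\,e^{-t/\theta}}{\theta^2}\,dt = \frac{1}{\theta}\int_0^{20} t\,\frac{1}{\theta}\,e^{-t/\theta}\,dt = \frac{1}{\theta}\left[\int_0^{+\infty} t\,\frac{1}{\theta}\,e^{-t/\theta}\,dt - \int_{20}^{+\infty} t\,\frac{1}{\theta}\,e^{-t/\theta}\,dt\right]$$
$$= \frac{1}{\theta}\left[\theta - (20 + \theta)\,e^{-20/\theta}\right] = 1 - \left(1 + \frac{20}{\theta}\right)e^{-20/\theta}$$
$$\frac{\partial}{\partial\theta}\left[1 - \left(1 + \frac{20}{\theta}\right)e^{-20/\theta}\right] = -\frac{\partial}{\partial\theta}\left[\left(1 + \frac{20}{\theta}\right)e^{-20/\theta}\right] = -\frac{400}{\theta^3}\,e^{-20/\theta}$$
$$\left[\frac{\partial}{\partial\theta}\int_0^{20}\frac{t\,e^{-t/\theta}}{\theta^2}\,dt\right]_{\theta = E(\hat{\theta}) \approx 6} = -\frac{400}{6^3}\exp\left(-\frac{20}{6}\right) = -0.066$$
$$\mathrm{Var}\left[\hat{F}_Y(10)\right] \approx \left[\frac{\partial}{\partial\theta}\int_0^{20}\frac{t\,e^{-t/\theta}}{\theta^2}\,dt\right]^2_{\theta = E(\hat{\theta}) \approx 6}\mathrm{Var}(\hat{\theta}) = (-0.066)^2\,(18) = 0.078$$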
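The derivative in this solution is easy to get wrong by hand, so a numeric cross-check is worthwhile. The sketch below compares the closed-form derivative with a central finite difference; the function names and the step size `h` are illustrative assumptions.

```python
import math


def cdf_of_sum(theta, a=20.0):
    """Pr(X1 + X2 <= a) for two iid exponentials with mean theta."""
    return 1 - (1 + a / theta) * math.exp(-a / theta)


def cdf_derivative(theta, a=20.0):
    """Closed-form d/d(theta) of cdf_of_sum: -(a^2 / theta^3) e^(-a/theta)."""
    return -(a ** 2 / theta ** 3) * math.exp(-a / theta)


theta_hat = 6.0
var_theta_hat = theta_hat ** 2 / 2   # Var of the mean of two observations

h = 1e-6  # step for the central finite difference
numeric = (cdf_of_sum(theta_hat + h) - cdf_of_sum(theta_hat - h)) / (2 * h)

variance = cdf_derivative(theta_hat) ** 2 * var_theta_hat
print(round(1 - cdf_of_sum(theta_hat), 4))    # 0.1546
print(round(cdf_derivative(theta_hat), 3))    # -0.066
print(round(variance, 4))
```

Carrying full precision gives a variance a little above the 0.078 in the text, which squares the rounded value 0.066.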
Kernel smoothing
• Set your point estimate equal to the average of a neighborhood
• Recalculate at every point by averaging this point and the nearby points
Let me illustrate this with a story. You want to buy a house. After looking at many
houses, you find one house you like most. You go to the current owner of the house and ask
for the price. The current owner tells you, "I'm asking for $210,000. Make me an offer."
What are you going to offer? $200,000? $203,000? $205,000 or something else? You are
not sure. And you know the danger: if your offer is too high, the seller accepts your offer
and you'll overpay for the house; if your offer is too low, you'll look stupid and the seller
may refuse to deal with you anymore. So in your best interest, you'll want to make your
offer reasonable, not too high, not too low.
If you talk to someone experienced in the real estate market, he'll tell you how (and this
works): instead of making a random offer, you can make your offering price around
the average selling price of similar houses sold in the same neighborhood.
Say four similar houses in the same neighborhood were sold this year. Their prices are
$198,000, $200,000, $201,000, and $202,000. So the average selling price is $200,250. If
the house you want to buy is truly similar to these four houses, then the seller is asking for
too much. You can offer around $200,250 and explain to the seller that your offer
is very similar to the selling price of the houses in the same neighborhood. A reasonable
seller will be willing to lower his asking price.
This simple story illustrates the spirit of kernel smoothing. Suppose we want to estimate
$f_X(x)$, the probability density of a random variable X at point x. Instead of looking at one
point x and saying $f_X(x) = p(x) = \dfrac{\text{\# of } x\text{'s in the sample}}{\text{sample size } n}$, we may want to look at x's
neighborhood. For example, we may want to look at 3 data points $x - b$, $x$, and $x + b$
where b is a constant. Then we calculate the average of the empirical densities at $x - b$, $x$,
and $x + b$ and use it as an estimator of $f_X(x)$:
$$\hat{f}_X(x) = \frac{1}{3}\,p(x - b) + \frac{1}{3}\,p(x) + \frac{1}{3}\,p(x + b)$$
Please note the analogy of determining the house price is not perfect. There's one small
difference between how we estimate the price of a house located at x and how we
estimate $f_X(x)$. When we estimate the fair price of a house located at x, we exclude the
data point x because we don't know the value of the house located at x.
Of course, we can expand our neighborhood. Instead of looking at only two nearby points,
we may look at 4 nearby points and calculate the average empirical density of a 5-point
neighborhood:
$$\hat{f}_X(x) = \frac{1}{5}\,p(x - 2b) + \frac{1}{5}\,p(x - b) + \frac{1}{5}\,p(x) + \frac{1}{5}\,p(x + b) + \frac{1}{5}\,p(x + 2b)$$
That is, we calculate $\hat{f}(x)$ by averaging the empirical densities
of the neighborhood $x - 2b$, $x - b$, $x$, $x + b$, $x + 2b$.
In addition, we don't need to use equal weighting. We can assign more weight to the data
points near x. For example, we can set
$$\hat{f}_X(x) = \frac{1}{10}\,p(x - 2b) + \frac{2}{10}\,p(x - b) + \frac{4}{10}\,p(x) + \frac{2}{10}\,p(x + b) + \frac{1}{10}\,p(x + 2b)$$
• How big is the neighborhood? This is called the bandwidth. The bigger the
neighborhood, the greater the smoothing. However, if your neighborhood is too big,
you may run the risk of over-smoothing and finding false patterns.
• How much weight do you give to each data point in the neighborhood? For
example, you can assign equal weight to each data point in the neighborhood.
You can also give more weight to the data points closer to the point whose density
you want to estimate. There are many weighting methods out there for you to use.
The weighting method is called the kernel.
Of these two factors, the bandwidth is typically more important than the weighting
method. Your final result may not change much if you use a different weighting method.
However, if you change the bandwidth, your estimated density may change widely.
There's some literature out there explaining in more detail how to choose a proper
bandwidth and a proper weighting method. However, for the purpose of passing Exam C,
you don't need to know that much.
• Uniform kernel. This is one of the easiest weighting methods. If you use this
method to estimate density, you'll assign equal weight to each data point in the
neighborhood.
• Triangular kernel. Under this weighting method, you give more weight to the
data points that are closer to the point for which you are estimating density.
• Gamma kernel. This is more complex but less important than the uniform kernel
and the triangular kernel. If you want to cut some corners, you can skip the
gamma kernel.
Now let's look at the math formulas. Let's focus on the uniform kernel first.
Uniform kernel
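$$k_y(x) = \begin{cases} 0 & \text{if } x < y - b \\[4pt] \dfrac{1}{2b} & \text{if } y - b \le x \le y + b \\[4pt] 0 & \text{if } x > y + b \end{cases}$$
$$\hat{f}(x) = \sum_{\text{All } y_i} p(y_i)\,k_{y_i}(x)$$
Here $\hat{f}(x)$ is the kernel estimator of the density function at x, $p(y_i)$ is the empirical density of $y_i$, and $k_{y_i}(x)$ is $y_i$'s weight.
Calculate the density at x by taking the average of
the empirical densities of the nearby points $y_i$'s.
$$K_y(x) = \begin{cases} 0 & \text{if } x < y - b \\[4pt] \dfrac{x - y + b}{2b} & \text{if } y - b \le x \le y + b \\[4pt] 1 & \text{if } x > y + b \end{cases}$$
$$\hat{F}(x) = \sum_{\text{All } y_i} p(y_i)\,K_{y_i}(x)$$
Here $\hat{F}(x)$ is the kernel estimator of the distribution function at x, $p(y_i)$ is the empirical density of $y_i$, and $K_{y_i}(x)$ is $y_i$'s weight.
Calculate the distribution function at x by taking the
average of the empirical densities of the nearby points $y_i$'s.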
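Now let's look at the formula for $k_y(x)$. The formula looks intimidating. The good news
is that you really don't need to memorize it. You just need to understand the essence of
the uniform weighting method. Once you understand the essence, you can derive the
formula effortlessly on the spot.
$$k_y(x) = \begin{cases} 0 & \text{if } x < y - b \\[4pt] \dfrac{1}{2b} & \text{if } y - b \le x \le y + b \\[4pt] 0 & \text{if } x > y + b \end{cases} \quad\Longleftrightarrow\quad k_y(x) = \begin{cases} 0 & \text{if } |y - x| > b \\[4pt] \dfrac{1}{2b} & \text{if } |y - x| \le b \end{cases}$$
[Neighborhood diagram: rectangle ABCD (A, B on top; D, C on the bottom) standing on the interval from $x - b$ to $x + b$; along the axis, the data points appear in the order $y_1$, $x - b$, $y_3$, $x$, $y_4$, $x + b$, $y_2$.]
Here your neighborhood is $[x - b, x + b]$. b is called the bandwidth, which is half of the
width of the neighborhood you have chosen. Now the formula for $k_y(x)$ becomes:
$$k_y(x) = \begin{cases} 0 & \text{if } |y - x| > b \\[4pt] \dfrac{1}{2b} & \text{if } |y - x| \le b \end{cases} \quad\Longleftrightarrow\quad k_y(x) = \begin{cases} 0 & \text{if } y \text{ is OUT of the neighborhood } [x - b, x + b] \\[4pt] \dfrac{1}{2b} & \text{if } y \text{ is in the neighborhood } [x - b, x + b] \end{cases}$$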
If the data point \(y\) is out of the neighborhood \([x - b,\, x + b]\), its weight is zero. We throw this data point away and do not use it in our estimation. This should make intuitive sense. In the neighborhood diagram, data points \(y_1\) and \(y_2\) are discarded.
If the data point \(y\) is in the neighborhood \([x - b,\, x + b]\), we'll use this data point in our estimation and assign it a weight of \(1/(2b)\). In the neighborhood diagram, data points \(y_3\) and \(y_4\) are used in the estimation and each gets a weight of \(1/(2b)\).
This is how we get \(1/(2b)\). Area ABCD represents the total weight we can possibly assign to all the data points in the neighborhood. So we'll want the total area ABCD to equal one.
\[
\text{Area } ABCD = AB \times BC = (2b) \times BC = 1, \quad\text{so } BC = \frac{1}{2b}.
\]
So for each data point that falls in the neighborhood AB, its weight is \(BC = 1/(2b)\). For each data point that falls out of the neighborhood AB, its weight is zero.
Now you shouldn't have trouble memorizing the uniform kernel formula for \(k_y(x)\).
Next, let's look at the formula for \(K_y(x)\), the weighting factor for the distribution function at \(x\).
It's quite complex to derive \(K_y(x)\), so let's not worry about how to derive the formula. Let's just find an easy way to memorize it. Once again, let's draw a neighborhood diagram:
[Diagram: rectangle ABCD over the neighborhood [x - b, x + b], with a vertical line EF drawn at the data point y (F on the top edge AB, E on the bottom edge DC).]
To find how much weight to give to the data point \(y\) toward calculating \(\hat F(x)\), draw a vertical line at the data point \(y\) (Line EF). Next, imagine that you use a pair of scissors to cut off what's to the left of Line EF while keeping what's to the right of Line EF. Next, calculate the area of the neighborhood rectangle ABCD that's remaining after the cut. This remaining area of the neighborhood rectangle ABCD that survives the cut is \(K_y(x)\). Let's walk through this rule.
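Situation One  If \(x - b \le y \le x + b\), we draw a vertical line EF at the data point \(y\).
[Diagram: rectangle ABCD over [x - b, x + b] with the vertical line EF at y inside the rectangle.]
Next, we use a pair of scissors and cut off what's to the left of Line EF. Now the diagram becomes:
[Diagram: rectangle EFBC over [y, x + b].]
The surviving area is
\[
K_y(x) = EFBC = EF \times EC = \frac{1}{2b}\,(x + b - y) = \frac{x - y + b}{2b}.
\]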
Situation Two  If \(y < x - b\) (see the diagram below), we draw a vertical line EF at the data point \(y\).
[Diagram: vertical line EF at y, to the left of rectangle ABCD over [x - b, x + b].]
Next, we use a pair of scissors and cut off what's to the left of Line EF. The original neighborhood rectangle ABCD completely survives the cut. So we'll set \(K_y(x) = ABCD = 1\).
Situation Three  If \(y > x + b\) (see the diagram below), we draw a vertical line EF at the data point \(y\).
[Diagram: vertical line EF at y, to the right of rectangle ABCD over [x - b, x + b].]
Next, we use a pair of scissors and cut off what's to the left of Line EF. The original neighborhood rectangle ABCD is completely cut off. So we'll set \(K_y(x) = 0\).
Now you see that you really don't need to memorize the ugly \(K_y(x)\) formula. Just draw a neighborhood diagram, take a pair of scissors, choose \(y\) as the cutting point, and cut off the left side of the diagram. Then calculate the surviving area of the neighborhood rectangle. The surviving area is \(K_y(x)\).
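The neighborhood rule and the scissors rule above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the 12-point sample and the bandwidth b = 2 are illustrative choices, not part of the derivation.

```python
def uniform_k(x, y, b):
    """Weight of data point y toward the density at x: 1/(2b) inside the neighborhood."""
    return 1.0 / (2 * b) if abs(x - y) <= b else 0.0


def uniform_K(x, y, b):
    """Weight of data point y toward the distribution function at x (scissors rule)."""
    if y < x - b:          # the whole rectangle survives the cut
        return 1.0
    if y > x + b:          # the whole rectangle is cut off
        return 0.0
    return (x - y + b) / (2 * b)   # surviving area EFBC


def kernel_estimate(x, data, b, weight):
    """Average the kernel weights, giving each data point empirical probability 1/n."""
    return sum(weight(x, y, b) for y in data) / len(data)


# Illustrative sample (assumption for this sketch), bandwidth b = 2, estimate at x = 6
data = [1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12]
f6 = kernel_estimate(6, data, 2, uniform_k)
F6 = kernel_estimate(6, data, 2, uniform_K)
print(round(f6, 4), round(F6, 4))
```

Passing the weight function as an argument keeps the averaging step separate from the kernel, so another kernel can reuse the same `kernel_estimate` helper.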
Triangular kernel
In the uniform kernel, every data point in the neighborhood gets an identical weight of \(1/(2b)\). Say we have two data points in the neighborhood, \(y_3\) and \(y_4\), but \(y_4\) is closer to \(x\) and \(y_3\) is farther away from \(x\) (see the diagram below).
[Diagram: rectangle ABCD over [x - b, x + b]; y3 lies between x - b and x, far from x; y4 lies between x and x + b, close to x.]
Often it makes sense to give \(y_4\) more weight than \(y_3\). For example, \(x\) is the location of the house you want to buy; \(y_3\) and \(y_4\) are the locations of two similar houses in your neighborhood. It makes intuitive sense to give more weight to the house located at \(y_4\) than to the one located at \(y_3\). If the house located at \(y_3\) was sold at $200,000 and the house located at \(y_4\) was sold at $198,000, we might want to assign 40% weight to the house located at \(y_3\) and 60% to the one located at \(y_4\). Then the estimated fair price of the house located at \(x\) is:
60% * Price of the house located at y4 + 40% * Price of the house located at y3
= 60% * 198,000 + 40% * 200,000 = $198,800
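Let's make sense of the triangular kernel formulas for \(k_y(x)\) and \(K_y(x)\). First, let's look at \(k_y(x)\):
\[
k_y(x) =
\begin{cases}
0 & \text{if } x < y - b \\
\dfrac{b + x - y}{b^2} & \text{if } y - b \le x \le y \\
\dfrac{b + y - x}{b^2} & \text{if } y \le x \le y + b \\
0 & \text{if } x > y + b
\end{cases}
\]
Written in terms of where the data point \(y\) sits relative to \(x\), the same formula is:
\[
k_y(x) =
\begin{cases}
0 & \text{if } |x - y| > b \\
\dfrac{b + x - y}{b^2} & \text{if } x \le y \le x + b \\
\dfrac{b + y - x}{b^2} & \text{if } x - b \le y \le x
\end{cases}
\]
[Diagram: triangle ABD with base AB over the neighborhood [x - b, x + b] and apex D directly above C, the point x in the middle of AB. Data point y2 sits at E between A and C, with vertical height EF up to side AD. Data point y3 sits at G between C and B, with vertical height GH up to side BD. Data points y1 (left of A) and y4 (right of B) lie outside the neighborhood.]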
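Now let's find \(k_y\) when the data point \(y\) is in the neighborhood \([x - b,\, x + b]\). Data points \(y_2\) and \(y_3\) are in the neighborhood, and their weights are equal to the heights EF and GH respectively.
Before calculating EF and GH, let me give you a preliminary high school math formula. This formula is used over and over in triangular kernel smoothing. In a triangle ABC, let D lie on side AC and E lie on side BC with DE parallel to AB. Then:
\[
\frac{DE}{AB} = \frac{EC}{BC}, \qquad DE = \frac{EC}{BC}\, AB
\]
\[
\frac{DEC}{ABC} = \frac{\tfrac12\, DE \times EC}{\tfrac12\, AB \times BC} = \frac{DE}{AB}\cdot\frac{EC}{BC} = \left(\frac{EC}{BC}\right)^2, \qquad DEC = \left(\frac{EC}{BC}\right)^2 ABC
\]
where DEC represents the area of triangle DEC and ABC the area of triangle ABC.
If you don't understand why \(\dfrac{DE}{AB} = \dfrac{EC}{BC}\) and \(\dfrac{DEC}{ABC} = \left(\dfrac{EC}{BC}\right)^2\), you'll want to review high school geometry.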
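Now let's come back to the triangle diagram and calculate EF and GH. EF is the weight assigned to the data point \(y_2\). GH is the weight assigned to the data point \(y_3\).
[Diagram: triangle ABD over [x - b, x + b] with apex D above C (the point x); height EF at y2 between A and C; height GH at y3 between C and B; y1 and y4 outside the neighborhood.]
First, please note that the area of the triangle ABD represents the total weight assigned to all the data points in the neighborhood [A, B]. So the area of the triangle ABD should be one:
\[
ABD = 0.5 \times AB \times CD = 1. \quad\text{However, } AB = 2b, \text{ so } 0.5 \times 2b \times CD = 1 \text{ and } CD = \frac{1}{b}.
\]
\[
\frac{EF}{CD} = \frac{AE}{AC}, \qquad EF = \frac{AE}{AC}\, CD = \frac{y_2 - (x - b)}{b}\cdot\frac{1}{b} = \frac{b + y_2 - x}{b^2} \quad\text{if } y_2 \in [x - b,\, x];
\]
\[
\frac{GH}{CD} = \frac{BG}{BC}, \qquad GH = \frac{BG}{BC}\, CD = \frac{x + b - y_3}{b}\cdot\frac{1}{b} = \frac{b + x - y_3}{b^2} \quad\text{if } y_3 \in [x,\, x + b].
\]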
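Next, let's look at the triangular kernel formula for \(K_y(x)\):
\[
K_y(x) =
\begin{cases}
0 & \text{if } x < y - b \\
\dfrac{(b + x - y)^2}{2b^2} & \text{if } y - b \le x \le y \\
1 - \dfrac{(b + y - x)^2}{2b^2} & \text{if } y \le x \le y + b \\
1 & \text{if } x > y + b
\end{cases}
\]
Written in terms of where the data point \(y\) sits relative to \(x\), the same formula is:
\[
K_y(x) =
\begin{cases}
1 & \text{if } y \in (-\infty,\, x - b) \\
1 - \dfrac{(b + y - x)^2}{2b^2} & \text{if } y \in [x - b,\, x] \\
\dfrac{(b + x - y)^2}{2b^2} & \text{if } y \in [x,\, x + b] \\
0 & \text{if } y \in (x + b,\, +\infty)
\end{cases}
\]
[Diagram: triangle ABD over [x - b, x + b] with apex D above C (the point x); height EF at y2 between A and C; height GH at y3 between C and B; y1 and y4 outside the neighborhood.]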
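Situation One  If \(y \in [x,\, x + b]\)
Draw a vertical line at the data point \(y\) (Line GH). Next, imagine that you use a pair of scissors and cut off what's to the left of Line GH while keeping what's to the right of Line GH. Next, calculate the area of the triangle ABD remaining after the cut. This remaining area after the cut is \(K_y(x)\).
[Diagram: triangle ABD with the vertical line GH at y between C and B. After the cut, only the small triangle GBH over [y, x + b] remains.]
Using the similar-triangle area formula, with triangle BCD being half of ABD:
\[
K_y(x) = GBH = \left(\frac{BG}{BC}\right)^2 BCD = \left(\frac{x + b - y}{b}\right)^2 \frac12 = \frac{(b + x - y)^2}{2b^2}
\]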
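Situation Two  If \(y \in [x - b,\, x]\)
[Diagram: triangle ABD with the vertical line EF at y between A and C.]
Draw a vertical line at data point \(y\) (Line EF). Cut off what's to the left of EF.
[Diagram: the surviving region BDFE over [y, x + b].]
\[
K_y(x) = BDFE = 1 - AEF
\]
\[
AEF = \left(\frac{AE}{AC}\right)^2 ACD = \left(\frac{y - (x - b)}{b}\right)^2 \frac12 = \frac{(b + y - x)^2}{2b^2}
\]
\[
K_y(x) = BDFE = 1 - \frac{(b + y - x)^2}{2b^2}
\]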
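Situation Three  If \(y \in (-\infty,\, x - b)\)
[Diagram: triangle ABD over [x - b, x + b]; the data point y sits at M, to the left of A, with the vertical line MN.]
Draw a vertical line MN at data point \(y\). Cut off what's to the left of line MN. Now the whole area ABD will survive the cut. So \(K_y(x) = 1\).
Situation Four  If \(y \in (x + b,\, +\infty)\)
[Diagram: triangle ABD over [x - b, x + b]; the data point y sits at R, to the right of B, with the vertical line RS.]
Draw a vertical line RS at data point \(y\). Cut off what's to the left of line RS. Now the whole area ABD will be cut off. So \(K_y(x) = 0\).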
Now you see that you really don't need to memorize the complex formulas for \(K_y(x)\). Just draw a diagram and directly calculate \(K_y(x)\).
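The triangle heights and the surviving areas above translate directly into code. The following is a minimal Python sketch; the function names are invented for illustration, and the two small samples are used only to exercise the formulas.

```python
def triangular_k(x, y, b):
    """Height of the triangle over [x - b, x + b] at the data point y."""
    d = abs(x - y)
    return (b - d) / b**2 if d <= b else 0.0


def triangular_K(x, y, b):
    """Area of triangle ABD that survives after cutting off everything left of y."""
    if y < x - b:
        return 1.0                                   # whole triangle survives
    if y > x + b:
        return 0.0                                   # whole triangle is cut off
    if y >= x:
        return (b + x - y) ** 2 / (2 * b**2)         # small right-hand triangle GBH
    return 1.0 - (b + y - x) ** 2 / (2 * b**2)       # everything except triangle AEF


def kernel_estimate(x, data, b, weight):
    """Average the kernel weights, giving each data point empirical probability 1/n."""
    return sum(weight(x, y, b) for y in data) / len(data)


# Illustrative samples (assumptions for this sketch), both with bandwidth b = 2
data12 = [1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12]
f6 = kernel_estimate(6, data12, 2, triangular_k)
F6 = kernel_estimate(6, data12, 2, triangular_K)

lives = [2, 3, 3, 3, 7]
f25 = kernel_estimate(2.5, lives, 2, triangular_k)
print(round(f6, 4), round(F6, 4), round(f25, 4))
```

Writing the density weight as (b - |x - y|) / b^2 covers both sides of the triangle with one expression, since the two branches of the formula differ only in the sign of x - y.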
Gamma kernel
To understand the gamma kernel, you'll need to know this: in kernel smoothing, all the weights should add up to one. Because of this, for convenience, we can use a density function as weights. This way, the weights automatically add up to one.
In the gamma kernel, we just use the gamma pdf
\[
f(x) = \frac{(x/\theta)^{\alpha}\, e^{-x/\theta}}{x\,\Gamma(\alpha)} .
\]
However, we set \(\theta = \dfrac{y}{\alpha}\):
\[
k_y(x) = \frac{(x/\theta)^{\alpha}\, e^{-x/\theta}}{x\,\Gamma(\alpha)} = \frac{x^{\alpha - 1} e^{-x/\theta}}{\theta^{\alpha}\,\Gamma(\alpha)} = \frac{x^{\alpha - 1} e^{-x\alpha/y}}{\left(\dfrac{y}{\alpha}\right)^{\alpha} \Gamma(\alpha)}
\]
The simplest gamma pdf is when \(\alpha = 1\) (i.e. the exponential pdf). So the simplest gamma kernel is an exponential kernel:
\[
k_y(x) = \frac{1}{y}\, e^{-x/y}, \quad\text{where } x > 0
\]
If you need to find the exponential kernel for \(\hat F(x)\), then
\[
K_y(x) = \int_0^x k_y(t)\,dt = 1 - e^{-x/y}.
\]
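As a quick numerical companion to the exponential kernel, here is a short Python sketch. The function names and the 12-point sample are assumptions made for illustration; only the two formulas come from the text above.

```python
import math


def exp_k(x, y):
    """Exponential (gamma with alpha = 1) kernel weight for the density at x."""
    return math.exp(-x / y) / y


def exp_K(x, y):
    """Exponential kernel weight for the distribution function at x."""
    return 1.0 - math.exp(-x / y)


# Illustrative sample (assumption for this sketch); each point has probability 1/n
data = [1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12]
n = len(data)
f6 = sum(exp_k(6, y) for y in data) / n
F6 = sum(exp_K(6, y) for y in data) / n
print(round(f6, 4), round(F6, 4))
```

Unlike the uniform and triangular kernels, every data point contributes here, because the exponential density is positive for all x > 0; there is no bandwidth to choose.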
Problem 1
1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12
Solution
\[
\hat f(6) = \sum p(y)\, k_y(6) = \frac{1}{12}\cdot\frac14 + \frac{1}{12}\cdot\frac14 + \frac{1}{12}\cdot\frac14 + \frac{1}{12}\cdot\frac14 = \frac{1}{12}
\]
In the calculation of \(\hat F(6)\), any data point that falls below the lower bound or touches the lower bound of the neighborhood [4, 8] gets a full weight of 1. Data points 1, 2, 3, 3 are below the lower bound of the neighborhood [4, 8], so each gets a weight of 1. Any data point that falls above the upper bound or touches the upper bound of the neighborhood [4, 8] gets zero weight. So 8 (touching the upper bound) and 9, 9, 11, 12 (above the upper bound) each get zero weight.
Data points y = 5, 6, 7 are inside the neighborhood [4, 8]. If you draw a diagram, you'll find that the weights for y = 5, 6, 7 are:
\[
K_5(6) = \frac34, \qquad K_6(6) = \frac24, \qquad K_7(6) = \frac14
\]
\[
\hat F(6) = \sum p(y)\, K_y(6) = \frac{1}{12}(1) + \frac{1}{12}(1) + \frac{1}{12}(1) + \frac{1}{12}(1) + \frac{1}{12}\cdot\frac34 + \frac{1}{12}\cdot\frac24 + \frac{1}{12}\cdot\frac14 \approx 0.4583
\]
y        1     2     3     3     5     6     7     8     9     9     11    12
p(y)     1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12  1/12
k_y(6)   0     0     0     0     1/4   1/4   1/4   1/4   0     0     0     0
K_y(6)   1     1     1     1     3/4   2/4   1/4   0     0     0     0     0
Problem 2
1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12
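Solution
\[
\hat f(6) = \sum p(y)\, k_y(6) = \frac{1}{12}\cdot\frac14 + \frac{1}{12}\cdot\frac12 + \frac{1}{12}\cdot\frac14 = \frac{1}{12}
\]
\[
\hat F(6) = \sum p(y)\, K_y(6) = \frac{1}{12}(1) + \frac{1}{12}(1) + \frac{1}{12}(1) + \frac{1}{12}(1) + \frac{1}{12}\cdot\frac78 + \frac{1}{12}\cdot\frac12 + \frac{1}{12}\cdot\frac18 = \frac{5.5}{12} \approx 0.4583
\]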
Problem 3
1, 2, 3, 3, 5, 6, 7, 8, 9, 9, 11, 12
Solution
Gamma kernel with \(\alpha = 1\):
\[
k_y(x) = \frac{1}{y}\, e^{-x/y}, \qquad K_y(x) = \int_0^x k_y(t)\,dt = 1 - e^{-x/y}
\]
\[
\begin{aligned}
\hat f(6) = \sum p(y)\, k_y(6)
&= \frac{1}{12}\Big( \frac11 e^{-6/1} + \frac12 e^{-6/2} + \frac13 e^{-6/3} + \frac13 e^{-6/3} + \frac15 e^{-6/5} + \frac16 e^{-6/6} \\
&\qquad + \frac17 e^{-6/7} + \frac18 e^{-6/8} + \frac19 e^{-6/9} + \frac19 e^{-6/9} + \frac1{11} e^{-6/11} + \frac1{12} e^{-6/12} \Big) \\
&\approx 0.0480
\end{aligned}
\]
\[
\begin{aligned}
\hat F(6) = \sum p(y)\, K_y(6)
&= \frac{1}{12}\Big[ (1 - e^{-6/1}) + (1 - e^{-6/2}) + (1 - e^{-6/3}) + (1 - e^{-6/3}) + (1 - e^{-6/5}) + (1 - e^{-6/6}) \\
&\qquad + (1 - e^{-6/7}) + (1 - e^{-6/8}) + (1 - e^{-6/9}) + (1 - e^{-6/9}) + (1 - e^{-6/11}) + (1 - e^{-6/12}) \Big] \\
&\approx 0.658
\end{aligned}
\]
You study five lives to estimate the time from the onset of a disease to death. The times to death are:
2 3 3 3 7
Using a triangular kernel with bandwidth 2, estimate the density function at 2.5.
Solution
The neighborhood is [0.5, 4.5]. If you draw a neighborhood diagram, you should get:
y          2       3       3       3       7
p(y)       1/5     1/5     1/5     1/5     1/5
k_y(2.5)   1.5/4   1.5/4   1.5/4   1.5/4   0
\[
\hat f(2.5) = \sum p(y)\, k_y(2.5) = 4 \times \frac15 \times \frac{1.5}{4} = 0.3
\]
Calculate the kernel estimate of the distribution function, \(\hat F(4)\), using the uniform kernel with bandwidth 1.4.
Solution
[Diagram: the neighborhood [2.6, 5.4] drawn as a rectangle of total area 1, with vertical lines AG, BH, CI, DJ, EK, FL at 2, 2.6, 3.3, 4, 4.7, 5.4 respectively (A to F on the bottom, G to L on top).]
If you use scissors to cut what's to the left of the line CI at y = 3.3, the surviving area is CFLI. Area CFLI = 0.75. So \(K_{y=3.3}(4) = 0.75\).
If you use scissors to cut what's to the left of the line DJ at y = 4, the surviving area is DFLJ, which is 0.5. So \(K_{y=4}(4) = 0.5\).
If you use scissors to cut what's to the left of the line EK at y = 4.7, the surviving area is EFLK, which is 0.25. So \(K_{y=4.7}(4) = 0.25\).
\[
\hat F(4) = \sum p(y)\, K_y(4) = \frac18 (1) + \frac18 (0.75)(2) + \frac18 (0.5)(2) + \frac18 (0.25)(3) = 0.53125
\]
Loss Models doesn't explain bootstrap much. As a result, many candidates just memorize a black-box formula without understanding the essence of bootstrap.
Let me explain bootstrap with an example. Suppose you want to find out the mean and variance of the GRE scores of a group of 5,000 students. One way to do so is to take a lot of random samples. For example, you can sample 20 students' GRE scores and calculate the mean and variance of these scores. Here you have one sample of size 20. Of course, you want to take many samples. For example, you can take 30 samples, each consisting of 20 students' GRE scores. For each of the 30 samples, you can calculate the mean and variance of the GRE scores.
As you can see, taking 30 samples of size 20 takes a lot of time and money. As a research scientist, you are short of research grants. And your life is busy. Is there any way you can cut some corners?
You can cut corners this way. Instead of taking 30 samples of size 20, you take just one sample of size 20 and collect 20 students' GRE scores. These 20 scores are \(X_1, X_2, \ldots, X_{20}\). You bring these 20 scores home. Your data collection is done.
Next, you reproduce 30 samples of size 20 each from your one sample of size 20. How? Just resample from your one sample of 20 scores. You randomly select 20 scores with replacement from the 20 scores you have. This is your 1st resample. Next, you again randomly select 20 scores with replacement from the 20 scores you have. This is your 2nd resample. If you repeat this process 30 times, you'll get 30 resamples of size 20 each. If you repeat this process 100 times, you'll get 100 resamples of size 20 each. Now your original one sample gives birth to many resamples. How wonderful.
The rest is easy. If you have 30 resamples, you can calculate the mean and variance of the GRE scores for each resample. This should give you a good idea of the mean and variance of the GRE scores.
Does this sound like a fraud? Not really. Your original sample of size 20, \(X_1, X_2, \ldots, X_{20}\), reflects the population. As a result, resamples from this sample are pretty much what you would get if you took many samples from the population. (By the way, the name bootstrap comes from the phrase "to pull oneself up by one's bootstraps.")
To use bootstrap, you'll need a computer and some bootstrapping software to quickly create a great number (such as 10,000) of resamples and to calculate the statistics of the resamples. Bootstrap is a computer-intensive technique.
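The resampling procedure described above needs only a few lines of code. Here is a minimal Python sketch; the 20 scores are made-up numbers, and the function name and the fixed seed are choices made purely for illustration.

```python
import random
import statistics


def bootstrap_stats(sample, n_resamples, seed=0):
    """Draw resamples with replacement and return each resample's (mean, variance)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    results = []
    for _ in range(n_resamples):
        # Same size as the original sample, drawn with replacement
        resample = rng.choices(sample, k=len(sample))
        results.append((statistics.mean(resample), statistics.pvariance(resample)))
    return results


# Hypothetical sample of 20 GRE scores (invented for this sketch)
scores = [520, 540, 560, 580, 590, 600, 610, 620, 630, 640,
          650, 660, 670, 680, 690, 700, 710, 730, 750, 780]

stats = bootstrap_stats(scores, n_resamples=30)
means = [m for m, _ in stats]
print("average of resample means:", statistics.mean(means))
print("spread of resample means:", statistics.stdev(means))
```

Raising `n_resamples` from 30 to 10,000 changes nothing in the code, which is why the technique is practical only with a computer.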
For more information on bootstrap, you can download the free PDF file at
http://bcs.whfreeman.com/pbs/cat_160/PBS18.pdf
You are given the sample: 1, 3.
You estimate \(\theta(F) = Var(X)\) using the estimator
\[
g(X_1, X_2) = \frac12 \sum_{i=1}^{2} \left(X_i - \bar X\right)^2, \quad\text{where } \bar X = \frac{X_1 + X_2}{2}.
\]
Determine the bootstrap approximation to the mean square error.
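Solution
The empirical distribution puts probability 1/2 on each of the values 1 and 3, so
\[
\theta = Var(X) = E(X^2) - E^2(X) = \frac12\left(1^2 + 3^2\right) - \left[\frac12 (1 + 3)\right]^2 = 5 - 4 = 1
\]
Under the bootstrap method, you resample from your original sample with replacement. Your resamples are (1,1), (1,3), (3,1), and (3,3), each having a probability of 1/4.
For each resample, you calculate \(g(X_1, X_2) = \frac12 \sum_{i=1}^{2} (X_i - \bar X)^2\). Then the mean square error is \(E\left[(g - \theta)^2\right]\) taken over the four resamples.
Resample (X1, X2)   X-bar = (X1 + X2)/2   g(X1, X2) = (1/2) * sum of (Xi - X-bar)^2
(1,1)               1                     (1/2)[(1-1)^2 + (1-1)^2] = 0
(1,3)               2                     (1/2)[(1-2)^2 + (3-2)^2] = 1
(3,1)               2                     (1/2)[(3-2)^2 + (1-2)^2] = 1
(3,3)               3                     (1/2)[(3-3)^2 + (3-3)^2] = 0
\[
MSE = \frac14 (0 - 1)^2 + \frac14 (1 - 1)^2 + \frac14 (1 - 1)^2 + \frac14 (0 - 1)^2 = \frac12
\]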
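With a sample this small, every resample can be listed, so the bootstrap mean square error can be computed exactly instead of by random resampling. The sketch below does that in Python with exact fractions. The function names are invented for illustration, and it assumes the original sample is (1, 3), as the four resamples listed above imply.

```python
from fractions import Fraction
from itertools import product


def g_half(xs):
    """Estimator from the problem: (1/n) * sum of squared deviations from the resample mean."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def exact_bootstrap_mse(sample, estimator, theta):
    """Average of (estimator - theta)^2 over all equally likely resamples."""
    resamples = list(product(sample, repeat=len(sample)))
    return sum((estimator(r) - theta) ** 2 for r in resamples) / len(resamples)


sample = (1, 3)
theta = g_half(sample)          # variance of the empirical distribution of the sample
mse = exact_bootstrap_mse(sample, g_half, theta)
print(theta, mse)
```

`itertools.product(sample, repeat=n)` generates all n^n ordered resamples, which matches the "fill the boxes from left to right" way of counting and is practical only for very small n.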
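Using the same sample as in the previous problem (1, 3), you estimate \(\theta(F) = Var(X)\) using the estimator
\[
g(X_1, X_2) = \sum_{i=1}^{2} \left(X_i - \bar X\right)^2, \quad\text{where } \bar X = \frac{X_1 + X_2}{2}.
\]
Determine the bootstrap approximation to the mean square error.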
Solution
The only difference between this problem and the previous problem (May 2000 #17) is the definition of \(g(X_1, X_2)\). In this problem, \(g(X_1, X_2) = \sum_{i=1}^{2} (X_i - \bar X)^2\); in the previous problem, \(g(X_1, X_2) = \frac12 \sum_{i=1}^{2} (X_i - \bar X)^2\).
\[
\theta = Var(X) = E(X^2) - E^2(X) = \frac12\left(1^2 + 3^2\right) - \left[\frac12 (1 + 3)\right]^2 = 1
\]
For each resample, you calculate \(g(X_1, X_2) = \sum_{i=1}^{2} (X_i - \bar X)^2\). Then the mean square error is \(E\left[(g - \theta)^2\right]\) taken over the four resamples.
Resample (X1, X2)   X-bar = (X1 + X2)/2   g(X1, X2) = sum of (Xi - X-bar)^2
(1,1)               1                     (1-1)^2 + (1-1)^2 = 0
(1,3)               2                     (1-2)^2 + (3-2)^2 = 2
(3,1)               2                     (3-2)^2 + (1-2)^2 = 2
(3,3)               3                     (3-3)^2 + (3-3)^2 = 0
\[
MSE = \frac14 (0 - 1)^2 + \frac14 (2 - 1)^2 + \frac14 (2 - 1)^2 + \frac14 (0 - 1)^2 = 1
\]
May 2005 #4
\[
g(X_1, X_2, X_3) = \frac13 \sum_{i=1}^{3} \left(X_i - \bar X\right)^3
\]
Solution
First, you need to understand that the \(n\)-th central moment is \(E\left\{[X - E(X)]^n\right\}\). For example, the 1st central moment is \(E[X - E(X)] = 0\).
For the original sample (1, 1, 4):
\[
\bar X = \frac{1 + 1 + 4}{3} = 2, \qquad E\left\{[X - E(X)]^3\right\} = \frac13 (1 - 2)^3 + \frac13 (1 - 2)^3 + \frac13 (4 - 2)^3 = 2
\]
The third central moment of this original sample is used to approximate the true 3rd central moment of the population. So the true parameter is \(\theta = 2\).
Next, you need to understand bootstrap. Under bootstrap, you resample from the original sample with replacement. Imagine you have 3 boxes to fill from left to right. The 1st box can be filled with any number of your original sample (1,1,4); the 2nd box can be filled with any number of your original sample (1,1,4); and the 3rd box can be filled with any number of your original sample (1,1,4). The number of resamples is \(3^3 = 27\). This is a concept from Exam P.
(1) Three 1's. The number of permutations is 8. To understand why, let's denote the original sample as (a,b,c) with a=1, b=1, and c=4. Then the following 8 resamples will produce (1,1,1): aaa, aab, aba, baa, bba, bab, abb, bbb. For the resample (1,1,1),
\[
\bar X = \frac{1 + 1 + 1}{3} = 1, \qquad \hat\theta = E\left\{[X - E(X)]^3\right\} = \frac13 (1 - 1)^3 + \frac13 (1 - 1)^3 + \frac13 (1 - 1)^3 = 0,
\]
\[
\left(\hat\theta - \theta\right)^2 = (0 - 2)^2 = 4
\]
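(2) Two 1's and one 4. The following 12 permutations will produce two 1's and one 4: aac, aca, caa, bbc, bcb, cbb, abc, acb, cab, bac, bca, cba.
\[
\bar X = \frac{1 + 1 + 4}{3} = 2, \qquad \hat\theta = E\left\{[X - E(X)]^3\right\} = \frac13 (1 - 2)^3 + \frac13 (1 - 2)^3 + \frac13 (4 - 2)^3 = 2,
\]
\[
\left(\hat\theta - \theta\right)^2 = (2 - 2)^2 = 0
\]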
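(3) Two 4's and one 1. The following 6 permutations will produce two 4's and one 1: acc, cac, cca, bcc, cbc, ccb.
\[
\bar X = \frac{1 + 4 + 4}{3} = 3, \qquad \hat\theta = E\left\{[X - E(X)]^3\right\} = \frac13 (1 - 3)^3 + \frac13 (4 - 3)^3 + \frac13 (4 - 3)^3 = -2,
\]
\[
\left(\hat\theta - \theta\right)^2 = (-2 - 2)^2 = 16
\]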
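(4) Three 4's. The following 1 permutation will produce three 4's: ccc.
\[
\bar X = \frac{4 + 4 + 4}{3} = 4, \qquad \hat\theta = E\left\{[X - E(X)]^3\right\} = \frac13 (4 - 4)^3 + \frac13 (4 - 4)^3 + \frac13 (4 - 4)^3 = 0,
\]
\[
\left(\hat\theta - \theta\right)^2 = (0 - 2)^2 = 4
\]
Finally, the bootstrap approximation to the mean square error is:
\[
E\left[\left(\hat\theta - \theta\right)^2\right] = \frac{8}{27}(4) + \frac{12}{27}(0) + \frac{6}{27}(16) + \frac{1}{27}(4) = \frac{132}{27} \approx 4.9
\]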
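The four cases above can be cross-checked by brute force, since there are only 27 ordered resamples. The following Python sketch enumerates them with exact fractions; the function name is invented for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def third_central_moment(xs):
    """(1/n) * sum of cubed deviations from the mean of xs."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 3 for x in xs) / len(xs)


sample = (1, 1, 4)
theta = third_central_moment(sample)

# All 3^3 ordered resamples; the two 1's are treated as distinct positions (a and b)
resamples = list(product(sample, repeat=3))

# Count how many resamples fall in each case, keyed by the number of 4's drawn
case_counts = Counter(r.count(4) for r in resamples)

mse = sum((third_central_moment(r) - theta) ** 2 for r in resamples) / len(resamples)
print(theta, dict(case_counts), mse, float(mse))
```

The counter groups the resamples by how many 4's they contain, which reproduces the 8, 12, 6, and 1 permutation counts used in the weighted average.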
A sample of claim amounts is {300, 600, 1500}. By applying the deductible to this
sample, the loss elimination ratio for a deductible of 100 per claim is estimated to be
0.125.
Determine the bootstrap approximation to the mean square error of the estimate.
Solution
Your original sample is {300, 600, 1500}. If you resample this sample with replacement, you'll get \(3^3 = 27\) resamples. However, calculating the mean square error based on 27 resamples is too much work under exam conditions. That's why the SOA gives you only 10 resamples.
The loss elimination ratio is
\[
LER_X(d) = \frac{E\left[\min(X, d)\right]}{E(X)} .
\]
The loss elimination ratio for the original sample {300, 600, 1500} with a deductible of 100 is 0.125. The SOA already gives this ratio. If we need to calculate it, this is how:
For the loss amount 300, the insurer pays only 200, saving 100.
For the loss amount 600, the insurer pays only 500, saving 100.
For the loss amount 1500, the insurer pays only 1400, saving 100.
The expected saving due to the 100 deductible is: \(\frac13 (100 + 100 + 100) = 100\)
The expected loss amount is: \(\frac13 (300 + 600 + 1500) = 100 + 200 + 500 = 800\)
So the loss elimination ratio is: 100 / 800 = 0.125
Next, for each of the 10 resamples, you calculate the loss elimination ratio as we did for the original sample. To speed up the calculation, let's set $100 as one unit of money. Then the deductible is one.
Resample   X1   X2   X3   LER    (LER - 0.125)^2
1          6    6    15   1/9    0.000193
2          15   3    15   1/11   0.001162
3          15   3    6    1/8    0
4          6    6    3    1/5    0.005625
5          6    3    15   1/8    0
6          6    6    15   1/9    0.000193
7          15   15   15   1/15   0.003403
8          15   3    15   1/11   0.001162
9          3    6    3    1/4    0.015625
10         6    6    6    1/6    0.001736
Total                            0.0291
For example, for the 1st resample {6,6,15}, the claim payment after the deductible of 1 is {5,5,14}. So the LER is (1+1+1) / (6+6+15) = 3/27 = 1/9.
\[
MSE = \frac{1}{10} \sum_{i=1}^{10} \left(LER_i - 0.125\right)^2 = \frac{0.0291}{10} = 0.0029
\]
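The table above is easy to reproduce in code. Here is a minimal Python sketch that recomputes the loss elimination ratio for each of the 10 resamples and averages the squared errors. The function name is invented for illustration, and the resamples are typed in from the table, converted back to the original money units.

```python
def loss_elimination_ratio(claims, deductible):
    """Empirical LER: total amount eliminated by the deductible over total losses."""
    return sum(min(x, deductible) for x in claims) / sum(claims)


original = (300, 600, 1500)
deductible = 100
ler_original = loss_elimination_ratio(original, deductible)

# The 10 resamples from the table, in the original money units
resamples = [
    (600, 600, 1500), (1500, 300, 1500), (1500, 300, 600), (600, 600, 300),
    (600, 300, 1500), (600, 600, 1500), (1500, 1500, 1500), (1500, 300, 1500),
    (300, 600, 300), (600, 600, 600),
]

squared_errors = [
    (loss_elimination_ratio(r, deductible) - ler_original) ** 2 for r in resamples
]
mse = sum(squared_errors) / len(resamples)
print(ler_original, round(mse, 4))
```

Using `min(x, deductible)` rather than a flat 100 per claim keeps the function correct for claims smaller than the deductible, even though none occur in this sample.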
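\[
Z = \frac{n}{n + k}, \qquad k = \frac{E_\Theta\left[Var(X \mid \Theta)\right]}{Var_\Theta\left[E(X \mid \Theta)\right]}, \qquad\text{and}\qquad P = (1 - Z)\,\mu + Z \bar X
\]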
Rote memorization of a formula without fully grasping the concepts is tedious, difficult,
and prone to errors. Additionally, a memorized formula will not yield the needed
understanding to grapple with difficult problems.
In this chapter, we're going to dig deep into Bühlmann's credibility premium formula and gain a crystal clear understanding of the concepts.
Let's start with a simple example to illustrate one major challenge an insurance company faces when determining premium rates. Imagine you are the founder and the actuary of an auto insurance company. Your company's specialty is to provide auto insurance for taxi drivers.
Before you open your business, there are half a dozen insurance companies in your area that offer auto insurance to taxi drivers. The world has been going on fine for many years without your startup. It can continue going on without your startup. So it's tough for you to get customers. Finally, you take a big portion out of your savings account and buy TV advertising, which brings in your first three customers: Adam, Bob, and Colleen. Since your corporate office is your garage and you have only one employee (you), you decide that three customers are good enough for you to start your business.
When you open your business at t = 0 , you sell three auto insurance policies to Adam,
Bob, and Colleen. The contract of your insurance policy says that the premium rate is
guaranteed for only two years. Once the two-year guarantee period is over, you have the
right to set the renewal premium, which can be higher than the guaranteed initial
premium.
When you set your premium rate at t = 0 , you notice that Adam, Bob, and Colleen are
similar in many ways. They are all taxicab drivers. They work at the same taxi company
in the same city. They are all 35 years old. They all graduated from the same high school.
To actually set the initial premium for the first two years, you decide to buy a rate book
from a consulting firm. This consulting firm is well-known in the industry. Each year it
publishes a rate manual that lists the average claim cost of a taxi driver by city, by
mileage and by several other criteria. Based on this rate manual, you estimate that Adam,
Bob, and Colleen may each incur $4 claim cost per year. So at t = 0 , you charge Adam,
Bob, and Colleen $4 each. This premium rate is guaranteed for two years.
During the 2-year guarantee period, Adam, Bob, and Colleen have incurred the following claims:
              Year 1 Claim   Year 2 Claim   Total Claim   Average claim per insured per year
Adam          $0             $0             $0            $0 / 2 = $0
Bob           $1             $7             $8            $8 / 2 = $4
Colleen       $4             $9             $13           $13 / 2 = $6.5
Grand Total                                 $21
Average claim per person per year (for the 3-person group): $21 / (3 x 2) = $3.5
Now the two-year guarantee period is over. You need to determine the renewal premium rate for Adam, Bob, and Colleen respectively for the third year. Once you have determined the premium rates, you will need to file these rates with the insurance department of the state where you do business (called the domicile state).
Question: How do you determine the renewal premium rate for the third year for Adam,
Bob, and Colleen respectively?
One simple approach is to charge Adam, Bob, and Colleen a uniform rate (i.e. the group
premium rate). After all, Adam, Bob, and Colleen are similar risks; they form a
homogeneous group. As such, they should pay a uniform group premium rate, even
though their actual claim patterns for the past two years are different. You can continue
charging them the old rate of $4 per insured per year. However, since the average claim
cost for the past two years is $3.50 per insured per year, you can charge them $3.50 per
person for year three.
Under the uniform group rate of $3.50, Bob and Colleen will probably underpay their
premiums; their actual average annual claim for the past two years exceeds this group
premium rate. Adam, on the other hand, may overpay his premiums; his average annual
claim for the past two years is below the group premium rate. When you charge each policyholder the uniform group premium rate, low-risk policyholders will overpay their premiums and high-risk policyholders will underpay their premiums. Your business as a whole, however, will collect just enough premiums to pay the claim costs.
To stay in business, you have no choice but to charge individualized premium rates that are proportional to policyholders' risks.
Now let's come back to our simple case. We know that uniform rating won't work in the real world. We'll want to set up a mathematical model to calculate the fair renewal premium rate for Adam, Bob, and Colleen respectively. Our model should reflect the following observations and intuition:
Adam, Bob, and Colleen are largely similar risks. We'll need to treat them as a rating group. This way, our renewal rates for Adam, Bob, and Colleen are somewhat related.
On the other hand, we need to differentiate between Adam, Bob, and Colleen. We might want to treat Adam, Bob, and Colleen as potentially different sub-risks within a largely similar rate group. This way, our model will produce different renewal rates. We hope the renewal rates calculated from our model will agree with our intuition that Adam deserves the lowest renewal rate, Bob a higher rate, and Colleen the highest rate.
To reflect the idea that Adam, Bob, and Colleen are different sub-risks within a largely similar rate group, we may want to divide the largely similar rate group into four sub-risks (or more sub-risks if you like): super preferred, preferred, standard, and sub-standard. So the rate group actually consists of four sub-risks. Adam, Bob, or Colleen can be any one of the four sub-risks.
To visualize that Adam's sub-risk class is a random variable, think about rolling a 4-sided die. One side of the die is marked with the letters "SP" (super preferred); another side is marked with "PF" (preferred); the third side is marked with "STD" (standard); and the fourth side is marked with "SUB" (sub-standard). To determine which sub-class Adam belongs to, we'll roll the die. If the result is "SP," then we'll assign Adam to the super preferred class. If the result is "PF," we'll assign him to the preferred class. And so on and so forth. Similarly, we can roll the die and randomly assign Bob or Colleen to one of the four sub-classes: SP, PF, STD, and SUB.
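Now we are ready to come up with a model to calculate the renewal premium rate:
Let random variable \(X_{jt}\) represent the claim cost incurred in year \(t\) by the \(j\)-th insured, where \(t = 1, 2, \ldots, n, n + 1\) and \(j = 1, 2, \ldots, m\). Here in our example, \(n = 2\) (we have two years of claim data) and \(m = 3\) (\(j = 1, 2, 3\) corresponding to Adam, Bob, and Colleen).
The estimated value of \(X_{j,\,n+1}\) is the pure renewal premium for year \(n + 1\). Bühlmann's approach is to use \(a + Z \bar X_j\) to approximate \(X_{j,\,n+1}\) subject to the condition that
\[
E\left[\left(a + Z \bar X_j - X_{j,\,n+1}\right)^2\right]
\]
is minimized. The result is:
\[
a + Z \bar X_j = (1 - Z)\,\mu + Z \bar X_j,
\]
\[
Z = \frac{n}{n + k}, \qquad k = \frac{E_\Theta\left[Var(X_{jt} \mid \Theta)\right]}{Var_\Theta\left[E(X_{jt} \mid \Theta)\right]} = \frac{E_\Theta\left[Var(X_{jt} \mid \Theta)\right]}{Var_\Theta\left[\mu(\Theta)\right]},
\]
\[
\mu = E(X_{jt}) = E_\Theta\left[E(X_{jt} \mid \Theta)\right] = E_\Theta\left[\mu(\Theta)\right],
\]
where \(\Theta\) is the (unknown) risk class of the insured and \(\mu(\Theta) = E(X_{jt} \mid \Theta)\).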
Next, we'll derive the above formulas. However, before we derive the Bühlmann premium formulas, let's go over some preliminary concepts.
The first concept is the double expectation theorem: \(E(X) = E_Y\left[E(X \mid Y)\right]\).
Let's use a simple example to understand the meaning behind the above formula. A class has 6 boys and 4 girls. These 10 students take a final. The average score of the 6 boys is 80; the average score of the 4 girls is 85. What's the average score of the whole class?
This is an elementary level math problem. The average score of the whole class is:
\[
\text{Average score} = \frac{6}{10}(80) + \frac{4}{10}(85) = 82
\]
If we express the above calculation using the double expectation theorem, then we have:
\[
E(\text{score}) = E_{\text{gender}}\left[E(\text{score} \mid \text{gender})\right] = \frac{6}{10}(80) + \frac{4}{10}(85) = 82
\]
So instead of directly calculating the average score for the whole class, we first break down the whole class into two groups based on gender. We then calculate the average score of these two groups: boys and girls. Next, we calculate the weighted average of these two group averages. This weighted average is the average of the whole class. If you understand this formula, you have understood the essence of the double expectation theorem.
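Problem  A group of 20 graduate students (12 with a non-math major and 8 with a math major) have a total GRE score of 12,940. The GRE score distribution by major is as follows:
Major      Number of students   Total GRE score
Non-math   12                   7,740
Math       8                    5,200
Total      20                   12,940
Find the average GRE score twice. The first time, do not use the double expectation theorem. The second time, use the double expectation theorem. Show that you get the same result.
Solution
(1) Find the mean without using the double expectation theorem. The average GRE score for the 20 graduate students is:
\[
\frac{12{,}940}{20} = 647
\]
(2) Find the mean using the double expectation theorem. Weight the average score of each major by the proportion of students in that major:
\[
\frac{12}{20}\cdot\frac{7{,}740}{12} + \frac{8}{20}\cdot\frac{5{,}200}{8} = 647
\]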
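Proof.
\[
Var(X) = E(X^2) - E^2(X)
\]
\[
E(X) = E_Y\left[E(X \mid Y)\right], \qquad E(X^2) = E_Y\left[E(X^2 \mid Y)\right]
\]
However, \(E(X^2 \mid Y) = Var(X \mid Y) + E^2(X \mid Y)\). So
\[
\begin{aligned}
Var(X) &= E_Y\left[Var(X \mid Y) + E^2(X \mid Y)\right] - \left(E_Y\left[E(X \mid Y)\right]\right)^2 \\
&= E_Y\left[Var(X \mid Y)\right] + \left\{ E_Y\left[E^2(X \mid Y)\right] - \left(E_Y\left[E(X \mid Y)\right]\right)^2 \right\} \\
&= E_Y\left[Var(X \mid Y)\right] + Var_Y\left[E(X \mid Y)\right]
\end{aligned}
\]
If \(X\) is the loss amount of a policyholder and \(Y\) is the risk class of the policyholder, then \(Var(X) = E_Y\left[Var(X \mid Y)\right] + Var_Y\left[E(X \mid Y)\right]\) means that the total variance of the loss consists of two components: the average of the within-class variances, \(E_Y\left[Var(X \mid Y)\right]\), and the variance of the class means, \(Var_Y\left[E(X \mid Y)\right]\).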
Next, let's look at a comprehensive example using double expectation and total variance.
\[
P(n) = \frac{3!}{n!\,(3 - n)!}\, p^n (1 - p)^{3 - n} .
\]
Solution
If \(p\) were a constant, \(N\) would be binomial with parameters 3 and \(p\), and we would have
\[
E(N) = 3p, \qquad Var(N) = 3p(1 - p)
\]
However, \(p\) is also a random variable. So we cannot directly use the above formula.
Each value of \(p\) is a separate group. For each group, we will calculate its mean. Then we will find the weighted average mean of all the groups, with the weight being the probability of each group's \(p\) value. The result should be \(E(N)\).
\[
E(N) = E_P\left[E(N \mid p)\right] = \int_{p=0}^{1} E(N \mid p)\, f_P(p)\, dp = \int_{p=0}^{1} 3p\, dp = \frac32\, p^2 \Big|_0^1 = \frac32
\]
Alternatively,
\[
E(N) = E_P\left[E(N \mid p)\right] = E_P[3p] = 3E(P) = 3 \cdot \frac12 = \frac32
\]
Next, we'll calculate \(Var(N)\). One method is to calculate \(Var(N)\) from scratch using the standard formula \(Var(N) = E(N^2) - E^2(N)\). We'll use the double expectation theorem to calculate \(E(N^2)\) and \(E(N)\).
\[
E(N^2) = E_P\left[E(N^2 \mid p)\right] = \int_0^1 E(N^2 \mid p)\, f(p)\, dp
\]
\[
E(N^2 \mid p) = E^2(N \mid p) + Var(N \mid p) = (3p)^2 + 3p(1 - p) = 6p^2 + 3p
\]
\[
E(N^2) = \int_0^1 E(N^2 \mid p)\, f(p)\, dp = \int_0^1 \left(6p^2 + 3p\right) dp = \left[2p^3 + \frac32 p^2\right]_0^1 = \frac72
\]
\[
Var(N) = E(N^2) - E^2(N) = \frac72 - \left(\frac32\right)^2 = \frac54
\]
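Alternatively, you can use the following formula to calculate the variance:
\[
Var(N) = E_p\left[Var(N \mid p)\right] + Var_p\left[E(N \mid p)\right]
\]
\[
E(N \mid p) = 3p, \qquad Var(N \mid p) = 3p(1 - p)
\]
\[
E_p\left[Var(N \mid p)\right] = E_p\left[3p(1 - p)\right] = E_p\left(3p - 3p^2\right) = E_p(3p) - E_p(3p^2) = 3E_p(p) - 3E_p(p^2)
\]
We have:
\[
E(P) = \frac{0 + 1}{2} = \frac12, \qquad Var(P) = \frac{(1 - 0)^2}{12} = \frac{1}{12}
\]
\[
E(P^2) = E^2(P) + Var(P) = \left(\frac12\right)^2 + \frac{1}{12} = \frac{4}{12}
\]
So \(E_p\left[Var(N \mid p)\right] = 3\cdot\frac12 - 3\cdot\frac{4}{12} = \frac12\) and \(Var_p\left[E(N \mid p)\right] = Var(3P) = 9\,Var(P) = \frac{9}{12} = \frac34\). Then \(Var(N) = \frac12 + \frac34 = \frac54\), the same result as before.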
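The two routes to \(Var(N)\) can be checked with exact arithmetic. The Python sketch below assumes, as the mean (0 + 1)/2 and variance 1/12 used above imply, that \(p\) is uniform on [0, 1]. It builds the unconditional distribution of \(N\) from the integral \(\int_0^1 p^a (1 - p)^b\, dp = \dfrac{a!\, b!}{(a + b + 1)!}\) and compares the result with the total variance formula. The function and variable names are invented for illustration.

```python
from fractions import Fraction
from math import comb, factorial


def prob_n(n, size=3):
    """P(N = n) = C(size, n) * integral over [0, 1] of p^n (1 - p)^(size - n) dp."""
    integral = Fraction(factorial(n) * factorial(size - n), factorial(size + 1))
    return comb(size, n) * integral


# Route 1: moments of the unconditional distribution of N
pmf = {n: prob_n(n) for n in range(4)}
mean = sum(n * p for n, p in pmf.items())
second_moment = sum(n * n * p for n, p in pmf.items())
variance = second_moment - mean**2

# Route 2: total variance formula with P uniform on [0, 1]
e_p, var_p = Fraction(1, 2), Fraction(1, 12)
e_p2 = e_p**2 + var_p
expected_cond_var = 3 * e_p - 3 * e_p2      # E[Var(N | p)] = 3E(P) - 3E(P^2)
var_cond_mean = 9 * var_p                   # Var[E(N | p)] = Var(3P)
print(mean, variance, expected_cond_var + var_cond_mean)
```

A side observation from route 1: under the uniform assumption each of the values 0, 1, 2, 3 comes out with probability 1/4, which makes the mean 3/2 easy to see directly.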
In a regression analysis, you try to fit a line (or a function) through a set of points. With least squares regression, you get the best fit by minimizing the sum of the squared distances from each point to the fitted line.
Let's say you want to find out how a person's income level affects how much life insurance he buys. Let \(X\) represent income. Let \(Y\) represent the amount of life insurance this person buys. You have collected some data pairs of \((X, Y)\) from a group of consumers. You suspect there's a linear relationship between \(X\) and \(Y\). You want to predict \(Y\) using the function \(a + bX\), where \(a\) and \(b\) are constants. With least squares regression, you want to minimize the following:
\[
Q = E\left[(a + bX - Y)^2\right]
\]
\[
\frac{\partial Q}{\partial a} = \frac{\partial}{\partial a} E\left[(a + bX - Y)^2\right] = E\left[\frac{\partial}{\partial a}(a + bX - Y)^2\right] = 2E(a + bX - Y) = 2\left[a + bE(X) - E(Y)\right]
\]
\[
\text{Setting } \frac{\partial Q}{\partial a} = 0: \qquad a + bE(X) - E(Y) = 0 \qquad\text{(Equation I)}
\]
\[
\frac{\partial Q}{\partial b} = \frac{\partial}{\partial b} E\left[(a + bX - Y)^2\right] = E\left[\frac{\partial}{\partial b}(a + bX - Y)^2\right] = 2E\left[(a + bX - Y)X\right] = 2\left[aE(X) + bE(X^2) - E(XY)\right]
\]
\[
\text{Setting } \frac{\partial Q}{\partial b} = 0: \qquad aE(X) + bE(X^2) - E(XY) = 0 \qquad\text{(Equation II)}
\]
Subtracting \(E(X)\) times Equation I from Equation II:
\[
b\left[E(X^2) - E^2(X)\right] = E(XY) - E(X)E(Y)
\]
\[
b = \frac{Cov(X, Y)}{Var(X)}, \qquad a = E(Y) - bE(X)
\]
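The closed-form solution for \(a\) and \(b\) can be tried out on data by replacing each expectation with a sample average. This is a minimal Python sketch; the function name and the income and insurance figures are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) minimizing the average of (a + b*x - y)^2 over the data pairs."""
    n = len(xs)
    e_x = sum(xs) / n
    e_y = sum(ys) / n
    e_xy = sum(x * y for x, y in zip(xs, ys)) / n
    e_x2 = sum(x * x for x in xs) / n
    b = (e_xy - e_x * e_y) / (e_x2 - e_x**2)   # Cov(X, Y) / Var(X)
    a = e_y - b * e_x                          # from Equation I
    return a, b


# Hypothetical data: income (in $10,000) and life insurance bought (in $10,000)
income = [1, 2, 3, 4]
insurance = [3, 5, 7, 9]
a, b = least_squares_line(income, insurance)
print(a, b)
```

Because the hypothetical points lie exactly on a line, the fitted line passes through all of them; with noisy data the same two formulas give the line with the smallest average squared error.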
Now I'm ready to give you a quick proof of the Bühlmann credibility formula. To simplify notation, I'm going to fix on one particular insured (such as Adam) and change the symbol \(X_{jt}\) to \(X_t\). Remember, our goal is to estimate \(X_{n+1}\), the individualized premium rate for year \(n + 1\), using \(a + Z \bar X\). \(Z\) is the credibility factor assigned to the mean of past claims \(\bar X = \frac1n (X_1 + X_2 + \ldots + X_n)\). We'll want to find \(a\) and \(Z\) that minimize the following:
\[
E\left[\left(a + Z \bar X - X_{n+1}\right)^2\right]
\]
Please note that \(X_1, X_2, \ldots, X_n\), and \(X_{n+1}\) are claims incurred by the same policyholder (whose risk class is unknown to us) during years \(1, 2, \ldots, n\), and \(n + 1\).
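Applying the least squares regression result, with \(\bar X\) playing the role of \(X\) and \(X_{n+1}\) playing the role of \(Y\), we get:
\[
Z = \frac{Cov\left(\bar X, X_{n+1}\right)}{Var\left(\bar X\right)}
\]
\[
\begin{aligned}
Cov\left(\bar X, X_{n+1}\right) &= Cov\left[\frac1n (X_1 + X_2 + \ldots + X_n),\, X_{n+1}\right] = \frac1n\, Cov\left[(X_1 + X_2 + \ldots + X_n),\, X_{n+1}\right] \\
&= \frac1n \left[Cov(X_1, X_{n+1}) + Cov(X_2, X_{n+1}) + \ldots + Cov(X_n, X_{n+1})\right]
\end{aligned}
\]
If \(X_1, X_2, \ldots, X_n, X_{n+1}\) were independent, every covariance term above would be zero and we would get:
\[
Z = \frac{Cov\left(\bar X, X_{n+1}\right)}{Var\left(\bar X\right)} = 0
\]
The result \(Z = 0\) simply doesn't make sense. What went wrong is the assumption that \(X_1, X_2, \ldots, X_n, X_{n+1}\) are independent identically distributed. The correct statement is that \(X_1, X_2, \ldots, X_n\), and \(X_{n+1}\) are identically distributed with a common density function \(f(x, \theta)\), where \(\theta\) is unknown to us.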
Here is an intuitive way to see why \(X_i\) and \(X_j\) have non-zero covariance. \(X_i\) and \(X_j\) represent the claim amounts incurred at time \(i\) and \(j\) by the policyholder whose sub-class is unknown to us. So \(X_i\) and \(X_j\) are controlled by the same risk-class factor \(\theta\). If \(\theta\) is a low risk, then \(X_i\) and \(X_j\) both tend to be small. On the other hand, if \(\theta\) is a high risk, then \(X_i\) and \(X_j\) both tend to be big. So \(X_i\) and \(X_j\) are correlated and have a non-zero covariance.
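Given the risk class \(\Theta\), the claims \(X_i\) and \(X_j\) (\(i \ne j\)) are independent. Writing \(\mu(\Theta) = E(X \mid \Theta)\):
\[
E\left(X_i X_j \mid \Theta\right) = E(X_i \mid \Theta)\, E(X_j \mid \Theta) = \mu(\Theta)\,\mu(\Theta) = \left[\mu(\Theta)\right]^2
\]
\[
E\left(X_i X_j\right) = E_\Theta\left[E(X_i \mid \Theta)\, E(X_j \mid \Theta)\right] = E_\Theta\left\{\left[\mu(\Theta)\right]^2\right\}
\]
\[
Cov(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j) = E_\Theta\left\{\left[\mu(\Theta)\right]^2\right\} - \left\{E_\Theta\left[\mu(\Theta)\right]\right\}^2 = Var\left[\mu(\Theta)\right]
\]
\[
\begin{aligned}
Cov\left(\bar X, X_{n+1}\right) &= \frac1n\, Cov\left[(X_1 + X_2 + \ldots + X_n),\, X_{n+1}\right] \\
&= \frac1n \left[Cov(X_1, X_{n+1}) + Cov(X_2, X_{n+1}) + \ldots + Cov(X_n, X_{n+1})\right] \\
&= \frac1n \left\{ n\, Var\left[\mu(\Theta)\right] \right\} = Var\left[\mu(\Theta)\right]
\end{aligned}
\]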
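Next, we'll calculate \(Var\left(\bar X\right)\).
\[
Var\left(\bar X\right) = Var\left[\frac1n (X_1 + X_2 + \ldots + X_n)\right] = \frac{1}{n^2}\, Var(X_1 + X_2 + \ldots + X_n)
\]
\[
Var(X_1 + X_2 + \ldots + X_n) = \sum_{i=1}^{n} Var(X_i) + 2 \sum_{i < j} Cov(X_i, X_j)
\]
Out of \(X_1, X_2, \ldots, X_n\), if you take out any two items \(X_i\) and \(X_j\) where \(i \ne j\), you'll get a covariance \(Cov(X_i, X_j) = Var\left[\mu(\Theta)\right]\), where \(\mu(\Theta) = E(X \mid \Theta)\). Since there are \(C_n^2 = \dfrac{n(n - 1)}{2}\) ways of taking out two items \(X_i\) and \(X_j\) where \(i \ne j\), the sum of the covariance terms becomes:
\[
2\, Var\left[\mu(\Theta)\right] C_n^2 = 2\, Var\left[\mu(\Theta)\right] \cdot \frac12\, n(n - 1) = n(n - 1)\, Var\left[\mu(\Theta)\right]
\]
\[
\begin{aligned}
Var\left(\bar X\right) &= \frac{1}{n^2}\, Var(X_1 + X_2 + \ldots + X_n) \\
&= \frac{1}{n^2} \left\{ n\, Var(X) + n(n - 1)\, Var\left[\mu(\Theta)\right] \right\} = \frac1n \left\{ Var(X) + (n - 1)\, Var\left[\mu(\Theta)\right] \right\} \\
&= \frac{Var(X) - Var\left[\mu(\Theta)\right]}{n} + Var\left[\mu(\Theta)\right]
\end{aligned}
\]
By the total variance formula, \(Var(X) - Var\left[\mu(\Theta)\right] = E\left[Var(X \mid \Theta)\right]\), so \(Var\left(\bar X\right) = Var\left[\mu(\Theta)\right] + \frac1n E\left[Var(X \mid \Theta)\right]\).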
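Since $\operatorname{Var}(X) = E[\operatorname{Var}(X \mid \theta)] + \operatorname{Var}[\mu(\theta)]$, we have $\operatorname{Var}(\bar{X}) = \operatorname{Var}[\mu(\theta)] + \frac{1}{n}E[\operatorname{Var}(X \mid \theta)]$ and
$$Z = \frac{\operatorname{Cov}(\bar{X}, X_{n+1})}{\operatorname{Var}(\bar{X})} = \frac{\operatorname{Var}[\mu(\theta)]}{\operatorname{Var}(\bar{X})} = \frac{\operatorname{Var}[\mu(\theta)]}{\operatorname{Var}[\mu(\theta)] + \frac{1}{n}E[\operatorname{Var}(X \mid \theta)]} = \frac{n}{n + \dfrac{E[\operatorname{Var}(X \mid \theta)]}{\operatorname{Var}[\mu(\theta)]}}$$
Let $k = \dfrac{E[\operatorname{Var}(X \mid \theta)]}{\operatorname{Var}[\mu(\theta)]}$. Then $Z = \dfrac{\operatorname{Var}[\mu(\theta)]}{\operatorname{Var}(\bar{X})} = \dfrac{n}{n+k}$.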
Next, we need to find $a = E(X_{n+1}) - Z\,E(\bar{X})$. Remember, $X_1, X_2, \dots, X_n$, though not independent, have a common mean $E(X) = \mu$ and a common variance $\operatorname{Var}(X)$.
$$E(\bar{X}) = E\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{n}E(X_1 + X_2 + \dots + X_n) = \frac{1}{n}(n\mu) = \mu$$
$$E(X_{n+1}) = \mu$$
$$a = E(X_{n+1}) - Z\,E(\bar{X}) = \mu - Z\mu = (1 - Z)\mu$$
$$a + Z\bar{X} = (1 - Z)\mu + Z\bar{X} = Z\bar{X} + (1 - Z)\mu, \quad \text{where } Z = \frac{n}{n+k}$$
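$$Z = \frac{\operatorname{Cov}(\bar{X}, X_{n+1})}{\operatorname{Var}(\bar{X})}, \quad a = (1 - Z)\mu$$
$$\operatorname{Cov}(\bar{X}, X_{n+1}) = \operatorname{Cov}(X_i, X_j) = \operatorname{Var}[\mu(\theta)] = VE, \quad \text{where } i \neq j$$
$$\operatorname{Var}(\bar{X}) = \frac{\operatorname{Var}(X_1 + X_2 + \dots + X_n)}{n^2} = \operatorname{Var}[\mu(\theta)] + \frac{1}{n}E[\operatorname{Var}(X \mid \theta)] = VE + \frac{1}{n}EV$$
$$Z = \frac{\operatorname{Cov}(\bar{X}, X_{n+1})}{\operatorname{Var}(\bar{X})} = \frac{\operatorname{Var}[\mu(\theta)]}{\operatorname{Var}(\bar{X})} = \frac{\operatorname{Var}[\mu(\theta)]}{\frac{1}{n}\left\{E[\operatorname{Var}(X \mid \theta)] + n\operatorname{Var}[\mu(\theta)]\right\}} = \frac{n}{n + \dfrac{E[\operatorname{Var}(X \mid \theta)]}{\operatorname{Var}[\mu(\theta)]}} = \frac{n}{n+k}$$
Or
$$Z = \frac{\operatorname{Cov}(\bar{X}, X_{n+1})}{\operatorname{Var}(\bar{X})} = \frac{\operatorname{Var}[\mu(\theta)]}{\operatorname{Var}(\bar{X})} = \frac{VE}{VE + \frac{1}{n}EV} = \frac{n}{n + \dfrac{EV}{VE}}$$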
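$$P = a + Z\bar{X} = (1 - Z)\mu + Z\bar{X}$$
$$P = Z\bar{X} + (1 - Z)\mu$$
Here $P$ is the renewal premium, $\bar{X}$ is the risk-specific sample mean, and $\mu$ is the global mean.
If we apply this formula to set the renewal premium rate for Adam for Year 3, then the formula becomes:
$$P^{\text{Adam}} = Z\bar{X}^{\text{Adam}} + (1 - Z)\mu^{\text{Adam, Bob, Colleen}}$$
Here $P^{\text{Adam}}$ is Adam's renewal premium, $\bar{X}^{\text{Adam}}$ is Adam's risk-specific sample mean, and $\mu^{\text{Adam, Bob, Colleen}}$ is the global mean.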
At first, the above formula may seem counter-intuitive. If we are interested only in Adam's claim cost in Year 3, why not set Adam's renewal premium for Year 3 equal to his prior two-year average claim $\bar{X}$ (so $P = \bar{X}$)? Why do we need to drag in $\mu$, the global average, which includes the claim costs incurred by Bob and Colleen?
Actually, it's a blessing that the renewal premium formula includes $\mu$. $\bar{X}$ varies widely based on your sample size. However, the state insurance departments generally want the renewal premium to be stable and responsive to the past claim data. If your renewal premium $P$ is set to $\bar{X}$, then $P$ will fluctuate wildly depending on the sample size. Then you'll have a difficult time getting your renewal rates approved by state insurance departments.
In addition, you may have $P = \bar{X} = 0$; this is the case for Adam. You'll provide free insurance to the policyholder who has not incurred any claim yet. This certainly doesn't make any sense.
There are other ways to derive the Bühlmann credibility formula. For example, instead of minimizing $E\left[(a + Z\bar{X} - X_{n+1})^2\right]$, we can minimize
$$E\left\{\left[a + Z\bar{X} - \mu(\theta)\right]^2\right\}$$
The idea behind $E\left\{[a + Z\bar{X} - \mu(\theta)]^2\right\}$ is this. If we know that a policyholder belongs to sub-risk $\theta$, then we can set our renewal premium for year $n+1$ equal to his conditional mean claim cost $\mu(\theta) = E(X_{n+1} \mid \theta) = E(X_1 \mid \theta) = E(X_2 \mid \theta) = \dots = E(X_n \mid \theta)$. However, we don't know $\theta$. As a result, we list all the possible values of $\mu(\theta)$ and find the least mean squared error estimator of $\mu(\theta)$ by minimizing $E\left\{[a + Z\bar{X} - \mu(\theta)]^2\right\}$.
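$$Z = \frac{\operatorname{Cov}[\bar{X}, \mu(\theta)]}{\operatorname{Var}(\bar{X})}$$
$$\operatorname{Cov}[\bar{X}, \mu(\theta)] = \operatorname{Cov}\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n),\, \mu(\theta)\right] = \frac{1}{n}\operatorname{Cov}\left[(X_1 + X_2 + \dots + X_n),\, \mu(\theta)\right]$$
$$= \frac{1}{n}\left\{\operatorname{Cov}[X_1, \mu(\theta)] + \operatorname{Cov}[X_2, \mu(\theta)] + \dots + \operatorname{Cov}[X_n, \mu(\theta)]\right\}$$
For $i = 1, 2, \dots, n$, we have:
$$\operatorname{Cov}[X_i, \mu(\theta)] = E[X_i\,\mu(\theta)] - E(X_i)\,E[\mu(\theta)]$$
$$E[X_i\,\mu(\theta)] = E_\theta\left\{E[X_i\,\mu(\theta) \mid \theta]\right\}$$
For a fixed $\theta$, $\mu(\theta)$ is a constant. Hence $E[X_i\,\mu(\theta) \mid \theta] = \mu(\theta)\,E(X_i \mid \theta) = [\mu(\theta)]^2$.
$$E[X_i\,\mu(\theta)] = E_\theta\left\{E[X_i\,\mu(\theta) \mid \theta]\right\} = E_\theta\left\{[\mu(\theta)]^2\right\}$$
$$E(X_i)\,E[\mu(\theta)] = E_\theta[E(X_i \mid \theta)]\,E_\theta[\mu(\theta)] = \left\{E_\theta[\mu(\theta)]\right\}^2$$
$$\operatorname{Cov}[X_i, \mu(\theta)] = E_\theta\left\{[\mu(\theta)]^2\right\} - \left\{E_\theta[\mu(\theta)]\right\}^2 = \operatorname{Var}[\mu(\theta)]$$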
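$\operatorname{Var}(\bar{X})$ is the same whether $E\left[(a + Z\bar{X} - X_{n+1})^2\right]$ or $E\left\{[a + Z\bar{X} - \mu(\theta)]^2\right\}$ is to be minimized:
$$\operatorname{Var}(\bar{X}) = \frac{1}{n}\left\{E[\operatorname{Var}(X \mid \theta)] + n\operatorname{Var}[\mu(\theta)]\right\}$$
Once again, we get:
$$Z = \frac{\operatorname{Cov}[\bar{X}, \mu(\theta)]}{\operatorname{Var}(\bar{X})} = \frac{\operatorname{Var}[\mu(\theta)]}{\operatorname{Var}(\bar{X})} = \frac{n}{n + \dfrac{E[\operatorname{Var}(X \mid \theta)]}{\operatorname{Var}[\mu(\theta)]}} = \frac{n}{n+k}$$
$$a = E(X_{n+1}) - Z\,E(\bar{X}) = \mu - Z\mu = (1 - Z)\mu$$
$$a + Z\bar{X} = (1 - Z)\mu + Z\bar{X} = Z\bar{X} + (1 - Z)\mu$$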
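Finally, instead of minimizing $E\left[(a + Z\bar{X} - X_{n+1})^2\right]$ or $E\left\{[a + Z\bar{X} - \mu(\theta)]^2\right\}$, we can minimize $E\left\{\left[a + Z\bar{X} - E(X_{n+1} \mid X_1, X_2, \dots, X_n)\right]^2\right\}$.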
Here $X_{n+1} \mid X_1, X_2, \dots, X_n$ represents the claim cost at year $n+1$ of the policyholder who incurred claims $X_1, X_2, \dots, X_n$ in years $1, 2, \dots, n$. The notation $X_{n+1} \mid X_1, X_2, \dots, X_n$ emphasizes that the claim amounts $X_1, X_2, \dots, X_n, X_{n+1}$ are from the same sub-class $\theta$. This condition must hold for the Bühlmann credibility formula to be valid. For example, if $X_{n+1}$ comes from sub-class $\theta_1$ and $X_1, X_2, \dots, X_n$ from sub-class $\theta_2$, then the Bühlmann credibility formula will not hold true.
However, the requirement that the claim amounts $X_1, X_2, \dots, X_n, X_{n+1}$ are from the same sub-class shouldn't bother us at all. At the very beginning when we presented the Bühlmann credibility formula, we already used $X_1, X_2, \dots, X_n, X_{n+1}$ to refer to the claims incurred by the same policyholder whose sub-risk is $\theta$. As a result,
$$E(X_{n+1} \mid X_1, X_2, \dots, X_n) = E(X \mid \theta) = \mu(\theta)$$
So
$$E\left\{\left[a + Z\bar{X} - E(X_{n+1} \mid X_1, X_2, \dots, X_n)\right]^2\right\} = E\left\{\left[a + Z\bar{X} - \mu(\theta)\right]^2\right\}$$
Key Points
We can derive the Bühlmann credibility formula by minimizing any of the following three terms:
$$E\left[(a + Z\bar{X} - X_{n+1})^2\right], \quad E\left\{[a + Z\bar{X} - \mu(\theta)]^2\right\}, \quad E\left\{\left[a + Z\bar{X} - E(X_{n+1} \mid X_1, X_2, \dots, X_n)\right]^2\right\}.$$
The Bühlmann credibility premium is the least squares linear estimator of any of the following three terms:
$\mu(\theta)$, the mean claim amount of the sub-class that has generated $X_1, X_2, \dots, X_n$
Even though we have derived the Bühlmann credibility formula assuming $X$ is the claim cost, the Bühlmann credibility formula works if $X$ is any other quantity such as loss ratio, the aggregate loss amount, or the number of claims.
In contrast, Bayesian premiums (the posterior means) are often difficult to calculate,
requiring knowledge of prior distributions and involving complex integrations.
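Next, let's derive a special case of the Bühlmann credibility formula. This special case is presented in Loss Models.
If $E(X_i) = \mu$, $\operatorname{Var}(X_i) = \sigma^2$, and for $i \neq j$ $\operatorname{Cov}(X_i, X_j) = \rho\sigma^2$ where the correlation coefficient $\rho$ satisfies $-1 < \rho < 1$, determine the Bühlmann credibility premium.
$$Z = \frac{\operatorname{Cov}(\bar{X}, X_{n+1})}{\operatorname{Var}(\bar{X})}, \quad \operatorname{Cov}(\bar{X}, X_{n+1}) = \operatorname{Var}[\mu(\theta)] = \operatorname{Cov}(X_i, X_j) = \rho\sigma^2$$
$$\operatorname{Var}(\bar{X}) = \frac{\operatorname{Var}(X_1 + X_2 + \dots + X_n)}{n^2} = \frac{1}{n^2}\left[n\sigma^2 + n(n-1)\rho\sigma^2\right]$$
$$Z = \frac{\operatorname{Cov}(\bar{X}, X_{n+1})}{\operatorname{Var}(\bar{X})} = \frac{\rho\sigma^2}{\frac{1}{n^2}\left[n\sigma^2 + n(n-1)\rho\sigma^2\right]} = \frac{n\rho}{1 + (n-1)\rho}$$
$$a = (1 - Z)\mu = \left[1 - \frac{n\rho}{1 + (n-1)\rho}\right]\mu = \frac{(1 - \rho)\mu}{1 + (n-1)\rho}$$
$$Z\bar{X} + (1 - Z)\mu = \frac{n\rho}{1 + (n-1)\rho}\bar{X} + \frac{(1 - \rho)\mu}{1 + (n-1)\rho} = \frac{\rho}{1 - \rho + n\rho}\sum_{i=1}^{n} X_i + \frac{(1 - \rho)\mu}{1 - \rho + n\rho}$$
You don't need to memorize the Bühlmann credibility premium formula for this special case. If you understand how to derive the general Bühlmann credibility premium formula, you can derive the special case formula any time by setting $\operatorname{Cov}(X_i, X_j) = \rho\sigma^2$.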
Next, let's turn our attention toward how to solve the Bühlmann credibility problem on the exam.
Step 2 For each sub-class $\theta$, calculate the average claim cost (or loss ratio, aggregate claim, etc.) $\mu(\theta) = E(X \mid \theta)$; calculate the variance of the claim cost $\operatorname{Var}(X \mid \theta)$.
Step 3 Calculate $EV = E_\theta[\operatorname{Var}(X \mid \theta)]$, the average variance for all sub-classes combined. Calculate $VE = \operatorname{Var}_\theta[E(X \mid \theta)]$, the variance of the average claim for all sub-classes combined.
Step 4 Calculate $k = \dfrac{EV}{VE}$, $Z = \dfrac{n}{n+k}$.
Step 5 Calculate $\mu = E_\theta[E(X \mid \theta)]$, the average claim cost for all sub-classes combined. This is the uniform group premium rate you would charge under the classical theory of insurance.
Step 6 Calculate the sample mean of the past claim data $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$.
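The steps above can be sketched in a few lines of Python for the common exam setting of a finite list of sub-classes. This is a minimal illustration, not the author's code: the function name `buhlmann_premium` and the two-class input (equally likely Poisson classes with means 1 and 3, observed claims 2 and 4) are hypothetical and chosen only to make the arithmetic easy to follow. The last line applies the premium formula $P = Z\bar{X} + (1 - Z)\mu$ derived earlier.

```python
def buhlmann_premium(priors, cond_means, cond_vars, claims):
    """Return (k, Z, P) for a finite set of sub-classes.

    priors     -- probability of each sub-class
    cond_means -- E(X | class) for each sub-class
    cond_vars  -- Var(X | class) for each sub-class
    claims     -- observed claims X_1..X_n of one policyholder
    """
    # Global mean: mu = E[E(X | theta)]
    mu = sum(p * m for p, m in zip(priors, cond_means))
    # EV = E[Var(X | theta)], VE = Var[E(X | theta)]
    ev = sum(p * v for p, v in zip(priors, cond_vars))
    ve = sum(p * m * m for p, m in zip(priors, cond_means)) - mu ** 2
    n = len(claims)
    k = ev / ve
    z = n / (n + k)
    x_bar = sum(claims) / n
    return k, z, z * x_bar + (1 - z) * mu


# Hypothetical data: two equally likely Poisson classes (variance = mean).
k, z, premium = buhlmann_premium([0.5, 0.5], [1.0, 3.0], [1.0, 3.0], [2, 4])
print(k, z, premium)
```

With these made-up inputs, $\mu = 2$, $EV = 2$, $VE = 1$, so $k = 2$, $Z = 0.5$, and the premium sits halfway between the sample mean 3 and the global mean 2.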
Solution
This is a typical problem for Exam C. Here policyholders are from two risk classes. Even though the problem doesn't say that Risk 1 and Risk 2 are two sub-risks of a similar bigger risk group (i.e. homogeneous group), we should assume so. Otherwise, the Bühlmann credibility formula won't work. Remember the Bühlmann credibility premium is the weighted average of the uniform group rate $\mu$ and the risk-specific sample mean $\bar{X}$. If Risk 1 and Risk 2 are not sub-risks of a homogeneous group, then the uniform group rate $\mu$ doesn't exist; we have no way of calculating $Z\bar{X} + (1 - Z)\mu$.
The problem says that a claim of 250 is observed. This means that a policyholder of an unknown sub-class has incurred a claim of $X_1 = 250$. Since Risk 1 is twice as likely as Risk 2, the 250 claim has a $\frac{2}{3}$ chance of coming from Risk 1 and a $\frac{1}{3}$ chance of coming from Risk 2. The question asks us to estimate the next claim amount $X_2$ incurred by the same policyholder.
3 3
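$$E(X^2 \mid \text{risk 1}) = 250^2(0.5) + 2{,}500^2(0.3) + 60{,}000^2(0.2) = 721{,}906{,}250$$
$$E(X^2 \mid \text{risk 2}) = 250^2(0.7) + 2{,}500^2(0.2) + 60{,}000^2(0.1) = 361{,}293{,}750$$
$$\operatorname{Var}(X \mid \text{risk 1}) = E(X^2 \mid \text{risk 1}) - E^2(X \mid \text{risk 1}) = 721{,}906{,}250 - 12{,}875^2 = 556{,}140{,}625$$
$$\operatorname{Var}(X \mid \text{risk 2}) = E(X^2 \mid \text{risk 2}) - E^2(X \mid \text{risk 2}) = 361{,}293{,}750 - 6{,}675^2 = 316{,}738{,}125$$
$$k = \frac{EV}{VE} = \frac{476{,}339{,}791.67}{8{,}542{,}222.22} = 55.76$$
$$Z = \frac{n}{n+k} = \frac{1}{1 + 55.76} = 1.76\%$$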
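The arithmetic in this solution is easy to get wrong by hand because the numbers are large. The sketch below recomputes it from the claim sizes and probabilities that appear in the equations above (250, 2,500 and 60,000; priors of 2/3 and 1/3). The variable names are illustrative only, and the final premium line simply applies $P = Z\bar{X} + (1 - Z)\mu$ with the single observed claim of 250.

```python
amounts = [250, 2_500, 60_000]
probs = {"risk 1": [0.5, 0.3, 0.2], "risk 2": [0.7, 0.2, 0.1]}
prior = {"risk 1": 2 / 3, "risk 2": 1 / 3}

# Conditional mean and variance of the claim size for each risk class.
mean = {r: sum(x * p for x, p in zip(amounts, ps)) for r, ps in probs.items()}
second = {r: sum(x * x * p for x, p in zip(amounts, ps)) for r, ps in probs.items()}
var = {r: second[r] - mean[r] ** 2 for r in probs}

mu = sum(prior[r] * mean[r] for r in probs)                  # global mean
ev = sum(prior[r] * var[r] for r in probs)                   # E[Var(X | risk)]
ve = sum(prior[r] * mean[r] ** 2 for r in probs) - mu ** 2   # Var[E(X | risk)]

k = ev / ve
z = 1 / (1 + k)                       # n = 1 observed claim
premium = z * 250 + (1 - z) * mu      # P = Z * X_bar + (1 - Z) * mu

print(mean, var)
print(round(ev, 2), round(ve, 2), round(k, 2), round(z, 4), round(premium, 2))
```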
$$P = Z\bar{X} + (1 - Z)\mu = \frac{n}{n+k}\bar{X} + \frac{k}{n+k}\mu = \frac{k\mu + n\bar{X}}{n+k} = \frac{k\mu + \sum_{i=1}^{n} X_i}{n+k}$$
We can interpret $k = \dfrac{EV}{VE}$ as the number of samples taken out of the global mean $\mu$. Imagine we have two urns, A and B. A contains an infinite number of identical balls with each ball marked with the number $\mu$. B contains an infinite number of identical balls with each ball marked with the number $\bar{X}$. You take out $k$ balls from Urn A and $n$ balls from Urn B.
Then the average value per ball is:
$$P = \frac{k\mu + n\bar{X}}{n+k} = \frac{k\mu + \sum_{i=1}^{n} X_i}{n+k}$$
[Figure: Urn A supplies $k$ balls marked $\mu$; Urn B supplies $n$ balls marked $\bar{X}$.]
Practice problems
Q1 You are an actuary on group health insurance pricing. You want to use the Bühlmann credibility premium formula $P = Z\bar{X} + (1 - Z)\mu$ to set the renewal premium rate for a policy. One day the vice president of your company stops by. He has a Ph.D. degree in statistics and is widely regarded as an expert on the central limit theorem. He asks you to throw the formula $P = Z\bar{X} + (1 - Z)\mu$ into the trash can and focus on $\mu$. "All we care about is $\mu$. As long as we charge each policyholder $\mu$, we'll be okay," the vice president says. "The fundamental concept of insurance is that many people form a group to share the risk. If we charge $\mu$, the law of large numbers will work its magic and we'll be able to collect enough premiums to pay our guarantees."
Solution
If an insurer charges $\mu$ to similar yet different risks, good risks will stop doing business with the insurer and buy cheaper insurance elsewhere; only bad risks will remain in the insurer's book of business. As more and more good risks leave the insurer's book of business, the actual expected claim cost will exceed the original average premium rate $\mu$. Then the insurer has to increase $\mu$, causing more policyholders to terminate their policies. Gradually, the insurer's customer base will shrink and the insurer will go bankrupt.
Q2 Compare and contrast the classical theory of insurance and the credibility theory
of insurance.
Solution
Q3 One day you visited your college statistics professor. He asked what you were doing in your job. You told him that you used the Bühlmann credibility premium formula to set the renewal premium for group health insurance policies. The Bühlmann credibility theory was new to the professor. After listening to your explanation of the formula $P = Z\bar{X} + (1 - Z)\mu$, he looked puzzled. He told you that for 20 years he had been telling his students that $\bar{X}$ is the unbiased estimator of $E(X)$. "I don't get it. Why don't you just set $P = \bar{X}$?"
Solution
Your stats professor is perfectly correct in saying that the sample mean is an unbiased estimator of the population mean. If the number of observations $n$ is large (so we have observed $X_1, X_2, \dots, X_n$ claims), for any policyholder, setting his renewal premium equal to his prior average claim is a good idea.
$$\mu(\theta) = E(X_j \mid \Theta = \theta), \quad j = 1, 2, \dots, n$$
and variance
$$v(\theta) = \operatorname{Var}(X_j \mid \Theta = \theta), \quad j = 1, 2, \dots, n$$
Solution
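$$Z = \frac{\operatorname{Cov}(\bar{X}, X_{n+1})}{\operatorname{Var}(\bar{X})} = \frac{\operatorname{Var}[\mu(\theta)]}{\operatorname{Var}(\bar{X})} = \frac{VE}{VE + \frac{1}{n}EV}$$
We are told that $n = 4$ (we have four years of claim data), $Z = 0.4$, and $EV = 8$.
$$0.4 = \frac{VE}{VE + \frac{8}{4}} = \frac{VE}{VE + 2}, \quad VE = 1.33$$
So $\operatorname{Cov}(X_i, X_j) = \operatorname{Cov}(\bar{X}, X_{n+1}) = \operatorname{Var}[\mu(\theta)] = VE = 1.33$.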
The Bühlmann credibility estimate of the number of claims for the same risk in Year 2 is 11.983. Determine $x$.
Solution
The problem states that $x$ claims in Year 1 have been observed for a randomly selected risk. The wording "a randomly selected risk" is needed because in order for the Bühlmann credibility formula to work, the risk class must be unknown to us. If we already know the risk class, we can calculate the expected number of claims in Year 2; we don't need to estimate any more.
Please also pay attention to the wording "the Bühlmann credibility estimate of the number of claims for the same risk in Year 2 is ..." In order for the Bühlmann credibility formula to work, the renewal premium (or the expected number of claims in this problem) in year $n+1$ and the prior $n$-year claims $X_1, X_2, \dots, X_n$ must refer to the same (unknown) risk class.
And now back to the problem. Let $Y$ represent the number of claims incurred in a year by a randomly chosen class. Since $Y \mid \theta$ is a Poisson random variable, $E(Y \mid \theta) = \operatorname{Var}(Y \mid \theta)$.
Class    Mean # of claims per risk                  P(Θ = θ)    # of risks
         λ = E(Y | θ) = Var(Y | θ)
1        1                                          90%         900
2        10                                         9%          90
3        20                                         1%          10
Total                                               100%        1,000
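Alternatively,
$$VE = \operatorname{Var}[E(Y \mid \theta)] = E\left\{[E(Y \mid \theta)]^2\right\} - \left\{E[E(Y \mid \theta)]\right\}^2$$
$$k = \frac{EV}{VE} = \frac{2}{9.9}$$
$$P = \frac{n\bar{Y} + k\mu}{n+k} = \frac{\sum_{i=1}^{n} Y_i + k\mu}{n+k} = \frac{x + \frac{2}{9.9}(2)}{1 + \frac{2}{9.9}} = 11.983, \quad x = 14$$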
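As a check on $EV = 2$, $VE = 9.9$ and $x = 14$, the following sketch recomputes them from the class table (Poisson means 1, 10, 20 with probabilities 90%, 9%, 1%) and then inverts the premium formula for the unknown Year 1 count. The variable names are illustrative; the only inputs are the table and the stated estimate 11.983.

```python
means = [1, 10, 20]          # Poisson mean of each class
probs = [0.90, 0.09, 0.01]   # probability of each class

# For a Poisson class, variance = mean, so EV equals the global mean.
mu = sum(p * m for p, m in zip(probs, means))
ev = mu
ve = sum(p * m * m for p, m in zip(probs, means)) - mu ** 2
k = ev / ve

# P = (x + k * mu) / (1 + k) with n = 1; solve for x given P = 11.983.
target = 11.983
x = target * (1 + k) - k * mu
print(round(ev, 4), round(ve, 4), round(k, 4), round(x, 3))
```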
Q6 Nov 2005 #7
For a portfolio of policies, you are given:
The annual claim amount on a policy has probability density function
$$f(x \mid \theta) = \frac{2x}{\theta^2}, \quad 0 < x < \theta$$
Determine the Bühlmann credibility estimate of the claim amount for the selected risk in Year 2.
Solution
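$$\operatorname{Var}(X \mid \theta) = E(X^2 \mid \theta) - E^2(X \mid \theta) = \frac{1}{2}\theta^2 - \left(\frac{2}{3}\theta\right)^2 = \frac{1}{18}\theta^2$$
The global mean: $\mu = E(X) = E_\theta[E(X \mid \theta)] = E\left(\frac{2}{3}\theta\right) = \frac{2}{3}E(\theta)$
The expected conditional variance: $EV = E_\theta[\operatorname{Var}(X \mid \theta)] = E\left(\frac{1}{18}\theta^2\right) = \frac{1}{18}E(\theta^2)$
The variance of the conditional mean: $VE = \operatorname{Var}_\theta[E(X \mid \theta)] = \operatorname{Var}\left(\frac{2}{3}\theta\right) = \left(\frac{2}{3}\right)^2\operatorname{Var}(\theta)$
$$E(\theta) = \int_0^1 \theta\,\pi(\theta)\,d\theta = \int_0^1 \theta\,(4\theta^3)\,d\theta = \int_0^1 4\theta^4\,d\theta = \frac{4}{5}$$
$$E(\theta^2) = \int_0^1 \theta^2\,\pi(\theta)\,d\theta = \int_0^1 \theta^2\,(4\theta^3)\,d\theta = \int_0^1 4\theta^5\,d\theta = \frac{4}{6}$$
$$\operatorname{Var}(\theta) = E(\theta^2) - E^2(\theta) = \frac{4}{6} - \left(\frac{4}{5}\right)^2 = \frac{2}{75}$$
$$EV = \frac{1}{18}E(\theta^2) = \frac{1}{18}\cdot\frac{4}{6}, \quad VE = \left(\frac{2}{3}\right)^2\operatorname{Var}(\theta) = \left(\frac{2}{3}\right)^2\frac{2}{75}$$
$$\mu = \frac{2}{3}E(\theta) = \frac{2}{3}\cdot\frac{4}{5} = \frac{8}{15}, \quad k = \frac{EV}{VE} = \frac{\frac{1}{18}\cdot\frac{4}{6}}{\left(\frac{2}{3}\right)^2\frac{2}{75}} = 3.125$$
$$P = \frac{k\mu + \sum_{i=1}^{n} X_i}{n+k} = \frac{k\mu + X_1}{1+k} = \frac{3.125\left(\frac{8}{15}\right) + 0.1}{1 + 3.125} = 0.428$$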
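Because every quantity in this solution is a simple fraction, the result can be verified exactly. The sketch below uses Python's `fractions.Fraction` and takes the moments of $\theta$ under the prior $4\theta^3$ on $(0, 1)$ as given by the integrals above; the variable names are illustrative.

```python
from fractions import Fraction as F

# Moments of theta under the prior density 4 * theta**3 on (0, 1).
e_theta = F(4, 5)
e_theta2 = F(4, 6)
var_theta = e_theta2 - e_theta ** 2

# E(X | theta) = (2/3) theta and Var(X | theta) = (1/18) theta**2.
mu = F(2, 3) * e_theta
ev = F(1, 18) * e_theta2
ve = F(2, 3) ** 2 * var_theta

k = ev / ve
x1 = F(1, 10)                      # the observed Year 1 claim of 0.1
premium = (k * mu + x1) / (1 + k)  # n = 1
print(var_theta, mu, k, premium, float(premium))
```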
2 x2
E(X )= x f (x ) dx = 2x 2 1 3 2
x 2
dx = 2
dx = 2
x = (as before)
0 0 0
3 0 3
1 1 1
E(X ) E(X ) {E E ( X ) }
2 2
VE = Var =E
1 1 2
E(X ) E(X ) 2
+( )d +( )d
2 2
E = =
=0 =0
3
1 2 1
2 16 16 1 8
= 4 3
d = 5
d = =
=0
3 9 =0
9 6 27
{E E ( X ) }
2
E(X )
2 2 8 8
VE = VE = = 0.01185
27 15
E X2( ) = x2 f ( x ) dx = x 2 2x
2
dx =
2 x3
2
dx =
2 1 4
2
4
x
0
=
1
2
2
0 0 0
Var ( X )= E X2 ( ) E2 ( X )= 1
2
2 2
3
=
1
18
2
1 1
EV = E Var ( X ) Var ( X ) 1 4 1
= +( ) d = 4 3 2
d =
0 0
18 18 6
n
8
k + Xi 3.125 + 0.1
k + X1 15
P= i =1
= = = 0.428
n+k 1+ k 1 + 3.125
Solution
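X              0        1        2
Probability    2θ       θ        1 − 3θ
$$E(X \mid \theta) = 0(2\theta) + 1(\theta) + 2(1 - 3\theta) = 2 - 5\theta$$
$$E(X^2 \mid \theta) = 0^2(2\theta) + 1^2(\theta) + 2^2(1 - 3\theta) = 4 - 11\theta$$
$$\operatorname{Var}(X \mid \theta) = 4 - 11\theta - (2 - 5\theta)^2 = 9\theta - 25\theta^2$$
$$\mu = E_\theta[E(X \mid \theta)] = E[2 - 5\theta] = 2 - 5E(\theta)$$
$$VE = \operatorname{Var}_\theta[E(X \mid \theta)] = \operatorname{Var}(2 - 5\theta) = \operatorname{Var}(5\theta) = 25\operatorname{Var}(\theta)$$
$$EV = E_\theta[\operatorname{Var}(X \mid \theta)] = E(9\theta - 25\theta^2) = 9E(\theta) - 25E(\theta^2)$$
θ              0.05     0.30
Probability    0.80     0.20
$$\mu = 2 - 5E(\theta) = 2 - 5(0.1) = 1.5$$
$$VE = 25\operatorname{Var}(\theta) = 25(0.01) = 0.25$$
$$EV = 9E(\theta) - 25E(\theta^2) = 9(0.1) - 25(0.02) = 0.4$$
$$k = \frac{EV}{VE} = \frac{0.4}{0.25} = 1.6, \quad P = \frac{k\mu + \sum_{i=1}^{n} X_i}{n+k} = \frac{k\mu + X_1}{1+k} = \frac{1.6(1.5) + 2}{1 + 1.6} = 1.69$$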
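The same answer can be reached without the algebra by evaluating the conditional mean and variance at each of the two possible values of $\theta$. This is a small sketch with illustrative function names; it uses only the distribution table, the prior on $\theta$, and the observed claim count of 2 that appears in the premium formula above.

```python
thetas = [0.05, 0.30]
weights = [0.80, 0.20]


def cond_mean(t):
    # X takes values 0, 1, 2 with probabilities 2t, t, 1 - 3t.
    return 0 * (2 * t) + 1 * t + 2 * (1 - 3 * t)


def cond_var(t):
    second_moment = 0 * (2 * t) + 1 * t + 4 * (1 - 3 * t)
    return second_moment - cond_mean(t) ** 2


mu = sum(w * cond_mean(t) for w, t in zip(weights, thetas))
ev = sum(w * cond_var(t) for w, t in zip(weights, thetas))
ve = sum(w * cond_mean(t) ** 2 for w, t in zip(weights, thetas)) - mu ** 2

k = ev / ve
premium = (k * mu + 2) / (1 + k)   # n = 1, observed claim count X_1 = 2
print(round(mu, 4), round(ev, 4), round(ve, 4), round(k, 4), round(premium, 2))
```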
Calculate the Bühlmann credibility estimate of the number of claims for the selected policy in Year 2.
Solution
Let $X$ represent the annual number of claims on a randomly selected policy. Here the risk factor is $\beta$. The conditional random variable $X \mid \beta$ has a geometric distribution. If you look up Tables for Exam C/4, you'll find that a geometric random variable $N$ with parameter $\beta$ has mean and variance as follows:
$$E(N) = \beta, \quad \operatorname{Var}(N) = \beta(1 + \beta)$$
$$EV = E_\beta[\operatorname{Var}(X \mid \beta)] = E_\beta[\beta(1 + \beta)] = E(\beta) + E(\beta^2)$$
$$VE = \operatorname{Var}_\beta[E(X \mid \beta)] = \operatorname{Var}(\beta)$$
We are told that the prior distribution of $\beta$ has the Pareto density function
$$\pi(\beta) = \frac{\alpha}{(\beta + 1)^{\alpha + 1}}, \quad 0 < \beta < \infty$$
Here the phrase "prior distribution" refers to the fact that we know $\pi(\beta)$ prior to our observation of $x$ claims in Year 1. In other words, $\pi(\beta)$ hasn't incorporated our observation of $x$ claims in Year 1 yet. Please note that the prior distribution, not the posterior distribution, is used in Bühlmann's credibility estimate.
Frankly, I think SOA's emphasis that $\pi(\beta)$ is the prior (as opposed to posterior) distribution is unnecessary and really meant to scare exam candidates. When we talk about a density function, we always refer to the prior distribution. So there's never a need to say "prior distribution." If we want to refer to a distribution that has incorporated our recent observations, at that time we say "posterior distribution."
Back to the problem. We are told that $\beta$ has a Pareto distribution. Is it a one-parameter Pareto or a two-parameter Pareto? Many candidates have trouble knowing which one to use. Here is a simple rule:
If $X > \theta$, a positive constant, then use the single-parameter Pareto $f(x) = \dfrac{\alpha\theta^\alpha}{x^{\alpha + 1}}$;
If $X > 0$, then use the two-parameter Pareto $f(x) = \dfrac{\alpha\theta^\alpha}{(x + \theta)^{\alpha + 1}}$.
In this problem, the Pareto random variable $\beta > 0$. So we should use the two-parameter Pareto formula in Tables for Exam C/4.
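$$E(X) = \frac{\theta}{\alpha - 1}, \quad E(X^2) = \frac{2\theta^2}{(\alpha - 1)(\alpha - 2)}$$
$$\operatorname{Var}(X) = E(X^2) - E^2(X) = \frac{2\theta^2}{(\alpha - 1)(\alpha - 2)} - \left(\frac{\theta}{\alpha - 1}\right)^2 = \frac{\alpha\theta^2}{(\alpha - 1)^2(\alpha - 2)}$$
Since the two-parameter Pareto is frequently tested in Exam C, you might want to memorize the following formulas:
$$E(X) = \frac{\theta}{\alpha - 1}, \quad E(X^2) = \frac{2\theta^2}{(\alpha - 1)(\alpha - 2)}, \quad \operatorname{Var}(X) = \frac{\alpha\theta^2}{(\alpha - 1)^2(\alpha - 2)}$$
$\beta$ is a two-parameter Pareto random variable with pdf $\pi(\beta) = \dfrac{\alpha}{(\beta + 1)^{\alpha + 1}}$. So the two parameters are $\alpha$ and $\theta = 1$:
$$E(\beta) = \frac{1}{\alpha - 1}, \quad E(\beta^2) = \frac{2}{(\alpha - 1)(\alpha - 2)}, \quad \operatorname{Var}(\beta) = \frac{\alpha}{(\alpha - 1)^2(\alpha - 2)}$$
$$EV = E(\beta) + E(\beta^2) = \frac{1}{\alpha - 1} + \frac{2}{(\alpha - 1)(\alpha - 2)} = \frac{\alpha}{(\alpha - 1)(\alpha - 2)}$$
$$VE = \operatorname{Var}(\beta) = \frac{\alpha}{(\alpha - 1)^2(\alpha - 2)}, \quad \mu = E(\beta) = \frac{1}{\alpha - 1}$$
$$k = \frac{EV}{VE} = \frac{\dfrac{\alpha}{(\alpha - 1)(\alpha - 2)}}{\dfrac{\alpha}{(\alpha - 1)^2(\alpha - 2)}} = \alpha - 1$$
$$P = \frac{k\mu + \sum_{i=1}^{n} X_i}{n+k} = \frac{k\mu + X_1}{1+k} = \frac{(\alpha - 1)\dfrac{1}{\alpha - 1} + x}{1 + (\alpha - 1)} = \frac{x + 1}{\alpha}$$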
Solution
Let $N$ represent the annual number of claims for a randomly selected risk.
Let $X$ represent the loss dollar amount per loss incident.
Let $S$ represent the aggregate annual claim dollar amount incurred by a risk.
Then $S = \sum_{i=1}^{N} X_i = X_1 + X_2 + \dots + X_N$.
$N$ is a Poisson random variable with mean $\lambda$. So $f_N(n \mid \lambda) = e^{-\lambda}\dfrac{\lambda^n}{n!}$ ($n = 0, 1, 2, \dots$).
Here $\lambda$ is an exponential random variable with pdf $f(\lambda) = e^{-\lambda}$. We have $E(\lambda) = 1$, $\operatorname{Var}(\lambda) = 1^2 = 1$, and $E(\lambda^2) = E^2(\lambda) + \operatorname{Var}(\lambda) = 1^2 + 1 = 2$.
{
EV = E0 , Var S ( 0 , ) } = E ( 20 ) . 0,
2
{E S (0, ) } = Var (0 ) = E 0, (0 ) E 0, (0 )
2
VE = Var 0 ,
2
0,
(0 ) = E 0, (0 2 2 ) = E (0 2 ) E ( ) = 2 ( 2) = 4
2 2
E 0,
E 0, ( 0 ) = E ( 0 ) E ( ) = 1(1) = 1
(0 ) E 0, (0 )
2
VE = E 0 , = 4 12 = 3
2
EV 4
k= =
VE 3
Solution
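$$k = \frac{EV}{VE} = \frac{E(\lambda)}{\operatorname{Var}(\lambda)}$$
Here we are given that $F(\lambda) = 1 - \left(\dfrac{1}{\lambda + 1}\right)^{2.6}$. So $\lambda$ is a two-parameter Pareto random variable with parameters $\theta = 1$ and $\alpha = 2.6$.
So $E(\lambda) = \dfrac{1}{2.6 - 1} = \dfrac{1}{1.6}$ and $\operatorname{Var}(\lambda) = \dfrac{1^2(2.6)}{(2.6 - 1)^2(2.6 - 2)} = \dfrac{2.6}{1.6^2(0.6)}$
$$k = \frac{EV}{VE} = \frac{E(\lambda)}{\operatorname{Var}(\lambda)} = \frac{\dfrac{1}{1.6}}{\dfrac{2.6}{1.6^2(0.6)}} = \frac{1.6(0.6)}{2.6} = 0.369$$
$$Z = \frac{n}{n+k} = \frac{5}{5 + 0.369} = 0.93$$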
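The two-parameter Pareto moment formulas are a frequent source of slips, so here is a short numerical check of this solution. It follows the relation $k = E/\operatorname{Var}$ used above and plugs in $\theta = 1$, $\alpha = 2.6$ and $n = 5$; the variable names are illustrative.

```python
alpha, theta, n = 2.6, 1.0, 5

# Two-parameter Pareto mean and variance (valid for alpha > 2).
pareto_mean = theta / (alpha - 1)
pareto_var = alpha * theta ** 2 / ((alpha - 1) ** 2 * (alpha - 2))

k = pareto_mean / pareto_var   # EV / VE as used in the solution above
z = n / (n + k)
print(round(pareto_mean, 4), round(pareto_var, 4), round(k, 3), round(z, 2))
```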
Solution
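Let $N$ represent the claim counts, $X_i$ the dollar amount of the $i$-th claim, and $S$ the aggregate losses. $N \mid \lambda$ has a Poisson distribution with mean $\lambda$. $X_i \mid \mu, \sigma$ has a lognormal distribution with parameters $\mu$ and $\sigma$. In addition, for $i = 1$ to $N$, the $X_i \mid \mu, \sigma$ are independent identically distributed.
$$S = \sum_{i=1}^{N} X_i$$
$$E(S \mid \lambda, \mu, \sigma) = E(N \mid \lambda)\,E(X \mid \mu, \sigma) = \lambda E(X \mid \mu, \sigma)$$
$$\operatorname{Var}(S \mid \lambda, \mu, \sigma) = E(N \mid \lambda)\operatorname{Var}(X \mid \mu, \sigma) + \operatorname{Var}(N \mid \lambda)\,E^2(X \mid \mu, \sigma)$$
$$= \lambda\operatorname{Var}(X \mid \mu, \sigma) + \lambda E^2(X \mid \mu, \sigma) = \lambda E(X^2 \mid \mu, \sigma)$$
From Tables for Exam C/4, we know the lognormal distribution has the following moments:
$$E(X^k) = \exp\left(k\mu + \frac{1}{2}k^2\sigma^2\right)$$
$$E(X) = \exp\left(\mu + \frac{1}{2}\sigma^2\right), \quad E(X^2) = \exp\left(2\mu + \frac{1}{2}2^2\sigma^2\right) = \exp\left(2\mu + 2\sigma^2\right)$$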
E ( S 0, , ) = 0E ( X ) = 0 exp 1
, + 2
E ( S 0, , ) 1
= E0 , , = E0 , , 0 exp + 2
1 1 1
1
= 0 exp + 2
f (0, , ) d0 d d
= 0 = 0 0 =0
2
1 1 1 1 1 1
1
= 0 exp + 2 d0 d d = e 0d 0 d d
2
2
2 e0.5
= 0 = 0 0 =0
2 =0 =0 0 =0
1 1 1
1 1
= e d d = ( e 1)
2 2
2 e0.5 2 e0.5 d
2 =0 =0 2 =0
Set 0.5 2
= y . Then d = dy .
1 0.5
= ( e 1) e0.5 d = ( e 1) e y dy = ( e 1) ( e0.5 1)
2
=0 y =0
E ( S 0, , ) E ( S 0, , ) {E E ( S 0, , )}
2 2
VE = Var 0 , , = E 0 , , 0 , ,
0 2 exp ( 2 + )
1
E 0 , , 0 exp + 2
= E 0 , , 2
1 1 1
= 0 2 exp ( 2 + 2
) f ( 0, , ) d 0 d d
= 0 = 0 0 =0
1 1 1
= 0 2 exp ( 2 + 2
)2 d0 d d
= 0 = 0 0 =0
1 1 1 1 1
2 exp ( ) 2 exp ( )
1
= 2
exp ( 2 ) 0 2d 0 d d = 2
exp ( 2 ) d d
=0 =0 0 =0 3 =0 =0
1 1
2 exp ( ) 12 ( e 1) d ( e 1) exp ( )d
1 1 2
= 2 2
= 2
3 =0
3 =0
2 1 1
E 0 , , 0 exp +
1
2
2
=
3
( e 1)
1 2
exp ( 2
)d =
3
(
1 2
e 1)
1 y
2
e dy
=0 y =0
=
1 1
3 2
(e 2
1) ( e 1) = ( e 1) ( e 1)
1 2
6
{E E ( S 0, , )} = 2 = ( e 1) ( e0.5 1)
2 2 2
0 , ,
E ( S 0, , ) {E E ( S 0, , )}
2 2
VE = E 0 , , 0 , ,
( e 1) ( e 1)
1 2
( e 1) (e 1) = 0.5872
2
=
2 0.5
EV = E0 , , Var ( S 0 , , ) = E0 , , 0 exp ( 2 + 2 2
)
1 1 1
= 0 exp ( 2 + 2 2
) f (0, , ) d 0 d d
= 0 = 0 0 =0
1 1 1
= 0 exp ( 2 + 2 2
)2 d0 d d
= 0 = 0 0 =0
1 1 1 1 1
2 exp ( 2 ) 2 exp ( 2 )
1
= 2
exp ( 2 ) 0d 0 d d = 2
exp ( 2 ) d d
=0 =0 0 =0 2 =0 =0
( e 1)
1 1 2
2 exp ( 2 )d (
1 1 2
e 1) ( e 2 1) = ( e2 1) = 5.103
1 1 2
= 2
=
2 2 =0
2 2 2 8
EV 5.103
k= = = 8.69
VE 0.5872
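$$f_\lambda(\lambda) = 1, \; 0 < \lambda < 1; \quad f_\mu(\mu) = 1, \; 0 < \mu < 1; \quad f_\sigma(\sigma) = 2\sigma, \; 0 < \sigma < 1.$$
The global mean:
$$E(S) = E_{\lambda,\mu,\sigma}[E(S \mid \lambda, \mu, \sigma)] = E_{\lambda,\mu,\sigma}\left[\lambda\exp\left(\mu + \frac{1}{2}\sigma^2\right)\right] = E(\lambda)\,E(e^\mu)\,E\left(e^{0.5\sigma^2}\right)$$
$$E(\lambda) = \frac{1}{2}, \quad E(e^\mu) = \int_0^1 e^u\,du = e - 1, \quad E\left(e^{0.5\sigma^2}\right) = \int_0^1 e^{0.5\sigma^2}\,2\sigma\,d\sigma = 2e^{0.5\sigma^2}\Big|_0^1 = 2\left(e^{0.5} - 1\right)$$
$$E(S) = E(\lambda)\,E(e^\mu)\,E\left(e^{0.5\sigma^2}\right) = (e - 1)\left(e^{0.5} - 1\right)$$
$$EV = E_{\lambda,\mu,\sigma}[\operatorname{Var}(S \mid \lambda, \mu, \sigma)] = E_{\lambda,\mu,\sigma}\left[\lambda\exp\left(2\mu + 2\sigma^2\right)\right] = E(\lambda)\,E(e^{2\mu})\,E\left(e^{2\sigma^2}\right)$$
$$E(e^{2\mu}) = \int_0^1 e^{2u}\,du = \frac{1}{2}e^{2u}\Big|_0^1 = \frac{1}{2}\left(e^2 - 1\right)$$
$$E\left(e^{2\sigma^2}\right) = \int_0^1 e^{2\sigma^2}\,2\sigma\,d\sigma = \frac{1}{2}e^{2\sigma^2}\Big|_0^1 = \frac{1}{2}\left(e^2 - 1\right)$$
$$EV = E(\lambda)\,E(e^{2\mu})\,E\left(e^{2\sigma^2}\right) = \frac{1}{2}\cdot\frac{1}{2}\left(e^2 - 1\right)\cdot\frac{1}{2}\left(e^2 - 1\right) = \frac{1}{8}\left(e^2 - 1\right)^2 = 5.103$$
$$VE = \operatorname{Var}_{\lambda,\mu,\sigma}[E(S \mid \lambda, \mu, \sigma)] = E_{\lambda,\mu,\sigma}\left\{[E(S \mid \lambda, \mu, \sigma)]^2\right\} - \left\{E_{\lambda,\mu,\sigma}[E(S \mid \lambda, \mu, \sigma)]\right\}^2$$
$$E_{\lambda,\mu,\sigma}\left\{[E(S \mid \lambda, \mu, \sigma)]^2\right\} = E_{\lambda,\mu,\sigma}\left[\lambda^2\exp\left(2\mu + \sigma^2\right)\right] = E(\lambda^2)\,E(e^{2\mu})\,E\left(e^{\sigma^2}\right)$$
$$E(\lambda^2) = \int_0^1 \lambda^2\,d\lambda = \frac{1}{3}, \quad E(e^{2\mu}) = \frac{1}{2}\left(e^2 - 1\right), \quad E\left(e^{\sigma^2}\right) = \int_0^1 e^{\sigma^2}\,2\sigma\,d\sigma = e^{\sigma^2}\Big|_0^1 = e - 1$$
$$VE = \frac{1}{3}\cdot\frac{1}{2}\left(e^2 - 1\right)(e - 1) - (e - 1)^2\left(e^{0.5} - 1\right)^2 = 0.5872$$
$$k = \frac{EV}{VE} = \frac{5.103}{0.5872} = 8.69$$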
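Because $\lambda$, $\mu$ and $\sigma$ are independent, each expectation factors into one-dimensional integrals with closed forms, which makes the result easy to verify numerically. The sketch below plugs the closed forms from the solution above into Python; the variable names are illustrative and nothing here goes beyond the stated densities (uniform for $\lambda$ and $\mu$ on $(0,1)$, and $2\sigma$ for $\sigma$ on $(0,1)$).

```python
import math

e = math.e

# One-dimensional expectations under the three independent priors.
e_lam, e_lam2 = 1 / 2, 1 / 3                 # lambda ~ uniform(0, 1)
e_exp_mu = e - 1                             # E[exp(mu)]
e_exp_2mu = (e ** 2 - 1) / 2                 # E[exp(2 mu)]
e_exp_half_sig2 = 2 * (math.exp(0.5) - 1)    # E[exp(0.5 sigma^2)], density 2 sigma
e_exp_sig2 = e - 1                           # E[exp(sigma^2)]
e_exp_2sig2 = (e ** 2 - 1) / 2               # E[exp(2 sigma^2)]

mean_s = e_lam * e_exp_mu * e_exp_half_sig2              # global mean E(S)
ev = e_lam * e_exp_2mu * e_exp_2sig2                     # E[Var(S | .)]
ve = e_lam2 * e_exp_2mu * e_exp_sig2 - mean_s ** 2       # Var[E(S | .)]
k = ev / ve
print(round(mean_s, 4), round(ev, 3), round(ve, 4), round(k, 2))
```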
The Bühlmann credibility estimate for the number of claims in Year 4 for this risk is 4.6019.
Determine $r$.
Solution
$$\mu = \frac{1}{2}(5 + 4.4) = 4.7, \quad EV = \frac{1}{2}(5 + 1.98) = 3.49$$
$$VE = (5 - 4.4)^2\,0.5^2 = 0.09$$
$$k = \frac{EV}{VE} = \frac{3.49}{0.09} = 38.78, \quad P = \frac{k\mu + \sum_{i=1}^{n} X_i}{n+k}$$
$$4.6019 = \frac{38.78(4.7) + (3 + r + 4)}{3 + 38.78}, \quad r = 3$$
Solution
The key to solving this problem is correctly identifying risk classes. There are four risk
classes:
= ( BR, BU , PR, PU )
Next, we need to calculate the probability of Rural Use and Urban Use.
Expected claims
Rural 1.0
Urban 2.0
Total 1.8
$$k = \frac{EV}{VE} = \frac{0.93}{0.2225} = 4.18$$
$$Z = \frac{n}{n+k} = \frac{1}{1 + 4.18} = 0.193$$
E ( X n +1 X 1 , X 2 ,..., X n )
Now we move from the Bühlmann credibility world to a more complex one, the Bühlmann-Straub credibility world. Instead of looking at only one policyholder, we look at a group of policyholders.
In Year 2, there are $m_2$ policyholders. The 1st policyholder has incurred $X(1, t=2)$ claim. The 2nd policyholder has incurred $X(2, t=2)$ claim. And the $m_2$-th policyholder has incurred $X(m_2, t=2)$ claim amount.
In Year $t$, there are $m_t$ policyholders. The 1st policyholder has incurred $X(1, t)$ claim. The 2nd policyholder has incurred $X(2, t)$ claim. And the $m_t$-th policyholder has incurred $X(m_t, t)$ claim amount.
In Year $n$, there are $m_n$ policyholders. The 1st policyholder has incurred $X(1, t=n)$ claim. The 2nd policyholder has incurred $X(2, t=n)$ claim. And the $m_n$-th policyholder has incurred $X(m_n, t=n)$ claim amount.
We don't know the specific value of $\theta$. All we know is that $\theta$ takes on a random value from $\Theta = \{\theta_1, \theta_2, \dots\}$.
One approach is to calculate the renewal premium for Year $n+1$ from scratch. An easier approach is to convert the Bühlmann-Straub credibility problem into a standard Bühlmann credibility problem. I'll do both.
First, let's look at the problem from the Bühlmann world. In Year 1, $m_1$ policyholders have incurred a total of $\sum_{i=1}^{m_1} X(i, t=1)$ claim amount. Because these $m_1$ policyholders belong to the same, unknown, sub-risk $\theta$, there's no distinction between any two of these $m_1$ policyholders. All these $m_1$ policyholders are just photocopies of one another.
So
"In Year 1, $m_1$ policyholders have incurred a total of $\sum_{i=1}^{m_1} X(i, t=1)$ claim amount."
is the same as
"In the first $m_1$ years, one policyholder has incurred a total of $\sum_{i=1}^{m_1} X(i, t=1)$ claim amount."
In either case, the total claim amount is $\sum_{i=1}^{m_1} X(i, t=1)$; the average claim per policyholder per year is $\dfrac{1}{m_1}\sum_{i=1}^{m_1} X(i, t=1)$.
Similarly, the Year 2 data is the same as
"In the next $m_2$ years, the policyholder (who has incurred $\sum_{i=1}^{m_1} X(i, t=1)$ in the first $m_1$ years) has incurred total $\sum_{i=1}^{m_2} X(i, t=2)$ claim."
So on and so forth.
"In the next $m_n$ years, the policyholder has incurred total $\sum_{i=1}^{m_n} X(i, t=n)$ claim."
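Then the expected claim cost in Year $m+1$ for one policyholder can be calculated using the Bühlmann credibility formula:
$$P = Z\bar{X} + (1 - Z)\mu, \quad \text{where}$$
$$\bar{X} = \frac{\text{Total observed claims}}{\text{Total \# of observed years}} = \frac{1}{m}\sum_{t=1}^{n}\sum_{i=1}^{m_t} X(i, t)$$
$$Z = \frac{\text{\# of observation years}}{\text{\# of observation years} + k} = \frac{m}{m+k}, \quad k = \frac{E[\sigma^2(\theta)]}{\operatorname{Var}[\mu(\theta)]}$$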
Actually, we can have a unified formula for the Bühlmann-Straub and the Bühlmann credibility models:
$$P = Z\bar{X} + (1 - Z)\mu$$
$$Z = \frac{\text{\# of observed exposures}}{\text{\# of observed exposures} + k}, \quad k = \frac{E[\sigma^2(\theta)]}{\operatorname{Var}[\mu(\theta)]}$$
In this unified formula, the observed exposure is measured on the insured-year basis. For example, if one policyholder has incurred \$500 claim in one year, the exposure is 1 insured-year. If the policyholder has incurred \$500 claim in a 2-year period, then the exposure is 2 insured-years.
Let's see how the unified formula works for the Bühlmann and the Bühlmann-Straub credibility models. In the Bühlmann model, we have an $n$-year claim history of one policyholder. So the observed exposure is $n$:
$$Z = \frac{n}{n+k}, \quad \bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$$
Now you know how to convert a Bühlmann-Straub problem into a Bühlmann problem and how to use a unified formula for the Bühlmann-Straub model and the Bühlmann model. Next, I'll derive the Bühlmann credibility formula from scratch. First, let's create an "average" policyholder and reorganize each year's claim data from the viewpoint of this average policyholder.
Let's look at the claim history data in the Bühlmann-Straub model from the average policyholder's point of view:
In Year 1, $m_1$ policyholders have incurred total $\sum_{i=1}^{m_1} X(i, t=1)$ claim. So the average policyholder has incurred $\bar{X}_1 = \dfrac{1}{m_1}\sum_{i=1}^{m_1} X(i, t=1)$ claim.
In Year 2, the average policyholder has incurred total $\bar{X}_2 = \dfrac{1}{m_2}\sum_{i=1}^{m_2} X(i, t=2)$ claim.
In Year $t$, the average policyholder has incurred total $\bar{X}_t = \dfrac{1}{m_t}\sum_{i=1}^{m_t} X(i, t)$ claim.
In Year $n$, the average policyholder has incurred total $\bar{X}_n = \dfrac{1}{m_n}\sum_{i=1}^{m_n} X(i, t=n)$ claim.
Z=
(
Cov X , X n +1 ), a = (1 Z )
Var X ( )
mt mt mt
E ( Xt )=E 1 1 1
X ( i, t ) = E X ( i, t ) = ( )= ( )
mt i =1 mt i =1 mt i =1
1
2
( )
= 2 mt 2
( ) =
mt mt
( )
( )
n n n n
E X =E
mi
m
( Xi ) =
1
m
mi E ( X i ) =
1
m
mi ( ) =
m
mi = ( )
i =1 i =1 i =1 i =1
( )
( )
n n n 2
Var X = Var
mi
m
( Xi ) =
1
m2
mi 2Var ( X i ) =
1
m2
mi 2
mi
i =1 i =1 i =1
2
( ) n 2
( ) 2
( )
= 2
mi = 2 (m) =
m i =1 m m
( )
Cov X , X n +1 = E X X n +1 ( ) E(X )E(X n +1 )
(
E X X n +1 = E ) (
E X X n +1 ) =E E X ( )E(X n +1 ) =E ( )
2
( )
E X =E E X ( ) =E ( )
E ( X n +1 ) = E E ( X n +1 ) =E ( )
(
Cov X , X n +1 = E ) ( )
2
E ( ) = Var ( )
Exposure m1 mn
Var ( X 1
2
( ) 2
( )
Process variance risk )= m1
Var ( X n )= mn
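Then
$$E(X_{n+1} \mid \bar{X}_1, \bar{X}_2, \dots, \bar{X}_n) \approx Z\bar{X} + (1 - Z)\mu$$
$$\bar{X} = \sum_{i=1}^{n}\frac{m_i}{m}\bar{X}_i, \quad Z = \frac{m}{m+k}, \quad k = \frac{E[\sigma^2(\theta)]}{\operatorname{Var}[\mu(\theta)]}$$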
Key point
In the Bühlmann-Straub credibility model, what matters is the total exposure $m$ and the historical average claim per exposure $\bar{X}$. The individual claim amount $X(i, t)$ doesn't matter.
For example, everything else being equal, the following two cases have the same Bühlmann-Straub credibility estimate.
Case #2
$m_1 = 1$, $X(1, t=1) = 9$;
$m_2 = 4$, $X(1, t=2) = 3$, $X(2, t=2) = 0.6$, $X(3, t=2) = 1$, $X(4, t=2) = 0.4$.
In both cases,
the total exposure is $m_1 + m_2 = 5$;
the total claim dollar amount is 14 = 7 + 0 + 4 + 2 + 1 = 9 + 3 + 0.6 + 1 + 0.4;
the average claim per insured per year is 14/5 = 2.8.
Loss Models mentions Hewitt's version of the Bühlmann-Straub credibility model. This model assumes that the $\bar{X}_i$, the average claims, given the sub-risk class $\theta$, are independently distributed with a common mean $E(\bar{X}_i \mid \theta) = \mu(\theta)$ and a variance
$$\operatorname{Var}(\bar{X}_i \mid \theta) = w(\theta) + \frac{\sigma^2(\theta)}{m_i}.$$
So the difference between the general and the standard Bühlmann-Straub model is about
$$\operatorname{Var}(\bar{X}_i \mid \theta) = w(\theta) + \frac{\sigma^2(\theta)}{m_i};$$
If m1 = m2 = ...mn = m , then
n n
1 1 n
m* = = = ,
v v v
j =1
w+ j =1
w+ w+
mj m m
am * 1 1 n
Z= = = =
1 + am * 1 1 v
w+
v
1+ w+
1 m
a m* 1+ m n+
a n a
( )
mj X j
First, X . Here X j s weight is inversely proportional E Var X j , the
mj
expected process variance . The higher the expected process variance of X j , the less
weight is assigned to X j . This way, X will have the minimum variance. This point is
explained in the study note by Curtis Gary Dean. Refer to this study note if you want to
find out more.
am *
Next, lets look at the crazy formula Z = . To get comfortable with this formula,
1 + am *
n 1
look at the basic formula Z = = . Lets compare these two formulas:
v v
n+ 1+
a na
n 1 am * 1
Z= = , Z= =
n+
v 1 v 1 + am * 1 1
1+ 1+
a a n a m*
n n
1 1 n 1 1
m* = = = , Z= =
j =1 E Var X j ( ) j =1 v v
1+
1 1
1+
1 v
a m* a n
1 n
This is the Bhlmann credibility premium formula Z = = .
1 v v
1+ n +
a n a
The third point. Loss Models points out that in this version of the model, as m j
approaches infinity, the credibility factor Z wont approach to one. Lets take a look at
this.
Var ( X i
( ) 2
) = w( ) + m
i
, Var ( X i
2
( )
When m j ) = w( )+
mi
w( ).
n
1 n 1 1
m* = = , Z= = <1
j =1 w w 1 1 1 w
1+ 1+
a m* a n
Compare this with the Bhlmann model or the Bhlmann-Straub model. In the Bhlmann
model, as the number of exposures n approaches ,
n 1
Z= = 1
v 1 v
n+ 1+
a a n
Var ( X i
2
( ) n
)= 0, m* =
1
mi j =1 (
E Var X j )
1
Z= 1
1 1
1+
a m*
b
Whats new is Var ( ) =a+ , as opposed to Var ( ) = a in the Bhlmann
m
n
model and the Bhlmann-Straub model. Here m = m j represents the total exposure.
j =1
b
As you can see, this special case just changes Var ( ) = a to Var ( ) =a+ . In
m
b
other words, this special case just changes a to a + . Loss Models points out to find
m
b
the credibility factor for this special case, we just need to change a to a + :
m
b
a+ m*
am * m
Z= Z=
1 + am * b
1+ a + m*
m
Most likely, Exam C won't have problems on the generalized version of the Bühlmann-Straub model. So you should focus on the standard Bühlmann-Straub model. To tackle the standard Bühlmann-Straub model, you can use any of the following 3 approaches:
Use the Bühlmann-Straub model formula $Z = \dfrac{m}{m+k}$
Use the unified formula (without converting into the Bühlmann model):
$$Z = \frac{\text{\# of observation years}}{\text{\# of observation years} + k} = \frac{m}{m+k}$$
The overall average loss per employee for all policyholders is 20.
Determine the Bühlmann-Straub credibility premium per employee for this policyholder.
Solution
So $k = \dfrac{EV}{VE} = \dfrac{8{,}000}{40} = 200$
$$Z = \frac{m}{m+k} = \frac{1800}{1800 + 200} = 0.9$$
$$\bar{X} = \frac{1}{m}\sum_{i=1}^{3} m_i\bar{X}_i = \frac{800(15) + 600(10) + 400(5)}{1800} = 11.111$$
Alternatively,
$$P = \frac{k\mu + \sum_{i=1}^{3} m_i\bar{X}_i}{m+k} = \frac{200(20) + 800(15) + 600(10) + 400(5)}{1800 + 200} = 12$$
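The two routes to the premium (credibility-weighted average versus the "k extra exposures at the global mean" form) can be compared directly. The sketch below uses the figures from this solution: exposures of 800, 600 and 400 employees with average losses 15, 10 and 5, $EV = 8{,}000$, $VE = 40$ and a global mean of 20. Variable names are illustrative.

```python
exposures = [800, 600, 400]     # employees in each observed year
avg_loss = [15, 10, 5]          # average loss per employee in each year
ev, ve, mu = 8_000, 40, 20

m = sum(exposures)
total_loss = sum(m_i * x_i for m_i, x_i in zip(exposures, avg_loss))
x_bar = total_loss / m          # exposure-weighted average loss

k = ev / ve
z = m / (m + k)

premium = z * x_bar + (1 - z) * mu            # P = Z * X_bar + (1 - Z) * mu
premium_alt = (k * mu + total_loss) / (m + k)  # equivalent single-fraction form
print(round(x_bar, 3), k, z, round(premium, 6), round(premium_alt, 6))
```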
Into
The above two tables are essentially the same. In both tables, the average loss per
employee per year is
After the conversion, the # of observation years is $n = 800 + 600 + 400 = 1800$. This seems crazy, but it is merely a conceptual tool for us to transform a Bühlmann-Straub problem into a Bühlmann problem.
$$Z = \frac{n}{n+k} = \frac{1800}{1800 + 200} = 0.9, \quad P = Z\bar{X} + (1 - Z)\mu = 0.9(11.111) + 0.1(20) = 12$$
In this method, we don't care about the distinction between the Bühlmann and the Bühlmann-Straub models. We just use the following unified formulas:
$$\bar{X} = \frac{\text{observed claims}}{\text{\# of observed exposures (measured on the insured-year basis)}},$$
$$Z = \frac{\text{\# of observed exposures}}{\text{\# of observed exposures} + k}, \quad k = \frac{E[\sigma^2(\theta)]}{\operatorname{Var}[\mu(\theta)]}$$
Solution
n n n
Z= = =
n + k n + EV w+
v
VE n + m
a
v n
Then as m approaches infinity, w + approaches w and Z approaches Z = .
m w
n+
a
Incidentally, this leads to the correct answer. However, this line of thinking is problematic. As explained earlier, in the Bühlmann-Straub model, the credibility factor is
The correct logic is to realize that this problem involves a special Bühlmann-Straub
$Z = \dfrac{a m^*}{1 + a m^*} = \dfrac{1}{1 + \dfrac{1}{a m^*}} = \dfrac{1}{1 + \dfrac{w + \frac{v}{m}}{a n}} = \dfrac{n}{n + \dfrac{w + \frac{v}{m}}{a}}$
As $m \to \infty$, $\dfrac{v}{m} \to 0$, and $Z = \dfrac{n}{n + \dfrac{w + \frac{v}{m}}{a}} \to \dfrac{n}{n + \frac{w}{a}}$
Nov 2004 #9
Members of three classes of insureds can have 0, 1, or 2 claims, with the following
probabilities:
# of claims
Class 0 1 2
I 0.9 0.0 0.1
II 0.8 0.1 0.1
III 0.7 0.2 0.1
A class is chosen at random, and varying # of insureds from that class are observed over
2 years, as shown below:
Solution
$k = \dfrac{EV}{VE} = \dfrac{0.4033}{0.00667} = 60.5$, $Z = \dfrac{m}{m+k} = \dfrac{50}{50+60.5} = 0.4525$, $\bar{X} = \dfrac{7+10}{50} = 0.34$
$P = \dfrac{k\mu + \sum_{i=1}^{2} m_i X_i}{m+k} = \dfrac{60.5(0.3) + (10+7)}{50+60.5} = 0.318$
into
$\bar{X} = \dfrac{7+10}{50} = 0.34$
$Z = \dfrac{n}{n+k} = \dfrac{50}{50+60.5} = 0.4525$, $P = 0.4525(0.34) + (1-0.4525)(0.3) = 0.318$
In this method, we don't care about the distinction between the Bühlmann and the Bühlmann-Straub models.
$Z = \dfrac{\text{\# of observed exposures}}{\text{\# of observed exposures} + k} = \dfrac{50}{50+60.5} = 0.4525$
$\bar{X} = \dfrac{7+10}{50} = 0.34$
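The inputs EV = 0.4033 and VE = 0.00667 used above come straight from the class table in Nov 2004 #9. Here is a small Python sketch of that calculation. The dictionary layout and names are my own; the exposure of 50 and the 7 + 10 claims are the totals quoted in the solution.

```python
# Claim-count distribution for each class, copied from the problem table.
classes = {
    "I":   {0: 0.9, 1: 0.0, 2: 0.1},
    "II":  {0: 0.8, 1: 0.1, 2: 0.1},
    "III": {0: 0.7, 1: 0.2, 2: 0.1},
}
prior = 1 / len(classes)   # "a class is chosen at random"

# Hypothetical mean and process variance of each class.
means = [sum(n * p for n, p in dist.items()) for dist in classes.values()]
variances = [sum(n * n * p for n, p in dist.items()) - mean ** 2
             for dist, mean in zip(classes.values(), means)]

mu = sum(prior * mean for mean in means)                  # overall mean
ev = sum(prior * var for var in variances)                # expected process variance
ve = sum(prior * mean ** 2 for mean in means) - mu ** 2   # variance of hypothetical means
k = ev / ve

exposure, claims = 50, 7 + 10   # totals used in the solution above
z = exposure / (exposure + k)
premium = z * claims / exposure + (1 - z) * mu

print(round(ev, 4), round(ve, 5), round(k, 1), round(z, 4), round(premium, 3))
```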
        # of claims
Class   0     1
I       0.9   0.1
II      0.8   0.2
III     0.5   0.5
IV      0.1   0.9
A class is selected at random, and four insureds are selected at random from the class. The total number of claims is two. If five insureds are selected at random from the same class, estimate the total number of claims using the Bühlmann-Straub credibility.
Solution
If $X = a$ with probability $p$ and $X = b$ with probability $q = 1 - p$, then $E(X) = ap + bq$, $Var(X) = (a-b)^2 pq$
$\mu = \frac{1}{4}(0.1 + 0.2 + 0.5 + 0.9) = 0.425$
$EV = \frac{1}{4}(0.09 + 0.16 + 0.25 + 0.09) = 0.1475$
$VE = \frac{1}{4}\left(0.1^2 + 0.2^2 + 0.5^2 + 0.9^2\right) - 0.425^2 = 0.096875$
$k = \dfrac{EV}{VE} = \dfrac{0.1475}{0.096875} = 1.5226$, $Z = \dfrac{m}{m+k} = \dfrac{4}{4+1.5226} = 0.7243$, $\bar{X} = \dfrac{2}{4} = 0.5$
$f(\lambda) = \dfrac{(100\lambda)^6 e^{-100\lambda}}{120\lambda}$
Solution
This time, let's solve it by converting the Bühlmann-Straub credibility problem into a Bühlmann credibility problem.
This table
Is the same as
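So the total number of observation years is $n = 100 + 150 + 200 = 450$. The total # of observed claims is $6 + 8 + 11 = 25$. So $\bar{X} = \dfrac{25}{450}$.
$E(N \mid \lambda) = Var(N \mid \lambda) = \lambda$
$\mu = EV = E_\lambda\left[Var(N \mid \lambda)\right] = E_\lambda[\lambda] = \alpha\theta = 6\left(\dfrac{1}{100}\right) = 0.06$
$VE = Var(\lambda) = \alpha\theta^2 = 6\left(\dfrac{1}{100}\right)^2 = 0.0006$
$k = \dfrac{EV}{VE} = \dfrac{0.06}{0.0006} = 100$, $Z = \dfrac{n}{n+k} = \dfrac{450}{450+100} = 0.818$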
The # of claims for each insured each year has a Poisson distribution.
Each insured in a territory has the same expected claim frequency.
The # of insureds is constant over time for each territory.
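Determine the Bühlmann-Straub empirical Bayes estimate of the credibility factor Z for Territory A.
Solution
$\mu = E\left[E(X \mid \lambda)\right] = E\left[Var(X \mid \lambda)\right] = EV = \frac{1}{6}(0.4) + \frac{2}{6}(0.25) + \frac{3}{6}(0.1) = 0.2$
$VE = Var\left[E(X \mid \lambda)\right] = E(\lambda^2) - \mu^2 = \frac{1}{6}\left(0.4^2\right) + \frac{2}{6}\left(0.25^2\right) + \frac{3}{6}\left(0.1^2\right) - 0.2^2 = 0.0125$
$k = \dfrac{EV}{VE} = \dfrac{0.2}{0.0125} = 16$, $Z = \dfrac{n}{n+k} = \dfrac{10}{10+16} = 0.385$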
This topic is among the least interesting ones in Exam C. However, it was repeatedly
tested in Exam C. The exam problems on this topic are easy. The difficulty is to
memorize the formulas. In this chapter, I will show you some ideas behind the formulas
to help you memorize the formulas.
We have $n$ years of claim data about $r$ risks. For each risk, we have its claim amount in Year 1, Year 2, ..., Year $n$. Let $X_{ij}$ represent the claim incurred by the $i$-th policyholder in Year $j$. This is what we know:
Risk   Year 1   Year 2   ...   Year n
1      X_11     X_12     ...   X_1n
2      X_21     X_22     ...   X_2n
...
r      X_r1     X_r2     ...   X_rn
The issue here is that we don't know the probability distribution of the conditional claim random variable $X \mid \theta$ or the probability distribution of the risk variable $\theta$. As a result, we can't calculate the two inputs for the credibility factor $Z$: the expected process variance $EV = E\left[Var(X \mid \theta)\right]$ and the variance of the hypothetical mean $VE = Var\left[E(X \mid \theta)\right]$. So we need to estimate $EV$ and $VE$ from the past claim data given to us.
It's easy to estimate $EV = E\left[Var(X \mid \theta)\right]$. We can estimate $Var(X \mid \theta)$ for each risk using the formula $\hat{\sigma}_i^2 = \dfrac{1}{n-1}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2$. Then we'll take the average and find $\widehat{EV}$. This estimation process can be summarized as follows:
For each risk $i = 1, 2, \dots, r$ with data $X_{i1}, X_{i2}, \dots, X_{in}$:
Sample mean: $\bar{X}_i = \dfrac{1}{n}\sum_{t=1}^{n} X_{it}$
Sample variance: $\hat{\sigma}_i^2 = \dfrac{1}{n-1}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2$
$\widehat{EV} = \dfrac{1}{r}\sum_{i=1}^{r}\hat{\sigma}_i^2 = \dfrac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2$.
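$Var(\bar{X}) = Var\left[\mu(\theta)\right] + \dfrac{1}{n}E\left[Var(X \mid \theta)\right] = VE + \dfrac{1}{n}EV$
$VE = Var\left[\mu(\theta)\right] = Var(\bar{X}) - \dfrac{1}{n}E\left[Var(X \mid \theta)\right] = Var(\bar{X}) - \dfrac{1}{n}EV$
$\widehat{VE} = \widehat{Var}(\bar{X}) - \dfrac{1}{n}\widehat{EV}$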
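$\widehat{Var}(\bar{X})$ is simple to calculate: $\widehat{Var}(\bar{X}) = \dfrac{1}{r-1}\sum_{i=1}^{r}\left(\bar{X}_i - \bar{X}\right)^2$
So $\widehat{VE} = \widehat{Var}(\bar{X}) - \dfrac{1}{n}\widehat{EV} = \dfrac{1}{r-1}\sum_{i=1}^{r}\left(\bar{X}_i - \bar{X}\right)^2 - \dfrac{1}{n}\cdot\dfrac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2$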
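$\hat{\sigma}_i^2 = \dfrac{1}{n-1}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2$, $\widehat{EV} = \dfrac{1}{r}\sum_{i=1}^{r}\hat{\sigma}_i^2 = \dfrac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2$
Step 2 Calculate $\widehat{Var}(\bar{X}) = \dfrac{1}{r-1}\sum_{i=1}^{r}\left(\bar{X}_i - \bar{X}\right)^2$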
$\sum_{i=1}^{4}\sum_{j=1}^{7}\left(X_{ij} - \bar{X}_i\right)^2 = 33.6$
$\sum_{i=1}^{4}\left(\bar{X}_i - \bar{X}\right)^2 = 3.3$
Solution
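Step 1 $\widehat{EV} = \dfrac{1}{r}\sum_{i=1}^{r}\hat{\sigma}_i^2 = \dfrac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2 = \dfrac{1}{4(7-1)}(33.6) = 1.4$
Step 2 $\widehat{Var}(\bar{X}) = \dfrac{1}{r-1}\sum_{i=1}^{r}\left(\bar{X}_i - \bar{X}\right)^2 = \dfrac{3.3}{4-1} = 1.1$
Step 3 $Var(\bar{X}) = VE + \dfrac{1}{n}EV$, so $\widehat{VE} = 1.1 - \dfrac{1.4}{7} = 0.9$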
$\sum_{i=1}^{4}\sum_{j=1}^{7}\left(X_{ij} - \bar{X}_i\right)^2 = 33.6$
$\sum_{i=1}^{4}\left(\bar{X}_i - \bar{X}\right)^2 = 3.3$
Using the nonparametric empirical Bayes estimation, calculate the Bühlmann credibility factor for an individual policyholder.
Solution
Year
Policyholder 1 2 3 4
X 730 800 650 700
Y 655 650 625 750
Using the nonparametric empirical Bayes estimation, calculate the Bühlmann credibility factor for Policyholder Y.
Solution
$r = 2$, $n = 4$.
Step 1 Calculate the sample conditional variance for each risk and the mean
$\hat{\sigma}_i^2 = \dfrac{1}{n-1}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2$, $\widehat{EV} = \dfrac{1}{r}\sum_{i=1}^{r}\hat{\sigma}_i^2 = \dfrac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{t=1}^{n}\left(X_{it} - \bar{X}_i\right)^2$
$\widehat{Var}(X) = \dfrac{1}{4-1}\left[(730-720)^2 + (800-720)^2 + (650-720)^2 + (700-720)^2\right] = 3{,}933.33$
$\widehat{Var}(Y) = \dfrac{1}{4-1}\left[(655-670)^2 + (650-670)^2 + (625-670)^2 + (750-670)^2\right] = 3{,}016.67$
$\widehat{EV} = \dfrac{1}{2}(3{,}933.33 + 3{,}016.67) = 3{,}475$
Step 2 Calculate $\widehat{Var}(\bar{X}) = \dfrac{1}{r-1}\sum_{i=1}^{r}\left(\bar{X}_i - \bar{X}\right)^2$
$\widehat{Var}(\bar{X}) = \dfrac{1}{2-1}\left[(720-695)^2 + (670-695)^2\right] = 1{,}250$
$\widehat{VE} = \widehat{Var}(\bar{X}) - \dfrac{1}{n}\widehat{EV} = 1{,}250 - \dfrac{1}{4}(3{,}475) = 381.25$
$k = \dfrac{EV}{VE} = \dfrac{3{,}475}{381.25} = 9.115$, $Z_Y = \dfrac{m_Y}{m_Y + k} = \dfrac{4}{4 + 9.115} = 0.305$
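The three steps for the equal-exposure case can be reproduced in Python with the standard `statistics` module, whose `variance` function uses the n - 1 divisor. This is a sketch using the Policyholder X and Y data from the problem; the variable names are mine.

```python
from statistics import mean, variance

# Annual claims for each policyholder, copied from the problem table.
claims = {
    "X": [730, 800, 650, 700],
    "Y": [655, 650, 625, 750],
}
n = 4   # years observed for each policyholder

# Step 1: sample variance of each risk, then the average of those variances.
risk_means = [mean(x) for x in claims.values()]
ev_hat = mean([variance(x) for x in claims.values()])

# Step 2: sample variance of the risk means.
var_of_means = variance(risk_means)

# Step 3: back out VE from Var(X-bar) = VE + EV / n.
ve_hat = var_of_means - ev_hat / n

k = ev_hat / ve_hat
z = n / (n + k)
print(round(ev_hat, 2), var_of_means, round(ve_hat, 2), round(k, 3), round(z, 3))
```

If `ve_hat` comes out negative in some other data set, the usual convention is to set it to zero, in which case Z = 0. That guard is not needed for this example.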
Here the # of policyholders varies from risk to risk and year to year. For risk 1, $m_{11}$ policyholders have incurred $X_{11}$ claim amount in Year 1; $m_{12}$ policyholders have incurred $X_{12}$ claim amount in Year 2; ...; $m_{1n_1}$ policyholders have incurred $X_{1n_1}$ claim amount in Year $n_1$.
For risk 2, $m_{21}$ policyholders have incurred $X_{21}$ claim amount in Year 1; $m_{22}$ policyholders have incurred $X_{22}$ claim amount in Year 2; ...; $m_{2n_2}$ policyholders have incurred $X_{2n_2}$ claim amount in Year $n_2$. So on and so forth.
How to estimate:
For each risk $i = 1, 2, \dots, r$, observed over $n_i$ periods:
Total exposure: $m_i = \sum_{t=1}^{n_i} m_{it}$
Sample mean: $\bar{X}_i = \dfrac{1}{m_i}\sum_{t=1}^{n_i} m_{it} X_{it}$
Sample variance: $\hat{\sigma}_i^2 = \dfrac{1}{n_i - 1}\sum_{t=1}^{n_i} m_{it}\left(X_{it} - \bar{X}_i\right)^2$
Total over all risks: $m = \sum_{i=1}^{r} m_i$, $\bar{X} = \dfrac{1}{m}\sum_{i=1}^{r} m_i \bar{X}_i$, $\widehat{EV} = \dfrac{\sum_{i=1}^{r}(n_i - 1)\hat{\sigma}_i^2}{\sum_{i=1}^{r}(n_i - 1)}$
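$\hat{\sigma}_i^2 = \dfrac{1}{n_i - 1}\sum_{t=1}^{n_i} m_{it}\left(X_{it} - \bar{X}_i\right)^2$,
$\widehat{EV} = \dfrac{\sum_{i=1}^{r}(n_i - 1)\hat{\sigma}_i^2}{\sum_{i=1}^{r}(n_i - 1)} = \dfrac{\sum_{i=1}^{r}\sum_{t=1}^{n_i} m_{it}\left(X_{it} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)}$
Step 2 Calculate VE
$\widehat{VE} = \dfrac{\sum_{i=1}^{r} m_i\left(\bar{X}_i - \bar{X}\right)^2 - (r-1)\widehat{EV}}{m - \dfrac{1}{m}\sum_{i=1}^{r} m_i^2}$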
This formula is counter-intuitive and very hard to remember. However, you'll just have to memorize it. Perhaps Dean's explanation might help you a little bit. He says that the crude estimate for VE is
$\widehat{VE} = \dfrac{\sum_{i=1}^{r} m_i\left(\bar{X}_i - \bar{X}\right)^2}{r-1}$
However, this estimate is biased. To have an unbiased estimator, we need to change the above estimate to
$\widehat{VE} = \dfrac{\sum_{i=1}^{r} m_i\left(\bar{X}_i - \bar{X}\right)^2 - (r-1)\widehat{EV}}{m - \dfrac{1}{m}\sum_{i=1}^{r} m_i^2}$
This isn't a big help on how to memorize the formula. This formula is hard. You'll just have to memorize it.
Final point. Loss Models mentions the concept of credibility weighted average premium. It proves that the total loss will be equal to the total premium if we set
$\hat{\mu} = \dfrac{\sum_{i=1}^{r} Z_i \bar{X}_i}{\sum_{i=1}^{r} Z_i}$
Exposures
Year Adult Youth Total
1996 2000 450 2450
1997 1000 250 1250
1998 1000 175 1175
1999 1000 125 1125
Total 5000 1000 6000
Pure Premium
Year               Adult   Youth   Total
1996               0       15      2.755
1997               5       2       4.400
1998               6       15      7.340
1999               4       1       3.667
Weighted Average   3       10      4.167
You are also given that the estimated variance of the hypothetical means is 17.125.
Determine the non-parametric empirical Bayes credibility premium for the youth class,
using the method that preserves the total losses.
Solution
$\widehat{EV} = \dfrac{\sum_{i=1}^{r}(n_i - 1)\hat{\sigma}_i^2}{\sum_{i=1}^{r}(n_i - 1)} = \dfrac{\sum_{i=1}^{r}\sum_{t=1}^{n_i} m_{it}\left(X_{it} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)} = 12{,}291.7$
$Z_A = \dfrac{5{,}000}{5{,}000 + \dfrac{12{,}291.7}{17.125}} = 0.874$, $Z_Y = \dfrac{1{,}000}{1{,}000 + \dfrac{12{,}291.7}{17.125}} = 0.582$
$\hat{\mu} = \dfrac{\sum_{i=1}^{r} Z_i \bar{X}_i}{\sum_{i=1}^{r} Z_i} = \dfrac{0.874(3) + 0.582(10)}{0.874 + 0.582} = 5.8$
The non-parametric empirical Bayes credibility premium for the youth class is:
$Z_Y \bar{X}_Y + (1 - Z_Y)\hat{\mu} = 0.582(10) + (1 - 0.582)(5.8) = 8.24$
Let's verify that the total credibility premium is equal to the total loss:
The non-parametric empirical Bayes credibility premium for the adult class is:
$Z_A \bar{X}_A + (1 - Z_A)\hat{\mu} = 0.874(3) + (1 - 0.874)(5.8) = 3.35$
Total credibility premium: 1,000(8.24) + 5,000(3.35) = 24,990, which is 25,000 apart from rounding.
Youth: 450(15) + 250(2) + 175(15) + 125(1) = 10,000
Or 1,000 (total exposure) × 10 (average pure premium per exposure) = 10,000
Total: 25,000
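The whole calculation, including the check that the credibility premiums reproduce the total loss, can be sketched in Python as follows. The exposures and pure premiums are copied from the two tables in the problem, and VE = 17.125 is the given value. The dictionary layout and names are my own.

```python
# Exposures and pure premiums for 1996-1999, copied from the problem tables.
exposures = {"Adult": [2000, 1000, 1000, 1000], "Youth": [450, 250, 175, 125]}
pure_premiums = {"Adult": [0, 5, 6, 4], "Youth": [15, 2, 15, 1]}
ve_hat = 17.125   # variance of the hypothetical means, given in the problem

# Total exposure and exposure-weighted mean for each class.
m = {c: sum(e) for c, e in exposures.items()}
x_bar = {c: sum(e * x for e, x in zip(exposures[c], pure_premiums[c])) / m[c]
         for c in exposures}

# Step 1: pooled within-class variation, weighted by exposure.
ss_within = sum(e * (x - x_bar[c]) ** 2
                for c in exposures
                for e, x in zip(exposures[c], pure_premiums[c]))
dof = sum(len(e) - 1 for e in exposures.values())
ev_hat = ss_within / dof

k = ev_hat / ve_hat
z = {c: m[c] / (m[c] + k) for c in m}

# Credibility-weighted mean: the choice that preserves total losses.
mu_hat = sum(z[c] * x_bar[c] for c in z) / sum(z.values())
premium = {c: z[c] * x_bar[c] + (1 - z[c]) * mu_hat for c in z}

total_premium = sum(m[c] * premium[c] for c in m)
total_loss = sum(m[c] * x_bar[c] for c in m)
print(round(ev_hat, 1), {c: round(p, 2) for c, p in premium.items()},
      round(total_premium, 2), round(total_loss, 2))
```

Without intermediate rounding the two totals agree exactly, because $m_i(1 - Z_i) = kZ_i$ and the credibility-weighted mean makes $\sum Z_i(\hat{\mu} - \bar{X}_i) = 0$.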
May 2001 #32
You are given the following experience for two insured groups:
                                   Year
Group                              1     2     3     Total
1       # of members               8     12    5     25
        Average loss per member    96    91    113   97
2       # of members               25    30    20    75
        Average loss per member    113   111   116   113
Total   # of members                                 100
        Average loss per member                      109
$\sum_{i=1}^{2}\sum_{j=1}^{3} m_{ij}\left(x_{ij} - \bar{x}_i\right)^2 = 2020$
$\sum_{i=1}^{2} m_i\left(\bar{x}_i - \bar{x}\right)^2 = 4800$
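Determine the nonparametric Empirical Bayes credibility premium for group 1, using the method that preserves the total loss.
Solution
$\widehat{EV} = \dfrac{\sum_{i=1}^{r}(n_i - 1)\hat{\sigma}_i^2}{\sum_{i=1}^{r}(n_i - 1)} = \dfrac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}\left(X_{ij} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)} = \dfrac{2020}{2 + 2} = 505$
$\widehat{VE} = \dfrac{\sum_{i=1}^{r} m_i\left(\bar{X}_i - \bar{X}\right)^2 - (r-1)\widehat{EV}}{m - \dfrac{1}{m}\sum_{i=1}^{r} m_i^2} = \dfrac{4800 - (2-1)505}{100 - \dfrac{1}{100}\left(25^2 + 75^2\right)} = 114.533$
$k = \dfrac{EV}{VE} = \dfrac{505}{114.533} = 4.409$
$Z_1 = \dfrac{m_1}{m_1 + k} = \dfrac{25}{25 + 4.409} = 0.85$, $Z_2 = \dfrac{m_2}{m_2 + k} = \dfrac{75}{75 + 4.409} = 0.944$
$\hat{\mu} = \dfrac{\sum_{i=1}^{r} Z_i \bar{X}_i}{\sum_{i=1}^{r} Z_i} = \dfrac{0.85(97) + 0.944(113)}{0.85 + 0.944} = 105.42$
$Z_1 \bar{X}_1 + (1 - Z_1)\hat{\mu} = 0.85(97) + (1 - 0.85)(105.42) = 98.26$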
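$m_i = \sum_{j=1}^{4} m_{ij}$, $\bar{X}_i = \dfrac{1}{m_i}\sum_{j=1}^{4} m_{ij} X_{ij}$, $v_i = \dfrac{1}{3}\sum_{j=1}^{4} m_{ij}\left(X_{ij} - \bar{X}_i\right)^2$, $m_i\left(\bar{X}_i - \bar{X}\right)^2$
Determine the credibility estimate of the rating factor for region 1 using the method that preserves $\sum_{i=1}^{3} m_i \bar{X}_i$.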
Solution
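$\widehat{EV} = \dfrac{\sum_{i=1}^{r}(n_i - 1)\hat{\sigma}_i^2}{\sum_{i=1}^{r}(n_i - 1)} = \dfrac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}\left(X_{ij} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)}$
$\widehat{VE} = \dfrac{\sum_{i=1}^{r} m_i\left(\bar{X}_i - \bar{X}\right)^2 - (r-1)\widehat{EV}}{m - \dfrac{1}{m}\sum_{i=1}^{r} m_i^2} = \dfrac{0.887 + 0.191 + 1.348 - 0.2777(2)}{500 - \dfrac{1}{500}\left(50^2 + 300^2 + 150^2\right)} = 0.006928$
$k = \dfrac{EV}{VE} = \dfrac{0.2777}{0.006928} = 40.0829$
$Z_1 = \dfrac{m_1}{m_1 + k} = \dfrac{50}{50 + 40.0829} = 0.555$
$Z_2 = \dfrac{m_2}{m_2 + k} = \dfrac{300}{300 + 40.0829} = 0.882$
$Z_3 = \dfrac{m_3}{m_3 + k} = \dfrac{150}{150 + 40.0829} = 0.789$
$\hat{\mu} = \dfrac{\sum_{i=1}^{3} Z_i \bar{X}_i}{\sum_{i=1}^{3} Z_i} = \dfrac{0.555(1.406) + 0.8821(1.298) + 0.789(1.178)}{0.555 + 0.8821 + 0.789} = 1.2824$
$Z_1 \bar{X}_1 + (1 - Z_1)\hat{\mu} = 0.555(1.406) + (1 - 0.555)(1.2824) = 1.35$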
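$\widehat{EV} = \dfrac{\sum_{i=1}^{r}(n_i - 1)\hat{\sigma}_i^2}{\sum_{i=1}^{r}(n_i - 1)} = \dfrac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}\left(X_{ij} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)}$
$= \dfrac{1}{(2-1) + (2-1) + (2-1)}\left[100(500 - 333.33)^2 + 200(250 - 333.33)^2 + 500(300 - 375)^2 + 300(500 - 375)^2 + \cdots\right] = 53{,}888{,}889$
$\widehat{VE} = \dfrac{\sum_{i=1}^{r} m_i\left(\bar{X}_i - \bar{X}\right)^2 - (r-1)\widehat{EV}}{m - \dfrac{1}{m}\sum_{i=1}^{r} m_i^2}$
$= \dfrac{300(333.33 - 538.46)^2 + 800(375 - 538.46)^2 + 200(1{,}500 - 538.46)^2 - 53{,}888{,}889(3-1)}{1{,}300 - \dfrac{1}{1{,}300}\left(300^2 + 800^2 + 200^2\right)} = 157{,}035.6$
$k = \dfrac{53{,}888{,}889}{157{,}035.6} = 343.16$, $Z = \dfrac{200}{200 + 343.16} = 0.368$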
May 2005 #25
Use the nonparametric empirical Bayes method to estimate the credibility factor for
Group 1.
Solution
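$\widehat{EV} = \dfrac{\sum_{i=1}^{r}(n_i - 1)\hat{\sigma}_i^2}{\sum_{i=1}^{r}(n_i - 1)} = \dfrac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}\left(X_{ij} - \bar{X}_i\right)^2}{\sum_{i=1}^{r}(n_i - 1)}$
$= \dfrac{1}{(2-1) + (2-1)}\left[50(200 - 227.27)^2 + 60(250 - 227.27)^2 + 100(160 - 178.95)^2 + 90(200 - 178.95)^2\right] = 71{,}985.65$
$Z_1 = \dfrac{110}{110 + \dfrac{71{,}985.65}{651.03}} = 0.5$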
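We have a parametric model for $X \mid \lambda$, but we don't have a parametric model for $\lambda$ (hence the name semi-parametric). Typically, a problem will tell us that $X$ is a Poisson random variable with mean $\lambda$. For a Poisson random variable the mean and the variance are equal, so
$\mu = EV$.
Because $Var(X) = EV + VE$, we also have
$VE = Var(X) - EV$,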
Determine the credibility of one year's experience for a single driver using semiparametric empirical Bayes estimation.
Solution
Let $X$ represent the # of claims in a year and $\lambda$ represent the mean of $X$. We are told that $X$ is a Poisson random variable.
$\mu = E(X) = E\left[E(X \mid \lambda)\right] = E(\lambda)$
$EV = E\left[Var(X \mid \lambda)\right] = E(\lambda)$
$\hat{\mu} = \bar{X} = \dfrac{54(0) + 33(1) + 10(2) + 2(3) + 1(4)}{54 + 33 + 10 + 2 + 1} = \dfrac{63}{100} = 0.63$
$\widehat{EV} = \hat{\mu} = 0.63$
$\widehat{Var}(X) = \dfrac{1}{100 - 1}\sum_{i=1}^{100}\left(X_i - \bar{X}\right)^2 = \dfrac{54(0 - .63)^2 + 33(1 - .63)^2 + 10(2 - .63)^2 + 2(3 - .63)^2 + 1(4 - .63)^2}{100 - 1} = 0.68$
$\widehat{VE} = \widehat{Var}(X) - \widehat{EV} = 0.68 - 0.63 = 0.05$
$Z = \dfrac{n}{n + \dfrac{EV}{VE}} = \dfrac{1}{1 + \dfrac{0.63}{0.05}} = 0.073$
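If you want to double-check the arithmetic away from the calculator, the semiparametric steps are short enough to sketch in Python. The frequency table is the one used in the solution above; the names are mine.

```python
# Observed claim counts: {number of claims: number of drivers}.
counts = {0: 54, 1: 33, 2: 10, 3: 2, 4: 1}

n_drivers = sum(counts.values())
x_bar = sum(x * f for x, f in counts.items()) / n_drivers
sample_var = sum(f * (x - x_bar) ** 2 for x, f in counts.items()) / (n_drivers - 1)

ev_hat = x_bar                  # Poisson: the process variance equals the mean
ve_hat = sample_var - ev_hat    # from Var(X) = EV + VE
z = 1 / (1 + ev_hat / ve_hat)   # one year of experience, so n = 1

print(round(x_bar, 2), round(sample_var, 4), round(ve_hat, 4), round(z, 4))
```

The code carries full precision, so Z differs slightly in the fourth decimal from a hand calculation that rounds the variance to 0.68 first.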
When taking the exam, you should use BA II Plus/ Professional 1-V Statistics Worksheet
to quickly calculate the sample mean and the sample variance.
Nov 2000 #7
The following information comes from a study of robberies of convenience stores over
the course of a year:
Solution
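$\widehat{Var}(X) = \dfrac{1}{n-1}\sum_{i=1}^{500}\left(X_i - \bar{X}\right)^2$
$\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$
To see why this formula works, notice that the (biased) sample variance is:
$\dfrac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = E(X^2) - E^2(X) = \dfrac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2$
Multiplying both sides by $n$: $\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$
$\widehat{Var}(X) = \dfrac{1}{500 - 1}\sum_{i=1}^{500}\left(X_i - \bar{X}\right)^2 = \dfrac{1}{500 - 1}\left(\sum_{i=1}^{500} X_i^2 - 500\bar{X}^2\right) = \dfrac{220 - 5}{499} = 0.43086$
$Z = \dfrac{n}{n + \dfrac{EV}{VE}} = \dfrac{1}{1 + \dfrac{0.1}{0.33086}} = 0.768$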
The single store didn't have any robbery incidents for two years. So the sample mean is zero.
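The shortcut for the sum of squared deviations is easy to sanity-check in Python. In this sketch the helper name and the small toy sample are made up by me; the summary figures n = 500 and sum of X² = 220 are from the solution, and sum of X = 50 is what the solution's term 500·X̄² = 5 implies.

```python
from statistics import variance

def sample_variance_from_sums(n, sum_x, sum_x2):
    """Unbiased sample variance from n, the sum of X and the sum of X^2."""
    x_bar = sum_x / n
    return (sum_x2 - n * x_bar ** 2) / (n - 1)

# Check the identity on a small made-up sample.
toy = [0, 0, 1, 2, 0, 3]
shortcut = sample_variance_from_sums(len(toy), sum(toy), sum(x * x for x in toy))
direct = variance(toy)

# Summary figures from the solution above.
n, sum_x, sum_x2 = 500, 50, 220
var_hat = sample_variance_from_sums(n, sum_x, sum_x2)
ev_hat = sum_x / n              # as in the solution, EV is taken equal to the mean
ve_hat = var_hat - ev_hat
z = 1 / (1 + ev_hat / ve_hat)   # credibility of one year for one store

print(shortcut, direct, round(var_hat, 5), round(ve_hat, 5), round(z, 3))
```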
Solution
X01=0, Y01=2000
X02=1, Y02= 600
X03=2, Y03= 300
X04=3, Y04= 80
X05=4, Y05= 20
$Z = \dfrac{n}{n + \dfrac{EV}{VE}} = \dfrac{1}{1 + \dfrac{0.507}{0.183}} = 0.265$
During a 2-year period, 100 policies had the following claim experience:
Number of claims in Year 1 and Year 2 Number of Policyholders
0 50
1 30
2 15
3 4
4 1
The number of claims per year follows a Poisson distribution.
Each policyholder was insured for the entire 2-year period.
A randomly selected policyholder had one claim over the 2-year period.
Using semiparametric empirical Bayes estimation, determine the Bühlmann estimate for the number of claims in Year 3 for the same policyholder.
Solution
We'll use a 2-year period as one unit of time. So we'll calculate the Bühlmann estimate of the number of claims in Year 3 and Year 4. Then half of this amount will be the Bühlmann estimate for the number of claims in Year 3.
X01=0, Y01=50
X02=1, Y02=30
X03=2, Y03=15
X04=3, Y04= 4
X05=4, Y05= 1
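$Z = \dfrac{n}{n + \dfrac{EV}{VE}} = \dfrac{1}{1 + \dfrac{0.76}{0.091}} = 0.107$
A randomly selected policyholder had one claim over the 2-year period. So the sample claim frequency is 1 per 2-year period, and the Bühlmann estimate for Year 3 and Year 4 combined is $0.107(1) + (1 - 0.107)(0.76) = 0.786$. Half of this is the estimate for Year 3:
$P = \dfrac{1}{2}(0.786) = 0.393$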
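Here is the same two-year trick as a Python sketch, so you can see where 0.76, 0.091 and 0.786 come from. The frequency table is from the problem; the variable names are mine.

```python
# Claims over the 2-year period: {number of claims: number of policyholders}.
counts = {0: 50, 1: 30, 2: 15, 3: 4, 4: 1}

n_policies = sum(counts.values())
x_bar = sum(x * f for x, f in counts.items()) / n_policies
sample_var = sum(f * (x - x_bar) ** 2 for x, f in counts.items()) / (n_policies - 1)

ev_hat = x_bar                  # Poisson over a 2-year unit: variance equals mean
ve_hat = sample_var - ev_hat
z = 1 / (1 + ev_hat / ve_hat)   # one 2-year unit of experience, so n = 1

observed = 1                    # the selected policyholder's claims in the 2 years
two_year_estimate = z * observed + (1 - z) * x_bar
year3_estimate = two_year_estimate / 2

print(round(x_bar, 2), round(ve_hat, 3), round(z, 3),
      round(two_year_estimate, 3), round(year3_estimate, 3))
```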
The goal of the limited fluctuation credibility model is the same as the goal of the Bühlmann credibility model. We observe that a policyholder has incurred $S_1, S_2, \dots, S_n$ claim dollar amounts in Year 1, 2, ..., $n$ respectively. We want to estimate the policyholder's renewal premium in Year $n+1$. The renewal premium in Year $n+1$ is $E(S_{n+1} \mid S_1, S_2, \dots, S_n)$, the expected claim dollar amount in Year $n+1$.
Mathematically, $S = \sum_{i=1}^{N} X_i$. Here $N$ is the total number of claims incurred in a year (loss frequency) by a policyholder. $X_i$ is the claim dollar amount of the $i$-th claim (loss severity) incurred by the policyholder. $S$ is the total claim dollar amount incurred in a year (also called the annual aggregate claim) by the policyholder. In contrast, in the Bühlmann credibility model, we don't break down the annual claim dollar amount into loss frequency and loss severity.
$P = E(S_{n+1} \mid S_1, S_2, \dots, S_n) = Z\bar{S} + (1 - Z)\mu$
Here $P$ is the renewal premium, $\bar{S}$ is the policyholder-specific sample mean, and $\mu$ is the global mean (manual rate).
In contrast, the Bühlmann credibility theory doesn't assume the above equation holds true automatically. It derives this equation using basic probability theories.
Once again, the limited fluctuation credibility assumes that $Z = \sqrt{\dfrac{\text{your } n}{E(N) \text{ to make } Z = 1}}$ holds true automatically without the need to prove it. So you need to accept it without demanding any proof. The core theory of the limited fluctuation credibility is to calculate $E(N)$ to make $Z = 1$.
We first derive a model for r insureds. Then to calculate the renewal premium for one
insured, we just set r = 1 .
$S = \sum_{i=1}^{M} X_i = X_1 + X_2 + \dots + X_M$
Here $X_i$ is the dollar amount of the $i$-th claim. $M = \sum_{j=1}^{r} N_j = N_1 + N_2 + \dots + N_r$ is the total # of annual claims for $r$ insureds; $N_j$ is the number of claims incurred by the $j$-th insured.
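A simplifying assumption is that $\dfrac{S - E(S)}{\sigma_S}$ is approximately normal. Set $Z = \dfrac{S - E(S)}{\sigma_S}$. Then $Z$ is approximately a standard normal random variable.
$P\left(|Z| \le k\dfrac{E(S)}{\sigma_S}\right) \ge p$,
$P\left(|Z| \le a\right) = P(-a \le Z \le a) = \Phi(a) - \Phi(-a)$.
$P\left(|Z| \le a\right) = 2\Phi(a) - 1$, $P\left(|Z| \le k\dfrac{E(S)}{\sigma_S}\right) = 2\Phi\left(k\dfrac{E(S)}{\sigma_S}\right) - 1 \ge p$
Let's consider the worst case $P\left(|S - E(S)| \le kE(S)\right) = p$ or $2\Phi\left(k\dfrac{E(S)}{\sigma_S}\right) - 1 = p$. We have
$2\Phi\left(k\dfrac{E(S)}{\sigma_S}\right) = 1 + p$, $k\dfrac{E(S)}{\sigma_S} = \Phi^{-1}\left(\dfrac{1+p}{2}\right)$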
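Define $CV_S = \dfrac{\sigma_S}{E(S)}$ as the coefficient of variation. It's the standard deviation divided by the mean. Then
$\dfrac{k}{CV_S} = \Phi^{-1}\left(\dfrac{1+p}{2}\right)$, $k = \Phi^{-1}\left(\dfrac{1+p}{2}\right) CV_S$.
Next, define $\Phi(y) = \dfrac{1+p}{2}$, or $y = \Phi^{-1}\left(\dfrac{1+p}{2}\right)$. Then $k = y\,CV_S$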
$E(M) = E(N_1 + N_2 + \dots + N_r) = rE(N)$
$Var(M) = Var(N_1 + N_2 + \dots + N_r) = rVar(N)$
$E(S) = E(X_1 + X_2 + \dots + X_M) = E(M)E(X) = rE(N)E(X)$
$Var(S) = Var(X_1 + X_2 + \dots + X_M) = E(M)Var(X) + Var(M)E^2(X)$
$= rE(N)Var(X) + rVar(N)E^2(X)$
$= r\left[E(N)Var(X) + Var(N)E^2(X)\right]$
$CV_S = \dfrac{\sigma_S}{E(S)} = \dfrac{\sqrt{r\left[E(N)Var(X) + Var(N)E^2(X)\right]}}{rE(N)E(X)}$
$k = y\,CV_S = y\dfrac{\sqrt{r\left[E(N)Var(X) + Var(N)E^2(X)\right]}}{rE(N)E(X)}$
You also know how to derive from scratch this is the mother of all the formulas
for the limited fluctuation credibility model
Please note that $r$ is the number of insureds needed to achieve the full credibility. $E(N)$ is the number of annual claims per insured. So $rE(N)$ represents the expected number of claims the insurer needs to have in its book of business to have the full credibility.
$rE(N) = \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{Var(X)}{E^2(X)} + \dfrac{Var(N)}{E(N)}\right]$
Here $r$ is the # of insureds in the book of business and $E(N)$ is the expected # of claims per insured.
$rE(N) = \left(\dfrac{y}{k}\right)^2\left[\dfrac{Var(X)}{E(X)} + \dfrac{Var(N)}{E(N)}\right]$ Wrong!
To remember the term $\dfrac{Var(X)}{E^2(X)}$, please note that $X$ is the claim dollar amount. So $E(X)$ is dollar amount and $Var(X)$ is dollar squared. To have a meaningful ratio, we need to square $E(X)$ so the numerator and denominator are both dollar squared.
Please also note that $\dfrac{Var(N)}{E(N)}$ is fine. Here $N$ is the claim number. So $Var(N)$ is a number; $E(N)$ is a number. So the ratio $\dfrac{Var(N)}{E(N)}$ is fine.
Once again, remember that X is the dollar amount of a single claim incurred by one
policyholder and that N is the annual number of claims incurred by the policyholder.
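Credibility formulas for the aggregate loss for one insured (credibility in terms of the expected number of annual claims)
Set $r = 1$.
If $E(N) = \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{Var(X)}{E^2(X)} + \dfrac{Var(N)}{E(N)}\right]$, then $Z = 1$.
$Z = \min\left(\sqrt{\dfrac{\text{your } n}{E(N) \text{ to make } Z = 1}},\ 1\right) = \min\left(\sqrt{\dfrac{\text{your } n}{n_0\left[CV_X^2 + \dfrac{Var(N)}{E(N)}\right]}},\ 1\right)$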
Determine the limited fluctuation credibility net premium (in millions) for the 2nd year.
Solution
We are asked to find the limited fluctuation credibility renewal net premium for Year 2.
So we are just concerned with one policy (or one insured). Set r = 1 .
We are told that the claim size $X$ is lognormal with a coefficient of variation of 3. The information that $X$ is lognormal is not needed. SOA just wants to scare us. What matters is $CV_X$. We are told that $CV_X = 3$.
In addition, we know that $N$ is Poisson. So $\dfrac{Var(N)}{E(N)} = 1$.
$E(N) = \left(\dfrac{1.96}{5\%}\right)^2\left(3^2 + 1\right) = 10\left(\dfrac{1.96}{5\%}\right)^2$
$Z = \min\left(\sqrt{\dfrac{\text{your } n}{E(N) \text{ to make } Z = 1}},\ 1\right) = \min\left(\sqrt{\dfrac{1000}{10\left(\frac{1.96}{5\%}\right)^2}},\ 1\right) = \min\left(10 \cdot \dfrac{5\%}{1.96},\ 1\right) = 0.255$
$P = E(S_{n+1} \mid S_1, S_2, \dots, S_n) = Z\bar{S} + (1 - Z)\mu$
Here $P$ is the renewal premium, $\bar{S}$ is the policyholder-specific sample mean, and $\mu$ is the global mean (manual rate).
$= 0.255(6.75) + (1 - 0.255)(5) = 5.446$
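The full-credibility standard and the square-root rule fit into a few lines of Python. This is a sketch: the function name and argument names are mine, and `NormalDist().inv_cdf` from the standard library stands in for the normal table lookup of 1.96. The inputs 1000, 6.75 and 5 are the figures plugged into the solution above.

```python
from math import sqrt
from statistics import NormalDist

def full_credibility_claims(p, k, cv_x_squared, var_to_mean_n):
    """Expected number of claims needed for Z = 1 (aggregate loss standard)."""
    y = NormalDist().inv_cdf((1 + p) / 2)
    return (y / k) ** 2 * (cv_x_squared + var_to_mean_n)

# CV of claim size is 3; claim counts are Poisson, so Var(N)/E(N) = 1.
n_full = full_credibility_claims(p=0.95, k=0.05, cv_x_squared=3 ** 2, var_to_mean_n=1.0)

your_n = 1000                    # the n used in the solution above
z = min(sqrt(your_n / n_full), 1.0)

observed, manual = 6.75, 5.0     # sample mean and manual rate, in millions
premium = z * observed + (1 - z) * manual
print(round(n_full, 1), round(z, 3), round(premium, 3))
```

Writing the standard as a function of the two ratios makes it easy to reuse for other problems: only `cv_x_squared` and `var_to_mean_n` change from one distribution to the next.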
# of claims, n        0     1     2     3    4    5
# of insureds, f_n    512   307   123   41   11   6
$\sum n f_n = 750$, $\sum n^2 f_n = 1494$
Claim sizes follow a Pareto distribution with mean 1500 and variance 6,750,000.
Claim sizes and claim counts are independent.
The full credibility standard is to be within 5% of the expected aggregate loss 95% of the time.
Determine the minimum number of insureds needed for the aggregate loss to be fully credible.
Solution
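$rE(N) = \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{Var(X)}{E^2(X)} + \dfrac{Var(N)}{E(N)}\right]$
$r = \dfrac{1}{E(N)}\left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{Var(X)}{E^2(X)} + \dfrac{Var(N)}{E(N)}\right]$
$\left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2 = \left(\dfrac{\Phi^{-1}\left(\frac{1+95\%}{2}\right)}{5\%}\right)^2 = \left(\dfrac{1.96}{5\%}\right)^2$
We know that $CV_X = \dfrac{\sigma(X)}{E(X)}$. $CV_X^2 = \dfrac{6{,}750{,}000}{1500^2} = 3$
We can use the method of moments to estimate $\dfrac{Var(N)}{E(N)}$.
$\hat{E}(N) = \dfrac{\sum n f_n}{\sum f_n} = \dfrac{750}{1000} = 0.75$, $\hat{E}(N^2) = \dfrac{\sum n^2 f_n}{\sum f_n} = \dfrac{1{,}494}{1000} = 1.494$
$\widehat{Var}(N) = 1.494 - 0.75^2 = 0.9315$, $\dfrac{Var(N)}{E(N)} = \dfrac{0.9315}{0.75} = 1.242$
$r = \dfrac{1}{E(N)}\left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{Var(X)}{E^2(X)} + \dfrac{Var(N)}{E(N)}\right] = \dfrac{1}{0.75}\left(\dfrac{1.96}{5\%}\right)^2(3 + 1.242) = \dfrac{6{,}518.42688}{0.75} = 8{,}691.24$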
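The method-of-moments step and the final formula can be checked with the Python sketch below. The frequency table and the Pareto mean and variance are from the problem; the names are mine. The code uses the exact normal quantile rather than the rounded table value 1.96, so the result differs from 8,691.24 by a fraction of one insured.

```python
from statistics import NormalDist

# {number of claims: number of insureds}, copied from the problem table.
insureds_by_claims = {0: 512, 1: 307, 2: 123, 3: 41, 4: 11, 5: 6}

total = sum(insureds_by_claims.values())
mean_n = sum(n * f for n, f in insureds_by_claims.items()) / total
second_moment_n = sum(n * n * f for n, f in insureds_by_claims.items()) / total
var_n = second_moment_n - mean_n ** 2     # method of moments (divide by the total)

cv_x_squared = 6_750_000 / 1500 ** 2      # Var(X) / E(X)^2 for the claim size

p, k = 0.95, 0.05
y = NormalDist().inv_cdf((1 + p) / 2)
r_min = (1 / mean_n) * (y / k) ** 2 * (cv_x_squared + var_n / mean_n)

print(total, mean_n, round(var_n, 4), round(var_n / mean_n, 3), round(r_min, 2))
```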
You are given the following information about a general liability book of business
comprised of 2500 insureds:
$X_i = \sum_{j=1}^{N_i} Y_{ij}$ is a random variable representing the annual loss of the $i$-th insured.
Using classical credibility theory, determine the partial credibility of the annual loss experience for the book of business.
Solution
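$r = \dfrac{1}{E(N)}\left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{Var(Y)}{E^2(Y)} + \dfrac{Var(N)}{E(N)}\right]$
However, $\dfrac{Var(Y)}{E^2(Y)} = \dfrac{E(Y^2) - E^2(Y)}{E^2(Y)} = \dfrac{E(Y^2)}{E^2(Y)} - 1$
$r = \dfrac{1}{E(N)}\left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{E(Y^2)}{E^2(Y)} - 1 + \dfrac{Var(N)}{E(N)}\right]$
$E(N) = r\beta = 2(0.2) = 0.4$, $Var(N) = r\beta(1+\beta)$, $\dfrac{Var(N)}{E(N)} = 1 + \beta = 1 + 0.2 = 1.2$
$\dfrac{E(Y^2)}{E^2(Y)} = \dfrac{\dfrac{2\theta^2}{(\alpha-1)(\alpha-2)}}{\left(\dfrac{\theta}{\alpha-1}\right)^2} = \dfrac{2(\alpha-1)}{\alpha-2} = \dfrac{2(3-1)}{3-2} = 4$
$r = \dfrac{1}{E(N)}\left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{E(Y^2)}{E^2(Y)} - 1 + \dfrac{Var(N)}{E(N)}\right]$
$= \dfrac{1}{0.4}\left(\dfrac{\Phi^{-1}\left(\frac{1+90\%}{2}\right)}{5\%}\right)^2(4 - 1 + 1.2) = \dfrac{1}{0.4}\left(\dfrac{1.645}{5\%}\right)^2(4 - 1 + 1.2)$
$= \dfrac{4.2}{0.4}\left(\dfrac{1.645}{5\%}\right)^2 = 10.5\left(\dfrac{1.645}{5\%}\right)^2$
Please note that many times it's advantageous not to expand $\left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2$. Here $10.5\left(\dfrac{1.645}{5\%}\right)^2 = 11{,}365.305$.
Let's continue. $10.5\left(\dfrac{1.645}{5\%}\right)^2$ is the number of insureds to get full credibility. However, the number of insureds is 2500 in the book of business.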
Individual claim size amounts are independent and exponentially distributed with mean 5000
The full credibility standard is for the aggregate losses to be within 5% of the expected with probability 0.9
Using classical credibility, determine the expected number of claims required for full credibility.
Solution
$rE(N) = \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{Var(X)}{E^2(X)} + \dfrac{Var(N)}{E(N)}\right]$
Here $rE(N)$ is the expected # of claims the insurer needs to have in its book of business to have full credibility.
$\left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2 = \left(\dfrac{\Phi^{-1}\left(\frac{1+90\%}{2}\right)}{5\%}\right)^2 = \left(\dfrac{1.645}{5\%}\right)^2$
So the insurer needs to have at least 2,381 claims in a year to have full credibility.
Please note that the following information is not necessary for us to solve the problem:
The mean 5000 for the individual claim size random variable. If $X$ is exponential, then $\dfrac{Var(X)}{E^2(X)} = 1$ regardless of the mean.
Nov 2003 #3
You are given:
The number of claims has a Poisson distribution
Claim sizes have a Pareto distribution with parameters $\theta = 0.5$ and $\alpha = 6$
The number of claims and claim sizes are independent
The observed pure premium should be within 2% of the expected pure premium
90% of the time.
Solution
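The pure premium is the expected total annual claim dollar amount incurred by one policyholder. Set $r = 1$, we have:
$E(N) = \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{Var(X)}{E^2(X)} + \dfrac{Var(N)}{E(N)}\right] = \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{E(X^2)}{E^2(X)} - 1 + \dfrac{Var(N)}{E(N)}\right]$
The claim size $X$ has a Pareto distribution with parameters $\theta = 0.5$ and $\alpha = 6$
$\dfrac{E(X^2)}{E^2(X)} = 2\,\dfrac{\alpha - 1}{\alpha - 2} = 2\,\dfrac{6 - 1}{6 - 2} = 2.5$
$E(N) = \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{E(X^2)}{E^2(X)} - 1 + \dfrac{Var(N)}{E(N)}\right] = \left(\dfrac{\Phi^{-1}\left(\frac{1+90\%}{2}\right)}{2\%}\right)^2(2.5 - 1 + 1)$
$= \left(\dfrac{1.645}{2\%}\right)^2(2.5) = 16{,}912.66$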
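The moment ratio for the Pareto claim size does not depend on the scale parameter, which the short Python sketch below confirms using the standard two-parameter Pareto moments $E(X) = \theta/(\alpha-1)$ and $E(X^2) = 2\theta^2/[(\alpha-1)(\alpha-2)]$. The variable names are mine, and the exact normal quantile replaces the table value 1.645, so the last digits differ slightly from 16,912.66.

```python
from statistics import NormalDist

alpha, theta = 6, 0.5

# First two raw moments of the two-parameter Pareto distribution.
mean_x = theta / (alpha - 1)
second_moment_x = 2 * theta ** 2 / ((alpha - 1) * (alpha - 2))
ratio = second_moment_x / mean_x ** 2      # equals 2(alpha - 1)/(alpha - 2)

p, k = 0.90, 0.02
y = NormalDist().inv_cdf((1 + p) / 2)

var_to_mean_n = 1.0                        # Poisson claim counts
n_full = (y / k) ** 2 * (ratio - 1 + var_to_mean_n)
print(round(ratio, 4), round(y, 4), round(n_full, 2))
```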
$p(x) = \dbinom{m}{x} q^x (1 - q)^{m - x}$, $x = 0, 1, 2, \dots, m$
The actual number of claims must be within 1% of the expected number of claims with probability 0.95.
Determine $q$.
Solution
$rE(N) = \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{Var(X)}{E^2(X)} + \dfrac{Var(N)}{E(N)}\right]$
This problem is concerned only with loss frequency. So in the aggregate loss model $S = \sum_{i=1}^{N} X_i$, we set $X_i = 1$. This way, $S = N$ becomes the total number of claims. Setting $X_i = 1$ gives $Var(X) = 0$.
Plugging in the numbers: $p = 95\%$, $k = 1\%$, $\dfrac{Var(N)}{E(N)} = \dfrac{mq(1 - q)}{mq} = 1 - q$
$rE(N) = \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\dfrac{Var(N)}{E(N)} = \left(\dfrac{1.96}{1\%}\right)^2(1 - q) = 34{,}574$, so $1 - q = 0.9$ and $q = 0.1$
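Solving the last equation for q is a one-liner, shown here as a Python sketch. The standard of 34,574 claims is the figure used in the solution; the names are mine.

```python
from statistics import NormalDist

p, k = 0.95, 0.01
full_credibility_standard = 34574    # expected # of claims for full credibility

y = NormalDist().inv_cdf((1 + p) / 2)

# Frequency only: Var(X) = 0, and for a binomial Var(N)/E(N) = 1 - q.
one_minus_q = full_credibility_standard / (y / k) ** 2
q = 1 - one_minus_q
print(round(one_minus_q, 3), round(q, 3))
```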
May 2005 #2
You are given:
The number of claims follows a negative binomial distribution with parameters $r$ and $\beta = 3$.
Claim severity has the following distribution:
Claim Size   Probability
1            0.4
10           0.4
100          0.2
The number of claims is independent of the severity of claims.
Determine the expected number of claims needed for aggregate losses to be within 10%
of the expected aggregate losses with 95% probability.
Solution
Using limited fluctuation (classical) credibility, determine the expected number of claims
required for full credibility.
Solution
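$rE(N) = \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{Var(X)}{E^2(X)} + \dfrac{Var(N)}{E(N)}\right] = \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\left[\dfrac{Var(X)}{E^2(X)} + 1\right]$
$= \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\dfrac{Var(X) + E^2(X)}{E^2(X)} = \left(\dfrac{\Phi^{-1}\left(\frac{1+p}{2}\right)}{k}\right)^2\dfrac{E(X^2)}{E^2(X)}$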
2 2
1+ p 1+ p
E(X2)
1 1
+1 +1
2
2 2 1.96
rE ( N ) = = =
k E 2
(X ) k 10%
When the prior probability is continuous, many candidates don't know how to calculate the posterior probability or how to find the Bayesian premium. Continuous-prior problems are typically harder than discrete-prior problems.
When the prior probability is discrete and the calculation is messy, many candidates don't know how to solve the problem in a few minutes. Many candidates have inefficient calculation methods that are long and prone to errors.
In this chapter, I will first give you an intuitive review of Bayes' Theorem. Next, I will give you a framework for quickly solving Bayesian premium problems whether the prior probability is discrete or continuous. In addition, I will give you a BA II Plus/BA II Plus Professional shortcut for calculating Bayesian premiums when the prior probability is discrete.
Even if you are proficient in Bayes' Theorem, I recommend that you still go over the review. It is the foundation for the framework and shortcut to be presented later.
Prior probability. Before anything happens, as our baseline analysis, we believe (based on existing information we have up to now or using purely subjective judgment) that our total risk pool consists of several homogeneous groups. As a part of our baseline analysis, we also assume that these homogeneous groups have different sizes. Any insured person randomly chosen from the population is charged a weighted average premium.
So for an average driver randomly chosen from the population, we charge a weighted average premium rate (we believe that an average driver has some aggressiveness and some non-aggressiveness):
Posterior probability. Then after a year, an event changed our belief about the makeup of the homogeneous groups for a specific insured. For example, we found in one year one particular insured had three car accidents while an average driver had only one accident in the same time period. So the three-accident insured definitely involved more risk than did the average driver randomly chosen from the population. As a result, the premium rate for the three-accident insured should be higher than an average driver's premium rate.
The new premium rate we will charge is still a weighted average of the rates for the two homogeneous groups, except that we use a higher weighting factor for an aggressive driver's rate and a lower weighting factor for a non-aggressive driver's rate.
In other words, we still think this particular driver's risk consists of two risk groups, aggressive and non-aggressive, but we alter the sizes of these two risk groups for this specific insured. So instead of assuming that this person's risk consists of 40% of an aggressive driver's risk and 60% of a non-aggressive driver's risk, we assume that his risk consists of 67% of an aggressive driver's risk and 33% of a non-aggressive driver's risk.
How do we come up with the new group sizes (or the new weighting factors)? There is a specific formula for calculating the new group sizes:
K is a scaling factor to make the sum of the new sizes for all groups equal to 100%.
In our example above, this is how we got the new size for the aggressive group and the new size for the non-aggressive group. Suppose we know that the probability for an aggressive driver to have 3 car accidents in a year is 15%; the probability for a non-aggressive driver to have 3 car accidents in a year is 5%. Then for the driver who has 3 accidents in a year,
the size of the aggressive risk for someone who had 3 accidents in a year
= K × (prior size of pure aggressive risk) × (probability of an aggressive driver having 3 car accidents in a year)
= K (40%)(15%)
the size of the non-aggressive risk for someone who had 3 accidents in a year
= K × (prior size of the non-aggressive risk) × (probability of a non-aggressive driver having 3 car accidents in a year)
= K (60%)(5%)
K is a scaling factor such that the sum of posterior sizes is equal to one. So
K (40%)(15%) + K (60%)(5%) = 1, $K = \dfrac{1}{40\%(15\%) + 60\%(5\%)} = 11.11$
the size of the aggressive risk for someone who had 3 accidents in a year
= 11.11 (40%)(15%) = 66.67%
the size of the non-aggressive risk for someone who had 3 accidents in a year
= 11.11 (60%)(5%) = 33.33%
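The "prior size times probability, then rescale" recipe is mechanical enough to write down in Python. This is a sketch with my own dictionary names, using the 40%/60% prior sizes and the 15%/5% accident probabilities from the example.

```python
# Prior group sizes and each group's probability of producing the event
# (3 accidents in a year), as given in the example.
prior = {"aggressive": 0.40, "non-aggressive": 0.60}
likelihood = {"aggressive": 0.15, "non-aggressive": 0.05}

# Each group's raw contribution to the event.
joint = {g: prior[g] * likelihood[g] for g in prior}

# K rescales the contributions so that the new sizes add up to 100%.
K = 1 / sum(joint.values())
posterior = {g: K * joint[g] for g in joint}

print(round(K, 2), {g: round(size, 4) for g, size in posterior.items()})
```

The same three lines (multiply, sum, rescale) work for any number of groups, which is all that the table-based shortcut relies on.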
The above logic should make intuitive sense. The bigger the size of the group prior to the event, the higher contribution this group will make to the event's occurrence; the bigger the probability for this group to make the event happen, the higher the contribution this group will make to the event's occurrence. So the product of the prior size of the group and the group's probability to make the event happen captures this group's total contribution to the event's occurrence.
If we assign the post-event size of a group proportional to the product of the prior size and the group's probability to make the event happen, we are really assigning the post-event size of a group proportional to this group's total contribution to the event's occurrence. Again, this should make sense.
Let's summarize the logic for finding the new size of each group in the following table:
So $K = \dfrac{1}{\Pr(G_1)\Pr(E \mid G_1) + \Pr(G_2)\Pr(E \mid G_2) + \dots + \Pr(G_n)\Pr(E \mid G_n)}$
And $\Pr(G_i \mid E) = \dfrac{\Pr(G_i)\Pr(E \mid G_i)}{\Pr(G_1)\Pr(E \mid G_1) + \Pr(G_2)\Pr(E \mid G_2) + \dots + \Pr(G_n)\Pr(E \mid G_n)}$
Pr(Gi | E ) is the conditional probability that Gi will happen given the event E happened,
so it is called the posterior probability. Pr(Gi | E ) can be conveniently interpreted as the
new size of Group Gi after the event E happened. Intuitively, probability can often be
interpreted as a group size.
For example, if a probability for a female to pass Course 4 is 55% and male 45%, we can
say that the total pool of the passing candidates consists of 2 groups, female and male
with their respective sizes of 55% and 45%.
$\Pr(G_i)$ is the probability that $G_i$ will happen prior to the event E's occurrence, so it's called prior probability. $\Pr(G_i)$ can be conveniently interpreted as the size of group $G_i$ prior to the occurrence of E.
$\Pr(E \mid G_i)$ is the conditional probability that E will happen given $G_i$ has happened. It is Group $G_i$'s probability of making the event E happen. For example, say a candidate who has passed Course 3 has a 50% chance of passing Course 4, that is to say:
We can say that the people who passed Course 3 have a 50% chance of passing Course 4.
Before we jump into the formula, let's look at a sixth-grade level math problem, which requires zero knowledge about probability. If you understand this problem, you should have no trouble understanding Bayes' Theorem.
Problem 1
A rock is found to contain gold. It has 3 layers, each with a different density of gold. You are given:
The top layer, which accounts for 80% of the mass of the rock, has a gold density of only 10% (i.e. the amount of gold contained in the top layer is equal to 10% of the mass of the top layer).
The middle layer, which accounts for 15% of the rock's mass, has a gold density of 5%.
The bottom layer, which accounts for only 5% of the rock's mass, has a gold density of 0.2%.
Questions
What is the rock's density of gold (i.e. what % of the rock's mass is gold)?
Of the total amount of gold contained in the rock, what % of gold comes from the top layer? What % from the middle layer? What % comes from the bottom layer?
Solution
Let's set up a table to solve the problem. Assume that the mass of the rock is one (can be 1 pound, 1 gram, 1 ton; it doesn't matter).
Guo Fall 2009 C, Page 206 / 284
A B C D=BC E=D/0.0876
1 Layer Mass of Density of Mass of gold Of the total amount of
the layer gold in the contained in the gold in the rock, what %
layer layer comes from this layer?
2 Top 0.80 10.0% 0.0800 91.3%
3 Middle 0.15 5.0% 0.0075 8.6%
4 Bottom 0.05 0.2% 0.0001 0.1%
5 Total 1.00 0.0876 100%
Cell(D,2)=0.810%=0.08,
Cell(D,5)=0.0800+0.0075+0.0001=0.0876,
Cell(E,2)= 0.08/0.0876=91.3%.
So the rock has a gold density of 0.0876 (i.e. 8.76% of the mass of the rock is gold).
Of the total amount of gold contained in the rock, 91.3% of the gold comes from the top
layer, 8.6% of the gold comes from the middle layer, and the remaining 0.1% of the gold
comes from the bottom layer. In other words, the top layer contributes 91.3% of the
gold in the rock, the middle layer 8.6%, and the bottom layer 0.1%.
The logic behind this simple math problem is exactly the same logic behind Bayes'
Theorem.
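The table method above can be sketched in a few lines of Python. This is only an illustration, not part of the manual: the function name `bayes_table` is a hypothetical helper. It multiplies each group's size by its rate (column D) and divides by the total (column E).

```python
def bayes_table(sizes, rates):
    """Return (total, shares) for the table method.

    sizes: before-event size of each group (column B)
    rates: each group's rate of producing the event (column C)
    """
    raw = [s * r for s, r in zip(sizes, rates)]   # column D = B x C
    total = sum(raw)                              # cell (D, Total)
    shares = [x / total for x in raw]             # column E = D / total
    return total, shares


# Rock problem: top, middle, bottom layers
total, shares = bayes_table([0.80, 0.15, 0.05], [0.10, 0.05, 0.002])
print(round(total, 4))
print([round(100 * s, 1) for s in shares])
```

The same call with group sizes and event probabilities in place of layer masses and gold densities gives posterior probabilities.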
Now let's change the problem into one about prior and posterior probabilities.
Problem 2
The standard nonsmoker class has a 10% chance of getting the specific heart
disease.
The preferred nonsmoker class has a 5% chance of getting the specific heart
disease.
The super preferred nonsmoker class has a 0.2% chance of getting the specific
heart disease.
If a nonsmoking applicant was found to have this specific heart-related illness, what is
the probability of this applicant coming from the standard risk class? What is the
probability of this applicant coming from the preferred risk class? What is the probability
of this applicant coming from the super preferred risk class?
Solution
The solution to this problem is exactly the same as the one to the rock problem.
Event: the applicant was found to have the specific heart disease
     A                B              C                   D = B × C          E = D / 0.0876
                                                                            (i.e. the scaling factor
                                                                            = 1/0.0876)
1    Group            Before-event   This group's        After-event size   After-event size of the
     (or segment)     size of the    probability of      of the group       group (scaled)
                      group          having the          (not yet scaled)
                                     specific heart
                                     illness
2    Standard         0.80           10.0%               0.0800             91.3%
3    Preferred        0.15            5.0%               0.0075              8.6%
4    Super Preferred  0.05            0.2%               0.0001              0.1%
5    Total            1.00                               0.0876             100%
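So if the applicant was found to have the specific heart disease, then there is a 91.3%
chance that the applicant comes from the standard class, an 8.6% chance that the applicant
comes from the preferred class, and a 0.1% chance that the applicant comes from the super
preferred class.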
Problem 3
1% of the women at age 45 who participate in a study are found to have breast cancer.
80% of women with breast cancer will have a positive mammogram. 10% of women
without breast cancer will also have a positive mammogram. One woman aged 45 who
participated in the study was found to have a positive mammogram.
Calculate the probability that this woman has breast cancer.
Solution
This problem is tricky, and many people won't solve it correctly.
To solve this problem, we need to correctly identify the following 3 items:
What are the distinct causes (i.e. segments) that can possibly produce the event?
Make sure your causes are mutually exclusive (i.e. no two causes can happen
simultaneously) and collectively exhaustive (i.e. there are no other causes).
Causes of this event: there are two distinct causes, women with breast cancer and women
without breast cancer. These are the two segments. In terms of the size of each segment,
women with breast cancer account for 1% of the participants, and women without breast
cancer account for 99%.
Each cause's probability to produce the event: women with breast cancer have an 80%
chance of having a positive mammogram. Women without breast cancer have a 10%
chance of having a positive mammogram.
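As a rough check, the two segment sizes and the two probabilities just identified can be combined exactly as in the rock table. The sketch below is an illustration only; the dictionary names are hypothetical.

```python
# Segment sizes (prior) and each segment's chance of a positive mammogram
prior = {"cancer": 0.01, "no cancer": 0.99}
p_positive = {"cancer": 0.80, "no cancer": 0.10}

# After-event size of each segment (not yet scaled)
raw = {g: prior[g] * p_positive[g] for g in prior}

# Scale so the after-event sizes add up to one
posterior_cancer = raw["cancer"] / sum(raw.values())
print(round(posterior_cancer, 4))
```

Under these inputs the posterior is 0.008 / 0.107, well below the 80% that many people guess, because the cancer segment is so small to begin with.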
Results of the study showed that light smokers were twice as likely as nonsmokers to die
during the five-year study, but only half as likely as heavy smokers.
A randomly selected participant from the study died over the five-year period.
Solution
Let p = the probability that a non-smoker will die during the next 5 years. Then,
The probability that a light smoker will die during the next 5 years is 2p.
The probability that a heavy smoker will die during the next 5 years is 4p.
Please note that we don't have enough information to calculate p. This shouldn't bother us.
We don't need to know the value of p to solve the problem.
If we are to solve this problem quickly, we can set up the following table:
In the above table, we change the segment sizes from 20%, 30%, and 50% to 2, 3, and 5.
Similarly, we change the segments' probabilities from 4p, 2p, and p to 4, 2, and 1.
This speeds up our calculations. You can use this technique when taking the exam.
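The claim that rescaling does not change the answer can be checked with exact fractions. The sketch below is illustrative: `posterior` is a hypothetical helper, the value p = 1/50 is an arbitrary assumption, and the sizes and probabilities are paired in the order the text lists them (20% with 4p, 30% with 2p, 50% with p).

```python
from fractions import Fraction


def posterior(sizes, rates):
    """Scaled after-event sizes: size * rate, divided by the total."""
    raw = [Fraction(s) * Fraction(r) for s, r in zip(sizes, rates)]
    total = sum(raw)
    return [x / total for x in raw]


p = Fraction(1, 50)  # hypothetical value; any positive p gives the same result

exact = posterior(
    [Fraction(20, 100), Fraction(30, 100), Fraction(50, 100)],
    [4 * p, 2 * p, p],
)
scaled = posterior([2, 3, 5], [4, 2, 1])

print(exact)
print(scaled)
```

Both calls return the same three fractions, because a common factor in the sizes or in the probabilities cancels when dividing by the total.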
A portfolio of independent risks is divided into two classes, Class A and Class B.
There are twice as many risks in Class A as in Class B.
The number of claims for each insured during a single year follows a Bernoulli
distribution.
Class A and B have claim size distributions as follows:
The expected number of claims per year is 0.22 for Class A and 0.11 for Class B.
One insured is chosen at random. The insured's loss for two years combined is 100,000.
Calculate the probability that the selected insured belongs to Class A.
Solution
This time, we'll use a formula-driven approach without a table. Let S represent the
total claim dollar amount incurred by the randomly chosen insured during the 2-year period.
We observe that S = 100,000. We are asked to find P(A | S = 100,000), which is the
posterior probability that the insured belongs to Class A, given a total loss of $100,000
during the 2-year period.
Using either the conditional probability formula or Bayes' Theorem, we have:
\[ P(A \mid S = 100{,}000) = \frac{P(A)\,P(S = 100{,}000 \mid A)}{P(A)\,P(S = 100{,}000 \mid A) + P(B)\,P(S = 100{,}000 \mid B)} = \frac{1}{1 + \dfrac{P(B)\,P(S = 100{,}000 \mid B)}{P(A)\,P(S = 100{,}000 \mid A)}} \]
Because there are twice as many risks in Class A as in Class B, P(B) / P(A) = 1/2, so
\[ P(A \mid S = 100{,}000) = \frac{1}{1 + \dfrac{1}{2} \cdot \dfrac{P(S = 100{,}000 \mid B)}{P(S = 100{,}000 \mid A)}} \]
So we need to find the ratio
\[ \frac{P(S = 100{,}000 \mid B)}{P(S = 100{,}000 \mid A)} \]
P(S = 100,000 | A) is the probability that Class A produces the observation (i.e. a Class
A risk incurs a $100,000 loss in 2 years).
We are told that the # of claims for Class A and B is a Bernoulli random variable.
Remember that a Bernoulli random variable is just a binomial random variable with n = 1
(only one trial). Let X represent the # of claims incurred by the insured. Let p represent
the probability for the insured to have a claim. Then E(X) = p. We are told that
E(X | A) = 0.22. So p_A = 0.22. Similarly, E(X | B) = p_B = 0.11.
So each year, Class A can have either zero claims (with probability 0.78) or one claim
(0.22). The claim amount is either 50,000 (probability 0.6) or 100,000 (probability 0.4).
Each year, Class B can have either zero claims (with probability 0.89) or one claim (0.11).
The claim amount is either 50,000 (probability 0.36) or 100,000 (probability 0.64).
\[ P(A \mid S = 100{,}000) = \frac{1}{1 + \dfrac{1}{2} \cdot \dfrac{P(S = 100{,}000 \mid B)}{P(S = 100{,}000 \mid A)}} = \frac{1}{1 + \dfrac{1}{2} \cdot \dfrac{0.1269}{0.1547}} = 0.709 \]
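The two likelihoods 0.1547 and 0.1269 can be reproduced by listing every two-year outcome that adds up to 100,000. The sketch below is one way to do that enumeration; `prob_total` is a hypothetical helper, and it assumes the two years are independent.

```python
from itertools import product


def prob_total(p_claim, severity, target, years=2):
    """P(total loss over `years` equals `target`) for a Bernoulli claim count."""
    # One-year loss distribution: 0 with prob 1 - p, else a claim amount
    one_year = {0: 1 - p_claim}
    for amount, pr in severity.items():
        one_year[amount] = p_claim * pr
    total = 0.0
    for combo in product(one_year.items(), repeat=years):
        if sum(amount for amount, _ in combo) == target:
            pr = 1.0
            for _, q in combo:
                pr *= q
            total += pr
    return total


p_a = prob_total(0.22, {50_000: 0.6, 100_000: 0.4}, 100_000)
p_b = prob_total(0.11, {50_000: 0.36, 100_000: 0.64}, 100_000)
post_a = 1 / (1 + 0.5 * p_b / p_a)   # P(B)/P(A) = 1/2
print(round(p_a, 4), round(p_b, 4), round(post_a, 3))
```

For Class A the qualifying outcomes are one 100,000 claim in either year with no claim in the other, or a 50,000 claim in both years.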
You are tossing a coin. Not knowing p, the probability of heads showing up in one
toss of the coin, you subjectively assume that p is uniformly distributed over [0, 1]. Next,
you do an experiment by tossing the coin 3 times. You find that, in this experiment, 2 out
of 3 tosses are heads.
Solution
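The key to solving this problem is to understand that we have an infinite number of
groups. Each value of \( p \) (\( 0 \le p \le 1 \)) is a group. Because \( p \) is uniform over
\( [0, 1] \), each group has a before-event size (prior density) of 1. The after-event size of
group \( p \) is
\[ k \times 1 \times C_3^2\, p^2 (1 - p) \]
where \( k \) is the scaling factor, 1 is the before-event group size, and \( C_3^2\, p^2 (1 - p) \)
is the group's probability to have 2 heads out of 3 tosses.
\( k \) is a scaling factor such that the sum of the after-event sizes for all the groups is equal to
one. Since we have an infinite number of groups, we have to use integration to sum up all
the after-event sizes for each group:
\[ k \int_0^1 C_3^2\, p^2 (1 - p)\, dp = 1 \quad\Rightarrow\quad k = \frac{1}{\int_0^1 C_3^2\, p^2 (1 - p)\, dp} \]
Before scaling, the after-event sizes total \( \int_0^1 C_3^2\, p^2 (1 - p)\, dp \); after scaling, they
total 100%. The posterior density of \( p \) is therefore
\[ k\, C_3^2\, p^2 (1 - p) = \frac{C_3^2\, p^2 (1 - p)}{\int_0^1 C_3^2\, p^2 (1 - p)\, dp} = \frac{p^2 (1 - p)}{\int_0^1 p^2 (1 - p)\, dp} \]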
It turns out that the posterior probability we just calculated is a Beta distribution.
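To make the scaling factor concrete, the integral can be evaluated exactly, since the raw posterior is a polynomial in p. The sketch below is an illustration only; `integrate_poly` is a hypothetical helper that integrates a polynomial given by its coefficients.

```python
from fractions import Fraction


def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of sum(coeffs[k] * p**k)."""
    return sum(
        Fraction(c) * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
        for k, c in enumerate(coeffs)
    )


# Raw posterior: 1 * C(3,2) * p^2 * (1 - p) = 3p^2 - 3p^3
raw = [0, 0, 3, -3]

k = 1 / integrate_poly(raw, 0, 1)     # scaling factor
posterior = [k * c for c in raw]      # coefficients of the posterior density

print(k, posterior)
print(integrate_poly(posterior, 0, 1))
```

With these inputs the scaled density is 12p^2(1 - p), which integrates to one over [0, 1].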
Key point
The process for calculating the continuous posterior probability is the same as the process
for calculating the discrete posterior probability. The only difference is this: you use
integration for continuous posterior probability; you use summation for discrete posterior
probability.
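\[ f(x \mid \theta) = \frac{\theta\, e^{-\theta / x}}{x^2}, \quad x > 0 \]
The parameter \( \theta \) has the prior distribution with the following probability density
function:
\[ g(\theta) = \frac{e^{-\theta / 4}}{4}, \quad \theta > 0 \]
One claim of size 2 has been observed for a particular insured. Which of the following is
proportional to the posterior distribution of \( \theta \)?
\( \theta e^{-\theta/2} \), \( \theta e^{-3\theta/4} \), \( \theta e^{-\theta} \), \( \theta^2 e^{-\theta/2} \), \( \theta^2 e^{-9\theta/4} \)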
Solution
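\[ g(\theta \mid x = 2) = k\, g(\theta)\, f(x = 2 \mid \theta) \]
Here \( g(\theta \mid x = 2) \) is the posterior density, \( k \) is the scaling factor, \( g(\theta) \) is the
prior density, and \( f(x = 2 \mid \theta) \) is this group's density to make the event happen.
\[ g(\theta \mid x = 2) = k \cdot \frac{e^{-\theta/4}}{4} \cdot \left. \frac{\theta e^{-\theta/x}}{x^2} \right|_{x = 2} = k \cdot \frac{e^{-\theta/4}}{4} \cdot \frac{\theta e^{-\theta/2}}{2^2} = \frac{k}{16}\, \theta e^{-3\theta/4} \]
So the posterior distribution of \( \theta \) is proportional to \( \theta e^{-3\theta/4} \).
Here the problem didn't ask you to find the full posterior density. If you have to find
it, this is how. One way is to do integration. Assume \( g(\theta \mid x = 2) = K \theta e^{-3\theta/4} \). Because the
total posterior probability should be one, we have:
\[ \int_0^{+\infty} g(\theta \mid x = 2)\, d\theta = K \int_0^{+\infty} \theta e^{-3\theta/4}\, d\theta = 1, \quad K = \frac{1}{\int_0^{+\infty} \theta e^{-3\theta/4}\, d\theta} \]
To calculate \( \int_0^{+\infty} \theta e^{-3\theta/4}\, d\theta \), set \( \tfrac{3}{4}\theta = y \). Then \( \theta = \tfrac{4}{3} y \).
\[ \int_0^{+\infty} \theta e^{-3\theta/4}\, d\theta = \int_0^{+\infty} \frac{4}{3} y\, e^{-y}\, d\!\left( \frac{4}{3} y \right) = \left( \frac{4}{3} \right)^2 \int_0^{+\infty} y e^{-y}\, dy \]
Here \( y e^{-y} \) is a simple gamma density. So \( \int_0^{+\infty} y e^{-y}\, dy = 1 \), \( K = \left( \tfrac{3}{4} \right)^2 = \tfrac{9}{16} \), and
\[ g(\theta \mid x = 2) = \frac{9}{16}\, \theta e^{-3\theta/4} \]
This is a gamma distribution with parameters \( \alpha = 2 \) and \( \theta = \tfrac{4}{3} \). If you look at the table for
Exam C, you'll see the gamma pdf (written here with \( x \) as the variable):
\[ f(x) = \frac{1}{\theta^{\alpha}\, \Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\theta} = \frac{1}{(4/3)^2\, \Gamma(2)}\, x^{2 - 1} e^{-x/(4/3)} = \frac{9}{16}\, x e^{-3x/4} \]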
A randomly selected policyholder has one claim in Year 1 and zero claims in Year 2.
For this policyholder, determine the posterior probability that 0.7 < q < 0.8.
Solution
\[ P(0.7 < q < 0.8 \mid N_1 = 1, N_2 = 0) = \int_{q = 0.7}^{0.8} f(q \mid N_1 = 1, N_2 = 0)\, dq \]
\[ f(q \mid N_1 = 1, N_2 = 0) = \frac{f(q)\, P(N_1 = 1, N_2 = 0 \mid q)}{P(N_1 = 1, N_2 = 0)} = \frac{f(q)\, P(N_1 = 1, N_2 = 0 \mid q)}{\int_{0.6}^{0.8} f(q)\, P(N_1 = 1, N_2 = 0 \mid q)\, dq} \]
\[ f(q)\, P(N_1 = 1, N_2 = 0 \mid q) = \frac{q^3}{0.07}\, q (1 - q) = \frac{q^4 - q^5}{0.07} \]
\[ P(0.7 < q < 0.8 \mid N_1 = 1, N_2 = 0) = \frac{\int_{q = 0.7}^{0.8} (q^4 - q^5)\, dq}{\int_{0.6}^{0.8} (q^4 - q^5)\, dq} = \frac{\left[ \tfrac{1}{5} q^5 - \tfrac{1}{6} q^6 \right]_{0.7}^{0.8}}{\left[ \tfrac{1}{5} q^5 - \tfrac{1}{6} q^6 \right]_{0.6}^{0.8}} \]
\[ = \frac{\tfrac{1}{5} \left( 0.8^5 - 0.7^5 \right) - \tfrac{1}{6} \left( 0.8^6 - 0.7^6 \right)}{\tfrac{1}{5} \left( 0.8^5 - 0.6^5 \right) - \tfrac{1}{6} \left( 0.8^6 - 0.6^6 \right)} = 0.5572 \]
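A quick numerical check of this ratio, written as a sketch: it takes the raw posterior to be proportional to q^4 - q^5 on (0.6, 0.8), as in the solution above, and uses the antiderivative directly. The function name `F` is hypothetical.

```python
def F(q):
    """Antiderivative of q**4 - q**5 (the constant 1/0.07 cancels in the ratio)."""
    return q**5 / 5 - q**6 / 6


numerator = F(0.8) - F(0.7)      # raw posterior mass on (0.7, 0.8)
denominator = F(0.8) - F(0.6)    # raw posterior mass on the whole range (0.6, 0.8)
prob = numerator / denominator
print(round(prob, 4))
```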
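\[ f(\lambda) = \lambda e^{-\lambda}, \quad \lambda > 0 \]
\[ \int_0^{\infty} \lambda e^{-n\lambda}\, d\lambda = \frac{1}{n^2} \]
A randomly selected policyholder is known to have had at least one claim last year.
Determine the posterior probability that this same policyholder will have at least one
claim this year.
Solution
\[ P(N_2 \ge 1) = \int_{\lambda = 0}^{\infty} P(N_2 \ge 1 \mid \lambda)\, f(\lambda)\, d\lambda \]
\[ P(N_2 \ge 1 \mid \lambda) = 1 - P(N_2 = 0 \mid \lambda) = 1 - e^{-\lambda} \]
\[ P(N_2 \ge 1) = \int_{\lambda = 0}^{\infty} \left( 1 - e^{-\lambda} \right) f(\lambda)\, d\lambda \]
\[ P(N_2 \ge 1 \mid N_1 \ge 1) = \int_{\lambda = 0}^{\infty} \left( 1 - e^{-\lambda} \right) f(\lambda \mid N_1 \ge 1)\, d\lambda \]
Next, we have:
\[ f(\lambda \mid N_1 \ge 1) = \frac{f(\lambda)\, P(N_1 \ge 1 \mid \lambda)}{\int_0^{\infty} f(\lambda)\, P(N_1 \ge 1 \mid \lambda)\, d\lambda} = \frac{\lambda e^{-\lambda} \left( 1 - e^{-\lambda} \right)}{\int_0^{\infty} \lambda e^{-\lambda} \left( 1 - e^{-\lambda} \right) d\lambda} \]
\[ \int_0^{\infty} \lambda e^{-\lambda} \left( 1 - e^{-\lambda} \right) d\lambda = \int_0^{\infty} \lambda e^{-\lambda}\, d\lambda - \int_0^{\infty} \lambda e^{-2\lambda}\, d\lambda = \frac{1}{1^2} - \frac{1}{2^2} = \frac{3}{4} \]
\[ f(\lambda \mid N_1 \ge 1) = \frac{4}{3}\, \lambda e^{-\lambda} \left( 1 - e^{-\lambda} \right) \]
\[ P(N_2 \ge 1 \mid N_1 \ge 1) = \int_{\lambda = 0}^{\infty} \left( 1 - e^{-\lambda} \right) \frac{4}{3}\, \lambda e^{-\lambda} \left( 1 - e^{-\lambda} \right) d\lambda = \frac{4}{3} \int_{\lambda = 0}^{\infty} \lambda e^{-\lambda} \left( 1 - 2 e^{-\lambda} + e^{-2\lambda} \right) d\lambda \]
\[ = \frac{4}{3} \left[ \int_0^{\infty} \lambda e^{-\lambda}\, d\lambda - 2 \int_0^{\infty} \lambda e^{-2\lambda}\, d\lambda + \int_0^{\infty} \lambda e^{-3\lambda}\, d\lambda \right] = \frac{4}{3} \left( \frac{1}{1^2} - \frac{2}{2^2} + \frac{1}{3^2} \right) = 0.8148 \]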
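The arithmetic above only uses the given identity, so it can be checked with exact fractions. This is a sketch, assuming the prior density is λe^(-λ) as the normalizing constant 3/4 implies; the helper name `I` is hypothetical.

```python
from fractions import Fraction


def I(n):
    """Given identity: integral of lambda * exp(-n * lambda) over (0, inf) = 1 / n^2."""
    return Fraction(1, n * n)


norm = I(1) - I(2)                          # P(N1 >= 1) = 3/4
answer = (I(1) - 2 * I(2) + I(3)) / norm    # P(N2 >= 1 | N1 >= 1)
print(norm, answer, round(float(answer), 4))
```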
A coin is selected at random and then flipped repeatedly. X_i denotes the outcome of the
i-th flip, where 1 indicates heads and 0 indicates tails. The following sequence is
obtained:
S = { X 1 , X 2 , X 3 , X 4 } = {1,1, 0,1}
Solution
Step 1 Determine the observation. This is easy; we are already told the observation is
S = { X 1 , X 2 , X 3 , X 4 } = {1,1, 0,1}
\[ E(X_5) = n p = p \]
However, the parameter p varies by coin type. For Coins 1-4, p = 0.5; for Coin 5,
p = 0.25; and for Coin 6, p = 0.75. Because the coin is randomly chosen from Coins 1, 2,
3, 4, 5, and 6, we don't know which coin is chosen. So we'll need to partition E(X_5)
over coin types:
\[ E(X_5) = E(X_5 \mid \text{Coin 1-4})\, P(\text{Coin 1-4}) + E(X_5 \mid \text{Coin 5})\, P(\text{Coin 5}) + E(X_5 \mid \text{Coin 6})\, P(\text{Coin 6}) \]
We can go one step further and calculate E(X_5). Though the problem doesn't
specifically tell us P(Coin 1-4), P(Coin 5), and P(Coin 6), we assume that coins are
uniformly distributed so each coin is equally likely to be chosen. So
\[ P(\text{Coin 1-4}) = \frac{4}{6}, \quad P(\text{Coin 5}) = \frac{1}{6}, \quad P(\text{Coin 6}) = \frac{1}{6} \]
\[ E(X_5) = 0.5 \left( \frac{4}{6} \right) + 0.25 \left( \frac{1}{6} \right) + 0.75 \left( \frac{1}{6} \right) = 0.5 \]
Of course, this problem isn't as simple as this. Otherwise, everyone who has passed
Exam P would pass Exam C.
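Step 3 Consider the observation. Modify the equation obtained in Step 2. Change
\[ E(X_5) = E(X_5 \mid \text{Coin 1-4})\, P(\text{Coin 1-4}) + E(X_5 \mid \text{Coin 5})\, P(\text{Coin 5}) + E(X_5 \mid \text{Coin 6})\, P(\text{Coin 6}) \]
How to modify:
\( E(X_5) \rightarrow E(X_5 \mid S) \)
\( P(\text{Coin 1-4}) \rightarrow P(\text{Coin 1-4} \mid S) \)
\( P(\text{Coin 5}) \rightarrow P(\text{Coin 5} \mid S) \)
\( P(\text{Coin 6}) \rightarrow P(\text{Coin 6} \mid S) \)
\[ E(X_5 \mid S) = E(X_5 \mid \text{Coin 1-4})\, P(\text{Coin 1-4} \mid S) + E(X_5 \mid \text{Coin 5})\, P(\text{Coin 5} \mid S) + E(X_5 \mid \text{Coin 6})\, P(\text{Coin 6} \mid S) \]
\[ = 0.5\, P(\text{Coin 1-4} \mid S) + 0.25\, P(\text{Coin 5} \mid S) + 0.75\, P(\text{Coin 6} \mid S) \]
By Bayes' Theorem, each posterior probability is the prior probability times the probability
of S for that coin type, divided by P(S), where
\[ P(S) = P(\text{Coin 1-4})\, P(S \mid \text{Coin 1-4}) + P(\text{Coin 5})\, P(S \mid \text{Coin 5}) + P(\text{Coin 6})\, P(S \mid \text{Coin 6}) \]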
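Detailed calculation:
\[ P(S \mid \text{Coin 1-4}) = P(1,1,0,1 \mid \text{Coin 1-4}) = 0.5^4 \]
\[ P(S \mid \text{Coin 5}) = P(1,1,0,1 \mid \text{Coin 5}) = 0.25\,(0.25)(0.75)(0.25) = 0.25^3\,(0.75) \]
\[ P(S \mid \text{Coin 6}) = P(1,1,0,1 \mid \text{Coin 6}) = 0.75\,(0.75)(0.25)(0.75) = 0.75^3\,(0.25) \]
\[ P(\text{Coin 1-4} \mid S) = \frac{\tfrac{4}{6} \left( 0.5^4 \right)}{\tfrac{4}{6} \left( 0.5^4 \right) + \tfrac{1}{6} \left( 0.25^3 \right)(0.75) + \tfrac{1}{6} \left( 0.75^3 \right)(0.25)} = 0.681 \]
\[ P(\text{Coin 5} \mid S) = \frac{\tfrac{1}{6} \left( 0.25^3 \right)(0.75)}{\tfrac{4}{6} \left( 0.5^4 \right) + \tfrac{1}{6} \left( 0.25^3 \right)(0.75) + \tfrac{1}{6} \left( 0.75^3 \right)(0.25)} = 0.032 \]
\[ P(\text{Coin 6} \mid S) = \frac{\tfrac{1}{6} \left( 0.75^3 \right)(0.25)}{\tfrac{4}{6} \left( 0.5^4 \right) + \tfrac{1}{6} \left( 0.25^3 \right)(0.75) + \tfrac{1}{6} \left( 0.75^3 \right)(0.25)} = 0.287 \]
\[ E(X_5 \mid S) = 0.5\, P(\text{Coin 1-4} \mid S) + 0.25\, P(\text{Coin 5} \mid S) + 0.75\, P(\text{Coin 6} \mid S) \]
\[ = 0.5\,(0.681) + 0.25\,(0.032) + 0.75\,(0.287) = 0.564 \]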
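The whole calculation for the coin problem fits in a short script. The sketch below is illustrative; the names `groups` and `likelihood` are hypothetical, and it assumes each of the six coins is equally likely to be chosen, as the solution does.

```python
# (prior probability, probability of heads) for Coins 1-4, Coin 5, Coin 6
groups = [(4 / 6, 0.5), (1 / 6, 0.25), (1 / 6, 0.75)]
observed = [1, 1, 0, 1]


def likelihood(p, flips):
    """Probability of the observed flip sequence when P(heads) = p."""
    result = 1.0
    for x in flips:
        result *= p if x == 1 else 1 - p
    return result


raw = [prior * likelihood(p, observed) for prior, p in groups]   # not yet scaled
total = sum(raw)                                                  # P(S)
post = [r / total for r in raw]                                   # posterior probabilities
premium = sum(w * p for w, (_, p) in zip(post, groups))           # E(X5 | S)

print([round(x, 3) for x in post], round(premium, 3))
```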
I recommend that initially you use the 5-step framework to calculate discrete-prior
Bayesian premiums. Just copy what I did. Explicitly write out each of the 5 steps; don't
skip any step. Solve as many problems as you need until you are proficient with the
framework.
Once you are familiar with the 5-step process, let's learn how to improve it. We'll focus
on improving Step 4 (calculating the posterior probabilities). If you have ever solved a Bayesian
premium problem, you'll have discovered that Step 4 is long, tedious, and prone to errors.
Take a look at Step 4 in Problem 4. See how involved the calculation is. When taking the
exam, you are really stressed. In addition, you have only 3 minutes to solve a problem. If
you follow the standard solution approach, chances are high that you'll mess up at least
one step of your calculation. Then all your hard work is ruined. You won't be able to
score a point.
Most exam candidates will mess up in Step 4. Let's find a better way to do Step 4.
\[ k = \frac{1}{P(S)} = \frac{1}{P(\text{Coin 1-4})\, P(S \mid \text{Coin 1-4}) + P(\text{Coin 5})\, P(S \mid \text{Coin 5}) + P(\text{Coin 6})\, P(S \mid \text{Coin 6})} \]
After multiplying each raw posterior probability by this constant, the three posterior
probabilities will nicely add up to one. Normalization is necessary; it's a part of Bayes'
Theorem. However, it is a messy calculation. So ideally, we'll want to avoid it.
It turns out that we really can avoid normalizing the raw posterior probabilities. To
understand how to avoid normalization, let's formally present the question:
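Conditional mean    Posterior probability (raw posterior × k)
0.5                 (4/6)(0.5^4) k
0.25                (1/6)(0.25^3)(0.75) k
0.75                (1/6)(0.75^3)(0.25) k

Conditional mean    Posterior probability (raw posterior × k)    Scaled up (× 1,000,000, dropping k)
0.5                 (4/6)(0.5^4) k = 0.041667 k                   41,667
0.25                (1/6)(0.25^3)(0.75) k = 0.001953 k             1,953
0.75                (1/6)(0.75^3)(0.25) k = 0.017578 k            17,578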
X01 = 0.5, Y01 = 41,667
X02 = 0.25, Y02 = 1,953
X03 = 0.75, Y03 = 17,578
Press the down arrow key ↓. You should get: n = 61,198
Press the down arrow key ↓. You should get: \( \bar{X} \) = 0.56382970
So \( E(X_5 \mid S) \approx \bar{X} = 0.564 \)
This result, calculated using the BA II Plus/Professional 1-V Statistics Worksheet, matches
what we calculated in the 5-step process.
Using the 1-V Statistics Worksheet, you should get: n = 613, \( \bar{X} = 0.56362153 \approx 0.564 \)
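What the worksheet does with the X and Y entries is a frequency-weighted mean, which can be imitated in code. The sketch below is an illustration, not calculator documentation: `one_var_stats` is a hypothetical helper, and the coarser frequencies 417, 20, and 176 are an assumption chosen to be consistent with n = 613.

```python
def one_var_stats(xs, freqs):
    """Frequency-weighted count and mean: X values with Y as frequencies."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(xs, freqs)) / n
    return n, mean


means = [0.5, 0.25, 0.75]

# Raw posteriors scaled up to integers
n, xbar = one_var_stats(means, [41667, 1953, 17578])
print(n, round(xbar, 8))

# Coarser scaling (assumed frequencies that add up to 613)
n2, xbar2 = one_var_stats(means, [417, 20, 176])
print(n2, round(xbar2, 8))
```

Because only the ratios of the frequencies matter, the two scalings agree to three decimal places.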
Risks in Class A have a Poisson claim count distribution with a mean of 1.0 per year.
Risks in Class B have a Poisson claim count distribution with a mean of 3.0 per year.
Risks in Class A have an exponential severity distribution with a mean of 1.0 per year.
Risks in Class B have an exponential severity distribution with a mean of 3.0 per year.
Each class has the same number of risks.
Within each class, severities and claim counts are independent.
A risk is randomly selected and observed to have 2 claims during one year. The observed
claim amounts were 1.0 and 3.0. Calculate the posterior expected value of the aggregate
loss for this risk during the next year.
Solution
Let
S represent the aggregate claim dollar amount.
X represent the individual claim dollar amount
N represent the # of claims
Then \( S = \sum_{i=1}^{N} X_i \). We are told that N and X are independent. In addition, the \( X_i \) are
independent and identically distributed.
First, let's make things simple and forget about the condition N = 2, X_1 = 1, X_2 = 3. Then
E(S) = E(N) E(X). Since the risk is randomly chosen from Class A and Class B, we
have:
\[ E(S) = E(S \mid A)\, P(A) + E(S \mid B)\, P(B) \]
The above formula is an Exam P concept. You shouldn't have trouble understanding it.
Here P(A) and P(B) are prior probabilities, which are probabilities prior to our
observation {N = 2, X_1 = 1, X_2 = 3}.
Next,
\[ E(S \mid A) = E(N \mid A)\, E(X \mid A) = \lambda_A \theta_A = 1(1) = 1 \]
\[ E(S \mid B) = E(N \mid B)\, E(X \mid B) = \lambda_B \theta_B = 3(3) = 9 \]
Here \( \lambda_A \) and \( \lambda_B \) are the Poisson means for claim counts for Class A and B respectively,
and \( \theta_A \) and \( \theta_B \) are the exponential mean claim amounts for Class A and B respectively.
\[ E(S) = P(A) + 9 P(B) \]
Our observation {N = 2, X_1 = 1, X_2 = 3} has changed our belief about the likelihood that the
risk is from Class A or Class B. So we'll no longer use the prior probabilities P(A) and
P(B) to calculate E(S). Instead, we use the posterior probabilities:
\[ E(S \mid N = 2, X_1 = 1, X_2 = 3) = P(A \mid N = 2, X_1 = 1, X_2 = 3) + 9\, P(B \mid N = 2, X_1 = 1, X_2 = 3) \]
Next, we'll need to use Bayes' theorem to calculate the posterior probabilities
P(A | N = 2, X_1 = 1, X_2 = 3) and P(B | N = 2, X_1 = 1, X_2 = 3):
\[ P(A \mid N = 2, X_1 = 1, X_2 = 3) = \frac{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A)}{P(N = 2, X_1 = 1, X_2 = 3)} \]
\[ = \frac{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A)}{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A) + P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)} \]
\[ P(B \mid N = 2, X_1 = 1, X_2 = 3) = \frac{P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)}{P(N = 2, X_1 = 1, X_2 = 3)} \]
\[ = \frac{P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)}{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A) + P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)} \]
If you understand my logic so far, you are in good shape. The remaining work is just
the calculation.
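Standard calculation
We'll calculate the probability for a Class A risk and a Class B risk to each produce the
observed outcome {N = 2, X_1 = 1, X_2 = 3}:
\[ P_A\{N = 2, X_1 = 1, X_2 = 3\} = P_A(N = 2)\, P_A(X = 1)\, P_A(X = 3) \]
\[ = e^{-\lambda_A} \frac{(\lambda_A)^2}{2!} \cdot \frac{1}{\theta_A} e^{-1/\theta_A} \cdot \frac{1}{\theta_A} e^{-3/\theta_A} = e^{-1} \frac{1}{2!} \left( e^{-1} \right) \left( e^{-3} \right) = \frac{1}{2} e^{-5} = 0.00337 \]
\[ P_B\{N = 2, X_1 = 1, X_2 = 3\} = P_B(N = 2)\, P_B(X = 1)\, P_B(X = 3) \]
\[ = e^{-\lambda_B} \frac{(\lambda_B)^2}{2!} \cdot \frac{1}{\theta_B} e^{-1/\theta_B} \cdot \frac{1}{\theta_B} e^{-3/\theta_B} = e^{-3} \frac{3^2}{2!} \cdot \frac{1}{3} e^{-1/3} \cdot \frac{1}{3} e^{-3/3} = \frac{1}{2} e^{-3} e^{-4/3} = 0.00656 \]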
\[ P(A \mid N = 2, X_1 = 1, X_2 = 3) = \frac{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A)}{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A) + P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)} \]
\[ = \frac{0.5\,(0.00337)}{0.5\,(0.00337) + 0.5\,(0.00656)} = 0.339 \]
Similarly,
\[ P(B \mid N = 2, X_1 = 1, X_2 = 3) = \frac{P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)}{P(A)\, P(N = 2, X_1 = 1, X_2 = 3 \mid A) + P(B)\, P(N = 2, X_1 = 1, X_2 = 3 \mid B)} \]
\[ = \frac{0.5\,(0.00656)}{0.5\,(0.00337) + 0.5\,(0.00656)} = 0.661 \]
Finally,
\[ E(S \mid N = 2, X_1 = 1, X_2 = 3) = P(A \mid N = 2, X_1 = 1, X_2 = 3) + 9\, P(B \mid N = 2, X_1 = 1, X_2 = 3) \]
\[ = 1(0.339) + 9(0.661) = 6.29 \]
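The standard calculation can be reproduced as follows. This is a sketch under the problem's stated assumptions (Poisson counts, exponential severities, equal class sizes); the helper names `poisson_pmf` and `expo_pdf` are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(N = k) for a Poisson count with mean lam."""
    return exp(-lam) * lam**k / factorial(k)


def expo_pdf(x, mean):
    """Exponential density with the given mean, evaluated at x."""
    return exp(-x / mean) / mean


# class name -> (prior, Poisson mean, exponential mean)
classes = {"A": (0.5, 1.0, 1.0), "B": (0.5, 3.0, 3.0)}
claims = [1.0, 3.0]   # observed claim amounts; N = len(claims) = 2

raw = {}
for name, (prior, lam, theta) in classes.items():
    like = poisson_pmf(len(claims), lam)
    for x in claims:
        like *= expo_pdf(x, theta)
    raw[name] = prior * like          # after-event size, not yet scaled

total = sum(raw.values())
post = {name: r / total for name, r in raw.items()}

# Conditional mean of next year's aggregate loss is lam * theta for each class
premium = sum(post[name] * lam * theta for name, (_, lam, theta) in classes.items())
print({k: round(v, 3) for k, v in post.items()}, round(premium, 2))
```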
When taking the exam, you'll still need to understand the conceptual framework
explained in the beginning of the solution. However, you'll skip the normalizing step and
avoid the need to manually calculate the mean.
This is what you need when solving this problem under exam conditions:
Event: {N = 2, X_1 = 1, X_2 = 3}

Group   Before-event   This group's probability        After-event size     Scaled-up raw       Conditional mean
        size of the    to produce the event            of the group (raw    posterior (raw
        group                                          posterior            probability
                                                       probability)         × 200,000)
A       0.5            \( e^{-1} \frac{1}{2!} (e^{-1})(e^{-3}) = \frac{1}{2} e^{-5} = 0.00337 \)    0.5(0.00337)    337    \( \lambda_A \theta_A = 1(1) = 1 \)
B       0.5            \( e^{-3} \frac{3^2}{2!} \cdot \frac{1}{3} e^{-1/3} \cdot \frac{1}{3} e^{-3/3} = \frac{1}{2} e^{-3} e^{-4/3} = 0.00656 \)    0.5(0.00656)    656    \( \lambda_B \theta_B = 3(3) = 9 \)
Solution
\[ E(N_3) = E(N_3 \mid A)\, P(A) + E(N_3 \mid B)\, P(B) = \lambda_A P(A) + \lambda_B P(B) = 2 P(A) + 4 P(B) \]
Next, we'll modify the above partition equation by considering the observation
{N_1 = 4, N_2 = 4}. We'll change the prior probabilities to posterior probabilities:
\[ E(N_3 \mid N_1 = 4, N_2 = 4) = 2\, P(A \mid N_1 = 4, N_2 = 4) + 4\, P(B \mid N_1 = 4, N_2 = 4) \]
\[ P(A \mid N_1 = 4, N_2 = 4) = \frac{P\left[ A \cap (N_1 = 4, N_2 = 4) \right]}{P(N_1 = 4, N_2 = 4)} = \frac{P(A)\, P(N_1 = 4, N_2 = 4 \mid A)}{P(A)\, P(N_1 = 4, N_2 = 4 \mid A) + P(B)\, P(N_1 = 4, N_2 = 4 \mid B)} \]
Similarly,
\[ P(B \mid N_1 = 4, N_2 = 4) = \frac{P(B)\, P(N_1 = 4, N_2 = 4 \mid B)}{P(A)\, P(N_1 = 4, N_2 = 4 \mid A) + P(B)\, P(N_1 = 4, N_2 = 4 \mid B)} \]
Detailed calculations (if you use my shortcut, you'll avoid most of these calculations):
\[ P(N_1 = 4, N_2 = 4 \mid A) = P(N_1 = 4 \mid A)\, P(N_2 = 4 \mid A) = \left[ e^{-\lambda_A} \frac{(\lambda_A)^4}{4!} \right]^2 = \left[ e^{-2} \frac{2^4}{4!} \right]^2 \]
\[ P(A \mid N_1 = 4, N_2 = 4) = \frac{0.5 \left[ e^{-2} \frac{2^4}{4!} \right]^2}{0.5 \left[ e^{-2} \frac{2^4}{4!} \right]^2 + 0.5 \left[ e^{-4} \frac{4^4}{4!} \right]^2} = 0.176 \]
\[ P(B \mid N_1 = 4, N_2 = 4) = \frac{0.5 \left[ e^{-4} \frac{4^4}{4!} \right]^2}{0.5 \left[ e^{-2} \frac{2^4}{4!} \right]^2 + 0.5 \left[ e^{-4} \frac{4^4}{4!} \right]^2} = 0.824 \]
The above two calculations are nasty and prone to errors. Many candidates will mess up
these calculations and won't score a point. Assuming you have done your calculation
right, you should get:
\[ E(N_3 \mid N_1 = 4, N_2 = 4) = 2\, P(A \mid N_1 = 4, N_2 = 4) + 4\, P(B \mid N_1 = 4, N_2 = 4) \]
\[ = 2(0.176) + 4(0.824) = 3.648 \]
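Just set up the following table and let the BA II Plus/Professional 1-V Statistics Worksheet
do the magic for you. Watch and relax.
Event: {N_1 = 4, N_2 = 4}

Group   Before-event   This group's probability     After-event size of the group     Scale up raw   Conditional mean
        size of the    to produce the event         (raw posterior probability)       posterior
        group                                                                         probability
A       0.5            \( \left[ e^{-2} \frac{2^4}{4!} \right]^2 \)    \( 0.5 \left[ e^{-2} \frac{2^4}{4!} \right]^2 \)                   \( \lambda_A = 2 \)
B       0.5            \( \left[ e^{-4} \frac{4^4}{4!} \right]^2 \)    \( 0.5 \left[ e^{-4} \frac{4^4}{4!} \right]^2 \)                   \( \lambda_B = 4 \)

Next, we'll need to scale the raw posterior probabilities up. We'll want to avoid the error-
prone calculation of the following two raw posterior probabilities:
\[ 0.5 \left[ e^{-2} \frac{2^4}{4!} \right]^2, \quad 0.5 \left[ e^{-4} \frac{4^4}{4!} \right]^2 \]
Remember what I said earlier when I was explaining Bayes' Theorem to you:
What matters is the ratio of these two (or more) raw posterior probabilities, not their
absolute amounts.
\[ \frac{0.5 \left[ e^{-2} \frac{2^4}{4!} \right]^2}{0.5 \left[ e^{-2} \frac{2^4}{4!} \right]^2} = 1, \quad \frac{0.5 \left[ e^{-4} \frac{4^4}{4!} \right]^2}{0.5 \left[ e^{-2} \frac{2^4}{4!} \right]^2} = \frac{2^{16} \left( e^{-4} \right)^2}{2^{8} \left( e^{-2} \right)^2} = 2^8 e^{-4} = 256 e^{-4} = 4.689 \]
New Table

Group   Before-event   This group's probability     Raw posterior divided by            Scale up    Conditional mean
        size           to produce the event         Group A's raw posterior             (× 1,000)
A       0.5            \( \left[ e^{-2} \frac{2^4}{4!} \right]^2 \)    1                                   1,000       \( \lambda_A = 2 \)
B       0.5            \( \left[ e^{-4} \frac{4^4}{4!} \right]^2 \)    \( 256 e^{-4} = 4.689 \)                 4,689       \( \lambda_B = 4 \)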
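The ratio shortcut can be checked against the standard answer of 3.648. The sketch below is illustrative; the variable names are hypothetical. It keeps only the ratio of Group B's raw posterior to Group A's and uses the ratios as weights.

```python
from math import exp

# Raw posterior of B divided by raw posterior of A; the 0.5 and 4! factors cancel
ratio_b = (4**4 * exp(-4)) ** 2 / (2**4 * exp(-2)) ** 2

weights = {"A": 1.0, "B": ratio_b}      # relative after-event sizes
cond_mean = {"A": 2.0, "B": 4.0}        # Poisson means

premium = sum(weights[g] * cond_mean[g] for g in weights) / sum(weights.values())
print(round(ratio_b, 3), round(premium, 3))
```

Multiplying both weights by 1,000 (to get 1,000 and 4,689) would not change the weighted mean.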
Prior to observing any claims, you believed that claim sizes followed a Pareto distribution
with parameters \( \theta = 10 \) and \( \alpha = 1 \), 2, or 3, with each value equally likely. You then
observe one claim of 20 for a randomly selected risk. Determine the posterior probability
that the next claim for this risk will be greater than 30.
Solution
\[ P(X_2 > 30) = P(X_2 > 30 \mid \alpha = 1)\, P(\alpha = 1) + P(X_2 > 30 \mid \alpha = 2)\, P(\alpha = 2) + P(X_2 > 30 \mid \alpha = 3)\, P(\alpha = 3) \]
If the random variable is greater than zero, then use the two-parameter Pareto.
If the random variable is greater than a positive constant, then use the one-parameter Pareto.
The problem just vaguely says that claim sizes follow a Pareto distribution. Here the
claim size (i.e. claim dollar amount) must be greater than zero. There's no reason for us
to think that the claim dollar amount must exceed a positive constant (such as $500). As a
result, we'll use the 2-parameter Pareto.
\[ P(X_2 > 30 \mid \alpha) = S(30) = \left( \frac{\theta}{30 + \theta} \right)^{\alpha} = \left( \frac{10}{30 + 10} \right)^{\alpha} = \left( \frac{1}{4} \right)^{\alpha} \]
\[ P(X_2 > 30) = \left( \frac{1}{4} \right)^{1} P(\alpha = 1) + \left( \frac{1}{4} \right)^{2} P(\alpha = 2) + \left( \frac{1}{4} \right)^{3} P(\alpha = 3) \]
\[ = \frac{1}{4} P(\alpha = 1) + \frac{1}{16} P(\alpha = 2) + \frac{1}{64} P(\alpha = 3) \]
\[ P(X_2 > 30 \mid X_1 = 20) = \frac{1}{4} P(\alpha = 1 \mid X_1 = 20) + \frac{1}{16} P(\alpha = 2 \mid X_1 = 20) + \frac{1}{64} P(\alpha = 3 \mid X_1 = 20) \]
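Next, we'll calculate the posterior probabilities. If you look at Tables for Exam C/4,
you'll find the density function of a 2-parameter Pareto distribution with parameters \( \alpha \) and \( \theta \) is:
\[ f(x \mid \alpha) = \frac{\alpha\, \theta^{\alpha}}{(x + \theta)^{\alpha + 1}} = \frac{\alpha}{\theta} \left( \frac{\theta}{x + \theta} \right)^{\alpha + 1} \]
Then for \( \theta = 10 \),
\[ f(20 \mid \alpha) = \frac{\alpha}{10} \left( \frac{10}{20 + 10} \right)^{\alpha + 1} = \frac{\alpha}{10} \left( \frac{1}{3} \right)^{\alpha + 1} \]
\[ P(\alpha = 1 \mid X_1 = 20) = \frac{P(\alpha = 1)\, f(20 \mid \alpha = 1)}{f(20)} \]
\[ P(\alpha = 2 \mid X_1 = 20) = \frac{P(\alpha = 2)\, f(20 \mid \alpha = 2)}{f(20)} \]
\[ P(\alpha = 3 \mid X_1 = 20) = \frac{P(\alpha = 3)\, f(20 \mid \alpha = 3)}{f(20)} \]
\[ f(20) = P(\alpha = 1)\, f(20 \mid \alpha = 1) + P(\alpha = 2)\, f(20 \mid \alpha = 2) + P(\alpha = 3)\, f(20 \mid \alpha = 3) \]
\[ = \frac{1}{3} \cdot \frac{1}{10} \left( \frac{1}{3} \right)^{2} + \frac{1}{3} \cdot \frac{2}{10} \left( \frac{1}{3} \right)^{3} + \frac{1}{3} \cdot \frac{3}{10} \left( \frac{1}{3} \right)^{4} = 0.3704\% + 0.2469\% + 0.1235\% = 0.7408\% \]
\[ P(\alpha = 1 \mid X_1 = 20) = \frac{0.3704\%}{0.7408\%} = \frac{1}{2} \]
\[ P(\alpha = 2 \mid X_1 = 20) = \frac{0.2469\%}{0.7408\%} = \frac{1}{3} \]
\[ P(\alpha = 3 \mid X_1 = 20) = \frac{0.1235\%}{0.7408\%} = \frac{1}{6} \]
Then
\[ P(X_2 > 30 \mid X_1 = 20) = \frac{1}{4} P(\alpha = 1 \mid X_1 = 20) + \frac{1}{16} P(\alpha = 2 \mid X_1 = 20) + \frac{1}{64} P(\alpha = 3 \mid X_1 = 20) \]
\[ = \frac{1}{4} \cdot \frac{1}{2} + \frac{1}{16} \cdot \frac{1}{3} + \frac{1}{64} \cdot \frac{1}{6} = 0.148 \]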
If you ever try to reproduce my answers, you'll find the calculation outlined above is
an absolute nightmare. In addition, I must acknowledge that I used an Excel spreadsheet
to help me do the above calculations when I was preparing this manual. I must also
acknowledge that there's little chance that I would be able to do the calculation right in the
heat of the exam.
In the exam, I'll never use the above standard approach, which is prone to errors. This is
what I will do in the exam (dramatically reducing the complexity of the calculations).
This is what you should do in the exam:
What you should do in the exam room
Event: X_1 = 20

A            B               C                        D = B × C                      E                F
Group        Before-event    This group's density     After-event size               Scaled-up raw    Conditional
             size            at X_1 = 20              (raw posterior)                posterior        P(X_2 > 30)
α = 1        1/3             (1/30)(1/3)              (1/3)(1/30)(1/3)               3                1/4
α = 2        1/3             (2/30)(1/3)^2            (1/3)(2/30)(1/3)^2             2                (1/4)^2
α = 3        1/3             (3/30)(1/3)^3            (1/3)(3/30)(1/3)^3             1                (1/4)^3

X01 = 1/4 = 0.25, Y01 = 3
X02 = (1/4)^2 = 0.0625, Y02 = 2
X03 = (1/4)^3 = 0.015625, Y03 = 1
You see how nice and easy the shortcut calculation is.
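Both the standard approach and the 3 : 2 : 1 shortcut can be verified with exact fractions. The sketch below is an illustration; the helper names `pdf` and `sf` are hypothetical and implement the two-parameter Pareto density and survival function quoted in the solution.

```python
from fractions import Fraction

theta = 10
alphas = [1, 2, 3]
prior = Fraction(1, 3)          # each alpha equally likely


def pdf(x, a):
    """Two-parameter Pareto density: a * theta^a / (x + theta)^(a + 1)."""
    return Fraction(a * theta**a, (x + theta) ** (a + 1))


def sf(x, a):
    """Two-parameter Pareto survival function: (theta / (x + theta))^a."""
    return Fraction(theta, x + theta) ** a


raw = [prior * pdf(20, a) for a in alphas]       # raw posteriors given X1 = 20
total = sum(raw)
post = [r / total for r in raw]                  # 1/2, 1/3, 1/6
answer = sum(p * sf(30, a) for p, a in zip(post, alphas))

# Shortcut: weights 3, 2, 1 in place of the raw posteriors
shortcut = sum(w * sf(30, a) for w, a in zip([3, 2, 1], alphas)) / 6

print(post, answer, float(answer))
```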
The claim count and claim size distributions for risks of Type B are:
Determine the Bayesian premium for the next year for this same risk.
Solution
Let \( S = \sum_{i=1}^{N} X_i \) represent the total annual loss. The observation is S_1 = 500. We are asked
to find E(S_2 | S_1 = 500). If we ignore the observation S_1 = 500, then the problem becomes
finding E(S_2). Since the risk can be from either Type A or Type B, we'll condition S_2
on risk types.
\[ E(S_2) = E(S_2 \mid A)\, P(A) + E(S_2 \mid B)\, P(B) \]
\[ E(S) = E(N)\, E(X) \]
\[ E(S_2 \mid A) = E(N_2 \mid A)\, E(X \mid A), \quad E(S_2 \mid B) = E(N_2 \mid B)\, E(X \mid B) \]
\[ E(N_2 \mid A) = 0 \left( \frac{4}{9} \right) + 1 \left( \frac{4}{9} \right) + 2 \left( \frac{1}{9} \right) = \frac{6}{9}, \quad E(N_2 \mid B) = 0 \left( \frac{1}{9} \right) + 1 \left( \frac{4}{9} \right) + 2 \left( \frac{4}{9} \right) = \frac{12}{9} \]
\[ P(A \mid S_1 = 500) = \frac{P(A)\, P(S_1 = 500 \mid A)}{P(S_1 = 500)}, \quad P(B \mid S_1 = 500) = \frac{P(B)\, P(S_1 = 500 \mid B)}{P(S_1 = 500)} \]
\[ \frac{P(A \mid S_1 = 500)}{P(B \mid S_1 = 500)} = \frac{P(A)\, P(S_1 = 500 \mid A)}{P(B)\, P(S_1 = 500 \mid B)} \]
The only way for Type A to incur 500 in claims in Year 1 is to have one claim of 500. The
only way for Type B to incur 500 in claims in Year 1 is to have two claims of 250 each.
\[ \text{So } P(S_1 = 500 \mid A) = \frac{4}{9} \cdot \frac{1}{3}, \quad P(S_1 = 500 \mid B) = \frac{4}{9} \left( \frac{2}{3} \right)^2. \]
\[ P(A \mid S_1 = 500) = \frac{3}{7}, \quad P(B \mid S_1 = 500) = \frac{4}{7} \]
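Event: S_1 = 500

A          B               C                      D = B × C                  E = D × 3^2 / [0.5 (4/9)]    F
Group      Before-event    This group's           After-event size           Scaled-up raw                Conditional
           size            probability to         (raw posterior)            posterior                    mean
                           produce the event
Type A     0.5             (4/9)(1/3)             0.5 (4/9)(1/3)             3                            660
Type B     0.5             (4/9)(2/3)^2           0.5 (4/9)(2/3)^2           4                            368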
Solution
Conceptual framework
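\[ E(N_2) = \sum_{i=1}^{3} E(N_2 \mid \text{Class } i)\, P(\text{Class } i) \]
\[ E(N_2 \mid \text{Class 1}) = 0 \left( \frac{1}{3} \right) + 1 \left( \frac{1}{3} \right) + 2 \left( \frac{1}{3} \right) = 1 \]
\[ E(N_2 \mid \text{Class 2}) = 1 \left( \frac{1}{6} \right) + 2 \left( \frac{2}{3} \right) + 3 \left( \frac{1}{6} \right) = 2 \]
\[ E(N_2 \mid \text{Class 3}) = 2 \left( \frac{1}{6} \right) + 3 \left( \frac{2}{3} \right) + 4 \left( \frac{1}{6} \right) = 3 \]
\[ P(\text{Class 1} \mid N_1 = 1) = \frac{P(\text{Class 1})\, P(N_1 = 1 \mid \text{Class 1})}{P(N_1 = 1)} = \frac{\tfrac{3}{6} \cdot \tfrac{1}{3}}{P(N_1 = 1)} \]
\[ P(\text{Class 2} \mid N_1 = 1) = \frac{P(\text{Class 2})\, P(N_1 = 1 \mid \text{Class 2})}{P(N_1 = 1)} = \frac{\tfrac{2}{6} \cdot \tfrac{1}{6}}{P(N_1 = 1)} \]
\[ P(\text{Class 3} \mid N_1 = 1) = \frac{P(\text{Class 3})\, P(N_1 = 1 \mid \text{Class 3})}{P(N_1 = 1)} = \frac{\tfrac{1}{6} \cdot 0}{P(N_1 = 1)} = 0 \]
\[ P(N_1 = 1) = \frac{3}{6} \cdot \frac{1}{3} + \frac{2}{6} \cdot \frac{1}{6} + \frac{1}{6} \cdot 0 = \frac{2}{9} \]
\[ P(\text{Class 2} \mid N_1 = 1) = \frac{P(\text{Class 2})\, P(N_1 = 1 \mid \text{Class 2})}{P(N_1 = 1)} = \frac{\tfrac{2}{6} \cdot \tfrac{1}{6}}{\tfrac{2}{9}} = \frac{1}{4} \]
Event: N_1 = 1

A          B               C                      D = B × C          E                F
Class      Before-event    This class's           After-event size   Scaled-up raw    Conditional
           size            probability to         (raw posterior)    posterior        mean
                           produce the event
1          3/6             1/3                    (3/6)(1/3)         3                1
2          2/6             1/6                    (2/6)(1/6)         1                2
3          1/6             0                      (1/6)(0)           0                3

Because the posterior probability is zero for Class 3 to produce N_1 = 1, we can delete the
last row.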
A test car has either front air bags or side air bags (but not both), each type being
equally likely
The test car will be driven into either a wall or a lake, with each accident type
being equally likely
The manufacturer randomly selects 1, 2, 3, or 4 crash test dummies to put into a
car with front air bags.
The manufacturer randomly selects 2 or 4 crash test dummies to put into a car
with side air bags.
Each crash test dummy in a wall-impact accident suffers damage randomly equal
to either 0.5 or 1, with damage to each dummy being independent of damage to
the others.
Each crash test dummy in a lake-impact accident suffers damage randomly equal
to either 1 or 2, with damage to each dummy being independent of damage to the
others.
One test car is selected at random, and a test dummy accident produces total damage of 1.
Determine the expected value of the total damage for the next accident, given that the
kind of safety device (front or side air bags) and accident type (wall or lake) remain the
same.
Solution
This is one of the most feared exam problems. If you use the framework and shortcut,
however, you should do just fine.
Conceptual framework
Damage \( S = \sum_{i=1}^{N} X_i \), where X is the damage incurred by one test dummy and N is the number
of dummies chosen for the crash testing. The observation is S_1 = 1. We are asked to find
E(S_2 | S_1 = 1).
To simplify the problem, let's first discard the observation. Then the problem becomes
finding E(S_2). The crash testing falls into four types: front air bags with a wall impact (FW),
front air bags with a lake impact (FL), side air bags with a wall impact (SW), and side air
bags with a lake impact (SL).
\[ E(S_2) = E(S_2 \mid FW)\, P(FW) + E(S_2 \mid FL)\, P(FL) + E(S_2 \mid SW)\, P(SW) + E(S_2 \mid SL)\, P(SL) \]
\[ E(N \mid FW) = E(N \mid F) = \frac{1 + 2 + 3 + 4}{4} = 2.5 \]
If the car is tested for a wall collision, then the damage to a tested dummy can be either 0.5
or 1, with each damage equally likely:
\[ E(X \mid FW) = E(X \mid W) = \frac{0.5 + 1}{2} = 0.75 \]
\[ E(S_2 \mid FW) = E(N \mid FW)\, E(X \mid FW) = 2.5\,(0.75) \]
Similarly,
\[ E(S_2 \mid FL) = E(N \mid FL)\, E(X \mid FL) = \frac{1 + 2 + 3 + 4}{4} \cdot \frac{1 + 2}{2} = 2.5\,(1.5) \]
\[ E(S_2 \mid SW) = E(N \mid SW)\, E(X \mid SW) = \frac{2 + 4}{2} \cdot \frac{0.5 + 1}{2} = 3\,(0.75) \]
\[ E(S_2 \mid SL) = E(N \mid SL)\, E(X \mid SL) = \frac{2 + 4}{2} \cdot \frac{1 + 2}{2} = 3\,(1.5) \]
\[ P(FW) = P(FL) = P(SW) = P(SL) = 0.25 \]
\[ P(FW \mid S_1 = 1) = \frac{P(FW)\, P(S_1 = 1 \mid FW)}{P(S_1 = 1)}, \quad P(FL \mid S_1 = 1) = \frac{P(FL)\, P(S_1 = 1 \mid FL)}{P(S_1 = 1)} \]
\[ P(SW \mid S_1 = 1) = \frac{P(SW)\, P(S_1 = 1 \mid SW)}{P(S_1 = 1)}, \quad P(SL \mid S_1 = 1) = \frac{P(SL)\, P(S_1 = 1 \mid SL)}{P(S_1 = 1)} \]
Where
\[ P(S_1 = 1) = P(FW)\, P(S_1 = 1 \mid FW) + P(FL)\, P(S_1 = 1 \mid FL) + P(SW)\, P(S_1 = 1 \mid SW) + P(SL)\, P(S_1 = 1 \mid SL) \]
The key is to calculate P(S_1 = 1 | FW). In a front air bag wall collision test, the number of
dummies can be 1, 2, 3, or 4; the damage per dummy can be 0.5 or 1. So there are only 2
ways for FW to produce S_1 = 1.
Two dummies were chosen, each having 0.5 damage. Probability: 0.25(0.5)(0.5)
One dummy was chosen, having damage of 1. Probability: 0.25(0.5)
Total probability: P(S_1 = 1 | FW) = 0.25(0.5)(0.5) + 0.25(0.5) = 0.1875
We can apply the same logic and find (please verify my calculation):
P(S_1 = 1 | FL) = 0.125, P(S_1 = 1 | SW) = 0.125, P(S_1 = 1 | SL) = 0
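Next,
\[ P(FW \mid S_1 = 1) = \frac{0.25\,(0.1875)}{0.25\,(0.1875 + 0.125 + 0.125)} = \frac{3}{7} \]
\[ P(FL \mid S_1 = 1) = P(SW \mid S_1 = 1) = \frac{0.25\,(0.125)}{0.25\,(0.1875 + 0.125 + 0.125)} = \frac{2}{7} \]
\[ P(SL \mid S_1 = 1) = \frac{0.25\,(0)}{0.25\,(0.1875 + 0.125 + 0.125)} = 0 \]
Finally,
\[ E(S_2 \mid S_1 = 1) = \frac{3}{7}\, 2.5\,(0.75) + \frac{2}{7}\, 2.5\,(1.5) + \frac{2}{7}\, 3\,(0.75) = 2.518 \]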
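Event: S_1 = 1

A        B               C                      D = B × C          E                F
Group    Before-event    This group's           After-event size   Scaled-up raw    Conditional
         size            probability to         (raw posterior)    posterior        mean
                         produce the event
FW       1/4             0.1875                 (1/4)(0.1875)      1875             2.5 (0.75)
FL       1/4             0.125                  (1/4)(0.125)       1250             2.5 (1.5)
SW       1/4             0.125                  (1/4)(0.125)       1250             3 (0.75)
SL       1/4             0                      0                  0                3 (1.5)

Because the posterior probability is zero for SL to produce S_1 = 1, we can delete the
last row.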
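The four likelihoods can be verified by brute-force enumeration of dummy counts and damage amounts. The sketch below is illustrative, with hypothetical names; it assumes, as the solution does, that each dummy count and each damage amount is equally likely.

```python
from fractions import Fraction
from itertools import product

dummies = {"F": [1, 2, 3, 4], "S": [2, 4]}            # air bag type -> possible counts
damage = {"W": [Fraction(1, 2), 1], "L": [1, 2]}      # accident type -> damage per dummy


def likelihood(bag, accident, total):
    """P(total damage = total | air bag type, accident type)."""
    counts, amounts = dummies[bag], damage[accident]
    pr = Fraction(0)
    for n in counts:
        hits = sum(1 for combo in product(amounts, repeat=n) if sum(combo) == total)
        pr += Fraction(1, len(counts)) * Fraction(hits, len(amounts) ** n)
    return pr


def mean_damage(bag, accident):
    """E(S) = E(N) * E(X) for the given test type."""
    e_n = Fraction(sum(dummies[bag]), len(dummies[bag]))
    e_x = Fraction(sum(damage[accident])) / len(damage[accident])
    return e_n * e_x


types = [("F", "W"), ("F", "L"), ("S", "W"), ("S", "L")]
raw = {t: Fraction(1, 4) * likelihood(*t, 1) for t in types}
premium = sum(raw[t] * mean_damage(*t) for t in types) / sum(raw.values())
print({t: likelihood(*t, 1) for t in types}, premium, float(premium))
```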
Enter the following into BA II Plus/Professional 1-V Statistics Worksheet:
The # of claims on a given policy has the geometric distribution with parameter \( \beta \).
One-third of the policies have \( \beta = 2 \); and the remaining two-thirds have \( \beta = 5 \).
Calculate the Bayesian expected # of claims for the selected policy in Year 2.
Solution
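\[ E(N_2) = E(N_2 \mid \beta = 2)\, P(\beta = 2) + E(N_2 \mid \beta = 5)\, P(\beta = 5) \]
\[ E(N_2 \mid N_1 = 2) = 2\, P(\beta = 2 \mid N_1 = 2) + 5\, P(\beta = 5 \mid N_1 = 2) \]

Group    Before-event    This group's probability            After-event size               Scaled-up raw    Conditional
         size            to produce the event                (raw posterior)                posterior        mean
β = 2    1/3             2^2 / (1 + 2)^3 = 4/27              (1/3)(4/27) = 0.04938          4,938            2
β = 5    2/3             5^2 / (1 + 5)^3 = 25/216            (2/3)(25/216) = 0.07716        7,716            5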
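The table can be turned into the Bayesian estimate with a weighted mean. The sketch below is illustrative: `geometric_pmf` is a hypothetical helper using the geometric probability β^n / (1 + β)^(n + 1), whose mean is β, and the observation N_1 = 2 is taken from the equation above.

```python
from fractions import Fraction


def geometric_pmf(n, beta):
    """P(N = n) for the geometric distribution with mean beta."""
    return Fraction(beta**n, (1 + beta) ** (n + 1))


prior = {2: Fraction(1, 3), 5: Fraction(2, 3)}     # beta -> before-event size

raw = {beta: p * geometric_pmf(2, beta) for beta, p in prior.items()}
premium = sum(beta * w for beta, w in raw.items()) / sum(raw.values())

print({b: round(float(w), 5) for b, w in raw.items()}, premium, round(float(premium), 3))
```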
For a particular policy, the conditional probability of the annual number of claims given
\( \Theta = \theta \), and the probability distribution of \( \Theta \) are as follows:

# of claims    0       1       2
Probability    2θ      θ       1 − 3θ

θ              0.10    0.30
Probability    0.80    0.20

Solution
\[ E(X_2 \mid \theta) = 0 \cdot P(X_2 = 0 \mid \theta) + 1 \cdot P(X_2 = 1 \mid \theta) + 2 \cdot P(X_2 = 2 \mid \theta) \]
\[ = 0\,(2\theta) + 1\,(\theta) + 2\,(1 - 3\theta) = 2 - 5\theta \]
Event: X_1 = 1
A B C D = B × C E F
The solution process for continuous-prior problems is similar to the process for
discrete-prior problems. There are two major differences:
We'll use integration for the continuous-prior problems; we'll use summation for
the discrete-prior problems.
You can't use the BA II Plus/Professional 1-V Statistics Worksheet shortcut any
more to solve a continuous-prior premium problem. In contrast, you can use the BA II
Plus/Professional 1-V Statistics Worksheet shortcut to solve a discrete-prior
premium problem.
The # of claims from an employee during the year follows a Poisson distribution with mean \( \frac{100 - p}{100} \), where \( p \) is the salary (in thousands) for the employee.
An employee is selected at random. No claims were observed for this employee during
the year. Determine the posterior probability that the selected employee has a salary
greater than 50.
Solution
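If we ignore the observation, we just need to find \( P(p > 50) \). Since \( p \) is uniform on the interval [0, 100], we have:
\[ P(p > 50) = \int_{50}^{100} f(p)\,dp \]
With the observation \( N = 0 \), we replace the prior density with the posterior density:
\[ P(p > 50 \mid N = 0) = \int_{50}^{100} f(p \mid N = 0)\,dp \]
\[ f(p \mid N = 0) = \frac{f(p) P(N = 0 \mid p)}{P(N = 0)} = \frac{f(p) P(N = 0 \mid p)}{\int_{p=0}^{100} f(p) P(N = 0 \mid p)\,dp} \]
\( N \mid p \) is a Poisson random variable with mean \( \lambda = \frac{100 - p}{100} = 1 - 0.01p \). So
\[ P(N = 0 \mid p) = e^{0.01p - 1}, \qquad f(p) P(N = 0 \mid p) = 0.01 e^{0.01p - 1} \]
\[ \int_{p=0}^{100} 0.01 e^{0.01p - 1}\,dp = e^{-1}(e - 1) = 1 - e^{-1} \]
\[ P(p > 50 \mid N = 0) = \int_{p=50}^{100} f(p \mid N = 0)\,dp = \int_{p=50}^{100} \frac{0.01 e^{0.01p - 1}}{1 - e^{-1}}\,dp = \frac{e - e^{0.5}}{e - 1} = 0.622 \]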
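The posterior probability can be cross-checked numerically. The sketch below is only an illustration under the assumptions stated in the solution (uniform prior on [0, 100], Poisson mean (100 - p)/100); it uses a simple midpoint rule, and the function name is hypothetical.

```python
import math

def posterior_prob_salary_above(cut, lo=0.0, hi=100.0, steps=10_000):
    """P(p > cut | N = 0) for p ~ Uniform(lo, hi), N | p ~ Poisson((100 - p)/100)."""
    def weight(p):
        prior = 1.0 / (hi - lo)
        likelihood = math.exp(-(100.0 - p) / 100.0)   # Poisson P(N = 0 | p)
        return prior * likelihood

    def integrate(a, b):
        # Midpoint rule on [a, b]
        h = (b - a) / steps
        return h * sum(weight(a + (i + 0.5) * h) for i in range(steps))

    return integrate(cut, hi) / integrate(lo, hi)

numeric = posterior_prob_salary_above(50.0)
closed_form = (math.e - math.exp(0.5)) / (math.e - 1.0)
print(round(numeric, 3), round(closed_form, 3))
```

Both numbers should agree with the 0.622 obtained above; the numeric route is useful when the integral has no convenient closed form.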
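Since \( N \mid p \) is a Poisson random variable with mean \( \frac{100 - p}{100} \), we naturally set \( \lambda = \frac{100 - p}{100} \). Since \( p \) is uniform over [0, 100], \( 100 - p \) is also uniform over [0, 100] and \( \lambda = \frac{100 - p}{100} \) is uniform over [0, 1], so \( f(\lambda) = 1 \).
\[ \lambda = \frac{100 - p}{100} = 1 - \frac{p}{100}, \qquad p = 100(1 - \lambda) \]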
# of claims    Probability
0              0.1
1              0.9 - q
2              q
The prior density is \( \pi(q) = \frac{q^2}{0.039} \), \( 0.2 < q < 0.5 \)
A randomly selected policyholder had two claims in Year 1 and two claims in Year 2.
For this insured, determine the Bayesian estimate of the expected number of claims in
Year 3.
Solution
Let's simplify the problem by discarding the observation \( (N_1 = 2, N_2 = 2) \). Then our task is to find the prior mean \( E(N_3) \). This is an Exam P problem.
Given \( q \), \( N_3 \) is distributed as in the table above, so \( E(N_3 \mid q) = 0(0.1) + 1(0.9 - q) + 2(q) = q + 0.9 \).
\[ E(N_3) = E_q\left[ E(N_3 \mid q) \right] = E_q(q + 0.9) = E(q) + 0.9 \]
\[ E(q) = \int_{0.2}^{0.5} q\,\pi(q)\,dq = \int_{0.2}^{0.5} q\,\frac{q^2}{0.039}\,dq = 0.39 \]
So the mean prior to the observation is 1.29. Please note that we don't need to calculate the prior mean. I calculated it just to show you this: if you discard the observation, then the problem becomes an Exam P problem.
Next, let's add in the observation. The observation \( (N_1 = 2, N_2 = 2) \) will change the equation from \( E(N_3) = E(q) + 0.9 \) to
\[ E(N_3 \mid N_1 = 2, N_2 = 2) = E(q \mid N_1 = 2, N_2 = 2) + 0.9 \]
\[ E(q \mid N_1 = 2, N_2 = 2) = \int_{0.2}^{0.5} q\, f(q \mid N_1 = 2, N_2 = 2)\,dq \]
\[ f(q \mid N_1 = 2, N_2 = 2) = \frac{\pi(q) P(N_1 = 2, N_2 = 2 \mid q)}{P(N_1 = 2, N_2 = 2)} = \frac{\pi(q) P(N_1 = 2, N_2 = 2 \mid q)}{\int_{0.2}^{0.5} \pi(q) P(N_1 = 2, N_2 = 2 \mid q)\,dq} \]
\[ P(N_1 = 2, N_2 = 2 \mid q) = q^2, \qquad \pi(q) = \frac{q^2}{0.039} \]
\[ f(q \mid N_1 = 2, N_2 = 2) = \frac{\frac{q^2}{0.039}(q^2)}{\int_{0.2}^{0.5} \frac{q^2}{0.039}(q^2)\,dq} = \frac{q^4}{\int_{0.2}^{0.5} q^4\,dq} \]
\[ E(q \mid N_1 = 2, N_2 = 2) = \frac{\int_{0.2}^{0.5} q^5\,dq}{\int_{0.2}^{0.5} q^4\,dq} = \frac{\left[ \frac{1}{6} q^6 \right]_{0.2}^{0.5}}{\left[ \frac{1}{5} q^5 \right]_{0.2}^{0.5}} = 0.419 \]
So \( E(N_3 \mid N_1 = 2, N_2 = 2) = 0.419 + 0.9 = 1.319 \).
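Because both the prior and the likelihood are powers of q, the integrals reduce to polynomial antiderivatives. The following sketch, with hypothetical helper names, reproduces the prior mean 0.39 and the posterior mean 0.419 using exact fractions.

```python
from fractions import Fraction

def power_integral(k, a, b):
    """Exact integral of q**k over [a, b]."""
    return (b**(k + 1) - a**(k + 1)) / (k + 1)

a, b = Fraction(1, 5), Fraction(1, 2)         # support 0.2 < q < 0.5
prior_power = 2                                # prior density proportional to q^2
likelihood_power = 2                           # P(N1 = 2, N2 = 2 | q) = q^2

post_power = prior_power + likelihood_power    # posterior proportional to q^4
prior_mean_q = power_integral(prior_power + 1, a, b) / power_integral(prior_power, a, b)
posterior_mean_q = power_integral(post_power + 1, a, b) / power_integral(post_power, a, b)

print(round(float(prior_mean_q), 3), round(float(posterior_mean_q), 3))
print(round(float(posterior_mean_q) + 0.9, 3))  # Bayesian estimate of E(N3 | data)
```

The normalizing constant 0.039 never has to be computed because it cancels in each ratio.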
For a single insured, two claims were observed that totaled 50. Determine the expected
value of the next claim from the same insured.
Solution
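We are asked to find \( E(X_3 \mid X_1 + X_2 = 50) \). If we ignore the observation \( X_1 + X_2 = 50 \), then the problem becomes
\[ E(X_3) = \int_0^{+\infty} x f(x)\,dx = \int_0^{+\infty}\!\!\int_0^{+\infty} x f(x \mid \lambda)\, g(\lambda)\,d\lambda\,dx = \int_0^{+\infty}\!\!\int_0^{+\infty} x \left( \lambda^{-1} e^{-x/\lambda} \right) g(\lambda)\,d\lambda\,dx \]
If we consider the observation, we'll need to change the prior density \( g(\lambda) \) to the posterior density \( g(\lambda \mid X_1 + X_2 = 50) \):
\[ E(X_3 \mid X_1 + X_2 = 50) = \int_0^{+\infty}\!\!\int_0^{+\infty} x \left( \lambda^{-1} e^{-x/\lambda} \right) g(\lambda \mid X_1 + X_2 = 50)\,d\lambda\,dx \]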
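Determine the probability that the next claim will exceed 550.
Solution
Given \( \theta \), \( X_3 \) is uniformly distributed over \( [0, \theta] \). So \( P(X_3 > 550 \mid \theta) = 1 - \frac{550}{\theta} \) for \( \theta \ge 550 \). If we ignore the observation,
\[ P(X_3 > 550) = \int_{500}^{\infty} P(X_3 > 550 \mid \theta)\, f(\theta)\,d\theta \]
Since we have the observation \( X_1 = 400, X_2 = 600 \), we will modify the above equation by changing the prior density \( f(\theta) \) to the posterior density \( f(\theta \mid X_1 = 400, X_2 = 600) \):
\[ P(X_3 > 550 \mid X_1 = 400, X_2 = 600) = \int_{600}^{\infty} \left( 1 - \frac{550}{\theta} \right) f(\theta \mid X_1 = 400, X_2 = 600)\,d\theta \]
Please note that we've also changed \( \int_{500}^{\infty} \) to \( \int_{600}^{\infty} \) because we've observed \( X_2 = 600 \).
\[ f(\theta \mid X_1 = 400, X_2 = 600) = \frac{k}{\theta^4}, \quad \text{where } \theta > 600 \]
\[ \int_{600}^{\infty} k\,\theta^{-4}\,d\theta = \frac{k}{3(600^3)} = 1, \qquad k = 3(600^3) \]
\[ f(\theta \mid X_1 = 400, X_2 = 600) = \frac{3(600^3)}{\theta^4} \]
\[ \int_{600}^{\infty} \left( 1 - \frac{550}{\theta} \right) \frac{3(600^3)}{\theta^4}\,d\theta = 3(600^3) \left[ \int_{600}^{\infty} \theta^{-4}\,d\theta - 550 \int_{600}^{\infty} \theta^{-5}\,d\theta \right] \]
\[ = 3(600^3) \left[ \frac{1}{3(600^3)} - \frac{550}{4(600^4)} \right] = 1 - \frac{3}{4}\cdot\frac{550}{600} = 0.3125 \]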
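\[ f(\theta \mid x_1, x_2) = \frac{3(600^3)}{\theta^4}, \quad \theta > 600 \]
Solution
\[ E(X_3 \mid x_1, x_2) = \int_{600}^{\infty} E(X_3 \mid \theta)\, f(\theta \mid x_1, x_2)\,d\theta \]
Given \( \theta \), \( X_3 \) is uniform over \( [0, \theta] \). So \( E(X_3 \mid \theta) = \frac{\theta}{2} \).
\[ E(X_3 \mid x_1, x_2) = \int_{600}^{\infty} \frac{\theta}{2}\cdot\frac{3(600^3)}{\theta^4}\,d\theta = \frac{3}{2}(600^3) \int_{600}^{\infty} \theta^{-3}\,d\theta = \frac{3}{2}(600^3) \left( \frac{1}{2}\,600^{-2} \right) = 450 \]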
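Both results built on the posterior density \( 3(600^3)/\theta^4 \) involve only tail integrals of powers of \( \theta \). A small sketch, with illustrative names that are not from the original text, evaluates them exactly.

```python
from fractions import Fraction

LOWER = 600  # the largest observed claim bounds theta from below

def tail_integral(power):
    """Exact integral of theta**(-power) over (LOWER, infinity), for power > 1."""
    return Fraction(1, (power - 1) * LOWER**(power - 1))

norm = 1 / tail_integral(4)                        # normalizing constant k

# P(X3 > 550 | data): integrate (1 - 550/theta) * k * theta^-4 over theta > 600
prob_exceeds_550 = norm * (tail_integral(4) - 550 * tail_integral(5))

# E(X3 | data): integrate (theta / 2) * k * theta^-4 over theta > 600
expected_next_claim = norm * tail_integral(3) / 2

print(norm == 3 * 600**3, float(prob_exceeds_550), float(expected_next_claim))
```

The first printed value confirms the normalizing constant; the other two should match 0.3125 and 450.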
\[ \pi(\lambda) = (0.5)\,5e^{-5\lambda} + (0.5)\,\frac{1}{5} e^{-\lambda/5} \]
In the first policy year, no claims were observed for the insured.
Solution
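The posterior density is proportional to the prior density times \( P(N_1 = 0 \mid \lambda) = e^{-\lambda} \):
\[ \pi(\lambda \mid N_1 = 0) = k \left[ \frac{5(0.5)}{6} \left( 6e^{-6\lambda} \right) + \frac{0.5}{6} \left( \frac{6}{5} e^{-6\lambda/5} \right) \right] \]
Next, we'll need to find the normalizing constant \( k \). The total probability should be one. We have:
\[ \int_0^{\infty} \pi(\lambda \mid N_1 = 0)\,d\lambda = k \int_0^{\infty} \left[ \frac{5(0.5)}{6} \left( 6e^{-6\lambda} \right) + \frac{0.5}{6} \left( \frac{6}{5} e^{-6\lambda/5} \right) \right] d\lambda = k \left[ \frac{5(0.5)}{6} + \frac{0.5}{6} \right] = \frac{k}{2} = 1, \qquad k = 2 \]
\[ \pi(\lambda \mid N_1 = 0) = 2 \left[ \frac{5(0.5)}{6} \left( 6e^{-6\lambda} \right) + \frac{0.5}{6} \left( \frac{6}{5} e^{-6\lambda/5} \right) \right] = \frac{5}{6} \left( 6e^{-6\lambda} \right) + \frac{1}{6} \left( \frac{6}{5} e^{-6\lambda/5} \right) \]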
The 2nd actuary assumes the same mean for the gamma distribution, but only half
the variance
A total of one claim is observed for the insured over a 3-year period
Both actuaries determine the Bayesian premium for the expected number of
claims in the next year using their model assumptions
Determine the ratio of the Bayesian premium that the 1st actuary calculates to the
Bayesian premium that the 2nd actuary calculates.
Solution
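If \( N \mid \lambda \) is Poisson with mean \( \lambda \), and \( \lambda \) follows a gamma distribution with parameters \( \alpha \) and \( \theta \),
Then the conditional random variable \( \lambda \mid n_1, n_2, ..., n_k \) also follows gamma distribution with parameters
\[ \alpha^* = \alpha + n_1 + n_2 + ... + n_k = \alpha + \text{total number of claims observed} \]
\[ \theta^* = \frac{\theta}{1 + k\theta}, \qquad \frac{1}{\theta^*} = \frac{1}{\theta} + k = \frac{1}{\theta} + \text{number of observation years} \]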
This theorem is tested over and over and you should memorize it. If you want to find the
proof of this theorem, refer to the textbook Loss Models.
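1st actuary: \( \alpha = 1 \), \( \theta = \frac{1}{6} \). The Bayesian premium for the 4th year is
\[ \alpha^* \theta^* = \frac{\alpha + 1}{\frac{1}{\theta} + 3} = \frac{1 + 1}{6 + 3} = \frac{2}{9} \]
2nd actuary: You need to know that a gamma distribution with parameters \( \alpha \) and \( \theta \) has mean \( \alpha\theta \) and variance \( \alpha\theta^2 \). We are told that the two actuaries get the same mean but the 2nd actuary gets half the variance of the 1st one.
\[ \alpha\theta = 1 \cdot \frac{1}{6} = \frac{1}{6}, \qquad \alpha\theta^2 = \frac{1}{2}(1)\left( \frac{1}{6} \right)^2, \qquad \alpha = 2, \quad \theta = \frac{1}{12} \]
The 2nd actuary's Bayesian premium is \( \frac{2 + 1}{12 + 3} = \frac{1}{5} \).
So the ratio is \( \frac{2/9}{1/5} = \frac{10}{9} \).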
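The Poisson-gamma update is mechanical enough to script. This is a minimal sketch of the theorem as stated above (gamma prior with shape \( \alpha \) and scale \( \theta \)); the function names are illustrative only.

```python
from fractions import Fraction

def poisson_gamma_update(alpha, theta, total_claims, years):
    """Posterior gamma parameters (alpha*, theta*) after observing the claims."""
    alpha_star = alpha + total_claims
    theta_star = theta / (1 + years * theta)     # equivalently 1/theta* = 1/theta + years
    return alpha_star, theta_star

def bayesian_premium(alpha, theta, total_claims, years):
    """Posterior mean alpha* x theta*, the expected claim count next year."""
    alpha_star, theta_star = poisson_gamma_update(alpha, theta, total_claims, years)
    return alpha_star * theta_star

# One claim in total over three years, under each actuary's prior.
first = bayesian_premium(Fraction(1), Fraction(1, 6), total_claims=1, years=3)
second = bayesian_premium(Fraction(2), Fraction(1, 12), total_claims=1, years=3)
print(first, second, first / second)   # 2/9 1/5 10/9
```

Halving the prior variance while keeping the mean makes the prior more credible, so the 2nd actuary's premium moves less toward the observed claim frequency of 1/3.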
Nov 2001 #3
You are given:
The # of claims per auto insured follows a Poisson distribution with mean \( \lambda \)
The prior distribution for \( \lambda \) has the following probability density function:
\[ f(\lambda) = \frac{(500\lambda)^{50} e^{-500\lambda}}{\lambda\,\Gamma(50)} \]
Solution
We need to find \( E(N_3 \mid N_1 = 75, N_2 = 210) \), where \( N_3 \) is the # of claims in Year 3 for one auto policy. Then the expected # of auto claims in Year 3 for 1,100 auto policies is simply \( 1,100\, E(N_3 \mid N_1 = 75, N_2 = 210) \).
The prior density is
\[ f(\lambda) = \frac{(500\lambda)^{50} e^{-500\lambda}}{\lambda\,\Gamma(50)} \]
If you look at Table for Exam C, you'll find the gamma pdf is:
\[ f(x) = \frac{(x/\theta)^{\alpha} e^{-x/\theta}}{x\,\Gamma(\alpha)} \]
Comparing the two, you should immediately recognize that the prior is a gamma distribution with parameters \( \alpha = 50 \) and \( \frac{1}{\theta} = 500 \). Then using the gamma distribution formula listed in Table for Exam C, we have
\[ E(N_3) = E(\lambda) = \alpha\theta = \frac{50}{500} = 0.1 \]
If we consider the observation \( N_1 = 75, N_2 = 210 \), then we need to modify the formula \( E(N_3) = E(\lambda) \) to \( E(N_3 \mid N_1 = 75, N_2 = 210) = E(\lambda \mid N_1 = 75, N_2 = 210) \).
Then the expected # of auto claims in Year 3 for 1,100 auto policies is simply
\[ 1,100 \times \frac{335}{2,000} = 184.25 \]
May 2001 #2
\[ f(\lambda) = \frac{1}{3} e^{-\lambda/3}, \quad \lambda > 0 \]
Two claims were observed during the 1st year. Determine the variance of the posterior mean.
Solution
Please note that the exponential distribution is a gamma distribution with parameter \( \alpha = 1 \). So this is the Poisson-gamma model.
The observation is \( N_1 = 2 \). We are asked to find the variance \( Var(\lambda \mid N_1 = 2) \). We are told that \( N \mid \lambda \) is Poisson with mean \( \lambda \), and \( \lambda \) is gamma with \( \alpha = 1 \), \( \theta = 3 \).
Solution
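Please note that a uniform distribution over [0, 1] is a special case of the beta distribution with parameters \( a = b = \theta = 1 \). In addition, the Bernoulli distribution is a special case of the binomial distribution with \( n = 1 \).
If \( X \mid p \) is binomial with parameters \( n \) and \( p \), and the prior distribution of \( p \) is beta with parameters \( a \) and \( b \),
Then the conditional random variable \( p \mid x_1, x_2, ..., x_k \) also has beta distribution with parameters \( a^* = a + x_1 + x_2 + ... + x_k \) and \( b^* = b + kn - (x_1 + x_2 + ... + x_k) \). To see why, start from
\[ f(p \mid x_1, x_2, ..., x_k) = \frac{f(p)\, P(x_1, x_2, ..., x_k \mid p)}{\int_0^1 f(p)\, P(x_1, x_2, ..., x_k \mid p)\,dp} \]
where the denominator \( \int_0^1 f(p)\, P(x_1, x_2, ..., x_k \mid p)\,dp \) is a constant that does not depend on \( p \).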
Next, let's find the beta pdf \( f(p) \). If you look at the Exam C table, you'll see that the beta distribution has the following pdf:
\[ f(x) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, u^a (1 - u)^{b-1}\, \frac{1}{x}, \quad 0 < x < \theta, \quad u = \frac{x}{\theta} \]
This pdf is really annoying. It has variables \( u \) and \( x \). To simplify the pdf, set \( \theta = 1 \). Then \( u = x \) and \( 0 < x < 1 \). The pdf becomes:
\[ f(x) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, x^a (1 - x)^{b-1}\, \frac{1}{x} = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, x^{a-1} (1 - x)^{b-1}, \quad 0 < x < 1 \]
This is the most commonly used beta pdf. This is the one you should use for Exam C.
Back to the problem. Since \( p \) has beta distribution with parameters \( a \) and \( b \), the pdf is
\[ f(p) = \frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\, p^{a-1} (1 - p)^{b-1} \]
which is proportional to \( p^{a-1} (1 - p)^{b-1} \).
\( P(x_1, x_2, ..., x_k \mid p) \) is proportional to
\[ p^{x_1}(1 - p)^{n - x_1}\; p^{x_2}(1 - p)^{n - x_2} \cdots p^{x_k}(1 - p)^{n - x_k} = p^{\sum_{i=1}^k x_i} (1 - p)^{kn - \sum_{i=1}^k x_i} \]
So \( f(p \mid x_1, x_2, ..., x_k) \) is proportional to \( f(p)\, p^{\sum_{i=1}^k x_i} (1 - p)^{kn - \sum_{i=1}^k x_i} \), which is proportional to
\[ p^{a-1}(1 - p)^{b-1}\, p^{\sum_{i=1}^k x_i} (1 - p)^{kn - \sum_{i=1}^k x_i} = p^{a + \sum_{i=1}^k x_i - 1} (1 - p)^{b + kn - \sum_{i=1}^k x_i - 1} \]
This is a beta density with parameters
\[ a^* = a + x_1 + x_2 + ... + x_k, \qquad b^* = b + kn - (x_1 + x_2 + ... + x_k) \]
Next, we'll calculate \( E(X_{k+1} \mid x_1, x_2, ..., x_k) \), the Bayesian estimate for Year \( k + 1 \), using the 5-step framework.
\[ E(X_{k+1}) = E_p\left[ E(X_{k+1} \mid p) \right] = E_p[np] = n E(p) \]
Next, we consider the observation \( x_1, x_2, ..., x_k \). We'll modify the above equation by changing the prior mean \( E(p) \) to the posterior mean \( E(p \mid x_1, x_2, ..., x_k) \). We already know that \( p \mid x_1, x_2, ..., x_k \) has beta distribution with parameters
\[ a^* = a + x_1 + x_2 + ... + x_k, \qquad b^* = b + kn - (x_1 + x_2 + ... + x_k) \]
Looking up the beta expectation formula from the Exam C table, we have:
\[ E(p \mid x_1, x_2, ..., x_k) = \frac{a^*}{a^* + b^*} \]
Finally, we have:
\[ E(X_{k+1} \mid x_1, x_2, ..., x_k) = n E(p \mid x_1, x_2, ..., x_k) = n\,\frac{a^*}{a^* + b^*} \]
Now let's apply the binomial-beta formula to this problem. We are told that the # of claims in a year is a Bernoulli random variable. So the number of trials is \( n = 1 \). In addition, the prior distribution of \( p \) is uniform over [0, 1], which is a beta distribution with parameters \( a = b = 1 \).
\[ E(X_{k+1} \mid x_1, x_2, ..., x_k) = n\,\frac{a + \sum_{i=1}^k x_i}{a + b + kn} = (1)\,\frac{1 + \sum_{i=1}^k x_i}{1 + 1 + k} = \frac{1 + \sum_{i=1}^k x_i}{2 + k} \]
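The binomial-beta premium formula is easy to evaluate for any candidate set of observations. A minimal sketch follows; the function name is hypothetical, and the sample input (three claim-free years under the uniform prior) is chosen only for illustration.

```python
from fractions import Fraction

def beta_binomial_premium(a, b, n, observations):
    """n * a*/(a* + b*) with a* = a + sum(x) and b* = b + k*n - sum(x)."""
    k = len(observations)
    total = sum(observations)
    a_star = a + total
    b_star = b + k * n - total
    return Fraction(n * a_star, a_star + b_star)

# Uniform prior (a = b = 1), Bernoulli claims (n = 1), three claim-free years.
print(beta_binomial_premium(1, 1, 1, [0, 0, 0]))   # 1/5
```

With \( n = 1 \) and \( a = b = 1 \) the result collapses to \( (1 + \sum x_i)/(2 + k) \), so trying answer choices amounts to varying the list of observations.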
We have two unknowns in one equation. We can't solve it. One way to find the right answer is to test each answer. If \( \sum_{i=1}^k x_i = 0 \) and \( k = 3 \), we'll have \( \frac{1 + \sum_{i=1}^k x_i}{2 + k} = \frac{1}{5} \). So zero
For an insurance:
Losses can be 100, 200 or 300 with respective probabilities 0.2, 0.2, and 0.6.
Calculate Var (Y P ) .
(A) 1500 (B) 1875 (C) 2250 (D) 2625 (E) 3000
Core concepts:
Ground up loss
Ordinary deductible
Claim payment
Claim payment per payment
Explanation
Let \( X \) represent the ground up loss amount (the ground up loss amount is the actual loss incurred by the policyholder). Let \( d \), where \( d \ge 0 \), represent the deductible.
\[ (X - d)_+ = \max(X - d, 0) = \begin{cases} 0 & \text{if } X \le d \\ X - d & \text{if } X > d \end{cases} \]
\[ X \wedge d = \min(X, d) = \begin{cases} X & \text{if } X \le d \\ d & \text{if } X > d \end{cases} \]
\[ X = (X - d)_+ + (X \wedge d) \]
ground up loss = amount paid by the insurance company + amount paid by the insured out of his own pocket
However, if the loss is $400, then you pay all the loss and the insurance company pays zero.
400 = 0 + 400
ground up loss = amount paid by the insurance company + amount paid by the insured out of his own pocket
Full solution
Let X represent the ground up loss. Let Y represent the claim payment. The deductible is
d = 150 .
\( Y^P = Y \mid Y > 0 \)
\( n = 8, \quad \bar{X} = 125, \quad \sigma_X = 43.30127019 \)
\( Var = \sigma_X^2 = 1,875 \)
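The calculator result can be reproduced directly from the definition \( Y^P = (X - d) \mid X > d \). Here is a short sketch; the function name is illustrative and not from the original text.

```python
def var_payment_per_payment(losses, probs, deductible):
    """Variance of Y^P = (X - d) given X > d, for a discrete loss X."""
    # Keep only losses above the deductible; these are the ones that produce a payment.
    kept = [(x - deductible, p) for x, p in zip(losses, probs) if x > deductible]
    total = sum(p for _, p in kept)                   # P(X > d)
    mean = sum(y * p for y, p in kept) / total
    second_moment = sum(y * y * p for y, p in kept) / total
    return second_moment - mean**2

print(round(var_payment_per_payment([100, 200, 300], [0.2, 0.2, 0.6], 150), 2))
```

Dividing by P(X > d) plays the same role as scaling the probabilities up to whole-number frequencies on the BA II Plus: only the relative weights of the surviving losses matter.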
Losses can be 100, 200, 300, and 400 with respective probabilities 0.1, 0.2, 0.3, and 0.4.
Calculate Var (Y P ) .
Solution
Fast solution
10 P(X) -- scaled up probability: 3, 4
\( Var = \sigma^2 = 2,448.98 \)
Standard solution
\[ E\left[ (X - 250)_+^2 \mid X > 250 \right] = \frac{3}{7}\,50^2 + \frac{4}{7}\,150^2 = 13,928.57143 \]
Losses can be 1,000, 4,000, 5,000, 9,000, and 12,000 with respective probabilities 0.11,
0.17, 0.24, 0.36, and 0.12.
Calculate Var (Y P ) .
Solution
Ground up loss X                     1       4       5       9       12
Is X > 0.9?                          Yes.    Yes.    Yes.    Yes.    Yes.
                                     Keep.   Keep.   Keep.   Keep.   Keep.
(X - 0.9)+                           0.1     3.1     4.1     8.1     11.1
P(X)                                 0.11    0.17    0.24    0.36    0.12
100 P(X) -- scaled up probability    11      17      24      36      12
Losses follow an exponential distribution with the same mean in all years.
The loss elimination ratio this year is 70%.
The ordinary deductible for the coming year is 4/3 of the current deductible.
Core concept:
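LER answers the question, "What % of the expected loss amount is absorbed by the policyholder due to the deductible?"
\[ E(X) = \int_0^{+\infty} x f(x)\,dx = \int_0^{+\infty} s(x)\,dx \]
\[ X \wedge d = \min(X, d) = \begin{cases} X & \text{if } X \le d \\ d & \text{if } X > d \end{cases} \]
\[ E(X \wedge d) = \int_0^d x f(x)\,dx + d \int_d^{+\infty} f(x)\,dx \quad \text{(Intuitive formula)} \]
Alternatively,
\[ E(X \wedge d) = \int_0^d s(x)\,dx = \int_0^d \left[ 1 - F_X(x) \right] dx \]
You can find the proof of the 2nd formula from Loss Models.
\[ E(X) = E(X \wedge \infty) = \int_0^{+\infty} s(x)\,dx \]
For an exponential distribution with mean \( \theta \):
\[ f(x) = \frac{1}{\theta} e^{-x/\theta}, \qquad s(x) = 1 - F(x) = 1 - \left( 1 - e^{-x/\theta} \right) = e^{-x/\theta}, \qquad E(X) = \theta \]
\[ E(X \wedge d) = \int_0^d s(x)\,dx = \int_0^d e^{-x/\theta}\,dx = \theta \left( 1 - e^{-d/\theta} \right) \]
\[ LER = \frac{E(X \wedge d)}{E(X)} = 1 - e^{-d/\theta} \quad \text{(you might want to memorize this result)} \]
\[ 1 - e^{-d/\theta} = 0.7, \qquad e^{-d/\theta} = 0.3 \]
Under the new deductible (which is \( \frac{4}{3} \) of the original deductible),
\[ LER' = 1 - e^{-\frac{4}{3} d/\theta} = 1 - \left( e^{-d/\theta} \right)^{4/3} = 1 - 0.3^{4/3} = 0.799 \]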
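The exponential LER result can be confirmed in a few lines. The sketch below assumes an arbitrary mean of 1 (the answer depends only on the ratio d/θ); the function name is illustrative.

```python
import math

def ler_exponential(d, theta):
    """Loss elimination ratio E(X ^ d) / E(X) for an exponential loss with mean theta."""
    limited_mean = theta * (1.0 - math.exp(-d / theta))   # E(X ^ d)
    return limited_mean / theta

theta = 1.0                                # any mean works; LER depends on d / theta only
d = -theta * math.log(0.3)                 # solves 1 - exp(-d/theta) = 0.7
current = ler_exponential(d, theta)
coming_year = ler_exponential(4 * d / 3, theta)
print(round(current, 3), round(coming_year, 3))
```

Note that raising the deductible by one third raises the LER by far less than one third, because the exponential tail thins out as the deductible grows.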
The above formula works whether \( Y \) is a simple random variable or a compound random variable \( Y = \sum_{i=1}^n X_i \). If \( Y = \sum_{i=1}^n X_i \), make sure you write
\[ E(Y - m)_+ = E(Y) - m + m f_Y(0) + (m-1) f_Y(1) + (m-2) f_Y(2) + ... + 1 \cdot f_Y(m-1) \]
Don't write
\[ E(Y - m)_+ = E(Y) - m + m f_X(0) + (m-1) f_X(1) + (m-2) f_X(2) + ... + 1 \cdot f_X(m-1) \]
In other words, the pdf in the right hand side must match up with the random variable in the left hand side. If the random variable in the left hand side is \( Y = \sum_{i=1}^n X_i \), you need to use \( f_Y \). If your random variable in the left hand side is \( X \), then you need to write
\[ E(X - m)_+ = E(X) - m + m f_X(0) + (m-1) f_X(1) + (m-2) f_X(2) + ... + 1 \cdot f_X(m-1) \]
To use the above formula in the heat of the exam, we rewrite the above formula into:
\[ E(Y - m)_+ = E(Y) - m + \begin{pmatrix} f_Y(0) \\ f_Y(1) \\ f_Y(2) \\ \vdots \\ f_Y(m-1) \end{pmatrix} \begin{pmatrix} m \\ m-1 \\ m-2 \\ \vdots \\ 1 \end{pmatrix} \]
In the above formula,
\[ \begin{pmatrix} f_Y(0) \\ f_Y(1) \\ f_Y(2) \\ \vdots \\ f_Y(m-1) \end{pmatrix} \begin{pmatrix} m \\ m-1 \\ m-2 \\ \vdots \\ 1 \end{pmatrix} = m f_Y(0) + (m-1) f_Y(1) + (m-2) f_Y(2) + ... + 1 \cdot f_Y(m-1) \]
This is not a standard notation. However, we use it anyway to help us memorize the formula. In the exam, you just write these 2 matrices. Then you simply take out each element in the 1st matrix and multiply it with the corresponding element in the 2nd matrix. Next, sum everything up.
Please note that if you take out an element \( f_Y(k) \) (where \( 0 \le k \le m - 1 \)) from the 1st matrix, then you need to multiply it with \( m - k \) from the 2nd matrix so that \( (m - k) + k = m \) holds.
\[ E(S - d)_+ = E(S) - \sum_{s=0}^{d-1} \left[ 1 - F_S(s) \right] \]
\[ E(S - d)_+ = E(S) - \sum_{s=0}^{d-1} \left[ 1 - F_S(x) \right] \]
The above formula is confusing. \( F_S(x) \) is not a good notation because \( S \) and \( x \) don't match. The right notation should be \( F_S(s) \).
Let's move on from the formula \( E(S - d)_+ = E(S) - \sum_{s=0}^{d-1} \left[ 1 - F_S(s) \right] \). To make our proof simple, let's set \( d = 3 \). The proof is the same if \( d \) is bigger.
\[ E(S - 3)_+ = E(S) - \sum_{s=0}^{2} \left[ 1 - F_S(s) \right] \]
\[ \sum_{s=0}^{2} \left[ 1 - F_S(s) \right] = \left[ 1 - F_S(0) \right] + \left[ 1 - F_S(1) \right] + \left[ 1 - F_S(2) \right] = 3 - \left[ F_S(0) + F_S(1) + F_S(2) \right] \]
\[ F_S(0) = P(S \le 0) = P(S = 0) = f_S(0) \]
\[ F_S(1) = P(S \le 1) = P(S = 0) + P(S = 1) = f_S(0) + f_S(1) \]
\[ F_S(2) = P(S \le 2) = P(S = 0) + P(S = 1) + P(S = 2) = f_S(0) + f_S(1) + f_S(2) \]
\[ F_S(0) + F_S(1) + F_S(2) = 3 f_S(0) + 2 f_S(1) + f_S(2) \]
\[ E(S - 3)_+ = E(S) - 3 + 3 f_S(0) + 2 f_S(1) + f_S(2) \]
\[ E(Y - m)_+ = E(Y) - m + \begin{pmatrix} f_Y(0) \\ f_Y(1) \\ f_Y(2) \\ \vdots \\ f_Y(m-1) \end{pmatrix} \begin{pmatrix} m \\ m-1 \\ m-2 \\ \vdots \\ 1 \end{pmatrix} \]
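To see that the shortcut agrees with the definition of \( E(Y - m)_+ \), the two can be compared on any small distribution. The pmf below is hypothetical and serves only as a test case; the function names are illustrative.

```python
def stop_loss_via_formula(pmf, m):
    """E(Y - m)+ = E(Y) - m + sum_{k<m} (m - k) f_Y(k) for integer-valued Y >= 0."""
    mean = sum(y * p for y, p in pmf.items())
    correction = sum((m - k) * pmf.get(k, 0.0) for k in range(m))
    return mean - m + correction

def stop_loss_direct(pmf, m):
    """Same quantity computed from the definition, sum of max(y - m, 0) f_Y(y)."""
    return sum(max(y - m, 0) * p for y, p in pmf.items())

# Hypothetical pmf, used only to compare the two computations.
pmf = {0: 0.2, 1: 0.3, 2: 0.1, 4: 0.25, 7: 0.15}
print(stop_loss_via_formula(pmf, 3), stop_loss_direct(pmf, 3))
```

The shortcut only needs the probabilities below the deductible, which is why it saves work when the upper tail of the distribution is long.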
A company provides insurance to a concert hall for losses due to power failure. You are
given:
The number of power failures in a year has a Poisson distribution with mean 1.
x Probability of x
10 0.3
20 0.3
50 0.4
The number of power failures and the amounts of losses are independent.
Calculate the expected amount of claims paid by the insurer in one year.
Solution
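Then the aggregate loss is \( S = \sum_{i=1}^N X_i \), where \( N \) is the number of power failures in a year and \( X_i \) is the loss from the i-th power failure.
The total claim dollar amount after the deductible of $30 is:
\[ (S - 30)_+ = \left( \sum_{i=1}^N X_i - 30 \right)_+ \]
\[ E(S - 30)_+ = E(S) - 30 + \begin{pmatrix} f_S(0) \\ f_S(1) \\ f_S(2) \\ \vdots \\ f_S(29) \end{pmatrix} \begin{pmatrix} 30 \\ 29 \\ 28 \\ \vdots \\ 1 \end{pmatrix} \]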
It seems like we have an awful lot of work to do on the two matrices. Before you start to panic, please note that many of the values \( f_S(0), f_S(1), ..., f_S(29) \) will be zero. This is because \( X \) has only 3 distinct values: 10, 20, and 50 with probabilities of 0.3, 0.3, and 0.4 respectively. Evidently, we can throw away \( X = 50 \). If \( X = 50 \), then \( S \) is at least 50 and is out of the range \( S \le 29 \).
Please also note that \( S = \sum_{i=1}^N X_i \) where \( N \) is a Poisson random variable with mean \( \lambda = 1 \).
\[ P(N = n) = e^{-1}\,\frac{1}{n!} \]
N    P(N)            X                        P(X1, ..., XN)    S     P(S)
0    e^-1                                                       0     e^-1
1    e^-1            X = 10                   0.3               10    0.3 e^-1
                     X = 20                   0.3               20    0.3 e^-1
2    (1/2) e^-1      (X1, X2) = (10, 10)      0.3^2             20    (1/2) e^-1 (0.3^2)

S     P(S)
0     e^-1
10    0.3 e^-1
20    0.3 e^-1
20    (1/2) e^-1 (0.3^2)
After consolidation:
S     P(S)
0     e^-1
10    0.3 e^-1
20    0.3 e^-1 + (1/2) e^-1 (0.3^2) = 0.345 e^-1
\[ E(S - 30)_+ = E(S) - 30 + \begin{pmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{pmatrix} \begin{pmatrix} 30 \\ 20 \\ 10 \end{pmatrix} \]
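In the actual exam, to help remember the two matrices, you can write only the 1st matrix:
\[ \begin{pmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} \]
As said earlier, the sum of the two elements in each row needs to be \( m \) (or 30 in this problem). As a result,
0 + a = 30, so a = 30
10 + b = 30, so b = 20
20 + c = 30, so c = 10
\[ \begin{pmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{pmatrix} \begin{pmatrix} 30 \\ 20 \\ 10 \end{pmatrix} \]
\[ \begin{pmatrix} f_S(0) \\ f_S(10) \\ f_S(20) \end{pmatrix} \begin{pmatrix} 30 \\ 20 \\ 10 \end{pmatrix} = \begin{pmatrix} e^{-1} \\ 0.3e^{-1} \\ 0.345e^{-1} \end{pmatrix} \begin{pmatrix} 30 \\ 20 \\ 10 \end{pmatrix} = e^{-1} \begin{pmatrix} 1 \\ 0.3 \\ 0.345 \end{pmatrix} \begin{pmatrix} 30 \\ 20 \\ 10 \end{pmatrix} = 39.45 e^{-1} \]
\[ S = \sum_{i=1}^N X_i, \qquad E(S) = E(N) E(X) \]
\[ E(S) = E(N) E(X) = 1 \times \left[ 10(0.3) + 20(0.3) + 50(0.4) \right] = 29 \]
\[ E(S - 30)_+ = E(S) - 30 + 39.45 e^{-1} = 13.5128 \]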
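The enumeration table above can also be generated mechanically. The sketch below swaps in the Panjer recursion for a compound Poisson sum, which is a different route from the manual enumeration used in the solution; it works in units of 10, and the function names are illustrative.

```python
import math

def compound_poisson_pmf(lam, severity, max_s):
    """Panjer recursion: pmf of S = X_1 + ... + X_N for N ~ Poisson(lam).

    severity maps positive integer amounts to probabilities.
    """
    f = [math.exp(-lam)]                       # f_S(0) = P(N = 0)
    for s in range(1, max_s + 1):
        acc = sum(x * px * f[s - x] for x, px in severity.items() if x <= s)
        f.append(lam / s * acc)
    return f

def stop_loss(mean, f, m):
    """E(S - m)+ = E(S) - m + sum_{k<m} (m - k) f_S(k)."""
    return mean - m + sum((m - k) * f[k] for k in range(m))

# Units of 10: losses 10, 20, 50 become 1, 2, 5 and the deductible 30 becomes 3.
severity = {1: 0.3, 2: 0.3, 5: 0.4}
lam = 1.0
f = compound_poisson_pmf(lam, severity, max_s=2)
mean = lam * sum(x * p for x, p in severity.items())
print(round(10 * stop_loss(mean, f, 3), 4))
```

Rescaling to units of 10 keeps the recursion short: only f_S(0), f_S(1), and f_S(2) in the rescaled units are needed, matching the three rows of the consolidated table.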
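x    f_X(x)
1    0.6
2    0.4
Solution
\( S = \sum_{i=1}^N X_i \) where \( S \) is the aggregate loss and \( X \) is the individual loss dollar amount.
\[ E(S - 3)_+ = E(S) - 3 + \begin{pmatrix} f_S(0) \\ f_S(1) \\ f_S(2) \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} \]
Next, we need to find \( f_S(0) \), \( f_S(1) \), and \( f_S(2) \).
N    P(N)                      X                      P(X1, ..., XN)    S    P(S)
0    e^-2                                                               0    e^-2
1    2 e^-2                    X = 1                  0.6               1    (0.6) 2 e^-2
                               X = 2                  0.4               2    (0.4) 2 e^-2
2    (2^2 / 2!) e^-2 = 2 e^-2  (X1, X2) = (1, 1)      0.6^2             2    (0.6^2) 2 e^-2
After consolidation:
S    P(S)
0    e^-2
1    (0.6) 2 e^-2 = 1.2 e^-2
2    (0.4) 2 e^-2 + (0.6^2) 2 e^-2 = 1.52 e^-2
\[ E(S - 3)_+ = E(S) - 3 + \begin{pmatrix} f_S(0) \\ f_S(1) \\ f_S(2) \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = 2.8 - 3 + \begin{pmatrix} e^{-2} \\ 1.2e^{-2} \\ 1.52e^{-2} \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} \]
\[ = 2.8 - 3 + e^{-2} \begin{pmatrix} 1 \\ 1.2 \\ 1.52 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = 2.8 - 3 + 6.92 e^{-2} = 0.73652 \]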
Prescription drug losses, S, are modeled assuming the number of claims has a geometric
distribution with mean 4, and the amount of each prescription is 40.
Calculate \( E(S - 100)_+ \)
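The same stop-loss shortcut applies here, since \( S = 40N \) can only take the values 0, 40, and 80 below the deductible. The sketch below assumes the geometric pmf \( P(N = n) = \beta^n / (1 + \beta)^{n+1} \) with mean \( \beta = 4 \); the function names are illustrative and this is not presented as the author's own solution.

```python
from fractions import Fraction

def geometric_pmf(n, mean):
    """P(N = n) = mean^n / (1 + mean)^(n + 1), the geometric pmf with the given mean."""
    mean = Fraction(mean)
    return mean**n / (1 + mean)**(n + 1)

def stop_loss_fixed_severity(mean_count, amount, deductible):
    """E(S - d)+ for S = amount * N, via E(S) - d + sum over s < d of (d - s) f_S(s)."""
    expected_s = mean_count * amount
    correction = Fraction(0)
    n = 0
    while n * amount < deductible:
        # S = n * amount occurs exactly when N = n, because every claim is the same size.
        correction += (deductible - n * amount) * geometric_pmf(n, mean_count)
        n += 1
    return expected_s - deductible + correction

print(float(stop_loss_fixed_severity(4, 40, 100)))   # 92.16
```

Only three terms enter the correction (S = 0, 40, 80), so the calculation is also quick by hand.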
Mr. Guo currently teaches an online prep course for Exam P, FM, MFE, and MLC. For
more information, visit http://actuary88.com/.
If you have any comments or suggestions, you can contact Mr. Guo at
yufeng_guo@msn.com.