ACTL2131
Probability and Mathematical Statistics
Exercises
S1 2017
January 5, 2017
Contents
1 Probability Theory
Preliminaries
Exercise 1.1 [wk01Q1]
Exercise 1.2 [wk01Q2]
Exercise 1.3 [wk01Q3]
Exercise 1.4 [wk01Q4]
Exercise 1.5 [wk01Q5]
Exercise 1.6 [wk01Q6]
Exercise 1.7 [wk01Q7]
Exercise 1.8 [wk01Q8]
1.1 Mathematical Methods
Exercise 1.9 [wk01Q9]
Exercise 1.10 [wk01Q10]
Exercise 1.11 [wk01Q11]
Exercise 1.12 [wk01Q12]
Exercise 1.13 [wk01Q13]
Exercise 1.14 [wk01Q14]
Exercise 1.15 [wk01Q15]
Exercise 1.16 [wk01Q16]
Exercise 1.17 [wk01Q17]
1.2 Univariate Distributions
Exercise 1.18 [wk02Q1]
Exercise 1.19 [wk02Q2]
Exercise 1.20 [wk02Q3]
Exercise 1.21 [wk02Q4]
Exercise 1.22 [wk02Q5]
Exercise 1.23 [wk02Q6]
ACTL2131 Probability and Mathematical Statistics, S1 2017 Exercises
2 Parameter Estimation
2.1 Estimation Techniques
Exercise 2.1 [wk05Q2]
Exercise 2.2 [wk05Q5]
Exercise 2.3 [wk05Q6]
Exercise 2.4 [wk05Q9]
Exercise 2.5 [wk05Q12]
Exercise 2.6 [wk05Q13]
2.2 Limit Theorems
Exercise 2.7 [wk05Q1]
Exercise 2.8 [wk05Q3]
Exercise 2.9 [wk05Q4]
Exercise 2.10 [wk05Q7]
Exercise 2.11 [wk05Q8]
Exercise 2.12 [wk05Q10]
Exercise 2.13 [wk05Q11]
2.3 Evaluating Estimators
Exercise 2.14 [wk06Q1]
Exercise 2.15 [wk06Q2]
Exercise 2.16 [wk06Q3]
Exercise 2.17 [wk06Q4]
Exercise 2.18 [wk06Q5]
Exercise 2.19 [wk06Q6]
Exercise 2.20 [wk06Q7]
Exercise 2.21 [wk06Q8]
Exercise 2.22 [wk06Q9]
Exercise 2.23 [wk06Q10]
Exercise 2.24 [wk06Q11]
Exercise 2.25 [wk06Q12]
Exercise 2.26 [wk06Q13]
Exercise 2.27 [wk06Q14]
Solutions
Exercise 2.1 [wk05Q2]
Exercise 2.2 [wk05Q5]
Exercise 2.3 [wk05Q6]
Exercise 2.4 [wk05Q9]
Exercise 2.5 [wk05Q12]
3 Hypothesis Test
3.1 Statistical test procedure
Exercise 3.1 [wk07Q1]
Exercise 3.2 [wk07Q2]
Exercise 3.3 [wk07Q3]
Exercise 3.4 [wk07Q4]
Exercise 3.5 [wk07Q5]
Exercise 3.6 [wk10Q10]
3.2 Properties of the hypothesis testing
Exercise 3.7 [wk08Q1]
Exercise 3.8 [wk08Q2]
Exercise 3.9 [wk08Q3]
Exercise 3.10 [wk08Q4]
Exercise 3.11 [wk08Q5]
Schedule of Tutorial Exercises
Module 1
Probability Theory
Preliminaries
Exercise 1.1: [wk01Q1, Solution, Schedule] An urn contains one black ball and one gold ball while
a second urn contains one white and one gold ball. One ball is selected at random from each urn.
3. Describe the event that both balls will be of the same colour. What is the probability of this
event?
Exercise 1.2: [wk01Q2, Solution, Schedule] A box contains 100 Christmas balls: 49 are red, 34 are
gold, and 17 are silver. Three balls are to be drawn without replacement. Determine the probability
that:
2. the balls are drawn in the order: red, gold, and silver;
3. the third ball is silver, given that the first 2 are red and gold (not necessarily in that order); and
4. the first 2 are red, given that the third ball is silver.
Exercise 1.3: [wk01Q3, Solution, Schedule] Let A and B be two independent events. Prove that the
following pairs are also independent:
1. $A$ and $B^C$
2. $A^C$ and $B$
3. $A^C$ and $B^C$
Exercise 1.4: [wk01Q4, Solution, Schedule] A pair of events A and B cannot be simultaneously
mutually exclusive and independent. Assume that their probabilities are strictly positive, i.e., Pr (A) >
0 and Pr (B) > 0. Prove the following:
Exercise 1.5: [wk01Q5, Solution, Schedule] This exercise shows that independence does not imply
pairwise independence. Consider a random experiment which consists of tossing two dice. Define
the following events:
E1 = doubles appear
Exercise 1.6: [wk01Q6, Solution, Schedule] In an undergraduate statistics class, three students A, B,
and C submitted exactly (word-for-word) the same solution to a homework problem. It is the policy
of the lecturer to give zero marks for those who copy homework problems. Believing that there must
be one of the three who actually did the work, the lecturer will pardon one of the three and chooses at
random the student to pardon.
However, the lecturer will only inform the students at the end of the semester who among them has
been pardoned.
The next day, A tries to get the lecturer to tell him who had been pardoned. The lecturer refuses. A
then asks which of B or C will not be pardoned. The lecturer thinks for a while, then tells A that B is
not to be pardoned.
Lecturer's reasoning: Each student has a 1/3 chance of being pardoned. Clearly, either B or C
must not be pardoned, so I have given A no information about whether A will be pardoned.
A's reasoning: Given that B will not be pardoned, then either A or C will be pardoned. My
chance of being pardoned has risen to 1/2.
1. Evaluate the lecturer's reasoning, i.e., explain whether his reasoning is justified.
Exercise 1.7: [wk01Q7, Solution, Schedule] Two airlines serving some of the same cities in Australia
have merged. Management has decided to eliminate some of the repetitious daily flights. On the
Perth-Sydney route, one airline originally had five daily flights (each at a different time) and the other
had six daily flights (each at a different time). Determine the number of ways:
2. the first airline can eliminate two of its scheduled five flights.
3. the second airline can eliminate two of its scheduled six flights.
Exercise 1.8: [wk01Q8, Solution, Schedule] Three boxes are numbered 1, 2 and 3. For k = 1, 2 and
3, box k contains $k$ blue marbles and $5 - k$ red marbles. In a two-step experiment, a box is selected and
2 marbles are drawn from it without replacement. If the probability of selecting box k is proportional
to k, what is the probability that the two marbles drawn have different colors?
Pr(0) = Pr(1)
Pr(k + 1) = Pr(k)/k for k = 1, 2, 3, . . ..
Find Pr(0).
Exercise 1.10: [wk01Q10, Solution, Schedule] Consider X, a continuous random variable with den-
sity function
Find
Exercise 1.11: [wk01Q11, Solution, Schedule] The distribution function for a discrete random vari-
able X is given by:
$$F_X(x) = \begin{cases} 0, & \text{if } x < -1;\\ 1/3, & \text{if } -1 \le x < 2/3;\\ 1, & \text{if } x \ge 2/3. \end{cases}$$
Exercise 1.12: [wk01Q12, Solution, Schedule] Let X be a random variable with density:
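$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right], \quad \text{for } -\infty < x < \infty.$$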
Here, X is called a normally distributed random variable.
3. Use the above result to prove that, with the normal density, we have
Exercise 1.13: [wk01Q13, Solution, Schedule] Let X be a random variable with parameters $\alpha$, $\beta$, $\gamma$,
and $\delta \in \mathbb{R}$, and have the following moment generating function: $M_X(t) = 1 + \alpha t + \beta t^2 + \gamma t^3 + \delta t^4$.
1. How many distribution functions correspond to this m.g.f. for given values of the parameters?
5. Let X represent the claim sizes, i.e., a higher value is bad for the insurer. Insurers A and $B_i$
ask for a quote for reinsuring a tail risk (for example: the reinsurer makes a payment to the insurer
if the loss is larger than $1 million). Based on the mean, variance, skewness and kurtosis, which
of the two would receive a higher quote for reinsuring the risk, and why, if:
Exercise 1.14: [wk01Q14, Solution, Schedule] The probability density function for a continuous
random variable X is given by:
$$f_X(x) = \begin{cases} 2/x^3, & \text{for } x \ge 1;\\ 0, & \text{otherwise.} \end{cases}$$
Exercise 1.15: [wk01Q15, Solution, Schedule] Let X be a random variable with probability density
function:
$$f_X(x) = \begin{cases} \frac{1}{2} e^{-x}, & \text{if } x \ge 0;\\ \frac{1}{2} e^{x}, & \text{if } x < 0. \end{cases}$$
3. Find the moment generating function and the probability generating function of X.
Exercise 1.16: [wk01Q16, Solution, Schedule] Actuaries often model the age-at-death as a non-
negative random variable X and define the force of mortality as follows:
$$\mu(x) = \lim_{h \to 0} \frac{F_X(x+h) - F_X(x)}{h\,(1 - F_X(x))},$$
where $F_X(\cdot)$ denotes the cdf of X.
Exercise 1.17: [wk01Q17, Solution, Schedule] A random variable X has a probability density func-
tion of the form:
$$f_X(x) = ax\left(1 - bx^2\right), \quad \text{for } 0 \le x \le 1, \text{ and zero otherwise,}$$
1. This year, there are 100 students enrolled in an introductory actuarial studies course. For the
mid-session test for this course, the papers are marked by a team of tutors; however, a sample of
these papers is examined by the course professor for marking consistency. Experience suggests
that 1% of all papers will be improperly marked. The professor selects 10 papers at random
from the 100 papers and examines them for marking inconsistencies. X is the number of papers
in the sample that are improperly marked.
2. A standard drug has been known to be effective in 90% of the cases in which it is used. To
re-evaluate the effectiveness of this same drug, a clinical trial will be performed for which 20 people have
volunteered. X is the number of cases where the drug has been found effective.
3. An immunologist is studying blood disorders exhibited by people with rare blood types. It
is estimated that 10% of the population has the type of blood being investigated. Volunteers
whose blood type is unknown are tested until 100 people with the desired blood type are found.
X is the number of people tested who do not have the desired rare blood type.
4. Customers arrive at a fastfood restaurant independently and at random. During lunch hour,
where more customers are often expected to arrive, customers arrive at the fastfood restaurant
at the rate of two per minute on the average. X is the number of people who arrive between 12:15
p.m. and 12:30 p.m.
Exercise 1.19: [wk02Q2, Solution, Schedule] For each of the following moment generating functions
of discrete random variables X, identify the distribution and specify the associated parameters.
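1. $M_X(t) = \dfrac{e^t}{2 - e^t}$
2. $M_X(t) = \left(\dfrac{e^t + 1}{2}\right)^3$
3. $M_X(t) = \exp\left(\dfrac{1}{2} e^t - \dfrac{1}{2}\right)$
4. $M_X(t) = \left(\dfrac{e^t}{2 - e^t}\right)^4$
5. $M_X(t) = \left(\dfrac{3e^t + 1}{4}\right)^5$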
Exercise 1.20: [wk02Q3, Solution, Schedule] Poisson approximation to the binomial. This exercise
is to show that binomial probabilities can be approximated using the Poisson probabilities, which
are generally easier to calculate. Let $X \sim \text{Binomial}(n, p)$ and $Y \sim \text{Poisson}(\lambda)$ where $\lambda = np$. The
approximation states that
$$\Pr(X = x) \approx \Pr(Y = x),$$
for large n and small np. This can be proven using convergence of mgfs. Denote the respective mgfs
by MX (t) and MY (t).
2. Another method to prove this approximation is as follows: First, establish that the Poisson
distribution satisfies the relation
$$\frac{\Pr(Y = x)}{\Pr(Y = x - 1)} = \frac{\lambda}{x} \quad \text{for } x = 1, 2, \ldots.$$
$$\Pr(Y = 0) \approx \Pr(X = 0),$$
for large n.
3. A typesetter, on the average makes one error in every 400 words typeset. A typical page contains
300 words. Use the Poisson approximation to the binomial to compute the probability that there
will be more than 3 errors in 10 pages.
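For part 3, the comparison can be sketched numerically. This is a minimal sketch, assuming each of the 10 × 300 = 3000 words is independently in error with probability 1/400, so that the Poisson mean is np = 7.5; the function names are illustrative and not part of the exercise.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    # Exact Binomial(n, p) probability of x errors
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    # Poisson(lam) probability of x errors
    return exp(-lam) * lam**x / factorial(x)

n, p = 10 * 300, 1 / 400   # words in 10 pages, error rate per word (assumed independent)
lam = n * p                # Poisson mean, np = 7.5

# Pr(more than 3 errors) = 1 - Pr(0, 1, 2 or 3 errors)
tail_binomial = 1 - sum(binomial_pmf(x, n, p) for x in range(4))
tail_poisson = 1 - sum(poisson_pmf(x, lam) for x in range(4))
print(tail_binomial, tail_poisson)
```

The two tail probabilities should agree closely, which is the point of the approximation when n is large and p is small.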
Exercise 1.21: [wk02Q4, Solution, Schedule] An insurance company receives 200 claims per day on
the average. Claims arrive independently and at random at the company office. Of the claims, 95%
are for amounts less than $100 and are processed immediately; the remaining 5% are examined more
closely to verify their accuracy and eligibility.
2. Determine the probability of getting at most two claims over $100 in a given day.
3. How many claims for amounts less than $100 should this company expect to receive in a week
(5 business days)?
Exercise 1.23: [wk02Q6, Solution, Schedule] Suppose that you have $1 000 to invest for a year. You
are currently evaluating two investments: Investment A and Investment B, with annual rates of return,
respectively denoted by RA and RB . Assume:
1. Under Investment A, compute the probability that your investment will be below $1 000 in a
year.
2. Under Investment A, compute the probability that your investment will exceed $1 200 in a year.
3. Under Investment B, compute the probability that your investment will be below $1 000 in a year.
4. Under Investment B, compute the probability that your investment will exceed $1 200 in a year.
Exercise 1.24: [wk02Q7, Solution, Schedule] A city engineer has studied the frequency of accidents
at two busy intersections. He has determined that the time T in months between accidents at each
intersection has an exponential distribution. The parameters for these two distributions are 2 and 2.5.
Assume that the occurrence of accidents at these intersections is independent.
1. Determine the probability that there are no accidents at either intersection in the next month.
2. Determine the probability that there will be no accidents for at least one of the intersections in
the next month.
Exercise 1.25: [wk02Q8, Solution, Schedule] The Pareto distribution is very commonly used to
model certain insurance loss amounts. We say X has a Pareto distribution if its density can be ex-
pressed as:
+1
fX (x) = for x > ,
x
and zero otherwise.
4. An insurance policy has a deductible1 of $5. The random variable for the loss amount (before
deductible) on claims filed has a Pareto distribution with = 3.5 and = 4. Find:
Pr(X = x, Y = y)          X = x
                     0      1      2      3
Y = y     1        0.05   0.20   0.15   0.05
          2        0.20   0.15   0.12   0.08
Calculate:
Footnote 1: A deductible means that the policy makes no payment if the loss amount is smaller than the deductible, and the claim
amount equals the loss amount minus the deductible if the loss amount is larger than the deductible.
1. E [X]
2. E [Y]
3. E [X |Y = 1]
4. Var (Y |X = 3)
5. E [XY] and Cov(X,Y).
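The five quantities above can all be read off a joint probability table in the same way: sum g(x, y) times the joint probability, restricting and renormalising for conditional versions. A minimal sketch using exact fractions follows; the table entries are copied from the exercise, while the helper name `expect` is an illustrative choice.

```python
from fractions import Fraction as F

# joint[(x, y)] = Pr(X = x, Y = y), copied from the table in the exercise
joint = {
    (0, 1): F("0.05"), (1, 1): F("0.20"), (2, 1): F("0.15"), (3, 1): F("0.05"),
    (0, 2): F("0.20"), (1, 2): F("0.15"), (2, 2): F("0.12"), (3, 2): F("0.08"),
}

def expect(g, cond=lambda x, y: True):
    """E[g(X, Y) | cond], computed directly from the joint pmf."""
    mass = sum(p for (x, y), p in joint.items() if cond(x, y))
    return sum(g(x, y) * p for (x, y), p in joint.items() if cond(x, y)) / mass

e_x = expect(lambda x, y: x)
e_y = expect(lambda x, y: y)
e_x_given_y1 = expect(lambda x, y: x, lambda x, y: y == 1)

# Var(Y | X = 3) = E[Y^2 | X = 3] - (E[Y | X = 3])^2
m1 = expect(lambda x, y: y, lambda x, y: x == 3)
m2 = expect(lambda x, y: y * y, lambda x, y: x == 3)
var_y_given_x3 = m2 - m1**2

e_xy = expect(lambda x, y: x * y)
cov_xy = e_xy - e_x * e_y
print(e_x, e_y, e_x_given_y1, var_y_given_x3, e_xy, cov_xy)
```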
Exercise 1.27: [wk03Q5, Solution, Schedule] Let X and Y be two discrete random variables whose
joint probability mass function is given by:
Pr(X = x, Y = y)          X = x
                     1      2      3      4
          1        0.10   0.05   0.02   0.02
Y = y     2        0.05   0.20   0.05   0.02
          3        0.02   0.05   0.20   0.04
          4        0.02   0.02   0.04   0.10
Exercise 1.28: [wk03Q6, Solution, Schedule] Let X and Y have the joint density:
$$f_{X,Y}(x, y) = \frac{6}{7}(x + y)^2, \quad \text{for } 0 \le x \le 1 \text{ and } 0 \le y \le 1,$$
and zero otherwise.
Exercise 1.29: [wk03Q8, Solution, Schedule] Let $\bar{x}_n$ and $s_n^2$ denote the sample mean and variance
for the sample $x_1, x_2, \ldots, x_n$. Let $\bar{x}_{n+1}$ and $s_{n+1}^2$ denote these quantities when an additional observation
$x_{n+1}$ is added to the sample.
Exercise 1.30: [wk03Q9, Solution, Schedule] Suppose X and Y are two continuous random variables.
Prove that:
$$E[Y] = \int_{-\infty}^{\infty} E[Y \mid X = x]\, f_X(x)\, dx.$$
$X_1 \sim \text{Uniform}[0, 1]$
Conditional on $X_1$, $X_2 \sim \text{Uniform}[0, X_1]$
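The identity of Exercise 1.30 can be illustrated on this two-stage model: since E[X2 | X1] = X1/2, averaging over X1 gives E[X2] = 1/4. The following is a minimal simulation sketch of that check; the function name, seed, and number of draws are arbitrary choices and not part of the exercise.

```python
import random

def simulate_mean_x2(n_draws, seed=0):
    """Monte Carlo estimate of E[X2] under the two-stage uniform model."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x1 = rng.uniform(0.0, 1.0)     # X1 ~ Uniform[0, 1]
        total += rng.uniform(0.0, x1)  # X2 | X1 ~ Uniform[0, X1]
    return total / n_draws

estimate = simulate_mean_x2(200_000)
print(estimate)  # should be close to E[E[X2 | X1]] = E[X1 / 2] = 1/4
```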
Exercise 1.32: [wk03Q11, Solution, Schedule] Suppose that the joint distribution function of X1 and
X2 is given by
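$$F_{X_1,X_2}(x_1, x_2) = \begin{cases} 0, & \text{if } x_1 < 0 \text{ or } x_2 < 0;\\ x_1 x_2 \left[1 + \frac{1}{2}(1 - x_1)(1 - x_2)\right], & \text{if } 0 \le x_1 \le 1 \text{ and } 0 \le x_2 \le 1;\\ F_{X_1}(x_1), & \text{if } x_2 > 1;\\ F_{X_2}(x_2), & \text{if } x_1 > 1. \end{cases}$$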
2. Find the marginal distribution functions of X1 and X2 . Can you recognise these distributions?
Exercise 1.33: [wk03Q12, Solution, Schedule] We have the joint probability density function:
$$f_{X_1,X_2}(x_1, x_2) = \begin{cases} k(1 - x_2), & \text{if } 0 \le x_1 \le x_2 \le 1;\\ 0, & \text{else.} \end{cases}$$
2. Determine the region for the integral for determining Pr(X1 3/4, X2 1/2).
2. Find the mean and median of the claim amounts. What can you say about the skewness of the
distribution?
Exercise 1.35: [wk03Q2, Solution, Schedule] Data were collected on 100 consecutive days for the
number of claims, X, arising from a group of insurance policies. This resulted in the following
frequency distribution:
1. mode
2. median
3. interquartile range
4. Suppose the average value for 5 claims or more is 7.5. Calculate the sample mean.
Exercise 1.36: [wk03Q3, Solution, Schedule] For a set of 32 observations, you are given:
$$\sum_{k=1}^{32} x_k = 13\,337.6 \quad \text{and} \quad \sum_{k=1}^{32} x_k^2 = 5\,667\,388.7.$$
The largest of the observations is 605. Suppose you are interested in measuring the impact of the
largest observation on the mean and standard deviation.
2. Calculate the sample mean and the sample standard deviation, with the largest observation
deleted.
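Because only the two sums are given, deleting the largest observation amounts to subtracting 605 and 605² from them before recomputing. A minimal sketch, assuming the sample variance uses the n − 1 divisor (the helper name `mean_and_sd` is illustrative):

```python
from math import sqrt

def mean_and_sd(n, total, total_sq):
    """Sample mean and standard deviation from n, sum(x) and sum(x^2)."""
    mean = total / n
    var = (total_sq - n * mean**2) / (n - 1)  # n - 1 divisor assumed
    return mean, sqrt(var)

n, total, total_sq, largest = 32, 13337.6, 5667388.7, 605.0

with_all = mean_and_sd(n, total, total_sq)
# Remove the largest observation from both sums and from the count
without_max = mean_and_sd(n - 1, total - largest, total_sq - largest**2)
print(with_all, without_max)
```

Comparing the two pairs shows how strongly a single extreme value pulls both the mean and, even more, the standard deviation.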
Exercise 1.37: [wk03Q7, Solution, Schedule] Two independent measurements, X and Y, are taken
of a quantity $\mu$. We are given the means are equal, $E[X] = E[Y] = \mu$, but the variances $\sigma_X^2$ and $\sigma_Y^2$
are not equal. The two measurements are then combined by means of a weighted average to give:
$$Z = \alpha X + (1 - \alpha) Y,$$
3. Under what circumstances is it better to use the average $(X + Y)/2$ than either X or Y alone to
determine $\mu$? Hint: a smaller variance would give a better estimate of the population mean.
4. Now, suppose X and Y are instead not independent and have covariance:
$$\mathrm{Cov}(X, Y) = \sigma_{XY}.$$
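The variance of a weighted average of two measurements is a quadratic in the weight, so it can be explored numerically. The sketch below assumes the weight is called alpha and uses hypothetical variances; setting `cov_xy` to zero gives the independent case, and the minimising weight is obtained by setting the derivative of the quadratic to zero.

```python
def var_z(alpha, var_x, var_y, cov_xy=0.0):
    """Variance of Z = alpha*X + (1 - alpha)*Y."""
    return (alpha**2 * var_x + (1 - alpha) ** 2 * var_y
            + 2 * alpha * (1 - alpha) * cov_xy)

def best_alpha(var_x, var_y, cov_xy=0.0):
    """Weight that minimises var_z (derivative of the quadratic set to zero)."""
    return (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)

var_x, var_y = 1.0, 4.0            # hypothetical variances
a_star = best_alpha(var_x, var_y)  # independent case
print(a_star, var_z(a_star, var_x, var_y), var_z(0.5, var_x, var_y))
```

With these illustrative numbers the equal-weight average has a larger variance than X alone, which is the kind of situation part 3 asks about.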
$$S = X_1 + X_2 + \ldots + X_N.$$
Exercise 1.39: [wk04Q2, Solution, Schedule] Let X1 , X2 and X3 be i.i.d. with common density:
$$f_X(x) = e^{-x}, \quad x \ge 0.$$
Exercise 1.40: [wk04Q3, Solution, Schedule] Let $X \sim \text{Gamma}(\alpha, 1)$ and $Y \sim \text{Gamma}(\beta, 1)$ be independent
random variables. Define $U = X + Y$ and $V = X/(X + Y)$.
2. Use the Jacobian transformation technique to find the joint distribution of U and V.
4. Find the marginals of U and V using their joint distribution derived in part 2. Demonstrate that
the marginal of U is consistent with that derived from Exercise 1.40 part 1.
5. Use Exercise 1.40 part 3 and 4, to find the mean and variance of V.
Exercise 1.41: [wk04Q4, Solution, Schedule] Let X1 , X2 and X3 be three independent and identically
distributed as Exp(1) random variables. Find:
1. $E\left[X_{(3)} \mid X_{(1)} = x\right]$
2. $E\left[X_{(1)} \mid X_{(3)} = x\right]$
Exercise 1.42: [wk04Q5, Solution, Schedule] Let X1 and X2 be i.i.d. (independent and identically
distributed) N (0, 1) random variables.
3. Suppose $X_1$ and $X_2$ are no longer independent but each still has $N(0, 1)$ distribution. Will $X_1 + X_2$
and $X_1 - X_2$ still be independent?
(a) Find the p.d.f. of an Inverse Gamma Distribution, i.e., find the p.d.f. of $Y = X^{-1}$.
(b) Find the c.d.f. of the inverse gamma distribution as function of the c.d.f. of the gamma
distribution.
Exercise 1.43: [wk05Q14, Solution, Schedule] Using the p.d.f. of a chi-squared distribution with
one degree of freedom:
$$f_Y(y) = \frac{\exp(-y/2)}{\sqrt{2\pi y}}, \quad \text{if } y > 0,$$
and zero otherwise, prove that the moment generating function of Y is given by:
$$M_Y(t) = (1 - 2t)^{-1/2}.$$
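Before attempting the proof, the claimed m.g.f. can be checked numerically. The sketch below substitutes y = z² in the defining integral, which removes the singularity at y = 0 and leaves 2 ∫ exp((t − 1/2) z²) / √(2π) dz over z > 0; it then applies a midpoint rule. The integration limit and step count are arbitrary choices, and the check only makes sense for t < 1/2.

```python
from math import exp, pi, sqrt

def mgf_numeric(t, upper=12.0, steps=20_000):
    """Midpoint-rule value of E[exp(tY)] after the substitution y = z**2."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        total += 2 * exp((t - 0.5) * z * z) / sqrt(2 * pi)
    return total * h

def mgf_closed_form(t):
    # The formula to be proved, valid for t < 1/2
    return (1 - 2 * t) ** -0.5

for t in (-1.0, 0.2):
    print(t, mgf_numeric(t), mgf_closed_form(t))
```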
Exercise 1.45: [wk05Q16, Solution, Schedule] Prove that the p.d.f. of Snedecor's F distribution,
given by the transformation:
$$F = \frac{U/n_1}{V/n_2},$$
where $U \sim \chi^2(n_1)$ and $V \sim \chi^2(n_2)$, is given by:
I Let $Z_1$ and $Z_2$ be two independent $N(0, 1)$ random variables and let $V_1 \sim \chi^2(r_1)$ and
$V_2 \sim \chi^2(r_2)$ be two independent chi-squared random variables. Which of the following random
variables has a t-distribution with $(r_1 + r_2)$ degrees of freedom?
Z1 + Z2
(A)
(V1 + V2 ) /(r1 + r2 )
Z1 + Z2
(B)
(V1 /r1 ) + (V2 /r2 )
Z1 + Z2
(C)
2 (V1 + V2 ) /(r1 + r2 )
Z1 Z2
(D)
(V1 + V2 ) /(r1 + r2 )
Z1 Z2
(E) +
V1 /r1 V2 /r2
II Let $Z_1$ and $Z_2$ be two independent standard normal random variables. Which of the following
combinations of the two is also a standard normal random variable?
(A) (Z1 + Z2 ) /2
(B) Z1 + Z2
(C) Z1 /Z2
(D) Z1 Z2
(E) (Z1 Z2 ) / 2
III Let $Z_1 \sim N(0, 1)$ and $Z_2 \sim N(0, 1)$ be two random variables with correlation coefficient
$$\rho(Z_1, Z_2) = \rho,$$
(A) X Exp()
(B) X Exp(n)
(C) X Exp(/n)
(D) X Gamma(n, n)
X Gamma n, n
(E)
n
Note: m.g.f. of Gamma: MXi (t) = 1 t ).
V Let $X_1, \ldots, X_n$ be n independent and identically distributed Poisson random variables with mean
$\lambda$. Describe the distribution of the sum of these random variables:
$$S = \sum_{k=1}^{n} X_k.$$
(A) $S \sim \text{Poisson}(1)$
(B) $S \sim \text{Poisson}(\lambda)$
(C) $S \sim \text{Poisson}(\lambda/n)$
(D) $S \sim \text{Poisson}(n\lambda)$
(E) Cannot be determined from the given information
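One way to test a candidate answer is to build the distribution of the sum by repeated convolution of the single-variable probabilities and compare it with a Poisson pmf. A minimal sketch, with the mean, n, and truncation point chosen arbitrarily for illustration:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def convolve(p, q):
    """pmf of the sum of two independent counts, for values 0..len(p)-1."""
    return [sum(p[j] * q[k - j] for j in range(k + 1)) for k in range(len(p))]

lam, n, k_max = 0.7, 4, 30   # illustrative values, not from the exercise
single = [poisson_pmf(k, lam) for k in range(k_max + 1)]

pmf_sum = single
for _ in range(n - 1):       # add one more independent term each pass
    pmf_sum = convolve(pmf_sum, single)

# Compare with a Poisson pmf whose mean is n times the individual mean
target = [poisson_pmf(k, n * lam) for k in range(k_max + 1)]
max_gap = max(abs(a - b) for a, b in zip(pmf_sum, target))
print(max_gap)
```

A gap at the level of floating-point rounding indicates that the convolved pmf and the candidate pmf coincide on the values checked.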
VI Suppose X1 , X2 , . . . , X20 are twenty independent random variables and are identically distributed
as Exp(2). Determine $\Pr\left(X_{(20)} > 1\right)$.
VII Let X1 , X2 , . . . , Xn be n i.i.d. (independent and identically distributed) random variables each
with density:
(C) $E\left[X_{(n)}\right] = 1$
VIII In a 100-meter Olympic race, the running times are considered to be uniformly distributed
between 8.5 and 10.5 seconds. Suppose there are 8 competitors in the finals. The current world
record is 9.9 seconds.
Determine the probability that the loser of the race will not break the world record.
(A) 0.54
(B) 0.64
(C) 0.74
(D) 0.84
(E) 0.94
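The loser is the runner with the largest time, so the question concerns the maximum of the 8 running times. A minimal sketch, assuming the 8 times are independent (the problem states only that they are uniform) and using an illustrative function name:

```python
def prob_slowest_exceeds(record, low=8.5, high=10.5, runners=8):
    """Pr(maximum of `runners` independent Uniform(low, high) times > record)."""
    cdf_at_record = (record - low) / (high - low)  # Pr(one time <= record)
    return 1 - cdf_at_record ** runners            # 1 - Pr(all times <= record)

answer = prob_slowest_exceeds(9.9)
print(round(answer, 2))
```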
Solutions
Solution 1.1: [wk01Q1, Exercise, Schedule] Urn 1: 1 Black (B), 1 Gold (G) and Urn 2: 1 White
(W), 1 Gold (G). Define B = black ball, G = gold ball, W = white ball
3. Let E be the event of getting the same color for both balls. Then $E = \{GG\}$ and
$\Pr(E) = \Pr(\text{Urn 1} = G \cap \text{Urn 2} = G) = \Pr(\text{Urn 1} = G)\Pr(\text{Urn 2} = G) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$, using independence.
Solution 1.3: [wk01Q3, Exercise, Schedule] Let A and B be two independent events so that $\Pr(A \cap B) = \Pr(A)\Pr(B)$,
$\Pr(B \mid A) = \Pr(B)$, and $\Pr(A \mid B) = \Pr(A)$.
thus independent.
thus independent.
3. It then becomes straightforward to show $A^C$ and $B^C$ are independent. Given that A and B are
independent, we know from part 2 that $A^C$ and $B$ are also independent. Applying part 1 to the pair $A^C$ and $B$, then $A^C$
and $B^C$ must also be independent.
1. Let A and B be mutually exclusive, i.e., $A \cap B = \emptyset$ and $\Pr(A \cap B) = 0$. Suppose they are also
independent. Then
$$\Pr(A \cap B) = \Pr(A)\Pr(B) = 0.$$
Therefore, either $\Pr(A) = 0$ or $\Pr(B) = 0$. But both $\Pr(A) > 0$ and $\Pr(B) > 0$ by assumption.
This is a contradiction so that A and B cannot be independent.
2. Now suppose A and B are independent, i.e., $\Pr(A \cap B) = \Pr(A)\Pr(B)$. Suppose they are
mutually exclusive. Then $\Pr(A \cap B) = 0$, which implies $\Pr(A)\Pr(B) = 0$, and following
the same argument as above, this cannot be true. Therefore they cannot be mutually exclusive.
Solution 1.5: [wk01Q5, Exercise, Schedule] There are a total of $6 \times 6 = 36$ possible outcomes.
We have that: E1 = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)},
E2 = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (2, 6), (3, 5), (4, 4), (5, 3), (6, 2), (3, 6), (4, 5), (5, 4), (6, 3), (4, 6), (5, 5), (6, 4)},
and
E3 = {(1, 1), (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}. Thus, by counting
from these possible outcomes, we see that:
$\Pr(E_1) = \frac{6}{36} = \frac{1}{6}$
$\Pr(E_2) = \frac{18}{36} = \frac{1}{2}$
$\Pr(E_3) = \frac{12}{36} = \frac{1}{3}$
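1. Note that
$$\Pr(E_1 \cap E_2 \cap E_3) = \Pr(\text{double and sum is 8}) = \Pr((4, 4)) = \frac{1}{36}$$
and
$$\Pr(E_1)\Pr(E_2)\Pr(E_3) = \frac{1}{6} \cdot \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{36}.$$
Thus, the product rule holds for the three events taken together.
2. However,
$$\Pr(E_1 \cap E_2) = \Pr(\text{double and sum is 8 or 10}) = \frac{1}{18},$$
which is not equal to:
$$\Pr(E_1)\Pr(E_2) = \frac{1}{6} \cdot \frac{1}{2} = \frac{1}{12}.$$
3. Note that:
$$\Pr(E_2 \cap E_3) = \Pr(\text{sum is 7 or 8}) = \frac{11}{36},$$
which is not equal to:
$$\Pr(E_2)\Pr(E_3) = \frac{1}{2} \cdot \frac{1}{3} = \frac{1}{6}.$$
4. Consider:
$$\Pr(E_1 \cap E_3) = \Pr(\text{doubles and sum is 2 or 8}) = \frac{2}{36} = \frac{1}{18}$$
and note that
$$\Pr(E_1)\Pr(E_3) = \frac{1}{6} \cdot \frac{1}{3} = \frac{1}{18}.$$
Therefore, $E_1$ and $E_3$ are independent.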
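The counts above can be confirmed by enumerating all 36 outcomes. In this minimal sketch the definitions of E2 (sum between 7 and 10) and E3 (sum equal to 2, 7 or 8) are read off the outcome lists in this solution rather than taken from the exercise statement, so treat them as assumptions.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all 36 ordered rolls

def pr(event):
    """Exact probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

e1 = lambda o: o[0] == o[1]         # doubles appear
e2 = lambda o: 7 <= sum(o) <= 10    # assumed: sum is between 7 and 10
e3 = lambda o: sum(o) in (2, 7, 8)  # assumed: sum is 2, 7 or 8

def both(a, b):
    return lambda o: a(o) and b(o)

triple_ok = pr(both(e1, both(e2, e3))) == pr(e1) * pr(e2) * pr(e3)
pairwise = {
    "E1,E2": pr(both(e1, e2)) == pr(e1) * pr(e2),
    "E2,E3": pr(both(e2, e3)) == pr(e2) * pr(e3),
    "E1,E3": pr(both(e1, e3)) == pr(e1) * pr(e3),
}
print(triple_ok, pairwise)
```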
2. However, A falsely interprets the event Z as equal to the event $B^C$ (B is not pardoned) and
calculates:
$$\Pr(A \mid B^C) = \frac{\Pr(A \cap B^C)}{\Pr(B^C)} = \frac{1/3}{2/3} = \frac{1}{2}.$$
Student A's reasoning is not justified, i.e., the event Z has more information than the event $B^C$.
The lecturer does provide extra information on the probability that event C will happen, i.e.,
$$\Pr(C \mid Z) = \frac{\Pr(C \cap Z)}{\Pr(Z)} = \frac{1/3}{1/2} = \frac{2}{3}.$$
Using $\sum_x p_X(x) = 1$, $\Pr(C \mid Z) = 2/3$ and $\Pr(B \mid Z) = 0$ we have that $\Pr(A \mid Z) = 1/3$.
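The posterior probabilities can be reproduced with Bayes' rule, where Z is the event that the lecturer tells A that B is not pardoned. This is a minimal sketch; it assumes that when A is the pardoned student the lecturer names B or C with equal probability, which is consistent with Pr(Z) = 1/2 used above but is not stated explicitly in the exercise.

```python
from fractions import Fraction

prior = {s: Fraction(1, 3) for s in "ABC"}  # who is pardoned

# Pr(lecturer says "B is not pardoned" | pardoned student)
says_b = {
    "A": Fraction(1, 2),  # assumption: lecturer picks B or C at random
    "B": Fraction(0),     # lecturer cannot truthfully name B
    "C": Fraction(1),     # B is the only student the lecturer can name
}

pr_z = sum(prior[s] * says_b[s] for s in prior)
posterior = {s: prior[s] * says_b[s] / pr_z for s in prior}
print(pr_z, posterior)
```

A's chance stays at 1/3, while C's rises to 2/3, matching the values in the solution.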
4. Use the multiplication rule. $S_1$ = number of ways the first airline can eliminate 2 flights, with
$n_1 = \binom{5}{2}$; $S_2$ = number of ways the second airline can eliminate 2 flights, with $n_2 = \binom{6}{2}$. There
are $n_1 \times n_2 = \binom{5}{2}\binom{6}{2} = 150$ ways to eliminate 2 flights for each airline.
Solution 1.8: [wk01Q8, Exercise, Schedule] Define D = 2 marbles have different colors, B1 =
Box 1 is selected, B2 = Box 2 is selected, B3 = Box 3 is selected. Let p be the probability that
box 1 is selected. Then $p + 2p + 3p = 1$. Thus $p = \frac{1}{6}$. The required probability is:
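The required probability follows from the law of total probability over the three boxes. The sketch below takes box k to hold k blue and 5 − k red marbles (the reading of the exercise assumed here) and weights box k by k/6; the function name is illustrative.

```python
from fractions import Fraction
from math import comb

def pr_different(k, total=5):
    """Pr(one blue and one red) when 2 marbles are drawn from box k."""
    blue, red = k, total - k  # assumed composition of box k
    return Fraction(blue * red, comb(total, 2))

weights = {k: Fraction(k, 6) for k in (1, 2, 3)}  # Pr(box k) proportional to k
answer = sum(weights[k] * pr_different(k) for k in weights)
print(answer)
```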
Solution 1.9: [wk01Q9, Exercise, Schedule] We know: $\Pr(i) \ge 0$ for $i = 0, 1, 2, \ldots$ and $\sum_i \Pr(i) = 1$.
Rewriting the p.m.f. gives: $\Pr(0) = \Pr(1) = \Pr(2)$; $\Pr(3) = \frac{1}{2}\Pr(2) = \frac{1}{2}\Pr(0)$; $\Pr(4) = \frac{1}{3}\Pr(3) = \frac{1}{3!}\Pr(0)$
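Continuing the pattern, Pr(k) = Pr(0)/(k − 1)! for k ≥ 1, so the probabilities sum to Pr(0)(1 + e) and the normalisation gives Pr(0) = 1/(1 + e). A minimal numerical check of that value, truncating the infinite sum at an arbitrary number of terms:

```python
from math import e, factorial

def total_mass(pr0, terms=60):
    # Pr(k) = pr0 / (k - 1)! for k >= 1, which follows from
    # Pr(1) = Pr(0) and Pr(k + 1) = Pr(k) / k
    return pr0 + sum(pr0 / factorial(k - 1) for k in range(1, terms))

pr0 = 1 / (1 + e)
print(pr0, total_mass(pr0))  # the second value should be 1 up to rounding
```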
Solution 1.10: [wk01Q10, Exercise, Schedule] X is a continuous random variable with density func-
tion:
$$f_X(x) = c e^{-x}, \quad x > 1,$$
and zero otherwise.
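1. For $f_X$ to be a valid density, the following two conditions must be satisfied: 1) $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
and 2) $f_X(x) \ge 0$ for all $x \in \mathbb{R}$.
$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{1}^{\infty} c e^{-x}\,dx = \left[-c e^{-x}\right]_{1}^{\infty} = c e^{-1} = 1.$$
Thus for $c = e$ the first condition holds.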
For $c = e$ we have $f_X(x) = e^{-x+1} > 0$ for $x > 1$ and zero otherwise, hence the second
condition is also satisfied. Thus $c = e$.
2.
$$\Pr(X < 3 \mid X > 2) = \frac{\Pr(X < 3 \cap X > 2)}{\Pr(X > 2)} = \frac{\int_2^3 c e^{-x}\,dx}{\int_2^{\infty} c e^{-x}\,dx} = \frac{\int_2^3 e^{-x}\,dx}{\int_2^{\infty} e^{-x}\,dx} = \frac{e^{-2} - e^{-3}}{e^{-2}} = 1 - e^{-1} = \frac{e - 1}{e}.$$
1.
$$p_X(x) = \begin{cases} 1/3, & \text{if } x = -1;\\ 2/3, & \text{if } x = 2/3;\\ 0, & \text{otherwise.} \end{cases}$$
2. Graph of $p_X$:
Solution 1.12: [wk01Q12, Exercise, Schedule] We are given a normally distributed random variable.
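3. Thus, $S(t) = \log(M_X(t)) = \log\left(\exp\left(\mu t + \frac{1}{2}\sigma^2 t^2\right)\right) = \mu t + \frac{1}{2}\sigma^2 t^2$ implies $S'(t) = \mu + \sigma^2 t$ and
$S''(t) = \sigma^2$ so that
$$S'(0) = \mu \quad \text{and} \quad S''(0) = \sigma^2$$
and the result $E[X] = \mu$ and $V(X) = \sigma^2$ immediately follows.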
4. This function is called the cumulant generating function of X.
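The relation between the derivatives of the cumulant generating function at zero and the mean and variance can be checked with finite differences. A minimal sketch for the normal m.g.f., with hypothetical parameter values and an arbitrary step size:

```python
from math import exp, log

def cgf(t, mu, sigma):
    """S(t) = log M_X(t) for a normal random variable."""
    mgf = exp(mu * t + 0.5 * sigma**2 * t**2)
    return log(mgf)

def first_two_cumulants(mu, sigma, h=1e-4):
    s_minus, s_zero, s_plus = cgf(-h, mu, sigma), cgf(0.0, mu, sigma), cgf(h, mu, sigma)
    slope = (s_plus - s_minus) / (2 * h)               # approximates S'(0)
    curvature = (s_plus - 2 * s_zero + s_minus) / h**2  # approximates S''(0)
    return slope, curvature

mean, variance = first_two_cumulants(mu=1.5, sigma=2.0)  # hypothetical values
print(mean, variance)  # close to mu and sigma**2
```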
1. By the theorem explained in the lecture notes we know that every m.g.f. corresponds to only
one distribution.
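2. Writing $M_X(t) = 1 + \alpha t + \beta t^2 + \gamma t^3 + \delta t^4$, we first determine the first five derivatives of $M_X(t)$ with respect to t:
$$\begin{aligned}
M_X^{(1)}(t) &= \alpha + 2\beta t + 3\gamma t^2 + 4\delta t^3\\
M_X^{(2)}(t) &= 2\beta + 6\gamma t + 12\delta t^2\\
M_X^{(3)}(t) &= 6\gamma + 24\delta t\\
M_X^{(4)}(t) &= 24\delta\\
M_X^{(5)}(t) &= 0
\end{aligned}$$
Next, we can easily derive the non-central moments:
$$\begin{aligned}
E[X] &= M_X^{(1)}(0) = \alpha\\
E[X^2] &= M_X^{(2)}(0) = 2\beta\\
E[X^3] &= M_X^{(3)}(0) = 6\gamma\\
E[X^4] &= M_X^{(4)}(0) = 24\delta\\
E[X^5] &= M_X^{(5)}(0) = 0
\end{aligned}$$
3. The central moments can easily be determined using the non-central moments, with $\mu = E[X] = \alpha$:
$$\begin{aligned}
E[X - \mu] &= \mu - \mu = 0\\
E[(X - \mu)^2] &= E[X^2 - 2X\mu + \mu^2]\\
&= E[X^2] - 2E[X]\mu + \mu^2\\
&= 2\beta - \alpha^2\\
E[(X - \mu)^3] &= E[X^3 - 3X^2\mu + 3X\mu^2 - \mu^3]\\
&= E[X^3] - 3E[X^2]\mu + 3E[X]\mu^2 - \mu^3\\
&= 6\gamma - 6\alpha\beta + 2\alpha^3\\
E[(X - \mu)^4] &= E[X^4 - 4X^3\mu + 6X^2\mu^2 - 4X\mu^3 + \mu^4]\\
&= E[X^4] - 4E[X^3]\mu + 6E[X^2]\mu^2 - 4E[X]\mu^3 + \mu^4\\
&= 24\delta - 24\alpha\gamma + 12\alpha^2\beta - 3\alpha^4\\
E[(X - \mu)^5] &= E[X^5 - 5X^4\mu + 10X^3\mu^2 - 10X^2\mu^3 + 5X\mu^4 - \mu^5]\\
&= E[X^5] - 5E[X^4]\mu + 10E[X^3]\mu^2 - 10E[X^2]\mu^3 + 5E[X]\mu^4 - \mu^5\\
&= -120\alpha\delta + 60\alpha^2\gamma - 20\alpha^3\beta + 4\alpha^5
\end{aligned}$$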
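* using the Binomial expansion: $(a + b)^n = a^n + \binom{n}{1} a^{n-1} b + \binom{n}{2} a^{n-2} b^2 + \ldots + b^n$ with $a = X$,
$b = -\mu$, and $n = 2, 3, 4,$ or $5$.
- Mean: $\mu = E[X] = \alpha$;
- Variance: $\sigma^2 = E[(X - \mu)^2] = 2\beta - \alpha^2$;
- Skewness: $\dfrac{E[(X - \mu)^3]}{\left(E[(X - \mu)^2]\right)^{3/2}} = \dfrac{6\gamma - 6\alpha\beta + 2\alpha^3}{(2\beta - \alpha^2)^{3/2}}$;
- (Excess) Kurtosis: $\dfrac{E[(X - \mu)^4]}{\left(E[(X - \mu)^2]\right)^2} - 3 = \dfrac{24\delta - 24\alpha\gamma + 12\alpha^2\beta - 3\alpha^4}{(2\beta - \alpha^2)^2} - 3$.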
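The step from m.g.f. coefficients to central moments is mechanical: the k-th raw moment is k! times the coefficient of t^k, and the binomial expansion converts raw moments into central ones. A minimal sketch, in which the four coefficients are named alpha, beta, gamma, delta (names assumed here) and given hypothetical values:

```python
from math import comb, factorial

def raw_moments(coeffs):
    """coeffs[k] is the coefficient of t**k in M_X(t); E[X^k] = k! * coeffs[k]."""
    return [factorial(k) * c for k, c in enumerate(coeffs)]

def central_moment(raw, k):
    """E[(X - mu)^k] via the binomial expansion of (X - mu)^k."""
    mu = raw[1]
    return sum(comb(k, j) * raw[j] * (-mu) ** (k - j) for j in range(k + 1))

alpha, beta, gamma, delta = 1.0, 1.0, 0.5, 0.2  # hypothetical values
raw = raw_moments([1.0, alpha, beta, gamma, delta, 0.0])

variance = central_moment(raw, 2)
skewness = central_moment(raw, 3) / variance**1.5
excess_kurtosis = central_moment(raw, 4) / variance**2 - 3
print(variance, skewness, excess_kurtosis)
```

The same two functions can be reused for each parameter set in part 5 by changing the four values.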
5. First, we calculate the mean, variance, skewness, and kurtosis for those four sets of parameters.
We have that:
i) We have a smaller variance for insurer $B_1$ than A, and the same mean, skewness, and
kurtosis. This implies that a large claim for insurer A is more likely than for insurer $B_1$
(i.e., more variability in the claim size for insurer A), hence the price for reinsuring this
risk is larger for insurer A than insurer $B_1$.
ii) We have a smaller skewness for insurer A than $B_2$, and the same mean, variance, and
kurtosis. The negative skewness of insurer A indicates that the probability of a claim
larger than the mean is more than 50%. This implies that a large claim for insurer A is
more likely than for insurer $B_2$, hence the price for reinsuring this risk is larger for insurer
A than insurer $B_2$.
iii) We have a smaller kurtosis for insurer A than $B_3$, and the same mean, variance, and
skewness. Hence, the distribution of the claims of insurer A is flatter than that of insurer $B_3$.
This implies that a large claim for insurer $B_3$ is more likely than for insurer A, hence the
price for reinsuring this risk is larger for insurer $B_3$ than insurer A.
Solution 1.14: [wk01Q14, Exercise, Schedule] The probability density function for a continuous random variable $X$ is given by:
$$f_X(x) = \begin{cases} \dfrac{2}{x^3}, & \text{for } x \ge 1; \\ 0, & \text{otherwise.} \end{cases}$$
1. $F_X(x) = \Pr(X \le x) = \int_{-\infty}^{x} f_X(z)\,dz = \int_1^x \frac{2}{z^3}\,dz = 2\int_1^x z^{-3}\,dz = \left[-z^{-2}\right]_1^x = -\frac{1}{x^2} - (-1) = 1 - \frac{1}{x^2}$ for $x \ge 1$, and zero otherwise.
2. $\Pr(X \ge 4) = 1 - \Pr(X < 4) \overset{*}{=} 1 - \Pr(X \le 4) = 1 - F_X(4) = \frac{1}{16}$.
* using $\Pr(X = x) = 0$ for continuous random variables.
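The two results above can be checked numerically. The following is a minimal sketch, not part of the original solution: the function names and the midpoint-rule integrator are illustrative choices.

```python
def pdf(x):
    """Density f_X(x) = 2 / x^3 on [1, inf), zero elsewhere."""
    return 2.0 / x**3 if x >= 1 else 0.0


def cdf(x):
    """Closed form F_X(x) = 1 - 1/x^2 from part 1."""
    return 1.0 - 1.0 / x**2 if x >= 1 else 0.0


def cdf_numeric(x, steps=100_000):
    """Midpoint-rule integral of the density from 1 to x (illustrative check)."""
    if x <= 1:
        return 0.0
    h = (x - 1.0) / steps
    return sum(pdf(1.0 + (i + 0.5) * h) for i in range(steps)) * h


tail = 1.0 - cdf(4.0)  # Pr(X >= 4), should equal 1/16
print(tail, cdf_numeric(4.0))
```

The numerical integral should agree with the closed-form c.d.f. at x = 4, which in turn gives the tail probability of part 2.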
3. [Figure: graphs of $f_X$ and $F_X$; horizontal axis $x$ from 1 to 5, vertical axis from 0 to 1.]
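and for $x \ge 0$,
$$\int_{-\infty}^{0} \frac{1}{2}e^{u}\,du + \int_0^x \frac{1}{2}e^{-u}\,du = \frac{1}{2} + \left[-\frac{1}{2}e^{-u}\right]_0^x = 1 - \frac{1}{2}e^{-x}.$$
Thus,
$$F_X(x) = \begin{cases} 1 - \frac{1}{2}e^{-x}, & \text{if } x \ge 0; \\ \frac{1}{2}e^{x}, & \text{if } x < 0. \end{cases}$$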
Solution 1.16: [wk01Q16, Exercise, Schedule] We define the force of mortality as:
$$\mu(x) = \lim_{h \to 0} \frac{F_X(x+h) - F_X(x)}{h\,(1 - F_X(x))}.$$
The force of mortality is thus a conditional instantaneous rate of death at age $x$, conditional on surviving to age $x$. (The corresponding conditional instantaneous probability of death at age $x$ is thus $\mu_x\,dx$.)
Hence, the only way to get the first part ($\left[z\,(F_X(z) + C)\right]_0^\infty$) to a value smaller than infinity (and larger than minus infinity) is to set $C = -1$.
Footnote 4: That the integrated term vanishes for $x \to \infty$ is proven as follows: since $E[X] < \infty$, the integral $\int_0^\infty x f_X(x)\,dx$ is convergent, and hence the tails tend to zero, so
$$x\,[1 - F_X(x)] = x\int_x^\infty f_X(t)\,dt \le \int_x^\infty t f_X(t)\,dt \to 0 \quad \text{for } x \to \infty.$$
1. Geometric(p = 1/2)
2. Binomial(n = 3, p = 1/2)
3. Poisson($\lambda = 1/2$)
4. N.B.(r = 4, p = 1/2)
5. Binomial(n = 5, p = 3/4)
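1. Now, consider (with $p = \lambda/n$)
$$\begin{aligned}
\lim_{n\to\infty} M_X(t) &= \lim_{n\to\infty}\left(pe^t + (1-p)\right)^n \\
&= \lim_{n\to\infty}\left(1 + p\left(e^t - 1\right)\right)^n \\
&= \lim_{n\to\infty}\left(1 + \frac{\lambda\left(e^t - 1\right)}{n}\right)^n.
\end{aligned}$$
Equivalently, by taking $x = 1/n$, so that as $n \to \infty$, $x \to 0$,
$$\lim_{n\to\infty} M_X(t) = \lim_{x\to 0}\left(1 + x\lambda\left(e^t - 1\right)\right)^{1/x} \overset{*}{=} e^{\lambda(e^t - 1)} = M_Y(t).$$
* using:
$$\lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n = e^x$$
or, equivalently ($h = 1/n$),
$$\lim_{h\to 0}(1 + hx)^{1/h} = e^x.$$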
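The limit above can be illustrated numerically. This is a minimal sketch under assumed illustrative values ($\lambda = 2$, $t = 0.5$ are not from the exercise); it compares the Binomial$(n, \lambda/n)$ m.g.f. with the Poisson($\lambda$) m.g.f. for growing $n$.

```python
import math


def binomial_mgf(t, n, lam):
    """M.g.f. of Binomial(n, p) with p = lam / n."""
    p = lam / n
    return (p * math.exp(t) + (1.0 - p)) ** n


def poisson_mgf(t, lam):
    """M.g.f. of Poisson(lam)."""
    return math.exp(lam * (math.exp(t) - 1.0))


lam, t = 2.0, 0.5  # illustrative values, chosen only for the demonstration
gaps = [abs(binomial_mgf(t, n, lam) - poisson_mgf(t, lam))
        for n in (10, 100, 1000, 10000)]
print(gaps)
```

The gap between the two m.g.f.s should shrink as $n$ grows, in line with the limit argument.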
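3. $n = 300 \times 10 = 3{,}000$ and $p = \frac{1}{400}$. Then $X \sim \text{Binomial}(n = 3{,}000, p = 1/400)$ and $\lambda = E[X] = np = 7.5$. Approximating using $Y \sim \text{Poisson}(\lambda = 7.5)$, we have:
# Poisson approximation: Pr(Y > 3) = 1 - Pr(Y <= 3)
1-sum(dpois(0:3,7.5))
# exact binomial value: Pr(X > 3) = 1 - Pr(X <= 3)
1-sum(dbinom(0:3,3000,(1/400)))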
Solution 1.21: [wk02Q4, Exercise, Schedule] Let $X$ = number of claims over $100. Then $X \sim \text{Binomial}(n = 200, p = 0.05)$.
1. $\Pr(X = 0) = \binom{200}{0}(0.05)^0(0.95)^{200} = 0.0000351$.
2. $\Pr(X = 0) + \Pr(X = 1) + \Pr(X = 2) = 0.0000351 + \binom{200}{1}(0.05)^1(0.95)^{199} + \binom{200}{2}(0.05)^2(0.95)^{198} = 0.0023363$.
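The binomial probabilities above can be reproduced directly. A minimal sketch (the helper name `binom_pmf` is an illustrative choice):

```python
from math import comb


def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 200, 0.05
p0 = binom_pmf(0, n, p)                               # Pr(X = 0)
p_le2 = sum(binom_pmf(k, n, p) for k in range(3))     # Pr(X <= 2)
print(round(p0, 7), round(p_le2, 7))
```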
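Solution 1.22: [wk02Q5, Exercise, Schedule] We are given $X \sim \text{Gamma}(\alpha, \beta)$ so that the p.d.f. has the form
$$f_X(x) = \frac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}, \quad \text{for } x \ge 0.$$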
Solution 1.23: [wk02Q6, Exercise, Schedule] Returns are normally distributed. Use the standard normal table to get the desired probabilities, but only after standardising, i.e., $Z = \frac{X - \mu}{\sigma}$.
1. The probability that the investment will be below 1,000 is:
$$\begin{aligned}
\Pr(1000\,(1 + R_A) < 1000) &= \Pr(R_A < 0) = \Pr\left(Z_A < \frac{0 - 0.05}{\sqrt{0.1}}\right) \\
&= \Pr(Z_A < -0.16) = 1 - \Phi(0.16) \\
&= 1 - 0.5636 = 0.4364.
\end{aligned}$$
2. The probability that the investment will exceed 1,200 is:
$$\begin{aligned}
\Pr(1000\,(1 + R_A) > 1200) &= \Pr(R_A > 0.2) = \Pr\left(Z_A > \frac{0.2 - 0.05}{\sqrt{0.1}}\right) \\
&= \Pr(Z_A > 0.47) = 1 - \Phi(0.47) \\
&= 1 - 0.6808 = 0.3192.
\end{aligned}$$
A similar procedure applies to investment B, but with a different mean and variance. You can verify that for part (c), the probability is 0.4443 and for part (d), the probability is 0.4443.
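The table look-ups for investment A can be cross-checked with the error function. This is a minimal sketch; it assumes $R_A$ has mean 0.05 and variance 0.1, which is inferred from the standardised values $-0.16$ and $0.47$ in the working rather than stated explicitly in this section.

```python
import math


def phi(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu_a, var_a = 0.05, 0.1          # assumed parameters of R_A (inferred, see lead-in)
sigma_a = math.sqrt(var_a)

p_below = phi((0.0 - mu_a) / sigma_a)        # Pr(R_A < 0)
p_above = 1.0 - phi((0.2 - mu_a) / sigma_a)  # Pr(R_A > 0.2)
print(p_below, p_above)
```

Small differences from the tabulated answers are expected, because the table values round $z$ to two decimals.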
Solution 1.24: [wk02Q7, Exercise, Schedule] Let $T_1$ and $T_2$ be the time until the next accident at each of the busy intersections. Then $T_1 \sim \text{Exp}(2)$ and $T_2 \sim \text{Exp}(2.5)$.
1. The probability that there are no accidents at either intersection in the next month is:
$$\begin{aligned}
\Pr(T_1 > 1 \cap T_2 > 1) &\overset{*}{=} \Pr(T_1 > 1)\cdot\Pr(T_2 > 1) \\
&= (1 - \Pr(T_1 \le 1))\cdot(1 - \Pr(T_2 \le 1)) \\
&= (1 - F_{T_1}(1))\cdot(1 - F_{T_2}(1)) \\
&= \left(1 - \left(1 - e^{-2}\right)\right)\cdot\left(1 - \left(1 - e^{-2.5}\right)\right) \\
&= e^{-2}\cdot e^{-2.5} = e^{-4.5} = 0.0111.
\end{aligned}$$
* using independence between $T_1$ and $T_2$.
2. The probability that there will be no accidents for at least one of the intersections in the next month is:
$$\begin{aligned}
\Pr(T_1 > 1 \cup T_2 > 1) &= \Pr(T_1 > 1) + \Pr(T_2 > 1) - \Pr(T_1 > 1 \cap T_2 > 1) \\
&= (1 - \Pr(T_1 \le 1)) + (1 - \Pr(T_2 \le 1)) - \Pr(T_1 > 1 \cap T_2 > 1) \\
&= (1 - F_{T_1}(1)) + (1 - F_{T_2}(1)) - \Pr(T_1 > 1 \cap T_2 > 1) \\
&= \left(1 - \left(1 - e^{-2}\right)\right) + \left(1 - \left(1 - e^{-2.5}\right)\right) - \Pr(T_1 > 1 \cap T_2 > 1) \\
&= e^{-2} + e^{-2.5} - e^{-4.5} = 0.2063.
\end{aligned}$$
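Both probabilities follow from the exponential survival function $\Pr(T > t) = e^{-\text{rate}\cdot t}$. A minimal sketch (the helper name `survival` is an illustrative choice):

```python
import math


def survival(t, rate):
    """Pr(T > t) for T ~ Exp(rate)."""
    return math.exp(-rate * t)


s1 = survival(1.0, 2.0)   # Pr(T1 > 1)
s2 = survival(1.0, 2.5)   # Pr(T2 > 1)

p_both = s1 * s2                 # no accident at either intersection (independence)
p_either = s1 + s2 - p_both      # no accident at at least one intersection
print(round(p_both, 4), round(p_either, 4))
```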
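1. Note that:
$$\begin{aligned}
E[X] &= \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_{\beta}^{\infty} x\,\frac{\alpha\beta^{\alpha}}{x^{\alpha+1}}\,dx = \alpha\beta^{\alpha}\int_{\beta}^{\infty} x^{-\alpha}\,dx \\
&= \alpha\beta^{\alpha}\left[\frac{x^{-\alpha+1}}{1-\alpha}\right]_{\beta}^{\infty} = \frac{\alpha\beta}{\alpha - 1}
\end{aligned}$$
and that
$$\begin{aligned}
E[X^2] &= \int_{-\infty}^{\infty} x^2 f_X(x)\,dx = \int_{\beta}^{\infty} x^2\,\frac{\alpha\beta^{\alpha}}{x^{\alpha+1}}\,dx = \alpha\beta^{\alpha}\int_{\beta}^{\infty} x^{-\alpha+1}\,dx \\
&= \alpha\beta^{\alpha}\left[\frac{x^{-\alpha+2}}{2-\alpha}\right]_{\beta}^{\infty} = \frac{\alpha\beta^2}{\alpha - 2}.
\end{aligned}$$
$$\mathrm{Var}(X) = \frac{\alpha\beta^2}{\alpha-2} - \left(\frac{\alpha\beta}{\alpha-1}\right)^2 = \frac{\alpha\beta^2}{(\alpha-1)^2(\alpha-2)}.$$
We have that $F_X(x) = 0$ for $x \le \beta$. Hence, the quantile function of $X$ is given by solving
$$u = 1 - \left(\frac{\beta}{F_X^{-1}(u)}\right)^{\alpha} \quad\Rightarrow\quad F_X^{-1}(u) = \frac{\beta}{(1-u)^{1/\alpha}},$$
where $u \in [0, 1]$.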
x | 0 | 1 | 2 | 3
$\Pr(X = x) = \sum_y \Pr(X = x, Y = y)$ | 0.25 | 0.35 | 0.27 | 0.13
and
y | 1 | 2
$\Pr(Y = y) = \sum_x \Pr(X = x, Y = y)$ | 0.45 | 0.55
The conditional probability functions are:
x | 0 | 1 | 2 | 3
$\Pr(X = x \mid Y = 1) = \frac{\Pr(X = x, Y = 1)}{\Pr(Y = 1)}$ | 1/9 | 4/9 | 3/9 | 1/9
and
y | 1 | 2
$\Pr(Y = y \mid X = 3) = \frac{\Pr(X = 3, Y = y)}{\Pr(X = 3)}$ | 5/13 | 8/13
4. $\mathrm{Var}(Y \mid X = 3) = E[Y^2 \mid X = 3] - (E[Y \mid X = 3])^2 = 40/169$, where $E[Y^2 \mid X = 3] = \sum_y y^2 \Pr(Y = y \mid X = 3) = 1^2 \cdot 5/13 + 2^2 \cdot 8/13 = 37/13$ and $E[Y \mid X = 3] = \sum_y y \Pr(Y = y \mid X = 3) = 1 \cdot 5/13 + 2 \cdot 8/13 = 21/13$.
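The conditional moments above can be verified with exact rational arithmetic. A minimal sketch using the conditional probabilities $\Pr(Y = y \mid X = 3)$ from the table (the variable names are illustrative):

```python
from fractions import Fraction

# Conditional distribution of Y given X = 3, as tabulated above.
cond = {1: Fraction(5, 13), 2: Fraction(8, 13)}

mean = sum(y * p for y, p in cond.items())          # E[Y | X = 3]
second = sum(y * y * p for y, p in cond.items())    # E[Y^2 | X = 3]
variance = second - mean**2                         # Var(Y | X = 3)
print(mean, second, variance)
```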
Solution 1.27: [wk03Q5, Exercise, Schedule] The marginals can be obtained using:
$$p_X(x) = \sum_y p(x, y) \quad\text{and}\quad p_Y(y) = \sum_x p(x, y)$$
x/y | 1 | 2 | 3 | 4
$p_X(x) = \Pr(X = x)$ | 0.19 | 0.32 | 0.31 | 0.18
$p_Y(y) = \Pr(Y = y)$ | 0.19 | 0.32 | 0.31 | 0.18
x/y | 1 | 2 | 3 | 4
$\Pr(X \mid Y = 2)$ | 5/32 | 20/32 | 5/32 | 2/32
$\Pr(Y \mid X = 2)$ | 5/32 | 20/32 | 5/32 | 2/32
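i.
$$\begin{aligned}
\Pr(X < Y) &= \int_{-\infty}^{\infty}\int_{-\infty}^{y} f_{X,Y}(x, y)\,dx\,dy = \int_0^1\int_0^y \left(\frac{6}{7}x^2 + \frac{6}{7}y^2 + \frac{12}{7}xy\right)dx\,dy \\
&= \int_0^1 \left[\frac{6}{21}x^3 + \frac{6}{7}y^2 x + \frac{12}{14}x^2 y\right]_0^y dy = \int_0^1 \left(\frac{2}{7}y^3 + \frac{6}{7}y^3 + \frac{6}{7}y^3\right)dy \\
&= \int_0^1 \frac{14}{7}y^3\,dy = 2\left[\frac{y^4}{4}\right]_0^1 = \frac{1}{2}.
\end{aligned}$$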
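ii. Use $X + Y \le 1 \Rightarrow X \le 1 - Y$:
$$\begin{aligned}
\Pr(X + Y \le 1) &= \int_{-\infty}^{\infty}\int_{-\infty}^{1-y} f_{X,Y}(x, y)\,dx\,dy = \int_0^1\int_0^{1-y} \left(\frac{6}{7}x^2 + \frac{6}{7}y^2 + \frac{12}{7}xy\right)dx\,dy \\
&= \int_0^1 \left[\frac{6}{21}x^3 + \frac{6}{7}y^2 x + \frac{12}{14}x^2 y\right]_0^{1-y} dy \\
&= \int_0^1 \left(\frac{6}{21}(1-y)^3 + \frac{6}{7}(1-y)y^2 + \frac{12}{14}(1-y)^2 y\right)dy \\
&\overset{*}{=} \int_1^0 \left(-\frac{6}{21}z^3 - \frac{6}{7}z(1-z)^2 - \frac{12}{14}z^2(1-z)\right)dz \\
&= \int_0^1 \left(\left(\frac{6}{21} + \frac{6}{7} - \frac{12}{14}\right)z^3 + \left(\frac{12}{14} - 2\cdot\frac{6}{7}\right)z^2 + \frac{6}{7}z\right)dz \\
&= \int_0^1 \left(\frac{2}{7}z^3 - \frac{6}{7}z^2 + \frac{6}{7}z\right)dz \\
&= \left[\frac{2}{7\cdot 4}z^4 - \frac{6}{7\cdot 3}z^3 + \frac{3}{7}z^2\right]_0^1 = \frac{3}{14},
\end{aligned}$$
* using $z = 1 - y$, $dz = -dy$.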
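iii.
$$\begin{aligned}
\Pr(X \le 1/2) &= \int_{-\infty}^{1/2}\int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy\,dx = \int_0^{1/2}\int_0^1 \left(\frac{6}{7}x^2 + \frac{6}{7}y^2 + \frac{12}{7}xy\right)dy\,dx \\
&= \int_0^{1/2} \left[\frac{6}{7}x^2 y + \frac{6}{21}y^3 + \frac{12}{14}x y^2\right]_0^1 dx = \int_0^{1/2} \left(\frac{6}{7}x^2 + \frac{6}{21} + \frac{12}{14}x\right)dx \\
&= \left[\frac{6}{21}x^3 + \frac{6}{21}x + \frac{6}{14}x^2\right]_0^{1/2} = \frac{2}{7}.
\end{aligned}$$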
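The three probabilities $\Pr(X < Y) = 1/2$, $\Pr(X + Y \le 1) = 3/14$ and $\Pr(X \le 1/2) = 2/7$ can be sanity-checked by integrating the joint density $\frac{6}{7}(x + y)^2$ over the unit square on a grid. This is a rough numerical sketch (a midpoint rule with an indicator function; the grid size and helper names are illustrative), so agreement is only expected to about two decimals.

```python
def density(x, y):
    """Joint density f(x, y) = (6/7)(x + y)^2 on the unit square."""
    return 6.0 / 7.0 * (x + y) ** 2


def prob(event, n=400):
    """Midpoint-rule estimate of Pr(event) over [0, 1] x [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if event(x, y):
                total += density(x, y)
    return total * h * h


p_i = prob(lambda x, y: x < y)
p_ii = prob(lambda x, y: x + y <= 1)
p_iii = prob(lambda x, y: x <= 0.5)
print(p_i, p_ii, p_iii)
```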
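2. Rewriting $f_{X,Y}(x, y) = \frac{6}{7}(x^2 + y^2 + 2xy)$, we have the following marginal densities:
$$\begin{aligned}
f_X(x) &= \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy = \int_0^1 \frac{6}{7}(x^2 + y^2 + 2xy)\,dy \\
&= \frac{6}{7}\left[x^2 y + \frac{1}{3}y^3 + x y^2\right]_0^1 = \frac{2}{7}\left(3x^2 + 3x + 1\right) \quad\text{for } 0 \le x \le 1,
\end{aligned}$$
and zero otherwise, and
$$\begin{aligned}
f_Y(y) &= \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx = \int_0^1 \frac{6}{7}(x^2 + y^2 + 2xy)\,dx \\
&= \frac{6}{7}\left[y^2 x + \frac{1}{3}x^3 + y x^2\right]_0^1 = \frac{2}{7}\left(3y^2 + 3y + 1\right) \quad\text{for } 0 \le y \le 1,
\end{aligned}$$
and zero otherwise.
3. You can also immediately check the following conditional densities:
$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{3(x + y)^2}{3x^2 + 3x + 1} \quad\text{for } 0 \le y \le 1,$$
and zero otherwise, and
$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} = \frac{3(x + y)^2}{3y^2 + 3y + 1} \quad\text{for } 0 \le x \le 1,$$
and zero otherwise.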
Solution 1.29: [wk03Q8, Exercise, Schedule] Let $\bar{x}_n$ and $s_n^2$ denote the sample mean and variance for the sample $x_1, x_2, \ldots, x_n$. Let $\bar{x}_{n+1}$ and $s_{n+1}^2$ denote these quantities when an additional observation $x_{n+1}$ is added to the sample.
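Proceed as follows. Note $\sum_{k=1}^{n+1}(x_k - \bar{x}_n) = x_{n+1} - \bar{x}_n$, using $\sum_{k=1}^{n}(x_k - \bar{x}_n) = 0$, then the second term
$$\begin{aligned}
&\frac{1}{n}\left[(x_{n+1} - \bar{x}_n)^2 + 2(\bar{x}_n - \bar{x}_{n+1})\underbrace{\sum_{k=1}^{n+1}(x_k - \bar{x}_n)}_{= x_{n+1} - \bar{x}_n} + (n+1)(\bar{x}_n - \bar{x}_{n+1})^2\right] \\
&= \frac{1}{n}\left[(x_{n+1} - \bar{x}_n)^2 - \frac{2(x_{n+1} - \bar{x}_n)^2}{n+1} + \frac{(\bar{x}_n - x_{n+1})^2}{n+1}\right] \\
&\overset{***}{=} \frac{1}{n}\left[(x_{n+1} - \bar{x}_n)^2 - \frac{(x_{n+1} - \bar{x}_n)^2}{n+1}\right] \\
&= \frac{1}{n}\cdot\frac{n}{n+1}(x_{n+1} - \bar{x}_n)^2 \\
&= \frac{(x_{n+1} - \bar{x}_n)^2}{n+1},
\end{aligned}$$
as required, *** using $(a - b)^2 = (-(a - b))^2 = (b - a)^2$.
Hence, we have $s_{n+1}^2 = \frac{n-1}{n}s_n^2 + \frac{(x_{n+1} - \bar{x}_n)^2}{n+1}$.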
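The updating formula $s_{n+1}^2 = \frac{n-1}{n}s_n^2 + \frac{(x_{n+1} - \bar{x}_n)^2}{n+1}$ can be checked against a direct computation. A minimal sketch: the data values and the helper name `update` are illustrative, and the mean update $\bar{x}_{n+1} = \bar{x}_n + (x_{n+1} - \bar{x}_n)/(n+1)$ is the standard running-mean identity.

```python
import statistics


def update(n, mean_n, var_n, x_new):
    """One-step update of the sample mean and sample variance (n >= 2)."""
    mean_next = mean_n + (x_new - mean_n) / (n + 1)
    var_next = (n - 1) / n * var_n + (x_new - mean_n) ** 2 / (n + 1)
    return mean_next, var_next


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # illustrative sample

# Start from the first two observations, then add one observation at a time.
n = 2
mean_n = statistics.mean(data[:2])
var_n = statistics.variance(data[:2])
for x in data[2:]:
    mean_n, var_n = update(n, mean_n, var_n, x)
    n += 1

print(mean_n, var_n)
```

After the loop, the running values should match the mean and variance computed from the whole sample at once.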
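Solution 1.30: [wk03Q9, Exercise, Schedule] Starting with the right-hand side, we have:
$$\begin{aligned}
\int_{-\infty}^{\infty} E[Y \mid X = x]\, f_X(x)\,dx &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\, f_X(x)\,dy\,dx \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\,\frac{f(x, y)}{f_X(x)}\, f_X(x)\,dy\,dx \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, f(x, y)\,dy\,dx \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, f(x, y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} y \underbrace{\int_{-\infty}^{\infty} f(x, y)\,dx}_{= f_Y(y)}\,dy \\
&= \int_{-\infty}^{\infty} y\, f_Y(y)\,dy = E[Y].
\end{aligned}$$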
Solution 1.31: [wk03Q10, Exercise, Schedule] $X_1 \sim \text{Uniform}[0, 1]$ implies that $f_{X_1}(x_1) = 1$ for $0 \le x_1 \le 1$, and zero otherwise, and conditional on $X_1$, $X_2 \sim \text{Uniform}[0, X_1]$ implies
$$f_{X_2|X_1}(x_2 \mid x_1) = \frac{1}{x_1}, \quad\text{for } 0 \le x_2 \le x_1 \le 1,$$
and zero otherwise.
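Thus we have:
$$F_{X_1,X_2}(x_1, x_2) = \begin{cases}
0, & \text{if } x_2 < 0 \text{ or } x_1 < 0; \\
x_2\left(1 + \log\frac{x_1}{x_2}\right), & \text{if } 0 \le x_2 \le x_1 \le 1; \\
x_1\left(1 + \log\frac{x_1}{x_1}\right) = x_1, & \text{if } 1 > x_2 > x_1 > 0; \\
1, & \text{else,}
\end{cases}$$
and hence the marginal distribution function is:
$$F_{X_2}(x_2) = \begin{cases}
0, & \text{if } x_2 < 0; \\
\int_0^{x_2} -\log(u_2)\,du_2 = \left[-u_2\log(u_2) + u_2\right]_0^{x_2} = -x_2\log(x_2) + x_2, & \text{if } 0 \le x_2 \le 1; \\
1, & \text{if } x_2 > 1.
\end{cases}$$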
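Solution 1.32: [wk03Q11, Exercise, Schedule] First note that we can re-express the joint distribution function as:
$$F(x_1, x_2) = \frac{3}{2}x_1 x_2 - \frac{1}{2}x_1^2 x_2 - \frac{1}{2}x_1 x_2^2 + \frac{1}{2}x_1^2 x_2^2.$$
1. The joint density can be derived by taking the partial derivative twice:
$$f(x_1, x_2) = \frac{\partial^2 F(x_1, x_2)}{\partial x_1\,\partial x_2} = \frac{3}{2} - x_1 - x_2 + 2x_1 x_2$$
if $0 \le x_1, x_2 \le 1$ and zero otherwise.
$$F(x_1) = \begin{cases}
0, & \text{if } x_1 < 0; \\
\int_{-\infty}^{x_1} f_{X_1}(u_1)\,du_1 = \int_0^{x_1} 1\,du_1 = [u_1]_0^{x_1} = x_1, & \text{if } 0 \le x_1 \le 1; \\
1, & \text{if } x_1 > 1.
\end{cases}$$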
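and
$$F(x_2) = \begin{cases}
0, & \text{if } x_2 < 0; \\
\int_{-\infty}^{x_2} f_{X_2}(u_2)\,du_2 = \int_0^{x_2} 1\,du_2 = [u_2]_0^{x_2} = x_2, & \text{if } 0 \le x_2 \le 1; \\
1, & \text{if } x_2 > 1.
\end{cases}$$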
Solution 1.33: [wk03Q12, Exercise, Schedule] We have the joint probability density function:
$$f_{X_1,X_2}(x_1, x_2) = \begin{cases} k(1 - x_2), & \text{if } 0 \le x_1 \le x_2 \le 1; \\ 0, & \text{else.} \end{cases}$$
1. For $f_{X_1,X_2}(x_1, x_2)$ to be a (joint) probability density function, the function should satisfy the two conditions:
1) $f_{X_1,X_2}(x_1, x_2) \ge 0$ for all $x_1, x_2$
2) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 = 1$.
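1) is satisfied if $k \ge 0$.
For the second condition we calculate:
$$\begin{aligned}
\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 &= \int_0^1\int_{x_1}^1 k(1 - x_2)\,dx_2\,dx_1 \\
&= k\int_0^1 \left[x_2 - \frac{x_2^2}{2}\right]_{x_1}^1 dx_1 \\
&= k\int_0^1 \left(\frac{1}{2} - x_1 + \frac{x_1^2}{2}\right)dx_1 \\
&= k\left[\frac{x_1}{2} - \frac{x_1^2}{2} + \frac{x_1^3}{6}\right]_0^1 \\
&= k\left(\frac{1}{2} - \frac{1}{2} + \frac{1}{6}\right) = k\cdot\frac{1}{6}.
\end{aligned}$$
Setting this equal to 1 gives $k = 6$.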
2. To determine the region for the integral for determining $\Pr(X_1 \le 3/4, X_2 \ge 1/2)$ we have three conditions, namely:
Hence, the upper left part of the figure is the region which we integrate.
[Figure: two panels showing the region of integration in the $(x_1, x_2)$ plane; both axes run from 0 to 1.]
3. To calculate $\Pr(X_1 \le 3/4, X_2 \ge 1/2)$ we split the integral into three parts, namely:
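Thus we have (using $k = 6$ in the final step):
$$\begin{aligned}
\Pr(X_1 \le 3/4, X_2 \ge 1/2) &= \int_{1/2}^{1}\int_0^{3/4} f_{X_1,X_2}(x_1, x_2)\,dx_1\,dx_2 \\
&= \int_{1/2}^{1}\int_0^{1/2} k(1 - x_2)\,dx_1\,dx_2 + \int_{3/4}^{1}\int_{1/2}^{3/4} k(1 - x_2)\,dx_1\,dx_2 \\
&\quad + \int_{1/2}^{3/4}\int_{x_1}^{3/4} k(1 - x_2)\,dx_2\,dx_1 \\
&= \int_{1/2}^{1} \left[k(1 - x_2)x_1\right]_0^{1/2} dx_2 + \int_{3/4}^{1} \left[k(1 - x_2)x_1\right]_{1/2}^{3/4} dx_2 \\
&\quad + k\int_{1/2}^{3/4} \left[x_2 - \frac{x_2^2}{2}\right]_{x_1}^{3/4} dx_1 \\
&= \int_{1/2}^{1} \frac{k}{2}(1 - x_2)\,dx_2 + \int_{3/4}^{1} \frac{k}{4}(1 - x_2)\,dx_2 \\
&\quad + k\int_{1/2}^{3/4} \left(\frac{15}{32} - x_1 + \frac{x_1^2}{2}\right)dx_1 \\
&= \left[\frac{k}{2}\left(x_2 - \frac{x_2^2}{2}\right)\right]_{1/2}^{1} + \left[\frac{k}{4}\left(x_2 - \frac{x_2^2}{2}\right)\right]_{3/4}^{1} \\
&\quad + k\left[\frac{15x_1}{32} - \frac{x_1^2}{2} + \frac{x_1^3}{6}\right]_{1/2}^{3/4} \\
&= \frac{k}{2}\left(\frac{1}{2} - \frac{3}{8}\right) + \frac{k}{4}\left(\frac{1}{2} - \frac{15}{32}\right) + k\left(\frac{9}{64} - \frac{25}{192}\right) \\
&= \frac{3}{8} + \frac{3}{64} + \frac{4}{64} = \frac{31}{64} = 0.484375.
\end{aligned}$$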
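The three-part split can be verified with exact fractions. This is a minimal sketch that takes $k = 6$ (the value for which the density integrates to one, since the normalising integral equals $k/6$) and evaluates the three antiderivatives used above; the helper names `inner` and `outer` are illustrative.

```python
from fractions import Fraction as F

k = F(6)  # normalising constant, from k * (1/6) = 1


def inner(x2):
    """Antiderivative of (1 - x2) with respect to x2."""
    return x2 - x2**2 / 2


def outer(x1):
    """Antiderivative of inner(3/4) - inner(x1) = 15/32 - x1 + x1^2/2."""
    return inner(F(3, 4)) * x1 - x1**2 / 2 + x1**3 / 6


part1 = k * F(1, 2) * (inner(F(1)) - inner(F(1, 2)))   # 0 <= x1 <= 1/2, 1/2 <= x2 <= 1
part2 = k * F(1, 4) * (inner(F(1)) - inner(F(3, 4)))   # 1/2 <= x1 <= 3/4, 3/4 <= x2 <= 1
part3 = k * (outer(F(3, 4)) - outer(F(1, 2)))          # 1/2 <= x1 <= x2 <= 3/4
total = part1 + part2 + part3
print(part1, part2, part3, total)
```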
Solution 1.34: [wk03Q1, Exercise, Schedule] The data arranged in ascending order:
> stem(storm,scale=2)
0 | 79
1 | 025699
2 | 4457
3 | 13479
4 | 2367
5 | 4
6 | 8
7 |
8 | 2
2. Mean = 3 175, Median = 2 710 + 1/2 (3 110 − 2 710) = 2 910. The stem-and-leaf display appears to show a positively-skewed distribution.
3. $Q_1$ = 1 807.5 and $Q_3$ = 4 225. Therefore IQR = $Q_3 - Q_1$ = 2 417.5.
4. $F_{24}(1\,000) = \frac{1}{24}\sum_{k=1}^{24} I(x_k \le 1\,000) = \frac{3}{24} = \frac{1}{8}$.
1. Mode = 2
2. Median = 2
3. $Q_1 = 1$ and $Q_3 = 3$ so that IQR $= 3 - 1 = 2$.
4. Since the average value of 5 claims or more is 7.5, the sum of claims of 5 or more is $(5 \times 7.5) = 37.5$. Therefore
$$\bar{x} = \frac{1}{100}\sum x_k = \frac{0(14) + 1(25) + 2(26) + 3(18) + 4(12) + 37.5}{100} = 2.165.$$
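Solution 1.36: [wk03Q3, Exercise, Schedule] Recall the formulas for the sample mean and variance:
$$\bar{x} = \frac{1}{n}\sum x_k \quad\text{and}\quad s^2 = \frac{1}{n-1}\sum (x_k - \bar{x})^2 = \frac{1}{n-1}\left(\sum x_k^2 - n\bar{x}^2\right).$$
1. $\bar{x} = \frac{1}{32}(13{,}337.6) = 416.8$ and $s = \sqrt{\frac{1}{31}\left(5667388.7 - 32\,(416.8)^2\right)} = \sqrt{3492.8071} = 59.1$.
2. Let $\bar{x}_{new}$ and $s_{new}^2$ be the new mean and variance respectively after deleting the largest observation. Thus,
$$\bar{x}_{new} = \frac{1}{31}(13337.6 - 605) = 410.73$$
and
$$s_{new}^2 = \frac{1}{30}\left[5667388.7 - 605^2 - 31\,(410.73)^2\right] = 2389.686.$$
Therefore, $s_{new} = \sqrt{2389.686} = 48.9$.
3. Percentage change in the mean $= \frac{\text{new} - \text{old}}{\text{old}} \times 100\% = \frac{410.73 - 416.8}{416.8} \times 100\% = -1.46\%$.
4. Percentage change in the standard deviation $= \frac{\text{new} - \text{old}}{\text{old}} \times 100\% = \frac{48.9 - 59.1}{59.1} \times 100\% = -17.26\%$.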
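The summary statistics above depend only on $n$, $\sum x_k$, $\sum x_k^2$ and the deleted observation, so they can be recomputed directly. A minimal sketch (variable names are illustrative); it works with unrounded intermediate values, so the last digits can differ slightly from the hand calculation, which rounds along the way.

```python
import math

n, total, total_sq, largest = 32, 13337.6, 5667388.7, 605.0

mean_old = total / n
sd_old = math.sqrt((total_sq - n * mean_old**2) / (n - 1))

# Remove the largest observation: n drops to 31 and both sums shrink.
mean_new = (total - largest) / (n - 1)
sd_new = math.sqrt((total_sq - largest**2 - (n - 1) * mean_new**2) / (n - 2))

pct_mean = (mean_new - mean_old) / mean_old * 100
pct_sd = (sd_new - sd_old) / sd_old * 100
print(mean_old, sd_old, mean_new, sd_new, pct_mean, pct_sd)
```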
1. Therefore,
$$\begin{aligned}
E[Z] &= E[\alpha X + (1 - \alpha)Y] \\
&= \alpha E[X] + (1 - \alpha)E[Y] \\
&= \alpha\mu + (1 - \alpha)\mu = \mu.
\end{aligned}$$
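$$= \alpha^2\sigma_X^2 + (1 - \alpha)^2\sigma_Y^2$$
Taking the first order condition (FOC) with respect to $\alpha$, i.e., differentiating with respect to $\alpha$ and then equating to zero, we have:
$$\frac{\partial}{\partial\alpha}\mathrm{Var}(Z) = 2\alpha\sigma_X^2 - 2(1 - \alpha)\sigma_Y^2 = 0,$$
which gives us:
$$2\alpha\sigma_X^2 = 2(1 - \alpha)\sigma_Y^2 \quad\Rightarrow\quad \frac{\alpha}{1 - \alpha} = \frac{\sigma_Y^2}{\sigma_X^2} \quad\Rightarrow\quad \alpha = \frac{\sigma_Y^2}{\sigma_X^2 + \sigma_Y^2}.$$
You must check the second derivative to ensure this gives the minimum!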
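The first order condition and the second-derivative check can be illustrated numerically. This is a minimal sketch under assumptions: the weight is written as `alpha`, $X$ and $Y$ are taken as independent, and the variances 4 and 1 are illustrative values, not taken from the exercise.

```python
def var_z(alpha, var_x, var_y):
    """Var(alpha * X + (1 - alpha) * Y) for independent X and Y."""
    return alpha**2 * var_x + (1 - alpha) ** 2 * var_y


var_x, var_y = 4.0, 1.0                     # illustrative variances
alpha_star = var_y / (var_x + var_y)        # closed-form solution of the FOC

# Brute-force search over a grid of weights in [0, 1].
grid = [i / 1000 for i in range(1001)]
alpha_grid = min(grid, key=lambda a: var_z(a, var_x, var_y))

second_derivative = 2 * (var_x + var_y)     # positive, so the FOC gives a minimum
print(alpha_star, alpha_grid, second_derivative)
```

The grid minimiser should coincide with the closed-form weight, and the positive second derivative confirms that it is a minimum.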
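3. $\frac{X+Y}{2}$ is better than either $X$ or $Y$ if it has smaller variance than both of them, i.e.,
$$\mathrm{Var}\left(\frac{X + Y}{2}\right) < \mathrm{Var}(X) \quad\text{and}\quad \mathrm{Var}\left(\frac{X + Y}{2}\right) < \mathrm{Var}(Y).$$
Equivalently,
$$\begin{aligned}
\frac{1}{4}\left(\sigma_X^2 + \sigma_Y^2\right) < \sigma_X^2 \quad&\text{and}\quad \frac{1}{4}\left(\sigma_X^2 + \sigma_Y^2\right) < \sigma_Y^2 \\
\sigma_Y^2 < 3\sigma_X^2 \quad&\text{and}\quad \sigma_X^2 < 3\sigma_Y^2 \\
\frac{\sigma_X^2}{\sigma_Y^2} > \frac{1}{3} \quad&\text{and}\quad \frac{\sigma_X^2}{\sigma_Y^2} < 3.
\end{aligned}$$
$$\frac{1}{3} < \frac{\sigma_X^2}{\sigma_Y^2} < 3.$$
$$\mathrm{Var}(Z) = \alpha^2\sigma_X^2 + (1 - \alpha)^2\sigma_Y^2 + 2\alpha(1 - \alpha)\sigma_{X,Y}$$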
thus we have:
$$\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}.$$
where
MN (t) = e(e 1) .
t
MX (t) = and
t
Thus,
!
e ( ) 1
log t t
MS (t) = e = exp .
t
Solution 1.39: [wk04Q2, Exercise, Schedule] $X_k \sim \text{Exp}(1)$ implies that $f_{X_k}(x) = e^{-x}$ for $x \ge 0$ and zero otherwise, for $k = 1, 2, 3$. We have that:
$$F_{X_k}(x) = \begin{cases} 0, & \text{if } x < 0; \\ 1 - e^{-x}, & \text{if } x \ge 0, \end{cases}$$
for $k = 1, 2, 3$.
Let $X_{(1)} = \min\{X_1, X_2, X_3\}$ and $X_{(3)} = \max\{X_1, X_2, X_3\}$. Finding the distributions of the minimum and the maximum, we have:
$$F_{X_{(1)}}(x) = 1 - (1 - F(x))^3 = 1 - e^{-3x},$$
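and zero otherwise. Therefore, we get the joint distribution of $X_{(1)}, X_{(3)}$ by integrating over all possible values of $X_{(2)}$ as:
$$\begin{aligned}
f_{X_{(1)},X_{(3)}}(y_1, y_3) &= \int_{y_1}^{y_3} 6e^{-(y_1 + y_2 + y_3)}\,dy_2 = 6\left[-e^{-(y_1 + y_2 + y_3)}\right]_{y_1}^{y_3} \\
&= 6\left(e^{-2y_1 - y_3} - e^{-y_1 - 2y_3}\right), \quad\text{for } 0 \le y_1 \le y_3 < \infty,
\end{aligned}$$
* using $\int x\exp(cx)\,dx = \left[\frac{\exp(cx)}{c^2}(cx - 1)\right]$, and (note $\exp(a)^b = \exp(a \cdot b)$):
$$\begin{aligned}
E\left[X_{(3)}\right] &= 3\int_0^\infty y\exp(-y)\left(1 - \exp(-y)\right)^2 dy \\
&= 3\int_0^\infty \left(y\exp(-y) - 2y\exp(-2y) + y\exp(-3y)\right)dy \\
&\overset{*}{=} 3\left[\frac{\exp(-y)}{1}(-y - 1) - 2\,\frac{\exp(-2y)}{4}(-2y - 1) + \frac{\exp(-3y)}{9}(-3y - 1)\right]_0^\infty \\
&= 3\left(1 - 1/2 + 1/9\right) = \frac{11}{6}.
\end{aligned}$$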
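The value $E[X_{(3)}] = 11/6$ can be cross-checked in two ways. The sketch below (helper names, the truncation point and the step count are illustrative choices) integrates $y \cdot 3e^{-y}(1 - e^{-y})^2$ numerically, and also compares against $1 + 1/2 + 1/3$, the standard expression for the expected maximum of three independent Exp(1) variables.

```python
import math
from fractions import Fraction


def integrand(y):
    """y times the density of the maximum of three Exp(1) variables."""
    return y * 3.0 * math.exp(-y) * (1.0 - math.exp(-y)) ** 2


def expected_max(upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral over [0, upper]."""
    h = upper / steps
    return sum(integrand((i + 0.5) * h) for i in range(steps)) * h


harmonic = sum(Fraction(1, k) for k in range(1, 4))  # 1 + 1/2 + 1/3
numeric = expected_max()
print(harmonic, numeric)
```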
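4. Now, for:
$$E\left[X_{(1)}X_{(3)}\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{X,Y}(x, y)\,dy\,dx = \int_0^\infty\int_x^\infty xy \cdot 6e^{-(x+y)}\left(e^{-x} - e^{-y}\right)dy\,dx$$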
Therefore, we have:
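$$f(x, y) = f_X(x) \cdot f_Y(y) = \frac{x^{\alpha-1}e^{-x}}{\Gamma(\alpha)} \cdot \frac{y^{\beta-1}e^{-y}}{\Gamma(\beta)}.$$
The inverse of the transformation:
$$u = x + y \quad\text{and}\quad v = \frac{x}{x + y}$$
is given by:
$$x = uv \quad\text{and}\quad y = u - uv = u(1 - v).$$
This is derived by: $x = u - y \Rightarrow v = \frac{u - y}{u} \Rightarrow uv = u - y \Rightarrow -y = (v - 1)u \Rightarrow y = (1 - v)u$, and then $x = u - u(1 - v) \Rightarrow x = uv$.
Its Jacobian is:
$$J(u, v) = \det\begin{bmatrix} \partial h_1(u, v)/\partial u & \partial h_1(u, v)/\partial v \\ \partial h_2(u, v)/\partial u & \partial h_2(u, v)/\partial v \end{bmatrix} = \det\begin{bmatrix} v & u \\ 1 - v & -u \end{bmatrix} = -uv - u(1 - v) = -u.$$
Thus $|J(u, v)| = u$, because $0 < u < \infty$. By the Jacobian transformation technique, the joint density of $U$ and $V$ is: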
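3. Use $e^{-uv}e^{-u(1-v)} = e^{-u}$; then we can further simplify the joint density as:
$$f_{UV}(u, v) = \underbrace{\frac{1}{\Gamma(\alpha)}\frac{1}{\Gamma(\beta)}}_{\text{constant}}\;\underbrace{u^{\alpha+\beta-1}e^{-u}}_{\text{function of } u \text{ alone}}\;\underbrace{v^{\alpha-1}(1 - v)^{\beta-1}}_{\text{function of } v \text{ alone}}.$$
Thus, we see that we can express the joint density as a product of functions of $u$ alone and $v$ alone, i.e., $f_{U,V}(u, v) = f_U(u) \cdot f_V(v)$. Therefore, $U$ and $V$ are independent.
4. Note $x, y \ge 0$, thus $0 \le \frac{X}{X+Y} \le \frac{X}{X} = 1$. For the marginal of $U$, we have
$$\begin{aligned}
f_U(u) &= \int_0^1 \frac{u^{\alpha+\beta-1}e^{-u}\,v^{\alpha-1}(1 - v)^{\beta-1}}{\Gamma(\alpha)\Gamma(\beta)}\,dv \\
&= \frac{u^{\alpha+\beta-1}e^{-u}}{\Gamma(\alpha + \beta)}\underbrace{\int_0^1 \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\,v^{\alpha-1}(1 - v)^{\beta-1}\,dv}_{\text{density of a Beta}(\alpha, \beta)\; = 1} \\
&= \frac{u^{(\alpha+\beta)-1}e^{-u}}{\Gamma(\alpha + \beta)}, \quad\text{for } u > 0
\end{aligned}$$
and zero otherwise. This is the density of a Gamma$(\alpha + \beta, 1)$. This reinforces the result in (a).
Note: $0 \le X + Y \le \infty$. For the marginal of $V$, we have:
$$\begin{aligned}
f_V(v) &= \int_0^\infty \frac{u^{\alpha+\beta-1}e^{-u}\,v^{\alpha-1}(1 - v)^{\beta-1}}{\Gamma(\alpha)\Gamma(\beta)}\,du \\
&= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\,v^{\alpha-1}(1 - v)^{\beta-1}\underbrace{\int_0^\infty \frac{u^{\alpha+\beta-1}e^{-u}}{\Gamma(\alpha + \beta)}\,du}_{\text{density of a Gamma}(\alpha+\beta, 1)\; = 1} \\
&= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\,v^{\alpha-1}(1 - v)^{\beta-1}, \quad\text{for } 0 < v < 1
\end{aligned}$$
and zero otherwise. This is the density of a Beta$(\alpha, \beta)$.
so that
$$E[V] = \frac{\alpha}{\alpha + \beta}.$$
Similarly, we have for the variance:
$$\begin{aligned}
\mathrm{Var}(X) &= \mathrm{Var}(UV) = E\left[U^2V^2\right] - (E[UV])^2 \\
&= E\left[U^2\right]E\left[V^2\right] - \alpha^2
\end{aligned}$$
so that
$$E\left[V^2\right] = \frac{\alpha + \alpha^2}{E\left[U^2\right]} = \frac{\alpha + \alpha^2}{(\alpha + \beta)(1 + \alpha + \beta)}.$$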
Solution 1.41: [wk04Q4, Exercise, Schedule] $X_k \sim \text{Exp}(1)$ implies that $f_{X_k}(x) = e^{-x}$ for $x \ge 0$ and zero otherwise, for $k = 1, 2, 3$. We have that:
$$F_{X_k}(x) = \begin{cases} 0, & \text{if } x < 0; \\ 1 - e^{-x}, & \text{if } x \ge 0, \end{cases}$$
for $k = 1, 2, 3$.
Let $X_{(1)} = \min\{X_1, X_2, X_3\}$ and $X_{(3)} = \max\{X_1, X_2, X_3\}$. Finding the distributions of the minimum and the maximum, we have:
$$F_{X_{(1)}}(x) = 1 - (1 - F(x))^3 = 1 - e^{-3x},$$
for x 0 and zero otherwise. So that:
and zero otherwise. The joint distribution of X(1) , X(2) , X(3) is given by:
and zero otherwise. Therefore, we get the joint distribution of X(1) , X(3) by integrating over all
possible values of X(2) as:
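$$\begin{aligned}
f_{X_{(1)},X_{(3)}}(y_1, y_3) &= \int_{y_1}^{y_3} 6e^{-(y_1 + y_2 + y_3)}\,dy_2 = 6\left[-e^{-(y_1 + y_2 + y_3)}\right]_{y_1}^{y_3} \\
&= 6\left(e^{-2y_1 - y_3} - e^{-y_1 - 2y_3}\right), \quad\text{for } 0 \le y_1 \le y_3 < \infty,
\end{aligned}$$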
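$$\begin{aligned}
E\left[X_{(1)} \mid X_{(3)} = x\right] &= \int_0^x y\,\frac{2\left(e^{-2y} - e^{-(y+x)}\right)}{(1 - e^{-x})^2}\,dy \\
&\overset{*}{=} \frac{2}{(1 - e^{-x})^2}\left(\left[\frac{e^{-2y}}{4}(-2y - 1)\right]_0^x - e^{-x}\left[e^{-y}(-y - 1)\right]_0^x\right) \\
&= \frac{2}{(1 - e^{-x})^2}\left(\frac{e^{-2x}}{4}(-2x - 1) + \frac{1}{4} + e^{-2x}(x + 1) - e^{-x}\right) \\
&= \frac{-2xe^{-2x} - e^{-2x} + 1 + 4xe^{-2x} + 4e^{-2x} - 4e^{-x}}{2(1 - e^{-x})^2} \\
&= \frac{1 - 4e^{-x} + 3e^{-2x} + 2xe^{-2x}}{2(1 - e^{-x})^2},
\end{aligned}$$
* using $\int xe^{cx}\,dx = \left[\frac{e^{cx}}{c^2}(cx - 1)\right]$.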
1. Let $S = X_1 + X_2$.
$$= e^{\frac{1}{2}t^2}\cdot e^{\frac{1}{2}t^2} = e^{\frac{1}{2}(2)t^2}$$
2. Let $D = X_1 - X_2$.
$$\begin{aligned}
M_D(t) &= E\left[e^{(X_1 - X_2)t}\right] = E\left[e^{X_1 t}\right]E\left[e^{X_2(-t)}\right] \\
&= e^{\frac{1}{2}t^2}\cdot e^{\frac{1}{2}(-t)^2} = e^{\frac{1}{2}(2)t^2}
\end{aligned}$$
which is the m.g.f. of a $N(0, 2)$. Thus, $D$ has the same distribution as $S$.
3. Now, assume that they are no longer independent and have the bivariate normal density:
$$f_{X_1,X_2}(x_1, x_2) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\exp\left(-\frac{1}{2(1 - \rho^2)}\left(x_1^2 - 2\rho x_1 x_2 + x_2^2\right)\right).$$
Using the Jacobian transformation technique, we find the joint distribution of $S$ and $D$. From
$S = X_1 + X_2$ and $D = X_1 - X_2$
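which is derived by $X_1 = S - X_2 \Rightarrow D = S - X_2 - X_2 \Rightarrow X_2 = \frac{S - D}{2} \Rightarrow X_1 = S - \frac{S - D}{2} = \frac{S + D}{2}$. Its Jacobian is:
$$J(S, D) = \det\begin{bmatrix} \partial\left((S + D)/2\right)/\partial S & \partial\left((S + D)/2\right)/\partial D \\ \partial\left((S - D)/2\right)/\partial S & \partial\left((S - D)/2\right)/\partial D \end{bmatrix} = \det\begin{bmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{bmatrix} = -1/4 - 1/4 = -1/2.$$
$$\begin{aligned}
f_{S,D}(s, d) &= \frac{1}{2\pi\sqrt{1 - \rho^2}}\exp\left(-\frac{1}{2(1 - \rho^2)}\left(\left(\tfrac{1}{2}(s + d)\right)^2 - 2\rho\,\tfrac{1}{2}(s + d)\,\tfrac{1}{2}(s - d) + \left(\tfrac{1}{2}(s - d)\right)^2\right)\right)\cdot\frac{1}{|-2|} \\
&= \frac{1}{4\pi\sqrt{1 - \rho^2}}\exp\left(-\frac{1}{8(1 - \rho^2)}\left(s^2 + d^2 - 2\rho\left(s^2 - d^2\right) + s^2 + d^2\right)\right) \\
&= \frac{1}{4\pi\sqrt{1 - \rho^2}}\exp\left(-\frac{1}{4(1 - \rho^2)}\left((1 - \rho)s^2 + (1 + \rho)d^2\right)\right) \\
&= \frac{1}{4\pi\sqrt{1 - \rho^2}}\exp\left(-\frac{s^2}{4(1 + \rho)}\right)\exp\left(-\frac{d^2}{4(1 - \rho)}\right)
\end{aligned}$$
Therefore, clearly we can write the density as a product of functions of $s$ alone and $d$ alone. $S$ and $D$ are therefore independent.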
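(a) The transformation $g(X) = 1/X$ is a monotonic decreasing function for $x > 0$, because $\frac{dg(x)}{dx} = \frac{d\,\frac{1}{x}}{dx} = -x^{-2} < 0$ for $x > 0$. Hence, we can apply the CDF technique, with $g(X) = 1/X$, $g^{-1}(Y) = 1/Y$, and $\frac{\partial g^{-1}(y)}{\partial y} = \frac{\partial\,\frac{1}{y}}{\partial y} = -y^{-2} < 0$, support of $Y$: $g(0) = \infty$, $g(\infty) = 0$ we have:
$$\begin{aligned}
f_Y(y) &= f_X\left(g^{-1}(y)\right)\left|\frac{\partial g^{-1}(y)}{\partial y}\right| \\
&= \frac{\beta^{\alpha}}{\Gamma(\alpha)}\left(\frac{1}{y}\right)^{\alpha-1} e^{-\beta/y}\,y^{-2} \\
&= \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,y^{-\alpha+1}\,e^{-\beta/y}\,y^{-2} \\
&= \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,y^{-\alpha-1}\,e^{-\beta/y}
\end{aligned}$$
for $y > 0$ and zero otherwise.
(b) The c.d.f. of the inverse gamma distribution, as a function of the c.d.f. of the gamma distribution, is given by applying the CDF technique: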
Solution 1.43: [wk05Q14, Exercise, Schedule] The p.d.f. of a chi-squared distribution with one degree of freedom:
$$f_Y(y) = \frac{\exp(-y/2)}{\sqrt{2\pi y}}, \quad\text{if } y > 0,$$
and zero otherwise. We need to prove that the moment generating function of $Y$ is given by:
$$M_Y(t) = (1 - 2t)^{-1/2}.$$
Using the transformation $x = \sqrt{-2y(t - 1/2)}$ and thus $dy = \frac{2y^{1/2}}{\sqrt{-2(t - 1/2)}}\,dx$ we have:
$$\begin{aligned}
M_Y(t) &= \int_{-\infty}^{\infty} e^{ty} f_Y(y)\,dy = \int_0^\infty \exp(ty)\,\frac{\exp(-y/2)}{\sqrt{2\pi y}}\,dy \\
&= \int_0^\infty \frac{\exp(y(t - 1/2))}{\sqrt{2\pi y}}\,dy \\
&= \frac{2}{\sqrt{-2(t - 1/2)}}\int_0^\infty \frac{\exp(-x^2/2)}{\sqrt{2\pi}}\,dx \\
&\overset{*}{=} \frac{2}{\sqrt{-2(t - 1/2)}}\cdot\frac{1}{2} \\
&= \frac{1}{\sqrt{-2(t - 1/2)}} = \left(2(-t + 1/2)\right)^{-0.5} = (1 - 2t)^{-0.5}
\end{aligned}$$
* using that $\int_0^\infty \frac{\exp(-x^2/2)}{\sqrt{2\pi}}\,dx$ is the integral of the p.d.f. of a standard normal distributed random variable over the positive values of $x$. Due to the symmetry of the standard normal distribution about 0, this integral equals 1/2.
$$F_{X_n}(x) \to F_X(x) \quad\text{as } n \to \infty.$$
This implies that one can use the cumulative distribution function of the Student-t distribution and the standard normal distribution to prove the convergence. However, these do not have a closed form expression. Therefore, we will prove that the probability density function of a Student-t distribution is the same as the standard normal one when $n \to \infty$. When the probability density function converges, the cumulative distribution function must also converge.
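We have:
$$\begin{aligned}
\lim_{n\to\infty} f_{t|n}(x) &= \lim_{n\to\infty} \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{x^2}{n}\right)^{-(n+1)/2} \\
&\overset{*}{=} \lim_{n\to\infty} \sqrt{\frac{n}{2}}\,\frac{1}{\sqrt{n\pi}}\left(1 + \frac{x^2}{n}\right)^{-(n+1)/2} \\
&= \lim_{n\to\infty} \frac{1}{\sqrt{2\pi}}\left(1 + \frac{x^2/2}{n/2}\right)^{-n/2 - 1/2} \\
&= \frac{1}{\sqrt{2\pi}}\lim_{n\to\infty} \frac{1}{\left(1 + \frac{x^2/2}{n/2}\right)^{n/2}}\cdot\frac{1}{\sqrt{1 + \frac{x^2/2}{n/2}}} \\
&\overset{**}{=} \frac{1}{\sqrt{2\pi}}\,\frac{1}{e^{\frac{1}{2}x^2}}\lim_{n\to\infty} \frac{1}{\sqrt{1 + \frac{x^2/2}{n/2}}} \\
&\overset{***}{=} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}x^2},
\end{aligned}$$
which is the probability density function of a standard normal random variable, * using $\Gamma\left(\frac{n+1}{2}\right)/\Gamma\left(\frac{n}{2}\right) \approx \sqrt{n/2}$ as $n \to \infty$, ** using $e^a = \lim_{n\to\infty}\left(1 + \frac{a}{n}\right)^n$, and *** using $\lim_{n\to\infty} \frac{1}{\sqrt{1 + \frac{x^2/2}{n/2}}} = 1$.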
V =G U = n1 F V/n2 = n1 F G/n2 .
* using independence between U and V, ** using inverse transformation, determined in step ii), and
*** using exp(ga) exp(gb) = exp(g(a + b)) and ab ac = ab+c .
I. C
A t-distribution is obtained by a standard normal r.v. divided by the square root of a chi-squared r.v. divided by its degrees of freedom.
We have $Z_1 + Z_2 \sim N(0, 2)$, i.e., $\frac{Z_1 + Z_2}{\sqrt{2}} \sim N(0, 1)$ (see lecture notes).
For a chi-squared distribution we have the m.g.f.: $M_{V_i}(t) = (1 - 2t)^{-r_i/2}$ for $i = 1, 2$. Hence,
$$\begin{aligned}
M_{V_1}(t) \cdot M_{V_2}(t) &= (1 - 2t)^{-r_1/2} \cdot (1 - 2t)^{-r_2/2} \\
&= (1 - 2t)^{-(r_1 + r_2)/2},
\end{aligned}$$
which is the m.g.f. of a chi-squared distribution with $r_1 + r_2$ degrees of freedom. Hence, $V_1 + V_2$ has a chi-squared distribution with $r_1 + r_2$ degrees of freedom.
II. E
See lecture notes/previous question.
III. C
We have:
$$\begin{aligned}
Z_1 + Z_2 &\sim N\left(0, \mathrm{Var}(Z_1) + \mathrm{Var}(Z_2) + 2\,\mathrm{Cov}(Z_1, Z_2)\right) \\
&\sim N(0, 1 + 1 + 2 \cdot \rho \cdot 1 \cdot 1) \\
&\sim N(0, 2 + 2\rho) \\
&\sim N(0, 2(1 + \rho)).
\end{aligned}$$
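Thus $\mathrm{Var}\left(\frac{Z_1 + Z_2}{\sqrt{2}}\right) = \frac{2(1 + \rho)}{2} = 1 + \rho \ne 1$.
IV. D
We have $M_{X_k}(t) = (1 - t/\lambda)^{-1}$ for $k = 1, \ldots, n$. Let $Y_k = X_k/n$, then we have: $M_{Y_k}(t) = M_{X_k/n}(t) = M_{X_k}(t/n) = \left(1 - \frac{t}{n\lambda}\right)^{-1}$ for $k = 1, \ldots, n$.
Using the m.g.f. technique we determine the distribution of the sample mean by the m.g.f.:
$$\begin{aligned}
M_{\bar{X}}(t) &= M_{Y_1}(t) \cdot \ldots \cdot M_{Y_n}(t) \\
&= \left(1 - \frac{t}{n\lambda}\right)^{-n}
\end{aligned}$$
which is the m.g.f. of a Gamma distribution with parameters $n$ and $n\lambda$.
V. D
Use the m.g.f. technique. $M_{X_k}(t) = \exp(\lambda(\exp(t) - 1))$ for $k = 1, \ldots, n$. We have:
$$\begin{aligned}
M_S(t) &= M_{X_1}(t) \cdot \ldots \cdot M_{X_n}(t) \\
&= \prod_{k=1}^{n} \exp(\lambda(\exp(t) - 1)) \\
&= \exp(\lambda(\exp(t) - 1))^n \\
&= \exp(n\lambda(\exp(t) - 1)),
\end{aligned}$$
which is the m.g.f. of a Poisson r.v. with mean $n\lambda$.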
VI. B
We have:
Pr X(20) > 20 =1 Pr X(20) 1
=1 (F X (1))20
=1 1 exp(2) .
20
VII. D
We have:
$$F_X(x) = \begin{cases}
0, & \text{if } x \le 0; \\
\int_0^x f_X(u)\,du = \int_0^x 2u\,du = \left[u^2\right]_0^x = x^2, & \text{if } 0 < x < 1; \\
1, & \text{if } x \ge 1.
\end{cases}$$
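VIII. E
We have that $X \sim U(8.5, 10.5)$, then $f_X(x) = 1/2$ if $x \in [8.5, 10.5]$ and zero otherwise and we have:
$$F_X(x) = \begin{cases}
0, & \text{if } x < 8.5; \\
\frac{x - 8.5}{2}, & \text{if } 8.5 \le x \le 10.5; \\
1, & \text{if } x > 10.5.
\end{cases}$$
Then we have: $\Pr(\text{loser will not break world record}) = \Pr\left(X_{(8)} \ge 9.9\right) = 1 - \Pr\left(X_{(8)} < 9.9\right) = 1 - F_X(9.9)^8 = 1 - 0.7^8$.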
Module 2
Parameter Estimation
Exercise 2.2: [wk05Q5, Solution, Schedule] Consider $N$ independent random variables each having a binomial distribution with parameters $n = 3$ and $\theta$ so that:
$$\Pr(X_i = k) = \binom{3}{k}\theta^k(1 - \theta)^{n-k},$$
for $i = 1, 2, \ldots, N$ and $k = 0, 1, 2, 3$, and zero otherwise. Assume that of these $N$ random variables $n_0$ take the value 0, $n_1$ take the value 1, $n_2$ take the value 2, and $n_3$ take the value 3 with $N = n_0 + n_1 + n_2 + n_3$.
Exercise 2.3: [wk05Q6, Solution, Schedule] Assume that we have $n$ independent observations $y^{\top} = [y_1, y_2, \ldots, y_n]$, each with the Pareto p.d.f. given by:
$$f_{Y_i|\alpha}(y_i \mid \alpha; A) = \frac{\alpha A^{\alpha}}{y_i^{\alpha+1}},$$
where $0 < \alpha < \infty$ and $0 < A < y_i < \infty$, and zero otherwise. You are now told the value of $A$, leaving $\alpha$ as the only unknown parameter.
2. Explain why we can express the relationship between the posterior distribution, prior distribu-
tion and likelihood function as follows:
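3. We assume our prior pdf for $\alpha$ is such that $\log(\alpha)$ is uniformly distributed, implying:
$$\pi(\alpha) \propto \frac{1}{\alpha}, \quad 0 < \alpha < \infty.$$
Show that the posterior pdf for $\alpha$ is:
$$\pi(\alpha \mid y; A) \propto \alpha^{n-1}e^{-an\alpha},$$
where $a = \log(G/A)$.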
Exercise 2.4: [wk05Q9, Solution, Schedule] Given that there are $n$ realizations of $x_i$, where $i = 1, 2, \ldots, n$. We know that $x_i \mid p \sim \text{Ber}(p)$ and $p \sim U(0, 1)$.
3. Why might we be interested in the Bayesian estimator for p(1 p)? Hint: consider the case
when n is large.
Exercise 2.5: [wk05Q12, Solution, Schedule] Suppose that X follows a geometric distribution, with
probability mass function:
Exercise 2.6: [wk05Q13, Solution, Schedule] The Pareto distribution is often used in economics as
a model for a density function with a slowly decaying tail. Its density is given by:
fX (x|) = x0 x1 , x x0 , > 1,
and zero otherwise. Assume that x0 > 0 is given and that x1 , . . . , xn is a sample from this distribution.
p
Prove X in probability.
Exercise 2.9: [wk05Q4, Solution, Schedule] A drunkard executes a random walk in the following manner: each minute, he takes a step north or south, with probability $\frac{1}{2}$ each, and his successive step directions are independent. Each step he takes is of length 50 cm. Use the central limit theorem to approximate the probability distribution of his location after one hour. Where is he most likely to be?
2. show that as , the gamma distribution with parameters and , properly standardised,
tends to the standard Normal distribution.
is said to have a Cauchy distribution. It is well-known that for the Cauchy distribution, the mean does not exist. Furthermore, suppose $X_1, X_2, \ldots, X_n$ are $n$ independent Cauchy random variables, then it can be shown that the sample mean:
$$\bar{X}_n = \frac{1}{n}\sum_{k=1}^{n} X_k$$
also has a Cauchy distribution (see footnote 1). Deduce from these results that the Cauchy distribution violates the law of large numbers. Explain why.
Exercise 2.12: [wk05Q10, Solution, Schedule] Let X1 , X2 , . . . be independent random variables with
common density:
fX (x) = x(+1) , for x > 1,
Footnote 1: Proofs of these results are not expected for this course.
Exercise 2.13: [wk05Q11, Solution, Schedule] (Problem from [JR]) Suppose that X1 , X2 , . . . , X20 are
independent random variables with density functions:
and zero otherwise. Let $S = X_1 + \ldots + X_{20}$. Use the central limit theorem to approximate $\Pr(S \le 10)$.
Exercise 2.15: [wk06Q2, Solution, Schedule] Consider a random sampling from a normal distribution with mean $\mu$ and variance $\sigma^2$. Derive a $100(1 - \alpha)\%$ confidence interval of $\sigma^2$ when $\mu$ is known.
Exercise 2.16: [wk06Q3, Solution, Schedule] This exercise aims to show that if we sample from a continuous distribution, a pivotal quantity always exists. Let $X_1, X_2, \ldots, X_n$ be a random sample from a continuous distribution $f_X(x \mid \theta)$. Denote the corresponding cumulative distribution function by:
$$F_X(x \mid \theta) = \int_{-\infty}^{x} f_X(z \mid \theta)\,dz.$$
(b) Show that $W = \log(1/F_X(X \mid \theta))$ has an exponential distribution with mean 1. To do so, first find the c.d.f. of $W$.
(c) From (b), deduce that $\sum_{k=1}^{n} \log(1/F_X(X_k \mid \theta))$ has a Gamma distribution. Specify its parameters.
(d) Use (c) to prove that there will always be a pivotal quantity when sampling from a continuous distribution.
Exercise 2.17: [wk06Q4, Solution, Schedule] (modified based on a past Institute of Actuaries exam.)
Let X1 , X2 , . . . , Xn denote a random sample of a Gamma(3, ) and X is the sample mean.
(b) Use (a) to construct a lower 95% confidence interval for , of the form (0, U) .
(c) Use (a) to construct an upper 95% confidence interval for , of the form (L, ).
(d) Use (a) to construct a 95% confidence interval for , of the form (L, U) where L and U are not
necessarily equal to those found in (b) and (c).
(e) Evaluate the intervals in (b), (c) and (d) in the case for which the total of a random sample of 20 observations yielded a value of $\sum_{k=1}^{20} x_k = 98.2$.
Exercise 2.18: [wk06Q5, Solution, Schedule] A local health club advertises that its members lose at
least 10 pounds on the average during a 30-day weight loss programme. After receiving a number
of complaints from people who were enticed to join the club, the Better Business Bureau sends out a
representative to the club to check out the claim. The representative sampled the following nine (9)
people who are enrolled in the program:
The representative of Better Business Bureau reported its findings in terms of a confidence interval.
Construct the appropriate 95% confidence interval for the average weight loss for participants in the
programme.
Exercise 2.19: [wk06Q6, Solution, Schedule] (Past Institute of Actuaries Exam Question) Independent random samples of size $n_1$ and $n_2$ are taken from the normal populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. Let the sample means be $\bar{X}_1$ and $\bar{X}_2$ and the sample variances be $S_1^2$ and $S_2^2$. You may assume that $\bar{X}_l$ and $S_l^2$, $l = 1, 2$ are independent and distributed as follows:
$$\bar{X}_k \sim N\left(\mu_k, \frac{\sigma_k^2}{n_k}\right) \quad\text{and}\quad \frac{(n_k - 1)S_k^2}{\sigma_k^2} \sim \chi^2(n_k - 1) \quad\text{for } k = 1, 2.$$
(a) It is required to construct a confidence interval for $(\mu_1 - \mu_2)$, the difference between the population means.
i. Suppose that $\sigma_1^2$ and $\sigma_2^2$ are known. State the distribution of $\bar{X}_1 - \bar{X}_2$ and write down a suitable pivotal quantity together with its sampling distribution. Hence, write down a 95% confidence interval for $(\mu_1 - \mu_2)$.
ii. Suppose that $\sigma_1^2$ and $\sigma_2^2$ are unknown but are known to be equal. State the definition of a $t_k$ variable in terms of independent $N(0, 1)$ and $\chi_k^2$ variables and use it to develop a suitable pivotal quantity. Hence, write down a 95% confidence interval for $(\mu_1 - \mu_2)$.
(b) It is required to construct a confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$, the ratio of the population variances. State the definition of an $F_{k,l}$ variable in terms of independent $\chi_k^2$ and $\chi_l^2$ variables and use it to develop a suitable pivotal quantity. Hence, obtain a 90% confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$.
(c) A regional newspaper included a consumer rights article comparing the cost of shopping in
corner shops and supermarkets. The researchers investigated the price of a standard se-
lection of household goods in a sample of 10 corner shops selected at random from the region,
and in a sample of 10 supermarkets selected at random from the region. The data yielded the
following values:
Sample Mean Sample S.D.
Corner Shops 22.55 1.22
Supermarkets 19.72 0.96
i. Use the result in part (a)(ii) to calculate a 95% confidence interval for $(\mu_1 - \mu_2)$, the difference between the population means (1 = corner shops, 2 = supermarkets).
ii. Use your result in part (b) to calculate a 90% confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$, the ratio of the population variances. Use this result to comment briefly on the assumption of equal variances required for the confidence interval in part (c)(i).
Exercise 2.20: [wk06Q7, Solution, Schedule] (IoA, Subject CT3, April 2005, No.6) In a survey
conducted by a mail order company a random sample of 200 customers yielded 172 who indicated
that they were highly satisfied with the delivery time of their orders.
Calculate an approximate 95% confidence interval for the proportion of the company's customers
who are highly satisfied with delivery times.
Exercise 2.21: [wk06Q8, Solution, Schedule] (IoA, Subject CT3, April 2005, No.8) The distribution
of claim size under a certain class of policy is modelled as a normal random variable, and previous
years' records indicate that the standard deviation is 120.
(a) Calculate the width of a 95% confidence interval for the mean claim size if a sample of size 100
is available.
(b) Determine the minimum sample size required to ensure that a 95% confidence interval for the
mean claim size is of width at most 10.
(c) Comment briefly on the comparison of the confidence intervals in (a) and (b) with respect to
widths and sample sizes used.
Exercise 2.22: [wk06Q9, Solution, Schedule] (IoA, Subject CT3, April 2005, No.12 (partial))
1. A random variable $Y$ has a Poisson distribution with parameter $\lambda$ but there is a restriction that
zero counts cannot occur. The distribution of $Y$ in this case is referred to as the zero-truncated
Poisson distribution.
(a) Show that the probability function of $Y$ is given by:
$$p_Y(y) = \frac{\lambda^y e^{-\lambda}}{y!\,(1 - e^{-\lambda})}, \quad \text{for } y = 1, 2, 3, \ldots,$$
and zero otherwise.
(b) Show that $E[Y] = \dfrac{\lambda}{1 - e^{-\lambda}}$.
2. Answer the following.
(a) Let $y_1, \ldots, y_n$ denote a random sample from the zero-truncated Poisson distribution. Show
that the maximum likelihood estimate of $\lambda$ may be determined by the solution to the
following equation:
$$\bar{y} - \lambda - \frac{\lambda e^{-\lambda}}{1 - e^{-\lambda}} = 0,$$
and deduce that the maximum likelihood estimate is the same as the method of moments
estimate.
(b) Obtain an expression for the Cramér-Rao lower bound for the variance of an unbiased
estimator of $\lambda$.
Exercise 2.23: [wk06Q10, Solution, Schedule] (IoA, Subject 101, April 2004, No.12) For the estimation
of a Bernoulli probability $p = \Pr(\text{success})$, a series of $n$ independent trials is performed and
$X$ represents the number of successes observed.
(a) Write down the likelihood function $L(p; x)$ and show that the maximum likelihood estimator
(MLE) of $p$ is $\hat{p} = X/n$.
(c) In order to develop an approximate 95% confidence interval for $p$ for large $n$, the following
pivotal quantity is to be used:
$$\frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} \sim N(0, 1).$$
Assuming that this pivotal quantity is monotonic in $p$, show that rearrangement of the inequality:
$$-1.96 < \frac{\hat{p} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} < 1.96$$
leads to a quadratic inequality in $p$, and hence determine an approximate 95% confidence interval
for $p$.
(d) A simpler and more widely used approximate confidence interval is obtained by using the
following pivotal quantity:
$$\frac{\hat{p} - p}{\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}} \sim N(0, 1).$$
Determine the resulting approximate 95% confidence interval using this.
In each case calculate the two approximate confidence intervals from parts (c) and (d) and
comment briefly on your answers.
Exercise 2.24: [wk06Q11, Solution, Schedule] A random sample of 16 values, x1 , x2 , . . . , x16 , was
drawn from a normal population and gave the following summary statistics:
$$\sum_{i=1}^{16} x_i = 51.2, \qquad \sum_{i=1}^{16} x_i^2 = 243.19$$
Exercise 2.25: [wk06Q12, Solution, Schedule] Consider a random sample of size $n$ from a normal
distribution $N(\mu, \sigma^2)$ and let $S^2$ denote the sample variance.
1. State the sampling distribution for $\dfrac{(n - 1) S^2}{\sigma^2}$ and specify an approximate sampling distribution
for this expression when $n$ is large.
2. For $n = 101$ calculate an approximate value for the probability that $S^2$ exceeds $\sigma^2$ by more than
a factor of 10%, i.e. $\Pr(S^2 > 1.1\sigma^2)$.
Exercise 2.26: [wk06Q13, Solution, Schedule] A group of 500 insurance policies gave rise to a total
of 83 claims during the last year. Assuming a Poisson model for the occurrence of claims, calculate
an approximate 95% confidence interval for $\lambda$, the claim rate per policy per year.
Exercise 2.27: [wk06Q14, Solution, Schedule] Let $X_i$, $i = 1, \ldots, n$ denote a random sample of size
$n$ from a population with a uniform distribution on the interval $(0, \theta)$. Let $X_{(n)} = \max\{X_1, \ldots, X_n\}$ and
define $U = (1/\theta) X_{(n)}$.
2. Because the distribution of $U$ does not depend on $\theta$, $U$ is a pivotal quantity. Find the 95% lower
confidence bound for $\theta$.
Solutions
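Solution 2.1: [wk05Q2, Exercise, Schedule] To find an estimator for $\theta$ using the method of moments,
let $E[X] = \bar{X}$. We then have:
$$\begin{aligned}
\bar{X} = E[X] &= \int x f_X(x)\,dx \\
&= \int_0^{\theta} x\,\frac{2(\theta - x)}{\theta^2}\,dx \\
&= \frac{2}{\theta^2} \int_0^{\theta} \left(\theta x - x^2\right) dx \\
&= \frac{2}{\theta^2} \left[\frac{\theta x^2}{2} - \frac{x^3}{3}\right]_0^{\theta} \\
&= \frac{2}{\theta^2} \left(\frac{\theta^3}{2} - \frac{\theta^3}{3}\right) \\
&= \frac{\theta}{3}.
\end{aligned}$$
Hence, the method of moments estimate is:
$$\hat{\theta} = 3\bar{X}.$$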
Solution 2.2: [wk05Q5, Exercise, Schedule] Consider $N$ independent random variables each having
a binomial distribution with parameters $n = 3$ and $\theta$ so that $\Pr(X_i = k) = \binom{3}{k} \theta^k (1 - \theta)^{n-k}$, for $i =
1, 2, \ldots, N$ and $k = 0, 1, 2, 3$. Assume that of these $N$ random variables $n_0$ take the value 0, $n_1$ take the
value 1, $n_2$ take the value 2, and $n_3$ take the value 3 with $N = n_0 + n_1 + n_2 + n_3$.
4. We have that $\pi(\theta \mid y; A) \propto \theta^{n-1} \exp(-\theta n a)$ or, equivalently, there exists some constant $c < \infty$
for which $\pi(\theta \mid y; A) = c\,\theta^{n-1} \exp(-\theta n a)$. We need to determine the constant $c$. We know that
$\int \pi(\theta \mid y; A)\,d\theta = 1$, because otherwise it is not a posterior density.
Given this observation, we are going to compare $c\,\theta^{n-1} \exp(-\theta n a)$ with the p.d.f. of
$X \sim \text{Gamma}(\alpha_x, \beta_x)$, which is given by:
$$f_X(x) = \frac{\beta_x^{\alpha_x}}{\Gamma(\alpha_x)}\, x^{\alpha_x - 1} e^{-\beta_x x}.$$
5. The Bayesian estimator of $\theta$ is the expected value of the posterior. The posterior has a $Z \sim \text{Gamma}(n, an)$
distribution. We have that $E[Z] = \frac{n}{na}$. Thus:
$$\hat{\theta}^B = E[\theta \mid y; A] = \frac{n}{na} = \frac{1}{a}.$$
Thus the Bayesian estimator of $\theta$ is $\frac{1}{a}$.
Solution 2.4: [wk05Q9, Exercise, Schedule] We are given $n$ realizations $x_i$, where $i =
1, 2, \ldots, n$. We know that $x_i \mid p \sim \text{Ber}(p)$ and $p \sim U(0, 1)$. We are asked to find the Bayesian
estimators for $p$ and $p(1 - p)$. Since the $n$ random variables are independent:
$$f(x_1, x_2, \ldots, x_n \mid p) = \prod_{i=1}^{n} f(x_i \mid p) = p^{\sum_{i=1}^{n} x_i} (1 - p)^{n - \sum_{i=1}^{n} x_i}$$
$$\hat{p}^B = E[p \mid X] = \frac{\sum_{i=1}^{n} x_i + 1}{n + 2}.$$
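2. Now we wish to find a Bayesian estimator for $p(1 - p)$. Using a similar idea, and writing
$a = \sum_{i=1}^{n} x_i + 1$ and $b = n + 1 - \sum_{i=1}^{n} x_i$ (so that $a + b = n + 2$):
$$\begin{aligned}
\widehat{(p(1 - p))}^B &= E[p(1 - p) \mid X] \\
&= \int_0^1 p(1 - p)\, f(p \mid x_1, x_2, \ldots, x_n)\,dp \\
&= \frac{\Gamma(n + 2)}{\Gamma(a)\Gamma(b)} \int_0^1 p^{1 + \sum_{i=1}^{n} x_i} (1 - p)^{n + 1 - \sum_{i=1}^{n} x_i}\,dp \\
&= \frac{\Gamma(n + 2)}{\Gamma(a)\Gamma(b)} \cdot \frac{\left(a\,\Gamma(a)\right)\left(b\,\Gamma(b)\right)}{(n + 3)(n + 2)\,\Gamma(n + 2)} \\
&= \frac{\left(\sum_{i=1}^{n} x_i + 1\right)\left(n + 1 - \sum_{i=1}^{n} x_i\right)}{(n + 3)(n + 2)}.
\end{aligned}$$
Equivalently, using $E[p(1 - p) \mid X] = E[p \mid X] - E[p^2 \mid X]$ and the moments of the Beta$(a, b)$ posterior:
$$\begin{aligned}
E[p(1 - p) \mid X] &= \frac{\sum_{i=1}^{n} x_i + 1}{n + 2} - \frac{\Gamma(a + b)\,\Gamma(a + 2)}{\Gamma(a)\,\Gamma(a + b + 2)} \\
&= \frac{\sum_{i=1}^{n} x_i + 1}{n + 2} - \frac{(a + 1)\,a}{(a + b + 1)(a + b)} \\
&= \frac{\left(\sum_{i=1}^{n} x_i + 1\right)\left(n + 1 - \sum_{i=1}^{n} x_i\right)}{(n + 3)(n + 2)}.
\end{aligned}$$
3. We are interested in the Bayesian estimator of $p(1 - p)$, since $np(1 - p)$ is the variance of the
binomial distribution (with $n$ a known constant) and we can use this for the normal approximation.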
Solution 2.5: [wk05Q12, Exercise, Schedule] Note that $X$ can be interpreted as a geometric random
variable where $k$ is the total number of trials. Here $E[X] = \frac{1}{p}$.
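* using:
$$\begin{aligned}
\frac{a}{1 - a} &= \frac{b}{c} \\
\frac{1}{1/a - 1} &= \frac{b}{c} \\
\frac{1}{a} - 1 &= \frac{c}{b} \\
\frac{1}{a} &= \frac{c + b}{b} \\
a &= \frac{b}{b + c}.
\end{aligned}$$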
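Solution 2.6: [wk05Q13, Exercise, Schedule] For the Pareto distribution with parameters $x_0$ and
$\alpha$ we have the following p.d.f.:
$$f_X(x) = \alpha\, x_0^{\alpha}\, x^{-\alpha - 1}, \quad \text{for } x \geq x_0,$$
and zero otherwise. The expected value of the random variable $X$ is then given by:
$$\begin{aligned}
E[X] = \int x f_X(x)\,dx &= \int_{x_0}^{\infty} x\, \alpha\, x_0^{\alpha}\, x^{-\alpha - 1}\,dx \\
&= \alpha\, x_0^{\alpha} \int_{x_0}^{\infty} x^{-\alpha}\,dx \\
&= \alpha\, x_0^{\alpha} \left[\frac{x^{-\alpha + 1}}{-\alpha + 1}\right]_{x_0}^{\infty} \\
&= \alpha\, x_0^{\alpha}\, \frac{x_0^{-\alpha + 1}}{\alpha - 1} \\
&= \frac{\alpha}{\alpha - 1}\, x_0.
\end{aligned}$$
1. Given $x_0$, we have $E[X] = \frac{\alpha}{\alpha - 1} x_0$, thus:
$$\begin{aligned}
\frac{\alpha}{\alpha - 1}\, x_0 &= \bar{X} \\
\alpha x_0 &= \bar{X}(\alpha - 1) \\
\alpha x_0 &= \alpha \bar{X} - \bar{X} \\
\bar{X} &= \alpha \left(\bar{X} - x_0\right) \\
\hat{\alpha} &= \frac{\bar{X}}{\bar{X} - x_0}.
\end{aligned}$$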
Solution 2.7: [wk05Q1, Exercise, Schedule] We are given that $X \sim \text{Exp}(1/5000)$. Thus, $E[X] =
5000$ and $\text{Var}(X) = (5000)^2$. Let $S = X_1 + \ldots + X_{100}$. Then $E[S] = 100(5000) = 500{,}000$ and
$\text{Var}(S) = 100(5000)^2$. Thus, using the central limit theorem, we have:
$$\Pr\left(S > 100(5050)\right) = \Pr\left(\frac{S - E(S)}{\sqrt{\text{Var}(S)}} > \frac{100(50)}{10(5000)}\right) \approx \Pr(Z > 0.10) = 1 - 0.5398 = 0.4602.$$
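The central limit theorem calculation above can be checked numerically. The following is a minimal Python sketch, using only the standard library; the function name `clt_tail_probability` is illustrative and not part of the course material.

```python
from math import sqrt
from statistics import NormalDist


def clt_tail_probability(n, mean, variance, threshold):
    """Approximate Pr(S > threshold) for a sum S of n i.i.d. terms via the CLT."""
    # Standardise the threshold using E[S] = n * mean and Var(S) = n * variance.
    z = (threshold - n * mean) / sqrt(n * variance)
    return 1.0 - NormalDist().cdf(z)


# Exp(1/5000) claims: mean 5000, variance 5000^2, n = 100, threshold 100 * 5050.
p = clt_tail_probability(100, 5000.0, 5000.0 ** 2, 100 * 5050.0)
print(round(p, 4))
```

The standardised threshold here is 0.10, matching the hand calculation.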
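Solution 2.8: [wk05Q3, Exercise, Schedule] To prove $\bar{X}_n \xrightarrow{p} \mu$ (convergence in probability), we show that if we
take any $\varepsilon > 0$, we must have:
$$\Pr\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) \to 0, \quad \text{as } n \to \infty,$$
or, equivalently:
$$\lim_{n \to \infty} \Pr\left(\left|\bar{X}_n - \mu\right| > \varepsilon\right) = 0.$$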
Solution 2.9: [wk05Q4, Exercise, Schedule] Let $L$ be the location after one hour (or 60 minutes).
Therefore:
$$L = X_1 + \ldots + X_{60},$$
where
$$X_k = \begin{cases} 50 \text{ cm}, & \text{w.p. } \frac{1}{2} \\ -50 \text{ cm}, & \text{w.p. } \frac{1}{2}, \end{cases}$$
so that $E[X_k] = 0$ and $\text{Var}(X_k) = 2500$.
Therefore,
$$E[L] = 0 \quad \text{and} \quad \text{Var}(L) = 60(2500) = 150000.$$
Thus, using the central limit theorem, we have:
$$\Pr(L \leq x) = \Pr\left(\frac{L - E[L]}{\sqrt{\text{Var}(L)}} \leq \frac{x}{\sqrt{150000}}\right) \approx \Pr\left(Z \leq \frac{x}{100\sqrt{15}}\right).$$
In other words,
$$L \sim N(0, 150000)$$
approximately. The mean of a normal distribution is also its mode, therefore the most likely position after one
hour is 0, the point where he started.
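Solution 2.10: [wk05Q7, Exercise, Schedule] We use moment generating functions to show that:
1. The binomial tends to the Poisson: Let $X \sim \text{Binomial}(n, p)$. Its m.g.f. is therefore:
$$\begin{aligned}
M_X(t) &= \left(1 - p + p e^t\right)^n \qquad \text{let } np = \lambda \text{ so that } p = \lambda/n \\
&= \left(1 - \frac{\lambda}{n} + \frac{\lambda e^t}{n}\right)^n \\
&= \left(1 + \frac{\lambda\left(e^t - 1\right)}{n}\right)^n
\end{aligned}$$
and by taking the limit on both sides, we have:
$$\lim_{n \to \infty} M_X(t) = \lim_{n \to \infty} \left(1 + \frac{\lambda\left(e^t - 1\right)}{n}\right)^n = \exp\left(\lambda\left(e^t - 1\right)\right),$$
which is the moment generating function of a Poisson with mean $\lambda$.
2. The gamma, properly standardized, tends to the Normal: Let $X \sim \text{Gamma}(\alpha, \beta)$ so that its density
is of the form:
$$f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-\beta x}, \quad \text{for } x \geq 0.$$
Its mean and variance are, respectively, $\alpha/\beta$ and $\alpha/\beta^2$. These results have been derived in lecture
week 2. Consider the standardized Gamma random variable:
$$Y = \frac{X - E(X)}{\sqrt{\text{Var}(X)}} = \frac{X - \alpha/\beta}{\sqrt{\alpha/\beta^2}} = \frac{\beta X - \alpha}{\sqrt{\alpha}} = \frac{\beta X}{\sqrt{\alpha}} - \sqrt{\alpha}.$$
Its m.g.f. is:
$$\begin{aligned}
M_Y(t) &= e^{-\sqrt{\alpha}\,t}\, E\left[e^{\beta X t/\sqrt{\alpha}}\right] = e^{-\sqrt{\alpha}\,t}\, M_X\left(\frac{\beta t}{\sqrt{\alpha}}\right) \\
&= e^{-\sqrt{\alpha}\,t} \left(1 - \frac{t}{\sqrt{\alpha}}\right)^{-\alpha} = e^{-\sqrt{\alpha}\,t}\, e^{-\alpha \log\left(1 - t/\sqrt{\alpha}\right)} \\
&= \exp\left(-\sqrt{\alpha}\,t - \alpha\left(-\frac{t}{\sqrt{\alpha}} - \frac{1}{2}\left(\frac{t}{\sqrt{\alpha}}\right)^2 + R\right)\right) \\
&= \exp\left(\frac{1}{2} t^2 + R'\right),
\end{aligned}$$
where $R$ is the Taylor series remainder term and $R'$ involves powers of $1/\sqrt{\alpha}$. Thus in the limit, $M_Y(t) \to \exp\left(\frac{1}{2} t^2\right)$ as $\alpha \to \infty$.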
Solution 2.11: [wk05Q8, Exercise, Schedule] If the law of large numbers were to hold here, it would
have had the sample mean X approaching the mean of X, which does not exist in this case. At first
glance therefore it would seem not a violation. But, in fact, it is, because the assumption of finite
mean does not hold for Cauchy and therefore the law of large numbers cannot hold.
Solution 2.12: [wk05Q10, Exercise, Schedule] The common distribution function is given by:
$$F_X(x) = \int_1^x \alpha\, u^{-(\alpha + 1)}\,du = \left[-u^{-\alpha}\right]_1^x = 1 - x^{-\alpha}, \quad \text{if } x > 1.$$
Thus, the limit exists and therefore the sequence converges in distribution. The limiting distribution is:
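Solution 2.13: [wk05Q11, Exercise, Schedule] The mean and the variance of $S$ are respectively:
$$E[S] = \frac{40}{3} \quad \text{and} \quad \text{Var}(S) = \frac{10}{9}.$$
Thus, using the central limit theorem, we have:
$$\Pr(S \leq 10) = \Pr\left(\frac{S - E[S]}{\sqrt{\text{Var}(S)}} \leq \frac{10 - (40/3)}{\sqrt{10/9}}\right) \approx \Pr\left(Z \leq -\sqrt{10}\right) = \Pr(Z \leq -3.16) = 0.0008.$$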
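or, equivalently,
$$\Pr\left(\lambda < \frac{\chi^2_{1-\alpha}(2n)}{2n\bar{X}}\right) = 1 - \alpha.$$
For $\alpha = 0.05$, the required constant is:
$$a = \frac{\chi^2_{0.95}(2n)}{2n},$$
where $\chi^2_{0.95}(2n)$ denotes the 95th percentile of a chi-squared distribution with $2n$ degrees of freedom.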
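Solution 2.15: [wk06Q2, Exercise, Schedule] Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$,
and let the $Z_k$ be standard normal random variables. If $\mu$ is known, then it is known that:
$$\left(\frac{X_k - \mu}{\sigma}\right)^2 = Z_k^2 \sim \chi^2(1),$$
so that:
$$\sum_{k=1}^{n} \left(\frac{X_k - \mu}{\sigma}\right)^2 = \sum_{k=1}^{n} Z_k^2 \sim \chi^2(n),$$
and to construct a $100(1 - \alpha)\%$ confidence interval for $\sigma^2$, we define $\chi^2_{\alpha/2}(n)$ and $\chi^2_{1-\alpha/2}(n)$ to be the
$(\alpha/2)$th and $(1 - \alpha/2)$th quantiles respectively of a chi-squared distribution with $n$ degrees of freedom.
Using the above as a pivotal quantity, we have:
$$\Pr\left(\chi^2_{\alpha/2}(n) < \frac{\sum_{k=1}^{n} (X_k - \mu)^2}{\sigma^2} < \chi^2_{1-\alpha/2}(n)\right) = 1 - \alpha,$$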
Solution 2.16: [wk06Q3, Exercise, Schedule] We have a random sample from a continuous distribu-
tion.
1. To prove that the c.d.f., when viewed as a random variable, has a uniform distribution, we have:
$$\Pr\left(F_X(X) \leq x\right) = \Pr\left(X \leq F_X^{-1}(x)\right) = F_X\left(F_X^{-1}(x)\right) = x.$$
We know that this is the c.d.f. of a Uniform(0, 1) random variable, because $x$ represents a probability,
which lies between 0 and 1, and a c.d.f. equal to $x$ on that range means every value between 0 and 1
is equally likely.
$$= 1 - \Pr\left(F_X(X) \leq e^{-x}\right) = 1 - e^{-x},$$
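3. Let $W_k = \log\left(1/F_X(X_k)\right) \sim \text{Exp}(1)$. Then, the m.g.f. of $W_k$, with $\lambda = 1$, is given by:
$$M_{W_k}(t) = \left(1 - \frac{t}{1}\right)^{-1}.$$
Using the m.g.f. technique we have, using the properties of m.g.f. (week 1):
$$\begin{aligned}
Y &= \sum_{k=1}^{n} W_k \\
M_Y(t) &= M_{W_1 + \ldots + W_n}(t) = \left(M_{W_k}(t)\right)^n = \left(1 - \frac{t}{1}\right)^{-n} \\
Y &\sim \text{Gamma}(n, 1).
\end{aligned}$$
4. Using the properties of m.g.f. and the result from (c), we have:
$$\begin{aligned}
2Y &= 2\sum_{k=1}^{n} W_k \\
M_{2Y}(t) &= M_{W_1 + \ldots + W_n}(2t) = \left(M_{W_k}(2t)\right)^n = \left(1 - 2t\right)^{-n} = \left(1 - \frac{t}{1/2}\right)^{-2n/2} \\
2Y = 2\sum_{k=1}^{n} W_k &\sim \text{Gamma}\left(\frac{2n}{2}, \frac{1}{2}\right) = \chi^2(2n).
\end{aligned}$$
Thus, you can always choose $2\sum_{k=1}^{n} W_k$ as a pivot because its distribution is free of any parameter.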
1. Use the MGF technique. We have that the m.g.f. of the sample mean $\bar{X}$ can be written as:
which is the m.g.f. of a $\chi^2(6n)$. Thus, we can use it as a pivot to construct a confidence interval.
To construct a lower 95% confidence interval for $\lambda$, we note:
$$\Pr\left(0 < 2n\lambda\bar{X} < \chi^2_{0.95}(6n)\right) = 0.95,$$
so that equivalently:
$$\Pr\left(0 < \lambda < \frac{\chi^2_{0.95}(6n)}{2n\bar{X}}\right) = 0.95.$$
Therefore:
$$(0, U) = \left(0, \frac{\chi^2_{0.95}(6n)}{2n\bar{X}}\right)$$
is the required confidence interval.
so that equivalently:
$$\Pr\left(\frac{\chi^2_{0.05}(6n)}{2n\bar{X}} < \lambda < \infty\right) = 0.95,$$
and, hence:
$$(L, \infty) = \left(\frac{\chi^2_{0.05}(6n)}{2n\bar{X}}, \infty\right)$$
provides an upper confidence interval.
5. Using $\sum_{k=1}^{20} x_k = 98.2$, we have the following 95% confidence intervals for $\lambda$:
Lower Tail: $\left(0, \dfrac{\chi^2_{0.95}(120)}{2(98.2)}\right) = \left(0, \dfrac{146.57}{196.4}\right) = (0, 0.7463)$
Upper Tail: $\left(\dfrac{\chi^2_{0.05}(120)}{2(98.2)}, \infty\right) = \left(\dfrac{95.70}{196.4}, \infty\right) = (0.4873, \infty)$
Note: the values $\chi^2_{0.95}(120) = 146.57$, $\chi^2_{0.05}(120) = 95.70$, $\chi^2_{0.975}(120) = 152.21$, and $\chi^2_{0.025}(120) =
91.58$ are computed using R or Excel (using: =chiinv(q,df)). Formulae and Tables only provides
percentage points of the chi-squared distribution up to 100 degrees of freedom (page 169).
Alternatively: use the approximate distribution of a chi-squared random variable for large $n$. Let
$Y \sim \chi^2(n)$, then we know $Y = \sum_{i=1}^{n} Z_i^2$, where the $Z_i$ are i.i.d. standard normal random variables.
Applying the central limit theorem (see week 5) we have approximately $Y \sim N(n, 2n)$. Using this we
have that $\chi^2_q(n) \approx n + \sqrt{2n}\, z_q$, thus: $\chi^2_{0.95}(120) \approx 145.49$ ($z_{0.95} = 1.6449$), $\chi^2_{0.05}(120) \approx 94.52$
($z_{0.05} = -1.6449$), $\chi^2_{0.975}(120) \approx 150.36$ ($z_{0.975} = 1.96$), and $\chi^2_{0.025}(120) \approx 89.64$ ($z_{0.025} = -1.96$).
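The normal approximation to chi-squared quantiles described above can be sketched in a few lines of Python. This is an illustrative helper (the name `chi2_quantile_normal_approx` is an assumption, not part of the course material), and it reproduces only the approximation, not the exact quantiles from R or Excel.

```python
from math import sqrt
from statistics import NormalDist


def chi2_quantile_normal_approx(q, df):
    """Approximate the q-th quantile of chi-squared(df) by df + sqrt(2 df) * z_q."""
    z_q = NormalDist().inv_cdf(q)  # standard normal quantile
    return df + sqrt(2 * df) * z_q


for q in (0.95, 0.05, 0.975, 0.025):
    print(q, round(chi2_quantile_normal_approx(q, 120), 2))
```

Small differences in the last decimal place relative to the values quoted above can arise from rounding of the normal quantiles.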
$$D = \text{After-Weight} - \text{Before-Weight}.$$
A negative difference will mean a weight loss and a positive difference, a gain in weight. The sample
mean and standard deviation of the difference are
$$\bar{d} = -6.4444 \quad \text{and} \quad s_D = \sqrt{\frac{1}{n - 1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)} = 5.1988.$$
The required 95% confidence interval is therefore given by:
$$\bar{d} \pm t_{1-\alpha/2,\,n-1}\, s_D \big/ \sqrt{n} = -6.4444 \pm (2.306)\, 5.1988 \big/ \sqrt{9} = (-10.44056322, -2.448236783).$$
This result may differ slightly because of rounding.
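As a quick numerical check of the interval above, the following Python sketch plugs the quoted summary statistics into the t-interval formula. The helper name `t_interval` is illustrative, and the critical value 2.306 is taken from the solution rather than computed.

```python
from math import sqrt


def t_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / sqrt(n)
    return mean - half_width, mean + half_width


# Summary statistics quoted in the solution: d-bar = -6.4444, s_D = 5.1988, n = 9.
lower, upper = t_interval(-6.4444, 5.1988, 9, 2.306)
print(round(lower, 4), round(upper, 4))
```

Because the inputs are rounded to four decimals, the result agrees with the interval in the text only up to rounding.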
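1. (a) We have that the difference in sample means, given known population variances, is distributed as:
$$\left(\bar{X}_1 - \bar{X}_2\right) \sim N\left(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right),$$
note that the samples are independent, thus $\text{Cov}(\bar{X}_1, \bar{X}_2) = 0$.
The pivotal quantity is then:
$$\frac{\left(\bar{X}_1 - \bar{X}_2\right) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1).$$
Finally, using the pivotal quantity we have that the 95% confidence interval is given by:
$$(\bar{x}_1 - \bar{x}_2) - z_{1-0.025}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < (\mu_1 - \mu_2) < (\bar{x}_1 - \bar{x}_2) + z_{1-0.025}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},$$
where $z_{1-0.025}$ is the 0.975 quantile of a standard normal random variable.
(b) Now, for the difference in sample means, given equal but unknown population
variance $\sigma^2$, we have:
$$t_k = \frac{\dfrac{\left(\bar{X}_1 - \bar{X}_2\right) - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}}}{\sqrt{\dfrac{k S_p^2}{\sigma^2} \Big/ k}} = \frac{Z}{\sqrt{Y/k}},$$
where $S_p^2 = \dfrac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$, $Z \sim N(0, 1)$, and $Y \sim \chi^2_{n_1 + n_2 - 2}$, thus $k = n_1 + n_2 - 2$. The
pivotal quantity is given by:
$$\frac{\dfrac{\left(\bar{X}_1 - \bar{X}_2\right) - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}}}{\sqrt{\dfrac{S_p^2}{\sigma^2}}} = \frac{\left(\bar{X}_1 - \bar{X}_2\right) - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_k.$$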
$$\frac{1.4884}{0.9216} \cdot \frac{1}{3.179} < \frac{\sigma_1^2}{\sigma_2^2} < 3.179 \cdot \frac{1.4884}{0.9216}$$
$$0.508027 < \frac{\sigma_1^2}{\sigma_2^2} < 5.13414$$
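The variance-ratio interval above can be reproduced with a short Python sketch. The helper name `variance_ratio_interval` is illustrative; the F percentage point 3.179 is taken from the solution, and the sketch assumes equal sample sizes so that the lower F percentage point is the reciprocal of the upper one.

```python
def variance_ratio_interval(s1_sq, s2_sq, f_upper):
    """Interval for sigma1^2 / sigma2^2 when both samples have the same size.

    With equal sample sizes the lower F percentage point is 1 / f_upper,
    so the interval is (ratio / f_upper, ratio * f_upper).
    """
    ratio = s1_sq / s2_sq
    return ratio / f_upper, ratio * f_upper


# Sample standard deviations 1.22 and 0.96 (10 shops each), F value 3.179.
low, high = variance_ratio_interval(1.22 ** 2, 0.96 ** 2, 3.179)
print(round(low, 6), round(high, 5))
```

Since the interval contains 1, the data do not contradict the equal-variance assumption used for the pooled t interval.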
Solution 2.20: [wk06Q7, Exercise, Schedule] (See Q9 before doing this) Let $X$ be the number of customers
who indicated high satisfaction; $X \sim \text{Binomial}(200, p)$.
Estimator of the parameter $p$: $\hat{p} = \dfrac{X}{n} = \dfrac{172}{200} = \dfrac{43}{50} = 0.86$. Then:
$$Z = \frac{0.86 - p}{\sqrt{\dfrac{0.86(1 - 0.86)}{200}}} \sim N(0, 1).$$
An approximate 95% confidence interval for $p$ is:
$$\begin{aligned}
&\left(0.86 - z_{1-0.025}\sqrt{\frac{0.86(1 - 0.86)}{200}},\; 0.86 + z_{1-0.025}\sqrt{\frac{0.86(1 - 0.86)}{200}}\right) \\
&= \left(0.86 - 1.96\sqrt{\frac{0.86(1 - 0.86)}{200}},\; 0.86 + 1.96\sqrt{\frac{0.86(1 - 0.86)}{200}}\right) \\
&= (0.81191005,\; 0.90808995).
\end{aligned}$$
Solution 2.21: [wk06Q8, Exercise, Schedule] The distribution of claim size under a certain class of
policy is modelled as a normal random variable, and previous years' records indicate that the standard
deviation is 120.
1. Let $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$, an unbiased estimator of $\mu$. We have:
$$\bar{X} \sim N\left(\mu, \sigma^2_{\bar{X}} = \sigma^2/n\right).$$
$$\text{width} = 2(1.96)\frac{120}{10} = 47.04.$$
2. We want:
$$2 z_{1-0.025} \frac{\sigma}{\sqrt{n}} \leq 10,$$
hence:
$$2(1.96)\frac{120}{\sqrt{n}} \leq 10,$$
and $n \geq \left(\dfrac{2(1.96)\,120}{10}\right)^2 = 2212.7616$. The minimum sample size required is 2213.
3. The smaller the width of the confidence interval, the larger is the required sample size.
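The width and sample-size calculations above can be sketched in Python as follows. The function names `ci_width` and `min_sample_size` are illustrative, and the value z = 1.96 is the one used in the solution.

```python
from math import ceil, sqrt


def ci_width(sigma, n, z=1.96):
    """Width of the known-variance normal confidence interval for the mean."""
    return 2 * z * sigma / sqrt(n)


def min_sample_size(sigma, max_width, z=1.96):
    """Smallest n with 2 * z * sigma / sqrt(n) <= max_width."""
    return ceil((2 * z * sigma / max_width) ** 2)


print(round(ci_width(120, 100), 2))   # width for n = 100
print(min_sample_size(120, 10))       # minimum n for width at most 10
```

Reducing the width by a factor of about 4.7 requires roughly 22 times as many observations, because the width shrinks only with the square root of n.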
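1. A random variable $Y$ has a Poisson distribution with parameter $\lambda$ but there is a restriction that
zero counts cannot occur. The distribution of $Y$ in this case is referred to as the zero-truncated
Poisson distribution.
(a) Let $X$ be the non-truncated version of $Y$. Hence, $X$ has a Poisson distribution.
Note that:
$$\begin{aligned}
\Pr(Y = y) = \Pr(X = y \mid X > 0) &= \frac{\Pr(X = y)}{\Pr(X \geq 1)} \\
&= \frac{\Pr(X = y)}{1 - \Pr(X = 0)} \\
&= \frac{\lambda^y e^{-\lambda}}{y!\,(1 - e^{-\lambda})}, \quad \text{for } y = 1, 2, 3, \ldots,
\end{aligned}$$
and zero otherwise.
(b)
$$\begin{aligned}
E[Y] = E[X \mid X > 0] &= \sum_{y=1}^{\infty} y\, \frac{\lambda^y e^{-\lambda}}{y!\,(1 - e^{-\lambda})} \\
&= \frac{1}{1 - e^{-\lambda}} \sum_{y=1}^{\infty} \frac{\lambda^y e^{-\lambda}}{(y - 1)!} \\
&= \frac{1}{1 - e^{-\lambda}} \sum_{y=0}^{\infty} \frac{\lambda^{y+1} e^{-\lambda}}{y!} \\
&= \frac{\lambda}{1 - e^{-\lambda}} \sum_{y=0}^{\infty} \frac{\lambda^y e^{-\lambda}}{y!} \\
&\overset{*}{=} \frac{\lambda}{1 - e^{-\lambda}}
\end{aligned}$$
* using that $\sum_{y=0}^{\infty} \frac{\lambda^y e^{-\lambda}}{y!}$ is the sum of the probability mass function of a Poisson random variable
with parameter $\lambda$ over all its values and thus equals one.
Alternatively, one could use:
$$E[X] = E[X \mid X > 0] \Pr(X > 0) + \underbrace{E[X \mid X \leq 0] \Pr(X \leq 0)}_{=0}.$$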
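To find the maximum point, we set the derivative of the log-likelihood function equal to
zero:
$$\frac{\partial \ell(\lambda; y)}{\partial \lambda} = -n - \frac{n e^{-\lambda}}{1 - e^{-\lambda}} + \frac{\sum_{i=1}^{n} y_i}{\lambda} = 0.$$
Equivalently,
$$-1 - \frac{e^{-\lambda}}{1 - e^{-\lambda}} + \frac{\bar{Y}}{\lambda} = 0,$$
or:
$$\bar{Y} - \lambda - \frac{\lambda e^{-\lambda}}{1 - e^{-\lambda}} = 0.$$
Also, from the method of moments:
$$\begin{aligned}
\bar{Y} &= E[Y] \overset{*}{=} \frac{\lambda}{1 - e^{-\lambda}} \\
0 &= \bar{Y} - \frac{\lambda}{1 - e^{-\lambda}} \\
0 &= \bar{Y} - \frac{\lambda - \lambda e^{-\lambda} + \lambda e^{-\lambda}}{1 - e^{-\lambda}} \\
0 &= \bar{Y} - \lambda - \frac{\lambda e^{-\lambda}}{1 - e^{-\lambda}},
\end{aligned}$$
* using result in Q9(a)2. Hence the maximum likelihood estimate is the same as the
method of moments estimate.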
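(b)
$$\begin{aligned}
\frac{\partial \ell(\lambda; y)}{\partial \lambda} &= -n - \frac{n e^{-\lambda}}{1 - e^{-\lambda}} + \frac{\sum_{i=1}^{n} y_i}{\lambda} \\
\frac{\partial^2 \ell(\lambda; y)}{\partial \lambda^2} &= \frac{(1 - e^{-\lambda})\, n e^{-\lambda} + n e^{-2\lambda}}{(1 - e^{-\lambda})^2} - \frac{\sum_{i=1}^{n} y_i}{\lambda^2} \\
&= \frac{n e^{-\lambda}}{(1 - e^{-\lambda})^2} - \frac{\sum_{i=1}^{n} y_i}{\lambda^2} \\
-E\left[\frac{\partial^2 \ell(\lambda; y)}{\partial \lambda^2}\right] &= -E\left[\frac{n e^{-\lambda}}{(1 - e^{-\lambda})^2} - \frac{\sum_{i=1}^{n} y_i}{\lambda^2}\right] \\
&= -\left(\frac{n e^{-\lambda}}{(1 - e^{-\lambda})^2} - \frac{n \frac{\lambda}{1 - e^{-\lambda}}}{\lambda^2}\right) \\
&= -\left(\frac{n e^{-\lambda}}{(1 - e^{-\lambda})^2} - \frac{n}{\lambda (1 - e^{-\lambda})}\right) \\
&= -\frac{\lambda n e^{-\lambda} - n (1 - e^{-\lambda})}{\lambda (1 - e^{-\lambda})^2} \\
&= -\frac{\lambda n e^{-\lambda} - n + n e^{-\lambda}}{\lambda (1 - e^{-\lambda})^2}
\end{aligned}$$
$$\text{CRLB} = -\frac{\lambda (1 - e^{-\lambda})^2}{\lambda n e^{-\lambda} - n + n e^{-\lambda}} = \frac{\lambda (1 - e^{-\lambda})^2}{n\left(1 - e^{-\lambda} - \lambda e^{-\lambda}\right)}.$$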
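The likelihood equation for the zero-truncated Poisson has no closed-form solution, so in practice it is solved numerically. The following is a minimal Python sketch using bisection on the equivalent moment equation; the function names `ztp_mean` and `ztp_mle` and the sample mean of 2.0 are illustrative assumptions, not values from the exercise.

```python
from math import exp


def ztp_mean(lam):
    """Mean of the zero-truncated Poisson: lam / (1 - exp(-lam))."""
    return lam / (1.0 - exp(-lam))


def ztp_mle(ybar, iterations=200):
    """Solve ztp_mean(lam) = ybar for lam by bisection.

    The mean is increasing in lam, tends to 1 as lam -> 0, and always
    exceeds lam, so for ybar > 1 the root lies in (0, ybar).
    """
    if ybar <= 1.0:
        raise ValueError("the sample mean of a zero-truncated sample must exceed 1")
    lo, hi = 1e-9, ybar
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if ztp_mean(mid) < ybar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam_hat = ztp_mle(2.0)  # hypothetical sample mean of 2.0
print(round(lam_hat, 4))
```

Bisection is used here only because it is simple and cannot diverge; Newton's method would converge faster.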
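Solution 2.23: [wk06Q10, Exercise, Schedule] We observe $Y_i \sim \text{Ber}(p)$, the outcome of trial $i$. We
know that $\sum_{i=1}^{n} Y_i \sim \text{Bin}(n, p)$. We have that $X$ is the number of successes.
1. We have that:
$$L(p; y) = \prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i} = p^x (1 - p)^{n - x} = L(p; x).$$
Then take the derivative of $\ell(p; x)$ with respect to $p$ and set it equal to zero:
$$\begin{aligned}
0 = \frac{\partial \ell(p; x)}{\partial p} &= \frac{X}{p} - \frac{n - X}{1 - p} \\
\frac{(1 - p) X}{p(1 - p)} - \frac{(n - X)\, p}{p(1 - p)} &= 0 \\
(1 - p) X &= (n - X)\, p \\
X &= np \\
\hat{p} &= X/n.
\end{aligned}$$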
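3. We have:
$$\begin{aligned}
-1.96 < \frac{\hat{p} - p}{\sqrt{p(1 - p)/n}} &< 1.96 \\
\frac{\left|\hat{p} - p\right|}{\sqrt{p(1 - p)/n}} &< 1.96 \\
\frac{\left(\hat{p} - p\right)^2}{p(1 - p)/n} &< 1.96^2 \\
\hat{p}^2 + p^2 - 2\hat{p}p &< 1.96^2\, \frac{p(1 - p)}{n} \\
\left(1 + 1.96^2/n\right) p^2 - \left(2\hat{p} + 1.96^2/n\right) p + \hat{p}^2 &< 0
\end{aligned}$$
$$\frac{\left(2\hat{p} + 1.96^2/n\right) - \sqrt{\left(2\hat{p} + 1.96^2/n\right)^2 - 4\left(1 + 1.96^2/n\right)\hat{p}^2}}{2\left(1 + 1.96^2/n\right)} < p < \frac{\left(2\hat{p} + 1.96^2/n\right) + \sqrt{\left(2\hat{p} + 1.96^2/n\right)^2 - 4\left(1 + 1.96^2/n\right)\hat{p}^2}}{2\left(1 + 1.96^2/n\right)},$$
where the last step is derived using the abc-formula: $ax^2 + bx + c = 0 \Rightarrow d = b^2 - 4ac$, $x = \dfrac{-b \pm \sqrt{d}}{2a}$.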
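4. Using $\sqrt{\hat{p}(1 - \hat{p})} \to \sqrt{p(1 - p)}$ as $n \to \infty$ we can approximate the pivotal quantity
by:
$$\frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}} \sim N(0, 1).$$
$$\begin{aligned}
-z_{1-0.025} < \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}} &< z_{1-0.025} \\
-1.96 < \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}} &< 1.96 \\
-1.96\sqrt{\hat{p}(1 - \hat{p})/n} < \hat{p} - p &< 1.96\sqrt{\hat{p}(1 - \hat{p})/n} \\
-1.96\sqrt{\hat{p}(1 - \hat{p})/n} - \hat{p} < -p &< 1.96\sqrt{\hat{p}(1 - \hat{p})/n} - \hat{p} \\
\hat{p} - 1.96\sqrt{\hat{p}(1 - \hat{p})/n} < p &< \hat{p} + 1.96\sqrt{\hat{p}(1 - \hat{p})/n}.
\end{aligned}$$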
(a) We have $n = 10$, $X = 4$ and $\hat{p} = X/n = 0.4$. Using this the confidence interval in c) is
given by (0.168177581, 0.687330453) and in d) by (0.096358106, 0.703641894).
(b) We have $n = 200$, $X = 80$ and $\hat{p} = X/n = 0.4$. Using this the confidence interval in c) is
given by (0.33460464, 0.469164561) and in d) by (0.332103608, 0.467896392).
We observe that for large $n$ the convergence $\sqrt{\hat{p}(1 - \hat{p})} \to \sqrt{p(1 - p)}$ is indeed a good approximation,
but for small $n$ (i.e., equal to 10) this does not hold. Therefore, the approximate
confidence interval in d) is substantially different from the confidence interval in c).
Note that in both cases we use the central limit theorem for the pivotal quantity. The central limit
theorem gives a good approximation if we have a large sum, which is not the case for $n = 10$.
Therefore, it would be better to use the exact binomial test if $n$ is small and not the normal
approximation. Hence, if $n$ is large, both the central limit theorem and the convergence
$\sqrt{\hat{p}(1 - \hat{p})} \to \sqrt{p(1 - p)}$ can be used for a good approximation of the confidence interval for $p$, but
if $n$ is small, one should use the exact Binomial pivotal quantity.
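The two intervals compared above can be computed with a short Python sketch. The function names `quadratic_interval` (part c) and `simple_interval` (part d) are illustrative, and z = 1.96 is the value used in the solution.

```python
from math import sqrt


def quadratic_interval(x, n, z=1.96):
    """Interval from part (c): roots of the quadratic inequality in p."""
    p_hat = x / n
    a = 1 + z ** 2 / n
    b = 2 * p_hat + z ** 2 / n   # the quadratic is a p^2 - b p + c < 0
    c = p_hat ** 2
    root = sqrt(b * b - 4 * a * c)
    return (b - root) / (2 * a), (b + root) / (2 * a)


def simple_interval(x, n, z=1.96):
    """Interval from part (d): p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


for x, n in ((4, 10), (80, 200)):
    print(n, quadratic_interval(x, n), simple_interval(x, n))
```

For n = 10 the two intervals differ noticeably, while for n = 200 they nearly coincide, which is the comparison made in the text.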
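Solution 2.24: [wk06Q11, Exercise, Schedule] Note that the sample size is equal to 16 (i.e., $n = 16$),
thus we have a small sample size and have to use the Student-t distribution for the population mean.
The 95% (i.e., $\alpha = 0.05$) confidence interval for the population mean is given by:
$$\begin{aligned}
\bar{x} - t_{1-\alpha/2,\,n-1} \frac{s}{\sqrt{n}} &< \mu < \bar{x} + t_{1-\alpha/2,\,n-1} \frac{s}{\sqrt{n}} \\
\frac{\sum_{i=1}^{16} x_i}{n} - t_{1-\alpha/2,\,n-1} \sqrt{\frac{\sum_{i=1}^{16} x_i^2 - n\bar{x}^2}{n - 1} \cdot \frac{1}{n}} &< \mu < \frac{\sum_{i=1}^{16} x_i}{n} + t_{1-\alpha/2,\,n-1} \sqrt{\frac{\sum_{i=1}^{16} x_i^2 - n\bar{x}^2}{n - 1} \cdot \frac{1}{n}} \\
\frac{51.2}{16} - t_{1-0.025,\,15} \sqrt{\frac{243.19 - 163.84}{15} \cdot \frac{1}{16}} &< \mu < \frac{51.2}{16} + t_{1-0.025,\,15} \sqrt{\frac{243.19 - 163.84}{15} \cdot \frac{1}{16}} \\
3.2 - 2.131 \sqrt{\frac{5.29}{16}} &< \mu < 3.2 + 2.131 \sqrt{\frac{5.29}{16}} \\
1.974675 &< \mu < 4.425325,
\end{aligned}$$
using $t_{1-0.025,\,15} = 2.131$ (see table Formulae and Tables page 163).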
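1. We have that:
$$\frac{(n - 1) S^2}{\sigma^2} \sim \chi^2(n - 1).$$
Moreover, we know that $\chi^2(n - 1) = \sum_{i=1}^{n-1} Z_i^2 \to N(n - 1, 2(n - 1))$ as $n \to \infty$ due to the central
limit theorem. Thus, as $n \to \infty$ we approximately have:
$$\frac{(n - 1) S^2/\sigma^2 - (n - 1)}{\sqrt{2(n - 1)}} \sim N(0, 1).$$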
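As an illustration of how the normal approximation above could be applied to the case $n = 101$ from the exercise, here is a minimal Python sketch. The function name `prob_sample_variance_exceeds` is illustrative, and the result is only as accurate as the normal approximation to the chi-squared distribution.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_variance_exceeds(n, factor):
    """Approximate Pr(S^2 > factor * sigma^2) for a normal sample of size n.

    Uses (n-1) S^2 / sigma^2 ~ chi-squared(n-1), approximated by
    N(n-1, 2(n-1)) for large n.
    """
    df = n - 1
    z = (df * factor - df) / sqrt(2 * df)
    return 1.0 - NormalDist().cdf(z)


p_exceed = prob_sample_variance_exceeds(101, 1.1)
print(round(p_exceed, 4))
```

Here the standardised value is $10/\sqrt{200} \approx 0.707$, so the probability is one minus the standard normal c.d.f. at that point.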
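Solution 2.26: [wk06Q13, Exercise, Schedule] We have that $X_i \sim \text{POI}(\lambda)$ i.i.d. for $i = 1, \ldots, 500$.
Using the moment generating function technique we have $M_{\sum_{i=1}^{500} X_i}(t) = \left(M_{X_i}(t)\right)^{500} = \exp\left(\lambda\left(\exp(t) - 1\right)\right)^{500} =
\exp\left(500\lambda\left(\exp(t) - 1\right)\right)$. Thus we have $\sum_{i=1}^{500} X_i = X \sim \text{POI}(500\lambda)$.
Due to the central limit theorem, we have that $\sum_{i=1}^{500} X_i = X$ is approximately normally distributed
with mean $500\lambda$ and variance $500\lambda$. Thus:
$$\frac{X - 500\lambda}{\sqrt{500\lambda}} \sim N(0, 1).$$
We have:
$$\begin{aligned}
z_{0.025} < \frac{83 - 500\lambda}{\sqrt{500\lambda}} &< z_{0.975} \\
-1.96 < \frac{83/\sqrt{500} - \sqrt{500}\,\lambda}{\sqrt{\lambda}} &< 1.96 \\
\frac{\left|83/\sqrt{500} - \sqrt{500}\,\lambda\right|}{\sqrt{\lambda}} &< 1.96 \\
\frac{83^2}{500} + 500\lambda^2 - 2\,\frac{83}{\sqrt{500}}\sqrt{500}\,\lambda &< 1.96^2 \lambda \\
\frac{83^2}{500} + 500\lambda^2 - \left(2\,\frac{83}{\sqrt{500}}\sqrt{500} + 1.96^2\right)\lambda &< 0 \\
0.133923 < \lambda &< 0.205761,
\end{aligned}$$
where the last step is derived using the abc-formula, i.e., $ax^2 + bx + c = 0 \Rightarrow d = b^2 - 4ac$, $x = \dfrac{-b \pm \sqrt{d}}{2a}$.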
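The quadratic inequality above can be solved numerically. The following Python sketch is a minimal illustration (the function name `poisson_rate_interval` is an assumption); it solves the same inequality after multiplying through by the number of policies.

```python
from math import sqrt


def poisson_rate_interval(total_claims, policies, z=1.96):
    """Approximate interval for a Poisson rate from the normal pivotal quantity.

    Solves (total - m*lam)^2 < z^2 * m * lam, i.e.
    m^2 lam^2 - (2 m total + z^2 m) lam + total^2 < 0, with the abc-formula.
    """
    m = policies
    a = m * m
    b = -(2 * m * total_claims + z * z * m)
    c = total_claims ** 2
    d = b * b - 4 * a * c
    return (-b - sqrt(d)) / (2 * a), (-b + sqrt(d)) / (2 * a)


low_rate, high_rate = poisson_rate_interval(83, 500)
print(round(low_rate, 6), round(high_rate, 6))
```

The point estimate 83/500 = 0.166 is not the midpoint of this interval, because the variance in the pivotal quantity depends on the unknown rate.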
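Solution 2.27: [wk06Q14, Exercise, Schedule] We have that $X_i \sim \text{UNIF}(0, \theta)$ i.i.d. for $i = 1, \ldots, n$.
We denote $U = (1/\theta) X_{(n)}$.
1. We know (week 5) that the cumulative distribution function of the maximum is $F_{X_{(n)}}(x_{(n)}) =
\left(F_X(x_{(n)})\right)^n$ and we have $F_X(x) = \frac{x}{\theta}$, if $0 < x < \theta$. Thus we have:
$$F_{X_{(n)}}\left(x_{(n)}\right) = \begin{cases} 0, & \text{if } x_{(n)} < 0; \\ \left(\dfrac{x_{(n)}}{\theta}\right)^n, & \text{if } 0 \leq x_{(n)} \leq \theta; \\ 1, & \text{if } x_{(n)} > \theta. \end{cases}$$
$$F_U(u) = \begin{cases} 0, & \text{if } u < 0; \\ u^n, & \text{if } 0 \leq u \leq 1; \\ 1, & \text{if } u > 1. \end{cases}$$
$$\begin{aligned}
\Pr(q_1 \leq \theta) &= 0.95 \\
\Pr\left(q_1 \leq X_{(n)}/U\right) &\overset{*}{=} 0.95 \\
\Pr\left(\frac{q_1}{X_{(n)}} \leq 1/U\right) &= 0.95 \\
\Pr\left(U \leq \frac{X_{(n)}}{q_1}\right) &= 0.95 \\
F_U\left(\frac{X_{(n)}}{q_1}\right) &= 0.95 \\
\left(\frac{X_{(n)}}{q_1}\right)^n &= 0.95 \\
\frac{X_{(n)}}{q_1} &= 0.95^{1/n} \\
q_1 &= X_{(n)}\, 0.95^{-1/n} \\
\Pr\left(X_{(n)}\, 0.95^{-1/n} \leq \theta\right) &= 0.95,
\end{aligned}$$
* using $U = (1/\theta) X_{(n)} \Rightarrow \theta = X_{(n)}/U$. Thus, the 95% lower confidence bound for $\theta$ is
$X_{(n)}\, 0.95^{-1/n}$, i.e. the corresponding one-sided interval is $\left(X_{(n)}\, 0.95^{-1/n}, \infty\right)$.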
Module 3
Hypothesis Test
Exercise 3.2: [wk07Q2, Solution, Schedule] Let $X_1, X_2, \ldots, X_{10}$ be a random sample of size 10 from
a Poisson distribution with mean $\lambda$. Consider the critical region $C$ defined by:
$$C = \left\{(x_1, x_2, \ldots, x_{10}) : \sum_{k=1}^{10} x_k \geq 3\right\}.$$
1. Show that $C$ is a best critical region for testing $H_0: \lambda = 0.1$ against $H_a: \lambda = 0.5$.
Exercise 3.3: [wk07Q3, Solution, Schedule] Let $X_1, X_2, \ldots, X_n$ be a random sample from the density
function:
$$f_X(x \mid \theta) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(x - \theta)^2\right).$$
At a level of significance $\alpha$, find the best critical region (or most powerful test) for testing the simple
null $H_0: \theta = 0$ against the simple alternative $H_a: \theta = 1$.
Exercise 3.4: [wk07Q4, Solution, Schedule] Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson($\lambda$)
distribution. In testing the simple null $H_0: \lambda = \lambda_0$ against the simple alternative $H_a: \lambda = \lambda_1$, where
$\lambda_1 > \lambda_0$:
2. Determine the distribution of the test statistic under the null hypothesis.
1. A manufacturing company produces screws of a particular size which are put into boxes of
150. On a particular day a random sample of such boxes is taken from each of the morning
and afternoon production runs. The number of defective screws found in each sampled box are
given in the following table:
Morning 28 17 18 16 20 12 11 10 18 17 20 25
Afternoon 19 15 22 21 9 14 17 13 22 9
Table 3.1: Number of defectives per box
(a) Test for a difference between the mean number of defectives produced in the morning and
afternoon (you may assume that the underlying population variances are equal).
(b) Plot the data in an appropriate and simple way and comment briefly on the validity of the
test of part ii).
2. On another day screws are put into boxes of 100. The table below gives the number of defectives
in twenty boxes sampled from this day's production run.
5 15 18 12 8 7 9 14 11 10
6 18 14 9 18 12 11 5 18 12
Table 3.2: Number of defectives per box of 100 screws
(a) Carry out a test to establish whether there is a difference between the proportions of de-
fectives produced on the two days.
(b) Carry out a test to establish whether the proportion of defectives in boxes of 100 screws is
more than 9%.
Exercise 3.6: [wk10Q10, Solution, Schedule] Suppose that $Y$ represents a single observation from
the probability density given by:
$$f_Y(y \mid \theta) = \begin{cases} \theta y^{\theta - 1}, & 0 < y < 1 \\ 0, & \text{elsewhere.} \end{cases}$$
Find the most powerful test with significance level $\alpha = 0.05$ to test $H_0: \theta = 2$ against $H_a: \theta = 1$.
Exercise 3.8: [wk08Q2, Solution, Schedule] Let $X$ have a Bernoulli distribution where $\theta = \Pr(X = 1)$.
Take a random sample of size $n = 10$ from this Bernoulli distribution and consider the test:
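where
$$\bar{x}_j = \sum_{i=1}^{n_j} \frac{x_{ij}}{n_j}, \qquad \bar{x} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \frac{x_{ij}}{N} = \sum_{j=1}^{k} \frac{n_j \bar{x}_j}{N}.$$
Hint: 1) Rewrite the left side by adding and subtracting $\bar{x}_j$ within the squares;
2) Rewrite it using the binomial expansion (see F&T page 2).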
Exercise 3.12: [wk08Q7, Solution, Schedule] The following observations represent weight loss (in
pounds) of men of similar physique, metabolic activity, and so on, after a certain amount of time on
three types of diet programs: A, B, and C.
Diet Program
A B C
3 2 7
7 4 10
4 6 8
5 6 9
6 5 4
- 3 8
- 4 -
Test for the differences in the mean weight loss between the three diet programs. State any assumptions
you make. Provide the point estimates of the mean losses, and the ANOVA table used
to partition the various sources of variation.
is to be performed. A random sample of 50 claims is examined and yields a mean amount of 207
and a standard deviation of 42. Calculate the approximate p-value for the test.
Perform an approximate F test at the 5% level to investigate the validity of the equal variance
assumption.
Subject                 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
Age (x)                29 39 44 37 42 17 38 43 51 30 32 59 33 31 32 32 36 50
Incubation period (y)  13 46 43 34 20 20 18 72 19 36 48 44 21 32 86 48 28 16
Survival                N  Y  Y  N  N  Y  N  Y  N  N  N  Y  N  N  Y  N  Y  N
1. A scatterplot of incubation period against age is given below, in which different symbols are
used for subjects who died and for subjects who survived.
[Figure: scatterplot of incubation period ("Incub Per", vertical axis) against age ("Age", horizontal axis).]
Comment briefly on any relationships between age and incubation period for those subjects
who died and for those who survived.
and make a brief informal comparison of the died and survived groups based on these dotplots.
3. Construct 95% and 99% confidence intervals for the mean difference between the incubation
period for subjects who survived and subjects who died (i.e., take the mean incubation period
for subjects who survived minus the mean incubation period for subjects who died).
Comment briefly on these confidence intervals.
4. (a) Construct a test to investigate whether the variances of the incubation period for subjects
who died and subjects who survived are equal.
(b) Comment on the validity of the assumptions that are required for the confidence intervals
given in part c) to be approximate.
Company A 117 154 166 189 190 202 233 263 289 331
Company B 142 160 166 188 221 241 276 279 284 302
1. Illustrate the data given above on a suitable diagram and hence comment briefly on the validity
of the assumptions required for a two-sample t test for the premiums of these two companies.
2. Assuming that the premiums are normally distributed, carry out a formal test to check that it is
appropriate to apply a two-sample t test to these data.
3. Test whether the level of premiums charged by company B was higher than that charged by
company A. State your p-value and conclusions clearly.
4. Calculate a 95% confidence interval for the difference between the proportions of premiums of
each company that are in excess of 200. Comment briefly on your result.
5. The average premium charged by company A in the previous year was 170. Formally test
whether company A appears to have increased its premium since the previous year.
          I    II   Total
A        22    28     50
B        28    22     50
Total    50    50    100
Calculate the observed $\chi^2$ test statistic and state an approximate conclusion concerning the
independence of the two criteria.
2. (Added to the past Institute exam) Perform Pearson's chi-square test using R.
Exercise 3.19: [wk09Q3, Solution, Schedule] Continued from the previous question, Exercise 3.18. Using
Fisher's exact test:
3. Use R to show that the cumulative distribution function of a Hypergeometric distribution
with $N = 100$, $M = 50$, $n = 50$ evaluated at $x = 22$ equals 0.15867.
Exercise 3.20: [wk09Q4, Solution, Schedule] Compare the results in Exercises 3.18 and 3.19 and
explain the differences/similarities.
1. (a) State any assumptions needed to justify the use of a binomial model for the number of
sampled houses per street which have been burgled during the last six months.
(b) Derive the maximum likelihood estimator of p, the probability that a house of the type
sampled has been burgled during the last six months.
(c) Fit the binomial model using your estimate of p, and, without doing a formal test, com-
ment on the fit.
2. An insurance company works on the basis that the probability of a house being burgled over a
six month period is 0.18. Carry out a test to investigate whether the binomial model with this
value of p provides a good fit for the data.
Exercise 3.22: [wk09Q6, Solution, Schedule] Check your answer of Exercise 3.21 using R.
Exercise 3.23: [wk09Q7, Solution, Schedule] Does education really make a difference in how much
money you will earn? Researchers randomly selected 100 people from each of three income categories
("marginally rich", "comfortably rich", and "super rich") and recorded their education levels. The data
are summarised in the table that follows.
1. Describe the independent multinomial populations whose proportions are compared in the $\chi^2$
analysis.
3. Do the data indicate that the proportions in the various education levels differ for the three
income categories? Test at the $\alpha = 0.01$ level.
4. Construct a 95% confidence interval for the difference in proportions with at least an under-
graduate degree for individuals who are marginally and super rich. Interpret the interval.
Solutions
Solution 3.1: [wk07Q1, Exercise, Schedule] Distinction between terms:
1. The null hypothesis is the hypothesis being tested while the alternative hypothesis is the hy-
pothesis accepted if the null is rejected.
2. A one-tailed hypothesis is one that takes the form of an inequality such as $\theta > a$ or $\theta < a$, while
a two-tailed hypothesis is of the form $\theta \neq a$.
3. A simple hypothesis is one where if true, it will completely specify the probability distribution,
otherwise it is called a composite hypothesis.
4. A Type I error is the mistake committed when the null hypothesis is rejected when it is in fact
true. On the other hand, a Type II error is the mistake committed when the null hypothesis is
accepted when it is in fact false.
Solution 3.2: [wk07Q2, Exercise, Schedule] The probability mass function for a Poisson is:
$$p_X(x) = \frac{e^{-\lambda}\lambda^{x}}{x!}.$$
1. Thus, the best critical region is given by solving the Neyman-Pearson lemma:
$$\frac{L(x_1,\ldots,x_n;\lambda_0)}{L(x_1,\ldots,x_n;\lambda_1)} = \frac{e^{-0.1n}(0.1)^{\sum x_k}/\prod x_k!}{e^{-0.5n}(0.5)^{\sum x_k}/\prod x_k!} = e^{0.4n}(0.2)^{\sum x_k} \le k$$
$$\Longrightarrow (0.2)^{\sum x_k} \le k_1 \quad (= k/e^{0.4n})$$
$$\Longrightarrow \sum x_k \log(0.2) \le k_2 \quad (= \log(k_1))$$
$$\Longrightarrow \sum x_k \ge k^{*} \quad (= k_2/\log(0.2))$$
is the form of the best critical region (note: $\log(0.2) < 0$). Thus the best critical region is of the
form:
$$C = \left\{(x_1,\ldots,x_n) : \sum x_k \ge k^{*}\right\},$$
where $k^{*}$ is such that $\Pr\left(\sum_{i=1}^{n} x_i \ge k^{*} \mid H_0\right) = \alpha$.
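2. For the specific form of the critical region given in the problem, that is, where we reject the null
$H_0$ when $\sum_{k=1}^{10} x_k \ge 3$, the level of significance is:
$$\alpha = \Pr\left(\sum_{k=1}^{10} x_k \ge 3 \,\Big|\, \lambda = 0.1\right) = \sum_{x=3}^{\infty}\frac{e^{-1}}{x!} = 1 - \left(e^{-1} + e^{-1} + e^{-1}/2\right) = 0.0803.$$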
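The level of significance above can be checked numerically. The following is a minimal R sketch, assuming (as in the solution) that under $H_0$ the sum of the n = 10 observations is Poisson with mean 10 · 0.1 = 1; the variable names are illustrative only.

```r
# Under H0 the sum of n = 10 independent Poisson(0.1) counts is Poisson(1).
n <- 10
lambda0 <- 0.1

# Pr(sum >= 3 | H0) = 1 - Pr(sum <= 2 | H0)
alpha <- 1 - ppois(2, lambda = n * lambda0)

# The same quantity written out term by term, as in the hand calculation.
alpha_by_hand <- 1 - (exp(-1) + exp(-1) + exp(-1) / 2)

print(round(c(alpha = alpha, alpha_by_hand = alpha_by_hand), 4))
```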
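Solution 3.3: [wk07Q3, Exercise, Schedule] The density is actually that of a $N(\mu, 1)$ distribution
where the variance is known. The best critical region can be found by solving:
$$\frac{L(x_1,\ldots,x_n;\mu_0)}{L(x_1,\ldots,x_n;\mu_1)} = \frac{\left(1/\sqrt{2\pi}\right)^{n}\exp\left(-\frac{1}{2}\sum_{k=1}^{n}x_k^{2}\right)}{\left(1/\sqrt{2\pi}\right)^{n}\exp\left(-\frac{1}{2}\sum_{k=1}^{n}(x_k-1)^{2}\right)} = \exp\left(-\frac{1}{2}\left(\sum_{k=1}^{n}2x_k - n\right)\right) \le k.$$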
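$$\Pr\left(\text{Reject } H_0 \mid \mu = 0\right) = \alpha$$
since we know that when $\mu = 0$, $\sum_{k=1}^{n} X_k \sim N(0, n)$. Since the probability is equal to $\alpha$, we have
$k^{*}/\sqrt{n} = z_{1-\alpha}$. Thus, reject the null whenever:
$$\sum_{k=1}^{n} x_k \ge \sqrt{n}\,(z_{1-\alpha}).$$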
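1. Using the Neyman-Pearson lemma, the best critical region can be found by solving:
$$\frac{L(x_1,\ldots,x_n;\lambda_0)}{L(x_1,\ldots,x_n;\lambda_1)} = \frac{e^{-n\lambda_0}\lambda_0^{\sum x_k}/\prod x_k!}{e^{-n\lambda_1}\lambda_1^{\sum x_k}/\prod x_k!} = e^{-n(\lambda_0-\lambda_1)}\left(\frac{\lambda_0}{\lambda_1}\right)^{\sum x_k} \le k,$$
which after some manipulation will lead us to:
$$\sum x_k \log\left(\frac{\lambda_0}{\lambda_1}\right) \le \log\left(k e^{n(\lambda_0-\lambda_1)}\right)$$
$$\sum x_k \ge \frac{\log\left(k e^{n(\lambda_0-\lambda_1)}\right)}{\log\left(\lambda_0/\lambda_1\right)} = k^{*},$$
where the inequality reverses because
$$\lambda_1 > \lambda_0 \Longrightarrow \frac{\lambda_0}{\lambda_1} < 1 \Longrightarrow \log\left(\frac{\lambda_0}{\lambda_1}\right) < 0.$$
Thus, reject $H_0$ whenever $\sum x_k \ge k^{*}$, where $k^{*}$ is determined from:
$$\Pr\left(\sum_{k=1}^{n} x_k \ge k^{*} \,\Big|\, \lambda_0\right) = \alpha.$$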
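2. Note that the sum of independent Poisson random variables is again Poisson, with parameter equal
to the sum of the individual parameters, so under $H_0$ the test statistic satisfies
$\sum_{k=1}^{n} x_k \sim \text{Poisson}(n\lambda_0)$. Therefore, $k^{*}$ is determined from
$$k^{*} = \arg\max_{k}\left\{\sum_{x=k}^{\infty}\frac{e^{-n\lambda_0}(n\lambda_0)^{x}}{x!} \le \alpha\right\},$$
that is, the smallest $k$ whose upper-tail probability under $H_0$ does not exceed $\alpha$.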
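Because the Poisson distribution is discrete, the critical value has to be found by searching the upper tail. A minimal R sketch of that search follows; the helper name `poisson_critical_value` and the level alpha = 0.1 are assumptions made for illustration, while n = 10 and lambda0 = 0.1 are borrowed from Solution 3.2.

```r
# Hypothetical helper: smallest k with Pr(S >= k | H0) <= alpha,
# where S ~ Poisson(n * lambda0) under H0.
poisson_critical_value <- function(n, lambda0, alpha) {
  mu0 <- n * lambda0
  # qpois() returns the smallest x with F(x) >= 1 - alpha, so k = x + 1.
  k <- qpois(1 - alpha, lambda = mu0) + 1
  # Attained size of the test, Pr(S >= k | H0).
  size <- ppois(k - 1, lambda = mu0, lower.tail = FALSE)
  list(k = k, size = size)
}

# Illustrative call (alpha = 0.1 is an assumed level, not from the exercise).
res <- poisson_critical_value(n = 10, lambda0 = 0.1, alpha = 0.1)
print(res)
```

With these inputs the search returns the critical region "sum of the observations at least 3" used in Solution 3.2, with an attained size below the nominal level.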
1. (a) Test:
$$H_0 : \mu_M = \mu_A \quad \text{v.s.} \quad H_1 : \mu_M \neq \mu_A,$$
where $\mu_M$ is the population mean of the number of defectives in the morning and $\mu_A$ is
the population mean of the number of defectives in the afternoon. Note that we are asked
to test whether there is a difference between the means, which implies a two-sided
test. The test statistic uses the difference in means, given an unknown population variance,
which is assumed to be equal in the two samples. Note that the sample sizes are small, thus
we do not approximate the Student-t distribution with the standard normal one. Hence, the
test statistic is:
$$T = \frac{(\bar{X}_M - \bar{X}_A) - (\mu_M - \mu_A)}{S_p\sqrt{\frac{1}{n_M} + \frac{1}{n_A}}}$$
$$= \underbrace{\frac{(\bar{X}_M - \bar{X}_A) - (\mu_M - \mu_A)}{\sigma\sqrt{\frac{1}{n_M} + \frac{1}{n_A}}}}_{Z} \Bigg/ \sqrt{\underbrace{\frac{n^{\star} S_p^{2}}{\sigma^{2}}\Big/ n^{\star}}_{\chi^{2}(n^{\star})/n^{\star}}}, \quad \text{where } n^{\star} = n_A + n_M - 2$$
$$= \frac{\bar{X}_M - \bar{X}_A}{S_p\sqrt{\frac{1}{n_M} + \frac{1}{n_A}}} \sim t_{n_M + n_A - 2} \quad \text{(under } H_0\text{)}.$$
Note that the significance level is not given in this exercise. Thus we have to find the
p-value of the test. From Formulae and Tables page 163 we observe that the $1-\alpha$ quantile of the
Student-t distribution with 20 degrees of freedom takes the value 0.6870 for $\alpha = 0.25$ and
0.8600 for $\alpha = 0.20$. Therefore, the p-value is close to $2 \cdot 0.25 = 0.5$ (somewhat lower).
At the significance levels usually considered (0.1, 0.05 or 0.01) we would therefore not
reject the null hypothesis, i.e., there is no statistically significant difference
in the number of defectives in the morning compared with the afternoon.
(b) See below a dotchart (note: only the stars are enough). The stars represent the observations
(lower, black: morning observations; upper, blue: afternoon observations). The + signs correspond to
$\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, in the middle using the pooled sample standard deviation and the
upper and lower ones using the sample standard deviation of the individual (e.g. morning
or afternoon) sample. If the data are normal, then we know that roughly 95% of the observations should
lie in the interval $(\mu - 2\sigma, \mu + 2\sigma)$ and approximately 2/3 of the observations should
lie in the interval $(\mu - \sigma, \mu + \sigma)$.
[Figure: dotchart of the morning and afternoon observations; horizontal axis ticks from 5 to 30.]
From this dotchart we observe that the equal variance assumption seems reasonable and that the
normality assumption seems reasonable for the morning data. For the afternoon data, however,
the normality assumption seems questionable (perhaps due to the small number of observations):
the density does not appear hump-shaped (i.e., we do not observe more observations around the
mean) and the probability of large outliers is relatively large (i.e., we observe some excess
kurtosis).
2. (a) In this question we are interested in proportions, i.e., the probability of a defective screw.
We are testing:
$$H_0 : p_0 = p_1 \quad \text{v.s.} \quad H_1 : p_0 \neq p_1$$
where $p_0$ is the (population) probability of a defective in the 150 screws sample day and
$p_1$ is the (population) probability of a defective in the 100 screws sample day. Note that
we are asked to test whether there is a difference between the proportions, which implies
a two-sided test, and that $n$ and $np$ are large, so we can use the normal approximation.
The test statistic for the difference in proportions is (similar to the difference in means when
the variances are equal under the null):
$$Z = \frac{\hat{p}_0 - \hat{p}_1}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_0} + \frac{1}{n_1}\right)}} \sim N(0, 1).$$
Note the difference with the example in the lecture notes in week 7, where the proportion
under the null hypothesis is given. In this case both $\hat{p}_0$ and $\hat{p}_1$ are random variables. Under
the null hypothesis of equal proportions, the best estimate of this proportion, denoted by
$\hat{p}$, is given by the average proportion in the two samples combined.
The rejection region is $C = \{(x_1, \ldots, x_n) \mid Z \in (-\infty, -z_{1-\alpha/2}) \cup (z_{1-\alpha/2}, \infty)\}$.
Hence, we have:
$$n_0 = 22 \cdot 150 = 3300, \qquad n_1 = 20 \cdot 100 = 2000,$$
$$\hat{p}_0 = \frac{373}{22 \cdot 150} = 0.11303, \qquad \hat{p}_1 = \frac{232}{20 \cdot 100} = 0.116,$$
$$\hat{p} = \frac{373 + 232}{22 \cdot 150 + 20 \cdot 100} = \frac{605}{5300} = 0.11415.$$
Again, no level of significance is given, so we compute the p-value. From Formulae and
Tables page 160 we observe $\Phi(0.22) = 0.58706$. Hence, the p-value would be $2 \cdot (1 - 0.58706) = 0.82588$.
Thus the difference in the proportions is not significant at levels of
$\alpha < 0.82588$, which is usually the case.
(b) Now we have the following test:
$$H_0 : p = \tilde{p} = 0.09 \quad \text{v.s.} \quad H_1 : p = p_1 > \tilde{p}$$
Note that one can also set $H_0 : p = \tilde{p} \le 0.09$, but this complicates the test statistic. It
would lead to the same statistic and critical value.
The test statistic now corresponds with the one in the lecture notes:
$$Z = \frac{\hat{p}_1 - \tilde{p}}{\sqrt{\tilde{p}(1-\tilde{p})/n_1}} \sim N(0, 1).$$
The rejection region is $C = \{(x_1, \ldots, x_n) \mid Z \in (z_{1-\alpha}, \infty)\}$.
The value of this test statistic is, using that under the null hypothesis (see footnote 2) $p = \tilde{p} = 0.09$:
$$Z = \frac{0.116 - 0.09}{\sqrt{0.09 \cdot 0.91/2000}} = 4.063.$$
Footnote 2: In case of a composite null hypothesis $H_0 : p_1 \le 0.09$, you should select here the $p_1 \in (0, 0.09]$ which leads to the
highest Type I error (Pr(Reject $H_0$ | $H_0$ is true)), which is the highest if $p_1 = 0.09$.
From Formulae and Tables page 161 we observe $\Phi(4.06) = 0.99998$, thus the corresponding
p-value is $1 - \Phi(4.06) = 0.00002$. Hence, for levels of significance higher than 0.00002
(for example 5%) we can reject the null hypothesis that the proportion of defectives is 9%.
Hence there is strong evidence that the proportion of defectives is larger than 9%.
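The one-sided proportion test above can be reproduced in a few lines of R. This is a minimal sketch using only the figures quoted in the solution (232 defectives out of 2000 screws and a null proportion of 0.09); the variable names are illustrative.

```r
# Figures from the solution: 232 defectives in 2000 screws, null proportion 0.09.
p_hat1 <- 232 / 2000
p_null <- 0.09
n1 <- 2000

# Test statistic: standard error evaluated under the null hypothesis.
z <- (p_hat1 - p_null) / sqrt(p_null * (1 - p_null) / n1)

# One-sided p-value: we reject for large values of Z.
p_value <- pnorm(z, lower.tail = FALSE)

print(c(z = z, p_value = p_value))
```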
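Solution 3.6: [wk10Q10, Exercise, Schedule] For a single observation, note that $L(Y \mid \theta) = \theta y^{\theta-1}$.
Hence
$$\frac{L(Y \mid \theta = 2)}{L(Y \mid \theta = 1)} = \frac{2y}{1} = 2y, \quad 0 < y < 1.$$
The form of the critical region of the best test is
$$2y < k,$$
or equivalently
$$y < \frac{k}{2} = c.$$
To find $c$, note that $\alpha = 0.05$ is specified and
$$\Pr(Y < c \mid \theta = 2) = \int_0^{c} 2y \, dy = c^{2} = 0.05,$$
so that $c = \sqrt{0.05} = 0.2236$ and the critical region is
$$y < 0.2236.$$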
Solution 3.7: [wk08Q1, Exercise, Schedule] Distinction between terms: the level of significance is the
probability of committing a Type I error, while the power of the test is the probability that the null is
rejected when in fact it is false.
Solution 3.8: [wk08Q2, Exercise, Schedule] The power function is the probability of rejecting the
null hypothesis as a function of the parameter, while the size of the test is also the level of significance,
and is the probability of committing a Type I error.
1. Let $S = \sum_{k=1}^{10} X_k$, the total number of successes in the sample. Clearly, $S \sim \text{Binomial}(n = 10, p = \theta)$.
The power function is thus
$$\pi(\theta) = \Pr(S \ge 6 \mid \theta) = \sum_{k=6}^{10}\binom{10}{k}\theta^{k}(1-\theta)^{10-k},$$
which is clearly a function of $\theta$. A sketch of the power function for various values of the
parameter is given below. It was produced using the following R code:
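theta <- 0:100 / 200 + 0.5              # grid of theta values from 0.5 to 1
power <- 1 - pbinom(5, 10, theta)       # Pr(S >= 6 | theta)
par(lab = c(10, 10, 7))                 # approximate number of axis tick marks
plot(theta, power, type = "n",          # set up empty axes
     ylab = "Power", xlab = "theta")
lines(theta, power, col = 4)            # draw the power curve in blue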
[Figure: power function of the test; vertical axis "Power" (0.40 to 1.00), horizontal axis "theta".]
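$$\alpha = \Pr\left(\text{Reject } H_0 \mid \theta \le 0.5\right) \le \Pr\left(\text{Reject } H_0 \mid \theta = 0.5\right) = \sum_{k=6}^{10}\binom{10}{k}0.5^{k}(0.5)^{10-k} = 0.37695.$$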
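The size computed above is just the power function evaluated at theta = 0.5. A minimal R sketch, assuming the critical region "reject when S is at least 6" implied by the R code of this solution:

```r
# Size of the test: Pr(S >= 6 | theta = 0.5) with S ~ Binomial(10, theta).
size <- pbinom(5, size = 10, prob = 0.5, lower.tail = FALSE)

# The same probability as an explicit sum of binomial terms.
size_by_sum <- sum(choose(10, 6:10) * 0.5^10)

print(round(c(size = size, size_by_sum = size_by_sum), 5))
```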
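Solution 3.9: [wk08Q3, Exercise, Schedule] The power of the test is:
$$1 - \beta = 1 - \Pr\left(\text{Type II error}\right) \overset{*}{=} \Pr\left(\sum_{k=1}^{10} X_k \ge 3 \,\Big|\, H_1\right) \overset{**}{=} \sum_{x=3}^{\infty}\frac{e^{-5}5^{x}}{x!}$$
$$= 1 - \left(e^{-5} + 5e^{-5} + e^{-5}5^{2}/2\right) = 0.8753.$$
* using critical region: $C = \{(X_1, \ldots, X_{10}) : \sum_{k=1}^{10} x_k \ge 3\}$ and $H_1 : \lambda = 0.5$ from week 7 material.
** using $\sum_{k=1}^{10} X_k \mid H_1 \sim \text{POI}(n\lambda)$, which follows from $X_i \mid H_1 \sim \text{POI}(\lambda)$.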
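The power can be verified numerically. A minimal R sketch, assuming (as in the footnotes) that under $H_1$ the sum of the n = 10 observations is Poisson with mean 10 · 0.5 = 5:

```r
# Under H1 the sum of n = 10 independent Poisson(0.5) counts is Poisson(5).
n <- 10
lambda1 <- 0.5

# Power = Pr(sum >= 3 | H1) = 1 - Pr(sum <= 2 | H1)
power <- ppois(2, lambda = n * lambda1, lower.tail = FALSE)

# Term-by-term version of the hand calculation.
power_by_hand <- 1 - (exp(-5) + 5 * exp(-5) + exp(-5) * 5^2 / 2)

print(round(c(power = power, power_by_hand = power_by_hand), 4))
```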
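Solution 3.10: [wk08Q4, Exercise, Schedule] The power of the test is:
$$1 - \beta = 1 - \Pr\left(\text{Type II error}\right) \overset{*}{=} \Pr\left(\sum_{k=1}^{n} X_k \ge \sqrt{n}\, z_{1-\alpha} \,\Big|\, H_1\right)$$
$$\overset{**,\,***}{=} \Pr\left(Z \ge \frac{\sqrt{n}\,(z_{1-\alpha}) - n}{\sqrt{n}}\right) = \Pr\left(Z \ge z_{1-\alpha} - \sqrt{n}\right).$$
* using critical region: $C = \{(X_1, \ldots, X_n) : \sum_{k=1}^{n} x_k \ge \sqrt{n}\, z_{1-\alpha}\}$ derived in week 7 material; ** using
$H_1 : \mu = 1$ from week 7 material; and *** using $\sum_{k=1}^{n} X_k \mid H_1 \sim N(n, n)$, which follows from $X_i \mid H_1 \sim N(1, 1)$.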
This cannot be evaluated numerically unless of course the sample size is given.
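Once a sample size and a level are chosen, the power formula is straightforward to evaluate. The following minimal R sketch wraps it in a hypothetical helper `power_normal_test`; the sample sizes and the level alpha = 0.05 are illustrative assumptions, not values from the exercise.

```r
# Hypothetical helper: power of the test H0: mu = 0 vs H1: mu = 1 for N(mu, 1) data,
# i.e. Pr(Z >= z_{1 - alpha} - sqrt(n)).
power_normal_test <- function(n, alpha) {
  pnorm(qnorm(1 - alpha) - sqrt(n), lower.tail = FALSE)
}

# Illustrative values (assumed): the power increases with the sample size.
powers <- power_normal_test(n = c(4, 9, 16), alpha = 0.05)
print(round(powers, 4))
```

A quick sanity check on the formula: with n = 0 the expression collapses to the level alpha, as it should.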
$$= \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^{2} + \sum_{j=1}^{k} n_j(\bar{x}_j - \bar{x})^{2}$$
Solution 3.12: [wk08Q7, Exercise, Schedule] To test for the difference in weight loss across different
diet programs, we assume the one-way ANOVA model which states that $y_{ij}$, the weight loss of the jth
individual for diet program $i = A, B, C$, satisfies:
$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad \text{for } i = A, B, C, \text{ and } j = 1, 2, \ldots, n_i$$
where $\varepsilon_{ij}$ refers to the random error with the usual assumption of zero mean and constant variance.
One can easily verify the sample means across diet programs are:
which give the point estimates of the mean losses for each diet program, and the grand mean is:
$$\bar{y} = 5.611.$$
$$= (3 - 5.611)^{2} + (4 - 5.611)^{2} + \ldots = 84.28$$
Thus, to test:
we would reject the null hypothesis if the observed F-statistic $> F_{1-\alpha}(I - 1, N - I)$. Since
we then reject H0 and say that there is strong evidence that the mean losses across diet programs are
different.
Solution 3.13: [wk08Q8, Exercise, Schedule] The hypothesis is given in this question. To find the
test statistic we can apply the central limit theorem, because $n = 50$ is large. Therefore, the test statistic
is:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$
The rejection region is $C = \{(X_1, \ldots, X_n) : Z \in (-\infty, -z_{1-\alpha/2}) \cup (z_{1-\alpha/2}, \infty)\}$.
The value of the test statistic for the sample with $n = 50$, $\bar{x} = 207$, and $\sigma = 42$ is given by:
$$Z = \frac{207 - 200}{42/\sqrt{50}} = 1.178511.$$
From Formulae and Tables page 160 we observe $\Phi(1.17) = 0.87900$. We have a two-sided test,
therefore $p/2 = 1 - \Phi(1.17) = 0.121 \Rightarrow p = 0.242$, i.e., the p-value is 0.242.
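The same calculation can be done without the table. A minimal R sketch using the figures quoted in the solution; note that the table lookup rounds Z down to 1.17, so the exact p-value is slightly smaller than the tabulated 0.242.

```r
# Figures from the solution: n = 50, sample mean 207, sigma = 42, null mean 200.
n <- 50
x_bar <- 207
sigma <- 42
mu0 <- 200

z <- (x_bar - mu0) / (sigma / sqrt(n))

# Two-sided p-value using the exact normal distribution function.
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

print(c(z = z, p_value = p_value))
```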
$$F = \frac{\sigma_2^{2} S_1^{2}}{\sigma_1^{2} S_2^{2}} \sim F(n_1 - 1, n_2 - 1)$$
$$\overset{*}{=} \frac{S_1^{2}}{S_2^{2}} \sim F(n_1 - 1, n_2 - 1),$$
* using that, under the null hypothesis of equal variances, the ratio of the variances is equal to one.
The rejection region is $C = \{(x_1, \ldots, x_n) \mid F \in (0, 1/F_{1-\alpha/2}(n_2 - 1, n_1 - 1)) \cup (F_{1-\alpha/2}(n_1 - 1, n_2 - 1), \infty)\}$.
The upper critical value is given by $F(24, 29, 0.975) = 2.154$ and the lower is approximated by
$F(24, 29, 0.025) = 1/F(29, 24, 0.975) \approx 1/F(24, 24, 0.975) = 1/2.269 = 0.441$ (see Formulae and
Tables page 173); note that this is a two-sided test, therefore we use $1 - \alpha/2$ for constructing the critical values.
The value of the test statistic is:
$$F = \frac{s_1^{2}}{s_2^{2}} = \frac{139.7}{76.6} = 1.82.$$
We reject the null hypothesis for large and small values of F, which is not the case here. Hence, we cannot
reject the null hypothesis of equal variances at a 5% significance level.
1. There does not seem to be a relationship between age and incubation period for both individuals
who died and who survived. (There seems to be a (positive) relationship between surviving and
the incubation period, but this was not asked in this question).
2. For this we use the following dotplots (with the upper dots for the individuals who died (black
stars) and the lower for the individuals who survived (blue stars)).
[Figure: dotplot for individuals who died and who survived; horizontal axis "Age", ticks from 20 to 55.]
(a) The dotplot does not suggest a relationship between survival and age.
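[Figure: dotplot for individuals who died and who survived; horizontal axis labelled "Age" in the source, ticks from 20 to 80.]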
(b) The dotplot suggests a relationship between survival and incubation period, namely the
individuals who survived tended to have a longer incubation period.
3. We are interested in the difference in means, with unknown population standard deviation.
Therefore, to set up a test statistic we have to assume that the population variance of the
incubation period is the same for the individuals who survived and those who died (from the
dotcharts we observe that the variance for the individuals who died might be higher than for
those who survived). Under this assumption, and using the central limit theorem (which might
not be a good approximation because the number who survived is 7 and the number who died is 11,
i.e., the total sample size is 18) or when the incubation periods of both groups are normally
distributed (which might be a good approximation looking at the dotcharts), we have the following
test statistic:
$$T = \frac{(\bar{Y}_S - \bar{Y}_D) - (\mu_S - \mu_D)}{S_p\sqrt{\frac{1}{n_S} + \frac{1}{n_D}}} \sim t_{n_S + n_D - 2},$$
where $\bar{Y}_S$, $\mu_S$, $n_S$ are the sample mean, population mean, and sample size of the incubation period
for the individuals who survived, $\bar{Y}_D$, $\mu_D$, $n_D$ are the sample mean, population mean, and sample size of the
incubation period for the individuals who died, and $S_p$ is the pooled sample standard deviation. Note that the sample size is small, thus we
have to use the t-distribution and not the standard normal distribution. We have:
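$$n_S = 7, \qquad n_D = 11$$
$$\bar{y}_S = 339/7 = 48.429, \qquad \bar{y}_D = 305/11 = 27.727$$
$$s_S^{2} = \frac{n_S}{n_S - 1}\left(\frac{\sum y_S^{2}}{n_S} - \left(\frac{\sum y_S}{n_S}\right)^{2}\right) = \frac{7}{6}\left(\frac{19665}{7} - \left(\frac{339}{7}\right)^{2}\right) = 541.2857$$
$$s_D^{2} = \frac{n_D}{n_D - 1}\left(\frac{\sum y_D^{2}}{n_D} - \left(\frac{\sum y_D}{n_D}\right)^{2}\right) = \frac{11}{10}\left(\frac{10035}{11} - \left(\frac{305}{11}\right)^{2}\right) = 157.8182$$
$$s_p^{2} = \frac{(n_S - 1)s_S^{2} + (n_D - 1)s_D^{2}}{n_S + n_D - 2} = \frac{6 \cdot 541.2857 + 10 \cdot 157.8182}{16} = \frac{4825.8961}{16} = 301.6185$$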
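The $(1 - \alpha) \cdot 100\%$ confidence interval of the difference in means is given by:
$$(\bar{x}_S - \bar{x}_D) - t_{1-\alpha/2,\, n_S + n_D - 2}\; s_p\sqrt{\frac{1}{n_S} + \frac{1}{n_D}} < \mu_S - \mu_D < (\bar{x}_S - \bar{x}_D) + t_{1-\alpha/2,\, n_S + n_D - 2}\; s_p\sqrt{\frac{1}{n_S} + \frac{1}{n_D}}$$
$$20.702 - t_{1-\alpha/2,\, n_S + n_D - 2} \cdot 17.3672\sqrt{\frac{1}{7} + \frac{1}{11}} < \mu_S - \mu_D < 20.702 + t_{1-\alpha/2,\, n_S + n_D - 2} \cdot 17.3672\sqrt{\frac{1}{7} + \frac{1}{11}}$$
Using Formulae and Tables page 163 we observe $t_{0.975,16} = 2.120$ and $t_{0.995,16} = 2.921$. Thus
the 95% confidence interval for the difference in means is given by (2.9, 38.5) and the 99%
confidence interval for the difference in means is given by (−3.8, 45.2).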
The 95% confidence interval for the difference in mean does not include zero, hence when
testing the hypothesis of equal mean versus the alternative of a difference in mean (two-sided)
with a significance level of 5% we would reject the null hypothesis.
However, the 99% confidence interval for the difference in mean does include zero, hence when
testing the hypothesis of equal mean versus the alternative of a difference in mean (two-sided)
with a significance level of 1% we cannot reject the null hypothesis.
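The pooled variance and both confidence intervals can be reproduced from the summary statistics alone. A minimal R sketch, using the sums and sums of squares quoted in the solution (variable names are illustrative):

```r
# Summary statistics from the solution (incubation periods).
n_s <- 7;  sum_s <- 339; sumsq_s <- 19665   # survived
n_d <- 11; sum_d <- 305; sumsq_d <- 10035   # died

mean_s <- sum_s / n_s
mean_d <- sum_d / n_d

# Unbiased sample variances from sums and sums of squares.
var_s <- (sumsq_s - n_s * mean_s^2) / (n_s - 1)
var_d <- (sumsq_d - n_d * mean_d^2) / (n_d - 1)

# Pooled variance and standard error of the difference in means.
df <- n_s + n_d - 2
var_p <- ((n_s - 1) * var_s + (n_d - 1) * var_d) / df
se <- sqrt(var_p * (1 / n_s + 1 / n_d))

diff_means <- mean_s - mean_d
ci95 <- diff_means + c(-1, 1) * qt(0.975, df) * se
ci99 <- diff_means + c(-1, 1) * qt(0.995, df) * se

print(round(rbind(ci95, ci99), 1))
```

The 95% interval excludes zero while the 99% interval includes it, which matches the two test conclusions stated above.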
4. (a) We perform the test:
$$H_0 : \sigma_S^{2} = \sigma_D^{2} \quad \text{v.s.} \quad H_1 : \sigma_S^{2} \neq \sigma_D^{2}, \qquad \alpha = 5\%.$$
The test statistic is given by:
$$F = \frac{\sigma_D^{2} S_S^{2}}{\sigma_S^{2} S_D^{2}} \sim F(n_1 - 1, n_2 - 1)$$
$$\overset{*}{=} \frac{S_S^{2}}{S_D^{2}} \sim F(6, 10),$$
* using that, under the null hypothesis of equal variances, the ratio of the variances is equal to one.
The rejection region is $C = \{(x_1, \ldots, x_n) \mid F \in (0, 1/F_{1-\alpha/2}(n_2 - 1, n_1 - 1)) \cup (F_{1-\alpha/2}(n_1 - 1, n_2 - 1), \infty)\}$.
The upper critical value is given by $F(6, 10, 0.975) = 4.072$ and the lower is given by
$F(6, 10, 0.025) = 1/F(10, 6, 0.975) = 1/5.461 = 0.18312$ (see Formulae and Tables page
173); note that this is a two-sided test, therefore we use $1 - \alpha/2$ for constructing the critical values.
The value of the test statistic is:
$$F = \frac{s_S^{2}}{s_D^{2}} = \frac{541.2857}{157.8182} = 3.4298.$$
We reject the null hypothesis for large and small values of F, which is not the case here. Hence,
we cannot reject the null hypothesis of equal variances at a 5% significance level.
Note that $F(6, 10, 0.95) = 3.217$ (Formulae and Tables page 172), implying that we can
reject the null hypothesis of equal variances at a 10% significance level, and the p-value
is slightly smaller than 0.1.
(b) See answer question c).
Although the dotcharts suggest that there is a difference in variance, when formally testing
the hypothesis we cannot reject the null hypothesis of equal variances. This may be due to the
small sample size, which either causes the observed difference in sample variances when the
population variances are equal or, in case of unequal population variances, leads to a low
power of the test.
From the dotcharts we observe that the incubation period seems to be normally distributed
for both the sample of individuals who survived and the sample of individuals who died.
1. See below a dotchart (note: only the stars are enough). The stars represent the observations (upper,
black: Company A observations; lower, blue: Company B observations). The + signs correspond
to $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, in the middle using the pooled sample standard deviation and
the upper and lower ones using the sample standard deviation of the individual (e.g. Company
A or Company B) sample. If the data are normal, then we know that roughly 95% of the observations
should lie in the interval $(\mu - 2\sigma, \mu + 2\sigma)$ and approximately 2/3 of the observations
should lie in the interval $(\mu - \sigma, \mu + \sigma)$.
In order to apply the hypothesis test, the sample means of company A and company B should
be normally distributed with the same population variance. To justify normality of the sample
means of company A and company B we cannot use the CLT, because
that only holds for large n, which is not the case. Therefore, the sample mean is normally
distributed only if the underlying population is normally distributed.
From the dotcharts we observe that approximately 2/3 of the observations of both samples lie
within one sample standard deviation from the sample mean and no observations are smaller/larger
than the sample mean -/+ 2 times the sample standard deviation. There seems to be a
concentration of the observations around the sample mean (i.e., hump-shaped p.d.f.). Therefore, we
cannot reject the assumption that the premiums of company A and the
premiums of company B are normally distributed.
We observe that the sample variance of company A is larger than the sample variance of
company B, but this might be due to the small sample size. Hence, we cannot reject the assumption
of equal variance from the dotcharts.
2. Assuming that the premiums are normally distributed, the only test is the test for equal
variances. Hence, the hypothesis we test is:
$$H_0 : \sigma_A^{2} = \sigma_B^{2} \quad \text{v.s.} \quad H_1 : \sigma_A^{2} \neq \sigma_B^{2}, \quad \text{with } \alpha = 0.05.$$
The test statistic is given by:
$$F = \frac{\sigma_B^{2} S_A^{2}}{\sigma_A^{2} S_B^{2}} \sim F(n_A - 1, n_B - 1)$$
$$\overset{*}{=} \frac{S_A^{2}}{S_B^{2}} \sim F(9, 9),$$
* using that, under the null hypothesis of equal variances, the ratio of the variances is equal to one.
The rejection region is $C = \{(x_1, \ldots, x_{n_A + n_B}) \mid F \in (0, 1/F_{1-\alpha/2}(n_B - 1, n_A - 1)) \cup (F_{1-\alpha/2}(n_A - 1, n_B - 1), \infty)\}$.
The upper critical value is given by $F(9, 9, 0.975) = 4.026$ and the lower critical value by
$F(9, 9, 0.025) = 1/F(9, 9, 0.975) = 1/4.026 = 0.2484$ (see Formulae and Tables page 173); note that
this is a two-sided test, therefore we use $1 - \alpha/2$ for constructing the critical values.
The value of the test statistic is:
$$F = \frac{s_A^{2}}{s_B^{2}} = \frac{4303.4}{3461.7} = 1.243,$$
where
$$s_A^{2} = \frac{n_A}{n_A - 1}\left(\frac{\sum A^{2}}{n_A} - \left(\frac{\sum A}{n_A}\right)^{2}\right) = \frac{10}{9}\left(\frac{494126}{10} - \left(\frac{2134}{10}\right)^{2}\right) = 4303.4$$
and
$$s_B^{2} = \frac{n_B}{n_B - 1}\left(\frac{\sum B^{2}}{n_B} - \left(\frac{\sum B}{n_B}\right)^{2}\right) = \frac{10}{9}\left(\frac{541463}{10} - \left(\frac{2259}{10}\right)^{2}\right) = 3461.7.$$
We reject the null hypothesis for large and small values of F, which is not the case here. Hence, we
cannot reject the null hypothesis of equal variances at a 5% significance level. Therefore, it is
reasonable to assume that $\sigma_A^{2} = \sigma_B^{2}$.
Note that even $F(9, 9, 0.9) = 2.440$ (Formulae and Tables page 171), which means that we cannot reject the
null hypothesis of equal variances even at a level of significance of 20%.
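The variance-ratio test can be reproduced from the sums and sums of squares quoted above. A minimal R sketch (variable names are illustrative; the two-sided p-value line is an addition for illustration and is not part of the original solution):

```r
# Summary statistics from the solution (premiums of companies A and B).
n_a <- 10; sum_a <- 2134; sumsq_a <- 494126
n_b <- 10; sum_b <- 2259; sumsq_b <- 541463

# Unbiased sample variances.
var_a <- (sumsq_a - sum_a^2 / n_a) / (n_a - 1)
var_b <- (sumsq_b - sum_b^2 / n_b) / (n_b - 1)

f_stat <- var_a / var_b

# Two-sided critical values at the 5% level.
lower <- qf(0.025, n_a - 1, n_b - 1)
upper <- qf(0.975, n_a - 1, n_b - 1)

# Two-sided p-value (illustrative addition).
p_value <- 2 * min(pf(f_stat, n_a - 1, n_b - 1),
                   pf(f_stat, n_a - 1, n_b - 1, lower.tail = FALSE))

print(round(c(F = f_stat, lower = lower, upper = upper, p_value = p_value), 4))
```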
$$H_0 : \mu_B = \mu_A \quad \text{v.s.} \quad H_1 : \mu_B > \mu_A,$$
or (note, this will result in the same test statistic and the same critical value; see footnote 3)
$$H_0 : \mu_B \le \mu_A \quad \text{v.s.} \quad H_1 : \mu_B > \mu_A.$$
The corresponding test statistic (note that the sample size is small, hence the Student-t
distribution cannot be approximated by the standard normal distribution) is:
$$T = \frac{(\bar{X}_B - \bar{X}_A) - (\mu_B - \mu_A)}{S_p\sqrt{\frac{1}{n_B} + \frac{1}{n_A}}} = \frac{\bar{X}_B - \bar{X}_A}{S_p\sqrt{\frac{1}{n_B} + \frac{1}{n_A}}} \sim t_{n_B + n_A - 2}$$
Footnote 3: In case of a composite null hypothesis, you should select here the $\mu_A \in (-\infty, \mu_B]$ which leads to the highest Type I error
(Pr(Reject $H_0$ | $H_0$ is true)), which is the highest if $\mu_A = \mu_B$.
Note that the significance level is not given in this exercise. Thus we have to find the p-value
of the test. From Formulae and Tables page 163 we observe that the $1-\alpha$ quantile of the Student-t
distribution with 18 degrees of freedom takes the value 0.5338 for $\alpha = 0.3$ and 0.2571 for
$\alpha = 0.40$. Therefore, the p-value is between 0.3 and 0.4 (note: one-sided test). At the
significance levels usually considered (0.1, 0.05 or 0.01) we would therefore not reject the null
hypothesis, i.e., there is no statistically significant evidence that company B charges a larger
premium than company A.
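The one-sided pooled t-test can be evaluated exactly rather than bracketed from the table. A minimal R sketch, assuming the sample means 213.4 (company A) and 2259/10 = 225.9 (company B) and the sample variances 4303.4 and 3461.7 reported in the other parts of this solution:

```r
# Summary statistics taken from the other parts of this solution.
mean_a <- 213.4; var_a <- 4303.4; n_a <- 10
mean_b <- 225.9; var_b <- 3461.7; n_b <- 10

df <- n_a + n_b - 2
var_p <- ((n_a - 1) * var_a + (n_b - 1) * var_b) / df

# Test statistic for H0: mu_B = mu_A against H1: mu_B > mu_A.
t_stat <- (mean_b - mean_a) / sqrt(var_p * (1 / n_a + 1 / n_b))

# One-sided p-value: reject for large values of T.
p_value <- pt(t_stat, df, lower.tail = FALSE)

print(round(c(t = t_stat, p_value = p_value), 4))
```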
4. Let $p_A$, $p_B$ be the (population) proportions of premiums that are higher than
200 for Company A and B, respectively. In order to construct the confidence interval for the
difference in proportions we first need the test statistic:
$$Z = \frac{(\hat{p}_A - \hat{p}_B) - (p_A - p_B)}{\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A} + \frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}} \sim N(0, 1),$$
note that, under the null hypothesis, $n_A p_A = 5$ and $n_B p_B = 5$, which is the minimum requirement
as a rule of thumb for a reasonably good approximation of a Binomial random variable by
a normal random variable, which is used in the test. The corresponding (two-sided) confidence
interval is given by:
$$(\hat{p}_A - \hat{p}_B) - z_{1-\alpha/2}\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A} + \frac{\hat{p}_B(1-\hat{p}_B)}{n_B}} < (p_A - p_B) < (\hat{p}_A - \hat{p}_B) + z_{1-\alpha/2}\sqrt{\frac{\hat{p}_A(1-\hat{p}_A)}{n_A} + \frac{\hat{p}_B(1-\hat{p}_B)}{n_B}}$$
From the data we have $\hat{p}_A = 5/10 = 0.5$ and $\hat{p}_B = 6/10 = 0.6$, and from Formulae and Tables
page 162 $z_{0.975} = 1.96$.
$$-0.1 - 1.96\sqrt{0.25/10 + 0.24/10} < (p_A - p_B) < -0.1 + 1.96\sqrt{0.25/10 + 0.24/10}$$
$$-0.53 < (p_A - p_B) < 0.33,$$
thus the 95% confidence interval for the difference in the proportions of premiums charged higher
than 200 is given by (−0.53, 0.33).
This confidence interval contains the value zero, hence when testing the null hypothesis of equal
proportions of premiums charged higher than 200 versus a difference in proportions (two-sided
test), we cannot reject the null hypothesis at a 5% significance level.
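The interval can be recomputed directly from the two sample proportions. A minimal R sketch, using the counts 5/10 and 6/10 quoted in the solution (variable names are illustrative):

```r
# Sample proportions of premiums above 200, from the solution.
p_a <- 5 / 10
p_b <- 6 / 10
n_a <- 10
n_b <- 10

# Standard error of the difference (unpooled, as in the confidence interval formula).
se <- sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

# 95% confidence interval for p_A - p_B.
ci <- (p_a - p_b) + c(-1, 1) * qnorm(0.975) * se

print(round(ci, 2))
```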
5. Now, we have the following hypothesis:
$$H_0 : \mu_A = 170 \quad \text{v.s.} \quad H_1 : \mu_A > 170$$
The test statistic is (recall the small sample size $n_A = 10$, so we cannot approximate the Student-t
distribution by a standard normal one):
$$T = \frac{\bar{X}_A - \mu_A}{s_A/\sqrt{n_A}} \sim t_{n_A - 1}$$
$$\overset{*}{=} \frac{\bar{X}_A - 170}{s_A/\sqrt{n_A}} \sim t_{n_A - 1}$$
* assuming that the null hypothesis is true. Reject for large values of the statistic, i.e., the
rejection region is $C = \{(x_1, \ldots, x_{n_A}) \mid T \in (t_{1-\alpha}(n_A - 1), \infty)\}$.
The value of the test statistic, given the calculation in b) (i.e., $s_A^{2} = 4303.4$) and c) (i.e., $\bar{x}_A = 213.4$),
is given by:
$$T = \frac{213.4 - 170}{\sqrt{4303.4/10}} = 2.092$$
From Formulae and Tables page 163 we observe $t_{9,0.95} = 1.833$ and $t_{9,0.975} = 2.262$. Therefore,
the p-value lies between 2.5% and 5%, i.e., testing at a 5% significance level would reject
the null hypothesis of no increase in the premium, whereas testing at a 2.5% significance level
would not lead to a rejection of the null hypothesis of no increase in the premium.
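The exact p-value can be obtained instead of bracketing it between two table entries. A minimal R sketch, using the sample mean 213.4, sample variance 4303.4 and sample size 10 quoted in the solution:

```r
# Summary statistics for company A, from the solution.
x_bar <- 213.4
s2 <- 4303.4
n <- 10
mu0 <- 170

t_stat <- (x_bar - mu0) / sqrt(s2 / n)

# One-sided p-value with n - 1 = 9 degrees of freedom.
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)

print(round(c(t = t_stat, p_value = p_value), 4))
```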
Solution 3.17: [wk09Q1, Exercise, Schedule] The calculation of the expected value in each cell is
done in the table below. The expected value is simply the product of the row total and the column total,
divided by the grand total. You can easily verify the numbers:
From the chi-square table with degrees of freedom equal to (rows − 1) × (columns − 1) = 4, we have:
$\chi^2$ value = 9.49
at $\alpha = 5\%$. Thus, we would reject the null hypothesis of independence if the observed $\chi^2$ statistic
exceeds this critical value, and in this case it does. Therefore, we conclude that, based on the data, there is
strong evidence against the hypothesis that intelligence and manner of clothing are independent.
Using the likelihood ratio test we find the approximate chi-squared test statistic:
$$T = \sum_{i \in \{A,B\} \times \{I,II\}} \frac{(O_i - E_i)^{2}}{E_i} \sim \chi^2_1,$$
Note that the unconstrained model has two degrees of freedom, i.e., Pr(A = I) (which
results in Pr(A = II) = 1 − Pr(A = I) and is therefore no extra degree of freedom) and Pr(B = I)
(which results in Pr(B = II) = 1 − Pr(B = I) and is therefore no extra degree of freedom). In
the constrained model, i.e., under the null, we have Pr(I) = Pr(A = I) = Pr(B = I) and thus
Pr(A = II) = Pr(B = II) = 1 − Pr(I), hence the only parameter is Pr(I) and thus the constrained
model has one degree of freedom. We will reject the null hypothesis for large values of the test
statistic (interpretation: large values of the test statistic correspond to large values of $(O - E)^2$
and hence large deviations from what is expected under the null hypothesis, which is not likely).
In order to find the chi-squared test statistic, we have to find the observed and expected numbers. We
have that the sum of each row and the sum of each column equals 50. Therefore, under the null
hypothesis that the classification criteria were independent, the expected number in each cell
should be 50 · 1/2 = 25. The 1/2 is the probability that an observation (either in A or B) is
equal to I, which is 50/100 = 1/2 (using column totals); note that under the null hypothesis this is our
best estimate of the proportion. Thus we have the following observed and expected numbers:
Observed   I    II   Total        Expected   I    II   Total
A          22   28    50          A          25   25    50
B          28   22    50          B          25   25    50
Total      50   50   100          Total      50   50   100
and the corresponding observed minus expected:
Observed − Expected   I    II
A                     −3    3
B                      3   −3
Hence, the value of our test statistic is:
$$T = 4 \cdot \frac{3^{2}}{25} = 1.44.$$
From Formulae and Tables page 164 we observe that $\Pr(\chi^2_1 \le 1.44) = 0.77$. Hence, our p-value
is 1 − 0.77 = 0.23. Thus, for levels of significance of 0.23 or less (usually the case) there is no
evidence of dependence of the two criteria.
2. R code for the Pearson chi-squared test:
> data <- matrix(c(22, 28, 28, 22), nrow = 2, byrow = TRUE) # create 2 x 2 matrix of the data
> chisq.test(data, correct = FALSE) # perform the test without continuity correction
This is the same as we calculated in the previous question.
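To see that the hand calculation and `chisq.test` agree, the statistic can also be built up step by step. A minimal R sketch (variable names are illustrative):

```r
# Observed 2 x 2 table from the exercise.
observed <- matrix(c(22, 28, 28, 22), nrow = 2, byrow = TRUE,
                   dimnames = list(c("A", "B"), c("I", "II")))

# Expected counts under independence: (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic and its p-value on 1 degree of freedom.
t_stat <- sum((observed - expected)^2 / expected)
p_value <- pchisq(t_stat, df = 1, lower.tail = FALSE)

# Built-in test without continuity correction, for comparison.
builtin <- chisq.test(observed, correct = FALSE)

print(round(c(T = t_stat, p_value = p_value,
              builtin_T = unname(builtin$statistic)), 4))
```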
N11   N12   n1.
N21   N22   n2.
n.1   n.2   n
4. Step 1 (defining the hypothesis) and step 2 (defining the test statistic) have been done in question
i). We now need to find the corresponding p-value of this test. To do so we use the cumulative
distribution function. We find that Pr(X ≤ 22) = 0.15867 (see question iii). We would reject
the null hypothesis if Pr(X ≤ 22) ≤ α/2 or Pr(X ≤ 22) ≥ 1 − α/2. The smallest α for which this
holds is obtained from Pr(X ≤ 22) = α/2 = 0.15867, so the p-value is 2 · 0.15867 = 0.3173. Hence,
for reasonable levels of significance (less than 31%) the test cannot reject the null hypothesis
of independence.
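The hypergeometric probability and the resulting two-sided p-value can be checked in R. The sketch below is a minimal illustration; it relies on the symmetry of this particular table (all margins equal 50), for which doubling the lower tail coincides with the two-sided p-value reported by `fisher.test`.

```r
# Pr(X <= 22) for a Hypergeometric with 50 "I" items, 50 "II" items and 50 draws (row A).
p_lower <- phyper(22, m = 50, n = 50, k = 50)

# Two-sided p-value obtained by doubling the lower tail.
p_two_sided <- 2 * p_lower

# Fisher's exact test on the observed table, for comparison.
tab <- matrix(c(22, 28, 28, 22), nrow = 2, byrow = TRUE)
fisher_p <- fisher.test(tab)$p.value

print(round(c(p_lower = p_lower, p_two_sided = p_two_sided, fisher_p = fisher_p), 5))
```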
Solution 3.20: [wk09Q4, Exercise, Schedule] Neither test can reject the null hypothesis at reasonable
levels of significance. However, the p-values differ substantially, i.e., the p-value is 0.3173 in the
Fisher test and 0.2301 in the chi-squared test. This is because the chi-squared
test uses an approximate distribution (the normal approximation to the Binomial for the observed numbers), which
holds as $np \to \infty$; with x = 22 and x = 28 this should give a good approximation, which is why
we cannot reject the null hypothesis in either test. However, since the chi-squared test is only an
approximate test, this explains the difference in the p-values.
1. (a) There are two assumptions, which follow from the fact that a Binomial distribution is the sum
of i.i.d. Bernoulli random variables. Hence, the two assumptions are that the event of a
house being burgled is independent of other houses being burgled (the independent
part of i.i.d.) and that each house should have the same probability of being burgled
(the identically distributed part of i.i.d.).
(b) We have that the number of houses per street which are burgled has a Bin(6, p) distribution.
Each street is an observation of the random variable $X \sim \text{Bin}(6, p)$, the number of houses in
a street in the sample which were burgled in the past six months. Thus the likelihood function
is given by:
$$L(p; x) = \prod_{i=1}^{100} f_X(x_i)$$
$$= (\Pr(X = 0))^{39}\,(\Pr(X = 1))^{38}\,(\Pr(X = 2))^{18}\,(\Pr(X = 3))^{4}\,(\Pr(X = 4))^{0}\,(\Pr(X = 5))^{1}\,(\Pr(X = 6))^{0}$$
$$= \left(\binom{6}{0}p^{0}(1-p)^{6}\right)^{39}\left(\binom{6}{1}p^{1}(1-p)^{5}\right)^{38}\left(\binom{6}{2}p^{2}(1-p)^{4}\right)^{18}\left(\binom{6}{3}p^{3}(1-p)^{3}\right)^{4}$$
$$\times \left(\binom{6}{4}p^{4}(1-p)^{2}\right)^{0}\left(\binom{6}{5}p^{5}(1-p)^{1}\right)^{1}\left(\binom{6}{6}p^{6}(1-p)^{0}\right)^{0}$$
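$$\frac{\partial^{2}\ell(p; x)}{\partial p^{2}}\bigg|_{p = \hat{p}} = -\frac{91}{\hat{p}^{2}} - \frac{509}{(1-\hat{p})^{2}} < 0,$$
Hence, $\hat{p} = \frac{91}{600}$ is indeed the maximum of the log-likelihood function and hence the MLE.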
# of burgled houses per street   0      1      2      3     4     5     6
Observed # of streets            39     38     18     4     0     1     0
Expected # of streets            37.3   40.0   17.9   4.3   0.6   0.0   0.0
The observed and expected numbers of streets with 0, 1, . . . , 6 burgled houses are similar,
which suggests a good fit.
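The estimate and the fitted frequencies in the table can be reproduced as follows. This is a minimal R sketch; the numerical maximisation with `optimize` is included only as an illustrative cross-check of the closed-form estimate 91/600.

```r
# Data: number of streets with 0, 1, ..., 6 burgled houses (6 houses sampled per street).
houses  <- 0:6
streets <- c(39, 38, 18, 4, 0, 1, 0)

# Closed-form MLE: total burgled houses divided by total houses sampled.
p_hat <- sum(houses * streets) / (6 * sum(streets))

# Numerical cross-check: maximise the binomial log-likelihood directly.
loglik <- function(p) sum(streets * dbinom(houses, size = 6, prob = p, log = TRUE))
opt <- optimize(loglik, interval = c(0.01, 0.99), maximum = TRUE)

# Fitted (expected) number of streets under Bin(6, p_hat).
expected <- sum(streets) * dbinom(houses, size = 6, prob = p_hat)

print(round(p_hat, 4))
print(round(expected, 1))
```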
H0 : p = 0.18 provides a good fit v.s. H1 : p = 0.18 does not provide a good fit
For a goodness of fit test, we use the chi-squared test statistic with k bins:
k
X (Oi Ei )2
T= 2k1 ,
i=1
Ei
Note that in this hypothesis $p$ is given and thus not estimated, hence we do not have to reduce the degrees of freedom by the number of parameters estimated. We reject the null hypothesis for large values of the test statistic.
Similar to question a)iii), i.e., $X \sim \text{Bin}(n, p)$ with $n = 6$, but now with $p = 0.18$, we have the probabilities for $x = 0, 1, \ldots, 6$: 0.3040, 0.4004, 0.2197, 0.0643, 0.0106, 0.0009, 0.000. Hence, the expected number of streets, given $p = 0.18$, with the number of houses burgled equal to $x = 0, 1, \ldots, 6$ is given by $m = 100$ times this probability. From this we can construct the following table:
# burgled houses per street      0      1      2     3     4     5    6
Observed # of streets           39     38     18     4     0     1    0
Expected # of streets        30.40  40.04  21.97  6.43  1.06  0.09  0.0
The expected number of streets is less than 5 for 4, 5, and 6 burgled houses per street. Therefore, we have to aggregate cells so that every cell has an expected number of streets larger than or equal to 5. Aggregating cells 4, 5, and 6 would only lead to an aggregate of 1.15, which is also substantially smaller than 5; therefore we aggregate cells 3, 4, 5, and 6 (i.e., 3 or more burgled houses in a street), which results in an aggregate of 7.58.
# burgled houses per street      0      1      2    3+
Observed # of streets           39     38     18     5
Expected # of streets        30.40  40.04  21.97  7.58
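The aggregation and the resulting test statistic can be checked numerically. The following is only a sketch in Python (the variable names are illustrative and not part of the original solution); it recomputes the binomial probabilities without rounding, so the aggregated expected count differs slightly from the rounded 7.58 in the table.

```python
from math import comb

n_houses, p0, n_streets = 6, 0.18, 100
observed = [39, 38, 18, 4, 0, 1, 0]  # streets with 0, 1, ..., 6 burgled houses

# Binomial probabilities Pr(X = x) for x = 0..6 under H0: p = 0.18
probs = [comb(n_houses, x) * p0**x * (1 - p0) ** (n_houses - x)
         for x in range(n_houses + 1)]
expected = [n_streets * q for q in probs]

# Aggregate cells 3, 4, 5 and 6 so that every expected count is at least 5
obs_agg = observed[:3] + [sum(observed[3:])]
exp_agg = expected[:3] + [sum(expected[3:])]

# Pearson chi-squared statistic; p is given (not estimated), so df = k - 1
T = sum((o - e) ** 2 / e for o, e in zip(obs_agg, exp_agg))
df = len(obs_agg) - 1
print([round(e, 2) for e in exp_agg], round(T, 3), df)
```

The statistic is then compared with the upper quantile of the chi-squared distribution with `df` degrees of freedom at the chosen level of significance.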
# PERFORM TEST
> chisq.test(Burgled2, p=BurgledPred2, rescale.p=T)
# chisq.test(x = observed counts, p = probabilities (or expected counts) under the null hypothesis;
# if rescale.p is TRUE then p is rescaled (if necessary) to sum to 1; if FALSE and p does not sum to 1, an error is given)
1. We are looking at 3 populations here: the MR (marginally rich), the CR (comfortably rich) and
the SR (super rich). For each of these populations we group members into one of four education
groups, thus creating a multinomial classification of each of the populations.
2. The observed proportions are given by each cell count divided by its column total.
Or equivalently:
where $f = 6$ is the degrees of freedom. Note that we have to use the (maximum likelihood) estimates of the proportions to find the expected frequencies, and that the proportion of (for example) PG given education level can be computed from the proportions of NC, SC, and UG given education level. Therefore, the degrees of freedom of the test is equal to $(4 - 1) \times (3 - 1)$. Note, we reject the null hypothesis for large values of the test statistic.
The critical value of our test statistic is given by $\Pr(\chi^2_6 \geq 16.81) = 0.01$ (see Formulae and Tables page 169). Hence, we reject the null hypothesis if $T > 16.81$, i.e., $C = (16.81, \infty)$.
To calculate the test statistic we use the expected numbers under the null hypothesis:
Note all expected cell values are larger than 5, so we do not have to combine cells. The value of our test statistic is given by:
$$
\begin{aligned}
T &= \frac{(32-25)^2 + (20-25)^2 + (23-25)^2}{25} + \frac{(13-10)^2 + (16-10)^2 + (1-10)^2}{10} \\
&\quad + \frac{(43-51.33)^2 + (51-51.33)^2 + (60-51.33)^2}{51.33} \\
&\quad + \frac{(12-13.67)^2 + (13-13.67)^2 + (16-13.67)^2}{13.67} = 19.17233
\end{aligned}
$$
We observe that the value of the test statistic is in the rejection region $C$, so we reject the null hypothesis that the probabilities of MR, CR, and SR given the education level are equal, at a level of significance of 1%.
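As a cross-check of the arithmetic, the same statistic can be recomputed from the 4 x 3 table of counts. This is a minimal Python sketch, not the course's own code; the row and column ordering is assumed to follow the counts used in the formula above.

```python
# Rows: the four education groups; columns: MR, CR, SR (ordering assumed)
rich = [
    [32, 20, 23],
    [13, 16, 1],
    [43, 51, 60],
    [12, 13, 16],
]

row_totals = [sum(row) for row in rich]
col_totals = [sum(col) for col in zip(*rich)]
grand_total = sum(row_totals)

# Expected count under H0: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-squared statistic summed over all cells
T = sum(
    (rich[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(rich))
    for j in range(len(col_totals))
)
df = (len(rich) - 1) * (len(col_totals) - 1)
reject = T > 16.81  # critical value quoted in the text for chi-squared(6) at 1%
print(round(T, 5), df, reject)
```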
4. We are asked to find the 95% confidence interval for the difference in proportions with at least
an undergraduate degree for individuals who are marginally and super rich. The corresponding
number of observations are given in the table below
Using the CLT (a proportion is a mean) the pivotal quantity is given by:
$$T = \frac{(\hat{p}_{MR} - \hat{p}_{SR}) - (p_{MR} - p_{SR})}{\sqrt{\dfrac{\hat{p}_{MR}(1 - \hat{p}_{MR})}{n_{MR}} + \dfrac{\hat{p}_{SR}(1 - \hat{p}_{SR})}{n_{SR}}}} \sim N(0, 1).$$
5. Part c)
> rich <- matrix(c(32,20,23,13,16,1,43,51,60,12,13,16),nrow=4,byrow=T)
> E <- chisq.test(rich,correct=F)$expected;print(E) #(displays the expected cell values, use this to check whether all cells are >= 5)
> chisq.test(rich,correct=F)
Part d)
> p1.hat <- sum(rich[3:4,1])/100
> p3.hat <- sum(rich[3:4,3])/100
> diff <- p1.hat-p3.hat
> lower <- diff+qnorm(.025)*sqrt(p1.hat*(1-p1.hat)/100+p3.hat*(1-p3.hat)/100)
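For readers without R, the Part d) interval can be sketched in Python with the standard library only. The assumption here, taken from the R indexing `rich[3:4, ]`, is that rows 3 and 4 of the table are the groups with at least an undergraduate degree and that each wealth group has 100 observations.

```python
from math import sqrt
from statistics import NormalDist

n_mr = n_sr = 100
p_mr = (43 + 12) / n_mr   # marginally rich, rows 3 and 4 of column 1
p_sr = (60 + 16) / n_sr   # super rich, rows 3 and 4 of column 3

diff = p_mr - p_sr
se = sqrt(p_mr * (1 - p_mr) / n_mr + p_sr * (1 - p_sr) / n_sr)

# Two-sided 95% interval based on the normal approximation
z = NormalDist().inv_cdf(0.975)
lower, upper = diff - z * se, diff + z * se
print(round(diff, 4), round(lower, 4), round(upper, 4))
```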
Module 4
Linear Regression
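where the $\epsilon_i$'s are independent and identically distributed normal random variables with $E[\epsilon_i] = 0$ and $\text{Var}(\epsilon_i) = \sigma^2$.
1. Rewrite the exponential regression model as a linear regression model with parameters $\alpha$ and $\beta$ and describe the relationship between $\alpha$ and $\alpha_0$ and the relationship between $\beta$ and $\beta_0$.
Derive the following from the linear regression model:
3. $E[\hat{\beta} | X = x] = \beta$
4. $\text{Var}(\hat{\beta} | X = x) = \sigma^2 / S_{xx}$
5. $(\hat{\beta} | X = x) \sim N(\beta, \sigma^2 / S_{xx})$
6. $E[\hat{\alpha} | X = x] = \alpha$
7. $\text{Var}(\hat{\alpha} | X = x) = \left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)\sigma^2$
8. $(\hat{\alpha} | X = x) \sim N\left(\alpha, \left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)\sigma^2\right)$
9. $\text{Cov}(\hat{\alpha}, \hat{\beta} | X = x) = -\dfrac{\bar{x}\,\sigma^2}{S_{xx}}$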
Exercise 4.2: [wk10Q2, Solution, Schedule] Forensic scientists use various methods for determin-
ing the likely time of death from post-mortem examination of human bodies. A recently suggested
objective method uses the concentration of a compound (3-methoxytyramine or 3-MT) in a particular
1. Construct a scatterplot of the data. Comment on any interesting features of the data and discuss
briefly whether linear regression is appropriate to model the relationship between concentration
of 3-MT and the interval from death.
2. Calculate the correlation coefficient for the data, and use it to test the null hypothesis that the
population correlation coefficient is equal to zero.
3. Calculate the equation of the least-squares fitted regression line and use it to estimate the con-
centration of 3-MT:
(a) after 1 day and
(b) after 2 days.
Comment briefly on the reliability of these estimates.
4. Calculate a 99% confidence interval for the slope of the regression line. Using this confidence
interval, test the hypothesis that the slope of the regression line is equal to zero. Comment on
your answer in relation to the answer given in part (2) above.
3. Now consider an estimator $\hat{\beta}_3$ of $\beta$ which is a linear function of the responses, i.e. an estimator which has the form $\hat{\beta}_3 = \sum_{i=1}^{n} a_i Y_i$, where $a_1, \ldots, a_n$ are constants.
Exercise 4.4: [wk10Q4, Solution, Schedule] A university wishes to analyse the performance of its
students on a particular degree course. It records the scores obtained by a sample of 12 students at
the entry to the course, and the scores obtained in their final examinations by the same students. The
results are as follows:
Student A B C D E F G H I J K L
Entrance exam score x (%) 86 53 71 60 62 79 66 84 90 55 58 72
Final paper score y (%) 75 60 74 68 70 75 78 90 85 60 62 70
2. Assuming the full normal model, calculate an estimate of the error variance $\sigma^2$ and obtain a 90% confidence interval for $\sigma^2$.
3. By considering the slope parameter, formally test whether the data is positively correlated.
4. Find a 95% confidence interval for the mean finals paper score corresponding to an individual
entrance score of 53.
5. Test whether these data come from a population with a correlation coefficient equal to 0.75.
6. Calculate the proportion of variance explained by the model. Hence, comment on the fit of the
model.
Exercise 4.5: [wk10Q5, Solution, Schedule] Complete the following ANOVA table for a simple
linear regression with 60 observations:
Exercise 4.6: [wk10Q6, Solution, Schedule] Suppose you are interested in relating the accounting
variable EPS (earnings per share) to the market variable STKPRICE (stock price). Then, a regression
equation was fitted using STKPRICE as the response variable with EPS as the regressor variable.
Following is the computer output from your fitted regression. You are also given that: $\bar{x} = 2.338$, $\bar{y} = 40.21$, $s_x = 2.004$, and $s_y = 21.56$.
Regression Analysis
The regression equation is
STKPRICE = 25.044 + 7.445 EPS
Analysis of Variance
SOURCE DF SS MS F p
Regression 1 10475 10475 42.35 0.000
Error 46 11377 247
Total 47 21851
Exercise 4.7: [wk10Q7, Solution, Schedule] (Modified from an Institute of Actuaries exam problem)
An insurance company issues house buildings policies for houses of similar size in four different
post-code regions A, B, C, and D. An insurance agent takes independent random samples of 10 house
buildings policies for houses of similar size in each of the four regions. The annual premiums (in
dollars) were as follows:
Region A: 229 241 270 256 241 247 261 243 272 219
$\sum x = 2,479$, $\sum x^2 = 617,163$
Region B: 261 269 284 268 249 255 237 270 269 257
$\sum x = 2,619$, $\sum x^2 = 687,467$
Region C: 253 247 244 245 221 229 245 256 232 269
$\sum x = 2,441$, $\sum x^2 = 597,607$
Region D: 279 268 290 245 281 262 287 257 262 246
$\sum x = 2,677$, $\sum x^2 = 718,973$
Perform a one-way analysis of variance at the 5% level to compare the premiums for all four regions.
State briefly the assumptions required to perform this analysis of variance.
Exercise 4.8: [wk10Q8, Solution, Schedule] You are given the following one-way ANOVA model:
$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad \text{for } i = 1, \ldots, I \text{ and } j = 1, \ldots, J$$
where the error terms $\epsilon_{ij}$ are i.i.d. normal random variables with mean 0 and common variance $\sigma^2$.
Using fundamental principles of maximum likelihood, derive the maximum likelihood estimates for
all parameters in the model.
Exercise 4.9: [wk10Q9, Solution, Schedule] For the one-way ANOVA model derive the following
maximum likelihood estimators:
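1. $\hat{\mu} = \bar{Y} = \dfrac{\sum_{i=1}^{I} \sum_{j=1}^{n_i} Y_{ij}}{\sum_{i=1}^{I} n_i} = \dfrac{\sum_{i=1}^{I} \sum_{j=1}^{n_i} Y_{ij}}{N}$
2. $\hat{\alpha}_i = \bar{Y}_{i\cdot} - \bar{Y} = \dfrac{\sum_{j=1}^{n_i} Y_{ij}}{n_i} - \bar{Y}$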
Exercise 4.10: [wk10Q11, Solution, Schedule] Past Institute Exam (April 2005)
As part of an investigation into health service funding a working party was concerned with the issue
of whether mortality could be used to predict sickness rates. Data on standardised mortality rates and
standardised sickness rates were collected for a sample of 10 regions and are shown in the table below:
Data summaries:
$\sum m = 1136.1$, $\sum m^2 = 129,853.03$, $\sum s = 1934.2$, $\sum s^2 = 377,700.62$, and $\sum ms = 221,022.58$.
1. Calculate the correlation coefficient between the mortality rates and the sickness rates and determine the probability-value for testing whether the underlying correlation coefficient is zero against the alternative that it is positive.
2. Noting the issue under investigation, draw an appropriate scatterplot for these data and comment
on the relationship between the two rates.
3. Determine the fitted linear regression of sickness rate on mortality rate and test whether the
underlying slope coefficient can be considered to be as large as 2.0.
4. For a region with mortality rate 115.0, estimate the expected sickness rate and calculate 95%
confidence limits for this expected rate.
Exercise 4.11: [wk10Q12, Solution, Schedule] Past Institute Exam (September 2005)
The data given in the following table are the number of deaths from AIDS in Australia for 12 consec-
utive quarters starting from the second quarter of 1983.
Quarter (i) 1 2 3 4 5 6 7 8 9 10 11 12
Number of deaths (ni ) 1 2 3 1 4 9 18 23 31 20 25 37
$$E[n_i] = \gamma i^2$$
might be appropriate for these data, where $\gamma$ is a parameter to be estimated from the data above.
She has proposed two methods for estimating $\gamma$, and these are given in part i. and ii. below.
(a) Show that the least squares estimate of $\gamma$, obtained by minimising $q = \sum_{i=1}^{12} (n_i - \gamma i^2)^2$, is given by:
$$\hat{\gamma} = \frac{\sum_{i=1}^{12} i^2 n_i}{\sum_{i=1}^{12} i^4}.$$
(b) Show that an alternative (weighted) least squares estimate of $\gamma$, obtained by minimising $q^* = \sum_{i=1}^{12} \dfrac{(n_i - \gamma i^2)^2}{i^2}$, is given by:
$$\tilde{\gamma} = \frac{\sum_{i=1}^{12} n_i}{\sum_{i=1}^{12} i^2}.$$
3. To assess whether the single parameter model which was used in part b) is appropriate for the
data, a two parameter model is considered. The model is of the form:
$$E[N_i] = \gamma i^{\theta}$$
for $i = 1, \ldots, 12$.
$$E[Y_i] = \alpha + \beta x_i$$
is used, where $x_i = \log(i)$ and $Y_i = \log(N_i)$ for $i = 1, \ldots, 12$. Relate the parameters $\gamma$ and $\theta$ to the regression parameters $\alpha$ and $\beta$.
(b) The least squares estimates of $\alpha$ and $\beta$ are -0.6112 and 1.6008 with standard errors 0.4586 and 0.2525 respectively (you are not asked to verify these results).
Using the value for the estimate $\hat{\beta}$, conduct a formal statistical test to assess whether the form of the model suggested in (b) is adequate.
Group                     1            2            3            4
Claim sizes y         0.11 0.46    0.52 1.43    1.48 2.05    1.52 2.36
                      0.71 1.45    1.84 2.47    2.38 3.31    2.95 4.08
I: sum assured x          1            2            3            4
II: Company               A            B            C            D

Group             1         2         3         4
$\sum y$       2.73      6.26      9.22     10.91
$\sum y^2$   2.8303   11.8018   23.0134   33.2289

$$Y_i = \alpha + \beta x_i + \epsilon_i$$
where $Y_i$ is the $i$th claim size and $x_i$ is the corresponding sum assured, $i = 1, \ldots, 16$.
(a) Calculate the total sum of squares and its partition into the regression (model) sum of
squares and the residual (error) sum of squares.
(b) Fit the model and calculate the fitted values for the first claim size of group 1 (namely
0.11) and the last claim size of group 4 (namely 4.08).
(c) Consider a test of the hypothesis $H_0: \beta = 0$ against a two-sided alternative. By performing appropriate calculations, assess the strength of the evidence against this "no linear relationship" hypothesis.
2. In scenario II, suppose we adopt the analysis of variance model
$$Y_{ij} = \mu + \tau_i + e_{ij}$$
where $Y_{ij}$ is the $j$th claim size for company $i$ and $\tau_i$ is the $i$th company effect, i = 1, 2, 3, 4 and
j = A, B, C, D.
(a) Calculate the partition of the total sum of squares into the between companies (model) sum of squares and the within companies (residual/error) sum of squares.
(b) Fit the model.
(c) Calculate the fitted values for the first claim size of group 1 and the last claim size of group 4.
(d) Consider a test of hypothesis $H_0: \tau_i = 0$, i = A, B, C, D against a general alternative. By performing appropriate calculations, assess the strength of the evidence against this "no company effects" hypothesis.
$$Y_k = \beta x_k + \epsilon_k, \quad \text{for each } k = 1, 2, \ldots, n,$$
that is, the regression with one regressor variable but without the intercept term. This model
is called regression through the origin because the true regression line passes through the point (0,0).
Derive the least squares estimate of $\beta$.
Now, consider the quadratic regression model passing through the origin;
Exercise 4.14: [wk11Q2, Solution, Schedule] Use the following steps to establish a relationship
between the coefficient of determination and the correlation coefficient:
1. Show that:
$$\hat{y}_k - \bar{y} = \hat{\beta}\,(x_k - \bar{x}).$$
$$R^2 = \hat{\beta}^2\,\frac{s_x^2}{s_y^2} = r^2.$$
Exercise 4.15: [wk11Q3, Solution, Schedule] In the regression model $Y_k = \alpha + \beta x_k + \epsilon_k$, use algebra to establish the following results:
1. $R^2 = 1 - \dfrac{n-2}{n-1}\,\dfrac{s^2}{s_y^2}$, where $s_y^2$ is the sample variance of $Y$.
2. $s = s_y \sqrt{\dfrac{n-1}{n-2}\left(1 - r^2\right)}$, where $s_y$ is the sample standard deviation of $Y$.
3. $t(\hat{\beta}) = \dfrac{\hat{\beta}}{se(\hat{\beta})} = \sqrt{n-2}\,\dfrac{r}{\sqrt{1 - r^2}}$
1. Write down the design matrix for the simple linear regression model.
2. Write out the matrix $X^\top X$ for the simple linear regression model.
3. Write out the matrix $X^\top Y$ for the simple linear regression model.
4. Write out the matrix $(X^\top X)^{-1}$ for the simple linear regression model.
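To make the four matrices concrete, the sketch below builds them in plain Python for the entrance and final score data of Exercise 4.4. The choice of data set and all variable names are illustrative assumptions; the algebraic forms asked for in the exercise still need to be written out by hand.

```python
# Exercise 4.4 data, used here purely as an illustration
x = [86, 53, 71, 60, 62, 79, 66, 84, 90, 55, 58, 72]
y = [75, 60, 74, 68, 70, 75, 78, 90, 85, 60, 62, 70]
n = len(x)

# Design matrix: a column of ones (intercept) and a column with the regressor
X = [[1.0, float(xi)] for xi in x]

# X'X and X'Y computed entry by entry
XtX = [[sum(row[a] * row[b] for row in X) for b in range(2)] for a in range(2)]
XtY = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(2)]

# Inverse of the 2 x 2 matrix X'X via the adjugate formula
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
XtX_inv = [[XtX[1][1] / det, -XtX[0][1] / det],
           [-XtX[1][0] / det, XtX[0][0] / det]]

# Least squares estimates (alpha_hat, beta_hat) = (X'X)^{-1} X'Y
alpha_hat = XtX_inv[0][0] * XtY[0] + XtX_inv[0][1] * XtY[1]
beta_hat = XtX_inv[1][0] * XtY[0] + XtX_inv[1][1] * XtY[1]
print(XtX, XtY, round(alpha_hat, 3), round(beta_hat, 5))
```

The entries of `XtX` are $n$, $\sum x_i$ and $\sum x_i^2$, and those of `XtY` are $\sum y_i$ and $\sum x_i y_i$, which is the pattern the exercise asks for.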
Exercise 4.17: [wk11Q5, Solution, Schedule] The following model was fitted to a sample of super-
markets in order to explain their profit levels:
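$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon$$
where
$$\hat{\beta}_1 = 0.027 \quad\text{and}\quad \hat{\beta}_2 = 0.097 \quad\text{and}\quad \hat{\beta}_3 = 0.525.$$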
(E) An increase in store size by one square foot increases profits by 52.5 cents.
Exercise 4.18: [wk11Q6, Solution, Schedule] In a regression model of three explanatory variables,
twenty-five observations were used to calculate the least squares estimates. The total sum of squares
and regression sum of squares were found to be 666.98 and 610.48, respectively. Calculate the ad-
justed coefficient of determination.
(A) 89.0%
(B) 89.4%
(C) 89.9%
(D) 90.3%
(E) 90.5%
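The arithmetic behind this exercise can be sketched as follows. This is a hedged illustration only: it assumes the model includes an intercept, so that three explanatory variables give $p = 4$ estimated parameters, and it uses the usual definition of the adjusted coefficient of determination as one minus the ratio of the error mean square to the total mean square.

```python
n, k = 25, 3              # observations and explanatory variables
sst, ssm = 666.98, 610.48  # total and regression sums of squares
sse = sst - ssm

p = k + 1                  # parameters including the intercept (assumption)
r2 = ssm / sst
# Adjusted R^2 penalises the error and total sums of squares by their df
adj_r2 = 1 - (sse / (n - p)) / (sst / (n - 1))
print(round(r2, 4), round(adj_r2, 4))
```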
Exercise 4.19: [wk11Q7, Solution, Schedule] In a multiple regression model given by:
$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_{p-1} x_{p-1} + \epsilon,$$
which of the following gives a correct expression for the coefficient of determination?
I. $\dfrac{SSM}{SST}$
II. $\dfrac{SST - SSE}{SST}$
III. $\dfrac{SSM}{SSE}$
(A) I only
(B) II only
Exercise 4.20: [wk11Q8, Solution, Schedule] The ANOVA table output from a multiple regression
model is given below:
ANOVA Table
Source D.F. SS MS F-Ratio Prob(> F)
Regression 5 13326.1 2665.2 13.13 0.000
Error 42 8525.3 203.0
Total 47 21851.4
(A) 52%
(B) 56%
(C) 61%
(D) 63%
(E) 68%
Exercise 4.21: [wk11Q9, Solution, Schedule] You have information on 62 purchases of Ford automobiles. In particular, you have the amount paid for the car ($y$) in hundreds of dollars, the annual income of the individuals ($x_1$) in hundreds of dollars, the sex of the purchaser ($x_2$, 1 = male and 0 = female), and whether or not the purchaser graduated from college ($x_3$, 1 = yes and 0 = no). After examining the data and other information available, you decide to use the regression model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \epsilon.$$
(A) 0.17
(B) 17.78
(C) 50.04
(D) 55.54
(E) 57.43
Exercise 4.22: [wk11Q10, Solution, Schedule] Suppose in addition to the information in question
9., you are given:
$$X^\top Y = \begin{pmatrix} 9\,558 \\ 4\,880\,937 \\ 7\,396 \\ 6\,552 \end{pmatrix}.$$
Calculate the expected difference in the amount spent to purchase a car between a person who gradu-
ated from college and another one who did not.
(A) 233.5
(B) 1 604.3
(C) 2 195.3
(D) 4 920.6
(E) 6 472.1
Exercise 4.23: [wk11Q11, Solution, Schedule] A regression model of y on four independent vari-
ables x1 , x2 , x3 and x4 has been fitted to a data consisting of 212 observations and the computer output
Regression Analysis
The regression equation is
y = 3894 - 50.3 x1 + 0.0826 x2 + 0.893 x3 + 0.137 x4
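Exercise 4.24: [wk11Q12, Solution, Schedule] In a multiple regression model, which of the following gives a correct expression for the unbiased estimate of $\sigma^2$?
(A) $\dfrac{1}{n-p+1}\,(Y - X\hat{\beta})^\top (Y - X\hat{\beta})$
(B) $\dfrac{1}{n-p+1}\,(Y - \hat{Y})^\top (Y - \hat{Y})$
(C) $\dfrac{1}{n-1}\,Y^\top Y$
(D) $\dfrac{1}{n-1}\,(Y - \hat{Y})^\top (Y - \hat{Y})$
(E) $\dfrac{1}{n-p}\,(Y - X\hat{\beta})^\top (Y - X\hat{\beta})$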
Exercise 4.25: [wk11Q13, Solution, Schedule] The estimated regression model of fitting life expectancy from birth (LIFE EXP) on the country's gross national product (in thousands) per population (GNP) and the percentage of population living in urban areas (URBAN%) is given by:
For a particular country, its URBAN% is 60 and its GNP is 3.0. Calculate the estimated life ex-
pectancy at birth for this country.
(A) 49
(B) 50
(C) 57
(D) 60
(E) 65
Exercise 4.26: [wk11Q14, Solution, Schedule] What is the use of the scatter plot of the fitted values
and the residuals?
Exercise 4.27: [wk11Q15, Solution, Schedule] For the case of the multiple regression model, show:
1. $E[\hat{\beta} | X = x] = \beta$
2. $\text{Var}(\hat{\beta} | X = x) = \sigma^2 (X^\top X)^{-1}$
1. Suggest why $H = X(X^\top X)^{-1} X^\top$ is called the hat matrix.
2. Show that $HH^\top = H^2 = H$.
3. Explain why:
$$\hat{y}_i = h_{ii}\, y_i + \sum_{j \neq i} h_{ij}\, y_j,$$
Solutions
Solution 4.1: [wk10Q1, Exercise, Schedule]
1. Consider the given exponential regression model. First, transform the regression equation so
that you have a linear regression form by taking the logarithms of both sides:
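and differentiating with respect to the parameters and setting to zero, that is,
$$\frac{\partial SS(\alpha, \beta)}{\partial \alpha} = \sum_{i=1}^{n} (-2)\left(\log(y_i) - \alpha - \beta x_i\right) = 0$$
$$\frac{\partial SS(\alpha, \beta)}{\partial \beta} = \sum_{i=1}^{n} (-2x_i)\left(\log(y_i) - \alpha - \beta x_i\right) = 0.$$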
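Solving, we get:
$$
\begin{aligned}
\hat{\alpha} &= \sum_{i=1}^{n} \log(y_i)/n - \hat{\beta} \sum_{i=1}^{n} x_i/n = \overline{\log(y)} - \hat{\beta}\,\bar{x} \\
\hat{\beta} &= \frac{\sum_{i=1}^{n} x_i \log(y_i) - \hat{\alpha} \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2} \\
&= \frac{\sum_{i=1}^{n} x_i \log(y_i) - \overline{\log(y)} \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2} + \hat{\beta}\,\frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n \sum_{i=1}^{n} x_i^2}
\end{aligned}
$$
Collecting the terms in $\hat{\beta}$ gives:
$$
\begin{aligned}
\hat{\beta} &= \frac{\sum_{i=1}^{n} x_i \log(y_i) - \overline{\log(y)} \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2/n} \\
&= \frac{\sum_{i=1}^{n} x_i \log(y_i) - \sum_{i=1}^{n} \sum_{j=1}^{n} \log(y_j)\, x_i/n}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} \\
&= \frac{\sum_{i=1}^{n} x_i \log(y_i) - \bar{x} \sum_{j=1}^{n} \log(y_j)}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} \qquad \left(\text{using } \sum_{i=1}^{n} x_i/n = \bar{x}\right) \\
&= \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \log(y_i)}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} \\
&= \sum_{i=1}^{n} c_i \log(y_i), \qquad c_i = \frac{x_i - \bar{x}}{\sum_{j=1}^{n} x_j^2 - n\bar{x}^2}.
\end{aligned}
$$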
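3.
$$
\begin{aligned}
E\left[\hat{\beta} \,\middle|\, X = x\right] &= E\left[\sum_{i=1}^{n} \frac{x_i - \bar{x}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} \log(y_i) \,\middle|\, X = x\right] \\
&= \sum_{i=1}^{n} \frac{x_i - \bar{x}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}\, E\left[\log(y_i) \,\middle|\, X = x\right] \\
&= \sum_{i=1}^{n} \frac{x_i - \bar{x}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}\, (\alpha + \beta x_i) \\
&= \alpha \underbrace{\sum_{i=1}^{n} \frac{x_i - \bar{x}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}}_{=0} + \beta \underbrace{\frac{\sum_{i=1}^{n} x_i (x_i - \bar{x})}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}}_{=1} \\
&= \beta
\end{aligned}
$$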
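4.
$$
\begin{aligned}
\text{Var}\left(\hat{\beta} \,\middle|\, X = x\right) &= \text{Var}\left(\sum_{i=1}^{n} c_i \log(Y_i) \,\middle|\, X = x\right) \\
&= \sum_{i=1}^{n} c_i^2\, \text{Var}\left(\log(Y_i) \,\middle|\, X = x\right) \\
&= \sigma^2 \sum_{i=1}^{n} c_i^2 \\
&= \sigma^2\, \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)^2} \\
&= \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}
\end{aligned}
$$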
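6.
$$
\begin{aligned}
E\left[\hat{\alpha} \,\middle|\, X = x\right] &= E\left[\overline{\log(y)} - \hat{\beta}\,\bar{x} \,\middle|\, X = x\right] \\
&= E\left[\sum_{i=1}^{n} \log(y_i)/n \,\middle|\, X = x\right] - E\left[\hat{\beta} \,\middle|\, X = x\right] \bar{x} \\
&= \frac{1}{n} \sum_{i=1}^{n} E\left[\log(Y_i) \,\middle|\, X = x\right] - \beta \bar{x} \\
&= \frac{1}{n} \sum_{i=1}^{n} (\alpha + \beta x_i) - \beta \bar{x} \\
&= \frac{1}{n} \left(n\alpha + \beta \sum_{i=1}^{n} x_i\right) - \beta \bar{x} \\
&= \alpha + \beta \sum_{i=1}^{n} \frac{x_i}{n} - \beta \bar{x} = \alpha
\end{aligned}
$$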
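7.
$$
\begin{aligned}
\text{Var}\left(\hat{\alpha} \,\middle|\, X = x\right) &= \text{Var}\left(\overline{\log(y)} - \hat{\beta}\,\bar{x} \,\middle|\, X = x\right) \\
&= \text{Var}\left(\overline{\log(y)} \,\middle|\, X = x\right) + \bar{x}^2\, \text{Var}\left(\hat{\beta} \,\middle|\, X = x\right) - 2\bar{x}\, \text{Cov}\left(\overline{\log(y)}, \hat{\beta} \,\middle|\, X = x\right) \\
&= \text{Var}\left(\sum_{i=1}^{n} \frac{\log(y_i)}{n} \,\middle|\, X = x\right) + \bar{x}^2\, \text{Var}\left(\hat{\beta} \,\middle|\, X = x\right) - 2\bar{x}\, \text{Cov}\left(\sum_{i=1}^{n} \frac{\log(y_i)}{n}, \sum_{i=1}^{n} c_i \log(y_i) \,\middle|\, X = x\right) \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \text{Var}\left(\log(y_i) \,\middle|\, X = x\right) + \bar{x}^2\, \text{Var}\left(\hat{\beta} \,\middle|\, X = x\right) - \frac{2\bar{x}}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} c_i\, \text{Cov}\left(\log(y_i), \log(y_j) \,\middle|\, X = x\right) \\
&\overset{*}{=} \frac{1}{n}\, \text{Var}\left(\log(y_i) \,\middle|\, X = x\right) + \bar{x}^2\, \text{Var}\left(\hat{\beta} \,\middle|\, X = x\right) - \frac{2\bar{x}}{n} \sum_{i=1}^{n} c_i\, \text{Var}\left(\log(y_i) \,\middle|\, X = x\right) \\
&= \frac{\sigma^2}{n} + \frac{\bar{x}^2 \sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} - \frac{2\sigma^2 \bar{x}}{n} \underbrace{\sum_{i=1}^{n} c_i}_{=0 \text{ (due to } \sum_{i=1}^{n} (x_i - \bar{x}) = 0)} \\
&= \sigma^2 \left(\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\right)
\end{aligned}
$$
* using that $\text{Cov}(\log(y_i), \log(y_j) | X = x)$ is equal to zero if $i \neq j$ and equal to $\text{Var}(\log(y_i) | X = x)$ if $i = j$.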
8. We have that $\hat{\alpha}$ is a linear combination of two normally distributed random variables, i.e., $(\overline{\log(Y)} | X = x)$ and $(\hat{\beta} | X = x)$, which is thus also normally distributed. The mean and variance are given in question (6) and (7).
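9.
$$
\begin{aligned}
\text{Cov}\left(\hat{\alpha}, \hat{\beta} \,\middle|\, X = x\right) &= \text{Cov}\left(\overline{\log(y)} - \hat{\beta}\,\bar{x},\ \hat{\beta} \,\middle|\, X = x\right) \\
&= \underbrace{\text{Cov}\left(\overline{\log(y)}, \hat{\beta} \,\middle|\, X = x\right)}_{=0 \text{ (see (7))}} - \bar{x}\, \text{Cov}\left(\hat{\beta}, \hat{\beta} \,\middle|\, X = x\right) \\
&= -\frac{\bar{x}\, \sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
$$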
10. We have $\alpha_0 = \exp(\alpha)$, $\beta_0 = \exp(\beta)$. Moreover, we have that $(\hat{\alpha} | X = x)$, $(\hat{\beta} | X = x)$ and $\log(Y)$ are normally distributed with their mean and variance as given in (5), (6), and (7). Thus, $(\hat{\alpha}_0 | X = x)$, $(\hat{\beta}_0 | X = x)$ and $(Y | X = x)$ are lognormally distributed, with parameters the mean of the logarithm of the variable and the variance of the logarithm of the variable. For example, $\mu_{\hat{\alpha}_0}$ is $E[\hat{\alpha} | X = x]$ and $\sigma^2_{\hat{\alpha}_0}$ is $\text{Var}(\hat{\alpha} | X = x)$.
[Figure: scatterplot of concentration against interval (x).]
1. Interesting features are that, in general, the concentration of 3-MT in the brain seems to de-
crease as the post mortem interval increases. Another interesting feature is that we observe two
observations with a much higher post mortem interval than the other observations.
The data seem to be appropriate for linear regression. The linear relationship seems to hold, especially for values of interval between 5 and 26 (we have enough observations for that). Care should be taken when evaluating $y$ for $x$ lower than 5 and larger than 26 (only two observations) because we do not know whether the linear relationship between $x$ and $y$ still holds then.
2. We test:
$$H_0: \rho = 0 \quad \text{v.s.} \quad H_1: \rho \neq 0$$
From Formulae and Tables page 163 we observe $\Pr(t_{16} \leq -4.015) = \Pr(t_{16} \geq 4.015) = 0.05\%$ (using the symmetry property of the Student-t distribution). We observe that the value of our test statistic (-5.89) is smaller than -4.015, thus our p-value should be smaller than $2 \times 0.05\% = 0.1\%$. Thus, we can reject the null hypothesis even at a significance level of 0.1%, hence we can conclude that there is a linear dependency between interval and concentration. Note that the alternative hypothesis here is "a linear dependency" and not "a negative linear dependency", so by rejecting the null hypothesis you accept the alternative of a linear dependency. Although a test with negative dependency as the alternative hypothesis would also lead to accepting that alternative, the construction of this test means we have to use the phrase "a linear dependency" and not "a negative linear dependency".
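$$y = \alpha + \beta x + \epsilon$$
$$\hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} = 42.98/18 + 0.0372008 \times 337/18 = 3.084259$$
$$\hat{y} = \hat{\alpha} + \hat{\beta} x = 3.084259 - 0.0372008\,x$$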
The data set contains accurate data up to 26 hours, as for observations 17 and 18 (at 48 hours and 60 hours respectively) there was no eye-witness testimony directly available. Predicting 3-MT concentration after 26 hours may not be advisable, even though x = 48 is within the range of the x-values (5.5 hours to 60 hours).
$$\frac{\hat{\beta} - \beta}{\text{s.e.}(\hat{\beta})} \sim t_{n-2}.$$
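We have:
$$\text{s.e.}(\hat{\beta}) = \sqrt{\frac{\hat{\sigma}^2}{\sum x_i^2 - n\bar{x}^2}} = \sqrt{\frac{\hat{\sigma}^2}{9854.5 - 337^2/18}}$$
$$
\begin{aligned}
\hat{\sigma}^2 &= \frac{1}{n-2} \left(\sum y_i^2 - \left(\sum y_i\right)^2/n - \frac{\left(\sum x_i y_i - \sum x_i \sum y_i/n\right)^2}{\sum x_i^2 - \left(\sum x_i\right)^2/n}\right) \\
&= \frac{1}{16} \left(109.7936 - 42.98^2/18 - \frac{(672.8 - 337 \times 42.98/18)^2}{9854.5 - 337^2/18}\right) = 0.1413014
\end{aligned}
$$
$$\text{s.e.}(\hat{\beta}) = \sqrt{\frac{0.1413014}{9854.5 - 337^2/18}} = 0.00631331$$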
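The fitted line, the residual variance and the standard error above all follow from the five summary sums that appear in the calculations. The short Python sketch below (variable names are illustrative) recomputes them, together with the correlation-based test statistic of part 2.

```python
from math import sqrt

# Summary sums as they appear in the calculations above
n = 18
sum_x, sum_y = 337.0, 42.98
sum_x2, sum_y2, sum_xy = 9854.5, 109.7936, 672.8

# Corrected sums of squares and cross-products
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n
s_xy = sum_xy - sum_x * sum_y / n

beta_hat = s_xy / s_xx
alpha_hat = sum_y / n - beta_hat * sum_x / n
sigma2_hat = (s_yy - s_xy**2 / s_xx) / (n - 2)
se_beta = sqrt(sigma2_hat / s_xx)

# Sample correlation and the t statistic for H0: rho = 0
r = s_xy / sqrt(s_xx * s_yy)
t_r = r * sqrt(n - 2) / sqrt(1 - r**2)
print(round(alpha_hat, 6), round(beta_hat, 7), round(se_beta, 8), round(t_r, 2))
```

Note that `t_r` coincides with `beta_hat / se_beta`, which is why the test on the correlation coefficient and the test on the slope lead to the same conclusion.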
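(b) The mean value of $\hat{\beta}_1$ is given by:
$$
\begin{aligned}
E\left[\hat{\beta}_1\right] &= E\left[\frac{\sum_{i=1}^{n} (y_i x_i)}{\sum_{i=1}^{n} x_i^2}\right] \\
&\overset{*}{=} \frac{\sum_{i=1}^{n} \left(E[y_i | x_i]\, x_i\right)}{\sum_{i=1}^{n} x_i^2} \\
&\overset{**}{=} \frac{\sum_{i=1}^{n} (\beta x_i \cdot x_i)}{\sum_{i=1}^{n} x_i^2} = \beta
\end{aligned}
$$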
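* using that $\hat{\beta}_1$, given a value of $x_i$, only depends on the value of $y_i$, hence the $E[y_i | x_i]$ with the condition, and ** using $E[y_i | x_i] = \beta x_i$.
For the variance we have:
$$
\begin{aligned}
\text{Var}\left(\hat{\beta}_1\right) &= \text{Var}\left(\frac{\sum_{i=1}^{n} (y_i x_i)}{\sum_{i=1}^{n} x_i^2}\right) \\
&= \frac{\sum_{i=1}^{n} \left(x_i^2\, \text{Var}(y_i | x_i)\right)}{\left(\sum_{i=1}^{n} x_i^2\right)^2} \\
&= \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2}.
\end{aligned}
$$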
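(b) We need to prove: $\text{Var}(\hat{\beta}_2) \geq \text{Var}(\hat{\beta}_1)$, which is equivalent to proving that $\text{Var}(\hat{\beta}_2) - \text{Var}(\hat{\beta}_1) \geq 0$.
$$
\begin{aligned}
\text{Var}(\hat{\beta}_2) - \text{Var}(\hat{\beta}_1) &= \frac{\sigma^2}{n\bar{x}^2} - \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2} \\
&= \sigma^2 \left(\frac{1}{n\bar{x}^2} - \frac{1}{\sum_{i=1}^{n} x_i^2}\right) \geq 0
\end{aligned}
$$
since
$$\sum_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 = (n-1)\, s_x^2 \geq 0.$$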
Thus the variance of the estimator $\hat{\beta}_2$ is at least as large as the variance of the least squares estimator $\hat{\beta}_1$ and is strictly larger if there is variability in the value $x_i$ can take.
(b) For $\hat{\beta}_1$ we have:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2} = \sum_{i=1}^{n} \frac{x_i}{\sum_{j=1}^{n} x_j^2}\, Y_i,$$
hence $a_i = \dfrac{x_i}{\sum_{j=1}^{n} x_j^2}$ for $i = 1, \ldots, n$.
We need to verify the condition $\sum_{i=1}^{n} a_i x_i = 1$:
$$\sum_{i=1}^{n} a_i x_i = \sum_{i=1}^{n} \frac{x_i}{\sum_{j=1}^{n} x_j^2}\, x_i = \frac{\sum_{i=1}^{n} x_i^2}{\sum_{j=1}^{n} x_j^2} = 1.$$
For $\hat{\beta}_2$ we have:
$$\hat{\beta}_2 = \frac{\sum_{i=1}^{n} Y_i}{\sum_{i=1}^{n} x_i} = \sum_{i=1}^{n} \frac{1}{\sum_{j=1}^{n} x_j}\, Y_i,$$
hence $a_i = \dfrac{1}{\sum_{j=1}^{n} x_j} = \dfrac{1}{n\bar{x}}$ for $i = 1, \ldots, n$.
We need to verify the condition $\sum_{i=1}^{n} a_i x_i = 1$:
$$\sum_{i=1}^{n} a_i x_i = \sum_{i=1}^{n} \frac{1}{\sum_{j=1}^{n} x_j}\, x_i = \frac{\sum_{i=1}^{n} x_i}{\sum_{j=1}^{n} x_j} = 1.$$
(c) We have that $\hat{\beta}_3$ is the general notation of a linear estimator. The condition $\sum_{i=1}^{n} a_i x_i = 1$ implies that we only look at unbiased estimators. This means that the linear estimator with $a_i = \dfrac{x_i}{\sum_{j=1}^{n} x_j^2}$, which is the least squares estimator, is the best (i.e., minimum variance) linear unbiased estimator (BLUE).
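A small numerical illustration of the comparison may help. The regressor values and the error variance below are hypothetical and chosen only for the sketch; the weights are those derived above for the two estimators.

```python
# Hypothetical regressor values and error variance, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma2 = 1.0
n = len(x)
x_bar = sum(x) / n
sum_x2 = sum(xi**2 for xi in x)

# Weights a_i of the two linear estimators sum(a_i * Y_i)
a1 = [xi / sum_x2 for xi in x]        # least squares estimator
a2 = [1 / (n * x_bar) for _ in x]     # ratio-of-sums estimator

# Unbiasedness condition: sum(a_i * x_i) must equal 1
cond1 = sum(a * xi for a, xi in zip(a1, x))
cond2 = sum(a * xi for a, xi in zip(a2, x))

# Variance of a linear estimator with independent errors: sigma^2 * sum(a_i^2)
var1 = sigma2 * sum(a**2 for a in a1)  # equals sigma^2 / sum(x_i^2)
var2 = sigma2 * sum(a**2 for a in a2)  # equals sigma^2 / (n * x_bar^2)
print(cond1, cond2, var1, var2)
```

With any regressor values that are not all equal, `var2` exceeds `var1`, in line with the inequality proved above.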
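$$H_0: \beta = 0 \quad \text{v.s.} \quad H_1: \beta > 0,$$
$$T = \frac{\hat{\beta}}{\sqrt{\hat{\sigma}^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2}} \sim t_{n-2}$$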
v) The value of the test statistic is in the rejection region, hence we reject the null hypothesis of
a zero correlation.
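4. We have that the standardised difference between $y_i | x_i$ and $\hat{y} | x_i$ has a Student-t distribution:
$$\frac{y_i | x_i - \hat{y} | x_i}{\sqrt{\widehat{\text{Var}}(y_i | x_i)}} \sim t_{n-2}$$
The predicted value is given by:
$$\hat{y} | x_i = \hat{\alpha} + \hat{\beta} x_i = 28.205 + 0.63223 \times 53 = 61.713.$$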
Thus, the 95% confidence interval for the value of y given that x = 53 is given by:
where
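$$z_r = \frac{1}{2} \log\left(\frac{1 + r}{1 - r}\right) = \frac{1}{2} \log\left(\frac{1 + 0.85860}{1 - 0.85860}\right) = 1.2880$$
$$z_\rho = \frac{1}{2} \log\left(\frac{1 + \rho}{1 - \rho}\right) = \frac{1}{2} \log\left(\frac{1 + 0.75}{1 - 0.75}\right) = 0.97296$$
$$
\begin{aligned}
r &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \\
&= \frac{1122}{\sqrt{\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)}} \\
&= \frac{1122}{\sqrt{962.25 \times 1774.667}} = 0.85860
\end{aligned}
$$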
v) We have that $z_{0.82894} = 0.95$, i.e., $\Phi(0.95) = 0.82894$. Thus, the p-value is given by $2 \times (1 - 0.82894) = 0.34212$. The value of the test statistic is not in the critical region if the level of significance is lower than 0.34212 (which is normally the case). Hence, for reasonable values of the level of significance we would not reject the null hypothesis.
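The Fisher transformation test can be reproduced with a few lines of Python. This is a sketch under the usual large-sample assumption that $z_r$ is approximately normal with mean $z_\rho$ and variance $1/(n-3)$; it does not round the test statistic to 0.95, so the p-value comes out slightly above the tabulated 0.34212.

```python
from math import log, sqrt
from statistics import NormalDist

n = 12
s_xy, s_xx, s_yy = 1122.0, 1774.667, 962.25  # values used in the solution
rho0 = 0.75                                   # correlation under H0

r = s_xy / sqrt(s_xx * s_yy)

# Fisher z-transformations of the sample and hypothesised correlations
z_r = 0.5 * log((1 + r) / (1 - r))
z_rho = 0.5 * log((1 + rho0) / (1 - rho0))

# Approximate N(0, 1) test statistic and two-sided p-value
T = (z_r - z_rho) * sqrt(n - 3)
p_value = 2 * (1 - NormalDist().cdf(abs(T)))
print(round(r, 5), round(z_r, 4), round(z_rho, 5), round(T, 3), round(p_value, 4))
```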
$$= \frac{1122^2}{962.25 \times 1774.667} = 0.737193.$$
Hence, a large proportion of the variability of Y is explained by X.
Solution 4.5: [wk10Q5, Exercise, Schedule] The completed ANOVA table is given below:
$$\widehat{STKPRICE} = 25.044 + 7.445\,(2) = 39.934.$$
4. $s = \sqrt{247} = 15.716$ and $R^2 = \dfrac{SSM}{SST} = \dfrac{10475}{21851} = 47.94\%$.
5. A scatter plot or diagram of the fitted values against the residuals (standardised) will provide us
an indication of the constancy of the variation in the errors.
6. To test for the significance of the variable EPS, we test $H_0: \beta = 0$ against $H_a: \beta \neq 0$. The test statistic is:
$$t(\hat{\beta}) = \frac{\hat{\beta}}{se(\hat{\beta})} = \frac{7.445}{1.144} = 6.508.$$
This is larger than $t_{1-\alpha/2,\,n-2} = 2.0147$ and therefore we reject the null. There is evidence to support the fact that the EPS variable is a significant predictor of stock price.
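The quantities in parts 4 and 6 can be recovered from the regression output alone. The sketch below is illustrative and rests on one assumption: that the given $s_x = 2.004$ is the sample standard deviation with divisor $n - 1$, so that $S_{xx} = (n-1)s_x^2$ with $n = 48$ (total degrees of freedom 47).

```python
from math import sqrt

n = 48                    # total df in the ANOVA table is 47
mse = 247.0               # error mean square
ssm, sst = 10475.0, 21851.0
s_x = 2.004               # sample standard deviation of EPS (divisor n - 1 assumed)
beta_hat = 7.445          # fitted slope

s = sqrt(mse)             # residual standard deviation
r2 = ssm / sst            # coefficient of determination

# se(beta_hat) = s / sqrt(S_xx), with S_xx = (n - 1) * s_x^2
se_beta = s / (s_x * sqrt(n - 1))
t_stat = beta_hat / se_beta
print(round(s, 3), round(r2, 4), round(se_beta, 3), round(t_stat, 3))
```

As a consistency check, the square of `t_stat` is close to the F ratio of 42.35 reported in the ANOVA table, as expected for a simple linear regression.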
$$t(\hat{\beta}) = \frac{\hat{\beta} - \beta_0}{se(\hat{\beta})} = \frac{7.445 - 24}{1.144} = -14.47.$$
Thus, since this test statistic is smaller than $t_{1-\alpha,\,n-2} = t_{0.95,\,46} = 1.676$, do not reject the null hypothesis.
Solution 4.7: [wk10Q7, Exercise, Schedule] The grand total/sum is $\sum x = 2479 + 2619 + 2441 + 2677 = 10216$ so that the grand mean is $\bar{x} = 10216/40 = 255.4$. Also, $\sum x^2 = 617163 + 687467 + 597607 + 718973 = 2621210$. Therefore the total sum of squares is:
$$SST = \sum \left(x - \bar{x}\right)^2 = \sum x^2 - N \bar{x}^2 = 2621210 - (40)(255.4)^2 = 12043.6.$$
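The total sum of squares above, and its split into between-region and within-region parts, can be checked directly from the raw premiums. The following Python sketch uses the data of Exercise 4.7; the dictionary layout and names are illustrative, and only the F ratio is computed (the critical value still has to be read from the F tables).

```python
regions = {
    "A": [229, 241, 270, 256, 241, 247, 261, 243, 272, 219],
    "B": [261, 269, 284, 268, 249, 255, 237, 270, 269, 257],
    "C": [253, 247, 244, 245, 221, 229, 245, 256, 232, 269],
    "D": [279, 268, 290, 245, 281, 262, 287, 257, 262, 246],
}

all_values = [v for values in regions.values() for v in values]
N, I = len(all_values), len(regions)
grand_mean = sum(all_values) / N

# Total, between-region and within-region sums of squares
sst = sum((v - grand_mean) ** 2 for v in all_values)
ssb = sum(len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
          for vals in regions.values())
sse = sst - ssb

# F ratio with (I - 1, N - I) degrees of freedom
F = (ssb / (I - 1)) / (sse / (N - I))
print(round(sst, 1), round(ssb, 1), round(sse, 1), round(F, 3))
```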
Thus, to test the equality of the mean premiums across the regions, we test:
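Solution 4.8: [wk10Q8, Exercise, Schedule] Consider the one-way ANOVA model:
$$Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \quad \text{for } i = 1, \ldots, I \text{ and } j = 1, \ldots, J,$$
where the error terms $\epsilon_{ij}$ are i.i.d. normal random variables with mean 0 and common variance $\sigma^2$.
Since
$$Y_{ij} \sim N\left(\mu + \alpha_i, \sigma^2\right),$$
then the likelihood function is given by:
$$L\left(y_{ij}; \mu, \alpha_i, \sigma^2\right) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{N} \exp\left(-\frac{1}{2} \sum_{i=1}^{I} \sum_{j=1}^{J} \left(\frac{y_{ij} - \mu - \alpha_i}{\sigma}\right)^2\right)$$
where $N = I \times J$ is the grand total sample size. Now, take the log-likelihood and differentiate with respect to each parameter:
$$\log L = \ell\left(y_{ij}; \mu, \alpha_i, \sigma^2\right) = -\frac{N}{2} \log(2\pi) - N \log \sigma - \frac{1}{2} \sum_{i=1}^{I} \sum_{j=1}^{J} \left(\frac{y_{ij} - \mu - \alpha_i}{\sigma}\right)^2$$
and
$$
\begin{aligned}
\frac{\partial \ell}{\partial \mu} &= \frac{1}{\sigma^2} \sum_{i=1}^{I} \sum_{j=1}^{J} \left(y_{ij} - \mu - \alpha_i\right) = \frac{1}{\sigma^2} \left(\sum_{i=1}^{I} \sum_{j=1}^{J} y_{ij} - I J \mu - J \sum_{i=1}^{I} \alpha_i\right) = 0 \\
\frac{\partial \ell}{\partial \alpha_k} &= \frac{1}{\sigma^2} \sum_{j=1}^{J} \left(y_{kj} - \mu - \alpha_k\right) = 0, \quad \text{for } k = 1, 2, \ldots, I \\
\frac{\partial \ell}{\partial \sigma} &= -\frac{N}{\sigma} + \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{\left(y_{ij} - \mu - \alpha_i\right)^2}{\sigma^3} = 0.
\end{aligned}
$$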
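and from the last equation, we have the MLE for the variance of the error term:
$$\hat{\sigma}^2 = \frac{1}{I J} \sum_{i=1}^{I} \sum_{j=1}^{J} \left(y_{ij} - \bar{y} - \bar{y}_{i\cdot} + \bar{y}\right)^2 = \frac{1}{I J} \sum_{i=1}^{I} \sum_{j=1}^{J} \left(y_{ij} - \bar{y}_{i\cdot}\right)^2.$$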
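Solution 4.9: [wk10Q9, Exercise, Schedule] For the one-way ANOVA model we have $Y_{ij} \sim N(\mu + \alpha_i, \sigma^2)$ hence
$$f(y_{ij}; \mu, \alpha_i, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2} \left(\frac{y_{ij} - (\mu + \alpha_i)}{\sigma}\right)^2\right).$$
The likelihood function can be written as:
$$L(y_{ij}; \mu, \alpha_i, \sigma) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\sum_{i=1}^{I} n_i} \exp\left(-\frac{1}{2} \sum_{i=1}^{I} \sum_{j=1}^{n_i} \left(\frac{y_{ij} - (\mu + \alpha_i)}{\sigma}\right)^2\right)$$
$$
\begin{aligned}
\frac{\partial \ell}{\partial \mu} = -\frac{1}{2} \sum_{i=1}^{I} \sum_{j=1}^{n_i} 2 \left(\frac{y_{ij} - (\mu + \alpha_i)}{\sigma}\right) \left(-\frac{1}{\sigma}\right) &= 0 \\
\sum_{i=1}^{I} \sum_{j=1}^{n_i} \left(y_{ij} - (\mu + \alpha_i)\right) &= 0 \\
\sum_{i=1}^{I} \sum_{j=1}^{n_i} y_{ij} - \sum_{i=1}^{I} n_i \mu - \underbrace{\sum_{i=1}^{I} n_i \alpha_i}_{=0} &= 0 \\
\sum_{i=1}^{I} \sum_{j=1}^{n_i} y_{ij} - N \mu &= 0 \\
\hat{\mu} &= \frac{\sum_{i=1}^{I} \sum_{j=1}^{n_i} y_{ij}}{N}
\end{aligned}
$$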
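$$
\begin{aligned}
\frac{\partial \ell}{\partial \alpha_i} = -\frac{1}{2} \sum_{j=1}^{n_i} 2 \left(\frac{y_{ij} - (\mu + \alpha_i)}{\sigma}\right) \left(-\frac{1}{\sigma}\right) &= 0 \\
\sum_{j=1}^{n_i} \left(y_{ij} - (\mu + \alpha_i)\right) &= 0 \\
\sum_{j=1}^{n_i} y_{ij} - n_i \mu - n_i \alpha_i &= 0 \\
n_i \alpha_i &= \sum_{j=1}^{n_i} y_{ij} - n_i \mu \\
\hat{\alpha}_i &= \frac{\sum_{j=1}^{n_i} y_{ij}}{n_i} - \hat{\mu}
\end{aligned}
$$
$$H_0: \rho = 0 \quad \text{v.s.} \quad H_1: \rho > 0$$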
2. Given the issue of whether mortality can be used to predict sickness, we require a plot of
sickness against mortality:
[Figure: scatterplot of sickness (s) against mortality (m).]
There seems to be an increasing linear relationship such that mortality could be used to predict sickness.
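i) Hypothesis:
$$H_0: \beta = 2 \quad \text{v.s.} \quad H_1: \beta < 2$$
$$T = \frac{\hat{\beta} - \beta}{\sqrt{\hat{\sigma}^2 / s_{xx}}} \sim t_{n-2}$$
$$C = \left\{(X_1, \ldots, X_n) : T \in (-\infty, -t_{n-2,\,1-\alpha})\right\}$$
$$T = \frac{\hat{\beta} - \beta}{\sqrt{\hat{\sigma}^2 / s_{xx}}} = \frac{1.6371 - 2}{\sqrt{0.2394}} = -0.74$$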
v) We have from Formulae and Tables page 163: $t_{8,\,1-0.25} = 0.7064$ and $t_{8,\,1-0.20} = 0.8889$. Thus the p-value (using symmetry) is between 0.2 and 0.25. Thus, we do not reject the null hypothesis if the level of significance is smaller than the p-value (which is usually the case). Note: the exact p-value using a computer package is 0.2402.
4. For a region with $m = 115$ we have the estimated value:
$$\hat{s} = 7.426 + 1.6371 \times 115 = 195.69$$
with corresponding variance:
$$\hat{\sigma}^2 \left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{mm}}\right) = 186.902 \left(\frac{1}{10} + \frac{(115 - 113.61)^2}{780.709}\right) = 19.1528$$
The corresponding 95% confidence limits are $195.69 - t_{8,\,1-0.025}\ \text{s.e.}(\hat{s} | m = 115) = 195.69 - 2.306 \sqrt{19.1528} = 185.60$ and $195.69 + t_{8,\,1-0.025}\ \text{s.e.}(\hat{s} | m = 115) = 195.69 + 2.306 \sqrt{19.1528} = 205.78$.
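The whole chain from the data summaries to the confidence limits can be sketched in Python as follows. Variable names are illustrative; the t quantile 2.306 is taken from the text rather than computed, since the standard library has no Student-t quantile function.

```python
from math import sqrt

n = 10
sum_m, sum_m2 = 1136.1, 129853.03
sum_s, sum_s2 = 1934.2, 377700.62
sum_ms = 221022.58

# Corrected sums of squares and cross-products
s_mm = sum_m2 - sum_m**2 / n
s_ss = sum_s2 - sum_s**2 / n
s_ms = sum_ms - sum_m * sum_s / n

r = s_ms / sqrt(s_mm * s_ss)
beta_hat = s_ms / s_mm
alpha_hat = sum_s / n - beta_hat * sum_m / n
sigma2_hat = (s_ss - s_ms**2 / s_mm) / (n - 2)

# Test statistic for H0: beta = 2 against H1: beta < 2
t_slope = (beta_hat - 2) / sqrt(sigma2_hat / s_mm)

# Estimated mean sickness rate at m0 = 115 and its 95% confidence limits
m0 = 115.0
fit = alpha_hat + beta_hat * m0
var_fit = sigma2_hat * (1 / n + (m0 - sum_m / n) ** 2 / s_mm)
t_crit = 2.306  # t quantile with 8 df quoted in the text
lower, upper = fit - t_crit * sqrt(var_fit), fit + t_crit * sqrt(var_fit)
print(round(r, 4), round(beta_hat, 4), round(t_slope, 2), round(lower, 2), round(upper, 2))
```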
[Figure: number of deaths ($n_i$) against quarter ($i$).]
(b) The mean number of deaths increases with an increasing rate with quarter. The variance
also appears to increase with quarter.
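2. (a) We have $q = \sum_{i=1}^{12} (n_i - \gamma i^2)^2$. Taking the derivative of $q$ with respect to $\gamma$ and setting it equal to zero, we obtain:
$$
\begin{aligned}
\frac{\partial q}{\partial \gamma} = -2 \sum_{i=1}^{12} i^2 (n_i - \gamma i^2) &= 0 \\
\sum_{i=1}^{12} n_i i^2 &= \gamma \sum_{i=1}^{12} i^4 \\
\frac{\sum_{i=1}^{12} n_i i^2}{\sum_{i=1}^{12} i^4} &= \hat{\gamma}.
\end{aligned}
$$
To prove that it is a minimum, we need to prove that $\dfrac{\partial^2 q}{\partial \gamma^2} > 0$:
$$\frac{\partial^2 q}{\partial \gamma^2} = 2 \sum_{i=1}^{12} i^4 > 0.$$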
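(b) We have $q = \sum_{i=1}^{12} \frac{(n_i - \gamma i^2)^2}{i^2} = \sum_{i=1}^{12} (n_i/i - \gamma i)^2$. Taking the derivative of $q$ with
respect to $\gamma$ and setting it equal to zero, we obtain:
$$\frac{\partial q}{\partial \gamma} = -2\sum_{i=1}^{12} i\,(n_i/i - \gamma i) = 0$$
$$\sum_{i=1}^{12} n_i = \gamma \sum_{i=1}^{12} i^2$$
$$\tilde{\gamma} = \frac{\sum_{i=1}^{12} n_i}{\sum_{i=1}^{12} i^2}.$$
To prove that it is a minimum, we need to show that $\frac{\partial^2 q}{\partial \gamma^2} > 0$:
$$\frac{\partial^2 q}{\partial \gamma^2} = 2\sum_{i=1}^{12} i^2 > 0.$$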
(c) We have:
$$\hat{\gamma} = \frac{\sum_{i=1}^{12} n_i i^2}{\sum_{i=1}^{12} i^4} = \frac{15694}{60710} = 0.259$$
$$\tilde{\gamma} = \frac{\sum_{i=1}^{12} n_i}{\sum_{i=1}^{12} i^2} = \frac{174}{650} = 0.268$$
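The two denominators and the resulting estimates in (c) can be checked directly. In the sketch below, the totals 15694 and 174 are taken from the solution rather than recomputed from the raw data, and the names `gamma_hat` and `gamma_tilde` are labels chosen here for the unweighted and weighted estimates.

```python
quarters = range(1, 13)                   # i = 1, ..., 12
sum_i2 = sum(i ** 2 for i in quarters)    # denominator of the weighted estimate
sum_i4 = sum(i ** 4 for i in quarters)    # denominator of the unweighted estimate

sum_n_i2 = 15694   # sum of n_i * i^2, as quoted in the solution
sum_n = 174        # sum of n_i, as quoted in the solution

gamma_hat = sum_n_i2 / sum_i4     # ordinary least squares estimate, part (a)
gamma_tilde = sum_n / sum_i2      # weighted least squares estimate, part (b)
print(sum_i2, sum_i4, round(gamma_hat, 3), round(gamma_tilde, 3))
```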
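i) Hypothesis:
$$H_0: \beta = 2 \quad \text{v.s.} \quad H_1: \beta \neq 2$$
ii) Test statistic:
$$T = \frac{\hat{\beta} - \beta}{\text{s.e.}(\hat{\beta})} \sim t_{n-2}$$
iii) Critical region:
$$C = \{(X_1, \ldots, X_n) : |T| > t_{n-2,1-\alpha/2}\}$$
iv) Value of the test statistic:
$$T = \frac{\hat{\beta} - \beta}{\text{s.e.}(\hat{\beta})} = \frac{1.6008 - 2}{0.2525} = -1.58$$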
v) From Formulae and Tables page 163 we obtain $t_{10,1-0.10} = 1.372$ and $t_{10,1-0.05} = 1.812$.
Thus the p-value of the test is between 0.1 and 0.2 (two-sided test!). For a level of
significance lower than 0.1 we do not reject the null hypothesis that $\beta = 2$, and thus this
assumption seems appropriate. Note: the exact p-value from a computer package is 0.1452.
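1. (a) We have:
$$SST = \sum y^2 - \left(\sum y\right)^2 / n = 70.8744 - 29.12^2/16 = 17.8760$$
$$\sum x = 4 \times (1 + 2 + 3 + 4) = 40, \qquad \sum x^2 = 4 \times (1^2 + 2^2 + 3^2 + 4^2) = 120$$
$$\sum xy = 1 \times 2.73 + 2 \times 6.26 + 3 \times 9.22 + 4 \times 10.91 = 86.55$$
$$s_{xx} = \sum x^2 - \left(\sum x\right)^2 / n = 120 - 40^2/16 = 20$$
$$s_{xy} = \sum xy - \sum x \sum y / n = 86.55 - 40 \times 29.12/16 = 13.75$$
$$SSM = \hat{\beta}^2 s_{xx} = \left(\frac{13.75}{20}\right)^2 \times 20 = 9.453125$$
$$SSE = SST - SSM = 17.8760 - 9.453125 = 8.422875.$$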
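(b)
$$\hat{\beta} = \frac{s_{xy}}{s_{xx}} = \frac{13.75}{20} = 0.6875$$
$$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 29.12/16 - 0.6875 \times 40/16 = 0.1012$$
Thus, the fitted model is given by $\hat{y} = \hat{\alpha} + \hat{\beta}x = 0.1012 + 0.6875x$.
For x = 1 we have: $\hat{y} = \hat{\alpha} + \hat{\beta}x = 0.1012 + 0.6875 \times 1 = 0.7887$
For x = 4 we have: $\hat{y} = \hat{\alpha} + \hat{\beta}x = 0.1012 + 0.6875 \times 4 = 2.8512$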
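(c) We have $\text{s.e.}(\hat{\beta}) = \sqrt{\frac{8.4229/14}{20}} = 0.1734$.
i) Hypothesis:
$$H_0: \beta = 0 \quad \text{v.s.} \quad H_1: \beta \neq 0$$
ii) Test statistic:
$$T = \frac{\hat{\beta} - \beta}{\text{s.e.}(\hat{\beta})} \sim t_{n-2}$$
iii) Critical region:
$$C = \{(X_1, \ldots, X_n) : |T| > t_{n-2,1-\alpha/2}\}$$
iv) Value of the test statistic:
$$T = \frac{\hat{\beta} - \beta}{\text{s.e.}(\hat{\beta})} = \frac{0.6875 - 0}{0.1734} = 3.965$$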
v) We have $t_{14,1-0.001} = 3.787$ and $t_{14,1-0.0005} = 4.140$. Thus the p-value of this two-sided
test is between 0.1% and 0.2%. We do not reject the null hypothesis only if the level of
significance is lower than the p-value (which is usually not the case). Hence, we have strong
evidence against the hypothesis of no linear relationship. Note: a computer package gives the
one-sided tail probability as 0.00070481, so the exact two-sided p-value is 0.0014.
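The calculations in parts (a) to (c) can be reproduced from the summary totals alone. The following is a minimal sketch in Python; all inputs are the totals quoted in the solution, and the variable names are chosen here for illustration.

```python
import math

n = 16
sum_y, sum_y2 = 29.12, 70.8744                        # totals quoted in part (a)
sum_x = 4 * (1 + 2 + 3 + 4)                           # each x value occurs four times
sum_x2 = 4 * (1 + 4 + 9 + 16)
sum_xy = 1 * 2.73 + 2 * 6.26 + 3 * 9.22 + 4 * 10.91   # x times the group total of y

s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
sst = sum_y2 - sum_y ** 2 / n

beta_hat = s_xy / s_xx                          # slope estimate
alpha_hat = sum_y / n - beta_hat * sum_x / n    # intercept estimate
ssm = beta_hat ** 2 * s_xx                      # sum of squares explained by the model
sse = sst - ssm                                 # residual sum of squares
se_beta = math.sqrt(sse / (n - 2) / s_xx)       # standard error of the slope
t_stat = beta_hat / se_beta                     # test statistic for H0: beta = 0
print(beta_hat, alpha_hat, sst, ssm, sse, se_beta, t_stat)
```

Working with unrounded intermediate values gives a test statistic that differs from 3.965 in the third decimal place, because the solution divides by the rounded standard error 0.1734.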
2. (a) We have:
$$SST = 17.8760$$
$$SSB = (2.73^2 + 6.26^2 + 9.22^2 + 10.91^2)/4 - 29.12^2/16 = 9.6709$$
$$SSR = SST - SSB = 17.8760 - 9.6709 = 8.2051$$
(b)
$$\hat{\mu} = 29.12/16 = 1.82$$
$$\hat{\alpha}_1 = 2.73/4 - 1.82 = -1.1375$$
$$\hat{\alpha}_2 = 6.26/4 - 1.82 = -0.255$$
$$\hat{\alpha}_3 = 9.22/4 - 1.82 = 0.485$$
$$\hat{\alpha}_4 = 10.91/4 - 1.82 = 0.9075$$
(e) From Formulae and Tables pages 173 and 174 we observe that $P(F_{3,12} > 4.474) = 2.5\%$ and
$P(F_{3,12} > 5.953) = 1\%$. Thus the p-value is between 0.01 and 0.025, so we have some evidence
against the hypothesis of no company effects. Note: the exact p-value from a computer package
is 0.0213.
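The sums of squares, the effect estimates and the F ratio implied by them can be sketched in Python as follows. The group totals and SST are taken from the solution; the F statistic itself is not quoted in the text above, so the value computed here is an illustration of the standard one-way ANOVA ratio, and the variable names are choices made here.

```python
group_totals = [2.73, 6.26, 9.22, 10.91]   # totals per company, from the solution
n_per_group, n = 4, 16
grand_total, sst = 29.12, 17.8760

mu_hat = grand_total / n
alpha_hat = [t / n_per_group - mu_hat for t in group_totals]   # company effects

ssb = sum(t ** 2 for t in group_totals) / n_per_group - grand_total ** 2 / n
ssr = sst - ssb
df_between = len(group_totals) - 1          # 3 degrees of freedom
df_within = n - len(group_totals)           # 12 degrees of freedom
f_stat = (ssb / df_between) / (ssr / df_within)
print(mu_hat, alpha_hat, round(ssb, 4), round(ssr, 4), f_stat)
```

The computed ratio should lie between the two tabulated points 4.474 and 5.953, which is what places the p-value between 1% and 2.5%.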
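Solution 4.13: [wk11Q1, Exercise, Schedule] To find the least squares estimate of $\beta$, we minimize
the sum of squares:
$$SS(\beta) = \sum_{k=1}^{n} \varepsilon_k^2 = \sum_{k=1}^{n} (Y_k - \beta x_k)^2.$$
Setting the derivative with respect to $\beta$ equal to zero, $-2\sum_{k=1}^{n} x_k (Y_k - \beta x_k) = 0$, gives
$$\hat{\beta} = \frac{\sum_{k=1}^{n} x_k Y_k}{\sum_{k=1}^{n} x_k^2}.$$
For the quadratic regression, we simply replace $x_k$ by $x_k^2$ to get the least squares estimate:
$$\hat{\beta} = \frac{\sum_{k=1}^{n} x_k^2 Y_k}{\sum_{k=1}^{n} x_k^4}.$$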
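Both estimators above have the same form, which a single helper can capture. The sketch below uses a hypothetical function name `slope_through_origin` and hypothetical data that lie exactly on a known curve, so the estimate can be checked by eye.

```python
def slope_through_origin(x, y, power=1):
    """Least squares slope for Y = beta * x**power + error (no intercept)."""
    z = [xi ** power for xi in x]     # transformed covariate: x or x^2
    return sum(zi * yi for zi, yi in zip(z, y)) / sum(zi * zi for zi in z)

# Hypothetical data lying exactly on y = 3x and y = 3x^2 respectively.
x = [1.0, 2.0, 3.0, 4.0]
beta_linear = slope_through_origin(x, [3 * xi for xi in x])
beta_quadratic = slope_through_origin(x, [3 * xi ** 2 for xi in x], power=2)
print(beta_linear, beta_quadratic)
```

Passing `power=2` reproduces the substitution of $x_k^2$ for $x_k$ described in the solution.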
Solution 4.14: [wk11Q2, Exercise, Schedule] To establish a relationship between the coefficient of
determination and the correlation coefficient:
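$$\hat{y}_k - \bar{y} = \hat{\alpha} + \hat{\beta}x_k - \bar{y}$$
$$= \bar{y} - \hat{\beta}\bar{x} + \hat{\beta}x_k - \bar{y}$$
$$= \hat{\beta}(x_k - \bar{x}).$$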
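Squaring and summing over $k$ gives $SSM = \sum_{k=1}^{n} (\hat{y}_k - \bar{y})^2 = \hat{\beta}^2 \sum_{k=1}^{n} (x_k - \bar{x})^2 = \hat{\beta}^2 S_x^2 (n-1)$, so that:
$$R^2 = \frac{SSM}{SST} = \frac{\hat{\beta}^2 S_x^2 (n-1)}{S_y^2 (n-1)} = \frac{\hat{\beta}^2 S_x^2}{S_y^2} = \left(r\frac{S_y}{S_x}\right)^2 \frac{S_x^2}{S_y^2} = r^2.$$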
Thus, for simple linear regression, the coefficient of determination is simply the square of
the correlation coefficient.
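The identity can also be confirmed numerically. The following minimal sketch fits a simple linear regression to a small hypothetical data set (the values of `x` and `y` are made up for illustration) and compares the coefficient of determination with the squared sample correlation.

```python
import math

# Hypothetical paired observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)          # this is SST
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar
ssm = sum((alpha_hat + beta_hat * xi - y_bar) ** 2 for xi in x)  # model sum of squares
r_squared = ssm / s_yy                  # coefficient of determination
r = s_xy / math.sqrt(s_xx * s_yy)       # sample correlation coefficient
print(r_squared, r ** 2)
```

The two printed values should agree up to floating-point rounding.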
Solution 4.15: [wk11Q3, Exercise, Schedule] Consider the linear regression model $Y_k = \alpha + \beta x_k + \varepsilon_k$:
1.
$$X = [1_n \;\; x] = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}$$
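2.
$$X^\top X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} = \begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix} = n \begin{bmatrix} 1 & \bar{x} \\ \bar{x} & \frac{1}{n}\sum_{i=1}^{n} x_i^2 \end{bmatrix}$$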
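3.
$$X^\top Y = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{bmatrix}$$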
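The entries of these two matrix products are just sums, so they are easy to compute without a linear algebra library. The sketch below builds them for a small hypothetical data set and then solves the 2 x 2 normal equations with the explicit inverse. The data and the variable names are illustrative assumptions, and the solving step uses the standard least squares formula rather than anything specific to this exercise.

```python
# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.2, 3.9]
n = len(x)

# Entries of X'X and X'Y for the design matrix X = [1_n  x].
xtx = [[n, sum(x)], [sum(x), sum(xi * xi for xi in x)]]
xty = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]

# Solve (X'X) b = X'Y using the explicit inverse of a 2x2 matrix.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
alpha_hat = (xtx[1][1] * xty[0] - xtx[0][1] * xty[1]) / det
beta_hat = (xtx[0][0] * xty[1] - xtx[1][0] * xty[0]) / det
print(xtx, xty, alpha_hat, beta_hat)
```

The resulting slope agrees with the familiar ratio of the corrected sum of cross products to the corrected sum of squares of x.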
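1. We have:
$$E\left[\hat{\beta} \mid X = x\right] = E\left[(X^\top X)^{-1} X^\top Y \mid X = x\right] = (X^\top X)^{-1} X^\top E\left[Y \mid X = x\right]$$
2. We have:
$$\text{Var}\left(\hat{\beta} \mid X = x\right) = \text{Var}\left((X^\top X)^{-1} X^\top Y \mid X = x\right) = (X^\top X)^{-1} X^\top \,\text{Var}\left(Y \mid X = x\right) X (X^\top X)^{-1}$$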
1. The matrix $H = X(X^\top X)^{-1} X^\top$ is an $n \times n$ matrix which is often called the hat matrix, because
pre-multiplying the vector $Y$ by this matrix (i.e., $HY$) gives the vector of fitted values $\hat{Y}$.
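2. We have:
$$H^\top H = H^2 = \left(X(X^\top X)^{-1} X^\top\right)^\top X(X^\top X)^{-1} X^\top$$
$$= \left(X^\top\right)^\top \left((X^\top X)^{-1}\right)^\top X^\top X(X^\top X)^{-1} X^\top$$
$$= X(X^\top X)^{-1} X^\top X(X^\top X)^{-1} X^\top$$
$$= X(X^\top X)^{-1} X^\top$$
$$= H$$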
* see 15b.
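3. We have:
$$\hat{Y} = X\hat{\beta} = HY = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1n} \\ h_{21} & h_{22} & \cdots & h_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ h_{n1} & h_{n2} & \cdots & h_{nn} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
The $i$th value of the vector $\hat{Y}$ is $\hat{y}_i$ and is equal to the $i$th row of the matrix $H$ multiplied by the
vector $Y$. Thus we have:
$$\hat{y}_i = \sum_{j=1}^{n} h_{ij} y_j = h_{ii} y_i + \sum_{j \neq i} h_{ij} y_j$$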
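5. We have:
$$\hat{e} = Y - \hat{Y} = Y - HY = (I_n - H)Y$$
so that, taking conditional expectations with $E[Y \mid X = x] = X\beta$:
$$E\left[\hat{e} \mid X = x\right] = (I_n - H)X\beta$$
$$= X\beta - HX\beta$$
$$= X\beta - X(X^\top X)^{-1} X^\top X\beta$$
$$= X\beta - X\beta$$
$$= 0$$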
* using $HH^\top = (H^\top H)^\top = (H)^\top = H$, where the second equality is derived in question (b) and
the third equality is due to the symmetry of the matrix $H$, which can be observed from the
results in question (d).
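The properties of the hat matrix used in these items (symmetry, idempotence, and $HX = X$, which is what makes $(I_n - H)X\beta$ vanish) can be verified numerically. The sketch below is a plain-Python illustration for simple linear regression. The covariate values are hypothetical and the helper names `matmul` and `transpose` are defined here, not taken from any library.

```python
# Hypothetical covariate values, for illustration only.
x = [1.0, 2.0, 4.0, 7.0]
n = len(x)
X = [[1.0, xi] for xi in x]           # design matrix [1_n  x]

def matmul(A, B):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

XtX = matmul(transpose(X), X)
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
XtX_inv = [[XtX[1][1] / det, -XtX[0][1] / det],
           [-XtX[1][0] / det, XtX[0][0] / det]]   # inverse of the 2x2 matrix X'X

H = matmul(matmul(X, XtX_inv), transpose(X))      # hat matrix, n x n
HH = matmul(H, H)
HX = matmul(H, X)

# Largest absolute deviations from the three properties being checked.
max_idempotent_gap = max(abs(HH[i][j] - H[i][j]) for i in range(n) for j in range(n))
max_symmetry_gap = max(abs(H[i][j] - H[j][i]) for i in range(n) for j in range(n))
max_hx_gap = max(abs(HX[i][j] - X[i][j]) for i in range(n) for j in range(2))
print(max_idempotent_gap, max_symmetry_gap, max_hx_gap)
```

All three gaps should be zero up to floating-point rounding.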
8. From our results above, the $i$th least squares residual has variance given by:
$$\text{Var}\left(\hat{e}_i \mid X = x\right) = \sigma^2 (1 - h_{ii}),$$