
Part A Statistics

HT 2014

Problem Sheet 2
1. Independent random variables $X_1, \dots, X_n$ have common probability density function $f(x; \theta)$
depending on the unknown parameter $\theta$. Define the terms likelihood function for $\theta$, maximum
likelihood estimator for $\theta$, Fisher's information for $\theta$.
What is the connection between Fisher's information and the asymptotic distribution of the
maximum likelihood estimator?
Suppose $X_1, \dots, X_n$ are independent, each with density
$$f(x; \theta) = \frac{1}{\pi[1 + (x - \theta)^2]}, \qquad x \in \mathbb{R},$$
with parameter $\theta \in \mathbb{R}$. Derive an equation which must be satisfied by the maximum likelihood
estimator $\hat{\theta}$.
Prove that Fisher's information for $\theta$ is $\frac{1}{2}n$. Explain how to calculate an approximate 95%
confidence interval for $\theta$.
2. If $S^2$ is the variance of a random sample of size $n$ from a normal distribution $N(\mu, \sigma^2)$,
then $cS^2$ has a standard distribution for a particular value of the constant $c$. Write down the
distribution and specify the constant $c$.
How many pairs of positive constants $a$, $b$ exist such that $P(a < cS^2 < b) = 0.95$? For any
such pair, deduce that the random interval
$$\left( \frac{(n-1)S^2}{b}, \; \frac{(n-1)S^2}{a} \right)$$
will contain the variance $\sigma^2$ for approximately 95% of a large number of such random samples.
3. The following data are time intervals in days between earthquakes which either registered
magnitudes greater than 7.5 on the Richter scale or produced over 1,000 fatalities. Recording
starts on 16 December, 1902 and ends on 4 March, 1977, a total period of 27,107 days. There
were 63 earthquakes in all, and therefore 62 recorded time intervals.
 840  1901    40   139   246   157   695  1336   780  1617
 145   294   335   203   638    44   562  1354   436   937
  33   721   454    30   735   121    76    36   384    38
 150   710   667   129   365   280    46    40     9    92
 434   402   209    82   736   194    99   599   220   584
 759   556   304    83   887   319   375   832   263   460
 567   328

Assuming the data to be from a random sample $X_1, \dots, X_n$ drawn from an exponential
distribution with parameter $\lambda$, obtain the maximum likelihood estimator $\hat{\lambda}$ of $\lambda$ and calculate
the maximum likelihood estimate.
Given that the moment generating function of a gamma distribution with parameters $(n, \lambda)$
is
$$M_n(t) = \left(1 - \frac{t}{\lambda}\right)^{-n},$$
show that $Y = \sum_{i=1}^{n} X_i$ has a gamma distribution. Show that
$$\left( \frac{a}{n\bar{x}}, \; \frac{b}{n\bar{x}} \right)$$
is an exact 95% central confidence interval for $\lambda$ if
$$\int_0^a \frac{y^{n-1} e^{-y}}{\Gamma(n)}\,dy = \int_b^\infty \frac{y^{n-1} e^{-y}}{\Gamma(n)}\,dy = 0.025.$$
Obtain Fisher's information for $\lambda$ and use it to show that (0.0018, 0.0029) is an approximate
95% confidence interval for $\lambda$.
4. Let $X_1, \dots, X_n$ be a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$.
What statistics, based on this sample, have (i) a standard normal distribution, (ii) a $\chi^2(n-1)$
distribution, (iii) a $t(n-1)$ distribution?
What is meant by a $100(1-\alpha)\%$ confidence interval for an unknown parameter of a distribution? Construct a central confidence interval (i.e. an equal-tail confidence interval) for
the mean of a normal distribution with unknown variance $\sigma^2$, based on a random sample of
size $n$.
Let the confidence interval constructed above have width $W$. Show that there exists a
constant $c$ such that $cW^2/\sigma^2$ has a $\chi^2$-distribution and find $c$ in terms of $n$ and $\alpha$.
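5. Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be independent random samples from normal distributions
$N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, respectively, where the parameters $\mu_1$, $\mu_2$, $\sigma^2$ are unknown. Let
$$S^2 = (m + n - 2)^{-1} \left\{ \sum_{i=1}^{m} (X_i - \bar{X})^2 + \sum_{j=1}^{n} (Y_j - \bar{Y})^2 \right\}.$$
Determine the distributions of both
$$\frac{(m + n - 2)S^2}{\sigma^2} \qquad \text{and} \qquad \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{S^2\left(\frac{1}{m} + \frac{1}{n}\right)}}.$$
Show how to construct a confidence interval for $\mu_1 - \mu_2$.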


6. The elastic modulus of a material is $1.183 \times 10^3$ (in appropriate SI units), this being accurately
measured on sophisticated equipment. In an attempt to find a quicker, cheaper method of
measurement, the elastic modulus was measured using a simpler method and the following
results were obtained (all $\times 10^3$ SI units): 1.5, 1.0, 1.4, 1.7, 1.2, 1.3, 1.3, 1.2.
(i) Assuming the data arise from a normal distribution, how would you test whether this
distribution has the correct mean? State the appropriate null and alternative hypotheses, and any assumptions you need to make for the hypothesis test to be appropriate.
(ii) Carry out the test you suggested in (i) and state your conclusions.
(iii) Modify your test to test whether the data are from a distribution with a mean value
higher than the true value and re-state your conclusions.

7. (Optional, using R.) Read in the earthquake data from question 3 and try an exponential
Q-Q plot:
x <- scan("http://www.stats.ox.ac.uk/~laws/partA-stats/data/quakes.txt")
n <- length(x)
k <- 1:n
plot(-log(1 - k/(n+1)), sort(x), main = "Exponential Q-Q Plot",
ylab = "Ordered data", xlab = "-log[1 - k/(n+1)]")

Is an exponential model a reasonable assumption for this dataset?


The 2.5% and 97.5% quantiles for a gamma distribution with parameters (n, 1) can be calculated as follows.
a <- qgamma(0.025, n)
b <- qgamma(0.975, n)

That is, the function qgamma(p, n) calculates the pth quantile of a gamma distribution with
shape parameter n and rate parameter 1.
Now calculate the exact confidence interval of question 3:
sumx <- sum(x)
a/sumx
b/sumx

Also use R to check that the approximate 95% confidence interval for $\lambda$ obtained using
Fisher's information is as given in question 3.
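As a rough illustration of the check just described, the sketch below works from the totals quoted in question 3 (62 intervals summing to 27,107 days) rather than from the downloaded file, so it runs without network access. It assumes the rate parameterisation of the exponential distribution and assumes that Fisher's information for the rate has the form n divided by the squared rate, which question 3 asks you to derive. The variable names are illustrative and the code is not taken from Sheet2.R.

```r
# Totals quoted in question 3 (assumed to agree with quakes.txt).
n <- 62
sumx <- 27107

# Maximum likelihood estimate of the exponential rate.
lambda_hat <- n / sumx

# Assumed form of Fisher's information for the rate, evaluated at the MLE.
info_hat <- n / lambda_hat^2

# Approximate 95% interval: estimate +/- z / sqrt(information).
z <- qnorm(0.975)
approx_ci <- lambda_hat + c(-1, 1) * z / sqrt(info_hat)

# Exact interval from the gamma(n, 1) quantiles, for comparison.
exact_ci <- qgamma(c(0.025, 0.975), n) / sumx

print(signif(approx_ci, 3))
print(signif(exact_ci, 3))
```

Printing both intervals side by side shows how close the normal approximation is to the exact gamma-based interval for a sample of this size.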
8. (Optional, using R.) To do question 6 you will need the sample mean $\bar{x}$ and sample standard
deviation $s$:
x <- c(1.5, 1.0, 1.4, 1.7, 1.2, 1.3, 1.3, 1.2)
mean(x)
sd(x)

The functions qt and pt allow you to determine the significance (or otherwise) of your test
statistic(s) in question 6. Use qt and/or pt to find the quantiles and/or probabilities that
you need in question 6.
The $p$th quantile of a $t_r$-distribution can be calculated using qt(p, r), so e.g. the 97.5%
quantile of a $t_4$-distribution can be found using
qt(0.975, 4)

Alternatively, the cdf of a $t_r$-distribution at $y$ can be calculated using pt(y, r), so e.g. the
probability that a $t_4$ random variable is less than 1.96 is given by
pt(1.96, 4)

The above R code is in the file Sheet2.R and can be cut-and-pasted into R.
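For orientation, here is one way the pieces in question 8 might be combined for question 6. It is a sketch under the normality assumption stated there and is not part of Sheet2.R; the names mu0, t_stat and the p-value variables are illustrative. Base R's t.test is used only as a cross-check on the hand calculation.

```r
x <- c(1.5, 1.0, 1.4, 1.7, 1.2, 1.3, 1.3, 1.2)
mu0 <- 1.183          # accurately measured value, in units of 10^3
n <- length(x)

# One-sample t statistic, with n - 1 degrees of freedom under the null.
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value for part (ii), upper-tail p-value for part (iii).
p_two_sided <- 2 * pt(-abs(t_stat), n - 1)
p_one_sided <- pt(t_stat, n - 1, lower.tail = FALSE)

# 5% critical values to compare with t_stat.
crit_two_sided <- qt(0.975, n - 1)
crit_one_sided <- qt(0.95, n - 1)

# Cross-check against the built-in two-sided test.
check <- t.test(x, mu = mu0)

print(c(t_stat = t_stat, p_two_sided = p_two_sided, p_one_sided = p_one_sided))
print(c(crit_two_sided = crit_two_sided, crit_one_sided = crit_one_sided))
```

Calling t.test(x, mu = mu0, alternative = "greater") gives the corresponding built-in one-sided test for part (iii).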
