Mathematical statistics
Lecture notes by Janet Godolphin
2010
Spring Semester
Credits 15
email: d.terhesiu@surrey.ac.uk
Class Test
The Class Test will be held on Thursday 11th March (week 5), starting at 12.00.
Class tests will include questions of the following types:
examples and proofs previously worked in lectures,
questions from the self-study exercises,
previously unseen questions in a similar style.
The Class Test will comprise 15% of the overall assessment for the course.
Coursework
Distribution: Coursework will be distributed at 14.00 on Friday 26th March.
Collection: Coursework will be collected on Thursday 29th April in Room LTB.
The Coursework will comprise 10% of the overall assessment for the course.
Chapter 1 contains and reviews prerequisite material from MS132. Due to time
constraints, students are expected to work through at least part of this material
independently at the start of the course. Each section of the notes concludes
with a set of self-study exercises, which you should attempt in your own study
time in parallel with the lectures.
In addition, six exercise sheets will be distributed during the course. You will
be given a week to complete each sheet, which will then be marked by the lecturer
and returned with model solutions. It should be stressed that completion of these
exercise sheets is not compulsory but those students who complete the sheets do
give themselves a considerable advantage!
Selected texts
Freund, J. E. Mathematical Statistics with Applications, Pearson (2004)
Hogg, R. V. and Tanis, E. A. Probability and Statistical Inference, Prentice Hall
(1997)
Lindgren, B. W. Statistical Theory, Macmillan (1976)
Mood, A. M., Graybill, F. G. and Boes, D. C. Introduction to the Theory of Statistics, McGraw-Hill (1974)
Wackerly, D.D., Mendenhall, W., and Scheaffer, R.L. Mathematical Statistics with
Applications, Duxbury (2002)
Useful series
These series will be useful during the course:
$(1-x)^{-1} = \sum_{k=0}^{\infty} x^k = 1 + x + x^2 + x^3 + \cdots$ for $|x| < 1$

$\dfrac{x}{(1-x)^2} = \sum_{k=0}^{\infty} kx^k$ for $|x| < 1$

$e^x = \sum_{k=0}^{\infty} \dfrac{x^k}{k!}$
Chapter 1
Introductory revision material
This chapter contains and reviews prerequisite material from MS132. If necessary
you should review your notes for that module for additional details. Several examples, together with numerical answers, are included in this chapter. It is strongly
recommended that you work independently through these examples in order to
consolidate your understanding of the material.
1.1 Basic probability
Probability or chance can be measured on a scale which runs from zero, which
represents impossibility, to one, which represents certainty.
1.1.1 Terminology
Answer: The 36 equally likely outcomes (first roll, second roll) are:

                          second roll
            1      2      3      4      5      6
 first  1  (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
 roll   2  (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
        3  (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
        4  (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
        5  (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
        6  (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)

$p(E_1) = \frac{1}{6}$; $p(E_2) = \frac{1}{18}$; $p(E_3) = \frac{1}{2}$; $p(E_4) = \frac{1}{12}$.
Combinations of events
Given events A and B, further events can be identified as follows.
The complement of any event A, written $\bar{A}$ or $A^c$, means that A does not occur.
The union of any two events A and B, written $A \cup B$, means that A or B or both occur.
The intersection of A and B, written $A \cap B$, means that both A and B occur.
Venn diagrams are useful in this context.
Example 1: (continued)
Obtain $P(E_1 \cap E_2)$, $P(E_1 \cup E_2)$, $P(E_1 \cap E_3)$ and $P(E_1 \cup E_3)$.
Answer: $P(E_1 \cap E_2) = P(\emptyset) = 0$
$P(E_1 \cup E_2) = P(E_1) + P(E_2) = \frac{1}{6} + \frac{1}{18} = \frac{2}{9}$
$P(E_1 \cap E_3) = P\{(6,1), (6,3), (6,5)\} = \frac{3}{36} = \frac{1}{12}$
$P(E_1 \cup E_3) = P(E_1) + P(E_3) - P(E_1 \cap E_3) = \frac{1}{6} + \frac{1}{2} - \frac{1}{12} = \frac{7}{12}$
[Notes on axioms:
(1) In order to cope with infinite sequences of events, it is necessary to strengthen axiom 3 to
3'. $P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)$ for any sequence $(E_1, E_2, \ldots)$ of mutually exclusive events.
(2) When $\Omega$ is noncountably infinite, in order to make the theory rigorous it is usually necessary to restrict the class of events $\mathcal{F}$ to which probabilities are assigned.]
1.1.3 Conditional probability
For events A and B with $P(B) > 0$, the conditional probability of A given B is $P(A|B) = P(A \cap B)/P(B)$; A and B are independent if $P(A \cap B) = P(A)P(B)$.
Example 1 (continued): Find $P(E_1|E_2)$ and $P(E_1|E_3)$. Are $E_1$, $E_2$ independent?
Answer: $P(E_1|E_2) = \dfrac{P(E_1 \cap E_2)}{P(E_2)} = 0$ and $P(E_1|E_3) = \dfrac{P(E_1 \cap E_3)}{P(E_3)} = \dfrac{1/12}{1/2} = \dfrac{1}{6}$.
$P(E_1)P(E_2) \neq 0$, so $P(E_1 \cap E_2) \neq P(E_1)P(E_2)$ and thus $E_1$ and $E_2$ are not independent.
By the partition law, $P(A) = \sum_{j=1}^{k} P(A|B_j)P(B_j)$.
Bayes' Rule
Suppose that events $B_1, \ldots, B_k$ are mutually exclusive and exhaustive and let A be any event. Then
$$P(B_j|A) = \frac{P(A|B_j)P(B_j)}{P(A)} = \frac{P(A|B_j)P(B_j)}{\sum_i P(A|B_i)P(B_i)}.$$
Example 2: With D the event that a person has the disease and A a positive test result,
$$P(D|A) = \frac{P(A|D)P(D)}{P(A)} = \frac{0.98 \times 0.0001}{0.050093} = 0.002.$$
The person is still very unlikely to have the disease even though the test is positive.
Example 3: (Bertrand's Box Paradox) Three indistinguishable boxes contain black and white beads as shown: [ww], [wb], [bb]. A box is chosen at random and a bead chosen at random from the selected box. What is the probability that the [wb] box was chosen given that the selected bead was white?
Answer: Let E = "the [wb] box is chosen" and W = "the selected bead is white". By the partition law: $P(W) = 1 \cdot \frac{1}{3} + \frac{1}{2} \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} = \frac{1}{2}$. Now using Bayes' rule
$$P(E|W) = \frac{P(E)P(W|E)}{P(W)} = \frac{\frac{1}{3} \cdot \frac{1}{2}}{\frac{1}{2}} = \frac{1}{3}$$
(so even after a white bead from the box has been seen, the probability that the box is [wb] is still $\frac{1}{3}$).
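A quick simulation makes the answer believable; this sketch is only illustrative (the function name and trial count are our own choices, not part of the notes):

import random

def estimate_p_wb_given_white(n_trials=100_000):
    # The three boxes of Example 3: [ww], [wb], [bb].
    boxes = [("w", "w"), ("w", "b"), ("b", "b")]
    white_draws = 0   # draws in which the selected bead is white
    wb_and_white = 0  # ... and the box was [wb]
    for _ in range(n_trials):
        box = random.choice(boxes)  # box chosen at random
        bead = random.choice(box)   # bead chosen at random from that box
        if bead == "w":
            white_draws += 1
            if box == ("w", "b"):
                wb_and_white += 1
    return wb_and_white / white_draws

print(estimate_p_wb_given_white())  # should be close to 1/3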
1.1.4 Self-study exercises
1. Consider families of three children, a typical outcome being bbg (boy-boy-girl in birth order) with probability $\frac{1}{8}$. Find the probabilities of
(i) 2 boys and 1 girl (any order),
(ii) at least one boy,
(iii) consecutive children of different sexes.
Answer: (i) $\frac{3}{8}$; (ii) $\frac{7}{8}$; (iii) $\frac{1}{4}$.
2. Use $p_A = P(A)$, $p_B = P(B)$ and $p_{AB} = P(A \cap B)$ to obtain expressions for:
(a) $P(\overline{A \cap B})$,
(b) $P(\bar{A} \cap B)$,
(c) $P(\bar{A} \cup B)$,
(d) $P(\bar{A} \cap \bar{B})$,
(e) $P((A \cap \bar{B}) \cup (B \cap \bar{A}))$.
Describe each event in words. (Use a Venn diagram.)
Answer: (a) $1 - p_{AB}$; (b) $p_B - p_{AB}$; (c) $1 - p_A + p_{AB}$; (d) $1 - p_A - p_B + p_{AB}$; (e) $p_A + p_B - 2p_{AB}$.
3. (i) Express $P(E_1 \cup E_2 \cup E_3)$ in terms of the probabilities of $E_1$, $E_2$, $E_3$ and their intersections only. Illustrate with a sketch.
(ii) Three types of fault can occur which lead to the rejection of a certain manufactured item. The probabilities of each of these faults (A, B and C)
1.2 Random variables
A continuous random variable X with probability density function (pdf) f satisfies $\int f(x)\,dx = 1$.
Example 6: Let $f(x) = k(1 - x^2)$ on $(-1, 1)$. Calculate k and $\operatorname{pr}(|X| > 1/2)$.
Answer: $1 = \int_{-1}^{1} f(x)\,dx = \int_{-1}^{1} k(1 - x^2)\,dx = k\left[x - \tfrac{1}{3}x^3\right]_{-1}^{1} = \tfrac{4k}{3}$, so $k = \tfrac{3}{4}$.
$P(|X| > 1/2) = 1 - P\left(-\tfrac{1}{2} \le X \le \tfrac{1}{2}\right) = 1 - \int_{-1/2}^{1/2} k(1 - x^2)\,dx = 1 - \tfrac{11k}{12} = \tfrac{5}{16}$.
1.2.2 Expectation
The expectation (or expected value or mean) of the random variable X is defined as
$$\mu = E(X) = \begin{cases} \sum_x x\,p(x) & X \text{ discrete} \\ \int x f(x)\,dx & X \text{ continuous.} \end{cases}$$
The variance of X is $\sigma^2 = \operatorname{Var}(X) = E\{(X - \mu)^2\}$. Equivalently $\sigma^2 = E(X^2) - \{E(X)\}^2$ (exercise: prove). $\sigma$ is called the standard deviation.
Functions of X:
(i) $E\{h(X)\} = \sum_x h(x)p(x)$ (X discrete), $\int h(x)f(x)\,dx$ (X continuous).
Answer: $1 = \int_0^{\infty} k(1+x)^{-4}\,dx = k\left[-\tfrac{1}{3}(1+x)^{-3}\right]_0^{\infty} = \tfrac{k}{3}$, so $k = 3$, and
$$\int_0^{\infty} 3(1+x)^{-5}\,dx = 3\left[-\tfrac{1}{4}(1+x)^{-4}\right]_0^{\infty} = \tfrac{3}{4}.$$
Answer: $E(X) = \tfrac{31}{20}$, $E(2X+3) = 2E(X) + 3 = \tfrac{61}{10}$, $E(X^2) = \tfrac{73}{20}$, so $\operatorname{Var}(X) = E(X^2) - \{E(X)\}^2 = \tfrac{499}{400}$ and $\operatorname{Var}(2X+3) = 4\operatorname{Var}(X) = \tfrac{499}{100}$.
Chapter 2
Random variables and distributions
2.1 Transformations
Suppose that X has distribution function $F_X(x)$ and that the distribution function $F_Y(y)$ of $Y = h(X)$ is required, where h is a strictly increasing function. Then
$$F_Y(y) = \operatorname{pr}(Y \le y) = \operatorname{pr}(h(X) \le y) = \operatorname{pr}(X \le x) = F_X(x),$$
where $x \equiv x(y) = h^{-1}(y)$. If X is continuous and h is differentiable, then it follows that Y has density
$$f_Y(y) = \frac{dF_Y(y)}{dy} = \frac{dF_X(x)}{dy} = f_X(x)\frac{dx}{dy}.$$
Example 9: Suppose that X has pdf $f_X(x) = 2e^{-2x}$ on $(0, \infty)$. Obtain the pdf of $Y = \log X$.
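A solution sketch along the lines of the formula above (assuming log denotes the natural logarithm): here $x = h^{-1}(y) = e^y$ and $dx/dy = e^y$, so
$$f_Y(y) = f_X(e^y)\,e^y = 2e^y e^{-2e^y}, \qquad -\infty < y < \infty.$$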
Probability integral transform. Let X be a continuous random variable with distribution function F(x). Then $Y = F(X)$ is uniformly distributed on (0, 1).
Proof. First note that $0 \le Y \le 1$. Let $0 \le y \le 1$; then
$$\operatorname{pr}(Y \le y) = \operatorname{pr}(F(X) \le y) = \operatorname{pr}(X \le F^{-1}(y)) = F(F^{-1}(y)) = y,$$
so Y has pdf $f(y) = 1$ on (0, 1) (by differentiation), which is the density of the uniform distribution on (0, 1).
This result has an important application to the simulation of random variables:
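As an illustration of the simulation idea (a minimal sketch; the function name, the rate 2 and the sample size are our own choices):

import math
import random

# If U is uniform(0,1) and F is a continuous cdf, then F^{-1}(U) has cdf F.
# For the exponential(rate) distribution, F(x) = 1 - exp(-rate*x),
# so F^{-1}(u) = -log(1 - u)/rate.
def exponential_sample(rate=2.0):
    u = random.random()               # U ~ uniform(0, 1)
    return -math.log(1.0 - u) / rate  # F^{-1}(U)

samples = [exponential_sample() for _ in range(100_000)]
print(sum(samples) / len(samples))    # should be close to 1/rate = 0.5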
Suppose X has pmf taking the values $\frac{1}{10}, \frac{1}{5}, \frac{3}{10}, \frac{2}{5}$ and let $Y = (X - 2)^2$.
(i) Find E(Y) and Var(Y) using the formula for $E\{h(X)\}$.
(ii) Calculate the pmf of Y and use it to calculate E(Y) and Var(Y) directly.
Binomial distribution
Consider a sequence of independent trials in each of which there are only two possible results, success, with probability $\theta$, or failure, with probability $1 - \theta$ (independent Bernoulli trials).
Outcomes can be represented as binary sequences, with 1 for success and 0 for failure; e.g. 110001 has probability $\theta\theta(1-\theta)(1-\theta)(1-\theta)\theta = \theta^3(1-\theta)^3$, since the trials are independent.
Let the random variable X be the number of successes in n trials, with n fixed. The probability of a particular sequence of r 1s and $n - r$ 0s is $\theta^r(1-\theta)^{n-r}$, and the event $\{X = r\}$ contains $\binom{n}{r}$ such sequences. Hence
$$p(r) = \operatorname{pr}(X = r) = \binom{n}{r}\theta^r(1-\theta)^{n-r}, \qquad r = 0, 1, \ldots, n.$$
This is the pmf of the binomial $(n, \theta)$ distribution. The name comes from the binomial theorem
$$\{\theta + (1-\theta)\}^n = \sum_{r=0}^{n}\binom{n}{r}\theta^r(1-\theta)^{n-r},$$
from which $\sum_r p(r) = 1$ follows.
The mean is $\mu = n\theta$:
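One standard way to complete the derivation (our own filling of the gap, using $r\binom{n}{r} = n\binom{n-1}{r-1}$):
$$\mu = \sum_{r=0}^{n} r\binom{n}{r}\theta^r(1-\theta)^{n-r} = n\theta\sum_{r=1}^{n}\binom{n-1}{r-1}\theta^{r-1}(1-\theta)^{n-r} = n\theta\{\theta + (1-\theta)\}^{n-1} = n\theta.$$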
This is the pmf of the geometric $(\theta)$ distribution: the probabilities are in geometric progression. Note that the sum of the probabilities over $n = 1, 2, \ldots$ is 1.
The mean is $\mu = 1/\theta$:
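One standard way to complete this derivation (assuming the pmf $p(n) = \theta(1-\theta)^{n-1}$, $n = 1, 2, \ldots$, and using the useful series $x/(1-x)^2 = \sum_k kx^k$, i.e. $\sum_{n \ge 1} nx^{n-1} = (1-x)^{-2}$):
$$\mu = \sum_{n=1}^{\infty} n\,\theta(1-\theta)^{n-1} = \frac{\theta}{\{1 - (1-\theta)\}^2} = \frac{1}{\theta}.$$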
2.2.3 Poisson distribution
Example: Cars pass a point at random times. If the average rate is two cars per minute, find the probability of exactly 3 cars in 5 minutes.
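A worked answer (assuming, as is standard for a Poisson process, that the number of cars X in 5 minutes is Poisson with mean $2 \times 5 = 10$):
$$\operatorname{pr}(X = 3) = \frac{e^{-10}10^3}{3!} \approx 0.0076.$$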
2.3 Continuous distributions
2.3.2 Exponential distribution
The distribution function is $\int_0^x \lambda e^{-\lambda u}\,du = 1 - e^{-\lambda x}$.
2.3.4 Self-study exercises
The parameters $\mu$ and $\sigma^2$ are the mean and variance respectively. The distribution is denoted by $N(\mu, \sigma^2)$.
Mean:
The importance of the normal distribution follows from its use as an approximation in various statistical methods (a consequence of the Central Limit Theorem: see Section 3.4.2), its convenience for theoretical manipulation, and its application to describe observed data.
Standard normal distribution
The standard normal distribution is N(0, 1), for which the distribution function has the special notation $\Phi(x)$. Thus
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\,du.$$
The function $\Phi$ is tabulated widely (e.g. New Cambridge Statistical Tables). Useful values are $\Phi(1.64) = 0.95$, $\Phi(1.96) = 0.975$.
Example 12: Suppose that X is N(0, 1) and Y is N(2, 4). Use tables to calculate $\operatorname{pr}(X < 1)$, $\operatorname{pr}(X < -1)$, $\operatorname{pr}(-1.5 < X < 0.5)$, $\operatorname{pr}(Y < 1)$ and $\operatorname{pr}(Y^2 > 5Y - 6)$.
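The answers can be checked numerically; a sketch using scipy (assumed available — note that scipy parametrises the normal by its standard deviation, and that $Y^2 > 5Y - 6$ factorises as $(Y-2)(Y-3) > 0$):

from scipy.stats import norm

X = norm(0, 1)  # X ~ N(0, 1)
Y = norm(2, 2)  # Y ~ N(2, 4), i.e. mean 2 and standard deviation 2

print(X.cdf(1))                  # pr(X < 1)          = Phi(1)    ~ 0.8413
print(X.cdf(-1))                 # pr(X < -1)         = Phi(-1)   ~ 0.1587
print(X.cdf(0.5) - X.cdf(-1.5))  # pr(-1.5 < X < 0.5)             ~ 0.6247
print(Y.cdf(1))                  # pr(Y < 1)          = Phi(-0.5) ~ 0.3085
print(Y.cdf(2) + Y.sf(3))        # pr(Y < 2 or Y > 3)             ~ 0.8085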
2.4.2 Properties
(i) If X is $N(\mu, \sigma^2)$ then $aX + b$ is $N(a\mu + b, a^2\sigma^2)$.
In particular, the standardized variate $(X - \mu)/\sigma$ is N(0, 1).
(ii) If $X_1$ is $N(\mu_1, \sigma_1^2)$, $X_2$ is $N(\mu_2, \sigma_2^2)$ and $X_1$ and $X_2$ are independent, then $X_1 + X_2$ is $N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.
[Hence, from property (i), the distribution of $X_1 - X_2$ is $N(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)$.]
Proof:
2.5 Bivariate distributions
Example 13: (discrete)
(i) [Table: outcomes 00, 01, 10, 11 with the corresponding values of $X_1$ and $X_2$ and their probabilities.]
(ii) [Table: the joint pmf of $(X_1, X_2)$ for $x_1 = 0, 1, 2$ and $x_2 = -1, 0, 1$.]
Example 14: (continuous) Suppose $X_1$ and $X_2$ have joint pdf $f(x_1, x_2) = k(1 - x_1x_2^2)$ on $(0, 1)^2$. Obtain the value of k.
The marginal pmfs are
$$p_1(x_1) = \operatorname{pr}(X_1 = x_1) = \sum_{x_2} p(x_1, x_2) \quad\text{and}\quad p_2(x_2) = \operatorname{pr}(X_2 = x_2) = \sum_{x_1} p(x_1, x_2),$$
and the conditional pmfs are
$$\operatorname{pr}(X_1 = x_1 \mid X_2 = x_2) = \frac{\operatorname{pr}(X_1 = x_1, X_2 = x_2)}{\operatorname{pr}(X_2 = x_2)} = \frac{p(x_1, x_2)}{p_2(x_2)}.$$
Similarly
$$p_2(x_2 \mid X_1 = x_1) = \frac{p(x_1, x_2)}{p_1(x_1)}.$$
In the continuous case,
$$f_1(x_1 \mid X_2 = x_2) = \frac{f(x_1, x_2)}{f_2(x_2)}.$$
Similarly
$$f_2(x_2 \mid X_1 = x_1) = \frac{f(x_1, x_2)}{f_1(x_1)}.$$
Example 15: Suppose that R and N have a joint distribution in which R|N is binomial $(N, \theta)$ and N is Poisson $(\lambda)$. Show that R is Poisson $(\lambda\theta)$.
2.5.4 Covariance and correlation
The covariance of $X_1$ and $X_2$ is $\sigma_{12} = \operatorname{Cov}(X_1, X_2) = E(X_1X_2) - \mu_1\mu_2$, where $E(X_1X_2) = \sum\sum x_1x_2\,p(x_1, x_2)$ (discrete) or $\int\int x_1x_2\,f(x_1, x_2)\,dx_1dx_2$ (continuous).
The correlation between $X_1$ and $X_2$ is
$$\rho = \operatorname{Corr}(X_1, X_2) = \frac{\sigma_{12}}{\sigma_1\sigma_2}.$$
Example 13: (continued)
Marginal distributions: $x_1 = 0, 1, 2$ with $p_1(x_1) =$
$x_2 = -1, 0, 1$ with $p_2(x_2) =$
Marginal means: $\mu_1 = \sum x_1\,p_1(x_1) =$
$\mu_2 = \sum x_2\,p_2(x_2) =$
Variances: $\sigma_1^2 = \sum x_1^2\,p_1(x_1) - \mu_1^2 =$
$\sigma_2^2 = \sum x_2^2\,p_2(x_2) - \mu_2^2 =$
Covariance: $\sigma_{12} = \sum\sum x_1x_2\,p(x_1, x_2) - \mu_1\mu_2 =$
Example 14: (continued)
Marginal distributions: $f_1(x_1) = \int_0^1 k(1 - x_1x_2^2)\,dx_2 =$
$f_2(x_2) = \int_0^1 k(1 - x_1x_2^2)\,dx_1 =$
Marginal means: $\mu_1 = \int_0^1 x_1 f_1(x_1)\,dx_1 =$
$\mu_2 = \int_0^1 x_2 f_2(x_2)\,dx_2 =$
Variances: $\sigma_1^2 = \int_0^1 x_1^2 f_1(x_1)\,dx_1 - \mu_1^2 =$
$\sigma_2^2 = \int_0^1 x_2^2 f_2(x_2)\,dx_2 - \mu_2^2 =$
Properties
(i) $E(aX_1 + bX_2) = a\mu_1 + b\mu_2$, $\operatorname{Var}(aX_1 + bX_2) = a^2\sigma_1^2 + 2ab\sigma_{12} + b^2\sigma_2^2$,
$\operatorname{Cov}(aX_1 + b, cX_2 + d) = ac\,\sigma_{12}$, $\operatorname{Corr}(aX_1 + b, cX_2 + d) = \operatorname{Corr}(X_1, X_2)$
(note: invariance under linear transformation)
Proof:
(iv) $E(Y) = E\{E(Y|X)\}$ and $\operatorname{Var}(Y) = E\{\operatorname{Var}(Y|X)\} + \operatorname{Var}\{E(Y|X)\}$
Proof:
2.6 Generating functions
2.6.1 General
The generating function for a sequence $(a_n : n \ge 0)$ is $A(z) = a_0 + a_1z + a_2z^2 + \cdots = \sum_{n=0}^{\infty} a_nz^n$. Here z is a dummy variable. The definition is useful only if the series converges. The idea is to replace the sequence $(a_n)$ by the function A(z), which may be easier to analyse than the original sequence.
Examples:
(i) If $a_n = 1$ for $n = 0, 1, 2, \ldots$, then $A(z) = (1 - z)^{-1}$ for $|z| < 1$ (geometric series).
(ii) If $a_n = \binom{m}{n}$ for $n = 0, 1, \ldots, m$, and $a_n = 0$ for $n > m$, then $A(z) = (1 + z)^m$ (binomial series).
2.6.2 Probability generating functions
$$P(z) = E(z^X) = \sum_n p_nz^n.$$
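For example (a standard computation included for illustration), the Poisson $(\lambda)$ distribution with $p_n = e^{-\lambda}\lambda^n/n!$ has pgf
$$P(z) = \sum_{n=0}^{\infty} e^{-\lambda}\frac{\lambda^n}{n!}z^n = e^{-\lambda}e^{\lambda z} = e^{\lambda(z-1)}.$$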
Properties
(i) $|P(z)| \le 1$ for $|z| \le 1$.
Proof:
(iv) Let X and Y be independent random variables with pgfs $P_X$ and $P_Y$ respectively. Then the pgf of $X + Y$ is given by $P_{X+Y}(z) = P_X(z)P_Y(z)$.
Proof:
2.6.3 Moment generating functions
The mgf of X is $M(z) = E(e^{zX})$.
Properties
(i) $\mu = E(X) = M'(0)$, $\sigma^2 = \operatorname{Var}(X) = M''(0) - \mu^2$.
Proof:
(ii) Let X and Y be independent random variables with mgfs $M_X(z)$, $M_Y(z)$ respectively. Then the mgf of $X + Y$ is given by $M_{X+Y}(z) = M_X(z)M_Y(z)$.
Proof:
Chapter 3
Further distribution theory
3.1 Multivariate distributions
Write $X = (X_1, X_2, \ldots, X_p)^T$ for a p-dimensional random vector.
3.1.1 Definitions
$$\operatorname{pr}(X \in A) = \int_A f(x)\,dx \qquad \text{(continuous case)}$$
The mean vector and covariance matrix are
$$\mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_p \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & & & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}.$$
Since the (i, j)th element of $(X - \mu)(X - \mu)^T$ is $(X_i - \mu_i)(X_j - \mu_j)$, we see that
$$\Sigma = E\{(X - \mu)(X - \mu)^T\} = E(XX^T) - \mu\mu^T.$$
3.1.3 Properties
Let X have mean $\mu$ and covariance matrix $\Sigma$. Let a, b be p-vectors and A be a $q \times p$ matrix. Then
(i) $E(a^TX) = a^T\mu$
(ii) $\operatorname{Var}(a^TX) = a^T\Sigma a$. It follows that $\Sigma$ is positive semi-definite.
(iii) $\operatorname{Cov}(a^TX, b^TX) = a^T\Sigma b$
Proof:
3.2 Transformations
3.2.1 The univariate case
Problem: to find the distribution of $Y = h(X)$ from the known distribution of X. The case where h is a one-to-one function was treated in Section 1.2.3. When h is many-to-one we use the following generalised formulae:
Discrete case: $p_Y(y) = \sum p_X(x)$; continuous case: $f_Y(y) = \sum f_X(x)\left|\dfrac{dx}{dy}\right|$,
where in both cases the summations are over the set $\{x : h(x) = y\}$. That is, we add up the contributions to the mass or density at y from all x values which map to y.
Example 17: (discrete) Suppose $p_X(x) = p_x$ for $x = 0, 1, 2, 3, 4, 5$ and let $Y = (X - 2)^2$. Obtain the pmf of Y.
3.2.2 The multivariate case
Case (ii): $s < r$. First transform the s-vector Y to the r-vector $Y'$, where $Y_i' = Y_i$, $i = 1, \ldots, s$, and $Y_i'$, $i = s + 1, \ldots, r$, are chosen for convenience. Now find the density of $Y'$ as above and then integrate out $Y_{s+1}', \ldots, Y_r'$ to obtain the marginal density of Y, as required.
Case (iii): $s = r$ but $h(\cdot)$ is not monotonic. Then there will generally be more than one value of x corresponding to a given y and we need to add the probability contributions from all relevant x's.
Sums and products. If $X_1$ and $X_2$ are independent random variables with densities $f_1$ and $f_2$, then the sum $Y = X_1 + X_2$ has density
$$f_Y(y) = \int f_1(x)f_2(y - x)\,dx.$$
3.3
3.3.1 Joint moment generating functions
The mgf of a random vector X is $M(z) = E(e^{z^TX})$. Here $z^TX = \sum_j z_jX_j$.
Properties
Suppose X has mgf M(z). Then
(i) $X + a$ has mgf $e^{a^Tz}M(z)$ and $aX$ has mgf $M(az)$.
(ii) If $X_1, \ldots, X_k$ are independent, with $X_j$ having mgf $M_j(z_j)$, then the mgf of $X = (X_1, \ldots, X_k)^T$ is $M(z) = \prod_{j=1}^{k} M_j(z_j)$, the product of the individual mgfs.
Proof:
Markov's inequality. For any random variable X and any $a > 0$,
$$\operatorname{pr}(|X| \ge a) \le \frac{E|X|}{a}.$$
Proof:
Cauchy–Schwarz inequality
Let X, Y be any two random variables with finite variances. Then
$$\{E(XY)\}^2 \le E(X^2)E(Y^2).$$
Proof:
Jensen's inequality
If u(x) is a convex function then
$$E\{u(X)\} \ge u(E(X)).$$
Note that $u(\cdot)$ is convex if the curve $y = u(x)$ has a supporting line underneath at each point, e.g. bowl-shaped.
Proof:
Examples
1. Chebyshev's inequality.
Let Y be any random variable with finite variance. Then for all $a > 0$
$$\operatorname{pr}(|Y - \mu| \ge a) \le \frac{\sigma^2}{a^2}.$$
2. Correlation inequality.
$\{\operatorname{Cov}(X, Y)\}^2 \le \sigma_X^2\sigma_Y^2$ (which implies that $|\operatorname{Corr}(X, Y)| \le 1$).
3. $|E(X)| \le E(|X|)$.
[It follows that $|E\{h(Y)\}| \le E\{|h(Y)|\}$ for any function $h(\cdot)$.]
3.3.4 Self-study exercises
1. Find the joint mgf M(z) of (X, Y) when the pdf is $f(x, y) = \frac{1}{2}(x + y)e^{-(x+y)}$ on $(0, \infty)^2$. Deduce the mgf of $U = X + Y$.
3. Suppose that X is such that $E(X) = 3$ and $E(X^2) = 13$. Use Chebyshev's inequality to determine a lower bound for $\operatorname{pr}(-2 < X < 8)$.
4. Show that $\{E(|X|)\}^{-1} \le E(|X|^{-1})$.
3.4
3.4.1 Modes of convergence
Let $X_1, X_2, \ldots$ be a sequence of random variables. There are a number of alternative modes of convergence of $(X_n)$ to a limit random variable X. Suppose first that $X_1, X_2, \ldots$ and X are all defined on the same sample space $\Omega$.
Convergence in probability
$X_n \xrightarrow{p} X$ if, for every $\epsilon > 0$, $\operatorname{pr}(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$.
Convergence in rth mean
$X_n \xrightarrow{r} X$ if $E|X_n - X|^r \to 0$ as $n \to \infty$. [rth mean $\Rightarrow$ p, but rth mean $\not\Rightarrow$ a.s.]
Suppose now that the distribution functions are $F_1, F_2, \ldots$ and F. The random variables need not be defined on the same sample spaces for the following definition.
Convergence in distribution
$X_n \xrightarrow{d} X$ if $F_n(x) \to F(x)$ as $n \to \infty$ at every continuity point x of F.
3.4.2
Let $S_n = \sum_{i=1}^{n} X_i$ and $\bar{X}_n = n^{-1}S_n$.
Weak Law of Large Numbers (WLLN). If $E|X_i| < \infty$ then $\bar{X}_n \xrightarrow{p} \mu$.
Proof (case $\sigma^2 = \operatorname{Var}(X_i) < \infty$). Use Chebyshev's inequality: since $E(\bar{X}_n) = \mu$ we have, for every $\epsilon > 0$,
$$\operatorname{pr}(|\bar{X}_n - \mu| > \epsilon) \le \frac{\operatorname{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2} \to 0$$
as $n \to \infty$.
Example 21: (continued). Here $\sigma^2 = \operatorname{Var}(X_i) = \theta(1-\theta) \le \frac{1}{4} < \infty$, so the WLLN applies to $\bar{X}_n$, the proportion of successes.
Strong Law of Large Numbers (SLLN). If $E|X_i| < \infty$ then $\bar{X}_n \xrightarrow{a.s.} \mu$.
[The proof is more tricky and is omitted.]
Central Limit Theorem (CLT). If $\sigma^2 = \operatorname{Var}(X_i) < \infty$ then
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1).$$
Equivalently,
$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1).$$
Proof. Suppose that $X_i$ has mgf M(z). Write $Z_n = \dfrac{S_n - n\mu}{\sigma\sqrt{n}}$. The mgf of $Z_n$ is given by
$$E(e^{zZ_n}) = \exp\left(-\frac{z\mu\sqrt{n}}{\sigma}\right)\left\{M\left(\frac{z}{\sigma\sqrt{n}}\right)\right\}^n.$$
Therefore the cgf of $Z_n$ is
$$-\frac{z\mu\sqrt{n}}{\sigma} + nK\left(\frac{z}{\sigma\sqrt{n}}\right) = -\frac{z\mu\sqrt{n}}{\sigma} + n\left\{\mu\left(\frac{z}{\sigma\sqrt{n}}\right) + \frac{\sigma^2}{2}\left(\frac{z}{\sigma\sqrt{n}}\right)^2 + O(n^{-3/2})\right\}$$
$$= -\frac{z\mu\sqrt{n}}{\sigma} + \frac{z\mu\sqrt{n}}{\sigma} + \frac{z^2}{2} + O(n^{-1/2}) \to \frac{z^2}{2}$$
as $n \to \infty$, which is the cgf of the N(0, 1) distribution, as required. (Here $K(z) = \log M(z)$ is the cumulant generating function of $X_i$.)
[Note on the proof of the CLT. In cases where the mgf does not exist, a similar proof can be given in terms of the function $\phi(z) = E(e^{izX_j})$ where $i = \sqrt{-1}$. $\phi(\cdot)$ is called the characteristic function and always exists.]
Here $\mu = \theta$ and $\sigma^2 = \theta(1-\theta)$, so the CLT gives that $\sqrt{n}(\bar{X}_n - \theta)/\sqrt{\theta(1-\theta)}$ is approximately N(0, 1).
Furthermore, $\bar{X}_n \xrightarrow{p} \theta$ by the WLLN, and it follows from the useful result that …
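The CLT for Bernoulli trials is easy to see in a simulation; this sketch is illustrative (theta = 0.3, n = 1000 and all names are our own choices):

import math
import random

def standardised_proportion(n=1000, theta=0.3):
    # sqrt(n) * (Xbar_n - theta) / sqrt(theta * (1 - theta))
    successes = sum(random.random() < theta for _ in range(n))
    xbar = successes / n
    return (xbar - theta) * math.sqrt(n) / math.sqrt(theta * (1 - theta))

zs = [standardised_proportion() for _ in range(10_000)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs) - mean ** 2
print(mean, var)  # should be near 0 and 1
print(sum(abs(z) < 1.96 for z in zs) / len(zs))  # should be near 0.95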
3.5
$$1 = \theta^k\{1 - (1-\theta)\}^{-k} = \sum_{x=k}^{\infty} p(x).$$
$$p(x) = \operatorname{pr}(X = x) = \binom{m}{x}\theta_1^{x_1}\theta_2^{x_2}\cdots\theta_k^{x_k},$$
where $\theta_j = n_j/\sum_{i=1}^{k} n_i$. This is the pmf of the multinomial $(k, m, \theta)$ distribution. Here
$$\binom{m}{x} = \text{no. of different orderings of } x_1 + \cdots + x_k \text{ beads} = \frac{m!}{x_1! \cdots x_k!}$$
and the probability of any given order is $\theta_1^{x_1}\theta_2^{x_2}\cdots\theta_k^{x_k}$. The name multinomial comes from the multinomial expansion of $(\theta_1 + \cdots + \theta_k)^m$ in which the coefficient of $\theta_1^{x_1}\theta_2^{x_2}\cdots\theta_k^{x_k}$ is $\binom{m}{x}$.
The means are $m\theta_j$:
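A short justification (a standard argument, supplied here): marginally, $X_j$ counts the outcomes of type j in m independent trials, each having probability $\theta_j$, so $X_j$ is binomial $(m, \theta_j)$ and hence $E(X_j) = m\theta_j$.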
3.5.4 Self-study exercises
In particular, for integer a, $\Gamma(a) = (a-1)!$ (since $\Gamma(1) = 1$). Also, $\Gamma(1/2) = \sqrt{\pi}$.
Beta function: $B(a, b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx$ for $a > 0$, $b > 0$.
Relationship with Gamma function: $B(a, b) = \dfrac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$.
The gamma $(\alpha, \lambda)$ density is
$$f(x) = \frac{\lambda^{\alpha}x^{\alpha-1}e^{-\lambda x}}{\Gamma(\alpha)}, \qquad x > 0.$$
Example 22: The journey time of a bus on a nominal $\frac{1}{2}$-hour route has the gamma (3, 6) distribution. What is the probability that the bus is over half an hour late?
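A worked answer (reading "over half an hour late" as a journey time exceeding 1 hour, and using the fact that for integer shape the gamma tail is a Poisson sum):
$$\operatorname{pr}(X > 1) = e^{-6}\left(1 + 6 + \frac{6^2}{2!}\right) = 25e^{-6} \approx 0.062.$$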
3.6.3 Beta distribution
$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}, \qquad 0 < x < 1.$$
The mean is $\alpha/(\alpha + \beta)$:
Let $X_1$, $X_2$ be independent gamma $(\alpha_1, \lambda)$, gamma $(\alpha_2, \lambda)$ and put $U_1 = X_1 + X_2$, $U_2 = X_1/(X_1 + X_2)$. Then
$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = \begin{pmatrix} U_1U_2 \\ U_1(1 - U_2) \end{pmatrix}$$
with Jacobian
$$\frac{\partial x}{\partial u} = \begin{vmatrix} u_2 & u_1 \\ 1 - u_2 & -u_1 \end{vmatrix} = -u_1.$$
Therefore
$$f_U(u) = \left[\frac{\lambda^{\alpha_1}(u_1u_2)^{\alpha_1-1}e^{-\lambda u_1u_2}}{\Gamma(\alpha_1)}\right]\left[\frac{\lambda^{\alpha_2}\{u_1(1-u_2)\}^{\alpha_2-1}e^{-\lambda u_1(1-u_2)}}{\Gamma(\alpha_2)}\right]|-u_1|$$
$$= \left\{\frac{\lambda^{\alpha_1+\alpha_2}u_1^{\alpha_1+\alpha_2-1}e^{-\lambda u_1}}{\Gamma(\alpha_1+\alpha_2)}\right\}\left\{\frac{\Gamma(\alpha_1+\alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)}u_2^{\alpha_1-1}(1-u_2)^{\alpha_2-1}\right\}$$
on $(0, \infty) \times (0, 1)$ and the result follows.
3.6.4 Self-study exercises
1. Suppose X has the gamma (2, 4) distribution. Find the probability that X exceeds $\mu + 2\sigma$, where $\mu$, $\sigma$ are respectively the mean and standard deviation of X.
2. Derive the variance of the gamma $(\alpha, \lambda)$ distribution. [Either by direct calculation or using the mgf.]
3. Find the distribution of $-\log X$ when X is uniform (0, 1). Hence show that if $X_1, \ldots, X_k$ are iid uniform (0, 1) then $-\log(X_1X_2\cdots X_k)$ is gamma (k, 1).
4. If X is gamma $(\alpha, \lambda)$ show that $\log X$ has mgf $\lambda^{-z}\Gamma(z+\alpha)/\Gamma(\alpha)$.
5. Suppose X is uniform (0, 1) and $\beta > 0$. Show that $Y = X^{1/\beta}$ is beta $(\beta, 1)$.
Chapter 4
Normal and associated distributions
4.1 The multivariate normal distribution
In the bivariate case,
$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
giving $|2\pi\Sigma| = (2\pi)^2\sigma_1^2\sigma_2^2(1-\rho^2)$ and
$$\Sigma^{-1} = \frac{1}{1-\rho^2}\begin{pmatrix} 1/\sigma_1^2 & -\rho/(\sigma_1\sigma_2) \\ -\rho/(\sigma_1\sigma_2) & 1/\sigma_2^2 \end{pmatrix},$$
so that
$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)}\left\{\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right\}\right].$$
4.1.2 Properties
(i) Suppose X is $N_p(\mu, \Sigma)$ and let $Y = T^{-1}(X - \mu)$, where $\Sigma = TT^T$. Then $Y_i$, $i = 1, \ldots, p$, are independent N(0, 1).
The mgf of $N_p(\mu, \Sigma)$ is $M(z) = e^{\mu^Tz + \frac{1}{2}z^T\Sigma z}$.
If $X_i$, $i = 1, \ldots, n$, are independent $N_p(\mu_i, \Sigma_i)$ then $\sum_i X_i$ is $N_p\left(\sum_i \mu_i, \sum_i \Sigma_i\right)$.
(v) Moments of $N_p(\mu, \Sigma)$. Obtain by differentiation of the mgf. In particular, differentiating w.r.t. $z_j$ and $z_k$ gives $E(X_j) = \mu_j$, $\operatorname{Var}(X_j) = \sigma_{jj}$ and $\operatorname{Cov}(X_j, X_k) = \sigma_{jk}$.
Note that if $X_1, \ldots, X_p$ are all uncorrelated (i.e. $\sigma_{jk} = 0$ for $j \ne k$) then $X_1, \ldots, X_p$ are independent $N(\mu_j, \sigma_j^2)$.
4.1.4 Self-study exercises
1. Write the pdf of the $N_2\left(\begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 1 & 4 \end{pmatrix}\right)$ distribution in component form.
2. Suppose that $X_i$, $i = 1, \ldots, n$, are independent $N_p(\mu, \Sigma)$. Show that the sample mean vector, $\bar{X} = n^{-1}\sum_i X_i$, is $N_p(\mu, n^{-1}\Sigma)$.
3. For the distribution in exercise 1, obtain the marginal distributions of $X_1$ and $X_2$ and the conditional distributions of $X_2$ given $X_1 = x_1$ and $X_1$ given $X_2 = x_2$.
4.2
The $\chi_\nu^2$ density is
$$f(u) = \frac{u^{\nu/2-1}e^{-u/2}}{2^{\nu/2}\Gamma(\frac{\nu}{2})}, \qquad u > 0.$$
(i) If $X_1, \ldots, X_n$ are independent N(0, 1) then $\sum_{i=1}^{n} X_i^2$ is $\chi_n^2$. In particular, if X is N(0, 1) then $X^2$ is $\chi_1^2$.
(ii) If $U_1, \ldots, U_n$ are independent $\chi^2$ with $\nu_1, \ldots, \nu_n$ degrees of freedom respectively, then $\sum_{i=1}^{n} U_i$ is $\chi^2$ with $\nu = \sum_{i=1}^{n} \nu_i$.
(iii) If X is $N_p(\mu, \Sigma)$ then $(X - \mu)^T\Sigma^{-1}(X - \mu)$ is $\chi_p^2$.
Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$, with $\bar{X} = n^{-1}\sum_i X_i$ the sample mean and $S^2 = (n-1)^{-1}\sum_i (X_i - \bar{X})^2$ the sample variance. Then $\bar{X}$ is $N(\mu, \sigma^2/n)$, $(n-1)S^2/\sigma^2$ is $\chi_{n-1}^2$ and $\bar{X}$ and $S^2$ are independent.
Proof:
The $F_{\nu_1,\nu_2}$ density is
$$f(x) = \frac{(\nu_1/\nu_2)^{\nu_1/2}\,x^{\nu_1/2-1}}{B\left(\frac{\nu_1}{2}, \frac{\nu_2}{2}\right)\left(1 + \nu_1 x/\nu_2\right)^{\frac{1}{2}(\nu_1+\nu_2)}}, \qquad x > 0.$$
If $U_1$, $U_2$ are independent $\chi_{\nu_1}^2$, $\chi_{\nu_2}^2$ then
$$\frac{U_1/\nu_1}{U_2/\nu_2} \sim F_{\nu_1,\nu_2}.$$
Proof:
It follows from the above result that (i) $F_{\nu_1,\nu_2} \equiv 1/F_{\nu_2,\nu_1}$ and (ii) $F_{1,\nu} \equiv t_\nu^2$. (Exercise: check)
4.3
4.3.1 One sample
Suppose $Y_1, \ldots, Y_n$ are iid $N(\mu, \sigma^2)$. Then $\bar{Y}$ (the sample mean) and $S^2 = (n-1)^{-1}\sum_i (Y_i - \bar{Y})^2$ (the sample variance) are independent, respectively $N(\mu, \sigma^2/n)$ and $\sigma^2\chi_{n-1}^2/(n-1)$. Hence
$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$$
is N(0, 1),
$$U = \frac{(n-1)S^2}{\sigma^2}$$
is $\chi_{n-1}^2$ and Z, U are independent.
It follows that
$$T = \frac{Z}{\sqrt{U/(n-1)}} = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$$
is $t_{n-1}$.
Applications:
Inference about $\mu$: one-sample z-test ($\sigma$ known) and t-test ($\sigma$ unknown).
Inference about $\sigma^2$: $\chi^2$ test.
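The t statistic is simple to compute; a sketch (the data and hypothesised mean mu0 are invented for illustration):

import math

y = [0.93, 1.21, 1.05, 0.99, 1.18, 1.11, 0.87, 1.24]  # made-up sample
mu0 = 1.0                                             # hypothesised mean

n = len(y)
ybar = sum(y) / n
s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)  # sample variance S^2
t = (ybar - mu0) / math.sqrt(s2 / n)              # T = (Ybar - mu0)/(S/sqrt(n))
print(t)  # compare with t_{n-1} = t_7 critical values, e.g. +/- 2.365 at the 5% level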
4.3.2 Two samples
Two independent samples. Suppose that $Y_{11}, \ldots, Y_{1n_1}$ are iid $N(\mu_1, \sigma_1^2)$ and $Y_{21}, \ldots, Y_{2n_2}$ are iid $N(\mu_2, \sigma_2^2)$.
Summary statistics: $(n_1, \bar{Y}_1, S_1^2)$ and $(n_2, \bar{Y}_2, S_2^2)$.
Pooled sample variance: $S^2 = \dfrac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1 + n_2 - 2}$.
From Section 4.2, if $\sigma_1^2 = \sigma_2^2 = \sigma^2$, say, then $\bar{Y}_1$ and $(n_1-1)S_1^2$ are independent, $N(\mu_1, n_1^{-1}\sigma^2)$, $\sigma^2\chi_{n_1-1}^2$ respectively, and $\bar{Y}_2$ and $(n_2-1)S_2^2$ are independent, $N(\mu_2, n_2^{-1}\sigma^2)$, $\sigma^2\chi_{n_2-1}^2$ respectively.
Furthermore, $(\bar{Y}_1, (n_1-1)S_1^2)$ and $(\bar{Y}_2, (n_2-1)S_2^2)$ are independent.
Therefore $(\bar{Y}_1 - \bar{Y}_2)$ is $N(\mu_1 - \mu_2, (n_1^{-1} + n_2^{-1})\sigma^2)$, $(n_1+n_2-2)S^2/\sigma^2$ is $\chi_{n_1+n_2-2}^2$ and $(\bar{Y}_1 - \bar{Y}_2)$ and $(n_1+n_2-2)S^2$ are independent.
Therefore
$$T = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
is $t_{n_1+n_2-2}$.
Also, since $S_1^2$, $S_2^2$ are independent,
$$F = \frac{S_1^2}{S_2^2} \sim F_{n_1-1,n_2-1}.$$
Applications:
Inference about $\mu_1 - \mu_2$: two-sample z-test ($\sigma$ known) and t-test ($\sigma$ unknown).
Inference about $\sigma_1^2/\sigma_2^2$: F (variance ratio) test.
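Similarly, the variance-ratio statistic (again with invented data):

def sample_var(y):
    n = len(y)
    ybar = sum(y) / n
    return sum((yi - ybar) ** 2 for yi in y) / (n - 1)

y1 = [4.2, 5.1, 4.8, 5.5, 4.9]
y2 = [3.9, 4.4, 5.6, 4.1, 5.0, 4.7]

f = sample_var(y1) / sample_var(y2)  # F = S1^2/S2^2 ~ F_{n1-1, n2-1} if variances are equal
print(f)  # compare with F_{4,5} critical values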
4.3.3
Suppose there are k independent samples, sample i consisting of $n_i$ observations with sample mean $\bar{Y}_i$, and $n = \sum_i n_i$. Then $ssT = ssW + ssB$, where ssT, ssW and ssB denote the total, within-sample and between-sample sums of squares. Between- and within-sample variation are compared through the ratio
$$\frac{ssB/(k-1)}{ssW/(n-k)},$$
which has the $F_{k-1,n-k}$ distribution when the population means are all equal.
With $T_{xx} = \sum_i (x_i - \bar{x})^2$, the least squares estimators of the slope and intercept are
$$\hat{\beta} = \frac{\sum_i (x_i - \bar{x})Y_i}{T_{xx}}, \qquad \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{x}, \qquad V = \operatorname{Var}(\hat{\beta}) = \frac{\sigma^2}{T_{xx}}.$$
Sums of squares
Residual ss: $Q(\hat{\alpha}, \hat{\beta})$; $\hat{\alpha}$ is an unbiased estimator of $\alpha$ with estimated standard error $(S/\sqrt{T_{xx}})\left(\sum_i x_i^2/n\right)^{1/2}$.
If $\beta = 0$ then
$$T = \frac{\hat{\beta}}{S/\sqrt{T_{xx}}}$$
is $t_{n-2}$.