J.G. Kalbfleisch

Probability and Statistical Inference
Volume 2: Statistical Inference

Second Edition

With 27 Illustrations

Springer-Verlag
New York Berlin Heidelberg Tokyo
Springer Texts in Statistics
Advisors:
Stephen Fienberg Ingram Olkin
J.G. Kalbfleisch
Department of Statistics and Actuarial Science
University of Waterloo
Waterloo, Ontario N2L 3G1
Canada

Editorial Board

Stephen Fienberg                    Ingram Olkin
Department of Statistics            Department of Statistics
Carnegie-Mellon University          Stanford University
Pittsburgh, PA 15213                Stanford, CA 94305
U.S.A.                              U.S.A.
CHAPTER 10
Two-Parameter Likelihoods 53
10.1 Maximum Likelihood Estimation 53
10.2 Relative Likelihood and Contour Maps 61
10.3 Maximum Relative Likelihood 65
10.4 Normal Approximations 70
10.5 A Dose-Response Example 74
10.6 An Example from Learning Theory 83
10.7* Some Derivations 88
10.8* Multi-Parameter Likelihoods 92
CHAPTER 13
Analysis of Normal Measurements 196
13.1 Introduction 196
13.2 Statistical Methods 200
13.3 The One-Sample Model 206
13.4 The Two-Sample Model 212
13.5 The Straight Line Model 220
13.6 The Straight Line Model (Continued) 229
13.7 Analysis of Paired Measurements 234
Review Problems 240

CHAPTER 14
Normal Linear Models 242
14.1 Matrix Notation 242
14.2 Parameter Estimates 247
14.3 Testing Hypotheses in Linear Models 252
14.4 More on Tests and Confidence Intervals 260
14.5 Checking the Model 267
14.6* Derivations 274

CHAPTER 15
Sufficient Statistics and Conditional Tests 277
15.1 The Sufficiency Principle 277
Contents of Volume 1

Preface

CHAPTER 1
Introduction
1.1 Probability and Statistics
1.2 Observed Frequencies and Histograms
1.3 Probability Models
1.4 Expected Frequencies

CHAPTER 2
Equi-Probable Outcomes
2.1 Combinatorial Symbols
2.2 Random Sampling Without Replacement
2.3 The Hypergeometric Distribution
2.4 Random Sampling With Replacement
2.5 The Binomial Distribution
2.6* Occupancy Problems
2.7* The Theory of Runs
2.8* Symmetric Random Walks

CHAPTER 3
The Calculus of Probability
3.1 Unions and Intersections of Events
3.2 Independent Experiments and Product Models
3.3 Independent Events
CHAPTER 6
Continuous Variates
6.1 Definitions and Notation
6.2 Uniform and Exponential Distributions
6.3* Transformations Based on the Probability Integral
6.4* Lifetime Distributions
6.5* Waiting Times in a Poisson Process
6.6 The Normal Distribution
6.7 The Central Limit Theorem
6.8 Some Normal Approximations
6.9 The Chi-Square Distribution
6.10 The F and t Distributions
Review Problems
CHAPTER 7
Bivariate Continuous Distributions
7.1 Definitions and Notation
7.2 Change of Variables
CHAPTER 9
Likelihood Methods
The first volume dealt with probability models, and with mathematical
methods for handling and describing them. Several of the simplest discrete
and continuous probability models were considered in detail. This volume is
concerned with applications of probability models in problems of data
analysis and interpretation.
One important use of probability models is to provide simple mathemat-
ical descriptions of large bodies of data. For instance, we might describe a set
of 1000 blood pressure measurements as being like a sample of 1000
independent values from a normal distribution whose mean μ and variance
σ² are estimated from the data. This model gives a concise description of the
data, and from it we can easily calculate the approximate proportion of blood
pressure measurements which lie in any particular range. The accuracy of
such calculations will, of course, depend upon how well the normal distri-
bution model fits the data.
We shall be concerned primarily with applications of probability models in
problems of statistical inference, where it is desired to draw general
conclusions based on a limited amount of data. For instance, tests might be
run to determine the length of life of an aircraft component prior to failure
from metal fatigue. Such tests are typically very expensive and time
consuming, and hence only a few specimens can be examined. Based on the
small amount of data obtained, one would attempt to draw conclusions
about similar components which had not been tested. The link between the
observed sample and the remaining components is provided by the proba-
bility model. The data are used to check the adequacy of the model and to
estimate any unknown parameters which it involves. General statements
concerning this type of component are then based on the model.
A limited amount of data can be misleading, and therefore any general
conclusions drawn will be subject to uncertainty. Measurement of the extent of this uncertainty is an important part of the problem. An estimate is of little value unless we know how accurate it is likely to be.

In statistical inference problems, we usually start with a set of data, and with some information about the way in which the data were collected. We then attempt to formulate a probability model for the experiment which gave rise to the data. Examination of the data, and of other similar data sets, can be very useful at this stage. It is important to treat the data set in context, and to take full advantage of what is already known from other similar applications.

Usually the probability model will involve one or more unknown parameters which must be estimated from the data. We have already encountered this problem on several occasions, and have used the observed sample mean as an estimate of the mean of a Poisson or exponential distribution. Intuitively, this is a reasonable thing to do, but intuition may fail us in more complicated situations.

Section 9.1 introduces the method of maximum likelihood, which provides a routine procedure for obtaining estimates of unknown parameters. Section 2 considers the problem of estimating an unknown parameter θ on the basis of data from two independent experiments. Section 3 shows how the relative likelihood function may be used to rank possible values of θ according to their plausibilities.

Section 4 describes likelihood methods when the probability model is continuous. The special case of censoring in lifetime experiments is considered in Section 5. Section 6 discusses the invariance property of likelihood methods, and Section 7 describes a normal approximation to the log relative likelihood function. The use of Newton's method in finding maximum likelihood estimates and likelihood intervals is illustrated in Section 8.

In this chapter it is assumed that the probability model involves only one unknown parameter. Likelihood methods for the estimation of two or more unknown parameters are described in Chapter 10. Some theoretical properties of these estimation procedures are considered in Chapter 11. Chapter 12 introduces tests of significance, which are used to investigate whether various hypotheses of interest are consistent with the data. Several applications of significance tests to frequency data are given.

Traditionally, the normal distribution has played a very important role in statistical applications. Chapters 13 and 14 develop estimation procedures and significance tests for a variety of situations where measurements are assumed to be independent and normally distributed. Finally, Chapters 15 and 16 deal with some more advanced topics in statistical inference.

9.1 The Method of Maximum Likelihood

Suppose that a probability model has been formulated for an experiment, and that it involves a single unknown parameter θ. The experiment is performed and some data are obtained. We wish to use the data to estimate the value of θ. More generally, we wish to determine which of the possible values of θ are plausible or likely in the light of the observations.

The observed data can be regarded as an event E in the sample space for the probability model. The probability of event E can be determined from the model, and in general it will be a function of the unknown parameter, P(E; θ). The maximum likelihood estimate (MLE) of θ is the value of θ which maximizes P(E; θ). The MLE of θ is usually denoted by θ̂. It is the parameter value which best explains the data E in the sense that it maximizes the probability of E under the model.

EXAMPLE 9.1.1. Suppose that we wish to estimate θ, the proportion of people with tuberculosis in a large homogeneous population. To do this, we randomly select n individuals for testing, and find that x of them have the disease. Since the population is large and homogeneous, we assume that the n individuals tested are independent, and that each has probability θ of having tuberculosis. The probability of the observed event (data) is then

P(E; θ) = P(x out of n have tuberculosis) = (n choose x) θ^x (1 − θ)^(n−x)   (9.1.1)

where 0 ≤ θ ≤ 1. The maximum likelihood estimate θ̂ is the value of θ which maximizes (9.1.1). We shall show later that (9.1.1) is maximized for θ = x/n, and so the MLE of θ is θ̂ = x/n. To maximize the probability of the data we estimate θ, the proportion of diseased persons in the population, by x/n, the proportion of diseased persons in the sample.

The Likelihood and Log Likelihood Functions

Note that the constant factor (n choose x) will have no effect on the maximization of (9.1.1) over θ. To simplify expressions, we shall generally omit such constants and consider only the part of P(E; θ) which involves θ.

The likelihood function of θ is defined as follows:

L(θ) = c · P(E; θ).   (9.1.2)
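As a numerical check on Example 9.1.1, the sketch below (Python; not part of the original text, and the values x = 3 and n = 100 are arbitrary illustrations) maximizes the log likelihood over a grid and confirms that the maximum falls at θ = x/n, where the first derivative of l vanishes and minus its second derivative is positive:

```python
import math

def log_likelihood(theta, x, n):
    # l(theta) = x log(theta) + (n - x) log(1 - theta), constants omitted
    return x * math.log(theta) + (n - x) * math.log(1 - theta)

def score(theta, x, n):
    # First derivative of the log likelihood: x/theta - (n - x)/(1 - theta)
    return x / theta - (n - x) / (1 - theta)

def information(theta, x, n):
    # Minus the second derivative: x/theta**2 + (n - x)/(1 - theta)**2
    return x / theta**2 + (n - x) / (1 - theta)**2

# Brute-force maximization over a grid of interior points of [0, 1];
# with x = 3 diseased out of n = 100 tested, the grid maximum lands at x/n.
x, n = 3, 100
grid = [i / 10000 for i in range(1, 10000)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, x, n))
print(theta_hat)                      # 0.03
print(score(theta_hat, x, n))         # approximately 0 at the maximum
print(information(theta_hat, x, n))   # positive: a relative maximum
```

The grid size is an arbitrary choice; any reasonably fine grid over the interior of the parameter space locates the same maximum.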
Here c is any positive constant with respect to θ; that is, c is not a function of θ, although it may be a function of the data. We choose c to obtain a simple expression for L(θ), and subsequent results will not depend upon the specific choice made.

Usually P(E; θ) and L(θ) are products of terms, and it will be more convenient to work with logarithms. The log likelihood function is the natural logarithm of L:

l(θ) = log L(θ).   (9.1.3)

Note that, by (9.1.2),

l(θ) = c′ + log P(E; θ)

where c′ = log c is not a function of θ.

The maximum likelihood estimate θ̂ is the value of θ which maximizes P(E; θ). The value of θ which maximizes P(E; θ) will also maximize L(θ) and l(θ). Thus the MLE θ̂ is the value of θ which maximizes the likelihood function and the log likelihood function. Usually it is easiest to work with the log likelihood function.

EXAMPLE 9.1.1 (continued). The likelihood function of θ is any constant c times the expression for P(E; θ) in (9.1.1), where c may depend on n and x but not on θ. Since the aim in choosing c is to simplify, we take c = 1/(n choose x), so that L(θ) = θ^x (1 − θ)^(n−x) and

l(θ) = x log θ + (n − x) log(1 − θ) for 0 ≤ θ ≤ 1.

The MLE θ̂ is the value of θ which maximizes l(θ).

The Score and Information Functions

To evaluate θ̂, we need to locate the maximum of l(θ) over all possible values of θ. This can usually be done by differentiating l(θ) with respect to θ, setting the derivative equal to zero, and solving for θ. It is possible that this procedure might yield a relative minimum or point of inflexion instead of the maximum desired. Thus it is necessary to verify that a maximum has been found, perhaps by checking that the second derivative is negative.

The score function S(θ) is defined to be the first derivative of the log likelihood function with respect to θ:

S(θ) = l′(θ) = dl(θ)/dθ.   (9.1.4)

The information function I(θ) is minus the second derivative of the log likelihood function with respect to θ:

I(θ) = −l″(θ) = −S′(θ) = −d²l(θ)/dθ².   (9.1.5)

Note that neither S(θ) nor I(θ) depends on the choice of c in (9.1.2).

The set Ω of possible values of θ is called the parameter space. Usually Ω is an interval of real values, such as [0, 1] in the example above, and the first and second derivatives of l(θ) with respect to θ exist at all interior points of Ω. Then, if θ̂ is an interior point of Ω, the first derivative will be zero and the second derivative will be negative at θ = θ̂. Thus under these conditions we have

S(θ̂) = 0;   I(θ̂) > 0.   (9.1.6)

To find θ̂, we determine the roots of the maximum likelihood equation S(θ) = 0. We then verify, by checking the sign of I(θ) or otherwise, that a relative maximum has been found. In some simple examples, the maximum likelihood equation S(θ) = 0 can be solved algebraically to yield a formula for θ̂. In more complicated situations, it will be necessary to solve this equation numerically (see Section 9.8).

EXAMPLE 9.1.1 (continued). For this example, the score and information functions are

S(θ) = dl(θ)/dθ = x/θ − (n − x)/(1 − θ) for 0 < θ < 1;

I(θ) = −dS(θ)/dθ = x/θ² + (n − x)/(1 − θ)² for 0 < θ < 1.

For 1 ≤ x ≤ n − 1, the maximum likelihood equation S(θ) = 0 has a unique solution θ = x/n. Since I(θ) > 0 at θ = x/n, the likelihood function has a relative maximum at θ = x/n. Furthermore, since L(θ) = 0 for θ = 0 and for θ = 1, we have found the overall maximum, and thus θ̂ = x/n.

If x = 0, the equation S(θ) = 0 has no solution, and the maximum occurs on a boundary of the parameter space Ω = [0, 1]. In this case we have

P(E; θ) = (1 − θ)^n for 0 ≤ θ ≤ 1,
which is clearly largest when θ = 0. Thus θ̂ = 0 when x = 0. Similarly we find that θ̂ = 1 when x = n, and the formula θ̂ = x/n holds for all x.

EXAMPLE 9.1.2. Some laboratory tests are run on samples of river water in order to determine whether the water is safe for swimming. Of particular interest is the concentration of coliform bacteria in the water. The number of coliform bacteria is determined for each of n unit-volume samples of river water, giving n observed counts x₁, x₂, ..., xₙ. The problem is to estimate μ, the average number of coliform bacteria per unit volume in the river.

We assume that the bacteria are distributed randomly and uniformly throughout the river water, so that the assumptions of a Poisson process (Section 4.4) are satisfied. Then the probability of observing xᵢ bacteria in a sample of unit volume is given by a Poisson distribution:

f(xᵢ) = μ^(xᵢ) e^(−μ) / xᵢ!;   xᵢ = 0, 1, 2, ....

Since disjoint volumes are independent, the probability of the n observed counts x₁, x₂, ..., xₙ is

P(E; μ) = f(x₁)f(x₂)···f(xₙ).

The likelihood function is c · P(E; μ) where c is any constant not depending upon μ. We choose c to simplify the expression for L(μ), and a natural choice here is c = x₁! x₂! ··· xₙ!, which gives

L(μ) = μ^(Σxᵢ) e^(−nμ) for 0 ≤ μ < ∞,

and the log likelihood function is

l(μ) = Σxᵢ log μ − nμ.

The score and information functions are

S(μ) = (1/μ)Σxᵢ − n;   I(μ) = (1/μ²)Σxᵢ.

These functions will be the same no matter how the constant c is chosen.

If Σxᵢ > 0, the maximum likelihood equation S(μ) = 0 has a unique solution μ = (1/n)Σxᵢ = x̄. Since I(μ) > 0 at μ = x̄, we have found a relative maximum. Furthermore, since L(0) = 0 and L(μ) → 0 as μ → ∞, this must be the overall maximum. If Σxᵢ = 0, the equation S(μ) = 0 has no solution, and the maximum occurs on the boundary of the parameter space: μ = 0.

In both cases, the MLE is μ̂ = x̄. To maximize the probability of the data x₁, x₂, ..., xₙ, we estimate the population mean μ by the sample mean x̄.

EXAMPLE 9.1.3. It is usually not possible to count the number of bacteria in a sample of river water; one can only determine whether or not any are present. n test tubes each containing a volume v of river water are incubated and tested. A negative test shows that there were no bacteria present, while a positive test shows that at least one bacterium was present. If y tubes out of the n tested give negative results, what is the maximum likelihood estimate of μ?

SOLUTION. The probability that there are x bacteria in a volume v of river water is given by a Poisson distribution with mean μv:

f(x) = (μv)^x e^(−μv) / x!;   x = 0, 1, 2, ....

The probability of a negative reaction is

p = f(0) = e^(−μv);

the probability of a positive reaction (at least one bacterium) is

1 − p = 1 − e^(−μv).

Since disjoint volumes are independent, the n test tubes constitute independent trials. The probability of observing y negative reactions out of n is therefore

P(E; μ) = (n choose y) p^y (1 − p)^(n−y).

Taking c = 1/(n choose y), chosen to give a simple expression for L(μ), we have

L(μ) = p^y (1 − p)^(n−y) where p = e^(−vμ) and 0 ≤ μ < ∞.

Since p = e^(−vμ), it follows that μ = −(1/v) log p. From Example 9.1.1, the function p^y (1 − p)^(n−y) is maximized for p = y/n. The corresponding value of μ is

μ̂ = −(1/v) log p̂ = −(1/v) log(y/n) = (log n − log y)/v.

Here we have used the invariance property of likelihood (see Section 9.6).

For instance, suppose that 40 test tubes each containing 10 ml of river water are incubated. If 28 give negative tests and 12 give positive tests, then

μ̂ = (log 40 − log 28)/10 = 0.0357.
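The dilution estimate of Example 9.1.3 is a one-line computation. The helper below is an illustrative sketch (Python; the function name `mle_dilution` is mine, not the book's):

```python
import math

def mle_dilution(n, y, v):
    """MLE of bacterial concentration from n tubes of volume v,
    y of which test negative: mu_hat = (log n - log y) / v."""
    return (math.log(n) - math.log(y)) / v

# 40 tubes of 10 ml, 28 negative, as in Example 9.1.3
print(round(mle_dilution(40, 28, 10), 4))  # 0.0357
# 40 tubes of 1 ml, 37 negative (the second sample used in Example 9.2.2)
print(round(mle_dilution(40, 37, 1), 4))   # 0.078
```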
The concentration of bacteria in the river is estimated to be 0.0357 per ml.

The greater the concentration of bacteria in the river, the more probable it is that all n test tubes will give positive results. Hence the larger the value of μ, the more probable the observation y = 0. If we observe y = 0, the MLE of μ will be +∞. In this case, it does not make much practical sense to give merely a single estimate of μ. What we require is an indication of the range of μ-values which are plausible in the light of the data, rather than a single "most plausible" value. This can be obtained by examining the relative likelihood function (see Section 3).

Likelihoods Based on Frequency Tables

Data from n independent repetitions of an experiment are often summarized in a frequency table:

Event or class        A₁    A₂    ...   A_k    Total
Observed frequency    f₁    f₂    ...   f_k    n
Expected frequency    np₁   np₂   ...   np_k   n

The sample space S for a single repetition of the experiment is partitioned into k mutually exclusive classes or events, S = A₁ ∪ A₂ ∪ ··· ∪ A_k. Then f_j is the number of times that A_j occurs in n repetitions (Σf_j = n). Let p_j be the probability of event A_j in any one repetition (Σp_j = 1). The p_j's can be determined from the probability model. If the model involves an unknown parameter θ, the p_j's will generally be functions of θ.

The probability of observing a particular frequency table is given by the multinomial distribution

P(E; θ) = [n! / (f₁! f₂! ··· f_k!)] p₁^(f₁) p₂^(f₂) ··· p_k^(f_k).

The likelihood function is

L(θ) = c · p₁^(f₁) p₂^(f₂) ··· p_k^(f_k).   (9.1.7)

The number of defective items out of 10 is thought to have a binomial distribution. Find the MLE of θ, the probability that an item is defective, and compute estimated expected frequencies under the binomial distribution model.

SOLUTION. According to a binomial distribution model, the probability of observing j defectives out of 10 is

p_j = (10 choose j) θ^j (1 − θ)^(10−j);   j = 0, 1, 2, ..., 10.

The probability of observing 4 or more defectives is p₄₊ = 1 − p₀ − p₁ − p₂ − p₃. By (9.1.7), the likelihood function of θ is L(θ) = c · p₀^(f₀) p₁^(f₁) p₂^(f₂) p₃^(f₃) p₄₊^(f₄₊) for 0 ≤ θ ≤ 1, where c is any convenient positive constant. Taking c to cancel the constant factors in the p_j's, we obtain

L(θ) = [(1 − θ)^10]^133 [θ(1 − θ)^9]^52 [θ²(1 − θ)^8]^12 [θ³(1 − θ)^7]^3 = θ^85 (1 − θ)^1915.

This likelihood function is of the form considered in Example 9.1.1, with x = 85 and n = 2000. Hence θ̂ = 85/2000 = 0.0425.

The estimated probability for class j = 0 is

p̂₀ = (10 choose 0) θ̂^0 (1 − θ̂)^(10−0) = (1 − 0.0425)^10 = 0.6477

and the estimated expected frequency for this class is 200 p̂₀ = 129.5.
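The expected frequencies can be generated for every class at once. The sketch below (Python; not part of the text) uses the frequency-table total 200 implied by the likelihood exponents (133 + 52 + 12 + 3 = 200):

```python
from math import comb

# MLE read off the likelihood L(theta) = theta**85 * (1 - theta)**1915
theta_hat = 85 / 2000

# Frequency-table total implied by the exponents in the likelihood
n_repetitions = 200

# Estimated binomial(10, theta_hat) class probabilities p_j and the
# corresponding expected frequencies n * p_j for j = 0, 1, 2, 3
for j in range(4):
    p_j = comb(10, j) * theta_hat**j * (1 - theta_hat)**(10 - j)
    print(j, round(p_j, 4), round(n_repetitions * p_j, 1))
```

For class j = 0 this reproduces the 0.6477 and 129.5 of the solution above.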
np² + (2x₂ + x₃ − x₁)p − 2x₃ = 0.

(b) Suppose that x₁ = 58, x₂ = 33, and x₃ = 9. Find p̂, and compute estimated expected frequencies under the model.

A brick-shaped die (Example 1.3.2) is rolled n times, and the ith face comes up xᵢ times (i = 1, 2, ..., 6), where Σxᵢ = n.
(a) Show that the MLE of the parameter is (3t − 2n)/12n, where t = x₁ + x₂ + x₃ + x₄.
(b) Suppose that the observed frequencies are 11, 15, 13, 15, 22, 24. Compute estimated expected frequencies under the model.

11. An urn contains r red balls and b black balls, where r is known but b is unknown. Of n balls chosen at random without replacement, x were red and y were black (x + y = n).

(a) Show that the MLE of θ satisfies the equation

x̄[1 − (1 − θ)^k] − kθ = 0

where x̄ = Σxᵢ/n.
(b) Use the binomial theorem to show that, if θ is small, then

θ̂ ≈ 2(x̄ − 1) / [(k − 1)x̄].

(c) Solve for θ̂ in the case k = 5, x̄ = 1.12.

9.2 Combining Independent Experiments

To combine information about θ from two or more independent experiments, we multiply the likelihood functions, or add the log likelihood functions. It follows from (9.2.2) and (9.1.4) that

S(θ) = S₁(θ) + S₂(θ).   (9.2.3)

The score function for the composite experiment is the sum of the score functions for the independent components. Similarly, (9.2.3) and (9.1.5) give
I(θ) = I₁(θ) + I₂(θ).   (9.2.4)

Let θ̂₁, θ̂₂, and θ̂ be the MLE's of θ based on just the first experiment, on just the second experiment, and on both experiments, respectively. Thus θ̂₁ maximizes l₁(θ), θ̂₂ maximizes l₂(θ), and θ̂ maximizes l(θ). Except in special cases, it is not possible to compute θ̂ from just θ̂₁ and θ̂₂. One has to add log likelihoods using (9.2.2) and then remaximize to get θ̂. If θ̂₁ = θ̂₂, then both terms on the right hand side of (9.2.2) attain their maxima at the same value of θ, and hence θ̂ = θ̂₁ = θ̂₂. Otherwise, the overall maximum will usually lie between θ̂₁ and θ̂₂.

If the estimates θ̂₁, θ̂₂ were quite different, it would usually be unwise to combine results from the two experiments to obtain a single overall estimate. Instead the results from the two experiments should be reported separately, and an explanation of the difference should be sought. For further discussion see Example 9.3.2 and Section 12.3.

EXAMPLE 9.2.1. Suppose that, in Example 9.1.1, m additional people are randomly selected, and y of them are found to have tuberculosis. Find the MLE of θ based on both sets of data.

SOLUTION. For the first experiment, the log likelihood function is

l₁(θ) = x log θ + (n − x) log(1 − θ),   (9.2.5)

and the maximum likelihood estimate is θ̂₁ = x/n. For the second experiment, we similarly obtain

l₂(θ) = y log θ + (m − y) log(1 − θ),

and θ̂₂ = y/m. Because the population is large, the two samples will be very nearly independent, and hence by (9.2.2), the log likelihood function based on both samples is

l(θ) = l₁(θ) + l₂(θ) = (x + y) log θ + (n + m − x − y) log(1 − θ).   (9.2.6)

This is of the same form as (9.2.5), and the overall MLE is

θ̂ = (x + y)/(n + m).

Since x = nθ̂₁ and y = mθ̂₂, we have

θ̂ = [n/(n + m)]θ̂₁ + [m/(n + m)]θ̂₂,

which is a weighted average of θ̂₁ and θ̂₂. For instance, if 90 individuals are examined in the first sample (n = 90), and only 10 in the second (m = 10), we have

θ̂ = 0.9θ̂₁ + 0.1θ̂₂.

The overall MLE lies between θ̂₁ and θ̂₂, and is closer to θ̂₁, the MLE from the larger sample, than to θ̂₂.

Note that the log likelihood function (9.2.6) is the same as would be obtained if we considered a single sample of n + m individuals, x + y of whom were found to have tuberculosis. The division of the results into two separate experiments is irrelevant in so far as estimation of θ is concerned.

EXAMPLE 9.2.2. In performing the experiment described in Example 9.1.3, it is necessary to specify the volume v of river water which is to be placed in each test tube. If v is made too large, then all of the test tubes will contain bacteria and give a positive reaction. If v is too small, we may get only negative reactions. In either case, the experiment will be rather uninformative about μ, the concentration of bacteria in the river.

One way to guard against this difficulty is to prepare two (or more) different types of test tubes containing different volumes of river water. Suppose that 40 test tubes containing 10 ml of river water were tested, and 28 gave negative results. Also, 40 test tubes containing 1 ml of river water were tested, and 37 gave negative results. What is the maximum likelihood estimate of μ?

SOLUTION. From Example 9.1.3, the likelihood function based on the 40 tubes containing 10 ml is

L₁(μ) = p₁^28 (1 − p₁)^12

where p₁ = e^(−10μ), and the MLE of μ is μ̂₁ = 0.0357. The log likelihood function is

l₁(μ) = 28 log p₁ + 12 log(1 − p₁).

Similarly, from the 40 tubes containing 1 ml we obtain

l₂(μ) = 37 log p₂ + 3 log(1 − p₂)

where p₂ = e^(−μ), and the MLE of μ is

μ̂₂ = (log n − log y)/v = (log 40 − log 37)/1 = 0.0780.

By (9.2.2), the log likelihood function based on all 80 tubes is

l(μ) = l₁(μ) + l₂(μ),

and the overall MLE μ̂ is chosen to maximize this function.

For the first sample we have

S₁(μ) = dl₁(μ)/dμ = (dl₁/dp₁)(dp₁/dμ) = −10p₁[28/p₁ − 12/(1 − p₁)] = 120/(1 − p₁) − 400.
The information function for the first sample is

I₁(μ) = −dS₁(μ)/dμ = −[120/(1 − p₁)²](dp₁/dμ) = 1200p₁/(1 − p₁)².

Similarly, for the second sample we obtain

S₂(μ) = 3/(1 − p₂) − 40;   I₂(μ) = 3p₂/(1 − p₂)².

Thus, by (9.2.3) and (9.2.4), the combined results are

S(μ) = 120/(1 − p₁) + 3/(1 − p₂) − 440;   I(μ) = 1200p₁/(1 − p₁)² + 3p₂/(1 − p₂)².

where a₁ and a₂ are positive real numbers with a₁ + a₂ = 1. Show that θ̂ must lie between θ̂₁ and θ̂₂.

Region                      Population   Deaths
1. Eastern Ontario             423,447       37
2. Lake Ontario                175,685       11
3. Central Ontario           1,245,379       72
4. Niagara                     413,465       40
5. Lake Erie                   216,476       12
6. Lake St. Clair              242,810       14
7. Mid-Western Ontario         213,591       16
8. Georgian Bay                166,045        9
9. Northeastern Ontario        265,880       15
10. Lakehead-NW Ontario        116,371       12

Figure 9.3.1. Log relative likelihood functions from Example 9.3.1 (solid line: based on 3 diseased out of 100; broken line: based on 6 diseased out of 200; horizontal lines at R = 0.1 and R = 0.01).

RLF based on the sample of 100 people. As a result, the larger sample gives shorter likelihood intervals for θ. For instance, the 10% LI is (0.011, 0.063) for the sample of 200, as opposed to (0.006, 0.081) for the sample of 100.

In general, increasing the number of independent observations will produce a more sharply peaked likelihood function and thus shorter likelihood intervals for θ. With more observations there will be a shorter range of plausible values for θ, and so θ can be more precisely estimated. As a rough guide, the length of the 100p% likelihood interval is inversely proportional to the square root of the number of independent observations. Thus about 4 times as many observations are needed to produce an interval only half as wide.

EXAMPLE 9.3.2. In Example 9.2.2, we considered data from two experiments with test tubes containing river water:

Observation 1: y = 28 negative reactions out of n = 40 test tubes each containing v = 10 ml.
Observation 2: y = 37 negative reactions out of n = 40 tubes with v = 1 ml.

Graph the log relative likelihood functions and obtain 50% likelihood intervals for μ based on the two observations taken separately, and taken together.

SOLUTION. The log likelihood function based only on observation 1 is

l₁(μ) = 28 log p₁ + 12 log(1 − p₁);   p₁ = e^(−10μ).

Since p̂₁ = y/n = 0.7 at the maximum (Example 9.1.3), the maximum log likelihood is

l₁(μ̂₁) = 28 log 0.7 + 12 log 0.3 = −24.43.

The log relative likelihood function is then

r₁(μ) = l₁(μ) − l₁(μ̂₁) = −280μ + 12 log(1 − e^(−10μ)) + 24.43.

Similarly, the log relative likelihood function based only on observation 2 is

r₂(μ) = −37μ + 3 log(1 − e^(−μ)) + 10.66.

For both observations together, the log likelihood function is

l(μ) = l₁(μ) + l₂(μ) = −317μ + 12 log(1 − e^(−10μ)) + 3 log(1 − e^(−μ)).

From Example 9.2.2, the overall MLE is μ̂ = 0.04005, and substitution of this value gives l(μ̂) = −35.71. The log RLF based on both observations is thus

r(μ) = l(μ) + 35.71.

The three log RLF's are tabulated in Table 9.3.1 and graphed in Figure 9.3.2, with r(μ) being given by the broken line. From the graphs, the following 50% likelihood intervals may be obtained:

Observation 1 only: 0.025 ≤ μ ≤ 0.049
Observation 2 only: 0.036 ≤ μ ≤ 0.144
Both observations combined: 0.029 ≤ μ ≤ 0.053.

Table 9.3.1. Log Relative Likelihood Functions for Example 9.3.2

μ        r₁(μ)    r₂(μ)    r(μ)
0.005             −5.43
0.01     −6.59    −3.55    −9.51
0.015    −3.42    −2.52    −5.32
0.018    −2.25    −2.09    −3.71
0.02     −1.66    −1.85    −2.89
0.025    −0.67    −1.37    −1.42
0.03     −0.17    −1.02    −0.57
0.04     −0.08    −0.54    −0.00
0.05     −0.76    −0.26    −0.39
0.06     −1.92    −0.09    −1.39
0.07     −3.40    −0.02    −2.80
0.08     −5.12    −0.00    −4.50
0.10              −0.10
0.20              −1.87
0.30              −4.50
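The numerical claims in Examples 9.2.2 and 9.3.2 can be reproduced with a short script. The sketch below (Python; not part of the text, and the grid limits and iteration count are arbitrary choices) solves the combined score equation S(μ) = 0 by bisection, then scans the combined log RLF for the 50% likelihood interval:

```python
import math

def score(mu):
    # Combined score from Example 9.2.2:
    # S(mu) = 120/(1 - p1) + 3/(1 - p2) - 440, p1 = e^(-10 mu), p2 = e^(-mu)
    p1, p2 = math.exp(-10 * mu), math.exp(-mu)
    return 120 / (1 - p1) + 3 / (1 - p2) - 440

# Bisection between the single-sample MLEs 0.0357 and 0.0780;
# S is decreasing, so a sign change brackets the overall MLE.
lo, hi = 0.03, 0.08
for _ in range(60):
    mid = (lo + hi) / 2
    if score(mid) > 0:
        lo = mid
    else:
        hi = mid
mu_hat = (lo + hi) / 2
print(round(mu_hat, 5))  # approximately 0.04005

def r(mu):
    # Combined log RLF from Example 9.3.2: r(mu) = l(mu) + 35.71
    l = (-317 * mu + 12 * math.log(1 - math.exp(-10 * mu))
         + 3 * math.log(1 - math.exp(-mu)))
    return l + 35.71

# 50% likelihood interval: the set of mu with r(mu) >= log(0.5)
cutoff = math.log(0.5)
inside = [i / 1e5 for i in range(1000, 20000) if r(i / 1e5) >= cutoff]
print(round(min(inside), 3), round(max(inside), 3))  # about 0.029 and 0.053
```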
Figure 9.3.2. Log relative likelihood functions from Example 9.3.2, with r(μ) given by the broken line.

A sheet of cells is divided into disjoint regions. The numbers of plaques observed in 20 regions:

2 2 1 4 8 3 2 6 0 1 3 2 0 1 5 2 3 4

(a) Suppose that the virus particles are randomly and uniformly distributed over the cell sheet at the rate of λ per region. Plot the log RLF of λ and find a 10% LI.
(b) Suppose that, for each region, the experimenter recorded only whether the result was positive (at least one plaque) or negative (no plaques). Thus for the 20 regions referred to above, we would know only that there were 18 positives and 2 negatives. Plot the new log RLF for λ on the graph prepared in (a) and find a 10% LI. Has much information about λ been lost in recording only positives and negatives?

9.4 Likelihood for Continuous Models

An observed value x corresponds to a real interval a < X ≤ b, say. The probability of observing the value x is then

P(a < X ≤ b) = ∫ₐᵇ f(x) dx = F(b) − F(a).   (9.4.1)

Suppose that, in n independent repetitions, we observe n values x₁, x₂, ..., xₙ, with xᵢ corresponding to the real interval [aᵢ, bᵢ]. Because repetitions are independent, the probability of the data is obtained as a product:

P(E; θ) = ∏ᵢ P(aᵢ < X ≤ bᵢ) = ∏ᵢ [F(bᵢ) − F(aᵢ)].   (9.4.2)

The likelihood function of θ is proportional to (9.4.2).

If the interval length Δᵢ = bᵢ − aᵢ is small, then F(bᵢ) will be close to F(aᵢ),
26
9. Likelihood Methods
9.4. Likelihood for Cont inuo us Mod
els
and com puta tion of the differenc 27
e F(bi) - F(a 1) may introduce serio
roun doff errors. In this case, we mak us with mean 8 = 30 changes very little
e use of (6.1. 7), and. approximate the over an interval of length 1. Areas
und er the density function between area • the p.d.f. will thus be well app roxi mat under
a, and b, by the area of a rectangle ed by rectangles, and (9.4.4) should
base Ll 1 and height f (x;): with an accurate approximation. We subs give
titut e for f(x 1) in (9.4.4) and take c
obta in = 1 to
(9.4.3)
Som e or all of the factors in (9.4.2)
function which is easier to deal with
are app roxi mat ed in this way to obta
in a
L(8 )= TI -e
n
1~ 1
1 - """
8
9
=& - •exp ( - -1 l:x ) •
computationally and mathematical 8 1
In the most usual case, all of the ly. The log likelihood function is
measurement intervals Ll are small,
the approximation (9.4.3) may be 1 and
applied to all of the terms in (9.4.
gives 2). This
P(E; 8) ~)]
n
f(x 1)Ll 1 = [
}n] Ll 1 JI\
n
f(x;). The score and information function
s are
Since the Ll/s do not dep end upon n
the product of probability densities
8, the likelihood function is proportio
nal to S(8) = - - + -12 Ex · J(8) =-
n 2
, 8 8 " 82 + 83 LX; .
n
We may now solve the maximum
L(B) = c · TI f(x; )
i:::.1
(9.4.4) 1
likelihood equ atio n S(8) = O to obta
in
iJ = - l:x 1 = x. Not e that
where c is any convenient positive n
constant. This is actually an approxim
tion , but it will be an extremely accu a-
It is not necessary to replace ever
rate one whenever the Ll;'s are all sma
y factor in (9.4.2) by the approxim
ll. e
J( ) =
n 2n8
n
- 02 + 7J3 = 02
(9.4.3). For instance, it may hap pen ation >0
that f(x) changes rapidly when xis
in which case the original terms in sma ll, and hence the root obtained is a rela
(9.4.2) could be retained for small
values x 1, tive maximum.
and the app roxi mat ion could be used
for large x;'s. Another situation whe
some of the terms in (9.4.2) should re
be retained will be discussed in the r( e)
section. next
The total of the n = 10 observed lifetimes is Σxᵢ = 288, so that θ̂ = 28.8 and

l(θ) = −10 log θ − 288/θ.

The log relative likelihood function, r(θ) = l(θ) − l(θ̂), is plotted in Figure 9.4.1. The observations indicate a mean lifetime between 20 and 43 days (50% LI). Values of θ less than 16 days or greater than 62 days are implausible (relative likelihood less than 10%).

EXACT SOLUTION BASED ON (9.4.2). For comparison, we shall determine the exact likelihood function based on (9.4.2). The c.d.f. of the exponential distribution with mean θ is

F(x) = 1 − e^{−x/θ} for x > 0.

An observed integer value x > 0 corresponds to a real interval x ± 0.5, with probability

exp(−(x − ½)/θ) − exp(−(x + ½)/θ) = e^{−x/θ}(e^c − e^{−c}), where c = 1/(2θ).

Table 9.4.1. Comparison of Exact and Approximate Likelihoods Based on Ten Observations from an Exponential Distribution

  θ    Exact r(θ)        Approx. r(θ)      Difference
       based on (9.4.2)  based on (9.4.4)  (9.4.2) − (9.4.4)
  5    −30.0745          −30.0906          +0.0161
 10     −8.2184           −8.2221          +0.0037
 12     −5.2429           −5.2453          +0.0024
 15     −2.6754           −2.6767          +0.0013
 20     −0.7530           −0.7536          +0.0006
 25     −0.1048           −0.1050          +0.0002
 40     −0.4853           −0.4850          −0.0003
 60     −2.1401           −2.1397          −0.0004
 80     −3.8169           −3.8165          −0.0004
100     −5.3284           −5.3279          −0.0005
200    −10.8199          −10.8194          −0.0005
300    −14.3946          −14.3941          −0.0005
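The θ = 25 row of Table 9.4.1 can be reproduced in a few lines. For integer-recorded data each exact factor equals e^{−xᵢ/θ}(e^{1/2θ} − e^{−1/2θ}), so the exact grouped likelihood also depends on the data only through n = 10 and Σxᵢ = 288; the grid bounds below are a working assumption (both maxima lie near θ̂ = 28.8):

```python
import math

n, total = 10, 288   # ten observations, Example 9.4.1

def l_approx(theta):
    # log of (9.4.4): product of exponential densities
    return -n * math.log(theta) - total / theta

def l_exact(theta):
    # log of (9.4.2): product of interval probabilities for x +/- 0.5
    return -total / theta + n * math.log(2 * math.sinh(0.5 / theta))

def relative(loglik):
    # normalize by the maximum over a fine grid near theta-hat = 28.8
    l_max = max(loglik(28.0 + 0.001 * k) for k in range(2001))
    return lambda theta: loglik(theta) - l_max

r_exact, r_approx = relative(l_exact), relative(l_approx)
print(round(r_exact(25), 4), round(r_approx(25), 4))  # -0.1048 -0.105, as in Table 9.4.1
```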
13. Suppose that X and Y are continuous variates with joint probability density function

f(x, y) = e^{−θx − y/θ} for x > 0, y > 0.

Find the MLE and RLF of θ on the basis of n independent pairs of observations (xᵢ, yᵢ), i = 1, 2, ..., n.

14. Independent measurements x₁, x₂, ..., xₙ are taken at unit time intervals. For i = 1, 2, ..., θ the measurements come from a standardized normal distribution N(0, 1). A shift in the mean occurs after time θ, and for i = θ + 1, θ + 2, ..., n the measurements come from N(1, 1).
(a) Show that the likelihood function of θ is proportional to

exp{−Σ_{i=1}^θ (xᵢ − ½)}.

(b) Graph the log RLF for θ on the basis of the following set of 20 consecutive measurements:

−1.26 −0.16 −0.64 0.56 −1.82 −0.76 −2.08 −0.58 0.14 0.94
−0.58 0.78 1.80 0.58 0.02 0.86 2.30 1.80 0.84 −0.18

Which values of θ have relative likelihood 10% or more?

15.* The p.d.f. of the double exponential distribution is

f(x) = ½ exp{−|x − θ|} for −∞ < x < ∞,

where −∞ < θ < ∞. Let X₁, X₂, ..., Xₙ be independent observations from this distribution, and let x₍₁₎ ≤ x₍₂₎ ≤ ··· ≤ x₍ₙ₎ denote these n observed values arranged in nondecreasing order.
(a) Show that, if n is odd, then θ̂ = x₍ₘ₎ where n = 2m − 1.
(b) Show that, if n = 2m, then l(θ) is maximized for any value of θ between x₍ₘ₎ and x₍ₘ₊₁₎, and so θ̂ is not unique.

9.5. Censoring in Lifetime Experiments

In many experiments, the quantity of interest is the lifetime (or time to failure) of a specimen; for instance, the lifetime of an electronic component, or the length of time until an aircraft component fails from metal fatigue, or the survival time of a cancer patient after a new treatment.

The probability model generally assumes the lifetime X to be a continuous variate with some particular probability density function f and cumulative distribution function F. For example, if we thought that the chance of failure did not depend upon the age of the specimen, we would assume an exponential distribution. Lifetime distributions for situations in which the risk of failure increases or decreases with age were considered in Section 6.4. The model will usually involve one or more unknown parameters θ which require estimation from the data.

Suppose that n specimens are tested independently. If the experiment is continued sufficiently long for all of the items to have failed, the likelihood function for θ based on the n observed lifetimes x₁, x₂, ..., xₙ can be obtained as in the last section. However, one might wait a very long time indeed for all of the specimens to fail, and it is often desirable to analyze the data before this happens. One or two hardy specimens may tie up a laboratory for months or years without greatly adding to the information about θ, at the same time preventing other experiments from being undertaken. It often makes good practical sense to terminate the experiment before all n items have failed.

If the ith specimen has failed by the time the experiment terminates, we will know its lifetime xᵢ. This will actually correspond to a real interval aᵢ < X ≤ bᵢ, say, with probability

P(aᵢ < X ≤ bᵢ) = F(bᵢ) − F(aᵢ) ≈ f(xᵢ)Δᵢ,

provided that the time interval Δᵢ = bᵢ − aᵢ is small.

If the jth specimen has not failed when the experiment ends, we will not know its lifetime, and the lifetime is said to be censored. The censoring time Tⱼ is the total time for which the specimen had been tested when the experiment ended. For this specimen, we know only that Tⱼ < X < ∞, and the probability of this event is

P(Tⱼ < X < ∞) = F(∞) − F(Tⱼ) = 1 − F(Tⱼ).

The likelihood function of θ will be a product of n factors, one for each specimen tested. Suppose that m specimens fail and n − m do not, so that we have m failure times x₁, x₂, ..., xₘ, and n − m censoring times T₁, T₂, ..., T_{n−m}. Then the likelihood function of θ will be proportional to

[∏_{i=1}^m f(xᵢ)Δᵢ] ∏_{j=1}^{n−m} [1 − F(Tⱼ)].

The Δᵢ's do not depend upon θ and can be absorbed into the proportionality constant to give

L(θ) = c [∏_{i=1}^m f(xᵢ)] ∏_{j=1}^{n−m} [1 − F(Tⱼ)],   (9.5.1)

where c is any convenient positive constant. The maximum likelihood estimate and RLF can now be obtained.

Special Case: Exponential Distribution

If X is assumed to have an exponential distribution with mean θ, then

F(x) = 1 − e^{−x/θ} for x > 0.
In this case, (9.5.1) simplifies to give

L(θ) = c θ^{−m} exp{−(Σxᵢ + ΣTⱼ)/θ}.

Figure 9.5.1. Diagrammatic representation of lifetime data showing two possible censoring times (T = 25 and T = 50).

Figure 9.5.2. Log relative likelihood function for the exponential mean θ under various levels of censoring.
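Taking logs of the simplified likelihood gives l(θ) = −m log θ − s/θ, where s is the total time on test (all failure times plus all censoring times), so the MLE is θ̂ = s/m. A minimal sketch with invented data (not the data behind the figures):

```python
# Hypothetical censored lifetime data, for illustration only.
failures = [12.0, 29.0, 35.0, 41.0]   # m = 4 observed failure times
censored = [50.0, 50.0, 50.0]         # 3 specimens still running at T = 50

m = len(failures)
s = sum(failures) + sum(censored)     # total time on test
theta_hat = s / m                     # MLE of the exponential mean
print(theta_hat)                      # 66.75
```

Note that censored specimens contribute to s but not to m, which is why heavy censoring pushes the upper likelihood bound for θ outward.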
With no censoring, values of θ greater than 62 are implausible (R < 0.1); with censoring at 25 days, θ can be as large as 108 before R decreases to 10%. Censoring thus makes it impossible to place as tight an upper bound on the value of θ, but has little effect on the lower bound. These results suggest that if we were primarily interested in establishing a lower bound for θ, a short experiment with heavy censoring could be quite satisfactory.

Note. In applications, the appropriate analysis will normally be that which corresponds to the pattern of censoring actually used in the experiment. However, in some cases one might also wish to examine the likelihood function that would result from more severe censoring in order to see what effect a few large lifetimes have on the analysis.

PROBLEMS FOR SECTION 9.5

1. Ten electronic components with exponentially distributed lifetimes were tested for predetermined periods of time as shown. Three of the tubes survived their test periods, and the remaining seven failed at the times shown.

Tube number    1   2   3   4   5   6   7   8   9  10
Test period   81  72  70  60  41  31  31  30  29  21
Failure time   2  51  33  27  14  24   4

Find the MLE and the 10% likelihood interval for the exponential mean θ.

2.† n electronic components were simultaneously placed on test. After a time T testing was stopped. It was observed that n − k were still operating and that k had failed, but the times at which the failures had occurred were not known. Assuming that failure times follow an exponential distribution with mean θ, derive the maximum likelihood estimate and the relative likelihood function of θ.

3. A clinical trial was conducted to determine whether a hormone treatment benefits women who were treated previously for breast cancer. A woman entered the clinical trial when she had a recurrence. She was then treated by irradiation, and assigned to either a hormone therapy group or a control group. The observation of interest is the time until a second recurrence, which may be assumed to follow an exponential distribution with mean θ_H (hormone therapy group) or θ_C (control group). Many of the women did not have a second recurrence before the clinical trial was concluded, so that their recurrence times are censored. In the following table, a censoring time "n" means that a woman was observed for time n, and did not have a recurrence, so that her recurrence time is known to exceed n. Plot the log RLFs of θ_H and θ_C on the same graph. Is there any indication that the hormone treatment increases the mean time to recurrence?

              Hormone-treated         Control
Recurrence     2  4  6  9  9  9        1  4  6  7 13 24
times         13 14 18 23 31 32       25 35 35 39
              33 34 43

Censoring     10 14 14 16 17 18        3  4  5  8
times         18 19 20 20 21 21       10 11 13 14 14 15
              23 24 29 29 30 30       17 19 20 22 24 24
              31 31 31 33 35 37       24 25 26 26 26 28
              40 41 42 42 44 46       29 29 32 35 38 39
              48 49 51 53 54 54       40 41 44 45 47 47
              55 56                   47 50 50 51

4.†* The cumulative distribution function for the lifetime of a new type of light bulb is assumed to be

F(x) = 1 − (1 + 2x/θ)e^{−2x/θ} for x > 0.

(a) Find the probability density function, and show that the mean of this distribution is θ.
(b) Forty bulbs were tested and failures occurred at the following times (in hours):

196 327 405 537 541 660 671 710 786
940 954 1004 1004 1006 1202 1459 1474 1484
1602 1662 1666 1711 1784 1796 1799

The remaining bulbs had not failed when testing stopped at 1800 hours. Find the MLE and the 10% likelihood interval for θ.

5.* An arrow is shot at the center of a circular target of radius 1. Let X denote the horizontal displacement and Y the vertical displacement of the point of impact from the center of the target. It is to be assumed that X and Y are independent N(0, σ²) random variables.
(a) Show that the probability of a shot missing the target is

P(X² + Y² ≥ 1) = exp{−1/2σ²}.

(b) Of n independent shots, m hit the target at points (xᵢ, yᵢ) for i = 1, 2, ..., m. The other n − m shots miss the target, and their points of impact are not recorded. Find the MLE of σ.

9.6. Invariance

Suppose that the probability model for an experiment depends upon an unknown parameter θ. The model then consists of a whole family of probability distributions, one for each value of θ in the parameter space Ω. For example, we might assume that the time to failure of an electronic component has an exponential distribution, with probability density function

f(x) = (1/θ)e^{−x/θ} for 0 < x < ∞,   (9.6.1)
where θ is the expected lifetime. For each value of θ belonging to Ω = (0, ∞), we have a theoretical distribution. For instance, the distribution labeled by θ = 1 is

f(x) = e^{−x} for 0 < x < ∞,   (9.6.2)

and the distribution labeled by θ = 2 is

f(x) = ½e^{−x/2} for 0 < x < ∞.   (9.6.3)

A family of distributions can be parametrized (or labeled) in many different ways. For instance, we could equally well write (9.6.1) as

f(x) = λe^{−λx} for 0 < x < ∞,

where λ = 1/θ is the failure rate. Distributions (9.6.2) and (9.6.3) are now labeled by λ = 1 and λ = 0.5, respectively. We have the choice of labeling the family of exponential distributions by values of θ, or by values of λ, or by values of any other one-to-one function of θ. We usually try to select a parametrization so that the parameter represents some interesting characteristic of the distribution, and the mathematical expressions are fairly simple.

When we say that θ = 1 is ten times as likely as θ = 2, we imply that the distribution labeled by θ = 1 is ten times as likely as the distribution labeled by θ = 2. When we say that the maximum likelihood estimate of θ is θ̂ = 1.1, we imply that the distribution labeled by θ = 1.1 is the most likely distribution. Since the method of labeling the distributions is largely arbitrary, it would seem desirable that the plausibilities assigned to the distributions should not depend upon the particular method of labeling which has been selected. In other words, the plausibilities assigned should be invariant under one-to-one transformations of the parameter.

An attractive property of the likelihood methods we have discussed is that they are invariant under one-to-one parameter transformations. For suppose that θ = g(λ) where g is invertible, and let P(E; θ) be the probability of the observed event E. Substituting θ = g(λ) in this expression gives the probability of E as a function of λ. It follows that L(θ), the likelihood function of θ, and L*(λ), the likelihood function of λ, are related as follows:

L*(λ) = L(θ) where θ = g(λ).

Hence both functions have the same maximum value, and

R*(λ) = R(θ) where θ = g(λ).

If λ₁ is any possible value of λ and θ₁ = g(λ₁) is the corresponding value of θ, then λ₁ and θ₁ have the same relative likelihood. Relative likelihoods do not depend upon whether we choose to work with parameter λ or parameter θ. It follows that, if λ̂ is the MLE of λ, then θ̂ = g(λ̂) is the MLE of θ. Similarly, θ₁ belongs to the 100p% likelihood region for θ if and only if θ₁ = g(λ₁) where λ₁ belongs to the 100p% likelihood region for λ.

EXAMPLE 9.6.1. In Example 9.4.1, we supposed that the lifetimes of electronic components were exponentially distributed, with mean lifetime θ. On the basis of ten observations, we found that θ̂ = 28.8. The 50% LI for θ was 20 ≤ θ ≤ 43, and the 10% LI was 16 ≤ θ ≤ 62.

(a) Suppose that we are interested in the failure rate, λ = 1/θ. Then the MLE of λ is

λ̂ = 1/θ̂ = 1/28.8 = 0.0347.

The 50% LI for λ is obtained by noting that 20 ≤ 1/λ ≤ 43 if and only if 1/43 ≤ λ ≤ 1/20. Hence the 50% LI is 0.023 ≤ λ ≤ 0.050. Similarly, the 10% LI is found to be 0.016 ≤ λ ≤ 0.063.

(b) Suppose that we are interested in the proportion p of such components which will last at least 25 days. Then

p = P(X ≥ 25) = ∫_{25}^∞ (1/θ)e^{−x/θ} dx = e^{−25/θ},

which is a one-to-one function of θ. Hence the MLE of p is

p̂ = e^{−25/θ̂} = 0.420.

Since θ = −25/log p, the 50% LI for p is given by

20 ≤ −25/log p ≤ 43,

and solving for p gives 0.287 ≤ p ≤ 0.559. Similarly, the 10% LI is 0.210 ≤ p ≤ 0.668.

PROBLEMS FOR SECTION 9.6

1. Let γ denote the median lifetime of electronic components in Example 9.4.1. Show that γ = θ log 2, and hence obtain the MLE and the 10% likelihood interval for γ.

2. We wish to estimate p, the probability of no diseased trees in a four-acre plot, in Problem 9.1.1. One approach would be to note that 4 out of 10 plots contained no diseased trees, so that p̂ = 0.4 and L(p) = p⁴(1 − p)⁶. A second approach would be to express p as a function of λ and use the invariance property of likelihood. Determine the MLE and the 10% likelihood interval for p by both methods. Under what conditions would the first method be preferable?

3.† The following table summarizes information concerning the lifetimes of one hundred V600 indicator tubes. (Ref.: D. J. Davis, Journal of the American Statistical Association 47 (1952), 113–150.)

Lifetime (hours)     0–100  100–200  200–300  300–400  400–600  600–800  800+
Frequency observed      29       22       12       10       10        9     8

Suppose that the lifetimes follow an exponential distribution with mean θ.
(a) Show that the joint probability distribution of the frequencies is multinomial with probabilities

p₁ = P(0 < T < 100) = 1 − β;  p₂ = P(100 < T < 200) = β(1 − β);  ...;  p₇ = P(T > 800) = β⁸,

where β = e^{−100/θ}.
(b) Show that β̂ can be obtained as a root of a quadratic equation, and deduce the value of θ̂.
(c) Prepare a graph of the log RLF of β. Obtain 10% and 50% likelihood intervals for β, and transform them into likelihood intervals for θ.

4.* The arrivals of westbound vehicles at a fixed point on an east–west road are random events in time. On the average there are µ arrivals per ten second interval. A traffic signal is to be installed a short distance beyond the observation point. It is desired that the signal remain at "STOP" for a time β such that the probability of holding up k or more vehicles is p.
(a) Show that p = P(χ²₍₂ₖ₎ ≤ βµ/5). In particular, if k = 8 and p = 0.05, then β = 39.8/µ.
(b) Assuming that k = 8 and p = 0.05, use the data in Problem 9.3.2 to determine β̂ and the 10% likelihood interval for β.

9.7. Normal Approximations

Let l(θ) denote the log likelihood function of a continuous parameter θ with possible values in Ω. Let S(θ) = l′(θ) and ℐ(θ) = −l″(θ) denote the score and information functions as in Section 9.1. We assume that θ̂ exists and is an interior point of Ω, and that l(θ) has a Taylor's series expansion at θ = θ̂:

l(θ) = l(θ̂) + [(θ − θ̂)/1!] l′(θ̂) + [(θ − θ̂)²/2!] l″(θ̂) + [(θ − θ̂)³/3!] l‴(θ̂) + ···.

Since l′(θ̂) = 0 and r(θ) = l(θ) − l(θ̂), we have

r(θ) = −½(θ − θ̂)² ℐ(θ̂) + [(θ − θ̂)³/3!] l‴(θ̂) + ···.   (9.7.1)

The normal approximation to r(θ) is defined as follows:

r_N(θ) = −½(θ − θ̂)² ℐ(θ̂).   (9.7.2)

If |θ − θ̂| is small, the cubic and higher terms in (9.7.1) are small, and hence r(θ) ≈ r_N(θ).

The effect of increasing the amount of data is to produce a sharply peaked likelihood function and shorter likelihood intervals (see Example 9.3.1). Thus, for a sufficiently large sample, |θ − θ̂| will be small and r_N(θ) will give a good approximation to r(θ) over the entire region of plausible parameter values.

The 100p% likelihood region for θ is the set of θ-values such that R(θ) ≥ p, or equivalently, r(θ) ≥ log p. Taking r_N(θ) ≥ log p gives

θ ∈ θ̂ ± √[(−2 log p)/ℐ(θ̂)]   (9.7.3)

as an approximation to the 100p% likelihood region. This is an interval centered at θ̂ with length

2√[(−2 log p)/ℐ(θ̂)].

The larger the value of ℐ(θ̂), the narrower the approximate interval will be, and hence the more information we have concerning θ. This is the reason that ℐ(θ) is called the "information function".

When the normal approximation is sufficiently accurate, all of the information concerning θ is summarized in θ̂ and ℐ(θ̂). The MLE indicates the most likely parameter value, and ℐ(θ̂) indicates the precision with which θ can be determined. Given these two values, likelihood intervals of any sizes desired can be obtained from (9.7.3).

How large a sample is necessary before the normal approximation r(θ) ≈ r_N(θ) can be used? This depends very much on the situation. In the first example below we find that r(θ) = r_N(θ) exactly for all sample sizes, but in the second example the approximation is not very good even with 500 observations. Thus it is necessary to check the accuracy of the normal approximation in each new situation. We can do this by plotting both r(θ) and r_N(θ) on the same graph and verifying that they agree closely for values of θ inside, say, the 10% likelihood interval. Alternatively, we can check that a graph of the score function S(θ) is well approximated by a straight line over this interval.

EXAMPLE 9.7.1. Let x₁, x₂, ..., xₙ be independent observations from a normal distribution with unknown mean µ and known variance σ². If the measurement intervals are small, the likelihood function of µ is proportional to the product of probability density functions:

L(µ) = c · ∏_{i=1}^n f(xᵢ) = c · ∏_{i=1}^n [1/(σ√(2π))] exp{−(xᵢ − µ)²/2σ²} = exp{−(1/2σ²)Σ(xᵢ − µ)²}

by choice of c. Hence the log likelihood, score, and information functions are

l(µ) = −(1/2σ²)Σ(xᵢ − µ)²;  S(µ) = (1/σ²)Σ(xᵢ − µ);  ℐ(µ) = n/σ².

Solving S(µ) = 0 gives µ̂ = (1/n)Σxᵢ = x̄, and hence the log relative likelihood function is

r(µ) = l(µ) − l(µ̂) = −(1/2σ²)Σ(xᵢ − µ)² + (1/2σ²)Σ(xᵢ − x̄)².
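Because Σ(xᵢ − µ)² − Σ(xᵢ − x̄)² = n(µ − x̄)², this last expression reduces to r(µ) = −(n/2σ²)(µ − x̄)², which is exactly r_N(µ) — the case mentioned earlier where the normal approximation is exact for every sample size. A numerical check (sample and σ² invented for illustration):

```python
# Verify r(mu) = r_N(mu) for the normal model with known variance.
x = [4.2, 5.1, 3.8, 6.0, 5.4]   # invented data
sigma2 = 2.0
n = len(x)
mu_hat = sum(x) / n

def r(mu):
    # r(mu) = l(mu) - l(mu_hat), with l(mu) = -(1/(2 sigma^2)) sum (x_i - mu)^2
    return (sum((xi - mu_hat) ** 2 for xi in x)
            - sum((xi - mu) ** 2 for xi in x)) / (2 * sigma2)

def r_n(mu):
    # r_N(mu) = -(1/2)(mu - mu_hat)^2 I(mu_hat), with I(mu_hat) = n / sigma^2
    return -0.5 * (mu - mu_hat) ** 2 * n / sigma2

print(all(abs(r(mu) - r_n(mu)) < 1e-9 for mu in (3.0, 4.9, 7.5)))  # True
```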
ℐ*(λ) = −dS*/dλ = −S(θ) d²θ/dλ² − (dS/dθ)(dθ/dλ)²
      = −S(θ) d²θ/dλ² + ℐ(θ)(dθ/dλ)².

At the maximum we have S(θ̂) = 0, and therefore

ℐ*(λ̂) = q² ℐ(θ̂),   (9.7.4)

where q is the value of dθ/dλ at the maximum.

6. … where ℐ₁ = ℐ₁(θ̂₁) and ℐ₂ = ℐ₂(θ̂₂). Show that the overall MLE based on both experiments is given approximately by

θ̂ = (ℐ₁θ̂₁ + ℐ₂θ̂₂)/(ℐ₁ + ℐ₂),

and that this value lies between θ̂₁ and θ̂₂.

7. (a) Let X₁, X₂, ..., Xₙ be independent Poisson variates with mean µ, and consider transformations of the form µ = λᵃ where a ≠ 0. Find the log likelihood function of λ, and show that the cubic term in the expansion of this function about λ = λ̂ is zero for a = 3.
(b) Obtain an approximate 10% LI for λ = µ^{1/3}, and transform it to obtain an interval for µ.
(c) Suppose that n = 10 and Σxᵢ = 53. Using a table or graph, investigate the accuracy of the normal approximations to the original log RLF of µ, and to the log RLF of the transformed parameter λ = µ^{1/3}.

9.8. Newton's Method

In this section we describe two applications of Newton's iterative method for solving an equation.

Suppose that we wish to find a root θ̂ of the equation g(θ) = 0. Let θ₀ be a parameter value close to θ̂ and consider the Taylor's series expansion of g(θ) about θ = θ₀:

g(θ) = g(θ₀) + (θ − θ₀)g′(θ₀) + (θ − θ₀)²g″(θ₀)/2! + ···.

For |θ − θ₀| small, the quadratic and higher terms in this expansion will be small, and dropping these terms gives

g(θ) ≈ g(θ₀) + (θ − θ₀)g′(θ₀).

We are approximating g(θ) by a linear function of θ which has the same value and slope as g(θ) at θ = θ₀. Since g(θ̂) = 0, we have

g(θ₀) + (θ̂ − θ₀)g′(θ₀) ≈ 0,

and therefore

θ̂ ≈ θ₀ − g(θ₀)/g′(θ₀).

In Newton's method we take θ₀ to be a preliminary guess at θ̂, and then compute a revised guess θ₁ as follows:

θ₁ = θ₀ − g(θ₀)/g′(θ₀).   (9.8.1)

The revised guess is the point at which the linear approximation (tangent) to g(θ) at θ = θ₀ crosses the θ-axis (see Figure 9.8.1). We now take θ₁ as the new preliminary guess and repeat the calculation to get

θ₂ = θ₁ − g(θ₁)/g′(θ₁).

We continue this procedure until θᵢ₊₁ ≈ θᵢ, in which case g(θᵢ) ≈ 0 and a root has been found.

Solving the Maximum Likelihood Equation

We noted in Section 9.1 that, under suitable conditions, θ̂ is a root of the maximum likelihood equation S(θ) = 0, where S(θ) = l′(θ). Taking g(θ) = S(θ) in the above derivation, we have g′(θ) = S′(θ) = −ℐ(θ) (see Section 9.1). Thus the updating formula (9.8.1) becomes

θᵢ₊₁ = θᵢ + S(θᵢ)/ℐ(θᵢ).   (9.8.2)

Starting with an initial guess θ₀, we repeatedly update to get θ₁, θ₂, θ₃, .... We stop as soon as θᵢ₊₁ ≈ θᵢ, so that S(θᵢ) ≈ 0. To verify that a relative maximum has been found, we check that ℐ(θᵢ) > 0.

Newton's method works well in most statistical applications. If the initial guess is reasonable, the procedure usually produces an accurate approximation to θ̂ after only three or four iterations. The reason for this is that, for moderately large samples, S(θ) is nearly linear in θ (see Section 9.7). If S(θ) is exactly linear in θ, Newton's method produces θ̂ in a single iteration.

If S(θ) = 0 has more than one root, Newton's method will not necessarily converge to the one desired. Difficulties can also arise if the maximum occurs on or near a boundary of the parameter space. It is a good idea to examine a graph of l(θ) before applying Newton's method.

EXAMPLE 9.8.1. Newton's method will be used to obtain the overall MLE µ̂ in Example 9.2.2. The score and information functions are

S(µ) = 120/(1 − p₁) + 3/(1 − p₂) − 440;  ℐ(µ) = 1200p₁/(1 − p₁)² + 3p₂/(1 − p₂)².

i   µᵢ        S(µᵢ)      ℐ(µᵢ)       µᵢ₊₁
0   0.057     −109.66     4518.16    0.03273
1   0.03273     83.07    13902.58    0.03871

Here S(µ₁) = 83.07 and ℐ(µ₁) = 13902.58, and hence we obtain

µ₂ = 0.03273 + 83.07/13902.58 = 0.03871.

Continuing in this fashion, we obtain µ̂ = 0.04005 correct to five decimal places. Note that ℐ(µ̂) > 0, so a relative maximum has been found.

Likelihood Interval Calculation

In previous examples we found likelihood intervals from a graph of the log relative likelihood function r(θ). Alternatively, we can obtain the endpoints of the 100p% likelihood interval by solving the equation g(θ) = 0, where

g(θ) = r(θ) − log p.

Usually numerical methods will be required, and Newton's iterative method can again be used. Since r(θ) = l(θ) − l(θ̂), it follows that g′(θ) = l′(θ) = S(θ), and so (9.8.1) gives

θ₁ = θ₀ − [r(θ₀) − log p]/S(θ₀).   (9.8.3)

Calculation of the right endpoint is illustrated in Figure 9.8.2. We begin with a preliminary estimate θ₀ for the endpoint. The revised estimate θ₁ is the θ-value at which the tangent to g(θ) at θ = θ₀ crosses the θ-axis. The calculation is repeated with the revised value as the new initial estimate. We continue in this way until convergence to the right endpoint is obtained. A second iteration is then carried out for the left endpoint.

Figure 9.8.2. Solution of r(θ) − log p = 0 by Newton's method.

Starting values for Newton's method can be taken from a preliminary graph of r(θ). Alternatively, they can often be obtained from the normal approximation (9.7.2), which gives

θ = θ̂ ± √[(−2 log p)/ℐ(θ̂)]   (9.8.4)

as approximations to the interval endpoints.

EXAMPLE 9.8.2. Newton's method will be used to obtain the 10% likelihood interval for θ in Example 9.4.1. For this example we have

l(θ) = −10 log θ − 288/θ;  S(θ) = −10/θ + 288/θ²;  ℐ(θ) = −10/θ² + 576/θ³.

The MLE is θ̂ = 28.8, and so

l(θ̂) = −43.604;  ℐ(θ̂) = 0.01206.

Thus the log relative likelihood function is

r(θ) = l(θ) − l(θ̂) = −10 log θ − 288/θ + 43.604,

and (9.8.4) gives

θ = 28.8 ± √[(−2 log 0.1)/0.01206] = 28.8 ± 19.5.

Table 9.8.2. Calculation of 10% LI by Newton's Method

i   θᵢ       r(θᵢ)     S(θᵢ)     θᵢ₊₁
0    9.30    −9.664     2.255    12.57
1   12.57    −4.621     1.027    14.83
2   14.83    −2.783     0.635    15.59
3   15.59    −2.336     0.544    15.65
4   15.65    −2.304     0.536    15.65

0   48.30    −1.133    −0.0836   62.29
1   62.29    −2.338    −0.0863   61.88
2   61.88    −2.302    −0.0864   61.88

Table 9.8.2 shows the calculation of the left endpoint with initial estimate 28.8 − 19.5 = 9.3. After five iterations, the left endpoint is found to be 15.65.
This value is correct to two decimal places. Similarly, the initial value for the right endpoint is 28.8 + 19.5 = 48.3, and the final value is 61.88 after three iterations. Thus the 10% likelihood interval is 15.65 ≤ θ ≤ 61.88.

PROBLEMS FOR SECTION 9.8

1.† Use Newton's method to locate the maximum of the following log likelihood function:

l(µ) = 100 log µ − 50µ − 50 log(1 − e^{−µ}) for µ > 0.

2. Suppose that the score function is linear in θ,

S(θ) = aθ + b for −∞ < θ < ∞,

where a, b are constants with a < 0. Show that Newton's method converges to θ̂ in one iteration for any starting value θ₀.

3. Samples of river water are placed in test tubes and incubated. There are nᵢ test tubes each containing volume vᵢ, and yᵢ of these give negative reactions, indicating the absence of coliform bacteria. Altogether, data are available for m different volumes v₁, v₂, ..., vₘ. It is assumed that the bacteria are distributed randomly and uniformly throughout the river water, with λ bacteria per unit volume on average.
(a) Show that the score and information functions for λ are

S(λ) = Σ vᵢ(nᵢ − yᵢ)/(1 − pᵢ) − Σ vᵢnᵢ;  ℐ(λ) = Σ vᵢ²(nᵢ − yᵢ)pᵢ/(1 − pᵢ)²,

where pᵢ = exp(−λvᵢ).
(b) Using Newton's method, evaluate λ̂ for the following data:

Volume vᵢ              8   4   2   1
No. of test tubes nᵢ  10  10  10  10
No. of negatives yᵢ    0   2   3   7

4.† Use Newton's method to obtain the 10% likelihood interval for µ in Problem 9.3.2.

5. The probability that j different species of plant life are found in a randomly chosen plot of specified area is

pⱼ = (1 − e^{−λ})^{j+1}/[(j + 1)λ] for j = 0, 1, 2, ...,

where 0 < λ < ∞. The data obtained from an examination of 200 plots are given in the following frequency table:

No. of species    0    1   2  3  ≥4
Frequency       147   36  13  4   0

(a) Obtain expressions for the log likelihood, score, and information functions of λ.
(b) Evaluate λ̂ by Newton's method.
(c) Calculate estimated expected frequencies. Does the model appear to give a reasonable fit to the data?
(d) Use Newton's method to find the 10% likelihood interval for λ.

REVIEW PROBLEMS FOR CHAPTER 9

1. (a) Red spider mites are distributed randomly and uniformly over the surface area of leaves on an apple tree. A sample of 100 leaves of unit area yielded the following results:

Number of mites      0   1   2   3   4  5  ≥6
Observed frequency  16  31  22  18  10  3   0

Find the MLE and the 10% LI for λ, the expected number of mites per unit area.
(b) The following collapsed table would have been obtained if only the absence or presence of mites on a leaf had been recorded:

Number of mites      0   1 or more
Observed frequency  16   84

Find the MLE and the 10% LI for λ based on the collapsed table. Has much of the information concerning λ been lost?

2.† In a study of the spread of disease among spruce trees planted in a reforestation project, a single line of trees is selected and the number of healthy trees between successive diseased trees is counted.

Number of healthy trees   0   1   2  3  ≥4  Total
Observed frequency       50  23  14  8   5    100

If the disease is non-contagious, the number X of healthy trees between successive diseased trees should have a geometric distribution, with probability function

f(x) = αˣ(1 − α) for x = 0, 1, 2, ...,

where 0 < α < 1.
(a) Assuming the model to be appropriate, calculate the MLE and the 10% LI for α.
(b) Calculate estimated expected frequencies under the model. Does the model give a reasonable fit to the data?

3. A shipment of 20 items contains d defectives, where d is unknown. Six items are selected at random without replacement, and only one of them is defective. Find the maximum likelihood estimate and the 50% likelihood interval for d.

4. An inoculum consists of a suspension of virulent microorganisms. To assess its strength, n animals are given a dose of 1 ml. If the dose contains one or more organisms the inoculated animal will get sick; otherwise it will not.
(a) Find the probability p that an animal does not get sick as a function of λ, the density of organisms per ml of inoculum.
(b) Out of 10 animals inoculated, 6 got sick. Find the MLE and 10% likelihood interval for p.
(c) From the results in (b), obtain the MLE and 10% likelihood interval for λ.
5. … 9 25 6 18 43 17 12 10 18 42

Assuming that the lifetimes follow an exponential distribution, find the maximum likelihood estimate of γ, and determine the relative likelihood of the value γ = 60.

6.†(a) The lifetimes (in hours) of certain radio tubes are independent continuous variates with cumulative distribution function

F(x) = 1 − e^{−x/θ} for x > 0,

where θ > 0. Five tubes were tested simultaneously over a period of 1000 hours. One of them failed in hour 132 and another failed in hour 768. The remaining three tubes survived the test period. Obtain the log likelihood function and MLE of θ based on these results.
(b) Find the maximum likelihood estimate of φ, the fraction of such tubes which fail in the first 100 hours of use.

7. Suppose that events are occurring randomly in time at the constant rate of λ per minute. The numbers of events are observed in n time intervals of varying lengths, with the following results:

Length of time interval   l₁  l₂  ...  lₙ
Number of events          x₁  x₂  ...  xₙ

Derive the likelihood function and maximum likelihood estimate of λ.

8. Let X₁, X₂, ..., Xₙ, Y₁, Y₂, ..., Yₘ be independent variates, the Xᵢ's being N(µ₁, σ²) and the Yⱼ's N(µ₂, σ²). Both µ₁ and µ₂ are known but σ² is not. Find the MLE of σ² based on all n + m measurements.

9.† One of the three children in a family comes home with the measles. Each of the other two children has probability θ of catching measles from him. If neither or both get the measles, the epidemic ends. However, if only one of them gets the disease, the remaining child has another opportunity, with probability θ, of being infected.
(a) Let X denote the total number of children in the family who are infected before the epidemic ends. Show that

P(X = 1) = (1 − θ)²;  P(X = 2) = 2θ(1 − θ)²;  P(X = 3) = θ²(3 − 2θ).

(b) The following data were obtained in a survey of 100 three-child families in which at least one child contracted the measles:

CHAPTER 10
Two-Parameter Likelihoods

In this chapter we consider likelihood methods for parameter estimation when the model involves two unknown parameters, α and β. Section 1 describes the method of maximum likelihood. The relative likelihood function and likelihood regions are considered in Section 2. Section 3 defines the maximum relative likelihood function of β, whose properties are similar to those of a one-parameter relative likelihood function. Normal approximations to the log RLF and maximum log RLF are described in Section 4.

Sections 5 and 6 deal with two applications. The estimation of the relationship between the probability of a response (e.g. death) and the dose of a drug is considered in Section 5. Section 6 describes an example from learning theory, in which the probability of a response is dependent on the results of previous trials.

Section 7 derives some results quoted in Section 1, and describes the use of Newton's method to compute points on a likelihood contour.

Most of the discussion extends readily to the case of three or more unknown parameters. However, difficulties can arise with maximum likelihood estimation and maximum relative likelihood functions when there are many unknown parameters. A brief discussion of the multi-parameter case is given in Section 8.
For a relative maximum, the matrix J(α̂, β̂) must be positive definite; that is,

    J11 > 0;   J22 > 0;   J11 J22 − J12² > 0     (10.1.2)

where Jij = Jij(α̂, β̂). See Section 10.7 for a derivation of this result.
As in the one-parameter case, likelihoods are invariant under one-to-one parameter transformations. Often a parameter transformation will simplify the calculation of the maximum. The inverse transformation can then be applied to obtain the MLE's of the original parameters. It follows from the invariance property that, if γ = g(α, β), then γ̂ = g(α̂, β̂).

Calculation of (α̂, β̂)

Suppose that it is possible to solve the first equation S1(α, β) = 0 to obtain an algebraic expression for α in terms of β. Let α̃(β) denote the solution of this equation. This is the MLE of α given β; that is, α̃(β) is the value of α which maximizes l(α, β) when the value of β is assumed known. Substituting α = α̃(β) into the second equation gives a single equation in β alone, g(β) = S2(α̃(β), β) = 0, whose root is β̂.

EXAMPLE 10.1.1 (continued). Given observed values x1 = 15.6, x2 = 29.3, and x3 = 45.8, what are the maximum likelihood estimates of μ1 and μ2?
The joint p.d.f. of X1, X2, and X3 is the product of three normal p.d.f.'s:

    f(x1, x2, x3) = (1/√(2π))³ e^{−(x1−μ1)²/2} e^{−(x2−μ2)²/2} e^{−(x3−μ1−μ2)²/2}.

If the measurement intervals are small, L(μ1, μ2) is proportional to f, and the log likelihood function is

    l(μ1, μ2) = −½(x1 − μ1)² − ½(x2 − μ2)² − ½(x3 − μ1 − μ2)².

The two components of the score function are

    S1(μ1, μ2) = ∂l/∂μ1 = (x1 − μ1) + (x3 − μ1 − μ2);
    S2(μ1, μ2) = ∂l/∂μ2 = (x2 − μ2) + (x3 − μ1 − μ2).

The second derivatives are

    ∂²l/∂μ1² = ∂²l/∂μ2² = −2;   ∂²l/∂μ1∂μ2 = −1.
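Setting the two score components to zero gives a pair of linear equations in μ1 and μ2. A quick numerical check (my own sketch, not from the text):

```python
# Data from Example 10.1.1: X1 ~ N(mu1, 1), X2 ~ N(mu2, 1), X3 ~ N(mu1 + mu2, 1)
x1, x2, x3 = 15.6, 29.3, 45.8

# S1 = 0 and S2 = 0 reduce to the linear system
#   2*mu1 +   mu2 = x1 + x3
#     mu1 + 2*mu2 = x2 + x3
A, B = x1 + x3, x2 + x3
mu1_hat = (2 * A - B) / 3.0
mu2_hat = (2 * B - A) / 3.0
print(mu1_hat, mu2_hat)  # 15.9 and 29.6, the values quoted later in the text
```

These are exactly the MLEs μ̂1 = 15.9 and μ̂2 = 29.6 used in Examples 10.2.1 and 10.3.1.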
EXAMPLE 10.1.2. The following are the results, in millions of revolutions to failure, of endurance tests for 23 deep-groove ball bearings:

     17.88   28.92   33.00   41.52   42.12   45.60
     48.48   51.84   51.96   54.12   55.56   67.80
     68.64   68.64   68.88   84.12   93.12   98.64
    105.12  105.84  127.92  128.04  173.40

The data are from page 286 of a paper by J. Lieblein and M. Zelen in J. Res. National Bureau of Standards (1956). As a result of testing thousands of ball bearings, it is known that their lifetimes can be modelled by a Weibull distribution.
Solving S1(λ, β) = 0 gives the MLE of λ for given β, λ̃(β) = n/Σxi^β. Substituting this into the second score equation gives the single equation g(β) = 0, where

    g(β) = n/β + Σ log xi − n(Σxi^β log xi)/(Σxi^β),

with derivative

    g′(β) = −n/β² − n Σxi^β(log xi)²/Σxi^β + n(Σxi^β log xi)²/(Σxi^β)².

In this example we have n = 23 and Σ log xi = 95.46. Taking β = 1 as the initial guess, we obtain

    Σxi^β log xi = 7312;   Σxi^β(log xi)² = 32572;
    g(β) = 17.213;   g′(β) = −28.287;
    β_new = β − g(β)/g′(β) = 1.6085.

Repeating the calculations with β = 1.6085 gives β_new = 2.0155. Continuing in this fashion, we find that β̂ = 2.1021, correct to four decimal places. We then obtain

    λ̂ = n/Σxi^β̂ = 9.515 × 10⁻⁵.

Owing to the large amount of arithmetic, use of a computer or programmable calculator is almost essential in this example.
The parameter λ does not represent a quantity of interest, and it is usually preferable to work with parameters (θ, β) where λ = θ^{−β}. By (6.4.6), the c.d.f. of the Weibull distribution is

    F(x) = 1 − exp{−λx^β} = 1 − exp{−(x/θ)^β}.

It follows that

    P(X ≤ θ) = F(θ) = 1 − e^{−1} = 0.63.

Thus the parameter θ is directly interpretable as the 0.63-quantile of the distribution.
Since the transformation from (λ, β) to (θ, β) is one-to-one, the MLE of θ can be computed from λ̂ and β̂. Since θ = λ^{−1/β}, the invariance property gives

    θ̂ = λ̂^{−1/β̂} = 81.88.

PROBLEMS FOR SECTION 10.1

1. Pea plants are classified according to the shape (round or angular) and color (green or yellow) of the peas they produce. According to genetic theory, the four possible plant types, RG, RY, AG, and AY, have probabilities αβ, α(1 − β), (1 − α)β, and (1 − α)(1 − β), respectively, with different plants being independent of one another. The following table shows the observed frequencies of the four types in 500 plants examined:

    Plant type            RG    RY    AG    AY
    Observed frequency   276   104    94    26

Find the MLE's of α and β, and calculate estimated expected frequencies under the model.

2. (a) Let x1, x2, ..., xn be independent observations from N(μ, σ²), where μ and σ are both unknown. Show that

    μ̂ = x̄;   σ̂² = (1/n) Σ(xi − x̄)².

(b) Show that Σ(xi − x̄)² = Σxi² − nx̄² = Σxi² − (Σxi)²/n.

3.† Suppose that Y1, Y2, ..., Yn are independent normal variates with the same mean μ but with different variances; the variance of Yi involves a known positive constant ai (i = 1, 2, ..., n).
(a) Show that μ̃(σ), the MLE of μ given σ, is the same for all possible values of σ.
(b) Derive expressions for μ̂ and σ̂.
(c) Show that J(μ̂, σ̂), the information matrix of (μ, σ) evaluated at the maximum, is positive definite.

4. Suppose that X1, X2, ..., Xn are independent normal variates with the same variance σ² but with different means; the mean of Xi involves μ and a known constant bi (i = 1, 2, ..., n). Find expressions for the MLE's of μ and σ².

5.† Two treatments A and B, with success probabilities α and β, are to be tested. Subjects are treated one at a time, and the result for one subject is known before the next subject is treated. The first subject receives treatment A. Subsequently, a subject receives the same treatment as the preceding subject if a success was observed, and the other treatment if a failure was observed. Testing continues until there have been m failures with each treatment. The following data come from such an experiment with m = 2:

    Subject        1  2  3  4  5  6  7  8  9  10
    Treatment A    S  S  F  —  —  S  S  S  F  —
    Treatment B    —  —  —  S  F  —  —  —  —  F

(a) Show that, if α > β, then the expected number of subjects who receive treatment A is greater than the expected number who receive treatment B.
(b) Find the log likelihood function and MLE's of α and β based on the above data with m = 2. Generalize your results to the case of m failures with each treatment.

6. Suppose that Y1, Y2, and Y3 are independent Poisson variates with means μ1, μ2, and μ1 + μ2, respectively. Derive formulas for the maximum likelihood estimates (μ̂1, μ̂2) based on nonzero observed values y1, y2, y3.

7. The number N of eggs laid by a female robin has a Poisson distribution with mean μ. Each egg has probability θ of hatching, independently of other eggs. Given that n eggs were laid, the number Y which hatch has a binomial (n, θ) distribution.
(a) Find the joint probability function of N and Y.
(b) A biologist records ni, the number of eggs laid, and yi, the number which hatch, for k female robins. Find the log likelihood function and MLE's of μ and θ.

8.† The probability density function for an exponential distribution with guarantee time c is

    f(t) = λ e^{−λ(t−c)}   for t > c,

where λ and c are positive constants. This distribution might be used as a model for the response time T in a computer system where there is a minimum response time c. Suppose that both λ and c are unknown, and that we have available n independent observations t1, t2, ..., tn from this distribution.
(a) Write down the likelihood function of λ and c, paying careful attention to the range of allowable values for λ and c.
(b) Show that, for any given λ, L(λ, c) increases as c increases. Hence find the MLE's of c and λ.

9. Consider the situation described in Problem 9.1.9. It is suggested that, while the geometric distribution applies to most specimens, a fraction 1 − λ of them have flaws and therefore always fracture on the first blow.
(a) Show that the proportions of specimens fracturing after one, two, three, and four or more blows are, respectively,

    1 − λθ,   λθ(1 − θ),   λθ²(1 − θ),   λθ³.

(b) If xi specimens are observed in the ith category (i = 1, 2, 3, 4; Σxi = n), show that

    θ̂ = (x3 + 2x4)/(x2 + 2x3 + 2x4).

(c) Compute estimated expected frequencies for the data given in Problem 9.1.9 and comment on the fit of the model.

10.† n individuals are randomly selected. Blood serum from each is mixed with a certain chemical compound and observed for a time T in order to record the time at which a certain color change occurs. It is observed that m individuals respond at times t1, t2, ..., tm, and that the remaining n − m have shown no response at the end of the observation period T. The situation is thought to be describable by a probability density function λe^{−λt} (t > 0) for a fraction p of the population, and complete immunity to the reaction in the remaining fraction 1 − p. Find the maximum likelihood equations, and indicate how these can be solved for p̂ and λ̂.

11. The lengths of the gestation periods for 1000 females are summarized in the following table:

    Interval (days)   Frequency     Interval (days)   Frequency
    249.5–264.5            6        284.5–289.5           176
    264.5–269.5           27        289.5–294.5           135
    269.5–274.5          107        294.5–299.5            34
    274.5–279.5          198        299.5–304.5             4
    279.5–284.5          312        304.5–309.5             1

Suppose that the length of the gestation period is normally distributed with mean μ and variance σ².

10.2. Relative Likelihood and Contour Maps

The joint relative likelihood function (RLF) of α and β is defined as follows:

    R(α, β) = L(α, β)/L(α̂, β̂).

Note that 0 ≤ R(α, β) ≤ 1, and R(α̂, β̂) = 1. As in the one-parameter case, we use r to denote the natural logarithm of R:

    r(α, β) = log R(α, β) = l(α, β) − l(α̂, β̂).

The relative likelihood of parameter values (α0, β0) is

    R(α0, β0) = [probability of the data when (α, β) = (α0, β0)] / [maximum probability of the data for any α, β].

If R(α0, β0) is near 0, the pair (α0, β0) is implausible because there exist other pairs of parameter values such that the data are much more probable. The joint RLF R(α, β) ranks pairs of parameter values according to their plausibilities in light of the data.
The 100p% likelihood region is the set of parameter values (α, β) such that R(α, β) ≥ p. The curve R(α, β) = p which forms the boundary of this region is called the 100p% likelihood contour.
We may think of R(α, β) as a "mountain" of likelihood sitting on the (α, β) plane (see Figure 10.2.1). Its maximum value 1 occurs at (α, β) = (α̂, β̂). A convenient way to draw R(α, β) in two dimensions is by plotting contours of constant relative likelihood in the (α, β) plane. This produces a contour map similar to those used in geography and meteorology. Usually the contours will form a nested set of closed curves, roughly elliptical in shape.

EXAMPLE 10.2.1. In Example 10.1.1, the log likelihood function of the parameters μ1 and μ2 was found to be

    l(μ1, μ2) = −½(15.6 − μ1)² − ½(29.3 − μ2)² − ½(45.8 − μ1 − μ2)².
Since μ̂1 = 15.9 and μ̂2 = 29.6, its maximum is

    l(μ̂1, μ̂2) = −½[(0.3)² + (0.3)² + (0.3)²] = −0.135.

Hence the log relative likelihood function is

    r(μ1, μ2) = l(μ1, μ2) + 0.135.

The 50%, 10%, and 1% likelihood contours are shown in Figure 10.2.2. For instance, the 10% contour is given by r(μ1, μ2) = log 0.1; that is,

    −½(15.6 − μ1)² − ½(29.3 − μ2)² − ½(45.8 − μ1 − μ2)² + 0.135 = log 0.1.

This is the equation of an ellipse centered at (μ̂1, μ̂2). The 10% likelihood region is the set of all parameter values lying on or inside this ellipse.
The broken lines in Figure 10.2.2 show the outer limits of the 10% likelihood region. For all points (μ1, μ2) in the 10% likelihood region we have 14.15 ≤ μ1 ≤ 17.65 and 27.85 ≤ μ2 ≤ 31.35. These are called 10% maximum likelihood intervals for μ1 and for μ2 (see Section 10.3), and parameter values outside these intervals are implausible.
Note that, although 14.15 and 27.85 are within the 10% intervals for μ1 and μ2, the pair of values (14.15, 27.85) is extremely implausible. It is possible that μ1 might be as small as 14.15, but if it is, then μ2 is likely to be larger than 27.85. The axes of the elliptical contours are not parallel to the coordinate axes, and for this reason we cannot estimate μ1 and μ2 independently of one another. See Section 10.3 for further discussion.

[Figure 10.2.2. Contour map for R(μ1, μ2) in Example 10.2.1. The broken lines show 10% maximum likelihood intervals.]

EXAMPLE 10.2.2. Consider the lifetime data from a Weibull distribution in Example 10.1.2. We shall work with the parameters (θ, β), where θ is the 0.63-quantile and β is the shape parameter. We obtain l(θ, β) by substituting λ = θ^{−β} into the expression for l(λ, β) in Example 10.1.2:

    l(θ, β) = −nβ log θ + n log β + (β − 1)Σ log xi − θ^{−β} Σxi^β.

We showed that β̂ = 2.1021 and θ̂ = 81.88, so the maximum of the log likelihood function is

    l(θ̂, β̂) = −113.691.

The log relative likelihood function of θ and β is then

    r(θ, β) = l(θ, β) + 113.691.

Perhaps the simplest way to construct a contour map is from a tabulation of R(θ, β) = e^{r(θ, β)} over a lattice of (θ, β) values. Table 10.2.1 gives values of R(θ, β) near the maximum, and the curve R(θ, β) = 0.5 is sketched in. This is the innermost curve on the contour map of Figure 10.2.3. The 10% and 1% contours can be found in a similar way from a tabulation of R(θ, β) over a larger region.
The value β = 1 is of special interest, since for β = 1 the Weibull distribution simplifies to an exponential distribution. Note that the line β = 1 lies entirely outside the 1% contour in Figure 10.2.3. If β = 1, there does not exist a value of θ for which R(θ, β) ≥ 0.01; in fact, the maximum of R(θ, 1) is about 0.0004. It is therefore highly unlikely that β = 1, and the simpler exponential distribution model is not suitable for these data. Since β̂ > 1, the ball bearings are deteriorating with age (see Section 6.4).
The broken lines in Figure 10.2.3 show the outer limits of the 10% likelihood region. The 10% maximum likelihood intervals are 64.2 ≤ θ ≤ 103.1 and 1.45 ≤ β ≤ 2.86. Parameter values outside these intervals are implausible.

Table 10.2.1. Relative Likelihood Function R(β, θ)

    θ =       72     75     78     81     84     87     90     93
    β = 2.6   0.019  0.066  0.155  0.261  0.338  0.351  0.306  0.230
        2.5   0.047  0.136  0.275  0.418  0.501  0.495  0.416  0.307
        2.4   0.100  0.245  0.437  0.605  0.679  0.641  0.515  0.383
        2.3   0.184  0.387  0.619  0.791  0.839  0.764  0.613  0.443
        2.2   0.291  0.539  0.783  0.934  0.945  0.835  0.660  0.474
        2.1   0.400  0.661  0.885  0.994  0.967  0.835  0.653  0.469
        2.0   0.477  0.715  0.890  0.952  0.897  0.761  0.591  0.427
        1.9   0.493  0.679  0.796  0.817  0.750  0.626  0.481  0.354
        1.8   0.441  0.565  0.630  0.625  0.563  0.468  0.364  0.267
        1.7   0.341  0.411  0.439  0.424  0.377  0.312  0.244  0.181
[Figure 10.2.3. Contours of constant relative likelihood for the Weibull distribution parameters in Example 10.2.2. The broken lines show the 10% maximum likelihood intervals.]

2. … both samples, and x4 in neither sample. He assumes that each individual independently has a probability φ of survival between sampling periods, and a probability p of being caught in any sample if it is alive at the time of the sample.
(a) Show that the probabilities of the four classes of recapture are α(1 − α), αβ, α², and 1 − α − αβ, respectively, where α = φp and β = φ(1 − p).
(b) Show that

    β̂ = x2(1 − α̂)/[(x2 + x4)α̂];   α̂ = (x1 + 2x3)/(n + x1 + x3).

(c) Suppose that the observed frequencies are 15, 11, 9, and 29, respectively. Find the MLE's of φ and p, and compute estimated expected frequencies.
(d) Find 10% maximum likelihood intervals for φ and p based on the data in (c).

3. Suppose that, in Example 10.1.2, testing had stopped at 75 million revolutions. The last 8 lifetimes would then have been censored. Thus we would have m = 15 failure times x1 = 17.88, ..., x15 = 68.88 and 8 equal censoring times T1 = ··· = T8 = 75. Find the MLE's of θ and β and prepare a contour map similar to Figure 10.2.3. What effect does the censoring have on the estimation of θ and β?

4. Eighteen identical ball bearings were placed in test machines and subjected to a fixed radial load. The following are the numbers of hours the individual bearings endured at 2000 r.p.m.
Here α̃(β) is the MLE of α given β, which may be found by solving the equation S1(α, β) = 0 (see Section 10.1). The natural logarithm of Rmax is

    rmax(β) = r(α̃(β), β) = l(α̃(β), β) − l(α̂, β̂),     (10.3.1)

which is the difference between the restricted maximum of l(α, β) with β fixed and the unrestricted maximum.
The joint RLF can be pictured as a mountain of likelihood sitting in the (α, β) plane (see Figure 10.2.1). The maximum RLF of β is the profile or silhouette of R(α, β) when it is viewed from a distant point on the α-axis. Similarly, Rmax(α) is the silhouette of the likelihood mountain when it is viewed from a distant point on the β-axis.
The properties of Rmax(β) are similar to those of a one-parameter RLF. For instance, we have […]
If Rmax(β0) is near 0, there does not exist a parameter value α0 such that the pair (α0, β0) is plausible, and hence β0 is an implausible value of β. On the other hand, if Rmax(β0) is near 1, then there exists at least one plausible pair of values (α0, β0), and thus β0 is not an implausible value of β.
The 100p% maximum likelihood interval (or region) for β is the set of all β values for which Rmax(β) ≥ p. This interval contains those β values such that, for some α, the pair (α, β) belongs to the 100p% likelihood region. Ten percent maximum likelihood intervals are shown with broken lines in Figures 10.2.2 and 10.2.3.

EXAMPLE 10.3.1. Consider the situation described in Examples 10.1.1 and 10.2.1. The joint log likelihood function is

    l(μ1, μ2) = −½(x1 − μ1)² − ½(x2 − μ2)² − ½(x3 − μ1 − μ2)²

and the MLE's are μ̂1 = 15.9 and μ̂2 = 29.6. From Example 10.1.1, the MLE of μ1 given μ2 is

    μ̃1(μ2) = ½(x1 + x3 − μ2).

The maximum log RLF of μ2 is

    rmax(μ2) = l(μ̃1(μ2), μ2) − l(μ̂1, μ̂2).

After substitution and simplification, we obtain

    rmax(μ2) = −¾(μ2 − μ̂2)².

Taking rmax(μ2) ≥ log 0.1 gives the 10% maximum likelihood interval 27.85 ≤ μ2 ≤ 31.35. Similarly, we find that

    rmax(μ1) = −¾(μ1 − μ̂1)²,

and the 10% interval is 14.15 ≤ μ1 ≤ 17.65. These intervals are shown in Figure 10.2.2.

EXAMPLE 10.3.2. Consider the analysis of failure times from a Weibull distribution, as previously discussed in Examples 10.1.2 and 10.2.2. The log RLF of λ and β is

    r(λ, β) = n log λ + n log β + (β − 1)Σ log xi − λΣxi^β + 113.691.

From Example 10.1.2, the MLE of λ given β is λ̃(β) = n/Σxi^β. Hence the maximum log RLF of β is

    rmax(β) = r(λ̃(β), β)
            = n log(n/Σxi^β) + n log β + (β − 1)Σ log xi − n + 113.691.

This function is plotted with a solid line in Figure 10.3.1. The broken line shows the normal approximation to rmax(β) (see Section 10.4).
The 10% maximum likelihood interval for β is 1.45 ≤ β ≤ 2.86. This can be obtained from Figure 10.3.1, or from Figure 10.2.3, or by the numerical methods described in Section 9.8.
Next we find the maximum log RLF of θ, the 0.63-quantile of the distribution. The joint log RLF of θ and β is

    r(θ, β) = −nβ log θ + n log β + (β − 1)Σ log xi − θ^{−β}Σxi^β + 113.691.

We find β̃(θ), the MLE of β given θ, by solving the equation S2(θ, β) = 0. Then we obtain

    rmax(θ) = r(θ, β̃(θ)).

Numerical methods are required to solve for β̃(θ). For instance, when θ = 80 we find by Newton's method that S2(80, β) = 0 for β = 2.0764. Thus β̃(80) = 2.0764, and …

[Figure 10.3.1. Maximum log RLF for β in the Weibull distribution example. The normal approximation is shown with a broken line.]
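Both profile calculations can be checked numerically. A sketch (mine, not the book's): the quadratic profile of Example 10.3.1 gives the 10% interval for μ2 in closed form, and bisection on rmax(β) recovers the Weibull interval of Example 10.3.2:

```python
import math

# Example 10.3.1: r_max(mu2) = -(3/4)(mu2 - mu2_hat)^2, so the 10% interval is closed-form.
mu2_hat = 29.6
half = math.sqrt(-math.log(0.1) / 0.75)
print(round(mu2_hat - half, 2), round(mu2_hat + half, 2))   # 27.85 and 31.35

# Example 10.3.2: r_max(beta) for the Weibull lifetimes; 10% endpoints by bisection.
x = [17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.48, 51.84, 51.96,
     54.12, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12, 93.12, 98.64,
     105.12, 105.84, 127.92, 128.04, 173.40]
n = len(x)
SL = sum(math.log(xi) for xi in x)

def profile(beta):
    """Log likelihood maximized over lambda, using lambda-tilde(beta) = n / sum(x_i^beta)."""
    sb = sum(xi ** beta for xi in x)
    return n * math.log(n / sb) + n * math.log(beta) + (beta - 1) * SL - n

beta_hat = 2.1021                        # from Example 10.1.2
r_max = lambda b: profile(b) - profile(beta_hat)   # profile(beta_hat) is about -113.691

def root(lo, hi):
    """Bisection for r_max(beta) = log(0.1); assumes a sign change on [lo, hi]."""
    f = lambda b: r_max(b) - math.log(0.1)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

lower, upper = root(1.0, beta_hat), root(beta_hat, 4.0)
print(round(lower, 2), round(upper, 2))   # about 1.45 and 2.86
```
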
[Figure 10.3.2. Maximum log RLF for θ in the Weibull distribution example; the broken line marks the level Rmax(θ) = 0.1.]

By careful design of the experiment and choice of the parameters, it may be possible to arrange that (10.3.2) is true, at least approximately. We can then treat the two-parameter problem as a pair of one-parameter problems, thus simplifying both the analysis and the interpretation. Advance planning to achieve factorization of the likelihood function becomes progressively more important as the number of unknown parameters increases.

EXAMPLE 10.3.1 (continued). The joint log likelihood function of μ1 and μ2 contains a product term μ1μ2, and hence the likelihood function does not factor into a function of μ1 times a function of μ2. As Figure 10.2.2 shows, the range of plausible values for μ1 depends upon the value of μ2. In particular, the most likely value of μ1 is …
2.† … Survival times are modeled as independent exponential variates with mean θ for treatment A and mean λθ for treatment B.
(a) Derive expressions for λ̂ and θ̂.
(b) Show that the maximum RLF of λ is

    Rmax(λ) = 2^{2n} (λ/λ̂)^n (1 + λ/λ̂)^{−2n}.

(c) The following are the observed survival times for 12 subjects:

    Treatment A:   9   186   25    6   44   115
    Treatment B:   1    18    6   25   14    45

Obtain the 10% maximum likelihood interval for λ. Do these data clearly demonstrate the superiority of the first treatment?

4. Let x1, x2, ..., xn be independent observations from N(μ, σ²) where both μ and σ are unknown.
(a) Show that the maximum relative likelihood function of μ is

    Rmax(μ) = [Σ(xi − μ)²/Σ(xi − x̄)²]^{−n/2} = [1 + (μ̂ − μ)²/σ̂²]^{−n/2}

for −∞ < μ < ∞. Hence show that the 100p% maximum likelihood interval for μ has the form

    μ ∈ μ̂ ± cσ̂,

where c is a function of p and n.
(b) Show that the maximum relative likelihood function of σ is

    Rmax(σ) = (σ̂/σ)^n exp{n/2 − nσ̂²/(2σ²)}

for σ > 0.

5. Find the maximum RLF of μ2 in Problem 10.1.6.

6.† Find the maximum RLF's of λ and c in Problem 10.1.8.

7. Find the maximum relative likelihood function for θ in Problem 10.1.9. Show that this is the same as the relative likelihood function for θ based on the conditional distribution of X2, X3, and X4 given X1.

8.† Find the maximum relative likelihood function for λ in Problem 10.1.10.

9. Show that a one-to-one parameter transformation from (α, β) to (γ, β) does not affect the maximum RLF of β. The maximum RLF of β can be found by maximizing the joint RLF of α and β over α, or by maximizing the joint RLF of γ and β over γ.

10.4. Normal Approximations

In Section 9.7 we derived the normal approximation

    r(θ) ≈ −½(θ − θ̂)² J(θ̂)

by ignoring cubic and higher terms in the Taylor's series expansion of l(θ) about θ = θ̂. A similar derivation in the two-parameter case gives

    r(α, β) ≈ −½(α − α̂)²J11 − ½(β − β̂)²J22 − (α − α̂)(β − β̂)J12,     (10.4.1)

where Jij = Jij(α̂, β̂) as in Section 10.1. If we take

    θ = [α]        J(θ̂) = J(α̂, β̂) = [J11  J12]
        [β],                          [J12  J22],

the approximation may be written

    r(θ) ≈ −½(θ − θ̂)′ J(θ̂) (θ − θ̂),

which shows its similarity to the one-parameter result.
When (10.4.1) applies, the likelihood contours are close to ellipses centered at (α̂, β̂). As in the one-parameter case, the normal approximation is not invariant, and a one-to-one transformation from (α, β) to new parameters (μ, ν) may substantially improve its accuracy. The information matrix for the new parameters is

    J*(μ̂, ν̂) = Q̂′ J(α̂, β̂) Q̂.     (10.4.2)

Here Q is the two-by-two matrix of derivatives of the old parameters with respect to the new:

    Q = [∂α/∂μ  ∂α/∂ν]
        [∂β/∂μ  ∂β/∂ν].

We evaluate Q at the MLE to obtain Q̂. The proof of (10.4.2) is similar to the proof of the one-parameter result (9.7.4).
Differentiating (10.4.1) with respect to α gives an approximation to S1(α, β),

    S1(α, β) ≈ −(α − α̂)J11 − (β − β̂)J12,

which is linear in α and β. Setting this equal to zero and solving for α gives

    α̃(β) ≈ α̂ − (β − β̂)J12/J11.     (10.4.3)

If we now substitute for α in (10.4.1) and simplify, we obtain the following normal approximation to rmax(β):

    rmax(β) ≈ −½(β − β̂)²[J22 − J12²/J11].     (10.4.4)

This has the same form as the normal approximation which we derived in Section 9.7 for the one-parameter case. The quantity in square brackets is positive by (10.1.2).
The inverse of the information matrix is

    J⁻¹ = [ J22  −J12] / (J11 J22 − J12²),
          [−J12   J11]
and the (2, 2) element of the inverse is

    J²² = J11/(J11 J22 − J12²) = (J22 − J12²/J11)^{−1}.

Thus the normal approximation (10.4.4) can also be written as

    rmax(β) ≈ −½(β − β̂)²/J²².     (10.4.5)

EXAMPLE 10.4.1. Consider the normal distribution example of the preceding three sections. The log likelihood function of μ1 and μ2 is

    l(μ1, μ2) = −½(x1 − μ1)² − ½(x2 − μ2)² − ½(x3 − μ1 − μ2)²,

which is a second-degree polynomial in μ1 and μ2. As a result, the approximations (10.4.1), (10.4.3), and (10.4.4) hold exactly. Since J11 = J22 = 2 and J12 = 1, we have

    r(μ1, μ2) = −(μ1 − μ̂1)² − (μ2 − μ̂2)² − (μ1 − μ̂1)(μ2 − μ̂2).

The contours of constant relative likelihood are ellipses as shown in Figure 10.2.2. Also, since

    J11 − J12²/J22 = J22 − J12²/J11 = 3/2,

the maximum log RLF's are

    rmax(μ1) = −¾(μ1 − μ̂1)²;   rmax(μ2) = −¾(μ2 − μ̂2)².

To apply these approximations to the Weibull example of Sections 10.1–10.3, we must first evaluate J(θ̂, β̂). An expression for l(θ, β) is given in Example 10.2.2. We find the second derivatives of l(θ, β) and change their signs to get J(θ, β). Substituting θ = θ̂ and β = β̂ then gives

    J(θ̂, β̂) = [  0.01516   −0.13046 ]
               [ −0.13046    10.379  ].

From this we compute

    J11 − J12²/J22 = 0.01352;   J22 − J12²/J11 = 9.256;

and then (10.4.4) gives

    rmax(θ) ≈ −½(θ − 81.88)²(0.01352);
    rmax(β) ≈ −½(β − 2.1021)²(9.256).

These functions are plotted as broken curves in Figures 10.3.1 and 10.3.2. The agreement is not too bad. The normal approximations give 10% maximum likelihood intervals 63.4 ≤ θ ≤ 100.3 and 1.40 ≤ β ≤ 2.81, while the exact results are 64.2 ≤ θ ≤ 103.1 and 1.45 ≤ β ≤ 2.86.
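The information matrix and the approximate intervals just quoted can be recomputed from the raw data. In the sketch below (mine, not the book's), the second derivatives of l(θ, β) are estimated by central finite differences rather than evaluated analytically:

```python
import math

x = [17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.48, 51.84, 51.96,
     54.12, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12, 93.12, 98.64,
     105.12, 105.84, 127.92, 128.04, 173.40]
n = len(x)
SL = sum(math.log(xi) for xi in x)

def l(theta, beta):
    """l(theta, beta) from Example 10.2.2."""
    return (-n * beta * math.log(theta) + n * math.log(beta)
            + (beta - 1) * SL - theta ** (-beta) * sum(xi ** beta for xi in x))

th, be = 81.88, 2.1021                  # MLEs from Example 10.1.2

# J = negative Hessian of l at the MLE, by central differences
h, k = 1e-3, 1e-4
J11 = -(l(th + h, be) - 2 * l(th, be) + l(th - h, be)) / h**2
J22 = -(l(th, be + k) - 2 * l(th, be) + l(th, be - k)) / k**2
J12 = -(l(th + h, be + k) - l(th + h, be - k)
        - l(th - h, be + k) + l(th - h, be - k)) / (4 * h * k)
print(J11, J12, J22)                    # about 0.01516, -0.13046, 10.379

# Normal-approximation 10% intervals from (10.4.4)
ct = J11 - J12**2 / J22                 # about 0.01352
cb = J22 - J12**2 / J11                 # about 9.256
wt = math.sqrt(-2 * math.log(0.1) / ct)
wb = math.sqrt(-2 * math.log(0.1) / cb)
print(round(th - wt, 1), round(th + wt, 1))   # about 63.4 and 100.3
print(round(be - wb, 2), round(be + wb, 2))   # about 1.40 and 2.81
```

The step sizes h and k are ad hoc choices; they only need to be small enough that the quadratic truncation error is negligible but large enough to avoid floating-point cancellation.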
EXAMPLE 10.4.1 (continued). … The derivatives are not functions of the parameters because the transformation is linear. Thus Q̂ = Q, and (10.4.2) gives

    J* = Q′JQ = [ ½  −½ ][ 2  1 ][ ½  ½ ]  =  [ ½   0  ]
                [ ½   ½ ][ 1  2 ][−½  ½ ]     [ 0  3/2 ].

The log likelihood function is a second-degree polynomial in θ1 and θ2, and the approximations hold exactly. From (10.4.1) and (10.4.4) we obtain

    r(θ1, θ2) = −¼(θ1 − θ̂1)² − ¾(θ2 − θ̂2)²; …

2.† (a) Evaluate J(μ̂, λ̂) and find an approximate 10% maximum likelihood interval for λ in Problem 10.3.2(b).
(b) Transform parameters from (μ, λ) to (α, β) where α = log μ and β = log λ. Calculate the information matrix J*(α̂, β̂). Obtain an approximate 10% maximum likelihood interval for β and transform it to give an interval for λ.
(c) Compare the results of (a) and (b) with the exact 10% interval. Does the logarithmic transformation seem to improve the normal approximation?

3. Prove result (10.4.2) for transforming information matrices.

4. Consider a one-to-one parameter transformation from (α, β) with information matrix J = J(α̂, β̂) to (γ, β) with information matrix J* = J*(γ̂, β̂).
(a) Show that …
(a) Show that
This is called the logistic dose-response model, and log is called the whereµ;= niPi· Differentiating again gives
1-p
log-odds or logistic transform of p.
o
Both the logistic and the probit models are commonly used in analyzing or:x
data from dose-response tests. The two models lead to quite similar results, where vi= nip;(l - p;). Similarly, we obtain
and a very large amount of clata would be needed to show that one was better
than the other. The calculations are a bit simpler for the logistic model, and
for this reason we shall use it in what follows. The MLE's are found by solving the simultaneous equations
Maximum Likelihood Estimates In general, these equations must be solved numerically, and the
Newton-Raphson method (I 0.1.3) can be used.
Suppose that ni subjects receive dose di, and that Y; of these respond
(i = 1, 2, ... , k). Then Y; has a binomial distribution with parameters ni and EXAMPLE 10.5.1. k = 5 different doses of an insecticide were applied under
P;, where standardized conditions to samples of an insect species. The results are shown
Pi= 1-(1 +ea+pd,)-1. in Table 10.5.1. We assume that p, the probability that an insect dies, is
related to the dose via the logistic model (10.5.1). We wish to find the
If different subjects are used for different doses, the Ji's will be independent,
and their joint probability function is maximum likelihood estimates(&, PJ.
Based only on the data for dose di, we would estimate Pi by Yilni and the
log-odds by
y.jn
log ' ' =log
The likelihood and log likelihood functions are 1 - y;/ni n; - Yi
These values are given in the last row of the table, and are plotted versus the
L(a, /3) = f1k pf'(! - p;)n,-y, = f1k [ _Ei_ ]~ (1 - Pi)"'; dose in Figure I 0.5.2. A straight line has been drawn in by eye. If the logistic
i=l i=l 1-pi
model holds, then (10.5.2) implies that the five points should be scattered
about a straight line. The agreement with a straight line is very good in this
example.
From Figure I 0.5.2, we see that ex::::; - 5 and f3 ::::; 3, and we use these as
k
starting values for the Newton-Raphson method. Taking a= - 5 and /3 = 3,
= I
i= 1
[yi(r:x + {Jd;) + ni log (1 - pi)].
we compute pi, µi = ni pi, and vi= ni Pi(l - p;) for i = 1, 2, ... , 5. Using these
values and the d;'s from Table 10.5.1, we then get
Note that
S 1 =11.195; S 2 = 19.031;
§ 11 = 40.11; § 12 = 66.85;
op.
- ' =(1 +ea+pd,)-2e•+Pd•d-=p-(1-p-)d.
0{3 I I I 1•
Table 10.5.1. Data from a Dose-Response Experiment
Using these results, one can easily show that Concentration (mg/I) 2.6 3.8 5.1 7.7 10.2
Log concentration d; 0.96 1.34 1.63 2.04 2.32
ol 49 50
S1 (IX, {J) = oa = L(Yi - µ;); Number of insects ni
Number killed y,
50
6
48
16
46
24 42 44
F ract1on killed 0.12 0.33 0.52 0.86 0.88
ol -1.99 -0.69 0.09 1.79 1.99
S2(a, {J) = o{J = L(Yi - µ;)di
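These quantities are easy to reproduce. A sketch (mine, not the book's) that evaluates the score and information at the starting values (α, β) = (−5, 3) and takes one Newton–Raphson step; it uses log concentrations at full precision rather than the rounded di of Table 10.5.1, which matches the quoted figures to within rounding:

```python
import math

# Data from Table 10.5.1
conc = [2.6, 3.8, 5.1, 7.7, 10.2]
d = [math.log(c) for c in conc]          # log concentrations
nn = [50, 48, 46, 49, 50]
y = [6, 16, 24, 42, 44]

a, b = -5.0, 3.0                         # starting values read from Figure 10.5.2
S1 = S2 = J11 = J12 = J22 = 0.0
for di, ni, yi in zip(d, nn, y):
    p = 1.0 / (1.0 + math.exp(-(a + b * di)))   # logistic model (10.5.1)
    v = ni * p * (1.0 - p)
    S1 += yi - ni * p
    S2 += (yi - ni * p) * di
    J11 += v
    J12 += v * di
    J22 += v * di * di

print(S1, S2, J11, J12)   # close to 11.195, 19.031, 40.11, 66.85

# One Newton-Raphson step: (a, b) increases by J^{-1} S
det = J11 * J22 - J12**2
a_new = a + (J22 * S1 - J12 * S2) / det
b_new = b + (J11 * S2 - J12 * S1) / det
print(a_new, b_new)
```

Iterating this step until S1 and S2 vanish gives the MLEs (α̂, β̂).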
Table 10.5.2. Observed Frequencies of Insects Killed and Surviving, and Expected Frequencies Under a Logistic Model

    Concentration   Number killed         Number surviving       Total
                    Observed (expected)   Observed (expected)
     2.6             6 (6.39)             44 (43.61)              50
     3.8            16 (15.47)            32 (32.53)              48
     5.1            24 (24.94)            22 (21.06)              46
     7.7            42 (39.68)             7 (9.32)               49
    10.2            44 (45.53)             6 (4.47)               50

… where μi = ni pi and vi = ni pi(1 − pi) as before. For any given value of γ, we can solve the equation g(β) = 0 by Newton's method to obtain β̃(γ), and then calculate

    rmax(γ) = l(−γβ̃(γ), β̃(γ)) + 119.894.

[Figure 10.5.4. Maximum log RLF (solid curve) and normal approximation (broken curve) for the ED50.]
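The parameter γ here is evidently the ED50 on the log-dose scale: the dose at which the kill probability is ½, so that α + βγ = 0 and α = −γβ. A small sketch (mine, using the Table 10.5.1 data) shows how the point estimate γ̂ follows from (α̂, β̂) by invariance:

```python
import math

conc = [2.6, 3.8, 5.1, 7.7, 10.2]
d = [math.log(c) for c in conc]
nn = [50, 48, 46, 49, 50]
y = [6, 16, 24, 42, 44]

def score_info(a, b):
    S1 = S2 = J11 = J12 = J22 = 0.0
    for di, ni, yi in zip(d, nn, y):
        p = 1.0 / (1.0 + math.exp(-(a + b * di)))
        v = ni * p * (1.0 - p)
        S1 += yi - ni * p
        S2 += (yi - ni * p) * di
        J11 += v
        J12 += v * di
        J22 += v * di * di
    return S1, S2, J11, J12, J22

# Newton-Raphson iteration as in Example 10.5.1
a, b = -5.0, 3.0
for _ in range(20):
    S1, S2, J11, J12, J22 = score_info(a, b)
    det = J11 * J22 - J12**2
    a += (J22 * S1 - J12 * S2) / det
    b += (J11 * S2 - J12 * S1) / det

gamma = -a / b                       # ED50 on the log-concentration scale, by invariance
ed50_conc = math.exp(gamma)          # back on the mg/l scale
p_at_gamma = 1.0 / (1.0 + math.exp(-(a + b * gamma)))
print(gamma, ed50_conc, p_at_gamma)  # kill probability at gamma is exactly 1/2
```

The interval estimate, by contrast, requires the profile calculation rmax(γ) described above.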
    Concentration   0.711  0.852  0.959  1.066  1.202  1.309
    Number dead       15     24     26     24     29     29
    Number alive      35     25     24     26     21     20

Assume that the log-odds in favor of death is a linear function of the dose d:

    log p/(1 − p) = α + βd

where d is the log concentration.
(a) Prepare a graph to check whether the model seems reasonable, and from it obtain initial estimates of α and β.
(b) Obtain the maximum likelihood estimates α̂, β̂ by the Newton–Raphson method.
(c) Estimate the concentration of the insecticide which is required to obtain a 50% kill probability.
(d) Find the 10% maximum likelihood intervals for α and β.

2.† The probability of a normal specimen after radiation dose d is assumed to be p = e^(α+βd), where α and β are constants. The following table gives the number of normal specimens and the total number tested at each of five doses:

    d = Radiation dose       0     1     2     3     4
    y = Number of normals  4357  3741  3373  2554  1914
    n = Number tested      4358  3852  3605  2813  2206

(a) Plot log(y/n) against d to check whether the model seems reasonable, and obtain rough estimates of α and β from the graph.
(b) Find the maximum likelihood equations and solve numerically for α̂ and β̂ using the Newton–Raphson method or otherwise. Plot the 10% likelihood contour, and obtain 10% maximum likelihood intervals for β and e^α.

3.† The number of particles emitted in unit time from a radioactive source has a Poisson distribution. The strength of the source is decaying exponentially with time, and the mean of the Poisson distribution on the jth day is µ_j = αβ^j (j = 0, 1, ..., n). Independent counts x_0, x_1, ..., x_n of the number of emissions in unit time are obtained on these n + 1 days. Find the maximum likelihood equations and indicate how these may be solved for α̂ and β̂.

4. Observations y_1, y_2, ..., y_n are taken on the number of plankton in unit-volume samples of seawater at temperatures x_1, x_2, ..., x_n. The y_i's are modeled as observed values of independent Poisson variates Y_1, Y_2, ..., Y_n, where

    µ_i = E(Y_i) = exp(α + βx_i).

(a) Show that the log likelihood function is

    l(α, β) = Σ(y_i log µ_i − µ_i).

(b) Find the score vector and information matrix for α and β, and describe how to obtain α̂ and β̂ by the Newton–Raphson method.
(c) Show that β̂ is a root of the equation

    (Σx_i y_i)(Σe^(β̂x_i)) − (Σy_i)(Σx_i e^(β̂x_i)) = 0,

and describe how β̂ can be found by Newton's method.
(d) Derive the maximum RLF of β.

5. The survival time Y_i of an individual with tumor size x_i has an exponential distribution with mean

    θ_i = E(Y_i) = exp(α + βx_i)

where α and β are unknown parameters. Suppose that n survival times y_1, y_2, ..., y_n with corresponding tumor sizes x_1, x_2, ..., x_n are observed.
(a) Show that the score vector and information matrix of α and β are as follows:

    S = [ Σ(r_i − 1)      ]        J = [ Σr_i      Σr_i x_i  ]
        [ Σx_i(r_i − 1)   ],           [ Σr_i x_i  Σr_i x_i² ],

where r_i = y_i/θ_i.
(b) Show that the determinant of J is Σr_i times Σr_i(x_i − x̄)², where x̄ = (Σr_i x_i)/Σr_i. Hence verify that condition (10.1.2) is satisfied.
(c) Derive an expression for the MLE of α when β is given, and describe a numerical procedure for evaluating β̂.

10.6. An Example from Learning Theory

In their book Stochastic Models for Learning (Wiley, 1955), R.R. Bush and F. Mosteller develop general probabilistic learning models and apply them to a variety of learning experiments. One of the most interesting applications is to the Solomon–Wynne experiment (R.L. Solomon and L.C. Wynne, Traumatic Avoidance Learning: Acquisition in Normal Dogs, Psych. Monog. 67 (1953), No. 4). We shall first describe this experiment, then develop the model, and finally use likelihood methods to estimate the two parameters of the model.

In the Solomon–Wynne experiment, 30 dogs learned to avoid an intense electric shock by jumping a barrier. The lights were turned out in the dog's compartment and the barrier was raised. Ten seconds later, an intense shock was applied through the floor of the compartment to the dog's feet, and was left on until the dog escaped over the barrier. The dog could avoid the shock only by jumping the barrier during the ten-second interval after the lights were turned out and before the shock was administered. Each trial could thus be classified as a shock trial, or as an avoidance trial. The experimental record of 30 dogs, each of which had 25 trials, is shown in Table 10.6.1, with 0
Table 10.6.1. Data from 25 Trials with 30 Dogs in the Solomon–Wynne Experiment
(0 = Shock trial; 1 = Avoidance trial)

[Each row of the table records, for one dog, the outcome (0 or 1) of each of its 25 trials, arranged in the trial groups 0–4, 5–9, 10–14, 15–19, and 20–24; the rows are labelled by the dog numbers 13, 16, 17, ..., 71.]

denoting a shock trial and 1 an avoidance trial. (The dogs are numbered 13, 16, etc. for identification purposes, and no use is made of these numbers in the analysis.) Initially, all of the dogs received shocks on almost every trial, but by trial 20, all except dog number 32 had learned to avoid the shock by jumping the barrier.

The Model

Consider the sequence of trials for one dog. As in Table 10.6.1 we take y_j = 1 if the dog avoids shock at trial j, and y_j = 0 otherwise (j = 0, 1, ..., 24). Because of learning at trial j − 1, the probability that the dog receives shock should be smaller at trial j than at trial j − 1. The amount by which the probability decreases may well depend upon whether there was shock or avoidance at trial j − 1. We wish to compare the effectiveness of shock trials and avoidance trials in teaching the dog to avoid future shocks.

Let φ_j be the probability that the dog receives a shock at trial j, given its past history in trials 0 through j − 1. Let x_j be the number of times that the dog has avoided shock in trials 0 through j − 1, so that

    x_j = y_0 + y_1 + ... + y_(j−1).

The number of previous shock trials is then j − x_j. Since all dogs were given a shock at trial 0, we assume that φ_0 = 1. For j > 0 we assume that

    φ_j = A^(x_j) B^(j−x_j)                                        (10.6.1)

where 0 ≤ A ≤ 1 and 0 ≤ B ≤ 1. We call A the avoidance parameter and B the shock parameter. The model can also be written

    log φ_j = x_j α + (j − x_j)β                                   (10.6.2)

where α = log A and β = log B.

It is easy to show that

    φ_j/φ_(j−1) = A  if y_(j−1) = 1;    φ_j/φ_(j−1) = B  if y_(j−1) = 0.

The probability of a shock decreases by the factor A if there was an avoidance at trial j − 1, or by the factor B if there was a shock at trial j − 1. If A is small, then the effect of an avoidance trial is to greatly reduce the chance of future shock. If A = 1, nothing is learned from an avoidance trial. If A < B, then more is learned from an avoidance trial than from a shock trial.

The Log Likelihood Function

The joint probability function of Y_0, Y_1, ..., Y_24 can be written as a product of 25 factors:

    f(y_0, y_1, ..., y_24) = f(y_0) · f(y_1|y_0) · f(y_2|y_0, y_1) · ... · f(y_24|y_0, ..., y_23).

Given the results of trials 0 through j − 1, the probability function of Y_j is

    f(y_j|y_0, ..., y_(j−1)) = φ_j^(1−y_j) (1 − φ_j)^(y_j),

which equals φ_j for y_j = 0, and 1 − φ_j for y_j = 1.
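The model (10.6.1) and the per-dog likelihood factorization can be sketched in code. The following Python function is our illustration, not part of the text; the trial record passed to it is any 0/1 sequence of the kind shown in Table 10.6.1.

```python
import math

def dog_log_likelihood(y, alpha, beta):
    """Log likelihood of one dog's trial record y (y[j] = 1 for avoidance,
    0 for shock) under log phi_j = x_j*alpha + (j - x_j)*beta, where x_j
    counts prior avoidances.  Trial 0 has phi_0 = 1 and contributes 0."""
    loglik = 0.0
    x = 0                                # avoidances in trials 0..j-1
    for j, yj in enumerate(y):
        if j > 0:
            log_phi = x * alpha + (j - x) * beta
            phi = math.exp(log_phi)      # P(shock at trial j)
            loglik += (1 - yj) * log_phi + yj * math.log(1 - phi)
        x += yj
    return loglik
```

For a dog that is shocked on every trial (all y_j = 0), the log likelihood reduces to β(1 + 2 + ... + 24) = 300β, which provides a quick check of the bookkeeping.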
Since f(y_0) = 1 for y_0 = 0, this term makes no contribution, and therefore

    f(y_0, y_1, ..., y_24) = ∏_(j=1)^24 φ_j^(1−y_j) (1 − φ_j)^(y_j).

The log likelihood function based on the data from a single dog is thus

    Σ_(j=1)^24 [(1 − y_j) log φ_j + y_j log(1 − φ_j)].

Now we assume that results for different dogs are independent, and that α and β have the same values for all 30 dogs. Then the log likelihood function based on all of the data is

    l(α, β) = Σ_(i=1)^30 Σ_(j=1)^24 [(1 − y_ij) log φ_ij + y_ij log(1 − φ_ij)].

Differentiating l(α, β) gives the components of the score vector; for instance,

    S_1(α, β) = T_1 − ΣΣ y_ij x_ij φ_ij/(1 − φ_ij),

where T_1 = ΣΣ(1 − y_ij)x_ij. We can compute the MLE's and plot contours, thus obtaining an exact summary of the information concerning the parameters.

A preliminary tabulation of l(α, β) indicates that the maximum occurs near (α, β) = (−0.1, −0.2). Taking these as initial values, the MLE's can be found by the Newton–Raphson method (10.1.3). After three iterations we obtain

    α̂ = −0.24091;  β̂ = −0.07872
    l(α̂, β̂) = −273.987

    J(α̂, β̂) = [ 2451   2784  ]
               [ 2784   10277 ].

The MLE's of the original parameters A, B are

    Â = e^α̂ = 0.786;  B̂ = e^β̂ = 0.924.
trial as in three shock trials. Note also that the experiment determines the value of B more precisely than the value of A, and that the parameters A, B cannot be estimated independently of one another.

The contours in Figure 10.6.1 are nearly elliptical, and therefore we would expect the normal approximations of Section 10.4 to be fairly accurate. By (10.4.4), we have

    r_max(A) ≈ −½(A − Â)²(3969 − 3833²/12029);
    r_max(B) ≈ −½(B − B̂)²(12029 − 3833²/3969).

From these, we obtain approximate 10% maximum likelihood intervals 0.745 ≤ A ≤ 0.827 and 0.901 ≤ B ≤ 0.948. The exact results from Figure 10.6.1 are 0.744 ≤ A ≤ 0.826 and 0.899 ≤ B ≤ 0.946.

*10.7. Some Derivations

In this section, derivations will be given for some results quoted earlier in the chapter. First we shall derive the Newton–Raphson procedure for evaluating the MLE (α̂, β̂) in the two-parameter case. Then conditions (10.1.1) and (10.1.2) which must be satisfied at a relative maximum will be established. Finally, the use of Newton's method to solve for points on a likelihood contour will be described.

Newton's method for solving the equation g(θ) = 0 was derived at the beginning of Section 9.8. We considered a linear approximation to g(θ) at θ = θ_0. In the two-parameter case, the MLE (α̂, β̂) is a root of the simultaneous equations g(α, β) = 0 and h(α, β) = 0, where g = S_1 and h = S_2.

*This section may be omitted on first reading.

Let (α_0, β_0) be a preliminary estimate of the root (α̂, β̂), and consider the linear approximations

    g(α, β) ≈ g(α_0, β_0) + (α − α_0) ∂g/∂α + (β − β_0) ∂g/∂β;
    h(α, β) ≈ h(α_0, β_0) + (α − α_0) ∂h/∂α + (β − β_0) ∂h/∂β.

Here the derivatives are to be evaluated at α = α_0, β = β_0. These linear approximations can be derived by truncating the bivariate Taylor's series expansions of g and h at (α_0, β_0). They have the same values and first derivatives as g and h at the point (α_0, β_0).

Since g(α̂, β̂) = 0 and h(α̂, β̂) = 0, we have

    (α̂ − α_0) ∂g/∂α + (β̂ − β_0) ∂g/∂β ≈ −g(α_0, β_0);
    (α̂ − α_0) ∂h/∂α + (β̂ − β_0) ∂h/∂β ≈ −h(α_0, β_0).

Solving these linear equations for α̂ − α_0 and β̂ − β_0 gives

    [ α̂ ]   [ α_0 ]   [ ∂g/∂α  ∂g/∂β ]⁻¹ [ g(α_0, β_0) ]
    [ β̂ ] ≈ [ β_0 ] − [ ∂h/∂α  ∂h/∂β ]   [ h(α_0, β_0) ]          (10.7.1)

where S_1 and S_2 are the components of the score vector (see Section 10.1). In
this case we have

    ∂g/∂α = ∂S_1/∂α = ∂²l/∂α² = −J_11(α, β),

and similarly for the other derivatives. Thus (10.7.1) becomes

    [ α̂ ]   [ α_0 ]   [ J_11  J_12 ]⁻¹ [ S_1 ]
    [ β̂ ] ≈ [ β_0 ] + [ J_12  J_22 ]   [ S_2 ]                    (10.7.2)

where S_1, S_2, and the J_ij's are all evaluated at α = α_0, β = β_0. This is the result stated in Section 10.1.

In many applications, l(α, β) is approximately a quadratic function of α and β near (α̂, β̂) (see Section 10.4). Then S_1 and S_2 will be nearly linear in α and β near (α̂, β̂). The Newton–Raphson procedure will then converge quickly provided that the initial guess (α_0, β_0) is not too far from (α̂, β̂).

In Example 10.1.1 the log likelihood function l(µ_1, µ_2) is a second-degree polynomial in µ_1 and µ_2, and the components of the score function are linear in µ_1 and µ_2. In this case (10.7.2) will yield (µ̂_1, µ̂_2) in one iteration, no matter what initial values are taken for µ_1 and µ_2.

Derivation of (10.1.1) and (10.1.2)

If l(α, β) has a relative maximum at (α̂, β̂), then it must be "downhill" from (α̂, β̂) in all directions.

Suppose that we move away from (α̂, β̂) along a line at angle φ to the α-axis (see Figure 10.7.1). For points (α, β) on this line we have

    α = α̂ + d cos φ;  β = β̂ + d sin φ

where d is the distance from (α, β) to (α̂, β̂). Along this line the log likelihood function is

    h(d) = l(α, β) = l(α̂ + d cos φ, β̂ + d sin φ).

For each possible angle φ, h(d) must have a relative maximum at d = 0. It follows that, for all φ,

    ∂h(d)/∂d = 0  and  ∂²h(d)/∂d² ≤ 0  at d = 0.

Using the chain rule, we find that

    ∂h(d)/∂d = (∂l/∂α)(∂α/∂d) + (∂l/∂β)(∂β/∂d) = S_1 cos φ + S_2 sin φ.

Since this must be zero for all φ when d = 0, it follows that S_1 = S_2 = 0 at (α̂, β̂), which is (10.1.1).

Upon differentiating again, we find that

    ∂²h(d)/∂d² = −(J_11 cos²φ + 2J_12 cos φ sin φ + J_22 sin²φ).

Since the quantity in parentheses must be positive for all φ when d = 0, it follows that

    J_11 cos²φ + 2J_12 cos φ sin φ + J_22 sin²φ > 0               (10.7.3)

for all φ. It is not difficult to show that this condition is satisfied if and only if (10.1.2) holds.
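The equivalence just asserted — that (10.7.3) holds for all φ exactly when the information matrix is positive definite — can be checked numerically. The sketch below is ours, and it assumes that (10.1.2) is the usual positive-definiteness condition J_11 > 0, J_11·J_22 − J_12² > 0.

```python
import math

def quad_form_positive(j11, j12, j22, steps=3600):
    """Check (10.7.3): J11*cos^2 + 2*J12*cos*sin + J22*sin^2 > 0
    over a fine grid of angles phi in [0, 2*pi)."""
    return all(
        j11 * math.cos(t) ** 2 + 2 * j12 * math.cos(t) * math.sin(t)
        + j22 * math.sin(t) ** 2 > 0
        for t in (2 * math.pi * k / steps for k in range(steps))
    )

def positive_definite(j11, j12, j22):
    """Condition (10.1.2) in its usual form."""
    return j11 > 0 and j11 * j22 - j12 * j12 > 0
```

Applying both tests to the information matrix of Section 10.6 (entries 2451, 2784, 10277) confirms that its maximum is a genuine relative maximum.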
Figure 11.1.1. Six of the eleven possible log relative likelihood functions in a binomial example.

Table 11.1.1. Likelihood Intervals and Probabilities in a Binomial Example (n = 10)

     x    θ̂      A      B     θ_0 = 0.1   θ_0 = 0.2   θ_0 = 0.3
     0   0.0   0.000  0.206     0.349       0.107       0.028
     1   0.1   0.004  0.403     0.387       0.268       0.121
     2   0.2   0.029  0.530     0.194       0.302       0.233
     3   0.3   0.072  0.635     0.057       0.201       0.267
     4   0.4   0.128  0.725     0.011       0.088       0.200
     5   0.5   0.196  0.804     0.001       0.026       0.103
     6   0.6   0.275  0.872     0.000       0.006       0.037
     7   0.7   0.365  0.928     0.000       0.001       0.009
     8   0.8   0.470  0.971     0.000       0.000       0.001
     9   0.9   0.597  0.996     0.000       0.000       0.000
    10   1.0   0.794  1.000     0.000       0.000       0.000
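The interval endpoints A and B in Table 11.1.1 can be recomputed directly. The following Python sketch (ours, not the book's) evaluates the binomial relative likelihood R(θ) = (θ/θ̂)^x ((1 − θ)/(1 − θ̂))^(n−x) and locates the 10% likelihood interval by bisection on each side of θ̂ = x/n.

```python
import math

def rel_likelihood(theta, x, n):
    """R(theta) = L(theta)/L(theta_hat) for x successes in n trials,
    evaluated at 0 < theta < 1."""
    hat = x / n
    log_r = 0.0
    if x > 0:
        log_r += x * math.log(theta / hat)
    if x < n:
        log_r += (n - x) * math.log((1 - theta) / (1 - hat))
    return math.exp(log_r)

def likelihood_interval(x, n, p=0.10):
    """Endpoints [A, B] of the 100p% likelihood interval {theta: R >= p}."""
    hat = x / n

    def bisect(lo, hi, decreasing):
        # invariant: the crossing R(theta) = p lies in [lo, hi]
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if (rel_likelihood(mid, x, n) >= p) != decreasing:
                hi = mid
            else:
                lo = mid
        return 0.5 * (lo + hi)

    a = 0.0 if x == 0 else bisect(0.0, hat, False)   # R rising to the left
    b = 1.0 if x == n else bisect(hat, 1.0, True)    # R falling to the right
    return a, b
```

For example, `likelihood_interval(0, 10)` solves (1 − θ)^10 = 0.1 and reproduces the endpoint 0.206 quoted in the coverage discussion below.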
11. Frequency Properties — 11.1. Sampling Distributions
Table 11.1.2. Sampling Distribution of the Likelihood Ratio Statistic D = −2r(θ_0) in a Binomial Example

          θ_0 = 0.1       θ_0 = 0.2       θ_0 = 0.3
     x   D(x)   f(x)     D(x)   f(x)     D(x)   f(x)
     0   2.11  0.349     4.46  0.107     7.13  0.028
     1   0.00  0.387     0.73  0.268     2.33  0.121
     2   0.89  0.194     0.00  0.302     0.51  0.233
     3   3.07  0.057     0.56  0.201     0.00  0.267
     4   6.22  0.011     2.09  0.088     0.45  0.200
     5  10.22  0.001     4.46  0.026     1.74  0.103
     6  15.01  0.000     7.64  0.006     3.84  0.037
     7  20.65  0.000    11.65  0.001     6.78  0.009
     8  27.25  0.000    16.64  0.000    10.68  0.001
     9  35.16  0.000    22.91  0.000    15.88  0.000
    10  46.05  0.000    32.19  0.000    24.08  0.000

In all three cases, values of D greater than 4.61 would occur very rarely in repetitions of the experiment. The true value θ_0 would almost always have a relative likelihood of 10% or more, and therefore a D-value of 4.61 or less.

EXAMPLE 11.1.2. Suppose that an experiment involves taking a single measurement x which is modeled as the observed value of a random variable X ~ N(θ, 1). If the measurement interval is small, the log likelihood function of θ is

    l(θ) = −½(x − θ)²  for −∞ < θ < ∞,

from which we obtain θ̂ = x and l(θ̂) = 0. The log RLF is

    r(θ) = −½(x − θ)²  for −∞ < θ < ∞.

Upon solving r(θ) ≥ log p, we find that the 100p% likelihood interval for θ is given by

    x − c ≤ θ ≤ x + c

where c = √(−2 log p).

Now imagine that the experiment is to be repeated over and over again with θ fixed at a particular value θ_0. The probability distribution of X in repetitions with θ = θ_0 is N(θ_0, 1). Since the observed value of X would vary from one repetition to the next, so would r(θ), θ̂, and the endpoints of likelihood intervals. We can now think of θ̂ as a random variable, θ̂ = X, with sampling distribution

    θ̂ ~ N(θ_0, 1).

Similarly, the endpoints of the 100p% LI are random variables A = X − c and B = X + c, with sampling distributions

    A ~ N(θ_0 − c, 1);  B ~ N(θ_0 + c, 1).

The likelihood ratio statistic is

    D = −2r(θ_0) = (X − θ_0)²,

which would also vary from one repetition of the experiment to another. To find the sampling distribution of D, we note that, by (6.6.5),

    Z = X − θ_0 ~ N(0, 1).

Thus, by (6.9.8), we have

    D = Z² ~ χ²_(1).

In this example the sampling distribution of the likelihood ratio statistic is χ²_(1) for all possible parameter values θ_0. The situation is much simpler than in the preceding example, where the sampling distribution of D depended on the true value θ_0.

To calculate probabilities for D, we can use either Table B4 for the chi-square distribution, or Table B2 for N(0, 1) (see Appendix B). For instance,

    P(D ≤ 4.61) = P(χ²_(1) ≤ 4.61) ≈ 0.97

by interpolating in Table B4. For greater accuracy, we note that

    P(D ≤ 4.61) = P(Z² ≤ 4.61) = P(|Z| ≤ 2.146) = 0.968

from Table B2. This result is true for all θ_0, whereas in the preceding example we found that P(D ≤ 4.61) depended on the true value θ_0.

EXAMPLE 11.1.3. Suppose that we observe n measurements x_1, x_2, ..., x_n which we model as observed values of independent N(µ, σ²) variates. As in Example 9.7.1 we assume that σ is known and that the measurement intervals are small. Then the MLE of µ is µ̂ = x̄ = (1/n)Σx_i, the log RLF is

    r(µ) = −(n/2σ²)(x̄ − µ)²  for −∞ < µ < ∞,

and the 100p% likelihood interval is

    x̄ − cσ/√n ≤ µ ≤ x̄ + cσ/√n

where c = √(−2 log p).

Now imagine that the experiment is to be repeated over and over with µ = µ_0. The observed x_i's would vary from one repetition to the next, and thus so would x̄. By (6.6.8) we have X̄ ~ N(µ_0, σ²/n), and therefore

    µ̂ = X̄ ~ N(µ_0, σ²/n).
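The claim of Example 11.1.2 that P(D ≤ 4.61) = P(|Z| ≤ 2.146) = 0.968 is easy to confirm by computation. The sketch below is our illustration: it estimates the probability by Monte Carlo simulation and also evaluates it exactly through the error function, since P(Z² ≤ d) = erf(√(d/2)).

```python
import math, random

def prob_D_below(d, reps=200_000, seed=1):
    """Monte Carlo estimate of P(D <= d), where D = Z^2, Z ~ N(0, 1)."""
    rng = random.Random(seed)
    hits = sum(rng.gauss(0.0, 1.0) ** 2 <= d for _ in range(reps))
    return hits / reps

def prob_D_below_exact(d):
    """P(Z^2 <= d) = P(|Z| <= sqrt(d)) = erf(sqrt(d/2))."""
    return math.erf(math.sqrt(d / 2.0))
```

Both routes agree with the value 0.968 quoted from Table B2, and the exact formula reproduces 0.95 at the chi-square point 3.841.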
This is the sampling distribution of the MLE in repetitions of the experiment with µ = µ_0. Similarly, the endpoints of the 100p% LI are now regarded as random variables X̄ − cσ/√n and X̄ + cσ/√n.

By (11.1.1), the likelihood ratio statistic is

    D = −2r(µ_0) = (n/σ²)(X̄ − µ_0)² = Z²

where Z is the standard form of X̄:

    Z = (X̄ − µ_0)/(σ/√n) ~ N(0, 1).

It follows by (6.9.8) that for all µ_0 the sampling distribution of D is χ² with one degree of freedom. The same result was obtained in the preceding example, which is the special case n = σ = 1.

Note on Notation

In previous chapters we have used capital letters for random variables and the corresponding small letters for their possible values. We shall no longer follow this convention in all cases. In particular, we shall use θ̂ to represent the MLE of θ whether we are thinking of it as a random variable in repetitions of the experiment, or as the particular value computed from the data. Also, we shall use r(θ_0) for the log relative likelihood of θ_0 whether we are considering it to be fixed or random, because we are already using R to mean e^r.

11.2. Coverage Probability

Suppose that the probability model for an experiment involves a single unknown parameter θ. As in Section 11.1 we imagine a series of repetitions of the experiment with θ fixed at θ_0. Further, imagine that an interval [A, B] is to be computed from the data in the same way at each repetition. For instance, [A, B] could be the 10% likelihood interval for θ.

Owing to random variability in the data, the interval [A, B] would vary from one repetition to the next. Its endpoints A, B can be modeled as random variables. The sampling distributions of A and B can be derived from the probability model, and they will generally depend upon θ_0.

Since the interval [A, B] would vary from one repetition to the next, it might sometimes fail to include the true value θ_0. Hopefully this would happen only rarely, and the interval [A, B] would usually contain, or cover, the true value θ_0.

The coverage probability of the random interval [A, B] is the probability that the interval [A, B] includes, or covers, the true parameter value θ_0:

    CP(θ_0) = P(A ≤ θ_0 ≤ B | θ = θ_0).                          (11.2.1)

This is not a conditional probability. Rather, the notation is meant to emphasize that the true parameter value θ_0 is to be used in computing the coverage probability.

The coverage probability CP(θ_0) is the fraction of the time that the interval [A, B] would include the true value θ_0 in a large number of repetitions of the experiment with θ = θ_0. Note that A and B are the random variables in (11.2.1) and θ_0 is fixed.

EXAMPLE 11.2.1. Imagine that the experiment described in Example 11.1.1 is to be repeated over and over again with θ fixed at θ_0. Each time the 10% likelihood interval for θ is to be calculated. We want to know what fraction of the time this interval would contain the true value θ_0.

First consider repetitions with θ = 0.1. The 11 possible 10% likelihood intervals are listed in Table 11.1.1, and from the table we see that A ≤ 0.1 ≤ B whenever 0 ≤ X ≤ 3. Thus the coverage probability of the 10% LI in repetitions with θ = 0.1 is

    CP(0.1) = P(A ≤ 0.1 ≤ B | θ = 0.1) = P(0 ≤ X ≤ 3 | θ = 0.1).

Now from the 5th column of Table 11.1.1 we get

    CP(0.1) = 0.349 + 0.387 + 0.194 + 0.057 = 0.987.

The coverage probability in repetitions with θ = 0.2 is

    CP(0.2) = P(A ≤ 0.2 ≤ B | θ = 0.2) = P(0 ≤ X ≤ 5 | θ = 0.2).

Using the 6th column of Table 11.1.1 we get CP(0.2) = 0.992. Similarly,

    CP(0.3) = P(A ≤ 0.3 ≤ B | θ = 0.3) = 0.961.

In repetitions of the experiment with θ = 0.1, the 10% LI for θ would include, or cover, the true value 98.7% of the time. Similarly, the 10% LI would cover the true value θ_0 in 99.2% of repetitions with θ = 0.2, and in 96.1% of repetitions with θ_0 = 0.3.

In this example, the coverage probability depends on the true value θ_0. It can be shown that CP(θ_0) varies from a low of 89.3% when θ_0 is 0.206 or 0.794 to a high approaching 100% as θ_0 tends to 0 or 1. Owing to the discreteness of the binomial distribution, CP(θ_0) is not a continuous function. For instance, we see from Table 11.1.1 that the upper endpoint of the 10% LI corresponding to X = 0 is 0.206. In computing CP(θ_0), we would include P(X = 0) for θ_0 ≤ 0.206 but not for θ_0 > 0.206. As θ_0 increases through 0.206, the coverage probability suddenly decreases by

    P(X = 0 | θ_0 = 0.206) = (0.794)^10 = 0.100.

Similarly, CP(θ_0) will have a discontinuity at each of the other endpoints of the 11 possible likelihood intervals.
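The coverage calculations of Example 11.2.1 can be reproduced by enumeration. The following sketch is ours: it recomputes the 10% likelihood interval for each possible x by bisection and then sums the binomial probabilities of those x-values whose interval covers θ_0.

```python
import math

def binom_pmf(x, n, theta):
    """P(X = x) for X ~ Binomial(n, theta)."""
    return math.comb(n, x) * theta ** x * (1 - theta) ** (n - x)

def li_endpoints(x, n, p=0.10):
    """100p% likelihood interval for theta, from x successes in n trials."""
    hat = x / n

    def log_rel(theta):
        s = 0.0
        if x > 0:
            s += x * math.log(theta / hat)
        if x < n:
            s += (n - x) * math.log((1 - theta) / (1 - hat))
        return s

    def bisect(lo, hi, decreasing):
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if (log_rel(mid) >= math.log(p)) != decreasing:
                hi = mid
            else:
                lo = mid
        return 0.5 * (lo + hi)

    a = 0.0 if x == 0 else bisect(0.0, hat, False)
    b = 1.0 if x == n else bisect(hat, 1.0, True)
    return a, b

def coverage(theta0, n=10, p=0.10):
    """CP(theta0): probability that the 100p% LI covers theta0."""
    total = 0.0
    for x in range(n + 1):
        a, b = li_endpoints(x, n, p)
        if a <= theta0 <= b:
            total += binom_pmf(x, n, theta0)
    return total
```

Evaluating `coverage` at 0.1, 0.2, and 0.3 recovers the values 0.987, 0.992, and 0.961 obtained above from Table 11.1.1.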
EXAMPLE 11.2.2. In Example 11.1.2, the 100p% likelihood interval for θ has the form

    X − c ≤ θ ≤ X + c

where c = √(−2 log p). By (11.2.1), the coverage probability of this interval in repetitions of the experiment with θ = θ_0 is

    CP(θ_0) = P(X − c ≤ θ_0 ≤ X + c | θ = θ_0)
            = P(−c ≤ X − θ_0 ≤ c | θ = θ_0).

The distribution of X is N(θ_0, 1), so (6.6.5) gives Z = X − θ_0 ~ N(0, 1), and hence

    CP(θ_0) = P(−c ≤ Z ≤ c),

which is the same for all values of θ_0.

EXAMPLE 11.2.3. In Example 11.1.3 the 100p% likelihood interval for µ has the form

    X̄ − cσ/√n ≤ µ ≤ X̄ + cσ/√n

where c = √(−2 log p). The coverage probability of this interval in repetitions of the experiment with µ = µ_0 is

    CP(µ_0) = P(X̄ − cσ/√n ≤ µ_0 ≤ X̄ + cσ/√n | µ = µ_0) = P(−c ≤ Z ≤ c),

where Z = (X̄ − µ_0)/(σ/√n) ~ N(0, 1); again the coverage probability does not depend on µ_0.
approximation. The limiting distribution of D is χ²_(1) for all interior parameter values θ_0, but the sample size needed to achieve reasonable accuracy may depend upon θ_0.

In the following three examples, the accuracy of (11.3.1) is investigated in situations where the exact sampling distribution of D can be derived fairly easily. Using the exact sampling distribution, we shall calculate

    P(D ≤ 2.706),  P(D ≤ 3.841),  P(D ≤ 6.635).

The values 2.706, 3.841, and 6.635 were chosen because they are the 90%, 95%, and 99% points of χ²_(1) (see Table B4). Thus, if (11.3.1) holds, the three probabilities should be close to 0.9, 0.95, and 0.99.

By (11.2.2), P(D ≤ d) is the coverage probability of the 100p% likelihood interval where d = −2 log p; that is, p = e^(−d/2). Since e^(−2.706/2) = 0.258, P(D ≤ 2.706) is the coverage probability of the 25.8% likelihood interval. Similarly, P(D ≤ 3.841) and P(D ≤ 6.635) are coverage probabilities of 14.7% and 3.6% likelihood intervals. The three examples provide a comparison of the exact coverage probabilities of these three likelihood intervals with their approximate coverage probabilities from (11.3.1).

EXAMPLE 11.3.1. Suppose that n people are tested for tuberculosis as in Example 11.1.1. In that example we took n = 10 and derived the exact sampling distribution of D for θ_0 = 0.1, for θ_0 = 0.2, and for θ_0 = 0.3.

The exact distribution of D when θ_0 = 0.1 is given in columns 2 and 3 of Table 11.1.2. We see that D ≤ 2.706 for X ≤ 2, and therefore

    P(D ≤ 2.706) = P(X ≤ 2) = 0.349 + 0.387 + 0.194 = 0.930.

Similarly, we have

    P(D ≤ 3.841) = P(X ≤ 3) = 0.987;
    P(D ≤ 6.635) = P(X ≤ 4) = 0.998.

These results are shown in the first row of Table 11.3.1. The values in the next two rows of this table are obtained in a similar fashion from the last four columns of Table 11.1.2. The last row of Table 11.3.1 gives the approximate probabilities according to (11.3.1). The other rows are obtained by redoing the calculations of Example 11.1.1 with n = 20, and then repeating them again with n = 50.

Table 11.3.1. Exact and Approximate Probabilities for the Likelihood Ratio Statistic in a Binomial Example

      n    θ_0   P(D ≤ 2.706)  P(D ≤ 3.841)  P(D ≤ 6.635)
     10    0.1      0.930         0.987         0.998
           0.2      0.859         0.859         0.992
           0.3      0.924         0.961         0.961
     20    0.1      0.835         0.867         0.998
           0.2      0.899         0.956         0.986
           0.3      0.917         0.947         0.987
     50    0.1      0.908         0.942         0.992
           0.2      0.891         0.951         0.988
           0.3      0.912         0.957         0.987
     χ² approx.     0.9           0.95          0.99

In this example, D is a discrete random variable having n + 1 possible values, one for each value of X. The approximating χ² distribution is continuous. For this reason, one would not expect the approximation (11.3.1) to be very accurate when n is as small as 10. As n increases, so does the number of possible values for D. When n is very large, the discreteness no longer matters, and the distribution of D will be well approximated by χ²_(1).

The limiting distribution of D is χ²_(1) for all θ_0 such that 0 < θ_0 < 1. If θ_0 is near ½, (11.3.1) gives fairly accurate results for n = 20. However, a much larger n is needed if θ_0 is close to 0 or 1.

EXAMPLE 11.3.2. Suppose that an experiment yields n counts which are modeled as observed values of independent Poisson-distributed variates X_1, X_2, ..., X_n with expected value µ. From Example 9.1.2, the log likelihood function of µ is

    l(µ) = t log µ − nµ  for µ > 0

where t = Σx_i. The MLE is µ̂ = t/n, and the log RLF is

    r(µ) = l(µ) − l(µ̂) = t log(µ/µ̂) − nµ + nµ̂ = −t log(t/nµ) − nµ + t.

Now imagine a series of repetitions of the experiment with µ fixed at µ_0. The total count T = ΣX_i would vary from one repetition to the next, and is modeled as a random variable. By the corollary to Example 4.6.1, the probability distribution of T is Poisson with mean m = nµ_0.

The likelihood ratio statistic is

    D = −2r(µ_0) = 2[T log(T/m) + m − T].

D is a discrete variate with one possible value for each value of T. For any given m we can substitute T = 0, 1, 2, ... to obtain the possible values of D. Their probabilities are obtained from the Poisson distribution

    P(T = t) = m^t e^(−m)/t!  for t = 0, 1, 2, ....

For instance, suppose that m = 10. Then
    D = 2[T log(T/10) + 10 − T].
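The exact distribution of this discrete statistic is easy to tabulate by machine. The sketch below is our illustration of the calculation just described: it enumerates T = 0, 1, 2, ..., computes D, and accumulates the Poisson probabilities of the values falling below the usual χ²_(1) cutoffs.

```python
import math

def poisson_lr_probs(m, cutoffs=(2.706, 3.841, 6.635), tmax=200):
    """Exact P(D <= d) for D = 2[T log(T/m) + m - T], T ~ Poisson(m).
    The term T log(T/m) is taken as 0 when T = 0; tmax truncates the
    (negligible) upper tail of the Poisson distribution."""
    probs = []
    for d in cutoffs:
        total = 0.0
        pmf = math.exp(-m)                 # P(T = 0)
        for t in range(tmax):
            if t > 0:
                pmf *= m / t               # recurrence for Poisson pmf
            tlog = t * math.log(t / m) if t > 0 else 0.0
            D = 2.0 * (tlog + m - t)
            if D <= d:
                total += pmf
        probs.append(total)
    return probs
```

For m = 10, the three probabilities come out near (but not equal to) the χ²_(1) values 0.9, 0.95, and 0.99, illustrating the effect of discreteness.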
Table 11.3.3. Exact and Approximate Probabilities for the Likelihood Ratio Statistic in an Exponential Example

      n      P(D ≤ 2.706)  P(D ≤ 3.841)  P(D ≤ 6.635)
      1         0.874         0.932         0.984
      2         0.886         0.941         0.987
      3         0.891         0.944         0.988
      5         0.894         0.946         0.989
     10         0.897         0.948         0.989
     χ² approx.  0.9           0.95          0.99

11.4. Confidence Intervals

The random interval [A, B] is called a confidence interval (CI) for θ if its coverage probability

    CP(θ_0) = P(A ≤ θ_0 ≤ B | θ = θ_0)

is the same for all parameter values θ_0. The coverage probability of a confidence interval is called its confidence coefficient.

For instance, [A, B] is a 95% confidence interval for θ if

    P(A ≤ θ_0 ≤ B | θ = θ_0) = 0.95
for all possible parameter values θ_0. A 95% CI would include the true parameter value θ_0 in 95% of repetitions of the experiment with θ fixed.

In Examples 11.2.2, 11.2.3, and 11.3.3 we found that the coverage probability of the 100p% likelihood interval was the same for all parameter values. In each of these examples, the 100p% LI is a confidence interval. In particular, the confidence coefficient of the 14.7% LI is exactly 0.95 in Examples 11.2.2 and 11.2.3, and is close to 0.95 in Example 11.3.3.

Likelihood intervals are not confidence intervals in Examples 11.2.1, 11.3.1, and 11.3.2 because their coverage probabilities depend on the true parameter value θ_0. In general, when the probability model is discrete, the coverage probability of a random interval [A, B] will be a discontinuous function of θ_0 (see Example 11.2.1). For this reason, it is generally not possible to construct exact confidence intervals in the discrete case. However, the effects of discreteness become less important as the sample size increases. Thus it is often possible to find approximate confidence intervals for which CP(θ_0) is nearly constant over those parameter values θ_0 which are of interest.

Because of the χ² approximation, likelihood intervals are exact or approximate confidence intervals in most applications. When (11.3.1) applies, the approximate confidence coefficient (coverage probability) of the 100p% likelihood interval is given by

    CP ≈ P{χ²_(1) ≤ −2 log p}

(see Table 11.2.1).

Another method of constructing confidence intervals is by inverting a test of significance (see Section 12.9).

Interpretation

EXAMPLE 11.4.1. In Example 11.2.2 we noted that

    Z = X − θ_0 ~ N(0, 1).

Since P{−1.96 ≤ Z ≤ 1.96} = 0.95, it follows that

    P(X − 1.96 ≤ θ_0 ≤ X + 1.96) = P(−1.96 ≤ Z ≤ 1.96) = 0.95.

The interval [X − 1.96, X + 1.96] has coverage probability 0.95 for all θ_0, and therefore it is a 95% confidence interval for θ. It is also a likelihood interval. Values of θ included by this interval are more likely than the excluded values.

There are plenty of ways to construct confidence intervals in this example. For instance, Table B2 gives

    P(−2.376 ≤ Z ≤ 1.751) = 0.95.

Thus the interval [X − 2.376, X + 1.751] has coverage probability 0.95 for all θ_0, and is also a 95% confidence interval for θ. Note, however, that this is not a likelihood interval. It includes values of θ at the lower end which are much less likely than values excluded at the upper end. Although this interval would cover the true parameter value 95% of the time in repetitions of the experiment, it would not properly summarize the information available concerning θ in any particular application.
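The two normal-tail probabilities quoted in Example 11.4.1 can be verified from the standard normal CDF, which is available in closed form through the error function. This short sketch is ours, not part of the text.

```python
import math

def normal_cdf(z):
    """Phi(z) for the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interval_prob(lo, hi):
    """P(lo <= Z <= hi) for Z ~ N(0, 1)."""
    return normal_cdf(hi) - normal_cdf(lo)
```

Both `interval_prob(-1.96, 1.96)` and `interval_prob(-2.376, 1.751)` are 0.95 to the accuracy of Table B2, even though only the first interval is symmetric.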
PROBLEMS FOR SECTION 11.4

1. Suppose that the distribution of the likelihood ratio statistic D = −2r(θ_0) does not depend upon θ_0. Show that, for all p, the 100p% likelihood interval for θ is a confidence interval.

2. In a poll of 200 randomly chosen voters, 94 indicated that they would vote for the Conservatives if an election were called. Let p be the proportion of all voters who would vote for the Conservatives. Find a likelihood interval which is an approximate 95% confidence interval for p. Is it likely that p = ½?

3.† Five hundred people were chosen at random from a large population and were asked their opinions on capital punishment for murderers of prison guards. Sixty percent of those interviewed were in favor. Let p denote the fraction of the population who favor capital punishment.
(a) Find likelihood intervals for p which are approximate 95% and 99% confidence intervals.
(b) Use the normal approximation to construct approximate 95% and 99% confidence intervals, and compare them with the intervals in (a).

4.† The following are the times to failure, measured in hours, of ten electronic components:

    2  119  51  77  33  27  14  24  4  37

Previous experience with similar types of components suggests that the distribution of lifetimes should be exponential. The mean lifetime θ is unknown.
(a) Find a likelihood interval for θ which is an approximate 90% confidence interval.
(b) Transform the result in (a) to obtain an approximate 90% confidence interval for p, the proportion of components whose lifetimes exceed 50 hours.

5. The number of accidents per month at a busy intersection has a Poisson distribution with mean µ, and successive months are independent. Over a 10-month period there were 53 accidents altogether.
(a) Obtain a likelihood interval which is an approximate 97.5% confidence interval for µ.
(b) Use the normal approximation of Section 9.7 to obtain an approximate 97.5% confidence interval for µ.

6. When an automatic shear is set to cut plates to length µ, the lengths actually produced are normally distributed about µ with standard deviation 1.6 inches. The average length of 15 plates cut at one setting was 125.77 inches. Find three likelihood intervals for µ which are 90%, 95%, and 99% confidence intervals.

7.† In a check of the accuracy of their measurement procedures, fifteen engineers are asked to measure a precisely known distance of 3727 feet between two markers. Their results are as follows:

    3727.75  3726.43  3728.04  3729.21  3726.30
    3728.15  3724.25  3726.29  3724.90  3727.51
    3726.85  3728.50  3725.94  3727.69  3726.09

Assuming that their measurements are independent N(3727, σ²), obtain a likelihood interval for σ which is an approximate 95% confidence interval.

8. Let X_1, X_2, ..., X_n be IID N(µ, σ²) variates, where µ is known but σ is unknown (see Problem 11.4.7).
(a) Show that the likelihood ratio statistic is

    D = −2r(σ_0) = T − n − n log(T/n)

where T = Σ(X_i − µ)²/σ_0².
(b) Show that T has a χ² distribution with n degrees of freedom.
(c) Show that the 100p% likelihood interval is a confidence interval, and describe how its exact confidence coefficient can be determined.

9. Let X_1, X_2, ..., X_n be IID random variables having a gamma distribution with p.d.f.

    f(x) = (x/θ²) e^(−x/θ)  for x > 0

where θ is a positive unknown parameter.
(a) Show that the likelihood ratio statistic is

    D = −2r(θ_0) = T − 4n − 4n log(T/4n)

where T = 2ΣX_i/θ_0.
(b) Show that 2X_i/θ_0 has a χ² distribution with 4 degrees of freedom, and hence that T ~ χ²_(4n).
(c) The total of n = 60 observations was found to be Σx_i = 71.5. Find a likelihood interval for θ which is an approximate 90% confidence interval. Will the exact coverage probability be close to 90% in this situation?

10. Let X_1, X_2, ..., X_n be IID exponential variates with mean θ, and define T = ΣX_i. We noted in Example 11.3.3 that the distribution of 2T/θ_0 in repetitions of the experiment with θ = θ_0 is χ²_(2n). Let a, b be values such that

    P{χ²_(2n) ≤ a} = 0.025 = P{χ²_(2n) ≥ b}.

(a) Show that the interval

    2T/b ≤ θ ≤ 2T/a

is a 95% confidence interval for θ.
(b) Let θ_L and θ_U denote the lower and upper endpoints of the confidence interval in (a). Show that

    r(θ_U) − r(θ_L) = n log(a/b) + ½(b − a).

(c) Using tables of the χ² distribution, evaluate r(θ_U) − r(θ_L) for n = 1, 5, 10, and 15. Is the interval in (a) a likelihood interval? What happens as n increases?

11.† Let X_1, X_2, ..., X_n be IID variates having a continuous uniform distribution on the interval [0, θ], where θ is a positive unknown parameter.
P{M ≤ m | θ = θ₀} = (m/θ₀)ⁿ  for 0 ≤ m ≤ θ₀;
P{D ≤ d | θ = θ₀} = 1 − e^{−d/2}  for d > 0.

Thus D is distributed as χ²(2). Note that (11.3.1) does not apply here because the range of the Xᵢ's depends on θ.
(c) Show that the 100p% likelihood interval has coverage probability 1 − p.
(d) Find a likelihood interval for θ which is a 95% confidence interval based on the following sample of size 10.

0.7481  0.7484  0.9537  0.1589  0.3773
0.3345  0.2906  0.8527  0.3479  0.9245

11.5. Results for Two-Parameter Models

Suppose that the probability model for the experiment involves two unknown parameters, α and β. Let r(α, β) denote the joint log RLF of α and β as in Section 10.1. The 100p% likelihood region for (α, β) is the set of parameter values such that r(α, β) ≥ log p (see Section 10.2).

In Section 10.3, we defined the maximum log RLF of β to be the maximum of r(α, β) over α with β fixed:

r_max(β) = max over α of r(α, β).

The 100p% maximum likelihood interval for β is the set of all β-values such that r_max(β) ≥ log p. This interval can be found from a graph of r_max(β), or from a contour map of r(α, β).

Now imagine a series of repetitions of the experiment with (α, β) fixed at (α₀, β₀). We consider two likelihood ratio statistics:

D = −2r(α₀, β₀);
D₂ = −2r_max(β₀).

D is the likelihood ratio statistic for testing the hypothesis (α, β) = (α₀, β₀), and D₂ is the likelihood ratio statistic for testing the hypothesis β = β₀ (see Sections 12.2 and 12.3).

The values of D and D₂ would vary from one repetition of the experiment to the next depending upon the data obtained. In principle, their exact sampling distributions can be derived from the probability model. In practice, this is usually difficult to do, and so approximations are used. It can be shown, under conditions similar to those given in Section 11.3, that the distributions of D and D₂ are approximately χ², with two degrees of freedom for D and one degree of freedom for D₂. See Section 12.3 for a discussion of degrees of freedom.

The true value (α₀, β₀) belongs to the 100p% likelihood region if and only if r(α₀, β₀) ≥ log p. Thus the coverage probability of the 100p% likelihood region for (α, β) is

CP(α₀, β₀) = P(D ≤ −2 log p | α = α₀, β = β₀) ≈ P(χ²(2) ≤ −2 log p).

The exact coverage probability may depend upon α₀ and β₀, but the approximation does not. Consequently, likelihood regions are approximate confidence regions in large samples.

By (6.9.3), the c.d.f. of χ²(2) is F(d) = 1 − e^{−d/2} for d > 0. It follows that

CP ≈ P(χ²(2) ≤ −2 log p) = 1 − e^{log p} = 1 − p.

The 100p% likelihood region for (α, β) is an approximate 100(1 − p)% confidence region.

The true value β₀ belongs to the 100p% maximum likelihood interval for β if and only if r_max(β₀) ≥ log p. Thus the coverage probability of the 100p% maximum likelihood interval for β is

CP(α₀, β₀) = P(D₂ ≤ −2 log p | α = α₀, β = β₀) ≈ P(χ²(1) ≤ −2 log p).

Maximum likelihood intervals are approximate confidence intervals. They have the same approximate coverage probabilities as likelihood intervals in the one-parameter case.

Figures 10.2.2, 10.2.3, and 10.6.1 show both 10% likelihood regions and 10% maximum likelihood intervals for three numerical examples. The 10% likelihood region consists of all points on or within the 10% contour, which is roughly elliptical in shape. This region would include the true values of both parameters in about 90% of repetitions of the experiment with both parameters fixed. The broken vertical lines show the 10% maximum likelihood interval for the first parameter. The true value of the first parameter would lie between these lines about 96.8% of the time in repetitions with both parameters fixed. Similarly, the true value of the second parameter would lie between the broken horizontal lines about 96.8% of the time.

In the one-parameter case we considered two normal distribution examples for which the distribution of the likelihood ratio statistic was exactly χ²(1). A two-parameter example is given below in which the distributions of D and D₂ are exactly χ²(2) and χ²(1), respectively. It can be shown that the same is true in Example 10.1.1.

EXAMPLE 11.5.1. Suppose that an experiment involves taking two measurements x, y which are modeled as observed values of independent variates X ~ N(α, 1) and Y ~ N(β, 1). It is easy to show that α̂ = x, β̂ = y, and

r(α, β) = −½(x − α)² − ½(y − β)²

for −∞ < α < ∞ and −∞ < β < ∞, and that

r_max(β) = −½(y − β)².

Imagine a series of repetitions of this experiment with α = α₀ and β = β₀. The two likelihood ratio statistics are as follows:

D = −2r(α₀, β₀) = (X − α₀)² + (Y − β₀)² = Z₁² + Z₂²;
D₂ = −2r_max(β₀) = (Y − β₀)² = Z₂².

(10.4.5) gives a good approximation to r_max(β). A nonlinear parameter transformation may help. See the discussion in Sections 9.7 and 10.4.

PROBLEMS FOR SECTION 11.5

1. Use (11.5.1) to obtain approximate 95% confidence intervals for β and γ in Example 10.5.1.

2. (a) Find approximate 90% confidence intervals for α and β in Problem 10.1.1.
(b) Use the result of Problem 10.4.4(c) to obtain an approximate 90% confidence interval for the parameter γ = α − β in Problem 10.1.1.

3.† Use Problem 10.4.4(c) to obtain an approximate 99% confidence interval for the parameter γ = α + 2β in Example 10.5.1. Transform this interval to obtain an approximate 99% confidence interval for the probability of death at log concentration d = 2.

4. (a) Let X and Y be independent Poisson variates with means µ₁ and µ₂, and define γ = log(µ₂/µ₁). Derive the information matrix ℐ(µ̂₁, µ̂₂). Then use Problem 10.4.4(c) to show that
Since E(Xᵢ) = npᵢ, the expected information function is

ℐ_E(θ) = n Σ (1/pᵢ)(dpᵢ/dθ)² − n Σ d²pᵢ/dθ².

The latter sum is zero because Σpᵢ = 1.

Since dpᵢ/dθ = ±½ and dqᵢ/dθ = ±(1 − θ)/2, the expected information functions for the two experiments are

ℐ₁(θ) = (n/4) Σ 1/pᵢ  and  ℐ₂(θ) = [n(1 − θ)²/4] Σ 1/qᵢ.

The ratio of these two functions shows that, for all θ > 0, the first experiment is more efficient (has larger expected information) than the second, and it is considerably more efficient for θ near ½. If costs were equal, the first experiment would be preferable to the second. □

Properties of the Score and Information Functions

We conclude this section by showing that, under suitable regularity conditions, the score function S(θ) has expected value 0 and variance equal to the expected information ℐ_E(θ) = E{ℐ(θ)}. Since variances are non-negative, it then follows that ℐ_E(θ) ≥ 0.

Let X be a random variable or vector of random variables having probability or probability density function f(x; θ) which depends on a continuous parameter θ. The likelihood function of θ is proportional to f(x; θ),

L(θ) = c · f(x; θ),

where c is positive and does not depend on θ. The score and information functions are

S(θ) = ∂l/∂θ = (1/f)(∂f/∂θ);   ℐ(θ) = −∂S/∂θ.

As in Section 11.1, we imagine a series of repetitions of the experiment with θ fixed at a particular value. The value of X will vary from one repetition to the next. S and ℐ are functions of X, and thus can be considered as random variables.

In what follows, we shall be taking expectations over the distribution of X. Usually these expectations would involve multiple sums or integrals. However, for simplicity, we shall write all expectations as single sums.

For any value of θ, the total probability in the distribution of X is equal to 1, so

Σₓ f(x; θ) = 1  for all θ.

Differentiating both sides with respect to θ, and noting that ∂f/∂θ = Sf, we have

Σₓ Sf = Σₓ ∂f/∂θ = 0.

This shows that the expected value of the score function is zero. Also we have

ℐ = −∂S/∂θ = −∂/∂θ [(1/f)(∂f/∂θ)] = (1/f²)(∂f/∂θ)² − (1/f)(∂²f/∂θ²),

from which we obtain

ℐf = S²f − ∂²f/∂θ².

It now follows that

Σₓ S²f − Σₓ ℐf = Σₓ ∂²f/∂θ² = 0.

The first sum is E(S²), and the second sum is the expected information, so we have shown that

E(S²) = ℐ_E(θ).

Since E(S) = 0, the variance of the score function is var(S) = E(S²) = ℐ_E(θ).
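These identities are easily checked numerically for any particular model. The following Python sketch (not from the text; the illustrative values n = 20 and θ = 0.3 are arbitrary) takes the binomial model, for which S(θ) = x/θ − (n − x)/(1 − θ) and ℐ_E(θ) = n/[θ(1 − θ)], and evaluates E(S) and E(S²) by direct summation over the distribution of X:

```python
from math import comb

# Check E(S) = 0 and E(S^2) = expected information for a binomial model:
# X ~ bin(n, theta), with score S(theta) = x/theta - (n - x)/(1 - theta).
n, theta = 20, 0.3          # illustrative values, not from the text

def pmf(x):
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def score(x):
    return x / theta - (n - x) / (1 - theta)

ES = sum(score(x) * pmf(x) for x in range(n + 1))
ES2 = sum(score(x)**2 * pmf(x) for x in range(n + 1))
info = n / (theta * (1 - theta))    # expected information n/[theta(1-theta)]

print(ES, ES2, info)    # ES is 0 and ES2 equals info, up to rounding error
```

For the binomial model the agreement is exact, so the two expectations match to within floating-point rounding.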
E(T) = (1/n)E(X) = (1/n)(nθ) = θ,

and hence T is an unbiased estimate of θ.

By the invariance property, the maximum likelihood estimate of θ² is T² = X²/n², with expected value

E(T²) = E(X²)/n² = [var(X) + E(X)²]/n²

by (5.2.3). Since var(X) = nθ(1 − θ), we have

E(T²) = [nθ(1 − θ) + n²θ²]/n² = θ² + θ(1 − θ)/n.

Hence T² is not an unbiased estimate of θ². The bias is

E(T²) − θ² = θ(1 − θ)/n,

which is positive for 0 < θ < 1, and tends to zero as n → ∞.

EXAMPLE 11.7.2. If X₁, X₂, ..., Xₙ are independent Poisson variates with mean µ, the maximum likelihood estimate of µ is X̄ = ΣXᵢ/n (Example 9.1.2). Since E(Xᵢ) = µ, we have

E(X̄) = (1/n)ΣE(Xᵢ) = (1/n)(nµ) = µ,

and hence X̄ is an unbiased estimate of µ.

By the invariance property, the maximum likelihood estimate of β = e^{−µ} is e^{−X̄}. Since T = ΣXᵢ has a Poisson distribution with mean nµ (corollary to Example 4.6.1), the expected value of e^{−X̄} is

E(e^{−X̄}) = E(e^{−T/n}) = exp{nµ(e^{−1/n} − 1)}.

Σaᵢ² = Σ(aᵢ − ā)² + nā², where ā = (1/n)Σaᵢ. If Σaᵢ = 1, then ā = 1/n, and Σaᵢ² is minimized for a₁ = a₂ = ··· = aₙ = 1/n. Hence the unbiased linear estimate of µ with smallest variance is X̄. □

EXAMPLE 11.7.4. Let X be the number of successes before the first failure in independent trials with success probability θ. Define T(x) = 0 for x = 0, and T(x) = 1 for x ≥ 1. Show that T is the unique unbiased estimate of θ.

SOLUTION. The distribution of X is geometric:

f(x) = θˣ(1 − θ)  for x = 0, 1, 2, ....

The expected value of T is

E(T) = 0·P(T = 0) + 1·P(T = 1) = P(X ≥ 1) = 1 − f(0) = θ.

Hence T is an unbiased estimate of θ.

Now suppose that T′ is another unbiased estimate, and define U(x) = T′(x) − T(x) for x = 0, 1, .... Then

E(U) = E(T′) − E(T) = θ − θ = 0  for all θ.

Also by (5.1.3) we have

E(U) = Σ U(x)θˣ(1 − θ).
CHAPTER 12

Tests of Significance

A test of significance is a procedure for evaluating the strength of the evidence provided by the data against an hypothesis. Section 1 gives a general introduction to significance tests and their interpretation, and defines test statistics and significance levels.

In many applications, the hypothesis of interest can be formulated as an hypothesis concerning the values of unknown parameters in the probability model. It is then possible to derive a test statistic, called the likelihood ratio statistic, from the log likelihood function. Likelihood ratio tests are described in Sections 2 and 3.

Sections 4, 5, 6, and 8 give applications of significance tests to examples involving frequency data, where the basic model for the experiment is binomial or multinomial. In particular, Section 5 discusses goodness of fit tests for multinomial data, and Section 6 describes tests for independence in contingency tables. Section 7 is concerned with the importance of controlled experiments and randomization in establishing cause and effect.

Significance intervals or regions are defined in Section 9, and their coverage probabilities are determined. Also, the connection between significance intervals and likelihood regions is investigated.

The power of a test statistic against an alternative hypothesis is defined in Section 10. Power is sometimes useful in a theoretical comparison of two or more possible test statistics, or in selecting the sample size for an experiment.

12.1. Introduction

A test of significance is a procedure for measuring the strength of the evidence provided by the data against an hypothesis H. It is similar to a proof by contradiction in mathematics. In each case we assume the hypothesis to be true and then check whether this assumption leads to an inconsistency. If a contradiction is found, the hypothesis is disproved. If no contradiction is found, the method of proof fails and the hypothesis could be either true or false.

For instance, to prove by contradiction that there is no largest prime number, we first formulate the hypothesis

H: there is a largest prime number,

which is the opposite of what we want to prove. Assuming H to be true, there are finitely many prime numbers p₁ < p₂ < ··· < pₙ. If this is so, every number larger than pₙ is composite, and is divisible by at least one of p₁, p₂, ..., pₙ. However, p = 1 + p₁p₂···pₙ is larger than pₙ and is not divisible by any of the pᵢ's. This is a contradiction, and therefore H is false.

In a mathematical proof by contradiction, we look for a logical inconsistency, but in statistical applications there will rarely be a logical inconsistency between data and hypothesis. Even if we observed 100 heads in 100 tosses of a coin, we could not prove mathematically that the coin was biased, because this result could have arisen from 100 tosses of a balanced coin. Nevertheless, we would be quite sure that the coin was biased, because the probability of obtaining such an extreme result with a balanced coin is extremely small.

In a significance test, we compute the probability of observing such an extreme result when the hypothesis is true. The smaller the probability, the stronger the evidence that the hypothesis is false.

EXAMPLE 12.1.1. Let X be the number of heads in 100 tosses of a coin. We assume that tosses are independent, and that θ, the probability of heads, is the same at all trials. We observe a value of X, and we wish to test the hypothesis H: θ = ½.

Under the hypothesis, X has a binomial distribution with probability function

f(x) = (100 choose x)(½)¹⁰⁰  for x = 0, 1, ..., 100.

If H is true we expect to observe a value of X near 50. The quantity D = |X − 50| measures how closely the observation agrees with the hypothesis. If D is close to 0, then X is in good agreement with H: θ = ½. A large value of D indicates poor agreement between the data and hypothesis.

Suppose that we observe X = 35, so that the observed value of D is |35 − 50| = 15. The probability of getting such poor agreement with H (i.e. such a large value of D) is

P{D ≥ 15} = P{|X − 50| ≥ 15} ≈ 0.0027

(see below). If H were true, a result as extreme as X = 35 would occur very rarely. Thus we have strong evidence that H is false and the coin is biased.

On the other hand, if we observe X = 45, the observed value of D is |45 − 50| = 5, and the probability of such poor agreement with H is

P{D ≥ 5} = P{|X − 50| ≥ 5} ≈ 0.32.

Results as extreme as X = 45 would occur fairly often with a balanced coin, and we do not have evidence that H is false. We have not shown that H is true either! There are plenty of other values of θ, such as θ = 0.45, which could have produced X = 45. A large probability means simply that no contradiction has been found. The method of proof fails, and H could be either true or false.

The above probabilities could be calculated exactly by summing f(x) over the appropriate values of X. Instead, the normal approximation to the binomial distribution was used (see Section 6.8). Under H: θ = ½ we have

X ~ bin(100, ½) ≈ N(50, 25),

so that (X − 50)/5 is approximately N(0, 1). Thus

P{|X − 50| ≥ 15} = P{|X − 50|/5 ≥ 3} ≈ P{|Z| ≥ 3},

where Z ~ N(0, 1). Now Table B2 gives

P{|Z| ≥ 3} = 2(1 − 0.998650) = 0.0027.

Test Statistics and Significance Levels

For a test of significance we require a ranking of possible outcomes according to how closely they agree with the hypothesis. This ranking is usually specified by defining a test statistic D, also called the test criterion or discrepancy measure. A small value of D shows close agreement between the outcome and the hypothesis, and a large value of D indicates poor agreement.

The test statistic is to be chosen before the data are examined, and the choice will reflect the type of departure from the hypothesis that we wish to detect. A general method for constructing test statistics from the likelihood function will be described in the next two sections. Power comparisons may help in choosing among several possible test statistics (see Section 12.10).

When the experiment has been performed and data have been obtained, we can compute the observed value of D. Then, assuming H to be true, we compute the probability of obtaining a value of D at least as great as that observed. This probability is called the significance level (SL), or P-value, of the data in relation to the hypothesis:

SL = P{D ≥ D_obs | H is true}.

The significance level is the probability of observing such poor agreement between the hypothesis and data if the hypothesis is true. If SL is very small, then such poor agreement would almost never occur if the hypothesis were true, and we have evidence that H is false. The smaller the significance level, the stronger the evidence against the hypothesis. A large SL indicates only a lack of evidence against the hypothesis. Even a significance level of 90% or 100% does not imply that the hypothesis is "probably true". The probability statement refers to the data, not the hypothesis.

Conventionally, 0.05 is taken to be the dividing line between "small" and "large" significance levels. If SL ≤ 0.05, the hypothesis is said to be contradicted by the data (at the 5% level), whereas if SL > 0.05, the hypothesis is said to be consistent or compatible with the data (at the 5% level). Of course, this convention should not be taken too seriously. Significance levels 0.049 and 0.051 are on opposite sides of 0.05, but they imply about the same strength of evidence against the hypothesis.

EXAMPLE 12.1.2 (Test for ESP). Consider a possible experiment for detecting ESP (extra-sensory perception) in a human subject. Four cards labeled A, B, C, and D are shuffled and placed face down on a table. The subject attempts to match the hidden letters to envelopes marked a, b, c, and d, and the number of correct matches is recorded. The experiment is to be repeated 50 times altogether.

Even if the subject has no special powers, some correct matches will occur by chance. A subject with ESP should be able to achieve more correct matches than would occur by chance alone. To determine whether there is evidence for ESP, we compare the results obtained with what would be expected under H, the hypothesis that the subject has no ESP. If the observed results are in reasonable agreement with H, then we cannot claim to have proof of ESP.

Let T denote the total number of correct guesses in 50 repetitions. We shall show below that, under the hypothesis of no ESP, T has approximately a normal distribution with mean 50 and variance 50. Large values of T will be interpreted as evidence against H and in favor of ESP, so we take the test statistic to be D = T. The significance level is then

SL = P{T ≥ T_obs} ≈ P{Z ≥ (T_obs − 50)/√50},

where Z has a standardized normal distribution.

For instance, suppose that such an experiment produced the following data:

No. of correct matches    0    1    2    4    Total
Frequency observed       17   18    9    6      50

The total number of correct matches is

T_obs = 0×17 + 1×18 + 2×9 + 4×6 = 60,

and hence the significance level is

SL ≈ P{Z ≥ (60 − 50)/√50} = P{Z ≥ 1.41} ≈ 0.079.
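The arithmetic of this example can be sketched as follows, using the normal approximation T ≈ N(50, 50) stated above; the identity P{Z ≥ z} = ½ erfc(z/√2) takes the place of the normal table:

```python
from math import erfc, sqrt

# ESP example: total correct matches and approximate significance level
# under H (no ESP), using T ~ N(50, 50) and test statistic D = T.
counts = {0: 17, 1: 18, 2: 9, 4: 6}     # matches -> observed frequency
T_obs = sum(m * f for m, f in counts.items())

z = (T_obs - 50) / sqrt(50)
SL = 0.5 * erfc(z / sqrt(2))            # P{Z >= z} for Z ~ N(0, 1)
print(T_obs, round(SL, 4))              # 60 and about 0.0786
```

A significance level near 0.08 gives, at most, weak evidence against the no-ESP hypothesis.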
In Example 12.1.1, the unknown parameter was the success probability in Bernoulli trials, and the hypothesized value was θ₀ = ½. H is a simple hypothesis because it specifies a numerical value for the only unknown parameter.

Let l(θ) denote the log likelihood function and θ̂ the MLE of θ under the model. The maximum log likelihood under the model is l(θ̂). The log likelihood under the hypothesis is l(θ₀). The likelihood ratio statistic for testing H: θ = θ₀ is defined to be twice the difference between these two log likelihoods,

D = 2[l(θ̂) − l(θ₀)] = −2r(θ₀),   (12.2.1)

where r(θ) is the log relative likelihood function of θ.

Since θ̂ maximizes l(θ), we have l(θ) ≤ l(θ̂) for all values of θ, and therefore D ≥ 0 (see Section 11.1). A small value of D means that the outcome of the experiment is such that θ₀ is a likely parameter value. A large value of D means that the outcome is such that θ₀ is unlikely. Thus D ranks possible outcomes of the experiment according to how well they agree with H: θ = θ₀.

Taking D as the test statistic, we have

SL = P{D ≥ D_obs | H is true} = P{D ≥ D_obs | θ = θ₀},   (12.2.2)

where D_obs is the observed value of D. The significance level is calculated from the (sampling) distribution of the likelihood ratio statistic when θ = θ₀. If we imagine a series of repetitions of the experiment with θ fixed at θ₀, SL is the fraction of the time that the test statistic D would be greater than or equal to the observed value D_obs.

Upon comparing (12.2.2) with (11.2.2), we see that there will be a close connection between significance levels in likelihood ratio tests and coverage probabilities.

Let X denote the number of successes in n Bernoulli trials with success probability θ. We want to test the hypothesis H: θ = θ₀, where θ₀ is a particular numerical value such as ½.

The distribution of X is binomial (n, θ), and the log likelihood function of θ is

l(θ) = x log θ + (n − x) log(1 − θ)

for 0 < θ < 1. The MLE is θ̂ = x/n, and the maximum log likelihood is

l(θ̂) = x log(x/n) + (n − x) log[(n − x)/n].

The log likelihood under H is

l(θ₀) = x log θ₀ + (n − x) log(1 − θ₀),

and so the likelihood ratio statistic for testing H: θ = θ₀ is

D = 2[l(θ̂) − l(θ₀)] = −2r(θ₀) = 2x log[x/(nθ₀)] + 2(n − x) log[(n − x)/(n(1 − θ₀))].

If n is large, the distribution of D is approximately χ² with one degree of freedom (see Example 11.3.1). If θ₀ is near ½, the approximation gives fairly accurate results for n = 20. However, a much larger value of n is needed when θ₀ is close to 0 or 1.

For instance, suppose that we observe X = 35 in n = 100 trials and wish to test H: θ = ½ as in Example 12.1.1. The likelihood ratio statistic for testing H: θ = ½ is

D = 2x log(x/50) + 2(100 − x) log[(100 − x)/50].
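For the observed value x = 35, the statistic D and its approximate significance level can be computed directly. The sketch below (not from the text) evaluates the χ²(1) tail area via the identity P{χ²(1) ≥ d} = P{|Z| ≥ √d} = erfc(√(d/2)) rather than from Table B4:

```python
from math import log, sqrt, erfc

# Likelihood ratio test of H: theta = 1/2 with x = 35 heads in n = 100 tosses.
x, n, theta0 = 35, 100, 0.5

D = (2 * x * log(x / (n * theta0))
     + 2 * (n - x) * log((n - x) / (n * (1 - theta0))))

# chi-square(1) tail area: P{chi2(1) >= D} = erfc(sqrt(D/2))
SL = erfc(sqrt(D / 2))
print(round(D, 3), round(SL, 4))    # D is about 9.14, SL about 0.0025
```

The likelihood ratio test thus gives a significance level close to the 0.0027 obtained from D = |X − 50| in Example 12.1.1, as the theory leads one to expect.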
and its observed value is 3.70. The observed value of D is certainly not unusually large, and hence there is no evidence against the hypothesis that the die is balanced.

The exact significance level is a sum of multinomial probabilities:

SL = P{D ≥ 3.70 | H is true} = Σ [100!/(f₁! f₂! ··· f₆!)] (1/6)¹⁰⁰.

The sum is taken over all sets of frequencies {fⱼ} with Σfⱼ = 100 such that D ≥ 3.70. Much arithmetic is needed to determine the appropriate sets of frequencies {fⱼ}, although the calculations are certainly feasible on a high-speed computer.

Alternatively, one could simulate the experiment a large number of times on a computer and determine the fraction of the time that D is greater than or equal to 3.70. This gives an estimate of SL which can be made as precise as desired by increasing the number of simulations.

Pea type    RY    RG    AY    AG
Frequency   315   108   101   32

Test the hypothesis that the probabilities of the four types are 9/16, 3/16, 3/16, and 1/16, respectively, as predicted by Mendelian theory.

6. In a long-term study of heart disease in a large group of men, it was noted that 65 men who had no previous record of heart problems died suddenly of heart attacks. The following table shows the number of such deaths recorded on each day of the week.

Day of week     Mon.  Tues.  Wed.  Thurs.  Fri.  Sat.  Sun.
No. of deaths    22     7     6     13      5     4     6

Test the significance of these data in relation to the hypothesis that deaths are equally likely to occur on any day of the week.
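The simulation approach described above for the balanced-die test can be sketched as follows, assuming six equally probable categories, n = 100 tosses, and the goodness-of-fit statistic D = 2Σ fⱼ log(fⱼ/eⱼ) with expected counts eⱼ = 100/6; the fraction of simulated samples with D ≥ 3.70 estimates SL:

```python
import random
from math import log

# Monte Carlo estimate of SL = P{D >= 3.70 | H} for the balanced-die example.
random.seed(1)
n, k, reps = 100, 6, 20000
e = n / k                           # expected count per face under H

def lr_stat(freqs):
    # terms with f_j = 0 contribute zero to the sum
    return 2 * sum(f * log(f / e) for f in freqs if f > 0)

hits = 0
for _ in range(reps):
    freqs = [0] * k
    for _ in range(n):
        freqs[random.randrange(k)] += 1
    if lr_stat(freqs) >= 3.70:
        hits += 1

frac = hits / reps
print(frac)    # roughly P{chi2(5) >= 3.70}, i.e. near 0.59
```

The estimate agrees with the χ² approximation with k − 1 = 5 degrees of freedom, and its precision improves as the number of simulations is increased.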
7. (a) Let X₁, X₂, ..., Xₙ be IID Poisson variates with mean µ. Derive the likelihood ratio statistic for testing H: µ = µ₀.
(b) Prior to the installation of a traffic signal, there were 6 accidents per month (on the average) at a busy intersection. In the first year following the installation there were 53 accidents. Using an approximate likelihood ratio test, determine whether there is evidence that the accident rate has changed.

8.† (a) Let X₁, X₂, ..., Xₙ be independent exponential variates with mean θ. Derive the likelihood ratio statistic for testing H: θ = θ₀.
(b) Survival times for patients treated for a certain disease may be assumed to be exponentially distributed. Under the standard treatment, the expected survival is 37.4 months. Ten patients receiving a new treatment survived for the following times (in months):

99   8   30    6   53
60  44   12  105   17

(i) Are these data consistent with a mean survival time of 37.4 months?
(ii) The doctor who developed the new treatment claims that it gives a 50% increase in mean survival time. Are the data consistent with this claim?
(iii) Obtain a likelihood interval which is an approximate 95% confidence interval for the mean survival time under the new treatment.

9. (a) Let X₁, X₂, ..., Xₙ be IID normal variates with known standard deviation σ and unknown mean µ. Derive the likelihood ratio statistic for testing the hypothesis H: µ = µ₀.
(b) The measurement errors associated with a set of scales are independent normal with known standard deviation σ = 1.3 grams. Ten weighings of an unknown mass µ give the following results (in grams):

227.1  226.8  224.8  228.2  225.6
229.7  228.4  228.8  225.9  229.6

(i) Perform likelihood ratio tests of the hypothesis µ = 226, and the hypothesis µ = 229.
(ii) For which parameter values µ₀ does a likelihood ratio test of H: µ = µ₀ give a significance level of 5% or more?

10. Let X₁, X₂, ..., Xₙ be independent normal variates with known variances v₁, v₂, ..., vₙ and the same unknown mean µ. Show that the likelihood ratio statistic for testing H: µ = µ₀ is

D = (µ̂ − µ₀)² Σvᵢ⁻¹,

where µ̂ = (ΣXᵢvᵢ⁻¹)/Σvᵢ⁻¹. Show that, if H is true, the distribution of D is exactly χ²(1).

12.3. Likelihood Ratio Tests for Composite Hypotheses

In this section we extend the discussion of likelihood ratio tests to include composite hypotheses as well as simple hypotheses.

Suppose that the basic probability model for the experiment depends upon a vector of unknown parameters θ, and consider an hypothesis H concerning the value of θ. Together, the basic model and hypothesis determine the hypothesized model.

Let k denote the number of functionally independent unknown parameters in the basic probability model, and let q denote the number of functionally independent unknown parameters which remain in the hypothesized model. In general, it is not possible to test an hypothesis H unless it produces a real simplification in the model, so that q < k.

A simple hypothesis specifies numerical values for all of the unknown parameters in the basic probability model. Thus there are no unknown parameters in the hypothesized model, and so q = 0 for a simple hypothesis. A composite hypothesis does not completely eliminate the unknown parameters, and so q > 0 for a composite hypothesis.

Let l(θ) denote the log likelihood function of θ under the basic model. Let θ̂ be the MLE under the basic model, so that l(θ̂) ≥ l(θ) for all possible values of θ. The maximum log likelihood under the basic model is l(θ̂).

Next let θ̃ denote the MLE of θ under the hypothesized model. The maximum log likelihood under the hypothesis is l(θ̃). Since l(θ̂) ≥ l(θ) for all possible values of θ, we have l(θ̂) ≥ l(θ̃): the restricted maximum of l(θ) under the hypothesis cannot exceed the unrestricted maximum of l(θ).

The likelihood ratio statistic for testing the hypothesis H is defined to be twice the difference between these two maximum log likelihoods,

D = 2[l(θ̂) − l(θ̃)].   (12.3.1)

Note that D is twice the natural logarithm of a ratio of likelihoods,

D = 2 log[L(θ̂)/L(θ̃)],

and this explains its name.

Since l(θ̂) ≥ l(θ̃), D is non-negative. If D is small, then the maximum probability of the data is nearly as great under the hypothesis as it is under the basic model, and therefore the data are in good agreement with the hypothesis. A large value of D means that the data are much less probable under the hypothesis, and therefore the agreement is poor. Thus D ranks possible outcomes of the experiment according to how closely they agree with the hypothesis.

A simple hypothesis has the form H: θ = θ₀, where θ₀ is a vector of numerical values. Under H there is only one possible parameter value θ₀. Thus we have θ̃ = θ₀, and the maximum log likelihood under H is l(θ₀). Hence (12.3.1) is the same as (12.2.1) when H is a simple hypothesis.
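Problem 10 above asserts that D = (µ̂ − µ₀)² Σvᵢ⁻¹ is exactly χ²(1) when H is true, and this is easy to check by simulation. The sketch below (with arbitrary illustrative variances, not from the text) compares the simulated mean of D with 1 and the simulated tail probability P{D ≥ 3.841} with 0.05, the corresponding χ²(1) values:

```python
import random

# Simulation check of Problem 10: with independent X_i ~ N(mu, v_i) and
# mu_hat = sum(X_i / v_i) / sum(1 / v_i), the statistic
# D = (mu_hat - mu0)^2 * sum(1 / v_i) is exactly chi-square(1) under H.
random.seed(2)
v = [0.5, 1.0, 2.0, 4.0]       # known variances (illustrative values)
mu0 = 3.0                      # true mean; H: mu = mu0 holds
w = [1 / vi for vi in v]       # weights 1/v_i
reps = 50000

ds = []
for _ in range(reps):
    xs = [random.gauss(mu0, vi**0.5) for vi in v]
    mu_hat = sum(x * wi for x, wi in zip(xs, w)) / sum(w)
    ds.append((mu_hat - mu0)**2 * sum(w))

mean_d = sum(ds) / reps                     # chi-square(1) has mean 1
tail = sum(d >= 3.841 for d in ds) / reps   # should be near 0.05
print(round(mean_d, 3), round(tail, 4))
```

Both simulated quantities agree with the χ²(1) distribution, illustrating the exact distributional result that the χ² approximation of Section 12.3 generalizes.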
150 12. Tests of Significance 12.3. Likelihood Ratio Tests for Composite Hypotheses 151
Calculation of the Significance Level Usually we can find the MLE's ii and pby solving the simultaneous equations
where Dis the likelihood ratio statistic for testing H, and Dobs is the observed S i(a, /3 0 ) = 0.
value of D. The maximum log likelihood under the model is I(&, '/J). The maximum log
Calculation of the exact significance level is possible in some examples, but likelihood under the hypothesis H : f3 = {3 0 is /(&({3 0 ), /3 0 ). Hence the likelihood
in general there are both theoretical and computational difficulties. If H is ratio statistic for testing H : f3 = /3 0 is
composite, the exact significance level may well depend on the values of the q
unknown parameters in the hypothesized model. Sometimes this problem can D = 2[/(&, P) - l(&(f3o), Pon
be avoided by using a suitable conditional distribution to calculate the Note that, by (10.3.1), '
significance level, but then the calculations required may become unman- D = - 2r maA/30)
ageable. See Chapter 15 for further discussion of conditional tests.
where r max(/J) is the maximum log relative likelihood function of {3. We
Usually it is satisfactory to calculate an approximate significance level considered this likelihood ratio statistic in Section 11.5, and noted that its
using the x2 approximation to the distribution of the likelihood ratio statistic
distribution when f3 = {3 0 is approximately x2 with k - q = 1 degrees of
D. It can be shown that, under conditions similar to those described in
freedom. Thus we have
Section 11.3, the distribution of D when H is true is approximately x2 with
k - q degrees of freedom . When this approximation applies, we have SL= P{D;?: Dobsl/J = /30} ~ P {xfl) ;?: Dobs}·
(12.3.2) There is one degree of freedom for testing H : /1 = p0 , because it reduces the
number of unknown parameters by one.
which can be evaluated using Table B4.
The x2 approximation will generally be· quite accurate whenever the EXAMPLE 12.3.1. In Example 10.1.2 we considered the lifetimes x 1 , x 2 , ... , x"
number of independent observations in the experiment is large in comparison of n = 23 deep-groove ball bearings. These were assumed to be independent
with k, the number of parameters in the basic model. It is unwise to trust observations from a Weibull distribution with probability density function
(12.3.2) whenever 8 or ~ is on or near the boundary of the parameter space.
Note that the degrees of freedom for the x2 approximation is equal to k - q, for 0 < x < oo.
where k and q are the numbers of functionally independent unknown There are two unknown parameters, ,l > 0 and f3 > 0.
parameters in the basic model and hypothesized model, respectively. Thus the
We noted in Example 10.2.2 that the value p = 1 is of special importance,
degrees offreedom for testing H is equal to the number of unknown parameters
because when P= 1 the Weibull distribution simplifies to an exponential
which are eliminated by H.
distribution. Under an exponential distribution model, there is a constant
To conclude this section, we give two examples of likelihood ratio tests for risk of failure, and no deterioration or improvement with age. Thus we wish
composite hypotheses. Many additional examples will be found in the
to know whether the 23 observed lifetimes are consistent with the hypothesis
following sections.
P= 1.
To test H : p = 1, we shall compute the observed value of the likelihood
ratio statistic and then use the x2 approximation. Since H reduces the number
Testing H: f3 = /3 0 when rx is Unknown of unknown parameters by one, th.ere is one degree of freedom for the test.
From Example 10.1.2, the joint log likelihood function is
Suppose that the probability model involves two unknown parameters,
IJ =(a, {3), so that k = 2. Consider the hypothesis H : /3 =Po where f3 is a /(,l, p) = n log ,l + n log P+ (p- 1)1: log x; - ,lfaf,
particular numerical value. This is a composite hypothesis because no value is and the MLE's are
given for a. The hypothesized model involves the unknown parameter a, so p= 2.1021.
that q = 1.
Let /(a, p) be the joint log likelihood function of a and Punder the model. The maximum log likelihood under the model is
Let S 1 and S 2 be the two components of the score function as in Section 10.1. l(X, Pl = - 113.691.
12. Tests of Significance · 12.3. Likelihood Ratio Tests for Composite Hypotheses
Also from Example 10.1.2, the MLE of λ for given β is

    λ̂(β) = n/Σxᵢ^β.

Thus the MLE of λ under the hypothesis β = 1 is

    λ̂(1) = n/Σxᵢ = 23/1661 = 0.01385,

and the maximum log likelihood under H: β = 1 is

    l(0.01385, 1) = −121.433.

The observed value of the likelihood ratio statistic for testing H: β = 1 is twice the difference between these maximum log likelihoods:

    D_obs = 2[−113.691 + 121.433] = 15.48.

This result could also have been obtained from the expression for r_max(β) in Example 10.3.2. The χ² approximation gives

    SL ≈ P{χ²(1) ≥ 15.48} < 0.001

from Table B4. There is very strong evidence against the hypothesis β = 1. The observations are not compatible with the simpler exponential distribution model.
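The arithmetic of Example 12.3.1 can be checked in a few lines of Python. This is a sketch, not part of the text: the maximized log likelihood l(λ̂, β̂) = −113.691 and the total Σxᵢ = 1661 are taken from the calculations above, since the raw lifetimes are not reproduced in this section.

```python
from math import erfc, log, sqrt

n, sum_x = 23, 1661.0      # from the text: lam_hat(1) = n / sum(x_i) = 23/1661
l_full = -113.691          # l(lam_hat, beta_hat), quoted from Example 10.1.2

lam1 = n / sum_x                       # MLE of lambda when beta = 1
l_H = n * log(lam1) - lam1 * sum_x     # l(lam, 1) = n log(lam) - lam * sum(x_i)

D = 2.0 * (l_full - l_H)               # likelihood ratio statistic, 1 df
SL = erfc(sqrt(D / 2.0))               # P(chi2(1) >= D); exact for 1 df

print(round(lam1, 5), round(D, 2), SL)
```

Rounding in the quoted value −113.691 accounts for any difference in the second decimal of D from the 15.48 reported above; the significance level is far below 0.001 either way.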
Tests for Homogeneity

Suppose that two or more independent experiments give information about the same unknown parameter θ. If the experiments are in reasonable agreement with one another, we can pool or combine the information about θ by adding log likelihood functions (see Section 9.2 and Example 9.3.2). If, on the other hand, the experiments contradict one another, it would not be appropriate to combine them. Instead we would estimate θ separately for each experiment, and try to discover why the experiments produced dissimilar results.

Suppose that there are k independent experiments, and initially let us suppose that we have a different parameter θᵢ for each experiment. Let lᵢ(θᵢ) and θ̂ᵢ denote the log likelihood function and MLE for the ith experiment (i = 1, 2, ..., k). The overall log likelihood function is

    l(θ₁, θ₂, ..., θₖ) = l₁(θ₁) + l₂(θ₂) + ... + lₖ(θₖ) = Σlᵢ(θᵢ),

and its maximum value is Σlᵢ(θ̂ᵢ).

Now consider the hypothesis of homogeneity,

    H: θ₁ = θ₂ = ... = θₖ,

and let θ denote the unknown common value of the θᵢ's. Under H, the log likelihood function is Σlᵢ(θ), and we maximize this to obtain the combined or pooled MLE, θ̃ say. The maximum of the log likelihood under H is Σlᵢ(θ̃), and by (12.3.1) the likelihood ratio statistic for testing H is

    D = 2[Σlᵢ(θ̂ᵢ) − Σlᵢ(θ̃)] = −2Σrᵢ(θ̃),

where rᵢ is the log RLF from the ith experiment:

    rᵢ(θ) = lᵢ(θ) − lᵢ(θ̂ᵢ).

If D is large, there is no parameter value which is reasonably plausible in all experiments, and hence the experiments give conflicting information about θ. There are k − 1 degrees of freedom for testing H because it reduces the number of unknown parameters from k to 1. Hence (12.3.2) gives

    SL ≈ P{χ²(k−1) ≥ D_obs}.

A small significance level is evidence that the homogeneity hypothesis is false, and that the information from the k experiments should not be pooled.
EXAMPLE 12.3.2. In Examples 9.2.2 and 9.3.2 we considered data from k = 2 experiments with test tubes containing river water. The parameter of interest is μ, the expected number of bacteria per ml of river water. For the first experiment the log RLF is

    r₁(μ) = −280μ + 12 log(1 − e^(−10μ)) + 24.43,

and for the second experiment we have

    r₂(μ) = −37μ + 3 log(1 − e^(−μ)) + 10.66.

The pooled MLE of μ based on the data from both experiments was found to be μ̃ = 0.04005. Hence the observed value of the likelihood ratio statistic for testing homogeneity is

    D_obs = −2[r₁(0.04005) + r₂(0.04005)] = 1.24.

There is just one degree of freedom for testing H, and thus

    SL ≈ P{χ²(1) ≥ 1.24} > 0.25.

There is no evidence against the homogeneity hypothesis, and it is reasonable to pool information about μ as we did in Example 9.3.2.
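The homogeneity computation in Example 12.3.2 is easy to reproduce by machine. In this Python sketch (not from the text), r₁, r₂, and the pooled MLE μ̃ = 0.04005 are exactly as given above.

```python
from math import erfc, exp, log, sqrt

def r1(mu):
    """Log RLF from the first experiment (Example 12.3.2)."""
    return -280.0 * mu + 12.0 * log(1.0 - exp(-10.0 * mu)) + 24.43

def r2(mu):
    """Log RLF from the second experiment."""
    return -37.0 * mu + 3.0 * log(1.0 - exp(-mu)) + 10.66

mu_pooled = 0.04005                            # pooled MLE from Example 9.3.2
D = -2.0 * (r1(mu_pooled) + r2(mu_pooled))     # D = -2 * sum of r_i at mu~
SL = erfc(sqrt(D / 2.0))                       # P(chi2(1) >= D)
print(round(D, 2), round(SL, 2))
```

The computed D agrees with the value 1.24 above, and the tail probability exceeds 0.25, confirming that the two experiments may be pooled.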
PROBLEMS FOR SECTION 12.3

1.† Suppose that X₁, X₂, and X₃ have a trinomial distribution with index n and probability parameters p₁, p₂, p₃, where Σpⱼ = 1. The log likelihood function is

    l(p₁, p₂, p₃) = ΣXⱼ log pⱼ,

and the observed values of the Xⱼ's are 32, 46, and 22 (see Problem 9.1.4).
(a) Find the maximum log likelihood when
(i) pⱼ is estimated as Xⱼ/n for j = 1, 2, 3;
(ii) the pⱼ's satisfy the hypothesis H, which expresses the pⱼ's in terms of a single parameter θ;
(iii) the pⱼ's satisfy H with θ = ½.
(b) Use the results from (i) and (ii) above to test the hypothesis H.
(c) Use the results from (ii) and (iii) to test whether θ = ½, assuming H to be true.
(d) Use the results from (i) and (iii) to test the hypothesis that the pⱼ's take the values implied by H with θ = ½. Note that the likelihood ratio statistic and degrees of freedom are the totals of those in (b) and (c).
2. A genetics experiment yields observations X₁, X₂, X₃, X₄ with multinomial probabilities depending on an unknown parameter p, where ΣXᵢ = n. The following are the results from three independent repetitions of the experiment:

    Repetition 1    26    7    9    22
    Repetition 2    24    9    9    22
    Repetition 3    23    9   12    20

Test the hypothesis that the value of p is the same in all three repetitions.

3. (a) Let Y₁, Y₂, ..., Yₖ be independent Poisson variates with means μ₁, μ₂, ..., μₖ. Show that the likelihood ratio statistic for testing H: μ₁ = μ₂ = ... = μₖ is given by

    D = 2ΣYᵢ log(Yᵢ/Ȳ).

(b) The numbers of cancer cells surviving a treatment in each of three replications of an experiment were 235, 184, and 189. Test the hypothesis that these three observations come from the same Poisson distribution.

4. (a) Suppose that Y₁, Y₂, ..., Yₙ are independent Poisson variates with means μ₁, μ₂, ..., μₙ. Let P₁, P₂, ..., Pₙ be known constants. Consider the hypothesis

    H: μ₁ = λP₁, μ₂ = λP₂, ..., μₙ = λPₙ,

where λ is unknown. Show that the likelihood ratio statistic for testing H is

    D = 2ΣYᵢ log(Yᵢ/μ̂ᵢ),

where μ̂ᵢ = λ̂Pᵢ and λ̂ = (ΣYᵢ)/(ΣPᵢ).
(b) In Problem 9.2.2(b), test the hypothesis that the death rates for the 10 regions are proportional to the populations of the regions.

5.† (a) Let X₁, X₂, ..., Xₙ and Y₁, Y₂, ..., Yₘ be independent Poisson variates. The Xᵢ's have expected value μ₁, and the Yᵢ's have expected value μ₂. Derive the likelihood ratio statistic for testing the hypothesis μ₁ = μ₂.
(b) Bacteria counts were made for 27 volumes of river water, each of unit volume. The bacteria are assumed to be randomly and uniformly distributed throughout the river water, with μ₁ per unit volume at location 1, and μ₂ per unit volume at location 2. Test the hypothesis μ₁ = μ₂.

6. Let X₁, X₂, ..., Xₙ be independent exponential variates with mean θ₁, and let Y₁, Y₂, ..., Yₘ be independent exponential variates with mean θ₂. Show that the likelihood ratio statistic for testing H: θ₁ = θ₂ depends only on n, m, and X̄/Ȳ.

7. Suppose that k independent experiments give log RLF's r₁, r₂, ..., rₖ and MLE's θ̂₁, θ̂₂, ..., θ̂ₖ for the same unknown parameter θ. Furthermore, suppose that the normal approximation applies to each of the rᵢ's:

    rᵢ(θ) ≈ −½(θ − θ̂ᵢ)²cᵢ,

where cᵢ = 𝒥ᵢ(θ̂ᵢ).
(a) Show that the MLE of θ based on all k experiments is approximately equal to θ̃, where

    θ̃ = (Σcᵢθ̂ᵢ)/(Σcᵢ).

(b) Show that the likelihood ratio statistic for testing H: θ = θ₀ is approximately (θ̃ − θ₀)²Σcᵢ.
(c) Show that the likelihood ratio statistic for testing the homogeneity hypothesis H: θ₁ = θ₂ = ... = θₖ is approximately Σ(θ̂ᵢ − θ̃)²cᵢ.
(d) What are the approximate distributions of the likelihood ratio statistics in (b) and (c)?

8. Continuation of Problem 7. Seven different dilution series experiments were used to estimate a parameter h, called the "hit number". The MLE ĥ and observed information 𝒥 are given below for each of the seven experiments.

    ĥ    2.028   2.108   1.912   1.675   1.730   1.808   1.889
    𝒥   19.63   25.18   32.34   70.54   64.88   67.63   36.58

In each case, the likelihood function was approximately normal in h.
(a) Are these results consistent with a common value of h in all seven experiments?
(b) Are the combined results consistent with the theoretical value h = 2?

9.† Continuation of Problem 7. Suppose that three independent experiments give likelihood functions that are approximately normal in θ, with the following summary statistics:

    θ̂₁ = 9.74         θ̂₂ = 8.35         θ̂₃ = 10.27
    𝒥₁(θ̂₁) = 0.563    𝒥₂(θ̂₂) = 0.345    𝒥₃(θ̂₃) = 0.695

(a) Test the hypothesis that the value of θ is the same in all three experiments.
(b) Obtain four approximate 95% confidence intervals for θ, one from each
experiment taken separately, and one from the combined results of all three experiments.

10. Consider the situation described in Problem 10.1.5. Testing stops when there have been m failures with each treatment. Let X₁, X₂, ..., Xₘ be the numbers of successes with treatment A, and let Y₁, Y₂, ..., Yₘ be the numbers of successes with treatment B. Derive the likelihood ratio statistic for testing the hypothesis α = β.

11. Let X₁, X₂, ..., Xₙ and Y₁, Y₂, ..., Yₘ be independent normal variates, all with the same known variance σ². The Xᵢ's have mean μ₁ and the Yᵢ's have mean μ₂.
(a) Show that the likelihood ratio statistic for testing H: μ₁ = μ₂ is

    D = (1/σ²)[n(X̄ − μ̂)² + m(Ȳ − μ̂)²] = (nm/(n + m)) · (X̄ − Ȳ)²/σ²,

where μ̂ = (nX̄ + mȲ)/(n + m).
(b) Find the distribution of X̄ − Ȳ. Hence show that the distribution of D is exactly χ²(1).

12.4. Tests for Binomial Probabilities

Suppose that k different treatments are to be compared on the basis of success/failure data. The first treatment is given to n₁ subjects and y₁ successes are observed. The second treatment is given to n₂ different subjects and y₂ successes are observed. The results of an experiment with k treatments can be summarized in a table as follows:

    Treatment no.        1         2        ...     k
    No. of successes     y₁        y₂       ...     yₖ
    No. of failures      n₁ − y₁   n₂ − y₂  ...     nₖ − yₖ
    Total                n₁        n₂       ...     nₖ

We wish to make inferences about the success probabilities p₁, p₂, ..., pₖ on the basis of the observed results.

We assume that Yᵢ, the number of successes with treatment i, has a binomial (nᵢ, pᵢ) distribution, and that the Yᵢ's are independent. The basic model involves a vector of k different unknown parameters, p = (p₁, p₂, ..., pₖ), where pᵢ is the success probability for the ith treatment. The log likelihood function is

    l(p) = Σyᵢ log pᵢ + Σ(nᵢ − yᵢ) log(1 − pᵢ).

The MLE of pᵢ is p̂ᵢ = yᵢ/nᵢ, and the maximum of the log likelihood under the basic model is

    l(p̂) = Σyᵢ log(yᵢ/nᵢ) + Σ(nᵢ − yᵢ) log((nᵢ − yᵢ)/nᵢ).

Now suppose that we wish to test an hypothesis H about the pᵢ's. For instance, we may wish to test that they are equal:

    H₁: p₁ = p₂ = ... = pₖ.

Their common value is not given, so there is one unknown parameter under H₁. Alternatively, if the k treatments are different doses d₁, d₂, ..., dₖ of a drug, we might wish to test the hypothesis

    H₂: pᵢ = exp(α + βdᵢ)/[1 + exp(α + βdᵢ)]    for i = 1, 2, ..., k,

which states that the response probability is related to the dose via the logistic model (10.5.1). There are two unknown parameters, α and β, under H₂.

Assuming H to be true, we can rewrite the log likelihood as a function of the q remaining unknown parameters and find their MLE's. From these we can compute p̃₁, p̃₂, ..., p̃ₖ, the MLE's of the original probability parameters under H. The maximum of the log likelihood is then

    l(p̃) = Σyᵢ log p̃ᵢ + Σ(nᵢ − yᵢ) log(1 − p̃ᵢ).

By (12.3.1), the likelihood ratio statistic for testing H is

    D = 2[l(p̂) − l(p̃)] = 2Σyᵢ log(yᵢ/(nᵢp̃ᵢ)) + 2Σ(nᵢ − yᵢ) log((nᵢ − yᵢ)/(nᵢ(1 − p̃ᵢ))).

Note that nᵢp̃ᵢ and nᵢ(1 − p̃ᵢ) are the expected numbers of successes and failures for the ith treatment under H, whereas yᵢ and nᵢ − yᵢ are the observed frequencies. Thus we can write

    D = 2Σ(obs freq) · log(obs freq / exp freq),    (12.4.1)

where the sum extends over all 2k classes (successes and failures). The degrees of freedom for testing H is k − q, where q is the number of unknown parameters which remain under H. By (12.3.2) we have

    SL ≈ P{χ²(k−q) ≥ D_obs}.

The approximation will be accurate provided that all of the expected frequencies nᵢp̃ᵢ and nᵢ(1 − p̃ᵢ) are fairly large.

EXAMPLE 12.4.1. The food additive "Red Dye Number 2" was fed to 44 rats at a low dose and to 44 rats at a high dose. Later the rats were examined for tumors, and the results were as follows:

    Treatment        Low dose    High dose
    Tumor present     4 (9)      14 (9)
    No tumor         40 (35)     30 (35)
    Total            44          44
Note that 32% developed tumors at the high dose, and only 9% developed tumors at the low dose. Could these results have arisen by chance, or is there evidence of a real dose effect?

Let Y₁ and Y₂ be the numbers of rats with tumors at the low and high doses, respectively. We assume that Y₁ and Y₂ are independent, with Y₁ ~ binomial (n₁, p₁) and Y₂ ~ binomial (n₂, p₂), where n₁ = n₂ = 44. We wish to know whether there is conclusive evidence against H: p₁ = p₂.

Let p denote the unknown common value of p₁ and p₂ under the hypothesis H. From Example 9.2.2, the MLE of p is

    p̃ = (y₁ + y₂)/(n₁ + n₂) = (4 + 14)/(44 + 44) = 9/44.

Under H we have p̃₁ = p̃₂ = 9/44, and the expected frequencies are

    n₁p̃₁ = 9;    n₁(1 − p̃₁) = 35;    n₂p̃₂ = 9;    n₂(1 − p̃₂) = 35.

The table above shows these values in parentheses. By (12.4.1), the observed value of the LR statistic is

    D_obs = 2[4 log(4/9) + 14 log(14/9) + 40 log(40/35) + 30 log(30/35)] = 7.32.

Since H reduces the number of unknown parameters from 2 to 1, there is one degree of freedom for testing H, and

    SL ≈ P{χ²(1) ≥ 7.32} < 0.01.

Results as extreme as those observed would rarely occur if p₁ and p₂ were equal, and therefore we have strong evidence against H: p₁ = p₂. The incidence of tumors is greater at the high dose than at the low dose, and the difference is too large to be attributed to chance.
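Formula (12.4.1) is straightforward to apply by computer. The Python sketch below (not from the text) defines a small helper, here called `lr_binomial`, which implements (12.4.1) for the equality hypothesis H: p₁ = p₂ = ... = pₖ, and applies it to the Red Dye data.

```python
from math import erfc, log, sqrt

def lr_binomial(y, n):
    """Likelihood ratio statistic (12.4.1) for H: p1 = ... = pk,
    given successes y[i] out of n[i].  Returns (D, degrees of freedom)."""
    p = sum(y) / sum(n)                       # pooled MLE under H
    D = 0.0
    for yi, ni in zip(y, n):
        for obs, expd in ((yi, ni * p), (ni - yi, ni * (1.0 - p))):
            if obs > 0:                       # a zero observed count contributes 0
                D += 2.0 * obs * log(obs / expd)
    return D, len(y) - 1                      # H reduces k parameters to 1

D, df = lr_binomial([4, 14], [44, 44])        # Red Dye data, Example 12.4.1
SL = erfc(sqrt(D / 2.0))                      # P(chi2(1) >= D), since df = 1
print(round(D, 2), SL < 0.01)
```

The same helper applies unchanged to Problems 1 to 3 below, where k is 2 or 4.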
EXAMPLE 12.4.2. Table 10.5.1 shows the data from an experiment in which an insecticide was administered in k = 5 doses. We assume that Yᵢ, the number killed at dose dᵢ, has a binomial (nᵢ, pᵢ) distribution, and that results for different doses are independent. We wish to determine whether the logistic dose-response model (10.5.1) is compatible with these data. Thus the hypothesis of interest is

    H: pᵢ = exp(α + βdᵢ)/[1 + exp(α + βdᵢ)]    for i = 1, 2, ..., 5,

where α and β are unknown parameters.

We showed in Example 10.5.1 that the MLE's of α and β are α̂ = −4.8869, β̂ = 3.1035. Using these values, we computed estimated probabilities p̃ᵢ and then found the expected frequencies nᵢp̃ᵢ and nᵢ(1 − p̃ᵢ) (see Table 10.5.2). Now, by (12.4.1), the observed value of the LR statistic for testing H is

    D_obs = 2Σ(obs freq) · log(obs freq / exp freq) = 1.42.

Since H reduces the number of unknown parameters from 5 to 2, there are three degrees of freedom for the test, and

    SL ≈ P{χ²(3) ≥ 1.42} > 0.5.

The observed value of D is not unusually large, and hence there is no evidence against the hypothesis of a logistic dose-response curve.

We concluded previously, after informal inspections of Figure 10.5.2 and Table 10.5.2, that the logistic model fits the data well. The LR test just performed provides a more formal justification of this conclusion. The test tells us whether the observed discrepancies can be attributed to chance variations. Tables and graphs tell us what kinds of departures have occurred and how large they are. Both significance tests and less formal methods are useful in assessing the fit of the model.

PROBLEMS FOR SECTION 12.4

1. Two hundred volunteers participated in an experiment to examine the effectiveness of vitamin C in preventing the common cold. One hundred of them were selected at random to receive a daily dose of vitamin C, and the others received a placebo. None of the volunteers knew which group they were in. During the test period, 20 of those taking vitamin C and 35 of those receiving the placebo caught colds. Test the hypothesis that the probability of catching a cold is the same for both groups.

2. A seed dealer claims that his sweet pea seeds have a germination rate of 80%. A customer purchased 4 packages of sweet pea seeds, one package of each of four colors. He planted 100 seeds from each package. The numbers of seeds germinating within one month were as follows:

                      Red    White    Blue    Yellow
    Germination        75      66      81       74
    No germination     25      34      19       26

(a) Test the hypothesis that the germination rate is 80% for all four colors.
(b) Test the hypothesis that the germination rate is the same for all four colors (but not necessarily 80%).
(c) Assuming that the germination rate is the same for all four colors, test the hypothesis that it is 80%.
(d) How are the likelihood ratio statistics in (a), (b), and (c) related?

3.† Four hundred patients took part in a study to compare the effectiveness of three similar drugs. Each drug was given to 100 patients, and the remaining 100 patients received a placebo. It was then observed whether or not there was improvement in the condition of each patient. The results were as follows:

                      Drug A    Drug B    Drug C    Placebo
    Improvement         24        19        29        10
    No improvement      76        81        71        90
(a) Test the hypothesis that the probability of improvement is the same in all four groups.
(b) Test the hypothesis that the three drugs are equally effective.
(c) Assuming that the three drugs are equally effective, test the hypothesis that the success rate is the same for those receiving a drug as for those receiving the placebo.
(d) How are the likelihood ratio statistics in (a), (b), and (c) related?

4. An experiment involved exposing a large number of cancer cells to a treatment and then observing how many survived. There were two treatments, each of which was applied to two different groups of cells. The results were as follows:

    Treatment             A        A         B         B
    Number of cells     48000    48000    192000    192000
    Number surviving      7        9        49        39

Assume that cells respond independently, and that the survival probabilities for the four groups are α₁, α₂, β₁, and β₂, respectively.
(a) Test the hypothesis H: α₁ = α₂, β₁ = β₂.
(b) Assuming that the hypothesis in (a) is true, test the hypothesis that the survival probability is the same for both treatments.

5. Test the hypothesis of a logistic dose-response model in Problem 10.5.1.

6.† Test the hypothesis p = e^(α+βd) in Problem 10.5.2.

7. An interviewer in a shopping plaza asks individuals who pass by if they are willing to fill in a questionnaire. He keeps asking people until 30 agree. The following are the numbers of refusals he receives on each of six days.

    Day                1     2     3     4     5     6
    Number refusing   70    67    80    62   100   112

Assume that individuals respond independently, and that each individual questioned on the ith day has probability pᵢ of responding. Test the hypothesis p₁ = p₂ = ... = p₆.
Note: Since the distribution of the number refusing is negative binomial rather than binomial, you will need to derive the likelihood ratio statistic from first principles.

12.5. Tests for Multinomial Probabilities

Suppose that we have data from n independent repetitions of an experiment, and that we wish to assess how well the data agree with an hypothesized probability model for the experiment. One way of doing this is to construct a table of observed frequencies, which are then compared with expected frequencies under the hypothesized model (see Section 1.4). A test of significance may be used to determine whether the discrepancy between observed and expected frequencies is too great to be attributed to chance.

To construct a frequency table, we partition the sample space S for a single repetition into k mutually exclusive classes or events, S = A₁ ∪ A₂ ∪ ... ∪ Aₖ. Let pⱼ be the probability of event Aⱼ, and let fⱼ be the number of times that Aⱼ occurs in the n repetitions. Exactly one of the events must occur in each repetition, so Σpⱼ = 1 and Σfⱼ = n.

Under the assumption of independent repetitions, the distribution of the fⱼ's is multinomial with joint probability function

    (n!/(f₁! f₂! ... fₖ!)) p₁^f₁ p₂^f₂ ... pₖ^fₖ.

The log likelihood function is

    l(p) = l(p₁, p₂, ..., pₖ) = Σfⱼ log pⱼ,

where Σpⱼ = 1. It can be shown that, subject to this condition, l(p) is maximized for p̂ⱼ = fⱼ/n. Hence the maximum log likelihood under the basic multinomial model is

    l(p̂) = Σfⱼ log(fⱼ/n).

The hypothesized model will determine the pⱼ's numerically or as functions of unknown parameters. We find the MLE's of any unknown parameters and use these to compute p̃₁, p̃₂, ..., p̃ₖ, the MLE's of the pⱼ's under the hypothesized model. The maximum of the log likelihood is then

    l(p̃) = Σfⱼ log p̃ⱼ.

By (12.3.1), the likelihood ratio statistic for testing the model is

    D = 2[l(p̂) − l(p̃)] = 2Σfⱼ log(fⱼ/eⱼ),    (12.5.1)

where eⱼ = np̃ⱼ is the estimated expected frequency for the jth class under the hypothesized model. Note that (12.5.1) has the same form as (12.4.1), but the sum is now taken over k classes rather than 2k classes.

Since the k probabilities p₁, p₂, ..., pₖ must sum to 1, there are only k − 1 functionally independent parameters in the basic multinomial model. Let q be the number of unknown parameters in the hypothesized model. Then there are (k − 1) − q degrees of freedom for the χ² approximation, and (12.3.2) gives

    SL ≈ P{χ²(k−1−q) ≥ D_obs}.

Classes for which eⱼ ≈ 0 but fⱼ ≥ 1 will have a big effect on D_obs, and the χ² approximation should not be trusted when the eⱼ's are small. The usual rule of thumb is that the eⱼ's should all be at least 5, but an occasional smaller value is not too harmful.

Another test statistic which may be used with multinomial or binomial data is the Pearson goodness of fit statistic,

    Σ(fⱼ − eⱼ)²/eⱼ.    (12.5.2)

The observed value of this statistic will be very nearly equal to that of the likelihood ratio statistic (12.5.1) or (12.4.1) when the eⱼ's are very large, and the same χ² approximation can be used.
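The closeness of (12.5.1) and (12.5.2) is easy to see numerically. In the Python sketch below, the frequencies are hypothetical (n = 60 repetitions, k = 3 equally likely classes); both statistics are computed for the same table.

```python
from math import exp, log

f = [18, 25, 17]               # hypothetical observed frequencies
e = [20.0, 20.0, 20.0]         # expected under H: p1 = p2 = p3 = 1/3

D  = 2.0 * sum(fj * log(fj / ej) for fj, ej in zip(f, e))   # (12.5.1)
X2 = sum((fj - ej) ** 2 / ej for fj, ej in zip(f, e))       # (12.5.2), Pearson

df = len(f) - 1                # (k - 1) - q degrees of freedom, with q = 0
SL = exp(-D / 2.0)             # P(chi2(2) >= D); closed form for 2 df
print(round(D, 2), round(X2, 2), round(SL, 2))
```

With moderately large expected frequencies the two statistics differ only in the second decimal place, as the text asserts.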
Significance tests for multinomial data using test statistic (12.5.1) or (12.5.2) are often called goodness of fit tests.

EXAMPLE 12.5.1. In Example 12.2.2 the basic model was multinomial with k = 6 classes, and we carried out a likelihood ratio test of the hypothesis

    H: p₁ = p₂ = ... = p₆ = 1/6.

The data were the observed frequencies 16, 15, 14, 20, 22, 13 from 100 rolls of a die. This analysis can be simplified by using formula (12.5.1) to compute the observed value of the likelihood ratio statistic. Under H, each class has expected frequency eⱼ = 100(1/6) = 16.67. Now (12.5.1) gives

    D_obs = 2[16 log(16/16.67) + 15 log(15/16.67) + ... + 13 log(13/16.67)] = 3.70,

which agrees with the result in Example 12.2.2. Since H reduces the number of unknown parameters from 6 − 1 = 5 to 0, there are 5 degrees of freedom for the test, and

    SL ≈ P{χ²(5) ≥ 3.70} > 0.5

as before. There is no evidence against the hypothesis of a balanced die.

EXAMPLE 12.5.2. The following are the observed frequencies from the ESP experiment in Example 12.1.2:

    No. correct j     0       1       2       4     Total
    Obs freq fⱼ      17      18       9       6      50
    Exp freq eⱼ     18.75   16.67   12.50    2.08    50

Under the basic model, the fⱼ's come from a multinomial distribution with k = 4 classes. If there is no ESP, the four classes have probabilities 9/24, 8/24, 6/24, and 1/24, so the hypothesis of interest is

    H: p₁ = 9/24, p₂ = 8/24, p₃ = 6/24, p₄ = 1/24.

We multiply these four probabilities by 50 to get the expected frequencies under H. By (12.5.1), the observed value of the LR statistic is

    D_obs = 2[17 log(17/18.75) + ... + 6 log(6/2.08)] = 6.22.

H reduces the number of unknown parameters from 4 − 1 = 3 to 0, so

    SL ≈ P{χ²(3) ≥ 6.22} ≈ 0.10.

If H were true, one would obtain D ≥ 6.22 in about 10% of repetitions of the experiment. Therefore we do not have conclusive evidence against the hypothesis.

This test and the one in Example 12.1.2 give about the same significance level for these data, but in other examples they may give quite different results. For instance, suppose that

    f₀ = 25,  f₁ = 10,  f₂ = 7,  f₄ = 8.

Then T_obs = 56, and the test of Example 12.1.2 gives SL ≈ 0.2. However, the likelihood ratio test gives D_obs = 17.58, so that

    SL ≈ P{χ²(3) ≥ 17.58} < 0.001.

The total number of correct guesses is not far from the expected number under H, but the observed frequencies are not at all like what we'd expect under H.

The likelihood ratio statistic (12.5.1) is a "general purpose" measure which does not look for any specific type of departure from H. The test statistic used in Example 12.1.2 was designed to detect a particular type of departure: an excess of correct guesses. It is more sensitive to departures of the type anticipated, but it may fail to detect substantial departures of other kinds.

EXAMPLE 12.5.3. In Example 4.4.3 we considered the distribution of flying-bomb hits over 576 regions of equal area in south London. The following table shows the number of regions fⱼ which suffered exactly j hits (j = 0, 1, 2, ...):

    No. of hits j     0       1       2       3      4      ≥5     Total
    Obs freq fⱼ      229     211     93      35      7      1      576
    Exp freq eⱼ     226.74  211.39   98.54   30.62   7.14   1.57   576

One region received 7 hits, and the total number of hits observed on all 576 regions is

    Σjfⱼ = 229×0 + 211×1 + ... + 7×4 + 1×7 = 537.

Under the basic model, the fⱼ's come from a multinomial distribution with k = 6 classes. If points of impact are randomly and uniformly distributed over the study region, the number of hits in an area should have a Poisson distribution. Thus we consider the hypothesis

    H: pⱼ = μʲe^(−μ)/j!    for j = 0, 1, 2, ...,

where μ is an unknown parameter. Under H, the log likelihood function is

    Σfⱼ log pⱼ = (Σjfⱼ) log μ − μΣfⱼ − Σfⱼ log j!,
from which the MLE is found to be

    μ̂ = Σjfⱼ/Σfⱼ = 537/576 = 0.9323.

(This is not quite right; see the note at the end of this example.) Using this estimate, we can find p̃ⱼ and eⱼ = 576p̃ⱼ for j = 0, 1, ..., 4. The expected frequency for the last class is then obtained by subtraction from the total (see Example 4.4.3).

The observed value of the LR statistic is

    D_obs = 2Σfⱼ log(fⱼ/eⱼ) = 1.18.

The hypothesis reduces the number of unknown parameters from k − 1 = 5 to 1, so

    SL ≈ P{χ²(4) ≥ 1.18} ≈ 0.9.

There is no evidence against the hypothesis. The observed frequencies are in close agreement with the expected frequencies from a Poisson distribution.

The expected frequency in the last class is only 1.57, and we might therefore have some concern about the adequacy of the χ² approximation. To check this, we could combine the last two classes into a single class (≥4) with f = 7 + 1 = 8 and e = 7.14 + 1.57 = 8.71. Summing over the k = 5 classes gives D_obs = 1.00 with (5 − 1) − 1 = 3 degrees of freedom, and

    SL ≈ P{χ²(3) ≥ 1.00} ≈ 0.8.

The conclusion is the same as before.

Note. In calculating μ̂ above, we used the fact that the observation in class ≥5 was "7". Strictly speaking, μ̂ should be obtained using only the information in the frequency table. For the first test with k = 6 we have p̃ⱼ = μʲe^(−μ)/j! for 0 ≤ j ≤ 4, and

    p≥5 = 1 − e^(−μ)(1 + μ + μ²/2! + μ³/3! + μ⁴/4!),

so the appropriate log likelihood function is

    Σ fⱼ log pⱼ + f≥5 log[1 − e^(−μ)(1 + μ + ... + μ⁴/4!)],

the sum being over j = 0 to 4. Maximizing this by Newton's method or trial and error gives μ̂ = 0.9291, and this is the estimate which should be used in computing the eⱼ's. The result is a slightly better fit (D_obs = 1.17 rather than 1.18), but no change in the conclusion.

Similarly, when we combine the last two classes, the appropriate log likelihood function is

    Σ fⱼ log pⱼ + f≥4 log[1 − e^(−μ)(1 + μ + μ²/2! + μ³/3!)],

the sum being over j = 0 to 3, which is maximized for μ̂ = 0.9300. Recomputing the expected frequencies with this value of μ gives D_obs = 0.99 rather than 1.00.

In general, if the value of μ used in the calculations is not the "true" MLE, D_obs will be too large. However, unless there is a substantial amount of grouping, the difference will usually be too small to matter.
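Example 12.5.3 can be reproduced directly from the frequency table. This Python sketch (not from the text) uses the simpler estimate μ̂ = 537/576, as in the main calculation, and obtains the last expected frequency by subtraction.

```python
from math import exp, factorial, log

f = [229, 211, 93, 35, 7, 1]     # observed frequencies for j = 0..4 and >= 5
n = sum(f)                       # 576 regions
mu = 537.0 / 576.0               # mu_hat = (total hits) / (number of regions)

# Poisson expected frequencies for j = 0..4; last class by subtraction:
e = [n * mu ** j * exp(-mu) / factorial(j) for j in range(5)]
e.append(n - sum(e))

D = 2.0 * sum(fj * log(fj / ej) for fj, ej in zip(f, e))
SL = exp(-D / 2.0) * (1.0 + D / 2.0)   # P(chi2(4) >= D); closed form for 4 df
print(round(D, 2), round(SL, 2))
```

The computed expected frequencies match the table (226.74, 211.39, ...), and D agrees with the value 1.18 found above.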
EXAMPLE 12.5.4. Consider the set of 109 waiting times between mining accidents which we discussed in Sections 1.2 and 1.4. If accidents occur randomly in time at the constant rate of λ per day, the time T between successive accidents has an exponential distribution with mean θ = 1/λ (see Section 6.5). Here we have n = 109 observations t₁, t₂, ..., tₙ, and we wish to determine whether an exponential distribution model is satisfactory.

One way to examine the fit of the model is to group the data into k classes [a_(j−1), aⱼ) and prepare a frequency table (see Example 1.2.1). For the exponential distribution we have

    P(a_(j−1) ≤ T < aⱼ) = exp(−a_(j−1)/θ) − exp(−aⱼ/θ),

and so the hypothesis of interest is

    H: pⱼ = exp(−a_(j−1)/θ) − exp(−aⱼ/θ)    for j = 1, 2, ..., k.

There will be k − 2 degrees of freedom for testing H because it reduces the number of unknown parameters from k − 1 to 1.

Table 12.5.1 is obtained from Table 1.4.1 by combining the last two classes. The eⱼ's were computed using θ̂ = t̄ = 241, which is the MLE based on the original set of 109 measurements.

Table 12.5.1. Observed and Expected Frequencies for the Mining Accident Data of Example 1.2.1

    Class          fⱼ     eⱼ        Class           fⱼ     eⱼ
    [0, 50)        25     20.42     [300, 350)      11      5.88
    [50, 100)      19     16.60     [350, 400)       6      4.78
    [100, 150)     11     13.49     [400, 600)       5     11.69
    [150, 200)      8     10.96     [600, 1000)      3      7.32
    [200, 250)      9      8.91     [1000, ∞)        5      1.72
    [250, 300)      7      7.24     Total          109    109.01

Now (12.5.1) gives

    D_obs = 2Σfⱼ log(fⱼ/eⱼ) = 18.79.

Since there are k = 11 classes, we have

    SL ≈ P{χ²(9) ≥ 18.79} ≈ 0.025.

Thus there is some evidence against the exponential distribution model.

The expected frequency for the last class is only 1.72, and we might be tempted to combine the last two classes as we did in Example 12.5.3.
We would then obtain D_obs = 11.51 with 8 degrees of freedom, and SL ≈ 0.2. The result is now quite different because the deviations in the last two classes were in opposite directions and have cancelled one another. Combining these classes is not a good idea because it obscures the difficulties in the right hand tail of the distribution. In fact, the model is not appropriate for these data because the accident rate λ is not constant over time (see Example 1.4.2).

A difficulty with the above analysis is that the results obtained will depend to some extent upon the arbitrary grouping used to produce the frequency table. It is a good idea to try three or four different groupings and check that similar results are obtained. Alternatively, the fit of the model can be checked via informal graphical procedures (see Example 6.3.1).
The note following Example 12.5.3 applies to this example as well. The
6.t According to genetic theory, blood types MM, MN, and NN should occur in a
likelihood function based on the frequency table is
very large population with relative frequencies 8 2 , 28(1 -8), and (I - 8) 2 where 8
is the (unknown) gene frequency .
where p1 = exp(-a1_ 1 /8)-exp(-a1/8),
(a) The observed frequencies in a sample of size 100 from the population were 33,
and maximizing this function gives~= 232.5. This value, not()= 241 , should 44, and 23, respectively. Test the goodness of fit of the model to these data.
properly have been used in computing expected frequencies. We would then (b) Suppose that the observed frequencies in a sample of size 400 were exactly
have obtained D b• = 18.65 instead of D b, = 18.79, a change which is too four times those given in (a). Carry out a goodness of fit test and explain why it
0
0
gives a different result than that in (a).
small to be of any practical importance.
7. Test the goodness of fit of the model in Problem 10.1.1.
PROBLEMS FOR SECTION 12.5 8. Test the goodness of fit of the exponential distribution model in Problem 9.6.3.
I. In Problem 12.1 .9, carry out a goodness offit test of the hypothesis that blocks are 9. Test the goodness of fit of the model in Problem 9.1. IO(b).
placed in a random order. 10.t(a) A city police department kept track of the number of traffic accidents
2.tTwelve dice were rolled 26306 times. Each time, the number of dice showing 5 or 6 involving personal injury on sixty week-day mornings. The results were as
uppermost was r.ecorded. The results are summarized in the following table: follows:
No. of 5's and 6's 7 8 9 10 II 12 Total ls a Poisson distribution model consistent with these data?
Frequency observed 1331 403 105 14 4 0 26306 (b) The police department also recorded the number of persons injured in traffic
accidents for the same sixty mornings, with the following results:
Compute expected frequencies under the assumption that trials are independent
and the dice are balanced. Test for consistency, and give a possible explanation for Number injured 0 I 2 3 4 5 6 7 2::8
the poor agreement. Frequency observed 17 8 9 8 10 4 2 2 0
3. Mass-produced items are packed in cartons of 10 as they come off the assembly
line. The items from 250 cartons are inspected for defects, with the following If injuries were randomly and uniformly distributed over time, the number of
injuries per morning would have a Poisson distribution. Show that this model
results:
is contradicted by the data, and indicate which of the assumptions for a
Number defective 0 I 2 3 4 5 '?.6 Poisson process is violated.
Frequency observed 103 81 39 19 6 2 0 (c) Of the 83 accidents recorded in (a), 22 occurred on Mondays, 13 on Tuesdays,
11 on Wednesdays, 12 on Thursdays, and 25 on Fridays. Are these results
Test the hypothesis that the number of defective items per cartcn has a binomial
consistent with the hypothesis that accidents are equally likely to occur on
distribution. Can you suggest a reason that the distribution might not be
any day of the week?
binomial?
11. The following results were obtained in 150 rolls of a 6-sided die:

    Side                  1    2    3    4    5    6
    Frequency observed   29   26   31   24   19   21

(a) For a brick-shaped die (Example 1.3.2), the face probabilities are p_1 = p_6 = 1/4 − 2θ, with the four remaining faces equally probable. Compute expected frequencies under this model, and test the goodness of fit.
(b) Assuming the model in (a) to be correct, test the hypothesis θ = 0.02.
(c) Carry out a likelihood ratio test of the hypothesis p_1 = p_2 = ··· = p_6 = 1/6. How is the likelihood ratio statistic related to those in (a) and (b)?

12. Fifty specimens of plastic are repeatedly struck with a hammer until they fracture. The data are summarized in the following table:

    No. of blows         1    2    3    4    5    6   ≥7
    Observed frequency  23   13    8    4    0

It is thought that the number of blows needed to fracture a specimen is geometrically distributed, with probability function

    f(x) = θ(1 − θ)^{x−1};  x = 1, 2, ....

Find the maximum likelihood estimate of θ, and test the goodness of fit of the model to the data.

13.† A long sequence of digits (0, 1, ..., 9) produced by a random number generator was examined. There were 51 zeroes altogether, giving 50 pairs of successive zeroes. For each such pair, the number of nonzero digits between the two zeroes was determined. The results were as follows:

     1   6   8  10  22  12  15   0   0   2
    26   1  20   4   2   0  10   4  19   2
     3   0   5   2   8   1   6  14   2   2
     2  21   4   3   0   1   0   7   2   4
     4   7  16  18   2  13  22   7   3   5

Describe an appropriate probability model for these counts if the random number generator is actually producing random digits. Construct a frequency table and test the goodness of fit of the model.

14. Consider the multinomial log likelihood function

    l = f_1 log p_1 + f_2 log p_2 + ··· + f_k log p_k,

where Σ p_j = 1 and Σ f_j = n. Show that l is maximized when p_j = f_j/n for j = 1, 2, ..., k.

15. Show that, for both the likelihood ratio statistic (12.5.1) and the Pearson goodness of fit statistic (12.5.2), the effect of doubling all observed and expected frequencies is to double the value of D.

16.† n items were examined from the output of three similar machines in a factory. Ten percent were defective for the first machine, five percent for the second machine, and twelve percent for the third machine. A likelihood ratio test of the hypothesis that the probability of a defective is the same for all three machines gave a significance level of 5%. How large was n?

17. (a) Let f_1, f_2, e_1, and e_2 be any positive real numbers. Prove that

    (f_1 + f_2) log[(f_1 + f_2)/(e_1 + e_2)] ≤ f_1 log(f_1/e_1) + f_2 log(f_2/e_2).

Hint: Consider the function

    f(λ_1, λ_2) = f_1 log λ_1 + f_2 log λ_2 − (e_1 λ_1 + e_2 λ_2).

Its restricted maximum subject to λ_1 = λ_2 cannot exceed its unrestricted maximum over all λ_1, λ_2.
(b) Suppose that (12.5.1) is calculated over k classes. The first two classes are then combined into a single class with observed frequency f_1 + f_2 and expected frequency e_1 + e_2, and D is recalculated. Show that the value of D cannot increase.
(c) Show that the Pearson goodness of fit statistic (12.5.2) has a similar property.

18.†(a) A population consists of a_i families with exactly i children, for i = 0, 1, ..., k. There are Σ i a_i children in the population, and from these n children are chosen at random. Find the expected number of children in the sample who have exactly j siblings (brothers and sisters).
(b) The 1931 Canada census yields the following data:

    i      a_i        i      a_i        i      a_i
    1    207756       5    32080        9     2859
    2    156111       6    18128       10     1353
    3     95779       7    10511       11      575
    4     56275       8     5624       12      326

Find the expected number of children with exactly j siblings (j = 0, 1, ..., 11) in a random sample of 242 children from this population.
(c) A sociologist asked each of 242 alcoholics how many siblings he had, and the results were as follows:

    No. of siblings      0   1   2   3   4   5   6   7   8   9  10  11  Total
    Observed frequency  21  32  40  47  29  23  20  11  10   3   3   2    242

Test the hypothesis that the distribution of family size for alcoholics is as indicated by the 1931 census.
(d) Another possible procedure would be to compare the observed values f_j with the population values e_j = 242 a_{j+1}/Σ a_i using (12.5.1) or (12.5.2). Explain why this procedure is incorrect.
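The inequality in Problem 17(a), which explains why combining classes can never increase D, is easy to check numerically. A small Python sketch, not part of the text; the sample values are arbitrary:

```python
from math import log

def d_term(f, e):
    # One class's contribution to the LR statistic: f * log(f / e).
    return f * log(f / e)

# Combining two classes never increases the statistic (Problem 17(a)):
# (f1 + f2) log[(f1 + f2)/(e1 + e2)] <= f1 log(f1/e1) + f2 log(f2/e2).
for f1, f2, e1, e2 in [(25.0, 19.0, 20.42, 16.6),
                       (3.0, 5.0, 7.32, 1.72),
                       (1.0, 9.0, 4.0, 6.0)]:
    combined = d_term(f1 + f2, e1 + e2)
    separate = d_term(f1, e1) + d_term(f2, e2)
    assert combined <= separate + 1e-12
print("inequality holds for all sample values")
```

A numerical check is of course no substitute for the proof asked for in the problem, but it makes the direction of the inequality easy to remember.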
19.† The following table records 292 litters of mice classified according to litter size and number of females in the litter.

                      Number of females
                      0    1    2    3    4
    Litter size 1     8   12
                2    23   44   13
                3    10   25   48   13
                4     5   30   34   22    5

Suppose that the number of females in a litter of size i is binomially distributed with parameters (i, p_i).
(a) Test the hypothesis p_1 = p_2 = p_3 = p_4.
(b) Assuming the hypothesis in (a) to be true, test the significance of deviations from the binomial distribution model.
(c) How would the test in (b) be affected if equality of the p_i's was not assumed?

20. Test the goodness of fit of the exponential distribution model in Problem 9.4.1(b). First, compute expected frequencies using the MLE of θ based on the original 27 measurements. Then repeat the test using the MLE of θ based on just the frequency table.

21. In Problem 10.1.11, test the hypothesis that the length of the gestation period is normally distributed. Use the approximate MLE's from Problem 10.1.11(a) in computing the expected frequencies. How would the value of the likelihood ratio statistic change if one used the exact MLE's in calculating expected frequencies?

12.6. Tests for Independence in Contingency Tables

Many interesting statistical applications involve the analysis of cross-classified frequency data. For instance, in a study to evaluate three treatments for cancer, one might classify each of n patients according to the treatment received, and also according to whether or not the patient survived a five-year period. The results could be displayed in a 3 × 2 array, with one row for each treatment category and one column for each survival category. The body of the table would give the number of patients in each of the six classes. A cross-tabulation of frequency data such as this is called a contingency table.

In the example just described, we have a two-way or two-dimensional table. If we also classified patients by sex, we would have a three-way (3 × 2 × 2) contingency table containing 12 frequencies. We shall restrict the discussion here to two-way tables only. Many examples of higher dimensional contingency tables may be found in Bishop, Fienberg, and Holland, Discrete Multivariate Analysis, MIT Press (1975).

A question of interest in the cancer study would be whether there is a connection or association between the column classification (survival) and the row classification (treatment). This can be investigated by testing the hypothesis that the row and column classifications are independent. If this hypothesis is contradicted by the data, then there is evidence of an association between the two classifications.

The a × b Contingency Table

As in the preceding section, we consider n independent repetitions of an experiment. However, now we suppose that the outcome of each repetition is classified in two ways: according to which of the events A_1, A_2, ..., A_a occurs, and according to which of the events B_1, B_2, ..., B_b occurs. We assume that the A_i's (and similarly the B_j's) are mutually exclusive and exhaustive, so that each outcome belongs to exactly one of them. Altogether there are k = ab possible classes A_iB_j. Let p_ij be the probability of class A_iB_j, and let f_ij be the observed frequency of this class in the n repetitions. The frequencies can be arranged in an a × b table as shown in Figure 12.6.1. We denote the ith row total by r_i and the jth column total by c_j. Note that

    r_i = Σ_j f_ij,   c_j = Σ_i f_ij,   Σ_i r_i = Σ_j c_j = n.

Under the assumption of independent repetitions, the distribution of the f_ij's is multinomial with k = ab classes. The joint probability function is

    (n choose f_11 f_12 ··· f_ab) p_11^{f_11} p_12^{f_12} ··· p_ab^{f_ab},

and the log likelihood function is

    l(p) = ΣΣ f_ij log p_ij,

where ΣΣ p_ij = 1. The situation is the same as in Section 12.5 except that now we are using double subscripts. It follows from (12.5.1) that the likelihood ratio statistic for testing an hypothesis H concerning the p_ij's is

    D = 2 ΣΣ f_ij log(f_ij/e_ij),                                (12.6.1)

where e_ij = np_ij is the expected frequency for class A_iB_j under the hypothesis. The degrees of freedom for testing H will be (k − 1) − q = ab − 1 − q, where q is the number of unknown parameters which remain under H.

           B_1    B_2   ···   B_b    Total
    A_1   f_11   f_12   ···   f_1b    r_1
    A_2   f_21   f_22   ···   f_2b    r_2
    ...
    A_a   f_a1   f_a2   ···   f_ab    r_a
    Total  c_1    c_2   ···    c_b     n

Figure 12.6.1. a × b contingency table.
Hypothesis of Independence

A question which is often of interest is whether there is any connection or association between the row and column classifications. To investigate this, we consider the hypothesis of independence,

    H: p_ij = P(A_i)P(B_j)   for all i, j,

which states that each row event A_i is independent of every column event B_j. Using the definition of conditional probability (3.4.1), we can rewrite this as

    P(B_j | A_i) = P(B_j)   for all i, j,

which states that the probability of obtaining an observation in the jth column is the same for every row. Evidence against the independence hypothesis is evidence in favor of an association between the row and column classifications.

Under H, the unknown parameters are

    α_i = P(A_i) for i = 1, 2, ..., a;   β_j = P(B_j) for j = 1, 2, ..., b.

Since Σα_i = 1 and Σβ_j = 1, the number of functionally independent parameters under H is q = (a − 1) + (b − 1), and the number of degrees of freedom for testing H is

    (k − 1) − q = ab − 1 − (a − 1) − (b − 1) = (a − 1)(b − 1).

Since p_ij = α_iβ_j, the log likelihood function is

    ΣΣ f_ij log p_ij = ΣΣ f_ij (log α_i + log β_j)
                     = Σ_i (log α_i) Σ_j f_ij + Σ_j (log β_j) Σ_i f_ij
                     = Σ_i r_i log α_i + Σ_j c_j log β_j,

and maximizing subject to Σα_i = Σβ_j = 1 gives

    α̂_i = r_i/n;   β̂_j = c_j/n.

Hence the expected frequency for class A_iB_j is

    e_ij = np̂_ij = nα̂_iβ̂_j = r_ic_j/n.                          (12.6.2)

To obtain expected frequencies under the independence hypothesis, we multiply row totals by column totals and divide by the grand total. Note that the e_ij's have the same row and column totals as the f_ij's:

    Σ_j e_ij = r_i(Σ_j c_j)/n = r_i;   Σ_i e_ij = (Σ_i r_i)c_j/n = c_j.

If we compute all of the expected frequencies in the upper left hand (a − 1) × (b − 1) subtable, we can get the expected frequencies for the last row and column of the table by subtraction from the marginal totals.

EXAMPLE 12.6.1 (R.A. Fisher, Smoking and the Cancer Controversy, Oliver and Boyd, 1959). Seventy-one pairs of twins were examined with respect to their smoking habits. For each pair, it was ascertained whether they were identical twins (A_1) or fraternal twins (A_2), and whether their smoking habits were alike (B_1) or unlike (B_2). The results are shown in the following 2 × 2 contingency table:

                      Like habits   Unlike habits   Total
    Identical twins    44 (39.56)     9 (13.44)      53
    Fraternal twins     9 (13.44)     9 ( 4.56)      18
    Total              53             18             71

Note that 83% of identical twin pairs have like habits, but only 50% of fraternal twin pairs have like habits. Could such a large difference reasonably have occurred by chance, or is the probability of like habits different for the two types of twins?

We wish to know whether the probability of B_1 (like habits) could be the same for identical twins (A_1) and fraternal twins (A_2); that is, we wish to examine the hypothesis

    H: P(B_1 | A_1) = P(B_1 | A_2) = P(B_1).

This is the independence hypothesis. Under H, the expected frequency for the (1, 1) cell is

    e_11 = r_1c_1/n = 53 × 53/71 = 39.56.

The remaining expected frequencies can be found in a similar way, or by subtraction from the marginal totals. The observed value of the LR statistic (12.6.1) is D_obs = 7.15. The number of degrees of freedom for the test is (a − 1)(b − 1) = 1, and

    SL ≈ P{χ²(1) ≥ 7.15} < 0.01.

It is not reasonable to attribute the observed discrepancies to chance, and hence there is strong evidence against the independence hypothesis. The probability of like smoking habits is greater for identical twins than for fraternal twins.

Note. The main difference between the situations in Examples 12.4.1 and 12.6.1 is that the column totals c_1 = c_2 = 44 were fixed in advance in the former, whereas only the grand total n = 71 was fixed in the latter. Thus the basic model for Example 12.4.1 is a pair of independent binomial distributions, but the basic model for Example 12.6.1 is a single multinomial
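The computations in Example 12.6.1 can be reproduced in a few lines. A Python sketch, not from the text; for one degree of freedom the chi-square tail probability is available in closed form through the complementary error function:

```python
from math import log, sqrt, erfc

# Observed 2 x 2 table from Example 12.6.1 (rows: identical, fraternal
# twins; columns: like habits, unlike habits).
f = [[44, 9],
     [9, 9]]

n = sum(sum(row) for row in f)
r = [sum(row) for row in f]            # row totals
c = [sum(col) for col in zip(*f)]      # column totals

# Expected frequencies under independence, e_ij = r_i * c_j / n  (12.6.2)
e = [[ri * cj / n for cj in c] for ri in r]

# Likelihood ratio statistic (12.6.1)
D = 2 * sum(f[i][j] * log(f[i][j] / e[i][j])
            for i in range(2) for j in range(2))

# For (a-1)(b-1) = 1 degree of freedom, P{chi2(1) >= D} = erfc(sqrt(D/2)).
SL = erfc(sqrt(D / 2))
print(round(e[0][0], 2), round(D, 2), SL)
```

The printed values agree with the text: e_11 ≈ 39.56, D_obs ≈ 7.15, and a significance level below 0.01.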
distribution. If we had applied the analysis from Example 12.4.1 to the current example, we would have obtained exactly the same expected frequencies, LR statistic, degrees of freedom, and significance level. For a test of the independence hypothesis it makes no difference whether the marginal totals are random variables, or are fixed in advance by the experimental design.

EXAMPLE 12.6.2. Twenty-seven of the pairs of identical twins considered in Example 12.6.1 had been separated at birth, whereas the other 26 pairs had been raised together. The frequencies of like and unlike smoking habits for the two groups are as follows:

                    Like habits   Unlike habits   Total
    Separated        23 (22.42)     4 (4.58)       27
    Not separated    21 (21.58)     5 (4.42)       26
    Total            44             9              53

The figures in parentheses are the expected frequencies under the assumption that the two classifications are independent. We do not need a formal test of significance to tell us that the agreement is extremely good. There is no evidence that the probability of like smoking habits is different for the two groups.

The greater similarity between the smoking habits of identical twins (Example 12.6.1) could be accounted for in two ways. Firstly, it could be due to the fact that identical twins have the same genotype, whereas fraternal twins are no more alike genetically than ordinary brothers and sisters. Secondly, it could be due to greater social pressures on identical twins to conform in their habits. If the latter were the case, one would expect to find less similarity in the smoking habits of identical twins who had been separated at birth. Since this is not the case, it appears that genetic factors are primarily responsible for the similarity of smoking habits.

The possibility that genetic factors may influence smoking habits has interesting implications for the smoking and cancer controversy, since these same genetic factors might also produce an increased susceptibility to cancer. See Fisher's pamphlet for further discussion.

EXAMPLE 12.6.3. In a study to determine whether laterality of hand is associated with laterality of eye (measured by astigmatism, acuity of vision, etc.), 413 subjects were classified with respect to these two characteristics. The results were as follows:

                   Left-eyed    Ambiocular     Right-eyed    Total
    Left-handed     34 (35.43)    62 (58.55)    28 (30.02)    124
    Ambidextrous    27 (21.43)    28 (35.41)    20 (18.16)     75
    Right-handed    57 (61.14)   105 (101.04)   52 (51.82)    214
    Total          118           195           100            413

Is there any evidence that laterality of eye is related to laterality of hand?

Assuming that the two classifications are independent, the expected frequencies for the four classes in the upper left hand corner are

    118 × 124/413 = 35.43    195 × 124/413 = 58.55
    118 × 75/413 = 21.43     195 × 75/413 = 35.41

The remaining expected frequencies are obtained by subtraction from the marginal totals, and are shown in parentheses above. The observed value of the LR statistic (12.6.1) is D_obs = 4.03, and there are (a − 1)(b − 1) = 4 degrees of freedom for the test. Thus

    SL ≈ P{χ²(4) ≥ 4.03} > 0.25.

The hypothesis of independence is compatible with the data, and there is no evidence of an association between laterality of hand and laterality of eye.

EXAMPLE 12.6.4. Nine hundred and fifty school children were classified according to their nutritional habits and intelligence quotients, with the following results:

                          Intelligence quotient
                     <80           80-89          90-99          ≥100           Total
    Good nutrition   245 (252.5)   228 (233.3)    177 (173.8)    219 (209.4)    869
    Poor nutrition    31 (23.5)     27 (21.7)      13 (16.2)      10 (19.6)      81
    Total            276           255            190            229            950

If there is no relationship, the row and column classifications are independent, and the expected frequencies are as shown. The observed value of the LR statistic is 10.51 with (a − 1)(b − 1) = 3 degrees of freedom, giving

    SL ≈ P{χ²(3) ≥ 10.51} ≈ 0.02.

The data provide reasonably strong evidence against the hypothesis of independence. Poor nutrition and a low IQ tend to occur together.

PROBLEMS FOR SECTION 12.6

1.† In December 1897 there was an outbreak of plague in a jail in Bombay. Of 127 persons who were uninoculated, 10 contracted the plague. Of 147 persons who had been inoculated, 3 contracted the disease. Test the hypothesis that contraction of the plague is independent of inoculation status.

2. It was noticed that married undergraduates seemed to do better academically than single students. Accordingly, the following observations were made: of 1500 engineering students, 297 had failed their last set of examinations; 157 of them were married, of whom only 14 had failed. Are these observations consistent with the hypothesis of a common failure rate for single and married students? Under
what conditions would the information that there were more married students in 3rd and 4th years than in 1st and 2nd years affect your conclusion?

3. Six hundred and four adult patients in a large hospital were classified according to whether or not they had cancer, and according to whether or not they were smokers. The results were as follows:

                  Cancer patient   Other
    Smoker             70           397
    Non-smoker         12           125

Test the hypothesis that the disease classification is independent of the smoking classification.

4. A total of 1376 father-daughter pairs were classified as SS, ST, TS, or TT, where S stands for short and T for tall. Heights were divided at 68" for fathers and 64" for daughters. The proportion of short daughters among short fathers is 522/726, while among tall fathers the proportion is 206/650. Do the data indicate any association between the heights of fathers and daughters?

5.† In a series of autopsies, indications of hypertension were found in 37% of 200 heavy smokers, in 40% of 290 moderate smokers, in 45.3% of 150 light smokers, and in 51.3% of 160 non-smokers. Test the hypothesis that the probability of hypertension is independent of the smoking category.

6. The following table classifies 5816 births by day of the week. Row 1 classifies the first 2000 births in Who's Who for 1970 (average year of birth 1907). Row 2 classifies the 3816 births which were announced in The Times during one year ending in August 1976.

                 M    T    W   Th    F   Sa   Su   Total
    Who's Who   262  307  270  307  272  280  302   2000
    The Times   572  585  594  594  582  498  391   3816

(a) Test the hypothesis that, for the sample from Who's Who, births are uniformly distributed over the days of the week.
(b) Show that the distributions of births are significantly different for the two samples. In what way are they different? Can you suggest an explanation for this?

7.† Gregor Mendel grew 529 pea plants using seed from a single source, and classified them according to seed shape (round, round and wrinkled, wrinkled) and color (yellow, yellow and green, green). He obtained the following data:

    38 round, yellow
    65 round, yellow and green
    60 round and wrinkled, yellow
    138 round and wrinkled, yellow and green
    28 wrinkled, yellow
    68 wrinkled, yellow and green
    35 round, green
    67 round and wrinkled, green
    30 wrinkled, green

(a) Test the hypothesis that the shape and color classifications are independent.
(b) According to Mendel's theory, the frequencies of yellow, yellow and green, and green seeds should be in the ratio 1:2:1. Test whether this hypothesis is consistent with the data.

8. In the following table, 64 sets of triplets are classified according to the age of their mother at their birth and their sex distribution:

                      3 boys   2 boys   2 girls   3 girls   Total
    Mother under 30      5        8        9         7        29
    Mother over 30       6       10       13         6        35
    Total               11       18       22        13        64

(a) Is there any evidence of an association between the sex distribution and the age of the mother?
(b) Suppose that the probability of a male birth is 0.5, and that the sexes of triplets are determined independently. Find the probability that there are x boys in a set of triplets (x = 0, 1, 2, 3), and test whether the column totals are consistent with this distribution.

9. 1398 school children with tonsils present were classified according to tonsil size and absence or presence of the carrier for Streptococcus pyogenes. The results were as follows:

                       Normal   Enlarged   Much enlarged
    Carrier present      19        29           24
    Carrier absent      497       560          269

Is there evidence of an association between the two classifications?

10. The following data on the heights of 205 married couples were presented by Yule in 1900.

                      Tall wife   Medium wife   Short wife
    Tall husband         18           28            19
    Medium husband       20           51            28
    Short husband        12           25             9

Test the hypothesis that the heights of husbands and wives are independent.

11.† A study was undertaken to determine whether there is an association between the birth weights of infants and the smoking habits of their parents. Out of 50 infants of above average weight, 9 had parents who both smoked, 6 had mothers who smoked but fathers who did not, 12 had fathers who smoked but mothers who did not, and 23 had parents of whom neither smoked. The corresponding results for 50 infants of below average weight were 21, 10, 6, and 13, respectively.
(a) Test whether these results are consistent with the hypothesis that birth weight is independent of parental smoking habits.
(b) Are these data consistent with the hypothesis that, given the smoking habits of the mother, the smoking habits of the father are not related to birth weight?
12. In a California study, data were collected on some features of motorcycle accidents. As part of the study, questionnaires were sent to individuals who had been involved in motorcycle collisions. One question of interest was the possible relationship between the occurrence of head injury and helmet use. The following data were reported on 626 injured male drivers who responded to the questionnaire.

                   No head injury   Minor head injury   Serious head injury
    Helmet used         165               20                   33
    No helmet           262               53                   93

14. Consider a two-way cross-classification of counts f_ij, where 1 ≤ i ≤ a and 1 ≤ j ≤ b. Assume that the f_ij's are independent, and that f_ij has a Poisson distribution with mean μ_ij. Under this assumption, the total count n = ΣΣ f_ij is a random variable. Consider the hypothesis

    H: μ_ij = γ α_i β_j   for 1 ≤ i ≤ a, 1 ≤ j ≤ b,

where the α_i's, β_j's, and γ are unknown parameters and Σα_i = Σβ_j = 1. This hypothesis says that the expected counts in any two rows of the table are proportional:

    μ_ij / μ_i'j = α_i / α_i'   for 1 ≤ j ≤ b.

Show that the expected frequencies under H are given by (12.6.2), and that the likelihood ratio statistic for testing H is given by (12.6.1).

15. Continuation of Problem 14. An experiment was carried out to determine whether two concentrations of a virus would produce different effects on tobacco plants. Half a leaf of a tobacco plant was rubbed with cheesecloth soaked in one preparation of the virus extract, and the other half was rubbed with the second extract. The following table shows the number of lesions appearing on each half leaf.

    Leaf no.     1    2    3    4    5   Total
    Extract 1   31   20   18   17    9     95
    Extract 2   18   17   14   11   10     70

Test the hypothesis that the proportion of lesions produced by Extract 1 is the same on all leaves.

12.7. Cause and Effect

A significance test may show that an observed association between two classifications A and B cannot reasonably be attributed to chance. The association itself, however, may be explained in three ways:
(i) A causes B;
(ii) B causes A;
(iii) some other factor C causes both A and B.
We cannot claim to have proof that A causes B unless the data have been collected in such a way that (ii) and (iii) can be ruled out.

For instance, in Example 12.6.4, low IQ's were found more often in children with poor nutrition than in children with good nutrition. The significance test tells us that the observed association cannot reasonably be attributed to chance. However, it would not be valid to conclude that poor nutrition causes low IQ, or that low IQ causes poor nutrition. There could be a third factor, such as poor home environment, which is responsible for both poor nutrition and low IQ.

Rigorous evidence of cause-and-effect can be obtained only from a controlled experiment in which the experimenter demonstrates that, by manipulating the cause A, he can control the effect B. Randomization is an
important component of the experiment. If the subjects who received A were chosen at random, then we know what caused A, and neither (ii) nor (iii) above is a possible explanation.

For instance, suppose that we wish to demonstrate that aspirin causes a reduction in the probability of a second attack for heart attack victims. The experimental subjects should be assigned at random to either the treatment group, which receives aspirin, or the control group, which receives a placebo (a pill similar to aspirin in appearance and flavor, but with no active ingredients). If we allowed the patients or their doctors to choose their treatments, we could not be sure that any reduction in second attacks was actually due to the aspirin rather than to the way in which the treatments were assigned.

The following example shows the sort of difficulty that can arise when treatments are not randomly assigned.

EXAMPLE 12.7.1. In order to compare two possible treatments for the same disease, hospital records for 1100 applications of each treatment were examined. The treatment was classified as a success or failure in each application, and the observed frequencies were as follows:

               Treatment 1   Treatment 2
    Success        595           905
    Failure        505           195
    Total         1100          1100

The success rate was 82% for treatment 2 versus only 54% for treatment 1, and a significance test shows that this difference is far too great to be attributed to chance.

One might be tempted at this point to assume that the relationship is causal, and that the overall success rate could be improved if treatment 2 were always used. This isn't necessarily so! We do not know that patients receiving the two treatments were similar in other ways, and therefore we cannot rule out possibility (iii) above.

For instance, it might be that treatment 1 was given primarily to patients who were seriously ill, while treatment 2 was usually given to those less seriously ill. The breakdown into serious cases and less serious cases might be as follows:

               Less serious cases     More serious cases
               Trt 1     Trt 2        Trt 1     Trt 2
    Success      95       900          500         5
    Failure       5       100          500        95
    Total       100      1000         1000       100
Trea tmen t 1 has a higher success rate for Wrot e competition 10 46
both the less serious and more 128 289
Did not write 41 89 146 169
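The reversal in Example 12.7.1 is easy to verify numerically from the frequencies above; a minimal sketch in plain Python:

```python
# Success counts from Example 12.7.1, split by disease severity.
# Each entry is (successes, total applications).
less_serious = {"trt1": (95, 100), "trt2": (900, 1000)}
more_serious = {"trt1": (500, 1000), "trt2": (5, 100)}

def rate(successes, total):
    """Observed success proportion."""
    return successes / total

# Success rates within each severity group: treatment 1 wins both.
r1_less, r2_less = rate(*less_serious["trt1"]), rate(*less_serious["trt2"])
r1_more, r2_more = rate(*more_serious["trt1"]), rate(*more_serious["trt2"])

def pooled(group_a, group_b, key):
    """Marginal success rate after combining the two severity groups."""
    s = group_a[key][0] + group_b[key][0]
    n = group_a[key][1] + group_b[key][1]
    return s / n

r1_all = pooled(less_serious, more_serious, "trt1")  # 595/1100
r2_all = pooled(less_serious, more_serious, "trt2")  # 905/1100

print(r1_less, r2_less)   # 0.95 vs 0.90  (treatment 1 better)
print(r1_more, r2_more)   # 0.50 vs 0.05  (treatment 1 better)
print(r1_all, r2_all)     # ≈0.54 vs ≈0.82 (treatment 2 "better" overall)
```

The aggregation reverses the comparison because treatment 2 was mostly given to the less serious cases; this is the point of the example (an instance of what is now usually called Simpson's paradox).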
(a) Test the hypothesis that first term average is independent of competition status.
(b) Explain why it is incorrect to conclude that high school students can improve their prospects for first term at university by writing the competition.

4. One hundred and fifty Statistics students took part in a study to evaluate computer-assisted instruction (CAI). Seventy-five received the standard lecture course while the other 75 received some CAI. All 150 students then wrote the same examination. Fifteen students in the standard course and 29 of those in the CAI group received a mark over 80%.
(a) Are these results consistent with the hypothesis that the probability of achieving a mark over 80% is the same for both groups?
(b) Based on these results, the instructor concluded that CAI increases the chances of a mark over 80%. How should the study have been carried out in order for this conclusion to be valid?

5.† (a) The following data were collected in a study of possible sex bias in graduate admissions at a large university:

                        Admitted    Not admitted
   Male applicants        3738          4704
   Female applicants      1494          2827

Test the hypothesis that admission status is independent of sex. Do these data indicate a lower admission rate for females?
(b) The following table shows the numbers of male and female applicants and the percentages admitted for the six largest graduate programs in (a):

                 Men                      Women
   Program   Applicants  % Admitted   Applicants  % Admitted
      A         825          62          108          82
      B         560          63           25          68
      C         325          37          593          34
      D         417          33          373          35
      E         191          28          393          24
      F         373           6          341           7

Test the independence of admission status and sex for each program. Does any of the programs show evidence of a bias against female applicants?
(c) Why is it that the totals in (a) seem to indicate a bias against women, but the results for individual programs in (b) do not?

12.8. Testing for Marginal Homogeneity

Although the hypothesis of independence will often be of interest, one should not assume that every contingency table automatically calls for such a test. Contingency tables can arise in many ways, and the hypothesis of interest will depend upon the situation. To illustrate this point, we give an example where one is interested in comparing the marginal probabilities P(Ai) and P(Bi) rather than in testing independence.
Two drugs are compared to see which of them is less likely to produce unpleasant side effects. Each of 100 subjects is given the two drugs on different occasions, and is classified according to whether or not the drugs upset his stomach. The results can be summarized in a 2 × 2 contingency table as follows:

                          Nausea       No nausea
                          with drug B  with B      Total
   Nausea with drug A         38           2         40
   No nausea with A           10          50         60
   Total                      48          52        100

Drug B produced nausea in 48% of subjects, but drug A produced nausea in only 40% of subjects. Could this discrepancy reasonably be ascribed to chance, or is there evidence of a real difference between the two drugs?
We assume that results for different subjects are independent, so that we have n = 100 independent repetitions of an experiment with k = 4 possible outcomes. The basic model for the experiment is multinomial as in Section 12.6. However, the hypothesis of independence is not of interest in this example. One would expect a patient who experiences nausea from one drug to be more susceptible to nausea from the other drug as well. Indeed, 88 of the 100 subjects reacted in the same way to both drugs. The row and column classifications are certainly not independent, and there would be no point in testing the hypothesis of independence.
The question of interest in this example is whether the probability of nausea is the same for both drugs. Thus we consider the hypothesis of marginal homogeneity,

   H: P(A) = P(B).

Since P(A) = p11 + p12 and P(B) = p11 + p21, this hypothesis is equivalent to

   H: p12 = p21,

which states that the 2 × 2 table of probabilities (pij) is symmetric. There will be one degree of freedom for testing H because it reduces the number of unknown parameters by 1.
Under H, the log likelihood function is

   l = ΣΣ fij log pij = f11 log p11 + (f12 + f21) log p + f22 log p22,

where p is the common value of p12 and p21, and p11 + 2p + p22 = 1. Maximizing l subject to this restriction gives

   p̂11 = f11/n;   p̂22 = f22/n;   p̂12 = p̂21 = p̂ = ½(f12 + f21)/n.
3.† Of 400 randomly chosen electors in a riding, 212 said that they supported government policy and 188 were opposed. Soon after this a new budget was introduced and the same 400 electors were re-interviewed. There were found to be 196 who now supported government policy, including 17 who had previously been opposed.
(a) Explain why it would not be valid to carry out a test for independence in the following table:

                     Support   Opposed
   Before budget       212       188
   After budget        196       204

(b) Another way to tabulate the data is as follows:

                      Support after   Opposed after
   Support before          179             33
   Opposed before           17            171

Carry out a test for independence in this table, and carefully explain what the result means.
(c) Test the hypothesis that the proportion of voters who support government policy is the same after the budget as it was before the budget.

12.9. Significance Regions

In Section 11.4 we defined confidence intervals and suggested that they be constructed from the likelihood function. In this section we consider another construction based on a test of significance.
Suppose that the model involves a single unknown parameter θ, and that we have a test of significance for the hypothesis θ = θ0. The significance level will depend upon which parameter value is tested, so we can think of SL as a function of θ. If SL(θ0) is near 1, H: θ = θ0 is in good agreement with the data and θ0 is a "reasonable" parameter value. If SL(θ0) is near 0, H: θ = θ0 is strongly contradicted by the data, and θ0 is not a reasonable parameter value. The significance level, considered as a function of θ, gives a ranking of possible parameter values.
Intervals or regions of parameter values can be obtained from SL(θ) in the same way that likelihood regions are obtained from R(θ). The set of parameter values such that SL(θ) ≥ p is called a 100p% significance region for θ. Significance regions are also called consonance regions; see Kempthorne and Folks, Probability, Statistics, and Data Analysis, Iowa State University Press (1971).
The 5% significance region for θ consists of all parameter values θ0 such that SL(θ0) ≥ 0.05. Usually this will be an interval. Any parameter value θ0 inside this region is compatible with the data at the 5% level because a test of H: θ = θ0 gives a significance level of 5% or more. Any parameter value θ1 outside this region is contradicted by the data at the 5% level because a test of H: θ = θ1 gives a significance level less than 5%.

EXAMPLE 12.9.1. Suppose that X has a binomial (n, θ) distribution. The expected value of X under H: θ = θ0 is nθ0, and so we might choose D = |X − nθ0| as the test statistic (see Example 12.1.1). Given an observed value x, the significance level is

   SL(θ0) = P{|X − nθ0| ≥ |x − nθ0|}.

For n large, the normal approximation to the binomial distribution gives

   (X − nθ0)/√(nθ0(1 − θ0)) ≈ N(0, 1),

and hence the square of this quantity is approximately χ²(1). It follows that

   SL(θ0) ≈ P{χ²(1) ≥ (x − nθ0)²/[nθ0(1 − θ0)]}.

The approximate 5% significance interval for θ based on this test contains all parameter values θ0 such that SL(θ0) ≥ 0.05. Since P{χ²(1) ≥ 3.841} = 0.05, we have SL ≥ 0.05 if and only if

   (x − nθ0)²/[nθ0(1 − θ0)] ≤ 3.841
   ⟺ (θ̂ − θ0)² ≤ 3.841 θ0(1 − θ0)/n,

where θ̂ = x/n. The endpoints of the interval are thus the roots of a quadratic equation,

   (θ̂ − θ)² = 3.841 θ(1 − θ)/n.

For instance, suppose that we observe X = 35 in n = 100 trials as in Example 12.2.1. Then θ̂ = 0.35, and the equation is

   (0.35 − θ)² = 0.03841 θ(1 − θ).

Its roots are θ = 0.2636 and θ = 0.4474, and so the approximate 5% significance interval for θ based on the above test is 0.2636 ≤ θ ≤ 0.4474.
Alternatively, we could use the likelihood ratio statistic

   D' = −2r(θ0) = 2x log[x/(nθ0)] + 2(n − x) log[(n − x)/(n(1 − θ0))]

as in Example 12.2.1. Since D' ≈ χ²(1) for n large, we have

   SL(θ0) ≈ P{χ²(1) ≥ −2r(θ0)}.
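The endpoints quoted in Example 12.9.1 can be checked numerically: the normal-approximation interval comes from a quadratic, and the likelihood-ratio interval from solving −2r(θ) = 3.841. A sketch in plain Python (the bisection brackets and tolerance are choices made here, not from the text):

```python
from math import log, sqrt

n, x = 100, 35
theta_hat = x / n
C = 3.841  # 5% point of chi-square(1)

# Normal-approximation interval: roots of (theta_hat - t)^2 = C*t*(1-t)/n.
# Rearranged: (1 + C/n) t^2 - (2*theta_hat + C/n) t + theta_hat^2 = 0.
a = 1 + C / n
b = -(2 * theta_hat + C / n)
c = theta_hat ** 2
disc = sqrt(b * b - 4 * a * c)
lo_norm, hi_norm = (-b - disc) / (2 * a), (-b + disc) / (2 * a)

# Likelihood-ratio interval: solve -2r(theta) = C on each side of theta_hat.
def minus_2r(t):
    return 2 * x * log(x / (n * t)) + 2 * (n - x) * log((n - x) / (n * (1 - t)))

def bisect(f, lo, hi, tol=1e-8):
    """Simple bisection; assumes f changes sign on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(lo) * f(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

g = lambda t: minus_2r(t) - C
lo_lr = bisect(g, 0.01, theta_hat)   # left endpoint, below the MLE
hi_lr = bisect(g, theta_hat, 0.99)   # right endpoint, above the MLE

print(lo_norm, hi_norm)  # ≈ 0.2636, 0.4474
print(lo_lr, hi_lr)      # ≈ 0.2612, 0.4464
```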
We then have SL ≥ 0.05 if and only if

   −2r(θ0) ≤ 3.841.

If the distribution of D depends upon θ0, as it usually will in examples with discrete distributions, then the critical value dp will generally depend upon θ0. Significance regions obtained from the likelihood ratio test need not be likelihood regions, and the two constructions will usually give slightly different results. Note the similarity with the results on coverage probabilities in Section 12.8.

EXAMPLE 12.9.1 (continued). Consider again the binomial distribution example with n = 100 and X observed to be 35. We shall show that the exact 5.4% significance interval obtained from the likelihood ratio test of H: θ = θ0 is not a likelihood interval.
The 14.7% likelihood interval (approximate 95% confidence interval) for θ is given by

   −2r(θ) ≤ 3.841,

and solving gives 0.26117 ≤ θ ≤ 0.44642. The two endpoints of this interval have equal relative likelihoods.
An exact likelihood ratio test of H: θ = 0.26117 can be carried out as in Example 12.2.1, and the significance level is found to be 0.052. Similarly, an exact test of H: θ = 0.44642 gives SL = 0.056. Thus the exact 5.4% significance interval for θ will contain θ = 0.44642, but it won't contain the equally likely value θ = 0.26117. It follows that the 5.4% significance interval is not a likelihood interval.

*12.10. Power

*This section may be omitted on first reading.

This section briefly introduces a theory of test statistics. This theory is based on the concept of the power or sensitivity of a test statistic against an alternative hypothesis. Power comparisons may be helpful in a theoretical comparison of several possible test statistics to determine which of them is more likely to detect departures of a particular type.
Consider a test of the simple hypothesis H0, with test statistic D. The significance level of outcome x in relation to H0 is

   SL = P(D ≥ d | H0 is true)

where d = D(x) is the observed value of D.
If D is a continuous variate, it is possible to obtain any significance level between 0 and 1. However if D is discrete there will be only a discrete set of possible significance levels corresponding to the possible values of D. If there exists a variate value dα such that P(D ≥ dα | H0) = α, then α is called an achievable significance level. Two test statistics are called comparable if they have the same set of achievable significance levels.
The size α critical region of a test is the set Cα of outcomes x for which SL ≤ α. If α is achievable, then x ∈ Cα if and only if D(x) ≥ dα. It follows that, for any achievable α,

   P(X ∈ Cα | H0 is true) = P(D ≥ dα | H0) = α.

Now let H1 denote another hypothesis which is chosen to represent the kind of departure from H0 that we wish to detect. H0 and H1 are called the null hypothesis and the alternative hypothesis, respectively. Initially we assume that both H0 and H1 are simple hypotheses, so that the probability of any outcome x can be computed numerically under H0 and under H1.
The size α power (or sensitivity) of a test statistic D with respect to the simple alternative hypothesis H1 is

   Kα = P{SL ≤ α | H1 is true} = P{X ∈ Cα | H1}.

For instance, K0.05 is the probability that a test of H0 using D will produce a significance level of 5% or less if in fact H1 is true. If K0.05 is near 1, the test statistic D is said to be powerful or sensitive against H1, because if H1 were true the test would almost surely give evidence that H0 is false.
Now let D, D' be two comparable statistics for testing H0 with powers Kα, K'α against H1. D is said to be more powerful than D' against H1 if Kα ≥ K'α for all achievable significance levels α. A statistic D is called most powerful for testing H0 against H1 if it is more powerful than every comparable statistic D'.

EXAMPLE 12.10.1. Let X ~ N(µ, 1), and consider a test of H0: µ = 0 against H1: µ = 2. Two possible statistics for testing µ = 0 are D = X and D' = |X|. With test statistic D, only large positive values of X are considered to be in poor agreement with µ = 0, whereas with D' both large positive and large negative values of X are considered as evidence against µ = 0. Both D and D' are continuous variates, and therefore all significance levels are achievable for both statistics.
The size α critical region for D has the form X ≥ dα, where dα is chosen so that

   P{X ≥ dα | H0 is true} = α.

Since X is N(0, 1) under H0, dα is the value such that F(dα) = 1 − α, where F is the standardized normal c.d.f. The size α power of D with respect to H1 is

   Kα = P{X ≥ dα | H1 is true} = P{X ≥ dα | X ~ N(2, 1)}
      = P{Z ≥ dα − 2 | Z ~ N(0, 1)} = 1 − F(dα − 2).

The size α critical region for D' has the form |X| ≥ d'α, where d'α is chosen so that

   P{|X| ≥ d'α | H0 is true} = α.

Since X is N(0, 1) under H0, d'α is the value such that F(d'α) = 1 − α/2.
The size α power of D' with respect to H1 is

   K'α = P{|X| ≥ d'α | H1 is true} = 1 − P{|X| < d'α | X ~ N(2, 1)}
       = 1 − P{−d'α − 2 ≤ Z ≤ d'α − 2 | Z ~ N(0, 1)}
       = 1 − F(d'α − 2) + F(−d'α − 2).

For α = 0.05 we find from Table B2 that dα = 1.645 and d'α = 1.960. Thus we have

   K0.05 = 1 − F(−0.355) = F(0.355) = 0.64;
   K'0.05 = 1 − F(−0.040) + F(−3.960) = 0.52.

If µ = 2 and we test the hypothesis µ = 0, the probability of getting SL ≤ 0.05 is 0.64 with statistic D, but only about 0.52 with statistic D'. Thus D gives us a better chance of obtaining evidence against H: µ = 0 when in fact µ = 2. It can be shown that Kα ≥ K'α for all values of α, so that D is more powerful than D' for testing H0: µ = 0 versus H1: µ = 2. In fact, it follows from the theorem below that D is the most powerful statistic for testing µ = 0 against µ = 2.

Most Powerful Test when H0 and H1 are Simple

The following theorem, which is called the Neyman–Pearson Fundamental Lemma, yields a most powerful test statistic when both H0 and H1 are simple hypotheses.

Theorem 12.10.1. Let H0 and H1 be simple hypotheses, and let f0(x) and f1(x) denote the probability of a typical outcome x under H0 and under H1, respectively. Then the statistic

   D(x) = f1(x)/f0(x)                                          (12.10.1)

is most powerful for testing H0 against H1.

PROOF. Let α be an achievable significance level for D, and let dα be the value of D such that P(D ≥ dα | H0) = α. The size α critical region Cα is the set of x-values for which D(x) ≥ dα. Note that, by (12.10.1), we have

   f1(x) ≥ dα f0(x)   for x ∈ Cα;                              (12.10.2)
   f1(x) < dα f0(x)   for x ∉ Cα.                              (12.10.3)

Let C'α be the size α critical region for any comparable test statistic D', and consider the partition of the sample space into four disjoint regions,

   S = (Cα ∩ C'α) ∪ (Cα ∩ C̄'α) ∪ (C̄α ∩ C'α) ∪ (C̄α ∩ C̄'α).

We use p's to denote the probabilities of these regions under H0, and q's to denote their probabilities under H1 (see Table 12.10.1).

Table 12.10.1. Probabilities Under the Null Hypothesis and Under the Alternative Hypothesis for the Four Regions of the Sample Space Defined by Two Size α Critical Regions

   H0       C'α    C̄'α    Total        H1       C'α    C̄'α    Total
   Cα       p11    p12      α          Cα       q11    q12     Kα
   C̄α       p21    p22     1−α         C̄α       q21    q22    1−Kα
   Total     α     1−α                 Total    K'α   1−K'α

Since Cα and C'α are size α critical regions, we have

   p11 + p12 = P(X ∈ Cα | H0) = α = P(X ∈ C'α | H0) = p11 + p21,

and hence p12 = p21. The size α powers are

   Kα = P(X ∈ Cα | H1) = q11 + q12    for D;
   K'α = P(X ∈ C'α | H1) = q11 + q21  for D',

and the difference in power is

   Kα − K'α = q12 − q21.

Since Cα ∩ C̄'α is a subset of Cα, (12.10.2) gives

   q12 = Σ f1(x) ≥ dα Σ f0(x) = dα p12,

where the sums are taken over x ∈ Cα ∩ C̄'α. Similarly, since C̄α ∩ C'α is a subset of C̄α, (12.10.3) gives

   q21 = Σ f1(x) ≤ dα Σ f0(x) = dα p21.

Now, since p12 = p21, we have

   Kα − K'α = q12 − q21 ≥ dα p12 − dα p21 = 0.

This result holds for all comparable statistics D' and achievable significance levels α, and hence the theorem follows.

EXAMPLE 12.10.2. Let X ~ N(µ, 1), and consider a test of the simple null hypothesis H0: µ = µ0 versus the simple alternative hypothesis H1: µ = µ1. The theorem gives

   D(x) = f1(x)/f0(x) = exp{−½(x − µ1)² + ½(x − µ0)²}
        = exp{x(µ1 − µ0) + ½(µ0² − µ1²)}

as a most powerful statistic for testing H0 against H1.
If µ1 > µ0, large values of D correspond to large values of X. The size α critical region has the form X ≥ bα, where bα is chosen so that

   P{X ≥ bα | H0 is true} = α.
Since X ~ N(µ0, 1) under H0, we find that bα = µ0 + zα, where zα is the value exceeded with probability α in a standardized normal distribution. The size α power is

   Kα = P{X ≥ bα | H1} = P{X ≥ bα | X ~ N(µ1, 1)}
      = P{Z ≥ bα − µ1 | Z ~ N(0, 1)}
      = 1 − F(µ0 + zα − µ1),

where F is the c.d.f. of N(0, 1).
Similarly, if µ1 < µ0, the critical region has the form X ≤ aα, and the size α power is found to be

   Kα = 1 − F(µ1 + zα − µ0).

In accepting only large values of X as evidence against H: µ = µ0, we achieve maximum power against departures on the high side (µ1 > µ0), but we lose the ability to detect departures on the low side (µ1 < µ0). The cost of increased sensitivity to one particular type of departure is a decrease in sensitivity to other types. This point was discussed previously in Example 12.5.2.
The likelihood ratio statistic for testing H: µ = µ0 is

   D' = (X − µ0)²,

which ranks outcomes according to the magnitude of |X − µ0|. The LR statistic is not most powerful for testing µ = µ0 against any particular alternative value µ1, but it does have reasonably high power for departures in both directions.

Composite Alternative Hypothesis

Now consider a slightly more general problem in which H0 is simple but H1 is composite. Suppose that a typical outcome x has probability f(x; θ), where θ is a real-valued parameter. The null hypothesis is taken to be H0: θ = θ0. The alternative hypothesis has the form H1: θ ∈ Ω1, where Ω1 is a set of possible parameter values; for instance, H1: θ ≠ θ0 and H1: θ > θ0 have this form.
Given any particular value θ1 ∈ Ω1, a most powerful statistic for testing θ = θ0 versus θ = θ1 is given by

   D(x) = f(x; θ1)/f(x; θ0).

If this statistic defines the same ranking of outcomes (i.e. the same critical regions) for all θ1 ∈ Ω1, then D is uniformly most powerful for testing H0 versus H1.
In Example 12.10.2, one obtains the same ranking of outcomes from smallest (most favorable) to largest (least favorable) whenever µ1 > µ0. Hence there exists a uniformly most powerful statistic for testing H0: µ = µ0 versus H1: µ > µ0. Similarly, there exists a uniformly most powerful statistic for testing H0: µ = µ0 versus H1: µ < µ0. However there is no uniformly most powerful statistic for testing H0: µ = µ0 versus H1: µ ≠ µ0, because a different ranking is obtained for µ1 < µ0 than for µ1 > µ0.
A similar result can be established for any distribution belonging to the exponential family (Example 15.1.6).

Discussion

1. We have seen that there generally will not exist a statistic which is uniformly most powerful for testing H0: θ = θ0 against a two-sided alternative θ ≠ θ0. In fact, uniformly most powerful tests will rarely exist except in simple textbook examples. To make further progress in defining a theoretically optimum test, additional restrictions must be placed on the types of test to be considered. The restrictions usually suggested seem arbitrary and unconvincing. The situation is even less satisfactory when both the null and alternative hypotheses are composite, and we shall not give details here.

2. Although power considerations will not identify an optimum test statistic except in very special cases, a comparison of power may still be helpful in choosing between two test statistics D and D'. Given a statistic D for testing H0: θ = θ0 against H1: θ = θ1, one can determine the size α power as a function of θ1,

   Kα(θ1) = P(SL ≤ α | θ = θ1).

A graphical comparison of the power functions Kα(θ) and K'α(θ) for selected values of α may suggest that one of the statistics is preferable.

3. Another use to which power has been put is the determination of sample size. Suppose that we intend to test H: θ = θ0 using a test statistic D, and that we want to be 90% sure of obtaining a significance level of 5% or less if in fact θ = θ1. Then the sample size n should be selected so that K0.05(θ1) = 0.9. For another approach to experimental planning, see the discussion of expected information in Section 11.6.

4. Power comparisons are not likely to be very useful unless one can be quite specific with respect to the alternative hypothesis. In many applications of significance tests, one will have only a vague idea concerning the types of departure that may occur. One would like to avoid building elaborate models to explain these until the need for them has been demonstrated in a test of significance.
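The powers in Example 12.10.1, and the power function of point 2, can be reproduced with the standard normal c.d.f. A minimal sketch in plain Python via math.erf (this computation gives K'0.05 ≈ 0.52):

```python
from math import erf, sqrt

def F(z):
    """Standard normal c.d.f."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Example 12.10.1: test H0: mu = 0 against H1: mu = 2 at size 0.05.
d_alpha = 1.645   # one-sided 5% point: F(1.645) = 0.95
d_prime = 1.960   # two-sided: F(1.960) = 0.975

# Power of D = X: P(X >= d_alpha | X ~ N(2, 1)).
K = 1 - F(d_alpha - 2)

# Power of D' = |X|: P(|X| >= d_prime | X ~ N(2, 1)).
K_prime = 1 - F(d_prime - 2) + F(-d_prime - 2)

print(round(K, 3), round(K_prime, 3))  # ≈ 0.639 and ≈ 0.516

# Point 2: the power function of D as a function of mu1,
# K_alpha(mu1) = 1 - F(mu0 + z_alpha - mu1), here with mu0 = 0.
def power_D(mu1, z_alpha=1.645, mu0=0.0):
    return 1 - F(mu0 + z_alpha - mu1)
```

D remains more powerful than D' here, as the Neyman–Pearson lemma guarantees for this one-sided alternative.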
CHAPTER 13

Analysis of Normal Measurements

13.1. Introduction

The normal distribution plays a central role in the modelling and statistical analysis of continuous measurements. Many types of measurements have distributions which are approximately normal, and the Central Limit Theorem helps to explain why this is so. Statistical methods for analyzing normally distributed measurements are relatively simple, and most of these methods give reasonable results under moderate departures from normality.
Section 1 discusses the basic assumptions and describes the models to be considered in later sections. All of these are examples of normal linear models, which will be discussed in greater generality in Chapter 14. Section 13.2 describes statistical methods for such models. These methods are applied to the one-sample and two-sample models in Sections 3 and 4, and to the straight line model in Sections 5 and 6. Section 7 discusses the analysis of paired measurements, such as measurements taken on the same subject before and after treatment.

Suppose that n determinations y1, y2, ..., yn are made of the same quantity y under various different conditions. For instance, gasoline mileages achieved by a car over a fixed distance might be recorded for several driving speeds, weather conditions, etc. We wish to formulate a model which describes or explains the way in which y depends upon these conditions. Hopefully the model will help us to understand how the various factors affect mileage, and to estimate the magnitudes of their effects.
Any realistic model will have to take natural variability into account. If we measure mileages repeatedly under conditions that are identical, or as close to identical as we can make them, we will not always get exactly the same result. There will be scatter, or variability, in observations made under identical conditions. We model this by assuming that the observations y1, y2, ..., yn are observed values of random variables Y1, Y2, ..., Yn. The problem is then to determine how the probability distribution of Yi depends upon the conditions under which this observation was made.
If the conditions are very different for Yi than for Yj, the probability distributions for Yi and Yj may be of completely different types. For instance, suppose that we are observing failure times of plastic gears at various temperatures. Gears fail due to melting at very high temperatures, whereas at low temperatures they become brittle and tend to fracture. There is no reason to suppose that the distributions of lifetimes will be similar at these two extremes.
In most studies we deal with relatively small changes in conditions. Then we expect the distributions of Y1, Y2, ..., Yn to be similar to one another. Thus we might reasonably assume that the Yi's all have the same type of distribution, and that the effect of changing conditions is to alter the value of a parameter in this distribution. This is the sort of assumption we made in Section 10.5, where we were examining the dependence of the response rate on the dose of a drug. We assumed that all of the Yi's were independent and binomially distributed, and that the only effect of changing the dose was to alter the response probability p.

The Basic Assumptions

In this chapter and the next one, we develop the model and analysis under the assumption that the Yi's are independent and normally distributed with the same variance σ², so that

   Yi ~ N(µi, σ²)   for i = 1, 2, ..., n.                      (13.1.1)

Under these assumptions, the effect of changing conditions is to alter µi, the expected value of Yi, but the shape and spread of the distribution are not affected.
The model can also be written in terms of n independent error variables e1, e2, ..., en, where

   ei = Yi − µi ~ N(0, σ²).

We can then write Yi as the sum of a "target value" µi and a "random error" or "noise" component ei:

   Yi = µi + ei,   where ei ~ N(0, σ²).                        (13.1.2)

If we take repeated measurements under the same conditions, then µi stays the same, but the random error ei varies from one repetition to the next.
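The decomposition (13.1.2) is easy to simulate, which makes the assumptions concrete: repeated measurements share one target value µ while the error term produces the scatter. A minimal sketch in Python (the particular µ, σ, and sample size are illustrative choices, not from the text):

```python
import random
import statistics

random.seed(1)

mu, sigma = 22.0, 9.2   # illustrative target value and error s.d.
n = 10_000

# y_i = mu + e_i with e_i ~ N(0, sigma^2): same target, varying error.
errors = [random.gauss(0.0, sigma) for _ in range(n)]
y = [mu + e for e in errors]

ybar = statistics.fmean(y)
s = statistics.stdev(y)
print(ybar, s)   # close to 22 and 9.2 for large n
```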
The standard deviation σ measures the amount of random variability (scatter, noise) that one would expect to obtain in repeated measurements taken under the same conditions. Suppose that Yi and Yj are independent measurements with the same expected value …

… observed lifetimes. The distribution of log-lifetimes is generally much closer to normal in shape, and the log transformation also helps to stabilize the variance.

The usual estimate of σ² is

   s² = Σêi²/(n − q) = (numerator of χ² quantity)/(degrees of freedom).   (13.2.5)

We shall show at the end of this section that s² is the MLE of σ² based on the marginal distribution of the residual sum of squares.

Inferences for α, β, ...: σ² Unknown

Usually σ² is unknown, and is estimated by s² as defined in (13.2.5). By (13.2.4) we have …
Inferences for σ

It can be argued that, when the parameters α, β, ... are unknown, the residual sum of squares Σêi² carries all of the information from the yi's concerning σ. Inferences about σ will therefore be based on the marginal distribution of Σêi².
By (13.2.4) we have

   V ≡ Σêi²/σ² = νs²/σ² ~ χ²(ν).

By (6.9.1), the p.d.f. of V is

   f(v) = kν v^(ν/2 − 1) e^(−v/2)   for v > 0,

where ν = n − q and kν is a constant. If we now change variables using (6.1.11), we find that the p.d.f. of Σêi² is

   f(v)·|dv/dΣêi²| = kν v^(ν/2 − 1) e^(−v/2) · (1/σ²)
      = kν (νs²/σ²)^(ν/2 − 1) exp{−νs²/(2σ²)} · (1/σ²).

Based on this distribution, the log likelihood function of σ is

   l(σ) = −ν log σ − νs²/(2σ²)   for σ > 0.

Setting l'(σ) = 0 gives σ̂ = s. Thus s is the MLE of σ, and s² is the MLE of σ², based on the marginal distribution of the residual sum of squares.
The log relative likelihood function of σ is

   r(σ) = l(σ) − l(s)
        = −ν log σ − νs²/(2σ²) + ν log s + ν/2
        = −(ν/2)[s²/σ² − 1 − log(s²/σ²)].                      (13.2.8)

To obtain an approximate 95% confidence interval for σ, we note from Table B4 that

   P{χ²(1) ≤ 3.841} = 0.95.

We then solve the inequality

   −2r(σ) ≤ 3.841,

either by plotting r(σ) or by using Newton's method as in Section 9.8. The interval obtained is also a 14.7% likelihood interval and an approximate 5% significance interval for σ.
Alternately, a 95% confidence interval for σ can be obtained directly from (13.2.4). From Table B4, we find values a, b such that

   P{χ²(n−q) ≤ a} = P{χ²(n−q) ≥ b} = 0.025.

We then have

   P{a ≤ (n − q)s²/σ² ≤ b} = 0.95,

and therefore the interval

   (n − q)s²/b ≤ σ² ≤ (n − q)s²/a

has coverage probability 0.95.
The second construction involves less arithmetic than the first, but it does not produce a likelihood interval. The interval will include some values of σ² at the high end which are less likely than values excluded at the lower end (see Problem 11.4.10 and Example 13.3.2). For this reason, the first construction based on the likelihood ratio statistic is preferable.

PROBLEMS FOR SECTION 13.2

1. Suppose that Y1, Y2, ..., Yn are independent N(α, σ²).
(a) Show that, if σ is known, the likelihood ratio statistic for testing a value of α is D = Z², where Z is defined in (13.2.6) and c = 1/n.
(b) Show that, if σ is unknown, the likelihood ratio statistic for testing a value of α is …
3. Let Y1, Y2, ..., Yn be independent variates, with

   Yi ~ N(µi, σ²/wi)   for i = 1, 2, ..., n,

where w1, w2, ..., wn are known positive constants. The µi's are assumed to be linear functions of q unknown parameters α, β, ....
(a) Derive the log likelihood function, and show that α̂, β̂, ... are the parameter values which minimize the weighted sum of squares Σwi(yi − µi)².
(b) Show that α̂, β̂, ... are linear combinations of the yi's.
Note: This type of model might be used when the measurements y1, y2, ..., yn are not made with equal precision. Observations made with high precision are given large weights wi, while less precise observations are given smaller weights. The estimates α̂, β̂, ... are called weighted least squares estimates. Because of (b), the statistical methods described in this section can be extended easily to the weighted least squares case. It can be shown that Σwiêi²/σ² ~ χ²(n−q), and so the appropriate variance estimate is s² = Σwiêi²/(n − q).

13.3. The One-Sample Model

… The derivative of S = Σ(yi − α)² with respect to α is

   ∂S/∂α = −2Σ(yi − α) = −2[Σyi − nα].

Note that the derivative of S is linear in both α and the observations y1, y2, ..., yn. Setting the derivative equal to zero gives

   α̂ = (1/n)Σyi = ȳ.

The MLE of α is the sample mean ȳ. Note that α̂ is a linear combination of the yi's:

   α̂ = a1y1 + a2y2 + ... + anyn,

where a1 = a2 = ... = an = 1/n. The sampling distribution of α̂ is N(α, σ²c), where

   c = a1² + a2² + ... + an² = n(1/n)² = 1/n

(see (6.6.8)).
Upon replacing α by α̂ in (13.3.1), we obtain the fitted values

   µ̂1 = µ̂2 = ... = µ̂n = α̂ = ȳ.

Thus the ith residual is

   êi = yi − ȳ. …

The latter two formulas must be used cautiously, because they are highly susceptible to roundoff errors. Having obtained α̂, c, and s², one can use the methods described in Section 13.2 to make inferences about α and σ.
208  13. Analysis of Normal Measurements
13.3. The One-Sample Model
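The minimization carried out in this derivation is easy to check numerically. The sketch below (plain Python, using the ten blood-pressure increases of Example 13.3.1 for illustration) confirms that S(α) is smallest at the sample mean and that the derivative vanishes there:

```python
# Check that S(alpha) = sum((y_i - alpha)^2) is minimized at alpha-hat = ybar,
# and that dS/dalpha = -2*(sum(y) - n*alpha) vanishes at the minimum.
y = [18, 27, 23, 15, 18, 15, 18, 20, 17, 8]   # Example 13.3.1 data
n = len(y)
ybar = sum(y) / n                             # 17.9

def S(alpha):
    return sum((yi - alpha) ** 2 for yi in y)

def dS(alpha):
    return -2 * (sum(y) - n * alpha)

assert S(ybar) < S(ybar - 0.1) and S(ybar) < S(ybar + 0.1)
assert abs(dS(ybar)) < 1e-9
```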
EXAMPLE 13.3.1. A standard drug produces blood pressure increases which are normally distributed with mean µ = 22 and standard deviation σ = 9.2. A new drug is also expected to produce normally distributed increases, but with possibly different values of µ and σ. The new drug was given to ten individuals, and it produced the following blood pressure increases:

    18  27  23  15  18  15  18  20  17  8.

Is there evidence that µ ≠ 22? that σ ≠ 9.2?

SOLUTION. We assume that the measurements y1, y2, ..., y10 are observed values of independent N(µi, σ²) random variables, and that

    µ1 = µ2 = ··· = µ10 = α.

We find that

    n = 10;  Σyi = 179;  Σyi² = 3433;  ȳ = 17.9;
    S = Σ(yi - ȳ)² = Σyi² - (1/n)(Σyi)² = 228.9.

The new drug produces a lower mean increase than the standard drug. It is also less variable in its effect. This is an advantage, because the effect of the new drug on an individual can be more accurately predicted. □

EXAMPLE 13.3.2. Eight plastic gears were tested at 21°C until they failed. The times to failure (in millions of cycles) were as follows:

    2.37  2.01  2.47  2.20  1.87  2.32  2.00  2.86.

A common assumption in such situations is that the logarithms of the failure times are normally distributed. The natural logarithms of the eight times listed above are as follows:

    0.863  0.698  0.904  0.788  0.626  0.842  0.693  1.051.

Assuming these to be independent observations from N(µ, σ²), find 95% confidence intervals for µ and σ.
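The summary quantities used in solutions like the one above can be reproduced in a few lines. This sketch uses the blood-pressure increases of Example 13.3.1 and checks both forms of the corrected sum of squares, since the shortcut form is the one used for hand calculation:

```python
# Summary statistics for the one-sample model (Example 13.3.1 data).
y = [18, 27, 23, 15, 18, 15, 18, 20, 17, 8]
n = len(y)
total, total_sq = sum(y), sum(yi ** 2 for yi in y)
ybar = total / n
S_direct = sum((yi - ybar) ** 2 for yi in y)   # sum of squared deviations
S_shortcut = total_sq - total ** 2 / n         # hand-calculation form

assert (n, total, total_sq) == (10, 179, 3433)
assert abs(ybar - 17.9) < 1e-12
assert abs(S_direct - 228.9) < 1e-9 and abs(S_shortcut - 228.9) < 1e-9
```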
Since P{χ²(1) ≤ 3.841} = 0.95, the required interval is found by solving the inequality

    -2r(σ) ≤ 3.841.

Since v = n - 1 = 7 and s² = 0.01876, (13.2.8) gives

    -2r(σ) = 7[ 0.01876/σ² - 1 - log(0.01876/σ²) ].

By plotting this function or using Newton's method, we find the interval to be 0.088 ≤ σ ≤ 0.258. This is also a 14.7% likelihood interval for σ, and an approximate 5% significance interval for σ.

Alternatively, we can construct a 95% confidence interval for σ by using the fact that 7s²/σ² ~ χ²(7). Here n - 1 = 7, and Table B4 gives

    P{χ²(7) ≥ 16.01} = P{χ²(7) ≤ 1.690} = 0.025.

It follows that

    P{ 1.690 ≤ 7s²/σ² ≤ 16.01 } = 0.95,

and therefore the interval

    7s²/16.01 ≤ σ² ≤ 7s²/1.690

has coverage probability 0.95. Upon substituting for s² and taking square roots, we obtain 0.091 ≤ σ ≤ 0.279 as the 95% confidence interval for σ.

Note that the second construction does not produce a likelihood interval. The interval 0.091 ≤ σ ≤ 0.279 includes values at the high end which are less likely than some values excluded at the lower end. The first construction does produce a likelihood interval, and it is therefore preferable. □

PROBLEMS FOR SECTION 13.3

1. The following are the initial velocities in meters per second of seven projectiles fired from the same gun:

    451  447  454  450  454  449  452

Assuming that velocity is normally distributed, obtain a 90% confidence interval for the mean velocity µ.

2. The relationship between parental age and the incidence of mongolism was discussed in Section 7.5. It is known that, in a certain population, the mean age of mothers at normal births is 31.25. The average age of the mothers at 50 births of mongolian children was 37.25 years, with sample variance s² = 49.35. Are these observations consistent with the hypothesis µ = 31.25?

3.† Under a special diet, twelve rats made the following weight gains (in grams) from birth to age three months. Assuming that weight gains are independent N(µ, σ²), obtain 95% confidence intervals for µ and for σ². Use two methods to find the confidence interval for σ², and compare the results.

4.† A manufacturer wishes to determine the mean breaking strength µ of string "to within a pound", which we interpret as requiring that the 95% confidence interval for µ should have length at most 2 pounds. If measurements are independent N(µ, σ²), and if ten preliminary measurements gave Σ(xi - x̄)² = 80, how many additional measurements would you advise the manufacturer to make?

5. Sixteen packages are randomly selected from the production of a detergent packaging machine. Their weights (in grams) were as follows:

    287  293  295  295  297  298  299  300
    300  302  302  303  306  307  308  311

It may be assumed that weights are independent N(µ, σ²).
(a) Determine 95% confidence intervals for µ and σ.
(b) Assuming that µ and σ are equal to their estimates, find an interval which contains the weight of a new randomly chosen package with probability 0.95.

6. Ten steel ingots chosen at random from a large shipment gave the following hardness measures:

    71.7  71.1  68.0  69.6  69.1
    69.4  68.8  70.4  69.3  68.2

If the manufacturing process is under control, the hardness measures should be independent N(µ, σ²) with σ² = 1.2.
(a) Are the ten observations consistent with the hypothesis σ² = 1.2?
(b) Assuming that σ² = 1.2, find a 90% confidence interval for µ.
(c) Find a 90% confidence interval for µ if σ² is unknown and must be estimated from the data.

7.† The following are trypanosome counts (in thousands) in cattle seven days after infection:

    17.0   2.1   1.7  44.2   5.1   2.9   3.5  19.6
    28.0   7.0  17.1   0.7  34.5  13.0   1.5   5.2
     9.0   5.9   3.9  11.5  14.5  16.2  33.3  12.2

13.4. The Two-Sample Model

This model states that observations within a sample differ from one another only because of random variation. There are q = 2 unknown parameters, µ1 and µ2.

Another way to write the two-sample model is

    µ11 = µ12 = ··· = µ1n1 = α
    µ21 = µ22 = ··· = µ2n2 = α + β     (13.4.2)

where α = µ1 and β = µ2 - µ1. An advantage of this parametrization is that the difference in the means µ2 - µ1, which is usually the quantity of primary interest, is explicitly represented by the parameter β.

Upon substituting (13.4.1) into (13.2.2), we find that the error sum of squares is

    S = Σ(y1j - µ1)² + Σ(y2j - µ2)²,

the first sum being over j = 1, 2, ..., n1 and the second over j = 1, 2, ..., n2. Since there are q = 2 unknown parameters µ1 and µ2 (or α and β), there are n - 2 degrees of freedom for variance estimation, and (13.2.5) gives

    s² = ΣΣ ε̂ij²/(n - 2) = [ (n1 - 1)s1² + (n2 - 1)s2² ] / [ (n1 - 1) + (n2 - 1) ].

The combined or pooled estimate s² based on both samples is a weighted average of the two sample variances s1² and s2², with weights equal to their degrees of freedom n1 - 1 and n2 - 1.

Inferences for β = µ2 - µ1

For inferences about β, we follow the procedures described in Section 13.2. First we find the sampling distribution of β̂ = Ȳ2 - Ȳ1. Note that

    Ȳ1 ~ N(µ1, σ²/n1);   Ȳ2 ~ N(µ2, σ²/n2).
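The variance constant for a difference of sample means follows from the linear-combination rule c = Σai² used throughout this chapter. A short check (the sample sizes n1 = 9, n2 = 15 are those of Example 13.4.1 below):

```python
# beta-hat = ybar2 - ybar1 is a linear combination of all n1 + n2
# observations, with coefficients -1/n1 (first sample) and +1/n2 (second).
n1, n2 = 9, 15
coeffs = [-1 / n1] * n1 + [1 / n2] * n2
c = sum(a * a for a in coeffs)      # variance constant: Var = sigma^2 * c
assert abs(c - (1 / n1 + 1 / n2)) < 1e-12
```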
Furthermore, Ȳ1 and Ȳ2 are independent because Y11, Y12, ..., Y1n1 are independent of Y21, Y22, ..., Y2n2. It follows by (6.6.7) that

    Ȳ2 - Ȳ1 ~ N( µ2 - µ1, σ²(1/n1 + 1/n2) ).

Inferences about β are based on

    T = (β̂ - β)/√(s²c) ~ t(n - 2).

Here s² is the pooled variance estimate with n - 2 degrees of freedom, and T has the same degrees of freedom as s². We can now test an hypothesis β = β0 or find confidence intervals for β as in Section 13.2.

EXAMPLE 13.4.1. Cuckoos lay their eggs in the nests of other birds. Table 13.4.1 gives the lengths in millimeters of n = 24 cuckoos' eggs found in nests of reed-warblers and wrens. The data are from a paper by O.H. Latter in Biometrika (1902). The table also shows the two sample means and sample variances.

Table 13.4.1. Lengths in Millimeters of 24 Cuckoos' Eggs

Sample 1: eggs from reed-warblers' nests
    21.2  21.6  21.9  22.0  22.0  22.2  22.8  22.9  23.2
    n1 = 9;  ȳ1 = 22.20;  s1² = 0.4225 (8 d.f.)

Sample 2: eggs from wrens' nests
    19.8  20.0  20.3  20.8  20.9  20.9  21.0  21.0  21.0  21.2  21.5  22.0  22.0  22.1  22.3
    n2 = 15;  ȳ2 = 21.12;  s2² = 0.5689 (14 d.f.)

The average length of cuckoos' eggs is 22.20 in the first sample, and only 21.12 in the second sample. It appears as though the lengths of cuckoos' eggs may depend upon the locations in which they are found. On the other hand, since only 24 eggs were measured, it may be that the observed difference between ȳ1 and ȳ2 is due merely to random variation. We wish to determine whether the observed difference is too great to be attributed to random variation.

We model the 24 measurements as observed values of independent normal variates with the same variance. We assume that the expected length is µ1 for the first sample, and µ2 for the second sample. This is the two-sample model. We wish to know whether H: µ1 = µ2, or equivalently H: β = 0, is consistent with the data. To test H: β = 0, we compute

    T_obs = (β̂ - 0)/√(s²c) = -3.57;
    SL = P{ |t(22)| ≥ 3.57 } ≈ 0.002

from Table B3. If µ1 and µ2 were equal, a difference as large as that observed would very rarely occur, and so there is strong evidence that µ1 ≠ µ2.

EXAMPLE 13.4.2. The log-lifetimes of 8 plastic gears tested at 21°C were analyzed in Example 13.3.2. The following are the log-lifetimes of 4 additional gears tested at 30°C:

    0.364  0.695  0.558  0.359.

It may be assumed that log-lifetimes are independent and normally distributed with the same variance σ², but with expected value µ1 at 21°C and µ2 at 30°C. Find 95% confidence intervals for µ1, µ2 - µ1, and σ.

SOLUTION. The sample means and variances are as follows:

    Sample 1 (21°C):  n1 = 8;  ȳ1 = 0.8081;  s1² = 0.01876 (7 d.f.).
    Sample 2 (30°C):  n2 = 4;  ȳ2 = 0.4940;  s2² = 0.02654 (3 d.f.).

The pooled variance estimate is

    s² = (7s1² + 3s2²)/(7 + 3) = 0.02109  (10 d.f.).

The sampling distribution of µ̂1 = ȳ1 is N(µ1, σ²c), where c = 1/n1. Inferences about µ1 are based on

    T = (µ̂1 - µ1)/√(s²c) ~ t(n - 2)
where s² is the pooled variance estimate with n - 2 = 10 degrees of freedom. From Table B3 we find that

    P{ -2.228 ≤ t(10) ≤ 2.228 } = 0.95,

and hence the 95% confidence interval is

    µ1 ∈ µ̂1 ± 2.228√(s²c) = 0.8081 ± 0.1144.

This differs from the construction given in Example 13.3.2 because now we are using both samples to estimate σ². Thus s² is different, and there are more degrees of freedom for T.

Next we consider β = µ2 - µ1. The estimate β̂ = ȳ2 - ȳ1 has sampling distribution N(β, σ²c), where now c = 1/n1 + 1/n2 = 1/8 + 1/4. Inferences about β are based on

    T = (β̂ - β)/√(s²c) ~ t(10),

and the 95% confidence interval for β is

    β ∈ β̂ ± 2.228√(s²c) = -0.3140 ± 0.1981;

that is, 0.1159 ≤ µ1 - µ2 ≤ 0.5121.

Two methods of constructing confidence intervals for σ were described at the end of Section 13.2. We shall use the first method, which is based on (13.2.9). Here we have s² = 0.02109 with v = 10 degrees of freedom, so

    D = -2r(σ) = 10[ 0.02109/σ² - 1 - log(0.02109/σ²) ].

We solve the inequality -2r(σ) ≤ 3.841 to obtain

    0.0991 ≤ σ ≤ 0.2425

as the (approximate) 95% confidence interval for σ. □

More Than Two Samples

The comments in Section 9.2 concerning the pooling of estimates are relevant here. It is usually not a good idea to combine estimates s1², s2², ..., sk² which are significantly different from one another. A formal test for homogeneity can be carried out as in Section 12.3. For details, see the problems at the end of this section.

EXAMPLE 13.4.3. Table 13.4.2 shows data from a study on a pulse-jet pavement breaker using nozzles of three different diameters. The measurement is the penetration (in millimeters) of a concrete slab produced by a single discharge. The table also gives the sample mean ȳi and sample variance si² for each sample. Since n1 = n2 = n3 = 9, each of the sample variances has 8 degrees of freedom.

Let yij denote the jth observation for the ith nozzle type, where i = 1, 2, 3 and j = 1, 2, ..., 9. We assume that the yij's are observed values of independent N(µij, σ²) variates, and that measurements made with the same type of nozzle have the same expected value:

    µi1 = µi2 = ··· = µi9 = µi   for i = 1, 2, 3.

This is the three-sample model with n1 = n2 = n3 = 9. The variance estimate for this model is

    s² = (8s1² + 8s2² + 8s3²)/(8 + 8 + 8) = 3360/24 = 140,

with n - 3 = 24 degrees of freedom.

Suppose that we are interested in µ3 - µ2, which is the difference in expected penetration for large and medium nozzles. This is estimated by ȳ3 - ȳ2, which has sampling distribution N(µ3 - µ2, σ²c) where c = 1/9 + 1/9. Inferences about µ3 - µ2 are based on

    T = [ (ȳ3 - ȳ2) - (µ3 - µ2) ] / √(s²c) ~ t(24).

In particular, the 95% confidence interval is µ3 - µ2 ∈ (ȳ3 - ȳ2) ± t√(s²c), where t is such that P{ -t ≤ t(24) ≤ t } = 0.95.
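All of the numbers in Example 13.4.2 can be reproduced from the sample summaries alone. The sketch below takes the t value 2.228 and the χ² cutoff 3.841 from tables, as the text does, and finds the likelihood interval for σ by bisection (the text suggests plotting or Newton's method; bisection is an equally workable choice):

```python
import math

# Sample summaries from Example 13.4.2 (log-lifetimes of plastic gears).
n1, ybar1, s1_sq = 8, 0.8081, 0.01876    # 21 C sample
n2, ybar2, s2_sq = 4, 0.4940, 0.02654    # 30 C sample
t10, cutoff = 2.228, 3.841               # t(10) and chi-square(1) table values

# Pooled variance estimate, (n1 - 1) + (n2 - 1) = 10 degrees of freedom.
s_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# 95% intervals for mu1 (c = 1/n1) and beta = mu2 - mu1 (c = 1/n1 + 1/n2).
half_mu1 = t10 * math.sqrt(s_sq / n1)
beta_hat = ybar2 - ybar1
half_beta = t10 * math.sqrt(s_sq * (1 / n1 + 1 / n2))

# Likelihood interval for sigma: solve -2 r(sigma) = cutoff on each side of
# sigma-hat, where -2 r(sigma) = v*(s^2/sigma^2 - 1 - log(s^2/sigma^2)).
v = 10

def D(sigma):
    r = s_sq / sigma ** 2
    return v * (r - 1 - math.log(r))

def solve(lo, hi):
    # Bisection for D(sigma) = cutoff; D is monotone between lo and hi.
    for _ in range(100):
        mid = (lo + hi) / 2
        if (D(lo) - cutoff) * (D(mid) - cutoff) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

sigma_hat = math.sqrt(s_sq)
lower, upper = solve(0.01, sigma_hat), solve(sigma_hat, 1.0)

assert abs(s_sq - 0.02109) < 1e-4
assert abs(half_mu1 - 0.1144) < 1e-3 and abs(half_beta - 0.1981) < 1e-3
assert abs(lower - 0.0991) < 0.002 and abs(upper - 0.2425) < 0.002
```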
4. The following are the concentrations measured for ten cut shoots and ten rooted plants:

    Cut shoots:     55  65  61  48  57  58  60  68  52  63
    Rooted plants:  53  48  50  39  43  44  46  56  35  51

Assuming that concentrations are independent normal with constant variance, obtain a 99% confidence interval for the difference in mean concentration for the two types of plant.

5. Testing equality of variances. Let s1², s2², ..., sk² be independent variance estimates, where vi si²/σi² ~ χ²(vi) for i = 1, 2, ..., k.
(a) Find the joint log likelihood function of σ1, σ2, ..., σk, and show that it is maximized for σ̂i² = si² (i = 1, 2, ..., k).

(a) Test the hypothesis that the variance in the readings is the same for the two objects.
(b) Assuming a common variance, obtain a 90% confidence interval for the difference in weights of the two objects.

9.† The following are the distances traveled (in miles) by 15 rockets used to test 3 different fuels.

    Fuel 1:  16.2  17.3  17.0  16.6  17.4
    Fuel 2:  18.6  18.6  19.0  19.5  20.0
    Fuel 3:  19.7  19.4  20.0  19.2  18.9

(a) Test the hypothesis that the variability in distance traveled is the same for the three fuels.
(b) Assuming equal variances, obtain a 95% confidence interval for the common variance.
(c) Find a 95% confidence interval for the difference in mean distance traveled for fuels 2 and 3.
(d) State the assumptions upon which the analysis in (a), (b), and (c) depends.

10. Let s1², s2², ..., sk² be independent variance estimates as in Problem 5 above. Consider the hypothesis

    H: σi² = σ²/wi   for i = 1, 2, ..., k,

where w1, w2, ..., wk are known positive constants and σ² is unknown.
(a) Show that, under H, the MLE of σ² is

    σ̂² = (Σ vi wi si²)/(Σ vi).

(b) Derive the likelihood ratio statistic for testing H.

11. Consider the two-sample model (13.4.1) with σ known. Show that the likelihood ratio statistic for testing H: µ1 = µ2 is D = Z², where

    Z = (Ȳ2 - Ȳ1) / √( σ²(1/n1 + 1/n2) ).

Hence show that, if H is true, D has a χ² distribution with one degree of freedom.

13.5. The Straight Line Model

Consider n measurements y1, y2, ..., yn, but now suppose that each measurement yi has associated with it a value xi of another variable, and that the x-values can be used to explain or predict the corresponding y-values. We call x the explanatory variable or independent variable, and y the response variable or dependent variable.

For instance, y1, y2, ..., yn could be blood pressure increases for n subjects who received doses x1, x2, ..., xn of a drug. Or y1, y2, ..., yn might be gasoline mileages achieved by a car in n tests at driving speeds x1, x2, ..., xn. Or y1, y2, ..., yn might be log lifetimes of n plastic gears tested at temperatures x1, x2, ..., xn. In each case, knowledge of the x-value will help to predict or explain the value of y.

The xi's will be treated as known constants in the analysis, and the yi's will be modelled as observed values of random variables Y1, Y2, ..., Yn. We shall assume that the Yi's are independent and normally distributed with the same variance σ², so that

    Yi ~ N(µi, σ²)   for i = 1, 2, ..., n.

The means µ1, µ2, ..., µn will then be modelled as functions of the explanatory variable x.

In many applications it is reasonable to assume that the dependence of E(Y) on x is linear. Even if the relationship is nonlinear, a straight line will often give a satisfactory approximation over a restricted range of x-values. It is advisable to plot the data in order to see whether a straight line model will be satisfactory (see Example 13.5.3).

Under the straight-line model, we have

    µi = α + βxi   for i = 1, 2, ..., n.     (13.5.1)

There are q = 2 unknown parameters, the intercept α and the slope β. For historical reasons, the straight line model is also called a simple linear regression model. The origin of the term "regression" is explained in Section 7.5.

Estimation of α and β

Upon substituting (13.5.1) into (13.2.2), we find that the error sum of squares is

    S = Σ(yi - µi)² = Σ(yi - α - βxi)².

The derivatives of S are

    ∂S/∂α = -2Σ(yi - α - βxi);   ∂S/∂β = -2Σxi(yi - α - βxi).

Putting ∂S/∂α = 0 gives

    Σyi - nα̂ - β̂Σxi = 0.

We divide by n and solve for α̂ to obtain

    α̂ = ȳ - β̂x̄.     (13.5.2)

We now put ∂S/∂β equal to zero and substitute for α̂ to obtain

    0 = Σxi(yi - α̂ - β̂xi)
      = Σxi(yi - ȳ + β̂x̄ - β̂xi)
      = Σxi(yi - ȳ) - β̂Σxi(xi - x̄).

It follows that

    β̂ = Σxi(yi - ȳ) / Σxi(xi - x̄) = Sxy/Sxx.     (13.5.3)

The numerator in (13.5.3) is called the corrected sum of products, and it can be rewritten in several forms:

    Sxy = Σ(yi - ȳ)xi = Σ(xi - x̄)yi = Σ(xi - x̄)(yi - ȳ)
        = Σxiyi - nx̄ȳ = Σxiyi - (1/n)(Σxi)(Σyi).
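The equivalent forms of Sxy, and the identities that the normal equations impose on the residuals, are easy to verify numerically. A small sketch (the data here are made up purely for illustration):

```python
# Verify the equivalent forms of Sxy and the least squares normal equations.
x = [1.0, 2.0, 4.0, 7.0, 9.0]      # illustrative data only
y = [2.1, 2.9, 5.2, 8.8, 11.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

forms = [
    sum((yi - ybar) * xi for xi, yi in zip(x, y)),
    sum((xi - xbar) * yi for xi, yi in zip(x, y)),
    sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)),
    sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar,
]
assert all(abs(f - forms[0]) < 1e-9 for f in forms)

Sxx = sum((xi - xbar) ** 2 for xi in x)
b = forms[0] / Sxx                  # slope, eq. (13.5.3)
a = ybar - b * xbar                 # intercept, eq. (13.5.2)

# The normal equations imply the residuals satisfy these two identities:
res = [yi - a - b * xi for xi, yi in zip(x, y)]
assert abs(sum(res)) < 1e-9
assert abs(sum(xi * ri for xi, ri in zip(x, res))) < 1e-9
```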
The denominator is the corrected sum of squares of the xi's:

    Sxx = Σ(xi - x̄)xi = Σ(xi - x̄)²
        = Σxi² - nx̄² = Σxi² - (1/n)(Σxi)².

Variance Estimation

The fitted values and residuals are given by

    µ̂i = α̂ + β̂xi;   ε̂i = yi - µ̂i.

It can be shown that the residual sum of squares is

    Σε̂i² = Syy - β̂Sxy,     (13.5.4)

where Syy = Σ(yi - ȳ)² is the corrected sum of squares of the yi's, and the variance estimate is

    s² = Σε̂i²/(n - 2).     (13.5.5)

Formula (13.5.4) is useful for hand calculation, but it is susceptible to large roundoff errors. For calculation by computer, it is better to evaluate the residuals ε̂1, ε̂2, ..., ε̂n, square them, and sum to get Σε̂i².

EXAMPLE 13.5.1. The following table gives the age (x) and systolic blood pressure (y) for each of n = 12 women:

    x   56   42   72   36   63   47   55   49   38   42   68   60
    y  147  125  160  118  149  128  150  145  115  140  152  155

The data are plotted in Figure 13.5.1. The graph shows a roughly linear increase in blood pressure with age. The amount of scatter about the line does not show any systematic change with x, and so the assumption of constant variance σ² is reasonable.

[Figure 13.5.1. Scatterplot of systolic blood pressure (y) versus age (x), with the fitted line.]

We assume that the y's are observed values of independent N(µi, σ²) variates and that the straight line model (13.5.1) holds. From the data we find β̂ = Sxy/Sxx = 1.138 and

    α̂ = ȳ - β̂x̄ = 80.78.

The fitted line y = 80.78 + 1.138x is shown in Figure 13.5.1. By (13.5.4), the residual sum of squares is

    Σε̂i² = Syy - β̂Sxy = 492.47.

The estimate of the variance about the line is

    s² = Σε̂i²/(n - 2) = 49.247

with n - 2 = 10 degrees of freedom.

EXAMPLE 13.5.2. In Examples 13.3.2 and 13.4.2 we considered data from endurance tests of plastic gears at 21°C and 30°C. These data came from an experiment in which n = 40 gears were tested at nine different temperatures. Examination of the 40 lifetimes revealed that the lifetime distribution has a long tail to the right, and that there is more variability in the lifetimes at the lower temperatures. It would not be appropriate to assume that the lifetimes were normally distributed with constant variance. Instead, we analyze log lifetimes as in Examples 13.3.2 and 13.4.2.
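The estimates quoted in Example 13.5.1 can be reproduced in a few lines, following the computer route recommended above (residuals are squared and summed directly rather than via the shortcut formula):

```python
# Straight line fit for the age / blood pressure data of Example 13.5.1.
x = [56, 42, 72, 36, 63, 47, 55, 49, 38, 42, 68, 60]
y = [147, 125, 160, 118, 149, 128, 150, 145, 115, 140, 152, 155]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = Sxy / Sxx                       # slope estimate
a = ybar - b * xbar                 # intercept estimate

rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
s_sq = rss / (n - 2)                # variance estimate, 10 d.f.

assert abs(b - 1.138) < 1e-3
assert abs(a - 80.78) < 0.01
assert abs(s_sq - 49.247) < 0.01
```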
The natural logarithms of the observed lifetimes (in millions of cycles) are given in Table 13.5.1, and are plotted against operating temperature in Figure 13.5.2. Note that the amount of scatter in the log lifetimes is about the same at all temperatures, and the dependence of mean log lifetime on temperature is roughly linear.

Table 13.5.1. Log Lifetimes of Plastic Gears at Nine Operating Temperatures

Temperature   Number   y = Natural logarithm of
x (°C)        tested   lifetime (in millions of cycles)
  -16           4      1.690  1.779  1.692  1.857
    0           4      1.643  1.584  1.585  1.462
   10           4      1.153  0.991  1.204  1.029
   21           8      0.863  0.698  0.904  0.788
                       0.626  0.842  0.693  1.051
   30           4      0.364  0.695  0.558  0.359
   37           4      0.412  0.425  0.574  0.649
   47           4      0.116  0.501  0.296  0.099
   57           4     -0.355 -0.269 -0.354 -0.459
   67           4     -0.736 -0.343 -0.965 -0.705

[Figure 13.5.2. Scatterplot of log lifetimes (y) versus temperature (x).]

There are 40 pairs (xi, yi), where xi is the operating temperature and yi is the log lifetime. Note that there are repeated x-values:

    x1 = x2 = x3 = x4 = -16;   x5 = x6 = x7 = x8 = 0;

and so on. We assume that the yi's are observed values of independent N(µi, σ²) variates where µi = α + βxi. We find

    Σxi² = 4(-16)² + 4(0)² + 4(10)² + 8(21)² + ··· = 53816.

The fitted line is

    y = 1.432 - 0.02946x,

and the residual sum of squares is 1.24663 with 38 degrees of freedom, giving the variance estimate s² = 0.03281.

When there are repeated x-values, it is possible to test the goodness of fit of the straight-line model (see Section 14.4). In this case, the test indicates a poor fit. The poor fit can also be seen in Figure 13.5.2, since at 5 of the 9 temperatures, all of the observed log lifetimes lie on the same side of the fitted line. See Section 14.4 for further discussion of this example, and for a possible explanation of the poor fit.

Plot the Data!

It is possible to compute α̂, β̂, and s² for any set of n pairs (xi, yi). Nothing in the arithmetic tells us whether fitting a straight line model is a sensible thing to do. It is important to plot the data and check that the straight-line model gives a reasonable fit. The graph may reveal difficulties with the model, or special features of the data which affect the interpretation. This point is illustrated in the following example, which was given by F.J. Anscombe, Graphs in Statistical Analysis, The American Statistician 27 (1973), pages 17-21.

EXAMPLE 13.5.3. Four data sets each consisting of 11 pairs (xi, yi) are given in Table 13.5.2. All four sets give approximately the same numerical results

    α̂ = 3,   β̂ = 0.5,   s² = 1.528.

However, as Figure 13.5.3 shows, the appropriate conclusions will be qualitatively different in each case.
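The numerical claim of Example 13.5.3 can be checked directly; the sketch below fits each set with the formulas of Section 13.5 (the data are Anscombe's published values, which Table 13.5.2 reproduces):

```python
# Fit each Anscombe (1973) data set; all four give nearly the same line.
x123 = list(range(4, 15))           # x = 4, 5, ..., 14 for sets 1-3
sets = [
    (x123, [4.26, 5.68, 7.24, 4.82, 6.95, 8.81, 8.04, 8.33, 10.84, 7.58, 9.96]),
    (x123, [3.10, 4.74, 6.13, 7.26, 8.14, 8.77, 9.14, 9.26, 9.13, 8.74, 8.10]),
    (x123, [5.39, 5.73, 6.08, 6.42, 6.77, 7.11, 7.46, 7.81, 8.15, 12.74, 8.84]),
    ([8] * 10 + [19],
     [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]),
]

for x, y in sets:
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    Sxx = sum((xi - xbar) ** 2 for xi in x)
    Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = Sxy / Sxx
    a = ybar - b * xbar
    s_sq = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    assert abs(b - 0.5) < 0.01 and abs(a - 3.0) < 0.01
    assert abs(s_sq - 1.528) < 0.01
```

The arithmetic cannot distinguish the four sets; only the plots in Figure 13.5.3 can.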
Table 13.5.2.

Set 1          Set 2          Set 3          Set 4
 x     y        x     y        x     y        x     y
 4    4.26      4    3.10      4    5.39      8    6.58
 5    5.68      5    4.74      5    5.73      8    5.76
 6    7.24      6    6.13      6    6.08      8    7.71
 7    4.82      7    7.26      7    6.42      8    8.84
 8    6.95      8    8.14      8    6.77      8    8.47
 9    8.81      9    8.77      9    7.11      8    7.04
10    8.04     10    9.14     10    7.46      8    5.25
11    8.33     11    9.26     11    7.81      8    5.56
12   10.84     12    9.13     12    8.15      8    7.91
13    7.58     13    8.74     13   12.74      8    6.89
14    9.96     14    8.10     14    8.84     19   12.50

The points of the first data set appear to be scattered randomly about the fitted line y = 3 + 0.5x. The straight line model gives a satisfactory fit to the data, and there are no peculiarities which need to be pointed out.

For the second data set, the dependence of Y on x is clearly not linear. The straight line model is inappropriate, and instead a quadratic polynomial model

    µi = α + βxi + γxi²   for i = 1, 2, ..., n

could be tried.

With data set #3, there is an outlying point at x = 13. It causes the fitted line to be shifted upwards, so that it does not properly fit the remaining ten points either. If we remove this point and recalculate, the fitted line is

    y = 4 + 0.346x,

which gives a close fit to the remaining ten points and a much smaller variance estimate. Both the outlier and the revised analysis should be reported.

The fourth data set shows good agreement with the straight line model, but the estimate of the slope depends entirely upon a single observation. If this observation were found to be in error and deleted, the slope could not be estimated. Furthermore, without measurements at additional values of x, there is no way of determining whether the actual dependence of y on x is even close to being linear. The fact that the analysis depends so heavily on a single observation should be reported along with the numerical results.

In summary, although all four data sets yield the same numerical results for the straight line model, different conclusions are appropriate in the four cases. Examination of a graph is an indispensable part of the statistical analysis. Some additional graphical methods for examining the adequacy of the model will be described in Section 14.5.

[Figure 13.5.3. Scatterplots of the four data sets in Table 13.5.2.]

PROBLEMS FOR SECTION 13.5

1. Theory suggests that a linear relationship exists between the shearing strength of steel bolts and their diameters. The following table gives the diameter x and strength y for 9 bolts of a particular type.

    x   1/8  1/4  3/8  1/2  5/8  3/4  7/8   1   3/2
    y    47   72   97  126  165  186  233  257  311

(a) Fit a straight line to the data, and compute the variance estimate.
(b) Plot the data and the fitted line. Note that one of the observations is seriously out of line with the others.
(c) Recalculate the fitted line and variance estimate with the outlying observation omitted, and plot the new line on the graph in (b). Briefly describe the effect of this observation on the analysis.

2.† The following are the breaking strengths of six bolts at each of five different diameters.

    Diameter    0.1   0.2   0.3   0.4   0.5
                1.62  1.71  1.86  2.14  2.45
                1.73  1.78  1.86  2.07  2.42
    Breaking    1.70  1.79  1.90  2.11  2.33
    strength    1.66  1.86  1.95  2.18  2.36
                1.74  1.70  1.96  2.17  2.38
                1.72  1.84  2.00  2.07  2.31
(a) Fit a straight line to the data and compute the variance estimate.
(b) Plot the data and the fitted line. Does the dependence of breaking strength on diameter appear to be linear? How should the model be modified?

3. The analysis in the preceding example assumed that the variance in breaking strength was the same at all five diameters. To check this assumption, compute the sample variance for the six measurements at diameter 0.1. Repeat for each of the other diameters to obtain five sample variances, each with five degrees of freedom. Now carry out a likelihood ratio test of the hypothesis that the variance is the same at all five diameters. (See Problem 13.4.5.)

4. The following table gives x, the water content of snow on April 1, and y, the water yield from April to July (in inches), for the Snake River watershed in Wyoming for 17 years (1919-35).

     x     y      x     y      x     y
    10.5  23.1   16.7  32.8   1~.2  31.8
    17.0  32.0   16.3  30.4   10.5  24.0
    23.1  39.5   12.4  24.2   24.9  52.5
    22.8  37.9   14.1  30.5   12.9  25.1
     8.8  12.4   17.4  35.1   14.9  31.5
    10.5  21.1   16.1  27.6

(a) Fit a straight line to these data, and calculate the variance estimate. Plot the data and the fitted line. Can you spot any difficulties?
(b) Suppose that the observations with the smallest and largest x-values are dropped from the analysis. Without redoing the calculations, explain what effects this will have on the estimates of the intercept, slope, and variance.

5. Archeologists use both tree ring dating and carbon dating in estimating the age of artifacts. In one study of Indian ruins, the estimated ages (in years) by tree ring dating (R) and carbon dating (C) were as follows:

     R    C     R    C     R    C
    710  795   212  222   415  432
    717  764   822  765   272  352
    350  320   612  543   204  187
    323  360   647  642   206  192
    500  612   513  533   824  764
    620  642   722  724   641  701
    832  786   724  745   527  529
    669  690   400  409   569  582
    917  878   396  456   693  646
    423  436   812  652   471  360

Note: It is not clear that either of the above analyses is appropriate, because there is no natural choice for the dependent and independent variables in this example. In fact, both R and C could be considered to be dependent on actual age.

6.† The following measurements of atmospheric pressure (AP) and the boiling point of water (BP) were taken at various altitudes in the Alps and Scotland. Theory suggests that the boiling point of water should change linearly with changes in the (natural) logarithm of the atmospheric pressure.

     BP     AP      BP     AP      BP     AP
    194.5  20.79   200.9  23.89   209.5  28.49
    194.3  20.79   201.1  23.99   208.6  27.76
    197.9  22.40   201.4  24.02   210.7  29.04
    198.4  22.67   201.3  24.01   211.9  29.88
    199.4  23.15   203.6  25.14   212.2  30.06
    199.9  23.35   204.6  26.57

(a) Fit a straight line model E(BP) = α + β log(AP) to the data and compute the variance estimate. Use the fitted line to estimate the boiling point of water for atmospheric pressures 20, 25, and 30.
(b) How would the results in (a) be affected if one used logarithms to the base 10 rather than natural logarithms?
(c) Plot the data and the fitted line. Can you spot any difficulties? If so, how would you suggest that the analysis should be modified?

13.6. The Straight Line Model (Continued)

In Section 13.5 we derived estimates of the parameters α, β, and σ² in the straight line model. In this section we apply the methods described in Section 13.2 to obtain significance tests and confidence intervals.

We are considering n observed pairs (xi, yi) for i = 1, 2, ..., n. The xi's are treated as known constants, and the yi's are modelled as observed values of independent N(µi, σ²) variates, where µi = α + βxi.

Throughout this section, we use s² to denote the appropriate variance estimate based on the straight line model, as defined in (13.5.5). This estimate has n - 2 degrees of freedom. Confidence intervals for σ can be obtained by either of the methods described at the end of Section 13.2.

Inferences about the Slope β
Inferences about β are based on

    T = (β̂ - β)/√(s²c) ~ t(n - 2),     (13.6.2)

where c = 1/Sxx.

Note that we will be able to determine β precisely (i.e. the confidence interval will be narrow) if Sxx is large. Thus, if we are planning an experiment to obtain information about β, we should select x1, x2, ..., xn so that Sxx = Σ(xi - x̄)² is large. To maximize the information, we would need to make half of the xi's as large as possible, and the other half as small as possible. However if we did this, we would be unable to check the assumption that the dependence of E(Y) on x is linear. As a result, one would usually compromise by taking observations over the whole range of x-values, but with more observations at the extremes than in the middle of the range. It would then be possible to check the fit of the model, and also to make fairly precise statements about β.

Inferences about E(Y)

Given a particular value for x, the expected value of Y is µ = α + βx, with MLE

    µ̂ = α̂ + β̂x = ȳ + β̂(x - x̄) = Σ[ 1/n + (x - x̄)ai ] yi,

where ai = (xi - x̄)/Sxx. Hence µ̂ ~ N(µ, σ²c), where now

    c = Σ[ 1/n + (x - x̄)ai ]² = 1/n + (x - x̄)²/Sxx.

The 95% confidence interval for µ = α + βx is

    µ ∈ µ̂ ± t√(s²c) = µ̂ ± ts[ 1/n + (x - x̄)²/Sxx ]^(1/2),

where t is the value such that P{ -t ≤ t(n-2) ≤ t } = 0.95. This interval is narrowest when x = x̄, and its width increases as |x - x̄| increases. We can estimate α + βx the most precisely when x is close to x̄, the mean of the x-values used in fitting the line.

Inferences for the Intercept α
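These interval formulas are easy to sketch in code, here applied to the age/blood pressure data of Example 13.5.1 (t = 2.228 for 10 degrees of freedom is taken from tables). The widening of the interval for E(Y) away from x̄ is checked directly:

```python
import math

# Interval half-widths for the slope and for E(Y) = alpha + beta*x,
# using the Example 13.5.1 data and t(10) = 2.228 from tables.
x = [56, 42, 72, 36, 63, 47, 55, 49, 38, 42, 68, 60]
y = [147, 125, 160, 118, 149, 128, 150, 145, 115, 140, 152, 155]
n, t = len(x), 2.228
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = Sxy / Sxx
a = ybar - b * xbar
s = math.sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

half_slope = t * s / math.sqrt(Sxx)             # c = 1/Sxx
def half_mu(x0):                                # c = 1/n + (x0 - xbar)^2/Sxx
    return t * s * math.sqrt(1 / n + (x0 - xbar) ** 2 / Sxx)

assert 0.39 < half_slope < 0.41
# The interval for E(Y) is narrowest at xbar and widens as |x - xbar| grows.
assert half_mu(xbar) < half_mu(xbar + 10) < half_mu(xbar + 20)
```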
PROBLEMS FOR SECTION 13.7

Soil type:   1     2     3     4     5     6
Coated:     15.6  21.0  22.6  56.8  13.2  20.9
Uncoated:   10.9  46.7  25.7  69.7  36.7  20.4

Soil type:   7     8     9    10    11    12
Coated:      8.6  31.2  25.4   8.5  11.2  35.8
Uncoated:   29.4  10.2  71.6  42.8  23.9  49.2

Obtain a 99% confidence interval for the mean difference in the amounts of
corrosion for the two types of pipe.

2. Two analysts carried out simultaneous measurements of the percentage of
ammonia in a plant gas on nine successive days to find the extent of the
bias, if any, between their results. Their measurements were:

Day:        1   2   3   4   5   6   7   8   9
Analyst A:  4  37  35  43  34  36  48  33  33
Analyst B: 18  37  38  36  47  48  57  28  42

Obtain a 95% confidence interval for the mean difference in their
measurements. On what assumptions does your analysis depend?

3.† Six automobiles of different models were used to compare two brands of
tires. Each car was fitted with tires of brand A and driven over a difficult
course until one of its tires could no longer be used. Tires of brand B were
then fitted to the same cars, and the procedure was repeated. The following
are the observed mileages to tire failure in thousands of miles:

Car:      1   2   3   4   5   6
Brand A: 18  23  16  27  19  17
Brand B: 15  22  16  21  15  16

(a) Test whether these data are consistent with the hypothesis that the
    mean lifetimes for the two brands are equal.
(b) What factor, other than difference in tire quality, might account for
    the lower mileage achieved with brand B? Suggest an improvement in the
    design of the experiment which would have helped to eliminate this
    source of bias.

4. Two methods of treating sewage were compared. Each day for eight days,
two similar batches of sewage were selected. One batch was randomly chosen
to receive treatment A, and the other received treatment B. The following
table shows the coliform density per ml for the sixteen batches after
treatment.

Day:  1      2      3      4      5      6      7      8
A:   16.44  22.00  18.17  20.09  11.02  20.09  24.53  13.46
B:   24.53  22.20  29.96  33.12  14.88  18.16  33.12  16.44

(a) Assuming that differences in coliform density are normally distributed,
    test the hypothesis that the treatments are equally effective.
(b) A more reasonable assumption in this case is that the logarithmic
    differences log Ai − log Bi are independent N(μ, σ²). Repeat the test
    in (a) under this assumption.

5. A study was carried out to investigate the effect of trap color on the
catch of whiteflies. Two similar traps, one yellow and one green, were hung
side by side on each of 8 plants in a greenhouse. The following table shows
the weight of whiteflies caught in each trap.

Plant:        1     2     3      4     5     6     7      8
Yellow trap: 20.5  42.7  19.4  100.7  23.9  45.4  99.1  125.9
Green trap:  20.0  38.5  15.5  103.6  18.0  47.9  96.4  126.0

(a) Set up a normal model appropriate for examining the difference in
    effectiveness of the two trap colors.
(b) Test the hypothesis that yellow and green traps are equally effective
    in catching whiteflies, and state your conclusions carefully.
(c) Discuss briefly the advantages of conducting the study in the manner
    described rather than by hanging the 16 traps on 16 different plants.

6. Twenty pigs were grouped into ten pairs in such a way that the two pigs
in a pair had nearly equal weight. One pig was randomly chosen from each
pair to receive diet X, and the other received diet Y. The following are the
observed weight gains per day:

Pair:    1   2   3   4   5   6   7   8   9  10
Diet X: 21  21  19  16  26  19  18  29  22  19
Diet Y: 30  25  25  16  29  18  18  19  24  22

Give a 95% confidence interval for the difference in mean weight gain under
the two diets, and state the assumptions on which your analysis is based.

7. An experiment was carried out to determine how the defect rate y in a
highway surface depends on the amount x of asphalt cement used in the paving
material. Seven samples with known asphalt content were prepared. Each
sample was split in two, and two separate tests were made to determine the
defect rate.

Asphalt content:  50   75  100  125  200  250  275
Defect rate:     195  172  164  175  145  115  108
                 197  175  163  177  147  115  109

(a) Using all 14 observations (x, y), fit a straight line to the data. Plot
    the data and the fitted line. Obtain a 95% confidence interval for the
    mean defect rate when x = 100 and show it on the graph.
(b) The graph in (a) suggests that it is not appropriate to model the two
    measurements on the same sample as independent. Instead, it is better
    to replace the two measurements by their average. Redo the analysis in
    (a) using the seven observed pairs (x, ȳi), and compare the results.

8.† A new technique for determining the fraction x of a given gas in a
mixture of gases was investigated. Eleven gas mixtures with known x were
prepared, and each of them was divided into three portions. For each
portion, the quantity y of the gas which dissolved in a liquid was recorded.
x = content   y = amount dissolving     x = content   y = amount dissolving

0.080         2.67  2.68  2.75          0.131         4.46  4.40  4.43
0.082         2.73  2.69  2.62          0.139         4.78  4.80  4.86
0.091         2.88  3.02  3.04          0.164         5.77  5.85  5.82
0.095         3.17  3.28  3.18          0.189         6.56  6.65  6.49
0.096         3.27  3.28  3.08          0.231         7.88  7.97  7.76
0.106         3.51  3.68  3.58

(a) Using all 33 observed pairs (x, y), fit a straight line model to the
    data. Find a 95% confidence interval for the expected amount dissolving
    in mixtures with x = 0.1.
(b) The analysis in (a) assumes that the 3 measurements taken at each value
    of x are independent replicates. This is a questionable assumption
    because these measurements were obtained by dividing one gas mixture
    into three portions rather than by preparing three different mixtures
    with the same x. One way around this difficulty is to replace the three
    repeat observations at each x by their average ȳx. Repeat (a) using the
    11 observed pairs (x, ȳx), and compare the results.

REVIEW PROBLEMS FOR CHAPTER 13

1. Two experiments were carried out to determine μ, the mean increase in
blood pressure due to a certain drug. Six different subjects were used,
three in each experiment, and the following increases were observed:

Experiment 1:  4.5  5.6   4.9
Experiment 2: -1.2  9.8  21.4

Indicate, with reasons, which experiment produces stronger evidence that
the drug does have an effect on blood pressure. Which experiment points to
the greater effect?

2.† Fourteen men were used in an experiment to determine which of two drugs
produces a greater increase in blood pressure. Drug 1 was given to seven of
the men chosen at random, and drug 2 was given to the remaining seven. The
observed increases in blood pressure are:

Drug 1: 0.7  -0.2  3.4  3.7  0.8  0.0  2.0
Drug 2: 1.9   1.1  4.4  5.5  1.6  4.6  3.4

(a) Are these data consistent with the hypothesis of equal variances in
    blood pressure for the two drugs?
(b) Assuming the variances to be equal, obtain a 95% confidence interval
    for the difference in mean blood pressure increase μ2 − μ1, and for the
    common variance σ².
(c) It is possible that the increase in blood pressure with both drugs may
    depend upon the initial blood pressure of the subject. How should the
    design of the experiment and the analysis be modified to allow for this
    possibility?

3. The following are yields (in pounds) of 16 tomato plants grown on 8
uniform plots of land. One plant in each plot was treated with fertilizer A
and the other with fertilizer B.

Plot:          1    2    3    4    5    6    7    8
Fertilizer A: 4.0  5.7  4.0  6.9  5.5  4.6  6.5  8.4
Fertilizer B: 4.8  5.5  4.4  4.8  5.9  4.2  4.4  6.3

Test the hypothesis that the two fertilizers are equally effective, and
state the assumptions upon which the test is based.

4. A study was carried out to investigate the dependence of fuel oil
consumption on the mean atmospheric temperature. The following are the
results observed on ten winter days.

Temperature:  -3   -2  -10   +1   -5   -6  -15   -4   -9   -2
Consumption: 150  141  238  132  186  168  218  163  210  169

(a) Fit a straight line to the data and calculate the variance estimate.
    Plot the fitted line and the data, and comment on any difficulties.
(b) Obtain 90% confidence intervals for the intercept, and for the mean
    fuel consumption on days when the mean temperature is -5.

5. An experiment was performed to compare two different methods of
measuring the phosphate content of material. Ten samples were chosen so
that the material within a sample was relatively homogeneous. Each sample
was then divided in half, one half being analysed by method A and the other
half by method B.

Sample:    1     2     3     4     5     6     7     8     9    10
Method A: 55.6  62.4  48.9  45.5  75.4  89.6  38.4  96.8  92.5  98.7
Method B: 58.4  66.3  51.2  46.1  74.3  92.5  40.2  97.3  94.8  99.0

Find a 95% confidence interval for the mean difference in phosphate content
as measured by the two methods, and state the assumptions upon which your
analysis depends.

6.† In a progeny trial, the clean fleece weights of 9 ewe lambs from each
of four sires were as follows:

Sire 1: 2.74  3.50  3.22  2.98  2.97  3.47  3.47  3.68  4.22
Sire 2: 3.88  3.36  4.29  4.08  3.90  4.71  4.25  3.41  3.84
Sire 3: 3.28  3.92  3.66  3.47  2.94  3.26  3.57  2.62  3.76
Sire 4: 3.52  3.54  4.13  3.29  3.26  3.04  3.77  2.88  2.90

(a) Test the hypothesis that the variance in fleece weight is the same for
    all four sires.
(b) Assuming the variances to be equal, obtain a 95% confidence interval
    for the common variance σ².
CHAPTER 14

Normal Linear Models

In Chapter 13 we considered some simple models for normally distributed
measurements. The basic assumptions for these models were discussed in
Section 13.1, and some statistical methods were described in Section 13.2.
All of the models considered in Chapter 13 are special cases of the normal
linear model, which is the subject of this chapter. Section 14.1 describes
matrix notation for linear models and gives several examples. Section 2
considers the estimation of parameters in linear models, and likelihood
ratio tests are derived in Section 3. Section 4 gives some further
discussion of the statistical methods described in Section 13.2. Section 5
describes some graphical procedures for checking the adequacy of the model.
The distributions of the residual sum of squares and the additional sum of
squares due to a linear hypothesis are derived in Section 6.

14.1. Matrix Notation

As in Chapter 13, we consider n measurements y1, y2, ..., yn of the same
quantity taken under various different conditions. We assume that the yi's
are observed values of independent random variables Y1, Y2, ..., Yn, where

    Yi ~ N(μi, σ²)   for i = 1, 2, ..., n.                       (14.1.1)

See Section 13.1 for discussion of these basic assumptions.
  The basic model (14.1.1) involves n + 1 unknown parameters μ1, μ2, ..., μn
and σ, but we have only n observations. Before we can estimate σ, we must
reduce the number of unknown parameters. We do this by writing the μi's as
functions of q unknown parameters β1, β2, ..., βq where q < n. We then have
effectively n − q observations available for estimating σ, and we say that
there are n − q degrees of freedom for variance estimation.
  The model is called linear if each of the μi's may be written as a linear
function of the unknown parameters β1, β2, ..., βq. In a linear model, all
of the partial derivatives ∂μi/∂βj are known constants. The one-sample,
two-sample, and straight line models are examples of linear models with
q = 1, 2, 2, respectively.
  A linear model can be described by a set of n linear equations:

    μi = xi1β1 + xi2β2 + ··· + xiqβq   for 1 ≤ i ≤ n.            (14.1.2)

This is also called a multiple regression model. The βj's are unknown
parameters, and xi1, xi2, ..., xiq are known constants which describe the
conditions under which the ith observation is made. The xij's may be values
of quantitative variables such as temperature or age, or values of 0-1
indicator variables, or a mixture of these.
  The n linear equations (14.1.2) can be represented by a single matrix
equation

    μ = Xβ                                                       (14.1.3)

where

                                   [x11  x12  ⋯  x1q]
    μ = [μ1 μ2 ⋯ μn]',         X = [x21  x22  ⋯  x2q]
                                   [ ⋯    ⋯   ⋯   ⋯ ]
    β = [β1 β2 ⋯ βq]',             [xn1  xn2  ⋯  xnq].

X has one row for each observation Yi, and one column for each unknown
parameter βj. To obtain X, we just write out the n equations (14.1.2) one
below the other and detach the coefficients of the βj's.
  We shall assume that the q columns of X are linearly independent. If they
were not, it would be possible to rewrite the n equations using only q − 1
of the unknown parameters β1, β2, ..., βq.
  The remainder of this section describes a few of the many situations
covered by linear models. In particular, all of the models considered in
Chapter 13 are linear models. Results derived for linear models in the
following sections are applicable to all of these situations and many
others as well.

Straight Line Model (Section 13.5)

The n equations defining the straight line model are

    μ1 = β1 + β2x1 = 1·β1 + x1·β2
    μ2 = β1 + β2x2 = 1·β1 + x2·β2
    ⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯⋯
    μn = β1 + β2xn = 1·β1 + xn·β2.
These can be written in the form μ = Xβ where

                     [1  x1]
    β = [β1 β2]',    [1  x2]
                 X = [ ⋯ ⋯ ]
                     [1  xn].

Similarly, cubic and higher degree polynomials can be written in the form
μ = Xβ for a suitable choice of X, and are examples of linear models. The
components of X may be any known constants, such as 0, 1, xi, xi², log xi,
sin xi, and so on. The model is still a linear model if the μi's are linear
functions of the unknown parameters β1, β2, ..., βq.

One-Sample Problem (Section 13.3)

In the one-sample problem, we assume that the n means μ1, μ2, ..., μn are
all equal to the same unknown value β1, say. Thus the n equations are

    μ1 = 1·β1;   μ2 = 1·β1;   ...;   μn = 1·β1,

and we have μ = Xβ where

    μ = [μ1 μ2 ⋯ μn]',   X = [1 1 ⋯ 1]',   β = [β1].

Two-Sample Problem (Section 13.4)

Here we assume that m of the means are equal to β1, say, and the other
n − m are equal to β2:

         [μ1]        [1  0]
         [ ⋯]        [⋯  ⋯]   m rows of (1 0)
    μ =  [ ⋯] ,  X = [1  0]                      ,   β = [β1 β2]'.
         [ ⋯]        [0  1]
         [μn]        [⋯  ⋯]   (n − m) rows of (0 1)
                     [0  1]

Weighing Experiment (see Example 10.1.1)

Suppose that three objects with unknown weights β1, β2, and β3 are weighed
on a set of scales in all possible combinations, giving 7 independent
measurements Y1, Y2, ..., Y7. We assume that the Yi's are independent
N(μi, σ²), where

    μ1 = β1;   μ2 = β2;   μ3 = β3;
    μ4 = β1 + β2;   μ5 = β1 + β3;   μ6 = β2 + β3;
    μ7 = β1 + β2 + β3.

This model has the form μ = Xβ where X is a 7 × 3 matrix with 0, 1 entries.
Its transpose is

         [1  0  0  1  1  0  1]
    X' = [0  1  0  1  0  1  1]
         [0  0  1  0  1  1  1].
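The weighing design above can be written down directly from the seven mean equations. As a sketch in modern notation (assuming NumPy is available; the book itself predates it), the following builds X row by row from those equations and confirms the two facts the text relies on: the columns are linearly independent, and the left inverse XL = (X'X)⁻¹X' satisfies XL·X = I₃.

```python
import numpy as np

# Design matrix for the weighing experiment: objects with weights
# beta1, beta2, beta3 weighed singly, in pairs, and all together.
X = np.array([
    [1, 0, 0],   # mu1 = beta1          (object 1 alone)
    [0, 1, 0],   # mu2 = beta2          (object 2 alone)
    [0, 0, 1],   # mu3 = beta3          (object 3 alone)
    [1, 1, 0],   # mu4 = beta1 + beta2
    [1, 0, 1],   # mu5 = beta1 + beta3
    [0, 1, 1],   # mu6 = beta2 + beta3
    [1, 1, 1],   # mu7 = beta1 + beta2 + beta3
], dtype=float)

assert np.linalg.matrix_rank(X) == 3       # columns linearly independent

XL = np.linalg.inv(X.T @ X) @ X.T          # left inverse (X'X)^{-1} X'
assert np.allclose(XL @ X, np.eye(3))      # XL X = I_3
```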
Parallel Line Model

Suppose that for the first m observations we wish to assume a straight line
model μi = β1 + β3xi, and that for the remaining n − m observations we wish
to assume another straight line model μi = β2 + β3xi which has the same
slope but a different intercept. This model can be written μ = Xβ where the
transpose of X is

         [1   ⋯  1    0     0    ⋯  0 ]
    X' = [0   ⋯  0    1     1    ⋯  1 ]
         [x1  ⋯  xm  xm+1  xm+2  ⋯  xn].

PROBLEMS FOR SECTION 14.1

3. A standard treatment A and two new treatments B, C are to be compared in
an experiment using 3 mice from each of four different litters. Twelve
measurements Yi are to be taken according to the following scheme:

Measurement No.  1  2  3  4  5  6  7  8  9  10  11  12
Litter No.       1  2  3  4  1  2  3  4  1   2   3   4
Treatment        A  A  A  A  B  B  B  B  C   C   C   C

4. A standard treatment A and three new treatments B, C, D are to be
compared in an experiment using 3 mice from each of four litters. Twelve
measurements Yi are to be taken according to the following scheme.

Measurement No.  1  2  3  4  5  6  7  8  9  10  11  12
Litter No.       1  2  3  4  1  2  3  4  1   2   3   4
Treatment        A  A  A  B  B  B  C  C  C   D   D   D

Set up a linear model similar to that in the preceding problem, and write
it in matrix notation.

14.2. Parameter Estimates

The log likelihood is maximized by minimizing S, where S is the error sum
of squares,

    S = Σεi² = Σ(yi − μi)².

The MLE's β̂1, β̂2, ..., β̂q are chosen to minimize S. The fitted values and
residuals are defined by

    μ̂i = xi1β̂1 + xi2β̂2 + ··· + xiqβ̂q;   ε̂i = yi − μ̂i.

Setting the partial derivatives of S equal to zero gives

    X'ε̂ = 0.                                                     (14.2.2)

Substituting ε̂ = y − Xβ̂ gives

    X'(y − Xβ̂) = 0.

It follows that

    X'Xβ̂ = X'y.                                                  (14.2.3)

This is a set of q linear equations in β̂1, β̂2, ..., β̂q.
  Since X is n × q, the product X'X is q × q. It can be shown that, since X
has linearly independent columns, the product X'X is nonsingular, and its
inverse (X'X)⁻¹ exists. Multiplying (14.2.3) by (X'X)⁻¹ gives

    β̂ = XLy                                                      (14.2.4)

where XL = (X'X)⁻¹X'.
  The matrix XL is q × n, and it has the property that

    XLX = (X'X)⁻¹X'X = Iq

where Iq is the q × q identity matrix. Thus XL is a left inverse of X. Note
that

    XXL = X(X'X)⁻¹X'

which is n × n and will not equal In unless X is a square matrix (q = n).
For the straight line model of Section 13.5, XLy = (X'X)⁻¹X'y is a 2 × 1
vector whose components are α̂ and β̂.

Computation

Calculations for linear models are usually done by computer. The main labor
is in finding the q × n matrix XL = (X'X)⁻¹X'. From this we can easily get
β̂ = XLy, μ̂ = Xβ̂, and ε̂ = y − μ̂. Squaring and summing the n components of ε̂
gives the residual sum of squares Σε̂i². The variance estimate is then

    s² = Σε̂i²/(n − q).

The computer language APL is particularly convenient for linear models
because it has a built-in operator ⌹ for handling the necessary
calculations. Having defined an n × q matrix X and a list of n y-values,
one enters ⌹X to obtain XL, or Y⌹X to obtain β̂ = XLy. Alternatively,
statistical software packages such as SAS, SPSS, BMDP, and GLIM may be used
for fitting linear models.

EXAMPLE 14.2.1. Data were collected to investigate how the amount of fuel
oil required to heat a home depends upon the outdoor air temperature and
wind velocity. Table 14.2.1 contains the result for n = 10 winter days.

Table 14.2.1. Fuel Consumption (y), Temperature (t), and Wind Velocity (v)
on Each of Ten Winter Days

Day     y       t      v
 1    14.96   -3.0   15.3
 2    14.10   -1.8   16.4
 3    23.76  -10.0   41.2
 4    13.20    0.7    9.7
 5    18.60   -5.1   19.3
 6    16.79   -6.3   11.4
 7    21.83  -15.5    5.9
 8    16.25   -4.2   24.3
 9    20.98   -8.8   14.7
10    16.88   -2.3   16.1

We expect fuel consumption to increase as the wind velocity v increases,
and to decrease as the temperature increases. As a first approximation, we
assume that these changes are linear, and that the effect of wind velocity
is the same at all temperatures. Thus the yi's are assumed to be observed
values of
independent N(μi, σ²) variates, where

    μi = β1 + β2ti + β3vi   for i = 1, 2, ..., 10.

Here β2 is the effect on mean fuel consumption of a unit increase in
temperature assuming that the wind velocity is held fixed, and β3 is the
effect on mean consumption of a unit increase in wind velocity with the
temperature fixed. The "general constant term" β1 represents the mean fuel
consumption when t = v = 0.
  The model can be written as μ = Xβ where X is the 10 × 3 matrix shown in
Figure 14.2.1. The left inverse

    XL = (X'X)⁻¹X'

was obtained by computer using the APL operator ⌹. Its transpose has the
same shape as X and is shown rounded to four decimal places in Figure
14.2.1. The vector of parameter estimates β̂ = XLy is found next, and the
fitted model is

    μ̂ = 11.9339 − 0.6285t + 0.1298v.

Figure 14.2.1. Calculations for the fuel consumption example.

       X (10 × 3)          (XL)' (10 × 3)            y      μ̂      ε̂
    1   -3.0  15.3     0.2072   0.0127  -0.0020    14.96  15.81  -0.85
    1   -1.8  16.4     0.2159   0.0189  -0.0006    14.10  15.19  -1.09
    1  -10.0  41.2    -0.4710  -0.0177   0.0270    23.76  23.57   0.19
    1    0.7   9.7     0.4085   0.0302  -0.0080    13.20  12.75   0.45
    1   -5.1  19.3     0.0770   0.0030   0.0023    18.60  17.64   0.96
    1   -6.3  11.4     0.2003  -0.0044  -0.0072    16.79  17.37  -0.58
    1  -15.5   5.9     0.0769  -0.0511  -0.0152    21.83  22.44  -0.61
    1   -4.2  24.3     0.0024   0.0083   0.0083    16.25  17.73  -1.48
    1   -8.8  14.7     0.0737  -0.0162  -0.0037    20.98  19.37   1.61
    1   -2.3  16.1     0.2092   0.0163  -0.0010    16.88  15.47   1.41

    β̂ = [11.9339  -0.6285  0.1298]'

PROBLEMS FOR SECTION 14.2

1. Show that

    Σε̂i² = Σyi² − β̂'(X'y).

This formula is useful when calculations are to be done by hand, but it is
susceptible to roundoff errors.

2.† Set up the straight line model of Example 13.5.1 in matrix notation.
Calculate (X'X)⁻¹ and X'y, and hence obtain the parameter estimates. Use
the formula in Problem 1 to obtain the residual sum of squares.

3. Set up the 3-sample model of Example 13.4.3 in matrix notation.
Calculate (X'X)⁻¹ and X'y, and hence obtain the parameter estimates. Use
the formula in Problem 1 to find the residual sum of squares.

4. The following measurements are from the weighing experiment described in
Section 14.1. Hence show that the estimated weights are

    β̂1 = 11.875,   β̂3 = 28.675.

(b) Use the formula in Problem 1 to evaluate Σε̂i², and show that
    s² = 3.08375 with 4 degrees of freedom.

5.† The yield Y of a chemical process was measured at each of nine
different temperatures ti with the following results:

t:  15    16    17    18    19    20    21    22    23
y:  90  91.9  90.7  87.9  86.4  82.5  80.0  76.0  70.0

Consider the 2nd degree polynomial model

    μi = β1 + (ti − 19)β2 + (ti − 19)²β3   for i = 1, 2, ..., 9.

(a) Write this model in the form μ = Xβ. Calculate the parameter estimates
    and the residual sum of squares.

Note:
    [ 9   0   60]⁻¹   [ 0.255411   0         -0.021645]
    [ 0  60    0]   = [ 0          0.016667   0       ]
    [60   0  708]     [-0.021645   0          0.003247].
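The calculations of Example 14.2.1 can be reproduced from Table 14.2.1 alone. The following sketch (NumPy assumed; it stands in for the APL operator ⌹ mentioned in the text) computes the left inverse, the parameter estimates, and the residual sum of squares, and checks them against the values β̂ = (11.9339, −0.6285, 0.1298) and Σε̂i² ≈ 10.533 reported in the text.

```python
import numpy as np

# Fuel consumption data of Table 14.2.1.
y = np.array([14.96, 14.10, 23.76, 13.20, 18.60,
              16.79, 21.83, 16.25, 20.98, 16.88])
t = np.array([-3.0, -1.8, -10.0, 0.7, -5.1, -6.3, -15.5, -4.2, -8.8, -2.3])
v = np.array([15.3, 16.4, 41.2, 9.7, 19.3, 11.4, 5.9, 24.3, 14.7, 16.1])

X = np.column_stack([np.ones(10), t, v])   # mu_i = b1 + b2 t_i + b3 v_i
XL = np.linalg.inv(X.T @ X) @ X.T          # left inverse, as in Figure 14.2.1
beta_hat = XL @ y                          # parameter estimates (14.2.4)
resid = y - X @ beta_hat                   # residuals
rss = float(resid @ resid)                 # residual sum of squares

assert np.allclose(beta_hat, [11.9339, -0.6285, 0.1298], atol=1e-3)
assert abs(rss - 10.533) < 0.05
```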
14.3. Testing Hypotheses in Linear Models

As in the preceding section, we assume that Y1, Y2, ..., Yn are independent
N(μi, σ²), where

    μi = xi1β1 + xi2β2 + ··· + xiqβq   for i = 1, 2, ..., n.

In matrix notation this is μ = Xβ where X is an n × q matrix with linearly
independent columns. The model involves q + 1 unknown parameters
β1, β2, ..., βq and σ. By (13.2.1) the log likelihood function is maximized
by minimizing the error sum of squares, and the maximum of the log
likelihood is attained at the least squares estimates, with residual sum of
squares Σε̂i². Fitting the model under a linear hypothesis H gives a larger
residual sum of squares Σε̃i², and the likelihood ratio statistic for
testing H is

    D = n log(Σε̃i²/Σε̂i²) = n log[1 + Q/Σε̂i²]                    (14.3.1)

where Q is the increase in the residual sum of squares due to the
hypothesis:

    Q = Σε̃i² − Σε̂i².                                             (14.3.2)

The corresponding F statistic is

    F = (Q/p)/s²                                                 (14.3.3)

where p is the number of linear restrictions imposed by H.
  Because of the way the experiment was performed, the variance estimate s²
in the denominator of Fobs is too small. A considerable amount of time and
effort was required to reset the test machine from one temperature to
another. To save time, the experimenter sometimes ran two or more tests at
the same temperature without resetting the test machine. Repeat
measurements obtained without resetting will likely show less scatter than
would be obtained if the machine were reset each time. These repeats do not
reflect all possible sources of variability in the experiment, and
consequently the variance estimate we obtained is likely to be too small.
We do not have a valid estimate of σ², and so interpretation of the results
is not clearcut.
  The experiment should have been run in four complete replications. In the
first replication, one gear would be tested at each temperature, with the
order of testing being decided at random. This procedure would then be
repeated three more times, with a different random order each time. The
four measurements at the same temperature would then be genuine independent
replicates, and it would be possible to obtain an estimate of σ² which
takes into account all sources of variability in the experiment.

PROBLEMS FOR SECTION 14.3

1.† Measurements of breaking strength for six bolts at each of five
diameters are given in Problem 13.5.2. Three different models are fitted to
these data. The residual sum of squares is found to be 0.074317 for a
5-sample model, 0.14066 for a straight line model, and 0.07436 for a second
degree polynomial model.
(a) Assuming the 5-sample model to be correct, test the hypotheses
    H1: μi = β1 + β2di and H2: μi = β1 + β2di + β3di².
(b) Assuming the second degree polynomial model to be correct, test the
    hypothesis that β3 = 0.

2. Show that the additional sum of squares due to H: μ1 = μ2 = ··· = μk in
the k-sample model is given by

    Q = Σ(i=1 to k) ni(ȳi − ȳ)²,

where ni and ȳi are the sample size and mean for the ith sample, and ȳ is
the grand mean. Use this formula to check the value given for Q in Example
14.3.2.

3. Consider n = n1 + n2 + ··· + nk pairs of measurements (xi, yij) for
j = 1, 2, ..., ni; i = 1, 2, ..., k. The yij's are observed values of
independent N(μij, σ²) random variables, where

    μij = μi   for i = 1, 2, ..., k.

Show that the additional sum of squares due to H: μi = α + βxi for
i = 1, 2, ..., k is given by

    Q = Σ(i=1 to k) ni(ȳi − α̂ − β̂xi)²,

where ȳi is the mean of the ni observations at x = xi. Use this formula to
check the value for Q in Example 14.3.3.

4. In Problem 13.4.9, test the hypothesis that the mean distance traveled
is the same for all fuels, and state the assumptions upon which this test
is based.

5.† Several chemical analyses of samples of a product were performed on
each of four successive days, and the following table gives the percentage
impurity found in each analysis.

Day 1: 2.6  2.6  2.9  2.0  2.1  2.1  2.0
Day 2: 3.1  2.9  3.1  2.5
Day 3: 2.6  2.2  2.5  2.2  1.2  1.2  1.8
Day 4: 2.5  2.4  3.0  1.5  1.7

(a) Assuming equal variances, test whether there is a difference in the
    mean percentage impurity over the four days.
(b) Check the equal-variance assumption (see Problem 13.4.5).

6. Three laboratories each carried out five independent determinations of
the nicotine content of a brand of cigarettes. Their findings, in
milligrams per cigarette, were as follows:

Laboratory A: 16.3  15.6  15.5  16.7  16.2
Laboratory B: 13.5  17.4  16.9  18.2  15.6
Laboratory C: 14.1  13.2  14.3  12.9  12.8

Are there real differences among the results produced by the three
laboratories?

7. Measurements of the ultimate tensile strength (UTS) were made for
specimens of insulating foam of five different densities.

Density (x)   Ultimate tensile strength (y)
4.155         82.8   95.5   97.5  102.8  105.6  107.8  115.7  118.9
3.555         79.7   84.5   85.2   98.0  105.2  113.6
3.55          71.0   98.2  104.9  106.9  109.6  117.8
3.23          67.1   77.0   80.3   81.8   83.0   84.1
4.25          98.5  105.5  111.6  114.5  126.5  127.1

Calculate the residual sum of squares for a five-sample model, and for a
straight line model. Hence test the hypothesis that the dependence of mean
strength on density is linear.

8.† The following table gives measurements of systolic blood pressure for
20 men of various ages:

Age (years)   Blood pressure (mm Hg)
30            108  110  106
40            125  120  118  119
50            132  137  134
60            148  151  146  147  144
70            162  156  164  158  159

Calculate the residual sums of squares for a five-sample model and for a
straight line model. Hence test the hypothesis that the dependence of mean
blood pressure on age is linear.

9. Problem 13.7.8 presents eleven sets of three measurements of the amount
of a gas dissolving in a liquid.
(a) Calculate the residual sums of squares for an 11-sample model and a
    straight line model. Test the hypothesis that the mean amount
    dissolving is a linear function of the gas content in the mixture.
(b) Explain why one might expect to obtain a small significance level in
    (a) even if the straight line model is correct.

10. A procedure sometimes used to check the adequacy of a linear model is
to complicate the model by adding extra terms to it, refit, and then test
whether the new terms are significantly different from zero. For instance,
in Example 14.2.1, the residual sum of squares for the model
μi = β1 + β2ti + β3vi was found to be 10.533. If the more complicated model

    μi = β1 + β2ti + β3vi + β4ti² + β5vi² + β6tivi

is fitted to the data, the residual sum of squares decreases to 4.442.
Using these results, test the hypothesis that β4 = β5 = β6 = 0. (A small
significance level would indicate possible difficulties with the simpler
model.)

11. Consider two linear models μ = Xβ and μ = Wγ where X and W are n × q
matrices with linearly independent columns. Suppose that W = Xb where b is
a q × q nonsingular matrix. (This means that the columns of W are linear
combinations of the columns of X.)
(a) Show that β̂ = bγ̂.
(b) Show that both models give the same fitted values μ̂ and residuals ε̂.

14.4. More on Tests and Confidence Intervals

In Section 13.2 we described procedures for making inferences about a
single parameter βi in a linear model. These methods can also be used to
make inferences about a single linear combination
θ = b1β1 + b2β2 + ··· + bqβq. In this section we give some further
discussion of these methods. We shall show that the significance tests
described in Section 13.2 are equivalent to likelihood ratio tests, and
that the confidence intervals obtained are in fact maximum likelihood
intervals.
  As in the preceding sections we assume that Y1, Y2, ..., Yn are
independent N(μi, σ²), and that

    μi = xi1β1 + xi2β2 + ··· + xiqβq   for i = 1, 2, ..., n.

In matrix notation this is μ = Xβ where X is n × q with linearly
independent columns. From Section 14.2, the MLE of β is

    β̂ = XLY

where XL = (X'X)⁻¹X' is a q × n matrix of constants.

Inferences about βi

The MLE of βi is a linear combination of the Yi's:

    β̂i = a1Y1 + a2Y2 + ··· + anYn

where a1, a2, ..., an are the elements in the ith row of XL. It follows by
(6.6.7) that β̂i ~ N(Σajμj, σ²Σaj²).
  Note that Σajμj is the product of the ith row of XL with the vector μ.
This is the ith element of the matrix product XLμ. But since μ = Xβ and
XLX = I, it follows that

    XLμ = XLXβ = Iβ = β.

Hence the ith element of XLμ is βi, and it follows that

    E(β̂i) = Σajμj = βi.

Similarly, Σaj² is the product of the ith row of XL with itself. This is
the (i, i) element of

    V = XL(XL)'.

It follows that var(β̂i) = σ²vii, where vii is the ith diagonal element of
V. Similar arguments can be used to show that cov(β̂i, β̂j) = σ²vij.
  According to the procedure described in Section 13.2, we start with the
sampling distribution β̂i ~ N(βi, σ²vii). We then standardize and replace σ²
by s², the variance estimate for the model μ = Xβ, to obtain

    T = (β̂i − βi)/√(s²vii) ~ t(n−q).                             (14.4.1)

We can now test an hypothesis concerning βi or set up confidence intervals
for βi as in Section 13.2.
  More generally, suppose that we are interested in a linear function of
the βj's,

    θ = b1β1 + b2β2 + ··· + bqβq = b'β

where the bj's are constants and b is q × 1. The MLE of θ is

    θ̂ = b1β̂1 + b2β̂2 + ··· + bqβ̂q = b'β̂.
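Formula (14.4.1) can be checked numerically on the fuel consumption example. The sketch below (NumPy assumed) recomputes V = XL(XL)' from the data of Table 14.2.1 and forms the t statistic for testing β3 = 0; it should agree with the value 3.09 used in Example 14.4.1.

```python
import numpy as np

# Fuel consumption data of Table 14.2.1 and the model of Example 14.2.1.
y = np.array([14.96, 14.10, 23.76, 13.20, 18.60,
              16.79, 21.83, 16.25, 20.98, 16.88])
t = np.array([-3.0, -1.8, -10.0, 0.7, -5.1, -6.3, -15.5, -4.2, -8.8, -2.3])
v = np.array([15.3, 16.4, 41.2, 9.7, 19.3, 11.4, 5.9, 24.3, 14.7, 16.1])
X = np.column_stack([np.ones(10), t, v])

n, q = X.shape
XL = np.linalg.inv(X.T @ X) @ X.T
beta_hat = XL @ y
resid = y - X @ beta_hat
s2 = float(resid @ resid) / (n - q)        # variance estimate, n - q = 7 df

V = XL @ XL.T                              # V = XL (XL)'; var(beta_i) = s^2 v_ii
T3 = beta_hat[2] / np.sqrt(s2 * V[2, 2])   # (14.4.1) with hypothesis beta3 = 0

assert abs(T3 - 3.09) < 0.03               # matches |t(7)| >= 3.09 in the text
```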
262 14. Normal Linear Models 14.4. More on Tests and Confidence Intervals
263
Since p= XL Y, it follows that which is an increasing function of T 2 • It follow
s that
B=b1XLY =a 1 Y1 +a2Y 2+ ··· +a"Y" P{D z Dob.} = P{T 2 ~ T~b. } = P{I Tl z 17;,bsl},
where a; is the ith comp onen t of the 1 x q vecto 1 and so the significance tests described in Sectio
r b XL. . . . . . n 13.2 are equivalent to
Since 8 is a linear combination of the Y;'s, its likelihood ratio tests.
sampling d1stnbutlon IS
normal. Since E(p,) = /3; we have It also follows from these results that the maxim
um Jog relative likelihood
function of 8 is
E(B)= b 1 /3 1 +b2/32+ ··· +bq/3q=8.
The variance of Bis a 2 c, where
c = L.af = b'XL(b1XL)' = b'XL(XL)'b = b1Vb.
To construct a confidence interval for 8, we take
Thus we have 8- N(8, a 2 c), where c = b'Vb. - ts Ts t where t is the
Now, proceeding as in Section 13.2, we standardiz appropriate value from Table B3. The parameter
e and replace a 2 bys to
2 values belonging to the
confidence interval are those for which
get
8- 8
T= r.:r:- t<n-q )
....; s2 c
(14.4.2) rmax(8)z -~log{!+ n~qt 2 }.
where c = b' Vb. Inferences abou t 8 are based on Hence the confidence interval is a maximum likeli
this result. hood interval for 8.
Note that (14.4.1) is the special case of (14.4.2) in
which bi= 1, and b1 = 0 for
j #i. EXAM PLE 14.4.1. In Example 14.2.1 we fitted the model
µ;=/31+f32t ;+/33V;, i=l,2 , ... ,10
Con nect ion with Likelihood Ratio Tests to the fuel consumption data of Table 14.2.1. The
matrix (XL)' is given in
Figure 14.2. l, and from this we can find
We showed in Section 14.3 that the likelihood
ratio statistic for testing a
linear hypothesis H is 0.57939 0.02503
l
-0.01 942J
D = n log[ !+ ~ 2 J = n log[l + q-
L.s; n-q
p FJ
v= XL(XL)' = 0.02503 0.00497 0.00017 .
-0.01 942 0.00017
0.00117
where Q and Fare defined in (14.3 .2) and (14.3.
3). .. Parameter /3 3 measures the effect of wind veloc
At the end of this section we shall show that the ity on expected fuel
additional sum of squares consumption. Inferences about /3 are based on
due to the hypothesis 3
Q + 1 = (B ~ 8) = T 2 •
2
F=
s 2
s c
The likelihood ratio statistic for testing H is SL= P{ l t(7) 1z 3.09}~0.02
from Table B3.
264    14. Normal Linear Models    14.4. More on Tests and Confidence Intervals    265

    D = n log{1 + T²/(n − q)}.

A different procedure was used to test H: β₃ = 0 in Example 14.3.1. There we refitted the model with β₃ = 0 and calculated the additional sum of squares Q. We then found

    F_obs = Q/s² = 9.58;    SL = P{F₁,₇ ≥ 9.58}.

Since F_obs = T²_obs, and since t²₍₇₎ is distributed as F₁,₇ by (6.10.7), it follows that

    P{F₁,₇ ≥ F_obs} = P{t²₍₇₎ ≥ T²_obs} = P{|t₍₇₎| ≥ |T_obs|}.

Both of these procedures are equivalent to the likelihood ratio test, and therefore they will always give the same significance level.

To complete the example, we shall find a 95% confidence interval for the mean fuel consumption on days when the temperature is −5 and the wind velocity is 20. According to the model, this is

    θ = β₁ − 5β₂ + 20β₃ = b′β

where b′ = (1  −5  20). The MLE of θ is

    θ̂ = β̂₁ − 5β̂₂ + 20β̂₃ = 17.67.

The variance of θ̂ is σ²c where

    c = b′Vb = [1  −5  20] [  0.57939   0.02503  −0.01942 ] [  1 ]
                           [  0.02503   0.00497   0.00017 ] [ −5 ]
                           [ −0.01942   0.00017   0.00117 ] [ 20 ]
      = 0.11025.

Now, by (14.4.2), the 95% confidence interval for θ is

    θ ∈ θ̂ ± 2.365√(s²c) = 17.67 ± 0.96.

We know that this is also a maximum likelihood interval, so each value of θ belonging to the interval has a higher maximum relative likelihood than any value outside the interval.

PROOF OF (14.4.3). Let β̃ denote the MLE of β under the hypothesis

    H: b₁β₁ + b₂β₂ + ··· + bqβq = θ,

and let ẽ = y − Xβ̃ be the vector of residuals. Then β̃ is the value of β which minimizes

    S = Σ(yᵢ − μᵢ)² = Σ(yᵢ − xᵢ₁β₁ − ··· − xᵢqβq)²

subject to the restriction Σbⱼβⱼ = θ.

To find β̃, we use the method of Lagrange. We define a new function of q + 1 variables,

    g(β₁, β₂, ···, βq, λ) = S(β₁, β₂, ..., βq) + 2λ(Σbⱼβⱼ − θ).

The extra variable λ is called a Lagrange multiplier. We now minimize g over the q + 1 variables β₁, β₂, ..., βq, and λ. The derivatives of g are

    ∂g/∂λ = 2(Σbⱼβⱼ − θ);    ∂g/∂βⱼ = ∂S/∂βⱼ + 2λbⱼ    for j = 1, 2, ..., q.

Upon setting these derivatives equal to zero, we find that β̃₁, β̃₂, ..., β̃q and λ̃ satisfy the q + 1 equations

    Σᵢ xᵢⱼ(yᵢ − μ̃ᵢ) = λ̃bⱼ  for j = 1, 2, ..., q;    Σbⱼβ̃ⱼ = θ.

Note that the restriction Σbⱼβ̃ⱼ = θ is satisfied at the minimum. Also, if (β̃, λ̃) minimizes g, then β̃ minimizes S.

In matrix notation, the q + 1 equations are

    X′ẽ = λ̃b;    b′β̃ = θ

where b is q × 1. Substituting ẽ = y − Xβ̃ gives

    X′(y − Xβ̃) = λ̃b,

and now multiplying by V = (X′X)⁻¹ gives

    β̃ = (X′X)⁻¹X′y − λ̃(X′X)⁻¹b = β̂ − λ̃Vb.

Since b′β̃ = θ, it follows that

    θ = b′β̂ − λ̃b′Vb = θ̂ − λ̃c

where θ̂ = b′β̂ and c = b′Vb, and therefore

    λ̃ = (θ̂ − θ)/c.

Also, since β̃ = β̂ − λ̃Vb, we have

    ẽ = y − Xβ̃ = y − Xβ̂ + λ̃XVb = ê + λ̃XVb.

The residual sum of squares under H is

    Σẽᵢ² = ẽ′ẽ = (ê + λ̃XVb)′(ê + λ̃XVb)
         = ê′ê + λ̃²b′V′X′XVb + cross-product terms.

The cross-product terms are zero because X′ê = 0 by (14.2.2). Since V = (X′X)⁻¹ and V′ = V, it follows that

    Σẽᵢ² = Σêᵢ² + λ̃²b′Vb = Σêᵢ² + (θ̂ − θ)²/c,

which is (14.4.3). □
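The Lagrange-multiplier derivation above is easy to check numerically. The sketch below uses an invented design matrix and response (not the fuel-consumption data of the example) and verifies that β̃ = β̂ − λ̃Vb satisfies the restriction b′β̃ = θ exactly, and that the residual sum of squares increases by (θ̂ − θ)²/c, as in (14.4.3).

```python
import numpy as np

# Invented data for illustration only.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(8), rng.normal(size=8), rng.normal(size=8)])
y = rng.normal(size=8)
b = np.array([1.0, -5.0, 20.0])    # coefficients of the restriction b'beta = theta
theta = 2.0                        # hypothesized value of b'beta

V = np.linalg.inv(X.T @ X)
beta_hat = V @ X.T @ y             # unrestricted least-squares estimate
theta_hat = b @ beta_hat
c = b @ V @ b

# Restricted estimate from the derivation: beta_tilde = beta_hat - lam*V*b,
# with the Lagrange multiplier lam = (theta_hat - theta)/c.
lam = (theta_hat - theta) / c
beta_tilde = beta_hat - lam * V @ b

# Residual sums of squares under the full and restricted fits.
rss_hat = np.sum((y - X @ beta_hat) ** 2)
rss_tilde = np.sum((y - X @ beta_tilde) ** 2)
```

The difference rss_tilde − rss_hat reproduces the additional sum of squares (θ̂ − θ)²/c to rounding error.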
(i) use (14.4.2);
(ii) use the hypothesis to simplify the model, then refit and use the additional sum of squares method.
(c) Assuming the simplified model in (b)(ii), recalculate the 95% confidence interval for β₃.

2. Consider the chemical yield data in Problem 14.2.5.
(a) Find a 95% confidence interval for the expected yield of the process when t = 17.5.
(b) Test the hypothesis β₃ = 0 in two ways:
(i) use (14.4.2);
(ii) use the additional sum of squares method.

3.† Thirteen sets of observations were taken on the variables y, x₁, x₂, and x₃. Here y is the percentage of bacteria surviving a treatment, and x₁, x₂, x₃ are the concentrations of three chemicals used in the treatment. The model

    μᵢ = β₁ + β₂xᵢ₁ + β₃xᵢ₂ + β₄xᵢ₃    (1 ≤ i ≤ 13)

was fitted to the data by least squares, with the following results:

    β̂ = [ 8.06   1.02   −1.86   −0.34 ]′;

    (X′X)⁻¹ = [   ·      −0.08   −0.09   −0.79 ]
              [ −0.08    0.008    0.002   0.003 ]
              [ −0.09    0.002    0.017   0.002 ]
              [ −0.79    0.003    0.002   0.087 ]

The residual sum of squares was 38.7.
(a) Obtain a 95% confidence interval for β₂ − β₃.
(b) Test the hypothesis β₃ = 0.
(c) The model was refitted with β₂ = β₄ = 0, and the new residual sum of squares was found to be 167.6. Test the hypothesis β₂ = β₄ = 0.

4. An experiment is performed to investigate the dependence of burst strength (y) on crack length (x₁) and operating temperature (x₂) in pressure tubes. Cracks of three different lengths are cut in specimens, and half of these are cycled to sharpen the cracks. The specimens are then tested at three different temperatures, and the burst strength is determined. The following table gives the results (simplified by changes of origin and scale).
(a) Using an indicator variable x₃ = ±1 for crack type, set up a linear model corresponding to the above assumptions.
(b) Fit the model by least squares, and compute the variance estimate. (If you did (a) correctly, X′X will be a diagonal matrix, and the calculations are easy.)
(c) Test the hypothesis that cycling has no effect on mean burst strength.
(d) Obtain a 95% confidence interval for the mean burst strength of pressure tubes with sharp cracks of length +1 at the lowest operating temperature.

5. Two random samples of 11 lambs each were used in an experiment to assess the effect of a treatment on body weight. One sample received the treatment, and the other sample served as the control group (no treatment). The body weight y (in pounds) and age x (in days) were recorded for each animal at the end of the experiment.

    Control    y   35  34  34  35  26  32  24  33  23  20  15
               x   83  81  80  78  73  72  72  70  70  65  54

    Treated    y   45  44  44  46  42  39  38  40  38  31  23
               x   90  83  80  80  79  74  72  70  66  54  50

(a) Plot the data. Fit different straight line models for the two samples, and compute the total residual sum of squares.
(b) Fit parallel straight lines to the two samples and calculate the residual sum of squares.
(c) Use the additional sum of squares method to test the hypothesis of equal slopes.
(d) Repeat (c) using a t-test.
(e) Assuming the parallel line model, find a 95% confidence interval for the increase in mean weight due to the treatment.

14.5. Checking the Model

Example 13.5.3 demonstrates the importance of plotting the data to check that a straight line model is reasonable. This is doubly important with more complex linear models. In this section we briefly describe some procedures for checking the assumptions which underlie the normal linear model.
Most of these involve looking for patterns in plots of residuals or standardized residuals. For a more detailed discussion, see Chapter 3 of N. Draper and H. Smith, Applied Regression Analysis, 2nd Edition, Wiley (1981).

The residual êᵢ = yᵢ − μ̂ᵢ can be regarded as an estimate of the error εᵢ = yᵢ − μᵢ. Since the Yᵢ's are assumed to be independent N(μᵢ, σ²), the εᵢ's are independent N(0, σ²). Thus, if the model is correct, we would expect the êᵢ's to look like independent observations from N(0, σ²).

In fact, the êᵢ's have unequal variances less than σ². To see this we note that, by (14.2.1) and (14.2.4),

    μ̂ = Xβ̂ = XX^L y = My,

where M = (mᵢⱼ) is an n × n symmetric matrix.

Calculation of Leverages

The quantity mᵢᵢ which appears in the expression for the standardized residual rᵢ is called the leverage of the ith point. The leverages are the diagonal elements of the n × n matrix M = XX^L. Since X^L X = I, it follows that MM = M, and from this result it can be shown that 0 ≤ mᵢᵢ ≤ 1.

When the matrix methods of Section 14.2 are used, the leverages may be calculated by multiplying X and (X^L)′ term by term to obtain a new n × q matrix and then finding its row totals.

When algebraic formulas are used as in Chapter 13, we write μ̂ᵢ as a linear combination of y₁, y₂, ..., yₙ and then find mᵢᵢ as the coefficient of yᵢ. For instance, in the one-sample problem we have

    μ̂ᵢ = ȳ = (1/n)(y₁ + ··· + yᵢ + ··· + yₙ)

and so mᵢᵢ = 1/n. In the k-sample problem, μ̂ᵢ is the sample mean of the sample which contains the ith observation. The leverage is mᵢᵢ = 1/nⱼ where nⱼ is the number of observations in this sample. Thus in Example 13.4.2, the 8 observations at 21°C have leverages 1/8, and the other 4 observations at 30°C have leverages 1/4.

Figure 14.5.1. Patterns in residual plots: (i) model satisfactory; (ii) nonconstant variance; (iii) outlier; (iv) curvature.
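The term-by-term recipe for leverages given above can be verified in a few lines of code. This sketch checks the two cases worked out in the text: in the one-sample model every leverage is 1/n, and in a two-sample model with groups of sizes 8 and 4 the leverages are 1/8 and 1/4.

```python
import numpy as np

def leverages(X):
    # X^L = (X'X)^{-1} X' is the left inverse of X; the leverages are the
    # row totals of the term-by-term product of X and (X^L)', i.e. the
    # diagonal elements of M = X X^L.
    XL = np.linalg.inv(X.T @ X) @ X.T
    return np.sum(X * XL.T, axis=1)

# One-sample model: X is a single column of ones (n = 5).
m1 = leverages(np.ones((5, 1)))

# Two-sample model: indicator columns for groups of sizes 8 and 4.
X2 = np.zeros((12, 2))
X2[:8, 0] = 1.0
X2[8:, 1] = 1.0
m2 = leverages(X2)
```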
In (iv) there is a pattern in the residual plot which suggests that the means μᵢ have not been modelled correctly. This is also a possible explanation in (ii). With a straight line model, addition of a quadratic term to the model should fix the problem, but some detective work may be needed to find a remedy with more complex models.

For straight line models, a plot of rᵢ or êᵢ versus μ̂ᵢ gives essentially the same information as a plot of yᵢ versus xᵢ. However, patterns will show up more clearly in the residual plot which shows only deviations from the fitted line.

It can be shown that cov(êᵢ, μ̂ᵢ) = 0, but that cov(êᵢ, yᵢ) = var(êᵢ) = (1 − mᵢᵢ)σ². A plot of êᵢ or rᵢ versus observed values yᵢ will generally not be helpful, because we expect it to show a pattern even if the model is correct.

Various other residual plots may be useful, depending upon the situation. For instance, we might wish to plot the residuals in Example 13.5.1 against the weights of the women if these were available. In the plastic gear example, we might plot the residuals in the order that the corresponding measurements were made, with the purpose of checking whether there was a systematic change in laboratory conditions.

Although residual plots are very useful in statistical analysis, a word of caution is necessary. Even if the model is correct, random variation will produce patterns in the residuals rather more often than most people would expect. Many beginners spot an "unusual" pattern in almost every residual plot. One useful check is to count runs of positive and negative residuals, and compare the observed results with theoretical results for random sequences (see Section 2.7).

Looking for Influential Points

Sometimes just one or two of the observations will have a large influence on the analysis. The fourth data set in Example 13.5.3 shows an extreme case in which the estimation of the slope is entirely dependent upon one observation at x = 19. We would like to be able to detect influential points when they occur in more complex linear models.

By (14.2.4) we have β̂ = X^L y, where X^L = (xᴸᵢⱼ) is q × n. It follows that

    β̂ᵢ = Σⱼ xᴸᵢⱼ yⱼ    for i = 1, 2, ..., q.

Thus xᴸᵢⱼ is the amount by which β̂ᵢ would change as a result of a unit increase in yⱼ. By examining the elements in the ith row of X^L, we can determine whether there are one or two points which strongly influence the estimation of β̂ᵢ.

Similarly, (14.5.1) shows that increasing yᵢ by one unit will change μ̂ᵢ by mᵢᵢ units. If the leverage mᵢᵢ is close to 1, then μ̂ᵢ is determined almost entirely by just the one observation yᵢ.
(b) Show that X(X′X)⁻¹X′ is symmetric and idempotent, and hence that the leverages mᵢᵢ lie between 0 and 1.
(c) Show that, if the ith point has leverage mᵢᵢ = 1, then μ̂ᵢ = yᵢ and êᵢ = 0.

*14.6. Derivations

*This section may be omitted on first reading.

In this section we derive the distribution of the residual sum of squares in the normal linear model, and also the distribution of the additional sum of squares due to a linear hypothesis. We used these results in earlier sections to set up significance tests and confidence intervals.

We assume that Y₁, Y₂, ..., Yₙ are independent N(μᵢ, σ²), and that μ = Xβ where X is an n × q matrix of constants with linearly independent columns. Let Uᵢ = (Yᵢ − μᵢ)/σ, so that Yᵢ = μᵢ + σUᵢ. In matrix notation we have

    Y = μ + σU = Xβ + σU

where U is an n × 1 vector whose components U₁, U₂, ..., Uₙ are independent N(0, 1).

The vector of fitted values is

    μ̂ = Xβ̂ = XX^L Y = XX^L(Xβ + σU).

Since X^L X = I, we have

    μ̂ = Xβ + σMU    (14.6.1)

where M = XX^L. The vector of residuals is

    ê = Y − μ̂ = (Xβ + σU) − (Xβ + σMU) = σ(I − M)U.    (14.6.2)

In proving the theorems below, we shall construct an orthogonal matrix C and then consider the orthogonal transformation Z = C′U. Then Z = (Zᵢ) is n × 1, and by Theorem 7.3.1, its components Z₁, Z₂, ..., Zₙ are independent N(0, 1).

The following lemma will be used in constructing the required orthogonal transformation.

Lemma 14.6.1. Let X be an n × q matrix with linearly independent columns. Then there exists an n × q matrix P and a nonsingular q × q matrix A such that X = PA and P′P = I.

PROOF. Let X₁, X₂, ..., Xq denote the q columns of X, and let b = (bⱼ) be a q × 1 vector of constants. The product Xb is n × 1 and represents a linear combination of the columns of X:

    Xb = X₁b₁ + X₂b₂ + ··· + Xqbq.

The set of all such linear combinations is a vector space 𝒱(X) called the column space of X. Since the Xⱼ's are assumed to be linearly independent, 𝒱(X) has dimension q.

Let P₁, P₂, ..., Pq be a set of normed orthogonal basis vectors for 𝒱(X). One way to construct the Pⱼ's is by applying the Gram-Schmidt orthogonalization procedure to the columns of X. Let P be the n × q matrix with P₁, P₂, ..., Pq as its columns. Since the Pⱼ's are normed and orthogonal we have Pⱼ′Pⱼ = 1 and Pᵢ′Pⱼ = 0 for i ≠ j. It follows that P′P = I.

Since P₁, P₂, ..., Pq is a set of basis vectors for 𝒱(X), every vector in 𝒱(X) can be written as a linear combination of P₁, P₂, ..., Pq. In particular, each of the Xⱼ's can be written as a linear combination of P₁, P₂, ..., Pq,

    X₁ = P₁a₁₁ + P₂a₂₁ + ··· + Pqaq₁

for some constants aᵢⱼ. Thus we have X = PA. The matrix A must be nonsingular because both X and P have rank q. □

Theorem 14.6.1. Under the normal linear model assumptions stated above, Σêᵢ² is distributed independently of β̂, and

    (1/σ²) Σêᵢ² ~ χ²₍ₙ₋q₎.

PROOF. By the Lemma we can write X = PA where P is n × q with normed orthogonal columns and A is q × q nonsingular. Let C = (P|R) be an n × n orthogonal matrix whose first q columns are the columns of P. Since C′C = CC′ = I, we have P′P = I, R′R = I, and

    PP′ + RR′ = I.

Since P′P = I and A is nonsingular, we have

    M = XX^L = X(X′X)⁻¹X′ = PA(A′P′PA)⁻¹A′P′ = PAA⁻¹(A′)⁻¹A′P′ = PP′;
    I − M = I − PP′ = RR′.

It follows from (14.6.1) and (14.6.2) that

    μ̂ = Xβ + σPP′U;    ê = σRR′U.

Since R′R = I, the residual sum of squares is

    Σêᵢ² = ê′ê = σ²U′RR′RR′U = σ²U′RR′U.

Now consider the orthogonal transformation Z = C′U. Then Z₁, Z₂, ..., Zₙ are independent N(0, 1) variates. Note that P′U contains Z₁, Z₂, ..., Zq and R′U contains Zq₊₁, Zq₊₂, ..., Zₙ. Since μ̂ is a function of P′U and ê is a function of R′U, it follows that μ̂ and ê are distributed independently.
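Before completing the proof, the identities just established can be spot-checked numerically. NumPy's QR factorization plays the role of the Gram-Schmidt construction in Lemma 14.6.1: it gives X = PA with P′P = I, after which M = X(X′X)⁻¹X′ = PP′ and MM = M. The 7 × 3 design matrix below is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(7, 3))           # n = 7, q = 3; columns independent a.s.

P, A = np.linalg.qr(X)                # X = PA, P is 7x3 with orthonormal columns
M = X @ np.linalg.inv(X.T @ X) @ X.T  # projection matrix M = X(X'X)^{-1}X'
```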
Also we have

    Σêᵢ²/σ² = U′RR′U = (R′U)′(R′U) = Zq₊₁² + Zq₊₂² + ··· + Zₙ².

Since this is the sum of squares of n − q independent N(0, 1) variates, it follows by (6.9.9) that Σêᵢ²/σ² ~ χ²₍ₙ₋q₎. □

Theorem 14.6.2. Consider the hypothesis

    H: β = Aγ

where A is q × p with linearly independent columns. Let Σẽᵢ² be the residual sum of squares under H, and let Q = Σẽᵢ² − Σêᵢ² be the additional sum of squares due to H. Then, if H is true, Q is distributed independently of Σêᵢ², and

    Q/σ² ~ χ²₍q₋p₎.

PROOF. Under H, the model becomes μ = Wγ where W = XA. The columns of W are linear combinations of the columns of X, and therefore 𝒱(W), the column space of W, is a subspace of 𝒱(X).

As in the preceding theorem, we consider an orthogonal transformation Z = C′U. For the first p columns of C we take a normed orthogonal basis of the vector space 𝒱(W). To this we add q − p columns so that the first q columns of C form a normed orthogonal basis of 𝒱(X). This is possible because 𝒱(W) is a subspace of 𝒱(X).

The argument in the preceding proof can now be used for both the original model μ = Xβ and the hypothesized model μ = Wγ, giving

    Σêᵢ²/σ² = Zq₊₁² + Zq₊₂² + ··· + Zₙ²;
    Σẽᵢ²/σ² = Zp₊₁² + Zp₊₂² + ··· + Zₙ².

Subtracting gives

    Q/σ² = (Σẽᵢ² − Σêᵢ²)/σ² = Zp₊₁² + Zp₊₂² + ··· + Zq².

Since this is the sum of squares of q − p independent N(0, 1) variates, it follows by (6.9.9) that Q/σ² ~ χ²₍q₋p₎. Also, since Zp₊₁, ..., Zq are distributed independently of Zq₊₁, ..., Zₙ, it follows that Q is distributed independently of Σêᵢ². □

CHAPTER 15

Sufficient Statistics and Conditional Tests

In this chapter we discuss some general principles of statistical inference and their applications in the construction of significance tests and confidence intervals.

An important requirement of any valid statistical inference is that it should not depend upon any features of the data which are irrelevant to the question of interest. The sufficiency principle attempts to formalize this requirement. Section 1 describes this principle and defines sufficient statistics. Some properties of sufficient statistics are derived in Section 2.

Significance levels and coverage probabilities are computed from sampling distributions in a series of imaginary repetitions of the experiment. These repetitions are purely hypothetical, and will not actually be carried out. Sections 3 and 4 are concerned with how to choose an appropriate series of repetitions for inferences about a parameter. In particular, it is argued that, when ancillary statistics are present, significance levels and coverage probabilities should be computed from a conditional distribution.

Section 5 considers difficulties which can arise in testing composite hypotheses. Sometimes a satisfactory test can be obtained by conditioning on the observed values of sufficient statistics for the unknown parameters. Some examples of conditional tests are given in Section 6.

15.1. The Sufficiency Principle

An important requirement of any valid statistical inference is that it should not be affected by features of the data which are irrelevant to the question of interest. The sufficiency principle is an attempt to formalize this requirement.
Let y, y′ be two possible (mutually exclusive) outcomes of an experiment whose probability model involves an unknown parameter θ. Suppose that we wish to make inferences about the value of θ. Roughly speaking, the sufficiency principle states that, if the choice between y and y′ is a purely random one not depending upon the value of θ, then inferences about θ should be the same if y is observed as they would be if y′ were observed.

For instance, consider n = 3 Bernoulli trials, and suppose that we wish to make inferences about θ = P(success). Consider the three outcomes y = SSF, y′ = SFS, and y″ = FSS. Each of these outcomes has probability θ²(1 − θ). No matter what the value of θ is, the three outcomes are equally probable. The choice among them is purely random, and does not depend in any way on the value of θ. The sufficiency principle states that inferences about θ should be the same no matter which of these outcomes is observed.

The conditional probability of observing outcome y given that either y or y′ has occurred is

    P(y | y or y′) = P(y; θ) / [P(y; θ) + P(y′; θ)] = Odds/(Odds + 1)    (15.1.1)

where the ratio of probabilities,

    Odds = P(y; θ)/P(y′; θ),    (15.1.2)

is the fair betting odds for outcome y versus outcome y′. If the odds do not depend upon θ, then the choice between outcomes y and y′ is purely random and is unrelated to the value of θ.

The sufficiency principle states that, if the odds (15.1.2) do not depend upon θ, then outcomes y and y′ should lead to the same inferences concerning θ. An equivalent requirement is that the conditional probability (15.1.1) does not depend upon θ.

Sufficiency and the Likelihood Function

The likelihood function of θ based on outcome y is proportional to P(y; θ):

    L(θ; y) = k(y)·P(y; θ),    (15.1.3)

where k(y) is positive and does not depend upon θ. The odds (15.1.2) are independent of θ if and only if L(θ; y) is proportional to L(θ; y′). Another way of stating the sufficiency principle is that outcomes of the same experiment which give rise to proportional likelihood functions for θ should lead to the same inferences about θ. Indeed, this is the reason that the likelihood function is defined only up to a multiplicative constant, and that two likelihood functions which are proportional to one another are regarded as equivalent.

In Chapters 9-14 we restricted discussion almost exclusively to methods based on the likelihood function or likelihood ratio statistic. For these methods, observations which give rise to proportional likelihood functions will lead to the same inferences, and therefore the sufficiency principle is automatically satisfied.

Sufficient Statistics

A statistic T is a random variable whose value T(y) can be computed from the data without knowledge of the value of θ. T is called a sufficient statistic for θ if knowledge of the observed value of T is sufficient to determine L(θ; y) up to a constant of proportionality. In other words, T is a sufficient statistic for θ if L(θ; y) can be written as a function of y only times a function of T and θ:

    L(θ; y) = C(y)·H(T(y); θ).    (15.1.4)

Two outcomes y, y′ such that T(y) = T(y′) will give rise to proportional likelihood functions for θ, and by the sufficiency principle, they should lead to the same inferences concerning θ. All that we require from the data for inferences about θ is the observed value of a sufficient statistic T.

Even when θ is one-dimensional, we may need two or more functions of the data to fully determine the likelihood function. If knowledge of the observed values of k statistics T₁, T₂, ..., Tₖ is sufficient to determine L(θ; y) up to a proportionality constant, then T = (T₁, T₂, ..., Tₖ) is called a set of sufficient statistics. Two outcomes y, y′ such that Tᵢ(y) = Tᵢ(y′) for i = 1, 2, ..., k will give rise to proportional likelihood functions for θ.

The existence of a set of sufficient statistics T₁, T₂, ..., Tₖ enables us to condense or reduce the data to k numbers T₁(y), T₂(y), ..., Tₖ(y) without losing information about θ. A set of sufficient statistics which gives the greatest possible reduction of the data is called minimally sufficient for θ. If T is minimally sufficient for θ, then T(y) = T(y′) if and only if L(θ; y) and L(θ; y′) are proportional.

The sufficiency principle states that outcomes which give rise to proportional likelihood functions for θ should lead to the same inferences concerning θ. An equivalent statement of the sufficiency principle is that outcomes which imply the same value of a minimally sufficient statistic or set of statistics T should lead to the same inferences concerning θ. If T is minimally sufficient for θ, then T carries all of the relevant information for inferences about θ. Inferences about θ should depend only on T and not on the remainder of the data.

EXAMPLE 15.1.1. Consider n Bernoulli trials, and suppose that we wish to make inferences about θ = P(success). An outcome of the experiment may be written as a sequence y = (y₁, y₂, ..., yₙ), where yᵢ = 1 if the ith trial produces a success and yᵢ = 0 otherwise. Since P(yᵢ = 1) = θ and P(yᵢ = 0) = 1 − θ, we have

    f(yᵢ) = θ^yᵢ (1 − θ)^(1−yᵢ)    for yᵢ = 0, 1.
Since trials are independent, the probability of outcome y is

    P(y; θ) = Π f(yᵢ) = θ^(Σyᵢ)(1 − θ)^(n−Σyᵢ).

The likelihood function is a constant times P(y; θ),

    L(θ; y) = k(y)·θ^(Σyᵢ)(1 − θ)^(n−Σyᵢ)    for 0 < θ < 1.

Usually we would take k(y) = 1 for convenience.

Let y′ = (y₁′, y₂′, ..., yₙ′) be another possible outcome. Then

    P(y; θ)/P(y′; θ) = θ^(Σyᵢ−Σyᵢ′)(1 − θ)^(Σyᵢ′−Σyᵢ),

which is independent of θ if and only if Σyᵢ = Σyᵢ′. This is also the condition under which y and y′ give rise to proportional likelihood functions for θ. By the sufficiency principle, outcomes y, y′ such that Σyᵢ = Σyᵢ′ should lead to the same inferences for θ.

The random variable T = ΣYᵢ is a sufficient statistic for θ in this example. Outcomes y, y′ such that T(y) = T(y′) (that is, Σyᵢ = Σyᵢ′) give rise to proportional likelihood functions. In fact, T is minimally sufficient because if T(y) ≠ T(y′), then L(θ; y) is not proportional to L(θ; y′).

The sufficient statistic T = ΣYᵢ carries all of the information from the data concerning the value of θ. The remainder of the data (i.e. information about the order in which the 0's and 1's occurred) is not relevant to inferences about θ under the model assumed. This additional information is what would be used to check the assumptions of independent trials and equal success probabilities which underlie the Bernoulli trials model.

EXAMPLE 15.1.2. Let Y₁, Y₂, ..., Yₙ be independent Poisson variates with the same mean μ. Then the probability of outcome y = (y₁, y₂, ..., yₙ) is

    P(y; μ) = Π μ^yᵢ e^(−μ)/yᵢ! = μ^(Σyᵢ) e^(−nμ)/(y₁!y₂!···yₙ!).

Here the total T = ΣYᵢ is a sufficient statistic for μ; the sample size n is treated as a known constant rather than as part of the sufficient statistic, although its value would be required for inferences about the parameter.

EXAMPLE 15.1.3(a). Let Y₁, Y₂, ..., Yₙ be independent exponential variates with the same mean θ. Their joint p.d.f. is

    f(y₁, y₂, ..., yₙ) = Π (1/θ)e^(−yᵢ/θ) = θ^(−n) e^(−Σyᵢ/θ)

for 0 < yᵢ < ∞. If the measurement intervals are small (see Section 9.4), the likelihood function of θ is

    L(θ; y) = C(y)·θ^(−n) e^(−t/θ)    for θ > 0

where t = Σyᵢ. Assuming n to be known in advance, the total T = ΣYᵢ is a sufficient statistic for θ.

(b). A more complicated situation was considered in Section 9.5. The lifetimes of n specimens were assumed to be independent exponential variates, but censoring of lifetimes at predetermined times was permitted. The likelihood function then has the form

    L(θ; y) = C(y)·θ^(−m) e^(−s/θ)    for θ > 0

where m is the number of specimens which fail, and s is the sum of m failure times and n − m censoring times. We would not know m or s until after the experiment. Thus, in this case, we need the observed values of two statistics M (the number of failures) and S (the total time on test) before we can write down L(θ). Under the exponential model with censoring, the pair (M, S) is minimally sufficient. Neither M nor S by itself is a sufficient statistic for θ.

EXAMPLE 15.1.4. Suppose that Y₁, Y₂, ..., Yₙ are independent variates having a uniform distribution on the interval [0, θ] where θ > 0. From Problem 9.4.11, the likelihood function of θ is

    L(θ; y) = C(y)·θ^(−n)    for θ ≥ max(y₁, y₂, ..., yₙ),

and L(θ; y) = 0 otherwise.
where d(t) is not a function of θ. The likelihood function based on (15.2.2) will be the same up to a proportionality constant as that based on (15.2.1). For outcomes y with T(y) = t, the conditional probability is P(Y = y | T = t) = P(y; θ)/P(T = t; θ), and P(Y = y | T = t) = 0 otherwise. Thus by (15.2.1) and (15.2.2) we have

    P(Y = y | T = t) = c(y) / Σ_{T(y)=t} c(y) = c(y)/d(t)    (15.2.4)

which does not depend upon θ.

EXAMPLE 15.2.1. Consider n Bernoulli trials with success probability θ. Let Y = (Y₁, Y₂, ..., Yₙ) be a zero-one vector indicating the observed sequence of failures and successes as in Example 15.1.1. Then

    P(Y = y; θ) = θ^(Σyᵢ)(1 − θ)^(n−Σyᵢ) = θ^t(1 − θ)^(n−t)

where t = Σyᵢ. Here T = ΣYᵢ is a sufficient statistic for θ. Since T is the total number of successes in n Bernoulli trials, it has a binomial (n, θ) distribution, and

    P(T = t; θ) = (n choose t) θ^t(1 − θ)^(n−t)    for t = 0, 1, ..., n.

The likelihood function based on observing T = t and that based on the full sample y are both proportional to θ^t(1 − θ)^(n−t).

The conditional probability of outcome y given that T(y) = t is

    P(Y = y | T = t) = 1/(n choose t)    if Σyᵢ = t,

and 0 otherwise. Given T = t, all (n choose t) possible outcomes are thus equally probable. This distribution does not depend upon θ, and could be used for testing the adequacy of the Bernoulli model.

EXAMPLE 15.2.2. Let Y₁, Y₂, ..., Yₙ be independent Poisson variates with the same mean μ. We showed in Example 15.1.2 that

    P(y; μ) = μ^(Σyᵢ) e^(−nμ)/(y₁!y₂!···yₙ!),

so that T = ΣYᵢ is a sufficient statistic for μ. Since T has a Poisson distribution with mean nμ, the conditional probability of outcome y given T = t is

    P(Y = y | T = t) = [t!/(y₁!y₂!···yₙ!)](1/n)^t    for Σyᵢ = t.

This is a multinomial distribution with index t and equal probability parameters p₁ = p₂ = ··· = pₙ = 1/n. As expected, the conditional distribution of outcomes given the sufficient statistic does not depend upon the parameter μ.

Property 3. Applying a one-to-one transformation to a set of sufficient statistics produces another set of sufficient statistics.

Let T₁, T₂, ..., Tₖ be a set of sufficient statistics for θ. Suppose that U₁, U₂, ..., Uₖ are functions of T₁, T₂, ..., Tₖ, and that the transformation from (T₁, T₂, ..., Tₖ) to (U₁, U₂, ..., Uₖ) is invertible. Since the Uᵢ's and Tᵢ's can be deduced from one another, they have the same information content. Given the values of the Uᵢ's, we can calculate the values of the Tᵢ's and hence determine L(θ; y). Thus the Uᵢ's also form a set of sufficient statistics for θ.

EXAMPLE 15.2.3 (Normal Linear Model). We showed in Example 15.1.7 that β̂₁, β̂₂, ..., β̂q and Σêᵢ² form a set of sufficient statistics for the parameters β₁, β₂, ..., βq, and σ in the normal linear model. One can show, by a similar argument to that in Example 15.1.7, that

    Σêᵢ² = ΣYᵢ² − β̂′X′Xβ̂.

Note also that

    β̂ = (X′X)⁻¹X′Y = (X′X)⁻¹T

where T = X′Y is q × 1 with jth component Tⱼ = Σᵢ xᵢⱼYᵢ. As in Example 15.1.7, we treat X as a matrix of constants which, like n, is known prior to the experiment.
Given observed values of T and ΣYᵢ², we can calculate β̂ and Σêᵢ². Conversely, T and ΣYᵢ² can be computed from β̂ and Σêᵢ². Thus T and ΣYᵢ² have the same information content as β̂ and Σêᵢ². Since β̂₁, β̂₂, ..., β̂q, and Σêᵢ² form a set of sufficient statistics, the same is true of T₁, T₂, ..., Tq and ΣYᵢ².

Property 4. The maximum likelihood estimate θ̂ is part of any set of sufficient statistics T₁, T₂, ..., Tₖ in the sense that its value can be computed from just the Tᵢ's. This follows because the Tᵢ's determine L(θ; y) up to a proportionality constant, and θ̂ does not depend upon this constant.

In some simple examples, θ̂ is itself a sufficient statistic which carries all of the information relevant to estimation of θ. For instance, in Example 15.1.1 we have θ̂ = T/n where T = ΣYᵢ is a sufficient statistic. Since n is given, the values of θ̂ and T can be deduced from one another, and they have the same information content. Similar comments apply in Examples 2, 3(a), 4, 6, and 7 of Section 15.1.

In more complex examples, θ̂ is not by itself a sufficient statistic for θ. For instance, in Example 15.1.3(b) we have a pair of sufficient statistics (S, M), and θ̂ = S/M. The likelihood function is L(θ) = C(y)·θ^(−m) e^(−s/θ), and both statistics are needed to determine it. In large samples, however, the terms beyond the quadratic in the series expansion of the log relative likelihood function about θ̂ will be negligible with high probability. Thus we have

    r(θ) ≈ −½(θ − θ̂)² J(θ̂)

in large samples, so that θ̂ and J(θ̂) form a set of approximate sufficient statistics for θ. In this case we can summarize nearly all of the information concerning θ by giving the most likely value θ̂ and a measure of precision J(θ̂).

PROBLEMS FOR SECTION 15.2

1. Suppose that X has a binomial distribution with parameters (n, θ), and that Y is independent of X and has a binomial distribution with parameters (m, θ), where m and n are known. Show that T = X + Y is a sufficient statistic for θ, and verify that the conditional distribution of X and Y given T does not depend upon θ.

2. Suppose that Y₁, Y₂, ..., Yₙ are independent and identically distributed random variables, with

    P(Yᵢ = y; N) = 1/N    for y = 1, 2, ..., N,

where N is an unknown positive integer.

15.3. Exact Significance Levels and Coverage Probabilities

Suppose, for example, that X₁, X₂, ..., Xₙ are independent exponential variates with mean θ, so that the log relative likelihood function is

    r(θ) = −n[θ̂/θ − 1 − log(θ̂/θ)]    for θ > 0.

Here θ̂ = ΣXⱼ/n is a sufficient statistic for θ. Any reasonable method for testing H: θ = θ₀ or for generating regions of plausible parameter values will depend only on θ̂. Significance levels and coverage probabilities can thus be calculated from the distribution of θ̂ when θ = θ₀. Equivalently, we can use the distribution of u = 2nθ̂/θ₀, which is also sufficient for θ and has a χ²₍₂ₙ₎ distribution when θ = θ₀ (see Problem 6.9.7). We used this result to find coverage probabilities of likelihood regions in Example 11.3.3.

Suppose, for instance, that n = 10, θ̂ = 3, and that we wish to test H: θ = 2.

Suppose now that (θ̂, T) is a set of sufficient statistics for θ, where the marginal distribution of T does not depend upon θ. Then T is called an ancillary statistic (or set of ancillary statistics) for θ. We shall argue that, in this situation, exact significance levels and coverage probabilities should be computed from the conditional distribution of θ̂ given T.

An ancillary statistic T gives no direct information about the value of θ, because its marginal distribution f₂(t) does not depend upon θ. Observing just the value of T would therefore tell us nothing about the value of θ. The primary information about θ is carried by θ̂, with T providing supplementary or ancillary information.

Let f(θ̂, t) denote the joint p.f. or p.d.f. of θ̂ and T.
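For the exponential case just described (n = 10, θ̂ = 3, H: θ = 2), the statistic u = 2nθ̂/θ₀ is referred to the χ² distribution with 2n degrees of freedom. For even degrees of freedom 2k the chi-square upper tail has the closed form e^(−u/2) Σ_{j<k} (u/2)ʲ/j!, so no tables are needed. The text may define the significance level differently (e.g. two-sided); the sketch below computes only the upper-tail probability P{χ²₂₀ ≥ u}.

```python
from math import exp, factorial

def chi2_sf_even(u, k):
    # P(chi-square with 2k d.f. >= u), via the Poisson(u/2) CDF at k-1
    lam = u / 2.0
    return exp(-lam) * sum(lam ** j / factorial(j) for j in range(k))

n, theta_hat, theta0 = 10, 3.0, 2.0
u = 2 * n * theta_hat / theta0          # observed value, u = 30
sl = chi2_sf_even(u, n)                 # upper-tail probability P(chi2_20 >= 30)
```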
We can write f(θ̂, t) as a product,

    f(θ̂, t) = f₁(θ̂ | t) f₂(t),

where the second factor does not depend upon θ. Since (θ̂, T) is a set of sufficient statistics, L(θ) is proportional to f(θ̂, t) (see Property 1 in Section 15.2). Since f₂(t) does not depend upon θ, it follows that L(θ) is proportional to f₁(θ̂ | t), the conditional p.f. or p.d.f. of θ̂ given the observed value of T. This conditional distribution carries all of the information concerning θ. The marginal distribution of T does not depend upon θ, and so it is not used in making inferences about θ.

Often one can interpret an ancillary statistic T as a measure of the precision with which it is possible to estimate θ. The various possible outcomes of an experiment may differ greatly in the amount of information about θ which they are capable of yielding. If we are fortunate, we observe an outcome which permits the value of θ to be determined quite precisely. If we are unlucky, we may obtain an uninformative outcome from which we can learn relatively little about θ.

In problems of inference, it is necessary to take into account the informativeness of the data actually obtained. The fact that we might obtain a more informative or less informative result if the experiment were repeated should be considered in designing future experiments, but it is irrelevant to the interpretation of the data at hand. The observed value of the ancillary statistic indicates the informativeness of the data actually obtained. It is therefore appropriate to base inferences about θ on the conditional distribution of θ̂ given the observed value of the ancillary statistic T.

EXAMPLE 15.3.2 (Random Sample Size). Suppose that the experiment involves n Bernoulli trials as in Example 15.1.1. For instance, we might examine n subjects for tuberculosis with the intention of making inferences about θ, the probability that a subject has the disease. Suppose that the sample size N is random, with probability function g(n) not depending upon θ. Given N = n, the probability of the outcome sequence y is θ^(Σyᵢ)(1 − θ)^(n−Σyᵢ) as in Example 15.1.1. Hence the joint probability of y and n is

    P(y, n) = θ^(Σyᵢ)(1 − θ)^(n−Σyᵢ) g(n).

The likelihood function of θ is then

    L(θ) = k(y, n)·θ^(Σyᵢ)(1 − θ)^(n−Σyᵢ)    for 0 < θ < 1,

where θ̂ = Σyᵢ/n. It follows that (θ̂, N) is a pair of minimally sufficient statistics for θ, and that N is ancillary. Hence inferences about θ will be based on the conditional distribution of θ̂ given the observed value of N. Equivalently, inferences about θ may be based on the conditional (binomial) distribution of nθ̂ given the observed sample size n. Coverage probabilities and significance levels will thus be calculated as in Examples 11.2.1 and 12.2.1. Although the sample size N is random, we take it as fixed in inferences about θ. The fact that we might get a different sample size in repetitions of the experiment is irrelevant to the interpretation of the data actually obtained.

In the above discussion, we assumed that the distribution of N did not depend upon θ. Of course, if the distribution of N depended upon θ, conditioning on the observed value of N would entail a loss of information concerning θ, because L(θ) would no longer be proportional to f₁(θ̂ | n). For instance, one might decide to keep examining subjects until three with tuberculosis had been found and then stop. Then the distribution of N would depend upon θ, and it would not be appropriate to condition upon its observed value.

EXAMPLE 15.3.3. A total of n clouds are to be observed in an experiment to determine the effectiveness of cloud seeding in producing rain. For each cloud it is decided whether or not to seed by flipping a balanced coin. Hence Z, the
tion of estim ating 8, the prop ortio n of num ber of clou ds to be seeded; has a
the popu latio n havi ng this disease. In bino mial (n, ·! ) distr ibuti on.
Exam ple 15.1.1 we assu med that n, the Let X be the num ber of seeded clou ds
samp le size, was fixed and know n prio which prod uce rain, and let Y be the
r to the expe rime nt. How ever it may be num ber of unseeded clou ds which prod
that n itself is subje ct to varia tion, and uce rain. We assu me that clou ds are
coul d be mod elled as an observed inde pend ent, and that the prob abili ty
value of a rand om varia ble N. For of rain is p1 for a seed ed clou d and p
insta nce, the samp le size migh t depe for an unseeded clou d. Then, given that
upon the amo unt of mon ey and nd z clou ds are seeded, X has a bino mial2
labo rator y space, and the num ber (z, pi) distr ibuti on, and Y has a bino
pers onne l avail able for the stud y, and of mial (n - z, p2 ) distr ibuti on inde pend
perh aps none of these is unde r the stric ently of X . We observe (x, y, z) and wish -
cont rol of the expe rime nter. Or perh t to mak e inferences abou t p a nd p .
aps unforeseen circu msta nces unre lated In parti cula r, we migh t want to test 1 2
to the incidence of tube rculo sis coul d the hypo thesi s that p = p •
caus e the expe rime nt to be term inate The joint prob abili ty function of X, 1 2
after 150 peop le have been exam ined d· Y, and Z is
, altho ugh it was originally plan ned
exam ine 200. to f(x, y, z) = f(x, y! z)f(z ) = f(xlz )f(yl z)f(z
Supp ose, then, that the samp le size N )
is a rand om varia ble with prob abili ty
func tion g(n) not depe ndin g upon 8. The
expe rime nt prod uces n, the obse rved
value of N, and a sequ ence y = (y , y
1 2 , .. ., y.) wher e Y; = 1 or 0 acco rding to
whe ther the ith subj ect does or does The likelihood func tion of p and p
not have tuberculosis. Give n n, the
prob abili ty of the sequ ence y is 1 2 is thus
where p1 = x/z and p2 = y/(n - z). Here p1 , Pi· and Z are jointly minimally sufficient statistics for 8, and A 1 ",A 2 , ... , A. _ 1 are ancillary. All of the
sufficient for p 1 and p2 , and Z is ancillary. Inferences about p 1 and Pi will informatio n about 8 is carried by the condition al distributio n of T given the
therefore be based on the condition al distributio n of Pt and p2 given the observed values of the ancillary statistics. This distributio n, which may be
observed value of Z, or equivalen tly, on the condition al distributio n of X and f~und by numerical integratio n, would be used for calculatin g exact sig-
y given the observed value of Z. Since z is to be treated as fixed, a test of nificance levels or coverage probabilit ies.
H: p 1 =Pi can be carried out as in Example 12.4.1, with n 1 =z and n2 =n-z. See In this example, the ancillary statistics give informati on about the shape of
Section 15.6 for discussion of an exact test of this hypothesis. the likelihood function. For instance if n = 2, L(8) has a u·nique maximum at y
The relationsh ip between the value of Zand precision is easily seen in this when a 1 is small, but is bimodal with a relati ve minimum at y when a 1 is large.
example. If one should get z = 0 (improbab le but still possible), then one The observed value of A 1 indicates the shape of L(8), and hence the
would not seed a ny clouds, and thus would obtain no informati on about P1 • appropria te form of likelihood and confidenc e regions. However A 1 itself tells
us nothing about the magnitud e of 8.
Similarly, with z = n, one would obtain no informati on about p 2 . In both
cases, the experimen t would be incapable of giving evidence against the
hypothesi s p 1 =Pi· However, if one obtained z:::::; n/2, the experimen t would
give a reasonab le amount of informati on about p 1 and Pi , and hence would
be capable of showing that they are different. The observed value of Z thus PROBLEMS FOR SECTION l S.3
indicates the precision which is possible in inferences about Pt and p 2 •
Lt Suppose that patients arrive for treatment according to a Poisson process in time
Although Z is a random variable, we regard Z as fixed at its observed value in with 20 arrivals per year on average. The treatment is successful for a fraction{) of
the analysis. patients. Let X be the number of successful treatments and Y the number of
In this case, the existence of the ancillary statistic Z shows up a defect in the unsuccessful treatments in a one-year period. Then X and Y are independent
design of the experimen t. It would be better to set up the experimen t so that Poisson variates with means 20() and 20(1 - {)), Find an ancillary statistic T such
the value of Z was fixed in advance near n/2. This could be done by drawing that &and T are jointly sufficient for e, and derive the appropriate conditional
balls at random without replaceme nt from an urn containin g n/2 white balls distribution for inferences about e.
and n/ 2 black balls, and seeding a cloud if a white ball is drawn.
2. Let X1 , X 2, ... , X. be independent random variables having a continuous
EXAMPLE 15.3.4 (Cauchy Distributi on). Suppose that Y1 , Yi, .. . , Y,, are uniform distribution on the interval [{), {) + 1].
independe nt variates having a Cauchy distributio n centered at 8. From
(a) Show that &:X<•>-1 , and that T=X<·>- X<1 > is an ancillary statistic.
Example 15.1.5, the complete set of order statistics 1( 1 > ~ 1( 2 > ~ .. . ~ l(.J is (b) Show that the value of {) must lie in the interval [&, &+ c], where c is the
minimally sufficient for 8. observed value of T.
In this example it is possible to find a set of n - 1 ancillary statistics. To see (c) Show that the interval [/}, &+fl has (unconditional) coverage probability
this we note that the distributi on of U; = Y; - 8 does not depend upon 8. In 1-(!f.
fact, U; has a Cauchy distributi on centred at zero, with p.d.f. (d) !f n = 3, then[&, /}+ !J is an 87.5% confidence interval fore. Explain why this
n provided by
mterval might not give a satisfactory summary of the informatio
for - co < u < co . the data concerning the value of e.
distributi on of A 1 , A 2 , .. ., A. - 1 • (a) Show that, if A 1 ::; 2, the likelihood equation /'({)) = O has just one real root,
Now let Tbe a ny statistic such that the transform ation from }( 1» 1(2 ,, •.• , l(.> and that &= y. ·
to T, A 1 , Ai, ... , A._ 1 is one-to-on e . .For instance, we could take T:: Y(1> (b) S,how that, if A1 > 2, the likelihood equation has three real roots, and that
there is a relative minimum at {) = y.
for any i, or T= Y, or T:: (J, Then (T, A 1 , Ai, .. ., A. _ 1) is a set of minimally
296 15. Sufficient Statistics and Conditional Tests
15/.. Choosing the Reference Set
297
15.4. Choosing the Reference Set
of which are as follows:
To evalua te the significance level for a test of H: B=
B , it is necessary to (1) Repeat with X + Y fixed at 50.
imagin e a series of repetit ions of the experi ment with 0
B fixed at B0 • At each (2) Repeat with X fixed at 15, so that Y is the numbe
repetit ion the value of the test statisti c D is to be compu 15th success. r of failures before the
ted and compa red
with D0 b.- The significance level is the fraction of the ( J) Repeat.w .
time that D would be 1th Y fixed at 35, so that Xis the numbe r of successes before
greate r than or equal to Doh• in infinitely many 35th fadure. the
repetitions. Coverage
probab ilities are depend ent on a similar imagin ary
set of repetitions. The
series of repetit ions with respect to which SL and CP are Under H, the probab ility of pair (x, y) in the three cases
defined is sometimes is
called the reference set for inferences about B.
Even if the experi ment were actually going to be repeat
ed over and over !1 (x, y) = (
x+
x
y) BQ(I - B )Y +y=
again, care would be requir ed in choosi ng the referen
0 for x 50; x = 0, I , .. . , 50;
ce set for inferences
about B. The planne d series of repetit ions will not
approp riate set for inferences about 8! For instance,
necessarily be the f2(x, y) = ( x + y-1)
x_ l BQ(l - 80.)Y
in the cloud seeding for x = 15; y = 0, 1, 2, ... ;
experi ment (Exam ple 15.3.3), the numbe r Z of clouds
seeded would vary in
future repetitions. Howev er significance levels and x+y -1)
coverage probabilities J3(x, y) = ( y- 1 80(1 - Bo)Y
should be compu ted from the condit ional distrib ution for y = 35; x = 0, 1, 2, ....
of X and Y, with the
ancillary statisti c Z held fixed at its observed value.
Most real experi ments do not get repeat ed over and over We h~ve three. different reference sets depend ing upon
again, and so the repeht10ns we imagine. what sequen ce of
reference set (or series of repetitions) is purely hypoth
etical. Usually all that I.n case ( !), we calculate SL by summi ng f (x, y) over
we have is a set of data from which we wish to extrac 1 all pairs (x, y) for
t inform ation about B which x + Y = 50 and D(x, y) 2 D0 b.- In (2), we sum f (x,
and a descri ption of how it was collected. It may be possib 1 y) over all (x, y) with
le to imagine many x = 15 and D(x, y) 2 Dobs· And in (3) we sum f (x , y)
different ways in which the experi ment could be repeate 3 over all (x, y) with v = 35
d. Except in some and D(x, y) 2 Dobs· The significance level will in genera
simple examples it is not obviou s what set of repetitions l be different for the
is approp riate for th_ree case~. Two observers who see the same sequence
inferences about 8. of 15 successes and 35
failures might therefore calculate different significance
Significance levels and covera ge probab ilities are depend levels (or confidence
ent on the choice Intervals) because they imagine different ways in which
of a reference set. Since it is often unclea r how the referen the experi ment might
ce set should be be repeated. And of course it is entirely possible that there
chosen, there is an .unavo idable fuzziness about the is no intenti on of
definitions of exact actually repeating the experi ment anyway!
significance levels and coverage probabilities.
It is a bit ~~settling that inferences should depend upon
In this section we consid er two examp les which illustrate an imagin ary set of
the dependence of future repetltJOns which will not actuall y be carried
SL and CP on the choice of the reference set. These examp out. Howev er this is
les also illustrate u~avoidable if we wish to consider frequency characteristic
an impor tant proper ty of the likelihood ratio statistic: s such ' as sig-
that its distrib ution is mficance l~vels and coverage probabilities. What we
remark ably stable under different possible choices of the can do is attemp t to
reference set. Thus, if lessen the 1mpo_rta~ce of choosi ng the reference set by
likelih ood ratio tests are used, it generally matter using metho ds closely
s very little how the related to the hkehh ood function.
reference set is chosen. Similarly, intervals constr ucted
from the likelihood In all three cases above, the log likelihood function of
function or from likelihood ratio tests will have practic 8 is
ally the same coverage
probab ility under a variety of different choices for the referen l(B) = x log 8 + y log(! - 8) for O < 8 <I,
ce set. T,his is an
impor tant advan tage of likelih ood-ba sed methods.
· and the MLE is 8= x/(x + y). The likelihood ratio statistic for testing
EXAMP LE 15.4.1. Suppo se that X = 15 successes and Y = 35
H: B.= B0 is
failures are
observ ed in successive Bernoulli trials with ?(succ ess)=
H: 8 = 80 using some test statisti c D(X, Y), and let Dobs
observ ed value of D. Then the significance level is the sum
8. Consid er a test of
= D(15, 35) be the
D(x,y )=-2r (B 0 )=2[ xlog --x-+ ylog
. (x + y)8 0 (x + y}(I
Y
- 80 )
J
·
of the probabilities
of pairs (x, y) for which D(x, y);;?:: Dobs· In all_ three situati ons D ~ 1 > if H is t;ue, and the approx
xf
2 imate significance
One could imagine repeat ing this experi ment in many
different ways, three
level IS P{x(ll ~ Dobs}· If we are conten t to use this large-sample approx
ima-
tion, It does not matter which of the three reference sets
is chosen .
298 15. Sufficient Statistics and Conditional Tests 15.4. Choosing the Reference Set 299
Table 15.4. l. Exact Significance Levels for first technique gives a reading X ~ N(µ, 1) where µ is the true log con-
Three Possible Reference Sets centration, while the second gives X ~ N(µ, 100). A solution is assigned to
either the first technique or the second by flipping an unbiased coin, and a
Bo Approx. Exact significance levels single measurement is taken. We wish to obtain a confidence interval for the
SL (2) (3)
true log concentration µ of this particular solution.
0.15 0.0073 0.0082 0.0081 0.0077 Define T = 0 if the first technique is used, and T = 1 otherwise. The
0.16 0.0136 0.0186 0.0151 0.0226 experiment yields a pair of values (x, t). Given t, X has standard deviation 10',
0.17 0.0237 0.0372 0.0262 0.0238 and p.d.f.
0.18 0.0393 0.0403 0.0433 0.0489
0.0685 0.0678 0.0871 1
0.19 0.0619 f(xlt) =~~-exp { -!(x - µ) 2 /10 2 '} for - oo < x < oo
0.0933 0.1087 0.0995 0.0904 10' .
0.40 0.1416 0.1528 0.1407 0.1560 and the joint distribution of X and T is
0.42 0.0798 0.0877 0.0907 0.0879
0.44 0.0421 0.0471 0.0528 0.0489 f (x, t) = f (xlt) ·f2 (t) = !J (xlt) for - oo < x < oo; t = 0, 1.
0.46 0.0208 0.0235 0.0263 0.0243 Hence the likelihood function of µ is
0.48 0.0096 0.0108 0.0104 0.0121
0.0041 0.0066 0.0057 0.0049 L(µ) =exp { -!(x - µ) 2 /10 2 '} for - oo < µ < oo.
The MLE is µ = x, and (jl, T) is a pair of minimally sufficient statistics for µ.
The exact significance level in the likelihood ratio test depends on the Note that Tis an ancillary statistic because its distribution does not depend
choice of the reference set, but the dependence is slight. For instance, consider uponµ.
a test of H: 8 = 0.2, for which Dobs = 2.82 and SL~ P{xf1l ~ 2.82} = 0.0933. In Because of the symmetry, it is natural to consider symmetric intervals
(1) we find that D(x, y) < Dobs for 6 s x s 14, and thus X±a.
14
(a) Conditional reference set. Since Tis ancillary, the arguments of Section
SL 1 = 1 - I f 1 (x, 50 - x) = 0.1087.
15.3 imply that coverage probabilities should be calculated from the
x:6 conditional distribution of X (or jl) given the observed value of T. Thus the
In (2) we have D < Dobs for 36 sys 93, and coverage probability of X ±a is
93 CP(µo) = P{µ 0 EX± alT= t} = P{IX - µ 0 / s alT= t}
SL2 = 1 - I !2(15, y) = 0.0995.
y:36
= P{IZI s a/10'}
In (3) we have D < Dobs for 4 s x s 14, and where Z - N(O, 1). For instance, if a= 3, the coverage probability is
14 P{/Z/ s 3} = 0.997 when t = 0, and P{/ZI::::; 0.3} = 0.236 when t = 1. The
SL3 = 1 - I f3(X, 35) = 0.0904. 95% confidence interval for µ is X ± 1.96 when t = 0, and X ± 19.6 when
x=4
t = 1.
Similarly close agreement is found for other hypothesized values (see Table
(b) Unconditional reference set. The unconditiona l coverage probability of
15.4.1). the interval X ± a is
For reasons similar to those given in Example 11.2.1, the significance level .
is a discontinuou s function of 80 , and the discontinuities will occur at CP(µ 0 ) = P{µ 0 EX± a}= P{/X - µ 0 1 s a}
different parameter values in (1), (2), and (3). This accounts almost entirely for
the differences among SL 1 , SL 2, and SL3. = P{IX - µol s a/T= O}P{T =0} + P{IX - µ 0 1 s alT= l}P{T= 1}
When the likelihood ratio test is used, it matters very little whether (1), (2),
or (3) is assumed. This will generally not be the case for other choices of the =!P{IZ/sa }+tP{IZ/s a },
10
test statistic D.
where Z ~ N(O, 1). For instance, the coverage probability of X ± 3 is
EXAMPLE 15.4.2. Suppose that there are two different techniques for determin-
ing the log concentration (in standard units) of a chemical in solution. The !P{JZI s 3} + !P{/Z/ s 0.3} = t(0.997 + 0.236) = 0.617
300 I 5. Sufficient Stalistics and Conditional Tests 15.5. Conditional Tests for Composite Hypo
theses
301
for all µ 0 , and so X ± 3 is a 61.7 % confidenc A special feature of the norm al distr ibuti
e interval forµ . Similarly, we find on exam ples of Chap ters 13 and
that X + 16.45 is a 95% confidence 14 is that the exact distr ibuti on of the
interval for µ. The 95% coverage likelihood ratio statistic D does not
prob abih ty is achieved by inclu ding µ depend upon the values of any unkn
0 with prob abili ty l whenever the own para mete rs. For insta nce, the
precise techn ique is used (t = 0), and with likelihood ratio statistic for testing hypo
probability 0.9 whenever t = 1. these s abou t the slope {3 in a strai ght
Clearly it is the cond ition al reference line model is
set which is appr opria te in this
example. If it is know n that the meas urem
ent was made with the more precise
techn ique, then the narro wer interval x
half of futur e meas urem ents would be made
± 1.96 shou ld be given. The fact that
wit!). the less precise technique is
[
l T2
D::n log 1 + --
n- 2
J ' where T= P-P
r.:r:· -t{n - Z)
.ys2c
irrele vant in so far as inferences abou t
µ are concerned. and c = l / Sxx (see Sections 14.4 and 13.6).
(c) Likel ihood ratio statis tic. The likel The distr ibuti on of T does not
ihood ratio statistic for testing depend upon the values of the unkn own
H: µ=µ 0 is inter cept rx and varia nce 11 2 , and so
neither does the distr ibuti on of D. Thus P{D
112.
; : -: D b,} does not depe nd on rx or
0
Whe n T = O, Dis the squa re of the N(O, Usually, the exact distr ibuti on of the test
l) varia te X - µ 0 , and when T = l, D statis tic D does depe nd upon the
is the squa re of the N(O, l) varia te value of any unkn own para mete r 8 not
(X - µ 0 )/ 10. Thus the cond ition al specified by the hypothesis. Then
distr ibuti on of D given T = t is xf > for t P{D;?; D b,} will be a function of 8 rathe r
1 = 0 and for t = 1. It follows that the 0
than a nume rical value.
unco nditi onal distr ibuti on of D is also One way arou nd this prob lem is to comp
Xfn· ute the significance level from an
In Chap ter 11 we suggested that confidenc appr opria te cond ition al distr ibuti on whic
e intervals be cons truct ed from h does not depe nd upon 8. Supp ose
the likelihood function. Since P{xfl)::::; that, unde r H, Tis a sufficient statistic or
3.841} = 0.95, we take D 5. 3.841 to set of sufficient statistics for 8. Then,
obta in the 95% confidence interval by (15.2.1), we can write the prob abili ty
of a typical outc ome y as
X ± 1.96 x !OT. P(Y = y; 8) = c(y) · H(t; 8)
This interval has coverage prob abili (15.5 .1)
ty 0.95 both cond itiona lly and whe:-e t = T(y), and c does not depe nd
unco nditio nally : upon 8. By (15.2.4), the cond ition al
prob abili ty of y given that T = t is
P{µ 0 EX± 1.96 x lOTI T = t} = P{µ
0 EX± l.96 x !OT} = 0.95.
P(Y =YI T = t) = c(y)/ d(t)
Similarly, we have (15.5.2)
where d(t) is the sum of c(y) over ally for
P{D;?; DobslT= t} = P {D;?; Dobs}, which T(y) = t.
Suppose that we comp ute the significan
ce level from the cond ition al
so we get the same signi fican ce level whet distr ibuti on of Y given the obse rved value
her or not we cond ition on T. Whe n of T:
the likelihood ratio statis tic is used, we
get the corre ct answer even if we use SL= P{D;?; DobslT= t}.
the wron g (unc ondi tiona l) reference set! (15.5.3)
We note d in Section 15.3 that, if (0, T) is . . Then, since this cond ition al distr ibuti on
minimally sufficient for 8 and TIS does not depe nd upon 8, we shall
ancillary, then L(8) is prop ortio nal to f(OIT obtai n a numerical value for the significan
= t). Because oft?i s, sig~ifican~e ce level.
tests and confidence intervals base d on An example follows which illustrates this
the likelihood ratio statistic will cond ition al proc edur e, and some
autom atica lly reflect the presence of general comm ents are given at the end of
ancillary statistics, .and condi~ional the section. Addi tiona l exam ples of
significance levels and coverage probabili cond ition al tests for comp osite hypothese
ties will usual~y differ only shghtly s are cons idere d in Section 15.6.
from the unco nditi onal values. Choi ce
of the appr opna te reference set tor
inferences abou t 8 is less impo rtant when
we work with the likelihood ratio The Hardy-Weinberg Law
statis tic.
In some simple cases, the inher itanc e of a
chara cteri stic such as flower colo r is
15.5. Conditional Tests for Composite Hyp governed by a single gene which occurs
in two forms, R and W say. Each
otheses individual has a pair of these genes, one
obta ined from each paren t, so there
His called a comp osite hypothesis if, unde are t!1ree possible genotypes: RR, R W,
r H , there remains an unkn own and WW.
para mete r or vector. of param eters 8. Mos Supp ose that, in both the male and fema
t of the examples in Chap ters 12, 13, le popu latio ns, a prop ortio n 8 of
and 14 involved tests of comp osite hypo the genes are of type Ran d the othe r 1 -
theses. 8 are of type W. Supp ose furth er that
302 15. Sufficient Statistics and Conditional Tests 15.5. Conditional Tests for Composite Hypotheses 303
mating occurs at random with respect to this gene pair. Then the proportions If n is large, then D has approximately a x2 distribution with one degree of
of individuals with genotypes RR, RW, and WW in the next generation will freedom, and
be SL~ P{xtl);?::. D0 b,}.
2 (15.5.4)
P1 = 82 , P2 = 28(1 - 8), p3 = (1 - 8) .
The unconditional probability of the event D ~ D b, would be computed by
0
Furthermore, if random mating continues, these proportions will remain summing the trinomial probabilities P(y; 8) over all y 1 , y 2 , y 3 such that
nearly constant for generation after generation. This famous result from Dz Dobs· This probability will depend upon what value is taken for the
Genetics is called the Hardy-Weinberg Law. unknown parameter 8. Instead, we compute the conditional probability of
Suppose that n individuals (e.g. pea plants) are selected at random and are Dz D 0 " ' given the observed value of T. This conditional probability is found
classified according to genotype. Let y 1 be the number with genotype RR (red by summing P( Y = yl T = t ), and it will not depend upon 8.
flowers), y 2 the number with genotype RW (pink flowers), and y 3 the number Since 8 = t/2n, conditioning on the observed t is equivalent to restricting
with genotype WW (white flowers), where y 1 + y 2 + y 3 = n. We wish to test · attention to those outcomes for which 8 equals its observed value. Hence the
whether these observed frequencies are consistent with the Hardy-Weinberg expected frequencies e 1 , e2 , e 3 will be the same for all outcomes considered in
Law (15.5.4). the conditional test.
Note that, under (15.5.4), there remains an unknown parameter 8 to be To compute the exact conditional significance level, we list all possible
estimated from the data. Thus the hypothesis to be tested is composite. outcomes (y 1 , y 2 , y 3 ) with y 1 + y 2 + y 3 = n and 2y 1 + y 2 = t. For each we
Following the procedure described above, we shall calculate the significance calculate D(y) and c(y). We sum c(y) over all these outcomes to obtain d(t),
level from the conditional distribution of the Y;'s given the observed value of a and divide to get the conditional probabilities P(Y = yl T = t). Finally, we sum
sufficient statistic T. these probabilities over all outcomes such that D(y) ~ D bs· This procedure is
0
Under the hypothesis, the distribution of the Yi's is trinomial with illustrated in the following example.
probability parameters as given in (15.5.4):
EXAMPLE 15.5.1. Suppose that n = 20 individuals were observed, and that the
P(Y = y; 8) =( n ) [8 2JY1[28(1 - 8)Y1[(l - 8) 2 ] 13 observed frequencies were as follows:
Y1Y2Y3
=( n )2Y18'(l-8)2n-1
Yi Y2Y3
Genotype
Obs. freq. y1
RR
5(2.8)
RW
5(9.4)
WW
10(7.8)
Total
20
where t = 2y 1 + y 2 • Here T = 2 Y1 + Y2 is a sufficient statistic for 8, and we
have
Here t = 2y 1 +y 2 =15, and 8 = t/2n = 0.375. The expected frequencies are as
shown in parentheses, and
P(Y = y; 8) = c(y) • H(t; 8)
5
= 2 [ 5 log- 5 + 10 log-
+ 5 log- 10] =
where H(t; 8) = 8'(1 - 8) 2"-', and Dobs
2.8
4.45.
9.4 7.8
c(y) =( n )1Y>. All possible outcomes (y 1 , y 2 , y 3 ) with y 1 + y 2 + y 3 = 20 and 2Yi + Y2 = 15
Y1Y2Y3 are listed in Table 15.5.l together with the corresponding values of D(y) and
By (15.5 .2), the conditional probability of outcome y given that T= t is c(y). Summing c(y) gives d( 15) x 10- 10 = 4.0225, and we divide by this value
c(y)/d(t) where d(t) is the sum of c(y) over ally such that 2y 1 + y 2 = t, and, of to get the conditional probabilities P(Y = yl T = 15) = c(y)/d(l5). There are
course, y 1 + y 2 + y3 = n. four outcomes in the table such that D;?::. D0 b,, and summing their proba-
The MLE of 8 is 8 = t/2n, and the estimated expected frequencies for the bilities gives
three genotypes are SL= 0.0126 + 0.0370 + 0.0028 + 0.0001 = 0.0525.
e1 = n0 2 , e2 = 2nB(l - B), e 3 = n(l - 0) 2
. For comparison, the large-sample approximation gives
By (12.5.1), the likelihood ratio statistic for testing the hypothesis (15.5.4) is SL~ P{xtii z 4.45} = 0.035.
D(y) = 2'.Eyi log (y1/eJ)· The agreement is not too bad in view of the small expected frequencies.
J:
304 15. Sufficient Statistics and Conditional Tests
15.6. Some Examples of Conditional Tests
Table 15.5.1. Evalu ation of the Exact Cond itiona 305
l Significance Level in a
Test of the Hard y-We inber g Law
tht; unkno wn param eter under H. The secon d
facto r does not depen d upon 8,
and is used for testing the hypothesis H.
Y1 Y2 Y3 D( y) c(y) x 10- 10 P(Y = ylT = 15) Often T can be thoug ht of as a measure of precis
ion, and there are good
0
1
15
13
5 9.57 0.0508 0.0126
reasons for condi tionin g on its obser ved value
indicates the amou nt of inform ation =
. For instan ce, T 2 Y + Y
availa ble for testing the
1 2
6 3.22 0.4445 0.1105
2 11 7
Hard y- Weinberg Law. If Tis close to 2n, then
0.60 1.2383 0.3078 almos t all individuals must
3 9 8 necessarily fall in the RR class, wheth er or not
0.04 1.41E9 0.3527 the Hard y- Wein berg Law
4 7 9 1.30 holds , and it will not be possible to obtai n evide
0.7095 0.1764 nce again st this hypo thesis. A
*5 5 10 4.45 0.1490 similar comm ent applies when T is dose to
0.0370 0. The prosp ect of obtaining
6 3 11 9.86 0.0113 evidence again st the hypothesis is much better
0.0028 when Tis close to n. Thus Tis
7 1 12 18.69 0.0002 a measure of the experiment's precision, and one
0.0001 can argue , as in Sectio n 15.4,
that inferences shoul d be made condi tiona l on
Total 4.0225 its obser ved value.
0.9999 Cond itioni ng on a set of sufficient statistics will
not always give satisfactory
results, because in so doing we may disca rd some
of the inform ation releva nt
to assessing the hypothesis. This inform ation loss
can be subst antia l in some
Note that only one of the Y;'s is "free to vary" in examples. As a general rule, it seell).s dange
Table 15.5.1 , the other two rous to use this condi tiona l
then being deter mined by the const raints y + proce dure unless 8 1s sufficient fore arid T is a
1 y 2 + y 3 = 20 and 2y 1 + y 2 = 15. one-t o-one funct ion of 8, as in
This is directly relate d to the single degre the Hard y-We inber g example. If/') is not suffic
e of freedom in the x2 ient, it is proba bly better to use
appro xima tion. the condi tional distri butio n of Y given 8, even
though this distri butio n will
It is possible to obtai n an algeb raic formula for not be completely indep enden t of e.
P(Y = y\T = t) in this case.
Since T= 2Y1 + Y2 repre sents the total numb Again, there are advan tages in takin g D to be
er of R-genes out of 2n genes the likelihood ratio statis tic
selected at rando m, the distri butio n of Tis binom for testing H. In large samples, D and /') are distri
ial (2n, 8). It follows that buted indep enden tly of one
anoth er. Significance levels comp uted from the 2
x appro xima tion (12.3 .2) can
P(T = t) = (2;) 8'(1 - 8) 2" - r fort= 0, 1, ... , 2n
therefore be regarded as either condi tiona l (given
in very small samples, the condi tiona l distri butio
B) or uncon ditional. Except
n of D given tJ will be almost
and (15.2.3) gives the same as the uncon dition al distri butio n of D.
As we noted in Section 15.4,
the distri butio n of the likelihood ratio statis tic
is remar kably stable under
P(Y= y\T= t)= P(Y= y) = (
P(T= t) Y1Y2Y3
2n / (2n) .
t
n) different possible choices for the reference set.
usuall y doesn 't matte r much wheth er the signif
With likelih ood ratio tests it
icance level is comp uted
conditionally (given 8) or unconditionally.
In the exam ple we have
0.0000
D(Y1 , Y2)
29.63
Yi
10
Y2
8
g(y1,Y2)
0.1818
D(yi, Yi)
0.28
0
= CJC:)p'(l-p)··+.,-, l
2
17
16
0.0000
0.0002
20.92
15.21
11
12
7
6
0.1215
0.0616
1.13
2.55
4.60
3 15 0.0013 10.80 13 5 0.0233
where t = y 1 + y 2 . Thus T .= Y1 + Y2 is a sufficient statistic for p, and_the test *4 14 0.0065 7.32 14 4 0.0065 7.32
of H will be based on the condition al distributio n of Y1 and Y2 given the 5 13 0.0233 4.60 15 3 0.0013 10.80
12 0.0616 2.55 16 2 0.0002 15.21
observed value of T. 6
0.0000 20.92
The distributi on of Tis binomial (n 1 + n2 , p), and so 7 11 0.1215 1.13 17 l
IO 0.1818 0.28 18 0 0.0000 29.63
8
P(T=t;p )= ("1 ~"2)p'(l -p)"•+•2-r_ 9 9 0.2078 0.00
Total l.0002
P(f; a., {3) = U11J12 ... lab) i=f1I j=f1I (a.1 f31Y 11
. }
where y 1 + y 2 = 18. If p 1 = p 2 , then the 18 rats with tumors are a ra~do'll '
sample without replaceme nt from the 88 rats in the study, and g(y1 , Y2) is the
probabilit y that y 1 of the rats with tumors received the low dose and the other The r/s and c/s are sufficient statistics for the unknown parameter s.
y = 18 - y 1 received the high dose.
2
15. Sufficient Statistics and Conditional Tests

The r_i's have a multinomial distribution with class probabilities \alpha_1, \alpha_2, ..., \alpha_a, and the c_j's are multinomial with class probabilities \beta_1, \beta_2, ..., \beta_b. Under the independence hypothesis, the r_i's are distributed independently of the c_j's. Hence the probability function of the f_ij's given the sufficient statistics is

P(f | r, c) = \frac{\prod_i r_i! \; \prod_j c_j!}{n! \prod_i \prod_j f_{ij}!}.

The exact conditional significance level will be computed from this conditional distribution.

By (12.6.1) and (12.6.2), the likelihood ratio statistic for testing the independence hypothesis is

D = 2 \sum\sum f_{ij} \log (f_{ij} / e_{ij})

where e_ij = r_i c_j / n. Note that the estimated expected frequencies e_ij will be the same for all tables (f_ij) considered in a conditional test.

To carry out an exact test of the independence hypothesis, we list all tables (f_ij) having the same row and column totals as the observed table. The conditional probability and value of D are computed for each such table. The exact conditional significance level is then found by summing P(f | r, c) over all such tables for which D >= D_obs. Except in very small examples, a computer will be needed for the calculations.

Table 15.6.2. Conditional Exact Test for Independence in a 2 x 2 Table

  x    g(x)     D(x)
 35   0.0021   12.47
 36   0.0187    6.16
 37   0.0731    2.92
 38   0.1641    1.02
 39   0.2367    0.13
 40   0.2320    0.07
 41   0.1594    0.78
 42   0.0781    2.21
 43   0.0275    4.33
*44   0.0069    7.15
 45   0.0012   10.69
 46   0.0002   14.97
 47   0.0000   20.05
 48   0.0000   26.00
 49   0.0000   32.96
 50   0.0000   41.12
 51   0.0000   50.81
 52   0.0000   62.75
 53   0.0000   80.40
Total 1.0000
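The enumeration just described is easily scripted. The sketch below implements it for a general 2 x 2 table, using the hypergeometric form of P(f | r, c); applied to the table of Example 12.6.1 (rows 44, 9 and 9, 9) it should reproduce D_obs = 7.15 and a significance level of about 0.0104, up to rounding.

```python
from math import comb, log

def exact_2x2_independence(table):
    """Exact conditional test of independence for a 2x2 table.

    Enumerates all tables with the observed margins; given the margins,
    the (1,1) cell x has a hypergeometric distribution, and the exact
    significance level sums its probabilities over tables with D >= D_obs."""
    (a, b), (c, d) = table
    r1, r2 = a + b, c + d                 # row totals
    c1, c2 = a + c, b + d                 # column totals
    n = r1 + r2

    def D(x):
        # likelihood ratio statistic for the table whose (1,1) cell is x
        cells = [x, r1 - x, c1 - x, r2 - (c1 - x)]
        exp = [r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n]
        return 2 * sum(f * log(f / e) for f, e in zip(cells, exp) if f > 0)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    g = {x: comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)
         for x in range(lo, hi + 1)}
    d_obs = D(a)
    sl = sum(p for x, p in g.items() if D(x) >= d_obs - 1e-9)
    return d_obs, sl

d_obs, sl = exact_2x2_independence([[44, 9], [9, 9]])
```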
EXAMPLE 15.6.2. In Example 12.6.1 we carried out an approximate test for independence in the following 2 x 2 table:

44 (39.56)    9 (13.44)    53
 9 (13.44)    9 (4.56)     18
53            18           71

Expected frequencies under the independence hypothesis are shown in parentheses.

For an exact test, we need to list all tables having the same row and column totals as the observed table. The general form of such tables is

x         53 - x    53
53 - x    x - 35    18
53        18        71

where x = 35, 36, ..., 53. Only one of the frequencies is "free to vary", corresponding to the single degree of freedom for the approximate \chi^2 test. The conditional p.f. of such a table is

g(x) = \frac{53!\,18!\,53!\,18!}{71!\; x!\,(53-x)!\,(53-x)!\,(x-35)!},

which simplifies to a hypergeometric distribution:

g(x) = \binom{53}{x}\binom{18}{53-x} \Big/ \binom{71}{53}   for x = 35, 36, ..., 53.

The likelihood ratio statistic is

D(x) = 2\left[ x \log\frac{x}{39.56} + \cdots + (x-35)\log\frac{x-35}{4.56} \right]

with observed value D_obs = D(44) = 7.15. From Table 15.6.2 we see that D(x) >= D_obs for x = 35 and for x >= 44. Hence the exact significance level is

SL = g(35) + g(44) + g(45) + ... + g(53) = 0.0104,

and the observed table gives strong evidence against the hypothesis of independence.

In this example the row and column totals are modelled as random variables, but we condition on their observed values in the exact test for independence. The independence test would be the same if some or all of the marginal totals had been fixed prior to the experiment. See the note following Example 12.6.1.

EXAMPLE 15.6.3. Is the following 2 x 3 contingency table consistent with the hypothesis that the row and column classifications are independent?

                                     Total
1 (1.8)    1 (3.0)    7 (4.2)     9
2 (1.2)    4 (2.0)    0 (2.8)     6
Total   3         5          7   15
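The whole calculation for this example can be scripted. The sketch below enumerates the 21 admissible tables (the general form and expected frequencies are taken from the solution that follows) and computes the exact significance level.

```python
from math import factorial, log

def exact_2x3_example():
    """Exact conditional test for the 2x3 table of Example 15.6.3.

    Free frequencies x, y index the tables; the cell counts are
    (x, y, 9-x-y) and (3-x, 5-y, x+y-2), with margins 9, 6 and 3, 5, 7."""
    const = (factorial(9) * factorial(6) * factorial(3) * factorial(5)
             * factorial(7)) / factorial(15)        # approximately 725.035

    def cells(x, y):
        return [x, y, 9 - x - y, 3 - x, 5 - y, x + y - 2]

    exp = [1.8, 3.0, 4.2, 1.2, 2.0, 2.8]            # row total * col total / 15

    def g(x, y):
        denom = 1
        for f in cells(x, y):
            denom *= factorial(f)
        return const / denom

    def D(x, y):
        return 2 * sum(f * log(f / e)
                       for f, e in zip(cells(x, y), exp) if f > 0)

    pairs = [(x, y) for x in range(4) for y in range(6) if x + y >= 2]
    d_obs = D(1, 1)
    sl = sum(g(x, y) for x, y in pairs if D(x, y) >= d_obs - 1e-9)
    return d_obs, sl

d_obs, sl = exact_2x3_example()
```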
SOLUTION. The expected frequencies under the hypothesis of independence are shown above in parentheses. Since these are small, it is advisable to carry out an exact test for independence. For this we require a list of all 2 x 3 tables having the same marginal totals as the observed table. The general form of these tables is

x        y        9 - x - y     9
3 - x    5 - y    x + y - 2     6
3        5        7            15

Just two of the frequencies are "free to vary", corresponding to the two degrees of freedom for the \chi^2 approximation.

There are 24 pairs (x, y) with 0 <= x <= 3 and 0 <= y <= 5, but three of these have x + y < 2 and would give a negative entry in the table. Thus there are only 21 allowable pairs (x, y) (see Table 15.6.3). The conditional probability function is

g(x, y) = \frac{9!\,6!\,3!\,5!\,7!}{15!} \cdot \frac{1}{x!\,y!\,(9-x-y)!\,(3-x)!\,(5-y)!\,(x+y-2)!}
        = \frac{725.035}{x!\,y!\,(9-x-y)!\,(3-x)!\,(5-y)!\,(x+y-2)!}

and the likelihood ratio statistic is

D(x, y) = 2\left[ x \log\frac{x}{1.8} + y \log\frac{y}{3.0} + \cdots + (x+y-2)\log\frac{x+y-2}{2.8} \right].

From Table 15.6.3 we see that D_obs = D(1, 1) = 11.37. There are 5 tables for which D >= 11.37, and the exact significance level is the sum of their conditional probabilities:

SL = g(0, 2) + g(1, 1) + g(2, 0) + g(3, 0) + g(3, 5) = 0.0084.

The observed table gives strong evidence against the hypothesis of independence.

Table 15.6.3. Conditional Exact Test for Independence in a 2 x 3 Table

 x  y   g(x, y)  D(x, y)
 0  2   0.0020   13.46
 0  3   0.0140    7.72
 0  4   0.0210    6.81
 0  5   0.0070   10.63
*1  1   0.0030   11.37
 1  2   0.0420    3.90
 1  3   0.1259    1.27
 1  4   0.1049    1.81
 1  5   0.0210    6.81
 2  0   0.0006   16.37
 2  1   0.0210    5.63
 2  2   0.1259    1.27
 2  3   0.2098    0.08
 2  4   0.1049    1.81
 2  5   0.0126    8.00
 3  0   0.0014   14.45
 3  1   0.0210    6.81
 3  2   0.0699    3.90
 3  3   0.0699    3.90
 3  4   0.0210    6.81
 3  5   0.0014   14.45
Total   1.0002

PROBLEMS FOR SECTION 15.6

1. In a pilot study, a new deodorant was found to be effective for 2 of 10 men tested and for 4 of 5 women tested. Carry out an exact conditional test of the hypothesis that the deodorant is equally effective for men and women.

2.† Two manufacturing processes produce defective items with probabilities p1 and p2, respectively. It was decided to examine four items from the first process and sixteen items from the second. In each case, two defectives were found. Perform an exact conditional test of the hypothesis p1 = p2.

3.† Two manufacturing processes produce defective items with probabilities p1 and p2, respectively. Items were examined from the first process until the rth defective had been obtained, by which time there had been x1 good items. The second process gave x2 good items before the rth defective.
(a) Write down the joint probability function of X1 and X2. Show that, if p1 = p2 = p, then T = X1 + X2 is a sufficient statistic for p.
(b) For each process, items were examined until r = 2 defectives had been found. Process 1 gave 2 good items, and process 2 gave 14 good items. Carry out an exact conditional test of the hypothesis p1 = p2, and compare the significance level with that obtained in Problem 2.

4. Twelve pea plants were observed, and there were four of each of the genotypes RR, RW, and WW. Use a conditional test to determine whether these results are consistent with the Hardy-Weinberg law (Section 15.5).

5. A study of the effect of Interferon on the severity of chicken pox was carried out with 44 childhood cancer victims who had developed chicken pox. Doctors gave Interferon to 23 children, and the other 21 received an inactive placebo. The disease was fatal or life-threatening in 2 of those who received Interferon, and in 6 of those who did not. Test the hypothesis that disease severity is independent of the treatment.

6.† An investigator wishes to learn whether the tendency to crime is influenced by genetic factors. He argues that, if there is no genetic effect, the incidence of criminality among identical twins should be the same as that among fraternal twins. Accordingly, he examines the case histories of 30 criminals with twin brothers, of whom 13 are identical and 17 are fraternal. He finds that 12 of the twin brothers have also been convicted of crime, but only two of these are fraternal twins. Perform an exact conditional test of the hypothesis of no genetic effect.

7. (a) Suppose that X and Y are independent and have Poisson distributions with means \mu and \nu, respectively. Derive the appropriate conditional distribution for a test of H: \mu = k\nu, where k is a given constant.
(b) There were 13 accidents in a large manufacturing plant during the two weeks prior to the introduction of a new safety program. There were only 3 accidents in the week following its introduction. Test the hypothesis that the accident rate has not changed.

8. A likelihood ratio test for the hypothesis of marginal homogeneity in a 2 x 2 table was described in Section 12.8.
(a) Show that the significance level in an exact conditional test of this hypothesis will be computed from the binomial distribution.

(a) Find a sufficient statistic for p.
(b) If the genes are not linked, they lie on different chromosomes, and p = 1/2. Evidence against the hypothesis p = 1/2 is thus evidence that the genes are linked. Describe an exact test for this hypothesis.
(c) Describe exact and approximate tests of the model when p is unknown.

13. A lethal drug is administered to n rats at each of k doses d1, d2, ..., dk. Let the numbers of deaths be Y1, Y2, ..., Yk. According to the logistic model (Section 10.5), the probability of death at dose d is
CHAPTER 16*

Topics in Statistical Inference

In Chapters 9-15 we have used likelihood methods, confidence intervals, and significance tests in making inferences about an unknown parameter \theta. In Sections 1 and 2 below, we consider two additional methods for making inferences about an unknown parameter. With both the fiducial argument and Bayesian methods, information concerning \theta is summarized in a probability distribution defined on the parameter space. For Bayesian methods one requires prior information about \theta which is also in the form of a probability distribution. For the fiducial argument, \theta must be completely unknown before the experiment.

In Section 3, we consider the problem of predicting a value of a random variable Y whose probability distribution depends upon an unknown parameter \theta. When a Bayesian or fiducial distribution for \theta is available, one can obtain a predictive distribution for Y which does not depend upon \theta. Section 4 considers the use of predictive distributions in statistical inference, with particular reference to the Behrens-Fisher problem. Finally, in Section 5 we illustrate how a test of a true hypothesis can be used to obtain intervals of reasonable values for a future observation or an unknown parameter.

*This chapter may be omitted on first reading.

16.1. The Fiducial Argument

Suppose that we have obtained data from an experiment whose probability model involves a real-valued parameter \theta which is completely unknown. We shall see that, under certain conditions, it is possible to deduce the probability that \theta <= k for any specified parameter value k. The procedure for obtaining this probability is called the fiducial argument, and the probability is called a fiducial probability to indicate the method by which it was obtained.

Probability Distributions of Constants

In the fiducial argument, the probability distribution of a variate U is regarded as a summary of all the available information about U. This distribution continues to hold until such time as additional information about U becomes available. If U has a certain distribution before an experiment is performed, and if the experiment provides no information about U, then U has the same distribution after the experiment as before.

For example, consider a lottery in which there are N tickets numbered 1, 2, ..., N, one of which is selected at random. Let U denote the number on the winning ticket. Then

P(U = u) = \frac{1}{N}   for u = 1, 2, ..., N.   (16.1.1)

Now suppose that the winning ticket has been chosen, but that the number U has not been announced. A value of U has now been determined, but we have no more information concerning what that value is than we had before the draw. A ticket-holder would presumably feel that he had the same chance of winning as he had before the draw was made. The fiducial argument is based on the assertion that (16.1.1) summarizes the uncertainty about U even after the draw has been made, provided that no information concerning the outcome of the draw is available. After the draw, U is no longer subject to random variation, but is fixed at some unknown value. Now (16.1.1) summarizes all the available information concerning the unknown constant U, and may be called its fiducial distribution.

The fiducial argument does not involve any new "definition" of probability. Instead, it enlarges the domain of application of the usual (long-run relative frequency) notion of probability. Of course, one could take the position (as some people have) that (16.1.1) applies only before the draw, and that, after the draw, no probability statements whatsoever can be made. This position seems unnecessarily restrictive, and if adopted, would rule out many important applications of probability.

Before proceeding with the general discussion, we illustrate the fiducial argument in two examples.

EXAMPLE 16.1.1. A deck of N cards numbered 1, 2, ..., N is shuffled and one card is drawn. Let U denote the number on this card. Then U has probability distribution (16.1.1). To this number is added a real number \theta which is completely unknown to us. We are not told the value of U or the value of \theta, but only the value of their total T = \theta + U. What can be said about \theta in the light of an observed total t?
The observed total t could have arisen in N different ways:

(u = 1, \theta = t-1), (u = 2, \theta = t-2), ..., (u = N, \theta = t-N).

Given t, there is a one-to-one correspondence between values of U and possible values of \theta. If we knew the value of \theta, we could determine which value of U had been obtained. If we knew that \theta was an even integer, then we could deduce whether an odd or even value of U had been obtained. However, if we know nothing about \theta, then the experiment will tell us nothing about U; the state of uncertainty concerning the value of U will be the same after the experiment as before. Hence we assume that (16.1.1) also holds when t is known. But, given t, \theta has N possible values t-1, t-2, ..., t-N in one-to-one correspondence with the possible values of U, and we may write

P(\theta = t - u) = P(U = u) = \frac{1}{N},   u = 1, 2, ..., N.

This probability distribution over the possible values of \theta is called the fiducial distribution of \theta.

For instance, suppose that N = 13, and that the observed total is t = 20. Then \theta has 13 possible values 19, 18, 17, ..., 7, each with probability 1/13. The probability of any subset of \theta values is now obtained by addition. For example,

P(\theta <= 11) = P(\theta = 11) + P(\theta = 10) + ... + P(\theta = 7) = 5/13.

Alternately, we may note that if \theta <= 11, then the observed total 20 must have resulted from a value of U greater than or equal to 9, and hence

P(\theta <= 11) = P(U >= 9) = 5/13.

Note that probability statements obtained from (16.1.2) are the same as would be obtained if \theta were a random variable having a normal distribution with mean t and variance 1. We say that, given T = t, the fiducial distribution of \theta is N(t, 1). This does not mean that \theta is a random variable, but rather that we know precisely as much about \theta as we would about an observation to be taken at random from N(t, 1).

From (16.1.2), the cumulative distribution function of the fiducial distribution of \theta is F(\theta - t), where F is the c.d.f. of N(0, 1). Differentiation with respect to \theta gives

\frac{\partial}{\partial\theta} F(\theta - t) = f(\theta - t) \frac{\partial(\theta - t)}{\partial\theta} = f(\theta - t)

where f is the p.d.f. of N(0, 1). Hence the fiducial p.d.f. of \theta is

f(\theta; t) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2}(\theta - t)^2 \right\}   for -\infty < \theta < \infty.

This is the p.d.f. of a normal distribution with mean t and variance 1. As a result of the fiducial argument, \theta and T have switched roles, with the observed t now appearing as a "parameter" in the fiducial distribution of \theta.

Sufficient Conditions for the Fiducial Argument

In the preceding two examples, we made use of a quantity U which was a function of both the data and the parameter \theta.
of \theta and possible values of U by C3(b). Since \theta is completely unknown, observing t will give us no information about which value of U was actually obtained. Hence we assume that the distribution of U is the same after t has been observed as it was before observing t. Given t, we can convert statements about \theta into statements about U and hence obtain their (fiducial) probabilities.

The above conditions are quite restrictive. In particular, C3(a) and (b) imply a one-to-one correspondence between values of T given \theta, and values of \theta given T, which will very rarely exist if T is discrete. Example 16.1.1 is exceptional in that, when t is known, there are only finitely many possible values for \theta.

If the sufficient statistic T is continuous, one can usually take U = F(T; \theta), where F is the cumulative distribution function of T. From Section 6.3, U has a uniform distribution between 0 and 1 for each value of \theta, and hence is a pivotal quantity. Since F(t; \theta) = P(T <= t) is an increasing function of t, C3(a) will also be satisfied, and only C3(b) needs to be checked. If C3(b) holds, then P(\theta <= k) will be equal to either F(t; k) or 1 - F(t; k), depending upon whether F(t; \theta) is an increasing or decreasing function of \theta, and the fiducial p.d.f. of \theta is given by

f(\theta; t) = \left| \frac{\partial}{\partial\theta} F(t; \theta) \right|.

EXAMPLE 16.1.3. Suppose that the MLE \hat\alpha is a sufficient statistic for the unknown parameter \alpha, and that \hat\alpha ~ N(\alpha, c) where c is a known constant. Then the standardized variable

Z = (\hat\alpha - \alpha)/\sqrt{c}

is pivotal and is distributed as N(0, 1). It satisfies conditions 3(a) and 3(b). To obtain the fiducial distribution of \alpha, we assume that Z is still distributed as N(0, 1) when the variate \hat\alpha is replaced by its observed value. Then we have

\alpha = \hat\alpha - Z\sqrt{c}

where \hat\alpha and c are known constants, and (6.6.6) gives

\alpha ~ N(\hat\alpha, c).

Given \hat\alpha, the fiducial distribution of \alpha is normal with mean \hat\alpha and variance c.

EXAMPLE 16.1.4. Let X1, X2, ..., Xn be independent variates having an exponential distribution with unknown mean \theta. Then T = \Sigma X_i is sufficient for \theta, and

U = 2T/\theta ~ \chi^2_{(2n)}

is a pivotal quantity satisfying conditions 3(a) and 3(b). To obtain the fiducial distribution of \theta, we replace T by its observed value t and assume that U is still distributed as \chi^2_{(2n)}. Statements about \theta can now be converted into statements about U, and their probabilities can be obtained from tables of the \chi^2 distribution.

The fiducial p.d.f. of \theta can be obtained from the p.d.f. of U by standard change of variables methods. By (6.9.1), the p.d.f. of U is

f(u) = k u^{n-1} e^{-u/2}   for u > 0

where k = 1/[2^n \Gamma(n)]. The fiducial p.d.f. of \theta is thus

g(\theta; t) = f(u) \left| \frac{du}{d\theta} \right| = k \left( \frac{2t}{\theta} \right)^{n-1} e^{-t/\theta} \cdot \frac{2t}{\theta^2} = \frac{1}{\theta\,\Gamma(n)} \left( \frac{t}{\theta} \right)^n e^{-t/\theta}   for \theta > 0.

EXAMPLE 16.1.5. Consider the situation described in Example 16.1.1, but now suppose that n cards are drawn at random with replacement from the deck. The same unknown \theta is added to the number on each card, and we are told the n totals x1, x2, ..., xn. We wish to make inferences about \theta on the basis of the data.

Each Xi can take N equally probable values \theta+1, \theta+2, ..., \theta+N, so that the probability function of Xi is

f(x) = P(X_i = x) = N^{-1}   for x = \theta+1, \theta+2, ..., \theta+N.

Under random sampling with replacement, the Xi's are independent, and hence their joint probability function is

f(x_1) f(x_2) \cdots f(x_n) = N^{-n}   for \theta+1 <= x_1, x_2, ..., x_n <= \theta+N.

The likelihood function of \theta is thus constant over the range of possible parameter values. We must have \theta + 1 <= x(1) and \theta + N >= x(n), where x(1) and x(n) are the smallest and largest sample values, so that

L(\theta) = 1   for x(n) - N <= \theta <= x(1) - 1.

It follows that x(1) and x(n) are jointly minimally sufficient for \theta. The number of possible parameter values is

x(1) - 1 - [x(n) - N - 1] = N - A

where A = x(n) - x(1) is the sample range. The larger the value of A obtained, the more precisely we may determine the value of \theta. If we observe A = 0, there are N equally likely values for \theta, but if A = N - 1, the value of \theta can be determined exactly without error. Thus A is a measure of the experiment's informativeness, and is in fact an ancillary statistic. To see this, we write Xi = \theta + Ui, where Ui is the number on the ith card drawn (i = 1, 2, ..., n). Then X(1) = \theta + U(1) and X(n) = \theta + U(n), so that

A = X(n) - X(1) = U(n) - U(1).
The distribution of A thus depends only on the range of numbers which appear on the n cards drawn, and does not depend upon \theta.

We now define a statistic T such that the transformation from X(1), X(n) to T, A is one-to-one; for instance, we could take T = X(1). Then T, A are jointly sufficient for \theta and A is ancillary. Inferences about \theta will be based on the conditional distribution of T given the observed value of A. To obtain this distribution, we could first derive the joint probability function of X(1) and X(n) as in Problem 7.2.11, change variables, sum out T to get the probability function of A, and divide to get the required conditional probability function,

f(t | a; \theta) = \frac{1}{N - a}   for t = \theta+1, \theta+2, ..., \theta+N-a.

Given that A = a, the n totals must fall in a range of length a which lies entirely between \theta+1 and \theta+N. There are N - a such ranges, with lower limits \theta+1, \theta+2, ..., \theta+N-a, and these will be equally probable.

Now define U = T - \theta. The conditional distribution of U given that A = a is uniform,

P(U = u | a) = \frac{1}{N - a}   for u = 1, 2, ..., N-a,   (16.1.3)

and does not depend upon \theta. Given A and \theta, there is a one-to-one correspondence between possible values of U and T. Given A and T, there is a one-to-one correspondence between possible values of U and \theta. Thus, when A is given, the sufficient conditions for the fiducial argument are satisfied. The fiducial distribution of \theta is obtained by assuming that (16.1.3) continues to hold when T is replaced by its observed value t, and this gives

P(\theta = k) = \frac{1}{N - a}   for k = t-1, t-2, ..., t-N+a.   (16.1.4)

For example, suppose that N = 13, and that we observe the n = 4 totals 17, 11, 14, 23. Then t = x(1) = 11, x(n) = 23, and a = 23 - 11 = 12. Now (16.1.4) implies that \theta = 10 with probability 1. In this case the experiment is very informative and completely determines the value of \theta. If we were less fortunate, we might observe totals such as 13, 17, 19, 13. Then t = 13 and a = 6, so that now (16.1.4) gives

P(\theta = k) = 1/7   for k = 12, 11, 10, ..., 6.

There are now seven equally probable values of \theta. In the worst possible case, we observe equal totals, 18, 18, 18, 18. Then t = 18, a = 0, and (16.1.4) gives

P(\theta = k) = 1/13   for k = 17, 16, 15, ..., 5

so that there are 13 equally probable values of \theta.

Two-Parameter Fiducial Distributions

Sometimes a double application of the one-parameter fiducial argument can be used to obtain a two-parameter fiducial distribution. However, there are examples where this can be done in two or more different ways, leading to different two-parameter distributions. There are serious difficulties in extending the fiducial argument beyond the one-parameter case, and the precise conditions under which this can be done are not known.

16.2. Bayesian Methods

In all of the procedures discussed so far, only the information provided by the experimental data is formally taken into account. However, in some situations we may wish to incorporate information about \theta from other sources as well. If this additional information is in the form of a probability distribution for \theta, it can be combined with the data using Bayes's Theorem (3.6.1).

Suppose that the probability model for the experiment depends on a parameter \theta, and that an event E with probability P(E; \theta) is observed to occur. In addition, suppose that \theta is itself a random variable with a known probability distribution, called the prior distribution of \theta, with probability or probability density function g, say. The conditional distribution of \theta given that E has occurred is called the posterior distribution of \theta. The posterior distribution has probability or probability density function given by

f(\theta | E) = P(E; \theta) g(\theta) / P(E)   (16.2.1)

where P(E) is a normalizing constant:

P(E) = \begin{cases} \sum_{\theta \in \Omega} P(E; \theta) g(\theta) & \text{if } \theta \text{ is discrete;} \\ \int_{-\infty}^{\infty} P(E; \theta) g(\theta)\,d\theta & \text{if } \theta \text{ is continuous.} \end{cases}   (16.2.2)

The posterior distribution combines the information about \theta provided by the experimental data with the information contained in the prior distribution.

The likelihood function of \theta based on the observed event E is given by

L(\theta; E) = k P(E; \theta)

where k does not depend upon \theta. Hence we may write

f(\theta | E) = c L(\theta; E) g(\theta)   (16.2.3)

where c is a constant with respect to \theta, and is chosen so that the total probability in the posterior distribution is 1:

\frac{1}{c} = \begin{cases} \sum_{\theta \in \Omega} L(\theta; E) g(\theta) & \text{if } \theta \text{ is discrete;} \\ \int_{-\infty}^{\infty} L(\theta; E) g(\theta)\,d\theta & \text{if } \theta \text{ is continuous.} \end{cases}   (16.2.4)
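For a discrete parameter space, (16.2.3) and (16.2.4) translate directly into a few lines of code. The sketch below uses illustrative numbers (a binomial likelihood with x = 3 successes in n = 10 trials, and a three-point prior); none of these values come from the text.

```python
from math import comb

def posterior(likelihood, prior):
    """Posterior over a discrete parameter space, by (16.2.3):
    f(theta | E) is proportional to L(theta; E) * g(theta)."""
    weights = {th: likelihood(th) * g for th, g in prior.items()}
    total = sum(weights.values())        # the sum 1/c of (16.2.4), discrete case
    return {th: w / total for th, w in weights.items()}

# illustrative data: x = 3 successes in n = 10 Bernoulli trials,
# prior mass 1/3 on each of three candidate values of theta
n, x = 10, 3
post = posterior(lambda th: comb(n, x) * th**x * (1 - th)**(n - x),
                 {0.2: 1 / 3, 0.5: 1 / 3, 0.8: 1 / 3})
```

Since 3 successes in 10 trials favor small values of \theta, the posterior shifts mass toward 0.2 and away from 0.8.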
The posterior p.f. or p.d.f. is thus proportional to the product of the likelihood function and the prior p.f. or p.d.f. of \theta.

EXAMPLE 16.2.1. Consider the inheritance of hemophilia as discussed previously in Example 3.6.3. Suppose that a woman has n sons, of whom x are hemophilic and n - x are normal. The probability of this event is

P(x; \theta) = \binom{n}{x} \theta^x (1 - \theta)^{n-x}   (16.2.5)

where \theta is the probability that a particular son will be hemophilic. The problem is to make inferences about \theta.

Given no additional information about \theta, inferences would be based on (16.2.5). One could graph the relative likelihood function of \theta, or compute confidence intervals. However, it may be possible to extract some information about \theta by examining the woman's family tree. For instance, suppose that the woman had normal parents, but she had a brother who was hemophilic. Then her mother must have been a carrier, and she therefore had a 50% chance of inheriting the gene for hemophilia. If she did inherit the gene, then there is a 50% chance that a particular son will inherit the disease (\theta = 1/2), and if she did not, all of her sons will be normal (\theta = 0). (The possibility of a mutation is ignored in order to simplify the example.) The prior probability distribution of \theta is thus given by

g(0) = P(\theta = 0) = 1/2;   g(1/2) = P(\theta = 1/2) = 1/2.

With this additional information, it is now possible to base the analysis on Bayes's Theorem. By (16.2.3), the posterior probability function of \theta is given by

f(\theta | x) = c\,\theta^x (1 - \theta)^{n-x} \cdot \tfrac{1}{2}   for \theta = 0, 1/2.

If x > 0, then \theta = 0 and \theta = 1/2 have posterior probabilities 0 and 1, respectively. If x = 0, the posterior probabilities are

P(\theta = 0 | X = 0) = c/2;   P(\theta = 1/2 | X = 0) = c/2^{n+1}.

Since the sum of these must be 1, we find that c = 2^{n+1}/(2^n + 1), and hence that

P(\theta = 0 | X = 0) = \frac{2^n}{2^n + 1};   P(\theta = 1/2 | X = 0) = \frac{1}{2^n + 1}.

If the woman has at least one hemophilic son (x > 0), she must be a carrier. If she has only normal sons (x = 0), the probability that she is a carrier decreases as n increases.

EXAMPLE 16.2.2. Suppose that components are received from a manufacturer in large batches, and let \theta denote the proportion of defectives in a batch. A random sample of n items is chosen from the batch, and is found to contain x defectives. If n is small in comparison with the batch size, the probability of x defectives in the sample is

P(x; \theta) = \binom{n}{x} \theta^x (1 - \theta)^{n-x}.   (16.2.6)

Given no additional information, inferences about \theta would be based on (16.2.6).

It may be that similar batches are received at regular intervals from the same manufacturer. The value of \theta will vary somewhat from batch to batch. If the manufacturing process is reasonably stable, one might expect the variation in \theta to be random, and introduce the assumption that \theta is a random variable with probability density function g, say. Data from past samples would be used to help determine the form of the prior density function g.

An assumption which makes the mathematics easy is that \theta has a beta distribution with parameters p and q,

g(\theta) = k\,\theta^{p-1} (1 - \theta)^{q-1}   for 0 < \theta < 1,   (16.2.7)

where k = \Gamma(p+q)/[\Gamma(p)\Gamma(q)]. Then, by (16.2.3), the posterior p.d.f. of \theta given x is

f(\theta | x) = c(x)\,\theta^{x+p-1} (1 - \theta)^{n-x+q-1}   for 0 < \theta < 1,

which is also a beta distribution with parameters x + p and n - x + q. Probabilities can be computed by numerical integration, or from tables of the F-distribution (see Problem 6.10.12).

Of course, it would be unwise to assume (16.2.7) merely because it leads to simple mathematics. Data from past samples should be used to check the adequacy of (16.2.7), and to estimate the parameters p and q. As additional data accumulate, further checks of the model can be made, and more precise estimates of p and q can be obtained. Procedures such as this, in which data are used to give information about both the current value of \theta and the prior distribution of \theta, are called empirical Bayes methods.

In the two preceding examples, it was natural to regard the value of \theta as having been generated by a repeatable experiment. Prior probabilities for \theta-values then correspond to the relative frequencies with which the various \theta-values would be expected to arise in many repetitions of the experiment. It is possible, conceptually at least, to verify the prior distribution empirically by actually repeating the experiment to obtain a sample \theta_1, \theta_2, ..., \theta_n of \theta-values. These values could be compared with the assumed prior distribution. However, the analysis would usually be complicated by the fact that only estimates \hat\theta_1, \hat\theta_2, ..., \hat\theta_n were available.

Applications such as these, in which the prior distribution is the probability model of a physical process which generates the value of \theta, are not controversial. However, Bayesian methods are sometimes advocated in situations where \theta is thought of as a constant. The prior distribution may be
an objective summary of the prior state of knowledge concerning \theta, or it may be a statement of an individual's subjective beliefs about \theta. There are differences of opinion among statisticians concerning the appropriateness of Bayesian methods in such situations.

Fiducial Prior Distributions

It may be that the conditions for the fiducial argument were satisfied in some previous experiment involving the same parameter \theta. The fiducial distribution of \theta from the previous experiment might then be used as the prior distribution of \theta in the current experiment.

EXAMPLE 16.2.3. Suppose that, in a previous experiment, N components with exponentially distributed lifetimes were tested until failure. From Example 16.1.4, the fiducial distribution of the mean lifetime \theta has p.d.f.

g(\theta) = \frac{1}{\theta\,\Gamma(N)} \left( \frac{t}{\theta} \right)^N e^{-t/\theta}   for \theta > 0,

where t is the total of the observed lifetimes. In the current experiment, n additional components are tested simultaneously, and testing stops after a predetermined time period T. From Section 9.5, the likelihood function of \theta based on the current experiment is

L(\theta) = \theta^{-m} e^{-s/\theta}   for \theta > 0,

where m is the number of components which were observed to fail, and s is the total elapsed lifetime of all n components (including those whose failure times were censored). By (16.2.3), the p.d.f. of the posterior distribution of \theta is

f(\theta) = c\,\theta^{-(m+N+1)} e^{-(s+t)/\theta}   for \theta > 0.

It can now be shown by change of variables that 2(s + t)/\theta has a \chi^2 distribution with 2(m + N) degrees of freedom. Hence tables of the \chi^2 distribution may be used to obtain the posterior probabilities of statements about \theta.

Note that it would not be possible to derive a fiducial distribution for \theta on the basis of the current experiment, or on the basis of the previous and current experiments combined. In each case the minimally sufficient statistic is two-dimensional, and there exists no ancillary statistic. If there were no censoring in the second experiment, the two experiments could be combined and a fiducial distribution obtained from the combined data as in Example 16.1.4. The same result would be obtained by taking the fiducial distribution of \theta from the previous experiment as the prior distribution in Bayes's Theorem. However, the latter procedure seems inappropriate because it violates the symmetry between the two experiments, and it may lead to unacceptable results in more complicated situations. For further discussion, see D.A. Sprott, "Necessary restrictions for distributions a posteriori", Journal of the Royal Statistical Society, B, 22 (1960), pages 312-318.

Prior Distributions which Represent Ignorance

Various attempts have been made to formulate prior probability distributions which represent a state of total ignorance about the parameter (see H. Jeffreys, Theory of Probability, 3rd edition, Oxford: Clarendon Press, 1961). These are generally derived from arguments of mathematical symmetry and invariance.

Let us consider the simplest case, in which nothing is known about a parameter \theta except that it must take one of a finite set of values {1, 2, ..., N}. It might be argued that, since there is no reason to prefer one of these values over another, they should be assigned equal probabilities (Laplace's Principle of Insufficient Reason). The statement that the N possible parameter values are equally probable is then supposed to represent a complete lack of knowledge of \theta.

The above argument implicitly assumes that there exists some probability distribution which appropriately represents total ignorance. If this assumption is granted, then the assignment of equal probabilities seems inevitable. However, the assumption itself is questionable. It would seem more reasonable to represent prior ignorance by equally likely, rather than equally probable, parameter values. If the N parameter values are equally probable, then P(\theta \neq 1) = (N - 1)/N, and this would seem to be an informative statement. However, no such statement is possible if they are assumed to be equally likely, because likelihoods are not additive.

Now consider a parameter \theta which can take values in a real interval 0 < \theta < 1, say. Great difficulties arise in trying to formulate a probability distribution of \theta which represents total ignorance. If one assumes that the distribution of \theta is uniform, then one-to-one functions of \theta will generally not have uniform distributions, because of the Jacobian involved in continuous change of variables. If \theta is totally unknown, then presumably \theta^3 is also totally unknown, but it is impossible to have a uniform distribution on both of them. This problem does not arise if prior ignorance is represented by equally likely parameter values, because likelihoods are invariant under one-to-one parameter transformations.
could be combin ed to give a single experiment in which N + n co1?1pon For further discussion, see Chapter ' l of Statistic al Method s and Scientifi
en.ts c
were tested to failure. A fiducial distribu tion for 8 could then Qe denved Inference by R.A. Fisher (2nd edition, New York: Hafner, 1959).
as m
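The χ² result in Example 16.2.3 can be checked numerically. The sketch below simulates the posterior of θ by drawing χ² variates with 2(N + m) degrees of freedom and setting θ = 2(s + t)/χ²; the data values N = 10, t = 288, m = 4, s = 150 are made-up illustrative numbers, not from the text.

```python
import random

# Hypothetical data: N = 10 previous failures with total lifetime t = 288,
# m = 4 current failures with total elapsed lifetime s = 150.
N, t, m, s = 10, 288.0, 4, 150.0

random.seed(1)
df = 2 * (N + m)            # 2(s+t)/theta has a chi-square(2(N+m)) posterior
draws = []
for _ in range(200_000):
    chi2 = 2.0 * random.gammavariate(df / 2.0, 1.0)  # chi-square(df) variate
    draws.append(2.0 * (s + t) / chi2)               # corresponding theta

post_mean = sum(draws) / len(draws)
# The posterior is inverse-gamma(N+m, s+t), whose mean is (s+t)/(N+m-1);
# the simulated mean should agree closely.
exact_mean = (s + t) / (N + m - 1)
```

The agreement of the simulated and exact posterior means is one quick consistency check on the change-of-variables argument.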
16. Topics in Statistical Inference
16.3. Prediction

Suppose that we wish to predict the value of a random variable Y whose probability distribution depends upon a parameter θ. We assume that θ is unknown, but that a previous set of data gives some information about the value of θ. In predicting Y, we have two types of uncertainty to contend with: uncertainty due to random variation in Y, and uncertainty due to lack of knowledge of θ. We wish to make statements about Y which incorporate both types of uncertainty.

For example, suppose that the lifetimes of a certain type of rocket component are exponentially distributed with mean θ. We have tested n components, and have observed their lifetimes x₁, x₂, ..., xₙ. We wish to predict the lifetime of another component, or perhaps the lifetime of a system made up of several such components. Even if we knew θ, we could not make exact predictions, because lifetimes are subject to random variation; that is, components run under identical conditions will generally have different lifetimes. The problem is further complicated by the fact that we do not know the value of θ, but have only limited information obtained from the n components tested. Both the randomness of Y and the uncertainty about θ must be taken into account.

Predicting an (n + 1)st Observation from an Exponential Distribution

Suppose that n independent values are observed from an exponential distribution with unknown mean θ. We wish to predict the value of Y, an (n + 1)st observation to be taken from the same exponential distribution. The fiducial argument is applicable in this case. From Example 16.1.4, the fiducial p.d.f. of θ based on the observed sample is

    f(θ) = (1/(θ·Γ(n))) (t/θ)^n e^(−t/θ)   for θ > 0,

where t = Σxᵢ is the observed sample total. Given θ, the p.d.f. of y is

    g(y; θ) = (1/θ) e^(−y/θ)   for y > 0.

By (16.3.1), the p.d.f. of the predictive distribution of Y is

    p(y) = ∫₀^∞ g(y; θ) f(θ) dθ = n tⁿ / (t + y)^(n+1)   for y > 0.
Integrating with respect to y now gives

    P(Y ≤ y) = ∫₀^y p(v) dv = 1 − (t/(t + y))ⁿ   for y > 0,

and probabilities of statements about Y can easily be obtained. These probabilities take into account both the random variation of Y and the available information about θ.

In Example 9.4.1 we considered n = 10 observed lifetimes with total t = 288, and in this case

    P(Y ≤ y) = 1 − (288/(288 + y))¹⁰   for y > 0.

We use this to make predictive statements about the lifetime Y of another component of the same type. For instance, we obtain

    P(Y ≤ 5) = 0.158,   P(Y ≥ 75) = 0.099,

and so on. Also, we find that

    P(Y ≤ 1.48) = P(Y ≥ 100.6) = 0.05.

The interval 1.48 ≤ Y ≤ 100.6 is called a 90% predictive interval for Y. As one might expect, the interval is quite wide, indicating that we cannot predict the lifetime of a single component Y with much precision.

It is of some interest to compare the above results with what we could obtain if we knew the value of θ. If we assume that θ is equal to its maximum likelihood estimate, we have

    P(Y ≤ y | θ = 28.8) = 1 − e^(−y/28.8)   for y > 0.

From this we obtain

    P(Y ≤ 1.48) = P(Y ≥ 86.3) = 0.05.

The central 90% interval is 1.48 ≤ Y ≤ 86.3, which is not much narrower than the 90% predictive interval. This indicates that most of the uncertainty in predicting Y is due to the random variation of Y rather than to lack of information about the value of θ.

Suppose now that, given α, Y ~ N(α, c₁), and that the fiducial distribution of α is N(α̂, c₂). Then the p.d.f. of the predictive distribution of Y is

    p(y) = (1/(2π√(c₁c₂))) ∫₋∞^∞ exp{ −(y − α)²/(2c₁) − (α − α̂)²/(2c₂) } dα.

This integral may be evaluated by completing the square in the exponent to produce a normal integral. After a bit of algebra, we find that p(y) is the p.d.f. of a normal distribution with mean α̂ and variance c₁ + c₂. Hence the predictive distribution is

    Y ~ N(α̂, c₁ + c₂).

An easier way to obtain this result is to write Y = α + √c₁ Z₁ where Z₁ ~ N(0, 1), and α = α̂ + √c₂ Z₂ where Z₂ ~ N(0, 1), independently of Z₁. Combining these gives

    Y = α̂ + √c₁ Z₁ + √c₂ Z₂

where α̂, c₁ and c₂ are known constants. Now (6.6.6) and (6.6.7) give Y ~ N(α̂, c₁ + c₂) as before.

EXAMPLE 16.3.1. Suppose that we have already observed n independent measurements x₁, x₂, ..., xₙ from N(µ, σ²) with σ known, and that we wish to predict the average value Y of m future observations from the same distribution. From Example 16.1.3, the fiducial distribution of µ based on the xᵢ's is µ ~ N(x̄, σ²/n). The sampling distribution of Y is Y ~ N(µ, σ²/m). Hence, by the discussion above, the predictive distribution is

    Y ~ N(x̄, σ²/n + σ²/m).

This distribution combines uncertainty due to lack of knowledge of µ with uncertainty due to random variation in Y. If n → ∞, then x̄ ≈ µ. The uncertainty due to lack of knowledge of µ is then negligible, and the predictive distribution becomes the sampling distribution of Y. On the other hand, if m → ∞, then uncertainty due to random variation in Y becomes negligible, and the predictive distribution becomes the fiducial distribution of µ.

If σ is also unknown, we can integrate over its fiducial distribution as well, to obtain

    (Y − x̄) / √( s²(1/n + 1/m) ) ~ t₍ₙ₋₁₎

where s² = (1/(n − 1)) Σ(xᵢ − x̄)²  (see Section 16.4).
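The numerical claims of the exponential prediction example (with n = 10 and t = 288, as in the text) can be reproduced directly from the predictive c.d.f.:

```python
import math

# Numbers from the text: n = 10 observed lifetimes with total t = 288.
n, t = 10, 288.0

def predictive_cdf(y):
    """P(Y <= y) = 1 - (t/(t+y))^n for the (n+1)st exponential observation."""
    return 1.0 - (t / (t + y)) ** n

# Spot-check the probabilities quoted in the text.
p5 = predictive_cdf(5.0)            # about 0.158
p75 = 1.0 - predictive_cdf(75.0)    # about 0.099

# Endpoints of the central 90% predictive interval: solve F(y) = .05 and .95.
lower = t * (0.95 ** (-1.0 / n) - 1.0)    # about 1.48
upper = t * (0.05 ** (-1.0 / n) - 1.0)    # about 100.6
```

Inverting the c.d.f. in closed form is possible here because (t/(t + y))ⁿ is monotone in y.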
EXAMPLE 16.3.2. Suppose that the straight line model (13.5.1) has been fitted to n observed pairs (xᵢ, yᵢ), i = 1, 2, ..., n. We now wish to predict the value Y of the dependent variable when the independent variable has value x. For instance, in Example 13.5.1 we might wish to predict the systolic blood pressure Y of a particular woman aged x = 50 years.

If σ is known, the argument preceding the last example may be applied. The sampling distribution of Y is N(µ, σ²) where µ = α + βx. One can argue that µ̂ = α̂ + β̂x carries all of the relevant information about µ. From Section 13.6, we have µ̂ ~ N(µ, cσ²) where

    c = 1/n + (x − x̄)²/Sxx.

Hence, from Example 16.1.3, the fiducial distribution of µ is N(µ̂, cσ²). It now follows that the predictive distribution is

    Y ~ N(µ̂, (1 + c)σ²).

A central 99% predictive interval for Y is then

    Y ∈ µ̂ ± a √( s² [1 + 1/n + (x − x̄)²/Sxx] )

where P{|t₍ₙ₋₂₎| ≤ a} = 0.99.

For instance, in Example 13.5.1, the central 99% predictive interval for the blood pressure of an individual woman aged 50 years is

    Y ∈ 137.68 ± 23.18.

From Section 13.6, a 99% confidence interval for the mean blood pressure of all women aged 50 years is

    µ ∈ 137.68 ± 6.55.

The interval for Y is much wider than the interval for µ, because there is considerable variability in systolic blood pressure among women of the same age. Even if we knew µ exactly, we could not predict the value of Y very precisely.

16.4. Inferences from Predictive Distributions

Suppose that Y₁, Y₂, ..., Yₙ are independent N(µᵢ, σ²), and that the µᵢ's are linear functions of q unknown parameters α, β, γ, ..., where q < n. This is the normal linear model (see Sections 13.1 and 13.2).

It can be argued that, if σ is known, then α̂ carries all of the relevant information about α. The sampling distribution of α̂ is N(α, cσ²) where c is a constant. If σ is known, inferences about α are based on this distribution.

The sampling distribution of α̂ depends on σ, and so we cannot use it for inferences about α when σ is unknown. Instead we shall derive a predictive distribution for α̂ which does not depend on σ, and then use the predictive distribution for inferences about α.

Let V = Σeᵢ² denote the residual sum of squares for the linear model. Then V carries all of the relevant information about σ, and

    U = V/σ² ~ χ²₍ₙ₋q₎,

independently of α̂.

U is a pivotal quantity which satisfies the conditions for the fiducial argument. To obtain the fiducial distribution of σ, we replace V by its observed value v = (n − q)s². The predictive distribution of α̂ could then be obtained by integration, but we can avoid evaluating the integral by using (6.10.1). We have

    α̂ = α + Z√(cσ²)

where Z ~ N(0, 1), independently of U. Substituting for σ gives

    α̂ = α + Z √( c(n − q)s²/U ) = α + T√(s²c)

where T = Z ÷ √(U/(n − q)) ~ t₍ₙ₋q₎ by (6.10.1). Hence predictive statements for α̂ given s are obtained from

    (α̂ − α)/√(s²c) ~ t₍ₙ₋q₎.        (16.4.1)

This result appears to be identical to (13.2.7), but it is not. In (13.2.7), s² is a random variable such that (n − q)s²/σ² ~ χ²₍ₙ₋q₎, whereas in (16.4.1) s² is the particular observed variance estimate.

In this problem, s² plays the role of an ancillary statistic. Since its sampling distribution does not depend upon α, s² gives no direct information about the magnitude of α. However, its observed value indicates the informativeness or precision of the experiment with respect to α. By the arguments of Section 15.3, s² should be held fixed at its observed value in making inferences about α. Thus it would seem appropriate to use the predictive distribution (16.4.1) rather than the sampling distribution (13.2.7) in making inferences about α. In fact, one will obtain the same numerical values for significance levels and confidence intervals whether one uses (16.4.1) or (13.2.7), and so the distinction does not matter in this case. It does matter in more complicated cases, such as the Behrens-Fisher problem to be considered below.

Note that (16.4.1) defines a pivotal quantity T which satisfies the conditions set out for the fiducial argument in Section 16.1. Thus (16.4.1) can be used to obtain a fiducial distribution for α when σ is unknown.

Behrens-Fisher Problem

Suppose that we have n + m independent measurements made using two different techniques which may not be equally precise. The n measurements made with the first technique are modelled as N(µ₁, σ₁²), and the m measurements made with the second technique as N(µ₂, σ₂²). We wish to make inferences about µ₁ − µ₂.

If σ₁ = σ₂, we have just the two-sample model discussed in Section 13.4. A similar analysis to that in Section 13.4 is possible if σ₁ = kσ₂ where k is a known constant. However, if the ratio σ₁/σ₂ is unknown, the analysis becomes difficult and controversial. The problem of making inferences about µ₁ − µ₂ when σ₁/σ₂ is unknown is called the Behrens-Fisher problem.

The MLE of µ₁ − µ₂ is ȳ₁ − ȳ₂, the difference between the two sample means. Its sampling distribution is N(µ₁ − µ₂, σ₁²/n + σ₂²/m).

The distribution of a linear combination

    T = T₁ cos θ − T₂ sin θ,

where T₁ ~ t₍ᵥ₁₎ and T₂ ~ t₍ᵥ₂₎ are independent, is called the Behrens-Fisher distribution. It is tabulated in the Fisher and Yates Statistical Tables for Biological, Agricultural and Medical Research. In this case we have

    tan θ = sin θ / cos θ = (s₂√c₂)/(s₁√c₁),

so that θ is a function of the observed variance ratio s₂²/s₁².

When σ₁ and σ₂ are unknown, inferences about µ₁ − µ₂ may be based on the pivotal quantity

    T = [ (µ̂₁ − µ̂₂) − (µ₁ − µ₂) ] / √( c₁s₁² + c₂s₂² ),

which is referred to tables of the Behrens-Fisher distribution with parameters ν₁, ν₂, and θ, where ν₁ = n − 1, ν₂ = m − 1, and c₁ = 1/n, c₂ = 1/m.
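Because the Behrens-Fisher distribution is defined as a linear combination of independent t variates, it is easy to tabulate by simulation. The sketch below does so for illustrative parameter values (v1 = 9, v2 = 13 and the angle phi are made-up numbers, not taken from the text).

```python
import math, random

# Monte Carlo sketch of the Behrens-Fisher distribution
# T = T1*cos(phi) - T2*sin(phi), with T1 ~ t(v1), T2 ~ t(v2) independent.
random.seed(2)
v1, v2, phi = 9, 13, math.atan(0.8)   # illustrative values only

def t_variate(v):
    """Student t(v) variate: N(0,1) divided by sqrt(chi-square(v)/v)."""
    z = random.gauss(0.0, 1.0)
    chi2 = 2.0 * random.gammavariate(v / 2.0, 1.0)
    return z / math.sqrt(chi2 / v)

draws = sorted(t_variate(v1) * math.cos(phi) - t_variate(v2) * math.sin(phi)
               for _ in range(100_000))
mean = sum(draws) / len(draws)           # the distribution is symmetric about 0
q975 = draws[int(0.975 * len(draws))]    # upper 2.5% point of the distribution
```

In practice one would read the percentage point from the Fisher-Yates tables; the simulation simply shows where such tabulated values come from.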
16.5. Testing a True Hypothesis

Sometimes one can generate intervals of "reasonable" values for an unknown quantity γ by the device of testing a true hypothesis H. One assumes a value for γ, carries out a test of significance, and finds the significance level SL(γ). A small significance level indicates an inconsistency, and if H is known to be true, doubt is cast on the value assumed for γ. One can define a 95% interval or region as the set of all values of γ such that SL(γ) ≥ 0.05. Several examples will be given to illustrate this procedure.

EXAMPLE 16.5.1. Suppose that X̄ ~ N(µ₁, σ²/n) and Ȳ ~ N(µ₂, σ²/m) independently of X̄, where σ is known. Given observed values of X̄ and Ȳ, we can test H: µ₁ = µ₂ using the result that X̄ − Ȳ ~ N(0, σ²(1/n + 1/m)) under the hypothesis. The significance level will be 5% or more if and only if

    −1.96 ≤ (x̄ − ȳ) / √( σ²(1/n + 1/m) ) ≤ 1.96.        (16.5.1)

Now suppose that we don't know ȳ but we do know that µ₁ = µ₂. This would be the case if Ȳ were the average of m future observations to be taken from the same N(µ, σ²) distribution as the original sample x₁, x₂, ..., xₙ (see Example 16.3.1). Now (16.5.1) yields a 95% interval for ȳ:

    ȳ ∈ x̄ ± 1.96 √( σ²(1/n + 1/m) ).

This interval consists of the values of ȳ such that a test of the true hypothesis µ₁ = µ₂ would produce a significance level of 5% or more. The same interval can be obtained as the central 95% interval in the predictive distribution of Ȳ (see Example 16.3.1).

EXAMPLE 16.5.2. Let Y₁ and Y₂ be independent variates, with Yᵢ ~ binomial(nᵢ, pᵢ). The likelihood ratio statistic for testing H: p₁ = p₂ is

    D = 2 Σ yᵢ log( yᵢ/(nᵢ p̂) ) + 2 Σ (nᵢ − yᵢ) log( (nᵢ − yᵢ)/(nᵢ(1 − p̂)) )

where p̂ = (y₁ + y₂)/(n₁ + n₂). Given observed values y₁ and y₂, we can test H as in Section 12.4. The distribution of D is approximately χ²₍₁₎ if H is true.

If we don't know y₂ but we do know that p₁ = p₂, we can use the above test to construct a range of plausible values for y₂. For instance, suppose that 12 successes are observed in 20 Bernoulli trials, and that we wish to predict y₂, the number of successes in 30 future trials with the same success probability. Taking n₁ = 20, n₂ = 30, and y₁ = 12, we can compute D for selected values of y₂. We find that D ≤ 3.841, and hence SL ≥ 0.05, for 10 ≤ y₂ ≤ 25. Values of y₂ outside this interval are implausible in that they would lead to a significance level less than 0.05 in a test of the true hypothesis p₁ = p₂.

EXAMPLE 16.5.3. Suppose that a lake contains n₁ tagged fish and n₂ untagged fish, where n₁ is known but n₂ is unknown. Fish are caught during a predetermined period of time, and the catch is observed to consist of x tagged fish and y untagged fish. What can be concluded about n₂?

We assume n₁ + n₂ independent trials, and let the probability that a particular fish is caught during the time period be p₁ for tagged fish and p₂ for untagged fish. Given a value for n₂, we can test H: p₁ = p₂ as in the preceding example. If we are willing to assume that H is true, then a small significance level casts doubt on the value chosen for n₂.

For instance, suppose that there are 110 tagged fish in the lake, and that the sample contains 20 tagged fish and 478 untagged fish. Taking n₁ = 110, y₁ = 20, and y₂ = 478, we can compute D for selected values of n₂. We find that D ≤ 3.841, and hence SL ≥ 0.05, for 1817 ≤ n₂ ≤ 4097.

Note that, in order to derive this range of values for n₂, it is necessary to assume that p₁ = p₂. This assumption may not be appropriate, because fish that have been caught and tagged may have a larger (or smaller) probability of being caught subsequently.

EXAMPLE 16.5.4. Suppose that the normal straight line model (13.5.1) has been fitted to n observed pairs (xᵢ, yᵢ), i = 1, 2, ..., n. Now we observe an additional value y, for which the corresponding value of the independent variable x is unknown. The problem is to make inferences about x.

Suppose that a value is given for x. Then from the straight line model, the estimated mean value of Y is

    µ̂ = α̂ + β̂x ~ N(µ, cσ²)

where c = c(x) = 1/n + (x − x̄)²/Sxx. We take Y ~ N(µ′, σ²) and test H: µ′ = µ. Under H, we have

    Y − µ̂ ~ N(0, (1 + c)σ²),

and the significance level is 5% or more if and only if

    −1.96 ≤ (y − µ̂) / √( σ²(1 + c) ) ≤ 1.96.
Equivalently, the 95% region for x consists of those values of x for which

    [y − α̂ − β̂x]² ≤ (1.96)² σ²(1 + c).
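The plausible range for y₂ stated in Example 16.5.2 can be reproduced by computing the likelihood ratio statistic D for every candidate value and keeping those with D below the χ²₍₁₎ 95th percentile, 3.841:

```python
import math

# Example 16.5.2: y1 = 12 successes in n1 = 20 trials; predict y2 out of n2 = 30.
n1, y1, n2 = 20, 12, 30

def lr_statistic(y1, n1, y2, n2):
    """Likelihood ratio statistic D for H: p1 = p2 (0*log 0 taken as 0)."""
    p = (y1 + y2) / (n1 + n2)
    def term(y, n):
        a = y * math.log(y / (n * p)) if y > 0 else 0.0
        b = (n - y) * math.log((n - y) / (n * (1 - p))) if y < n else 0.0
        return a + b
    return 2.0 * (term(y1, n1) + term(y2, n2))

# Values of y2 with D <= 3.841 (the chi-square(1) 95th percentile).
plausible = [y2 for y2 in range(n2 + 1) if lr_statistic(y1, n1, y2, n2) <= 3.841]
```

The resulting set is exactly 10 ≤ y₂ ≤ 25, matching the interval quoted in the text.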
Appendix A: Answers to Selected Problems

12.6.1  Observed (expected) frequencies:
            10 (6.03)      3 (6.97)
           117 (120.97)  144 (140.03)
        D_obs = 5.31; d.f. = (2 − 1)(2 − 1) = 1; SL ≈ 0.021. Fairly strong evidence against independence hypothesis.

12.6.5   74 (85)   116 (123.25)  68 (63.75)  82 (68)
        126 (115)  174 (166.75)  82 (86.25)  78 (92)
        D_obs = 8.70; d.f. = (2 − 1)(4 − 1) = 3; SL ≈ 0.034. Some evidence against independence hypothesis.

12.6.7  (a) D_obs = 1.84; d.f. = (3 − 1)(3 − 1) = 4; SL ≈ 0.77.
        (b) Obs. freq. 126 271 132; Exp. freq. 132.25 264.5 132.25.
            D_obs = 0.46; d.f. = (3 − 1) − 0 = 2; SL ≈ 0.8.
        No evidence against hypothesis in (a) or (b).

12.6.11 (a) Above   9 (15)   6 (8)  12 (9)  23 (18)
            Below  21 (15)  10 (8)   6 (9)  13 (18)
            D_obs = 10.8; d.f. = 3; SL ≈ 0.013. Strong evidence that birth weight is not independent of parental smoking habits.
        (b) Above   9 (9.8)   6 (5.2)     Above  12 (11.7)  23 (23.3)
            Below  21 (20.2)  10 (10.8)   Below   6 (6.3)   13 (12.7)
            Given mother's smoking habits, there is no evidence that birth weight depends on father's smoking habits.

12.6.13 Test for independence in 2 × 10 table gives D = 37.5 (9 d.f.), which shows that insects tend to aggregate. This analysis is conditional on the total number of insects which land on area A or B in each trial.

12.7.3  Test for independence in 2 × 4 table gives D = 66 (3 d.f.); SL ≈ 0. Strong evidence against independence hypothesis. It may well be that only the best students chose to write the competition. There is no proof that writing the competition made them any better.

12.7.5  D = 112; d.f. = 1; SL ≈ 0. The admission rate is certainly lower for females. Only program A shows any evidence of bias, and here it appears to be against males. There are proportionately more female applicants to programs with low admission rates.

12.8.3  (a) Each of the 400 electors is counted twice in the table, so rows are not independent.
        (b) D = 256; d.f. = 1; SL ≈ 0. Most electors have not changed their positions.
        (c) Consider just those who changed their positions. Obs. freq. 17 33; exp. freq. 25 25.

        D_obs = 14.27 − 7.54 = 6.73; d.f. = 10 − 4 = 6.
        µ₂ ∈ 12.74 ± 2.06 s/√16;  µ₁ − µ₂ ∈ −6.51 ± 2.06 s √(1/16 + 1/…).

13.4.6  (a) s₁² = 0.0914, ν₁ = 13; s₂² = 0.0422, ν₂ = 3; s² = 0.082175 (16 d.f.).
            D_obs = Σνᵢ log(s²/sᵢ²) = 0.616 (1 d.f.); SL ≈ 0.43. No evidence against equal variance hypothesis.
        (b) µ₁ − µ₂ ∈ 0.273 ± 1.746 [0.082175(1/14 + 1/4)]^(1/2).

13.4.9  (a) s₁² = 0.25, s₂² = 0.368, s₃² = 0.183, s² = 0.267; d.f. = 4, 4, 4, 12.
            D_obs = 4Σ log(s²/sᵢ²) = 0.491 (2 d.f.); SL ≈ 0.78. No evidence against equal variance hypothesis.
        (b) s² = 0.267 (12 d.f.); 14.7% LI: 0.132 ≤ σ² ≤ 0.671.
            Or 4.404 ≤ 12s²/σ² ≤ 23.34 gives 0.137 ≤ σ² ≤ 0.728.
        (c) µ₂ − µ₃ ∈ −0.3 ± 2.179 [0.267(1/5 + 1/5)]^(1/2).

13.5.2  α̂ = 1.468, β̂ = 1.703, s² = 0.00502 (28 d.f.). Plot shows curvature. Try a 2nd degree polynomial model.

13.5.6  α̂ = 47.864, β̂ = 48.247, s² = 0.1783 (15 d.f.). Estimated boiling points 192.40, 203.16, 211.96. β̂ would increase to 48.247 log 10; other results unchanged. One point (BP = 204.6) is seriously out of line. Redo analysis with this point omitted.

13.6.2  α̂ = 0.976, β̂ = 0.353, s² = 0.00978 (6 d.f.). The last observation is somewhat larger than expected.
        β ∈ 0.353 ± 2.447 (s²/1.229)^(1/2);
        α + 0.4β ∈ 1.117 ± 2.447 s (1/8 + 0.2875²/1.229)^(1/2).

13.6.7  (a) α̂ = −0.228, β̂ = 0.9948, Σeᵢ² = 0.3419, s² = 0.04273 (8 d.f.).
        (b) T_obs = (β̂ − 1)/(s²/1569)^(1/2) = −1.00; SL = 0.35.
        (c) T_obs = α̂ / [s(1/10 + 23.25²/1569)^(1/2)] = −1.65; SL = 0.14.
            No evidence against H: β = 1 or H: α = 0.
        (d) Σx² = 6974.25, Σxy = 6884.65, Σy² = 6796.66; β̂ = 0.9872;
            s² = (1/9)(Σy² − β̂Σxy) = 0.05099 (9 d.f.);
            T_obs = (β̂ − 1)/(0.05099/6974.25)^(1/2) = −4.75; SL < 0.001.
            There is very strong evidence that β ≠ 1. If we insist that the line goes through the origin, the slope must be less than 1 to give a reasonable fit to the data. It is reasonable to take β = 1 or to take α = 0, but it is not satisfactory to assume both β = 1 and α = 0.

13.7.3  We assume that differences are independent N(µ, σ²), and test H: µ = 0. Here ȳ = 2.5, s² = 5.10 (5 d.f.).
        T_obs = (ȳ − 0)/(s²/6)^(1/2) = 2.71; SL = 0.042. There is some evidence that brand A is superior. However, brand A tires were always tested first. It would be better to test A first on 3 randomly chosen cars, and B first on the other 3 cars.

13.7.8  (a) α̂ = −0.1605, β̂ = 8.837, s² = 0.01031 (31 d.f.).
            α + 0.4β ∈ 3.3743 ± 2.04 s [1/33 + 0.02764²/0.07242]^(1/2) = 3.3743 ± 0.0419.
        (b) Same α̂, β̂; s² = 0.00753 (9 d.f.).
            α + 0.4β ∈ 3.3743 ± 2.262 s [1/11 + 0.02764²/0.02414]^(1/2) = 3.3743 ± 0.0687.
        The interval in (a) is too narrow, owing to the treatment of repeat measurements as independent replicates.

13R2    s₁² = 2.495 (6 d.f.), s₂² = 2.898 (6 d.f.), s² = 2.696 (12 d.f.).
        D_obs = 6Σ log(s²/sᵢ²) = 0.03 (1 d.f.); SL ≈ 0.85. No evidence against hypothesis of equal variances.
        µ₂ − µ₁ ∈ 1.729 ± 2.179 s(1/7 + 1/7)^(1/2) = 1.729 ± 1.912.
        4.404 ≤ 12s²/σ² ≤ 23.34 gives 1.386 ≤ σ² ≤ 7.346.
        One possibility: divide the 14 men into 7 pairs with nearly equal initial blood pressure. Choose one of each pair at random to get drug 1, and the other gets drug 2. Analyze differences.

13R6    s₁² = 0.1983, s₂² = 0.1820, s₃² = 0.1692, s₄² = 0.1722, each with 8 d.f.; s² = 0.1804 (32 d.f.).
        D_obs = 8Σ log(s²/sᵢ²) = 0.06 (3 d.f.); SL ≈ 0.996. No evidence against hypothesis of equal variances. In fact, the variance estimates are so nearly equal that one might suspect some tampering with the data.
        18.3 ≤ 32s²/σ² ≤ 49.5 gives 0.1166 ≤ σ² ≤ 0.3155.
        Alternatively, 14.7% LI is 0.1147 ≤ σ² ≤ 0.3074.

14.1.1  [The design matrix X of 0-1 indicators; its printed entries are not recoverable in this copy.]

14.2.2  X'X = [ n    Σx  ] = [ 12    628  ]        X'y = [ Σy  ] = [ 1684  ]
               [ Σx   Σx² ]   [ 628   34416 ],             [ Σxy ]   [ 89894 ]

        (X'X)⁻¹ = [  1.8495   −0.0337   ]        β̂ = (X'X)⁻¹X'y = [ 80.78 ]
                   [ −0.0337    0.000645 ],                          [ 1.138 ]

        β̂′(X'y) = 1684 × 80.78 + 89894 × 1.138 = 238330.
        Σy² = 238822;  Σeᵢ² = 238822 − 238330 = 492.

14.2.5  X'y = [ −154.5, 755.4, 4924.3 ]′.  [The matrices X and X'X are not recoverable in this copy.]
        β̂ = (X'X)⁻¹X'y gives β̂₁ = 86.35, β̂₂ = −2.575, β̂₃ = −0.3627;
        Σeᵢ² = 63845.72 − 63841.59 = 4.13 (9 d.f.).
        β̂₁ + (17.5 − 19)β̂₂ + (17.5 − 19)²β̂₃ = 89.40.
        (d/dt)[β̂₁ + (t − 19)β̂₂ + (t − 19)²β̂₃] = 0 for t = 19 − β̂₂/(2β̂₃) = 15.45.

14.3.1  (a) s² = 0.074317/25 (25 d.f.).
            H₁: Q = 0.066343 (3 d.f.); SL = P{F₃,₂₅ ≥ 7.44} ≈ 0.001.
            H₂: Q = 0.000043 (2 d.f.); SL = P{F₂,₂₅ ≥ 0.007} = 0.993.
        (b) s² = 0.07436/27 (27 d.f.); Q = 0.0663 (1 d.f.); SL = P{F₁,₂₇ ≥ 24.07} < 0.001.

14.3.5  (a) ΣΣ(yᵢⱼ − ȳⱼ)² = 4.540; s² = 0.2389 (19 d.f.).
            ΣΣ(yᵢⱼ − ȳ)² = 6.838 (22 d.f.); Q = 2.298 (3 d.f.).
            F_obs = (Q/3)/s² = 3.21; SL = P{F₃,₁₉ ≥ 3.21} = 0.046. Some evidence that means are not equal.
        (b) sᵢ² = 0.1324, 0.08, 0.3329, 0.377; νᵢ = 6, 3, 6, 4; s² = 0.2389 (19 d.f.).
            D_obs = Σνᵢ log(s²/sᵢ²) = 3.01 (3 d.f.); SL ≈ P(χ²₍₃₎ ≥ 3.01) = 0.39.
            No evidence against hypothesis of equal variances.

14.3.8  Straight line model: Σeᵢ² = 118.91 (18 d.f.). Five-sample model: ΣΣeᵢⱼ² = 117.27, s² = 7.82 (15 d.f.).
        Q = 1.64 (3 d.f.); F_obs = 0.07; SL = P{F₃,₁₅ ≥ 0.07} = 0.975.
        Straight line model fits very well.

14.4.1  β̂, V, and s² (4 d.f.) are given in Problem 14.2.4.
        (a) β₃ ∈ β̂₃ ± 2.776 (s²v₃₃)^(1/2) = 28.675 ± 2.985.
        (b) (i) θ = β₂ − 2β₁ = b′β where b′ = (−2, 1, 0). θ̂ = −2.375; var(θ̂) = cσ² where c = b′Vb = 19/8.
                T_obs = (θ̂ − 0)/(s²c)^(1/2) = −0.88; SL = P{|t₄| ≥ 0.88} = 0.43.
            (ii) Under H, µ = Xβ where X′ = [1 2 0 3 1 2 3; 0 0 1 0 1 1 1].
                New residual SS is 14.71 (5 d.f.). Old residual SS is 12.335 (4 d.f.); s² = 3.08375.
                Q = 2.375 (1 d.f.); F_obs = 0.7702; SL = P{F₁,₄ ≥ 0.77} = 0.43.
        (c) For revised model, β̂₃ = 28.8, v₃₃ = 0.3684, s² = 2.942 (5 d.f.).
            β₃ ∈ β̂₃ ± 2.571 (s²v₃₃)^(1/2) = 28.8 ± 2.677.

14.4.3  (a) θ = β₂ − β₃ = b′β where b′ = (0, 1, −1, 0). θ̂ = 2.88; var(θ̂) = cσ² where c = b′Vb = 0.021; s² = 4.3 (9 d.f.).
            θ ∈ 2.88 ± 2.262 (cs²)^(1/2) = 2.88 ± 0.680.
        (b) T_obs = (β̂₃ − 0)/(v₃₃s²)^(1/2) = −6.88; SL = P{|t₉| ≥ 6.88} < 0.001.
        (c) Q = 167.6 − 38.7 = 128.9 (2 d.f.); F_obs = (Q/2)/s² = 14.99; SL = P{F₂,₉ ≥ 14.99} = 0.0014.

14.5.6  (a) mᵢ₁² + mᵢ₂² + ⋯ + mᵢₙ² = mᵢᵢ, so mᵢᵢ(1 − mᵢᵢ) ≥ 0.
        (c) If mᵢᵢ = 1, then mᵢⱼ = 0 for j ≠ i by the result in (a).
            µ̂ᵢ = mᵢ₁y₁ + mᵢ₂y₂ + ⋯ + mᵢₙyₙ = yᵢ; êᵢ = yᵢ − µ̂ᵢ = 0.

15.1.2  Y is sufficient but not minimal, because L(σ; −y) = L(σ; y).

15.1.5  Yes, the pair (T₁, T₂) is minimally sufficient. L(θ; y′) and L(θ; y) are proportional if and only if T₁(y′) = T₁(y) and T₂(y′) = T₂(y).
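The least squares computation in answer 14.2.2 can be reproduced from the summary statistics alone, using the closed-form solution of the 2 × 2 normal equations:

```python
# Answer 14.2.2: simple linear regression from summary statistics.
n, Sx, Sxx = 12, 628.0, 34416.0
Sy, Sxy, Syy = 1684.0, 89894.0, 238822.0

det = n * Sxx - Sx * Sx                  # determinant of X'X
b0 = (Sxx * Sy - Sx * Sxy) / det         # intercept estimate (about 80.78)
b1 = (n * Sxy - Sx * Sy) / det           # slope estimate (about 1.138)
res_ss = Syy - (b0 * Sy + b1 * Sxy)      # residual sum of squares (about 492)
```

Carrying full precision through the inverse avoids the small rounding error that appears when the tabulated (X'X)⁻¹ entries are used directly.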
15.1.7  f(x₁, ..., xₙ) = (1/2θ)ⁿ for −θ < x₁, ..., xₙ < θ.
        L(θ; x) = θ⁻ⁿ for −θ < x₍₁₎ and x₍ₙ₎ < θ; i.e. for θ > max{x₍ₙ₎, −x₍₁₎}.
        T = max{X₍ₙ₎, −X₍₁₎} is a sufficient statistic for θ.

15.1.9  L(µ) ∝ exp{ −n(X̄ − µ)²/2σ² − m(Ȳ − µ)²/2kσ² }
             ∝ exp{ −[(nk + m)µ² − 2(knX̄ + mȲ)µ]/2kσ² }.
        Thus T = knX̄ + mȲ is sufficient for µ. T ~ N((kn + m)µ, k(kn + m)σ²) by (6.6.8) and (6.6.7).

15.1.15 log …, log µ (or any constant multiples of them).

15.2.3  L(λ) = λⁿ exp(−λt) for λ > 0, where t = Σxᵢ. Thus T = ΣXᵢ is a sufficient statistic for λ. The p.d.f. of Y = 2λX is ½e^(−y/2), which is χ²₍₂₎. Now Z = 2λT ~ χ²₍₂ₙ₎.

15.6.3  P(x | t) = C(x₁ + r − 1, r − 1) C(x₂ + r − 1, r − 1) / C(t + 2r − 1, 2r − 1).
        l(p₁, p₂) = r log p₁ + x₁ log(1 − p₁) + r log p₂ + x₂ log(1 − p₂).
        D = 2[l(p̂₁, p̂₂) − l(p̃₁, p̃₂)] where p̂ᵢ = r/(r + xᵢ) and p̃ᵢ = 2r/(2r + t).
        For r = 2, t = 16: D_obs = 2.41; D ≥ 2.41 for x₁ = 0, 1, 2, 14, 15, 16;
        SL = P(D ≥ 2.41 | T = 16) = 0.194.

15.6.6  See Example 15.6.2 (test for independence in a 2 × 2 table), where Σxᵢ = n and x₁ + x₄ = t.

APPENDIX B

Tables

Table B1 gives the value x whose cumulative probability F(x) is the sum of the corresponding row and column headings. Example: the value x such that F(x) = .64 is 0.358 (from row .6 and column .04 of Table B1).

Table B2 gives the cumulative probability F(x), where x is the sum of the corresponding row and column headings. Example: the cumulative probability at value 0.36 is F(.36) = .6406 (from row .3 and column .06 of Table B2).
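Both worked examples for Tables B1 and B2 can be verified with the error function, since the standard normal c.d.f. is Φ(x) = ½(1 + erf(x/√2)); the quantile for the Table B1 example is found here by simple bisection:

```python
import math

def normal_cdf(x):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Table B2 example: F(0.36) should be about .6406.
F_036 = normal_cdf(0.36)

# Table B1 example: the x with F(x) = .64 should be about 0.358.
lo, hi = 0.0, 1.0
for _ in range(60):                 # bisection for the quantile
    mid = (lo + hi) / 2.0
    if normal_cdf(mid) < 0.64:
        lo = mid
    else:
        hi = mid
x_64 = (lo + hi) / 2.0
```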
Table B2. Standardized Normal Distribution (continued)
Table B4. Percentiles of the Chi-Square (χ²) Distribution

    F(x) = P(χ²₍ᵥ₎ ≤ x) = ∫₀ˣ u^(v/2 − 1) e^(−u/2) du / [2^(v/2) Γ(v/2)].

The body of the table gives the values x corresponding to selected values of the cumulative probability (F) and degrees of freedom (v).

 v \ F  .005   .01    .025   .05    .10    .25    .5     .75    .9     .95    .975   .99    .995   .999
 16     5.142  5.812  6.908  7.962  9.312  11.91  15.34  19.37  23.54  26.30  28.85  32.00  34.27  39.25
 17     5.697  6.408  7.564  8.672  10.09  12.79  16.34  20.49  24.77  27.59  30.19  33.41  35.72  40.79
 18     6.265  7.015  8.231  9.390  10.86  13.68  17.34  21.60  25.99  28.87  31.53  34.81  37.16  42.31
 19     6.844  7.633  8.907  10.12  11.65  14.56  18.34  22.72  27.20  30.14  32.85  36.19  38.58  43.82
 20     7.434  8.260  9.591  10.85  12.44  15.45  19.34  23.83  28.41  31.41  34.17  37.57  40.00  45.32
 21     8.034  8.897  10.28  11.59  13.24  16.34  20.34  24.93  29.62  32.67  35.48  38.93  41.40  46.80
 22     8.643  9.542  10.98  12.34  14.04  17.24  21.34  26.04  30.81  33.92  36.78  40.29  42.80  48.27
 23     9.260  10.20  11.69  13.09  14.85  18.14  22.34  27.14  32.01  35.17  38.08  41.64  44.18  49.73
 24     9.886  10.86  12.40  13.85  15.66  19.04  23.34  28.24  33.20  36.42  39.36  42.98  45.56  51.18
 25     10.52  11.52  13.12  14.61  16.47  19.94  24.34  29.34  34.38  37.65  40.65  44.31  46.93  52.62
 26     11.16  12.20  13.84  15.38  17.29  20.84  25.34  30.43  35.56  38.89  41.92  45.64  48.29  54.05
 27     11.81  12.88  14.57  16.15  18.11  21.75  26.34  31.53  36.74  40.11  43.19  46.96  49.64  55.48
 28     12.46  13.56  15.31  16.93  18.94  22.66  27.34  32.62  37.92  41.34  44.46  48.28  50.99  56.89
 29     13.12  14.26  16.05  17.71  19.77  23.57  28.34  33.71  39.09  42.56  45.72  49.59  52.34  58.30
 30     13.79  14.95  16.79  18.49  20.60  24.48  29.34  34.80  40.26  43.77  46.98  50.89  53.67  59.70
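Entries of Table B4 can be spot-checked without any special software using the Wilson-Hilferty approximation to chi-square percentiles; the normal percentiles z_p below are standard constants quoted from normal tables:

```python
# Wilson-Hilferty approximation to chi-square percentiles:
# chi2(v, p) is roughly v * (1 - 2/(9v) + z_p * sqrt(2/(9v)))**3,
# where z_p is the standard normal percentile.
Z = {0.90: 1.2816, 0.95: 1.6449, 0.99: 2.3263}

def chi2_percentile(v, p):
    h = 2.0 / (9.0 * v)
    return v * (1.0 - h + Z[p] * (h ** 0.5)) ** 3

approx_90 = chi2_percentile(20, 0.90)   # table entry: 28.41
approx_95 = chi2_percentile(16, 0.95)   # table entry: 26.30
approx_99 = chi2_percentile(30, 0.99)   # table entry: 50.89
```

For the degrees of freedom shown in Table B4, the approximation agrees with the tabled values to within a few hundredths.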
r
:>
-0
f(x)=P(F"·'"~x)= f(nmu 2
-
1
( l+mnur(n+m )1
2
n (n+m)/r (")'2 r (m)
mdu·r - l-
~
:0
0.
;:;
0
2 =
90th Percentiles (F = .90)

 m\n    1     2     3     4     5     6     8     12    24     ∞
  1   39.86 49.50 53.59 55.83 57.24 58.20 59.44 60.70 62.00 63.33
  2    8.53  9.00  9.16  9.24  9.29  9.33  9.37  9.41  9.45  9.49
  3    5.54  5.46  5.39  5.34  5.31  5.28  5.25  5.22  5.18  5.13
  4    4.54  4.32  4.19  4.11  4.05  4.01  3.95  3.90  3.83  3.76
  5    4.06  3.78  3.62  3.52  3.45  3.40  3.34  3.27  3.19  3.10
  6    3.78  3.46  3.29  3.18  3.11  3.05  2.98  2.90  2.82  2.72
  7    3.59  3.26  3.07  2.96  2.88  2.83  2.75  2.67  2.58  2.47
  8    3.46  3.11  2.92  2.81  2.73  2.67  2.59  2.50  2.40  2.29
  9    3.36  3.01  2.81  2.69  2.61  2.55  2.47  2.38  2.28  2.16
 10    3.28  2.92  2.73  2.61  2.52  2.46  2.38  2.28  2.18  2.06
 12    3.18  2.81  2.61  2.48  2.39  2.33  2.24  2.15  2.04  1.90
 15    3.07  2.70  2.49  2.36  2.27  2.21  2.12  2.02  1.90  1.76
 20    2.97  2.59  2.38  2.25  2.16  2.09  2.00  1.89  1.77  1.61
 25    2.92  2.53  2.32  2.18  2.09  2.02  1.93  1.82  1.69  1.52
 30    2.88  2.49  2.28  2.14  2.05  1.98  1.88  1.77  1.64  1.46
 40    2.84  2.44  2.23  2.09  2.00  1.93  1.83  1.71  1.57  1.38
 60    2.79  2.39  2.18  2.04  1.95  1.87  1.77  1.66  1.51  1.29
120    2.75  2.35  2.13  1.99  1.90  1.82  1.72  1.60  1.45  1.19
  ∞    2.71  2.30  2.08  1.94  1.85  1.77  1.67  1.55  1.38  1.00

(continued on following page)
95th Percentiles (F = .95)

 m\n    1     2     3     4     5     6     8     12    24     ∞
  1    161   200   216   225   230   234   239   244   249   254
  2   18.5  19.0  19.2  19.2  19.3  19.3  19.4  19.4  19.5  19.5
  3   10.1  9.55  9.28  9.12  9.01  8.94  8.84  8.74  8.64  8.53
  4   7.71  6.94  6.59  6.39  6.26  6.16  6.04  5.91  5.77  5.63
  5   6.61  5.79  5.41  5.19  5.05  4.95  4.82  4.68  4.53  4.36
  6   5.99  5.14  4.76  4.53  4.39  4.28  4.15  4.00  3.84  3.67
  7   5.59  4.74  4.35  4.12  3.97  3.87  3.73  3.57  3.41  3.23
  8   5.32  4.46  4.07  3.84  3.69  3.58  3.44  3.28  3.12  2.93
  9   5.12  4.26  3.86  3.63  3.48  3.37  3.23  3.07  2.90  2.71
 10   4.96  4.10  3.71  3.48  3.33  3.22  3.07  2.91  2.74  2.54
 12   4.75  3.88  3.49  3.26  3.11  3.00  2.85  2.69  2.50  2.30
 15   4.54  3.68  3.29  3.06  2.90  2.79  2.64  2.48  2.29  2.07
 20   4.35  3.49  3.10  2.87  2.71  2.60  2.45  2.28  2.08  1.84
 25   4.24  3.38  2.99  2.76  2.60  2.49  2.34  2.16  1.96  1.71
 30   4.17  3.32  2.92  2.69  2.53  2.42  2.27  2.09  1.89  1.62
 40   4.08  3.23  2.84  2.61  2.45  2.34  2.18  2.00  1.79  1.51
 60   4.00  3.15  2.76  2.52  2.37  2.25  2.10  1.92  1.70  1.39
120   3.92  3.07  2.68  2.45  2.29  2.17  2.02  1.83  1.61  1.25
  ∞   3.84  2.99  2.60  2.37  2.21  2.10  1.94  1.75  1.52  1.00
Table B5. Variance Ratio Distribution (continued)

99th Percentiles (F = .99)

 m\n    1     2     3     4     5     6     8     12    24     ∞
  1   4052  4999  5403  5625  5764  5859  5982  6106  6234  6366
  2   98.5  99.0  99.2  99.2  99.3  99.3  99.4  99.4  99.5  99.5
  3   34.1  30.8  29.5  28.7  28.2  27.9  27.5  27.1  26.6  26.1
  4   21.2  18.0  16.7  16.0  15.5  15.2  14.8  14.4  13.9  13.5
  5   16.3  13.3  12.1  11.4  11.0  10.7  10.3  9.89  9.47  9.02
  6  13.74 10.92  9.78  9.15  8.75  8.47  8.10  7.72  7.31  6.88
  7  12.25  9.55  8.45  7.85  7.46  7.19  6.84  6.47  6.07  5.65
  8  11.26  8.65  7.59  7.01  6.63  6.37  6.03  5.67  5.28  4.86
  9  10.56  8.02  6.99  6.42  6.06  5.80  5.47  5.11  4.73  4.31
 10  10.04  7.56  6.55  5.99  5.64  5.39  5.06  4.71  4.33  3.91
 12   9.33  6.93  5.95  5.41  5.06  4.82  4.50  4.16  3.78  3.36
 15   8.68  6.36  5.42  4.89  4.56  4.32  4.00  3.67  3.29  2.87
 20   8.10  5.85  4.94  4.43  4.10  3.87  3.56  3.23  2.86  2.42
 25   7.77  5.57  4.68  4.18  3.86  3.63  3.32  2.99  2.62  2.17
 30   7.56  5.39  4.51  4.02  3.70  3.47  3.17  2.84  2.47  2.01
 40   7.31  5.18  4.31  3.83  3.51  3.29  2.99  2.66  2.29  1.80
 60   7.08  4.98  4.13  3.65  3.34  3.12  2.82  2.50  2.12  1.60
120   6.85  4.79  3.95  3.48  3.17  2.96  2.66  2.34  1.95  1.38
  ∞   6.64  4.60  3.78  3.32  3.02  2.80  2.51  2.18  1.79  1.00
356 Appendix B: Tables
Index
Achievable significance level, 190
Additional sum of squares, 253-267, 276
Alternative hypothesis, 191, 194
Analysis of differences, 234-240
Ancillary statistics, 291-295, 299, 317-320, 331-333
Association, 170-182

Bayesian methods, 321-327
Before and after measurements (see Paired data)
Behrens-Fisher problem, 332-333
Bias, 129
Block effects, 236

Calibration problem, 336
Causation, 179
Censoring, 32
Chi-square approximation, 107, 112, 121, 143, 145, 150
  tables, 351-352
CI (see Confidence interval)
Column space, 275
Combining likelihoods, 13, 22, 45, 152, 155
Comparable test statistics, 190
Composite hypothesis, 142, 149, 194, 300
Computational methods, 46, 54, 88, 91, 248
Conditional likelihood, 95
Conditional tests, 300-313
Confidence coefficient, 113
Confidence intervals, 113-123, 189, 298-300
  in normal linear model, 202-205, 261-264
Confidence region (see Confidence interval)
Consonance region, 186
Contingency table, 170-186
  independence hypothesis, 172-182, 307-311
  marginal homogeneity, 182-186, 312
Continuity correction, 307
Contour map, 61-65, 91
Corrected sum, 207, 221-222
Coverage probability, 102-123, 188, 290
CP (see Coverage probability)
Critical region, 190

Degrees of freedom, 145, 150, 201, 254
Dependent variable, 220
D.f. (see Degrees of freedom)
Design of experiments (see Planning experiments)
Discrepancy measure, 136
Dose-response models, 74

ED50, 79
Efficiency, 126
Empirical Bayes methods, 321
Error variable, 197
Estimate
  least squares (see Least squares estimate)
  maximum likelihood (see Maximum likelihood estimate)
  unbiased, 129
Expected information, 124-128
Explanatory variable, 220
Exponential family, 195, 282-283, 285

F distribution tables, 353-356
Fiducial argument, 314-321
Fiducial distribution, 315-321, 324
Fiducial probability, 315
Fisher's measure of expected information, 124
Fitted value, 201
Frequency properties, 96-133, 188-195, 289-300
F-test, 254-260, 262
Functionally independent, 145

Goodness of fit tests, 161-170

Hardy-Weinberg law, 301-304
Homogeneity hypothesis (see Hypothesis of homogeneity)
Hypothesis, 134, 141-142, 149, 190-195, 254
  of homogeneity, 152-155, 218-219
  of independence, 172-179, 307-311
  of marginal homogeneity, 182-186, 312

Independence hypothesis (see Hypothesis of independence)
Independent likelihoods, 13-17, 19-22
Independent variable, 220
Influential point, 227, 271
Information function, 5, 40, 54, 126-128, 288-289
  expected, 124-128
  transformation of, 44, 71-74
Invariance, 37-40, 54, 130-132

K-sample model, 216-220, 245, 255-258

Least squares estimates, 200
Left inverse, 248
Leverage, 268, 271
LI (see Likelihood interval)
Likelihood contour, 61-65
  calculation of, 63-64, 91-92
Likelihood function, 3, 54
  for continuous models, 25-37
  factorization of, 68-69
  for frequency tables, 8-10
  multi-parameter, 92-95
  normal approximation to, 40-45, 70-74, 115, 122, 155, 288
Likelihood interval, 18, 48-49
  approximate, 40, 115
  coverage probability of, 102-120
  maximum, 62-65
Likelihood ratio statistic, 97, 120, 141-149
  chi-square approximation, 107-113, 121, 145, 150
  sampling distribution of, 97-102, 296-300, 305
Likelihood ratio tests, 142
  for binomial probabilities, 156-160, 305-307
  for composite hypotheses, 149-156
  for equal variances, 218-219
  for homogeneity, 152-156
  for independence, 170-179, 307-311
  for marginal homogeneity, 183-185
  for multinomial probabilities, 160-170
  in normal linear models, 252-267
  for simple hypotheses, 141-148
Likelihood region, 18, 61, 121
Linear estimate, 130
Linear hypothesis, 254
Linear independence, 243, 254
Linear model (see Normal linear model)
Logistic model, 75-83, 158, 313
Log likelihood function, 4, 54
Log-odds, 76

Marginal homogeneity (see Hypothesis of marginal homogeneity)
Marginal likelihood, 95, 204
Maximum likelihood equation, 5, 47, 54, 89
Maximum likelihood estimates, 3, 54
  combined or pooled, 15, 45
  computation of, 47, 54
  infinite, 22-23
  invariance of, 37-39, 54, 130-132
  in normal linear models, 200, 247
  sampling distribution of, 97-102
  and sufficient statistics, 288-295, 305
Maximum likelihood interval, 62-66, 121-123
Maximum relative likelihood function, 65, 93
Mean squared error, 129
Measurement interval, 25-29
Method of Lagrange, 264
Minimally sufficient, 279
Minimum variance unbiased, 129
MLE (see Maximum likelihood estimate)
Multi-parameter likelihoods, 92-95
Multiple regression, 243
MVU (see Minimum variance unbiased)

Natural parameter, 285
Newton-Raphson method, 55, 77-78, 88-90
Newton's method, 46-51, 91-92
Neyman-Pearson fundamental lemma, 192
Noise, 197
Normal approximations (see Likelihood function, normal approximations to)
Normal distribution tables, 347-349
Normal linear models, 196-276
  assumptions, 196-200
  checking the model, 225-229, 267-274
  confidence intervals, 203-205
  estimation, 200-202, 247-248
  matrix notation, 242-247
  paired measurements, 234-237
  prediction, 329-336
  significance tests, 202-204, 252-267
  sufficient statistics, 283, 287
Null hypothesis, 191

One-sample model, 199, 206-212, 244
Orthogonal matrix, 274
Orthogonal transformation, 274
Outlier, 227, 269

Paired data, 94-95, 187, 234-240
Parallel line model, 246
Parameter space, 5
Parameter transformations, 43-44, 71-74, 116
Pearson goodness of fit statistic, 161
Pivotal quantity, 317
Planning experiments, 124-128, 181, 230, 237, 259, 294
Polynomial model, 244
Pooled variance estimate, 213, 216-219
Pooling class frequencies, 164, 169
Posterior distribution, 321
Power, 190-195
Prediction, 326-336
Predictive distribution, 327
  interval, 328
Prior distribution, 321-326
Probit model, 75
P-value, 136

Random error, 197
Randomization, 179-182
Random sample size, 292
Reference set, 296-300
Regression, 221, 243
Relative efficiency, 126
Relative likelihood function, 17, 61
Residuals, 201, 269-273