
J. G. Kalbfleisch

Probability and
Statistical Inference
Volume 2: Statistical Inference
Second Edition

With 27 Illustrations

Springer-Verlag
New York Berlin Heidelberg Tokyo
Springer Texts in Statistics

Advisors: Stephen Fienberg, Ingram Olkin

J.G. Kalbfleisch
University of Waterloo
Department of Statistics and Actuarial Science
Waterloo, Ontario N2L 3G1
Canada

Editorial Board

Stephen Fienberg
Department of Statistics
Carnegie-Mellon University
Pittsburgh, PA 15213
U.S.A.

Ingram Olkin
Department of Statistics
Stanford University
Stanford, CA 94305
U.S.A.

AMS Classification: 62-01

Library of Congress Cataloging in Publication Data
Kalbfleisch, J.G.
Probability and statistical inference.
(Springer texts in statistics)
Includes indexes.
Contents: v. 1. Probability - v. 2. Statistical inference.
1. Probabilities. 2. Mathematical statistics. I. Title. II. Series.
QA273.K27 1985  519.5'4  85-12580

The first edition was published in two volumes, © 1979 by Springer-Verlag New York Inc.:
Probability and Statistical Inference I (Universitext)
Probability and Statistical Inference II (Universitext)

© 1985 by Springer-Verlag New York Inc.
All rights reserved. No part of this book may be translated or reproduced in any form without written permission from Springer-Verlag, 175 Fifth Avenue, New York, New York 10010, U.S.A.

Typeset by H. Charlesworth & Co. Ltd., Huddersfield, England.
Printed and bound by R.R. Donnelley and Sons, Harrisonburg, Virginia.
Printed in the United States of America.

9 8 7 6 5 4 3 2 1

ISBN 0-387-96183-6 Springer-Verlag New York Berlin Heidelberg Tokyo
ISBN 3-540-96183-6 Springer-Verlag Berlin Heidelberg New York Tokyo

Preface

This book is in two volumes, and is intended as a text for introductory courses in probability and statistics at the second or third year university level. It emphasizes applications and logical principles rather than mathematical theory. A good background in freshman calculus is sufficient for most of the material presented. Several starred sections have been included as supplementary material. Nearly 900 problems and exercises of varying difficulty are given, and Appendix A contains answers to about one-third of them.

The first volume (Chapters 1-8) deals with probability models and with mathematical methods for describing and manipulating them. It is similar in content and organization to the 1979 edition. Some sections have been rewritten and expanded - for example, the discussions of independent random variables and conditional probability. Many new exercises have been added.

In the second volume (Chapters 9-16), probability models are used as the basis for the analysis and interpretation of data. This material has been revised extensively. Chapters 9 and 10 describe the use of the likelihood function in estimation problems, as in the 1979 edition. Chapter 11 then discusses frequency properties of estimation procedures, and introduces coverage probability and confidence intervals. Chapter 12 describes tests of significance, with applications primarily to frequency data. The likelihood ratio statistic is used to unify the material on testing, and connect it with earlier material on estimation. Chapters 13 and 14 present methods for analyzing data under the assumption of normality, with emphasis on the importance of correctly modelling the experimental situation. Chapter 15 considers sufficient statistics and conditional tests, and Chapter 16 presents some additional topics in statistical inference.

The content of volume two is unusual for an introductory text. The importance of the probability model is emphasized, and general techniques are presented for deriving suitable estimates, intervals, and tests from the likelihood function.
The intention is to avoid the appearance of a recipe book with many special formulas set out for type problems. A wide variety of applications can be treated using the methods presented, particularly if students have access to computing facilities.

I have omitted much of the standard material on optimality criteria for estimators and tests, which is better left for later courses in mathematical statistics. Also, I have avoided using decision-theoretic language. For instance, I discuss the calculation and interpretation of the observed significance level, rather than presenting the formal theory of hypothesis testing. In most statistical applications, the aim is to learn from the data at hand, not to minimize error frequencies in a long sequence of decisions.

I wish to thank my colleagues and students at the University of Waterloo for their helpful comments on the 1979 edition, and on earlier drafts of this edition. Special thanks are due to Professor Jock MacKay for his many excellent suggestions, and to Ms. Lynda Hohner for superb technical typing. Finally, I wish to express my appreciation to my wife Rebecca, and children Jane, David, and Brian, for their encouragement and support.

I am grateful to the Biometrika trustees for permission to reproduce material from Table 8 of Biometrika Tables for Statisticians, Vol. 1 (3rd edition, 1966); to John Wiley and Sons Inc. for permission to reproduce portions of Table II from Statistical Tables and Formulas by A. Hald (1952); and to the Literary Executor of the late Sir Ronald Fisher, F.R.S., to Dr. Frank Yates, F.R.S., and to Longman Group Ltd., London, for permission to reprint Tables I, III, and V from their book Statistical Tables for Biological, Agricultural, and Medical Research (6th edition, 1974).

J.G. Kalbfleisch

Contents of Volume 2

CHAPTER 9
Likelihood Methods                                  1
9.1  The Method of Maximum Likelihood               3
9.2  Combining Independent Experiments             13
9.3  Relative Likelihood                            17
9.4  Likelihood for Continuous Models               25
9.5  Censoring in Lifetime Experiments              32
9.6  Invariance                                     37
9.7  Normal Approximations                          40
9.8  Newton's Method                                46
Review Problems                                     51

CHAPTER 10
Two-Parameter Likelihoods                           53
10.1  Maximum Likelihood Estimation                 53
10.2  Relative Likelihood and Contour Maps          61
10.3  Maximum Relative Likelihood                   65
10.4  Normal Approximations                         70
10.5  A Dose-Response Example                       74
10.6  An Example from Learning Theory               83
10.7* Some Derivations                              88
10.8* Multi-Parameter Likelihoods                   92
CHAPTER 11
Frequency Properties                                96
11.1  Sampling Distributions                        97
11.2  Coverage Probability                         102
11.3  Chi-Square Approximations                    107
11.4  Confidence Intervals                         113
11.5  Results for Two-Parameter Models             120
11.6* Expected Information and Planning Experiments 124
11.7* Bias                                         129

CHAPTER 12
Tests of Significance                              134
12.1  Introduction                                 134
12.2  Likelihood Ratio Tests for Simple Hypotheses 141
12.3  Likelihood Ratio Tests for Composite Hypotheses 149
12.4  Tests for Binomial Probabilities             156
12.5  Tests for Multinomial Probabilities          160
12.6  Tests for Independence in Contingency Tables 170
12.7  Cause and Effect                             179
12.8  Testing for Marginal Homogeneity             182
12.9  Significance Regions                         186
12.10* Power                                       190

CHAPTER 13
Analysis of Normal Measurements                    196
13.1  Introduction                                 196
13.2  Statistical Methods                          200
13.3  The One-Sample Model                         206
13.4  The Two-Sample Model                         212
13.5  The Straight Line Model                      220
13.6  The Straight Line Model (Continued)          229
13.7  Analysis of Paired Measurements              234
Review Problems                                    240

CHAPTER 14
Normal Linear Models                               242
14.1  Matrix Notation                              242
14.2  Parameter Estimates                          247
14.3  Testing Hypotheses in Linear Models          252
14.4  More on Tests and Confidence Intervals       260
14.5  Checking the Model                           267
14.6* Derivations                                  274

CHAPTER 15
Sufficient Statistics and Conditional Tests        277
15.1  The Sufficiency Principle                    277
15.2  Properties of Sufficient Statistics          285
15.3  Exact Significance Levels and Coverage Probabilities 289
15.4  Choosing the Reference Set                   296
15.5  Conditional Tests for Composite Hypotheses   300
15.6  Some Examples of Conditional Tests           305

CHAPTER 16
Topics in Statistical Inference                    314
16.1* The Fiducial Argument                        314
16.2* Bayesian Methods                             321
16.3* Prediction                                   326
16.4* Inferences from Predictive Distributions     330
16.5* Testing a True Hypothesis                    334

APPENDIX A
Answers to Selected Problems                       337

APPENDIX B
Tables                                             347

Index                                              357
Contents of Volume 1

Preface

CHAPTER 1
Introduction
1.1 Probability and Statistics
1.2 Observed Frequencies and Histograms
1.3 Probability Models
1.4 Expected Frequencies

CHAPTER 2
Equi-Probable Outcomes
2.1 Combinatorial Symbols
2.2 Random Sampling Without Replacement
2.3 The Hypergeometric Distribution
2.4 Random Sampling With Replacement
2.5 The Binomial Distribution
2.6* Occupancy Problems
2.7* The Theory of Runs
2.8* Symmetric Random Walks

CHAPTER 3
The Calculus of Probability
3.1 Unions and Intersections of Events
3.2 Independent Experiments and Product Models
3.3 Independent Events
3.4 Conditional Probability
3.5 Some Conditional Probability Examples
3.6 Bayes's Theorem
3.7* Union of n Events
Review Problems

CHAPTER 4
Discrete Variates
4.1 Definitions and Notation
4.2 Waiting Time Problems
4.3 The Poisson Distribution
4.4 The Poisson Process
4.5 Bivariate Distributions
4.6 Independent Variates
4.7 The Multinomial Distribution
Review Problems

CHAPTER 5
Mean and Variance
5.1 Mathematical Expectation
5.2 Moments; the Mean and Variance
5.3 Some Examples
5.4 Covariance and Correlation
5.5 Variances of Sums and Linear Combinations
5.6* Indicator Variables
5.7* Conditional Expectation
Review Problems

CHAPTER 6
Continuous Variates
6.1 Definitions and Notation
6.2 Uniform and Exponential Distributions
6.3* Transformations Based on the Probability Integral
6.4* Lifetime Distributions
6.5* Waiting Times in a Poisson Process
6.6 The Normal Distribution
6.7 The Central Limit Theorem
6.8 Some Normal Approximations
6.9 The Chi-Square Distribution
6.10 The F and t Distributions
Review Problems

CHAPTER 7
Bivariate Continuous Distributions
7.1 Definitions and Notation
7.2 Change of Variables
7.3 Transformations of Normal Variates
7.4* The Bivariate Normal Distribution
7.5* Conditional Distributions and Regression

CHAPTER 8
Generating Functions
8.1* Preliminary Results
8.2* Probability Generating Functions
8.3* Moment and Cumulant Generating Functions
8.4* Applications
8.5* Bivariate Generating Functions

APPENDIX A
Answers to Selected Problems

APPENDIX B
Tables

Index
CHAPTER 9

Likelihood Methods

The first volume dealt with probability models, and with mathematical
methods for handling and describing them. Several of the simplest discrete
and continuous probability models were considered in detail. This volume is
concerned with applications of probability models in problems of data
analysis and interpretation.
One important use of probability models is to provide simple mathemat-
ical descriptions of large bodies of data. For instance, we might describe a set
of 1000 blood pressure measurements as being like a sample of 1000
independent values from a normal distribution whose mean µ and variance
σ² are estimated from the data. This model gives a concise description of the
data, and from it we can easily calculate the approximate proportion of blood
pressure measurements which lie in any particular range. The accuracy of
such calculations will, of course, depend upon how well the normal distri-
bution model fits the data.
We shall be concerned primarily with applications of probability models in
problems of statistical inference, where it is desired to draw general
conclusions based on a limited amount of data. For instance, tests might be
run to determine the length of life of an aircraft component prior to failure
from metal fatigue. Such tests are typically very expensive and time
consuming, and hence only a few specimens can be examined. Based on the
small amount of data obtained, one would attempt to draw conclusions
about similar components which had not been tested. The link between the
observed sample and the remaining components is provided by the proba-
bility model. The data are used to check the adequacy of the model and to
estimate any unknown parameters which it involves. General statements
concerning this type of component are then based on the model.
A limited amount of data can be misleading, and therefore any general
conclusions drawn will be subject to uncertainty. Measurement of the extent of this uncertainty is an important part of the problem. An estimate is of little value unless we know how accurate it is likely to be.

In statistical inference problems, we usually start with a set of data, and with some information about the way in which the data were collected. We then attempt to formulate a probability model for the experiment which gave rise to the data. Examination of the data, and of other similar data sets, can be very useful at this stage. It is important to treat the data set in context, and to take full advantage of what is already known from other similar applications.

Usually the probability model will involve one or more unknown parameters which must be estimated from the data. We have already encountered this problem on several occasions, and have used the observed sample mean as an estimate of the mean of a Poisson or exponential distribution. Intuitively, this is a reasonable thing to do, but intuition may fail us in more complicated situations.

Section 9.1 introduces the method of maximum likelihood, which provides a routine procedure for obtaining estimates of unknown parameters. Section 2 considers the problem of estimating an unknown parameter θ on the basis of data from two independent experiments. Section 3 shows how the relative likelihood function may be used to rank possible values of θ according to their plausibilities.

Section 4 describes likelihood methods when the probability model is continuous. The special case of censoring in lifetime experiments is considered in Section 5. Section 6 discusses the invariance property of likelihood methods, and Section 7 describes a normal approximation to the log relative likelihood function. The use of Newton's method in finding maximum likelihood estimates and likelihood intervals is illustrated in Section 8.

In this chapter it is assumed that the probability model involves only one unknown parameter. Likelihood methods for the estimation of two or more unknown parameters are described in Chapter 10. Some theoretical properties of these estimation procedures are considered in Chapter 11. Chapter 12 introduces tests of significance, which are used to investigate whether various hypotheses of interest are consistent with the data. Several applications of significance tests to frequency data are given.

Traditionally, the normal distribution has played a very important role in statistical applications. Chapters 13 and 14 develop estimation procedures and significance tests for a variety of situations where measurements are assumed to be independent and normally distributed. Finally, Chapters 15 and 16 deal with some more advanced topics in statistical inference.

9.1 The Method of Maximum Likelihood

Suppose that a probability model has been formulated for an experiment, and that it involves a single unknown parameter θ. The experiment is performed and some data are obtained. We wish to use the data to estimate the value of θ. More generally, we wish to determine which of the possible values of θ are plausible or likely in the light of the observations.

The observed data can be regarded as an event E in the sample space for the probability model. The probability of event E can be determined from the model, and in general it will be a function of the unknown parameter, P(E; θ). The maximum likelihood estimate (MLE) of θ is the value of θ which maximizes P(E; θ). The MLE of θ is usually denoted by θ̂. It is the parameter value which best explains the data E in the sense that it maximizes the probability of E under the model.

EXAMPLE 9.1.1. Suppose that we wish to estimate θ, the proportion of people with tuberculosis in a large homogeneous population. To do this, we randomly select n individuals for testing, and find that x of them have the disease. Since the population is large and homogeneous, we assume that the n individuals tested are independent, and that each has probability θ of having tuberculosis. The probability of the observed event (data) is then

P(E; θ) = P(x out of n have tuberculosis) = (n choose x) θ^x (1 − θ)^(n−x)    (9.1.1)

where 0 ≤ θ ≤ 1. The maximum likelihood estimate θ̂ is the value of θ which maximizes (9.1.1). We shall show later that (9.1.1) is maximized for θ = x/n, and so the MLE of θ is θ̂ = x/n. To maximize the probability of the data we estimate θ, the proportion of diseased persons in the population, by x/n, the proportion of diseased persons in the sample.
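As a concrete illustration (not part of the original example), the same maximization can be done by brute force: evaluate P(E; θ) over a fine grid of θ values and take the maximizer. The counts n = 100 and x = 3 below are borrowed from Example 9.3.1 later in this chapter; this is only a sketch of the idea.

```python
import math

# Binomial model of Example 9.1.1: P(E; theta) = C(n, x) theta^x (1 - theta)^(n - x).
# Evaluate it over a grid of theta values and take the maximizer; the maximum
# sits at theta = x/n.  Illustrative figures: n = 100, x = 3 (as in Example 9.3.1).
n, x = 100, 3

def prob_data(theta):
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

grid = [i / 10000 for i in range(10001)]
theta_hat = max(grid, key=prob_data)
print(theta_hat)        # 0.03 = x/n
```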
The Likelihood and Log Likelihood Functions

Note that the constant factor (n choose x) will have no effect on the maximization of (9.1.1) over θ. To simplify expressions, we shall generally omit such constants and consider only the part of P(E; θ) which involves θ.

The likelihood function of θ is defined as follows:

L(θ) = c · P(E; θ).    (9.1.2)

Here c is any positive constant with respect to θ; that is, c is not a function of θ, although it may be a function of the data. We choose c to obtain a simple expression for L(θ), and subsequent results will not depend upon the specific choice made.

Usually P(E; θ) and L(θ) are products of terms, and it will be more convenient to work with logarithms. The log likelihood function is the natural logarithm of L:

l(θ) = log L(θ).    (9.1.3)

Note that, by (9.1.2),

l(θ) = c′ + log P(E; θ)

where c′ = log c is not a function of θ.

The maximum likelihood estimate θ̂ is the value of θ which maximizes P(E; θ). The value of θ which maximizes P(E; θ) will also maximize L(θ) and l(θ). Thus the MLE θ̂ is the value of θ which maximizes the likelihood function and the log likelihood function. Usually it is easiest to work with the log likelihood function.

EXAMPLE 9.1.1 (continued). The likelihood function of θ is any constant c times the expression for P(E; θ) in (9.1.1), where c may depend on n and x but not on θ. Since the aim in choosing c is to simplify the expression, a natural choice is c = 1/(n choose x), and then

L(θ) = θ^x (1 − θ)^(n−x)    for 0 ≤ θ ≤ 1.

The log likelihood function is now

l(θ) = x log θ + (n − x) log(1 − θ)    for 0 ≤ θ ≤ 1.

The MLE θ̂ is the value of θ which maximizes l(θ).

The Score and Information Functions

To evaluate θ̂, we need to locate the maximum of l(θ) over all possible values of θ. This can usually be done by differentiating l(θ) with respect to θ, setting the derivative equal to zero, and solving for θ. It is possible that this procedure might yield a relative minimum or point of inflexion instead of the maximum desired. Thus it is necessary to verify that a maximum has been found, perhaps by checking that the second derivative is negative.

The score function S(θ) is defined to be the first derivative of the log likelihood function with respect to θ:

S(θ) = l′(θ) = dl(θ)/dθ.    (9.1.4)

The information function 𝓘(θ) is minus the second derivative of the log likelihood function with respect to θ:

𝓘(θ) = −l″(θ) = −S′(θ) = −d²l(θ)/dθ².    (9.1.5)

Note that neither S(θ) nor 𝓘(θ) depends on the choice of c in (9.1.2).

The set Ω of possible values of θ is called the parameter space. Usually Ω is an interval of real values, such as [0, 1] in the example above, and the first and second derivatives of l(θ) with respect to θ exist at all interior points of Ω. Then, if θ̂ is an interior point of Ω, the first derivative will be zero and the second derivative will be negative at θ = θ̂. Thus under these conditions we have

S(θ̂) = 0;    𝓘(θ̂) > 0.    (9.1.6)

To find θ̂, we determine the roots of the maximum likelihood equation S(θ) = 0. We then verify, by checking the sign of 𝓘(θ) or otherwise, that a relative maximum has been found.

In some simple examples, the maximum likelihood equation S(θ) = 0 can be solved algebraically to yield a formula for θ̂. In more complicated situations, it will be necessary to solve this equation numerically (see Section 9.8).

Situations do arise in which θ̂ cannot be found by solving the maximum likelihood equation S(θ) = 0. For instance, S(θ̂) need not be zero if the overall maximum of l(θ) occurs on a boundary of the parameter space Ω (see Examples 9.1.1 and 9.1.2). The same is true if θ is restricted to a discrete set of possible values such as the integers (see Problems 9.1.7 and 9.1.11).

EXAMPLE 9.1.1 (continued). For this example, the score and information functions are

S(θ) = dl(θ)/dθ = x/θ − (n − x)/(1 − θ)    for 0 < θ < 1;
𝓘(θ) = −dS(θ)/dθ = x/θ² + (n − x)/(1 − θ)²    for 0 < θ < 1.

For 1 ≤ x ≤ n − 1, the maximum likelihood equation S(θ) = 0 has a unique solution θ = x/n. Since 𝓘(θ) > 0 at θ = x/n, the likelihood function has a relative maximum at θ = x/n. Furthermore, since L(θ) = 0 for θ = 0 and for θ = 1, we have found the overall maximum, and thus θ̂ = x/n.
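The verification in (9.1.6) can also be carried out numerically. A small sketch, again with the illustrative counts n = 100 and x = 3:

```python
# Score and information functions for the binomial model of Example 9.1.1:
# S(theta) = x/theta - (n - x)/(1 - theta),  I(theta) = x/theta^2 + (n - x)/(1 - theta)^2.
# A quick check of (9.1.6) at the interior maximum theta_hat = x/n.
n, x = 100, 3
theta_hat = x / n

def score(theta):
    return x / theta - (n - x) / (1 - theta)

def information(theta):
    return x / theta**2 + (n - x) / (1 - theta)**2

print(score(theta_hat))        # essentially 0: S(theta_hat) = 0
print(information(theta_hat))  # positive (about 3.4e3): a maximum has been found
```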
If x = 0, the equation S(θ) = 0 has no solution, and the maximum occurs on a boundary of the parameter space Ω = [0, 1]. In this case we have

P(E; θ) = (1 − θ)^n    for 0 ≤ θ ≤ 1,

which is clearly largest when θ = 0. Thus θ̂ = 0 when x = 0. Similarly we find that θ̂ = 1 when x = n, and the formula θ̂ = x/n holds for all x.

EXAMPLE 9.1.2. Some laboratory tests are run on samples of river water in order to determine whether the water is safe for swimming. Of particular interest is the concentration of coliform bacteria in the water. The number of coliform bacteria is determined for each of n unit-volume samples of river water, giving n observed counts x₁, x₂, ..., xₙ. The problem is to estimate µ, the average number of coliform bacteria per unit volume in the river.

We assume that the bacteria are distributed randomly and uniformly throughout the river water, so that the assumptions of a Poisson process (Section 4.4) are satisfied. Then the probability of observing xᵢ bacteria in a sample of unit volume is given by a Poisson distribution:

f(xᵢ) = µ^(xᵢ) e^(−µ) / xᵢ!    for xᵢ = 0, 1, 2, ....

Since disjoint volumes are independent, the probability of the n observed counts x₁, x₂, ..., xₙ is

P(E; µ) = f(x₁) f(x₂) ··· f(xₙ).

The likelihood function is c · P(E; µ) where c is any constant not depending upon µ. We choose c to simplify the expression for L(µ), and a natural choice here is

c = 1/(x₁! x₂! ··· xₙ!).

For this choice of c, the likelihood function is

L(µ) = µ^(Σxᵢ) e^(−nµ)    for 0 ≤ µ < ∞,

and the log likelihood function is

l(µ) = Σxᵢ log µ − nµ.

The score and information functions are

S(µ) = (1/µ) Σxᵢ − n;    𝓘(µ) = (1/µ²) Σxᵢ.

These functions will be the same no matter how the constant c is chosen.

If Σxᵢ > 0, the maximum likelihood equation S(µ) = 0 has a unique solution µ = (1/n) Σxᵢ = x̄. Since 𝓘(µ) > 0 at µ = x̄, we have found a relative maximum. Furthermore, since L(0) = 0 and L(µ) → 0 as µ → ∞, this must be the overall maximum.

If Σxᵢ = 0, the equation S(µ) = 0 has no solution, and the maximum occurs on the boundary of the parameter space: µ̂ = 0.

In both cases, the MLE is µ̂ = x̄. To maximize the probability of the data x₁, x₂, ..., xₙ, we estimate the population mean µ by the sample mean x̄.

EXAMPLE 9.1.3. It is usually not possible to count the number of bacteria in a sample of river water; one can only determine whether or not any are present. n test tubes each containing a volume v of river water are incubated and tested. A negative test shows that there were no bacteria present, while a positive test shows that at least one bacterium was present. If y tubes out of the n tested give negative results, what is the maximum likelihood estimate of µ?

SOLUTION. The probability that there are x bacteria in a volume v of river water is given by a Poisson distribution with mean µv:

f(x) = (µv)^x e^(−µv) / x!;    x = 0, 1, 2, ....

The probability of a negative reaction (no bacteria) is

p = f(0) = e^(−µv);

the probability of a positive reaction (at least one bacterium) is

1 − p = 1 − e^(−µv).

Since disjoint volumes are independent, the n test tubes constitute independent trials. The probability of observing y negative reactions out of n is therefore

P(E; µ) = (n choose y) p^y (1 − p)^(n−y)

where p = e^(−vµ) and 0 ≤ µ < ∞.

The likelihood function is c · P(E; µ) where c does not depend upon µ and is chosen to give a simple expression for L(µ). Taking c = 1/(n choose y), we have

L(µ) = p^y (1 − p)^(n−y)

where p = e^(−vµ) and 0 ≤ µ < ∞.

Since p = e^(−vµ), it follows that µ = −(1/v) log p. From Example 9.1.1, the function p^y (1 − p)^(n−y) is maximized for p = y/n. The corresponding value of µ is

µ̂ = −(1/v) log p̂ = −(1/v) log(y/n) = (log n − log y)/v.

Here we have used the invariance property of likelihood (see Section 9.6).

For instance, suppose that 40 test tubes each containing 10 ml of river water are incubated. If 28 give negative tests and 12 give positive tests, then

µ̂ = (log 40 − log 28)/10 = 0.0357.
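The same arithmetic in a short sketch (an illustration only, reproducing the computation just given):

```python
import math

# MLE for the dilution experiment of Example 9.1.3: with y negative tubes out of n,
# p_hat = y/n and mu_hat = -(1/v) log(y/n) = (log n - log y)/v.
n, y, v = 40, 28, 10.0   # 40 tubes of 10 ml, 28 negative

p_hat = y / n
mu_hat = -math.log(p_hat) / v
print(round(mu_hat, 4))   # 0.0357 bacteria per ml, as in the text
```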
The concentration of bacteria in the river is estimated to be 0.0357 per ml.

The greater the concentration of bacteria in the river, the more probable it is that all n test tubes will give positive results. Hence the larger the value of µ, the more probable the observation y = 0. If we observe y = 0, the MLE of µ will be +∞. In this case, it does not make much practical sense to give merely a single estimate of µ. What we require is an indication of the range of µ-values which are plausible in the light of the data, rather than a single "most plausible" value. This can be obtained by examining the relative likelihood function (see Section 3).

Likelihoods Based on Frequency Tables

Data from n independent repetitions of an experiment are often summarized in a frequency table:

Event or class        A₁    A₂    ...   Aₖ    Total
Observed frequency    f₁    f₂    ...   fₖ    n
Expected frequency    np₁   np₂   ...   npₖ   n

The sample space S for a single repetition of the experiment is partitioned into k mutually exclusive classes or events, S = A₁ ∪ A₂ ∪ ··· ∪ Aₖ. Then fⱼ is the number of times that Aⱼ occurs in n repetitions (Σfⱼ = n). Let pⱼ be the probability of event Aⱼ in any one repetition (Σpⱼ = 1). The pⱼ's can be determined from the probability model. If the model involves an unknown parameter θ, the pⱼ's will generally be functions of θ.

The probability of observing a particular frequency table is given by the multinomial distribution

P(E; θ) = [n! / (f₁! f₂! ··· fₖ!)] p₁^f₁ p₂^f₂ ··· pₖ^fₖ.

The likelihood function is

L(θ) = c p₁^f₁ p₂^f₂ ··· pₖ^fₖ    (9.1.7)

where we have absorbed the multinomial coefficient into the constant c. The MLE θ̂ is the value of θ which maximizes (9.1.7). Using θ̂, we can compute estimated expected frequencies np̂ⱼ for comparison with the observed frequencies fⱼ.

EXAMPLE 9.1.4. On each of 200 consecutive working days, ten items were randomly selected from a production line and tested for imperfections, with the following results:

Number of defective items    0     1     2     3     ≥4    Total
Frequency observed          133    52    12    3     0     200

The number of defective items out of 10 is thought to have a binomial distribution. Find the MLE of θ, the probability that an item is defective, and compute estimated expected frequencies under the binomial distribution model.

SOLUTION. According to a binomial distribution model, the probability of observing j defectives out of 10 is

pⱼ = (10 choose j) θ^j (1 − θ)^(10−j);    j = 0, 1, 2, ..., 10.

The probability of observing 4 or more defectives is p₄₊ = 1 − p₀ − p₁ − p₂ − p₃. By (9.1.7), the likelihood function of θ is

L(θ) = c p₀^133 p₁^52 p₂^12 p₃^3 p₄₊^0    for 0 ≤ θ ≤ 1,

where c is any convenient positive constant. Taking

c = 1 / [(10 choose 0)^133 (10 choose 1)^52 (10 choose 2)^12 (10 choose 3)^3],

we obtain

L(θ) = [(1 − θ)^10]^133 [θ(1 − θ)^9]^52 [θ²(1 − θ)^8]^12 [θ³(1 − θ)^7]^3 = θ^85 (1 − θ)^1915.

This likelihood function is of the form considered in Example 9.1.1, with x = 85 and n = 2000. Hence θ̂ = 85/2000 = 0.0425.

The estimated probability for class j = 0 is

p̂₀ = (10 choose 0) θ̂^0 (1 − θ̂)^10 = (1 − 0.0425)^10 = 0.6477,

and the estimated expected frequency for this class is

np̂₀ = 200(0.6477) = 129.54.

Similarly we can compute np̂ⱼ for j = 1, 2, 3 and then find the estimated expected frequency for the last class by subtraction from 200. The results are as follows:

Number of defectives    0       1       2       3      ≥4     Total
Observed frequency      133     52      12      3      0      200
Expected frequency      129.54  57.50   11.48   1.36   0.12   200

The agreement between observed and estimated expected frequencies appears to be reasonably good.
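For readers with computing facilities, a short sketch (an illustration, not part of the original solution) reproduces this fit:

```python
import math

# Example 9.1.4: binomial model fitted to the frequency table of defectives.
# Observed frequencies for j = 0, 1, 2, 3 defectives (the ">= 4" class had frequency 0).
obs = {0: 133, 1: 52, 2: 12, 3: 3}
days, m = 200, 10                      # 200 samples of 10 items each

# L(theta) reduces to theta^x (1 - theta)^(N - x) with x = sum of j * f_j
# and N = 10 * 200, so theta_hat = x / N.
x = sum(j * f for j, f in obs.items())
N = m * days
theta_hat = x / N
print(theta_hat)                       # 0.0425

# Estimated expected frequencies 200 * p_j(theta_hat); the ">= 4" class by subtraction.
p = [math.comb(m, j) * theta_hat**j * (1 - theta_hat)**(m - j) for j in range(4)]
expected = [days * pj for pj in p]
expected.append(days - sum(expected))
print([round(e, 2) for e in expected])
# [129.54, 57.5, 11.48, 1.36, 0.11]; the text's 0.12 for the last class
# comes from subtracting the rounded values.
```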
Since the items are chosen at random, we would not observe exactly the same results if we repeated the experiment. According to the model, the fⱼ's are observed values of random variables. Some differences between the observed and expected frequencies will occur owing to chance variation in the fⱼ's. A test of significance (Chapter 12) may be used to verify that the differences found here are not too great to be accounted for by chance variation, and hence the binomial model is satisfactory.

PROBLEMS FOR SECTION 9.1

1.† Suppose that diseased trees are distributed randomly and uniformly throughout a large forest with an average of λ per acre. The numbers of diseased trees observed in ten four-acre plots were 0, 1, 3, 0, 0, 2, 2, 0, 1, 1. Find the maximum likelihood estimate of λ.

2. Suppose that the n counts in Example 9.1.2 were summarized in a frequency table as follows:

Number of bacteria     0    1    2    ...   Total
Frequency observed     f₀   f₁   f₂   ...   n

The number of bacteria in a sample is assumed to have a Poisson distribution with mean µ. Find the likelihood function and maximum likelihood estimate of µ based on the frequency table, and show that they agree with the results obtained in Example 9.1.2.

3. Consider the following two experiments whose purpose is to estimate θ, the fraction of a large population having blood type A.
(i) Individuals are selected at random until 10 with blood type A are obtained. The total number of people examined is found to be 100.
(ii) 100 individuals are selected at random, and it is found that 10 of them have blood type A.
Show that the two experiments lead to proportional likelihood functions, and hence the same MLE for θ.

4.† According to genetic theory, blood types MM, NM, and NN should occur in a very large population with relative frequencies θ², 2θ(1 − θ), and (1 − θ)², where θ is the (unknown) gene frequency.
(a) Suppose that, in a random sample of size n from the population, there are x₁, x₂, and x₃ of the three types. Find an expression for θ̂.
(b) The observed frequencies in a sample of size 100 were 32, 46, and 22, respectively. Compute θ̂ and the estimated expected frequencies for the three blood types under the model.

5. A brick-shaped die (Example 1.3.2) is rolled n times, and the ith face comes up xᵢ times (i = 1, 2, ..., 6), where Σxᵢ = n.
(a) Show that θ̂ = (3t − 2n)/12n, where t = x₁ + x₂ + x₃ + x₄.
(b) Suppose that the observed frequencies are 11, 15, 13, 15, 22, 24. Compute estimated expected frequencies under the model.

6. A sample of n items is examined from each large batch of a mass-produced article. The number of good items in a sample has a binomial distribution with parameters n and p. The batch is accepted if all n items are good, and is rejected otherwise. Out of m batches, x are accepted and m − x are rejected. Find the maximum likelihood estimate of p.

7.† "The enemy" has an unknown number N of tanks, which he has obligingly numbered 1, 2, ..., N. Spies have reported sighting 8 tanks with numbers 137, 24, 86, 33, 92, 129, 17, 111. Assume that sightings are independent, and that each of the N tanks has probability 1/N of being observed at each sighting. Show that N̂ = 137.

8. Blood samples from nk people are analyzed to obtain information about θ, the fraction of the population infected with a certain disease. In order to save time, the nk samples are mixed together k at a time to give n pooled samples. The analysis of a pooled sample will be negative if the k individuals are free from the disease, and positive otherwise. Out of the n pooled samples, x give negative results and n − x give positive results. Find an expression for θ̂.

9.† Specimens of a new high-impact plastic are tested by repeatedly striking them with a hammer until they fracture. If the specimen has a constant probability θ of surviving a blow, independently of the number of previous blows received, the number of blows required to fracture a specimen will have a geometric distribution,

f(x) = θ^(x−1) (1 − θ)    for x = 1, 2, 3, ....

The results of tests on 200 specimens were as follows:

Number of blows required    1     2     3     ≥4    Total
Number of specimens        112    22    30    36    200

Find the maximum likelihood estimate of θ, and compute estimated expected frequencies.

10. The progeny in a breeding experiment are of three types, there being xᵢ of the ith type (i = 1, 2, 3). According to a genetic model, the proportions of the three types should be (2 + p)/4, (1 − p)/2, and p/4, and progeny are independent of one another.
(a) Show that p̂ is a root of the quadratic equation

np² + (2x₂ + x₃ − x₁)p − 2x₃ = 0.

(b) Suppose that x₁ = 58, x₂ = 33, and x₃ = 9. Find p̂, and compute estimated expected frequencies under the model.

11. An urn contains r red balls and b black balls, where r is known but b is unknown. Of n balls chosen at random without replacement, x were red and y were black (x + y = n).
(a) Show that L(b) is proportional to b^(y)/(r + b)^(n).
(b) Show that

L(b + 1)/L(b) = (b + 1)(r + b − n + 1) / [(b − y + 1)(r + b + 1)].

(c) By considering the conditions under which L(b + 1)/L(b) exceeds one, show that b̂ is the smallest integer which exceeds (y/x)(r + 1) − n/x. When is b̂ not unique?

12. For a certain mass-produced article, the proportion of defectives is θ. It is customary to inspect a sample of 3 items from each large batch. Records are kept only for those samples which contain at least one defective item.
(a) Show that the conditional probability that a sample contains i defectives, given that it contains at least one defective, is

(3 choose i) θ^i (1 − θ)^(3−i) / [1 − (1 − θ)³]    (i = 1, 2, 3).

(b) Suppose that xᵢ samples out of n recorded contain i defectives (i = 1, 2, 3; Σxᵢ = n). Show that θ̂ is the smaller root of the quadratic equation

tθ² − 3tθ + 3(t − n) = 0

where t = x₁ + 2x₂ + 3x₃.

13.† Leaves of a plant are examined for insects. The number of insects on a leaf is thought to have a Poisson distribution with mean µ, except that many leaves have no insects because they are unsuitable for feeding and not merely because of the chance variation allowed by the Poisson law. The empty leaves are therefore not counted.
(a) Find the conditional probability that a leaf contains i insects, given that it contains at least one.
(b) Suppose that xᵢ leaves are observed with i insects (i = 1, 2, 3, ...), where Σxᵢ = n. Show that the MLE of µ satisfies the equation

µ̂ = x̄(1 − e^(−µ̂))

where x̄ = Σi xᵢ / n.
(c) Determine µ̂ numerically for the case x̄ = 3.2.

14. In Problem 9.1.12, suppose that samples of size k > 3 are examined, and that xᵢ of those recorded contain i defectives (i = 1, 2, ..., k; Σxᵢ = n).
(a) Show that the MLE of θ satisfies the equation

x̄[1 − (1 − θ̂)^k] − kθ̂ = 0

where x̄ = Σi xᵢ / n.
(b) Use the binomial theorem to show that, if θ is small, then

θ̂ ≈ 2(x̄ − 1) / [(k − 1)x̄].

(c) Solve for θ̂ in the case k = 5, x̄ = 1.12.

9.2. Combining Independent Experiments

Suppose that two independent experiments both give information about the same parameter θ. Experiment 1 gives data E₁ with probability P(E₁; θ). The likelihood function of θ based on the first experiment is

L₁(θ) = c₁ · P(E₁; θ)

where c₁ is any positive constant. Similarly, experiment 2 gives data E₂, and the likelihood function is

L₂(θ) = c₂ · P(E₂; θ).

We wish to obtain the likelihood function of θ based on both sets of data, E₁ and E₂.

As in Section 3.2, we consider the two experiments as components of a single composite experiment. The sample space for the composite experiment is a Cartesian product, and the data from the composite experiment correspond to the event E₁E₂ in this space. The probability of the data is P(E₁E₂; θ), and the likelihood function based on both experiments is

L(θ) = c · P(E₁E₂; θ)

where c is any positive constant.

Since the experiments are independent, we have

P(E₁E₂; θ) = P(E₁; θ) P(E₂; θ).

It follows that

L(θ) = c′ L₁(θ) L₂(θ)

where c′ = c/c₁c₂ is any positive constant.

In the examples of the last section we chose the proportionality constant in (9.1.2) to simplify the expression for L(θ). We noted that S(θ), 𝓘(θ), and θ̂ are unaffected by the choice of c. In the same spirit, we can take c′ = 1 above. For this choice of c′ we have

L(θ) = L₁(θ) · L₂(θ),    (9.2.1)

and taking natural logarithms on both sides gives

l(θ) = l₁(θ) + l₂(θ).    (9.2.2)

To combine information about θ from two or more independent experiments, we multiply the likelihood functions, or add the log likelihood functions.

It follows from (9.2.2) and (9.1.4) that

S(θ) = S₁(θ) + S₂(θ).    (9.2.3)

The score function for the composite experiment is the sum of the score functions for the independent components. Similarly, (9.2.3) and (9.1.5) give

𝓘(θ) = 𝓘₁(θ) + 𝓘₂(θ).    (9.2.4)
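A small numerical sketch of (9.2.1)-(9.2.2), anticipating the analytical treatment in Example 9.2.1 below: the counts used here (3 successes in 100 trials and 6 in 200) are purely illustrative, borrowed from Example 9.3.1. The combined log likelihood is built on a grid and remaximized.

```python
import math

# Two independent binomial experiments bearing on the same theta
# (illustrative counts: x = 3 of n = 100, and y = 6 of m = 200).
n, x = 100, 3
m, y = 200, 6

def loglik(theta, trials, successes):
    return successes * math.log(theta) + (trials - successes) * math.log(1 - theta)

grid = [i / 10000 for i in range(1, 10000)]       # open interval (0, 1)

# Combined log likelihood l(theta) = l1(theta) + l2(theta), per (9.2.2);
# remaximizing gives the pooled estimate (x + y)/(n + m).
theta_hat = max(grid, key=lambda t: loglik(t, n, x) + loglik(t, m, y))
print(theta_hat, (x + y) / (n + m))               # both 0.03
```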
Let θ̂₁, θ̂₂, and θ̂ be the MLE's of θ based on just the first experiment, on just the second experiment, and on both experiments, respectively. Thus θ̂₁ maximizes l₁(θ), θ̂₂ maximizes l₂(θ), and θ̂ maximizes l(θ). Except in special cases, it is not possible to compute θ̂ from just θ̂₁ and θ̂₂. One has to add log likelihoods using (9.2.2) and then remaximize to get θ̂.

If θ̂₁ = θ̂₂, then both terms on the right hand side of (9.2.2) attain their maxima at the same value of θ, and hence θ̂ = θ̂₁ = θ̂₂. Otherwise, the overall maximum will usually lie between θ̂₁ and θ̂₂.

If the estimates θ̂₁, θ̂₂ were quite different, it would usually be unwise to combine results from the two experiments to obtain a single overall estimate. Instead the results from the two experiments should be reported separately, and an explanation of the difference should be sought. For further discussion see Example 9.3.2 and Section 12.3.

EXAMPLE 9.2.1. Suppose that, in Example 9.1.1, m additional people are randomly selected, and y of them are found to have tuberculosis. Find the MLE of θ based on both sets of data.

SOLUTION. For the first experiment, the log likelihood function is

l₁(θ) = x log θ + (n − x) log(1 − θ),    (9.2.5)

and the maximum likelihood estimate is θ̂₁ = x/n. For the second experiment, we similarly obtain

l₂(θ) = y log θ + (m − y) log(1 − θ),

and θ̂₂ = y/m. Because the population is large, the two samples will be very nearly independent, and hence by (9.2.2), the log likelihood function based on both samples is

l(θ) = l₁(θ) + l₂(θ) = (x + y) log θ + (n + m − x − y) log(1 − θ).    (9.2.6)

This is of the same form as (9.2.5), and the overall MLE is

θ̂ = (x + y)/(n + m).

Since x = nθ̂₁ and y = mθ̂₂, we have

θ̂ = [n/(n + m)] θ̂₁ + [m/(n + m)] θ̂₂,

which is a weighted average of θ̂₁ and θ̂₂. For instance, if 90 individuals are examined in the first sample (n = 90), and only 10 in the second (m = 10), we have

θ̂ = 0.9 θ̂₁ + 0.1 θ̂₂.

The overall MLE lies between θ̂₁ and θ̂₂, and is closer to θ̂₁, the MLE from the larger sample, than to θ̂₂.

Note that the log likelihood function (9.2.6) is the same as would be obtained if we considered a single sample of n + m individuals, x + y of whom were found to have tuberculosis. The division of the results into two separate experiments is irrelevant in so far as estimation of θ is concerned. □

EXAMPLE 9.2.2. In performing the experiment described in Example 9.1.3, it is necessary to specify the volume v of river water which is to be placed in each test tube. If v is made too large, then all of the test tubes will contain bacteria and give a positive reaction. If v is too small, we may get only negative reactions. In either case, the experiment will be rather uninformative about µ, the concentration of bacteria in the river.

One way to guard against this difficulty is to prepare two (or more) different types of test tubes containing different volumes of river water. Suppose that 40 test tubes containing 10 ml of river water were tested, and 28 gave negative results. Also, 40 test tubes containing 1 ml of river water were tested, and 37 gave negative results. What is the maximum likelihood estimate of µ?

SOLUTION. From Example 9.1.3, the likelihood function based on the 40 tubes containing 10 ml is

L₁(µ) = p₁^28 (1 − p₁)^12

where p₁ = e^(−10µ), and the MLE of µ is µ̂₁ = 0.0357. The log likelihood function is

l₁(µ) = 28 log p₁ + 12 log(1 − p₁).

Similarly, from the 40 tubes containing 1 ml we obtain

l₂(µ) = 37 log p₂ + 3 log(1 − p₂)

where p₂ = e^(−µ), and the MLE of µ is

µ̂₂ = (log n − log y)/v = (log 40 − log 37)/1 = 0.0780.

By (9.2.2), the log likelihood function based on all 80 tubes is

l(µ) = l₁(µ) + l₂(µ),

and the overall MLE µ̂ is chosen to maximize this function.
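A rough numerical sketch of this maximization (a simple grid search, not the Newton iteration of Section 9.8) reproduces the value derived below:

```python
import math

# Example 9.2.2: maximize l(mu) = l1(mu) + l2(mu) numerically.
# l1: 28 negatives of 40 tubes with 10 ml; l2: 37 negatives of 40 tubes with 1 ml.
def loglik(mu):
    p1, p2 = math.exp(-10 * mu), math.exp(-mu)
    return (28 * math.log(p1) + 12 * math.log(1 - p1)
            + 37 * math.log(p2) + 3 * math.log(1 - p2))

grid = [i / 100000 for i in range(1, 20000)]   # mu in (0, 0.2)
mu_hat = max(grid, key=loglik)
print(mu_hat)                                  # 0.04005, as found in the text
```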
For the first sample we have

S₁(µ) = dl₁(µ)/dµ = (dl₁/dp₁)(dp₁/dµ) = −10p₁[28/p₁ − 12/(1 − p₁)] = 120/(1 − p₁) − 400;
𝓘₁(µ) = −(d/dµ) S₁(µ) = −[120/(1 − p₁)²](dp₁/dµ) = 1200p₁/(1 − p₁)².

Similarly, for the second sample we obtain

S₂(µ) = 3/(1 − p₂) − 40;    𝓘₂(µ) = 3p₂/(1 − p₂)².

Thus, by (9.2.3) and (9.2.4), the combined results are

S(µ) = 120/(1 − p₁) + 3/(1 − p₂) − 440;    𝓘(µ) = 1200p₁/(1 − p₁)² + 3p₂/(1 − p₂)².

The score function is more complicated than in previous examples, and it is not possible to solve the maximum likelihood equation S(µ) = 0 algebraically. However, µ̂ can easily be found numerically. For instance, we could evaluate S(µ) for various values of µ, and hence find by trial and error the approximate value of µ for which S(µ) = 0. Alternatively, an iterative root-finding procedure such as Newton's method can be used (see Section 9.8). For this example we find that µ̂ = 0.04005, correct to five decimal places. □

PROBLEMS FOR SECTION 9.2

1.† (a) In a population in which the frequency of the gene for color blindness is θ, genetic theory indicates that the probability that a male is color-blind is θ, and the probability that a female is color-blind is θ². A random sample of M males is found to include m color-blind, and a random sample of N females includes n color-blind. Find the likelihood function of θ based on both samples, and show that θ̂ can be obtained as a root of a quadratic equation.
(b) One hundred males and 100 females were examined. Eleven males and two females were found to be color-blind. Find the MLE of θ based on the data for males, the MLE of θ based on the data for females, and the overall MLE of θ based on all of the data.

2. (a) If deaths from a rare non-contagious disease occur randomly and uniformly throughout the population, the number of deaths in a region of population P should have a Poisson distribution with mean λP. Suppose that the numbers of deaths observed in n regions with populations P₁, P₂, ..., Pₙ are y₁, y₂, ..., yₙ. Derive an expression for the MLE of λ.
(b) The following table shows the number of male deaths from cancer of the liver during 1964-8 for Ontario regions. Find λ̂ for these data, and compute the estimated expected number of deaths for each region. Do the data appear to be consistent with the assumptions in (a)?

Region                      Population    Deaths
1. Eastern Ontario             423,447        37
2. Lake Ontario                175,685        11
3. Central Ontario           1,245,379        72
4. Niagara                     413,465        40
5. Lake Erie                   216,476        12
6. Lake St. Clair              242,810        14
7. Mid-Western Ontario         213,591        16
8. Georgian Bay                166,045         9
9. Northeastern Ontario        265,880        15
10. Lakehead-NW Ontario        116,371        12

3. (a) Suppose that θ̂ is a weighted average of θ̂₁ and θ̂₂; that is,

θ̂ = a₁θ̂₁ + a₂θ̂₂

where a₁ and a₂ are positive real numbers with a₁ + a₂ = 1. Show that θ̂ must lie between θ̂₁ and θ̂₂.
(b) Suppose that

θ̂ = a₁θ̂₁ + a₂θ̂₂ + ··· + aₖθ̂ₖ

where the aᵢ's are positive and Σaᵢ = 1. Show that θ̂ must lie between the smallest and the largest of the θ̂ᵢ's.

9.3. Relative Likelihood

As in Section 9.1 we suppose that the data (observed event) E from an experiment has probability P(E; θ) which depends upon an unknown parameter θ. The maximum likelihood estimate θ̂ is the value of θ which maximizes P(E; θ). It is the "most likely" or "most plausible" value of θ in the sense that it maximizes the probability of what has been observed.

The relative plausibilities of other θ-values may be examined by comparing them with θ̂. Values of θ such that P(E; θ) is nearly as large as P(E; θ̂) are fairly plausible in that they explain the data almost as well as θ̂ does. Values of θ for which P(E; θ) is much less than P(E; θ̂) are implausible because they make what has been observed much less probable than does θ̂.

The relative likelihood function (RLF) of θ is defined as the ratio of the likelihood function L(θ) to its maximum L(θ̂):

R(θ) = L(θ)/L(θ̂).    (9.3.1)

Since L(θ) = c · P(E; θ) where c does not depend upon θ, it follows that

R(θ) = c·P(E; θ) / [c·P(E; θ̂)] = P(E; θ)/P(E; θ̂).

The multiplicative constant c in (9.1.2) cancels out of the expression for R(θ). Thus R(θ), like θ̂, S(θ), and 𝓘(θ), is not affected by the choice of c in (9.1.2).

Note that since L(θ) ≤ L(θ̂) for all possible θ-values, it follows that 0 ≤ R(θ) ≤ 1.
The log relative likelihood function is the natural logarithm of the relative likelihood function:

r(θ) = log R(θ) = log L(θ) − log L(θ̂).

It follows that

r(θ) = l(θ) − l(θ̂)    (9.3.2)

where l(θ) is the log likelihood function. Since 0 ≤ R(θ) ≤ 1, we have −∞ ≤ r(θ) ≤ 0 for all possible parameter values.

Let θ₁ denote some particular parameter value. Then

R(θ₁) = L(θ₁)/L(θ̂) = P(E; θ₁)/P(E; θ̂) = [probability of the data E when θ = θ₁] / [maximum probability of E for any value of θ].

If R(θ₁) = 0.1, say, then θ₁ is rather an implausible parameter value because the data are ten times more probable when θ = θ̂ than they are when θ = θ₁. However if R(θ₁) = 0.5, say, then θ₁ is a fairly plausible parameter value because it gives the data 50% of the maximum possible probability under the model. The relative likelihood function ranks all possible parameter values according to their plausibilities in the light of the data.

Usually θ̂ exists and is unique, and the definition (9.3.1) applies. More generally, R(θ) may be defined as the ratio of L(θ) to its supremum over all parameter values,

R(θ) = L(θ) / sup L(θ).

Since L(θ) = c · P(E; θ) where P(E; θ) ≤ 1, the supremum is finite. The relative likelihood function exists and may be used to rank parameter values according to their plausibilities even when θ̂ does not exist.

Likelihood Regions and Intervals

The set of θ-values for which R(θ) ≥ p is called a 100p% likelihood region for θ. Usually the 100p% likelihood region will consist of an interval of real values, and then it is called a 100p% likelihood interval (LI) for θ.

Usually we consider 50%, 10%, and 1% likelihood intervals or regions. Values inside the 10% LI will be referred to as "plausible", and values outside this interval as "implausible". Similarly, we shall refer to values inside the 50% LI as "very plausible", and values outside the 1% LI as "very implausible". Of course, the choice of division points at 50%, 10%, and 1% is rather arbitrary and should not be taken too seriously.

The 14.7% and 3.6% likelihood intervals are sometimes calculated. These correspond approximately to 95% and 99% confidence intervals (see Section 11.4).

Likelihood regions or intervals may be determined from a graph of R(θ) or its logarithm r(θ), and usually it is more convenient to work with r(θ). Since log 0.5 = −0.69, we have r(θ) ≥ −0.69 for the 50% likelihood interval. Similarly, r(θ) ≥ −2.30 for the 10% LI, and r(θ) ≥ −4.61 for the 1% LI.

Alternatively, the endpoints of the 100p% LI can be found as roots of the equation r(θ) − log p = 0. Usually it is necessary to solve this equation numerically (see Section 9.8).

When r(θ) has a simple form as in the examples of this section, the information about θ can be summarized adequately by reporting θ̂ and two or three likelihood intervals. Given these results, it is possible to reconstruct a graph of r(θ) to a reasonable approximation. Such a summary might not be appropriate if, for instance, r(θ) had several relative maxima and minima in the neighborhood of θ̂. In this case it would be better to present a graph of r(θ) than to attempt a summary.

EXAMPLE 9.3.1 (continuation of Example 9.1.1). Suppose that, out of 100 people examined, three are found to have tuberculosis. On the basis of this observation, which values of θ are plausible? Compare with the results that would be obtained if 200 people were examined and six were found to have tuberculosis.

SOLUTION. From Example 9.1.1, the log likelihood function is

l(θ) = 3 log θ + 97 log(1 − θ),

and the maximum likelihood estimate is θ̂ = 0.03. The maximum of the log likelihood is

l(θ̂) = 3 log(0.03) + 97 log(0.97) = −13.47.

The log relative likelihood function is thus

r(θ) = l(θ) − l(θ̂) = 3 log θ + 97 log(1 − θ) + 13.47.

A graph of this function is shown in Figure 9.3.1 (solid line). From the graph we find that r(θ) ≥ −2.30 for 0.006 ≤ θ ≤ 0.081, and this is the 10% LI for θ. Values of θ inside this interval are fairly plausible in light of the data. Similarly, the 50% LI is 0.014 ≤ θ ≤ 0.054. Values within this interval are quite plausible, because they give the data at least 50% of the maximum probability which is possible under the model.
If we observed 6 diseased out of 200, we would have

l(θ) = 6 log θ + 194 log(1 − θ),

and θ̂ = 0.03 as before. The maximum of the log likelihood is now

l(θ̂) = −26.95.

Figure 9.3.1 shows the corresponding log relative likelihood function with a broken line. Both functions attain their maximum at θ = 0.03. However the log RLF based on the sample of 200 people is more sharply peaked than the log RLF based on the sample of 100 people.

[Figure 9.3.1. Log relative likelihood functions from Example 9.3.1. Solid line: based on 3 diseased out of 100; broken line: based on 6 diseased out of 200.]

As a result, the larger sample gives shorter likelihood intervals for θ. For instance, the 10% LI is (0.011, 0.063) for the sample of 200, as opposed to (0.006, 0.081) for the sample of 100.

In general, increasing the number of independent observations will produce a more sharply peaked likelihood function and thus shorter likelihood intervals for θ. With more observations there will be a shorter range of plausible values for θ, and so θ can be more precisely estimated. As a rough guide, the length of the 100p% likelihood interval is inversely proportional to the square root of the number of independent observations. Thus about 4 times as many observations are needed to produce an interval only half as wide. □

EXAMPLE 9.3.2. In Example 9.2.2, we considered data from two experiments with test tubes containing river water:

Observation 1: y = 28 negative reactions out of n = 40 test tubes each containing v = 10 ml.
Observation 2: y = 37 negative reactions out of n = 40 tubes with v = 1.

Graph the log relative likelihood functions and obtain 50% likelihood intervals for µ based on the two observations taken separately, and taken together.

SOLUTION. The log likelihood function based only on observation 1 is

l₁(µ) = 28 log p₁ + 12 log(1 − p₁);    p₁ = e^(−10µ).

Since p̂₁ = y/n = 0.7 at the maximum (Example 9.1.3), the maximum log likelihood is

l₁(µ̂₁) = 28 log 0.7 + 12 log 0.3 = −24.43.

The log relative likelihood function is then

r₁(µ) = l₁(µ) − l₁(µ̂₁) = −280µ + 12 log(1 − e^(−10µ)) + 24.43.

Similarly, the log relative likelihood function based only on observation 2 is

r₂(µ) = −37µ + 3 log(1 − e^(−µ)) + 10.66.

For both observations together, the log LF is

l(µ) = l₁(µ) + l₂(µ) = −317µ + 12 log(1 − e^(−10µ)) + 3 log(1 − e^(−µ)).

From Example 9.2.2, the overall MLE is µ̂ = 0.04005, and substitution of this value gives l(µ̂) = −35.71. The log RLF based on both observations is thus

r(µ) = l(µ) + 35.71.

The three log RLF's are tabulated in Table 9.3.1 and graphed in Figure 9.3.2, with r(µ) being given by the broken line. From the graphs, the following 50% likelihood intervals may be obtained:

Observation 1 only:            0.025 ≤ µ ≤ 0.049
Observation 2 only:            0.036 ≤ µ ≤ 0.144
Both observations combined:    0.029 ≤ µ ≤ 0.053

Table 9.3.1. Log Relative Likelihood Functions for Example 9.3.2

µ        r₁(µ)    r₂(µ)    r(µ)
0.005             −5.43
0.01     −6.59    −3.55    −9.51
0.015    −3.42    −2.52    −5.32
0.018    −2.25    −2.09    −3.71
0.02     −1.66    −1.85    −2.89
0.025    −0.67    −1.37    −1.42
0.03     −0.17    −1.02    −0.57
0.04     −0.08    −0.54    −0.00
0.05     −0.76    −0.26    −0.39
0.06     −1.92    −0.09    −1.39
0.07     −3.40    −0.02    −2.80
0.08     −5.12    −0.00    −4.50
0.10              −0.10
0.20              −1.87
0.30              −4.50
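A short sketch (an illustration only) reproduces a row of Table 9.3.1 and recovers the combined 50% likelihood interval directly from the log likelihoods:

```python
import math

# Example 9.3.2: log relative likelihoods r1, r2 (each observation alone) and r (combined).
def l1(mu):  # 28 negative of 40 tubes, 10 ml each
    p1 = math.exp(-10 * mu)
    return 28 * math.log(p1) + 12 * math.log(1 - p1)

def l2(mu):  # 37 negative of 40 tubes, 1 ml each
    p2 = math.exp(-mu)
    return 37 * math.log(p2) + 3 * math.log(1 - p2)

mu1_hat = math.log(40 / 28) / 10          # 0.0357, from Example 9.1.3
mu2_hat = math.log(40 / 37)               # 0.0780
mu_hat = 0.04005                          # combined MLE, from Example 9.2.2

def r1(mu): return l1(mu) - l1(mu1_hat)
def r2(mu): return l2(mu) - l2(mu2_hat)
def r(mu):  return l1(mu) + l2(mu) - (l1(mu_hat) + l2(mu_hat))

# Spot-check the mu = 0.02 row of Table 9.3.1 ...
print(round(r1(0.02), 2), round(r2(0.02), 2), round(r(0.02), 2))   # -1.66 -1.85 -2.89
# ... and recover the combined 50% likelihood interval.
inside = [m / 1000 for m in range(1, 300) if r(m / 1000) >= math.log(0.5)]
print(min(inside), max(inside))   # about 0.030 and 0.053 (text, from the graph: 0.029 to 0.053)
```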
[Figure 9.3.2. Combination of log RLFs from independent experiments.]

The log RLF based on observation 2 only is almost flat over a large range of µ-values, indicating that this observation provides relatively little information about µ. The combined log RLF based on all the data is very nearly the same as that based on observation 1 alone.

The combined log RLF r(µ) can be obtained directly from a table or graph of r₁(µ) and r₂(µ). We form the sum r₁(µ) + r₂(µ), and observe the value of µ at which it is greatest. This will be the overall MLE µ̂. The combined log RLF is then

r(µ) = r₁(µ) + r₂(µ) − [r₁(µ̂) + r₂(µ̂)].

If r₁(µ̂) + r₂(µ̂) is small (e.g. less than −2), then there exists no single value of µ which is plausible on both sets of data. The two sets of data are then in contradiction, since they point to different values for the same parameter µ. When this happens, it is generally inadvisable to combine the two data sets. Instead, the parameter should be estimated separately for each data set, and an explanation for the discrepancy should be sought (see Section 12.3).

In the present example, we find that r₁(µ̂) + r₂(µ̂) = −0.62. There do exist values of µ (near 0.04) which are quite plausible for both observations, and hence no contradiction is apparent. It is therefore reasonable to combine the two observations, and to base statements about µ on r(µ), the combined RLF. □

EXAMPLE 9.3.3. Relative likelihood when µ̂ = +∞. Suppose that n = 40 test tubes are prepared, each containing v = 10 ml of river water, and that all of them give positive results (y = 0). The likelihood function of µ is then

L(µ) = (1 − p)^40 = (1 − e^(−10µ))^40    for 0 ≤ µ < ∞.

Then, as we noted at the end of Example 9.1.3, L(µ) increases as µ increases to +∞. We say that µ̂ = +∞, although strictly speaking µ̂ does not exist because this value does not belong to the parameter space.

Even when µ̂ does not exist, the relative likelihood function is well defined and can be used to determine the range of plausible parameter values. As µ tends to +∞, L(µ) increases to 1, and hence

sup L(µ) = 1,

the supremum being taken over 0 ≤ µ < ∞. The relative likelihood function of µ is then

R(µ) = L(µ)/sup L(µ) = (1 − e^(−10µ))^40    for 0 ≤ µ < ∞.

The log relative likelihood function,

r(µ) = 40 log(1 − e^(−10µ)),

is plotted in Figure 9.3.3. We have r(µ) ≥ −0.69 for µ > 0.41, and hence the 50% LI for µ is (0.41, ∞). Any value of µ which exceeds 0.41 is very plausible in light of the data. Similarly, we have r(µ) ≤ −4.61 for µ ≤ 0.22, so that any value of µ less than 0.22 is extremely implausible.

[Figure 9.3.3. Log relative likelihood function when µ̂ = +∞. The 50%, 10%, and 1% likelihood intervals are (0.41, ∞), (0.29, ∞), and (0.22, ∞).]
25
Plot the log RLF of µ, and from the graph obtain 50% and 10% likelihood intervals for µ.

3. A company plans to purchase either machine 1 or machine 2, and has available the following performance data:

    Machine 1: 0 failures in 7800 trials
    Machine 2: 4 failures in 21804 trials.

Trials are independent, and the probability of failure is θ1 for machine 1 and θ2 for machine 2. Plot the log RLF's of θ1 and θ2 on the same graph. Under what conditions would you recommend the purchase of machine 2 rather than machine 1?

4.† Find the relative likelihood of θ = 0 (a balanced die) in Problem 9.1.5(b).

5. (a) Plot the log RLF of the gene frequency θ in Problem 9.1.4(b).
   (b) A random sample of 200 individuals is taken from a second large population in which θ may be different. The numbers of individuals with the three blood types are found to be 48, 102, and 50. Plot the log RLF based on this sample on the graph prepared in (a).
   (c) Indicate, with reasons, whether you think that θ could be the same in both populations. If it is appropriate to do so, obtain the log RLF for θ based on both samples, and show it on the graph prepared in (a).

6. Find 50% and 10% likelihood intervals for N in Problem 9.1.7.

7.† Suppose that r = n = 10 and y = 5 in Problem 9.1.11. Which values of b have relative likelihood 50% or more? 10% or more?

8. In Problem 9.1.10(b), graph the log RLF of p and obtain a 10% LI for p.

9. The records from 200 samples in Problem 9.1.12 showed 180 with one defective, 17 with two defectives, and 3 with three defectives. Evaluate θ̂, plot the log RLF of θ, and obtain a 10% likelihood interval for θ.

10.† A solution in which virus particles are suspended is poured over a cell sheet. Each virus particle attacks the cells to form a plaque which is visible. The cell sheet is divided into disjoint regions of equal area. The following are the numbers of plaques observed in 20 regions:

    2 2 1 4 8   3 2 6 0   1 3 2 0 1 5 2 3 4

   (a) Suppose that the virus particles are randomly and uniformly distributed over the cell sheet at the rate of λ per region. Plot the log RLF of λ and find a 10% LI.
   (b) Suppose that, for each region, the experimenter recorded only whether the result was positive (at least one plaque) or negative (no plaques). Thus for the 20 regions referred to above, we would know only that there were 18 positives and 2 negatives. Plot the new log RLF for λ on the graph prepared in (a) and find a 10% LI. Has much information about λ been lost in recording only positives and negatives?

11. The following model is proposed for the distribution of family size in a large population:

    P(k children in family) = α^k   for k = 1, 2, ...;
    P(0 children in family) = (1 − 2α)/(1 − α).

Here α is an unknown parameter, and 0 < α < ½. Fifty families were chosen at random from the population. The observed numbers of children are summarized in the following frequency table:

    No. of children        0    1    2    3    4
    Frequency observed    17   22    7    3    1

   (a) Find the MLE of α and calculate estimated expected frequencies. Does the model give a reasonable fit to the data?
   (b) A large study done 20 years earlier indicated that α = 0.45. Is this plausible for the current data?

9.4. Likelihood for Continuous Models

Continuous probability distributions are frequently used as probability models for experiments involving the measurement of time, weight, length, etc. Suppose that X has a continuous distribution with probability density function f and cumulative distribution function F, depending upon an unknown parameter θ. The experiment is performed and values of X are observed. The problem is to use the data to estimate θ, or more generally, to determine which values of θ are plausible in light of the data.

When X is a continuous variate, f(x) does not give the probability of observing the value x. In fact, as we noted in Section 6.1, the probability of any particular real value is zero. An actual measurement of time, weight, etc. will necessarily be made to only finitely many decimal places. An observed value x will therefore correspond to some small interval of real values a < X ≤ b, say. The probability of observing the value x is then

    P(a < X ≤ b) = ∫ from a to b of f(x) dx = F(b) − F(a).   (9.4.1)

Suppose that, in n independent repetitions, we observe n values x1, x2, ..., xn, with xi corresponding to the real interval [ai, bi]. Because repetitions are independent, the probability of the data is obtained as a product:

    P(E; θ) = ∏ P(ai < X ≤ bi) = ∏ [F(bi) − F(ai)],   i = 1, 2, ..., n.   (9.4.2)

The likelihood function of θ is proportional to (9.4.2).

If the interval length Δi = bi − ai is small, then F(bi) will be close to F(ai),
and computation of the difference F(bi) − F(ai) may introduce serious roundoff errors. In this case, we make use of (6.1.7), and approximate the area under the density function between ai and bi by the area of a rectangle with base Δi and height f(xi):

    F(bi) − F(ai) ≈ f(xi)Δi.   (9.4.3)

Some or all of the factors in (9.4.2) are approximated in this way to obtain a function which is easier to deal with computationally and mathematically. In the most usual case, all of the measurement intervals Δi are small, and the approximation (9.4.3) may be applied to all of the terms in (9.4.2). This gives

    P(E; θ) ≈ ∏ f(xi)Δi = [∏ Δi][∏ f(xi)].

Since the Δi's do not depend upon θ, the likelihood function is proportional to the product of probability densities:

    L(θ) = c · ∏ f(xi)   (9.4.4)

where c is any convenient positive constant. This is actually an approximation, but it will be an extremely accurate one whenever the Δi's are all small.

It is not necessary to replace every factor in (9.4.2) by the approximation (9.4.3). For instance, it may happen that f(x) changes rapidly when x is small, in which case the original terms in (9.4.2) could be retained for small values xi, and the approximation could be used for large xi's. Another situation where some of the terms in (9.4.2) should be retained will be discussed in the next section.

EXAMPLE 9.4.1. A certain type of electronic component is susceptible to instantaneous failure at any time. However, components do not deteriorate with age, and the chance of failure within a given time period does not depend upon the age of the component. From Section 6.2, the lifetime of such a component should have an exponential distribution, with probability density function

    f(x) = (1/θ)e^(−x/θ)   for x > 0,

where θ is the expected lifetime of such components.

Ten such components were tested independently. Their lifetimes, measured to the nearest day, were as follows:

    70  11  66  5  20  4  35  40  29  8.

What values of θ are plausible in the light of the data?

SOLUTION BASED ON (9.4.4). Each observed lifetime corresponds to an interval of length Δ = 1. The average lifetime is about 30, and the exponential p.d.f. with mean θ = 30 changes very little over an interval of length 1. Areas under the p.d.f. will thus be well approximated by rectangles, and (9.4.4) should give an accurate approximation. We substitute for f(xi) in (9.4.4) and take c = 1 to obtain

    L(θ) = ∏ (1/θ)e^(−xi/θ) = θ^(−n) exp(−(1/θ)Σxi).

The log likelihood function is

    l(θ) = −n log θ − (1/θ)Σxi.

The score and information functions are

    S(θ) = −n/θ + (1/θ²)Σxi;    J(θ) = −n/θ² + (2/θ³)Σxi.

We may now solve the maximum likelihood equation S(θ) = 0 to obtain θ̂ = (1/n)Σxi = x̄. Note that

    J(θ̂) = −n/θ̂² + 2nθ̂/θ̂³ = n/θ̂² > 0,

and hence the root obtained is a relative maximum.

Figure 9.4.1. Log relative likelihood function for the mean based on ten observations from an exponential distribution (50% LI: 20 ≤ θ ≤ 43; 10% LI: 16 ≤ θ ≤ 62; 1% LI: 12 ≤ θ ≤ 90).
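The calculation just outlined is easy to carry out numerically. The following is a minimal sketch (Python, assuming numpy and scipy are available; the helper names are illustrative, not from the text) which evaluates θ̂ for the ten lifetimes and locates the endpoints of the 10% likelihood interval by solving r(θ) = log 0.10 on either side of the maximum.

    import numpy as np
    from scipy.optimize import brentq

    # Lifetimes from Example 9.4.1, measured to the nearest day
    x = np.array([70, 11, 66, 5, 20, 4, 35, 40, 29, 8])
    n, total = len(x), x.sum()
    theta_hat = total / n                    # MLE of the exponential mean: 28.8

    def log_lik(theta):
        # l(theta) = -n log(theta) - (1/theta) * sum(x), from (9.4.4)
        return -n * np.log(theta) - total / theta

    def log_rel_lik(theta):
        # r(theta) = l(theta) - l(theta_hat)
        return log_lik(theta) - log_lik(theta_hat)

    # Endpoints of the 100p% likelihood interval satisfy r(theta) = log(p)
    p = 0.10
    left = brentq(lambda t: log_rel_lik(t) - np.log(p), 1.0, theta_hat)
    right = brentq(lambda t: log_rel_lik(t) - np.log(p), theta_hat, 1000.0)
    print(theta_hat, left, right)            # roughly 28.8, 15.65, 61.88

The same root-finding approach works for any one-parameter log RLF; only the function log_lik needs to change.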
The total of the n = 10 observed lifetimes is Σxi = 288, so that θ̂ = 28.8 and

    l(θ) = −10 log θ − 288/θ.

The log relative likelihood function, r(θ) = l(θ) − l(θ̂), is plotted in Figure 9.4.1. The observations indicate a mean lifetime between 20 and 43 days (50% LI). Values of θ less than 16 days or greater than 62 days are implausible (relative likelihood less than 10%).

EXACT SOLUTION BASED ON (9.4.2). For comparison, we shall determine the exact likelihood function based on (9.4.2). The c.d.f. of the exponential distribution with mean θ is

    F(x) = 1 − e^(−x/θ)   for x > 0.

An observed integer value x > 0 corresponds to a real interval x ± 0.5, with probability

    F(x + 0.5) − F(x − 0.5) = exp(−(x − 0.5)/θ) − exp(−(x + 0.5)/θ).

Hence by (9.4.2), the probability of observed values x1, x2, ..., xn is

    P(E; θ) = ∏ [exp(1/(2θ)) − exp(−1/(2θ))] exp(−xi/θ).

The likelihood function is L(θ) = c · P(E; θ), and we take c = 1 for convenience. The log likelihood function is

    l(θ) = n log[exp(1/(2θ)) − exp(−1/(2θ))] − (1/θ)Σxi,

and the MLE θ̂ is obtained by solving the equation S(θ) = 0 numerically. The exact log RLF is now r(θ) = l(θ) − l(θ̂). For the ten observations given, we find that θ̂ = 28.797, which is very close to our previous result (θ̂ = 28.800). Table 9.4.1 compares the exact log RLF with the approximate log RLF which we obtained previously from (9.4.4).

Table 9.4.1. Comparison of Exact and Approximate Likelihoods Based on Ten Observations from an Exponential Distribution

    θ      Exact r(θ)         Approx. r(θ)       Difference
           based on (9.4.2)   based on (9.4.4)   (9.4.2) − (9.4.4)
    5      −30.0745           −30.0906           +0.0161
    10      −8.2184            −8.2221           +0.0037
    12      −5.2429            −5.2453           +0.0024
    15      −2.6754            −2.6767           +0.0013
    20      −0.7530            −0.7536           +0.0006
    25      −0.1048            −0.1050           +0.0002
    40      −0.4853            −0.4850           −0.0003
    60      −2.1401            −2.1397           −0.0004
    80      −3.8169            −3.8165           −0.0004
    100     −5.3284            −5.3279           −0.0005
    200    −10.8199           −10.8194           −0.0005
    300    −14.3946           −14.3941           −0.0005

The agreement is extremely close over the range 12 ≤ θ ≤ 100, which includes all but the most implausible parameter values. As one might expect, the agreement becomes worse as θ becomes small; for then the p.d.f. changes more rapidly over a short interval, and the approximation (9.4.3) is less accurate.

More generally, if an observation x from an exponential distribution corresponds to a real interval x ± h, the ratio of the exact probability (9.4.1) to the approximate probability (9.4.3) is

    [exp(−(x − h)/θ) − exp(−(x + h)/θ)] / [(1/θ) exp(−x/θ) · 2h] = (e^c − e^(−c)) / (2c) = 1 + c²/3! + c⁴/5! + ...,

where c = h/θ is the ratio of half the length of the measurement interval to the mean of the distribution. The approximation will be accurate whenever c is small.

PROBLEMS FOR SECTION 9.4

1.† The following are the times (in hours) between successive failures of the air-conditioning system in an aircraft:

     97   51   11    4  141   18  142   68   77
     80    1   16  106  206   82   54   31  216
    111   39   63   18  191   18  163   24   46
(a) Assuming that these are independent observations from an exponen
tial (b) The original solution is diluted by half so that the concentration is
distribution with mean 0, find 0 and the 10% likelihood interval for now µ/2,
0. and m additional measurements Yi, y 2 , ... , Ym are taken. Find the MLE
(b) Prepare a frequency table for these data using classes (0, 50], ofµ
(50, 100], based on all n + m measurements.
(100, 200], and (200, oo ). Calculate estimated expected frequencies for
these
classes under the assumption in (a). Does the exponential distribut 7. A laboratory method for determining the concentration of a trace
ion metal in
appear to give a reasonable model for the data? solution produces independent N(O, u 2 ) errors. If the true concentr
ation is µ,
then the measured concentration Xis a random variable distributed as
2. Family income X is measured on a scale such that X = 1 correspo N(µ, u 2 ).
nds to a In order to estimate u, several measurements are taken for a solution
subsistence level income. The p.d.f. of the income distribution is assumed with
to be known concentration µ.
    f(x) = θ/x^(θ+1)   for x ≥ 1
(a) The following 5 measurements were made on a solution with
where 0 > 0. The following are the incomes of ten randomly selected known
families: concentration µ = 10:
1.02, 1.41, 1.75, 2.31, 3.42, 4.31, 9.21, 17.4, 38.6, 392.8 9.3 11.2 8.7 10.1 10.7
Find the MLE and the 10% likelihood interval for 0.
Plot the log RLF of u.
3. It is thought that the times between particle emissions from a radioacti (b) The following 5 measurements were made on a solution with
ve source known
are exponentially distributed with mean 0. However, the Geiger counter concent rationµ= 20:
used to
register the emissions locks for 1 unit of time after recording an emission
. Thus 21.7 19.9 20.3 20.4 19.7
the p.d.f. of the time X between successive recordings is
Plot the log RLF of u based on these data on the graph prepared in
    f(x) = (1/θ)e^(−(x−1)/θ)   for x ≥ 1.
(a).
0 on all
The following are ten observed times between recordings: ten measurements.
8. A scientist makes n measurements Xi, x , ... , x. of a constant
1.47 1.46 2.20 2 µ using a
1.36 2.90 technique with known error variance u 2 , and m additional measure
3.71 3.89 1.29 1.86 1.81 ments
Y1>Yz, ... ,ym ofµ using a technique with known error variance
ku 2 • Assuming
Find the MLE and the 10% LI for 0. that all measurements are independent and normally distributed, find
the MLE
ofµ. Show that, if n = m and k > 1, then µ is closer to x than
4. t A manufacturing process produces fibers of varying lengths. The length to ji, and explain
of a fiber why this is desirable.
is a continuous variate with p.d.f.
f(x) = 9. (a) Suppose that U is a continuous variate, and that U/O has a x2 distribut
e- 2 xe~~x/O for x > 0 ion
with n degrees of freedom. Find the p.d.f. of U, and show that 0 = U
where 0 > 0 is an unknown parameter. Suppose that n randomly selected /n.
fibers (b) Suppose that Vis independent of U, and V/O has a x2 distribution
have lengths x 1> x 2 , ... , x •. Find expressions for the MLE and RLF with m
of 0. degrees of freedom. Find the joint p.d.f. of U and V, and show that the
MLE
5. Let Y denote the time to failure of an electrical component. The distribut of 0 based on both U and V is (U + V)/(n + m).
ion of Y
is exponential with mean ()/t, where tis the temperature at which the compone
is operated. Suppose that n components are tested independently
nt 10.t The probability density function for a unit exponential distribut
ion with
at temper- guarantee time c > 0 is
atures tI> t 2 , .. ., t., respectively, and their observed lifetimes are y ,
1 y 2 , .. ., Yn·
Derive an expression for the MLE of 0. f(x) =ec-x for x;:::: c.
6.t A laborato ry method for determining the concentration of a Suppose that x 1 , x 2 , .. ., x. are independent observations from this distribut
trace metal in ion.
solution produces N(O, u 2 ) errors. If the true concentration is µ,
then the (a) Show that c= x(i)• the smallest observation, and find the RLF of
measured concentration X is a random variable distributed as N(µ, 2 c.
u ). The (b) Find an expression for the 100p% likelihood interval for c.
value of u is known from previous experience with the method.
(a) Let x 1,x 2, ... ,x. be independent measurements of the .same 11. Suppose that x 1 , x 2 , ... , Xn are independent observations from the continuo
unknown us
concentration µ. Show that µ = x, and that the log RLF ofµ is uniform distribution over the interval [O, O]. Show that the likelihoo
d function
of 0 is proportional to o-n for 0 2 x(n)> and IS zero otherwise. Hence
n determine
the MLE and RLF of θ.

    r(µ) = −(n/(2σ²))(x̄ − µ)²   for −∞ < µ < ∞.

Hint: Show that Σ(xi − µ)² = Σ(xi − x̄)² + n(x̄ − µ)².

12.† Suppose that x1, x2, ..., xn are independent observations from the continuous uniform distribution over the interval [0, 2θ]. Find the RLF of θ.
13. Suppose that X and Y are continuous variates with joint probability density function

    f(x, y) = e^(−θx − y/θ)   for x > 0, y > 0.

Find the MLE and RLF of θ on the basis of n independent pairs of observations (xi, yi), i = 1, 2, ..., n.

14. Independent measurements x1, x2, ..., xn are taken at unit time intervals. For i = 1, 2, ..., θ the measurements come from a standardized normal distribution N(0, 1). A shift in the mean occurs after time θ, and for i = θ + 1, θ + 2, ..., n the measurements come from N(1, 1).
   (a) Show that the likelihood function of θ is proportional to

    exp{−Σ from i = 1 to θ of (xi − ½)}.

   (b) Graph the log RLF for θ on the basis of the following set of 20 consecutive measurements:

    −1.26  −0.16  −0.64   0.56  −1.82  −0.76  −2.08  −0.58   0.14  0.94
    −0.58   0.78   1.80   0.58   0.02   0.86   2.30   1.80   0.84 −0.18

   Which values of θ have relative likelihood 10% or more?

15.* The p.d.f. of the double exponential distribution is

    f(x) = ½ exp{−|x − θ|}   for −∞ < x < ∞,

where −∞ < θ < ∞. Let X1, X2, ..., Xn be independent observations from this distribution, and let x(1) ≤ x(2) ≤ ... ≤ x(n) denote these n observed values arranged in nondecreasing order.
   (a) Show that, if n is odd, then θ̂ = x(m) where n = 2m − 1.
   (b) Show that, if n = 2m, then l(θ) is maximized for any value of θ between x(m) and x(m+1), and so θ̂ is not unique.

9.5. Censoring in Lifetime Experiments

In many experiments, the quantity of interest is the lifetime (or time to failure) of a specimen; for instance, the lifetime of an electronic component, or the length of time until an aircraft component fails from metal fatigue, or the survival time of a cancer patient after a new treatment.

The probability model generally assumes the lifetime X to be a continuous variate with some particular probability density function f and cumulative distribution function F. For example, if we thought that the chance of failure did not depend upon the age of the specimen, we would assume an exponential distribution. Lifetime distributions for situations in which the risk of failure increases or decreases with age were considered in Section 6.4. The model will usually involve one or more unknown parameters θ which require estimation from the data.

Suppose that n specimens are tested independently. If the experiment is continued sufficiently long for all of the items to have failed, the likelihood function for θ based on the n observed lifetimes x1, x2, ..., xn can be obtained as in the last section. However, one might wait a very long time indeed for all of the specimens to fail, and it is often desirable to analyze the data before this happens. One or two hardy specimens may tie up a laboratory for months or years without greatly adding to the information about θ, at the same time preventing other experiments from being undertaken. It often makes good practical sense to terminate the experiment before all n items have failed.

If the ith specimen has failed by the time the experiment terminates, we will know its lifetime xi. This will actually correspond to a real interval ai < X ≤ bi, say, with probability

    P(ai < X ≤ bi) = F(bi) − F(ai) ≈ f(xi)Δi,

provided that the time interval Δi = bi − ai is small.

If the jth specimen has not failed when the experiment ends, we will not know its lifetime, and the lifetime is said to be censored. The censoring time Tj is the total time for which the specimen had been tested when the experiment ended. For this specimen, we know only that Tj < X < ∞, and the probability of this event is

    P(Tj < X < ∞) = F(∞) − F(Tj) = 1 − F(Tj).

The likelihood function of θ will be a product of n factors, one for each specimen tested. Suppose that m specimens fail and n − m do not, so that we have m failure times x1, x2, ..., xm, and n − m censoring times T1, T2, ..., Tn−m. Then the likelihood function of θ will be proportional to

    [∏ f(xi)Δi] ∏ [1 − F(Tj)].

The Δi's do not depend upon θ and can be absorbed into the proportionality constant to give

    L(θ) = c ∏ f(xi) ∏ [1 − F(Tj)],   (9.5.1)

where c is any convenient positive constant. The maximum likelihood estimate and RLF can now be obtained.

Special Case: Exponential Distribution

If X is assumed to have an exponential distribution with mean θ, then

    F(x) = 1 − e^(−x/θ)   for x > 0.
In this case, (9.5.1) simplifies to give

    L(θ) = θ^(−m) e^(−s/θ),

where s is the total elapsed lifetime (time on test) for all n items:

    s = Σ from i = 1 to m of xi + Σ from j = 1 to n − m of Tj.

The log likelihood function is

    l(θ) = −m log θ − s/θ,

and solving S(θ) = 0 gives θ̂ = s/m. The log RLF is then

    r(θ) = l(θ) − l(θ̂).

EXAMPLE 9.5.1. Consider the experiment described in Example 9.4.1. Suppose that the n = 10 components were placed on test simultaneously, and it was decided to terminate the experiment after 50 days. The ten actual lifetimes are shown in Figure 9.5.1. If testing stopped at 50 days, everything to the right of 50 would be hidden from view, or censored. The data would then be

    50+  11  50+  5  20  4  35  40  29  8,

where 50+ indicates that the first and third lifetimes were censored at 50 days. In the notation defined above, we have m = 8 lifetimes with total 11 + 5 + 20 + ... + 8 = 152, and n − m = 2 censoring times with total 50 + 50 = 100. The total elapsed lifetime for all 10 components is s = 152 + 100 = 252. Hence θ̂ = 252/8 = 31.5, and

    l(θ) = −8 log θ − 252/θ.

Figure 9.5.1. Diagrammatic representation of lifetime data showing two possible censoring times.

If it had been decided to terminate the experiment after 25 days, the data would have been

    25+  11  25+  5  20  4  25+  25+  25+  8.

There are now m = 5 lifetimes with total 48, and n − m = 5 censoring times with total 125, giving s = 173 and θ̂ = 34.6. The log likelihood function is now

    l(θ) = −5 log θ − 173/θ.

Figure 9.5.2 shows the three log relative likelihood functions resulting from (i) stopping the experiment after T = 25 days, (ii) stopping the experiment after T = 50 days, and (iii) continuing the experiment until all of the components have failed (i.e. stopping at time T > 70). The three functions agree reasonably well for θ ≤ 30, indicating that plausibilities of small parameter values are affected very little even when 50% of the lifetimes are censored. However, the three curves diverge considerably for large values of

Figure 9.5.2. Log relative likelihood function for the exponential mean θ under various levels of censoring.
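A small numerical sketch of this calculation (Python, assuming numpy is available; the helper function below is illustrative, not from the text) computes m, s and θ̂ = s/m under the three censoring patterns of Example 9.5.1.

    import numpy as np

    lifetimes = np.array([70, 11, 66, 5, 20, 4, 35, 40, 29, 8])

    def censored_exponential(lifetimes, T=None):
        # Return (m, s, theta_hat) when testing stops at time T (T=None means no censoring)
        if T is None:
            failures = lifetimes
            censored = np.array([])
        else:
            failures = lifetimes[lifetimes <= T]
            censored = np.full(int((lifetimes > T).sum()), T)
        m = len(failures)
        s = failures.sum() + censored.sum()      # total time on test
        return m, s, s / m

    def log_rel_lik(theta, m, s):
        # r(theta) = l(theta) - l(theta_hat), with l(theta) = -m log(theta) - s/theta
        theta_hat = s / m
        l = lambda t: -m * np.log(t) - s / t
        return l(theta) - l(theta_hat)

    for T in (25, 50, None):
        m, s, theta_hat = censored_exponential(lifetimes, T)
        print(T, m, s, round(theta_hat, 1))
        # (25, 5, 173, 34.6), (50, 8, 252, 31.5), (None, 10, 288, 28.8)

Plotting log_rel_lik over a grid of θ values for each case reproduces the three curves of Figure 9.5.2.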

Hormone·treated Control
8. With no censoring, values of 8 greater than 62 are implausible (R < 0.1);
with censoring at 25 days, 8 can be as large as 108 before R decreases to 10%. Recurrence 2 4 6 9 9 9 1 4 6 7 13 24
Censoring thus makes it impossible to place as tight an upper bound on the times 13 14 18 23 31 32 25 35 35 39
value of 8, but has little effect on the lower bound.These results suggest that if 33 34 43
we were primarily interested in establishing a. lower bound for 8, a short
Censoring 10 14 14 16 17 18 3 4 5 8
experiment with heavy censoring could be quite satisfactory.
times 18 19 20 20 21 21 10 li13 14 14 15
23 24 29 29 30 30 17 19 20 22 24 24
Note. In applications, the appropriate anarysis will normally be that which 31 31 31 33 35 37 24 25 26 26 26 28
corresponds to the pattern of censoring actually used in the experiment. 40 41 42 42 44 46 29 29 32 35 38 39
However, in some cases one might also wish to examine the likelihood 48 49 51 53 54 54 40 41 44 45 47 47
function that would result from more severe censoring in order to see what 55 56 47 50 50 51
effect a few large lifetimes have on the analysis.

(a) Find the probability density function, and show that the mean of this
PROBLEMS FOR SECTION 9.5 distribution is e.
Ten electronic components with exponentially distributed lifetimes were tested (b) Forty bulbs were tested and failures occurred at the following times (in
l. hours):
for·predetermined periods of time as shown. Three of the tubes survived their
test periods, and the remaining seven failed at the times shown.
196 327 405 537 541 660 671 710 786
2 3 7 8 9 10
4 5 6 940 954 1004 1004 1006 1202 1459 1474 1484
Tube number
1602 1662 1666 1711 1784 1796 1799
Test period 81 72 70 60 41 31 31 30 29 21
Failure time 2 51 33 27 14 24 4
The remaining bulbs had not failed when testing stopped at 1800 hours.
Find the MLE and the 10% likelihood interval for the exponential mean 0. Find the MLE and the 10% likelihood interval for 8.
5. * An arrow is shot at the center of a circular target of radius 1. Let X denote the
2.t n electronic components were simultaneously placed on test. After a time T
testing was stopped. It wa~ observed that n - k were still operating and that k horizontal displacement and Y the vertical displacement of the point of impact
had failed, but the times at which the failures had occurred were not known. from the center of the target. It is to be assumed that X and Y are independent
Assuming that failure times follow an exponential distribution with mean. 0, N(O, cr 2 ) random variables.
derive the maximum likelihood estimate and the relative likelihood function (a) Show that the probability of a shot missing the target is
of 0.
    P(X² + Y² ≥ 1) = exp{−1/(2σ²)}.
3. A clinical trial was conducted to determine whether a hormone treatment
benefits women who were treated previously for breast cancer. A woman entered (b) Of n independent shots, m hit the target at points (x,, y,) for i = 1, 2, ... , m.
the clinical trial when she had a recurrence. She was then treated by irradiation, The other n - m shots miss the target, and their points of impact are not
and assigned to either a hormone therapy group or a control group. The recorded. Find the MLE of er.
observation of interest is the time until a second recurrence, which may be
assumed to follow an exponential distribution with mean OH (hormone therapy
group) or Oc (control group). Many of the women did not have a second
recurrence before the clinical trial was concluded, so that their recurrence times 9.6. Invariance
are censored. In the following table, a censoring time "n" means that a woman
was observed for time n, and did not have a recurrence, so that her recurrence Suppose that the probability model for an experiment depends upon an
time is known to exceed n. Plot the log RLFs of (JH and Ocon the same graph. Is unknown parameter 8. The model then consists of a whole family of
there any indication that the hormone treatment increases the mean time to e
probability distributions, one for each value of in the parameter space n.
recurrence? For example, we might assume that the time to failure of an electronic
4.t* The cumulative distribution function for the lifetime of a new type oflight bulb is component has an exponential distribution, with probability density function
assumed to be 1
    F(x) = 1 − (1 + 2x/θ)e^(−2x/θ)   for x > 0

    f(x) = (1/θ)e^(−x/θ)   for 0 < x < ∞,   (9.6.1)
where θ is the expected lifetime. For each value of θ belonging to Ω = (0, ∞), we have a theoretical distribution. For instance, the distribution labeled by θ = 1 is

    f(x) = e^(−x)   for 0 < x < ∞,   (9.6.2)

and the distribution labeled by θ = 2 is

    f(x) = ½ e^(−x/2)   for 0 < x < ∞.   (9.6.3)

A family of distributions can be parametrized (or labeled) in many different ways. For instance, we could equally well write (9.6.1) as

    f(x) = λ e^(−λx)   for 0 < x < ∞,

where λ = 1/θ is the failure rate. Distributions (9.6.2) and (9.6.3) are now labeled by λ = 1 and λ = 0.5, respectively. We have the choice of labeling the family of exponential distributions by values of θ, or by values of λ, or by values of any other one-to-one function of θ. We usually try to select a parametrization so that the parameter represents some interesting characteristic of the distribution, and the mathematical expressions are fairly simple.

When we say that θ = 1 is ten times as likely as θ = 2, we imply that the distribution labeled by θ = 1 is ten times as likely as the distribution labeled by θ = 2. When we say that the maximum likelihood estimate of θ is θ̂ = 1.1, we imply that the distribution labeled by θ = 1.1 is the most likely distribution. Since the method of labeling the distributions is largely arbitrary, it would seem desirable that the plausibilities assigned to the distributions should not depend upon the particular method of labeling which has been selected. In other words, the plausibilities assigned should be invariant under one-to-one transformations of the parameter.

An attractive property of the likelihood methods we have discussed is that they are invariant under one-to-one parameter transformations. For suppose that θ = g(λ) where g is invertible, and let P(E; θ) be the probability of the observed event E. Substituting θ = g(λ) in this expression gives the probability of E as a function of λ. It follows that L(θ), the likelihood function of θ, and L*(λ), the likelihood function of λ, are related as follows:

    L*(λ) = L(θ)   where θ = g(λ).

Hence both functions have the same maximum value, and

    R*(λ) = R(θ)   where θ = g(λ).

If λ1 is any possible value of λ and θ1 = g(λ1) is the corresponding value of θ, then λ1 and θ1 have the same relative likelihood. Relative likelihoods do not depend upon whether we choose to work with parameter λ or parameter θ.

It follows that, if λ̂ is the MLE of λ, then θ̂ = g(λ̂) is the MLE of θ. Similarly, θ1 belongs to the 100p% likelihood region for θ if and only if θ1 = g(λ1) where λ1 belongs to the 100p% likelihood region for λ.

EXAMPLE 9.6.1. In Example 9.4.1, we supposed that the lifetimes of electronic components were exponentially distributed, with mean lifetime θ. On the basis of ten observations, we found that θ̂ = 28.8. The 50% LI for θ was 20 ≤ θ ≤ 43, and the 10% LI was 16 ≤ θ ≤ 62.

(a) Suppose that we are interested in the failure rate, λ = 1/θ. Then the MLE of λ is

    λ̂ = 1/θ̂ = 1/28.8 = 0.0347.

The 50% LI for λ is obtained by noting that 20 ≤ 1/λ ≤ 43 if and only if 1/43 ≤ λ ≤ 1/20. Hence the 50% LI is 0.023 ≤ λ ≤ 0.050. Similarly, the 10% LI is found to be 0.016 ≤ λ ≤ 0.063.

(b) Suppose that we are interested in the proportion p of such components which will last at least 25 days. Then

    p = P(X ≥ 25) = ∫ from 25 to ∞ of (1/θ)e^(−x/θ) dx = e^(−25/θ),

which is a one-to-one function of θ. Hence the MLE of p is

    p̂ = e^(−25/θ̂) = 0.420.

Since θ = −25/log p, the 50% LI for p is given by

    20 ≤ −25/log p ≤ 43,

and solving for p gives 0.287 ≤ p ≤ 0.559. Similarly, the 10% LI is 0.210 ≤ p ≤ 0.668.

PROBLEMS FOR SECTION 9.6

1. Let γ denote the median lifetime of electronic components in Example 9.4.1. Show that γ = θ log 2, and hence obtain the MLE and the 10% likelihood interval for γ.

2. We wish to estimate p, the probability of no diseased trees in a four-acre plot, in Problem 9.1.1. One approach would be to note that 4 out of 10 plots contained no diseased trees, so that p̂ = 0.4 and L(p) = p^4 (1 − p)^6. A second approach would be to express p as a function of λ and use the invariance property of likelihood. Determine the MLE and the 10% likelihood interval for p by both methods. Under what conditions would the first method be preferable?

3.† The following table summarizes information concerning the lifetimes of one hundred V600 indicator tubes. (Ref.: D. J. Davis, Journal of the American Statistical Association 47 (1952), 113-150.)

    Lifetime (hours)       0-100   100-200   200-300   300-400   400-600
    Frequency observed        29        22        12        10        10
    Lifetime (hours)       600-800   800+
    Frequency observed           9      8

Suppose that the lifetimes follow an exponential distribution with mean θ.
(a) Show that the joint probability distribution of the frequencies is multinomial with probabilities

    p1 = P(0 < T < 100) = 1 − β;  p2 = P(100 < T < 200) = β(1 − β);  ...;  p7 = P(T > 800) = β^8,

    where β = e^(−100/θ).

(b) Show that β̂ can be obtained as a root of a quadratic equation, and deduce the value of θ̂.
(c) Prepare a graph of the log RLF of β. Obtain 10% and 50% likelihood intervals for β, and transform them into likelihood intervals for θ.

4.* The arrivals of westbound vehicles at a fixed point on an east-west road are random events in time. On the average there are µ arrivals per ten second interval. A traffic signal is to be installed a short distance beyond the observation point. It is desired that the signal remain at "STOP" for a time β such that the probability of holding up k or more vehicles is p.
   (a) Show that p = P(χ²(2k) ≤ βµ/5). In particular, if k = 8 and p = 0.05, then β = 39.8/µ.
   (b) Assuming that k = 8 and p = 0.05, use the data in Problem 9.3.2 to determine β̂ and the 10% likelihood interval for β.

9.7. Normal Approximations

Let l(θ) denote the log likelihood function of a continuous parameter θ with possible values in Ω. Let S(θ) = l′(θ) and J(θ) = −l″(θ) denote the score and information functions as in Section 1. We assume that θ̂ exists and is an interior point of Ω, and that l(θ) has a Taylor's series expansion at θ = θ̂:

    l(θ) = l(θ̂) + (θ − θ̂)l′(θ̂)/1! + (θ − θ̂)²l″(θ̂)/2! + (θ − θ̂)³l‴(θ̂)/3! + ....

Since l′(θ̂) = 0 and r(θ) = l(θ) − l(θ̂), we have

    r(θ) = −½(θ − θ̂)²J(θ̂) + (θ − θ̂)³l‴(θ̂)/3! + ....   (9.7.1)

The normal approximation to r(θ) is defined as follows:

    rN(θ) = −½(θ − θ̂)²J(θ̂).   (9.7.2)

If |θ − θ̂| is small, the cubic and higher terms in (9.7.1) are small, and hence r(θ) ≈ rN(θ).

The effect of increasing the amount of data is to produce a sharply peaked likelihood function and shorter likelihood intervals (see Example 9.3.1). Thus, for a sufficiently large sample, |θ − θ̂| will be small and rN(θ) will give a good approximation to r(θ) over the entire region of plausible parameter values.

The 100p% likelihood region for θ is the set of θ-values such that R(θ) ≥ p, or equivalently, r(θ) ≥ log p. Taking rN(θ) ≥ log p gives

    θ ∈ θ̂ ± √((−2 log p)/J(θ̂))   (9.7.3)

as an approximation to the 100p% likelihood region. This is an interval centered at θ̂ with length

    2√((−2 log p)/J(θ̂)).

The larger the value of J(θ̂), the narrower the approximate likelihood interval will be, and hence the more information we have concerning θ. This is the reason that J(θ) is called the "information function".

When the normal approximation is sufficiently accurate, all of the information concerning θ is summarized in θ̂ and J(θ̂). The MLE indicates the most likely parameter value, and J(θ̂) indicates the precision with which θ can be determined. Given these two values, likelihood intervals of any sizes desired can be obtained from (9.7.3).

How large a sample is necessary before the normal approximation r(θ) ≈ rN(θ) can be used? This depends very much on the situation. In the first example below we find that r(θ) = rN(θ) exactly for all sample sizes, but in the second example the approximation is not very good even with 500 observations. Thus it is necessary to check the accuracy of the normal approximation in each new situation. We can do this by plotting both r(θ) and rN(θ) on the same graph and verifying that they agree closely for values of θ inside, say, the 10% likelihood interval. Alternatively, we can check that a graph of the score function S(θ) is well approximated by a straight line over this interval.

EXAMPLE 9.7.1. Let x1, x2, ..., xn be independent observations from a normal distribution with unknown mean µ and known variance σ². If the measurement intervals are small, the likelihood function of µ is proportional to the product of probability density functions:

    L(µ) = c · ∏ f(xi) = c · ∏ (1/√(2π)σ) exp{−(1/(2σ²))(xi − µ)²}
         = exp{−(1/(2σ²)) Σ(xi − µ)²}

by choice of c. Hence the log likelihood, score, and information functions are

    l(µ) = −(1/(2σ²)) Σ(xi − µ)²;   S(µ) = (1/σ²) Σ(xi − µ);   J(µ) = n/σ².

Solving S(µ) = 0 gives µ̂ = (1/n)Σxi = x̄, and hence the log relative likelihood function is

    r(µ) = l(µ) − l(µ̂) = −(1/(2σ²)) Σ(xi − µ)² + (1/(2σ²)) Σ(xi − x̄)².
Upon expanding the squares and simplifying, we get

    r(µ) = −(n/(2σ²))(µ − x̄)² = −½(µ − µ̂)²J(µ̂).

In this example we have r(µ) = rN(µ) for all µ, and indeed it is for this reason that we call rN(µ) the normal approximation. By (9.7.3), the exact 100p% likelihood interval for µ is given by

    µ ∈ x̄ ± σ√((−2 log p)/n).

EXAMPLE 9.7.2. Suppose that x1, x2, ..., xn are independent observations from an exponential distribution with unknown mean θ. We assume that the measurement intervals are small. Then, from Example 9.4.1, the log likelihood, score, and information functions are

    l(θ) = −n log θ − (1/θ)Σxi;   S(θ) = −n/θ + (1/θ²)Σxi;   J(θ) = −n/θ² + (2/θ³)Σxi.

Also we have θ̂ = x̄ and J(θ̂) = n/θ̂². Hence the relative likelihood function and normal approximation are

    r(θ) = l(θ) − l(θ̂) = −n[θ̂/θ − 1 − log(θ̂/θ)];   rN(θ) = −(n/(2θ̂²))(θ − θ̂)².

These two functions are plotted in Figure 9.7.1 for the case n = 10, θ̂ = 28.8 (see Example 9.4.1). The agreement is very poor because r(θ) is highly skewed, but rN(θ) is symmetrical about the line θ = θ̂. The exact 10% likelihood interval is 15.65 ≤ θ ≤ 61.88 (see Example 9.8.2), but (9.7.3) gives 9 ≤ θ ≤ 48.

With n = 500 and θ̂ = 28.8, the exact 10% likelihood interval is 26.20 ≤ θ ≤ 31.75, and (9.7.3) gives 26.04 ≤ θ ≤ 31.56. The agreement, although much better than for n = 10, is still not very good. It is true that r(θ) → rN(θ) as n → ∞, but n must be very large before the normal approximation is accurate enough to use.

Figure 9.7.1. Log relative likelihood function and normal approximation.

Parameter Transformations

We noted in Section 6 that relative likelihoods are invariant under one-to-one parameter transformations θ = g(λ). However, the normal approximation (9.7.2) is not invariant. Because of this, it may be possible to achieve greater accuracy by transforming from θ to a new parameter λ before approximating.

Since (9.7.2) is obtained by ignoring the cubic and higher terms in (9.7.1), it makes sense to look for a transformation which reduces the size of the cubic term relative to the quadratic term. Hopefully this will improve the accuracy of the normal approximation. We can then obtain approximate likelihood intervals for λ and transform them via g into intervals for θ.

EXAMPLE 9.7.2 (continued). Consider the family of power transformations θ = λ^a where a ≠ 0. Then the log likelihood function of λ is

    l*(λ) = l(λ^a) = −na log λ − nθ̂/λ^a.

The first three derivatives with respect to λ are

    l*′(λ) = −na/λ + naθ̂/λ^(a+1);
    l*″(λ) = na/λ² − na(a + 1)θ̂/λ^(a+2);
    l*‴(λ) = −2na/λ³ + na(a + 1)(a + 2)θ̂/λ^(a+3).

Thus we have

    J*(λ̂) = −l*″(λ̂) = na²/λ̂².

The third derivative at λ = λ̂ is zero for a = −3; that is, for θ = λ^(−3). Thus if λ = θ^(−1/3), the cubic term in the Taylor's series expansion of l*(λ) about λ = λ̂ is zero, and the normal approximation to l*(λ) should produce accurate results.

The normal approximation to r*(λ) is

    r*(λ) ≈ −½(λ − λ̂)²J*(λ̂).

This is compared with r(θ) and rN(θ) in Table 9.7.1 for the case n = 10, θ̂ = 28.8. Transforming from θ to the new parameter λ = θ^(−1/3) has substantially improved the accuracy of the normal approximation.
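A short numerical sketch of this comparison (Python with numpy; the formulas are those of Example 9.7.2, while the function names are chosen here for illustration) evaluates r(θ), the direct normal approximation rN(θ), and the approximation obtained through λ = θ^(−1/3), and then transforms an approximate likelihood interval for λ back to θ using the invariance property.

    import numpy as np

    n, theta_hat = 10, 28.8                  # Example 9.4.1 / 9.7.2
    lam_hat = theta_hat ** (-1/3)
    J_theta = n / theta_hat**2               # J(theta_hat) for the exponential model
    J_lam = 9 * n / lam_hat**2               # J*(lam_hat) = n a^2 / lam_hat^2 with a = -3

    def r(theta):                            # exact log RLF
        return -n * (theta_hat/theta - 1 - np.log(theta_hat/theta))

    def r_N(theta):                          # normal approximation in theta
        return -0.5 * (theta - theta_hat)**2 * J_theta

    def r_lam(theta):                        # normal approximation in lambda = theta^(-1/3)
        lam = theta ** (-1/3)
        return -0.5 * (lam - lam_hat)**2 * J_lam

    for theta in (12, 15, 20, 25, 40, 60, 80, 100):
        print(theta, round(r(theta), 2), round(r_N(theta), 2), round(r_lam(theta), 2))

    # Approximate 10% LI for lambda from (9.7.3), transformed to theta by invariance
    p = 0.10
    half = np.sqrt(-2 * np.log(p) / J_lam)
    lo, hi = (lam_hat + half) ** (-3), (lam_hat - half) ** (-3)
    print(lo, hi)                            # close to the exact interval 15.65 <= theta <= 61.88

The printed rows reproduce, to rounding, the pattern shown in Table 9.7.1: the transformed approximation tracks r(θ) closely, while rN(θ) does not.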
Table 9.7.1. Comparison of Normal Approximations

    θ      λ = θ^(−1/3)   r(θ) = r*(λ)   −½(θ − θ̂)²J(θ̂)   −½(λ − λ̂)²J*(λ̂)
    12     0.437          −5.25           −1.70             −5.17
    15     0.405          −2.68           −1.15             −2.65
    20     0.368          −0.75           −0.47             −0.75
    25     0.342          −0.10           −0.09             −0.10
    40     0.292          −0.49           −0.76             −0.48
    60     0.255          −2.14           −5.87             −2.12
    80     0.232          −3.82          −15.80             −3.75
    100    0.215          −5.33          −30.56             −5.19

By (9.7.3), the approximate 100p% LI for λ is

    λ ∈ λ̂ ± √((−2 log p)/J*(λ̂)) = λ̂[1 ± √((−2 log p)/(9n))].

Transforming this via θ = λ^(−3) gives

    θ ∈ θ̂[1 ± √((−2 log p)/(9n))]^(−3)

as the approximate 100p% LI for θ. For n = 10, θ̂ = 28.8 this gives 15.62 ≤ θ ≤ 62.16 as the approximate 10% LI, whereas the exact result is 15.65 ≤ θ ≤ 61.88. By transforming to the new parameter λ = θ^(−1/3), we are able to achieve greater accuracy for n = 10 than we obtained previously with n = 500.

Transforming the Information Function

Suppose that we change parameters from θ to λ via the one-to-one transformation θ = g(λ). By the invariance property, the log likelihood function of λ is

    l*(λ) = l(θ).

The score function of λ is

    S*(λ) = dl*/dλ = (dl/dθ)(dθ/dλ) = S(θ)·(dθ/dλ).

The information function of λ is

    J*(λ) = −dS*/dλ = −S(θ)(d²θ/dλ²) + J(θ)(dθ/dλ)².

At the maximum we have S(θ̂) = 0, and therefore

    J*(λ̂) = q²J(θ̂)   (9.7.4)

where q is the value of dθ/dλ at the maximum.

If θ̂ and J(θ̂) are known, λ̂ can be found by solving the equation θ̂ = g(λ̂), and J*(λ̂) can be found from (9.7.4). The normal approximation to r*(λ) can then be written down. Given θ̂ and J(θ̂), not much extra work is needed to find the normal approximation for any one-to-one function of θ.

PROBLEMS FOR SECTION 9.7

1. Obtain approximate 10% likelihood intervals for θ in Problems 9.1.4(b) and 9.1.5(b), and investigate the accuracy of the normal approximation to r(θ) in these examples.

2. Find approximate 10% likelihood intervals for θH and θC in Problem 9.5.3. Repeat using the transformed parameters λH = θH^(−1/3) and λC = θC^(−1/3) as in Example 9.7.2. Compare your results with the exact 10% LI's for θH and θC.

3.† Consider the situation described in Problem 9.4.6. A decision must be made as to the number n of measurements which will be taken to estimate the unknown concentration µ of a trace metal in solution. It is desired that the 10% LI for µ should have width at most 2 units. Determine the appropriate value of n as a function of the error variance σ².

4. Suppose that X has a binomial (n, θ) distribution, and consider the series expansion of l(θ) about θ = θ̂. Show that the ratio of the cubic term to the quadratic term in this expansion is

    −(2/3)(θ − θ̂)(1 − 2θ̂)/[θ̂(1 − θ̂)].

Under what conditions will the normal approximation to r(θ) be satisfactory?

5. Suppose that x successes are observed in n Bernoulli trials with success probability θ. Let W denote the width of the approximate 100p% LI for θ.
   (a) Find an expression for W, and show that

    W ≤ √((−2 log p)/n).

   (b) How large must n be in order to ensure that the approximate 10% LI for θ has width at most 0.04?

6. Suppose that two independent experiments both give information about the same parameter θ, and that their log RLFs are well described by their normal approximations,

    ri(θ) ≈ −½(θ − θ̂i)²Ji   for i = 1, 2,

where J1 = J1(θ̂1) and J2 = J2(θ̂2). Show that the overall MLE based on both experiments is given approximately by

    θ̂ = (J1θ̂1 + J2θ̂2)/(J1 + J2),

and that this value lies between θ̂1 and θ̂2.

7. (a) Let X1, X2, ..., Xn be independent Poisson variates with mean µ, and consider transformations of the form µ = λ^a where a ≠ 0. Find the log likelihood function of λ, and show that the cubic term in the expansion of this function about λ = λ̂ is zero for a = 3.
   (b) Obtain an approximate 10% LI for λ = µ^(1/3), and transform it to obtain an interval for µ.
   (c) Suppose that n = 10 and Σxi = 53. Using a table or graph, investigate the accuracy of the normal approximations to the original log RLF of µ, and to the log RLF of the transformed parameter λ = µ^(1/3).

9.8. Newton's Method

In this section we describe two applications of Newton's iterative method for solving an equation.

Suppose that we wish to find a root of the equation g(θ) = 0. Let θ0 be a parameter value close to the root, and consider the Taylor's series expansion of g(θ) about θ = θ0:

    g(θ) = g(θ0) + (θ − θ0)g′(θ0) + (θ − θ0)²g″(θ0)/2! + ....

For |θ − θ0| small, the quadratic and higher terms in this expansion will be small, and dropping these terms gives

    g(θ) ≈ g(θ0) + (θ − θ0)g′(θ0).

We are approximating g(θ) by a linear function of θ which has the same value and slope as g(θ) at θ = θ0.

Since g(θ) = 0 at the root, the root satisfies g(θ0) + (θ − θ0)g′(θ0) ≈ 0, and therefore

    θ ≈ θ0 − g(θ0)/g′(θ0).

In Newton's method we take θ0 to be a preliminary guess at the root, and then compute a revised guess θ1 as follows:

    θ1 = θ0 − g(θ0)/g′(θ0).   (9.8.1)

Figure 9.8.1. Solution of g(θ) = 0 by Newton's method.

The revised guess is the point at which the linear approximation (tangent) to g(θ) at θ = θ0 crosses the θ-axis (see Figure 9.8.1). We now take θ1 as the new preliminary guess and repeat the calculation to get

    θ2 = θ1 − g(θ1)/g′(θ1).

We continue this procedure until θ(i+1) ≈ θi, in which case g(θi) ≈ 0 and a root has been found.

Solving the Maximum Likelihood Equation

We noted in Section 9.1 that, under suitable conditions, θ̂ is a root of the maximum likelihood equation S(θ) = 0, where S(θ) = l′(θ). Taking g(θ) = S(θ) in the above derivation, we have g′(θ) = S′(θ) = −J(θ) (see Section 9.1). Thus the updating formula (9.8.1) becomes

    θ(i+1) = θi + S(θi)/J(θi).   (9.8.2)

Starting with an initial guess θ0, we repeatedly update to get θ1, θ2, θ3, .... We stop as soon as θ(i+1) ≈ θi, so that S(θi) ≈ 0. To verify that a relative maximum has been found, we check that J(θi) > 0.

Newton's method works well in most statistical applications. If the initial guess is reasonable, the procedure usually produces an accurate approximation to θ̂ after only three or four iterations. The reason for this is that, for moderately large samples, S(θ) is nearly linear in θ (see Section 9.7). If S(θ) is exactly linear in θ, Newton's method produces θ̂ in a single iteration.

If S(θ) = 0 has more than one root, Newton's method will not necessarily converge to the one desired. Difficulties can also arise if the maximum occurs on or near a boundary of the parameter space. It is a good idea to examine a graph of l(θ) before applying Newton's method.

EXAMPLE 9.8.1. Newton's method will be used to obtain the overall MLE µ̂ in Example 9.2.2. The score and information functions are

    S(µ) = 120/(1 − p1) + 3/(1 − p2) − 440;   J(µ) = 1200 p1/(1 − p1)² + 3 p2/(1 − p2)²,

where p1 = e^(−10µ) and p2 = e^(−µ). The calculations are summarized in Table 9.8.1.

A convenient choice for the initial guess µ0 is the average of the individual estimates in Example 9.2.2:

    µ0 = ½(0.0357 + 0.0780) = 0.057.

Now we find that

    S(µ0) = −109.66;   J(µ0) = 4518.16,

and (9.8.2) gives

    µ1 = 0.057 − 109.66/4518.16 = 0.03273.
At the next step we compute

    S(µ1) = 83.07;   J(µ1) = 13902.58,

and hence obtain

    µ2 = 0.03273 + 83.07/13902.58 = 0.03871.

Continuing in this fashion, we obtain µ̂ = 0.04005 correct to five decimal places. Note that J(µ̂) > 0, so a relative maximum has been found.

Table 9.8.1. Solution of S(µ) = 0 by Newton's Method

    i     µi         S(µi)       J(µi)        µ(i+1)
    0    0.057     −109.66      4518.16      0.03273
    1    0.03273     83.07     13902.58      0.03871
    2    0.03871     12.87      9910.74      0.04001
    3    0.04001      0.41      9270.86      0.04005
    4    0.04005      0.04      9252.15      0.04005

Likelihood Interval Calculation

In previous examples we found likelihood intervals from a graph of the log relative likelihood function r(θ). Alternatively, we can obtain the endpoints of the 100p% likelihood interval by solving the equation g(θ) = 0, where

    g(θ) = r(θ) − log p.

Usually numerical methods will be required, and Newton's iterative method can again be used.

Since r(θ) = l(θ) − l(θ̂), it follows that g′(θ) = l′(θ) = S(θ), and so (9.8.1) gives

    θ1 = θ0 − [r(θ0) − log p]/S(θ0).   (9.8.3)

Calculation of the right endpoint is illustrated in Figure 9.8.2. We begin with a preliminary estimate θ0 for the endpoint. The revised estimate θ1 is the θ-value at which the tangent to g(θ) at θ = θ0 crosses the θ-axis. The calculation is repeated with the revised value as the new initial estimate. We continue in this way until convergence to the right endpoint is obtained. A second iteration is then carried out for the left endpoint.

Figure 9.8.2. Solution of r(θ) − log p = 0 by Newton's method.

Starting values for Newton's method can be taken from a preliminary graph of r(θ). Alternatively, they can often be obtained from the normal approximation (9.7.2), which gives

    θ = θ̂ ± √((−2 log p)/J(θ̂))   (9.8.4)

as approximations to the interval endpoints.

EXAMPLE 9.8.2. Newton's method will be used to obtain the 10% likelihood interval for θ in Example 9.4.1. For this example we have

    l(θ) = −10 log θ − 288/θ;   S(θ) = −10/θ + 288/θ²;   J(θ) = −10/θ² + 576/θ³.

The MLE is θ̂ = 28.8, and so

    l(θ̂) = −43.604;   J(θ̂) = 0.01206.

Thus the log relative likelihood function is

    r(θ) = l(θ) − l(θ̂) = −10 log θ − 288/θ + 43.604,

and (9.8.4) gives

    θ = 28.8 ± √((−2 log 0.1)/0.01206) = 28.8 ± 19.5.

Table 9.8.2 shows the calculation of the left endpoint with initial estimate 28.8 − 19.5 = 9.3. After five iterations, the left endpoint is found to be 15.65, correct to two decimal places. Similarly, the initial value for the right endpoint is 28.8 + 19.5 = 48.3, and the final value is 61.88 after three iterations. Thus the 10% likelihood interval is 15.65 ≤ θ ≤ 61.88.

Table 9.8.2. Calculation of 10% LI by Newton's Method

    i      θi        r(θi)      S(θi)       θ(i+1)
    0     9.30     −9.664      2.255      12.57
    1    12.57     −4.621      1.027      14.83
    2    14.83     −2.783      0.635      15.59
    3    15.59     −2.336      0.544      15.65
    4    15.65     −2.304      0.536      15.65
    0    48.30     −1.133     −0.0836     62.29
    1    62.29     −2.338     −0.0863     61.88
    2    61.88     −2.302     −0.0864     61.88
REVIEW PROBLEMS FOR CHAPTER 9

1. (a) Red spider mites are distributed randomly and uniformly over the surface area
of leaves on an apple tree. A sample of 100 leaves of unit area yielded the
following results:
PROBLEMS FOR SECTION 9.8
Lt Use Newton's method to locate the maximum of the following log likelihood Number of mites 0 2 3 4 5 2:6
function : Observed frequency 16 31 22 18 ·10 3 0
/(µ) = 100 logµ- 50 µ- 50 log(l -e-µ) forµ> 0.
Find the MLE and the 10% LI for)., the expected number of mites per unit
2. Suppose that the score function is linear in 0,
area.
    S(θ) = aθ + b   for −∞ < θ < ∞,
presence of mites on a leaf had been recorded:
where a, bare constants with a< 0. Show that Newton's method converges to 0 in
one iteration for any starting value 00 .
Number of mites 0 1 or more
3. Samples of river water are placed in test tubes and incubated. There are n test
1
tubes each containing volume v" and y 1 of these give negative reactions, indicating Observed frequency 16 84
the absence of coliform bacteria. Altogether, data are available for m different
volumes v1 , v 2 , ... , vm. It is assumed that the bacteria are distributed randomly and Find the MLE and the 10% LI for). based on the collapsed table. Has much of
uniformly throughou t the river water, with A. bacteria per unit volume on average. the informatio n concerning A. been lost?

(a) Show that the score and informatio n functions for A. are 2.t In a study of the spread of diseas~ among spruce trees planted in a reforestatio n
project, a single line of trees is selected and the number of healthy trees between
    S(λ) = Σ vi(ni − yi)/(1 − pi) − Σ vi ni,
successive diseased trees is counted.
Number of healthy trees 0 2 3 2:4 Total
where p1 = exp ( - Av1).
(h) Using Newton's method, evaluate J. for the following data: Observed frequency 50 23 14 8 5 100
Volume ~ 8 4 2 1 If the disease is non-contagious, the number X of healthy trees between successive
No. of test tubes ~ 10 10 10 10 diseased trees should have a geometric distributio n, with probability function
No. of negatives y, 0 2 3 7
for x = 0, 1, 2, . ..
4.t Use Newton's method to obtain the 10% likelihood interval forµ in Problem 9.3.2.
where 0 < 11. < 1.
5. The probability thatj different species of plant life are found in a randomly chosen
(a) Assuming the model to be appropriat e, calculate the MLE and the 10% LI
plot of specified area is
for 11..
    pj = (1 − e^(−λ))^(j+1) / [(j + 1)λ]   for j = 0, 1, 2, ...,
where 0 < ). < oo . The data obtained from an examinatio n of200 plots are given in 3. A shipment of 20 items contains d defectives, where d is unknown. Six items are
the following frequency table: selected at random without replacement, and only one of them is defective. Find
the maximum likelihood estimate and the 50% likelihood interval for d.
No. of species 0 1 2 3 2:4
Frequency 4. An inoculum consists of a suspension of virulent microorganisms. To assess its
147 36 13 4 0
strength, n animals are given a dose of l ml. If the dose contains one or more
(a) Obtain expressions for the log likelihood, score, and informatio n functions organism the inoculated animal will get sick, otherwise it will not.
of A. (a) Find the probability p that an animal does not get sick as a function of)., the
(b) Evaluate J. by Newton's method.
density of organisms per ml o:f inoculum.
(c) Calculate estimated expected frequencies. Does the model appear to give a I (b) Out of 10 animals inoculated, 6 got sick. Find the MLE and 10% likelihood
reasonable fit to the data? j
interval for p.
(d) Use Newton's method to find the 10% likelihood interval for ).. l (c) From the results in (b), obtain the MLE and 10% likelihood interval for )..
5. An experi.ment was conducted to estimate y, the 90th percentile (0.9-quantile) of the


hfet1me d1stnbut10n of a new type of transistor. Ten transistors were tested and the CHAPTER 10
observed lifetimes were

9 25 6 18 43 17 12 10 18 42 Two-Parameter Likelihoods
Assuming that the lifetimes follow an exponential distribution, find the maximum
hkehhood estimate of y, and determine the relative likelihood of the value y = 60.
6.t(a) The lifetimes (in hours) of certain radio tubes are independent continuous
variates with cumulative distribution function
    F(x) = 1 − e^(−x/θ)   for x > 0
where e> 0. Five tubes were tested simultaneously over a period of 1000 hours.
One of them failed in hour 132 and another failed in hour 768. The remaining
three tubes survived the test period. Obtain the log likelihood function and
MLE of e based on these results.
(b) Find the maximum likelihood estimate of</>, the fraction of such tubes which
fail in the first I00 hours of use.
7. Suppose that events are occurring randomly in time at the constant rate of A. per
mmute. The numbers of events are observed in n time intervals of varying lengths, In this chapter we consider likelihood methods for parameter estimation
with the following results:
when the model involves two unknown parameters, a and /3. Section I
describes the method of maximum likelihood. The relative likelihood
Length of time interval lI 12 ... {0
Number of events function and likelihood regions are considered in Section 2. Section 3 defines
X1 X2 •.. X0 •
the maximum relative likelihood function of /3, whose properties are similar
Derive the likelihood function and maximum likelihood estimate of A.. to those of a one-parameter relative likelihood function. Normal approxi-
mations to the log RLF and maxi.mum log RLF are described in Section 4.
8. LetX1 , X2 .. ... x.. Yi. Y2,. . ., Y.,,beindependentvariates theX.'sbeingN{µ 2)
' I> 11 Sections 5 and 6 deal with two applications. The estimation of the
an d the Y.1 s N(µ2. 11 ). Bothµ, and µ 2 are known but 17 2 is 'not. Find
2 I
the MLE of 17 2
based on all n + m measurements. relationship between the probability of a response (e.g. death) and the dose of
a drug is considered .in Section 5. Section 6 describes an example from
9.tOne of the three children in a family comes home with the measles. Each of the learning theory, in which the probability of a response is dependent on the
other two children has probability e of catching measles from him. If neither or results of previous trials.
both get the measles, the epidemic ends. However, if only one of them gets the Section 7 derives some results quoted in Section I, and describes the use of
disease, the remammg child has another opportunity, with probability e, of being
infected. Newton's method to compute points on a likelihood contour.
Most of the discussion extends readily to the case of three or more
(a) Let X denote the total number of children in the family who are infected before unknown parameters. However, difficulties can anse with maximum likeli-
the epidemic ends. Show that hood estimation and maximum relative likelihood functions when there are
P(X = 1) = (1 - 8)2; P(X = 2) = W(l - 8)2; many unknown parameters. A brief discussion of the multi-parameter case is
P(X = 3) = 8 2 (3 - W).
given in Section 8.
(b) The following data were obtained in a survey of 100 three-child families in
which at least one child contracted the measles:

No. of children with measles 2 3


10.1. Maximum Likelihood Estimation
Observed frequency 48 32 20
Suppose that the probability model for an experiment involves two unknown
Evaluate the MLE of 8, and calculate estimated expected frequencies under the parameters, a and /3. The probability of the data (observed event) E will be a
model.
function of a and /3, and the joint .l ikelihood function is proportional to this
probability:
L(α, β) = c · P(E; α, β)
where c is positive and does not depend upon α and β. The natural logarithm of L(α, β) will be denoted by l(α, β).
The maximum likelihood estimate of (α, β) is the pair of parameter values (α̂, β̂) which maximizes the probability of the data. Equivalently, (α̂, β̂) is the pair of parameter values which maximizes L(α, β) and l(α, β).
In the one-parameter case we found θ̂ by solving the equation S(θ) = 0. Now the score function is a vector with two components:
S(α, β) = [S1(α, β), S2(α, β)]′ = [∂l/∂α, ∂l/∂β]′.
To find (α̂, β̂), we solve a pair of simultaneous equations
S1(α, β) = 0;   S2(α, β) = 0.   (10.1.1)
Of course, these equations need not hold if the maximum occurs on a boundary of the parameter space.
The condition for a relative maximum in the one-parameter case was J(θ̂) > 0. Now the information function is a two-by-two symmetric matrix whose entries are the negative second derivatives of l,
J11 = −∂²l/∂α²;   J12 = J21 = −∂²l/∂α∂β;   J22 = −∂²l/∂β².
For a relative maximum, the matrix J(α̂, β̂) must be positive definite; that is,
J11 > 0;   J22 > 0;   J11J22 − J12² > 0   (10.1.2)
where Jij = Jij(α̂, β̂). See Section 10.7 for a derivation of this result.
As in the one-parameter case, likelihoods are invariant under one-to-one parameter transformations. Often a parameter transformation will simplify the calculation of the maximum. The inverse transformation can then be applied to obtain the MLE's of the original parameters. It follows from the invariance property that, if γ = g(α, β), then γ̂ = g(α̂, β̂).

Calculation of (α̂, β̂)

Suppose that it is possible to solve the first equation S1(α, β) = 0 to obtain an algebraic expression for α in terms of β. Let α̂(β) denote the solution of this equation. This is the MLE of α given β; that is, α̂(β) is the value of α which maximizes l(α, β) when the value of β is assumed known. Substituting α = α̂(β) into the second equation gives
S2(α̂(β), β) = 0.
This equation can then be solved for β as in the one-parameter case. We illustrate this procedure in the two examples below.
The Newton-Raphson method, which is a generalization of Newton's method, is often useful. In Newton's method, an initial guess θ0 is improved using
θ1 = θ0 + S/J,
where S and J are evaluated at θ = θ0. In the two-parameter case, we have
(α1, β1)′ = (α0, β0)′ + J⁻¹S,   (10.1.3)
where the components of the score vector and information matrix are all evaluated at α = α0, β = β0. As with Newton's method, we apply (10.1.3) repeatedly until convergence is obtained, and then check that the condition (10.1.2) for a relative maximum is satisfied.
See Section 10.7 for a derivation of the Newton-Raphson method, and Section 10.5 for an example of its use.

EXAMPLE 10.1.1. Two objects with unknown weights µ1 and µ2 are weighed separately and together on a set of scales, giving three measurements X1, X2, and X3. It is known from previous experience with the scales that measurements are independent and normally distributed about the true weights with variance 1. Thus X1, X2, and X3 are independent random variables, with means µ1, µ2, and µ1 + µ2, respectively. Given observed values x1 = 15.6, x2 = 29.3, and x3 = 45.8, what are the maximum likelihood estimates of µ1 and µ2?
The joint p.d.f. of X1, X2, and X3 is the product of three normal p.d.f.'s:
f(x1, x2, x3) = (1/√(2π))³ e^(−(x1−µ1)²/2) e^(−(x2−µ2)²/2) e^(−(x3−µ1−µ2)²/2).
If the measurement intervals are small, L(µ1, µ2) is proportional to f, and the log likelihood function is
l(µ1, µ2) = −½(x1 − µ1)² − ½(x2 − µ2)² − ½(x3 − µ1 − µ2)².
The two components of the score function are
S1(µ1, µ2) = ∂l/∂µ1 = (x1 − µ1) + (x3 − µ1 − µ2);
S2(µ1, µ2) = ∂l/∂µ2 = (x2 − µ2) + (x3 − µ1 − µ2).
The second derivatives are
∂²l/∂µ1² = −2;   ∂²l/∂µ2² = −2;   ∂²l/∂µ1∂µ2 = −1,
and hence the information matrix is
J(µ1, µ2) = [ 2  1 ]
            [ 1  2 ].
This example is exceptional in that J(µ1, µ2) does not depend upon µ1 and µ2.
To determine (µ̂1, µ̂2), we solve the simultaneous equations
S1(µ1, µ2) = 0;   S2(µ1, µ2) = 0.
The first equation is
(x1 − µ1) + (x3 − µ1 − µ2) = 0,
and solving this for µ1 gives
µ̂1(µ2) = ½(x1 + x3 − µ2).
This would be the MLE of µ1 if we knew the value of µ2. In that case, x1 and x3 − µ2 are both estimates of µ1, and the MLE is their average.
Substituting µ1 = ½(x1 + x3 − µ2) into the second equation gives
(x2 − µ2) + x3 − ½(x1 + x3 − µ2) − µ2 = 0,
and solving for µ2 gives
µ̂2 = ⅓(2x2 + x3 − x1) = 29.6.
Finally, we obtain
µ̂1 = ½(x1 + x3 − µ̂2) = ⅓(2x1 + x3 − x2) = 15.9.
Since J11 = J22 = 2 and J11J22 − J12² = 3, condition (10.1.2) is satisfied, and therefore a relative maximum has been found.
When µ2 is unknown, x1 and x3 − x2 are both estimates of µ1. The MLE is a weighted average of these,
µ̂1 = ⅔x1 + ⅓(x3 − x2).
The second estimate x3 − x2 is less precise, and therefore receives only half the weight given to x1.
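Because l(µ1, µ2) is exactly quadratic in this example, the Newton-Raphson iteration (10.1.3) reaches (µ̂1, µ̂2) in a single step from any starting value. The following minimal sketch is illustrative code, not part of the text:

```python
import numpy as np

x1, x2, x3 = 15.6, 29.3, 45.8

def score(mu):
    """Score vector S(mu1, mu2) for Example 10.1.1."""
    mu1, mu2 = mu
    return np.array([(x1 - mu1) + (x3 - mu1 - mu2),
                     (x2 - mu2) + (x3 - mu1 - mu2)])

info = np.array([[2.0, 1.0],   # information matrix; constant in this example
                 [1.0, 2.0]])

mu = np.array([0.0, 0.0])      # crude starting value
for _ in range(5):             # Newton-Raphson step (10.1.3): mu <- mu + J^{-1} S
    mu = mu + np.linalg.solve(info, score(mu))

print(mu)                      # [15.9, 29.6]; the first step already lands on the MLE
```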

EXAMPLE 10.1.2. The following are the results, in millions of revolutions to failure, of endurance tests for 23 deep-groove ball bearings:

17.88   28.92   33.00   41.52   42.12   45.60   48.48   51.84
51.96   54.12   55.56   67.80   68.64   68.64   68.88   84.12
93.12   98.64  105.12  105.84  127.92  128.04  173.40

The data are from page 286 of a paper by J. Lieblein and M. Zelen in J. Res. National Bureau of Standards (1956).
As a result of testing thousands of ball bearings, it is known that their lifetimes have approximately a Weibull distribution. From Section 6.4, the p.d.f. of this distribution is
f(x) = λβx^(β−1) exp{−λx^β}   for 0 < x < ∞,
where λ and β are positive parameters. We wish to find (λ̂, β̂) on the basis of the above 23 measurements.
The joint p.d.f. of n independent measurements X1, X2, ..., Xn from the Weibull distribution is
∏ f(xi) = λ^n β^n (∏ xi)^(β−1) exp{−λΣxi^β},
and hence the log likelihood function of λ and β is
l(λ, β) = n log λ + n log β + (β − 1)Σ log xi − λΣxi^β.
The components of the score function are
S1(λ, β) = ∂l/∂λ = n/λ − Σxi^β;
S2(λ, β) = ∂l/∂β = n/β + Σ log xi − λΣxi^β log xi.
The equation S1(λ, β) = 0 can be solved algebraically for λ, giving λ̂(β) = n/Σxi^β. This is the MLE of λ when β is assumed to be known.
To obtain β̂, we substitute λ = n/Σxi^β into the equation S2(λ, β) = 0 and solve for β. We have
g(β) = S2(λ̂(β), β) = n/β + Σ log xi − nΣxi^β log xi / Σxi^β.
The equation g(β) = 0 may be solved numerically using Newton's method:
β_new = β_old − g(β_old)/g′(β_old).
The derivative of g(β) with respect to β is
g′(β) = −n/β² − nΣxi^β(log xi)²/Σxi^β + n(Σxi^β log xi)²/(Σxi^β)².
In this example we have n = 23 and Σ log xi = 95.46. Taking β = 1 as the initial guess, we obtain
Σxi^β log xi = 7312;   Σxi^β(log xi)² = 32572;
g(β) = 17.213;   g′(β) = −28.287;
β_new = β − g(β)/g′(β) = 1.6085.
Repeating the calculations with β = 1.6085 gives β_new = 2.0155. Continuing in

this fashion, we find that β̂ = 2.1021, correct to four decimal places. We then obtain
λ̂ = n/Σxi^β̂ = 9.515 × 10⁻⁵.
Owing to the large amount of arithmetic, use of a computer or programmable calculator is almost essential in this example.
The parameter λ does not represent a quantity of interest, and it is usually preferable to work with parameters (θ, β) where λ = θ^(−β). By (6.4.6), the c.d.f. of the Weibull distribution is
F(x) = 1 − exp{−λx^β} = 1 − exp{−(x/θ)^β}.
It follows that
P(X ≤ θ) = F(θ) = 1 − e⁻¹ = 0.63.
Thus the parameter θ is directly interpretable as the 0.63-quantile of the distribution.
Since the transformation from (λ, β) to (θ, β) is one-to-one, the MLE of θ can be computed from λ̂ and β̂. Since θ = λ^(−1/β), the invariance property gives
θ̂ = λ̂^(−1/β̂) = 81.88.

PROBLEMS FOR SECTION 10.1

1. Pea plants are classified according to the shape (round or angular) and color (green or yellow) of the peas they produce. According to genetic theory, the four possible plant types, RG, RY, AG, and AY have probabilities αβ, α(1 − β), (1 − α)β, and (1 − α)(1 − β), respectively, with different plants being independent of one another. The following table shows the observed frequencies of the four types in 500 plants examined:

Plant type           RG   RY   AG   AY
Observed frequency   276  104   94   26

Find the MLE's of α and β, and calculate estimated expected frequencies under the model.

2. (a) Let x1, x2, ..., xn be independent observations from N(µ, σ²), where µ and σ are both unknown. Show that
µ̂ = x̄;   σ̂² = (1/n)Σ(xi − x̄)².
(b) Show that
Σ(xi − x̄)² = Σxi² − nx̄² = Σxi² − (Σxi)²/n.

3.† Suppose that Y1, Y2, ..., Yn are independent normal variates with the same mean µ, but with different variances for i = 1, 2, ..., n, where a1, a2, ..., an are known positive constants.
(a) Show that µ̂(σ), the MLE of µ given σ, is the same for all possible values of σ.
(b) Derive expressions for µ̂ and σ̂.
(c) Show that J(µ̂, σ̂), the information matrix of (µ, σ) evaluated at the maximum, is positive definite.

4. Suppose that X1, X2, ..., Xn are independent normal variates with the same variance σ², but with different means, for i = 1, 2, ..., n, where b1, b2, ..., bn are known constants. Find expressions for the MLE of µ and σ².

5.† Two treatments A and B, with success probabilities α and β, are to be tested. Subjects are treated one at a time, and the result for one subject is known before the next subject is treated. The first subject receives treatment A. Subsequently, a subject receives the same treatment as the preceding subject if a success was observed, and the other treatment if a failure was observed. Testing continues until there have been m failures with each treatment. The following data come from such an experiment with m = 2:

Subject       1  2  3  4  5  6  7  8  9  10
Treatment A   S  S  F        S  S  S  F
Treatment B            S  F               F

(a) Show that, if α > β, then the expected number of subjects who receive treatment A is greater than the expected number who receive treatment B.
(b) Find the log likelihood function and MLE's of α and β based on the above data with m = 2. Generalize your results to the case of m failures with each treatment.

6. Suppose that Y1, Y2, and Y3 are independent Poisson variates with means µ1, µ2, and µ1 + µ2, respectively. Derive formulas for the maximum likelihood estimates (µ̂1, µ̂2) based on nonzero observed values y1, y2, y3.

7. The number N of eggs laid by a female robin has a Poisson distribution with mean µ. Each egg has probability θ of hatching, independently of other eggs. Given that n eggs were laid, the number Y which hatch has a binomial (n, θ) distribution.
(a) Find the joint probability function of N and Y.
(b) A biologist records ni, the number of eggs laid, and yi, the number which hatch, for k female robins. Find the log likelihood function and MLE's of µ and θ.

8.† The probability density function for an exponential distribution with guarantee time c is
f(t) = λe^(−λ(t−c))   for t > c,
where λ and c are positive constants. This distribution might be used as a model for the response time T in a computer system where there is a minimum response time c. Suppose that both λ and c are unknown, and that we have available n independent observations t1, t2, ..., tn from this distribution.
(a) Write down the likelihood function of ,land c, paying
careful attention to the
range of allowable values for ,l and c.
(b) Show that, for any given ,l, L(,l, c) increases as 10.2. Relative Likelihood and Con tour Map s
c increases. Hence find the
M LE's of c and A.
The joint relative likelihood function (RLF) of IX and
/3 is defined as follows:
9. Consider the situation described in Problem 9.1.9.
It 1s suggested that, while the R(a, /3) = L(a, /3)/ L(&, p).
geometric distribution applies to most specimens, a
fraction 1 - ,l of them have
flaws and therefore always fracture on the first blow. Note that 0 :s; R(IX, /3) :s; l, and R(&, '{J) = 1. As in the
one-p arame ter case, we
(a) Show that the propor tions of specimens fractur use r to denot e the natura l logari thm of R:
ing after one, two, three, and ·
four or more blows are, respectively,
r(IX, /3) =log R(IX, /3) =/(a, /3) - I(&, p).
1 - AO, W(l - 0), W 2 (1 - 0), ,l0 3 . The relative likelihood of param eter values (1X , /Jo)
0 is
(b) If x, specimens are observed in the ith catcgc•ry
(i =I, 2, 3, 4; I:x 1 = n), show Proba bility of the data when (IX, /3) = (ao, /Jo)
that
R(ao, /Jo)= Maxim um proba bility of the data
for any IX, /3.
{) = X3 + 2X 4 •
If R(1X 0, f3 0 ) is near 0, the pair (a. , {3 ) is impla usible
x2 + 2x 3 + 2x4 ' 0 0
pairs of param eter values such that the data are
becau se there exist other
(c) Comp ute estimated expected frequencies for the much more proba ble. The
data given in Problem 9.1.9 joint RLF R(IX, /3) ranks pairs of param eter values
and comment on the fit of the model. accord ing to t.he1r
plausibilities in light of the data.
The 100p% likelihood region is the set of param eter
10.tn individuals are randomly selected. Blood serum values (IX, /3) such that
certain chemical compo und and observed for a time
from each is mixed with a R(IX, /3) ~ p. The curve R(a, /3) = p which forms
Tin order to record the time the bound ary of this region is
at which a certain color change occurs . It is observed called the 100p% likelihood contour.
that m individuals respond We may think of R(a, /3) as a "mou ntain" of likelih
at times 11 , t,. .. ., t,., and that the remaining n- m have ood sitting on the (a, /3)
shown no response at the plane (see Figure 10.2.1). Its maxim um value 1
end of the observation period T. The situation is though o.ccurs at (~, /3) =(ii:, PJ. A
t to be describable by a
probability density function ,le_,, (t > 0) for a fractio conve nient way to draw R(cx, /3) in two dimen sions
n p of the population, and is by plotti ng conto urs of
complete immunity to the reaction in the remain consta nt relative likelihood in the (a, {3) plane . This
ing fraction 1 - p. Find the produ ces a conto ur map
maximum likelihood equations, and indicate how these similar to those used in geogr aphy and meteorology
can be solved for p and .l.. . Usual ly the conto urs
will form a nested set of closed curves, rough ly ellipti
11. The lengths of the gestati on period s for 1000 cal in shape .
females are summarized in the
following table: EXAMPLE 10.2.1. In Exam ple 10.1.l, the log likelih
ood functi on of the
param eters µ 1 and µ 2 was found to be
Interval (days) Frequency Interval (days) Frequency
249.5-264.5
/(µl, µ2) = -t(l 5.6 - µi) 2 - t(29.3 - µ 2) 2 - 1(45.8 - µI - µ2) 2.
6 284.5- 289.5 176
264.5- 269.5 27 289.5- 294.5 135
269.5- 274.5 107 294.5- 299.5 34
274.5- 279.5 198 299.5- 304.5- 4
279.5-284.5 312 304.5-309.5 1
Suppose that the length of the gestation period is norma
lly distributed with mean
µ and variance 2
<J .

(a) Obtain approx imate values of µ and 0- 2 by taking


all times to be at the
midpoints of their intervals (6 observations of 262,
27 observations of 267,
etc.), and using the formulas from Problem 10.l.2(a
). Compute estimated
expected frequencies, and comment on the fit of the
model to the data.
(b) Wnte down the exact likelihood function ofµ
and <J in terms of the N(O, 1)
probability integral, and indicate how µ and 0- could
be determined exactly .
Figure 10.2.1. Two-parameter relative likelihood function.
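As a small illustration of likelihood regions (not part of the text), the 10% region for Example 10.2.1 can be located by evaluating R(µ1, µ2) over a grid; projecting the region onto each axis gives the 10% maximum likelihood intervals discussed below. A minimal sketch:

```python
import numpy as np

x1, x2, x3 = 15.6, 29.3, 45.8

def r(mu1, mu2):
    """Log relative likelihood for Example 10.2.1 (maximum 0 at (15.9, 29.6))."""
    l = -0.5 * ((x1 - mu1)**2 + (x2 - mu2)**2 + (x3 - mu1 - mu2)**2)
    l_hat = -0.5 * (0.3**2 + 0.3**2 + 0.3**2)
    return l - l_hat

m1, m2 = np.meshgrid(np.linspace(13, 19, 601), np.linspace(26, 33, 701))
inside = np.exp(r(m1, m2)) >= 0.10          # points in the 10% likelihood region

print(m1[inside].min(), m1[inside].max())   # about 14.15 and 17.65
print(m2[inside].min(), m2[inside].max())   # about 27.85 and 31.35
print(np.exp(r(14.15, 27.85)))              # about 1e-4: this corner pair is implausible
```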

Since µ̂1 = 15.9 and µ̂2 = 29.6, its maximum is
l(µ̂1, µ̂2) = −½((0.3)² + (0.3)² + (0.3)²) = −0.135.
Hence the log relative likelihood function is
r(µ1, µ2) = l(µ1, µ2) + 0.135.
The 50%, 10%, and 1% likelihood contours are shown in Figure 10.2.2. For instance, the 10% contour is given by r(µ1, µ2) = log 0.1; that is,
−½(15.6 − µ1)² − ½(29.3 − µ2)² − ½(45.8 − µ1 − µ2)² + 0.135 = log 0.1.
This is the equation of an ellipse centered at (µ̂1, µ̂2). The 10% likelihood region is the set of all parameter values lying on or inside this ellipse.
The broken lines in Figure 10.2.2 show the outer limits of the 10% likelihood region. For all points (µ1, µ2) in the 10% likelihood region we have 14.15 ≤ µ1 ≤ 17.65 and 27.85 ≤ µ2 ≤ 31.35. These are called 10% maximum likelihood intervals for µ1 and for µ2 (see Section 10.3), and parameter values outside these intervals are implausible.
Note that, although 14.15 and 27.85 are within the 10% intervals for µ1 and µ2, the pair of values (14.15, 27.85) is extremely implausible. It is possible that µ1 might be as small as 14.15, but if it is, then µ2 is likely to be larger than 27.85. The axes of the elliptical contours are not parallel to the coordinate axes, and for this reason we cannot estimate µ1 and µ2 independently of one another. See Section 10.3 for further discussion.

EXAMPLE 10.2.2. Consider the lifetime data from a Weibull distribution in Example 10.1.2. We shall work with the parameters (θ, β), where θ is the 0.63-quantile and β is the shape parameter. We obtain l(θ, β) by substituting λ = θ^(−β) into the expression for l(λ, β) in Example 10.1.2:
l(θ, β) = −nβ log θ + n log β + (β − 1)Σ log xi − θ^(−β)Σxi^β.
We showed that β̂ = 2.1021 and θ̂ = 81.88, so the maximum of the log likelihood function is
l(θ̂, β̂) = −113.691.
The log relative likelihood function of θ and β is then
r(θ, β) = l(θ, β) + 113.691.
Perhaps the simplest way to construct a contour map is from a tabulation of R(θ, β) = e^(r(θ,β)) over a lattice of (θ, β) values. Table 10.2.1 gives values of R(θ, β) near the maximum, and the curve R(θ, β) = 0.5 is sketched in. This is the innermost curve on the contour map of Figure 10.2.3. The 10% and 1% contours can be found in a similar way from a tabulation of R(θ, β) over a larger region.
The value β = 1 is of special interest, since for β = 1 the Weibull distribution simplifies to an exponential distribution. Note that the line β = 1 lies entirely outside the 1% contour in Figure 10.2.3. If β = 1, there does not exist a value of θ for which R(θ, β) ≥ 0.01; in fact, the maximum of R(θ, 1) is about 0.0004. It is therefore highly unlikely that β = 1, and the simpler exponential distribution model is not suitable for these data. Since β > 1, the ball bearings are deteriorating with age (see Section 6.4).
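The numerical work behind β̂ = 2.1021, λ̂ = 9.515 × 10⁻⁵, and θ̂ = 81.88 takes only a few lines of code. The following sketch is illustrative (not from the text); it applies Newton's method to g(β) as in Example 10.1.2 and then uses the invariance property:

```python
import numpy as np

# Ball-bearing lifetimes (millions of revolutions) from Example 10.1.2
x = np.array([17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.48, 51.84,
              51.96, 54.12, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12,
              93.12, 98.64, 105.12, 105.84, 127.92, 128.04, 173.40])
n, lx = len(x), np.log(x)

def g(b):
    """g(beta) = S2(lambda_hat(beta), beta), the score for beta after eliminating lambda."""
    xb = x ** b
    return n / b + lx.sum() - n * (xb * lx).sum() / xb.sum()

def g_prime(b):
    xb = x ** b
    return (-n / b**2 - n * (xb * lx**2).sum() / xb.sum()
            + n * ((xb * lx).sum() / xb.sum()) ** 2)

beta = 1.0                        # initial guess, as in the text
for _ in range(10):               # Newton's method: beta <- beta - g/g'
    beta -= g(beta) / g_prime(beta)

lam = n / (x ** beta).sum()       # MLE of lambda given beta
theta = lam ** (-1 / beta)        # invariance: theta_hat = lambda_hat^(-1/beta_hat)
print(beta, lam, theta)           # roughly 2.102, 9.5e-05, 81.9
```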
Figure 10.2.2. Contour map for R(µ1, µ2) in Example 10.2.1. The broken lines show 10% maximum likelihood intervals.

Table 10.2.1. Relative Likelihood Function R(θ, β)

          θ = 72    75     78     81     84     87     90     93
β = 2.6     0.019  0.066  0.155  0.261  0.338  0.351  0.306  0.230
    2.5     0.047  0.136  0.275  0.418  0.501  0.49?  0.416  0.307
    2.4     0.100  0.245  0.437  0.605  0.679  0.641  0.5?5  0.383
    2.3     0.184  0.387  0.619  0.791  0.839  0.764  0.613  0.443
    2.2     0.291  0.539  0.783  0.934  0.945  0.835  0.660  0.474
    2.1     0.400  0.661  0.885  0.994  0.967  0.835  0.653  0.469
    2.0     0.477  0.715  0.890  0.952  0.897  0.761  0.591  0.427
    1.9     0.493  0.679  0.796  0.817  0.750  0.62?  0.4?1  0.354
    1.8     0.441  0.565  0.630  0.625  0.563  0.468  0.364  0.267
    1.7     0.341  0.411  0.439  0.424  0.377  0.312  0.244  0.181

(Entries marked "?" are obscured by the sketched 0.5 contour in the printed table.)
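Table 10.2.1 can be reproduced by tabulating R(θ, β) over the lattice. The sketch below is illustrative (not from the text); it normalizes l(θ, β) using the estimates quoted above, so its value at (81.88, 2.1021) is approximately −113.691:

```python
import numpy as np

x = np.array([17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.48, 51.84,
              51.96, 54.12, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12,
              93.12, 98.64, 105.12, 105.84, 127.92, 128.04, 173.40])
n, slx = len(x), np.log(x).sum()

def loglik(theta, beta):
    """Weibull log likelihood l(theta, beta) in the (theta, beta) parametrization."""
    return (n * np.log(beta) - n * beta * np.log(theta)
            + (beta - 1) * slx - np.sum((x / theta) ** beta))

l_max = loglik(81.88, 2.1021)          # approximately -113.691

for beta in np.arange(2.6, 1.65, -0.1):
    row = [np.exp(loglik(theta, beta) - l_max) for theta in np.arange(72, 94, 3)]
    print(f"{beta:3.1f}", " ".join(f"{r:5.3f}" for r in row))
# Each printed row should agree with Table 10.2.1 to rounding.
```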

f3
both samples, and x 4 in neither sample. He assumes that each individual
independently has a probability <P of survival between sampling periods, and a
3.0 I probability p of being caught in any sample if it is alive at the time of the sample.
--- - -- - ~---- ----
1
I (a) Show that the probabilities of the four classes of recapture are a( l - a), af3 , a 2 ,
I and I - a - ap, respectively, where a= </Jp and P= </J(I - p).
2.5 (b) Show that
P=-X_2_(1-&)·
x 2 +x 4 &. '
a= X1 +2x3
n+x 1 + x 3
2.0 (c) Suppose that the observed frequencies are 15, 11 , 9, and 29, respectively. Find
I the MLE's of </J and p, and compute estimated expected frequencies.
I (d) Find 10% maximum likelihood intervals for <P and p based on the data in (c).
I
I
I 3. Suppose that, in Example 10.1.2, testing had stopped at 75 million revolutions. The
1. 5
."'::::-~~=--=-::--= ______ .!._ - --- last 8 lifetimes would then have been censored. Thus we would have m = 15 failure
times x 1 = 17.88, ... , x 15 = 68.88 and 8 equal censoring times T1 = · ·· = T8 = 75.
Find the MLE's of() and p and prepare a contour map similar to Figure 10.2.3.
1.0-r----.~---.---.,----r---'-~---4
What effect does the censoring have on the estimation of() and {J?
e
60 72 84 96 108 4. Eighteen identical ball bearings were placed in test machines and subjected to a
Figure 10.2.3. Contours of constant relative likelihood for the Weibull distribution fixed radial load. The following are the numbers of hours the individual bearings
parameters m Example 10.2.2. endured at 2000 r.p.m.

183 355 538 618 697 834 862 887 1056


1147 1351 1506 1578 1607 1683 1710 2020 2410
Note on Calculation
The lifetimes are modeled as independent observations from a Weibull distribution
with c.d.f.
In the preceding example we constructed the contour map by tabulating
R(8, /J) over a lattice of (8, .8)-values. A large table of values, and therefore a F(x) = 1 - exp { - (x/B)P} for x > 0
great deal of arithmetic, may be needed to produce an accurate contour map. where () and Pare positive.
An alternative procedure is to use Newton's method to solve for points on a (a) Find &and 'fl.
particular contour. See Section 10.7 for details. (b) Plot the 10% likelihood contour, and obtain the 10% maximum likelihood
Before computing contours, the possibility of transforming parameters interval for each parameter.
should be considered. The calculation and interpretation of the contour map (c) Would it be reasonable to assume an exponential distribution model for these
will be easier if. the axes of the contours are roughly parallel to the parameter data?
axes. This is an additional reason for working with (8, {J) rather than (2, {J) in
the Weibull distribution example.

· 10.3. Maximum Relative Likelihood


PROBLEMS FOR SECTION 10.2
It may be that, although the probability model involves two unknown
1. Find the log relative likelihood function of a and pin Problem 10.1.1. Plot the 10%
likelihood contour, and obtain 10% maximum likelihood intervals for a and p parameters a. and {J, only one of them, say p, is of real interest. The joint RLF
ranks pairs of values (et., .8) according to their plausibilities in the light of the
2.t A zoologist wishes to investigate the survival offish in an isolated pond during the data. However, what we would like is a summary of the information
winter. The population may change by death during the period, but not by birth, concerning parameter .8 only.
immigration, or emigration. He catches n fish, marks them, and returns them. On
The maximum relative likelihood function of f3 is obtained by maximizing
two subsequent occasions be takes a sample of fish from the pond, observes which
of his marked fish are in the sample, and returns them. He finds th at x of the R(ix, {J) over et. with .8 fixed :
1
marked fish are caught in the first sample only, x 2 in the second sample only, x in Rmax(.8) =max R(ct, p) = R(&.({J), ,8).
3
a

Here &:(/3) is the MLE of a given [3, which may be found by solving the and the 10% interval is 14.15 :s; µ 1 :s; 17.65. These intervals are shown in
equation S 1 (a, [3) = 0 (see Section 10.1). The natural logarithm of Rmax is Figure 10.2.2.

r mul/3) = r(&(/J), /3) = 1(£(/3), /3)-1(&:, m, (10.3.1) EXAMPLE 10.3.2. Consider the analysis of failure times from a Weibull
distribution, as previously discu.ssed in Examples 10.1.2 and 10.2.2. The
which is the difference between the restricted maximum of /(ct, /3) with /3 fixed log RLF of A. and f3 is
and the unrestricted maximum.
The joint RLF can be pictured as a mountain of likelihood sitting in the r(A, /3) == n log A+ n Jog /J + ({3- l)L Jog Xi - Xi:xf + 113.691.
(a, {J) plane (see Figure 10.2.1). The maximum RLF of f3 is the profile or From Example 10.1.2, the MLE of A. given f3 is 1(/3) = n/.Lx f . Hence the
silhouette of R(a, {3) when it is viewed from a distant point on the a-axis. maximum log RLF of f3 is
Similarly, Rmax(a) is the silhouette of the likelihood mountain when it is
viewed from a distant point on the /J-axis. rmaxlf3) = r(A(/1), {3)
The properties of Rmax ({3) are simil ar to those of a one-parameter RLF. For = n log(n/.Lxf) + n log f3 + (/3- l)I: log x 1 - n + 113.691.
instance, we have
This function is plotted with a solid line in Figure 10.3.1. The broken line
shows the normal approximation' to rmax(/1) (see Section 10.4).
If Rmaxlf3 0 ) is near 0, there does not exist a parameter value a0 such that the The 10% maximum likelihood interval for f3 is 1.45 :s; f3 :s; 2.86. This can be
pair (a 0 , {3 0 ) is plausible, and hence {3 0 is an implausible value of fl. On the obtained from Figure 10.3.1, or from Figure 10.2.3, or by the numerical
other hand, if Rm.. (/3 0 ) is near 1, then there exists at least one plausible pair of methods described in Section 9.8.
values (a 0 , {J 0 ), and thus /Jo is not an implausible value of {3. Next we find the maximum log RLF of e, the 0.63-quantile of the
The 100p% max imum likelihood interval (or region) for f3 is the set of all /3 distribution. The joint log RLF of(} and f3 is
values fo r which Rmaxl/3) ~ p. This interval contains those f3 values such that, r(8,f1)= -nf31og8+nlogf3+(f3- 1).Lx 1 -f} - P.Lxf + 113.691.
for some a, the pair (a, {3) belongs to the 100p% li.kelihood region. Ten percent
maximum likelihood intervals are shown with broken lines in Figures 10.2.2 We find p(O), the MLE of /3 given(}, by solving the equation S2 ((}, {3) = 0. Then
and 10.2.3. we obtain
rmax(f}) = r(8, p(8)).
EXAMPLE 10.3.1. Consider the situation described in Examples 10.1.1 and
Numerical methods are required to solve for P(8). For instance, when(}= 80
10.2.1. The joint log likelihood function is
we find by Newton's method that S 2 (80, /J) = 0 for f3 = 2.0764. Thus
2 2
1(µ1, µ1) = - t(x 1 - µi) 2 - ·H x2 - µ2) -i(x3 - µi - µ2) P(80) = 2.0764, and
and the MLE's are rmax<.B)
1.5 2.0 2.5 3.0
0 .8
From Example 10.1.1, the MLE of µ 1 given µ 2 is
f.t1 (µ2) = ·Hx 1 + X 3 - µ2).
-I '\ \
The maximum log RLF of µ 2 is \
\
rmax(µz) = l(j.L1 (µ2), µz) - l(j.L1 , f.t2). \
\
-2 Rmax(tJ)• 0.1 \
After substitution and simplification, we obtain \
I
2
r ma.Cµ2) = -i(µ2 - f.t2) .
-3
I
I
Taking r max (µ 2) ~ log 0.1 gives the 10% maximum likelihood interval I
I
27.85 :s; µ 2 :s; 31.35. Similarly, we find that
Figure 10.3.1. Maximum log RLF fo r fJ in the Weibull distribution example. The
rmax(µi) = -i(µ1 - f.ti) 2, normal approximation is shown with a broken line.
68
10. Two-P arame ter Likelihoods
10.3. Maxim um Relative Likelihood
69
rmax( e)
60 80 By careful desig n of the exper imen t and choic
0 100 e of the param eters , it may be
possi ble to arran ge that (10.3.2) is true, at
least appro xima tely. We can then
treat the two-p aram eter probl em as a pair
of one-p aram eter probl ems, thus
simpl ifying both the analy sis and the interp
retati on. Adva nce plann ing to
-1
I
h
h
" \
\
achie ve facto rizati on of the likeli hood funct
ion becom es progr essiv ely more
\ impo rtant as the numb er of unkn own param
I
I \
\
eters incre ases.
I \
I \
-2 \ EXAM PLE 10.3.l (cont inued ). The
I Rmax (e) • 0. I \ joint log likeli hood funct ion of µ and
\ conta ins a produ ct term µ µ , and hence 1 µ2
I th·: likeli hood funct ion does not
1 2
I facto r into a funct ion ofµ times a funct ion
I 1 of µ 2 . As Figur e I 0.2.2 show s, the
-3 I range of plaus ible value s for µ depen ds
I 1 upon the value of µ 2 . In parti cular,
I the most likely value of µ is ·
I 1

Figure 10.3.2. Maximum log RLF for 8 in jJ.i(µ2) = t(x1 + X3 - µ2)


the Weibull distribution example. The
normal approximation is shown with a broke which decre ases as µ 2 incre ases.
n line.
Supp ose that we work inste ad with param
eters (8 1 , 82 ) wher e 8 1 = µ1 + µ2
r max(80) = r(80, 2.0764) = -0.024. and 82 = µ 1 - µ 2 . Their MLE 's are
Simil arly, we find that S (70, /3) = 0 for
2 f3 = 1.8810, and Bl= ii1 + ii2 =t(x 1+X z+2x 3);
r max (70) = r(70, 1.8810) = - 1.023. B2= ii1-i iz=X 1-X2 .
Figur e 10.3.2 show s rmax(8) and the norm Upon subst itutin g µ 1 = (8 + 8 )/ 2, µ
al appro xima tion from Secti on 10.4. 1 2 = (8 1 - 82 )/ 2 and simpl ifying we find
The 10% maxi mum likeli hood interv al that the log RLF of 8 1 and 8 is
2
is 64.2::;; 8::;; 103.l . 2

r(8 1 , 82 ) = -;i(8 1 - Oi) 2 -!(8 2 - 82 ) 2


Fact oriz ation
= rmax(8i) + rmax(B2).
Supp ose that the joint likeli hood funct ion The log relati ve likeli hood of any pair
of 11. and f3 facto rs into a funct ion of of value s (µ 1 , µ 2 ) can be found by
11. times a funct ion of /J: comp uting the corre spon ding value s of 8
1 and 82 and then summ ing r m,.(8 )
and rm.,(8 2 ). 1
L(11., /3) = g(11.) • h(/3) for all 11., f3. (10.3.2)
Then a(/3) is not a funct ion of /3, and
PROBLEMS FOR SECTION 10.3
Rm.,(/3) = L(a(/3), /3)/L(a, {J) = h(/3)/ h({J). 1. Derive the maximum RLF's of ex and
f3 in Problem 10.1.1, and find the 10%
It follows that maximum likelihood interval for ex ..
R(11., /3) = Rm .. (11.). Rm .. (/3); 2.t(a) Let X and Y be independent Poisso
n variates with means µ and A.µ,
r(11., /3) = rmax(11.) + rmax(/3). respectively. Derive the maximum RLF of A..
(b) In a certain city there were 47 murders durin
Whe n (10.3.2) holds , ex and f3 can be estim g the year prior to abolition of the
ated indep enden tly. Grap hs of death penalty. There were 57 murders the year
rmax(11.) and r max(/3) will then provi de a comp after abolition. Assuming these
lete summ ary of the infor matio n to be observed values of independent Poisson
conc ernin g (ex, /3), and it is not neces sary variates with means µ and A.µ,
to consi der a conto ur map. find the 10% maximum likelihood interval for
A.. Is it plausible that the murder
If (10.3.2) does not hold, the range of plaus ible rate has not changed?
value s for one param eter
will depe nd upon the value of the other
param eter. This infor matio n cann ot 3. Let X 1 , X 2 , .. ., X. and Y , Y , •. ., Y,, be indep
be recov ered from just the maxi mum 1 2 endent exponential variates. The X/s
RLF' s and a conto ur map will be have mean e and the Y/s have mean W where
requi red. A and {) are positive unknown
parameters.

(a) Derive expressions for ~ and e. about {) = 8. A similar derivation in the two-parameter case gives
(b) Show that the maximum RLF of). is
r(a, /3) ~ -t(a - &)2 .J 11 -J(/3 - P) 2 .J22 - (a - &)({J - p).J 12,
Rmax(..l.)=2
2
n(fx)"(1 + fxr n. 2
(10.4.1)
where .J1i =§ii(&, fJ) as in Section 10.1. If we take
(c) The following are the observed survival times for 12 subjects:
Treatment A:
Treatment B:
9 186 25 6 44
l 18 6 25 14
115
45
{)= [;J J(B) = J(&, lJ) = [~ 11
f12
Survival times are modeled as independent exponential variates with mean 8 the approximation may be written
for treatment A and mean ).() for treatment B. Obtain the 10% maximum
likelihood interval for ..l.. Do these data clearly demonstrate the superiority of r(e) ~ -t(e - ey J(B)(e - 8),
the first treatment?
which shows its similarity to the one-parameter result.
2
4. Let x 1 , x 2 , ... , xn be independent observations from N(µ, 11 ) where bothµ and,,. When (10.4.1) applies, the likelihood contours are close to ellipses centered
are unknown. at (&, f3). As in the one-parameter case, the normal approximation is not
(a) Show that the maximum relative likelihood function ofµ is invariant, and a one-to-one transformation from (a, {J) to new parameters
(µ, v) may substantially improve its accuracy. The information matrix for the
-[L(X;-µ)2]-n/2
Rmax(µ) - ,
-[ 1 + (jl-µ)2]-n/2
2 -,- - new parameters is
n11 ,,.
for - oo < µ < oo. Hence show that the 100p% maximum likelihood interval J,(µ, v) = Q' J(&, lJ)Q. (10.4.2)
for µ has the form Here Q is the two-by-two matrix of derivatives of the old parameters with
µE fl± ca respect to the new:
where c is a function of p and n.
(b) Show that the maximum relative likelihood function of,,. is [oa/oµ Q= oa/ov]
of3/oµ ofJ/ov ·
for 11> 0.
We evaluate Q at the MLE to obtain Q. The proof of (10.4.2) is similar to the
5. Find the maximum RLF of µ 2 in Problem 10.1.6. proof of the one-parameter result (9.7.4).
Differentiating (10.4.1) with respect to a gives an approximation to S 1 (a, {J),
6.t Find the maximum RLF's of). and c in Problem 10.1.8.

7. Find the maximum relative likelihood function for 8 in Problem 10.1.9. Show that S1 (a, /3) ~ - (a - &).J 11 -(/3- lJ).J 12,
this is the same as the relative likelihood function for 8 based on the conditional
distribution of X 2 , X 3 , and X 4 given X 1 •
which is linear in a and /J. Setting this equal to zero and solving for a gives

8.t Find the maximum relative likelihood function for). in Problem 10.1.10. (10.4.3)
9. Show that a one-to-one parameter transformation from (r:x, {J) to (y, {3) does not If we now substitute for a in (10.4.1) and simplify, we obtain the following
affect the maximum RLF of /3. The maximum RLF of f3 can be found by normal approximation to r max(/3):
maximizing the joint RLF of r:x and f3 over r:x, or by maximizing the joint RLF of y
and f3 over y. (10.4.4)

This has the same form as the normal approximation which we derived in
Section 9.7 for the one-parameter case. The quantity in square brackets is
10.4. Normal Approximations positive by (10.1.2).
The inverse of the information matrix is
In Section 9.7 we derived the normal approximation
r(e) ~ -t(e - 8)2 J(B)
by ignoring cubic and higher terms in the Taylor's series expansion of/({))
and the (2, 2)-elemen t of the inverse is
- 22 - must first evaluate J(B, '/3). An expressio n for 1(8, fl) is given in Example
.~ -
-J1 i/(J- 11J22
-
- J- 12)
2 - - 2
= (J22-J1 -
2/J11) - 1
. 10.2.2. We find the second derivative s of /(8, fl) and change their signs to get
Thus the normal approxim ation (10.4.4) can also be written as e
J(8, /J). Substituti ng 8 = and fJ = iJ then gives
- -
rmax(/3)- t ({3- p)
1) 2 /§- 22 .
(10.4.5) J(lJ
'
'/3) = [ 0.01516
-0.13046
-0.13046 J-
EXAMPLE 10.4.L Consider the normal distributio n example of the preceding 10.379
three sections. The log likelihood function of µ and µ is From this we compute
1 2

/(µ1, µz) = -t(x1 - µ1) 2 -t(x2 - µ2) 2 -t(x3 - µ1 - µ2) 2, .Jll -.Jf2/.l2 2 = 0.01352;
which is a second-de gree polynomi al in µ and µ • As a result, the approxi- and then (10.4.4) gives
1 2
matio_ns (10.4.1), (10.4.3), and (10.4.4) hold exactly. Since .J =.J = 2
and f 12 =1, we have 11 22 rm .. (8):::::: -t(8-8L8 8)2(0.01 352);

r(µ1' µ2) = -(µ1 -f!i) 2 -(µ z -f!2) 2 -(µ1 -fl1Hµ2 -ft2). r mu(/3) :=:::: -i(/3 - 2.1021) 2(9.256).
The contours of constant relative likelihood are ellipses as shown in Figure fhese functions are plotted as broken curves in Figures 10.3. l and 10.3.2. The
10.2.2. Also, since · agreemen t is not too bad. The normal approxim ations give 10% maximum
likelihood intervals 63.4 :-:;; 8 :-:;; 100.3 and 1.40 ~ f1:-:;;2.81, while the exact
.J 11 - .Jf 2/.J 22 = .J22 -.J Li.J 11 = !,
the maximum log RLF's are
results are 64.2 :-:;; e:-:;
103.l and 1.45 :-:;; f3 :-:;; 2.86.

PROBLEMS FOR SECTION 10.4


In Example 10.3.l we transform ed parameter s from (µ , µ ) to (8 , 8 )
1 2 1 2 I. In Problem 10.2.4, evaluate the information matrix J(O, 'iJ). Find approximate
where µ 1 = (8 1 + 8 2 )/2 and µ 2 = (8 1 - 8 2)/ 2. Differenti ating the µ;'s with
10% maximum likelihood intervals for 8 and fl , and compare them with the exact
respect to the 8/s gives
results.

2.t (a) Evaluate J(ji., A) and find an approxima te 10% maximum likelihood interval
for A in Problem I0.3.2(b).
(b) Transform parameters from (µ, A) to (a, fl) where a = logµ and fJ =log )..
The derivative s are not functions of the parameter s because the transform -
Calculate the information matrix .I.(&, fl). Obtain an approxima te 10%
ation is linear. Thus Q = Q, and (10.4.5) gives
maximum LI for fJ and transform it to give an interval for A.

-t
-!]'[2 l
t]
-± = [!0
(c) Compare the results of (a) and (b) with the exact 10% interval. Does the
logarithmic transformation seem to improve the normal approxima tion?

The log likelihood function is a second-de gree polynomi al in 8 and 8 , and 3. Prove result (10.4.2) for transforming informatio n matrices.
1 2
the approxim ations hold exactly. From (10.4. l) and (10.4.4) we obtain
4. Consider a one-to-one parameter transformation from (tX, {3) with information
r(81, 82) = -i(81 -81) 2 -t(8 2 - B2)~;
matrix .} = .f(&, 'i3) to (y, fl) with information matrix .J. = .f.(y, /3).
(a) Show that

EXAMPLE 10.4.2. Figure 10.2.3 shows contours of constant relative likelihood - _


§, 1
= [a
0
for paramete rs (8, {3) in the Weibull distributio n example. The 10% and l %
contours are not elliptical in shape, and a sample size larger than n = 23 is ay ay .
where a= - and b = - are evaluated at the maximum.
needed before the normal approxim ation (10.4.1) will give accurate results. aa ap
The maximum log RLF's of fJ and 8 are shown as solid curves in Figures (b) Show that .}; 2 = .} 22 . Hence the normal approximation (10.4.5) does not
10.3.l and 10.3.2. To obtain the normal approxim ations to these curves, we depend upon whether we work with parameters (tX, /3) or with parameters
(y, fl).
(c) Show that
p(d)
j ! 1 = ai.} 1 1 + 2ab.J 1 2 + b2 .J 2 2
a1 J 21 - 2ah.1 11 +b 1.f11
)11)12 -Ji2
Hence it is possible to approximate rmu(y), where y = g(o:, /3), by using just the
t ----
results computed for o: and {3.
5.* An alternative method for deriving a normal approximation to rm .. (/3) is to
expand rmu (/3) about /3 = ~ and then ignore cubic and higher terms. Show that
this procedure also leads to the approximation (10.4.2).
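In Example 10.4.2, J(θ̂, β̂) is obtained by differentiating l(θ, β) analytically. The same numbers can be checked numerically; the sketch below is illustrative only (it uses central differences rather than the text's formulas) and reproduces J(θ̂, β̂) and the approximate 10% interval for β:

```python
import numpy as np

x = np.array([17.88, 28.92, 33.00, 41.52, 42.12, 45.60, 48.48, 51.84,
              51.96, 54.12, 55.56, 67.80, 68.64, 68.64, 68.88, 84.12,
              93.12, 98.64, 105.12, 105.84, 127.92, 128.04, 173.40])
n, slx = len(x), np.log(x).sum()

def l(theta, beta):
    return (n * np.log(beta) - n * beta * np.log(theta)
            + (beta - 1) * slx - np.sum((x / theta) ** beta))

t, b, h = 81.88, 2.1021, 1e-3

def d2_mixed(f, a, c, h):
    """Central-difference estimate of the mixed second derivative."""
    return (f(a + h, c + h) - f(a + h, c - h)
            - f(a - h, c + h) + f(a - h, c - h)) / (4 * h * h)

J = -np.array([
    [(l(t + h, b) - 2 * l(t, b) + l(t - h, b)) / h**2, d2_mixed(l, t, b, h)],
    [d2_mixed(l, t, b, h), (l(t, b + h) - 2 * l(t, b) + l(t, b - h)) / h**2]])
print(J)                                    # close to [[0.0152, -0.130], [-0.130, 10.38]]

c = J[1, 1] - J[0, 1]**2 / J[0, 0]          # quantity in (10.4.4); about 9.26
half_width = np.sqrt(-2 * np.log(0.1) / c)  # approximate 10% interval half-width
print(b - half_width, b + half_width)       # about 1.40 and 2.81
```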

Figure 10.5.l. A typical dose-response curve.

10.5. A Dose-Response Example


Probit Model
Suppose that a drug is administered ink different doses d 1 , d2 , •. . , dk. Dosage
is usually taken to be the log concentration of the active ingredient, so that Suppose that the tolerance D is normally distributed with mean µ and
d-> - oo as the concentration approaches zero. Suppose that each subject variance a 2 . Then
either responds to the drug or does not respond, so that the response is
quanta! (all or nothing). For instance, when an insecticide is applied, insects p(d) = P(D ~ d) = P( Z ~ d: µ) = F(r:x. + f3d)
either respond (die) or do not respond (survive). When a beneficial drug is
administered, an improvement in the patient's condition might be taken as a where r:x. = - µ/(J, /3 = 1/a, and Fis the standardized normal c.d.f. (6.6.3). This
response, and a lack of improvement as no response. can also be written
Let p(d) denote the probability of response for a subject who receives dose
r 1
(p) = a + f3d
d of the drug. We expect p(d) to be a smooth nondecreasing function of d.
To simplify matters, we assume that p(d) _. 0 as d-> - oo, and p(d)-> 1 as where p-i is the inverse of the N(O, l) c.d.f., and is called a probit dose-
d-> + oo; that is, we assume that no subjects will respond if the dose is very response model.
small, and all subjects will respond to a very large dose. These assumptions
are not always reasonable. There may be some subjects who would respond Logistic Model
naturally without the drug, and others may be immune to the drug. For dis-
cussion of these situations, see D.J. Finney, Probit Analysis, 3rd edition
The logistic distribution is similar in shape to N(O, 1) and has c.d.f.
(1971), published by the Cambridge University Press.
When these assumptions hold, the dose-response curve will be as shown in l
G(z) = 1- - - for - oo < z < oo.
Figure 10.5.1, and p(d) has the same mathematical properties as the c.d.f. of a 1 + e'.
continuous distribution. An advantage of the logistic distribution is that its c.d.f. can be evaluated
This result can also be obtained by imagining that different subjects have without numerical integration. Replacing F by G in the above derivation
different tolerances to the drug. Let D represent the minimum dose required gives
to produce a response in a randomly chosen subject. A dosed will produce a
1
response if and only if the tolerance of the individual is at most d. Thus the p(d) = G(r:x. + /Jd) = l -
1 + ea+pd
. (10.5.1)
probability of a response at dose d is
Solving G(z) = p gives z = log-P-, and hence the model may be rewritten
p(d) = P(D ~ d) = F(d) 1-p

where F is the cumulative distribution function of the random variable D.


p
log-- =r:x.
1-p
+ f3d. (10.5.2)

This is called the logistic dose-response model, and log is called the whereµ;= niPi· Differentiating again gives
1-p
log-odds or logistic transform of p.
o
Both the logistic and the probit models are commonly used in analyzing or:x
data from dose-response tests. The two models lead to quite similar results, where vi= nip;(l - p;). Similarly, we obtain
and a very large amount of clata would be needed to show that one was better
than the other. The calculations are a bit simpler for the logistic model, and
for this reason we shall use it in what follows. The MLE's are found by solving the simultaneous equations

Maximum Likelihood Estimates In general, these equations must be solved numerically, and the
Newton-Raphson method (I 0.1.3) can be used.
Suppose that ni subjects receive dose di, and that Y; of these respond
(i = 1, 2, ... , k). Then Y; has a binomial distribution with parameters ni and EXAMPLE 10.5.1. k = 5 different doses of an insecticide were applied under
P;, where standardized conditions to samples of an insect species. The results are shown
Pi= 1-(1 +ea+pd,)-1. in Table 10.5.1. We assume that p, the probability that an insect dies, is
related to the dose via the logistic model (10.5.1). We wish to find the
If different subjects are used for different doses, the Ji's will be independent,
and their joint probability function is maximum likelihood estimates(&, PJ.
Based only on the data for dose di, we would estimate Pi by Yilni and the
log-odds by
y.jn
log ' ' =log
The likelihood and log likelihood functions are 1 - y;/ni n; - Yi

These values are given in the last row of the table, and are plotted versus the
L(a, /3) = f1k pf'(! - p;)n,-y, = f1k [ _Ei_ ]~ (1 - Pi)"'; dose in Figure I 0.5.2. A straight line has been drawn in by eye. If the logistic
i=l i=l 1-pi
model holds, then (10.5.2) implies that the five points should be scattered
about a straight line. The agreement with a straight line is very good in this
example.
From Figure I 0.5.2, we see that ex::::; - 5 and f3 ::::; 3, and we use these as
k
starting values for the Newton-Raphson method. Taking a= - 5 and /3 = 3,
= I
i= 1
[yi(r:x + {Jd;) + ni log (1 - pi)].
we compute pi, µi = ni pi, and vi= ni Pi(l - p;) for i = 1, 2, ... , 5. Using these
values and the d;'s from Table 10.5.1, we then get
Note that
S 1 =11.195; S 2 = 19.031;
§ 11 = 40.11; § 12 = 66.85;
op.
- ' =(1 +ea+pd,)-2e•+Pd•d-=p-(1-p-)d.
0{3 I I I 1•
Table 10.5.1. Data from a Dose-Response Experiment
Using these results, one can easily show that Concentration (mg/I) 2.6 3.8 5.1 7.7 10.2
Log concentration d; 0.96 1.34 1.63 2.04 2.32
ol 49 50
S1 (IX, {J) = oa = L(Yi - µ;); Number of insects ni
Number killed y,
50
6
48
16
46
24 42 44
F ract1on killed 0.12 0.33 0.52 0.86 0.88
ol -1.99 -0.69 0.09 1.79 1.99
S2(a, {J) = o{J = L(Yi - µ;)di
2 Table 10.5.2. Observed Fre
0 quencies of Insects Killed
Expected Frequencies Un and Surviving, and
0 der a Logistic Model
Nu mb er killed
Con cen trat ion Nu mb er surviving
Observed (expected)
Ob ser ved (expected)
Tot al
2.6 6 (6.39)
3.8 44 (43.61)
16 (15.47) 50
5.1 32 (32,53)
2 d 24 (24.94) 48
7.7 22 (21.06) 46
42 (39 .68)
0
10.2 7 (9.32)
-I 44 (45.53) 49
6 (4.47) 50

centration 2.6 is fi = 0.1277


-2 1 . The estimated expected
0
is [t 1 = n1 p1 = 6.39, and the number of insects killed
expected number surviving
Table 10.5.2 shows the obs is n 1 (1 - p1 ) = 43.6L
erved and expected freque
Figure 10.5 .2. Plo t of esti used in the experiment. ncies for the five doses
mated log-odds versus dos The agreement is very
e. logistic model gives a goo close, indicating tha t the
d description of the data.
Th e inverse of the inform
ation matrix is
- 1 -[4 0.1 1
66. 85] - l = [ 0.4196 Estimation of the ED50
J - 66.85 -0. 23 68 ]
118.44 -0. 23 68 0.1421 The ED50 is the dose y,
and by ( 10. 1.3), the impro say, which would be req
ved estim ates are given by response rate (see Figure uired to produce a 50%
10.5.1). Since p(y) = ·! , we
have
-5 ] [ 0.4196 -0 .23 68 ] [11 .19 5] = [-4
[ 3 + -0. 23 68 .80 94 ]· p(y)
0.1421 19.031 0 =lo g
We now repeat the calcul
3.0531 1 _ p(y) =a + {3y,
ations with ct.= -4 .8094 and it follows that y = -a
mo re iterations, we obt ain and f3 = 3.0531. After two /{3. By the invariance property,
Usually y is of more intere we have y= - &.//3.
st tha n the intercept par
consider a parameter transf am ete r a, and so we
&. = -4. 88 69; p= 3.1035 (10.5.2) then becomes
ormation from (ct., fJ) to (y,
{3). The logistic model
correct to four decimals.
The maximum of the log
likelihood is
I(&, lJ) = -11 9.894,
and the information matrix
is The log RL F of (a, /3) is
39.091 62.78 5]
J(&., '/J) = [ 62.785 r(a, {3) = l(a., {3) - /(ii, p).
107.491 .
Th e estimated dose respon Substituting ct.= -yf 3 giv
se model is es the log RL F of (y, {3):
p= l- (l + e-4 .88 69+ 3.t0 35d ) - 1, r,(y, /3) = r(- yf3 , {3) = l(-y
f3, {3)-1(&., '/3).
and from this we can find EXAMPLE 10.5.1 (continued
the estimated kill probabilit ). Th e ML E of the ED50
Fo r instance, at concentra y pfor any given dosed. is
tion 6 mgfl, the dose is d =lo
kill probability is p= 0.6 g 6, and the estimated y= -&.((J = 1.5746
62.
Using this result , we find and the log RL F of (y, {3)
tha t the estimated kill is
probability at con-
r.(y, /3) =I ( -yf 3, /3) + 119.894.
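The Newton-Raphson fit of Example 10.5.1 is easy to reproduce. The sketch below is illustrative (not the text's computation); it uses the data of Table 10.5.1 together with the score and information expressions given earlier, and then recovers the ED50 estimate γ̂ = −α̂/β̂:

```python
import numpy as np

# Data of Table 10.5.1
d = np.array([0.96, 1.34, 1.63, 2.04, 2.32])   # log concentration
n = np.array([50, 48, 46, 49, 50])             # number of insects
y = np.array([6, 16, 24, 42, 44])              # number killed

a, b = -5.0, 3.0                                # starting values read from Figure 10.5.2
for _ in range(6):                              # Newton-Raphson iteration (10.1.3)
    p = 1 - 1 / (1 + np.exp(a + b * d))         # logistic model (10.5.1)
    mu, v = n * p, n * p * (1 - p)
    S = np.array([np.sum(y - mu), np.sum((y - mu) * d)])
    J = np.array([[v.sum(), (v * d).sum()],
                  [(v * d).sum(), (v * d * d).sum()]])
    a, b = np.array([a, b]) + np.linalg.solve(J, S)

print(a, b)       # about -4.887 and 3.104
print(-a / b)     # ED50 estimate, about 1.575
```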

e likelihood in the (y, {3)


Figure 10.5.3 shows conto urs of const ant relativ
, and thus the norma l 1.7
plane. The conto urs are close to elliptical in shape 1.5 1.6
te results . Since the axes of 0
appro ximat ions of Sectio n l 0.4 shoul d give accura
param eter axes, the range of plausible
the ellipses are nearly parallel to the
enden t of the value of {3.
values for y is nearly indep
not parallel to the
If conto urs are plotte d in the (ex, /J) plane, their axes are -I
for ex is strong ly depen dent
coord inate axes, and the range of plausible values
for chang ing param eters from
upon the value of {3. This is anoth er reason
(ex, {3) to (y, {3).
fJ) over f3 for fixed y.
To find rmax(y), it is necessary to maximize 1(- y/J, -2
Define
0
g(/3) = of3 t(-yf3, /3) = L.(d; -y)(y ; - µ;) ;
-3
2
g'(/3) = :/J g(/J) = -'L.{d; - y) V;

l approximation (broken
Figure 10.5.4. Maximum log RLF (solid curve) and norma
where µ; = n; P; and n; p;(l - p;) as before. For any given value of y, we can
v1 =
to obtain fl(y), and then curve) for the ED50.
solve the equat ion g(f3) = 0 by Newto n's metho d
calculate
rmax(y) = /( -yft(y ), J3(y)) + 119.894.

After repea ting this proce dure for several y values


, a graph of this function
respect to (y, /J) is
-[ oex/oy
Q- of3/oy
oex/ofJJ
ap;ap
= [ - f3
o -n
e 10.5.4). and now (10.4.2) gives
can be prepa red (see Figur
need the inform ation
To find the norma l appro ximat ion to r m .. (y), we
matrix § .(y, fl) for the new param eters. The matrix
of deriva tives of (ex,/)) with §.(y,A
P> = [-fl0
= [376.5 27 -3.81 9] ·
-3.819 6.691
f3
From this we calculate
4
J";I -(J; 1)1/J;2 = 374.35

and hence the norma l appro ximat ion is


rmax(y) :=:::: -J(y -Y}2(374.35).
in Figure 10.5.4, and the
3 The appro ximat ion is shown with a broke n curve
curve is very close. The appro ximat e 10%
agreement with the exact
and the exact result is
maxim um likelihood interval is 1.464::;; y ::;; 1.686,
1.460 s y s 1.686.
2

PROBLEMS FOR SECTIO N 10.5


1.4 1.5 1.6 1. 7
1. The following table gives the number of beetles
which died within 6 days and the
an insecticide.
in the (y, /3) plane. number which survived at each of six concentration s of
Figure 10.5.3. Contours of constant relative likelihood

Concentration 0.711 0.852 0.959 1.066 1.202 1.309 (c) Show that ft is a root of the equation
Number dead 15 24 26 24 29 29
Number alive (:Ex,yJ(:EePx;)-(:Ey 1)(:Ex 1e'1x') = 0,
35 25 24 26 21 20
and describe how p can be found by Newton's method.
Assume that the log-odds in favor of death is a linear function of the dose (d) Derive the maximum RLF of {1;
d,
p 5. The survival time Y; of an individual with tumor size x, has an exponent
log - - = a. + f3d ial
l-p distribution with mean
where d is the log concentration.
81 = E(Y,) = exp(a. + f3x 1)
(a) Prepare a graph to check whether the model seems reasonable, and from where a. and {1 are unknown parameters. Suppose that n survival
it times
obtain initial estimates of a. and /3. y 1, Yi· ... , y. with corresponding tumor sizes x , Xi, ..• , x. are observed.
1
(b) Obtain the maximum likelihood estimates &, ft by the Newton-R aphson
method. (a) Show that the score vector and information matrix of a. and /3 are as follows:
(c) Estimate the concentra tion of the insecticide which is required to obtain.a
kill probability.
(d) Find the 10% maximum likelihood intervals for a. and /3.
50%
s- [
-
J
:E(r 1 - l) ..
:Ex1(r 1 - 1) '

2.t The probability of a normal specimen after radiation dose d is assumed where r1 = y;/01•
to be (b) Show that the determinant
p = e•+Pd where a and {J are constants. The following table gives the number of~ is :Er1 times
of
normal specimens and the total number tested at each of five doses:

d = Radiation dose 0 1 2 3 4 where x = (:Er1x 1)/:Er1• Hence verify that condition (10.1.2) is satisfied.
y =Numbe r of normals 4357 3741 3373 2554 1914 (c) Derive an expression for the MLE of a when /3 is given, and describe
a
n =Numbe r tested 4358 3852 3605 2813 2206 numerical procedure for evaluating ft.

(a) Plot log(y/n) against d to check whether the model seems reasonable,
and
obtain rough estimates of a. and f3 from the graph.
(b) Find the maximum likelihood equations and solve numerically for 5: and
using the Newton- Raphson method or otherwise. Plot the 10% likelihood
ft 10.6. An Example from Learning Theory
contour, and obtain 10% maximum likelihood intervals for f3 and e•.
In their book Stochastic Models for Learning (Wiley, 1955), R.R. Bush and
3.tThe number of particles emitted in unit time from a radioactive source F. Mosteller develop general probabilistic learning models and apply them
bas a to
Poisson distribution. The strength of the source is decaying exponentially
with a variety of learning experiments. One of the most interesting applications is
time, and the mean of the Poisson distribution on the jth day is µi = to the Solomon...:Wynne experiment (R.L. Solomon and L.C. Wynne,
a.{31
U = 0, 1, ... , n). Independent counts x 0 , x 1 , ... , x. of the number of emissions in Traumat ic Avoidance Learning: Acquisition in Normal Dogs, Psych. Monog.
unit time are obtained on these n + l days. Find the maximum likelihood equations 67 (1953), No. 4). We shall first describe this experiment, then develop the
and indicate how these may be solved for ii and ft. model, and finally use likelihood methods to estimate the two parameters of
4. Observat ions Yt> Yi, ... , y. are taken on the number of plankton in
the model.
unit-volume In the Solomon -Wynne experiment, 30 dogs learned to avoid an intense
samples of seawater at temperatures x , xi, .. . , x •. The y,'s are modeled
1 as electric shock by jumping a barrier. The lights were turned out in the dog's
observed values of independent Poisson variates Y , Yi, ... , Y,,, where
1
compartment and the barrier was raised. Ten seconds later, an intense shock
µ 1 = E(Y;) = exp(<X + f3x;). was applied through the floor of the compart ment to the dog's feet, and was
(a) Show that the log likelihood function is left on until the dog escaped over the barrier. The dog could avoid the shock
only by jumping the barrier during the ten-second interval after the lights
/(a., {J) = :E(y1 log µ 1 - µ;). were turned out and before the shock was administered. Each trial could thus
(b) Find the score vector and information matrix for a and {J, and describe how be classified as a shock trial, or as an avoidance trial. The experimental record
to
obtain 5: and ft by the Newton- Raphson method. of 30 dogs, each of which had 25 trials, is shown in Table 10.6.1, with 0

Table 10.6.1. Data from 25 Trials with 30 Dogs in the Solomon-Wynne The Model
Experiment
Consider the sequence of trials for one dog. As in Table 10.6.1 we take y1 =1 if
0 = Shock trial; 1 = Avoidance trial the dog avoids shock at trialj, and y1 = 0 otherwise (j = 0, 1, ... , 24). Because
Trial numbers of learning at trialj - 1, the probability that the dog receives shock should be
smaller at trial j than at trial j - 1. The amount by which the probability
0-4 5-9 10-14 15-19 20-24 decreases may well depend upon whether there was shock or avoidance at
trialj - I. We wish to compare the effectiveness of shock trials and avoidance
Dog 13 0 0 1 0 l 0 1 1 1 1 l 1 l 1 1 1 1 1 1 1 1 1 trials in teaching the dog to avoid future shocks.
16 0 0 0 0 0 0 0 1 0 0 0 00 0 1 l 1 1 1 1
17 0 0 0 0 0
Let <fl. be the probability that the dog receives a shock at trial j, given its
1 1 0 1 1 0 0 1 1 0 1 0 1 1 1
18 0 1 1 0 0 1 1 1 1 0
past history in trials 0 through j - I. Let x 1 be the number of times that the
1 0 1 0 1 1 1 I 1 1 1
21 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 l
dog has avoided shock in trials 0 through j - 1, so that
1 1 l 1
27 00000 01111 0 0 1 0 1 1 1 l 1 1 1 1 1 1 x 1 =Yo+ Yi+ ... + Y1-1 ·
29 00000 10000 0 0 1 1 1 1 1 1 1 1 1 1 1 1 The number of previous shock trials is thenj- x 1. Since all dogs were given a
30 00000 00110 0 1 1 1 1 1 1 l l 1 1 1 1
32 0 0 0 0 0 l 0 10 1
shock at trial 0, we assume that </1 0 =I. For j > 0 we assume that
1 0 1 0 0 0 1 1 l 1 0 1 1 0
33 00001 00110 1 0 1 1 1 1 1 1 1 1 1 1 1 (10.6.1)
34 0 0 0 0 0 0 0 0 0 0 1 l 11 l 1 0 l 1 1 1 1 1 1 where O::;; A::;; 1 and 0::;; B::;; I. We call A the avoidance parameter and B the
36 0 0 0 0 0 1 i 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 shock parameter. The model can also be written
37 0 0 0 1 1 0 1 0 0 1 l 1 1 1 l 1 1 l l l 1 1 1 1
41 0 0 0 0 1 0 1 I 0 I l 1 1 1 1 1 l 1 l l 1 1 1 l 1 log <jJ1 = X/1. + (j - x)f3 (10.6.2)
42 0 0 0 1 0 I 1 0 1 1 1 1 1 1 1 1 1 l 1 1 1 1 1 1 1
where CJ.= log A and f3 =log B.
43 0 0 0 0 0 0 0 I 1 1 l 1 1 l 1 1 l l l l 1 1 It is easy to show that
45 0 1 0 1 0 0 0 1 0 I 1 1 0 1 l l l l 1 1 1 1 l 1
47 0 0 0 0 1 0 l 0 1 1 1 1 1 1 1 1 l 1 1 1 l 1 11 1 if y1 _ 1 = 1;
48 0 1 0 0 0 0 l 00 0 1 1 1 1 1 l 1 l 1 1 1 1 1 1 1 ifY1-i=O.
46 0 0 0 0 1 1 0 10 1 1 0 1 0 1 l 1 1 1 1 l 1 1 1 1
The probability of a shock decreases by the factor A ifthere was an avoidance
49 0 0 0 1 1 1 1 I 0 1 1 1 1 1 1 1 1 1 1 1 1 l 1 1 1
at trial j - 1, or by the factor B if there was a shock at trial j - 1. If A is small,
50 0 0 1 0 l 0 1 1 1 1 l 1 l 1 1 1 0 0 1 l 11l 11
then the effect of an avoidance trial is to greatly reduce the chance of future
52 0 0 0 0 0 0 0 1 l 1 1 1 1 l 1 1 1 l 1 l 11 1 l 1
54 0 0 0 0 0 0 0 0 I 1 1 0 1 0 0 0 l 1 0 l 1 1 1 1 l shock. If A = 1, nothing is learned from an avoidance trial. If A < B, then
57 0 0 0 0 0 0 1 0 1 I 1 1 0 1 0 l 1 1 1 l 1 1 1 1 1 more is learned from an avoidance trial than from a shock trial.
59 0 0 1 0 1 l 1 0 l 1 0 1 1 1 1 1 l 1 1 l 1 1 1 1 1
67 0 0 0 0 1 0 l I 1 1 1 1 1 1 1 l 1 1 l 1 111 l 1
66 0 0 0 l 0 1 0 1 1 1 0 1 0 1 1 1 1 l 1 l 1 1 1 1 l
The Log Likelihood Function
69 0 0 0 0 1 I 0 0 1 1 0 1 0 l 0 l 0 l 1 1 1 1 1 1
71 0 0 0 0 l 1 I I 1 0 1 0 1 1 l l l 1 1 1 l 1 1 l The joint probability function of Y0 , Yi, ... , Y24 can be written as a product of
25 factors:

denoting a shock trial and 1 an avoidance trial. (The dogs are numbered 13, f(yo, Yi, .. ., Yn) =f(Yo) ·f(Yi IYo) ·f (Y2 IY0. Yi). · .. ·
16, etc. for identification purposes, and no use is made of these numbers in the Given the results of trials 0 through j - 1, the probability function of Y; is
analysis.) Initially, all of the dogs received shocks on almost every trial, but by
trial 20, all except dog number 32 had learned to avoid the shock by jumping ,;.. for y1 = O;
_,i.i-yi(l-</>Yj= 'l'j
the barrier. f(YilYo, .. .,y1-1 ) -'l'J 1 { 1-</Ji fory =1.
1
Since f(y_0) = 1 for y_0 = 0 (every dog is shocked at trial 0), this term makes no contribution, and therefore

    f(y_0, y_1, ..., y_24) = Π_{j=1}^{24} φ_j^{1 - y_j} (1 - φ_j)^{y_j}.

The log likelihood function based on the data from a single dog is thus

    Σ_{j=1}^{24} [(1 - y_j) log φ_j + y_j log(1 - φ_j)].

Now we assume that results for different dogs are independent, and that α and β have the same values for all 30 dogs. Then the log likelihood function based on all of the data is

    l(α, β) = Σ_{i=1}^{30} Σ_{j=1}^{24} [(1 - y_ij) log φ_ij + y_ij log(1 - φ_ij)]
            = α T_1 + β T_2 + ΣΣ y_ij log(1 - φ_ij)

where

    x_ij = y_i0 + y_i1 + ... + y_{i,j-1};    φ_ij = exp{x_ij α + (j - x_ij) β};
    T_1 = ΣΣ (1 - y_ij) x_ij;    T_2 = ΣΣ (1 - y_ij)(j - x_ij).

It is easily shown that

    ∂φ_ij/∂α = x_ij φ_ij;    ∂φ_ij/∂β = (j - x_ij) φ_ij.

The components of the score function are then

    S_1(α, β) = T_1 - ΣΣ y_ij x_ij φ_ij / (1 - φ_ij);
    S_2(α, β) = T_2 - ΣΣ y_ij (j - x_ij) φ_ij / (1 - φ_ij).

The components of the information matrix are

    J_11(α, β) = ΣΣ y_ij x_ij^2 φ_ij / (1 - φ_ij)^2;
    J_12(α, β) = ΣΣ y_ij x_ij (j - x_ij) φ_ij / (1 - φ_ij)^2;
    J_22(α, β) = ΣΣ y_ij (j - x_ij)^2 φ_ij / (1 - φ_ij)^2.

Numerical Results

The above expressions involve sums of 720 terms, and one would certainly not wish to attempt the calculations by hand! Before high-speed computers were available, the analysis of data such as these was very difficult, and often depended upon approximations whose accuracy could not be checked. However, now we require only a minute or two on a fast computer to find the MLE's and plot contours, thus obtaining an exact summary of the information concerning the parameters.

A preliminary tabulation of l(α, β) indicates that the maximum occurs near (α, β) = (-0.1, -0.2). Taking these as initial values, the MLE's can be found by the Newton-Raphson method (10.1.3). After three iterations we obtain

    α̂ = -0.24091;    β̂ = -0.07872;    l(α̂, β̂) = -273.987;

    J(α̂, β̂) = [2451  2784; 2784  10277].

The MLE's of the original parameters A, B are

    Â = e^α̂ = 0.786;    B̂ = e^β̂ = 0.924.

The log RLF of A and B is

    r(A, B) = l(log A, log B) + 273.987,

and by (10.4.2), the information matrix is

    J(Â, B̂) = [1/Â  0; 0  1/B̂] J(α̂, β̂) [1/Â  0; 0  1/B̂] = [3969  3833; 3833  12029].

Figure 10.6.1 shows contours of constant relative likelihood in the (A, B) plane. Since A < B over the entire region of plausible values, an avoidance trial is clearly more effective than a shock trial in reducing the probability of future shock. In fact, since A ≈ B^3, about as much is learned in one avoidance trial as in three shock trials.

[Figure 10.6.1: contours of constant relative likelihood (including R = 0.01) plotted over approximately 0.74 ≤ A ≤ 0.82 and 0.88 ≤ B ≤ 0.94.]
Figure 10.6.1. Contours of constant relative likelihood for the avoidance and shock parameters.
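The Newton-Raphson calculation just described is straightforward to program. The Python sketch below is illustrative only: because the individual records of Table 10.6.1 are not reproduced here, it simulates a data set of the same shape (30 dogs, 25 trials) from model (10.6.1) with assumed values A = 0.80 and B = 0.92, and then maximizes l(α, β) using the score and information expressions given above. The actual experimental records would simply be substituted for the array y.

import numpy as np

# Simulate y[i, j] = 1 if dog i avoids shock at trial j (assumed A, B; trial 0 is always a shock).
rng = np.random.default_rng(1)
A_true, B_true = 0.80, 0.92
n_dogs, n_trials = 30, 25
y = np.zeros((n_dogs, n_trials), dtype=int)
for i in range(n_dogs):
    x = 0                                        # avoidances so far
    for j in range(1, n_trials):
        phi = A_true ** x * B_true ** (j - x)    # P(shock at trial j), model (10.6.1)
        y[i, j] = int(rng.random() > phi)
        x += y[i, j]

def score_info(alpha, beta):
    """Score vector (S1, S2) and information matrix J for l(alpha, beta)."""
    S = np.zeros(2)
    J = np.zeros((2, 2))
    for i in range(n_dogs):
        x = 0
        for j in range(1, n_trials):
            x += y[i, j - 1]                     # x_ij = avoidances in trials 0..j-1
            u = np.array([x, j - x], dtype=float)
            phi = np.exp(x * alpha + (j - x) * beta)
            if y[i, j] == 0:                     # shock trial contributes log(phi)
                S += u
            else:                                # avoidance trial contributes log(1 - phi)
                S -= u * phi / (1.0 - phi)
                J += np.outer(u, u) * phi / (1.0 - phi) ** 2
    return S, J

theta = np.array([-0.1, -0.2])                   # preliminary estimate (alpha0, beta0)
for _ in range(20):
    S, J = score_info(theta[0], theta[1])
    step = np.linalg.solve(J, S)                 # Newton-Raphson update as in (10.7.2)
    theta = theta + step
    if np.max(np.abs(step)) < 1e-8:
        break

alpha_hat, beta_hat = theta
print("alpha_hat =", alpha_hat, " beta_hat =", beta_hat)
print("A_hat =", np.exp(alpha_hat), " B_hat =", np.exp(beta_hat))

Because l(α, β) is nearly quadratic near its maximum, only a few iterations are needed from a sensible starting value.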
Note also that the experiment determines the value of B more precisely than the value of A, and that the parameters A, B cannot be estimated independently of one another.

The contours in Figure 10.6.1 are nearly elliptical, and therefore we would expect the normal approximations of Section 10.4 to be fairly accurate. By (10.4.4), we have

    r_max(A) ≈ -(1/2)(A - Â)^2 (3969 - 3833^2/12029);
    r_max(B) ≈ -(1/2)(B - B̂)^2 (12029 - 3833^2/3969).

From these, we obtain approximate 10% maximum likelihood intervals 0.745 ≤ A ≤ 0.827 and 0.901 ≤ B ≤ 0.948. The exact results from Figure 10.6.1 are 0.744 ≤ A ≤ 0.826 and 0.899 ≤ B ≤ 0.946.

*10.7. Some Derivations

*This section may be omitted on first reading.

In this section, derivations will be given for some results quoted earlier in the chapter. First we shall derive the Newton-Raphson procedure for evaluating the MLE (α̂, β̂) in the two-parameter case. Then conditions (10.1.1) and (10.1.2), which must be satisfied at a relative maximum, will be established. Finally, the use of Newton's method to solve for points on a likelihood contour will be described.

The Newton-Raphson Method

Newton's method for solving the equation g(θ) = 0 was derived at the beginning of Section 9.8. We considered a linear approximation to g(θ) at θ = θ_0:

    g(θ) ≈ g(θ_0) + (θ - θ_0) g'(θ_0).

The approximation has the same value and slope as g(θ) at the point θ = θ_0. Since g(θ̂) = 0, we have

    θ̂ ≈ θ_0 - g(θ_0)/g'(θ_0).

Beginning with a preliminary guess θ_0, we apply this result repeatedly to obtain θ̂ such that g(θ̂) = 0.

In the two-parameter case we need to solve a pair of simultaneous equations

    g(α, β) = 0;    h(α, β) = 0.

Let (α_0, β_0) be a preliminary estimate of the root (α̂, β̂), and consider the linear approximations

    g(α, β) ≈ g(α_0, β_0) + (α - α_0) ∂g/∂α + (β - β_0) ∂g/∂β;
    h(α, β) ≈ h(α_0, β_0) + (α - α_0) ∂h/∂α + (β - β_0) ∂h/∂β.

Here the derivatives are to be evaluated at α = α_0, β = β_0. These linear approximations can be derived by truncating the bivariate Taylor's series expansions of g and h at (α_0, β_0). They have the same values and first derivatives as g and h at the point (α_0, β_0).

Since g(α̂, β̂) = 0 and h(α̂, β̂) = 0, we have

    (α̂ - α_0) ∂g/∂α + (β̂ - β_0) ∂g/∂β ≈ -g(α_0, β_0);
    (α̂ - α_0) ∂h/∂α + (β̂ - β_0) ∂h/∂β ≈ -h(α_0, β_0).

Solving these linear equations for α̂ - α_0 and β̂ - β_0 gives

    (α̂ - α_0, β̂ - β_0)' ≈ -M^{-1} (g(α_0, β_0), h(α_0, β_0))',  where M = [∂g/∂α  ∂g/∂β; ∂h/∂α  ∂h/∂β],

and therefore

    (α̂, β̂)' ≈ (α_0, β_0)' - M^{-1} (g(α_0, β_0), h(α_0, β_0))'.        (10.7.1)

Beginning with a preliminary guess (α_0, β_0), we apply (10.7.1) repeatedly until convergence is obtained. This generalization of Newton's method is called the Newton-Raphson method.

Solving the Maximum Likelihood Equations

To obtain the MLE (α̂, β̂) in the two-parameter case, we need to solve the simultaneous equations

    S_1(α, β) = 0;    S_2(α, β) = 0,

where S_1 and S_2 are the components of the score vector (see Section 10.1).
In this case we have

    ∂g/∂α = ∂S_1/∂α = ∂^2 l/∂α^2 = -J_11(α, β),

and similarly for the other derivatives. Thus (10.7.1) becomes

    (α̂, β̂)' ≈ (α_0, β_0)' + [J_11  J_12; J_12  J_22]^{-1} (S_1, S_2)'        (10.7.2)

where S_1, S_2, and the J_ij's are all evaluated at α = α_0, β = β_0. This is the result stated in Section 10.1.

In many applications, l(α, β) is approximately a quadratic function of α and β near (α̂, β̂) (see Section 10.4). Then S_1 and S_2 will be nearly linear in α and β near (α̂, β̂). The Newton-Raphson procedure will then converge quickly provided that the initial guess (α_0, β_0) is not too far from (α̂, β̂).

In Example 10.1.1 the log likelihood function l(µ_1, µ_2) is a second-degree polynomial in µ_1 and µ_2, and the components of the score function are linear in µ_1 and µ_2. In this case (10.7.2) will yield (µ̂_1, µ̂_2) in one iteration, no matter what initial values are taken for µ_1 and µ_2.

Derivation of (10.1.1) and (10.1.2)

If l(α, β) has a relative maximum at (α̂, β̂), then it must be "downhill" from (α̂, β̂) in all directions.

Suppose that we move away from (α̂, β̂) along a line at angle φ to the α-axis (see Figure 10.7.1). For points (α, β) on this line we have

    α = α̂ + d cos φ;    β = β̂ + d sin φ,

where d is the distance from (α, β) to (α̂, β̂). Along this line the log likelihood function is

    h(d) = l(α, β) = l(α̂ + d cos φ, β̂ + d sin φ).

For each possible angle φ, h(d) must have a relative maximum at d = 0. It follows that, for all φ,

    ∂h(d)/∂d = 0    and    ∂^2 h(d)/∂d^2 < 0    at d = 0.

Using the chain rule, we find that

    ∂h(d)/∂d = (∂l/∂α)(∂α/∂d) + (∂l/∂β)(∂β/∂d) = S_1 cos φ + S_2 sin φ.

Since this must be zero for all φ when d = 0, it follows that S_1 = 0 and S_2 = 0 at (α̂, β̂), which is (10.1.1).

Upon differentiating again, we find that

    -∂^2 h(d)/∂d^2 = J_11 cos^2 φ + 2 J_12 cos φ sin φ + J_22 sin^2 φ.

Since this must be positive for all φ when d = 0, it follows that

    J_11 cos^2 φ + 2 J_12 cos φ sin φ + J_22 sin^2 φ > 0        (10.7.3)

for all φ, where the J_ij's are evaluated at (α̂, β̂). It is not difficult to show that this condition is satisfied if and only if (10.1.2) holds.

[Figure 10.7.1: the (α, β) plane, showing the point (α̂, β̂), a point (α, β) at distance d along a ray at angle φ, and the 100p% likelihood contour.]
Figure 10.7.1. Calculation of a point on the 100p% likelihood contour by Newton's method.

Calculation of Contours

In Example 10.2.2 we constructed a contour map by tabulating the joint relative likelihood function over a lattice of parameter values. A great deal of calculation may be needed to produce an accurate contour map in this way. As an alternative, it is possible to use Newton's method to obtain points on the 100p% contour. We select an angle φ as in Figure 10.7.1 and iterate to find d such that (α, β) lies on the contour. By repeating this procedure for various angles φ, we can obtain enough points on the contour to permit accurate plotting.

The equation of the 100p% contour is

    r(α, β) - log p = 0,

and so we consider the function

    g(d) = r(α̂ + d cos φ, β̂ + d sin φ) - log p.

Since r(α, β) = l(α, β) - l(α̂, β̂), we have
    g(d) = h(d) - l(α̂, β̂) - log p,

where h(d) was defined earlier in this section. Thus the derivative of g(d) with respect to d is

    g'(d) = h'(d) = S_1 cos φ + S_2 sin φ.

To solve the equation g(d) = 0 by Newton's method, we begin with an initial guess d_old, and compute

    d_new = d_old - g(d_old)/g'(d_old).

This calculation is repeated until convergence is obtained. An initial guess can be obtained from a preliminary tabulation of r(α, β), or from the large-sample results of Section 10.4.
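As a concrete sketch of this procedure, the Python fragment below finds points on the 100p% contour for an illustrative quadratic log relative likelihood whose maximum and information matrix are borrowed from the learning-theory example of Section 10.6. In a real application, r and the score components S_1, S_2 would be replaced by the functions computed from the data; the Newton iteration in d is exactly the one described above.

import numpy as np

# Illustrative quadratic log relative likelihood r(a, b) with maximum at (a_hat, b_hat).
a_hat, b_hat = 0.786, 0.924
J = np.array([[3969.0, 3833.0], [3833.0, 12029.0]])

def r(a, b):
    d = np.array([a - a_hat, b - b_hat])
    return -0.5 * d @ J @ d

def score(a, b):
    d = np.array([a - a_hat, b - b_hat])
    return -J @ d                                    # (S1, S2) for the quadratic surface

def contour_point(phi, p, d0=0.01, tol=1e-10):
    """Distance d along the ray at angle phi such that (a, b) lies on the 100p% contour."""
    d = d0
    for _ in range(50):
        a = a_hat + d * np.cos(phi)
        b = b_hat + d * np.sin(phi)
        g = r(a, b) - np.log(p)                      # g(d)
        S1, S2 = score(a, b)
        gprime = S1 * np.cos(phi) + S2 * np.sin(phi) # g'(d)
        step = g / gprime
        d = d - step                                 # Newton update d_new = d_old - g/g'
        if abs(step) < tol:
            break
    return a_hat + d * np.cos(phi), b_hat + d * np.sin(phi)

# A few points on the 10% contour, for equally spaced directions.
for phi in np.linspace(0.0, 2 * np.pi, 8, endpoint=False):
    print(contour_point(phi, 0.1))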
*10.8. Multi-Parameter Likelihoods

*This section may be omitted on first reading.

In this section we shall briefly consider likelihood methods in the multi-parameter case. These methods will generally produce satisfactory results provided that the number of unknown parameters is small in comparison with the number of independent observations. However, as we shall see, serious difficulties may arise when many unknown parameters are to be estimated from a relatively small amount of data.

Suppose that the probability model involves a vector of k unknown parameters, θ = (θ_1, θ_2, ..., θ_k). Then the log likelihood function l(θ_1, θ_2, ..., θ_k) is a function of k variables, and the MLE θ̂ = (θ̂_1, θ̂_2, ..., θ̂_k) is the vector of parameter values which maximizes this function.

In the k-parameter case, the score function S(θ) is a k-dimensional vector whose ith component is S_i(θ) = ∂l/∂θ_i. The information function J(θ) is a symmetric k × k matrix whose (i, j) component is -∂^2 l/∂θ_i ∂θ_j.

Usually θ̂ can be found by solving the k simultaneous equations S_i(θ) = 0 for i = 1, 2, ..., k. The condition for a relative maximum is that J(θ̂) must be non-negative definite. Numerical methods may be needed to find θ̂, and the Newton-Raphson method is often useful.

The relative likelihood function is

    R(θ_1, θ_2, ..., θ_k) = L(θ_1, θ_2, ..., θ_k) / L(θ̂_1, θ̂_2, ..., θ̂_k).

In principle, one can plot this function to determine ranges of plausible values for the parameters. In practice, it is difficult to plot and interpret the RLF when k > 2, and so parameters are usually considered one or two at a time.

The maximum relative likelihood function of θ_1 is obtained by holding θ_1 fixed and maximizing the joint RLF over the other k - 1 parameters:

    R_max(θ_1) = max R(θ_1, θ_2, ..., θ_k), the maximum being taken over θ_2, ..., θ_k with θ_1 held fixed.

If the number of unknown parameters is small in comparison with the number of independent observations, then the maximum RLF has properties similar to those of a one-parameter RLF. Maximum likelihood intervals or regions can be used to summarize the information available concerning θ_1. The 100p% maximum likelihood region is the set of parameter values such that R_max(θ_1) ≥ p (see Section 10.3).

The amount of information available for estimating θ_1 may well depend upon which other parameters θ_2, ..., θ_k are to be estimated from the same data. For instance, an observation y whose distribution depends on the product θ_1 θ_2 will give information about θ_1 if θ_2 is known, but it may yield little or no information about θ_1 if θ_2 is unknown. In general, one would expect to be able to estimate θ_1 more precisely if the values of θ_2, ..., θ_k were all known.

The maximum relative likelihood function does not adequately adjust for the possible loss of information about θ_1 due to the estimation of θ_2, ..., θ_k. As a result, the maximum RLF may suggest that θ_1 can be estimated more precisely than is appropriate in the situation.

This is not a serious problem when there are many observations and only a few parameters, since then only a small fraction of the information will be lost due to estimation of θ_2, ..., θ_k. However, as the following examples show, serious difficulties may arise when many parameters require estimation from only a small amount of data.

EXAMPLE 10.8.1. Suppose that a single measurement y is taken from a normal distribution N(µ, σ^2).

If µ is known, then y provides information concerning the magnitude of σ^2. The closer y is to µ, the smaller the variance is likely to be. The MLE of σ^2 is σ̂^2 = (y - µ)^2, and likelihood intervals for σ^2 can be found in the usual way.

If µ is unknown, then a single observation y will tell us nothing about the variance. The proper conclusion is that it is impossible to estimate σ^2 from a single observation if µ is unknown.

If we apply the method of maximum likelihood here, we find that µ̂ = y and σ̂^2 = (y - µ̂)^2 = 0. No matter what is observed, σ^2 is estimated to be zero. This is a silly estimate. It arises because the method of maximum likelihood treats µ̂ as though it were the known value of µ. The method does not allow for the fact that all of the information provided by y is lost in the process of estimating µ.

EXAMPLE 10.8.2. Suppose that we observe values of n + 1 independent random variables Y_0, Y_1, ..., Y_n where Y_i ~ N(µ_i, σ^2). Suppose that µ_0 is known, but that µ_1, µ_2, ..., µ_n, and σ are all unknown. In this case there are n + 1 unknown parameters to be estimated from n + 1 observations.
As in the preceding example, it can be argued that the information provided by y_1, y_2, ..., y_n is lost in estimating µ_1, µ_2, ..., µ_n. Only observation y_0 gives information about σ^2. Based on just this observation, the log likelihood function is

    l(σ) = -log σ - (1/(2σ^2))(y_0 - µ_0)^2,

and the MLE of σ^2 is (y_0 - µ_0)^2.

The likelihood function based on all n + 1 observations is

    l(µ_1, ..., µ_n, σ) = -(n + 1) log σ - (1/(2σ^2)) Σ_{i=0}^{n} (y_i - µ_i)^2.

By a straightforward calculation we find that µ̂_i = y_i for i = 1, 2, ..., n, and

    σ̂^2 = (1/(n + 1)) [ (y_0 - µ_0)^2 + Σ_{i=1}^{n} (y_i - µ̂_i)^2 ] = (1/(n + 1)) (y_0 - µ_0)^2.

If n is large, this estimate will be much too small. Furthermore, because of the serious underestimation of σ^2, maximum likelihood intervals for µ_1, µ_2, ..., µ_n will be too narrow. The analysis does not take account of the fact that the information provided by n of the n + 1 observations is lost in estimating µ_1, µ_2, ..., µ_n.

EXAMPLE 10.8.3. Consider 2n independent measurements X_i, Y_i for i = 1, 2, ..., n, where X_i ~ N(µ_i, σ^2) and Y_i ~ N(µ_i, σ^2). Altogether there are n + 1 unknown parameters µ_1, µ_2, ..., µ_n, and σ to be estimated from 2n observations.

It can be shown that the log likelihood function based on all 2n observations is

    l(µ_1, ..., µ_n, σ) = -2n log σ - (1/(2σ^2)) Σ_{i=1}^{n} [ (x_i - µ_i)^2 + (y_i - µ_i)^2 ].

Upon setting the derivatives of l equal to zero and solving, we find that

    µ̂_i = (x_i + y_i)/2    for i = 1, 2, ..., n,

and

    σ̂^2 = (1/(2n)) Σ [ (x_i - µ̂_i)^2 + (y_i - µ̂_i)^2 ] = (1/(4n)) Σ (x_i - y_i)^2.

It can be shown that, if n is large, then σ̂^2 will be close to σ^2/2 with probability near 1. Both σ̂^2 and maximum likelihood intervals will seriously underestimate σ^2.

To see the similarity with the situation in the preceding example, we consider the one-to-one transformation from (X_i, Y_i) to (S_i, T_i), where

    S_i = (X_i - Y_i)/2 ~ N(0, σ^2/2);    T_i = (X_i + Y_i)/2 ~ N(µ_i, σ^2/2).

All of the information provided by T_1, T_2, ..., T_n is lost in estimating µ_1, µ_2, ..., µ_n. There are effectively only n observations S_1, S_2, ..., S_n which provide information about σ^2 when the µ_i's are unknown. The log likelihood function of σ based on the marginal distribution of the S_i's is

    l(σ) = -n log σ - (1/σ^2) Σ s_i^2,

which is maximized for

    σ^2 = (2/n) Σ s_i^2 = (1/(2n)) Σ (x_i - y_i)^2.

The original analysis underestimated σ^2 by 50% because it did not allow for the fact that half of the information about σ^2 is lost in estimating µ_1, µ_2, ..., µ_n.

Marginal and Conditional Likelihoods

The preceding three examples show some of the difficulties which can arise when many parameters are to be estimated from a relatively small amount of data.

Sometimes these difficulties can be overcome by carefully analyzing the situation and discarding all but the relevant part of the data. It may be possible to identify a marginal (or conditional) distribution which carries all of the relevant information concerning the parameter of interest. The likelihood function derived from this distribution is then called a marginal (or conditional) likelihood function.

In Example 10.8.2 we argued that Y_1, Y_2, ..., Y_n do not provide any information about σ^2 when µ_1, µ_2, ..., µ_n are unknown. The marginal log likelihood function of σ based on the distribution of Y_0 is

    l(σ) = -log σ - (1/(2σ^2))(y_0 - µ_0)^2    for σ > 0.

Note that, since Y_0 is independent of Y_1, Y_2, ..., Y_n, the conditional distribution of Y_0 given Y_1, Y_2, ..., Y_n is the same as the marginal distribution. Hence l(σ) can also be regarded as a conditional log likelihood function.

Similarly, in Example 10.8.3 the marginal log likelihood function of σ based on S_1, S_2, ..., S_n is

    l(σ) = -n log σ - (1/σ^2) Σ s_i^2    for σ > 0.

Since the S_i's are distributed independently of the T_i's, this can also be interpreted as the conditional log likelihood function of σ given T_1, T_2, ..., T_n.

In both examples, satisfactory results are obtained if we apply likelihood methods to the appropriate marginal or conditional likelihood function of σ.
CHAPTER 11

Frequency Properties

The purpose of this chapter is to investigate some theoretical properties of the estimation procedures described in Chapters 9 and 10.

It is customary to evaluate and compare statistical procedures by examining how they would behave in a series of hypothetical repetitions of the experiment. One imagines that the experiment which gave rise to the data is to be repeated over and over again under identical conditions. One then determines how frequently the statistical procedure would give correct results in this series of repetitions.

Quantities such as θ̂ and the endpoints of the 10% likelihood interval would vary from one repetition of the experiment to another. Their probability distributions in repetitions of the experiment are called their sampling distributions. Some exact sampling distributions are derived in Section 1, and these are used to calculate coverage probabilities in Section 2. These sections also introduce the likelihood ratio statistic, which will be used extensively in Chapter 12. A chi-square approximation to the distribution of the likelihood ratio statistic is investigated in Section 3.

Most statistics textbooks advocate the use of confidence intervals, and these are defined in Section 4. It is shown that, under some mild conditions, likelihood intervals are confidence intervals or approximate confidence intervals. It is suggested that the best way to construct confidence intervals is by calculating likelihood intervals of the appropriate size. Intervals constructed in this way will give valid information summaries in particular applications, as well as having the desired frequency properties in a series of imaginary repetitions.

Section 5 gives results on coverage probabilities and confidence regions for the two-parameter case. Section 6 defines the expected information function and illustrates its use in planning experiments. Finally, Section 7 gives a brief discussion of bias in parameter estimation.

11.1. Sampling Distributions

Likelihood methods for parameter estimation were described in Chapters 9 and 10. Usually one starts with a set of data from an experiment. One then attempts to formulate a probability model which describes how the data were generated. If this model involves an unknown parameter θ, the probability of the data is found as a function of θ. From this one gets the likelihood function, MLE, log relative likelihood function, and likelihood intervals for θ.

The purpose of this chapter is to investigate how the log RLF and related quantities, such as θ̂ and likelihood intervals, would behave in a hypothetical series of repetitions of the experiment. Thus we imagine that the experiment is to be repeated over and over again under identical conditions. We assume that θ has the same value θ_0 (called the true value of θ) in all repetitions.

If the experiment were repeated, we would most likely get a different set of data. The likelihood function depends on the data, and so repeating the experiment would likely produce a different r(θ). The MLE and endpoints of the 10% likelihood interval would vary from one repetition to the next, and can be modeled as random variables. Their probability distributions in repetitions of the experiment with θ fixed are called their sampling distributions. In principle, it is possible to derive exact sampling distributions from the probability model. However, this is often very difficult to do, and approximations will be required (see Section 11.3).

Various other quantities associated with the log RLF can be studied. The most important of these for statistical applications is the likelihood ratio statistic,

    D = -2 r(θ_0).        (11.1.1)

It is more convenient to work with D rather than r(θ_0) because D is non-negative whereas r(θ_0) is non-positive. The factor 2 is included in (11.1.1) because it slightly simplifies matters in normal distribution examples.

In repetitions of the experiment, both r(θ_0) and D would vary according to the data obtained. D would be small whenever the data were such that the true value θ_0 was a likely value of θ, and D would be large whenever the data were such that θ_0 was unlikely. We can consider D as a random variable, and its sampling distribution can be derived from the model.

Likelihood ratio statistics will be used extensively in the next chapter, and a more general definition is given there. In the language of Section 12.2, D is the likelihood ratio statistic for testing the hypothesis θ = θ_0.

Note that the repetitions to which we keep referring are purely imaginary. Real experiments do not get repeated over and over under identical conditions. The series of repetitions is invented by the statistician to give a theoretical framework within which statistical procedures may be investigated and compared. It is often possible to imagine many different ways in which the experiment could be repeated, and the answers obtained will depend to some extent on the series of repetitions considered. We shall return to this troublesome point in Chapter 15.
EXAMPLE 11.1.1. Suppose that n = 10 people are chosen at random from a large homogeneous population and are tested for tuberculosis. The aim is to estimate θ, the proportion of people with TB in the population. We showed in Example 9.1.1 that if x of the 10 tested have TB, the log likelihood function of θ is

    l(θ) = x log θ + (10 - x) log(1 - θ)    for 0 < θ < 1.

The MLE is θ̂ = x/10, and the log RLF is r(θ) = l(θ) - l(θ̂). For instance, if 3 out of 10 are diseased, we have θ̂ = 0.3, l(θ̂) = -6.109, and

    r(θ) = 3 log θ + 7 log(1 - θ) + 6.109    for 0 < θ < 1.

This function is plotted with a solid line in Figure 11.1.1. We find that r(θ) ≥ -2.30 for 0.072 ≤ θ ≤ 0.635, and this is the 10% LI for θ based on the observation x = 3.

Now imagine that the experiment is to be repeated over and over, with θ fixed at a particular value θ_0. Then X, the number with TB in the sample, is a random variable with probability function

    f(x) = (10 choose x) θ_0^x (1 - θ_0)^{10-x}    for x = 0, 1, ..., 10.

Since the observed x would vary from one repetition to another, so would l(θ), θ̂, and r(θ). For instance, if we observed x = 2, the log RLF would be

    r(θ) = 2 log θ + 8 log(1 - θ) + 5.004,

and if we observed x = 0, the log RLF would be

    r(θ) = 10 log(1 - θ).

Altogether, there are 11 possible log RLF's corresponding to the 11 possible values of X. Figure 11.1.1 shows six of these, corresponding to X = 0, 1, ..., 5.

[Figure 11.1.1: the six log relative likelihood functions r(θ) for x = 0, 1, ..., 5, plotted against θ over 0 ≤ θ ≤ 1.]
Figure 11.1.1. Six of the eleven possible log relative likelihood functions in a binomial example.

Various features of the log RLF can be studied. First consider θ̂, the location of the maximum of r(θ). In repetitions of the experiment, θ̂ is a random variable with 11 possible values (see the second column of Table 11.1.1). The probabilities of these values can be found from the binomial distribution of X, and they depend on the true value θ_0. The last three columns of Table 11.1.1 give these probabilities for θ_0 = 0.1, for θ_0 = 0.2, and for θ_0 = 0.3.

Similarly, the left and right endpoints of the 10% LI are now regarded as random variables. There are 11 possible intervals [A, B] as shown in Table 11.1.1, and their probabilities are given in the last three columns of the table for θ_0 = 0.1, for θ_0 = 0.2, and for θ_0 = 0.3. Using the tabulated values, one can investigate how the 10% LI would behave in repetitions of the experiment with θ fixed (see the next section).

Finally, consider the likelihood ratio statistic

    D = -2 r(θ_0) = -2[l(θ_0) - l(θ̂)].

For any particular θ_0, there are 11 possible values for D corresponding to the 11 possible values of X. The probabilities of these values can be found from the binomial distribution of X. Table 11.1.2 gives the sampling distribution of D (i.e. its possible values and their probabilities) for θ_0 = 0.1, for θ_0 = 0.2, and for θ_0 = 0.3.

The value of D which corresponds to a relative likelihood of 10% is -2 log 0.1 = 4.61. From Table 11.1.2 we find that

    P(D ≤ 4.61) = P(X ≤ 3) = 0.987    for θ_0 = 0.1;
    P(D ≤ 4.61) = P(X ≤ 5) = 0.992    for θ_0 = 0.2.

Similarly, for θ_0 = 0.3 we have

    P(D ≤ 4.61) = P(1 ≤ X ≤ 6) = 0.961.

Table 11.1.1. Sampling Distributions of the MLE θ̂ and the 10% LI [A, B] in a Binomial Example

                                     Probability
     x    θ̂      A      B      θ_0 = 0.1   θ_0 = 0.2   θ_0 = 0.3
     0    0.0   0.000  0.206     0.349       0.107       0.028
     1    0.1   0.004  0.403     0.387       0.268       0.121
     2    0.2   0.029  0.530     0.194       0.302       0.233
     3    0.3   0.072  0.635     0.057       0.201       0.267
     4    0.4   0.128  0.725     0.011       0.088       0.200
     5    0.5   0.196  0.804     0.001       0.026       0.103
     6    0.6   0.275  0.872     0.000       0.006       0.037
     7    0.7   0.365  0.928     0.000       0.001       0.009
     8    0.8   0.470  0.971     0.000       0.000       0.001
     9    0.9   0.597  0.996     0.000       0.000       0.000
    10    1.0   0.794  1.000     0.000       0.000       0.000
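The calculations behind Tables 11.1.1 and 11.1.2 are easy to automate. The Python sketch below is illustrative (the tolerance and root-finding routine are implementation choices, not part of the text): for each possible observation x it computes the MLE, the 10% likelihood interval [A, B] by solving r(θ) = log 0.1, and the value of D = -2 r(θ_0), with probabilities taken from the binomial distribution of X.

import numpy as np
from scipy.stats import binom
from scipy.optimize import brentq

n, eps = 10, 1e-9

def loglik(theta, x):
    return x * np.log(theta) + (n - x) * np.log(1 - theta)

def max_loglik(x):
    theta_hat = min(max(x / n, eps), 1 - eps)        # avoid log(0) when x = 0 or x = n
    return loglik(theta_hat, x)

def likelihood_interval(x, p=0.1):
    """Endpoints of the 100p% likelihood interval {theta : r(theta) >= log p}."""
    f = lambda th: loglik(th, x) - max_loglik(x) - np.log(p)
    lo = 0.0 if x == 0 else brentq(f, eps, min(x / n, 1 - eps))
    hi = 1.0 if x == n else brentq(f, max(x / n, eps), 1 - eps)
    return lo, hi

for theta0 in (0.1, 0.2, 0.3):
    print(f"theta0 = {theta0}")
    for x in range(n + 1):
        lo, hi = likelihood_interval(x)
        D = -2 * (loglik(theta0, x) - max_loglik(x))
        print(f"  x={x:2d}  LI=[{lo:.3f}, {hi:.3f}]  D={D:6.2f}  P(X=x)={binom.pmf(x, n, theta0):.3f}")

Running this reproduces, for example, the interval [0.072, 0.635] for x = 3 and the entries of the probability columns shown above.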

Table 11. l.2. Sampling Distribution of the Likelihood Ratio Statistic and B = X + c, with sampling distributions
=
D -2r(B 0 ) in a Binomial Example A~ N(B 0 - c, 1); B ~ N(B 0 + c, 1).
B0 =0.1 Bo 0.2 B0 =0.3 The likelihood ratio statistic is
x D(x) f(x) D(x) f(x) D(x) f(x)

0 2.11 0.349 4.46 0.107 7.13 0.028 which would also vary from one repetition of the experiment to another. To
0.00 0.387 0.73 0.268 2.33 0.121 find the sampling distribution of D, we note that, by (6.6.5),
2
3
0.89
3.07
0.194
0.057
0.00
0.56
0.302
0.201
0.51
0.00
0.233
0.267 Z =X - B 0 ~ N(O, 1).
4 6.22 0.011 2.09 0.088 0.45 0.200 Thus, by (6.9.8), we have
5 10.22 0.001 4.46 0.026 1.74 0.103
6 15.01 0.000 7.64 0.006 3.84 0.037 D = z2 ~ xf1)·
7 20.65 0.000 11.65 0.001 6.78 0.009 In this example the sampling distribution of the likelihood ratio statistic is
8 27.25 0.000 16.64 0.000 10.68 0.001
9 35.16 0.000 22.91 0.000 15.88 0.000
xfl) for all possible parameter values B0 • The situation is much simpler than in
10 46.05 0.000 32.19 0.000 24.08 0.000 the preceding example, where the sampling distribution of D depended on the
true value B0 •
To calculate probabilities for D, we can use either Table B4 for the chi-
In all three cases, values of D greater than 4.61 would occur very rarely in square distribution, or Table B2 for N(O, 1) (see Appendix B). For instance,
repetitions of the experiment. The true value B0 would almost always have a P(D $ 4.61) = P(xfl) $ 4.61) ~ 0.97
relative likelihood of 10% or more, and therefore a D-value of 4.61 or less.
by interpolating in Table B4. For greater accuracy, we note that
EXAMPLE 11.1.2. Suppose that an experiment involves taking a single mea-
surement x which is modeled as the observed value of a random variable P(D $ 4.61) = P(Z 2 $ 4.61) = P(IZI ::s; 2.146) = 0.968
X ~ N(B, 1). If the measurement interval is small, the log likelihood function from Table B2. This result is true for all B0 , whereas in the preceding example
of Bis we found that P(D $ 4.61) depended on the true value B0 •
l(B) = -t(x - B) 2 for - oo < B< oo, EXAMPLE 11.1.3. Suppose that we observe n measurements x 1 , x 2 , ... , x"
which we model as observed values of independent N(µ, () 2 ) variates. As in
from which we obtain {) = x and l(O) = 0. The log RLF is Example 9.7.1 we assume that (j is known and that the measurement intervals
r(B) = -t(x - 8)2 for - oo < B < oo. 1
are small. Then the MLE ofµ is fl= x= - LX;, the log RLF is
n
Upon solving r(B) ~log p, we find that the 100p% likelihood interval for Bis
given by n 2
for - oo < µ < oo,
r(µ) = - 2()2 (x - µ)
x-c$B$x+c
and the 100p% likelihood interval is
where c=J-21ogp. C(J C(J
Now imagine that the experiment is to be repeated over and over again x- :s;µ:s;x+
with B fixed at a particular value B0 . The probability distribution of X in
repetitions with B= B0 is N(B 0 , 1). Since the observed value of X would vary
from one repetition to the next, so would r(O), 0, and the endpoints of
where c = J-2 log p.
likelihood intervals. We can now think of{) as a random variable, 0 = X, with Now imagine that the experiment is to be repeated over and over with
sampling distribution u = µ 0 . The observed x/s would vary from one repetition to the next, and thus
0 ~ N(B 0 , 1). so would x. By (6.6.8) we have X ~ N(µ 0 , () 2 /n), and therefore

Similarly, the endpoints of the 100p% LI are random variables A= X - c µ =X ~ N(µ 0 , () 2 /n).

This is the samplin g distribu tion of the MLE in repetitio ns of the This is not a conditio nal probabi lity. Rather, the notation is
experim ent meant to
withµ= µ . Similarl y, the endpoin ts of the 100p% LI are now emphas ize that the true parame ter value B is to be used in comput
0 regarde d as 0 ing the
random variable s X - ca/Jn and X +ca/Jn . · coverag e probabi lity.
By (11.1.1), the likeliho od ratio statistic is The coverage probabi lity CP(B0 ) is the fraction of the time that the
interval
[A, B] would include the true value B in a large number of repetitio
D = -2r(µ 0) =2(A
n
(j
-
"U
µo)
2
= z2
0
experim ent with B = B0 • Note that A and B are the random
ns of the
variable s in
(11.2.1) and 80 is fixed.
where Z is the standar d form of X:
Z =(X - µ 0 )/~ ~ N(O, 1). EXAMPLE 11.2.1. Imagine that the experim ent describe d in Exampl
e 11.1.1 is
It follows by (6.9.8) that for all µ the samplin g distribu tion of Dis 2 e
to be repeated over and over again with fixed at B . Each time
0 the 10%
0 X with one likeliho od interval for Bis to be calculat ed. We want to know what
degree of freedom . The same result was obtaine d in the precedin fraction of
g example, the time this interval would contain the true value B .
which is the special case n = u = I. 0
e
First conside r repetitio ns with = 0.1. The 11 possible 10% likeliho
od
interval s are listed in Table 11.1.1 , and from the table we see that
As 0.1 s B
Note on Notati on whenever 0 s X s 3. Thus the coverag e probabi lity of the
10% LI in
repetitio ns with B= 0.1 is
In previou s chapter s we have used capital letters for random variable
s and CP(0.1) = P(A s 0.1sBI O=0.1) = P(O s X
the corresp onding small letters for their possible values. We shall s 3!B = 0.1).
no longer
follow this convent ion in all cases. In particul ar, we shall use 8 Now from the 5th column of Table 11.1.1 we get
to represen t
the MLE of e whether we are thinking of it as a random variable in CP(O.l) = 0.349 + 0.387 + 0.194 + 0.057 = 0.987.
repetitio ns of the experim ent, or as the particul ar value comput
ed from the
data. Also, we shall use r(8 ) for the log relative likeliho od of Bo The coverage probabi lity in repetitio ns with B = 0.2 is
0 whether we
are conside ring it to be fixed or random , because we are already CP(0.2) = P(A s 0.2 s BIB= 0.2) = P(O s X
mean e'.
using R to s 5)8 = 0.2).
Using the 6th column of Table 11.1.I we get

CP(0.2) = 0.107 + 0.268 + ··· + 0.026 = 0.992.


11.2. Coverage Proba bility Similarly, we find that

Suppos e that the probabi lity model for an experim ent involves
CP(0.3) = P(A s 0.3 s BIB= 0.3) = 0.961.
a single In repetitio ns of the experim ent with 8 = 0.1, the 10% LI for
unknow n parame ter B. As in Section 11.1 we imagine a series of repetitio B would
ns ~f 0
include, or cover, the true value 98.7% of the time. Similarly,
the experim ent withe fixed at Bo. Further , imagine that an interva~ the 10% LI
~A, B] IS would cover the true value B0 in 99.2% of repetitio ns with B =
to be comput ed from the data in the same way at each repet1t10 0.2, and in
n. For 96.1% of repetitio ns with 80 = 0.3.
0
insta nce, [A, B] could be the 10% likeliho od interval for 8.
Owing to random variabil ity in the data, the interval [A, B] would In this example, the coverag e probabi lity depends on the true value
vary B0 . It
from one repetitio n to the next. Its endpoin ts A, B can be modeled can be shown that CP(B 0 ) varies from a low of 89.3% when B
as random 0 is 0.206 or
variable s. The samplin g distribu tions of A and B can be derived 0.794 to a high approac hing 100% as 8 tends to 0 or 1. Owing
from the 0 to the
probabi lity model, and they will generally depend upon Bo. discreteness of the binomia l distribu tion, CP(B ) is not a continu ous
. 0 function .
Since the interval [A, B] would vary from one repetitio n to the For instance, we see from Table 11.1.1 that the upper endpoin t of
next, it the 10% LI
might sometim es fail to include the true value B . Hopeful ly corresp onding to X = 0 is 0.206. In comput ing CP(B ), we would
this would 0 include
0
happen only rarely, and the interval [A, B] would usually contain P(X = 0) for B0 s 0.206 but not for B0 > 0.206. As 8 increase s through
, or cover, 0 0.206,
the true value 80 . the coverage probabi lity suddenl y decreas es by
The coverage p,robability of the random interval [A, B] is the probabi P(X = OIB0 =
lity 0.206) = (0.794) 10 = 0.100.
that the interval [A, B] includes , or covers, the true parame ter
value Bo: Similarly, CP(8 0 ) will have a disconti nuity at each of the other endpoin
ts of
(11.2.1) the 11 possible likeliho od interval s.
EXAMPLE 11.2.2. In Example 11.1.2, the I 00p% likelihoo d interval for
B has
the form ' EXAMPLE 11.2.3. In Example 11. l.3 the 100p% likelihoo d interval for
µ has
the form
X-c::;;B ::s;X+ c
cu; cu
where c = J- 2 log p. By (11.2.1), the coverage probabil ity of this interval in X--< µ<X +-
Jn- -
repetitio ns of the experime nt with 8 = B is
0
Jn
CP(B 0 ) = P(X - c:::::; B0 ;:;: X + c!B = B0 ) where c = J-
2 log p. The coverage probabil ity of this. interval in repetitio ns
of the experime nt with µ = µ 0 is
= P(-c s X - 00 :::::; c!B = B0 ) .
The distribut ion of Xis N(B 0 , I), so (6.6.5) gives

Z=:X-0 0 ~N(O, 1).


It follows that =P(-c::: ::; ~ sclµ=µ o)·
CP(B 0 ) = P( - c :::::; Z :::::; c)
The probabil ity distribut ion of X when µ = µ is N(µ , u 2/n), and therefore
0 0
where Z ~ N(O, I) and c = J -2 log p. For any given p, the coverage
probabil ity can be found from Table Bl or B2, and it does not depend on
B0 . Z = Xr:It:.
:-- µ o
~ N(O, I).
If p = 0.1, then c = J- 2 log 0.1 = 2.146, and so the coverage probabili ty
2
.y u / n
of the 10% LI is Thus we have
CP = P(-2.146:::::; Z:::::; 2.146) = 0.968. CP(µ 0 ) = P(-c :::::; Z:::::; c)
The 10% LI is X ± 2.146, and it would include the true value of Bin 96.8%
repetitio ns with B fixed. Similarly , it can be shown that the coverage
of where Z ~ N(O, 1) and c = J-
2 log p. For every p, the coverage probability
of the 100p% likelihoo d interval is the same as in the precedin g example
probabil ities of the 50% and I% likelihoo d intervals are 0.761 and 0.9976, .
respecti vely (see Table 11.2.1).
From Table Bl we note that
Use of the Likelihood Ratio Statistic
P( -1.96:::;: Z:::::; 1.96) = 0.95.
Thus to obtain 95 % coverage probabil ity we require c = 1.96. Now solving In Example 11.2.0we used a list of all possible 10% likelihoo d intervals
in
c= J- 2 log p for p gives
calculati ng the coverage probabil ity. Essential ly the same method of calcu-
lation was used in Example s 11.2.2 and 11.2.3, except that there we were
able
p =e- ' = 0.147.
2 2
/ to use formulas for the interval endpoint s.
The 14.7% likelihoo d interval for Bis X ± 1.96. It would cover the true It is not really necessary to determin e all possible likelihoo d intervals ,
value either numerica lly or by formula, in order to determin e their coverage
of 0 in 95% of repetitio ns of the experime nt with B fixed . Similarly, we
can probabil ity. Instead, one can find coverage probabil ities of likelihoo
show that the 25.8% and 3.6% likelihoo d intervals have coverage proba- d
intervals from the sampling distribut ion of the likelihoo d ratio statistic.
bilities 0.9 and 0.99, respectiv ely.
The 100p% likelihoo d interval (or region) for B is the set of paramet
er
values such that R(B);;:: p, or equivalen tly, r(B);;:: log p. A particula r paramet
er
value 80 belongs to the 100p% LI for B if and only if r(B );;:: log p.
Table 11.2.1. Coverag e Probabil ities of I 00p% Likeliho od Intervals in 0 Since
D = - 2r{80 ), it follows that 80 belongs to the IOOp% LI if and only if
Example s 11.2.2 and 11.2.3 D:::::; - 2 log p. Therefor e, the coverage probabil ity of the I 00p% likelihoo
d
p 0.5 interval is
0.1 0.01 0.258 0.147 0.036
CP 0.761 CP(B 0 ) = P(B 0 belongs to 100p% LIIB = B0 )
0.968 0.9976 0.9 0.95 0.99
= P(D:::::; -2 log plB = B0 ) . (11.2.2)
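Formula (11.2.2) also makes coverage calculations easy to automate. The Python sketch below is illustrative only: it evaluates CP(θ_0) for the 10% likelihood interval in the binomial example with n = 10 at a few values of θ_0 (the particular grid of θ_0 values is an arbitrary choice), and shows the sudden drop in coverage near θ_0 = 0.206 described earlier.

import numpy as np
from scipy.stats import binom

n = 10
cutoff = -2 * np.log(0.1)                 # D-value corresponding to 10% relative likelihood

def loglik(theta, x):
    l = 0.0
    if x > 0:
        l += x * np.log(theta)
    if x < n:
        l += (n - x) * np.log(1 - theta)
    return l

def max_loglik(x):
    return loglik(min(max(x / n, 1e-9), 1 - 1e-9), x)

def coverage(theta0):
    """CP(theta0) = P(D <= -2 log 0.1) computed from the binomial distribution of X."""
    cp = 0.0
    for x in range(n + 1):
        D = -2 * (loglik(theta0, x) - max_loglik(x))
        if D <= cutoff:
            cp += binom.pmf(x, n, theta0)
    return cp

for theta0 in (0.1, 0.2, 0.205, 0.21, 0.3, 0.5):
    print(f"theta0 = {theta0:5.3f}   CP = {coverage(theta0):.3f}")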
EXAMPLE 11.2. l (continued). Since -2 log 0.1 = 4.61 , the
coverage proba- (b) Determine p so that the 100p% likelihood interval
for ti will have coverage
bility of the 10% likelihood interval is probability 0.975.
CP(B 0 ) = P(D s; 4.61 IB::, B0 ). 4. A deck of I 000 cards contains a card from each of 901
denominations 0, I, ... , 900
Colum ns 2 and 3 of Table 11.1.2 give the sampling distrib and 99 extra cards from some unknown denom ination 0.
The possible values of 0
ution of D for are 0, 1, ... , 900. One card is selected at random from the
B = 0.1, and we see that D:::;; 4.61 whenever 0:::;; X ~ deck and its denomin-
3. It follows that ation X is observed.
CP(O. l) = P(O s; X ~ 31B = 0.1) = 0.987 (a) Show that &= x and
as before. Similarly, when B = 0.2 we have D ~ 4.61 for
{~.01
0s X ~ 5, and so for ti= x;
R(O)=
CP(0.2) = P(O s X:::;; 51B = 0.2) = 0.992. for 0# x.
(b) Show that the likelihood ratio statistic D = - 2r(0
Note that (11.2.2) permits us to find coverage probabiliti~s ) has just two possible
without having values, 0 and 4 log 10, with probabilities 0.1 and 0.9,0 respect
to calculate all possible intervals. For instance, since - ively.
2 log 0.5 = 1.386, the (c) Show that, for p > 0.01, the 100p% likelihood interva
coverage probability of the 50% likelihood interval is l for 0 consists of a single
value, and its coverage probability is 0.10.

CP(B 0 ) = P(D ~ 1.386!B = 80 ) .


This can be found from Table 11.1.2 for 8 = 0.1, 0.2,
and 0.3. It is not
0
necessary to calculate the 11 possible 50% likelihood
intervals in order to
11.3. Chi-Square Approximations
evaluate the coverage probability.
We noted in Section 11.2 that coverage probabilities of
likelihood intervals
EXAMPLES 11.2.2 and 11.2.3 (continued). We showed in can be found from the sampling distribution of the likelih
Section 11.1 that for ood ratio statistic
both of these examples, the sampling distribution of Dis
xfi> for all parameter D = - 2r(B 0 ). The exact distribution of D was derive
d under binomial and
values. Thus, in both examples, the coverage probab normal models in the examples of Section 11.1. We found
ility of the 100p% that the distri-
likelihood interval is bution of D depends on the true value B in the binomial
0 example. However,
in the normal examples, the distribution of D is x?l) for
CP = P(xfi) ~ - 2 log p). all 80 .
It is usually difficult to derive the exact sampli
ng distrib ution of D.
It follows from (6.9.8) that, for all d > 0, Fortunately a good approx imatio n is available. It turns
out that, in many
situations, the distribution of D is remarkably close to
P(xf1 >:::;; d) = P(Z 2 :::;; d) = P(-j d s; Zs xfi>:
jd)
where Z - N(O, 1). Thus we have D = -2r(B 0 );::; xf0 . (11.3.1)
The results in Table 11.2.1, which are exact for the
CP= P(-cs Zs;c ) norma l distrib ution
examples, will provide good approx imatio ns in many
situati ons with non-
where Z ~ N(O, 1) and c = J-2 log p. This is the same normal distributions. Coverage probabilities of 25.8%
result that we , 14.7%~
obtain ed previously by working with formulas for the likelihood intervals will usually be close to 0.9, 0.95, and
likelihood intervals. 0.99, respectively .
. The Central Limit Theorem can be used to establish (11.3.1
) when the data
are observed values of n IID random variables X , X
PROBLEMS 1 2 , ... , X". It can be
FOR SECTION 11.2 shown that, under some mild regularity conditions,
1. In Example 11.2.1, find the coverage probability of the
20% likelihood interval for Jim P(D:::;; dlB =Bo)= P(xf0
0 in repetitions of the experiment with 0 = 0.1. Repeat for 0 :::;; d) (11.3 .2)
= 0.2 and for 0 = 0.3.
2. In Example 11.2.1, show that the 100p% likelihood
interval for 0 has the same for all d > 0. A brief justification of this result is given at the
end of the section .
coverage probability in repetitions with 0 = 1 - 0 as in The most impor tant condition required in proving (11.3.2
0 repetitions with 0 = 00 . ) is that the range
3. t Consid er the situatio n described in Example 11.2.2. of the X/s must not depend upon 8. Also, it is assumed
that B0 is an interior
point of the parameter space. The x2 approx imatio n need
(a) Find the coverage probability of the 20% likeliho not hold if B0 is on
od interval for µ in the bound ary of the param eter space. If B is close to the
repetitions of the experiment with µ fixed. 0 bound ary, a very
large value of n may be needed before the x2 distrib ution
provides a good

approximation. The limiting distribution of D is xf1 for all interior parameter Table 11.3.1. Exact and Approximate Probabilities for the Likelihood
values 80 , but the sample size needed to achieve ~easonable accuracy may Ratio Statistic in a Binomial Example
depend upon 80 .
. In ~he following three examples, the accuracy of (11.3.1) is investigated in n 80 P(D s 2.706) P(D s 3.841) P(D s 6.635)
,
s1t~at10ns. where the exact sampling distribution of D can be derived fairly IO 0.1 0.930 0.987 0.998
easily. Usmg the exact sampling distribution, we shall calculate 0.2 0.859 0.859 0.992
0.3 0.924 0.961 0.961
P(D :s; 2.706), P(D $ 3.841), P(D $ 6.635).
?O 0.1 0.835 0.867 0.998
The values 2.706, 3.841, and 6.635 were chosen because they are the 90%,
0.2 0.899 0.956 0.986
95%, and 99% points of xfl) (see Table B4). Thus, if (11.3.1) holds, the three 0.3 0.917 0.947 0.987
probabilities should be close to 0.9, 0.95, and 0.99.
By (11.2.2), P(D $ d) is the coverage probability of the 100p% likelihood 50 0.1 0.908 0.942 0.992
interval where d = - 2 log p; that is, p = e-d/1. Since e- 2 . 7 o612 = 0.258 0.2 0.891 0.951 0.988
P(D $ 2.706) is the coverage probability of the 25.8% likelihood interval'. 0.3 0.912 0.957 0.987
Similarly, P(D $ 3.841) and P(D $ 6.635) are coverage probabilities of 14.7% x2 approx. 0.9 0.95 0.99
and 3.6% likelihood intervals. The three examples provide a comparison of
the exact coverage probabilities of these three likelihood intervals with their
approximate coverage probabilities from (11.3.1).
EXAMPLE 11.3.2. Suppose that an experiment yields n counts which are
EXAMPLE 11.3.1. Suppose that n people are tested for tuberculosis as in modeled as observed values of independent Poisson-distributed variates
Example 11.1.1. In that example we took n = 10 and derived the exact X 1 , X 2 , .. , X" with expected valueµ. From Example 9.1.2, the log likelihood
sampling distribution of D for 80 = 0.1, for 80 = 0.2, and for 80 = 0.3. function of µ is
The exact distribution of D when B0 = 0.1 is given in columns 2 and 3 of I(µ) = t logµ - nµ forµ> 0
Table 11.1.2. We see that D $ 2. 706 for X $ 2, and therefore
where t = r.x;. The MLE is jJ. = t/n, and the log RLF is
P(D $ 2. 706) = P(X $ 2) = 0.349 + 0.387 + 0.194 = 0.930.
Similarly, we have r(µ) = l(µ)- l(j;.) = t log~ - nµ + nj;.
µ
P(D:::::; 3.841) = P(X :s; 3) = 0.987;
t
P(D :s; 6.635) = P(X :s; 4) = 0.998. = -tlog--nµ+t.

These results are shown in the first row of Table 11.3.1. The values in the next
Now imagine a series of repetitions of the experiment with µfixed at µ 0 .
two rows of this table are obtained in a similar fashion from the last four
columns of Table 11.1.2. The last row of Table 11.3.1 gives the approximate
=
The total count T r.x; would vary from one repetition to the next, and is
modeled as a random variable. By the corollary to Example 4.6.1, the
probabilities according to (11.3.1). The other rows are obtained by redoing
probability distribution of Tis Poisson with mean m = nµ 0 •
the calculations of Example 11.1.1 with n = 20, and then repeating them again
with n = 50. The likelihood ratio statistic is
In this example, D is a discrete random variable having n + 1 possible
values, one for each value of X. The approximating x2 distribution 1s
continuous. For this reason, one would not expect the approximation (11.3.1)
D = -2r(µ 0 ) = 2[ Tlogf + m- T J
to be very accurate when n is as small as 10. As n increases, so does the D is a discrete variate with one possible value for each value of T. For any
number of possible values for D. When n is very large, the discreteness no 3iven m we can substitute T = 0, 1, 2, ... to obtain the possible values of D.
longer matters, and the distribution of D will be well approximated by x21 • Their probabilities are obtained from the Poisson distribution
The limiting distribution of D is xh> for all B0 such that 0 < B0 < 1. If B~ ls P(T = t) = m'e-m/t! fort= 0, 1, 2, ....
near i, (11.3.1) gives fairly accurate results for n = 20. However, a much larger
n is needed if B0 is close to 0 or 1. For instance, suppose that m = 10. Then

D 2 [ T log l: + 10 T J RLF is

() ()] =-n [ --1-log-


t t J
r(O)=-n [ --1-log-
which can be calculated for T = 0, 1, 2, .... (The term T log : is taken to be 0 e e ne ne
1
when T= 0.) From these calculated values, we find that D s 2.706 for Now imagine that the experiment is repeated over and over with 8 fixed at
6 s Ts 15. Since T has a Poisson distribution with m = 10, it follows that 80 • The total lifetime T= }2X 1 is now modeled as a continuous random
variable. Thus the likelihood ratio statistic
15
P(Ds2.706)= I lO'e- 10/t!=0.884.
D= -2r(8 0 )=:2n[_I_- l -log_I_J
t=6
neo n8o
Similarly, we obtain
is also a continuous variate. Note that
P(D s 3.841) = P(5:::::; Ts 16) = 0.944;
D=2n[Y-l-lo gY]
P(D:::::; 6.635) = P(4 :s; T:::::; 19) = 0.986. where Y= T/n0 0 , and so
These probabilities are recorded in the second row of Table 11.3.2, and the P(Dsd)=P(Y -1-log Y:S:d/2n).
last row gives the approximate probabilities from (11.3.1). The other rows of
the table are found by repeating the calculations with m = 5, 20, and 40. The Consider the function
table suggests that the x2 approximation is reasonably g~od for nµ 0 2: 10. g(y) = y - 1 - logy for y > 0.
Note that the exact distribution of D, and hence the accuracy of the x2
approximation, depends only on the product nµ 0 • If µ 0 is small, a very large 1
Since g'(y) = 1 - - and g"(y) = l/y 2 , we see that g(y) has a unique minimum
value of n will be needed before (11.3.1) is applicable. If µ 0 is large, (11.3.1) will y
give accurate results even for n = 1. value g(l) = 0. Also, g(y)-+ oo as y-. 0 and as y-> oo. Thus, for every d > 0,
As in the preceding example, the likelihood ratio statistic is a discrete there will exist two values y 1 , y 2 with y 1 < 1 < y 2 such that g(y) :s; d/2n if
variate whereas the x2 approximation is continuous. When m is small, there and only if y1 :::;; y :S:y 2 • To find _these values, we can solve the equation
are only a few values with appreciable probabilities and so we cannot expect g(y) - d/2n = 0 by Newton's method. We then have
(11.3.1) to be accurate. When mis large, there are many more D-values with
non-negligible probabilities and so the effects of discreteness will be less P(Dsd)=P(y1:::::; Ysy2)=P(Y1:::; ; n~o SY2)·
serious.
To evaluate this probability, we note that, from Problem 6.9.7, the variate
EXAMPLE 11.3.3. Suppose that an experiment yields n lifetimes which are U = 2T/80 has a x2 distribution with 2n degrees of freedom. Therefore,
modeled as observed values of IID exponential variates with mean e. From
Examples 9.4.1 and 9.7.2, the MLE of e is()= t/n where t = LX;, and the log P(D :S: d) = P (2ny 1 :S: ~: :S: 2ny 2) = P(2ny 1 s xfzn> $ 2ny 2 ).

For instance, suppose that n = 5 and we wish to evaluate P(D::::; 2.706).


Table 11.3.2. Exact and Approximate Probabilities for the Then d/2n = 0.2706, and y 1 , y 2 are the roots of the equation
Likelihood Ratio Statistic in a Poisson Example
y - 1 - log y - 0.2706 = 0.
nrio P(D ::;2.706) P(D :s; 3.841) P(D :s; 6.635) \
Solving this equation by Newton's method gives y 1 = 0.4326 and y 2 = 1.9261.
5 0.928 0.928 0.988 Thus we have
10 0.884 0.944 0.986
20 0.881 0.958 0.990 P(D::::; 2.706) = P(4.326 s xho):::;; 19.261).
40 0.886 0.951 0.991 Table B4 is not sufficiently detailed to permit accurate evaluation of this
x2 approx. 0.9 0.95 0.99 probability. However, a computer program for the c.d.f. of the x2 distribution
gives

P{xf1oi:::; 4.326} = 0.0686;


Differen tiating with respect to 80 gives
P{xf1oi::::: 19.261} =0.9629 .
S(B 0 ) ~ (B - B0 )f(B).
It follows that, for n = 5,
It follows by (11.3.3) that
P(D:::; 2.706) = 0.9629 - 0.0686 = 0.8943.
(B- B0 )j7(8):: :::: N(O, 1). ( 11.3.4)
The other entries in Table 11.3.3 may be calculat ed similarl y.
In this exampl e the exact distribu tion of D depends on n but The likeliho od ratio statistic is
not B0 . The
approxi mation (11.3.1) is quite accurat e even for ;1 as small as 2
a continu ous variate, and we do not have to contend with
or 3. Here Dis D =-2r(B 0 ):::::: (B- B0 ) 2 f(B) ~ z 2
the effects of
discrete ness as in the precedin g two example s. where Z:::::: N(O, I) by (11.3.4). It follows by (6.9.8) that D has approxi
mately a
xf1 , distribu tion when n is large. D
. JUSTIFICATION OF (11.3.2). We conclud e this section by sketchin
g a derivati on
of the x2 approxi mation (11.3.2). A rigorou s proof of this result
is beyond the
scope of the book. PROBLEMS FOR SECTION 11.3
We assume that the data are observe d values of n IID random 1. Consider the situation described in Example 11.3.2.
variable s
X t> X 2 , ... , X n· Since the X /s are indepen dent, it follows by (9.2.3) that
the (a) Show that, if nµ 0 = 9, then r(µ 0 ) ~log 0.1 if and only if 4 s T 5, 16.
score function S(B) can be written as a sum of n indepen dent compon Hence find
ents. We the exact coverage probability of the 10% likelihood interval when nµ
shall show in Section 11.6 that S(8 ) has mean 0 and variance 0 = 9.
0 E {f(B 0 ) }. By (b) Investigate the behavior of the coverage probability of the 10%
the Central Limit Theorem (6.7.1) we have likelihood
interval for 9 5, nµ 0 s 10.
2.tLet X,, X 2 , ... , X" be HD random variables with p.d.f.
for n sufficiently large. f(x) = 2).xe - .1x' for x > 0,
The MLE is the solution of S(B) = 0, and it can be shown
using the where ). is a positive unknown parameter.
e
precedin g result that tends to B0 with probabi lity 1 as n--+ oo.
It can then be
(a) Show that the likelihood ratio statistic is
shown that f (B)/ E {f (B 0 )}--+ 1 with probabi lity 1 as n-+ oo, and
hence
S(B0 )/j](B}:::::: N(O, 1) (11.3.3)
D =- 2r(.l. 0) = T- 2n - 2n 1og(T/ 2n)
where T= 2.l. 0 l:Xf.
for n sufficiently large. (b) Show that 2.l. 0 Xl has a x2 distribution with two degrees of freedom,
Since Bwill be close to B0 when n is large, the normal approxi mation and hence
(9.7.2) that T ~ Xfin>· :
gives (c) Show that the coverage probability of the 100p% likelihood interval
is the
same as in Example 11.3.3.

Table 11.3.3. Exact and Approx imate Probabi lities for the
Likeliho od Ratio Statistic in an Expone ntial Exampl e 11.4. Confidence Intervals
P(Ds 2.706) s s The random interval [A, B] is called a confidence interval (CI)
11 P(D 3.841) P(D 6.635) for e if its
coverag e probabi lity
1 0.874 0.932 0.984
2 0.886 0.941 0.987 CP(B 0 ) = P(A :::; B0 ::::; BIB= Bo)
3 0.891 0.944 0.988
5 is the same for all parame ter values B • The coverag e probabi
0.894 0.946 0.989 0 lity of a
10 0.897 0.948 confiden ce interval is called its confidence coefficient.
0.989
For instance , [A, B] is a 95% confide nce interval for B if
x2 approx. 0.9 0.95 0.99
P(A :$ B0 :::; BIB= B0 ) = 0.95
for all possible paramete r values 80 . A 95% CI would include the true
paramete r value B0 in 95% of repetitions of the experiment with B fixed. Another method of constructing confidence intervals is by inverting a test
In Examples 11.2.2, 11.2.3, and 11.3.3 we found that the coverage of significance (see Section 12.9). .
probability of the 100p% likelihood interval was the same for all parameter
values. In each of these examples, the 100p% LI is a confidence interval. In EXAMPLE 11.4.1. In Example 11.2.2 we noted that
particular, the confidence coefficient of the 14.7% LI is exactly 0.95 in Z:X-B 0 ~N(O , 1).
Examples 11.2:2 and 11.2.3, and is close to 0.95 in Example 11.3.3.
Likelihood intervals are not confidence intervals in Examples 11.2.1, 11.3.1, Since P{ -1.96 ~ Z ~ 1.96} = 0.95, it follows that
and 11.3.2 because their coverage probabilities depend on the true parameter ~ X + 1.96) = P( - 1.96 ~ Z ~ 1.96) = 0.95.
P(X - 1.96 ~ B0
value B0 . In general, when the probability model is discrete, the c~verage
probability of a random interval [A, B] will be a discontinuous function of B The interval [X - 1.96, X + 1.96] has coverage probability 0.95 for all B ,
0
(see Example 11.2.1). For this reason, it is generally not possible to construct0 and therefore it is a 95% confidence interval for B. It is also a likelihood
exact confidence intervals in the discrete case. However, the effects of interval. Values of B included by this interval are more likely than the
discreteness become Jess important as the sample size increases. Thus it is excluded values.
often possible to find approxim ate confidence intervals for which CP(B ) is There are plenty of ways to construct confidence intervals in this example.
nearly constant over those paramete r values B which are of interest.
0 For instance, Table B2 gives
0
Because of the x approximation, likelihood intervals are exact or
2
P( -2.376 ~ Z ~ 1.751) = 0.95.
approxim ate confidence intervals in most applications. When (11.3.1) applies,
the approxim ate confidence coefficient (coverage probability) of the 100p% Thus the interval [X - 2.376, X + 1.751] has coverage probability 0.95 for all
likelihood interval is given by B0 , and is also a 95% confidence interval for B. Note, however, that this is not
a likelihood interval. It includes values of Bat the lower end which are much
CP::::: P{xf1) ~ - 2 log p} less likely than values excluded at the upper end. Although this interval
(see Table 11.2.1).
would cover the true parameter value 95% of the time in repetitions of the
experiment, it would not properly summarize the information available
Interpre tation concerning B in any particular application.

Except in special cases, it is not correct to conclude that a particular observed


Use of the Normal Approximation
95 % confidence interval [a, b] has a 95% probability of including the true
value of B. It can happen that [a, b] contains all possible values of B, and so
The recommended method for obtaining, say, a 95% confidence interval is to
covers B0 with probability 100%. The 95% coverage probability is a
calculate the 100p% likelihood interval, where p is chosen so that the
theoretical average figure which refers to an imaginary sequence of repe-
coverage probability is close to 0.95. When (11.3.1) holds, the 14.7%
titions of the experiment. It is a property of the method used to construct the
likelihood interval is an exact or approximate 95% confidence interval. It can
interval rather than of the interval calculated in any particular case.
be found from a graph of r(B) as in Section 9.3, or by Newton's method as in
In most applications one has a particular observed data set and wants to Section 9.8.
know what can be learned from it about the value of B. If confidence intervals
Since confidence intervals constructed in this way are also likelihood
are to be useful in such applications, they must be constructed in such a way
intervals, they will provide proper summaries of the information available in
that an individual observed interval [a, b] does provide a reasonable
particular cases. A disadvantage of this construct ion is that a fair bit of
information summary. Values inside the interval should be in some sense
arithmetic may be needed to compute the likelihood interval. However, with
better estimates of B than values outside the interval. high-speed computers so widely available, this is not a serious problem.
For this reason, it is recommended that confidence intervals be constructed Sometimes one can avoid most of the arithmetic by using the normal
from the likelihood function. If a 95% confidence interval for B is desired
approximation of Section 9.7. By (9.7.3), the interval 8 ± c/..j:i{Ji) is an
then a 100p% likelihood interval is calculated where p is selected to give th~
approximate likelihood interval for B. Its coverage probability is
desired coverage probability of0.95. Intervals constructed in this way will have
the desired long-run coverage properties, and in addition, they will provide CP(B0 ) = P{O- c/ft{tfJ ~ B0 s 8 + c/ft{tfJIB =
useful information summaries in particular applications. B0 }
= P{ -c S (O - B0 )prtfJ S clB = B0 }.
116 11. Frequency Properties 11.4. Confidence Intervals 117

Now (11.3 .4) gives


or 0.096 $ 8 $ 0.244, as the approximat e 95% confidence interval for 8.
CP(8 0 );::;; P(-c ~ Z $ c) Intervals constructed in this way wo uld cover the true value of 8 abo ut 95 %
where Z - N(O, I). of the time in repetitions of the experiment with 8 fixed. However, thi s interval
Since P( -1.96 ~ Z $ 1.96) = 0.95, the interval is not a likelihood interval. Its endpoints have relative likelih oods
R(0.096) = 0.072; R(0.244) = 0.200.
e± 1.96/ JJ(B) (11.4.1)
Thus the interval includes values at the lower end. which are much less
is an approximat e 95 % confidence interval. Similarly, 8 ± 1.645/ jJ{B) is an plausible than values excluded at the upper end. F o r this reason, the fi rst
approximat e 90% confidence interval, and 8 ± 2.576/ftW j is an approxi- constructio n is preferable.
mate 99% confidence interval. EXAMPLE 11.4.3. Suppose that we have n = I 0 independen t observatio ns from
Although we can save arithmetic by using (11.4.1), there are two disad- an exponential distribution with mean 8, and that 8 = 28 .8 (see Example
vantages. The more serious is that the normal approximat ion (9.7.2) may be 9.7.2). We wish to obtain an approximat e 95 % confidence interval for 8.
inaccurate. Ifthis is so, the interval (11.4.1) may exclude some of the plausible From Example 9.7.2, the log RLF of 8 is
parameter values and include some parameter values which are very
implausible . The second disadvantag e is that the approximat ion (11.3.4),
which was used to evaluate the coverage probability of (11.4.1), is generally
less accurate than (11.3.1). Sometimes both of these difficulties can be
[8
r(8) = - n {J - 1 - log (j 8] = - 10 [28.8 2s.8J
- - - 1 - lo g B
8
for 8 > 0. By plottmg this function , or by Newton's method, we find that
overcome by making a suitable nonlinear transformat ion as in Section 9.7.
r(8);:::: log 0.147 for 16.42 $ 8 $ 57.47. This is an approximat e 95% co nfidence
However, in general, it is safer to compute likelihood intervals instead of
interval. In fact, from Table 11.3.3, the exact coverage probability of the
relying on ( 11.4.1 ).
14.7% likelihood interval in this situation is 0.948.
Alternatively, we note from Example 9.7.2 that
EXAMPLE 11.4.2. Suppose that x = 17 successes are observed in n = 100
Bernoulli trials with success probability 8. We wish to find an approximat e
95% confidence interval for 8.
~= Jn;e= 0.1098.
By (l 1.4.1), the approximat e 95% confidence interval is 28 .8 ± 17.85; that
The MLE is 8 = x/ n = 0.17, and the log RLF is
is, 10.95 s; 8 s; 46.65.
r(8) = 17 log 8 + 83 log(! - 8) + 45.589 for 0 < 8 < I. It can be shown that, with n = 10 independen t observation s from an
exponential distribution , the exact coverage probability of the interval
One can show that r(8);:::: log 0.147 for 0.105 $ 8 $ 0.251. This is a 14.7%
likelihood interval and also an approximat e 95% confidence interval for 8. 8 ± 1.96/ftWj is 0.9035. The approximat ion (11.3.4) is not very accurate in
Parameter values which belong to this interval are more plausible than this case. More seriously, the interval constructed is symmetric about 8
parameter values outside the interval. Also, over a series of repetitions of the whereas the log RLF is highly skewed (see Figure 9.7.1). The interval includes
experiment with 8 fixed, intervals constructed in this way would cover the very implausible values of 8 at the lower end, and excludes fairly plausible
true value of 8 about 95 % of the time. values at the upper end.
From Example 9.1.l , the information function is More satisfactory results are obtained if we apply (11.4.1) to the trans-
formed parameter A.= fr 113 . From Example 9.7.2 we have
x n- x 1 = (J- 1 13 = 0.3262;
J (8)= ei + (1-8)2 for 0 < e< 1. J .(1) = 9n/l 2 = 845.6.
By (11.4.1), the approximat e 95% confidence interval for A. is
Substitutin g 8 = 8 gives
1 ± 1.96/ j7.(f) = 0.3262 ± 0.0674,
..f x n-x n n n
0
( ) = 8 2 + (1 - 8) 2 = 8 + T=e = 8(1..:.T ) · or 0.2588 s; A. s; 0.3936. Since 8 = Jc - 3 , the interval for 8 is
(0.2588)- 3 ~ 8 ;:::: (0.3936) - 3 •
Now (11.4.1) gives
which gives 16.39 $ 8 s; 57.66. This is very nearly a likelihood interval, and .it
f8{l=l)
8 ± 1.96 y--;:;-- = 0.17 ± 0.0736,
can be shown that the exact coverage probability of intervals constructed m
this way is 0.949.
118 11. Frequency Properties
11.4. Confidence Intervals
119

PROBLEMS FOR SECTION 11 .4 Assuming that .their measureme nts are independe nt N(3727, o- 2 ) , obtain a like-
I. Suppose that the distributio n of the likelihood ratio statistic D = -2r(0 ) does lihood interval for u which is an approxima te 95% confidence interval.
0
not depend upon 80 . Show that, for all p, the 100p% likelihood interval for 8 is a
8. Let X 1 , X 2 , ... , X. be IID N(µ, o- 2 ) variates, where µ is known but u is unknown
confidence interval.
(see Problem 11.4.7).
2. In a poll of 200 randomly chosen voters, 94 indicated that they would vote for the
(a) Show that the likelihood ratio statistic is
Conservatives if an election were called. Let p be the proportion of all voters who
would vote for the Conservatives. Find a likelihood interval which is an
approxima te 95% confidence interval for p. Is it likely that p = t?
D =-2r(u =T- n - n log(T/ n)
0)

where L= L:(x;- µ) /u~. 2

3.t Five hundred people were chosen at random from a large population and were (b) Show that T has a x2 distributio n with 11 degrees of freedom.
asked their opinions on capital punishmen t for murderers of prison guards. Sixty (c) Show that the 100p% likelihood interval is a confidence interval, and describe
percent of those interviewed were in favor. Let p denote the fraction of the how its exact confidence coefficient can be determined.
population who favor capital punishment.
9. Let X 1 , X 1 , ... , X. be IID random variables having a gamma distributio n with
(a) Find likelihood intervals for p which are approxima te 95% and 99% p.d.f.
confidence intervals.
(b) Use the normal approxima tion to construct approxima te 95% and 99 %
confidence intervals, and compare them with the intervals in (a).
f(x} =; exp ( - ~)
2 for x > 0

4.t The following are the times to failure, measured in hours, of ten electronic where e is a positive unknown parameter.
component s: (a) Show that the likelihood ratio statistic is
2 119 51 77 33 27 14 24 4 37 = =
D -2r(8 0 ) T- 4n - 411 log(T/411)
Previous experience with similar types of componen ts suggests that the distri- where T= 2"LX;/8 0 .
bution of lifetimes should be exponential. The mean lifetime 8 is unknown. (b) Show that 2X,/80 has a x2 distributio n with 4 degrees of freedom, and hence
(a) Find a likelihood interval for (} which is an approxima te 90% confidence that T ~ xf4n»
(c) The total of n = 60 observatio ns was found to be "Lx; = 71.5. Find a likelihood
interval.
interval for 8 which is an approxima te 90% confidence interval. Will the exact
(b) Transform the result in (a) to obtain an approxima te 90% confidence interval
coverage probability be close to 90% in this situation?
for p, the proportion of componen ts whose lifetimes exceed 50 hours.
10. Let X 1 , X 2 , . .• , x. be IID exponentia l variates with mean 0, and define T= :EX;.
5. The number of accidents per month at a busy intersection has a Poisson
We noted in Example 11.3.3 that the distributio n of 2T/8 in repetitions of the
distributio n with mean µ, and successive months are independent. Over a 0
experiment with (} = 80 is xfini- Let a, b be values such that
10-month period there were 53 accidents altogether.
(a) Obtain a likelihood interval which is an approxima te 97.5% confidence P{xf2n) :$a}= 0.025 = P{xf2n) ~ b}.
interval for µ. (a) Show that the interval
(b) Use the normal approxima tion of Section 9.7 to obtain an approxima te
97.5% confidence interval forµ. 2T 5. fl< 2T
b - a
6. Wh en an automatic shear is set to cut plates to length µ, the lengths actually
produced are normally distributed about µ with standard deviation 1.6 inches. is a 95% confidence interval for e.
The average length of 15 plates cut at one setting was 125.77 inches. Find three (b) Let €IL and e. denote the lower and upper endpoints of the confidence interval
likelihood intervals for µ which are 90%, 95%, and 99% confidence intervals. in (a). Show that

7.t In a check of the accuracy of their measureme nt procedures, fifteen engineers are a 1
r(fl.) - r(flL) = n log b + 2(b - a).
asked to measure a precisely known distance of 3727 feet between two markers.
Their results are as follows:
(c) Using tables of the x2 distribution , evaluate r(O.) - r(Otl for n = 1, 5, 10, and
3727.75 3726.43 3728.04 3729.21 3726.30 15. Is the interval in (a) a likelihood interval? What happens as n increases?
3728.15 3724.25 3726.29 3724.90 3727.51 11.tLet X 1 , X z, . . ., X" be IID variates ha ving a continuous uniform distributio n on
3726.85 3728.50 3725.94 3727.69 3726.09 the interval [O, 8], where 8 is a positive unknown parameter.
120 11. Frequency Properties 11.5. Results for Two-Parameter Models 121

(a) Show that the likelihood ratio statistic is


that the distributions of D and D 2 can be approximated by x2 distributions in
D =-2r(B =-2n log(M/8
0) 0)
large samples:
where M ts the largest of the X/s. D;:::; xf2 > and D2 ;:::; xf!)·
(b) Show that
The x approximation has 2 degrees of freedom for D and only one degree of
2

P{ Ms mfB = B0 } = (m/8 0 )" forms B0 ; freedom for D 2 • See Section 12.3 for a discussion of degrees of freedom.
P{D s dfB= B0 } = 1-e-aii for d>O. The true value (a. 0 , /3 0 ) belongs to the 100p% likelihood region if and only if
r(a. 0 , /3 0 ) 2 log p. Thus the cover:ige probability of the 100p% likelihood
Thus Dis distributed as xfz)· Note that (11.3.1) does not apply here because region for (a., /3) is
the range of the X,'s depends on 8.
(c) Show that t.he l~p% likelihood interval has coverage probability I - p. CP(a. 0 , /3 0)= P(D :s; -2 log pla. = a. 0, /3 = /3 0)
(d) Fmd a hkehhood mterval for B which is a 95% confidence interval based on
the following sample of size IO. ;:::; P(xf2> :s; - 2 log p).
The exact coverage probability may depend upon a. 0 and /3 0 , but the
0.7481 0.7484 0.9537 0.1589 0.3773 approximation does not. Consequently, likelihood regions are approximate
0.3345 0.2906 0.8527 0.3479 0.9245 confidence regions in large samples.
By (6.9.3), the c.d.f. of xf2 > is
ford> 0.
11.5. Results for Two-Parameter Models It follows that
Suppose that the probability model for the experiment mvolves two unknown CP;:::; P(xf2 i :s; - 2 log p) = 1 - e10g P = 1 - p.
par~meters, a. and /3. Let r(a., /3) denote the joint log RLF of a. and f3 as in
The 100p% likelihood region for (a, /3) is an approximate 100(1 - p)%
Section 10.1. The 100p% likelihood region for (a., /3) is the set of parameter
confidence region.
values such that r(a., /3);::: Jog p (see Section 10.2).
The true value /3 0 belongs to the 100p% maximum likelihood interval for f3
In Section 10.3, we defined the maximum log RLF of f3 to be the maximum
of r( a., /3) over a. with f3 fixed: if and only if r max(/3 0);::: log p. Thus the coverage probability of the 100p%
maximum likelihood interval for f3 is
r max(/3) =max r(a, /3). CP(a 0 , /3 0) = P(D 2 :s; - 2 log pja = cto, /3 = /30)
The 100p% maximum likelihood interval for f3 is the set of all [3-values such ;:::; P(xfi> :s; - 2 log p).
that rmax(/3) 2 log p. This interval can be found from a graph of r maxC/3), or
Maximum likelihood intervals are approximate confidence intervals. They have
from a contour map of r(a, [3).
the same approximate coverage probabilities as likelihood intervals in the one-
Now imagine a series of repetitions of the experiment with (a, f3) fixed at
parameter case.
(ao, /30). We consider two likelihood ratio statistics:
Figures 10.2.2, 10.2.3, and 10.6.1 show both 10% likelihood regions and
D =- 2r(a 0 , [3 0 ); 10% maximum likelihood intervals for three numerical examples. The 10%
likelihood region consists of all points on or within the 10% contour, which is
D2 '= - 2r max(/30). roughly elliptical in shape. This region would include the true values of both
D is the. likelih.oo~ ratio statistic for testing the hypothesis (a, f3) = (a 0 , [3 0 ), parameters in about 90% of repetitions of the experiment with both
and .D2 1s the hkehhood ratio statistic for testing the hypothesis f3 = f3o (see parameters fixed. The broken vertical lines show the 10% maximum
Sections 12.2 and 12.3). likelihood interval for the first parameter. The true value of the first
The values of D and D 2 would vary from one repetition of the experiment parameter would lie between these lines about 96.8% of the time in
to the next depending upon the data obtained. In principle, their exact repetitions with both parameters fixed. Similarly, the true value of the second
sa~~ling distrib~tions can be derived from the probability model. In practice, parameter would lie between the broken horizontal lines about 96.8% of the
this 1s usually difficult to do, and so approximations are used. time.
It can be shown, under conditions similar to those given in Section 11.3, In the one-parameter case we considered two normal distribution
122 11. Frequency Properties 11.5. Results for Two-Parameter Models 123

examples for which the distribution of the likelihood ratio stat1st1c was (10.4.5) gives a good approximation to rmaxUJ). A nonlinear parameter
exactly xfii· A two-parameter example is given below in which the distri- transformation may help. See the discussion in Sections 9.7 and 10.4.
xf
butions of D and D2 are exactly xf2 > and 1 i, respectively. It can be shown
that the same is true in Example 10.1.1.
PROBLEMS FOR SECTION 11.5
EXAMPLE 11.5.1. Suppose that an experiment involves taking two measure- 1. Use (11.5.l) to obtain approximate 95% confidence intervals for f3 and y in
ments x, y which are modeled as observed values of independent variates Example 10.5.1.
X ~ N(IX, 1) and Y ~ N(fl, 1). It is easy to show that & = x, 1J = y, and 2. (a) Find approximate 90% confidence intervals for ex and f3 in Problem 10.1.1.
r(IX, fl)= - !(x - 1X)
2
- t(y - fl) 2 (b) Use the result of Problem 10.4.4(c) to obtain an approximate 90% confidence
interval for the parameter y =et - f3 in Problem 10.l.l.
for - oo <IX< oo and - oo <fl< oo, and that
3.tUse Problem 10.4.4(c) to obtain an approximate 99% confidence interval for the
r max(fl) = i(Y - {J)2. parameter y =et+ 2/3 in Example 10.5.1. Transform this interval to obtain an
approximate 99% confidence interval for the probability of death at log concentra-
Imagine a series of repetitions of this experiment with IX= IX 0 and fl= fl 0 • tion d = 2.
The two likelihood ratio statistics are as follows:
4. (a) Let X and Y be independent Poisson variates with means µ 1 and µ 2 , and define
D = 2r(1X 0 , flo) = (X - 1X 0 )
2
+ (Y- /J 0 ) 2 = Zi + Z~; y=log(µ 2 /µ 1 ). Derive the information matrix .§(jL 1 ,jL 2 ). Then use Problem
D2 =-2rmax(/Jo) =(Y - /J 0 )
2
=Z~. 10.4.4(c) to show that

Here Z 1 = X - IX 0 and Z 2 = Y -/J0 are independent N(O, 1) variates. It {1-,-t


log(Y/X) ± 1.96 '-./X -r y
follows by (6.9.9) and (6.9.8) that D ~ xf2 > and D 2 - xfll·
is an approximate 95% confidence interval for y.
(b) Calculate an approximate 95% confidence interval for log A. in Problem
Use of Normal Approximations 10.3.2(b), and transform it to obtain an approximate 95% confidence interval
for..\.
Normal approximations to r(a, fl) and r max(/J) were given in Section 10.4. 5. (a) Let X 1 , X 2 , ••• , Xn and Y1 , Y2 , ... , Ym be independent exponentially dis-
These results may be used to obtain approximate likelihood regions and tributed variates, with E(X;) = 8 1 and E(1j) = 8 2 • Define y =log (8 2 /8 1 ). Show
maximum likelihood intervals. Their approximate coverage probabilities can that
be obtained from the x2 approximations.
For instance, (10.4.5) gives -- Jf:fl
log(Y/X) ± 1.96 - +-
n m
rmax (/J) ::::; - !(P - /J) 2
/ .J 22
is an approximate 95% confidence interval for y.
where .J 22 is defined in Section 10.4. It follows from this result that (b) Use the result in (a) to obtain an approximate 95% confidence for), in Problem
10.3.3(c).
1J±c~ (11.5.1)
6.t Find an approximate 95% confidence interval for the median of the Weibull
is an approximate 100p% maximum likelihood interval, where distribution in Example 10.4.2.
c =-) - 2 log p. Hint: Show that log m =log 8 +(log log 2)//3, and use the result in Problem
By the x2 approximation to D2 , the approximate coverage probability of 10.4.4(c).
100p% maximum likelihood intervals is 7. Let Y1 , Y2 , .•. , Y,, be independent N(µ, cr 2 ) random variables whereµ and a are
unknown, and define y = µ/<J. Derive the information matrix .§(jL, a). Then use
P(Xfi> :s; - 2 log p) = P(xfll :s; c 2
) = P( - c :s; Z :s; c)
Problem 10.4.4(c) and (11.5.1) to show that
where Z - N(O, 1). Thus we see that the interval (11.5.1) is an approximate
confidence interval with confidence coefficient P( -c :s; Z :s; c). In particular, y± 1.96 ~(1 + ty 2 )
(11.5.1) gives an approximate 95% confidence interval when c = 1.96. n
This procedure will not produce sensible confidence intervals unless is an approximate 95% confidence interval for y.
124
11. Frequency Properties
I 1.6. Expected Information and Plann
ing Experiments
125
* 11.6. Expected Information and Planning
Experiments 1
J
O'.)
p=P (X> T)= ~e~xf 0 dx=e r;o
T 0
Up to this poin t we have assu med
that the expe rime nt had alrea dy been The n Y, the num ber surviving, has
selected and perf orm ed, and we have a bino mial (n, p) distr ibut ion, and the
considered the prob lem of extr actin likelihood function of 8 is log
info rmat ion abou t an unkn own g
para mete r 8 from the data . Stati
meth ods are also useful at the plan stical
ning stage in deciding wha t expe rime
nt
1(8) = y log p + (n- y) log( l
- p)
shou ld be performed.
where p = e-T/B . Differentiating t\fiCe with respect to
From Sections 9.7 and 11.4, the inter 8 gives
val

8±c /jJ( Ji)


is an appr oxim ate likel ihoo d/co nfid
ence interval for 0. As we note d in Sect
9.7, a large value of f(O) implies a ion Since E(Y) = np, the expected info rmat
shor t interval for 8. Thu s, if the aim ion func tion is
expe rime nt is to estim ate 0, we shou of the
ld try to select an expe rime nt for whic
~ p) G~ )2 = (/~:;04 = ;2 .h(p)
f(O) will be large. h
Jn general, f(O) is a func tion of 8 and f E(O) = p(l
possibly othe r features of the data .
Its value will not be know n until after where
the expe rime nt is performed. Conse-
quently, it is usua lly not possible
to use f(B) in plan ning the experime p(log p) 2
Inste ad, one can exam ine the expe nt.
cted value of the info rmat ion function, h(p) = 1 and p = e Tte.
-p
fE(O ) = E{"~(O)}. With the aid of a calculator, one can
easily show that h(p) reaches a max imu
This is called the expe cted informati value of 0.648 for p = 0.203. Thu s we m
on Junction of 8, or Fisher's meas shou ld try to pick Tso that abou t
expe cted information. We shall show ure of of the items tested will survive the test 20%
at the end of this section that f e(O); period. If we guess T nearly right, the
for all para met er values. ::: 0 expected info rmat ion is abou t 0.64 2
n/0 ; that is, abou t 64% of the expe
The expected info rma tion is a mea sure info rmat ion n/0 2 when all n expo nent cted
of the average precision that would ial failure times are observed.
be attai ned over a large num ber of repe This example is rath er artificial beca
titions of the experiment. It is relevant use we are not takin g costs into
only at the plan ning stage. Onc e account. If a shor ter experiment
the expe rime nt has been performe costs less, one migh t well get "mo
actu al precision can be assessed d, its info rmat ion per doll ar" by choo sing re
by calc ulati ng f(B) , or bette r yet, a smaller value of Tso that mor e
exam inin g a grap h of r(O). by 20% of items would survive testing. than
The following two examples illus trate
the use of expected info rmat ion in
plan ning experiments. EXAMPLE 11.6.2. Two experiments are
being cons ider ed for obta inin g info
atio n abou t a linkage para mete r 8, rm-
where 0 < 8 <~·Each expe rime nt wou
EXAMPLE 11.6.1. Sup pose that n item involve observing mult inom ial frequ ld
time T, and the num ber Y which surv
s are to be tested for a preset length
ive is to be recorded. The lifetimes
of For the first experiment, the prob abili
encies X 1 , X 2, X 3, X 4 where 1:X;
ties are
= n.
assu med to be inde pend ent expo nent are
ial variates with mea n 8. If Tis chos p 1 = p2 =8/2;
to be too small, the expe rime nt may en p3 = p4 = (1 -8)/ 2.
term inate before any failures occur.
is too large, all item s may fail long If T For the seco nd experim~nt, the prob
before the expe rime nt ends. In eithe abili ties are
these cases, one wou ld expect to learn r of
very little abou t 0. How large shou ld q 1 = (0 2 - 20 + 3)/4;
be chos en in orde r to maximize the T q2 = q3 = (20 - 8 2 )/4; q4 = (1- 8)2/4.
expected info rmat ion abou t 11?
Which ~xperiment can be expected
SOLUTION. Since the lifetime X of an to yield mor e info rma tion abou t 11?
item is assu med to have an exponent
distr ibut ion with mea n 0, the prob ial SOLUTION. The log likelihood function
abili ty that an item survives time Tis based on the multi~omial distr ibut ion
iskXi log p;, and differentiating twice with respect
to 8 gives
*This section may be omitted on first
reading f(8) =E 2 Xi (dp
-
i)l -EXi- (d p;) 2
-2
P; ·
dO p, dO
126 11. Frequency Properties 11.6. Expected Information and Planning Experiments
127

Since E(X;) = np" the expected informatio n function is As in Section 11. l, we imagine a series of repetition s of the experimen t with

.fE(8)=n l:- -
1 (dp;)2 -n:E-
d2p, 8 fixed at a particular value. The value of X will vary from one repetition to
the next. Sand .f are functions of X, and thus can· be considere d as random
p1 d8 d8 2 .
variables.
The latter sum is zero because I.p; = l. In what follows, we shall be taking expectatio ns over the distributi on of X.
Since dp;/d8 c= ±1 and dqjd8 = ±(1 - 8)/ 2, the expected informatio n func- Usually these expectatio ns would involve multiple sums or integrals.
tions for the two experimen ts are · However, for simplicity, we shall write all expectatio ns as single sums.
For any value of 8, the total probabilit y in the distributio n of X is equal to
n l ,,, ( ) n( l - 8) 2 l ·
fi(8) = - :E- an d .:r 2 8 = I.- . I, so
4 Pi 4 qi
2,f(x; 8) = l for all 8.
The ratio of these two functions, x

Now we differentiate with respect to 8. Assuming that the order of


.f 2 (8) = (1 - 8)2 I.l/ q; differentiation and summatio n can be interchang ed, we get
f I (8) Ll / p;'
is called the expected relative efficiency for the second experimen t versus the a a2
~ a f(x; 8) = O; ~ a82 f (x; 8) = o.
first, and is tabulated below: 8
But since
0.0 0.1 0.2 0.3 0.4 0.5
I 0.88 0.77 0.65 0.55 0.44

For all 8 > 0, the first experimen t is more efficient (has larger expected af
informatio n) than the second, and it is considera bly more efficient for 8 we have a =Sf, and therefore
8
near t . If costs were equal, the first experimen t would be preferable to the
second. O 2,Sf= I af =O.
x x a8
This shows that the expected value of the score function is zero. Also we have
Properti es of the Score and Informa tion Functions
We conclude this section by showing that, under suitable regularity con- f =-
as
a8 =-
a [ l of
a8 7 a8 J = f
1 (af )
2 a8
2

-
2
l a f
7 a8 2'

ditions, the score function S(8) has expected value 0 and variance equal to the
from which we obtain
expected informati on fE(8)=E {f(8)}. Since variances are non-negative, it
then follows that f E(8) ~ 0.
Let X be a random variable or vector of r~ndom variables having
probabilit y or probabilit y density function f(x; 8) which depends on a It now follows that
continuou s paramete r 8. The likelihood function of 8 is proportio nal to
f(x; 8), 2 a1f
Is J - Iff - I a8 = o.
-
2
L(8) = c ·f(x; 8), x x x

where c is positive and does not depend on 8. The score and informatio n The first sum is E(S 2 ), and the second sum is the expected informati on, so we
functions are have shown that

S( 8 ) = alog L = alog f. E(S 2 ) = E(.f).


ae a8 ' Since E(S) = 0, it now follows that
f( ) = _
8
az log L = _ alog S = _ o log f
2
E(f) = E(S 2 ) =var (S)
a8 2 a8 ae 2 as required.
128
11. Frequenc y Propertie s 11.7. Bias
129
PROBLEMS FOR SECTION 11.6
Lt (a) According to the Hardy- Weinberg Law, genotypes AA, Aa, and aa should *11.7. Bias
occur in a populati on with relative frequencies 8 2 , W(l - 8), and
(1- ()) 2 , Many statistics textbooks suggest that unbiasedness is a desirable propert
respectively. In an experiment to estimate th.e gene frequency 8, n randoml
y
y of
parameter estimates. Indeed, it is 'often suggested that one should
chosen individuals are to be examined, and the frequencies Y , Yi, Y restrict
1 3 with the attention to estimates which are unbiased, and that the "best" estimate
three genotypes are to be recorded . Find the expected informat ion function is the
of 0. unbiased estimate having the smallest variance. In this section, unbiase
d
(b) Suppose that it is very difficult and expensive to distinguish between estimates will be defined, and some examples will be given to illustrat
the Aa e their
and aa genotype s, and that three times as many individuals can be examine
d if properties.
only those with the AA genotype are to be identified . Find the As in Section 11.l, we suppose that the probability model for
expected an
information function of 8 if 3n individuals are to be classified as experiment depends upon an unknown parame ter 8, and we imagine
AA or a series
not AA. of repetitions of the experiment with 8 fixed. Let T be an estimate of 8,
(c) Under what conditio ns would you recommend doing the experime such as
nt in (b) the maximum likelihood estimate B. The value of T would vary from
rather than that in (a)? one
repetition to the next, and so we model T as a random variable. Its samplin
g
distribution will depend upon 8, and can be derived from the probabi
2. (a) The lifetimes of electroni c compone nts are independent and exponen lity
ually model.
distribut ed with mean 8. Suppose that n components are to be tested
for a If T has good frequency properties, the values of T obtained in a series
preset time T. The number M which fail and their failure times Y , Yi
• ... , YM
of
are to be recorded. Show that 1 repetitions should be clustered tightly about the value of 8. This means
that
the sampling distribution of T should be centered near 8, and should
have a
small spread.
A convenient measure of the "center" of a probability distribu tion
is the
mean, E(T). The difference between E(T) and 8 is called the bias of
Hint: Use the fact that the score function has expected value zero
to evaluate T:
E{LY; } in terms of E{M} . Bias= E(T) - 8.
(b) Examine the expected efficiency of the experiment in (a) relative
to that
described in Example 11.6.1. Tis said to be an unbiased estimate of 8 if E(T) = 8 for all possible parame
ter
(c) For the experiment described in (a), is it better to test 2n compone values.
nts for time
T, or n compon ents for time 2r? The spread of a probability distribution is usually measured by a second
moment. The second momen t of the random variable T - 8 is called the
mean
3.t Suppose that X, the number of insects on a plant, has a Poisson distribut
ion with squared error of T:
mean 100. When an insecticide is applied, each insect on a plant has
probability p
of surviving, independently of other insects. Two experiments for estimatin MSE = E {(T- 8) 2 } .
g pare
being considered. In the first experiment, both the initial number of insects If this is small, the estimates obtained in a series of repetitio
X; and ns of the
the number Y; which survive the insecticide are to be recorded for n experiment would be clustered about 8. It is sometimes suggested
plants. In the
second experiment, the initial count is omitted, and only the Y;'s that one
are to be should choose an estimate T which minimizes the mean squared error.
recorded . Show that the expected efficiency of the second experime
nt relative to If Tis unbiased, then E(T) = 8, and hence the mean squared error is
the first is 1 - p. equal
to the variance of T. An estimate T which is unbiased and has the
smallest
4. Consider a one-to-one paramet er transformation from 8 to 2 = g(O). possible variance is called MVU (minimum variance unbiased).
Show that .
the expected information function of 2 is given by The following examples illustrate some general results concerning
un-
biased estimates.
(do)i
d2 f £(8). EXAMPL E 11.7.1. If Xis the number of successes obtaine d inn independent
trials with success probability 8, the estimate of 8 obtained by the method
of
5.* In the experiment described in Problem 9.5.5(b), only points
of impact on the
target are recorded . Show that the expected efficiency of this experime
=
maximum likelihood is T X /n (Example 9.1.1). Since E(X) = n8, we
have
nt, relative
to one in which all points of impact are recorded, is equal to the probabil
ity that a
shot misses the target. •This section may be omitted on first reading.
130 11. Frequency Properties 11.7. Bias 131

1 1
E(T) E(X) = -(n8) 8,
n n 1 1
that where a= -Lai. If Lai= 1, then a= and La( is minimized for
and hence T is an unbiased estimate of 8. n n
By the invariance property, the maximum likelihood estimate of 82 is a 1 = a2 = · · · =an= I/n. Hence the unbiased linear estimate ofµ with smallest
T 2 =: X 2 /n 2 , with expected value vanance is
E(T 2 ) = E(X 2 )/n 2 = [var(X) + E(X) 2 ]/n 2 '
D
by (5.2.3). Since var(X) = n8(1 - 8), we have
EXAMPLE 11.7.4. Let X be the number of successes before the first failure in
E(T 2 ) = [n8(1 - 8) + n2 82 ]/n 2 = () 2 + ()(l - O). independent trials with success probability 8. Define T(x) = 0 for x = 0, and
n T(x) = 1 for x;;::: 1. Show that Tis the unique unbiased estimate of e.
2
Hence T is not an unbiased estimate of 8 • The bias is 2
SOLUTION. The distribution of X is geometric:
8(1-
E(T2) - 02 = - - for x = 0, 1, 2, ....
n
The expected value of T is
which is positive for 0 < 8 < 1, and tends to zero as n-+ oo.
E(T)=O·P(T=O )+ 1·P(T=1)
EXAMPLE 11. 7.2. If X 1 , X 2 , ... , X n are independent Poisson variates with
meanµ, the maximum likelihood estimate ofµ is X = LXdn (Example 9.1.2). = P(X;;::: 1) = 1 - f(O) = ().
Since E(Xi) = µ, we have
Hence T is an unbiased estimate of ().
- 1 1 Now suppose that T' is another unbiased estimate, and define
E(X) = -LE(Xi) = -(nµ) = µ, U(x) = T'(x) - T(x) for x = 0, 1, .... Then
n n
and hence X is an unbiased estimate of µ. E(U) = E(T') - E(T) = 8 - 8 = 0 for all e.
By the invariance property, the maximum likelihood estimate of f3 = e -µis Also by (5.1.3) we have
e-x. Since T= I:Xi has a Poisson distribution with mean nµ (corollary to
Example 4.6.1), the expected value of e-x is E(U) = LU(x)OX(l - ()).

E(e-Tfn) = I"" e-t/n(nµ)'e-nµ/t! = e-nµL(nµe-11•)'/t! It follows that


t=O
U(O) + U(l)O + U(2)82 + U(3)8 3 + ·· · = 0
= e-nµ. enµe- 1 !n = e-Mn-ne~ 1 11 = pn(l-e-''J.
for all values of Obetween 0 and 1. This will be true if and only if U(O) = U(l) =
Hence e-x is not an unbiased estimate of /J. The bias is U(2) = · · · = 0. Hence T(x) = T'(x) for x = 0, 1, 2, ... , and T is the unique
pn(l-e-•I•) _ /J unbiased estimate of 8. D
which is always positive, and tends to zero as n-+ oo.
EXAMPLE 11. 7.3. Suppose that X 1 , X 2 , ... , X n are independent variates with Discussion
the same mean µ and variance <1 2 • A linear estimate of µ is a linear
combination of the X/s, T= La;Xi, where the a;'s are constants. Show that Examples 11. 7.1 and 11. 7.2 show that the criterion of unbiasedness is not
the sample mean X is the unique MVU linear estimate ofµ. invariant under one-to-one parameter changes. If Tis an unbiased estimate
of 8, then g(T) will generally not be an unbiased estimate of g(8) unless g is a
SOLUTION. By (5.5.5) and (5.5.7), the mean and variance of Tare linear transformation. It is not possible to require both unbiasedness and
invariance.
E(T) =µLa,; If an invariant estimation procedure is used, it does not matter whether we
Thus T is unbiased for µ if and only if La;= 1. Now it is easy to shJw use e, 1/8, or some other one-to-one function of e to label the distributions
132
11. Frequenc y Propertie s t:.7. Bias
133
which make up the probability model. Often the choice . of parame
ter is
largely arbitrary, and therefore invariance would seem to be a
highly
desirable property (see Section 9.6). However, if unbiased estimate
s are
required , a nonlinear parame ter transformation completely changes Plot the bias, and show that it tends to zero as 8-+ 0 and as 8-+
the 1.
estimation problem . · 4 Suppose that Y has a binomia l (n, 8) distribut ion. Show that
Since maximum likelihood estimates are invariant under one-to-o T= Y(Y - I) is an
ne n(n - I)
parame ter transformations, they will generally be biased. Usually the unbiased estimate of µ 1, and more generally , that y<k>; n<k> is an unbiased
bias is estimate of
small and tends to zero as the number of independent observations µk for k = 1, 2, ....
per
unknown parame ter increases. 5. Let YI> Y1 , . .. , Y,, be independ ent . Poisson variates with mean
µ, and define
The only way to achieve unbiasedness in Example 11.7.4 is to estimate
P(success) by l or 0 according to whether the first trial gives success or
S =Y + Y2 + ··· + Y,,.
1 Show that s<kl;n<kJ is an unbiased estimate of µk for
failure. k= I, 2, .. ..
This is not a very sensible estimation procedure. It ignores information
that 6. Suppose that X 1 , X 2 , .•• , X" are independ ent variates with the
might be gained in trials beyond the first one, and it either overestimates same meanµ and
or variance u 2 . Show that
underestimates 8 in every particul ar application. This seems a high
price to
pay in order to achieve the correct long-run average value in a
series of
repetitions that will never take place! and hence verify that
It may be sensible to require an estimate with small bias in some situatio
ns.
However, the requirement of unbiasedness is too strong, and as Exampl =_
e s2 1 _ :E(X; - X)2
11.7.4 shows, this requirement may eliminate all "sensible" estimati n-1
on
procedures. The choice of statistical terminology is rather unfortun is an unbiased estimate of u 2 • Is S an unbiased estimate of u?
ate. No
one likes to be accused of bias, but in parame ter estimation, a little bias
may 7.tSuppo se that X 1 , X 2 , •.. , X" are independ ent variates with the
be a good thing. same meanµ but
with different variances ui, uL ... , u;. Find the minimum variance
There is an extensive literature on the theory of unbiased estimati linear unbiased
on. estimate of µ.
Although this is of some mathematical interest, it does not seem
terribly
relevant from a practical point of view.
The suggestion that estimates should be chosen to minimize mean squared
error or the variance of the sampling distribution also deserves
some
comment. The variance or mean squared error of an estimate in
a hypo-
thetical series of repetitions does not necessarily indicate the precisio
n of the
estimate in any particular application. In maximum likelihood estimati
on it
is the observed information f(B), and not the variance or mean squared
error
of 8, which measures precision. There are special situations where
· the
variance of the sampling distribution is an appropr iate measure of precisio
n,
but this is not true in general.

PROBLE MS FOR SECTION 11. 7


l. Let T be an estimate of 8. Show that the mean squared error
of Tis equal to its
variance plus the square of its bias.
2.tSupp ose that Y -N(µ, 1), and consider the following estrmate
s of µ 2 :
=
T1 yi; T1 = y2 - 1.
Show that T1 is unbiased and has smaller mean squared error than
T1 • Why would
T1 be unsatisfa ctory as an estimate of µ 1?
3. In Example 11.7.4, show that the bias of the maximum likelihoo
d estimate Bis
12.1. Introduction
135

CHAPTER 12 true and then check whether this assumpt ion leads to an inconsistency. If a
contradiction is found, the hypothesis is disproved. If no contradi ction is
found, the method of proof fails and the hypothesis could be either true or
Tests of Significance false.
For instance, to prove by contradiction that there is no largest prime
number, we first formulate the hypothesis
H: there is a largest prime number
which is the opposite to what we want to prove. Assuming H to be true, there
are finitely many prime numbers p 1 < p 2 < · .. < Pn· If this is so, every
number larger than p. is composite, and is divisible by at least one of
P1.P2, ... ,p• . However, p=l+p 1 p 2 ... p. is larger than Pn and is
not
divisible by any of the p;'s. This is a contradiction, and therefore H is false.
In a mathematical proof by contradi ction, we look for a logical inconsis-
tency, but in statistical applications there will rarely be a logical inconsistency
between data and hypothesis. Even if we observed 100 heads in 100 tosses of a
A test of significance is a procedure for evaluating the strength of the evidence coin, we could not prove mathematically that the coin was biased, because
,.,: . -
provided by the data against an hypothesis. Section 1 gives a general this result could have arisen from ·100 tosses of a balanced coin. Nevertheless,
,.,..
introduc tion to significance tests and their interpretation, and defines test we would be quite sure that the coin was biased, because the probabil ity of
statistics and significance levels. obtaining such an extreme result with a balanced coin is extremely small.
In many applications, the hypothesis of interest can be formulated as an In a significance test, we compute the probability of observing such an
hypothesis concerning the values of unknown parameters in the probability extreme result when the hypothesis is true. The smaller the probability, the
model. It is then possible to derive a test statistic, called the likelihood ratio stronger the evidence that the hypothesis is false.
statistic, from the log likelihood function . Likelihood ratio tests are described
in Sections 2 and 3. EXAMPLE 12.1.l. Let X be the number of heads in 100 tosses of a coin. We
Sections 4, 5, 6, and 8 give applications of significance tests to examples assume that tosses are independent, and that 8, the probability of heads, is the
involving frequency data, where the basic model for the experiment is same at all trials. We observe a value of X, and we wish to test the hypothesis
binomial or multinomial. In particular, Section 5 discusses goodness of fit H : B=t. ·
tests for multinomial data, and Section 6 describes tests for independence in Under the hypothesis, X has a binomial distribut ion with probability
contingency tables. Section 7 is conrerned with the importan ce of controlled function
experiments and randomi zation in establishing cause and effect.
Significance intervals or regions are defined in Section 9, and their coverage f(x)= ('~)m100 for x = 0, 1, ... , 100.
probabilities are determined. Also, the connection between significance
intervals and likelihood regions is investigated. If H is true we expect to observe a value of X near 50. The quantity
The power of a test statistic against an alternative hypothesis is defined in D =IX - 501 measures how closely the observation agrees with the hypothesis.
Section 10. Power is sometimes useful in a theoretical comparison of two or If Dis close to 0, then Xis in good agreement with H : 8 =!-.A large value of
more possible test statistics, or in selecting the sample size for an experiment. D
indicates poor agreement between the data and hypothesis.
Suppose that we observe X = 35, so that the observed value of D is
\35 - 50\ = 15. The probability of getting such poor agreement with H (i.e. such
a large value of D) is
12.1. Introd uction
I P{D;;:;: 15} = P{\X - 501;;:;: 15}::::; 0.0027
A test of significance is a procedure for measuring the strength of the evidence (see below). If H were true, a result as extreme as X = 35 would occur very
provided by the data against an hypothesis H. It is similar to a proof by rarely. Thus we have strong evidence that H is false and the coin is biased.
contradi ction in mathematics. In each case we assume the hypothesis to be On the other hand, if we observe X = 45, the observed value of D is
136 12. Tests of Significance 12.1. Introduction 137
145 501 = 5, and the probability of such poor agreement with H is
the hypothesis were true, and we have evidence that His false. The smaller the
P{D;:::5}= P{IX 501:?::5}~0.32. significance level, the stronger the evidence against the hypothesis. A large SL
indicates only a lack of evidence against the hypothesis. Even a significance
Results as extreme as X = 45 would occur fairly often with a balanced coin,
level of 90% or 100% does not imply that the hypothesis is "probably true".
and we do not have evidence that His false. We have not shown that His true
The probability statement refers to the data, not the hypothesis.
either! There are plenty of other values of 8, such as 8 = 0.45, which could
Conventionally, 0.05 is taken to be the dividing line between "small" and
have produced X = 45. A large probability means simply that no contradic-
"large" significance levels. If SL s 0.05, the hypothesis is said to be con-
tion has been found. The method of proof fails, and H could be either true or
tradicted by the data (at the 5% level), whereas if SL> 0.05, the hypothesis is
false.
said to be consistent or compatible with the data (at the 5% level). Of course,
The above probabilities could be calculated exactly by summing f(x) over
this convention should not be taken too seriously. Significance levels 0.049
the appropriate values of X. Instead, the normal approximation to the
and 0.051 are on opposite sides of 0.05, but they imply about the same
binomial distribution was used (see Section 6.8). Under H: p = t we have
strength of evidence against the hypothesis.
X ~bin (100, t) ~ N(50, 25),
EXAMPLE 12.1.2 (Test for ESP). Consider a possible experiment for detecting
so that (X - 50)/5 is approximately N(O, 1). Thus
ESP (extra-sensory perception) in a human subject. Four cards labeled A, B,
C, and D are shuffled and placed face down on a table. The subject attempts
P{IX - 5012 15} = P{IX ~ 50 12 3} ~ P{IZI 2 3} to match the hidden letters to envelopes marked a, b, c, and d, and the number
of correct matches is recorded. The experiment is to be repeated 50 times
where Z ~ N(O, 1). Now Table B2 gives altogether.
P{IZI;::: 3} = 2(1 - 0.998650) = 0.0027. Even if the subject has no special powers, some correct matches will occur
by chance. A subject with ESP should be able to achieve more correct
matches than would occur by chance alone. To determine whether there is
evidence for ESP, we compare the results obtained with what would be
Test Statistics and Significance Levels expected under H, the hypothesis that the subject has no ESP. If the observed
results are in reasonable agreement with H, then we cannot claim to have
For a test of significance we require a ranking of possible outcomes according
proof of ESP.
to how closely they agree with the hypothesis. This ranking is usually
Let T denote the total number of correct guesses in 50repetitions. We shall
specified by defining a test statistic D, also called the test criterion or
show below that, under the hypothesis of no ESP, T has approximately a
discrepancy measure. A small value of D shows close agreement between the normal distribution with mean 50 and variance 50. Large values of Twill be
outcome and the hypothesis, and a large value of D indicates poor agreement. interpreted as evidence against H and in favor of ESP, so we take the test
The test statistic is to be chosen before the data are examined, and the
choice will reflect the type of dep~rture from the hypothesis that we wish to
=
statistic to be D T. The significance level is then
detect. A general method for constructing test statistics from the likelihood
function will be described in the next two sections. Power comparisons may
help in choosing among several possible test statistics (see Section 12.10).
When the experiment has been performed and data have been obtained, we where Z has a standardized normal distribution.
can compute the observed value of D. Then, assuming H to be true, we For instance, suppose that such an experiment produced the following
compute the probability of obtaining a value of D at least as great as that data:
observed. This probability is called the significance level (SL), or P-value, of
No. of correct matches 0 2 4 Total
the data in relation to the hypothesis: Frequency observed 17 18 9 6 50
SL= P{D 2 DobslH is true}.
The total number of correct matches is
The significance level is the probability of observing such poor agreement
between the hypothesis and data if the hypothesis is true. 7;,bs = 0 X 17 + 1 X 18 + 2 X 9+4 X 6 = 60,
If SL is very small, then such poor agreement would almost never occur if and hence the significance level is
138 12. Tests of Significance
12.1. Introduction 139

P(T?:. 60) ~ P(Z?:. 1.41) = 0.079. ·


probability that all four are correctly matched y times is
If the subject has no ESP, there is about an 8% chance of getting such a large
number of correct guesses. Thus we cannot claim to have proved that the g(y)= ( 5:)C~XG!Yo~y for y=O, 1, ... , 50.
subject has ESP. Nevertheless the results are encouraging, and one might
wish to collect additional data for this subject. The probability of at least 6 correct matches is
To complete the example, we show that the distribution of T und~r the
hypothesis is approximately normal with mean 50 and variance 50. Let X; be P(Yc6)=1 g(O)-g(l)- ··· -g(5)=.0.017.
the number of correct guesses in the ith repetition, so that ,, This looks like fairly strong evidence that the subject has ESP.
Of course, since the test was performed because we had noticed a large
T=.X 1 +X 2 + ··· +Xso·
value of Y, we should not be surprised that a small significance level was
Since the cards are randomly rearranged in each repetition, the X/s are obtained! One can find something unusual about almost any set of data, and
independent and identically distributed under H, and the distribution of Xi is then devise a test to produce a small significance level. In such situations, a
as follows: small significance level proves nothing.
For a valid statistical proof, one must specify the type of discrepancy being
x =Number correct 0 1 2 4 Total sought before the data are examined. Tests suggested by the data may be
f (x) = Probability 9/24 8/24 6/24 1/24 1 useful for indicating questions to be investigated in future experiments, but
they do not "prove" anything by themselves.
(see Problem 1.3.1). The mean and variance of this distribution are
E(X 1) = I:xf(x) = l;

var(X;) = !:(x - 1) 2f(x) = 1. Detection Versus Estimation


The mean of T is then !:E(X;) = 50. Since the X/s are independent, the A small significance level shows that there is a "real" departure from the
variance of T is !: var (X;) = 50. The distribution of T is approximately hypothesis; that is, a departure which cannot readily be explained by chance.
normal by the Central Limit Theorem. A small significance level does not mean that the departure is necessarily
large or important. With a large amount of data, very small departures of no
Nate. All that we needed in this example was the distribution of T under the practical importance may be detected by the test. With a small sample, large
hypothesis Hof no ESP. We didn't need to know what the distribution of T and important departures may go undetected.
would be if H were false. A model which incorporates the possibility of ESP is Merely reporting a small significance level is not enough. We also need to
likely to be quite complex. The test of significance tells us that so far we don't describe the type of departure observed and estimate its size. For instance, in
have conclusive evidence that ESP even exists, so we'd probably be wasting Example 12.1.1, one should not report just the significance level from the test
our time if we tried to model it at this stage. One of the most important uses of H: 8=}. One should also give the MLE and a likelihood or confidence
significance tests is in helping us to avoid wasting time with complicated interval for e.
models when a simple one will do.

PROBLEMS FOR SECTION 12.1


Tests Suggested by the Data
LtOf 100 peas planted in a genetics experiment, 65 produced tall plants and 35
Examination of a set of data will often reveal an interesting pattern which had produced short plants. According to genetic theory, plants are independent and
the probability of a tall plant is !. Carry out a test of significance to investigate
not been anticipated before the experiment was performed. There is then a
whether the theory is consistent with the data.
temptation to design a test of significance which will "prove" that this pattern
could not have arisen by chance. 2. In a particular Ontario county, a very large number of people are eligible for jury
For example, upon examining the data in Example 12.1.2, we see that all duty, and half of these are women. The judge is supposed to prepare a jury list by
four cards were correctly matched in 6 cases out of 50. Under the hypothesis randomly selecting individuals from all those eligible. In an important 1974
murder trial, the jury list of 82 people contained 58 men and 24 women. Could
of no ESP, the probability of matching all four is only 1/24, and the
such an extreme imbalance in the sexes reasonably have occurred by chance?
140
12. Tests of Significance 12.2. Likelih ood Ratio Tests for Simple Hypoth
eses 141
3.tin research on drugs to count eract the intoxi
cating effect of alcohol, twenty (a) Suppose that the blocks are placed in
subjects were used to comp are the relative merits a rando m order. Tabul ate the
of benzedrine and caffeine. Each probability function of X, the numb er of red blocks
subject received the drugs in a rando m order placed between the two
on two different occasions far
enoug h apart to eliminate carry- over effects. Benze green blocks, and find the mean and variance
drine broug ht about the more of X.
rapid recovery in 14 subjects, while caffeine was judged better in the other 6 cases. Are these results consistent with the hypothesis that the two drugs are equally good?

4. In January, party A won 53% of a very large number of votes in an election. Six months later, a poll of 200 randomly selected voters showed that only 48% would vote for party A if another election were called. Could these results reasonably be due to chance, or is there evidence of a real change in the support for party A?

5.† A seed merchant states that 80% of the seeds of a certain plant will germinate. Each of 4 customers buys one packet and sows 100 seeds from it. The numbers of seedlings appearing are 73, 76, 74, and 77.
(a) Discuss whether any customer, on the basis of his observation only, has adequate cause to claim that the stated germination rate is erroneous.
(b) If the 4 packets are from a homogeneous stock of seed, is the total germination record consistent with the stated rate?

6. (a) Each of 25 individuals was given two similar glasses, one of Pepsi and one of Coke, and was asked to pick the one he preferred. Sixty percent of them picked Coke. Is this result consistent with the hypothesis that there is no detectable difference between Pepsi and Coke?
(b) Repeat (a) if 250 individuals were tested and 60% of them picked Coke.

7. Determine the approximate significance level of each of the following observations in relation to the hypothesis that p, the probability of a male birth, is equal to 1/2. Find an approximate 95% confidence interval for p in each case.
(a) 293 girls and 299 boys in 592 births;
(b) 2930 girls and 2990 boys in 5920 births;
(c) 29300 girls and 29900 boys in 59200 births.

8. The table below summarizes the data from 100 repetitions of the ESP experiment described in Example 12.1.2.

No. of correct matches     0    1    2    4   Total
Observed frequency        28   33   31    8     100

Test whether the total number of correct matches is significantly greater than would be expected if cards were turned over in a random order.

9.† In an experiment studying the relationship between color perception and order, a psychologist asks young children to place 6 similar blocks in a row. Four of the blocks are red and two are green, but otherwise the blocks are identical. The number of red blocks between the two green blocks is recorded, and the observed frequencies in 100 repetitions of the experiment are as follows:

Number of red blocks     0    1    2    3    4   Total
Frequency observed      28   22   22   18   10     100

(b) Let X̄ = (X₁ + X₂ + ··· + X₁₀₀)/100 be the average number of red blocks between green blocks in 100 replications of the experiment. Is the observed X̄ significantly different from what one would expect under random placement?

10. In an experiment on human behavior, a sociologist asks four men and four women to enter a room and sit wherever they wish at a rectangular table. There are three chairs at each side of the table and one at each end. The two end seats are considered to be special in that people sitting there have more dominant positions at the table.
(a) Find the mean and variance of X, the number of men occupying end seats, under the assumption that people choose their seats at random.
(b) The seating experiment was repeated 84 times, and altogether there were 98 men in the end seats and only 70 women. Do these results differ significantly from what one would expect under random seating?

11.† Under normal conditions, the mean number of personal calls handled by a company switchboard was 7.2 per hour. The manager sent a letter to all employees requesting that the number of personal calls be reduced. During five one-hour periods the following week, the numbers of personal calls were 4, 2, 7, 5, and 3. Do these observations give strong evidence that the mean number of personal calls per hour has been reduced?

12. A food taster is given 12 samples of natural flavoring and 12 samples of synthetic flavoring in a random order. He is asked to identify the 12 samples of natural flavoring, and manages to get 8 of them right. Test the hypothesis that the taster is unable to distinguish between the two flavors.

13. An experiment is carried out to investigate whether subjects can tell the difference between butter and margarine. Each subject is blindfolded and receives two samples of butter and two of margarine in a random order. The subject is asked to identify the two samples of butter. The following table shows the number of correct butter identifications in 100 independent replications of the experiment.

Number correct        0    1    2   Total
Frequency observed   18   62   20     100

Altogether there were 102 correct identifications. Using a test statistic based on the total number correct, test the hypothesis that subjects are unable to distinguish between butter and margarine.

12.2. Likelihood Ratio Tests for Simple Hypotheses

In many applications, the hypothesis to be tested can be formulated as an hypothesis concerning the values of unknown parameters in the probability model. A test statistic D, called the likelihood ratio statistic, can then be

derived from the log likelihood function. A significance test in which the likelihood ratio statistic is used as test statistic is called a likelihood ratio test.

In this section we shall restrict the discussion to simple hypotheses. H is called a simple hypothesis if it specifies numerical values for all of the unknown parameters in the model. A simple hypothesis reduces the number of unknown parameters in the model to zero. A composite hypothesis is one which reduces the number of unknown parameters in the model, but not to zero. Thus, under a composite hypothesis, there will still be one or more parameters which require estimation from the data. Likelihood ratio tests for composite hypotheses will be discussed in the next section.

One-Parameter Case

First suppose that the probability model involves a single unknown parameter θ. We wish to test the hypothesis H: θ = θ₀, where θ₀ is a particular numerical value. For instance, in Example 12.1.1, θ was the success probability in Bernoulli trials, and the hypothesized value was θ₀ = 1/2. H is a simple hypothesis because it specifies a numerical value for the only unknown parameter.

Let l(θ) denote the log likelihood function and θ̂ the MLE of θ under the model. The maximum log likelihood under the model is l(θ̂). The log likelihood under the hypothesis is l(θ₀). The likelihood ratio statistic for testing H: θ = θ₀ is defined to be twice the difference between these two log likelihoods,

    D = 2[l(θ̂) − l(θ₀)] = −2r(θ₀),    (12.2.1)

where r(θ) is the log relative likelihood function of θ.

Since θ̂ maximizes l(θ), we have l(θ) ≤ l(θ̂) for all values of θ, and therefore D ≥ 0 (see Section 11.1). A small value of D means that the outcome of the experiment is such that θ₀ is a likely parameter value. A large value of D means that the outcome is such that θ₀ is unlikely. Thus D ranks possible outcomes of the experiment according to how well they agree with H: θ = θ₀.

Taking D as the test statistic, we have

    SL = P{D ≥ D_obs | H is true} = P{D ≥ D_obs | θ = θ₀},    (12.2.2)

where D_obs is the observed value of D. The significance level is calculated from the (sampling) distribution of the likelihood ratio statistic when θ = θ₀. If we imagine a series of repetitions of the experiment with θ fixed at θ₀, SL is the fraction of the time that the test statistic D would be greater than or equal to the observed value D_obs.

Upon comparing (12.2.2) with (11.2.2), we see that there will be a close connection between significance levels in likelihood ratio tests and coverage probabilities of likelihood intervals. The relationship between significance tests and likelihood/confidence intervals will be considered in Section 12.9.

In some simple examples, it is possible to derive the exact sampling distribution of D when θ = θ₀ (see Section 11.1). More often, derivation of the exact distribution is too difficult, and an approximation is used. We noted in Section 11.3 that, under suitable conditions, the distribution of D when θ = θ₀ is well approximated by a χ² distribution with one degree of freedom. When this approximation is applicable, we have

    SL = P{D ≥ D_obs | θ = θ₀} ≈ P{χ²(1) ≥ D_obs}.    (12.2.3)

We can find approximate significance levels by using Table B4, or by using (6.9.8) and Table B2. See Section 11.3 for a discussion of the conditions under which the χ² approximation applies, and for some examples in which the accuracy of this approximation is investigated.

EXAMPLE 12.2.1. Suppose that we observe X, the number of successes in n Bernoulli trials with success probability θ. We want to test the hypothesis H: θ = θ₀ where θ₀ is a particular numerical value such as 1/2.

The distribution of X is binomial (n, θ), and the log likelihood function of θ is

    l(θ) = x log θ + (n − x) log(1 − θ)

for 0 < θ < 1. The MLE is θ̂ = x/n, and the maximum log likelihood is

    l(θ̂) = x log(x/n) + (n − x) log((n − x)/n).

The log likelihood under H is

    l(θ₀) = x log θ₀ + (n − x) log(1 − θ₀),

and so the likelihood ratio statistic for testing H: θ = θ₀ is

    D = 2[l(θ̂) − l(θ₀)] = −2r(θ₀)
      = 2x log[x/(nθ₀)] + 2(n − x) log[(n − x)/(n(1 − θ₀))].

If n is large, the distribution of D is approximately χ² with one degree of freedom (see Example 11.3.1). If θ₀ is near 1/2, the approximation gives fairly accurate results for n = 20. However, a much larger value of n is needed when θ₀ is close to 0 or 1.

For instance, suppose that we observe X = 35 in n = 100 trials and wish to test H: θ = 1/2 as in Example 12.1.1. The likelihood ratio statistic for testing H: θ = 1/2 is

    D = 2x log(x/50) + 2(100 − x) log[(100 − x)/50],
and its observed value is

    D_obs = 70 log(35/50) + 130 log(65/50) = 9.14.

The significance level is

    SL = P{D ≥ 9.14 | θ = 1/2} ≈ P{χ²(1) ≥ 9.14} = 0.0025

from Table B4. If θ were equal to 1/2, a result as extreme as X = 35 would very rarely occur, and so there is strong evidence that θ ≠ 1/2.

Any other hypothesized value for θ can be tested in a similar way. For instance, the LR statistic for testing the hypothesis H: θ = 0.4 is

    D = 2x log(x/40) + 2(100 − x) log[(100 − x)/60],

and its observed value is

    D_obs = 70 log(35/40) + 130 log(65/60) = 1.06.

Table B4 gives

    SL ≈ P{χ²(1) ≥ 1.06} = 0.303,

so the hypothesis θ = 0.4 is compatible with the data.

To find the exact significance level, it is necessary to add up the binomial probabilities of all outcomes x such that D ≥ D_obs. Taking θ = 0.4, we find that D ≥ 1.06 for x ≤ 35 and for x ≥ 46. Thus we have

    SL = P{D ≥ D_obs} = P(X ≤ 35) + P(X ≥ 46) = 1 − P(36 ≤ X ≤ 45).

Under H: θ = 0.4, the distribution of X is binomial (100, 0.4), and therefore

    SL = 1 − Σ (x = 36 to 45) C(100, x)(0.4)^x (0.6)^(100 − x) = 0.311.

Similarly, the exact significance level in relation to the hypothesis θ = 1/2 is found to be

    SL = 1 − Σ (x = 36 to 64) C(100, x)(0.5)^x (0.5)^(100 − x) = 0.0035.

There is good agreement between the approximate and exact results.

Two or More Parameters

Suppose that the probability model depends on a vector of unknown parameters θ. Let θ₀ be a vector of numerical values, one for each component of θ. Then H: θ = θ₀ is a simple hypothesis because it specifies a numerical value for each of the unknown parameters.

The likelihood ratio statistic for testing H: θ = θ₀ is twice the difference between the maximum log likelihood and the log likelihood under H. Thus we have

    D = 2[l(θ̂) − l(θ₀)] = −2r(θ₀),

where now θ̂ is a vector of MLE's and r(θ) is the joint log relative likelihood function. The exact significance level in the likelihood ratio test of H: θ = θ₀ is again given by (12.2.2).

It can be shown that, under conditions similar to those described in Section 11.3, the distribution of D when θ = θ₀ can be approximated by a χ² distribution. The degrees of freedom for the χ² approximation is equal to the number of functionally independent unknown parameters in the model (see the example below). It follows that

    SL = P{D ≥ D_obs | θ = θ₀} ≈ P{χ²(k) ≥ D_obs},    (12.2.4)

where k is the number of functionally independent unknown parameters in the model.

The case of two unknown parameters, θ = (α, β), was considered previously in Section 11.5. The likelihood ratio statistic for testing H: α = α₀ and β = β₀ is

    D = 2[l(α̂, β̂) − l(α₀, β₀)] = −2r(α₀, β₀).

We noted in Section 11.5 that the distribution of D is approximately χ²(2). D has exactly a χ²(2) distribution in the normal distribution Examples 11.5.1 and 10.1.1.

EXAMPLE 12.2.2. The following are the observed frequencies of the six faces in 100 rolls of a die from Example 1.4.1:

Face j          1    2    3    4    5    6   Total
Obs. freq. fj  16   15   14   20   22   13     100

Are these observations consistent with the hypothesis that the die is balanced?

SOLUTION. Assuming that rolls of the die are independent, the distribution of the fj's is multinomial, with joint probability function

    [n!/(f₁! f₂! ··· f₆!)] p₁^f₁ p₂^f₂ ··· p₆^f₆,

where Σpⱼ = 1 and Σfⱼ = n = 100. The hypothesis to be tested is

    H: p₁ = p₂ = ··· = p₆ = 1/6.

This is a simple hypothesis because it assigns a numerical value to each of the unknown parameters p₁, p₂, ..., p₆.

The log likelihood function is

    l(p₁, p₂, ..., p₆) = Σfⱼ log pⱼ,

where the pⱼ's are non-negative and Σpⱼ = 1. It can be shown that the MLE's are given by p̂ⱼ = fⱼ/n (see Section 12.5). Hence the maximum log likelihood is

    l(p̂₁, p̂₂, ..., p̂₆) = Σfⱼ log p̂ⱼ = Σfⱼ log(fⱼ/n).

Under H, the log likelihood is

    l(1/6, 1/6, ..., 1/6) = Σfⱼ log(1/6).

The likelihood ratio statistic for testing H is

    D = 2[l(p̂₁, p̂₂, ..., p̂₆) − l(1/6, 1/6, ..., 1/6)].

D is large whenever the fⱼ's are such that the set of hypothesized parameter values {1/6, 1/6, ..., 1/6} is unlikely.

Substituting the observed fⱼ's from above gives

    D_obs = 2[−177.326 + 179.176] = 3.70.

By (12.2.4), the significance level is

    SL = P{D ≥ 3.70 | H is true} ≈ P{χ²(k) ≥ 3.70},

where k is the number of functionally independent parameters in the model. There are six unknown parameters p₁, p₂, ..., p₆. However, since Σpⱼ = 1, these are not functionally independent. Only five of the pⱼ's are free to vary, and then the sixth is determined by the condition Σpⱼ = 1. Thus there are just five functionally independent parameters, and the χ² approximation will have k = 5 degrees of freedom.

Table B4 now gives

    SL ≈ P{χ²(5) ≥ 3.70} ≥ 0.5.

The observed value of D is certainly not unusually large, and hence there is no evidence against the hypothesis that the die is balanced.

The exact significance level is a sum of multinomial probabilities:

    SL = P{D ≥ 3.70 | H is true} = Σ [100!/(f₁! f₂! ··· f₆!)](1/6)^100.

The sum is taken over all sets of frequencies {fⱼ} with Σfⱼ = 100 such that D ≥ 3.70. Much arithmetic is needed to determine the appropriate sets of frequencies {fⱼ}, although the calculations are certainly feasible on a high-speed computer.

Alternatively, one could simulate the experiment a large number of times on a computer and determine the fraction of the time that D is greater than or equal to 3.70. This gives an estimate of SL which can be made as precise as desired by increasing the number of simulations.

Neither of these procedures is really necessary in this example. The χ² approximation is quite accurate, and the conclusion will not be affected by a small change in the computed value of the significance level.

PROBLEMS FOR SECTION 12.2

1.† Let θ be the probability of a tall plant in the genetics experiment of Problem 12.1.1. Perform an approximate likelihood ratio test of the hypothesis θ = 3/4.

2. Suppose that X = 5 successes are observed in n = 10 Bernoulli trials with success probability θ. Perform an exact likelihood ratio test of the hypothesis θ = 3/4. Would the same significance level be obtained if |X − 7.5| were used as the test statistic?

3. Consider a sequence of Bernoulli trials with success probability θ. Trials are continued until the 5th success has occurred, and it is observed that altogether 10 trials are needed. Carry out an exact likelihood ratio test of the hypothesis θ = 3/4.

4. The seating experiment of Problem 12.1.10 is repeated 28 times using new subjects each time. The following table shows the numbers of times that the two end seats were occupied by two men, by two women, and by a man and a woman:

Occupants of end seats    MM   FF   MF or FM
Frequency observed        10    4         14

Test the hypothesis that the probabilities for the three classes are 3/14, 3/14, and 8/14.

5.† Seeds from a variety of pea plant are classified as round or angular, and as green or yellow, so that there are four possible seed types: RY, RG, AY, and AG. The following are the observed frequencies of the four types in 556 seeds:

Pea type     RY    RG    AY   AG
Frequency   315   108   101   32

Test the hypothesis that the probabilities of the four types are 9/16, 3/16, 3/16, and 1/16, respectively, as predicted by Mendelian theory.

6. In a long-term study of heart disease in a large group of men, it was noted that 65 men who had no previous record of heart problems died suddenly of heart attacks. The following table shows the number of such deaths recorded on each day of the week.

Day of week     Mon.   Tues.   Wed.   Thurs.   Fri.   Sat.   Sun.
No. of deaths     22       7      6       13      5      4      6

Test the significance of these data in relation to the hypothesis that deaths are equally likely to occur on any day of the week.
7. (a) Let X₁, X₂, ..., Xₙ be IID Poisson variates with mean µ. Derive the likelihood ratio statistic for testing H: µ = µ₀.
(b) Prior to the installation of a traffic signal, there were 6 accidents per month (on the average) at a busy intersection. In the first year following the installation there were 53 accidents. Using an approximate likelihood ratio test, determine whether there is evidence that the accident rate has changed.

8.† (a) Let X₁, X₂, ..., Xₙ be independent exponential variates with mean θ. Derive the likelihood ratio statistic for testing H: θ = θ₀.
(b) Survival times for patients treated for a certain disease may be assumed to be exponentially distributed. Under the standard treatment, the expected survival is 37.4 months. Ten patients receiving a new treatment survived for the following times (in months):

    99    8   30    6   53
    60   44   12  105   17

(i) Are these data consistent with a mean survival time of 37.4 months?
(ii) The doctor who developed the new treatment claims that it gives a 50% increase in mean survival time. Are the data consistent with this claim?
(iii) Obtain a likelihood interval which is an approximate 95% confidence interval for the mean survival time under the new treatment.

9. (a) Let X₁, X₂, ..., Xₙ be IID normal variates with known standard deviation σ and unknown mean µ. Derive the likelihood ratio statistic for testing the hypothesis H: µ = µ₀.
(b) The measurement errors associated with a set of scales are independent normal with known standard deviation σ = 1.3 grams. Ten weighings of an unknown mass µ give the following results (in grams):

    227.1   226.8   224.8   228.2   225.6
    229.7   228.4   228.8   225.9   229.6

(i) Perform likelihood ratio tests of the hypothesis µ = 226, and the hypothesis µ = 229.
(ii) For which parameter values µ₀ does a likelihood ratio test of H: µ = µ₀ give a significance level of 5% or more?

10. Let X₁, X₂, ..., Xₙ be independent normal variates with known variances v₁, v₂, ..., vₙ and the same unknown mean µ. Show that the likelihood ratio statistic for testing H: µ = µ₀ is

    D = (µ̂ − µ₀)² Σvᵢ⁻¹,

where µ̂ = (ΣXᵢvᵢ⁻¹)/Σvᵢ⁻¹. Show that, if H is true, the distribution of D is exactly χ²(1).

12.3. Likelihood Ratio Tests for Composite Hypotheses

In this section we extend the discussion of likelihood ratio tests to include composite hypotheses as well as simple hypotheses.

Suppose that the basic probability model for the experiment depends upon a vector of unknown parameters θ, and consider an hypothesis H concerning the value of θ. Together, the basic model and hypothesis determine the hypothesized model.

Let k denote the number of functionally independent unknown parameters in the basic probability model, and let q denote the number of functionally independent unknown parameters which remain in the hypothesized model. In general, it is not possible to test an hypothesis H unless it produces a real simplification in the model, so that q < k.

A simple hypothesis specifies numerical values for all of the unknown parameters in the basic probability model. Thus there are no unknown parameters in the hypothesized model, and so q = 0 for a simple hypothesis. A composite hypothesis does not completely eliminate the unknown parameters, and so q > 0 for a composite hypothesis.

Let l(θ) denote the log likelihood function of θ under the basic model. Let θ̂ be the MLE under the basic model, so that l(θ̂) ≥ l(θ) for all possible values of θ. The maximum log likelihood under the basic model is l(θ̂).

Next let θ̃ denote the MLE of θ under the hypothesized model. The maximum log likelihood under the hypothesis is l(θ̃). Since l(θ̂) ≥ l(θ) for all possible values of θ, we have l(θ̂) ≥ l(θ̃). The restricted maximum of l(θ) under the hypothesis cannot exceed the unrestricted maximum of l(θ).

The likelihood ratio statistic for testing the hypothesis H is defined to be twice the difference between these two maximum log likelihoods,

    D = 2[l(θ̂) − l(θ̃)].    (12.3.1)

Note that D is twice the natural logarithm of a ratio of likelihoods,

    D = 2 log[L(θ̂)/L(θ̃)],

and this explains its name.

Since l(θ̂) ≥ l(θ̃), D is non-negative. If D is small, then the maximum probability of the data is nearly as great under the hypothesis as it is under the basic model, and therefore the data are in good agreement with the hypothesis. A large value of D means that the data are much less probable under the hypothesis, and therefore the agreement is poor. Thus D ranks possible outcomes of the experiment according to how closely they agree with the hypothesis.

A simple hypothesis has the form H: θ = θ₀, where θ₀ is a vector of numerical values. Under H there is only one possible parameter value, θ₀. Thus we have θ̃ = θ₀, and the maximum log likelihood under H is l(θ₀). Hence (12.3.1) is the same as (12.2.1) when H is a simple hypothesis.
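Since (12.3.1) reduces to (12.2.1) when the hypothesis is simple, the computations of Section 12.2 are a special case of this general recipe, and they are easy to carry out numerically. The following sketch is not part of the original text; it assumes that Python with the NumPy and SciPy libraries is available. It redoes the test of Example 12.2.1, where x = 35 successes were observed in n = 100 Bernoulli trials and H: θ = 0.5, computing the statistic (12.2.1), the χ²(1) approximation (12.2.3) to the significance level, and the exact significance level obtained by summing binomial probabilities over all outcomes at least as extreme as the one observed.

    from scipy.stats import binom, chi2
    from scipy.special import xlogy

    n, x_obs, theta0 = 100, 35, 0.5

    def loglik(x, theta):
        # Binomial log likelihood of theta (additive constant omitted);
        # xlogy applies the 0*log(0) = 0 convention at x = 0 and x = n.
        return xlogy(x, theta) + xlogy(n - x, 1 - theta)

    def lr_stat(x):
        # Likelihood ratio statistic (12.2.1): twice the difference between
        # the maximum log likelihood, attained at theta_hat = x/n, and the
        # log likelihood under the hypothesis theta = theta0.
        return 2 * (loglik(x, x / n) - loglik(x, theta0))

    D_obs = lr_stat(x_obs)                      # about 9.14

    # Approximate significance level from the chi-square(1) approximation (12.2.3).
    SL_approx = chi2.sf(D_obs, df=1)            # about 0.0025

    # Exact significance level: add the binomial(n, theta0) probabilities of
    # every outcome whose statistic is at least as large as the observed one.
    SL_exact = sum(binom.pmf(x, n, theta0)
                   for x in range(n + 1) if lr_stat(x) >= D_obs)

    print(D_obs, SL_approx, SL_exact)           # roughly 9.14, 0.0025, 0.0035

For a composite hypothesis the only additional step is to find the restricted maximum l(θ̃), usually by numerical maximization over the q parameters that remain free under H; the question of the appropriate degrees of freedom is taken up next.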

Calculation of the Significance Level

The significance level in a likelihood ratio test of the hypothesis H is given by

    SL = P{D ≥ D_obs | H is true},

where D is the likelihood ratio statistic for testing H, and D_obs is the observed value of D.

Calculation of the exact significance level is possible in some examples, but in general there are both theoretical and computational difficulties. If H is composite, the exact significance level may well depend on the values of the q unknown parameters in the hypothesized model. Sometimes this problem can be avoided by using a suitable conditional distribution to calculate the significance level, but then the calculations required may become unmanageable. See Chapter 15 for further discussion of conditional tests.

Usually it is satisfactory to calculate an approximate significance level using the χ² approximation to the distribution of the likelihood ratio statistic D. It can be shown that, under conditions similar to those described in Section 11.3, the distribution of D when H is true is approximately χ² with k − q degrees of freedom. When this approximation applies, we have

    SL ≈ P{χ²(k − q) ≥ D_obs},    (12.3.2)

which can be evaluated using Table B4.

The χ² approximation will generally be quite accurate whenever the number of independent observations in the experiment is large in comparison with k, the number of parameters in the basic model. It is unwise to trust (12.3.2) whenever θ̂ or θ̃ is on or near the boundary of the parameter space.

Note that the degrees of freedom for the χ² approximation is equal to k − q, where k and q are the numbers of functionally independent unknown parameters in the basic model and hypothesized model, respectively. Thus the degrees of freedom for testing H is equal to the number of unknown parameters which are eliminated by H.

To conclude this section, we give two examples of likelihood ratio tests for composite hypotheses. Many additional examples will be found in the following sections.

Testing H: β = β₀ when α is Unknown

Suppose that the probability model involves two unknown parameters, θ = (α, β), so that k = 2. Consider the hypothesis H: β = β₀ where β₀ is a particular numerical value. This is a composite hypothesis because no value is given for α. The hypothesized model involves the unknown parameter α, so that q = 1.

Let l(α, β) be the joint log likelihood function of α and β under the model. Let S₁ and S₂ be the two components of the score function as in Section 10.1. Usually we can find the MLE's α̂ and β̂ by solving the simultaneous equations

    S₁(α, β) = 0;   S₂(α, β) = 0.

We can find α̃(β₀), the MLE of α given that β = β₀, by solving the equation

    S₁(α, β₀) = 0.

The maximum log likelihood under the model is l(α̂, β̂). The maximum log likelihood under the hypothesis H: β = β₀ is l(α̃(β₀), β₀). Hence the likelihood ratio statistic for testing H: β = β₀ is

    D = 2[l(α̂, β̂) − l(α̃(β₀), β₀)].

Note that, by (10.3.1),

    D = −2 r_max(β₀),

where r_max(β) is the maximum log relative likelihood function of β. We considered this likelihood ratio statistic in Section 11.5, and noted that its distribution when β = β₀ is approximately χ² with k − q = 1 degrees of freedom. Thus we have

    SL = P{D ≥ D_obs | β = β₀} ≈ P{χ²(1) ≥ D_obs}.

There is one degree of freedom for testing H: β = β₀, because it reduces the number of unknown parameters by one.

EXAMPLE 12.3.1. In Example 10.1.2 we considered the lifetimes x₁, x₂, ..., xₙ of n = 23 deep-groove ball bearings. These were assumed to be independent observations from a Weibull distribution with probability density function

    f(x) = λβx^(β−1) exp(−λx^β)   for 0 < x < ∞.

There are two unknown parameters, λ > 0 and β > 0.

We noted in Example 10.2.2 that the value β = 1 is of special importance, because when β = 1 the Weibull distribution simplifies to an exponential distribution. Under an exponential distribution model, there is a constant risk of failure, and no deterioration or improvement with age. Thus we wish to know whether the 23 observed lifetimes are consistent with the hypothesis β = 1.

To test H: β = 1, we shall compute the observed value of the likelihood ratio statistic and then use the χ² approximation. Since H reduces the number of unknown parameters by one, there is one degree of freedom for the test.

From Example 10.1.2, the joint log likelihood function is

    l(λ, β) = n log λ + n log β + (β − 1)Σ log xᵢ − λΣxᵢ^β,

and the MLE's are λ̂ and β̂ = 2.1021. The maximum log likelihood under the model is

    l(λ̂, β̂) = −113.691.

Also from Example 10.1.2, the MLE of λ given β is

    λ̃(β) = n/Σxᵢ^β.

Thus the MLE of λ under the hypothesis β = 1 is

    λ̃(1) = n/Σxᵢ = 23/1661 = 0.01385,

and the maximum log likelihood under H: β = 1 is

    l(0.01385, 1) = −121.433.

The observed value of the likelihood ratio statistic for testing H: β = 1 is twice the difference between these maximum log likelihoods:

    D_obs = 2[−113.691 + 121.433] = 15.48.

This result could also have been obtained from the expression for r_max(β) in Example 10.3.2. The χ² approximation gives

    SL ≈ P{χ²(1) ≥ 15.48} < 0.001

from Table B4. There is very strong evidence against the hypothesis β = 1. The observations are not compatible with the simpler exponential distribution model.

Tests for Homogeneity

Suppose that two or more independent experiments give information about the same unknown parameter θ. If the experiments are in reasonable agreement with one another, we can pool or combine the information about θ by adding log likelihood functions (see Section 9.2 and Example 9.3.2). If, on the other hand, the experiments contradict one another, it would not be appropriate to combine them. Instead we would estimate θ separately for each experiment, and try to discover why the experiments produced dissimilar results.

Suppose that there are k independent experiments, and initially let us suppose that we have a different parameter θᵢ for each experiment. Let lᵢ(θᵢ) and θ̂ᵢ denote the log likelihood function and MLE for the ith experiment (i = 1, 2, ..., k). The overall log likelihood function is

    l(θ₁, θ₂, ..., θₖ) = l₁(θ₁) + l₂(θ₂) + ··· + lₖ(θₖ) = Σlᵢ(θᵢ),

and its maximum value is Σlᵢ(θ̂ᵢ).

Now consider the hypothesis of homogeneity,

    H: θ₁ = θ₂ = ··· = θₖ,

and let θ denote the unknown common value of the θᵢ's. Under H, the log likelihood function is Σlᵢ(θ), and we maximize this to obtain the combined or pooled MLE θ̃, say. The maximum of the log likelihood under H is Σlᵢ(θ̃), and by (12.3.1), the likelihood ratio statistic for testing H is

    D = 2[Σlᵢ(θ̂ᵢ) − Σlᵢ(θ̃)] = −2Σrᵢ(θ̃),

where rᵢ is the log RLF from the ith experiment:

    rᵢ(θ) = lᵢ(θ) − lᵢ(θ̂ᵢ).

If D is large, there is no parameter value which is reasonably plausible in all experiments, and hence the experiments give conflicting information about θ.

There are k − 1 degrees of freedom for testing H because it reduces the number of unknown parameters from k to 1. Hence (12.3.2) gives

    SL ≈ P{χ²(k − 1) ≥ D_obs}.

A small significance level is evidence that the homogeneity hypothesis is false, and that the information from the k experiments should not be pooled.

EXAMPLE 12.3.2. In Examples 9.2.2 and 9.3.2 we considered data from k = 2 experiments with test tubes containing river water. The parameter of interest is µ, the expected number of bacteria per ml of river water. For the first experiment the log RLF is

    r₁(µ) = −280µ + 12 log(1 − e^(−10µ)) + 24.43,

and for the second experiment we have

    r₂(µ) = −37µ + 3 log(1 − e^(−µ)) + 10.66.

The pooled MLE of µ based on the data from both experiments was found to be µ̃ = 0.04005. Hence the observed value of the likelihood ratio statistic for testing homogeneity is

    D_obs = −2[r₁(0.04005) + r₂(0.04005)] = 1.24.

There is just one degree of freedom for testing H, and thus

    SL ≈ P{χ²(1) ≥ 1.24} > 0.25.

There is no evidence against the homogeneity hypothesis, and it is reasonable to pool information about µ as we did in Example 9.3.2.

PROBLEMS FOR SECTION 12.3

1.† Suppose that X₁, X₂, and X₃ have a trinomial distribution with index n and probability parameters p₁, p₂, p₃ where Σpⱼ = 1. The log likelihood function is

    l(p₁, p₂, p₃) = ΣXⱼ log pⱼ,

and the observed values of the Xⱼ's are 32, 46, and 22 (see Problem 9.1.4).
(a) Find the maximum log likelihood when
(i) pⱼ is estimated as Xⱼ/n for j = 1, 2, 3;

(ii) the pⱼ's satisfy the hypothesis

    H: p₁ = θ², p₂ = 2θ(1 − θ), p₃ = (1 − θ)²;

(iii) the pⱼ's satisfy H and, in addition, θ = 1/2.
(b) Use the results from (i) and (ii) above to test the hypothesis H.
(c) Use the results from (ii) and (iii) to test whether θ = 1/2, assuming H to be true.
(d) Use the results from (i) and (iii) to test the hypothesis p₁ = 1/4, p₂ = 1/2, p₃ = 1/4. Note that the likelihood ratio statistic and degrees of freedom are the totals of those in (b) and (c).

2. A genetics experiment yields observations X₁, X₂, X₃, X₄ with a multinomial probability function depending on an unknown parameter p, where ΣXᵢ = n. The following are the results from three independent repetitions of the experiment:

Repetition 1    26    7    9   22
Repetition 2    24    9    9   22
Repetition 3    23    9   12   20

Test the hypothesis that the value of p is the same in all three repetitions.

3. (a) Let Y₁, Y₂, ..., Yₖ be independent Poisson variates with means µ₁, µ₂, ..., µₖ. Show that the likelihood ratio statistic for testing H: µ₁ = µ₂ = ··· = µₖ is given by

    D = 2ΣYᵢ log(Yᵢ/Ȳ).

(b) The numbers of cancer cells surviving a treatment in each of three replications of an experiment were 235, 184, and 189. Test the hypothesis that these three observations come from the same Poisson distribution.

4. (a) Suppose that Y₁, Y₂, ..., Yₙ are independent Poisson variates with means µ₁, µ₂, ..., µₙ. Let P₁, P₂, ..., Pₙ be known constants. Consider the hypothesis

    H: µ₁ = λP₁, µ₂ = λP₂, ..., µₙ = λPₙ,

where λ is unknown. Show that the likelihood ratio statistic for testing H is

    D = 2ΣYᵢ log(Yᵢ/µ̃ᵢ),

where µ̃ᵢ = λ̃Pᵢ and λ̃ = (ΣYᵢ)/(ΣPᵢ).
(b) In Problem 9.2.2(b), test the hypothesis that the death rates for the 10 regions are proportional to the populations of the regions.

5.† (a) Let X₁, X₂, ..., Xₙ and Y₁, Y₂, ..., Yₘ be independent Poisson variates. The Xᵢ's have expected value µ₁, and the Yᵢ's have expected value µ₂. Derive the likelihood ratio statistic for testing the hypothesis µ₁ = µ₂.
(b) Bacteria counts were made for 27 volumes of river water, each of unit volume. The results were as follows:

    Location 1:  0 2 0 2 2 0 2 0 0
    Location 2:  3 2 3 2 3 3 2 2 3 3

The bacteria are assumed to be randomly and uniformly distributed throughout the river water, with µ₁ per unit volume at location 1, and µ₂ per unit volume at location 2. Test the hypothesis µ₁ = µ₂.

6. Let X₁, X₂, ..., Xₙ be independent exponential variates with mean θ₁, and let Y₁, Y₂, ..., Yₘ be independent exponential variates with mean θ₂. Show that the likelihood ratio statistic for testing H: θ₁ = θ₂ depends only on n, m, and X̄/Ȳ.

7. Suppose that k independent experiments give log RLF's r₁, r₂, ..., rₖ and MLE's θ̂₁, θ̂₂, ..., θ̂ₖ for the same unknown parameter θ. Furthermore, suppose that the normal approximation applies to each of the rᵢ's:

    rᵢ(θ) ≈ −(1/2)(θ − θ̂ᵢ)² cᵢ,

where cᵢ = Jᵢ(θ̂ᵢ).
(a) Show that the MLE of θ based on all k experiments is approximately equal to θ̃, where

    θ̃ = (Σcᵢθ̂ᵢ)/(Σcᵢ).

(b) Show that the likelihood ratio statistic for testing H: θ = θ₀ is approximately (θ̃ − θ₀)²Σcᵢ.
(c) Show that the likelihood ratio statistic for testing the homogeneity hypothesis H: θ₁ = θ₂ = ··· = θₖ is approximately Σ(θ̂ᵢ − θ̃)²cᵢ.
(d) What are the approximate distributions of the likelihood ratio statistics in (b) and (c)?

8. Continuation of Problem 7. Seven different dilution series experiments were used to estimate a parameter h, called the "hit number". The MLE ĥ and observed information J are given below for each of the seven experiments.

    ĥ    2.028   2.108   1.912   1.675   1.730   1.808   1.889
    J    19.63   25.18   32.34   70.54   64.88   67.63   36.58

In each case, the likelihood function was approximately normal in h.
(a) Are these results consistent with a common value of h in all seven experiments?
(b) Are the combined results consistent with the theoretical value h = 2?

9.† Continuation of Problem 7. Suppose that three independent experiments give likelihood functions that are approximately normal in θ, with the following summary statistics:

    θ̂₁ = 9.74            θ̂₂ = 8.35            θ̂₃ = 10.27
    J₁(θ̂₁) = 0.563       J₂(θ̂₂) = 0.345       J₃(θ̂₃) = 0.695

(a) Test the hypothesis that the value of θ is the same in all three experiments.
(b) Obtain four approximate 95% confidence intervals for θ, one from each
experiment taken separately, and one from the combined results of all three experiments.

10. Consider the situation described in Problem 10.1.5. Testing stops when there have been m failures with each treatment. Let X₁, X₂, ..., Xₘ be the numbers of successes with treatment A, and let Y₁, Y₂, ..., Yₘ be the numbers of successes with treatment B. Derive the likelihood ratio statistic for testing the hypothesis α = β.

11. Let X₁, X₂, ..., Xₙ and Y₁, Y₂, ..., Yₘ be independent normal variates, all with the same known variance σ². The Xᵢ's have mean µ₁ and the Yᵢ's have mean µ₂.
(a) Show that the likelihood ratio statistic for testing H: µ₁ = µ₂ is

    D = (1/σ²)[n(X̄ − µ̃)² + m(Ȳ − µ̃)²] = [nm/(n + m)](X̄ − Ȳ)²/σ²,

where µ̃ = (nX̄ + mȲ)/(n + m).
(b) Find the distribution of X̄ − Ȳ. Hence show that the distribution of D is exactly χ²(1).

12.4. Tests for Binomial Probabilities

Suppose that k different treatments are to be compared on the basis of success/failure data. The first treatment is given to n₁ subjects and Y₁ successes are observed. The second treatment is given to n₂ different subjects and Y₂ successes are observed. The results of an experiment with k treatments can be summarized in a table as follows:

Treatment no.         1         2        ...       k
No. of successes     Y₁        Y₂        ...      Yₖ
No. of failures    n₁ − Y₁   n₂ − Y₂     ...    nₖ − Yₖ
Total                n₁        n₂        ...      nₖ

We wish to make inferences about the success probabilities p₁, p₂, ..., pₖ on the basis of the observed results.

We assume that Yᵢ, the number of successes with treatment i, has a binomial (nᵢ, pᵢ) distribution, and that the Yᵢ's are independent. The basic model involves a vector of k different unknown parameters, p = (p₁, p₂, ..., pₖ), where pᵢ is the success probability for the ith treatment. The log likelihood function is

    l(p) = Σyᵢ log pᵢ + Σ(nᵢ − yᵢ) log(1 − pᵢ).

The MLE of pᵢ is p̂ᵢ = yᵢ/nᵢ, and the maximum of the log likelihood under the basic model is

    l(p̂) = Σyᵢ log(yᵢ/nᵢ) + Σ(nᵢ − yᵢ) log[(nᵢ − yᵢ)/nᵢ].

Now suppose that we wish to test an hypothesis H about the pᵢ's. For instance, we may wish to test that they are equal:

    H₁: p₁ = p₂ = ··· = pₖ.

Their common value is not given, so there is one unknown parameter under H₁. Alternatively, if the k treatments are different doses d₁, d₂, ..., dₖ of a drug, we might wish to test the hypothesis

    H₂: pᵢ = e^(α + βdᵢ)/[1 + e^(α + βdᵢ)]   for i = 1, 2, ..., k,

which states that the response probability is related to the dose via the logistic model (10.5.1). There are two unknown parameters under H₂.

Assuming H to be true, we can rewrite the log likelihood as a function of the q remaining unknown parameters and find their MLE's. From these we can compute p̃₁, p̃₂, ..., p̃ₖ, the MLE's of the original probability parameters under H. The maximum of the log likelihood is then

    l(p̃) = Σyᵢ log p̃ᵢ + Σ(nᵢ − yᵢ) log(1 − p̃ᵢ).

By (12.3.1), the likelihood ratio statistic for testing H is

    D = 2[l(p̂) − l(p̃)]
      = 2Σyᵢ log[yᵢ/(nᵢp̃ᵢ)] + 2Σ(nᵢ − yᵢ) log[(nᵢ − yᵢ)/(nᵢ(1 − p̃ᵢ))].

Note that nᵢp̃ᵢ and nᵢ(1 − p̃ᵢ) are the expected numbers of successes and failures for the ith treatment under H, whereas yᵢ and nᵢ − yᵢ are the observed frequencies. Thus we can write

    D = 2Σ(obs freq) · log(obs freq / exp freq),    (12.4.1)

where the sum extends over all 2k classes (successes and failures).

The degrees of freedom for testing H is k − q, where q is the number of unknown parameters which remain under H. By (12.3.2) we have

    SL ≈ P{χ²(k − q) ≥ D_obs}.

The approximation will be accurate provided that all of the expected frequencies nᵢp̃ᵢ and nᵢ(1 − p̃ᵢ) are fairly large.

EXAMPLE 12.4.1. The food additive "Red Dye Number 2" was fed to 44 rats at a low dose and to 44 rats at a high dose. Later the rats were examined for tumors, and the results were as follows:

Treatment       Low dose   High dose
Tumor present     4 (9)      14 (9)
No tumor         40 (35)     30 (35)
Total               44          44

Note that 32% developed tumors at the high dose, and only 9% developed tumors at the low dose. Could these results have arisen by chance, or is there evidence of a real dose effect?

Let Y₁ and Y₂ be the numbers of rats with tumors at the low and high doses, respectively. We assume that Y₁ and Y₂ are independent, with Y₁ ~ binomial (n₁, p₁) and Y₂ ~ binomial (n₂, p₂), where n₁ = n₂ = 44. We wish to know whether there is conclusive evidence against H: p₁ = p₂.

Let p denote the unknown common value of p₁ and p₂ under the hypothesis H. From Example 9.2.2, the MLE of p is

    p̃ = (y₁ + y₂)/(n₁ + n₂) = (4 + 14)/(44 + 44) = 9/44.

Under H we have p̃₁ = p̃₂ = 9/44, and the expected frequencies are

    n₁p̃₁ = 9;   n₁(1 − p̃₁) = 35;   n₂p̃₂ = 9;   n₂(1 − p̃₂) = 35.

The table above shows these values in parentheses. By (12.4.1), the observed value of the LR statistic is

    D_obs = 2[4 log(4/9) + 14 log(14/9) + 40 log(40/35) + 30 log(30/35)] = 7.32.

Since H reduces the number of unknown parameters from 2 to 1, there is one degree of freedom for testing H, and

    SL ≈ P{χ²(1) ≥ 7.32} < 0.01.

Results as extreme as those observed would rarely occur if p₁ and p₂ were equal, and therefore we have strong evidence against H: p₁ = p₂. The incidence of tumors is greater at the high dose than at the low dose, and the difference is too large to be attributed to chance.

EXAMPLE 12.4.2. Table 10.5.1 shows the data from an experiment in which an insecticide was administered in k = 5 doses. We assume that Yᵢ, the number killed at dose dᵢ, has a binomial (nᵢ, pᵢ) distribution, and that results for different doses are independent. We wish to determine whether the logistic dose-response model (10.5.1) is compatible with these data. Thus the hypothesis of interest is

    H: pᵢ = e^(α + βdᵢ)/[1 + e^(α + βdᵢ)]   for i = 1, 2, ..., 5,

where α and β are unknown parameters.

We showed in Example 10.5.1 that the MLE's of α and β are α̂ = −4.8869, β̂ = 3.1035. Using these values, we computed estimated probabilities p̃ᵢ and then found the expected frequencies nᵢp̃ᵢ, nᵢ(1 − p̃ᵢ) (see Table 10.5.2). Now, by (12.4.1), the observed value of the LR statistic for testing H is

    D_obs = 2Σ(obs freq) · log(obs freq / exp freq) = 1.42,

the sum being taken over the ten classes of observed and expected frequencies in Table 10.5.2.

Since H reduces the number of unknown parameters from 5 to 2, there are three degrees of freedom for the test, and

    SL ≈ P{χ²(3) ≥ 1.42} > 0.5.

The observed value of D is not unusually large, and hence there is no evidence against the hypothesis of a logistic dose-response curve.

We concluded previously, after informal inspections of Figure 10.5.2 and Table 10.5.2, that the logistic model fits the data well. The LR test just performed provides a more formal justification of this conclusion. The test tells us whether the observed discrepancies can be attributed to chance variations. Tables and graphs tell us what kinds of departures have occurred and how large they are. Both significance tests and less formal methods are useful in assessing the fit of the model.

PROBLEMS FOR SECTION 12.4

1. Two hundred volunteers participated in an experiment to examine the effectiveness of vitamin C in preventing the common cold. One hundred of them were selected at random to receive a daily dose of vitamin C, and the others received a placebo. None of the volunteers knew which group they were in. During the test period, 20 of those taking vitamin C and 35 of those receiving the placebo caught colds. Test the hypothesis that the probability of catching a cold is the same for both groups.

2. A seed dealer claims that his sweet pea seeds have a germination rate of 80%. A customer purchased 4 packages of sweet pea seeds, one package of each of four colors. He planted 100 seeds from each package. The numbers of seeds germinating within one month were as follows:

                  Red   White   Blue   Yellow
Germination        75      66     81       74
No germination     25      34     19       26

(a) Test the hypothesis that the germination rate is 80% for all four colors.
(b) Test the hypothesis that the germination rate is the same for all four colors (but not necessarily 80%).
(c) Assuming that the germination rate is the same for all four colors, test the hypothesis that it is 80%.
(d) How are the likelihood ratio statistics in (a), (b), and (c) related?

3.† Four hundred patients took part in a study to compare the effectiveness of three similar drugs. Each drug was given to 100 patients, and the remaining 100 patients received a placebo. It was then observed whether or not there was improvement in the condition of each patient. The results were as follows:

                 Drug A   Drug B   Drug C   Placebo
Improvement          24       19       29        10
No improvement       76       81       71        90
(a) Test the hypothesis that the probability of improvement is the same in all four groups.
(b) Test the hypothesis that the three drugs are equally effective.
(c) Assuming that the three drugs are equally effective, test the hypothesis that the success rate is the same for those receiving a drug as for those receiving the placebo.
(d) How are the likelihood ratio statistics in (a), (b), and (c) related?

4. An experiment involved exposing a large number of cancer cells to a treatment and then observing how many survived. There were two treatments, each of which was applied to two different groups of cells. The results were as follows:

Treatment             A       A        B        B
Number of cells     48000   48000   192000   192000
Number surviving        7       9       49       39

Assume that cells respond independently, and that the survival probabilities for the four groups are α₁, α₂, β₁, and β₂, respectively.
(a) Test the hypothesis H: α₁ = α₂, β₁ = β₂.
(b) Assuming that the hypothesis in (a) is true, test the hypothesis that the survival probability is the same for both treatments.

5. Test the hypothesis of a logistic dose-response model in Problem 10.5.1.

6.† Test the hypothesis p = e^(α + βd) in Problem 10.5.2.

7. An interviewer in a shopping plaza asks individuals who pass by if they are willing to fill in a questionnaire. He keeps asking people until 30 agree. The following are the numbers of refusals he receives on each of six days.

Day                  1    2    3    4     5     6
Number refusing     70   67   80   62   100   112

Assume that individuals respond independently, and that each individual questioned on the ith day has probability pᵢ of responding. Test the hypothesis p₁ = p₂ = ··· = p₆.

Note: Since the distribution of the number refusing is negative binomial rather than binomial, you will need to derive the likelihood ratio statistic from first principles.

12.5. Tests for Multinomial Probabilities

Suppose that we have data from n independent repetitions of an experiment, and that we wish to assess how well the data agree with an hypothesized probability model for the experiment. One way of doing this is to construct a table of observed frequencies, which are then compared with expected frequencies under the hypothesized model (see Section 1.4). A test of significance may be used to determine whether the discrepancy between observed and expected frequencies is too great to be attributed to chance.

To construct a frequency table, we partition the sample space S for a single repetition into k mutually exclusive classes or events, S = A₁ ∪ A₂ ∪ ··· ∪ Aₖ. Let pⱼ be the probability of event Aⱼ, and let fⱼ be the number of times that Aⱼ occurs in the n repetitions. Exactly one of the events must occur in each repetition, so Σpⱼ = 1 and Σfⱼ = n.

Under the assumption of independent repetitions, the distribution of the fⱼ's is multinomial with joint probability function

    [n!/(f₁! f₂! ··· fₖ!)] p₁^f₁ p₂^f₂ ··· pₖ^fₖ.

The log likelihood function is

    l(p) = l(p₁, p₂, ..., pₖ) = Σfⱼ log pⱼ,

where Σpⱼ = 1. It can be shown that, subject to this condition, l(p) is maximized for p̂ⱼ = fⱼ/n. Hence the maximum log likelihood under the basic multinomial model is

    l(p̂) = Σfⱼ log(fⱼ/n).

The hypothesized model will determine the pⱼ's numerically or as functions of unknown parameters. We find the MLE's of any unknown parameters and use these to compute p̃₁, p̃₂, ..., p̃ₖ, the MLE's of the pⱼ's under the hypothesized model. The maximum of the log likelihood is then

    l(p̃) = Σfⱼ log p̃ⱼ.

By (12.3.1), the likelihood ratio statistic for testing the model is

    D = 2[l(p̂) − l(p̃)] = 2Σfⱼ log(fⱼ/eⱼ),    (12.5.1)

where eⱼ = np̃ⱼ is the estimated expected frequency for the jth class under the hypothesized model. Note that (12.5.1) has the same form as (12.4.1), but the sum is now taken over k classes rather than 2k classes.

Since the k probabilities p₁, p₂, ..., pₖ must sum to 1, there are only k − 1 functionally independent parameters in the basic multinomial model. Let q be the number of unknown parameters in the hypothesized model. Then there are (k − 1) − q degrees of freedom for the χ² approximation, and (12.3.2) gives

    SL ≈ P{χ²(k − 1 − q) ≥ D_obs}.

Classes for which eⱼ ≈ 0 but fⱼ ≥ 1 will have a big effect on D_obs, and the χ² approximation should not be trusted when the eⱼ's are small. The usual rule of thumb is that the eⱼ's should all be at least 5, but an occasional smaller value is not too harmful.

Another test statistic which may be used with multinomial or binomial data is the Pearson goodness of fit statistic,

    Σ(fⱼ − eⱼ)²/eⱼ.    (12.5.2)

The observed value of this statistic will be very nearly equal to that of the likelihood ratio statistic (12.5.1) or (12.4.1) when the eⱼ's are very large, and the same χ² approximation can be used.
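As a computational aside (not part of the original text; it assumes Python with NumPy and SciPy available), the following sketch evaluates both the likelihood ratio statistic (12.5.1) and the Pearson statistic (12.5.2) for the die-rolling frequencies of Example 12.2.2. The hypothesized model fixes all six class probabilities at 1/6, so q = 0 and the χ² approximation has (k − 1) − q = 5 degrees of freedom. Example 12.5.1 below carries out the same likelihood ratio calculation by hand; when the hypothesized model contains unknown parameters, the only changes are that eⱼ = np̃ⱼ is computed from the fitted model and that q further degrees of freedom are subtracted.

    import numpy as np
    from scipy.stats import chi2

    f = np.array([16, 15, 14, 20, 22, 13])  # observed frequencies, Example 12.2.2
    p = np.full(6, 1 / 6)                   # hypothesized class probabilities
    e = f.sum() * p                         # expected frequencies e_j = n * p_j

    D = 2 * np.sum(f * np.log(f / e))       # likelihood ratio statistic (12.5.1)
    Q = np.sum((f - e) ** 2 / e)            # Pearson goodness of fit statistic (12.5.2)

    df = (len(f) - 1) - 0                   # (k - 1) - q degrees of freedom
    print(D, chi2.sf(D, df))                # about 3.70 and 0.59
    print(Q, chi2.sf(Q, df))                # about 3.80 and 0.58

As the nearly equal values of D and Q suggest, the two statistics lead to the same conclusion here.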
Significance tests for multinomial data using test statistic (12.5.1) or (12.5.2) are often called goodness of fit tests.

EXAMPLE 12.5.1. In Example 12.2.2 the basic model was multinomial with k = 6 classes, and we carried out a likelihood ratio test of the hypothesis

    H: p₁ = p₂ = ··· = p₆ = 1/6.

The data were the observed frequencies 16, 15, 14, 20, 22, 13 from 100 rolls of a die. This analysis can be simplified by using formula (12.5.1) to compute the observed value of the likelihood ratio statistic. Under H, each class has expected frequency eⱼ = 100(1/6) = 16.67. Now (12.5.1) gives

    D_obs = 2[16 log(16/16.67) + 15 log(15/16.67) + ··· + 13 log(13/16.67)] = 3.70,

which agrees with the result in Example 12.2.2. Since H reduces the number of unknown parameters from 6 − 1 = 5 to 0, there are 5 degrees of freedom for the test, and

    SL ≈ P{χ²(5) ≥ 3.70} ≥ 0.5

as before. There is no evidence against the hypothesis of a balanced die.

EXAMPLE 12.5.2. The following are the observed frequencies from the ESP experiment in Example 12.1.2:

No. correct j      0       1       2      4   Total
Obs freq fj       17      18       9      6      50
Exp freq ej     18.75   16.67   12.50   2.08      50

Under the basic model, the fj's come from a multinomial distribution with k = 4 classes. If there is no ESP, the four classes have probabilities 9/24, 8/24, 6/24, and 1/24, so the hypothesis of interest is

    H: p₁ = 9/24, p₂ = 8/24, p₃ = 6/24, p₄ = 1/24.

We multiply these four probabilities by 50 to get the expected frequencies under H.

By (12.5.1), the observed value of the LR statistic is

    D_obs = 2[17 log(17/18.75) + ··· + 6 log(6/2.08)] = 6.22.

H reduces the number of unknown parameters from 4 − 1 = 3 to 0, so

    SL ≈ P{χ²(3) ≥ 6.22} ≈ 0.10.

If H were true, one would obtain D ≥ 6.22 in about 10% of repetitions of the experiment. Therefore we do not have conclusive evidence against the hypothesis.

This test and the one in Example 12.1.2 give about the same significance level for these data, but in other examples they may give quite different results. For instance, suppose that

    f₀ = 25,   f₁ = 10,   f₂ = 7,   f₄ = 8.

Then T_obs = 56, and the test of Example 12.1.3 gives SL ≈ 0.2. However, the likelihood ratio test gives

    SL ≈ P{χ²(3) ≥ 17.58} < 0.001.

The total number of correct guesses is not far from the expected number under H, but the observed frequencies are not at all like what we'd expect under H.

The likelihood ratio statistic (12.5.1) is a "general purpose" measure which does not look for any specific type of departure from H. The test statistic used in Example 12.1.2 was designed to detect a particular type of departure - an excess of correct guesses. It is more sensitive to departures of the type anticipated, but it may fail to detect substantial departures of other kinds.

EXAMPLE 12.5.3. In Example 4.4.3 we considered the distribution of flying-bomb hits over 576 regions of equal area in south London. The following table shows the number of regions fj which suffered exactly j hits (j = 0, 1, 2, ...):

No. of hits j      0        1       2       3      4     ≥5   Total
Obs freq fj       229      211      93      35      7      1     576
Exp freq ej     226.74   211.39   98.54   30.62   7.14   1.57     576

One region received 7 hits, and the total number of hits observed on all 576 regions is

    Σjfⱼ = 229 × 0 + 211 × 1 + ··· + 7 × 4 + 1 × 7 = 537.

Under the basic model, the fj's come from a multinomial distribution with k = 6 classes.

If points of impact are randomly and uniformly distributed over the study region, the number of hits in an area should have a Poisson distribution. Thus we consider the hypothesis

    H: pⱼ = µ^j e^(−µ)/j!   for j = 0, 1, 2, ...,

where µ is an unknown parameter.

Under H, the log likelihood function is

    Σfⱼ log pⱼ = Σjfⱼ log µ − µΣfⱼ − Σfⱼ log j!,

from which the MLE is found to be

    µ̂ = Σjfⱼ/Σfⱼ = 537/576 = 0.9323.

(This is not quite right - see the note at the end of this example.) Using this estimate, we can find p̃ⱼ and eⱼ = 576p̃ⱼ for j = 0, 1, ..., 4. The expected frequency for the last class is then obtained by subtraction from the total (see Example 4.4.3).

The observed value of the LR statistic is

    D_obs = 2Σfⱼ log(fⱼ/eⱼ) = 1.18.

The hypothesis reduces the number of unknown parameters from k − 1 = 5 to 1, so

    SL ≈ P{χ²(4) ≥ 1.18} ≈ 0.9.

There is no evidence against the hypothesis. The observed frequencies are in close agreement with the expected frequencies from a Poisson distribution.

The expected frequency in the last class is only 1.57, and we might therefore have some concern about the adequacy of the χ² approximation. To check this, we could combine the last two classes into a single class (≥4) with f = 7 + 1 = 8 and e = 7.14 + 1.57 = 8.71. Summing over the k = 5 classes gives D_obs = 1.00 with (5 − 1) − 1 = 3 degrees of freedom, and

    SL ≈ P{χ²(3) ≥ 1.00} ≈ 0.8.

The conclusion is the same as before.

Note. In calculating µ̂ above, we used the fact that the observation in class ≥5 was "7". Strictly speaking, µ̂ should be obtained using only the information in the frequency table. For the first test with k = 6 we have p̃ⱼ = µ^j e^(−µ)/j! for 0 ≤ j ≤ 4, and

    p̃(≥5) = 1 − e^(−µ)(1 + µ + µ²/2! + µ³/3! + µ⁴/4!),

so the appropriate log likelihood function is

    Σ (j = 0 to 4) fⱼ log p̃ⱼ + f(≥5) log[1 − e^(−µ)(1 + µ + ··· + µ⁴/4!)].

Maximizing this by Newton's method or trial and error gives µ̂ = 0.9291, and this is the estimate which should be used in computing the eⱼ's. The result is a slightly better fit (D_obs = 1.17 rather than 1.18), but no change in the conclusion.

Similarly, when we combine the last two classes, the appropriate log likelihood function is

    Σ (j = 0 to 3) fⱼ log p̃ⱼ + f(≥4) log[1 − e^(−µ)(1 + µ + µ²/2! + µ³/3!)],

which is maximized for µ̂ = 0.9300. Recomputing expected frequencies with this value of µ gives D_obs = 0.99 rather than 1.00.

In general, if the value of µ used in the calculations is not the "true" MLE, D_obs will be too large. However, unless there is a substantial amount of grouping, the difference will usually be too small to matter.

EXAMPLE 12.5.4. Consider the set of 109 waiting times between mining accidents which we discussed in Sections 1.2 and 1.4. If accidents occur randomly in time at the constant rate of λ per day, the time T between successive accidents has an exponential distribution with mean θ = 1/λ (see Section 6.5). Here we have n = 109 observations t₁, t₂, ..., tₙ, and we wish to determine whether an exponential distribution model is satisfactory.

One way to examine the fit of the model is to group the data into k classes [aⱼ₋₁, aⱼ) and prepare a frequency table (see Example 1.2.1). For the exponential distribution we have

    P(aⱼ₋₁ ≤ T < aⱼ) = exp(−aⱼ₋₁/θ) − exp(−aⱼ/θ),

and so the hypothesis of interest is

    H: pⱼ = exp(−aⱼ₋₁/θ) − exp(−aⱼ/θ)   for j = 1, 2, ..., k.

There will be k − 2 degrees of freedom for testing H because it reduces the number of unknown parameters from k − 1 to 1.

Table 12.5.1 is obtained from Table 1.4.1 by combining the last two classes. The eⱼ's were computed using θ̂ = t̄ = 241, which is the MLE based on the original set of 109 measurements.

Table 12.5.1. Observed and Expected Frequencies for the Mining Accident Data of Example 1.2.1

Class            fj      ej        Class            fj      ej
[  0,   50)      25    20.42       [ 300,  350)     11     5.88
[ 50,  100)      19    16.60       [ 350,  400)      6     4.78
[100,  150)      11    13.49       [ 400,  600)      5    11.69
[150,  200)       8    10.96       [ 600, 1000)      3     7.32
[200,  250)       9     8.91       [1000,   ∞)       5     1.72
[250,  300)       7     7.24
Total           109   109.01

Now (12.5.1) gives

    D_obs = 2Σfⱼ log(fⱼ/eⱼ) = 18.79.

Since there are k = 11 classes, we have

    SL ≈ P{χ²(9) ≥ 18.79} ≈ 0.025.

Thus there is some evidence against the exponential distribution model.

The expected frequency for the last class is only 1.72, and we might be tempted to combine the last two classes as we did in Example 12.5.3. We

would then obtain D_obs = 11.51 with 8 degrees of freedom, and SL ≈ 0.2. The result is now quite different because the deviations in the last two classes were in opposite directions and have cancelled one another. Combining these classes is not a good idea because it obscures the difficulties in the right hand tail of the distribution. In fact, the model is not appropriate for these data because the accident rate λ is not constant over time (see Example 1.4.2).

A difficulty with the above analysis is that the results obtained will depend to some extent upon the arbitrary grouping used to produce the frequency table. It is a good idea to try three or four different groupings and check that similar results are obtained. Alternatively, the fit of the model can be checked via informal graphical procedures (see Example 6.3.1).

The note following Example 12.5.3 applies to this example as well. The likelihood function based on the frequency table is

    Π pⱼ^fⱼ,   where pⱼ = exp(−aⱼ₋₁/θ) − exp(−aⱼ/θ),

and maximizing this function gives θ̃ = 232.5. This value, not θ̂ = 241, should properly have been used in computing expected frequencies. We would then have obtained D_obs = 18.65 instead of D_obs = 18.79, a change which is too small to be of any practical importance.

PROBLEMS FOR SECTION 12.5

1. In Problem 12.1.9, carry out a goodness of fit test of the hypothesis that blocks are placed in a random order.

2.† Twelve dice were rolled 26306 times. Each time, the number of dice showing 5 or 6 uppermost was recorded. The results are summarized in the following table:

No. of 5's and 6's      0      1      2      3      4      5      6
Frequency observed    185   1149   3265   5475   6114   5194   3067

No. of 5's and 6's      7      8      9     10     11     12   Total
Frequency observed   1331    403    105     14      4      0   26306

Compute expected frequencies under the assumption that trials are independent and the dice are balanced. Test for consistency, and give a possible explanation for the poor agreement.

3. Mass-produced items are packed in cartons of 10 as they come off the assembly line. The items from 250 cartons are inspected for defects, with the following results:

Number defective       0    1    2    3    4    5   ≥6
Frequency observed   103   81   39   19    6    2    0

Test the hypothesis that the number of defective items per carton has a binomial distribution. Can you suggest a reason that the distribution might not be binomial?

4. Test the goodness of fit of a Poisson distribution model to the data of Example 4.4.2.

5. In a biological experiment, a square millimeter of yeast culture was subdivided into 400 equal-sized squares, and the number of yeast cells in each small square was recorded. The results are summarized in the following frequency table:

Number of cells        0     1    2    3    4    5    6   ≥7
Frequency observed   137   129   83   38   10    2    1    0

If yeast cells are randomly and uniformly distributed over the area examined, the number of yeast cells per square should have a Poisson distribution. Test whether a Poisson model is consistent with the data.

6.† According to genetic theory, blood types MM, MN, and NN should occur in a very large population with relative frequencies θ², 2θ(1 − θ), and (1 − θ)², where θ is the (unknown) gene frequency.
(a) The observed frequencies in a sample of size 100 from the population were 33, 44, and 23, respectively. Test the goodness of fit of the model to these data.
(b) Suppose that the observed frequencies in a sample of size 400 were exactly four times those given in (a). Carry out a goodness of fit test and explain why it gives a different result than that in (a).

7. Test the goodness of fit of the model in Problem 10.1.1.

8. Test the goodness of fit of the exponential distribution model in Problem 9.6.3.

9. Test the goodness of fit of the model in Problem 9.1.10(b).

10.† (a) A city police department kept track of the number of traffic accidents involving personal injury on sixty week-day mornings. The results were as follows:

Number of accidents     0    1    2    3    4    5   ≥6
Frequency observed     17   17   16    7    2    1    0

Is a Poisson distribution model consistent with these data?
(b) The police department also recorded the number of persons injured in traffic accidents for the same sixty mornings, with the following results:

Number injured        0    1    2    3    4    5    6    7   ≥8
Frequency observed   17    8    9    8   10    4    2    2    0

If injuries were randomly and uniformly distributed over time, the number of injuries per morning would have a Poisson distribution. Show that this model is contradicted by the data, and indicate which of the assumptions for a Poisson process is violated.
(c) Of the 83 accidents recorded in (a), 22 occurred on Mondays, 13 on Tuesdays, 11 on Wednesdays, 12 on Thursdays, and 25 on Fridays. Are these results consistent with the hypothesis that accidents are equally likely to occur on any day of the week?
11. The following results were obtained in 150 rolls of a 6-sided die:

Side                    1    2    3    4    5    6
Frequency observed     29   26   31   24   19   21

(a) For a brick-shaped die (Example 1.3.2), the face probabilities are

p₁ = p₆ = 1/6 − 2θ;   p₂ = p₃ = p₄ = p₅ = 1/6 + θ.

Compute expected frequencies under this model, and test the goodness of fit.
(b) Assuming the model in (a) to be correct, test the hypothesis θ = 0.
(c) Carry out a likelihood ratio test of the hypothesis p₁ = p₂ = ⋯ = p₆ = 1/6. How is the likelihood ratio statistic related to those in (a) and (b)?

12. Fifty specimens of plastic are repeatedly struck with a hammer until they fracture. The data are summarized in the following table:

No. of blows            1    2    3    4    5    6   ≥7
Observed frequency     23   13    8    4              0

It is thought that the number of blows needed to fracture a specimen is geometrically distributed, with probability function

f(x) = θ(1 − θ)^{x−1};   x = 1, 2, ... .

Find the maximum likelihood estimate of θ, and test the goodness of fit of the model to the data.

13.† A long sequence of digits (0, 1, ..., 9) produced by a random number generator was examined. There were 51 zeroes altogether, giving 50 pairs of successive zeroes. For each such pair, the number of nonzero digits between the two zeroes was determined. The results were as follows:

 1   6   8  10  22  12  15   0   0   2
26   1  20   4   2   0  10   4  19   2
 3   0   5   2   8   1   6  14   2   2
 2  21   4   3   0   1   0   7   2   4
 4   7  16  18   2  13  22   7   3   5

Describe an appropriate probability model for these counts if the random number generator is actually producing random digits. Construct a frequency table and test the goodness of fit of the model.

14. Consider the multinomial log likelihood function

l = f₁ log p₁ + f₂ log p₂ + ⋯ + f_k log p_k

where Σp_j = 1 and Σf_j = n. Show that l is maximized when p_j = f_j/n for j = 1, 2, ..., k.

15. Show that, for both the likelihood ratio statistic (12.5.1) and the Pearson goodness of fit statistic (12.5.2), the effect of doubling all observed and expected frequencies is to double the value of D.

16.† n items were examined from the output of three similar machines in a factory. Ten percent were defective for the first machine, five percent for the second machine, and twelve percent for the third machine. A likelihood ratio test of the hypothesis that the probability of a defective is the same for all three machines gave a significance level of 5%. How large was n?

17. (a) Let f₁, f₂, e₁, and e₂ be any positive real numbers. Prove that

(f₁ + f₂) log[(f₁ + f₂)/(e₁ + e₂)] ≤ f₁ log(f₁/e₁) + f₂ log(f₂/e₂).

Hint: Consider the function

f(λ₁, λ₂) = f₁ log λ₁ + f₂ log λ₂ − (e₁λ₁ + e₂λ₂).

Its restricted maximum subject to λ₁ = λ₂ cannot exceed its unrestricted maximum over all λ₁, λ₂.
(b) Suppose that (12.5.1) is calculated over k classes. The first two classes are then combined into a single class with observed frequency f₁ + f₂ and expected frequency e₁ + e₂, and D is recalculated. Show that the value of D cannot increase.
(c) Show that the Pearson goodness of fit statistic (12.5.2) has a similar property.

18.†(a) A population consists of aᵢ families with exactly i children, for i = 0, 1, ..., k. There are Σ i·aᵢ children in the population, and from these n children are chosen at random. Find the expected number of children in the sample who have exactly j siblings (brothers and sisters).
(b) The 1931 Canada census yields the following data:

i      aᵢ         i      aᵢ         i      aᵢ
1    207756       5    32080        9     2859
2    156111       6    18128       10     1353
3     95779       7    10511       11      575
4     56275       8     5624       12      326

Find the expected number of children with exactly j siblings (j = 1, ..., 11) in a random sample of 242 children from this population.
(c) A sociologist asked each of 242 alcoholics how many siblings he had, and the results were as follows:

No. of siblings        0    1    2    3    4    5    6    7    8    9   10   11   Total
Observed frequency    21   32   40   47   29   23   20   11   10    3    3    2     242

Test the hypothesis that the distribution of family size for alcoholics is as indicated by the 1931 census.
(d) Another possible procedure would be to compare the observed values f_j with the population values e_j = 242a_{j+1}/Σaᵢ, using (12.5.1) or (12.5.2). Explain why this procedure is incorrect.

19.† The following table records 292 litters of mice classified according to litter size and number of females in the litter.

                         Number of females
                         0     1     2     3     4
Litter size     1        8    12
                2       23    44    13
                3       10    25    48    13
                4        5    30    34    22     5

Suppose that the number of females in a litter of size i is binomially distributed with parameters (i, pᵢ).
(a) Test the hypothesis p₁ = p₂ = p₃ = p₄.
(b) Assuming the hypothesis in (a) to be true, test the significance of deviations from the binomial distribution model.
(c) How would the test in (b) be affected if equality of the pᵢ's was not assumed?

20. Test the goodness of fit of the exponential distribution model in Problem 9.4.1(b). First, compute expected frequencies using the MLE of θ based on the original 27 measurements. Then repeat the test using the MLE of θ based on just the frequency table.

21. In Problem 10.1.11, test the hypothesis that the length of the gestation period is normally distributed. Use the approximate MLE's from Problem 10.1.11(a) in computing the expected frequencies. How would the value of the likelihood ratio statistic change if one used the exact MLE's in calculating expected frequencies?

12.6. Tests for Independence in Contingency Tables

Many interesting statistical applications involve the analysis of cross-classified frequency data. For instance, in a study to evaluate three treatments for cancer, one might classify each of n patients according to the treatment received, and also according to whether or not the patient survived a five-year period. The results could be displayed in a 3 × 2 array, with one row for each treatment category and one column for each survival category. The body of the table would give the number of patients in each of the six classes. A cross-tabulation of frequency data such as this is called a contingency table.

In the example just described, we have a two-way or two-dimensional table. If we also classified patients by sex, we would have a three-way (3 × 2 × 2) contingency table containing 12 frequencies. We shall restrict the discussion here to two-way tables only. Many examples of higher dimensional contingency tables may be found in Bishop, Fienberg, and Holland, Discrete Multivariate Analysis, MIT Press (1975).

A question of interest in the cancer study would be whether there is a connection or association between the column classification (survival) and the row classification (treatment). This can be investigated by testing the hypothesis that the row and column classifications are independent. If this hypothesis is contradicted by the data, then there is evidence of an association between the two classifications.

a × b Contingency Table

As in the preceding section, we consider n independent repetitions of an experiment. However now we suppose that the outcome of each repetition is classified in two ways: according to which of the events A₁, A₂, ..., A_a occurs, and according to which of the events B₁, B₂, ..., B_b occurs. We assume that the Aᵢ's (and similarly the Bⱼ's) are mutually exclusive and exhaustive, so that each outcome belongs to exactly one of them. Altogether there are k = ab possible classes AᵢBⱼ. Let pᵢⱼ be the probability of class AᵢBⱼ, and let fᵢⱼ be the observed frequency for this class in the n repetitions. The frequencies can be arranged in an a × b table as shown in Figure 12.6.1. We denote the ith row total by rᵢ and the jth column total by cⱼ. Note that

Σⱼ fᵢⱼ = rᵢ;   Σᵢ fᵢⱼ = cⱼ;   Σᵢ rᵢ = Σⱼ cⱼ = n.

Under the assumption of independent repetitions, the distribution of the fᵢⱼ's is multinomial with k = ab classes. The joint probability function is

[n!/(f₁₁! f₁₂! ⋯ f_ab!)] p₁₁^f₁₁ p₁₂^f₁₂ ⋯ p_ab^f_ab,

and the log likelihood function is

l(p) = ΣΣ fᵢⱼ log pᵢⱼ

where ΣΣ pᵢⱼ = 1. The situation is the same as in Section 12.5 except that now we are using double subscripts. It follows from (12.5.1) that the likelihood ratio statistic for testing an hypothesis H concerning the pᵢⱼ's is

D = 2 ΣΣ fᵢⱼ log(fᵢⱼ/eᵢⱼ)                                        (12.6.1)

where eᵢⱼ = npᵢⱼ is the expected frequency for class AᵢBⱼ under the hypothesis. The degrees of freedom for testing H will be (k − 1) − q = ab − 1 − q, where q is the number of unknown parameters which remain under H.

          B₁      B₂     ...     B_b     Total
A₁       f₁₁     f₁₂     ...     f₁b      r₁
A₂       f₂₁     f₂₂     ...     f₂b      r₂
...
A_a      f_a1    f_a2    ...     f_ab     r_a
Total     c₁      c₂     ...     c_b       n

Figure 12.6.1. a × b contingency table.
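The statistic (12.6.1) is straightforward to evaluate numerically once expected frequencies have been computed under H. The following is a minimal computational sketch (in Python, assuming the NumPy and SciPy libraries are available; the function names and the illustrative frequencies are ours, not part of the text): it evaluates D for given tables of observed and expected frequencies and finds the approximate significance level from the χ² distribution.

```python
import numpy as np
from scipy.stats import chi2

def lr_statistic(f, e):
    """Likelihood ratio statistic (12.6.1): D = 2 * sum of f*log(f/e).

    Cells with f = 0 contribute nothing, since x*log(x) tends to 0 as x -> 0.
    """
    f = np.asarray(f, dtype=float)
    e = np.asarray(e, dtype=float)
    pos = f > 0
    return 2.0 * np.sum(f[pos] * np.log(f[pos] / e[pos]))

def significance_level(d_obs, df):
    """Approximate SL = P{chi-square(df) >= d_obs}."""
    return chi2.sf(d_obs, df)

# Hypothetical 2 x 3 table (k = ab = 6 classes) with expected frequencies
# computed under a hypothesis H that leaves q = 3 unknown parameters,
# so the degrees of freedom are k - 1 - q = 2.
f = np.array([[12.0, 20.0, 18.0],
              [ 8.0, 30.0, 12.0]])
e = np.array([[10.0, 25.0, 15.0],
              [10.0, 25.0, 15.0]])

d_obs = lr_statistic(f, e)
print(d_obs, significance_level(d_obs, df=2))
```

The same two steps—compute the eᵢⱼ under H, then evaluate D and its approximate χ² significance level—are used for every hypothesis considered in this section.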
Hypothesis of Independence

A question which is often of interest is whether there is any connection or association between the row and column classifications. To investigate this, we consider the hypothesis of independence,

H: pᵢⱼ = P(Aᵢ)P(Bⱼ)   for all i, j

which states that each row event Aᵢ is independent of every column event Bⱼ. Using the definition of conditional probability (3.4.1), we can rewrite this as

P(Bⱼ | Aᵢ) = P(Bⱼ)   for all i, j

which states that the probability of obtaining an observation in the jth column is the same for every row. Evidence against the independence hypothesis is evidence in favor of an association between the row and column classifications.

Under H, the unknown parameters are

αᵢ = P(Aᵢ)   for i = 1, 2, ..., a;      βⱼ = P(Bⱼ)   for j = 1, 2, ..., b.

Since Σαᵢ = 1 and Σβⱼ = 1, the number of functionally independent parameters under H is q = (a − 1) + (b − 1), and the degrees of freedom for testing H is

(k − 1) − q = ab − 1 − (a − 1) − (b − 1) = (a − 1)(b − 1).

Since pᵢⱼ = αᵢβⱼ, the log likelihood function is

ΣΣ fᵢⱼ log pᵢⱼ = ΣΣ fᵢⱼ (log αᵢ + log βⱼ) = Σᵢ (log αᵢ) Σⱼ fᵢⱼ + Σⱼ (log βⱼ) Σᵢ fᵢⱼ = Σᵢ rᵢ log αᵢ + Σⱼ cⱼ log βⱼ.

Maximizing this function subject to Σαᵢ = 1 and Σβⱼ = 1 gives

α̂ᵢ = rᵢ/n;   β̂ⱼ = cⱼ/n.

Hence the expected frequency for class AᵢBⱼ is

eᵢⱼ = np̂ᵢⱼ = nα̂ᵢβ̂ⱼ = rᵢcⱼ/n.                                    (12.6.2)

To obtain expected frequencies under the independence hypothesis, we multiply row totals by column totals and divide by the grand total. Note that the eᵢⱼ's have the same row and column totals as the fᵢⱼ's:

Σⱼ eᵢⱼ = rᵢ(Σⱼ cⱼ)/n = rᵢ;      Σᵢ eᵢⱼ = (Σᵢ rᵢ)cⱼ/n = cⱼ.

If we compute all of the expected frequencies in the upper left hand (a − 1) × (b − 1) subtable, we can get the expected frequencies for the last row and column of the table by subtraction from the marginal totals.

EXAMPLE 12.6.1 (R.A. Fisher, Smoking and the Cancer Controversy, Oliver and Boyd, 1959). Seventy-one pairs of twins were examined with respect to their smoking habits. For each pair, it was ascertained whether they were identical twins (A₁) or fraternal twins (A₂), and whether their smoking habits were alike (B₁) or unlike (B₂). The results are shown in the following 2 × 2 contingency table:

                      Like habits    Unlike habits    Total
Identical twins       44 (39.56)      9 (13.44)         53
Fraternal twins        9 (13.44)      9  (4.56)         18
Total                 53              18                71

Note that 83% of identical twin pairs have like habits, but only 50% of fraternal twins have like habits. Could such a large difference reasonably have occurred by chance, or is the probability of like habits different for the two types of twins?

We wish to know whether the probability of B₁ (like habits) could be the same for identical twins (A₁) and fraternal twins (A₂); that is, we wish to examine the hypothesis

H: P(B₁ | A₁) = P(B₁ | A₂) = P(B₁).

This is the independence hypothesis. Under H, the expected frequency for the (1, 1) cell is

e₁₁ = r₁c₁/n = 53 × 53/71 = 39.56.

The remaining expected frequencies can be found in a similar way, or by subtraction from the marginal totals. The observed value of the LR statistic (12.6.1) is

D_obs = 2[44 log(44/39.56) + 9 log(9/13.44) + ⋯] = 7.15.

The degrees of freedom for the test is (a − 1)(b − 1) = 1, and

SL ≈ P{χ²(1) ≥ 7.15} < 0.01.

It is not reasonable to attribute the observed discrepancies to chance, and hence there is strong evidence against the independence hypothesis. The probability of like smoking habits is greater for identical twins than for fraternal twins.

Note. The main difference between the situations in Examples 12.4.1 and 12.6.1 is that the column totals c₁ = c₂ = 44 were fixed in advance in the former, whereas only the grand total n = 71 was fixed in the latter. Thus the basic model for Example 12.4.1 is a pair of independent binomial distributions, but the basic model for Example 12.6.1 is a single multinomial

distribution. If we had applied the analysis from Example 12.4.1 to the current example, we would have obtained exactly the same expected frequencies, LR statistic, degrees of freedom, and significance level. For a test of the independence hypothesis it makes no difference whether the marginal totals are random variables, or are fixed in advance by the experimental design.

EXAMPLE 12.6.2. Twenty-seven of the pairs of identical twins considered in Example 12.6.1 had been separated at birth, whereas the other 26 pairs had been raised together. The frequencies of like and unlike smoking habits for the two groups are as follows:

                    Like habits    Unlike habits    Total
Separated           23 (22.42)      4 (4.58)          27
Not separated       21 (21.58)      5 (4.42)          26
Total               44              9                 53

The figures in parentheses are the expected frequencies under the assumption that the two classifications are independent. We do not need a formal test of significance to tell us that the agreement is extremely good. There is no evidence that the probability of like smoking habits is different for the two groups.

The greater similarity between smoking habits of identical twins (Example 12.6.1) could be accounted for in two ways. Firstly, it could be due to the fact that identical twins have the same genotype, whereas fraternal twins are no more alike genetically than ordinary brothers and sisters. Secondly, it could be due to greater social pressures on identical twins to conform in their habits. If the latter were the case, one would expect to find less similarity in the smoking habits of identical twins who had been separated at birth. Since this is not the case, it appears that genetic factors are primarily responsible for the similarity of smoking habits.

The possibility that genetic factors may influence smoking habits has interesting implications for the smoking and cancer controversy, since these same genetic factors might also produce an increased susceptibility to cancer. See Fisher's pamphlet for further discussion.

EXAMPLE 12.6.3. In a study to determine whether laterality of hand is associated with laterality of eye (measured by astigmatism, acuity of vision, etc.), 413 subjects were classified with respect to these two characteristics. The results were as follows:

                 Left-eyed      Ambiocular       Right-eyed     Total
Left-handed      34 (35.43)      62  (58.55)     28 (30.02)      124
Ambidextrous     27 (21.43)      28  (35.41)     20 (18.16)       75
Right-handed     57 (61.14)     105 (101.04)     52 (51.82)      214
Total           118             195              100             413

Is there any evidence that laterality of eye is related to laterality of hand?

Assuming that the two classifications are independent, the expected frequencies for the four classes in the upper left hand corner are

118 × 124/413 = 35.43      195 × 124/413 = 58.55
118 ×  75/413 = 21.43      195 ×  75/413 = 35.41

The remaining expected frequencies are obtained by subtraction from the marginal totals, and are shown in parentheses above. The observed value of the LR statistic (12.6.1) is D_obs = 4.03, and there are (a − 1)(b − 1) = 4 degrees of freedom for the test. Thus

SL ≈ P{χ²(4) ≥ 4.03} > 0.25.

The hypothesis of independence is compatible with the data, and there is no evidence of an association between laterality of hand and laterality of eye.

EXAMPLE 12.6.4. Nine hundred and fifty school children were classified according to their nutritional habits and intelligence quotients, with the following results:

                               Intelligence Quotient
                    <80           80-89          90-99          ≥100         Total
Good nutrition     245 (252.5)    228 (233.3)    177 (173.8)    219 (209.4)    869
Poor nutrition      31  (23.5)     27  (21.7)     13  (16.2)     10  (19.6)     81
Total              276            255            190            229            950

If there is no relationship, the row and column classifications are independent, and the expected frequencies are as shown. The observed value of the LR statistic is 10.51 with (a − 1)(b − 1) = 3 degrees of freedom, giving

SL ≈ P{χ²(3) ≥ 10.51} ≈ 0.02.

The data provide reasonably strong evidence against the hypothesis of independence. Poor nutrition and a low IQ tend to occur together.

PROBLEMS FOR SECTION 12.6

1.† In December 1897 there was an outbreak of plague in a jail in Bombay. Of 127 persons who were uninoculated, 10 contracted the plague. Of 147 persons who had been inoculated, 3 contracted the disease. Test the hypothesis that contraction of the plague is independent of inoculation status.

2. It was noticed that married undergraduates seemed to do better academically than single students. Accordingly, the following observations were made: of 1500 engineering students, 297 had failed their last set of examinations; 157 of them were married, of whom only 14 had failed. Are these observations consistent with the hypothesis of a common failure rate for single and married students? Under
what conditions would the information that there were more married students in 3rd and 4th years than in 1st and 2nd years affect your conclusion?

3. Six hundred and four adult patients in a large hospital were classified according to whether or not they had cancer, and according to whether or not they were smokers. The results were as follows:

                 Cancer patient    Other
Smoker                 70           397
Non-smoker             12           125

Test the hypothesis that the disease classification is independent of the smoking classification.

4. A total of 1376 father–daughter pairs were classified as SS, ST, TS, or TT where S stands for short and T for tall. Heights were divided at 68" for fathers and 64" for daughters. The proportion of short daughters among short fathers is 522/726, while among tall fathers the proportion is 206/650. Do the data indicate any association between the heights of fathers and daughters?

5.† In a series of autopsies, indications of hypertension were found in 37% of 200 heavy smokers, in 40% of 290 moderate smokers, in 45.3% of 150 light smokers, and in 51.3% of 160 non-smokers. Test the hypothesis that the probability of hypertension is independent of the smoking category.

6. The following table classifies 5816 births by day of the week. Row 1 classifies the first 2000 births in Who's Who for 1970 (average year of birth 1907). Row 2 classifies the 3816 births which were announced in The Times during one year ending in August 1976.

               M     T     W    Th     F    Sa    Su   Total
Who's Who     262   307   270   307   272   280   302    2000
The Times     572   585   594   594   582   498   391    3816

(a) Test the hypothesis that, for the sample from Who's Who, births are uniformly distributed over the days of the week.
(b) Show that the distributions of births are significantly different for the two samples. In what way are they different? Can you suggest an explanation for this?

7.† Gregor Mendel grew 529 pea plants using seed from a single source, and classified them according to seed shape (round, round and wrinkled, wrinkled) and color (yellow, yellow and green, green). He obtained the following data:

38 round, yellow
65 round, yellow and green
60 round and wrinkled, yellow
138 round and wrinkled, yellow and green
28 wrinkled, yellow
68 wrinkled, yellow and green
35 round, green
67 round and wrinkled, green
30 wrinkled, green

(a) Test the hypothesis that the shape and color classifications are independent.
(b) According to Mendel's theory, the frequencies of yellow, yellow and green, and green seeds should be in the ratio 1:2:1. Test whether this hypothesis is consistent with the data.

8. In the following table, 64 sets of triplets are classified according to the age of their mother at their birth and their sex distribution:

                    3 boys   2 boys   2 girls   3 girls   Total
Mother under 30        5        8        9         7        29
Mother over 30         6       10       13         6        35
Total                 11       18       22        13        64

(a) Is there any evidence of an association between the sex distribution and the age of the mother?
(b) Suppose that the probability of a male birth is 0.5, and that the sexes of triplets are determined independently. Find the probability that there are x boys in a set of triplets (x = 0, 1, 2, 3), and test whether the column totals are consistent with this distribution.

9. 1398 school children with tonsils present were classified according to tonsil size and absence or presence of the carrier for Streptococcus pyogenes. The results were as follows:

                     Normal   Enlarged   Much enlarged
Carrier present         19       29           24
Carrier absent         497      560          269

Is there evidence of an association between the two classifications?

10. The following data on heights of 205 married couples were presented by Yule in 1900.

                    Tall wife   Medium wife   Short wife
Tall husband           18           28            19
Medium husband         20           51            28
Short husband          12           25             9

Test the hypothesis that the heights of husbands and wives are independent.

11.† A study was undertaken to determine whether there is an association between the birth weights of infants and the smoking habits of their parents. Out of 50 infants of above average weight, 9 had parents who both smoked, 6 had mothers who smoked but fathers who did not, 12 had fathers who smoked but mothers who did not, and 23 had parents of whom neither smoked. The corresponding results for 50 infants of below average weight were 21, 10, 6, and 13, respectively.
(a) Test whether these results are consistent with the hypothesis that birth weight is independent of parental smoking habits.
(b) Are these data consistent with the hypothesis that, given the smoking habits of the mother, the smoking habits of the father are not related to birth weight?

12. In a California study, data were collected on some features of motorcycle accidents. As part of the study, questionnaires were sent to individuals who had been involved in motorcycle collisions. One question of interest was the possible relationship between the occurrence of head injury and helmet use. The following data were reported on 626 injured male drivers who responded to the questionnaire.

                No head injury   Minor head injury   Serious head injury
Helmet used          165                20                   33
No helmet            262                53                   93

(a) Is there any evidence of a difference in relative frequencies of the different injury types between the two groups (helmet versus no helmet)?
(b) Of those who received a head injury, is there any evidence of a difference in the frequency of serious versus minor head injuries for the two groups?
(c) From these data we see that, of all the injured drivers, only 218 out of 626 (35%) wore helmets. Is there any evidence in these data that wearing helmets reduces the chance of an injury in an accident?

13.† In an experiment to detect a tendency of a certain species of insect to aggregate, 12 insects were released near two adjacent leaf areas, A and B, and after a certain period of time the number of insects that had settled on each was counted. The process was repeated 10 times, using the same two leaf areas. The observations are set out below.

Trial number     1   2   3   4   5   6   7   8   9  10
Number on A      7   3   3   9   0   0   5   5   7   4
Number on B      3   5   6  10   8   2   5   4   6

Do the observations suggest that insects tend to aggregate, or that they distribute themselves at random over the two areas?

14. Consider a two-way cross-classification of counts fᵢⱼ, where 1 ≤ i ≤ a and 1 ≤ j ≤ b. Assume that the fᵢⱼ's are independent, and that fᵢⱼ has a Poisson distribution with mean μᵢⱼ. Under this assumption, the total count n = ΣΣ fᵢⱼ is a random variable. Consider the hypothesis

H: μᵢⱼ = γαᵢβⱼ   for 1 ≤ i ≤ a, 1 ≤ j ≤ b,

where the αᵢ's, βⱼ's, and γ are unknown parameters and Σαᵢ = Σβⱼ = 1. This hypothesis says that the expected counts in any two rows of the table are proportional:

μᵢⱼ/μᵢ′ⱼ = αᵢ/αᵢ′   for 1 ≤ j ≤ b.

Show that the expected frequencies under H are given by (12.6.2), and the likelihood ratio statistic for testing H is given by (12.6.1).

15. Continuation of Problem 14. An experiment was carried out to determine whether two concentrations of a virus would produce different effects on tobacco plants. Half a leaf of a tobacco plant was rubbed with cheesecloth soaked in one preparation of the virus extract, and the other half was rubbed with the second extract. The following table shows the number of lesions appearing on the half leaf.

Leaf no.        1    2    3    4    5   Total
Extract 1      31   20   18   17    9     95
Extract 2      18   17   14   11   10     70

Test the hypothesis that the proportion of lesions produced by Extract 1 is the same on all leaves.

12.7. Cause and Effect

A small significance level in a test for independence implies that the observed frequencies would have been unlikely to occur if the row and column classifications were independent of one another. Thus the data indicate some connection or association between the two classifications. However, the fact that an association has been detected does not imply that there is necessarily a direct or causative relationship between the two classifications.

The statement "A causes B" means that, by manipulating the cause A, we can control the effect B. If we make A happen, we increase the probability that B will occur (within some reasonable time limit). If we prevent A from happening, we decrease the probability that B will subsequently occur.

The statement "A and B are associated" means that A and B tend to occur together. However, there is no guarantee that forcing A to occur will have any effect on the occurrence of B. In fact, there are three possible cause-and-effect relationships which could produce the association:

(i) A causes B;
(ii) B causes A;
(iii) some other factor C causes both A and B.

We cannot claim to have proof that A causes B unless the data have been collected in such a way that (ii) and (iii) can be ruled out.

For instance, in Example 12.6.4, low IQ's were found more often in children with poor nutrition than in children with good nutrition. The significance test tells us that the observed association cannot reasonably be attributed to chance. However, it would not be valid to conclude that poor nutrition causes low IQ, or that low IQ causes poor nutrition. There could be a third factor such as poor home environment which is responsible for both poor nutrition and low IQ.

Rigorous evidence of cause-and-effect can be obtained only from a controlled experiment in which the experimenter demonstrates that by manipulating the cause A, he can control the effect B. Randomization is an
important component of the experiment. If the subjects who received A were chosen at random, then we know what caused A, and neither (ii) nor (iii) above is a possible explanation.

For instance, suppose that we wish to demonstrate that aspirin causes a reduction in the probability of a second attack for heart attack victims. The experimental subjects should be assigned at random to either the treatment group which receives aspirin, or the control group which receives a placebo (a pill similar to aspirin in appearance and flavor, but with no active ingredients). If we allowed the patients or their doctors to choose their treatments, we could not be sure that any reduction in second attacks was actually due to the aspirin rather than to the way in which the treatments were assigned.

The following example shows the sort of difficulty that can arise when treatments are not randomly assigned.

EXAMPLE 12.7.1. In order to compare two possible treatments for the same disease, hospital records for 1100 applications of each treatment were examined. The treatment was classified as a success or failure in each application and the observed frequencies were as follows:

             Treatment 1    Treatment 2
Success           595            905
Failure           505            195
Total            1100           1100

The success rate was 82% for treatment 2 versus only 54% for treatment 1, and a significance test shows that this difference is far too great to be attributed to chance.

One might be tempted at this point to assume that the relationship was causal, and that the overall success rate could be improved if treatment 2 were always used. This isn't necessarily so! We do not know that patients receiving the two treatments were similar in other ways, and therefore we cannot rule out possibility (iii) above.

For instance, it might be that treatment 1 was given primarily to patients who were seriously ill, while treatment 2 was usually given to those less seriously ill. The breakdown into serious cases and less serious cases might be as follows:

             Less serious cases       More serious cases
             Trt 1      Trt 2         Trt 1      Trt 2
Success        95        900            500          5
Failure         5        100            500         95
Total         100       1000           1000        100

Treatment 1 has a higher success rate for both the less serious and more serious cases. The larger total number of successes with treatment 2 is due to a third factor, illness severity, which was not considered in the original table. Of course, there may be additional factors, such as the sex and age of the patient, which also affect success rates. A further breakdown of the data according to these factors may change the picture again. If just one important factor is overlooked, conclusions about the relative merits of the two treatments may be incorrect.

In a designed experiment, patients would first be grouped according to any factors such as disease severity which were expected to influence the success rate. The patients in a group would then be assigned to treatments at random. Under random assignment one would expect that any important factor which had been overlooked would be reasonably well balanced over the treatment groups. An imbalance could still occur by chance. However, if the experiment is repeated, it is very unlikely that we will again obtain an imbalance of the same sort. Thus, by randomly assigning subjects to treatment groups, and by repeating the experiment, one can guard against the presence of unsuspected factors which might invalidate the conclusions.

PROBLEMS FOR SECTION 12.7

1. Explain why it is that studies such as those in Problems 12.6.3 and 12.6.5 cannot be used to establish cause-and-effect relationships.

2. In an Ontario study, 50267 live births were classified according to the baby's weight (less than or greater than 2.5 kg) and according to the mother's smoking habits (non-smoker, 1–20 cigarettes per day, or more than 20 cigarettes per day). The results were as follows:

No. of cigarettes        0     1-20     >20
Weight ≤ 2.5          1322     1186      793
Weight > 2.5         27036    14142     5788

(a) Test the hypothesis that birth weight is independent of the mother's smoking habits.
(b) Explain why it is that these results do not prove that birth weights would increase if mothers stopped smoking during pregnancy. How should a study to obtain such proof be designed?
(c) A similar, though weaker, association exists between birth weight and the amount smoked by the father. Explain why this is to be expected even if the father's smoking habits are irrelevant.

3.† 918 freshman mathematics students were classified according to their first term average on six subjects, and according to whether or not they had written a high school mathematics competition. The results were as follows:

First term average     <50    50-59    60-69    ≥70
Wrote competition       10      46      128     289
Did not write           41      89      146     169

(a) Test the hypothesis that first term average is independent of competition status.
(b) Explain why it is incorrect to conclude that high school students can improve their prospects for first term at university by writing the competition.

4. One hundred and fifty Statistics students took part in a study to evaluate computer-assisted instruction (CAI). Seventy-five received the standard lecture course while the other 75 received some CAI. All 150 students then wrote the same examination. Fifteen students in the standard course and 29 of those in the CAI group received a mark over 80%.
(a) Are these results consistent with the hypothesis that the probability of achieving a mark over 80% is the same for both groups?
(b) Based on these results, the instructor concluded that CAI increases the chances of a mark over 80%. How should the study have been carried out in order for this conclusion to be valid?

5.† (a) The following data were collected in a study of possible sex bias in graduate admissions at a large university:

                      Admitted    Not admitted
Male applicants         3738          4704
Female applicants       1494          2827

Test the hypothesis that admission status is independent of sex. Do these data indicate a lower admission rate for females?
(b) The following table shows the numbers of male and female applicants and the percentages admitted for the six largest graduate programs in (a):

                     Men                        Women
Program    Applicants   % Admitted     Applicants   % Admitted
A              825          62             108          82
B              560          63              25          68
C              325          37             593          34
D              417          33             373          35
E              191          28             393          24
F              373           6             341           7

Test the independence of admission status and sex for each program. Does any of the programs show evidence of a bias against female applicants?
(c) Why is it that the totals in (a) seem to indicate a bias against women, but the results for individual programs in (b) do not?

12.8. Testing for Marginal Homogeneity

Although the hypothesis of independence will often be of interest, one should not assume that every contingency table automatically calls for such a test. Contingency tables can arise in many ways, and the hypothesis of interest will depend upon the situation. To illustrate this point, we give an example where one is interested in comparing the marginal probabilities P(Aᵢ) and P(Bᵢ) rather than in testing independence.

Two drugs are compared to see which of them is less likely to produce unpleasant side effects. Each of 100 subjects is given the two drugs on different occasions, and is classified according to whether or not the drugs upset his stomach. The results can be summarized in a 2 × 2 contingency table as follows:

                         Nausea with drug B   No nausea with B   Total
Nausea with drug A               38                   2            40
No nausea with A                 10                  50            60
Total                            48                  52           100

Drug B produced nausea in 48% of subjects, but drug A produced nausea in only 40% of subjects. Could this discrepancy reasonably be ascribed to chance, or is there evidence of a real difference between the two drugs?

We assume that results for different subjects are independent, so that we have n = 100 independent repetitions of an experiment with k = 4 possible outcomes. The basic model for the experiment is multinomial as in Section 12.6. However, the hypothesis of independence is not of interest in this example. One would expect a patient who experiences nausea from one drug to be more susceptible to nausea from the other drug as well. Indeed, 88 of the 100 subjects reacted in the same way to both drugs. The row and column classifications are certainly not independent, and there would be no point in testing the hypothesis of independence.

The question of interest in this example is whether the probability of nausea is the same for both drugs. Thus we consider the hypothesis of marginal homogeneity,

H: P(A) = P(B).

Since P(A) = p₁₁ + p₁₂ and P(B) = p₁₁ + p₂₁, this hypothesis is equivalent to

H: p₁₂ = p₂₁

which states that the 2 × 2 table of probabilities (pᵢⱼ) is symmetric. There will be one degree of freedom for testing H because it reduces the number of unknown parameters by 1.

Under H, the log likelihood function is

l = ΣΣ fᵢⱼ log pᵢⱼ = f₁₁ log p₁₁ + (f₁₂ + f₂₁) log p + f₂₂ log p₂₂

where p is the common value of p₁₂ and p₂₁, and p₁₁ + 2p + p₂₂ = 1. Maximizing l subject to this restriction gives

p̂₁₁ = f₁₁/n;   p̂₂₂ = f₂₂/n;   p̂₁₂ = p̂₂₁ = p̂ = ½(f₁₂ + f₂₁)/n,

and hence the expected frequencies are

e₁₁ = f₁₁;   e₂₂ = f₂₂;   e₁₂ = e₂₁ = ½(f₁₂ + f₂₁).

The following are the observed and expected frequencies for the present example:

38 (38)     2 (6)
10 (6)     50 (50)

The observed value of the LR statistic is

D_obs = 2[38 log(38/38) + 2 log(2/6) + 10 log(10/6) + 50 log(50/50)] = 5.82,

and therefore

SL ≈ P{χ²(1) ≥ 5.82} ≈ 0.02.

There is fairly strong evidence against the hypothesis of marginal homogeneity. The chance of nausea is significantly less with drug A than with drug B.

Note that, since log 1 = 0, the diagonal terms contribute nothing to the LR statistic. The hypothesis says nothing about the probabilities on the diagonal, and so the test does not depend upon the diagonal frequencies. The hypothesis says only that the off-diagonal probabilities are equal, and so the total frequency f₁₂ + f₂₁ for these cells should be divided equally between them.

An Alternative (Incorrect) Analysis

The marginal totals from the table above can be arranged to form a new 2 × 2 table:

                Drug A      Drug B
Nausea          40 (44)     48 (44)
No nausea       60 (56)     52 (56)
Total          100         100

The situation now appears similar to that in Example 12.4.1. The number of subjects Y₁ who experience nausea with drug A will have a binomial (100, p₁) distribution, and the number Y₂ with drug B will have a binomial (100, p₂) distribution. Under H: p₁ = p₂, the estimated probability of nausea is

p̂ = (40 + 48)/(100 + 100) = 0.44,

and using this estimate we obtain the expected frequencies shown in parentheses. The observed value of the LR statistic (12.4.1) is D_obs = 1.30, and hence

SL ≈ P{χ²(1) ≥ 1.30} > 0.25.

According to this analysis, there is no evidence of a real difference between the two drugs.

This alternative analysis would be correct if Y₁ and Y₂ were independent. In Example 12.4.1 there were 88 different rats, 44 for each column of the table, and so it was reasonable to assume the independence of Y₁ and Y₂. However, in the present example, the same 100 subjects received both drugs. A subject who experiences nausea with drug A is likely to be affected in a similar way by drug B, and so the results in the second column of the table are not independent of those in the first column. Thus the alternative analysis is not valid in this example.

In general, it is not valid to assume that repeated observations on the same subject are independent, and care must be taken that the analysis does not depend upon the independence assumption. See Section 13.7 for further discussion of this point.

Testing Marginal Homogeneity in Larger Tables

The hypothesis of marginal homogeneity in an a × a contingency table is

H: P(Aᵢ) = P(Bᵢ)   for i = 1, 2, ..., a.

This hypothesis implies that the matrix of pᵢⱼ's is symmetric for a = 2, but not for a > 2. Numerical methods are required to determine the expected frequencies under H when a > 2. Once these have been obtained, the hypothesis may be tested using the LR statistic (12.6.1). Since H reduces the number of unknown parameters by a − 1, there will be a − 1 degrees of freedom for the test.

PROBLEMS FOR SECTION 12.8

1. Consider the function

l(α, β, γ) = a log α + b log β + c log γ,

where α + 2β + γ = 1. Show that this function is maximized for α̂ = a/n, β̂ = b/2n, γ̂ = c/n, where n = a + b + c.

2. (a) A random sample of 10000 people was taken from the Canadian labor force. Of these, 523 were unemployed. Obtain an approximate 95% confidence interval for the proportion of unemployed in the Canadian labor force.
(b) A second random sample of 10000 people was taken from the Canadian labor force one year later. This time 577 were found to be unemployed. Is there conclusive evidence that the overall unemployment rate has changed?
(c) Suppose that, instead of choosing a second random sample, the same 10000 people had been re-interviewed one year later. Why would the test in (b) no longer be appropriate?

3.† Of 400 randomly chosen electors in a riding, 212 said that they supported government policy and 188 were opposed. Soon after this a new budget was introduced and the same 400 electors were re-interviewed. There were found to be 196 who now supported government policy, including 17 who had previously been opposed.
(a) Explain why it would not be valid to carry out a test for independence in the following table:

                  Support    Opposed
Before budget       212        188
After budget        196        204

(b) Another way to tabulate the data is as follows:

                   Support after    Opposed after
Support before          179               33
Opposed before           17              171

Carry out a test for independence in this table, and carefully explain what the result means.
(c) Test the hypothesis that the proportion of voters who support government policy is the same after the budget as it was before the budget.

12.9. Significance Regions

In Section 11.4 we defined confidence intervals and suggested that they be constructed from the likelihood function. In this section we consider another construction based on a test of significance.

Suppose that the model involves a single unknown parameter θ, and that we have a test of significance for the hypothesis θ = θ₀. The significance level will depend upon which parameter value is tested, so we can think of SL as a function of θ. If SL(θ₀) is near 1, H: θ = θ₀ is in good agreement with the data and θ₀ is a "reasonable" parameter value. If SL(θ₀) is near 0, H: θ = θ₀ is strongly contradicted by the data, and θ₀ is not a reasonable parameter value. The significance level, considered as a function of θ, gives a ranking of possible parameter values.

Intervals or regions of parameter values can be obtained from SL(θ) in the same way that likelihood regions are obtained from R(θ). The set of parameter values such that SL(θ) ≥ p is called a 100p% significance region for θ. Significance regions are also called consonance regions. See Kempthorne and Folks, Probability, Statistics, and Data Analysis, Iowa State University Press (1971).

The 5% significance region for θ consists of all parameter values θ₀ such that SL(θ₀) ≥ 0.05. Usually this will be an interval. Any parameter value θ₀ inside this region is compatible with the data at the 5% level because a test of H: θ = θ₀ gives a significance level of 5% or more. Any parameter value θ₁ outside this region is contradicted by the data at the 5% level because a test of H: θ = θ₁ gives a significance level less than 5%.

EXAMPLE 12.9.1. Suppose that X has a binomial (n, θ) distribution. The expected value of X under H: θ = θ₀ is nθ₀, and so we might choose D = |X − nθ₀| as the test statistic (see Example 12.1.1). Given an observed value x, the significance level is

SL(θ₀) = P{|X − nθ₀| ≥ |x − nθ₀|}.

For n large, the normal approximation to the binomial distribution gives

(X − nθ₀)/√(nθ₀(1 − θ₀)) ≈ N(0, 1),

and hence the square of this quantity is approximately χ²(1). It follows that

SL(θ₀) ≈ P{χ²(1) ≥ (x − nθ₀)²/[nθ₀(1 − θ₀)]}.

The approximate 5% significance interval for θ based on this test contains all parameter values θ₀ such that SL(θ₀) ≥ 0.05. Since P{χ²(1) ≥ 3.841} = 0.05, we have SL ≥ 0.05 if and only if

(x − nθ₀)²/[nθ₀(1 − θ₀)] ≤ 3.841
⟺ (θ̂ − θ₀)² ≤ 3.841 θ₀(1 − θ₀)/n

where θ̂ = x/n. The endpoints of the interval are thus the roots of a quadratic equation

(θ̂ − θ)² = 3.841 θ(1 − θ)/n.

For instance, suppose that we observe X = 35 in n = 100 trials as in Example 12.2.1. Then θ̂ = 0.35, and the equation is

(0.35 − θ)² = 0.03841 θ(1 − θ).

Its roots are θ = 0.2636 and θ = 0.4474, and so the approximate 5% significance interval for θ based on the above test is 0.2636 ≤ θ ≤ 0.4474.

Alternatively, we could use the likelihood ratio statistic

D′ = −2r(θ₀) = 2x log[x/(nθ₀)] + 2(n − x) log[(n − x)/(n(1 − θ₀))]

as in Example 12.2.1. Since D′ ≈ χ²(1) for n large, we have

SL(θ₀) ≈ P{χ²(1) ≥ −2r(θ₀)}.
We then have SL ≥ 0.05 if and only if

−2r(θ₀) ≤ 3.841.

Solving for n = 100, x = 35 gives 0.2612 ≤ θ ≤ 0.4464 as the approximate 5% significance interval for θ. This is also a 14.7% likelihood interval and an approximate 95% confidence interval (see Section 11.4).

In order to find an exact 5% significance interval, we would need to evaluate SL(θ₀) by summing binomial (n, θ₀) probabilities as in Example 12.2.1. We would repeat this calculation for several values of θ₀, and by trial and error find the range of parameter values θ₀ such that SL(θ₀) ≥ 0.05. The two test statistics D, D′ will give slightly different intervals.

Coverage Probabilities of Significance Regions

As in Section 11.1, we imagine a very large number of repetitions of the experiment, with θ having the same unknown value in all repetitions. At each repetition we compute a 5% significance region for θ using a significance test of H: θ = θ₀ with test statistic D, say. The coverage probability CP is the proportion of these regions which contain the true value of θ.

The true value θ₀, say, belongs to the 5% significance region if and only if a test of H: θ = θ₀ gives SL ≥ 0.05.

If D is a continuous variate, there exists a variate value d such that P(D ≥ d) = 0.05. The significance level will be 5% or more if and only if the observed value of D is at most d. Thus we have

CP(θ₀) = P(SL ≥ 0.05) = P(D ≤ d) = 1 − P(D > d).

Since D is continuous, P(D = d) is zero, and therefore

CP(θ₀) = 1 − P(D ≥ d) = 0.95.

The coverage probability is exactly 95% for all parameter values θ₀. Therefore the 5% significance region is a 95% confidence region in the continuous case.

If the probability model for the experiment is discrete then D will be a discrete variate, and there usually will not exist a variate value d such that P(D ≥ d) = 0.05. Instead we take d to be the variate value such that P(D ≥ d) ≥ 0.05 and P(D > d) < 0.05. The significance level will be 5% or more if and only if the observed value of D is at most d. Thus

CP(θ₀) = P(SL ≥ 0.05) = P(D ≤ d) = 1 − P(D > d).

It follows that CP(θ₀) > 0.95 for all parameter values θ₀. The exact coverage probability of the 5% significance region will usually depend upon θ₀ in the discrete case, and it is always greater than 95%.

In general, the coverage probability of a 100p% significance region for θ is exactly 1 − p if D is continuous, and greater than 1 − p if D is discrete.

Construction of Confidence Regions

In Section 11.4 we constructed confidence regions from the likelihood function. For instance, the 95% confidence region was taken to be the 100p% likelihood region where p was chosen to give coverage probability 0.95. Regions constructed in this way have the property that each parameter value inside the region is more likely than each parameter value outside the region.

The above results on coverage probability suggest a second method of construction. We begin with a test of the hypothesis θ = θ₀ with test statistic D, say. Using this test, we determine the 5% significance region for θ and take this as the 95% confidence region. Regions constructed in this way have the property that parameter values included are compatible with the data at the 5% level, while values excluded are contradicted by the data at this level.

Significance levels, and hence significance regions, depend upon the particular test statistic D which is used in the significance test. If a different test statistic is used, the 5% significance region for θ will generally change. Likelihood regions do not depend upon the choice of a test statistic. For this reason, it seems preferable to take confidence regions directly from the likelihood function as suggested in Section 11.4. However the second construction using a significance test is widely used.

Significance Regions from Likelihood Ratio Tests

We now have two methods for obtaining confidence regions from the likelihood function. The first method is to take a likelihood region with the desired coverage probability. The second method is to obtain a significance region from the likelihood ratio test of H: θ = θ₀. Under what conditions will these two constructions produce the same region?

The likelihood ratio statistic for testing H: θ = θ₀ is

D = 2[l(θ̂) − l(θ₀)] = −2r(θ₀).

Let d_p = d_p(θ₀) be the largest value of D such that

P{D ≥ d_p(θ₀) | θ₀ is the true value} ≥ p.

Then θ₀ belongs to the 100p% significance region if and only if the observed value of D is at most d_p(θ₀). Thus the 100p% significance region is given by the inequality

−2r(θ₀) ≤ d_p(θ₀).

This defines a likelihood region if and only if d_p(θ₀) does not depend upon θ₀. If the distribution of D is the same for all θ₀, then d_p does not depend upon θ₀. Every significance region obtained from the likelihood ratio test will be a likelihood region, and the two constructions will agree. This is the case in large samples when D ≈ χ²(1) for all θ₀ (see Example 12.9.1).

If the distribution of D depends upon θ₀, as it usually will in examples with discrete distributions, then d_p will generally depend upon θ₀. Significance regions obtained from the likelihood ratio test need not be likelihood regions, and the two constructions will usually give slightly different results. Note the similarity with the results on coverage probabilities in Section 12.8.

EXAMPLE 12.9.1 (continued). Consider again the binomial distribution example with n = 100 and X observed to be 35. We shall show that the exact 5.4% significance interval obtained from the likelihood ratio test of H: θ = θ₀ is not a likelihood interval.

The 14.7% likelihood interval (approximate 95% confidence interval) for θ is given by

−2r(θ) ≤ 3.841

and solving gives 0.26117 ≤ θ ≤ 0.44642. The two endpoints of this interval have equal relative likelihoods.

An exact likelihood ratio test of H: θ = 0.26117 can be carried out as in Example 12.2.1, and the significance level is found to be 0.052. Similarly, an exact test of H: θ = 0.44642 gives SL = 0.056. Thus the exact 5.4% significance interval for θ will contain θ = 0.44642, but it won't contain the equally likely value θ = 0.26117. It follows that the 5.4% significance interval is not a likelihood interval.

*12.10. Power

*This section may be omitted on first reading.

This section briefly introduces a theory of test statistics. This theory is based on the concept of the power or sensitivity of a test statistic against an alternative hypothesis. Power comparisons may be helpful in a theoretical comparison of several possible test statistics to determine which of them is more likely to detect departures of a particular type.

Consider a test of the simple hypothesis H₀, with test statistic D. The significance level of outcome x in relation to H₀ is

SL = P(D ≥ d | H₀ is true)

where d = D(x) is the observed value of D.

If D is a continuous variate, it is possible to obtain any significance level between 0 and 1. However if D is discrete there will be only a discrete set of possible significance levels corresponding to the possible values of D. If there exists a variate value d_α such that P(D ≥ d_α | H₀) = α, then α is called an achievable significance level. Two test statistics are called comparable if they have the same set of achievable significance levels.

The size α critical region of a test is the set C_α of outcomes x for which SL ≤ α. If α is achievable, then x ∈ C_α if and only if D(x) ≥ d_α. It follows that, for any achievable α,

P(X ∈ C_α | H₀ is true) = P(D ≥ d_α | H₀) = α.

Now let H₁ denote another hypothesis which is chosen to represent the kind of departure from H₀ that we wish to detect. H₀ and H₁ are called the null hypothesis and the alternative hypothesis, respectively. Initially we assume that both H₀ and H₁ are simple hypotheses, so that the probability of any outcome x can be computed numerically under H₀ and under H₁.

The size α power (or sensitivity) of a test statistic D with respect to the simple alternative hypothesis H₁ is

K_α = P{SL ≤ α | H₁ is true} = P{X ∈ C_α | H₁}.

For instance, K₀.₀₅ is the probability that a test of H₀ using D will produce a significance level of 5% or less if in fact H₁ is true. If K₀.₀₅ is near 1, the test statistic D is said to be powerful or sensitive against H₁, because if H₁ were true the test would almost surely give evidence that H₀ is false.

Now let D, D′ be two comparable statistics for testing H₀ with powers K_α, K′_α against H₁. D is said to be more powerful than D′ against H₁ if K_α ≥ K′_α for all achievable significance levels α. A statistic D is called most powerful for testing H₀ against H₁ if it is more powerful than every comparable statistic D′.

EXAMPLE 12.10.1. Let X ~ N(μ, 1), and consider a test of H₀: μ = 0 against H₁: μ = 2. Two possible statistics for testing μ = 0 are D = X and D′ = |X|. With test statistic D, only large positive values of X are considered to be in poor agreement with μ = 0, whereas with D′ both large positive and large negative values of X are considered as evidence against μ = 0. Both D and D′ are continuous variates, and therefore all significance levels are achievable for both statistics.

The size α critical region for D has the form X ≥ d_α where d_α is chosen so that

P{X ≥ d_α | H₀ is true} = α.

Since X is N(0, 1) under H₀, d_α is the value such that F(d_α) = 1 − α where F is the standardized normal c.d.f. The size α power of D with respect to H₁ is

K_α = P{X ≥ d_α | H₁ is true} = P{X ≥ d_α | X ~ N(2, 1)}
    = P{Z ≥ d_α − 2 | Z ~ N(0, 1)} = 1 − F(d_α − 2).

The size α critical region for D′ has the form |X| ≥ d′_α where d′_α is chosen so that

P{|X| ≥ d′_α | H₀ is true} = α.

Since X is N(0, 1) under H₀, d′_α is the value such that F(d′_α) = 1 − α/2. The size α
power of D′ with respect to H₁ is

K′_α = P{|X| ≥ d′_α | H₁ is true} = 1 − P{|X| < d′_α | X ~ N(2, 1)}
     = 1 − P{−d′_α − 2 ≤ Z ≤ d′_α − 2 | Z ~ N(0, 1)}
     = 1 − F(d′_α − 2) + F(−d′_α − 2).

For α = 0.05 we find from Table B2 that d_α = 1.645 and d′_α = 1.960. Thus we have

K₀.₀₅ = 1 − F(−0.355) = F(0.355) = 0.64;
K′₀.₀₅ = 1 − F(−0.040) + F(−3.960) = 0.52.

If μ = 2 and we test the hypothesis μ = 0, the probability of getting SL ≤ 0.05 is 0.64 with statistic D, but only 0.52 with statistic D′. Thus D gives us a better chance of obtaining evidence against H: μ = 0 when in fact μ = 2.

It can be shown that K_α ≥ K′_α for all values of α, so that D is more powerful than D′ for testing H₀: μ = 0 versus H₁: μ = 2. In fact, it follows from the theorem below that D is the most powerful statistic for testing μ = 0 against μ = 2.

Most Powerful Test when H₀ and H₁ are Simple

The following theorem, which is called the Neyman–Pearson Fundamental Lemma, yields a most powerful test statistic when both H₀ and H₁ are simple hypotheses.

Theorem 12.10.1. Let H₀ and H₁ be simple hypotheses, and let f₀(x) and f₁(x) denote the probability of a typical outcome x under H₀ and under H₁, respectively. Then the statistic

D(x) = f₁(x)/f₀(x)                                             (12.10.1)

is most powerful for testing H₀ against H₁.

PROOF. Let α be an achievable significance level for D, and let d_α be the value of D such that P(D ≥ d_α | H₀) = α. The size α critical region C_α is the set of x-values for which D(x) ≥ d_α. Note that, by (12.10.1), we have

f₁(x) ≥ d_α f₀(x)   for x ∈ C_α;                               (12.10.2)
f₁(x) < d_α f₀(x)   for x ∈ C̄_α.                               (12.10.3)

Let C′_α be the size α critical region for any comparable test statistic D′, and consider the partition of the sample space into four disjoint regions

S = (C_α C′_α) ∪ (C_α C̄′_α) ∪ (C̄_α C′_α) ∪ (C̄_α C̄′_α).

We use p's to denote the probabilities of these regions under H₀, and q's to denote their probabilities under H₁ (see Table 12.10.1).

Table 12.10.1. Probabilities Under the Null Hypothesis and Under the Alternative Hypothesis for the Four Regions of the Sample Space Defined by Two Size α Critical Regions

H₀        C′_α      C̄′_α      Total        H₁        C′_α       C̄′_α        Total
C_α       p₁₁       p₁₂       α            C_α       q₁₁        q₁₂         K_α
C̄_α       p₂₁       p₂₂       1 − α        C̄_α       q₂₁        q₂₂         1 − K_α
Total     α         1 − α                  Total     K′_α       1 − K′_α

Since C_α and C′_α are size α critical regions, we have

p₁₁ + p₁₂ = P(X ∈ C_α | H₀) = α = P(X ∈ C′_α | H₀) = p₁₁ + p₂₁,

and hence p₁₂ = p₂₁. The size α power is

K_α = P(X ∈ C_α | H₁) = q₁₁ + q₁₂   for D;
K′_α = P(X ∈ C′_α | H₁) = q₁₁ + q₂₁   for D′,

and the difference in power is

K_α − K′_α = q₁₂ − q₂₁.

Since C_α C̄′_α is a subset of C_α, (12.10.2) gives

q₁₂ = Σ f₁(x) ≥ d_α Σ f₀(x) = d_α p₁₂

where the sums are taken over x ∈ C_α C̄′_α. Similarly, since C̄_α C′_α is a subset of C̄_α, (12.10.3) gives

q₂₁ = Σ f₁(x) ≤ d_α Σ f₀(x) = d_α p₂₁.

Now, since p₁₂ = p₂₁, we have

K_α − K′_α = q₁₂ − q₂₁ ≥ d_α p₁₂ − d_α p₂₁ = 0.

This result holds for all comparable statistics D′ and achievable significance levels α, and hence the theorem follows.

EXAMPLE 12.10.2. Let X ~ N(μ, 1), and consider a test of the simple null hypothesis H₀: μ = μ₀ versus the simple alternative hypothesis H₁: μ = μ₁. The theorem gives

D(x) = f₁(x)/f₀(x) = exp{−½(x − μ₁)² + ½(x − μ₀)²}
     = exp{x(μ₁ − μ₀) + ½(μ₀² − μ₁²)}

as a most powerful statistic for testing H₀ against H₁.

If μ₁ > μ₀, large values of D correspond to large values of X. The size α critical region has the form X ≥ b_α where b_α is chosen so that

P{X ≥ b_α | H₀ is true} = α.
Since X ~ N(µ0, 1) under H0, we find that b_α = µ0 + z_α, where z_α is the value exceeded with probability α in a standardized normal distribution. The size α power is

K_α = P{X ≥ b_α | H1} = P{X ≥ b_α | X ~ N(µ1, 1)}
    = P{Z ≥ b_α − µ1 | Z ~ N(0, 1)}
    = 1 − F(µ0 + z_α − µ1)

where F is the c.d.f. of N(0, 1). Similarly, if µ1 < µ0, the critical region has the form X ≤ a_α, and the size α power is found to be

K_α = 1 − F(µ1 + z_α − µ0).

In accepting only large values of X as evidence against H: µ = µ0, we achieve maximum power against departures on the high side (µ1 > µ0), but we lose the ability to detect departures on the low side (µ1 < µ0). The cost of increased sensitivity to one particular type of departure is a decrease in sensitivity to other types. This point was discussed previously in Example 12.5.2.
The likelihood ratio statistic for testing H: µ = µ0 is

D' = (X − µ0)²,

which ranks outcomes according to the magnitude of |X − µ0|. The LR statistic is not most powerful for testing µ = µ0 against any particular alternative value µ1, but it does have reasonably high power for departures in both directions.

Composite Alternative Hypothesis

Now consider a slightly more general problem in which H0 is simple but H1 is composite. Suppose that a typical outcome x has probability f(x; θ), where θ is a real-valued parameter. The null hypothesis is taken to be H0: θ = θ0. The alternative hypothesis has the form H1: θ ∈ Ω1, where Ω1 is a set of possible parameter values; for instance H1: θ ≠ θ0 and H1: θ > θ0 have this form.
Given any particular value θ1 ∈ Ω1, a most powerful statistic for testing θ = θ0 versus θ = θ1 is given by

D(x) = f(x; θ1)/f(x; θ0).

If this statistic defines the same ranking of outcomes (i.e. the same critical region) for all θ1 ∈ Ω1, then D is uniformly most powerful for testing H0 versus H1.
In Example 12.10.2, one obtains the same ranking of outcomes from smallest (most favorable) to largest (least favorable) whenever µ1 > µ0. Hence there exists a uniformly most powerful statistic for testing H0: µ = µ0 versus H1: µ > µ0. Similarly, there exists a uniformly most powerful statistic for testing H0: µ = µ0 versus H1: µ < µ0. However there is no uniformly most powerful statistic for testing H0: µ = µ0 versus H1: µ ≠ µ0, because a different ranking is obtained for µ1 < µ0 than for µ1 > µ0.
A similar result can be established for any distribution belonging to the exponential family (Example 15.1.6).

Discussion

1. We have seen that there generally will not exist a statistic which is uniformly most powerful for testing H0: θ = θ0 against a two-sided alternative θ ≠ θ0. In fact, uniformly most powerful tests will rarely exist except in simple textbook examples. To make further progress in defining a theoretically optimum test, additional restrictions must be placed on the types of test to be considered. The restrictions usually suggested seem arbitrary and unconvincing. The situation is even less satisfactory when both the null and alternative hypotheses are composite, and we shall not give details here.

2. Although power considerations will not identify an optimum test statistic except in very special cases, a comparison of power may still be helpful in choosing between two test statistics D and D'. Given a statistic D for testing H0: θ = θ0 against H1: θ = θ1, one can determine the size α power as a function of θ1,

K_α(θ1) = P(SL ≤ α | θ = θ1).

A graphical comparison of the power functions K_α(θ) and K'_α(θ) for selected values of α may suggest that one of the statistics is preferable.

3. Another use to which power has been put is the determination of sample size. Suppose that we intend to test H: θ = θ0 using a test statistic D, and that we want to be 90% sure of obtaining a significance level of 5% or less if in fact θ = θ1. Then the sample size n should be selected so that K_0.05(θ1) = 0.9. For another approach to experimental planning, see the discussion of expected information in Section 11.6.
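As an illustration of point 3, the following sketch (not from the text) determines a sample size for the normal-mean case with σ known, testing H0: µ = µ0 against the one-sided alternative µ1 > µ0 with D = Ȳ. The values of µ0, µ1, and σ are hypothetical.

    # Sketch: choose the smallest n with K_0.05(mu1) >= 0.9.
    from scipy.stats import norm

    mu0, mu1, sigma = 0.0, 0.5, 1.0          # hypothetical values
    alpha, wanted_power = 0.05, 0.90
    z_alpha = norm.ppf(1 - alpha)

    def power(n):
        # critical region: Ybar >= mu0 + z_alpha * sigma / sqrt(n)
        b = mu0 + z_alpha * sigma / n ** 0.5
        return 1 - norm.cdf((b - mu1) / (sigma / n ** 0.5))

    n = 1
    while power(n) < wanted_power:
        n += 1
    print(n, round(power(n), 3))   # n = 35 for these hypothetical values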
4. Power comparisons are not likely to be very useful unless one can be quite specific with respect to the alternative hypothesis. In many applications of significance tests, one will have only a vague idea concerning the types of departure that may occur. One would like to avoid building elaborate models to explain these until the need for them has been demonstrated in a test of significance.
CHAPTER 13

Analysis of Normal Measurements

The normal distribution plays a central role in the modelling and statistical analysis of continuous measurements. Many types of measurements have distributions which are approximately normal, and the Central Limit Theorem helps to explain why this is so. Statistical methods for analyzing normally distributed measurements are relatively simple, and most of these methods give reasonable results under moderate departures from normality.
Section 1 discusses the basic assumptions and describes the models to be considered in later sections. All of these are examples of normal linear models, which will be discussed in greater generality in Chapter 14. Section 13.2 describes statistical methods for such models. These methods are applied to the one-sample and two-sample models in Sections 3 and 4, and to the straight line model in Sections 5 and 6. Section 7 discusses the analysis of paired measurements, such as measurements taken on the same subject before and after treatment.

13.1. Introduction

Suppose that n determinations y1, y2, ..., yn are made of the same quantity y under various different conditions. For instance, gasoline mileages achieved by a car over a fixed distance might be recorded for several driving speeds, weather conditions, etc. We wish to formulate a model which describes or explains the way in which y depends upon these conditions. Hopefully the model will help us to understand how the various factors affect mileage, and to estimate the magnitudes of their effects.
Any realistic model will have to take natural variability into account. If we measure mileages repeatedly under conditions that are identical, or as close to identical as we can make them, we will not always get exactly the same result. There will be scatter, or variability, in observations made under identical conditions. We model this by assuming that the observations y1, y2, ..., yn are observed values of random variables Y1, Y2, ..., Yn. The problem is then to determine how the probability distribution of Yi depends upon the conditions under which this observation was made.
If the conditions are very different for Yi than for Yj, the probability distributions for Yi and Yj may be of completely different types. For instance, suppose that we are observing failure times of plastic gears at various temperatures. Gears fail due to melting at very high temperatures, whereas at low temperatures they become brittle and tend to fracture. There is no reason to suppose that the distributions of lifetimes will be similar at these two extremes.
In most studies we deal with relatively small changes in conditions. Then we expect the distributions of Y1, Y2, ..., Yn to be similar to one another. Thus we might reasonably assume that the Yi's all have the same type of distribution, and that the effect of changing conditions is to alter the value of a parameter in this distribution. This is the sort of assumption we made in Section 10.5, where we were examining the dependence of the response rate on the dose of a drug. We assumed that all of the Yi's were independent and binomially distributed, and that the only effect of changing the dose was to alter the response probability p.

The Basic Assumptions

In this chapter and the next one, we develop the model and analysis under the assumption that the Yi's are independent and normally distributed with the same variance σ², so that

Yi ~ N(µi, σ²)  for i = 1, 2, ..., n.    (13.1.1)

Under these assumptions, the effect of changing conditions is to alter µi, the expected value of Yi, but the shape and spread of the distribution are not affected.
The model can also be written in terms of n independent error variables ε1, ε2, ..., εn, where

εi = Yi − µi ~ N(0, σ²).

We can then write Yi as the sum of a "target value" µi and a "random error" or "noise" component εi:

Yi = µi + εi  where εi ~ N(0, σ²).    (13.1.2)

If we take repeated measurements under the same conditions, then µi stays the same, but the random error εi varies from one repetition to the next.
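As a quick numerical illustration of (13.1.2), the following sketch (hypothetical target values, not part of the text) simulates repeated measurements: the µi stay fixed from one repetition to the next, while the random errors change.

    # Sketch: y_i = mu_i + e_i with e_i ~ N(0, sigma^2).
    import numpy as np

    rng = np.random.default_rng(1)
    mu = np.array([20.0, 22.0, 24.0, 26.0])   # hypothetical target values
    sigma = 1.5

    for repetition in range(3):
        e = rng.normal(0.0, sigma, size=mu.shape)   # random error component
        y = mu + e                                  # observed measurements
        print(np.round(y, 2))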

The standard deviation σ measures the amount of random variability (scatter, noise) that one would expect to obtain in repeated measurements taken under the same conditions. Suppose that Yi and Yi′ are independent measurements with the same expected value:

Yi ~ N(µi, σ²);  Yi′ ~ N(µi, σ²)  independent.

Then by (6.6.7) we have

Yi′ − Yi ~ N(0, 2σ²).

After standardizing, we consult Table B2 to obtain

P{|Yi′ − Yi| > σ} = P{|Z| > 1/√2} ≈ 0.48.

If the experiment is repeated under identical conditions, we expect about half the measurements to change by more than σ.
If σ = 0, then εi = 0 and Yi = µi with probability one. In this case, repeating the experiment would produce exactly the same measurements y1, y2, ..., yn. Most real experiments do involve some natural variability, and so σ > 0.
We are assuming that σ is the same for all conditions under which observations are being taken. This means that all n measurements y1, y2, ..., yn are made with the same precision, and therefore they should be treated equally in the analysis. If the Yi's had unequal variances, it would be necessary to modify the analysis so that greater weight was given to the more precise observations. This can be done easily provided that all of the variance ratios var(Yi)/var(Yj) are known.
Note that we are assuming the independence of Y1, Y2, ..., Yn. Whether or not this is a reasonable assumption depends upon the way in which the observations are made. For instance, it is usually not appropriate to assume that repeat measurements on the same subject or specimen are independent (see Section 13.7).
The normality assumption is less critical than the assumptions of independence and equal variances. Most of the methods developed for normally distributed measurements will give reasonable results under moderate departures from normality. The normal distribution provides a good first approximation in many applications, and the Central Limit Theorem (Section 6.7) helps to explain why this is so.
If the original measurements are decidedly non-normal, it may still be possible to use the normal analysis, provided that a suitable nonlinear transformation is first applied to the yi's. For instance, lifetime distributions usually have a long tail to the right, and are far from normal in shape. Also, there is usually more variability in the data under conditions which produce long lifetimes than under conditions which produce short lifetimes. It would be inappropriate to assume normality and equal variances in such cases. A common practice is to apply the normal analysis to the logarithms of the observed lifetimes. The distribution of log-lifetimes is generally much closer to normal in shape, and the log transformation also helps to stabilize the variance.

Assumptions Concerning µ1, µ2, ..., µn

The basic model (13.1.1) involves n + 1 parameters µ1, µ2, ..., µn and σ. It is not possible to estimate all n + 1 parameters with only n observations. Unless some of the parameter values are known, it is necessary to simplify the model by reducing the number of unknown parameters. We do this by expressing the means µ1, µ2, ..., µn as functions of q parameters, where q < n. The form of the assumptions concerning µ1, µ2, ..., µn will depend upon the conditions under which the observations were made.
In the simplest case, we have n measurements y1, y2, ..., yn which were all taken under the same conditions. The yi's should differ from one another only because of random variation, and so we assume that

µ1 = µ2 = ··· = µn = α.

In this case the µi's are written as functions of a single unknown parameter α, so q = 1. This is the one-sample model (see Section 13.3).
Alternatively, we might have two groups or samples of measurements, for instance data for males and data for females. We might be willing to assume that observations belonging to the same sample differ from one another only because of random error. We could then take

µi = α  for measurements in group 1;
µi = α + β  for measurements in group 2.

The µi's are expressed as functions of two unknown parameters, so q = 2. This is the two-sample model which will be considered in Section 13.4.
In Sections 13.5 and 13.6 we shall consider the straight-line model

µi = α + βxi  for i = 1, 2, ..., n,

where x1, x2, ..., xn are known constants. For instance, this model might be appropriate if the yi's were blood pressure measurements for n subjects, and the xi's were their ages. The µi's are functions of two unknown parameters α and β, so again we have q = 2.

Linear Models

Each of the models described above has the property that the µi's are expressed as linear functions of the q unknown parameters α, β, .... Models with this property are called linear models (see Chapter 14).
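The linearity just described can be made concrete by writing the vector of means as a model matrix times the unknown parameters. The following sketch (hypothetical sizes and x-values, not part of the text) builds the matrices for the three models above.

    # Sketch: one-sample, two-sample, and straight-line models all have mu = X @ theta.
    import numpy as np

    n = 6
    X_one = np.ones((n, 1))                        # mu_i = alpha            (q = 1)

    group2 = np.array([0, 0, 0, 1, 1, 1])          # first 3 obs in group 1
    X_two = np.column_stack([np.ones(n), group2])  # mu_i = alpha + beta*I   (q = 2)

    x = np.array([36., 42., 47., 55., 63., 72.])   # hypothetical x-values
    X_line = np.column_stack([np.ones(n), x])      # mu_i = alpha + beta*x_i (q = 2)

    for name, X in [("one-sample", X_one), ("two-sample", X_two), ("straight line", X_line)]:
        print(name, X.shape)   # each column multiplies one unknown parameter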
Because all of the models considered in this chapter are linear models, their analysis is very similar. The general form of the analysis will be described in the next section. Before proceeding, the reader may wish to review the material on the χ², t, and F distributions in Sections 6.9 and 6.10.

13.2. Statistical Methods

In this section, we describe statistical methods for normal linear models. These methods will be applied to the one-sample, two-sample, and straight line models in the following sections. See Chapter 14 for derivations and additional examples of normal linear models.
Under the basic model (13.1.1) or (13.1.2), the n measurements y1, y2, ..., yn are observed values of independent random variables Y1, Y2, ..., Yn, where

Yi ~ N(µi, σ²)  for i = 1, 2, ..., n.

The joint p.d.f. of Y1, Y2, ..., Yn is

f(y1, y2, ..., yn) = Π f(yi) = Π [1/(σ√(2π))] exp{−(yi − µi)²/(2σ²)}
                  = [1/(σ√(2π))]ⁿ exp{−Σ(yi − µi)²/(2σ²)}.

If the measurement intervals are all small (see Section 9.4), the log likelihood function is

l = −n log σ − (1/(2σ²)) Σ(yi − µi)².    (13.2.1)

We also assume that the µi's can be written as linear functions of q unknown parameters α, β, .... The number of unknown parameters and the equations relating the µi's to α, β, ... will depend upon the situation (see Section 13.1).
To find the MLE's α̂, β̂, ..., we substitute for the µi's in (13.2.1) and maximize. In order to maximize l by choice of α, β, ..., we need to minimize the sum of squares

S = Σ(yi − µi)² = Σεi²    (13.2.2)

where εi = yi − µi is the error associated with the ith measurement. Because α̂, β̂, ... are chosen to minimize the sum of squares of the errors, they are also referred to as least squares estimates.
Note that S does not depend upon σ. Since α̂, β̂, ... may be found by minimizing S, they do not depend upon σ. The MLE's of α, β, ... are the same whether or not the value of σ is known. For this reason, we can treat σ as known in obtaining α̂, β̂, ..., and then adjust later in the analysis for the case in which σ is unknown.
Since the µi's are assumed to be linear functions of α, β, ..., the sum of squares S is of the second degree in these parameters, and the derivatives of S are linear in α, β, .... Thus we can find α̂, β̂, ... by solving the simultaneous linear equations

∂S/∂α = 0;  ∂S/∂β = 0;  ....    (13.2.3)

Iterative methods are not required with normal linear models.
The equations (13.2.3) will be linear in the measurements y1, y2, ..., yn as well as in the parameters α, β, .... As a result, each of the estimates α̂, β̂, ... will be a linear combination of the yi's. We shall use this fact below in discussing significance tests and confidence intervals.

Fitted Values and Residuals

Each of the µi's is a linear function of the parameters α, β, .... Upon replacing these parameters by their estimates α̂, β̂, ..., we obtain µ̂1, µ̂2, ..., µ̂n, the MLE's of µ1, µ2, ..., µn. These are called the fitted values.
The difference between the ith measurement yi and its estimated mean (fitted value) µ̂i is

êi = yi − µ̂i.

This is called the ith residual, and it provides an estimate of εi = yi − µi, the random error associated with the ith measurement.
The n residuals ê1, ê2, ..., ên provide two types of information: information about the adequacy of the model, and information about the magnitude of σ. The adequacy of the model is assessed by plotting the residuals and examining them for unusual patterns. See Section 14.5 for a discussion of residual plots.
Inferences about σ are based on the residual sum of squares, Σêi². It can be shown that, if the model is correct, then

(1/σ²) Σêi² ~ χ²(n−q)    (13.2.4)

where q is the number of unknown parameters α, β, ... in the model. Furthermore, Σêi² is distributed independently of α̂, β̂, .... See Section 14.6 for derivations of these results.
Maximizing the full log likelihood function (13.2.1) over σ and α, β, ... gives σ̂² = (1/n) Σêi². This estimate is unsatisfactory when q is large. It does not allow for the fact that, since q parameters α, β, ... are estimated from the data, there are effectively only n − q observations for estimating σ² (see Section 10.8).

The usual estimate of σ² is

s² = (1/(n − q)) Σêi²;    (13.2.5)

that is, the numerator of the χ² quantity (13.2.4) divided by its degrees of freedom. We shall show at the end of this section that s² is the MLE of σ² based on the marginal distribution of Σêi².

Inferences for α, β, ...: σ² Known

We noted earlier that each of the estimates α̂, β̂, ... is a linear combination of the yi's. Thus we have

α̂ = a1y1 + a2y2 + ··· + anyn

for some constants a1, a2, ..., an. Since the yi's are assumed to be observed values of independent normal variates, it follows by (6.6.7) that the sampling distribution of α̂ is normal. It can be shown that E(α̂) = α, and (5.5.7) gives

var(α̂) = Σai² var(Yi) = σ²c

where c = Σai². It follows that

Z ≡ (α̂ − α)/√(σ²c) ~ N(0, 1).    (13.2.6)

If σ² is known, inferences about α are based on (13.2.6). To test H: α = α0, we compute the observed value of Z when α = α0, and then find

SL = P{|Z| ≥ |Zobs|} = P{χ²(1) ≥ Zobs²}.

It can be shown that the likelihood ratio statistic for testing H: α = α0 is Z², and so the procedure just described is the likelihood ratio test.
To construct a 95% confidence interval for α when σ is known, we note from Table B1 or B2 that

P{−1.96 ≤ Z ≤ 1.96} = 0.95.

Substituting for Z and solving gives

α̂ − 1.96√(σ²c) ≤ α ≤ α̂ + 1.96√(σ²c)

as the 95% confidence interval. This is also a 5% significance interval: it consists of all parameter values α0 such that a likelihood ratio test of H: α = α0 gives SL ≥ 0.05. It is also a likelihood or maximum likelihood interval for α.

Inferences for α, β, ...: σ² Unknown

Usually σ² is unknown, and is estimated by s² as defined in (13.2.5). By (13.2.4) we have

V ≡ (n − q)s²/σ² ~ χ²(n−q).

Since Σêi² is distributed independently of α̂, it follows that V is distributed independently of Z in (13.2.6).
When σ² is unknown, we consider the quantity

T = (α̂ − α)/√(s²c),

which we get by replacing σ² by s² in (13.2.6). Note that

T = Z / √(V/(n − q))

where Z ~ N(0, 1) and V ~ χ²(n−q), independently of Z. It follows by (6.10.5) that T has Student's distribution with n − q degrees of freedom:

T = (α̂ − α)/√(s²c) ~ t(n−q).    (13.2.7)

Note that T has the same degrees of freedom as the variance estimate s².
To test H: α = α0, we compute the observed value of T when α = α0, and then use Table B3 to find

SL = P{|t(n−q)| ≥ |Tobs|}.

Alternatively, we have

SL = P{t²(n−q) ≥ Tobs²} = P{F(1, n−q) ≥ Tobs²}

by (6.10.7), which can be evaluated from Table B5 for the F distribution. It can be shown that the likelihood ratio statistic for testing H: α = α0 is an increasing function of T², and hence the procedure just described is the likelihood ratio test.
To construct a 95% confidence interval for α, we use Table B3 to find the value t such that

P{−t ≤ t(n−q) ≤ t} = 0.95.

Substituting from (13.2.7) and solving gives

α̂ − t√(s²c) ≤ α ≤ α̂ + t√(s²c)

as the 95% confidence interval. This is also a 5% significance interval and a maximum likelihood interval for α.
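The whole procedure (least-squares estimates, residuals, s², and a t interval) is easy to carry out numerically. The following is a minimal sketch, not part of the original text; the data and the choice of a straight-line model are hypothetical.

    # Sketch: fit a normal linear model mu = X @ theta, compute s^2 = SSE/(n - q),
    # and form a 95% interval theta_hat +/- t * sqrt(s^2 * c) as in (13.2.5)-(13.2.7).
    import numpy as np
    from scipy.stats import t

    x = np.array([36., 42., 47., 55., 63., 72.])        # hypothetical covariate
    y = np.array([118., 140., 128., 150., 149., 160.])  # hypothetical measurements
    X = np.column_stack([np.ones_like(x), x])            # straight-line model, q = 2
    n, q = X.shape

    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # solves the normal equations
    resid = y - X @ theta_hat                            # residuals e_i = y_i - mu_hat_i
    s2 = resid @ resid / (n - q)                         # variance estimate, n - q d.f.

    c = np.linalg.inv(X.T @ X)[1, 1]                     # var(slope_hat) = sigma^2 * c
    half = t.ppf(0.975, n - q) * np.sqrt(s2 * c)
    print(theta_hat[1] - half, theta_hat[1] + half)      # 95% interval for the slope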
Inferences for σ

It can be argued that, when the parameters α, β, ... are unknown, the residual sum of squares Σêi² carries all of the information from the yi's concerning σ. Inferences about σ will therefore be based on the marginal distribution of Σêi².
By (13.2.4) we have

V ≡ (1/σ²) Σêi² ~ χ²(ν)  where ν = n − q.

By (6.9.1), the p.d.f. of V is

f(v) = k_ν v^(ν/2 − 1) e^(−v/2)  for v > 0,

where k_ν is a constant. If we now change variables using (6.1.11), we find that the p.d.f. of Σêi² is

k_ν [νs²/σ²]^(ν/2 − 1) exp{−νs²/(2σ²)} · (1/σ²),

where νs² = Σêi². Based on this distribution, the log likelihood function of σ is

l(σ) = −ν log σ − νs²/(2σ²)  for σ > 0.

Setting l′(σ) = 0 gives σ̂ = s. Thus s is the MLE of σ, and s² is the MLE of σ², based on the marginal distribution of the residual sum of squares.
The log relative likelihood function of σ is

r(σ) = l(σ) − l(s) = −ν log σ − νs²/(2σ²) + ν log s + ν/2
     = −(ν/2)[s²/σ² − 1 − log(s²/σ²)]    (13.2.8)

where ν = n − q. We can plot r(σ) to obtain likelihood intervals for σ, and we can use the fact that

D = −2r(σ) ≈ χ²(1)    (13.2.9)

to test hypotheses or to obtain approximate confidence intervals for σ.
Note that r(σ) has the same form as r(θ) in Example 11.3.3. The results given in Table 11.3.3 for n = 1, 2, 3, ... are applicable to the present situation with ν = 2, 4, 6, .... The χ² approximation (13.2.9) is accurate enough for most practical purposes even when ν is as small as 2.
To obtain an approximate 95% confidence interval for σ, we note from Table B4 that

P{χ²(1) ≤ 3.841} = 0.95.

We then solve the inequality

−2r(σ) ≤ 3.841,

either by plotting r(σ) or by using Newton's method as in Section 9.8. The interval obtained is also a 14.7% likelihood interval and an approximate 5% significance interval for σ.
Alternately, a 95% confidence interval for σ can be obtained directly from (13.2.4). From Table B4, we find values a, b such that

P{χ²(n−q) ≤ a} = P{χ²(n−q) ≥ b} = 0.025.

We then have

P{a ≤ (n − q)s²/σ² ≤ b} = 0.95,

and therefore the interval

(n − q)s²/b ≤ σ² ≤ (n − q)s²/a

has coverage probability 0.95.
The second construction involves less arithmetic than the first, but it does not produce a likelihood interval. The interval will include some values of σ² at the high end which are less likely than values excluded at the lower end (see Problem 11.4.10 and Example 13.3.2). For this reason, the first construction based on the likelihood ratio statistic is preferable.
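Both constructions are easy to carry out numerically. The following sketch (not part of the text) uses ν = 7 and s² = 0.01876, the values that arise in Example 13.3.2, so the output can be compared with the intervals given there.

    # Sketch: two 95% confidence intervals for sigma, given nu and s^2.
    import numpy as np
    from scipy.stats import chi2
    from scipy.optimize import brentq

    nu, s2 = 7, 0.01876

    # (1) Likelihood-ratio interval: solve -2 r(sigma) = 3.841 on each side of s.
    def D(sigma):
        ratio = s2 / sigma**2
        return nu * (ratio - 1 - np.log(ratio))      # -2 r(sigma), from (13.2.8)

    s = np.sqrt(s2)
    lo = brentq(lambda sig: D(sig) - 3.841, s / 10, s)
    hi = brentq(lambda sig: D(sig) - 3.841, s, s * 10)
    print(round(lo, 3), round(hi, 3))                # about 0.088 to 0.258

    # (2) Chi-square pivot: a <= nu*s^2/sigma^2 <= b with 2.5% in each tail.
    a, b = chi2.ppf(0.025, nu), chi2.ppf(0.975, nu)
    print(round(np.sqrt(nu * s2 / b), 3), round(np.sqrt(nu * s2 / a), 3))  # about 0.091 to 0.279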
PROBLEMS FOR SECTION 13.2

1. Suppose that Y1, Y2, ..., Yn are independent N(α, σ²).
(a) Show that, if σ is known, the likelihood ratio statistic for testing a value of α is D = Z², where Z is defined in (13.2.6) and c = 1/n.
(b) Show that, if σ is unknown, the likelihood ratio statistic for testing a value of α is given by

D = n log[1 + T²/(n − 1)]

where T is defined in (13.2.7) and c = 1/n.

2. Suppose that Y1, Y2, ..., Yn are independent N(αxi, σ²) variates, where x1, x2, ..., xn are known constants. Show that the results of the preceding problem still hold, but with c = 1/Σxi².
Hint: You will need to show that

Σ(yi − αxi)² = Σ(yi − α̂xi)² + (α̂ − α)² Σxi².

3. Let Y1, Y2, ..., Yn be independent variates, with

Yi ~ N(µi, σ²/wi)  for i = 1, 2, ..., n,

where w1, w2, ..., wn are known positive constants. The µi's are assumed to be linear functions of q unknown parameters α, β, ....
(a) Derive the log likelihood function, and show that α̂, β̂, ... are the parameter values which minimize the weighted sum of squares Σwi(yi − µi)².
(b) Show that α̂, β̂, ... are linear combinations of the yi's.
Note: This type of model might be used when the measurements y1, y2, ..., yn are not made with equal precision. Observations made with high precision are given large weights wi, while less precise observations are given smaller weights. The estimates α̂, β̂, ... are called weighted least squares estimates. Because of (b), the statistical methods described in this section can be extended easily to the weighted least squares case. It can be shown that Σwiêi²/σ² ~ χ²(n−q), and so the appropriate variance estimate is

s² = (1/(n − q)) Σwiêi².

13.3. The One-Sample Model

As in Sections 13.1 and 13.2, we consider n measurements y1, y2, ..., yn. These are modelled as observed values of n independent random variables Y1, Y2, ..., Yn, where

Yi ~ N(µi, σ²)  for i = 1, 2, ..., n.

We also assume that the µi's are linear functions of q unknown parameters α, β, ....
In this section we consider the simplest case, in which all n measurements were taken under the same conditions. Then the yi's should differ from one another only because of random variation, and their expected values µ1, µ2, ..., µn should all be equal. Thus we consider the one-sample model

µ1 = µ2 = ··· = µn = α.    (13.3.1)

The µi's are expressed as functions of a single unknown parameter α, and so q = 1.
Substituting for the µi's in (13.2.2) gives

S = Σ(yi − µi)² = Σ(yi − α)²,

with derivative

∂S/∂α = −2Σ(yi − α) = −2[Σyi − nα].

Note that the derivative of S is linear in both α and the observations y1, y2, ..., yn. Setting the derivative equal to zero gives

α̂ = (1/n) Σyi = ȳ.

The MLE of α is the sample mean ȳ. Note that α̂ is a linear combination of the yi's:

α̂ = a1y1 + a2y2 + ··· + anyn

where a1 = a2 = ··· = an = 1/n. The sampling distribution of α̂ is N(α, σ²c), where

c = a1² + a2² + ··· + an² = n(1/n)² = 1/n

(see (6.6.8)).
Upon replacing α by α̂ in (13.3.1), we obtain the fitted values

µ̂1 = µ̂2 = ··· = µ̂n = α̂ = ȳ.

Thus the ith residual is

êi = yi − µ̂i = yi − ȳ,

and the residual sum of squares is

Σêi² = Σ(yi − ȳ)².

This is called the corrected sum of squares of the yi's, and is often denoted by Syy.
The one-sample model (13.3.1) replaces n parameters µ1, µ2, ..., µn by a single parameter α, thus reducing the number of unknown parameters by n − 1. There are n − 1 degrees of freedom for estimating σ². By (13.2.5), the variance estimate is

s² = (1/(n − 1)) Σ(yi − ȳ)²,    (13.3.2)

and this is called the sample variance.
To calculate s² by computer, one evaluates the n residuals yi − ȳ, sums their squares, and then divides by n − 1. For hand calculation, the following identities are useful:

Syy = Σ(yi − ȳ)² = Σyi² − nȳ² = Σyi² − (1/n)(Σyi)².    (13.3.3)

The latter two formulas must be used cautiously, because they are highly susceptible to roundoff errors.
Having obtained α̂, c, and s², one can use the methods described in Section 13.2 to make inferences about α and σ.
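A minimal computational sketch of the one-sample analysis follows; it is illustrative only, and the data are hypothetical.

    # Sketch: alpha_hat = ybar, s^2 with n - 1 d.f., c = 1/n, and a 95% t interval.
    import numpy as np
    from scipy.stats import t

    y = np.array([12.1, 9.8, 11.4, 10.7, 12.6, 10.2, 11.9, 10.9])  # hypothetical
    n = len(y)

    alpha_hat = y.mean()                  # MLE of alpha
    resid = y - alpha_hat                 # residuals y_i - ybar
    s2 = resid @ resid / (n - 1)          # sample variance, n - 1 d.f.
    c = 1.0 / n

    half = t.ppf(0.975, n - 1) * np.sqrt(s2 * c)
    print(alpha_hat - half, alpha_hat + half)   # 95% confidence interval for alpha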
EXAMPLE 13.3.1. A standard drug produces blood pressure increases which are normally distributed with mean µ = 22 and standard deviation σ = 9.2. A new drug is also expected to produce normally distributed increases, but with possibly different values of µ and σ. The new drug was given to ten individuals, and it produced the following blood pressure increases:

18  27  23  15  18  15  18  20  17  8.

Is there evidence that µ ≠ 22? that σ ≠ 9.2?
SOLUTION. We assume that the measurements y1, y2, ..., y10 are observed values of independent N(µi, σ²) random variables, and that

µ1 = µ2 = ··· = µ10 = α.

We find that

n = 10;  Σyi = 179;  Σyi² = 3433;  α̂ = ȳ = 17.9;
Σêi² = Σ(yi − ȳ)² = Σyi² − (1/n)(Σyi)² = 228.9;
s² = (1/(n − 1)) Σ(yi − ȳ)² = 25.4;
var(α̂) = cσ²  where c = 1/n = 0.1.

The variance estimate has n − 1 = 9 degrees of freedom.
When σ is unknown, inferences about α are based on

T = (α̂ − α)/√(s²c) ~ t(9).

Note that T has the same degrees of freedom as s². To test H: α = 22, we calculate

Tobs = (α̂ − 22)/√(s²c) = −2.57;
SL = P{|t(9)| ≥ 2.57} ≈ 0.03

from Table B3. There is evidence that the mean increase for the new drug is different from 22.
By (13.2.8), the likelihood ratio statistic for testing H: σ = σ0 is

D = −2r(σ0) = ν[s²/σ0² − 1 − log(s²/σ0²)]

where here ν = n − 1 = 9. Taking σ0 = 9.2 and s² = 25.4, we find that Dobs = 4.53. Since D ≈ χ²(1) if H is true, Table B4 gives

SL ≈ P{χ²(1) ≥ 4.53} ≈ 0.03.

There is evidence that σ ≠ 9.2.
The new drug produces a lower mean increase than the standard drug. It is also less variable in its effect. This is an advantage, because the effect of the new drug on an individual can be more accurately predicted. □

EXAMPLE 13.3.2. Eight plastic gears were tested at 21°C until they failed. The times to failure (in millions of cycles) were as follows:

2.37  2.01  2.47  2.20  1.87  2.32  2.00  2.86

A common assumption in such situations is that the logarithms of the failure times are normally distributed. The natural logarithms of the eight times listed above are as follows:

0.863  0.698  0.904  0.788  0.626  0.842  0.693  1.051

Assuming these to be independent observations from N(µ, σ²), find 95% confidence intervals for µ and σ.
SOLUTION. The 8 log failure times y1, y2, ..., y8 are assumed to be observed values of independent N(µi, σ²) variates with

µ1 = µ2 = ··· = µ8 = µ.

This is the one-sample model (13.3.1), except that µ rather than α is used to denote the common mean. Proceeding as in Example 13.3.1, we obtain

n = 8;  µ̂ = ȳ = 0.8081;  s² = 0.01876;
var(µ̂) = σ²c  where c = 1/n = 1/8.

The variance estimate has n − 1 = 7 degrees of freedom.
Inferences concerning µ are based on

T = (µ̂ − µ)/√(s²c) ~ t(7).

From Table B3, we find that

P{−2.365 ≤ t(7) ≤ 2.365} = 0.95.

Hence the 95% confidence interval for µ is

µ ∈ µ̂ ± 2.365√(s²c) = 0.8081 ± 0.1145;

that is, 0.6936 ≤ µ ≤ 0.9226. If we tested H: µ = µ0 for any parameter value µ0 in this interval, we would obtain a significance level of 5% or more. Also, parameter values inside the interval have higher maximum relative likelihood than values outside.
To obtain an approximate 95% confidence interval for σ, we use the fact that

D = −2r(σ) ≈ χ²(1).
Since P{χ²(1) ≤ 3.841} = 0.95, the required interval is found by solving the inequality

−2r(σ) ≤ 3.841.

Since ν = n − 1 = 7 and s² = 0.01876, (13.2.8) gives

−2r(σ) = 7[0.01876/σ² − 1 − log(0.01876/σ²)].

By plotting this function or using Newton's method, we find the interval to be 0.088 ≤ σ ≤ 0.258. This is also a 14.7% likelihood interval for σ, and an approximate 5% significance interval for σ.
Alternatively, we can construct a 95% confidence interval for σ by using the fact that

7s²/σ² ~ χ²(7).

Here n − 1 = 7, and Table B4 gives

P{χ²(7) ≥ 16.01} = P{χ²(7) ≤ 1.690} = 0.025.

It follows that

P{1.690 ≤ 7s²/σ² ≤ 16.01} = 0.95,

and therefore the interval

7s²/16.01 ≤ σ² ≤ 7s²/1.690

has coverage probability 0.95. Upon substituting for s² and taking square roots, we obtain 0.091 ≤ σ ≤ 0.279 as the 95% confidence interval for σ.
Note that the second construction does not produce a likelihood interval. The interval 0.091 ≤ σ ≤ 0.279 includes values at the high end which are less likely than some values excluded at the lower end. The first construction does produce a likelihood interval, and it is therefore preferable. □

PROBLEMS FOR SECTION 13.3

1. The following are the initial velocities in meters per second of seven projectiles fired from the same gun:

451  447  454  450  454  449  452

Assuming that velocity is normally distributed, obtain a 90% confidence interval for the mean velocity µ.

2. The relationship between parental age and the incidence of mongolism was discussed in Section 7.5. It is known that, in a certain population, the mean age of mothers at normal births is 31.25. The average age of the mothers at 50 births of mongolian children was 37.25 years, with sample variance s² = 49.35. Are these observations consistent with the hypothesis µ = 31.25?

3.† Under a special diet, twelve rats made the following weight gains (in grams) from birth to age three months:

55.3  54.8  65.9  60.7  59.4  62.0
62.1  58.7  64.5  62.3  67.6  61.2

Assuming that weight gains are independent N(µ, σ²), obtain 95% confidence intervals for µ and for σ². Use two methods to find the confidence interval for σ², and compare the results.

4.† A manufacturer wishes to determine the mean breaking strength µ of string "to within a pound", which we interpret as requiring that the 95% confidence interval for µ should have length at most 2 pounds. If measurements are independent N(µ, σ²), and if ten preliminary measurements gave Σ(xi − x̄)² = 80, how many additional measurements would you advise the manufacturer to make?

5. Sixteen packages are randomly selected from the production of a detergent packaging machine. Their weights (in grams) were as follows:

287  293  295  295  297  298  299  300
300  302  302  303  306  307  308  311

It may be assumed that weights are independent N(µ, σ²).
(a) Determine 95% confidence intervals for µ and σ.
(b) Assuming that µ and σ are equal to their estimates, find an interval which contains the weight of a new randomly chosen package with probability 0.95.

6. Ten steel ingots chosen at random from a large shipment gave the following hardness measures:

71.7  71.1  68.0  69.6  69.1
69.4  68.8  70.4  69.3  68.2

If the manufacturing process is under control, the hardness measures should be independent N(µ, σ²) with σ² = 1.2.
(a) Are the ten observations consistent with the hypothesis σ² = 1.2?
(b) Assuming that σ² = 1.2, find a 90% confidence interval for µ.
(c) Find a 90% confidence interval for µ if σ² is unknown and must be estimated from the data.

7.† The following are trypanosome counts (in thousands) in cattle seven days after infection:

17.0  2.1  1.7  44.2  5.1  2.9  3.5  19.6
28.0  7.0  17.1  0.7  34.5  13.0  1.5  5.2
9.0  5.9  3.9  11.5  14.5  16.2  33.3  12.2

(a) Assuming that the counts are independent N(µ, σ²), find estimates of µ and σ². Hence estimate the probability of a negative count under this model.
(b) A more reasonable assumption in this example is that the (natural) logarithms of the counts are independent normal. Repeat (a) under this assumption.

8. It is sometimes convenient to relocate and rescale measurements y1, y2, ..., yn before analysis. The new measurements are then given by zi = (yi − a)/b, where a, b are known constants. Suppose that the yi's are modelled as independent N(µ, σ²). How should the zi's be modelled? How can estimates and confidence intervals for µ and σ² be obtained from an analysis of the zi's?

13.4. The Two-Sample Model

Consider n = n1 + n2 independent measurements in two samples. The first sample consists of n1 measurements y11, y12, ..., y1n1 recorded under one set of conditions, and the second sample consists of n2 measurements y21, y22, ..., y2n2 recorded under a different set of conditions. For instance, the y1j's might be blood pressure increases for n1 subjects who received drug A, while the y2j's are increases for n2 different subjects who received drug B.
As in the preceding sections, we model the yij's as observed values of independent normal random variables Yij with the same variance σ²:

Yij ~ N(µij, σ²)  for j = 1, 2, ..., ni;  i = 1, 2.

The two-sample model is as follows:

µ11 = µ12 = ··· = µ1n1 = µ1;
µ21 = µ22 = ··· = µ2n2 = µ2.    (13.4.1)

This model states that observations within a sample differ from one another only because of random variation. There are q = 2 unknown parameters, µ1 and µ2.
Another way to write the two-sample model is

µ11 = µ12 = ··· = µ1n1 = α;
µ21 = µ22 = ··· = µ2n2 = α + β    (13.4.2)

where α = µ1 and β = µ2 − µ1. An advantage of this parametrization is that the difference in the means µ2 − µ1, which is usually the quantity of primary interest, is explicitly represented by the parameter β.
Upon substituting (13.4.1) into (13.2.2), we find that the error sum of squares is

S = Σj (y1j − µ1)² + Σj (y2j − µ2)².

Setting the derivatives of S equal to zero and solving gives

µ̂1 = ȳ1 = (1/n1) Σj y1j;  µ̂2 = ȳ2 = (1/n2) Σj y2j.

It now follows that

α̂ = µ̂1 = ȳ1;  β̂ = µ̂2 − µ̂1 = ȳ2 − ȳ1.

The fitted values and residuals are given by

µ̂ij = µ̂i = ȳi;  êij = yij − µ̂ij = yij − ȳi.

The residual sum of squares is

ΣΣêij² = ΣΣ(yij − ȳi)²
       = Σ(y1j − ȳ1)² + Σ(y2j − ȳ2)²
       = (n1 − 1)s1² + (n2 − 1)s2²

where s1² and s2² are the two sample variances:

si² = (1/(ni − 1)) Σj (yij − ȳi)²  for i = 1, 2.

Since there are q = 2 unknown parameters µ1 and µ2 (or α and β), there are n − 2 degrees of freedom for variance estimation, and (13.2.5) gives

s² = (1/(n − 2)) ΣΣêij² = [(n1 − 1)s1² + (n2 − 1)s2²] / [(n1 − 1) + (n2 − 1)].

The combined or pooled estimate s² based on both samples is a weighted average of the two sample variances s1² and s2², with weights equal to their degrees of freedom n1 − 1 and n2 − 1.

Inferences for β = µ2 − µ1

For inferences about β, we follow the procedures described in Section 13.2. First we find the sampling distribution of β̂ = Ȳ2 − Ȳ1. Note that

Ȳ1 ~ N(µ1, σ²/n1);  Ȳ2 ~ N(µ2, σ²/n2).

Furthermore, Ȳ1 and Ȳ2 are independent because Y11, Y12, ..., Y1n1 are independent of Y21, Y22, ..., Y2n2. It follows by (6.6.7) that

Ȳ2 − Ȳ1 ~ N(µ2 − µ1, σ²(1/n1 + 1/n2)).

Hence the sampling distribution of β̂ is N(β, σ²c), where

c = 1/n1 + 1/n2.

We now standardize and replace σ² by s² to get

T = (β̂ − β)/√(s²c) ~ t(n−2).

Here s² is the pooled variance estimate with n − 2 degrees of freedom, and T has the same degrees of freedom as s². We can now test an hypothesis β = β0 or find confidence intervals for β as in Section 13.2.

EXAMPLE 13.4.1. Cuckoos lay their eggs in the nests of other birds. Table 13.4.1 gives the lengths in millimeters of n = 24 cuckoos' eggs found in nests of reed-warblers and wrens. The data are from a paper by O.H. Latter in Biometrika (1902). The table also shows the two sample means and sample variances.

Table 13.4.1. Lengths in Millimeters of 24 Cuckoos' Eggs

Sample 1: Eggs from reed-warblers' nests     Sample 2: Eggs from wrens' nests
21.2  21.6  21.9                             19.8  20.0  20.3  20.8  20.9
22.0  22.0  22.2                             20.9  21.0  21.0  21.0  21.2
22.8  22.9  23.2                             21.5  22.0  22.0  22.1  22.3

n1 = 9;  ȳ1 = 22.20                          n2 = 15;  ȳ2 = 21.12
s1² = 0.4225 (8 d.f.)                        s2² = 0.5689 (14 d.f.)

The average length of cuckoos' eggs is 22.20 in the first sample, and only 21.12 in the second sample. It appears as though the lengths of cuckoos' eggs may depend upon the locations in which they are found. On the other hand, since only 24 eggs were measured, it may be that the observed difference between ȳ1 and ȳ2 is due merely to random variation. We wish to determine whether the observed difference is too great to be attributed to random variation.
We model the 24 measurements as observed values of independent normal variates with the same variance. We assume that the expected length is µ1 for the first sample, and µ2 for the second sample. This is the two-sample model. We wish to know whether H: µ1 = µ2, or equivalently H: β = 0, is consistent with the data.
Inferences about β are based on

T = (β̂ − β)/√(s²c) ~ t(22)

where β̂ = ȳ2 − ȳ1 = −1.08, c = 1/9 + 1/15, n = 24, and s² is the pooled variance estimate:

s² = (8s1² + 14s2²)/(8 + 14) = 0.5156  (22 d.f.).

To test H: β = 0, we compute

Tobs = (β̂ − 0)/√(s²c) = −3.57;
SL = P{|t(22)| ≥ 3.57} ≈ 0.002

from Table B3. If µ1 and µ2 were equal, a difference as large as that observed would very rarely occur, and so there is strong evidence that µ1 ≠ µ2.
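The computations of this example can be reproduced from the summary statistics in Table 13.4.1. The following sketch is illustrative only, not part of the original text.

    # Sketch: pooled two-sample t computations from summary statistics.
    import numpy as np
    from scipy.stats import t

    n1, ybar1, s1_sq = 9, 22.20, 0.4225
    n2, ybar2, s2_sq = 15, 21.12, 0.5689

    beta_hat = ybar2 - ybar1                          # estimate of beta = mu2 - mu1
    c = 1 / n1 + 1 / n2
    df = n1 + n2 - 2
    s2 = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df   # pooled variance estimate

    T = beta_hat / np.sqrt(s2 * c)                    # test of H: beta = 0
    SL = 2 * t.sf(abs(T), df)                         # two-sided significance level
    print(round(s2, 4), round(T, 2), round(SL, 4))    # about 0.516, -3.57, 0.002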
EXAMPLE 13.4.2. The log-lifetimes of 8 plastic gears tested at 21°C were analyzed in Example 13.3.2. The following are the log-lifetimes of 4 additional gears tested at 30°C:

0.364  0.695  0.558  0.359

It may be assumed that log-lifetimes are independent and normally distributed with the same variance σ², but with expected value µ1 at 21°C and µ2 at 30°C. Find 95% confidence intervals for µ1, µ2 − µ1, and σ.
SOLUTION. The sample means and variances are as follows:

Sample 1 (21°C):  n1 = 8;  ȳ1 = 0.8081;  s1² = 0.01876 (7 d.f.).
Sample 2 (30°C):  n2 = 4;  ȳ2 = 0.4940;  s2² = 0.02654 (3 d.f.).

The pooled variance estimate is

s² = (7s1² + 3s2²)/(7 + 3) = 0.02109  (10 d.f.).

The sampling distribution of µ̂1 = ȳ1 is N(µ1, σ²c), where c = 1/n1. Inferences about µ1 are based on

T = (µ̂1 − µ1)/√(s²c) ~ t(n−2)
where s² is the pooled variance estimate with n − 2 = 10 degrees of freedom. From Table B3 we find that

P{−2.228 ≤ t(10) ≤ 2.228} = 0.95

and hence the 95% confidence interval is

µ1 ∈ µ̂1 ± 2.228√(s²c) = 0.8081 ± 0.1144.

This differs from the construction given in Example 13.3.2 because now we are using both samples to estimate σ². Thus s² is different, and there are more degrees of freedom for T.
Next we consider β = µ2 − µ1. The estimate β̂ = ȳ2 − ȳ1 has sampling distribution N(β, σ²c), where now c = 1/8 + 1/4. Inferences about β are based on

T = (β̂ − β)/√(s²c) ~ t(10),

and the 95% confidence interval for β is

β ∈ β̂ ± 2.228√(s²c) = −0.3140 ± 0.1981;

that is, 0.1159 ≤ µ1 − µ2 ≤ 0.5121.
Two methods of constructing confidence intervals for σ were described at the end of Section 13.2. We shall use the first method, which is based on (13.2.9). Here we have s² = 0.02109 with ν = 10 degrees of freedom, so

D = −2r(σ) = 10[0.02109/σ² − 1 − log(0.02109/σ²)].

We solve the inequality −2r(σ) ≤ 3.841 to obtain

0.0991 ≤ σ ≤ 0.2425

as the (approximate) 95% confidence interval for σ. □

More Than Two Samples

A similar model and analysis can be developed when there are more than two samples. Suppose that there are n = n1 + n2 + ··· + nk measurements in k samples. We assume that all n measurements are independent normal with variance σ², and that the ni measurements in the ith sample have expected value µi (i = 1, 2, ..., k). Let ȳi and si² denote the sample mean and variance for the ith sample. Then µ̂i = ȳi, and the pooled variance estimate is

s² = (1/(n − k)) ΣΣ(yij − ȳi)² = [(n1 − 1)s1² + ··· + (nk − 1)sk²] / [(n1 − 1) + ··· + (nk − 1)].

There are k unknown parameters µ1, µ2, ..., µk, and therefore n − k degrees of freedom for variance estimation.
The comments in Section 9.2 concerning the pooling of estimates are relevant here. It is usually not a good idea to combine estimates s1², s2², ..., sk² which are significantly different from one another. A formal test for homogeneity can be carried out as in Section 12.3. For details, see the problems at the end of this section.

EXAMPLE 13.4.3. Table 13.4.2 shows data from a study on a pulse-jet pavement breaker using nozzles of three different diameters. The measurement is the penetration (in millimeters) of a concrete slab produced by a single discharge. The table also gives the sample mean ȳi and sample variance si² for each sample. Since n1 = n2 = n3 = 9, each of the sample variances has 8 degrees of freedom.

Table 13.4.2. Penetration of Concrete for 27 Discharges of a Jet Pulse Machine with Three Nozzle Sizes

Nozzle    Penetration (in millimeters)                        ȳi       si²
Small     67   47.5  46   62.5  49   53.5  42   55.5  39     51.33    85.00
Medium    88   60    72   73.5  62   72.5  73.5 44    54.5   66.67    167.38
Large     83   53    87   71    78   51.5  68   58    61     67.83    167.63

Let yij denote the jth observation for the ith nozzle type, where i = 1, 2, 3 and j = 1, 2, ..., 9. We assume that the yij's are observed values of independent N(µij, σ²) variates, and that measurements made with the same type of nozzle have the same expected value:

µij = µi  for i = 1, 2, 3.

This is the three-sample model with n1 = n2 = n3 = 9. The variance estimate for this model is

s² = (8s1² + 8s2² + 8s3²)/(8 + 8 + 8) = 3360/24 = 140,

with n − 3 = 24 degrees of freedom.
Suppose that we are interested in µ3 − µ2, which is the difference in expected penetration for large and medium nozzles. This is estimated by ȳ3 − ȳ2, which has sampling distribution N(µ3 − µ2, σ²c) where c = 1/9 + 1/9. Inferences about µ3 − µ2 are based on

T = [(ȳ3 − ȳ2) − (µ3 − µ2)]/√(s²c) ~ t(24).

In particular, the 95% confidence interval is

µ3 − µ2 ∈ ȳ3 − ȳ2 ± 2.064√(s²c) = 1.16 ± 11.51.

Since this interval contains zero, a test of H: µ3 − µ2 = 0 would give a large
significance level. There is no evidence of a difference in expected penetration
(b) Show that, if a 1 =a 2 = · · · =at= a, then the MLE of a 2 is given by
for medium and large nozzles.
s 2 = (:Ev,sf)/(I:v;).
PROBLEM S FOR SECTION 13.4. (c) Show that the likelihood ratio statistic for testing H: a = a = ··· =a, is
1 2
given by
!. The following are measurements of ultimate tensile strength (UTS)
for twelve
specimens of·insulating foam of two different densities. D = I:v 1 log(s 2 /sf}.

High density 98.5


Note: The distribution of D is approximately x& - n if His true.
105.5 111.6 114.5 126.5 127.1
Low density 79.7 84.5 6.tFourte en welded girders were cyclically stressed at 1900 pounds per square
85.2 98.0 105.2 . 113.6 inch,
and the numbers of cycles to failure were observed. The sample mean and
Assuming normality and equality of variances, obtain 95% confidence intervals variance of the log failure times were y = 14.564 and s 2 = 0.0914. Similar tests
on
for the common variance and for the difference in mean UTS at the two densities. four additional girders with repaired welds gave y = 14.291 and s2 = 0.0422. Log
failure times are assumed to be independent and normally distributed.
2. tTwenty-seven measurements of yield were made on two industrial processes,
with (a) Test the hypothesis that the variance of log failure time is the same
the following results: for
repaired welds as for normal welds.
Process 1: (b) Assuming equal variances, obtain a 90% confidence interval for the difference
n1=11 y1 = 6.23 si = 3.79 in mean log failure time.
Process 2: n2 = 16 y2 = 12.74 s~ = 4.17
7. A common final examination was written by 182 honors mathematics students,
Assuming that the yields are normally distributed with the same variance, find of
whom 61 were in the co-operative program. The results were as follows:
95% confidence intervals for the means µ 1 and µ , and for the difference in means
2
µI -µ2.
Co-op students nl = 61 y1 = 68.30 S1 = 10.83
3. An experiment to determine the effect of a drug on the blood glucose Others n1=121 y2 =65.93 S2 = 15.36
con-
centration of diabetic rats gave the following results:
(a) Assuming that the examination marks are normally distributed, determine
Control rats 2.05 1.82 2.00 1.94 2.12 whether the variances are significantly different for the two groups.
Treated rats 1.71 1.37 (b) Estimate the proportion of students obtaining a mark of 90 or more, and
2.04 1.50 1.69 1.83 the
proportion obtaining 50 or more, for each group.
Test the hypothesis that the treatment has no effect on mean · blood glucose 8. Readings produced by a set of scales are independent and normally distribute
concentration. State the assumptions upon which this test depends. d
about the true weight with constant variance. Six weighings of each of two objects
4. An experiment to discover the movement of an antibiotic in a certain variety gave the following results:
of
broad bean plants was carried out by treating 10 cut shoots and 10 rooted plants
for 18 hours with a solution of the antibiotic. Assay results giving the con- Object 1: 4.83 4.98 4.91 4.96 5.05 4.93
centration of the antibiotic per gram of plant weight are given below. Object 2: 3.02 2.95 2.83 3.00 3.02 3.09

Cut shoots 55 65 61 48 57 58 60 68 (a) Test the hypothesis that the variance in the readings is the same for the two
52 63
Rooted plants 53 48 50 39 43 44 objects. ·
46 56 35 51
(b) Assuming a common variance, obtain a 90% confidence interval for
the
Assuming that concentrations are independent normal with constant variance, difference in weights of the two objects.
obtain a 99% confidence interval for the difference in mean concentration for
the 9.tThe following are the distances traveled (in miles) by 15 rockets used to
two types of plant. test 3
different fuels.
5. Testing equality of variances. Let si, si, ... , sf be independent variance estimates,
where Fuel 1 16.2 17.3 17.0 16.6 17.4
Fuel 2 18.6 18.6 19.0 19.5 20.0
for i = 1, 2, ... , k. Fuel 3 19.7 19.4 20.0 19.2 18.9
(a) Find the joint log likelihood function of a , a , and show that it is
maximized for ar
=sf (i = 1, 2, ... , k).
1 2 ••• , O"t. (a) Test the hypothesis that the variability in distance traveled is the same for
three fuels.
the
(b) Assuming equal variances, obtain a 95% confidence interval for the common variance.
(c) Find a 95% confidence interval for the difference in mean distance traveled for fuels 2 and 3.
(d) State the assumptions upon which the analysis in (a), (b), and (c) depends.

10. Let s1², s2², ..., sk² be independent variance estimates as in Problem 5 above. Consider the hypothesis

H: σi² = σ²/wi  for i = 1, 2, ..., k,

where w1, w2, ..., wk are known positive constants and σ² is unknown.
(a) Show that, under H, the MLE of σ² is

s² = (Σ νi wi si²)/(Σ νi).

(b) Derive the likelihood ratio statistic for testing H.

11. Consider the two-sample model (13.4.1) with σ known. Show that the likelihood ratio statistic for testing H: µ1 = µ2 is D = Z², where

Z = (Ȳ2 − Ȳ1)/√(σ²(1/n1 + 1/n2)).

Hence show that, if H is true, D has a χ² distribution with one degree of freedom.

13.5. The Straight Line Model

Consider n measurements y1, y2, ..., yn, but now suppose that each measurement yi has associated with it a value xi of another variable, and that the x-values can be used to explain or predict the corresponding y-values. We call x the explanatory variable or independent variable, and y the response variable or dependent variable.
For instance, y1, y2, ..., yn could be blood pressure increases for n subjects who received doses x1, x2, ..., xn of a drug. Or y1, y2, ..., yn might be gasoline mileages achieved by a car in n tests at driving speeds x1, x2, ..., xn. Or y1, y2, ..., yn might be log lifetimes of n plastic gears tested at temperatures x1, x2, ..., xn. In each case, knowledge of the x-value will help to predict or explain the value of y.
The xi's will be treated as known constants in the analysis, and the yi's will be modelled as observed values of random variables Y1, Y2, ..., Yn. We shall assume that the Yi's are independent and normally distributed with the same variance σ², so that

Yi ~ N(µi, σ²)  for i = 1, 2, ..., n.

The means µ1, µ2, ..., µn will then be modelled as functions of the explanatory variable x.
In many applications it is reasonable to assume that the dependence of E(Y) on x is linear. Even if the relationship is nonlinear, a straight line will often give a satisfactory approximation over a restricted range of x-values. It is advisable to plot the data in order to see whether a straight line model will be satisfactory (see Example 13.5.3).
Under the straight-line model, we have

µi = α + βxi  for i = 1, 2, ..., n.    (13.5.1)

There are q = 2 unknown parameters, the intercept α and the slope β.
For historical reasons, the straight line model is also called a simple linear regression model. The origin of the term "regression" is explained in Section 7.5.

Estimation of α and β

Upon substituting (13.5.1) into (13.2.2), we find that the error sum of squares is

S = Σ(yi − µi)² = Σ(yi − α − βxi)².

The derivatives of S are

∂S/∂α = −2Σ(yi − α − βxi);  ∂S/∂β = −2Σxi(yi − α − βxi).

Putting ∂S/∂α = 0 gives

Σyi − nα̂ − β̂Σxi = 0.

We divide by n and solve for α̂ to obtain

α̂ = ȳ − β̂x̄.    (13.5.2)

We now put ∂S/∂β equal to zero and substitute for α̂ to obtain

0 = Σxi(yi − α̂ − β̂xi)
  = Σxi(yi − ȳ + β̂x̄ − β̂xi)
  = Σxi(yi − ȳ) − β̂Σxi(xi − x̄).

It follows that

β̂ = Σxi(yi − ȳ) / Σxi(xi − x̄) = Sxy/Sxx.    (13.5.3)

The numerator in (13.5.3) is called the corrected sum of products, and it can be rewritten in several forms:

Sxy = Σ(yi − ȳ)xi = Σ(xi − x̄)yi = Σ(xi − x̄)(yi − ȳ)
    = Σxiyi − nx̄ȳ = Σxiyi − (1/n)(Σxi)(Σyi).
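A short computational sketch of the least-squares estimates (13.5.2) and (13.5.3) follows; the data are hypothetical and the code is illustrative only.

    # Sketch: straight-line estimates from the corrected sums Sxx and Sxy.
    import numpy as np

    x = np.array([2., 4., 5., 7., 8., 10.])       # hypothetical x-values
    y = np.array([3.1, 4.0, 4.8, 6.1, 6.4, 7.9])  # hypothetical measurements

    xbar, ybar = x.mean(), y.mean()
    Sxx = np.sum((x - xbar) ** 2)
    Sxy = np.sum((x - xbar) * (y - ybar))

    beta_hat = Sxy / Sxx                  # slope estimate (13.5.3)
    alpha_hat = ybar - beta_hat * xbar    # intercept estimate (13.5.2)

    resid = y - (alpha_hat + beta_hat * x)
    s2 = np.sum(resid ** 2) / (len(x) - 2)    # variance estimate, n - 2 d.f.
    print(round(alpha_hat, 3), round(beta_hat, 3), round(s2, 4))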

The denominator is the corrected sum of squares of the xi's:

    Sxx = Σ(xi − x̄)xi = Σ(xi − x̄)² = Σ xi² − n x̄² = Σ xi² − (1/n)(Σxi)².

Variance Estimation

The fitted values and residuals are given by

    µ̂i = α̂ + β̂xi = ȳ + β̂(xi − x̄);
    êi = yi − µ̂i = (yi − ȳ) − β̂(xi − x̄).

The residual sum of squares is

    Σ êi² = Σ[(yi − ȳ) − β̂(xi − x̄)]²
          = Σ(yi − ȳ)² − 2β̂ Σ(xi − x̄)(yi − ȳ) + β̂² Σ(xi − x̄)²
          = Syy − 2β̂Sxy + β̂²Sxx.

Since β̂ = Sxy/Sxx, it follows that

    Σ êi² = Syy − β̂Sxy.    (13.5.4)

Since there are q = 2 unknown parameters α and β, there are n − 2 degrees of freedom for variance estimation, and (13.2.5) gives

    s² = (1/(n − 2)) Σ êi².    (13.5.5)

Formula (13.5.4) is useful for hand calculation, but it is susceptible to large roundoff errors. For calculation by computer, it is better to evaluate the residuals ê1, ê2, ..., ên, square them, and sum to get Σ êi².

EXAMPLE 13.5.1. The following table gives the age (x) and systolic blood pressure (y) for each of n = 12 women:

    x    56   42   72   36   63   47   55   49   38   42   68   60
    y   147  125  160  118  149  128  150  145  115  140  152  155

The data are plotted in Figure 13.5.1. The graph shows a roughly linear increase in blood pressure with age. The amount of scatter about the line does not show any systematic change with x, and so the assumption of constant variance σ² is reasonable.

[Figure 13.5.1. Scatterplot of blood pressure (y) versus age (x).]

We assume that the y's are observed values of independent N(µi, σ²) variates and that the straight line model (13.5.1) holds. From the data we obtain

    Σxi = 628       Σyi = 1684
    Σxi² = 34416    Σyi² = 238822    Σxiyi = 89894.

From these we find the sample means and corrected sums:

    x̄ = 52.33    ȳ = 140.33
    Sxx = 1550.67    Syy = 2500.67    Sxy = 1764.67.

Now we have

    β̂ = Sxy/Sxx = 1.138;    α̂ = ȳ − β̂x̄ = 80.78.

The fitted line y = 80.78 + 1.138x is shown in Figure 13.5.1.
By (13.5.4), the residual sum of squares is

    Σ êi² = Syy − β̂Sxy = 492.47.

The estimate of the variance about the line is

    s² = (1/(n − 2)) Σ êi² = 49.247

with n − 2 = 10 degrees of freedom.
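The calculations of Example 13.5.1 are easily reproduced by computer. The following short sketch is not part of the text; it assumes Python with the numpy library, and simply evaluates the corrected sums, the estimates α̂ and β̂, and the variance estimate for the blood pressure data.

    # Illustrative sketch (not from the text): the corrected-sum formulas of
    # Section 13.5 applied to the blood pressure data of Example 13.5.1.
    import numpy as np

    x = np.array([56, 42, 72, 36, 63, 47, 55, 49, 38, 42, 68, 60], dtype=float)
    y = np.array([147, 125, 160, 118, 149, 128, 150, 145, 115, 140, 152, 155], dtype=float)

    n = len(x)
    Sxx = np.sum((x - x.mean()) ** 2)                  # corrected sum of squares of the x's
    Sxy = np.sum((x - x.mean()) * (y - y.mean()))      # corrected sum of products
    beta_hat = Sxy / Sxx                               # (13.5.3)
    alpha_hat = y.mean() - beta_hat * x.mean()         # ybar - beta_hat * xbar

    resid = y - (alpha_hat + beta_hat * x)             # residuals, computed directly
    s2 = np.sum(resid ** 2) / (n - 2)                  # variance estimate (13.5.5)
    print(alpha_hat, beta_hat, s2)                     # approx. 80.78, 1.138, 49.2

Computing the residuals directly and summing their squares, rather than using (13.5.4), avoids the roundoff problem mentioned above.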
EXAMPLE 13.5.2. In Examples 13.3.2 and 13.4.2 we considered data from endurance tests of plastic gears at 21°C and 30°C. These data came from an experiment in which n = 40 gears were tested at nine different temperatures. Examination of the 40 lifetimes revealed that the lifetime distribution has a long tail to the right, and that there is more variability in the lifetimes at the lower temperatures. It would not be appropriate to assume that the lifetimes were normally distributed with constant variance. Instead, we analyze log lifetimes as in Examples 13.3.2 and 13.4.2.

The natural logarithms of the observed lifetimes (in millions of cycles) are given in Table 13.5.1, and are plotted against operating temperature in Figure 13.5.2. Note that the amount of scatter in the log lifetimes is about the same at all temperatures, and the dependence of mean log lifetime on temperature is roughly linear.

Table 13.5.1. Log Lifetimes of Plastic Gears at Nine Operating Temperatures

    Temperature    Number    y = Natural logarithm of
    x (°C)         tested    lifetime (in millions of cycles)
    -16            4          1.690   1.779   1.692   1.857
      0            4          1.643   1.584   1.585   1.462
     10            4          1.153   0.991   1.204   1.029
     21            8          0.863   0.698   0.904   0.788
                               0.626   0.842   0.693   1.051
     30            4          0.364   0.695   0.558   0.359
     37            4          0.412   0.425   0.574   0.649
     47            4          0.116   0.501   0.296   0.099
     57            4         -0.355  -0.269  -0.354  -0.459
     67            4         -0.736  -0.343  -0.965  -0.705

[Figure 13.5.2. Scatterplot of log lifetimes (y) versus temperature (x).]

There are 40 pairs (xi, yi), where xi is the operating temperature and yi is the log lifetime. Note that there are repeated x-values:

    x1 = x2 = x3 = x4 = −16;    x5 = x6 = x7 = x8 = 0;

and so on. We assume that the yi's are observed values of independent N(µi, σ²) variates where µi = α + βxi.
To find the parameter estimates, we first compute

    Σxi = 4(−16) + 4(0) + 4(10) + 8(21) + ... = 1096
    Σxi² = 4(−16)² + 4(0)² + 4(10)² + 8(21)² + ... = 53816
    Σyi = 24.996    Σyi² = 37.506862    Σxiyi = −15.781.

We can now compute means, corrected sums of squares, and estimates as in the preceding example. The fitted line is

    y = 1.432 − 0.02946x

and the residual sum of squares is 1.24663 with 38 degrees of freedom, giving the variance estimate s² = 0.03281.

When there are repeated x-values, it is possible to test the goodness of fit of the straight-line model (see Section 14.4). In this case, the test indicates a poor fit. The poor fit can also be seen in Figure 13.5.2, since at 5 of the 9 temperatures, all of the observed log lifetimes lie on the same side of the fitted line. See Section 14.4 for further discussion of this example, and for a possible explanation of the poor fit.

Plot the Data!

It is possible to compute α̂, β̂, and s² for any set of n pairs (xi, yi). Nothing in the arithmetic tells us whether fitting a straight line model is a sensible thing to do. It is important to plot the data and check that the straight-line model gives a reasonable fit. The graph may reveal difficulties with the model, or special features of the data which affect the interpretation. This point is illustrated in the following example, which was given by F.J. Anscombe, Graphs in Statistical Analysis, The American Statistician 27 (1973), pages 17-21.

EXAMPLE 13.5.3. Four data sets each consisting of 11 pairs (xi, yi) are given in Table 13.5.2. All four sets give approximately the same numerical results

    α̂ = 3,    β̂ = 0.5,    s² = 1.528.

However, as Figure 13.5.3 shows, the appropriate conclusions will be qualitatively different in each case.
Table 13.5.2. Four Data Sets, Each Consisting of 11 Pairs (x, y)

    Set 1         Set 2         Set 3         Set 4
    x     y       x     y       x     y       x     y
    4    4.26     4    3.10     4    5.39     8    6.58
    5    5.68     5    4.74     5    5.73     8    5.76
    6    7.24     6    6.13     6    6.08     8    7.71
    7    4.82     7    7.26     7    6.42     8    8.84
    8    6.95     8    8.14     8    6.77     8    8.47
    9    8.81     9    8.77     9    7.11     8    7.04
    10   8.04     10   9.14     10   7.46     8    5.25
    11   8.33     11   9.26     11   7.81     8    5.56
    12   10.84    12   9.13     12   8.15     8    7.91
    13   7.58     13   8.74     13   12.74    8    6.89
    14   9.96     14   8.10     14   8.84     19   12.50

The points of the first data set appear to be scattered randomly about the fitted line y = 3 + 0.5x. The straight line model gives a satisfactory fit to the data and there are no peculiarities which need to be pointed out.

For the second data set, the dependence of Y on x is clearly not linear. The straight line model is inappropriate, and instead a quadratic polynomial model

    µi = β1 + β2xi + β3xi²    for i = 1, 2, ..., n

could be tried.

[Figure 13.5.3. Scatterplots of the four data sets in Table 13.5.2.]

With data set #3, there is an outlying point at x = 13. It causes the fitted line to be shifted upwards, so that it does not properly fit the remaining ten points either. If we remove this point and recalculate, the fitted line is

    y = 4 + 0.346x

which gives a close fit to the remaining ten points and a much smaller variance estimate. Both the outlier and the revised analysis should be reported.

The fourth data set shows good agreement with the straight line model, but the estimate of the slope depends entirely upon a single observation. If this observation were found to be in error and deleted, the slope could not be estimated. Furthermore, without measurements at additional values of x, there is no way of determining whether the actual dependence of y on x is even close to being linear. The fact that the analysis depends so heavily on a single observation should be reported along with the numerical results.

In summary, although all four data sets yield the same numerical results for the straight line model, different conclusions are appropriate in the four cases. Examination of a graph is an indispensable part of the statistical analysis. Some additional graphical methods for examining the adequacy of the model will be described in Section 14.5.

PROBLEMS FOR SECTION 13.5

1. Theory suggests that a linear relationship exists between the shearing strength of steel bolts and their diameters. The following table gives the diameter x and strength y for 9 bolts of a particular type.

    x   1/8   1/4   3/8   1/2   5/8   3/4   7/8    1    3/2
    y    47    72    97   126   165   186   233   257   311

(a) Fit a straight line to the data, and compute the variance estimate.
(b) Plot the data and the fitted line. Note that one of the observations is seriously out of line with the others.
(c) Recalculate the fitted line and variance estimate with the outlying observation omitted, and plot the new line on the graph in (b). Briefly describe the effect of this observation on the analysis.

2.† The following are the breaking strengths of six bolts at each of five different diameters.

    Diameter     0.1    0.2    0.3    0.4    0.5
                1.62   1.71   1.86   2.14   2.45
                1.73   1.78   1.86   2.07   2.42
    Breaking    1.70   1.79   1.90   2.11   2.33
    strength    1.66   1.86   1.95   2.18   2.36
                1.74   1.70   1.96   2.17   2.38
                1.72   1.84   2.00   2.07   2.31
(a) Fit a straight line to the data and compute the variance estimate.
(b) Plot the data and the fitted line. Does the dependence of breaking strength on diameter appear to be linear? How should the model be modified?

3. The analysis in the preceding example assumed that the variance in breaking strength was the same at all five diameters. To check this assumption, compute the sample variance for the six measurements at diameter 0.1. Repeat for each of the other diameters to obtain five sample variances, each with five degrees of freedom. Now carry out a likelihood ratio test of the hypothesis that the variance is the same at all five diameters. (See Problem 13.4.5.)

4. The following table gives x, the water content of snow on April 1, and y, the water yield from April to July (in inches), for the Snake River watershed in Wyoming for 17 years (1919-35).

    x      y      x      y      x      y
    10.5   23.1   16.7   32.8   1~.2   31.8
    17.0   32.0   16.3   30.4   10.5   24.0
    23.1   39.5   12.4   24.2   24.9   52.5
    22.8   37.9   14.1   30.5   12.9   25.1
     8.8   12.4   17.4   35.1   14.9   31.5
    10.5   21.1   16.1   27.6

(a) Fit a straight line to these data, and calculate the variance estimate. Plot the data and the fitted line. Can you spot any difficulties?
(b) Suppose that the observations with the smallest and largest x-values are dropped from the analysis. Without redoing the calculations, explain what effects this will have on the estimates of the intercept, slope, and variance.

5. Archeologists use both tree ring dating and carbon dating in estimating the age of artifacts. In one study of Indian ruins, the estimated ages (in years) by tree ring dating (R) and carbon dating (C) were as follows:

    R     C     R     C     R     C
    710   795   212   222   415   432
    717   764   822   765   272   352
    350   320   612   543   204   187
    323   360   647   642   206   192
    500   612   513   533   824   764
    620   642   722   724   641   701
    832   786   724   745   527   529
    669   690   400   409   569   582
    917   878   396   456   693   646
    423   436   812   652   471   360

Taking R as the dependent variable (y) and C as the independent variable (x), fit a straight line to the data. Repeat with the roles of R and C interchanged. Plot the data and both fitted lines. Why are the two lines different?
Note: It is not clear that either of the above analyses is appropriate, because there is no natural choice for the dependent and independent variables in this example. In fact, both R and C could be considered to be dependent on actual age.

6.† The following measurements of atmospheric pressure (AP) and the boiling point of water (BP) were taken at various altitudes in the Alps and Scotland. Theory suggests that the boiling point of water should change linearly with changes in the (natural) logarithm of the atmospheric pressure.

    BP      AP      BP      AP      BP      AP
    194.5   20.79   200.9   23.89   209.5   28.49
    194.3   20.79   201.1   23.99   208.6   27.76
    197.9   22.40   201.4   24.02   210.7   29.04
    198.4   22.67   201.3   24.01   211.9   29.88
    199.4   23.15   203.6   25.14   212.2   30.06
    199.9   23.35   204.6   26.57

(a) Fit a straight line model E(BP) = α + β log(AP) to the data and compute the variance estimate. Use the fitted line to estimate the boiling point of water for atmospheric pressures 20, 25, and 30.
(b) How would the results in (a) be affected if one used logarithms to the base 10 rather than natural logarithms?
(c) Plot the data and the fitted line. Can you spot any difficulties? If so, how would you suggest that the analysis should be modified?

13.6. The Straight Line Model (Continued)

In Section 13.5 we derived estimates of the parameters α, β, and σ² in the straight line model. In this section we apply the methods described in Section 13.2 to obtain significance tests and confidence intervals.
We are considering n observed pairs (xi, yi) for i = 1, 2, ..., n. The xi's are treated as known constants, and the yi's are modelled as observed values of independent N(µi, σ²) variates, where µi = α + βxi.
Throughout this section, we use s² to denote the appropriate variance estimate based on the straight line model, as defined in (13.5.5). This estimate has n − 2 degrees of freedom. Confidence intervals for σ can be obtained by either of the methods described at the end of Section 13.2.

Inferences about the Slope β

By (13.5.3), the MLE of β is

    β̂ = Sxy/Sxx = Σ(xi − x̄)yi / Sxx = Σ ai yi

where the ai's are constants:

    ai = (xi − x̄)/Sxx    for i = 1, 2, ..., n.

The sampling distribution of β̂ is N(β, σ²c), where

    c = Σ ai² = Σ(xi − x̄)²/Sxx² = 1/Sxx.

Inferences about β are based on

    T = (β̂ − β)/√(s²c) ~ t(n−2)    (13.6.1)

where s² is the variance estimate for the straight line model.
The 95% confidence interval for β is

    β ∈ β̂ ± t√(s²c) = β̂ ± ts/√Sxx

where t is the value (from Table B3) such that P{−t ≤ t(n−2) ≤ t} = 0.95.
Note that we will be able to determine β precisely (i.e. the confidence interval will be narrow) if Sxx is large. Thus, if we are planning an experiment to obtain information about β, we should select x1, x2, ..., xn so that Sxx = Σ(xi − x̄)² is large. To maximize the information, we would need to make half of the xi's as large as possible, and the other half as small as possible. However if we did this, we would be unable to check the assumption that the dependence of E(Y) on x is linear. As a result, one would usually compromise by taking observations over the whole range of x-values, but with more observations at the extremes than in the middle of the range. It would then be possible to check the fit of the model, and also to make fairly precise statements about β.

Inferences about E(Y)

Given a particular value for x, the expected value of Y is µ = α + βx, with MLE

    µ̂ = α̂ + β̂x.

Since α̂ = ȳ − β̂x̄, it follows that

    µ̂ = ȳ + β̂(x − x̄)
       = (1/n)Σ yj + (x − x̄)Σ aj yj
       = Σ [1/n + (x − x̄)aj] yj

where the aj's are as defined above. Hence the sampling distribution of µ̂ is N(µ, σ²c′), where

    c′ = Σ [1/n + (x − x̄)aj]²
       = 1/n + (2(x − x̄)/n)Σ aj + (x − x̄)² Σ aj².

But since x̄ = (1/n)Σ xj, it follows that

    Σ(xj − x̄) = Σ xj − n x̄ = 0,

and hence that Σ aj = 0. Since Σ aj² = 1/Sxx, we have

    c′ = 1/n + (x − x̄)²/Sxx.

Inferences about µ are now based on

    T′ = (µ̂ − µ)/√(s²c′) ~ t(n−2).    (13.6.2)

The 95% confidence interval for µ = α + βx is

    µ ∈ µ̂ ± t√(s²c′) = µ̂ ± ts[1/n + (x − x̄)²/Sxx]^(1/2)

where t is the value such that P{−t ≤ t(n−2) ≤ t} = 0.95. This interval is narrowest when x = x̄, and its width increases as |x − x̄| increases. We can estimate α + βx the most precisely when x is close to x̄, the mean of the x-values used in fitting the line.

Inferences for the Intercept α

The intercept α is the expected value of Y when x = 0. Upon substituting x = 0 in the preceding two paragraphs, we find that α̂ has sampling distribution N(α, σ²c″), where

    c″ = 1/n + x̄²/Sxx.

Inferences about α are thus based on

    T″ = (α̂ − α)/√(s²c″) ~ t(n−2).
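As a quick numerical illustration of these formulas, the following sketch (not part of the text; it assumes Python with numpy and scipy) evaluates the 99% intervals that are worked out by hand in Example 13.6.1 below, starting from the summary statistics of Example 13.5.1.

    # Illustrative sketch (not from the text): interval formulas of Section 13.6
    # applied to the blood pressure summary statistics of Example 13.5.1.
    import numpy as np
    from scipy.stats import t

    n, xbar, Sxx = 12, 52.33, 1550.67
    alpha_hat, beta_hat, s2 = 80.78, 1.138, 49.247
    tval = t.ppf(0.995, n - 2)                 # 99% interval, 10 d.f. (about 3.169)

    # slope: c = 1/Sxx
    half_beta = tval * np.sqrt(s2 / Sxx)
    print("beta:", beta_hat - half_beta, beta_hat + half_beta)

    # mean response at x: c' = 1/n + (x - xbar)^2 / Sxx
    for x in (50, 70):
        c_prime = 1 / n + (x - xbar) ** 2 / Sxx
        mu_hat = alpha_hat + beta_hat * x
        half = tval * np.sqrt(s2 * c_prime)
        print(f"alpha + {x}*beta:", mu_hat - half, mu_hat + half)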
EXAMPLE 13.6.1. In Example 13.5.1, a straight line model was fitted to n = 12 blood pressure measurements for women of various ages. The following results were obtained:

    α̂ = 80.78    β̂ = 1.138    s² = 49.247 (10 d.f.)
    x̄ = 52.33    Sxx = 1550.67.

We shall use these results to obtain 99% confidence intervals for β, α + 50β, and α + 70β.
Inferences about β are based on (13.6.1) with c = 1/Sxx. Table B3 gives

    P{−3.169 ≤ t(10) ≤ 3.169} = 0.99,

and hence the 99% confidence interval for β is

    β ∈ β̂ ± 3.169√(s²c) = 1.138 ± 0.565.

This is also a 1% significance interval. If we test H: β = β0 for any parameter value β0 outside this interval, we will obtain SL < 0.01. In particular, the hypothesis β = 0 is very strongly contradicted by the data.
According to the model, the mean blood pressure of women aged 50 is µ = α + 50β. Inferences about µ are based on (13.6.2) with

    c′ = 1/n + (x − x̄)²/Sxx = 1/12 + (50 − 52.33)²/1550.67 = 0.0868.

The 99% confidence interval for α + 50β is

    α̂ + 50β̂ ± 3.169√(s²c′) = 137.68 ± 6.55.

Similarly, the mean blood pressure of women aged 70 is µ′ = α + 70β. Inferences about µ′ are based on (13.6.2) with

    c′ = 1/12 + (70 − 52.33)²/1550.67 = 0.2847.

The 99% confidence interval for α + 70β is

    α̂ + 70β̂ ± 3.169√(s²c′) = 160.44 ± 11.81.

As we noted earlier, the width of the confidence interval for α + βx increases as |x − x̄| increases. We can estimate α + βx with the greatest precision when x is close to x̄ = 52.33, which is the average of the x-values used in fitting the model.
These confidence intervals are computed under the assumption that the straight line model is correct, and they may be quite misleading if the actual dependence of E(Y) on x is nonlinear. Even if the model seems to fit the data very well, there is no guarantee that it will apply over a wider range of x-values. It is always dangerous to extrapolate beyond the range of x-values in the sample. For instance, in this example it would be unwise to use the straight line model to estimate the mean blood pressure of women aged 30. We have no observations near x = 30, and we cannot be sure that the same straight line model will hold in this region.

PROBLEMS FOR SECTION 13.6

1. In Problem 13.5.1(a), test the hypothesis that the line goes through the origin, and obtain a 95% confidence interval for the mean strength of bolts with diameter x = 0.9. Repeat using the revised analysis of Problem 13.5.1(c), and compare the results.

2.† Expected energy use in heating a house decreases as the amount of insulation in the attic increases. To a first approximation, the expected energy per degree day is a linear function of x, a rating of the attic insulation. Small values of x indicate a well-insulated attic, and large values of x indicate poor insulation. The following table gives the observed fuel consumption y and insulation rating x for 8 houses of similar construction:

    Insulation rating x   1.4   1.1   0.9   0.7   0.5   0.4   0.3   0.2
    Energy use y          1.56  1.30  1.34  1.12  1.08  1.09  1.05  1.21

(a) Fit a straight line to the data, and calculate the variance estimate. Plot the data and the fitted line. Does the straight line model give a reasonable fit to the data?
(b) Find a 95% confidence interval for the slope of the line.
(c) Find a 95% confidence interval for the mean energy use of such houses with insulation rating x = 0.4.

3. A study was carried out to investigate evaporation loss from packages of food in storage. Nine similar packages were stored for various periods of time, and the weight losses were recorded.

    Days in storage    2    7    9   12   18   23   30   35   40
    Weight loss       15   25   40   65  105  105  136  175  180

(a) Fit a straight line to the data and calculate the variance estimate.
(b) Plot the data and the fitted line, and comment on the adequacy of the straight line model.
(c) Find a 95% confidence interval for the mean weight loss in packages stored for two weeks.

4. Suppose that both of the following models are fitted by least squares to n observed points (xi, yi).

    Model 1: µi = α + βxi;
    Model 2: µi = γ + δ(xi − x̄).

Show that γ̂ = ȳ, δ̂ = β̂, and that both models give the same fitted values µ̂i and residuals êi.

5. Suppose that xi = 0 for n1 observations, and xi = 1 for the other n2 observations (n1 + n2 = n). Show that, in this case, (13.5.2) and (13.5.3) give the appropriate estimates α̂, β̂ for the two-sample model (13.4.2).

6. Fitting a Straight Line Through the Origin. Suppose that Y1, Y2, ..., Yn are independent N(µi, σ²). Consider the model µi = βxi where x1, x2, ..., xn are known constants and β is an unknown parameter.
(a) Show that β̂ = Σ xiyi / Σ xi², and that the sampling distribution of β̂ is N(β, σ²/Σ xi²).
(b) Show that the variance estimate is

    s² = (1/(n − 1))[Σ yi² − β̂ Σ xiyi].

7.† A new procedure is being investigated for measuring calcium content. The following table gives the actual calcium content x for each of ten samples, and the measurement y given by the new procedure.

    x   4.0   8.0   12.5   16.0   20.0   25.0   31.0   36.0   40.0   40.0
    y   3.7   7.8   12.1   15.6   19.8   24.5   31.1   35.5   39.4   39.5

(a) Fit a straight line to the data and calculate the residual sum of squares. Plot the data and the fitted line.
(b) Test the hypothesis that the slope of the line is 1.
(c) Test the hypothesis that the line passes through the origin.
(d) Fit a straight line through the origin to these data, and test the hypothesis that the slope is 1. Why is the result different from that obtained in (b)?

13.7. Analysis of Paired Measurements

In Section 12.8 we considered an example in which drugs A and B were administered to the same n subjects in order to see which was more likely to produce nausea. Because the drugs were given to the same subjects, we could not assume that observations for drug A were independent of observations for drug B.
In Section 12.8 we were concerned only with the absence or presence of an effect. However, similar problems can arise when we are measuring the size of an effect.
Suppose that an experiment yields n pairs of measurements (Ai, Bi) for i = 1, 2, ..., n. Because of the way the data were collected, we expect Ai and Bi to be more alike than Ai and Bj for i ≠ j. For instance, Bi and Ai might be measurements of blood pressure on the same subject before and after the administration of a drug, or measurements of gasoline mileage for the same car before and after a tuneup.
In such situations, it would be incorrect to treat A1, A2, ..., An and B1, B2, ..., Bn as two independent samples. The analysis must take the pairing of the data into account. We shall describe two ways in which this can be done.

Analysis of Differences

The simplest approach is to replace each pair (Ai, Bi) by a suitable summary statistic such as the difference yi = Ai − Bi. Only the yi's are used in the analysis, and it is usually reasonable to assume that they are independent. No assumptions are required about the distributions of the individual measurements Ai and Bi.
The details of the analysis will depend upon what is assumed about the distributions of the yi's. If the yi's are assumed to be observed values of N(µi, σ²) variates, then the methods developed in this chapter will apply. We model the µi's as linear functions of unknown parameters β1, β2, ..., βq as called for by the situation, and choose parameter estimates to minimize

    S = Σ(yi − µi)².    (13.7.1)

There will be n − q degrees of freedom for variance estimation.

EXAMPLE 13.7.1. The following table gives the average number of manhours per month lost due to accidents in eight factories of similar size over a period of one year before and after the introduction of an industrial safety program.

    Factory i         1      2      3      4      5      6      7      8
    After Ai         28.7   62.2   28.9    0.0   93.5   49.6   86.3   40.2
    Before Bi        48.5   79.2   25.3   19.7  130.9   57.6   88.8   62.1
    Difference yi   -19.8  -17.0    3.6  -19.7  -37.4   -8.0   -2.5  -21.9

There is a natural pairing of the data by factory. Factories with the best safety records before the safety program tend to have the best records after the safety program as well. The analysis of the data must take this pairing into account. One way of doing this is to analyze only the differences yi = Ai − Bi.
We assume that the yi's are observed values of independent N(µi, σ²) variates, and that µ1 = µ2 = ··· = µn = α, say. The n = 8 differences are thus analyzed as a one-sample problem (see Section 13.3). We have α̂ = ȳ = −15.34, and the variance estimate is

    s² = (1/(n − 1)) Σ(yi − ȳ)² = 164.11

with n − 1 = 7 degrees of freedom.
The sampling distribution of α̂ is N(α, σ²c) where c = 1/n, and inferences concerning α are based on

    T = (α̂ − α)/√(s²c) ~ t(n−1).

Since P{−2.365 ≤ t(7) ≤ 2.365} = 0.95, the 95% confidence interval for α is

    α ∈ α̂ ± 2.365√(s²/n) = −15.34 ± 10.71.

The 95% confidence interval for the mean decrease in lost manhours per month is (4.63, 26.05). Since zero does not belong to this interval there is evidence of a real decrease in the mean number of lost manhours.
Incorrect Analysis

Suppose that we ignore the pairing of the data, and analyze the original 2n measurements as a two-sample problem (see Section 13.4). We find that

    Ā = 48.68;    B̄ = 64.01;    sA² = 977;    sB² = 1295,

and the pooled variance estimate is

    s² = (7 × 977 + 7 × 1295)/(7 + 7) = 1136

with 14 degrees of freedom.
Let α = µA − µB be the decrease in the mean number of lost manhours. The MLE of α is α̂ = Ā − B̄, with sampling distribution N(α, σ²c), where c = 1/8 + 1/8 = 1/4. Inferences about α are now based on

    T′ = (α̂ − α)/√(s²c) ~ t(14)

where s² is the pooled variance estimate. The 95% confidence interval for α based on the two-sample model is

    α ∈ α̂ ± 2.145√(s²c) = −15.34 ± 36.18.

The interval is now much wider than before, and the decrease in mean number of lost manhours is no longer statistically significant. The variance estimate is inflated by the large differences among factories, and so the effect of the safety program does not show up.

Models with Block Effects

Instead of analyzing differences, we could retain all 2n measurements (Ai, Bi) in the analysis and build a model which allows for the fact that Ai and Bi are likely to be similar. For instance, we could assume that the 2n measurements are independent and normally distributed with the same variance, and take

    E{Bi} = κi;    E{Ai} = κi + µi    for i = 1, 2, ..., n.

The parameters κ1, κ2, ..., κn are the pair or block effects, and µ1, µ2, ..., µn represent the effects of the treatment in the n pairs. Measurements Ai, Bi from the same pair share the parameter κi, whereas measurements Ai, Bj from different pairs have different parameters κi, κj.
The µi's can now be modelled as linear functions of unknown parameters β1, β2, ..., βq as appropriate for the situation. Parameter estimates are found by minimizing the error sum of squares

    S = Σ[(Ai − κi − µi)² + (Bi − κi)²].

There are n + q parameters to estimate from the data, and so there will be 2n − (n + q) = n − q degrees of freedom for variance estimation.
This analysis gives essentially the same results as the analysis of differences. To see this, we note that the solution of ∂S/∂κi = 0 is

    κ̂i = (Ai + Bi − µi)/2.

Upon substituting for κi and simplifying, we find that the minimum of the error sum of squares over the κi's is

    Smin = ½ Σ(yi − µi)²

where yi = Ai − Bi. This is one-half of (13.7.1). As a result, the β̂j's and µ̂i's will be the same as in the analysis of differences. The variance estimates will differ by a factor of 2 because now σ² = var(Ai) = var(Bi) is the variance of a single measurement, whereas previously we used σ² to denote the variance of the difference Ai − Bi.
If Ai and Bi are independent normal, then Ai − Bi has a normal distribution as we assumed in the analysis of differences. However it is possible to have Ai − Bi normally distributed without having either the Ai's or the Bi's normal. The revised model with block effects involves more stringent assumptions than are necessary for the analysis of differences.
An advantage of models with block effects is that they can be constructed for situations in which the measurements occur in blocks of three or more rather than pairs, and it is not necessary that all blocks be of the same size. The analysis of differences does not easily generalize to these situations.

Design of Experiments

Often it is advantageous to design the experiment so that it will produce paired measurements. For instance, suppose that treatments A and B are to be compared using 2n subjects, n for each treatment. Before treatments are assigned, subjects are grouped or blocked into n pairs so that the members of a pair are as similar as possible with respect to potentially important factors like age, weight, prognosis, etc. The treatments are then assigned randomly within each pair, so that for each pair we obtain two measurements Ai, Bi. Using the analysis of differences, or a model with block effects, one can eliminate differences between pairs from the analysis and obtain a more precise comparison of the treatments.

PROBLEMS FOR SECTION 13.7

1. The following table gives the results of a series of measurements of the corrosion of coated and uncoated underground pipes:

    Soil type:    1     2     3     4     5     6
    Coated:      15.6  21.0  22.6  56.8  13.2  20.9
    Uncoated:    10.9  46.7  25.7  69.7  36.7  20.4

    Soil type:    7     8     9    10    11    12
    Coated:       8.6  31.2  25.4   8.5  11.2  35.8
    Uncoated:    29.4  10.2  71.6  42.8  23.9  49.2

Obtain a 99% confidence interval for the mean difference in the amounts of corrosion for the two types of pipe.

2. Two analysts carried out simultaneous measurements of the percentage of ammonia in a plant gas on nine successive days to find the extent of the bias, if any, between their results. Their measurements were:

    Day          1   2   3   4   5   6   7   8   9
    Analyst A    4  37  35  43  34  36  48  33  33
    Analyst B   18  37  38  36  47  48  57  28  42

Obtain a 95% confidence interval for the mean difference in their measurements. On what assumptions does your analysis depend?

3.† Six automobiles of different models were used to compare two brands of tires. Each car was fitted with tires of brand A and driven over a difficult course until one of its tires could no longer be used. Tires of brand B were then fitted to the same cars, and the procedure was repeated. The following are the observed mileages to tire failure in thousands of miles:

    Car        1   2   3   4   5   6
    Brand A   18  23  16  27  19  17
    Brand B   15  22  16  21  15  16

(a) Test whether these data are consistent with the hypothesis that the mean lifetimes for the two brands are equal.
(b) What factor, other than difference in tire quality, might account for the lower mileage achieved with brand B? Suggest an improvement in the design of the experiment which would have helped to eliminate this source of bias.

4. Two methods of treating sewage were compared. Each day for eight days, two similar batches of sewage were selected. One batch was randomly chosen to receive treatment A, and the other received treatment B. The following table shows the coliform density per ml for the sixteen batches after treatment.

    Day     1      2      3      4      5      6      7      8
    A     16.44  22.00  18.17  20.09  11.02  20.09  24.53  13.46
    B     24.53  22.20  29.96  33.12  14.88  18.16  33.12  16.44

(a) Assuming that differences in coliform density are normally distributed, test the hypothesis that the treatments are equally effective.
(b) A more reasonable assumption in this case is that the logarithmic differences log Ai − log Bi are independent N(µ, σ²). Repeat the test in (a) under this assumption.

5. A study was carried out to investigate the effect of trap color on the catch of whiteflies. Two similar traps, one yellow and one green, were hung side by side on each of 8 plants in a greenhouse. The following table shows the weight of whiteflies caught in each trap.

    Plant          1     2     3      4     5     6     7      8
    Yellow trap   20.5  42.7  19.4  100.7  23.9  45.4  99.1  125.9
    Green trap    20.0  38.5  15.5  103.6  18.0  47.9  96.4  126.0

(a) Set up a normal model appropriate for examining the difference in effectiveness of the two trap colors.
(b) Test the hypothesis that yellow and green traps are equally effective in catching whiteflies, and state your conclusions carefully.
(c) Discuss briefly the advantages of conducting the study in the manner described rather than by hanging the 16 traps on 16 different plants.

6. Twenty pigs were grouped into ten pairs in such a way that the two pigs in a pair had nearly equal weight. One pig was randomly chosen from each pair to receive diet X, and the other received diet Y. The following are the observed weight gains per day:

    Pair     1   2   3   4   5   6   7   8   9  10
    Diet X  21  21  19  16  26  19  18  29  22  19
    Diet Y  30  25  25  16  29  18  18  19  24  22

Give a 95% confidence interval for the difference in mean weight gain under the two diets, and state the assumptions on which your analysis is based.

7. An experiment was carried out to determine how the defect rate y in a highway surface depends on the amount x of asphalt cement used in the paving material. Seven samples with known asphalt content were prepared. Each sample was split in two, and two separate tests were made to determine the defect rate.

    Asphalt content    50   75  100  125  200  250  275
    Defect rate       195  172  164  175  145  115  108
                      197  175  163  177  147  115  109

(a) Using all 14 observations (x, y), fit a straight line to the data. Plot the data and the fitted line. Obtain a 95% confidence interval for the mean defect rate when x = 100 and show it on the graph.
(b) The graph in (a) suggests that it is not appropriate to model the two measurements on the same sample as independent. Instead, it is better to replace the two measurements by their average. Redo the analysis in (a) using the seven observed pairs (x, ȳx), and compare the results.

8.† A new technique for determining the fraction x of a given gas in a mixture of gases was investigated. Eleven gas mixtures with known x were prepared, and each of them was divided into three portions. For each portion, the quantity y of the gas which dissolved in a liquid was recorded.
    x = content   y = amount dissolving     x = content   y = amount dissolving
    0.080         2.67  2.68  2.75          0.131         4.46  4.40  4.43
    0.082         2.73  2.69  2.62          0.139         4.78  4.80  4.86
    0.091         2.88  3.02  3.04          0.164         5.77  5.85  5.82
    0.095         3.17  3.28  3.18          0.189         6.56  6.65  6.49
    0.096         3.27  3.28  3.08          0.231         7.88  7.97  7.76
    0.106         3.51  3.68  3.58

(a) Using all 33 observed pairs (x, y), fit a straight line model to the data. Find a 95% confidence interval for the expected amount dissolving in mixtures with x = 0.1.
(b) The analysis in (a) assumes that the 3 measurements taken at each value of x are independent replicates. This is a questionable assumption because these measurements were obtained by dividing one gas mixture into three portions rather than by preparing three different mixtures with the same x. One way around this difficulty is to replace the three repeat observations at each x by their average ȳx. Repeat (a) using the 11 observed pairs (x, ȳx), and compare the results.

REVIEW PROBLEMS FOR CHAPTER 13

1. Two experiments were carried out to determine µ, the mean increase in blood pressure due to a certain drug. Six different subjects were used, three in each experiment, and the following increases were observed:

    Experiment 1:   4.5   5.6   4.9
    Experiment 2:  -1.2   9.8  21.4

Indicate, with reasons, which experiment produces stronger evidence that the drug does have an effect on blood pressures. Which experiment points to the greater effect?

2.† Fourteen men were used in an experiment to determine which of two drugs produces a greater increase in blood pressure. Drug 1 was given to seven of the men chosen at random, and drug 2 was given to the remaining seven. The observed increases in blood pressure are:

    Drug 1:   0.7  -0.2   3.4   3.7   0.8   0.0   2.0
    Drug 2:   1.9   1.1   4.4   5.5   1.6   4.6   3.4

(a) Are these data consistent with the hypothesis of equal variances in blood pressure for the two drugs?
(b) Assuming the variances to be equal, obtain a 95% confidence interval for the difference in mean blood pressure increase µ2 − µ1, and for the common variance σ².
(c) It is possible that the increase in blood pressure with both drugs may depend upon the initial blood pressure of the subject. How should the design of the experiment and the analysis be modified to allow for this possibility?

3. The following are yields (in pounds) of 16 tomato plants grown on 8 separate uniform plots of land. One plant in each plot was treated with fertilizer A and the other with fertilizer B.

    Plot           1    2    3    4    5    6    7    8
    Fertilizer A  4.0  5.7  4.0  6.9  5.5  4.6  6.5  8.4
    Fertilizer B  4.8  5.5  4.4  4.8  5.9  4.2  4.4  6.3

Test the hypothesis that the two fertilizers are equally effective, and state the assumptions upon which the test is based.

4. A study was carried out to investigate the dependence of fuel oil consumption on the mean atmospheric temperature. The following are the results observed on ten winter days.

    Temperature   -3   -2  -10   +1   -5   -6  -15   -4   -9   -2
    Consumption  150  141  238  132  186  168  218  163  210  169

(a) Fit a straight line to the data and calculate the variance estimate. Plot the fitted line and the data, and comment on any difficulties.
(b) Obtain 90% confidence intervals for the intercept, and for the mean fuel consumption on days when the mean temperature is -5.

5. An experiment was performed to compare two different methods of measuring the phosphate content of material. Ten samples were chosen so that the material within a sample was relatively homogeneous. Each sample was then divided in half, one half being analysed by method A and the other half by method B.

    Sample      1     2     3     4     5     6     7     8     9    10
    Method A   55.6  62.4  48.9  45.5  75.4  89.6  38.4  96.8  92.5  98.7
    Method B   58.4  66.3  51.2  46.1  74.3  92.5  40.2  97.3  94.8  99.0

Find a 95% confidence interval for the mean difference in phosphate content as measured by the two methods, and state the assumptions upon which your analysis depends.

6.† In a progeny trial, the clean fleece weights of 9 ewe lambs from each of four sires were as follows:

    Sire 1:  2.74  3.50  3.22  2.98  2.97  3.47  3.47  3.68  4.22
    Sire 2:  3.88  3.36  4.29  4.08  3.90  4.71  4.25  3.41  3.84
    Sire 3:  3.28  3.92  3.66  3.47  2.94  3.26  3.57  2.62  3.76
    Sire 4:  3.52  3.54  4.13  3.29  3.26  3.04  3.77  2.88  2.90

(a) Test the hypothesis that the variance in fleece weight is the same for all four sires.
(b) Assuming the variances to be equal, obtain a 95% confidence interval for the common variance σ².
CHAPTER 14

Normal Linear Models

In Chapter 13 we considered some simple models for normally distributed measurements. The basic assumptions for these models were discussed in Section 13.1, and some statistical methods were described in Section 13.2. All of the models considered in Chapter 13 are special cases of the normal linear model, which is the subject of this chapter. Section 14.1 describes matrix notation for linear models and gives several examples. Section 2 considers the estimation of parameters in linear models, and likelihood ratio tests are derived in Section 3. Section 4 gives some further discussion of the statistical methods described in Section 13.2. Section 5 describes some graphical procedures for checking the adequacy of the model. The distributions of the residual sum of squares and the additional sum of squares due to a linear hypothesis are derived in Section 6.

14.1. Matrix Notation

As in Chapter 13, we consider n measurements y1, y2, ..., yn of the same quantity taken under various different conditions. We assume that the yi's are observed values of independent random variables Y1, Y2, ..., Yn, where

    Yi ~ N(µi, σ²)    for i = 1, 2, ..., n.    (14.1.1)

See Section 13.1 for discussion of these basic assumptions.
The basic model (14.1.1) involves n + 1 unknown parameters µ1, µ2, ..., µn and σ, but we have only n observations. Before we can estimate σ, we must reduce the number of unknown parameters. We do this by writing the µi's as functions of q unknown parameters β1, β2, ..., βq where q < n. We then have effectively n − q observations available for estimating σ, and we say that there are n − q degrees of freedom for variance estimation.
The model is called linear if each of the µi's may be written as a linear function of the unknown parameters β1, β2, ..., βq. In a linear model, all of the partial derivatives ∂µi/∂βj are known constants. The one-sample, two-sample, and straight line models are examples of linear models with q = 1, 2, 2, respectively.
A linear model can be described by a set of n linear equations:

    µi = xi1β1 + xi2β2 + ··· + xiqβq    for 1 ≤ i ≤ n.    (14.1.2)

This is also called a multiple regression model. The βj's are unknown parameters, and xi1, xi2, ..., xiq are known constants which describe the conditions under which the ith observation is made. The xij's may be values of quantitative variables such as temperature or age, or values of 0-1 indicator variables, or a mixture of these.
The n linear equations (14.1.2) can be represented by a single matrix equation

    µ = Xβ    (14.1.3)

where µ is n × 1, β is q × 1, and X is n × q:

    µ = (µ1, µ2, ..., µn)',    β = (β1, β2, ..., βq)',    X = (xij).

X has one row for each observation Yi, and one column for each unknown parameter βj. To obtain X, we just write out the n equations (14.1.2) one below the other and detach the coefficients of the βj's.
We shall assume that the q columns of X are linearly independent. If they were not, it would be possible to rewrite the n equations using only q − 1 of the unknown parameters β1, β2, ..., βq.
The remainder of this section describes a few of the many situations covered by linear models. In particular, all of the models considered in Chapter 13 are linear models. Results derived for linear models in the following sections are applicable to all of these situations and many others as well.

Straight Line Model (Section 13.5)

The n equations defining the straight line model are

    µ1 = β1 + β2x1 = 1·β1 + x1·β2
    µ2 = β1 + β2x2 = 1·β1 + x2·β2
    ...
    µn = β1 + β2xn = 1·β1 + xn·β2.
These can be written in the form µ = Xβ where

    µ = (µ1, µ2, ..., µn)',    β = (β1, β2)',    X = [ 1  x1 ]
                                                     [ 1  x2 ]
                                                     [ ...    ]
                                                     [ 1  xn ]

Polynomial Model

As a generalization of the straight line model, one might consider a second-degree polynomial model

    µ1 = β1 + x1β2 + x1²β3
    µ2 = β1 + x2β2 + x2²β3
    ...
    µn = β1 + xnβ2 + xn²β3.

Here we have µ = Xβ where

    µ = (µ1, µ2, ..., µn)',    β = (β1, β2, β3)',    X = [ 1  x1  x1² ]
                                                         [ 1  x2  x2² ]
                                                         [ ...         ]
                                                         [ 1  xn  xn² ]

Similarly, cubic and higher degree polynomials can be written in the form µ = Xβ for a suitable choice of X, and are examples of linear models. The components of X may be any known constants, such as 0, 1, xi, xi², log xi, sin xi, and so on. The model is still a linear model if the µi's are linear functions of the unknown parameters β1, β2, ..., βq.

One-Sample Problem (Section 13.3)

In the one-sample problem, we assume that the n means µ1, µ2, ..., µn are all equal to the same unknown value β1, say. Thus the n equations are

    µ1 = 1·β1;    µ2 = 1·β1;    ...;    µn = 1·β1,

and we have µ = Xβ where β = (β1) and X = (1, 1, ..., 1)'. In this case X is an n × 1 matrix whose components are all equal to 1.

Two-Sample Problem (Section 13.4)

Here we assume that m of the means are equal to β1, say, and the other n − m are equal to β2. We can write this in the form µ = Xβ where β = (β1, β2)' and X is the n × 2 matrix whose first m rows are (1, 0) and whose remaining n − m rows are (0, 1).

Weighing Experiment (see Example 10.1.1)

Suppose that three objects with unknown weights β1, β2, and β3 are weighed on a set of scales in all possible combinations, giving 7 independent measurements Y1, Y2, ..., Y7. We assume that the Yi's are independent N(µi, σ²), where

    µ1 = β1;    µ2 = β2;    µ3 = β3;
    µ4 = β1 + β2;    µ5 = β1 + β3;    µ6 = β2 + β3;
    µ7 = β1 + β2 + β3.

This model has the form µ = Xβ where X is a 7 × 3 matrix with 0, 1 entries. Its transpose is

    X' = [ 1  0  0  1  1  0  1 ]
         [ 0  1  0  1  0  1  1 ]
         [ 0  0  1  0  1  1  1 ]
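To make the correspondence between a set of model equations and the matrix X concrete, here is a brief sketch (not from the text; it assumes Python with numpy) that builds X for the weighing experiment above and checks that Xβ reproduces the seven means for an arbitrary choice of weights.

    # Illustrative sketch (not from the text): the 7 x 3 design matrix X for the
    # weighing experiment, and a check that X @ beta gives the seven means.
    import numpy as np

    X = np.array([[1, 0, 0],    # object 1 alone
                  [0, 1, 0],    # object 2 alone
                  [0, 0, 1],    # object 3 alone
                  [1, 1, 0],    # objects 1 and 2
                  [1, 0, 1],    # objects 1 and 3
                  [0, 1, 1],    # objects 2 and 3
                  [1, 1, 1]])   # all three objects together

    beta = np.array([10.0, 20.0, 30.0])   # hypothetical weights, for illustration only
    print(X @ beta)                       # mu = X beta: [10, 20, 30, 30, 40, 50, 60]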

Parallel Line Model 4. A standard treatment A and three new treatments B, C, Dare to be compared in an
experiment using 3 mice from each of four litters. Twelve measurements I; are to be
Suppose that for the first m observations we wish to assume a straight line taken according to the following scheme.
modelµ; fJ 1 + fJ 3 x;, and that for the remaining n m observations we wish
to assume another straight line model µ; = fJ 2 + {J 3 x; which has the same Measurement No. 1 2 3 4 5 6 7 8 9 10 11 12
slope but a different intercept. This model can be written µ = X fJ where the Litter No. 1 2 3 4 1 2 3 4 1 2 3 4
transpose of X is
Treatment A A A B B B c c c D D D

Set up a linear model similar to that in the preceding problem, and write it in

X'~l ~ ~J
0 0
matrix notation.
0 0
X1 Xz Xm Xm+! Xm+2 x.

14.2. Parameter Estimates


PROBLEMS FOR SECTION 14.1
Lt Consider six measurements Y1, Y2, ... , Y6 with expected values µ 1, µz, ... , µ5. Four The linear models considered in Chapter 13 were simple enough so that we
. possible linear models are described below. In each case define a matrix X with
could obtain an algebraic formula for each of the parameter estimates. With
linearly independent columns and a parameter vector /3 such that the model can be
written in the form µ = X /J. more complicated models, the estimates are usually determined numerically
by computer using matrix arithmetic.
(a) µ 1 = µ 2 = µ 3 and µ 4 = µ 5 = µ6 As in Section 14.1, we suppose that Y1 , Y2 , ••• , Y,, are independent
(b) µ;= /3 1 + i/3 2 for i::;; 3 andµ;= /3 1 + i/3 3 for i~4 N(µi> 0" 2 ), and consider the linear model
(c) µ 1 = µ 2 and µ, + µ4 = µ5 + µ6
(d) µ1 = µ2 = µ, for i= 1, 2, ... , n.
2. Consider n measurements ¥1, Y2, ... , Y. with expected values µ 1,µ 2, ... ,µ •. A This can be written in matrix notation as µ = X fJ where X is n x q. We
straight line model µ1 = /3 1 + f3 2 x; is to be assumed for the first m measurements,
assume that the q columns of X are linearly independent (see Section 14.1).
and a different straight line model µ 1 = /3 3 + f3 4 x; is to be assumed for the
We noted in Section 13.2 that, under these assumptions, the log-likelihood
remaining n - m measurements.
function is
(a) Define a matrix X such that the model can be written in the formµ= XfJ.
(b) Show that, by adding two columns of X in (a), one can obtain X for two 1
l = - n log O" 0" 2 S
straight lines with equal slopes or with equal intercepts. 2

3. A standard treatment A and two new treatments B, C are to be compared in an where S is the error sum of squares,
experiment using 3 mice from each of four different litters. Twelve measurements Y;
are to be taken according to the following scheme: S = "i:,e[ = "i:,(y; µ;)2.

Measurement No. 2 3 4 5 6 7 8 9 10 11 12
The MLE's Pr.'/1 2 , ••• , pq are chosen to minimize S. The fitted values and
residuals are defined by
Litter No. 1 2 3 4 1 2 3 4 1 2 3 4
Treatment A A A A B B B B c c c c fl;=xil'/11 +x12P2 + ··· +x,q'/Jq;
It is assumed that il;=y;-[l;.

E{Y;} = a 1; E{Y;+4}=a;+l'1; E{Y;+ 8 }=a;+l'2 y


To put this in matrix notation, we let = (y;), µ =(fl;), and = (e 1) be n x 1 e
for i = 1, 2, 3, 4. Here y1 and y2 represent the effects of treatments B and C relative p
vectors, and let = ('/Jj) be the q x 1 vector of parameter estimates. Then we
to the standard treatment A. Define a matrix X and parameter vector /3 such that have
the model can be written in the form µ = X /J. (14.2.1)
Derivation of β̂

The derivative of S with respect to βj is

    ∂S/∂βj = 2Σ εi ∂εi/∂βj = −2Σ εi ∂µi/∂βj = −2Σ εi xij.

Thus β̂1, β̂2, ..., β̂q satisfy the q simultaneous equations

    Σ êi xij = 0    for j = 1, 2, ..., q.

These q equations are equivalent to the matrix equation

    X'ê = 0.    (14.2.2)

Substituting ê = y − Xβ̂ gives

    X'(y − Xβ̂) = 0.

It follows that

    X'Xβ̂ = X'y.    (14.2.3)

This is a set of q linear equations in β̂1, β̂2, ..., β̂q.
Since X is n × q, the product X'X is q × q. It can be shown that, since X has linearly independent columns, the product X'X is nonsingular, and its inverse (X'X)⁻¹ exists. Multiplying (14.2.3) by (X'X)⁻¹ gives

    β̂ = X^L y    (14.2.4)

where X^L = (X'X)⁻¹X'.
The matrix X^L is q × n, and it has the property that

    X^L X = (X'X)⁻¹X'X = Iq

where Iq is the q × q identity matrix. Thus X^L is a left inverse of X. Note that

    X X^L = X(X'X)⁻¹X'

which is n × n and will not equal In unless X is a square matrix (q = n).

Computation

Calculations for linear models are usually done by computer. The main labor is in finding the q × n matrix X^L = (X'X)⁻¹X'. From this we can easily get β̂ = X^L y, µ̂ = Xβ̂, and ê = y − µ̂. Squaring and summing the n components of ê gives the residual sum of squares Σ êi². The variance estimate is then

    s² = (1/(n − q)) Σ êi²

with n − q degrees of freedom. Various other quantities useful for assessing the adequacy of the model or for testing hypotheses about the βj's can also be obtained from X^L (see Sections 14.4 and 14.5).
To analyze one of the examples from Chapter 13 in this way, we must define the appropriate matrix X as indicated in Section 14.1. For instance, in Example 13.5.1 a straight line model µi = α + βxi is to be fitted to n = 12 observations. We take y to be the 12 × 1 vector of observed blood pressure measurements 147, 125, ..., 155, and X to be the 12 × 2 matrix with transpose

    X' = [ 1   1   ...  1   ]
         [ x1  x2  ...  x12 ]

Then X^L y = (X'X)⁻¹X'y is a 2 × 1 vector whose components are α̂ and β̂.
The computer language APL is particularly convenient for linear models because it has a built-in operator ⌹ for handling the necessary calculations. Having defined an n × q matrix X and a list of n y-values, one enters ⌹X to obtain X^L, or Y⌹X to obtain β̂ = X^L y. Alternatively, statistical software packages such as SAS, SPSS, BMDP, and GLIM may be used for fitting linear models.

EXAMPLE 14.2.1. Data were collected to investigate how the amount of fuel oil required to heat a home depends upon the outdoor air temperature and wind velocity. Table 14.2.1 contains the results for n = 10 winter days.

Table 14.2.1. Fuel Consumption (y), Temperature (t), and Wind Velocity (v) on Each of Ten Winter Days

    Day      y       t      v
    1      14.96    -3.0   15.3
    2      14.10    -1.8   16.4
    3      23.76   -10.0   41.2
    4      13.20     0.7    9.7
    5      18.60    -5.1   19.3
    6      16.79    -6.3   11.4
    7      21.83   -15.5    5.9
    8      16.25    -4.2   24.3
    9      20.98    -8.8   14.7
    10     16.88    -2.3   16.1

We expect fuel consumption to increase as the wind velocity v increases, and to decrease as the temperature increases. As a first approximation, we assume that these changes are linear, and that the effect of wind velocity is the same at all temperatures. Thus the yi's are assumed to be observed values of independent N(µi, σ²) variates, where

    µi = β1 + β2ti + β3vi    for i = 1, 2, ..., 10.

Here β2 is the effect on mean fuel consumption of a unit increase in temperature assuming that the wind velocity is held fixed, and β3 is the effect on mean consumption of a unit increase in wind velocity with the temperature fixed. The "general constant term" β1 represents the mean fuel consumption when t = v = 0.
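The same fit can be obtained in any language with matrix arithmetic. The following sketch is not part of the text; it assumes Python with numpy, and solves the normal equations (14.2.3) for the fuel consumption data.

    # Illustrative sketch (not from the text): least squares fit of
    # mu_i = b1 + b2*t_i + b3*v_i for the data of Example 14.2.1.
    import numpy as np

    y = np.array([14.96, 14.10, 23.76, 13.20, 18.60, 16.79, 21.83, 16.25, 20.98, 16.88])
    t = np.array([-3.0, -1.8, -10.0, 0.7, -5.1, -6.3, -15.5, -4.2, -8.8, -2.3])
    v = np.array([15.3, 16.4, 41.2, 9.7, 19.3, 11.4, 5.9, 24.3, 14.7, 16.1])

    X = np.column_stack([np.ones_like(t), t, v])    # 10 x 3 design matrix
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # (14.2.3): X'X beta = X'y
    resid = y - X @ beta_hat                        # e = y - X beta_hat
    n, q = X.shape
    s2 = resid @ resid / (n - q)                    # variance estimate, n - q d.f.
    print(beta_hat)                                 # approx. [11.93, -0.63, 0.13]
    print(s2)                                       # approx. 1.5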
The model can be written as µ = Xβ where X is the 10 × 3 matrix shown in Figure 14.2.1. The left inverse

    X^L = (X'X)⁻¹X'

was obtained by computer using the APL operator ⌹. Its transpose has the same shape as X and is shown rounded to four decimal places in Figure 14.2.1. The vector of parameter estimates β̂ = X^L y is found next, and the fitted model is

    y = 11.934 − 0.6285t + 0.1298v.

Next we obtain the vector of fitted values µ̂ = Xβ̂, and the vector of residuals ê = y − µ̂. We can then find the residual sum of squares, Σ êi² = 10.533. The variance estimate is s² = (1/7)Σ êi² = 1.505 with 10 − 3 = 7 degrees of freedom.

[Figure 14.2.1. Calculations for the fuel consumption example: the matrices X and (X^L)' (rounded to four decimal places), and the vectors y, β̂, µ̂, and ê.]

PROBLEMS FOR SECTION 14.2

1. Show that

    Σ êi² = Σ yi² − β̂'(X'y).

This formula is useful when calculations are to be done by hand, but it is susceptible to roundoff errors.

2.† Set up the straight line model of Example 13.5.1 in matrix notation. Calculate (X'X)⁻¹ and X'y, and hence obtain the parameter estimates. Use the formula in Problem 1 to obtain the residual sum of squares.

3. Set up the 3-sample model of Example 13.4.3 in matrix notation. Calculate (X'X)⁻¹ and X'y, and hence obtain the parameter estimates. Use the formula in Problem 1 to find the residual sum of squares.

4. The following measurements are from the weighing experiment described in Section 14.1.

    Objects weighed    1     2     3    1&2   1&3   2&3   1&2&3
    Measurement      12.5  23.9  28.1  31.5  41.9  49.5   61.7

(a) Evaluate X'X and X'y, and show that

    (X'X)⁻¹ = [  0.375  -0.125  -0.125 ]
              [ -0.125   0.375  -0.125 ]
              [ -0.125  -0.125   0.375 ]

Hence show that the estimated weights are

    β̂1 = 11.875    β̂2 = 21.375    β̂3 = 28.675.

(b) Use the formula in Problem 1 to evaluate Σ êi², and show that s² = 3.08375 with 4 degrees of freedom.

5.† The yield Y of a chemical process was measured at each of nine different temperatures ti with the following results:

    ti   15    16    17    18    19    20    21    22    23
    yi   90   91.9  90.7  87.9  86.4  82.5  80.0  76.0  70.0

Consider the 2nd degree polynomial model

    µi = β1 + (ti − 19)β2 + (ti − 19)²β3    for i = 1, 2, ..., 9.

(a) Write this model in the form µ = Xβ. Calculate the parameter estimates and the residual sum of squares.

    Note:  [  9   0   60 ]⁻¹    [  0.255411   0          -0.021645 ]
           [  0  60    0 ]   =  [  0          0.016667    0        ]
           [ 60   0  708 ]      [ -0.021645   0           0.003247 ]

(b) Use the fitted model to estimate


(i) the expected yield µwhen r = 17.5; and so the MLE of a for a given value of fJ is
(ii) the temperature t for which the expected yield is greatest.
(c) Plot the data and the fitted model. Does the model appear to give a good fit to
a-2(/J) = !n S(/J).
the data?
Let pbe the MLE of fJ under the modelµ= X{J, and let e= Y - X'[J. Then
6. Consider the linear model µ = X /J, and suppose that the columns of X are mutually
orthogonal: 0- 2 = ~s(p) = ~Lef,
for j #- k. n n

Let ci denote the sum of squares of elements in the jth column: and the maximum of the log likelihood is

Prove the following results:


Now let H be an hypothesis which expresses the q parameters f3 1 , /3 2 , ... , /Jq
as functions of p new parameters y 1 , y 2 , • . ., y P' where p < q. If the y/s are
functionally independent, there are q - p degrees of freedom for testing H (see
Section 12.3).
Assuming H to be true, we can find the MLE's y1 , y2, ... , yP, and use these
7. Suppose that a linear model contains a general constant (intercept) term /J 1 , so that to compute P1.
P2 •..., pq and e= y - xp. The new MLE of a 2 is
for i = 1, 2,' .. ., n. a = ~s(p)
2
n
= !L.sf,
n
Show that, in this case, the residuals £, must sum to zero.
and the maximum log likelihood under H is
I
a2 S(p) = -2n logo-_ -2n ·
i7J _ _ 7J 2
l\)J, a)= -n log 11-
14.3. Testing Hypotheses in Linear Models 2
The likelihood ratio statistic for testing H is twice the difference between
In this section we shall derive the likelihood ratio statistic for testing an the two maximum log likelihoods:
hypothesis about the parameters /3 1 , {3 2 , •• ., {3q in a linear model.
D = 2[1(/J, &) - l(p, a)]= n log (a 2 /(J 2 ).
As in the preceding section we suppose that Y1 , ¥ 2 , .. . , Y,. are independent
N(µ;, 11 2 ), and that It follows that

µ,=x, 1 /3 1 +x; 2 /3 2 + ··· +x,q/Jq fori=l,2, .. .,n. D = n log(:Eef /Lef} = n log[ 1 + L.~f J (14.3.1)
In matrix notation this is µ = X fJ where X is an n x q matrix with linearly
where Q is the increase in the residual sum of squares due to the hypothesis:
independent columns. The model involves q + I \unknown parameters
f3 1 , fJ 2 , •• ., fJ q and 11. By ( 13.2.1) the log likelihood function is Q = L.ef- L.ef. (14.3.2)

I Q is called the additional sum of squares due to H. Since D 2 0, it follows that


l(fJ, 11) = - n log a - a S(/3) QzO.
2 2
There are q - p degrees of freedom for testing H, and so (12.3.2) gives
where S(/J) is the error sum of squares:
SL:::::: P{xfq-pl 2 Dobs}·
S(/3) = L.ef = L.(y, - µ;) 2 •
This approximation will be accurate whenever n is much larger than q.
Note that
In what follows, we shall consider the special case of a linear hypothesis H.
81 n I We shall see that, when H is linear, the significance level can be computed
- + -S(/J)
aa = - -
(J
3
11 exactly.
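Before specializing to linear hypotheses, the general recipe just described can be summarized in a few lines of code. The sketch below is not part of the text; it assumes Python with numpy and scipy, and it computes D from (14.3.1)-(14.3.2) together with the chi-square approximation to the significance level.

    # Illustrative sketch (not from the text): likelihood ratio statistic (14.3.1)
    # and its approximate significance level, given the residual sums of squares
    # for the full model and for the model restricted by H.
    import numpy as np
    from scipy.stats import chi2

    def lr_test(rss_full, rss_H, n, q, p):
        """D = n log(1 + Q / rss_full), with q - p degrees of freedom for H."""
        Q = rss_H - rss_full                 # additional sum of squares (14.3.2)
        D = n * np.log(1.0 + Q / rss_full)   # (14.3.1)
        SL = chi2.sf(D, q - p)               # chi-square approximation
        return D, SL

    # Fuel consumption model of Example 14.2.1 (n = 10, q = 3) with and without
    # the wind-velocity term; the constrained residual sum of squares 24.944 is
    # derived in Example 14.3.1 below.  With n this small the approximation is
    # rough, and the exact F test described next is preferable.
    print(lr_test(10.533, 24.944, n=10, q=3, p=2))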

Testing Linear Hypotheses

Let H be an hypothesis which expresses β1, β2, ..., βq as functions of p new parameters γ1, γ2, ..., γp, where p < q. H is called a linear hypothesis if the βi's can be written as linear functions of the γj's:

    βi = bi1γ1 + bi2γ2 + ··· + bipγp    for i = 1, 2, ..., q.

In matrix notation, the linear hypothesis is

    H: β = bγ

where b is a q × p matrix of constants. We can assume that the columns of b are linearly independent, since otherwise it would be possible to rewrite the βi's as functions of only p − 1 of the γj's.
If H is true, then

    µ = Xβ = Xbγ = Wγ

where W = Xb is n × p. Under H we have another linear model µ = Wγ, and so the MLE γ̃ can be found as in Section 14.2.
We shall show in Section 14.6 that, if H is a linear hypothesis, then Q is distributed independently of the smaller residual sum of squares Σ êi², and

    Q/σ² ~ χ²(q−p)    if H is true;
    Σ êi²/σ² ~ χ²(n−q).

If H is true, the quantity Q/Σ êi² in (14.3.1) is distributed as a ratio of independent χ² random variables.
To obtain a quantity whose distribution is tabulated, we divide each χ² variate by its degrees of freedom before taking their ratio. Thus we consider the F-statistic:

    F = [Q/σ² ÷ (q − p)] / [Σ êi²/σ² ÷ (n − q)] = [Q ÷ (q − p)] / s².

It follows by (6.10.1) that, if H is true, then F has a variance ratio (F) distribution with q − p numerator and n − q denominator degrees of freedom:

    F = [Q ÷ (q − p)] / s² ~ F(q−p, n−q).    (14.3.3)

The numerator of F is the variance estimate based on the additional sum of squares. The denominator is the variance estimate s² = Σ êi²/(n − q) for the model µ = Xβ.
Note that the likelihood ratio statistic D is an increasing function of F:

    D = n log[1 + Q/Σ êi²] = n log[1 + (q − p)F/(n − q)].

Large values of D correspond to large values of F, and so

    SL = P{D ≥ Dobs | H is true}
       = P{F ≥ Fobs | H is true}
       = P{F(q−p, n−q) ≥ Fobs}    (14.3.4)

which can be evaluated using Table B5.

EXAMPLE 14.3.1. Consider the fuel consumption data of Example 14.2.1. We fitted the linear model µi = β1 + β2ti + β3vi and obtained Σ êi² = 10.533 and s² = 1.505 with n − q = 10 − 3 = 7 degrees of freedom.
Consider the hypothesis H: β3 = 0, which states that wind velocity v has no effect on the mean fuel consumption. The hypothesized model is µi = β1 + β2ti, which is another linear model with two unknown parameters (p = 2). We can fit this model to the data as in Section 13.5, or we can omit the last column of X and repeat the calculation of Example 14.2.1. In either case, we find that the residual sum of squares is Σ ẽi² = 24.944 with n − p = 8 degrees of freedom.
The additional sum of squares due to H is then

    Q = Σ ẽi² − Σ êi² = 24.944 − 10.533 = 14.411

with q − p = 1 degree of freedom.
To test H: β3 = 0 we compute the observed value of the F-statistic:

    Fobs = (Q ÷ 1)/s² = 14.411/1.505 = 9.58.

The significance level is then

    SL = P{F(1,7) ≥ 9.58} ≈ 0.02

from Table B5. Thus there is evidence that wind velocity does have an effect on mean fuel consumption, and the wind velocity term should be kept in the linear model.
When q − p = 1 as in the present example, it is also possible to test H by the method described in Section 13.2. In the next section we will show that these two methods of testing H will always give the same significance level.
256
14. Normal Linear Models
14.3. Testing Hypotheses in Linear Models
Now consider the hypothesis 257
Table 14.3. 1. Sample Mea ns and Vari
H:µ 1=µ 2=µ 3 ances for the Plastic Gea r Data
which state s that the mea n pene trati on Sample Temp. Sample Sample
is the same for all three nozzle sizes . No. (i) Sample 41
Ther e are q - p = 2 degrees of freed X1 size ni mean y, variance sf
om for testing H. Und er H, the 27 n1 - l
obse rvati ons form a single sample, and
the resi.d ual sum of squares is 1 -16 4 1.755 0.006 4 3
:E:Ee fj = :E:E(yij - f) 2 = 4886.17 2 0 4 1.569
3 0.0058 3
10 4 1.094 0.0101
with 26 d.f., where y is the gran d mea 4 21 3
n of all 27 observations. 8 0.808 0.0188
The addi tiona l sum of squa res due to 5 30 7
H is 4 0.494 0.0265
6 37 4 3
Q = :E:Ee?j - r.:Ee0 = 1526. t 7 0.515 0.013 4 3
7 47 4
r 0.253 0.0353 3
with 2 degrees of freedom. The observed 8 57 4 -0.35 9
value of the F-statistic is 9 0.006 0 3
67 4 ,., -0.68 7 0.0661 3
Q-;-2 1526.17-;-2
Fobs =~= 140 = 5.45.
Now Tabl e BS gives with :E(ni - 1) = n - 9 = 31 degrees of
freedom. The (pooled) variance est-
SL= P{ F 2 , 24 ~ 5.45}:::; 0.01. imate for the 9-sample model is
Ther e is stron g evidence agai nst the
hypothesis of equa l means. The smal
nozzle diam eter gives a significantly l l :E ·2
s 2 = 31
lower mean penC<tration than the :Ecij = 0.02066.
med ium or large di(lmeter.
It wou ld have been possible to desig Now consider the hypothesis
n a test specifically for the purp ose of
dete rmin ing whe ther mea n pene trati on
is significantly less for small diameter H : µ;=a .+ {Jxi
nozzles. For instance, the test statistic for i = 1, 2, ... , 9.
D = t(Y2 + Y3 ) - Y1 could be used. It
would not be valid to do this unless this This is a linear hypothesis which reduces
parti cula r type of depa rture had been the num ber of unkn own para mete rs
antic ipate d prio r to exam inati on of by 7, so there are 7 degrees of freedom
the data . See the discussion on tests for testing H. Und er H we have a
suggested by the data in Section 12. 1. strai ght line model, and from Example
13.5.2 the residual sum of squa res is
EXAMPLE 14.3.3. Table 13.5. 1 gives the :E:Eii& = 1.24663
log lifetimes of 40 plastic gears tested
at 9 different temp eratu res x , x , .. ., with n - 2 = 38 degrees of freedom. The
1 2 x 9 • A strai ght line model was fitted to addi tiona l sum of squa res due to His
these data in Example 13.5.2. It is
also possible to ignore the values of
x 1 , x 2 , .. ., x 9 and analyze the data as Q = :E:Ee&- :E:Ee& = o.60617
nine inde pend ent samples. Usin g the
residual sums of squa res for these two with 7 degrees of freedom .
models, we can test the adequacy of the
strai ght line model. To test H, we com pute
Let yij be the log lifetime for thej th gear
tested at.Jhe ith temperature. We
assume that the yiJ's are inde pend ent Q -7- 7 0.60617 -7- 7
N(µij> 0' 2 ), and that Fobs= -;z = 0.0
2066 = 4.19 .
fori =l,2 , ... ,9.
Now Table BS gives
This is the 9-sample model. From Sect
ion 13.4, the MLE of µi is Ji, the sample
mea n for the ith sample. The residual SL~ P{F 7 , 31 ~4. 19} < 0.01.
sum of squa res is
:E:Ee& = :E:E(yij - y;}1 = :E(ni - l)sr The test gives very stron g evidence agai
nst H: µi =a.+ fJxi.
The reason for this result is appa rent
where si, s~, .. ., s~ are the nine samp temperatures all of the obse rvati ons
from Figu re 13.5.2, where at several
le variances. Using the results from lie on the same side of the fitted line.
Tabl e 14.3.1 we get Ther e is no simple patte rn to the depa
rture s from the line, and it is not
:E:Ee& = o.64046 obvious how the strai ght line model coul
d be altered to give a satisfactory fit.
T..1e most likely expl anat ion of the smal
l significance level is that, owing to
258 14. Normal Linear Models 14.3. Testing Hypotheses in Linear Models
259

2
the way the experiment was performed, the variance estimate s in the where Y1 is the mean of the n; observations at x = x,. Use this formula to check
denominator of Fobs is too small. A considerable amount of time and effort the value for Q m Example 14.3.3.
was required to reset the test machine from one temperature to another. To
4. In Problem 13.4.9, test the hypothesis that the mean distance traveled is the same
save time, the experimenter sometimes ran two or more tests at the same for all fuels, and state the assumptions upon which this test is based.
temperature without resetting the test machine. Repeat measurements
obtained without resetting will likely show less scatter than would be 5. t Several. chemical analyses of samples of a product were performed on each of four
obtained if the machine were reset each time. These repeats do not reflect all successive d.ays, and the following table gives the percentage impurity found in
each analysis.
possible sources of variability in the experiment, and consequently the
variance estimate we obtained is likely to be too small. We do not have a Day 1: 2.6 2.6 2.9 2.0 2.1 2.1 2.0
Day2: 3.1 2.9 3.1 2.5
valid estimate of u 2 , and so interpretation of the results is not clearcut.
Day 3: 2.6 2.2 2.5 2.2 1.2 1.2 1.8
The experiment should have been run in four complete replications. In the
Day4: 2.5 2.4 3.0 1.5 1.7
first replication, one gear would be tested at each temperature, with the order
of testing being decided at random. This procedure would then be repeated (a) Assuming equal variances, test whether there is a difference in
the mean
three more times, with a different random order each time. The four percentage impurity over the four days.
measurements at the same temperature would then be genuine independent (b) Check the equal-variance assumption (see Problem 13.4.5).
2
replicates, and it would be possible to obtain an estimate of u which takes
6. T.hre~ laboratories each carried out five independent determinations of the
into account all sources of variability in the experiment. n.1cotme content of a brand of cigarettes. Their findings, in milligrams per
cigarette, were as follows:
Laboratory A: 16.3 15.6 15.5 16.7 16.2
PROBLEMS FOR SECTION 14.3
Laboratory B: 13.5 17.4 16.9 18.2 15.6
Lt Measurements of breaking strength for six bolts at each offive diameters are given Laboratory C: 14.1 13.2 14.3 12.9 12.8
in Problem 13.5.2. Three different models are fitted to these data. The residual
sum of squares is found to be 0.074317 for a 5-sample model, 0.14066 for a straight Are there real differences among the results produced by the three laboratories?
line model, and 0.07436 for a second degree polynomial model. 7. '."feasu:ements of the ulti~ate tensile ~trength (UTS) were made for specimens of
msulatmg foam of five different densities.
(a) Assuming the 5-sample model to be correct, test the hypotheses H 1 · µi =
/31 + f32di and H2: µ1 = /31 + f32d1 + f33df. Density (x) Ultimate tensile strength (y)
(b) Assuming the second degree polynomial model to be correct, test the
I
hypothesis that /3 3 = 0. 4.155 82.8 95.5 97.5 ~02.8 105.6 107.8 115.7 118.9
3.555 79.7 84.5 85.2 98.0 105.2 113.6
2. Show that the additional sum of squares due to H: µ 1 = µ 2 = · ·· = µk in the 3.55 71.0 98.2 104.9 106.9 109.6 117.8
k-sample model is given by 3.23 67.1 77.0 80.3 81.8 83.0 84.1
k 4.25 98.5 105.5 111.6 114.5 126.5 127.l
Q= I
i=l
nJ.Y; - .YJ2,
<?alculate the residual sum of squares for a five-sample model, and for a straight
where n; and y1 are the sample size and mean for the ith sample, and yis the grand !me model.. Hence test the hypothesis that the dependence of mean strength on
mean. Use this formula to check the value given for Qin Example 14.3.2. density 1s lmear.
3. Consider n = n1 + n2 + ··· + nk pairs of measurements (x;, yii) for j = 1, 2, ... , n1; 8.tTh~ following table gives measurements of systolic blood pressure for 20 men of
i = 1, 2, ... , k. The Y;/s are observed values of independent N(µ 1j, u 2) random vanous ages:
variables, where
Age (years) Blood pressure (mm Hg)
for i = l, 2, ... , k.
30 108 110 106
Show that the additional sum of squares due to H: µ;=a + f3x 1for i = 1, 2, ... , k is 40 125 120 118 119
given by 50 132 137 134
k 60 148 151 146 147 144
Q= L n;(y1 - & - ~x 1 ) 2 , 70 162 156 164 158 159
i=l
260 14. Normal Linear Models
14.4. More on Tests and Confidence Intervals
261
Calculate the residual sums of squares for a five-sample model and for a straight
line model. Hence test the hypothesis that the dependence of mean blood pressure columns : From Section 14.2, the MLE of f3 is
on age is linear.
jJ =" XLY
9. Problem 13.7.8 presents eleven sets of three measurements of the amount of 1
where XL= (X' X)- X' is a q x n matrix of constants.
a gas
dissolving in a liquid.
(a) Calculate the residual sums of squares for an 11-sample model and a straight
line model. Test the hypothesis that the mean amount dissolving is a linear Inferences about /Ji
function of the gas content in the mixture.
(b) Explain why one might expect to obtain a small significance level in (a) even
if The MLE of /3; is a linear combina tion of the Y;'s:
the straight line model is correct.
'iJ,=a1Y1+a2Y2+ ... +anY,,
10. A procedure sometimes used to check the adequacy of a linear model
is to where a 1 , a2 , .•• ,a" are the elements in the ith row of xi. It follows by (6.6.7)
complicate the model by adding extra terms to it, refit, and then test whether
the that #1 ~ N(I.aiµi, cr 2 I.aJ).
new terms are significantly different from zero. For instance, in Example 14.2.1,
the residual sum of squares for the modelµ, = {3 + {3 t, + {3 v, was found to Note that I.aiµi is the product of the ith row of XL with the vectorµ. This is
1 2 3 be the ith element of the matrix product XLµ. But since µ = X (J and XL X = J, it
10.533. If the more complicated model
follows that
µ, = {3, + f32t1 + {33v, + {3.cf + /3 5vt + {36c,v,
XLµ= XL X/3 = 1/3 = (J.
is fitted to the data, the residual sum of squares decreases to 4.442. Using these
results, test the hypothesis that /3 4 = {3 5 = {3 = 0. (A small significance level would
6
Hence the ith element of XLµ is fJ,, and it follows that
indicate possible difficulties with the simpler model.) '
E(jJ;) = ~a 1 µ 1 = (J,.
11. Consider two linear models µ = X fJ and µ = Wy where X and W are
nxq Similarly, I.aJ is the product of the ith row of xi with itself. This is the (i, i)
matrices with linearly independent columns. Suppose that W = Xb where b element of
is a
q x q nonsingular matrix. (This means that the columns of W are linear com-
binations of the columns of X.)
(a) Show that P= bY. It follows that var(p;) = cr 2 v;;, where V;; is the ith diagonal element of
(b) Show that both models give the same fitted values V.
µ and residuals £. Similar arguments can be used to sh6w that cov ('°- 0 .) = cr 2 v ..
µ.,µ) l]"
According to the procedure described in Section 13.2, we start with the
sampling distribution P; - N((Ji, cr 2 v;;). We then standard ize and replace a 2
by s 2, the variance estimate for the model µ = X (J, to obtain
14.4. More on Tests and Confidence Intervals
'/J; - /3;
T= r:I::" - t 1n-q) · (14.4.1)
In Section 13.2 we described procedures for making inferences about a single Vs-vii
paramet er /3, in a linear model. These methods cari also be used to make We can now test an hypothesis concerning (Ji or set up confidence intervals
inferences about a single linear combina tion()= b /3 + b /3 + ··· + bqf3q· for /Ji as in Section 13.2.
1 1 2 2
In this section we give some further discussion of these methods. We
shall
show that the significance tests described in Section 13.2 are equivalent to
likelihood ratio tests, and that the confidence intervals obtained are in fact
maximum likelihood intervals.
As in the preceding sections we assume that Y , Y , .• . , Y,, are independent More generally, suppose that we are interested in a linear function of the (J/s,
1 2
N(µ,, cr 2 ), and that
{)= bi{JI + b2/J2 + ... + bq/Jq = b'{J
for i= 1, 2, ... , n. where the b/s are constant s and b is q x I. The MLE of() is
In matrix notation this is µ = X f3 where X is n x q with linearly independent B=b1P1 +b2P2+ ... +bqPq=b''/3.
,
262 14. Normal Linear Models 14.4. More on Tests and Confidence Intervals
263
Since p= XL Y, it follows that which is an increasing function of T 2 • It follow
s that
B=b1XLY =a 1 Y1 +a2Y 2+ ··· +a"Y" P{D z Dob.} = P{T 2 ~ T~b. } = P{I Tl z 17;,bsl},
where a; is the ith comp onen t of the 1 x q vecto 1 and so the significance tests described in Sectio
r b XL. . . . . . n 13.2 are equivalent to
Since 8 is a linear combination of the Y;'s, its likelihood ratio tests.
sampling d1stnbutlon IS
normal. Since E(p,) = /3; we have It also follows from these results that the maxim
um Jog relative likelihood
function of 8 is
E(B)= b 1 /3 1 +b2/32+ ··· +bq/3q=8.
The variance of Bis a 2 c, where
c = L.af = b'XL(b1XL)' = b'XL(XL)'b = b1Vb.
To construct a confidence interval for 8, we take
Thus we have 8- N(8, a 2 c), where c = b'Vb. - ts Ts t where t is the
Now, proceeding as in Section 13.2, we standardiz appropriate value from Table B3. The parameter
e and replace a 2 bys to
2 values belonging to the
confidence interval are those for which
get
8- 8
T= r.:r:- t<n-q )
....; s2 c
(14.4.2) rmax(8)z -~log{!+ n~qt 2 }.
where c = b' Vb. Inferences abou t 8 are based on Hence the confidence interval is a maximum likeli
this result. hood interval for 8.
Note that (14.4.1) is the special case of (14.4.2) in
which bi= 1, and b1 = 0 for
j #i. EXAM PLE 14.4.1. In Example 14.2.1 we fitted the model
µ;=/31+f32t ;+/33V;, i=l,2 , ... ,10
Con nect ion with Likelihood Ratio Tests to the fuel consumption data of Table 14.2.1. The
matrix (XL)' is given in
Figure 14.2. l, and from this we can find
We showed in Section 14.3 that the likelihood
ratio statistic for testing a
linear hypothesis H is 0.57939 0.02503

l
-0.01 942J
D = n log[ !+ ~ 2 J = n log[l + q-
L.s; n-q
p FJ
v= XL(XL)' = 0.02503 0.00497 0.00017 .
-0.01 942 0.00017
0.00117
where Q and Fare defined in (14.3 .2) and (14.3.
3). .. Parameter /3 3 measures the effect of wind veloc
At the end of this section we shall show that the ity on expected fuel
additional sum of squares consumption. Inferences about /3 are based on
due to the hypothesis 3

H:b 1 /3 1 +b 2 /3 2 + ··· +b 4 /3 4 =8 p3 - /33


T = - - - - t(7)
is given by ~
where c = b' Vb. (14.4.3) where s2 = 1.505(7 d.f.), P3 =0.1298, and v =0.00
There is one degree of freedom for testing H, and compute
33 117. To test H: /3 3 =0, we
so q - P =I. By (14.4.3),
(14.3.3), and (14.4.2) we have

Q + 1 = (B ~ 8) = T 2 •
2
F=
s 2
s c
The likelihood ratio statistic for testing H is SL= P{ l t(7) 1z 3.09}~0.02
from Table B3.
1 A different procedure was used to test H : {J = 0
D = n log { 1 + - - T1 in Example 14.3.1. There
n-q } , we refitted the model with /3 = 0 and calcu 3
3 lated the additional sum of
264 14. Normal Linear Models 14.4. More on Tests and Confidence Intervals
265
squares Q. We then found
g(/31, /3z, ···, /3q, A.)= S(/31, /32, ... , /3q) + 2A..('Lbj/3; - 8).
+1 The extra variable), is called a Lagrange multiplier. We now minimize g over
Fobs= =9.58;
the q + 1 variables /3 1, /3 2, ... , /Jq, and ),.
The derivatives of g are
SL= P{F 1 , 7 ~ 9.58}.
og
Since Fobs= r;b,, and since tf
7> is distributed as F 1 , 7 by (6.10.7), it follows o). = 2('Lb)3;- B);
that
P{ F 1, 7 ~Fobs} = P{ tf7> ~ T;b,} = P{JtmJ ~ 17;,bsl}. ag
Both of these procedures are equivalent to the likelihood ratio test, and
therefore they will always give the same significance level. Upon setting these derivatives equal to zero, we find that p1 , p2 , ..• , f3q and A
To complete the example, we shall find a 95% confidence interval for the satisfy the q + 1 equations
mean fuel consumption on days when the temperature is - 5 and the wind
velocity is 20. According to the model, this is
'Lb;lJj = e;
forj= 1, 2, ... , q.
e= /31 - 5/32 + 20f33 = b1/3
where b' = (1 5 20). The MLE of fJ is .N_ot~ that the restriction "2:.b;/3; = fJ is satisfied at the minimum. Also, if({i, A)
m1mm1zes g, then lJ minimizes S.
e= '/11 - 5'/12 + 2op3 = 11.61. In matrix notation, the q + 1 equations are
The variance of eis a c where
2
X's= 'lb; b'P = 8

lj
c =b'Vb
where b is q x 1. Substituting 8 = Y - xp gives

l
0.57939 0.02503
-0.01942J [ X'(y - XTJ) =Ab
= [1 -5 20] 0.02503 0.00497 0.00017 -5 and now multiplying by V = (X' X) 1 gives
-0.01942 0.00017 0.00117 20
lJ=(X'x)- 1 X'y-X(x'x)- 1 b = '/J-AVb.
= 0.11025. Since b'P = 8, it follows that
Now, by (14.4.2), the 95% confidence interval for 8 is
8 = b' p - Ab' Vb = e Ac
eE e± 2.365ftc = 17.67 ± o.96. where B= b' '/J and c = b' Vb, and therefore
We know that this is also a maximum likelihood interval, so each value of e A= (8- 8)/c.
belonging to the interval has a higher maximum relative likelihood than any
value outside the interval. I Also, since 7J = '/1-A.Vb, we have

PROOF OF (14.4.3). Let 7J denote the MLE of f3 under the hypothesis e= y-XP = y-XP + AXVb = e+ l.XVb.
H: b 1 /3 1 + b2 /3 2 + ··· + bq[Jq = 8, The residual sum of squares under H is
and let ii= Y - XP be the vector of residuals. Then 7J is the value of f3 which "2:.sf =s's=(£+ J:xvb)'(e + :l.xvb)
minimizes
= e'e + A b'V'X'XVb +cross-product terms.
2
S = "J:.(yi - µ;)2 = "J:.(yi Xn/31 - ... - X;q/Jq) 2
The cross-product terms are zero because X'e = 0 by (14.2.2). Since
subject to the restriction "Lb;/3j = e. V = (X'X)- 1 and V' = V, it follows that
To find TJ, we use the method of Lagrange. We define a new function of
q + 1 variables,
14. Normal Linear Models 14.5. Checking the Model 267
266

Blunt cracks (not cycled) Sharp cracks (cycled)


Hence the additional sum of squares due to H is X1 = -! X 1 =0 x, = l X1 = -1 X 1 =0 Xi= I

Q = I;ef- L.ef =:Pc =(B- 8) 2/c X2= -! 10 10 8 5 2


0
which is (14.4.3). X2=0 16 14 13 8 7 6
Xi= I 17 16 11 16 12 8
PROBLEMS FOR SECTION 14.4
Lt Consider the data from the weighing experiment in Problem 14.2.4. It is assumed that burst strength is a linear function of x 1 and x 2 , that the effect of
(a) Find a 95% confidence interval for {1 3.
cycling is the same at all levels of crack length and temperature, and that errors are
(b) Test the hypothesis {1 2 = 2{1 1 in two ways: independent N(O, cr 2 ).

(i) use (14.4.2); (a) Using an indicator variable x 3 = ± I for crack type, set up a linear model
(ii) use the hypothesis to simplify the ·model, then refit and use the additional corresponding to the above assumptions.
sum of squares method. (b) Fit the model by least squares, and compute the variance estimate. (If you did
(a) correctly, X' X will be a diagonal matrix, and the calculations are easy.)
(c) Assuming the simplified model in b (ii), recalculate the 95% confidence interval (c) Test the hypothesis that cycling has no effect on mean burst strength.
for {1 3 . (d) Obtain a 95% confideni:e intervalfor the mean burst strength of pressure tubes
2. Consider the chemical yield data in Problem 14.2.5. with sharp cracks of length + I at the lowest operating temperature.

(a) Find a 95% confidence interval for the expected yield of the process when 5. Two random samples of 11 lambs each were used in an experiment to assess the
t = 17.5.
effect of a treatment on body weight. One sample received the treatment, and the
(b) Test the hypothesis {1 3 '"'0 in two ways: other sample served as the control group (no treatment). The body weight y (in
pounds) and age x (in days) were recorded for each animal at the end of the
(i) use (14.4.2); experiment.
(ii) use the additional sum of squares method.
y 35 34 34 35 26 32 24 33 23 20 15
3.tThirteen sets of observations were taken on the variables y, x 1 , x 2 , and X3. Here y Control x 83 81 80 78 73 72 72 70 70 65 54
is the percentage of bacteria surviving a treatment, and x 1 , x 2 , x 3 are the
concentrations of three chemicals used in the treatment. The model y 45 44 44 46 42 39 38 40 38 31 23
Treated 72 70 66 54 50
x 90 83 80 80 79 74

l
lS:iS:l3

was fitted to the data by least squares, with the following results: (a) Plot the data. Fit different straight line models for the two samples, and

l""l
-0.08 -0.09 compute the total residual sum of squares.

1.02
8.06
-0.08 0.008 0.002 -0791
0.003
(b) Fit parallel straight lines to the two samples and calculate the residual sum
of squares.
P= -1.86
; 1
(X'X) - =
-0.09 0.002 0.017 0.002 (c) Use the additional sum of squares method to test the hypothesis of equal
slopes.
-0.34 -0.79 0.003 0.002 0.087 (d) Repeat (c) using a t-test.
(e) Assuming the parallel line model, find a 95% confidence interval for the
The residual sum of squares was 38.7. increase in mean weight due to the treatment.
(a) Obtain a 95% confidence interval for {1 2 -{1 3 .
(b) Test the hypothesis {1 3 = 0.
(c) The model was refitted with {1 2 = {1 4 = 0, and the new residual sum of squares
was found to be 167.6. Test the hypothesis {1 2 = {1 4 = 0. 14.5. Checking the Model
4. An experiment is performed to investigate the dependence of burst strength (y) on
crack length (x 1 ) and operating temperature (x 2 ) in pressure tubes. Cracks of three Example 13.5.3 demonstrates the importance of plotting the data to check that
different lengths are cut in specimens, and half of these are cycled to sharpen the a straight line model is reasonable. This is doubly important with more
cracks. The specimep.s are then tested at three different temperatures, and the burst
complex linear moi:lels. In this section we briefly describe some procedures for
strength is determined. The following table gives the results (simplified by changes
checking the assumptions which underly the normal linear model. Most of
of origin and scale).
268
14. Norma l Linear Models
14.5. Checking the Model
269
these involve looking for patter ns in plots of residu
als or standardized
residuals. For a more detailed discussion, see Chapt combination of y 1 , Yi, .. ., Yn and then find m;i as
er 3 of N. Drape r and the coefficient of Yi· For
H. Smith, Applie d Regre ssion Analy sis, 2nd Editio instance, in the one-sample problem we have
n, Wiley (1981).
The residual ei = Yi - [1 1 can be regarded as an
estimate of the error • 1
ei = Y; - µi. Since the Y;'s are assumed to be indep µ1= .Y = -(Y1+ ··· + y;+ ··· +y.)
endent N(µ 1, c; 2), the s/s n
are independent N(O, c; 2 ). Thus, if the model is correc
t, we would expect
the e;'s to look like independent observations from
N(O, a 2 ) . and so mii = ~.
In the k-sample problem, [1 1 is the sample mean Yi
for the
In fact, the e/ s have unequal variances less than fli. n
To see this we note sample which contains the ith observation. The levera
that, by (14.2. 1) and (14.2.4), ge is mii = l / ni where ni
is the number of observations in this sample. Thus
fi=X P=X XLy= M y, in Example 13.4.2, the 8
observations at 21 °C have leverages t, and the other
4 observations at 30°C
where M = (mii) is an n x n symmetric matrix : have leverages ;i-.

M =XX L =X(X 'X) - 1 X'.


Thus [1 1 and ei are linear comb inatio ns of Yi, Yi , .. Residual Plots
., Yn:
It is a good idea to plot the standardized residuals
µ,=m il Y1 + m;2 Y2 + ··· + m;.y.; r 1 or residuals i: 1 versus the
(14.5.1) fitted values fii· Four such plots are shown in Figure
14.5.l. In the first of
e; =Yi - f11 = Y1- m;1 Y 1 - mi2Y2 - ··· - m1.Yn· (14.5.2) these, the n points (r 1, [iJ lie in a band of roughly consta
nt width about zero,
Using (5.5.5) , (5.5.6), and properties of the matrix and the model appears to be satisfactory. In (ii), there
M , it can be shown that is more scatte r in the
residuals as [1 increa ses. A possible explanation
E(ei) = 0 and var(e 1) = (1 - mii)c;i _ Thus e + <iJ of this is that the error
1 I - mi; has a standardized variance c; i is not constant. A nonlinear transf ormat
normal distribution. ion of the y/s, suth as a
logarithmic transformat ion, may help to remedy the
Usually c; will be unkno wn, and we replace it by the problem.
estim ates to obtain the In the third diagram there is an outlying point. If this
standa rdized residual point is removed and
the model is refitted, the new residual plot should
resemble case (i). An
explanation for the outlier should be sought, and
both the outlier and the
revised analysis should be reported:
If the model is satisfactory, then r , r , • . ., r. should
1 2 look like observ ations
from N(O, 1). About 95 % of the r/s should lie
in the interval ( - 2, 2), r
and about 68% in the interval ( - 1, I). For most purpo
ses, it is better to plot
standardized residuals, althou gh usually a graph of
the residuals will show a • 2 - - - - -)(- - - - - - - - - x x
similar pattern. The r/s, like the £/ s, are slightly correl x
ated, but this does not x x
seem to create difficulties provided that n, the numb
er of observations, is x
x µ
much larger than q, the numb er of {3/s in the linear x ___ }5 __
model. -2 _x____ ____ x

' .)<,
(I) Model sati sfactor y (i i ) Noncon stanl varian ce'
~alculation of Leverages
The quant ity m;i which appears in the expression x
for the standardized
residual ri is called the leverage of the ith point. The x x
leverages are the diagonal •2 - - - - - - - - -.- - - - - - ----x- - - ~- ---- -
elements of the n x n matrix M =XXL . Since XL " x
X =I, it follows that x x x x
MM= M , and from this result it can be shown that x
0 ::;mi; ::; 1. x µ -1---
x -x~~~~-x~x
~~µ
When the matrix methods of Section 14.2 are used, x x
the leverages may be -2 - - ---- ---- X.-- -
_:x____ _ _ _ _ _ j5. __ _
calculated by multiplying X and (XL)' term by term x
to obtain a new n x q x x
matrix and then finding its row totals .
(iii) Outlier (iv) Curvat ure
When algebraic formulas are used as in Chapt er 13,
we write [ii as a linear
Figure 14.5.1 . Pattern s in residual plots.
14.5. Checking the Model 271
14. Normal Linear Models
270
d results with
the means positive and negative residuals, and compa re the observe
In (iv) there is a pattern in the residual plot which suggests that ces (see Section 2.7).
e explan ation in (ii). theoretical results for random sequen
µ;have not been modelled correctly. This is also a possibl
tic term to the model should
With a straight line model, addition of a quadra
find a remedy
fix the problem, but some detective work may be needed to
. Looking for Influential Points
with more complex models
lly the same
For straight line models, a plot of r; ore; versusµ, gives essentia influence on
er, pattern s will show up more Sometimes just one or two of the observations will have a large
inform ation as a plot of Yi versus X;. Howev an extrem e case in
shows only deviati ons from the fitted line. the analysis. The fourth data set in Example 13.5.3 shows
clearly in the residual plot which ent upon one observ ation
that cov (61> fl;) = 0, but that cov (e" y;) =var (ei) == which the estimation of the slope is entirely depend
It can be shown to detect influen tial points when they
y will general ly not be at x == 19. We would like to be able
(1 - mi;)a 2 • A plot of 61 or r1 versus observed values 1 x linear models.
is correct. occur in more comple
helpful, because we expect it to show a pattern even if the model that
be useful, depend ing upon the situation. By (14.2.4) we have 'fJ = XLy where XL= (xb) is q x n. It follows
Various other residual plots may
to plot the residua ls in Examp le 13.5.l against for i == 1, 2, .. . , q.
For instance, we might wish
women if these were availab le. In the plastic gear exampl e,
the weights of the
a unit increase
we might plot the residuals in the order that the corresponding
measur ements Thus xfj is the amoun t by which 'fJ 1 would change as a result of
a systematic of we can determine
were made, with the purpos e of checking whether there was row
in the ith XL,
in YJ. By examining the elements
or two points which strongl y influen ce the estima tion
change in laborat ory conditions. whether there are one
a word of
Although residual plots are very useful in statistical analysis, of {3;.
n will fli by mii
caution is necessary. Even if the model is correct, random
variatio Similarly, (14.5.1) shows that increasing Yi by one unit will change
most people would then fl; is determ ined almost entirely by
produce pattern s in the residuals rather more often than units. If the leverage m11 is close to I,
every residual
expect. Many beginners spot an "unusual" pattern in almost just the one observ ation y 1 •

In standardizing residuals, we divide by JI - mu. The effect


plot that they examine! of this is to
to generate to those for
To help judge whether an observed pattern is "real", it is helpful increase the magnit udes of residua ls for influen tial points relative
them in the same way
n observations from N(O, 1) on the compu ter and plot less influential points.
feeling for the
as the residuals. By r~peating this several times one gets a
to interpr et the The
amoun t of random variatio n, and is in a better positio n
EXAMPLE 14.5.1. Consider the fuel consumption example of Section 14.2.
Figure 14.2.1. Lookin g down the 2nd column (which
observed graph. matrix (XL)' is shown in
observ ation in
is the 2nd row of XL), we see that y 7 is the most influential
determining the estimated temperature effect p2 . An increas e of 1 unit in y 7
Checking the Independence Assumption increas e of 1 unit in
would change P2 from -0.628 5 to -0.679 6. Similar ly, an
y 3 would produce a rather large change in p 3 , from 0.1298 to 0.1568.
the analysis. term-by-
The assumption that the Y;'s are independent is crucial to To find the leverages mu, we multiply X and (XL)' in Figure 14.2.1
may indicat e a lack of the row totals. The
Clusters of points in a plot of the data or the residuals term to obtain a new 10 x 3 matrix, and then calculate
much as possibl e about how the
independence . It is import ant to learn as results are as follows:
that such difficulties can be anticip ated and
data were collected in order
the plastic gear exampl e, knowle dge of the way the experim ent 2 3 4 5 6 7 8 9 10
explained. In
at the same temper -
was run is enough to suggest that repeat observations 0.14 0.17 0.82 0.35 0.11 0.15 0.78 0.17 0.16 0.16
Examp le 14.3.3). mu
ature are not independent replicates (see
If the y/s are observed sequentially in time, there may be some
effect from one time period to the next. For instanc e, monthl y
carryover
expend iture I' Two observations, YJ and y 7 , are highly influential in determining
e, a unit increas e in y would increas e [t by 0.82.
the fitted
The fitted
expens es are some- values. For instanc 3 3
figures may show a zigzag pattern because month-end by just the one
times included in the completed month and sometimes in the
y 1 _ 1 is large, then y 1 will tend
serial correlation, and it
to
should
be small.
show up
This
as a
sort
trend
of
in
depend
a
new month. If
ence
scatter-
is called a
plot of the
Il value at t 3 = -10, v3 == 41.2 is determined almost entirely
observation y 3 , because there are no other points (t;, v;)
One would be reluctant to put much faith in the model for values
close to (- 10, 41.2).
of (t, v) in
e runs of this vicinity.
n - 1 points (e;, e1_1) or (r 1, r; _ 1). Another possibility is to examin
272
14. Normai Linear Models
14.5. Checking the Model
Checking the Normality
Assumption 273
Fai lur e of the nor ma lity a
ass um pti on is less seriou 0 0
0
or inc orr ect mo del lin g s tha n lack of independe 0 00
a aaoaooo oo 0 0 0 0
of the µ/s . So lon g as nce 0-- --- o 0 0 0
no rm al, it is still rea the Y;'s are no t too far --- --- --- --- --- ---00 00 00 00 00 00
--- --- ooo
--- --- --- -- - - ---
son abl e to est im ate from
:E(y; - µ;) 2 , and the analys par am ete rs by minim Figure 14.5.3. Histogram
of transformed residu als
is will give sensible results izing u = F(r) for the plastic
nor ma lity is on var ian . Th e biggest effect of non (k-sample model). gear dat a
ce est im atio n. If the Y;'s -
bu tio n of :Lef /a 2 ma y are non -no rm al, the dis
be qui te unlike xf•-qJ> tri-
(13 .2.4) ma y be seriously and intervals for a 2 bas the nor ma lity ass um pti
in err or. Te sts and con ed on on holds, the u's sho uld
are mu ch less severely fidence intervals for the uniformly between 0 and be sca tte red ran dom ly
affected by dep art ure s {3/s I. Fig ure 14.5.3 shows a and
No te tha t, by (14.5 .1), e; fro m nor ma lity . uii = F(rij) for the plastic his tog ram of the 40 val
is a linear com bin ati on gear exampfo. It is per hap ues
the Ce ntr al Lim it Th eor of Y1 , Y2 , .. ., Y,,. Because there is an excess ofl arg s easier to jud ge wh eth
em , the dis trib uti on of£ of e or small values from Fig er
wh en the Y;'s are decide , ma y be close to nor ma 14.5.2. ure 14.5.3 tha n from Fig
dly non -no rm al. Th e fac l even ure
no rm al doe s no t imply t tha t the &;'s or r/s app A useful exercise is to
tha t the nor ma lity ass um ear generate sets of n = 40
lar ge nu mb er of ind epe pti on is correct. On e needs com put er and plo t them ran do m num ber s on the
nde nt replicate me asu rem a as in Fig ure 14.5.3. In thi
ent s in ord er to have a go for how the gra ph sho uld s wa y on e can get a fee
cha nce of det ect ing non od loo k if the nor ma lity ass ling
-no rm ali ty. present example, the obs um pti on is satisfied. In
Ma ny gra phi cal pro ced erved gra ph does no t app the
ure s for checking nor ma evidence aga ins t the nor ear unu sua l, and the re is
the lite rat ure . A pro ble m lity hav e been pro pos ed ma lity ass um pti on . no
wit h all of these is the dif in
pat ter n in the gra ph is ficulty in jud gin g wheth Th e same technique can
indicative of dep art ure er a be used to check oth er
cha nce var iat ion in the s from nor ma lity rat her models by tak ing F to con tin uo us pro bab ilit y
dat a. tha n be the app rop ria te cum
ula tiv e dis trib uti on fun
ction.
EX AM PLE 14.5.2.
Co nsi der the analysis of PROBLEMS FOR SEC
pro ble m in Ex am ple 14. the plastic gear dat a as TIO N 14.5
3.3 . We not ed ear lie r in a 9-sample
mo del the leverages are the section tha t und er 1. A straight line mo
obs erv ati ons . Th us the
i for the 8 obs erv ati ons at
21°C and i for the oth er
this
squares.
del µ = rx + px is fitted
to n observed points (x
sta nda rdi zed residuals 30 1, y 1) by lea st
are
(a) Show that the lev
erage of the ith poi nt is
Yu - f;
'ii=-~
fi1 1-
-1 for i # 4
Sy 1- 4
wh ere s = 0.02066.
2 m;; = -1 +(x 1 -x-) 2/Sxx
n .
ln Fig ure 14.5.2 the 40 Which of the residuals £
sta nda rdi zed residuals 1 , . .. , £" have the smalle
ob tai n wh at is essentiall are plo tte d ou t on a line (b) Which observations st variances?
y a his tog ram with a lar to will be the most influen
pic tur e shows rea son abl ge num ber of classes. Th tial in determining f3?
e sym me try abo ut 0, and is 2. Calculate leverages mu
too ma ny very large or there do not app ear to and standardized residu
small observations. On be 13.5.2, and plot the standa als r1 for the four dat a sets
e cou ld poo l classes, com rdized residuals versus the in Table
exp ect ed frequencies fro put e undefined for the last pom fitted values. (Note tha t
m N(O, 1), and test goo t in dat a set 4. Th e fitted r, is
However, this test is no dne ss ?f ~t as in Sectio line must go thr oug h this
t usually rel eva nt bec ~ 12.5. and so this point gives no
information abo ut the ade poi nt,
ext rem e tails of the dis aus e 1t 1s dep art ure s m quacy of the model.)
trib uti on which are of the 3. Calculate residuals, lev
Alternatively, we can pri ma ry interest. erages, and standardized
tra nsf orm the r's via and plot standardized res res iduals in Problem 13.5
tra nsf orm ati on u = F(r), the pro bab ilit y integr iduals versus fitted values .l(a),
where Fi s the c.d.f. of N(O al increases the relative ma . No te tha t sta nda rdi zat
r's are obs erv ati ons fro , l) (see Section 6.3). lf the gnitude of the residual ion
m N(O, 1), the n the u's are because this poi nt has a for the observation at
obs erv ati ons from U(O, high leverage. x = 1.5,
1). If 4. Calculate leverages
and standardized residu
0 standardized residuals ver als in Pro ble m 13.5.4(a
000
o n a sus fitted values. Co mm ), and plot
--:3------~2 __ __g_~_
D tDD DOD ent on your findings.
!!L~~I g__ g_: ~-----; 3 5. Check the standardiz
ed residuals for normality
Figure 14.5.2. His tog ram _; it seem more reasonable in Problems 13.3.7(a) and
of standardized residuals to assume normality of (b). Does
model). for the plastic gear dat a the log counts?
(k-sample 6. t (h) Let M be a symme
tric idempotent n x n ma
Show tha t each diagonal trix, so tha t M' = M and
element of M must lie bet MM = M.
ween 0 and 1.
274 14. Normal Linear Models 14.6. Derivations 275

(b) Show that X(X'X)- 1 X' is symmetric and idempotent, and hence that the The set of all such linear combinations is a vector space 'Y(X) called the
leverages mu lie between 0 and 1. column space of X. Since the X/s are assumed to be linearly independent,
(c) Show that, if the ith point has leverage mu 1, then fl1 = Y1 and 81 = 0. 'Y(X) has dimension q.
Let P 1' P 2 , •• ., Pq be a set of normed orthogonal basis vectors for 'Y(X).
One way to construct the P/s is by applying the Gram-Schmidt ortho-
* 14.6. Derivations gonalization procedure to the columns of X. Let P be the n x q matrix with
P 1 , P 2 , •.• , Pq as its columns. Since the P/s are normed.and orthogonal we
In this section we derive the distribution of thexesidual sum of squares in the have PjP1 = 1 and P:P1 =0 for i 'I= j. It follows that P' P =I.
normal linear model, and also the distribution of the additional sum of Since P 1 , P 2 , ••• , Pq is a set of basis vectors for 'Y(X), every vector in 'Y(X)
squares due to a linear hypothesis. We used these results in earlier sections to can be written as a linear combination of P 1 , P 2 , ••. , Pq. In particular, each of
set up significance tests and confidence intervals. the X/s can be written as a linear combination of P 1 , P 2 , .. ., Pq,
We assume that Y1 , Y2 , ... , Y,, are independent N(µ;, u 2 ), and thatµ= X/3
where X is an n x q matrix of constants with linearly independent columns. X 1 =P 1 a 11 +P 2 a21 + ··· +Pqaq1
Let Ui = (Y;- µ;)/u, so that Y; = µi + uUi. In matrix notation we have for some constants aiJ· Thus we have X =PA. The matrix A must be
nonsingular because both X and P have rank q. 0
Y = µ + CJU =X/3 + uU
where U is an n x 1 vector whose components U i. U 2 , ••• , u. are indepen- Theorem 14.6.1. Under the normal li~ear model assumptions stated above, 'Eef
dent N(O, 1). is distributed independently of p, and
The vector of fitted values is
1 :E' 2 2
µ= xp = xxLy = xxL(X/3 + CJU). u2 ei ~ X<n-q)·

Since XL X =I, we have PROOF. By the Lemma we can write X =PA where P is n x q with normed
fl=Xf3+uMU (14.6.1) orthogonal columns and A is q x q nonsingular. Let C = (PIR) be an n x n
orthogonal matrix Whose first q columns are the columns of P. Since
where M =XXL. The vector of residuals is C'C =CC'= I, we have P'P =I, R'R =I, and
il= Y-fl = (X/3 + uU)-(X/3 + uMU) PP'+RR'=l.
=u(I-M)U. (14.6.2) Since P' P =I and A is nonsingular, we have
In proving the theorems below, we shall construct an orthogonal matrix C M =XXL= X(X'X)- 1 X'
and then consider the orthogonal transformation Z = C'U. Then Z = (Z;) is
n x 1, and by Theorem 7.3.1, its components Zv Zz, .. ., z.
are independent = PA(A'P'PA)- 1 A'P'
N(O, 1). = PAA- 1 (A')- 1 A'P' =PP';
The following lemma will be used in constructing the required orthogonal
transformation. 1-M=l-PP'=RR'.

Lemma 14.6.1. Let X be an n x q matrix with linearly independent columns. It follows from (14.6.1) and (14.6.2) that
Then there exists an n x q matrix P and a nonsingular q x q matrix A such that fl= X/3 + uPP'U; e=uRR'U.
X =PA and P'P =I.
Since R' R =I, the residual sum of squares is
PROOF. Let X 1 , X 2 , .. .,Xq denote the q columns of X, and let b=(b) be a
q x 1 vector of constants. The product Xb is n x 1 and represents a linear 'Eef = e'e = u 2 U'RR'RR'U = u 2 U'RR'U.
combination of the columns of X: Now consider the orthogonal transformation Z = C'U. Then Z 1 , Z 2 , ... , z.
Xb=X 1 b 1 +X 2 b2 + ··· +Xqbq. are independent N(O, 1) variates. Note that P'U contains Z 1 , Z 2 , ..• , Zq
and R'U contains Zq+ 1 , Zq+ 2 , ••• , z..
Sinceµ is a function of P'U and is a e
*This section may be omitted on first reading. e
function of R'U, it follows that fl and are distributed independently. Also we
276
14. Nonna! Linear Models

have

~Iet
11
=(R'U )'(R'U )
CHA PTE R 15

= sum of squares of comp onent s of R' U

=Z:+1+z:+2+ ··· +z~ .


Sufficient Statistics and Co nd itio nal
Now (6.9.9) implies that i::i.f/11 2 is distributed as
Tests
xf.-q»
Since X'e = 0 by (14.2.2), it follows that
P= (X'X ) - 1
X'Y = (X'X )- 1 X'(ji +€)
=(X' X)- X'jJ..
1

Thus '[3 is a function of fl.. Since jJ. and eare distri


buted independently, so are 'f3
and Ief .
o
Theorem 14.6.2. Suppose that Y , Y , ••• , Y,, are
1 2
JL = X fJ as above . Let H be a linear hypothesis,
independent N(µ;, 11 2) with

H: /3= Ay
where A is q x p with linearly independent colum In this chapter we discuss some general principles
ns. Let Ief be the residual sum
of squares under H. and let Q = I f.r - Ief be of statistical inference and
the additional sum of squares their applications in the constructio n of signif
due to H. Then, if H is true, Q is distrib intervals. icance tests and confidence
uted independently of Ier, and
Q/112 ~ Xfq - p» An impo rtant requirement of any valid statistical
inference is that it shoul d
PROOF. Unde r H, the model beco mesµ no~ depend upon any features of the data which
= Wy where W~ XA. The columns of are irrelevant to the question
W are linear comb inatio ns of the columns of mterest. The sufficiency principle attem pts to
of X , and therefore ..Y(W), the formalize this requirement.
column space of W, is a subspace of r(X) . Section 1 describes this principle and define
s sufficient statistics. Some
As in the preceding theorem, we consider an ortho properties of sufficient statistics are derived in
gonal transformation Section 2.
Z = C'U. For the first p columns of C we take a norm Significance levels and coverage probabilities are
ed ortho gonal basis of the comp uted from sampling
vector space ..Y(W). To this we add q - p colum distributions in a series of imaginary repetitions
ns so that the first q columns of the experiment. These
of C form a norm ed ortho gonal basis of..Y(X). This repetitions are purely hypothetical, and will
is possible because ..Y(W) not actually be carried out.
is a subspace of ..Y(X). Sections 3 and 4 are concerned with how to choos
e an appro priate series of
The argum ent in the preceding proof can now be repetitions for inferences abou t a parameter. In
used for both the original partic ular, it is argued that,
model µ = X/3 and the hypothesized model µ = when ancillary statistics are present, significance
Wy, giving levels and coverage proba -
bilities should be computed from a conditional
distribution.
I e?/11 2=z:+1+z:+ 2+ ·· · +z;; Section 5 considers difficulties which can arise
in testing composite
hypotheses. Sometimes a satisfactory test can be
Ief/11 2=z;+ 1+z;+2+ ·· · +z;. the observed values of sufficient statistics for the
obtai ned by condi tionin g on
Subtr acting gives unkn own param eters. Some
examples of conditional tests are given in Sectio
n 6.
Q/11 2 =
(Isr - Isf)/11 2
=z;+ 1+z ;+2+ ··· +z;.
Since this · is the sum of squares of q - p indep 15. l. The Sufficiency Principle
endent N(O, 1) variates, it
follows by (6.9.9) that Q/11 2 ~ Xfq-p)' Also, since
Zp+ 1 , ... , Zq are distributed
independently of Zq+ 1 , •.• , z., it follows that Q An impo rtant requirement of any valid statistical
is distributed independently inference is that it shoul d
ciu;. not be affected by features of the data which are
o interest. The sufficiency principle is an attem pt to
irrelevant to the question of
formalize this requirement.
278 15. Sufficient Statistics and Conditional Tests 15.1. The Sufficiency Principle 279

Let y, y' be two possible (mutually exclusive) outcomes of an experiment will lead to the same inferences, and therefore the sufficiency principle is
whose probability model involves an unknown parameter 8. Suppose that we automatically satisfied.
wish to make inferences about the value of 8. Roughly speaking, the
sufficiency principle states that, if the choice between y and y' is a purely
random one not depending upon the value of 8, then inferences about 8 Sufficient Statistics
should be the same if y is observed as they would be if y' were observed.
For instance, consider n = 3 Bernoulli trials, and suppose that we wish to A statistic Tis a random variable whose value T(y) can be computed from the
make inferences about 8 = ?(success). Consider the three outcomes y = SSF, data without knowledge of the value of 8. Tis called a sufficient statistic for 8
y' = SFS, and y" = FSS. Each of these outcomes has probability 82 (1 - 8). No if knowledge of the observed value bf Tis sufficient to determine L(8; y) up to
matter what the value of 8 is, the three outcomes are equally probable. The a constant of proportionality. In other words, Tis a sufficient statistic for 8 if
choice among them is purely random, and does not depend in any way on the L(8; y) can be written as a function of y only times a function of T and 8:
value of 8. The sufficiency principle states that inferences about 8 should be
L(8; y) = C(y) · H(T(y); 8). (15. l.4)
the same no matter which of these outcomes is observed.
The conditional probability of observing outcome y given that either y or Two outcomes y, y' such that T(y) = T(y') will give rise to proportional
y' has occurred is likelihood functions for 8, and by the sufficiency principle, they should lead to
, P( y; 8) Odds the same inferences concerning 8. All that we require from the data for
(15.1.1)
P(yly or y) = P(y; 8) + P(y'; 9) Odds+ 1 inferences about 8 is the observed value of a sufficient statistic T.
Even when 8 is one dimensional, we may need two or more functions of the
where the ratio of probabilities, data to fully determine the likelihood function. If knowledge of the observed
Odds= P(y; 8)/ P(y'; 8), (15.1.2) values of k statistics T 1 , T2 , •• • ,Tic is sufficient to determine L(8; y) up to a
proportionality constant, then T = (T1 , T2 , .•• , T,J is called a set of sufficient
is the fair betting odds for outcome y versus outcome y'. If the odds do not statistics. Two outcomes y, y' such that T,{y) = T,{y') for i = 1, 2, ... , k will give
depend upon 8, then the choice between outcomes y and y' is purely random rise to proportional likelihood functions for 8.
and is unrelated to the value of 8. The existence of a set of sufficient statistics T 1 , T2 , ..• , Tic enables us to
The sufficiency principle states that, if the odds (15.1.2) do not depend upon condense or reduce the data to k numbers T 1 (y), T2 (y), ... , T,.(y) without
8, then outcomes y and y' should lead to the same inferences concerning 8. An losing information about 8. A set of sufficient statistics which gives the
equivalent requirement is that the conditional probability (15.1.1) does not greatest possible reduction of the data is called minimally sufficient for 8. If T
depend upon 8. is minimally sufficient for 8, then T(y) = T(y') if and only if L(8; y) and L(8; y')
are proportional.
The sufficiency principle states that outcomes which give rise to propor-
Sufficiency and the Likelihood Function tional likelihood functions for 8 should lead to the same inferences concern-
ing 8. An equivalent statement of the sufficiency principle is that outcomes
The likelihood function of 8 based on outcome y is proportional to P(y; 8): which imply the same value of a minimally sufficient statistic or set of statistics T
L(8; y) = k(y) · P(y; 8), (15.1.3) should lead to the same inferences concerning 8. If Tis minimally sufficient for
8, then T carries all of the relevant information for inferences about 8.
where k(y) is positive and does not depend upon 8. The odds (15.1.2) a~e Inferences about 8 should depend only on T and not on the remainder of the
independent of 8 if and only if L(8; y) is proportional to L(8; y'). Another way
\
data.
of stating the sufficiency principle is that outcomes of the same experiment
which give rise to proportional likelihood functions for 8 should lead to the same
EXAMPLE 15.1.1. Consider n Bernoulli trials, and suppose that we wish to
inferences about 8. Indeed, this is the reason that the likelihood function is
make inferences about 8 = P(success). An outcome of the experiment may be
defined only up to a multiplicative constant, and that two likelihood
functions which are .proportional to one another are regarded as equivaler,t.
I' written as a sequence y = (y 1 , y 2 , .. ., y.), where y 1 = 1 if the ith trial produces
a success and y 1 = 0 otherwise. Since P(y 1 = 1) = 8 and P(y1 = 0) = 1 - 8, we
In Chapters 9-14 we restricted discussion almost exclusively to methods
based on the likelihood function or likelihood ratio statistic. For these
I have
methods, observations which give rise to proportional likelihood functions
I for y 1 = 0, 1.
280
15. Sufficient Statistics and Cond itiona
l Tests 15. : . The Sufficiency Principle
281
Since trials are inde pend ent, the prob
abili ty of ~utcome y is
sufficient statistic, altho ugh its value wou
n ld be requ ired for inferences abou t
P(y; 0) = CT f(y1) = oi:1 '(1
the parameter.
- O)"-i:1 ' .
i= l
EXAM PLE 15.1.3(a). Let Y ,
The likelihood function is a cons tant 1 Y2 , .. ., Y,, be inde pend ent expo nent
times P(y, O), with the same mea n B. Thei r joint p.d.f. ial variates
is
L(O; y)=k (y)·O l: 11 (1-o rl:y ; for0 <0<
Usually we would take k(y) = l for conv
enience.
l.
f(y, , Yz, ... , Y.) = n e1
n

1= 1
-e - rJ/ 9 = e - •e - t y,/o
Let y' = (y'1 , y~, .. ., y~) be anot her poss
ible outcome. Then for 0 < y 1 < ro . If the mea sure men t inter
vals are small (see Section 9.4), the
P(y; O)/P(y'; B) = oi:1,-tyl(l _ O)trl - t 11, likelihood function of 0 is
which is inde pend ent of 0 if and only L( O; y) = C(y) . e- •e- •18
if Ly1=Ly ;. This is also the cond ition fore > 0
unde r w~ich y a~d ~, give rise to prop
ortio nal likelihood functions for O. By where t = LY1· Assuming n to be know
the su~c1ency pnnc1ple, outc ome s y, y' n in advance, the total T =LY ; is a
such that Ly 1 =Ly ; shou ld lead to the sufficient statistic for 0.
same inferences for 0.
(b). A more complicated situa tion was
The rand om varia ble T =LY ; is a suffi cons idere d in Section 9.5. The
cient statistic for O in this example. lifetimes of n specimens were assu med
Outc ome s y, y' such that T(y) = T(y') to be inde pend ent expo nent ial variates,
(that is, Ly1=Ly ;) give rise to but censoring of lifetimes at pred eterm
prop ortio nal likelihood functions . In ined times was perm itted . The
fact, Tis minimally sufficient because likelihood function then has the form
T(y) # T(y') , then L(O; y) is not prop if
ortio nal to L(O; y').
=
The sufficient stati stic T 1: Y; carries
conc ernin g the value of 0. The rema inde
,
all of the infor mati on from the data fore > 0
r of the data (i.e. infor mati on abou t where m is the num ber of specimens whic
the orde r in which the O's and l's occu h fail, and s is the sum of m failure
rred) is not relevant to inferences abou times and n - m censoring times. We
0 unde r the model assu med . This addi t would not know m ors until after the
tiona l infor mati on is what would be experiment. Thus, in this case, we need
used to check the assu mpti ons of the observed values of two statistics
inde pend ent trials and equal success M (the num ber of failures) and S (the
prob abili ties which underly the Bern . total time on test), before we can writ~
oulli trials model. dow n L(O). Und er the exponential mod
el with cens oring, the pair (M, S) is
minimally sufficient. Neit her M nor S
EXAM PLE 15.1.2. Let Y , Y , by itself is a sufficient statistic for B.
1 2 .. ., Y. be inde pend ent Pois son varia
same mea n µ. Then the prob abili ty of tes with the
outc ome y = (y 1 , y 2 , •.• , y.) is EXAM PLE 15.1.4. Supp ose that
Y1 , Y2 , ... , Y,, are inde pend ent variates
havi ng a
P(y; µ)= n µY•e - µ/Y1!=µty,e-"" /(y1!Y2! ... y.!)
n

i=l
uniform distr ibuti on on the interval [O,
the likelihood function of e is
BJ where B > 0. From Prob lem 9.4.11,

where y 1 = 0, 1, 2, .. . . The likelihood


function ofµ is L(O; y) = { ~(y)e - • for B 2:: Y(nl;
L(µ; y) = C(y) • µ'e - nµ otherwise
forµ > 0,
where t =Ly ;. The varia te T =LY ; is where Y<nl is the largest sample value
a sufficient. statistic for µ. In fact, Tis . Two samples y, y' will give rise to
minimally sufficient because L(µ; y') prop ortio nal likelihood functions for
is not prop ortio nal to L(µ; y) unless B if and only if Y(n) = Y(np so that the
L/; =Ly1• range of L(O) is the same for both samp
les. Henc e l(.l is a minimally sufficient
Und er .the m?d el assu med , all of the statistic for e.
infor mati on relevant to inffrences Special care is required in examples like
ab~ut µ .1s cam ed by the suffi
~-dunens~onal obse rvati on vect
=
cient stati stic T L y 1• We can repla
or y by the single num ber t = L y with out
ce the upon 0. A set of sufficient statistics mus
this where the rang e of y depe nds
t dete rmin e not only the functional
losing form of L(O; y), but also the rang e of
mformat10n abou t µ. The individual 1 possible values for B.
y;'s are not need ed for inferences abou
µ, altho ugh they wou ld be requ ired t EXAM PLE 15.1.5. Let Y , Y ,
if we wished to check the assumptions
of 1 2 .. ., Y,, be inde pend ent varia tes havi ng a Cauc hy
the model. distr ibuti on centred at 0. The p.d.f. of
this distr ibuti on is
In both this exam ple and the precedin
g one, the sample size n is regarded as 1
fixed and know n in advance. For this for - ro < x < oo.
reaso n we have not included n in the f (x) = rc[l + (x - 0) 2 ]
282 15. Sufficient Statistics and Conditional Tests 15. 1. The Sufficiency Principle
283

The joint p.d.f. of Y1 , Y2 , ... , Y,, is


B(y) = (:) for y = 0, 1, ... , n and B(y) = 0 otherwise.

f(Yi, Yz, ... , Y.) = n-· TI [1 +(Yi -
i= 1
e) 2
r 1
\ Similarly, the Poisson, exponential , and x2 distribution s are members of the
exponential family.
and the likelihood function of e is If Yi , Y2 , ... , Y,, are independen t and identically distributed variates whose
L(e; y) = C(y) • TI• [1 +(Yi - e) 2 J- 1 for - co < e<co.
distribution belongs to the exponential family, their joint p.f. or p.d.f. is
'
i=l

This is more complicate d than the likelihood functions in the preceding


examples. We cannot find one or two statistics which determine the
f(yl' Y2 • ... , y.) = [A(e)]" Ln B(y,) J
.

exp { c(e). itl d(yi)}.


The likelihood function is then
likelihood function in this case.
Note that reordering they/swill not change the likelihood function. Thus, L(e) = k(y) • [A(e)]" exp {c(e) · l:d(y 1) }.
e
by the sufficiency principle, inferences concerning should not depend upon
e,
Since the range of the Y;'s does not depend upon the set of possible values
the order in which the y/s were recorded. Let Y<o denote the ith smallest of the
y/s, so that Y<tJ:::; Y(2) :::; · · · :::; Y<n) · Then
for e does not depend upon the data. Hence the statistic T = l:d(y;) is
minimally sufficient for e. Because of this, statistical inference is more
n
straightforw ard for distribution s belonging to the exponentia l family.
L(e; y) = C(y) · TI
j;;::: 1
[1 + (Y(i) - WJ - 1 for - 00 < e < co, The definition of the exponential family can be extended to include
distribution s which depend upon several parameters e , 1 , . . . , e
Details e,.
and }'( 1 ), }'( 2 ), .. . , }'(.J is a set of sufficient statistics fore . It can be shown that 1
may be found in Chapter 2 of Theoretical Statistics by D .R. Cox and D.V. Hin~
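As a quick numerical check on this factorization, the sketch below (not from the text; plain Python with illustrative values of n and θ) rebuilds the binomial probability function from the components A(θ), B(y), c(θ) and d(y) identified above.

    # Sketch: the binomial p.f. written in exponential-family form
    #   f(y; theta) = A(theta) * B(y) * exp{c(theta) * d(y)}.
    # Plain Python; n and theta are illustrative values.
    import math

    n, theta = 10, 0.3
    A = (1 - theta) ** n
    c = math.log(theta / (1 - theta))

    for y in range(n + 1):
        B = math.comb(n, y)
        f_expfam = A * B * math.exp(c * y)                       # d(y) = y
        f_binom = math.comb(n, y) * theta**y * (1 - theta)**(n - y)
        assert abs(f_expfam - f_binom) < 1e-12
    print("binomial p.f. recovered from A, B, c, d for y = 0, ..., n")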
EXAMPLE 15.1.7 (Normal Linear Model). Let Y_1, Y_2, ..., Y_n be independent N(µ_i, σ²) variates with µ_i = x_{i1}β_1 + x_{i2}β_2 + ··· + x_{iq}β_q, where the x_{ij}'s are known constants and the β_j's are unknown parameters. We shall show that the parameter estimates β̂_1, β̂_2, ..., β̂_q and residual sum of squares Σê_i² form a set of sufficient statistics for the unknown parameters β_1, β_2, ..., β_q and σ.

From Section 13.2, the log likelihood function is

    l(β, σ) = −n log σ − (1/2σ²)Σ(y_i − µ_i)².

Using matrix notation as in Sections 14.1 and 14.2, we have

    Σ(y_i − µ_i)² = (y − µ)'(y − µ).

Now since µ = Xβ, µ̂ = Xβ̂, and ê = y − µ̂, we have

    y − µ = y − µ̂ + µ̂ − µ = ê + X(β̂ − β)

and therefore

    Σ(y_i − µ_i)² = [ê + X(β̂ − β)]'[ê + X(β̂ − β)]
                  = ê'ê + ê'X(β̂ − β) + (β̂ − β)'X'ê + (β̂ − β)'X'X(β̂ − β).

Since X'ê = 0 by (14.2.2), we have ê'X = (X'ê)' = 0. Thus both cross-product terms are zero, and

    Σ(y_i − µ_i)² = Σê_i² + (β̂ − β)'X'X(β̂ − β).

It follows that

    l(β, σ) = −n log σ − (1/2σ²)[Σê_i² + (β̂ − β)'X'X(β̂ − β)].

Two samples y, y' for which β̂ and Σê_i² are the same will give rise to the same log likelihood function for β and σ. Therefore β̂ and Σê_i² form a set of sufficient statistics for the unknown parameters.

In the above argument, the x_{ij}'s and n are treated as constants whose values are known prior to the experiment, and the vector of y_i's is the experimental outcome. The only functions of the y_i's which we require for inferences about β and σ are β̂ and Σê_i². We would also need to know n and X'X, but these are not included as part of the set of sufficient statistics.
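The algebra above is easy to confirm numerically. The following sketch (illustrative only; it assumes numpy and uses made-up design and response values) computes β̂ and Σê_i² by least squares and checks the decomposition of Σ(y_i − µ_i)² at an arbitrary trial value of β.

    # Sketch: sufficient statistics in the normal linear model.
    # numpy assumed; X, y and the trial beta are illustrative values only.
    import numpy as np

    X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.], [1., 4.]])   # n = 5, q = 2
    y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # least squares estimates
    e_hat = y - X @ beta_hat                         # residuals
    rss = float(e_hat @ e_hat)                       # residual sum of squares

    beta = np.array([0.5, 1.5])                      # any trial parameter value
    lhs = float((y - X @ beta) @ (y - X @ beta))     # direct sum of squares
    d = beta_hat - beta
    rhs = rss + float(d @ (X.T @ X) @ d)             # decomposition from the text
    print(abs(lhs - rhs) < 1e-9)                     # True: the cross terms vanish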
PROBLEMS FOR SECTION 15.1

1. Show that T = Y_1 + Y_2 + ··· + Y_n is a sufficient statistic for λ in Problem 9.2.2(a), and find the probability distribution of T.

2.† Suppose that we observe a single measurement Y from N(0, σ²). Is Y a sufficient statistic for σ? Is Y minimally sufficient?

3. Bacteria are distributed randomly and uniformly throughout river water at the rate of λ bacteria per unit volume. n test tubes containing volumes v_1, v_2, ..., v_n of river water are prepared.
(a) Suppose that the number of bacteria in each of the n test tubes is determined. Find a sufficient statistic for λ.
(b) Suppose that the n samples are combined to give a single sample of volume v = Σv_i, and the total number of organisms is determined. Find a sufficient statistic for λ. Does combining the samples result in a loss of information concerning λ?

4. Show that X̄ is a sufficient statistic for µ in Problem 9.1.13.

5.† Suppose that Y has a binomial (n, θ) distribution where n is known and θ is unknown. Is the pair of statistics T_1 = Y, T_2 = n − Y minimally sufficient for θ?

6. Let X_1, X_2, ..., X_n be independent variates having a continuous uniform distribution on the interval (θ, θ + 1). Show that X_(1) and X_(n) form a pair of sufficient statistics for θ.

7.† Let X_1, X_2, ..., X_n be independent variates having a continuous uniform distribution on the interval (−θ, θ). Find a sufficient statistic for θ.

8. Let Y_1, Y_2, ..., Y_n be independent N(µ, σ²) random variables. Show the following:
(a) Ȳ is a sufficient statistic for µ when σ is known;
(b) Σ(Y_i − µ)² is a sufficient statistic for σ when µ is known;
(c) Ȳ and Σ(Y_i − Ȳ)² form a set of sufficient statistics for µ and σ when both parameters are unknown;
(d) ΣY_i and ΣY_i² also form a set of sufficient statistics for µ and σ when both parameters are unknown.

9.† A scientist makes n measurements X_1, X_2, ..., X_n of a constant µ using an apparatus of known variance σ², and m additional measurements Y_1, Y_2, ..., Y_m of µ using a second apparatus of known variance kσ². Assume that all measurements are independent and normally distributed. Show that T = nkX̄ + mȲ is a sufficient statistic for µ, and find its distribution.

10. Suppose that X_1, X_2, ..., X_n are N(µ_1, σ²) and Y_1, Y_2, ..., Y_m are N(µ_2, σ²), all independent. Show that X̄, Ȳ, and V = Σ(X_i − X̄)² + Σ(Y_i − Ȳ)² form a set of sufficient statistics for µ_1, µ_2, and σ.

11. Suppose that Y_1, Y_2, ..., Y_n are independent and exponentially distributed random variables, with E(Y_i) = (α + βx_i)^{−1}. Here x_1, x_2, ..., x_n are known constants, and α, β are unknown parameters. Find a pair of sufficient statistics for α and β.

12. Show that the Poisson, exponential, and χ² distributions are members of the exponential family.

13. Show that the normal distributions N(0, σ²) and N(µ, 1) are members of the exponential family.

14. Suppose that the distribution of X belongs to the exponential family, and Y is a one-to-one function of X. Show that the distribution of Y also belongs to the exponential family.

15.† Suppose that the distribution of X belongs to the exponential family. The parameter φ = c(θ) is called the natural parameter of the distribution. Find the natural parameter for the binomial, Poisson, and exponential distributions.

15.2. Properties of Sufficient Statistics

In this section we discuss some properties of a sufficient statistic or set of sufficient statistics T.

Let y be a typical outcome of an experiment for which the probability model depends upon an unknown parameter θ. In the preceding section we defined T to be a sufficient statistic (or set of sufficient statistics) for θ if knowledge of T(y), the observed value of T, is sufficient to determine L(θ; y) up to a proportionality constant. It follows by (15.1.3) and (15.1.4) that, if T is sufficient for θ, then

    P(y; θ) = c(y)·H(T(y); θ)     (15.2.1)

for all y, where c(y) does not depend upon θ.

Property 1. If T is sufficient for θ, then the likelihood function for θ based on the distribution of T is proportional to L(θ; y).

This result is to be expected because T carries all of the sample information concerning θ. Thus we should get the same information about θ from observing just the value of T as we would get from the complete sample y.

To find the probability of the event T = t, we sum or integrate (15.2.1) over all y such that T(y) = t. Since the second factor on the right hand side is constant in this sum or integral, we obtain

    P(T = t; θ) = [Σ_{T(y)=t} c(y)]·H(t; θ) = d(t)·H(t; θ)     (15.2.2)
where d(t) is not a function of θ. The likelihood function based on (15.2.2) will be the same up to a proportionality constant as that based on (15.2.1).

Property 2. If T is sufficient for θ, the conditional distribution of outcomes y given the observed value of T does not depend upon θ.

This property is often used to define sufficient statistics. It is closely related to the fact that if T is sufficient for θ, then (15.1.1) is independent of θ whenever T(y) = T(y'). We shall use this result in Section 15.5 to construct exact significance tests for composite hypotheses.

It follows from (3.4.1) that

    P(Y = y | T = t) = P(Y = y, T = t)/P(T = t).

The numerator in this expression equals P(Y = y) whenever T(y) = t, and it equals zero otherwise. It follows that

    P(Y = y | T = t) = P(Y = y)/P(T = t)    if T(y) = t,     (15.2.3)

and P(Y = y | T = t) = 0 otherwise. Thus by (15.2.1) and (15.2.2) we have

    P(Y = y | T = t) = c(y) / Σ_{T(y)=t} c(y) = c(y)/d(t)     (15.2.4)

which does not depend upon θ.

EXAMPLE 15.2.1. Consider n Bernoulli trials with success probability θ. Let Y = (Y_1, Y_2, ..., Y_n) be a zero-one vector indicating the observed sequence of failures and successes as in Example 15.1.1. Then

    P(Y = y; θ) = θ^{Σy_i}(1 − θ)^{n−Σy_i} = θ^t(1 − θ)^{n−t}

where t = Σy_i. Here T = ΣY_i is a sufficient statistic for θ. Since T is the total number of successes in n Bernoulli trials, it has a binomial (n, θ) distribution, and

    P(T = t; θ) = (n choose t)θ^t(1 − θ)^{n−t}    for t = 0, 1, ..., n.

The likelihood function based on observing T = t and that based on the full sample y are both proportional to θ^t(1 − θ)^{n−t}.

The conditional probability of outcome y given that T(y) = t is

    P(Y = y)/P(T = t) = 1/(n choose t)    for Σy_i = t,

and zero otherwise. All (n choose t) possible sequences of t successes and n − t failures are equally probable. This distribution does not depend upon θ, and could be used for testing the adequacy of the Bernoulli model.

EXAMPLE 15.2.2. Let Y_1, Y_2, ..., Y_n be independent Poisson variates with the same mean µ. We showed in Example 15.1.2 that

    P(Y = y; µ) = µ^t e^{−nµ}/(y_1! y_2! ··· y_n!)

and that T = ΣY_i is a sufficient statistic for µ. Since T is a sum of independent Poisson variates, it has a Poisson distribution with mean ΣE(Y_i) = nµ. See the Corollary to Example 4.6.1. Thus we have

    P(T = t; µ) = (nµ)^t e^{−nµ}/t! = µ^t e^{−nµ}(n^t/t!).

The likelihood function based on observing just the total t = Σy_i and that based on the full sample y_1, y_2, ..., y_n are both proportional to µ^t e^{−nµ}.

The conditional probability of outcome y given that T(y) = t is

    P(Y = y)/P(T = t) = [t!/(y_1! y_2! ··· y_n!)](1/n)^t    for Σy_i = t.

This is a multinomial distribution with index t and equal probability parameters p_1 = p_2 = ··· = p_n = 1/n. As expected, the conditional distribution of outcomes given the sufficient statistic does not depend upon the parameter µ.
which does not depend upon e.
Property 3. · Applying a one-to-one transformation to a set of sufficient
EXAMPLE 15.2.1. Consider n Bernoulli trials with success probability e. Let statistics produces another set of sufficient statistics.
Y = (Y1 , Y2 , .. ., Y,,) be a zero-one vector indicating the observed sequence of Let T1 , T2 , •• ., T,, be a set of sufficient statistics for 8. Suppose that
failures and successes as in Example 15.1.1. Then U 1 , U 2 , ••. ,Uk are functions of T1 , T1 , ••• , T,,, and that the transformation
P(Y = y; e) = (JLY•(l - er-z;y, = e'(,1 - er- 1 from (T1 , T2 , .. ., T,,) to (U 1 , U 2 , .. ., Uk) is invertible. Since the U;'s and T;'s
can be deduced from one another, they have the same information content.
where t = LYi· Here T =I: Y; is a sufficient statistic for e. Since Tis the total Given the values of the U/s, we can calculate the values of the T;'s and hence
number of successes in n Bernoulli trials, it has a binomial (n, e) distribution, determine L(8; y). Thus the Vi's also form a set of sufficient statistics for e.
and
EXAMPLE 15.2.3 (Normal Linear Model). We showed in Example 15.1.7 that
P(T= t; e) = ( ~) 8 (1- e)"-'
1
for t = 0, 1, ... , n. '/3 1 , '/3 2 , ... , Pq and L:ef form a set of sufficient statistics for the parameters
/Jt> /J 2 , .. ., /Jq, and a in the normal linear model. One can show, by a similar
The likelihood function based on observing T = t and that based on the full argument to that in Example 15.1.7, that
sample y are both proportional to 8'(1 - er-·. L:ef = L:Yf-P'X'X'/J.
The conditional probability of outcome y given that T(y) = t is
Note also that

P(Y=y)/P(T=t)= {~/(;) for L:yi = t;

otherwise.
p== (X'X)- 1 X'Y = (X'X)- 1 T
where T = X' Y is q x 1 with jth component
n

All ( ~) possible sequences of t successes and n - t failures are equally ~= L xuY;·


i= 1

probable. This distribution does not depend upon e, and could be used for As in Example 15.1.7, we treat X as a matrix of constants which, like n, is
testing the adequacy of the Bernoulli model. known prior to the experiment.
288 15. Sufficient Statistics and Conditional Tests
15.3. Exact Significance Levels and Coverage
Probabilities
289
Give n obser ved value s of T and L. Yf,
we can calcu late 'fJ and L.ef.
Conv ersely , T and L. Yf can be comp uted in the series expan sion will be negligible with
from Pand L.ef. Thus T and L. Yf high proba
have the same infor matio n conte nt as Pand
L.ef. Since 'fJ 1 , '(3 2 , ... , '/Jq, and L.ef -t(e
r(6)::::: -W J(B) in large samples, so that and bility e
. Thus we have
J (B) form a set of
form a set of sufficient statistics, the same appro xima te sufficient statis tics for 8. In
is true of T1 , T2 , •• • , Yq and I: Yf. this case we can summ arize nearl y
all of the infor matio n conce rning 8 by givin
Property 4. The maxi mum likeli hood estim g the most likely value iJ and a
ate Bis part of any set of sufficient meas ure of preci sion J(B).
statis tics T1 , T2 , ... , T,. in the sense that its
value can be comp uted from JUSt
the T;'s. This follows becau se the T;'s deter
mine L(8; y) up to a propo rtion ality
e
const ant, and does not depen d upon this
const ant.
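The adequacy of the approximate summary (θ̂, J(θ̂)) is easy to examine in a particular case. The sketch below (illustrative only; plain Python with a made-up sample) takes independent exponential variates with mean θ, for which the exact log relative likelihood is r(θ) = −n[θ̂/θ − 1 − log(θ̂/θ)] and the observed information is J(θ̂) = n/θ̂², and compares r(θ) with its quadratic approximation at a few values of θ.

    # Sketch: thetahat and J(thetahat) as approximate sufficient statistics
    # for an exponential sample with mean theta.  Data are illustrative.
    import math

    x = [1.2, 0.4, 2.7, 0.9, 1.8, 3.1, 0.6, 1.4, 2.2, 1.0]   # hypothetical sample
    n = len(x)
    thetahat = sum(x) / n
    J = n / thetahat ** 2                # observed information at the maximum

    def r_exact(theta):                  # exact log relative likelihood
        return -n * (thetahat / theta - 1 - math.log(thetahat / theta))

    def r_quad(theta):                   # quadratic approximation
        return -0.5 * (theta - thetahat) ** 2 * J

    for theta in (0.8 * thetahat, 1.1 * thetahat, 1.25 * thetahat):
        print(round(r_exact(theta), 4), round(r_quad(theta), 4))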
PROBLEMS FOR SECTION 15.2

1. Suppose that X has a binomial distribution with parameters (n, θ), and that Y is independent of X and has a binomial distribution with parameters (m, θ), where m and n are known. Show that T = X + Y is a sufficient statistic for θ, and verify that the conditional distribution of X and Y given T does not depend upon θ.

2. Suppose that Y_1, Y_2, ..., Y_n are independent and identically distributed random variables, with

    P(Y_i = y) = 1/N    for y = 1, 2, ..., N,

where N is an unknown positive integer.
(a) Show that Y_(n) is a sufficient statistic for N, and derive its probability function.
(b) Derive the conditional probability function of Y_1, Y_2, ..., Y_n given Y_(n).

3.† A manufacturing process produces fibers of varying lengths. The length X of such a fiber is assumed to be a continuous variate with p.d.f.

    f(x) = 2λx e^{−λx²}    for x > 0

where λ > 0. Suppose that n fibers are selected at random and their lengths X_1, X_2, ..., X_n are determined.
(a) Show that T = ΣX_i² is a sufficient statistic for λ.
(b) Show that 2λX_i² has a χ² distribution with 2 degrees of freedom, and hence that 2λT ~ χ²_(2n). Find the p.d.f. of T, and show that it gives rise to the same likelihood function for λ as the original sample.

4. Show that X_(1) is a sufficient statistic for c in Problem 9.4.10. Derive the probability density function of X_(1), and verify that the likelihood function of c is proportional to this p.d.f.

5.† Consider Problems 1 through 11 of Section 15.1. In which of these problems are the maximum likelihood estimates not sufficient statistics?

15.3. Exact Significance Levels and Coverage Probabilities

Consider an experiment for which the probability model depends upon an unknown parameter θ. To test the simple hypothesis H: θ = θ_0, we define a test statistic D such that large values of D indicate evidence against H.
The significance level is then

    SL = P{D ≥ D_obs}

where D_obs is the observed value of D. This probability is calculated under the assumption that θ = θ_0.

Similar calculations are required to find coverage probabilities. Suppose that we have a procedure for constructing a region R within which the true parameter value is thought to lie. For instance, R might be the 10% likelihood region or significance region for θ. Then the coverage probability of the region at θ = θ_0 is

    CP(θ_0) = P{θ_0 ∈ R}.

This probability is again calculated under the assumption that θ = θ_0.

A difficult problem, which we have ignored until now, is how to choose the probability distribution from which SL and CP should be calculated. In this section we consider two special cases in which it is clear how this distribution should be chosen. Some more general discussion of the problem will be given in the next section.

Case 1. θ̂ a Sufficient Statistic for θ

In Section 15.2 we noted that the MLE θ̂ is itself a sufficient statistic for θ in some simple examples. Then, by the sufficiency principle, inferences about θ should depend only on θ̂. Thus any valid test statistic D or region R will depend upon the data only through the value of θ̂. Significance levels and coverage probabilities can therefore be found from the (marginal) distribution of θ̂. We find the range of θ̂-values for which D ≥ D_obs, or for which θ_0 ∈ R, and then sum or integrate the distribution of θ̂ over this range to obtain SL or CP. Equivalently, we can work with the distribution of any convenient one-to-one function of θ̂, which will also be a sufficient statistic for θ.

EXAMPLE 15.3.1. Suppose that X_1, X_2, ..., X_n are independent exponential variates with the same mean θ. From Example 11.3.3, the log RLF of θ is

    r(θ) = −n[θ̂/θ − 1 − log(θ̂/θ)]    for θ > 0.

Here θ̂ = ΣX_i/n is a sufficient statistic for θ. Any reasonable method for testing H: θ = θ_0 or for generating regions of plausible parameter values will depend only on θ̂. Significance levels and coverage probabilities can thus be calculated from the distribution of θ̂ when θ = θ_0. Equivalently, we can use the distribution of U = 2nθ̂/θ_0, which is also sufficient for θ, and has a χ²_(2n) distribution when θ = θ_0 (see Problem 6.9.7). We used this result to find coverage probabilities of likelihood regions in Example 11.3.3.

Suppose, for instance, that n = 10, θ̂ = 3, and that we wish to test H: θ = 2. For any reasonable choice of the test statistic D, the significance level can be found from the fact that U = 2nθ̂/θ_0 = 10θ̂ has a χ²_(20) distribution when θ = 2. In particular, the likelihood ratio statistic for testing H: θ = θ_0 is

    D = −2r(θ_0) = 2n[θ̂/θ_0 − 1 − log(θ̂/θ_0)] = 2n[U/2n − 1 − log(U/2n)].

The observed value of U is 10θ̂ = 30, and thus

    D_obs = 20[30/20 − 1 − log(30/20)] = 1.8907.

By Newton's method, we find that D ≥ D_obs for U ≥ 30 and for U ≤ 12.516. Hence the exact significance level is

    SL = P{U ≤ 12.516} + P{U ≥ 30} = 1 − P{12.516 < U < 30}
       = 1 − P{12.516 < χ²_(20) < 30} = 0.1726.

For comparison, the large-sample result is

    SL ≈ P{χ²_(1) ≥ 1.8907} = 0.1691,

which agrees closely with the exact result.
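The calculation just described is easy to reproduce. The sketch below (illustrative; it assumes Python with scipy, and it locates the second root by a bracketing search rather than by Newton's method) recomputes D_obs, the lower solution of D = D_obs, and the exact and approximate significance levels.

    # Sketch: exact significance level for H: theta = 2 with n = 10, thetahat = 3,
    # using U = 2*n*thetahat/theta0, which is chi-squared(2n) under H.  scipy assumed.
    from math import log
    from scipy.stats import chi2
    from scipy.optimize import brentq

    n, thetahat, theta0 = 10, 3.0, 2.0
    u_obs = 2 * n * thetahat / theta0                     # observed U = 30

    def D(u):
        """Likelihood ratio statistic as a function of U."""
        return 2 * n * (u / (2 * n) - 1 - log(u / (2 * n)))

    d_obs = D(u_obs)                                      # about 1.8907
    u_low = brentq(lambda u: D(u) - d_obs, 1e-6, 2 * n)   # lower root, about 12.516

    sl_exact = chi2.cdf(u_low, 2 * n) + chi2.sf(u_obs, 2 * n)   # about 0.1726
    sl_approx = chi2.sf(d_obs, 1)                               # about 0.1691
    print(round(d_obs, 4), round(u_low, 3), round(sl_exact, 4), round(sl_approx, 4))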

Case 2. Ancillary Statistics

Except in some simple examples, θ̂ will not itself be a sufficient statistic. In general, it will be necessary to supplement θ̂ with one or more additional statistics T in order to obtain a set of sufficient statistics (θ̂, T) for θ. The definition of exact significance levels and coverage probabilities will depend upon the sort of information carried by the supplementary statistics T.

Because of Property 3 in Section 15.2, there will be many different ways to select T. Suppose that it is possible to find a statistic or set of statistics T such that

(i) (θ̂, T) is minimally sufficient for θ; and
(ii) the distribution of T does not depend upon θ.

Then T is called an ancillary statistic (or set of ancillary statistics) for θ. We shall argue that, in this situation, exact significance levels and coverage probabilities should be computed from the conditional distribution of θ̂ given T.

An ancillary statistic T gives no direct information about the value of θ because its marginal distribution f_2(t) does not depend upon θ. Observing just the value of T would therefore tell us nothing about the value of θ. The primary information about θ is carried by θ̂, with T providing supplementary or ancillary information.
Let f(θ̂, t) denote the joint p.f. or p.d.f. of θ̂ and T. We can write f(θ̂, t) as a product,

    f(θ̂, t) = f_1(θ̂|t)f_2(t),

where the second factor does not depend upon θ. Since (θ̂, T) is a set of sufficient statistics, L(θ) is proportional to f(θ̂, t) (see Property 1 in Section 15.2). Since f_2(t) does not depend upon θ, it follows that L(θ) is proportional to f_1(θ̂|t), the conditional p.f. or p.d.f. of θ̂ given the observed value of T. This conditional distribution carries all of the information concerning θ. The marginal distribution of T does not depend upon θ, and so it is not used in making inferences about θ.

Often one can interpret an ancillary statistic T as a measure of the precision with which it is possible to estimate θ. The various possible outcomes of an experiment may differ greatly in the amount of information about θ which they are capable of yielding. If we are fortunate, we observe an outcome which permits the value of θ to be determined quite precisely. If we are unlucky, we may obtain an uninformative outcome from which we can learn relatively little about θ.

In problems of inference, it is necessary to take into account the informativeness of the data actually obtained. The fact that we might obtain a more informative or less informative result if the experiment were repeated should be considered in designing future experiments, but it is irrelevant to the interpretation of the data at hand. The observed value of the ancillary statistic indicates the informativeness of the data actually obtained. It is therefore appropriate to base inferences about θ on the conditional distribution of θ̂ given the observed value of the ancillary statistic T.

EXAMPLE 15.3.2 (Random Sample Size). Suppose that the experiment involves n Bernoulli trials as in Example 15.1.1. For instance, we might examine n subjects for tuberculosis with the intention of estimating θ, the proportion of the population having this disease. In Example 15.1.1 we assumed that n, the sample size, was fixed and known prior to the experiment. However it may be that n itself is subject to variation, and could be modelled as an observed value of a random variable N. For instance, the sample size might depend upon the amount of money and laboratory space, and the number of personnel available for the study, and perhaps none of these is under the strict control of the experimenter. Or perhaps unforeseen circumstances unrelated to the incidence of tuberculosis could cause the experiment to be terminated after 150 people have been examined, although it was originally planned to examine 200.

Suppose, then, that the sample size N is a random variable with probability function g(n) not depending upon θ. The experiment produces n, the observed value of N, and a sequence y = (y_1, y_2, ..., y_n) where y_i = 1 or 0 according to whether the ith subject does or does not have tuberculosis. Given n, the probability of the sequence y is

    P(y|n) = θ^{Σy_i}(1 − θ)^{n−Σy_i}

as in Example 15.1.1. Hence the joint probability of y and n is

    P(y, n) = θ^{Σy_i}(1 − θ)^{n−Σy_i}g(n).

The likelihood function of θ is then

    L(θ) = θ^{nθ̂}(1 − θ)^{n(1−θ̂)}    for 0 < θ < 1,

where θ̂ = Σy_i/n. It follows that (θ̂, N) is a pair of minimally sufficient statistics for θ, and that N is ancillary. Hence inferences about θ will be based on the conditional distribution of θ̂ given the observed value of N. Equivalently, inferences about θ may be based on the conditional (binomial) distribution of ΣY_i given the observed sample size n. Coverage probabilities and significance levels will thus be calculated as in Examples 11.2.1 and 12.2.1. Although the sample size N is random, we take it as fixed in inferences about θ. The fact that we might get a different sample size in repetitions of the experiment is irrelevant to the interpretation of the data actually obtained.

In the above discussion, we assumed that the distribution of N did not depend upon θ. Of course, if the distribution of N depended upon θ, conditioning on the observed value of N would entail a loss of information concerning θ, because L(θ) would no longer be proportional to f_1(θ̂|n). For instance, one might decide to keep examining subjects until three with tuberculosis had been found and then stop. Then the distribution of N would depend upon θ, and it would not be appropriate to condition upon its observed value.

EXAMPLE 15.3.3. A total of n clouds are to be observed in an experiment to determine the effectiveness of cloud seeding in producing rain. For each cloud it is decided whether or not to seed by flipping a balanced coin. Hence Z, the number of clouds to be seeded, has a binomial (n, ½) distribution.

Let X be the number of seeded clouds which produce rain, and let Y be the number of unseeded clouds which produce rain. We assume that clouds are independent, and that the probability of rain is p_1 for a seeded cloud and p_2 for an unseeded cloud. Then, given that z clouds are seeded, X has a binomial (z, p_1) distribution, and Y has a binomial (n − z, p_2) distribution independently of X. We observe (x, y, z) and wish to make inferences about p_1 and p_2. In particular, we might want to test the hypothesis that p_1 = p_2.

The joint probability function of X, Y, and Z is

    f(x, y, z) = f(x, y|z)f(z) = f(x|z)f(y|z)f(z)
               = (z choose x)p_1^x(1 − p_1)^{z−x}·(n−z choose y)p_2^y(1 − p_2)^{n−z−y}·(n choose z)(½)^n.

The likelihood function of p_1 and p_2 is thus

    L(p_1, p_2) = p_1^x(1 − p_1)^{z−x}p_2^y(1 − p_2)^{n−z−y}
                = p_1^{z p̂_1}(1 − p_1)^{z(1−p̂_1)}p_2^{(n−z)p̂_2}(1 − p_2)^{(n−z)(1−p̂_2)}
where p̂_1 = x/z and p̂_2 = y/(n − z). Here p̂_1, p̂_2, and Z are jointly minimally sufficient for p_1 and p_2, and Z is ancillary. Inferences about p_1 and p_2 will therefore be based on the conditional distribution of p̂_1 and p̂_2 given the observed value of Z, or equivalently, on the conditional distribution of X and Y given the observed value of Z. Since Z is to be treated as fixed, a test of H: p_1 = p_2 can be carried out as in Example 12.4.1, with n_1 = z and n_2 = n − z. See Section 15.6 for discussion of an exact test of this hypothesis.

The relationship between the value of Z and precision is easily seen in this example. If one should get z = 0 (improbable but still possible), then one would not seed any clouds, and thus would obtain no information about p_1. Similarly, with z = n, one would obtain no information about p_2. In both cases, the experiment would be incapable of giving evidence against the hypothesis p_1 = p_2. However, if one obtained z ≈ n/2, the experiment would give a reasonable amount of information about p_1 and p_2, and hence would be capable of showing that they are different. The observed value of Z thus indicates the precision which is possible in inferences about p_1 and p_2. Although Z is a random variable, we regard Z as fixed at its observed value in the analysis.

In this case, the existence of the ancillary statistic Z shows up a defect in the design of the experiment. It would be better to set up the experiment so that the value of Z was fixed in advance near n/2. This could be done by drawing balls at random without replacement from an urn containing n/2 white balls and n/2 black balls, and seeding a cloud if a white ball is drawn.

EXAMPLE 15.3.4 (Cauchy Distribution). Suppose that Y_1, Y_2, ..., Y_n are independent variates having a Cauchy distribution centered at θ. From Example 15.1.5, the complete set of order statistics Y_(1) ≤ Y_(2) ≤ ··· ≤ Y_(n) is minimally sufficient for θ.

In this example it is possible to find a set of n − 1 ancillary statistics. To see this we note that the distribution of U_i = Y_i − θ does not depend upon θ. In fact, U_i has a Cauchy distribution centred at zero, with p.d.f.

    f(u) = 1/[π(1 + u²)]    for −∞ < u < ∞.

Now consider the n − 1 statistics

    A_i = Y_(i+1) − Y_(i)    for i = 1, 2, ..., n − 1.

Since Y_(i) = θ + U_(i), we have

    A_i = [θ + U_(i+1)] − [θ + U_(i)] = U_(i+1) − U_(i).

The distribution of the U_i's does not depend upon θ, and so neither does the distribution of A_1, A_2, ..., A_{n−1}.

Now let T be any statistic such that the transformation from Y_(1), Y_(2), ..., Y_(n) to T, A_1, A_2, ..., A_{n−1} is one-to-one. For instance, we could take T = Y_(i) for any i, or T = Ȳ, or T = θ̂. Then (T, A_1, A_2, ..., A_{n−1}) is a set of minimally sufficient statistics for θ, and A_1, A_2, ..., A_{n−1} are ancillary. All of the information about θ is carried by the conditional distribution of T given the observed values of the ancillary statistics. This distribution, which may be found by numerical integration, would be used for calculating exact significance levels or coverage probabilities.

In this example, the ancillary statistics give information about the shape of the likelihood function. For instance if n = 2, L(θ) has a unique maximum at ȳ when a_1 is small, but is bimodal with a relative minimum at ȳ when a_1 is large. The observed value of A_1 indicates the shape of L(θ), and hence the appropriate form of likelihood and confidence regions. However A_1 itself tells us nothing about the magnitude of θ.

PROBLEMS FOR SECTION 15.3

1.† Suppose that patients arrive for treatment according to a Poisson process in time with 20 arrivals per year on average. The treatment is successful for a fraction θ of patients. Let X be the number of successful treatments and Y the number of unsuccessful treatments in a one-year period. Then X and Y are independent Poisson variates with means 20θ and 20(1 − θ). Find an ancillary statistic T such that θ̂ and T are jointly sufficient for θ, and derive the appropriate conditional distribution for inferences about θ.

2. Let X_1, X_2, ..., X_n be independent random variables having a continuous uniform distribution on the interval [θ, θ + 1].
(a) Show that θ̂ = X_(n) − 1, and that T = X_(n) − X_(1) is an ancillary statistic.
(b) Show that the value of θ must lie in the interval [θ̂, θ̂ + c], where c is the observed value of T.
(c) Show that the interval [θ̂, θ̂ + ½] has (unconditional) coverage probability 1 − (½)^n.
(d) If n = 3, then [θ̂, θ̂ + ½] is an 87.5% confidence interval for θ. Explain why this interval might not give a satisfactory summary of the information provided by the data concerning the value of θ.

3. Let Y_1, Y_2, ..., Y_n be independent variates having a continuous uniform distribution on the interval (θ, 2θ), where θ > 0.
(a) Show that Y_(1) and Y_(n) together are sufficient for θ, and that θ̂ is not a sufficient statistic.
(b) Show that A = Y_(n)/Y_(1) is ancillary, and that θ̂ and A are jointly sufficient for θ.

4.† Let Y_1, Y_2 be independent variates having a Cauchy distribution centered at θ, and define A_1 = Y_(2) − Y_(1) as in Example 15.3.4.
(a) Show that, if A_1 ≤ 2, the likelihood equation l'(θ) = 0 has just one real root, and that θ̂ = ȳ.
(b) Show that, if A_1 > 2, the likelihood equation has three real roots, and that there is a relative minimum at θ = ȳ.
15.4. Choosing the Reference Set

To evaluate the significance level for a test of H: θ = θ_0, it is necessary to imagine a series of repetitions of the experiment with θ fixed at θ_0. At each repetition the value of the test statistic D is to be computed and compared with D_obs. The significance level is the fraction of the time that D would be greater than or equal to D_obs in infinitely many repetitions. Coverage probabilities are dependent on a similar imaginary set of repetitions. The series of repetitions with respect to which SL and CP are defined is sometimes called the reference set for inferences about θ.

Even if the experiment were actually going to be repeated over and over again, care would be required in choosing the reference set for inferences about θ. The planned series of repetitions will not necessarily be the appropriate set for inferences about θ! For instance, in the cloud seeding experiment (Example 15.3.3), the number Z of clouds seeded would vary in future repetitions. However significance levels and coverage probabilities should be computed from the conditional distribution of X and Y, with the ancillary statistic Z held fixed at its observed value.

Most real experiments do not get repeated over and over again, and so the reference set (or series of repetitions) is purely hypothetical. Usually all that we have is a set of data from which we wish to extract information about θ and a description of how it was collected. It may be possible to imagine many different ways in which the experiment could be repeated. Except in some simple examples it is not obvious what set of repetitions is appropriate for inferences about θ.

Significance levels and coverage probabilities are dependent on the choice of a reference set. Since it is often unclear how the reference set should be chosen, there is an unavoidable fuzziness about the definitions of exact significance levels and coverage probabilities.

In this section we consider two examples which illustrate the dependence of SL and CP on the choice of the reference set. These examples also illustrate an important property of the likelihood ratio statistic: its distribution is remarkably stable under different possible choices of the reference set. Thus, if likelihood ratio tests are used, it generally matters very little how the reference set is chosen. Similarly, intervals constructed from the likelihood function or from likelihood ratio tests will have practically the same coverage probability under a variety of different choices for the reference set. This is an important advantage of likelihood-based methods.

EXAMPLE 15.4.1. Suppose that X = 15 successes and Y = 35 failures are observed in successive Bernoulli trials with P(success) = θ. Consider a test of H: θ = θ_0 using some test statistic D(X, Y), and let D_obs = D(15, 35) be the observed value of D. Then the significance level is the sum of the probabilities of pairs (x, y) for which D(x, y) ≥ D_obs.

One could imagine repeating this experiment in many different ways, three of which are as follows:

(1) Repeat with X + Y fixed at 50.
(2) Repeat with X fixed at 15, so that Y is the number of failures before the 15th success.
(3) Repeat with Y fixed at 35, so that X is the number of successes before the 35th failure.

Under H, the probability of pair (x, y) in the three cases is

    f_1(x, y) = (x+y choose x)θ_0^x(1 − θ_0)^y    for x + y = 50; x = 0, 1, ..., 50;

    f_2(x, y) = (x+y−1 choose x−1)θ_0^x(1 − θ_0)^y    for x = 15; y = 0, 1, 2, ...;

    f_3(x, y) = (x+y−1 choose y−1)θ_0^x(1 − θ_0)^y    for y = 35; x = 0, 1, 2, ....

We have three different reference sets depending upon what sequence of repetitions we imagine.

In case (1), we calculate SL by summing f_1(x, y) over all pairs (x, y) for which x + y = 50 and D(x, y) ≥ D_obs. In (2), we sum f_2(x, y) over all (x, y) with x = 15 and D(x, y) ≥ D_obs. And in (3) we sum f_3(x, y) over all (x, y) with y = 35 and D(x, y) ≥ D_obs. The significance level will in general be different for the three cases. Two observers who see the same sequence of 15 successes and 35 failures might therefore calculate different significance levels (or confidence intervals) because they imagine different ways in which the experiment might be repeated. And of course it is entirely possible that there is no intention of actually repeating the experiment anyway!

It is a bit unsettling that inferences should depend upon an imaginary set of future repetitions which will not actually be carried out. However this is unavoidable if we wish to consider frequency characteristics such as significance levels and coverage probabilities. What we can do is attempt to lessen the importance of choosing the reference set by using methods closely related to the likelihood function.

In all three cases above, the log likelihood function of θ is

    l(θ) = x log θ + y log(1 − θ)    for 0 < θ < 1,

and the MLE is θ̂ = x/(x + y). The likelihood ratio statistic for testing H: θ = θ_0 is

    D(x, y) = −2r(θ_0) = 2[x log(x/((x + y)θ_0)) + y log(y/((x + y)(1 − θ_0)))].

In all three situations D ~ χ²_(1) approximately if H is true, and the approximate significance level is P{χ²_(1) ≥ D_obs}. If we are content to use this large-sample approximation, it does not matter which of the three reference sets is chosen.
Table 15.4.1. Exact Significance Levels for Three Possible Reference Sets

    θ_0     Approx. SL    Exact significance levels
                          (1)       (2)       (3)
    0.15    0.0073        0.0082    0.0081    0.0077
    0.16    0.0136        0.0186    0.0151    0.0226
    0.17    0.0237        0.0372    0.0262    0.0238
    0.18    0.0393        0.0403    0.0433    0.0489
    0.19    0.0619        0.0685    0.0678    0.0871
    0.20    0.0933        0.1087    0.0995    0.0904
    0.40    0.1416        0.1528    0.1407    0.1560
    0.42    0.0798        0.0877    0.0907    0.0879
    0.44    0.0421        0.0471    0.0528    0.0489
    0.46    0.0208        0.0235    0.0263    0.0243
    0.48    0.0096        0.0108    0.0104    0.0121
    0.50    0.0041        0.0066    0.0057    0.0049

The exact significance level in the likelihood ratio test depends on the choice of the reference set, but the dependence is slight. For instance, consider a test of H: θ = 0.2, for which D_obs = 2.82 and SL ≈ P{χ²_(1) ≥ 2.82} = 0.0933. In (1) we find that D(x, y) < D_obs for 6 ≤ x ≤ 14, and thus

    SL_1 = 1 − Σ_{x=6}^{14} f_1(x, 50 − x) = 0.1087.

In (2) we have D < D_obs for 36 ≤ y ≤ 93, and

    SL_2 = 1 − Σ_{y=36}^{93} f_2(15, y) = 0.0995.

In (3) we have D < D_obs for 4 ≤ x ≤ 14, and

    SL_3 = 1 − Σ_{x=4}^{14} f_3(x, 35) = 0.0904.

Similarly close agreement is found for other hypothesized values (see Table 15.4.1).
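The exact levels in Table 15.4.1 can be reproduced directly from the three sampling models. The sketch below (illustrative; it assumes scipy, and the infinite sums in cases (2) and (3) are truncated at a large upper limit) recomputes the row for θ_0 = 0.2.

    # Sketch: exact significance levels for H: theta = 0.2 under the three
    # reference sets of Example 15.4.1 (x = 15 successes, y = 35 failures).
    # scipy assumed; the negative binomial sums are truncated at 2000 terms.
    from math import log
    from scipy.stats import binom, nbinom, chi2

    theta0, x_obs, y_obs = 0.2, 15, 35

    def D(x, y):
        """Likelihood ratio statistic; terms with zero counts contribute zero."""
        n = x + y
        d = 0.0
        if x > 0:
            d += 2 * x * log(x / (n * theta0))
        if y > 0:
            d += 2 * y * log(y / (n * (1 - theta0)))
        return d

    d_obs = D(x_obs, y_obs)                                   # about 2.82

    # (1) x + y fixed at 50: X ~ binomial(50, theta0)
    sl1 = sum(binom.pmf(x, 50, theta0) for x in range(51) if D(x, 50 - x) >= d_obs)
    # (2) x fixed at 15: Y = failures before the 15th success, Y ~ nbinom(15, theta0)
    sl2 = sum(nbinom.pmf(y, 15, theta0) for y in range(2001) if D(15, y) >= d_obs)
    # (3) y fixed at 35: X = successes before the 35th failure, X ~ nbinom(35, 1 - theta0)
    sl3 = sum(nbinom.pmf(x, 35, 1 - theta0) for x in range(2001) if D(x, 35) >= d_obs)

    print(round(chi2.sf(d_obs, 1), 4), round(sl1, 4), round(sl2, 4), round(sl3, 4))
    # approximately 0.0933, 0.1087, 0.0995, 0.0904 -- the theta0 = 0.2 row of Table 15.4.1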
For reasons similar to those given in Example 11.2.1, the significance level is a discontinuous function of θ_0, and the discontinuities will occur at different parameter values in (1), (2), and (3). This accounts almost entirely for the differences among SL_1, SL_2, and SL_3.

When the likelihood ratio test is used, it matters very little whether (1), (2), or (3) is assumed. This will generally not be the case for other choices of the test statistic D.

EXAMPLE 15.4.2. Suppose that there are two different techniques for determining the log concentration (in standard units) of a chemical in solution. The first technique gives a reading X ~ N(µ, 1) where µ is the true log concentration, while the second gives X ~ N(µ, 100). A solution is assigned to either the first technique or the second by flipping an unbiased coin, and a single measurement is taken. We wish to obtain a confidence interval for the true log concentration µ of this particular solution.

Define T = 0 if the first technique is used, and T = 1 otherwise. The experiment yields a pair of values (x, t). Given t, X has standard deviation 10^t, and p.d.f.

    f(x|t) = [1/(10^t √(2π))] exp{−½(x − µ)²/10^{2t}}    for −∞ < x < ∞,

and the joint distribution of X and T is

    f(x, t) = f(x|t)·f_2(t) = ½f(x|t)    for −∞ < x < ∞; t = 0, 1.

Hence the likelihood function of µ is

    L(µ) = exp{−½(x − µ)²/10^{2t}}    for −∞ < µ < ∞.

The MLE is µ̂ = x, and (µ̂, T) is a pair of minimally sufficient statistics for µ. Note that T is an ancillary statistic because its distribution does not depend upon µ.

Because of the symmetry, it is natural to consider symmetric intervals X ± a.

(a) Conditional reference set. Since T is ancillary, the arguments of Section 15.3 imply that coverage probabilities should be calculated from the conditional distribution of X (or µ̂) given the observed value of T. Thus the coverage probability of X ± a is

    CP(µ_0) = P{µ_0 ∈ X ± a | T = t} = P{|X − µ_0| ≤ a | T = t} = P{|Z| ≤ a/10^t}

where Z ~ N(0, 1). For instance, if a = 3, the coverage probability is P{|Z| ≤ 3} = 0.997 when t = 0, and P{|Z| ≤ 0.3} = 0.236 when t = 1. The 95% confidence interval for µ is X ± 1.96 when t = 0, and X ± 19.6 when t = 1.

(b) Unconditional reference set. The unconditional coverage probability of the interval X ± a is

    CP(µ_0) = P{µ_0 ∈ X ± a} = P{|X − µ_0| ≤ a}
            = P{|X − µ_0| ≤ a | T = 0}P{T = 0} + P{|X − µ_0| ≤ a | T = 1}P{T = 1}
            = ½P{|Z| ≤ a} + ½P{|Z| ≤ a/10},

where Z ~ N(0, 1). For instance, the coverage probability of X ± 3 is

    ½P{|Z| ≤ 3} + ½P{|Z| ≤ 0.3} = ½(0.997 + 0.236) = 0.617
for all µ_0, and so X ± 3 is a 61.7% confidence interval for µ. Similarly, we find that X ± 16.45 is a 95% confidence interval for µ. The 95% coverage probability is achieved by including µ_0 with probability 1 whenever the precise technique is used (t = 0), and with probability 0.9 whenever t = 1.

Clearly it is the conditional reference set which is appropriate in this example. If it is known that the measurement was made with the more precise technique, then the narrower interval x ± 1.96 should be given. The fact that half of future measurements would be made with the less precise technique is irrelevant in so far as inferences about µ are concerned.

(c) Likelihood ratio statistic. The likelihood ratio statistic for testing H: µ = µ_0 is

    D = −2r(µ_0) = (X − µ_0)²/10^{2T}.

When T = 0, D is the square of the N(0, 1) variate X − µ_0, and when T = 1, D is the square of the N(0, 1) variate (X − µ_0)/10. Thus the conditional distribution of D given T = t is χ²_(1) for t = 0 and for t = 1. It follows that the unconditional distribution of D is also χ²_(1).

In Chapter 11 we suggested that confidence intervals be constructed from the likelihood function. Since P{χ²_(1) ≤ 3.841} = 0.95, we take D ≤ 3.841 to obtain the 95% confidence interval

    X ± 1.96 × 10^T.

This interval has coverage probability 0.95 both conditionally and unconditionally:

    P{µ_0 ∈ X ± 1.96 × 10^T | T = t} = P{µ_0 ∈ X ± 1.96 × 10^T} = 0.95.

Similarly, we have

    P{D ≥ D_obs | T = t} = P{D ≥ D_obs},

so we get the same significance level whether or not we condition on T. When the likelihood ratio statistic is used, we get the correct answer even if we use the wrong (unconditional) reference set!

We noted in Section 15.3 that, if (θ̂, T) is minimally sufficient for θ and T is ancillary, then L(θ) is proportional to f(θ̂|T = t). Because of this, significance tests and confidence intervals based on the likelihood ratio statistic will automatically reflect the presence of ancillary statistics, and conditional significance levels and coverage probabilities will usually differ only slightly from the unconditional values. Choice of the appropriate reference set for inferences about θ is less important when we work with the likelihood ratio statistic.
15.5. Conditional Tests for Composite Hypotheses

H is called a composite hypothesis if, under H, there remains an unknown parameter or vector of parameters θ. Most of the examples in Chapters 12, 13, and 14 involved tests of composite hypotheses.

A special feature of the normal distribution examples of Chapters 13 and 14 is that the exact distribution of the likelihood ratio statistic D does not depend upon the values of any unknown parameters. For instance, the likelihood ratio statistic for testing hypotheses about the slope β in a straight line model is

    D = n log[1 + T²/(n − 2)],    where T = (β̂ − β)/√(s²c) ~ t_(n−2)

and c = 1/S_xx (see Sections 14.4 and 13.6). The distribution of T does not depend upon the values of the unknown intercept α and variance σ², and so neither does the distribution of D. Thus P{D ≥ D_obs} does not depend on α or σ².

Usually, the exact distribution of the test statistic D does depend upon the value of any unknown parameter θ not specified by the hypothesis. Then P{D ≥ D_obs} will be a function of θ rather than a numerical value.

One way around this problem is to compute the significance level from an appropriate conditional distribution which does not depend upon θ. Suppose that, under H, T is a sufficient statistic or set of sufficient statistics for θ. Then, by (15.2.1), we can write the probability of a typical outcome y as

    P(Y = y; θ) = c(y)·H(t; θ)     (15.5.1)

where t = T(y), and c does not depend upon θ. By (15.2.4), the conditional probability of y given that T = t is

    P(Y = y | T = t) = c(y)/d(t)     (15.5.2)

where d(t) is the sum of c(y) over all y for which T(y) = t.

Suppose that we compute the significance level from the conditional distribution of Y given the observed value of T:

    SL = P{D ≥ D_obs | T = t}.     (15.5.3)

Then, since this conditional distribution does not depend upon θ, we shall obtain a numerical value for the significance level.

An example follows which illustrates this conditional procedure, and some general comments are given at the end of the section. Additional examples of conditional tests for composite hypotheses are considered in Section 15.6.

The Hardy-Weinberg Law

In some simple cases, the inheritance of a characteristic such as flower color is governed by a single gene which occurs in two forms, R and W say. Each individual has a pair of these genes, one obtained from each parent, so there are three possible genotypes: RR, RW, and WW.

Suppose that, in both the male and female populations, a proportion θ of the genes are of type R and the other 1 − θ are of type W. Suppose further that

mating occurs at random with respect to this gene pair. Then the proportions If n is large, then D has approximately a x2 distribution with one degree of
of individuals with genotypes RR, RW, and WW in the next generation will freedom, and
be SL~ P{xtl);?::. D0 b,}.
2 (15.5.4)
P1 = 82 , P2 = 28(1 - 8), p3 = (1 - 8) .
The unconditional probability of the event D ~ D b, would be computed by
0

Furthermore, if random mating continues, these proportions will remain summing the trinomial probabilities P(y; 8) over all y 1 , y 2 , y 3 such that
nearly constant for generation after generation. This famous result from Dz Dobs· This probability will depend upon what value is taken for the
Genetics is called the Hardy-Weinberg Law. unknown parameter 8. Instead, we compute the conditional probability of
Suppose that n individuals (e.g. pea plants) are selected at random and are Dz D 0 " ' given the observed value of T. This conditional probability is found
classified according to genotype. Let y 1 be the number with genotype RR (red by summing P( Y = yl T = t ), and it will not depend upon 8.
flowers), y 2 the number with genotype RW (pink flowers), and y 3 the number Since 8 = t/2n, conditioning on the observed t is equivalent to restricting
with genotype WW (white flowers), where y 1 + y 2 + y 3 = n. We wish to test · attention to those outcomes for which 8 equals its observed value. Hence the
whether these observed frequencies are consistent with the Hardy-Weinberg expected frequencies e 1 , e2 , e 3 will be the same for all outcomes considered in
Law (15.5.4). the conditional test.
Note that, under (15.5.4), there remains an unknown parameter 8 to be To compute the exact conditional significance level, we list all possible
estimated from the data. Thus the hypothesis to be tested is composite. outcomes (y 1 , y 2 , y 3 ) with y 1 + y 2 + y 3 = n and 2y 1 + y 2 = t. For each we
Following the procedure described above, we shall calculate the significance calculate D(y) and c(y). We sum c(y) over all these outcomes to obtain d(t),
level from the conditional distribution of the Y;'s given the observed value of a and divide to get the conditional probabilities P(Y = yl T = t). Finally, we sum
sufficient statistic T. these probabilities over all outcomes such that D(y) ~ D bs· This procedure is
0
Under the hypothesis, the distribution of the Yi's is trinomial with illustrated in the following example.
probability parameters as given in (15.5.4):
EXAMPLE 15.5.1. Suppose that n = 20 individuals were observed, and that the
P(Y = y; 8) =( n ) [8 2JY1[28(1 - 8)Y1[(l - 8) 2 ] 13 observed frequencies were as follows:
Y1Y2Y3

=( n )2Y18'(l-8)2n-1
Yi Y2Y3
Genotype

Obs. freq. y1
RR

5(2.8)
RW

5(9.4)
WW

10(7.8)
Total

20
where t = 2y 1 + y 2 • Here T = 2 Y1 + Y2 is a sufficient statistic for 8, and we
have
Here t = 2y 1 +y 2 =15, and 8 = t/2n = 0.375. The expected frequencies are as
shown in parentheses, and
P(Y = y; 8) = c(y) • H(t; 8)
5
= 2 [ 5 log- 5 + 10 log-
+ 5 log- 10] =
where H(t; 8) = 8'(1 - 8) 2"-', and Dobs
2.8
4.45.
9.4 7.8
c(y) =( n )1Y>. All possible outcomes (y 1 , y 2 , y 3 ) with y 1 + y 2 + y 3 = 20 and 2Yi + Y2 = 15
Y1Y2Y3 are listed in Table 15.5.l together with the corresponding values of D(y) and
By (15.5 .2), the conditional probability of outcome y given that T= t is c(y). Summing c(y) gives d( 15) x 10- 10 = 4.0225, and we divide by this value
c(y)/d(t) where d(t) is the sum of c(y) over ally such that 2y 1 + y 2 = t, and, of to get the conditional probabilities P(Y = yl T = 15) = c(y)/d(l5). There are
course, y 1 + y 2 + y3 = n. four outcomes in the table such that D;?::. D0 b,, and summing their proba-
The MLE of 8 is 8 = t/2n, and the estimated expected frequencies for the bilities gives
three genotypes are SL= 0.0126 + 0.0370 + 0.0028 + 0.0001 = 0.0525.
e1 = n0 2 , e2 = 2nB(l - B), e 3 = n(l - 0) 2
. For comparison, the large-sample approximation gives
By (12.5.1), the likelihood ratio statistic for testing the hypothesis (15.5.4) is SL~ P{xtii z 4.45} = 0.035.
D(y) = 2'.Eyi log (y1/eJ)· The agreement is not too bad in view of the small expected frequencies.

J:
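The enumeration described above is short enough to carry out by machine. The sketch below (illustrative; plain Python) lists all outcomes with y_1 + y_2 + y_3 = 20 and 2y_1 + y_2 = 15 and reproduces the exact conditional significance level of Example 15.5.1 (compare Table 15.5.1, which follows).

    # Sketch: exact conditional test of the Hardy-Weinberg law for the data of
    # Example 15.5.1 (n = 20, t = 2*y1 + y2 = 15).  Plain Python.
    from math import comb, log

    n, t = 20, 15
    y_obs = (5, 5, 10)

    def c(y):                      # c(y) = multinomial coefficient * 2**y2
        y1, y2, y3 = y
        return comb(n, y1) * comb(n - y1, y2) * 2 ** y2

    def D(y):                      # likelihood ratio statistic 2 * sum y_j log(y_j/e_j)
        th = t / (2 * n)
        e = (n * th ** 2, 2 * n * th * (1 - th), n * (1 - th) ** 2)
        return 2 * sum(yj * log(yj / ej) for yj, ej in zip(y, e) if yj > 0)

    outcomes = [(y1, t - 2 * y1, n - t + y1) for y1 in range(t // 2 + 1)]
    d_t = sum(c(y) for y in outcomes)          # d(t)

    d_obs = D(y_obs)     # about 4.41 with exact e_j (the text's 4.45 uses rounded e_j)
    sl = sum(c(y) / d_t for y in outcomes if D(y) >= d_obs)
    print(round(sl, 4))  # 0.0525, in agreement with Example 15.5.1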
Table 15.5.1. Evaluation of the Exact Conditional Significance Level in a Test of the Hardy-Weinberg Law

    y_1   y_2   y_3    D(y)     c(y) × 10^{−10}    P(Y = y | T = 15)
    0     15    5      9.57     0.0508             0.0126
    1     13    6      3.22     0.4445             0.1105
    2     11    7      0.60     1.2383             0.3078
    3     9     8      0.04     1.4189             0.3527
    4     7     9      1.30     0.7095             0.1764
    *5    5     10     4.45     0.1490             0.0370
    6     3     11     9.86     0.0113             0.0028
    7     1     12     18.69    0.0002             0.0001
    Total                       4.0225             0.9999

Note that only one of the y_i's is "free to vary" in Table 15.5.1, the other two then being determined by the constraints y_1 + y_2 + y_3 = 20 and 2y_1 + y_2 = 15. This is directly related to the single degree of freedom in the χ² approximation.

It is possible to obtain an algebraic formula for P(Y = y | T = t) in this case. Since T = 2Y_1 + Y_2 represents the total number of R-genes out of 2n genes selected at random, the distribution of T is binomial (2n, θ). It follows that

    P(T = t) = (2n choose t)θ^t(1 − θ)^{2n−t}    for t = 0, 1, ..., 2n

and (15.2.3) gives

    P(Y = y | T = t) = P(Y = y)/P(T = t) = (n choose y_1 y_2 y_3)2^{y_2} / (2n choose t).

In the example we have

    P(Y = y | T = 15) = (20 choose y_1 y_2 y_3)2^{y_2} / (40 choose 15),

and this formula could have been used to calculate the last column of Table 15.5.1.

Discussion

The conditional test is based on a factorization of the distribution of Y:

    P(Y = y; θ) = P(T = t; θ)·P(Y = y | T = t).

Since T is sufficient for θ, the first factor carries all of the information about the unknown parameter under H. The second factor does not depend upon θ, and is used for testing the hypothesis H.

Often T can be thought of as a measure of precision, and there are good reasons for conditioning on its observed value. For instance, T = 2Y_1 + Y_2 indicates the amount of information available for testing the Hardy-Weinberg Law. If T is close to 2n, then almost all individuals must necessarily fall in the RR class, whether or not the Hardy-Weinberg Law holds, and it will not be possible to obtain evidence against this hypothesis. A similar comment applies when T is close to 0. The prospect of obtaining evidence against the hypothesis is much better when T is close to n. Thus T is a measure of the experiment's precision, and one can argue, as in Section 15.4, that inferences should be made conditional on its observed value.

Conditioning on a set of sufficient statistics will not always give satisfactory results, because in so doing we may discard some of the information relevant to assessing the hypothesis. This information loss can be substantial in some examples. As a general rule, it seems dangerous to use this conditional procedure unless θ̂ is sufficient for θ and T is a one-to-one function of θ̂, as in the Hardy-Weinberg example. If θ̂ is not sufficient, it is probably better to use the conditional distribution of Y given θ̂, even though this distribution will not be completely independent of θ.

Again, there are advantages in taking D to be the likelihood ratio statistic for testing H. In large samples, D and θ̂ are distributed independently of one another. Significance levels computed from the χ² approximation (12.3.2) can therefore be regarded as either conditional (given θ̂) or unconditional. Except in very small samples, the conditional distribution of D given θ̂ will be almost the same as the unconditional distribution of D. As we noted in Section 15.4, the distribution of the likelihood ratio statistic is remarkably stable under different possible choices for the reference set. With likelihood ratio tests it usually doesn't matter much whether the significance level is computed conditionally (given θ̂) or unconditionally.

15.6. Some Examples of Conditional Tests

A conditional procedure for testing composite hypotheses was described in Section 15.5. In this section, we give some additional examples of conditional tests.

Comparison of Binomial Proportions

Suppose that Y_1 and Y_2 are independent with Y_1 ~ binomial (n_1, p_1) and Y_2 ~ binomial (n_2, p_2), and that we wish to test the composite hypothesis H: p_1 = p_2 = p, say, where p is unknown. Under H, the joint probability function
of Y_1 and Y_2 is

    P(Y = y; p) = (n_1 choose y_1)p^{y_1}(1 − p)^{n_1−y_1}·(n_2 choose y_2)p^{y_2}(1 − p)^{n_2−y_2}
                = (n_1 choose y_1)(n_2 choose y_2)p^t(1 − p)^{n_1+n_2−t}

where t = y_1 + y_2. Thus T = Y_1 + Y_2 is a sufficient statistic for p, and the test of H will be based on the conditional distribution of Y_1 and Y_2 given the observed value of T.

The distribution of T is binomial (n_1 + n_2, p), and so

    P(T = t; p) = (n_1+n_2 choose t)p^t(1 − p)^{n_1+n_2−t}.

By (15.2.3), the conditional distribution of Y given T = t is

    P(Y = y | T = t) = P(Y = y; p)/P(T = t; p) = (n_1 choose y_1)(n_2 choose y_2) / (n_1+n_2 choose t)

where y_1 + y_2 = t. This conditional distribution is hypergeometric, and it does not depend upon the unknown parameter p.

Under H, the MLE of p is p̂ = t/(n_1 + n_2). From Section 12.4, the likelihood ratio statistic for testing H is

    D(y) = 2Σ[y_i log(y_i/(n_i p̂)) + (n_i − y_i) log((n_i − y_i)/(n_i(1 − p̂)))].

The exact conditional significance level is found by summing P(Y = y | T = t) over all y such that y_1 + y_2 = t and D ≥ D_obs. Note that, since p̂ = t/(n_1 + n_2), the estimated expected frequencies n_i p̂ and n_i(1 − p̂) will be the same for all y considered in the conditional test.

EXAMPLE 15.6.1. For the data of Example 12.4.1, we have n_1 = n_2 = 44, and the observed value of T is t = 14 + 4 = 18. Hence p̂ = 18/88, and the estimated expected frequencies are n_i p̂ = 9 and n_i(1 − p̂) = 35. The likelihood ratio statistic for testing H: p_1 = p_2 is

    D(y_1, y_2) = 2Σy_i log(y_i/9) + 2Σ(44 − y_i) log((44 − y_i)/35)

with observed value D_obs = D(4, 14) = 7.32. The conditional probability function of (Y_1, Y_2) given that T = 18 is

    g(y_1, y_2) = (44 choose y_1)(44 choose y_2) / (88 choose 18)

where y_1 + y_2 = 18. If p_1 = p_2, then the 18 rats with tumors are a random sample without replacement from the 88 rats in the study, and g(y_1, y_2) is the probability that y_1 of the rats with tumors received the low dose and the other y_2 = 18 − y_1 received the high dose.

Table 15.6.1. Calculation of Conditional Significance Level in Example 15.6.1

    y_1   y_2   g(y_1, y_2)   D(y_1, y_2)      y_1   y_2   g(y_1, y_2)   D(y_1, y_2)
    0     18    0.0000        29.63            10    8     0.1818        0.28
    1     17    0.0000        20.92            11    7     0.1215        1.13
    2     16    0.0002        15.21            12    6     0.0616        2.55
    3     15    0.0013        10.80            13    5     0.0233        4.60
    *4    14    0.0065        7.32             14    4     0.0065        7.32
    5     13    0.0233        4.60             15    3     0.0013        10.80
    6     12    0.0616        2.55             16    2     0.0002        15.21
    7     11    0.1215        1.13             17    1     0.0000        20.92
    8     10    0.1818        0.28             18    0     0.0000        29.63
    9     9     0.2078        0.00
    Total       1.0002

The 19 possible outcomes (y_1, y_2) with y_1 + y_2 = 18 are listed in Table 15.6.1. There are 10 outcomes with D ≥ D_obs, and we sum their conditional probabilities to obtain the exact conditional significance level, SL = 0.0160. For comparison, the large-sample approximation gives

    SL ≈ P{χ²_(1) ≥ 7.32} = 0.0068.

The agreement with the exact result is not very good, although the general conclusion (strong evidence that p_1 ≠ p_2) is the same in either case.

When there is only one degree of freedom, the accuracy of the large-sample approximation to the exact conditional significance level can often be improved by using a continuity correction (see Section 6.8). In this example, we replace y_1 = 4 by y_1 = 4.5 and y_2 = 14 by y_2 = 13.5 before computing D. We then obtain D_obs = 5.87, and

    SL ≈ P{χ²_(1) ≥ 5.87} = 0.0154

which is much closer to the exact result.
EXAMPLE 15.6.L For the data of Example 12.4.l, we have n 1 =n2=44, and
the observed value of Tis t = 14 + 4 = 18. Hence p = 18/88, and the estimated
expected frequencie s are n; f5 = 9, and n;(I - p) = 35. The likelihood ratio Exact Test for Independence
statistic for testing H: Pi= P2 is
As in Section 12.6, we consider an a x b contingen cy table (f;;) with row totals
D(y 1 , y 2 ) = 2Ly; log(y;/9) + 2L(44 - Y;) log((44- yJ/ 35)
r; and columns totals c1 where Lr;= Lc1 = n. The J;/s have a multinom ial
with observed value D0 b, = D(4, 14) = 7.32. The condition al probabilit y distributio n, and the independe nce hypothesi s is H: Pu= a.J31 where the a./s
function of(Y1 , Y2 ) given that T= 18 is and /3/s are unknown parameter s. Under H, the probabilit y of the f;/s is
n a b

P(f; a., {3) = U11J12 ... lab) i=f1I j=f1I (a.1 f31Y 11

. }
where y 1 + y 2 = 18. If p 1 = p 2 , then the 18 rats with tumors are a ra~do'll '
sample without replaceme nt from the 88 rats in the study, and g(y1 , Y2) is the
probabilit y that y 1 of the rats with tumors received the low dose and the other The r/s and c/s are sufficient statistics for the unknown parameter s.
y = 18 - y 1 received the high dose.
2
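A minimal Python sketch (not from the text) of the calculation in Example 15.6.1: enumerate the 19 outcomes with y1 + y2 = 18, compute the hypergeometric probability and the likelihood ratio statistic for each, and sum the probabilities of outcomes at least as extreme as the one observed.

    # Sketch: exact conditional test of H: p1 = p2 for two binomial samples,
    # using the data of Example 15.6.1 (n1 = n2 = 44, observed y1 = 4, y2 = 14).
    from math import comb, log

    n1 = n2 = 44
    y1_obs, y2_obs = 4, 14
    t = y1_obs + y2_obs
    p_hat = t / (n1 + n2)

    def g(y1):                     # hypergeometric conditional probability
        return comb(n1, y1) * comb(n2, t - y1) / comb(n1 + n2, t)

    def D(y1):                     # likelihood ratio statistic, 0*log(0) = 0
        total = 0.0
        for y, n in ((y1, n1), (t - y1, n2)):
            for obs, expected in ((y, n * p_hat), (n - y, n * (1 - p_hat))):
                if obs > 0:
                    total += obs * log(obs / expected)
        return 2 * total

    D_obs = D(y1_obs)
    SL = sum(g(y1) for y1 in range(t + 1) if D(y1) >= D_obs - 1e-9)
    print(round(D_obs, 2), round(SL, 4))   # roughly 7.32 and 0.016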
Exact Test for Independence

As in Section 12.6, we consider an a × b contingency table (f_ij) with row totals r_i and column totals c_j, where \sum r_i = \sum c_j = n. The f_ij's have a multinomial distribution, and the independence hypothesis is H: p_ij = α_i β_j, where the α_i's and β_j's are unknown parameters. Under H, the probability of the f_ij's is

   P(f; α, β) = \binom{n}{f_{11}\, f_{12}\, \cdots\, f_{ab}} \prod_{i=1}^{a}\prod_{j=1}^{b} (α_i β_j)^{f_{ij}}
              = \binom{n}{f_{11}\, f_{12}\, \cdots\, f_{ab}} \prod_{i=1}^{a} α_i^{r_i} \prod_{j=1}^{b} β_j^{c_j}.

The r_i's and c_j's are sufficient statistics for the unknown parameters.

The r_i's have a multinomial distribution with class probabilities α_1, α_2, ..., α_a, and the c_j's are multinomial with class probabilities β_1, β_2, ..., β_b. Under the independence hypothesis, the r_i's are distributed independently of the c_j's. Hence the probability function of the f_ij's given the sufficient statistics is

   P(f | r, c) = \binom{n}{f_{11}\, f_{12}\, \cdots\, f_{ab}} \Big/ \left[ \binom{n}{r_1\, \cdots\, r_a}\binom{n}{c_1\, \cdots\, c_b} \right].

The exact conditional significance level will be computed from this conditional distribution.

By (12.6.1) and (12.6.2), the likelihood ratio statistic for testing the independence hypothesis is

   D = 2\sum\sum f_{ij} \log(f_{ij}/e_{ij})

where e_ij = r_i c_j / n. Note that the estimated expected frequencies e_ij will be the same for all f_ij's considered in a conditional test.

To carry out an exact test of the independence hypothesis, we list all tables (f_ij) having the same row and column totals as the observed table. The conditional probability and value of D are computed for each such table. The exact conditional significance level is then found by summing P(f | r, c) over all such tables for which D ≥ D_obs. Except in very small examples, a computer will be needed for the calculations.

EXAMPLE 15.6.2. In Example 12.6.1 we carried out an approximate test for independence in the following 2 by 2 table:

   44 (39.56)    9 (13.44)    53
    9 (13.44)    9 (4.56)     18
   53            18           71

Expected frequencies under the independence hypothesis are shown in parentheses.

For an exact test, we need to list all tables having the same row and column totals as the observed table. The general form of such tables is

     x        53 − x     53
   53 − x     x − 35     18
    53          18       71

where x = 35, 36, ..., 53. Only one of the frequencies is "free to vary", corresponding to the single degree of freedom for the approximate χ² test. The conditional p.f. of such a table is

   g(x) = \binom{71}{x\; 53−x\; 53−x\; x−35} \Big/ \left[\binom{71}{53\; 18}\binom{71}{53\; 18}\right],

which simplifies to a hypergeometric distribution:

   g(x) = \binom{53}{x}\binom{18}{53 − x} \Big/ \binom{71}{53}   for x = 35, 36, ..., 53.

The likelihood ratio statistic is

   D(x) = 2\left[ x \log\frac{x}{39.56} + \cdots + (x − 35)\log\frac{x − 35}{4.56} \right]

with observed value D_obs = D(44) = 7.15.

Table 15.6.2. Conditional Exact Test for Independence in a 2 × 2 Table

    x     g(x)     D(x)        x     g(x)     D(x)
   35    0.0021   12.47       45    0.0012   10.69
   36    0.0187    6.16       46    0.0002   14.97
   37    0.0731    2.92       47    0.0000   20.05
   38    0.1641    1.02       48    0.0000   26.00
   39    0.2367    0.13       49    0.0000   32.96
   40    0.2320    0.07       50    0.0000   41.12
   41    0.1594    0.78       51    0.0000   50.81
   42    0.0781    2.21       52    0.0000   62.75
   43    0.0275    4.33       53    0.0000   80.40
  *44    0.0069    7.15
                              Total  1.0000

From Table 15.6.2 we see that D(x) ≥ D_obs for x = 35 and for x ≥ 44. Hence the exact significance level is

   SL = g(35) + g(44) + g(45) + ... + g(53) = 0.0104,

and the observed table gives strong evidence against the hypothesis of independence.

In this example the row and column totals are modelled as random variables, but we condition on their observed values in the exact test for independence. The independence test would be the same if some or all of the marginal totals had been fixed prior to the experiment. See the note following Example 12.6.1.
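The enumeration behind Table 15.6.2 can be sketched in a few lines of Python (an illustration, not the book's code): for each admissible x compute g(x) and D(x), then sum g(x) over tables with D(x) ≥ D_obs.

    # Sketch: exact conditional test for independence in the 2 x 2 table of
    # Example 15.6.2 (row totals 53, 18; column totals 53, 18; x_obs = 44).
    from math import comb, log

    r1, r2, c1, c2 = 53, 18, 53, 18
    n = r1 + r2
    x_obs = 44
    e = [[r1 * c1 / n, r1 * c2 / n], [r2 * c1 / n, r2 * c2 / n]]

    def cells(x):                      # table frequencies as a function of x
        return [[x, r1 - x], [c1 - x, x - (c1 - r2)]]

    def g(x):                          # hypergeometric conditional probability
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    def D(x):
        f = cells(x)
        return 2 * sum(f[i][j] * log(f[i][j] / e[i][j])
                       for i in range(2) for j in range(2) if f[i][j] > 0)

    xs = range(max(0, c1 - r2), min(r1, c1) + 1)      # x = 35, ..., 53 here
    D_obs = D(x_obs)
    SL = sum(g(x) for x in xs if D(x) >= D_obs - 1e-9)
    print(round(D_obs, 2), round(SL, 4))              # roughly 7.15 and 0.0104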
EXAMPLE 15.6.3. Is the following 2 × 3 contingency table consistent with the hypothesis that the row and column classifications are independent?

                                      Total
    1 (1.8)    1 (3.0)    7 (4.2)       9
    2 (1.2)    4 (2.0)    0 (2.8)       6
    Total  3       5          7        15

SOLUTION. The expected frequencies under the hypothesis of independence are shown above in parentheses. Since these are small, it is advisable to carry out an exact test for independence. For this we require a list of all 2 × 3 tables having the same marginal totals as the observed table. The general form of these tables is

     x       y       9 − x − y      9
   3 − x   5 − y     x + y − 2      6
     3       5           7         15

Just two of the frequencies are "free to vary", corresponding to the two degrees of freedom for the χ² approximation.

There are 24 pairs (x, y) with 0 ≤ x ≤ 3 and 0 ≤ y ≤ 5, but three of these have x + y < 2 and would give a negative entry in the table. Thus there are only 21 allowable pairs (x, y) (see Table 15.6.3). The conditional probability function is

   g(x, y) = \binom{15}{x\; y\; 9−x−y\; 3−x\; 5−y\; x+y−2} \Big/ \left[\binom{15}{9\; 6}\binom{15}{3\; 5\; 7}\right]
           = \frac{725.035}{x!\, y!\, (9−x−y)!\, (3−x)!\, (5−y)!\, (x+y−2)!},

and the likelihood ratio statistic is

   D(x, y) = 2\left[ x\log\frac{x}{1.8} + y\log\frac{y}{3.0} + \cdots + (x+y−2)\log\frac{x+y−2}{2.8} \right].

From Table 15.6.3 we see that D_obs = D(1, 1) = 11.37. There are 5 tables for which D ≥ 11.37, and the exact significance level is the sum of their conditional probabilities:

   SL = g(0, 2) + g(1, 1) + g(2, 0) + g(3, 0) + g(3, 5) = 0.0084.

The observed table gives strong evidence against the hypothesis of independence. □

Table 15.6.3. Conditional Exact Test for Independence in a 2 × 3 Table

    x   y    g(x,y)    D(x,y)       x   y    g(x,y)    D(x,y)
    0   2    0.0020    13.46        2   2    0.1259     1.27
    0   3    0.0140     7.72        2   3    0.2098     0.08
    0   4    0.0210     6.81        2   4    0.1049     1.81
    0   5    0.0070    10.63        2   5    0.0126     8.00
   *1   1    0.0030    11.37        3   0    0.0014    14.45
    1   2    0.0420     3.90        3   1    0.0210     6.81
    1   3    0.1259     1.27        3   2    0.0699     3.90
    1   4    0.1049     1.81        3   3    0.0699     3.90
    1   5    0.0210     6.81        3   4    0.0210     6.81
    2   0    0.0006    16.37        3   5    0.0014    14.45
    2   1    0.0210     5.63
                                   Total     1.0002
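A similar sketch (again an illustration, not from the text) for the 2 × 3 enumeration of Table 15.6.3, using the multinomial form of g(x, y):

    # Sketch: exact conditional test for independence in the 2 x 3 table of
    # Example 15.6.3 (row totals 9, 6; column totals 3, 5, 7; observed x = y = 1).
    from math import comb, log

    rows, cols = (9, 6), (3, 5, 7)
    n = sum(rows)
    e = [[r * c / n for c in cols] for r in rows]

    def multinom(counts):                # multinomial coefficient
        out, total = 1, 0
        for k in counts:
            total += k
            out *= comb(total, k)
        return out

    def table(x, y):
        return [[x, y, 9 - x - y], [3 - x, 5 - y, x + y - 2]]

    def g(x, y):                         # conditional probability P(f | r, c)
        cells = [f for row in table(x, y) for f in row]
        return multinom(cells) / (multinom(rows) * multinom(cols))

    def D(x, y):
        f = table(x, y)
        return 2 * sum(f[i][j] * log(f[i][j] / e[i][j])
                       for i in range(2) for j in range(3) if f[i][j] > 0)

    pairs = [(x, y) for x in range(4) for y in range(6)
             if min(f for row in table(x, y) for f in row) >= 0]
    D_obs = D(1, 1)
    SL = sum(g(x, y) for (x, y) in pairs if D(x, y) >= D_obs - 1e-9)
    print(round(D_obs, 2), round(SL, 4))     # roughly 11.37 and 0.0084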
PROBLEMS FOR SECTION 15.6

1. In a pilot study, a new deodorant was found to be effective for 2 of 10 men tested and for 4 of 5 women tested. Carry out an exact conditional test of the hypothesis that the deodorant is equally effective for men and women.

2.† Two manufacturing processes produce defective items with probabilities p_1 and p_2, respectively. It was decided to examine four items from the first process and sixteen items from the second. In each case, two defectives were found. Perform an exact conditional test of the hypothesis p_1 = p_2.

3.† Two manufacturing processes produce defective items with probabilities p_1 and p_2, respectively. Items were examined from the first process until the rth defective had been obtained, by which time there had been x_1 good items. The second process gave x_2 good items before the rth defective.
(a) Write down the joint probability function of X_1 and X_2. Show that, if p_1 = p_2 = p, then T = X_1 + X_2 is a sufficient statistic for p.
(b) For each process, items were examined until r = 2 defectives had been found. Process 1 gave 2 good items, and process 2 gave 14 good items. Carry out an exact conditional test of the hypothesis p_1 = p_2, and compare the significance level with that obtained in Problem 2.

4. Twelve pea plants were observed, and there were four of each of the genotypes RR, RW, and WW. Use a conditional test to determine whether these results are consistent with the Hardy-Weinberg law (Section 15.5).

5. A study of the effect of Interferon on the severity of chicken pox was carried out with 44 childhood cancer victims who had developed chicken pox. Doctors gave Interferon to 23 children, and the other 21 received an inactive placebo. The disease was fatal or life-threatening in 2 of those who received Interferon, and in 6 of those who did not. Test the hypothesis that disease severity is independent of the treatment.

6.† An investigator wishes to learn whether the tendency to crime is influenced by genetic factors. He argues that, if there is no genetic effect, the incidence of criminality among identical twins should be the same as that among fraternal twins. Accordingly, he examines the case histories of 30 criminals with twin brothers, of whom 13 are identical and 17 are fraternal. He finds that 12 of the twin brothers have also been convicted of crime, but only two of these are fraternal twins. Perform an exact conditional test of the hypothesis of no genetic effect.

7. (a) Suppose that X and Y are independent and have Poisson distributions with means µ and ν, respectively. Derive the appropriate conditional distribution for a test of H: µ = kν, where k is a given constant.
(b) There were 13 accidents in a large manufacturing plant during the two weeks prior to the introduction of a new safety program. There were only 3 accidents in the week following its introduction. Test the hypothesis that the accident rate has not changed.

8. A likelihood ratio test for the hypothesis of marginal homogeneity in a 2 by 2 table was described in Section 12.8.
(a) Show that the significance level in an exact conditional test of this hypothesis will be computed from the binomial distribution.
(b) Carry out a conditional exact test using the data of Section 12.8.

9. Articles coming off a production line may be classified as acceptable, repairable, or useless. If n items are examined, let X_1, X_2, and X_3 be the numbers of acceptable, repairable, and useless items found. Suppose that it is twice as probable that an item is acceptable as it is that it is repairable.
(a) Show that X_1 + X_2 is a sufficient statistic for p, the probability of a repairable item.
(b) Of six items examined, one is acceptable, four are repairable, and one is useless. Use an exact conditional test to assess the agreement of these observations with the model.

10.† In a certain factory there are three work shifts: days (#1), evenings (#2), and nights (#3). Let X_i denote the number of accidents in the ith shift (i = 1, 2, 3). The X_i's are assumed to be independent Poisson variates with means µ_1, µ_2, and µ_3. There are only half as many workers on the night shift as on the other two. Hence, if the accident rate is constant over the three shifts, we should have µ_1 = µ_2 = 2µ_3. Set up an exact conditional test for this hypothesis.

11. Suppose that n families each with three children are observed. Let X_i be the number of such families which contain i boys and 3 − i girls (i = 0, 1, 2, 3). If births are independent, the probability that a family of 3 has i boys will be given by

   p_i = \binom{3}{i} θ^i (1 − θ)^{3−i}   for i = 0, 1, 2, 3,

where θ is the probability of a male child.
(a) Show that L = X_1 + 2X_2 + 3X_3 is a sufficient statistic for θ and has a binomial distribution with parameters (3n, θ).
(b) In 8 families there were 3 with three boys, 2 with one boy, and 3 with no boys. Use an exact conditional test to investigate whether these results are consistent with the model.

12.† In an experiment to detect linkage of genes, there are four possible types of offspring. According to theory, these four types have probabilities p/2, (1 − p)/2, (1 − p)/2, and p/2, where p is an unknown parameter called the recombination fraction. Let X_1, X_2, X_3, and X_4 be the frequencies of the four offspring types in n independent repetitions.
(a) Find a sufficient statistic for p.
(b) If the genes are not linked, they lie on different chromosomes, and p = 1/2. Evidence against the hypothesis p = 1/2 is thus evidence that the genes are linked. Describe an exact test for this hypothesis.
(c) Describe exact and approximate tests of the model when p is unknown.

13. A lethal drug is administered to n rats at each of k doses d_1, d_2, ..., d_k. Let the numbers of deaths be Y_1, Y_2, ..., Y_k. According to the logistic model (Section 10.5), the probability of death at dose d_i is

   p(d_i) = e^{α + β d_i} / (1 + e^{α + β d_i}).

(a) Show that S = \sum Y_i and T = \sum d_i Y_i are sufficient statistics for the unknown parameters α and β.
(b) Show that the conditional probability function of the Y_i's given S and T is

   P(y | s, t) = c \prod_{i=1}^{k} \binom{n}{y_i},

where c is chosen so that the total conditional probability is 1.
(c) In an experiment with 10 rats at each of 3 doses −1, 0, 1, the numbers of deaths observed were 3, 0, and 10, respectively. Perform an exact conditional test of the logistic model.
(d) In an experiment with 10 rats at each of the 4 doses −3, −1, 1, 3, the numbers of deaths observed were 1, 6, 4, and 10, respectively. Are these frequencies consistent with the logistic model?
CHAPTER 16*

Topics in Statistical Inference

In Chapters 9-15 we have used likelihood methods, confidence intervals, and significance tests in making inferences about an unknown parameter θ. In Sections 1 and 2 below, we consider two additional methods for making inferences about an unknown parameter. With both the fiducial argument and Bayesian methods, information concerning θ is summarized in a probability distribution defined on the parameter space. For Bayesian methods one requires prior information about θ which is also in the form of a probability distribution. For the fiducial argument, θ must be completely unknown before the experiment.

In Section 3, we consider the problem of predicting a value of a random variable Y whose probability distribution depends upon an unknown parameter θ. When a Bayesian or fiducial distribution for θ is available, one can obtain a predictive distribution for Y which does not depend upon θ. Section 4 considers the use of predictive distributions in statistical inference, with particular reference to the Behrens-Fisher problem. Finally, in Section 5 we illustrate how a test of a true hypothesis can be used to obtain intervals of reasonable values for a future observation or an unknown parameter.

*This chapter may be omitted on first reading.

16.1. The Fiducial Argument

Suppose that we have obtained data from an experiment whose probability model involves a real-valued parameter θ which is completely unknown. We shall see that, under certain conditions, it is possible to deduce the probability that θ ≤ k for any specified parameter value k. The procedure for obtaining this probability is called the fiducial argument, and the probability is called a fiducial probability to indicate the method by which it was obtained.

Probability Distributions of Constants

In the fiducial argument, the probability distribution of a variate U is regarded as a summary of all the available information about U. This distribution continues to hold until such time as additional information about U becomes available. If U has a certain distribution before an experiment is performed, and if the experiment provides no information about U, then U has the same distribution after the experiment as before.

For example, consider a lottery in which there are N tickets numbered 1, 2, ..., N, one of which is selected at random. Let U denote the number on the winning ticket. Then

   P(U = u) = 1/N   for u = 1, 2, ..., N.   (16.1.1)

Now suppose that the winning ticket has been chosen, but that the number U has not been announced. A value of U has now been determined, but we have no more information concerning what that value is than we had before the draw. A ticket-holder would presumably feel that he had the same chance of winning as he had before the draw was made. The fiducial argument is based on the assertion that (16.1.1) summarizes the uncertainty about U even after the draw has been made, provided that no information concerning the outcome of the draw is available. After the draw, U is no longer subject to random variation, but is fixed at some unknown value. Now (16.1.1) summarizes all the available information concerning the unknown constant U, and may be called its fiducial distribution.

The fiducial argument does not involve any new "definition" of probability. Instead, it enlarges the domain of application of the usual (long-run relative frequency) notion of probability. Of course, one could take the position (as some people have) that (16.1.1) applies only before the draw, and that, after the draw, no probability statements whatsoever can be made. This position seems unnecessarily restrictive, and if adopted, would rule out many important applications of probability.

Before proceeding with the general discussion, we illustrate the fiducial argument in two examples.
EXAMPLE 16.1.1. A deck of N cards numbered 1, 2, ..., N is shuffled and one card is drawn. Let U denote the number on this card. Then U has probability distribution (16.1.1). To this number is added a real number θ which is completely unknown to us. We are not told the value of U or the value of θ, but only the value of their total T = θ + U. What can be said about θ in the light of an observed total t?

The observed total t could have arisen in N different ways:

   (u = 1, θ = t − 1), (u = 2, θ = t − 2), ..., (u = N, θ = t − N).

Given t, there is a one-to-one correspondence between values of U and possible values of θ. If we knew the value of θ, we could determine which value of U had been obtained. If we knew that θ was an even integer, then we could deduce whether an odd or even value of U had been obtained. However, if we know nothing about θ, then the experiment will tell us nothing about U; the state of uncertainty concerning the value of U will be the same after the experiment as before. Hence we assume that (16.1.1) also holds when t is known. But, given t, θ has N possible values t − 1, t − 2, ..., t − N in one-to-one correspondence with the possible values of U, and we may write

   P(θ = t − u) = P(U = u) = 1/N,   u = 1, 2, ..., N.

This probability distribution over the possible values of θ is called the fiducial distribution of θ.

For instance, suppose that N = 13, and that the observed total is t = 20. Then θ has 13 possible values 19, 18, 17, ..., 7, each with probability 1/13. The probability of any subset of θ values is now obtained by addition. For example,

   P(θ ≤ 11) = P(θ = 11) + P(θ = 10) + ... + P(θ = 7) = 5/13.

Alternately, we may note that if θ ≤ 11, then the observed total 20 must have resulted from a value of U greater than or equal to 9, and hence

   P(θ ≤ 11) = P(U ≥ 9) = 5/13.

EXAMPLE 16.1.2. Suppose that T ~ N(θ, 1) where θ is completely unknown, and that the experiment yields an observed value t. We define U = T − θ, so that U has a standardized normal distribution. The observed value t arose from some pair of values (U = u, θ = t − u). Given t, there is a one-to-one correspondence between possible values of U and possible values of θ. Since θ is unknown, the experiment will tell us nothing about which value of U was actually obtained. Consequently, we assume that U ~ N(0, 1) even after t has been observed.

We can now compute probabilities of statements about θ by transforming them into statements about U. For instance, θ ≤ k if and only if U ≥ t − k, and hence

   P(θ ≤ k) = P(U ≥ t − k) = 1 − F(t − k) = F(k − t)   (16.1.2)

where F is the N(0, 1) cumulative distribution function. For any k, the fiducial probability of θ ≤ k can be obtained from N(0, 1) tables. For example, if we observe t = 10, the fiducial probability of θ ≤ 11 is

   P(θ ≤ 11) = F(11 − 10) = 0.841.

Note that probability statements obtained from (16.1.2) are the same as would be obtained if θ were a random variable having a normal distribution with mean t and variance 1. We say that, given T = t, the fiducial distribution of θ is N(t, 1). This does not mean that θ is a random variable, but rather that we know precisely as much about θ as we would about an observation to be taken at random from N(t, 1).

From (16.1.2), the cumulative distribution function of the fiducial distribution of θ is F(θ − t), where F is the c.d.f. of N(0, 1). Differentiation with respect to θ gives

   \frac{∂}{∂θ} F(θ − t) = f(θ − t)\,\frac{∂(θ − t)}{∂θ} = f(θ − t)

where f is the p.d.f. of N(0, 1). Hence the fiducial p.d.f. of θ is

   f(θ; t) = \frac{1}{\sqrt{2π}} \exp\{-\tfrac{1}{2}(θ − t)^2\}   for −∞ < θ < ∞.

This is the p.d.f. of a normal distribution with mean t and variance 1. As a result of the fiducial argument, θ and T have switched roles, with the observed t now appearing as a "parameter" in the fiducial distribution of θ.

Sufficient Conditions for the Fiducial Argument

In the preceding two examples, we made use of a quantity U which was a function of both the data and the parameter, and whose probability distribution did not depend upon θ. Such a function is called a pivotal quantity.

The following conditions are sufficient to permit application of the fiducial argument in the one-parameter case:

C1. There is a single real-valued parameter θ which is completely unknown.
C2. There exists a statistic T which is minimally sufficient for θ.
C3. There exists a pivotal quantity U = U(T, θ) such that
   (a) for each value of θ, U(t, θ) is a one-to-one function of t;
   (b) for each value of t, U(t, θ) is a one-to-one function of θ.

If the variate T is continuous, we also require that U be continuous (and hence monotonic) in both t and θ.

The purpose of conditions C2 and C3(a) is to ensure that inferences about θ are based on all of the relevant information contained in the data. C2 can be replaced by the weaker condition that there exists a set of minimally sufficient statistics (T, A) where T is real-valued and A is a vector of ancillary statistics (see Section 15.3). We then use the conditional distributions of T and U given the observed value of A.
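As a small numerical illustration of (16.1.2) (not part of the original text), the fiducial probabilities of Example 16.1.2 can be evaluated directly from the N(0, 1) distribution function; the observed value t = 10 is the one used in the example.

    # Sketch: fiducial probabilities for Example 16.1.2, where T ~ N(theta, 1)
    # and U = T - theta is pivotal.  Observed value t = 10 as in the example.
    from math import erf, sqrt

    def Phi(z):                        # N(0,1) cumulative distribution function
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    t = 10.0
    for k in (9.0, 10.0, 11.0, 12.0):
        # P(theta <= k) = P(U >= t - k) = Phi(k - t), from (16.1.2)
        print(k, round(Phi(k - t), 3))
    # e.g. P(theta <= 11) = Phi(1) = 0.841, as in the text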
Given T = t, there is a one-to-one correspondence between possible values of θ and possible values of U by C3(b). Since θ is completely unknown, observing t will give us no information about which value of U was actually obtained. Hence we assume that the distribution of U is the same after t has been observed as it was before observing t. Given t, we can convert statements about θ into statements about U and hence obtain their (fiducial) probabilities.

The above conditions are quite restrictive. In particular, C3(a) and (b) imply a one-to-one correspondence between values of T given θ, and values of θ given T, which will very rarely exist if T is discrete. Example 16.1.1 is exceptional in that, when t is known, there are only finitely many possible values for θ.

If the sufficient statistic T is continuous, one can usually take U = F(T; θ), where F is the cumulative distribution function of T. From Section 6.3, U has a uniform distribution between 0 and 1 for each value of θ, and hence is a pivotal quantity. Since F(t; θ) = P(T ≤ t) is an increasing function of t, C3(a) will also be satisfied, and only C3(b) needs to be checked. If C3(b) holds, then P(θ ≤ k) will be equal to either F(t; k) or 1 − F(t; k), depending upon whether F(t; θ) is an increasing or decreasing function of θ, and the fiducial p.d.f. of θ is given by

   f(θ; t) = \left| \frac{∂}{∂θ} F(t; θ) \right|.

EXAMPLE 16.1.3. Suppose that the MLE α̂ is a sufficient statistic for the unknown parameter α, and that α̂ ~ N(α, c) where c is a known constant. Then the standardized variable

   Z = (α̂ − α)/\sqrt{c}

is pivotal and is distributed as N(0, 1). It satisfies conditions 3(a) and 3(b). To obtain the fiducial distribution of α, we assume that Z is still distributed as N(0, 1) when the variate α̂ is replaced by its observed value. Then we have

   α = α̂ − Z\sqrt{c}

where α̂ and c are known constants, and (6.6.6) gives

   α ~ N(α̂, c).

Given α̂, the fiducial distribution of α is normal with mean α̂ and variance c.

EXAMPLE 16.1.4. Let X_1, X_2, ..., X_n be independent variates having an exponential distribution with unknown mean θ. Then T = \sum X_i is sufficient for θ, and

   U = 2T/θ ~ χ²_(2n)

is a pivotal quantity satisfying conditions 3(a) and 3(b). To obtain the fiducial distribution of θ, we replace T by its observed value t and assume that U is still distributed as χ²_(2n). Statements about θ can now be converted into statements about U, and their probabilities can be obtained from tables of the χ² distribution.

The fiducial p.d.f. of θ can be obtained from the p.d.f. of U by standard change of variables methods. By (6.9.1), the p.d.f. of U is

   f(u) = k u^{n−1} e^{−u/2}   for u > 0,

where k = 1/[2^n Γ(n)]. The fiducial p.d.f. of θ is thus

   g(θ; t) = f(u)\left|\frac{du}{dθ}\right| = k\left(\frac{2t}{θ}\right)^{n−1} e^{−t/θ}\,\frac{2t}{θ^2}
           = \frac{1}{θ\,Γ(n)}\left(\frac{t}{θ}\right)^n e^{−t/θ}   for θ > 0.
EXAMPLE 16.1.5. Consider the situation described in Example 16.1.1, but now suppose that n cards are drawn at random with replacement from the deck. The same unknown θ is added to the number on each card, and we are told the n totals x_1, x_2, ..., x_n. We wish to make inferences about θ on the basis of the data.

Each X_i can take N equally probable values θ + 1, θ + 2, ..., θ + N, so that the probability function of X_i is

   f(x) = P(X_i = x) = N^{−1}   for x = θ + 1, θ + 2, ..., θ + N.

Under random sampling with replacement, the X_i's are independent, and hence their joint probability function is

   f(x_1) f(x_2) \cdots f(x_n) = N^{−n}   for θ + 1 ≤ x_1, x_2, ..., x_n ≤ θ + N.

The likelihood function of θ is thus constant over the range of possible parameter values. We must have θ + 1 ≤ x_(1) and θ + N ≥ x_(n), where x_(1) and x_(n) are the smallest and largest sample values, so that

   L(θ) = 1   for x_(n) − N ≤ θ ≤ x_(1) − 1.

It follows that x_(1) and x_(n) are jointly minimally sufficient for θ.

The number of possible parameter values is

   x_(1) − 1 − [x_(n) − N − 1] = N − a

where a = x_(n) − x_(1) is the sample range. The larger the value of a obtained, the more precisely we may determine the value of θ. If we observe a = 0, there are N equally likely values for θ, but if a = N − 1, the value of θ can be determined exactly without error. Thus A is a measure of the experiment's informativeness, and is in fact an ancillary statistic. To see this, we write X_i = θ + U_i, where U_i is the number on the ith card drawn (i = 1, 2, ..., n). Then X_(1) = θ + U_(1) and X_(n) = θ + U_(n), so that

   A = X_(n) − X_(1) = U_(n) − U_(1).

The distribution of A thus depends only on the range of numbers which appear on the n cards drawn, and does not depend upon θ.

We now define a statistic T such that the transformation from X_(1), X_(n) to T, A is one-to-one; for instance, we could take T = X_(1). Then T, A are jointly sufficient for θ and A is ancillary. Inferences about θ will be based on the conditional distribution of T given the observed value of A. To obtain this distribution, we could first derive the joint probability function of X_(1) and X_(n) as in Problem 7.2.11, change variables, sum out T to get the probability function of A, and divide to get the required conditional probability function,

   f(t | a; θ) = \frac{1}{N − a}   for t = θ + 1, θ + 2, ..., θ + N − a.

Given that A = a, the n totals must fall in a range of length a which lies entirely between θ + 1 and θ + N. There are N − a such ranges, with lower limits θ + 1, θ + 2, ..., θ + N − a, and these will be equally probable.

Now define U = T − θ. The conditional distribution of U given that A = a is uniform,

   P(U = u | a) = \frac{1}{N − a}   for u = 1, 2, ..., N − a,   (16.1.3)

and does not depend upon θ. Given A and θ, there is a one-to-one correspondence between possible values of U and T. Given A and T, there is a one-to-one correspondence between possible values of U and θ. Thus, when A is given, the sufficient conditions for the fiducial argument are satisfied. The fiducial distribution of θ is obtained by assuming that (16.1.3) continues to hold when T is replaced by its observed value t, and this gives

   P(θ = k) = \frac{1}{N − a}   for k = t − 1, t − 2, ..., t − N + a.   (16.1.4)

For example, suppose that N = 13, and that we observe the n = 4 totals 17, 11, 14, 23. Then t = x_(1) = 11, x_(n) = 23, and a = 23 − 11 = 12. Now (16.1.4) implies that θ = 10 with probability 1. In this case the experiment is very informative and completely determines the value of θ. If we were less fortunate, we might observe totals such as 13, 17, 19, 13. Then t = 13 and a = 6, so that now (16.1.4) gives

   P(θ = k) = 1/7   for k = 12, 11, 10, ..., 6.

There are now seven equally probable values of θ. In the worst possible case, we observe equal totals, 18, 18, 18, 18. Then t = 18, a = 0, and (16.1.4) gives

   P(θ = k) = 1/13   for k = 17, 16, 15, ..., 5,

so that there are 13 equally probable values of θ.

Two-Parameter Fiducial Distributions

Sometimes a double application of the one-parameter fiducial argument can be used to obtain a two-parameter fiducial distribution. However, there are examples where this can be done in two or more different ways, leading to different two-parameter distributions. There are serious difficulties in extending the fiducial argument beyond the one-parameter case, and the precise conditions under which this can be done are not known.

16.2. Bayesian Methods

In all of the procedures discussed so far, only the information provided by the experimental data is formally taken into account. However, in some situations we may wish to incorporate information about θ from other sources as well. If this additional information is in the form of a probability distribution for θ, it can be combined with the data using Bayes's Theorem (3.6.1).

Suppose that the probability model for the experiment depends on a parameter θ, and that an event E with probability P(E; θ) is observed to occur. In addition, suppose that θ is itself a random variable with a known probability distribution, called the prior distribution of θ, with probability or probability density function g, say. The conditional distribution of θ given that E has occurred is called the posterior distribution of θ. The posterior distribution has probability or probability density function given by

   f(θ | E) = P(E; θ) g(θ) / P(E)   (16.2.1)

where P(E) is a normalizing constant:

   P(E) = \sum_{θ∈Ω} P(E; θ) g(θ)   if θ is discrete;
   P(E) = \int_{−∞}^{∞} P(E; θ) g(θ)\,dθ   if θ is continuous.   (16.2.2)

The posterior distribution combines the information about θ provided by the experimental data with the information contained in the prior distribution.

The likelihood function of θ based on the observed event E is given by

   L(θ; E) = k P(E; θ)

where k does not depend upon θ. Hence we may write

   f(θ | E) = c L(θ; E) g(θ)   (16.2.3)

where c is a constant with respect to θ, and is chosen so that the total probability in the posterior distribution is 1:

   \frac{1}{c} = \sum_{θ∈Ω} L(θ; E) g(θ)   if θ is discrete;
   \frac{1}{c} = \int_{−∞}^{∞} L(θ; E) g(θ)\,dθ   if θ is continuous.   (16.2.4)

The posterior p.f. or p.d.f. is thus proportional to the product of the likelihood function and the prior p.f. or p.d.f. of θ.
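The general recipe (16.2.3)-(16.2.4) is easy to carry out numerically. The sketch below (an illustration, not from the text) normalizes likelihood × prior on a grid; the binomial data x = 3, n = 20 and the Beta(2, 8) prior shape are hypothetical.

    # Sketch: computing a posterior density by (16.2.3) and (16.2.4) on a grid.
    import numpy as np

    n, x = 20, 3
    theta = np.linspace(0.001, 0.999, 999)
    d = theta[1] - theta[0]
    likelihood = theta**x * (1 - theta)**(n - x)     # L(theta; E), constant omitted
    prior = theta**(2 - 1) * (1 - theta)**(8 - 1)    # unnormalized Beta(2, 8) prior
    posterior = likelihood * prior
    posterior = posterior / (posterior.sum() * d)    # (16.2.4): make it integrate to 1

    # posterior probability that theta <= 0.2, by numerical integration
    print(round(posterior[theta <= 0.2].sum() * d, 3))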
EXAMPLE 16.2.1. Consider the inheritance of hemophilia as discussed previously in Example 3.6.3. Suppose that a woman has n sons, of whom x are hemophilic and n − x are normal. The probability of this event is

   P(x; θ) = \binom{n}{x} θ^x (1 − θ)^{n−x}   (16.2.5)

where θ is the probability that a particular son will be hemophilic. The problem is to make inferences about θ.

Given no additional information about θ, inferences would be based on (16.2.5). One could graph the relative likelihood function of θ, or compute confidence intervals. However, it may be possible to extract some information about θ by examining the woman's family tree. For instance, suppose that the woman had normal parents, but she had a brother who was hemophilic. Then her mother must have been a carrier, and she therefore had a 50% chance of inheriting the gene for hemophilia. If she did inherit the gene, then there is a 50% chance that a particular son will inherit the disease (θ = 1/2), and if she did not, all of her sons will be normal (θ = 0). (The possibility of a mutation is ignored in order to simplify the example.) The prior probability distribution of θ is thus given by

   g(0) = P(θ = 0) = 1/2;   g(1/2) = P(θ = 1/2) = 1/2.

With this additional information, it is now possible to base the analysis on Bayes's Theorem.

By (16.2.3), the posterior probability function of θ is given by

   f(θ | x) = c(x)\,θ^x (1 − θ)^{n−x} · \tfrac{1}{2}   for θ = 0, 1/2.

If x > 0, then θ = 0 and θ = 1/2 have posterior probabilities 0 and 1, respectively. If x = 0, the posterior probabilities are

   P(θ = 0 | X = 0) = c/2;   P(θ = 1/2 | X = 0) = c/2^{n+1}.

Since the sum of these must be 1, we find that c = 2^{n+1}/(2^n + 1), and hence that

   P(θ = 0 | X = 0) = \frac{2^n}{2^n + 1};   P(θ = 1/2 | X = 0) = \frac{1}{2^n + 1}.

If the woman has at least one hemophilic son (x > 0), she must be a carrier. If she has only normal sons (x = 0), the probability that she is a carrier decreases as n increases.

EXAMPLE 16.2.2. Suppose that components are received from a manufacturer in large batches, and let θ denote the proportion of defectives in a batch. A random sample of n items is chosen from the batch, and is found to contain x defectives. If n is small in comparison with the batch size, the probability of x defectives in the sample is

   P(x; θ) = \binom{n}{x} θ^x (1 − θ)^{n−x}.   (16.2.6)

Given no additional information, inferences about θ would be based on (16.2.6).

It may be that similar batches are received at regular intervals from the same manufacturer. The value of θ will vary somewhat from batch to batch. If the manufacturing process is reasonably stable, one might expect the variation in θ to be random, and introduce the assumption that θ is a random variable with probability density function g, say. Data from past samples would be used to help determine the form of the prior density function g.

An assumption which makes the mathematics easy is that θ has a beta distribution with parameters p and q,

   g(θ) = k\,θ^{p−1}(1 − θ)^{q−1}   for 0 < θ < 1,   (16.2.7)

where k = Γ(p + q)/Γ(p)Γ(q). Then, by (16.2.3), the posterior p.d.f. of θ given x is

   f(θ | x) = c(x)\,θ^{x+p−1}(1 − θ)^{n−x+q−1}   for 0 < θ < 1,

which is also a beta distribution with parameters x + p and n − x + q. Probabilities can be computed by numerical integration, or from tables of the F-distribution (see Problem 6.10.12).

Of course, it would be unwise to assume (16.2.7) merely because it leads to simple mathematics. Data from past samples should be used to check the adequacy of (16.2.7), and to estimate the parameters p and q. As additional data accumulate, further checks of the model can be made, and more precise estimates of p and q can be obtained. Procedures such as this, in which data are used to give information about both the current value of θ and the prior distribution of θ, are called empirical Bayes methods. □
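A short sketch (not from the text) of the posterior calculation in Example 16.2.2, with hypothetical values of p, q, n and x; the posterior is the beta distribution with parameters x + p and n − x + q derived above, and scipy is assumed for its distribution function.

    # Sketch: posterior for the batch-sampling model of Example 16.2.2.
    # Hypothetical values: prior Beta(p, q) with p = 1, q = 9 (as might be
    # estimated from past batches), and x = 2 defectives in a sample of n = 25.
    from scipy.stats import beta

    p, q = 1.0, 9.0
    n, x = 25, 2
    posterior = beta(x + p, n - x + q)        # Beta(x + p, n - x + q)

    print(round(posterior.mean(), 3))                       # posterior mean of theta
    print(round(posterior.cdf(0.2), 3))                     # P(theta <= 0.2 | x)
    print([round(v, 3) for v in posterior.ppf([0.05, 0.95])])   # central 90% interval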
In the two preceding examples, it was natural to regard the value of θ as having been generated by a repeatable experiment. Prior probabilities for θ-values then correspond to the relative frequencies with which the various θ-values would be expected to arise in many repetitions of the experiment. It is possible, conceptually at least, to verify the prior distribution empirically by actually repeating the experiment to obtain a sample θ_1, θ_2, ..., θ_n of θ-values. These values could be compared with the assumed prior distribution. However, the analysis would usually be complicated by the fact that only estimates θ̂_1, θ̂_2, ..., θ̂_n were available.

Applications such as these, in which the prior distribution is the probability model of a physical process which generates the value of θ, are not controversial. However, Bayesian methods are sometimes advocated in situations where θ is thought of as a constant. The prior distribution may be an objective summary of the prior state of knowledge concerning θ, or it may be a statement of an individual's subjective beliefs about θ. There are differences of opinion among statisticians concerning the appropriateness of Bayesian methods in such situations.

Fiducial Prior Distributions

It may be that the conditions for the fiducial argument were satisfied in some previous experiment involving the same parameter θ. The fiducial distribution of θ from the previous experiment might then be used as the prior distribution of θ in the current experiment.

EXAMPLE 16.2.3. Suppose that, in a previous experiment, N components with exponentially distributed lifetimes were tested until failure. From Example 16.1.4, the fiducial distribution of the mean lifetime θ has p.d.f.

   g(θ) = \frac{1}{θ\,Γ(N)}\left(\frac{t}{θ}\right)^N e^{−t/θ}   for θ > 0,

where t is the total of the observed lifetimes. In the current experiment, n additional components are tested simultaneously, and testing stops after a predetermined time period T. From Section 9.5, the likelihood function of θ based on the current experiment is

   L(θ) = θ^{−m} e^{−s/θ}   for θ > 0,

where m is the number of components which were observed to fail, and s is the total elapsed lifetime of all n components (including those whose failure times were censored). By (16.2.3), the p.d.f. of the posterior distribution of θ is

   f(θ) = \frac{1}{θ\,Γ(N + m)}\left(\frac{s + t}{θ}\right)^{N+m} e^{−(s+t)/θ}   for θ > 0.

It can now be shown by change of variables that 2(s + t)/θ has a χ² distribution with 2(m + N) degrees of freedom. Hence tables of the χ² distribution may be used to obtain the posterior probabilities of statements about θ.

Note that it would not be possible to derive a fiducial distribution for θ on the basis of the current experiment, or on the basis of the previous and current experiments combined. In each case the minimally sufficient statistic is two dimensional, and there exists no ancillary statistic.

If there were no censoring in the second experiment, the two experiments could be combined to give a single experiment in which N + n components were tested to failure. A fiducial distribution for θ could then be derived as in Example 16.1.4. The same result would be obtained by taking the fiducial distribution of θ from the previous experiment as the prior distribution in Bayes's Theorem. However, the latter procedure seems inappropriate because it violates the symmetry between the two experiments, and it may lead to unacceptable results in more complicated situations. For further discussion, see D.A. Sprott, "Necessary restrictions for distributions a posteriori", Journal of the Royal Statistical Society, B, 22 (1960), pages 312-318.

Prior Distributions which Represent Ignorance

Various attempts have been made to formulate prior probability distributions which represent a state of total ignorance about the parameter (see H. Jeffreys, Theory of Probability, 3rd edition, Oxford: Clarendon Press, 1961). These are generally derived from arguments of mathematical symmetry and invariance.

Let us consider the simplest case, in which nothing is known about a parameter θ except that it must take one of a finite set of values {1, 2, ..., N}. It might be argued that, since there is no reason to prefer one of these values over another, they should be assigned equal probabilities (Laplace's Principle of Insufficient Reason). The statement that the N possible parameter values are equally probable is then supposed to represent a complete lack of knowledge of θ.

The above argument implicitly assumes that there exists some probability distribution which appropriately represents total ignorance. If this assumption is granted, then the assignment of equal probabilities seems inevitable. However, the assumption itself is questionable. It would seem more reasonable to represent prior ignorance by equally likely, rather than equally probable, parameter values. If the N parameter values are equally probable, then P(θ ≠ 1) = (N − 1)/N, and this would seem to be an informative statement. However, no such statement is possible if they are assumed to be equally likely, because likelihoods are not additive.

Now consider a parameter θ which can take values in a real interval 0 < θ < 1, say. Great difficulties arise in trying to formulate a probability distribution of θ which represents total ignorance. If one assumes that the distribution of θ is uniform, then one-to-one functions of θ will generally not have uniform distributions, because of the Jacobian involved in continuous change of variables. If θ is totally unknown, then presumably θ³ is also totally unknown, but it is impossible to have a uniform distribution on both of them. This problem does not arise if prior ignorance is represented by equally likely parameter values, because likelihoods are invariant under one-to-one parameter transformations.

For further discussion, see Chapter 1 of Statistical Methods and Scientific Inference by R.A. Fisher (2nd edition, New York: Hafner, 1959).
Subjective Prior Distributions

In yet another approach to the use of Bayes's Theorem, the prior distribution is taken to be a summary of an individual's prior belief about θ. See, for example, H. Raiffa and R. Schlaifer, Applied Statistical Decision Theory, Boston: Harvard Univ. Graduate School of Bus. Admin., 1961; and L.J. Savage, The Foundations of Statistical Inference, London: Methuen, 1962. According to the advocates of this approach, the prior distribution for θ is to be determined by introspection, and is a measure of personal opinion concerning what the value of θ is likely to be. Bayes's Theorem is then used to modify opinion on the basis of the experimental data.

Any statistical analysis involves some elements of subjective judgement - for instance, in the choice of the probability model. Nevertheless, this subjective input is open to public scrutiny and possible modification if poor agreement with the data is obtained. The same is not true of a subjective prior distribution, which is entirely a personal matter. A subjective prior distribution may be based on nothing more than hunches and feelings, and it seems a mistake to give it the same weight in the analysis as information obtained from the experimental data. The subjective Bayesian approach may prove to be valuable in personal decision problems, but it does not seem appropriate for problems of scientific inference.

16.3. Prediction

Suppose that we wish to predict the value of a random variable Y whose probability distribution depends upon a parameter θ. We assume that θ is unknown, but that a previous set of data gives some information about the value of θ. In predicting Y, we have two types of uncertainty to contend with: uncertainty due to random variation in Y, and uncertainty due to lack of knowledge of θ. We wish to make statements about Y which incorporate both types of uncertainty.

For example, suppose that the lifetimes of a certain type of rocket component are exponentially distributed with mean θ. We have tested n components, and have observed their lifetimes x_1, x_2, ..., x_n. We wish to predict the lifetime of another component, or perhaps the lifetime of a system made up of several such components. Even if we knew θ, we could not make exact predictions because lifetimes are subject to random variation; that is, components run under identical conditions will generally have different lifetimes. The problem is further complicated by the fact that we do not know the value of θ, but have only limited information obtained from the n components tested. Both the randomness of Y and the uncertainty about θ will influence predictive statements about Y.

Throughout the discussion, we assume that the probability model is appropriate. Mathematical models are only approximate descriptions of reality, and predictions based on them may be wildly in error if they are poor approximations. Errors of this kind are potentially the most serious, and in many situations it is difficult to estimate how large they are likely to be. Although we can and should check the agreement of the model with the past data, we cannot check the agreement with the future values which we are trying to predict.

Prediction problems have tidy solutions in the special case where all of the available information about θ can be summarized in the form of a probability distribution for θ (fiducial or Bayesian posterior). Suppose that θ has probability density function f, and that y has p.d.f. g(y; θ) depending upon θ. If we interpret the latter as the conditional p.d.f. of Y given θ, the joint p.d.f. of Y and θ is g(y; θ) f(θ). We then integrate out θ to obtain the marginal p.d.f. of Y,

   p(y) = \int_{−∞}^{∞} g(y; θ) f(θ)\,dθ.   (16.3.1)

This distribution combines uncertainty due to random variation in y with uncertainty due to lack of knowledge of θ, and is called the predictive distribution of Y.

Prediction problems are more difficult when there is no probability distribution for θ. A procedure which is sometimes useful in this situation will be discussed in Section 16.5.
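Before specializing to the exponential case below, here is a sketch (not from the text) of how (16.3.1) can be evaluated numerically. It uses the exponential model with the fiducial density of Example 16.1.4 and the values n = 10, t = 288 of Example 9.4.1, so the output can be compared with the closed form derived in the next subsection.

    # Sketch: numerical evaluation of the predictive distribution (16.3.1)
    # for an exponential observation Y, integrating over the fiducial density
    # of theta (Example 16.1.4) with n = 10 and t = 288.
    import numpy as np
    from math import lgamma

    n, t = 10, 288.0
    theta = np.linspace(0.5, 3000.0, 60000)          # integration grid for theta
    d = theta[1] - theta[0]

    def f(th):                                       # fiducial density of theta
        return np.exp(n * np.log(t / th) - t / th - lgamma(n)) / th

    def predictive_cdf(y):                           # P(Y <= y) from (16.3.1)
        g = 1.0 - np.exp(-y / theta)                 # P(Y <= y | theta)
        return np.sum(g * f(theta)) * d

    for y in (5.0, 75.0, 100.6):
        print(y, round(predictive_cdf(y), 3))
    # closed form for comparison: P(Y <= y) = 1 - (t/(t + y))**n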
Throughou t the discussion, we assume that the probability. model is
328
16. Topics in Statistical Inference
16.3. Prediction
Upo n subs tituti ng u = (y + t)/8 and simp 329
lifying, we obta in
Predicting a Futu re Value from a Nor mal
Dist ribu tion
Supp ose that we wish to pred ict a futur
e value of Y, wher e Y - N(ct. , c ) with
The integ ral on the right equa ls f(n c 1 known. Supp ose furth er that er. is 1
+ 1), and hence by (2.1.14), unkn own , but that all avail able
infor mati on conc ernin g ct. is summ arize
d in the (fiducial or Bayesian)
t" r(n + 1) nt" distr ibuti on er.~ N(&, c ) wher e & and
2 c 2 are know n. Then by (16.3.1), the
p(y) = (t + y)"+ t • r(n) = (t +Yr+ i for y > 0. predictive distr ibuti on of Y has p.d.f.

Integ ratin g with respect to y now gives p(y) = I f"' exp{~-l-(y-cr.)2 __1_(ct. - &)2}dct..
2rc~ -co 2c 1 2c 2
P(Y $: y) = JYp(v)dv =
0
l -(-t-)
+
" t y
for y > 0,
This integral may be evalu ated by comp
prod uce a norm al integral. After a bit of
letin g the squa re in the expo nent to
algeb ra, we find that p(y) is the p.d.f.
and prob abili ties of state ment s abou of a norm al distr ibuti on with mean
t Y can easily be obtai ned. These & and varia nce c 1 + c . Henc e the
prob abili ties take into acco unt both predictive distr ibuti on is 2
the rand om varia tion of Y and the
avail able infor mati on abou t e. Y ~ N(a, c 1 + c 2 ) .
In Exam ple 9.4.1 we considere d n = An easier way to obta in this result is
10 observed lifetimes with total to write Y = er. + A Z wher e
t = 288, and in this case Z 1 ~ N(O, 1), and ct.= & +
Com binin g these gives
Fiz 1
2 wher e Z 2 - N(O, 1), indep ende ntly of
Z 1.
P(Y::; y)=. l- ( - 288
-
)lO for y > 0. Y=& +AZ 1 +F i z2
288 + y
where &, c 1 and c 2 are known cons
We use this to ma ke predi ctive state ment tants . Now (6.6.6) and (6.6.7) give
s abou t the lifetime Y of anot her Y ~ N(&, c 1 + c 2 ) as before.
com pone nt of the same type. For insta
nce, we obta in
EXAMPLE 16.3. l. Supp ose that we have
P(Y $: 5) = 0.158 , P(Y ~ 75) = 0.099 alrea dy obse rved n indep ende nt
meas urem ents x 1 , x 2, .. . , x. from N(µ, a 2
) with a know n, a nd that we wish to
and so on. Also , we find that predi ct the average value Y of m futur
e obse rvati ons from the same
distr ibuti on. From Exam ple 16.1.3, the fiduc
ial distr ibuti on ofµ base d on the
P( Y :$: 1.48) = P( Y ~ 100.6) = 0.05. x;'s isµ~ N(x, a 2 /n). The samp ling distr
ibuti on of Yis Y - N(µ, a 2/m}. Henc e
The inter val 1.48 $: Y $: 100.6 is called a by the discussion above, the predictive
90% predictive interval for Y . As one distr ibuti on is
migh t expect, the interval is quite wide ,
indic ating that we cann ot predi ct the
lifetime of a single comp onen t Y with
muc h precision.
It is of some inter est to comp are the
abov e results with what we could This distr ibuti on comb ines unce rtain ty
obta in if we knew the value of 8. If we assum due to lack of know ledge ofµ with
e that 8 is equa l to its maxi mum unce rtain ty due to rand om varia tion
likel ihoo d estim ate, we have in Y. If n-+ oo, then x >:::: µ. The
unee rtain ty due to lack of knowledge ofµ
is then negligible, and the predi ctive
P(Y $: yJ8 = 28.8) = 1 - distr ibution becomes the samp ling distr
e -yf l B. s for y > 0. ibuti on of Y. On the othe r ha nd, if
m-+ oo , then unce rtain ty due to rand
From this we obta in om varia tion in Ybec omes negligible , and
the predictive distr ibuti on beco mes the
fiducial distr ibuti on ofµ.
If a is also unkn own, we can integ rate over
P(Y $: 1.48) = P(Y~ 86.3) = 0.05. its fiducial distr ibuti on as well
to obta in
The centr al 90% interval is 1.48 :$: Y::; 86.3,
the 90% predi ctive interval. This indic
pred ictin g Y is due to the rand om varia
whic h is not much narro wer than
ates that most of the unce rtain ty in
tion of Y rathe r than to lack of
J s
Y-x
2(1
-+ -
n
1
m
)-t(n-1)
infor mati on abou t the value of e. 1
where s2 = - - E(xi - x) 2 (see Secti on
n-1 16.4).
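A small sketch (not from the text) of a predictive calculation for Example 16.3.1 with σ known; the numerical values of x̄, σ, n and m are hypothetical.

    # Sketch: predictive interval for the average of m future observations
    # (Example 16.3.1, sigma known).  Hypothetical values: xbar = 50, sigma = 4,
    # n = 25 past observations, m = 5 future observations.
    from math import sqrt, erf

    def Phi(z):
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    xbar, sigma, n, m = 50.0, 4.0, 25, 5
    sd = sqrt(sigma**2 / n + sigma**2 / m)     # predictive standard deviation

    # central 95% predictive interval for the future average Y
    print(round(xbar - 1.96 * sd, 2), round(xbar + 1.96 * sd, 2))
    # predictive probability that Y exceeds 52
    print(round(1.0 - Phi((52.0 - xbar) / sd), 3))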
EXAMPLE 16.3.2. Suppose that the straight line model (13.5.1) has been fitted to n observed pairs (x_i, y_i), i = 1, 2, ..., n. We now wish to predict the value Y of the dependent variable when the independent variable has value x. For instance, in Example 13.5.1 we might wish to predict the systolic blood pressure Y of a particular woman aged x = 50 years.

If σ is known, the argument preceding the last example may be applied. The sampling distribution of Y is N(µ, σ²) where µ = α + βx. One can argue that µ̂ = α̂ + β̂x carries all of the relevant information about µ. From Section 13.6, we have µ̂ ~ N(µ, cσ²) where

   c = \frac{1}{n} + (x − x̄)²/S_{xx}.

Hence, from Example 16.1.3, the fiducial distribution of µ is N(µ̂, cσ²). It now follows that the predictive distribution is

   Y ~ N(µ̂, (1 + c)σ²).

If σ is unknown, we replace σ² by s² = \frac{1}{n−2}\sum ê² to get

   T = \frac{Y − µ̂}{\sqrt{(1 + c)s²}} ~ t_{(n−2)}.

A central 99% predictive interval for Y is then

   Y ∈ µ̂ ± a\sqrt{s²\left[1 + \frac{1}{n} + (x − x̄)²/S_{xx}\right]}

where P{|t_{(n−2)}| ≤ a} = 0.99.

For instance, in Example 13.5.1, the central 99% predictive interval for the blood pressure of an individual woman aged 50 years is

   Y ∈ 137.68 ± 23.18.

From Section 13.6, a 99% confidence interval for the mean blood pressure of all women aged 50 years is

   µ ∈ 137.68 ± 6.55.

The interval for Y is much wider than the interval for µ, because there is considerable variability in systolic blood pressure among women of the same age. Even if we knew µ exactly, we could not predict the value of Y very precisely.

16.4. Inferences from Predictive Distributions

Suppose that Y_1, Y_2, ..., Y_n are independent N(µ_i, σ²), and that the µ_i's are linear functions of q unknown parameters α, β, γ, ..., where q < n. This is the normal linear model (see Sections 13.1 and 13.2).

It can be argued that, if σ is known, then α̂ carries all of the relevant information about α. The sampling distribution of α̂ is N(α, cσ²) where c is a constant. If σ is known, inferences about α are based on this distribution.

The sampling distribution of α̂ depends on σ, and so we cannot use it for inferences about α when σ is unknown. Instead we shall derive a predictive distribution for α̂ which does not depend on σ, and then use the predictive distribution for inferences about α.

Let V = \sum ê² denote the residual sum of squares for the linear model. Then V carries all of the relevant information about σ, and

   U = V/σ² ~ χ²_(n−q),

independently of α̂.

U is a pivotal quantity which satisfies the conditions for the fiducial argument. To obtain the fiducial distribution of σ, we replace V by its observed value v = (n − q)s², giving

   σ² = (n − q)s²/U   where U ~ χ²_(n−q).

Now, by (16.3.1), the p.d.f. of the predictive distribution of α̂ given s is

   p(α̂; α, s) = \int_0^{∞} g(α̂; α, σ) f(σ; s)\,dσ.

We can avoid having to evaluate this integral by using (6.10.1). We have

   α̂ = α + Z\sqrt{cσ²}

where Z ~ N(0, 1), independently of U. Substituting for σ gives

   α̂ = α + Z\sqrt{c(n − q)s²/U} = α + T\sqrt{s²c}

where T = Z ÷ \sqrt{U/(n − q)} ~ t_{(n−q)} by (6.10.1). Hence predictive statements for α̂ given s are obtained from

   \frac{α̂ − α}{\sqrt{s²c}} ~ t_{(n−q)}.   (16.4.1)

This result appears to be identical to (13.2.7), but it is not. In (13.2.7), s² is a random variable such that (n − q)s²/σ² ~ χ²_(n−q), whereas in (16.4.1) s² is the particular observed variance estimate.

In this problem, s² plays the role of an ancillary statistic. Since its sampling distribution does not depend upon α, s² gives no direct information about the magnitude of α. However, its observed value indicates the informativeness or precision of the experiment with respect to α. By the arguments of Section 15.3, s² should be held fixed at its observed value in making inferences about α. Thus it would seem appropriate to use the predictive distribution (16.4.1) rather than the sampling distribution (13.2.7) in making inferences about α.
In fact, one will obtain the same numerical values for significance levels and confidence intervals whether one uses (16.4.1) or (13.2.7), and so the distinction does not matter in this case. It does matter in more complicated cases, such as the Behrens-Fisher problem to be considered below.

Note that (16.4.1) defines a pivotal quantity T which satisfies the conditions set out for the fiducial argument in Section 16.1. Thus (16.4.1) can be used to obtain a fiducial distribution for α when σ is unknown.

Behrens-Fisher Problem

Suppose that we have n + m independent measurements made using two different techniques which may not be equally precise. The n measurements made with the first technique are modelled as N(µ_1, σ_1²), and the m measurements made with the second technique as N(µ_2, σ_2²). We wish to make inferences about µ_1 − µ_2.

If σ_1 = σ_2, we have just the two-sample model discussed in Section 13.4. A similar analysis to that in Section 13.4 is possible if σ_1 = kσ_2 where k is a known constant. However, if the ratio σ_1/σ_2 is unknown, the analysis becomes difficult and controversial. The problem of making inferences about µ_1 − µ_2 when σ_1/σ_2 is unknown is called the Behrens-Fisher problem.

The MLE of µ_1 − µ_2 is µ̂_1 − µ̂_2, the difference between the two sample means. Its sampling distribution is

   µ̂_1 − µ̂_2 ~ N(µ_1 − µ_2,\; c_1σ_1² + c_2σ_2²)

where c_1 = 1/n and c_2 = 1/m. If σ_1 and σ_2 were known, inferences about µ_1 − µ_2 would be based on this distribution.

Since the sampling distribution of µ̂_1 − µ̂_2 depends on σ_1 and σ_2, we cannot use it for inferences about µ_1 − µ_2 when σ_1 and σ_2 are unknown. Instead we shall derive a predictive distribution for µ̂_1 − µ̂_2 given the observed sample variances s_1² and s_2². This predictive distribution does not depend upon σ_1 or σ_2, and it can be used for inferences about µ_1 − µ_2 when σ_1 and σ_2 are unknown.

From (16.4.1), the predictive distributions of µ̂_1 and µ̂_2 are given by

   µ̂_1 = µ_1 + T_1\sqrt{c_1 s_1²},   µ̂_2 = µ_2 + T_2\sqrt{c_2 s_2²},

where T_1 ~ t_{(n−1)} and T_2 ~ t_{(m−1)}. T_1 and T_2 are independent because the first sample is assumed to be independent of the second. Hence the predictive distribution of µ̂_1 − µ̂_2 given s_1² and s_2² is given by

   µ̂_1 − µ̂_2 = µ_1 − µ_2 + T_1\sqrt{c_1 s_1²} − T_2\sqrt{c_2 s_2²}
             = µ_1 − µ_2 + T\sqrt{c_1 s_1² + c_2 s_2²}

where T is a linear combination of T_1 and T_2:

   T = \frac{T_1\sqrt{c_1 s_1²} − T_2\sqrt{c_2 s_2²}}{\sqrt{c_1 s_1² + c_2 s_2²}} = T_1\cosθ − T_2\sinθ.

The distribution of a linear combination

   T = T_1\cosθ − T_2\sinθ,

where T_1 ~ t_{(ν_1)} and T_2 ~ t_{(ν_2)} are independent, is called the Behrens-Fisher distribution. It is tabulated in the Fisher and Yates Statistical Tables for Biological, Agricultural and Medical Research. In this case we have

   \tanθ = \frac{\sinθ}{\cosθ} = \sqrt{\frac{c_2 s_2²}{c_1 s_1²}},

so that θ is a function of the observed variance ratio s_2²/s_1².

When σ_1 and σ_2 are unknown, inferences about µ_1 − µ_2 may be based on the pivotal quantity

   T = \frac{(µ̂_1 − µ̂_2) − (µ_1 − µ_2)}{\sqrt{c_1 s_1² + c_2 s_2²}}

which is referred to tables of the Behrens-Fisher distribution with parameters ν_1, ν_2, and θ, where ν_1 = n − 1, ν_2 = m − 1, and tanθ is as given above.

A similar result can be derived for inferences about any linear combination of µ_1 and µ_2.

Note that the Behrens-Fisher distribution arises in connection with the predictive distribution of µ̂_1 − µ̂_2, in which s_1 and s_2 are held fixed at their observed values. Alternatively, one might consider the sampling distribution of

   T' = \frac{(µ̂_1 − µ̂_2) − (µ_1 − µ_2)}{\sqrt{c_1 S_1² + c_2 S_2²}}

where now S_1² and S_2² are independent random variables such that

   (n − 1)S_1²/σ_1² ~ χ²_(n−1),   (m − 1)S_2²/σ_2² ~ χ²_(m−1).

The distribution of T' in repeated sampling is not Behrens-Fisher, and it will depend upon the unknown variance ratio σ_1²/σ_2².
Many statisticians are of the opini on that
fl1 -fi.2 =µ1 -µi +Ti~ -Ti~ based on sampling distri butio ns, so that
inferences shou ld always be
proba biliti es can be interp reted
= µ 1 - µ 2 + T Jc 1 sf + c2 s~ directly as relative frequencies in repet itions
of the experiment. As a resul t,
wher e Tis a linea r comb inatio n of Ti and they do not accept the above solut ion based
T2 : on the predictive distributio n of
fi.i - µ2 . On the other hand , it seerris appro priat e that Sf
and S~ shou ld be
treate d as ancillary statistics and held fixed
at their obser ved values as is the
case in the abov e solution. If one insists that
statistics, then inferences cann ot be based
Si and S~ be treated as ancillary
on a samp ling distri butio n.
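Since tables of the Behrens-Fisher distribution are not always at hand, its percentiles can also be approximated by simulating T = T₁ cos θ − T₂ sin θ directly. The following Python sketch does this; the sample sizes, variances and percentile chosen below are illustrative assumptions, not values taken from the text.

```python
# Minimal sketch: Monte Carlo percentile of the Behrens-Fisher distribution
# T = T1*cos(theta) - T2*sin(theta), T1 ~ t(v1), T2 ~ t(v2) independent.
import numpy as np

def behrens_fisher_percentile(v1, v2, theta, q, n_sim=1_000_000, seed=1):
    rng = np.random.default_rng(seed)
    t1 = rng.standard_t(v1, n_sim)
    t2 = rng.standard_t(v2, n_sim)
    return np.quantile(t1 * np.cos(theta) - t2 * np.sin(theta), q)

# Illustrative (assumed) values: n = 10, m = 6 observations with variances s1sq, s2sq.
n, m, s1sq, s2sq = 10, 6, 4.0, 9.0
c1, c2 = 1 / n, 1 / m
theta = np.arctan(np.sqrt(c2 * s2sq / (c1 * s1sq)))   # tan(theta) = sqrt(c2*s2^2)/sqrt(c1*s1^2)
a = behrens_fisher_percentile(n - 1, m - 1, theta, 0.975)
# 95% interval for mu1 - mu2:  (ybar1 - ybar2) +/- a * sqrt(c1*s1sq + c2*s2sq)
```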

16.5. Testing a True Hypothesis

Sometimes one can generate intervals of "reasonable" values for an unknown quantity γ by the device of testing a true hypothesis H. One assumes a value for γ, carries out a test of significance, and finds the significance level SL(γ). A small significance level indicates an inconsistency, and if H is known to be true, doubt is cast on the value assumed for γ. One can define a 95% interval or region as the set of all values of γ such that SL(γ) ≥ 0.05. Several examples will be given to illustrate this procedure.

EXAMPLE 16.5.1. Suppose that X̄ ~ N(µ₁, σ²/n) and Ȳ ~ N(µ₂, σ²/m) independently of X̄, where σ is known. Given observed values of X̄ and Ȳ, we can test H: µ₁ = µ₂ using the result that X̄ − Ȳ ~ N(0, σ²(1/n + 1/m)) under the hypothesis. The significance level will be 5% or more if and only if

    −1.96 ≤ (x̄ − ȳ)/√(σ²(1/n + 1/m)) ≤ 1.96.    (16.5.1)

Now suppose that we don't know ȳ but we do know that µ₁ = µ₂. This would be the case if Ȳ were the average of m future observations to be taken from the same N(µ, σ²) distribution as the original sample x₁, x₂, ..., xₙ (see Example 16.3.1). Now (16.5.1) yields a 95% interval for ȳ:

    ȳ ∈ x̄ ± 1.96√(σ²(1/n + 1/m)).

This interval consists of the values of ȳ such that a test of the true hypothesis µ₁ = µ₂ would produce a significance level of 5% or more. The same interval can be obtained as the central 95% interval in the predictive distribution of Ȳ (see Example 16.3.1).

EXAMPLE 16.5.2. Let Y₁ and Y₂ be independent variates, with Yᵢ ~ binomial (nᵢ, pᵢ). The likelihood ratio statistic for testing H: p₁ = p₂ is

    D = 2Σ yᵢ log[yᵢ/(nᵢ p̂)] + 2Σ (nᵢ − yᵢ) log[(nᵢ − yᵢ)/(nᵢ(1 − p̂))]

where p̂ = (y₁ + y₂)/(n₁ + n₂). Given observed values y₁ and y₂, we can test H as in Section 12.4. The distribution of D is approximately χ²(1) if H is true, and so

    SL ≈ P{χ²(1) ≥ Dobs}.

Alternatively, an exact conditional significance level can be calculated as in Section 15.6.
If we don't know y₂ but we do know that p₁ = p₂, we can use the above test to construct a range of plausible values for y₂. For instance, suppose that 12 successes are observed in 20 Bernoulli trials, and that we wish to predict y₂, the number of successes in 30 future trials with the same success probability. Taking n₁ = 20, n₂ = 30, and y₁ = 12, we can compute D for selected values of y₂. We find that D ≤ 3.841, and hence SL ≥ 0.05, for 10 ≤ y₂ ≤ 25. Values of y₂ outside this interval are implausible in that they would lead to a significance level less than 0.05 in a test of the true hypothesis p₁ = p₂.

EXAMPLE 16.5.3. Suppose that a lake contains n₁ tagged fish and n₂ untagged fish, where n₁ is known but n₂ is unknown. Fish are caught during a predetermined period of time, and the catch is observed to consist of x tagged fish and y untagged fish. What can be concluded about n₂?
We assume n₁ + n₂ independent trials, and let the probability that a particular fish is caught during the time period be p₁ for tagged fish and p₂ for untagged fish. Given a value for n₂, we can test H: p₁ = p₂ as in the preceding example. If we are willing to assume that H is true, then a small significance level casts doubt on the value chosen for n₂.
For instance, suppose that there are 110 tagged fish in the lake, and that the sample contains 20 tagged fish and 478 untagged fish. Taking n₁ = 110, y₁ = 20, and y₂ = 478, we can compute D for selected values of n₂. We find that D ≤ 3.841, and hence SL ≥ 0.05, for 1817 ≤ n₂ ≤ 4097.
Note that, in order to derive this range of values for n₂, it is necessary to assume that p₁ = p₂. This assumption may not be appropriate, because fish that have been caught and tagged may have a larger (or smaller) probability of being caught subsequently.
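The range of plausible values in Example 16.5.2 can be found by direct computation. The following Python sketch evaluates the likelihood ratio statistic D for each candidate y₂ and keeps the values with D ≤ 3.841; with n₁ = 20, y₁ = 12 and n₂ = 30 as in that example, it reproduces the interval 10 ≤ y₂ ≤ 25.

```python
# Minimal sketch: invert the likelihood ratio test of H: p1 = p2 to obtain
# the set of y2 values with significance level >= 0.05 (Example 16.5.2 data).
from math import log

def D(y1, n1, y2, n2):
    """Likelihood ratio statistic for H: p1 = p2 with independent binomials."""
    p = (y1 + y2) / (n1 + n2)           # pooled estimate under H
    total = 0.0
    for y, n in ((y1, n1), (y2, n2)):
        if y > 0:                        # 0*log(0) is treated as 0 by skipping the term
            total += y * log(y / (n * p))
        if y < n:
            total += (n - y) * log((n - y) / (n * (1 - p)))
    return 2 * total

plausible = [y2 for y2 in range(0, 31) if D(12, 20, y2, 30) <= 3.841]
print(min(plausible), max(plausible))    # 10 25, matching 10 <= y2 <= 25
```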

EXAMPLE 16.5.4. Suppose that the normal straight line model (13.5.1) has been fitted to n observed pairs (xᵢ, yᵢ), i = 1, 2, ..., n. Now we observe an additional value y, for which the corresponding value of the independent variable x is unknown. The problem is to make inferences about x.
Suppose that a value is given for x. Then from the straight line model, the estimated mean value of Y is

    µ̂ = α̂ + β̂x ~ N(µ, cσ²)

where c = c(x) = (1/n) + (x − x̄)²/Sxx. We take Y ~ N(µ′, σ²) and test H: µ′ = µ. Under H, we have

    Y − µ̂ ~ N(0, (1 + c)σ²)

and the significance level is 5% or more if and only if

    −1.96 ≤ (y − µ̂)/√(σ²(1 + c)) ≤ 1.96.

Substituting for µ̂ and c and squaring gives

    [y − α̂ − β̂x]² ≤ 1.96² σ² [1 + 1/n + (x − x̄)²/Sxx].

The 95% interval for x is the set of x-values for which this quadratic inequality is satisfied. For any value of x outside this interval, a test of the hypothesis E(Y) = α + βx will give a significance level less than 5%.
For σ unknown, we replace 1.96²σ² by t²s², where s² = (1/(n − 2))Σeᵢ² and P{−t ≤ t(n−2) ≤ t} = 0.95, giving

    [y − α̂ − β̂x]² ≤ t²s²[1 + 1/n + (x − x̄)²/Sxx].    (16.5.2)

If we take x as known and y as unknown in this inequality, we get the central 95% predictive interval for a future observation Y (see Example 16.3.2). For y known but x unknown, we get a 95% interval for x. This interval consists of all values of x such that the observed value of the observation Y belongs to its 95% predictive interval.
If the slope β were zero, then an observed y would not determine a value of x. Thus one might anticipate difficulties when the estimated slope is not significantly different from zero. A test of H: β = 0 is based on

    (β̂ − 0)/√(s²/Sxx) ~ t(n−2),

and the condition for β̂ to be significantly different from zero at the 5% level is

    β̂² > t²s²/Sxx

where P{−t ≤ t(n−2) ≤ t} = 0.95. If this condition is not satisfied, the 95% interval for x will be either the entire real line, −∞ < x < ∞, or else the entire real line with a finite interval removed. These results can be derived by examining the discriminant and the sign of the second-degree term in the quadratic inequality (16.5.2).
The problem of estimating the independent variable x given a value of the dependent variable y is called the calibration problem. It arises when the quantity x which is of interest cannot be measured directly, but we must make do with a measurement y which is related to x. The equation relating E(Y) to x is called the calibration curve. In this case we have assumed a linear calibration curve E(Y) = α + βx with independent N(0, σ²) errors.
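The endpoints of the 95% interval for x can be obtained by solving the quadratic inequality (16.5.2) explicitly. The Python sketch below does this; the fitted quantities α̂, β̂, s², x̄, Sxx and the critical value t_crit are assumed inputs rather than values from a particular example, and the three cases returned correspond to the discriminant analysis described above.

```python
# Minimal sketch: 95% calibration interval for x given y, from inequality (16.5.2),
# assuming alpha_hat, beta_hat, s2, n, xbar, Sxx and t_crit (t table, n-2 d.f.) are given.
from math import sqrt

def calibration_interval(y, alpha_hat, beta_hat, s2, n, xbar, Sxx, t_crit):
    d = y - alpha_hat
    k = t_crit ** 2 * s2
    # Quadratic A*x^2 + B*x + C <= 0 obtained by expanding (16.5.2)
    A = beta_hat ** 2 - k / Sxx
    B = -2 * beta_hat * d + 2 * xbar * k / Sxx
    C = d ** 2 - k * (1 + 1 / n + xbar ** 2 / Sxx)
    disc = B ** 2 - 4 * A * C
    if A > 0:                            # slope significantly different from zero at the 5% level
        r1, r2 = sorted(((-B - sqrt(disc)) / (2 * A), (-B + sqrt(disc)) / (2 * A)))
        return ("finite interval", r1, r2)
    if disc < 0:                         # inequality holds for every x
        return ("whole real line",)
    r1, r2 = sorted(((-B - sqrt(disc)) / (2 * A), (-B + sqrt(disc)) / (2 * A)))
    return ("real line with a finite interval removed", r1, r2)
```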
APPENDIX A

Answers to Selected Problems

9.1.1 θ̂ = Σxⱼ/n = 0.25
9.1.4 θ̂ = (2X₁ + X₂)/2n = 0.55; exp. freq. 30.25, 49.50, 20.25
9.1.7 L(N) = N⁻ⁿ for N ≥ largest sample value x₍ₙ₎, and L(N) = 0 otherwise. L(N) decreases as N increases, so N̂ = x₍ₙ₎.
9.1.9 L(θ) = Π pᵢ^fᵢ where pᵢ = θ^(i−1)(1 − θ) for i = 1, 2, 3 and p₄ = θ³. θ̂ = (f₂ + 2f₃ + 3f₄)/(f₁ + 2f₂ + 3f₃ + 3f₄) = 0.5. Exp. freq. 100, 50, 25, 25; poor agreement with obs. freq.
9.1.13 L(µ) = Π pᵢ^fᵢ where pᵢ = µⁱe⁻µ/[i!(1 − e⁻µ)] for i = 1, 2, .... l′(µ) = 0 gives an equation for µ̂; µ̂ = 3.048.
9.2.1 L(θ) = θᵐ(1 − θ)^(M−m) θ^(2n)(1 − θ²)^(N−n); l′(θ) = 0 gives (M + 2N)θ² + (M − m)θ − (m + 2n) = 0. θ̂₁ = 0.11, θ̂₂ = 0.02, θ̂ = 0.12.
9.3.1 50% LI (0.168, 0.355); 10% LI (0.116, 0.460)
9.3.4 l(θ) = 54 log(1 + θ) + 46 log(1 − 2θ); l(θ̂) = −175.74; r(0) = l(0) − l(θ̂) = −3.44; R(0) = 0.032. It is quite unlikely that θ = 0.
9.3.7 R(b) = 1 for b = 9, 10 (b̂ is not unique); R(b) ≥ 0.5 for b = 6, 7, ..., 17; R(b) ≥ 0.1 for b = 5, 6, ..., 32.
9.3.10 r(λ) = 51 log λ − 20λ + 3.259; 10% LI 1.86 ≤ λ ≤ 3.39. r(λ) = 2 log p + 18 log(1 − p) + 6.502 where p = e⁻λ; 10% LI 1.21 ≤ λ ≤ 4.28. Yes, since the 10% LI is much wider.
9.4.1 θ̂ = 76.8, r(θ) = −27 log θ − 2074/θ + 144.218; 10% LI 61.7 ≤ θ ≤ 97.2. Obs. freq. 11, 8, 6, 2; exp. freq. 12.92, 6.74, 5.35, 2.00. Yes, the agreement is good.
9.4.4 L(θ) = θ^(−2n) exp(−Σxᵢ/θ); θ̂ = Σxᵢ/2n; R(θ) = L(θ)/L(θ̂).
9.4.6 l(µ) = −[Σ(xᵢ − µ)² + Σ(yᵢ − µ/2)²]/2σ²; l′(µ) = 0 gives µ̂ = (nx̄ + ½mȳ)/(n + ¼m).

D0 b, = 5.98; d.f. = 2 - l = 1; SL::::: 0.014


Expt 2: r; is Poisson with mean !OOp;
Strong evidence against H: µ 1 = µz.
• lOOn
/(p) ==Lr; Jog p - lOOpn; J E(P) = - . 12.3.9 {i = 9.6706; D0 b, = 0.854; d.f. == 3 - I= 2; SL::::: 0.65.
p No evidence of heterogeneity . .
MSE(T1 ) = E(Y ) - µ 2 (2 + µ ) since E(Y ) = 1 + µ .
2 2 2
4 112 etc.
11.7.2 95% CI 9.74 ± 1.96/0.563
2 2 112
MSE(T2 ) = E(Y ) - (1 - µ ) = MSE(T1) - 1.
4
Overall 9.6706 ± 1.96/ (0.563 + 0.345 + 0.695) .
2
11.7.7 T= (!:w 1X 1)/(l:w 1) where w, = u,- . 12.4.3 (a) p, = 82/400, np, = 20.5, n(l - p;) = 79.5 for I$ i $ 4, D0 b, = 12.897,
12.1.1 X =number tall - binomial (100, p); H: p = 0.75 d.f. = 4 - I = 3, SL::::: 0.005
(b) p, = 72/ 300 for z =I, 2, 3; p4 = 10/ 100. D0 .,. = 2.757, d.f. = 4 - 2 = 2,
D = \X - 75\; D0 b, = 10; SL= P{\X - 75\ 2 10\p = 0.75} . Exact SL= 0.0275
from summing binomial probabilities ; approx. SL::::: P{\Z\ z 2.31} = 0.0209, SL :::::0.25
No
(c) D0 b, = 12.897 - 2.757 = 10.140, d.f. = 2 - I= 1, SL::::: 0.0015.
or SL::::: P{\Z\:?: 2.19} = 0.0282 with continuity correction.
evidence of difference among drugs; strong evidence of difference between
12.1.3 X = # times benzedrine is faster - binomial (20, p)
drugs and placebo.
H: p =0.5; D= \X -10\; Dobs = 4
SL= P{\X - 10\:?: 4\p = 0.5} = 0.115. Results do not strongly contradict 12.4.6 0: = -0.0002126, ~ = -0.03297, p = exp(O: + dif)
H:p=0.5. np 4537.07 3726.28 3374.24 2547.54 1933.03
n(l -jj) 0.93 125.72 230.76 265.46 272.97
12.1.5 X = #seedlings from one packet - binomial (JOO, p}
H: p = 0.8; D = \X - 80\; D0 .,. = 7 for 1st customer. Dobs = 3.53, d.f. == 5 - 2 = 3, SL::::: 0.32
SL= P{\X - 80\ z 7\p = 0.8} = 0.103. No customer has a strong case against No evidence against hypothesis.
2 1
merchant. T =Total # seedlings - binomial (400, p) 12.5.2 i '3 '3
Exp. freq. 26306 ( 12)(1)'(2)1 - ; Pool last 2 classes. D0 b, = 38.15; d.f. =
H : p = 0.8; D = \T- 320\; D0 b, == 20
SL= P{\ T- 320\ z 20\p == 0.8} = O.D15 (12-1)-0 = 11; SL< 0.001. MLE of P(5 or 6) is p= 0.3377. Using this value
Combined results give strong evidence that p # 0.8. rather than p = t gives D0 bs == 11.13 with 10 d.f. Dice appear to be biased in
12.1.9 f(x) = (5 - x)/15; E(X) = 4/ 3; var (X) = 14/9. E(X) = E(X) = 4/ 3; favor of 5 or 6.
var (X) = var(X)/ 100 = 14/ 900; D =i \X - 4/ 3\; Xobs = 1.6; D0 b, == 4/ 15 12.5.6 li = 0.55; exp. freq. 30.25 49.5 20.25
SL= P{\X -4/ 3\:?: 4/ 15}::::: P{\Z\:?: 2.138} == 0.0325 D0 b, == 1.24; d.f. = (3 - 1) - 1 = l; SL::::: 0.27
Yes, X 0 .,. is significantly larger than expected. Doh•= 4.94; d.f. = l ; SL::::: 0.026. The larger sample provides stronger evidence
12.1.11 T =#calls in 5 hours - Poisson (5..l). H : ,l = 7.2 = T- Poisson (36). T.~ = 21. against the model.
21
12.5.10 (a) ji::::: 83/ 60; exp. freq . 15.0 20.8 14.4 6.6 3.1 (:?:4)
SL==P{TsT .b,\..l =7.2}== L we-
x~ o
36
/x!==0.0049
A good fit.
(b) ji::::: 136/ 60; exp. freq. 6.2 14.1 16.0 12.l 6.8 3.1 1.7 (26)
Strong evidence that ,l < 7.2.
12.2.l 1(0) = 65 log 0 + 35 log(! - 0); iJ = 0.65 Dobs = 24.7; d.f. = (7 - 1) - I = 5; SL < 0.001.
Dobs = 2(1(0.65)- /(0.75)] == 4.95; SL::::: P(xfl) z 4.95) = 0.0261 Injuries tend to occur in clusters, not individually.
12.2.5 l(p) = l::f1 log p1; p1 == fj/556; l(p) = -619.586 (c) Exp. freq. 16.6; D0 b, == 9.67; d.f. = (5 - 1) - 0 = 4; SL::::: 0.046. Some
l(p 0 ) = - 619.824; Doba = 2( -619.586 + 619.824] == 0.475 evidence against hypothesis.
1
SL::::: P(xt31 :?: 0.475) == 0.92; no evidence against hypothesis. 12.5.13 P(i digits between O's)== (0.9) (0.1) for i == 0, 1, 2, ...
12.2.8 1(0) = -n log 0- l::X 1/0; iJ = X; D = 2(1(0)-1(0 0 )] Class O 1-2 3-4 5-6 7-9 10-13 14-19 20-co
0=43.3, 1(0)= -47.705 Exp. freq. 5.0 8.55 6.93 5.61 6.48 6.00 5.36 6.08
H: 0 = 37.4; l(Oo) = -47.82 1; D0 b, = 0.23; SL"" 0.63 Obs. freq. 6 13 8 4 5 4 5 5
H: 0 = 65.1; l(Oo) = -48.426; D0 b, = 1.44; SL::::: 0.23
Data are consistent with both hypotheses. D0 b, == 4.20; d.f. = (8 - 1)- 0 = 7; SL::::: 0.76. No evidence against the model.
95% CI (14.7% LI) 24.76 s 0 s 86.55. (Other groupings are possible.)
12.3.l -105.493; -105.743(0 = 0.55); -106.745 12.5.16 p = (0.ln + 0.05n + 0.12n)/ 3n = 0.09
D.b. = 0.5; d.f. = 2 - 1 = l ; SL::::: 0.48. Dobs = 2.00; d.f. = 1 - 0 = 1; SL::::: 0.157. 0.05 0.88] .
D = 211 [ O·! log -0.1
0.9
+ 0.05 Jog -0.09 + ... + 0.88 log -0.9 J = 0.03418n,
Dobs = 2.50; d.f. = 2 - 0 = 2; SL::::: 0.286 obs
12.3.5 I(µ)= l:X, log µ 1 - nµ 1 + l: Y; log µ 2 - mµ 2 ; [l 1 = X, iii= Y. d.f.=3-1= 2
D == 2[/(ji) - !(ii)] where iii =iii= (nX + m Y}/(n + m). P(xf21 z5.991)=0.0 5 so n ~ 5.991/ 0.03418 = 175
' 11 ' 31 - - 42 12.5.18 Pi= P(j siblings)=(} - l)ai_ 1jLia,; exp. freq. is npi .
n == 12, m = 15, µ 1 =12' µ 2 =15" µ 1 = µ 2 = 27 Exp. freq. 34.21 51.41 47.31 ... 3.91 (z9)
Dobs = 32.33; d.f. == (10- 1)- 0 = 9; SL< 0.001.
9.4.10 L(c) = e"' for 0 < c::::; small est samp le value;
L(c) increases with c.
R(c) =exp {n(c - x >)} for 0 < c < x l; 10.3.6 Rm.,().)= (l /A)" exp {n - nl/ A} for 0 <). <
0 0
100p% LI: Xoi +{l/n ) slog p s c ::s; x<IJ· co;
Rm (c) = (t- tc 1 i)"(f- c)-• for 0 < c ::s; t(l>·
L(O) = o-· for 8 ::s; X1, X2, ... 'x. ::s; 28; i.e. for
01
9.4.12 10.3.8
8 ::s; x(I) and x(•) ::s; 28. Since L(O) Lm.u(,l.) = ).'"(1 -e -.< r)-'" exp { -l.Et; }; Rm.,(
).)= Lm.. (A.)/Lmax(AJ.
is decre asing, 0 = !x<•l; R(O) = (0/8)" for 0
::s; 8 ::s; Xcll· 10.4.2 }'2 2 = 0.057097; 0.7000 ::s;). ::s; 1.7255
9.5.2 L(O) = pk(l - p)"-k where p = 1- e- rie; .J; 2 = 0.0388
20; 0.7946 ::s;). ::s; 1.8510
/j = T/log (- "- ); R(O) = L(O)/L(O). Exact 10% interv al 0.7956 ::s;). ::s; 1.8593. Yes,
the logar ithmi c transf ormat ion
n-k helps.
9.5.4 f(x) = 4x8 - 2e-ix1e; L(O) = 0-2me - 2~1IO(l + 10.5.2 ML eqns . .E(n1 - a,)= 0, .E(n - a,)d = 0
2T/ 8)"-'"e-2T<n-ml/B 1 1
/'(8) = 0 gives quadr atic equat ion for 8. 2nd deriv. -r.b,, -r.bit11, -r.bid f, where a
1 =(n,- x,)/(1 - pJ; b,=a; p;/(1- p;).
B= 1813.42; 10% LI (1382, 2449) p
&: = -0.00 02126 , = -0.03 297
9.6.3 P(T::::; lOOt) = l -e - 100 ' 18 = 1 -/3' -0.03 54 ~ /3 ::s; -0.03 06; 0.99896 ::s; e" ::s; 0.9999
92.
L(/J) = pf9p~2 ... P~ = p234(l -/3)92(1 + /3)19 10.5.3 l(a., /3) = rxj log ex+ r.jx log /3 - ar.p1. ML
for o < /3 < 1. l'(/3) = o gives 1 equat ions :Exj - &r.pj = O;
quadr atic equat ion for p. P=0.7245 and .Ejx1 - &r.jµ 1 =0. Subst itute
LI 0.670 ::s; /3 ::s; 0.775.-.249::::; 8 ::s; 392
0= -100/ logfi =310 .3 10%
nume rically for p.
a= r.x1jl:fJ1 in 2nd equat ion; solve
50% LI 0.695 ::s; /3 ::s;0.753.-.275 ::s; 8:::; 352.
9.7.3 10% LI is x ± u((2/ n) log 10) 1' 2. Width is 11.2.3 c = J-2 log 0.2 = 1.794; CP = P(-1. 794 ::s; Z:::;
at most 2 for n ~ 2u 2 log 10. 1.794) = 0.927
9.8.1 µ = 1.5936; l(jl) = -21.7 233 P( - 2.242 ::::; Z S 2.242) = 0.975, so p = exp
{ - !(2.24 2) 2} = 0.081.
9.8.4 1.3185:::; µ::::; 1.6184 11.3.2 =
P.d.f. of V 2;.x is 2

9R2 L(cx) = (1 - cx) 50 [cx(l - cx)] 23 [cx 2(1 - cx)] 14 [cx 3


&: = !; 10% LI is 0.4226 ::s; ex$ 0.5774
(1 - cx}) 8 [cx 4] 5 ; f( fu)!dy.-(,_;u.fu) = ~e-uf
,_;u. 2
l for u > O.
Exp. freq. 50 25 12.5 6.25 6.25; a good fit.
9R6 L(O) = (l /O)e - 1n18. (l/O)e - 768/8. [e- looo1e ]3 Thus U - xf2> and U 1 + U 2 + · · · + V" - xf21 •
~ 2dn) where Y -
1(8) = -2 log 8- 3900/ 8; 8 = 1950
P(D ::s; d) = P( Y - 1 - log Y 1
(/> = 1- e-IOO/ d = 1-e- 1119 · 5 =0.05 00. n xf > as in Exam ple 11.3.3.
9R9 1(8) = 72 log 8 + 160 log(l - 8) + 20 log(3 2 2
- 28); 11.4.3 r(p) = 300 log p + 200 log(l - p) + 336.506;
1'(8) = 0 gives a quadr atic equat ion. 8 =
0.2954; exp. freq. 49.64 29.33 21.03. 95% CI (14.7% LI) 0.5567 ::s; p ::s; 0.6423
p= 0.6
10.1.3 I(µ, u) = -n log u - r.a (y - µ) 2/2u 2 99% CI (3.6% LI) 0.5428 ::s; p ::s; 0.6554
1 1
1 §(p) = 2083.3; 0.6 ± 0.0429; 0.6 ± 0.0564
µ= jJ.(u) = .Ea1y 1/ka;; a 2 = -.Ea (y - jJ.)2 11.4.4 r(O) = -10 log 8 - 388/8
n 1. 1 + 46.584; 8 = 38.8
.?(fl, a) is a diago nal matri x with positive diago 90% CI (25.8% LI) 24.03 ::s; 8 ::s; 68.60; p =
nal eleme nts r.aJa 2 and 2n/a 2. e- 5 o1o; 0.1248 ::s; p ::s; 0.4825
10.1.5 If f(x)= 0"'- 1 (1-8 ) for x= 1, 2, 11.4.7 r(u) = -15 log(] - 26.102
... then E(X )=(l- 8)- 1. 6/ 2u 2 + 11.655; a= 1.319
E{X 1 + X 2 + ... + x .. }=m( l- 8)- 1 . Since 95% CI (14.7% LI) 0.958 :::; u ::s; 1.979
ex> /3, m(l -a)- 1 > m(l-/ 3) - 1 .
a 11.4.11 L(O) = o-• for 8 ~
a= - b
- , p= - - where a, b are numb ers of
M; B= M. P(M ::s; m) = [P(X, ::s; m)]" = (m/8)"
successes for treatm ents A P(D ::s; d) = P(M ~ Oe- 412") = 1 -(e- 412 •)"; ;
a+m b+m ·
and B. CP(8 0 ) = P(D ::s; - 2 log p) = i _ e-<-2 101
P>f 2;
10.1.8 L(J.., c)=A " exp {-lr. t,}·e xp {nlc} 95% CI (5% LI) 0.9537 :::; 8 ::s; 1.2868.
for 0<c:: :;t0 > and l>O. c=to> ; 11.5.3 a= 1, b= 2; y= 1.3201
.A= 1/(r - t 111 ).
10.1.10 l(p, l) = m log p + lr.t, - m log). .i~ 1 =.Jt1 + 4..1 12 + 4..1 22 = 0.04891
+ (n - m)log (l - p + pe-.<~. Solve iJl/iJp for
p as a functi on of A. Subst itute into iJlfiJ). = y ± 2.576(.Jf! 1 ) 1' 2 = 1.320 1±0.5 697
0 and solve for ,l.. p = e'/(1+ e');p= 0.789 2; 0.6793 ::s; p ::s; 0.8687
10.2.2 a: = 3/ 8, p= 11;24; = & + p= 5/
q, .
6; P= &:/ </> = 9; 20.
Exp. freq. 15 11 9 29 (a perfect fit). 11.5.6 Y =log m =log l
r = 44 log ex+ 15 log(l - ex)+ 11lo gp+1 9 p
8 + log(log 2); a= 0.012213, b = 0.082944; y
= 4.2309
log(l -a -cx/3) + 81.744.
10% max LI 0.61 ::s; </> ::s; 1, 0.28 ::s; p ::s; 0.68. .i!1 =0.01 366
95% CI 4.0018 ::s; y ::s; 4.4600, or 54. 70 ::s; m
10.3.2
Rmax(,l.) = C! YC~;.)'IC: YC~ Y
A Y Y
A= 57 /47; 0.80 ::s;). ::s; 1.86. Rm.. (1) = 0.62. ). = 1 (no chang
11.6.l 2n 12n
..FE(O) = O(l _ O) in (a); (l _ O)(l + O) in (b). Do
::s; 86.49 .

(b) if it is thoug ht that 8 > 0.2.


e in rate) is very
plausi ble. 11.6.3 Expt 1: /(p) = r. Yi
log p + r.(Y; - X 1) log (1 - p); § Jp) = -lOOn
-.
p(l - p)

freq. 25 25 (assuming no ,change in overall support).


Strong evidence against hypothesis.
D = 5.21; d.f. = 1; SL:::: 0.022. Evidence of a loss in support for the
Adjusted frequencies fj are not multinomial.
government.
12.5.19 L(P1 ... p4) = p~(l - P1)'2Pio(l - /12)90 ... p!84(1 - p4)200
p1 = 8/20, p 2 = 70/ 160, etc., so l(p) = -586.791. 13.3.3 y = 61.21, s 2 = 14.74 (11 d.f.); µ E 61.21 ± 2.44
8 + 70 + ... 1 14.7% LI 7.08 $ 11 2 $ 38.85
+ ... = ; l(p) = -590.561
2
Under H, p, = + 3.816 $ l ls 2/11 2 $ 21.92 gives 7.40 $ 11 $ 42.48.
20 160 2 13.3.4 Assuming n large 95% CI for µ has width 2(1.96)(11 2 /n) 112 • For width 2,
D0 bs = 2[l(p)- l(p)] = 7.54; d.f. = 4 - 1 = 3; n = (1.96) 11 . Variance estimate s = 80/ 9, so n .=::: 34. Advise about 25
2 2 2

SL:::: 0.057. Weak evidence against hypothesis. additional measuremen ts.


Exp. freq. 10 10; 20 40 20; ... ; 6 24 36 24 6 13.3.7 (a) y = 12.9, s2 = 138.9; P{ Y < O}:::: P{Z <(O - y)/s} = 0.14 from Table B2.
D0 b, = 14.27; d.f. = (1+2+3+4 ) - 1=9; SL:::: 0.113. (b) y = 2.082, s2 = 27.67. Negative counts are impossible.
In (c), estimate p separately for each litter; s2 = 4.018 (25 d.f.); µ 1 e 6.23 ± 2.06s/ fa
13.4.2

12.6.1
Dobs = 14.27 - 7.54 = 6.73; d.f. = 10 - 4 = 6.
10(6.03)
3(6.07)
117(120.97)
144( 140.03)
µ 2 e 12.74 ± 2.06s/ ji6; µ 1 -µ 2 e -6.51±2.06 s / J/
1 16
+
1

D 0 b, = 5.31; d.f. = (2 - 1)(2 - 1) = 1; SL:::: 0.021. Fairly strong evidence 13.4.6 (a) si = 0.0914, v
1 = 13; si = 0.0422, v2
= 3;
2
against independenc e hypothesis. s2 = 0.082175 (16 d.f.). Dobs = :Ev 1 log(s /sf)=0.616 (1 d.f.); SL:::: 0.43. No
12.6.5 74(85) 116(123.25) 68(63.75) 82(68) evidence against equal variance hypothesis.
126(115) 174(166.75) 82(86.25) 78(92)
D0 b, = 8.70; d.f. = (2- 1)(4- 1) = 3; SL:::: 0.034. Some evidence against (b) µ 1 - µ 2 E 0.273 ± 1.746 [.
0.082175
14
(1 41)]
+
1 2
' ·
(a) si = 0.25, s~ = 0.268, s~ = 0.183, s = 0.267; d.f. = 4, 4, 4, 12.
2
independenc e hypothesis. 13.4.9
12.6.7 (a) Doh•= 1.84; d.f. = (3 - 1)(3 - 1) = 4; SL:::: 0.77 D0 b, = 41: log(s 2 /sf) = 0.491 (2 d.f.); SL:::: 0.78.
(b) Obs. freq. 126 271 132; Exp. freq. 132.25 264.5 132.25 No evidence against equal variance hypothesis.
D h, = 0.46; d.f. = (3 - 1)- 0 = 2; SL:::: 0.8
0
2
(b) s2 = 0.267 (12 d.f.); 14.7% LI 0.132:::; 11 :::; 0.671.
No evidence against hypothesis in (a) or (b). 2 2
Or 4.404 $ 12s /11 $ 23.34 gives 0.137::; 11 2 ::; 0.728.
12.6.11 (a) Above 9(15) 6(8) 12(9) 23(18) (c) µ2 - µ3 E -0.3 ± 2.179(0.267( i + z)]'1 .
2

Below 21(15) 10(8) 6(9) 13(18) & = 1.468, p= 1.703, s2 = 0.00502 (28 d.f.). Plot shows curvature. Try a 2nd
13.5.2
D 0 h, = 10.8; d.f. = 3; SL:::: O.Ql 3. Strong evidence that birth weight is not degree polynomial model.
independent of parental smoking habits. & = 47.864, p= 48.247, s = 0.1783 (15 d.f.). Estimated boiling points
2
13.5.6
(b) MF Mf MF MP 19240 203.16 211.96. P would increase to 48.247logl0; other results
Above 9(9.8) 6(5.2) Above 12(11.7) 23(23.3) unchanged. One point (BP = 204.6) is seriously out of line. Redo analysis
Below 21(20.2) 10(10.8) Below 6(6.3) 13(12.7) with this point omitted.
Given mother's smoking habits, there is no evidence that birth weight 13.6.2 & = 0.976, p= 0.353, s2 = 0.00978 (6 d.f.). The last observation is somewhat
depends on father's smoking habits. larger than expected.
12.6.13 Test for independenc e in 2 x 10 table gives D = 37.5 (9 d.f.), which shows that pE 0.353 ± 2.447(s 2 / l.229) 1i 2 2 112
C1. + 0.4/3 E 1.117 ± 2.447s(i + 0.2875 /1.229)
insects tend to aggregate. This analysis is conditional on the total number of
(a) & = -0.228, p= 0.9948, :Eef = 0.3419, s = 0.04273 (8 d.f.).
2
insects which land on area A or B in each trial. 13.6.7
2 1 2
12.7.3 Test for independenc e in 2 x 4 table gives D = 66 (3 d.f.); SL:::: 0. Strong (b) T.bs =(ft - 1)/(s / 1569) ' = -1.00; SL= 0.35.
evidence against independenc e hypothesis. It may well be that only the best 1 2
(c) T.bs = &/s ( 10 + 23.25 /1569
. )1/2
= -1.65; SL= 0.14.
students chose to write the competition. There is no proof that writing the
competition made them any better. No evidence against H: /3 = 1 or H: C1. = 0.
(d) :Ex 2 = 6974.25, l:xy = 6884.65, :Ey = 6796.66; P= 0.9872;
2
12.7.5 D = 112; d.f. = l; SL:::: 0. The admission rate is certainly lower for females. 112
Only program A shows any evidence of bias, and here it appears to be against s = M:Ey - Pl:xy) = 0.05099 (9 d.f.); T.bs =(ft - 1)/(0.05099/6974) =
2 2

males. There are proportionat ely more female applicants to programs with -4.75; SL< 0.001. There is very strong evidence that /3 # 1. If we insist
low admission rates. that line goes through the origin, slope must be less than 1 to give a
12.8.3 (a) Each of the 400 electors is counted twice in the table, so rows are not reasonable fit to the data It is reasonable to take f3 = 1 or to take C1. = 0, but
independent . it is not satisfactory to assume both P= 1 and C1. = 0.
that differences are independent N(µ, 11 ), and test H: µ = 0. Here
2
(b) D = 256; d.f. = 1; SL:::: 0. Most electors have not changed their positions. 13.7.3 We assume
(c) Consider just those who changed their positions. Obs. freq. 17 33; exp. y= 2.5, s 2 = 5.10 (5 d.f.).
T.b• = (y-O)/ (s 2 /6) 112 = 2.71; SL= 0.042.
There is some evidence that brand A is superior. Howev
er, brand A tires were
P= (X'X)- 1
X'y gives Pi= 86.35, P2 = -2.575, p3 = -0.362 7,
always tested first. It would be better to test A first
on 3 randomly chosen ref= 63845.72-63841.59 = 4.13 (9 d.f.).
cars, and B first on the other 3 cars. '/31+ (11.5 - 19)'/32 + (17.5 - 19)2p 3 = 89.40
13.7.8 (a) &= -0.160 5, P= 35.348, s2 = 0.01031 (31 d.f.) d
di [}3, + (t - 19J'/32 + (t - 19)2 p3J = o for t = 19 - p 2; 2p = 15.45.
3
1
Cl+ 0.4/J E 3.3743 ± 2.04s
[ 33 + 0.02764 2/0.07242 ]1/2 = 3.3743 ± 0.0419 14.3.1 (a) s2 =0.074317/ 25 (25 d.f.)
(b) Same &, P; s 2
= 0.00753 (9 d.f.) H 1: Q = 0.066343 (3 d.f.); SL= P{F 3.2s z 7.44} =
.0 .001.
H 2: Q = 0.000043 (2 d.f.); SL= P{F2 .is z 0.007} = 0.993.
[
1
ex+ 0.4/3 E 3.3743 ± 2.262s Tl+ 0.02764 2/0.02414
]1/2= 3.3743 ± 0.0687 2
(b) s = 0.07436/ 27 (27 d.f.)
The interval in (a) is too narrow, owing to the Q = 0.0663 (1 d.f.); SL= P{ F 1.2 7 z 24.07} < 0.001.
treatment of repeat 14.3.5 (a) l::l::(y1i - yJ2 = 4.540; s 2 = 0.2389 (19 d.f.)
measurements as independent replicates.
13R2 sf= 2.495 (6 d.f.), s~ = 2.898 (6 d.f.), s 2 = 2.696 (12 d.f.). l:::E(ylj - y) 2 = 6.838 (22 d.f.); Q = 2.298 (3 d.f.)
D0 b, = 6:2: log(s 2/s?) = 0.03 (1 d.f.); SL~ 0.85. Fobs= (Q + 3)/s 2 = 3.21; SL= P{ F 3.!9 z 3.21 } = 0.046.
No evidence against hypothesis of equal variances. Some evidence that means are not equal.
(b) S 2 = 0.1324 0.08 0.3329 0.377; V; = 6 3 6
(71+ 71)'' 4;
2
µ2 - µIE 1.729 ± 2.179s = 1.729 ± 1.912. s 2 = 0.2389 (19 d.f.).
. D0 b, =:Ev; log (s 2/sf)= 101 (3 d.f.);
4.404 $ 12s2 /a 2 $ 23.34 gives 1.386 $ u 2 $ 7.346. One
possibility: divide 14 SL~ P(xf3 > z 3.01) = 0.39.
men into 7 pairs with nearly equal initial blood pressu
re. Choose one of pair No evidence against hypothesis of equal variances.
at random to get drug 1, and the other gets drug 14.3.8 Straight line model : :Eef = 118.91 (18 d.f.).
2. Analyze differences.
13R6 sf= 0.1983, s~ = 0.1820, sS = 0.1692, si = 0.1722, each with
8 d.f. s2 = 0.1804 5-sample model : :El:e5 = 117.27, s 2 = 7.82 (15 d.f.).
(32 d.f.).
Q = 1.64 (3 d.f.); Fobs= 0.07; SL= p { F 3.15 z O.D7}
D 0 b, = 8:2: log(s 2/st) = 0.06 (3 d.f.); SL~ 0.996. = 0.975.
Straight line model fits very well.
No evidence against hypothesis of equal varianc 14.4.1 '/3, V, and s 2 (4 d.f.) are given in Proble m 14.2.4.
es. In fact, the variance
estimates are so nearly equal that one might suspect
some tampering with the (a) /3 3 E '/3 3 ± 2.776(s 2 vJJ} 112 = 28.675 ± 2.985
data_
(b) (i) 0=P 2 -2/3 1 =b'/J wher eb'=( -2, 1, 0).
18.3 $ 32s 2 /u 2 $ 49.5 gives 0.1166 $ u 2 :::; 0.3155.
lJ = -2.375 ; var(tJ) = cu 2 where c = b' Vb= 19/ 8.
Alternatively, 14.7% LI is 0.1147 $ u 2 $ 0.3074.
T,,b, = (tJ- O)/(s2 c) 1 ' 2 = -0.88 ; SL=
14.1.l P{lt14 >z 0.88} = 0.43.
1 0 (ii) Under H, µ = X p where
0 0 0 0 0 0 0
1 0 2 0 1 0 0 0 0 0 0 X' = [ 1 2 0 3 1 2 3].
0 0010111 '
3 0 0 -1 0 0 0
0 0 4 0 New residual SS is 14.71 (5 d.f.).
0 0 0 0 0 Old residual SS is 12.335 (4 d.f.); s 2 = 3.08375
0 0 5 0 0 1 0 0 0 1 0 Q = 2.375 (1 d.f.); F 0 b, = 0.7702; SL= P{F . z 0.77} =
1 4 0.43
0 0 6 0 0 0 0 0 0 (c) For revised model, P = 28.8, v = 0.3684, s 2 =
3 33 2.942 (5 d.f.).
f33 E '/3 3 ± 2.571(s 2 v 33 ) 112 = 28.8 ± 2.677.
14.2.2 )(' X =[ n LX ] - [ 12 628] 14.4.3 (a) (J = /3 2 - /3 = b'P where b' = (0 1
:Ex rx 2 - 628 34416 ' 1 -1 0). lJ = 2.88; var (tJ) = cu 2
where c = b'Vb = 0.021. s2 = 4.3 (9 d.f.).
:Exy
J
X'y= [ :Ey = [ 1684]
89894
(J E 2.88 ± 2.262(cs 2) 1' 2 = 2.88 ± 0.680.
(b) T.b, = ('/J3 -O)/(V33S 2) 112 = -6.88;
(X'X )-1=[ 1.8495 -0.033 7 ]· P= I - I I -[80.89] SL= P{lt19 >1z6.88} < 0.001.
-0.033 7 0.000645 ' (XX) X Y- t.138 (c) Q = 167.6 - 38.7 = 128.9 (2 d.f.). Fobs= (Q + 2)/s 2
pt(X'y) = 1684 x 80.78 + 89894 x 1.138 = 238330 = 14.99;
2 SL= P{F 2 . 9 z 14.99} = 0.0014 .
l:y = 238822; rsf = 238822 - 233330 = 492. 14.5.6 (a) mf1 + mf2 + ··· + mf. = m ;. so m;;(l -
1 m;;) z 0.
14.2.5
x-[\;~;n rx{~ ~ ,~J X'y=
rl-154.5
755.4 .]
(c) ~f m., = 1, then mu= 0 for j ¥- i by resu!t in (a).•
µ,=m ,,y 1 +m12Y + ... +m;.y .=y,; e;=y; -µ;= 0
.
e .1.2 Y 1s sufficient but not2 minimal because L(u; - y) = L(cr; y).
4924.3 15.1.5 Yes, the pair (T , T ) is minimally sufficient.
1 2 L(O; y') and L(O; y) are propo r-
tional if and only if T,(y') = T1(y) and T (y') = Tz(y).
2

15.1.7 f(x₁ ... xₙ) = (1/2θ)ⁿ for −θ < x₁, ..., xₙ < θ. L(θ; x) = θ⁻ⁿ for −θ < x₍₁₎ and x₍ₙ₎ < θ; i.e. for θ > max{x₍ₙ₎, −x₍₁₎}. T = max{X₍ₙ₎, −X₍₁₎} is a sufficient statistic for θ.
15.1.9 L(µ) ∝ exp{−n(X̄ − µ)²/2σ² − m(Ȳ − µ)²/2kσ²} ∝ exp{−[(nk + m)µ² − 2(knX̄ + mȲ)µ]/2kσ²}. Thus T = knX̄ + mȲ is sufficient for µ. T ~ N((kn + m)µ, k(kn + m)σ²) by (6.6.8) and (6.6.7).
15.1.15 log θ, log µ, θ (or any constant multiples of them).
15.2.3 L(λ) = λⁿ exp(−λt) for λ > 0 where t = Σxᵢ². Thus T = ΣXᵢ² is a sufficient statistic for λ. The p.d.f. of Y = 2λX² is ½e^(−y/2), which is χ²₍₂₎, and hence Z = 2λT = Σ2λXᵢ² ~ χ²₍₂ₙ₎ by (6.9.7).
15.2.5 Problem 15.1.6 only.
15.3.1 T = X + Y is ancillary; base inferences on the conditional distribution of θ̂ given the observed T, or equivalently, on the distribution of X given the observed T. This conditional distribution is binomial (t, θ).
15.6.2 See Example 15.6.1.
    y₁, y₂:      0,4     1,3     *2,2    3,1     4,0
    D(y₁, y₂):   2.02    0.07    2.41    8.04    20.02
    P(y|t):      0.376   0.462   0.149   0.013   0.000
SL = P(D ≥ 2.41 | T = 4) = 0.162.
15.6.3 P(x|t) = C(x₁ + r − 1, r − 1) C(x₂ + r − 1, r − 1) / C(t + 2r − 1, 2r − 1).
l(p₁, p₂) = r log p₁ + x₁ log(1 − p₁) + r log p₂ + x₂ log(1 − p₂);
D = 2[l(p̂₁, p̂₂) − l(p̃₁, p̃₂)] where p̂ᵢ = r/(r + xᵢ) and p̃ᵢ = 2r/(2r + t).
r = 2, t = 16, Dobs = 2.41; D ≥ 2.41 for x₁ = 0, 1, 2, 14, 15, 16;
SL = P(D ≥ 2.41 | T = 16) = 0.194.
15.6.6 See Example 15.6.2 (test for independence in a 2 × 2 table). Dobs = 14.02; SL = g(0) + g(10) + g(11) + g(12) = 0.00054, where g(x) is the conditional (hypergeometric) probability of x.
15.6.10 Under H: µ₁ = µ₂ = 2µ₃, T = X₁ + X₂ + X₃ is sufficient.
f(x|t) = C(t; x₁, x₂, x₃)(2/5)^(x₁+x₂)(1/5)^(x₃) where Σxᵢ = t.
D(x) = 2Σxᵢ log(xᵢ/µ̂ᵢ) where µ̂₃ = t/5; µ̂₁ = µ̂₂ = 2t/5.
SL = P(D ≥ Dobs | T = t).
15.6.12 (a) T = X₁ + X₄; T ~ binomial (n, p).
(b) D = 2T log(2T/n) + 2(n − T) log[2(n − T)/n]; SL = P(D ≥ Dobs) = sum of binomial (n, ½) probabilities.
(c) P(x|t) = C(n; x₁, x₂, x₃, x₄) 2^(t−n) / C(n, t), where Σxᵢ = n and x₁ + x₄ = t.
D = 2ΣXᵢ log(Xᵢ/eᵢ) where e₁ = e₄ = t/2; e₂ = e₃ = (n − t)/2.
SL = P(D ≥ Dobs | T = t) ≈ P(χ²₍₂₎ ≥ Dobs).

APPENDIX B

Tables

Tables B1, B2
Standardized normal distribution

    F(x) = P{N(0, 1) ≤ x} = ∫₋∞ˣ (1/√(2π)) e^(−u²/2) du.

Table B1 gives the value x whose cumulative probability F(x) is the sum of the corresponding row and column headings. Example: the value x such that F(x) = .64 is 0.358 (from row .6 and column .04 of Table B1).
Table B2 gives the cumulative probability F(x), where x is the sum of the corresponding row and column headings. Example: the cumulative probability at value 0.36 is F(.36) = .6406 (from row .3 and column .06 of Table B2).
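For readers working by computer, the normal probabilities and percentiles in Tables B1 and B2 can also be computed directly. The short Python sketch below uses the error function together with a simple bisection for the inverse; the two printed checks correspond to the worked examples quoted above.

```python
# Minimal sketch: F(x) = P{N(0,1) <= x} and its inverse, as an alternative
# to reading Tables B1 and B2.
from math import erf, sqrt

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def norm_quantile(p, lo=-10.0, hi=10.0):
    # plain bisection; adequate for table-checking purposes
    for _ in range(100):
        mid = (lo + hi) / 2
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(norm_cdf(0.36), 4))       # 0.6406, as in Table B2
print(round(norm_quantile(0.95), 3))  # 1.645, as in Table B1
```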
Table B1. Percentiles of the Standardized Normal Distribution

F       .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
.5      .000    .025    .050    .075    .100    .126    .151    .176    .202    .228
.6      .253    .279    .305    .332    .358    .385    .412    .440    .468    .496
.7      .524    .553    .583    .613    .643    .674    .706    .739    .772    .806
.8      .842    .878    .915    .954    .994    1.036   1.080   1.126   1.175   1.227
.9      1.282   1.341   1.405   1.476   1.555   1.645   1.751   1.881   2.054   2.326

F        .975    .995    .999    .9995   .99995   .999995   .9999995
x        1.960   2.576   3.090   3.291   3.891    4.417     4.982
2(1−F)   .05     .01     .002    .001    .0001    .00001    .000001

Source: R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural and Medical Research, Table I; published by Longman Group Ltd., London (previously published by Oliver and Boyd, Edinburgh); reprinted by permission of the authors and publishers.

...,
~
00

Table 82. Standardized Normal


Cum ulat ive Distribution Functio
n
x .00 .01 .02 .03 .04 .05 .06 .07 .08
.0 .5000 .O'J
.5040 .5080 .5120
.1 .5160 .5199 .5239
.5398 .5438 .5478 .5517 .5279 .5319 .5359
.2 .5793 5557 .5596 .5636
.5832 .5871 .5910 .5675 .5714 .5753
.3 .6l7 9 .5948 .5987 .6026
.6217 .6255 .6293 .6064 .6103 .6141
.4 .6554 .6331 .6368 .6406
.6591 .6628 .6664 .6443 .6480 .6517
.6700 .6736 .6772
.5 .6915 .6950 .6808 .6844 .6879
.6985 .7019 .7054
.6 .7257 .7088 .7123 .7157
.7291 .7324 .7357 .7389 .7190 .7224
.7 .7580 .761 l .7422 .7454 .7486
.7642 .7673 .7703 .7517 .7549
.8 .7881 .7734 .7764 .7794
.7910 .7939 .7967 .7823 .7852
.9 .8159 .7995 .8023 .8051
.8186 .8212 .8238 .8078 .8106 .8131
.8264 .8289 .8315
1.0 .8413 .8438 .8340 .8365 .8389
.8461 .8485 .8508
LI 8643 .8531 .8554 .8577
.8665 .8686 .8708 .8599 .8621
1.2 .8849 .8729 .8749 .8770
.8869 .8888 .8907 .8790 .8810 8830
1.3 .90320 .8925 .8944 .8962
.90490 .90658 .90824 .8980 .8997 .90147
1.4 .91924 .90988 .91149 .9130')
.92073 .92220 .92364 .91466 .91621 .91774
.92507 .92647 .92785 .92922 .93056 .931'89
'O
>
"g
"-
;:;·
!;I)

.,...j
~

>
'O
)!
Table 82. Standardized Normal
Distribution (con tinu ed) "'
"-
;:;·
!;I)

x .00 .01 .02 .03 .04 .05'


.,
...j
.06 .07 .08 .09
er
1.5 [
.93319 .93448 .93574 .93669
1.6 .93822 .93943 .94062 .94179
.94520 .94630 .94738 .94845 .94295 .94408
1.7
.94950 .95053 .95154
.95543 .95637 .95728 .95818 .95254 .95352 .95449
1.8 .95907 .95994 .96080 .96164
.96407 .96485 .96562 .96638 .96246 .96327
L9 .96712 .96784 .96856 .96926
.97128 .97193 .97257 .97320 .96995 .97062
.97381 .97441 .97500 .97558
2.0 .97725 .97778 .97831 .97615 .97670
.97882 .97932 .97982 .98030
2.1 .98214 .98257 .98300 .98077 .98124 .98169
.98341 .98382 .98422 .98461
2.2 .98610 .98645 .98679 .98500 .98537 .98574
.98713 .98745 .98778 .98809
2.3 .98928 .98956 .98983 .98840 .98870 .98899
.990097 .990358 .990613 .990863
2.4 .991802 .992024 .992240 .991106 .991344 .991576
.992451 .992656 .992857 .993053
2.5 .993790 .993244 .993431 .993613
.993963 .994132 .994297 .994457
2.6 .995339 .994614 .994766 .994915
.995473 .995604 .995731 .995060 .995201
2.7 .995855 .995975 .9960')3 .996207
.996533 .996636 .996736 .996833 .996319 .996427
2.8 .996928 .997020 .997110 .997197
.997445 .997523 .997599 .997673 .997282 .997365
2.9 .997744 .997814 .997882
.998134 .998193 .998250 .997948 .998012 .998074
.998305 .998359 .998411 .998462
3.0 .998650 .998511 .998559 .998605
.998694 .998736 .998777 .998817 .998856 .998893 .998930
.998965 .998999

Source: A. Hald, Statistical Tables and Formulas (1952), Table II; reprinted by permission of John Wiley & Sons, Inc.

Table B3. Percentiles of the t Distribution
2
Table B4. Percentiles of the Chi-Square (χ²) Distribution

F(x) = P(χ²₍ᵥ₎ ≤ x) = ∫₀ˣ u^(v/2 − 1) e^(−u/2) du / [2^(v/2) Γ(v/2)]. The body of the table gives the values x corresponding to selected values of the cumulative probability (F) and degrees of freedom (v).
cumula tive probabi lity (F) and degrees of freedom (v).

~ .25 .5 .75 .9 .95 .975 .99 .995 .999


.01 .025 .05 .10
.005
.1015 .4549 l.323 2.706 3.841 5.024 6.635 7.879 10.83
1 .0 4 3927 .0 3 1571 .03 9821 .0 2 3932 .01579
.5754 1.386 2.773 4.605 5.991 7.378 9.210 10.60 13.82
2 .01003 .Q2010 .05064 .1026 .2107
1.213 2.366 4.108 6.251 7.815 9.348 l 1.34 12.84 1627
3 .07172 .1148 .2 158 .3518 .5844
3.357 5.385 7.779 9.488 11.14 13.28 14.86 18.47
4 .2070 .297 1 .4844 .7 107 1.604 1.923
2.675 4.351 6626 9.236 11.07 12.83 15.09 16.75 20.52
5 .4117 .5543 .8312 1.145 1.160
3.455 5.348 7.841 10.64 12.59 14.45 16.8 1 18.55 22.46
6 .6757 .872 1 1.237 1.635 2.204
4.255 6.346 9.037 12.02 14.07 16.01 18.48 20.28 24.32
7 .9893 1.239 1.690 2.167 2.833
5.071 7.344 10.22 13.36 15.51 17.53 20.09 21.96 26.13
8 1.344 1.646 2.180 2.733 3.490
5.899 8.343 11.39 14.68 16.92 19.02 21.67 23.59 27.88
9 1.735 2.088 2.700 3.325 4.168
4.865 6.737 9.342 12.55 15.99 18.31 20.48 23.21 25.19 29.59
10 2.156 2.558 3.247 3.940
5.578 7.584 10.34 13.70 17.28 19.68 2192 24. 72 26.76 31.26
11 2603 3.053 3.816 4.575
8.438 l 1.34 14.85 18.55 21.03 23.34 16.22 28.30 32.91
12 3.074 3.571 4.404 5.226 6.304
7.042 9.299 12.34 15.98 19.81 2236 24.74 27.69 29.82 34.53
13 3.565 4.107 5.009 5.892
7.790 10.17 13.34 17.12 21.06 23.68 26.12 29.14 31.32 36.12
14 4.075 4.660 5.629 6.571
8.547 11.04 14.34 18.25 22.31 25.00 2"49 30.58 32.80 37.70
15 4.601 5.229 6.262 7.2'51
(co• :inued on following page )
,._,.,
Vl
Table B4. Chi-Square Distribution (cont inued )

F
.005 .OJ .025 .05 .JO
v .25 .5 75 .9 .95 .975 .99 .995 .999
16 5.142 5.812 6.908 7.962 9.312 11.91 15.34 19.37 23.54 26.30 28.85
J7 5.697 6.408 7.564 8.672 10.09 32.00 34.27 39.25
12.79 16.34 20.49 24.77 27.59 30.19
18 6.265 7.015 8.231 9.390 10.86 33.41 35.72 40.79
13.68 17.34 21.60 25.99 28.87 31.53
19 6.844 7.633 8.907 10.12 34.SJ 37.16 42.31
11.65 14.56 18.34 22.72 27.20 30.14
20 7.434 8.260 9.591 10.85 32.85 36.19 38.58 43.82
12.44 15.45 19.34 23.83 28.41 31.41
21 34.17 37.57 40.00 45.32
8.034 8.897 J0.28 11.59 13.24 16.34 20.34 24.93 29.62 32.67 35.48
22 8.643 9.542 10.98 12.34 38.93 41.40 46.80
14.04 17.24 21.34 26.04 30.81 33.92
23 9.260 10.20 11.69 36.78 40.29 42.80 48.27
13.09 14.85 18.14 22.34 27.14 32.01
24 9.886 10.86 12.40 35.17 38.08 41.64 44.18 49.7 3
13.85 15.66 19.04 23 34 28.24 33.20
25 10.52 11.52 13.12 36.42 39.36 42.98 45.56 5U8
14.61 16.47 19.94 24.34 29.34 3438 37.65 40.65 44.31 46.93 52.62
26 l 1.16 12.20 13.84 15.38 17.29 20.84 25.34 30.43 35.56 38.89 41.92
27 11.81 12.88 14.57 16.15 45 .64 48.29 54.05
18.l l 21.75 26.34 31.53 36.74 40.l l
28 12.46 13.56 15.31 43.19 46.96 49.64 55.48
16.93 18.94 22.66 27.34 32.62 37.92
29 13.12 14.26 16.05 41.34 44.46 48.28 50.99 5689
17.71 19.77 23.57 28.34 33.71
30 13.79 14.95 39.09 42.56 45.72 49 .59 52.34
16.79 18.49 20.60 24.48 29.34 58.30
34.80 40.26 43.77 46.98 50.89 53.67 59.70

For v > 30, {(χ²/v)^(1/3) − 1 + 2/(9v)} √(9v/2) is approximately N(0, 1).

Source: E. S. Pearson and H. O. Hartley (editors), Biometrika Tables for Statisticians, Vol. I, Table 8; Cambridge University Press (3rd edition, 1966); reprinted by permission of the Biometrika Trustees.

·- ---· - - ; .-- . : ~~ .. - ~ -:;
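The normal approximation quoted above can be used to reproduce the tabulated percentiles for large v. The following Python sketch solves the approximation for x; for example, with v = 30 and the N(0, 1) percentile 1.645 it returns approximately 43.77, the value tabulated for v = 30 and F = .95.

```python
# Minimal sketch: approximate chi-square percentile from the N(0,1) approximation
#   {(x/v)^(1/3) - 1 + 2/(9v)} * sqrt(9v/2) = z   solved for x.
from math import sqrt

def chi2_percentile_approx(v, z):
    """z is the N(0,1) percentile matching the desired cumulative probability F."""
    return v * (1 - 2 / (9 * v) + z * sqrt(2 / (9 * v))) ** 3

print(round(chi2_percentile_approx(30, 1.645), 2))  # about 43.77 (Table B4, v = 30, F = .95)
```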

Table B5. Percentiles of the Variance Ratio (F) Distribution: n Numerator and m Denominator Degrees of Freedom

F(x) = P(F₍ₙ,ₘ₎ ≤ x) = ∫₀ˣ [Γ((n + m)/2) / (Γ(n/2) Γ(m/2))] (n/m)^(n/2) u^(n/2 − 1) (1 + nu/m)^(−(n+m)/2) du.
~
:0
0.
;:;
0
2 =
90th Percentiles (F = .9) .,a....,

~
[
1 2 3 4 5 6 8 12 24 oc;

1 39.86 49.50 53.59 55.83 57.24 58.20 59.44 60.70 62.00 63.33
2 8.53 9.00 9.16 9.24 9.29 9.33 9.37 9.41 9.45 9.49
3 5.54 5.46 5.39 5.34 5.31 5.28 5.25 5.22 5.18 5.13
4 4.54 4.32 4.19 4.11 4.05 4.01 3.95 3.90 3.83
5 4.06 3.78 3.62 3.76
3.52 3.45 3.40 3.34 3.27 3.19 3.10
6 3.78 3.46 3.29 3.18 3.11 3.05 2.98 2.90 2.82 2.72
7 3.59 3.26 3.07 2.96 2.88 2.83 2.75 2.67 2.58
8 3.46 3.11 2.92 2.47
2.81 2.73 2.67 2.59 2.50
9 3.36 3,01 2.40 2.29
2.81 2.69 2.61 2.55 2.47
10 3.28 2.38 2.28 2.16
2.92 2.73 2.61 2.52 2.46 2.38 2.28 2.18 2.06
12 3.18 2.81 2.61 2.48 2.39 2.33 2.24 2.15 2.04
15 3.07 2.70 2.49 2.36 1.90
2.27 2.21 2.12 2.02 1.90
20 2.97 2.59 2.38 2.25 l.76
2.16 2.09 2.00 1.89 1.77
25 2.92 2.53 2.32 1.61
2.18 2.09 2.02 1.93 1.82
30 2.88 2.49 2.28 1.69 1.52
2.14 2.05 1.98 1.88 1.77
40
1.64 1.46
2.84 2.44 2.23 2.09 2.00 1.93 1.83 l.71 1.57 1.38
60 2.79 2.39 2.18 2.04 1.95 1.87 1.77 1.66 l.S 1
"• ·"120 2.75 2.35 2.13 1.99 1.29
1.90 1.82 1.72 1.60 1.45
CJ:) 2.71 2.30 2.08 1.94 l.19
1.85 1.77 1.67 1.55 1.38 (;.>
1.00 V>
w
(continued on following page)
w
'-"
~

Table B5. Variance Ratio Distribution (conrinued)


95th Percentiles (F = .95)

~ 1 2 3 4 5 6 8 12 24 ro

161 2()() ::'. 16 225 230 234 239 244 249 254
2 18.5 19.0 19.2 19.2 19.3 19.3 19.4 19.4 19.5 19.5
3 101 9.55 9.28 9.12 9.01 8.94 8.84 8:14 8.64 8.53
4 7.71 6.94 6.59 6.39 6.26 6.16 6.04 5.91 5.77 5.63
5 6.61 5.79 5.41 5.19 5.05 4.95 4.82 4.68 4.53 4.36
6 5.99 5.14 4.76 4.53 4.39 4.28 4.15 4.00 3.84 3.67
7 5.59 4.74 4.35 4.12 3.97 3.87 3.73 3.57 3.41 3.23
8 5.32 4.46 4.07 3.84 3.69 3.58 3.44 3.28 j 12 2.93
9 5.12 4.26 3.86 3.63 3.48 3.37 3.23 3.07 2.90 2.7 1
10 14.96 4.10 3.71 3.48 3.33 3.22 3.07 2.91 2.74 2.54
12 4.75 3.88 3.49 3.26 3.11 3.00 2.85 2.69 2.50 2.30
15 4.54 3.68 3.29 3.06 2.90 2.79 2.64 2.48 2.29 2.07
20 4.35 3.49 3.10 2.87 2.71 2.60 2.45 2.28 2.08 1.84
25 4.24 3.38 2.99 2.76 2.60 2.49 2.34 2.16 1.96 1.71
30 4.17 3.32 2.92 2.69 2.53 2.42 2.27 2.09 1.89 1.62
40 4.08 3.23 2.84 2.61 2.45 2.34 2.18 2.00 1.79 1.51 >
"'O
60 4.00 3.15 2.76 2.52 2.37 2.25 2.10 1.92 1.70 1.39 ]
120 3.92 3.07 2.68 2.45 2.29 2.17 2.02 1.83 1.61 1.25 0.
;:;·
OC; 3.84 2.99 2.60 2.37 2.21 2.10 1.94 1.75 1.52 1.00 c::i

.,...;
er
if

>
"'O
ii
Table B5. Variance Ratio Distribution (continued) ";:;·0.
99th Percentiles (F = .99) c::i

;;;"

~
8 12 24 er
I 2 3 4 5 6 00
1f
1 4052 4999 5403 5625 5764 5859 5982 6106 6234 6366
2 98.5 99.0 99 .2 99.2 99.3 99.3 99.4 99.4 99.5 99.5
3 34.1 30.8 29.5 28.7 28.2 27.9 27.5 27.l 26.6 26.l
4 21.2 18.0 16.7 16.0 15.5 15.2 14.8 14.4 13.9 13.5
5 16.3 13.3 12.1 11.4 11.0 10.7 10.3 9.89 9.47 9.02
6 13.74 10.92 9.78 9.15 8.75 8.47 8.10 7.72 7.31 6.88
7 12.25 9.55 8.45 7.85 7.46 7.19 6.84 6.47 6.07 5.65
8 11.26 8.65 7.59 7.01 6.63 6.37 6.03 5.G? 5 28 4.86
9 10.56 8.02 6.99 6.42 6.06 5.80 5.47 5.i 1 4 73 4.31
10 10.04 7.56 6.55 5.99 5.64 5.39 5.06 4.71 4.33 3.91
12 9.33 6.93 5.95 5.41 5.06 4.82 4.50 4.16 3.78 3.36
15 8.68 6.36 5.42 4.89 4.56 4.32 4.00 3.67 3.29 2.87
20 8.10 5.85 4.94 4.43 4.10 3.87 3.56 3.'.!3 2.86 2.42
25 7.77 5.57 4.68 4.18 3.86 3.63 3.32 2.99 2.62 2.17
30 7.56 5.39 4.51 4.02 3.70 3.47 3.17 2.84 2.47 2.01
40 7.31 5.18 4.31 3.83 3.51 3.29 2.99 2.66 2.29 1.80
60 7.08 4.98 4.13 3.65 3.34 3.12 2.82 2.50 2 12 1.60
120 6.85 4.79 3.95 3.48 3.17 2.96 2.66 2.34 1.95 1.38
∞     6.64   4.60   3.78   3.32   3.02   2.80   2.51   2.18   1.79   1.00
Index

Achievable significance level, 190
Additional sum of squares, 253-267, 276
Alternative hypothesis, 191, 194
Analysis of differences, 234-240
Ancillary statistics, 291-295, 299, 317-320, 331-333
Association, 170-182
Bayesian methods, 321-327
Before and after measurements (see Paired data)
Behrens-Fisher problem, 332-333
Bias, 129
Block effects, 236
Calibration problem, 336
Causation, 179
Censoring, 32
Chi-square approximation, 107, 112, 121, 143, 145, 150
  tables, 351-352
CI (see Confidence interval)
Column space, 275
Combining likelihoods, 13, 22, 45, 152, 155
Comparable test statistics, 190
Composite hypothesis, 142, 149, 194, 300
Computational methods, 46, 54, 88, 91, 248
Conditional likelihood, 95
Conditional tests, 300-313
Confidence coefficient, 113
Confidence intervals, 113-123, 189, 298-300
  in normal linear model, 202-205, 261-264
Confidence region (see Confidence interval)
Consonance region, 186
Contingency table, 170-186
  independence hypothesis, 172-182, 307-311
  marginal homogeneity, 182-186, 312
Continuity correction, 307
Contour map, 61-65, 91
Corrected sum, 207, 221-222
Coverage probability, 102-123, 188, 290
CP (see Coverage probability)
Critical region, 190
Degrees of freedom, 145, 150, 201, 254

Dependent variable, 220 Independence hypothesis (see Hypothesis Likelihood region, 18, 61, 121 Normal linear models, 196-276
D.f. (see Degrees of freedom) of independence) Linear estimate, 130 assumptions, 196-200
Design of experiments (see Planning Independent likelihoods, 13-17, 19-22 Linear hypothesis, 254 checking the model, 225-229, 267-274
experiments) Independent variable, 220 Linear independence, 243, 254 confidence intervals, 203-205
Discrepancy measure, 136 Influential point, 227, 271 Linear model (see Normal linear model) estimation, 200-202, 247-248
Dose-response models, 74 Information function, 5, 40, 54, 126- Logistic model, 75-83, 158, 313 matrix notation, 242-247
128, 288-289 Log likelihood function, 4, 54 paired measurements, 234-237
expected, 124-128 Log-odds, 76 prediction, 329-336
transformation of, 44, 71-74 significance tests, 202-204, 252-2ti7
ED50, 79 sufficient statistics, 283, 287
Efficiency, 126 Invariance, 37-40, 54, 130-132
Marginal homogeneity (see Hypothesis of Null hypothesis, 191
Empirical Bayes methods, 32J marginal homogeneity)
Error variable, 197 Marginal likelihood, 95, 204
K-sample model, 216-220, 245, 255-258
Estimate Maximum likelihood equation, 5, 47, 54, One-sample model, 199, 206-2!2, 244
least squares (see Least squares 89 Orthogonal matrix, 274
estimate) Maximum likelihood estimates, 3, 54 Orthogonal transformation, 274
maximum likelihood (see Maximum Least squares estimates, 200
Left inverse, 248 combined or pooled, 15, 45 Outlier, 227, 269
likelihood estimate) computation of, 47, 54
unbiased, 129 Leverage, 268, 271
LI (see Likelihood interval) infinite, 22-23
Expected information, 124-128 invariance of, 37-39, 54, 130-!32 Paired data, 94-95, 187, 234-240
Explanatory variable, 220 Likelihood contour, 61-65
calculation of, 63-64, 91-92 in normal linear models, 200, 247 Parallel line model, 246
Exponential family, 195, 282-283, 285 sampling distribution of, 97-102 Parameter space, 5
Likelihood function, 3, 54
for continuous models, 25-3 7 and sufficient statistics, 288-295, 305 Parameter transformations, 43-44, 71-74,
factorization of, 68-69 Maximum likelihood interval, 62-66, 116
F distribution tables, 353-356 for frequency tables, 8-10 121-123 Pearson goodness of fit statistic, 161
Fiducial argument, 314-321 multi-parameter, 92-95 Maximum relative likelihood function, Pivotal quantity, 317
Fiducial distribution, 315--321, 324 normal approximation to, 40-45, 70- 65, 93 Planning experiments, 124-128, 181,
Fiducial probability, 315 74, 115, 122, 155, 288 Mean squared error, 129 230, 237, 259, 294
Fisher's measure of expected information, Measurement interval, 25-29 Polynomial model, 244
Likelihood interval, 18, 48-49
124 approximate, 40, 115 Method of Lagrange, 264 Pooled variance estimate, 213, 216-219
Fitted value, 201 coverage probability of, 102-120 Minimally sufficient, 279 Pooling class frequencies, 164, 169
Frequency properties, 96-133, 188-195, maximum, 62-65 Minimum variance unbiased, 129 Posterior distribution, 321
289-300 Likelihood ratio statistic, 97, 120, 141- MLE (see Maximum likelihood estimate) Power, 190-195
F-test, 254-260, 262 149 Multi-parameter likelihoods, 92-95 Prediction, 326-336
Functionally independent, 145 chi-square approximation, 107-113, Multiple regression, 243 Predictive distribution, 327
121, 145, 150 MVU (see Minimum variance unbiased) interval, 328
sampling distribution of, 97-102, 296- Prior distribution, 321-326
Goodness of fit tests, 161-170 300, 305 Probit model, 75
Likelihood ratio tests, 142 Natural parameter, 285 P-value, 136
for binomial probabilities, 156-160, Newton-Raphson method, 55, 77-78,
305-307 88-90
Hardy-Weinberg law, 301-304
for composite hypotheses, 149-156 Newton's method, 46-51, 91-92 Random error, 197
Homogeneity hypothesis (see Hypothesis
for equal variances, 218-219 Neyman-Pearson fundamental lemma, Randomization, 179--182
of homogeneity)
Hypothesis, 134, 141-142, 149, 190- for homogeneity, 152--156 192 Random sample size, 292
for independence, 170-179, 307-311 Noise, 197 Reference set, 296-300
195, 254
of homogeneity, 152-155, 218-219 for marginal homogeneity, 183-185 Normal approximations (see Likelihood Regression, 221, 243
of independence, 172-179, 307-311 for multinomial probabilities, 160-170 function, normal approximations Relative efficiency, 126
of marginal homogeneity, 182-186, in normal linear models, 252--267 to) Relative likelihood function, 17, 61
for simple hypotheses, 141-148 Normal distribution tables, 347-349 Residuals, 20 I, 269-273
312

standardized, 268 Target value, 197


Residual sum of squares, 201, 275 t distribution tables, 350
Response variable, 220 Test criterion, 136
RLF (see Relative likelihood function) Test of significance, 134-141
conditional, 300-3 13
Sample variance, 207 Test statistic , 136
Sampling distributions, 97-102 Testing a true hypothesis, 334-336
Sensitivity, 190-195 Transformations of data, 198, 223-224
Serial correlation, 270 of parameters (see Parameter
Set of sufficient statistics, 279 transfonnations)
Significance interval, 186-190 of sufficient statistics, 287
Significance level, 136, 289 t-test, 203. 260-266, 331
achievable, 190 Two-sample model , 199, 212-220, 245
Significance region, 186-190
Significance test (see Test of significance)
Simple hypothesis, 142, 192 Unbiased estimate, 129
Size α critical region, 190    Uniformly most powerful, 194
SL (see Significance level)
Solomon-Wynne experiment, 83
Standardized residual, 268 Variance estimation , 93-95, 202-222,
Straight line model , 199, 220-234, 243 253-258
through the origin, 233 Variance ratio distribution (see F
Student's distribution (see t distribution) distribution)
Sufficiency principle, 277- 279
Sufficient statistics, 279-289

Weibull distribution, 56-58, 63 , 67 , 72


Tables, 347-356 Weighted least squares, 206
