
Generalized Linear Models

Author(s): J. A. Nelder and R. W. M. Wedderburn

Source: Journal of the Royal Statistical Society. Series A (General), Vol. 135, No. 3 (1972), pp. 370-384
Published by: Blackwell Publishing for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2344614
J. R. Statist. Soc. A (1972), 135, Part 3, p. 370

Generalized Linear Models

By J. A. NELDER and R. W. M. WEDDERBURN


Rothamsted Experimental Station, Harpenden, Herts

SUMMARY
The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions; the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).
The implications of the approach in designing statistics courses are discussed.
Keywords: ANALYSIS OF VARIANCE; CONTINGENCY TABLES; EXPONENTIAL FAMILIES; INVERSE POLYNOMIALS; LINEAR MODELS; MAXIMUM LIKELIHOOD; QUANTAL RESPONSE; REGRESSION; VARIANCE COMPONENTS; WEIGHTED LEAST SQUARES

INTRODUCTION
LINEAR models customarily embody both systematic and random (error) components, with the errors usually assumed to have normal distributions. The associated analytic technique is least-squares theory, which in its classical form assumed just one error component; extensions for multiple errors have been developed primarily for analysis of designed experiments and survey data. Techniques developed for non-normal data include probit analysis, where a binomial variate has a parameter related to an assumed underlying tolerance distribution, and contingency tables, where the distribution is multinomial and the systematic part of the model usually multiplicative. In both these examples there is a linear aspect to the model; thus in probit analysis the parameter $p$ is a function of tolerance $Y$ which is itself linear on the dose (or some function thereof), and in a contingency table with a multiplicative model the logarithm of the expected probability is assumed linear on classifying factors defining the table. Thus for both, the systematic part of the model has a linear basis. In another extension (Nelder, 1968) a certain transformation is used to produce normal errors, and a different transformation of the expected values is used to produce linearity.

So far we have mentioned models associated with the normal, binomial and multinomial distributions (this last can be thought of as a set of Poisson distributions with constraints). A further class is based on the $\chi^2$ or gamma distribution and arises in the estimation of variance components from independent quadratic forms derived from the original observations. Again the systematic component of the model has a linear structure.

In this paper we develop a class of generalized linear models, which includes all the above examples, and we give a unified procedure for fitting them based on likelihood. This procedure is a generalization of the well-known one described by Finney (1952) for maximum likelihood estimation in probit analysis. Section 1 defines the models, and Section 2 develops the fitting process and generalizes the analysis of variance. Section 3 gives examples with four special distributions for the random components. In Section 4 we consider the usefulness of the models for courses of instruction in statistics.
1.1. The Random Component
Suppose our observations $z$ come from a distribution with density function

$$\pi(z;\theta,\phi) = \exp[\alpha(\phi)\{z\theta - g(\theta) + h(z)\} + \beta(\phi,z)],$$

where $\alpha(\phi) > 0$, so that for fixed $\phi$ we have an exponential family. The parameter $\phi$ could stand for a certain type of nuisance parameter such as the variance $\sigma^2$ of a normal distribution or the parameter $p$ of a gamma distribution (see Section 3.4). We denote the mean of $z$ by $\mu$.

We require expressions for the first and second derivatives of the log-likelihood in terms of the mean and variance of $z$ and the scale factor $\alpha(\phi)$. We use the results (see, for example, Kendall and Stuart, 1967, p. 9)

$$E(\partial L/\partial\theta) = 0 \qquad (1)$$

and

$$E(\partial^2 L/\partial\theta^2) = -E(\partial L/\partial\theta)^2, \qquad (2)$$

where the differentiation under the sign of integration used in their derivation can be justified by Theorem 9 of Lehmann (1959).

We have

$$\partial L/\partial\theta = \alpha(\phi)\{z - g'(\theta)\}.$$

Then (1) implies that

$$\mu = E(z) = g'(\theta);$$

hence

$$\partial L/\partial\theta = \alpha(\phi)(z-\mu). \qquad (3)$$

From (2) we obtain $\alpha(\phi)\,g''(\theta) = \{\alpha(\phi)\}^2\,\mathrm{var}(z)$, whence

$$g''(\theta) = \alpha(\phi)\,\mathrm{var}(z) = V, \text{ say}, \qquad (4)$$

so that $V$ is the variance of $z$ when the scale factor is unity. Then

$$\partial^2 L/\partial\theta^2 = -\alpha(\phi)\,V. \qquad (5)$$

We note also that

$$V = d\mu/d\theta. \qquad (6)$$
For a one-parameter exponential family $\alpha(\phi) = 1$, and we can write

$$\pi(z;\theta) = \exp\{z\theta - g(\theta) + h(z)\},$$

so that

$$\partial L/\partial\theta = z - \mu$$

and

$$-\partial^2 L/\partial\theta^2 = V = \mathrm{var}(z).$$
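For instance, taking the Poisson distribution of Section 3.2 as a small worked instance of the family, we may write

$$\pi(z;\theta) = \exp\{z\ln\mu - \mu - \ln z!\} = \exp\{z\theta - e^{\theta} - \ln z!\}, \qquad \theta = \ln\mu,\quad g(\theta) = e^{\theta},\quad h(z) = -\ln z!,\quad \alpha(\phi) = 1,$$

so that $\mu = g'(\theta) = e^{\theta}$ and $V = g''(\theta) = e^{\theta} = \mu$, in agreement with (3)-(6).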

1.2. The Linear Model for Systematic Effects


The term"linearmodel" usuallyencompassesboth systematic and random
components
in a statistical
model,butwe shallrestrict
thetermto includeonlythe
systematic
components. We write
m
Y= E/3X2
i=1
whenthexi are independentvariateswhosevaluesare supposedknownand Piqare
parameters.The Piqmayhave fixed(known)valuesor be unknownand require
estimation.
Anindependentvariatemaybequantitative
andproducea singlex-variate
andproducea setof x-variates
inthemodel,orqualitative whosevaluesare0 and 1,
or mixed.Considerthemodel
Yjj=ajf+Pujj+jvq,j (i= 1,...,n,j=, 1...,p),
wherethedataareindexedbyfactors whoselevelsaredenotedbyi andj. Theterm
(x includes
n parameters witha qualitative
associated variaterepresented
byn dummy
x-variatecomponents takingvalues1 foroneleveland0 fortherest; luij represents
a quantitative
variate,
namelyu withsingleparameterP,.andyjvi showsp parameters
yj associatedwitha mixedindependent variatewhosep components takethevalues
ofvijforonelevelofj andzerofortherest.A notation suitableforcomputer usehas
been developedby C. E. Rogersand G. N. Wilkinson and is to be publishedin
AppliedStatistics.
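A minimal sketch of how such a design matrix might be assembled follows; the data values and names are hypothetical, chosen only to illustrate the qualitative, quantitative and mixed variates of the model above.

    import numpy as np

    # Hypothetical layout: n = 2 levels of the qualitative factor i,
    # p = 3 levels of the factor j, one observation per (i, j) cell.
    i_level = np.array([0, 0, 0, 1, 1, 1])          # qualitative variate
    j_level = np.array([0, 1, 2, 0, 1, 2])
    u = np.array([1.2, 0.7, 3.1, 2.4, 0.9, 1.8])    # quantitative variate
    v = np.array([5.0, 2.0, 4.0, 1.0, 3.0, 6.0])    # carrier for the mixed variate

    # Dummy x-variates for alpha_i: one column per level, 1 for that level, 0 otherwise
    X_alpha = (i_level[:, None] == np.arange(2)).astype(float)
    # Single x-variate for beta * u_ij
    X_beta = u[:, None]
    # Mixed variate for gamma_j: column j holds v_ij for observations at level j, else 0
    X_gamma = (j_level[:, None] == np.arange(3)) * v[:, None]

    X = np.hstack([X_alpha, X_beta, X_gamma])       # the full model matrix
    print(X)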
1.3. The Generalized Linear Model
We now combine the systematic and random components in our model to produce the generalized linear model. This is characterized by:

(i) A dependent variable $z$ whose distribution with parameter $\theta$ is one of the class in Section 1.1.

(ii) A set of independent variables $x_1,\dots,x_m$ and predicted $Y = \sum\beta_i x_i$ as in Section 1.2.

(iii) A linking function $\theta = f(Y)$ connecting the parameter $\theta$ of the distribution of $z$ with the $Y$'s of the linear model.

When $z$ is normally distributed with mean $\theta$ and variance $\sigma^2$ and when $\theta = Y$, we have ordinary linear models with Normal errors. Other examples of these models will be described in Section 3 under the various distributions of the exponential type. We now consider the solution of the maximum likelihood equations for the parameters of the generalized linear models and show its equivalence to a procedure of iterative weighted least squares.
2. FITTING THE MODELS
2.1. The Maximum Likelihood Equations
The solution of the maximum likelihood equations is equivalent to an iterative weighted least-squares procedure with a weight function

$$w = (d\mu/dY)^2/V$$

and a modified dependent variable (the working probit of probit analysis)

$$y = Y + (z-\mu)/(d\mu/dY),$$
where $\mu$, $Y$ and $V$ are based on current estimates. This generalizes the results of Nelder (1968).
Proof. Writing $L$ for the log-likelihood from one observation we have, from (3) and (6),

$$\frac{\partial L}{\partial\beta_i} = \alpha(\phi)(z-\mu)\frac{d\theta}{d\mu}\frac{d\mu}{dY}\frac{\partial Y}{\partial\beta_i} = \alpha(\phi)\Big\{\frac{z-\mu}{V}\Big\}\frac{d\mu}{dY}\,x_i \qquad (7)$$

and

$$\frac{\partial^2 L}{\partial\beta_i\,\partial\beta_j} = \frac{\partial^2 L}{\partial Y^2}\,x_i x_j,$$

where

$$\frac{\partial^2 L}{\partial Y^2} = \frac{\partial^2 L}{\partial\theta^2}\Big(\frac{d\theta}{dY}\Big)^2 + \frac{\partial L}{\partial\theta}\frac{d^2\theta}{dY^2} = \alpha(\phi)\Big\{-V\Big(\frac{d\theta}{dY}\Big)^2 + (z-\mu)\frac{d^2\theta}{dY^2}\Big\}. \qquad (8)$$

The expected second derivative with negative sign is thus

$$\alpha(\phi)\{(d\mu/dY)^2/V\}\,x_i x_j \quad\text{from (8)}.$$

Writing $w$ for the weight function $(d\mu/dY)^2/V$, (7) gives

$$\partial L/\partial\beta_i = \alpha(\phi)\,w\,x_i(z-\mu)/(d\mu/dY).$$

Thus the Newton-Raphson process with expected second derivatives (equivalent to Fisher's scoring technique) for a sample of $n$ gives

$$\mathbf{A}\,\delta\boldsymbol\beta = \mathbf{C}, \qquad (9)$$

where $\mathbf{A}$ is an $m\times m$ matrix with

$$A_{ij} = \sum_{k=1}^{n} w_k x_{ik} x_{jk}$$

and $\mathbf{C}$ is an $m\times 1$ vector with

$$C_i = \sum_k w_k x_{ik}(z_k-\mu_k)/(d\mu_k/dY_k).$$

Finally we have

$$(\mathbf{A}\boldsymbol\beta)_i = \sum_j A_{ij}\beta_j = \sum_k w_k x_{ik} Y_k,$$

so that (9) may be written

$$\mathbf{A}\boldsymbol\beta^* = \mathbf{r},$$

where $r_i = \sum_k w_k x_{ik} y_k$, $y_k = Y_k + (z_k-\mu_k)/(d\mu_k/dY_k)$ and $\boldsymbol\beta^* = \boldsymbol\beta + \delta\boldsymbol\beta$.

Starting method
In practice we can obtain a good starting procedure for iteration as follows: take as a first approximation $\hat\mu = z$ and calculate $Y$ from it; then calculate $w$ as before and set $y = Y$. Then obtain the first approximation to the $\beta$'s by regression. The method may need slight modification to deal with extreme values of $z$. For instance, with the binomial distribution it will probably be adequate to replace instances of $z = 0$ or $z = n$ with $z = \tfrac{1}{2}$ and $z = n - \tfrac{1}{2}$, where, e.g. with the probit and logit transformations, $\mu = 0$ or $\mu = n$ would lead to infinite values for $Y$.
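As an illustration of the fitting process just described, the following sketch implements the iterative weighted least-squares cycle for the particular case of Poisson errors with a logarithmic link, for which $d\mu/dY = \mu$ and $V = \mu$, so that $w = \mu$ and $y = Y + (z-\mu)/\mu$. It is a minimal sketch in modern notation rather than a reproduction of any program referred to in this paper; the function and data names are ours.

    import numpy as np

    def irls_poisson_log(X, z, n_iter=25, tol=1e-10):
        # Iterative weighted least squares (Section 2.1) for Poisson errors, log link:
        # weight w = (dmu/dY)^2 / V = mu, working variable y = Y + (z - mu)/mu.
        mu = np.where(z > 0, z, 0.5)           # starting values: mu = z, adjusted away from 0
        Y = np.log(mu)
        beta = np.zeros(X.shape[1])
        for _ in range(n_iter):
            w = mu                              # weight function
            y = Y + (z - mu) / mu               # working dependent variable
            WX = X * w[:, None]
            beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * y))   # solve A beta* = r
            if np.max(np.abs(beta_new - beta)) < tol:
                beta = beta_new
                break
            beta = beta_new
            Y = X @ beta
            mu = np.exp(Y)
        return beta

    # A small illustrative 2 x 3 table of counts fitted with an additive log-linear model
    z = np.array([7., 3., 4., 13., 11., 15.])
    X = np.column_stack([np.ones(6),
                         np.repeat([0., 1.], 3),        # row indicator
                         np.tile([0., 1., 0.], 2),      # column 2 indicator
                         np.tile([0., 0., 1.], 2)])     # column 3 indicator
    print(irls_poisson_log(X, z))

The same loop serves any of the distributions of Section 3 once $w$ and $y$ are replaced by the appropriate expressions for the chosen link.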
2.2. Sufficient Statistics
An important special case occurs when $\theta$, the parameter of the distribution of the random element, and $Y$, the predicted value of the linear model, coincide. Then

$$L = zY - g(Y) + h(z),$$

and, using (3),

$$\partial L/\partial\beta_i = \alpha(\phi)(z-\mu)\,x_i.$$

The maximum likelihood equations are then of the form $\sum_k \alpha(\phi)(z_k-\mu_k)x_{ik} = 0$, the summation being over the observations. Hence we have

$$\sum_k z_k x_{ik} = \sum_k \hat\mu_k x_{ik}. \qquad (10)$$

For a qualitative independent variate, this implies that the fitted marginal totals with respect to that variate will be equal to the observed ones.
From the expression for $L$ we see that the quantities $\sum_k z_k x_{ik}$ are a set of sufficient statistics. Also, in (8) $d^2\theta/dY^2 = 0$ and so

$$-\frac{\partial^2 L}{\partial\beta_i\,\partial\beta_j} = \alpha(\phi)\,V\,x_i x_j. \qquad (11)$$

When $\theta$ is also the mean of the distribution, i.e. $\mu = \theta = Y$, we have the usual linear model with normal errors, for $g'(\theta) = \theta$ gives

$$g(\theta) = \tfrac{1}{2}\theta^2 + \text{const},$$

which uniquely determines the distribution as Normal with variance $1/\alpha(\phi)$ (using Theorem 1 of Patil and Shorrock, 1965). The sub-class of models for which there are sufficient statistics was noted by Cox (1968), and Dempster (1971) has extended it to include many dependent variates.
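Property (10) is easy to verify numerically with any modern GLM routine; the sketch below (using the statsmodels library, which of course is not referred to in this paper, on hypothetical counts) fits a Poisson model with the canonical log link and checks that the observed and fitted totals $\sum_k z_k x_{ik}$ and $\sum_k \hat\mu_k x_{ik}$ agree.

    import numpy as np
    import statsmodels.api as sm

    z = np.array([12., 5., 9., 20., 8., 14.])             # hypothetical counts
    X = np.column_stack([np.ones(6),                       # constant
                         [0., 0., 0., 1., 1., 1.],         # dummy for a two-level factor
                         [1., 2., 3., 1., 2., 3.]])        # a quantitative x-variate
    fit = sm.GLM(z, X, family=sm.families.Poisson()).fit()
    mu_hat = fit.fittedvalues

    print(X.T @ z)        # observed totals  sum_k z_k x_ik
    print(X.T @ mu_hat)   # fitted totals    sum_k mu_hat_k x_ik  (should agree)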

2.3. The Analysis of Deviance


A linear model is said to be ordered if the fitting of the $\beta$'s is to be done in the same sequence as their declaration in the model. Ordering (or partial ordering) may be implied by the structure of the model; for instance it makes no sense to fit an interaction term $(ab)_{ij}$ before fitting the corresponding main effects $a_i$ and $b_j$. It may also be implied by the objectives of the fitting, i.e. if a trend must be removed first before the fitting of further effects. More commonly, however, the ordering is to some extent arbitrary, and this gives rise to difficult problems of inference which we shall not try to tackle here. For ease of exposition of the basic ideas we shall assume that the model under consideration is ordered, and will be fitted sequentially a term at a time. The objectives of the fitting will be to assess how many terms are required for an adequate description of the data, and to derive the associated estimates of the parameters and their information matrix.
Two extreme models are conceivable for any set of data, the minimal model which contains the smallest set of terms that the problem allows, and the complete model in which all the $Y$'s are different and match the data completely so that $\hat\mu = z$. An extreme case of the minimal model is the null model, which is equivalent to fitting the grand mean only and effectively consigns all the variation in the data to the random component of the model, while the complete model fits exactly and so consigns all the variation in the data to the systematic part. The model-fitting process with an ordered model thus consists of proceeding a suitable distance from the minimal model towards the complete model. At each stage we trade increasing goodness-of-fit to the current set of data against increasing complexity of the model. The fitting of the parameters at each stage is done by maximizing the likelihood for the current model, and the matching of the model to the data will be measured quantitatively by the quantity $-2L_{\max}$, which we propose to call the deviance. For the four special distributions the deviance takes the form:

Normal      $\sum(z-\hat\mu)^2/\sigma^2$,

Poisson     $2\{\sum z\ln(z/\hat\mu) - \sum(z-\hat\mu)\}$,

Binomial    $2[\sum z\ln(z/\hat\mu) + \sum(n-z)\ln\{(n-z)/(n-\hat\mu)\}]$,

Gamma       $2p\{-\sum\ln(z/\hat\mu) + \sum(z-\hat\mu)/\hat\mu\}$.

Note that the deviance is measured from that of the complete model, so that terms involving constants, the data alone, or the scale factor alone are omitted. The second term in the expressions for the Poisson and gamma distributions is commonly identically zero (see Appendix for conditions and proof).
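For the discrete cases the deviance is straightforward to compute directly from the observations and fitted means; the following small functions are our own illustrative code, with the convention that a term $z\ln(z/\hat\mu)$ is taken as zero when $z = 0$.

    import numpy as np

    def deviance_poisson(z, mu):
        # 2 { sum z ln(z/mu) - sum (z - mu) }
        zlog = np.where(z > 0, z * np.log(np.where(z > 0, z / mu, 1.0)), 0.0)
        return 2.0 * (zlog.sum() - (z - mu).sum())

    def deviance_binomial(z, mu, n):
        # 2 [ sum z ln(z/mu) + sum (n - z) ln{(n - z)/(n - mu)} ]
        t1 = np.where(z > 0, z * np.log(np.where(z > 0, z / mu, 1.0)), 0.0)
        t2 = np.where(n - z > 0,
                      (n - z) * np.log(np.where(n - z > 0, (n - z) / (n - mu), 1.0)),
                      0.0)
        return 2.0 * (t1 + t2).sum()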
Associated with each model is a quantity $r$ termed the degrees of freedom which is given by the rank of the $\mathbf{X}$ matrix, or equivalently the number of linearly independent parameters to be estimated. For a sample of $n$ independent observations, the deviance for the model has residual degrees of freedom $(n-r)$. The degrees of freedom, multiplied where necessary by a scale factor, form a scale for a set of sequential models with which deviances can be compared; when (residual degrees of freedom × scale factor) is approximately equal to the deviance of the current model then it is unlikely that further fitting of systematic components is worthwhile. The scale factor may be known (e.g. unity for the Poisson distribution) or unknown (e.g. for the normal distribution with unknown variance). If unknown it may be estimable directly, e.g. by replicate observations, or indirectly from the deviance after an adequate model has been fitted. The adequacy of the model may be determined by plotting successive deviances against their degrees of freedom, and accepting as a measure of the scale factor the linear portion through the origin determined by those points with fewest degrees of freedom.
2.4. The Generalization of Analysis of Variance
The first differences of the deviances for the normal distribution are (apart from a scale factor) the sums of squares in the analysis of variance for a sequential fit, as shown for a three-term model in Table 1.

TABLE 1
Deviances and their differences

Model term     Deviance     Difference          Component

Minimal        d_m
A              d_A          d_m - d_A           A
B              d_AB         d_A - d_AB          B eliminating A
C              d_ABC        d_AB - d_ABC        C eliminating A and B
Complete       d_c          d_ABC - d_c         Residual

The generalized analysis of variance for a sequential model is now defined to have components given by the first differences of the deviance, with degrees of freedom defined as above. These components have distributions proportional to $\chi^2$, exactly for normal errors, approximately for others. Such a generalization of the analysis of variance was suggested by Good (1967).

3. SPECIAL DISTRIBUTIONS
3.1. The Normal Distribution
Here we have

$$L = \frac{1}{\sigma^2}\left(z\mu - \tfrac{1}{2}\mu^2 - \tfrac{1}{2}z^2\right) - \tfrac{1}{2}\ln\sigma^2,$$

and in the notation of Section 1.1

$$\theta = \mu, \quad V = 1.$$
Inverse polynomials provide an example where we assume that the observations $u$ are normal on the log scale and the systematic effects additive on the inverse scale. Then

$$z = \ln u \quad\text{and}\quad Y = e^{-\mu}.$$

Nelder (1966) gives examples of inverse polynomials calculated using the first approximation of the method in this paper.
More generally, as shown in Nelder (1968), we can consider models in which there is a linearizing transformation $f$ and a normalizing transformation $g$. This means that if the observations are denoted by $u$, then $g(u)$ is normally distributed with mean $\mu$ and constant variance $\sigma^2$ and $f\{g^{-1}(\mu)\} = \sum\beta_i x_i$. Then we have $V = 1$ and $Y = f\{g^{-1}(\mu)\}$, so that

$$w = [(d/dY)\,g\{f^{-1}(Y)\}]^2$$

and

$$y = Y + \{g(u)-\mu\}/[(d/dY)\,g\{f^{-1}(Y)\}].$$
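As a small worked instance of these formulae (our own illustration, using the inverse-polynomial case mentioned above), take $g(u) = \ln u$ and $f(t) = 1/t$; then

$$\mu = g\{f^{-1}(Y)\} = \ln(1/Y) = -\ln Y, \qquad d\mu/dY = -1/Y,$$

so that

$$w = 1/Y^2 \quad\text{and}\quad y = Y - Y\{\ln u + \ln Y\},$$

in agreement with the relation $Y = e^{-\mu}$ given above.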

Example: Fisher's tuberculin-test data
Fisher (1949) published 16 measurements of tuberculin response which were classified by three four-level factors in a Latin-square type of arrangement as in Table 2.

TABLE 2

                        Cow class
Sites        I      III     II      IV       Treatments

3+6         454     249     349     249       A B C D
4+5         408     322     312     347       B A D C
1+8         523     268     411     285       C D A B
2+7         364     283     266     290       D C B A

Fisher gave reasons for believing that the variances of the observations were proportional to their expectation, and that the systematic part of the model was linear on the log-scale. The treatments were:

B  Standard single
A  Standard double
D  Weybridge half
C  Weybridge single

and these were treated as a 2 × 2 factorial arrangement, no interaction being fitted.
If the data had been Poisson observations the maximum likelihood estimates would have had the property that the marginal totals of the fitted values (on the untransformed scale) would be equal to the marginal totals of the observations. Although the observations were not, in fact, counts but measurements in millimetres, Fisher decided to estimate the effects as if they were Poisson observations. He produced approximations to the effects by a method which made use of features of the particular Latin square, and then verified that these gave fitted values with margins approximately equal to the observed ones.

Another approach would be to treat the square roots of the observations as normally distributed with variance $\sigma^2/4$. Then we have $z = \surd u$, where $u$ is an observation, and

$$Y = 2\ln\mu.$$

Fisher gave estimates of effects on the log scale relative to B. We produced estimates by the square-root/logarithmic method just described, and the two sets of estimates are given in Table 3.
TABLE 3

      Fisher's result     Our result

B         0.0000            0.0000
A         0.2089            0.2092
D         0.0019            0.0023
C         0.2108            0.2115

The method of this paper can also be used to analyse the data as if they were Poisson observations. The estimates of effects obtained by this method agree with our other estimates to about four decimal places.

3.2. The Poisson Distribution
Here $L = z\ln\mu - \mu$, so we have $\theta = \ln\mu$ and $V = \mu$.

When $Y = \ln\mu$ there are sufficient statistics and a unique maximum likelihood estimate of $\boldsymbol\beta$, provided it is finite. It will always be finite if there are no zero observations. If $Y = \mu^\lambda$ $(0 < \lambda < 1)$, $L\to-\infty$ as $|\boldsymbol\beta|\to\infty$ and hence $L$ must have a maximum for finite $\boldsymbol\beta$. Also

$$\frac{\partial^2 L}{\partial\beta_i\,\partial\beta_j} = \frac{\partial^2 L}{\partial Y^2}\,x_i x_j.$$

It is easily verified that $\partial^2 L/\partial Y^2 < 0$ and hence the matrix of second derivatives of $L$ is negative definite. It follows that $\hat{\boldsymbol\beta}$ is uniquely determined. When $Y = \mu$ the same result holds provided that the x's are linearly independent when units with $z = 0$ are excluded.
The main application of generalized linear models with Poisson errors is to contingency tables. These arise from data on counts classified by two or more factors, and the literature on them is enormous (see, for example, Simpson, 1951; Ireland and Kullback, 1968; Chapter 8 of Kullback, 1968; Ku et al., 1971).

Probabilistic models for contingency tables are built on assumptions of a multinomial distribution or a set of multinomial distributions. As Birch (1963) has shown, the estimation of a set of independent multinomial distributions is equivalent to the estimation of a set of independent Poisson distributions, and in what follows we shall regard a contingency table as a set of independent Poisson distributions.

The systematic part of models of contingency tables is usually multiplicative, and thus gives sufficient statistics with Poisson errors. The model terms usually correspond to qualitative x's, the equivalent of constant fitting, but quantitative terms occur naturally when the classifying factors have an underlying quantitative basis.
Example: A contingency table
Maxwell (1961, pp. 70-72) discusses the analysis of a 5 × 4 contingency table giving the number of boys with four different ratings for disturbed dreams in five different age groups. The data are given in Table 4. The higher the rating the more the boy suffers from disturbed dreams.

TABLE 4

Age in                   Rating
years         4      3      2      1     Total

 5- 7         7      3      4      7      21
 8- 9        13     11     15     10      49
10-11         7     11      9     23      50
12-13        10     12      9     28      59
14-15         3      4      5     32      44

Total        40     41     42    100     223

Here we can fit main effects and a linear × linear interaction using an x-variate of the form $uv$, where $u = -2, -1, 0, 1$ or $2$ according to the age group and $v$ is the rating for disturbed dreams.

The estimated linear × linear interaction is -0.205.
We can form an analysis of deviance thus (regarding main effects as fixed):

Model term(s)                        Deviance    Difference    Degrees of freedom

Minimal (i.e. main effects only)      32.46        18.38               1
Linear × linear                       14.08        14.08              11
Complete                                  0

Treating 18.38 and 14.08 as $\chi^2$ variates with 1 and 11 degrees of freedom respectively, we find that 18.38 is large while 14.08 is close to expectation. We conclude that the data are adequately described by a negative linear × linear interaction (indicating that the dream rating tends to decrease with age).
Maxwell, using the method of Yates (1948), obtained a decomposition of a Pearson $\chi^2$ as follows:

Source of variation                        Degrees of freedom      $\chi^2$

Due to linear regression                            1               17.94
Due to departure from linear regression            11               13.73

Total                                              12               31.67

Maxwell's values of $\chi^2$ are clearly quite close to ours and his conclusions are essentially the same.
3.3. The Binomial Distribution
We re-write the usual form

$$L = r\ln p + (n-r)\ln q$$

as

$$L = z\ln(\mu/n) + (n-z)\ln\{(n-\mu)/n\} = z\ln\{\mu/(n-\mu)\} + n\ln(n-\mu) + \text{terms in } n,$$

i.e. we put $z$ for $r$, and $\mu = E(z) = np$. Thus

$$\theta = \ln\{\mu/(n-\mu)\}$$

and

$$V = \mu(n-\mu)/n.$$

Sufficient statistics are provided by the logit transformation, giving

$$\mu = n\,e^{Y}/(1+e^{Y}).$$
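As a small worked illustration (ours, not in the original text) of the weight and working variable of Section 2.1 for this link: since $d\mu/dY = \mu(n-\mu)/n = V$, we have

$$w = (d\mu/dY)^2/V = \mu(n-\mu)/n \quad\text{and}\quad y = Y + \frac{n(z-\mu)}{\mu(n-\mu)},$$

which are the familiar weights and working logits of logit analysis.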

Probit analysis
We put

$$\mu = n\,\Phi(Y),$$

where $\Phi$ is the cumulative normal distribution function. There are no sufficient statistics here; the analysis via iterative weighted least-squares is well known (Finney, 1952).

Fitting constants on a logit scale
This technique was introduced by Dyke and Patterson (1952) and is applied to multiway tables of proportions. It is a special case of the logistic transformation when all the x's are qualitative, and yields as sufficient statistics the total responding $(\sum z)$ for each relevant margin.

The logit analogue of probit analysis is, of course, formally identical, with quantitative rather than qualitative x's. Again arbitrary mixtures of x types do not introduce anything new. Models based on the logistic transform have been extensively developed by Cox (1970).

From the results of Birch (1963) mentioned above, it follows that models with independent binomial data are equivalent to models with independent Poisson data. Bishop (1969) showed that a binomial model that is additive on the logit scale can be treated as a Poisson model additive on the log scale.

3.4. The Gamma Distribution

For the gamma distribution

$$L = -p(z/\mu + \ln\mu - \ln z) - \ln z.$$

We have $\theta = -1/\mu$ and $V = d\mu/d\theta = \mu^2$. There are sufficient statistics when $Y = 1/\mu$. Thus the inverse transformation is related to linearity for the gamma distribution, as the logarithmic is for the Poisson, or the identity transformation for the normal. The corresponding models seem not to have been explored.
One application of linear models involving the gamma distribution is the estimation of variance components. Here we have sums of squares which are proportional to $\chi^2$ variates and the expectation of each is a linear combination of several variances which are to be estimated. It is better to write

$$L = -(z\nu/2\mu) - (\nu/2)\ln\mu + \{(\nu/2)-1\}\ln z,$$

where $\nu$ is the degrees of freedom of $z$; then putting $\theta = -\nu/2\mu$ we have $V = d\mu/d\theta = 2\mu^2/\nu$ and $y = z$, $Y = \mu$, and $w = \nu/2\mu^2$.

The deviance takes the form

$$\sum\nu[-\ln(z/\hat\mu) + \{(z-\hat\mu)/\hat\mu\}]$$

and the result of the Appendix gives $\sum\{\nu(z-\hat\mu)/\hat\mu\} = 0$; hence the deviance simplifies to

$$-\sum\nu\ln(z/\hat\mu).$$

Example
In a balanced incomplete block design in which b > v, we can produce an analysis of variance as follows (Yates, 1940, or paper VIII of Yates, 1970).

                                        Degrees of freedom

Blocks (eliminating varieties):
  Varietal component                          v - 1
  Remainder                                   b - v

  Total                                       b - 1

Varieties (ignoring blocks)                   v - 1
Intra-block error                             rv - v - b + 1

Total                                         rv - 1

The expectations of some of the mean squares are as follows:

                                        Expected mean square

Blocks (eliminating varieties):
  Varietal component                    $\sigma^2 + Ek\sigma_b^2$
  Remainder                             $\sigma^2 + k\sigma_b^2$
Intra-block error                       $\sigma^2$

Here $\sigma^2$ is the intra-block variance and $\sigma_b^2$ is the inter-block variance.
On p. 322 of Yates (1940) (or p. 207 of Yates, 1970) an example is given with v = 9, r = 8, k = 4, b = 18, $\lambda$ = 3 and E = 27/32. The three mean squares mentioned above, their degrees of freedom and their expectations are as follows:

                                     Mean square    Degrees of freedom    Expectation

Blocks (eliminating varieties):
  Varietal component                   4.6329               8             $\sigma^2 + (27/8)\sigma_b^2$
  Remainder                           15.3557               9             $\sigma^2 + 4\sigma_b^2$
Intra-block error                      2.5968              46             $\sigma^2$

Each iteration then takes the form of a weighted fitting of a straight line with x-values 27/8, 4 and 0.
The estimates obtained were:

$$\hat\sigma^2 = 2.5870, \quad \hat\sigma_b^2 = 2.0314.$$

Yates equated the intra-block error mean square and the blocks (eliminating varieties) mean square to their expectations. This gives

$$\hat\sigma^2 = 2.5968, \quad \hat\sigma_b^2 = 2.0813.$$

This was one example where the first approximation was not very good. Our first approximation was

$$\hat\sigma^2 = 2.5874, \quad \hat\sigma_b^2 = 0.9313.$$

Subsequent iterations gave the following values:

Iteration No.      $\hat\sigma^2$      $\hat\sigma_b^2$
     1                2.5695              2.0737
     2                2.5874              2.0302
     3                2.5870              2.0315
     4                2.5870              2.0314
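A compact sketch of this calculation (our own illustrative code, which assumes the efficiency factor E = λv/(rk) = 27/32 quoted above) treats the three mean squares as gamma variates with indices $\nu/2$, identity link $Y = \mu$, weights $w = \nu/2\mu^2$ and working variable $y = z$, and iterates the weighted straight-line fit described in the text.

    import numpy as np

    z  = np.array([4.6329, 15.3557, 2.5968])   # observed mean squares
    df = np.array([8.0, 9.0, 46.0])            # their degrees of freedom
    x  = np.array([27.0 / 8.0, 4.0, 0.0])      # coefficients of sigma_b^2 in the expectations
    X  = np.column_stack([np.ones(3), x])      # E(mean square) = sigma^2 + x * sigma_b^2

    mu = z.copy()                              # starting values: mu = z
    for _ in range(20):
        w = df / (2.0 * mu**2)                 # gamma weights nu/(2 mu^2); y = z for the identity link
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        mu = X @ beta

    print(beta)   # should approach the values quoted above, about (2.587, 2.031)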

4. THE MODELS IN THE TEACHING OF STATISTICS


We believe that the generalized linear models here developed could form a useful basis for courses in statistics. They give a consistent way of linking together the systematic elements in a model with the random elements. Too often the student meets complex systematic linear models only in connection with normal errors, and if he encounters probit analysis this may seem to have little to do with the linear regression theory he has learnt. By isolating the systematic linear component the student can be introduced to multiway tables and their margins, additivity, weighting, quantitative and qualitative independent variates, and transformations, quite independently of the added complications of errors and associated probability distributions. The essential unity of the linear model, encompassing qualitative, quantitative and mixed independent variates, can be brought out, and the introduction of qualitative variates brings in naturally the ideas of singularity of matrices and of constraints.

The complementary set of probability distributions would be introduced in the usual way, including the use of transformations of data to attain desirable properties of the errors. The difficult problem of discussing how far transformations can produce both linearity and normality simultaneously now disappears, because the models allow two different transformations to be used, one to induce the linearity of the systematic component and one to induce the desired distribution in the error component. (Note that this distribution need not necessarily be equal-variance normal.)
The systematic use of log-likelihood-ratios (or, equivalently, differences in deviance) extends the ideas of analysis of variance to other distributions and produces an additive decomposition for the sequential fit of the model. To appreciate the simplicity that this can produce it is only necessary to look at the algebraic complexities arising from the attempts to analyse contingency tables by extensions of the Pearson $\chi^2$ approach. Irwin (1949), Lancaster (1949, 1950) and Kimball (1954) have given modifications of the usual formulae for the Pearson $\chi^2$ in contingency tables, to produce an additive decomposition of $\chi^2$ when the table is partitioned. These formulae are much more complicated than the usual one for Pearson $\chi^2$. However, the equivalent likelihood statistic, namely

$$2\Big\{\sum_{ij} n_{ij}\ln n_{ij} - \sum_i n_{i.}\ln n_{i.} - \sum_j n_{.j}\ln n_{.j} + n\ln n\Big\},$$



does not need any modification to make it additive and is easy to compute as tables of $n\ln n$ are available (Kullback, 1968).
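As a small numerical illustration (our own, using the data of Table 4), the statistic above can be evaluated directly; for that table it equals the minimal-model deviance of 32.46 found in Section 3.2.

    import numpy as np

    # Counts from Table 4 (five age groups by four dream ratings)
    n = np.array([[ 7,  3,  4,  7],
                  [13, 11, 15, 10],
                  [ 7, 11,  9, 23],
                  [10, 12,  9, 28],
                  [ 3,  4,  5, 32]], dtype=float)
    row, col, tot = n.sum(axis=1), n.sum(axis=0), n.sum()
    g2 = 2.0 * ((n * np.log(n)).sum()
                - (row * np.log(row)).sum()
                - (col * np.log(col)).sum()
                + tot * np.log(tot))
    print(round(g2, 2))   # 32.46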
Finally, the fact that a single algorithm can be used to fit any of the models implies that quite a small set of routines can provide the basic computing facility to allow students to fit models to a wide range of data. They can get experience of programming by writing special routines to deal with special forms of output required by particular models, such as the LD50 of probit analysis. In this way the distinction can be made between the model-fitting part of an analysis and the subsequent derived quantities (with estimates of their uncertainties) which particular problems require.

We hope that the approach developed in this paper will prove to be a useful way of unifying what are often presented as unrelated statistical procedures, and that this unification will simplify the teaching of the subject to both specialists and non-specialists.

REFERENCES
BIRCH, M. W. (1963). Maximum likelihood in three-way contingency tables. J. R. Statist. Soc. B, 25, 220-233.
BISHOP, Y. M. M. (1969). Full contingency tables, logits, and split contingency tables. Biometrics, 25, 383-399.
COX, D. R. (1968). Notes on some aspects of regression analysis. J. R. Statist. Soc. A, 131, 265-279.
--- (1970). Analysis of Binary Data. London: Methuen.
DEMPSTER, A. P. (1971). An overview of multivariate data analysis. J. Multivar. Anal., 1, 316-346.
DYKE, G. V. and PATTERSON, H. D. (1952). Analysis of factorial arrangements when the data are proportions. Biometrics, 8, 1-12.
FINNEY, D. J. (1952). Probit Analysis, 2nd ed. Cambridge: University Press.
FISHER, R. A. (1949). A biological assay of tuberculosis. Biometrics, 5, 300-316.
GOOD, I. J. (1967). Analysis of log-likelihood ratios, "ANOA". (A contribution to the discussion of a paper on least squares by F. J. Anscombe.) J. R. Statist. Soc. B, 29, 39-42.
IRELAND, C. T. and KULLBACK, S. (1968). Contingency tables with given marginals. Biometrika, 55, 179-188.
IRWIN, J. O. (1949). A note on the subdivision of $\chi^2$ into components. Biometrika, 36, 130-134.
KENDALL, M. G. and STUART, A. (1967). The Advanced Theory of Statistics, 2nd ed., Vol. II. London: Griffin.
KIMBALL, A. W. (1954). Short-cut formulas for the exact partition of $\chi^2$ in contingency tables. Biometrics, 10, 452-458.
KU, H. H., VARNER, R. N. and KULLBACK, S. (1971). On the analysis of multidimensional contingency tables. J. Amer. Statist. Ass., 66, 55-64.
KULLBACK, S. (1968). Information Theory and Statistics, 2nd ed. New York: Dover.
LANCASTER, H. O. (1949). The derivation and partition of $\chi^2$ in certain discrete distributions. Biometrika, 36, 117-128.
--- (1950). The exact partition of $\chi^2$ and its application to the problem of the pooling of small expectations. Biometrika, 37, 267-270.
LEHMANN, E. L. (1959). Testing Statistical Hypotheses. London: Wiley.
MAXWELL, A. E. (1961). Analysing Qualitative Data. London: Methuen.
NELDER, J. A. (1966). Inverse polynomials, a useful group of multi-factor response functions. Biometrics, 22, 128-141.
--- (1968). Weighted regression, quantal response data and inverse polynomials. Biometrics, 24, 979-985.
PATIL, G. P. and SHORROCK, R. (1965). On certain properties of the exponential-type families. J. R. Statist. Soc. B, 27, 94-99.
SIMPSON, E. H. (1951). The interpretation of interaction in contingency tables. J. R. Statist. Soc. B, 13, 238-241.
YATES, F. (1940). The recovery of inter-block information in balanced incomplete block designs. Ann. Eugen., 10, 317-325.
--- (1948). The analysis of contingency tables with groupings based on quantitative characters. Biometrika, 35, 176-181.
--- (1970). Experimental Design: Selected Papers. London: Griffin.

APPENDIX
We can sometimes simplify the expression for deviance. In what follows, summation is to be taken to be over the observations unless otherwise stated.

Theorem. If either (a) $Y = \mu$ or (b) $Y = \log\mu$ and a constant term is being fitted in the model (say $Y = \sum\beta_i x_i$ where $x_0$ takes the constant value 1), then $\sum\{(z-\hat\mu)\hat\mu/\hat V\} = 0$, where $\hat\mu$ and $\hat V$ are the maximum likelihood estimates of $\mu$ and $V$.

Proof. For case (b)

$$\frac{\partial L}{\partial\beta_0} = \alpha(\phi)\sum\frac{z-\mu}{V}\frac{d\mu}{dY}\,x_0 = \alpha(\phi)\sum\frac{(z-\mu)\mu}{V} = 0.$$

In case (a), we have

$$\frac{\partial L}{\partial\beta_i} = \alpha(\phi)\sum\frac{z-\mu}{V}\frac{d\mu}{dY}\,x_i = \alpha(\phi)\sum\frac{(z-\mu)x_i}{V}.$$

Hence

$$\sum\frac{(z-\hat\mu)x_i}{\hat V} = 0.$$

Then

$$\sum\frac{(z-\hat\mu)\hat\mu}{\hat V} = \sum\frac{(z-\hat\mu)\hat Y}{\hat V} = \sum\frac{(z-\hat\mu)\sum_i\hat\beta_i x_i}{\hat V} = \sum_i\hat\beta_i\Big\{\sum\frac{(z-\hat\mu)x_i}{\hat V}\Big\} = 0.$$

This completes the proof. When the conditions of this theorem are satisfied, we have, for the Poisson distribution

$$\sum(z-\hat\mu) = 0$$

and for the gamma distribution

$$\sum\frac{z-\hat\mu}{\hat\mu} = 0.$$

Then the deviances for these distributions simplify as follows:

Poisson      $2\sum z\ln(z/\hat\mu)$,

Gamma        $-2p\sum\ln(z/\hat\mu)$.
