Академический Документы
Профессиональный Документы
Культура Документы
Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at
http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless
you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you
may use content in the JSTOR archive only for your personal, non-commercial use.
Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at
http://www.jstor.org/action/showPublisher?publisherCode=black.
Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed
page of such transmission.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact support@jstor.org.
Royal Statistical Society and Blackwell Publishing are collaborating with JSTOR to digitize, preserve and
extend access to Journal of the Royal Statistical Society. Series A (General).
http://www.jstor.org
J. R. Statist.Soc. A, 370
(1972), 135, Part 3, p. 370
GeneralizedLinear Models
SUMMARY
The techniqueof iterative
weightedlinearregressioncan be used to obtain
maximumlikelihoodestimatesof theparameters withobservations distri-
butedaccordingto someexponential familyand systematiceffectsthatcan
be madelinearbya suitabletransformation. oftheanalysis
A generalization
Thesegeneralized
ofvarianceis givenforthesemodelsusinglog-likelihoods.
by examplesrelatingto fourdistributions;
linearmodelsare illustrated the
Normal,Binomial(probitanalysis,etc.),Poisson(contingency tables)and
gamma(variancecomponents).
The implicationsof the approachin designingstatisticscoursesare
discussed.
Keywords: ANALYSIS OF VARIANCE; CONTINGENCY TABLES; EXPONENTIAL FAMILIES;
INVERSE POLYNOMIALS; LINEAR MODELS; MAXIMUM LIKELIHOOD:
QUANTAL RESPONSE; REGRESSION; VARIANCE COMPONENTS; WEIGHTED
LEAST SQUARES
INTRODUCTION
LINEARmodelscustomarily embodybothsystematic and random(error)components,
withtheerrorsusuallyassumedto have normaldistributions. The associatedanalytic
techniqueis least-squarestheory,whichin its classical formassumedjust one error
component;extensionsformultipleerrorshave been developedprimarily foranalysis
of designedexperimentsand surveydata. Techniques developed for non-normal
data includeprobitanalysis,wherea binomialvariatehas a parameterrelatedto an
assumedunderlying tolerancedistribution, and contingency tables,wherethe distri-
bution is multinomialand the systematicpart of the model usuallymultiplicative.
In both theseexamplesthereis a linearaspectto the model; thusin probitanalysis
theparameter p is a functionoftoleranceY whichis itselflinearon thedose (or some
functionthereof),and in a contingency tablewitha multiplicative modelthelogarithm
of theexpectedprobabilityis assumedlinearon classifying factorsdefining thetable.
Thus forboth,thesystematic partofthemodelhas a linearbasis. In anotherextension
(Nelder, 1968) a certaintransformation is used to produce normal errors,and a
differenttransformation of theexpectedvaluesis used to producelinearity.
So far we have mentionedmodels associated with the normal,binomial and
multinomial distributions (thislast can be thoughtof as a set of Poisson distributions
with constraints).A furtherclass is based on the x2 or gamma distributionand
arises in the estimationof variancecomponentsfromindependentquadraticforms
derivedfromtheoriginalobservations.Againthesystematic componentofthemodel
has a linearstructure.
In thispaper we develop a class of generalizedlinearmodels,whichincludesall
the above examples,and we give a unifiedprocedurefor fittingthem based on
1972] NELDER AND WEDDERBURN- Generalized Linear Models 371
and
_a2LIa-2= V = var(z).
372 NELDER AND WEDDERBURN - Generalized
LinearModels [Part3,
VdYd
-
=o0Z V{t dZX. (7)
and
02L 02L
O pj- ay2Xi Xi
where
a2Z b2L(dO\2 aL d2O
Oya= s dY_7+To dy2
=1_f+lV(-(dY + dY2
(y)
(-d-)
i /2 V+ (z
s?-d-yU
(z__)
o) d2 02 (8)
The expected
secondderivative signis thus
withnegative
wfortheweight
Writing function V, (7) gives
(dp/dy)2/
OL/Ij3,= o(k) wx,(z- )/(dl/d Y).
ThustheNewton-Raphson processwithexpectedsecondderivatives to
(equivalent
Fisher'sscoring fora sampleofn gives
technique)
ASP = C, (9)
whereA is a mx m matrix
with
n
Ai= Wk Xik Xjk
k=1
and C is a mx 1 vectorwith
Ci = IWkXik(ZIJu)/(dp4dY).
Finallywehave
(AAi = 1A4jfPJ = WkXikYk
in theform
so that(9) maybe written
AP*=r
where rj= IWkXikYk, Yk= Yk+(Zk-,uk)/(dtkl/dYk) and * = +Sp
374 NELDER AND WEDDERBURN - GeneralizedLinearModels [Part3,
Startingmethod
In practicewe can obtaina good startingprocedureforiterationas follows:take
as a firstapproximationj = z and calculate Y fromit; thencalculate w as before
and set y = Y. Then obtain the firstapproximationto the 3's by regression.The
methodmayneed slightmodification to deal withextremevalues of z. For instance,
withthe binomialdistribution it will probablybe adequate to replaceinstancesof
z =O0 or z = n withz = 2 and z = n- -, where,e.g. withthe probitand logittrans-
formations, tu= 0 or p = n would lead to infinite
values for Y.
Statistics
2.2. Sufficient
of
An importantspecial case occurs when 0, the parameterof the distribution
the randomelement,and Y the predictedvalue of the linearmodel,coincide. Then
L -zY-g(Y)+h(z),
and, using(3),
aL/ ai = CO) (z -) Xi.
The maximumlikelihood equations are then of the form k )(Z-) Xik = 0, the
summationbeingoverthe observations.Hence we have
E Zk Xik = E&k Xik. (10)
For a qualitativeindependentvariate,thisimpliesthatthefittedmarginaltotalswith
respectto thatvariatewillbe equal to theobservedones.
FromtheexpressionforL we see thatthequantitiesEk Zk Xik are a setof sufficient
statistics.Also, in (8) d2 O/dy2= 0 and so
62p_
_ Ex=(q) - (dv /V) xi (11)
TABLE1
Deviancesandtheirdifferences
Minimal dm dm-dA A
B dA dA- dAB B eliminating
A
C dAB dAB-dABO C eliminating
A and B
Complete do dABC-do Residual
Thegeneralized
analysisofvariance modelis nowdefined
fora sequential to have
componentsgivenby thefirstdifferences
of thedeviance,withdegreesof freedom
defined
as above. Thesecomponents havedistributions to X2, exactly
proportional
fornormalerrors,
approximatelyforothers.Sucha generalization
oftheanalysisof
variancewas suggested
by Good (1967).
3. SPECIAL
DISTRLJUTIONS
3.1. The NormalDistribution
Here,we have
L =L a=~~2
-2(Zlk
_
tj2 _-1Z2) -i ln62
and in thenotation
of Section1.2
p =G, V=1.
Inversepolynomials
providean examplewherewe assumethattheobservations
u arenormalon thelog scaleandthesystematiceffectsadditiveon theinverse
scale.
Then
z = ln u and Y = ell.
Nelder(1966) givesexamplesof inversepolynomials calculatedusingthefirst
approximation ofthemethodin thispaper.
More generally, as shownin Nelder(1968),we can considermodelsin which
thereis a linearizing
transformation
f and a normalizing transformation
g. This
meansthatif theobservationsare denotedby u, theng(u) is normally distributed
withmean,uandconstant varianceu2 andf{g-1(Ik)}= ,S, xi.
Thenwe haveV = 1 and Y =f{g-'(Ik)} so that
W = [(d/dY)g{f-1(y)}]2
and
y = Y+{g(u)-p}/[(d/dY)g{f-1(Y)}].
data
Example: Fisher'stuberculin-test
Fisher(1949) published16 measurements of tuberculin
responsewhichwere
classified
by threefour-level
factorsin a Latin-square
typeof arrangementas in
Table 2.
1972] NELDER AND WEDDERBURN- Generalized
LinearModels 377
TABLE 2
Cow class
Sites Treatments
I III II IV
Fisher'sresult Ourresult
B 0.0000 0.0000
A 0.2089 0.2092
D 0.0019 0.0023
C 0.2108 0.2115
3.2. ThePoissonDistribution
Here L = zln,u-,u so we have 0 = lnp.and V== pt.
When Y= In,u thereare sufficient statisticsand a unique maximumlikelihood
estimateof/, provideditis finite.It willalwaysbe finiteifthereareno zeroobservations.
If Y-=1p (O< A< 1), L -oo as IPIoo and henceL must have a maximum
forfinitep. Also
a2L a2 L
= E
:apiay2a
TABLE 4
Age Rating
in
years 4 3 2 1 Total
5- 7 7 3 4 7 21
8- 9 13 11 15 10 49
10-11 7 11 9 23 50
12-13 10 12 9 28 59
14-15 3 4 5 32 44
is - 0-205.
The estimatedlinearx linearinteraction
We can forman analysisof deviancethus(regardingmain effectsas fixed):
Degrees
Model term(s) Deviance Difference of
freedom
Degrees
Sourceofvariation Of X2
freedom
Total 12 31X67
Probitanalysis
We put
p = n(D(Y),
where(D is the cumulativenormaldistribution
function.Thereare no sufficient
statistics
here;theanalysis
viaiterative
weighted
least-squares
is wellknown(Finney,
1952).
Fittingconstantson a logitscale
Thistechnique was introduced by Dykeand Patterson (1952)and is appliedto
multiway tablesofproportions. It is a specialcaseofthelogistictransformationwhen
all the x's are qualitative,
and yieldsas sufficient statistics
the totalresponding
(Zz) foreachrelevant margin.
Thelogitanalogueofprobitanalysis is,ofcourse,formallyidentical,withquanti-
tativeratherthanqualitative
x's. Againarbitrary mixturesofx typesdo notintroduce
anything new.Modelsbasedonthelogistic transform
havebeenextensively developed
byCox (1970).
Fromtheresultsof Birch(1963)mentioned above,it followsthatmodelswith
independent binomialdataareequivalent to modelswithindependent Poissondata.
Bishop(1969)showedthata binomialmodelthatis additive on thelogitscalecanbe
treatedas a Poissonmodeladditiveon thelog scale.
Example
In a balancedincompleteblock designin whichb > v, we can producean analysis
of varianceas follows(Yates, 1940 or paper VIII of Yates, 1970).
Degreesof
freedom
varieties):
Blocks(eliminating
Varietalcomponent v-1
Remainder b-v
Total b-1
Varieties(ignoringblocks) v-1
Intra-block error rv-v-b +1
Total rv-1
Expected
meansquare
varieties):
Blocks(eliminating
Varietalcomponent a2 + Ekcb
Remainder a2+ ka2b
Intra-blockerror a2
Degrees
Mean square of Expectation
freedom
Blocks(eliminating
varieties):
Varietalcomponent 4-6329 8 a + 87Ob
Remainder 15-3557 9 a2 +4ab
Intra-blockerror 2-5968 46 a2
IterationNo. a2 orb
1 2-5695 2.0737
2 2-5874 2.0302
3 2-5870 2.0315
4 2-5870 2.0314
does not need any modification to make it additiveand is easy to computeas tables
of nInn are available (Kullback, 1968).
Finally,the fact that a singlealgorithmcan be used to fitany of the models
impliesthatquite a small set of routinescan providethe basic computingfacilityto
allow studentsto fitmodels to a wide range of data. They can get experienceof
programming bywritingspecialroutinesto deal withspecialformsof outputrequired
by particularmodels,suchas theLD5O of probitanalysis. In thisway thedistinction
can be made betweenthemodel-fitting partof an analysisand thesubsequentderived
quantities(withestimatesof theiruncertainties) whichparticularproblemsrequire.
We hope thattheapproachdevelopedin thispaperwillproveto be a usefulway of
unifyingwhatareoftenpresentedas unrelatedstatisticalprocedures,and thatthisunifi-
cationwill simplifythe teachingof the subjectto bothspecialistsand non-specialists.
REFERENCES
BIRCH, M. W. (1963). Maximumlikelihoodin three-way contingency tables. J. R. Statist.Soc.
B, 25, 220-233.
BISHOP, Y. M. M. (1969). Full contingency
tables,logits,and splitcontingency tables. Biometrics,
25, 383-399.
Cox, D. R. (1968). Noteson someaspectsofregression analysis.J.R. Statist.Soc. A, 131,265-279.
- (1970). Analysis of Binary Data. London: Methuen.
DEMPSTER, A. P. (1971). An overview
ofmultivariate
data analysis.J. Multivar.Anal., 1, 316-346.
DYKE, G. V. and PATTERSON, H. D. (1952). Analysisof factorialarrangements whenthe data
are proportions.Biometrics,
8, 1-12.
FINNEY, D. J. (1952). Probit Analysis. 2nd ed. Cambridge:University Press.
FISHER, R. A. (1949). A biologicalassay of tuberculosis.Biometrics,
5, 300-316.
GOOD, I. J.(1967). Analysisoflog-likelihood ratios,"ANOA". (A contribution to thediscussion
of a paper on least squares by F. J. Anscombe.) J. R. Statist. Soc. B, 29, 39-42.
IRELAND, C. T. and KULLBACK, S. (1968). Contingency
tableswithgivenmarginals.Biometr-ika,
55, 179-188.
IRWIN, J.0. (1949). A noteon thesubdivision
ofX2intocomponents.Biometrika, 36, 130-134.
KENDALL, M. G. and STUART, A. (1967). The Advanced Theoryof Statistics, 2nd ed, Vol. II.
London: Griffin.
KIMBALL, A. W. (1954). Short-cut formulasfortheexactpartitionof x2 in contingencytables.
Biometrics,10, 452-458.
Ku, H. H., VARNER, R. N. and KULLBACK, S. (1971). On the analysisof multidimensional
contingencytables. J. Amer. Statist. Ass., 66, 55-64.
KULLBACK, S. (1968). InformationTheoryand Statistics. 2nd ed. New York: Dover.
LANCASTER, H. 0. (1949). The derivationand partitionof x2 in certaindiscretedistributions.
Biometrika,36, 117-128.
(1950). The exactpartitionof x2 and its applicationto theproblemof thepoolingof small
expectations.Biometrika,37, 267-270.
LEHMANN, E. L. (1959). Testing Statistical Hypotheses. London: Wiley.
MAXWELL, A. E. (1961). Analysing Qualitative Data. London: Methuen.
NELDER, J. A. (1966). Inversepolynomials, a usefulgroupof multi-factor responsefunctions.
Biometrics,22, 128-141.
(1968). Weightedregression,
quantal responsedata and inversepolynomials.Biometrics,
24, 979-985.
PATIL, G. P. and SHORROCK, R. (1965). On certainpropertiesof theexponential-type families.
J. R. Statist. Soc. B, 27, 94-99.
SIMPSON, C. H. (1951). The interpretation in contingency
ofinteraction tables. J. R. Statist. Soc.
B, 13, 238-241.
YATES, F. (1940). The recovery
of inter-block in balancedincomplete
information blockdesigns.
Ann. Eugen., 10, 317-325.
(1948). The analysisof contingency
tableswithgroupingsbased on quantitative
characters.
Biometrika,35, 176-181.
(1970). ExperimentalDesign: Selected Papers. London: Griffin.
384 NELDER AND WEDDERBURN - Generalized
LinearModels [Part3,
APPENDIX
We can sometimes simplifytheexpression sum-
fordeviance.In whatfollows,
mationis to be takento be overtheobservations
unlessotherwise
stated.
Theorem. If either(a) Y = j1 or (b) Y = logp- and a constanttermis being
fittedin the model(say Y = E/i xi wherexo takesthe constantvalue 1) then
~{(z- 2)i/J2}= 0, where[i and Vjarethemaximumlikelihood
estimates
of, and V.
Proof.For case (b)
AL =0_
~Xo V
In case(a), we have
AL z-,udp, -Z
__
api V dYXi V 0 yi
Hence
(z V2) j Xi =
Then
- ~~ ~ ~ ~~~~ -
(Z- H)Z Y
&)_
(Z-
A
= zRiXiVY iV = Ij VY Xi) =
Thiscompletes
theproof.Whentheconditions
ofthistheorem wehave,
aresatisfied,
forthePoissondistribution
z-_2=O
andforthegammadistribution
- 0.
A
IJL
Thenthedeviances
forthesedistributions as follows:
simplify
Poisson 2 i z ln (zlj),
Gamma -2 ln (z/p).