(Van Den Broek) A Score Test For Zero Inflation in A Poisson Distribution

A Score Test for Zero Inflation in a Poisson Distribution
Author(s): Jan van den Broek

Source: Biometrics, Vol. 51, No. 2 (Jun., 1995), pp. 738-743
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2532959

Jan van den Broek
Center for Biostatistics, University of Utrecht,
Yalelaan 7, 3584 CL Utrecht, The Netherlands
SUMMARY
When analyzing Poisson-count data sometimes a lot of zeros are observed. When there are too many zeros a zero-inflated Poisson distribution can be used. A score test is presented to test whether the number of zeros is too large for a Poisson distribution to fit the data well.
1. Introduction
Johnson, Kotz, and Kemp (1992, pp. 312-318) discuss a simple way of modifying a discrete distribution to handle extra zeros. An extra proportion of zeros, $\omega$, is added to the proportion of zeros from the original discrete distribution, $f(0)$, while decreasing the remaining proportions in an appropriate way:

$$P(Y_i = 0) = \omega + (1 - \omega) f(0)$$
$$P(Y_i = y_i) = (1 - \omega) f(y_i), \quad y_i > 0 \qquad (1)$$

They state that it is possible to take $\omega$ less than zero, provided that:

$$\omega \ge -\frac{f(0)}{1 - f(0)},$$

with equality for left truncation.
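As an illustration of (1), the following minimal sketch takes $f$ to be a Poisson probability function and checks two properties stated above: the modified probabilities still sum to one, and the lower bound on $\omega$ removes the zero class entirely (left truncation). The function names, the mean of 1.5 and the upper limit $\omega < 1$ are assumptions made for this example only.

```python
import math

def poisson_pmf(y, lam):
    """Poisson probability f(y) with mean lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def zero_modified_pmf(y, lam, omega):
    """Equation (1) with a Poisson f and extra zero proportion omega."""
    f0 = poisson_pmf(0, lam)
    lower = -f0 / (1.0 - f0)            # smallest admissible omega
    if not (lower - 1e-12 <= omega < 1.0):
        raise ValueError("omega must lie in [-f(0)/(1 - f(0)), 1)")
    if y == 0:
        return omega + (1.0 - omega) * f0
    return (1.0 - omega) * poisson_pmf(y, lam)

lam = 1.5                                # illustrative mean (assumption)
# The probabilities in (1) sum to one (the tail beyond 59 is negligible).
total = sum(zero_modified_pmf(y, lam, 0.3) for y in range(60))

# At the lower bound for omega the zero class vanishes: left truncation.
f0 = poisson_pmf(0, lam)
p0_truncated = zero_modified_pmf(0, lam, -f0 / (1.0 - f0))
```

A negative $\omega$ therefore describes zero deflation, with the zero-truncated distribution as its limiting case.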
Farewell and Sprott (1988) discuss an inflated binomial as a mixture model for count data. They also point out the two-population interpretation of this model: in one population one observes only zeros, while in the other one observes counts from a discrete distribution.
As an example of such an interpretation consider a population which consists of two groups: one of people who are not at risk of developing a certain disease and one of people who are at risk and may develop the disease several times. Of course such a model should be plausible in a given situation.
Another example is discussed by Lambert (1992). Manufacturing equipment may be in two states: a perfect state in which the machine produces no defects and an imperfect state in which the machine produces a number of mistakes according to a Poisson distribution. She discusses maximum likelihood estimation and testing in the zero-inflated Poisson regression using:

$$\ln(\lambda) = X\beta \quad \text{(with } \lambda \text{ the mean of the Poisson distribution)}$$
$$\ln\left(\frac{\omega}{1 - \omega}\right) = G\gamma$$

for covariate matrices $X$ and $G$. Two cases are considered: $\lambda$ and $\omega$ functionally not related and $\lambda$ and $\omega$ functionally related. She also proves the asymptotic normality of the distribution of the parameter estimates and shows that the likelihood ratio statistic is asymptotically distributed as a $\chi^2$ with appropriate degrees of freedom.
These examples assume that the zero-inflated distribution is appropriate or, to put it differently, that the population considered consists of two subpopulations as described above. This is not always obvious. One would like to see if there is some evidence from the observed data to support such an assumption. To achieve this for the zero-inflated Poisson distribution a score test is proposed in the next section.

Key words: Poisson; Score test; Zero inflation.
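2. A Score Test
A score test for $\omega = 0$ in the inflated Poisson has the advantage that one need not fit the inflated Poisson but just a Poisson, which is the distribution under the null hypothesis, instead. Using (1) the density for the inflated Poisson is:

$$P(Y_i = 0) = \omega + (1 - \omega) e^{-\lambda_i}$$
$$P(Y_i = y_i) = (1 - \omega) \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}, \quad y_i > 0.$$

Using $\theta = \omega/(1 - \omega)$ constant ($-f(0) \le \theta < \infty$), the log likelihood can be written as:

$$l(\lambda, \theta; y) = \sum_i \left\{ -\log(1 + \theta) + 1_{(y_i = 0)} \log(\theta + e^{-\lambda_i}) + 1_{(y_i > 0)} \left[ -\lambda_i + y_i \log(\lambda_i) - \log(y_i!) \right] \right\}, \qquad (2)$$

where $1_{(\text{condition})}$ takes the value 1 if the condition is true and 0 otherwise. From this, taking $\ln(\lambda_i) = X_i\beta$, the score function $U(\beta, \theta)$ and the expected information $J(\beta, \theta)$ can be calculated. The score statistic for testing $\theta = 0$ is then:

$$S(\hat\beta) = S(\hat\beta, 0) = U^T(\hat\beta, 0)\,[J(\hat\beta, 0)]^{-1}\,U(\hat\beta, 0) = \frac{\left[ \sum_{i=1}^{n} \dfrac{1_{(y_i = 0)} - e^{-\hat\lambda_i}}{e^{-\hat\lambda_i}} \right]^2}{\sum_{i=1}^{n} \dfrac{1 - e^{-\hat\lambda_i}}{e^{-\hat\lambda_i}} - \hat\lambda^T X [X^T \mathrm{diag}(\hat\lambda) X]^{-1} X^T \hat\lambda} \qquad (3)$$

where $\hat\beta$ and $\hat\lambda_i$ are the estimates of $\beta$ and $\lambda_i$ under the null hypothesis. (See the Appendix for details and, for instance, Cox and Hinkley (1974), pp. 321-325, for a discussion of the score test.)
If the model contains a constant then $\hat\lambda^T X [X^T \mathrm{diag}(\hat\lambda) X]^{-1} X^T \hat\lambda = \sum_i \hat\lambda_i = n\bar{y}$, the latter equality being true due to the estimating equations under the null hypothesis (see Appendix). If one writes $\tilde{p}_{0i} = P(Y_i = 0) = e^{-\hat\lambda_i}$, then the score statistic can be written as:

$$S(\hat\beta) = \frac{\left[ \sum_{i=1}^{n} \dfrac{1_{(y_i = 0)} - \tilde{p}_{0i}}{\tilde{p}_{0i}} \right]^2}{\sum_{i=1}^{n} \dfrac{1 - \tilde{p}_{0i}}{\tilde{p}_{0i}} - n\bar{y}} \qquad (4)$$

Under the null hypothesis this statistic will have an asymptotic chi-squared distribution with 1 degree of freedom.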
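The form of the statistic for a model with a constant can be sketched in a few lines. The function below is only an illustration under stated assumptions: it takes the counts and the fitted Poisson means as given (the fitting itself is not shown), and the name `score_statistic` and the small data set are hypothetical.

```python
import math

def score_statistic(y, lam_hat):
    """Score statistic for zero inflation from counts y and fitted means lam_hat.

    Assumes lam_hat are maximum likelihood fits of a log-linear Poisson
    model that contains a constant, so that sum(lam_hat) == sum(y) = n * ybar.
    """
    p0 = [math.exp(-l) for l in lam_hat]             # fitted P(Y_i = 0)
    num = sum(((yi == 0) - p) / p for yi, p in zip(y, p0)) ** 2
    den = sum((1.0 - p) / p for p in p0) - sum(y)    # second term is n * ybar
    return num / den

# Hypothetical counts with an intercept-only fit: every fitted mean is ybar.
y = [0, 0, 0, 0, 0, 0, 0, 3, 4, 5]
ybar = sum(y) / len(y)
s = score_statistic(y, [ybar] * len(y))

# Compare with the 95% point of a chi-squared(1) distribution, 3.84.
too_many_zeros = s > 3.84
```

With covariates, `lam_hat` would instead come from any routine that fits an ordinary Poisson regression; no inflated model has to be fitted.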
The term $\hat\lambda^T X [X^T \mathrm{diag}(\hat\lambda) X]^{-1} X^T \hat\lambda$ can be read as $E(Y^T X)[\mathrm{var}(X^T Y)]^{-1} E(X^T Y)$. This can be interpreted as an "F value"-like statistic as it relates to $X^T E(Y)$. If $X^T E(Y)$ departs much from zero, the second term in the denominator of (3) will have substantial influence. The statistic $S(\hat\beta)$ can be seen as a goodness-of-fit statistic. It looks at the fit of the zeros but also accounts for the means of the fitted Poisson distribution. This is reasonable, because the question of too many zeros is not only answered by looking at the number of zeros but also by comparing this number with the mean of the observations. A moderate number of zeros and a high mean of the observations can indicate that the number of zeros is too high. If, instead of the Poisson distribution, the binomial distribution $B(n_i, p_i)$ is used, the same statistic is obtained except for the term $E(Y^T X)[\mathrm{var}(X^T Y)]^{-1} E(X^T Y)$, which will have values according to the binomial distribution. $\tilde{p}_{0i}$ then becomes $(1 - \hat{p}_i)^{n_i}$. However, under the same conditions under which the binomial can be approximated by the Poisson (small $p_i$, $n_i p_i$ constant, and $n_i$ large), the statistic (4) will be an approximation of the same statistic obtained with the binomial distribution.
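The binomial remark rests on the zero probability $(1 - p)^{n}$ approaching $e^{-np}$ when $np$ is held fixed and $n$ grows. A minimal numerical sketch, with an arbitrarily chosen mean of 0.5 and hypothetical function names:

```python
import math

def zero_prob_binomial(n, p):
    """P(Y = 0) for a binomial B(n, p) count."""
    return (1.0 - p) ** n

def zero_prob_poisson(n, p):
    """P(Y = 0) for a Poisson count with the same mean n * p."""
    return math.exp(-n * p)

mean = 0.5          # n * p held constant (illustrative value)
gaps = []
for n in (10, 100, 1000, 10000):
    p = mean / n
    gaps.append(abs(zero_prob_binomial(n, p) - zero_prob_poisson(n, p)))
# gaps shrinks roughly in proportion to 1 / n
```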
3. The case of no covariates
Consider the case where there are $n$ observations, among them $n_0$ zeros, and no covariates. The score statistic for testing whether the Poisson distribution fits the number of zeros well is, in this case:

$$S(\hat\beta_1) = \frac{(n_0 - n\tilde{p}_0)^2}{n\tilde{p}_0(1 - \tilde{p}_0) - n\bar{y}\tilde{p}_0^2} \qquad (5)$$

Table 1
Percentile points of the statistic $S(\hat\beta_1)$ based on 5,000 samples of size $n$ from a Poisson distribution with mean $\lambda$ and the same points of a $\chi^2(1)$ distribution

                              P.70    P.80    P.90    P.95    P.99
$\chi^2(1)$                   1.07    1.64    2.71    3.84    6.63
n = 100   $\lambda$ = .5      1.13    1.66    2.73    3.86    6.77
          $\lambda$ = 1       1.11    1.68    2.75    3.97    6.87
n = 200   $\lambda$ = .5      1.13    1.37    2.67    3.73    6.52
          $\lambda$ = 1       1.02    1.58    2.56    3.68    6.61
In order to see if the chi-square approximation is appropriate a simulation study was carried out. From a Poisson distribution with mean .5, 5,000 samples were taken once with sample size 100 and once with sample size 200. The same was done with a mean of 1. These small values for $\lambda$ are chosen, because if there are a lot of zeros the mean of the Poisson distribution under the null hypothesis is low. For every sample the score statistic was calculated and afterwards percentile points were obtained. These are to be compared with the percentile points of a chi-square distribution with one degree of freedom (Table 1). The $\chi^2(1)$ approximation for $S(\hat\beta_1)$ looks very reasonable. A reasonably high mean and hardly any zeros gives problems with the approximate chi-squared distribution unless the sample size is large. In this case however, there might be no need for a test on the fit of the zeros.
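A simulation of this kind can be sketched with the standard library alone. The code below is not the program used for Table 1; the Poisson sampler (Knuth's multiplication method), the seed and the function names are assumptions, so the simulated points will differ somewhat from the published ones. The statistic is computed as $(n_0 - n\tilde{p}_0)^2 / [n\tilde{p}_0(1 - \tilde{p}_0) - n\bar{y}\tilde{p}_0^2]$ with $\tilde{p}_0 = e^{-\bar{y}}$.

```python
import math
import random

def poisson_draw(rng, lam):
    """One Poisson(lam) draw by Knuth's method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def score_no_covariates(y):
    """Score statistic for zero inflation without covariates.

    Undefined for a sample consisting only of zeros (ybar = 0).
    """
    n, n0, ybar = len(y), sum(1 for v in y if v == 0), sum(y) / len(y)
    p0 = math.exp(-ybar)
    return (n0 - n * p0) ** 2 / (n * p0 * (1.0 - p0) - n * ybar * p0 ** 2)

def simulated_percentiles(lam, n, reps, probs, seed=0):
    """Empirical percentile points of the statistic under the Poisson null."""
    rng = random.Random(seed)
    stats = sorted(
        score_no_covariates([poisson_draw(rng, lam) for _ in range(n)])
        for _ in range(reps)
    )
    return [stats[round(p * reps) - 1] for p in probs]

probs = [0.70, 0.80, 0.90, 0.95, 0.99]
chi2_1 = [1.07, 1.64, 2.71, 3.84, 6.63]    # chi-squared(1) points from Table 1
sim = simulated_percentiles(lam=0.5, n=100, reps=5000, probs=probs)
for p, s, c in zip(probs, sim, chi2_1):
    print(f"P{p:.2f}: simulated {s:.2f}, chi-squared(1) {c:.2f}")
```

Changing `lam` to 1 or `n` to 200 gives the other three configurations of the study.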
Cochran (1954) proposed a statistic for comparing the observed and expected frequencies of a single outcome from a Poisson distribution. If one uses this statistic to compare the observed and expected zero-frequencies, the score statistic (5) is obtained.
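Another statistic for looking at the number of zeros in the case of no covariates was proposed by Rao and Chakravarti (1956):

$$\frac{\left\{ n_0 - n\left(\dfrac{n-1}{n}\right)^{n\bar{y}} \right\}^2}{n\left(\dfrac{n-1}{n}\right)^{n\bar{y}} + n(n-1)\left(\dfrac{n-2}{n}\right)^{n\bar{y}} - n^2\left(\dfrac{n-1}{n}\right)^{2n\bar{y}}}$$

This statistic is obtained by conditioning on the sum of the observations.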
In a simulation study El-Shaarawi (1985) compared the above two statistics together with the likelihood ratio statistic. This simulation study, using sample sizes of 15 and 50 and a mean of 5, showed that the likelihood ratio statistic gives the closest estimate of the true significance level but has much lower power than the other two. The significance levels of the score statistic and the statistic of Rao and Chakravarti are closer to the true level for a sample size of 50 than for a sample size of 15. El-Shaarawi concludes that the score statistic and the one of Rao and Chakravarti are preferable to the likelihood ratio statistic because of the higher power. The score statistic has the advantage of being easier to compute. Besides this, it can be used in the case where there are covariates.
4. An example
Of 98 human immunodeficiency virus (HIV)-infected men, attending the department of internal medicine at the Utrecht University Hospital, the number of times they had a urinary tract infection (number of episodes) was recorded (Hoepelman et al., 1992). Besides this, the immune status of every patient was determined by measuring the CD4+ cell count. Table 2 shows that a lot of patients did not have a urinary tract infection.

Table 2
Frequencies of the number of episodes

Number of episodes     0     1     2     3
Frequencies           81     9     7     1
To assess whether there are too many zeros for the data to have arisen from a Poisson distribution, the score statistic (4) can be calculated with

$$\tilde{p}_{0i} = \exp\left(-e^{\hat\beta_0 + \hat\beta_1 \mathrm{CD4+}_i}\right).$$

The outcome of the score statistic is 5.96 (whereas the score statistic without using the covariate CD4+ has an outcome of 15.35, illustrating the importance of the use of covariates), giving evidence that too many zeros are observed for the Poisson to fit the data well.
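The outcome without the covariate can be checked from the frequencies in Table 2 alone, using the no-covariate statistic of Section 3. The short sketch below does this; the variable names are illustrative, and the result, roughly 15.3, agrees with the reported 15.35 up to rounding. The value 5.96 cannot be reproduced this way because the individual CD4+ counts are not given.

```python
import math

# Table 2: number of episodes -> number of patients.
frequencies = {0: 81, 1: 9, 2: 7, 3: 1}

n = sum(frequencies.values())                               # 98 patients
total = sum(k * count for k, count in frequencies.items())  # 26 episodes
ybar = total / n
p0 = math.exp(-ybar)           # fitted Poisson probability of a zero
n0 = frequencies[0]            # observed number of zeros

score = (n0 - n * p0) ** 2 / (n * p0 * (1.0 - p0) - n * ybar * p0 ** 2)
print(f"n = {n}, ybar = {ybar:.4f}, expected zeros = {n * p0:.2f}, S = {score:.2f}")
```

The fitted Poisson expects about 75 zeros against 81 observed; the difference looks modest, but the denominator of the statistic is small because the mean is low.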
An alternative distribution for the Poisson then is the inflated Poisson. As pointed out, the model then used is that the population can be thought of as consisting of two parts: a proportion $\omega$ consisting of patients not being at risk of developing a urinary tract infection and another part consisting of patients who are at risk of developing a urinary tract infection.
The covariate can be used to model the mean number of episodes of the patients being at risk:

$$\ln(\lambda) = \beta_0 + \beta_1 \mathrm{CD4+}.$$

It can also be used to model the "probability" of not being at risk:

$$\ln\left(\frac{\omega}{1 - \omega}\right) = \alpha_0 + \alpha_1 \mathrm{CD4+}.$$

If one fits an inflated Poisson with both, the log-likelihood is -53.21 on 94 degrees of freedom. Lambert (1992) discusses the fitting procedure for which iterative methods are needed. The likelihood ratio statistic for testing $\beta_1 = 0$ has an outcome of .101, indicating that $\ln(\lambda)$ can be modeled as a constant: $\ln(\lambda) = \beta_0$. Using this and the same model for the "probability" of not being at risk as above, the log-likelihood is -53.26 on 95 degrees of freedom. The results of this fit are in Table 3.
Table 3
Estimation results

                                          Asymptotic correlations between estimates
Parameter     Estimates   Standard errors   $\alpha_0$   $\alpha_1$   $\beta_0$
$\alpha_0$    -.487       .699              1
$\alpha_1$     .007       .003              -.66         1
$\beta_0$     -.094       .317               .64         -.23         1
To see if the model can be simplified any further the likelihood ratio statistic for testing $\alpha_1 = 0$ was calculated. The outcome is 12.19, so there is a relation between the "probability" of not being at risk and the CD4+ cell count. Roughly: the odds of not being at risk for a patient who has a CD4+ cell count of 100 higher than another patient, are about twice as high.
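The "about twice" can be traced to the estimate of $\alpha_1$ in Table 3. Under the logit model for the "probability" of not being at risk, a difference of 100 in CD4+ count multiplies the odds by $\exp(100\,\alpha_1)$; a worked version of that arithmetic, using the rounded estimate .007, is:

```latex
\frac{\text{odds}(\mathrm{CD4+} + 100)}{\text{odds}(\mathrm{CD4+})}
  = \exp(100\,\hat\alpha_1)
  = \exp(100 \times 0.007)
  = e^{0.7}
  \approx 2.01
```

Because the estimate is rounded to three decimals, the factor 2.01 is itself only approximate.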
It might be possible that there is a relation between the follow-up time and the number of observed episodes. The model was refitted with ln(follow-up time) as an additional covariate. This gave no improvement (a likelihood ratio statistic of .56).
As El-Shaarawi (1985) points out, rejecting the hypothesis that the Poisson distribution fits the number of zeros well does not imply that the inflated Poisson is the appropriate model. There might be other distributions that fit the data well. For instance the negative binomial can be a good candidate. Fitting the negative binomial with mean $\mu$, variance $\mu(1 + \sigma\mu)$ and a log-link, in this example, gives a log-likelihood of -55.67 on 95 degrees of freedom. Table 4 shows some summary statistics of the Pearson residuals, defined as $[Y_i - E(Y_i)]/\sqrt{\mathrm{var}(Y_i)}$ with $Y_i$ the number of episodes of patient $i$, for the negative binomial and the inflated Poisson. Inspection of these residuals shows that they are, in absolute sense, more often smaller for the inflated Poisson.
Table 4
Summary statistics of the Pearson residuals

                      First quartile   Mean     Third quartile   Range
Negative binomial     -.513            -.010    -.096            6.42
Inflated Poisson      -.544             .003    -.042            5.39
ACKNOWLEDGEMENTS
I would like to thank Jim Lindsey, Byron J. T. Morgan, and an Associate Editor for their helpful comments and suggestions and Andy Hoepelman for permission to use the data.
RÉSUMÉ
A large number of zeros is sometimes observed when analyzing count data. In such a situation a modified Poisson distribution can be used. We present a score test to decide whether the number of observed zeros is too large for an unmodified Poisson distribution to be compatible with the data.
REFERENCES
Cochran, W. G. (1954). Some methods for strengthening the common $\chi^2$ tests. Biometrics 10, 417-451.
Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. London: Chapman and Hall.
El-Shaarawi, A. H. (1985). Some goodness-of-fit methods for the Poisson plus added zeros distribution. Applied and Environmental Microbiology 49, 1304-1306.
Farewell, V. T., and Sprott, D. A. (1988). The use of a mixture model in the analysis of count data. Biometrics 44, 1191-1194.
Hoepelman, A. I. M., Van Buren, M., Van den Broek, J., and Borleffs, J. C. C. (1992). Bacteriuria in men infected with HIV-1 is related to their immune status (CD4+ cell count). AIDS 6, 179-184.
Johnson, N. L., Kotz, S., and Kemp, A. W. (1992). Univariate Discrete Distributions, second edition. New York: John Wiley & Sons, Inc.
Lambert, D. (1992). Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 34, 1-14.
Rao, C. R., and Chakravarti, I. M. (1956). Some small sample tests of significance for a Poisson distribution. Biometrics 12, 264-282.
Received December 1993; revised December 1994; accepted February 1995.
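APPENDIX
The model under the null hypothesis has link function $\ln(\lambda) = X\beta$ with $\beta$ a $p \times 1$ vector. The model matrix includes a constant. Then differentiating the log likelihood (2) with respect to $\beta$ and $\theta$ gives:

$$\frac{\partial l}{\partial \beta_r} = \sum_i \left\{ -1_{(y_i = 0)} \frac{e^{-\lambda_i}}{\theta + e^{-\lambda_i}} \lambda_i x_{ir} + 1_{(y_i > 0)} (y_i - \lambda_i) x_{ir} \right\}, \quad r = 1 \ldots p \qquad (6)$$

$$\frac{\partial l}{\partial \theta} = \sum_i \left\{ -\frac{1}{1 + \theta} + \frac{1_{(y_i = 0)}}{\theta + e^{-\lambda_i}} \right\} \qquad (7)$$

Under the null hypothesis $\theta = 0$, with $\hat\beta$ and $\hat\lambda_i$ maximum likelihood estimates under the null hypothesis, (6) becomes:

$$\sum_i \left\{ -1_{(y_i = 0)} \hat\lambda_i x_{ir} + 1_{(y_i > 0)} (y_i - \hat\lambda_i) x_{ir} \right\} = \sum_i (y_i - \hat\lambda_i) x_{ir} = 0, \quad r = 1 \ldots p \qquad (8)$$

and (7) then is $\sum_i (1_{(y_i = 0)} - e^{-\hat\lambda_i}) / e^{-\hat\lambda_i}$, so

$$U^T(\hat\beta, 0) = \left( 0, \ldots, 0, \; \sum_i \frac{1_{(y_i = 0)} - e^{-\hat\lambda_i}}{e^{-\hat\lambda_i}} \right). \qquad (9)$$

The second derivatives are given by:

$$\frac{\partial^2 l}{\partial \beta_r \partial \beta_s} = \sum_i \left\{ -1_{(y_i = 0)} \frac{e^{-\lambda_i} [(1 - \lambda_i)\theta + e^{-\lambda_i}]}{[\theta + e^{-\lambda_i}]^2} \lambda_i x_{ir} x_{is} - 1_{(y_i > 0)} \lambda_i x_{ir} x_{is} \right\}, \quad r = 1 \ldots p, \; s = 1 \ldots p$$

$$\frac{\partial^2 l}{\partial \beta_r \partial \theta} = \sum_i 1_{(y_i = 0)} \frac{e^{-\lambda_i}}{[\theta + e^{-\lambda_i}]^2} \lambda_i x_{ir}, \quad r = 1 \ldots p$$

$$\frac{\partial^2 l}{\partial \theta^2} = \sum_i \left\{ \frac{1}{(1 + \theta)^2} - \frac{1_{(y_i = 0)}}{[\theta + e^{-\lambda_i}]^2} \right\}$$

Using $E[1_{(y_i = 0)}] = P(y_i = 0) = \dfrac{\theta + e^{-\lambda_i}}{1 + \theta}$ and $E[1_{(y_i > 0)}] = P(y_i > 0) = \dfrac{1 - e^{-\lambda_i}}{1 + \theta}$ it can be seen that the expected information matrix, $J(\beta, \theta)$, has entries:

$$J_{r,s} = \sum_i \left\{ \frac{e^{-\lambda_i} [(1 - \lambda_i)\theta + e^{-\lambda_i}]}{(1 + \theta)[\theta + e^{-\lambda_i}]} + \frac{1 - e^{-\lambda_i}}{1 + \theta} \right\} \lambda_i x_{ir} x_{is}, \quad r = 1 \ldots p, \; s = 1 \ldots p$$

$$J_{r,p+1} = -\sum_i \frac{e^{-\lambda_i}}{(1 + \theta)[\theta + e^{-\lambda_i}]} \lambda_i x_{ir}, \quad r = 1 \ldots p$$

$$J_{p+1,p+1} = \sum_i \frac{1 - e^{-\lambda_i}}{(1 + \theta)^2 [\theta + e^{-\lambda_i}]}$$

and so $J(\hat\beta, 0)$ has entries

$$J_{r,s} = \sum_i \hat\lambda_i x_{ir} x_{is}, \quad r = 1 \ldots p, \; s = 1 \ldots p; \qquad J_{r,p+1} = -\sum_i \hat\lambda_i x_{ir}, \quad r = 1 \ldots p; \qquad J_{p+1,p+1} = \sum_i \frac{1 - e^{-\hat\lambda_i}}{e^{-\hat\lambda_i}}.$$

Partition $J(\hat\beta, 0)$ as

$$\begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix}$$

where $J_{11} = X^T \mathrm{diag}(\hat\lambda) X$; $J_{12} = -X^T \hat\lambda$; $J_{21} = -\hat\lambda^T X$; $J_{22} = \sum_i (1 - e^{-\hat\lambda_i}) / e^{-\hat\lambda_i}$.
Now denote the inverse of $J(\hat\beta, 0)$ as $C$, which can be partitioned as

$$\begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}.$$

Due to the structure of $U(\hat\beta, 0)$ only $C_{22}$ is needed:

$$C_{22} = \left[ J_{22} - J_{21} J_{11}^{-1} J_{12} \right]^{-1} = \left[ \sum_i \frac{1 - e^{-\hat\lambda_i}}{e^{-\hat\lambda_i}} - \hat\lambda^T X [X^T \mathrm{diag}(\hat\lambda) X]^{-1} X^T \hat\lambda \right]^{-1}.$$

Using this with (9), equation (3) follows.

Since the model contains a constant, $\hat\lambda$ can be written as $\hat\lambda = \mathrm{diag}(\hat\lambda) X e_p$, where $e_p$ is a $(p \times 1)$ vector having a 1 as first element and having other elements equal zero. So

$$\hat\lambda^T X [X^T \mathrm{diag}(\hat\lambda) X]^{-1} X^T \hat\lambda = e_p^T X^T \mathrm{diag}(\hat\lambda) X [X^T \mathrm{diag}(\hat\lambda) X]^{-1} X^T \mathrm{diag}(\hat\lambda) X e_p = e_p^T X^T \mathrm{diag}(\hat\lambda) X e_p = 1^T \hat\lambda.$$

From (8) this can be seen to equal $n\bar{y}$.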
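Two steps of this derivation lend themselves to a quick numerical check: the derivative of the log likelihood (2) with respect to $\theta$ at $\theta = 0$, and the simplification of the quadratic form when the model matrix contains a constant. The sketch below uses hypothetical data, one covariate and arbitrary positive means; the quadratic-form identity is purely algebraic, so the means need not be maximum likelihood fits for that check.

```python
import math

def loglik(y, lam, theta):
    """Log likelihood (2) for counts y, means lam and theta = omega / (1 - omega)."""
    total = 0.0
    for yi, li in zip(y, lam):
        total -= math.log(1.0 + theta)
        if yi == 0:
            total += math.log(theta + math.exp(-li))
        else:
            total += -li + yi * math.log(li) - math.lgamma(yi + 1)
    return total

# Hypothetical data: one covariate x and arbitrary positive means.
x = [0.2, 1.0, 1.7, 2.4, 3.1, 3.9]
y = [0, 2, 0, 1, 0, 4]
lam = [math.exp(0.1 + 0.3 * xi) for xi in x]

# Check 1: derivative with respect to theta at theta = 0, analytic form
# sum((1(y=0) - exp(-lam)) / exp(-lam)), against a central difference.
h = 1e-6
numeric = (loglik(y, lam, h) - loglik(y, lam, -h)) / (2 * h)
analytic = sum(((yi == 0) - math.exp(-li)) / math.exp(-li)
               for yi, li in zip(y, lam))

# Check 2: lam' X [X' diag(lam) X]^{-1} X' lam = sum(lam) for X = [1, x].
a11 = sum(lam)                                        # entries of X' diag(lam) X
a12 = sum(li * xi for li, xi in zip(lam, x))
a22 = sum(li * xi * xi for li, xi in zip(lam, x))
b1, b2 = a11, a12                                     # the vector X' lam
det = a11 * a22 - a12 * a12
quad = (b1 * (a22 * b1 - a12 * b2) + b2 * (-a12 * b1 + a11 * b2)) / det
```

Here `numeric` and `analytic` should agree to several decimals, and `quad` should equal `sum(lam)` up to rounding error.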
