Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
http://www.jstor.org/page/info/about/policies/terms.jsp
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content
in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship.
For more information about JSTOR, please contact support@jstor.org.
International Biometric Society is collaborating with JSTOR to digitize, preserve and extend access to Biometrics.
http://www.jstor.org
This content downloaded from 202.92.128.135 on Thu, 09 Apr 2015 03:40:00 UTC
All use subject to JSTOR Terms and Conditions
BIOMETRICS
June 1995
51, 738-743
SUMMARY
When analyzing Poisson count data, sometimes a lot of zeros are observed. When there are too many zeros, a zero-inflated Poisson distribution can be used. A score test is presented to test whether the number of zeros is too large for a Poisson distribution to fit the data well.
1. Introduction
Johnson, Kotz, and Kemp (1992, pp. 312-318) discuss a simple way of modifying a discrete distribution to handle extra zeros. An extra proportion of zeros, $\omega$, is added to the proportion of zeros from the original discrete distribution, $f(0)$, while decreasing the remaining proportions in an appropriate way:

$$
\begin{aligned}
P(Y_i = 0) &= \omega + (1 - \omega) f(0), \\
P(Y_i = y_i) &= (1 - \omega) f(y_i), \qquad y_i > 0.
\end{aligned}
\tag{1}
$$

The proportion $\omega$ may be negative, provided $\omega \ge -f(0)/(1 - f(0))$, with equality for left truncation.
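As an illustration of (1), the following minimal sketch applies the zero modification to a Poisson base distribution. The function names and the values of the mean and of $\omega$ are hypothetical choices made here for illustration; they do not come from the paper.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability P(Y = y) for mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


def zero_modified_pmf(y, omega, base_pmf):
    """Equation (1): move a proportion omega of the mass onto zero."""
    if y == 0:
        return omega + (1.0 - omega) * base_pmf(0)
    return (1.0 - omega) * base_pmf(y)


# Hypothetical illustration values: Poisson mean 1.5, extra-zero proportion 0.3.
lam, omega = 1.5, 0.3
base = lambda y: poisson_pmf(y, lam)
probs = [zero_modified_pmf(y, omega, base) for y in range(60)]
total = sum(probs)  # numerically 1, up to truncation of the Poisson tail
```

The probability of a zero rises from $f(0)$ to $\omega + (1 - \omega) f(0)$, while every positive count is scaled by $1 - \omega$, so the probabilities still sum to one.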
Farewell and Sprott (1988) discuss an inflated binomial as a mixture model for count data. They also point out the two-population interpretation of this model: in one population one observes only zeros, while in the other one observes counts from a discrete distribution.
As an example of such an interpretation, consider a population which consists of two groups: one of people who are not at risk of developing a certain disease and one of people who are at risk and may develop the disease several times. Of course, such a model should be plausible in a given situation.
Another example is discussed by Lambert (1992). Manufacturing equipment may be in two states: a perfect state in which the machine produces no defects and an imperfect state in which the machine produces a number of mistakes according to a Poisson distribution. She discusses maximum likelihood estimation and testing in the zero-inflated Poisson regression using:

$$
\ln(\lambda) = X\beta \quad (\text{with } \lambda \text{ the mean of the Poisson distribution}), \qquad
\ln\!\left(\frac{\omega}{1 - \omega}\right) = G\gamma,
$$

for covariate matrices $X$ and $G$. Two cases are considered: $\lambda$ and $\omega$ functionally not related, and $\lambda$ and $\omega$ functionally related. She also proves the asymptotic normality of the distribution of the parameter estimates and shows that the likelihood ratio statistic is asymptotically distributed as a $\chi^2$ with appropriate degrees of freedom.
These examples assume that the zero-inflated distribution is appropriate or, to put it differently, that the population considered consists of two subpopulations as described above. This is not always obvious. One would like to see if there is some evidence from the observed data to support such an assumption. To achieve this for the zero-inflated Poisson distribution, a score test is proposed in the next section.
2. A Score Test
A score test for $\omega = 0$ in the inflated Poisson has the advantage that one need not fit the inflated Poisson but only a Poisson, which is the distribution under the null hypothesis.
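Using (1), the density for the inflated Poisson is:

$$
\begin{aligned}
P(Y_i = 0) &= \omega + (1 - \omega) e^{-\lambda_i}, \\
P(Y_i = y_i) &= (1 - \omega) \frac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}, \qquad y_i > 0.
\end{aligned}
\tag{2}
$$

Writing $\theta = \omega/(1 - \omega)$, taken to be constant ($-f(0) \le \theta < \infty$), the log likelihood can be written as:

$$
l(\lambda, \theta; y) = \sum_{i=1}^{n} \left\{ -\log(1 + \theta) + 1_{(y_i = 0)} \log\!\left(\theta + e^{-\lambda_i}\right) + 1_{(y_i > 0)} \left[ -\lambda_i + y_i \log(\lambda_i) - \log(y_i!) \right] \right\}.
$$

With $\ln(\lambda) = X\beta$, the score statistic for testing $\theta = 0$ is:

$$
S(\hat\beta) = \frac{\left( \sum_{i=1}^{n} \dfrac{1_{(y_i = 0)} - e^{-\hat\lambda_i}}{e^{-\hat\lambda_i}} \right)^{2}}{\sum_{i=1}^{n} \dfrac{1 - e^{-\hat\lambda_i}}{e^{-\hat\lambda_i}} - \hat\lambda^{T} X \left[ X^{T} \mathrm{diag}(\hat\lambda) X \right]^{-1} X^{T} \hat\lambda},
\tag{3}
$$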
where $\hat\beta$ and $\hat\lambda_i$ are the estimates of $\beta$ and $\lambda_i$ under the null hypothesis. (See the Appendix for details and, for instance, Cox and Hinkley (1974), pp. 321-325, for a discussion of the score test.)
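If the model contains a constant then

$$
\hat\lambda^{T} X \left[ X^{T} \mathrm{diag}(\hat\lambda) X \right]^{-1} X^{T} \hat\lambda = \sum_{i} \hat\lambda_i = n\bar{y},
$$

the latter equality being true due to the estimating equations under the null hypothesis (see Appendix).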
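The first of these two equalities is stated without proof. One way to see it is sketched below, under the assumption (made here for illustration) that the constant is the first column of $X$, so that $X e_1 = \mathbf{1}$ with $e_1$ the first unit vector; the symbols $e_1$ and $\mathbf{1}$ are notation introduced for this sketch only.

```latex
\begin{aligned}
\hat\lambda &= \mathrm{diag}(\hat\lambda)\,\mathbf{1} = \mathrm{diag}(\hat\lambda)\, X e_1
  \quad\Longrightarrow\quad
  X^{T}\hat\lambda = \left[ X^{T}\mathrm{diag}(\hat\lambda) X \right] e_1 , \\
\hat\lambda^{T} X \left[ X^{T}\mathrm{diag}(\hat\lambda) X \right]^{-1} X^{T}\hat\lambda
  &= \hat\lambda^{T} X e_1 = \hat\lambda^{T}\mathbf{1} = \sum_{i} \hat\lambda_i .
\end{aligned}
```

The same argument applies when the constant sits in any other column of $X$, with $e_1$ replaced by the corresponding unit vector.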
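If one writes $\hat p_{0i} = P(Y_i = 0) = e^{-\hat\lambda_i}$, then the score statistic can be written as:

$$
S(\hat\beta) = \frac{\left( \sum_{i=1}^{n} \dfrac{1_{(y_i = 0)} - \hat p_{0i}}{\hat p_{0i}} \right)^{2}}{\sum_{i=1}^{n} \dfrac{1 - \hat p_{0i}}{\hat p_{0i}} - n\bar{y}}.
\tag{4}
$$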
Under the null hypothesis this statistic will have an asymptotic chi-squared distribution with 1 degree of freedom.
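The computation described in this section can be sketched in a few lines of NumPy. This is a minimal illustration rather than the author's implementation: the function names, the Newton-Raphson fitting loop, and its starting value are assumptions made here. Only the ordinary Poisson log-linear model is fitted, after which the statistic is evaluated in the form (3).

```python
import math
import numpy as np


def fit_poisson_loglinear(X, y, max_iter=100, tol=1e-10):
    """Maximum likelihood fit of ln(lambda) = X beta by Newton-Raphson."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Crude starting value (an assumption of this sketch): least squares on log(y + 0.5).
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(max_iter):
        lam = np.exp(X @ beta)
        score = X.T @ (y - lam)            # estimating equations under the null
        info = X.T @ (lam[:, None] * X)    # X^T diag(lambda) X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def zero_inflation_score_test(X, y):
    """Return (statistic, p_value) for the score statistic in the form (3)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lam = np.exp(X @ fit_poisson_loglinear(X, y))
    p0 = np.exp(-lam)                      # fitted P(Y_i = 0) under the null
    numerator = np.sum(((y == 0) - p0) / p0) ** 2
    info = X.T @ (lam[:, None] * X)
    xt_lam = X.T @ lam
    correction = xt_lam @ np.linalg.solve(info, xt_lam)
    statistic = numerator / (np.sum((1.0 - p0) / p0) - correction)
    # Upper tail of a chi-squared(1) variable: P(chi2_1 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

Here `X` is the covariate matrix (including a column of ones if the model has a constant) and `y` the vector of counts. When `X` contains a constant, the `correction` term equals the sum of the fitted means, so the result agrees with the form (4).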
The term $\hat\lambda^{T} X [X^{T} \mathrm{diag}(\hat\lambda) X]^{-1} X^{T} \hat\lambda$ can be read as $E(Y^{T} X)[\mathrm{var}(X^{T} Y)]^{-1} E(X^{T} Y)$. This can be interpreted as an "F value"-like statistic as it relates to $X^{T} E(Y)$. If $X^{T} E(Y)$ departs much from zero, the second term in the denominator of (3) will have substantial influence.
The statistic $S(\hat\beta)$ can be seen as a goodness-of-fit statistic. It looks at the fit of the zeros but also accounts for the means of the fitted Poisson distribution. This is reasonable, because the question of too many zeros is not only answered by looking at the number of zeros but also by comparing this number with the mean of the observations. A moderate number of zeros and a high mean of the observations can indicate that the number of zeros is too high.
If, instead of the Poisson distribution, the binomial distribution $B(n_i, p_i)$ is used, the same statistic is obtained except for the term $E(Y^{T} X)[\mathrm{var}(X^{T} Y)]^{-1} E(X^{T} Y)$, which will have values according to the binomial distribution; $\hat p_{0i}$ then becomes $(1 - \hat p_i)^{n_i}$. However, under the same conditions under which the binomial can be approximated by the Poisson (small $p_i$, $n_i p_i$ constant, and $n_i$ large), the statistic (4) will be an approximation of the same statistic obtained with the binomial distribution.
3. The case of no covariates
Consider the case where there are $n$ observations, among them $n_0$ zeros, and no covariates. The score statistic for testing whether the Poisson distribution fits the number of zeros well is, in this case:

$$
S = \frac{(n_0 - n\hat p_0)^{2}}{n\hat p_0 (1 - \hat p_0) - n\bar{y}\,\hat p_0^{2}},
\tag{5}
$$

with $\hat p_0 = e^{-\bar{y}}$. The statistic of Rao and Chakravarti is:

$$
\frac{\left( n_0 - n\left( \dfrac{n-1}{n} \right)^{n\bar{y}} \right)^{2}}{n\left( \dfrac{n-1}{n} \right)^{n\bar{y}} + n(n-1)\left( \dfrac{n-2}{n} \right)^{n\bar{y}} - n^{2}\left( \dfrac{n-1}{n} \right)^{2n\bar{y}}}.
$$

This statistic is obtained by conditioning on the sum of the observations.
In a simulation study El-Shaarawi (1985) compared the above two statistics together with the likelihood ratio statistic. This simulation study, using sample sizes of 15 and 50 and a mean of 5, showed that the likelihood ratio statistic gives the closest estimate of the true significance level but has much lower power than the other two. The significance levels of the score statistic and the statistic of Rao and Chakravarti are closer to the true level for a sample size of 50 than for a sample size of 15. El-Shaarawi concludes that the score statistic and the one of Rao and Chakravarti are preferable to the likelihood ratio statistic because of the higher power. The score statistic has the advantage of being easier to compute. Besides this, it can be used in the case where there are covariates.

Table 1
Percentile points of the statistic $S(\hat\beta)$ based on 5,000 samples of size $n$ from a Poisson distribution with mean $\lambda$, and the same points of a $\chi^2(1)$ distribution

                               P.70    P.80    P.90    P.95    P.99
$\chi^2(1)$                    1.07    1.64    2.71    3.84    6.63
n = 100   $\lambda$ = .5       1.13    1.66    2.73    3.86    6.77
          $\lambda$ = 1        1.11    1.68    2.75    3.97    6.87
n = 200   $\lambda$ = .5       1.13    1.37    2.67    3.73    6.52
          $\lambda$ = 1        1.02    1.58    2.56    3.68    6.61
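A simulation of the kind summarised in Table 1 (5,000 Poisson samples of size $n$ with a given mean, percentile points of the no-covariate statistic (5)) can be sketched as follows. The function names, the random seed, and the use of NumPy's default generator are assumptions of this sketch, so the resulting percentile points will only resemble the published ones, not reproduce them.

```python
import numpy as np


def score_statistic_no_covariates(y):
    """Statistic (5), computed for each row of y (one sample per row)."""
    y = np.atleast_2d(y)
    n = y.shape[1]
    n0 = np.sum(y == 0, axis=1)        # number of zeros per sample
    ybar = y.mean(axis=1)
    p0 = np.exp(-ybar)                 # fitted P(Y = 0) under the Poisson
    # Note: a sample consisting only of zeros gives 0/0; this is practically
    # impossible for the sample sizes and means used here.
    return (n0 - n * p0) ** 2 / (n * p0 * (1 - p0) - n * ybar * p0 ** 2)


def simulate_percentiles(n, lam, n_samples=5000, seed=0,
                         probs=(70, 80, 90, 95, 99)):
    """Percentile points of (5) over Poisson samples drawn under the null."""
    rng = np.random.default_rng(seed)
    samples = rng.poisson(lam, size=(n_samples, n))
    return np.percentile(score_statistic_no_covariates(samples), probs)


percentiles = simulate_percentiles(n=100, lam=0.5)
```

Comparing `percentiles` with the $\chi^2(1)$ points 1.07, 1.64, 2.71, 3.84 and 6.63 gives an impression of how well the asymptotic distribution describes the statistic for a given $n$ and mean.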
4. An example
Of 98 human immunodeficiency virus (HIV)-infected men attending the department of internal medicine at the Utrecht University Hospital, the number of times they had a urinary tract infection (number of episodes) was recorded (Hoepelman et al., 1992). Besides this, the immune status of every patient was determined by measuring the CD4+ cell count. Table 2 shows that a lot of patients did not have a urinary tract infection.
Table 2

Number of episodes      0     1     2     3
Number of patients     81     9     7     1
To assess whether there are too many zeros for the data to have arisen from a Poisson distribution, the score statistic (4) can be calculated with

$$
\hat p_{0i} = \exp\!\left( -e^{\hat\beta_0 + \hat\beta_1 \mathrm{CD4}^{+}_{i}} \right).
$$

The outcome of the score statistic is 5.96 (whereas the score statistic without using the covariate CD4+ has an outcome of 15.35, illustrating the importance of the use of covariates), giving evidence that too many zeros are observed for the Poisson to fit the data well.
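The value without the covariate can be checked directly from the frequencies in Table 2, using the no-covariate form (5) of the statistic. The sketch below uses only the standard library; the variable names are choices made here. The statistic with the covariate cannot be recomputed in the same way because the individual CD4+ counts are not listed.

```python
import math

# Table 2: number of episodes -> number of patients.
counts = {0: 81, 1: 9, 2: 7, 3: 1}

n = sum(counts.values())                                  # 98 patients
n0 = counts[0]                                            # patients with no episode
ybar = sum(k * v for k, v in counts.items()) / n          # mean number of episodes
p0 = math.exp(-ybar)                                      # fitted P(Y = 0) under the Poisson

# Score statistic (5) and its chi-squared(1) upper tail probability.
score = (n0 - n * p0) ** 2 / (n * p0 * (1 - p0) - n * ybar * p0 ** 2)
p_value = math.erfc(math.sqrt(score / 2.0))
```

This gives a statistic of roughly 15.3, in line with the 15.35 reported above; the small difference is presumably due to rounding.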
An alternative distribution for the Poisson then is the inflated Poisson. As pointed out, the model then used is that the population can be thought of as consisting of two parts: a proportion $\omega$ consisting of patients not being at risk of developing a urinary tract infection and another part consisting of patients who are at risk of developing a urinary tract infection.
The covariate can be used to model the mean number of episodes of the patients being at risk:

$$
\ln(\lambda) = \beta_0 + \beta_1 \mathrm{CD4}^{+}.
$$

It can also be used to model the "probability" of not being at risk:

$$
\ln\!\left( \frac{\omega}{1 - \omega} \right) = \alpha_0 + \alpha_1 \mathrm{CD4}^{+}.
$$
If one fits an inflated Poisson with both, the log-likelihood is -53.21 on 94 degrees of freedom. Lambert (1992) discusses the fitting procedure, for which iterative methods are needed. The likelihood ratio statistic for testing $\beta_1 = 0$ has an outcome of .101, indicating that $\ln(\lambda)$ can be modeled as a constant: $\ln(\lambda) = \beta_0$. Using this and the same model for the "probability" of not being at risk as above, the log-likelihood is -53.26 on 95 degrees of freedom. The results of this fit are in Table 3.
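The log-likelihood being maximised in such a fit can be written down compactly. The sketch below evaluates it for given parameter values, with a logit model for $\omega$ and a log-linear model for $\lambda$; the function name and argument layout are assumptions made here, and the iterative maximisation itself (see Lambert, 1992) is not shown. Because the individual CD4+ counts are not listed, the fitted values reported here cannot be reproduced from this sketch.

```python
import math
import numpy as np


def zip_loglik(alpha, beta, G, X, y):
    """Inflated Poisson log-likelihood with
    ln(omega / (1 - omega)) = G alpha and ln(lambda) = X beta."""
    G, X, y = (np.asarray(a, dtype=float) for a in (G, X, y))
    omega = 1.0 / (1.0 + np.exp(-(G @ np.asarray(alpha, dtype=float))))
    lam = np.exp(X @ np.asarray(beta, dtype=float))
    log_fact = np.array([math.lgamma(v + 1.0) for v in y])   # log(y_i!)
    # Zeros: log{omega + (1 - omega) exp(-lambda)}.
    zero_part = np.log(omega + (1.0 - omega) * np.exp(-lam))
    # Positive counts: log(1 - omega) + Poisson log-probability.
    positive_part = np.log1p(-omega) - lam + y * np.log(lam) - log_fact
    return float(np.sum(np.where(y == 0, zero_part, positive_part)))
```

For the constant-mean model, `X` is a single column of ones and `G` has a column of ones plus the covariate; passing the negative of this function to a general-purpose optimiser is one possible way to obtain the estimates.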
Table 3
Estimation results

                                              Asymptotic correlations between estimates
Parameter     Estimate    Standard error      $\alpha_0$    $\alpha_1$    $\beta_0$
$\alpha_0$     -.487         .699              1
$\alpha_1$      .007         .003              -.66          1
$\beta_0$      -.094         .317               .64          -.23          1
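[Table fragment; caption and column alignment are not recoverable. Labels: ao; Mean; Third quartile; Range. Values in extraction order: .513, -.544, -, .010, .003, -, .096, .042, 6.42, 5.39]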
ACKNOWLEDGEMENTS
APPENDIX
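With $\ln(\lambda_i) = x_i^{T}\beta$, so that $\partial\lambda_i/\partial\beta_r = \lambda_i x_{ir}$, the first derivatives of the log likelihood are:

$$
\frac{\partial l}{\partial \beta_r} = \sum_{i} \left\{ -1_{(y_i = 0)} \frac{e^{-\lambda_i} \lambda_i x_{ir}}{\theta + e^{-\lambda_i}} + 1_{(y_i > 0)} (y_i - \lambda_i) x_{ir} \right\}, \qquad r = 1, \ldots, p,
\tag{6}
$$

$$
\frac{\partial l}{\partial \theta} = \sum_{i} \left\{ -\frac{1}{1 + \theta} + \frac{1_{(y_i = 0)}}{\theta + e^{-\lambda_i}} \right\}.
\tag{7}
$$

Under the null hypothesis, $\theta = 0$, the estimates $\hat\beta$ follow from the estimating equations

$$
\sum_{i} (y_i - \hat\lambda_i) x_{ir} = 0, \qquad r = 1, \ldots, p,
\tag{8}
$$

so that the score vector is

$$
U^{T}(\hat\beta, 0) = \left( 0, \ldots, 0, \; \sum_{i} \frac{1_{(y_i = 0)} - e^{-\hat\lambda_i}}{e^{-\hat\lambda_i}} \right).
\tag{9}
$$

The second derivatives are:

$$
\begin{aligned}
\frac{\partial^{2} l}{\partial\beta_r \partial\beta_s} &= -\sum_{i} \left\{ 1_{(y_i = 0)} \frac{e^{-\lambda_i} \left[ \theta(1 - \lambda_i) + e^{-\lambda_i} \right]}{(\theta + e^{-\lambda_i})^{2}} + 1_{(y_i > 0)} \right\} \lambda_i x_{ir} x_{is}, \qquad r, s = 1, \ldots, p, \\
\frac{\partial^{2} l}{\partial\beta_r \partial\theta} &= \sum_{i} 1_{(y_i = 0)} \frac{e^{-\lambda_i} \lambda_i x_{ir}}{(\theta + e^{-\lambda_i})^{2}}, \qquad r = 1, \ldots, p, \\
\frac{\partial^{2} l}{\partial\theta^{2}} &= \sum_{i} \left\{ \frac{1}{(1 + \theta)^{2}} - \frac{1_{(y_i = 0)}}{(\theta + e^{-\lambda_i})^{2}} \right\}.
\end{aligned}
$$

Using

$$
E[1_{(y_i = 0)}] = P(y_i = 0) = \frac{\theta + e^{-\lambda_i}}{1 + \theta}
\quad \text{and} \quad
E[1_{(y_i > 0)}] = P(y_i > 0) = \frac{1 - e^{-\lambda_i}}{1 + \theta},
$$

it can be seen that the expected information matrix, $J(\beta, \theta)$, has entries:

$$
\begin{aligned}
J_{r,s} &= \sum_{i} \frac{\theta + e^{-\lambda_i} - \theta\lambda_i e^{-\lambda_i}}{(1 + \theta)(\theta + e^{-\lambda_i})} \, \lambda_i x_{ir} x_{is}, \qquad r, s = 1, \ldots, p, \\
J_{r,p+1} &= -\sum_{i} \frac{e^{-\lambda_i} \lambda_i x_{ir}}{(1 + \theta)(\theta + e^{-\lambda_i})}, \qquad r = 1, \ldots, p, \\
J_{p+1,p+1} &= \sum_{i} \frac{1 - e^{-\lambda_i}}{(1 + \theta)^{2} (\theta + e^{-\lambda_i})},
\end{aligned}
$$

and so $J(\hat\beta, 0)$ has entries

$$
\begin{aligned}
J_{r,s} &= \sum_{i} \hat\lambda_i x_{ir} x_{is}, \qquad r, s = 1, \ldots, p, \\
J_{r,p+1} &= -\sum_{i} \hat\lambda_i x_{ir}, \qquad r = 1, \ldots, p, \\
J_{p+1,p+1} &= \sum_{i} \frac{1 - e^{-\hat\lambda_i}}{e^{-\hat\lambda_i}}.
\end{aligned}
$$

Partition $J(\hat\beta, 0)$ as

$$
J(\hat\beta, 0) = \begin{bmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{bmatrix},
$$

where

$$
J_{11} = X^{T} \mathrm{diag}(\hat\lambda) X; \quad
J_{12} = -X^{T} \hat\lambda; \quad
J_{21} = -\hat\lambda^{T} X; \quad
J_{22} = \sum_{i} \frac{1 - e^{-\hat\lambda_i}}{e^{-\hat\lambda_i}}.
$$

Now denote the inverse of $J(\hat\beta, 0)$ as $C$, which can be partitioned as

$$
C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}.
$$

Due to the structure of $U(\hat\beta, 0)$ only $C_{22}$ is needed:

$$
C_{22} = \left[ J_{22} - J_{21} J_{11}^{-1} J_{12} \right]^{-1}
= \left[ \sum_{i} \frac{1 - e^{-\hat\lambda_i}}{e^{-\hat\lambda_i}} - \hat\lambda^{T} X \left[ X^{T} \mathrm{diag}(\hat\lambda) X \right]^{-1} X^{T} \hat\lambda \right]^{-1}.
$$

The score statistic $U^{T}(\hat\beta, 0)\, C\, U(\hat\beta, 0)$ then reduces to the square of the last element of $U(\hat\beta, 0)$ multiplied by $C_{22}$, which is (3).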
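The partitioned-inverse step can be checked numerically. The sketch below builds $J(\hat\beta, 0)$ for a hypothetical design matrix and hypothetical fitted means (both invented here purely for illustration), inverts the full matrix, and compares its last diagonal element with the closed-form expression for $C_{22}$.

```python
import numpy as np

# Hypothetical design matrix (with a constant) and fitted means, for illustration only.
rng = np.random.default_rng(1)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
lam = np.exp(X @ np.array([0.2, -0.3, 0.1]))

# Blocks of the expected information matrix at theta = 0.
J11 = X.T @ (lam[:, None] * X)                     # X^T diag(lambda) X
J12 = -(X.T @ lam)[:, None]                        # -X^T lambda, as a column
J22 = np.array([[np.sum((1.0 - np.exp(-lam)) / np.exp(-lam))]])
J = np.block([[J11, J12], [J12.T, J22]])

# C22 from the full inverse and from the closed-form expression.
C22_full = np.linalg.inv(J)[-1, -1]
quad_form = (X.T @ lam) @ np.linalg.solve(J11, X.T @ lam)
C22_formula = 1.0 / (J22[0, 0] - quad_form)
```

The two values of $C_{22}$ agree up to floating-point error, and because `X` contains a constant, `quad_form` equals the sum of the fitted means.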