
Homicide example:

1308 individuals were asked: How many people have you known personally that were victims
of homicide?
> homicide=data.frame(nvics=rep(c(0:6),2),
+ race=rep(c("Black","White"),each=7),
+ Freq=c(119,16,12,7,3,2,0,1070,60,14,4,0,0,1))
> xtabs(Freq~race+nvics,data=homicide)

       nvics
race       0    1    2    3    4    5    6
  Black  119   16   12    7    3    2    0
  White 1070   60   14    4    0    0    1

> homicide=transform(homicide,race=relevel(factor(race),"White"))
> homicide

   nvics  race Freq
1      0 Black  119
2      1 Black   16
3      2 Black   12
4      3 Black    7
5      4 Black    3
6      5 Black    2
7      6 Black    0
8      0 White 1070
9      1 White   60
10     2 White   14
11     3 White    4
12     4 White    0
13     5 White    0
14     6 White    1

Poisson loglinear model


> hom.poi=glm(nvics~race,family=poisson(link="log"),weights=Freq,data=homicide)
> summary(hom.poi)

Call:
glm(formula = nvics ~ race, family = poisson(link = "log"), data = homicide,
weights = Freq)

Deviance Residuals:
Min 1Q Median 3Q Max
-14.051 0.000 5.257 6.216 13.306

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.38321 0.09713 -24.54 <2e-16 ***
raceBlack 1.73314 0.14657 11.82 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 962.80 on 10 degrees of freedom
Residual deviance: 844.71 on  9 degrees of freedom
AIC: 1122

Number of Fisher Scoring iterations: 6

> sum(resid(hom.poi,type="pearson")^2)

[1] 2279.873

Notice that $X^2 = 2279.873$, which is large compared to the df $= 1308 - 2 = 1306$. Thus,

$$\frac{X^2}{\text{df}} = \frac{2279.873}{1306} = 1.745692 > 1$$

is an indication of overdispersion. However, we will fit the negative binomial model and calculate the dispersion parameter $D$ for that model to better determine the potential existence of overdispersion.
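The Pearson statistic and the ratio to its degrees of freedom can also be computed by hand, which makes the role of the frequency weights explicit. The following is a minimal sketch that rebuilds the grouped data from the table above; the names `mu`, `X2`, `N`, `p` and `dispersion` are illustrative and do not appear in the original handout.

```r
# Rebuild the grouped data (values copied from the table in this handout).
homicide <- data.frame(
  nvics = rep(0:6, 2),
  race  = factor(rep(c("Black", "White"), each = 7),
                 levels = c("White", "Black")),
  Freq  = c(119, 16, 12, 7, 3, 2, 0, 1070, 60, 14, 4, 0, 0, 1)
)

# Weighted Poisson fit, as in the handout.
hom.poi <- glm(nvics ~ race, family = poisson(link = "log"),
               weights = Freq, data = homicide)

# Pearson statistic: sum over rows of Freq * (y - mu)^2 / mu.
mu <- fitted(hom.poi)
X2 <- sum(homicide$Freq * (homicide$nvics - mu)^2 / mu)

# Individual-level degrees of freedom N - p (not the df that glm reports).
N <- sum(homicide$Freq)        # number of respondents
p <- length(coef(hom.poi))     # number of estimated coefficients
dispersion <- X2 / (N - p)

print(c(X2 = X2, df = N - p, dispersion = dispersion))
```

A ratio well above 1 points toward overdispersion relative to the Poisson assumption that the variance equals the mean.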
Remark: The degrees of freedom are actually $N - p = 1308 - 2 = 1306$ and not 9 as reported by the software. This is because each row is weighted by its frequency instead of the data being entered as 1308 rows; glm counts only the 11 rows with nonzero frequency, which gives 11 - 2 = 9. For example, a Black respondent who knew 0 victims of homicide is weighted by the frequency 119, whereas individual-level data would contain 119 rows (one per respondent) for Black respondents who knew 0 victims. However, the model fit remains unchanged in all other respects.

> homicide2=homicide[rep(1:14,homicide$Freq),]
> hom.poi2=glm(nvics~race,family=poisson(link="log"),data=homicide2)
> hom.poi2$df.residual

[1] 1306
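As a check on the remark about degrees of freedom, the weighted and expanded fits can be compared side by side. This is a sketch under the assumption that the data are rebuilt exactly as in the table above; `same_coef` and `rows_used` are illustrative names, not part of the original handout.

```r
# Rebuild the grouped data (values copied from the table in this handout).
homicide <- data.frame(
  nvics = rep(0:6, 2),
  race  = factor(rep(c("Black", "White"), each = 7),
                 levels = c("White", "Black")),
  Freq  = c(119, 16, 12, 7, 3, 2, 0, 1070, 60, 14, 4, 0, 0, 1)
)

# Fit 1: one row per (race, nvics) cell, weighted by frequency.
hom.poi <- glm(nvics ~ race, family = poisson(link = "log"),
               weights = Freq, data = homicide)

# Fit 2: one row per respondent (each row repeated Freq times).
homicide2 <- homicide[rep(seq_len(nrow(homicide)), homicide$Freq), ]
hom.poi2  <- glm(nvics ~ race, family = poisson(link = "log"),
                 data = homicide2)

# The coefficient estimates agree up to numerical tolerance ...
same_coef <- isTRUE(all.equal(coef(hom.poi), coef(hom.poi2),
                              tolerance = 1e-5))

# ... but the residual degrees of freedom differ.
rows_used <- sum(homicide$Freq > 0)   # rows with nonzero weight
print(same_coef)
print(c(weighted = hom.poi$df.residual,
        expanded = hom.poi2$df.residual,
        rows_minus_p = rows_used - length(coef(hom.poi))))
```

The weighted fit bases its degrees of freedom on the rows with nonzero weight, while the expanded fit bases them on the number of respondents.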

Negative Binomial loglinear model


> library(MASS)
> hom.nb=glm.nb(nvics~race,weights=Freq,data=homicide)
> summary(hom.nb)

Call:
glm.nb(formula = nvics ~ race, data = homicide, weights = Freq,
init.theta = 0.2023119205, link = log)

Deviance Residuals:
Min 1Q Median 3Q Max
-12.754 0.000 2.086 3.283 9.114

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.3832 0.1172 -20.335 < 2e-16 ***
raceBlack 1.7331 0.2385 7.268 3.66e-13 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(0.2023) family taken to be 1)

    Null deviance: 471.57 on 10 degrees of freedom
Residual deviance: 412.60 on  9 degrees of freedom
AIC: 1001.8

Number of Fisher Scoring iterations: 1

Theta: 0.2023
Std. Err.: 0.0409

2 x log-likelihood: -995.7980

The Wald and likelihood-ratio CIs for $\exp(\beta)$ under both models are:
> # Wald CI
> confint.default(hom.poi)

2.5 % 97.5 %
(Intercept) -2.573577 -2.192840
raceBlack 1.445877 2.020412

> exp(confint.default(hom.poi))

2.5 % 97.5 %
(Intercept) 0.0762623 0.1115994
raceBlack 4.2455738 7.5414329

> exp(confint.default(hom.nb))

2.5 % 97.5 %
(Intercept) 0.07332043 0.1160771
raceBlack 3.54571025 9.0299848

> # Likelihood CI
> confint(hom.poi)

2.5 % 97.5 %
(Intercept) -2.579819 -2.198699
raceBlack 1.443698 2.019231

> exp(confint(hom.poi))

2.5 % 97.5 %
(Intercept) 0.0757877 0.1109474
raceBlack 4.2363330 7.5325339

> exp(confint(hom.nb))

2.5 % 97.5 %
(Intercept) 0.07305976 0.1157258
raceBlack 3.57784560 9.1316443
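The Wald intervals above can be reproduced by hand from the estimates and standard errors in the two summary tables, which shows why the negative binomial interval is wider. The sketch below uses the rounded values printed in this handout, so the last digits may differ slightly from the `confint.default` output; the helper `wald_ci` is a hypothetical name introduced for illustration.

```r
# Estimates and standard errors for raceBlack, copied (rounded) from the
# summary() output of the Poisson and negative binomial fits.
est.poi <- 1.73314; se.poi <- 0.14657
est.nb  <- 1.7331;  se.nb  <- 0.2385

# Wald interval on the log scale: estimate +/- z * SE, then exponentiate
# to get an interval for the ratio of means (Black vs. White).
wald_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  exp(est + c(-1, 1) * z * se)
}

ci.poi <- wald_ci(est.poi, se.poi)
ci.nb  <- wald_ci(est.nb,  se.nb)

print(rbind(poisson = ci.poi, negbin = ci.nb))
```

Both models give the same point estimate for the ratio of means, but the larger standard error under the negative binomial model widens the interval.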

The dispersion parameter is $D = 1/\theta$ for R's glm.nb function, so $\hat{D} = 1/0.2023 = 4.94$.

Also, note that the standard error of $\hat{\theta}$ is small (0.0409), so we can confidently say that $D > 0$ and that a negative binomial is a better fit for the count data.
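To see what a dispersion of this size means, it may help to write out the variance function that glm.nb uses, with $D = 1/\theta$, and plug in the fitted means. This is a worked sketch based on the rounded estimates printed above, so the numbers are approximate.

```latex
% Negative binomial variance in the glm.nb parametrization, with D = 1/theta
\[
\operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{\theta} = \mu + D\mu^{2}
\]

% White respondents: fitted mean and implied variance
\[
\hat{\mu}_{W} = e^{-2.3832} \approx 0.092, \qquad
\widehat{\operatorname{Var}}(Y) \approx 0.092 + 4.94\,(0.092)^{2} \approx 0.134
\]

% Black respondents: fitted mean and implied variance
\[
\hat{\mu}_{B} = e^{-2.3832 + 1.7331} \approx 0.522, \qquad
\widehat{\operatorname{Var}}(Y) \approx 0.522 + 4.94\,(0.522)^{2} \approx 1.87
\]
```

Under the Poisson model the variance would equal the mean in each group, so the fitted negative binomial variances are noticeably larger, most visibly for the group with the larger mean.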
