
STAT 504 Assessment #12

Due Monday, 4/8, before midnight

1. An experiment analyzes imperfection rates for two processes used to fabricate silicon wafers for computer chips. For Treatment A applied to 10 wafers, the numbers of imperfections are 8, 7, 6, 6, 3, 4, 7, 2, 3, 4. For Treatment B applied to 10 wafers, the numbers of imperfections are 9, 9, 8, 14, 8, 13, 11, 5, 7, 7. Treat the counts as independent Poisson variates having means µ𝐴 and µ𝐵.

a) Fit the Poisson regression model log(μ) = α + β x, where x = 1 for treatment A and x = 0 for
treatment B. Show that β = log µ𝐴 − log µ𝐵 . Interpret the estimate.

The estimated model coefficients from the R code given below yield the fitted model:
log(μ̂) = 2.2083 − 0.5988 x

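Interpretation of the coefficient
From the model, setting x = 1 and x = 0 gives the first two equations below; subtracting the second from the first yields β: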
𝑥 = 1 → 𝐿𝑜𝑔(𝜇𝐴 ) = 𝛼 + 𝛽
𝑥 = 0 → 𝐿𝑜𝑔(𝜇𝐵 ) = 𝛼
𝛽 = 𝐿𝑜𝑔(𝜇𝐴 ) − 𝐿𝑜𝑔(𝜇𝐵 )

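Since β̂ = log µ̂𝐴 − log µ̂𝐵 = −0.5988, the log of the mean number of imperfections is 0.5988 lower under Treatment A than under Treatment B (equivalently, it increases by 0.5988 when the treatment changes from A to B). On the mean scale, µ̂𝐴/µ̂𝐵 = exp(−0.5988) ≈ 0.549, so Treatment A is estimated to produce about 55% as many imperfections per wafer as Treatment B (µ̂𝐴 = 5.0, µ̂𝐵 = 9.1).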
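As a check on the fitted coefficients: with a single binary predictor, the Poisson maximum likelihood estimates have a closed form, α̂ = log ȳ𝐵 and β̂ = log(ȳ𝐴/ȳ𝐵). The minimal R sketch below (names such as alpha_hat and rate_ratio are illustrative, not part of the assignment code) computes them directly and compares them with glm() using the same coding x = 1 for A.

```r
# Two-group Poisson regression: closed-form MLEs vs. glm()
A <- c(8, 7, 6, 6, 3, 4, 7, 2, 3, 4)       # Treatment A counts
B <- c(9, 9, 8, 14, 8, 13, 11, 5, 7, 7)    # Treatment B counts

alpha_hat  <- log(mean(B))                 # log(mu_B), since x = 0 for B
beta_hat   <- log(mean(A)) - log(mean(B))  # log(mu_A) - log(mu_B)
rate_ratio <- exp(beta_hat)                # mu_A / mu_B

# Same model fitted by glm() with x = 1 for A, x = 0 for B
y   <- c(A, B)
x   <- rep(c(1, 0), each = 10)
fit <- glm(y ~ x, family = poisson(link = "log"))

round(c(alpha_hat, beta_hat, rate_ratio), 4)   # 2.2083 -0.5988 0.5495
coef(fit)
```

The hand-computed values and coef(fit) should agree to several decimal places, which confirms that β is exactly the log of the ratio of the two group means.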

b) Test H0 : µ𝐴 − µ𝐵 = 0 using either a Wald test or a likelihood ratio test from your SAS or R
output from the Poisson regression model you fitted. Interpret.

This null hypothesis is equivalent to testing the significance of the coefficient β, because β = log µ𝐴 − log µ𝐵 equals 0 exactly when µ𝐴 = µ𝐵, that is:
Ho: 𝛽 = 0 (or equivalently µ𝐴 − µ𝐵 = 0)
Ha: 𝛽 ≠ 0 (or equivalently µ𝐴 − µ𝐵 ≠ 0)

Test statistic: likelihood ratio test (LRT), comparing the fitted model with the intercept-only model (1 df)

Conclusion:
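Since the p-value = 0.0005053 < 0.05, there is strong evidence against the null hypothesis. Therefore, we reject H0 and conclude that the mean numbers of imperfections for Treatments A and B differ, with Treatment A having the lower mean (sample means 5.0 vs. 9.1 imperfections per wafer).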
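The LRT p-value can be reproduced by hand. Under H0 both groups share the common mean ȳ = 7.05; under Ha each group has its own sample mean; and G² = 2(ℓ_full − ℓ_null) is compared with a χ² distribution on 1 df. The sketch also computes the Wald z for comparison, using the standard result that SE(β̂) = sqrt(1/ΣA + 1/ΣB) for this two-group model. Variable names (mu_full, G2, se_beta, ...) are illustrative only.

```r
A <- c(8, 7, 6, 6, 3, 4, 7, 2, 3, 4)
B <- c(9, 9, 8, 14, 8, 13, 11, 5, 7, 7)
y <- c(A, B)

# Fitted means under each hypothesis
mu_full <- rep(c(mean(A), mean(B)), each = 10)   # separate means (Ha)
mu_null <- rep(mean(y), 20)                      # common mean (H0)

# Poisson log-likelihoods
ll_full <- sum(dpois(y, mu_full, log = TRUE))
ll_null <- sum(dpois(y, mu_null, log = TRUE))

# Likelihood ratio statistic and p-value (1 df)
G2    <- 2 * (ll_full - ll_null)
p_lrt <- pchisq(G2, df = 1, lower.tail = FALSE)

# Wald test of beta = 0 for comparison
beta_hat <- log(mean(A) / mean(B))
se_beta  <- sqrt(1 / sum(A) + 1 / sum(B))
z        <- beta_hat / se_beta
p_wald   <- 2 * pnorm(-abs(z))

c(G2 = G2, p_lrt = p_lrt, z = z, p_wald = p_wald)
```

G² comes out near 12.10 with p ≈ 0.0005, matching the p-value quoted above; the Wald test (z ≈ −3.40) leads to the same conclusion.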

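#HW12-P1
A = c(8,7,6,6,3,4,7,2,3,4)      # imperfection counts, Treatment A
B = c(9,9,8,14,8,13,11,5,7,7)   # imperfection counts, Treatment B
y = c(A, B)
# x = 1 for Treatment A, x = 0 for Treatment B (level "0" is the reference)
x = factor(rep(c(1, 0), each = 10), levels = c(0, 1))

# Full model: log(mu) = alpha + beta*x
fit1 = glm(y ~ x, family = poisson(link = "log"))
summary(fit1)   # Wald z-tests for alpha and beta

# Null model (common mean): log(mu) = alpha
fit2 = glm(y ~ 1, family = poisson(link = "log"))
summary(fit2)

# Likelihood ratio test of H0: beta = 0 (fit2 is nested in fit1)
# install.packages("lmtest")   # run once if lmtest is not installed
library(lmtest)
lrtest(fit2, fit1)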

2. The data below were reported by Laird and Olivier (1981) on the survival of patients after heart-valve replacement surgery. Varying numbers of patients fell into the two age categories (Under 55 and 55+) and the two types of heart valve (Aortic and Mitral). They were followed for different lengths of time, recorded in days in the Exposure column, and the last column gives the total number of deaths for each combination of age and valve type.

Age        Type     Exposure (days)   Deaths
Under 55   Aortic   1259              4
Under 55   Mitral   2082              1
55+        Aortic   1417              7
55+        Mitral   1647              9

a) Under a saturated model we can estimate the mean death rates directly. Let λ1 be the
mean death rate for individuals (Under55, Aortic) combination. Just looking at the table
(even without running the SAS or R code), do you know the estimate of this value? When
you take the natural log of this estimate, which parameter estimate in your model do you
expect to get?

We can compute the mean death rate directly from the table by dividing the number of recorded deaths by the number of exposure days:

$$\hat{\lambda}_1 = \text{Death rate(Under 55, Aortic)} = \frac{4}{1259} \approx 0.00318 \text{ deaths per exposure day}$$

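Defining the predictors as

$$x_{1i} = \begin{cases} 0 & \text{age under 55} \\ 1 & \text{age 55 or older} \end{cases} \qquad x_{2i} = \begin{cases} 0 & \text{heart valve type = Aortic} \\ 1 & \text{heart valve type = Mitral} \end{cases}$$

and

$$x_{3i} = x_{1i} x_{2i}$$

as their interaction term,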

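we can construct the saturated model as

$$\log(\mu_i) = \log(t_i) + \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}$$

where $\mu_i$ is the expected number of deaths in the i-th age-by-valve group, whose patients were followed for a total of $t_i$ exposure days, so that $\mu_i/t_i$ is the mean death rate of that group.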

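For individuals (Under 55, Aortic), $x_{1i} = x_{2i} = x_{3i} = 0$, so the model reduces to $\log(\mu_i/t_i) = \beta_0$. Because the saturated model reproduces the observed rates exactly, the estimate of $\beta_0$ is the natural log of $\hat{\lambda}_1$:

$$\hat{\beta}_0 = \ln\left(\frac{4}{1259}\right) \approx -5.7518$$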
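Part (a) can be checked numerically. This minimal R sketch (the vector names and the 0/1 columns x1, x2 are illustrative, following the coding x1 = 1 for 55+, x2 = 1 for Mitral) computes the observed death rate for every cell and confirms that the intercept of the saturated fit equals ln(4/1259).

```r
# Rows in table order: (U55,Aortic), (U55,Mitral), (55+,Aortic), (55+,Mitral)
cells    <- c("Under 55, Aortic", "Under 55, Mitral", "55+, Aortic", "55+, Mitral")
exposure <- c(1259, 2082, 1417, 1647)   # exposure days
deaths   <- c(4, 1, 7, 9)
x1 <- c(0, 0, 1, 1)                     # 1 = age 55+
x2 <- c(0, 1, 0, 1)                     # 1 = Mitral valve

# Observed death rates (deaths per exposure day) and their logs
rate     <- deaths / exposure
log_rate <- log(rate)
data.frame(cells, rate = signif(rate, 3), log_rate = round(log_rate, 4))

# Saturated model: its intercept should equal the baseline log-rate
sat <- glm(deaths ~ x1 * x2 + offset(log(exposure)), family = poisson(link = "log"))
coef(sat)[1]
```

The first row gives a rate of about 0.00318 deaths per exposure day and a log-rate of about −5.7518, which is exactly the intercept of the saturated fit.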

b) Examine the Wald statistics of the saturated model output. Which predictors are
significant? Interpret the parameters of this model.

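For testing the significance of a parameter 𝜃, the hypotheses are:

𝐻𝑜 : 𝜃 = 0
𝐻𝑎 : 𝜃 ≠ 0

and the Wald test statistic for 𝜃 is

$$Z = \frac{\hat{\theta} - \theta_0}{\widehat{SE}(\hat{\theta})}$$

where $\hat{\theta}$ is the maximum likelihood estimate of 𝜃, $\theta_0 = 0$ is its value under 𝐻𝑜, and $\widehat{SE}(\hat{\theta})$ is the estimated standard error of $\hat{\theta}$ (the square root of the corresponding diagonal element of the inverse Fisher information). Under 𝐻𝑜, Z is approximately standard normal. When the saturated Poisson model is fitted in R, summary() reports the Z value and its two-sided p-value for each coefficient; these are listed in the last two columns (z value and Pr(>|z|)) of the R results: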

According to the above results, the coefficients 𝛽1, 𝛽2, 𝛽3 have p-values 0.4813, 0.0911 and 0.1046, respectively. Since all three p-values are greater than 0.05, none of the predictors is significant at the 5% level; only 𝛽2 would be significant at the less strict 10% level.
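The Wald statistics can also be reproduced by hand. In a saturated Poisson model the fitted counts equal the observed counts, so each coefficient is a contrast of observed log-rates and its standard error is the square root of the sum of 1/deaths over the cells in that contrast. This is a sketch under the same illustrative 0/1 coding (x1 = 1 for 55+, x2 = 1 for Mitral); se_by_hand and p_by_hand are hypothetical names.

```r
deaths   <- c(4, 1, 7, 9)              # rows: (U55,A), (U55,M), (55+,A), (55+,M)
exposure <- c(1259, 2082, 1417, 1647)
x1 <- c(0, 0, 1, 1)                    # 1 = age 55+
x2 <- c(0, 1, 0, 1)                    # 1 = Mitral valve
sat <- glm(deaths ~ x1 * x2 + offset(log(exposure)), family = poisson(link = "log"))

# Coefficient table without the intercept: Estimate, Std. Error, z value, Pr(>|z|)
wald <- summary(sat)$coefficients[-1, ]

# SE = sqrt(sum of 1/deaths) over the cells entering each contrast
se_by_hand <- c(sqrt(1/4 + 1/7),               # beta1: (55+,A) vs (U55,A)
                sqrt(1/4 + 1/1),               # beta2: (U55,M) vs (U55,A)
                sqrt(1/4 + 1/1 + 1/7 + 1/9))   # beta3: all four cells
z_by_hand <- wald[, "Estimate"] / se_by_hand
p_by_hand <- 2 * pnorm(-abs(z_by_hand))

cbind(wald[, c("z value", "Pr(>|z|)")], z_by_hand, p_by_hand)
```

The p-values agree with 0.4813, 0.0911 and 0.1046. The large standard error of β2 comes from the single death in the (Under 55, Mitral) cell, which is one reason these tests have little power.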

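Interpretation of the parameters

Recalling

$$x_{1i} = \begin{cases} 0 & \text{age under 55} \\ 1 & \text{age 55 or older} \end{cases} \qquad x_{2i} = \begin{cases} 0 & \text{heart valve type = Aortic} \\ 1 & \text{heart valve type = Mitral} \end{cases}$$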

and the saturated model:

$$\log\left(\frac{\mu_i}{t_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}$$

when $x_{1i} = x_{2i} = x_{3i} = 0$:

$$\log\left(\frac{\mu_0}{t_0}\right) = \beta_0$$

β0: log of the death rate for individuals aged under 55 with an Aortic valve.

For $x_{1i} = 1,\ x_{2i} = x_{3i} = 0$ (55+, Aortic):

$$\log\left(\frac{\mu_1}{t_1}\right) = \beta_0 + \beta_1$$

$$\beta_1 = \log\left(\frac{\mu_1}{t_1}\right) - \log\left(\frac{\mu_0}{t_0}\right) = \log\left(\frac{\mu_1/t_1}{\mu_0/t_0}\right)$$

β1: log of the ratio of the death rate for (55+, Aortic) individuals to that for (Under 55, Aortic) individuals.
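For $x_{1i} = 0,\ x_{2i} = 1,\ x_{3i} = 0$ (Under 55, Mitral):

$$\log\left(\frac{\mu_2}{t_2}\right) = \beta_0 + \beta_2$$

$$\beta_2 = \log\left(\frac{\mu_2}{t_2}\right) - \log\left(\frac{\mu_0}{t_0}\right) = \log\left(\frac{\mu_2/t_2}{\mu_0/t_0}\right)$$

β2: log of the ratio of the death rate for (Under 55, Mitral) individuals to that for (Under 55, Aortic) individuals.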

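For $x_{1i} = x_{2i} = x_{3i} = 1$ (55+, Mitral):

$$\log\left(\frac{\mu_3}{t_3}\right) = \beta_0 + \beta_1 + \beta_2 + \beta_3$$

Substituting for β0, β1, β2 from above:

$$\beta_3 = \log\left(\frac{\mu_3}{t_3}\right) - \log\left(\frac{\mu_0}{t_0}\right) - \log\left(\frac{\mu_1/t_1}{\mu_0/t_0}\right) - \log\left(\frac{\mu_2/t_2}{\mu_0/t_0}\right)$$

$$\beta_3 = \log\left(\frac{(\mu_3/t_3)/(\mu_0/t_0)}{\left(\frac{\mu_1/t_1}{\mu_0/t_0}\right)\left(\frac{\mu_2/t_2}{\mu_0/t_0}\right)}\right)$$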

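β3 is best read as an interaction on the rate-ratio scale. Simplifying the expression for β3 above,

$$\beta_3 = \log\left(\frac{(\mu_3/t_3)/(\mu_2/t_2)}{(\mu_1/t_1)/(\mu_0/t_0)}\right),$$

so β3 is the difference between the log rate ratio for age (55+ vs. Under 55) among Mitral-valve patients and the same log rate ratio among Aortic-valve patients, with relative death rates defined with (Under 55, Aortic) individuals as the baseline. Equivalently, it is the change in the log rate ratio for valve type (Mitral vs. Aortic) when the age group changes from under 55 to 55+. If β3 = 0, the effect of age on the death rate is the same for both valve types (no interaction).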
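The derivation and interpretation of β0 to β3 can be checked numerically, since in the saturated model each coefficient is an explicit function of the four observed rates. The sketch below (lam0 to lam3, b0 to b3 and the 0/1 columns x1, x2 are illustrative names following the text's μk/tk notation) computes the coefficients by hand, compares them with glm(), and confirms that exp(β3) equals the ratio of the age rate ratios for Mitral and Aortic patients.

```r
# Observed death rates in the text's notation (mu_k / t_k)
lam0 <- 4 / 1259   # Under 55, Aortic (baseline)
lam1 <- 7 / 1417   # 55+, Aortic
lam2 <- 1 / 2082   # Under 55, Mitral
lam3 <- 9 / 1647   # 55+, Mitral

# Saturated-model coefficients from the derivation
b0 <- log(lam0)
b1 <- log(lam1 / lam0)
b2 <- log(lam2 / lam0)
b3 <- log((lam3 / lam0) / ((lam1 / lam0) * (lam2 / lam0)))

# Rate ratios for age (55+ vs Under 55) within each valve type
rr_age_aortic <- lam1 / lam0
rr_age_mitral <- lam3 / lam2

# Cross-check against glm() with the same 0/1 coding
deaths   <- c(4, 1, 7, 9)              # rows: (U55,A), (U55,M), (55+,A), (55+,M)
exposure <- c(1259, 2082, 1417, 1647)
x1 <- c(0, 0, 1, 1)                    # 1 = age 55+
x2 <- c(0, 1, 0, 1)                    # 1 = Mitral valve
sat <- glm(deaths ~ x1 * x2 + offset(log(exposure)), family = poisson(link = "log"))

rbind(by_hand = c(b0, b1, b2, b3), glm = unname(coef(sat)))
c(exp_b3 = exp(b3), ratio_of_RR = rr_age_mitral / rr_age_aortic)
```

With these data the estimated age rate ratio is about 1.55 for Aortic and about 11.4 for Mitral patients, so exp(β̂3) ≈ 7.3. Given the small counts and the Wald p-value of 0.1046, this interaction estimate is very imprecise.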

c) Why did we use the offset in this model?

Since deaths were recorded over different lengths of follow-up (Exposure days), the death counts need to be normalized before they can be compared across groups. The normalized death rate is the total number of deaths divided by the associated Exposure days.

Modeling the log of this rate as a linear function of the predictors leads to the offset model:

$$\log\left(\frac{\mu_i}{t_i}\right) = \log(\mu_i) - \log(t_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}$$

or equivalently,

$$\log(\mu_i) = \log(t_i) + \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}$$

where $\log(t_i)$ is the offset of the model: a known term whose coefficient is fixed at 1 rather than estimated. The response is still the Poisson count of deaths, but the coefficients now describe death rates.
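To see concretely that the offset turns the model into a model for rates, the sketch below fits the main-effects model twice: once with exposure in days, and once with exposure converted to years by dividing by 365.25 (an illustrative assumption; the data are in days). Because log(t/365.25) = log(t) − log(365.25), only the intercept (the log baseline rate) should change, by log(365.25), while the age and valve coefficients, which are log rate ratios, stay the same.

```r
deaths   <- c(4, 1, 7, 9)
exposure <- c(1259, 2082, 1417, 1647)   # exposure in days, as in the table
age   <- factor(c("55-", "55-", "55+", "55+"), levels = c("55-", "55+"))
valve <- factor(c("Aortic", "Mitral", "Aortic", "Mitral"))

# Main-effects model with exposure in days
m_days  <- glm(deaths ~ age + valve + offset(log(exposure)),
               family = poisson(link = "log"))
# Same model with exposure in years (365.25 days/year: illustrative assumption)
m_years <- glm(deaths ~ age + valve + offset(log(exposure / 365.25)),
               family = poisson(link = "log"))

rbind(days = coef(m_days), years = coef(m_years))
coef(m_years)[1] - coef(m_days)[1]      # equals log(365.25) up to rounding
```

Without the offset, the coefficients would instead describe raw death counts, which depend on how long each group happened to be followed.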

d) What would be your criticism of this model, if any? Do you think that main effects model
would be better?

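Criticism:
The saturated model has as many parameters (4) as observations (4 age-by-valve cells), so it reproduces the observed rates exactly, leaves no residual degrees of freedom, and cannot be checked for goodness of fit. The death counts are also small (1 to 9 per cell), which makes the standard errors large, especially for β2 and β3. Finally, the interaction term β3 is harder to interpret than the main effects, so for the sake of parsimony and interpretation it is worth asking whether β3 can be dropped.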

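To decide whether the main-effects model is adequate, we compare it with the saturated model using a likelihood ratio test. Because the alternative model is saturated, this LRT is the same as the deviance goodness-of-fit test of the main-effects model:

Ho: β3 = 0 (main-effects model adequate: $\log(\mu_i) = \log(t_i) + \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}$)

Ha: β3 ≠ 0 (saturated model needed: $\log(\mu_i) = \log(t_i) + \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}$)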

The results for fitting the saturated model are reported in part b, and the results for fitting the main-effects model are given in the following table:

Test statistic: likelihood ratio test (LRT) and AIC


The result of the LRT is given in the following table:

where −8.1747 and −6.5635 are the log-likelihood values for the main-effects and saturated models, which have 3 and 4 estimated parameters (the #Df column), respectively. Since the p-value = 0.0726 is greater than 0.05, we fail to reject the null hypothesis: there is no significant evidence that the interaction is needed, so the main-effects model is adequate at the 5% level.

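Note that AIC(main effects) = 22.349 > AIC(saturated) = 21.127, so AIC slightly prefers the saturated model even though the LRT does not reject the main-effects model. The two criteria are not contradictory: since AIC = −2ℓ + 2k, adding β3 lowers AIC whenever the LR statistic exceeds 2 (the penalty for one extra parameter), whereas the 5% LRT requires it to exceed the χ²₁ critical value of about 3.84. Here G² = 2(−6.5635 + 8.1747) = 3.22 falls between these two thresholds, so the evidence for the interaction is weak and the choice between the two models is borderline.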
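The LRT and AIC values above can be reproduced from the two reported log-likelihoods (−8.1747 with 3 parameters for the main-effects model, −6.5635 with 4 for the saturated model) using AIC = −2ℓ + 2k. This sketch only redoes that arithmetic; it does not refit the models, and the helper aic() is a hypothetical name.

```r
# Log-likelihoods and parameter counts as reported in the text
ll_main <- -8.1747; k_main <- 3
ll_sat  <- -6.5635; k_sat  <- 4

# Likelihood ratio statistic and p-value
G2    <- 2 * (ll_sat - ll_main)
p_lrt <- pchisq(G2, df = k_sat - k_main, lower.tail = FALSE)

# AIC = -2 * logLik + 2 * (number of parameters)
aic <- function(ll, k) -2 * ll + 2 * k

c(G2 = G2, p = p_lrt, AIC_main = aic(ll_main, k_main), AIC_sat = aic(ll_sat, k_sat))

# AIC favours the larger model when G2 > 2; the 5% LRT needs G2 > this value
qchisq(0.95, df = 1)
```

G² = 3.2224 lies between 2 and about 3.84, which is exactly why AIC favours the saturated model while the 5% LRT keeps the main-effects model.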

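#HW12-P2
# One row per age-by-valve cell; exposure in days, death = number of deaths.
# Explicit levels make "55-" (under 55) the reference level; the default
# alphabetical ordering of "55-"/"55+" can depend on the locale.
heart = data.frame(
  age        = factor(c("55-", "55-", "55+", "55+"), levels = c("55-", "55+")),
  heartvalve = factor(c("Aortic", "Mitral", "Aortic", "Mitral")),
  exposure   = c(1259, 2082, 1417, 1647),
  death      = c(4, 1, 7, 9)
)
heart

# Saturated model: main effects + interaction, log(exposure) as offset
fit1 = glm(death ~ offset(log(exposure)) + age*heartvalve,
           family = poisson(link = "log"), data = heart)
summary(fit1)

# Main-effects model
fit2 = glm(death ~ offset(log(exposure)) + age + heartvalve,
           family = poisson(link = "log"), data = heart)
summary(fit2)

# Likelihood ratio test: main effects (fit2) vs. saturated (fit1)
# install.packages("lmtest")   # run once if lmtest is not installed
library(lmtest)
lrtest(fit2, fit1)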