ST3241 Categorical Data Analysis I

Semester II, 2010-2011

Solution to Tutorial 2
1. (a) Belief in afterlife (Y) is the response variable and Race (X) is the explanatory variable.
(b) To measure the association, we can use the odds ratio. The sample odds ratio is
$$\hat{\theta} = \frac{621 \times 42}{89 \times 239} = 1.2262.$$
The sample odds ratio is greater than one, indicating that the estimated odds of believing in
afterlife are higher for whites than for blacks. However, the estimate is close to one and may
not be significantly different from 1. As alternative measures of association, the sample
relative risk is p1/p2 = 1.0629 and the sample difference in proportions is
p1 − p2 = 0.0427.
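The three measures quoted in part (b) can be checked with a short sketch. The four counts are the ones that appear in the odds-ratio formula; the row and column labels in the comments are inferred from the solution text rather than taken from a printed table.

```python
# Counts read off the odds-ratio formula in part (b).
# Labels (white/black, believe/do not believe) are inferred, not quoted.
n11, n12 = 621, 239   # whites: believe, do not believe
n21, n22 = 89, 42     # blacks: believe, do not believe

# Conditional proportions of believers within each race.
p1 = n11 / (n11 + n12)
p2 = n21 / (n21 + n22)

odds_ratio = (n11 * n22) / (n12 * n21)   # cross-product ratio
relative_risk = p1 / p2
diff = p1 - p2

print(round(odds_ratio, 4), round(relative_risk, 4), round(diff, 4))
```

Rounded to four decimals, the values should agree with 1.2262, 1.0629 and 0.0427.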
(c) A 95% confidence interval for the odds ratio θ can be computed from the large-sample
confidence interval for the log odds ratio. Here the sample log odds ratio is log(θ̂) = 0.2039 and
the estimated standard error is
$$\hat{\sigma}(\log \hat{\theta}) = \sqrt{\frac{1}{621} + \frac{1}{239} + \frac{1}{89} + \frac{1}{42}} = 0.20209.$$
Therefore, the 95% C.I. for log θ is

0.2039 ± 1.96 × 0.20209 = (−0.1922, 0.6000).

Therefore, the 95% C.I. for θ is

$$(e^{-0.1922},\ e^{0.6000}) = (0.8251,\ 1.8221).$$

The confidence interval contains 1, so the true odds ratio is not significantly different
from 1 at the 5% level, and there is not enough evidence that belief in afterlife depends
on race.

Similarly, for the relative risk, the sample log relative risk is log(p1/p2) = 0.0610
and its estimated standard error is 0.0636. Therefore, the 95% C.I. for the log relative risk is
0.0610 ± 1.96 × 0.0636 = (−0.0638, 0.1857), and the 95% C.I. for the relative risk is

$$(e^{-0.0638},\ e^{0.1857}) = (0.9382,\ 1.2041).$$

The 95% confidence interval for the difference in proportions is

0.0427 ± 1.96 × 0.0435 = (−0.0426, 0.1281).
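The three Wald intervals in part (c) follow the same pattern: estimate ± z × standard error, with exponentiation for the two ratio measures. The sketch below assumes the usual large-sample standard-error formulas (the solution states the one for the log odds ratio explicitly and only reports the numerical values for the other two); the helper name `wald` is illustrative.

```python
import math

# Counts from the odds-ratio formula; labels are inferred from the text.
n11, n12, n21, n22 = 621, 239, 89, 42
n1, n2 = n11 + n12, n21 + n22
p1, p2 = n11 / n1, n21 / n2
z = 1.96  # normal quantile for a 95% interval

def wald(estimate, se, z):
    """Return the interval estimate +/- z * se."""
    return estimate - z * se, estimate + z * se

# Odds ratio: interval on the log scale, then exponentiate.
log_or = math.log(n11 * n22 / (n12 * n21))
se_log_or = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
or_ci = tuple(math.exp(x) for x in wald(log_or, se_log_or, z))

# Relative risk: interval on the log scale, then exponentiate.
log_rr = math.log(p1 / p2)
se_log_rr = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
rr_ci = tuple(math.exp(x) for x in wald(log_rr, se_log_rr, z))

# Difference in proportions: interval on the original scale.
se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
diff_ci = wald(p1 - p2, se_diff, z)

print(or_ci, rr_ci, diff_ci)
```

Up to rounding in the last digit, the printed intervals should match the ones reported above; all three contain the null value (1, 1 and 0, respectively).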

2. (a) The data refer to the conditional distribution of X (race of murderer) given Y (race of victim).


(b) Let p1 be the proportion of black victims slain by blacks and p2 be the proportion of
white victims slain by blacks. Then p1 = 0.94 and p2 = 1 − 0.83 = 0.17, and the sample odds
ratio is
$$\hat{\theta} = \frac{p_1 (1 - p_2)}{p_2 (1 - p_1)} = \frac{0.94 \times 0.83}{0.17 \times 0.06} = 76.4902.$$
Therefore, the odds that the victim is black when the murderer is black are about 76.5
times the odds that the victim is black when the murderer is white. Note that we can
interpret the odds ratio in this way because its value does not depend on which
conditional distribution is given. In this case, however, we cannot compute the relative
risk of a black victim when the murderer is black compared to a black victim when the
murderer is white.
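The invariance claim in part (b) can be illustrated numerically. The sketch below computes the odds ratio from the given conditional proportions, then rebuilds a joint table for several HYPOTHETICAL values of P(Y = B) (the source does not report this marginal) and recomputes the odds ratio from the other conditional distribution. The function names are illustrative.

```python
# Conditional proportions quoted in the solution:
# murderer's race X given victim's race Y.
p_xb_given_yb = 0.94        # black victims slain by blacks
p_xb_given_yw = 1 - 0.83    # white victims slain by blacks

def odds(p):
    return p / (1 - p)

# Odds ratio computed directly from P(X | Y).
theta_from_x_given_y = odds(p_xb_given_yb) / odds(p_xb_given_yw)

def theta_from_y_given_x(p_yb):
    """Odds ratio recomputed from P(Y | X) via a joint table.

    p_yb is a HYPOTHETICAL marginal probability that the victim is
    black; it is not available from the data in the problem.
    """
    p_yw = 1 - p_yb
    joint_bb = p_xb_given_yb * p_yb          # X = B, Y = B
    joint_wb = (1 - p_xb_given_yb) * p_yb    # X = W, Y = B
    joint_bw = p_xb_given_yw * p_yw          # X = B, Y = W
    joint_ww = (1 - p_xb_given_yw) * p_yw    # X = W, Y = W
    p_yb_given_xb = joint_bb / (joint_bb + joint_bw)
    p_yb_given_xw = joint_wb / (joint_wb + joint_ww)
    return odds(p_yb_given_xb) / odds(p_yb_given_xw)

print(round(theta_from_x_given_y, 4))
for p_yb in (0.2, 0.5, 0.8):
    print(p_yb, round(theta_from_y_given_x(p_yb), 4))
```

Whatever marginal is plugged in, the recomputed odds ratio should coincide with the direct one, which is why the odds ratio can be interpreted in either direction while the relative risk cannot.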
(c) To estimate the probability that the victim is white given that the murderer is white,
we need to estimate the conditional distribution of Y given X. Let W denote White and
B denote Black; then
$$P(Y = W \mid X = W) = \frac{P(Y = W, X = W)}{P(X = W)} = \frac{P(X = W \mid Y = W)\, P(Y = W)}{P(X = W \mid Y = W)\, P(Y = W) + P(X = W \mid Y = B)\, P(Y = B)}.$$
The last equality follows from Bayes' theorem. From the data, we can estimate the
conditional probabilities of X given Y, but we also need the marginal distribution
of Y. That is, we need the additional information of the proportion of white victims
among all victims.
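To make the dependence on the missing marginal concrete, the Bayes formula in part (c) can be written as a small function. The conditional probabilities 0.83 and 0.06 come from part (b); the values tried for P(Y = W) are HYPOTHETICAL placeholders, since the problem does not supply that proportion. The function name is illustrative.

```python
def prob_white_victim_given_white_murderer(p_xw_given_yw, p_xw_given_yb, p_yw):
    """Bayes' theorem for P(Y = W | X = W).

    p_xw_given_yw: P(X = W | Y = W), estimated from the data (0.83)
    p_xw_given_yb: P(X = W | Y = B), estimated from the data (1 - 0.94)
    p_yw:          P(Y = W), NOT available from the data (hypothetical input)
    """
    numerator = p_xw_given_yw * p_yw
    denominator = numerator + p_xw_given_yb * (1 - p_yw)
    return numerator / denominator

# Hypothetical marginals, purely to show how much the answer moves.
for p_yw in (0.3, 0.5, 0.7):
    value = prob_white_victim_given_white_murderer(0.83, 1 - 0.94, p_yw)
    print(p_yw, round(value, 4))
```

The result changes with the assumed marginal, which is the reason the extra information is required.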

3. (a) Pearson's chi-square test statistic and the likelihood ratio test statistic are χ² =
28.986 and G² = 34.613, respectively. Each has 8 degrees of freedom, and the p-values
are 0.0003 and 0.00003, respectively. So there is strong evidence of association
between physician specialty and the recommended surgery.
(b) The adjusted residuals are as follows:

Physician                    Surgery
Specialty            R            CR             C
Internal         0.3711387    1.3545291    -1.4609730
Surgery          2.5994878    1.8181122    -3.1426714
Radiotherapy    -1.2880956   -3.7964272     4.2341608
Oncology        -1.7378616   -0.6606875     1.5880950
Gynecology      -1.3876462    0.3669879     0.4410646

The adjusted residuals for the specialties "Oncology", "Gynecology" and "Internal" are all
below 2 in absolute value, so their recommendations are consistent with independence.
Surgeons, however, recommend radical surgery more often, and radiotherapists recommend
conservative surgery more often, than expected under independence.

4. The following table shows the classification of students according to their family income
and educational aspirations:

                  Education Aspiration
Family      Some      High      Some
Income      School    School    College   College
Low            9        44        13        10
Middle        11        52        23        22
High           9        41        12        27

(a) To test independence between educational aspirations and family income, Pearson's
chi-square test statistic is χ² = 8.8709 and the likelihood ratio test statistic is
G² = 8.9165, each with 6 degrees of freedom. The p-values for these two test statistics
are 0.1809674 and 0.1783272, respectively. Thus, neither test provides enough evidence
of dependence between educational aspiration and family income. Note that both
educational aspirations and family income are ordinal categorical variables, and the tests
considered here fail to take their ordering into account.
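The two statistics in part (a) can be reproduced from the table of counts with the standard library alone. The sketch below uses the usual definitions of the Pearson and likelihood ratio statistics; for the p-value it relies on the closed-form chi-square tail probability that holds when the degrees of freedom are even (here 6). The function names are illustrative.

```python
import math

# Observed counts: rows = family income (Low, Middle, High),
# columns = the four educational aspiration categories.
table = [
    [9, 44, 13, 10],
    [11, 52, 23, 22],
    [9, 41, 12, 27],
]

def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[ri * cj / n for cj in cols] for ri in rows]

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

expected = expected_counts(table)
cells = [(o, e) for obs_row, exp_row in zip(table, expected)
         for o, e in zip(obs_row, exp_row)]

x2 = sum((o - e) ** 2 / e for o, e in cells)                 # Pearson
g2 = 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)  # likelihood ratio
df = (len(table) - 1) * (len(table[0]) - 1)

print(round(x2, 4), round(g2, 4), df)
print(chi2_sf_even_df(x2, df), chi2_sf_even_df(g2, df))
```

The results should agree with the reported statistics and p-values up to rounding.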
(b) The adjusted residuals are given below:

                  Education Aspiration
Family      Some       High       Some
Income      School     School     College    College
Low         0.4061     1.5828    -0.1286    -2.1078
Middle     -0.1898    -0.5441     1.3042    -0.4032
High       -0.1903    -0.9459    -1.2374     2.4360

The adjusted residuals for the low family income group go from positive to negative with
increasing educational aspiration, while those for the high income group go from negative
to positive, indicating that there might be a linear trend in the association pattern.
The residuals for educational aspiration "College" in the low and high income groups
exceed 2 in absolute value, indicating possible deviations from independence.
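As a check on part (b), the adjusted residuals can be recomputed from the counts. The sketch assumes the usual definition, (observed − expected) divided by sqrt(expected × (1 − row proportion) × (1 − column proportion)); the function name is illustrative.

```python
import math

# Observed counts: rows = Low, Middle, High income.
table = [
    [9, 44, 13, 10],
    [11, 52, 23, 22],
    [9, 41, 12, 27],
]

def adjusted_residuals(table):
    """Adjusted (standardized Pearson) residuals for a two-way table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    result = []
    for i, obs_row in enumerate(table):
        res_row = []
        for j, observed in enumerate(obs_row):
            expected = rows[i] * cols[j] / n
            se = math.sqrt(expected * (1 - rows[i] / n) * (1 - cols[j] / n))
            res_row.append((observed - expected) / se)
        result.append(res_row)
    return result

residuals = adjusted_residuals(table)
for name, row in zip(("Low", "Middle", "High"), residuals):
    print(name, [round(x, 4) for x in row])
```

Cells whose residual exceeds roughly 2 in absolute value are the ones that stand out against independence; here that should be the last column for the low and high income rows.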
(c) We conduct a test against a linear trend alternative, which is more powerful for
ordinal data. For that purpose, we assign the scores 1, 2, 3 and 4 to the educational
aspiration categories and 1, 2 and 3 to the income categories. The sample correlation is
r = 0.1321 and n = 273. Thus, the statistic M² = (n − 1)r² = 4.7489 (d.f. = 1) has a
p-value of 0.0293, which suggests that educational aspirations tend to increase with
family income.
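The trend statistic in part (c) is a function of the Pearson correlation between the row and column scores, weighted by the cell counts. A minimal sketch, using the scores stated in the solution and the identity that a chi-square tail probability with 1 degree of freedom equals erfc(sqrt(x/2)):

```python
import math

# Observed counts: rows = income (scores 1, 2, 3),
# columns = educational aspiration (scores 1, 2, 3, 4).
table = [
    [9, 44, 13, 10],
    [11, 52, 23, 22],
    [9, 41, 12, 27],
]
u = [1, 2, 3]        # income scores
v = [1, 2, 3, 4]     # aspiration scores

n = sum(sum(row) for row in table)
cells = [(u[i], v[j], table[i][j]) for i in range(len(u)) for j in range(len(v))]

mean_u = sum(ui * c for ui, _, c in cells) / n
mean_v = sum(vj * c for _, vj, c in cells) / n
s_uv = sum((ui - mean_u) * (vj - mean_v) * c for ui, vj, c in cells)
s_uu = sum((ui - mean_u) ** 2 * c for ui, _, c in cells)
s_vv = sum((vj - mean_v) ** 2 * c for _, vj, c in cells)

r = s_uv / math.sqrt(s_uu * s_vv)       # correlation between the scores
m2 = (n - 1) * r ** 2                   # linear trend statistic, d.f. = 1
p_value = math.erfc(math.sqrt(m2 / 2))  # chi-square(1) upper tail

print(n, round(r, 4), round(m2, 4), round(p_value, 4))
```

With a single degree of freedom concentrated on the trend, this test detects an association that the 6-d.f. tests in part (a) miss.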

5. (a) For these data, p1 = 17/147 = 0.1156 and p2 = 218/646 = 0.3375. Thus, the sample
difference in proportions is p1 − p2 = −0.2218. The estimated standard error is
$$\hat{\sigma}(p_1 - p_2) = \sqrt{\frac{0.1156(1 - 0.1156)}{147} + \frac{0.3375(1 - 0.3375)}{646}} = 0.0323.$$
Therefore, the 90% C.I. for difference of proportions is

−0.2218 ± 1.645 × 0.0323 = (−0.2749, −0.1687).

The confidence interval does not contain 0 and lies entirely on the negative side, from
which we can conclude that head injuries are not independent of wearing helmets and that
the probability of a head injury is higher without a helmet than with one.
(b) The sample odds ratio is
$$\hat{\theta} = \frac{17 \times 428}{218 \times 130} = 0.2567$$
and the sample log odds ratio is log θ̂ = −1.3597. The estimated standard error of log θ̂ is
$$\hat{\sigma}(\log \hat{\theta}) = \sqrt{\frac{1}{17} + \frac{1}{218} + \frac{1}{130} + \frac{1}{428}} = 0.2710.$$
Therefore, the 90% C.I. for log-odds ratio is

−1.3597 ± 1.645 × 0.2710 = (−1.8055, −0.9139).

Therefore, the 90% C.I. for odds ratio is,

$$(e^{-1.8055},\ e^{-0.9139}) = (0.1644,\ 0.4010).$$

The confidence interval lies entirely below 1, which also supports the conclusion reached
in part (a).
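The multiplier 1.645 used in parts (a) and (b) is the 0.95 quantile of the standard normal distribution, since a 90% two-sided interval leaves 5% in each tail. The sketch below derives it with `statistics.NormalDist` and recomputes both 90% intervals; the cell labels in the comments are inferred from the formulas rather than quoted from a printed table.

```python
import math
from statistics import NormalDist

# Counts from the formulas in parts (a) and (b); labels are inferred.
a, b = 17, 130     # helmet:    head injury, no head injury
c, d = 218, 428    # no helmet: head injury, no head injury

# 90% two-sided interval -> 0.95 quantile of the standard normal.
z90 = NormalDist().inv_cdf(0.95)

# Difference in proportions of head injury (helmet minus no helmet).
p1, p2 = a / (a + b), c / (c + d)
se_diff = math.sqrt(p1 * (1 - p1) / (a + b) + p2 * (1 - p2) / (c + d))
diff_ci = (p1 - p2 - z90 * se_diff, p1 - p2 + z90 * se_diff)

# Odds ratio: interval on the log scale, then exponentiate.
log_or = math.log(a * d / (c * b))
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
or_ci = (math.exp(log_or - z90 * se_log_or), math.exp(log_or + z90 * se_log_or))

print(round(z90, 3), diff_ci, or_ci)
```

Both intervals should exclude their null values (0 for the difference, 1 for the odds ratio), matching the conclusions above up to rounding.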
(c) The following table shows the expected frequencies:

                 Wearing Helmet
Head Injury      Yes        No
Yes             43.56     191.44
No             103.44     454.56

Therefore, to test for independence, Pearson's chi-square test statistic is χ² = 28.256
and the likelihood ratio test statistic is G² = 32.5432, each with 1 degree of freedom.
The p-values for these tests are 1.063 × 10⁻⁷ and 1.165 × 10⁻⁸, respectively. Both tests
suggest that head injuries are dependent on wearing helmets.
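The expected frequencies and both test statistics in part (c) can be recomputed from the four observed counts. The observed table is reconstructed here from the formulas in parts (a) and (b), so its layout is an inference; the p-values use the identity that a chi-square tail probability with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math

# Observed counts, reconstructed from parts (a) and (b):
# rows = head injury (Yes, No), columns = wearing helmet (Yes, No).
observed = [
    [17, 218],
    [130, 428],
]

rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
n = sum(rows)

# Expected counts under independence.
expected = [[ri * cj / n for cj in cols] for ri in rows]

cells = [(o, e) for obs_row, exp_row in zip(observed, expected)
         for o, e in zip(obs_row, exp_row)]
x2 = sum((o - e) ** 2 / e for o, e in cells)        # Pearson
g2 = 2 * sum(o * math.log(o / e) for o, e in cells)  # likelihood ratio

def chi2_sf_df1(x):
    """Upper-tail chi-square probability for 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

print([[round(e, 2) for e in row] for row in expected])
print(round(x2, 3), round(g2, 4), chi2_sf_df1(x2), chi2_sf_df1(g2))
```

The expected counts should match the table above, and both p-values should be far below any conventional significance level.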
