ACCEPTED MANUSCRIPT
Comparing two tests for two rates

Chunpeng Fan, Lin Wang, and Lynn Wei
Department of Biostatistics and Programming, Sanofi US Inc.
Abstract
This article rigorously proves superiority of the proportion $\chi^2$ test to the logistic regression
Wald test in terms of power when comparing two rates, despite their asymptotic equivalence
under the null hypothesis that the two rates are equal.
Keywords: Logistic regression; Parabola; Proportion $\chi^2$ test.

Corresponding author: Chunpeng.Fan@sanofi.com
1 Introduction
Comparative analysis of two rates is a characteristic problem in the study of the occurrence of
illness (epidemiology) and in many other investigative contexts. When the compared rates take the
form of proportions, each empirical rate (p) expresses the number of cases (r) as a proportion
of the number of subjects (n): p = r/n. For a review of statistical inference for comparing
two rates, we refer to Miettinen and Nurminen (1985).
When comparing two rates, two traditional tests can be used: the proportion $\chi^2$ test and the
logistic regression Wald test. Asymptotic control of the Type I error rates of both tests
under the null hypothesis that the two rates are equal has been intensively investigated and is well
documented in the literature, including Fienberg (1980), Venables and Ripley (2002), and many other
statistical textbooks.
Although the major advantage of the proportion $\chi^2$ test is that it can handle stratified analysis for
comparing multiple rates and the major advantage of the logistic regression is that it can incorporate
discrete and continuous covariates, both are valid for comparing two rates, which is the simplest
case of their application.
Although it can be easily shown that the proportion $\chi^2$ test and the logistic regression Wald
test are asymptotically equivalent under the null hypothesis that the two rates are equal, under the
alternative hypothesis they may lead to different inference. Therefore it is important to
know which of them is superior, and under which conditions.
The rest of the article is organized as follows. Section 2 gives explicit forms of the proportion
$\chi^2$ test and the logistic regression Wald test when comparing two rates, and also rigorously
derives superiority of the proportion $\chi^2$ test in terms of power. Section 3 numerically shows the
difference between these two tests in selected scenarios, and also uses an illustrative phase II
clinical trial to show the different conclusions that may result. Conclusions and discussion are in
Section 4. The straightforward but mathematically tedious proof of the main theorem is provided
in the Appendix.
2 Comparison between the proportion $\chi^2$ test and the logistic regression Wald test in a two-sample setting
Assume we have two binary random variables $X_1$ and $X_2$ with means $E(X_1) = p_1$ and $E(X_2) = p_2$,
respectively. In a study with $n_1$ independent random subjects from $X_1$, in which there are $r_1$ cases,
and $n_2$ from $X_2$ with $r_2$ cases, the null hypothesis $H_0: p_1 = p_2$ can be tested using two tests: the
proportion $\chi^2$ test and the logistic regression Wald test.
With such a two-sample study, as derived in Section 2.1 of Fienberg (1980), the test statistic
of the proportion $\chi^2$ test is $\chi^2_{\mathrm{Prop}} = Z_{\mathrm{Prop}}^2$, where $Z_{\mathrm{Prop}}$ can be expressed as

$$ Z_{\mathrm{Prop}} = \frac{|\hat{p}_2 - \hat{p}_1|}{\sqrt{\hat{p}(1 - \hat{p})(n_1^{-1} + n_2^{-1})}}, \qquad (1) $$

where $\hat{p}_1 = r_1/n_1$, $\hat{p}_2 = r_2/n_2$, $n = n_1 + n_2$ and $\hat{p} = (n_1 \hat{p}_1 + n_2 \hat{p}_2)/n$. The corresponding p-value is
then

$$ \text{p-value}_{\mathrm{Prop}} = \Pr(\chi^2_1 > \chi^2_{\mathrm{Prop}}) = 2\{1 - \Phi(Z_{\mathrm{Prop}})\}, $$

where $\chi^2_1$ denotes a random variable that follows a $\chi^2$-distribution with 1 degree of freedom
and $\Phi(z)$ denotes the cumulative distribution function (CDF) of a standard normal distribution.
Writing the proportion $\chi^2$ test in such a z-test format facilitates direct comparison with the logistic
regression Wald test below.
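Meanwhile, the corresponding logistic regression can be expressed by jointly considering
$E(X_1) = p_1$, $E(X_2) = p_2$, and

$$ \log\left(\frac{p_1}{1 - p_1}\right) = \beta_0, \qquad \log\left(\frac{p_2}{1 - p_2}\right) = \beta_0 + \beta_1, $$

where $\beta_0$ and $\beta_1$ are two regression coefficients. The two-sided logistic regression Wald test statistic
for $H_0: p_1 = p_2$ can be expressed as

$$ Z_{\mathrm{Logistic}} = \frac{|\hat{\beta}_1|}{\sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_1)}}. $$

In such a two-sample study, $\hat{\beta}_1$ and $\widehat{\mathrm{Var}}(\hat{\beta}_1)$ by the generalized linear model theory (McCullagh
and Nelder 1989) can be simplified and explicitly written as

$$ \hat{\beta}_1 = \log\left(\frac{\hat{p}_2}{1 - \hat{p}_2}\right) - \log\left(\frac{\hat{p}_1}{1 - \hat{p}_1}\right), \quad \text{and} \quad \widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{1}{n_1 \hat{p}_1 (1 - \hat{p}_1)} + \frac{1}{n_2 \hat{p}_2 (1 - \hat{p}_2)}; $$

$Z_{\mathrm{Logistic}}$ can thus be explicitly written as

$$ Z_{\mathrm{Logistic}} = \frac{|\log\{\hat{p}_2/(1 - \hat{p}_2)\} - \log\{\hat{p}_1/(1 - \hat{p}_1)\}|}{\sqrt{\{n_1 \hat{p}_1 (1 - \hat{p}_1)\}^{-1} + \{n_2 \hat{p}_2 (1 - \hat{p}_2)\}^{-1}}}, \qquad (2) $$

and the corresponding p-value is then

$$ \text{p-value}_{\mathrm{Logistic}} = 2\{1 - \Phi(Z_{\mathrm{Logistic}})\}. $$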
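Formulas (1) and (2) translate directly into a few lines of code. The following is a minimal Python sketch, not the authors' implementation: the function names (`z_prop`, `z_logistic`, `two_sided_p`) and the counts in the usage lines are hypothetical, and the sketch assumes both estimated rates lie strictly between 0 and 1 so that every term is defined.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_prop(r1, n1, r2, n2):
    """Formula (1): proportion chi-square test written as a z statistic."""
    p1, p2 = r1 / n1, r2 / n2
    p = (r1 + r2) / (n1 + n2)  # pooled estimate (n1*p1 + n2*p2) / n
    return abs(p2 - p1) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


def z_logistic(r1, n1, r2, n2):
    """Formula (2): Wald statistic for the log odds ratio."""
    p1, p2 = r1 / n1, r2 / n2
    beta1 = math.log(p2 / (1 - p2)) - math.log(p1 / (1 - p1))
    var_beta1 = 1 / (n1 * p1 * (1 - p1)) + 1 / (n2 * p2 * (1 - p2))
    return abs(beta1) / math.sqrt(var_beta1)


def two_sided_p(z):
    """Two-sided p-value 2{1 - Phi(z)}."""
    return 2.0 * (1.0 - normal_cdf(z))


# Hypothetical counts, chosen only for illustration.
zp = z_prop(12, 40, 22, 40)
zl = z_logistic(12, 40, 22, 40)
print(zp, two_sided_p(zp))
print(zl, two_sided_p(zl))
```

Working from the counts rather than from pre-rounded rates avoids rounding error in the pooled estimate.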
Although $Z_{\mathrm{Prop}}$ and $Z_{\mathrm{Logistic}}$ are asymptotically equivalent under the null hypothesis $H_0: p_1 = p_2$
in the sense that

$$ Z_{\mathrm{Prop}} - Z_{\mathrm{Logistic}} \xrightarrow{p} 0 \quad \text{under } H_0: p_1 = p_2, $$

which can be obtained by simple derivations (trivial details omitted here), under the alternative
hypothesis $H_a: p_1 \neq p_2$ they may lead to different inference. Therefore it is important to
know which of them is superior, and under which conditions.
To this purpose, the following Theorem 1 establishes the fact that the proportion $\chi^2$ test is
always superior to the logistic regression Wald test. Its straightforward but mathematically
tedious proof extensively utilizes the properties of parabolas (quadratic polynomials), and is in the
Appendix.
Theorem 1. For any $n_1, n_2 > 0$ and any $0 < \hat{p}_1, \hat{p}_2 < 1$, when $\hat{p}_1 \neq \hat{p}_2$,

$$ Z_{\mathrm{Prop}} > Z_{\mathrm{Logistic}}. $$
Note that Theorem 1 only requires $\hat{p}_1 \neq \hat{p}_2$, instead of $p_1 \neq p_2$. This means that, whether
the null or the alternative hypothesis holds, as long as $\hat{p}_1 \neq \hat{p}_2$, the test statistic of the proportion $\chi^2$
test, $Z_{\mathrm{Prop}}$, is greater than that of the logistic regression Wald test, $Z_{\mathrm{Logistic}}$. However, asymptotically,
both of them can still appropriately control the Type I error rate at the desired nominal level, which
can be derived from their asymptotic equivalence under the null hypothesis. Subsequently, the
following corollary also holds.
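Corollary 1. For any $n_1, n_2 > 0$ and any $0 < \hat{p}_1, \hat{p}_2 < 1$, when $\hat{p}_1 \neq \hat{p}_2$,

$$ \text{p-value}_{\mathrm{Prop}} < \text{p-value}_{\mathrm{Logistic}}, $$

and under the alternative hypothesis $H_a: p_1 \neq p_2$,

$$ \mathrm{Power}_{\mathrm{Prop}} > \mathrm{Power}_{\mathrm{Logistic}}. $$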
Results in Theorem 1 and Corollary 1 suggest that, in finite sample studies, the proportion $\chi^2$
test is always superior to the logistic regression Wald test in terms of power.
3 Numerical assessments
3.1 Numerical comparisons between the two test statistics
Numerical comparisons between the proportion $\chi^2$ test statistic $Z_{\mathrm{Prop}}$ and the logistic regression
Wald test statistic $Z_{\mathrm{Logistic}}$ were conducted in selected scenarios to illustrate numerically
the difference between $Z_{\mathrm{Prop}}$ and $Z_{\mathrm{Logistic}}$.
As an illustration, we assumed $n_1 = n_2 = 30$, 50, or 100; $p_1 = 0.1$, 0.3, or 0.5; and $p_2$ from
0 to 1 in steps of 0.01. For each selection of $n_1 = n_2$, $p_1$, and $p_2$, the proportion $\chi^2$ test statistic $Z_{\mathrm{Prop}}$
and the logistic regression Wald test statistic $Z_{\mathrm{Logistic}}$ were calculated using formulas (1) and
(2), respectively. Figure 1 shows the difference between $Z_{\mathrm{Prop}}$ and $Z_{\mathrm{Logistic}}$, while all the
calculated test statistic values are displayed in Figure 4 in Appendix B.
Results in Figure 1 confirm that $Z_{\mathrm{Prop}} - Z_{\mathrm{Logistic}} > 0$ in all cases, and also show that the difference
is more visible when the difference between $p_1$ and $p_2$ is relatively large.
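The grid comparison described in this subsection can be sketched as follows. This is an illustrative Python sketch under stated assumptions rather than the code behind Figure 1: the helper names are hypothetical, the rates are plugged into formulas (1) and (2) as if they were the observed proportions, and the grid points $p_2 = 0$, $p_2 = 1$ and $p_2 = p_1$ are skipped because formula (2) is undefined at the boundary and both statistics are 0 when the rates coincide.

```python
import math


def z_prop(p1, p2, n1, n2):
    """Formula (1) evaluated at given rates."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)
    return abs(p2 - p1) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


def z_logistic(p1, p2, n1, n2):
    """Formula (2) evaluated at given rates."""
    beta1 = math.log(p2 / (1 - p2)) - math.log(p1 / (1 - p1))
    var_beta1 = 1 / (n1 * p1 * (1 - p1)) + 1 / (n2 * p2 * (1 - p2))
    return abs(beta1) / math.sqrt(var_beta1)


def differences(n, p1):
    """Z_Prop - Z_Logistic for p2 = 0.01, ..., 0.99, skipping p2 = p1."""
    skip = round(100 * p1)
    out = {}
    for k in range(1, 100):
        if k == skip:
            continue
        p2 = k / 100
        out[p2] = z_prop(p1, p2, n, n) - z_logistic(p1, p2, n, n)
    return out


# Smallest difference in each of the nine (n, p1) panels.
smallest = {
    (n, p1): min(differences(n, p1).values())
    for n in (30, 50, 100)
    for p1 in (0.1, 0.3, 0.5)
}
for panel, value in smallest.items():
    print(panel, value)
```

Indexing the grid by the integer `k` avoids floating-point drift when testing whether a grid point coincides with $p_1$.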
3.2 Numerical comparisons between powers of the two tests
Numerical comparisons between the empirical power of the proportion $\chi^2$ test and the logistic
regression Wald test were conducted in selected scenarios. As an illustration, we again assumed
$n_1 = n_2 = 30$, 50, or 100; $p_1 = 0.1$, 0.3, or 0.5; and $p_2$ being 0.01, or 0.05 to 0.95 in steps of 0.05.
For each combination of $n_1$, $n_2$, $p_1$, and $p_2$, $n_1$ binary random samples with mean $p_1$ and another
$n_2$ binary random samples with mean $p_2$ were generated. For such generated data, estimators of
the two rates were then calculated and denoted as $\hat{p}_1$ and $\hat{p}_2$. The corresponding test statistics, $Z_{\mathrm{Prop}}$
and $Z_{\mathrm{Logistic}}$, were calculated subsequently.
In this calculation, when $\hat{p}_1$ or $\hat{p}_2$ equals 0 or 1, $Z_{\mathrm{Prop}}$ or $Z_{\mathrm{Logistic}}$ may not be well defined and
therefore some special handling is needed. The following handling was carried out:
1. When $\hat{p}_1 = \hat{p}_2 = 0$ or $\hat{p}_1 = \hat{p}_2 = 1$, $Z_{\mathrm{Prop}}$ is not well defined and its value was imputed as
0, which is its limiting value when $\hat{p}_1 = 0$ and $\hat{p}_2 \to 0$, or $\hat{p}_1 = 1$ and $\hat{p}_2 \to 1$ (trivial proof
omitted); such a value indicates no difference between the two rates, and the corresponding
$\text{p-value}_{\mathrm{Prop}} = 1$;
2. When $\hat{p}_1 = 0$ or 1, or $\hat{p}_2 = 0$ or 1, $Z_{\mathrm{Logistic}}$ is not well defined and its value was imputed
as 0, which is its limiting value when $\hat{p}_1 \to 0$ or 1, or $\hat{p}_2 \to 0$ or 1; this statement can be
derived using L'Hôpital's rule (details of the proof are available upon request from the
corresponding author); the corresponding $\text{p-value}_{\mathrm{Logistic}} = 1$.
This process was repeated 10,000 times. For each of the two tests, the rejection rate at
nominal level 5% was calculated as the proportion of the 10,000 generated data sets that gave a
p-value less than 0.05. Under the null hypothesis $H_0: p_1 = p_2$,
such rejection rates are empirical Type I error rates, while under the alternative hypothesis
they are empirical powers. Figure 2 shows the difference between $\mathrm{Power}_{\mathrm{Prop}}$ and
$\mathrm{Power}_{\mathrm{Logistic}}$, while all the power values are displayed in Figure 5 in Appendix B.
Results in Figure 2 confirm that $\mathrm{Power}_{\mathrm{Prop}} - \mathrm{Power}_{\mathrm{Logistic}} > 0$ in all cases, and also show that the
difference is more visible when $p_1$ or $p_2$ is close to 0 or 1.
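The simulation procedure of this subsection, including the two boundary rules, can be sketched in Python as follows. This is a hedged illustration, not the authors' simulation code: the function names, the seed, and the reduced number of replicates in the usage line are assumptions made to keep the example quick to run.

```python
import math
import random


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_statistics(r1, n1, r2, n2):
    """Return (Z_Prop, Z_Logistic), imputing 0 in the boundary cases."""
    p1, p2 = r1 / n1, r2 / n2
    # Rule 1: both estimated rates are 0, or both are 1.
    if r1 + r2 in (0, n1 + n2):
        z_prop = 0.0
    else:
        p = (r1 + r2) / (n1 + n2)
        z_prop = abs(p2 - p1) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    # Rule 2: either estimated rate is 0 or 1.
    if r1 in (0, n1) or r2 in (0, n2):
        z_logistic = 0.0
    else:
        beta1 = math.log(p2 / (1 - p2)) - math.log(p1 / (1 - p1))
        var_beta1 = 1 / (n1 * p1 * (1 - p1)) + 1 / (n2 * p2 * (1 - p2))
        z_logistic = abs(beta1) / math.sqrt(var_beta1)
    return z_prop, z_logistic


def rejection_rates(n1, n2, p1, p2, reps=10_000, alpha=0.05, seed=0):
    """Empirical rejection rates (proportion test, logistic Wald test)."""
    rng = random.Random(seed)
    reject_prop = 0
    reject_logistic = 0
    for _ in range(reps):
        # Number of cases among n1 and n2 independent Bernoulli draws.
        r1 = sum(rng.random() < p1 for _ in range(n1))
        r2 = sum(rng.random() < p2 for _ in range(n2))
        z_prop, z_logistic = z_statistics(r1, n1, r2, n2)
        if 2 * (1 - normal_cdf(z_prop)) < alpha:
            reject_prop += 1
        if 2 * (1 - normal_cdf(z_logistic)) < alpha:
            reject_logistic += 1
    return reject_prop / reps, reject_logistic / reps


# Fewer replicates than the 10,000 used in the text, for speed.
power_prop, power_logistic = rejection_rates(30, 30, 0.1, 0.4, reps=2000, seed=1)
print(power_prop, power_logistic)
```

Because both tests are applied to the same generated data sets, every data set rejected by the Wald test is also rejected by the proportion test, so the ordering of the two rejection rates does not depend on the seed.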
3.3 Example
A 12-week phase II rheumatoid arthritis clinical trial is used to illustrate applications of the
methods. See ? for the study design and the conduct of such studies. In this study, the ACR20 responder
status (the status of achieving/not achieving 20% improvement based on the American College of
Rheumatology criteria) is used to measure the treatment effect on the improvement in signs and
symptoms of rheumatoid arthritis.
The illustrative clinical trial has 34 randomized patients in each arm: the active treatment (Trt
A) and the placebo (Placebo). There are 16 responders in the Placebo group and 24 responders
in Trt A, while the rest are non-responders. The two response rates can be compared using the
proportion $\chi^2$ test or the logistic regression Wald test. The test statistics and the corresponding
p-values by each of these two tests can be calculated using formulas (1) and (2), or software such
as R (R Development Core Team 2008) or SAS® (2008). Results by all three methods are reported
in Table 1.
Results in Table 1 show that in this illustrative study, the proportion $\chi^2$ test gives a smaller
p-value ($\text{p-value}_{\mathrm{Prop}} = 0.0487$) than the logistic regression Wald test ($\text{p-value}_{\mathrm{Logistic}} = 0.0513$).
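As a check on the reported statistics, formulas (1) and (2) can be evaluated by hand with $r_1 = 16$ (Placebo), $r_2 = 24$ (Trt A) and $n_1 = n_2 = 34$. Intermediate values in this worked calculation are rounded to four decimals, so the last digit may differ slightly from the tabulated values.

```latex
\hat{p}_1 = 16/34 \approx 0.4706, \qquad
\hat{p}_2 = 24/34 \approx 0.7059, \qquad
\hat{p} = 40/68 \approx 0.5882,

Z_{\mathrm{Prop}}
  = \frac{24/34 - 16/34}{\sqrt{(40/68)(28/68)(1/34 + 1/34)}}
  \approx \frac{0.2353}{0.1194} \approx 1.971,

\hat{\beta}_1 = \log(24/10) - \log(16/18) = \log 2.7 \approx 0.9933,

\widehat{\mathrm{Var}}(\hat{\beta}_1)
  = \frac{34}{16 \times 18} + \frac{34}{24 \times 10} \approx 0.2597,

Z_{\mathrm{Logistic}} \approx \frac{0.9933}{\sqrt{0.2597}} \approx 1.949.
```

Both values agree with the reported 1.9712 and 1.9490 to the precision carried here.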
For statisticians, such close p-values may not indicate different statistical inference. However,
in practice, when a predefined hierarchical gatekeeping procedure is employed to control the
family-wise error rate (FWER), which requires a predefined cutoff for each test in the hierarchy,
with a commonly selected cutoff of 0.05, a $\text{p-value}_{\mathrm{Prop}}$ of 0.0487 will warrant the achievement of
the current endpoint and allow assessments of subsequent endpoints, but a $\text{p-value}_{\mathrm{Logistic}}$ of 0.0513
stops the hierarchical testing procedure, so that none of the subsequent endpoints can even be
assessed and therefore none can be claimed statistically significant. The superiority of the
proportion $\chi^2$ test may be a considerable advantage in such borderline scenarios.
4 Conclusions and Discussions

In conclusion, we rigorously prove that the proportion $\chi^2$ test is superior to the logistic regression Wald
test in terms of power when comparing two rates, despite their asymptotic equivalence under the
null hypothesis that the two rates are equal.
Results in Theorem 1 also indicate that the Type I error rate of the proportion $\chi^2$ test is larger
than that of the logistic regression Wald test. Besides the theoretical justification that both
tests asymptotically control the Type I error rate under the null hypothesis (Fienberg 1980;
Venables and Ripley 2002), further simulation studies were also conducted (results omitted) and
numerically confirmed appropriate Type I error rate control of both tests, although the logistic
regression Wald test is more conservative than the proportion $\chi^2$ test and may be too conservative
when $p_1 = p_2$ is close to 0 or 1.
A simple derivation can show that the proportion $\chi^2$ test is very close to the
Cochran-Mantel-Haenszel (CMH) test without stratification, with a multiplication factor of $(n - 1)/n$ difference in
the test statistics (the proportion $\chi^2$ test statistic is larger). Therefore the CMH test statistic may be
larger or smaller than the logistic regression Wald test statistic, depending on the values of $n_1$, $n_2$,
$\hat{p}_1$, and $\hat{p}_2$. Since the CMH test is commonly used to analyze data in stratified studies, comparisons
between the CMH test and the logistic regression Wald test in stratified studies may warrant future
research.
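To make the size of this factor concrete, the relation can be written out and applied to the trial of Section 3.3. This worked example assumes the factor $(n - 1)/n$ applies on the $\chi^2$ scale, as in the standard unstratified CMH statistic, so that the corresponding z-type statistic is scaled by its square root; the symbol $Z_{\mathrm{CMH}}$ is notation introduced here only for illustration.

```latex
\chi^2_{\mathrm{CMH}} = \frac{n - 1}{n}\, \chi^2_{\mathrm{Prop}},
\qquad
Z_{\mathrm{CMH}} = \sqrt{\frac{n - 1}{n}}\; Z_{\mathrm{Prop}}.

% Trial of Section 3.3: n = 34 + 34 = 68, Z_Prop = 1.9712, Z_Logistic = 1.9490
Z_{\mathrm{CMH}} = \sqrt{67/68} \times 1.9712 \approx 1.9567,
\qquad
Z_{\mathrm{Logistic}} = 1.9490 < Z_{\mathrm{CMH}} < Z_{\mathrm{Prop}} = 1.9712.
```

In this particular data set the CMH statistic still exceeds the Wald statistic; with other values of $n_1$, $n_2$, $\hat{p}_1$ and $\hat{p}_2$ the ordering can reverse.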
Results in Section 3.1 confirm that $Z_{\mathrm{Prop}} > Z_{\mathrm{Logistic}}$ in all cases, and also show that the difference
between $Z_{\mathrm{Prop}}$ and $Z_{\mathrm{Logistic}}$ is more visible when the difference between $p_1$ and $p_2$ is relatively large.
The reason why their difference is marginal when $p_1$ and $p_2$ are close is that under the null and also the local alternative
hypotheses, where the difference between $p_1$ and $p_2$ is of the order $O(n^{-1/2})$, the proportion $\chi^2$
test and the logistic regression Wald test are asymptotically equivalent (trivial details omitted).
Their difference is therefore more visible when the difference between $p_1$ and $p_2$ is relatively large.
Although the simulation studies in Sections 3.1 and 3.2 only investigated scenarios with equal
sample sizes, that is, $n_1 = n_2$, similar findings can be obtained in scenarios with unequal sample sizes
(simulation results omitted).
Appendix A. Proof of Theorem 1

Without loss of generality, we assume $\hat{p}_2 > \hat{p}_1$. Then $Z_{\mathrm{Prop}} > Z_{\mathrm{Logistic}}$ in Theorem 1 can be written as

$$ \frac{\hat{p}_2 - \hat{p}_1}{\sqrt{\hat{p}(1 - \hat{p})(n_1^{-1} + n_2^{-1})}} > \frac{\log\{\hat{p}_2/(1 - \hat{p}_2)\} - \log\{\hat{p}_1/(1 - \hat{p}_1)\}}{\sqrt{\{n_1 \hat{p}_1 (1 - \hat{p}_1)\}^{-1} + \{n_2 \hat{p}_2 (1 - \hat{p}_2)\}^{-1}}}. \qquad (3) $$

For simplicity in notation, in the entire proof below, we eliminate the hat on $\hat{p}_1$, $\hat{p}_2$, and $\hat{p}$, and
write the inequality (3) as

$$ \frac{p_2 - p_1}{\sqrt{p(1 - p)(n_1^{-1} + n_2^{-1})}} > \frac{\log\{p_2/(1 - p_2)\} - \log\{p_1/(1 - p_1)\}}{\sqrt{\{n_1 p_1 (1 - p_1)\}^{-1} + \{n_2 p_2 (1 - p_2)\}^{-1}}}. \qquad (4) $$

To prove (4), we first prove the following Lemma 1 that will be used multiple times.
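Lemma 1. Let $\mathbb{R}$ denote the set of real numbers. For any $x \in \mathbb{R}$ with $x \neq 0$, $(1 - e^x)^2 - e^x x^2 > 0$.
Proof. By Taylor expansion, for any real number $x$,

$$ e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} \quad \text{and} \quad e^{-x} = \sum_{n=0}^{\infty} \frac{(-1)^n x^n}{n!}. $$

Then

$$ \begin{aligned}
(1 - e^x)^2 - e^x x^2 &= e^x (e^x + e^{-x} - 2 - x^2) = e^x \left\{ \sum_{n=0}^{\infty} \frac{x^n}{n!} + \sum_{n=0}^{\infty} \frac{(-1)^n x^n}{n!} - 2 - x^2 \right\} \\
&= e^x \left\{ \sum_{m=0}^{\infty} \frac{2x^{2m}}{(2m)!} - 2 - x^2 \right\} = e^x \sum_{m=2}^{\infty} \frac{2x^{2m}}{(2m)!} > 0,
\end{aligned} $$

since $(2x^{2m})/(2m)! = 2$ when $m = 0$ and $(2x^{2m})/(2m)! = x^2$ when $m = 1$, and every remaining term is strictly positive for $x \neq 0$ (at $x = 0$ the expression equals 0).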
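Lemma 1 and the series identity used in its proof can be checked numerically. The Python sketch below is only a sanity check on a finite grid, not a substitute for the proof; the function names, the grid, and the truncation of the series at 40 terms are arbitrary choices made for illustration.

```python
import math


def lemma1_lhs(x):
    """Left-hand side of Lemma 1: (1 - e^x)^2 - e^x * x^2."""
    return (1.0 - math.exp(x)) ** 2 - math.exp(x) * x ** 2


def lemma1_series(x, terms=40):
    """Truncated series e^x * sum_{m >= 2} 2 x^(2m) / (2m)! from the proof."""
    tail = sum(2.0 * x ** (2 * m) / math.factorial(2 * m) for m in range(2, terms))
    return math.exp(x) * tail


# Grid from -5 to 5 in steps of 0.1, excluding x = 0 where the expression is 0.
grid = [k / 10 for k in range(-50, 51) if k != 0]
all_positive = all(lemma1_lhs(x) > 0 for x in grid)
series_agrees = all(
    math.isclose(lemma1_lhs(x), lemma1_series(x), rel_tol=1e-8) for x in grid
)
print(all_positive, series_agrees)
```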
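Now let us prove the inequality (4).

Denote $t = n_1/n_2 > 0$. Then it is easy to see that $n = (1 + t)n_2$ and $p = (tp_1 + p_2)/(1 + t)$. Also denote

$$ D = \log\{p_2/(1 - p_2)\} - \log\{p_1/(1 - p_1)\} > 0. $$

It can be derived that

$$ \begin{aligned}
(4) &\Leftrightarrow \frac{(p_2 - p_1)\sqrt{n_2 t}}{\sqrt{\dfrac{tp_1 + p_2}{1 + t}\left(1 - \dfrac{tp_1 + p_2}{1 + t}\right)(t + 1)}} > \frac{D\sqrt{n_2}}{\sqrt{\{tp_1(1 - p_1)\}^{-1} + \{p_2(1 - p_2)\}^{-1}}} \\
&\Leftrightarrow (p_2 - p_1)^2 \left\{ \frac{1}{p_1(1 - p_1)} + \frac{t}{p_2(1 - p_2)} \right\} > D^2\, \frac{(tp_1 + p_2)\{t(1 - p_1) + (1 - p_2)\}}{1 + t} \\
&\Leftrightarrow t^2 \left\{ \frac{(p_2 - p_1)^2}{p_2(1 - p_2)} - p_1(1 - p_1)D^2 \right\} \\
&\qquad + t \left[ (p_2 - p_1)^2 \left\{ \frac{1}{p_1(1 - p_1)} + \frac{1}{p_2(1 - p_2)} \right\} - (p_1 + p_2 - 2p_1p_2)D^2 \right] \\
&\qquad + \frac{(p_2 - p_1)^2}{p_1(1 - p_1)} - p_2(1 - p_2)D^2 > 0. \qquad (5)
\end{aligned} $$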
Now we denote

$$ \begin{aligned}
A &= \frac{(p_2 - p_1)^2}{p_2(1 - p_2)} - p_1(1 - p_1)D^2, \\
B &= (p_2 - p_1)^2 \left\{ \frac{1}{p_1(1 - p_1)} + \frac{1}{p_2(1 - p_2)} \right\} - (p_1 + p_2 - 2p_1p_2)D^2, \\
C &= \frac{(p_2 - p_1)^2}{p_1(1 - p_1)} - p_2(1 - p_2)D^2. \qquad (6)
\end{aligned} $$

Inequality (5) is now equivalent to

$$ f(t) = At^2 + Bt + C > 0, \qquad (7) $$

with $f(t)$ being a univariate quadratic function with indeterminate $t > 0$.

Now we only need to prove inequality (7) holds for all t > 0 and any fixed p1 and p2 . To
achieve this aim, we first prove the following Lemma 2.
Lemma 2. In f (t) in (7), A > 0 and C > 0.
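Proof. Denote $q_1 = p_1/(1 - p_1)$ and $q_2 = p_2/(1 - p_2)$. Then $p_1 = q_1/(1 + q_1)$, $p_2 = q_2/(1 + q_2)$,
and $q_2 = q_1 e^D$.
Now $A$ can be simplified as

$$ \begin{aligned}
A &= \frac{\left( \dfrac{q_2}{1 + q_2} - \dfrac{q_1}{1 + q_1} \right)^2}{\dfrac{q_2}{(1 + q_2)^2}} - \frac{q_1}{(1 + q_1)^2} D^2 \\
&= \frac{q_1}{(1 + q_1)^2} \left\{ \frac{(q_2 - q_1)^2}{q_1 q_2} - D^2 \right\} \\
&= \frac{q_1}{(1 + q_1)^2} \left\{ \frac{(e^D - 1)^2}{e^D} - D^2 \right\} \\
&= \frac{q_1}{e^D (1 + q_1)^2} \{ (1 - e^D)^2 - e^D D^2 \} > 0
\end{aligned} $$

by Lemma 1.
$C > 0$ follows from the fact that $C = A\, p_2(1 - p_2)/\{p_1(1 - p_1)\}$.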
Based on the theory of quadratic polynomials, the vertex of the quadratic function $f(t)$ is

$$ (t^*, f(t^*)) = \left( -\frac{B}{2A},\; -\frac{B^2 - 4AC}{4A} \right). $$

Since $A > 0$, the parabola $f(t)$ opens upward and there are two different scenarios for the axis of
symmetry of $f(t)$:
Scenario 1. $B \geq 0$. In this case, the axis of symmetry $-B/(2A) \leq 0$. Since $f(0) = C > 0$ by
Lemma 2, it is easy to see that $f(t) > 0$ for all $t > 0$. This scenario is displayed in the left panel of
Figure 3.
Scenario 2. $B < 0$. In this case, the axis of symmetry $-B/(2A) > 0$. Therefore to guarantee
$f(t) > 0$ for all $t > 0$, the y-coordinate of the vertex needs to be positive; that is, $-(B^2 - 4AC)/(4A) >
0$ (right panel of Figure 3), which is equivalent to $B^2 - 4AC < 0$.

Since $B < 0$, $B^2 - 4AC < 0$ is equivalent to $B > -2\sqrt{AC}$. So we only need to prove $B + 2\sqrt{AC} >
0$. To achieve this aim, we first write $B$ using the notation $q_1$, $q_2$, and $D$ introduced in Lemma
2.
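$$ \begin{aligned}
B &= (p_2 - p_1)^2 \left\{ \frac{1}{p_1(1 - p_1)} + \frac{1}{p_2(1 - p_2)} \right\} - (p_1 + p_2 - 2p_1p_2)D^2 \\
&= \left( \frac{q_2}{1 + q_2} - \frac{q_1}{1 + q_1} \right)^2 \left\{ \frac{(1 + q_1)^2}{q_1} + \frac{(1 + q_2)^2}{q_2} \right\} - \left\{ \frac{q_1}{1 + q_1} + \frac{q_2}{1 + q_2} - \frac{2q_1q_2}{(1 + q_1)(1 + q_2)} \right\} D^2 \\
&= \frac{1}{(1 + q_1)(1 + q_2)} \left[ \frac{(q_2 - q_1)^2}{(1 + q_1)(1 + q_2)} \left\{ \frac{(1 + q_1)^2}{q_1} + \frac{(1 + q_2)^2}{q_2} \right\} - (q_1 + q_2)D^2 \right] \\
&= \frac{1}{q_1 q_2 (1 + q_1)^2 (1 + q_2)^2} \Big[ (q_2 - q_1)^2 \{ q_2(1 + q_1)^2 + q_1(1 + q_2)^2 \} \\
&\qquad\qquad - q_1 q_2 (1 + q_1)(1 + q_2)(q_1 + q_2)D^2 \Big] \\
&= \frac{q_1^2}{q_2 (1 + q_1)^2 (1 + q_2)^2} \Big[ (1 - e^D)^2 \{ e^D(1 + q_1)^2 + (1 + e^D q_1)^2 \} \\
&\qquad\qquad - e^D (1 + e^D)(1 + q_1)(1 + e^D q_1)D^2 \Big] \\
&= \frac{q_1}{e^D (1 + q_1)^2 (1 + q_2)^2} \Big[ q_1^2 \{ e^D(1 + e^D)(1 - e^D)^2 - e^{2D}(1 + e^D)D^2 \} \\
&\qquad\qquad + q_1 \{ 4e^D(1 - e^D)^2 - e^D(1 + e^D)^2 D^2 \} + \{ (1 + e^D)(1 - e^D)^2 - e^D(1 + e^D)D^2 \} \Big].
\end{aligned} $$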
Using the same notation, it can be derived that

$$ \begin{aligned}
2\sqrt{AC} &= 2\sqrt{p_1(1 - p_1)p_2(1 - p_2)} \left\{ \frac{(p_2 - p_1)^2}{p_1(1 - p_1)p_2(1 - p_2)} - D^2 \right\} \\
&= \frac{2q_1 e^{D/2}}{(1 + q_1)(1 + q_2)} \left\{ \frac{(q_2 - q_1)^2}{q_1 q_2} - D^2 \right\} \\
&= \frac{q_1}{e^D (1 + q_1)^2 (1 + q_2)^2}\, 2e^{3D/2}(1 + q_1)(1 + q_1 e^D) \left\{ \frac{(1 - e^D)^2}{e^D} - D^2 \right\} \\
&= \frac{q_1}{e^D (1 + q_1)^2 (1 + q_2)^2} \{ q_1^2\, 2e^{5D/2} + q_1\, 2e^{3D/2}(1 + e^D) + 2e^{3D/2} \} \left\{ \frac{(1 - e^D)^2}{e^D} - D^2 \right\}.
\end{aligned} $$
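Denote

$$ \begin{aligned}
A_1 &= e^D(1 + e^D)(1 - e^D)^2 - e^{2D}(1 + e^D)D^2 + 2e^{5D/2} \left\{ \frac{(1 - e^D)^2}{e^D} - D^2 \right\}, \\
B_1 &= 4e^D(1 - e^D)^2 - e^D(1 + e^D)^2 D^2 + 2e^{3D/2}(1 + e^D) \left\{ \frac{(1 - e^D)^2}{e^D} - D^2 \right\}, \\
C_1 &= (1 + e^D)(1 - e^D)^2 - e^D(1 + e^D)D^2 + 2e^{3D/2} \left\{ \frac{(1 - e^D)^2}{e^D} - D^2 \right\}. \qquad (8)
\end{aligned} $$

$B + 2\sqrt{AC}$ can be written as

$$ B + 2\sqrt{AC} = \frac{q_1}{e^D (1 + q_1)^2 (1 + q_2)^2} (A_1 q_1^2 + B_1 q_1 + C_1). $$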

It is sufficient to prove that, for any fixed $D$, $g(q_1) > 0$ for all $q_1$ in order to establish $B + 2\sqrt{AC} > 0$
for any $D > 0$ and $q_1 > 0$, where

$$ g(q_1) = A_1 q_1^2 + B_1 q_1 + C_1. $$

To this aim, it is sufficient to prove that, for any fixed $D$, $A_1 > 0$ and $B_1^2 - 4A_1C_1 < 0$
since, in this case, the parabola $g(q_1)$ opens upward ($A_1 > 0$) and the y-coordinate of its vertex
$(4A_1C_1 - B_1^2)/(4A_1)$ is positive.
To prove $A_1 > 0$ and $B_1^2 - 4A_1C_1 < 0$, we first simplify $A_1$, $B_1$, and $C_1$ as follows.

$$ \begin{aligned}
A_1 &= e^D (1 + e^{D/2})^2 \{ (1 - e^D)^2 - e^D D^2 \}, \\
B_1 &= (1 + e^{D/2})^2 \{ 2e^{D/2}(1 - e^D)^2 - e^D(1 + e^D)D^2 \}, \\
C_1 &= (1 + e^{D/2})^2 \{ (1 - e^D)^2 - e^D D^2 \}.
\end{aligned} $$

Then $A_1 > 0$ follows directly from Lemma 1.
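$B_1^2 - 4A_1C_1 < 0$ can be proved using the following procedure.

$$ \begin{aligned}
B_1^2 - 4A_1C_1 &= e^D (1 + e^{D/2})^4 \left[ \{ 2(1 - e^D)^2 - e^{D/2}(1 + e^D)D^2 \}^2 - 4\{ (1 - e^D)^2 - e^D D^2 \}^2 \right] \\
&= e^D (1 + e^{D/2})^4 D^2 \left[ \{ e^D(1 + e^D)^2 - 4e^{2D} \}D^2 - 4e^{D/2}(1 - e^D)^2 \{ (1 + e^D) - 2e^{D/2} \} \right] \\
&= -e^D (1 + e^{D/2})^4 D^2 \left[ 4e^{D/2}(1 - e^D)^2 (1 - e^{D/2})^2 - e^D(1 - e^D)^2 D^2 \right] \\
&= -e^D (1 + e^{D/2})^4 D^2 e^{D/2} (1 - e^D)^2 \{ 4(1 - e^{D/2})^2 - e^{D/2} D^2 \} \\
&= -4e^D (1 + e^{D/2})^4 D^2 e^{D/2} (1 - e^D)^2 \{ (1 - e^{D/2})^2 - e^{D/2} (D/2)^2 \} < 0
\end{aligned} $$

by Lemma 1.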
This completes the proof.
Now we summarize the process of the entire proof.
1. When $B \geq 0$ for $B$ defined in (6), by Lemma 2, $A > 0$ and $C > 0$, and therefore $f(t) > 0$ for
all $t > 0$ with $f(t)$ defined in (7); this proves inequality (4).
2. When $B < 0$, after proving that $A_1 > 0$ and $B_1^2 - 4A_1C_1 < 0$ for $A_1$, $B_1$, and $C_1$ defined in (8),
we have $g(q_1) > 0$ for any $q_1 \in \mathbb{R}$, and therefore $B + 2\sqrt{AC} > 0$ for any $D > 0$ and $q_1 > 0$.
This further gives that $-(B^2 - 4AC)/(4A) > 0$; together with $A > 0$, we conclude that the
parabola $f(t)$ opens upward with a positive y-coordinate of the vertex. Then $f(t) > 0$ for all
$t \in \mathbb{R}$.
This indicates that for any fixed $t = n_1/n_2 > 0$, $p_1 > 0$ and $p_2 > 0$, $f(t) > 0$; and by the fact that
(4) is equivalent to $f(t) > 0$, inequality (4) holds.
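The key quantities of the proof can also be checked numerically. The following Python sketch evaluates $A$, $B$, and $C$ from (6) on a grid of rates and verifies the sign conditions used above. It is a sanity check under stated assumptions, not part of the proof: the function names, the grid of rates, and the selected values of $t$ are hypothetical choices.

```python
import math


def coefficients(p1, p2):
    """A, B, C of f(t) = A t^2 + B t + C in (6), for 0 < p1 < p2 < 1."""
    D = math.log(p2 / (1 - p2)) - math.log(p1 / (1 - p1))
    d2 = (p2 - p1) ** 2
    v1, v2 = p1 * (1 - p1), p2 * (1 - p2)
    A = d2 / v2 - v1 * D ** 2
    B = d2 * (1 / v1 + 1 / v2) - (p1 + p2 - 2 * p1 * p2) * D ** 2
    C = d2 / v1 - v2 * D ** 2
    return A, B, C


def proof_conditions_hold(p1, p2, t_values=(0.1, 0.5, 1.0, 2.0, 10.0)):
    """Check A > 0, C > 0, B + 2 sqrt(AC) > 0 and f(t) > 0 at selected t."""
    A, B, C = coefficients(p1, p2)
    if not (A > 0 and C > 0):
        return False
    if not B + 2 * math.sqrt(A * C) > 0:
        return False
    return all(A * t * t + B * t + C > 0 for t in t_values)


# Rates 0.05, 0.10, ..., 0.95 with p1 < p2.
pairs = [(i / 20, j / 20) for i in range(1, 20) for j in range(i + 1, 20)]
all_ok = all(proof_conditions_hold(p1, p2) for p1, p2 in pairs)
print(all_ok)
```

Checking $B + 2\sqrt{AC} > 0$ for every pair, rather than only when $B < 0$, is slightly stronger than what Scenario 2 requires, but it is implied by $g(q_1) > 0$ for all $q_1$.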
Appendix B. Plots for the test statistics and powers

Figures 4 and 5 display the test statistics and powers, respectively, that were obtained under the
settings in Section 3.
Acknowledgements
The authors wish to thank the anonymous Associate Editor and two referees for constructive
suggestions and comments that substantially improved the original version of this article.
References
Fienberg, S. E. (1980). The Analysis of Cross-Classified Categorical Data. MIT Press, Cambridge,
MA, second edition.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman and Hall/CRC,
Boca Raton, second edition.
Miettinen, O. and Nurminen, M. (1985). Comparative analysis of two rates. Statistics in Medicine
4, 213-226.
R Development Core Team (2008). R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
SAS® (2008). SAS Institute Inc., Cary, NC.
Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. Springer-Verlag Inc,
New York.
Table 1: Test statistics and the corresponding p-values by the proportion $\chi^2$ test or the logistic
regression Wald test

            Proportion chi-square test              Logistic regression Wald test
            Method        Z_Prop   p-value_Prop     Method        Z_Logistic   p-value_Logistic
Equation    Equation (1)  1.9712   0.0487           Equation (2)  1.9490       0.0513
R           prop.test     1.9712   0.0487           glm           1.9490       0.0513
SAS®        PROC FREQ     1.9712   0.0487           PROC GENMOD   1.9490       0.0513
[Figure 1 panels: $n_1 = n_2 = 30$, 50, 100 (rows) by $p_1 = 0.1$, 0.3, 0.5 (columns); horizontal axis: $p_2$; vertical axis: $Z_{\mathrm{Prop}} - Z_{\mathrm{Logistic}}$.]
Figure 1: Numerical comparisons between $Z_{\mathrm{Prop}}$ and $Z_{\mathrm{Logistic}}$: $Z_{\mathrm{Prop}} - Z_{\mathrm{Logistic}} > 0$ in all cases, and the
difference is more visible when the difference between $p_1$ and $p_2$ is relatively large.
[Figure 2 panels: $n_1 = n_2 = 30$, 50, 100 (rows) by $p_1 = 0.1$, 0.3, 0.5 (columns); horizontal axis: $p_2$; vertical axis: $\mathrm{Power}_{\mathrm{Prop}} - \mathrm{Power}_{\mathrm{Logistic}}$.]
Figure 2: Numerical comparisons between $\mathrm{Power}_{\mathrm{Prop}}$ and $\mathrm{Power}_{\mathrm{Logistic}}$: $\mathrm{Power}_{\mathrm{Prop}} - \mathrm{Power}_{\mathrm{Logistic}} > 0$ in
all cases; the difference is more visible when $p_1$ or $p_2$ is close to 0 or 1.
[Figure 3 panels: two sketches of the parabola $y = f(t)$ against $x = t$, each marking the origin $(0, 0)$, the intercept $f(0) = C$, the axis of symmetry $-B/(2A)$, and the vertex height $-(B^2 - 4AC)/(4A)$.]
Figure 3: Parabola $f(t) = At^2 + Bt + C$ with $A > 0$ and $C > 0$.
Left Panel: Scenario 1, $B \geq 0$. In this case, since the axis of symmetry $-B/(2A) \leq 0$ and
$f(0) = C > 0$, $f(t) > 0$ for all $t > 0$.
Right Panel: Scenario 2, $B < 0$. In this case, since the axis of symmetry $-B/(2A) > 0$, the
y-coordinate of the vertex should be positive (that is, $-(B^2 - 4AC)/(4A) > 0$) to guarantee
$f(t) > 0$ for all $t > 0$.
[Figure 4 panels: $n_1 = n_2 = 30$, 50, 100 (rows) by $p_1 = 0.1$, 0.3, 0.5 (columns); horizontal axis: $p_2$; vertical axis: $Z_{\mathrm{Prop}}$ or $Z_{\mathrm{Logistic}}$; one curve for each statistic.]
Figure 4: Numerical comparisons between the test statistics $Z_{\mathrm{Prop}}$ and $Z_{\mathrm{Logistic}}$
[Figure 5 panels: $n_1 = n_2 = 30$, 50, 100 (rows) by $p_1 = 0.1$, 0.3, 0.5 (columns); horizontal axis: $p_2$; vertical axis: rejection rates of the 5% test; one curve each for $\mathrm{Power}_{\mathrm{Prop}}$ and $\mathrm{Power}_{\mathrm{Logistic}}$.]
Figure 5: Numerical comparisons between the powers $\mathrm{Power}_{\mathrm{Prop}}$ and $\mathrm{Power}_{\mathrm{Logistic}}$