Dean P. Foster and Robert A. Stine
Department of Statistics
Philadelphia, PA 19104-6340
April 7, 2005
Abstract
We propose an adaptive, sequential methodology for testing multiple hypotheses. Our methodology consists of a new criterion, the excess discovery count (EDC), and a new class of testing procedures that we call alpha-investing rules. The excess discovery count is the difference between the number of correctly rejected null hypotheses and a fraction of the total number of rejected hypotheses. EDC shares many properties with the false discovery rate (FDR), but is adapted to testing a sequence of hypotheses rather than a fixed set. Because EDC controls the count of incorrectly rejected hypotheses rather than a ratio, we are able to prove that a wide class of testing procedures that we call alpha-investing rules control EDC. Alpha-investing rules mimic alpha-spending rules used in sequential trials, but possess a key difference. When a test rejects a null hypothesis, alpha-investing rules earn additional probability toward testing subsequent hypotheses. Alpha-investing rules allow one to incorporate domain knowledge into the testing procedure and improve the power of the tests.
Key words and phrases: Bonferroni method, false discovery rate (FDR), family wide error rate (FWER), multiple comparison procedure.
All correspondence regarding this manuscript should be directed to Prof. Stine at the address shown
EDC and Alpha-investing 2
1 Introduction
We propose an adaptive, sequential methodology for testing multiple hypotheses. Our approach works in the usual setting in which one has a batch of several hypotheses as well as cases in which the hypotheses arrive sequentially in a stream. Streams of hypothesis tests arise naturally in a variety of contemporary modeling applications, such as genomics and variable selection for large models. In contrast to the comparatively well-defined problems that spawned multiple comparison procedures such as Tukey's studentized range, these applications can involve thousands of tests. For example, microarrays lead one to compare a control group to a treatment group using measured differences on over 6,000 genes (Dudoit, Shaffer and Boldrick, 2003). In contrast, the example used by Tukey to motivate the problems of multiple comparisons compares the means of only 6 groups (Tukey, 1953, available in Braun (1994)). If one considers the possibility for interactions, then the number of tests is virtually infinite. Because our approach allows the testing to proceed sequentially, the choice of future hypotheses can depend upon the results of previous tests. Thus, having discovered differences in certain genes, an investigator could, for example, direct attention toward related genes identified by common transcription factor binding sites (Gupta and Ibrahim, 2005).
Our methodology has two key components, a criterion and a procedure. For multiple testing, we distinguish criteria that control the number of Type I errors from testing procedures. We call our new criterion the excess discovery count (EDC). EDC tracks the expected number of true rejections among the rejected hypotheses. To control EDC, a test procedure must guarantee that the expected count of true rejections exceeds a chosen fraction of the number of rejected hypotheses. For example, one might want to guarantee that at least 95% of the rejected hypotheses were rejected correctly. Although one can use EDC to control traditional tests, the advantage of this criterion is that it permits one to control adaptive testing procedures in which the choice of the next hypothesis to test depends on previous results.
The second component of our methodology is a class of adaptive testing procedures that we call alpha-investing rules. We show that testing procedures in this class control EDC. Alpha-investing rules allow one to test a possibly infinite stream of hypotheses, accommodate dependent tests, and incorporate domain knowledge. Alpha-investing rules mimic alpha-spending rules that are commonly used in clinical trials. Unlike alpha-spending rules, however, alpha-investing rules treat each test as an "investment." Each test has a cost, but can generate a profit in the form of an increase in the amount of Type I error available for subsequent tests.
The rest of this paper develops as follows. We first review several ideas from the literature on multiple comparisons, particularly those related to the family wide error rate and the false discovery rate. With these ideas in place, we define EDC in Section 3 and alpha-investing rules in Section 4. In Section 5, we show that alpha-investing rules control a generalized version of EDC. We give several examples of testing a sequence of hypotheses using alpha-investing rules in Section 6. We close in Section 7 with a brief discussion, and defer the single proof to the appendix.
Table 1: Counts of the number of null hypotheses that are true and false, displayed as sums of unobserved random variables. The marginal random variable $R(m)$ that counts the total number rejected is observable, but internal counts such as $V^\theta(m)$ depend upon $\theta$.
                            Claim
State          Accept H0          Reject H0
True H0        $U^\theta(m)$      $V^\theta(m)$      $m_0$
False H0       $T^\theta(m)$      $S^\theta(m)$      $m - m_0$
               $m - R(m)$         $R(m)$             $m$
number of correctly rejected null hypotheses. We index these random variables with a superscript $\theta$ to distinguish them from a statistic such as $R(m)$; $V^\theta(m)$ and $S^\theta(m)$ are not observable without $\theta$. For a null model, $m_0 = m$, $V^\theta(m) = R(m)$ and $S^\theta(m) = 0$.
A basic premise of multiple testing is to control the chance for any false rejection. The family wide error rate (FWER) is the probability of falsely rejecting any null hypothesis from $H(m)$, regardless of the values of the underlying parameters,
$$\mathrm{FWER}(m) \equiv \sup_{\theta \in \Theta} P_\theta(V^\theta(m) \ge 1). \tag{1}$$
An important special case is control of FWER under the null model. We refer to this criterion as the size of a procedure,
$$\mathrm{Size}(m) = P_0(V^\theta(m) \ge 1), \tag{2}$$
where $P_0$ denotes the probability measure under the null model. All of the procedures that we describe control Size(m), but not all control the more general FWER.
The Bonferroni procedure is familiar and represents an important benchmark for comparison. Let $p_1, \ldots, p_m$ denote the p-values of tests of $H_1, \ldots, H_m$. Given a chosen level $0 < \alpha < 1$, the usual Bonferroni procedure rejects those $H_j$ for which $p_j \le \alpha/m$. Let the indicators $V_j^\theta \in \{0, 1\}$ track incorrect rejections; $V_j^\theta = 1$ if $H_j$ is incorrectly rejected and is zero otherwise. Then $V^\theta(m) = \sum_j V_j^\theta$ and the inequality
$$P_\theta(V^\theta(m) \ge 1) \le \sum_{j=1}^{m} P_\theta(V_j^\theta = 1) \le \alpha \tag{3}$$
shows that this procedure controls $\mathrm{FWER}(m) \le \alpha$. More generally, one need not distribute $\alpha$ equally over $H(m)$; the procedure only requires that the sum of the $\alpha$-levels is not more than $\alpha$. For example, alpha-spending rules allocate $\alpha$ over a collection
The test of $H_{(1)}$ has p-value $p_{(1)}$, the test of $H_{(2)}$ has p-value $p_{(2)}$ and so forth. Holm's procedure rejects those hypotheses $H_{(j)}$ for which $p_{(j)}$ is less than an increasing sequence of thresholds. The procedure first compares the smallest p-value to the Bonferroni threshold. If $p_{(1)} > \alpha/m$, the procedure stops and does not reject any hypothesis. Consequently, $\mathrm{Size}(m) \le \alpha$. If $p_{(1)} \le \alpha/m$, the procedure rejects $H_{(1)}$ and moves on to test $H_{(2)}$. Rather than compare $p_{(2)}$ to $\alpha/m$, however, Holm's procedure compares $p_{(2)}$ to a larger threshold, $\alpha/(m-1)$. In general, if we define $j_d = \min\{j : p_{(j)} > \alpha/(m-j+1)\}$, then Holm's step-down procedure rejects $H_{(1)}, \ldots, H_{(j_d - 1)}$. Because of the nesting, this testing procedure is closed in the sense of Marcus, Peritz and Gabriel (1976) and hence controls $\mathrm{FWER}(m) \le \alpha$. Obviously, when compared to using the Bonferroni threshold for each p-value, Holm's method has larger power. The improvement is small, however, when $m$ is large because $\alpha/m$ is so close to $\alpha/(m-j)$ when testing the smallest p-values.
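The two procedures just described can be compared side by side. The following is a minimal Python sketch, not code from the paper; the function names `bonferroni` and `holm_step_down` and the example p-values are illustrative assumptions.

```python
def bonferroni(pvalues, alpha):
    """Indices of hypotheses rejected at the common threshold alpha/m."""
    m = len(pvalues)
    return [j for j, p in enumerate(pvalues) if p <= alpha / m]


def holm_step_down(pvalues, alpha):
    """Indices rejected by Holm's step-down procedure.

    The sorted p-values p_(1) <= ... <= p_(m) are compared with
    alpha/m, alpha/(m-1), ..., alpha; testing stops at the first
    p-value that exceeds its threshold.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    rejected = []
    for rank, j in enumerate(order, start=1):
        if pvalues[j] > alpha / (m - rank + 1):
            break
        rejected.append(j)
    return rejected


# Hypothetical p-values, chosen only to show the two rules disagreeing.
pvals = [0.001, 0.012, 0.04, 0.20, 0.013]
print(bonferroni(pvals, 0.05))
print(holm_step_down(pvals, 0.05))
```

With these values Holm's thresholds grow from 0.01 to 0.0125 and then to about 0.0167, so it rejects two hypotheses that the fixed Bonferroni threshold of 0.01 misses.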
The false discovery rate (FDR) criterion controls the size of a testing procedure but introduces a different type of control if the null model is rejected. Benjamini and Hochberg (1995) define FDR as the expected proportion of false positives among rejected hypotheses,
$$\mathrm{FDR}(m) = E_\theta\!\left(\frac{V^\theta(m)}{R(m)} \,\Big|\, R(m) > 0\right) P(R(m) > 0). \tag{4}$$
For the null model, $R(m) = V^\theta(m)$ and $\mathrm{FDR}(m) = \mathrm{FWER}(m)$. Thus, test procedures that control $\mathrm{FDR}(m) \le \alpha$ have $\mathrm{Size}(m) \le \alpha$. Under the alternative, FDR(m) decreases as the number of false null hypotheses $m - m_0$ increases (Dudoit et al., 2003). As a result, FDR(m) becomes easier to control in the presence of non-zero effects, allowing more powerful procedures. Variations on FDR include pFDR (which drops the term $P(R > 0)$; Storey, 2002, 2003) and the local false discovery rate fdr(z) (which estimates the false discovery rate as a function of the size of the test statistic; Efron, 2005a,b). Closer to our work, Meinshausen and Rice (2004) and Meinshausen and Buehlmann (2004) consider estimates of $m - m_0$, the total number of false null hypotheses in $H(m)$.
Benjamini and Hochberg (1995) show that the following so-called step-up testing procedure controls FDR. First, assume that the p-values are independent and define $j_u = \max\{j : p_{(j)} \le j\alpha/m\}$. Using the inequality of Simes (1986), they show that the testing procedure that rejects $H_{(1)}, \ldots, H_{(j_u)}$ controls $\mathrm{FDR}(m) \le \alpha$. This testing procedure thus controls $\mathrm{Size}(m) \le \alpha$, but does not control FWER for all $\theta$. A similar step-down procedure that rejects $H_{(1)}, \ldots, H_{(j_d - 1)}$ for $j_d = \min\{j : p_{(j)} > j\alpha/m\}$ also has $\mathrm{FDR}(m) \le \alpha$. Although this step-down procedure has less power than its step-up cousin (because $j_d - 1 \le j_u$), it has more power than Holm's procedure. Holm's step-down procedure sets thresholds for the p-values to $\alpha/m, \alpha/(m-1), \alpha/(m-2), \ldots$ whereas a Simes-based step-down procedure uses the larger thresholds $\alpha/m, 2\alpha/m, 3\alpha/m, \ldots$. A cost of this greater power is a restriction to independent tests that Holm's procedure does not require. Subsequent papers (such as Benjamini and Yekutieli, 2001; Sarkar, 1998; Troendle, 1996) consider situations in which this type of step-up/step-down testing controls FDR under dependence, but the results obtain only for certain types of dependence.
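The difference between the step-up and step-down use of the thresholds $j\alpha/m$ is easiest to see on a small example. The sketch below is illustrative only; the function names `step_up` and `simes_step_down` and the p-values are assumptions, not taken from the paper.

```python
def step_up(pvalues, alpha):
    """Step-up rule: reject H_(1..j_u) with j_u = max{j : p_(j) <= j*alpha/m}."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    j_u = 0
    for rank, j in enumerate(order, start=1):
        if pvalues[j] <= rank * alpha / m:
            j_u = rank          # keep the largest rank that passes
    return order[:j_u]


def simes_step_down(pvalues, alpha):
    """Step-down rule: stop at the first p_(j) that exceeds j*alpha/m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda j: pvalues[j])
    rejected = []
    for rank, j in enumerate(order, start=1):
        if pvalues[j] > rank * alpha / m:
            break
        rejected.append(j)
    return rejected


# Hypothetical p-values: the second smallest misses its threshold,
# but later ones fall back under theirs.
pvals = [0.005, 0.029, 0.028, 0.5, 0.039]
print(step_up(pvals, 0.05))
print(simes_step_down(pvals, 0.05))
```

Here the step-down rule stops after one rejection, while the step-up rule looks past the failure at rank 2 and rejects four hypotheses, which illustrates the inequality $j_d - 1 \le j_u$.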
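Figure 1: EDC controls the gap between the number of true rejections $S^\theta$ and a fraction of the number of rejected null hypotheses. A strong signal implies most of the null hypotheses in $H$ are false.
[Figure 1 plot: the vertical axis is a count and the horizontal axis is $\theta$, running from no signal through moderate to strong signal; the labels shown are $E_\theta R$, $E_\theta S^\theta$, $\gamma E_\theta R - \alpha$ and EDC.]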
FDR(m) controls the expected proportion of false positives $V^\theta(m)/R(m)$ given that $R(m) > 0$. $\mathrm{EDC}_{\alpha,\gamma}(m)$ instead controls the expected difference in the counts $S^\theta(m) - \gamma R(m)$. Being a ratio, $0 \le \mathrm{FDR}(m) \le 1$ and hence resembles a conditional probability. In contrast $\mathrm{EDC}_{\alpha,\gamma}(m)$ need not be positive, let alone lie between 0 and 1.
We are most interested in procedures such as that suggested by Figure 1 for which EDC is positive. In this figure, the x-axis indicates the amount of signal in the sense of the proportion of null hypotheses in $H$ that are false. "Strong signal" implies that many of the $m$ hypotheses are false, whereas "no signal" implies the null model. We will say that a multiple testing procedure "controls EDC" if $\mathrm{EDC}_{\alpha,\gamma}(m) \ge 0$. Control of EDC amounts to showing that the expected count of true rejections is at least $\gamma E_\theta R(m) - \alpha$. Under the null model, $S^\theta(m) = 0$ so that
$$\mathrm{EDC}_{\alpha,\gamma}(m) = \alpha - \gamma E_\theta R(m) \le \alpha - \gamma\,\mathrm{Size}(m).$$
Thus, a procedure that controls $\mathrm{EDC}_{\alpha,\gamma}(m) \ge 0$ also controls $\mathrm{Size}(m) \le \alpha/\gamma$. One can also use EDC to control FWER. If $\gamma = 1$, control of EDC implies control of FWER because
$$\mathrm{EDC}_{\alpha,1}(m) \ge 0 \;\Rightarrow\; P_\theta(V^\theta(m) \ge 1) \le E_\theta V^\theta(m) \le \alpha.$$
This property suggests that one can think of $\alpha$ as controlling the FWER when $\gamma \approx 1$.
The second tuning parameter $\gamma$ more closely resembles FDR in the sense of controlling the procedure once it rejects the null model. Assuming that $E_\theta R(m) > 0$, control
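Figure 2: When viewed as controlling the proportion of false positives among rejected null hypotheses, EDC controls the gap between the ratio of expectations $E_\theta V^\theta / E_\theta R$ and a decreasing function of the number of rejected null hypotheses. A strong signal in the heuristic
[Figure 2 plot: the horizontal axis is $\theta$, running from no signal through moderate to strong signal; the labels shown are FWER, $\alpha/\gamma$, $(1-\gamma) + \alpha/E_\theta R$ and $E_\theta V^\theta / E_\theta R$.]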
Figure 3: FDR (left) and EDC (right, with $\alpha = 0.05$ and $\gamma = 0.95$) control the size of test procedures ($\pi_1 = 0$) and the number rejected as the level of signal $\pi_1$ grows. The lines show FDR and EDC for the Bonferroni procedure (solid line), Simes-based step-down testing ( ), and a naive procedure that rejects each hypothesis at level $\alpha = 0.05$ ( ).
[Figure 3 plot: two panels, FDR (left, vertical axis 0.01 to 0.08) and EDC (right, vertical axis -2 to 6), each plotted against $\pi_1$ from 0.1 to 1.]
note also that the Bonferroni procedure produces linear trends in EDC. The slope of the line seen in the right panel of Figure 3 depends upon the choice of $\gamma$ in $\mathrm{EDC}_{\alpha,\gamma}$. Conservative methods force $V^\theta(m)$ to be small regardless of the presence of signal so
that E S (m)
R(m) +
(1
) 1.
4 Alpha-Investing Rules
Alpha-investing rules provide a framework for devising multiple testing procedures that control EDC in a dynamic setting that allows streams of hypotheses. Alpha-investing rules resemble alpha-spending rules such as those often used in sequential clinical trials. In a sequential trial, investigators routinely monitor the accumulating results for safety and efficacy. This monitoring leads to a sequence of tests of one (or several) null hypotheses as the data accumulate. Alpha-spending (or error-spending) rules control the level of such tests. Given an overall Type I error rate for the trial, such as $\alpha = 0.05$, alpha-spending rules allocate, or spend, $\alpha$ over a sequence of tests. As Tukey (1991) writes, "Once we have spent this error rate, it is gone." When repeatedly testing one null hypothesis $H_0$ in a clinical trial, spending rules guarantee that $P(\text{reject } H_0) \le \alpha$ when $H_0$ is true.
While similar in that they allocate Type I error over multiple tests, alpha-investing rules differ from alpha-spending rules in the following way. An alpha-investing rule earns additional probability toward subsequent Type I errors with each rejected hypothesis. Rather than treating each test as an expense that consumes its Type I error rate, an alpha-investing rule treats tests as investments, motivating our choice of name. In keeping with this analogy, we call the Type I error rate available to the rule its alpha-wealth. As with an alpha-spending rule, an alpha-investing rule can never spend more than its current alpha-wealth. Unlike an alpha-spending rule, however, an alpha-investing rule earns an increment in its alpha-wealth each time that it rejects a null hypothesis. For alpha-investing, Tukey's remark becomes "If we invest the error rate wisely, we'll earn more for further tests." A procedure that invests its alpha-wealth in testing hypotheses that are rejected accumulates additional wealth toward subsequent tests. The more hypotheses that are rejected, the more alpha-wealth it earns. If the test of $H_j$ is not significant, however, the rule loses the $\alpha$-level invested in this test and its alpha-wealth decreases. The more wealth a rule invests in testing hypotheses that are not rejected, the less alpha-wealth remains for subsequent tests.
More specifically, an alpha-investing rule is a function $I$ that determines the $\alpha$-level for testing the next hypothesis in a sequence of tests. We assume an exogenous system external to the investing rule determines the next hypothesis to test. (Though not part of the investing rule itself, this exogenous system can use the sequence of rejections $R_j$ to determine the next hypothesis to test.) An alpha-investing rule has two parameters: the initial alpha-wealth and the amount earned (called the pay-out) when a null hypothesis is rejected. Let $W(k) \ge 0$ denote the alpha-wealth accumulated by an investing rule after $k$ tests; $W(0)$ is the initial alpha-wealth. For example, one might conventionally set $W(0) = 0.05$ or $0.10$. At step $j$, an alpha-investing rule sets the level for testing $H_j$ to some value $\alpha_j$ up to its current wealth, $0 \le \alpha_j \le W(j-1)$. The level $\alpha_j$ for testing $H_j$ typically depends upon the sequence of prior outcomes $R_1, R_2, \ldots, R_{j-1}$, and so we write an alpha-investing rule in general as
$$\alpha_j = I_{W(0),\omega}(R_1, R_2, \ldots, R_{j-1}) = I_{W(0),\omega}(j). \tag{7}$$
The outcomes of the sequence of tests determine the alpha-wealth $W(j)$ available for testing $H_{j+1}$. Let $p_j$ denote the p-value of the test of $H_j$. If $p_j \le \alpha_j$, the test rejects $H_j$. In this case, the investing rule pays $\log 1/(1 - p_j) \approx p_j$ from the invested $\alpha_j$ and earns a pay-out $\omega$ that is added to its alpha-wealth. If $p_j > \alpha_j$, the procedure does not reject $H_j$ and its alpha-wealth decreases by $-\log(1 - \alpha_j)$. The change in the alpha-wealth is thus
$$W(j) - W(j-1) = \begin{cases} \omega + \log(1 - p_j) & \text{if } p_j \le \alpha_j, \\ \log(1 - \alpha_j) & \text{if } p_j > \alpha_j. \end{cases} \tag{8}$$
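The wealth increment in (8) is simple enough to state directly in code. The following Python sketch is one possible rendering under the definitions above; the function name `update_wealth` and its argument names are illustrative assumptions.

```python
import math


def update_wealth(wealth, p_value, level, payout):
    """Apply the wealth increment of equation (8) for a single test.

    wealth  : W(j-1), the alpha-wealth before the test
    p_value : p_j
    level   : alpha_j, which must satisfy 0 <= alpha_j <= W(j-1)
    payout  : omega
    Returns (W(j), rejected).
    """
    if not 0.0 <= level <= wealth:
        raise ValueError("level must lie between 0 and the current wealth")
    if p_value <= level:
        # Rejection: earn omega and pay -log(1 - p_j), which is roughly p_j.
        return wealth + payout + math.log(1.0 - p_value), True
    # No rejection: pay -log(1 - alpha_j), slightly more than alpha_j.
    return wealth + math.log(1.0 - level), False


w = 0.05
w, rejected = update_wealth(w, p_value=0.20, level=0.01, payout=0.05)
print(rejected, w)
w, rejected = update_wealth(w, p_value=0.001, level=0.02, payout=0.05)
print(rejected, w)
```

The first call loses a little more than the invested 0.01; the second rejects and raises the wealth above its starting value, which is the sense in which a test is an investment rather than an expense.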
The appearance of $\log(1 - \alpha)$ and $\log(1 - p)$ in (8) deserves some explanation. Consider the following "micro-investment" approach to testing a single null hypothesis $H_0$. Set the initial wealth $W(0) = \alpha$ and assume that the test of $H_0$ returns p-value $p_0$. Rather than use one test at level $\alpha$, a micro-investment approach uses a sequence of tests, each risking a small amount $\epsilon$ of the total alpha-wealth. First test $H_0$ at level $\epsilon$, rejecting $H_0$ if $p_0 \le \epsilon$. If $p_0 > \epsilon$, the investing rule pays $\epsilon$ for the first test, and then tests $H_0$ conditionally on $p_0 > \epsilon$ at level $\epsilon$. This second test rejects $H_0$ if $\epsilon < p_0 \le 2\epsilon - \epsilon^2$. If this second test does not reject $H_0$, the investing rule again pays $\epsilon$ and retests $H_0$, now conditionally on $p_0 > 2\epsilon - \epsilon^2$. This process continues until the investing rule either spends all of its alpha-wealth or rejects $H_0$ on the $k$th attempt because
$$1 - (1 - \epsilon)^{k-1} < p_0 \le 1 - (1 - \epsilon)^{k}.$$
If the procedure rejects $H_0$ after $k$ tests, then the total of the micro-payments made is
$$k\epsilon \approx \epsilon\,\frac{\log(1 - p_0)}{\log(1 - \epsilon)} \to -\log(1 - p_0) \quad \text{as } \epsilon \to 0.$$
The increments to the wealth defined in equation (8) essentially treat each test as a sequence of such micro-level tests.
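The limit of the micro-payments can be checked numerically. The sketch below counts the conditional tests needed before rejection and compares the total paid with $-\log(1 - p_0)$; the function name `micro_payments`, the symbol `eps` for the small amount risked per test, and the choice $p_0 = 0.03$ are assumptions made for illustration.

```python
import math


def micro_payments(p0, eps):
    """Total paid by repeated level-eps tests until H0 is rejected.

    The k-th conditional test rejects when
    1 - (1 - eps)**(k - 1) < p0 <= 1 - (1 - eps)**k,
    and the total is taken to be k * eps, as in the text.
    """
    k = 1
    while p0 > 1.0 - (1.0 - eps) ** k:
        k += 1
    return k * eps


p0 = 0.03
limit = -math.log(1.0 - p0)
for eps in (1e-2, 1e-3, 1e-4):
    # The gap between the total paid and the limit shrinks with eps.
    print(eps, micro_payments(p0, eps), limit)
```

Because $k$ is an integer, the total overshoots the limit by less than one micro-payment, so the agreement improves in step with `eps`.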
In the next section, we show that alpha-investing rules that accumulate alpha-wealth in this way control EDC. The initial alpha-wealth $W(0)$ controls the chance for rejecting the null model. Under the null model when no hypothesis is rejected, an investing rule performs like an alpha-spending rule with level $W(0)$ and so $\mathrm{Size}(m) \le W(0)$. Results described in the next section permit one to make a correspondence between the parameters $W(0)$ and $\omega$ that characterize an alpha-investing rule and the parameters $\alpha$ and $\gamma$ that identify EDC. In particular, to control EDC, it will be shown most natural to associate $W(0)$ with $\alpha$ and $\omega$ with $\gamma$.
Whereas $W(0)$ controls the probability of rejecting the null model, the pay-out $\omega$ controls how the testing procedure performs once it has rejected the null model. The notion of compensation for rejecting a hypothesis captured in (8) allows one to build context-dependent information into the testing procedure. Suppose that the substantive context suggests that the first few hypotheses are most likely to be those that are rejected and that false hypotheses come in clusters. In this setting, one might consider using an alpha-investing rule like the following. Assume that the last rejected hypothesis is $H_{k^*}$. If false hypotheses are clustered, an alpha-investing rule should invest most of its wealth $W(k^*)$ available after rejecting $H_{k^*}$ in testing $H_{k^*+1}$. A rule that does this is
$$I_{W(0),\omega}(k) = \frac{6\,W(k^*)}{\pi^2}\,\frac{1}{(k - k^*)^2}, \quad k = k^*+1, \ldots, \min\{j : j > k^*, R_j = 1\}. \tag{9}$$
This rule invests $6/\pi^2 \approx 0.6$ of its wealth in testing $H_1$ or the null hypothesis $H_{k^*+1}$ that follows a rejected hypothesis. The $\alpha$-level falls off rapidly at the rate $1/k^2$ as more subsequent hypotheses are tested and not rejected. If the substantive insight is correct and the false null hypotheses are clustered, then tests of hypotheses like $H_1$ or $H_{k^*+1}$ represent "good investments." An example in Section 6 illustrates these ideas.
While it is relatively straightforward to devise investing rules, it may be difficult a priori to order the hypotheses in such a way that those most likely to be rejected come first. Such an ordering relies heavily on the structure of the specific testing situation. Another complication is the construction of tests that provide the p-values that determine the alpha-wealth of an investing rule according to (8). In order to show that a procedure controls EDC, we require a test of $H_j$ to have the property that
$$\forall \theta \in \Theta, \quad E_\theta(V_j^\theta \mid R_{j-1}, R_{j-2}, \ldots, R_1) \le \alpha_j. \tag{10}$$
This condition amounts to requiring that, conditionally on having either accepted or rejected the prior $j - 1$ hypotheses, the test of $H_j$ is done at level no higher than the nominal choice $\alpha_j$. The tests need not be independent.
Remark. These procedures only require that the test of $H_j$ maintain the stated $\alpha$-level conditionally on the binary random variables $R_1, R_2, \ldots, R_{j-1}$. In particular, we note that the test is not conditioned on the test statistic (such as a z-score) or parameter estimate. Adaptive testing in a group sequential trial (e.g. Lehmacher and Wassmer, 1999) uses the information on the observed z-statistic at the first look. Tsiatis and Mehta (2003) show that using this information leads to a less powerful test compared to traditional group sequential tests that only look at acceptance at the first look.
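100% of the hypotheses to be falsely rejected. Because all of the null hypotheses are true, $S^\theta(m) = 0$ and $\mathrm{EDC}_{\alpha,\gamma}(m) = \alpha - \gamma E_\theta R(m) = \alpha(1 - \gamma m) \to -\infty$ as $m \to \infty$. Hence $\mathrm{EDC}_{\alpha,\gamma} = -\infty$.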
Second, we observe that it is always possible to construct a test procedure for which $\mathrm{EDC}_{\alpha,\gamma} \ge 0$. The Bonferroni procedure offers a concrete example. Although the common application of the Bonferroni rule assigns equal $\alpha$-level to each test, this need not be the case. All that is necessary is that the sum of the levels be no more than $\alpha$. If one tests $H_j$ at level $\alpha_j$ and $\sum_j \alpha_j \le \alpha$, then $E_\theta V^\theta(m) \le \alpha$ for all $m$. Thus, $\mathrm{EDC}_{\alpha,\gamma}(m) \ge 0$ for all $\theta$ and $m$.
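The contrast between the naive and Bonferroni procedures under the null model reduces to arithmetic on the test levels. The sketch below assumes that every null hypothesis is true and that each test has exactly its nominal level, so that $E\,R(m)$ equals the sum of the levels and the quantity of interest is $\alpha - \gamma E\,R(m)$; the function name `null_model_edc` and the values of `m`, `alpha` and `gamma` are illustrative.

```python
def null_model_edc(levels, alpha, gamma):
    """alpha - gamma * E R(m) under the null model.

    Assumes each test has exactly its nominal level, so that the
    expected number of rejections is the sum of the levels.
    """
    return alpha - gamma * sum(levels)


m, alpha, gamma = 200, 0.05, 0.95
naive = [alpha] * m                                      # every test at level alpha
equal = [alpha / m] * m                                  # usual Bonferroni split
unequal = [alpha / 2 ** j for j in range(1, m + 1)]      # levels still sum to at most alpha

for name, levels in (("naive", naive), ("equal", equal), ("unequal", unequal)):
    print(name, null_model_edc(levels, alpha, gamma))
```

The naive procedure gives $\alpha(1 - \gamma m)$, which is negative and decreases without bound as $m$ grows, while any allocation whose levels sum to at most $\alpha$ stays non-negative.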
The following theorem states that an alpha-investing rule $I_{W(0),\omega}$ with wealth determined by (8) controls EDC so long as the pay-out $\omega$ is not too large. The theorem follows by showing that a stochastic process related to the alpha-wealth sequence $W(0), W(1), \ldots$ is a sub-martingale. Because the proof of this result relies only on the optional stopping theorem for martingales, we do not require independent tests, though this is certainly the easiest context in which to show that the p-values are honest in the sense required for (10) to hold.
Theorem 1 An alpha-investing rule $I_{W(0),\omega}$ governed by (8) with initial alpha-wealth $W(0) \le \alpha$ and pay-out $\omega \le 1 - \gamma$ controls $\mathrm{EDC}_{\alpha,\gamma}$,
6 Examples
The examples in this section illustrate alpha-investing rules and EDC. Our first two examples consider testing a large, but fixed, collection of $m$ hypotheses for which we observe independent p-values $p_1, p_2, \ldots, p_m$. The first describes an alpha-investing rule that mimics Simes-based step-down testing. The second shows how alpha-investing rules are able to leverage domain knowledge to form a more powerful multiple testing procedure. A third example describes alpha-investing when testing a stream of hypotheses using dependent test statistics.
We compare alpha-investing to the Simes-based step-down testing procedure described in Section 2. This procedure rejects $H_{(1)}, H_{(2)}, \ldots, H_{(j_d - 1)}$, where $j_d = \min\{k : p_{(k)} > k\alpha/m\}$ identifies the first test that is not rejected. (Step-up testing does not provide a stopping time.) Assume that the step-down procedure controls $\mathrm{FDR}(m) \le \alpha$ and rejects a small number $k > 0$ of the $m$ hypotheses. It follows then that the p-values have the following structure:
$$p_{(1)} \le \alpha/m, \quad p_{(2)} \le 2\alpha/m, \quad \ldots, \quad p_{(k)} \le k\alpha/m, \quad \text{and} \quad p_{(k+1)} > (k+1)\alpha/m. \tag{13}$$
To reproduce this behavior with alpha-investing, consider the following approach. Set the initial alpha-wealth $W(0) = \alpha$ and $\omega = \alpha$. Define the alpha-investing rule to allocate its available alpha-wealth $W(j)$ equally over the hypotheses that have not been rejected, and begin by testing each hypothesis at the Bonferroni level $\alpha/m$. Because of the structure in the p-values (13), this first pass rejects at least one hypothesis, namely $H_{(1)}$. To keep the presentation simple, suppose that only one hypothesis has p-value less than $\alpha/m$. The procedure pays $-\log(1 - \alpha/m)$ for each test that does not reject, and earns $\alpha + \log(1 - p_{(1)})$ for rejecting $H_{(1)}$. Hence, after testing each hypothesis at level $\alpha/m$, its alpha-wealth is at least
$$W(m) = W(0) + \alpha + \log(1 - p_{(1)}) + (m-1)\log(1 - \alpha/m) \ge 2\alpha + m\log(1 - \alpha/m) \ge \alpha - \alpha^2/m. \tag{14}$$
After this first pass through the hypotheses, its alpha-wealth is virtually unchanged, and it retains enough wealth to reject $H_{(2)}$.
For the second pass through the remaining $m - 1$ null hypotheses, the alpha-investing rule rejects any hypothesis for which $p_j \le 2\alpha/m$, as in the Simes procedure. Because these tests condition on $p_j > \alpha/m$, this round of testing requires that the alpha-investing rule test each of the remaining $m - 1$ hypotheses at level
$$P_0\!\left(\frac{\alpha}{m} < p_j \le \frac{2\alpha}{m} \,\Big|\, p_j > \frac{\alpha}{m}\right) = \frac{\alpha}{m - \alpha}.$$
It possesses enough wealth for the second round to do this because, from (14) for $\alpha \le 1/2$,
$$\frac{W(m)}{m - 1} \ge \frac{\alpha - \alpha^2/m}{m - 1} \ge \frac{\alpha}{m - \alpha}.$$
As in the first round, this second pass again approximately conserves the alpha-wealth of the procedure. Thus, so long as $m$ is large and $k \ll m$ so that bounds similar to (14) hold, each pass through the hypotheses conserves enough alpha-wealth for the next round of tests. In this way, the investing rule gradually raises the threshold for rejecting a hypothesis as the number of rejected hypotheses increases.
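The two inequalities that carry this argument, the lower bound on the wealth after the first pass and the comparison with the level needed in the second pass, can be checked numerically. The sketch below evaluates both at the least favorable case $p_{(1)} = \alpha/m$; the function name `first_pass_bounds` and the grid of values for `m` and `alpha` are illustrative assumptions.

```python
import math


def first_pass_bounds(m, alpha):
    """Return (W(m), lower bound on W(m), level needed in round two).

    W(m) is evaluated at p_(1) = alpha/m, which gives
    2*alpha + m*log(1 - alpha/m).
    """
    wealth = 2 * alpha + m * math.log(1.0 - alpha / m)
    lower = alpha - alpha ** 2 / m
    needed = alpha / (m - alpha)
    return wealth, lower, needed


for m in (10, 100, 10_000):
    for alpha in (0.05, 0.25, 0.5):
        wealth, lower, needed = first_pass_bounds(m, alpha)
        # Both conditions should hold whenever alpha <= 1/2.
        ok = wealth >= lower and lower / (m - 1) >= needed
        print(m, alpha, round(wealth, 6), ok)
```

The second condition reduces algebraically to $m(1 - 2\alpha) + \alpha^2 \ge 0$, which is where the restriction $\alpha \le 1/2$ enters.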
The simulation summarized in the next section compares this alpha-investing rule to step-down testing. The alpha-investing rule generally does slightly better (rejects more false hypotheses) than step-down testing for two reasons. First, the lower bound (14) for the wealth $W(m)$, for example, assumes $p_{(1)} = \alpha/m$. In fact, we would expect $p_{(1)}$ to be closer to $\alpha/(2m)$, on average. Second, our description assumes that the p-values rejected by step-down testing are evenly distributed, with one between each threshold. Instead, it is likely that some passes of the investing rule will reject more than one hypothesis and thus have greater alpha-wealth for testing in the next round than suggested by these lower bounds.
The performance of an alpha-investing rule improves, in the sense of being more powerful, if the investigator "knows the science". If the investigator is able to order the hypotheses a priori so that those most likely to be rejected are tested first, then alpha-investing can reject considerably more hypotheses than step-down testing. The full benefit is only realized, however, when one exploits an aggressive investing rule. The prior investing rule assumes that the hypotheses are arranged in no particular order and spreads its alpha-wealth evenly over the remaining hypotheses.
Suppose that the test procedure rejects $H_{k^*}$ and is about to test $H_{k^*+1}$. Rather than spread its current alpha-wealth $W(k^*)$ evenly over the remaining hypotheses, a rule can invest more in testing the next hypothesis. For example, one can allocate $W(k^*)$ using a discrete probability mass function such as this version of the investing rule (9). If none of the remaining hypotheses are rejected, then the level for testing $H_j$ is
$$\alpha_j = \frac{W(k^*)}{h_{m-k^*,2}}\,\frac{1}{(j - k^*)^2}, \quad j = k^*+1, \ldots, m, \tag{15}$$
where the normalizing constant $h_{q,2} = \sum_{i=1}^{q} 1/i^2$. If one of these tests rejects a hypothesis, the procedure reallocates its wealth so that all is spent by the time the procedure tests $H_m$. Mimicking the language of financial investing, we describe this type of alpha-
and EDC (right), as does step-down testing ( ). Conservative alpha-investing assumes
no domain knowledge, whereas aggressive alpha-investing uses domain knowledge, here the
ordering of 2i .
[Figure plot: two panels, FDR (left, vertical axis 0.01 to 0.07) and EDC (right, vertical axis -2 to 6), each plotted against $\pi_1$ from 0.1 to 1.]
hypotheses than step-down testing. The two become more similar as signal strength grows (in the form of more false null hypotheses). As discussed in the prior section, conservative alpha-investing rejects a few more hypotheses, about 5-10%, than Simes-based step-down testing.
The previous examples illustrate EDC and alpha-investing rules when testing a closed set of $m$ hypotheses using independent tests. For dependent tests, however, step-down testing does not guarantee control of FDR. In comparison, one can find alpha-investing rules that control EDC.
EDC itself makes no assumption of independence of the tests, but does require that the tests be conditionally correct in the sense of (10). When hypothesis tests are independent, it is simple to assure that each test indeed has level $\alpha_j$. One need only form each test as though only one hypothesis were being tested; the outcomes of the prior tests $R_1, R_2, \ldots, R_{j-1}$ do not affect its level. This condition is much more difficult to establish when the tests are dependent. Although EDC allows any sort of dependence, it may not be possible to construct tests that satisfy this condition without making assumptions on the form of the dependence.
In some cases, however, known properties of multivariate distributions suggest a suitable test procedure. For example, suppose that the test statistics $Y = (Y_1, \ldots, Y_m)$ for $H(m)$ have a multivariate normal distribution with mean vector ~ and covariance
Figure 5: Aggressive alpha-investing using (15) exploits domain knowledge to achieve higher power than Simes-based step-down testing. This plot shows the percentage of correctly rejected null hypotheses for each procedure, relative to step-down testing. Both alpha-investing rules have more power than step-down testing with the same size.
[Figure 5 plot: vertical axis "% Rejected vs Step-Down", 100 to 150; horizontal axis $\pi_1$, 0.1 to 1; curves labeled Aggressive, Conservative and Step-down.]
Hj : ~0r;j ~r = 0; ~0
;j ~
= 0. Standard results from linear models show that the usual
tests of Hj and Hk are independent if ~0r;j~r;k = 0 and ~0
;j~
;k = 0. Suppose one
begins with tests of the row ee
ts (
= 0). There are no
constraints on the tests until rejecting a hypothesis, $H_k$ say. At this point, one can commence testing column effects, ignoring the prior results for the row effects because these are orthogonal. One can continue testing other hypotheses among the row effects so long as ~r;j is orthogonal to ~r;k .
A similar procedure can be used in stepwise regression. Consider the familiar forward stepwise search, seeking predictors of the response $Y$ among $X_1, X_2, \ldots, X_m$ in a linear model
$$Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \cdots + \beta_m X_{m,i} + Z_i, \quad Z_i \overset{\text{iid}}{\sim} N(0, \sigma^2).$$
Assume that all of the variables have mean zero and $\beta_0 = 0$. Under the normal linear model with known error variance, (16) implies that tests of $H_j : \beta_j = 0$ based on the familiar z-scores for the predictors $Z_j = (X_j' Y)/(X_j' X_j)$ satisfy (10) until some $H_k$ is rejected. For further tests, one can assure that (10) holds by sweeping $X_k$ and all predictors among $X_1, X_2, \ldots, X_{k-1}$ that are correlated with $X_k$ from the remaining predictors. In practice, most predictors are correlated with each other to some extent and this condition requires sweeping $X_1, X_2, \ldots, X_k$ from subsequent predictors. If we collect these $k$ predictors into an $n \times k$ matrix $X$, then the subsequent predictors would be $\tilde{X}_j = (I - X(X'X)^{-1}X')X_j$, $j = k+1, \ldots$. The resulting loss of variation in predictors suggests it would be prudent to at least partially "orthogonalize" the predictors prior to using this type of search.
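The sweeping step amounts to replacing each remaining predictor by its residual after projection onto the predictors already tested. A dependency-free Python sketch follows; it uses modified Gram-Schmidt in place of the explicit projection matrix, which gives the same residual when the swept columns are linearly independent. The function names `dot` and `sweep` and the small data vectors are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def sweep(selected, candidate):
    """Residual of `candidate` after projecting out the columns in `selected`.

    Builds an orthonormal basis for the selected columns by modified
    Gram-Schmidt, then removes the component of `candidate` along each
    basis vector.
    """
    basis = []
    for col in selected:
        r = list(col)
        for q in basis:
            c = dot(q, r)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = dot(r, r) ** 0.5
        if norm > 1e-12:          # skip columns already spanned by the basis
            basis.append([ri / norm for ri in r])
    resid = list(candidate)
    for q in basis:
        c = dot(q, resid)
        resid = [ri - c * qi for ri, qi in zip(resid, q)]
    return resid


# Hypothetical mean-zero predictors with n = 4 observations.
x1 = [1.0, -1.0, 1.0, -1.0]
x2 = [1.0, 2.0, -1.0, -2.0]
x3 = [2.0, 0.0, 1.0, -3.0]
x3_swept = sweep([x1, x2], x3)
print(x3_swept, dot(x3_swept, x1), dot(x3_swept, x2))
```

The swept predictor is orthogonal to both earlier predictors up to rounding, and its shorter length shows the loss of variation mentioned above.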
7 Discussion
The combination of EDC with alpha-investing rules invites the use of adaptive strategies for testing multiple hypotheses. Rather than posit a fixed set of hypotheses in advance of analysis, one can offer a strategy for determining which hypotheses to test next after getting some preliminary results. We would expect good strategies to leverage domain knowledge and be specific to the particular method of analysis.
Part of our motivation for developing EDC and alpha-investing rules arose from our work using stepwise regression for data mining (Foster and Stine, 2004). In this application, we compared forward stepwise regression to tree-based classifiers for predicting the onset of personal bankruptcy. To make regression competitive, we expanded the stepwise search to include all possible interactions among more than 350 "base" predictors. This produced more than 67,000 possible predictors. Because so many of these predictors were interactions (more than 98%), it is not surprising that most of the predictors identified by the search were interactions. Furthermore, because of the wide scope of this search, the procedure lacked power to find subtle effects that, while small, improve the predictive power of a model. It became apparent to us that a hybrid search that only considered the interaction $X_j X_k$, say, after including both $X_j$ and $X_k$ as main effects might be very effective. At the time, however, we lacked a method for controlling the selection procedure when the scope of the search dynamically expands as in this situation. We expect to use alpha-investing heavily in this work in the future.
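A minimal sketch of such a dynamically expanding search follows. It assumes a queue of candidate terms and a caller-supplied function `rejects_null` that stands in for whatever test the alpha-investing rule performs; both names, and the choice to queue new interactions behind the remaining main effects, are illustrative assumptions rather than the procedure used in the bankruptcy application.

```python
from collections import deque


def hybrid_search(n_base, rejects_null):
    """Return the selected terms in the order they were added.

    Main effects are tested first. The interaction (j, k) becomes a candidate
    only after both main effects j and k have been included.
    `rejects_null(term)` is a hypothetical stand-in for the actual test.
    """
    queue = deque((j,) for j in range(n_base))  # main effects as 1-tuples
    main_in_model = []
    selected = []
    while queue:
        term = queue.popleft()
        if not rejects_null(term):
            continue
        selected.append(term)
        if len(term) == 1:
            # A new main effect makes its interactions with earlier ones eligible.
            queue.extend((k, term[0]) for k in main_in_model)
            main_in_model.append(term[0])
    return selected


# Toy run: pretend the test rejects for these four terms only.
tested = []


def fake_test(term):
    tested.append(term)
    return term in {(0,), (2,), (3,), (0, 2)}


selected = hybrid_search(4, fake_test)
```

In this toy run only 7 hypotheses are tested, instead of the 10 that a search over all main effects and pairwise interactions of four base predictors would require.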
We speculate that the greatest reward from developing a specialized testing strategy will come from developing methods that select the next hypothesis rather than specific functions to determine how $\alpha$ is spent. The rule (15) invests most of the current wealth in testing hypotheses following a rejection. One can imagine quite a few other choices. Our work and that of others in information theory (Rissanen, 1983; Foster, Stine and Wyner, 2002), however, suggest that one can find universal alpha-investing rules. Given a procedure for ordering the hypotheses, a universal alpha-investing rule would lead to rejecting as many hypotheses as the best rule within some class. We would expect such a rule to spend its alpha-wealth a bit more slowly than the simple rule (15), but retain this general form.
Another area of application for alpha-investing is in group-sequential clinical trials. In other work (Foster and Stine, 2005) we address the concept of adaptive design with a modification for alpha-investing. We show that the complaints raised in Tsiatis and Mehta (2003) about the efficiency of such tests can be mitigated by proper alpha-investing. At the same time, we allow the researcher freedom to design rules that guide how to spend or invest their alpha-wealth.
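so that
\[ \mathrm{EDC}_{\alpha,\gamma} = \inf_{\theta \in \Theta} \, \inf_{M \in \mathcal{M}} E\left(\mathrm{edc}_{\alpha,\gamma}(\theta, M)\right). \]
Now define
\[ A(j) \equiv \mathrm{edc}_{\alpha,\gamma}(\theta, j) - W(j). \]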
Our main lemma shows that $A(j)$ is a sub-martingale for alpha-investing rules with initial alpha-wealth $W(0) \le \alpha$ and pay-out $\omega \le 1 - \gamma$. A sub-martingale is "increasing" in the sense that
\[ E\left(A(j) \mid A(j-1), A(j-2), \ldots, A(1)\right) \ge A(j-1). \]
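By definition $S(0) = R(0) = 0$ so that $\mathrm{edc}_{\alpha,\gamma}(\theta, 0) = \alpha$. So if $W(0) \le \alpha$, $\omega \le 1 - \gamma$ and $A(j)$ is a sub-martingale, then the optional stopping theorem implies that for all finite stopping times $M$
\[ E\left(\mathrm{edc}_{\alpha,\gamma}(\theta, M)\right) \ge E\left(\mathrm{edc}_{\alpha,\gamma}(\theta, M) - W(M)\right) \ge \alpha - W(0) \ge 0. \]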
The first inequality follows because the alpha-wealth $W(j) \ge 0$ [a.s.], and the second inequality follows from the sub-martingale property. Since EDC for alpha-investing rules is the infimum over such expectations, all of which are non-negative, EDC itself is non-negative.
Thus to show Theorem 1 all we need is the following lemma:
Lemma 1 Let $V(m)$ and $R(m)$ denote the cumulative number of false rejections and the cumulative number of all rejections, respectively, when testing a sequence of null hypotheses $\{H_1, H_2, \ldots\}$ using an alpha-investing rule $I_{W(0),\omega}$ with initial alpha-wealth $W(0) \le \alpha$, pay-out $\omega \le 1 - \gamma$, and cumulative alpha-wealth $W(m)$. Then the process
\[ A(j) \equiv \mathrm{edc}_{\alpha,\gamma}(\theta, j) - W(j) = (1 - \gamma) R(j) - V(j) + \alpha - W(j) \]
is a sub-martingale,
Similarly write the accumulated alpha-wealth $W(m)$ and $A(m)$ as sums of increments, $W(m) = \sum_{j=0}^{m} W_j$ and $A(m) = \sum_{j=0}^{m} A_j$. Let $\alpha_j$ denote the alpha level of the test of $H_j$ that satisfies the condition (10). The change in the alpha-wealth from testing $H_j$ can be written as:
\[ W_j = R_j \, \omega + \log\left(1 - (p_j \wedge \alpha_j)\right), \]
where $\wedge$ is the minimum operator. Substituting this into $A_j$ we get
\[ A_j = (1 - \gamma - \omega) R_j - V_j - \log\left(1 - (p_j \wedge \alpha_j)\right). \]
Since $R_j \ge 0$ and $1 - \gamma - \omega \ge 0$ by the conditions of the lemma, it follows that
\[ A_j \ge -V_j - \log\left(1 - (p_j \wedge \alpha_j)\right). \tag{18} \]
If $\theta_j \notin H_j$, then $V_j = 0$ and $A_j \ge 0$ almost surely. So we only need to consider the case in which the null hypothesis $H_j$ is true.
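The wealth increment used in this proof, a pay-out $\omega$ on rejection plus $\log(1 - \min(p_j, \alpha_j))$, can be traced numerically with a short sketch. The rule for choosing $\alpha_j$ below (risk a fixed fraction of the current wealth) is a hypothetical choice made only so the example runs; it is not rule (15), and the function and parameter names are likewise assumptions.

```python
import math


def alpha_investing(p_values, w0=0.05, omega=0.05, spend_fraction=0.5):
    """Test p-values in order; return (rejections, wealth path).

    Wealth changes by R_j * omega + log(1 - min(p_j, alpha_j)), where R_j is 1
    if H_j is rejected and 0 otherwise.
    """
    wealth = w0
    rejected = []
    path = [wealth]
    for p in p_values:
        # Hypothetical spending rule: a failed test costs -log(1 - alpha_j),
        # which here equals spend_fraction * wealth, so wealth stays >= 0.
        alpha_j = 1.0 - math.exp(-spend_fraction * wealth)
        reject = p <= alpha_j
        wealth += (omega if reject else 0.0) + math.log(1.0 - min(p, alpha_j))
        rejected.append(reject)
        path.append(wealth)
    return rejected, path


rejected, path = alpha_investing([0.001, 0.5, 0.9, 0.0001])
```

In this run the first rejection roughly doubles the wealth, the two failed tests each halve it, and enough remains to reject the fourth hypothesis, so the wealth never becomes negative.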
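Abbreviate the conditional expectation
\[ E_{j-1}(X) = E\left(X \mid A(1), A(2), \ldots, A(j-1)\right). \]
Then, when $H_j$ is true, $p_j \sim U[0,1]$ so that
\begin{align*}
E_{j-1}\left(-\log(1 - (p_j \wedge \alpha_j))\right) &= -\int_0^1 \log\left(1 - (p \wedge \alpha_j)\right) dp \\
&= -\int_0^{\alpha_j} \log(1 - p) \, dp - \int_{\alpha_j}^1 \log(1 - \alpha_j) \, dp \\
&= \alpha_j.
\end{align*}
Since $E_{j-1}(V_j) \le \alpha_j$ by the definition of this being an $\alpha_j$ level test, equation (18) implies $E_{j-1} A_j \ge 0$.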
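The value of this integral can be checked numerically. The sketch below approximates $E(-\log(1 - \min(p, \alpha)))$ for $p$ uniform on $[0,1]$ with a midpoint rule; the function name `expected_cost` and the grid size are assumptions chosen for illustration.

```python
import math


def expected_cost(alpha, n=100_000):
    """Midpoint-rule estimate of E[-log(1 - min(p, alpha))] for p ~ U[0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        p = (i + 0.5) * h  # midpoint of the i-th cell
        total += -math.log(1.0 - min(p, alpha))
    return total * h


# Each estimate should agree with alpha itself up to discretization error.
estimates = {a: expected_cost(a) for a in (0.01, 0.05, 0.25)}
```

The agreement for several levels reflects the identity derived above: under a true null, the expected alpha-wealth spent on a level $\alpha_j$ test is exactly $\alpha_j$.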
References
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statist. Soc., Ser. B, 57, 289-300.
Benjamini, Y. and Yekutieli, D. (2001) The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165-1188.
Braun, H. I. (ed.) (1994) The Collected Works of John W. Tukey: Multiple Comparisons, vol. VIII. New York: Chapman & Hall.
Dudoit, S., Shaffer, J. P. and Boldrick, J. C. (2003) Multiple hypothesis testing in microarray experiments. Statistical Science, 18, 71-103.
Dykstra, R. L. (1980) Product inequalities involving the multivariate normal distribution. Journal of the Amer. Statist. Assoc., 75, 646-650.
Efron, B. (2005a) Large scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the Amer. Statist. Assoc., 100, 96-104.
Efron, B. (2005b) Selection and estimation for large-scale simultaneous inference. Tech. rep., Department of Statistics, Stanford University, http://www-stat.stanford.edu/brad/papers/hivdata.
Foster, D. P. and Stine, R. A. (2004) Variable selection in data mining: Building a predictive model for bankruptcy. Journal of the Amer. Statist. Assoc., 99, 303-313.
Foster, D. P. and Stine, R. A. (2005) Theoretical foundations for adaptive testing using alpha-investing rules. Tech. rep., Statistics Department, University of Pennsylvania.
Foster, D. P., Stine, R. A. and Wyner, A. J. (2002) Universal codes for finite sequences of integers drawn from a monotone distribution. IEEE Trans. on Info. Theory, 48, 1713-1720.
Genovese, C., Roeder, K. and Wasserman, L. (2004) False discovery control with p-value weighting. In progress.
Gupta, M. and Ibrahim, J. G. (2005) Towards a complete picture of gene regulation: using Bayesian approaches to integrate genomic sequence and expression data. Tech. rep., University of North Carolina, Chapel Hill, NC.
Holm, S. (1979) A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65-70.
Lehmacher, W. and Wassmer, G. (1999) Adaptive sample size calculations in group sequential trials. Biometrics, 55, 1286-90.
Marcus, R., Peritz, E. and Gabriel, K. R. (1976) On closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63, 655-660.
Meinshausen, N. and Buehlmann, P. (2004) Lower bounds for the number of false null hypotheses for multiple testing of associations under general dependence. Tech. Rep. 121, ETH Zurich, http://stat.ethz.ch/~nicolai/.
Rissanen, J. (1983) A universal prior for integers and estimation by minimum description length. Annals of Statistics, 11, 416-431.
Sarkar, S. K. (1998) Some probability inequalities for ordered MTP2 random variables: A proof of the Simes conjecture. Annals of Statistics, 26, 494-504.
Simes, R. J. (1986) An improved Bonferroni procedure for multiple tests of significance. Biometrika, 73, 751-754.
Storey, J. D. (2002) A direct approach to false discovery rates. Journal of the Royal Statist. Soc., Ser. B, 64, 479-498.
Storey, J. D. (2003) The positive false discovery rate: a Bayesian interpretation and the q-value. Annals of Statistics, 31, 2013-2035.
Tsiatis, A. A. and Mehta, C. (2003) On the inefficiency of the adaptive design for monitoring clinical trials. Biometrika, 90, 367-378.
Tukey, J. W. (1953) The problem of multiple comparisons. Unpublished lecture notes.
Tukey, J. W. (1991) The philosophy of multiple comparisons. Statistical Science, 6, 100-116.