ABSTRACT: Methods of sample size and power calculations are reviewed for the most
common study designs. The sample size and power equations for these designs are shown
to be special cases of two generic formulae for sample size and power calculations. A
computer program is available that can be used for studies with dichotomous,
continuous, or survival response measures. The alternative hypotheses of interest may
be specified either in terms of differing response rates, means, or survival times, or
in terms of relative risks or odds ratios. Studies with dichotomous or continuous
outcomes may involve either a matched or independent study design. The program
can determine the sample size needed to detect a specified alternative hypothesis with
the required power, the power with which a specific alternative hypothesis can be
detected with a given sample size, or the specific alternative hypotheses that can be
detected with a given power and sample size. The program can generate help messages
on request that facilitate the use of this software. It writes a log file of all calculated
estimates and can produce an output file for plotting power curves. It is written in
FORTRAN-77 and is in the public domain.
KEY WORDS: Power and sample size calculations, cohort studies, case-control studies, dichotomous
or continuous outcomes
INTRODUCTION
Sample size and power calculations for clinical trials and observational
studies are typically performed either by hand [1-5], through the use of
published graphs or tables [6-10], or through the use of specialized computer
programs [11-17]. Selecting the sample size for a study inevitably requires a
compromise balancing the needs for power, economy, and timeliness.
Investigators must determine their study's sample size, power, and detectable
alternative hypotheses. To do this, it is useful to have a program that, given
any two of the preceding parameters, can calculate the third.
The purpose of this article is to introduce such a program (POWER) and
to review the power and sample size calculations that are required for the
most common study designs. For each design considered in this article, POWER
calculates the sample size needed to detect a particular difference in treatment
efficacy with a specified power, the power with which a particular difference
Address reprint requests to: William D. Dupont, S-3301 Medical Center North, Department of Pre-
ventive Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232-2637.
Received May 10, 1989; revised October 11, 1989.
116 Controlled Clinical Trials 11:116-128 (1990)
0197-2456/1990/$3.50 Elsevier Science Publishing Co., Inc. 1990
655 Avenue of Americas, New York, New York 10010
Review of Power and Sample Size Calculations 117
can be detected with a given sample size, and the difference that can be
detected with a specified power and sample size.
The study designs that can be evaluated by this program are summarized
in Table 1. In this table, independent study designs refer to those in which
subjects are independently selected at random from some target population.
Matched designs are ones in which one or more control subjects are matched
to each case patient with respect to certain attributes. Paired designs are
matched designs with one control per case. In cohort studies, subjects are
followed forward in time until some event occurs [18]. All clinical trials are
cohort studies. Case-control studies look for risk factors in samples of case
patients with a specific disease and control patients who do not have this
disease [2]. A survival outcome variable consists of the time until death or
some morbid event occurs, or the total follow-up time for a patient who does
not suffer this event. Continuous outcome variables like weight or serum
creatinine may take a wide range of values. Dichotomous outcomes take only
two values such as success or failure, or the presence or absence of some risk
factor.
In justifying our study design, the actual power that will be achieved with
the selected sample size is more relevant than the power that would have
been achieved with other sample sizes that were considered but ultimately
rejected. The power associated with the selected sample size can be most
effectively demonstrated by plotting the power curve as a function of the true
value of the parameter of interest under different alternative hypotheses. The
coordinates for such curves can be generated by POWER for input into graph-
ics software packages.
POWER is in the public domain and is available from the authors on request
for the cost of distribution. It is written in ANSI standard FORTRAN-77 and
has been run successfully on both PC computers running under MS-DOS and
VAX computers running VMS.
METHODS
Generic Power and Sample Size Formulas
All the methods discussed in this paper are variations on a familiar theme
[3, sect. 5.2]. Suppose that we observe responses on n patients (or groups of
patients) that are dependent on some parameter $\theta$. Let $f(\theta)$ be a known
monotonic function of $\theta$ and let S be a statistic derived from the n responses that
has a normal distribution with mean $\sqrt{n}\,f(\theta)$ and standard deviation $\sigma(\theta)$. Let
$\Phi[z]$ be the cumulative probability distribution for a standard normal random
variable and let $z_\alpha = \Phi^{-1}[1 - \alpha]$ denote the critical value that is exceeded by
a standard normal random variable with probability $\alpha$. Let $\theta_0$ and $\theta_a$ denote
the values of $\theta$ under the null and a specific alternative hypothesis,
respectively. Let $\sigma_0 = \sigma(\theta_0)$, $\sigma_a = \sigma(\theta_a)$ and let $\delta = \{f(\theta_a) - f(\theta_0)\}/\sigma_a$ denote the
difference between $f(\theta_a)$ and $f(\theta_0)$ expressed in standard deviations of S under
the alternative hypothesis. Testing the null hypothesis against a two-sided
alternative hypothesis with type I error probability $\alpha$ leads to rejection of the
null hypothesis when
$$|S - \sqrt{n}\,f(\theta_0)| / \sigma_0 > z_{\alpha/2}.$$
and substituting $\delta$, $\theta$, and $\sigma_a$ into Eqs. 2 and 3 gives the power and sample
size formulas associated with the specific alternative hypothesis that the ratio
of median survival times equals R. This version of Eq. 3 is identical to the
sample size formula derived by Schoenfeld and Richter [10, p. 169]. These
formulas are appropriate for studies that will be analyzed using the log-rank
test [20] in addition to the parametric test of Schoenfeld and Richter [10].
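Eqs. 2 and 3 are not reproduced in this excerpt. The sketch below is therefore an illustration derived from the definitions in the Generic Power and Sample Size Formulas section, not a transcription of the paper's equations or of the POWER source code. It assumes that the null hypothesis is rejected when |S - sqrt(n) f(theta_0)| / sigma_0 exceeds z_{alpha/2}, so that the power at the alternative is Phi[sqrt(n) delta - z_{alpha/2} sigma_0/sigma_a] + Phi[-sqrt(n) delta - z_{alpha/2} sigma_0/sigma_a]. The sample size function drops the second (usually negligible) term. The function names and the `sigma_ratio` argument (sigma_0/sigma_a) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def z_upper(p):
    """Critical value exceeded by a standard normal variable with probability p."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def generic_power(n, delta, alpha, sigma_ratio=1.0):
    """Two-sided power for n units (assumed form, see lead-in).

    delta       -- {f(theta_a) - f(theta_0)} / sigma_a
    sigma_ratio -- sigma_0 / sigma_a
    """
    z = z_upper(alpha / 2.0) * sigma_ratio
    shift = sqrt(n) * delta
    return STD_NORMAL.cdf(shift - z) + STD_NORMAL.cdf(-shift - z)


def generic_sample_size(delta, alpha, power, sigma_ratio=1.0):
    """Number of units needed for the requested power, ignoring the small
    contribution of the opposite tail (an assumption of this sketch)."""
    z_beta = z_upper(1.0 - power)
    z_alpha = z_upper(alpha / 2.0) * sigma_ratio
    return ((z_beta + z_alpha) / delta) ** 2


# Usage: half a standard deviation, 5% two-sided type I error, 80% power.
n_needed = generic_sample_size(delta=0.5, alpha=0.05, power=0.80)
achieved = generic_power(n_needed, delta=0.5, alpha=0.05)
```

Solving the same power expression for delta, given n and the power, gives the third of the three calculations described in the Introduction: the detectable alternative hypothesis.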
[Figure 1 plot: power (vertical axis) versus median survival time for experimental patients in months (horizontal axis, 0 to 25). Plot annotations: patients per treatment group = 110; two-sided type I error probability = 0.1; patient accrual time = 24 months; additional follow-up time = 12 months; median survival time for control patients = 11 months.]
Figure 1 Power curve for a clinical trial in which 110 patients are randomized into
each of two treatments. The coordinates of this curve were generated by
the POWER computer program using the method of Schoenfeld and Richter
[10].
ADDITIONAL EXAMPLES AND COMMENTS
Survival Data
Consider a clinical trial in which patients are randomized to one of two
treatments and then followed for some specified length of time, or until death.
A statistic that is commonly used to assess treatment efficacy for such trials
is the log-rank test [20]. This test makes no assumptions about how mortal
risk varies with time since randomization in either group, and is the optimal
test with respect to alternative hypotheses in which the hazard ratio
(instantaneous relative risk) between the treatment groups remains constant over
time [23]. The POWER program uses the method of Schoenfeld and Richter
[10] to assess the power of trials that will be analyzed with the log-rank test.
In this method, the alternative hypothesis of interest is specified by the median
survival times on the two treatments. If preliminary data are available, these
times may be estimated from Kaplan-Meier survival curves [20,24] to be the
times at which 50% of patients in each group will have died. These curves
must be extrapolated beyond the available follow-up period if less than 50%
of patients have died during this time. If the survival curves follow an
exponential distribution then the ratio of the median survival times of the
experimental patients relative to the controls will equal the hazard ratio of
controls relative to experimental patients. Thus, if we only have preliminary
data on the control group, we can base our power calculations on the expected
median survival time among control patients and the relative risk or hazard
ratio between experimental and control subjects that we wish to detect. POWER
permits power calculations for survival studies to be formulated in this way.
In the preceding example, this could be done by answering "2" to the third
question to specify that the alternative hypothesis is to be expressed as a
hazard ratio. POWER will then ask for this ratio, which, in this example, is
Table 2 Questions Asked by the POWER Program, the Acceptable Answers, and the Resulting Sample Size
and Power Calculation Methods That Are Used (a)

Questions                        Answers
Type of outcome variable?        Survival     Continuous               Dichotomous
What is the study design?        Q.N.A. (b)   Paired     Independent   Matched or Paired    Independent
Is this a case-control study?    Q.N.A.       Q.N.A.     Q.N.A.        Yes        No        Yes      No
Method number                    1            2          3             4          5         6,7      7,8

(a) The method number given above is defined in Table 1.
(b) Question not asked.
126 W.D. Dupont and W.D. Plummer, Jr.
16.5/11 = 1.5. In other words, a trial with 110 patients per group will be able
to detect the alternative hypothesis of 50% greater morbidity on the control
treatment with 80% power and a 10% type I error probability.
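The equivalence between the ratio of medians and the hazard ratio can be checked numerically. The sketch below assumes exponential survival, for which the median equals ln(2) divided by the constant hazard rate; the function name is hypothetical and the two medians are those of the example.

```python
from math import log


def exponential_hazard(median_survival):
    """Constant hazard rate of an exponential survival curve with this median."""
    return log(2.0) / median_survival


control_median = 11.0        # months, control patients in the example
experimental_median = 16.5   # months, experimental patients in the example

# Hazard ratio of controls relative to experimental patients.
hazard_ratio = exponential_hazard(control_median) / exponential_hazard(experimental_median)

# Ratio of median survival times, experimental relative to control.
median_ratio = experimental_median / control_median
```

Both quantities equal 16.5/11 = 1.5, which is why the alternative hypothesis can be entered either as a pair of medians or as a control median plus a hazard ratio.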
The Schoenfeld and Richter [10] method permits the follow-up interval to
be specified as an accrual interval A when patients are recruited plus an
additional follow-up interval F. Note that if all patients are followed for the
same length of time then A equals zero and F equals the uniform follow-up
interval.
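As a rough illustration of how the accrual interval A and the additional follow-up interval F enter a sample size calculation, the sketch below makes several assumptions that are not taken from the text: survival is exponential in both groups, patients enter uniformly over the accrual interval, and the number of patients per group is 2(z_{alpha/2} + z_beta)^2 / (p [ln R]^2), where R is the ratio of median survival times and p is the probability of death during the study averaged over the two groups. It is not the POWER program's code, and all function names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist


def death_probability(median, accrual, follow_up):
    """Probability that a patient dies during the study, assuming exponential
    survival and uniform entry over the accrual interval."""
    lam = log(2.0) / median  # constant hazard rate
    if accrual == 0:
        # Every patient is followed for the same length of time F.
        return 1.0 - exp(-lam * follow_up)
    # Average the survival probability over follow-up times from F to A + F.
    surviving = (exp(-lam * follow_up) - exp(-lam * (accrual + follow_up))) / (lam * accrual)
    return 1.0 - surviving


def patients_per_group(control_median, experimental_median,
                       accrual, follow_up, alpha, power):
    """Approximate patients per treatment group under the stated assumptions."""
    z = NormalDist().inv_cdf
    z_alpha = z(1.0 - alpha / 2.0)   # two-sided type I error
    z_beta = z(power)
    p = 0.5 * (death_probability(control_median, accrual, follow_up)
               + death_probability(experimental_median, accrual, follow_up))
    log_ratio = log(experimental_median / control_median)
    return 2.0 * (z_alpha + z_beta) ** 2 / (p * log_ratio ** 2)


# Parameters of the example: medians of 11 and 16.5 months, 24 months of
# accrual, 12 months of additional follow-up, 10% two-sided type I error,
# 80% power.
n = patients_per_group(11.0, 16.5, accrual=24.0, follow_up=12.0,
                       alpha=0.10, power=0.80)
```

Under these assumptions the result is close to the 110 patients per group used in the example. Setting `accrual=0` corresponds to the case in which all patients are followed for the same length of time.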
We thank Robert A. Parker, George W. Reed, Gordon R. Bernard, Curtis L. Meinert, and the
referees for helpful advice, and Janelle Steele and Virginia McKinney for assistance in preparing
this manuscript. This research was supported in part by NIH grants and contracts HL-14192,
N01-AI-52593, R01-CA40517, and R01-CA46492.
REFERENCES
1. Meinert CL: Clinical Trials: Design, Conduct, and Analysis. New York: Oxford
University Press, 1986
2. Schlesselman JJ: Case-Control Studies: Design, Conduct, Analysis. New York:
Oxford University Press, 1982
3. Steel RGD, Torrie JH: Principles and Procedures of Statistics: A Biometrical Ap-
proach, 2nd ed. New York: McGraw-Hill, 1980
4. Fleiss JL: Statistical Methods for Rates and Proportions, 2nd ed. New York: Wiley,
1981
5. Casagrande JT, Pike MC, Smith PG: An improved approximate formula for cal-
culating sample sizes for comparing two binomial distributions. Biometrics 34:483-
486, 1978
6. Dupont WD: Power calculations for matched case-control studies. Biometrics 44:
1157-1168, 1988
7. Feigl P: A graphical aid for determining sample size when comparing two inde-
pendent proportions. Biometrics 34:111-122, 1978
8. Friedman LM, Furberg CD, DeMets DL: Fundamentals of Clinical Trials. Boston:
John Wright PSG, 1982
9. Pearson ES, Hartley HO: Biometrika Tables for Statisticians, 3rd ed. Cambridge:
Cambridge University Press, 1970, vol I
10. Schoenfeld DA, Richter JR: Nomograms for calculating the number of patients
needed for a clinical trial with survival as an endpoint. Biometrics 38:163-170,
1982
11. Gross AJ, Hunt HH, Cantor AB, Clark BC: Sample size determination in clinical
trials with an emphasis on exponentially distributed responses. Biometrics 43:875-
883, 1987
12. Halpern J, Brown BW Jr: Designing clinical trials with arbitrary specification of
survival functions and for the log rank or generalized Wilcoxon test. Controlled
Clin Trials 8:177-189, 1987
13. Lachin JM, Foulkes MA: Evaluation of sample size and power for analysis of
survival with allowance for nonuniform patient entry, losses to follow-up, non-
compliance and stratification. Biometrics 42:507-519, 1986
14. Lakatos E: Sample sizes based on the log rank statistic in complex clinical trials.
Biometrics 44:229-241, 1988
15. Parker RA, Bregman DJ: Sample size for individually matched case-control studies.
Biometrics 42:919-926, 1986
16. Self SG, Mauritsen RH: Power/sample size for generalized linear models. Bio-
metrics 44:79-86, 1988
17. Taulbee JD, Symons MJ: Sample size and duration for cohort studies of survival
time with covariables. Biometrics 39:351-360, 1983
18. Kelsey JL, Thompson WD, Evans AS: Methods in Observational Epidemiology.
New York: Oxford University Press, 1986
19. Ralston A: A First Course in Numerical Analysis. New York: McGraw-Hill, 1965
20. Peto R, Pike MC, Armitage P, et al: Design and analysis of randomized clinical
trials requiring prolonged observation of each patient: II. Analysis and examples.
Br J Cancer 35:1-39, 1977
21. Johnson NL, Kotz S: Distributions in Statistics: Continuous Univariate
Distributions--2. New York: Wiley, 1970
22. Breslow NE, Day NE: Statistical Methods in Cancer Research: Vol I--The Analysis
of Case-Control Studies. Lyon: International Agency for Research on Cancer, 1980
23. Peto R: Rank tests of maximal power against Lehmann-type alternatives. Bio-
metrika 59:472-475, 1972
24. Lee ET: Statistical Methods for Survival Data Analysis. Belmont, CA: Lifetime
Learning, 1980, pp 76-87
25. Dupont WD: Sensitivity of Fisher's exact test to minor perturbations in 2 x 2
contingency tables. Stat Med 5:629-635, 1986
APPENDIX
Suppose that $2(\sigma_0/\sigma_a) z_{\alpha/2} + z_\beta \ge 3.1$ and we select n according to Eq. 3.
Then
$|\delta|\sqrt{n} =$
and hence