
MAS6012/MAS461/MAS361 Medical Statistics: Clinical Trials

Contents
  Preliminaries
  0: Introduction
  1: Background & Basic Concepts
  2: Basic Trial analysis
  3: Randomization
  4: Protocol Deviations
  5: Size of the Trial
  6: Multiplicity & Interim Analysis
  7: Crossover Trials
  8: Combining Trials
  9: Binary Response Data
  10: Comparing Methods of Measurement

Multiplicity &c.
  Multiplicity & Interim Analysis
  Books
    Andersen, B. (1990) Methodological Errors in Medical Research. Blackwell.
  Papers
    ICH E9 Expert Working Group (1999) Statistics in Medicine, 18, 1905-42.
    Philips, Alan & Haudiquet, Vincent (2003) Statistics in Medicine, 22, 1-11.
NRJF, University of Sheffield, 2011/12 Semester

Multiplicity arises in
  Multiple end-points
  Subgroup analyses
  Interim testing
  Repeated Measures
  &c.
Problem of repeated significance tests
  May inflate risk of false positive
  i.e. overall significance level

Example: Effect of new dietary control regime.
  Data: 250 subjects:
    Weight loss at end of week.
    Data in kg.
  Paired t-test gives p-value of 0.067
  Not quite significant at the 5% level!


Can anything be done to squeeze a significant result out of this expensive study?
  We've been told we cannot change our mind and use a one-sided test instead!
  Try subgroup data by Sign of the Zodiac:

  p-value for Aries is 0.019
    with mean weight loss of 0.5 kg
  p-value for Taurus is 0.099
    with mean weight gain of 0.3 kg
  Conclusions:
    Diet successful for those under Aries
    Taurus subjects are perverse


Clearly a False Positive Result
  Fallacy arises because of selecting most significant result.
  (data are artificial, but not very)
  Useful device to try if pressed to perform post-hoc subgroup analysis (c.f. Richard Peto)

Statistical tests make mistakes
  Declare a real difference exists, but in fact the observed difference is due to natural chance variation
  Risk controlled for each individual single test
    significance level of the test or the p-value
  If many separate significance tests, then difficult to control overall risk of declaring at least one false positive somewhere

One 5% test: 95% chance of no mistake
Two 5% tests: 95% × 95% (= 90.25%) chance of no mistake on either
  So ~10% risk of one or other (or both) giving a false positive
  i.e. overall significance level is ~10%

c.f. Normal Ranges in clinicochemical tests
  "A normal person is one who has not been sufficiently investigated."
  A normal range comprises 95% of values
  100 normal persons evaluated: only 95 of them will be normal
  If then subjected to another independent test, only 90 will remain as normal
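
The arithmetic on this slide can be checked with a short sketch. It assumes the tests are independent and that the null hypothesis holds in every one of them; the function name `overall_level` is chosen here for illustration and is not from the notes.

```python
def overall_level(alpha, k):
    """Chance of at least one false positive in k independent tests,
    each run at nominal level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


# One, two and ten tests at the nominal 5% level.
for k in (1, 2, 10):
    print(k, round(overall_level(0.05, k), 4))
```

With two tests the overall level is about 10%, and with ten tests it is about 40%. Correlated tests would give a smaller inflation than this independent-test calculation suggests.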


Multiplicity:
  10 independent tests at [nominal] 5%
  H0 true in all (i.e. no difference)
  Chance of rejecting at least 1 is 40%
  SO reduce nominal level in each to control overall significance level

Bonferroni correction
  k tests, want overall level to be α
  Take nominal level on each test as α/k
  Example:
    5 separate tests
    Overall 5% level of significance wanted
    Declare a result if any test nominally significant at the 5%/5 = 1% significance level
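
Why dividing by k works can be sketched with Boole's inequality. The events A_i below ("test i rejects although H0 is true") are notation introduced here for illustration; the slides do not use them.

```latex
% A_i = event that test i rejects at nominal level \alpha/k when H_0 is true
\[
  P(\text{at least one false positive})
  = P\Bigl(\bigcup_{i=1}^{k} A_i\Bigr)
  \le \sum_{i=1}^{k} P(A_i)
  = k \cdot \frac{\alpha}{k}
  = \alpha .
\]
% If the tests are also independent, the overall level is exactly
\[
  1 - \Bigl(1 - \frac{\alpha}{k}\Bigr)^{k} \le \alpha ,
  \qquad \text{e.g. } 1 - 0.99^{5} \approx 0.049 \text{ for } k = 5,\ \alpha = 0.05 .
\]
```

The bound needs no assumption about dependence between the tests, which is why the method is safe but can be very conservative.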

Example:
  25 tests are to be performed
  overall level of 1% intended
  so each should be run at a nominal level of 1%/25 = 0.04%
  i.e. a result should not be claimed unless p < 0.0004 in any one of them

Example:
  12 tests have been performed
  smallest p-value is 0.019
  What is the overall level of significance?
  Bonferroni method says overall level is 12 × 0.019 = 0.228
  This is the Signs of the Zodiac example
  i.e. no worthwhile evidence of any birth sign being particularly suited to dieting
  (see again later)
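
Both directions of the Bonferroni calculation can be written as a minimal sketch. The function names are hypothetical, and capping the overall p-value at 1 is a common convention rather than something stated on the slides.

```python
def bonferroni_nominal_level(overall_alpha, k):
    """Nominal level to use on each of k tests for a given overall level."""
    return overall_alpha / k


def bonferroni_overall_p(smallest_p, k):
    """Overall p-value implied by the smallest of k nominal p-values (capped at 1)."""
    return min(1.0, k * smallest_p)


# 25 tests with an intended overall level of 1%.
print(bonferroni_nominal_level(0.01, 25))

# Signs of the Zodiac: 12 tests, smallest nominal p-value 0.019.
print(bonferroni_overall_p(0.019, 12))
```

The first call gives the 0.0004 threshold of the left-hand example; the second gives the overall level of roughly 0.228 for the Zodiac data.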


Bonferroni method typically very conservative
  i.e. less likely to be able to declare a real difference exists even if there is one
  But is safe
  i.e. you preserve your scientific reputation by avoiding making mistakes, but at expense of failing to discover something scientifically interesting

Multiple End-points
  e.g. pulse rate, systolic & diastolic blood pressure
    sitting, standing & supine
    before & after exercise
  Separate tests: high risk of false positives


Remedies:
  Bonferroni correction
  choose primary outcome measure
  multivariate analysis
NB: Bonferroni very conservative
  multiple outcome measures likely to be highly correlated
  standing systolic BP will give similar evidence to sitting BP

Very frustrating if you had considered 20 highly correlated measures
  each gives nominal p-value of 0.01
  Bonferroni says can only claim an overall p-value of 20 × 0.01 = 0.2
  Would have been better not to have measured the other 19


Better is to define primary outcome
  perhaps 2 or 3 secondary measures
  Must be stated in the protocol
    medical expertise
    initial results from a pilot study
  Other measures (e.g. lab results)
    should be scrutinised
    report causes for concern

Multivariate Analysis
  Makes proper allowance in the analysis for correlated observations
  There are multivariate equivalents of standard univariate statistical analyses
    Student's t-test: Hotelling's T² test
    ANOVA: MANOVA (Multivariate Analysis of Variance)
      Wilks' test or Lawley-Hotelling test


Advantage of multivariate analysis
  handle all measures simultaneously
  return a single p-value
Disadvantage
  difficulty of interpreting the nature of the difference detected
Many MV procedures in stats packages
  Advice must be to use them with caution unless experienced help is to hand

Cautionary Examples
  ref: Br J Clin Pharmacol [Suppl.], 1983, 16: 103
  effect of midazolam on sleep
  table of 29 tests of significance on measures of platform balance made at various times
  repeated measures analysis (see later)

Cautionary Examples
  ref: Basic Clin Med 1981, 15: 445
  double-blind controlled clinical trial to treat rheumatoid arthritis
  several end-points repeated at various timepoints and various subdivisions
  850 pairwise comparisons were made
    t-tests and Fisher's exact test
  48 of these gave p-values < 0.05
  But expect 5% of 850 = 850/20 = 42.5
  so finding 48 is not very impressive

Andersen quotes The Lancet (1984, ii: 1457)
  "Moreover, submitting a larger number of factors to statistical examination not only improves your chances of a positive result but also enhances your reputation for diligence"
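
How unimpressive 48 out of 850 is can be gauged with a rough sketch. It treats the 850 comparisons as independent with every null hypothesis true, which is an assumption made here for illustration (the real comparisons share patients and end-points, so they are certainly correlated). The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    lower = sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))
    return 1.0 - lower


n_tests, alpha, observed = 850, 0.05, 48

# Expected number of nominally significant results if nothing is going on.
print("expected false positives:", n_tests * alpha)

# Chance of seeing 48 or more by luck alone, under the independence assumption.
print("P(at least 48):", round(binomial_upper_tail(n_tests, alpha, observed), 3))
```

Under these assumptions, 48 or more significant results would turn up by chance in a sizeable fraction of such studies, which supports the slide's verdict.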

Subgroup analyses
  Similar problems with subgroups
  Need to specify in the protocol which subgroups are of particular interest
  If none in particular then
    Bonferroni adjustment
    Analysis of Variance
    Follow-up tests for multiple comparisons

One-way ANOVA
  generalisation to several samples of a two-sample t-test
  tests differences between subgroups
  tests null hypothesis that all subgroups have the same mean vs one or more is different
  If effect exhibited in only one of several subgroups, then one (or more) of the subgroups is different from the rest
    so test this with ANOVA
  Follow-up tests to identify which is of interest
    Tukey's / Dunnett's / Newman-Keuls / ...
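
The one-way ANOVA test statistic can be computed by hand as a minimal sketch. The three small groups below are made-up numbers for illustration only, not data from the notes, and the function name is hypothetical. Converting F to a p-value needs the F distribution, which is left to a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the subgroup means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations about their own subgroup mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical weight losses (kg) in three subgroups.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

A single F test of "all subgroup means equal" replaces many separate subgroup tests, so only one significance level has to be controlled.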


Example: Signs of Zodiac (see notes)
  p-value for differences in weight loss between Zodiac signs is 0.405
  [Figure: boxplots of weight loss by Zodiac sign; means are indicated by solid circles]
  No evidence of difference, so follow-up tests not really appropriate
  (but see example in notes)

Can also look at 12 separate p-values
  [Figure: dotplot of p-values, axis from 0.0 to 0.9]
  If there were any evidence that some groups were showing an effect, then some of the p-values would be clustered near 0.0 and not evenly spread out

Example (Lee et al, Circulation, 1980)
  1073 subjects randomized in two groups
  No overall significance
  6 [post-hoc] subgroups defined
  One of these produced significance at nominal 2.5% level (p=0.023)
  Medical reason for expecting this subgroup to be different

But:
  In fact, no difference between treatments
  All patients were treated in the SAME way
  Groups were just random allocations
  i.e. a false positive effect


Cautionary Example (see notes)
  ref: N Engl J Med 1978, 298: 647
  Complex study on age at presentation of European, black and Latino men & women in an anaemia study
  Needs a 3-way ANOVA to investigate interactions between gender and race and age.

Interim analyses
  Desirable in long trial
    Check protocol compliance
    Side effects?
    Feedback (maintains interest)
    Detect big effects quickly


However, multiplicity problems:
  Specify in protocol
  Adjust nominal significance levels
  Bonferroni too conservative (accumulating data)

Repeated significance tests on accumulating data

  Number of repeated tests    Overall significance
  at the 5% level             level
    1                         0.05
    5                         0.14
   10                         0.19
  100                         0.37
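
The inflation in this table can be reproduced approximately by simulation. The sketch below is one possible set-up, assumed here for illustration: a normally distributed response with known variance, no true treatment difference, equally spaced looks, and a two-sided z-test at each look. The function name and the simulation sizes are hypothetical choices.

```python
import random
from math import sqrt


def simulated_overall_level(n_looks, n_sims=20000, z_crit=1.96, seed=1):
    """Estimate the chance that at least one of n_looks interim z-tests
    is nominally significant at 5% when there is no real effect."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for look in range(1, n_looks + 1):
            # Each look adds one standardised block of new data.
            running_sum += rng.gauss(0.0, 1.0)
            z = running_sum / sqrt(look)  # test uses ALL data so far
            if abs(z) > z_crit:
                rejections += 1
                break  # trial would have been stopped and reported
    return rejections / n_sims


for looks in (1, 5, 10):
    print(looks, round(simulated_overall_level(looks), 3))
```

Because each test reuses the earlier data, successive tests are highly correlated, so the overall level grows far more slowly than it would for independent tests. That is why a Bonferroni adjustment is too conservative here.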


Nominal significance levels required to achieve overall level α

  N     α = 0.05    α = 0.01
   2    0.029       0.0056
   5    0.016       0.0028
  10    0.0106      0.0018

Comparison of drug combinations CP and CVP in non-Hodgkin's lymphoma.
  Measure: tumour shrinkage
  Trial: over 2 years, about 120 patients.
  Five interim analyses planned, roughly after every 25th result.
  Table gives numbers of successes and nominal p-values using a χ² test at each stage.


             response rates
  Analysis   CP       CVP      statistic & p-value
  1          3/14     5/11     1.63 (p>0.20)
  2          11/27    13/24    0.92 (p>0.30)
  3          18/40    17/36    0.04 (p>0.80)
  4          18/54    24/48    3.25 (0.05<p<0.1)
  5          23/67    31/59    4.25 (0.025<p<0.05)

Conclusion:
  Not significant at end of trial (overall p>0.05) since p>0.016, the required nominal value for 5 repeat tests
  If NO interim analyses had been done, then conclusion would have been different
    CVP declared significantly better at 5% level
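
The final analysis can be checked with a minimal sketch of the χ² test for a 2×2 table. It assumes the uncorrected Pearson statistic (no continuity correction) and obtains the p-value on 1 degree of freedom from the standard normal tail via `math.erfc`; the function name is hypothetical.

```python
from math import erfc, sqrt


def chi2_two_proportions(s1, n1, s2, n2):
    """Pearson chi-squared (1 d.f., no continuity correction) comparing
    s1/n1 successes with s2/n2 successes; returns (statistic, p-value)."""
    a, b = s1, n1 - s1  # group 1: successes, failures
    c, d = s2, n2 - s2  # group 2: successes, failures
    n = n1 + n2
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 d.f., P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2.0))


# Analysis 5: CP 23/67 versus CVP 31/59.
stat, p = chi2_two_proportions(23, 67, 31, 59)
print(round(stat, 2), round(p, 3))
print("significant with no interim looks (5%):", p < 0.05)
print("significant with 5 looks (nominal 0.016):", p < 0.016)
```

The same nominal p-value leads to opposite conclusions depending on whether the five planned looks are taken into account.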


Cautionary Example:
  ref: Br J Surg, (1974), 61: 177
  No significant difference with 49 patients
  The trial was therefore continued
  After 100 patients gave result χ² = 4.675, d.f. = 1, p < 0.05
  (and the trial was published)
  Actual nominal p-value is 0.031 > 0.029 (the nominal level required for 2 tests)
  so cannot claim overall 5% significance

Continuing to collect data until a significant result is obtained is clearly dishonest
  eventually an apparently significant result will be obtained


Repeated Measures
  same feature on a patient measured at several time points
    blood concentration at baseline and at 1, 3, 6, 12 and 24 hours after drug
  Must not do t-tests at each time point
  diagrams with mean values of the two treatment groups plotted against time, with error bars for each mean, invite the eye to do exactly that

Remedies
  Bonferroni adjustments
    Very conservative (high correlation)
  Multivariate analysis
    Special techniques for this
  Construction of summary measures
    Area under curve
    Change from base line
    Mean change
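
A summary measure reduces each patient's series to one number, so only one test per comparison is needed. The sketch below computes area under the curve by the trapezium rule at the time points on the slide; the concentration values are invented for illustration and the function name is hypothetical.

```python
def area_under_curve(times, values):
    """Trapezium-rule area under a patient's response curve."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


hours = [0, 1, 3, 6, 12, 24]                    # baseline, then hours after drug
concentration = [0.0, 4.0, 6.0, 5.0, 3.0, 1.0]  # hypothetical values, one patient

print(area_under_curve(hours, concentration))
```

Each patient then contributes a single AUC, and the two treatment groups can be compared with one ordinary two-sample test.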


Miscellany
  Post-hoc re-grouping
    Dangerous to combine small subgroups together after the data have been collected
  Example:
    Death or survival in 90 days after heart attack
    65-69 age group combined with older or younger groups

               placebo            metoprolol
  deaths       62/697 (8.9%)      40/698 (5.7%)     p<0.02

  age 40-64    26/453 (5.7%)      21/464 (4.5%)     p>0.2
  age 65-74    36/244 (14.8%)     19/234 (8.1%)     p=0.03
  Metoprolol better for elderly?

  age 40-69    51/627 (8.1%)      32/629 (5.1%)     p=0.04
  age 70-74    11/70 (15.7%)      8/69 (11.6%)      p>0.2
  Metoprolol better for younger?


Multiple Regression
  large regression analyses
  many explanatory variables
    ordinary regression
    logistic regression for success/failure data
    Cox regression for survival data
  Need to ensure that effects are not selected just because they are the most significant coefficients

Example:
  "men who did not shave regularly were 70% more likely to suffer a stroke and 30% more likely to suffer heart disease"
  according to study at the University of Bristol
  Perhaps from a logistic regression analysis
  Is diligence in shaving a medically plausible feature to be investigated?
  How many other variables were included in the study?

Summary and Conclusions
  Multiplicity can arise in
    testing several different responses
    subgroup analyses
    interim analyses
    repeated measures
    &c.
  The effect of multiplicity is to increase the overall risk of a false positive
  (i.e. the overall significance level)

Problems of multiplicity can be overcome by
  Bonferroni corrections
    Bonferroni typically very conservative
  other adjustments in special cases
    e.g. for accumulating data in interim analyses, where adjusting for multiplicity can have counter-intuitive effects
  more sophisticated analyses
    e.g. ANOVA or multivariate methods
"If you torture the data often enough it will eventually confess"