Al M Best, PhD

Professor, Periodontics, School of Dentistry

Professor, Biostatistics, School of Medicine

Outline

• Idea for the editorial

• A history of significance testing

• A guide to misinterpretation

• Using a dental example

• My practice as a collaborator

Best AM, Greenberg BL, Glick M. From tea tasting to t test: A P value ain’t

what you think it is. Journal of the American Dental Association. 2016

Jul;147(7):527-9. PMID: 27350642.

7-Mar-2016 retractionwatch.com blog

http://retractionwatch.com/2016/03/07/were-using-a-common-statistical-test-all-wrong-statisticians-want-to-fix-that/

TAS

http://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108

Metrics

amstat.tandfonline.com/doi/citedby/10.1080/00031305.2016.1154108

Supplemental Material

• Greenland, S, Senn, SJ, Rothman, KJ, Carlin, JB, Poole, C, Goodman, SN and Altman, DG: “Statistical Tests, P-values, Confidence Intervals, and Power: A Guide to Misinterpretations”
• Altman, Naomi: Ideas from multiple testing of high dimensional data provide insights about reproducibility and false discovery rates of hypothesis supported by p-values
• Benjamin, Daniel J, and Berger, James O: A simple alternative to p-values
• Benjamini, Yoav: It’s not the p-values’ fault
• Berry, Donald A: P-values are not what they’re cracked up to be
• Carlin, John B: Comment: Is reform possible without a paradigm shift?
• Cobb, George: ASA statement on p-values: Two consequences we can hope for
• Gelman, Andrew: The problems with p-values are not just with p-values
• Goodman, Steven N: The next questions: Who, what, when, where, and why?
• Greenland, Sander: The ASA guidelines and null bias in current teaching and practice
• Ioannidis, John PA: Fit-for-purpose inferential methods: abandoning/changing P-values versus abandoning/changing research
• Johnson, Valen E: Comments on the “ASA Statement on Statistical Significance and P-values” and marginally significant p-values
• Lavine, Michael, and Horowitz, Joseph: Comment
• Lew, Michael J: Three inferential questions, two types of P-value
• Little, Roderick J: Discussion
• Mayo, Deborah G: Don’t throw out the error control baby with the bad statistics bathwater
• Millar, Michele: ASA statement on p-values: some implications for education
• Rothman, Kenneth J: Disengaging from statistical significance
• Senn, Stephen: Are P-Values the Problem?
• Stangl, Dalene: Comment
• Stark, PB: The value of p-values
• Ziliak, Stephen T: The significance of the ASA statement on statistical significance and p-values

Supplemental Material

Greenland, S, Senn, SJ, Rothman, KJ, Carlin, JB, Poole, C, Goodman, SN and Altman, DG: “Statistical Tests, P-values, Confidence Intervals, and Power: A Guide to Misinterpretations” Eur J Epidemiol. 2016 Apr;31(4):337-50.

The Lady Tasting Tea

● Classical example

Salsburg D. The Lady Tasting Tea. New York, NY: WH Freeman and Co; 2001.

Fisher RA. Statistical Methods and Scientific Inference. 3rd ed. New York, NY: Hafner

Press; 1973.

Coke vs Pepsi

● Say I pour two cups of soft drink where you cannot see, one with Coke and one with Pepsi. Then I ask you: “Which is Coke? And which is Pepsi?”

● What are the possible outcomes?

Actual outcome   Number correct
1                2
2                0

From: Maita Levine and Raymond H. Rolwing (1993).

Teaching Statistics, 15, 4-5.

Likelihood of outcomes

● Count the number correct. Calculate the probability of each result.

Number correct   Frequency   Proportion   As or more extreme
0                1           0.5          1.00
2                1           0.5          0.50

● Would this experiment be convincing?

Coke vs Pepsi: 4 cups

● Assuming an equal number of Cokes and

Pepsis, the next larger experiment would

be 4 cups.

● What are the possible outcomes?

Actual outcome   Number correct
1                0
2                2
3                2
4                2
5                2
6                4

Likelihood of Outcomes

● With each outcome equally likely, we

calculate the p-values for all the possibilities:

Number correct   Frequency   Proportion   As or more extreme
0                1           0.1667       1.0000
2                4           0.6667       0.8333
4                1           0.1667       0.1667

● Would this experiment be convincing?

– So if someone got all 4 right, we would be able to

conclude that this person could “… tell the

difference between Coke and Pepsi,

p-value = .1667.” Would this be convincing?
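The counting behind the 4-cup table can be reproduced by brute force. The sketch below is only an illustration, not part of the original slides: it assumes the taster knows that exactly half the cups hold Coke, and the function names `guess_distribution` and `p_values` are made up for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def guess_distribution(n_cups):
    """Enumerate every way to name half of n_cups as Coke and score each guess."""
    half = n_cups // 2
    truth = set(range(half))  # assumption: cups 0..half-1 really hold Coke
    counts = Counter()
    for guess in combinations(range(n_cups), half):
        coke_hits = len(truth & set(guess))
        # Each correctly named Coke implies one correctly named Pepsi as well.
        counts[2 * coke_hits] += 1
    return counts


def p_values(counts):
    """Proportion of outcomes with at least as many cups correct as each score."""
    total = sum(counts.values())
    return {
        k: Fraction(sum(v for j, v in counts.items() if j >= k), total)
        for k in sorted(counts)
    }


counts = guess_distribution(4)
for k, p in p_values(counts).items():
    print(k, counts[k], float(p))
```

Calling `guess_distribution(2)` gives the two-cup case, where even a perfect score has a tail proportion of 0.5.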

Fisher’s tea lady used 8 cups

● All the possible outcomes

Outcome  # Correct    Outcome  # Correct    Outcome  # Correct    Outcome  # Correct
 1       0            18       4            36       4            54       6
 2       2            19       4            37       4            55       6
 3       2            20       4            38       4            56       6
 4       2            21       4            39       4            57       6
 5       2            22       4            40       4            58       6
 6       2            23       4            41       4            59       6
 7       2            24       4            42       4            60       6
 8       2            25       4            43       4            61       6
 9       2            26       4            44       4            62       6
10       2            27       4            45       4            63       6
11       2            28       4            46       4            64       6
12       2            29       4            47       4            65       6
13       2            30       4            48       4            66       6
14       2            31       4            49       4            67       6
15       2            32       4            50       4            68       6
16       2            33       4            51       4            69       6
17       2            34       4            52       4            70       8
                      35       4            53       4

Likelihood of Outcomes

● We calculate the p-values

Number correct   Frequency   Proportion   As or more extreme
0                1           0.0143       1.0000
2                16          0.2286       0.9857
4                36          0.5143       0.7571
6                16          0.2286       0.2429
8                1           0.0143       0.0143

● If someone got all 8 right, we could conclude

that this person could “… tell the difference

between Coke and Pepsi, p-value = .0143.”

Would this be convincing?
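For 8 cups, listing all 70 outcomes by hand is tedious, and the same frequencies follow from a counting formula. This is a minimal sketch under the assumption that the taster must name exactly 4 of the 8 cups as Coke; the function name `tea_tasting_table` is hypothetical.

```python
from math import comb


def tea_tasting_table(n_cups=8):
    """Rows of (number correct, frequency, proportion, as-or-more-extreme proportion)."""
    half = n_cups // 2
    total = comb(n_cups, half)  # 70 ways to pick 4 cups out of 8
    rows = []
    tail = total  # outcomes with at least this many correct
    for hits in range(half + 1):
        # Pick `hits` of the true Cokes and the rest of the guess from the Pepsis.
        freq = comb(half, hits) * comb(half, half - hits)
        rows.append((2 * hits, freq, freq / total, tail / total))
        tail -= freq
    return rows


for correct, freq, prop, tail_prop in tea_tasting_table(8):
    print(f"{correct:>2} {freq:>3} {prop:.4f} {tail_prop:.4f}")
```

The last row is the one quoted on the slide: a single outcome out of 70 has all 8 cups correct.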

Inference?

“… based on the key idea that we make

observations on a sample of subjects and

then draw inferences about the population

of all such subjects from which the sample

is drawn.”

Altman D, Machin D., Bryant T, & Gardner M (Eds.) (2013) Statistics with confidence: confidence intervals and

statistical guidelines. John Wiley & Sons. ISBN 0-7279-1375-1. Page 3.

Gardner MJ, Altman DG. (1988) Estimating with confidence. Br Med J. Apr 30;296(6631):1210-1. PMID: 3133015; PubMed

Central PMCID: PMC2545695.

Jerzy Neyman & Egon Pearson

fuzzy and heuristic

● Instead of focusing on what a scientist

thinks about the evidence, an experiment

should tell the scientist what to do.

● Out of this came Ha, type-I and type-II

error rates, power

Greenland’s “Guide to Misinterpretations”

analgesia as a supplement to inferior alveolar

nerve block in patients with irreversible pulpitis.”

JADA 2016 147(6):427-37.

● CONCLUSIONS: There is moderate evidence to

support the use of oral NSAIDs (in particular, ibuprofen)

1 hour before the administration of

IANB local anesthetic to provide additional

analgesia to the patient.

Greenland S, et al. “Statistical Tests, P-values, Confidence Intervals, and Power: A Guide to Misinterpretations” Eur J Epidemiol. 2016 Apr;31(4):337-50.

Severely infected irreversible pulpitis

Tom Hanks (2000) A FedEx executive must

transform himself physically and emotionally to

survive a crash landing on a deserted island

Ibuprofen versus placebo, frequency of

participants in each group having “little or no

pain during endodontic treatment.”

Benzodiazepine versus placebo, frequency of

participants in each group having “little or no pain

during endodontic treatment.”

True or False?

The p-value is the probability that

the null hypothesis is true.

● For example: if a test of the null hypothesis gave P = 0.02, the null

hypothesis has only a 2% chance of being

true.

Greenland S, et al. “Statistical Tests, P-values, Confidence Intervals, and Power: A Guide to Misinterpretations” Eur J Epidemiol. 2016 Apr;31(4):337-50.

The p-value is the probability that

the null hypothesis is true.

No!

The p-value simply indicates the degree to

which the data conform to the pattern

predicted by the null hypothesis

and all the other assumptions used in the

test (the underlying statistical model).

Backwards

● That this interpretation is backwards might be appreciated

by pondering how the p-value,

which is a probability deduced from a set

of assumptions,

can possibly refer to the probability of

those assumptions.
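One way to see the difference between the probability of the data given the null and the probability of the null given the data is Bayes' rule. The sketch below is illustrative only: the prior probabilities and the chance that a genuinely skilled taster gets all 8 cups right (0.5) are invented numbers, and `posterior_null` is a hypothetical helper.

```python
def posterior_null(p_data_given_null, p_data_given_alt, prior_null):
    """Bayes' rule: probability the null is true after seeing the data."""
    numerator = p_data_given_null * prior_null
    denominator = numerator + p_data_given_alt * (1 - prior_null)
    return numerator / denominator


p_all_correct_if_guessing = 1 / 70  # the p-value from the 8-cup design
p_all_correct_if_skilled = 0.5      # hypothetical value, not from the slides

for prior in (0.5, 0.9, 0.99):      # hypothetical prior beliefs that the taster is guessing
    post = posterior_null(p_all_correct_if_guessing, p_all_correct_if_skilled, prior)
    print(f"prior {prior:.2f} -> posterior {post:.3f}")
```

The same p-value of 1/70 leads to very different posterior probabilities for the null, depending on inputs that the p-value itself never sees.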

True or False?

The p-value is the probability that chance

alone produced the observed association.

● Say the p-value for the null hypothesis is 0.02.

And so there is a 2% probability that

chance alone produced the association.

The p-value for the null hypothesis is the

probability that chance alone produced the

observed association.

● To say that chance alone produced the association is to assert that every assumption used to compute the p-value is

correct, including the null hypothesis.

Greenland et al.’s Guide

● 14 misinterpretations of single p-value(s)

● 4 misinterpretations of p-values across

studies or in subgroups

● 5 misinterpretations of confidence intervals

● 2 misinterpretations of power

p < .05 does NOT mean:

● Ha is true

● Scientifically important effect detected

● Substantially important relationship

demonstrated

● Chance of false positive finding is 5%

p > .05 does NOT mean:

● Ha is false

● Evidence in favor of Ho

● There is no effect

● The effect size is small
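A small study can easily return p > .05 even when a real effect exists. As a rough illustration (the scenario is invented, not from the slides), suppose a taster is truly right 70% of the time on independent single-cup trials, and we test the null of 50% with a one-sided exact binomial test. The helper names below are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """Probability of k or more successes in n trials."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


def power_exact_test(n, true_p, null_p=0.5, alpha=0.05):
    """Chance that the one-sided exact test gives p <= alpha when true_p holds."""
    # Smallest success count whose null tail probability is at most alpha.
    critical = next(
        (k for k in range(n + 1) if upper_tail(k, n, null_p) <= alpha), None
    )
    if critical is None:
        return 0.0  # no outcome can reach significance at this sample size
    return upper_tail(critical, n, true_p)


for n in (10, 50):
    print(n, round(power_exact_test(n, true_p=0.7), 3))
```

With only 10 trials the test usually fails to reach p <= .05 even though the taster is far better than chance, so a nonsignificant result there is weak evidence of "no effect".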

Greenland et al.’s Conclusions included:

for a hypothesis cannot be derived from

statistical methods alone.

● Significance tests and confidence intervals

do not by themselves provide a logically

sound basis for concluding an effect is

present or absent with a given probability.

Not even scientists can

easily explain p-values

● You can get it right,

or you can make it intuitive,

but it’s all but impossible to do both.

ASA: Conclusion

● Good statistical practice, as an essential component of

good scientific practice, emphasizes:

– principles of good study design and conduct,

– a variety of numerical and graphical summaries of

data,

– understanding of the phenomenon under study,

– interpretation of results in context,

– complete reporting and

– proper logical and quantitative understanding of

what data summaries mean.

● No single index should substitute for scientific

reasoning.

Study Design and Conduct

● PICO-T

● Bias, Confounding, Contamination

Publication Bias
1. Bias of rhetoric
2. All’s well literature bias
3. Reference bias
4. Positive results bias
5. Hot stuff bias
6. Pre-publication bias
7. Post-publication bias
8. Sponsorship bias
9. Meta-analysis bias

Selection Bias (susceptibility bias)
1. Popularity bias
2. Centripetal bias
3. Referral filter bias
4. Diagnostic access bias
5. Diagnostic suspicion bias
6. Unmasking bias
7. Mimicry bias
8. Previous opinion bias
9. Wrong sample size bias
10. Admission rate bias (Berkson)
11. Prevalence-incidence bias (Neyman)
12. Diagnostic vogue bias
13. Diagnostic purity bias
14. Procedure selection bias
15. Missing clinical data bias
16. Non-contemporaneous control bias
17. Starting time bias
18. Unacceptable disease bias
19. Migrator bias
20. Membership bias
21. Nonrespondent bias
22. Volunteer bias
23. Allocation bias
24. Vulnerability bias
25. Authorization bias

Exposure Bias (performance bias)
1. Contamination bias
2. Withdrawal bias
3. Compliance bias
4. Therapeutic personality bias
5. Bogus control bias
6. Misclassification bias
7. Proficiency bias

Detection Bias (measurement bias)
1. Insensitive measure bias
2. Underlying cause bias (rumination bias)
3. End-digit preference bias
4. Apprehension bias
5. Unacceptability bias
6. Obsequiousness bias
7. Expectation bias
8. Substitution game bias
9. Family information bias
10. Exposure suspicion bias
11. Recall bias
12. Attention bias
13. Instrument bias
14. Surveillance bias
15. Comorbidity bias
16. Nonspecification bias
17. Verification bias (work-up bias)

Analysis Bias (Transfer Bias)
1. Post-hoc significance bias
2. Data dredging bias
3. Scale degradation bias
4. Tidying-up bias (deliberate elimination bias)
5. Repeated peeks bias

Interpretation Bias
1. Mistaken identity bias
2. Cognitive dissonance bias
3. Magnitude bias
4. Significance bias
5. Correlation bias
6. Under-exhaustion bias
The Dunning-Kruger effect

Hartman JM, Forsen JW Jr, Wallace MS, Neely JG. “Tutorials in clinical research: part IV: recognizing and controlling bias.” Laryngoscope. 2002 Jan;112(1):23-31.
Expanded from: Sackett DL. “Bias in analytic research.” J Chronic Dis. 1979;32(1-2):51-63.

Cognitive Bias Codex

Context

● David Moore:

“Data are numbers,

but they are not ‘just numbers.’

They are numbers with a context.”

Moore and Notz 2006, Statistics: Concepts and Controversies, NY: Freeman, p xxi

Context

speak for themselves

Ed Koren, © The New Yorker, 9 December 1974

Words Matter

● CONSORT 2010

● How to Report Statistics in Medicine

● AMA Manual of Style

Moore and Notz 2006, Statistics: Concepts and Controversies, NY: Freeman, p xxi

CONsolidated Standards of Reporting Trials

● The CONSORT Statement comprises a 25-item checklist and a flow diagram. The checklist

items focus on reporting how the trial was

designed, analysed, and interpreted; the flow

diagram displays the progress of all participants

through the trial.

● The CONSORT “Explanation and Elaboration”

document explains and illustrates the principles

underlying the CONSORT Statement.

www.consort-statement.org

Specialized CONSORT

● Harms (safety)

● Non-inferiority

● Cluster randomized trials

● Herbal, Acupuncture

● Non-pharmacologic agents

● Pragmatic trials

● Patient-reported outcomes

● N-of-1 trials

● Orthodontic trials

● Pilot and feasibility trials

Enhancing the QUAlity and Transparency Of health Research (EQUATOR)

● STROBE – Observational studies

● PRISMA – Systematic reviews

● CARE – Case reports

● SRQR – Qualitative research

● STARD – Diagnostic/prognostic studies

● SQUIRE – Quality improvement studies

… a total of 358 reporting guidelines

http://www.equator-network.org/

Dedication

● Lang: To anyone who

has encountered the

frustration of what

I call “Statistical

Buddhism”

● To those who know,

no explanation is

necessary.

To those who do not

know, no explanation

is possible.

● Glossary

– P value: probability of

obtaining the observed

data (or data that are

more extreme) if the null

hypothesis were exactly

true.

● www.amamanualofstyle.com

Everitt BS. The Cambridge Dictionary of Statistics in the Medical Sciences.

Cambridge, England: Cambridge University Press; 1995.

Al’s Conclusion

● Good statistical practice is an essential component of

good scientific practice

– Data are information in context.

– Insist on a full and complete description of the

context of a study.

– A p-value is calculated from a set of numbers

encased in certain assumptions.

– Viewed alone, the p-value may be meaningless.

● No single index can substitute for scientific

reasoning.

Thank you

ASA: Six Principles

● P-values can indicate how incompatible the data are with a

specified statistical model.

● P-values do not measure the probability that the studied

hypothesis is true, or the probability that the data were

produced by random chance alone.

● Scientific conclusions and business or policy decisions

should not be based only on whether a p-value passes a

specific threshold.

● Proper inference requires full reporting and transparency.

● A p-value, or statistical significance, does not measure the

size of an effect or the importance of a result.

● By itself, a p-value does not provide a good measure of

evidence regarding a model or hypothesis.
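The principle that a p-value does not measure effect size can be checked with a little arithmetic. In this sketch the two success proportions (60% versus 50%) and the group sizes are invented, and the normal approximation for a two-proportion z test with equal group sizes is an assumption made for simplicity; `two_proportion_p_value` is a hypothetical helper.

```python
from math import erfc, sqrt


def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value from a pooled two-proportion z test (normal approximation)."""
    pooled = (p1 + p2) / 2  # equal group sizes, so the pooled rate is the mean
    se = sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z = abs(p1 - p2) / se
    return erfc(z / sqrt(2))  # two-sided tail area of the standard normal


# Same observed difference (10 percentage points) at three hypothetical sample sizes.
for n in (20, 200, 2000):
    print(n, two_proportion_p_value(0.60, 0.50, n))
```

The observed difference is identical in all three cases; only the sample size changes, yet the p-value moves from clearly nonsignificant to extremely small.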

George Cobb—Looking Ahead:

Five Imperatives

● George Cobb (2015) Mere Renovation is Too Little Too Late: We Need to Rethink our Undergraduate Curriculum from the Ground Up, The American Statistician, 69:4, 266-282, DOI: 10.1080/00031305.2015.1093029

● Flatten prerequisites

– Calc I → Calc II → Calc III → Probability → Math

Stat → Biostatistics

● Strip away technical formalism and formulas

● Embrace computation

● Exploit context

– Interpretation, motivation, direction

● Teach through research

