
COMMENT

P values are just the tip of the iceberg
Ridding science of shoddy statistics will require scrutiny of every step,
not merely the last one, say Jeffrey T. Leek and Roger D. Peng.

There is no statistic more maligned than the P value. Hundreds of papers and blogposts have been written about what some statisticians deride as 'null hypothesis significance testing' (NHST; see, for example, go.nature.com/pfvgqe). NHST deems whether the results of a data analysis are important on the basis of whether a summary statistic (such as a P value) has crossed a threshold. Given the discourse, it is no surprise that some hailed as a victory the banning of NHST methods (and all of statistical inference) in the journal Basic and Applied Social Psychology in February [1].

Such a ban will in fact have scant effect on the quality of published science. There are many stages to the design and analysis of a successful study (see 'Data pipeline'). The last of these steps is the calculation of an inferential statistic such as a P value, and the application of a 'decision rule' to it (for example, P < 0.05). In practice, decisions that are made earlier in data analysis have a much greater impact on results, from experimental design to batch effects, lack of adjustment for confounding factors, or simple measurement error. Arbitrary levels of statistical significance can be achieved by changing the ways in which data are cleaned, summarized or modelled [2].

DATA PIPELINE
The design and analysis of a successful study has many stages, all of which need policing.
[Figure: the stages, in order, are Experimental design, Data collection, Raw data, Data cleaning, Tidy data, Exploratory data analysis, Potential statistical models, Statistical modelling, Summary statistics, Inference and P value. The P value is labelled "Extreme scrutiny"; the earlier stages are labelled "Little debate".]

P values are an easy target: being widely used, they are widely abused. But, in practice, deregulating statistical significance opens the door to even more ways to game statistics, intentionally or unintentionally, to get a result. Replacing P values with Bayes factors or another statistic is ultimately about choosing a different trade-off of true positives and false positives. Arguing about the P value is like focusing on a single misspelling, rather than on the faulty logic of a sentence.

Better education is a start. Just as anyone who does DNA sequencing or remote-sensing has to be trained to use a machine, so too anyone who analyses data must be trained in the relevant software and concepts. Even investigators who supervise data analysis should be required by their funding agencies and institutions to complete training in understanding the outputs and potential problems with an analysis.

There are online courses specifically designed to address this crisis. For example, the Data Science Specialization, offered by Johns Hopkins University in Baltimore, Maryland, and Data Carpentry, can easily be integrated into training and research. It is increasingly possible to learn to use the computing tools relevant to specific disciplines: training in Bioconductor, Galaxy and Python is included in Johns Hopkins' Genomic Data Science Specialization, for instance.

But education is not enough. Data analysis is taught through an apprenticeship model, and different disciplines develop their own analysis subcultures. Decisions are based on cultural conventions in specific communities rather than on empirical evidence. For example, economists call data measured over time 'panel data', to which they frequently apply mixed-effects models. Biomedical scientists refer to the same type of data structure as 'longitudinal data', and often go at it with generalized estimating equations.

Statistical research largely focuses on mathematical statistics, to the exclusion of the behaviour and processes involved in data analysis. To solve this deeper problem, we must study how people perform data analysis in the real world. What sets them up for success, and what for failure? Controlled experiments have been done in visualization [3] and risk interpretation [4] to evaluate how humans perceive and interact with data and statistics. More recently, we and others have been studying the entire analysis pipeline. We found, for example, that recently trained data analysts do not know how to infer P values from plots of data [5], but they can learn to do so with practice.

The ultimate goal is evidence-based data analysis [6]. This is analogous to evidence-based medicine, in which physicians are encouraged to use only treatments for which efficacy has been proved in controlled trials. Statisticians and the people they teach and collaborate with need to stop arguing about P values, and prevent the rest of the iceberg from sinking science.

Jeffrey T. Leek and Roger D. Peng are associate professors of biostatistics at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland, USA.
e-mail: jleek@jhsph.edu

1. Trafimow, D. & Marks, M. Basic Appl. Soc. Psych. 37, 1-2 (2015).
2. Simmons, J. P., Nelson, L. D. & Simonsohn, U. Psychol. Sci. 22, 1359-1366 (2011).
3. Cleveland, W. S. & McGill, R. Science 229, 828-833 (1985).
4. Kahneman, D. & Tversky, A. Econometrica 47, 263-291 (1979).
5. Fisher, A., Anderson, G. B., Peng, R. & Leek, J. PeerJ 2, e589 (2014).
6. Leek, J. T. & Peng, R. D. Proc. Natl Acad. Sci. USA 112, 1645-1646 (2015).
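
The article's point that arbitrary levels of statistical significance can be reached by changing how data are cleaned or summarized can be illustrated with a small simulation. The sketch below is not from the article: the function names, the particular cleaning choices (outlier trimming at several cut-offs, keeping only a "first batch") and the large-sample z-test are all assumptions chosen for illustration. Both groups are drawn from the same distribution, so every "significant" result is a false positive.

```python
import random
from statistics import NormalDist, mean, stdev


def z_test_p(a, b):
    """Two-sided P value for a difference in means.

    Uses a large-sample normal approximation (an illustrative
    simplification, not a method prescribed by the article).
    """
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def trim(xs, k):
    """Hypothetical cleaning rule: drop points more than k sample SDs from the mean."""
    m, s = mean(xs), stdev(xs)
    return [x for x in xs if abs(x - m) <= k * s]


def simulate(n_sims=2000, n=50, alpha=0.05, seed=1):
    """Return (honest, flexible) false-positive rates under a true null."""
    rng = random.Random(seed)
    honest = flexible = 0
    for _ in range(n_sims):
        # Both groups come from the same distribution: there is no real effect.
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]

        # One pre-specified analysis on the full data.
        p_values = [z_test_p(a, b)]
        # Alternative "cleaning" choices an analyst might try afterwards.
        for k in (3.0, 2.5, 2.0):
            p_values.append(z_test_p(trim(a, k), trim(b, k)))
        # Alternative choice: analyse only the first half of each group.
        p_values.append(z_test_p(a[: n // 2], b[: n // 2]))

        honest += p_values[0] < alpha       # report the planned analysis
        flexible += min(p_values) < alpha   # report whichever choice "worked"
    return honest / n_sims, flexible / n_sims


if __name__ == "__main__":
    honest_rate, flexible_rate = simulate()
    print(f"planned analysis only: {honest_rate:.3f}")
    print(f"best of five analyses: {flexible_rate:.3f}")
```

Because the planned analysis is always one of the options considered, the flexible rate can never be lower than the honest one. How much higher it goes depends on how many choices are tried and how different they are, and none of that freedom is visible in the final P value.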

612 | NATURE | VOL 520 | 30 APRIL 2015
© 2015 Macmillan Publishers Limited. All rights reserved
