Resource Guide

Moving Beyond Tradition:
Why and How to Replace Statistical Significance Tests with Better Methods
Andreas Schwab (Iowa State University) William H. Starbuck (University of Oregon)
Eric Abrahamson (Columbia University) Sam H. Holloway (University of Portland)
Academy of Management Meeting 2018

Alternatives to Null-Hypothesis Significance Testing
This webpage provides comprehensive and up-to-date information on how to improve on statistical
significance tests as a methodology for investigating empirical data.

General Sources
Schwab, A., Abrahamson, E., Starbuck, W. H. & Fidler, F. (2011). Researchers should make thoughtful assessments
instead of null-hypothesis significance tests. Organization Science, 22(4), 1105-1120.
Cumming, G. (2011). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New
York: Routledge. (http://www.latrobe.edu.au/psy/research/projects/esci)
Wasserstein, R. L., & Lazar, N. A. (2016). The ASA's statement on p-values: Context, process, and purpose.
The American Statistician, 70(2), 129-133.
Gigerenzer, G. (2004). Mindless statistics. Journal of Socio-Economics, 33, 587-606.
Hubbard, R. (2015). Corrupt Research: The case for reconceptualizing empirical management and social science.
Los Angeles: SAGE Publications.

Specific NHST Issues
Replication Crisis in Social Science
Open Science Collaboration (2015). Estimating the reproducibility of psychological science. Science, 349(6251),
Goldfarb, B., & King, A. A. (2016). Scientific apophenia in strategic management research: Significance tests &
mistaken inference. Strategic Management Journal, 37(1), 167-176.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data
collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-
Cortina, J.M., & Folger, R.G. (1998). When is it acceptable to accept a null hypothesis: No way, Jose?
Organizational Research Methods, 1, 334-350.
Sanabria, F. & Killeen, P. R. (2007). Better statistics for better decisions: Rejecting null hypothesis statistical tests in
favor of replication statistics. Psychology in the Schools, 44(5), 471-481.
Schwab, A., & Starbuck, W. H. (2017). A call for openness in research reporting: How to turn covert practices into
helpful tools. Academy of Management Learning & Education, 16(1), 125-141.

Sizes of Effects
Cumming, G. (2011). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New
York: Routledge. (http://www.latrobe.edu.au/psy/research/projects/esci)
Schwab, A. (2015). Why all researchers should report effect sizes and their confidence intervals: Paving the way for
meta-analysis and evidence-based management practices. Entrepreneurship Theory and Practice, 39(4), 719-

Breaugh, J.A. (2003). Effect size estimation: Factors to consider and mistakes to avoid. Journal of Management,
29(1), 79-97.
Cortina, J.M. & Nouri, H. (1999). Effect size for ANOVA designs. Newbury Park, CA: Sage.
Grissom, R., & Kim, J.J. (2005). Effect sizes for research: A broad practical approach. Mahwah, NJ: Erlbaum.

Probability of Effects
Cumming, G. & Fidler, F. (2009). Confidence intervals: Better answers to better questions. Zeitschrift für
Psychologie/Journal of Psychology, 217(1), 15-26.
Cumming, G. & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are
based on central and noncentral distributions. Educational and Psychological Measurement, 61, 532-575.
Cohen, J. (1994). The Earth is round (p < .05). American Psychologist, 49, 997-1003.

Baseline Models
Schwab, A. & Starbuck, W. H. (2012). Using baseline models to improve theories about emerging markets.
In C. Wang, D. Bergh, & D. Ketchen (Eds.), Research Methodology in Strategy and
Management, 7, 3-33. Bingley, UK: Emerald.
Schwab, A. & Starbuck, W. H. (2013). Why baseline modelling is better than null-hypothesis testing:
Examples from research about international management, developing countries, and emerging
markets. Advances in International Management, 26, 171-195.

Graphing Data and Findings
Schwab, A. (2018). Investigating and communicating the uncertainty of effects: The power of graphs.
Entrepreneurship Theory and Practice. Available online before print.
Aiken, L. S., West, S. G., Luhmann, M., Baraldi, A., & Coxe, S. J. (2012). Estimating and graphing interactions. In
H. Cooper, P. M. Camic, D. Long, A. Panter, D. Rindskopf, & K. Sher (Eds.), The APA Handbook of
Research Methods in Psychology, Vol. 3: 101-129. Washington, DC: American Psychological Association.
Tufte, E.R. (2001). The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
Tukey, J.W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.

Robust Regression
Andersen, R. (2008). Modern methods for robust regression. Quantitative Applications in the Social Sciences #07-152.
Los Angeles, CA: Sage.
Van de Ven, A. H. (2007). Engaged scholarship: A guide for organizational and social research. Chapter 4. New
York: Oxford University Press.

Methodological Change
Bettis, R. A., Ethiraj, S., Gambardella, A., Helfat, C., & Mitchell, W. (2016). Creating repeatable cumulative
knowledge in strategic management. Strategic Management Journal, 37(2), 257-261.
Orlitzky, M. (2012). How can significance tests be deinstitutionalized? Organizational Research Methods, 15(2),
Fidler, F. & Cumming, G. (2007). Lessons learned from statistical reform efforts in other disciplines. Psychology in
the Schools, 44(5), 441-449.
Fidler, F., Thomason, N., Cumming, G., Finch, S., & Leeman, J. (2004). Editors can lead researchers to confidence
intervals, but can't make them think: Statistical reform lessons from medicine. Psychological Science,
15(2), 119-126.

Various Additional Sources
Carver, R.P. (1993). The case against statistical significance testing, revisited. Journal of Experimental Education,
61(4), 287-292.
Connor, E. F., & Simberloff, D. (1986). Competition, scientific method, and null models in ecology. American
Scientist, 74, 155-162.
Cortina, J.M., & DeShon, R.P. (1998). Determining relative importance of predictors with the observational design.
Journal of Applied Psychology, 83, 798-804.
Cortina, J.M., & Dunlap, W.P. (1997). On the logic and purpose of significance testing. Psychological Methods, 2,
Gauch, H.G. (2002). Scientific method in practice. Cambridge, UK: Cambridge University Press.
Gauch, H.G. (2006). Winning the accuracy game. American Scientist, 94, 135–143.
Greenwald, A.G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1-20.
Haller, H. & Krauss, S. (2002). Misinterpretations of significance: A problem students share with their teachers?
Methods of Psychological Research Online, 7(1), 1-20.
Harlow, L.L., Mulaik, S.A. & Steiger, J.H. (1997). What if there were no significance tests? Mahwah, NJ: Erlbaum.
Hubbard, R. (2004). Alphabet soup: Blurring the distinctions between p’s and α’s in psychological research. Theory
& Psychology, 14(3), 295-327.
Hubbard, R. & Armstrong, J.S. (2006). Why we don't really know what statistical significance means: Implications
for educators. Journal of Marketing Education, 28(2), 114-120.
Hubbard, R., and Ryan, P.A. (2000). The historical growth of statistical significance testing in psychology – and its
future prospects. Educational and Psychological Measurement, 60, 661-681.
Hunter, J.E. (1997). Needed: A ban on the significance test. Psychological Science, 8(1), 3-7.
Ioannidis, J.P.A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Ioannidis, J.P.A. (2005). Contradicted and initially stronger effects in highly cited clinical research. Journal of the
American Medical Association, 294(2), 218-228.
Kehle, T.J., Bray, M.A., Chafouleas, S.M., & Kawano, T. (2007). Lack of statistical significance. Psychology in the
Schools, 44(5), 417-422.
Killeen, P.R. (2005). An alternative to null-hypothesis significance tests. Psychological Science, 16, 345-353.
Rosnow, R.L., & Rosenthal, R. (1989). Statistical procedures and the justification of knowledge in psychological
science. American Psychologist, 44, 1276-1284.
Schmidt, F. & Hunter, J. (2002). Are there benefits from NHST? American Psychologist, 57(1), 65-66.
Schmidt, F.L. & Hunter, J.E. (1997). Eight common but false objections to the discontinuation of significance testing
in analysis of research data. In L. Harlow, S. Mulaik, & J. Steiger (Eds.), What if there were no significance
tests? (pp. 37-63). Mahwah, NJ: Erlbaum.
Schmidt, F.L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for the
training of researchers. Psychological Methods, 1, 115-129.
Schwab, A. & Starbuck, W.H. (2009). Null-hypothesis significance tests in behavioral and management research:
We can do better. In D. Bergh & D. Ketchen (Eds.), Research Methodology in Strategy and Management, Vol.
5, 29-54. New York: Elsevier JAI Press.
Starbuck, W.H. (2006). The production of knowledge: The challenge of social science research. New York: Oxford
University Press.
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms.
Educational Researcher, 25(2), 26-30.
Thompson, B. (2002). "Statistical," "practical," and "clinical": How many kinds of significance do counselors need
to consider? Journal of Counseling & Development, 80(4), 64-71.
Thompson, B. (2006). Foundations of behavioral statistics: An insight-based approach. New York: Guilford Press.
Thompson, B. (2006). Research synthesis: Effect sizes. In J. L. Green, G. Camilli, P. B. Elmore, A. Skukauskaite,
& E. Grace (Eds.), Handbook of complementary methods in education research (pp. 583-603). Washington, DC:
American Educational Research Association.
Tukey, J.W. (1991). The philosophy of multiple comparisons. Statistical Science, 6(1), 100-116.
Venables, W.N., & Ripley, B.D. (2002). Modern applied statistics with S (4th ed.). New York: Springer.
Wacholder, S., Chanock, S., Garcia-Closas, M., El Ghormli, L., & Rothman, N. (2004). Assessing the probability that
a positive report is false: An approach for molecular epidemiology studies. Journal of the National Cancer Institute, 96(6),

Related Websites Covering NHST Issues
Cumming, G. The New Statistics (http://www.psychologicalscience.org/index.php/members/new-statistics)
Gelman, A. Statistical Modeling, Causal Inference and Social Science (http://andrewgelman.com/)
Schwab, A., Starbuck, W., Abrahamson, E., Holloway, S., & Miller, C. Alternatives to Null-Hypothesis Significance
Testing (https://sites.google.com/site/nhstresearch/)
Final Note: If you have any questions, additional sources, or recommendations for this resource guide, please
contact Andreas Schwab (aschwab@iastate.edu).