Replication
Replicability: the degree to which similar results are obtained if a study is repeated
Exact replication: Repeat the study using methods as close to the original as possible. Rare; difficult to publish (publication bias for novel research)
Conceptual replication: Use slightly different methods (e.g., measures, manipulations, sample) to test the same hypotheses. Very common (virtually required in some top-tier journals)
Replication-plus-extension: Add something to extend the results (e.g., another condition). Can help show that the results 1) replicate and 2) generalize
Priming a feature of a stereotyped group can yield behaviour that is consistent with the stereotype
They used words in a scrambled-sentence task to prime an elderly stereotype (e.g., old, wise, sentimental, bingo, retired, wrinkle) or neutral words (e.g., thirsty, clean, private)
After the study ended, they timed participants as they walked down the hallway
Examined the effects of presenting the color red on performance across 6 experiments
All studies used the same variables at the abstract (conceptual) level, but differed at the operational level
Experiment 1:
IV: ID number was written on the page in red, green, or black ink
DV: Number of anagrams solved correctly
Experiment 2:
IV: Cover page was red, green, or white
DV: Number of correct analogy items on an IQ test
High-profile fraud cases (e.g., Diederik Stapel, Dirk Smeesters)
Report that psychologists are reluctant to share data for reanalysis (Wicherts, Bakker, & Molenaar, 2011)
Focus on questionable research practices (Simmons, Nelson, & Simonsohn, 2011)
Widely ridiculed publication showing extrasensory perception effects (Bem, 2011) that failed to replicate (Ritchie, Wiseman, & French, 2012)
Bem (2011) reported 9 experiments supporting precognition of events before they occurred
Bem (2011) was published in the Journal of Personality and Social Psychology, a top-tier journal
The same journal subsequently rejected a manuscript that failed to replicate the finding (JPSP does not publish replications)
Ritchie, Wiseman, & French (2012) failed to replicate Bem's Experiment 9 across 3 pre-registered direct replications
Differences in methods (measures, setting, sample, etc.)
Random variation across samples
Mistakes made during data collection
The failure to replicate could be a Type II error (or the original study could be a Type I error)
Should Bem have been published? Editorial (Judd & Gawronski, 2011):
"We openly admit that the reported findings conflict with our own beliefs about causality and that we find them extremely puzzling. Yet, as editors we were guided by the conviction that this paper, as strange as the findings may be, should be evaluated just as any other manuscript on the basis of rigorous peer review. Our obligation as journal editors is not to endorse particular hypotheses but to advance and stimulate science through a rigorous review process." (abstract)
Choosing α = .05 does not mean the risk of a Type I error is 5%
How many of the effects that we examine actually exist?
How much power do we have to detect those effects?
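The arithmetic behind this point can be made concrete. A minimal sketch, using illustrative numbers (the prior probability and power below are assumptions, not figures from the lecture): the share of "significant" results that are false positives depends on how many tested effects are real and on power, not just on α.

```python
# Share of "significant" findings that are false positives, given alpha,
# statistical power, and the prior probability that a tested effect is real.
# The specific numbers below are illustrative assumptions, not from the lecture.

def false_positive_share(prior_true, power, alpha=0.05):
    true_hits = prior_true * power          # real effects correctly detected
    false_hits = (1 - prior_true) * alpha   # null effects wrongly "detected"
    return false_hits / (true_hits + false_hits)

# If only 10% of tested hypotheses are true and power is 35%,
# over half of all significant results are Type I errors:
print(round(false_positive_share(prior_true=0.10, power=0.35), 2))  # 0.56
```

With high prior probability and high power the share drops back toward α, which is why the two questions above matter.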
Direct replications are rare and conceptual replications are problematic Science is not always self-correcting
We don't know! (File drawer effect)
The Reproducibility Project (Open Science Collaboration, 2012)
Large-scale (>150 scientists) attempt to replicate studies
Currently replicating studies from 3 prominent psychology journals from 2008
What is the overall rate of replicability in psychology?
What predicts replicability of studies?
Begley and Ellis (2012) attempted to replicate 53 papers in top journals on cancer research
They focused on new (unreplicated) results
They did not replicate 47 (89%) of those studies
Homework Assignment
Pretend that you MUST obtain a statistically significant result in your group project at any cost. Try to change your analyses to find a statistically significant result. For instance, you could:
Exclude participants for any reason
Add control variables
Change scores that look unusual
The significant result does not need to be relevant to your hypotheses
Could you write a paper that makes sense of this significant result?
Practices in the collection, analysis, and reporting of results that inflate the risk of making a Type I error
False positive (Type I error): Incorrect rejection of a null hypothesis
More common (and perhaps less problematic) than outright fraud
False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis (Simmons, Nelson, & Simonsohn, 2011)
False positives are persistent because failures to replicate are not conclusive and are usually not published
They inspire future research that may waste resources
Using a conservative α (e.g., α = .05) does not solve those problems
Researcher degrees of freedom
Decisions made during data collection, analysis, and reporting
Can yield significant results, but inflate Type I error rates
Researcher degrees of freedom inflate Type I error rates (Simmons et al., 2011)
Simulations using randomly generated data
Proportion of significant results (Type I errors)
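A minimal sketch of this kind of simulation (my own illustration, not the authors' actual code; the two-DV setup, the r = .5 correlation, and the normal-approximation p-value are assumptions): under the null, measuring two correlated dependent variables and reporting whichever one "works" pushes the Type I error rate above the nominal .05.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(42)

def p_two_sample(a, b):
    """Two-sided p-value for a two-sample mean comparison (normal approximation)."""
    se = sqrt(np.var(a, ddof=1) / len(a) + np.var(b, ddof=1) / len(b))
    z = abs(np.mean(a) - np.mean(b)) / se
    return erfc(z / sqrt(2))

def simulate(n_sims=2000, n=20, pick_best_of_two_dvs=False):
    """Type I error rate under the null (no true group difference)."""
    cov = [[1.0, 0.5], [0.5, 1.0]]  # two DVs correlated at r = .5 (assumption)
    hits = 0
    for _ in range(n_sims):
        g1 = rng.multivariate_normal([0, 0], cov, size=n)
        g2 = rng.multivariate_normal([0, 0], cov, size=n)
        p1 = p_two_sample(g1[:, 0], g2[:, 0])
        if pick_best_of_two_dvs:
            p2 = p_two_sample(g1[:, 1], g2[:, 1])
            hits += min(p1, p2) < 0.05   # report whichever DV "worked"
        else:
            hits += p1 < 0.05            # honest single test
    return hits / n_sims

print(simulate())                           # close to the nominal .05
print(simulate(pick_best_of_two_dvs=True))  # noticeably inflated
```

Stacking further degrees of freedom (covariates, exclusions, extra conditions) compounds the inflation in the same way.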
Checking data and adding subjects if p > .05 inflates Type I error rates (Simmons et al., 2011)
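As an illustration of optional stopping (my own sketch, not the published simulation; the starting n, batch size, and number of peeks are assumptions), repeatedly testing and adding subjects whenever p > .05 roughly doubles the false-positive rate:

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(7)

def p_two_sample(a, b):
    """Two-sided p-value for a two-sample mean comparison (normal approximation)."""
    se = sqrt(np.var(a, ddof=1) / len(a) + np.var(b, ddof=1) / len(b))
    z = abs(np.mean(a) - np.mean(b)) / se
    return erfc(z / sqrt(2))

def optional_stopping(n_sims=2000, n_start=20, n_add=10, max_peeks=3):
    """Type I error rate when you test, then add subjects whenever p > .05."""
    hits = 0
    for _ in range(n_sims):
        a = rng.standard_normal(n_start)  # the null is true: no real difference
        b = rng.standard_normal(n_start)
        for _ in range(max_peeks):
            if p_two_sample(a, b) < 0.05:
                hits += 1
                break
            a = np.concatenate([a, rng.standard_normal(n_add)])
            b = np.concatenate([b, rng.standard_normal(n_add)])
    return hits / n_sims

print(optional_stopping())  # well above the nominal .05
```

Each peek is another chance to cross the .05 threshold by luck, and the chances accumulate because the test statistic changes only a little when a small batch is added.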
How common are questionable research practices? (John, Loewenstein, & Prelec, 2012)
Sent anonymous survey to 5,964 academic psychologists at U.S. universities; 2,155 (36%) responded
Asked of each practice: Have you done this? Is it defensible? (0 = no, 1 = possibly, 2 = yes)

Item                                                                      Admission rate (%)   Mean defensibility
Falsifying data                                                                  0.6                 0.16
Wrongly claiming results are unaffected by demographic variables                 3.0                 1.32
Reporting an unexpected finding as having been predicted from the start         27.0                 1.50
Deciding whether to exclude data after looking at the impact on results         38.2                 1.61
(unlabeled item)                                                                15.6                 1.76
(unlabeled item)                                                                27.7                 1.77
How common are questionable research practices? (John, Loewenstein, & Prelec, 2012)
Initial response rate of 36%
33% of participants dropped out of the survey before finishing
Some participants argued that the questions were worded in a biased manner (e.g., Norbert Schwarz, 2012, listserv posting)
Masicampo & Lalande (2012): "A peculiar prevalence of p values just below .05"
Examined p-values from three prominent journals: JEPG, JPSP, PS
Collected 3,627 p values between .01 and .10 from 36 issues
With real effects, you expect relatively low p-values (an exponentially decreasing curve of p-values)
In reality, there are more p-values just below .05 than would be expected by chance
P-hacking: Engaging in questionable research practices in order to reduce the p-value to under .05
The shape of the distribution of p-values (the p-curve) can help to identify p-hacking
With a large effect size, p-hacking should not matter much
With a small effect or no effect, p-hacking will lead to more p-values just under .05
P-curves for different effect sizes with and without p-hacking (Simonsohn, Nelson, & Simmons, 2013)
Guidelines for authors:
1. Decide the rule for terminating data collection before data collection begins, and report that rule
2. Collect at least 20 observations per cell, or provide a justification
3. List all variables collected in a study
4. Report all experimental conditions
5. If observations are eliminated, report results with and without those observations
6. If covariates are included, report results with and without the covariate
Guidelines for journal reviewers:
1. Ask authors to follow the requirements above
2. Be tolerant of imperfections in results
3. Ask authors to show that results are robust (vs. hinging on a very specific type of analysis)
4. In some cases, require an exact replication
Increase sample size
Increase reliability of measures
Choose study designs that minimize error variance:
Clear and standardized instructions
Use controlled conditions
Design strong manipulations
Test and address assumptions
Control for covariates, when justified
Avoid multiple underpowered studies
Publish all relevant information (materials, sample size justifications, etc.)
Any Questions?