Replication
Replicability: the degree to which similar results are obtained if a study is repeated
Exact replication: Repeat the study using methods as close to the original as possible. Rare; difficult to publish (publication bias for novel research)
Conceptual replication: Use slightly different methods (e.g., measures, manipulations, sample) to test the same hypotheses. Very common (virtually required in some top-tier journals)
Replication-plus-extension: Add something to extend the results (e.g., another condition). Can help show that the results 1) replicate and 2) generalize
Priming a feature of a stereotyped group can yield behaviour that is consistent with the stereotype
They used words in a scrambled-sentence task to prime an elderly stereotype (e.g., old, wise, sentimental, bingo, retired, wrinkle) or neutral words (e.g., thirsty, clean, private)
After the study ended, they timed participants as they walked down the hallway
Examined the effects of presenting the color red on performance across 6 experiments
All studies used the same variables at the abstract (conceptual) level, but differed at the operational level
Experiment 1:
IV: ID number was written on the page in red, green, or black ink
DV: Number of anagrams solved correctly
Experiment 2:
IV: Cover page was red, green, or white
DV: Number of correct analogy items on an IQ test
High-profile fraud cases (e.g., Diederik Stapel, Dirk Smeesters)
Report that psychologists are reluctant to share data for reanalysis (Wicherts, Bakker, & Molenaar, 2011)
Focus on questionable research practices (Simmons, Nelson, & Simonsohn, 2011)
Widely ridiculed publication showing extrasensory perception effects (Bem, 2011) that failed to replicate (Ritchie, Wiseman, & French, 2012)
Bem (2011) reported 9 experiments supporting precognition of events before they occurred
Bem (2011) was published in the Journal of Personality and Social Psychology, a top-tier journal
The same journal subsequently rejected a manuscript that failed to replicate the finding (JPSP does not publish replications)
Ritchie, Wiseman, & French (2012) failed to replicate Bem's Experiment 9 across 3 pre-registered direct replications
Differences in methods (measures, setting, sample, etc.)
Random variation across samples
Mistakes made during data collection
The failure to replicate could be a Type II error (or the original study could be a Type I error)
Should Bem have been published? Editorial (Judd & Gawronski, 2011):
"We openly admit that the reported findings conflict with our own beliefs about causality and that we find them extremely puzzling. Yet, as editors we were guided by the conviction that this paper, as strange as the findings may be, should be evaluated just as any other manuscript on the basis of rigorous peer review. Our obligation as journal editors is not to endorse particular hypotheses but to advance and stimulate science through a rigorous review process." (abstract)
Choosing α = .05 does not mean the risk of a Type I error is 5%
How many of the effects that we examine actually exist?
How much power do we have to detect those effects?
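The arithmetic behind this point can be made concrete. A minimal sketch, using illustrative numbers (the prior probability and power below are assumptions, not figures from the lecture): the share of "significant" results that are false positives depends on how many tested effects are real and on power, not just on α.

```python
# Share of "significant" findings that are false positives, given alpha,
# statistical power, and the prior probability that a tested effect is real.
# The specific numbers below are illustrative assumptions, not from the lecture.

def false_positive_share(prior_true, power, alpha=0.05):
    true_hits = prior_true * power          # real effects correctly detected
    false_hits = (1 - prior_true) * alpha   # null effects wrongly "detected"
    return false_hits / (true_hits + false_hits)

# If only 10% of tested hypotheses are true and power is 35%,
# over half of all significant results are Type I errors:
print(round(false_positive_share(prior_true=0.10, power=0.35), 2))  # 0.56
```

With high prior probability and high power the share drops back toward α, which is why the two questions above matter.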
Direct replications are rare and conceptual replications are problematic Science is not always self-correcting
We don't know! (File drawer effect)
The Reproducibility Project (Open Science Collaboration, 2012)
Large-scale (>150 scientists) attempt to replicate studies
Currently replicating studies from 3 prominent psychology journals from 2008
What is the overall rate of replicability in psychology?
What predicts replicability of studies?
Begley and Ellis (2012) attempted to replicate 53 papers in top journals on cancer research
They focused on new (unreplicated) results
They did not replicate 47 (89%) of those studies
Homework Assignment
Pretend that you MUST obtain a statistically significant result in your group project at any cost. Try to change your analyses to find a statistically significant result. For instance, you could:
Exclude participants for any reason
Add control variables
Change scores that look unusual
The significant result does not need to be relevant to your hypotheses
Could you write a paper that makes sense of this significant result?
Practices in the collection, analysis, and reporting of results that inflate the risk of making a Type I error
False positive (Type I error): Incorrect rejection of a null hypothesis
More common (and perhaps less problematic) than outright fraud
False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis (Simmons, Nelson, & Simonsohn, 2011)
False positives are persistent because failures to replicate are not conclusive and are usually not published
They inspire future research that may waste resources
Using a conservative α (e.g., α = .05) does not solve those problems
Researcher degrees of freedom
Decisions made during data collection, analysis, and reporting
Can yield significant results, but inflate Type I error rates
Researcher degrees of freedom inflate Type I error rates (Simmons et al., 2011)
Simulations using randomly generated data
Proportion of significant results (Type I errors)
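A minimal sketch of this kind of simulation (my own illustration, not the authors' actual code; the two-DV setup, the r = .5 correlation, and the normal-approximation p-value are assumptions): under the null, measuring two correlated dependent variables and reporting whichever one "works" pushes the Type I error rate above the nominal .05.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(42)

def p_two_sample(a, b):
    """Two-sided p-value for a two-sample mean comparison (normal approximation)."""
    se = sqrt(np.var(a, ddof=1) / len(a) + np.var(b, ddof=1) / len(b))
    z = abs(np.mean(a) - np.mean(b)) / se
    return erfc(z / sqrt(2))

def simulate(n_sims=2000, n=20, pick_best_of_two_dvs=False):
    """Type I error rate under the null (no true group difference)."""
    cov = [[1.0, 0.5], [0.5, 1.0]]  # two DVs correlated at r = .5 (assumption)
    hits = 0
    for _ in range(n_sims):
        g1 = rng.multivariate_normal([0, 0], cov, size=n)
        g2 = rng.multivariate_normal([0, 0], cov, size=n)
        p1 = p_two_sample(g1[:, 0], g2[:, 0])
        if pick_best_of_two_dvs:
            p2 = p_two_sample(g1[:, 1], g2[:, 1])
            hits += min(p1, p2) < 0.05   # report whichever DV "worked"
        else:
            hits += p1 < 0.05            # honest single test
    return hits / n_sims

print(simulate())                           # close to the nominal .05
print(simulate(pick_best_of_two_dvs=True))  # noticeably inflated
```

Stacking further degrees of freedom (covariates, exclusions, extra conditions) compounds the inflation in the same way.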
Checking data and adding subjects if p > .05 inflates Type I error rates (Simmons et al., 2011)
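As an illustration of optional stopping (my own sketch, not the published simulation; the starting n, batch size, and number of peeks are assumptions), repeatedly testing and adding subjects whenever p > .05 roughly doubles the false-positive rate:

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(7)

def p_two_sample(a, b):
    """Two-sided p-value for a two-sample mean comparison (normal approximation)."""
    se = sqrt(np.var(a, ddof=1) / len(a) + np.var(b, ddof=1) / len(b))
    z = abs(np.mean(a) - np.mean(b)) / se
    return erfc(z / sqrt(2))

def optional_stopping(n_sims=2000, n_start=20, n_add=10, max_peeks=3):
    """Type I error rate when you test, then add subjects whenever p > .05."""
    hits = 0
    for _ in range(n_sims):
        a = rng.standard_normal(n_start)  # the null is true: no real difference
        b = rng.standard_normal(n_start)
        for _ in range(max_peeks):
            if p_two_sample(a, b) < 0.05:
                hits += 1
                break
            a = np.concatenate([a, rng.standard_normal(n_add)])
            b = np.concatenate([b, rng.standard_normal(n_add)])
    return hits / n_sims

print(optional_stopping())  # well above the nominal .05
```

Each peek is another chance to cross the .05 threshold by luck, and the chances accumulate because the test statistic changes only a little when a small batch is added.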
How common are questionable research practices? (John, Loewenstein, & Prelec, 2012)
Sent anonymous survey to 5,964 academic psychologists at U.S. universities; 2,155 (36%) responded
Asked of each practice: Have you done this? Is it defensible? (0 = no, 1 = possibly, 2 = yes)

Item                                                                      Admission rate (%)   Mean defensibility
Falsifying data                                                                  0.6                 0.16
Wrongly claiming results are unaffected by demographic variables                 3.0                 1.32
Reporting an unexpected finding as having been predicted from the start         27.0                 1.50
Deciding whether to exclude data after looking at the impact on results         38.2                 1.61
(unlabeled item)                                                                15.6                 1.76
(unlabeled item)                                                                27.7                 1.77
How common are questionable research practices? (John, Loewenstein, & Prelec, 2012)
Initial response rate of 36%
33% of participants dropped out of the survey before finishing
Some participants argued that the questions were worded in a biased manner (e.g., Norbert Schwarz, 2012, listserv posting)
Masicampo & Lalande (2012): "A peculiar prevalence of p values just below .05"
Examined p-values from three prominent journals: JEPG, JPSP, PS
Collected 3,627 p values between .01 and .10 from 36 issues
With real effects, you expect relatively low p-values (an exponentially decreasing curve of p-values)
In reality, there are more p-values just below .05 than would be expected by chance
P-hacking: Engaging in questionable research practices in order to reduce the p-value to under .05
The shape of the distribution of p-values (the p-curve) can help to identify p-hacking
With a large effect size, p-hacking should not matter much
With a small effect or no effect, p-hacking will lead to more p-values just under .05
P-curves for different effect sizes with and without p-hacking (Simonsohn, Nelson, & Simmons, 2013)
Guidelines for authors:
1. Decide the rule for terminating data collection before data collection begins, and report that rule
2. Collect at least 20 observations per cell, or provide a justification
3. List all variables collected in a study
4. Report all experimental conditions
5. If observations are eliminated, report results with and without those observations
6. If covariates are included, report results with and without the covariate
Guidelines for journal reviewers:
1. Ask authors to follow the requirements above
2. Be tolerant of imperfections in results
3. Ask authors to show that results are robust (vs. hinging on a very specific type of analysis)
4. In some cases, require an exact replication
Increase sample size
Increase reliability of measures
Choose study designs that minimize error variance:
Clear and standardized instructions
Use controlled conditions
Design strong manipulations
Test and address assumptions
Control for covariates, when justified
Avoid multiple underpowered studies
Publish all relevant information (materials, sample size justifications, etc.)
Any Questions?