Process-Tracing Methods
Derek Beach and Rasmus Brun Pedersen have completely reworked and
updated the first book-length guide for using process-tracing in social
science research. They begin by introducing a more refined definition of
process-tracing, differentiating it into four distinct variants, and explaining
the applications for and limitations of each. The authors develop the un-
derlying logic of process-tracing, including how one can understand causal
mechanisms. They walk the researcher through each stage of the research
process, starting with issues of developing mechanistic theoretical explana-
tions, formulated in either minimalist terms or as systems that detail how
a process could work in a given case. Building on recent advances in case-
selection techniques, the book elaborates guidelines for which cases are
appropriate for the different variants of process-tracing. The book then de-
velops an innovative framework that enables generalizations about mecha-
nistic claims to be made. The authors then develop guidelines for making
causal inferences using Bayesian logic in an informal, nontechnical fashion
and for translating empirical material into mechanistic evidence that may
or may not confirm how causal mechanisms operate in particular cases.
The book concludes with three chapters that provide a step-by-step guide
to using the different variants of process-tracing.
This updated book makes three major contributions to the method-
ological literature on case studies. First, it develops the underlying logic
of process-tracing methods in a level of detail that has not been presented
previously and thus establishes a new standard for the use of these methods
in tracing causal mechanisms in real-world cases. Second, by explaining
the application of Bayesian logic to process-tracing methods in a non-
technical manner, the volume provides a coherent framework for drawing
strong causal inferences within cases using mechanistic evidence. Finally,
the book develops guidelines for case selection and generalizing that are
appropriate for tracing mechanisms, whereas existing guidelines are blind
to the potential risk of mechanistic heterogeneity.
Widely acclaimed instructors, the authors draw on their extensive ex-
perience at the graduate level in university classrooms and professional
workshops across the globe.
Process-Tracing Methods
Foundations and Guidelines
Second Edition
Contents
Acknowledgments ix
Chapter 1. What Is Process-Tracing? 1
1.1. Introduction: The Three Components of Process-Tracing
as a Social Science Method 1
1.2. What Are We Tracing? 2
1.3. What Are Traces? 4
1.4. Case Selection and Generalization 6
1.5. Four Variants of Process-Tracing 9
1.6. A Primer on the Foundations of Case-Based Methods 12
Chapter 2. What Are We Tracing? 29
2.1. Introduction 29
2.2. The Nature of Causal Mechanisms 30
2.3. Five Common Questions Relating to Mechanistic Approaches 41
Chapter 3. Theorizing Concepts and Causal Mechanisms 53
3.1. Introduction 53
3.2. Defining Concepts to Be Compatible with
Mechanistic Explanations 54
3.3. Theorizing Mechanisms—Minimalist and Systems
Understandings 64
3.4. Level of Abstraction of Mechanistic Explanations 73
3.5. The Importance of Context and the Dangers of Mechanistic
Heterogeneity 77
3.6. The Temporal Dimension and Levels of Analysis 81
3.7. The Building Blocks of Mechanism-Based
Theoretical Explanations 84
3.8. How to Theorize about Mechanisms in Practice 86
Acknowledgments
dards for case-based designs in their own research. We are greatly indebted
to the participants in these courses on process-tracing methods, including
but by no means limited to the participants at the ECPR Summer Schools
(2011–17) and ECPR Winter Schools (2012–18), the IPSA Summer Schools
in São Paulo (2012–18), the Berlin Graduate School for Transnational Stud-
ies (2012–14, 2016–18), APSA short courses (2013–17), the Concordia Work-
shops for Social Science Research (2013–17), and the short courses at the
ICPSR Summer School (2016–17). We particularly thank our teaching as-
sistants at the ECPR and IPSA schools, who served as vital interlocutors and
sparring partners in developing our ideas for this book. We thank Terra Bu-
dini, Natália Nahas Carneiro Maia Calfat, Peter Marton, Hilde van Meeg-
denburg, Kim Sass Mikkelsen, Yf Reykers, and Camila Rocha. The picture
used on the cover is part of a visual depiction of a theorized causal mecha-
nism linking economic development and democratization developed by two
participants (Camila Rocha and Betina Sarue) in a PhD course at the IPSA
Summer School in São Paulo, Brazil.
Chapter 1
What Is Process-Tracing?
You know a conjurer gets no credit when once he has explained his
trick; and if I show you too much of my method of working, you will
come to the conclusion that I am a very ordinary individual after all.
—Sherlock Holmes, A Study in Scarlet
2010; Hedström and Ylikoski 2010; Waldner 2012). However, the essence
of mechanistic explanations is that we shift the analytical focus from causes
and outcomes to the hypothesized causal process in between them. That is,
mechanisms are not causes but are causal processes that are triggered by causes
and that link them with outcomes in a productive relationship. But beyond
this core point, there is considerable disagreement about the nature of mecha-
nisms. We discuss these issues in chapters 2 and 3. In chapter 2, we first
discuss two understandings of causal mechanisms that are not compatible
with the goal of tracing mechanisms using within-case studies: descriptive
narratives and intervening variables. Descriptive narratives are by definition
not causal because the causal linkage between events is not developed. While
viewing mechanisms as intervening variables can be relevant when engaging
in large-n cross-case analyses, it results in the black-boxing of precisely the
causal links that we wanted to trace because relevant evidence for the causal
effect of the intervening variable (i.e., the mechanism) is the difference that its presence/absence makes across cases that are otherwise similar in all other
factors. Instead of tracing the process as it played out within a case, we would
investigate cross-case variation. This means that we lose focus on the process
that links together causes and outcomes, which was our original objective.
The chapter then develops two different understandings of mechanisms
that are compatible with process-tracing and that are widely used in the
social and natural sciences, a minimalist understanding and a systems un-
derstanding. In a minimalist understanding, the causal mechanism—the arrow between a cause and an outcome—is not unpacked in much detail, either
empirically or theoretically. This understanding is appropriate to use early in
a research process, where many different plausible mechanisms might link a
cause and an outcome and we want to explore whether there is any evidence
for particular mechanisms, enabling us to focus our analytical attention on
one or a small handful of mechanisms. It is also appropriate as a follow-up to
in-depth process-tracing as an analytically less taxing tool to explore whether
similar mechanisms operate in other cases.
In a systems understanding, the core elements of a causal mechanism
are unpacked theoretically and studied empirically in the form of the traces
left by the activities associated with each part of the process. Mechanisms
are here viewed in a more holistic fashion, where the effects of a mechanism
are more than the sum of their parts. Each of the parts of the mechanism
can be described in terms of entities that engage in activities (Machamer
2004; Machamer, Darden, and Craver 2000). Entities are the factors (ac-
tors, organizations, or structures) engaging in activities, whereas the activi-
ties are the producers of change or what transmits causal forces or powers
through a mechanism. When a causal mechanism is unpacked theoretically
as a system, the goal becomes understanding how a process actually works
by tracing the operation of each part (or at least the most critical parts) in
one or more cases. This requires considerable analytical resources, meaning
that in-depth process-tracing is often done only after we have some idea that
there is something to look at. In-depth process-tracing is like bringing out
an electron microscope: we only do it after we have an idea that there might
be something to look at because we have engaged in case research using
minimalist process-tracing.
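To make the entity-activity vocabulary concrete, consider a minimal illustrative sketch in Python (not drawn from the book; the Part and Mechanism structures and the example content are invented for exposition):

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Part:
        entity: str                            # actor, organization, or structure
        activity: str                          # what the entity does that transmits causal force
        evidence_found: Optional[bool] = None  # mechanistic evidence for this part, once assessed

    @dataclass
    class Mechanism:
        cause: str
        outcome: str
        parts: List[Part] = field(default_factory=list)

        def untested_parts(self) -> List[Part]:
            # Parts for which no mechanistic evidence has yet been assessed
            return [p for p in self.parts if p.evidence_found is None]

    # Invented example: a two-part mechanism linking a cause and an outcome
    m = Mechanism(
        cause="economic development",
        outcome="democratization",
        parts=[
            Part("urban middle class", "mobilizes demands for political rights"),
            Part("authoritarian elites", "concede reforms to avoid unrest"),
        ],
    )
    print([p.entity for p in m.untested_parts()])

The systems understanding treats parts as integral elements of a whole rather than as independent variables; the structure above merely records the decomposition that in-depth process-tracing would then study empirically, part by part.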
Chapter 3 discusses how we can theorize causal mechanisms in process-
tracing. We start by developing how defining concepts (causes and out-
comes) differs in process-tracing from other methods, including case-based
comparisons and variance-based approaches. We then explore how theorized
mechanisms can look when we raise or lower the analytical level of abstrac-
tion from detailed, case-specific mechanisms to abstract, midrange theorized
mechanisms that may be present in a large number of cases.
causal mechanism in the studied case (Illari 2011). Generalizing these find-
ings to other cases requires the ability to document that other cases are caus-
ally similar to the studied case, meaning that we can then expect similar
processes to operate there also. However, given the contextual sensitivity of
mechanistic explanations, we argue that it is not enough to assume mecha-
nistic homogeneity in other cases based on cross-case comparisons because
of the risk of mechanistic heterogeneity produced by known/unknown con-
textual differences. In chapter 4, we uncover a range of different sources
of mechanistic heterogeneity that can result in flawed generalizations if we
merely assume that a similar process should operate in other cases that look
similar in cross-case terms. Instead, we suggest a snowballing-outward strat-
egy for exploring a bounded population to test whether or not mechanistic
heterogeneity is present, thereby reducing the risk of flawed generalizations.
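The snowballing-outward strategy can be summarized algorithmically. Here is a minimal sketch in Python (ours, not the authors'; trace_mechanism and the ordering function are hypothetical stand-ins for actual within-case studies and case-similarity judgments):

    def snowball_outward(studied_case, population, most_similar_first, trace_mechanism):
        """Expand from the studied case to progressively less similar cases,
        stopping when evidence of mechanistic heterogeneity appears."""
        confirmed, heterogeneous = [studied_case], []
        for case in most_similar_first(studied_case, population):
            if trace_mechanism(case):       # similar mechanism found in this case?
                confirmed.append(case)      # extend the bounds of generalization
            else:
                heterogeneous.append(case)  # a contextual difference is detected
                break                       # revisit scope conditions before continuing
        return confirmed, heterogeneous

    # Toy usage with an already ordered population and a stand-in tracing result:
    print(snowball_outward("case A", ["case B", "case C", "case D"],
                           lambda c, pop: pop,
                           lambda case: case != "case D"))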
A critical reader might view these arguments as a recipe for piecemeal,
noncumulative research, with process-tracing resulting only in idiographic
studies of mechanisms that operate in only one or a small number of cases.
This is a common argument put forward by neopositivist variance-based
scholars. Gerring writes that “social science gives preference to broad infer-
ences over narrow inferences. First, the scope of an inference usually cor-
relates directly with its theoretical significance. . . . Second, broad empirical
propositions usually have greater policy relevance, particularly if they ex-
tend to the future. They help us to design effective institutions. Finally, the
broader the inference, the greater its falsifiability” (2017: 234).
However, the alternative to taking mechanistic heterogeneity seriously
by appreciating the complexity of how processes play out in real-world cases
and the limited bounds of generalization of mechanisms because of contex-
tual sensitivity is to lift the level of abstraction about mechanisms to such
a high level that our theorized mechanisms are in essence nothingburgers
that tell us precious little, if anything, about how a process works in the real
world (for examples of nothingburger mechanisms, see chapters 3 and 4).
Despite Gerring’s claim about policy relevance, the field of policy evaluation
is increasingly interested in using process-tracing and the tracing of mecha-
nisms as an analytical tool to study how interventions actually work in par-
ticular contexts instead of working with broad propositions that tell us little
about how things work in the real world (Bamanyaki and Holvoet 2016;
Wauters and Beach 2018; Befani and Stedman-Bryce 2016; Cartwright 2011;
Cartwright and Hardie 2012; Clarke et al. 2014; Schmitt and Beach 2015).
The claim about breadth of claims is also increasingly under fire in the
natural sciences, where the appreciation of complexity and the contextual
Theory-Testing Process-Tracing
with a mechanism and its parts. The predictions about mechanistic evidence
should be as clear as possible, making it easier to determine whether or not
they are then actually found in the subsequent case study. The researcher
then collects and assesses the available empirical record to determine whether
mechanistic evidence suggests that the mechanism was present and worked
as theorized or whether the theory needs to be modified. If the predicted evi-
dence is found, we can then infer that the hypothesized causal mechanism is
present in the case and worked as we theorized. If evidence is not found for a
given part (or for the overall mechanism, if the minimalist understanding is
used), the researcher should engage in a round of theory-building using the
insights gained from the empirical analysis of what went wrong as inspira-
tion for building theories of new parts of the mechanism.
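The logic of this testing step can be compressed into a few lines of pseudocode-like Python (an illustrative sketch, not from the book; the mapping of parts to found/not-found evidence is invented):

    def assess_mechanism(predicted_evidence: dict) -> list:
        """Return parts whose predicted evidence was not found and that
        therefore call for a round of theory-building/revision."""
        failed = [part for part, found in predicted_evidence.items() if not found]
        if not failed:
            print("Evidence found for all parts: infer the mechanism operated as theorized.")
        else:
            print("Revise theory for parts:", failed)
        return failed

    assess_mechanism({"part 1": True, "part 2": False})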
the empirical record, with material collected without knowing what it is evi-
dence of yet. Here it can also be helpful to develop a descriptive narrative of
what happened in the case to shed light on potential mechanisms. The next
step involves inferring that the found observable empirical material is actual
evidence that reflects the empirical fingerprints left by the operation of a
plausible causal mechanism in the case. Tentative hunches about potential
mechanisms (and their parts in the systems understanding) are made based
on the first round of empirical probing, after which the researcher evaluates
whether any of the collected material is actually evidence of the tentative
mechanism (or parts of the mechanism). This evaluation of evidence pro-
ceeds in a slightly different fashion than in theory-testing process-tracing,
given that discussing the certainty of evidence is not relevant because it has
already been found; instead, one evaluates only the uniqueness in relation to
the tentative hypothesized mechanism or its parts.
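In informal Bayesian terms, this distinction is commonly formalized as follows (our notation, consistent with the informal treatment in chapter 5, not a quotation): the certainty of evidence E for a hypothesized mechanism H concerns how confident we are that E must be found if H is operative, while uniqueness concerns how improbable E would be if H were not operative:

$$\text{certainty} \approx p(E \mid H), \qquad \text{uniqueness} \approx 1 - p(E \mid \lnot H).$$

Because in theory-building the material E has already been found, the question of whether we would expect to find it is settled, and only $p(E \mid \lnot H)$—whether the found material could plausibly exist even if the tentative mechanism were not operative—remains to be evaluated.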
Evidence does not speak for itself. Theory-building often has a strong
testing element in that scholars seek inspiration about what to look for from
existing theoretical work and previous observations. Here, existing theory
can be thought of as a form of grid to detect systematic patterns in empirical
material, enabling inferences about predicted evidence. In addition, one can
look to research on mechanisms on similar research topics for inspiration
about how parts of the mechanism may look. In other situations, the search
for mechanisms is based on hunches drawn from puzzles that existing work
cannot explain.
Theoretical-revision process-tracing involves a combination of tracing a
mechanism in a deviant case, where a mechanism should have been opera-
tive but where it broke down, and using the information garnered about
where in the process the mechanism broke down to engage in a focused
comparison of what differs between a typical case where the mechanism
worked and the studied deviant case. The goal is to uncover unknown omit-
ted conditions that must be present for the mechanism to function properly.
relationship plays out. The temporal and spatial scope and the level of analy-
sis (micro, mezzo, or macro) of a case are defined by the causal theory with
which we are working. The temporal bounds are defined as starting with
the occurrence of a cause and ending with the manifestation of the out-
come. The spatial bounds of a case are determined by the scope in which
a theory operates, which could be a meeting room for social psychologi-
cal theories like groupthink that deal with decision-making within small
groups (Janis 1983) or the international system as a whole over many years
for international relations theories like neorealism (Waltz 1979). The level of
analysis also depends on the theory, ranging from individual-level analyses
to macrolevel theories dealing with collective actors (social groups, states,
international organizations, and so forth). Therefore, the bounds of a “case”
are determined by the causal theory with which we are working.
In any given case, a hypothesized causal relationship either has taken
place or not, which logically means there are no within-case likelihoods or
propensities of a relationship taking place. Based on our knowledge of con-
textual conditions, we might estimate that a given case has a higher or lower
propensity for the hypothesized causal relation to exist, but when we are
engaging in an actual study of that case, the relationship is either present or
not. In case-based research, because individual cases are the analytical point
of departure, we are working with a form of bottom-up analysis of causal
relationships, which then logically implies that we must adopt a determin-
istic and asymmetric understanding of causation if within-case analysis is
to make any sense. We now turn to the reasons underlying this argument.
a reason (or more realistically, a set of reasons) (Adcock 2007; Bhaskar 1979:
70–71; Mahoney 2008). This is a form of bottom-up claim about causation
(Russo and Williamson 2011) in which outcomes have occurred for a set of
reasons within a particular case, but moving upward from the individual
to a population can only realistically be done when we translate our claims
into probabilistic ones except if the population exhibits a “law-like” causal
homogeneity (Cartwright 1999: 154–59)—that is, if the same cause produces
the same outcome through the same process in all of the cases in the popula-
tion. This does not mean that we are always able to figure out empirically
why something happened in a case. However, just because we cannot figure
out empirically why something occurred does not mean that the outcome
was the product of randomness. Things do not “just happen”; things happen
for a reason.
The distinction between ontological probabilism and determinism is one
of the most misunderstood distinctions in the social sciences. One reason is
that many scholars conflate ontological and epistemological issues. Here we
are discussing the nature of causal claims (ontology), not how we can learn
about these causal relationships using empirical research (epistemology).
Building on the work of social science methodologists such as Goertz and
Mahoney and philosophers of science such as Illari, Russo, and William-
son, we argue that only a deterministic ontology is compatible with study-
ing mechanisms in single cases using process-tracing (Goertz and Mahoney
2012; Illari and Russo 2014; Mahoney 2008; Russo and Williamson 2011);
anything else makes the study of individual cases a relatively superfluous
exercise that merely illustrates that a found cross-case mean causal effect ac-
tually makes sense at the case level. The evidential heavy lifting of assessing a
probabilistic causal claim about a mean causal effect is, however, done by the
cross-case analysis, preferably in the form of a controlled experiment (e.g.,
Gerring 2011). At the same time, process-tracing should adopt a probabilistic
epistemology based on Bayesian logic (see chapter 5), which tells us that our
knowledge of the world will never be certain but instead that our degree of
confidence in a causal relationship being valid depends on the quality of
the evidence produced. Epistemological determinism would mean that we
can gain knowledge about causal relationships that is 100 percent certain,
which is naturally an impossibility outside of the realm of trivial facts and
conspiracy theories.
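The core of this probabilistic epistemology is Bayes' theorem. A minimal numerical sketch in Python (ours, with invented numbers; the book develops the logic informally rather than computationally):

    def update(prior, p_e_given_h, p_e_given_not_h):
        """Posterior confidence in hypothesis H after observing evidence E."""
        numerator = p_e_given_h * prior
        return numerator / (numerator + p_e_given_not_h * (1 - prior))

    # A weak prior combined with highly unique evidence (hard to explain
    # unless H is true) still yields substantially increased confidence:
    print(round(update(prior=0.3, p_e_given_h=0.8, p_e_given_not_h=0.1), 2))  # 0.77

The posterior never reaches 1.0 for any nondegenerate inputs, which is the formal counterpart of the claim that knowledge of the world is never certain.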
Surprisingly, the distinctive combination of ontological determinism and
epistemological probabilism has not been clearly developed in social science
methodology, although it has long existed in the philosophical literature
(Chatterjee 2011). Given the lack of development of this position, many ex-
isting accounts of case-based research methods adopt a weaker position as a
means of protecting themselves from critiques of epistemological determin-
ism by softening their deterministic ontological claims. For example, many
case-based scholars use such terms as an “almost deterministic” relationship
to refer to a pattern where C might be necessary in 19 out of 20 cases (e.g.,
Ragin 2000). In practice, this loosening has quite serious implications for
the building and testing of theories both within and across cases, given that
we never really know whether we have found an exception that proves the
rule (the 20th case where a case-specific condition meant that the relation-
ship did not work there) or whether there is a much more complicated
causal pattern (e.g., the cause has a positive impact in some of the cases but
a negative or no impact in other cases, depending on the context), or even
no relationship at all (spurious causation).
Another common misunderstanding is that ontological determinism is
the same as the theoretical determinism of structural, macrolevel theories.
Dunning (2017) conflates the two when he claims that deterministic theo-
ries make claims about outcomes being “inevitable.” If this argument holds,
there would be no place for human free will (i.e., agency) in deterministic theo-
ries. While this is true for strong structural macrolevel theories, ontological
determinism is also compatible with theories that incorporate agency. Indeed,
the claim that it is always possible for one or more conditions to interfere
with the workings of a theorized mechanism, leading to the process break-
ing down in a given case, is completely compatible with an ontologically
deterministic causal logic.
We can illustrate this conflation of structural deterministic theories
and ontologically deterministic understandings of causation itself by com-
paring a structural theory with a theory that incorporates agency. Waltz’s
(1979) structural realist theory of the international system leaves little if
any space for agency or contingency by either states in their foreign poli-
cies or individual leaders within states. The distribution of relative power
determines the type of international system that exists, acting as a struc-
tural straitjacket on states that produces the broad contours of their
foreign policies. Here, there is little wiggle room for state agency, especially
when viewed over the longer term. But this is a theoretical position on the
agent-structure/micro-macro debate (see section 2.4). Taking a position
about the causal priority of structure is not the same thing as understand-
ing causation in a deterministic fashion.
Indeed, many theories hypothesize that actor-level causes such as agency
the actual producing large deviations between the predicted and the actual
over time. We thus must be very careful in generalizing from a studied case
to other, nonstudied cases, but we do not necessarily need to adopt ontologi-
cal probabilistic models. Instead, when studying past cases, we should try to
understand these complexities and model them theoretically.
1. The “randomness” in the quantum world deals in particular with the indeterminacy of
properties, the nonlocalizability of quantum objects, and the nonseparability of quantum
states as a result of entanglement. For a good introduction, see Kuhlmann and Glennan 2014.
world than importing lessons from quantum behavior that cannot be ex-
pected to hold at our level of existence is that when we are making causal
claims across a number of cases, we can at best detect trends (e.g., mean
causal effects) because of the causal complexity of the social world.2 This
relates to the question of causal homogeneity/heterogeneity. A causally homogeneous set of cases is one where the same cause(s) produce the same out-
comes, whereas a set of causally heterogeneous cases is one where the same
outcome might be produced by different causes, or the same cause produces
different outcomes.3 For example, when explaining what causes democratic
transitions, we might find a cross-case pattern that suggests that higher lev-
els of inequality and distributional conflict in authoritarian societies make
democratic transition more likely (Haggard and Kaufman 2016). However,
the population of all cases of democratic transitions since 1980 is very het-
erogeneous because of the contextual conditions that differentiate the South
American cases in the 1980s from the collapse of authoritarianism in Central
and Eastern Europe in the 1990s. Therefore, Haggard and Kaufman find
that the cross-case trend of inequality and distributional conflict with transi-
tions does not hold in particular cases of democratic transition: for example,
in Czechoslovakia, a combination of the withdrawal of Soviet hegemony
and a desire to “return to Europe,” not the level of distributional conflict,
created conditions under which a transition became possible.
At the end of the day, whether the stochastic elements of the social world
result from inherent randomness or are the product of causal complexity
across cases is irrelevant—the methodological implications of ontological
probabilism are the same (Gerring 2005, 2011; King, Keohane, and Verba
1994: 211; Marini and Singer 1988). Variance-based scholars approach the
study of mechanisms and process using probabilistic terms, treating causal
mechanisms as analogous to intervening variables that make the occurrence
of outcome O more likely within a population of cases, other things equal.
The process is translated into a simple pathway “variable” whose causal effect
within a population of cases can then be assessed statistically (Weller and
Barnes 2014). The mean causal effect can then be assessed by investigating
cross-case trends.
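The variance-based translation can be illustrated with simulated data (an invented toy example, not an endorsement of the approach; by construction M is the only path from X to O here):

    import random
    random.seed(1)

    cases = []
    for _ in range(1000):
        x = random.random() < 0.5
        m = x and random.random() < 0.8   # pathway "variable" fires given the cause
        o = m and random.random() < 0.9   # outcome made more likely by M
        cases.append((x, m, o))

    def share_with_outcome(pred):
        sel = [o for (x, m, o) in cases if pred(x, m)]
        return sum(sel) / len(sel) if sel else 0.0

    # The cross-case difference the intervening variable makes, other things equal:
    print("p(O | X, M present):", round(share_with_outcome(lambda x, m: x and m), 2))
    print("p(O | X, M absent) :", round(share_with_outcome(lambda x, m: x and not m), 2))

Note that everything this analysis recovers is a cross-case difference in frequencies; nothing in it speaks to how M actually carried causal force from X to O within any single case, which is precisely the critique developed in chapter 2.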
Many scholars would argue that studying single cases is meaningless
2. As the rest of this book explores, complexity does not necessarily mean that we must
adopt probabilistic ontological assumptions; it can also mean that we just have to be more
cautious in the scope of generalizations being made.
3. Within variance-based approaches, the term typically used to refer to causal homogene-
ity is stable unit treatment value assumption (SUTVA). See Morgan and Winship 2007: 37–40.
when we are dealing with mean causal effects. Indeed, King, Keohane, and
Verba (1994: 129) go so far as to say that an investigation of one case tells
us nothing: “Nothing whatsoever can be learned about the causes of the
dependent variable without taking into account other instances when the
dependent variable takes on other values.” We could make the probabilis-
tic claim that when the star player on our favorite team (cause) plays, the
probability of our team winning increases. A single game in which the star
player was absent and the team lost would provide no information about the
relationship because there is no variation in the outcome of the game. The
probabilistic claim about the star player increasing the probability that our
team wins should instead be assessed by measuring the difference that the
star player makes across a number of cases, holding other factors constant.
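Formally, the variance-based estimand behind this claim is a difference in win rates across games (our notation):

$$\widehat{\Delta} \;=\; \Pr(\text{win} \mid \text{star plays}) - \Pr(\text{win} \mid \text{star absent}) \;=\; \frac{w_1}{n_1} - \frac{w_0}{n_0},$$

where $n_1$ and $n_0$ are the numbers of games with and without the star player and $w_1$ and $w_0$ the corresponding numbers of wins.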
If we still insisted on engaging in a case study in this variance-based
probabilistic world, King, Keohane, and Verba’s (1994) methodological
guidance is to transform one case into many to assess the mean causal effect that variation in the values of X has on values of Y across cases. In their opinion, a case can be disaggregated geographically
(e.g., moving from the federal (n = 1) to state level (n = 50)), temporally (e.g.,
before/after), or by splitting a negotiation into distinct issue areas. We might
disaggregate a game by treating quarters or halves as cases. We could then
investigate whether our team was winning when the star player was present
and losing when he was absent. However, as chapter 5 explores, we cannot
treat these cases as independent cases, which is required to assess difference-
making variation. We might have been winning in the first half of the game
when the star player was on the field. However, at halftime, the other team
rallies in the dressing room, and they return to the field so highly motivated
that they cause an injury to our star player. Our team then loses, but not
because of his absence but because the “outcome” of the first half (our team
winning) acted as a “cause” for the different outcome (loss) in the second
half. In other words, the value of the outcome in t0 influences the value of
a cause in t1.
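This dependence can be made explicit in a small simulation (an invented toy model: by construction the star player has no causal effect in the second half, yet a naive across-halves comparison still "finds" one):

    import random
    random.seed(2)

    halves = []  # (star_on_field, half_won) pairs treated as "cases"
    for _ in range(5000):
        first_half_win = random.random() < 0.6
        halves.append((True, first_half_win))   # star plays the first half
        rally = first_half_win                  # trailing opponents rally at halftime
        star_plays = not rally                  # the rally injures our star
        # The second-half result is driven by the rally, not the star's absence:
        second_half_win = random.random() < (0.2 if rally else 0.6)
        halves.append((star_plays, second_half_win))

    def win_rate(star_present):
        sel = [won for played, won in halves if played == star_present]
        return round(sum(sel) / len(sel), 2)

    print("win rate, star present:", win_rate(True))   # ~0.6
    print("win rate, star absent :", win_rate(False))  # ~0.2, despite zero star effect

Because the first-half outcome causes both the star's absence and the second-half result, the two halves violate the independence that difference-making comparisons require.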
Other scholars suggest that it can make sense to study single cases if we
are able to estimate the likelihood of finding a causal variable being op-
erative (here a mechanism) in a given case (Gerring 2007: 115; Humphreys and Jacobs 2015; Levy 2008: 12).4 This would take the form of
a propensity score, which is a function of the impact of the other variables
4. In the second edition of his book, Gerring (2017) suggests that single-case studies merely
serve an “illustrative function.”
present in a given case along with the level of the mechanism present in the
case if it is measured as an ordinal- or interval-scale variable. “A ‘least-likely’
case is one that, on all dimensions except the dimension of theoretical inter-
est, is predicted not to achieve a certain outcome, and yet does so” (Gerring
and Seawright 2007: 115). In contrast, “a ‘most-likely’ case is one that, on
all dimensions except the dimension of theoretical interest, is predicted to
achieve a certain outcome, and yet does not” (115). The logic, then, is that
if we find that the mechanism operates in the least-likely case, we increase
our cross-case estimate of the strength of the causal effect in the population,
whereas if we do not find the mechanism in a most-likely case, we down-
grade our estimate of the strength of the causal effect in the population.
However, this updating of cross-case propensities based on the mismatch
between predicted and actual propensity of a single case requires the very
strong assumption of cross-case causal homogeneity within a population of
cases (Cartwright 1999). This means that we must be able to assume that
all variables have a continuous and linear relationship with the outcome.
For example, what if a particular causal condition matters in a subset of the
cases (e.g., a geographic region) but not in the other cases? We might have
selected a “least-likely” case from this region, where we had expected that
the relationship would not hold because of adverse contextual conditions,
but nevertheless found that it did matter. Based on the logic of least-likely
cases, we would then make a generalization with more confidence that the
same relationship holds across all cases, also in other regions. However, this
inference would be flawed unless we can demonstrate that the strong as-
sumption of perfect causal homogeneity within the population holds. In
the real world, context matters for how causal relationships work. A causal
condition might work in a small set of cases but have precisely the opposite
effect in another context (see chapter 4).
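A toy simulation makes the danger concrete (invented numbers; C raises the outcome's probability in region A but lowers it in region B, so any generalization from an A-region least-likely case to the whole population misfires):

    import random
    random.seed(3)

    def outcome(region, c):
        base = 0.5
        effect = 0.4 if region == "A" else -0.4   # opposite sign in region B
        return random.random() < base + (effect if c else 0.0)

    for region in ("A", "B"):
        with_c    = sum(outcome(region, True)  for _ in range(5000)) / 5000
        without_c = sum(outcome(region, False) for _ in range(5000)) / 5000
        print("region", region, "estimated effect of C:", round(with_c - without_c, 2))

Pooling the two regions would suggest a near-zero average effect, illustrating why causal (and mechanistic) homogeneity must be demonstrated rather than assumed.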
expect it (a least-likely case), we should revise the theory about the context in
which the relationship holds rather than make stronger generalizations. For
example, when studying elite decision-making, we might have selected what
looked like a least-likely case for the relationship between efficient presi-
dential leadership (C) and rational decision-making (O), where in a severe
crisis situation, conditions such as the extreme stakes involved and the short
time frame for decisions would be expected to prevent efficient presidential
leadership (C), thereby resulting in poor decision-making processes (~O).
But if we then found in a case study that the outcome was actually rational
decision-making (O), we would not make a stronger cross-case inference
that C matters across the population of more typical cases because it mat-
tered when we least expected it. Instead, we would want to delve into the
case to understand why our theoretical expectations about the impact of the
adverse context were confounded, probing which factors would enable us to
more accurately delineate the contextual conditions required for presidential
leadership to matter. One plausible result would be that we revise our theory
to claim that the combination of extreme stakes and short time frame makes
actors more vigilant in their decision-making, as occurred in the Cuban
Missile Crisis (see Janis 1983).
In a case-based understanding of process-tracing, the tables are flipped.
Here the analytical focus is on understanding how the mechanism works
within single cases. Cases are the natural focus because causes and the mech-
anisms linking them with outcomes are causally operative within cases, not
across cases (Russo and Williamson 2011: 61). This means that the traces of
their operation are found within cases, making individual cases the analyti-
cal point of departure. This implies a bottom-up form of research in which
we first assess whether mechanism A was present and functioned as hypothe-
sized in a single case. The next question then is whether the same mechanism
is present in other cases, and if so, under what conditions mechanism A links
together the cause and outcome. But for this bottom-up approach to study-
ing mechanisms to make sense, an ontological deterministic understanding
of causation must be adopted.
For process-tracing in cases to make sense, therefore, we logically have
to adopt an ontological deterministic position. This does not mean that the
mechanism will always occur; whether or not it does is a function of the
presence or absence of the contextual conditions required to trigger it. Nor
does it mean that the outcome will always occur when the cause and mecha-
nism are present; this is dependent on whether other causal and contextual
conditions are present that together are sufficient to produce the outcome.
Our claims in process-tracing are asymmetric: that is, we make claims about
the mechanisms triggered by a cause (or set of causes) but make no claims
about other mechanisms triggered in other circumstances or what happens
when the cause is not present. Whether we make symmetric or asymmetric
claims has major implications for research design, notably with regard to
how we conceptualize concepts and mechanisms (chapters 2–3), how we
make causal inferences (chapters 5–7), and proper case selection (chapter 4).
Symmetric causation means that we hypothesize that there are causal
effects across different values of both causes and outcomes. For example, a
deterministic symmetric claim would be that low values of X result in low
values of Y and high values of X result in high values of Y. A probabilistic
symmetric claim would be that as values of X increase, so does the probabil-
ity that higher values of Y will occur. Symmetric claims are typically made by
variance-based scholars who build on constant conjunction and/or counter-
factual understandings of causation (e.g., Gerring 2011; J. Woodward 2004).
Chapter 2
What Are We Tracing?
2.1. Introduction
Even though the term process is in the title, there is widespread disagreement
about what we are actually tracing when engaging in process-tracing. Most
scholars acknowledge that process-tracing is tracing causal mechanisms
(Beach and Pedersen 2016a; Bennett 2008a, b; Bennett and Checkel 2014;
Checkel 2008; Hall 2008; Rohlfing 2012), but there is considerable confu-
sion and discord about what causal mechanisms actually are. These disagree-
ments about what we are tracing have resulted in considerable methodologi-
cal debate about what good process-tracing is.
This chapter cuts through this confusion and discord by developing two
distinct positions relating to the nature of causal mechanisms that are com-
patible with the goal of tracing processes empirically within cases. Both posi-
tions are relevant for process-tracing, but the appropriate choice depends on
the research situation. Early in a project, it can make sense to operate with a
minimalist understanding, where mechanisms remain relatively black-boxed
as mere “sketches” in which the parts and the causal logics linking them
are not specified. When there is considerable uncertainty about whether
a mechanism links a cause (or set of causes) and an outcome or multiple
mechanisms might plausibly link them together, it makes sense to engage in
a form of plausibility probe to see whether there is any evidence of a link be-
fore engaging in a detailed tracing of a full-fledged mechanism. In contrast,
tracing a full-fledged mechanism makes sense when we have a strong hunch
that a particular mechanism might be operative and want to test more rig-
orously whether it operates as we had theorized, or when there is strong
evidence of a causal relationship and we want to find what mechanism links
together the causes and outcome.
Drawing on recent advances in the philosophy of science related to
mechanisms and case-based methodologies, this chapter delineates the dif-
ferent positions in the debates about the nature of mechanisms, explain-
ing which positions are compatible with a case-based approach to study-
ing causal mechanisms using process-tracing. We first explain why tracing
events in a case is not the same thing as tracing a causal mechanism and
why mechanisms should be understood in case-based approaches as some-
thing more than mere intervening variables.1 After discussing what causal
mechanisms are not, we then explore two different conceptions of what
causal mechanisms are (minimalist and systems) that are compatible with
a case-based understanding of process-tracing. We discuss the methodologi-
cal consequences and trade-offs of adopting either a minimalist or systems
understanding of mechanisms. The chapter concludes with a discussion that
addresses five common questions regarding mechanisms.
research designed to make causal inferences. The second two positions in-
volve a minimalist and a maximalist (systems) understanding of the nature
of mechanisms. Causal mechanisms should not be understood as series of
events, given that a descriptive narrative is not the same thing as explaining
a process that links together causes and outcomes. An intervening-variable
understanding is not compatible with the goal of tracing the process using
in-depth, within-case analysis. Therefore, these two positions are depicted in
gray in table 2.1.
This leaves us with the minimalist and systems-understanding positions.
Both are compatible with a case-based understanding of process-tracing; the
choice of which one is more appropriate depends on the question in a par-
ticular research project.
Mahoney 2012: 571; Roberts 1996: 16; Suganami 1996: 164–68). This type of
research takes the form of an empirical narrative in the form of “actor A did
X to actor B, who then changed its position on issue Y,” etc.
Yet causal mechanisms are more than just a series of events. A sequence of
events tells us who did what but does not tell us why or how the events were
linked together in a causal sense. Describing a series of events can provide
a plausible descriptive narrative about what happened but does not shed
light on the causal question of why things happened. Causal explanation
therefore involves more than just tracing temporal sequences (Grzymala-Busse 2012: 1268–71). Tracing a series of events between C and O is in reality
merely crafting a “just so” story of case-specific happenings between C and
O, with the causal logic underlying the linkages between events completely
black-boxed. Therefore, sequences of events do not qualify as mechanistic
explanations (Craver and Darden 2013). The result of viewing mechanisms
as series of events is that the causal mechanism linking causes and outcomes
is completely black-boxed (Bunge 1997). This is depicted in figure 2.1.
Abell’s account of analytical narratives goes a bit further than suggest-
ing we should just trace events; rather, he contends that we need to de-
velop narrative structures that include action linkages between events that
build on subjective counterfactuals, where we ask actors who participated
in a process whether things could have been different at critical junctures
(2004: 295–96). However, this position has two problems. First, logically it
collapses back down onto the counterfactual understanding of causation,
which means that we shift our attention from how a process worked within
a case to the investigation of the difference that variations in action linkages
make for the outcome across cases (i.e., comparing the actual with the hypo-
thetical scenario imagined by a participant). Second, it significantly reduces
the scope of research questions to only those that can be assessed by asking
actors whether things could have been different.
Some scholars view mechanisms as an intervening or mediating variable
between C and O (Gerring 2007; King, Keohane, and Verba 1994; Weller
and Barnes 2014). For example, Mahoney (2015: 206) writes, “I use the term
mechanism to refer to a factor that intervenes between a cause and outcome.
I treat mechanisms in the same way as causes and outcomes; they are par-
ticular events or specific values on variables. Mechanisms are different from
causes and outcomes because of their temporal position: they stand between
a cause and outcome in time. Thus, in the expression X → M → Y, the
letters refer to events or specific values on variables, with X being treated
as the cause, M as the mechanism, and Y as the outcome.” In this section
we discuss the theoretical problems with treating mechanisms as variables,
whereas in chapter 5 we discuss the empirical challenges of claiming that we
are tracing mechanisms within cases by using evidence of difference-making
across cases.
At the theoretical level, treating a mechanism as a variable means logi-
cally that it collapses down onto a counterfactual understanding of causa-
tion (Woodward 2003: 350–58). Counterfactual causation is defined as the
claim that a cause (or mechanism) produced an outcome because its absence
would result in the absence of the outcome, all other things being held equal
(Lewis 1986: 160; Woodward 2003). Treated as variables, mechanisms have
an independent causal impact that must be assessed by measuring the im-
pact of the mechanism’s absence on values of the outcome across cases, with
everything else held constant—that is, using evidence of difference-making.
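In potential-outcomes notation, the counterfactual claim reads (our gloss):

$$M \text{ caused } O \quad \iff \quad Y(M{=}1) = 1 \;\wedge\; Y(M{=}0) = 0, \quad \text{all else held equal},$$

that is, the outcome occurred with the mechanism present and would not have occurred in its absence. The inferential work is done entirely by the contrast between the two states, not by any account of the process connecting them.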
Yet treating mechanisms as intervening variables (and thereby as
counterfactual-based claims) results in the theoretical black-boxing of the
causal links in our theory (Groff 2011). A “mechanism explanation for some
happening that perplexes us is explanatory precisely in virtue of its capacity
to enable us to understand how the parts of some system actually conspire to
produce that happening” (Waskan 2011: 393). In the words of Bogen (2005:
415), “How can it make any difference to any of this whether certain things
that did not happen would have or might have resulted if other things that
did not actually happen had happened?” Groff (2011: 309) claims that mech-
anisms are real processes that involve the exercise of causal powers in the real
world, not in logically possible counterfactual worlds. Machamer (2004: 31)
writes, “In the case of counterfactuals, the modality of the subjunctive con-
trast is somehow supposed to warrant the necessity of the actual causal case.
But even if this were true, and I am not sure it is, there still would be no
explanation, for they would still have forsaken the process of production by
which these certain entities and activities produced such and such changes,
which was what was to be explained.” The distinction between understand-
ing parts of mechanisms as lower-level counterfactuals and the mechanistic
account can be understood by using an analogy to an integrated electronic
circuit. In a counterfactual-based analysis, we would assess whether each
part turned on or not as current passed through. The evidence of difference-
making would be the difference made to the part by turning the current on
or not. In contrast, in a mechanistic account, while there is a counterfactual
“difference” overall between current on and off states, we are interested in
tracing how the current moved through each part of the system. Here, the
circuit would be assessed as a holistic system, using the circuit schematic to
understand the step-by-step transference of current from input to output.
Engineers “tracing” this system in detail might then look for traces in the
form of heat or photons to determine whether current leakage occurs in
particular parts, helping them design more efficient and robust circuits or
to understand where the circuit failed. In plain English, learning that some-
thing could have been different—that is, identifying a causal effect—is not
the same thing as learning about how a process works in an actual case.
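The analogy can be rendered as a sketch in Python (ours, with hypothetical part names; the point is the contrast between observing only the on/off difference and recording the signal at every part):

    PARTS = ["input stage", "amplifier", "filter", "output stage"]

    def transmits(part, signal, broken):
        # Hypothetical behavior: a broken part fails to pass the signal on.
        return signal and part != broken

    def counterfactual_test(current_on, broken=None):
        signal = current_on
        for part in PARTS:
            signal = transmits(part, signal, broken)
        return signal  # only the overall input/output difference is observed

    def mechanistic_trace(broken=None):
        signal, trace = True, []
        for part in PARTS:
            signal = transmits(part, signal, broken)
            trace.append((part, signal))   # the "fingerprint" left at each part
        return trace

    print("difference-making:", counterfactual_test(True), "vs.", counterfactual_test(False))
    print("trace with broken filter:", mechanistic_trace(broken="filter"))

The counterfactual test reports only that the output differs between current-on and current-off; the trace shows exactly where the current stopped flowing, which is what an engineer diagnosing the circuit, or a process-tracer studying a mechanism, needs to know.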
The counterfactual-based account on which an intervening-variable un-
derstanding of mechanisms builds has the result of transforming the within-
case tracing of causal mechanisms into a cross-case analysis of patterns of
variation, just at a lower level of aggregation. But by studying difference-
making by assessing the effect of variation of intervening variables between
a cause and outcome, we lose focus on the process itself that links the cause
and outcome (Bogen 2005: 415; Dowe 2011; Illari 2011; Mayntz 2004: 244–
45; Waldner 2012). This means that if we treat mechanisms as intervening
variables, we are no longer studying the causal links in the theory—that is,
how the process worked within a case. Yet this was precisely the reason we
wanted to trace the causal mechanism using process-tracing!
But our arguments here do not mean that mechanisms cannot be re-
searched using an intervening-variable approach. Indeed, there are excellent
methodological guidelines for designing research on mechanisms defined
as intervening variables (e.g., Weller and Barnes 2014). But understand-
ing mechanisms as intervening variables is not compatible with the goal of
learning how processes actually work within cases because studying interven-
ing variables by definition means that we need to explore the difference that
mechanism scheme would include more parts (e.g., Cause → Part 1 → Part 2
→ Outcome), but the causal logics linking the parts together are not described; they are merely depicted as arrows that are assumed to link parts
together in a relationship of conditional dependence. In both instances, the
theorized minimalist mechanism does not provide enough information to
answer fully the “how does it work” question (Craver and Darden 2013:
90–91).
An example of a theorized “minimalist” mechanism can be found in Nina
Tannenwald’s 1999 article on the impact of norms on U.S. decision-making.2
She theorizes that norms against the use of A-weapons (cause) (a nuclear “taboo”) contributed to U.S. decision-makers’ avoidance of using those
weapons (outcome), but the mechanism remains firmly within a theoretical
gray-box because no mechanism is detailed to link the cause to the outcome.
The closest she gets to unwrapping causal mechanisms is in the conclusion,
where she mentions three plausible links between norms and nonuse in the
form of minimalist “one-liners”: constraints imposed by individual decision-
makers’ personal moral convictions or domestic or world opinion (Tannen-
wald 1999: 462; see also Tannenwald 2007: 47–51). Yet these brief descrip-
tions do not explain the causal process that links norms with nonuse; that
is, how does the existence of the taboo actually produce behavioral changes?
How does the process work to link individual moral convictions to nonuse
in a situation where nuclear weapons might otherwise have been used? Do
these individuals have to deploy normative speech acts to shame other actors?
Given that we cannot answer these questions, we can conclude that the actual
causal process remains minimalist. This is depicted in figure 2.2, where the
process is gray-boxed because we are not told how the process actually works.
However, Tannenwald’s research situation warranted not unpacking mecha-
nisms in any detail because there was a low prior confidence in the existence
of a causal relationship between norms and nonuse (1999: 438). But after she
found within-case evidence of a relationship, the natural follow-up would
have been to probe mechanisms in more detail.
While a minimalist mechanism is by definition an inadequate mechanis-
tic explanation because key parts of the process are not unpacked and there-
fore are not subject to theoretical and empirical scrutiny, this is a deliberate
choice determined by the research situation. Of course, the analytical result
of not unpacking the mechanism in any detail theoretically means that the
2. For a more in-depth discussion of Tannenwald’s research design, including the theorized
mechanism and the evidence presented in her research, see chapter 7.
At its core, the systems understanding means that the core elements of the
causal mechanisms are unpacked theoretically and studied empirically in
the form of the traces that the activities associated with parts of the process
leave within cases. Mechanisms in this understanding are defined as systems
of interlocking parts that transmit powers or forces from a cause (or a set of causes) to an outcome (see, e.g., Beach and Pedersen 2016a; Bhaskar
1978; Bunge 1997, 2004; Glennan 1996, 2002; Machamer 2004; Machamer,
Darden, and Craver 2000; Mayntz 2004; Rohlfing 2012; Waldner 2012). In
the systems understanding, we are operating at a lower level of analytical
abstraction because we are trying to capture how actual causal processes play
out within cases. The level of abstraction can range from very detailed, case-
specific models of process to more abstract processes that can in theory be
present within a bounded population of cases.
The ambition is to unpack explicitly the causal process that occurs be-
tween a cause (or set of causes) and an outcome and trace each of its con-
stituent parts empirically. Here the goal is to dig deeper into how things
work by tracing each part of the mechanism empirically using mechanistic
evidence. In particular, observing the empirical fingerprints left by the activ-
ities of entities in each part of the process should enable us to make stronger
inferences about how causal processes actually worked in real-world cases
(Illari 2011; Russo and Williamson 2007). In comparison, in the minimalist
understanding, we have less direct mechanistic evidence because we have
not explicitly detailed the process, resulting in weaker inferences about the
operation of a causal process.
When mechanistic explanations are viewed as systems, they are under-
stood in a holistic way, with the effects of the mechanism more than the
sum of its parts (Sawyer 2004). Parts have no independent existence (i.e.,
they are not variables) in relation to producing an outcome; instead, they
are integral elements of a system that transmits causal forces to the outcome.
In a system, a complex interrelationship often exists between the parts of the
mechanisms, with the effects of the individual parts only manifesting them-
selves fully in conjunction with the effects of other parts. In the words of
Cartwright (2007: 239), “There are any number of systems whose principles
cannot be changed one at a time without either destroying the system or
changing it into a system of a different kind.” This means that a mechanism-
as-system explanation cannot be reduced to counterfactual dependencies.
Each of the parts of the mechanism can be described in terms of entities
that engage in activities (Machamer 2004; Machamer, Darden, and Craver
2000). Entities are the factors (actors, organizations, or structures) that en-
gage in activities, whereas the activities are the manifestations of the causal
powers of the entities in each part (Kaiser 2017; Piccinini 2017)—in other
Singular causation (also termed token causation) means that claims about
a causal relationship in a particular case can be made, whereas the regularity
position holds that causation by definition requires that the individual event
is subsumed under a more general relationship between cause and effects.
Regular associations can also be termed type causation.
In many respects, positioning in this debate is determined by founda-
tional assumptions regarding the nature of science itself (Groff 2011; Jackson
2016). Whereas neopositivism is typically associated with a neo-Humean
regularity understanding (e.g., Hume 1975), pragmatists and analyticists fa-
vor singular causation because they believe that real-world events are too
complex and multifaceted to compare, such as the causes of the start of
World War I. The critical or philosophical realist school has adherents to
both singular and regularity understandings of causation, although regu-
larity tends to come in the form of contingent generalization within very
bounded populations (e.g., midrange theories).
In the regularity understanding, for a relationship to be causal it must
occur more than once. In the philosophy of science, this position is asso-
ciated with David Hume. The neo-Humean understanding of causality as
nothing more than patterns of regular empirical association has traditionally
prevailed in social science (Brady 2008; Jackson 2016; Kurki 2008: 33–59).
Hume’s claim about causation as regularity can be understood by using the
example of a pen falling to the ground. We can observe that the pen falls
to the ground, but we cannot observe the gravitational forces that caused
the object to fall.3 Given this inability to empirically observe the forces that
link the cause with the outcome for most phenomena in the natural world
(at the time), Hume argued that we should merely define causes in terms
of constant conjunction (correlations) between factors: any theorization of
“undetectable” mechanisms would quickly, in his opinion, degenerate into
metaphysics. Causation is taken to involve the regular association between
C and O, controlled for other relevant possible causes (Chalmers 1999: 214;
Holland 1986; Marini and Singer 1988). Adherents of regularity deny that
causal claims can be made about individual outcomes. This can be seen
in King, Keohane, and Verba’s (1994: 81–82) definition of mean causal ef-
fects, where causation is defined in regularity terms when they write, “The
difference between the systematic component of observations made when
the explanatory variable takes one value and the systematic component of
comparable observations when the explanatory variable takes on another
3. At least until very recently. See Castelvecchi and Witze 2016.
nomena in the philosophical debate (e.g., Andersen 2012; Hedström and Ylikoski 2010) (i.e., they are midrange theories that exist within a bounded
population of cases) or whether singular causal mechanisms can apply only
in a particular case (Bogen 2005: 399; Glennan 2010, 2011; Russo and Wil-
liamson 2011; Waskan 2011).
However, the debate about whether singular mechanistic claims can be
made is actually more of an epistemic question than an ontological one. This
can be clearly seen in the arguments used for/against singular causal claims.
Arguing for regularity, Andersen (2012: 426) states that
a key task is essentially winnowing out, from all of the many entities
engaging in activities that are candidates for causally contributing to
the phenomenon, those that actually did so. Were we to provide “an
explanation” by citing all of the causal interactions that took place in
the spatiotemporal vicinity in question, we would be swamped by in-
formation, most of which would be irrelevant . . . Multiple instances
of some phenomenon allow us to judge the entities and activities that
were genuinely causally involved in the mechanism that produced
the phenomenon. Comparing different occurrences thus provides
the grounds to identify the stages that genuinely contribute to the
mechanism.
4. For positions in this debate, see Archer 2000; Coleman 1990; Emirbayer and Mische
1998; Giddens 1984; Hedström and Swedberg 1998.
5. We prefer the term empirical fingerprint, but it has the same meaning as observable mani-
festation.
Chapter 3
3.1. Introduction
1. We use both the terms influenced and produced. Produced implies sufficiency, which
of course depends on whether the cause (or set of causes) that triggered the mechanism is
sufficient to produce the outcome. We also reserve a place for tracing mechanisms trig-
gered by nonsufficient but contributing causes; influence denotes a nonsufficient but causal
contribution.
The minimalist understanding is useful when nesting a minimalist process-tracing case study in a broader research design, but it has the consequence that the mechanistic explanation remains within a gray box.
In contrast, the goal of the systems understanding is to formulate a full-
fledged mechanism-based explanation, where the theorized mechanism is
unpacked into its constituent parts in enough detail to explain the work-
ings of the process between the causes and the outcomes (e.g., Cartwright
1999; Craver and Darden 2013; Illari and Russo 2014; Machamer, Craver,
and Darden 2000; Russo and Williamson 2007).
The chapter then turns to the issue of the level of abstraction of our theo-
ries of causal mechanisms, which is an issue when operating with unpacked
mechanisms (systems understanding). Theorized mechanisms can range
from detailed, case-specific mechanisms to abstract, midrange mechanisms
that are formulated in such a way that they can in theory be present in many
different cases within a bounded population. We discuss issues relating to
the temporal bounds of the operation of mechanisms and differences be-
tween macro- and microlevel mechanisms.
Next, we turn to the challenges of context sensitivity and mechanistic
heterogeneity. Mechanistic claims are often very sensitive to the surround-
ing context, meaning that the same causes can trigger different mechanisms
in different contexts, a phenomenon that we term mechanistic heterogeneity.
Because mechanistic heterogeneity can be produced by contextual differ-
ences, when theorizing causal mechanisms it is also important to specify
the context in which the mechanism is expected to operate as theorized. Of
course, this information might not be known before engaging in research,
which is why chapter 4 suggests engaging in extensive probing of cases across
a bounded population to develop a better understanding of the context in
which the particular mechanism can be expected to operate.
The chapter closes with a discussion of modularity and building blocks of
mechanistic explanations in the social sciences before offering a set of practi-
cal suggestions for translating cause → outcome theories into minimalist or
full-fledged mechanistic theories.
Theoretical concepts such as war or democracy are very abstract, with many
different potential meanings. Therefore, before we can move to theorizing
about mechanisms, we first must define our causes and outcomes in more
concrete terms (Adcock and Collier 2001: 533).
2. Research situations can exist where we want to study mechanisms that should have
worked but broke down during the process. In such situations, we use theoretical-revision
process-tracing, which involves studying deviant cases where the cause(s) are present but the
expected outcome did not occur to learn about omitted conditions that also must be present.
See chapter 9.
3. For more on the distinction between homogeneity at the cause/outcome level and the
mechanistic level, see chapter 4.
Include Only Attributes That Are Relevant for the Mechanistic Claim
4. This attribute of democracy could be measured by using the “deliberative component
index” or the more specific “respect for counterarguments” measures in the V-DEM project’s
dataset of democracy. See Coppedge et al. 2017: 53–54, 203.
5. This attribute could be measured using the “electoral democracy index” of the V-DEM
project. See Coppedge et al. 2017: 49–50.
Defining the positive pole also involves setting a qualitative threshold—a cat-
egorical difference in kind—between cases that are in and out of the set of the
concept. A difference in kind demarcates a change in causal properties. Cases
that are members of the positive pole of a concept are expected to have the
causal properties that our theory hypothesizes, whereas cases outside the set
have either different or no causal properties. When working with mechanistic
claims, thresholds can be understood as the point at which a given mechanism
should be expected to become operative. In other words, the threshold is a
representation of an underlying pattern of asymmetric causation.
Mechanistic explanations are causal claims about the processes that link
together causes and outcomes. This section provides guidelines for what
“good” theorized causal mechanisms can look like in both minimalist and
systems understandings. When we are theorizing about causal mechanisms,
we are turning our attention away from causes and outcomes to focus ana-
lytically on what is going on between them.
While theorizing causal relationships in cause → outcome terms is (rel-
atively) straightforward, it is much more difficult to conceptualize causal
mechanisms, a challenge made even greater by the existence of two distinct
understandings of mechanisms in the literature (minimalist versus systems).
In the minimalist understanding, mechanisms are not unpacked in any de-
tail. In the systems understanding, the causal process is unpacked in greater
detail in an attempt to theorize a mechanism that ideally achieves what Ma-
chamer, Craver, and Darden (2000) term “productive continuity,” where
the mechanism composed of entities engaging in activities is theorized to
transmit causal forces in an uninterrupted process that links cause(s) and
outcome.
coordination effects (145). But how would learning really operate in terms of who
does what and why? What activities should we expect to see if coordination
is taking place? Because we are told little about how the process works, we
are left uncertain about the logics that provide the causal links in the pro-
cess. We are therefore unable to answer the how does it work question that is
the baseline for an adequate mechanistic explanation (Craver and Darden
2013: 90–91). The analytical result can be seen in the subsequent case studies
(Pierson 1996), where some mechanistic evidence makes plausible that fac-
tors like sunk costs and policy constraints are present, but we are left in the
dark about how the process actually works.
An incomplete mechanism scheme unpacks mechanisms slightly more by
including constituent parts of mechanisms, but the causal logics linking parts
together are still depicted as arrows that are assumed to link parts together in
a relationship of conditional dependence. However, the actual causal logic
linking parts together remains black-boxed. One good example of this type of
minimalist understanding of mechanisms is Waldner’s proposed use of causal
graphs as a technique for depicting causal mechanisms. Waldner (2014: 134,
135) depicts a causal graph as X → M1 → M2 → Y. Waldner, however, does not offer methodological guidance for how to trace such a graph empirically (e.g.,
Waldner 2014, 2015), while the guidelines for tracing mechanisms as systems
in this book provide a set of tools for doing so. However, Waldner’s frame-
work can be a good tool for probing the plausibility of a mechanism before it
is traced more rigorously and can also be used after in-depth process-tracing
to test the bounds of valid mechanistic generalizations.
An example of a minimalist mechanism can be seen in Ziblatt (2009).
He develops and tests a theory linking landholding inequality (C) to elec-
toral fraud (O) in Germany between 1871 and 1914, during which time “old
patterns of cooptation are recast as landed elites ‘capture’ local institutions,
thereby equipping landed elites with the institutional, coercive, and mate-
rial resources to subvert free and fair elections, even as their traditional social
power erodes” (3). The core of the article is a cross-case statistical analysis that
demonstrates a strong correlation between C and O (3–12). The cross-case
analysis is followed by a case study that aims to make the correlation more
plausible by attempting to answer the question “What was the causal process
through which this relatively abstract and distant concept of landholding
inequality actually operated on the ground leading to election fraud?” (14).
Given that the case study acts as a plausibility probe, the causal mecha-
nism is not unpacked in any detail. The closest we get to a theoretical de-
scription of the causal mechanism is where Ziblatt (2009: 14) writes that
the landed elites “exert influence indirectly via the capture of rural public
officials such as mayors, county commissioners, police officials, and election
officials, who in turn are the actors that interfere with free and fair elec-
tions.” This can be depicted as a causal graph: landholding inequality (cause)
→ landed elites capture rural public officials (part 1) → officials interfere in
elections (part 2) → electoral fraud (outcome). We are told earlier that this capture occurs through the deployment of institutional, coercive, and material resources, and that social resources, while on the decline, might also matter. But the causal logics binding the cause with part 1, part 1 with part 2,
and part 2 with the outcome remain black-boxed because we are not able to
answer questions such as What types of power resources do landed elites deploy
to capture officials? Does capture occur through the use of material resources such
as the power to control revenue or through control of appointment processes? Or
perhaps by deploying more institutional power resources? Do landed elites have to
actively intervene to capture officials, or do officials anticipate what landed elites
want? And once captured, what is the process whereby local officials actually en-
gage in electoral fraud? What types of actions do they use? Removal of voters from
electoral rolls, pressuring poll officers, and so on?
6. In Aviles and Reed’s (2017: 717–18) terms for mechanisms in sociology, we are operat-
ing with a “substantial standard” of what constitutes a mechanistic explanation, where the
mechanism works and the time span of its operation. Entities should be
formulated as nouns, but they should be described as things that can have
properties, structures, or orientations that enable them to engage in activi-
ties. They can be identified by their properties and spatiotemporal location.
For example, in social science mechanisms, common entities would be in-
dividual voters, lobbyists, or politicians or macroentities such as political
parties, social movements, or governments.
The activities in which entities engage move the mechanism from an
initial or start condition through different parts to an outcome. Activities
should be formulated as active verbs that describe what entities do in terms
of something that can transfer causal forces to the next part of the theorized
process. Examples in social science of activities could include voting, mobi-
lizing, lobbying, cajoling, demonstrating, attacking, and so on. For example,
a group of voters can vote for a politician, which might then lead the elected
politician to try to represent their interests, at least in theory.
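This entity-activity template can be made concrete with a short illustration. The following Python sketch is our own, purely hypothetical rendering (the Part structure and the voting example are invented for illustration, not a formal apparatus from the process-tracing literature):

from dataclasses import dataclass
from typing import List

@dataclass
class Part:
    entity: str    # noun: who or what acts (e.g., "group of voters")
    activity: str  # active verb: what the entity does
    target: str    # what the activity transmits causal force to

# A minimal two-part mechanism linking a cause to an outcome.
mechanism: List[Part] = [
    Part(entity="group of voters", activity="vote for", target="a politician"),
    Part(entity="elected politician", activity="works to represent", target="the voters' interests"),
]

def describe(cause: str, parts: List[Part], outcome: str) -> str:
    steps = [f"[{p.entity} --{p.activity}--> {p.target}]" for p in parts]
    return " -> ".join([cause] + steps + [outcome])

print(describe("election held", mechanism, "interests represented"))

Writing parts this way forces each step to name both an actor (noun) and the activity (verb) that transmits causal force to the next part.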
Figure 3.2 depicts a very simple, linear mechanism. But there is noth-
ing in the logic of theorizing mechanisms as systems that tells us that they
cannot include more complicated forms. For example, we might expect a
mechanism to split into two parallel mechanisms for part of the causal pro-
cess, or feedback loops and other forms of nonlinear relationships might ex-
ist. Multiple causes might affect the process at different times. Cause 1 might
trigger the process, but cause 2 might be required to take the process further.
For example, the initial parts of a theorized mechanism of democratization
Fig. 3.3. Causal mechanism linking policy dispute among states with policy adoption. Source: Based on O’Mahoney 2017.
Theorized mechanisms can range from case-specific causal mechanisms, which are highly specified mechanisms that describe how a causal process works in a particular case; through contingent mechanisms, which can be present in at least two cases; to
midrange mechanisms, which are processes that are described in quite ab-
stract terms but that still identify interlocking parts of the process in terms
of entities engaging in activities. Entities and activities in a case-specific
mechanism will typically be described using proper nouns (e.g., particular
people or institutions) and the specific activities in which they engage. In
contrast, in a midrange mechanism, the parts are quite abstract, focusing
only on the most causally critical elements shared across a range of similar
cases. Logically, lifting the level of abstraction by dropping specifics expands
the potential scope of generalizations about the mechanism, whereas low-
ering the level of abstraction involves further specification of parts of the
process, thereby narrowing the scope of potential generalization.
But in all instances, to qualify as a mechanism-based explanation in a sys-
tems understanding, we still have to answer the How does it work? question,
which requires that the causal arrow between causes and outcome be eluci-
dated in enough detail that the critical parts of the process are made clear in
terms of entities engaging in activities that link one part to the next (Craver
and Darden 2013: 31; Hedström and Ylikoski 2010: 53). We suggest thinking
about raising the level of abstraction using an analogy of reduction of sauces
in cooking, where the goal is to reduce the amount of liquid while preserv-
ing and intensifying the flavor. When theorizing mechanisms, we can reduce
a case-specific mechanism by cooking out the liquid (case-specific parts) into
the essence that still captures the key working parts of the process.
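The reduction analogy can also be shown schematically. In the hypothetical Python sketch below, raising the level of abstraction amounts to replacing proper-noun entities with role categories while preserving the activities that carry the causal work (the role_map and its entries are invented for illustration):

# Hypothetical illustration: abstracting case-specific parts to midrange parts
# by generalizing the entity to a role category, keeping the activity.
role_map = {
    "Argentine unions": "organized labor",
    "military junta": "autocratic regime elites",
}

def abstract_part(entity: str, activity: str) -> tuple:
    # Keep the causally critical activity; generalize the entity to a role.
    return (role_map.get(entity, entity), activity)

case_specific = [
    ("Argentine unions", "organize waves of strikes"),
    ("military junta", "represses protests"),
]
midrange = [abstract_part(e, a) for e, a in case_specific]
print(midrange)
# [('organized labor', 'organize waves of strikes'),
#  ('autocratic regime elites', 'represses protests')]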
To exemplify what a mechanism can look like as we raise/lower lev-
els of analytical abstraction, we utilize a case-specific mechanism de-
scribed in Haggard and Kaufman (2016). The authors provide a set of
case studies that can be interpreted as descriptions of very case-specific
mechanisms linking repressive autocratic regimes and economic griev-
ances (causes) with democratic transition (outcome). We ignore for the
moment the fact that the case studies sometimes veer toward narratives
of events where the causal logics linking parts together become foggy.7
7. Haggard and Kaufman put forward no real evidence to substantiate most of the key causal links in their case studies. For example, in the Argentine case, they write, “But the decision to invade was itself a major gamble that was directly precipitated by growing pressure from below and the increasing internal rifts within the regime itself that resulted from the surge in mass mobilization” (2016: 112–13). But they do not provide the evidence to back up this postulated causal link between the invasion and the pressures created by mass mobilization.
We also ignore the fact that the mechanisms described in different cases
differ widely (see chapter 4).
Instead, we utilize the analytical narrative in the Argentine case to flesh
out a case-specific mechanism and then demonstrate what it could then look
like if we raise the level of abstraction to both a less detailed but still contin-
gent mechanism and an abstract midrange mechanism that could travel to a
range of contextually similar cases.
Table 3.1 describes a fourteen-part collective-action mechanism based on
the narrative provided by Haggard and Kaufman (2016: 111–13). If we were
to lift the level of abstraction to a midrange mechanism, we would want
to ask ourselves what elements of this process are case-specific and what in
theory could be present in other democratic transitions in a similar context
of a labor-repressive regime with few channels of representation, a history of
union involvement in politics, and repeated economic crises.
Here it is obvious that the military invasion of the Falkland/Malvinas
Islands and subsequent lost war is a case-specific factor. This detail of the
Argentine case could be abstracted into a broader category of severe grievances with the regime that undermine its support among critical
constituencies. In addition, there is a repeated cycle of mass mobilization
followed by repression that then triggers further protests. While these partic-
ular events in the Argentine case and the postulated causal linkages in them
are case-specific, we could theorize that a midrange mechanism might have
to go through multiple iterations of the protest-repression cycle before the
threshold is reached to push toward democratic transition. This can be mod-
eled as a feedback loop that repeats several times. Which particular groups
eventually support the unions beyond human rights organizations is prob-
ably case-specific, but the critical point is that unions act as the spearhead
for broader middle-class mobilization. In other cases, this feedback cycle will
have a different face, but the key testable feature would be that we find an
iterated cycle where union protests and demonstrations are eventually sup-
ported by a broader middle-class coalition, resulting in a level of protest at
which further repression is no longer feasible, leading the regime to accept
a transition to democracy.
Figure 3.4 depicts a more abstract process in which the essential elements
of the Argentine case-specific mechanism are reduced into three critical
TABLE 3.1. A Fourteen-Part Collective-Action Mechanism in the Argentine Transition

Cause: Autocratic repression AND economic grievances
Part 1: Unions initially divided over whether to negotiate or oppose
Part 2: Military repression hardens resolve, and arrests of older conservatives clear the way for more combative younger militants
Part 3: More combative factions organize wave of strikes beginning in 1977
Part 4: In midst of deep recession in 1981, military high command puts in power more moderate leader (Viola)
Part 5: Unions respond by increasing pressure on regime, July 1981 strikes
Part 6: Strikes deepen divisions between hard-line officers and more moderate president
Part 7: Military command replaces Viola with hard-line officer (Galtieri)
Part 8: Militant opposition to regime increases; March 1982 sees largest demonstrations yet, now including the support of human rights organizations and political parties
Part 9: Government launches invasion of British-held Falkland Islands to divert popular pressures and rally military and society around nationalism
Part 10: Protests surge as information about losses spreads and economy deteriorates
Part 11: Army and navy withdraw representatives from ruling junta; army command appoints caretaker government (Bignone) to organize elections in negotiation with opposition coalition
Part 12: Question of amnesty for military crimes is sticking point in negotiations
Part 13: Formal proposal for legal amnesty withdrawn in face of massive opposition rally
Part 14: Military government abandons attempts, allows transition to competitive elections to go forward without preconditions in October 1983
Outcome: Transition to democracy in 1983
Source: Haggard and Kaufman 2016: 111–13.
parts: mass mobilization (part 1) and the iterative sequence of repression
and protest between ruling elites and union opposition (parts 2 and 3) that
finally results in democratic transition. In Haggard and Kaufman (2016),
this mechanism is then applicable in the case of Bolivia in 1982, where there
is a similar context, although in the Bolivian case it includes elements such
as church support of protests that are not theorized to matter in the Argen-
tine case (113–16). If we find evidence suggesting that we cannot understand
this process in the Bolivian case without including church support, and we
cannot categorize the church as “other societal groups,” we would conclude
that the midrange mechanism developed in the Argentine case does not even
travel to the Bolivian case. This collective-action mechanism does not travel
to other cases in the book, like Niger in 1985, where the role of unions and
economic crisis is replaced by another dynamic involving a weaker regime
and unions of state workers mobilizing within the state instead of the soci-
etal mass mobilization described in Argentina and Bolivia (see chapter 4).
The literature on causal mechanisms frequently points out that the ways
mechanisms unfold in a specific case are sensitive to the surrounding con-
text (Bunge 1997; Falleti and Lynch 2009; Gerring 2010; Goertz and Ma-
honey 2009; Steel 2004, 2008; Tilly 1995: 1601). Contextual conditions are
sometimes termed “scope” conditions in the literature, but the terms have
the same meaning. We understand contextual conditions as all “relevant as-
pects of a setting (analytical, temporal, spatial, or institutional)” in which
the analysis is embedded and that might have an impact on the constitutive
parts of a mechanism (Falleti and Lynch 2009: 1152; similarly, Mahoney and Goertz 2004: 660–61).
Contextual conditions can sometimes be difficult to distinguish from
causal conditions. It is relatively easy to distinguish the two when we are
operating with causal theories where C and O are linked together with a
causal mechanism. When operating with mechanism-based understandings
of causation, a cause is defined as something that triggers a mechanism,
meaning it is in a productive relationship with the outcome. Here a cause
does something, whereas a contextual condition is merely an enabler or in-
hibitor; it does not do anything active. For example, if we conceptualize a car
as being a mechanism that transfers causal forces from a cause (the burning
of fuel) to the outcome (forward movement), we might theorize that the
contextual conditions in which the car mechanism can be expected to oper-
ate include the presence of oxygen and relatively level ground. If we throw
the car mechanism into a lake, even though the mechanism might be in
perfect shape, it will not work, as it is outside of the contextual conditions in
which it will run. But the presence of oxygen or ground does not actually do
anything in a causal sense; only the absence of these contextual conditions
can prevent C from producing O.
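The asymmetry between doing and enabling can be stated schematically. In the hypothetical sketch below, contextual conditions only gate whether the process can run; they transmit no causal force of their own (the condition names are invented for illustration):

def mechanism_operates(cause_present: bool, context: dict) -> bool:
    # Context is an enabler, not a producer: it gates whether the causal
    # process can run but does not itself do anything active.
    permissive = context["oxygen"] and context["level_ground"]
    return cause_present and permissive

print(mechanism_operates(True, {"oxygen": True, "level_ground": True}))   # True
print(mechanism_operates(True, {"oxygen": False, "level_ground": True}))  # False: context inhibits
print(mechanism_operates(False, {"oxygen": True, "level_ground": True}))  # False: no trigger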
Even when the same causes and outcome are present, different contex-
tual conditions can trigger differences in the mechanisms linking them to-
gether (Falleti and Lynch 2009; Steel 2008). Context sensitivity of mecha-
nisms means that two cases that look causally homogeneous at the level of
conditions (same causes, same outcome) might be heterogeneous when we
move to the mechanism level. Mechanistic heterogeneity can mean either that in two or more cases the same cause triggers different processes, thereby resulting in different outcomes, or that the same cause is linked to the same outcome through different processes. Different processes can be
either a whole mechanism that is completely different or only one or more
parts that diverge between two cases.
Depicted in abstract terms, mechanistic heterogeneity between the same
cause and same outcome can be illustrated as in table 3.2, where there are
two scenarios: either completely different mechanisms link the same cause
and outcome in two different cases, or the mechanisms diverge at one part.
The parts that diverge are shaded gray. In both situations, a contextual differ-
ence exists between the two cases that produces mechanistic heterogeneity.
In situation 1, the same cause and outcome are linked through completely
different processes (parts 1a and 2a), whereas in situation 2, the two cases
diverge only in part 2. But inferring that the same process operated in both
cases based on studying only one case would be a faulty inference because of
mechanistic heterogeneity.
A real-world example of mechanistic heterogeneity produced by differences in context comes from a policy evaluation described by Howard White (2009).
He describes a mechanism that links a cause (World Bank funding of educa-
tion of mothers in nutrition) with an outcome (improved nutritional outcomes
for children) that was found to have worked in a case (the Tamil Nadu Inte-
grated Nutrition Project in India). The unpacked mechanism can be described
as: Cause (mother participates in program) → (1) mother receives nutritional
counseling → (2) exposure results in knowledge acquisition by mother → (3)
knowledge used by mother to change child nutrition → Outcome (improved
nutritional outcomes for children) (4–5). Based on the program’s success in
Tamil Nadu, the World Bank tried it in Bangladesh. However, the mechanism
did not function as expected in the different context and instead broke down.
The reason was a key contextual difference. In Bangladesh, mothers were not the
key decision-makers in households, with men doing the shopping and mothers-
in-law in joint households (a sizable minority) acting as decision-makers about
what food went onto the table. The mechanism therefore worked until part 3,
but a contextual difference caused it to break down in the Bangladesh case.
The problem of mechanistic heterogeneity cannot be resolved merely
Scenario 2
Case 1: Cause → part 1 (e1 * a1) → part 2 (e2 * a2) → Outcome
Case 2: Cause → part 1 (e1 * a1) → part 2a (e2a * a2a) → Outcome
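Once within-case findings are coded as sequences of parts, the two scenarios can be distinguished mechanically. A minimal sketch, using hypothetical part labels that mirror the table:

# Hypothetical sketch: flag mechanistic heterogeneity by comparing the
# sequences of parts traced in two cases that share cause and outcome.
case_1 = ["part 1 (e1*a1)", "part 2 (e2*a2)"]
case_2 = ["part 1 (e1*a1)", "part 2a (e2a*a2a)"]

# Indices (1-based) where the traced parts differ between the two cases.
diverging = [i for i, (p, q) in enumerate(zip(case_1, case_2), start=1) if p != q]

if not diverging:
    print("Same process traced in both cases.")
elif len(diverging) == min(len(case_1), len(case_2)):
    print("Scenario 1: completely different mechanisms.")
else:
    print(f"Scenario 2: mechanisms diverge at part(s) {diverging}.")  # -> [2]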
Causal mechanisms vary according to the time horizon of the causal forces
that produce an outcome and the time horizon of the manifestation of the
8. Randomized controlled experiments across a number of subsets of people would also shed light on contextual differences, although such a design would be very onerous and expensive because we would have to engage in experiments on a large number of different subsets. We would need a large number because we would have no knowledge about what contextual differences might matter. And even if we could do this, we would still not know why these contextual differences mattered because we would not have traced the operation or breakdown of the process.
change (outcome) and contends that institutional change is not always self-
evident. Different forms of institutional change are theorized to vary de-
pending on their time span, including evolutionary change (composed of
small incremental steps along a single path), punctuated equilibrium (where
nothing happens for long periods followed by a period of relatively rapid and
profound institutional change), and punctuated evolution (which features
long periods of evolutionary change followed by a rapid period) (33–35).
It is therefore important to theorize explicitly about the time dimension
involved in the workings of a mechanism along with how an outcome mani-
fests itself. Activities in an incremental mechanism will not be dramatic inter-
ventions but instead will be small nudges whose effects accumulate over time.
A longer-term mechanism will therefore look very different empirically from
a short-term mechanism. In particular, these differences manifest themselves
in the types of empirical fingerprints that an incremental, long-term mecha-
nism will be expected to have in comparison to a short-term mechanism
(see chapter 5). Here, we should expect small, almost unnoticeable empirical
traces that will only be found if we know what we are looking for.
Incremental mechanisms are also challenging to study empirically. This
type of mechanism produces very little observable evidence until the mecha-
nism has reached a cusp, after which a very sudden development (outcome)
occurs. This type of mechanism could be mistaken for a short-term mecha-
nism, but incorrectly theorizing that the outcome resulted from a short-term
mechanism would cause us to miss the longer-term incremental process that
actually produced the outcome.
Theoretical Brainstorm
Chapter 4
Derived from the sloppy search for causes and their location in
large events is the tendency to slight the importance of conditions
and circumstances under which the outcome occurred. When a
person constructs an account of events that stresses the overriding
importance of a few variables and simple connections between
them, he will learn a set of rigid rules that will not be a good guide
to a changing world. . . . The person will see the world as more
unchanging than it is and will learn overgeneralized lessons.
—Robert Jervis, Perception and Misperception in International Politics (1976: 230)
4.1. Introduction
This chapter develops guidelines for selecting appropriate cases for process-
tracing and for generalizing findings about mechanisms. In-depth case stud-
ies using methods like process-tracing only enable within-case inferences
to be made about how causal mechanisms work within the studied case.
To make insights travel to other cases, process-tracing case studies need
to be nested into broader cross-case comparisons, where the studied case
is compared with other cases to enable the generalizing inference that the
mechanism(s) found in the examined case(s) should also be present in simi-
lar targeted cases.
The first task is to select cases that are appropriate for engaging in process-
tracing. If we are interested in tracing a causal mechanism linking a cause or
set of causes and an outcome, we want to trace it in cases where it could have
been present, at least in theory. Tracing a nonexistent mechanism in a case
where we a priori knew it was not present tells us nothing about how the
mechanism works. Using case selection guidelines that are appropriate for
variance-based designs when tracing mechanisms using in-depth case studies
therefore creates the risk that we select analytically irrelevant cases or that we
make flawed generalizing inferences to other cases.
Selecting appropriate cases for process-tracing requires that we first map
a population of cases by scoring them according to whether the cause (or
set of causes) that is theorized to trigger a mechanism and the outcome
are present, along with contextual conditions that can plausibly affect how
the process works. Once we have done this mapping, we can distinguish
between four types of cases: (1) typical cases where the hypothesized cause,
outcome, and contextual conditions are all present; (2) deviant consistency
cases, where the known cause and contextual conditions are present but the
outcome is not present; (3) deviant coverage cases, where the cause(s) is not
present but the outcome is; and (4) irrelevant cases, where neither the cause nor the outcome is present.1 In process-tracing, typical cases are used for build-
ing and testing theories about mechanisms, whereas deviant consistency
cases shed light on why a process did not work as expected—for example, by revealing an omitted contextual condition that must be present for the mechanism to work.
While relevant for other methods, deviant coverage and irrelevant cases tell
us nothing about the mechanisms linking causes and outcomes and there-
fore have limited use for process-tracing.
We then turn to the issue of generalizing about mechanisms based on
successful process-tracing of mechanisms. If we have found a mechanism in
a case, how can we generalize these findings to other similar typical cases?
Unfortunately, most of the existing literature has focused on establishing
cross-case similarity as the main criterion for generalization, thereby assum-
ing mechanistic homogeneity (the same mechanism operating in cases that
share similar causes and outcomes). There are two forms of mechanistic het-
erogeneity: (1) where the same causes trigger different processes in two or
more cases, thereby resulting in different outcomes, or (2) where the same
cause is linked to the same outcome through different processes. This chap-
ter focuses on the second form because the risks of the first variant should
be reduced through careful mapping of the population by scoring cases on
1. The terms deviant coverage and consistency were developed in the literature on QCA. See,
e.g., Schneider and Rohlfing 2013.
their values of the cause, outcome, and contextual conditions. The contex-
tual sensitivity of mechanistic explanations—in particular, when unpacked
as systems—means that mechanistic heterogeneity might be lurking under
what might look like a homogeneous set of cases at the level of causes. After
having reviewed the deficiencies in the state of the art regarding generalizing
after process-tracing, we explore potential sources of mechanistic heteroge-
neity that can result from mapping using either variance- or case-based com-
parisons. This is followed by a snowballing-outward strategy for generalizing
about mechanisms within a bounded population of cases that can reduce
the risk of flawed generalizations about mechanisms. As discussed in the
introduction, while the risk of mechanistic heterogeneity produced by the
contextual sensitivity of mechanisms reduces the scope of valid generaliza-
tions about mechanisms, the goal of cumulative research about mechanisms
should be to delineate the proper bounds within which specific mechanisms
work, enabling us to more confidently claim that the mechanism will work
as hypothesized in a particular context. The alternative of lifting the level of
abstraction of our mechanistic claims to one-liners that can be generalized
across many cases is not attractive because we learn next to
nothing about how the process works in real-world cases.
The chapter ends with two examples (appendixes 1 and 2) illustrating
that the problems of mechanistic heterogeneity discussed in a hypothetical
example in the chapter are also present in published research.
When the research goal is to say something beyond a single case, after theo-
retical concepts are defined, the next step is to map a population of cases that
are relevant for tracing mechanisms. The overall mapping of a population
should include both positive and negative cases on both the potential cause
(or set of causes) and outcome, although in process-tracing, only positive
cases on the cause(s) are relevant for in-depth tracing of mechanisms. The
mapping should enable cases to be categorized based on similarities and
differences on key conditions and the outcome. The initial mapping can
include cases that score both positive and negative on the outcome.
Mapping a population involves scoring cases on values of the cause(s),
outcome, and relevant contextual conditions. Contextual conditions can be
defined as any factor that could impact how a process works. For example,
the degree of ethnic heterogeneity in a country might impact how societal
conflicts play out. Permissive contextual conditions are factors that must be present for a process to work, whereas inhibiting conditions prevent a given
mechanism from working. Modifying conditions are factors that impact how
a process works—for example, by shifting it from one pathway to another.
Mapping cases results in a simple dataset as depicted in table 4.1. Here,
there are four cases: two positive cases of the outcome (cases 1, 2) and two
negative cases of the outcome (cases 3, 4). If our theory was that C1 and C2
are causal conditions that together are sufficient to produce the outcome,
cases 1 and 2 would be cases in which it would be relevant to trace the
mechanism(s) linking C1 and C2 with O (positive, typical cases). Cases 1
and 2 differ on a contextual condition (C3), which potentially can impact
what mechanism links C1 and C2 with O.
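For illustration only, such a mapping can be kept as a simple dataset and filtered for the cases in which the theorized conjunction of causes is present (the case names and scores below are hypothetical, mirroring the logic of table 4.1):

# Hypothetical mapping of four cases on causes (C1, C2), a contextual
# condition (C3), and the outcome (O).
cases = {
    "case 1": {"C1": 1, "C2": 1, "C3": 1, "O": 1},
    "case 2": {"C1": 1, "C2": 1, "C3": 0, "O": 1},
    "case 3": {"C1": 0, "C2": 1, "C3": 1, "O": 0},
    "case 4": {"C1": 0, "C2": 0, "C3": 0, "O": 0},
}

# If C1*C2 is theorized as jointly sufficient for O, the relevant cases for
# tracing the mechanism are those where the full conjunction is present.
candidates = [name for name, s in cases.items() if s["C1"] == 1 and s["C2"] == 1]
print(candidates)  # ['case 1', 'case 2'] -- note that they differ on context C3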
There are many existing large-n datasets that may provide case scores on one
or more conditions being utilized. However, three modifications are typically
required before they can be used as the basis for mapping a population for
selecting cases for process-tracing. First, most existing datasets have already
defined concepts, which means that the attributes they include might not be
relevant for the mechanism(s) we are interested in studying.2 For example, if
we are interested in studying the mechanisms that link democratic systems
with more peaceful resolution of domestic conflicts, it would be problematic
to use an off-the-shelf dataset like the Polity IV aggregate measure of democ-
racy because it does not include the attribute civil liberties in its definition
of democracy (Marshall, Gurr, and Jaggers 2017: 14). Therefore, before exist-
3. Considerable uncertainty can of course exist about where the exact threshold should be
set, and it can vary across contexts. For this reason, case-based scholars tend to favor selecting
cases that are obviously in—what can be termed ideal-typical cases. An example would be
Sweden in 1985 as a member of the set of countries that have welfare states. This case would be
included in the set of welfare states irrespective of how the concept is defined and how high
the threshold is set.
Types of Cases
The result of a mapping of the population is that cases can be divided into
four types depending on whether they are members of the cause(s) and req-
uisite contextual conditions and the outcome: typical, deviant coverage,
deviant consistency, and irrelevant (Beach and Pedersen 2016a, b; Schnei-
der and Rohlfing 2013). In table 4.2, cases are divided into four quadrants
based on these types. Cases in quadrant I are “typical” cases, understood
as the cases where a priori we can expect the theorized C → O relation-
ship through the theorized mechanism to be operative. Quadrants II and IV
represent two different types of deviant cases: quadrant II is deviant cover-
age cases, where the cause(s) and/or the required contextual conditions are
not present but the outcome is; quadrant IV is deviant consistency cases,
where the cause(s) and contextual conditions are present but the outcome is
not. Cases in quadrant III are irrelevant and thus tell us nothing about the
mechanisms operative.
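The typology can be expressed as a simple decision rule. The function below is our own sketch; the quadrant labels follow the chapter, while the function itself is hypothetical:

def classify(cause_and_context_present: bool, outcome_present: bool) -> str:
    # Quadrant typology for selecting cases for process-tracing.
    if cause_and_context_present and outcome_present:
        return "typical (quadrant I): trace the mechanism here"
    if cause_and_context_present and not outcome_present:
        return "deviant consistency (quadrant IV): probe omitted contextual conditions"
    if outcome_present:
        return "deviant coverage (quadrant II): outcome not covered by existing theories"
    return "irrelevant (quadrant III): tells us nothing about the mechanism"

print(classify(True, True))
print(classify(True, False))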
Disregarding cases in quadrant III as analytically irrelevant clashes with
most existing case-selection guidelines (but see Mahoney and Goertz 2004).
According to Lieberman (2005: 444), we should select cases for in-depth
study based on the size of their residuals in relation to a regression-based
analysis of X:Y correlations. Gerring and Seawright (2007: 89) make simi-
lar claims, defining typical cases of a given X:Y relationship as regression
on-liers that could include cases in quadrant III. Yet these guidelines are
relevant only when investigating the magnitude of causal effects, where
variation between cases where the cause is present and absent is key to our
ability to make inferences based on evidence of difference-making (e.g.,
Gerring 2007; King, Keohane, and Verba 1994: 142–46; Lieberman 2005;
Tarrow 2010: 249). When making asymmetric claims, as we do when study-
ing mechanisms using process-tracing, positive typical cases are the most
interesting. For example, if we are studying mechanisms that link mutual
democracy (cause) to avoidance of war (outcome), tracing a mechanism in a
negative case where the cause and outcome are not present (two nondemoc-
racies went to war) would shed light on mechanisms that lead to war but not
the mechanism (if any) between democracy and peaceful conflict resolution.
While negative (aka deviant consistency) cases can provide important infor-
mation about contextual conditions, they tell us nothing about the actual
mechanisms and how they work in positive, typical cases.
Typical Cases—Quadrant I
4. While this guidance might seem similar to what Tarrow (2010: 251) terms “progressively
testing scope conditions,” here we are talking about within-case analyses that trace mecha-
nisms instead of using paired comparisons as a research strategy.
A single cause can also trigger different processes that lead to different outcomes, meaning that when engaging in process-tracing we have to choose which process(es) to focus on
and ignore others. For example, budget support to developing countries can
produce (at least hypothetically) a range of outcomes, among them an in-
crease in public goods and services, strengthened public sector institutions,
and increased corruption if mismanaged (see Schmitt and Beach 2015).
Given this, we would not be able to select a real-world case in which only
one of these mechanisms was triggered.
Cases within quadrant II are deviant coverage cases where existing causal
theories are unable to explain the outcome—that is, they are not covered
by existing causal theories. This type of deviant case can be used to find
new causes of the outcome, although we are skeptical about whether tracing
mechanisms is the most efficient methodological tool for these purposes, in
contrast to the arguments found in the existing literature (e.g., Lieberman
2005: 443; Rohlfing 2008: 1510; Schneider and Rohlfing 2013).
The basic idea is that one traces backward from the occurrence of the
outcome to find a new cause. But determining what one is actually tracing
under these circumstances is far from straightforward. Existing suggestions
that we engage in backward-tracing build on an understanding of mecha-
nisms that sees them as mere events. Yet tracing events is not the same thing
as tracing causal mechanisms that can be understood as theoretical systems
found in multiple cases (see chapter 2). And if we have no idea about the
cause, we also have no clue about the mechanism(s) linking the mystery
cause with the outcome, meaning that we are in effect blindly groping in
the dark. We therefore suggest that a comparative design would be a much
more efficient analytical first step in detecting potential new causes (Beach
and Pedersen 2016a: 241–45). For example, we might systematically compare
two cases that are similar in all aspects except the occurrence of the out-
come (O, ~O) (i.e., a most-similar-systems design). We would then want
to know what new, undiscovered cause is different between the two cases.
After having found a potential cause using a paired comparison, it would
then be possible to engage in an exploratory case study that focuses on this
new candidate cause, attempting to discern whether the new condition is
actually causally related to the outcome. Alternatively, one could use Mill’s
method of agreement, where we look to see what condition is shared across
a range of positive cases of the outcome to find a plausible cause (see Beach
and Pedersen 2016a).
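Both comparative strategies can be sketched schematically. In the hypothetical illustration below (all condition scores are invented), a most-similar-systems comparison isolates the condition on which two otherwise alike cases differ, and Mill's method of agreement isolates the condition shared across positive cases:

# Hypothetical sketch of the two comparative strategies for detecting a
# candidate cause. Condition scores are invented for illustration.
case_pos = {"Z1": 1, "Z2": 0, "Z3": 1, "NEW": 1}   # outcome present
case_neg = {"Z1": 1, "Z2": 0, "Z3": 1, "NEW": 0}   # outcome absent

# Most-similar-systems design: cases alike on everything except the outcome;
# the condition on which they differ is a candidate cause.
candidates_mssd = [c for c in case_pos if case_pos[c] != case_neg[c]]
print(candidates_mssd)  # ['NEW']

# Mill's method of agreement: conditions shared across positive cases.
positives = [{"Z1": 1, "Z2": 0, "NEW": 1}, {"Z1": 0, "Z2": 0, "NEW": 1}]
shared = [c for c in positives[0] if all(p[c] == 1 for p in positives)]
print(shared)  # ['NEW']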
Given that both theoretical and empirical uncertainty about the exact
threshold of membership can exist, we recommend a degree of caution
about selecting cases close to the qualitative threshold that demarcates caus-
ally relevant differences of kind. There can be theoretical and empirical rea-
sons why we are not exactly sure where the appropriate threshold is (Beach
and Pedersen 2016a; Ragin 2000: 224–25, 2008; Schneider and Wagemann 2012).
5. We denote guidelines as being “variance-based” when the comparative mapping of cases
is based on a statistical, large-n analysis or when it is based on probabilistic assumptions
about the propensity of individual cases to exhibit a particular causal relationship (most/least
likely cases). For example, Lieberman’s (2005) nested analysis approach is categorized as being
variance-based, given that case selection and generalization are based on large-n statistical
analysis.
6. We do not discuss Weller and Barnes’s pathway-analysis approach, given that it builds
on an understanding of mechanisms as intervening variables (2014: 14) and that they explicitly
state that we cannot learn anything about mechanisms by studying a single case (42).
regression line? Given that we are not told about the threshold at which a
mechanism is theorized to kick in, we might expect that similar mechanisms
also operated in cases 1 and 2, just to a lesser degree. Yet as discussed in chap-
ter 2, mechanism-based explanations are intrinsically asymmetric, meaning
that we should not expect the mechanism to operate in cases below the
threshold. In addition, tracing mechanisms in two cases that merely exhibit
maximum variation on values of X and Y does not inoculate us from the risk
of mechanistic heterogeneity produced by contextual differences (Zs). In a
standard regression model, these factors would be held constant to isolate
the difference that X makes on values of Y. However, this focuses our atten-
tion solely on the cross-case cause/outcome level, ignoring the potential that
the mechanisms linking X and Y might vary significantly in cases with dif-
fering Zs. Therefore, based on Lieberman’s guidelines, we might generalize
about a particular mechanism found in one high X, high Y case with small
residuals to all other cases with small residuals without exploring whether
different processes actually operate in cases with different values on the con-
trol variables. This would mean generalizing based on one or two cases to the
several hundred or thousand used in the large-n analysis. Given that Lieber-
man’s recommendations completely ignore the risk of mechanistic heteroge-
neity, we strongly suggest not using Lieberman’s guidelines when engaging
in mechanism-focused research.
7. Humphreys and Jacobs (2015: 671) recognize the potential for mechanisms to
differ across cases but provide no guidance for dealing with this problem.
8. It is not a simple proportional updating in terms of number of studied cases/size of total
population; instead, it is a more complicated multilevel Bayesian model. But the gist of the
generalization procedure is proportional.
TABLE 4.4. Different Definitions of Most Likely/Least Likely Cases in the Literature

Eckstein 1975: 118–20
• Cases categorized based on scores on variables, but if marked change in X has occurred and other causes remain constant, the case can also be crucial
• Most likely = X1 strongly predicts Y but X2 found to matter (disconfirmatory for X1)
• Least likely = X2 strongly predicts Y but X1 found to matter (confirmatory for X1)

King, Keohane, and Verba 1994: 209
• Cases categorized based on case scores on pertinent variables
• Most likely = “if predictions of what appear to be an implausible theory confirm with observations of a most likely observation, the theory will not have passed a rigorous test but will have survived a ‘plausibility probe’ and may be worthy of further scrutiny”
• Least likely = case that “seems on a priori grounds unlikely to accord with theoretical predictions—a ‘least likely’ observation—but the theory turns out to be correct regardless”

George and Bennett 2005: 109–25
• Cases categorized based on scores on variables
• Most likely = single variable X at such an extreme value that its underlying causal mechanism, even when considered alone, should strongly determine Y; if the predicted Y does not occur, then hypothesized causal mechanism strongly impugned
• Least likely = case least likely for causal mechanism and alternative hypotheses offer different predictions, but causal mechanism correctly predicts Y

Gerring 2007: 115
• Cases categorized based on scores on variables, in particular the outcome
• Most likely = “on all dimensions except the dimension of theoretical interest, is predicted to achieve a certain outcome, and yet does not”
• Least likely = “on all dimensions except the dimension of theoretical interest, is predicted not to achieve a certain outcome, and yet does”

Levy 2008: 12
• Cases categorized based on values of key variables or theory’s assumptions and scope conditions satisfied or not satisfied
• Most likely = case is likely to fit a theory but data from case does not fit, strongest if case is least likely for alternative theory
• Least likely = case is not likely to fit a theory but data supports theory, strongest if case is most likely for alternative theory
First, when causal claims are understood in deterministic and asymmetric causal terms (see chapter 2), the relevant distinction is possible and not possible. If a mechanism is present when we did not
theoretically expect it (a least likely case), we should revise the theory
about the contextual conditions under which the mechanism is pres-
ent instead of making a strong cross-case inference that the relationship
should also work in other more-typical cases.
Second, case studies produce within-case mechanistic evidence, updat-
ing our posterior confidence in a mechanism operating within a single case.
Inferring beyond the single case to the rest of the population requires that
it is causally similar to the studied case. However, given the large differences in the contextual conditions present in least likely or most likely cases versus more typical cases, and given the sensitivity of causal processes to contextual conditions that is typically assumed in case-based research, we should expect that cases with different likelihoods will exhibit high degrees of mecha-
nistic heterogeneity. This means that we cannot just infer that because we
found confirming evidence of a causal mechanism in a least likely case, it
should also be present in potentially causally dissimilar cases throughout the
population (e.g., in most likely cases).
Therefore, the Sinatra inference does not hold for process-tracing of
mechanisms. Irrespective of the musical virtues of the song, a case-based
scholar would rebut that Sinatra was basically wrong in claiming that some-
one who can “make it” as a crooner in a major venue in New York would
“make it” everywhere. He ignores the importance of context for causal rela-
tionships. For example, the style of music might matter, leading us to expect
that Sinatra probably would not “make it” in a bluegrass club in Nashville
or in a Chinese opera in Beijing. The type of audience would probably also
matter, meaning that just because Sinatra made it in a New York nightclub,
we should not infer that he would also rock an audience that is hard of hear-
ing in a nursing home in Albuquerque. Case-based scholars recognize that
context matters, but when extended to the level of mechanisms, this means
that we should be very cautious in generalizing about mechanisms across
causally heterogeneous sets of cases.
We therefore recommend the use of the term typical case without any
qualifications about likelihood in relation to cases. The critical distinction
is not the propensity of a causal relationship in a case in terms of most/least
likely but instead the dichotomous question of possible/not possible—that
is, whether a causal mechanism being operative is possible or not in a given
context.
and Rohlfing 2016: 541). Schneider and Rohlfing (2016: 535–39) suggest that
for tracing mechanisms, we should start by selecting a typical case that is
not a member of other conjunctions (unique membership). In other words,
we look for typical cases that are (1) either full members in both the final
configuration and the outcome (Ci = 1; O = 1) or as close as possible to that, and (2) exclusively belong to one term. Unique membership is arguably not
required when engaging in process-tracing because we can distinguish em-
pirically between different mechanisms in operation at the evidential level.
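The selection rule can be sketched in set-theoretic terms. The following illustration is our own (the membership scores and term names are hypothetical; this is a schematic of the rule, not Schneider and Rohlfing's software):

# Hypothetical fuzzy-set membership scores of cases in two sufficient
# solution terms (T1, T2) and in the outcome O.
cases = {
    "A": {"T1": 0.9, "T2": 0.2, "O": 0.8},
    "B": {"T1": 0.8, "T2": 0.7, "O": 0.9},   # member of both terms, so not unique
    "C": {"T1": 0.3, "T2": 0.8, "O": 0.7},
}

def uniquely_typical(scores: dict, term: str, other_terms: list) -> bool:
    # Typical: member of both the sufficient term and the outcome (> 0.5);
    # unique: not a member of any other sufficient term in the solution.
    return (scores[term] > 0.5 and scores["O"] > 0.5
            and all(scores[t] <= 0.5 for t in other_terms))

print([c for c, s in cases.items() if uniquely_typical(s, "T1", ["T2"])])  # ['A']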
After we have found a mechanism in the typical case, existing case-based
guidelines suggest that we can generalize to all other typical cases that belong
to the same conjunction (Schneider and Rohlfing 2013, 2016). The validity
of making such strong claims about the generalizability of process-tracing
findings is based on the assumption of mechanism homogeneity, according
to which, in the absence of model misspecification, “the same sufficient term
cannot give rise to different mechanisms” (Schneider and Rohlfing 2016:
555). Assumptions about homogeneity with respect to the causal effects on
the cross-case level are thereby extended into the sphere of within-case anal-
ysis—in other words, mechanism homogeneity is assumed to follow from
homogeneity of the cause-effect relation. This means in the context of QCA
that once causal heterogeneity across cases is delineated via identifying differ-
ent sufficient configurations, “cases of the same type are qualitatively identi-
cal and cases belonging to different types are qualitatively different,” which
is why “findings from the study of one, say, typical case, travel to all other
typical cases of the same term, but not beyond this term” (Schneider and
Rohlfing 2016: 556, see also Schneider and Rohlfing 2013: 30; Williams and
Gemperle 2017: 127).
However, given the risks of mechanistic heterogeneity, assuming homo-
geneity without exploring whether the assumption holds creates significant
risks for faulty generalizations about mechanisms.
While the assumption of causal homogeneity across cases that share similar
scores on causes and outcomes makes sense in the context of cause-effect
relations (see, e.g., Collier, Brady, and Seawright 2010: 41–43; Rohlfing 2012:
44–47), it becomes more problematic to assume mechanistic homogeneity a
priori based on homogeneity found at the cross-case level via either statisti-
cal or QCA-based analysis (B. Clarke et al. 2014: 349). In other words, how
do we know whether mechanistic heterogeneity is actually present across
cases that look similar at the level of causes and outcomes?
In many respects, it can seem strange that existing guidelines ignore the
risks of flawed generalizations created by hidden mechanistic heterogeneity.
In policy interventions, we would not blindly recommend the introduction
of a new policy based solely on evidence that it worked in one case without
exploring first whether the conditions required for the policy to work were
present in the target case. Why, then, are social scientists so willing to blindly
generalize about mechanisms without looking more closely at how the op-
eration of the process can be impacted by contextual factors?
Five potential sources of mechanistic heterogeneity can lurk in what
looks at the level of conditions/outcomes like a causally homogenous set of
cases. The first three sources are shared by variance-based (e.g., regression
results) and case-based (e.g., using QCA) mappings of populations, whereas
the final two are relevant only for case-based mapping using QCA. The five
potential sources of mechanistic heterogeneity are the product of
It is obvious that the same cause in a different context can trigger different
mechanisms or even no mechanism (see chapter 3). Unknown conditions
are the product of exclusion of conditions before the cross-case analysis.11
Known conditions are ones that are ignored when selecting cases because the
cross-case analysis found they did not matter for the identified causal effect/
invariant association.
Variance-based approaches typically ignore the problem of omitted con-
textual conditions. In the Lieberman (2005) guidelines, cases are selected
solely on values of X and Y, holding other conditions constant. The crucial
case logic (most/least likely) deliberately works against contextual condi-
tions by arguing that if we find a process working in a context in which we
least expect it to occur, we would be much more confident that the process
is also present in more likely cases. However, this logic is flawed because it
ignores the impact of context on how mechanisms work. If a mechanism
is present when we did not theoretically expect it (a least likely case), this
should result in revision of the theory about the contextual conditions un-
der which the mechanism is present instead of enabling us to make a strong
cross-case inference that the relationship should also work in more typical
cases. For example, when studying elite decision-making, we might have
selected a least likely case for finding a process linking presidential leadership
(C) and rational decision-making (O). In a severe crisis situation, contextual
conditions like the extreme stakes involved and the short time frame for decisions would be expected to hinder C's production of O. But if a least
likely case study then found that the outcome was actually rational decision-
making (O) and that a process linked the two (M1), we would not make
a stronger cross-case inference that C is linked through M1 with O across
the population of more typical cases because we found M1 where we least
expected it to function. Instead, the proper generalization would be that our
knowledge about the requisite contextual conditions needs to be updated.
To do this, we would want to delve into the studied case to understand why
our theoretical expectations about the impact of the adverse context were
confounded, probing the factors that enabled the process linking C and O to
function, thereby allowing us to delineate more accurately the contextual
conditions required for M1 to operate.
11. Schneider and Rohlfing (2016: 555) acknowledge this in a brief discussion of model mis-
specification as a source of potential mechanistic heterogeneity in QCA research.
12. We do not seek to criticize these works, since their primary purpose was not to in-
tegrate their analysis with further in-depth case studies or to hypothesize about the causal
mechanisms at play.
Even when a cross-case analysis finds that certain conditions are logically
redundant across cases, we cannot automatically infer that these
conditions do not matter for the mechanisms being operative within cases.
While this notion of difference-making makes sense for the cross-case level
(e.g., Baumgartner 2009; Ragin 2008; Rohlfing and Schneider 2016), it can
easily produce mechanistic heterogeneity because minimized conditions
might be important contextual factors that matter for how the causal
process plays out within cases. As an example, in a study of military (non-)
participation in the Iraq war, Mello (2012) finds a solution term linked to
troop deployment that involves the absence of parliamentary veto rights
and constitutional restrictions together with a right executive. According
to the QCA results, whether the center of gravity in the legislature tilts to
the right or the left of the ideological spectrum is logically redundant across
the set of cases. However, it is likely that mechanistic heterogeneity could
be present here, as it is easy to see why the actual causal processes leading to
military participation might differ in cases where a right executive is backed
by a right parliament—as in Australia or the UK—from cases where it had
to face a legislature with a more polarized or even ideologically juxtaposed
center—as in Spain.
We can illustrate the dangers of (un)known omitted conditions to mech-
anistic generalizations by turning to our hypothetical example. Table 4.5
depicts a subset of the full set of cases analyzed using the QCA (see table
4.11). Applying existing guidelines on combining process-tracing after QCA
(Schneider and Rohlfing 2013, 2016; T. Williams and Gemperle 2017) to
the cases included in table 4.11, we could have chosen case 4 as typical and
uniquely covered by the solution term C1*C2.13
TABLE 4.5. Mechanistic Heterogeneity Hidden behind Known and Unknown Omitted Conditions

Case  O  C1  C2  C3  C4  UC  Mechanism Operative                                   Comment
4     1  1   1   0   0   0   C1*C2 → part 1a → part 2a → part 3a → O               studied with initial process-tracing study
5     1  1   1   0   0   1   C1*C2*UC → part 1 → part 2 → part 3a → part 3b → O    mechanism differs at the final stage
3     1  1   1   0   1   0   C1*C2*C4 → part 1c → part 2a*2c → part 3a*3c → O      mechanism differs at every stage
13. In the absence of further guidelines, we could have equally selected any other most typi-
cal case—i.e., cases 1–3 in table 4.11—for in-depth process-tracing, which, of course, would
lead to the same problems.
A large congressional majority together with high presidential public approval
ratings (C1*C2) constitute the initial causal factors that trigger a
three-part mechanism. The majority party first searches
for a legislative proposal that matches the popular president’s agenda (part
1) and then introduces a congressional bill that closely tracks the president’s
priorities (part 2). Part 3 deals with the legislative bargaining, where opposi-
tion within the majority party is whipped into supporting the party line
both through the deployment of material resources (e.g., carrots in the form
of earmarked spending in districts),14 and through anticipation of the util-
ity of a popular president for representatives’ reelection bids. Condition C2
is also a direct cause in part 3 since it ensures the silence of the opposition
party. Once the majority has its troops in line through whipping and silenc-
ing, legislation favored by the president will pass (O). The theory qualifies as
a mechanistic explanation because the causal logics underlying key parts of
the story are made explicit—for example, that material power resources have
to be deployed to whip opposing party members into line.
In table 4.5, we label unknown omitted conditions UC, while known
omitted conditions are C3 and C4, both of which were identified to be
redundant in the QCA analysis in relation to the conjunction C1*C2. C3 is
a strong level of partisan conflict, and C4 is intensive presidential involve-
ment. Case 5, for example, shares the same configuration across all known
conditions with case 4 (C1 to C4), yet the actual causal process might look
different as a result of conditions that are not included in the explanatory
model. One reasonable scenario for a UC in case 5 could be that the bill’s
substance was contested between the House and the Senate, leading to the
two chambers passing divergent policy proposals. Because bills must pass
each house with the exact same wording, resolving differences between the
House and the Senate is crucial; however, it also changes the way the mecha-
nism plays out at the last stage of the bargaining process, where part 3a
(legislative bargaining) found in case 4 would include a second step in case
5 and would involve a conference committee of members of both chambers
(part 3b).
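To make this comparison logic concrete, the following sketch (our illustration; the case data are hypothetical and simply mirror table 4.5) flags cases that share every known condition with an already-traced case, making them candidates for probing unknown omitted conditions:

```python
# Hypothetical sketch: flag most-similar cases for probing omitted conditions.
# Case data mirror table 4.5; scores are illustrative, not from a real dataset.

cases = {
    4: {"O": 1, "C1": 1, "C2": 1, "C3": 0, "C4": 0},  # initial process-tracing study
    5: {"O": 1, "C1": 1, "C2": 1, "C3": 0, "C4": 0},  # same known conditions as case 4
    3: {"O": 1, "C1": 1, "C2": 1, "C3": 0, "C4": 1},  # differs on known condition C4
}

def known_condition_differences(case_a, case_b, conditions=("C1", "C2", "C3", "C4")):
    """Return the known conditions on which two cases differ."""
    return [c for c in conditions if case_a[c] != case_b[c]]

initial = 4
for other, scores in cases.items():
    if other == initial:
        continue
    diffs = known_condition_differences(cases[initial], scores)
    if not diffs:
        # Identical on all known conditions: if tracing finds a different
        # mechanism here, an unknown omitted condition (UC) is likely at work.
        print(f"case {other}: most similar -> probe for unknown omitted conditions")
    else:
        print(f"case {other}: differs on {diffs} -> tests those known conditions")
```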
But even if we constitute a very homogeneous population, thus keeping
the effects of potential causally important omitted conditions at bay, the
challenge of mechanistic heterogeneity might persist. As table 4.5 shows,
case 3 is an example of known omitted conditions, differing from case 4 on
condition C4. For example, it would be reasonable to expect a very differ-
14. Both parties stopped the practice of explicit earmarking in 2011. After 2011, this part of
the process worked differently—an example of mechanistic heterogeneity.
In QCA, concepts are calibrated as sets, with cases scored according to
their degree of fit to theoretical concepts (Ragin 2000; Schneider and Wage-
mann 2012). Moreover, combining several attributes into one set of a con-
cept when engaging in QCA is explicitly advocated to reduce the number
of conditions in an analysis with the goal of restricting the property space of
the truth table and consequently reducing the potential for limited diversity
(see, e.g., Ragin 2000, 321–28; Schneider and Wagemann 2012).
Yet whenever we define concepts and calibrate sets using more than one
attribute, we need to be attentive to the fact that these attributes can carry
different causal properties at the level of mechanisms. This becomes appar-
ent in a simple two-way OR combination of attributes but is even more
problematic if more complex concept-formation strategies are employed—
for example, radial concepts or indexes (Beach and Pedersen 2016a; Collier
and Levitsky 1997). The underlying logic with OR combinations is that two
attributes capture two different dimensions of the same concept but are sub-
stitutable in a causal sense (e.g., Goertz 2006, 39–46).
Multiple attributes in concepts can result in mechanistic heterogeneity
when the attributes that are equivalent from a cause-effects perspective at
the cross-case level trigger different mechanisms at the within-case level. For
example, strong left parties or strong unions can be perceived as functionally
equivalent causes of the creation of an established welfare state regime (Ra-
gin 2008; Schneider and Wagemann 2012), yet it is reasonable to expect
that the processes through which the two conditions produce a welfare state
would look very different—for example, through introducing policies inside
parliament versus mobilization of public pressure and strikes outside the
parliamentary arena.
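Because fuzzy-set QCA conventionally scores a logical OR as the maximum of the combined attributes, the hidden heterogeneity can be made visible with a small sketch (ours; all scores are hypothetical): two cases receive identical membership in the OR-combined concept while owing it to different attributes.

```python
# Minimal sketch of an OR-combined concept; all scores are hypothetical.
# Standard fuzzy-set convention: OR = max, so C4 = max(C41, C42).

case_3 = {"C41": 0.9, "C42": 0.2}  # direct negotiations with Congress
case_1 = {"C41": 0.2, "C42": 0.9}  # public activity / bully pulpit

for name, attrs in [("case 3", case_3), ("case 1", case_1)]:
    c4 = max(attrs["C41"], attrs["C42"])
    source = "C41" if attrs["C41"] >= attrs["C42"] else "C42"
    print(f"{name}: C4 = {c4} (driven by {source})")
# Both cases score C4 = 0.9 at the cross-case level, yet different attributes
# produce that score, plausibly triggering different within-case processes.
```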
In our example, let us assume that the president’s legislative engagement
(condition C4) is defined using two attributes that tap into two different
types of involvement: the degree of direct negotiations with members of
Congress (C41) or a high degree of presidential public activity (C42), which
were combined via the logical OR, treating the two dimensions as substitut-
able legislative instruments in the president’s toolkit. In case 3, let us assume
that a positive score on C4 is achieved because the president directly negoti-
ated with members of Congress (C41). We might have found the mechanism
operative in case 3 in table 4.6.
In contrast, in case 1, a positive score on C4 is achieved through another
attribute in the OR concept (C42): a high degree of presidential public activ-
ity. Here, we should expect that the process would look quite different from
case 3. For example, instead of part 1c, we could expect the president to hold
public speeches and rallies, using the bully pulpit to convince wavering legislators.
Fuzzy sets are often calibrated with anchors demarcating qualitatively different
“states,” which might imply that there are three categorical anchor points
establishing four different types of cases—those that are fully in, partially in,
partially out, and fully out. Yet from Schneider and Rohlfing’s (2016: 555) ar-
gument that inferences about mechanisms identified in one typical case can
travel to all other typical cases irrespective of their partial set membership,
we can infer that only the categorical difference established by the 0.5 thresh-
old matters for the presence of the respective causal mechanism. Mikkelsen
(2017), conversely, introduces the notion of partial mechanisms in his treat-
ment of fuzzy-set case studies, which mimics the logic of the cross-case level.
A cautious middle position regarding the causal role of fuzzy sets can also be
found in Schneider and Wagemann (2012: 30), which states that although
the categorical threshold trumps quantitative nuances, differences of degree
nevertheless constitute real gradual differences with respect to the cases.
While using information about differences of degree in fuzzy sets can
make sense in cross-case analyses using QCA, what this means for mecha-
nisms at the within-case level remains ambiguous. From a mechanistic per-
spective, we agree that categorical thresholds in concepts are crucial in trig-
gering different mechanisms because of the change of status in the concept’s
causal properties. However, we often do not pay enough attention to the fact
that the way fuzzy sets are constructed might include more than one categor-
ical difference at the level of mechanisms. When conducting process-tracing
case studies utilizing fuzzy sets, researchers therefore must be alert to hidden
categorical differences that may change the underlying causal mechanisms
connecting a configuration and an outcome.
We can clarify the underlying challenge by revisiting our hypothetical ex-
ample as shown in table 4.7. If only the categorical difference established by
the 0.5 anchor point matters for producing the causal mechanism, we should
be able to generalize the mechanism found in case 4 to all other typical cases
belonging to the configuration C1*C2, independent of levels of fuzzy-set
scores. However, if we translate the fuzzy-set values back into what they
actually mean at a conceptual level, we might doubt whether the processes
leading to presidential legislative success are really the same in cases that are
similar in kind but differ in degree.
In case 4, for example, a very popular president (C2 = 1) can rely on
support from an overwhelming partisan majority in both the House and
the Senate, including the sixty-vote supermajority in the latter necessary to
break a filibuster (C1 = 1), producing the mechanism described in figure 4.2.
In this process, the minority party has only limited leverage and avenues to
influence the legislative process. In the end, the president’s legislative pro-
posal passes largely unchanged.
Following existing guidelines, we would expect to find the same mecha-
nism at work in case 12, which is also a typical and unique member of con-
junction C1*C2. But there is a difference: here, the popular president has
only a slim congressional majority (C1 = 0.6). It seems reasonable to expect
that this difference of degree might affect how the mechanism unfolds. For
example, we can expect the negotiations in Congress to follow a different
script, with vital consequences for the dynamics in part 3 of the mechanism.
Since the president’s party lacks a sixty-vote supermajority, interparty bar-
gaining becomes crucial to convince enough senators from the other side of
the aisle not to block the legislation (part 3e).
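Under standard fuzzy-set conventions, membership in the conjunction C1*C2 is the minimum of the two scores, and the 0.5 anchor marks the categorical difference. A minimal sketch (ours; scores hypothetical, mirroring cases 4 and 12) shows how both cases clear the categorical threshold while differing substantially in degree:

```python
# Hypothetical sketch: both cases are qualitatively members of C1*C2 (> 0.5),
# but their degree of membership differs, which existing guidelines ignore.

cases = {
    4:  {"C1": 1.0, "C2": 1.0, "O": 1.0},   # supermajority, very popular president
    12: {"C1": 0.6, "C2": 1.0, "O": 0.8},   # slim majority, very popular president
}

for case_id, s in cases.items():
    conjunction = min(s["C1"], s["C2"])      # fuzzy AND = min
    member = conjunction > 0.5               # categorical 0.5 anchor
    consistent = conjunction <= s["O"]       # sufficiency-consistent with O
    print(f"case {case_id}: C1*C2 = {conjunction}, member = {member}, "
          f"consistent = {consistent}")
# Both print member = True, yet case 12's slim majority (0.6) plausibly
# routes the process through interparty bargaining (part 3e) instead.
```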
The problem of categorical differences lurking behind differences of de-
gree is far from hypothetical. We know, for example, from the literature in
psychometrics that categorical differences are often masked by degree differ-
ences in concepts (Mitchell 2011). In the QCA context, Hasebrouck (2015)
shows that fuzzy sets can mask categorical differences that become visible
when using crisp or multivalue sets. Going one step further, this means that
the possibility exists that other mechanisms stemming from categorical dif-
ferences link together Ci and O across various set levels. One real-world
example can be found in Samford’s (2010) study on trade liberalization in
Latin America. His case narratives based on the QCA show strikingly dif-
ferent processes at work in Peru (1990–95) and Uruguay (1972–85) although
both reflect the conjunction of hyperinflation and unconstrained executive
power. In particular, inflation plays a different causal role in the two cases, as
might be expected given that the inflation rate in Peru was at 7,481 percent
while Uruguay’s rate was only (!) 58 percent.
Against this backdrop, it becomes clear that assuming mechanistic ho-
mogeneity across cases displaying differences of degree is not the best choice.
Instead, we need to pay closer attention to the possibility that the sets em-
ployed at the cross-case level might conceal categorical differences in the
underlying concept and empirically probe whether these differences might
result in different processes at the within-case level.
15. This holds true irrespective of whether we look at configurations at complex stages—
i.e., single truth-table rows—or at configurations at later stages after the minimization pro-
cess. Elaborate designs such as temporal QCA (Caren and Panofsky 2005; Ragin and Strand
2008), which allow for including temporal ordering among conditions in the analysis, or
coincidence analysis (Baumgartner 2009, 2013), which concentrates on causal dependencies
among conditions, cannot uncover interplay between the conditions at the level of mecha-
nisms and hence do not account for explanations at the level of causal processes.
TABLE 4.10.—Continued
Columns: Problem / Part 1 / Part 2 / Part 3 / Differences created by . . .

(row continued from previous page): part 1a (president's party leadership searches for proposal that matches popular president's agenda); part 2a (president's party puts forward legislation); part 3a (internal bargaining by party leadership; silencing due to strong public support)

Problem 4: part 1f (president's party introduces bill as an item of a shared partisan agenda); part 2a; part 3a. Differences created by configurative dynamics: C2 does not play a role in part 1

Problem 5: part 1g (president's party introduces bill that includes some but not all of president's legislative priorities); part 2a; part 3g (internal bargaining by party leadership). Differences created by configurative dynamics (weakest link): C1 has a high score (large congressional majority); C2 has a low score (0.6) (president more popular than not)
Finally, it may turn out that we do not identify any mechanism in other
cases, which would mean that the case we selected for process-tracing is
largely idiosyncratic and that we should abstain from making generalizing
inferences beyond the single case.
Empirically testing the assumption of mechanistic homogeneity signifi-
cantly increases the analytic burdens for researchers to generalize based on
their single process-tracing case studies. Yet it is better to be right about a
little than wrong about a lot. To lighten the analytical load, we suggest three
tools that can be employed—either alone or in combination—to make the
snowballing-outward strategy of tracing mechanisms in multiple cases more
feasible in research practice.
One way to reduce the analytical costs of conducting multiple process-
tracing case studies is to focus only on critical parts of the identified mech-
anism. The critical stages of a mechanism can be determined by asking
whether certain parts of the process are particularly crucial from a causal
perspective and where we have theoretical reasons to expect that the pro-
cesses might most plausibly differ across cases (Steel 2008: 88–92). If we have
theoretical and/or empirical justifications for expecting that a given mecha-
nism will always have to travel through a particularly critical stage, we can
focus our analytical attention only on this stage in additional case studies,
engaging in more focused process-tracing case studies.
An alternative means of lightening the burden of engaging in a
snowballing-outward case-selection strategy is to develop signatures of a
particular process that can be tested across a number of cases. If our initial
process-tracing case study has unpacked a causal process linking a condition
(or set of conditions) with an outcome, we can look for empirical observ-
ables in critical stages of our theorized mechanism that (1) are highly theo-
retically unique for this critical part and (2) can be expected to have similar
evidential weight in other cases. In this way, we can utilize such observables
as a form of signature of the operative mechanism as a whole in other cases
instead of empirically tracing each part of the process, using a minimalist
version of theory-testing process-tracing (see chapter 8). Such a signature
can take the form of a single observable or a cluster of observables.
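As a rough illustration of the signature shortcut (our sketch; the cases and findings are hypothetical), additional cases can first be screened on the signature observable, reserving full tracing for cases where the signature is absent or ambiguous:

```python
# Hypothetical sketch of the signature shortcut: check a theoretically unique
# observable for a critical stage instead of tracing every part of the process.

# True / False / None = signature found / not found / evidence ambiguous.
signature_found = {"case 5": True, "case 2": True, "case 3": False, "case 7": None}

for case, found in signature_found.items():
    if found is True:
        print(f"{case}: signature present -> infer mechanism likely operative")
    elif found is False:
        print(f"{case}: signature absent -> theory-building process-tracing needed")
    else:
        print(f"{case}: ambiguous -> full (in-depth) process-tracing needed")
```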
Finally, researchers can explore multiple sources of mechanistic hetero-
geneity simultaneously in one analytic step to figure out what exactly causes
divergent processes across cases. In addition, it seems unlikely that a QCA
analysis or large-n statistical analysis will run into all problems simultane-
ously. Our review of existing applications of QCA and case studies in the
literature found that they experienced only some of the pitfalls. Researchers
can make the informed choice to skip certain steps if doing so can be sub-
stantiated based on theoretical and/or empirical grounds.
Table 4.12 summarizes our snowballing-outward approach and its ba-
sic logic regarding the five potential sources of mechanistic heterogeneity:
(1) known/unknown omitted contextual conditions, (2) multiple attributes
within conditions, (3) differences of degree, (4) configurational dynamics
related to the sequencing of conditions in a conjunction, and (5) different
rankings of conditions that are part of the configuration producing the out-
come. We start the stepwise exploration of mechanism homogeneity by ex-
amining the effects of potential omitted contextual conditions based on the
expectations that the risks of lurking mechanistic heterogeneity are greatest
in this scenario. Moreover, we elaborate the first step in detail and present
the procedure for the subsequent steps in an abbreviated manner since they
follow the same underlying logic. If we are operating with a mapping of
cases using a variance-based dataset,16 the data first must be translated into
set-theoretical terms by demarcating cases that are in and out of the sets of
all of the theoretical concepts.
Table 4.11 presents the set of cases whose fuzzy-set scores on C1, C2, and
O demarcate them as members of the conjunction C1*C2. In our example,
our initial case is case 4. Let us assume that our in-depth study of case 4 has
found confirming evidence for the operation of mechanism C1*C2 → part
1a → part 2a → part 3a → O (see figure 4.2). The question then becomes
how to generalize this causal inference about this mechanism to additional
cases. We should not simply assume mechanism homogeneity but instead
should test it empirically.
Given that we cannot repeat the process-tracing if there are more than
a handful of potential cases that are members of the conjunction and out-
come (table 4.11 has 31 cases), we need to strategically select cases to explore
the bounds of valid generalizations about the found mechanism. Based on
the stepwise logic summarized in table 4.12, we select cases that test one
potential source of heterogeneity at a time.
16. Statistical clustering analysis offers a particularly useful tool for mapping cases and clus-
tering them into sets that are relatively homogeneous at the level of causes. See, e.g., Ahlquist
and Breunig 2012.
TABLE 4.12. The Snowballing-Outward Strategy for Testing Mechanistic Homogeneity

1) Test for the importance of known/unknown omitted conditions (problem 1). Trace the mechanism in a most similar case. If a different mechanism is found (e.g., C1*C2*C3 → M1 → O but C1*C2*~C3 → M2 → O), engage in theory-building process-tracing to find the new mechanism (whole or part) and divide the population into subsets based on evidence of which mechanism is operative; continue sampling until all conditions are tested or a most different case is selected. If a similar mechanism is found, conclude that the omitted conditions do not matter.

2) Test for the importance of configurational dynamics, e.g., the temporal order of conditions (problem 4), using a focused comparison (e.g., C1*C2 → M1 → O versus C2*C1 → M2 → O). If mechanisms differ, divide the population into subsets along the last found condition; if a most different case was selected, go backward to a more similar case to figure out where mechanistic heterogeneity kicks in.

3) Test for OR relationships in conditions (problem 2) in a most similar case, continuing until all OR relationships in conditions are tested. If a similar mechanism is found, conclude that OR relationships in concepts do not matter.

4) Test for the impact of differences of degree (problem 3) in a most similar case (one difference of degree in a fuzzy set or in the composition of a condition). If a similar mechanism is found, conclude that differences of degree in conditions do not matter (a weaker claim if a most different strategy is used).

5) Test for ranking differences within the conjunction (weakest link) (problem 5) in a most similar case (smallest difference in ordering), e.g., C1 = 1*C2 = 0.6 → M1 → O versus C1 = 0.6*C2 = 1 → M2 → O; continue sampling until all differences are tested. If a similar mechanism is found, conclude that ranking differences within the conjunction do not matter, and the tested conditions can be recoded as crisp sets.

In every step, finding a similar mechanism indicates mechanistic homogeneity, whereas finding a different mechanism calls for theory-building process-tracing to uncover the new mechanism (in whole or in part) and for dividing the population into subsets along the last identified difference.
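The stepwise logic of table 4.12 can also be expressed as a simple loop. The sketch below is our own illustration: trace_mechanism and most_similar_case are placeholders standing in for actual process-tracing and case selection, and all labels are hypothetical.

```python
# Illustrative sketch of the snowballing-outward loop in table 4.12.

STEPS = [
    "known/unknown omitted conditions (problem 1)",
    "configurational dynamics, e.g., temporal order (problem 4)",
    "OR relationships within conditions (problem 2)",
    "differences of degree (problem 3)",
    "ranking differences within the conjunction (problem 5)",
]

def trace_mechanism(case):
    """Placeholder for an in-depth process-tracing study of one case."""
    return "M1"  # hypothetical result

def most_similar_case(step, population):
    """Placeholder case selection: pick the case differing only on `step`."""
    return population[0]  # hypothetical shortcut

def snowball(initial_case, population):
    baseline = trace_mechanism(initial_case)
    for step in STEPS:
        test_case = most_similar_case(step, population)
        if trace_mechanism(test_case) == baseline:
            print(f"{step}: similar mechanism -> homogeneity so far")
        else:
            # Theory-building process-tracing would follow here, and the
            # population would be split along the identified difference.
            print(f"{step}: different mechanism -> split population")

snowball("case 4", ["case 5", "case 2", "case 3"])
```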
Once we have figured out what different process operates in case 5, we need
to identify the omitted contextual condition by comparing cases 4 and 5. If
the mechanisms in the two cases initially run in parallel but then diverge,
the information about divergence can be used to inform a focused compari-
son of the two cases at this point. Once the omitted condition is identified
(UC) and all cases are scored on it, the population of cases can be split into
two subsets: one where UC is not present and M1 works, and another where
M2 operates because the UC contextual condition is present. While further
division of a population of cases might pose problems for engaging in a
subsequent QCA analysis—that is, increasing problems related to limited
diversity, skewed ratio between cases displaying the outcome and nonout-
come, and so forth—this is not problematic here because we are merely
using QCA as a tool to explore the bounds within which we can make valid
generalizations about mechanisms.
If we find evidence for a similar mechanism in case 5, we can conclude
that it does not look like unknown conditions matter, although we can by
no means be 100 percent confident about mechanism homogeneity, in par-
ticular if numerous plausible contextual conditions are omitted from the
dataset. In Pennings (2003), the set of cases spanned a number of different
contextual factors such as region and time (e.g., during or after the Cold
War, recently decolonized countries, and so on) that might be expected to
affect which processes are operative. In this type of situation, testing for
potential omitted conditions should proceed in a more strategic fashion,
with multiple cases sampled across plausible differences. If we still do not
find differences, then we can conclude that there is a relatively low risk of
lurking mechanistic heterogeneity as a consequence of unknown omitted
conditions.
Testing for known omitted conditions involves selecting a case in which
there are the fewest differences (case 2 or 3). Ideally, the impact of each
known omitted condition should be tested—for example, by starting with
case 2 (testing for whether the presence of C3 matters) and then following
with case 3 (testing for whether C4 matters). If we find that there is a similar
process, we can conclude that the known omitted condition does not appear
to matter, thereby enabling mechanistic generalizations to all similar cases.
In table 4.11, if we find M1 also present in case 2, we could then generalize
with more confidence that M1 should be present in all cases similar to case 1
and case 2—that is, irrespective of whether or not C3 is present. This analysis
would then be followed by an analysis of case 3 to determine whether the
presence of C4 matters.
When working with conjunctions that are found in a QCA for sufficiency,
the QCA analysis tells us nothing about whether the temporal ordering of
the conditions matters (Beach and Rohlfing 2018). Testing for temporal im-
portance can in principle be done in parallel with step 1. If C1 occurred
before C2 in case 4 and we found a different mechanism operative in case 2,
we would want to investigate whether evidence suggests that the temporal
ordering of C1 and C2 mattered. If we find either that C1 and C2 occur si-
multaneously or that C2 occurs before C1 in case 2, this could be a plausible
source of mechanistic heterogeneity that should be explored in more detail
using theory-building process-tracing together with a focused comparison
of the two cases. If evidence suggests that temporal ordering mattered for
the process triggered, we should divide the population into a subset of cases
where C1 occurs before C2 and a subset where C2 occurs before C1.
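A minimal sketch (ours; the onset dates are hypothetical) of splitting a population by the temporal ordering of C1 and C2 might look as follows:

```python
# Hypothetical sketch: divide cases into subsets by the temporal order of C1 and C2.
from datetime import date

onsets = {  # hypothetical onset dates of conditions C1 and C2 per case
    "case 4": {"C1": date(2001, 1, 10), "C2": date(2001, 3, 1)},
    "case 2": {"C1": date(2003, 5, 1),  "C2": date(2003, 2, 15)},
}

subsets = {"C1 before C2": [], "C2 before C1": [], "simultaneous": []}
for case, t in onsets.items():
    if t["C1"] < t["C2"]:
        subsets["C1 before C2"].append(case)
    elif t["C2"] < t["C1"]:
        subsets["C2 before C1"].append(case)
    else:
        subsets["simultaneous"].append(case)

print(subsets)
```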
The next step is to test for differences of degree: do we find a similar
mechanism in a case with a lower fuzzy-set score on a condition (e.g., case 12)
as in a case with a score of 1 (case 4)? If the answer is yes, we could then
generalize that quantitative differences in the condition also do not matter.
If we do not find a similar mechanism in the two cases, we should engage
in theory-building process-tracing to figure out what mechanism is present
in the second case. When we figure out what mechanism is operative, we
should recode the condition to incorporate this causal threshold relating to
mechanisms, reintroducing categorical distinctions into the measure. One
tool could be to recode cases using a recent methodological innovation—
what Skaaning, Gerring, and Bartusevičius (2015) term a “lexical scale.” A
lexical scale is a categorical ordinal scale where membership at each level is
determined by the presence of necessary and sufficient attributes in an ad-
ditive fashion. With this approach, each category in the scale demarcates a
causally relevant difference in kind. For example, if we found that similar
mechanisms operated in cases 4 and 9–12 but that a different mechanism
(M2) was present in case 13, we could recode condition C2 into two distinct
concepts, where cases with full membership in C2 are members of C2a and
cases with fuzzy-set scores of 0.8 or below are members of C2b, where an-
other mechanism operates. In the example of Samford (2010), the two very
different processes in Uruguay and Peru could be accounted for by a causal
threshold relating to mechanisms lurking within the degree differences in
the condition hyperinflation. In the case of Peru, extreme hyperinflation
produced a very strong imperative for reforms. This finding could lead to
the recoding of cases into two distinct categories of hyperinflation: moderate
(e.g., Uruguay) and extreme (Peru).
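As a hedged illustration (our sketch; the cut-point is hypothetical, since in practice it would be located through theory-building process-tracing), the recoding of degree differences into causally relevant categories could look like this:

```python
# Hypothetical sketch: recode a condition's scores into a lexical-style
# categorical scale at the threshold where a different mechanism kicks in.

inflation_rates = {"Peru": 7481.0, "Uruguay": 58.0}  # percent, from Samford (2010)

def recode_hyperinflation(rate, extreme_threshold=1000.0):
    """Illustrative causal threshold; the actual cut-point would be set by
    theory-building process-tracing, not chosen a priori."""
    return "extreme" if rate >= extreme_threshold else "moderate"

for country, rate in inflation_rates.items():
    print(f"{country}: {rate}% -> {recode_hyperinflation(rate)} hyperinflation")
```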
The final step tests whether it matters on which of C1, C2, or O a
case scores lowest. This could be done by testing
whether M1 is present in cases 19 (C2 and O lowest) and 23 (C1 and O low-
est). If we find that M1 is present in both, we have empirically validated the
homogeneity assumption regarding this set of cases, enabling us to general-
ize cautiously to other relatively similar cases (cases 20–22, 23–26). Ideally,
we would also explore whether differences of degree in this relative ranking
matters, but when a bounded population is medium-n, this would in most
instances be practically infeasible given the large number of possibilities.
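The ranking test can be made concrete with a small sketch (ours; the fuzzy-set scores are hypothetical, following cases 19 and 23) that identifies the weakest link of each case's configuration:

```python
# Hypothetical sketch: group cases by which element (C1, C2, or O) they score
# lowest on -- the "weakest link" whose ranking might alter the mechanism.

scores = {
    "case 19": {"C1": 0.9, "C2": 0.6, "O": 0.6},  # C2 and O lowest
    "case 23": {"C1": 0.6, "C2": 0.9, "O": 0.6},  # C1 and O lowest
}

for case, s in scores.items():
    lowest = min(s.values())
    weakest = sorted(k for k, v in s.items() if v == lowest)
    print(f"{case}: weakest link(s) = {weakest}")
```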
4.6. Conclusions
Fig. A.1a. A highly abstract one-liner mechanism in the Venezuela (Chávez 2) case
Source: Based on Kuehn and Trinkunas 2017.
In Ecuador, too, the trigger involved threatened prerogatives, although
the police rather than the military saw its prerogatives threatened.
Indeed, in the period before the “coup,” Correa had increased the
military budget by 91 percent between 2006 and 2009 (Rittinger and Cleary
2013: 413). In both Bolivia and Ecuador, there is little evidence suggesting
that oil rents played a causal role in triggering opposition among military
officials, as in Venezuela.
If we lower the level of abstraction in our theorization about the causal
mechanism from a minimalist mechanism to a more detailed description of
the key steps in the process, comparing Venezuela with other typical cases
suggests an even greater degree of heterogeneity at the mechanistic level. In
the Venezuela case, there was a period of growing popular protest against
the Chávez regime. The events leading up to the 2002 coup featured a cycle
of escalating protests and counterprotests, with the trigger occurring when
Chávez ordered the military to stop the protests (Kuehn and Trinkunas 2017:
871) in April 2002. Instead of obeying, military leaders rebelled by detaining
Chávez, fearing the legal and reputational costs of suppressing protests (871).
After three days, the coup collapsed because of popular opposition and mili-
tary infighting (871). The “mechanism” Kuehn and Trinkunas describe in the
Chávez 2 case is depicted in figure A.1b.
In contrast, as figure A.1c shows, in the 2009 Bolivian “coup,” the lower
level of abstraction mechanism would be one in which the military had
few grievances with the president (Eaton 2014: 1143) but in which the core
conflict was between regional elites and the indigenous groups that Morales
mobilized in support of his conflicts with the regional elites (Cyr 2015: 299–
300; Eaton 2014: 1142–45; Madrid 2013: 255; Torre 2013). The “coup” attempt
took place in this context, during a visit by Morales to a province that was
a bastion of opposition to his statist and ethnopopulist policies. Moreover,
the 2008 constitutional settlement had actually calmed the regional conflict
(Cyr 2015: 299; Eaton 2014: 1142), although the push for autonomy of gas-
rich areas was perceived as a threat to regional elites. In the run-up to the
“coup,” Bolivia did not see the same type of popular protests as occurred
in Venezuela, nor did the military actually take part in the “coup” attempt.
In the case of Ecuador (Correa 2), statism and radical foreign policies had
produced considerable upset among elites. In 2009 a decline in oil revenue
resulted in widely unpopular austerity policies. However, the actual “coup”
event resulted from protests and strikes by police officers in reaction to a
proposed new civil service law that cut wages and advancement opportuni-
ties. Correa confronted protesting policemen at a barracks, provoking an an-
gry response in which tear gas was used and a police officer allegedly kicked
the president’s knee. Correa was taken to a hospital and was rescued by
military units several hours later. Figure A.1d depicts this causal mechanism.
This example illustrates that what look like causally homogeneous cases
via a QCA-based comparison are not when we shift to the level of causal
mechanisms that actually played out. This example shows why we cannot as-
sume mechanistic homogeneity based on homogeneity at the cross-case level
found using QCA; instead, we must test whether the assumption actually
holds at the level of mechanisms to avoid making incorrect generalizations.
Based on our preliminary process-tracing in the other cases, we suggest that
the population of positive cases should be split into different subsets—for
example, by recoding the outcome into real coups (Chávez 2) and coup-like
events (Correa 2 and Morales 1).
Haggard and Kaufman also identify other important factors in the case
narratives. They engage in seven different case studies out of a total
population of roughly forty cases,17 providing analytical
narratives of events occurring from the start of serious disturbances to the
final democratic transition.
As discussed in chapter 3, these narratives are not theorized mechanisms
because they do not flesh out the underlying causal mechanisms18 in the
seven cases, nor do they provide evidence of causal links. Unfortunately,
Haggard and Kaufman’s (2016: 128) descriptions of the mechanisms remain
at the level of very abstract one-liners: the pathway is “credible and sustained
mass mobilization” (although they also note that the processes played out
in many different ways: “Violence and direct displacement of incumbents
did occur in some of the distributive conflict cases, but the dominant pat-
tern of exit was credible and sustained mobilization that raised the costs of
continued authoritarian rule. Elections did play a focal point in some cases,
as did political parties, but even where they did, it was typically in conjunc-
tion with the mobilization of both protest and grievance on the part of civil
society organizations”).
Using our snowballing-outward strategy and unpacking mechanisms in
more detail than one-liner “stylized arcs,” based on the Argentina case, they
might have theorized a midrange abstract mechanism as depicted in fig-
ure A.2. When we look at the empirical narrative for the Bolivian case (the
most similar case), we find a relatively similar mechanism in operation, even
though in the Bolivian case there is a potential omitted contextual condition
(church support of protests) that did not matter in the Argentine case but is
important in Bolivia.
The next step in exploring the bounds of generalization of the union
collective-action mechanism would be to probe whether it holds in the Pol-
ish case, which differs on two known conditions (as well as on unknown
omitted conditions—e.g., the impending collapse of the Soviet empire in
Central and Eastern Europe). When we look at the analytical narrative in
the Polish case, we find mass mobilization organized around the Solidarity
unions in the early 1980s that experienced repression in October 1981 and
a resurgence of this phenomenon in the wake of the easing of Soviet pressure.
17. Either 36 or 42 depending on the dataset used. See Haggard and Kaufman 2016: 45.
18. Haggard and Kaufman (2016: 128) claim that they “used these cases to trace causal
mechanisms more closely, and also to take note of several other causal factors that operated.”
As discussed in chapter 3, the narratives are not tracing mechanisms but can be translated into
mechanisms by cooking out the essence of the causal links.
It is demanding to establish empirically that similar mechanisms operate
across cases in different contexts.
age of “the more one knows, the less one knows” is very applicable. The goal
in process-tracing is to get to know our cases, but the more we learn about
how processes play out, the harder it becomes to make sweeping generaliza-
tions about mechanisms across sets of cases. Instead, serious process-tracing
is compatible only with more bounded generalizations about mechanisms.
Chapter 5
5.1. Introduction
Based on this logic, we suggest that we need to ask three critical questions
when trying to figure out what evidence in theory can tell us about the op-
eration of a mechanism: (1) What is our prior confidence in the mechanism
(overall and for particular parts)? (2) Do we have to find the posited observ-
ables (theoretical certainty)? (3) If we find them, are there alternative expla-
nations for finding the observables other than that the mechanism or part was
operating as theorized (theoretical uniqueness)? Ideally, if we have clearly
theorized activities that enable us to formulate theoretically unique propo-
sitions about observables, we can collect relatively direct evidence of the
operation of a mechanism (or a part). In contrast, if we can formulate only
theoretically certain but not unique propositions, we only have more cir-
cumstantial evidence. If we do not find the evidence, we would disconfirm
the theorized mechanism (or part) to some degree; if we do find it, we would
not be much more confident that the mechanism actually was operating.
Our framework uses Bayesian reasoning in an informal fashion, as the
logical foundation for the questions we need to ask of empirical material to
transform it into evidence of causal relationships. The informal use of Bayes-
ian logic enables scholars to focus on what matters most in case studies—
learning about how a causal relationship works (or does not work) by under-
standing what particular pieces of empirical material mean in the context of
a particular case. In place of unnecessarily complicated analytical techniques
that divert attention from this contextual interpretation, we develop a set
of guidelines that analysts can use to ask the right questions when assessing
what their collected empirical material can demonstrate in a given research
context, thus enabling the interrogation of empirical material in a more ro-
bust and transparent fashion and consequently resulting in the production
of stronger confirming or disconfirming causal inferences about the opera-
tion of mechanisms in particular cases.
One proposed strategy is to compare cases with and without the mechanism
(i.e., a natural experiment). If we find that the absence of the mecha-
nism (or a part of it) results in the absence of the outcome, we can infer that
the presence of the mechanism produces a difference (Runhardt 2015: 1304).
The comparison can be enabled by finding two cases that are similar in all
respects other than the presence/absence of a mechanism (or of a part). If
we unpack a mechanism into multiple parts (in-depth process-tracing), we
would try to find cases in which all other factors are present except the part
of the mechanism being assessed, requiring strong theoretical assumptions
that no parts of the mechanism are redundant (i.e., substitutable) and about
the lack of impact of contextual conditions. We would then have to repeat
this natural experiment for each part of the mechanism!
Alternatively, we can disaggregate a single case into multiple subcases,
engaging in a “one-into-many” strategy (King, Keohane, and Verba 1994:
217–28). King, Keohane, and Verba suggest several different options2 for
disaggregating cases,3 including splitting a case up into its subunits or tem-
porally. For example, a case can be split temporally by studying a single
negotiation as a series of phases (t0 = prenegotiation, t1 = agenda-setting, t2 =
bargaining phase, and t3 = negotiation end game), or the single negotiation
can be split into subissues. In both, we transform the single case into a set
of cases that can be compared to detect patterns of cross-case variation in
an intervening variable (i.e., mechanism). The most fundamental problem
with the recommendation to transform one case into many is that we move
away from the level at which causal relationships play out. When studying
mechanisms using within-case studies, we use mechanistic evidence that is
produced by tracing how a given causal process plays out in a particular case.
However, if we transformed the case from one to many, we would no longer
have evidence at the unit level at which the causal relationship is theorized
to operate. For example, if we are studying group dynamics, disaggregation
into individuals would move the focus of the research away from the very
level at which the causal relationship is theorized to play out, producing
“cross-case” evidence of difference-making among individuals that does not
match the level of the causal relationship we intended to study.
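For concreteness, the temporal version of the one-into-many transformation can be sketched as follows (our illustration; the case label and phases are hypothetical). The sketch also makes the level-of-analysis problem visible: each resulting subcase is a phase, not the negotiation in which the mechanism is theorized to operate.

```python
# Hypothetical sketch of "one-into-many": split a single negotiation case into
# temporal subcases. Note the inferential cost: evidence now sits at the phase
# level, not at the level of the negotiation where the mechanism is theorized.

negotiation = "EU budget negotiation"  # hypothetical case label
phases = ["t0: prenegotiation", "t1: agenda-setting",
          "t2: bargaining phase", "t3: negotiation end game"]

subcases = [f"{negotiation} / {phase}" for phase in phases]
for sub in subcases:
    print(sub)
```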
But irrespective of whether we compare cases that were found or we have
transformed a single case into many cases, using observational evidence of
2. King, Keohane, and Verba offer two additional suggestions, but both involve shifting
the causal claim being assessed (e.g., by changing the outcome being explained). We believe
that this advice basically boils down to creating variation by comparing apples and oranges.
3. King, Keohane, and Verba use the term observation in the same way that we use the term
case in this chapter, defined as one measure of one dependent variable on one unit.
4. If we have many cases to compare, the unit homogeneity assumption logically becomes
less necessary when making claims about mean causal effects, because nonsystematic
differences in the causal effect in individual cases “wash out” across many cases, as long
as cases are independent of each other (ensured, for example, through randomization in
experiments) (Brady 2008: 261–66).
However, this is a second-best strategy. For an excellent discussion, see Gerring 2011: 246–55.
It is more fruitful to develop methodological tools that are appropriate for case-based research than to im-
port variance-based ideas of assessing difference-making that result in a set
of guidelines in which case studies are always inferior to large-n comparative
research or experiments.
Similar problems occur when we try to identify two or more cases that
can be compared. As Runhardt (2015: 1306) admits, “A similarity compari-
son in areas like political science is, however, difficult to defend.” Because of
the complexity of the social world, it is difficult to find cases in which the
“all other things equal” assumption required in a natural experiment actually
holds (Ragin 1987: 48). Levy (2015: 390) writes that “Controlled comparison
and matching face the difficulty of finding real-world cases that are identi-
cal in all respects but one.” But unless we can substantiate that all other
things are equal except for the presence/absence of a cause or mechanism,
we cannot make a causal inference that its absence made a difference for the
outcome.
Another route is to utilize logical counterfactuals. Single-case counter-
factual comparisons involve comparing an existing, real-world case with a
hypothetical counterfactual case, where the logical argument is then made
that if a particular cause or mechanism had not been present, the outcome
would not have occurred (Fearon 1991; Goertz and Levy 2007; Lebow 2000–
2001; Levy 2015; Mahoney and Barrenechea 2017; Tetlock and Belkin 1996).
In effect, a single-case counterfactual comparison attempts to approximate
a most-similar-systems comparison between a real and hypothetical possible
other world. To achieve an analysis as close as possible to a most-similar-
systems test, the most important criterion is the minimal-rewrite rule, which
states that our hypothetical changes should be as minor as possible to see
whether the change can logically produce a major change in an outcome
(Tetlock and Belkin 1996). For example, gaming through the consequences
if Archduke Franz Ferdinand had not been assassinated in 1914 is an example
of a minimal rewrite that could have prevented the outcome (World War I)
from occurring (see Lebow 2007 and Schroeder 2007). But despite many
attempts to build a methodology for logical counterfactuals, there are no
objective empirical truth conditions for assessing a nonexistent “possible”
alternative world.
In addition, a minimal rewrite is a nonachievable ideal in most circum-
stances, as the absence of all but trivial conditions would have significant
knock-on effects for other causal conditions, meaning that everything else
would not be the same. In the words of one critic, “Though it is logically
defensible to think up counterfactual questions with which to confront the
historical record, the exercise seems pointless or at best of limited value from
a practical standpoint because even so-called ‘easily imagined variations’ in-
troduced into the complex matrix of historical developments can change so
many variables [conditions] in so many unpredictable or incalculable ways,
leading to so many varied and indeterminate consequences, that the proce-
dure quickly becomes useless for helping us deduce or predict an alternative
outcome” (Schroeder 2007: 149).
While this critique partially reflects a historian’s skepticism of theory-
driven social science research in general, it is indeed almost impossible to minimize
the scope of the rewrite. However, certain types of research questions may
be more amenable to counterfactual comparisons than others. Rewrites that
involve changes in political leaders are good examples of what are claimed
to be minimal rewrites. For example, Harvey (2012) games through whether
the Iraq War would have occurred if Al Gore rather than George W. Bush
had been elected U.S. president in 2000, concluding that Gore likely would
also have gone to war. But even this minimal rewrite can be critiqued as
involving more changes than just the leader. If Gore had been elected, the ar-
gument can be made that the terrorist attacks of 9/11 might have been pre-
vented because of the amount of attention the Clinton/Gore administration
was paying to the threat from al-Qaeda in the late 1990s (see, e.g., Clarke
2004). The incoming Bush administration chose instead to focus on more
traditional state-based threats such as China and Russia, with the result that
despite strong signals that an attack was imminent, the Bush administration
ignored the threat (9/11 Commission Report 2004; B. Woodward 2004). And
if 9/11 had not occurred, it is highly unlikely that a Gore administration
would have elected to go to war against Iraq. Harvey does not explore this
knock-on effect, merely assuming that 9/11 would also have occurred if Gore
had been elected president.
The challenges grow even greater when we attempt to apply the hypo-
thetical counterfactual at the level of parts of mechanisms with in-depth
process-tracing, where we would have to assess hypothetically the difference
that the absence of each part of a mechanism made for the outcome, holding
everything else equal.
At a more fundamental level, even if we could engage in meaningful
assessment of the difference-making of mechanisms through comparisons
using either real or hypothetical cases, assessing the difference that an indi-
vidual part of a mechanism makes assumes that the mechanism has no em-
bedded redundancy. Biological mechanisms have redundancy of key parts,
meaning that if we remove one part to see what happens, the mechanism
can still produce the outcome through redundant parts.
In plain English, how can we claim that our hoop test is evidence
when we are never told what is jumping through the hoop?
The key reason for this ambiguity in much of the literature on CPOs is
that the process linking a cause and outcome together remains in a theoreti-
cal black box. Instead, CPOs, also sometimes called “diagnostic evidence”
(Bennett and Checkel 2014), are merely produced by asking If causal mecha-
nism M exists, what observables would it leave in a case? However, the mecha-
nism is not unpacked theoretically in any detail, meaning that it is difficult
to claim that empirical material is evidence of a process since we are not told
what process it is evidence of.
For example, in Fairfield and Charman’s (2017) discussion of evidence in
process-tracing, basically any empirical material that tells us about a mecha-
nism can act as evidence. However, they provide precious little guidance for
scholars because they never tell us what the process is. The analytical result
of this lack of clarity is that, in a reconstruction of Fairfield’s (2013) analysis
that presents a set of CPOs, they claim that the CPOs beyond reasonable
doubt confirm a postulated link between legitimacy appeals (cause) and tax
reforms (outcome) that are otherwise opposed by business interests (Fair-
field and Charman 2017: appendix A).5
Fairfield and Charman (2017: 16) claim, for example, that a confessional
from a right-wing party deputy who states that “Our candidate made a com-
mitment, and it was also a difficult moment for him. Therefore the political
decision was made to support what the candidate said” is particularly strong
evidence. In their view, this statement “captures the causal mechanism un-
derlying HEA more completely than any of the pieces of evidence previously
analyzed” (16). However, even if the confessional can be trusted because it
goes against the assumed interests of the respondent, this information at
best provides us with evidence of a causal effect (C → O) and tells us next
to nothing about the actual process that linked the two. In this example, the
confessional cannot be strong evidence of a causal process because there are
multiple potential abstract causal processes that could be compatible with
the causal hypothesis of C (appeal to legitimacy norm) → O (tax reform).
One process might be a bottom-up reaction to pressure from voters, where
the process could be appeal to norm → appeal resonates with voters →
voters contact their representatives → these representatives inform party
leadership about a surge in public opposition → party leadership decides
5. Fairfield and Charman astonishingly claim that the evidence enables them to reach a
posterior of 99.4 percent, which approaches a verbal “no question about” level of confidence!
to support reform. This appears to be what Fairfield (2013: 45) had in mind
when discussing the need to mobilize public opinion in support of a norm
that puts opponents publicly on the defensive. However, another potential
process could be more elite-driven, with party leaders anticipating a poten-
tial public reaction before it happens, meaning that the mobilization does
not need to actually take place. Indeed, the confessional appears to be more
compatible with parts of this type of anticipatory process.
Other scholars have produced narrower definitions of CPOs but have
not solved the problem of what we are observing with CPOs. Mahoney
(2012: 571), for example, describes CPOs as “diagnostic pieces of evidence—
usually understood as part of a temporal sequence of events—that have pro-
bative value in supporting or overturning conclusions.” He then shifts from
discussing CPOs as within-case evidence to discussing them as cross-case
counterfactuals that compare the existing with an “alternative universe” case.
He describes a smoking-gun test as “identifying a mechanism that is neces-
sary for the outcome. The analyst then determines whether the cause itself
is necessary for this mechanism” (581). But empirical material from the case
itself is not doing the inferential heavy lifting in enabling a claim that a cause
is necessary for the mechanism in Mahoney’s framework; rather, that role is
played by cross-case evidence that enables us to either eliminate or substanti-
ate the claim of a necessary relationship. The lack of any noticeable role for
actual within-case empirical evidence in his framework is best illustrated in
the appendix, where he discusses how three classic works use empirical tests
to make inferences. Mahoney claims that Downing’s test involves claim-
ing that a mechanism (M) that leads to the outcome (liberal democracy)
is found to be necessary “using cross-case evidence” (592). This is followed
by the claim that three causes (A, B, C) are necessary for M, substantiated
by “the fact that A, B, and C are temporally proximate to M, such that it
is easier to show that the counterfactual absence of any one of them would
have eliminated M. Once [Downing] has persuaded us that A, B, and C are
necessary for M, he then logically reasons that the three factors must also
be necessary for Y, given that M is established to be necessary for Y” (592).
Beyond the temporal sequence of occurrence of A, B, C, and M, within-case
evidence is not used. Therefore, while logically clear, Mahoney’s test types
provide us only with a counterfactual-based framework based on evidence
of difference-making, which we contend is not compatible with mechanism-
based claims as developed in chapters 2 and 3.
Blatter and Haverland (2012: 23) define CPOs as “a cluster of information
that is used (a) to determine the temporal order in which causal . . .”

Methodologists have not satisfactorily answered Beck’s critical questions: What exactly
does a CPO look like? And, more important, how do CPOs enable causal
inferences to be made in case studies? We now turn to answering precisely
those questions.
Bayesian reasoning can act as the logical underpinning for using mechanistic
evidence to trace causal mechanisms in operation. At the core of Bayesian
reasoning is the idea that science is about using new evidence to update our
confidence in causal theories, either within a single case or across a popu-
lation.6 Bayesians are pragmatic about what type of empirical material can
be used to make inferences, making the approach particularly well suited
as a logical epistemological foundation for process-tracing research using
mechanistic evidence. Further, Bayesian empirical updating goes in both
directions—confirmation and disconfirmation—which is a significant im-
provement on much existing Popper-inspired social science methodology,
where falsification (disconfirmation) is claimed to be the only type of infer-
ence possible. In many respects, Bayesian reasoning makes explicit the back-
and-forth relationship between empirics and theories found in abductive
analytical tools (e.g., Tavory and Timmermans 2014).
Before we present and discuss the core elements of Bayesian reasoning,
we must address a few issues about how it can be applied to process-tracing
research to avoid misunderstandings. When working with mechanistic evi-
dence, we are asking questions such as Did we observe that a politician actu-
ally attempted to mobilize voters as the part of the mechanism would predict?
This is a piece of evidence that contains no variation within a case because it
is either present or not. However, this one piece of evidence (match or not
match) still can enable inferences about a causal link because it is an observa-
tion of an activity linking one part of a process with another.
Another common point of misunderstanding arises when we use Bayes-
ian probabilistic language in case studies when making inferences based on
empirical material while also claiming that case-based methods build on a
deterministic ontological understanding of causation (see chapter 1). Con-
fusion typically arises when we fail to distinguish between ontology and
epistemology. It is more analytically fruitful to talk about deterministic on-
tology as regards causal relationships at the case level. By putting forward
deterministic claims that can be wrong in a case or small set of cases, we
progressively refine our theories through repeated meetings with empirical re-
ality. But at the same time, our evidence-based knowledge about relationships
in the real world will always be imperfect. Determinism at the ontological
level does not logically imply that we can gain 100 percent certain empiri-
cal knowledge about why things occur. Bayesian reasoning is epistemologi-
cally probabilistic—we can never absolutely confirm or disconfirm a theory
because of the inherent complexity and ambiguity of the empirical world;
instead, we attach varying degrees of confidence to our theories based on the available evidence.

6. For good popular science introductions to the Bayesian approach, see McGrayne 2011; Silver 2013. For an exposition of how the Bayesian approach underlies medical reasoning, see Gill, Sabin, and Schmid 2005.
7. This can also be interpreted as the "old evidence problem." For a more technical discussion of the topic, see Wagner 2001.
Bayesian reasoning can provide us with a set of logical tools for evaluat-
ing what finding particular pieces of evidence tells us about our theories of
causal mechanisms. As used here, empirical material acts as evidence that
can either increase or decrease our confidence in the validity of a hypothesis.
Evidence in the form of mechanistic evidence enables within-case inferences
about causal mechanisms (Illari 2011; Russo and Williamson 2007).
Updating our confidence in a hypothesis about the validity of a causal
mechanism (or a part) operating in a case is a function of (1) our prior con-
fidence in the mechanism (or a part thereof ), (2) the confirming or discon-
firming power of evidence in theory (theoretical evaluation), and (3) the em-
pirical evaluation of a particular observation (on this point, see chapter 6).
After we have collected new evidence, we update our degree of confidence in
the validity of the mechanism (or a part thereof )—“posterior probability.”
We recommend using Bayesianism as an informal logic rather than as a
formal logic in which quantitative values for probabilities are estimated, as
others have begun advocating (Bennett 2014; Humphreys and Jacobs 2015). There
are several reasons to avoid trying to quantify theoretical certainty and
uniqueness when evaluating the probative value of mechanistic evidence
in process-tracing, in contrast to Bayesian statistical applications. First, if
we want to quantify priors in relation to whether we should expect a causal
mechanism to hold in a particular case, our prior knowledge in a quanti-
fied form would typically be drawn from estimates about the propensity
of individual cases to exhibit a causal relationship based on population-
level research (e.g., Humphreys and Jacobs 2015: 661). Even if we ignore
problems such as conflicting ontological foundations of the causal claims
being made in case-based and variance-based designs that result in evi-
dence of different things (probabilistic versus deterministic, symmetric
versus asymmetric, and mechanisms operating within cases versus mean causal effects across populations)
Bayes’s Theorem
The key terms of the theorem are the posterior probability, the likelihood, and the prior. The full
theorem is expressed as (Howson and Urbach 2006: 21):
p(h|e) = p(h) / [p(h) + p(~h) × (p(e|~h) / p(e|h))]
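For readers who know the theorem in its more common form, the ratio version above follows from it by expanding p(e) with the law of total probability and then dividing the numerator and denominator by p(e|h):

```latex
\[
p(h \mid e) = \frac{p(e \mid h)\,p(h)}{p(e)}
            = \frac{p(e \mid h)\,p(h)}{p(e \mid h)\,p(h) + p(e \mid \lnot h)\,p(\lnot h)}
            = \frac{p(h)}{p(h) + p(\lnot h)\,\dfrac{p(e \mid \lnot h)}{p(e \mid h)}}
\]
```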
Note that confirmation never reaches
100 percent and disconfirmation never reaches 0, both of which are logical
impossibilities in Bayesian reasoning (Howson and Urbach 2006: 103–5).
The term p(h) is the prior, which is the researcher’s degree of confidence
in the validity of a hypothesis prior to gathering evidence, based on existing
theorization, empirical studies, and other forms of expert knowledge.
The likelihood ratio is composed of two elements: theoretical certainty
(p(e|h)) and theoretical uniqueness (p(e|~h)) of evidence. Theoretical cer-
tainty should be read here as the probability that the piece of evidence will
be found if the hypothesis is true, whereas theoretical uniqueness describes
the expected probability of finding the predicted evidence with alternative
explanations of the evidence.
Theoretical certainty relates to the disconfirming power of evidence. If
it was theoretically very certain that the evidence-generating process associ-
ated with an activity should have left an empirical fingerprint and we do
not find it in the case despite a thorough search of the empirical record, we
can significantly downgrade our confidence in the hypothesis (Sober 2009).
However, if we are engaging in an empirics-first study (i.e., we have already
found empirical material that we think might be evidence), we would not
evaluate certainty of evidence in relation to a given proposition because we
have already found the evidence! Theoretical certainty can also be thought
of as the rate of false negatives: if it is highly unlikely that we would fail to find
the proposed evidence when the hypothesis is true, not finding it would suggest that the mechanism (or
part) is not present. In table 5.1, as the value of p(e|h) rises, the proposition
about mechanistic evidence becomes more certain, meaning more disconfir-
mation if not found.
Theoretical uniqueness describes the expected probability of finding the
observable empirical fingerprints if the mechanism (or part thereof ) does
not exist, telling us about the confirmatory power of evidence. In plain Eng-
lish, if the predicted evidence is found, can we plausibly account for it with
any alternative plausible explanation, depicted as ~h? We must evaluate the
plausibility of all alternative explanations for finding the evidence (Sober
2009). If we find the predicted evidence and it was theoretically unique, we
can make a strong confirming inference. Theoretical uniqueness can also
be understood as the rate of false positives. If it is highly unlikely that we
would find the evidence without the mechanism (or part) being operative,
then finding it is confirmatory to some degree. In table 5.1, as the value of
p(e|~h) declines, the proposition regarding mechanistic evidence becomes
more unique and confirmation increases if it is found.
This equation deals with what updating is possible if we find the posited
evidence. If we do not find the posited evidence (i.e., we find ~e), we use
another variant of the theorem. Given that p(e|~h) = 1 − p(~e|~h), and
p(e|h) = 1 − p(~e|h), and using the same values for p(h) and p(~h), the
formula becomes:
p(h|~e) = p(h) / [p(h) + p(~h) × (p(~e|~h) / p(~e|h))]
For example, let us quantify the updating possible when using Tannen-
wald’s (1999) proposition about “taboo talk” evidence. The minimalist mecha-
nism that she assesses using the proposition lies between norms of nuclear
nonuse and the actual nonuse in the Korean War, where personal convictions,
informed by beliefs about American values and conceptions of appropriate
behavior, constrain self-interested decision-makers (462). The propo-
sition about the empirical fingerprint of the process (“taboo talk”) is defined
as “non-cost-benefit-type reasoning along the lines of ‘this is simply wrong’
in and of itself (because of who we are, what our values are, ‘we just don’t do
things like this,’ ‘because it isn’t done by anyone,’ and so on)” (440).
We can quantify the prior as being relatively low for a norm-based ex-
planation (0.3—i.e., norms probably do not matter), based on the idea that
“dominant explanations are materialist” (Tannenwald 1999: 438). The prop-
osition “taboo talk” has a relatively low theoretical certainty to be found in
later cases, because when the norm is very strong, it “might become a shared
but ‘unspoken’ assumption of decision-makers” (440). However, given that
the norm was still developing in the Korean War case, we should expect to
find “taboo talk” if the norm matters because it is not yet dominant, mean-
ing that the theoretical certainty in the case might be taken to be 0.7 (“prob-
ably” present evidence). Finally, she claims that “taboo talk” is theoretically
unique: the predicted evidence “is not just ‘cheap talk,’ as realists might
imagine” (440). We might set theoretical uniqueness at 0.2 (“very probably”
we will not find the evidence with alternatives). If we then actually find
sources that suggest that the observable “taboo talk” was present in the case,
this mechanistic evidence would strengthen our confidence in the opera-
tion of the mechanism from the prior of 0.3 to the posterior of 0.6. In plain
English, we would have gone from believing that “norms probably do not
matter” to believing that it is “somewhat more likely than not that norms
matter.” If we did not find the posited evidence, we would have downgraded
our confidence from 0.3 to 0.14 (norms almost certainly did not matter).
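To make the arithmetic transparent, this updating can be reproduced in a few lines of code. The sketch below is only illustrative; the function implements the two variants of the theorem given above, and the inputs are the quantified prior, certainty, and uniqueness from Tannenwald's example.

```python
def posterior(prior, p_e_h, p_e_noth, evidence_found=True):
    """Update confidence in h with the two variants of Bayes's theorem above.

    prior     -- p(h), confidence in the hypothesis before the evidence
    p_e_h     -- theoretical certainty, p(e|h)
    p_e_noth  -- theoretical uniqueness, p(e|~h)
    """
    if evidence_found:
        # p(h|e) = p(h) / [p(h) + p(~h) * (p(e|~h) / p(e|h))]
        likelihood_ratio = p_e_noth / p_e_h
    else:
        # p(h|~e) = p(h) / [p(h) + p(~h) * (p(~e|~h) / p(~e|h))]
        likelihood_ratio = (1 - p_e_noth) / (1 - p_e_h)
    return prior / (prior + (1 - prior) * likelihood_ratio)

# "Taboo talk" proposition: prior 0.3, certainty 0.7, uniqueness 0.2
print(round(posterior(0.3, 0.7, 0.2, evidence_found=True), 2))   # 0.6
print(round(posterior(0.3, 0.7, 0.2, evidence_found=False), 2))  # 0.14
```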
case we select to study. Overall, this low prior confidence means that even if
our process-tracing case study can produce only relatively weak confirming
evidence, it would still enable some updating to take place if we found the
predicted evidence for each part of the mechanism.
The priors for each individual part determine the type and strength of
evidence necessary to further update our confidence for each part—what
can be thought of as miniature case studies embedded within the overall
study of the mechanism in the case. This means that the prior for each part
of the mechanism should be described. For example, the prior for part 2
(p(h2)) is set relatively high, given what we know about modern represen-
tative governments in general. That is, it is almost self-evident that they
would be monitoring public opinion in some fashion on important issues.
Therefore, only very strong confirming evidence would further increase our
confidence. In contrast, the low priors for parts 1 and 3 reflect our overall low
confidence in the mechanism given that these are the real “working parts” of
the causal relationship if it exists.
New empirical evidence for each individual part then enables us to in-
cogency in the light of group dynamics” (33–34). One can infer from his
description of the state of the art of the theory and knowledge of the case
that he believes that the prior confidence in the group dynamic (group-
think) hypothesis is relatively low. Therefore, it makes sense when he then
claims that plausibility probes with relatively weak evidence are sufficient
to update prior low confidence in the hypothesis. He writes, “For purposes
of hypothesis construction—which is the stage of inquiry with which this
book is concerned—we must be willing to make some inferential leaps from
whatever historical clues we can pick up” (ix).
Scholars should be explicit about which forms of existing knowledge
have contributed to their estimate of prior confidence by answering a
series of questions. First, at the most abstract level, is the postulated causal
relationship theoretically plausible? Do we find plausible causal
mechanisms linking C with O in the literature, or is it possible to derive
these mechanisms from existing research? Second, what does existing re-
search tell us about the relationship at the cross-case level? Given that this
evidence typically shows correlations across cases, it provides relatively weak
support for a causal relationship, especially when applied to single cases. In
addition, why should we expect that the cross-case relationship might actu-
ally be present in the selected case? That is, how can we justify the assump-
tion of causal homogeneity in the population?
Third, what does the evidence from existing research on the case itself
tell us about the plausibility of the mechanism? This answer often takes the
form of historical work on the case, which can shed light on the plausibil-
ity of the relationship in the particular case. Finally, what is the plausibility
that the theorized mechanism and each of its individual parts are present in
a particular case?
Before we can get an idea about possible empirical fingerprints that can be
left by the evidence-generating processes of the activities of a mechanism
(or its parts), we need to have some ideas about the evidential setting within
which our theorized mechanism is expected to function. The predicted evi-
dence that a given mechanism will leave depends on the mechanism and
the case being studied. A banal point is that different mechanisms will leave
different empirical fingerprints. More interesting is the claim that the same
mechanism can leave different predicted evidence in different cases. Theories
and mechanisms often have different empirical fingerprints in different cases,
despite being theorized to be a similar process. Operationalizing the empirical
fingerprints of mechanisms that are sensitive to the particulars of individual
cases therefore requires considerable case-specific knowledge and expertise. If
we are investigating a mechanism related to lobbying, while the causal process
might be the same, the fingerprints it might leave in the United States may
differ substantially from the fingerprints it would leave in an Italian context.
Gaining an overview of the evidential settings is therefore an important step
before operationalizing the theoretical expectations about the empirical fin-
gerprints, since the operationalization of expected empirical manifestations
of the underlying causal mechanism will always be case-specific.
Pedersen and Reykers (2017) utilize a process-tracing research design to
test a bandwagon-for-status hypothesis, where they aim to study whether
small states joined U.S. interventions to improve their relative status within
their geographical peer groups by gaining recognition and prestige. The au-
thors conceptualize a general status-seeking bandwagoning mechanism that
hominins ate (Cerling et al. 2013). These authors found that as vegetation patterns
shifted between three million and two million years ago toward more C4
grasses, one hominin species (Paranthropus boisei) ate a narrow, grass-based
diet, whereas early humans (Homo) ate a more varied diet. This suggests that
what differentiated us from our close ancestors was the ability to exploit and
manipulate our relationship with the environment. Creatively exploiting
different forms of empirical fingerprints as evidence has therefore resulted
in new knowledge about early human evolution. We strongly recommend
trying to be just as creative about developing propositions about empirical
evidence in the social sciences to strengthen the conclusions we can make
based on case study research.
When designing theory-first propositions, we need to state clearly which
types of evidence the causal relationship should leave in the empirical record
and what that evidence may tell us regarding a given theory (theoretical cer-
tainty and uniqueness). The clearer we are in describing what we expect to
find, the easier it is to determine whether we have actually found it. When
evaluating what evidence might tell us, it is also important to provide con-
textual information that enables us to assess the evidence’s probative value
because of the importance of case-specific context (Kreuzer 2014).
Evidence might also be in the form of what is not mentioned in the text
or in an archive. This type of evidence can be termed e silentio—based on
silence or the absence of an expected statement or message in a text or in
an archive. If we expect an author to have a strong interest in presenting an
event in a certain light or we would expect the author to take credit for a
given decision and this is not the case, this omission might have inferential
weight in our analysis. When a certain event is not mentioned in the text,
one possible explanation might be that the event did not take place. Con-
versely, if we are very confident based on other sources that an event took
place, its omission from a source where it should be is a highly unique obser-
vation that would have strong evidential weight, other things equal.
The activities associated with different types of theoretical explanations
can be expected to leave different types of empirical fingerprints. Activities
in explanations that deal with the role that ideas play in decision-making
might be expected to leave particular discursive fingerprints—for example,
a key phrase might be found in an initial document and then make its
way through successive drafts until the final text is agreed (Jacobs 2014;
Parsons 2016). Here, we would need to use methods that enable us to
observe and evaluate language, such as discourse analysis (e.g., Fairclough
1995; Schreier 2012; Schwartz-Shea and Yanow 2011). Theories about how
that cause C1 matters does not mean that other causes are not also pres-
ent. Further, the empirical fingerprints left by various theories often differ
greatly, meaning that evidence suggesting that a process associated with C1
matters does not necessarily imply that other causes and/or processes were
not present. For example, Tannenwald's (1999, 2007) finding of evidence
of “taboo talk” suggests that a process associated with norms was present
(constructivist explanation of the outcome) but does not mean that processes
associated with cost-benefit calculations (rationalist explanation) played no
role. Indeed, Tannenwald (2007: 53–54) explicitly declares that norms
are necessary but not sufficient and that cost-benefit calculations also mat-
ter. Therefore, finding confirming mechanistic evidence that suggests that
norms matter does not disconfirm a cost-benefit rationalist explanation.
When we are making inferences about parts of mechanisms using in-
depth process-tracing, the logic of rival theories completely breaks down
because it is difficult to imagine a situation in which two or more different
causes would trigger two parallel processes in which each part is mutually
exclusive of the other theoretically and empirically. Given the complex na-
ture of many of the types of causes we are studying in social science (e.g.,
authoritarian repression, electoral democracy, popular pressure from voters),
it is very likely that a given cause might trigger any number of potential
processes (see chapter 2; see also Steel 2008). Therefore, finding confirming
evidence of the activities associated with a particular mechanism triggered
by authoritarian repression does not in most circumstances enable us to dis-
confirm the existence of other mechanisms triggered by the same cause. This
means that when we are tracing mechanisms using in-depth process-tracing,
~h should be understood simply as any plausible explanation for finding the
evidence in a universe in which the hypothesized part of the mechanism is
not working as theorized.
We can illustrate the logic of developing predictions of relatively unique
empirical fingerprints for the activities associated with different competing
mechanisms in a situation that variance-based approaches would categorize
as overdetermination. The classic example from the philosophy of science in-
volves a person found dead after wandering in the desert, with poison in the
person’s canteen as well as a hole in the canteen. Therefore, both poison (C1)
and thirst (C2) are potentially sufficient causes, but only one can actually
have had a causal relationship with the outcome—that is, they are mutually
exclusive because the person died either of poisoning or of dehydration.
Overdetermination—that is, multiple sufficient causes—wreaks havoc in a
variance-based design that uses evidence of difference-making because we
cannot isolate the differences that C1 and C2 make for the outcome be-
cause both are present. But this is not a problem when engaging in in-depth
process-tracing using mechanistic evidence. Here, overdetermination at the
cause/outcome level does not mean that we cannot figure out which mecha-
nism operated because we can distinguish at the empirical level between the
fingerprints that the activities associated with mechanisms C1 and C2 would
leave. For example, we would expect to find relatively theoretically unique
mechanistic evidence in the form of poor skin turgor, tenting of the skin, sunken
eyes, and/or dry galea and dry organ surfaces if the mechanisms associated
with dehydration had proceeded from start to finish (Madea and Lachen-
meier 2005), whereas the mechanisms associated with poisoning could be
detected if we find increased levels of poison in the blood in an autopsy
conducted relatively quickly after death (Musshoff et al. 2002). If we found
the mechanistic evidence predicted by the mechanism associated with poi-
soning, we could then conclude that the person died of poison and not
dehydration.
Evaluating theoretical uniqueness can be done by brainstorming about
other activities that might have left the empirical fingerprint. Here, it can be
helpful to act as one’s own devil’s advocate, drawing on other theories and
case-specific knowledge to explore additional potential explanations for the
presence of the piece of mechanistic evidence. It can also be helpful to ask
case experts about alternative interpretations of evidence.
We can also attempt to improve the uniqueness of empirical fingerprints
by increasing the specificity of the prediction. Instead of putting forward a
prediction that “taboo talk” would be found (Tannenwald 1999), the unique-
ness of this prediction could be increased by making it even more specific—
that is, by stating that central actors would be expected to be making this
type of speech act during critical phases of the decision-making process. This
more specific prediction would have a higher degree of uniqueness, other
things equal. The downside of making predictions more unique is that do-
ing so often decreases the theoretical certainty of the test, meaning that we
become less likely to find the evidence, other things equal. And not finding
unique but not certain evidence tells us little if anything about the operation
of the mechanism (or its parts).
Chapter 6

6.1. Introduction
Fig. 6.1. The relationship between observations and propositions about evidence
Source: Based on Sober 2009: 68.
We may believe that we
have found the posited mechanistic evidence when it could in reality be a false
positive finding (upper right corner). In this situation, the correct course
is to state that while we have found what looks like evidence, no
inferences should actually be made.
Empirical certainty deals with whether we can conclude that a not found
observable amounts to the absence of the posited mechanistic evidence. If we do
not find observables despite extensive access to the empirical record, and we can
trust that sources did not cover anything up, we can then disconfirm to a con-
siderable degree the existence of the posited mechanistic evidence (evidence
of absence, lower left corner). In contrast, if we either lack good access or
cannot trust sources, we should not treat the not found observable as evidence
of absence (lower right corner). Instead, no inferences should be made.
We should not conflate the empirical assessment of what found/not
found observables mean and whether we can trust the sources with the the-
oretical evaluation of what found/not found mechanistic evidence tells us
about the existence of the hypothesized activities associated with a mecha-
nism. In theory, a piece of mechanistic evidence might be certain, but if we
are unable to gain access to the empirical record that would enable us to
assess whether it was actually present, no inferences would be possible even
if we do not observe the predicted evidence. Here, absence of evidence is not
evidence of absence.
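The toy posterior() sketch from the Tannenwald example above can put rough numbers on this point: when a proposition about evidence has low theoretical certainty, failing to find it barely moves our confidence, whereas the absence of a highly certain fingerprint is strongly disconfirming. The values are invented for illustration.

```python
def posterior(prior, p_e_h, p_e_noth, evidence_found=True):
    # Same sketch as in the Tannenwald example above.
    lr = (p_e_noth / p_e_h) if evidence_found else (1 - p_e_noth) / (1 - p_e_h)
    return prior / (prior + (1 - prior) * lr)

prior = 0.5
# Highly certain evidence (p(e|h) = 0.9): absence is strongly disconfirming.
print(round(posterior(prior, 0.9, 0.3, evidence_found=False), 2))  # 0.12
# Uncertain evidence (p(e|h) = 0.3): absence tells us very little.
print(round(posterior(prior, 0.3, 0.2, evidence_found=False), 2))  # 0.47
```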
We start by discussing the challenges associated with the collection of
observations. We then discuss the different questions to ask depending on
whether we find or do not find observables. We split the discussion into ques-
tions relating to what finding/not finding observables can mean in a given
context. After we have evaluated whether the found observation reflects the
proposed fingerprint or whether not found observations occurred because we
lacked access to relevant parts of the empirical record, we must assess the sec-
ond question: Can we trust the source? Here we are assessing the accuracy
of our observations. We discuss a variety of different challenges relating to
accuracy and develop a set of questions that assist in evaluating whether we
can trust a given source. We then turn to different challenges associated with
different types of sources, including interviews and archival sources.
We could conclude that the theoretically certain proposition was not present
empirically, leading us to disconfirm the operation of the part of the mecha-
nism associated with the proposition.
How does the document fit into the political system, and what is its rela-
tion to the stream of other communications and activities within the policy-
making process? It is also important to note the circumstances surrounding
the document: Why has it been declassified or released? For what purpose?
What has not been released?
Sometimes assessing what an observable means can be relatively straight-
forward. If we had a proposition about mechanistic evidence that stated that
we would expect a leader to use bellicose rhetoric toward an opponent in
public speeches, and we find obviously bellicose language in a speech, we
can conclude with reasonable confidence that the observation matches the
proposed empirical fingerprint. However, we would still want to investigate
whether the particular speech was representative of all of the speeches or
whether extenuating circumstances can lead us to conclude that the speech
does not represent the overall pattern of discourse in the leader’s public
speeches. A “seemingly explosive document” might be followed by a “note
shortly thereafter counteracting the previous verdict” (Darnton 2017–18:
Even more insidious is the possibility that a published speech does not
exactly match what the actor actually said when delivering it.
Almost all sources are ambiguous to some extent. According to Munson
(1976: 74), “An expression is ambiguous when it has more than one meaning
and it is used in a situation or context in which it can be understood in at
least two different ways.” Unfortunately, many social science sources can be
deliberately ambiguous—for example, when parties in a negotiation agree
to use equivocal language that can mask disagreement on issues (Iklé 1964:
15–16; Smeets 2015).
And when we think that language in documents or interviews is unam-
biguous, it is typically because we know too little about the context. Lan-
guage and its usage are terribly complicated. Building on Austin (1970) and
Wittgenstein (1973), Schaffer (2016: 34–39) has suggested a set of helpful
questions for interpreting what language can potentially mean. For example,
are there differences in how close synonyms are used in a document and how
opposites and negations are used? This can involve looking for closely related
keywords in United Nations resolutions to see whether they are used differ-
ently, thereby shedding light on what they might mean in the context. This
investigation can be aided by asking diplomats or lawyers what particular
phrases or words are intended to mean and confronting them with subtle
differences in the usage of terms in particular situations (45–47).
In some situations, one observation can be so definitive (high empirical
uniqueness) that it alone is enough to confirm the presence of the proposi-
tion about evidence. If we are investigating a proposition dealing with lobbying
370).1 This idea conflates the empirical evaluation of particular sources with
the theoretical evaluation of what predicted mechanistic evidence can tell us
about a hypothesized process (or its parts). In particular, reliability is usually
not a function of a causal hypothesis but instead relates to factors linked
to the source itself. There are, for example, natural cognitive limitations to
human memory: actors may not remember that something took place even
though it did, or they may incorrectly recall events. But even validity concerns
are not always correlated with our hypotheses about mechanisms.

1. Fairfield and Charman also claim that this cannot be expressed mathematically. However, accuracy can be expressed in terms of probability (p(a)), where an unreliable measure has a low probability of accuracy, and vice versa. Entered into Bayes's theorem, an unreliable measure reduces the ability of evidence to update our confidence in whether or not a hypothesis is true. Howson and Urbach (2006: 111) provide Bayesian reasoning for this statement. The probability of accuracy, p(a), enters the calculation of the posterior probability of a hypothesis through the likelihood ratio. By the law of total probability, the numerator of the ratio decomposes as p(e|~h) = p(e|~h & a)p(a) + p(e|~h & ~a)p(~a), and the denominator decomposes in the same way as p(e|h) = p(e|h & a)p(a) + p(e|h & ~a)p(~a). When the measure is inaccurate, e tells us nothing about whether h is true, so p(e|h & ~a) = p(e|~h & ~a); as p(a) falls, both the numerator and the denominator are dominated by this common term, and the likelihood ratio approaches 1. In plain English, if we find e but p(a) is low, finding e does little to update our confidence in the veracity of the hypothesis, irrespective of the theoretical weight of e, meaning that the posterior probability is not updated substantially.
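A small numerical sketch can show this dampening effect. Accuracy is folded into effective likelihoods via the decomposition in note 1; all values are hypothetical, chosen only to illustrate how a low p(a) pushes the likelihood ratio toward 1 and chokes off updating.

```python
def effective_likelihoods(p_acc, certainty, uniqueness, base=0.5):
    """Fold measurement accuracy into the likelihoods via total probability.

    p_acc      -- p(a), probability that the measure is accurate
    certainty  -- p(e|h & a), certainty when the measure is accurate
    uniqueness -- p(e|~h & a), uniqueness when the measure is accurate
    base       -- p(e|h & ~a) = p(e|~h & ~a), chance of e from a junk measure
    """
    p_e_h = certainty * p_acc + base * (1 - p_acc)
    p_e_noth = uniqueness * p_acc + base * (1 - p_acc)
    return p_e_h, p_e_noth

def posterior_found(prior, p_e_h, p_e_noth):
    # p(h|e) = p(h) / [p(h) + p(~h) * (p(e|~h) / p(e|h))]
    return prior / (prior + (1 - prior) * p_e_noth / p_e_h)

for p_acc in (1.0, 0.5, 0.1):
    p_e_h, p_e_noth = effective_likelihoods(p_acc, certainty=0.7, uniqueness=0.2)
    print(p_acc, round(posterior_found(0.3, p_e_h, p_e_noth), 2))
# p(a)=1.0 -> 0.60; p(a)=0.5 -> 0.42; p(a)=0.1 -> 0.32 (barely above the prior)
```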
An additional reason to separate theoretical and empirical evaluations
relates to the different procedures by which we can increase the probative
value of evidence by increasing either theoretical certainty/uniqueness or
empirical certainty/uniqueness. Theoretical uniqueness or certainty can be
increased only by changing predictions about which evidence will be found
or not found (e.g., uniqueness can be increased by making an expectation
about evidence more specific, such as precisely described phrases in particu-
lar documents). Empirical uniqueness can be improved not only by search-
ing for better sources that help to develop a more representative picture
of the empirical record but also by engaging in extensive source criticism,
asking critical questions in relation to particular sources (e.g., assessing po-
tential motivations for distorting accounts and corroborating with other in-
dependent sources).
Unless we explicitly discuss our confidence in a given source using ques-
tions that historians typically ask to evaluate veracity, we should assume that
the found/not found observable has little if any probative value. We can ask
five questions of our sources to evaluate whether they are trustworthy.
Question 1: Can the Source Have Known Something Firsthand about the Events He or She Describes?

If careful assessment of the position that a particular person held
at the time of an event demonstrates that he or she would not have been
privy to key discussions, we would conclude that the source is not an eye-
witness but instead heard the information from some other source. Further-
more, given that the source misled us about his or her role, should we really
trust anything else this source says? In both instances, our assessment of the
source’s measurement accuracy would fall considerably.
Question 2: How Many Steps Removed from the Events Is the Source?
to which his policies are responsible for the outcome. . . . When the other’s
behavior is undesired, the actor is likely to see it as derived from internal
sources rather than as being a response to his own actions” (Jervis 1976: 343).
But it is by no means certain that the original participants produce a
record of the event. Instead, they might discuss what took place with col-
leagues or with an official record-maker after events have occurred, meaning
that if we interview the colleagues rather than the participant, the interview
would be a secondary source, two steps removed from the original activity. It
is therefore vital to discuss with interviewees the sources of the information
they provide to determine whether they are a primary source who took part
in the event or are referring to official minutes of the meeting or discussions
with actual participants, in which case the interviewee is a secondary source.
Finally, accessibility issues are endemic in the social sciences, especially
for research questions involving corruption or political violence in nondem-
ocratic systems. Therefore, the record of events that is accessible is often
even further removed from the original activities that generated empirical
fingerprints. We may not even know about primary sources closer to events
because they are classified or otherwise inaccessible to us, thereby forcing us
to rely on insiders or the information that authorities decide to make public.
We can use tools such as the dating of documents or text analysis to de-
termine which sources are primary and secondary as well as the number of
steps that our source is removed from events (Milligan 1979). For example,
if source A has the same linguistic phrasing as source B, and we can show
that the production of source A preceded the production of source B, this
suggests that source B should be treated as secondary material that builds on
source A, as it is highly unlikely that they would have used the same phras-
ing otherwise. The sources are therefore dependent on each other.
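As a crude illustration of this kind of dependence check, the snippet below flags suspiciously high phrase overlap between two texts. It is only a sketch: the n-gram length, threshold, and example strings are invented, and real source criticism would also weigh dating and provenance.

```python
def ngram_overlap(text_a, text_b, n=4):
    """Share of n-word phrases in text_b that also occur in text_a."""
    def ngrams(text):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    grams_a, grams_b = ngrams(text_a), ngrams(text_b)
    return len(grams_a & grams_b) / len(grams_b) if grams_b else 0.0

source_a = "the minister stated that the agreement was a balanced compromise"
source_b = "observers noted the agreement was a balanced compromise for both sides"

# High overlap suggests that source B builds on source A (or both on a
# common third text), so the two should not be treated as independent.
if ngram_overlap(source_a, source_b) > 0.3:
    print("treat source B as dependent on source A")
```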
Question 3: What Are the Reasons to Trust a Source?

Was the source competent to
observe what took place and provide an accurate account of it? For example,
can interviewees accurately recall complicated events that might have taken
place several years earlier? Was the source able to comprehend events? For
example, in a complicated legal negotiation in a subcommittee, did a partic-
ular politician possess the requisite knowledge to understand what was going
on? If we are working with a document, we would ask ourselves what is nor-
mally included in the type of source (e.g., minutes from a cabinet meeting)?
One way to test for reliability is to use our knowledge of the case to
evaluate a particular source in relation to what we know from other sources
(in effect a form of triangulation). If we are dealing with trade statistics,
in which multiple different indicators are produced by different authori-
ties, do some sources appear to produce more reliable indicators than other
sources? In interviews, we can use questioning techniques used in criminal
interrogations, where reliability is probed by asking similar but differently
phrased questions at different times to assess whether we hear similar things.
If a respondent has an unusually clear recollection of events, we can probe
which factor has enabled the respondent to remember events so well. If, for
example, we find that the events took place on the respondent's birthday or
on the day that the respondent, then a low-level civil servant, met the U.S.
president, the respondent might be more likely to actually recall events.
Question 4: Can We Provide Justifications for the Claim That the Source Had No Significant Motives for Distorting Content?

If bias exists (and it always does), the found/
not found observable can measure something other than the predicted ev-
idence (low empirical uniqueness or certainty), resulting in flawed infer-
ences. We must use all available knowledge to assess whether the source has
motivations for distorting content.
Bias can express itself in many different forms, depending on the type of
source. For example, particular documents may be declassified while oth-
ers remain classified, with actors wanting to release only documents that fit
their interpretation of events (Trachtenberg 2006: 157). If we study only the
available documents without taking into consideration what might appear
in other documents, the result could be a skewed picture. Further, bias can
manifest itself in what the source tells us. If we go to a prison in a well-
functioning legal system and uncritically ask convicted criminals whether
they committed the crimes for which they are incarcerated, we might be
amazed at the number of “innocent” people in prison.
Each collected observable should be evaluated relative to what is known
about the actors, their possible intentions, their interactions, and the situa-
tion in which they found themselves (Thies 2002: 357). How does the docu-
ment fit into the political system, and what is the document’s relation to
the stream of other communications and activities within the policymaking
process? Such assessments require considerable background knowledge. For
example, how does the political system work? Is anything amiss in the events
that have been uncovered? What is normally included in the type of source
(e.g., minutes of a cabinet meeting)?
Unfortunately, for most social science research questions, actors of-
ten have strong incentives to distort content. Even seemingly innocuous
data such as unemployment figures can be politically distorted, for
example by excluding parts of the labor force to
produce lower unemployment figures. More insidious are the validity prob-
lems created by public authorities deliberately cooking the books, as seen in
the systematic underreporting of levels of political violence in authoritarian
systems.
When our sources have strong interests in distorting content, it can be
nearly impossible to evaluate with any degree of confidence the level of bias
without extensive corroboration with other independent sources. Other
things equal, we should therefore assume a much lower empirical certainty
or uniqueness when the political stakes are high unless we can provide strong
justifications for trusting a particular source.
In general, open sources such as public statements contain more poten-
tial bias, given that there can be many reasons for strategic communication
in public. This means that we should place greater trust in confidential re-
cords that will be made public only long after events have taken place, other
things equal (Khong 1992; Trachtenberg 2006: 151–53). Further, we should
trust accounts that go against the motives we would expect a given actor to
have, although this can be very difficult to establish. Historians tend to trust
interviews much less than archival documents, given that respondents “have
a real interest in getting you to see things in a certain light” (Trachtenberg
2006: 154). However, our default position regarding all sources should be
one of skepticism, and any ranking is quite arbitrary. Wohlforth (1997: 229)
warns, “Documentary records are ambiguous in their intrinsic meaning for
many reasons, not least because the historical actors who created them were
deliberately deceptive in their efforts to bring about desired results. Theories
and interpretive debates focus on precisely the information that statesmen
and diplomats face incentives to obscure.”
When assessing actors’ potential incentives actors to distort, we can ask
212 Process-Tracing Methods
ourselves whether, on the basis of what we know about the case, the par-
ticular source might have institutional or personal interests in skewing the
account. For example, if we are studying decision-making before the 2003
Iraq War, we might be very pleased with ourselves for securing an interview
with a central participant such as former secretary of state Colin Powell. Yet
if we assessed motives even superficially, we could seriously doubt Powell’s
accuracy as a witness to events, since he has strong institutional and personal
interests in producing an unflattering account of the Bush administration
and particularly former vice president Cheney and former secretary of de-
fense Rumsfeld. Powell “lost” the internal debates and consequently was
publicly humiliated when he went to the United Nations Security Council
before the invasion with dubious intelligence. His position finally became
so untenable that he left office, meaning that his revenge motive is quite
strong. Of course, if Powell were to give an account that ran against what we
expected based on all available information regarding his potential motives,
we would tend to trust it even more. Here, the Bayesian reasoning is this:
How probable is it that Powell would tell us an account that manifestly goes
against everything we know about his potential motives unless it is actually
a true account? Conversely, actors might have motives of which we are not
aware, meaning that there is a limit to our confidence in accounts that ap-
pear to go against actors’ interests.
Area experts often have a horse in the race and thus cannot be treated as
neutral, nonbiased sources.
Interviews
Archival Material
ridden analysis of civil rights leader Martin Luther King Jr.’s activities, “The
number one thing I’ve learned in 40 years of doing this, is just because you
see it in a top-secret document, just because someone had said it to the FBI,
doesn’t mean it’s all accurate” (cited in Phillips 2017). We therefore need to
be very careful before we assume that all documentary materials are hard
primary sources.
Archival material can provide all four types of evidence. Pattern evidence
could, for example, entail counting the number and length of documents.
Sequence evidence could be in the form of documents describing what
meetings took place over a period of time (e.g., agendas for meetings). Trace
evidence could take the form of meeting minutes, proving that a meeting
actually took place. Finally, meeting minutes could also be used as account
evidence for what took place in a meeting.
The first step in evaluating the accuracy of archival material is examining
the authenticity of a document. If something seems amiss in the author-
ship, time period, style, genre, or origin of a document, we must uncover
its past to evaluate whether the source can tell us anything. Relevant ques-
tions are: (1) Was the document produced at the time and place when an
event occurred, or was it produced later and/or away from where the events
took place? (2) Is the document what it claims to be? (3) Where and under
what circumstances was it produced? (4) Why was the document created? (5)
What would such a document be expected to tell us? Naturally, if the docu-
ment is not authentic, it has no inferential value for us.
The rest of this section deals primarily with challenges related to deal-
ing with official archives, such as foreign ministerial archives. However, for
many social science research questions, such archives may be either unin-
formative or irrelevant. The personal archives of participants can also be
relevant but can raise daunting challenges of access. And for many research
questions, there are no relevant archives, meaning that the historical record
must be traced using other types of sources.
The historical record is usually quite ambiguous (Wohlforth 1997), and
we do not suggest that social scientists enter the archives before they have
operationalized some tests of the presence/absence of each part of the hy-
pothesized causal mechanism. Finding answers in archival records is often
akin to finding a needle in a haystack. Without a clear set of theory tests to
guide the search, it can be more akin to a mission impossible—finding some
small unspecified object in the haystack.
Before we can admit observations gathered in archival material as evidence
Historical Scholarship
A social scientist may unwittingly select works produced by historians who
share his or her theoretical priors, resulting in the
mistaken confirmation of the social scientist's preferred theory (Lustick
1996: 606).
For example, 1950s scholarship on the start of the Cold War tended to at-
tribute the conflict to either the innate expansionistic tendencies inherent in
communist doctrine (the “neurotic bear”) or Soviet behavior as a traditional
great power (T. J. White 2000). Revisionist accounts of the 1960s then con-
tended that U.S. expansion in keeping with capitalist interests triggered the
Cold War (e.g., W. Williams 1962). During the 1970s, the “postrevisionist”
school saw the Cold War primarily as the product of misunderstandings that
could have been avoided at least to some degree (e.g., Gaddis 1972). If we are
interested in testing a liberal theory dealing with the impact of perceptions
and misperceptions, we would find supporting evidence in the postrevision-
ist school and disconfirming evidence in the previous two schools. In Bayes-
ian terms, such bias implies a form of rigging of results that undermines our
ability to update our confidence in the accuracy of a hypothesis.
Lustick’s (1996) solution to this bias problem is to first know the historio-
graphic schools and then triangulate across them. We should select histori-
cal works that are best suited to provide a critical test. If we are testing an
ideational mechanism dealing with actor perceptions, we should not choose
works from the postrevisionist school as sources of evidence.
Newspaper Sources
Chapter 7
7.1. Introduction
Once we have evaluated our sources for what they tell us and whether we
can trust them and evaluated theoretically what a particular fingerprint can
tell us, we possess individual pieces of mechanistic evidence that enable us
to confirm or disconfirm to some degree the existence/nonexistence of a
mechanism or a part thereof. But when we are working with mechanistic
evidence in process-tracing, we typically operate with many different pieces
of evidence for the hypothesized causal process (or for each part of the causal
mechanism when using in-depth process-tracing). This means that we need
to aggregate multiple pieces of mechanistic evidence into a collective
picture. This chapter develops a procedure for the aggregation of
mechanistic evidence using argument road maps that delineate the two-
stage evidence-evaluation framework (from theory to propositions and from
propositions to actual sources of observations). The chapter concludes with
an extended reconstruction of the two-stage evidence-evaluation framework
as applied to Tannenwald’s (1999) case study of the nuclear taboo in the
Korean War.
The first step in aggregating mechanistic evidence in support of infer-
ences about a causal relationship is to map the structure of the arguments
about the empirical fingerprints that a causal relationship might have left
(propositions) and the actual evidence found (or not found) for each propo-
sition. Some propositions might have the character of a smoking gun in rela-
tion to the overall causal relationship being studied because they are very
theoretically unique (van Evera 1997), where finding evidence of the propo-
sition is highly confirming but not finding it would do little to disconfirm
the causal relationship. Other propositions can be more like hoops, with
high theoretical certainty—that is, not finding supporting evidence would
be highly disconfirming, but finding it tells us little (van Evera 1997). We
also have to position each found/not found observation in relation to each
of these propositions about mechanistic evidence.
In most process-tracing research situations, there is a complex, multilayer
structure in the collective body of evidence.1 Multiple propositions about evi-
dence may be related to each part of the mechanism. Even more complex is
the fact that some propositions can be logically composed of multiple support-
ing propositions. For example, the famous smoking gun is actually a cluster
of supporting propositions about evidence in the form of “the gun is the same
as used in the crime” and “the suspect’s fingerprints are on the gun.” If, for
example, the gun is the same one used in the crime, but no evidence links the
suspect to the gun, the found gun is not the proverbial smoking gun.
Propositions about mechanistic evidence need to be decomposed into
supporting propositions because each of them might have different logi-
cal relationships to the proposition: some might be theoretically unique,
whereas others might be theoretically certain. In the case of the cluster of
propositions related to the smoking gun, both are theoretically certain indi-
vidually but only theoretically unique when combined together.
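Assuming the two supporting propositions are independent under ~h, the arithmetic behind this combination is simple: the probability of finding both fingerprints without the mechanism is the product of the individual probabilities, so the cluster is jointly far more unique than either part alone. The numbers below are invented.

```python
# p(found | ~h) for each supporting fingerprint on its own (hypothetical)
u_gun_matches_crime = 0.5   # "the gun is the same as used in the crime"
u_prints_on_gun = 0.5       # "the suspect's fingerprints are on the gun"

# Jointly, assuming independence under ~h: far more unique combined.
print(u_gun_matches_crime * u_prints_on_gun)  # 0.25
```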
Levels of theoretical certainty and uniqueness are developed for the logical
links between propositions and supporting propositions about empirical fin-
gerprints. Empirical uniqueness or certainty is assessed for the links between
found/not found observations and propositions and supporting propositions.
Aggregation proceeds from the bottom up: we assess each found/not found observation
individually and then evaluate what it can tell us about the existence of the
proposition before we move on to evaluating how the supporting proposi-
tions relate to overall propositions, and so on.
In the argument road map depicted in figure 7.1, we are testing a causal
theory about a suspect who is hypothesized to have killed someone. Proposi-
tion 1 states that there should be evidence that the victim was actually killed
by a gun. This proposition is highly theoretically certain, for if the person
was killed by a gun, it would leave clearly observable signs (fingerprints).
If our source of the observation is an autopsy performed by a credible and
competent authority, we should also expect that the empirical uniqueness
and certainty will be relatively high. This means that if the autopsy tells
us that the person was killed by a gun, we can beyond reasonable doubt
confirm that the person was killed by a gun. However, just because we find
strong confirming evidence that the victim was killed by a gun does not
mean that the suspect committed the crime: the proposition has low theo-
retical uniqueness in relation to the causal hypothesis.
Proposition 2 states, “There is a smoking gun that links the suspect to the
crime.” In relation to the overall causal hypothesis that the suspect is guilty
of murdering the victim, the smoking gun proposition is highly confirm-
ing (high theoretical uniqueness), but the proposition tells us little if we do
not find the smoking gun (low theoretical certainty). Each of the support-
ing propositions is theoretically certain but not unique individually; only
together are they relatively theoretically unique. Sources for proposition 2b
could be the suspect’s fingerprints on the gun or a witness who has seen the
suspect holding the gun and then dropping it on the ground as he or she
fled. The fingerprints, if found, would have relatively low empirical unique-
ness because they confirm only that the suspect held the gun at some point,
not that he actually used it.
Each of these propositions or supporting propositions has different mecha-
nistic evidence that might be relevant; even more important, the inferential
relationship in terms of empirical certainty or uniqueness between found and
not found evidence and supporting propositions is not necessarily the same
as the theoretical uniqueness and certainty between supporting propositions
and higher-level propositions and causal hypotheses. For example, relevant
confirming evidence for proposition 2 could be a forensic report that shows
a ballistic match between the smoking gun and the crime. However, finding
strong confirming evidence of P2a alone does not enable us to confirm the
smoking gun proposition 2; in the absence of other evidence, it could be just
as probable that someone else used the gun to kill the victim. In contrast, if we
find a credible witness who testifies that she saw the suspect fire the gun at the
victim (evidence of P2b), given that (1) the piece of found evidence is highly
empirically unique, (2) proposition 2b is highly theoretically unique in rela-
tion to proposition 2, and (3) proposition 2 is theoretically unique in relation
to hypothesis 1, we could, based on this one piece of evidence, confirm the
overall hypothesis 1 with a reasonable degree of confidence.
An additional consideration is whether the pieces of evidence support-
ing a particular proposition are independent of each other, in which case
they can be summed together to enable more updating, or whether they
have varying degrees of dependence, which lowers our ability to sum them
together.
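For bookkeeping purposes, it can help to record such a road map in a simple data structure before filling in the evidence. The sketch below encodes the murder example with qualitative labels; the class, field names, and labels are our own invention rather than a fixed scheme.

```python
from dataclasses import dataclass, field

@dataclass
class Proposition:
    """One node in an argument road map."""
    claim: str
    certainty: str    # theoretical certainty: disconfirming power if absent
    uniqueness: str   # theoretical uniqueness: confirming power if found
    supporting: list = field(default_factory=list)

road_map = Proposition(
    "H1: the suspect killed the victim", "n/a", "n/a",
    supporting=[
        Proposition("P1: the victim was killed by a gun", "high", "low"),
        Proposition(
            "P2: a smoking gun links the suspect to the crime", "low", "high",
            supporting=[
                # P2a and P2b are only jointly unique in relation to P2 (see text).
                Proposition("P2a: the gun is the same as used in the crime", "high", "low individually"),
                Proposition("P2b: the suspect fired the gun", "high", "low individually"),
            ],
        ),
    ],
)

def walk(node, depth=0):
    """Print the road map from the hypothesis down to supporting propositions."""
    print("  " * depth + f"{node.claim} (certainty: {node.certainty}, uniqueness: {node.uniqueness})")
    for child in node.supporting:
        walk(child, depth + 1)

walk(road_map)
```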
The argument road map can be presented in table form, as shown in table
7.1. A table is an efficient form of presentation when dealing with multilevel
propositions and/or more pieces of evidence. Presentation in table form also
enables us to provide transparent justifications for why we believe, for ex-
ample, that particular sources of evidence are highly empirically certain and/
or unique in relation to a given proposition. The “lowest” level is the actual
pieces of empirical evidence on which inferences are based, followed by the
logical links from empirical evidence to propositions and/or supporting prop-
ositions. We could also present our case study in an analytical narrative form,
but key parts of the narrative should be explicitly linked to the underlying set
of propositions and evidence and the justifications for why empirical material
is evidence (theoretical and empirical certainty or uniqueness). The table can
then be presented as an appendix (either printed or online). However, aggre-
gating pieces of evidence is not just a simple additive process in most research
situations. In particular, we often face situations where we find both confirmatory
and disconfirmatory pieces of evidence (a mixed evidential picture) in re-
lation to propositions, or we find evidence only for some propositions but not
for others in relation to the overall causal mechanism being assessed.
When aggregating, we therefore first need to map the structure of the
propositions about empirical fingerprints left by activities and how they re-
late logically to the underlying hypothesized causal mechanism. This takes
the form of an assessment of theoretical certainty and uniqueness, where
justifications are provided for why the proposed empirical fingerprints could
be evidence of the activities of the mechanism or a part thereof. This should
be followed by the evaluation of the found (or not found) observations for
each proposition or supporting proposition in terms of what it can tell us
about the proposition and whether we can trust it (empirical certainty or
uniqueness). Using the road map and the logical links between evidence
and propositions, we then start the aggregation process at the lowest level
proposition, assessing in which direction the weight of evidence for each
proposition or supporting proposition points. For example, if we have found
multiple independent pieces of confirmatory evidence that we can trust for
a given proposition, we would add together their probative value, enabling
a strong confirmatory inference. However, the picture is often murkier, with
different pieces of evidence pointing in different directions. When we face
this common situation, the aggregation process must be as transparent as
possible, with clear justifications for why we conclude that the collective
body of evidence points more in one direction than the other.
We develop two sets of rules for aggregating evidence, enabling systematic
and transparent evaluation of which causal inferences, and what degree
of confidence in them, are warranted based on the collective body of
empirical evidence. The first set of rules relates to adding together evidence
at one level; we then discuss how to move from lower to higher levels of
proposition using the second set of rules.
The basic rule for aggregating evidence is that two pieces of evidence that are
independent of each other have an additive effect in terms of the amount of
updating they enable (Fitelson 2001: S125; Good 1991: 89–90). This results
in rule 1a:
2. For the debate on this issue of "old" evidence and sequential updating, see, e.g., Eells and
Fitelson 2000; Gallow 2014; Weisberg 2009.
is taken in single bites, where the first bite updates our confidence and then
informs the prior for the next bite of evidence. We use quantified terms in
this example, but we believe that numbers cannot meaningfully express the
probative value of evidence in process-tracing.
In simplified terms, if finding E1 updates our prior confidence by 5 per-
cent and E2 also updates it by 5 percent, then finding both would update
our confidence by 10 percent, contingent on E1 and E2 being independent
of each other (rule 1b).3 In principle, this means that adding more pieces of
“weak” evidence will provide stronger evidence confirming or disconfirming
a given proposition, contingent on rules 1b and 1d holding.4 However, there
is also a pragmatic limit to adding more pieces of weak evidence. Rule 403 of the US Federal Rules of Evidence states, “The court may exclude relevant evidence if its probative value is substantially outweighed by a danger of one or more of the following: unfair prejudice, confusing the issues, misleading the jury, undue delay, wasting time, or needlessly presenting cumulative evidence” (emphasis added; Michigan Legal Publishing 2018). When adding more pieces
of evidence would provide only a small amount of additional updating and
would provoke boredom, no more evidence should be produced. In other
words, when adding new pieces of evidence does not tell us anything new,
we should stop.
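The additive logic can also be stated in informal Bayesian terms, offered here only as an expository reconstruction given our reservations about quantification. For two pieces of evidence that are conditionally independent given the hypothesis, the likelihood ratios multiply:

\[
\frac{P(H \mid E_1, E_2)}{P(\lnot H \mid E_1, E_2)}
= \frac{P(H)}{P(\lnot H)}
\times \frac{P(E_1 \mid H)}{P(E_1 \mid \lnot H)}
\times \frac{P(E_2 \mid H)}{P(E_2 \mid \lnot H)}
\]

Taking logarithms turns the two likelihood ratios into terms that literally add, which is the formal counterpart of the additivity rule; if E2 is not independent of E1, its ratio must be conditioned on E1 and shrinks toward 1.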
If the evidence points in different directions in relation to a particular proposition, we use the direction and relative probative value of the individual pieces to determine the overall inference they enable. If we find two pieces of confirming evidence that each enable 1 percent updating but fail to find a highly certain piece of evidence (an absence that constitutes disconfirming evidence and reduces our confidence by 10 percent), the overall result might be a disconfirming inference in which we are 8 percent less confident in the proposition holding based on the evidence. According to rule 1b,
3. This is a simplification, given that the subsequent test would actually provide slightly less updating, which is also contingent on how closely our prior confidence approaches either 0 percent or 100 percent (see rule 1d on the declining marginal effects of adding more pieces of evidence).
4. Bennett (2014) talks about adding multiple straw-in-the-wind pieces of evidence to
paint a more compelling overall picture. We agree with this, though with the caveat that we
should avoid a more-is-always-better logic in which we attempt to mask the poor quality of
the evidence (i.e., low probative value) with a high quantity.
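Returning to the mixed-evidence example above, the percentage shorthand (which, we stress, is purely illustrative) nets out as follows:

\[
\Delta \approx (+1\%) + (+1\%) + (-10\%) = -8\%
\]

The collective body of evidence thus warrants a disconfirming inference, with the two weak confirming pieces unable to outweigh the absence of the highly certain fingerprint.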
Both negotiating parties may have incentives to portray an outcome as a mutually acceptable 50–50 deal where no one “won” or “lost.” The winner might want to
avoid gloating about victory because she is engaged in a long-term, iterated
game of repeated negotiations with the loser, whereas the loser might want
to avoid being portrayed as such for domestic political reasons. Therefore,
we cannot assume that two actors on opposing sides are independent: we
must empirically verify their independence. And we will never be 100 per-
cent confident—a degree of uncertainty will always exist about the indepen-
dence of our sources.
Rule 1c states,
If we interview multiple participants about the same situation and find that their accounts are broadly similar, and if we can verify
that the accounts are independent of each other, we would increase our
confidence in the accuracy of the evidence (more empirically unique),
given that it would be highly unlikely to find similar accounts unless the
observed material is a true measure of what happened. However, finding
similar accounts could also mean that the participants met afterward to
agree on a common account of the events or were just retelling the publicly
available account of the event. Finding that the accounts are too similar should actually decrease our assessment of the empirical uniqueness of the observations quite dramatically, as it is highly unlikely that genuinely independent accounts would show close-to-perfect correspondence down to particular phrases. Here, the
collected evidence is too good to be true, meaning that such a close fit with
our expectations is not plausible and therefore should significantly down-
grade our confidence in the empirical uniqueness of the observations. If
we are using a secondary source and the collected evidence matches too
closely—more than seems reasonable—this would suggest that we have
selected a biased historical source whose implicit theories run in the same
direction as our own (Lustick 1996: 608, 615). In this situation, we should
significantly downgrade our assessment of the source’s accuracy, with the
result that the collected evidence would not enable us to make inferences
about the veracity of our hypothesis.
Corroboration does not increase confidence in empirical uniqueness un-
less we can substantiate that sources are actually independent of each other.
Doing three interviews and postulating that we have triangulated sources
is not enough—we need to provide proof that the interviews are actually
independent of each other. And so rule 1d holds,
Claims about propositions can never be stronger than the actual pieces of
evidence that support them directly or through supporting propositions.
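The independence requirement behind corroboration can also be stated in the same informal Bayesian terms (again, our expository gloss rather than a formula to be computed). A second source adds probative value only to the extent that it is independent of the first. At the extreme, if a second account E2 is fully derivative of E1, for example a mere retelling of the same public account, then

\[
\frac{P(E_2 \mid E_1, H)}{P(E_2 \mid E_1, \lnot H)} = 1
\]

and the “corroborating” source enables no additional updating at all.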
Figure 7.4 illustrates a situation with a mixed evidential picture but in
which we could reasonably conclude that there is support for the overall
hypothesis because the confirming evidence for proposition 1 is stronger
than the disconfirming evidence for proposition 2. Here, the contrast in the relative probative value of the two bodies of evidence determines the direction of the overall inference.
Tannenwald (1999: 443–51) uses four parallel case studies to assess whether
norms against the use of atomic weapons (C) affected the U.S. decision not
to use them after 1945 (the outcome, O). She uses within-case evidence to
assess whether C is causally related to O in the Korean War case.
While Tannenwald claims to put forward only one proposition about ev-
idence in relation to the causal hypothesis norms → non-use (“taboo talk”),
the Korean War case study actually includes (at least) four propositions
about mechanistic evidence. When we then aggregate the found evidence
in support of the core propositions and supporting propositions, only very
weak confirming evidence appears to support her core proposition (1—the
operation of norms leaves empirical fingerprints) (table 7.2), mostly because
we are unable to assess what finding “taboo talk” by a particular actor means
in the context of the case (low empirical uniqueness). Similarly, while she claims that norms against using nuclear weapons may have resulted in a lack of planning and readiness (proposition 3: inhibitions about nuclear weapons may have operated in more indirect ways as well, e.g., by influencing perceptions about suitable targets and the state of readiness for tactical nuclear warfare), the aggregated evidence does not warrant overall confirmation of this proposition (see table 7.2).
Table 7.2 (excerpt). Propositions 3 and 4 in Tannenwald’s Korean War case study

Proposition 3: “Inhibitions about nuclear weapons may have operated in more indirect ways as well, for example, by influencing perceptions about suitable targets and the state of readiness for tactical nuclear warfare” (446). (No information provided on theoretical certainty or uniqueness.)
• Theoretical certainty and uniqueness not discussed explicitly, meaning that we are not told how this claim relates to the underlying causal hypothesis.
• Observation P3(i): “In March 1951 a Johns Hopkins University research group working with the Far East Command informed MacArthur that many ‘large targets of opportunity’ existed for nuclear attack. But the group found U.S. forces ill-prepared for tactical nuclear warfare” (446).
  • Lu: No information given on whether this finding is linked to moral inhibitions or merely states that the United States was not ready (low uniqueness). We also have no information about the context of the remarks, preventing us from further evaluating why it could be evidence of P3.
  • No confirmation of proposition 3.
• Observation P3(ii): “Truman’s general reluctance to consider nuclear weapons as like any other weapon . . . must be taken into account. Because of this, as Rosenberg and others have documented, U.S. planning for nuclear warfare lagged in the years before Korea” (447).
  • Lu: Citing three historical works without discussion means that we cannot evaluate accuracy. The two other works could build on Rosenberg’s, but we have no information that would enable us to assess whether or not this is the case. Some historiographic source criticism is warranted.
  • Weak confirmation of proposition 3.
• Aggregation of evidence for proposition 3:
  • Tannenwald’s conclusions: “In short, inhibitions about using nuclear weapons in general may have delayed readiness and planning for tactical nuclear use” (447).
  • No information about independence of sources given (P3ii might have built on P3i).
  • Overall confirmation not warranted, given that P3i is not really evidence of the link between planning and readiness and norms, and P3ii is of questionable accuracy.

Proposition 4: “US leaders perceived a taboo developing and sought to challenge it” (448). (No information provided on theoretical certainty and uniqueness of the empirical proposition.)
• Theoretical certainty and uniqueness not discussed explicitly, although it is relatively obvious that, if found, it would be difficult to account for leaders challenging a taboo unless a taboo actually existed.
• Observation P4(i): “General Nicholas . . . expressed his disappointment over the failure to use nuclear weapons in Korea. . . . ‘I knew that many individuals in the United States opposed such thinking for idealistic, moral, or other reasons’” (447).
In sum, during the Korean War, an emerging taboo shaped how U.S.
leaders defined their interests [no evidence is put forward that warrants
concluding that the norm defined interests]. In contrast to the moral
opprobrium Truman personally felt [weak confirming evidence of
proposition 1 supports this], the taboo operated mostly instrumentally
for Eisenhower and Dulles, constraining a casual resort to tactical
nuclear weapons [not warranted—only evidence for their attempt to
counter the emerging norm (proposition 4) is produced]. The burden of
proof for a decision to use such weapons had already begun to shift.
For those who wanted to challenge the emerging taboo, the best way
to do so would have been to actually use such weapons, but the politi-
cal costs of doing so were already high [confirmation of proposition 2
supports the last claim about political costs, but the first parts of the claim
are speculative]. . . . [T]he political debate over the categorization of
nuclear weapons as “unordinary” weapons suggests the early develop-
ment of constitutive effects [evidence for proposition 1 suggests that this
might be warranted, although the evidence is very weak—therefore using the term “suggests” is warranted]. (Tannenwald 1999: 448; bracketed text added)
Chapter 8

Theory-Testing Process-Tracing

8.1. Introduction
Minimalist theory-testing process-tracing is typically used in research situations where we know relatively little about what types of mechanism link a given cause and outcome and under which conditions a particular mechanism provides the link. It makes sense
first to engage in a form of a plausibility probe, where mechanisms are not
unpacked in any detail. In this situation, we first want to know whether a
causal link exists; if it does, we then want to determine what mechanism
links a cause (or set of causes) and an outcome. Only then do we turn to
learning about the inner workings of that mechanism. And after we have
engaged in more in-depth process-tracing of one or more cases, we can use
minimalist process-tracing to determine whether what we found in the stud-
ied case(s) also holds in other cases within the population of causally similar
cases (e.g., cases with similar contextual conditions).
The key distinctions in minimalist theory-testing relate to how “un-
packed” the mechanism is in theoretical terms (superficial or incomplete)
(see chapter 2) and to how many different empirical fingerprints are op-
erationalized for the mechanism, ranging from a singular proposition to a
cluster of predictions. Figure 8.1 depicts in abstract terms the research pro-
cess of minimalist process-tracing in a single case. The mechanism shown
is a superficial one that is not disaggregated into parts. In a singular test,
a proposition about potential evidence is assessed multiple times during a
temporal process or across space (e.g., issues in a negotiation). In the cluster
test, multiple nonoverlapping propositions about evidence are assessed em-
pirically. Operationalizing both singular empirical tests and clusters of tests
becomes considerably easier when potential activities associated with the
process are made as explicit as possible.
Singular tests of superficial mechanisms can be thought of as simple
plausibility probes that can be deployed early in a research process or as
follow-ups aimed at generalizing insights about mechanisms across addi-
tional cases. In contrast, a stronger version of minimalist process-tracing
involves clusters of empirical tests and/or unpacking the process to some
extent (incomplete mechanistic explanations). In the following, we first dis-
cuss the differences between singular and clusters of tests, followed by a
discussion of when to use plausibility probes and more robust versions of
minimalist process-tracing.
A singular test operationalizes one proposition about evidence for the process as a whole (see Van Evera 1997). With a singular proposition, the more (relatively independent) sources we find suggesting that the proposed empirical fingerprint is present, the more confident we can be that we have found the mechanistic evidence. Finding the predicted fingerprint multiple times also increases
the empirical uniqueness of the evidence, because we can rule out the alter-
native empirical explanations of the evidence—for example, that the source
does not represent the full empirical record. The exception to this “more is
better” logic would be if the first found observation itself was highly empiri-
cally unique—for example, if we can trust it and it documents the activities
of a particularly central actor at a particularly critical juncture.
A cluster, in contrast, involves operationalizing a battery of nonoverlap-
ping propositions to assess the causal process. These propositions about evi-
dence often are theoretically certain but not very unique (also termed “hoop tests” in the literature; Van Evera 1997). Taken individually, such tests enable only disconfirmation, but if we find independent evidence for each of the propositions, the small probative value of each individual proposition is summed, enabling a relatively strong overall confirmation of the theory.
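The asymmetry of hoop tests is easy to see in the informal Bayesian vocabulary used in chapter 7 (an expository sketch with stylized numbers of our own choosing). A theoretically certain but not very unique proposition means the fingerprint E is nearly guaranteed under the hypothesis but also fairly likely otherwise, say P(E | H) = 0.95 and P(E | ¬H) = 0.8:

\[
\text{found } E:\ \frac{P(E \mid H)}{P(E \mid \lnot H)} \approx \frac{0.95}{0.8} \approx 1.2, \qquad
\text{absent } E:\ \frac{P(\lnot E \mid H)}{P(\lnot E \mid \lnot H)} = \frac{0.05}{0.2} = 0.25
\]

Finding the fingerprint thus confirms only weakly, while failing to find it disconfirms markedly.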
For example, our propositions might make predictions about evidence in the form of the timing and nature of decisions that we should find if the theorized process actually took place.
Singular tests are also useful later in the research process, when we have developed a singular test that can act as a signature of a process, enabling us to engage in a broader exploration of the bounds of valid generalization about a mechanism (see chapter 4).
Even though we are employing weak tests in a plausibility probe, the
analysis offers important insights that enable us to refine theories of mecha-
nisms and/or choose to deploy stronger empirical tests. If we do not find
the predicted evidence of our propositions, we would want to engage in a
more comprehensive assessment of why. Was the absence of the predicted
evidence the result of deploying a theoretically low certainty test? To assess
whether the negative finding resulted from the weak test, we should repeat
the plausibility probe with one or more different tests to conclude more con-
fidently that there is or is not a causal relationship. We would then want to
expand our probing outward by brainstorming other potential causal links
that could then be tested empirically. If we still do not find any confirming
evidence, we can then cautiously conclude that—at least for the selected
case—there is no evidence of a link. If we find confirming evidence, we can
then proceed to a stronger case study test of the theorized mechanism.
If we have higher prior confidence in the presence of a theoretical rela-
tionship because of either existing research or our own plausibility probe, we
can then deploy stronger empirical tests in the form of a cluster of nonover-
lapping propositions. If the propositions are independent of each other, their
probative value can be summed together (see chapter 7). The individual tests
in a cluster often are not very theoretically unique, but taken together they
might enable relatively strong inferences. If they are theoretically certain
(either by themselves or together), not finding them would enable relatively
strong disconfirming inferences to be made.
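Continuing the stylized numbers above (ours, not a recipe): five independent hoop-style tests, each passed with a modest likelihood ratio of 1.2, jointly yield

\[
1.2^5 \approx 2.5
\]

a respectable shift in the odds, whereas adding a failed theoretically certain test (ratio 0.25) to the five passed ones would push the product back below 1 (2.5 × 0.25 ≈ 0.6). This is the sense in which a cluster of individually weak tests can jointly enable relatively strong inferences, provided independence holds.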
The strength of cross-case inferences depends in part on the number of cases within the population studied: the more cases we study within a population, the better we can infer that what was found in the chosen cases should also be present in other typical cases throughout the population (Rohlfing 2012). Although more rigorous studies of individual cases
enable additional updating about causal relations, other things equal, they
also require significant analytical resources, meaning that in most research
situations we can never conduct more than two or three rigorous case stud-
ies. An additional challenge is more practical, in that presenting the results
of rigorous case studies in a transparent fashion requires significant space,
ruling out more than one or two case studies in article-length work. Yet if
we study only one or two cases in a bounded population of a dozen or more
cases, our cross-case inferences are very weak at best. The trade-off involves
increased confidence about a few cases versus information about a greater
number of cases. One cannot have one’s cake and eat it too.
Brooks and Wohlforth (2000–2001) assess whether a causal link exists be-
tween Soviet economic decline and Soviet leaders’ decisions to engage in a
major strategic shift of policy that resulted in the end of the Cold War. The
causal mechanism linking them together is not unpacked, meaning that it
remains firmly in a theoretical gray box, as figure 8.2 depicts.
The authors test whether material constraints mattered by deploying a
cluster test of three distinct predictions of evidence of a relationship be-
tween the occurrence of the cause and the outcome. The description of their
propositions, in the form of the evidence that the authors “predict” that they
will find and an explanation of why its presence would constitute confirm-
ing evidence, is not as explicitly operationalized as it could be. In particular,
the theoretical certainty and uniqueness could be better developed. At least
three predictions of evidence can be understood as nonoverlapping theoreti-
cally certain propositions (see figure 8.2). We do not describe here the empirical sources the authors used to evaluate whether the predicted evidence was actually present, although they use multiple sources for each proposition.
In-depth theory-testing requires that we theorize each of the parts linking the cause to the outcome, along with the contextual conditions that can be expected to affect the functioning of the mechanism.
The theoretical conceptualization of the entities includes nouns, while
the activities include verbs that transmit causality through the mechanism.
In social science terms, social entities have causal powers, which can be un-
derstood as “a capacity to produce a certain kind of outcome in the presence
of appropriate antecedent conditions” (Little 1996: 37). A good theorized
mechanism as a system should clearly describe what links together each of
the parts, ideally exhibiting productive continuity between cause and out-
come in a more or less seamless causal story. In practice, our theorized mech-
anisms will often be more mechanistic sketches, but the critical difference
between minimalist and systems understandings is whether our mechanistic
explanation provides a good enough account of how the cause is linked to
the outcome through a process.
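To make this grammar of entities and activities concrete, a theorized mechanism-as-system can be sketched as a simple data structure. The rendering below is our own stylized illustration (it loosely anticipates Janis’s groupthink mechanism, discussed in chapter 9), not a template that must be followed:

```python
from dataclasses import dataclass

@dataclass
class Part:
    entity: str    # the noun: who or what acts
    activity: str  # the verb: what the entity does to transmit causality onward

@dataclass
class Mechanism:
    cause: str
    parts: list    # ordered parts; each activity should link productively to the next
    outcome: str
    context: list  # contextual conditions under which the mechanism can operate

# A rough, illustrative rendering of a groupthink mechanism (see chapter 9):
groupthink = Mechanism(
    cause="cohesive decision-making group under stress",
    parts=[
        Part("group members", "suppress personal doubts"),
        Part("the group", "develops shared illusions of unanimity and invulnerability"),
        Part("the decision process", "forgoes critical appraisal and debate"),
    ],
    outcome="complacent, overconfident policy decision",
    context=["insulated decision-making group"],
)
```

Writing a mechanism out this way forces us to check for productive continuity: each activity should plausibly produce the conditions acted on by the next part.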
The amount of theoretical work necessary to flesh out a causal mechanism depends on how well developed existing theorization is in the given research situation.
While large differences exist in how the natural and social sciences conduct within-case analysis based on mechanistic evidence, in some social science research situations it is appropriate to focus our analytical attention on the detailed workings of a mechanism.
Conceptualization
Before a mechanism can be theorized, we must define the condition (C) and the outcome (O), attempting to conceptualize what it is about C that can trigger a mechanism or mechanisms. What are the spatial and temporal bounds within which the theorized link operates (see chapter 3)? In simple terms, we should be able to clearly state what a given case is a “case of”—for example, the democratic peace, where two democratic countries resolve their conflicts in a peaceful fashion in a situation in which nondemocracies might have gone to war. If C and/or O is not known, a theory-building design should be adopted to allow their identification. This could involve probing case studies or comparisons such as a most-similar-systems design (see chapter 9).
Operationalization
After a typical case is selected, we estimate our prior confidence for the
mechanism (and each part when using in-depth theory-testing) to deter-
mine the relative strength of test required (see chapter 5). Predicted empirical fingerprints of the traces that activities associated with a mechanism (or parts thereof) should leave are then developed and evaluated for whether they are theoretically certain and/or unique (see chapter 5). Predictions about
what evidence we should find translate the theoretical concepts of the causal
mechanism into case-specific tests. We must evaluate whether the activity
would have to leave a particular fingerprint (theoretical certainty) and if that
fingerprint is found, whether any plausible alternative explanations exist for
it (theoretical uniqueness). Theoretically certain but not unique tests can
be thought of as hoop tests that provide relatively circumstantial evidence
that enables only disconfirming inferences. In contrast, theoretically unique
evidence is more direct evidence of the activities of a causal process because
it enables confirming inferences to be made.
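Stated in the informal Bayesian vocabulary of chapters 5 and 7 (an expository gloss rather than a formula to be computed), the two evaluations correspond to the two conditional probabilities of the predicted fingerprint E under the hypothesized mechanism H:

\[
\text{theoretical certainty} \sim P(E \mid H), \qquad
\text{theoretical uniqueness} \sim 1 - P(E \mid \lnot H)
\]

High certainty makes the absence of a fingerprint disconfirming; high uniqueness makes its presence confirming.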
The next step involves the collection and assessment of empirical observa-
tions (see chapter 6). Before our sources of evidence become mechanistic
evidence on which we can make inferences, we need to assess the content
and accuracy of found/not found empirical observations. Particularly im-
portant here is assessing what found/not found observations mean in a spe-
cific case context. Here, we must be as cautious as historians, not overstating
what found/not found observations actually mean and whether we can trust
them. As the adage holds, “The more you know, the less you know.” Unless
we can demonstrate that a found observation actually means what we think
it means and that it is trustworthy, we should not admit it as confirming
evidence.
After the probative value of each individual piece of confirming or disconfirming evidence is assessed, the final part of the assessment involves aggregating evidence for the mechanism or its parts (see chapter 7). If the evidence points in the direction of confirmation for each part of a mechanism, we can infer that the mechanism worked as theorized. If disconfirming evidence is found for a part (or for the overall link when using minimalist theory-testing), theory-building should then be used (see chapter 9).
Generalization
In Löblová’s (2017) study of the adoption of health technology assessment (HTA) in Poland and the Czech Republic, the cause is the presence of an epistemic community advocating HTA, and the outcome is the adaptation of government policies in line with the preference of the epistemic community. Haas’s (1992) theory is not explicit about
the cause that triggers the mechanisms, but the theory seems to generally
assume that decision-makers face complexity and lack of information about
possible solutions in the specific situation. To establish a possible mecha-
nism between C and O, Löblová uses Haas’s suggestion regarding how such
a mechanism might look:
Prior Confidence

Löblová does not utilize very explicit Bayesian language, but she indicates that she has a relatively high prior confidence in the presence of the causal relationship in Poland and the Czech Republic based on the existing literature, while also noting that the literature has not actually assessed Haas’s postulated causal mechanism. Her reading of the existing literature thus supports having a relatively high prior confidence in the causal relationship but also leaves us somewhat in the dark regarding the functioning of the causal mechanism itself. This suggests that Löblová believes that the prior confidence in the mechanism itself is relatively low, meaning that she is doing a form of plausibility probe, where any evidence of a mechanism updates our confidence. Her choice not to develop stronger reasoned propositions and test them using stronger empirical evidence is therefore justified. When we have low prior confidence, even relatively weak confirming evidence can be enough to update our confidence. Löblová (2017: 7) therefore argues for the need to test the workings of the causal mechanism “by verifying the mechanism with the help of a case of an epistemic community’s success”; doing so would enable us to “see if all the individual parts Haas proposes are necessary to how an epistemic community influences policy.”

Table 8.1 (excerpt). Evidence for proposition 2 in Löblová’s study

Evidence of P2(i): “The first step, though, was capacity building in the topic. Neither Poland nor the Czech Republic had any experience with HTA and had only limited expertise in pharmacoeconomics in general. EBM was also a relatively new concept in both countries in the early 2000s. In fact, HTA activities in Poland and in the Czech Republic both evolved from an interest in quality issues in health care, and related notions of EBM and clinical guidelines, introduced to the domestic environment by prominent clinicians. Nowhere is this logical link more visible than in Poland, where first HTA reports were produced by a department of a newly established National Center for Quality Assessment in Health Care” (12). Source: Interviews I-110, I-185, I-27, I-73; Nizankowski and Wilk 2009.
  • Lu: Unclear how this piece of evidence actually relates to capacity-building and dissemination. We must assume that there is a high degree of accuracy in the evidence provided, but the evidence could be made more transparent.
  • Weak confirmation of proposition 2.

Evidence of P2(ii): “In the absence of formal university programs or trainings on HTA or health economics, HTA enthusiasts in both countries learned from textbooks and international conferences” (12). Source: Interviews I-10, I-110, I-169, I-185, I-27, I-4, I-94.
  • Lu: Difficult to evaluate whether this evidence is unique, since it is not clear whether the absence of a program actually induced enthusiasts to learn from textbooks, thereby weakening our ability to make confirming inferences based on the evidence found. Instead, it would be relevant to document that members took active measures to enhance and spread knowledge. The interview data might provide more direct proof of this.
  • Very weak confirmation of proposition 2.

Evidence of P2(iii): “CMJ employees in Poland received trainings in the early 2000s from international HTA academics” (12). Source: Nizankowski and Wilk 2009.
  • Lu: Not clear whether the training was part of a general capacity-building process. We also lack a clear presentation of the source.
  • Weak confirmation of proposition 2.

Evidence of P2(viii): “A nonprofit branch of a pharmacoeconomic consultancy (iHETA) gave a number of presentations on HTA and pharmacoeconomics and, similar to CEESTAHC activities in Poland, has offered health economics trainings since 2011, and ran a ‘capacity-building’ project cofinanced by Swiss development funds” (13). Source: iHETA 2013.
  • Lu: No direct reference to what CEESTAHC actually did in Poland or any direct sources on this.
  • Weak confirmation of proposition 2.

Aggregation of evidence for proposition 2: Generally weak discussion of the accuracy of the evidence provided. Each test has a very low degree of uniqueness. The low empirical uniqueness of all of the found sources means that it is possible only to provide weak confirmation.

Inferences made: The evidence suggests that the expected part of the causal mechanism was present in the case of Poland.

Discussion of whether inferences are warranted: More discussion is needed of why the empirical material is evidence, first by developing more clearly the logical relationship between the propositions about observables and the parts of the causal mechanism and then via discussions of why the actual empirical material is evidence for each of the propositions. There is little or no source criticism, although the breadth of sources and the multiple pieces of evidence used increase our confidence in measurement accuracy.
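The plausibility-probe logic described above can be illustrated with stylized numbers (ours, offered only for exposition): with a low prior confidence of, say, 0.2, even weak evidence carrying a likelihood ratio of 2 produces a noticeable shift,

\[
\frac{0.2}{0.8} \times 2 = 0.5 \quad\Rightarrow\quad P(H \mid E) = \frac{0.5}{1 + 0.5} \approx 0.33
\]

which is why even relatively weak confirming evidence can meaningfully update our confidence when we start from a low prior.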
As a whole, the empirical evidence for each individual part should enable us
to increase or decrease our confidence in the presence of the entire mecha-
nism. The level of our posterior confidence in the whole mechanism reflects
the lowest posterior level of confidence for each of the parts after our empiri-
cal research. Despite its innovations regarding theoretical clarity about the
causal mechanism, Löblová’s analysis only slightly increases our confidence
in the causal mechanism she theorizes. However, this use of relatively weak
evidence is warranted in her research situation, with her study framed as
a plausibility probe where even weak evidence of the disaggregated causal
mechanism is better than no evidence. Löblová’s article could, however, benefit from additional reflection on how the individual pieces of evidence are aggregated and on how the collected body of evidence increases or decreases our confidence in the presence of each part and of the whole mechanism.
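The weakest-link logic for moving from parts to the whole mechanism can be summarized compactly (again as a gloss, not a calculation to be performed):

\[
\mathrm{Conf}(\text{mechanism}) = \min_i \mathrm{Conf}(\text{part}_i)
\]

Strong evidence for most parts cannot compensate for weak evidence on any single part, because the causal chain is only as credible as its least-well-evidenced link.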
Chapter 9

Theory-Building Process-Tracing

9.1. Introduction
Theory-building process-tracing can be applied in one case or several, and the use of multiple cases increases our confidence that a candidate cause is not case-specific (Beach and Pedersen 2016a: 241–45; Rohlfing 2008).
Theoretical-revision process-tracing is used when we have engaged in
process-tracing in a number of typical cases and want to explore cases where
a cause (or causes) is present and the outcome should have occurred but did
not. In this type of deviant (consistency) case, process-tracing seeks to trace
the mechanism until it breaks down, thereby shedding light on omitted con-
textual and/or causal conditions that potentially must be present. The actual
tracing of mechanisms until failure is, however, an adjunct method used to
focus a subsequent comparative analysis of similar typical and deviant cases
and figure out what is missing in the deviant case.
A detective engaged in this type of probing might, for example, pick up a clue from the scene and then ask whether the clue could link the suspect’s car to the crime, whether there is any plausible theoretical link between the clue and the crime, and whether this proposition about evidence matches
what has been found empirically. By itself, the clue would be only a weak
piece of evidence in support of a proposition that a person who might be
a suspect was present, but if combined with other pieces of evidence, like
footprints at the crime scene that roughly match those of the owner of the
car, the evidence would be stronger confirmation of the theory.
Empirical probing is an iterative process in which hunches about poten-
tial clues are explored to determine what they can tell us about the newly
found candidate part of the process. If our initial hunch holds empirically
and theoretically, we could then engage in more focused testing, exploring
whether any other independent empirical fingerprints of the theorized part
of the mechanism exist. Returning to the crime scene, we could then inves-
tigate whether the suspect’s footprints or other forms of forensic evidence are
present. If we find the evidence and if it is relatively unique both theoreti-
cally and empirically (e.g., there is no evidence of other people’s presence at
the scene), we would be more confident that our theory holds empirically.
Once we are relatively confident about the newly identified part of a process,
we can then engage in more systematic and robust case studies.
Step 2 in the figure involves inferring from the found observations that we have actual mechanistic evidence that a plausible part of an underlying causal mechanism was present. But mechanistic evidence does not speak for itself. In practice, theory-building often has a testing element, in that scholars seek inspiration from existing theoretical work and previous observations for what to look for. For example, an analyst investigating the socialization of administrative officials within international organizations could seek
inspiration in theories of domestic public administration or in psychological
theories of small group dynamics while reading more descriptive accounts
of the workings of international organizations. Existing theory can be con-
ceived as a form of grid that aids in the detection of systematic patterns
in empirical material, enabling inferences about predicted evidence to be
made. In other situations, the search for mechanisms is based on hunches
drawn from puzzles for which existing work cannot account.
In step 3, the secondary inferential leap is made from found evidence
to infer that it reflects an underlying causal mechanism. In reality, theory-
building process-tracing is usually an iterative and creative process, with
the results of each search forming the background for further searches. This
means that steps 1 and 2 are often repeated before step 3 is reached.
Conceptualization
Before a mechanism can be found, we must define the condition (C) and the outcome (O) and then conceptualize what it is about C that might trigger a mechanism or mechanisms. While we do not know what mechanism links the
two, the definitions of C and O should give us some idea of the spatial and
temporal bounds of the link theorized as operating (see chapter 3). If C
and/or O are not known, a theory-building design should be adopted that
would allow their identification. Theory-building process-tracing should be
virtually the last methodological tool used, given the amount of analytical
resources that would be required to trace an unknown mechanism triggered
by an unknown cause. More appropriate methods would include probing
case studies or comparisons such as a most-similar-system design or Mill’s
method of agreement (Beach and Pedersen 2016a: 241–45).
After C and O are defined, we should devote some thought to potential
contextual conditions. However, given that we do not know what mecha-
nism (if any) links together C and O, we often have little information about
the required contextual conditions.
Generalization
In Janis’s (1983) study of groupthink, historical accounts are first scanned for symptoms that can serve as fingerprints of underlying causal mechanisms. Inferences are then made that part of the mechanism existed (step 2), resulting in the secondary inference that an underlying mechanism was present in step 3. According to Janis (1983: ix), “For
purposes of hypothesis construction—which is the stage of inquiry with which
this book is concerned—we must be willing to make some inferential leaps
from whatever historical clues we can pick up. But I have tried to start off
on solid ground by selecting the best available historical writings and to use
as my springboard those specific observations that appear to be solid facts in
the light of what is now known about the deliberations of the policy-making
groups.” Further, “What I try to do is to show how the evidence at hand can
be viewed as forming a consistent psychological pattern, in the light of what
is known about group dynamics” (viii).
The presentation of the empirical evidence is not in the form of an ana-
lytical narrative describing events or causal steps between C and O. Instead,
Janis writes, “Since my purpose is to describe and explain the psychological
processes at work, rather than to establish historical continuities, I do not
present the case studies in chronological order. The sequence I use was cho-
sen to convey step-by-step the implications of group dynamics hypotheses”
(1983: viii–ix). He describes four different “symptoms” that can be under-
stood as evidence of a groupthink mechanism, including the illusion of in-
vulnerability, the illusion of unanimity, the suppression of personal doubts,
and the presence of self-appointed mind guards. For example, the shared
illusions of invulnerability and unanimity helped members of the group
maintain a sense of group solidarity, resulting in a lack of critical appraisal
and debate that produced a dangerous level of complacent overconfidence.
Janis (1983: 47) concludes, “The failure of Kennedy’s inner circle to de-
tect any of the false assumptions behind the Bay of Pigs invasion plan can
be at least partially accounted for by the group’s tendency to seek concur-
rence at the expense of seeking information, critical appraisal, and debate.
The concurrence-seeking tendency was manifested by shared illusions and
other symptoms, which helped the members to maintain a sense of group
solidarity. Most crucial were the symptoms that contributed to complacent
overconfidence in the face of vague uncertainties and explicit warnings that
should have alerted the members to the risks of the clandestine military
operation—an operation so ill-conceived that among literate people all over
the world the name of the invasion site has become the very symbol of per-
fect failure.”
Chapter 10

Explaining-Outcome Process-Tracing

10.1. Introduction

1. The neopositivist version of theory-testing process-tracing focuses on the empirical tracing of minimalist mechanisms that are generalizable to a larger number of cases. In contrast, critical realists take mechanisms more seriously and consequently believe that generalizations are possible only across small sets of cases. Critical realists believe that mechanisms are unobservable but are nevertheless useful analytical constructs whose fingerprints can be traced empirically.
As Gerring notes, single-outcome “research designs are open to idiographic explanation in a way that case study research is not. But single-outcome researchers should not assume, ex ante, that the truth about their case is contained in factors that are specific to that case” (Gerring 2006: 717).
Second, given that the inferential ambition is case-centric, seeking a min-
imally sufficient explanation of a particular outcome, more case-sensitive
parts must usually be included in the causal mechanism. The explanation
cannot be detached from the particular case. Theorized mechanisms are
therefore seen as heuristic instruments whose function is to help build the
best possible explanation of a particular outcome (A. Humphreys 2010;
Jackson 2016).
Case selection in explaining-outcome process-tracing is driven by a
strong interest in accounting for a particular interesting and/or historically
important outcome. The outcome is viewed not as a case of something
but instead as a particular event expressed as a proper name. Explaining-
outcome process-tracing can therefore be thought of as a single-outcome
study, seeking the causes of a specific outcome in a single case (Gerring
2006). Examples of this type of study in the literature include Layne’s (1996)
study of U.S. grand strategy toward Western Europe after World War II and
Schiff’s (2008) analysis of the creation of the International Criminal Court.
While case selection in explaining-outcome process-tracing can resemble
the selection of extreme cases (Gerring and Seawright 2007), a case such as
the Cuban Missile Crisis, when understood in a more holistic fashion, is
not just a case of a theoretical concept like crisis bargaining. In explaining-
outcome process-tracing, the goal is to craft a minimally sufficient explana-
tion that captures the unique character of a specific historical event. We
choose the case because it is the Cuban Missile Crisis—it is, in and of itself,
historically important to understand.
The findings of explaining-outcome process-tracing cannot be general-
ized to other cases for two reasons. First, the case itself is unique, given our
broader conceptualization of outcomes (the Cuban Missile Crisis instead
of a narrower theoretical phenomenon such as a case of deterrence bargain-
ing). Second, given the inclusion of nonsystematic parts and case-specific
combinations of mechanisms in our explanations, the actual explanation is
also case-specific. However, it is possible to point outwards when presenting
findings. This occurs when there are elements of the explanation that appear
to have a more “general” character, meaning they potentially can travel to
other cases because they appear to be less contextually specific to the par-
ticular case.
The Sherlock Holmes story “Silver Blaze” (Doyle 1975) is a frequently used
example in teaching process-tracing methods. In contrast to other accounts, though, there are no “variables” in the story, since there is no variation within the case (Collier 2011). The story can better be understood when it is unpacked as a causal mechanism linking a cause (motive) and outcome (death
of the trainer), with attached propositions about empirical fingerprints and
the actual sources of evidence. We depict this as an argument road map in
table A, which details a causal mechanism that Sherlock Holmes might have
theorized and then tested empirically. We reconstruct the reasoning about
the probative value of evidence for each part based on what we are told in
the story.
Table A. Argument road map for “Silver Blaze” (abbreviations: Hc/Mc/Lc = high/medium/low theoretical certainty; Hu/Mu/Lu = high/medium/low uniqueness)

Cause: Trainer (Straker) in debt because of expensive affair with Madame Derbyshire
• Propositions about fingerprints: bills for expenses related to Madame Derbyshire
• Actual sources: 1—bill for dress; 2—photograph of Straker identified by dressmaker

Part 1: Straker develops plans for betting on injured horse as way to cover debts
• Propositions about fingerprints: no fingerprints discussed in the story

Part 2: Straker practices procedure for hurting horse
• Propositions about fingerprints: ef2a—animals that have been used for practice (Lc, Mu); ef2b—instrument for procedure (Hc, Lu)
• Actual sources: e2a—lame sheep (Hu); e2b—small cataract knife found on Straker (Mu)

Part 3: Straker drugs stable boy in evening by putting poison in curried mutton
• Propositions about fingerprints: ef3a—stable boy (Hunter) bears signs of poisoning (Mc, Mu)
• Actual sources: e3a—other stable boys find Hunter groggy (“no sense could be got out of him”) (Mu)

Part 4: Straker takes horse from stable in middle of night
• Propositions about fingerprints: ef4a—dog not barking at night (Hc, Hu)
• Actual sources: e4a—neither Mrs. Straker nor other stable boys heard dog at night (Mu)

Part 5: Straker leads horse to hollow in moor
• Propositions about fingerprints: ef5a—horse prints at hollow (Hc, Hu)
• Actual sources: e5a—“abundant proofs in the mud” (Hu)

Part 6: Straker gets ready to hurt horse (needs light and ability to engage in delicate procedure)
• Propositions about fingerprints: ef6a—some form of lighting needed (Hc, Lu); ef6b—jacket taken off and laid on bush (Hc, Mu)
• Actual sources: e6a1—wax vesta at scene of crime (Mu); e6a2—candle found in Straker’s pocket (Lu); e6b—“overcoat flapping from a furze-bush” (Mu)

Outcome: Horse gets spooked and kicks Straker (death)
• Proposition about fingerprints: fatal wound that could come from horse
• Actual source: “His head had been shattered by a savage blow from some heavy weapon”

The cause that triggers the process is the incurrence of debts by the trainer, Straker, in connection with his affair with Madame Derbyshire. By all accounts, the theory that the trainer had nefarious intentions is not very plausible beforehand, because the trainer had served the owner of the horse as a jockey for five years and as a trainer for seven years, “always [showing] himself to be a zealous and honest servant” (Doyle 1975: 14).

While the first part of the process is not tested empirically—although it logically has to be present—the second part is expected to leave some empirical fingerprints. Evidence that the trainer practiced is not theoretically certain, because he could have practiced well before the crime occurred and the injuries might have healed. However, depending on the animal chosen for practice, the number of mistakes he made in practicing, and the number of animals hurt, the fingerprint would be relatively unique because most farmyard animals are robust. The actual evidence found is testimony from the maid who tends the sheep that three sheep have gone lame. Holmes
assesses the maid as a credible and independent source, and it is highly un-
likely that three sheep would go lame coincidentally because they are very
robust animals. Holmes evaluates this evidence as highly empirically and
theoretically unique, stating to the inspector, “Gregory, let me recommend
to your attention this singular epidemic among the sheep” (23).
Another fingerprint expected to be found was the actual instrument in-
tended to injure the horse, which, given Holmes’s mechanistic explanation,
would have to be found at the scene of the crime. It did, however, have lower theoretical uniqueness, as the police believed that the trainer had grabbed the knife as a weapon to stop the horse thief. The actual source of evidence was
the found cataract knife, which Watson describes as relatively empirically
unique because it is a “very delicate blade devised for very delicate work. A
strange thing for a man to carry with him upon a rough expedition” (15).
The next part relating to drugging the stable boy is expected to leave
fingerprints in the form of the stable boy bearing some signs of having been
poisoned. Aside from the maid, only the trainer and other stable boys had
access to the food and the knowledge that they were having curried mutton,
which could hide the smell of the drug. The actual evidence was testimony
from the other stable boys that they found Hunter groggy in the morning.
Given that Hunter probably only drank water that night, it is unlikely that
he was hungover after drinking alcohol.
Holmes theorized that after drugging the stable boy, Straker took the
horse out of the stable in the middle of the night. The stable dog was a very
vigilant guard dog who always barked when strangers were around but not
when known people were present. Therefore, the e silentio fingerprint of the
dog not barking was expected to be found in theory. However, turning to
evaluate the observation process, while it is implausible that sources would
have lied about hearing/not hearing the dog, there are plausible reasons that
the dog might not have been heard, such as people sleeping soundly (only
medium empirical uniqueness).
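The probative logic of this negative evidence can be spelled out (an illustrative gloss of our own): what is actually observed is not “the dog did not bark” but “no one reported hearing the dog bark,” so the relevant likelihood ratio is

\[
\frac{P(\text{no barking reported} \mid \text{a known person took the horse})}{P(\text{no barking reported} \mid \text{a stranger took the horse})}
\]

which is large, but attenuated by the possibility that the dog barked and simply went unheard. This is why the observation carries only medium empirical uniqueness.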
The next parts in the causal story relate to the trainer taking the horse to
a hollow on the moor and getting ready to injure the horse. The predicted
empirical fingerprints for part 5 were the hoofprints in the hollow, which
were obviously both highly theoretically certain and unique. In contrast,
the fingerprints for part 6 (preparing) included expectations about finding
traces of plans to light a candle and taking off his jacket to engage in the
delicate procedure. The need for light made traces of a candle very theoreti-
cally certain, but there could be innocent reasons why Straker would have
had a candle or matches (relatively low theoretical uniqueness). Similarly,
taking his jacket off was certain because of the delicate nature of the proce-
dure and because his death prevented him from putting it back on again.
However, he could have taken his jacket off to fight Simpson, as the police
theorized, although hanging it up carefully on a bush before a fight is a less
plausible explanation of the evidence. The actual sources of evidence in-
cluded Holmes’s discovery of a wax vesta (match) at the scene of the crime,
the candle in Straker’s pocket, and the jacket hung carefully on a bush at the
crime scene. The presence of the actual wax vesta at the scene of the crime
was difficult to explain as random chance and even more difficult to explain
with the police’s theory.
Finally, if Holmes’s theorized mechanism was present, we would expect
that the killing blow could have come from the horse. Indeed, the blow was
consistent with a kick from a horse, although it also could have come from
a heavy stick such as what Simpson possessed.
What makes the story intriguing to read is that Holmes’s argument road
map is not put forward at the start of the story. Instead, the reader is left
wondering what theory of the crime (cause and causal mechanism) Holmes
is testing. In the story, it is plausible that Holmes arrived at Dartmoor with
a sketch of a mechanistic explanation along with propositions about poten-
tial empirical fingerprints based on his prior study of published accounts of
the puzzling events. His preliminary theory-building is described as follows: “For a
whole day my companion had rambled about the room with his chin upon
his chest and his brows knitted, charging and recharging his pipe with the
strongest black tobacco, and absolutely deaf to any of my questions or re-
marks. Fresh editions of every paper had been sent up by our news agent,
only to be glanced over and tossed down into a corner. Yet, silent as he
was, I knew perfectly well what it was over which he was brooding” (1). On
the train ride, Holmes admits that he has a preliminary mechanistic sketch.
Watson asks whether he has formed a theory, and Holmes responds, “At least
I have a grip of the essential facts of the case,” although he then describes
only events, not his mechanistic explanation (3).
On arriving at Dartmoor, Holmes states that he first wants to “go into
one or two questions of detail” before moving on to the crime scene (14).
When he looks through the trainer’s possessions, he finds a cataract knife: as
a surgeon, Watson can assess the knife’s meaning (“A strange thing for a man
to carry” [15]). Other pieces of potential evidence are also found, including
a bill for an expensive dress. We can view Holmes’s initial exploration as a
form of plausibility probe: he is seeking potential evidence for his theorized
mechanism to update his confidence in the theory before expending the ef-
fort to walk to the crime scene. It is obvious here that he is not sure exactly
what fingerprints his theory could have left in the empirical record of the
crime; instead, he is learning about the empirical record to strengthen his
propositions about fingerprints before testing them.
Later in the story, it becomes evident that Holmes is explicitly testing
propositions based on his theorized mechanism, particularly when he arrives
at the crime scene:
Holmes took the bag, and descending into the hollow he pushed the
matting into a more central position. Then stretching himself upon
his face and leaning his chin upon his hands he made a careful study
of the trampled mud in front of him.
“Halloa!” said he, suddenly, “what’s this?”
It was a wax vesta, half burned, which was so coated with mud
that it looked at first like a little chip of wood.
“I cannot think how I came to overlook it,” said the Inspector
with an expression of annoyance.
“It was invisible, buried in the mud. I only saw it because I was
looking for it.” (17)
The story has an auxiliary outcome: what happened with the prize race-
horse, Silver Blaze. Holmes’s theory expected that the horse had wandered
off, and given that Holmes is told that another racing stable is located nearby,
it was a natural corollary to expect that a prize racehorse wandering around
the moor would be found by the staff at the other stable.
We believe that an argument road map can be a helpful tool for mak-
ing explicit a theorized mechanism and systematizing the presentation of
evidence confirming/disconfirming the mechanism. We do not suggest
keeping the argument road map hidden—as Sherlock Holmes does—until
the end of a theory-testing analysis. Instead, it can be included explicitly
at the start, with the subsequent analysis presenting and discussing only the pieces of mechanistic evidence with the highest probative value—in particular, providing justifications for claims about theoretical and empirical certainty or uniqueness.
References
Bennett, Andrew. 2014. Appendix: Disciplining Our Conjectures: Systematizing Process Tracing with Bayesian Analysis. In Process Tracing: From Metaphor to Analytic Tool, ed. Andrew Bennett and Jeffrey Checkel, 276–98. Cambridge: Cambridge University Press.
Bennett, Andrew, and Jeffrey Checkel. 2014. Process Tracing: From Metaphor to Analytic
Tool. Cambridge: Cambridge University Press.
Berg-Schlosser, Dirk. 2012. Mixed Methods in Comparative Politics: Principles and
Applications. Houndmills: Palgrave Macmillan.
Berg-Schlosser, Dirk, and Gisèle De Meur. 2009. Comparative Research Design: Case
and Variable Selection. In Configurational Comparative Methods: Qualitative Com-
parative Analysis, ed. Benoit Rihoux and Charles Ragin, 19–32. Thousand Oaks,
CA: Sage.
Bhaskar, Roy. 1978. A Realist Theory of Science. Brighton: Harvester.
Blatter, Joachim, and Till Blume. 2008. In Search of Co-Variance, Causal Mecha-
nisms, or Congruence? Towards a Plural Understanding of Case Studies. Swiss
Political Science Review 14 (2): 315–56.
Blatter, Joachim, and Markus Haverland. 2012. Designing Case Studies: Explanatory
Approaches in Small-N Research. Houndmills: Palgrave Macmillan.
Bogaards, Matthijs. 2012. Where to Draw the Line? From Degree to Dichotomy in
Measures of Democracy. Democratization 19 (4): 690–712.
Bogen, Jim. 2005. Regularities and Causality: Generalizations and Causal Explana-
tions. Studies in History and Philosophy of Biological and Biomedical Sciences 36 (2):
397–420.
Bowman, Kirk, Fabrice Lehoucq, and James Mahoney. 2005. Measuring Political
Democracy: Case Expertise, Data Adequacy, and Central America. Comparative
Political Studies 38 (8): 939–70.
Brady, Henry E. 2008. Causation and Explanation in Social Science. In The Oxford
Handbook of Political Methodology, ed. Janet M. Box-Steffensmeier, Henry E.
Brady, and David Collier, 217–70. Oxford: Oxford University Press.
Brady, Henry E., and David Collier, eds. 2011. Rethinking Social Inquiry: Diverse Tools,
Shared Standards. 2nd ed. Lanham, MD: Rowman and Littlefield.
Bretthauer, J. M. 2015. Conditions for Peace and Conflict: Applying a Fuzzy-Set Qualitative Comparative Analysis to Cases of Resource Scarcity. Journal of Conflict Resolution 59 (4): 593–616.
Brilmayer, Lea. 1986. The Role of Evidential Weight in Criminal Proof. Boston Univer-
sity Law Review 66 (4): 673–91.
Brooks, Stephen G., and William C. Wohlforth. 2000–2001. Power, Globalization, and the End of the Cold War: Reevaluating a Landmark Case for Ideas. International Security 25 (3): 5–53.
Bunge, Mario. 1997. Mechanism and Explanation. Philosophy of the Social Sciences 27
(4): 410–65.
Bunge, Mario. 2004. How Does It Work? The Search for Explanatory Mechanisms.
Philosophy of the Social Sciences 34 (2): 182–210.
Campbell, John L. 2005. Institutional Change and Globalization. Princeton: Princeton
University Press.
Caren, Neal, and Aaron Panofsky. 2005. TQCA: A Technique for Adding Temporality to
Qualitative Comparative Analysis. Sociological Methods and Research 34 (2): 147–72.
Cartwright, Nancy. 1999. The Dappled World: A Study of the Boundaries of Science.
Cambridge: Cambridge University Press.
Cartwright, Nancy. 2007. Hunting Causes and Using Them: Approaches in Philosophy
and Economics. Cambridge: Cambridge University Press.
Cartwright, Nancy. 2011. Predicting “It Will Work for Us”: (Way) beyond Statistics.
In Causality in the Sciences, ed. Phyllis McKay Illari, Federica Russo, and Jon Wil-
liamson, 750–68. Oxford: Oxford University Press.
Cartwright, Nancy, and Jeremy Hardie. 2012. Evidence-Based Policy: A Practical Guide
to Doing It Better. Oxford: Oxford University Press.
Cerling, Thure E., Fredrick Kyalo Manthi, Emma N. Mbua, Louise N. Leakey, Meave
G. Leakey, Richard E. Leakey, Francis H. Brown, Frederick E. Grine, John A.
Hart, Prince Kaleme, Hélène Roche, Kevin T. Uno, and Bernard A. Wood. 2013.
Stable Isotope-Based Diet Reconstructions of Turkana Basin Hominins. Proceed-
ings of the National Academy of Sciences of the United States of America 110 (26):
10501–6.
Chalmers, A. F. 1999. What Is This Thing Called Science? Buckingham: Open Univer-
sity Press.
Charman, Andrew, and Tasha Fairfield. 2015. Applying Formal Bayesian Analysis to
Qualitative Case Research: An Empirical Example, Implications, and Caveats.
Unpublished paper.
Chatterjee, Partha. 2011. Lineages of Political Society: Studies in Postcolonial Democracy.
New York: Columbia University Press.
Checkel, Jeffrey T. 2008. Tracing Causal Mechanisms. International Studies Review 8
(2): 362–70.
Christiansen, Thomas, and Knud Erik Jørgensen. 1999. The Amsterdam Process: A
Structurationist Perspective on EU Treaty Reform. http://eiop.or.at/eiop/texte/1999-
01a.htm
Christiansen, Thomas, and Christine Reh. 2009. Constitutionalizing the European
Union. Basingstoke: Palgrave Macmillan.
CIA. 1968. Intelligence Report: Bayes’ Theorem in the Korean War. No. 0605/68, July.
https://www.cia.gov/library/readingroom/docs/DOC_0001205738.pdf
Clarke, Brendan, Donald Gillies, Phyllis Illari, Federica Russo, and Jon Williamson. 2014. Mechanisms and the Evidence Hierarchy. Topoi 33 (2): 339–60.
Clarke, Richard A. 2004. Against All Enemies: Inside America’s War on Terror. New
York: Free Press.
Coleman, James. 1990. Foundations of Social Theory. Cambridge: Harvard University
Press.
Collier, David, Henry E. Brady, and Jason Seawright. 2010. Sources of Leverage in
Causal Inference: Toward an Alternative View of Methodology. In Rethinking
Social Inquiry: Diverse Tools, Shared Standards, 2nd ed., ed. Henry E. Brady and
David Collier, 161–200. Lanham, MD: Rowman and Littlefield.
Collier, David, and Steven Levitsky. 1997. Democracy with Adjectives: Conceptual
Innovation in Comparative Research. World Politics 49 (3): 430–51.
Collier, David, and James Mahoney. 1996. Research Note: Insights and Pitfalls: Selec-
tion Bias in Qualitative Research. World Politics 49 (1): 56–91.
Coppedge, Michael. 1999. Thickening Thin Concepts and Theories: Combining Large
N and Small in Comparative Politics. Comparative Politics 31 (4): 465–76.
Coppedge, Michael. 2012. Democratization and Research Methods. Cambridge: Cam-
bridge University Press.
Coppedge, Michael, John Gerring, Staffan I. Lindberg, Svend-Erik Skaaning, Jan Teo-
rell, David Altman, Frida Andersson, Michael Bernhard, M. Steven Fish, Adam
Glynn, Allen Hicken, Carl Henrik Knutsen, Kyle L. Marquardt, Kelly McMann,
Valeriya Mechkova, Pamela Paxton, Daniel Pemstein, Laura Saxer, Brigitte
Seim, Rachel Sigman, and Jeffrey Staton. 2017. V-Dem Codebook v7.1. Varieties
of Democracy (V-Dem) Project. https://www.v-dem.net/en/reference/version-7-1-july-2017/
Craver, Carl F., and Lindley Darden. 2013. In Search of Mechanisms. Chicago: Univer-
sity of Chicago Press.
Cyr, Jennifer. 2015. Making or Breaking Politics: Social Conflicts and Party-System
Change in Democratic Bolivia. Studies in Comparative International Development
50:283–303.
Darnton, Christopher. 2017–18. Archives and Inference. International Security 42 (3):
84–126.
Day, Timothy, and Harold Kincaid. 1994. Putting Inference to the Best Explanation in
Its Place. Synthese 98 (2): 271–95.
Diani, Mario, and Doug McAdam. 2003. Social Movements and Networks. Oxford:
Oxford University Press.
Dion, Douglas. 2003. Evidence and Inference in the Comparative Case Study. In Nec-
essary Conditions: Theory, Methodology, and Applications, ed. Gary Goertz and H.
Starr, 95–112. Oxford: Rowman and Littlefield.
Dowe, Phil. 2011. The Causal-Process-Model Theory of Mechanisms. In Causality in
the Sciences, ed. Phyllis McKay Illari, Federica Russo, and Jon Williamson, 865–79.
Oxford: Oxford University Press.
Doyle, A. Conan. 1892. The Adventures of Sherlock Holmes. London: George Newnes.
Doyle, A. Conan. 1975. The Memoirs of Sherlock Holmes. London: George Newnes.
Dunning, Thad. 2017. Contingency and Determinism in Research on Critical Junc-
tures: Avoiding the “Inevitability Framework.” Qualitative and Multi-Method
Research 15 (1): 41–47.
Dyson, Kenneth, and Kevin Featherstone. 1999. The Road to Maastricht: Negotiating
Economic and Monetary Union. Oxford: Oxford University Press.
Eaton, Kent. 2014. Recentralization and the Left Turn in Latin America: Diverg-
ing Outcomes in Bolivia, Ecuador, and Venezuela. Comparative Political Studies
47:1130–57.
Eckstein, Harry. 1975. Case Study and Theory in Political Science. In Handbook of
Political Science, vol. 7, Strategies of Inquiry, ed. Fred I. Greenstein and Nelson W.
Polsby, 79–138. Reading, MA: Addison-Wesley.
Economist. 2010. A Strike against Democracy. October 9, 64–65.
Eells, Ellery, and Brandon Fitelson. 2000. Measuring Confirmation and Evidence.
Journal of Philosophy 97 (12): 663–72.
Elman, Colin, and Miriam Fendius Elman. 2001. Introduction: Negotiating Interna-
tional History and Politics. In Bridges and Boundaries: Historians, Political Scientists,
and the Study of International Relations, ed. Colin Elman and Miriam Fendius
Elman, 1–36. Cambridge: MIT Press.
Elster, Jon. 1998. A Plea for Mechanisms. In Social Mechanisms, ed. P. Hedström and
R. Swedberg, 45–73. Cambridge: Cambridge University Press.
Emirbayer, Mustafa, and Ann Mische. 1998. What Is Agency? American Journal of
Sociology 103 (4): 962–1023.
English, Robert D. 1997. Sources, Methods, and Competing Perspectives on the End
of the Cold War. Diplomatic History 21 (2): 283–94.
Erslev, K. R. 1963. Historisk Teknik: Den historiske Undersøgelse fremstillet i sine
Grundlinier. Copenhagen: Gyldendalske Boghandel.
Evangelista, Matthew. 2014. Explaining the Cold War’s End: Process Tracing All the
Way Down? In Process Tracing: From Metaphor to Analytic Tool, ed. Andrew Ben-
nett and Jeffrey T. Checkel, 153–85. Cambridge: Cambridge University Press.
Evans, Peter. 1995. The Role of Theory in Comparative Politics. World Politics 48 (1):
3–10.
Fairclough, Norman. 1995. Critical Discourse Analysis: The Critical Study of Language.
London: Routledge.
Fairfield, Tasha. 2013. Going Where the Money Is: Strategies for Taxing Economic
Elites in Unequal Democracies. World Development 47 (1): 42–57.
Fairfield, Tasha, and Andrew E. Charman. 2017. Explicit Bayesian Analysis for Process
Tracing: Guidelines, Opportunities, and Caveats. Political Analysis 25 (3): 363–80.
Falleti, Tulia G., and Julia F. Lynch. 2009. Context and Causal Mechanisms in Politi-
cal Analysis. Comparative Political Studies 42 (9): 1143–66.
Falleti, Tulia G., and James Mahoney. 2015. The Comparative Sequential Method.
In Advances in Comparative-Historical Analysis, ed. James Mahoney and Kathleen
Thelen, 211–39. Cambridge: Cambridge University Press.
Fearon, James. 1991. Counterfactuals and Hypothesis Testing in Political Science.
World Politics 43 (2): 169–95.
Federoff, Howard J., and Lawrence O. Gostin. 2009. Evolving from Reductionism to
Holism: Is There a Future for Systems Medicine? Journal of the American Medical
Association 302 (9): 994–96.
Finke, Daniel. 2009. Domestic Politics and European Treaty Reform: Understanding
the Dynamics of Governmental Position Taking. European Union Politics 10 (4):
482–506.
Fitelson, B. 2001. A Bayesian Account of Independent Evidence with Applications.
Philosophy of Science 68:S123–40.
Ford, Harold P. 1998. CIA and the Vietnam Policymakers: Three Episodes, 1962–1968.
Washington, DC: Center for the Study of Intelligence.
Freedman, David A. 1991. Statistical Models and Shoe Leather. Sociological Methodol-
ogy 21:291–313.
Friedman, Richard D. 1986a. A Close Look at Probative Value. Boston University Law
Review 66 (4): 733–59.
Friedman, Richard D. 1986b. A Diagrammatic Approach to Evidence. Boston University
Law Review 66 (4): 571–620.
Friedrichs, Jörg, and Friedrich Kratochwil. 2009. On Acting and Knowing: How
Pragmatism Can Advance International Relations Research and Methodology.
International Organization 63 (4): 701–31.
Gaddis, John Lewis. 1972. The United States and the Origins of the Cold War. Ithaca:
Cornell University Press.
Gaddis, John Lewis. 1992–93. International Relations Theory and the End of the Cold
War. International Security 17 (3): 5–58.
Gallow, J. Dimitri. 2014. How to Learn from Theory-Dependent Evidence; or, Com-
mutativity and Holism: A Solution for Conditionalizers. British Journal for the
Philosophy of Science 65:493–519.
George, Alexander L., and Andrew Bennett. 2005. Case Studies and Theory Develop-
ment in the Social Sciences. Cambridge: MIT Press.
Gerring, John. 1999. What Makes a Concept Good? A Criterial Framework for Under-
standing Concept Formation in the Social Sciences. Polity 31 (3): 357–93.
Gerring, John. 2005. Causation: A Unified Framework for the Social Sciences. Journal
of Theoretical Politics 17 (2): 163–98.
Gerring, John. 2007. Case Study Research: Principles and Practices. Cambridge: Cam-
bridge University Press.
Gerring, John. 2010. Causal Mechanisms: Yes but . . . Comparative Political Studies 43
(11): 1499–526.
Gerring, John. 2011. Social Science Methodology: A Unified Framework. Cambridge:
Cambridge University Press.
Gerring, John. 2017. Case Study Research. 2nd ed. Cambridge: Cambridge University Press.
Gerring, John, and Jason Seawright. 2007. Techniques for Choosing Cases. In Ger-
ring, Case Study Research: Principles and Practices, 86–150. Cambridge: Cambridge
University Press.
Giddens, Anthony. 1984. The Constitution of Society: Outline of the Theory of Structura-
tion. Cambridge: Polity Press.
Gill, Christopher, Lora Sabin, and Christopher Schmid. 2005. Why Clinicians Are
Natural Bayesians. British Medical Journal 330 (7499): 1080–83.
Glennan, Stuart S. 1996. Mechanisms and the Nature of Causation. Erkenntnis 44 (1):
49–71.
Glennan, Stuart S. 2002. Rethinking Mechanistic Explanation. Philosophy of Science
69 (S3): 342–53.
Glennan, Stuart S. 2010. Ephemeral Mechanisms and Historical Explanation. Erken-
ntnis 72 (2): 251–66.
Glennan, Stuart S. 2011. Singular and General Causal Relations: A Mechanist Perspec-
tive. In Causality in the Sciences, ed. Phyllis McKay Illari, Federica Russo, and Jon
Williamson, 789–817. Oxford: Oxford University Press.
Goertz, Gary. 2006. Social Science Concepts: A User’s Guide. Princeton: Princeton Uni-
versity Press.
Goertz, Gary. 2017. Multimethod Research, Causal Mechanisms, and Case Studies: An
Integrated Approach. Princeton: Princeton University Press.
Goertz, Gary, and Jack S. Levy, eds. 2007. Explaining War and Peace: Case Studies and
Necessary Condition Counterfactuals. London: Routledge.
Goertz, Gary, and James Mahoney. 2009. Scope in Case-Study Research. In The Sage
Handbook of Case-Based Methods, ed. David Byrne and Charles C. Ragin, 307–17.
Thousand Oaks, CA: Sage.
Goertz, Gary, and James Mahoney. 2012. A Tale of Two Cultures: Qualitative and
Quantitative Research in the Social Sciences. Princeton: Princeton University Press.
Good, I. J. 1968. Corroboration, Explanation, Evolving Probability, Simplicity, and a
Sharpened Razor. British Journal for the Philosophy of Science 19 (2): 123–43.
Good, I. J. 1991. Weight of Evidence and the Bayesian Likelihood Ratio. In The Use of
Statistics in Forensic Science, ed. C. G. G. Aitken and D. A. Stoney, 85–106. London: CRC Press.
Groff, Ruth. 2011. Getting Past Hume in the Philosophy of Social Science. In Causal-
ity in the Sciences, ed. Phyllis McKay Illari, Federica Russo, and Jon Williamson,
296–316. Oxford: Oxford University Press.
Grzymala-Busse, Anna. 2011. Time Will Tell? Temporality and the Analysis of Causal
Mechanisms and Processes. Comparative Political Studies 44 (9): 1267–97.
Haas, Peter M. 1992. Introduction: Epistemic Communities and International Policy
Coordination. International Organization 46 (1): 1–35.
Haesebrouck, Tim. 2016. The Added Value of Multi-Value Qualitative Comparative
Analysis. Forum: Qualitative Social Research 17 (1). http://dx.doi.org/10.17169/fqs-17.1.2307
Haggard, Stephan, and Robert R. Kaufman. 2016. Dictators and Democrats: Masses,
Elites, and Regime Change. Princeton: Princeton University Press.
Hall, Peter A. 2008. Systematic Process Analysis: When and How to Use It. European
Political Science 7 (3): 304–17.
Harish, S. P., and Andrew T. Little. 2017. The Political Violence Cycle. American Politi-
cal Science Review 111 (2): 237–55.
Harvey, Frank P. 2012. Explaining the Iraq War: Counterfactual Theory, Logic and Evi-
dence. Cambridge: Cambridge University Press.
Hedström, Peter, and Richard Swedberg, eds. 1998. Social Mechanisms: An Analytical
Approach to Social Theory. Cambridge: Cambridge University Press.
Hedström, Peter, and Petri Ylikoski. 2010. Causal Mechanisms in the Social Sciences.
Annual Review of Sociology 36:49–67.
Hempel, Carl G. 1965. The Function of General Laws in History. In Aspects of Scientific
Explanation and Other Essays, 231–44. New York: Free Press.
Holland, Paul W. 1986. Statistics and Causal Inference. Journal of the American Statisti-
cal Association 81 (396): 945–60.
Howson, Colin, and Peter Urbach. 2006. Scientific Reasoning: The Bayesian Approach.
3rd ed. La Salle, IL: Open Court.
Hug, Simon, and Thomas König. 2002. In View of Ratification: Governmental Pref-
erences and Domestic Constraints at the Amsterdam Intergovernmental Confer-
ence. International Organization 56 (4): 447–76.
Hug, Simon, and Tobias Schulz. 2007. Referendums in the EU’s Constitution
Building Process. Review of International Organizations 2 (2): 177–218.
Hume, David. 1975. Enquiries Concerning Human Understanding and Concerning the
Principles of Morals. Ed. P. H. Nidditch. 3rd ed. Oxford: Oxford University Press.
Humphreys, Adam R. C. 2010. The Heuristic Application of Explanatory Theories in
International Relations. European Journal of International Relations 17 (2): 257–77.
Humphreys, Macartan, and Alan Jacobs. 2015. Mixing Methods: A Bayesian Approach.
American Political Science Review 109 (4): 653–73.
Iklé, Fred Charles. 1964. How Nations Negotiate. New York: Praeger.
Illari, Phyllis McKay. 2011. Mechanistic Evidence: Disambiguating the Russo-
Williamson Thesis. International Studies in the Philosophy of Science 25 (2): 139–57.
Illari, Phyllis McKay, and Federica Russo. 2014. Causality: Philosophical Theory Meets
Scientific Practice. Oxford: Oxford University Press.
Illari, Phyllis McKay, and Jon Williamson. 2011. Mechanisms Are Real and Local. In
Causality in the Sciences, ed. Phyllis McKay Illari, Federica Russo, and Jon Wil-
liamson, 818–44. Oxford: Oxford University Press.
Illari, Phyllis McKay, and Jon Williamson. 2013. In Defense of Activities. Journal for
General Philosophy of Science 44 (1): 69–83.
Jackson, Patrick T. 2016. The Conduct of Inquiry in International Relations. 2nd ed.
London: Routledge.
Janis, Irving L. 1983. Groupthink: Psychological Studies of Policy Decisions and Fiascoes.
Boston: Houghton Mifflin.
Jervis, Robert. 1976. Perceptions and Misperceptions in International Politics. Princeton:
Princeton University Press.
Jervis, Robert. 2010. Why Intelligence Fails: Lessons from the Iranian Revolution and the
Iraq War. Ithaca: Cornell University Press.
Joos, Erich, H. Dieter Zeh, Claus Kiefer, Domenico J. W. Giulini, Joachim Kupsch,
and Ion-Olimpiu Stamatescu. 2003. Decoherence and the Appearance of a Classical
World in Quantum Theory. Berlin: Springer-Verlag.
Kaiser, Marie I. 2017. The Components and Boundaries of Mechanisms. In The Rout-
ledge Handbook of Mechanisms and Mechanistic Philosophy, ed. Stuart Glennan and
Phyllis Illari, 116–30. London: Routledge.
Kaye, David H. 1986. Comment: Quantifying Probative Value. Boston University Law
Review 66 (4): 761–66.
Khong, Yuen Foong. 1992. Analogies at War: Korea, Munich, Dien Bien Phu, and the
Vietnam Decisions of 1965. Princeton: Princeton University Press.
King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry:
Scientific Inference in Qualitative Research. Princeton: Princeton University Press.
Kitano, Hiroaki. 2002. Systems Biology: A Brief Overview. Science 295 (5560): 1662–
64.
Klein, Richard G. 2013. Comments: Stable Isotope-Based Diet Reconstructions of
Turkana Basin Hominins. Proceedings of the National Academy of Sciences of the
United States of America 110 (26): 10470–72.
Kramer, Mark. 1990. Remembering the Cuban Missile Crisis: Should We Swallow
Oral History? International Security 15 (1): 212–16.
Krebs, R. R., and P. T. Jackson. 2007. Twisting Tongues and Twisting Arms: The Power
of Political Rhetoric. European Journal of International Relations 13 (1): 35–66.
Kreuzer, Markus. 2014. The Structure of Description: Elements of and Criteria for
Evaluating Historical Analysis. Paper presented at the annual meeting of the
American Political Science Association.
Kuehn, David, and Harold Trinkunas. 2017. Conditions of Military Contestation in
Populist Latin America. Democratization 24 (5): 859–80.
Kuhlmann, Meinard, and Stuart Glennan. 2014. On the Relation between Quantum
Mechanical and Neo-Mechanistic Ontologies and Explanatory Strategies. Euro-
pean Journal for Philosophy of Science 4 (3): 337–59.
Kurki, Milja. 2008. Causation in International Relations: Reclaiming Causal Analysis.
Cambridge: Cambridge University Press.
Larson, Deborah Welch. 2001. Sources and Methods in Cold War History: The Need
for a New Theory-Based Archival Approach. In Bridges and Boundaries: Historians,
Political Scientists, and the Study of International Relations, ed. Colin Elman and
Miriam Fendius Elman, 327–50. Cambridge: MIT Press.
Layne, Christopher. 2006. The Peace of Illusions: American Grand Strategy from 1940 to
the Present. Ithaca: Cornell University Press.
Leamer, Edward E. 2010. Tantalus on the Road to Asymptopia. Journal of Economic
Perspectives 24 (2): 31–46.
Lebow, Richard Ned. 2000–2001. Contingency, Catalysts, and International System
Change. Political Science Quarterly 115 (4): 591–616.
Leifeld, Philip, and Volker Schneider. 2012. Information Exchange in Policy Net-
works. American Journal of Political Science 56 (3): 731–44.
Levi-Montalcini, R., and P. Calissano. 2006. The Scientific Challenge of the 21st Cen-
tury: From a Reductionist to a Holistic Approach via Systems Biology. BMC Neu-
roscience 7 (Suppl 1): S1. https://doi.org/10.1186/1471-2202-7-S1-S1
Levy, Jack. 2008. Case Studies: Types, Designs, and Logics of Inference. Conflict Man-
agement and Peace Science 25 (1): 1–18.
Levy, Jack. 2015. Counterfactuals, Causal Inference, and Historical Analysis. Security
Studies 24 (3): 378–402.
Lewis, David K. 1986. Postscripts to “Causation.” In Philosophical Papers, 2:172–213.
Oxford: Oxford University Press.
Lieberman, Evan S. 2005. Nested Analysis as a Mixed-Method Strategy for Compara-
tive Research. American Political Science Review 99 (3): 435–51.
Little, Daniel. 1996. Causal Explanation in the Social Sciences. Southern Journal of
Philosophy 34 (S1): 31–56.
Löblová, Olga. 2017. When Epistemic Communities Fail: Exploring the Mechanism
of Policy Influence. Policy Studies Journal. https://doi.org/10.1111/psj.12213
Lustick, Ian S. 1996. History, Historiography, and Political Science: Multiple Histori-
cal Records and the Problem of Selection Bias. American Political Science Review
90 (3): 605–18.
Machamer, Peter. 2004. Activities and Causation: The Metaphysics and Epistemology
of Mechanisms. International Studies in the Philosophy of Science 18 (1): 27–39.
Machamer, Peter, Lindley Darden, and Carl F. Craver. 2000. Thinking about Mecha-
nisms. Philosophy of Science 67 (1): 1–25.
Mackie, J. L. 1965. Causes and Conditions. American Philosophical Quarterly 2 (2):
245–64.
Madea, Burkhard, and Dirk W. Lachenmeier. 2005. Postmortem Diagnosis of Hyper-
tonic Dehydration. Forensic Science International 155 (1): 1–6.
Madrid, Raúl. 2013. Bolivia: Origins and Politics of the Movimiento al Socialismo. In
The Resurgence of the Latin American Left, ed. Stephen Levitsky and Kenneth M.
Roberts, 239–59. Baltimore: Johns Hopkins University Press.
Milligan, John D. 1979. The Treatment of a Historical Source. History and Theory 18
(2): 177–96.
Moore, Barrington. 1991 [1966]. Social Origins of Dictatorship and Democracy: Lord and
Peasant in the Making of the Modern World. London: Penguin.
Moravcsik, Andrew. 1998. The Choice for Europe. Ithaca: Cornell University Press.
Moravcsik, Andrew. 2010. Active Citation: A Precondition for Replicable Qualitative
Research. PS: Political Science and Politics 43 (1): 29–35.
Morgan, Stephen L., and Christopher Winship. 2007. Counterfactuals and Causal
Inference: Methods and Principles for Social Research. Cambridge: Cambridge Uni-
versity Press.
Munck, Gerardo L., and Jay Verkuilen. 2002. Conceptualizing and Measuring Democ-
racy: Evaluating Alternative Indices. Comparative Political Studies 35 (1): 5–34.
Munson, Ronald. 1976. The Way of Words. Boston: Houghton Mifflin.
Musshoff, Frank, Peter M. D. Schmidt, Thomas Daldrup, and Burkhard Madea. 2002.
Cyanide Fatalities: Case Studies of Four Suicides and One Homicide. American
Journal of Forensic Medicine and Pathology 23 (4): 315–20.
New York Times. 2009. Bolivian President Says Plot on His Life Was Tied to Coup
Attempt. April 18.
O’Mahoney, Joseph O. 2017. Making the Real: Rhetorical Adduction and the Bangla-
desh Liberation War. International Organization 71 (2): 317–48.
Owen, John M. 1994. How Liberalism Produces Democratic Peace. International Secu-
rity 19 (2): 87–125.
Owen, John M. 1997. Liberal Peace, Liberal War: American Politics and International
Security. Ithaca: Cornell University Press.
Parsons, Craig. 2007. How to Map Arguments in Political Science. Oxford: Oxford
University Press.
Parsons, Craig. 2016. Ideas and Power: Four Intersections and How to Show Them.
Journal of European Public Policy 23 (3): 446–63.
Pedersen, Rasmus Brun, and Y. Reykers. 2017. Small States Bandwagon for Status?
A Comparative Study of Denmark and Belgium. Paper presented at the XVIII
Nordic Political Science Congress, University of Southern Denmark, Odense, August 8–11.
Peirce, C. S. 1955. Philosophical Writings of Peirce. Ed. J. Buchler. New York: Dover.
Pennings, Paul. 2003. Beyond Dichotomous Explanations: Explaining Constitutional
Control of the Executive with Fuzzy-Sets. European Journal of Political Research 42
(4): 541–67.
Piccinini, Gualtiero. 2017. Activities Are Manifestations of Causal Powers. In Eppur
Si Muove: Doing History and Philosophy of Science with Peter Machamer, ed. M. P.
Adams, Z. Biener, U. Feest, and J. A. Sullivan, 171–82. Cham: Springer.
Phillips, Kristine. 2017. In the Latest JFK Files: The FBI’s Ugly Analysis on Martin
Luther King Jr., Filled with Falsehoods. Washington Post, November 4.
Pierson, Paul. 1996. The Path to European Integration: A Historical Institutionalist
Perspective. Comparative Political Studies 29 (1): 123–63.
Pierson, Paul. 2003. Big, Slow-Moving, and . . . Invisible: Macrosocial Processes in
the Study of Comparative Politics. In Comparative Historical Analysis in the Social
Sciences, ed. James Mahoney and Dietrich Rueschemeyer, 177–207. Cambridge:
Cambridge University Press.
Pierson, Paul. 2004. Politics in Time: History, Institutions, and Social Analysis. Princeton:
Princeton University Press.
Ragin, Charles C. 1987. The Comparative Method: Moving beyond Qualitative and
Quantitative Strategies. Berkeley: University of California Press.
Ragin, Charles C. 2000. Fuzzy-Set Social Science. Chicago: University of Chicago
Press.
Ragin, Charles C. 2004. Turning the Tables: How Case-Oriented Methods Challenge
Variable-Oriented Methods. In Rethinking Social Inquiry: Diverse Tools, Shared
Standards, ed. Henry E. Brady and David Collier, 125–41. Lanham, MD: Rowman
and Littlefield.
Ragin, Charles C. 2008. Redesigning Social Inquiry: Fuzzy Sets and Beyond. Chicago:
University of Chicago Press.
Ragin, Charles C., and Sarah Ilene Strand. 2008. Using Qualitative Comparative
Analysis to Study Causal Order: Comment on Caren and Panofsky (2005). Socio-
logical Methods and Research 36 (4): 431–41.
Reskin, Barbara F. 2003. Including Mechanisms in Our Models of Ascriptive Inequal-
ity. American Sociological Review 68 (1): 1–21.
Rickles, Dean. 2009. Causality in Complex Interventions. Medicine, Health Care and
Philosophy 12:77–90.
Rittinger, Eric R., and Matthew R. Cleary. 2013. Confronting Coup Risk in the Latin
American Left. Studies in Comparative International Development 48 (4): 403–31.
Roberts, Clayton. 1996. The Logic of Historical Explanation. University Park: Pennsyl-
vania State University Press.
Rohlfing, Ingo. 2008. What You See and What You Get: Pitfalls and Principles of
Nested Analysis in Comparative Research. Comparative Political Studies 41 (11):
1492–514.
Rohlfing, Ingo. 2012. Case Studies and Causal Inference: An Integrative Framework.
Houndmills: Palgrave Macmillan.
Rohlfing, Ingo. 2014. Comparative Hypothesis Testing via Process Tracing. Sociological
Methods and Research 43 (4): 606–42.
Rohlfing, Ingo, and Carsten Q. Schneider. 2016. A Unifying Framework for Causal
Analysis in Set-Theoretic Multimethod Research. Sociological Methods and
Research. https://doi.org/10.1177/0049124115626170
Rudalevige, Andrew. 2002. Managing the President’s Program: Presidential Leadership
and Legislative Policy Formulation. Princeton: Princeton University Press.
Rueschemeyer, Dietrich. 2003. Can One or a Few Cases Yield Theoretical Gains? In
Comparative Historical Analysis in the Social Sciences, ed. James Mahoney and Diet-
rich Rueschemeyer, 305–37. Cambridge: Cambridge University Press.
Runhardt, Rosa W. 2015. Evidence for Causal Mechanisms in Social Science: Recom-
mendations from Woodward’s Manipulability Theory of Causation. Philosophy of
Science 82 (5): 1296–307.
Russo, Federica, and Jon Williamson. 2007. Interpreting Causality in the Health Sci-
ences. International Studies in the Philosophy of Science 21 (2): 157–70.
Russo, Federica, and Jon Williamson. 2011. Generic versus Single-Case Causality: The
Case of Autopsy. European Journal for Philosophy of Science 1 (1): 47–69.
Salmon, Wesley. 1998. Causality and Explanation. Oxford: Oxford University Press.
Samford, Steven. 2010. Averting “Disruption and Reversal”: Reassessing the Logic of
Rapid Trade Reform in Latin America. Politics and Society 38 (3): 373–407.
Sarkees, Meredith Reid. 2010. Defining and Categorizing Wars. In Resort to War: A
Data Guide to Inter-State, Extra-State, Intra-State, and Non-State Wars, 1816–2007,
ed. Meredith Reid Sarkees and Frank Whelon Wayman, 39–73. Washington, DC:
CQ Press.
Sartori, Giovanni. 1970. Concept Misformation in Comparative Politics. American
Political Science Review 64 (4): 1033–53.
Sartori, Giovanni. 1984. Guidelines for Concept Analysis. In Social Science Concepts: A
Systematic Analysis, ed. Giovanni Sartori, 15–85. Beverly Hills, CA: Sage.
Sawyer, R. Keith. 2004. The Mechanisms of Emergence. Philosophy of the Social Sci-
ences 34 (2): 260–82.
Schaffer, Frederic Charles. 2016. Elucidating Social Science Concepts: An Interpretivist
Guide. London: Routledge.
Schreier, Margrit. 2012. Qualitative Content Analysis in Practice. London: Sage.
Schimmelfennig, Frank. 2001. The Community Trap: Liberal Norms, Rhetorical
Action, and the Eastern Enlargement of the European Union. International Orga-
nization 55 (1): 47–80.
Schimmelfennig, Frank. 2015. Efficient Process Tracing: Analyzing the Causal Mecha-
nisms of European Integration. In Process Tracing: From Metaphor to Analytic
Tool, ed. Andrew Bennett and Jeffrey T. Checkel, 98–125. Cambridge: Cambridge
University Press.
Schmitt, Johannes, and Derek Beach. 2015. The Contribution of Process Tracing to
Theory-Based Evaluations of Complex Aid Instruments. Evaluation 21 (4): 429–47.
Schmitter, Philippe C. 1979. Still the Century of Corporatism? In Trends towards Cor-
poratist Intermediation, ed. P. Schmitter and G. Lehmbruch, 7–49. London: Sage.
Schneider, Carsten Q., and Ingo Rohlfing. 2013. Combining QCA and Process Trac-
ing in Set-Theoretical Multi-Method Research. Sociological Methods and Research
42 (4): 559–97.
Schneider, Carsten Q., and Ingo Rohlfing. 2016. Case Studies Nested in Fuzzy-Set
QCA on Sufficiency: Formalizing Case Selection and Causal Inference. Sociologi-
cal Methods and Research 45 (3): 526–68.
Schneider, Carsten Q., and Claudius Wagemann. 2012. Set-Theoretic Methods for the
Social Sciences: A Guide to Qualitative Comparative Analysis. Cambridge: Cam-
bridge University Press.
Schwartz-Shea, Peregrine, and Dvora Yanow. 2012. Interpretive Research Design: Con-
cepts and Processes. New York: Routledge.
Scriven, Michael. 2011. Evaluation, Bias, and Its Control. Journal of Multidisciplinary
Evaluation 7 (15): 79–98.
Siewert, Markus B. 2017. Qualitative Comparative Analysis. In Weiterentwicklungen
in den politikwissenschaftlichen Methoden. Innovative Techniken für qualitative und
quantitative Forschung, ed. Sebastian Jäckle. Wiesbaden: VS Springer.
Sil, Rudra, and Peter J. Katzenstein. 2010. Beyond Paradigms: Analytical Eclecticism in
the Study of World Politics. Basingstoke: Palgrave Macmillan.
Silver, Nate. 2013. The Signal and the Noise. New York: Penguin.
Verkuilen, Jay. 2005. Assigning Membership in a Fuzzy Set Analysis. Sociological Meth-
ods and Research 33 (4): 462–96.
Wagemann, Claudius. 2017. Qualitative Comparative Analysis (QCA) and Set
Theory. Oxford Research Encyclopedia of Politics. https://doi.org/10.1093/acrefore/9780190228637.013.247
Wagner, Carl G. 2001. Old Evidence and New Explanation III. Philosophy of Science
68 (3): S165–75.
Waldner, David. 2012. Process Tracing and Causal Mechanisms. In Oxford Handbook
of the Philosophy of Social Science, ed. H. Kincaid, 65–84. Oxford: Oxford Univer-
sity Press.
Waldner, David. 2014. What Makes Process Tracing Good? Causal Mechanisms,
Causal Inference, and the Completeness Standard in Comparative Politics. In
Process Tracing: From Metaphor to Analytic Tool, ed. Andrew Bennett and Jeffrey
Checkel, 126–52. Cambridge: Cambridge University Press.
Waldner, David. 2015. Process Tracing and Qualitative Causal Inference. Security Stud-
ies 24 (2): 239–50. https://doi.org/10.1080/09636412.2015.1036624
Walker, Vern. 2007. Discovering the Logic of Legal Reasoning. Hofstra Law Review 35
(4): 1687–1708.
Waltz, Kenneth N. 1979. Theory of International Politics. New York: McGraw-Hill.
Waskan, Jonathan. 2008. Knowledge of Counterfactual Interventions through Cogni-
tive Models of Mechanisms. International Studies in the Philosophy of Science 22
(3): 259–75.
Waskan, Jonathan. 2011. Mechanistic Explanation at the Limit. Synthese 183 (3): 389–
408.
Wasserman, Stanley, and Katherine Faust. 1994. Social Network Analysis. Cambridge:
Cambridge University Press.
Wauters, Benedict, and Derek Beach. 2018. Process-Tracing and Congruence Analysis
to Support Theory-Based Impact Evaluation. Evaluation 24 (3): 284–305.
Weisberg, Jonathan. 2009. Commutativity or Holism? A Dilemma for Conditional-
izers. British Journal for the Philosophy of Science 60 (4): 793–812.
Weller, Nicholas, and Jeb Barnes. 2014. Finding Pathways: Mixed-Method Research for
Studying Causal Mechanisms. Cambridge: Cambridge University Press.
Wendt, Alexander. 1999. Social Theory of International Politics. Cambridge: Cambridge
University Press.
White, Howard. 2009. Theory-Based Impact Evaluation: Principles and Practice. Inter-
national Initiative for Impact Evaluation Working Paper 3. http://www.3ieimpact.org/media/filer_public/2012/05/07/Working_Paper_3.pdf
White, Timothy J. 2000. Cold War Historiography: New Evidence behind Traditional
Typologies. International Social Science Review 75 (3–4): 35–46.
Wight, Colin. 2004. Theorizing the Mechanisms of Conceptual and Semiotic Space.
Philosophy of the Social Sciences 34 (2): 283–99.
Williams, Malcolm, and Wendy Dyer. 2009. Single Case Probabilities. In The Sage
Handbook of Case-Based Methods, ed. David Byrne and Charles C. Ragin, 85–100.
London: Sage.
Williams, Timothy, and Sergio M. Gemperle. 2017. Sequence Will Tell! Integrating
Temporality into Set-Theoretic Multi-Method Research Combining Comparative
Process Tracing and Qualitative Comparative Analysis. International Journal of
Social Research Methodology 20 (2): 121–35.
Index
Empirical predictions (continued)
    theoretical uniqueness of predictions, 44, 100, 138, 142, 155–56, 158, 175–76, 178, 180–81, 190–93, 205–6, 224–36, 244, 259, 263
Empirical test types
    hoop tests, 166, 167, 248, 259
    smoking gun, 166, 168, 169, 205, 224, 225, 228, 234, 247, 250, 251
    straw-in-the-wind, 230
Empirical uniqueness, 155–56, 196, 200–206, 210, 211, 213, 224–36
Endogeneity, 161
Entities. See Operationalization of causal mechanisms
Epistemological probabilism, 16, 174
Epistemology, 16, 19, 173
Equifinality. See Causal mechanisms
Errors
    nonsystematic, 103, 204, 284, 285
    systematic, 204, 217
Essentialist position, 57
Evaluating evidence
    accuracy of observations, 195, 197, 204–7, 212–13, 214, 216, 233, 259
    bias / systematic errors (see Bias)
    content of observations, 5, 172, 195, 201, 205, 210, 211, 259
    reliability / nonsystematic errors (see Bias)
Events / empirical narratives. See Narrative
Evidence
    account, 157, 172, 198, 213, 215, 216, 219, 220
    archival observations (see Data)
    confirming, 110, 179, 182, 184, 192, 225, 230, 235, 249
    difference making, 13, 21, 26, 33, 34, 46, 55, 56, 97, 100, 116, 157, 159–65, 182
    disconfirming, 172, 174, 221, 235
    e silentio evidence, 189, 291
    evidentiary weight, 178, 229, 234
    mechanistic, 4–6, 14, 35, 40, 47, 55–56, 164
    pattern, 172, 198, 216
    probative value, 157, 166, 168, 175–77, 204, 223–24
    sequence, 172, 213, 216, 219
    trace, 172, 216, 265
Experiments, 8, 13–16, 44, 55, 56, 81, 157, 159–62, 182
Explaining-outcome process-tracing, 2, 6, 9, 11–12, 281
    abductive (see Abduction)
    composite / conglomerate mechanisms, 284
    empirics first, 286–87
    historical outcomes, 2, 9, 11, 281, 283, 284, 287
    theory first, 286–87
Extrapolation, 255–57
Falsification, 173, 174, 178
Findings
    negative, 59, 250
    positive, 250–51
Group-think, 13, 86, 185–86, 198, 278–79
Historical scholarship, 199–207, 214, 221, 231, 259, 282, 284
Homogeneity. See under Causal homogeneity
Homogeneous populations. See Population
Hume, David: neo-Humean, 45–46
Hypothesis, 93, 167, 169, 172, 175, 178, 179, 180, 182, 186
Idiosyncratic, 132, 138, 140, 251, 255
Independence of evidence / observations, 205, 215, 230–32
Independent variable, 61
In-depth theory-testing process-tracing. See Process-tracing
Large-N, 3, 92, 93, 105, 106, 162
Least likely case. See Case selection
Likelihood ratio. See Bayesian logic
Manipulation, 164
Measurement
    accuracy, 199, 205, 207, 212
    error, 195, 204, 214, 215, 217
Mechanistic evidence. See Evidence
Mechanistic heterogeneity, 6, 7, 41, 54, 77–81, 90, 91, 99, 104, 106, 112–32, 135, 136
Mechanistic homogeneity, 41, 90, 96, 112, 119, 124, 125, 129, 131, 132, 135, 136, 140, 144, 149, 176, 260, 277
Mechanisms as systems. See Causal mechanisms
Membership. See Comparative methods, fuzzy set; Crisp set concepts; Set theory
Metaphysic, 45
Meteor, 170
Method of agreement. See Comparative methods
Methodological alignment, 14
Micro / macro debate, 12, 17. See also Actor / micro level
Midrange theory, 77, 153
Mill's method of agreement. See Comparative methods
Mill's method of difference. See Comparative methods
Narrative, 124, 147, 152, 156
    descriptive, 2, 3, 11, 31, 32, 87, 88, 159, 277
    events, 45, 46, 51, 87, 95, 101, 148, 156, 161
Necessary condition. See Causal relation
Nested analysis, 103, 105–6
Neo-positivist, 7, 45, 46, 47, 52, 96, 281
Non-homogeneous populations. See Population
Observable implications, 166, 177, 195
Observations
    independence of sources, 222, 231
    raw data material, 5, 166, 177
Omitted causal conditions, 6, 11, 55, 102, 116, 118, 135, 138–40, 144, 152, 274
Ontological determinism, 15–19
Ontology, 16, 19, 61, 173
Operationalization of causal mechanisms
    activities, 3–5, 18, 37–40, 64, 66, 68–72, 74, 80, 83
    entities, 3, 9, 34, 38, 47, 52, 64, 69–73
Operationalization of empirical tests, 187, 245, 258–60
Outcome, 13, 14, 18, 21, 22–25
Parsimony, 96
Parts of a mechanism
    nonsystematic, 103, 284, 285
    systematic, 45–46, 103