
Law and Human Behavior © 2013 American Psychological Association
2013, Vol. 37, No. 3, 197–207 0147-7307/13/$12.00 DOI: 10.1037/lhb0000027

The Expression and Interpretation of Uncertain Forensic Science Evidence:
Verbal Equivalence, Evidence Strength, and the Weak Evidence Effect
Kristy A. Martire, Richard I. Kemp, Ian Watkins, Malindi A. Sayle, and Ben R. Newell
University of New South Wales

This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Standards published by the Association of Forensic Science Providers (2009, Standards for the formulation of evaluative forensic science expert opinion, Science & Justice, Vol. 49, pp. 161–164) encourage forensic scientists to express their conclusions in the form of a likelihood ratio (LR), in which the value of the evidence is conveyed verbally or numerically. In this article, we report two experiments (using undergraduates and Mechanical Turk recruits) designed to investigate how much decision makers change their beliefs when presented with evidence in the form of verbal or numeric LRs. In Experiment 1 (N = 494), participants read a summary of a larceny trial containing inculpatory expert testimony in which evidence strength (low, moderate, high) and presentation method (verbal, numerical) varied. In Experiment 2 (N = 411), participants read the same larceny trial, this time including either exculpatory or inculpatory expert evidence that varied in strength (low, high) and presentation method (verbal, numerical). Both studies found a reasonable degree of correspondence in observed belief change resulting from verbal and numeric formats. However, belief change was considerably smaller than Bayesian calculations would predict. In addition, participants presented with evidence weakly supporting guilt tended to "invert" the evidence, thereby counterintuitively reducing their belief in the guilt of the accused. This "weak evidence effect" was most apparent in the verbal presentation conditions of both experiments, but only when the evidence was inculpatory. These findings raise questions about the interpretability of LRs by jurors and appear to support an expectancy-based account of the weak evidence effect.

Keywords: forensic science, evidence strength, interpretation, juror decision making

Kristy A. Martire, Richard I. Kemp, Ian Watkins, Malindi A. Sayle, and Ben R. Newell, School of Psychology, University of New South Wales, Sydney, Australia.
This article was facilitated in part by funding from the Australian Research Council to Richard I. Kemp (Linkage Project LP110100448) and Ben R. Newell (Discovery Grant DP110100797; Future Fellowship FT110100151). Sincerest thanks to Jon Berengut for his kind assistance in the preparation of this article.
Correspondence concerning this article should be addressed to Kristy A. Martire, School of Psychology, University of New South Wales, Sydney, NSW, Australia, 2052. E-mail: k.martire@unsw.edu.au

Forensic scientists can never compare the sample (e.g., fingerprint, toolmark, shoeprint) they have with every potential contributing source, as would be required to individualize a sample to a source. Moreover, it is possible for two different sources to leave indistinguishable markings (or, conversely, for the same source to result in two distinguishable markings; Koehler & Saks, 2010). For both of these reasons, expert opinions must have a statistical basis and therefore reflect some degree of uncertainty (National Academies of Science, 2009; Thornton & Peterson, 2007).

Growing acceptance of the value of a probability framework as a coherent logical foundation for forming opinions in the forensic sciences (Aitken et al., 2011) can be seen in documents recommending that evaluative opinions be formulated in line with probability theory and Bayesian principles of belief updating (Berger, 2010). In particular, the Association of Forensic Science Providers (AFSP; 2009) published a set of standards (including a scale of numerical and verbal expressions; see Table 1) indicating opinions should be based upon the estimation of a likelihood ratio (p. 161) reflecting the ratio of two probabilities (making no reference to the possibility of error): (a) the likelihood of obtaining a piece of evidence given a proposition broadly consistent with the prosecution's case; compared with (b) the likelihood of obtaining a piece of evidence given an alternative (defense) proposition. For example, following these guidelines, a forensic scientist could express the results of their analysis of a shoe print in the following manner: "In my opinion the correspondence between the footwear mark at the crime scene and the shoe of the accused is 4.5 times more likely when the shoe has made the mark [Proposition 1], than when the shoe has not made the mark [Proposition 2]."

To date, uptake of these standards has varied considerably across jurisdictions and disciplines. For example, although the use of LRs is now standard practice for cartridge-case and bullet comparisons in the Netherlands (Stoel, 2012), and is actively being explored in other disciplines and jurisdictions (see Morrison, 2011; Neumann, Evett, & Skerret, 2012), in the United States, the use of LRs for any non-DNA comparison is rare.

Even so, in 2011, in the wake of critical attention from the U.K. Court of Appeal in the case of R v T (2010), the available standards were extended to address forms of expression (Aitken et al., 2011). In the resultant position statement, 31 leading stakeholders, largely, although not exclusively, from European institutions and organizations, reaffirmed that LRs are the most appropriate foundation for assisting the court. This group also asserted that a verbal scale based on the notion of the LR is the most appropriate basis for communication of an evaluative expert opinion to the


court (see Aitken et al., 2011, Point 5). Importantly, this debate about the best way to present the results of complex forensic science analyses in court has occurred seemingly without reference to the body of empirical evidence amassed by psychologists relating to reasoning under uncertainty.

Table 1
Standards for Numerical and Verbal Expression of Likelihood Ratios (Association of Forensic Science Providers, 2009)

Recommended likelihood ratio terminology
Numerical expression    Verbal expression (support)
>1–10                   Weak or limited
10–100                  Moderate
100–1,000               Moderately strong
1,000–10,000            Strong
10,000–1,000,000        Very strong
>1,000,000              Extremely strong

Psychological evidence suggests that people often have difficulty understanding probabilities and statistics (Gigerenzer & Edwards, 2003) and that they tend to confuse likelihoods and posterior beliefs (as in the prosecutor and defense-attorney fallacies; Koehler, 1996; Thompson, 1989; Thompson & Schumann, 1987) in a way that may sometimes lead to overvaluing of, and at other times undervaluing of, evidence relative to normative (i.e., Bayesian) models (Faigman & Baglioni, 1988; Goodman, 1992; Kaye & Koehler, 1991; Smith, Bull, & Holliday, 2011; Smith, Penrod, Otto, & Park, 1996). The recommendation that forensic scientists should use verbal equivalents may also be problematic, given the psychological evidence indicating that the meaning attributed to a single word can vary from person to person and from context to context (e.g., Brun & Teigen, 1988; Budescu, Broomell, & Por, 2009; Budescu, Por, & Broomell, 2012; Wallsten & Budescu, 1995).

Three key psychological experiments speak particularly to the (non)equivalence of verbal and numerical expressions in the forensic science arena, and to the question of how the decision maker will interpret evidence presented in the form of LRs. The first, by Nance and Morris (2005), investigated various forms for quantifying DNA random-match probabilities (RMPs) and laboratory error rates. Judges and jurors were asked to rate the probability of the defendant's guilt, given a RMP presented in one of three formats: a frequency (e.g., a 1 in 40,000 chance of a coincidental match), a LR (e.g., 40,000 times more likely to match if the accused is the source of the crime scene sample than if he is not), or a chart mapping hypothetical prior and posterior probabilities for the indicated LR. The RMP presentation format was found to significantly influence rated guilt probability, with post hoc comparisons indicating a significant difference between the frequency format (which produced the lowest estimates of guilt probability) and the chart format (which provided the highest estimates), with the LR falling between the two. The authors concluded that the use of LRs appeared to move juror assessments of evidence in the direction of Bayesian norms (p. 429), and consequently supported their use in courts over frequentistic expressions of the probative value of DNA analyses.

The second study, by McQuiston-Surrett and Saks (2008), examined undergraduate psychology students' rating of the strength of a set of standardized verbal labels proposed for use by the American Board of Forensic Odontology (ABFO). The ratings provided by these mock jurors were roughly the opposite of what the ABFO intended. For example, participants attributed the greatest certainty to testimony of "a match" (86 on a 100-point scale), but this was the phrase that the ABFO reserved for the lowest level of certainty among the four options. Conversely, the phrase the ABFO designated for the strongest expression of certainty, "reasonable scientific certainty," was rated as second most uncertain by participants (70.7/100).

More recently, de Keijser and Elffers (2012) used realistic technical forensic reports to examine how well judges and lawyers (jurists) and experts in the Netherlands understood evaluative expert opinions expressed using the scale recommended by the Netherlands Forensic Institute (NFI; Berger, 2010). The results indicated that although experts (members of the NFI) showed a greater ability to identify correct interpretations of the opinions compared with jurists, both groups commonly made errors interpreting the LRs and had little insight into their limited understanding of the report conclusions.

These studies all suggest that several questions still remain regarding the interpretability of verbal expressions of LRs as formulated under the new standards proposed by the AFSP. Moreover, given that values on the scale range from "weak or limited" to "extremely strong" support, it is also unclear how the degree of numerical and verbal correspondence varies with evidence strength across the scale (see Table 1). Accordingly, the current study was designed to measure change in belief regarding the likely guilt or innocence of the defendant, given verbal or numerical expert evaluative opinions of various strengths. To this end, we measured belief prior to hearing a forensic scientist's opinion (prior belief), and after hearing the expert's opinion (posterior belief). This approach allows us to calculate belief change in response to the evidence, which can be used (a) to assess the equivalence of the verbal and numerical scales over various levels of evidentiary strength, and (b) to compare the observed change from prior belief with posterior belief with the change that was intended by the expert.

It is predicted that evidence attributed greater strength by the expert will result in significantly greater belief change (prior to posterior) than evidence attributed lesser strength. It is also anticipated that opinions presented in numerical format will significantly differ in impact from opinions expressed in equivalent verbal form, consistent with the findings of McQuiston-Surrett and Saks (2008), as described previously. Furthermore, there is some reason to anticipate these main effects may be moderated by what is known as a "weak evidence effect" (Fernbach, Darlow, & Sloman, 2011).

The weak evidence effect describes a situation in which a piece of evidence, which weakly supports belief in a given hypothesis (as is the case in which an expert produces an evaluative opinion with a small LR), has the counterintuitive effect of increasing belief in the alternative hypothesis (Lopes, 1987). These directional errors (or "boomerang effects"; Petty & Cacioppo, 1996) have been observed in many contexts, including the evaluation of public policy initiatives (Fernbach et al., 2011), argumentation (McKenzie, Lee, & Chen, 2002), and mock-juror decision making (Smith et al., 1996), and has been explained by various mechanisms from informational neglect (Fernbach et al., 2011), to
expectancy-based decision making, in which a downward revision results from comparing the actual evidence strength against your high expectations of how strong the evidence would or should be (McKenzie et al., 2002). Although it is not yet known if forensic science expert evidence, posed in the form of verbal and numerical LRs, is also susceptible to this counterintuitive style of interpretation, aspects of the recommended expression for low-strength evidence ("weak or limited support") would seem to make this outcome likely. More precisely, although the suffix "support" indicates the evidentiary value is in favor of the proposition rather than against it, the term "weak" may reasonably be interpreted as the opposite of "strong" evidence. Consistent with a weak evidence effect, this could cause people to increase their belief in the alternative proposition rather than weakly increasing their belief in the supported proposition.

Experiment 1

Method

Design. A 3 (evidential strength: low, moderate, high) × 2 (presentation method: verbal, numerical) between-subjects factorial design was employed.

Participants and data screening. Participants were 75 undergraduate psychology students participating for course credit and 545 workers from an online self-enlisted workforce (Mechanical Turk; Buhrmester, Kwang, & Gosling, 2011; Paolacci, Chandler, & Ipeirotis, 2010), who were compensated US 50¢ for their time. Of the 620 participants completing the experiment, 131 (21.1%) were excluded, based on three predefined criteria,¹ resulting in a final sample of N = 494 (Numerical n_low = 87, n_mid = 73, n_high = 73; Verbal n_low = 81, n_mid = 91, n_high = 89). Of this sample, 13.8% resided in Australia, 55.7% in the United States, 20.9% in India, and 9.7% were based in other countries. Three-quarters (75.5%) of the sample reported being native English speakers. On average, the remaining 121 participants had been speaking English for 17.44 years (range = 3 to 55, SD = 8.39). Females accounted for 54.5% of the sample and the mean age was 31.28 years (range = 16 to 81, SD = 12.36). Almost 81% (80.8%) of the sample indicated that, to the best of their knowledge, they were eligible for jury duty in their country of residence.

¹ Participants were excluded if they (a) completed the experiment in less than 120 s (n = 10); (b) failed the catch-trial (Paolacci et al., 2010) by not choosing the "not very good" response when asked, "How good are you at surviving one hour without oxygen?" (n = 55); or (c) belief-change value was classified as an extreme outlier falling outside 7 times the interquartile range (n = 61; Barbato, Barini, Genta, & Levi, 2011).

Materials and procedure.

The minimal trial. All participants were presented with a one-page vignette of a hypothetical larceny trial including only the requirements for conviction, the case facts, and the inculpatory testimony of an expert.

Participants were first asked to imagine that they were a juror in a trial and were informed that in order to return a guilty verdict, they must be satisfied beyond a reasonable doubt that (a) the accused person, (b) took and carried away, (c) the property of another, (d) with the intent to permanently deprive the owner of the property, and (e) that taking was without the owner's consent. They were then presented with the following case facts: (a) $300 was stolen from the victim, (b) the accused was arrested with $328 in his possession for which he did not account, (c) the accused was arrested in the vicinity of the theft, (d) the accused did not have an alibi, and (e) at the time of his arrest, the accused was wearing clothes similar to those described by a witness.

Expert evidence. Each participant also read the testimony (loosely based on the evidence provided in R v T) of an experienced expert forensic science analyst who compared footwear marks found at the scene of the crime with the shoe the accused was wearing at the time of arrest. As a result of his analysis, the expert stated,

When assessing the significance of any similarity or differences between a shoe and a mark resulting from an analysis, the likelihood of obtaining that similarity or difference is considered against two alternative propositions: (1) the shoe has made the mark; (2) the shoe has not made the mark.

Participants were then provided one of six possible versions of the expert's opinion, based on their randomly allocated condition (see Table 2). These values and verbal labels were derived from those proposed by the AFSP. For example, someone in the low evidential strength condition next read, "In my opinion the correspondence between the footwear mark at the crime scene and the shoe of the accused [is 4.5 times more likely] (numerical) or [offers weak or limited support] (verbal) when proposition 1 is correct than when proposition 2 is correct."

Participant responses and procedure. After consenting to participate, accessing the experimental materials online, and reading the details of the minimal trial and case facts, participants were asked whether, based on the information presented, they currently believed the accused was more likely to be guilty or not guilty. If the participant indicated a preference for guilt, they were then asked to complete the following sentence using a number greater than 1: "Based on the available evidence I believe that it is ____ times more likely that the accused is guilty than not guilty" (emphasis added). If the participant expressed an original preference for the not guilty option, they were given the same question with the order of the italicized terms reversed. The response to this question was taken as the participant's prior belief in the accused's guilt. Participants were then presented with one of the six types of expert evidence before being asked the same two questions again, providing a posterior-belief value. They were also asked to complete a series of demographic questions and the Subjective Numeracy Scale (SNS; Fagerlin et al., 2007), which assesses numerical fluency. The entire procedure took approximately 15 min to complete.

Results

Presentation method and evidence strength. Initial analyses were conducted separately for undergraduate and Mechanical Turk participants. The two groups produced the same pattern of belief-change results and accordingly have been analyzed and reported together henceforth.

Belief-change values were calculated for each participant by subtracting the stated prior belief from the posterior belief. For these purposes not guilty beliefs were coded as negative values.
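As a minimal sketch of the coding just described, the Python below turns a "times more likely" response into a signed belief value and takes posterior minus prior. The function names (`signed_belief`, `belief_change`) are hypothetical and introduced only for illustration; the check that responses exceed 1 mirrors the response instruction given to participants rather than any analysis code reported by the authors.

```python
def signed_belief(times_more_likely: float, favors_guilt: bool) -> float:
    """Code a 'times more likely' response as a signed belief value.

    Assumption (illustrative): beliefs favoring guilt stay positive and
    beliefs favoring not guilty become negative, as the text describes.
    """
    if times_more_likely <= 1:
        # Participants were asked to respond with a number greater than 1.
        raise ValueError("response must be greater than 1")
    return times_more_likely if favors_guilt else -times_more_likely


def belief_change(prior: float, posterior: float) -> float:
    """Belief change is the signed posterior minus the signed prior."""
    return posterior - prior


# A participant who starts at 3 times more likely guilty and ends at
# 2 times more likely not guilty has moved toward innocence.
prior = signed_belief(3, favors_guilt=True)
posterior = signed_belief(2, favors_guilt=False)
change = belief_change(prior, posterior)
print(change)
```

A negative score under this coding indicates a move toward not guilty, which is the direction associated with the weak evidence effect when the evidence is inculpatory.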
For example, if a participant began by believing the defendant was 4 times more likely to be not guilty than guilty (prior = −4), and finished believing the defendant was 4 times more likely to be guilty than not guilty (posterior = 4), that person will have a belief-change score of 8 (posterior minus prior).

Table 2
Evidence Strength and Presentation Method

                        Presentation method
Evidentiary strength    Numerical    Verbal
Low                     4.5          Weak or limited support
Moderate                450          Moderately strong support
High                    495,000      Very strong support

A 2 × 3 ANCOVA was conducted to examine the impact of presentation method and evidential strength on belief change while controlling for prior-belief value (i.e., the number reflecting how many times more likely one hypothesis is than the other) and score on the SNS (see Figure 1). Of these, only the prior-belief covariate was significant, F(1, 486) = 8.16, MSE = 73.59, p < .005, partial η² = 0.017, 95% CI [.002, .046]. Adjusting for this resulted in a significant main effect for presentation method, such that numerical expressions of evidence resulted in greater adjusted mean belief-change (M = 1.63) compared with verbal expressions (M = 0.31), F(1, 486) = 23.51, MSE = 212.06, p < .0005, partial η² = 0.046, 95% CI [.017, .087]. There was also a significant main effect of evidential strength, F(2, 486) = 13.28, p < .0005, MSE = 119.78, partial η² = 0.052, 95% CI [.019, .092], and a significant interaction effect, F(2, 486) = 7.75, MSE = 69.87, p < .0005, partial η² = 0.031, 95% CI [.006, .065]. Pairwise comparisons showed that numerical and verbal expressions resulted in equal amounts of belief change when the evidence strength was moderate or high. However, when evidence strength was low, numerical expression (M = 1.35) resulted in significantly greater belief change than the verbal label (M = −1.39). Overall, the belief change observed in the low-strength numerical condition was in the direction intended by the expert giving the evidence (i.e., toward guilt); however, the overall belief change observed in the low-strength verbal condition was in the opposite direction to that intended by the expert (i.e., toward innocence).

Figure 1. Mean adjusted belief change by presentation method and evidential strength (error bars = 2 standard errors).

Weak evidence effect. Given that, in all conditions, participants were presented with expert evidence supporting the hypothesis that the accused's shoe had made the mark at the crime scene (and therefore supporting the prosecution case), observed declines in guilt ratings are consistent with a weak evidence effect. That is, inculpatory evidence increased belief in the innocence of the accused. To explore this effect further, the proportion of participants in each condition who revised their belief in the guilt or innocence of the accused downward after reading the expert's evidence (i.e., toward not guilty) was calculated (see Table 3). A majority of those in the low/verbal condition (61.72%) responded in a manner incongruent with the evidence provided by the expert (taking inculpatory evidence to be exculpatory), compared with an average of 12.91% in the remaining conditions. Moreover, a sizable number of those revising their guilty beliefs toward innocence in the low/verbal condition actually crossed the guilty/not guilty threshold when moving from priors to posteriors (n = 19; 23.46%). That is, they considered the defendant more likely to be guilty than not guilty before reading the expert's testimony, and considered the defendant more likely to be not guilty than guilty after reading incriminating evidence from the expert.

Table 3
Percent Moving Toward Innocence After Hearing the Expert Evidence by Presentation Method and Evidence Strength

                                   Percent moving towards innocence (number of participants
                                   changing belief preference from guilty to not guilty)
Evidence strength [LR provided]    Numerical presentation    Verbal presentation
Low [4.5]                          12.6% (4)                 61.7% (19)
Moderate [450]                     8.2% (1)                  14.2% (3)
High [495,000]                     13.6% (3)                 15.7% (2)

Note. LR = likelihood ratio.

In order to obtain a clearer picture of the variance captured in the mean belief-change values, particularly compared with Bayesian norms, box plots were also constructed depicting observed belief change compared with the belief change that would be expected by applying Bayes's theorem to participants' stated priors (Bayesian belief-change = prior-belief × LR; see Figure 2). In all conditions, the overwhelming majority of participants responded more conservatively to the evidence than would have been predicted by the application of Bayes's theorem, and, in the case of the high-strength evidence conditions, the difference between the observed and predicted responses was especially large, amounting to several orders of magnitude.
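The Bayesian benchmark can be illustrated with a short Python sketch. It assumes one reading of the comparison: a signed belief value is first converted to odds of guilt (taking the reciprocal when the belief favors not guilty), the odds are multiplied by the expert's LR, and the result is converted back to a signed value before the prior is subtracted. The helper names and this conversion are assumptions made for illustration, not the authors' reported procedure.

```python
def to_odds(signed_belief: float) -> float:
    """Signed belief (negative = favors not guilty) to odds of guilt.

    Assumption (illustrative): the magnitude is at least 1, so a belief
    of -4 becomes odds of 1/4 and a belief of 4 stays at odds of 4.
    """
    if abs(signed_belief) < 1:
        raise ValueError("signed belief magnitude must be at least 1")
    return signed_belief if signed_belief > 0 else 1.0 / abs(signed_belief)


def to_signed(odds: float) -> float:
    """Odds of guilt back to the signed 'times more likely' scale."""
    return odds if odds >= 1 else -1.0 / odds


def bayesian_belief_change(prior_signed: float, lr: float) -> float:
    """Belief change expected if the LR were applied in full.

    Posterior odds = prior odds * LR (Bayes's theorem in odds form).
    """
    posterior_odds = to_odds(prior_signed) * lr
    return to_signed(posterior_odds) - prior_signed


# Expected change for a prior of 2 times more likely guilty under the
# three LR values used in Experiment 1.
for lr in (4.5, 450, 495_000):
    print(lr, bayesian_belief_change(2, lr))
```

Under these assumptions, a participant who starts at 2 times more likely guilty and fully accepts an LR of 495,000 would shift by 989,998 belief-change units, which gives a sense of why observed changes of a few units fall orders of magnitude short of the benchmark.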

[Figure 2: box plots in three panels (Low Strength, Moderate Strength, High Strength). Vertical axis: Belief-Change Units. Horizontal axis: Bayesian and Observed belief change. Legend (Presentation Type): Verbal, Numerical.]

Figure 2. The central 80% of the distribution of observed and Bayesian belief change in the low-evidence (left panel), moderate-evidence (center panel; observed belief-change magnified inset), and high-evidence (right panel; observed belief-change magnified inset) strength conditions. Each box includes the central 50% of belief-change values, and the solid lines in the boxes mark the medians. Belief change equals stated posterior belief minus stated prior belief.

As another gauge of the degree of correspondence between the expert's intention and the jurors' interpretation of the evidence, the data were used to calculate implicit LRs (ILRs), accounting for each participant's change from prior to posterior belief. This value could then be compared with the LR provided by the expert.

The LR is produced by dividing the posterior-belief odds by the prior-belief odds. However, some adjustment was necessary to allow for the fact that some participants were expressing belief that favored guilt, whereas others favored innocence. To accommodate for this, where belief favored innocence over guilt, the reciprocal of the belief value was calculated, so a participant who thought that the suspect was 2 times more likely to be guilty than innocent was given an odds value = 2.0, whereas a participant who thought he was 2 times more likely to be innocent than guilty was given an odds of 0.5 (1/2). Both the prior and the posterior odds were adjusted in this way before calculating the ILR for each participant, using the formula ILR = posterior odds/prior odds. For example, if a participant began by believing the defendant was 2 times more likely to be not guilty than guilty, they would have a prior of 0.5. If, after hearing the expert evidence, they thought the defendant was 2 times more likely to be guilty than not guilty, their posterior odds would be 2. The ILR, which should achieve a change from a prior of 0.5 to a posterior of 2, is 2/0.5 = 4.

The median ILR values are reported in Table 4. These data are consistent with undervaluing the expert's testimony in all conditions. Where the expert attributed values of 4.5, 450, or 495,000 to the evidence, participants' ILRs were roughly 3.7 (low strength), 300 (moderate strength), and 353,571 (high strength) times smaller than intended by the expert. For example, in the high-strength numerical evidence condition, participants interpreted a statement that explicitly included the LR of 495,000 by using an ILR of 1.4.

Table 4
Median Implicit Likelihood Ratio by Presentation Method and Evidence Strength

                                   Median ILR (range)
Evidence strength [LR provided]    Numerical presentation    Verbal presentation
Low [4.5]                          1.2 (0.1–25.0)            0.8 (0.1–35.0)
Moderate [450]                     1.5 (0.2–20.0)            1.1 (0.1–6.0)
High [495,000]                     1.5 (0.2–25.0)            1.3 (0.1–20.0)

Note. ILR = implicit likelihood ratio; LR = likelihood ratio.

Discussion

Examination of the belief change induced by forensic science evidence of varying strengths revealed that, although most belief change was in line with the expert's intended interpretation, participants were only weakly sensitive to large differences in evidential strength and underestimated its value compared with Bayesian calculations. This could be due to a misuse of the evidence or, alternatively, may reflect perceptions of the relevance of the evidence to the guilt of the accused; these potential explanations will be considered in more detail in the general discussion.

These data also suggest that verbal and numerical mechanisms proposed by forensic scientists (e.g., AFSP; Aitken et al., 2011) do not have an equivalent impact on participant-jurors' beliefs. Specifically, presenting participants with low-strength expert evidence in a verbal format resulted in an average response that was opposite to the direction intended by the expert giving the evidence, with 61.72% of respondents in this condition treating evidence supporting the defendant's guilt, as though it favored innocence. That is, as hypothesized, a majority of participants in the low-strength/verbal evidence condition treated information that should have further implicated the defendant (i.e., inculpatory evidence) as though it actually strengthened the case for the defendant's innocence (i.e., exculpating the defendant). This same inversion, although present, was much less pronounced in the low-strength numerical condition, affecting only 12.64% of participants in the condition. This suggests that numerical expressions may be less

susceptible to this type of misinterpretation, at least when com- half (49.6%) of the sample indicated that, to the best of their knowl-
pared with the form of words proposed by the AFSP. edge, they were eligible for jury duty in their country of residence.
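The implicit likelihood ratio (ILR) reported for Experiment 1 can be made concrete with a short Python sketch. It follows the description in the Results: beliefs favoring innocence are converted to odds by taking the reciprocal, and the ILR is the posterior odds divided by the prior odds. The function names, and the `shortfall` helper that expresses how many times smaller an ILR is than the expert's LR, are hypothetical and serve only as an illustration.

```python
def odds_of_guilt(times_more_likely: float, favors_guilt: bool) -> float:
    """Convert a 'times more likely' response to odds of guilt.

    A belief favoring innocence is coded as the reciprocal, so
    '2 times more likely innocent' becomes odds of 0.5.
    """
    if times_more_likely <= 0:
        raise ValueError("response must be positive")
    return times_more_likely if favors_guilt else 1.0 / times_more_likely


def implicit_lr(prior_odds: float, posterior_odds: float) -> float:
    """ILR = posterior odds / prior odds."""
    return posterior_odds / prior_odds


def shortfall(expert_lr: float, ilr: float) -> float:
    """How many times smaller the ILR is than the LR the expert stated."""
    return expert_lr / ilr


# Worked example from the text: prior of 2 times more likely not guilty,
# posterior of 2 times more likely guilty.
prior = odds_of_guilt(2, favors_guilt=False)     # 0.5
posterior = odds_of_guilt(2, favors_guilt=True)  # 2
print(implicit_lr(prior, posterior))             # 4.0

# Moderate-strength numerical condition: LR of 450 against a median ILR of 1.5.
print(shortfall(450, 1.5))                       # 300.0
```

An ILR below 1 under this coding corresponds to a participant treating inculpatory evidence as if it favored innocence, as in the median of 0.8 reported for the low-strength verbal condition.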
This reversal of intended direction of the evidence appears to be Materials and procedure. The materials and procedure for
an important and novel demonstration of the weak evidence effect Experiment 2 were identical to Experiment 1 except where the
(Fernbach et al., 2011). The possible mechanisms behind this direction of the expert evidence was reversed. In the inculpatory
effect will be discussed later, but at this time we note the practical (replication) conditions, as in Experiment 1, the expert testified
implications of this effect. Although it may appear problematic for against two alternative propositions: (1) the shoe has made the
decision makers to be treating inculpatory evidence as though it mark; (2) the shoe has not made the mark. For the exculpatory
were exculpatory, as observed here, such an outcome is likely conditions, the order of these propositions was reversed such that
more desirable than if the opposite were also true. The criminal the expert testified that the two alternatives propositions were (1)
justice system is designed to be asymmetric, giving greater em- the shoe has not made the mark; and (2) the shoe has made the
phasis to the protection of the rights of the accused than those of mark. Irrespective of whether the expert provided inculpatory or
the State (via procedural safeguards, such as a presumption of exculpatory evidence, the opinion statement was the same as for
innocence and the reasonable doubt verdict thresholds, among Experiment 1: In my opinion the correspondence between the
others). Accordingly, a tendency to discount weak, although in- footwear mark at the crime scene and the shoe of the accused is
criminating, prosecution evidence is consistent with this inten- [4.5 times more likely/offers weak or limited support] when prop-
tional bias, and although concerning from a logical decision- osition 1 is correct than when proposition 2 is correct, with the
making point of view, its implications for the criminal justice precise wording of each of the propositions remaining on screen
system may be less critical. If, however, the opposite were also throughout.
truethat weak exculpatory evidence was taken to support the
guilt of the accusedthis would be a much more serious concern,
even more so if the newly recommended verbal methods of com- Results
munication made such an outcome more likely than traditional
Initial analyses were conducted separately for undergraduate
numerical expressions.
and Mechanical Turk participants. The two groups produced the
In order to explore this possibility, we conducted a second
same pattern of belief-change results and accordingly have been
experiment, using modified materials from Experiment 1, to pres-
analyzed and reported together.
ent participants with either high or low-strength exculpatory evi-
dence or low-strength inculpatory evidence (thereby replicating Inculpatory (replication) conditions. A one-way ANCOVA
part of Experiment 1) in both verbal and numerical formats. was conducted using the belief-change values obtained from the
inculpatory conditions (low-strength verbal and numerical, N
133) to establish whether the weak evidence effect could be
Experiment 2 replicated in Experiment 2 (see Figure 3). Once again, the prior-
belief covariate was significant, F(1, 130) 18.16, MSE
150.58, p .0005, partial 2 0.123, 95% CI [.036, .230].
Method Adjusting for this resulted in a significant main effect for presen-
Design. The principal design of Experiment 2 was a 2 (evi- tation method, such that numerical expressions of evidence re-
dence: exculpatory high strength, exculpatory low strength) 2 sulted in greater adjusted mean belief change (M 1.54) com-
(presentation method: verbal, numerical), with both factors manip- pared with verbal expressions (M 2.54, F[1, 130] 66.71,
ulated between subjects. In addition, we repeated two conditions MSE 553.30, p .0005, partial 2 0.339, 95% CI [.212,
from Experiment 1 (inculpatory low-strength verbal and inculpa- .448]. These results are consistent with an overall weak evidence
tory low-strength numerical) in an attempt to replicate the previ- effect in the inculpatory low-strength verbal evidence condition,
ously observed weak evidence effect with inculpatory evidence. but not the inculpatory low-strength numerical evidence condition.
Participants. Participants were 139 undergraduate psychology Exculpatory expert evidence. A 2 2 ANCOVA was con-
students participating for course credit and 399 Mechanical Turk ducted to examine the impact of evidence strength (high or low)
workers who were compensated US 50 for their time. Data collected and presentation method on belief change (controlling for priors)
online was screened using the same three criteria employed in Ex- in the exculpatory expert conditions (N 278; see Figure 2). In
periment 1, resulting in the following removals: 4 on the basis of this case, only the main effect of evidence strength was significant,
completion time; 104 failed the catch trial; and 19 based on the with high-strength evidence resulting in greater belief-change (to-
Inter Quartile Range (IQR). Overall, of the 538 participants complet- ward innocence, M 2.43) than the low-strength evidence (M
ing the experiment, 127 (23.6%) were excluded based on these 1.47, F[1, 273] 3.99, MSE 78.02, p .05, partial 2
criteria, resulting in a final sample of 411 (Numerical nexculp/high 0.014, 95% CI [.000, .054]. On average, belief-change scores in
66, nexculp/low 67, ninculp/low 66; Verbal nexculp/high 74, nexculp/ each of the low-strength exculpatory conditions were in the direc-
low 71, ninculp/low 67). Of the remaining participants, 33.8% tion intended by the expert (negative). This is not consistent with
resided in Australia, 20.9% in the United States, 30.2% in India, and a weak evidence effect, in which positive belief change (i.e.,
15.1% were in other countries. Almost 60% (59.9%) of the sample movement toward guilt) in response to weak exculpatory evidence
reported being native English speakers. The remaining 165 partici- would have been expected.
pants had been speaking English for an average of 18.59 years (range The direction of individual belief-change decisions was exam-
5 to 52, SD 10.16). Males accounted for 52.1% of the sample, and ined in order to establish what proportion of individuals in each
the mean age was 26.89 years (range 17 to 66, SD 10.26). Almost condition revised their belief-change in a manner that was incon-
EXPRESSING AND INTERPRETING UNCERTAINTIES 203

Inculpatory Low Strength


(replication)
60

50

40

Belief-Change Units
30

20
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
This document is copyrighted by the American Psychological Association or one of its allied publishers.

10

0 Presentation Type
Verbal
-10 Numerical
Bayesian Observed

Figure 4. The central 80% of the distribution of observed and Bayesian


belief-change in the inculpatory low-strength (replication) condition. Each
box includes the central 50% of belief-change values, and the solid lines in
the boxes mark the medians. Belief change equals stated posterior belief
minus stated prior belief.

Figure 3. Mean adjusted belief change from prior to posterior by pre-


sentation method and evidence condition (error bars 2 standard errors).
Box plots of observed and Bayesian belief-change are presented
in Figure 4 (replication condition) and Figure 5 (exculpatory
evidence conditions). Again, participant belief change is conser-
gruent with the expert evidence (see Table 5). As in Experiment 1, vative compared with Bayesian norms, with the low-strength in-
a majority of those in the inculpatory low/verbal condition culpatory verbal evidence resulting in a weak evidence effect for a
(67.16%) responded in the opposite direction to that intended by majority of participants.
the expert, compared with 30.99% in the exculpatory low verbal The median ILRs are presented in Table 6. The ILRs used in the
condition and around 12% to 20% in the remaining conditions. As exculpatory conditions were inverted here in order to permit direct
in Experiment 1, a substantial proportion of those making incon- comparisons between the ILR and the provided LR. Consistent
gruous changes in the inculpatory low-strength verbal condition with Experiment 1, the data show that the ILRs were markedly
changed their preference from one of guilty to not guilty more conservative than those provided.
(38.81%). In contrast, only 18.18% of participants in the exculpa-
tory low-strength/verbal condition changed from not guilty to
guilty. However, none of the participants in the exculpatory General Discussion
low-strength numerical condition made incongruous changes of We conducted two experiments investigating the impact of the
this sort. newly agreed format for the presentation of forensic science evi-
dence in court. Our results suggest that, although in some ways, the
numerical scale and verbal equivalents proposed by the AFSP
Table 5 performed as intended, in other ways, they did not and hence are
Percent Moving Opposite to Expert Evidence by Presentation far from ideal.
Method and Evidence Strength
Verbal-Numerical Equivalence
Percent moving incongruous to the evidence
(number of participants in condition who When tested using the inculpatory evaluative opinion of a fo-
changed their original guilty/not guilty
preference) rensic scientist, significant differences between verbal and numer-
ical expressions of evidence were revealed only for low-strength
Evidence condition Numerical presentation Verbal presentation
evidence. This suggests verbal-numerical equivalence at the mid
Exculpatory high 18.2 (2) 18.9 (5) and high levels of the scale. Similarly, when tested using excul-
Exculpatory low 20.9 (0) 31.0 (4) patory testimony, the only significant difference observed was
Inculpatory low 12.1 (0) 67.2 (26)
between high and low-strength evidenceas one would hope
and not between verbal and numerical expression. Yet these positive indications belie a much more nuanced set of results.

Correspondence Between Intentions and Interpretations

The significant main effects observed in Experiment 1 were modified by a significant interaction between evidence strength and presentation type. This means that participant belief change from before to after hearing the expert's evaluative opinion did not increase in a simple fashion in response to increasing evidence strength. Specifically, decision makers presented with evidence that one hypothesis was 450 times more likely, given an inculpatory rather than exculpatory account, showed the same amount of belief change as was observed among those presented with evidence that was 495,000 times more likely. Furthermore, although these two evidence strengths differ by three orders of magnitude, the mean difference in belief change from moderate strength to high strength was 0.36 units in the numerical condition and 0.53 units in the verbal condition. Similarly, the difference between implicit and provided LRs shows a change in the median odds (in the wrong direction) from moderate to high strength of 0.01 in the numerical condition and 0.02 in the verbal condition.

The observation that the degree of belief change achieved in the moderate compared with high condition was far from commensurate with the actual difference cited in the evidence and intended by the expert suggests that participants were insensitive to the relative weights of the evidence. Further comparison of the Bayesian belief-change distributions and the values assigned by the expert with observed belief change and participant ILRs also reveals a pattern consistent with a conservative application of the probabilistic evidence.

[Figure 5 plot, panels "Exculpatory High Strength" and "Exculpatory Low Strength": y-axis, Belief-Change Units; x-axis, Bayesian and Observed; legend, Presentation Type (Verbal, Numerical).]

Figure 5. The central 80% of the distribution of observed and Bayesian belief change in the exculpatory high strength (left panel; observed belief change magnified inset) and exculpatory low strength (center panel). Each box includes the central 50% of belief-change values, and the solid lines in the boxes mark the medians. Belief change equals stated posterior belief minus stated prior belief.

Table 6
Median Implicit Likelihood Ratio by Presentation Method and Evidence Strength

                                    Median ILR (range)
Evidence condition [LR provided]    Numerical presentation    Verbal presentation
Exculpatory high [495,000]ᵃ         1.5 (0.1–80.0)            1.4 (0.2–56.0)
Exculpatory low [4.5]ᵃ              1.2 (0.4–35.0)            1 (0.2–72.0)
Inculpatory low [4.5]               1.3 (0.3–15.0)            0.6 (0.0–6.0)

Note. ILR = implicit likelihood ratio; LR = likelihood ratio.
ᵃ ILRs were inverted to permit direct comparisons with the provided LRs.

This apparent insensitivity to the intended value of the evidence displayed by participants in Experiment 1 is moderated by the
results from Experiment 2. In particular, in Experiment 2, a significant main effect of evidence strength was observed, such that participants engaged in significantly greater belief change when presented with LRs of 495,000 compared with 4.5. However, a comparison of the observed belief change and ILRs against Bayesian belief change and provided LRs again shows an absence of correspondence.

Overall, then, although the verbal and numerical levels of the scale tested here appear to induce similar levels of belief change in the participants (except for the low-strength verbal inculpatory evidence, which we will address later), there is little correspondence between the meaning intended by the expert and the interpretations made by the participants. This finding is consistent with literature demonstrating that decision makers tend to undervalue probabilistic information (Faigman & Baglioni, 1988; Goodman, 1992; Kaye & Koehler, 1991; Smith et al., 1996, 2011).

Alternatively, this pattern of results may reflect an appropriate weighting of the evidence, given its relevance to the actual guilt of the accused (Schum & Martin, 1982). Specifically, the presence of the defendant's shoeprint at the crime scene is a piece of circumstantial evidence that has the potential to implicate the defendant in the crime but does not directly speak to whether the defendant actually committed larceny or not. Put another way, having been at a crime scene does not mean that you are the perpetrator of the crime. Our participants may have been sensitive to this distinction, and, accordingly, it may be that the low levels of weight attributed to the evidence by the decision makers demonstrate a sophisticated appreciation for the value of circumstantial evidence within the broader case context, rather than a misapplication of the evidence. Although this alternative explanation does not undermine observed between-groups differences or the observed weak evidence effect, further studies in which the expert testifies regarding evidence with direct implications for the guilt of the accused should be conducted to tease apart these alternative accounts and clarify the cause of the apparent undervaluation.

The Weak Evidence Effect

In both studies, the inversion characterizing a type of weak evidence effect (Fernbach et al., 2011) was most frequently observed in low-strength conditions, particularly those in which verbal communication methods were used. These data suggest that verbal rather than numerical methods of expression are more open to this kind of misinterpretation. It is particularly interesting, however, that participants presented with exculpatory rather than inculpatory expert evidence were not affected in the same way. Although a significant interaction between evidence strength and presentation method was observed in Experiment 1, this same interaction was not evident in the exculpatory conditions in Experiment 2. That is, participants presented with weak evidence pointing toward innocence interpreted this as increasing the likelihood of innocence, whereas those presented with weak evidence consistent with guilt also interpreted this as increasing the likelihood of innocence.

From a criminal justice perspective, this pattern of results is encouraging in that it appears to demonstrate that individuals making decisions within a criminal justice context (even if experimentally induced) are sensitive to the asymmetry built into the system. Jurors participating in a trial are explicitly required to give more weight to the rights of the accused than to the State by virtue of the presumption of innocence, a range of procedural rules, and burden of proof standards. Similarly, participants in our experiments were asked to imagine they were a juror in a trial, were made aware of the requirements for a larceny conviction, and were advised to use a reasonable-doubt threshold. Thus, within such a context, it may be appropriate to see that weak prosecution evidence (inculpatory) generally does not strengthen belief in guilt (and, in fact, does the opposite), whereas weak defense evidence (exculpatory) is more likely to be used to strengthen belief in the innocence of the accused.

This pattern of results is, however, more concerning if considered from the perspective of the expert. Expert evaluative opinions, whatever their specific formulation, are presented to fact finders in order to help them to reach an accurate resolution to a dispute in issue (United States v Downing, cited in Cutler & Penrod, 1995, p. 27). Neither the possible undervaluing of the weight of the expert's evidence, nor the incongruent revision of beliefs that we have observed to result from the application of the standards proposed by the AFSP, is consistent with this aim. Accordingly, it would seem appropriate for the AFSP to reconsider at least the use of these verbal equivalents in the presentation of low-strength inculpatory evidence.

Lastly, when considered from a psychological standpoint, the asymmetry of the weak evidence effect observed here may have important theoretical implications. At least three competing mechanisms have been proposed to account for the weak evidence effects observed in various contexts. Stated simply, the neglect account suggests that the effect results from a failure to consider other information (e.g., other circumstantial evidence of guilt) that could also impact belief in a proposition (Fernbach et al., 2011). The averaging explanation contends that the error is a function of averaging the amount of support offered to the proposition by the new evidence with our prior beliefs in that proposition, thereby reaching a midpoint between the two (Lopes, 1987). The expectancy violation mechanism suggests that an incongruent revision occurs when we compare the actual support offered to the proposition by the new evidence against the amount of support we expected the new evidence would provide. When the actual support offered is smaller than what we expected, we revise down our original belief, despite being given more evidence to support it (McKenzie et al., 2002).

Although each of these theoretical accounts has found support in the literature, only the expectancy violation explanation can account for the results we report here. The fact that weak evidence effects were more prominent in the incriminating rather than the exculpatory conditions cannot be explained by neglect, as the external information available (to be neglected) in both instances was the same. Moreover, given that the inculpatory and exculpatory evidence also provided the same amount of support ("weak or limited") for the proposition (although the order of these propositions differed across conditions), the data would only support the averaging explanation if (a) most people in the exculpatory condition began with prior beliefs lower than the amount of support eventually offered by the expert (thereby preventing the midpoint between the two being lower than their priors), and (b) most in the inculpatory condition began with prior beliefs higher than the amount of support offered by the expert (resulting in a midpoint between the two being lower than their priors). To test the validity of this explanation, a comparison of the mean prior beliefs in the
verbal low-strength inculpatory and exculpatory conditions was conducted and revealed no significant difference (M_inculp = 2.21, M_exculp = 3.19, t = 0.58, df = 136, p = .571, two-tailed), indicating the averaging account is not supported here.

Only the expectancy violation explanation allows for the different expectations one might have regarding exculpatory and inculpatory evidence to affect the belief revision process. Specifically, it is reasonable to believe that decision makers in a forensic context might expect the prosecution to introduce highly incriminating evidence, if they are to secure a conviction. Thus, when presented with evidence offering only "weak or limited" support for an inculpatory proposition, decision makers revise, in a downward direction, their belief in the guilt of the accused, even though they have been presented with additional information supporting the hypothesis, in essence saying, "If this is the best the prosecution can offer, then I am not persuaded the defendant is guilty at all." Conversely, decision makers may have lower expectations regarding exculpatory evidence. In fact, they may properly have no expectations at all, given that they are required to presume the defendant is innocent and that the burden of proof is borne by the prosecution (i.e., the defendant has no obligation to mount a positive defense). Thus, even weak exculpatory evidence has the potential to surpass the decision maker's expectation, removing any need to revise down their belief in the innocence of the accused in the face of weak evidence.

Limitations and Future Directions

We did not target, select, or analyze only jury-eligible respondents in these experiments. We would argue that an emphasis on jury-eligible respondents was unnecessary, given that we were primarily interested in the impact of various types of expert evidence on decision making. As we have no basis for believing that the individuals in our studies would differ systematically or significantly from a jury-eligible sample in their response to expert evidence, we stand by our approach.

A second issue worthy of consideration is the recruitment of participants via the online marketplace Mechanical Turk. It is possible that individuals completing our study online may have been less diligent or engaged in the task compared with individuals completing the task under the supervision of the experimenter, and this may threaten the validity of the results. Various steps were taken in these experiments to ensure this was not the case. First, in both experiments, we ran a validation sample of undergraduates alongside those completing online. The patterns of results produced by our student and online samples were the same. Second, we set rigorous predefined exclusion criteria using a recommended catch test in addition to a speed-based performance measure, leading to the exclusion of participants whose data was of questionable quality. Third, Experiment 2 involved the repetition and clear replication of two conditions from Experiment 1. As a result, we are confident that the data obtained online speaks to real-world decision-making performance on this task.

In conclusion, it is clear from this research that decision makers vary widely in their responses to uncertain forensic science evidence, revising their beliefs in vastly different ways than those predicted by Bayesian calculations. What remains unclear, however, is what mechanism(s) account for the observed discrepancies and therefore how they can be minimized. Further research examining central, rather than peripheral, evidence, as well as competing accounts for over and undervaluing of evidence, is necessary to clarify our understanding of uncertain evidence and to inform its communication.

References

Aitken, C., Berger, C. E. H., Buckleton, J. S., Champod, C., Curran, J., Dawid, A., . . . Jackson G. (2011). Expressing evaluative opinions: A position statement. Science & Justice, 51, 1–2. doi:10.1016/j.scijus.2011.01.002

Association of Forensic Science Providers. (2009). Standards for the formulation of evaluative forensic science expert opinion. Science & Justice, 49, 161–164. doi:10.1016/j.scijus.2009.07.004

Barbato, G., Barini, E., Genta, G., & Levi, R. (2011). Features and performance of some outlier detection methods. Journal of Applied Statistics, 38, 2133–2149. doi:10.1080/02664763.2010.545119

Berger, C. E. H. (2010). Criminalistics is reasoning backwards. Nederlands Juistenblad, 85, 784–789.

Brun, W., & Teigen, K. H. (1988). Verbal probabilities: Ambiguous, context-dependent, or both? Organizational Behavior and Human Decision Processes, 41, 390–404. doi:10.1016/0749-5978(88)90036-2

Budescu, D. V., Broomell, S., & Por, H. H. (2009). Improving communication of uncertainty in the reports of the Intergovernmental Panel on Climate Change. Psychological Science, 20, 299–308. doi:10.1111/j.1467-9280.2009.02284.x

Budescu, D. V., Por, H. H., & Broomell, S. (2012). Effective communicating of uncertainty in the IPCC reports. Climatic Change, 113, 181–200. doi:10.1007/s10584-011-0330-3

Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon's Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5. doi:10.1177/1745691610393980

Cutler, B. L., & Penrod, S. D. (1995). Mistaken identification: The eyewitness, psychology, and the law. Cambridge, UK: Cambridge University Press.

de Keijser, J., & Elffers, H. (2012). Understanding of forensic expert reports by judges, defense lawyers and forensic professionals. Psychology, Crime & Law, 18, 191–207. doi:10.1080/10683161003736744

Fagerlin, A., Zikmund-Fisher, B. J., Ubel, P. A., Jankovic, A., Derry, H. A., & Smith, D. M. (2007). Measuring numeracy without a math test: Development of the subjective numeracy scale. Medical Decision Making, 27, 672–680. doi:10.1177/0272989X07304449

Faigman, D. L., & Baglioni, A. (1988). Bayes' theorem in the trial process. Law and Human Behavior, 12, 1–17. doi:10.1007/BF01064271

Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). When good evidence goes bad: The weak evidence effect in judgment and decision-making. Cognition, 119, 459–467. doi:10.1016/j.cognition.2011.01.013

Gigerenzer, G., & Edwards, A. (2003). Simple tools for understanding risks: From innumeracy to insight. British Medical Journal, 327, 741–744. doi:10.1136/bmj.327.7417.741

Goodman, J. (1992). Jurors' comprehension and assessment of probabilistic evidence. American Journal of Trial Advocacy, 16, 361.

Kaye, D. H., & Koehler, J. J. (1991). Can Jurors Understand Probabilistic Evidence? Journal of the Royal Statistical Society, Series A, 75–81. Retrieved from: http://www.jstor.org/stable/2982696

Koehler, J. J. (1996). On conveying the probative value of DNA evidence: Frequencies, likelihood ratios, and error rates. University of Colorado Law Review, 67, 859–886.

Koehler, J. J., & Saks, M. J. (2010). Individualization claims in forensic science: Still unwarranted (June 1, 2010). Brooklyn Law Review, 75, 1187.

Lopes, L. L. (1987). Procedural debiasing. Acta Psychologica, 64, 167–185. doi:10.1016/0001-6918(87)90005-9

McKenzie, C. R. M., Lee, S. M., & Chen, K. K. (2002). When negative evidence increases confidence: Change in belief after hearing two sides of a dispute. Journal of Behavioral Decision Making, 15, 1–18. doi:10.1002/bdm.400

McQuiston-Surrett, D., & Saks, M. J. (2008). Communicating opinion evidence in the forensic identification sciences: Accuracy and impact. Hastings Law Journal, 59, 1159–1190.

Morrison, G. S. (2011). Measuring the validity and reliability of forensic likelihood-ratio systems. Science & Justice, 51, 91–98. doi:10.1016/j.scijus.2011.03.002

Nance, D. A., & Morris, S. B. (2005). Juror understanding of DNA evidence: An empirical assessment of presentation formats for trace evidence with a relatively small random match probability. The Journal of Legal Studies, 34, 395–444.

National Academies of Science. (2009). Strengthening forensic science in the United States: A path forward. Washington, D.C.: The National Academies Press.

Neumann, C., Evett, I. W., & Skerret, J. (2012). Quantifying the weight of evidence from a forensic fingerprint comparison: A new paradigm. Journal of the Royal Statistical Society Series A (Statistics in Society), 175, 371–415. doi:10.1111/j.1467-985X.2011.01027.x

Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5, 411–419.

Petty, R. E., & Cacioppo, J. T. (1996). Attitudes and persuasion: Classic and contemporary approaches. Boulder, CO: Westview Press.

R v T, EWCA Crim 2439 (Court of Appeal, Criminal Division, 2010).

Schum, D. A., & Martin, A. W. (1982). Formal and empirical research on cascaded inference in jurisprudence. Law and Society Review, 17, 105–151.

Smith, B. C., Penrod, S. D., Otto, A. L., & Park, R. C. (1996). Jurors' use of probabilistic evidence. Law and Human Behavior, 20, 49–82. doi:10.1007/BF01499132

Smith, L. L., Bull, R., & Holliday, R. (2011). Understanding juror perceptions of forensic evidence: Investigating the impact of case context on perceptions of forensic evidence strength. Journal of Forensic Sciences, 56, 409–414. doi:10.1111/j.1556-4029.2010.01671.x

Stoel, R. (2012). A practitioner's view on the application of the likelihood ratio approach in cartridge case and bullet comparison. In the 21st International Symposium on the Forensic Sciences: Convicts to criminalistics. Retrieved from http://events.cdesign.com.au/ei/viewpdf.esp?id314&file//srv3/events/eventwin/docs/pdf/anzfss2012abstract00374.pdf

Thompson, W. C. (1989). Are juries competent to evaluate statistical evidence? Law and Contemporary Problems, 52, 9–41.

Thompson, W. C., & Schumann, E. L. (1987). Interpretation of statistical evidence in criminal trials: The prosecutor's fallacy and the defense attorney's fallacy. Law and Human Behavior, 11, 167–187. doi:10.1007/BF01044641

Thornton, J., & Peterson, J. (2007). The general assumptions and rationale of forensic identification. In D. L. Faigman, D. Kaye, M. J. Saks, J. Sanders, & E. Cheng (Eds.), Modern scientific evidence: The law and science of expert testimony (pp. 157–222). St Paul, MN: Thomson West.

Wallsten, T. S., & Budescu, D. V. (1995). A review of human linguistic probability processing: General principles and empirical evidence. Knowledge Engineering Review, 10, 43–62. doi:10.1017/S0269888900007256

Received September 17, 2012
Revision received February 3, 2013
Accepted February 4, 2013
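
Illustrative note. The Bayesian benchmark and the implicit likelihood ratios (ILRs) referred to in the Results and in Table 6 can be illustrated with a minimal sketch. It assumes, purely for illustration, that prior and posterior beliefs are expressed as probabilities of guilt strictly between 0 and 1; the response scale actually used in the experiments is not described in this part of the article. The function names are hypothetical and are not taken from the authors' materials. Under that assumption, the Bayesian posterior multiplies the prior odds by the provided LR, and an ILR is taken, as it is commonly defined, to be the ratio of stated posterior odds to stated prior odds.

```python
def odds(p: float) -> float:
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def bayesian_posterior(prior: float, lr: float) -> float:
    """Posterior probability implied by Bayes' rule in odds form.

    posterior odds = prior odds * likelihood ratio
    (assumption: beliefs are probabilities of guilt).
    """
    posterior_odds = odds(prior) * lr
    return posterior_odds / (1.0 + posterior_odds)


def implicit_lr(prior: float, posterior: float) -> float:
    """Implicit likelihood ratio: stated posterior odds / stated prior odds."""
    return odds(posterior) / odds(prior)


if __name__ == "__main__":
    prior = 0.5  # hypothetical stated prior belief in guilt

    # Belief a Bayesian updater would hold after the low-strength LR of 4.5.
    print(bayesian_posterior(prior, 4.5))

    # Hypothetical participant who moves from 0.5 to 0.6: the implied LR
    # falls well short of the provided value of 4.5 (a conservative update).
    print(implicit_lr(prior, 0.6))

    # Hypothetical participant given exculpatory evidence who moves from
    # 0.5 to 0.4: inverting the ILR puts it on the same footing as the
    # provided LR, as in the note to Table 6.
    print(1.0 / implicit_lr(prior, 0.4))
```

An ILR below 1 for inculpatory evidence (or an inverted ILR below 1 for exculpatory evidence) corresponds to belief revision in the direction opposite to that intended by the expert, which is the pattern described as the weak evidence effect.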
