Evaluating the Predictive Accuracy of Six Risk Assessment Instruments for Adult Sex
Offenders

Article  in  Criminal Justice and Behavior · August 2001


DOI: 10.1177/009385480102800406


All content following this page was uploaded by Michael C Seto on 17 November 2015.


CRIMINAL JUSTICE AND BEHAVIOR
Barbaree et al. / SEX OFFENDER RISK ASSESSMENT

EVALUATING THE PREDICTIVE ACCURACY OF SIX RISK
ASSESSMENT INSTRUMENTS FOR ADULT SEX OFFENDERS

HOWARD E. BARBAREE
MICHAEL C. SETO
CALVIN M. LANGTON
Centre for Addiction and Mental Health and University of Toronto
EDWARD J. PEACOCK
Correctional Service of Canada

Five actuarial instruments and one guided clinical instrument designed to assess risk for recidi-
vism were compared on 215 sex offenders released from prison for an average of 4.5 years. The
Violence Risk Appraisal Guide, Sex Offender Risk Appraisal Guide, Rapid Risk Assessment of
Sexual Offense Recidivism, and Static-99 predicted general recidivism, serious (violent and sex-
ual) recidivism, and sexual recidivism. The Minnesota Sex Offender Screening Tool-Revised
and a guided clinical assessment (Multifactorial Assessment of Sex Offender Risk for Recidi-
vism) predicted general recidivism but did not significantly predict serious or sexual recidivism.
On its own, the Psychopathy Checklist-Revised predicted general and serious recidivism but not
sexual recidivism. The results support the utility of an actuarial approach to risk assessment of
sex offenders.

AUTHORS’ NOTE: We would like to thank Karl Hanson, Martin Lalumière, and
Vernon Quinsey for their helpful comments on an earlier version of this manuscript.
We would also like to express our gratitude to our research assistants Michelle
Adams, Leigh Harkins, Alexandra Maric, and Jennifer McCormick. Finally, we want
to recognize the staff at the Warkworth Sexual Behaviour Clinic, the administration of
Warkworth Institution, and the staff at the National Parole Board office in Kingston,
Ontario, for their help and support. Support for this study was provided by a research
contract with the Department of the Solicitor General of Canada (9414-CL/525),
awarded to the first author, and two research contracts with the Correctional Service
of Canada (21150-6-7605 and 21150-7-6614), awarded to the first and second
authors. The views expressed are those of the authors and do not necessarily reflect
the views of the Department of the Solicitor General of Canada or the Correctional
Service of Canada. Please address correspondence to Howard E. Barbaree or
Michael C. Seto, Law and Mental Health Program, Centre for Addiction and Mental
Health, 1001 Queen Street, Toronto, Ontario, Canada M6J 1H4; e-mail:
Howard_Barbaree@camh.net or Michael_Seto@camh.net.

CRIMINAL JUSTICE AND BEHAVIOR, Vol. 28 No. 4, August 2001 490-521
© 2001 American Association for Correctional Psychology

The assessment of risk to reoffend is an important task for clinicians
working with sex offenders. These assessments aid in decisions made
about these individuals, including sentencing, institutional placement,
treatment planning, recommendations with regard to parole, and the
restrictiveness of conditions attached to supervision in the community.
The enactment of “sexual predator” laws with regard to the long-term
incapacitation of high-risk sex offenders after serving their criminal
sentences will likely increase the impact of and need for valid sex
offender risk assessments (see Lieb & Matson, 1998).
Knowledge about risk assessment for offenders in general has
advanced greatly in the past 20 years. Studies have consistently found
that variables such as offender age, number of previous convictions,
pro-criminal attitudes and associations, and measures of antisocial
personality predict reoffending. As Seto and Lalumière (1999) and
others have noted, the validity of these predictors is not restricted to
specific groups of offenders. Recent meta-analytic reviews have
found that these variables reliably predict general recidivism among
juvenile delinquents (Lipsey & Derzon, 1998), adult sex offenders
(Hanson & Bussiére, 1998), adult offenders in general (Gendreau,
Little, & Goggin, 1996), and mentally disordered offenders (Bonta,
Law, & Hanson, 1998).
Reflecting the increasing demand for sex offender risk assessments,
the number of available instruments has proliferated in recent
years. Many of these have not yet been empirically evaluated. Of those
with at least some evidence of predictive validity, the most promising
appear to be the Violence Risk Appraisal Guide (VRAG) (Harris,
Rice, & Quinsey, 1993), Sex Offender Risk Appraisal Guide
(SORAG) (Quinsey, Harris, Rice, & Cormier, 1998), Rapid Risk
Assessment of Sexual Offense Recidivism (RRASOR) (Hanson,
1997), Static-99 (Hanson & Thornton, 1999), and the Minnesota Sex
Offender Screening Tool-Revised (MnSOST-R) (Epperson, Kaul, &
Hesselton, 1998). These five actuarial instruments are objectively
scored and provide probabilistic estimates of risk based on the
empirical relationships between their combination of items and the
outcome of interest. The probabilistic estimates indicate the percentage
of people with the same score who would be expected to reoffend within a
defined period of opportunity. It is not surprising that the instruments
have similar item content; in fact, the SORAG is a modification of the
VRAG, and the Static-99 includes all four RRASOR items.
The VRAG accurately predicts violent recidivism among male
offenders, including sex offenders (Harris et al., 1993; Rice & Harris,
1995, 1997; reviewed in Quinsey et al., 1998). The operational defini-
tion of violent recidivism for the VRAG includes all offenses against
persons, including assault, armed robbery, sexual offenses that
involve physical contact with the victim, homicide, and attempted
homicide. It does not include offenses such as uttering threats or sex-
ual offenses that do not involve contact, such as possession of child
pornography or indecent exposure. Hare’s (1991) Psychopathy
Checklist-Revised (PCL-R), a reliable and valid measure of psychop-
athy among male forensic patients and correctional inmates, is an
important item in this instrument. Rice and Harris (1997) reported that
VRAG scores correlated .44 with violent recidivism and .17 specifi-
cally with sexual recidivism in a sample of 288 sex offenders followed
for an average of 10 years; the corresponding areas (AUC1) under the
Receiver Operating Characteristic (ROC) curve were .76 and .60,
respectively. The VRAG was developed using a sample of predomi-
nantly mentally disordered offenders, but its utility has since been
demonstrated in correctional samples of offenders (Kroner & Mills,
1997; Loza & Dhaliwal, 1997).
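As an illustration of what the AUC values reported above express, the area under the ROC curve can be read as the probability that a randomly chosen recidivist has a higher score than a randomly chosen nonrecidivist, with ties counted as one half. The following Python sketch computes it by direct pairwise comparison. The function name and the example data are hypothetical and are not taken from any of the studies cited.

```python
def auc(scores, outcomes):
    """Area under the ROC curve by pairwise comparison.

    scores   -- instrument scores or ordinal risk categories
    outcomes -- 1 (or True) for recidivists, 0 (or False) otherwise
    """
    recidivists = [s for s, y in zip(scores, outcomes) if y]
    nonrecidivists = [s for s, y in zip(scores, outcomes) if not y]
    if not recidivists or not nonrecidivists:
        raise ValueError("need at least one recidivist and one nonrecidivist")

    wins = 0.0
    for r in recidivists:
        for n in nonrecidivists:
            if r > n:
                wins += 1.0   # recidivist ranked higher: concordant pair
            elif r == n:
                wins += 0.5   # tie counts as half
    return wins / (len(recidivists) * len(nonrecidivists))


# Hypothetical risk categories and outcomes for eight offenders.
categories = [2, 3, 3, 4, 5, 5, 6, 7]
reoffended = [0, 0, 1, 0, 0, 1, 1, 1]
example_auc = auc(categories, reoffended)
```

A value of .50 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups, which is why the index is less sensitive to the base rate of recidivism than a correlation coefficient.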
The SORAG is a modification of the VRAG, developed specifi-
cally to predict violent recidivism (which includes sexual offenses
involving physical contact with the victim) among male sex offenders.
The same procedure used in developing the VRAG was followed to
identify and weigh items; 10 of its 14 items are the same as items in the
VRAG. Rice and Harris (1997) reported that a scale composed of
items closely resembling the SORAG performed as well as the VRAG
in predicting violent recidivism among sex offenders, and Bélanger
and Earls (1996) found that the SORAG had an AUC of .82 with an
outcome of parole failure or recidivism of any kind in a sample of 57
Canadian sex offenders released from prison. Rice and Harris (1999),
in a cross-validation sample of sex offenders released in Ontario,
found that the VRAG and SORAG were highly correlated with each
other, and both significantly predicted violent and sexual recidivism.
Assigning raw scores to four instead of nine risk categories, Firestone,
Bradford, Greenberg, Nunes, and Broom (2001) found that the
SORAG was significantly associated with violent (including sexual)
recidivism in a sample of 558 Canadian sex offenders with an average
follow-up time of slightly more than 7 years; however, the AUC of .63
was lower than the values reported in other studies.
In their meta-analytic review of 61 data sets representing more than
20,000 sex offenders, Hanson and Bussiére (1998) found that indica-
tors of deviant sexual interests—number of prior sexual offenses,
phallometrically measured sexual arousal to children—consistently
predicted sexual recidivism (both contact and noncontact offenses).
Drawing from these results, Hanson (1997) selected variables with a
minimum correlation of .10 with sexual recidivism and developed a
brief actuarial scale with four items, representing the best independent
predictors of sexual reoffending. Across seven development samples,
comprising a total of 2,592 sex offenders, Hanson found that
RRASOR scores had an average correlation of .27 with sexual recidi-
vism, with an average AUC of .71, ranging from .62 to .77. The predic-
tive validity of the RRASOR for sexual recidivism has been replicated
in a number of as yet unpublished studies (Haynes, Yates,
Nicholaichuk, Gu, & Bolton, 2000; Smiley, Hills, & McHattie, 2000).
In a large cross-validation sample of 1,400 sex offenders followed for
an average of almost 4 years in Sweden, Sjöstedt and Långström
(2000) found that the RRASOR had a correlation of .22 and an AUC of
.72 with sexual recidivism.
The Static-99 incorporates the items from the RRASOR as well as
items from the Structured Anchored Clinical Judgement scale devel-
oped in the United Kingdom (Hanson & Thornton, 1999, 2000).
Hanson and Thornton reported that scores on the Static-99 correlated
.33 with sexual recidivism and .32 with sexual or violent recidivism in
a combined sample of 1,208 sex offenders, drawn from four sources;
the corresponding AUCs were .71 and .69, respectively. Three of the
four samples, representing slightly more than half of the combined
sample, were used in the development of the RRASOR; however, the
Static-99 also significantly predicted sexual or violent recidivism in
the fourth sample of men released from Her Majesty’s Prison Service
in 1979 (AUCs of .72 for sexual recidivism and .69 for sexual or vio-
lent recidivism). Beech, Beckett, and Fisher (2000) found that the
Static-99 had an AUC of .73 in predicting sexual recidivism among a
sample of 53 treated sex offenders, with an average follow-up time of
6 years. Firestone et al. (2001) found that the Static-99, using four risk
categories, significantly predicted sexual recidivism, with an AUC of
.70, and sexual or violent recidivism, with an AUC of .69. Sjöstedt and
Långström (2000) reported AUCs of .76 for sexual recidivism and .74
for nonsexually violent recidivism using the Static-99 in their large
sample of Swedish sex offenders.
Many of the actuarial risk assessment instrument studies cited here
are not yet published in peer-reviewed journals. Replication of their
results and a direct comparison of these risk assessment instruments
would be of great practical interest to clinicians working with sex
offenders. Dempster (1998) found that scores on the VRAG, SORAG,
and RRASOR distinguished nonsexually violent or sexual recidivists
from nonrecidivists in a retrospective comparison. Firestone et al.
(2001) found no difference between the SORAG and Static-99 in the
prediction of violent and/or sexual recidivism. Also, the actuarial
instruments described above do not include treatment-related infor-
mation, and an interesting and important research question is whether
treatment-related information can independently contribute to risk
prediction (see Seto & Barbaree, 1999).
The MnSOST-R is an actuarial instrument that incorporates both
historical and institutional information. There is overlap between the
historical variables assessed by the MnSOST-R and the other actuarial
instruments, but the institutional items are unique because two of the
items refer to treatment participation while incarcerated. Like the
RRASOR and Static-99, the MnSOST-R was designed to predict sex-
ual recidivism (rather than the broader outcome of violent recidivism
used for the VRAG and SORAG). Epperson et al. (1998) reported that
scores on the MnSOST-R correlated .45 with sexual recidivism in a
combined sample of rapists and extrafamilial child molesters fol-
lowed for a minimum of 6 years; the corresponding AUC was .77. The
MnSOST-R was developed using a sample of 274 sex offenders (8
without complete data). In a cross-validation sample of 220 sex
offenders, the MnSOST-R correlated .35 with sexual reoffense and
had an AUC of .73 after a minimum follow-up time of 6 years
(Epperson et al., 2000).
A clinical instrument, the Multifactorial Assessment of Sex
Offender Risk for Recidivism (MASORR), was developed by the first
author, Howard E. Barbaree, in 1989 for a prison-based sex offender
treatment program, prior to the availability of actuarial risk assess-
ment instruments for sex offenders. The MASORR has not been
empirically evaluated prior to this study. According to the terminol-
ogy suggested by Hanson (1998), the MASORR would be described
as a guided clinical approach to risk assessment (in contrast to the
actuarial instruments described earlier). The clinical MASORR rat-
ings were based on variables that were empirically related to sex
offender recidivism according to the research literature available at
the time: offense history, antisocial personality (essentially psychopa-
thy, because ratings were based on PCL-R scores), deviant sexual
interests (based on phallometric test results), and a rating of social
competence (based on estimated intellectual functioning, marital sta-
tus, and employment). These individual ratings were subjectively
combined to provide a global rating of pretreatment risk. After partici-
pating in the treatment program, an individual’s pretreatment rating of
risk was subjectively combined with ratings of motivation for treat-
ment, degree of behavior change achieved, and clinical impression of
risk to provide a global rating of posttreatment risk. The MASORR
was therefore designed to provide a pretreatment rating of risk based
on empirically derived variables, and then to adjust this global rating
based on treatment participation and performance. Because many sex
offender treatment programs have used or continue to use guided clin-
ical risk assessment instruments, the MASORR was included in this
evaluation as a reasonable example of this approach.
The present study was conducted to evaluate the performances of
the VRAG, SORAG, RRASOR, Static-99, MnSOST-R, and
MASORR in predicting recidivism in a sample of adult male sex
offenders who participated in treatment while in prison and were
eventually released to the community. To our knowledge, this is the
first cross-validation study directly comparing the VRAG, SORAG,
RRASOR, Static-99, and MnSOST-R. Given previous results, we
expected that the VRAG and SORAG would accurately predict vio-
lent recidivism, whereas the RRASOR, Static-99, and MnSOST-R
would accurately predict sexual recidivism. Because this is the first
empirical investigation of the MASORR, we did not make a specific
prediction about its performance but did note that it was designed to
predict sexual recidivism.
The present study also examined the predictive ability of the PCL-
R (Hare, 1991). In the development of the PCL-R, Hare was specifi-
cally interested in the construction of an instrument to quantify psy-
chopathic personality traits, not an instrument to predict reoffense.
Nevertheless, published research has reported that the PCL-R does a
reasonable job in predicting any recidivism (Hart, Kropp, & Hare,
1988) and violent recidivism (Rice, Harris, & Quinsey, 1990). The
PCL-R is an important component of the VRAG and SORAG, and its
content items are not unrelated or dissimilar to items on the Static-99.
Therefore, we included the PCL-R in the present study for comparison
purposes.

METHOD

PARTICIPANTS

The sample consisted of 215 adult sex offenders who were assessed
at the Warkworth Sexual Behaviour Clinic (WSBC), a prison-based
sex offender treatment program, between June 1989 and June 1996.
Selection of sex offenders for inclusion in the WSBC program did not
involve any specific selection or exclusionary criteria. All sex offenders
housed at Warkworth Penitentiary were eligible for treatment at
the WSBC and were actively encouraged to participate by their case
managers and treatment staff. All offenders who sought treatment
were admitted to the program on a priority basis depending on their
projected release date, with earlier release leading to earlier admis-
sion. Of course, offenders who did not consent to participation were
not included in the program and would not be represented in the pres-
ent study. The present sample represents all the offenders seen at the
WSBC who had relatively complete information in their files for cod-
ing the risk assessment instruments and who were released from
prison and at risk for offending during the follow-up period. Approxi-
mately half the sample had index sexual offenses involving females
age 14 or older, and the remainder had index sexual offenses involving
female or male children younger than 14 years old. Approximately
half of the offenders against children committed offenses against bio-
logically or legally related children, and the remainder committed
their offenses against at least one unrelated child. Many of these
offenders (n = 153) were in a follow-up study recently reported by
Seto and Barbaree (1999).
The average age of the sample was 37.6 years (SD = 10.6, ranging
from 21 to 68). Participants had an average grade 9 level of education
(SD = 2.4, ranging from grade 1 to high school graduation) and had an
average Blishen Index score of 33.4 (SD = 8.3). The Blishen Index is
based on the individual’s most recent occupation and takes into
account the median education and income for major occupational cat-
egories drawn from Canadian census data; examples of occupations
with Blishen scores in the low 30s include construction tradesperson
or truck driver (Blishen, Carroll, & Moore, 1987). Data on socioeco-
nomic status were available for only 140 participants because many
were unemployed, attending school, or retired at the time of their
index offense. Thirty-two percent of the sample were married or in
common-law relationships at the time of their involvement with the
Warkworth program, 43% were separated, divorced, or widowed, and
25% had never been married.

MEASURES

PCL-R. Based on semi-structured interviews and review of file
information, participants were assigned ratings of 0 (absent), 1 (some
indication), or 2 (present) on each of the 20 PCL-R items, tapping
characteristics such as impulsivity, irresponsibility, and callousness.
Scale scores are obtained by summing the items, for a total possible
score of 40. Factor analyses have consistently found two factors in the
PCL-R (Hare, 1991). Eight items load above .40 on Factor 1, describ-
ing a pattern of “selfish, callous, and remorseless use of others.” Nine
items load above .40 on Factor 2, describing a “chronically unstable,
antisocial, and socially deviant lifestyle,” with only two items specifi-
cally pertaining to criminal behavior.
The conventional cutoff for making a diagnosis of psychopathy is
30. The total score can also be interpreted as reflecting the probability
that an individual is a psychopath (Harris, Rice, & Quinsey, 1994).
The PCL-R was scored from interview and file information by pre-
dominantly master’s-level clinicians as part of the intake assessment
at the WSBC. The PCL-R allows up to 5 missing items; scores were
prorated accordingly.
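The text states only that scores with missing items were prorated; it does not give the rule. A minimal Python sketch of one common approach, linear scaling of the summed valid items up to the full 20-item scale, is shown below. The scaling rule, the function name, and the example ratings are assumptions made for illustration, not the procedure documented for this study.

```python
def prorate_pclr(item_ratings, max_missing=5):
    """Return a prorated PCL-R total on the 0-40 scale.

    item_ratings -- list of 20 ratings, each 0, 1, 2, or None if missing.
    Assumption: missing items are replaced, in effect, by the mean of the
    rated items (linear scaling); the source does not specify the rule.
    """
    if len(item_ratings) != 20:
        raise ValueError("the PCL-R has 20 items")
    valid = [r for r in item_ratings if r is not None]
    if any(r not in (0, 1, 2) for r in valid):
        raise ValueError("ratings must be 0, 1, or 2")
    if 20 - len(valid) > max_missing:
        raise ValueError("too many missing items to prorate")
    return sum(valid) * 20 / len(valid)


# Hypothetical case: 18 items rated 1, two items missing.
example_total = prorate_pclr([1] * 18 + [None, None])
```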
Descriptive statistics for the PCL-R are shown in Table 1. To serve
as a comparison for the risk assessment instruments evaluated in this
study, the PCL-R was also submitted to the ROC analysis. Participants
were assigned to one of eight risk categories based on their scores,
divided into equal intervals (1 = 0 to 5, 2 = 6 to 10, 3 = 11 to 15, and
so on).
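The equal-interval assignment described above can be sketched as a small Python function. The handling of non-integer (prorated) totals is an assumption: here any total above an interval's upper bound falls into the next category, which the text does not confirm.

```python
import math


def pclr_risk_category(total):
    """Map a PCL-R total (0-40) to one of eight categories.

    1 = 0 to 5, 2 = 6 to 10, 3 = 11 to 15, ..., 8 = 36 to 40.
    Assumption: a non-integer total such as 15.3 is placed in the
    next-higher interval (category 4).
    """
    if not 0 <= total <= 40:
        raise ValueError("PCL-R totals range from 0 to 40")
    # ceil(total / 5) gives 1 for 1-5, 2 for 6-10, and so on;
    # max() places a total of 0 in the first category as well.
    return max(1, math.ceil(total / 5))


# Hypothetical totals mapped to their categories.
example_categories = [pclr_risk_category(t) for t in (0, 5, 6, 16, 30, 40)]
```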

VRAG. The VRAG contains 12 items: Living with both biological
parents until age 16, elementary school maladjustment, history of
alcohol problems, marital status, nonviolent offense history, failure on
prior conditional release, age at index offense, index victim injury, sex
of index victim, meeting DSM-III criteria for any personality disorder,
meeting DSM-III criteria for schizophrenia, and PCL-R score
(Quinsey et al., 1998). Using a method adapted from Nuffield (1982),
their scores (weights) are based on the empirical relationship between
the predictor and violent recidivism in the large development sample;
each “point” represents an increment or decrement of 5% from the
base rate of violent recidivism in that sample (31% after an average
follow-up time of approximately 7 years). Looking at the item weights
that are used, it is clear that the PCL-R is an important part of the
instrument, with the single biggest influence on the overall score.
Total VRAG scores can range from –26 to +38. Individuals are
assigned to one of nine risk categories based on their scores; ordinal
position among these risk categories, ranging from 1 (lowest) to
9 (highest), was submitted to the ROC analyses.
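The weighting rule described above, in which each point corresponds to a 5% difference from the development-sample base rate, can be illustrated with a short Python sketch. The 31% base rate is the figure given in the text; the subgroup recidivism rates, the function name, and the rounding rule (nearest whole point, halves away from zero) are hypothetical and serve only to show the arithmetic.

```python
import math


def nuffield_weight(subgroup_pct, base_pct, step_pct=5.0):
    """Item weight as the number of 5-point steps from the base rate.

    subgroup_pct -- recidivism rate (%) among offenders with a given
                    item response
    base_pct     -- recidivism rate (%) in the whole development sample
    Assumption: the difference is rounded to the nearest whole point.
    """
    steps = (subgroup_pct - base_pct) / step_pct
    magnitude = math.floor(abs(steps) + 0.5)   # round half away from zero
    return magnitude if steps >= 0 else -magnitude


BASE_RATE = 31  # percent, VRAG development sample as reported in the text

# Hypothetical subgroup rates for three responses to one item.
example_weights = [nuffield_weight(rate, BASE_RATE) for rate in (46, 33, 21)]
```

Under these assumptions, a response associated with 46% recidivism receives a weight of +3, one associated with 33% receives 0, and one associated with 21% receives -2; the total score is the sum of such weights across items.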

SORAG. As mentioned above, the SORAG is a modification of the
VRAG, with 10 common items, and was developed using a similar
method. The SORAG has 14 items: living with both biological parents
until age 16, elementary school maladjustment, history of alcohol
problems, marital status, nonviolent offense history, violent offense
history, sexual offense history, sex and age of index victim, failure on
prior conditional release, age at index offense, meeting DSM-III
(American Psychiatric Association, 1980) criteria for any personality
disorder, meeting DSM-III criteria for schizophrenia, phallometrically
measured deviant sexual interests, and PCL-R score (Quinsey
et al., 1998). Total SORAG scores can range from –27 to +51. Individuals
are assigned to one of nine risk categories based on their scores;
ordinal position among these risk categories, ranging from 1 (lowest)
to 9 (highest), was submitted to the ROC analyses.

TABLE 1: Distribution of Scores on the Psychopathy Checklist-Revised and the Risk Assessment Instruments

                                                                               MASORR
          PCL-R     VRAG         SORAG        RRASOR   Static-99  MnSOST-R     Pretreatment  Posttreatment

N         212       215          215          215      215        150          215           169
M         16.1      .81          6.18         1.62     3.24       3.43         3.72          3.37
SD        7         8.31         12.15        1.3      2.09       5.26         1.15          1.11
Range     3 to 35   –16 to +20   –20 to +38   0 to 5   0 to 9     –10 to +17   1 to 5        1 to 5
IRR       .81       .9           .92          .94      .9         .8           .31           .49

Number of participants at each score or risk category:
0         -         -            -            41       26         -            -             -
1         7         0            27           76       21         5            9             8
2         43        4            19           52       33         31           26            27
3         58        32           30           24       43         46           48            62
4         48        65           47           14       35         34           66            39
5         31        60           28           8        24         27           66            33
6         17        37           27           0        33 (6+)    7            -             -
7         8         17           25           -        -          -            -             -
8         0         0            10           -        -          -            -             -
9         -         0            2            -        -          -            -             -

NOTE: PCL-R = Psychopathy Checklist-Revised; VRAG = Violence Risk Appraisal Guide; SORAG = Sex Offender Risk Appraisal Guide;
RRASOR = Rapid Risk Assessment for Sexual Offense Recidivism; MnSOST-R = Minnesota Sex Offender Screening Tool-Revised; MASORR =
Multifactorial Assessment of Sex Offender Risk for Recidivism. PCL-R, VRAG, SORAG, and MnSOST-R scores are reported in terms of risk
categories. IRR = Interrater reliability, calculated as Pearson correlations between total scores. A hyphen indicates that the score or
category does not exist for that instrument; for the Static-99, scores of 6 or more are combined (6+).

RRASOR. The RRASOR has four items: Number of prior charges
or convictions for sexual offenses; age upon release from prison or
anticipated opportunity to reoffend in the community; any male vic-
tims, coded as yes or no; and any unrelated victims, coded as yes or no
(Hanson, 1997). Total scores can range from 0 to 6; the item weights
reflect the magnitude of the item’s independent relationship with sex-
ual recidivism. The instrument was developed using samples of adult
males who had been convicted of at least one sexual offense. Raw
scores on the RRASOR were submitted to the ROC analyses.

Static-99. This instrument has 10 items, including the 4 RRASOR
items (Hanson & Thornton, 1999). The additional items are prior sen-
tencing dates, any convictions for noncontact sexual offenses, index
offense of nonsexually violent nature, prior nonsexually violent
offense, any stranger victims, and cohabitation status. The Static-99
was developed for adult males who were known to have committed at
least one sexual offense. Total scores can range from 0 to 12; individu-
als are assigned to one of seven risk categories based on their score
(individuals with scores of 6 or more are combined). Ordinal position
among these risk categories, ranging from 0 (lowest) to 6+ (highest),
was submitted to the ROC analyses.

MnSOST-R. This instrument has 16 items, 12 pertaining to histori-
cal information and 4 pertaining to institutional information (Epper-
son et al., 1998). Unlike the other instruments described here, the
MnSOST-R is not scored for offenders with only related victims (i.e.,
incest offenders). The historical items are number of sex/sex-related
convictions, length of sexual offending history, offender under super-
vision at time of any sexual offense, any sexual offense committed in a
public place, force or threat of force used in any sexual offense, any
sexual offense within a single incident that involved multiple acts per-
petrated on a single victim, number of different age groups victimized
across all sexual offenses, victim aged 13 to 15 years and offender 5 or
more years older, victim was stranger in any sexual offense, adoles-
cent antisocial behavior, substantial drug or alcohol abuse in year
prior to arrest, and employment history. The institutional items are
discipline history while incarcerated, involvement in substance use
treatment, involvement in sex offender treatment, and age at time of
release.
Like the VRAG and SORAG, item weights for the MnSOST-R are
based on the empirical relationship between each item and the out-
come of sexual reoffense in the development sample, using a method
adapted from Nuffield (1982). Each point represents an increment or
decrement of 5% from the base rate of sexual recidivism in that sample
(artificially set at 35%, after a minimum follow-up time of 6 years, by
oversampling recidivists). The selection of the items and derivation of
their weights are described in detail by Epperson et al. (1998). Total
MnSOST-R scores can range from –14 to +30. In the latest update of
this instrument, individuals can be assigned to one of six risk catego-
ries based on their total score; ordinal position among these risk cate-
gories, ranging from 1 (lowest) to 6 (highest), was submitted to the
ROC analyses.

MASORR. Individual ratings on the MASORR are made on a scale
from 1 (low) to 5 (high) (see Barbaree & Seto, 1998; Barbaree, Seto, &
Maric, 1995). Together, these ratings were subjectively combined to
provide a global rating of pretreatment risk. After participating in
treatment, this rating was subjectively combined with ratings of moti-
vation, degree of change achieved, and clinical impression to provide
a global rating of posttreatment risk. Higher scores indicated greater
estimated risk for sexual reoffending for the pretreatment ratings, clin-
ical impression of risk, and the global rating of posttreatment risk. In
contrast, higher ratings on motivation for treatment and degree of
behavior change achieved indicated higher motivation for treatment
and greater change achieved, respectively; in other words, these two
ratings were reversed with regard to judged risk because higher moti-
vation for treatment and greater change achieved were assumed to be
indicators of lower risk to reoffend. The global pretreatment and
posttreatment MASORR ratings of risk, on a 5-point scale ranging
from 1 (low) to 5 (high), were submitted to the ROC analyses.

DATA COLLECTION

Participants gave written consent for the use of their information
for research purposes at the time of their assessment at the Warkworth
program. The clinical files contained the following information: (a) A
review of institutional files, including police reports, court records,
previous psychological reports, and case management reports; (b) a
semi-structured interview with the offender; (c) psychological test
results; and (d) treatment reports cowritten by the group therapist and
the program director. The MASORR was scored as part of the pre-
treatment and posttreatment assessment completed by the Warkworth
treatment program from 1989 to 1998. The VRAG and RRASOR
were scored retrospectively from file information in 1997 and 1998,
and the SORAG, Static-99, and MnSOST-R were scored retrospec-
tively from file information in 2000. All instrument coding was com-
pleted by individuals who were unaware of the recidivism outcomes.
Recidivism information was obtained in April 2000 from the Cana-
dian Police Information Centre database maintained by the Royal
Canadian Mounted Police; this national database records criminal
charges and convictions incurred in Canada.
It is worth noting that MASORR scores were reported to case man-
agers and parole board officials by Warkworth program staff and sub-
sequently influenced release decisions made about participants in the
treatment program. The global rating of posttreatment risk was highly
related to program recommendations made to case managers (on a
5-point scale ranging from 1 = recommended release to the community
to 5 = no change in status), r(163) = .70, p < .001; treatment participants
rated as high in risk were much less likely to be recommended
for conditional release to the community. It is not surprising that the
global rating of posttreatment risk was significantly related to the like-
lihood of being granted parole among those who were eligible, r(162) =
–.51, p < .001.
The average time at risk for reoffense was 4.5 years (SD = 2.2,
range = 29 days to 9.9 years). We distinguished between any recidi-
vism, meaning a reoffense of any kind, serious recidivism, meaning a
new nonsexually violent or sexual reoffense, and specifically sexual
recidivism. Sexual recidivism could include both contact and non-
contact offenses, but all sexual reoffenses identified in the April 2000
follow-up involved physical contact with the victim. For the entire
sample of 215 offenders, the recidivism rates after an average of 4.5
years’ follow-up time were 38% for reoffenses of any kind, 24% for
serious reoffenses, and 9% for sexual reoffenses.
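The three outcome definitions are nested: every sexual reoffense also counts as a serious reoffense, and every serious reoffense counts as a reoffense of any kind. A minimal Python sketch of this coding follows. The offense labels, function names, and the five-person sample are hypothetical and are not drawn from the study data.

```python
OFFENSE_KINDS = {"nonviolent", "violent", "sexual"}  # "violent" = nonsexually violent


def outcome_flags(new_offenses):
    """Code one offender's new offenses into the three nested outcomes."""
    kinds = set(new_offenses)
    if kinds - OFFENSE_KINDS:
        raise ValueError("unknown offense label")
    sexual = "sexual" in kinds
    serious = sexual or "violent" in kinds   # nonsexually violent or sexual
    return {"any": bool(kinds), "serious": serious, "sexual": sexual}


def recidivism_rates(sample):
    """Proportion of offenders meeting each outcome definition."""
    if not sample:
        raise ValueError("sample is empty")
    flags = [outcome_flags(offenses) for offenses in sample]
    return {key: sum(f[key] for f in flags) / len(flags)
            for key in ("any", "serious", "sexual")}


# Hypothetical follow-up records for five offenders.
sample = [[], ["nonviolent"], ["violent"], ["sexual", "nonviolent"], []]
example_rates = recidivism_rates(sample)
```

Because of the nesting, the rate for any recidivism can never be lower than the rate for serious recidivism, which in turn can never be lower than the rate for sexual recidivism, as in the 38%, 24%, and 9% reported above.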

RESULTS

RISK ASSESSMENT INSTRUMENT SCORES

The distribution of risk assessment instrument scores (including
PCL-R categories) is shown in Table 1. Risk categories rather than raw
scores are reported for the VRAG, SORAG, Static-99, and MnSOST-R.
Interrater reliability estimates between two different raters on a
subset of 30 cases (except for the PCL-R) are also presented in Table 1.
These reliability estimates are Pearson correlations between total
scores for the risk assessment instruments. Interrater reliability for the
PCL-R was based on a subset of 47 cases; the correlation was calcu-
lated between PCL-R scores assigned at the WSBC and PCL-R scores
assigned by a different assessment team at a penitentiary placement
unit. Interrater reliability for the risk assessment instruments was gen-
erally very good, with the exception of the MASORR pretreatment
and posttreatment ratings. We nonetheless proceeded with the analy-
sis of the MASORR because these clinical ratings were reported by
the program for much of its history.
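As an illustration of the reliability index described above, the following minimal sketch computes a Pearson correlation between the total scores two raters assign to the same cases. The function name `pearson_r` and the six pairs of scores are hypothetical and are not taken from the study data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))   # sum of cross-products
    ss_x = sum(a * a for a in dev_x)                   # sum of squares, rater A
    ss_y = sum(b * b for b in dev_y)                   # sum of squares, rater B
    return cross / sqrt(ss_x * ss_y)

# Hypothetical total scores from two raters on the same six cases
# (invented for illustration; not study data).
rater_a = [2, 5, 1, 4, 3, 6]
rater_b = [3, 5, 1, 4, 2, 6]
r = pearson_r(rater_a, rater_b)
print(f"interrater r = {r:.2f}")
```

A Pearson correlation reflects agreement in rank and spacing but not in absolute level, so two raters who differ by a constant would still obtain r = 1.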
Complete information on the VRAG was available for 76% of the
sample. Eighteen percent had one missing item, 4% had two missing
items, and 2% had three missing items. No participants had scores that
placed them in the lowest risk category or the two highest risk
categories.

Complete information on the SORAG was available for 55% of the
sample. Thirty-two percent had one missing item (predominantly the
item referring to phallometric assessment results), 9% had two miss-
ing items, and 4% had three missing items. Unlike the VRAG, there
were participants assigned to all nine risk categories.
Complete information on the RRASOR was available for 96% of
the sample. Nine participants had one missing item and one partici-
pant had two missing items. The modal score was 1, and only 22 par-
ticipants received a score of 4 or higher.
Complete information on the Static-99 was available for 89% of the
participants. Nine percent had one missing item, 1% had two missing
items, 1% had three missing items, and the remaining participant had
five missing items.
The MnSOST-R could not be scored for the 59 participants (27% of
the sample) who had only committed offenses against related victims
(i.e., they were incest offenders). Another 6 participants had many
missing items (2 were missing 14 items, 2 were missing 7 items, 1 was
missing 5 items, and 1 was missing 4 items). Of the remaining 150 par-
ticipants, complete information was available for 68% of the partici-
pants; 20% had one missing item (predominantly the item referring to
chemical dependency treatment while incarcerated), 10% had two
missing items, and 2% had three missing items.
For the MASORR, all participants had a global rating of pretreat-
ment risk, and 169 participants had a global rating of posttreatment
risk. Global ratings of posttreatment risk were unavailable for partici-
pants who refused treatment (n = 22) or dropped out of treatment early
in the program (n = 14); the reasons for not having posttreatment ratings
were unknown for the remaining 10 participants. Fewer than 25%
of participants were rated as low or low-moderate in the pretreatment
or posttreatment MASORR ratings.
We used all participant scores for the correlations between risk
assessment instrument scores and the ROC analyses to evaluate the
predictive accuracy of the risk assessment instruments, so the sample
sizes varied across instruments (MnSOST-R scores were not derived
for 65 participants: 59 incest offenders excluded by the scoring
instructions and 6 with many missing items; posttreatment
MASORR ratings were not available for 46 participants). We completed
the direct comparison of risk assessment instruments on the
group of 150 participants with scores available for all the risk assess-
ment instruments.

CONCURRENT VALIDITY

The correlations between risk assessment instrument scores are
shown in Table 2. As might be expected, the VRAG and SORAG were
highly correlated with each other, the RRASOR and Static-99 were
strongly correlated with each other, and the global MASORR ratings
of pretreatment and posttreatment risk were strongly correlated with
each other. Except for the RRASOR and PCL-R, all the risk assess-
ment instruments were significantly correlated with each other.
Because the VRAG and SORAG both include PCL-R scores, it is
not surprising that both are strongly correlated with the PCL-R. It is
interesting that PCL-R scores were significantly correlated with scores
on the other risk assessment instruments, except for the RRASOR.

PREDICTIVE VALIDITY WITH RECIDIVISM OUTCOMES

Table 3 shows AUCs, calculated using all participant data on the
VRAG, SORAG, RRASOR, Static-99, MnSOST-R, and MASORR
(global ratings of pretreatment and posttreatment risk), across the
three outcomes of any reoffense, serious reoffense, and specifically
sexual reoffense. Underlined values indicate the AUC for the out-
come(s) for which a risk assessment instrument was originally
designed. The AUCs for the PCL-R are included for comparison pur-
poses. Standard errors are reported so that 95% confidence intervals
can be calculated.
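The interval calculation mentioned above can be sketched as follows, assuming the usual normal approximation (AUC plus or minus 1.96 standard errors). The AUC and standard error values are copied from the sexual reoffense row of Table 3; the function name `auc_interval` is hypothetical. Intervals obtained this way need not agree exactly with the significance markers in the table, which were produced by the ROC software.

```python
def auc_interval(auc, se, z_crit=1.96):
    """Normal-approximation 95% confidence interval: AUC +/- z_crit * SE."""
    half_width = z_crit * se
    return auc - half_width, auc + half_width

# (AUC, standard error) pairs from the sexual reoffense row of Table 3.
table3_sexual = {
    "SORAG": (0.70, 0.060),
    "RRASOR": (0.77, 0.050),
    "Static-99": (0.70, 0.050),
}

intervals = {name: auc_interval(auc, se) for name, (auc, se) in table3_sexual.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name}: 95% CI [{lower:.3f}, {upper:.3f}]")
```

Because the approximation is symmetric, it can produce limits outside the 0 to 1 range when an AUC is close to 1 and its standard error is large; that does not arise for the values used here.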
Risk categories (ordinal data) for the VRAG, SORAG, Static-99,
and MnSOST-R were analyzed in the ROC analyses because it is the
risk categories that are used to report probabilistic estimates of risk to
reoffend. For the same reason, actual score on the RRASOR was ana-
lyzed. The pattern of results obtained in the ROC analyses did not
change when raw scores (continuous data) on the VRAG, SORAG,
Static-99, and MnSOST-R were analyzed. The ROC analyses were
conducted using ROCKIT Version 0.9.1b (Metz, 1998).
Recidivism of any kind was significantly predicted by all the instru-
ments except the global MASORR rating of posttreatment risk.
TABLE 2: Correlations Between Scores on the Psychopathy Checklist-Revised and the Risk Assessment Instruments

Measure                 PCL-R      VRAG       SORAG      RRASOR     Static-99  MnSOST-R   MASORR Pretreatment  MASORR Posttreatment

PCL-R                   —
VRAG                    .70**      —
SORAG                   .72**      .90**      —
RRASOR                  .13        .14*       .38**      —
Static-99               .45**      .49**      .67**      .75**      —
MnSOST-R                .30**      .36**      .41**      .32**      .46**      —
MASORR, Pretreatment    .54**      .36**      .47**      .38**      .50**      .37**      —
MASORR, Posttreatment   .32**      .18*       .33**      .42**      .44**      .32**      .65**                —

NOTE: PCL-R = Psychopathy Checklist-Revised; VRAG = Violence Risk Appraisal Guide; SORAG = Sex Offender Risk Appraisal Guide;
RRASOR = Rapid Risk Assessment for Sexual Offense Recidivism; MnSOST-R = Minnesota Sex Offender Screening Tool-Revised; MASORR =
Multifactorial Assessment of Sex Offender Risk for Recidivism. N = 215 except for the PCL-R, which was available for 212 participants, the
MnSOST-R, which was available for 150 participants, and the posttreatment MASORR rating, which was available for 169 participants.
* p < .05. ** p < .001.
TABLE 3: Areas Under the Curve of the Relative Operating Characteristic for the Psychopathy Checklist-Revised and the Risk
Assessment Instruments (using all available participant scores)

Outcome             Rate (a)  PCL-R          VRAG           SORAG          RRASOR         Static-99      MnSOST-R       MASORR Pretreatment  MASORR Posttreatment

Any reoffense       38%       .71** (.037)   .77** (.034)   .76** (.033)   .60*  (.040)   .71** (.036)   .65** (.046)   .62** (.042)         .56   (.046)
Serious reoffense   24%       .65** (.043)   .69** (.040)   .73** (.037)   .65** (.043)   .70** (.040)   .58   (.054)   .58   (.050)         .54   (.052)
Sexual reoffense    9%        .61   (.064)   .61*  (.068)   .70** (.060)   .77** (.050)   .70** (.050)   .65   (.077)   .61   (.078)         .60   (.092)

NOTE: PCL-R = Psychopathy Checklist-Revised; VRAG = Violence Risk Appraisal Guide; SORAG = Sex Offender Risk Appraisal Guide;
RRASOR = Rapid Risk Assessment for Sexual Offense Recidivism; MnSOST-R = Minnesota Sex Offender Screening Tool-Revised; MASORR =
Multifactorial Assessment of Sex Offender Risk for Recidivism. Standard errors are in parentheses. Italicized values indicate the outcomes for
which the risk assessment instruments were originally designed to predict. N = 215 except for the PCL-R, which was available for 212 partici-
pants, the MnSOST-R, which was available for 150 participants, and the posttreatment MASORR rating, which was available for 169 participants.
a. Calculated for the total sample of 215 participants.
* p < .05. ** p < .01.

Although none of the three was designed to predict general recidivism, the
VRAG obtained the highest AUC, followed closely by the
SORAG and then the Static-99. Serious recidivism was also predicted
by the VRAG, SORAG, RRASOR, and Static-99; the SORAG had the
highest AUC for this outcome (which it was designed to predict). The
VRAG, SORAG, RRASOR, and Static-99 significantly predicted
sexual recidivism; the RRASOR had the highest predictive accuracy
for this outcome (which it was designed to predict). None of the other
AUCs for the risk assessment instruments was statistically significant
at p < .05. It is interesting that the
AUCs for the Static-99 were very similar across the three recidivism
outcomes.
The correlations between risk assessment instrument scores (as
well as their component scores, when applicable) and recidivism out-
comes are reported for illustrative purposes in Table 4. The correla-
tions for PCL-R scores are also included. Correlations are not as infor-
mative as AUCs because the former are more constrained by the
relatively low base rates for serious and sexual recidivism. Looking at
the correlations for the outcomes that were significantly predicted in
the ROC analyses, it is worth noting that the MASORR rating of anti-
social personality (based on PCL-R score) had the highest correlation
of the individual pretreatment ratings with recidivism of any kind, the
global MASORR rating of pretreatment risk had a higher correlation
with recidivism of any kind than the global MASORR rating of
posttreatment risk, and the MnSOST-R historical item subtotal had a
higher correlation with recidivism of any kind and serious recidivism
than the institutional item subtotal. It is also worth noting that the
PCL-R had the fourth highest significant correlation with recidivism
of any kind, after the VRAG, SORAG, and Static-99. Factor 2 scores
on the PCL-R had higher correlations than Factor 1 scores for any
recidivism and serious recidivism.

DIRECT COMPARISON OF RISK ASSESSMENT INSTRUMENTS

To provide a direct comparison of the risk assessment instruments
with the same group of participants, we calculated the AUCs for the
150 participants with scores on all the risk assessment instruments,
excluding the global MASORR rating of posttreatment risk (see Table
5). The standard errors are reported so that 95% confidence intervals
can be calculated. We excluded the global MASORR rating of posttreatment
risk because the earlier analysis indicated that these scores
did not predict any of the recidivism outcomes, and including it would
have further reduced the sample size for the direct comparison of
instruments.

TABLE 4: Correlations Between Recidivism Outcomes and Scores on the Psychopathy Checklist-Revised and the Risk Assessment Instruments

                                            Recidivism
Measure                                     Any       Serious   Sexual

PCL-R                                       .30**     .17*      .09
  Factor 1: selfish, callous use of others  .02       –.02      .10
  Factor 2: antisocial lifestyle            .37**     .22**     .06
VRAG                                        .45**     .24**     .11
SORAG                                       .47**     .30**     .17*
RRASOR                                      .14*      .20*      .26**
Static-99                                   .34**     .28**     .18*
MnSOST-R                                    .25**     .13       .14
  Historical items subtotal                 .28**     .12       .11
  Institutional items subtotal              –.07      .01       .08
MASORR, overall pretreatment rating         .18*      .07       .11
  Offense history                           .15       .06       .19*
  Antisocial personality                    .23*      .12       .01
  Deviant sexual interests                  –.16      –.02      .06
  Social competence                         .00       .03       –.04
MASORR, overall posttreatment rating        .12       .06       .12
  Motivation                                –.01      –.01      –.08
  Degree of behavior change achieved        –.05      –.05      –.10
  Clinical impression                       .04       –.01      .15

NOTE: PCL-R = Psychopathy Checklist-Revised; VRAG = Violence Risk Appraisal
Guide; SORAG = Sex Offender Risk Appraisal Guide; RRASOR = Rapid Risk Assessment
for Sexual Offense Recidivism; MnSOST-R = Minnesota Sex Offender Screening
Tool-Revised; MASORR = Multifactorial Assessment of Sex Offender Risk for Recidivism.
N = 215 except for the PCL-R, which was available for 212 participants, the
MnSOST-R, which was available for 150 participants, and the posttreatment MASORR
rating, which was available for 169 participants.
* p < .05. ** p < .001.

The pattern of results was very similar when we restricted the anal-
ysis to participants who had scores on all the risk assessment instru-
ments: All the risk instruments significantly predicted recidivism of
any kind, but the SORAG now had a higher AUC value than the
VRAG; the VRAG, SORAG, RRASOR, and Static-99 significantly
predicted serious recidivism, with the SORAG having the largest
AUC value. The SORAG, RRASOR, and Static-99 significantly predicted
sexual recidivism, with the RRASOR having the largest AUC
value. The PCL-R significantly predicted recidivism of any kind and
serious recidivism.

TABLE 5: Directly Comparing the Psychopathy Checklist-Revised and the Risk Assessment Instruments Using Areas Under the
Curve of the Relative Operating Characteristic

Outcome             Rate      PCL-R          VRAG           SORAG          RRASOR         Static-99      MnSOST-R       Pretreatment MASORR

Any reoffense       43%       .68** (.045)   .76** (.040)   .78** (.038)   .61*  (.048)   .76** (.040)   .65** (.046)   .62*  (.051)
Serious reoffense   27%       .63*  (.051)   .66** (.048)   .71** (.043)   .65** (.052)   .70** (.047)   .58   (.054)   .57   (.061)
Sexual reoffense    9%        .61   (.069)   .58   (.080)   .68** (.070)   .73** (.073)   .68*  (.070)   .65   (.077)   .63   (.091)

NOTE: PCL-R = Psychopathy Checklist-Revised; VRAG = Violence Risk Appraisal Guide; SORAG = Sex Offender Risk Appraisal Guide;
RRASOR = Rapid Risk Assessment for Sexual Offense Recidivism; MnSOST-R = Minnesota Sex Offender Screening Tool-Revised; MASORR =
Multifactorial Assessment of Sex Offender Risk for Recidivism. Standard errors are in parentheses. Italicized values indicate the outcomes for
which the methods were designed to predict. N = 150 for all risk assessment instruments except N = 148 for the PCL-R.
* p < .05. ** p < .01.

We used the ROCKIT software to compare pairs of instruments in
terms of their AUC values (i.e., predictive accuracy). To control for
inflation of Type I error associated with the large number of possible
pairwise comparisons, we looked only at the difference in AUCs
between the SORAG and the other risk assessment instruments for
recidivism of any kind and serious recidivism (the SORAG had the
highest AUC values for these outcomes), and between the RRASOR
and the other risk assessment instruments for sexual recidivism (the
RRASOR had the highest AUC value for this outcome). The SORAG
was significantly more accurate than the PCL-R, z = 2.58, p < .01,
RRASOR, z = 3.04, p < .005, MnSOST-R, z = 2.62, p < .01, and global
pretreatment MASORR, z = 2.92, p < .005, in predicting recidivism of
any kind; the SORAG was also significantly more accurate than the
MnSOST-R, z = 2.41, p < .05, or the global pretreatment MASORR,
z = 2.12, p < .05, in predicting serious recidivism. The RRASOR was
not significantly more accurate than any of the other instruments in
predicting sexual recidivism.
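The concern about Type I error can be made concrete with a short calculation. The sketch below counts every possible pairwise comparison among the seven measures in Table 5 across the three outcomes, counts the restricted set described above (the best instrument against each of the others, per outcome), and computes the chance of at least one false positive at alpha = .05. The formula assumes the tests are independent, which is an illustrative simplification: comparisons made on the same participants are correlated. The function name `familywise_error` is hypothetical.

```python
from math import comb

def familywise_error(alpha, m):
    """Probability of at least one Type I error across m independent tests."""
    return 1 - (1 - alpha) ** m

n_measures = 7   # PCL-R, VRAG, SORAG, RRASOR, Static-99, MnSOST-R, pretreatment MASORR
n_outcomes = 3   # any, serious, and sexual recidivism

all_pairs = comb(n_measures, 2) * n_outcomes    # every pairwise comparison
restricted = (n_measures - 1) * n_outcomes      # best measure vs. each other measure

print(f"all pairwise comparisons: {all_pairs}, "
      f"familywise error = {familywise_error(0.05, all_pairs):.2f}")
print(f"restricted comparisons:   {restricted}, "
      f"familywise error = {familywise_error(0.05, restricted):.2f}")
```

Restricting the comparisons reduces the number of tests considerably, although under this simplified model the familywise rate for the restricted set still exceeds the nominal .05.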

DISCUSSION

The present study was conducted to evaluate the performances of a
number of available actuarial instruments in their prediction of recidi-
vism among adult male sex offenders. For comparison purposes, the
present study included the PCL-R and a guided clinical assessment of
risk (MASORR). The present findings provide support for the predic-
tive validity of a number of actuarial measures. In general terms, four
of the actuarial instruments (VRAG, SORAG, RRASOR, and Static-
99) were successful in predicting general, serious, and sexual recidi-
vism. Performance statistics reported in this study were in the range
of those already published in the literature. To our knowledge, this is
the first cross-validation study directly comparing these actuarial
instruments.

Results from the present study using the VRAG confirm findings
published previously. Specifically, the VRAG has been found to pre-
dict violent and sexual recidivism among sex offenders (Harris et al.,
1993; Rice & Harris, 1995, 1997; reviewed in Quinsey et al., 1998).
Rice and Harris (1997) reported that the VRAG resulted in AUCs of
.76 and .60 for violent and sexual recidivism, respectively. The present
study found AUCs of .69 and .61, respectively. Similarly, results from
the present study using the SORAG confirm findings published previ-
ously. Specifically, the SORAG has been reported to predict serious
recidivism among sex offenders (Quinsey et al., 1998). Firestone et al.
(2001) found that the SORAG resulted in an AUC of .65 for sexual
recidivism. The present study reported AUCs of .73 and .70 for serious
and sexual recidivism, respectively. Results from the present study
confirm findings published previously using the RRASOR. Spe-
cifically, the RRASOR predicts sexual recidivism. Hanson (1997)
reported that the RRASOR resulted in AUCs ranging from .62 to .77.
Sjöstedt and Långström (2000) found that the RRASOR resulted in an
AUC of .72. The present study reported an AUC of .77 for sexual
recidivism. Finally, results from the present study confirm findings
published previously using the Static-99. Specifically, the Static-99
predicts both serious and sexual recidivism. Hanson and Thornton
(2000) reported that use of the Static-99 resulted in AUCs of .71 and
.69 for serious and sexual recidivism, respectively. Beech et al. (2000)
found that the Static-99 had an AUC of .73 in predicting sexual recidi-
vism. Firestone et al. (2001) found that the Static-99 predicted sexual
recidivism with an AUC of .61, and sexual or violent recidivism with
an AUC of .62. Sjöstedt and Långström (2000) reported AUCs of .76
for sexual recidivism and .74 for nonsexually violent recidivism. The
present study reported an AUC of .70 for both sexual and serious
recidivism.
One reasonable objective of the present study might have been to
identify a single actuarial instrument that would provide the field with
superior predictive capability. No one instrument was found to be
superior in predicting recidivism outcome.
However, not every actuarial instrument was found to be successful
in predicting outcome. The fifth of the actuarial instruments
(MnSOST-R) failed to meet conventional levels of statistical signifi-
cance in the prediction of serious and sexual recidivism, although it
did predict general recidivism. The MnSOST-R was designed to predict
sexual recidivism. Epperson et al. (1998) reported that the
MnSOST-R resulted in an AUC for sexual recidivism of .77 in their
original development sample and an AUC of .73 in a cross-validation
sample (Epperson et al., 2000). The present study found an AUC of
.65 for the MnSOST-R, a result that was found to approach conven-
tional statistical significance (p = .11). Of course, the MnSOST-R may
be found to be successful in predicting recidivism in larger studies or
studies using other samples.
It is unknown why the MnSOST-R was unsuccessful in predicting
sexual recidivism in the present study. The MnSOST-R incorporates a
number of historical items. There is considerable overlap between the
historical variables assessed by the MnSOST-R and the other actuarial
instruments. However, unlike the other instruments, the MnSOST-R
also incorporates institutional information, and two of the institutional
items refer to treatment participation while incarcerated. The inclu-
sion of these items may reduce the instrument’s predictive ability.
The base rate of recidivism was much lower in the present study
than in the MnSOST-R development sample. However, because the
base rate of recidivism was equally low for all of the actuarial instru-
ments evaluated in the present study, the low base rate does not put the
MnSOST-R at any unfair disadvantage compared with the four suc-
cessful instruments.
Lower relative reliability of coding might explain our failure to suc-
cessfully predict sexual recidivism using the MnSOST-R. Our reli-
ability of coding the MnSOST-R was similar to that found by
Epperson and his group. In their Web site material, Epperson and his
colleagues report a study of reliability in which 11 raters rated the
same 12 offenders, resulting in a single measure intraclass correlation
coefficient of .80. As reported earlier, our interrater reliability
(Pearson r = .80) was acceptably high and similar in value to that of the
developers of the instrument. Therefore, reduced reliability cannot
explain why our AUCs for the MnSOST-R were lower than those
reported by the test developers. At the same time, our reliability of
scoring for the MnSOST-R was lower than for the four successful
actuarial instruments (.90 or above). So, lower relative reliability of
scoring might explain why the MnSOST-R was relatively weaker as a
predictive instrument in the present study.

Scoring the MnSOST-R requires careful reading of extensive manual
material, a relatively large amount of training of the coders, and a
high degree of diligence among the coders. Our coders were trained
over a period of one full working day. The developers of the MnSOST-
R have provided more comprehensive scoring guidelines and exam-
ples than provided by the developers of the RRASOR or Static-99.
Nevertheless, we found the MnSOST-R to be the most difficult of the
actuarial measures to code. The MnSOST-R items are very specific
and the clinical file material available was not always exactly pertinent
to the items as described. In contrast, the VRAG, SORAG, RRASOR,
and Static-99 were straightforward to code and score.
A specific explanation for the MnSOST-R’s failure to predict sex-
ual recidivism is that the instrument was designed to predict arrest for
a new sexual offense, whereas the outcome evaluated in the present
study was a new charge or conviction for a sexual offense (arrest data
were not available). In contrast, the other instruments were designed
to predict a new charge or conviction for a sexual offense. However,
the authors of the MnSOST-R have indicated that nearly all arrests in
their follow-up data resulted in conviction.
Therefore, taking all of these considerations into account, the pres-
ent study found that the MnSOST-R had fewer advantages than the
other four actuarial measures. First, it was not successful in predicting
important recidivism outcomes. Second, it was more difficult and more
expensive to code and score. Finally, it did not allow for the assessment
of intrafamilial child molesters.
Studies consistently find that psychopathy is related to recidivism
(see Seto & Lalumière, 2000). The present study confirmed that the
PCL-R was predictive of general (AUC = .71) and serious (AUC = .65)
recidivism. Although it did not successfully predict sexual recidivism,
the AUC value of .61 approached conventional levels of statistical significance.
The correlation between PCL-R score and recidivism of
any kind was the fourth highest, after those for the VRAG, SORAG, and Static-99. It
may not be surprising, then, that actuarial instruments that incorpo-
rated the PCL-R as a component item (VRAG & SORAG) were strong
performers in the present study. However, actuarial instruments that
did not take psychopathy into account directly (RRASOR & Static-
99) were no less capable as predictors of serious and sexual reoffense.
Although the PCL-R was moderately correlated with the Static-99, its
correlation with the RRASOR was very low. It may be that the PCL-R
and the RRASOR each tap different independent contributors to
recidivism (psychopathy and sexual deviance).
It should be noted here that conventional scoring of the PCL-R
requires specialized training, lengthy file review, and an interview that
may take up to 2 hours to complete. The VRAG and SORAG require
the PCL-R score as an important component item. In the event that the
PCL-R score is not available in the clinical file, scoring the VRAG and
the SORAG therefore requires the additional scoring of the PCL-R, a
requirement that is expensive and time consuming. The present study
found that the RRASOR and the Static-99, instruments that do not
include the PCL-R, were equal to the PCL-R-based instruments in
their reliability and their ability to predict serious and sexual recidi-
vism among sex offenders. This finding may be good news for settings
that do not have PCL-R scores available or that do not have the
resources to conduct these personality assessments.
Among the successful actuarial instruments, the RRASOR was by
far the easiest to score, and its performance was in many ways remark-
able. The AUC found for sexual recidivism was numerically superior
to the other instruments, although this difference was not statistically
significant. We should conclude that, in the present study, the
RRASOR was at least equal in predictive validity to other actuarial
instruments that are more difficult and time consuming to score. This
is particularly remarkable given that one of its items takes
account of whether the individual has ever offended against an unre-
lated child, and the direct comparison between instruments in the
present study excluded incest offenders. Therefore, its strong perfor-
mance here would have depended on only three simply scored items.
One caveat about the scoring of the RRASOR is important
to make. Assuming that our computerized database provided adequate
proxies for the simply scored RRASOR items, we initially scored the
RRASOR by recoding these database variables. Using these data, we
initially found that the RRASOR, and indeed the Static-99 (which
included the RRASOR items), were not successful in predicting recid-
ivism. Subsequently, when we calculated the reliability of our scoring,
we detected important differences between the independent rater
scoring the RRASOR directly from the file and the recoded database
variables. When RRASOR scores were coded directly from the files,
we found the results reported here. When coded directly from the files,
the RRASOR was successful in predicting recidivism in sex
offenders.
As mentioned in the introduction, the SORAG contains many items
from the VRAG. In addition, it contains items specifically designed to
capture contributions to reoffending in sex offenders. As discussed in
Quinsey et al. (1998), statistical comparisons between these two
instruments would assess the marginal utility of the additional items in
the SORAG. The present study found no significant differences
between the VRAG and SORAG in the prediction of recidivism out-
come in sex offenders.
In a similar vein, as mentioned in the introduction, the Static-99
contains the four items from the RRASOR, as well as additional items
from an instrument developed by David Thornton in England. The
Static-99 was designed to be an improvement over the RRASOR
(Hanson & Thornton, 2000). Statistical comparisons of the RRASOR
and the Static-99 would assess the extent to which the Static-99 repre-
sents an improvement over the RRASOR. The present study reported
no significant differences between the RRASOR and the Static-99.
Examination of the tables of AUCs indicates that the performance of
the RRASOR and Static-99 was similar with respect to serious and
sexual recidivism. However, the RRASOR appears to fall short of the
Static-99 in the prediction of general recidivism. The direct statistical
comparisons were not made due to a concern for increases in Type I
error.
Treatment-related information did not improve the prediction of
recidivism in the present study. The global MASORR rating of
posttreatment risk did not significantly predict any of the recidivism
outcomes, whereas the global MASORR rating of pretreatment risk
did predict recidivism of any kind. This finding is consistent with the
results of Seto and Barbaree (1999), who found that positive treatment
behavior was actually associated with a greater risk for offending,
especially among offenders with higher scores on the PCL-R. In a similar vein, the
correlational analysis revealed that the institutional item subtotal on
the MnSOST-R was less informative with regard to recidivism of any
kind than the historical item subtotal. This finding has important
implications for the assessment of risk with sex offenders. Many programs
continue to use subjective clinical ratings to evaluate risk or
estimate treatment response (usually by lowering risk estimates following
successful completion of a treatment program). To the extent
that these clinically derived instruments have not been empirically
validated, these instruments may not accurately assess future risk for
serious offending and may have a spurious, undesirable influence on
decisions with regard to conditional release, supervision require-
ments, and risk to the community.
More generally, the present findings can be interpreted as further
evidence for the disadvantages of subjective clinical judgment in
assessing risk (see Quinsey et al., 1998). Plausible reasons for the
inaccuracy of subjective clinical judgment include cognitive biases
such as the recency effect, the salience of unusual or extreme informa-
tion, and the general difficulty in subjectively combining information
from a variety of sources. Hanson and Bussière (1998) found that
unstructured clinical judgments were, on average, only modestly
related to sexual recidivism. One inherent advantage of actuarial
instruments over subjective clinical ratings is that the former have
objective criteria and explicit scoring rules, reducing or eliminating
the influence of unspecified or potentially irrelevant information.
The MASORR had not been empirically evaluated prior to this
study, and some evidence for the predictive ability of the MASORR
was found. First, the pretreatment MASORR was a significant predic-
tor of any recidivism. Whereas this result may provide some comfort
for users of guided clinical approaches, further examination reduces
the apparent support. The MASORR pretreatment scoring was
heavily dependent on the PCL-R, resulting in a correlation between
these two measures of .54. Because the PCL-R has been repeatedly
found to be predictive of any reoffense and was significantly predic-
tive of any reoffense in the present study, this success of the MASORR
might be largely attributable to the influence of the PCL-R, which was
numerically superior to the MASORR (AUCs of .71 vs. .62, respec-
tively) in predictive validity. The PCL-R has other advantages, includ-
ing proven reliability of scoring. Second, the “offense history” com-
ponent of the MASORR was significantly correlated with sexual
recidivism. Again, this result may be seen to support the guided clini-
cal approach. The RRASOR considers the same essential information
from the clinical file as does the MASORR offense history compo-
nent. However, unlike the subjectively scored MASORR, the
RRASOR structures and quantifies the scoring, ensuring that such
scoring is applied consistently across subjects and raters. The present
study did not find the MASORR ratings to be reliable but
found very high reliability for the RRASOR ratings. The RRASOR’s
correlation with sexual recidivism was found to be numerically higher
(.26 vs. .19) than that of the MASORR offense history component. Therefore, whereas a guided clinical
approach to risk assessment may be found to predict recidivism
from time to time, an actuarial approach is favored over the clinical
judgments because of consistently superior reliability and validity.
We do not mean to imply that the assessment of sex offender risk for
reoffense cannot be improved. Other variables may provide informa-
tion over and above the combination of items that are already included
in a particular instrument. Of particular interest is research on proxi-
mal antecedents of reoffending, including noncompliance with super-
vision, substance abuse, access to potential victims, and acute psychi-
atric symptoms such as persecutory delusions (Hanson & Harris,
1998; Quinsey, Coleman, Jones, & Altrows, 1997). It is clear from the
results of the present study, however, that the development and modifi-
cation of new risk assessment instruments for sex offenders must be
guided by validation research.

NOTE

1. Many of the indices commonly used to evaluate predictive accuracy, such as correlations
or percentage of recidivists and nonrecidivists correctly classified, are influenced by the base
rate (e.g., the proportion of individuals who reoffend) and selection ratio (e.g., the proportion of
individuals predicted to reoffend). Mossman (1994) and Rice and Harris (1995) have proposed
that the area under the receiver operating characteristic (ROC) curve is the most appropriate index of
accuracy in the prediction of recidivism. ROC curves are obtained by plotting the sensitivity of a prediction
instrument (its true-positive rate) against 1 − specificity (its false-positive rate) across all possible
cutoff scores (Hanley & McNeil, 1982, 1983). In the context of predicting
sexual recidivism, the area under the curve (AUC) represents the probability that a randomly
selected individual who commits a new sexual offense has a higher score on the prediction
instrument than a randomly selected individual who does not. AUC values can range from 0 to 1;
an AUC of .5 represents prediction at the chance level, whereas values higher or lower than .5
indicate better or worse performance, respectively.
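
The probabilistic reading of the AUC given in this note can be illustrated with a minimal sketch. The scores below are hypothetical and are not data from the present study; the function name `auc_pairwise` is likewise an illustrative assumption, and tied scores are counted as one half, which is a common convention rather than something specified above. The sketch compares every recidivist with every nonrecidivist and reports the proportion of pairs in which the recidivist has the higher score.

```python
from itertools import product


def auc_pairwise(recidivist_scores, nonrecidivist_scores):
    """Estimate the AUC as the probability that a randomly chosen
    recidivist scores higher than a randomly chosen nonrecidivist.
    Tied pairs count as one half (an assumed convention)."""
    pairs = list(product(recidivist_scores, nonrecidivist_scores))
    credit = sum(
        1.0 if r > n else 0.5 if r == n else 0.0
        for r, n in pairs
    )
    return credit / len(pairs)


# Hypothetical instrument scores, for illustration only.
recidivists = [5, 4, 4, 2]
nonrecidivists = [3, 2, 1, 1, 0, 4]

auc = auc_pairwise(recidivists, nonrecidivists)
print(round(auc, 3))  # 0.854

# Lowering the base rate by adding more nonrecidivists with the same
# score distribution leaves the estimate unchanged.
auc_low_base_rate = auc_pairwise(recidivists, nonrecidivists * 10)
print(round(auc_low_base_rate, 3))  # 0.854
```

The second call repeats the nonrecidivist scores tenfold, so the base rate drops from 40% to about 6% while the AUC stays the same. This is the property that makes the index attractive when base rates differ across samples.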

REFERENCES

American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disor-
ders (3rd ed.). Washington, DC: Author.
Barbaree, H. E., & Seto, M. C. (1998). The ongoing follow-up of sex offenders treated at the
Warkworth Sexual Behaviour Clinic. Research report prepared for the Correctional Service
of Canada.
Barbaree, H. E., Seto, M. C., & Maric, A. (1995). Sex offender characteristics, response to treat-
ment, and correctional release decisions at the Warkworth Sexual Behaviour Clinic
(Research report 1996-73). Ottawa: Ministry of the Solicitor-General.
Beech, A., Beckett, R., & Fisher, D. (2000). Outcome data of representative UK sex offender
treatment programs: Short-term effectiveness and some preliminary re-conviction data.
Unpublished manuscript.
Bélanger, N., & Earls, C. M. (1996). Sex offender recidivism prediction. Forum on Corrections
Research, 8, 22-24.
Blishen, B. R., Carroll, W. K., & Moore, C. (1987). The 1981 socioeconomic index for occupa-
tions in Canada. Canadian Review of Sociology and Anthropology, 24, 465-488.
Bonta, J., Law, M., & Hanson, K. (1998). The prediction of criminal and violent recidivism
among mentally disordered offenders: A meta-analysis. Psychological Bulletin, 123, 123-
142.
Dempster, R. J. (1998). Prediction of sexually violent recidivism: A comparison of risk assessment
instruments. Unpublished master’s thesis, Simon Fraser University.
Epperson, D. L., Kaul, J. D., & Hesselton, D. (1998). Final report on the development of the Min-
nesota Sex Offender Screening Tool-Revised (MnSOST-R). Paper presented at the 17th
Annual Conference of the Association for the Treatment of Sexual Abusers, Vancouver, Brit-
ish Columbia.
Epperson, D. L., Kaul, J. D., Huot, S. J., Hesselton, D., Alexander, W., & Goldman, R. (2000,
November). Cross-validation of the Minnesota Sex Offender Screening Tool-Revised. Paper
presented at the 19th Annual Conference of the Association for the Treatment of Sexual
Abusers, San Diego, California.
Firestone, P., Bradford, J. M., Greenberg, D. M., Nunes, K. L., & Broom, I. (2001). A compari-
son of the Static-99 and the Sex Offender Risk Appraisal Guide (SORAG). Manuscript sub-
mitted for publication.
Gendreau, P., Little, T., & Goggin, C. (1996). A meta-analysis of the predictors of adult offender
recidivism: What works! Criminology, 34, 575-607.
Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating
characteristic (ROC) curve. Radiology, 143, 29-36.
Hanley, J. A., & McNeil, B. J. (1983). A method of comparing the areas under receiver operating
characteristic curves derived from the same cases. Radiology, 148, 839-843.
Hanson, R. K. (1997). The development of a brief actuarial risk scale for sexual offense recidi-
vism (User report 1997-04). Ottawa: Department of the Solicitor General of Canada.
Hanson, R. K. (1998). What do we know about sexual offender risk assessment? Psychology,
Public Policy and Law, 4, 50-72.
Hanson, R. K., & Bussière, M. T. (1998). Predicting relapse: A meta-analysis of sexual offender
recidivism studies. Journal of Consulting and Clinical Psychology, 66, 348-362.
Hanson, R. K., & Harris, A. (1998). Dynamic predictors of sexual recidivism (User report 1998-
01). Ottawa: Department of the Solicitor General of Canada.
Hanson, R. K., & Thornton, D. (1999). Static 99: Improving actuarial risk assessments for sex
offenders (User report 1999-02). Ottawa: Department of the Solicitor General of Canada.
Hanson, R. K., & Thornton, D. (2000). Improving risk assessments for sex offenders: A compar-
ison of three actuarial scales. Law and Human Behavior, 24, 119-136.
Hare, R. D. (1991). Manual for the revised Psychopathy Checklist. Toronto: Multi-Health
Systems.
Harris, G. T., Rice, M. E., & Quinsey, V. L. (1993). Violent recidivism of mentally disordered
offenders: The development of a statistical prediction instrument. Criminal Justice and
Behavior, 20, 315-335.
Harris, G. T., Rice, M. E., & Quinsey, V. L. (1994). Psychopathy as a taxon: Evidence that psy-
chopaths are a discrete class. Journal of Consulting and Clinical Psychology, 62, 387-397.
Hart, S. D., Kropp, P. R., & Hare, R. D. (1988). Performance of psychopaths following condi-
tional release from prison. Journal of Consulting and Clinical Psychology, 56, 227-232.
Haynes, A. K., Yates, P. M., Nicholaichuk, T., Gu, D., & Bolton, R. (2000, June). Sexual devi-
ancy, risk, and recidivism: The relationship between deviant arousal, the Rapid Risk Assess-
ment for Sexual Offence Recidivism (RRASOR) and sexual recidivism. Paper presented at the
Annual Convention of the Canadian Psychological Association, Ottawa, Ontario.
Kroner, D. G., & Mills, J. F. (1997, February). The VRAG: Predicting institutional misconduct in
violent offenders. Paper presented at the 50th Annual Convention of the Ontario Psychologi-
cal Association, Toronto, Canada.
Lieb, R., & Matson, S. (1998). Sexual predator commitment laws in the United States: 1998
update. Washington State Institute for Public Policy.
Lipsey, M. W., & Derzon, J. H. (1998). Predictors of violent or serious delinquency in adoles-
cence and early adulthood: A synthesis of longitudinal research. In R. Loeber & D. P.
Farrington (Eds.), Serious and violent juvenile offenders: Risk factors and successful inter-
ventions (pp. 86-105). London: Sage.
Loza, W., & Dhaliwal, G. K. (1997). Psychometric evaluation of the Risk Appraisal Guide
(RAG): A tool for assessing violent recidivism. Journal of Interpersonal Violence, 12, 779-
793.
Metz, C. E. (1998). ROCKIT (Version 0.9.1b) [Computer software]. Chicago, IL: University of
Chicago.
Mossman, D. (1994). Assessing predictions of violence: Being accurate about accuracy. Journal
of Consulting and Clinical Psychology, 62, 783-792.
Nuffield, J. (1982). Parole decision-making in Canada: Research towards decision guidelines.
Ottawa: Department of the Solicitor General of Canada.
Quinsey, V. L., Coleman, G., Jones, B., & Altrows, I. F. (1997). Proximal antecedents of eloping
and reoffending among supervised mentally disordered offenders. Journal of Interpersonal
Violence, 12, 794-813.
Quinsey, V. L., Harris, G. T., Rice, M. E., & Cormier, C. A. (1998). Violent offenders: Appraising
and managing risk. Washington, DC: American Psychological Association.
Rice, M. E., & Harris, G. T. (1995). Violent recidivism: Assessing predictive validity. Journal of
Consulting and Clinical Psychology, 63, 737-748.
Rice, M. E., & Harris, G. T. (1997). Cross-validation and extension of the Violence Risk
Appraisal Guide for child molesters and rapists. Law and Human Behavior, 21, 231-241.
Rice, M. E., & Harris, G. T. (1999, May). A multi-site followup study of sex offenders: The pre-
dictive accuracy of risk prediction instruments. Third Annual Research Day of the Univer-
sity of Toronto Forensic Psychiatry Program, Penetanguishene, Ontario.
Rice, M. E., Harris, G. T., & Quinsey, V. L. (1990). A follow-up of rapists assessed in a maximum
security psychiatric facility. Journal of Interpersonal Violence, 5, 435-448.
Seto, M. C., & Barbaree, H. E. (1999). Psychopathy, treatment behavior and sex offender recidi-
vism. Journal of Interpersonal Violence, 14, 1235-1248.
Seto, M. C., & Lalumière, M. L. (1999). Adolescent sex offenders: Investigating the roles of
antisociality and sexual deviance. Psychiatry Rounds, 13(3).
Seto, M. C., & Lalumière, M. L. (2000). Psychopathy and sexual aggression. In C. Gacono (Ed.),
The clinical and forensic assessment of psychopathy: A practitioner’s guide (pp. 333-350).
Mahwah, NJ: Lawrence Erlbaum.
Sjöstedt, G., & Långström, N. (2000, November). Actuarial assessment of risk for criminal
recidivism among sex offenders released from Swedish prisons 1993-1997. Poster presented
at the 19th Annual Conference of the Association for the Treatment of Sexual Abusers, San
Diego, California.
Smiley, C., Hills, A. L., & McHattie, L. (2000, June). Are dangerous offenders really dangerous?
Presentation at the Canadian Psychological Association Convention, Ottawa, Ontario.
