Assessing The Comparative Effectiveness Of A Diagnostic Technology: CT Colonography
Comparative effectiveness assessment of medical imaging is likely to
grow as all stakeholders appreciate the need for better information.
by Steven D. Pearson, Amy B. Knudsen, Roberta W. Scherer, Jed
Weissberg, and G. Scott Gazelle
ABSTRACT: Medical imaging is a prime example of an innovation that has brought impor-
tant advances to medical care while triggering concerns about potential overuse and exces-
sive costs. Many hopes are riding on comparative effectiveness research to help guide
better decision making to improve quality and value. But the dynamic nature of medical im-
aging poses challenges for the traditional paradigms of evidentiary review and analysis at
the heart of comparative effectiveness. This paper discusses these challenges and pre-
sents policy lessons for manufacturers, evidence reviewers, and decisionmakers, illus-
trated by an assessment of a prominent emerging imaging technique: computed tomogra-
phy (CT) colonography. [Health Affairs 27, no. 6 (2008): 1503–1514; 10.1377/hlthaff.27.6
.1503]

New medical tests and treatments often become widely used
while important gaps in evidence remain regarding their clinical effective-
ness compared to the best existing alternatives. Many proponents of
health system reform have therefore argued that efforts to improve the quality and
affordability of U.S. health care may be crippled without greater efforts to assess
the “comparative effectiveness” of both new and existing technologies.1
One area ripe for more assessment is medical imaging. Ultrasound, nuclear im-
aging, computed tomography (CT), magnetic resonance imaging (MRI), positron
emission tomography (PET) scans, and other imaging techniques have revolution-
ized medical care. But sustained growth in the use of many types of medical imag-
ing has occurred without convincing evidence of additional clinical benefits or improved cost-effectiveness over existing care. Against a backdrop of increasing cost pressures and wide variation in usage, these gaps in evidence create a compelling argument for more assessment of medical imaging to support health care decisions by all stakeholders.2

Steven Pearson (spearson@icer-review.org) is president of the Institute for Clinical and Economic Review (ICER) at the Massachusetts General Hospital Institute for Technology Assessment (ITA) in Boston. Amy Knudsen is a senior scientist at the ITA. Roberta Scherer is an associate scientist in epidemiology at the Johns Hopkins Bloomberg School of Public Health in Baltimore, Maryland. Jed Weissberg is associate executive director, Quality and Performance Improvement, at the Permanente Federation in Oakland, California. Scott Gazelle is director of the ITA, in the Department of Radiology at Mass General.

Challenges Of Evaluation
Distinct challenges in analyzing the comparative effectiveness of medical imag-
ing have been recognized for many years within the health technology assessment
community.3 The standard paradigm that guides identifying, synthesizing, and in-
terpreting evidence to evaluate medical treatments is poorly suited for diagnostic
tests, and so in the mid-1970s several groups developed a now widely adopted
framework to evaluate all diagnostic technologies by categorizing evidence into
six levels (Exhibit 1).4 The first two levels of the evidence hierarchy reflect the ba-
sic technical attributes of the test and evidence of diagnostic accuracy. Taken to-
gether, they can be called “test performance.” The additional levels of evidence
build upon these first two, with each level representing a distinct type of evidence
that might, or might not, be available when one is evaluating a diagnostic test.
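To make the first two levels concrete, the sketch below (in Python, with invented counts rather than data from any CTC study) computes the standard level-2 accuracy measures from a two-by-two table of test results against a reference standard.

```python
# Minimal sketch (invented counts, not data from any CTC study): computing
# the level-2 "diagnostic accuracy" measures from a 2x2 table of test
# results against a reference standard.

def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from true/false positives/negatives."""
    sensitivity = tp / (tp + fn)  # share of diseased patients the test flags
    specificity = tn / (tn + fp)  # share of healthy patients the test clears
    return sensitivity, specificity

# Hypothetical example: 90 of 100 diseased patients test positive,
# 855 of 900 healthy patients test negative.
sens, spec = diagnostic_accuracy(tp=90, fp=45, fn=10, tn=855)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")  # 0.90, 0.95
```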
• Constant change. Medical imaging is notable for several features that compli-
cate the creation or assessment of evidence within this hierarchy, even at the basic
levels that constitute test performance (Exhibit 2). First and foremost, medical im-
aging techniques are constantly changing. Whereas many blood tests have fixed lab-
oratory specifications, applications, and interpretations, imaging techniques are of-
ten in nearly constant flux.

EXHIBIT 1
Levels Of Evidence In Evaluations Of Diagnostic Technologies

Level of evidence                   Example of measures
1. Technical                        Pixels per millimeter; section thickness
2. Diagnostic accuracy              Sensitivity; specificity; area under the receiver
                                    operating characteristic curve
3. Impact on diagnostic thinking    Percentage of cases in which the clinician believes
                                    that the test alters the diagnosis
4. Impact on therapeutic actions    Percentage of cases in which the choice of therapies
                                    is changed after information from the test is provided
5. Impact on patient outcomes       Differences in mortality, morbidity, or quality of life
                                    between patients managed with the test and those
                                    managed without it
6. Impact on societal outcomes      Cost-effectiveness of the improvement in patient
                                    outcomes, such as cost per life-year saved, calculated
                                    from a societal perspective

SOURCE: Adapted from J.R. Thornbury, “Clinical Efficacy of Diagnostic Imaging: Love It or Leave It,” American Journal of Roentgenology 162, no. 1 (1994): 1–8.


EXHIBIT 2
Impact Of Distinctive Evidentiary Considerations Of Comparative Effectiveness
Assessments Of Diagnostic Medical Imaging Techniques

Evidentiary consideration                        Impact on assessment
Medical imaging devices undergo rapid            Judgment required on when to launch assessment;
evolution, often improving the technical         randomized controlled trials are difficult to perform,
precision of images                              and a trial long enough to measure patient outcomes
                                                 may be so long that its results will not be relevant;
                                                 judgment required on technical specification criteria
                                                 to identify relevant published evidence; evidence on
                                                 “best” devices might not be generalizable to
                                                 community practice

Interpretation of images may be dependent        Literature may include studies whose practitioners
upon the skill and experience of practitioners   vary widely in training and skill of interpretation

Diagnostic accuracy is not independent of        Differences in test performance across studies may be
the spectrum of patients tested                  due to differences in the spectrum of patients
                                                 involved; most early published literature comes from
                                                 academic centers, where the patient population might
                                                 not be representative of the spectrum of cases that
                                                 will be seen in general community practice

SOURCE: Authors’ analysis.

Both imaging devices and patient preparation techniques evolve, often through small, iterative changes, and many different ver-
sions can be in use or under study at any one time. Assessments may be crippled by
questions of whether the medical literature has much relevance to the most recent
version of the technique—the version that clinicians view, if only ephemerally, as
state of the art.
• Role of interpretation. Another complicating aspect of medical imaging is
that the diagnostic accuracy of an imaging technique is integrally linked with prac-
titioners’ experience and skill in the interpretation of results. Thus, any divergent
evidence in the literature on diagnostic accuracy may be due, in part or in whole, to
differences in the practitioner’s skill at interpretation.
• Variability in published results. A third complication in the uneasy calculus
of assessing test performance is judging the variability in published results that may
arise from differences in the spectrum of patients.5 The measured sensitivity and
specificity of a test are likely to be higher when the test is used among patients with
more typical or severe expressions of the condition. Conversely, tests will usually
have a harder time distinguishing “positive” from “negative” results among patients
with atypical or less severe cases.
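A small simulation can make this spectrum effect concrete. In the sketch below, the test signal and threshold are entirely invented; the point is that the same test, at the same positivity threshold, shows higher measured sensitivity when the diseased sample skews toward severe, "easier" cases.

```python
# Invented-signal simulation of the spectrum effect: measured sensitivity
# of one fixed test rises as the diseased sample skews toward severe cases.
import random

random.seed(0)

def measured_sensitivity(case_signal_mean: float, n: int = 10_000,
                         threshold: float = 1.0) -> float:
    """Fraction of simulated diseased cases whose signal exceeds the threshold."""
    hits = sum(random.gauss(case_signal_mean, 1.0) > threshold for _ in range(n))
    return hits / n

print(f"severe spectrum:   {measured_sensitivity(2.5):.2f}")   # ~0.93
print(f"atypical spectrum: {measured_sensitivity(1.2):.2f}")   # ~0.58
```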
• Transferability. Often these evidentiary challenges of medical imaging act in
combination to present a common problem for evidence reviewers and decision-
makers alike: how to interpret the meaning of test performance results that come
from studies of the most advanced devices, in the hands of experts with extensive
experience, among highly selected patients at academic health centers. This can be
called the “best hands” problem, a common affliction noted equally by those en-
gaged in the assessments of surgical procedures.6 How transferable early results
from academic centers are to devices, practitioners, and patients in the community
is a critical question for evidence reviewers and decisionmakers to consider.
• Screening intervals and impact on patient outcomes. Medical imaging
presents two other fundamental types of challenges for the conduct and interpreta-
tion of comparative effectiveness assessments. First, evidence comparing the one-
time use of a new technique with an established gold standard produces no evidence
to guide decisions on recommended intervals for screening or follow-up. The clini-
cal effectiveness, and certainly the cost-effectiveness, of a medical imaging technique
in practice may be highly dependent on how frequently it is repeated, but solid evi-
dence is almost never available with which to weigh this important variable.
Second, when the evidence on a medical imaging technique is gathered, almost
never is there direct evidence of the impact of a medical imaging technique on pa-
tient outcomes (level 5).7 There are several reasons for this. New imaging tech-
niques often do not exist alongside competing approaches in the same clinical
practice sites, thwarting most schemes to randomize patients within a trial at the
point of care. But even more intractable are the concerns about conducting a pro-
spective study over the years necessary to capture meaningful patient outcomes. If
the study includes modifications to the imaging technique as they emerge, pooled
results from the use of different versions might not be viewed as informative. Con-
versely, if the study sticks resolutely to a single early version of an evolving imag-
ing technique, the study results may be more cohesive but may be viewed as irrele-
vant because of the “obsolescence” of the original version of the technique. It is not
surprising, therefore, that any ardor for a randomized controlled trial (RCT) of a
medical imaging technique, or even the more modest desire for a prospective co-
hort study, is usually extinguished by both practical and conceptual concerns.
• Pros and cons of modeling. Without direct evidence of the impact of a new
diagnostic strategy on patient outcomes, there are two choices for comparative ef-
fectiveness evaluations of medical imaging devices: they can summarize the existing
evidence and characterize the major gaps that remain; or they can supplement this
approach with decision-analytic modeling.8 Modeling provides an opportunity to
fill in the evidentiary gaps with estimates based on expert opinion and to simulate
the impact of various diagnostic strategies on patient outcomes. Modeling can also
be used to vary key evidentiary assumptions about test performance and clinician
decision making and determine how sensitive estimated patient outcomes are to un-
certainty in these data. Lastly, decision-analytic modeling is the platform for exam-
ining all benefits, harms, and costs together to provide cost-effectiveness and cost-
utility information to decisionmakers.
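As one illustration of how modeling probes uncertainty, the sketch below implements a deliberately toy outcome model, not the ICER model, and varies a single uncertain input (test sensitivity) across a plausible range: the basic move of one-way sensitivity analysis. Every parameter value is an assumption for illustration.

```python
# Toy outcome model (not the ICER model; every parameter is an assumption)
# used to illustrate one-way sensitivity analysis: vary a single uncertain
# input across a plausible range and watch the estimated outcome move.

def deaths_averted(sensitivity: float, prevalence: float = 0.005,
                   case_fatality: float = 0.4,
                   early_detection_benefit: float = 0.6,
                   population: int = 100_000) -> float:
    """Deaths averted if early detection cuts case fatality by a fixed fraction."""
    cases_found = population * prevalence * sensitivity
    return cases_found * case_fatality * early_detection_benefit

for sens in (0.55, 0.70, 0.85, 0.95):
    print(f"sensitivity {sens:.2f} -> {deaths_averted(sens):5.0f} deaths averted")
```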
Modeling, however, has prominent drawbacks and some vocal detractors.9 It is
time- and resource-intensive, and the many assumptions required in every modeling approach often produce new sources of uncertainty to consider. Decisionmakers may view modeling as trying to do too much with too little. Nonetheless,
for comparative effectiveness evaluations to address the questions of most rele-
vance to decisionmakers, modeling must serve as an important, if admittedly im-
perfect, methodological tool.

A Case Study: The Comparative Effectiveness Of CTC
The history of CT colonography (CTC) provides a vivid demonstration of the
complex relationship between the developmental pathway of an evolving imaging
technique, the studies performed to evaluate its effectiveness, and the process of
assembling and critically assessing that evidence for decisionmakers. CTC is a
screening technique in which a CT scanner is used to display two-dimensional
images or three-dimensional reconstructions of the bowel (thereby earning the
moniker of “virtual colonoscopy”). Given that only half of U.S. adults undergo rec-
ommended screening for colorectal cancer, some advocates have suggested that
the speed, relative ease, and availability of CTC give it the potential to markedly
improve the rates of cancer screening in the general population.10
Because of its potential for widespread use, the evidence on CTC has attracted
much scrutiny. Despite vocal support among some researchers and clinical groups,
serious doubts about the effectiveness of CTC and its potential impact on the
health care system have persisted after more than a decade of development and re-
search. Neither Medicare nor major national private health plans cover CTC for
general screening, and the widespread conclusion of independent evidence review
efforts in recent years has been that CTC is “not yet ready for widespread use.”11
In early 2008, a formal assessment of the comparative effectiveness of CTC for
colorectal cancer screening in the general population was undertaken by the Insti-
tute for Clinical and Economic Review (ICER) at Massachusetts General Hospital
in Boston.12 In each phase of the ICER assessment, specific questions arose regard-
ing how best to identify the appropriate evidence, synthesize it, and interpret it.
These questions illustrate the key evidentiary challenges presented by diagnostic
imaging and also highlight broader issues of framing evidence for decisionmakers
that are relevant for comparative effectiveness assessment efforts in any domain.
• Determining the scope of the assessment. To help inform the academic
work of the ICER assessment, a multidisciplinary workgroup was convened com-
prising academic and clinical experts from radiology and gastroenterology; medical
policy leaders from private health plans; manufacturer representatives; and inde-
pendent experts in clinical epidemiology, health policy, and ethics.13 At the outset of
the assessment, the CTC workgroup wrestled with the fundamental issue of what
the focus should be. Some members felt that the primary question should be “test
versus test”: how does the diagnostic accuracy of CTC compare to that of optical
colonoscopy? Optical colonoscopy is considered by many to be the “gold standard”
because of its ability to view all polyps and to prevent cancer by removing them. Because the effectiveness of CTC is also determined by its ability to view polyps, com-
paring the two had been the focus of the academic community for more than a de-
cade. Many members of the workgroup felt that when the diagnostic accuracy of
CTC was judged to be comparable to that of optical colonoscopy, the new technique
would be accepted, covered by insurance, and widely adopted.
Others in the workgroup, however, felt that the scope of the assessment needed
to be broader. Their argument was that CTC should be viewed as a diagnostic
strategy with both patient and system considerations. The key question was not
test versus test but how the introduction of a new CTC-based screening strategy
would compare to current practice and to other relevant screening strategies that
health systems could promote. Some health systems were in fact considering other
noninvasive options and viewed them as potential comparators for CTC.
ICER decided to frame its assessment to capture the broader perspective of di-
agnostic strategy versus diagnostic strategy. This approach would seek to identify
evidence not only on test performance but also on patient outcomes; on patients’
preference for CTC as opposed to optical colonoscopy; on the impact of CTC on
population screening rates; and on the broader risks and benefits of a CTC strat-
egy, including radiation exposure and the impact of incidental lesions discovered
outside the colon during CTC. Decision-analytic modeling was planned to ad-
dress this broader scope of the assessment. It was recognized, however, that many
clinicians and other decisionmakers would view diagnostic accuracy as the most
salient guide to a judgment of comparative effectiveness. The ICER assessment
therefore emphasized this evidence within the body of its review.
• Defining the technical criteria. Another important function of the CTC
workgroup during the scoping phase of the assessment was to help ICER establish
criteria for what versions of the imaging technique were relevant while providing in-
sight and guidance on the overall developmental trajectory of CTC and its compara-
tors. The goal was to establish criteria for technical specifications and interpreter
training that would reflect recent advances but that were also reasonable to expect
of radiological practice in the community. Extensive discussion on this balance
helped identify two technical device criteria and one interpreter-training criterion
that, if satisfied, meant that a study would be deemed to reflect evidence on “high-
quality CTC” and be eligible for inclusion in the ICER review.
These technical specification requirements, applied in combination with gen-
eral criteria for acceptable diagnostic study design and conduct, had a strong ef-
fect on the selection of relevant studies for the ICER assessment. From a comput-
erized literature search that initially identified ninety-seven eligible articles, only
nine met all of the criteria for formal inclusion in the review. This sharp narrowing
of admissible evidence highlights the importance of the judgment of appropriate
technical specifications. On this single judgment often turns the results of an en-
tire assessment. The identification of explicit criteria and transparency in their
application are thus of great importance to the legitimacy of any comparative effectiveness assessment of medical imaging.
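The narrowing from ninety-seven articles to nine can be pictured as a transparent filter over explicit criteria. In the sketch below, the field names and thresholds are invented stand-ins for the two device criteria and one interpreter-training criterion ICER actually used, which are not specified in this paper; the point is the explicit, auditable filter, not the particular numbers.

```python
# Invented stand-ins for ICER's two device criteria and one interpreter-
# training criterion; illustrates an explicit, auditable inclusion filter.
from dataclasses import dataclass

@dataclass
class Study:
    id: str
    detector_rows: int         # stand-in device criterion 1
    slice_thickness_mm: float  # stand-in device criterion 2
    prior_readings: int        # stand-in interpreter-training criterion

def meets_criteria(s: Study) -> bool:
    return (s.detector_rows >= 4 and s.slice_thickness_mm <= 2.5
            and s.prior_readings >= 25)

retrieved = [
    Study("A", 16, 1.25, 50),  # modern scanner, trained readers: included
    Study("B", 1, 5.0, 10),    # single-row scanner, thick slices: excluded
    Study("C", 4, 2.5, 10),    # device adequate, readers under-trained: excluded
]
included = [s.id for s in retrieved if meets_criteria(s)]
print(f"{len(included)} of {len(retrieved)} studies included: {included}")
```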


• Assembling and analyzing the evidence. As is typical of assessments of
medical imaging, not one of the nine articles identified as relevant for judging the
clinical effectiveness of CTC was an RCT; most were single-site case series, and none
had followed patients long enough after testing to be able to determine the impact of
CTC on subsequent screenings, cancer incidence, or mortality. The ICER assess-
ment therefore faced the common predicament of reviewing an evidence base lim-
ited to diagnostic accuracy studies when the goal was to inform decisionmakers
about the impact of a diagnostic imaging strategy on patient outcomes and the
health system. Although the decision-analytic model would aid in this effort, not
surprisingly, even the interpretation of the evidence on diagnostic accuracy raised
pointed evidentiary challenges.
CTC was initially developed and studied in the early 1990s, when preliminary
studies showed that it was poor at detecting medium-size (6–9 mm) colorectal
polyps that many clinicians felt were clinically significant. Within a couple of years,
however, these data were considered obsolete, as the use of more precise multi-
row CT scanners and improved bowel-cleansing regimens became widely estab-
lished. Then, as sometimes happens, a prominent study appeared that seemed to
demonstrate convincingly that CTC had leapfrogged ahead and could now claim
comparability to the gold standard of optical colonoscopy. In 2003, Perry
Pickhardt and colleagues published a study of the largest screening cohort to date
in which CTC was found to be essentially identical to colonoscopy in the detec-
tion of medium-size and larger polyps.14 False positive scans were infrequent, and
there was not a single complication of CTC in more than 1,000 patients screened.
The results signaled an important landmark in the development of CTC.
The Pickhardt study was not the end of the story, however. How had these re-
searchers accomplished such superior findings? The protocol for Pickhardt’s
study included an especially meticulous bowel preparation. In addition, a novel
system had been used that could digitally “cleanse” CT images of obscuring resid-
ual fluid in the bowel. These measures were combined with use of a sophisticated
three-dimensional imaging system, interpreted by well-trained top academic spe-
cialists. The total package typified the “best hands” problem. Following the publi-
cation of the Pickhardt study, the clinical and insurer communities waited expec-
tantly for further studies to confirm whether the excellent results could be
replicated by other clinicians in community settings.
To the surprise of many, markedly inferior results next appeared in two notable
studies. In 2004, Peter Cotton and colleagues reported results from a large multi-
institutional study in which CTC detected only 23 percent of medium-size lesions
and 52 percent of larger polyps.15 Relatively poor results from another large study
were also described by Don Rockey and colleagues in 2005.16 Faced with signifi-
cant variation in diagnostic accuracy across these major studies, no public or pri-
vate insurer moved to cover CTC for screening purposes.17
In evaluating this body of evidence, ICER again found that the CTC workgroup
helped provide important context. Some senior academic researchers in the field
knew that the Cotton and Rockey studies had actually been conducted earlier
than Pickhardt’s study and had not incorporated some of the more subtle ad-
vanced techniques that had subsequently become routine. There were also ques-
tions regarding the adequacy of radiologist training. For example, whereas radiol-
ogists in the Pickhardt study had been required to have a minimum of twenty-five
prior readings as training, radiologists at most centers in the Cotton study were
only required to have read ten prior CTC studies. Concern about the results from
the Rockey study was similar and was buttressed by a little-known reanalysis of
the original scans in which two experienced interpreters reread the scans and
found nearly all of the polyps that had been missed by the less experienced readers
in the original study.18
Unearthing concerns such as this about the validity of study results does not al-
ways simplify the path toward a judgment of the evidence. There is no purely evi-
dence-based way to determine whether the Cotton and Rockey studies should be
viewed as less relevant than Pickhardt’s data; it cannot be known with certainty
which results are more likely to best represent those that would occur in commu-
nity practice. In the end, ICER decided to exclude the Cotton data because upon
closer examination, the study protocol had actually not met the technical criteria
for interpreter training that ICER had established for inclusion in the assessment.
The Rockey study, however, still met criteria, and its data were not excluded. The
questions about its relevance were highlighted in the text of the assessment, and
to guide discussion, all quantitative meta-analyses of diagnostic accuracy data
were done two ways: with the Rockey data, and without them.
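A sketch of what "two ways" means in practice: pool the study-level sensitivities once with and once without the contested data. The pooling below is a simple fixed-effect inverse-variance combination on the logit scale (the ICER meta-analysis may well have used different methods), and every study count is invented.

```python
# Invented study counts; fixed-effect inverse-variance pooling of logit
# sensitivities, run with and without one contested study.
import math

def pooled_sensitivity(studies: list[tuple[int, int]]) -> float:
    """studies: (true positives, diseased n) pairs; returns pooled sensitivity."""
    num = den = 0.0
    for tp, n in studies:
        p = (tp + 0.5) / (n + 1.0)  # continuity-corrected proportion
        logit = math.log(p / (1 - p))
        var = 1 / (tp + 0.5) + 1 / (n - tp + 0.5)
        num += logit / var          # weight each study by 1/variance
        den += 1 / var
    return 1 / (1 + math.exp(-num / den))

base = [(48, 52), (61, 70), (90, 100)]  # hypothetical included studies
contested = (30, 60)                    # hypothetical outlier study
print(f"without contested study: {pooled_sensitivity(base):.2f}")
print(f"with contested study:    {pooled_sensitivity(base + [contested]):.2f}")
```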
• Judgment on limited evidence. Radiation risks. The CTC workgroup had advised ICER to include in the scope of its assessment three other aspects of CTC that illustrate important general lessons for comparative effectiveness assessments of medical imaging. One of these was the radiation risk of CTC. All estimates of the risk from the radiation doses produced by diagnostic imaging are controversial extrapolations from cancer death rates among adults exposed to much higher doses. The
ICER assessment outlined these evidentiary challenges and opted to display the
dose of CTC radiation exposure within a spectrum of other radiation risks: approxi-
mately half as much radiation as a standard lumbar spine x-ray but twenty to thirty
times more than a chest x-ray. ICER also summarized literature that suggested that
the estimated additional deaths from the increased radiation of CTC would be less
than the estimated attributable death rate from the complications of colonoscopy
with polypectomy.
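The arithmetic behind such projections is simple even though the underlying extrapolation is contested. The sketch below applies an assumed linear-no-threshold risk coefficient to an assumed per-exam dose; both are round illustrative numbers, not the figures in the ICER report.

```python
# Round illustrative numbers, not the figures in the ICER report; the
# linear-no-threshold (LNT) scaling itself is the contested step.
risk_per_sievert = 0.05   # assumed lifetime excess cancer mortality per sievert
dose_sv_per_exam = 0.01   # assumed ~10 mSv effective dose for one CTC exam
screened = 100_000

excess_deaths = screened * dose_sv_per_exam * risk_per_sievert
print(f"~{excess_deaths:.0f} projected excess cancer deaths "
      f"per {screened:,} exams under these assumptions")
```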
When this evidence was presented to the CTC workgroup for discussion, the
major uncertainties gave play to widely differing attitudes about radiation risks in
general. Most members of the CTC workgroup discounted a significant radiation
risk, but for others the radiation risk posed by CTC remained an important consideration, particularly since CTC was being discussed as a potential screening in-
tervention for large populations. Radiation risk is likely to be a core issue for com-
parative effectiveness assessment of most medical imaging techniques, and in the
absence of good evidence, it may divide many reviewers and decisionmakers.
Incidental findings. The second element within the ICER assessment that illus-
trates a core issue for assessments of diagnostic imaging is the topic of incidental
findings. The ICER review concluded that approximately 6–8 percent of asymp-
tomatic adults undergoing CTC will have some kind of lesion identified outside
the colon that will generate a recommendation for follow-up. Were CTC to be
adopted broadly, this rate of extracolonic findings could lead to sizable numbers
of patients’ requiring further investigation. But should that be considered a benefit
or a harm of CTC? The evidence available is too limited to determine whether early
detection of some aneurysms and cancers outside the colon produces clinical ben-
efit when considered alongside the additional risks, anxieties, and costs generated
by follow-up investigations of these and other less significant lesions.
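The scale concern is plain arithmetic, sketched below: the 6–8 percent range is the rate cited in the review, while the annual screening volume is an assumed round number.

```python
# The 6-8 percent extracolonic-finding rate is the one cited in the review;
# the annual screening volume is an assumed round number.
screened_per_year = 5_000_000  # assumed national CTC screening volume

for rate in (0.06, 0.08):
    workups = screened_per_year * rate
    print(f"at {rate:.0%}: {workups:,.0f} follow-up work-ups generated per year")
```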
As with radiation risk, the lack of good evidence led to divergent opinions
within the CTC workgroup on the weight to ascribe incidental findings in a judg-
ment of comparative effectiveness. For several members of the CTC workgroup,
the issue of incidental findings was the “swing vote” in their judgment of CTC. As
with radiation risks, incidental findings need to be viewed as a central component
of all comparative effectiveness assessments of diagnostic imaging, and assess-
ment efforts must be aware that the lack of good data creates a major challenge for
evidence synthesis and interpretation.
Screening intervals. The third aspect of CTC that reinforces a general issue for all
medical imaging assessment is the question of screening interval. The ICER as-
sessment found no evidence comparing the use of CTC at varying screening inter-
vals. Guidelines recommend a ten-year interval for optical colonoscopy, and so one
reasonable option for CTC would also be a ten-year interval. However, other
colorectal cancer screening methods are performed annually or biennially, leading
some experts in CTC to advocate for screening every three to five years.19 The
ICER assessment, with guidance again from the CTC workgroup, used its
decision-analytic model to analyze the outcomes under two screening-interval
scenarios: CTC every ten years; and CTC every five years.20 Not surprisingly, the
net harms, benefits, and costs of these two scenarios differed. CTC every ten years
was less effective than optical colonoscopy in life-years gained from screening, but
the model suggested that CTC every five years could be slightly more effective.21
This finding illustrates the general point that screening interval is a critically im-
portant consideration within a comparative effectiveness assessment of medical
imaging. Decision-analytic modeling will be the primary method by which to ex-
amine screening intervals, given the usual absence of direct data in published
studies.
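The interval question can be illustrated with a deliberately crude toy model, sketched below. It is not the ICER microsimulation, and every parameter is invented, but it shows the trade-off the interval decision balances: repeated screens catch more lesions, while longer gaps allow more lesions to progress between screens.

```python
# Deliberately crude toy model, not the ICER microsimulation; every
# parameter is invented. Repeated screens raise cumulative detection,
# while longer intervals let more lesions progress between screens.

def fraction_averted(interval_years: int, horizon: int = 30,
                     per_screen_sensitivity: float = 0.85,
                     annual_interval_miss: float = 0.02) -> float:
    """Toy fraction of cancers averted under repeated imperfect screening."""
    screens = horizon // interval_years
    escape_all = (1 - per_screen_sensitivity) ** screens      # missed every screen
    interval_penalty = annual_interval_miss * interval_years  # between-screen progression
    return max(0.0, (1 - escape_all) - interval_penalty)

for interval in (5, 10):
    print(f"CTC every {interval:>2} years -> {fraction_averted(interval):.2f} "
          f"of cancers averted (toy model)")
```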

Implications For Research And Policy
• For research. The ICER assessment of CTC faced all of the major challenges
presented generally by comparative effectiveness assessments of medical imaging.
There was no magic bullet with which to solve the problems created by a controver-
sial evidence base that tracked an evolving technique. ICER had to grapple with
how to interpret evidence limited by variations in CTC techniques and interpreter
training standards; it had to sort through many questions about how to assess the
generalizability of results to the broader community; and it had to help decision-
makers make sense of CTC without robust data on radiation risk, the longer-term
impact of incidental findings, or the impact of different screening intervals. Every
stage of the assessment—the scoping, analysis, and interpretation—required judg-
ment. But judgment, expressed clearly, need not undermine the legitimacy of an as-
sessment of a medical imaging technique.
• For stakeholders working together. From consideration of the conceptual
evidentiary challenges presented by medical imaging, we believe that several lessons
can be drawn to guide future medical-imaging comparative-effectiveness policy for
manufacturers, researchers, evidence reviewers, and health care decisionmakers.
First, manufacturers and clinical researchers should work together to improve the
evidence available for comparative effectiveness assessments. Key to their efforts
should be the design of standardized outcome measures and practitioner training
standards so that the results from different studies can be compared with more as-
surance. Studies of emerging imaging techniques should also seek to evaluate out-
comes beyond simple tests of diagnostic accuracy, including impact on clinicians’
thinking and therapeutic actions, and ideally on patient outcomes. In particular,
studies of medical imaging should be designed to follow patients and evaluate the
subsequent course of testing and the impact of incidental findings on outcomes and
health care use.
• For evidence reviewers. An important lesson for reviewers is the
value of longitudinal engagement and dialogue with an advisory group consisting of
a broad mix of academic leaders, clinical leaders, manufacturers, insurers, and pa-
tient advocates. The importance of this guidance cannot be overstated. Outside in-
put is the only way to get the scope of the comparative effectiveness question right;
the only way to gain insight into the nuances of the technique and the published lit-
erature; and the only way to remain aware of emerging changes to the imaging tech-
nique, of the impending release of new research findings, or of the emergence of a
“disruptive” new comparator.
Evidence reviewers should build into their assessments robust considerations
of radiation risk, incidental findings, and screening or follow-up interval, since
these are all critical to judgments of the balance of clinical risks and benefits. Re-
viewers should also acknowledge the many judgments they make within a com-
parative effectiveness assessment and should develop explicit criteria and trans-
parent rationales for the decisions that determine the selection, analysis, and
interpretation of evidence. The importance of explicitness and transparency is
just as vital to the legitimacy of a systematic review as it is to a complex decision-
analytic model used to perform a cost-effectiveness analysis.
• For decisionmakers. The policy lessons for health care decisionmakers, in-
cluding patients, clinicians, and insurers, relate to their role in managing the uncer-
tainty inherent in a comparative effectiveness assessment. First, decisionmakers
need to become actively engaged with an assessment and realize that their engage-
ment must be continuous to sustain the necessary dialogue between reviewers and
decisionmakers. Through this dialogue, decisionmakers will share in the uncertainty
and help frame the key judgments to guide the assessment forward. They will also
gain their own independent appreciation for the quality of the evidence, the nuances
of the analysis, and the best way to apply the findings in light of residual uncertainty.
Decisionmakers should work with reviewers to become increasingly comfort-
able with the application of decision-analytic modeling to questions of compara-
tive effectiveness. Modeling was the only way the ICER assessment of CTC could
explore the critical questions of how CTC compared with optical colonoscopy in
the number of lives saved; what the impact of false positive CTC scans might be;
how the risks of adverse outcomes from polypectomy match up against the radia-
tion risks of CTC; and how the number of colonoscopies and overall health care
costs would shift inside a health care system after the introduction of CTC as a
screening method. The ICER CTC workgroup included many who were not ex-
pert in decision analysis, yet they found the model results instrumental in consid-
ering CTC as part of a diagnostic strategy. Modeling introduces its own set of un-
certainties and judgments, but it remains the best way to try to develop
information for decisionmakers who want to evaluate the impact of medical imag-
ing on broader patient outcomes and on the health care system.

Attention to these lessons should lead to improved quality of com-
parative effectiveness assessments of medical imaging. Innovation in the
field of medical imaging is impressive. So are the cost pressures and quality
concerns within the U.S. health care system. Comparative effectiveness assess-
ment of medical imaging is likely to grow as all stakeholders appreciate the need
for better information to guide health care decisions. As these initiatives move for-
ward, the principles and practice of comparative effectiveness assessment will
need to remain mindful of the unique, but not insurmountable, challenges pre-
sented by medical imaging.

The Institute for Clinical and Economic Review (ICER) assessment of computed tomography colonography
(CTC) was supported by the Blue Shield of California Foundation and the Washington State Health Care
Authority. During the time of the CTC appraisal described in this paper, Steven Pearson also served as a part-time
consultant to America’s Health Insurance Plans (AHIP).

NOTES
1. G.R. Wilensky, “Developing a Center for Comparative Effectiveness Information,” Health Affairs 25 (2006):
w572–w585 (published online 7 November 2006; 10.1377/hlthaff.25.w572). See also E.J. Emanuel, V.R.
Fuchs, and A.M. Garber, “Essential Elements of a Technology and Outcomes Assessment Initiative,” Journal
of the American Medical Association 298, no. 11 (2007): 1323–1325.
2. G.S. Gazelle et al., “Cost-Effectiveness Analysis in the Assessment of Diagnostic Imaging Technologies,”
Radiology 235, no. 2 (2005): 361–370. See also D. Hailey and I. McDonald, “The Assessment of Diagnostic
Imaging Technologies: A Policy Perspective,” Health Policy 36, no. 2 (1996): 185–197.
3. A. Tatsioni et al., “Challenges in Systematic Reviews of Diagnostic Technologies,” Annals of Internal Medicine
142, no. 12, Part 2 (2005): 1048–1055.
4. D.G. Fryback and J.R. Thornbury, “The Efficacy of Diagnostic Imaging,” Medical Decision Making 11, no. 2
(1991): 88–94; B.J. McNeil and S.J. Adelstein, “Determining the Value of Diagnostic and Screening Tests,”
Journal of Nuclear Medicine 17, no. 6 (1976): 439–448; and H.V. Fineberg, R. Bauman, and M. Sosman, “Com-
puterized Cranial Tomography: Effect on Diagnostic and Therapeutic Plans,” Journal of the American Medical
Association 238, no. 3 (1977): 224–227.
5. J.J. Deeks, “Systematic Reviews of Evaluations of Diagnostic and Screening Tests,” British Medical Journal 323,
no. 7305 (2001): 157–162.
6. B.Y. Safadi, “Trends in Insurance Coverage for Bariatric Surgery and the Impact of Evidence-Based Re-
views,” Surgical Clinics of North America 85, no. 4 (2005): 665–680.
7. Tatsioni et al., “Challenges in Systematic Reviews.”
8. Gazelle et al., “Cost-Effectiveness Analysis.”
9. G.R. Wilensky, “Cost-Effectiveness Information: Yes, It’s Important, but Keep It Separate, Please!” Annals of
Internal Medicine 148, no. 12 (2008): 967–968.
10. B. Levin et al., “Screening and Surveillance for the Early Detection of Colorectal Cancer and Adenomatous
Polyps, 2008: A Joint Guideline from the American Cancer Society, the U.S. Multi-Society Task Force on
Colorectal Cancer, and the American College of Radiology,” Gastroenterology 134, no. 5 (2008): 1570–1595.
11. S. Banerjee and J. Van Dam, “Colonoscopy or CT Colonography for Colorectal Cancer Screening in 2006?”
Nature Clinical Practice: Gastroenterology and Hepatology 3, no. 6 (2006): 296–297.
12. Further information on the ICER assessment process is available at http://icer-review.org/index.php?
option=com_content&task=blogsection&id=13&Itemid=42 and http://icer-review.org/index.php?option=
com_content&task=view&id=14&Itemid=39 (accessed 11 September 2008).
13. A full description of the role of the ICER CTC workgroup and a list of participants is available in online
Appendix 1, http://content.healthaffairs.org/cgi/content/full/27/6/1503/DC1.
14. P.J. Pickhardt et al., “Computed Tomographic Virtual Colonoscopy to Screen for Colorectal Neoplasia in
Asymptomatic Adults,” New England Journal of Medicine 349, no. 23 (2003): 2191–2200.
15. P.B. Cotton et al., “Computed Tomographic Colonography (Virtual Colonoscopy): A Multicenter Com-
parison with Standard Colonoscopy for Detection of Colorectal Neoplasia,” Journal of the American Medical
Association 291, no. 14 (2004): 1713–1719.
16. D.C. Rockey et al., “Analysis of Air Contrast Barium Enema, Computed Tomographic Colonography, and
Colonoscopy: Prospective Comparison,” Lancet 365, no. 9456 (2005): 305–311.
17. Australian Medical Services Advisory Committee, Application 1095: Computed Tomography Colonography, No-
vember 2007, http://www.msac.gov.au/internet/msac/publishing.nsf/Content/app1095-1 (accessed 7 Au-
gust 2008); and California Technology Assessment Forum, Computed Tomographic (CT) Colonography (Virtual
Colonoscopy) for Screening of Colorectal Cancer, June 2004, http://www.ctaf.org/content/general/detail/535 (ac-
cessed 7 August 2008).
18. T. Doshi et al., “CT Colonography: False-Negative Interpretations,” Radiology 244, no. 1 (2007): 165–173.
19. P.J. Pickhardt, “Screening CT Colonography: How I Do It,” American Journal of Roentgenology 189, no. 2 (2007):
290–298.
20. Details on the structure of the model are available in online Appendix 2, as in Note 13.
21. For cost-effectiveness results, see R. Scherer, A. Knudsen, and S.D. Pearson, CT Colonography for Colorectal
Cancer Screening, ICER Final Appraisal Document, 27 January 2008, http://icer-review.org/index.php?
option=com_docman&task=doc_download&gid=22&Itemid= (accessed 29 September 2008).
