
Emerging Treatments and Technologies


Validation of the Archimedes Diabetes Model

DAVID M. EDDY, MD, PHD
LEONARD SCHLESSINGER, PHD

OBJECTIVE — To validate the Archimedes model of diabetes and its complications for a variety of populations, organ systems, treatments, and outcomes.

RESEARCH DESIGN AND METHODS — We simulated a variety of randomized controlled trials by repeating in the model the steps taken for the real trials and comparing the results calculated by the model with the results of the trial. Eighteen trials were chosen by an independent advisory committee. Half the trials had been used to help build the model (“internal” or “dependent” validations); the other half had not. Those trials comprise “external” or “independent” validations.

RESULTS — A total of 74 validation exercises were conducted involving different treatments and outcomes in the 18 trials. For 71 of the 74 exercises there were no statistically significant differences between the results calculated by the model and the results observed in the trial. Considering only the trials that were never used to help build the model (the independent or external validations), the correlation was r = 0.99. Including all of the exercises, the correlation between the outcomes calculated by the model and the outcomes seen in the trials was r = 0.99. When the absolute differences in outcomes between the control and treatment groups were compared, the correlation coefficient was r = 0.97.

CONCLUSIONS — The Archimedes diabetes model is a realistic representation of the anatomy, pathophysiology, treatments, and outcomes pertinent to diabetes and its complications for applications that involve the populations, treatments, outcomes, and health care settings spanned by the trials.

Diabetes Care 26:3102–3110, 2003

From the Care Management Institute, Kaiser Permanente and Kaiser Permanente Southern California, Oakland, California.
Address correspondence and reprint requests to David M. Eddy, 1426 Crystal Lake Rd., Aspen, CO 81611. E-mail:
Received for publication 24 February 2003 and accepted in revised form 24 July 2003.
L.S. holds stock in Merck and Pfizer.
Abbreviations: 4S, Scandinavian Simvastatin Survival Study; CAD, coronary artery disease; CARE, Cholesterol and Recurrent Events; DCCT, Diabetes Control and Complications Trial; DPP, Diabetes Prevention Program; FPG, fasting plasma glucose; HHS, Helsinki Heart Study; HOPE, Health Outcomes Prevention Evaluation; HPS, Heart Protection Study; IDNT, Irbesartan Diabetic Nephropathy Trial; IRMA, Irbesartan in Patients with Type 2 Diabetes and Microalbuminuria; LIPID, Long-Term Intervention with Pravastatin in Ischemic Disease; LRC-CPPT, Lipid Research Clinics Coronary Primary Prevention Trial; MRC, Medical Research Council; SHEP, Systolic Hypertension in the Elderly Study; UKPDS, U.K. Prospective Diabetes Study; VA-HIT, Veterans Affairs High-Density Lipoprotein Cholesterol Interventions Trial; WOSCOPS, West of Scotland Coronary Prevention Study.
A table elsewhere in this issue shows conventional and Système International (SI) units and conversion factors for many substances.
© 2003 by the American Diabetes Association.
See accompanying editorial, p. 3182.

The Archimedes diabetes model is described in a companion article in this issue (1). This article describes the validation of that model.

RESEARCH DESIGN AND METHODS
The purpose of any model is to estimate as accurately as possible for a given set of circumstances or actions whatever outcomes one wants to use the model to predict. To test how well the Archimedes diabetes model does this, we simulated a wide range of clinical trials. The studies were chosen by an independent advisory committee appointed by the American Diabetes Association. The trials were chosen by quality of design and importance of results and to collectively span a wide range of patient populations, organ systems, treatments, delivery settings, and outcomes. Half of the trials were used to help build the model (“internal” or “dependent” validations); the other half were not (“external” or “independent” validations).

For each validation exercise, we created a “virtual trial” by repeating the steps taken in the real trial and then compared the outcomes seen in the virtual trial with those that occurred in the real trial. To set up a validation exercise, we first had the model create a large virtual population containing a broad spectrum of ages, sexes, race/ethnicities, characteristics, behaviors, and diseases. We did this by having the model “give birth” to a very large number of people of different sexes and race/ethnicities and letting them grow up (i.e., letting their physiologies function according to the equations described in the companion article). Information from the National Health and Nutrition Examination Survey (NHANES)-III on the marginal and joint distributions of patient characteristics and other risk factors is used to ensure that the population is representative of the U.S. population (2). Other populations could be constructed if desired (e.g., an Indian reservation).

In general, the steps we used to simulate a particular clinical trial were as follows. We began with the initial description of the trial, focusing in particular on the inclusion and exclusion criteria, treatment protocols, follow-up protocols, and definitions of the outcomes. We then had the model do the following. 1) First, it searched the large population to identify people who met the entry criteria for the trial. Then it confirmed that their characteristics (e.g., age, sex, other conditions, treatments, and lab results) matched the distribution of characteristics published in the description of the trial. If not, over- or undersampling was performed as required, as would occur for a real trial. From that group, people were randomly selected to match the number of people in the trial. At the end of this selection process, the demographic, physiologic, and anatomic features, as well as the medical


Eddy and Schlessinger

histories of the people in the virtual trial, should match those of the people in the real trial, as far as can be determined from the publications and within sampling error. 2) If the description of the trial called for any interventions to be given before the people were randomized, such as a diet, then the simulated providers were instructed to give that intervention. 3) The people were then randomized into the number of groups used in the trial. 4) Simulated providers then gave the people in each group the designated treatments, using the protocols described for the trial. This included any important breaches in either provider or patient adherence that were described for the trial. 5) The people’s physiologies were allowed to continue to function, including the effects of whatever treatments they were receiving, all as determined by the equations in the model. 6) Simulated providers then followed each patient with simulated appointments and tests, using the protocols and intervals described for the real trial. 7) In the model, as in the real trial, between scheduled visits patients could also develop symptoms, seek care, make appointments, have visits, be tested, be diagnosed, and be treated, all as determined by the equations. 8) The results were recorded at the time intervals used in the real trials. 9) The results were then processed and compared with those described for the real trial.

All of this was done at whatever level of detail was necessary to simulate what was done in the real trial, using whatever descriptions were available from the publications. For example, if two trials reported retinopathy outcomes but one measured two-step retinopathy (3), whereas the other measured three-step retinopathy (4), we had the simulated physicians apply the appropriate protocol to the appropriate trial. This also applies to inclusion criteria. If hypertension was defined as “a finding on at least two of three consecutive measurements obtained 1 week apart . . . of a mean systolic blood pressure >135 mmHg or mean diastolic blood pressure >85 mmHg, or both” (5), then these were the guidelines that we had the simulated physicians follow. When the description of a trial included variables or diseases that were not yet in the Archimedes model, we ignored them. For example, the model does not yet include pregnancy. If a trial excluded pregnant women, we ignored that exclusion criterion. If a trial included variables or conditions that were not yet in the model at the time the simulation was requested, we expanded the model to include those factors before performing the simulation. If any information from such a trial was used to help expand the model, we noted that fact and classified the resulting validation as an internal or dependent validation. (The use of trial information will be described in more detail below.) For example, before the Irbesartan in Patients with Type 2 Diabetes and Microalbuminuria 2 trial (IRMA) (5) could be simulated, we had to expand the part of the model that represented the progression of untreated nephropathy at high levels of albuminuria and the effects of angiotensin-II receptor antagonists on glomerular function. The IRMA trial is therefore considered an internal or dependent validation.

The trials
The model was validated against 18 trials, all chosen by the independent advisory committee. Ten trials explicitly included people with diabetes. These are the U.K. Prospective Diabetes Study (UKPDS) (3), the Diabetes Prevention Program (DPP) (6), the Heart Protection Study (HPS) (7), the Health Outcomes Prevention Evaluation (HOPE) (8), Micro-HOPE (the diabetic subpopulation of the HOPE trial) (9), Cholesterol and Recurrent Events (CARE) (10), the ACE Inhibitors and Diabetic Nephropathy Trial (Lewis) (11), the IRMA-2 trial (5), the Diabetes Control and Complications Trial (DCCT) (4), and the Irbesartan Diabetic Nephropathy Trial (IDNT) (12). The CARE trial has also published results for age-group subpopulations (13). Eight more trials were chosen by the committee to test the model’s realism for representing coronary artery disease (CAD). They are the Long-Term Intervention with Pravastatin in Ischemic Disease (LIPID) trial (14), the Helsinki Heart Study (HHS) (15), the Systolic Hypertension in the Elderly Study (SHEP) (16), the Lipid Research Clinics Coronary Primary Prevention Trial (LRC-CPPT) (17), the Medical Research Council (MRC) hypertension trial (18), the West of Scotland Coronary Prevention Study (WOSCOPS) (19), the Veterans Affairs High-Density Lipoprotein Cholesterol Interventions Trial (VA-HIT) (20), and the Scandinavian Simvastatin Survival Study (4S) (21).

Use of trial data to build the model. Ten of the trials (DPP, HPS, MICRO-HOPE, LIPID, HHS, SHEP, LRC-CPPT, MRC, VA-HIT, and WOSCOPS) were not used at all to build the physiology model; they provided external or independent validations of the model. The remaining eight trials (UKPDS, HOPE, CARE, Lewis, IRMA-2, DCCT, IDNT, and 4S) provided internal or dependent validations. For these, the type of use varied from trial to trial but can be summarized as follows. In general, between 10 and 30 equations are needed to represent the pathophysiology of the disease and to calculate the effect of a specific treatment on a specific outcome in a specific population (i.e., not including the equations for behaviors, care processes, logistics, and other nonbiological aspects of the model). When a piece of information from a trial is used, it is used to help write only one of those 10–30 equations. A trial that significantly pushes the boundaries of the model might contribute two or three pieces of information, each to a particular equation. A trial’s results are never used to write or “fit” an equation, such as a regression equation or transition probability, that directly relates the population, treatment, and outcome. Indeed, there are no such equations in the model. When the results being matched are sampled outcomes, an iterative method is used, stopping when the calculated result and real result are within ±1 SD.

When information from a trial is used to help build the model, it is used to build some new or deeper part of the model. Thereafter, that part is used in all subsequent simulations. For example, the 4S trial was the primary source for information about the possible direct effects of Simvastatin on rates of coronary artery occlusion. Thereafter, that equation was used for all subsequent simulations involving statins; for example, simulation of the HPS study of Simvastatin did not use any data from the HPS trial. The high accuracy of the simulation of the HPS (Table 1) provides an independent check on the equation fitted with 4S data. As each new equation is written, it becomes a permanent part of the model; the parameters of an equation are never changed to fit particular trials. Previously performed simulations are rerun as needed to ensure that as the model advances it remains accurate for all of the trials. At any time there is always a single set of equations, and those


Validation of Archimedes diabetes model

Table 1—Comparison of model and trial results: trials that include people with diabetes

Name of trial | Population | Outcome | Years | Initial size | Treatment group | Model result (%) | Trial result (%)
UKPDS | Newly diagnosed type 2 diabetes | Myocardial infarction | 12 | 1,138 | Conventional | 19.6 | 19
 | | | | 2,729 | Intensive* | 15.4 | 16
 | | Albuminuria | 12 | 1,138 | Conventional | 33.8 | 34
 | | | | 2,729 | Intensive | 21.3 | 23
 | | Proteinuria | 12 | 1,138 | Conventional | 9.8 | 10.3
 | | | | 2,729 | Intensive | 7.6 | 6.8
 | | Retinopathy | 12 | 1,138 | Conventional | 50 | 49
 | | | | 2,729 | Intensive | 39 | 39
DPP† | Impaired glucose tolerance, impaired fasting glucose, and overweight | Progression to diabetes | 4 | 1,082 | Control | 38 | 37
 | | | | 1,073 | Metformin | 31 | 28
 | | | | 1,079 | Lifestyle | 21 | 20
HPS† | High risk for CAD events‡ | Major coronary events | 5 | 10,267 | Placebo | 11.7 | 11.8
 | | | | 10,269 | Simvastatin | 8 | 8.8
 | | CHD death | 5 | 10,267 | Placebo | 6.2 | 6.9
 | | | | 10,269 | Simvastatin | 5 | 5.5
HOPE | High CAD risk§ | Myocardial infarction | 4.5 | 4,652 | Placebo | 11.3 | 11.3
 | | | | 4,645 | Ramipril | 8.9 | 9
MICRO-HOPE† | High CAD risk, type 2 diabetes | Myocardial infarction | 4 | 1,808 | Placebo | 13 | 12.9
 | | | | 1,769 | Ramipril | 9 | 10.2
CARE‖ | Recent myocardial infarction, average cholesterol | Myocardial infarction | 5 | 2,078 | Placebo | 12.3 | 13.2
 | | | | 2,081 | Simvastatin | 9.3 | 10.2
 | | CHD death | 5 | 2,078 | Placebo | 6.2 | 5.7
 | | | | 2,081 | Simvastatin | 4.4 | 4.6
Lewis | Type 1 diabetes, nephropathy | Doubling of creatinine | 4 | 202 | Placebo | 37 | 33
 | | | | 207 | Captopril | 19 | 22
IRMA-2 | Type 2 diabetes, microalbuminuria | Nephropathy | 1.8 | 201 | Placebo | 17.4 | 15
 | | | | 195 | Irbesartan 150 | 9.5 | 9
 | | | | 194 | Irbesartan 300 | 5.3 | 4.5
DCCT primary | Type 1 diabetes without retinopathy | Retinopathy | 8 | 378 | Loose control | 34 | 38
 | | | | 348 | Tight control | 9.3 | 10
 | | Albuminuria | 8 | 378 | Loose control | 29 | 28
 | | | | 348 | Tight control | 17 | 15
 | | Proteinuria | 9 | 378 | Loose control | 32 | 25
 | | | | 348 | Tight control | 15 | 18
DCCT secondary | Type 1 diabetes with retinopathy | Retinopathy | 8 | 352 | Loose control | 52 | 48
 | | | | 363 | Tight control | 22 | 21
 | | Albuminuria | 8 | 352 | Loose control | 33 | 35
 | | | | 363 | Tight control | 22 | 22
 | | Proteinuria | 9 | 352 | Loose control | 9 | 11
 | | | | 363 | Tight control | 5 | 6
IDNT | Type 2 diabetes, nephropathy | Doubling of creatinine | 4 | 579 | Placebo | 35 | 37
 | | | | 569 | Irbesartan | 26 | 28

*Sulphonylurea, Metformin, or insulin; †not used to build physiology model; ‡CAD, occlusive arterial disease or diabetes; §CAD or diabetes plus at least one CVD risk factor; ‖eight additional validation exercises were done for the under-60 and over-60 age-groups. No model results were significantly different from trial results.
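As a rough illustration of how agreement between the Model and Trial columns can be summarized, the sketch below computes a Pearson correlation coefficient over the UKPDS and DPP rows of Table 1. It is only a sketch under stated assumptions: the paper reports correlations over all 74 exercises and does not describe its exact computation, the choice of rows is arbitrary, and the function name `pearson_r` is hypothetical.

```python
import math

# (model %, trial %) pairs copied from the UKPDS and DPP rows of Table 1.
# Assumption: this subset is for illustration only; the paper's r values
# are based on all 74 validation exercises.
pairs = [
    (19.6, 19.0), (15.4, 16.0),                 # UKPDS myocardial infarction
    (33.8, 34.0), (21.3, 23.0),                 # UKPDS albuminuria
    (9.8, 10.3), (7.6, 6.8),                    # UKPDS proteinuria
    (50.0, 49.0), (39.0, 39.0),                 # UKPDS retinopathy
    (38.0, 37.0), (31.0, 28.0), (21.0, 20.0),   # DPP progression to diabetes
]


def pearson_r(xy):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(xy)
    mean_x = sum(x for x, _ in xy) / n
    mean_y = sum(y for _, y in xy) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in xy)
    sxx = sum((x - mean_x) ** 2 for x, _ in xy)
    syy = sum((y - mean_y) ** 2 for _, y in xy)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(pairs)
print(f"Pearson r over {len(pairs)} arms: {r:.3f}")
```

A value of r close to 1 only says that the points lie near a straight line; the comparison against the 45° line (perfect accuracy) is a separate check.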

equations reproduce or predict every trial. The fact that the model is anchored to such a wide variety of populations, treatments, and outcomes guards against overspecification of the model.

With that background, the actual uses of trial data were as follows. Two trials contributed to the model of glucose homeostasis and the development and progression of diabetes (Fig. 1 of the companion article [1]). Specifically, the average fasting plasma glucose (FPG) in the control group of the UKPDS trial (22) was used to help write an equation for the effects of insulin resistance, and the DCCT’s results were used to help model the development of type 1 diabetes and the effect of glucose control. Data from several trials were used to help build models of the complications of diabetes. We used data from the UKPDS to help write the equations for the retinopathy and nephropathy features. The DCCT was used to help model the progression of nephropathy and retinopathy in patients with type 1 diabetes. The HOPE trial was used to model the effects of ACE inhibitors on variables such as peripheral resistance, fast and slow occlusion, the action of thrombolytics, and the progression of congestive heart failure. Data from the CARE trial were used to help build the part of the model that determines survival following myocardial infarction as a function of the proportion of the myocardium affected by myocardial infarction and the recovery of the myocardium following nonfatal myocardial infarction. Data from the Lewis trial were used to help estimate the progression of glomerular damage in people with established and severe nephropathy. Information from the IRMA-2 trial was used to model the progression of untreated nephropathy at high levels of albuminuria and to model the effects of angiotensin-II receptor antagonists on glomerular function. Finally, data from the 4S trial were used to model the effect of statins on development of thrombi.

Goodness of fit. To determine the accuracy of the model, we focused on outcomes determined by the underlying disease. Outcomes that are likely to be heavily influenced by local practices, such as the rate of bypasses, or by nondiabetes factors, such as deaths from other causes, have questionable external validity and were not included.

Figure 1—Comparison of model and trial: fraction of patients having myocardial infarctions in the UKPDS.

Figure 2—Comparison of model and trial: fraction of patients developing diabetes in the Diabetes Prevention Program.

For the disease-determined outcomes, we use Kaplan-Meier curves to compare the results calculated by the model with the actual results of the trial. Kaplan-Meier curves provide the most complete information about the outcomes over the entire time course of the trial in all the arms of a trial. Because the results of both a real trial and the model are subject to random variation, one would not expect the Kaplan-Meier curves to match exactly. Our approach is to calculate whether the differences are statistically significant or could be explained by chance. The published reports of trials rarely contain sufficient information to perform precise statistical comparisons of Kaplan-Meier curves. Specifically, because entry into a trial is usually staggered, the number of people actually followed to the last reported year of a trial is usually much smaller than the number of people entered, typically <25% of the starting sample size. To calculate the statistical significances of the differences, we used a very conservative method that assumes that everyone entered into a trial is followed for its full duration, with the provision that if there are known to be <100 people at the last follow-up time, we would use the results from the previous observation period. This method biases against the model because it greatly underestimates the random variation that affects the results toward the end of the real trial. With that limitation, we determine for each arm of a trial whether the difference between the trial and the model is statistically significant at the P = 0.05 level (corrected χ²). If not, we say that the model’s results “statistically match” the trial’s results. To gain an overview of the complete body of validation exercises, we also calculated cor-



Table 2—Comparison of model and trial results: trials of CAD

Name of trial | Population | Outcome | Years | Initial size | Treatment group | Model result (%) | Trial result (%)
LIPID* | Acute MI within 3–36 months, “broad range” of lipid levels | CHD death | 6 | 4,502 | Placebo | 7.5 | 8.3
 | | | | 4,512 | Pravastatin | 6.5 | 6
 | | Myocardial infarction | 6 | 4,502 | Placebo | 14.4 | 15.6
 | | | | 4,512 | Pravastatin | 11 | 12
HHS* | Middle aged men, dyslipidemia | Myocardial infarction | 5 | 2,030 | Placebo | 4.2 | 4.1
 | | | | 2,051 | Gemfibrozil | 3 | 2.7
4S | History of angina or acute myocardial infarction | Myocardial infarction | 5.4 | 2,223 | Placebo | 23.8 | 25
 | | | | 2,221 | Simvastatin | 14.2 | 16
SHEP* | Isolated systolic hypertension | CAD events | 4.5 | 2,371 | Placebo | 5.8 | 5.9
 | | | | 2,365 | Antihypertensive† | 4.5 | 4.3
LRC* | Primary hypercholesterolemia | Myocardial infarction | 4.5 | 1,543 | Placebo | 5.4 | 6
 | | | | 1,543 | Cholestyramine | 4 | 5
MRC* | Mild hypertension | Myocardial infarctions | 4 | 8,677 | Placebo | 4.5 | 4.5
 | | | | 8,677 | Antihypertensive‡ | 3.3 | 3.4
WOSCOPS* | Very high risk and hypercholesterolemia | Myocardial infarctions | 5 | 3,293 | Placebo | 5.2 | 7.9§
 | | | | 3,302 | Pravastatin | 2.6 | 5
 | | Coronary heart disease deaths | 5 | 2,078 | Placebo | 1.9 | 1.7
 | | | | 2,081 | Pravastatin | 1.1 | 1.2
VA-HIT | Previous CAD, low HDL | Myocardial infarctions | 5 | 1,264 | Placebo | 25.2 | 23
 | | | | 1,267 | Gemfibrozil | 17.8 | 19.7
 | | Coronary heart disease death | 5 | 1,264 | Placebo | 10.2 | 9.6
 | | | | 1,267 | Gemfibrozil | 8.7 | 8.4
 | | Stroke | 5 | 1,264 | Placebo | 4.2 | 6.6
 | | | | 1,267 | Gemfibrozil | 3.5 | 5.2

*Not used to build physiology model; †step 1: Chlorthalidone, step 2: Atenolol; ‡Bendrofluazide or propranolol; §difference between model results and trial results statistically significant, P < 0.01; ‖difference between model results and trial results statistically significant, P < 0.05.
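The per-arm significance check described under "Goodness of fit" (a corrected χ² test at the P = 0.05 level, treating everyone entered as followed for the full duration) can be sketched as below. This is a minimal sketch, not the authors' code. It assumes the virtual arm has the same size as the real arm, which the article does not state, and it rounds the published percentages to whole event counts. The function names `yates_chi2_p` and `compare_arm` are hypothetical.

```python
import math


def yates_chi2_p(events_a, n_a, events_b, n_b):
    """P value of a continuity-corrected (Yates) chi-square test on a 2x2 table."""
    a, b = events_a, n_a - events_a   # group A: events, non-events
    c, d = events_b, n_b - events_b   # group B: events, non-events
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def compare_arm(model_pct, trial_pct, arm_size):
    """Compare model and trial event rates for one arm.

    Assumptions (not from the article): the virtual arm has the same size as
    the real arm, and percentages are converted to rounded event counts.
    """
    model_events = round(model_pct / 100 * arm_size)
    trial_events = round(trial_pct / 100 * arm_size)
    return yates_chi2_p(model_events, arm_size, trial_events, arm_size)


# WOSCOPS placebo arm, myocardial infarctions (Table 2): flagged as discrepant.
p_woscops = compare_arm(5.2, 7.9, 3293)
# HHS placebo arm, myocardial infarction (Table 2): model and trial nearly equal.
p_hhs = compare_arm(4.2, 4.1, 2030)

for name, p in [("WOSCOPS placebo", p_woscops), ("HHS placebo", p_hhs)]:
    verdict = "statistically match" if p >= 0.05 else "differ"
    print(f"{name}: P = {p:.3g} ({verdict})")
```

Under these assumptions the WOSCOPS placebo arm falls below P = 0.01, consistent with the § footnote, while the HHS placebo arm is nowhere near significance.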

relation coefficients for the two sets of results.

RESULTS
Including each arm and each outcome reported in a trial as a validation exercise, to date the model has been subjected to 74 validation exercises involving the 18 trials. The use of Kaplan-Meier curves to compare the results of the model and trial is illustrated in Figs. 1 and 2. Figure 1 shows the curves calculated by the model and reported for the trial for the fraction of people who develop fatal or nonfatal myocardial infarctions in the UKPDS (3), a trial that was used to help build the model and thus represents an internal or dependent validation. This also illustrates the relatively unstable results that can occur at the longest follow-up time due to the steady decrease in sample size over time; <100 patients were followed for the full 15 years in the UKPDS. Figure 2 shows the results for the DPP. The DPP was not used to build the model; the results of the model were calculated based on the initial descriptions of the trial and publicly presented before the actual outcomes of the trial were published.

The results for the 10 trials that explicitly included patients with diabetes are summarized in Table 1. The results of the other trials that are pertinent to the cardiovascular complications of diabetes are summarized in Table 2. Trials not used to build the model are marked.

Goodness of fit
Of 74 validation exercises, the results of the model statistically matched the results of the trial in all but three exercises. Each of these was from a trial that was not used to build the model. In one of them, the stroke outcome in the placebo group of the VA-HIT trial, the results just barely reached statistical significance (P = 0.04), which is to be expected in 74 exercises. For this exercise, the model still estimated the effect of the treatment with good accuracy (1.7 vs. 1.4%, P > 0.05). The only exercises that showed a highly significant difference between the results of the model and the trial came from the WOSCOPS trial (19), which is discussed further. The correlation coefficient for all 74 exercises is r = 0.99 (Fig. 3). If the outcomes in the control group and the absolute differences between the control and treated groups are compared for model and trial, the correlation coefficient is r = 0.99. Focusing specifically on the absolute differences in the outcomes, which determines the number needed to treat, the correlation coefficient is r = 0.97. For the 10 trials that were not used to build the model, the correlation coefficient is also r = 0.99. (This includes the three discrepant results.)

Figure 3—Comparison of the results calculated by model with the results of the actual trials for 74 validation exercises. Filled circles compare the results observed in the trials (x-axis) and the results calculated by the model (y-axis) for independent or external validation exercises. Gray diamonds compare the results for dependent or internal validation exercises. The 45° line indicates perfect accuracy. The results will deviate from this line due to random factors as well as any inaccuracies in the model.

CONCLUSIONS — Our objective for the Archimedes diabetes model is to create a “virtual world” that represents clinical reality as realistically as is reasonably possible given today’s information and modeling methods. Once created, the model can be used to address a variety of clinical problems and questions. For example, interventions, guidelines, performance measures, disease management programs, strategic goals, implementation strategies, continuous quality improvement projects, and research projects can be “tried out” and optimized in the virtual world of the model in ways that may not be feasible in the real world.

The ability to use a model for these purposes depends critically on the accuracy of its estimates. Ultimately, this requires comparisons to real experiences. A starting point is to use the model to simulate real clinical trials and compare the results. If the results match within the expected degree of sampling variation, we gain confidence that the model’s representation of the pathophysiology of diabetes and its complications is reasonably realistic. This in turn builds confidence that the results of future applications will be reasonably accurate, at least for the populations, organ systems, treatments, outcomes, and care settings represented by the trials. To our knowledge, no other model in health care (for diabetes, CAD, or any other condition) has been validated against clinical trials as we describe here (23).

The results obtained to date are encouraging. First, they do suggest that the model is reasonably realistic. But they also carry information concerning the body of knowledge about diabetes and CAD that has been built up over the years. There are many remaining uncertainties and controversies about the pathogenesis of diabetes. These validations provide some assurance that despite these gaps, the information that does exist, at least as interpreted through this model, provides a reasonably sound basis for making decisions and setting policies.

Perhaps the single most important feature that distinguishes the Archimedes model from other clinical models is that it is based on a representation of human biology. The primary motivation for taking this approach is that it is the only way to achieve the objectives we had for the model, as described elsewhere (1). This justification notwithstanding, it is still reasonable to ask whether a nonbiological model might be able to achieve the same degree of predictive accuracy. We believe that it would be extremely difficult. The immediate problem is that just performing the validations requires a high degree of biological detail. For example, the outcome of the IRMA-2 trial was “a urinary albumin excretion rate that was >200 µg/min and at least 30% higher than the baseline rate on at least two consecutive visits” (5). To simulate that trial, a model would need to estimate the effect of Irbesartan on that outcome, as determined by that protocol, in 590 patients who ranged in age from 30 to 70 years, had type 2 diabetes of various durations, had systolic blood pressures >135 mmHg and diastolic blood pressures >85 mmHg, had initial albumin excretion rates ranging from 20 to 200 µg/min, had serum creatinine concentrations not exceeding 1.5 mg/dl, etc. Then, to deliver the same results as Archimedes, the model would need to simulate 17 other trials, each of which has an equally complex list of biological variables that address other populations, treatments, and outcomes. It is difficult to imagine how this would be done without a robust model of biology at the level of detail defined by these variables. Some may question the need for this level of detail in a simulation, but our operating principle is that if researchers and clinicians consider a variable or process sufficiently critical to be made part of a trial’s protocol, then it should be considered critical in the simulation of that protocol. Even if testing protocols can be loosened (e.g., “the average of three readings a week apart”), certainly the inclusion criteria (e.g., “a myocardial infarction in the past 2–30 months”) and outcomes (e.g., “increase in urinary protein of 30% over baseline”) are critical.

There is very little empirical evidence to bring to this question because there are extremely few validations of other models against clinical trials using any methodology. A complete analysis of this literature is beyond this article, but cautionary flags are raised by findings that the Framingham equation, which is the core of most Markov models of CAD complications in diabetes, was “disappointing” because of its inability to predict the incidence of CAD events in the Cardiff Diabetes Registry (24), although firm conclusions cannot be drawn due to the methods used in that study. The Framingham equation also misestimated CAD events in the UKPDS, by a factor of almost two for CAD events and a factor of five for coronary heart disease mortality (25). A comparison of the UKPDS risk engine and the Joint British Society (JBS) method found highly significant and clinically important differences in the proportions of people classified into different risk groups by those two models (26). Estimates by the Framingham and Prospective Cardiovascular Munster (PROCAM) models of the risk of coronary heart disease events in people with diabetes varied by a factor of more than two (27). Also noteworthy are the findings of wide differences in cross



tests of different diabetes models, even when each model is handed identical patients (28,29).

It is important to stress several points about the validation exercises. First, each of these exercises involves a very deep simulation. In each, the predicted results come from thousands of simulated individuals. Each of them has a simulated liver, heart, pancreas, and other organs. Each liver is producing glucose, each coronary artery can develop plaque or thrombus at any point in any artery, each kidney is clearing urine, and so forth. All told, each simulation involves scores of equations in every patient; they all have to work together correctly over long periods of simulated time in order to generate the outcomes seen in the virtual trials. In general, the results for the control groups test the realism of the model’s representation of the natural history of the disease, and the effects of the risk factors, patient characteristics, previous medical histories, severity of disease, and previous and concurrent medications, as described in the designs of the trials. The results for the treated groups test the model for all these plus the effects of the treatments.

A second point is that together these validations crisscross virtually every aspect of diabetes and its complications (see Table 1). We believe a model should be considered “validated” only for applications that are spanned by the trials used to validate it. A measure of this is whether the populations, treatments and outcomes for a proposed application have each been included in at least one trial against which the model has been validated. Thus the multiple validations reported here are not redundant; each is probing different parts of the model.

Third, whenever data from a trial were used to help build the model (i.e., the internal or dependent validations), they were used to address a very specific aspect of the underlying physiological process, usually one equation out of dozens that are needed to complete a calculation. For example, the rate of increase of FPG in the “conventional policy” group of the UKPDS was used to help build the model. But it was not used to fit an equation for FPG. It was used to help write the equation that describes the effect of insulin resistance on hepatic glucose production and the uptake of glucose by muscle (Equation 10 in the companion article). Nine other equations are also needed to simulate glucose homeostasis and calculate a person’s FPG, and many more equations are needed to calculate an end point like a myocardial infarction. Thus even when a validation exercise for Archimedes involves a trial that contributed some information to the model, in order for the validation to be successful dozens of other equations that were not touched by the trial need to function correctly. The validations of the eight trials that contributed to the model can be considered not only confirmations of the particular equations that were affected by each particular trial, but also independent tests of all the other equations needed to complete the calculations. Furthermore, an equation that was touched by any particular trial was independently validated by all the other exercises that involved other trials.

The validations have several limitations. First, the fact that they demonstrate the realism of our representation of the underlying biology of the disease does not mean that our representation is the only one capable of accomplishing similarly accurate predictions. All that can be said now is that the representation we have chosen is successful in producing accurate results for a wide variety of populations, treatments, outcomes, and settings. We know of no other representations that have been tested in this way that would permit any comparisons.

Second, the validations indicate that the model simulates what happens in clinical trials. However, this does not necessarily document the model’s accuracy for predicting what happens outside of trials. The issue is “efficacy” vs. “effectiveness.” The fact that patient and physician behaviors may be different in research settings than nonresearch settings affects all approaches for interpreting clinical trials, including expert judgment. The barrier to conducting validations outside of research settings is the availability of the necessary data. On the positive side, Archimedes includes the features needed to perform such validations, such as patient and practitioner behaviors, failure to follow protocols or reach treatment goals, both random and systematic variations in practices, errors in conducting or interpreting tests, and so forth. As better information on these factors becomes available from computerized medical records, we will perform these types of “effectiveness” validations. In the meantime, the model can be used to explore the potential effects of these factors and to identify which aspects of care processes are most important to monitor.

A similar limitation concerns the loss of patients to follow-up in real trials. In real trials, patients who die or are lost to follow-up are censored in the calculation of Kaplan-Meier curves. In the model, censoring can occur due to deaths. However, we do not model other reasons for losing patients unless the necessary information is published. This has the implication of assuming that, in a real trial, there are no patient selection biases affecting which patients are lost to follow-up. If there were information on this, Archimedes could include it. Lacking that, the model can be used to explore the potential importance of such biases.

A fourth limitation is that the model does not attempt to represent the underlying biology for causes of death other than diabetes and its complications, CAD, congestive heart failure, and asthma. The validation exercises presented here only address causes of death related to diabetes and CAD.

Fifth, our validation methods ignore variables or conditions (e.g., pregnant women) that might have been in the exclusion criteria of a trial but that are not in the model. In essence, this means that the validations are testing the explanatory power of the variables and conditions that the model does currently include.

A sixth limitation of the validations is that they do not evaluate any care processes that go beyond those that are described as part of a trial’s protocol. The validations also do not address the logistics, resources, or costs involved in the model. These factors can vary from setting to setting and cannot be validated in any general sense, the way a representation of human physiology or the effect of a treatment can. Our approach to this is to enable users to check the care processes and resources that are currently in the model and modify them as needed.

A seventh limitation derives from the fact that the model has been validated against the average or aggregated outcomes for populations because that is the information available from the published trials. The simulations do reproduce the complex spectrum of ages, race/ethnicities, previous medical histories, and so forth that are in each trial, but the outcomes have to be averaged before they can be compared with the averages published for
the trials. Some trials, such as the CARE and HOPE trials, have published results for some subpopulations, and Archimedes matches those results very well. Furthermore, the wide mix of populations and other factors across the different trials provides a between-trial check on the model’s realism for those factors: the model delivers accurate average results no matter how the populations and factors are mixed in the different trials. However, a more systematic analysis of these issues requires patient-specific information from trials.

Regarding individual patients, there are theoretical limits on the extent to which any model can ever be validated for predicting the outcomes for a particular patient. All we can say about the Archimedes model from these validations is that it has been reasonably accurate for a wide spectrum of populations with different mixtures of ages, sexes, race/ethnicities, complications, severities of disease, prior histories, concurrent treatments, and comorbid conditions. When person-specific data from clinical information systems become available, that potential use of the model can be explored in greater depth.

When a mismatch in a validation occurs, we examine it to determine its cause and whether any revisions to the model are appropriate. In the 74 validation exercises conducted thus far, the results for only one trial were substantially different from the real results. The discrepancy occurred in the control group of the WOSCOPS trial, where the model underestimated the rate of CAD events by ~35%. The model still predicted the absolute effect of pravastatin accurately. The discrepancy in the background rate of CAD events could be due to the presence of a risk factor in that population that was not measured or reported in the description of the trial and therefore could not be included in the virtual trial. Alternatively, it could be that the model’s representation of physiology is not accurate for that particular population. We have not added a “WOSCOPS factor” to the model to make it match this trial’s results.

This example emphasizes the fact that failure of a validation exercise does not necessarily mean the model is flawed. In addition to discrepancies due to random variations, the results can be thrown off by any changes in the treatment protocols in the real trial that are not described, poor adherence to protocols or treatments by practitioners or patients, incomplete follow-up, and/or important facts about the population that are not completely understood or described by the investigators. But beyond these is the fact that the intervention being studied in the trial might contain some surprises. Indeed, that is the very reason most trials are done. When a new trial reveals a result that could not have been predicted, we rejoice with everyone else about learning the new information and use it to advance the model.

Conversely, successful prediction of a trial’s results by the model without any use of the trial’s data, as occurred here for more than half of the trials (6,7,9,14–20), does not mean that these trials should not have been done. For example, the DPP not only confirms the interpretation of the previous research, which is very important in its own right, but also suggests that there are no surprises; our current understanding of the early natural history of the disease (at least as described by this model) appears to be correct. Furthermore, the DPP collected patient-specific data, which if analyzed with methods we describe elsewhere (30), could greatly increase our understanding of the pathophysiology of the disease.

In summary, we have tried to build a model that operates at the level of detail that clinicians and administrators consider important for their decisions. To strengthen the credibility and usefulness of the model we have tested it against a wide range of clinical trials. It appears to be a good representation of the anatomy, pathophysiology, tests, treatments, and outcomes of diabetes and its complications for applications that involve the range of populations, organ systems, treatments, outcomes, and care processes spanned by these trials. As additional information becomes available the model will be expanded and revalidated as needed.

Acknowledgments: The order of authorship is alphabetical. The development of this model was supported by Kaiser Permanente Southern California and the Care Management Institute of Kaiser Permanente. The advisory committee and performance of the validation exercises were supported in part by Bristol-Myers Squibb through an educational grant to the American Diabetes Association.

We thank the members of the advisory committee, especially Richard Kahn and John Buse, for overseeing the validations.

1. Eddy DM, Schlessinger L: Archimedes: a trial-validated model of diabetes. Diabetes Care 26:3093–3101, 2003
2. Third National Health and Nutrition Examination Survey (NHANES III, 1994) CD ROM Series 11, No 1. Hyattsville, MD, National Center for Health Statistics, 1988
3. UK Prospective Diabetes Study (UKPDS) Group: Intensive blood-glucose control with sulphonylureas or insulin compared with conventional treatment and risk of complications in patients with type 2 diabetes (UKPDS 33). Lancet 352:837–853, 1998
4. The Diabetes Control and Complications Trial Research Group: The effect of intensive treatment of diabetes on the development and progression of long-term complications in insulin-dependent diabetes mellitus. N Engl J Med 329:977–986, 1993
5. Irbesartan in Patients With Type 2 Diabetes and Microalbuminuria Study Group: The effect of irbesartan on the development of diabetic nephropathy in patients with type 2 diabetes. N Engl J Med 345:870–878, 2001
6. Diabetes Prevention Program Research Group: Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N Engl J Med 346:393–403, 2002
7. Heart Protection Study Collaborative Group: MRC/BHF Heart Protection Study of antioxidant vitamin supplementation in 20,536 high-risk individuals: a randomized placebo-controlled trial. Lancet 360:23–33, 2002
8. The Heart Outcomes Prevention Evaluation Study Investigators: Effects of an angiotensin-converting-enzyme inhibitor, ramipril, on cardiovascular events in high-risk patients. N Engl J Med 342:145–153, 2000
9. The Heart Outcomes Prevention Evaluation Study Investigators: Effects of ramipril on cardiovascular and microvascular outcomes in people with diabetes mellitus: results of the HOPE study and MICRO-HOPE substudy. Lancet 355:253–259, 2000
10. Sacks FM, Pfeffer MA, Moye LA, Rouleau JL, Rutherford JD, Cole TG, Brown L, Warnica JW, Arnold JMO, Wun CC, Davis BR, Braunwald E: The effect of pravastatin on coronary events after myocardial infarction in patients with average cholesterol levels. N Engl J Med 335:1001–1009, 1996
11. Lewis EJ, Hunsicker LG, Clarke WR,
Bain RP, Berl T, Rohde R, Raz I: The effect of angiotensin-converting-enzyme inhibition on diabetic nephropathy. N Engl J Med 329:1456–1462, 1993
12. Lewis EJ, Hunsicker LG, Clarke WR, Tomas P, Pohl MA, Lewis JB, Ritz E, Atkins RC, Rohde R, Raz I: Renoprotective effect of the angiotensin-receptor antagonist irbesartan in patients with nephropathy due to type 2 diabetes. N Engl J Med 345:851–860, 2001
13. Lewis SJ, Moye LA, Sacks FM, Johnstone DE, Timmis G, Mitchell J, Limacher M, Kell S, Glasser SP, Grant J, Davis BR, Pfeffer MA, Braunwald E: Effect of pravastatin on cardiovascular events in older patients with myocardial infarction and cholesterol levels in the average range: results of the cholesterol and recurrent events (CARE) trial. Ann Intern Med 129:681–689, 1998
14. LIPID Study Group: Prevention of cardiovascular events and death with pravastatin in patients with coronary heart disease and a broad range of initial cholesterol levels. N Engl J Med 339:1349–1357, 1998
15. Frick MH, Elo O, Haapa K, Heinonen OP, Heinsalmi P, Helo P, Huttunen JK, Kaitaniemi P, Koskinen P, Manninen V, et al: Helsinki Heart Study: primary-prevention trial with gemfibrozil in middle-aged men with dyslipidemia: safety of treatment, changes in risk factors, and incidence of coronary heart disease. N Engl J Med 317:1237–1245, 1987
16. SHEP Cooperative Research Group: Prevention of stroke by antihypertensive drug treatment in older persons with isolated systolic hypertension: final results of the Systolic Hypertension in the Elderly Program (SHEP). JAMA 265:3255–3264, 1991
17. The Lipid Research Clinics Coronary Primary Prevention Trial results. I. Reduction in incidence of coronary heart disease. JAMA 251:351–364, 1984
18. Medical Research Council Working Party: MRC trial of treatment of mild hypertension: principal results. BMJ 291:97–104, 1985
19. Shepherd J, Cobbe SM, Ford I, Isles CG, Lorimer AR, Macfarlane PW, McKillop JH, Packard CJ, the West of Scotland Coronary Prevention Study Group: Prevention of coronary heart disease with pravastatin in men with hypercholesterolemia. N Engl J Med 333:1301–1307, 1995
20. Rubins HB, Robins SJ, Collins D, Fye CL, Anderson JW, Elam MB, Faas FH, Linares E, Schaefer EJ, Schectman G, Wilt TJ, Wittes J: Gemfibrozil for the secondary prevention of coronary heart disease in men with low levels of high-density lipoprotein cholesterol. N Engl J Med 341:410–418, 1999
21. Scandinavian Simvastatin Survival Study Group: Randomized trial of cholesterol lowering in 4444 patients with coronary heart disease: the Scandinavian Simvastatin Survival Study (4S). Lancet 344:1383–1389, 1994
22. UK Prospective Diabetes Study Group: Relative efficacy of randomly allocated diet, sulphonylurea, insulin, or metformin in patients with newly diagnosed non-insulin dependent diabetes followed for three years (UKPDS 13). BMJ 310:83–88, 1995
23. Medline search for “model” and “validation,” “validity,” or “accuracy.” Available from Accessed 19 September 2003
24. McEwan PC, Peters J, Currie CJ, Hopkins P, Griffiths JD, Williams JE: The unreliability of Framingham risk equations in predicting coronary heart disease (CHD) events in diabetes (Abstract). Diabetes 49 (Suppl. 1):A187, 2000
25. Yeo WW, Yeo KR: Predicting CHD risk in patients with diabetes mellitus. Diabet Med 18:341–344, 2001
26. Song S, Brown P: Comparison of UKPDS Risk Engine Model with Framingham-Based Method in the assessment of CHD risk in patients with diabetes mellitus and its clinical implications (Presented at ADA Annual Meeting, New Orleans, LA, 14 June 2003). Diabetes 52 (Suppl. 1):41-OR, 2003
27. Game FL, Jones AF: Coronary heart disease risk assessment in diabetes mellitus: a comparison of PROCAM and Framingham risk assessment functions. Diabetes UK. Diabet Med 18:355–359, 2001
28. Brown JB, Palmer AJ, Bisgaard P, Chan W, Pedula K, Russell A: The Mt. Hood Challenge: cross-testing two diabetes simulation models. Diabetes Res Clin Pract 50 (Suppl.):S57–S64, 2000
29. Mt. Hood Challenge. II. San Francisco, CA, 12–13 June 2002
30. Schlessinger L, Eddy DM: Archimedes: a new model for simulating health care systems: the mathematical formulation. J Biomedical Informatics 35:37–50, 2002