Ultrasound Obstet Gynecol 2001; 18: 357–365

Comparison of ‘pattern recognition’ and logistic regression models for discrimination between benign and malignant pelvic masses: a prospective cross validation

Blackwell Science Ltd

L. VALENTIN, B. HAGEN*, S. TINGULSTAD* and S. EIK-NES
Departments of Obstetrics and Gynecology, Lund University, University Hospital, Malmö, Sweden and *Trondheim University Hospital,
Trondheim, Norway

KEYWORDS: Doppler ultrasound, Multiple logistic regression model, Ovarian cancer, Ovarian tumor, Pattern recognition, Pelvic tumor, Ultrasound

ABSTRACT

Objectives To test prospectively the diagnostic performance of two logistic regression models for calculation of individual risk of malignancy in adnexal tumors (the ‘Tailor model’ and the ‘Timmerman model’), and to compare them to that of ‘pattern recognition’ (subjective evaluation of the gray-scale ultrasound image and color Doppler ultrasound examination).

Design Consecutive women with a pelvic mass judged clinically to be of adnexal origin underwent preoperative ultrasound examination including color and spectral Doppler examination. The same examination techniques and definitions as those used in the studies in which the logistic regression models had been created were used. The Tailor model was tested in 133 women (35 of whom had a malignancy) and the Timmerman model in 82 women (29 of whom had a malignancy). A subset of 79 women (28 of whom had a malignancy) was used to compare the performance of the Tailor model and the Timmerman model by calculating and comparing the areas under the receiver operating characteristics curves of the two models. Sensitivity and specificity with regard to malignancy were calculated for all three methods.

Results Pattern recognition performed better than the two logistic regression models (sensitivity around 85%, specificity around 90%). Using a risk of malignancy of > 50% to indicate malignancy (as suggested in the original publications), the sensitivity of the Tailor model was 69% and the specificity 88% (n = 133). The corresponding values for the Timmerman model were 62% and 79% (n = 82). The receiver operating characteristics curves showed the two logistic regression models to have similar diagnostic properties (area under the curve, 0.87 vs. 0.84; P = 0.25; n = 79). The diagnostic performance of the mathematical models was much poorer in this study than in those in which the models had been created.

Conclusion The poor diagnostic performance of the mathematical models can probably be explained by subtle differences in definitions and examination technique and by differences between the original tumor populations and the study population. For mathematical models to be generally useful, they probably need to be created on the basis of a very large number of tumors, and the variables in the model must be unequivocally defined and the examination technique meticulously standardized.

INTRODUCTION

It is important to be able to discriminate between benign and malignant adnexal masses, because a correct diagnosis makes it possible to optimize and individualize treatment, for example, to choose expectant management, puncture, or surgery, and to choose the time and method of operation. In most cases, an experienced sonographer can confidently and correctly distinguish between benign and malignant adnexal masses on the basis of subjective evaluation of the gray-scale ultrasound image with or without the added information of results of color Doppler ultrasound examination (‘pattern recognition’). The figures reported for the sensitivity of pattern recognition range from 88% to 98%, and those for specificity from 89% to 96%1–3. However, for less experienced sonographers, ultrasound methods other than pattern recognition might be preferable. Recently, multiple logistic regression models or artificial neural networks using clinical information and results of gray-scale and Doppler ultrasound examination have been advocated as excellent methods

Correspondence: Dr L. Valentin, Department of Obstetrics and Gynecology, Lund University, University Hospital, Malmö, SE 205 02 Malmö, Sweden
(e-mail: lil.valentin@obst.mas.lu.se)
Received 1-12-00, Revised 23-4-01, Accepted 11-6-01

ORIGINAL PAPER 357


Pelvic tumors Valentin et al.

for calculating individual risks of malignancy in adnexal masses4–6. The multiple logistic regression model suggested by Tailor and coworkers4 includes information on the woman’s age, the presence of papillary projections in the mass, and the highest time-averaged maximum velocity recorded from the tumor using Doppler ultrasound. The model of Timmerman and colleagues5 uses information on menopausal status, the presence of papillary projections, the color content of the color Doppler tumor scan, and serum CA 125 values. The model of Tailor and coworkers had a sensitivity of 93% and a specificity of 90%, when a risk of malignancy of > 25% was used to indicate malignancy4. The corresponding figures for the logistic regression model suggested by Timmerman and colleagues were 96% and 87%5. In the two studies cited, the mathematical models were created and tested in the same group of patients. Therefore, their diagnostic performance was almost certainly overestimated, and both research teams emphasize the necessity of testing their models prospectively in a new series of patients4,5.

The purpose of this study was to test prospectively, in a new series of patients with tumors, the diagnostic performances of the mathematical models designed by Tailor and coworkers4 and by Timmerman and colleagues5 and to compare them to that of pattern recognition by an experienced sonographer using a good ultrasound system.

METHODS

The logistic regression models tested were that of Tailor and coworkers4 referred to here as ‘the Tailor model’:

Probability of malignancy = $1/(1 + e^{-z})$,

where $z = (0.1273 \times \text{age}) + (0.2794 \times \text{time-averaged maximum velocity}) + (4.4136 \times \text{papillary projection score}) - 14.2046$ and $e$ is the mathematical constant and base value of natural logarithms, and that of Timmerman and colleagues5 referred to here as ‘the Timmerman model’:

Probability of malignancy = $1/(1 + e^{-z})$,

where $z = (2.6369 \times \text{color score}) + (0.0225 \times \text{CA 125}) + (7.1062 \times \text{papillary projection score}) + (2.6423 \times \text{postmenopausal score}) - 13.6796$ and $e$ is the mathematical constant and base value of natural logarithms.

The study was conducted at the university hospitals of Trondheim, Norway, and Malmö, Sweden. One hundred and fifty-seven consecutive women scheduled for laparotomy or laparoscopic surgery because of a pelvic mass judged clinically to be of adnexal origin were recruited for the study and underwent preoperative ultrasound examination as described below. Twenty-one women were excluded for the following reasons: in 14 women surgery was canceled and replaced by clinical follow-up; in one woman, representative tissue for histological diagnosis was not obtained despite laparotomy; in one woman, only diagnostic laparoscopy was performed; two women were operated on abroad and no histopathological report was obtained; two women died before laparotomy and did not undergo autopsy; one woman had two different lesions in the same ovary. Thus, 136 women were included. Of these, 57 (42%) were examined in Trondheim and 79 (58%) in Malmö. One hundred and ten women had only one tumor, 24 women had two pelvic tumors, and two women had three or more pelvic tumors. For statistical reasons, each woman contributed only one tumor to the study, the most relevant tumor (the largest or most complex one) being selected for inclusion. Only one woman had a benign lesion (serous papillary cystadenoma) on one side and a malignant lesion (mucinous cystadenocarcinoma) on the other, the malignant lesion being included for analysis. Informed consent was obtained from all the participants after the nature of the procedures had been fully explained. The study was conducted in agreement with the Declaration of Helsinki principles7.

The women underwent ultrasound examination by the first author (L.V.) within 2 weeks preceding the operative procedure (laparotomy or laparoscopic surgery) but without regard to the day of the menstrual cycle. Immediately before the start of the examination, the women were interviewed by the sonographer in a structured manner about menopausal status, use of hormonal therapy, previous hysterectomy and symptoms. To ensure that the same definitions and examination technique were used in this study as in the original studies, the methods sections of the original publications4–6 were scrutinized, and some of the authors of the other research teams (Dirk Timmerman, Thomas Bourne and Anil Tailor) were contacted to clarify uncertainties. Thus, postmenopause was defined as ≥ 1 year after the last menstruation, and a papillary projection was defined as a solid projection into a cyst cavity from the cyst wall of > 3 mm in height5. Women who had undergone hysterectomy and who were > 50 years old were classified as postmenopausal5.

The examinations started with transabdominal and/or transvaginal real-time gray-scale ultrasound examination of the pelvis, transvaginal examination being carried out with the woman in the lithotomy position and with an empty bladder. All Doppler examinations were performed transvaginally. The length (L), depth (D) and width (W) of each tumor were measured in cm with calipers on the frozen ultrasound image, tumor volume (cm³) being calculated as L × D × W × 0.5. Based on the gray-scale ultrasound image, each mass was classified as a unilocular cyst, a multilocular cyst, a unilocular solid cyst, a multilocular solid tumor, or a solid tumor8,9. The presence of papillary projections was noted. Tumor vascularization was visualized by color Doppler, each tumor being characterized by the color content of the tumor scan; a color score of 1, 2, 3, or 4 was assigned to the tumor5. Standardized settings of the ultrasound systems were used. Having assigned a color score to the tumor, the sonographer identified the tumor artery with the highest blood flow velocity as described by Tailor and coworkers4. After having completed the examination, the sonographer classified each tumor as benign or malignant on the basis of pattern recognition as previously described1,2. All examinations were documented on videotape and as hard copies.

The ultrasound examinations in Trondheim were carried out using an Acuson 128 XP ultrasound system equipped with a 4-MHz transabdominal and a 7-MHz transvaginal transducer (Acuson Inc., Mountain View, CA, USA). Both in the color and spectral modes, the Doppler ultrasound had a frequency of 5 MHz. A high-pass filter with a cut-off level of

358 Ultrasound in Obstetrics and Gynecology



125 Hz was used. The output energy of the Doppler instrument did not exceed 500 mW/cm² (spatial peak temporal average intensity). The ultrasound examinations in Malmö were carried out using a Sequoia ultrasound system equipped with a 2.5–4-MHz transabdominal transducer and a 5–8-MHz transvaginal transducer. In the color mode the Doppler ultrasound frequency was 7 MHz; in the spectral mode it was 5 MHz. Occasionally the color Doppler frequency was lowered to enable examination of the most distant parts of large tumors. The mechanical and thermal indices were kept at < 1.0, except during short periods when high-energy output was necessary.

Arterial Doppler shift spectra obtained with the Acuson 128 XP ultrasound system were analyzed offline from the videotapes, whereas those obtained with the Sequoia ultrasound system were analyzed online. The built-in software of the ultrasound systems was used. Three uniform consecutive heart beats were analyzed and the resulting values averaged. The analysis was based on the envelope of the Doppler shift spectrum, the time-averaged maximum velocity derived from the waveform with the highest peak systolic velocity being selected to characterize the tumor4.

Blood was drawn preoperatively for analysis of serum CA 125. In Trondheim, CA 125 was analyzed by ELISA CA 125 II (Centocor, Malvern, Pa, USA). In Malmö, ELSA CA 125 II (Cis-Bio, Gif-sur-Yvette, Cedex, France) was used.

The final diagnosis was made on the basis of histological examination of the respective specimens and on classification of malignant ovarian tumors by the attending physician in accordance with the system recommended by the International Federation of Gynecology and Obstetrics10. For the purpose of statistical analysis, borderline tumors were classified as malignant tumors. The sensitivity and specificity with regard to malignancy were calculated for pattern recognition and for the two mathematical models using the cut-off values for risk of malignancy suggested in the original publications, i.e. a risk of > 25% or > 50% to indicate malignancy4,5. Receiver operating characteristic curves (ROC curves) were generated with the Statistical Package for the Social Sciences (SPSS Inc., Chicago, IL, USA, 1989–99) and GraphROC for Windows (downloaded from http://members.tripod.com/refstat/). These curves were used to determine if cut-off values other than those suggested in the original publications4,5 had better diagnostic properties11. The GraphROC for Windows software enabled statistical testing of ROC curves by comparing the areas under the graphs12.

The Tailor model4 could be tested prospectively in 133 women (referred to here as ‘the Tailor group’), time-averaged maximum velocity not having been recorded in three patients (only venous flow was detected in one patient, and technical problems explain the missing values in the other two). The Timmerman model5 could be evaluated in 82 women (referred to here as ‘the Timmerman group’), CA 125 values missing in 54 women. Seventy-nine women had all the information necessary to calculate the risk of malignancy using both mathematical models, i.e. the 82 women in the Timmerman group minus the three women with missing values for time-averaged maximum velocity. The histological diagnosis in the three women with missing values for time-averaged maximum velocity were serous borderline tumor, mucinous cystoma, and adenofibroma. The subset of 79 women was used to compare the performances of the Tailor model and the Timmerman model by calculating and comparing the areas under the ROC curves of the two models.

The statistical significance of differences in unpaired continuous data was determined using the Mann–Whitney U-test. The chi-square test with continuity correction and Fisher’s exact test were considered appropriate to test the statistical significance of differences in unpaired categorical data. The McNemar test was used to test the statistical significance of differences in sensitivity, specificity and accuracy. Non-parametric testing was chosen, as all continuous variables tested manifested a skewed distribution. Two-tailed P-values are given with 5% as the level of significance. All statistical analyses except Fisher’s exact test and the McNemar test were carried out using the Statview SE + Graphic statistical program (Abacus Concepts, Inc., Berkeley, CA, USA, 1988) and SPSS (SPSS Inc., Chicago, Illinois, USA, 1989–97) was used to carry out Fisher’s exact test. Calculations for the McNemar test were made using the StatXact-3 statistical program (Cytel Software Corporation, Cambridge, MA, USA, 1995). Exact confidence intervals (95% CIs) were calculated using the binomial distribution.

RESULTS

The final diagnoses are shown in Table 1. In the Tailor group, 26% (35/133) of the tumors were malignant, 8% (11/133) of all tumors being borderline ovarian tumors. The corresponding figures in the Timmerman group were 35% (29/82) and 11% (9/82). Ovarian malignancies in Stage I (including borderline cases) comprised 40% (14/35) of the malignancies in the Tailor group and 41% (12/29) of those in the Timmerman group.

Clinical information and results of ultrasound examinations and CA 125 analyses in this study and in the original studies4,5 are presented in Tables 2 and 3. The tables show some important differences between our study and the original ones. The patients in our study were slightly older and their tumors somewhat bigger than those in the study of Tailor and coworkers4. Solid malignant tumors were more common in our study than in that of Timmerman and colleagues5. Moreover, the differences between benign and malignant tumors with regard to the presence of papillary projections and high color score were smaller in our study than in the original studies4,5.

Receiver operating characteristic curves drawn using the results of the 133 women in the Tailor group (area under the ROC curve, 0.86) and the 82 women in the Timmerman group (area under the ROC curve, 0.83) showed the best cut-off of the Tailor model to be 1.2%, whereas the best cut-off of the Timmerman model was 6.9% (Table 4). Receiver operating characteristic curves drawn using the results of the 79 women who had all the information necessary to calculate the risk of malignancy using both logistic regression models are shown in Figure 1. The figure shows the two models to have similar diagnostic properties (areas under the ROC curve, 0.87 and 0.84; P = 0.25).


The sensitivity, specificity, and accuracy with regard to malignancy, of pattern recognition and of the two logistic regression models in this study and in the original studies4,5, are shown in Table 4. In the Tailor group, pattern recognition had a sensitivity of 83%, a specificity of 91%, and an accuracy of 89%. At a sensitivity of 83%, the Tailor model had a specificity of 69% and an accuracy of 73% (a risk of ≥ 5.3% indicating malignancy). These differences in specificity and accuracy at fixed sensitivity (83%) are statistically significant (P < 0.0001 and P = 0.0002, respectively). In the Timmerman group, pattern recognition had a sensitivity of 86%, a specificity of 87%, and an accuracy of 87%. At a sensitivity of 86%, the specificity of the Timmerman model was 66% and the accuracy 73% (a risk of ≥ 7.5% indicating malignancy). These differences in specificity and accuracy at a fixed sensitivity of 86% between pattern recognition and the Timmerman model were statistically significant (P = 0.0074 and 0.02, respectively). When pattern recognition was compared to the best cut-off of the Tailor model in this study (i.e. a risk ≥ 1.2% indicating malignancy), pattern recognition had lower sensitivity (83% vs. 94%, P = 0.22) but better specificity and accuracy (specificity, 91% vs. 60%; P < 0.0001 and accuracy, 89% vs. 69%; P < 0.0001). When pattern recognition was compared to the best cut-off of the Timmerman model (i.e. a risk ≥ 6.9% indicating malignancy), the sensitivity of the two methods was similar (86% vs. 90%; P = 1.0), but pattern recognition had better specificity and accuracy (specificity, 87% vs. 66%; P = 0.007 and accuracy, 87% vs. 74%; P = 0.04).

Tumor types over-represented among false-negative, false-positive and true-negative diagnoses are shown in Table 5. Irrespective of the diagnostic method used, borderline tumors were more common among false-negative diagnoses than among true-positive ones. When pattern recognition or the

Table 1 Final diagnoses

Tailor group (n = 133) Timmerman group (n = 82)


Diagnosis Patients (n (%)) Stage I (n (%)) Patients (n (%)) Stage I (n (%))

Benign diagnoses
Dermoid cyst 16 (16) 8 (15)
Endometriosis 14 (14) 11 (21)
Mucinous cystadenoma 13 (13) 7 (13)
Serous cystadenoma 11 (11) 4 (8)
Benign cyst 11 (11) 5 (9)
> one benign diagnosis 7 (7) 2 (4)
Adenofibroma 6 (6) 4 (8)
Torsion of adnexa 5 (5) 3 (6)
Myoma 5 (5) 3 (6)
Fibroma 3 (3) 2 (4)
Functional cyst 3 (3) 1 (2)
Hydrosalpinx 2 (2) 2 (4)
Peritoneal cyst 1 (1) 1 (1)
Abscess 1 (1) 0 (0)
All 98 (100) 53 (100)
Borderline ovarian tumors
Mucinous 7 (63) 7 (100) 4 (44) 4 (100)
Serous 3 (27) 2 (67) 4 (44) 3 (75)
Mucinous and serous 1 (9) 1 (100) 1 (11) 1 (100)
All 11 (100) 10 (91) 9 (100) 8 (89)
Primary invasive ovarian tumors
Serous cystadenocarcinoma 3 (23) 0 3 (23) 0
Mucinous cystadenocarcinoma 3 (23) 0 3 (23) 0
Endometroid cystadenocarcinoma 3 (23) 1 (33) 3 (23) 1 (33)
Clear cell cystadenocarcinoma 1 (8) 1 (100) 1 (8) 1 (100)
Granulosa cell tumor 1 (8) 1 (100) 1 (8) 1 (100)
Dysgerminoma 1 (8) 0 1 (8) 0
Unclassified adenocarcinoma 1 (8) 1 (100) 1 (8) 1 (100)
All 13 (100) 4 (31) 13 (100) 4 (31)
Adenocarcinoma of unknown origin 3 (100) 0 2 (100) 0
Metastatic invasive malignancies
Breast 1 (20) 1 (33)
Signet cell cancer 2 (40) 2 (67)
Colon cancer 2 (40) 0
All 5 (100) 3 (100)
Invasive non-ovarian tumors
Carcinoid 1 (33) 0 1 (50) 0
Lymphoma 1 (33) 0 1 (50) 0
Tubal cancer 1 (33) 1 (100) 0
All 3 (100) 1 (33) 2 (100) 0


Timmerman model was used, mucinous cystadenomas, serous cystadenomas and adenofibromas were over-represented among the false-positive diagnoses, and dermoid cysts, endometriomas, and simple benign cysts were under-represented. When the Tailor model was used, myomas, fibromas, and adenofibromas were over-represented among the false-positive diagnoses, and endometriomas and simple benign cysts were under-represented.

DISCUSSION

The diagnostic performance (as measured by sensitivity, specificity, and area under the ROC curve) of the two logistic regression models was much poorer in this study than in those in which the models were created4,5. In the original studies, the sensitivity was > 85% and the specificity > 90%, when a cut-off value of 50% was used, and the area under the ROC curves was 0.98 in both studies. The differences can probably be explained by differences in definitions, examination technique, ultrasound equipment, method of analysis of CA 125, and tumor populations.

Despite steps having been taken to ensure that the same definitions and examination technique were used in this study as in the original ones4–6, subtle differences cannot be avoided and are likely to have affected the papillary projection and color scores, and possibly time-averaged maximum velocity.

In the original studies, few benign tumors (approximately 10%) but many malignant tumors (approximately 75%) had papillary projections4,5. In our study, papillary projections occurred with a similar frequency in benign and malignant tumors. This difference between the original studies and ours may be explained both by differences in the tumor populations studied and by differences in the definition of ‘papillary projection.’ We used the same definition of papillary projection as that in the original publications, a papillary projection being defined as ‘a solid projection into a cyst cavity from the cyst wall of > 3 mm in height.’ It is possible that the other research teams required additional criteria to be fulfilled to allow a structure to be called a papillary projection, even if this was not stated explicitly in the publications or during our personal communications. The possible use of additional

Table 2 Clinical information and results of ultrasound examinations and CA 125 analyses in the Tailor group of this study and in the original study
by Tailor and coworkers4,6*

Tailor group, this study (n = 133) Original Tailor study4,6* (n = 67)


Characteristic Benign (n = 98) Malignant (n = 35) P Benign (n = 52) Malignant (n = 15) P

Trondheim patients (% (n)) 36 (35/98) 57 (20/35) 0.0445 — —


Age (years)
Mean (SD) 47 (16.7) 56 (18.1) 0.0145 43 55 0.002
Median 46 58 — —
Range 19– 84 18–84 20–75 37–76
Postmenopausal (% (n)) 41 (40 /98) 69 (24/35) 0.0087 29 (15/52) 53 (8/15) 0.078
Tumor volume (cm3)
Mean 319 771 0.0002 183 462 0.009
Median 100 432 — —
Range 6 – 5224 20–4718 6– 972 27–1113
Ultrasound morphology (% (n))
Unilocular 19 (19 /98) 0 < 0.001 67 (35/52) 47 (7/15) 0.249
Multilocular 18 (18 /98) 3 (1/35) — —
Unilocular solid 14 (14 /98) 0 — —
Multilocular solid 41 (40 /98) 54 (19/35) — —
Solid 7 (7 /98) 43 (15/35) — —
Papillary projection (% (n)) 18 (18 /98) 26 (9/35) 0.4947 10 (5/52) 73 (11/15) < 0.0001
Color score (% (n))
1 12 (12 /98) 0 < 0.001 — —
2 18 (18 /98) 3 (1/35) — —
3 52 (51 /98) 37 (13/35) — —
4 17 (17 /98) 60 (21/35) — —
High (3,4) color score 69 (68/98) 97 (34/35) 0.0019 — —
Detectable arterial flow (% (n)) 88 (86/98) 100 (35/35) 0.036 — —
Time-averaged maximum
velocity (cm / s)
Mean 10 36 0.0001 11 29 0.0001
Median 7 29 — —
Range 1– 62 6–125 2–32 8–80
CA 125 (U / mL)
Median 20 120 0.0002 — —
Range 5 – 91 7–12 043
Probability of malignancy,
Tailor model
Median 0.003 0.969 0.0001 — —
Range 0.0005 – 0.998 0.001–1 — —

*Some information about the original Tailor study4 was obtained from another publication using the same patients6. SD, standard deviation.
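The Tailor model probabilities summarized in Table 2 follow from the equation given in the Methods. A minimal sketch of that calculation is shown below, assuming age in years, time-averaged maximum velocity in cm/s (the unit used in Table 2), and a papillary projection score coded 1 if present and 0 if absent; the coding of the score and the function name are assumptions, since the text does not spell them out. The two example inputs are the group means from Table 2, used only as illustrative values and not as individual patients.

```python
import math


def tailor_probability(age_years, tamxv_cm_s, papillary_projection):
    """Probability of malignancy from the Tailor model coefficients in the Methods.

    papillary_projection: assumed coding, 1 = present, 0 = absent.
    """
    z = (0.1273 * age_years
         + 0.2794 * tamxv_cm_s
         + 4.4136 * papillary_projection
         - 14.2046)
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative inputs: mean age and mean velocity of the malignant and benign
# tumors in the Tailor group (Table 2), with no papillary projection.
p_malignant_means = tailor_probability(56, 36, 0)
p_benign_means = tailor_probability(47, 10, 0)

# Classification with the cut-offs suggested in the original publication.
exceeds_25 = p_malignant_means > 0.25
exceeds_50 = p_malignant_means > 0.50
```

Because the coefficient for velocity is large relative to the intercept, modest differences in the recorded velocity shift the calculated risk substantially, which is one reason standardized measurement technique matters for this model.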


Table 3 Clinical information and results of ultrasound examinations and CA 125 analyses in the Timmerman group of this study and in the original
study by Timmerman and coworkers5

Timmerman group, this study (n = 82) Original Timmerman study5* (n = 191)


Characteristic Benign (n = 53) Malignant (n = 29) P Benign (n = 140) Malignant (n = 51) P

Trondheim patients (% (n)) 53 (28 /53) 65 (19/29) 0.3805 — —


Age (years)
Mean (SD) 45 (16.2) 55 (16.6) 0.0128 49 (16) 58 (14) 0.001
Median 43 53 — —
Range 19 – 84 18 –84 — —
Postmenopausal (% (n)) 38 (20 /53) 66 (19/29) 0.0295 40 71 0.0003
Tumor volume (cm3)
Mean 300 690 0.0048 246 524 0.0794
Median 100 348 — —
Range 9 – 5224 20 –4718 3–1844 4–3807
Ultrasound morphology (% (n))
Unilocular 15 (8 /53) 0 < 0.001 42 6 < 0.0001
Multilocular 19 (10 /53) 0 31 4 0.0002
Unilocular solid 13 (7 /53) 3 (1 /29) 2 16 0.0014
Multilocular solid 47 (25 /53) 52 (15/29) 16 49 < 0.0001
Solid 6 (3 /53) 45 (13/29) 9 27 0.0018
Papillary projection (% (n)) 26 (14/53) 28 (8/29) 0.8837 8 75 < 0.0001
Color score (% (n))
1 15 (8 /53) 0 < 0.001 — —
2 23 (12 /53) 7 (2 /29) — —
3 53 (28 /53) 41 (12/29) — —
4 9 (5 /53) 52 (15/29) — —
High (3,4) color score 62 (33 /53) 93 (27/29) 0.0059 26 90 < 0.0001
Detectable arterial flow (% (n)) 81 (43 /53) 97 (28/29) 0.087 73 100 < 0.0001
Time-averaged maximum
velocity (cm / s)
Mean 9 38 0.0001 13 18 0.0004
Median 6 31 — —
Range 1– 44 6– 125 1–72 5–43
CA 125 (U/mL)
Median 20 100 0.0002 15 177 < 0.0001
Range 5 – 191 7 – 12 043 1–1046 5–31 090
Probability of malignancy,
Timmerman model
Median 0.047 0.83 0.0001 — —
Range 0.00002 – 0.995 0.005–1.0

*Only percentages (not absolute numbers) are given in the publication by Timmerman and coworkers5. SD, standard deviation.
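The Timmerman model probabilities summarized in Table 3 can be reproduced in the same way from the equation in the Methods. The sketch below assumes a color score of 1 to 4, CA 125 in U/mL, and papillary projection and postmenopausal scores coded 1 for yes and 0 for no; these codings and the function name are assumptions. The input values are illustrative, not individual patients from the study.

```python
import math


def timmerman_probability(color_score, ca125_u_ml, papillary_projection,
                          postmenopausal):
    """Probability of malignancy from the Timmerman model coefficients in the Methods.

    papillary_projection and postmenopausal: assumed coding, 1 = yes, 0 = no.
    """
    z = (2.6369 * color_score
         + 0.0225 * ca125_u_ml
         + 7.1062 * papillary_projection
         + 2.6423 * postmenopausal
         - 13.6796)
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative high-risk profile: color score 4, CA 125 of 100 U/mL,
# no papillary projection, postmenopausal.
p_high = timmerman_probability(4, 100, 0, 1)

# Illustrative low-risk profile: color score 1, CA 125 of 20 U/mL,
# no papillary projection, premenopausal.
p_low = timmerman_probability(1, 20, 0, 0)
```

With these codings the lowest attainable risks are of the order of 0.00002, which agrees with the lower end of the range reported for the benign tumors in Table 3.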

Table 4 The sensitivity, specificity and accuracy with regard to malignancy of pattern recognition and of the two logistic regression models

Diagnostic method Sensitivity %; 95% CI (n) Specificity %; 95% CI (n) Accuracy %; 95% CI (n)

Pattern recognition
Total patients (n = 136) 83; 67–94 (30/36) 91; 84–96 (91/100) 89; 82–94 (121/136)
Tailor group (n = 133) 83; 66–93 (29/35) 91; 83–96 (89/98) 89; 82–94 (118/133)
Timmerman group (n = 82) 86; 68–96 (25/29) 87; 75–94 (46/53) 87; 77–93 (71/82)
Tailor model4
Risk of malignancy > 25%, this study 71; 54–85 (25/35) 82; 72–89 (80/98) 79; 71–86 (105/133)
Risk of malignancy > 25%, original study4 93; 67–100 90; 79–97
Risk of malignancy > 50%, this study 69; 51–83 (24/35) 88; 80–94 (86/98) 83; 75–89 (110/133)
Risk of malignancy > 50%, original study4 87; 59–98 98; 90–100
Best cut-off, this study (≥ 1.2%) 94; 81–99 (33/35) 60; 50–70 (59/98) 69; 61–77 (92/133)
Timmerman model5
Risk of malignancy > 25%, this study 72; 53–87 (21/29) 68; 54–80 (36/53) 70; 58–79 (57/82)
Risk of malignancy > 25%, original study5 96* 87*
Risk of malignancy > 50%, this study 62; 42–79 (18/29) 79; 66–89 (42/53) 73; 62–82 (60/82)
Risk of malignancy > 50%, original study5 88* 92*
Best cut-off, this study (≥ 6.9%) 90; 73–98 (26/29) 66; 52–78 (35/53) 74; 64–83 (61/82)
*Neither CI nor absolute numbers are presented in the original Timmerman study5. CI, confidence interval.
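The proportions and exact 95% confidence intervals in Table 4 can be approximated with a short calculation. The sketch below uses the Clopper-Pearson construction, which is one common way of deriving exact intervals from the binomial distribution; whether it matches the study's software exactly is an assumption, and the function names are hypothetical. The counts used in the example (30 of 36 malignant tumors correctly classified by pattern recognition) are taken from Table 4.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes in n trials, found by bisection."""
    def solve(increasing_fn, target):
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if increasing_fn(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else solve(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# Sensitivity of pattern recognition in all patients (Table 4): 30/36.
sensitivity = 30 / 36
ci_low, ci_high = exact_ci(30, 36)
```

The resulting interval is asymmetric around the point estimate, as expected for a proportion close to 1 in a small sample.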


criteria by the other research teams may explain the lower frequency of papillary projections in the benign tumors in the original studies4,5. On the other hand, differences in definition are unlikely to explain the higher frequency of papillary projections in the malignant tumors in these studies. The latter difference is more likely to be explained by true differences in tumor populations.

Figure 1 Receiver operating characteristic curves (sensitivity plotted against 1 – specificity) of the Tailor model4 (dotted line; area under curve = 0.87 ± 0.039) and the Timmerman model5 (solid line; area under curve = 0.84 ± 0.044); paired comparison of 79 tumors. The software ‘GraphROC for Windows’ was used. Paired significance test: P = 0.25.

In our study, the difference in color content between benign and malignant tumors was much smaller than in the Timmerman study5. Both in that study and in our study, approximately 90% of the malignant tumors had high color content at color Doppler examination (color scores of 3 or 4), but in our study as many as 60% of the benign tumors had high color content vs. only 26% in the Timmerman study. Although this difference between the two studies may reflect a difference in the benign tumor populations studied, it is perhaps more likely to be explained by differences in the color Doppler sensitivity of the ultrasound systems used and by differences in the subjective evaluation of the color content of the tumor scan, the problem with the color score being that it is purely subjective and difficult to reproduce13.

The time-averaged maximum velocities recorded from the benign tumors in our study were very similar to those in the study by Tailor and coworkers4 whereas the malignant tumors in our study were characterized by higher velocities than those in the Tailor study. Differences in the malignant tumor populations as well as subtle differences in examination technique may explain the discrepancy.

The CA 125 values in our study and in that of Timmerman and colleagues5 were similar, although the benign tumors in the Timmerman study were characterized by slightly lower CA 125 values and the malignant tumors by slightly higher CA 125 values than those in our study. The small difference between the two studies may be explained by differences in tumor populations and possibly in the method of CA 125 analysis, because the method used in Malmö was not the same as that used in Trondheim and in the Timmerman study5.

Subtle differences in examination technique and definitions may only partly explain the poorer performance of the two logistic regression models in our study than in the original ones4,5. The most important contributing factor to the differences in results is likely to be true differences in the tumor populations studied, because our results showed that certain tumor types tended to be over-represented among the false-positive, false-negative, and true-negative diagnoses. Borderline tumors were often misclassified as benign, and cystadenomas, adenofibromas, myomas and fibromas were often misclassified as malignant. Simple benign cysts and endometriomas were usually correctly classified as benign and were over-represented among the true-negative diagnoses. Borderline tumors were much more common in our study than in the original ones4–6. Moreover, the invasive malignancies in the original studies5,6 differed from those in ours. Those in the original Tailor study6 comprised only ovarian malignancies, whereas those in our Tailor group were much more heterogeneous. In the original Timmerman study5 the proportion of primary invasive malignancies and metastatic tumors was higher than in our Timmerman group. There were also substantial differences in the types of benign

Table 5 Tumor types over-represented among false-negative, false-positive and true-negative diagnoses

| Ultrasound method / tumor type | False negative | True positive | P | False positive | True negative | P |
|---|---|---|---|---|---|---|
| Pattern recognition (% (n)) | | | | | | |
| Borderline | 67 (4/6) | 27 (8/30) | 0.15 | - | - | - |
| Mucinous and serous cystadenomas and adenofibromas | - | - | - | 55 (5/9) | 30 (27/91) | 0.22 |
| Dermoid cysts, endometriomas and simple benign cysts | - | - | - | 11 (1/9) | 44 (40/91) | 0.08 |
| Timmerman model, cut-off 50% (% (n)) | | | | | | |
| Borderline | 45 (5/11) | 22 (4/18) | 0.24 | - | - | - |
| Mucinous and serous cystadenomas and adenofibromas | - | - | - | 64 (7/11) | 19 (8/42) | 0.01 |
| Dermoid cysts, endometriomas and simple benign cysts | - | - | - | 18 (2/11) | 52 (22/42) | 0.09 |
| Tailor model, cut-off 50% (% (n)) | | | | | | |
| Borderline | 55 (6/11) | 21 (5/24) | 0.06 | - | - | - |
| Myomas, fibromas and adenofibromas | - | - | - | 42 (5/12) | 10 (9/86) | 0.01 |
| Endometriomas and simple benign cysts | - | - | - | 0 (0/12) | 29 (25/86) | 0.03 |
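
As a rough check on Table 5, the following sketch recomputes one of its P values and the sensitivity and specificity of pattern recognition from the counts given in the table. This section does not name the test behind the P values, so the two-sided Fisher exact test used here is an assumption, and the function names are illustrative only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Assumption: this is the test behind the P values in Table 5;
    the text of this section does not say so explicitly.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one.
    return sum(
        p
        for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from the four diagnostic counts."""
    return tp / (tp + fn), tn / (tn + fp)


# Pattern recognition, borderline tumors: 4 of 6 false negatives
# vs. 8 of 30 true positives (first data row of Table 5).
p_borderline = fisher_exact_two_sided(4, 6 - 4, 8, 30 - 8)

# Denominators in Table 5 for pattern recognition:
# 6 false negatives, 30 true positives, 9 false positives, 91 true negatives.
sens, spec = sensitivity_specificity(tp=30, fn=6, tn=91, fp=9)

print(f"P = {p_borderline:.2f}")
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

Under that assumption the borderline row for pattern recognition gives P of about 0.15, matching the table, and the counts give a sensitivity of 83% and a specificity of 91%, the figures quoted for pattern recognition in the Discussion.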


tumor between our study and the original ones5,6. Dermoid cysts and extraovarian tumors comprised a larger proportion of the benign tumors in the original Tailor study6 than in our Tailor group. Serous cystadenomas and functional cysts were much more common among the benign tumors in the original Timmerman study5 than in our Timmerman group, whereas endometriomas and dermoid cysts were less common. These differences between the tumor populations may explain the differences in tumor morphology (solid tumors, papillary projections), Doppler results and CA 125 values, and consequently the poorer performance of the logistic regression models in our study. Naturally, a mathematical model yields better results if it is tested in a tumor population very similar to that in which it was created rather than in a dissimilar one.
The Tailor model and the Timmerman model have also been cross-validated prospectively by Aslam and colleagues14. They, too, found the diagnostic performance of the two models to be poorer than in the original studies4,5 and, like us, they found the two models to have similar diagnostic performance.
The idea of using mathematical models to calculate individual risks of malignancy in pelvic tumors is very attractive. A robust mathematical model could be an alternative to pattern recognition for less experienced sonographers. However, our results suggest that mathematical models need to be created on the basis of a very large number of tumors to ensure coverage of the whole spectrum of benign and malignant pelvic tumors. It is also obvious that for a mathematical model to be generally useful, the variables used in the model must be very clearly defined. From discussions in the steering committee of the ongoing IOTA (International Ovarian Tumor Analysis) study, it is clear that many descriptive ultrasound terms, such as papillary projection, unilocular cyst and multilocular cyst, have different meanings to different people. The fact that 47% of the malignant tumors in the Tailor study4 were classified as unilocular vs. none in our study and only 6% in the Timmerman study5, and the fact that the frequency of papillary projections in benign and malignant tumors differed very much between our study and the others4,5, support this, because it is highly unlikely that these differences are to be explained exclusively by true differences in tumor morphology. To facilitate the process of defining ultrasound findings characteristic of various types of tumor, and to make comparison between studies meaningful, unequivocal definitions of the ultrasound terms used are necessary15.
Pattern recognition by an experienced sonographer performed better than the logistic regression models in this study, but the sensitivity and specificity of pattern recognition were slightly poorer in this study than in another publication using the same sonographer (L.V.)1. The lower sensitivity (83% in this study vs. 88% in the previous one) can probably be explained by the much higher frequency of borderline tumors in this study (9% vs. 3% in the previous study), because borderline ovarian tumors seem to be misclassified as benign more often than other types of pelvic malignancies. The lower specificity (91% vs. 96% in the previous study) is more difficult to explain, because the proportions of different types of benign tumor were similar in the two studies. Using pattern recognition it is possible to make a correct and confident diagnosis of many types of benign tumor, such as endometriomas and dermoid cysts2. In contrast, adenofibromas, serous cystomas, and mucinous cystomas offer greater diagnostic difficulties, and, according to the results of this study, they tend to be over-represented among false-positive diagnoses. Pattern recognition would almost certainly be a poor method of distinguishing benign from malignant tumors in a series of tumors comprising only borderline tumors, cystadenomas, and adenofibromas, even if the sonographer were very experienced. For such ‘difficult tumors’ an alternative to pattern recognition, such as a mathematical model specifically designed for such cases, might prove helpful, even though mathematical models are intended mainly to be an alternative to pattern recognition for less experienced sonographers.

ACKNOWLEDGMENTS

The study was supported by grants from the Malmö General Hospital Cancer Foundation, Funds administered by the Malmö Health Care Administration, and the Swedish Medical Research Council (grant nos B96–17X-11605–01 A, K98–17X-11605–03 A, and K2001–72X-11605–06 A).

REFERENCES

1 Valentin L. Prospective cross-validation of Doppler ultrasound examination and gray scale ultrasound imaging for discrimination of benign and malignant pelvic masses. Ultrasound Obstet Gynecol 1999; 14: 273–83
2 Valentin L. Pattern recognition of pelvic masses by gray scale ultrasound imaging: the contribution of Doppler ultrasound. Ultrasound Obstet Gynecol 1999; 14: 338–47
3 Timmerman D, Schwärzler P, Collins WP, Clarehout F, Coenen M, Amant F, Vergote I, Bourne TH. Subjective assessment of adnexal masses with the use of ultrasonography: an analysis of interobserver variability and experience. Ultrasound Obstet Gynecol 1999; 13: 11–6
4 Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using multivariate logistic regression analysis. Ultrasound Obstet Gynecol 1997; 10: 41–7
5 Timmerman D, Bourne TH, Tailor A, Collins WP, Verrelst H, Vandenberghe K, Vergote I. A comparison of methods for preoperative discrimination between benign and malignant adnexal masses: the development of a new logistic regression model. Am J Obstet Gynecol 1999; 181: 57–65
6 Tailor A, Jurkovic D, Bourne TH, Collins WP, Campbell S. Sonographic prediction of malignancy in adnexal masses using an artificial neural network. Br J Obstet Gynaecol 1999; 106: 21–30
7 World Medical Association Declaration of Helsinki: Ethical principles for medical research involving human subjects. Bull World Health Organ 2001; 79: 373–4
8 Granberg S, Wikland M, Jansson I. Macroscopic characterization of ovarian tumors and the relation to the histological diagnosis: criteria to be used for ultrasound evaluation. Gynecol Oncol 1989; 35: 139–44
9 Valentin L. Gray scale sonography, subjective evaluation of the color Doppler image and measurement of blood flow velocity for distinguishing benign and malignant tumors of suspected adnexal origin. Eur J Obstet Gynecol Reprod Biol 1997; 72: 63–72
10 Shepherd JH. Revised FIGO staging for gynaecological cancer. Br J Obstet Gynaecol 1989; 96: 889–92
11 Richardson DK, Schwartz JS, Weinbaum PJ, Gabbe SG. Diagnostic tests in obstetrics: a method for improved evaluation. Am J Obstet Gynecol 1985; 152: 613–8


12 Hanley JA, McNeil B. A method of comparing areas under the receiver operating characteristics curves derived from the same cases. Radiology 1983; 148: 839–43
13 Sladkevicius P, Valentin L. Inter-observer agreement in the results of Doppler examinations of extrauterine pelvic tumors. Ultrasound Obstet Gynecol 1995; 6: 91–6
14 Aslam N, Banerjee S, Carr JV, Savvas M, Hooper R, Jurkovic D. Prospective evaluation of logistic regression models for the diagnosis of ovarian cancer. Obstet Gynecol 2000; 96: 75–80
15 Timmerman D, Valentin L, Bourne TH, Collins WP, Verrelst H, Vergote I. Terms, definitions and measurements to describe the sonographic features of adnexal tumors: a consensus opinion from the International Ovarian Tumor Analysis (IOTA) group. Ultrasound Obstet Gynecol 2000; 16: 500–5
