Introduction
The purpose of this challenge is to evaluate and compare coronary artery cen-
tral lumen line extraction methods. Developers of coronary CTA processing
software, and companies selling products in this field, are invited to apply their
methods on the data provided in this challenge. Developers of generic methods
for extracting elongated tubular structures from 3D images or methods devel-
oped for other imaging modalities and anatomies are also very welcome to join
this competition and tailor their methods to this specific application.
the lumen. The start point of a centerline is defined in the aorta, and the end
point is the most distal point where the artery is still distinguishable from the
background. The centerline is smoothly interpolated if the artery is partly in-
distinguishable from the background, e.g. in case of a total occlusion or imaging
artifacts.
Challenges
Methods and algorithms will be divided in three different categories for evalua-
tion: automatic tracking methods, methods with minimal user interaction and
interactive tracking methods. Please note that the organizers keep the right to
combine challenges if one of the challenges has not enough submissions.
Point A should be used for selecting the appropriate centerline. If the automatic
tracking result does not contain centerlines near point A, point B can be used
to select the appropriate centerline. The participants must report in the paper
how many times point A or point B is used; this may be reported per vessel or
summarized per dataset. Points A and B are only intended for selecting the
correct centerline, and may not be used as input for the automatic tracking
method.
Points A, B, S and E will be provided with the data. The participants should
clearly describe which point was used by their method. Furthermore, in case
the method obtains a vessel tree from the initial point, a second point may
be used after the centerline determination to select the appropriate centerline.
This point can be either point A or B as defined in challenge 1, and the
participants have to report how many times point A or point B is used.
Data
Coronary CTA data for this challenge was acquired in the Erasmus Medical
Center Rotterdam, The Netherlands. Thirty-two datasets were randomly selected from a
series of patients that underwent coronary CTA. Twenty datasets were acquired
on a Siemens Somatom Sensation 64 and twelve datasets on a Siemens Somatom
Definition CT scanner. Diastolic reconstructions were used, with reconstruction
intervals varying from 250ms to 400ms before the R-peak. Three datasets were
reconstructed using a B46f kernel, all others were reconstructed using a B30f
kernel.
Table 1: Image quality and presence of calcium in the training and test sets
(currently 28 patients scored, 4 others evenly distributed).
                  Image quality          Presence of calcium
           Total  Good  Adequate  Poor   Low  Moderate  Severe
Training       8     3         3     1     3         3       1
Testing 1     16     7         5     2     6         6       2
Testing 2      8     4         1     2     2         4       1
datasets (Testing 2) will be used for testing during the workshop. To ensure
representative training and testing sets, each dataset was visually assessed on
image quality and presence of calcium by a 4th year radiology resident. Image
quality was scored as poor, adequate or good, based on the noise level, presence
of streaking artifacts, irregular heart rate artifacts and other artifacts. Presence
of calcium was scored as low, moderate, or severe. Based on these scores the
data was distributed equally over the three groups. The patients and scanning
parameters were assessed to be representative of clinical practice. Image quality
and calcium scores for the training and test sets are listed in Table 1.
Reference standard
Three observers annotated points along the center of the lumen of four coronary
arteries, namely the RCA, LAD, LCX and one large side branch of the main
coronary arteries, in all 32 datasets, yielding 32 × 4 = 128 annotated centerlines.
The observers were instructed to use our definition of a centerline. The observers
also specified the radius of the lumen at least every 5 mm, where the radius was
chosen such that the enclosed area of the annotated circle matched the area of
the lumen. The radius was specified after the central lumen line was annotated.
After annotation, the centerlines were sampled equidistantly using a sampling
distance of 0.03 mm, enabling accurate comparison between centerlines. The
radii were linearly interpolated to obtain a radius estimation of the coronaries
at every point along the resampled centerlines.
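The resampling and radius-interpolation step described above can be sketched as follows. This is an illustrative implementation, not the organizers' software; the function and variable names are ours, and linear interpolation of the point coordinates is used as a simplification of whatever interpolation scheme was actually applied.

```python
import numpy as np

def resample_centerline(points, radii, spacing=0.03):
    """Resample an annotated centerline at equidistant arc-length steps
    (0.03 mm in the challenge) and linearly interpolate the radii."""
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    # Cumulative arc length along the polyline of annotated points.
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])
    # Equidistant sample positions along the curve.
    new_arc = np.arange(0.0, arc[-1], spacing)
    # Interpolate each coordinate and the radius at the new positions.
    new_points = np.column_stack(
        [np.interp(new_arc, arc, points[:, k]) for k in range(3)])
    new_radii = np.interp(new_arc, arc, radii)
    return new_points, new_radii
```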
Error inspection
After creating a first weighted average [1], the observer centerlines were compared
with this average centerline. This comparison was used to create curved-planar-reformatted images (Figure 1).
Figure 1: An example of the color-coded curved-planar-reformatted images used
to detect possible annotation errors.
Evaluation
In the evaluation we distinguish between tracking capability and tracking accuracy.
Three overlap measures are used to assess the ability to track centerlines
and three distance measures are used to determine the accuracy of centerline
tracking.
Each of these measures is related to the inter-observer variability. The
scores for each measure range from 100 to 0: 100 points implies that the result
of the method is perfect, 50 points implies that the performance of the method
is similar to the inter-observer variability, and 0 points implies a complete failure.
Figure 2: Every point before the first intersection of a method centerline and
a disc that is positioned at the start of the reference standard centerline is not
taken into account during evaluation.
Figure 3: Correspondence between two centerlines, and average centerline de-
termined via correspondence.
Figure 4: An illustration of the different terms used in the overlap measure OV.
The measure OV represents the ability to track the complete vessel annotated
by the human observers.
Overlap measures
Overlap (OV)
The first overlap measure, OV, represents the ability to track the complete vessel
annotated by the human observers. It is defined as (see also Figure 4):
OV = (TPM + TPR) / (TPM + TPR + FN + FP).

OF = TPRfe / (TPRfe + FNfe)                                        (1)
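Both overlap measures are simple ratios of point counts. A minimal transcription (the function names are ours, not part of the challenge's evaluation software):

```python
def overlap_ov(tpm, tpr, fn, fp):
    """OV: ability to track the complete vessel annotated by the observers."""
    return (tpm + tpr) / (tpm + tpr + fn + fp)

def overlap_of(tpr_fe, fn_fe):
    """OF (Eq. 1): fraction of the reference tracked before the first error."""
    return tpr_fe / (tpr_fe + fn_fe)
```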
Figure 5: An illustration of the different terms used in the overlap measure OF.
The measure OF represents the ability to track vessels without making errors.
Figure 7: An illustration of the different terms used in the overlap measure OT.
The measure OT represents the ability to track vessels with diameter ≥ 1.5 mm.
OT = (TPMt + TPRt) / (TPMt + TPRt + FNt + FPt).                    (2)
Figure 8: Figure (a) shows an example of how overlap measures are transformed
into scores. Figure (b) shows this transformation for the accuracy.
OVio = (Σ TPR(i) + Σ TPM(i)) / (Σ TPR(i) + Σ TPM(i) + Σ FP(i) + Σ FN(i))

OFio = Σ TPRfe(i) / (Σ TPRfe(i) + Σ FNfe(i))

OTio = (Σ TPRt(i) + Σ TPMt(i)) / (Σ TPRt(i) + Σ TPMt(i) + Σ FPt(i) + Σ FNt(i))
Overlap score
The performance of the method is scored with a measure related to the inter-
observer variability. For methods that perform better than the observers the
OV, OF, and OT measures are converted to scores by linearly interpolating be-
tween 100 and 50 points, respectively corresponding to an overlap of 1.0 and an
overlap similar to the inter-observer variability. If the method performs worse
than the inter-observer variability the score is obtained by linearly interpolat-
ing between 50 and 0 points, respectively corresponding to the inter-observer
variability and an overlap of 0.0.
ScoreO = (Om / Oio) · 50                            if Om ≤ Oio
ScoreO = 50 + 50 · (Om − Oio) / (1 − Oio)           if Om > Oio    (3)
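Equation (3) can be transcribed directly; only the function name is ours:

```python
def overlap_score(o_m, o_io):
    """Convert an overlap measure (OV, OF, or OT) into points, Eq. (3):
    100 points at an overlap of 1.0, 50 points at the inter-observer
    overlap o_io, and linear interpolation on either side of o_io."""
    if o_m <= o_io:
        return (o_m / o_io) * 50.0
    return 50.0 + 50.0 * (o_m - o_io) / (1.0 - o_io)
```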
Accuracy measures
Average distance (AD)
The first accuracy measure is the average distance between the reference stan-
dard and the automatic centerline. The average distance is defined as the
summed distance of all the connections between the two equidistantly sampled
centerlines, divided by the number of connections.
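The AD measure can be sketched as below. Note that the challenge establishes the connections between the two centerlines with a dedicated correspondence algorithm (Figure 3); nearest-neighbour pairing is used here only as a simplification, and the function name is ours.

```python
import numpy as np

def average_distance(method_pts, reference_pts):
    """Approximate AD: mean length of the connections between two
    equidistantly sampled centerlines.  Nearest-neighbour pairing is
    used instead of the challenge's correspondence algorithm."""
    method_pts = np.asarray(method_pts, dtype=float)
    reference_pts = np.asarray(reference_pts, dtype=float)
    # For every reference point, the distance to the closest method point.
    d = np.linalg.norm(
        reference_pts[:, None, :] - method_pts[None, :, :], axis=2)
    return d.min(axis=1).mean()
```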
Accuracy score
The tracking accuracy of the method is related per connection to the observer
performance. A connection is worth 100 points if the distance to the reference
standard is 0 mm, and 50 points if the distance equals the inter-observer
variability at that point. Methods that perform worse than the inter-observer
variability are awarded, per connection, 50 points times the ratio of the
inter-observer variability to the method accuracy.
ScoreA(x) = 100 − 50 · (Am(x) / Aio(x))             if Am(x) ≤ Aio(x)
ScoreA(x) = (Aio(x) / Am(x)) · 50                   if Am(x) > Aio(x)    (4)
where Am(x) and Aio(x) denote, respectively, the distance from the method
centerline to the reference centerline and the inter-observer accuracy
variability at point x. An example of this conversion is shown in Figure 8(b).
The average score over all connections yields the AD score for the centerline,
the average over all connections that connect TPR and TPM points yields the
AI score, and the AT score is defined as the average score over all connections
that connect a point in the clinically relevant section.
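The per-connection conversion of Eq. (4) is again a direct transcription; only the function name is ours:

```python
def accuracy_score(a_m, a_io):
    """Per-connection accuracy score, Eq. (4): 100 points at 0 mm error,
    50 points at the inter-observer variability a_io, and 50 times the
    ratio a_io / a_m for methods performing worse than the observers."""
    if a_m <= a_io:
        return 100.0 - 50.0 * (a_m / a_io)
    return (a_io / a_m) * 50.0
```

Averaging this score over all connections gives the AD score; restricting the average to TPR/TPM connections or to the clinically relevant section gives the AI and AT scores.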
Technical details
Directory structure
The training data and testing data are stored in archives with directories for
each dataset. The directories uniquely describe the datasets. The training
datasets are numbered ’00’ to ’07’ and are stored in the directories ’dataset00’
to ’dataset07’. The testing 1 set is stored in the directories ’dataset08’ to
’dataset23’. The datasets that will be used for testing during the workshop
are stored in directories named ’dataset24’ to ’dataset31’.
Each directory datasetXX contains an image file, named imageXX.mhd and
imageXX.raw, and four directories for the vessels, named vessel0, vessel1,
vessel2, and vessel3. These directories contain the reference standard and
points A, B, S, and E for each vessel.
each path point, the radius at that point (ri ) and the inter-observer variability
of that position (ioi ), in case of the averaged reference standard. Every point is
on a different line in the file starting with the most proximal point and ending
with the most distal point of the vessel. The voxel coordinate of each point can
be calculated by dividing the world coordinate by the voxel size of the image.
The voxel size can be found in the ’ElementSpacing’ line of the .mhd file. A
typical ’reference.txt’ file looks like this:
x0 y0 z0 r0 io0
x1 y1 z1 r1 io1
x2 y2 z2 r2 io2
x... y... z... r... io...
xn yn zn rn ion
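Reading a reference file and converting world coordinates to voxel coordinates, as described above, might look like this. The function names are ours; only the file layouts ('reference.txt' columns and the 'ElementSpacing' line of the .mhd header) come from the challenge description.

```python
def read_reference(path):
    """Read a 'reference.txt' file: one point per line, with
    x y z r io in world coordinates, ordered proximal to distal."""
    rows = []
    with open(path) as f:
        for line in f:
            if line.strip():
                rows.append([float(v) for v in line.split()])
    return rows

def read_element_spacing(mhd_path):
    """Extract the voxel size from the 'ElementSpacing' line of a .mhd header."""
    with open(mhd_path) as f:
        for line in f:
            if line.startswith("ElementSpacing"):
                return [float(v) for v in line.split("=")[1].split()]
    raise ValueError("no ElementSpacing line found")

def world_to_voxel(point, element_spacing):
    """Convert a world coordinate to a voxel coordinate by dividing
    by the voxel size of the image."""
    return [p / s for p, s in zip(point, element_spacing)]
```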
Point files
The files ’pointA.txt’, ’pointB.txt’, ’pointS.txt’, and ’pointE.txt’ contain respec-
tively the A,B,S, and E point for each vessel. These files contain three values,
corresponding with the x-,y- and z-coordinate of the respective point.
Submitting results
A participant should create an archive similar to the directory structure of
the training and testing data. It should contain a directory for each dataset.
These directories should be named 'dataset08' to 'dataset23' if one is submitting
results on the testing 1 data; results on the testing 2 data are stored in the
directories 'dataset24' to 'dataset31'. Each directory should contain four
subdirectories, named vessel0 to vessel3, each with a file called 'result.txt'.
This file should contain the extracted centerline, with one point per line,
ordered from proximal to distal. Each point should be described by three values
corresponding to the x-, y-, and z-coordinate of the point. These points should
be in world coordinates, similar to the input point and reference files.
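A helper that writes one centerline in the submission layout could look like this. It is an illustrative sketch; the function name and the four-decimal formatting are our choices, while the directory and file names follow the description above.

```python
import os

def write_result(root, dataset, vessel, centerline):
    """Write an extracted centerline to <root>/datasetXX/vesselY/result.txt,
    one 'x y z' world-coordinate point per line, proximal to distal."""
    directory = os.path.join(root, "dataset%02d" % dataset, "vessel%d" % vessel)
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, "result.txt")
    with open(path, "w") as f:
        for x, y, z in centerline:
            f.write("%.4f %.4f %.4f\n" % (x, y, z))
    return path
```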
Updating the result tables
On the day of the workshop, the organizers will provide the participants with
tables that include ranks; from that moment on, participants can also download
the tables with ranks from the website. Participants should update their
document with the new tables.
Evaluation software
A C++ implementation of the evaluation measures is provided by the organizers.
Table 2: Average overlap per dataset.
Dataset          OV                   OF                   OT          Avg.
 nr.        %  score  rank       %  score  rank       %  score  rank   rank
  8      81.3   81.9    31    62.9   83.9    20    83.9   77.0    41   30.7
  9      70.8   65.2    45    90.5   71.7    09    78.2   84.8    17   23.7
 10      81.1   83.9    18    69.8   84.3    47    75.6   76.4    10   25.0
 11      90.8   79.0    12    75.4   77.8    09    78.2   85.6    08   09.7
 12      74.0   90.8    35    86.2   82.5    21    89.3   89.9    06   20.7
 13      75.7   78.3    44    88.2   74.5    28    81.7   80.6    34   35.3
 14      73.4   78.2    43    77.6   87.3    17    78.9   94.3    46   35.3
 15      91.2   72.7    31    87.6   81.1    25    92.8   71.6    14   23.3
 16      81.6   87.1    33    75.4   79.7    26    78.6   76.0    32   30.3
 17      82.9   76.7    26    79.3   69.9    50    86.9   81.1    42   39.3
 18      87.1   77.7    44    83.4   87.6    35    96.0   80.8    16   31.7
 19      72.2   76.3    20    84.5   71.2    13    80.6   74.2    48   27.0
 20      83.6   69.3    30    77.1   77.1    45    76.4   87.1    29   34.7
 21      83.5   81.9    18    84.3   70.6    14    72.1   70.1    19   17.0
 22      87.2   80.7    22    79.9   80.1    21    91.6   74.6    34   25.7
 23      90.7   69.8    49    76.8   89.9    39    82.8   85.9    09   32.3
Avg.     81.7   78.1  31.3    79.9   79.3  26.2    82.7   80.6  25.3   27.6
References
[1] T. van Walsum, M. Schaap, C. Metz, A. van der Giessen, and W. Niessen,
“Averaging center lines: Mean shift on paths,” in Medical Image Computing
and Computer-Assisted Intervention - MICCAI 2008, 2008.