ABSTRACT
In dental X-ray images, accurate detection of cephalometric landmarks plays an important role in clinical
diagnosis, treatment, and surgical decisions for dental problems. In this work, we propose an end-to-end deep
learning system for cephalometric landmark detection in dental X-ray images using convolutional neural networks
(CNN). To detect 19 cephalometric landmarks, we develop a detection system based on CNN-based coordinate-wise
regression. By viewing the x- and y-coordinates of all landmarks as 38 independent variables, we construct
multiple CNN-based regression systems that predict the coordinate variables from input X-ray images. First, each
coordinate variable is normalized by the image width or height. For each normalized coordinate variable, a
CNN-based regression system is trained on the training images and the corresponding coordinate variable to be
regressed. We thus train 38 regression systems, all with the same CNN structure, one per coordinate variable.
Finally, we compute the 38 coordinate variables from unseen images with these trained systems and extract 19
landmarks by pairing the regressed coordinates. In experiments on the public database from the Grand Challenges
in Dental X-ray Image Analysis at ISBI 2015, the proposed system showed promising performance, successfully
locating the cephalometric landmarks within considerable margins of the ground truths.
Keywords: Computer-aided detection (CADe), landmark detection, dental X-ray, deep learning, convolutional
neural networks
1. INTRODUCTION
Over the past decade, deep learning has become one of the most powerful and reliable machine learning approaches
in various fields, such as computer vision,1–3 language processing,4, 5 and gaming.6 In medical imaging, deep
learning has also shown superior performance in applications ranging from pre-processing techniques to semantic
analysis of patient images.7, 8 It has achieved state-of-the-art performance in tasks such as abnormality
detection,9, 10 disease classification,11, 12 and organ segmentation.13, 14 Owing to this power and effectiveness,
deep learning has recently been playing an important role in computer-aided detection (CADe) and diagnosis
(CADx) in medical imaging.
In dental X-ray images, accurate detection of the cephalometric landmarks shown in Fig. 1 plays an important
role in clinical diagnosis, treatment, and surgical decisions for dental problems. However, manual landmark
detection is time-consuming and subject to inter- and intra-observer variability, which motivates performing
landmark detection automatically. To this end, two recent public grand challenges15, 16 have been held, and
several approaches to automatic landmark detection in dental X-ray images have been proposed. Vandaele et al.17
used ensemble learning with extremely randomized trees (ERT) to learn the locations of cephalometric landmarks.
Mirzaalian et al.18 proposed a pictorial structure algorithm based on random forest likelihoods computed from
several hand-crafted features, e.g., local binary patterns, spatial coordinates, blobness, tubularness, and
Zernike features. Chen et al.19 formulated a convex optimization problem to estimate the displacements from
randomly sampled image patches to the landmark locations. Chu et al.20 combined random forest regression and
shape models to further correct the landmark locations. Lindner et al.21, 22 used random forest regression-voting
to detect the landmarks automatically. In addition, Ibragimov et al.23, 24 applied game theory concepts to a
random forest detector with Haar-like features to determine the optimal landmark locations.
Further author information: (Send correspondence to Junmo Kim)
Hansang Lee: (E-mail) hansanglee@kaist.ac.kr
Minseok Park: (E-mail) pms0209@kaist.ac.kr
Junmo Kim: (E-mail) junmo.kim@kaist.ac.kr
Medical Imaging 2017: Computer-Aided Diagnosis, edited by Samuel G. Armato III, Nicholas A. Petrick,
Proc. of SPIE Vol. 10134, 101341W · © 2017 SPIE · CCC code: 1605-7422/17/$18 · doi: 10.1117/12.2255870
In this work, we propose an end-to-end deep learning system for cephalometric landmark detection in dental
X-ray images using convolutional neural networks (CNN). CNNs have recently been applied to dental X-ray images
for teeth classification25 with high accuracy. We apply a CNN to landmark detection in dental X-ray images to
check whether it remains competitive for this problem, as it has been for other detection, segmentation, and
classification problems. To solve the problem, we view the x- and y-coordinates of all landmarks as 38
independent variables and construct multiple CNN-based regression systems that predict these coordinate
variables from input X-ray images. We train 38 regression systems with the same CNN structure, each on the input
images and one coordinate variable to be regressed. In experiments, the proposed system showed promising
performance, successfully locating the cephalometric landmarks within considerable margins of the ground truths.
To the best of our knowledge, this is the first attempt to apply deep learning to cephalometric landmark
detection in dental X-ray images.
2. METHODS
To detect 19 cephalometric landmarks in dental X-ray images, we propose a CNN-based landmark detection system.
Details of the proposed approach are summarized in Fig. 2. In the proposed system, we view the x- and
y-coordinates of the 19 landmarks as 38 independent variables. We then re-formulate landmark detection as
multiple problems, each consisting of regressing an individual coordinate variable. To solve these problems, we
construct multiple CNN-based regression systems, each of which predicts one coordinate variable from the input
X-ray images. As a pre-processing step, we normalize each coordinate variable by the length of its coordinate
axis, i.e., the image width for x-coordinates and the image height for y-coordinates.
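The re-formulation into 38 scalar targets can be sketched as follows. This is a minimal illustration, not the authors' code: the helper `to_normalized_targets`, the interleaved ordering (x1, y1, x2, y2, ...), and the image size of 1935 pixels wide by 2400 pixels tall used in the example are assumptions.

```python
from typing import List, Tuple

Landmark = Tuple[float, float]  # (x, y) in pixels


def to_normalized_targets(landmarks: List[Landmark],
                          width: int, height: int) -> List[float]:
    """Flatten (x, y) landmarks into normalized scalar regression targets.

    Hypothetical helper: x is divided by the image width and y by the
    image height, so in-image landmarks map to values in [0, 1].
    Targets are interleaved as x1, y1, x2, y2, ... (an assumed ordering).
    """
    targets: List[float] = []
    for x, y in landmarks:
        targets.append(x / width)   # normalized x-coordinate
        targets.append(y / height)  # normalized y-coordinate
    return targets


# 19 dummy landmarks on an image assumed to be 1935 px wide and 2400 px tall.
dummy = [(100.0 * i, 120.0 * i) for i in range(19)]
targets = to_normalized_targets(dummy, width=1935, height=2400)
print(len(targets))  # 38
```

Each of the 38 entries then serves as the target of its own regression network.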
For each normalized coordinate variable, a CNN-based regression system is designed as shown in Fig. 2. The CNN
structure used in the proposed system has two convolutional layers, two max pooling layers, and one
fully-connected layer. The input images are downscaled from the original images to 64 × 64 pixels. The first
convolutional layer consists of six feature maps with a 5 × 5 kernel and is followed by 2 × 2 max pooling to
reduce the size of the feature maps. The second convolutional layer consists of twelve feature maps with a 5 × 5
kernel and is also followed by 2 × 2 max pooling to subsample the feature maps. Finally, a fully-connected layer
computes a two-element output vector of probabilities, whose first element is taken as the normalized coordinate
variable of the landmark. The proposed CNN regression system is trained on the training images and the
corresponding coordinate variables of the landmarks.
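The layer description above can be checked with a small shape trace. This is a sketch under assumptions the paper does not state: a single-channel (grayscale) input, stride-1 convolutions without padding, non-overlapping 2 × 2 pooling, and a softmax over the two fully-connected outputs. The helper names are hypothetical.

```python
import math
from typing import List, Tuple

# Assumptions (not stated in the paper): 1-channel input, stride-1
# convolutions with no padding, 2x2 pooling with stride 2, and a
# softmax over the two fully-connected outputs.


def conv_out(size: int, kernel: int) -> int:
    """Spatial size after a stride-1 convolution with no padding."""
    return size - kernel + 1


def pool_out(size: int, window: int) -> int:
    """Spatial size after non-overlapping max pooling."""
    return size // window


def trace_shapes(input_size: int = 64) -> List[Tuple[str, int, int]]:
    """Return (layer name, channels, height = width) after each layer."""
    shapes = [("input", 1, input_size)]
    size = conv_out(input_size, 5)
    shapes.append(("conv1", 6, size))
    size = pool_out(size, 2)
    shapes.append(("pool1", 6, size))
    size = conv_out(size, 5)
    shapes.append(("conv2", 12, size))
    size = pool_out(size, 2)
    shapes.append(("pool2", 12, size))
    return shapes


def count_params(num_outputs: int = 2) -> int:
    """Weights plus biases of conv1, conv2 and the fully-connected layer."""
    conv1 = 6 * (5 * 5 * 1 + 1)
    conv2 = 12 * (5 * 5 * 6 + 1)
    _, channels, size = trace_shapes()[-1]
    flat = channels * size * size  # features entering the FC layer
    fc = flat * num_outputs + num_outputs
    return conv1 + conv2 + fc


def two_way_softmax(z0: float, z1: float) -> Tuple[float, float]:
    """Numerically stable softmax over the two FC outputs."""
    m = max(z0, z1)
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)


for name, channels, size in trace_shapes():
    print(f"{name}: {channels} x {size} x {size}")
print("parameters:", count_params())
```

Under these assumptions the feature maps shrink 64 → 60 → 30 → 26 → 13, the fully-connected layer sees 12 × 13 × 13 = 2028 features, and the whole network has 6,026 parameters. A softmax keeps the first output strictly between 0 and 1, which matches the range of a normalized coordinate.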
We construct the above CNN regression system for each of the 38 coordinate variables individually, with the same
structure and specification. After training each CNN regression system on its coordinate variable, we apply the
trained system to the test images to obtain the output probability value for that coordinate variable. We then
multiply this value by the corresponding test image dimension (width or height) to compute the actual coordinate
value in the test image. Finally, we obtain the 38 coordinate variables from the test images with the trained CNN
regression systems and combine them into 19 landmarks by pairing the regressed coordinates.
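The post-processing step (rescaling each output and pairing coordinates) can be sketched as below. The helper `pair_predictions` is hypothetical, and it assumes the 38 outputs are ordered x1, y1, ..., x19, y19 and that each value is the normalized coordinate from its own network. The 1935 × 2400 size in the example treats 1935 as the width, which is also an assumption.

```python
from typing import List, Tuple


def pair_predictions(outputs: List[float], width: int,
                     height: int) -> List[Tuple[float, float]]:
    """Turn per-coordinate network outputs into (x, y) landmarks in pixels.

    Hypothetical helper. Assumes outputs are ordered x1, y1, ..., xN, yN,
    each being a normalized coordinate predicted by a separate network.
    """
    if len(outputs) % 2 != 0:
        raise ValueError("expected an even number of coordinate outputs")
    landmarks = []
    for i in range(0, len(outputs), 2):
        x = outputs[i] * width       # undo normalization by image width
        y = outputs[i + 1] * height  # undo normalization by image height
        landmarks.append((x, y))
    return landmarks


predicted = [0.5] * 38  # dummy outputs of the 38 regression networks
landmarks = pair_predictions(predicted, width=1935, height=2400)
print(len(landmarks), landmarks[0])  # 19 (967.5, 1200.0)
```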
Despite these promising results, the proposed approach was limited by relatively low detection accuracy, i.e.,
sizable localization errors with respect to the ground truths. These limitations may be due to the facts that
(1) the input images were downscaled from 1935 × 2400 pixels to 64 × 64 pixels, so small errors in the downscaled
images were magnified when the predictions were mapped back to the original resolution, and (2) the regression
systems were trained without deep learning techniques such as data augmentation, so the trained systems were not
fully robust to data variability. To overcome these limitations, it will be necessary to (1) extend the proposed
regression system to a coarse-to-fine framework by adding a step that corrects the locations of the initially
detected landmarks, and (2) use appropriate techniques, including data augmentation, to make the regression
system robust to the variability of input images.
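Limitation (1) can be made concrete with a little arithmetic. The sketch below only uses the image sizes given in the text; which side is the width is left open, and the function names are hypothetical.

```python
# Worked arithmetic for limitation (1): how errors at the 64 x 64 input
# scale grow when mapped back to the original 1935 x 2400 image.
# Which side is width and which is height is an assumption; the ratios
# below depend only on the side lengths.

INPUT_SIZE = 64


def pixels_per_input_pixel(original_size: int,
                           input_size: int = INPUT_SIZE) -> float:
    """Original-image pixels spanned by one pixel of the downscaled input."""
    return original_size / input_size


def normalized_error_to_pixels(delta: float, original_size: int) -> float:
    """Pixel error caused by an error `delta` in a normalized coordinate."""
    return delta * original_size


for side in (1935, 2400):
    print(side, pixels_per_input_pixel(side))
# 1935 30.234375
# 2400 37.5

# A 1% error in a normalized coordinate along the 2400-pixel axis:
print(normalized_error_to_pixels(0.01, 2400))
```

A single-pixel error at the 64 × 64 scale therefore corresponds to roughly 30 to 38 pixels in the original image, which is why a coarse-to-fine refinement step is attractive.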
Figure 4: Boxplot of Euclidean distances between predicted landmarks and ground truths.
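The quantity plotted in Figure 4 is the Euclidean (radial) distance between each predicted landmark and its ground truth. A sketch of how these distances and the quartiles behind a boxplot could be computed follows; the helper names are hypothetical. The quartiles use linear interpolation (`method="inclusive"`, matching NumPy's default percentile), while the paper does not state which quartile or whisker convention the figure used. `math.dist` requires Python 3.8 or later.

```python
import math
import statistics
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) in pixels


def radial_errors(pred: List[Point], truth: List[Point]) -> List[float]:
    """Euclidean distance between each predicted landmark and its ground truth."""
    if len(pred) != len(truth):
        raise ValueError("prediction and ground-truth lists differ in length")
    return [math.dist(p, t) for p, t in zip(pred, truth)]


def boxplot_summary(values: List[float]) -> Dict[str, float]:
    """Five-number summary (min, quartiles, max) underlying a boxplot.

    Quartiles use linear interpolation (method="inclusive"); whisker and
    outlier rules of plotting libraries are not modeled here.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}


pred = [(10.0, 10.0), (13.0, 14.0)]
truth = [(10.0, 10.0), (10.0, 10.0)]
print(radial_errors(pred, truth))  # [0.0, 5.0]
```

For a per-landmark boxplot, one would collect the distances for a given landmark over all test images and pass that list to `boxplot_summary`.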
4. CONCLUSION
In this research, we proposed a landmark detection system for dental image analysis by constructing multiple
CNN-based regression systems that independently predict individual coordinate values of landmarks. In
experiments, the proposed system showed promising performance, successfully locating the landmarks within
considerable margins. Using deeper networks with larger input images would likely enhance the power of the
proposed system. To the best of our knowledge, this was the first attempt to construct an end-to-end learning
framework for the cephalometric landmark detection task, whereas conventional models usually combined random
forest-based regression methods with hand-crafted features and shape-based modeling. Future work will include
further improvement using deeper network structures and extension of our framework to other landmark detection
problems.
ACKNOWLEDGMENTS
This work was supported in part by Samsung Advanced Institute of Technology (SAIT).
REFERENCES
[1] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla,
A., Bernstein, M., Berg, A. C., and Fei-Fei, L., “ImageNet Large Scale Visual Recognition Challenge,”
International Journal of Computer Vision (IJCV) 115(3), 211–252 (2015).
[2] Fukui, A., Park, D. H., Yang, D., Rohrbach, A., Darrell, T., and Rohrbach, M., “Multimodal compact
bilinear pooling for visual question answering and visual grounding,” CoRR abs/1606.01847 (2016).
[3] Lee, H., Park, M., and Kim, J., “Plankton classification on imbalanced large scale database via convolutional
neural networks with transfer learning,” in [2016 IEEE International Conference on Image Processing
(ICIP)], 3713–3717 (Sept 2016).
[4] Johnson, M., Schuster, M., Le, Q. V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F. B., Wattenberg,
M., Corrado, G., Hughes, M., and Dean, J., “Google’s multilingual neural machine translation system:
Enabling zero-shot translation,” CoRR abs/1611.04558 (2016).
[5] Park, M., Li, H., and Kim, J., “HARRISON: A benchmark on hashtag recommendation for real-world images
in social networks,” CoRR abs/1605.05054 (2016).
[6] Silver, D., Huang, A., Maddison, C. J., et al., “Mastering the game of Go with deep neural networks
and tree search,” Nature 529, 484–489 (Jan 2016).
[7] Yao, J., Wang, S., Zhu, X., and Huang, J., “Imaging biomarker discovery for lung cancer survival prediction,”
in [Proc. 19th International Conference on Medical Image Computing and Computer-Assisted Intervention
(MICCAI 2016)], MICCAI II, 649–657 (2016).
[8] Shin, H., Roberts, K., Lu, L., Demner-Fushman, D., Yao, J., and Summers, R. M., “Learning to read chest
x-rays: Recurrent neural cascade model for automated image annotation,” CoRR abs/1603.08486 (2016).