Deployment of a Scalable Single Shot Detector (SSD) Mobile Architecture for the Localization and Classification of Pneumonia Chest Radiographs
Daniel Fleury

ABSTRACT

Pneumonia is the leading cause of death in children under five years of age worldwide, accounting for more than 1.6 million deaths each year in this age demographic. A combined 18% of these deaths occur in children, and 99% of these complications occur in low-middle income countries with underserved clinical interventions. Consistent and scalable diagnostic protocols that eliminate problematic human false negatives/positives are essential in preventative clinical and pulmonary treatment measures. The upsurge of Convolutional Neural Network (CNN)-driven object detection tasks in the previous ~2-3 years has provided a new field of manipulation for radiographic image feature map detection.

This project investigates the potential of a low-latency, mobile-scaled Single Shot Multibox Detector (SSD) architecture in the localization and classification of Pneumonia-related radiographs. A dataset of ~5000 annotated and de-identified bacterial and viral Pneumonia chest X-Rays was derived from the NIH Clinical Center to deploy a compressed frozen inference model on both a standard Android device and a cloud-based web application. Data analysis employed varying confidence thresholds on Receiver Operating Characteristic (ROC) curves, regularized and converged localization-classification loss, and broad total loss values to frame parameters of sensitivity, specificity, and performance on diverse preidentified NIH validation datasets. Following a mini sample size validation of 200 randomized lung radiographs, SSD Mobilenet V1 attained an Area Under the Curve (AUC) of 0.93 with a high-threshold sensitivity of 94% and a specificity rating of 82% on a standard real-time Android video capture. The SSD model proves applicable in real-time diagnostics.

Categories and Subject Descriptors
Computer Vision, Deep Learning, Single Shot Detectors, Convolutional Neural Networks, Feed Forward, Object Localization

Proposal: Integration of SSD Architectures as a Low Latency Detection Framework

Pneumonia radiographs (Chest X-Rays) provide vague detection features due to unclear dispersions of opacity/"whiteness" across the photographs. At the same time, 1.6 million Pneumonia-related deaths and health burdens occur in low-middle income countries with scarce access to consistent and high-quality diagnoses. Although chest radiography provides a reliable gold standard in developed radiology departments, 3rd world regions without rapid and high-quality diagnostic protocols can complicate and misguide long-term treatments. Radiologists are prone to false detections of chest x-ray scans, leading to incorrect diagnoses. Introducing a low-latency detection method is crucial in eliminating potential physician false positives and negatives. A frozen, compressed, and weighted localization-classification model with adaptable real-time capabilities could prove crucial in assisting the traditional "human eye" detection process. The project proposes several goals:

Reliable Input Data: Training a traditional and weighted object detection framework on thousands of images derived from a certified NIH clinic repository.

A Small-Scale Localization Model: Manipulating and using a Single Shot Detector (SSD) localization-classification architecture.

Efficiency and Precision: Instead of a Regional Proposals Network, use a feedforward neural network with rapid multi-box regression and non-maximum suppression functions, resulting in faster response and detection time.

Response Time: Cutting down final detection time to ~5-10 seconds.

Accessibility and Scalability: Compress a real-time detection model for deployment as both a web application cloud service and on Android.

Introduction

Pneumonia escalates as an acute infection of the lungs with minimal onset values of prediction or precluding symptoms of severity. An individual with pneumonia experiences an accumulation of pus and fluid in their alveoli (small air sacs of the lung facilitating gas exchange), ultimately complicating pulmonary functions (e.g. it makes breathing difficult and limits oxygen intake). Chest x-ray scans reveal vague opacities and "white spots" either dispersed or concentrated in regions of the lungs. Pneumonia reveals several difficulties:

• Two critical bacterial and viral subtypes of pneumonia include Streptococcus pneumoniae and Haemophilus influenzae, with different proportions of global prevalence and minute size variations in chest x-ray diagnostic patterns (i.e. difficult-to-distinguish chest x-ray features).
• Bacterial (Streptococcal Pneumonia) accounts for the bulk of global Pneumonia cases.
• Pneumonia is the leading cause of death in children under five worldwide, associated with ~1.6 million deaths each year.
• South Asia and sub-Saharan Africa endure more than half of total suspected Pneumonia cases.
• Holistically, more than 99% of problematic cases occur in low-income nations and developing countries. Ultimately, children in low-income countries are 18 times more likely to die before the age of five.
A Gap in Detection: Features between normal lung types (left) and Pneumonia (right) become indistinguishable at particular disease stages (NIH Clinical Center)

We can acknowledge the following about R-CNN Meta Architectures:
• Uses selective search, a region proposal method, to find ~2000 Regions of Interest (ROIs); it merges similar pixel values together to detect borders and features in the image.
• The region proposal layer is followed by a classifier.
• Depends on thousands of ROI proposals for baseline accuracy.
• Slow inference time/long GPU processing times: an R-CNN must create ~2000 ROIs for each image and loop the classification process over them until baseline precision is met.
• Although architectures such as Fast R-CNN use improved ROI pooling to warp and simplify the feature map, real-time object detection is still disadvantaged due to an overwhelming number of processes for each ROI.
R-CNN: Regrouping of Pixels in Selective Search to find hundreds of ROIs (van de Sande et al. ICCV'11), ROI Pooling Process

Under-resourced health infrastructure in developing nations exacerbates potential global health burdens due to inconsistencies in diagnostic efficacy (e.g. false positive and negative detection from the physician's eye), an overwhelming number of patients, and limited access to rapid and low-latency analysis tools.

Single Shot Detector: Model Architecture

Convolutional Neural Networks (CNN) establish the backbone of image classification problems by manipulating pixel data and using a "sliding window" fashioned approach to discover unique and critical regions of the image. On a more complex scale, generic object detection architectures localize regions of interest in the image rather than giving a broad label. Conventional object detection systems implement some variant of hypothesizing bounding boxes, resampling the feature map of each box, and applying a high-quality classifier over the neural network outputs. Several years ago, regional-convolutional (R-CNN) neural networks made tasks of object detection and classification possible by deploying architectures such as Faster R-CNN.

A "Single Shot" approach produces a fixed-size collection of bounding boxes along with their confidence scores and applies a non-maximum suppression function to eliminate extraneous bounding box results.

The SSD Architecture:
• Drastically decreases inference time by simplifying a standard convolution into a depthwise and pointwise convolution.

Simple Augmentation: Training sets were preprocessed through augmentation with flip, black/white, Gaussian distortion, rotation, and skew functions. Augmentation outputs the following example changes:
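The augmentation functions named in this section (flip, rotation, Gaussian distortion) can be illustrated with a minimal sketch. This is not the project's preprocessing code: it assumes a grayscale image stored as a list of pixel rows, uses a fixed 90-degree rotation and a noise level chosen only for illustration, and all function names are hypothetical. It also shows that ground-truth boxes must be transformed together with the image.

```python
import random

def flip_horizontal(image):
    """Mirror each row of a 2-D grayscale image (list of pixel rows)."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate the image 90 degrees clockwise (assumed angle, for illustration)."""
    return [list(row) for row in zip(*image[::-1])]

def add_gaussian_noise(image, sigma=10.0, seed=0):
    """Add zero-mean Gaussian noise and clamp every pixel to [0, 255]."""
    rng = random.Random(seed)
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in image]

def flip_box(box, image_width):
    """Mirror an (xmin, ymin, xmax, ymax) box to match flip_horizontal."""
    xmin, ymin, xmax, ymax = box
    return (image_width - xmax, ymin, image_width - xmin, ymax)

if __name__ == "__main__":
    tiny = [[0, 64], [128, 255]]  # hypothetical 2x2 "radiograph"
    print(flip_horizontal(tiny))
    print(rotate_90(tiny))
    print(add_gaussian_noise(tiny))
    print(flip_box((10, 20, 30, 40), image_width=100))
```

A skew or black/white conversion would follow the same pattern: one function for the pixels and, where geometry changes, a matching function for the box coordinates.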

Detection Method   ~mAP   FPS   Batch Size   #Boxes   Input Resolution
Faster R-CNN       73.2   7     1            ~6000    1000X1000
Fast YOLO          52.7   155   1            98       448X448
YOLO (VGG16)       66.4   21    1            98       448X448
SSD300             74.3   46    1            8732     300X300
SSD512             76.8   19    1            24565    512X512
SSD300(2)          74.3   59    8            8732     300X300
SSD512(2)          76.8   22    8            24565    512X512
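The box counts in the table explain why the non-maximum suppression step matters: a single-shot detector emits thousands of candidate boxes per image, and overlapping duplicates must be pruned. The following is a minimal sketch of greedy non-maximum suppression, not the implementation used inside the SSD model. Boxes are assumed to be (xmin, ymin, xmax, ymax) tuples, and the 0.5 overlap threshold and function names are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest confidence first.

    A box is dropped when it overlaps an already kept box by more than
    iou_threshold (an assumed value, not taken from the paper).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

if __name__ == "__main__":
    # Hypothetical detections: two near-duplicates and one separate box.
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(boxes, scores))
```

The same iou function is the quantity the paper refers to as IoU when discussing bounding box validation.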
Annotation Using Provided Ground Truth Data: Training
data was manually annotated using ground-truth box
coordinates. Bounding box coordinates were outputted in a
".csv" extension format with width, height, class, and x
minimum-maximum/y min-max values. Annotation was
facilitated by using bounding box software (LabelImg). The
following image and table reveal bounding box creation and
resulting coordinate variables. The table represents only a
fraction of relevant bounding box coordinates.
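As a rough illustration of the annotation format described above, the sketch below reads rows with width, height, class, and x/y min-max values and groups the boxes by image. The exact column names, the column order, and the sample rows are assumptions made for this example and are not the project's data. Coordinates are scaled to the 0-1 range, a common step before boxes are serialized for training.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows; file names and box values are invented for illustration.
SAMPLE_CSV = """filename,width,height,class,xmin,ymin,xmax,ymax
scan_001.png,1024,1024,bacterial,256,300,512,600
scan_001.png,1024,1024,bacterial,600,320,800,640
scan_002.png,1024,1024,viral,128,256,768,896
"""

def load_annotations(csv_text):
    """Group boxes by image and scale coordinates to the [0, 1] range."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        width, height = float(row["width"]), float(row["height"])
        grouped[row["filename"]].append({
            "label": row["class"],
            "xmin": float(row["xmin"]) / width,
            "ymin": float(row["ymin"]) / height,
            "xmax": float(row["xmax"]) / width,
            "ymax": float(row["ymax"]) / height,
        })
    return dict(grouped)

if __name__ == "__main__":
    for name, boxes in load_annotations(SAMPLE_CSV).items():
        print(name, boxes)
```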

Methodology: Data Collection and


Deploying a small-scale SSD architecture that uses unique localization functions (e.g. non-maximum suppression and multi-box regression) requires reliable training data across a variety of cases. Certified medical repositories including the U.S. National Institutes of Health (NIH), The Society of Thoracic Radiology, and MD.ai provide rich training data across three classifiers: bacterial Pneumonia, viral Pneumonia, and a non-Pneumonic lung (normal). Convolutional Neural Networks function similarly to a vulnerable brain that absorbs input cases and reapplies them in real-world situations. Narrow and unvaried training data can cause and exacerbate overfitting, where the networks progressively learn to recognize only the input training data (i.e. they are unable to apply detection in new and real-world cases). Moreover, although training accuracy may rise considerably on a large chunk of training data, validation datasets are essential. Training and validation sets were split on a 3:2 ratio with 3000 training images and 2000 validation images.

Creation of tf.record: Although a .csv file type provides relevant image source and bounding data, weighted neural networks require a compressed binary file with relevant train and validation formats to process data. Tf.record files convert relevant bounding box coordinates and class names into a recognizable binary storage file that the Tensorflow architecture can use to establish feature maps and weights in correspondence with the images. The test.csv and train.csv filetypes were converted into their corresponding tf.record filetypes.

Model Training

Training the Architecture: Training was initiated for the feed-forward SSD architecture. Training was lengthy, requiring upwards of 3 days due to dependency on a 2 GB NVIDIA GT 750 ti. Three primary stages are established throughout training of the architecture: 1.) Input data is fed forward from the compressed tf.record file and class values (normal, bacterial, or viral pneumonia) are interpreted as "cond" (conditionals) that are used to weight the model. 2.) Image data is preprocessed, undergoing resizing into 512X512 and an additional stage of augmentation. 3.) Bounding box predictors and classifiers are created from previously weighted functions in the network. Training was finalized once classification loss converged at ~3.00 and localization loss converged at ~0.3.
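The 3:2 training/validation split described in the methodology (3000 training and 2000 validation images) can be reproduced with a short shuffle-and-cut routine. This is a minimal sketch under assumptions: the fixed seed, the function name, and the use of integer IDs in place of image paths are illustrative and not taken from the project.

```python
import random

def split_dataset(items, train_parts=3, val_parts=2, seed=42):
    """Shuffle items reproducibly and cut them at a train:val ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_parts // (train_parts + val_parts)
    return shuffled[:cut], shuffled[cut:]

if __name__ == "__main__":
    image_ids = range(5000)  # stand-in for ~5000 radiograph file names
    train, val = split_dataset(image_ids)
    print(len(train), len(val))
```

Shuffling before the cut keeps both sets mixed across the three classes, provided the source list is not already biased; a per-class (stratified) split would be a stricter alternative.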

Exporting Frozen Inference Graph (Prediction

The prediction model (actual classifier) was exported into a .pb format (protobuf, protocol buffer), which serializes and compresses our modified and weighted model into a small-scale and deployable format.

Deployment on Web Server and Android

The frozen inference graph was integrated into a Flask (microframework that supports Python) web app, available here: (https://amicii.herokuapp.com/)

Further, a real-time Android detection demo was

Binary Data Analysis: ROC Curve/AUC, Loss Scalars, and Interactive Dot Diagram

Although IoU (Intersection over Union) could have been practical for bounding box validation, this research problem escalated into a classification problem with several difficulties in identifying normal, bacterial, or viral instances of Pneumonia. Over 200 randomized test cases were executed for ROC curve/AUC analysis across varying thresholds. Additionally, real-time loss graphs (classification, localization, and total loss) were exported in order to visualize regression of the model and convergence of the loss values after thousands of steps. Loss values are quantified per step out of ~31,000-32,000 steps in total, accumulated over ~2-3 days. The following data reveals an interactive dot diagram (illustrating the distribution of binary results with 0=negative and 1=positive), ROC curve analysis (with Area Under the Curve), and loss scalars over ~31,000 steps.
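To make the ROC analysis concrete, the sketch below computes sensitivity, specificity, and a trapezoidal AUC from binary labels (0=negative, 1=positive) and detector confidence scores. It is an illustration only: the labels and scores are invented, the function names are hypothetical, and it does not reproduce the reported 0.93 AUC. It assumes both classes are present in the labels.

```python
def confusion_at_threshold(labels, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_at_threshold(labels, scores, threshold)
    return tp / (tp + fn), tn / (tn + fp)

def roc_auc(labels, scores):
    """Area under the ROC curve, sweeping every distinct score as a threshold."""
    points = [(0.0, 0.0)]  # (false positive rate, true positive rate)
    for threshold in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(labels, scores, threshold)
        points.append((1.0 - spec, sens))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return area

if __name__ == "__main__":
    labels = [1, 0, 1, 0]          # hypothetical ground truth
    scores = [0.9, 0.8, 0.7, 0.1]  # hypothetical confidences
    print(sensitivity_specificity(labels, scores, 0.5))
    print(roc_auc(labels, scores))
```

Raising the threshold trades sensitivity for specificity, which is why the analysis reports both values alongside the threshold-independent AUC.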
The loss value vs. step count reveals a discrepancy between
localization and classification loss: Localization loss may have
presented itself as a low-feature obstacle in training. In other words,
due to the consistent nature of lung shape and overall appearance of
the datasets, the architecture easily grasped where Pneumonic
pus/fluid was accumulating- even in unclear/vague cases.
Ultimately, localization loss converged ~0.3-0.4. However, in terms
of classification across three identifiers (normal, bacterial, and
viral), complicated feature maps on distinct instances of Pneumonia
versus normal cases created a threshold in loss values: Classification
loss converged ~3.0. Although classification loss was a hindrance in
baseline image recognition, ROC curve analysis across 200
validation cases marked an AUC of ~0.93 and an overall sensitivity
of 0.94 on a real-time Android platform. Example inferences on the
Android platform are shown below:
Conclusion

This project investigated advances in a small-scale real-time Single Shot Detector (SSD) and the manipulation of its feed-forward capabilities to function on both a cloud web application and an Android platform. Moreover, a responsive, real-time architecture that attains relatively high accuracy proved a plausible candidate in closing the gap of rampant false positives and negatives, especially in developing medical departments. Attainable project constraints and goals were recognized in the preliminary stage: certified and reliable input data, a small-scale/scalable localization model, relatively high efficiency and precision, accessibility, and fast response time. The SSD Mobilenet V1 model's ROC/AUC and real-time video capture analysis fulfilled hypotheses that an architecture which replaces overwhelming ROIs/proposals with non-maximum suppression and multi-box regression would cut down inference time while maintaining high accuracy. ROC curve analysis highlights an AUC of ~0.93, a sensitivity of ~0.94, and a specificity score of ~0.82 across ~200 validation cases. In contrast to conventional CNN meta-architectures, frozen inference graphs produced by the SSD V1 model were compact enough for web application and mobile Android usage. Ultimately, an SSD architecture proves applicable in high-feature medical detection in real-time deployment situations.

Clinical Value:

Model Attained Fast Response Time/Low Inference Time: Confirming detection requires ~5-10 seconds on both a video capture and web application instance; physicians can receive real-time responses from the Android application.

Static Model that serves as Reference for Diagnostics: Although not purposed to single-handedly replace diagnostic protocols, the inference model uses static weights derived from a variety of NIH clinical repositories, thus reducing potential false positives/negatives.

Scalability and Data Addition: Radiologists and physicians can reinforce the network by adding local clinical chest x-ray scans.

Application in Low-Income and Developing Nations: Under-resourced radiology departments without a quality network of supporting staff and quality training may utilize and modify the model to assist in critical cases of Pneumonia.

Bibliography

Fuentes, A. (2018). High-Performance Deep Neural Network-Based Tomato Plant Diseases and Pests Diagnosis System With Refinement Filter Bank. Frontiers in Plant Science, 1-2. doi:10.3389/fpls.2018.01162

Grel, T. (2017, February 28). Region of interest pooling explained. Retrieved July 3, 2018, from https://deepsense.ai/region-of-interest-pooling-explained/

Hui, J. (March 27). Object detection: Speed and accuracy comparison (Faster R-CNN, R-FCN, SSD, FPN, RetinaNet and YOLOv3). Retrieved July 20, 2018, from https://medium.com/@jonathan_hui/object-detection-speed-and-accuracy-comparison-faster-r-cnn-r-fcn-ssd-and-yolo-5425656ae359

Huang, J., & Fathi, A. (2017). Speed/accuracy trade-offs for modern convolutional object detectors. Arxiv, 1-21. Retrieved July 20, 2018, from

J. (2018, January 18). Faster R-CNN: Down the rabbit hole of modern object detection. Retrieved July 3, 2018, from https://tryolabs.com/blog/2018/01/18/faster-r-cnn-down-the-rabbit-hole-of-modern-object-detection/

Jha, P. (2013). Disease Control Priorities in Developing Countries, 3rd Edition Working Paper #2. Economic Evaluation for Health, 1-66. Retrieved July 3, 2018.

Liu, W. (2015). SSD: Single Shot MultiBox Detector. Arxiv: Computer Vision and Pattern Recognition, 1-17. doi:10.1007/978-3-319-46448-0_2

Liu, W. (2018). SSD: Single Shot MultiBox Detector. 1-1. Retrieved July 20, 2018, from http://www.eccv2016.org/files/posters/O-1A-02.pdf

Mustamo, P. (2018). Object detection in sports: TensorFlow Object Detection API case study. University of Oulu: Faculty of Science, 1-43. Retrieved July 3, 2018.

Priority diseases and reasons for inclusion. (n.d.). World Health Organization (WHO), 1-4. Retrieved July 20, 2018, from Ch6_22Pneumo.pdf.

SciELO Public Health Library 2001

Trends in Pneumonia and Influenza Morbidity and Mortality. (2015). American Lung Association, 1-16. Retrieved July 20, 2018, from

van de Sande et al. ICCV'11

Xu, J. (2017). Deep Learning for Object Detection: A Comprehensive Review. Towards Data Science. Retrieved July 3, 2018, from objectdetection-a- comprehensive-review-73930816d8d9