
DOI: 10.1111/mice.12580

ORIGINAL ARTICLE

Uncertainty-assisted deep vision structural health monitoring

Seyed Omid Sajedi, Xiao Liang

Department of Civil, Structural and Environmental Engineering, University at Buffalo, the State University of New York, Buffalo, NY 14260, USA

Correspondence: Xiao Liang, Department of Civil, Structural and Environmental Engineering, University at Buffalo, the State University of New York, Buffalo, NY. Email: liangx@buffalo.edu

Abstract
Computer vision leveraging deep learning has achieved significant success in the last decade. Despite the promising performance of existing deep vision inspection models, the extent of their reliability remains unknown. Structural health monitoring (SHM) is a crucial task for the safety and sustainability of structures, and thus prediction mistakes can have fatal outcomes. In this paper, we use Bayesian inference for deep vision SHM models, where uncertainty is quantified using Monte Carlo dropout sampling. Three independent case studies for crack detection, local damage identification, and bridge component detection are investigated using Bayesian inference. Aside from better prediction results, the two uncertainty metrics, variation in softmax probability and entropy, are shown to correlate well with misclassifications. However, modifying the decision or triggering human intervention based on raw uncertainty outputs can be challenging. Therefore, the concept of surrogate models is proposed to develop models for uncertainty-assisted segmentation and prediction quality tagging. The former refines the segmentation mask and the latter is used to trigger human interventions. The proposed framework can be applied to future deep vision SHM frameworks to incorporate model uncertainty in the inspection process.

1 INTRODUCTION

Visual inspections are an indispensable part of structural health monitoring (SHM). Condition assessments can be carried out periodically as part of a maintenance program or in near real time after extreme events (Sajedi & Liang, 2019b, 2020a). Considering factors such as time–cost constraints, reliability, and the life-safety risks of human-based inspections, there has been a growing incentive for automation in SHM. Extracting useful information from images is a challenging task, especially in the presence of noisy and complex backgrounds. Deep learning algorithms have shown great promise in dealing with real-world images, and more advanced ones are continuously being developed (e.g., Chen, Girshick, He, & Dollár, 2019; Jégou, Drozdzal, Vazquez, Romero, & Bengio, 2017). This progress has motivated researchers to investigate potential applications of machine and deep learning in civil engineering. There have been efforts to apply these algorithms to material property prediction (Rafiei, Khushefati, Demirboga, & Adeli, 2017), asphalt surface analysis (Tong, Gao, Sha, Hu, & Li, 2018), recovery of lost sensor data (Oh, Glisic, Kim, & Park, 2019), early earthquake warning systems (Rafiei & Adeli, 2017a), and construction management (Rafiei & Adeli, 2016, 2018a).


More specifically, SHM research has significantly benefited from utilizing deep learning to process information for assessing structural conditions. This input information can come from various sources such as vibration (Azimi & Pekcan, 2019; Eltouny & Liang, 2020; Rafiei & Adeli, 2017b, 2018b; Sajedi & Liang, 2020b, 2020c), acoustic emissions (Ebrahimkhanlou, Dubuc, & Salamone, 2019), and strain measurements (Karypidis, Berrocal, Rempling, Granath, & Simonsson, 2019). While these records are good indicators of structural damage (Amezquita-Sanchez, Park, & Adeli, 2017), acquiring them commonly requires special instrumentation. Hence, their availability is commonly limited to specific case studies, simulations, and lab experiments. This has made the generalization capability of such data-driven models challenging, especially for training deep learning models that require a substantial number of observations. In contrast, images are relatively more accessible both in terms of quality and quantity. This is partially thanks to camera-equipped drones that can easily capture images that are challenging to obtain by a human inspector (e.g., Liu, Nie, Fan, & Liu, 2020). Moreover, a significant amount of research is dedicated to converting raw visual records into information that can be utilized for the inspection and monitoring of structures (Spencer, Hoskere, & Narazaki, 2019).

Vision-based algorithms are generally investigated in two main categories: object detection and damage classification. Identifying structural components has been done in the forms of predicting objects' bounding boxes (Liang, 2019) and pixel-wise segmentation of the whole scene (Narazaki et al., 2020). Information obtained from these models can be used for further inspection guidance. The second category (i.e., damage detection) includes identifying various types of damage. Pavement defects and road conditions have been studied using different algorithms, including probability generative models and support vector machines (Ai, Jiang, Kei, & Li, 2018), convolutional neural networks (CNNs) (Bang, Park, Kim, & Kim, 2019), and recurrent neural networks (A. Zhang et al., 2017). Identifying cracks has also been the focus of several SHM studies using either bounding boxes (Cha, Choi, & Büyüköztürk, 2017; Deng, Lu, & Lee, 2020; Xue & Li, 2018) or semantic segmentation (Sajedi & Liang, 2019a; Yang et al., 2018). Other types of structural defects, such as delamination (Cha, Choi, Suh, Mahmoudkhani, & Büyüköztürk, 2018), cavities (R. Li, Yuan, Zhang, & Yuan, 2018; C. Zhang, Chang, & Jamshidi, 2019), fatigue cracks (Hoskere, Narazaki, Hoang, & Spencer, 2018), and efflorescence (S. Li, Zhao, & Zhou, 2019), or a subset of them, are identified using deep learning architectures.

Taking a closer look at this literature, especially from the past three years, it is evident that deep learning for visual inspections is becoming a trend in the SHM research community. Artificial intelligence may revolutionize SHM research and practice. Nevertheless, certain facts should be kept in mind: the proposed models in the literature are data-driven, and thus their performance is highly dependent on the quantity and quality of the training data. While there have been efforts to improve overall performance, such as using transfer learning and data augmentation (Gao & Mosalam, 2018), deep learning algorithms are not mistake-free.

Erroneous predictions by vision models have caused fatal accidents in the past, as when the side of a trailer was mistakenly classified as the sky by a self-driving car (McAllister et al., 2017). We believe that the consequences of misclassifications in structural inspections could be far more severe because SHM investigates structural damage and safety directly. Therefore, it is necessary to have a measure of the model's confidence. This need is further highlighted considering the much smaller size of image datasets for civil engineering applications compared to those of computer science, like ImageNet (Deng et al., 2009).

In this paper, we leverage the concept of deep Bayesian neural networks (Gal & Ghahramani, 2015) for SHM vision tasks. The model's uncertainty output, alongside its predictions, can be used to trigger human intervention when the model uncertainty is high (e.g., when monitoring damage in a nuclear power plant). The following section provides a brief underlying theory for Bayesian segmentation networks. In Section 3, we then explain the concept of surrogate models as tools that aid in the interpretation of uncertainty. Later, in Section 4, the datasets for the three independent case studies are briefly explained. In Section 5, a sensitivity analysis is performed to provide insight into the selection of dropout hyperparameters. The experimental validations on the three case studies are given in Section 6, which also shows the applications of the two surrogate models. Finally, the paper is concluded by summarizing the key points.

2 DEEP LEARNING FRAMEWORK WITH BAYESIAN INFERENCE

This section gives a brief theoretical background on Bayesian inference and how model uncertainty can be quantified in deep segmentation models. Later, the deep learning architecture used in this paper is explained.
2.1 Inference and uncertainty quantification

Uncertainty should be an essential output of any damage diagnostic system that is intended for automation. Knowing the confidence with which we can trust the SHM model's predictions is essential for decision-making. Bayesian probability theory (Ahmadlou & Adeli, 2010; Koller & Friedman, 2009) offers a mathematically grounded framework to reason about the uncertainty of a data-driven model. However, such methods are computationally expensive, especially for real-time implementation in computer vision. Considering the importance of model uncertainty on the one hand, and the challenge of implementing Bayesian machine learning for computer vision on the other, Gal and Ghahramani (2015) proposed an elegant solution. They showed that the use of the dropout operator in neural networks of arbitrary type approximates Bayesian inference in a deep Gaussian process statistical model. In what follows, we provide a brief theoretical overview of how uncertainty is dealt with in the proposed approach.

CNNs are proven to be effective in dealing with image data. However, the learnable parameters of the kernels (W) are deterministic values that are optimized by minimizing a loss function calculated from the raw image input (X) and the desired output labels (Y). To reason about a model's uncertainty, W, and subsequently the predicted output probabilities, should be treated as random variables. With that said, obtaining the posterior distribution of the convolution kernels given the existing labeled data, p(W|X, Y), is of interest. This distribution is intractable and cannot be analytically calculated (Gal & Ghahramani, 2015). Instead, an approximate variational distribution (Graves, 2011), q(W), is used, determined by minimizing the Kullback–Leibler (KL) divergence as a measure of similarity between this distribution and the full posterior:

\mathrm{KL}\left( q(\mathbf{W}) \,\|\, p(\mathbf{W} \mid \mathbf{X}, \mathbf{Y}) \right) \quad (1)

which can be expressed as the minimization of the objective function L_v with respect to the variational distribution q(W):

L_v = -\int q(\mathbf{W}) \log p(\mathbf{Y} \mid \mathbf{X}, \mathbf{W}) \, \mathrm{d}\mathbf{W} + \mathrm{KL}\left( q(\mathbf{W}) \,\|\, p(\mathbf{W}) \right) \quad (2)

By considering a Bernoulli distribution for dropping individual inputs before each convolution layer, and by performing Monte Carlo integration, the objective cost in (2) is equivalent to the training loss of the deep learning model (Gal & Ghahramani, 2016):

L_{\mathrm{do}} = \frac{1}{N_x} \sum_{i=1}^{N_x} E(\mathbf{Y}_i, \hat{\mathbf{Y}}_i) + \lambda \sum_{i=1}^{L} \left( \| \mathbf{W}_i \|^2 + \| \mathbf{b}_i \|^2 \right) \quad (3)

where E(., .) denotes the distance between the predictions and the real output, which is captured using binary or categorical cross-entropy in this paper, and the second term is the L2 regularization of the weights and bias terms (b_i). N_x is the number of observations in the training set and λ is a scaling hyperparameter.

Beyond the training phase, models need to be evaluated. In terms of practical implementation, the main difference between standard and Bayesian inference is in the way dropout is used. With a prior probability, this operator randomly sets a fraction of its input elements to zero as a way to reduce overfitting while training a neural network. Standard dropout uses the weight averaging technique (Srivastava, Hinton, Krizhevsky, Sutskever, & Salakhutdinov, 2014) at test time such that the deep learning model produces deterministic output. In Bayesian inference, the output class probabilities are, as intended, random variables. Therefore, the probability of a testing input observation (x*) belonging to the class y_i can be estimated as:

p(y_i \mid \mathbf{x}^*, \mathbf{X}, \mathbf{Y}) = \int p(y_i \mid \mathbf{x}^*, \mathbf{X}, \mathbf{W}) \, q(\mathbf{W}) \, \mathrm{d}\mathbf{W} \quad (4)

In pixel-wise image segmentation, a regular prediction output (y) is a tensor of shape (height, width, N_b), where the last channel refers to the softmax probabilities of class i (S_i, i ∈ {1, 2, ..., N_b}). Feeding a specific input image to a deep learning model with standard inference will result in deterministic S_i values, the maximum of which is considered the decision probability. In contrast, activating the dropout layers at inference time yields a prediction output where the class probabilities can be deemed random variables. For a single observation, N_s Monte Carlo samples can be drawn to form a stacked output tensor with the shape (height, width, N_b, N_s). The expected value (mean) of the S_i samples is then used for the inference, which is equivalent to the integration in (4). Therefore, the probability of a pixel observation (y) belonging to class i can be expressed as:

p(y = i \mid \mathbf{x}^*, \mathbf{X}, \mathbf{Y}) \approx E(S_i) = \frac{1}{N_s} \sum_{n=1}^{N_s} S_i^n \quad (5)

This equation highlights the key difference between the Bayesian and standard inferences. The latter directly uses the constant S_i values obtained from a neural network without dropout sampling, while the Bayesian inference utilizes the mean of the N_s samples.
Two metrics are introduced to quantify model uncertainty. Kendall and Gal (2017) proposed using the entropy (H) as a measure of model epistemic uncertainty:

H(p) = -\sum_{i=1}^{N_b} p_i \log(p_i) \quad (6)

Moreover, the class softmax variance can also be used as an indicator of the model uncertainty for each class (Kendall, Badrinarayanan, & Cipolla, 2015). This metric is obtained by taking the variance of the Monte Carlo samples (the S_i's) with respect to each class. A relatively high variance in the softmax probabilities can indicate a model's lack of confidence in its predictions. In this paper, we use the class softmax standard deviation (CSSD) for better numerical stability in the input to the models that will be explained in Section 3:

\mathrm{CSSD}_i = \sqrt{ \frac{1}{N_s - 1} \sum_{n=1}^{N_s} \left[ S_i^n - E(S_i) \right]^2 } \quad (7)

As the second overall measure of model uncertainty, the mean of the per-class softmax standard deviations (MCSSD) is used. In segmentation, the entropy and MCSSD can be obtained for all pixels. Therefore, the two measures of uncertainty are presented for all pixels in the form of a 2D tensor with the same dimensions as the final segmentation mask.

It should be emphasized that there is a fundamental difference between softmax probabilities and model confidence. The last softmax layer provides an approximation of the relative class probabilities rather than the absolute model uncertainty that is captured using Monte Carlo dropout. For example, a predicted label with a high softmax probability may also have a high model uncertainty at the same time.
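Both measures follow directly from the stacked Monte Carlo samples produced above; a minimal NumPy sketch (the helper name and the epsilon guard are ours):

```python
import numpy as np

def uncertainty_metrics(samples, eps=1e-12):
    """Per-pixel uncertainty measures from a stack of Monte Carlo softmax
    samples of shape (H, W, n_classes, n_samples)."""
    mean_softmax = samples.mean(axis=-1)  # E(S_i) per pixel and class
    # Entropy of the mean class probabilities, Equation (6).
    entropy = -(mean_softmax * np.log(mean_softmax + eps)).sum(axis=-1)
    # Class softmax standard deviation, Equation (7); ddof=1 gives the
    # unbiased 1/(N_s - 1) normalization.
    cssd = samples.std(axis=-1, ddof=1)   # (H, W, n_classes)
    mcssd = cssd.mean(axis=-1)            # mean over classes per pixel
    return entropy, cssd, mcssd
```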
2.2 Deep learning architecture

In this paper, the fully convolutional (FC) DenseNet is leveraged to perform Bayesian inference. This neural network combines the underlying ideas of U-Net (Ronneberger, Fischer, & Brox, 2015) and DenseNets (Huang, Liu, Van Der Maaten, & Weinberger, 2017). The architecture has achieved state-of-the-art performance on benchmark datasets for urban scene segmentation (Jégou et al., 2017) and is considered among the most successful ones for the given task. Similar to most image segmentation algorithms, FC-DenseNet automatically extracts features while reducing the spatial resolution of the feature maps by performing multiple pooling operations in the downsampling path. The spatial resolution of the input is later recovered in the upsampling path. The two paths are connected with a bottleneck in between. What makes this architecture unique is the presence of sophisticated connectivity patterns where the input from previous layers is concatenated with the extracted feature maps. Moreover, several skip connections help recover fine-grained information in the upsampling process.

The building blocks of FC-DenseNet are dense blocks (DBs), transition up (TU) units, and transition down (TD) units. The details of each unit are shown in Figure 1. DBs are composed of modular stacked layers. Each module contains a sequence of batch normalization, ReLU activation, convolution, and dropout layers (Figure 1a). The output of each module is then concatenated with its input. The first convolution extracts 48 filters, while the number of filters inside the DB convolutions (the growth rate) is 16. The TD units are similar to a modular stacked layer but have an additional max-pooling operation to reduce the spatial resolution (Figure 1b). The number of filters in their corresponding 1 × 1 convolution is equal to the final output of the previous DB. Finally, the TU units are simply transposed 3 × 3 convolutions (Long, Shelhamer, & Darrell, 2015).

[FIGURE 1: FC-DenseNet building blocks: (a) modular stacked layer, (b) transition down unit, (c) transition up unit]

In this paper, we investigate three different case studies. The first dataset deals with binary crack detection. The second one is dedicated to localizing damage. Finally, a multiclass task of bridge component recognition is investigated as the third case study. Given the different complexities, the original architecture is tailored for computational efficiency by modifying the number of modular stacked units inside the DBs. The details are given in Table 1.
TABLE 1 Dense block configuration

Block ID          | Number of modular stacked layers
DB-1, DB-11       | 2
DB-2, DB-10       | 3
DB-3, DB-9        | 4
DB-4, DB-8        | 5
DB-5, DB-7        | 6
DB-6 (bottleneck) | 8

The general structure of the adopted deep learning architecture for Bayesian inference is illustrated in Figure 2. Recall that the dropout layers are active in both the training and inference phases. For a single input image, multiple Monte Carlo dropout samples are drawn to generate a bin of softmax probabilities, from which the prediction and uncertainty metrics can be obtained.

[FIGURE 2: Bayesian inference with the FC-DenseNet architecture. The input passes through a first convolution and the downsampling path (DB-1 to DB-5 with concatenations and transition-down units), the bottleneck (DB-6), and the upsampling path (DB-7 to DB-11 with transition-up units, concatenations, and skip connections), followed by a last convolution and softmax; the Monte Carlo dropout samples form a softmax bin from which the prediction, MCSSD, and entropy are obtained (parts of this figure are inspired by Jégou et al., 2017)]

All three models are trained using the Keras API (Chollet, 2015) with a batch size of 2 on computers equipped with NVIDIA GTX 1070 and GTX 1080 GPUs, each with 8 GB of memory. The observations are randomly shuffled; 80% are considered for training and the remainder are used as the test set. Moreover, 20% of the training observations are held out as a validation set. The model with the lowest validation loss is selected for testing. It should be noted that Bayesian inference is used for both the validation and test sets.
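The shuffling and splitting protocol can be sketched as follows (a minimal illustration; the function name and seed are our own choices, not the authors' code):

```python
import numpy as np

def split_indices(n_obs, seed=0):
    """Shuffle the observations and split 80/20 into train/test, then
    hold out 20% of the training observations for validation. The seed
    is an arbitrary choice for reproducibility."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_obs)
    n_train = int(0.8 * n_obs)
    train, test = idx[:n_train], idx[n_train:]
    n_val = int(0.2 * len(train))
    val, train = train[:n_val], train[n_val:]
    return train, val, test
```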
Based on our experiments, the Nadam optimizer with exponential weight decay (applied per epoch) yields faster convergence with fewer epochs. L2 regularization (Goodfellow, Bengio, & Courville, 2016) is also used for better numerical stability and to reduce the chances of overfitting. All models described in this section are trained for a maximum of 200 epochs with early stopping after 15 epochs without improvement in the validation loss. A learning rate of 1.0e-4 and an exponential decay rate of 0.9996 are also used in training. Further details on the dropout hyperparameters are discussed in Section 5.
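A minimal Keras sketch of this training configuration is given below; `model`, the arrays `x_train`/`y_train`/`x_val`/`y_val`, and the checkpoint path are placeholders, and the per-layer L2 regularization (set via `kernel_regularizer` when building the network) is omitted here:

```python
from tensorflow.keras import callbacks, optimizers

model.compile(optimizer=optimizers.Nadam(learning_rate=1e-4),
              loss="categorical_crossentropy")
model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    batch_size=2, epochs=200,
    callbacks=[
        # exponential decay of 0.9996 applied to the learning rate per epoch
        callbacks.LearningRateScheduler(lambda epoch: 1e-4 * 0.9996 ** epoch),
        # stop after 15 epochs without improvement in the validation loss
        callbacks.EarlyStopping(monitor="val_loss", patience=15),
        # keep the model with the lowest validation loss for testing
        callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                  save_best_only=True),
    ],
)
```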
3 APPLICATIONS OF MODEL UNCERTAINTY

Bayesian inference has two potential advantages for deep vision SHM. First, it can potentially improve the overall performance of a model. Second, the predictor is equipped with a measure of uncertainty. This output is crucial for the condition assessment of critical structures such as nuclear facilities, where missing a simple defect may lead to catastrophic outcomes. Quantification of uncertainty helps further minimize the risk of using data-driven models for safety-critical inspections. Being informed of a lack of confidence in the model's predictions, a decision-maker can further explore the scene. However, merely quantifying the uncertainty may not be sufficient for practical application in the SHM industry. This insufficiency stems from two important factors.

First, the majority of the deep learning frameworks proposed in the literature are designed and optimized to partially automate the SHM process. Assigning a human operator to monitor the uncertainty output and modify the predictions is not feasible for SHM tasks that require processed information in real time. For example, the automatic guidance of unmanned aerial vehicles (UAVs) for inspections (e.g., Liang, Zheng, & Zhang, 2018; Zheng, Chen, & Liang, 2019) will not be automatic anymore if it is constantly interrupted by manual instructions. Moreover, inspecting hundreds of frames generated per second is laborious and potentially impossible in a timely manner.

The second challenge is the interpretation of uncertainty. Identifying prediction mistakes may not be visually evident from the values expressed in Equations (6) and (7). The upcoming case studies in this manuscript demonstrate this complexity, especially for the last two datasets, which include real-world objects (such as trees, rivers, and cars) and distortions (e.g., the presence of shadows and different camera angles).

The classification problems in this paper are based on semantic segmentation, where the output prediction and uncertainty are generated for thousands of pixels in a single image. The incentive behind this selection is the presence of a relatively large uncertainty output that is both intellectually challenging and laborious for a human operator to process. To tackle this issue, we develop two concept solutions that process the raw model uncertainty and provide valuable information. The two are explained in what follows.

3.1 Uncertainty-assisted segmentation (UAS)

The uncertainty masks correlate well with the boundaries of the regions of interest (i.e., cracks, local damage, and bridge components). Moreover, compared with correct predictions, a considerable portion of misclassified pixels have relatively high model uncertainty. These correlations visually make sense because one can perform a side-by-side comparison between the uncertainty mask and the ground truth.

When exploring the field or performing an additional expert analysis is not possible, model uncertainty is the only available source of information to potentially enhance the predictions. In such circumstances, a human operator may also face challenges in how to utilize the MCSSD or entropy for a better estimate of the real conditions. For example, this utilization is very complicated (or even impossible) for a human if the CNNs are used for regression tasks such as pixel depth estimation.

In this section, we propose a way to benefit from the uncertainty mask without human intervention. This goal is achieved by introducing a second data-driven model to automatically interpret the MCSSD and entropy masks. We call it a surrogate model because identifying uncertainty rules for thousands of pixels, which may or may not be correlated, is not directly possible in an image. After training the original model and calibrating its learnable parameters, the uncertainty output is available. Therefore, a second input is constructed by concatenating the original image with the mean softmax probabilities (E(S_i)), the CSSD_i for all classes, and the entropy in stacked channels (see Figure 3 and the sketch below). The surrogate model is trained in a supervised manner with this second input and the same labeled pixels as the output. Different from the initial Bayesian segmentation, the surrogate model has access to the uncertainty output. It is trained to learn the underlying mapping between the second input (image data combined with the early prediction uncertainties) and the ground truth. This process is denoted uncertainty-assisted segmentation (UAS).

In this paper, for the sake of comparison, we use the same architecture as the segmentation network to train the UAS surrogate model.
[FIGURE 3: UAS surrogate model with the second input. The raw input (e.g., an image) passes through the Bayesian segmentation network, whose outputs, (A) the mean softmax probabilities E(S_i), (B) the CSSD_i for all classes, and (C) the entropy, are stacked with the raw input to form the second input; the surrogate model then produces the refined prediction]

The only difference is the construction of the uncertainty input data. This helps ensure that the improvement in the results is solely due to the proposed novel input. Therefore, factors such as a deeper architecture or a change in the selection of hyperparameters are not a potential reason for the refined predictions. Given this similarity, the second model is also Bayesian. This assumption is appropriate for the comparative studies. However, future work can focus on optimizing the surrogate model itself for accuracy and computational efficiency in real-time implementations.

3.2 Prediction quality tagging

The UAS refines the prediction of the original deep learning architecture. However, depending on the design of the data-driven model and the quality of the dataset, such improvements could be limited. In other words, chances for mistakes (while reduced) still exist.

The original Bayesian deep learning model is trained and validated based on hyperparameters that optimize the performance given the existing data. As a result, the majority of output predictions are often acceptable if the model is trained properly. Nevertheless, a decision-maker could be concerned about the less common situations in which the model has made mistakes. This issue is of great importance, especially in cases where the deep learning model is designed to localize damage when monitoring critical infrastructure. For example, structural defects may propagate, and failure to detect damage in its early stages could lead to costly or unjustified repairs.

When achieving a perfect data-driven model is not possible, the alternative is to equip the automation framework with a system that triggers human intervention. In this section, the concept of prediction quality tagging (PQT) is proposed with the idea of building such a warning system. PQT provides insight into the robustness of the predictions based on the model uncertainty. Similar to UAS, a surrogate model is developed that takes stacked channels of the softmax probabilities, MCSSD, entropy, and, additionally, the predicted labels as input. The goal of this PQT surrogate model is to label each output of the original model based on the quality of its predictions, judged by a prior metric selected by decision-makers. For example, the PQT surrogate can be calibrated (trained) to accept or reject an output based on a threshold on the global prediction accuracy among the labeled pixels in a segmentation mask. As a supervised learning task, the PQT labels are generated for the training, validation, and test sets.

A CNN architecture, inspired by VGG-16 (Simonyan & Zisserman, 2014), is adopted for the PQT surrogate (Figure 4). The architecture has been optimized for speed and robustness. The network starts with four convolution computational blocks (CBs). The 3D output tensor after the last CB is then reshaped into a vector to be fed to a sequence of fully connected blocks (FCBs). The surrogate model ends with a sigmoid activation to predict the likelihood of the PQT labels. Dropout with a probability of 40% is used before the dense layers to alleviate overfitting. Leaky ReLU activation is also utilized after the convolution and dense layers with a 0.05 negative slope coefficient. Further details on the application of PQT are given in Section 6.2.
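A Keras sketch of this surrogate is shown below; the exact filter and unit counts are read off Figure 4 and may differ from the authors' implementation:

```python
from tensorflow.keras import layers, models

def build_pqt_surrogate(input_shape, filters=(64, 128, 256, 512),
                        fc_units=(1024, 512, 256, 64)):
    """VGG-16-inspired PQT surrogate (Figure 4); filter/unit counts are
    assumptions based on the figure."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for f in filters:   # CB: convolution + batch norm + leaky ReLU + max pooling
        x = layers.Conv2D(f, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(alpha=0.05)(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)  # reshape the 3D tensor into a vector
    for u in fc_units:  # FCB: dropout (40%) + dense + leaky ReLU
        x = layers.Dropout(0.4)(x)
        x = layers.Dense(u)(x)
        x = layers.LeakyReLU(alpha=0.05)(x)
    # Sigmoid likelihood of the PQT label (accepted vs. rejected).
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return models.Model(inputs, outputs)
```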
4 IMAGE DATASETS

Three different datasets are investigated with respect to the proposed model uncertainty for deep vision SHM. A brief description of the real-world images used for each case study is given in what follows. Each dataset has specific challenges that will be discussed later in the paper.
[FIGURE 4: PQT surrogate model. The second input (similar to UAS, plus the prediction mask as an extra channel) passes through a series of CBs, is reshaped into a vector, passes through a series of FCBs, and ends with Dense(1) and a sigmoid that outputs the prediction quality tag (accepted or rejected). (a) CB(n): convolution with n filters + batch normalization + leaky ReLU + max pooling; (b) FCB(n): dropout + fully connected layer with n units + leaky ReLU; (c) Dense(1): the last fully connected layer with one output unit]

4.1 Crack forest (I)

Binary segmentation of cracks is one of the well-studied areas of automated SHM. For the first case study, the crack forest dataset (Shi, Cui, Qi, Meng, & Chen, 2016) is utilized. The dataset includes 118 color images with a resolution of 320 × 480 pixels. The images were taken by a camera-equipped smartphone with a nearly constant photography setup (e.g., distance, exposure, aperture). The dataset reflects information on road surface conditions. The ground truth is processed to include a binary mask with two classes: crack and background.

4.2 Local structural damage (II)

The second case study is dedicated to localizing the presence of damage. The dataset is obtained from Liang (2019) and includes 436 images with a resolution of 430 × 400 pixels. The observations come from two main sources: laboratory experiments on reinforced concrete columns and postearthquake damage records of bridges. For computational efficiency, the images are resized to 215 × 200 pixels.

4.3 Bridge components (III)

For the third case study, we propose an example that deals with object recognition rather than damage diagnosis. As mentioned in the literature review, the automatic identification of different objects (e.g., structural components) is crucial, especially in guiding UAVs for inspection. Bridges are important parts of the civil infrastructure that would benefit from such camera-equipped UAV inspections. To this end, the third dataset is composed of 236 images of highway bridges. While originally used to identify bridge columns with bounding boxes (Liang, 2019), the bridge components are pixel-wise labeled by the authors for semantic segmentation.

5 SENSITIVITY ANALYSES ON DROPOUT HYPERPARAMETERS

The dropout probability ratio (P_do) and the number of Monte Carlo samples (N_s) are the two main hyperparameters that affect the performance of Bayesian inference. For example, without dropout (P_do = 0), Bayesian inference is meaningless. In contrast, very high dropping probabilities will lower the convergence speed and increase the chances of underfitting. Moreover, a proper selection of N_s for the Monte Carlo integration in (5) is important. Small sample sizes could reduce the stability of the model and the quality of the output predictions. Nevertheless, the inference time grows almost linearly with N_s, which can affect the real-time implementation of the deep learning architecture. The focus of this section is to perform a series of extensive sensitivity analyses to provide insight into the selection of these hyperparameters, and to select a final model for each of the three case studies. The results shown in this section are therefore based on the validation set. The overall performance on the unseen test sets will be discussed later in Section 6.

5.1 Dropout probability (P_do)

The effects of the dropout probability are inspected by performing a series of sensitivity analyses where 14 values of P_do (%) ∈ {1, 2, 5, 8, 10, 12, 15, 20, 25, 30, 40, 50, 60, 70} are considered.
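Together with the Monte Carlo sample-size bin introduced in the next paragraph, this sweep amounts to 70 configurations per case study; a trivial sketch (variable names ours):

```python
from itertools import product

P_DO = [0.01, 0.02, 0.05, 0.08, 0.10, 0.12, 0.15,
        0.20, 0.25, 0.30, 0.40, 0.50, 0.60, 0.70]  # 14 dropout probabilities
N_S = [10, 20, 50, 100, 200]                       # Monte Carlo sample sizes

# 14 x 5 = 70 (P_do, N_s) combinations trained and evaluated per case study.
grid = list(product(P_DO, N_S))
assert len(grid) == 70
```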
The training process is accompanied by an evaluation on the validation set per epoch because the model with the lowest validation loss is used for further testing. As a result, it is necessary to select N_s as a prior. A second bin of N_s ∈ {10, 20, 50, 100, 200} is utilized to address this issue. In total, 70 models are trained and evaluated for each case study considering all the possible combinations (Figure 5a–c). For each dataset, the model with the highest mean class accuracy (MCA) is selected. While the optimum value of P_do depends on both the deep learning architecture and the characteristics of the dataset, we find that values between 5% and 20% yield reasonable performance for the three datasets.

[FIGURE 5: Sensitivity analysis plots for dropout hyperparameters. Panels (a)-(c): validation MCA (%) versus P_do for Datasets (I)-(III); panels (d)-(f): validation MCA (%) versus N_s for Datasets (I)-(III)]

TABLE 2 The selection of the P_do hyperparameter for the datasets

Dataset | Image size | P_do (%) | T20 (s) | T50 (s)
I       | 320 × 480  | 20       | 1.31    | 3.11
II      | 215 × 200  | 5        | 0.38    | 0.94
III     | 224 × 224  | 15       | 0.44    | 1.06

5.2 Monte Carlo sample size

To investigate the effects of the sample size, the best models obtained from Section 5.1 are considered for each dataset (Table 2). The validation MCA values for different sample sizes are compared in Figure 5d–f. Note that, given the stochastic nature of the dropout operation, the reported MCA values in this figure are the average of repeating the sampling 10 times. The Bayesian inference data points are also compared with the benchmark dashed line (i.e., the standard dropout). Moreover, the average inference times per image for sample sizes of 20 and 50 (denoted T20 and T50) are given in Table 2.

From this point forward, N_s = 50 is used to evaluate the three models on the testing set. However, with a slight compromise in performance, one may utilize a smaller sample size for computational efficiency. Increasing N_s makes the prediction output more stable. As an example, for the third dataset, this stability is shown in Figure 6 by whisker plots of the mean intersection over union (MIoU).

[FIGURE 6: Variations of MIoU (%) for Dataset III across Monte Carlo sample sizes N_s]

6 EXPERIMENTAL VALIDATIONS

The proposed concepts in Sections 2 and 3 are investigated with more thorough analyses, and visual examples are given when necessary. This section is composed of two parts. First, the testing performance is investigated for the different approaches, including the surrogate model. In the second part, an example is given to show how the PQT concept can be useful for SHM.

TABLE 3 Comparison of testing performance metrics for the binary crack detection case study (UW-MAP)

                   | Class accuracy (%)           | IoU (%)                      | F1-score (%)
Strategy           | Background | Crack | Average | Background | Crack | Average | Background | Crack | Average
No dropout         | 99.54      | 47.89 | 73.72   | 98.51      | 39.04 | 68.78   | 99.25      | 56.16 | 77.71
Standard dropout   | 99.44      | 62.69 | 81.07   | 98.70      | 49.06 | 73.88   | 99.35      | 65.83 | 82.59
Bayesian inference | 99.50      | 62.56 | 81.03   | 98.76      | 50.10 | 74.43   | 99.37      | 66.75 | 83.06
Surrogate model    | 99.37      | 65.27 | 82.32   | 98.69      | 49.77 | 74.23   | 99.34      | 66.46 | 82.90

Note: Values in bold font refer to the strategy with the best performance.

6.1 Segmentation performance

Each of the following subsections includes an independent comparative investigation of one of the datasets. The testing performance of the Bayesian inference and the corresponding surrogate models is presented. Additionally, two other models are included for comparison: the no dropout and standard dropout models. With a total of four models, it is possible to investigate the extent of the improvements across the different approaches. We show that, for the three case studies, the Bayesian inference and surrogate models have superior performance compared with the standard dropout model, which utilizes the common weight averaging technique.

6.1.1 Binary crack detection (I)

Crack segmentation is a challenging vision problem because of the significant class imbalance. Images that reflect a cracked surface are commonly dominated by background pixels. In the current dataset, less than 2% of the ground truth pixels are labeled as crack. As a result, global accuracy (GA) is not a proper metric for this task. For example, if one labels all pixels as background, GA will be approximately 98%, while no cracks are successfully detected.

The original implementation of FC-DenseNet utilizes uniform weights (UW) in the loss function and the maximum a-posteriori (MAP) decision rule, which takes the maximum of the softmax probabilities. A summary of the results is presented in Table 3. Considering MCA and crack recall, the surrogate model has the highest performance. However, the Bayesian model shows better IoU and F1-score. Both models perform better than the one with standard dropout. Some visual examples based on the UW and MAP strategy are given in Figure 7. Because the crack pixels comprise a low percentage of the images, the observations in Figure 7 are zoomed-in crops of 64 × 96 pixels. This has been done for illustration purposes only.

Eigen and Fergus (2015) proposed a median frequency weight (MFW) assignment to help increase the MCA. In more recent work, Chan, Rottmann, Hüger, Schlicht, and Gottschalk (2019) proposed modifying the decision rule instead of adjusting the observation weights in the loss function. This goal is achieved by weighting the posterior class probabilities of the crack and background classes by their inverse frequencies in the training data. As an alternative to MAP, the decision rule for this type of inference is called maximum likelihood (ML). Considering the two previous strategies and UW-MAP, the variations in the performance metrics for the Bayesian inference model are shown in Table 4. For each strategy, the first term relates to the way the weights are balanced and the second indicates the decision rule used to assign a label to each pixel.

TABLE 4 Change in testing performance metrics for different strategies

Metric           | UW-MAP | UW-ML | MFW-MAP
Crack recall (%) | 62.56  | 80.78 | 93.57
IoU (%)          | 50.10  | 37.95 | 31.36
F1-score (%)     | 66.75  | 55.01 | 47.75
Precision (%)    | 71.54  | 41.71 | 32.27
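A minimal sketch of this ML decision rule (our own implementation, not the authors' code), applied to the mean softmax output of the Bayesian model:

```python
import numpy as np

def ml_decision(mean_softmax, class_freq):
    """Maximum likelihood (ML) decision rule following Chan et al.
    (2019): weight the posterior class probabilities by the inverse of
    the class frequencies in the training data before taking the argmax.
    class_freq is, e.g., [0.98, 0.02] for background/crack pixels."""
    weighted = mean_softmax / np.asarray(class_freq)  # broadcast over (H, W, n_classes)
    return weighted.argmax(axis=-1)
```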
The results in Table 4 follow a trend that indicates a trade-off between crack recall and the other metrics. The MFW-MAP model captures the highest number of cracks, increasing crack recall by more than 30%. However, it is the most conservative strategy, having the lowest precision. In contrast, UW-MAP has the best performance on the other three metrics. While it would be ideal to have perfect values for all metrics, the Bayesian UW-ML, with a reasonable balance of all metrics, may be selected for industrial applications. However, the choice of strategy may differ depending on the application.

Aside from the improvements in the prediction results, the main advantage of Bayesian inference for vision SHM is the uncertainty output. The two model uncertainty measures (MCSSD and entropy) are illustrated in Figure 7. It can be observed that the model uncertainty is relatively high at the crack boundaries and, more importantly, correlates well with misclassifications. For example, the model shows high uncertainty when identifying background noise as crack.
[FIGURE 7: Sample testing observations for crack detection using Bayesian inference and the UAS surrogate model. Columns: image, ground truth, standard dropout, Bayesian inference, surrogate model, MCSSD (Bayesian and surrogate), and entropy (Bayesian and surrogate)]

TABLE 5 Comparison of testing performance metrics for the damage localization case study

                   | Class accuracy (%)            | IoU (%)                       | F1-score (%)
Strategy           | Background | Damage | Average | Background | Damage | Average | Background | Damage | Average
No dropout         | 92.14      | 66.26  | 79.20   | 86.12      | 48.07  | 67.10   | 92.54      | 64.93  | 78.74
Standard dropout   | 94.20      | 76.68  | 85.44   | 89.85      | 59.92  | 74.89   | 94.66      | 74.94  | 84.80
Bayesian inference | 93.30      | 78.18  | 85.74   | 89.26      | 59.08  | 74.17   | 94.32      | 74.28  | 84.30
Surrogate model    | 93.98      | 78.12  | 86.05   | 89.90      | 60.54  | 75.22   | 94.68      | 75.42  | 85.05

Note: Values in bold font refer to the strategy with the best performance.

Moreover, both metrics are good indicators of model uncertainty while differing in their values and distribution. Considering the correlation between mistakes and model uncertainty, entropy is the more conservative of the two.

6.1.2 Damage localization (II)

Unlike the crack forest dataset, with its close-up images and minimal background noise, the images in this dataset commonly have complex backgrounds. This complexity is even amplified for the lab experiments, where the structural elements are commonly covered with testing instruments. This dataset is also labeled with a binary mask of two classes: damage and background. The performance metrics for this task are presented in Table 5. For all the considered metrics, significant improvements are observed using the surrogate model, which is the best approach. Some detailed examples are provided in Figure 8. It is clear that the surrogate model has less uncertainty compared to the original Bayesian inference.

6.1.3 Bridge component recognition (III)

Different from the first two case studies, this one deals with a multiclass segmentation problem. The structural components assumed to be of interest for inspection are the bridge column (pier), cap beam, abutment, foundation, and superstructure. All the other miscellaneous objects in the scene are labeled as background.
[FIGURE 8: Sample testing observations for damage localization using Bayesian inference and the UAS surrogate model. Columns: image, ground truth, standard dropout, Bayesian inference, surrogate model, MCSSD (Bayesian and surrogate), and entropy (Bayesian and surrogate)]

An important point about this dataset is that the abutments and foundations are usually buried in the ground or soil for stabilization and are therefore less exposed. As a result, the frequency of pixel observations for the foundation and abutment classes is considerably lower than that of the other classes.

A summary of the performance metrics is presented in Table 6. It is evident that, compared with the other components, the less frequent classes of abutment and foundation are detected with less accuracy. Similar to the previous two case studies, the superior performance of the Bayesian inference and UAS surrogate model over the benchmark model is evident.

The benchmark has higher precision for the cap beam and foundation classes. However, this high precision is accompanied by a significantly smaller recall compared with the Bayesian model. In this case, high precision does not imply better performance. For instance, in a sample test image, we observe that only 1.79% of the pixels are correctly captured as foundation by the no dropout model, which nonetheless has 100% precision. Therefore, precision is excluded from the results in Table 6. Instead, the F1-score and IoU are selected as better alternatives.

Sample testing observations for bridge component segmentation are presented in Figure 9. Similar to the other datasets, the presence of misclassification correlates well with the model uncertainty.
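Since these per-class metrics recur across Tables 3-6, the following NumPy sketch (our own helper) shows how class accuracy (recall), IoU, and F1-score follow from a pixel confusion matrix:

```python
import numpy as np

def segmentation_metrics(conf):
    """Per-class recall (class accuracy), IoU, and F1-score from a
    confusion matrix where conf[i, j] counts pixels of true class i
    predicted as class j. Averaging over classes gives MCA, MIoU, etc."""
    tp = np.diag(conf).astype(float)
    fn = conf.sum(axis=1) - tp   # missed pixels of each class
    fp = conf.sum(axis=0) - tp   # pixels wrongly assigned to each class
    class_accuracy = tp / (tp + fn)
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return class_accuracy, iou, f1
```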
TABLE 6 Comparison of testing performance metrics for bridge component recognition

Class accuracy (%)
Strategy           | Background | Superstructure | Column | Cap beam | Foundation | Abutment | Average
No dropout         | 95.44      | 87.88          | 86.90  | 19.47    | 1.79       | 22.13    | 52.27
Standard dropout   | 93.97      | 88.84          | 88.64  | 64.65    | 9.93       | 64.72    | 68.46
Bayesian inference | 95.89      | 88.37          | 88.89  | 51.36    | 22.91      | 61.80    | 68.20
Surrogate model    | 95.45      | 89.03          | 88.63  | 65.36    | 24.34      | 62.71    | 70.92

IoU (%)
Strategy           | Background | Superstructure | Column | Cap beam | Foundation | Abutment | Average
No dropout         | 81.54      | 70.73          | 69.80  | 18.98    | 1.79       | 21.75    | 44.10
Standard dropout   | 85.33      | 76.93          | 77.80  | 55.50    | 8.84       | 49.53    | 58.99
Bayesian inference | 86.23      | 74.59          | 76.60  | 47.87    | 22.30      | 56.01    | 60.60
Surrogate model    | 87.20      | 76.94          | 78.20  | 55.47    | 23.23      | 57.44    | 63.08

F1-score (%)
Strategy           | Background | Superstructure | Column | Cap beam | Foundation | Abutment | Average
No dropout         | 89.83      | 82.85          | 82.21  | 31.89    | 3.51       | 35.27    | 54.26
Standard dropout   | 92.09      | 86.96          | 87.51  | 71.38    | 16.24      | 66.25    | 70.07
Bayesian inference | 92.61      | 85.45          | 86.75  | 64.75    | 36.47      | 71.80    | 72.97
Surrogate model    | 93.16      | 86.97          | 87.77  | 71.35    | 37.70      | 72.97    | 74.99

Note: Values in bold font refer to the strategy with the best performance.

Yet, there exist cases where the uncertainty is low while the segmentation is not accurate. For example, Figure 9(7) illustrates a column in a laboratory setup. The column is properly identified, but the upper region of the image is misclassified as superstructure. While this is an unlikely scene for an outdoor inspection, it is important to be aware of cases where the model uncertainty does not reflect mistakes. In this example, the model confidently predicts that superstructure is the closest class for the roof of a laboratory among the possible classes in the dataset. One potential reason for this type of inconsistency is that no knowledge about the existence of laboratory roofs was introduced during data labeling. It is important to consider the limitations of supervised learning caused by the lack of sufficient information in the dataset. Such inconsistencies can be alleviated by enlarging the training set with such observations and potentially increasing the number of possible classes.

In the two previous binary segmentation problems, despite the differences in relative values, the two uncertainty metrics follow a similar distribution. In this multiclass dataset, noticeable differences between the distribution patterns of entropy and MCSSD are observed. Some of the examples in Figure 9 highlight this difference, where some pixels have relatively high entropy but low MCSSD and vice versa. Observations like this make the incorporation of uncertainty into decision-making challenging. It is difficult or impossible for a human to simultaneously interpret the entropy and MCSSD for all pixels and modify the decision accordingly. This phenomenon further justifies the adoption of the UAS model explained in Section 3.1.

6.1.4 Discussion of results

The comparisons of testing metrics in Tables 3, 5, and 6 indicate overall better performance for the Bayesian and surrogate models. In all three datasets, the IoU, F1-score, and class accuracies are improved compared to the model with standard dropout. This improvement is more significant for the third dataset, with bridge components that are less frequent and thus more difficult to recognize (i.e., the abutment and foundation classes).

As mentioned in Section 3.1, the surrogate model is Bayesian and therefore also outputs uncertainty. A side-by-side comparison between the two MCSSD/entropy masks is presented in the given examples. The color scale is normalized for each observation. It can be observed that the surrogate model has less uncertainty compared with the original Bayesian mask.

6.2 Validation of the PQT surrogate model

One of the main topics of concern in this paper is the reliability of deep vision models for automated SHM. After observing the results in Section 6.1, one can notice that, despite the presence of a surrogate model, mistakes, while reduced, still exist. This subsection focuses on an example of the PQT concept mentioned earlier as a way to trigger human intervention.

Defect classification is probably one of the most concerning fields in SHM, where mistakes of a data-driven model can have grave consequences. Therefore, we select the second dataset to validate the PQT concept.
[FIGURE 9: Sample testing observations for bridge component recognition using Bayesian inference and the UAS surrogate model. Columns: image, ground truth, standard dropout, Bayesian inference, surrogate model, MCSSD (Bayesian and surrogate), and entropy (Bayesian and surrogate). Segmentation class dictionary: background, superstructure, column, cap beam, abutment, foundation]

While different metrics can be used to trigger an alarm for human intervention, we select the damage class accuracy (DCA) as the metric for acceptance or rejection. A threshold of 75% is used to label each segmentation output based on the quality of its damage prediction; a sketch of this labeling rule is given at the end of this subsection. It is possible to calibrate the PQT for other metrics, such as GA or MCA, with different performance objectives.

The PQT surrogate model analyzes all the predictions and their corresponding uncertainty masks, and labels them based on the quality of the prediction. This information can be used to pass a handful of problematic segmentations to human agents for further examination. Equipped with a PQT surrogate, a damage detection computer can use such labels to trigger human intervention when necessary.

The testing results of the PQT surrogate proposed in Section 3.2 are given in Figure 10. Considering the rejection criteria, the model has 78.2% accuracy in labeling observations, where mistakes often happen in the proximity of the prior DCA threshold. Nevertheless, a decision-maker can adjust not only the DCA threshold but also the decision rule of the PQT surrogate for more conservative quality tagging. For example, instead of 50%, one may require an output probability of 60% or more to accept a prediction. The importance of the PQT surrogate is further highlighted in Figures 10(3) and (6). Both images have a high GA because the damaged area is relatively small. While the original Bayesian model has missed the damage in both images, the PQT surrogate can be used to send an alarm signal for further inspection.
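A sketch of how the accept/reject training labels can be generated from the DCA threshold (our own helper; the confusion-matrix input and the handling of images without damage pixels are assumptions):

```python
def pqt_label(conf, damage_class=1, threshold=0.75):
    """Accept/reject PQT training label for one image from its pixel
    confusion matrix, using damage class accuracy (DCA) with the 75%
    threshold described above. Returns 1 (accepted) or 0 (rejected)."""
    n_damage = conf[damage_class, :].sum()  # ground-truth damage pixels
    # If there are no damage pixels, DCA is undefined; treating it as 0
    # (i.e., flagging the image for review) is our conservative choice.
    dca = conf[damage_class, damage_class] / n_damage if n_damage else 0.0
    return int(dca >= threshold)
```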
[FIGURE 10: Visual comparison between the initial and surrogate segmentation results. Columns: image, ground truth, prediction, mistakes, MCSSD, and entropy. PQT tags per row: (1) DCA 98.17, accepted; (2) DCA 46.96, rejected; (3) DCA 0.0, rejected; (4) DCA 90.59, accepted; (5) DCA 79.42, rejected; (6) DCA 0.0, rejected]

7 CONCLUSION

Recent advances in computer vision and deep learning have had a major impact on vision-based structural inspections. Considering the ever-growing interest in the application of deep learning models for SHM, it is necessary to develop a framework that quantifies a model's confidence such that decision-makers can make risk-informed decisions in different circumstances.

In this paper, we leverage deep Bayesian neural networks for vision-based structural inspections. The novel frameworks proposed in this paper are validated with three different datasets of real-world images, while the performance is monitored for different metrics. It is shown that Bayesian inference can further boost the overall performance while providing an uncertainty output for the corresponding predictions. A series of sensitivity analyses are performed to investigate the effects of the dropout hyperparameters on Bayesian inference.
Beyond quantification, it is shown that the uncertainty output can be challenging to manually interpret and integrate into an automated SHM system. Therefore, the concept of surrogate models is introduced to further benefit from the uncertainty output. The UAS method is proposed to automatically refine the prediction results of the original Bayesian model. Moreover, PQT is developed as a means to trigger human interventions in an efficient manner.

The proposed methodology in this paper can be utilized to equip existing vision models for inspection and monitoring with tools that quantify and benefit from the output model uncertainty. In the absence of huge training datasets, Bayesian inference can be an effective tool to make visual inspections using deep vision models more reliable. The presented examples are a cornerstone for further expanding uncertainty quantification into other related fields of SHM in future research. For example, the uncertainty masks in crack segmentation can be used in the calculation of crack widths and heights in the postprocessing steps.

REFERENCES

Ahmadlou, M., & Adeli, H. (2010). Enhanced probabilistic neural network with local decision circles: A robust classifier. Integrated Computer-Aided Engineering, 17(3), 197–210.
Ai, D., Jiang, G., Kei, L. S., & Li, C. (2018). Automatic pixel-level pavement crack detection using information of multi-scale neighborhoods. IEEE Access, 6, 24452–24463.
Amezquita-Sanchez, J. P., Park, H. S., & Adeli, H. (2017). A novel methodology for modal parameters identification of large smart structures using MUSIC, empirical wavelet transform, and Hilbert transform. Engineering Structures, 147, 148–159.
Azimi, M., & Pekcan, G. (2019). Structural health monitoring using extremely-compressed data through deep learning. Computer-Aided Civil and Infrastructure Engineering, 35(6), 597–614.
Bang, S., Park, S., Kim, H., & Kim, H. (2019). Encoder–decoder network for pixel-level road crack detection in black-box images. Computer-Aided Civil and Infrastructure Engineering, 34(8), 713–727.
Cha, Y. J., Choi, W., & Büyüköztürk, O. (2017). Deep learning-based crack damage detection using convolutional neural networks. Computer-Aided Civil and Infrastructure Engineering, 32(5), 361–378.
Cha, Y. J., Choi, W., Suh, G., Mahmoudkhani, S., & Büyüköztürk, O. (2018). Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Computer-Aided Civil and Infrastructure Engineering, 33(9), 731–747.
Chan, R., Rottmann, M., Hüger, F., Schlicht, P., & Gottschalk, H. (2019). Application of decision rules for handling class imbalance in semantic segmentation. arXiv preprint arXiv:1901.08394.
Chen, X., Girshick, R., He, K., & Dollár, P. (2019). TensorMask: A foundation for dense object segmentation. arXiv preprint arXiv:1903.12174.
Chollet, F. (2015). Keras. Retrieved from https://github.com/keras-team/keras
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition, 2009 (CVPR 2009). IEEE.
Deng, J., Lu, Y., & Lee, V. C.-S. (2020). Concrete crack detection with handwriting script interferences using faster region-based convolutional neural network. Computer-Aided Civil and Infrastructure Engineering, 35(4), 373–388. https://doi.org/10.1111/mice.12497
Ebrahimkhanlou, A., Dubuc, B., & Salamone, S. (2019). A generalizable deep learning framework for localizing and characterizing acoustic emission sources in riveted metallic panels. Mechanical Systems and Signal Processing, 130, 248–272.
Eigen, D., & Fergus, R. (2015). Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In Proceedings of the IEEE International Conference on Computer Vision.
Eltouny, K. A., & Liang, X. (2020). A nonparametric unsupervised learning approach for structural damage detection. In 17th World Conference on Earthquake Engineering (17WCEE), Sendai, Japan.
Gal, Y., & Ghahramani, Z. (2015). Bayesian convolutional neural networks with Bernoulli approximate variational inference. arXiv preprint arXiv:1506.02158.
Gal, Y., & Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning.
Gao, Y., & Mosalam, K. M. (2018). Deep transfer learning for image-based structural damage recognition. Computer-Aided Civil and Infrastructure Engineering, 33(9), 748–768.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Cambridge, MA: MIT Press.
Graves, A. (2011). Practical variational inference for neural networks. In Advances in Neural Information Processing Systems.
Hoskere, V., Narazaki, Y., Hoang, T., & Spencer, B. F., Jr. (2018). Vision-based structural inspection using multiscale deep convolutional neural networks. arXiv preprint arXiv:1805.01055.
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., & Bengio, Y. (2017). The one hundred layers tiramisu: Fully convolutional DenseNets for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops.
Karypidis, D. F., Berrocal, C., Rempling, R., Granath, G., & Simonsson, P. (2019). Structural health monitoring of RC structures using optic fiber strain measurements: A deep learning approach. In 2019 IABSE Congress, New York.
Kendall, A., Badrinarayanan, V., & Cipolla, R. (2015). Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding. arXiv preprint arXiv:1511.02680.
Kendall, A., & Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? In Advances in Neural Information Processing Systems.
Koller, D., & Friedman, N. (2009). Probabilistic graphical models: Principles and techniques. Cambridge, MA: MIT Press.
Li, R., Yuan, Y., Zhang, W., & Yuan, Y. (2018). Unified vision-based methodology for simultaneous concrete defect detection and geolocalization. Computer-Aided Civil and Infrastructure Engineering, 33(7), 527–544.
Li, S., Zhao, X., & Zhou, G. (2019). Automatic pixel-level multiple damage detection of concrete structure using fully convolutional network. Computer-Aided Civil and Infrastructure Engineering, 34(7), 616–634.
Liang, X. (2019). Image-based post-disaster inspection of reinforced concrete bridge systems using deep learning with Bayesian optimization. Computer-Aided Civil and Infrastructure Engineering, 34(5), 415–430.
Liang, X., Zheng, M., & Zhang, F. (2018). A scalable model-based learning algorithm with application to UAVs. IEEE Control Systems Letters, 2(4), 839–844.
Liu, Y. F., Nie, X., Fan, J. S., & Liu, X. G. (2020). Image-based crack assessment of bridge piers using unmanned aerial vehicles and three-dimensional scene reconstruction. Computer-Aided Civil and Infrastructure Engineering, 35(5), 511–529. https://doi.org/10.1111/mice.12501
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
McAllister, R., Gal, Y., Kendall, A., Van Der Wilk, M., Shah, A., Cipolla, R., & Weller, A. (2017). Concrete problems for autonomous vehicle safety: Advantages of Bayesian deep learning. In International Joint Conferences on Artificial Intelligence.
Narazaki, Y., Hoskere, V., Hoang, T. A., Fujino, Y., Sakurai, A., & Spencer, B. F., Jr. (2020). Vision-based automated bridge component recognition with high-level scene consistency. Computer-Aided Civil and Infrastructure Engineering, 35(5), 465–482. https://doi.org/10.1111/mice.12505
Oh, B. K., Glisic, B., Kim, Y., & Park, H. S. (2019). Convolutional neural network-based wind-induced response estimation model for tall buildings. Computer-Aided Civil and Infrastructure Engineering, 34(10), 843–858.
Rafiei, M. H., & Adeli, H. (2016). A novel machine learning model for estimation of sale prices of real estate units. Journal of Construction Engineering and Management, 142, 04015066.
Rafiei, M. H., & Adeli, H. (2017a). NEEWS: A novel earthquake early warning model using neural dynamic classification and neural dynamic optimization. Soil Dynamics and Earthquake Engineering, 100, 417–427.
Rafiei, M. H., & Adeli, H. (2017b). A novel machine learning-based algorithm to detect damage in high-rise building structures. The Structural Design of Tall and Special Buildings, 26(18), e1400.
Rafiei, M. H., & Adeli, H. (2018a). Novel machine-learning model for estimating construction costs considering economic variables and indexes. Journal of Construction Engineering and Management, 144(12), 04018106.
Rafiei, M. H., & Adeli, H. (2018b). A novel unsupervised deep learning model for global and local health condition assessment of structures. Engineering Structures, 156, 598–607.
Rafiei, M. H., Khushefati, W. H., Demirboga, R., & Adeli, H. (2017). Supervised deep restricted Boltzmann machine for estimation of concrete. ACI Materials Journal, 114(2), 237–244.
Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer.
Sajedi, S. O., & Liang, X. (2019a). A convolutional cost-sensitive crack localization algorithm for automated and reliable RC bridge inspection. In 10th New York City Bridge Conference, New York City.
Sajedi, S. O., & Liang, X. (2019b). Intensity-based feature selection for near real-time damage diagnosis of building structures. In IABSE Congress, New York City. The International Association for Bridge and Structural Engineering.
Sajedi, S. O., & Liang, X. (2020a). A data-driven framework for near real-time and robust damage diagnosis of building structures. Structural Control & Health Monitoring, 27, e2488.
Sajedi, S. O., & Liang, X. (2020b). Deep Bayesian U-nets for efficient, robust and reliable post-disaster damage localization. In 17th World Conference on Earthquake Engineering (17WCEE), Sendai, Japan.
Sajedi, S. O., & Liang, X. (2020c). Vibration-based semantic damage segmentation for large-scale structural health monitoring. Computer-Aided Civil and Infrastructure Engineering, 35(6), 579–596.
Shi, Y., Cui, L., Qi, Z., Meng, F., & Chen, Z. (2016). Automatic road crack detection using random structured forests. IEEE Transactions on Intelligent Transportation Systems, 17(12), 3434–3445.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Spencer, B. F., Hoskere, V., & Narazaki, Y. (2019). Advances in computer vision-based civil infrastructure inspection and monitoring. Engineering, 5(2), 199–222.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958.
Tong, Z., Gao, J., Sha, A., Hu, L., & Li, S. (2018). Convolutional neural network for asphalt pavement surface texture analysis. Computer-Aided Civil and Infrastructure Engineering, 33(12), 1056–1072.
Xue, Y., & Li, Y. (2018). A fast detection method via region-based fully convolutional neural networks for shield tunnel lining defects. Computer-Aided Civil and Infrastructure Engineering, 33(8), 638–654.
Yang, X., Li, H., Yu, Y., Luo, X., Huang, T., & Yang, X. (2018). Automatic pixel-level crack detection and measurement using fully convolutional network. Computer-Aided Civil and Infrastructure Engineering, 33(12), 1090–1109.
Zhang, A., Wang, K. C. P., Li, B., Yang, E., Dai, X., Peng, Y., ... Chen, C. (2017). Automated pixel-level pavement crack detection on 3D asphalt surfaces using a deep-learning network. Computer-Aided Civil and Infrastructure Engineering, 32(10), 805–819.
Zhang, C., Chang, C. C., & Jamshidi, M. (2019). Concrete bridge surface damage detection using a single-stage detector. Computer-Aided Civil and Infrastructure Engineering, 35(4), 389–409.
Zheng, M., Chen, Z., & Liang, X. (2019). A preliminary study on a physical model oriented learning algorithm with application to UAVs. In ASME 2019 Dynamic Systems and Control Conference. American Society of Mechanical Engineers Digital Collection.

How to cite this article: Sajedi SO, Liang X. Uncertainty-assisted deep vision structural health monitoring. Comput Aided Civ Inf. 2020;1–17. https://doi.org/10.1111/mice.12580