ESREL - 2018 - Novel CNN Paper - Rev5 PDF

Acoustic Emission Based Fault Diagnosis via a Novel Deep
Convolutional Neural Network Method

D. González Toledo1, E. López Droguett1,2, V. Meruane1, M. Modarres2
1 University of Chile, Santiago, Chile
2 University of Maryland, College Park, USA
ABSTRACT: Acoustic Emission (AE) has seen increased popularity in applications involving machine condition monitoring. AE applications usually involve higher sampling rates than vibration signals, not uncommonly reaching 2 MHz. One of the main challenges in AE-based fault diagnosis is the need to preprocess the massive amounts of data generated by this technique, including the engineering of appropriate features and dimensionality reduction, so as to be able to handle such massive datasets. In this paper, we propose a novel method based on Deep Convolutional Neural Networks (CNNs) that handles raw AE signals to diagnose a system's health states. This method is flexible enough not only to handle the massive amount of AE data, but also to provide the means for automatic feature extraction by applying various filters to the raw AE signals, thus identifying relevant frequencies related to different faults. The proposed CNN method is applied to fatigue crack detection on the blades of an experimental rotor.
1 INTRODUCTION

Unscheduled maintenance of mechanical systems leads to loss of production and may also affect safety. There are many ways to minimize this effect, among them: increasing redundancy, planned maintenance to identify problems that could lead to extended downtimes, and condition monitoring (Rabiei, Droguett and Modarres, 2016).

A popular approach to condition monitoring is vibration analysis. However, vibration monitoring is usually less sensitive, typically detecting damage only once it has already developed, which poses a significant limitation in sensitive systems. On the other hand, Acoustic Emission techniques are gaining ground because they can identify damage at early stages, with the tradeoff of requiring higher sampling rates, which result in massive, higher-dimensional data.

Moreover, both AE and vibration monitoring require signal preprocessing and interpretation techniques such as wavelets, the fast Fourier transform and band filtering, among others (Riaz et al., 2017), a labor-intensive and expensive endeavor requiring specialized engineering expertise.

Machine learning techniques have become a popular choice for fault diagnosis and prognosis. Most of these shallow models rely heavily on manual feature identification and extraction (Ruiz-Gonzalez et al., 2014; Kane and Andhare, 2016; Li et al., 2016). As discussed in (Verstraete et al., 2017), the performance of these methods depends on the quality of the hand-engineered features, which requires significant understanding of the system's degradation processes.

To tackle these challenges, we propose a deep CNN-based method for fault diagnosis that operates on massive raw acoustic signals and allows for automatic hierarchical "layer to layer" feature extraction to learn complex representations of the data.

The remainder of the paper is structured as follows. Section 2 introduces deep learning, CNNs and their architectural building blocks. Section 3 then discusses the proposed method and its application and validation for fault diagnosis of an experimental rotor, and compares its performance to a fully optimized shallow neural network. Section 4 presents some concluding remarks.

2 CONVOLUTIONAL NEURAL NETWORKS

2.1 Artificial Neural Networks
Artificial Neural Networks (ANNs) are composed of simple units called neurons. Each neuron computes a linear combination of its input (x) with a weight (w) and a bias (b), parameters that are learned by the ANN. An activation function (f) then adds the nonlinear behavior that allows the network to compute nontrivial responses. The output (O) of a neuron is thus computed by Equation (1):

$$O(x) = f(wx + b) \tag{1}$$

2.2 Deep Learning
Simply put, deep learning is a branch of Machine Learning that uses many hidden layers to perform the learning. Deep networks learn features built on top of the features learned by previous layers, introducing a hierarchy between features that corresponds to different levels of abstraction (Deng and Yu, 2014). This is important for achieving high accuracy in tasks with complex relationships among the data, such as image recognition and signal processing.
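Equation (1) can be illustrated with a minimal sketch of a single neuron. It generalizes the scalar product wx to a dot product over an input vector; the choice of ReLU or sigmoid as f and the numeric values are illustrative assumptions, not taken from the paper.

```python
import math
from typing import Callable, Sequence


def relu(v: float) -> float:
    """Rectified linear activation, one example choice of f."""
    return max(0.0, v)


def sigmoid(v: float) -> float:
    """Logistic activation, an alternative choice of f."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(x: Sequence[float], w: Sequence[float], b: float,
                  f: Callable[[float], float]) -> float:
    """Equation (1): O(x) = f(w . x + b), with w . x a dot product."""
    if len(x) != len(w):
        raise ValueError("input and weight vectors must have the same length")
    linear = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f(linear)


# Hypothetical values: 0.5 * 1 + (-1) * 2 + 0.1 = -1.4
print(neuron_output([1.0, 2.0], [0.5, -1.0], 0.1, relu))     # 0.0
print(neuron_output([1.0, 2.0], [0.5, -1.0], 0.1, sigmoid))  # about 0.198
```

The same pre-activation (-1.4) gives different outputs depending on f, which is the nonlinear behavior the activation function contributes.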
2.3 Convolutional Neural Network Overview
Convolutional Neural Networks (CNNs) are a type of neural network specialized in processing data with a grid-like topology (Goodfellow, Bengio and Courville, 2017). CNNs have been shown to outperform shallow architectures in many image recognition tasks and have been applied to vibration-based fault diagnosis (Verstraete et al., 2017).

The main characteristics of CNNs are sparse connectivity and parameter sharing. The first means that CNNs use filters that are considerably smaller than the input, so the filters store fewer parameters than a shallow neural network while still detecting important features of the input. The second means that the filter weights in a convolutional layer are reused across the whole input, which results in computationally efficient matrix multiplications.

2.3.1 Convolutional Layer
As we are dealing with raw AE data, which are one-dimensional, the convolution operation in the proposed model is also 1D. This means that the filters w(t) learned by the network are in the time domain and generate a filtered version of the input x(t) that highlights features representing the system's health state. The output signal of a 1D convolution, s(t), is computed as shown in Equation (2):

$$s(t) = (x * w)(t) = \sum_{a=-\infty}^{\infty} x(a)\, w(t - a) \tag{2}$$

2.3.2 Batch Normalization Layer
Batch Normalization (BN) is a method for accelerating the training of deep networks by reducing the internal covariate shift of the data (Ioffe and Szegedy, 2015). This is achieved by normalizing each mini-batch and learning two new normalization parameters: a scale parameter gamma (γ) and a shift parameter beta (β). Given the p-dimensional input to a BN layer, $x = (x^{(1)}, \ldots, x^{(p)})$, the transformation is given by Equations (3) and (4):

$$\hat{x}^{(k)} = \frac{x^{(k)} - \mathrm{E}[x^{(k)}]}{\sqrt{\mathrm{Var}[x^{(k)}]}} \tag{3}$$

$$y^{(k)} = \gamma^{(k)} \hat{x}^{(k)} + \beta^{(k)} \tag{4}$$

where Var[·] is the variance and E[·] is the expectation, both computed over the training set. Equation (3) standardizes the features and accelerates convergence, and Equation (4) restores the representational power of the network.

2.3.3 Pooling Layer
Pooling layers are used to reduce the dimension of the input and achieve spatial invariance. This is usually accomplished by taking the maximum value within a pooling window and using it to replace all the values in that window. This lowers the resolution but keeps only the most important feature.

2.3.4 Activation Function
As discussed before, activation functions add nonlinear behavior to the network. In the proposed CNN architecture, we employ Rectified Linear Units (ReLUs) as the activation function for the convolutional layers, as they provide increased sparsity compared with Tanh or Sigmoid activation functions, thus decreasing computation time (Maas, Hannun and Ng, 2013). The commonly used ReLU is shown in Equation (5):

$$g(x) = \max(0, x) \tag{5}$$

For the fully connected layers (see Section 2.3.5), the softmax activation function is used to quantify the probability that a sample corresponds to a given system health state. This function is displayed in Equation (6):

$$f_j(\mathbf{z}) = \frac{e^{z_j}}{\sum_{i=1}^{K} e^{z_i}} \tag{6}$$

where K is the number of health states (classes).

2.3.5 Fully-Connected Layer
Finally, two fully connected layers of the same dimension are responsible for the classification based on the feature maps from the last convolution.

2.3.6 Network Optimization
For supervised learning, the network learns by minimizing a loss function on the difference between the predicted and true labels. The selected loss function is the cross-entropy between the estimated softmax output q(x) and the target class p(x):

$$\mathrm{Loss} = -\sum_{x} p(x) \log q(x) \tag{7}$$

The network is optimized via ADAM (Kingma and Ba, 2014), an algorithm inspired by stochastic gradient descent (SGD). This optimizer works with adaptive estimates of lower-order moments and performs well with noisy and sparse gradients. It combines two extensions of SGD: the Adaptive Gradient Algorithm (AdaGrad), which improves performance on problems with sparse gradients by maintaining a learning rate per parameter, and Root Mean Square Propagation (RMSProp), which adapts the learning rate as a function of the magnitude of the gradients for the weights, improving performance with noisy data.

Table 1 Size and position of notches.
Position [mm]   Size [mm]
5               3
20              6
                10
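The layer operations of Section 2.3 (Equations (2) to (7), plus the max pooling of Section 2.3.3) can be sketched in plain Python as below. This is an illustrative, non-optimized sketch with hypothetical helper names: the small constant eps in batch normalization follows common practice rather than the text, the batch statistics are those of the batch passed in, and deep-learning libraries typically implement the convolution layer as a cross-correlation (no kernel flip), which is equivalent up to flipping the learned filter.

```python
import math
from typing import List, Optional, Sequence


def conv1d_full(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Equation (2) for finite signals: s(t) = sum_a x(a) * w(t - a).

    Terms where t - a falls outside the filter are zero, so the output
    has len(x) + len(w) - 1 samples.
    """
    n = len(x) + len(w) - 1
    s = [0.0] * n
    for t in range(n):
        for a, xa in enumerate(x):
            if 0 <= t - a < len(w):
                s[t] += xa * w[t - a]
    return s


def batch_norm(batch: Sequence[Sequence[float]], gamma: Sequence[float],
               beta: Sequence[float], eps: float = 1e-5) -> List[List[float]]:
    """Equations (3)-(4), applied to each feature k over the given batch.

    eps (assumed) keeps the division finite when a feature has zero variance.
    """
    m, p = len(batch), len(batch[0])
    out = [[0.0] * p for _ in range(m)]
    for k in range(p):
        col = [row[k] for row in batch]
        mean = sum(col) / m
        var = sum((v - mean) ** 2 for v in col) / m
        for i in range(m):
            x_hat = (col[i] - mean) / math.sqrt(var + eps)  # Equation (3)
            out[i][k] = gamma[k] * x_hat + beta[k]          # Equation (4)
    return out


def max_pool1d(x: Sequence[float], size: int,
               stride: Optional[int] = None) -> List[float]:
    """Section 2.3.3: keep the maximum of each pooling window."""
    stride = stride or size
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


def relu(v: float) -> float:
    """Equation (5)."""
    return max(0.0, v)


def softmax(z: Sequence[float]) -> List[float]:
    """Equation (6); subtracting max(z) avoids overflow without changing the result."""
    z_max = max(z)
    exps = [math.exp(zj - z_max) for zj in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    """Equation (7); terms with p(x) = 0 contribute nothing and are skipped."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


# Toy usage of each helper; the proposed architecture itself has no pooling layers.
feature_map = [relu(v) for v in conv1d_full([1.0, 2.0, 3.0], [1.0, -1.0])]
print(feature_map)                 # [1.0, 1.0, 1.0, 0.0]
print(max_pool1d(feature_map, 2))  # [1.0, 1.0]
probs = softmax([2.0, 1.0, 0.1])
print(cross_entropy([1.0, 0.0, 0.0], probs))  # about 0.417
```

The filter [1, -1] acts as a differencing filter, a simple example of how a learned time-domain filter can highlight changes in a raw signal.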
2.3.7 Regularization
To tackle the tendency of CNNs to overfit during training and to improve generalization performance, regularization is implemented by means of the following two approaches. First, we employ L2 regularization by adding a term to the loss function that penalizes large weights across the network (Peng et al., 2015). Second, we use dropout, which consists of disconnecting some neurons during training to prevent co-adaptation (Srivastava et al., 2014); it is implemented in the fully connected layers with a 50% drop probability.
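The optimization and regularization choices of Sections 2.3.6 and 2.3.7 can be sketched as follows: an Adam update using the default hyperparameters suggested by Kingma and Ba (2014), the gradient contribution of an L2 penalty, and dropout. The penalty weight lam and the learning rate in the toy problem are hypothetical (their values are not reported above), and the dropout shown is the "inverted" variant common in libraries, which rescales activations during training instead of scaling weights at test time as in Srivastava et al. (2014).

```python
import math
import random
from typing import List, Optional, Sequence


class Adam:
    """Adam update rule (Kingma and Ba, 2014) over a flat list of parameters."""

    def __init__(self, n_params: int, lr: float = 1e-3, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # running mean of gradients (first moment)
        self.v = [0.0] * n_params  # running mean of squared gradients (second moment)
        self.t = 0                 # step counter for bias correction

    def step(self, params: List[float], grads: Sequence[float]) -> None:
        """Update params in place from their gradients."""
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            # Dividing by sqrt(v_hat) gives each parameter its own step size.
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def add_l2_grad(params: Sequence[float], grads: Sequence[float],
                lam: float) -> List[float]:
    """Gradient of loss + (lam / 2) * sum(w ** 2): adds lam * w to each entry."""
    return [g + lam * w for g, w in zip(grads, params)]


def dropout(activations: Sequence[float], drop_prob: float, training: bool,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: during training, zero each unit with probability
    drop_prob and rescale survivors by 1 / (1 - drop_prob); identity at test time."""
    if not training or drop_prob == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Toy problem: minimize (w - 3)^2 with a small, hypothetical L2 weight.
w = [0.0]
opt = Adam(n_params=1, lr=0.1)
for _ in range(2000):
    grad = [2.0 * (w[0] - 3.0)]
    opt.step(w, add_l2_grad(w, grad, lam=1e-3))
print(round(w[0], 2))
print(dropout([1.0, 2.0, 3.0, 4.0], drop_prob=0.5, training=True, rng=random.Random(0)))
```

With the L2 term, the minimizer of the toy loss moves slightly toward zero, from 3 to 6 / 2.001; this pull toward smaller weights is the effect the penalty is meant to have on the network.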
3 PROPOSED CNN METHOD

3.1 Dataset
The proposed CNN-based method is applied to a dataset generated from AE monitoring of an experimental rotor, shown in Figure 2.

Figure 2 Experimental rotor setup.

The setup comprises:
1. Mistras Micro 30 Acoustic Emission sensors
2. Rotor and blades
3. DC Motor MY-1016, 24[V] 13.7[A]
4. MCP Q10-QS305 Power Source.

The rotor has 8 blades, one of which is notched according to the sizes and positions in Table 1. A sketch of a blade is displayed in Figure 1.

Figure 1 Sketch of a blade with a 6 mm notch at the 5 mm position.

The acquisition rate is 500 kHz, and each combination of notch position and size is measured for 176.16 s, divided into 168 files of 524,288 data points each.

In this dataset, there are seven health conditions: undamaged; 3 mm, 6 mm and 10 mm cracks at the 5 mm position; and 3 mm, 6 mm and 10 mm cracks at the 20 mm position. For each health state, there are 53,477,376 data points, and the CNN is fed with samples composed of slices of 49,152 points (1.77 rotor turns) of the raw signal with 50% overlap. Note that the proposed CNN method is trained for a 3-health-state scenario corresponding to: (1) Undamaged; (2) Damage at the 5 mm position, obtained by combining the 3 mm, 6 mm and 10 mm cracks at that position; and (3) Damage at the 20 mm position, obtained by combining the 3 mm, 6 mm and 10 mm cracks at that position.

Data augmentation is also implemented to improve network generalization, thus improving the accuracy on unseen samples. The dataset has a total of 548,458,432 data points, which are split in an 80-20 proportion for training and testing, respectively.

Figure 3 shows examples of raw AE signals for each of the three health conditions. Notice that the signals contain a significant amount of noise, mainly because the rotor system is not perfectly balanced and has some degree of misalignment. In addition, vibrations from the bearings, coupling and motor add further noise to the response.
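The sample construction of Section 3.1 can be sketched as follows. The constants come from the text; the condition names in CONDITION_TO_CLASS are hypothetical labels, and windowing each 524,288-point file on its own is an assumption, since the text does not say whether samples may cross file boundaries. The data augmentation step is not sketched because its method is not specified.

```python
from typing import Dict, List, Sequence

FS_HZ = 500_000        # acquisition rate, Section 3.1
FILE_LEN = 524_288     # data points per file
N_FILES = 168          # files per notch position/size combination
WINDOW = 49_152        # points per CNN input sample
STRIDE = WINDOW // 2   # 50% overlap between consecutive samples

# Hypothetical names for the seven recorded conditions, merged into the
# three training classes: 0 = undamaged, 1 = 5 mm position, 2 = 20 mm position.
CONDITION_TO_CLASS: Dict[str, int] = {
    "undamaged": 0,
    "pos5_crack3": 1, "pos5_crack6": 1, "pos5_crack10": 1,
    "pos20_crack3": 2, "pos20_crack6": 2, "pos20_crack10": 2,
}


def sliding_windows(signal: Sequence[float], window: int = WINDOW,
                    stride: int = STRIDE) -> List[Sequence[float]]:
    """Cut a raw 1D signal into overlapping windows; a trailing remainder
    shorter than one window is dropped."""
    return [signal[start:start + window]
            for start in range(0, len(signal) - window + 1, stride)]


def n_windows(length: int, window: int = WINDOW, stride: int = STRIDE) -> int:
    """Number of windows sliding_windows returns for a signal of this length."""
    return 0 if length < window else (length - window) // stride + 1


print(N_FILES * FILE_LEN / FS_HZ)  # 176.160768 s per combination (176.16 s in the text)
print(WINDOW / FS_HZ)              # 0.098304 s of signal per sample
print(n_windows(FILE_LEN))         # 20 samples per file, if each file is windowed on its own
```

The first line checks the recording length quoted in the text (168 × 524,288 points at 500 kHz), and the second shows that each 49,152-point sample covers just under 0.1 s of signal.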
Figure 3 a) 5mm sample signal, b) 5mm amplitude spectrum, c) 20mm sample signal, d) 20mm amplitude spectrum, e) Undamaged sample
signal and f) Undamaged amplitude spectrum
However, implementing de-noising methods incurs a loss of information and adds pre-processing time, both of which we want to avoid by handling the noise with the proposed architecture.

Moreover, as the raw signals and the amplitude spectra in Figure 3 show, the health conditions are remarkably similar, which, coupled with the signal noise levels, makes this dataset a significantly challenging diagnosis task.

3.2 Proposed Deep CNN Architecture
The proposed CNN architecture, processing batches of 256 samples, consists of five convolutional layers as follows (see Figure 4): the first convolutional layer has 32 oversized filters of size 128x1, designed to tackle the background noise in the raw acoustic emission signal; it is followed by four convolutional layers with 32, 32, 64 and 128 filters of size 3x1, respectively, which are designed to automatically and hierarchically extract features from the AE data. The output of the last convolutional layer is reshaped before being fed to the last two fully connected layers, each with 1024 neurons, which are responsible for processing the features obtained from the convolutional layers to perform fault diagnosis.

Figure 4 Architecture of the CNN

The proposed CNN method is trained for 15000 epochs, where one epoch consists of a pass over all training samples. The CNN is regularized via dropout in the fully connected layers with a 50% keep probability and L2 regularization (see Section 2.3.7), as well as early stopping by saving the best epoch in terms of accuracy and generalization capability (train loss remaining low as test loss decreases).

Also of note is that the proposed CNN architecture does not have pooling layers. There are two underlying reasons: first, the proposed CNN method is not required to achieve spatial invariance, as it deals with raw acoustic emission signals; second, adding pooling layers either improved the CNN only marginally (in terms of accuracy and generalization) through the reduction in size of the feature maps or, conversely, deteriorated its performance due to the loss of information they incur.
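To make the layer dimensions of Section 3.2 concrete, the sketch below tracks the feature-map length, channel count and parameter count through the five convolutional layers and the two 1024-unit fully connected layers for one 49,152-point sample. Stride and padding are not reported in the text, so they are explicit parameters; the defaults (stride 1, no padding) are assumptions, and batch normalization parameters and the final classification layer are left out.

```python
from typing import List, Tuple

# (number of filters, kernel size) of the five 1D convolutional layers, Section 3.2.
CONV_LAYERS: List[Tuple[int, int]] = [(32, 128), (32, 3), (32, 3), (64, 3), (128, 3)]
FC_UNITS: List[int] = [1024, 1024]  # the two fully connected layers


def conv_out_len(length: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 1D convolution: floor((L + 2P - K) / S) + 1."""
    return (length + 2 * padding - kernel) // stride + 1


def conv_stack(input_len: int, stride: int = 1,
               padding: int = 0) -> List[Tuple[int, int, int]]:
    """(output length, channels, weights + biases) after each convolutional layer.

    stride and padding are assumptions; the text does not report them.
    """
    rows = []
    length, channels = input_len, 1  # the raw AE window has a single channel
    for filters, kernel in CONV_LAYERS:
        length = conv_out_len(length, kernel, stride, padding)
        params = kernel * channels * filters + filters
        channels = filters
        rows.append((length, channels, params))
    return rows


def fc_params(flat_size: int, units: List[int] = FC_UNITS) -> int:
    """Weights + biases of the fully connected layers fed by the reshaped maps."""
    total, fan_in = 0, flat_size
    for n in units:
        total += fan_in * n + n
        fan_in = n
    return total


rows = conv_stack(49_152)          # one 49,152-point sample, stride 1, no padding
for layer, (length, channels, params) in enumerate(rows, start=1):
    print(f"conv{layer}: {length} x {channels}, {params} parameters")
flat = rows[-1][0] * rows[-1][1]   # size after reshaping the last feature map
print(flat, fc_params(flat))
```

Under the stride-1, no-padding assumption, the reshaped output of the last layer has 49,017 × 128 = 6,274,176 values, so the first fully connected layer alone would hold over six billion weights, against roughly 41,000 convolutional parameters. The reshape step, together with whatever stride or downsampling an implementation uses, therefore dominates the model size.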
In terms of activation functions, the fully connected layers use softmax, whereas ReLU is implemented in all five convolutional layers (see Section 2.3.4 for details). The CNN weights are initialized by Xavier Normal Initialization, which draws them from a normal distribution scaled by the sizes of the previous and next layers (Glorot and Bengio, 2010), and the biases are initialized to a constant 0.1 in all layers.

3.3 CNN Implementation
All the results shown in the next section were obtained using the following hardware configuration at the Smart Reliability and Maintenance Integration Laboratory (SRMILab), University of Chile: an Intel® Core™ i7-6700K CPU with 32 GB RAM and an NVIDIA Titan XP GPU.

3.4 Results and Discussion
The proposed CNN method is compared with a shallow ANN that has the same two fully connected layers but lacks the convolutional layers. This allows us to assess the impact of the convolutions, as a signal filtering and de-noising tool, on fault diagnosis performance, as well as the quality and robustness of the extracted features. This ANN is fully optimized with the ADAM adaptive gradient-based optimization algorithm and regularized via dropout (with 50% keep probability), weight regularization for both hidden layers, and early stopping.

Table 2 shows the overall test fault diagnosis accuracy. The proposed CNN method significantly outperforms the shallow ANN in terms of accuracy and generalization capacity, with the ANN barely learning from the complex AE dataset.

Table 2 Accuracy for Health State of the System.
Model   Test Accuracy (%)
ANN     33.7
CNN     93.0

To corroborate these results, we collect the average values and corresponding standard deviations per health state from multiple runs for different performance metrics, as shown in Table 3.

Table 3 Performance measures in percentages [%] for the proposed CNN method.
Metric        5 [mm]          20 [mm]         Undamaged
Sensitivity   89.77 ± 1.70    91.50 ± 1.48    97.65 ± 0.36
Specificity   94.69 ± 0.80    95.93 ± 0.62    99.04 ± 0.46
Precision     89.22 ± 1.62    91.53 ± 1.22    98.18 ± 0.92
F1 Score      89.48 ± 0.98    91.51 ± 0.88    97.91 ± 0.57
Accuracy      93.06 ± 0.80    94.50 ± 0.42    98.56 ± 0.39

Moreover, the unnormalized and normalized confusion matrices are shown in Table 4 and Figure 5, respectively.

Table 4 Confusion Matrix for the proposed CNN method.
              5 [mm]   20 [mm]   Undamaged
5 [mm]        363      35        6
20 [mm]       28       382       0
Undamaged     8        0         402
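The per-class measures in Table 3 follow the usual one-vs-rest definitions. As a sketch, the code below recomputes them from the single confusion matrix in Table 4, assuming rows are true classes and columns are predictions; the table does not label its axes, but this reading is consistent with the sensitivities in Table 3.

```python
from typing import Dict, List

LABELS = ["5 [mm]", "20 [mm]", "Undamaged"]
# Table 4, assuming rows are true classes and columns are predicted classes.
CONFUSION: List[List[int]] = [
    [363, 35, 6],
    [28, 382, 0],
    [8, 0, 402],
]


def one_vs_rest_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """Per-class measures, treating each class in turn as the positive class."""
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                 # class c predicted as another class
        fp = sum(row[c] for row in cm) - tp  # other classes predicted as c
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn)
        precision = tp / (tp + fp)
        results.append({
            "sensitivity": sensitivity,
            "specificity": tn / (tn + fp),
            "precision": precision,
            "f1": 2 * precision * sensitivity / (precision + sensitivity),
            "accuracy": (tp + tn) / total,
        })
    return results


overall_accuracy = sum(CONFUSION[i][i] for i in range(len(CONFUSION))) / sum(map(sum, CONFUSION))
for label, m in zip(LABELS, one_vs_rest_metrics(CONFUSION)):
    print(label, {k: round(100 * v, 2) for k, v in m.items()})
print("overall accuracy:", round(100 * overall_accuracy, 2))
```

Under this reading, the 5 mm class has a sensitivity of 363/404 ≈ 89.85% and the overall accuracy is 1147/1224 ≈ 93.71%; values from a single matrix need not coincide with the multi-run averages reported in Table 3.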
Figure 5 Normalized confusion matrix for the proposed CNN method.

Based on these results, the proposed CNN method outperforms the shallow ANN for the rotor's fault diagnosis based on acoustic emission monitoring. This is corroborated by Figure 6 a) and c), which show that the CNN presents a monotonically decreasing testing loss that leads to improved fault diagnosis accuracy. It should be noted, however, that the ratio of accuracy improvement to training time for the CNN is very low during the last epochs, even though the network is still learning. This could be driven by a very low effective learning rate in these epochs, as ADAM adapts this hyperparameter.

In contrast, as shown in Figure 6 b) and d), the ANN barely learns from the raw AE data, which could be attributed to the complexity of the data as well as to the meaningless features that the ANN extracts by treating the signals as independent points, a problem that the convolutional filters in the CNN seem to compensate for.

However, the superior performance achieved by the proposed CNN method comes at a much higher computational cost, as the significant number of learnable parameters and hyperparameters leads to extended training times.

Figure 6 a) Accuracy behavior of the CNN, b) Accuracy behavior of the ANN, c) Loss behavior of the CNN and d) Loss behavior of the ANN
4 CONCLUSIONS

This paper has introduced a new deep CNN-based method for fault diagnosis using raw acoustic emission signals. The application of this method to an experimental rotor has shown that the proposed method delivers satisfactory performance metrics for health state diagnosis. The CNN method was also compared to a fully optimized ANN, with the former significantly outperforming the shallow method.

These solid fault diagnosis results are mainly due to the CNN's ability to automatically extract features from, and efficiently handle, noisy acoustic emission signals. This also brings major advantages to the development of automated monitoring and fault diagnosis tools, such as the possibility of bypassing human intervention in the labor-intensive feature engineering process and of reducing the need for preprocessing and de-noising of acoustic emission signals. Based on these preliminary results, the proposed CNN method is a promising tool for fault diagnosis.

ACKNOWLEDGMENTS
The authors acknowledge the partial financial support of the Chilean National Fund for Scientific and Technological Development (Fondecyt) under Grant No. 1160494.

REFERENCES
Deng, L. and Yu, D. (2014) ‘Deep Learning: Methods and Applications’, Foundations and Trends® in Signal Processing, 7(3–4), pp. 197–387. doi: 10.1561/2000000039.
Glorot, X. and Bengio, Y. (2010) ‘Understanding the difficulty of training deep feedforward neural networks’, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS), 9, pp. 249–256.
Goodfellow, I., Bengio, Y. and Courville, A. (2017) Deep Learning. Cambridge, MA: MIT Press.
Ioffe, S. and Szegedy, C. (2015) ‘Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift’, arXiv:1502.03167.
Kane, P. and Andhare, A. (2016) ‘Application of psychoacoustics for gear fault diagnosis using artificial neural network’, Journal of Low Frequency Noise, Vibration and Active Control, 35(3), pp. 207–220. doi: 10.1177/0263092316660915.
Kingma, D. P. and Ba, J. (2014) ‘Adam: A Method for Stochastic Optimization’, arXiv:1412.6980, pp. 1–15.
Li, C. et al. (2016) ‘Fault diagnosis for rotating machinery using vibration measurement deep statistical feature learning’, Sensors (Switzerland), 16(6). doi: 10.3390/s16060895.
Maas, A. L., Hannun, A. Y. and Ng, A. Y. (2013) ‘Rectifier Nonlinearities Improve Neural Network Acoustic Models’, Proceedings of the 30th International Conference on Machine Learning, 28, p. 6. Available at: https://web.stanford.edu/~awni/papers/relu_hybrid_icml2013_final.pdf.
Peng, H. et al. (2015) ‘A Comparative Study on Regularization Strategies for Embedding-based Neural Networks’, (1). Available at: http://arxiv.org/abs/1508.03721.
Rabiei, E., Droguett, E. L. and Modarres, M. (2016) ‘A prognostics approach based on the evolution of damage precursors using dynamic Bayesian networks’, Advances in Mechanical Engineering, 8(9), p. 168781401666674. doi: 10.1177/1687814016666747.
Riaz, S. et al. (2017) ‘Vibration Feature Extraction and Analysis for Fault Diagnosis of Rotating Machinery-A Literature Survey’, Asia Pacific Journal of Multidisciplinary Research, 5(51), pp. 103–110. Available at: www.apjmr.com.
Ruiz-Gonzalez, R. et al. (2014) ‘An SVM-Based classifier for estimating the state of various rotating components in Agro-Industrial machinery with a vibration signal acquired from a single point on the machine chassis’, Sensors (Switzerland), 14(11), pp. 20713–20735. doi: 10.3390/s141120713.
Srivastava, N. et al. (2014) ‘Dropout: A Simple Way to Prevent Neural Networks from Overfitting’, Journal of Machine Learning Research, 15, pp. 1929–1958.
Verstraete, D. et al. (2017) ‘Deep Learning Enabled Fault Diagnosis Using Time-Frequency Image Analysis of Rolling Element Bearings’, 2017, pp. 1–29.