Available online at www.sciencedirect.com
ScienceDirect
Procedia CIRP 80 (2019) 476–481
www.elsevier.com/locate/procedia

26th CIRP Life Cycle Engineering (LCE) Conference

Explainable Convolutional Neural Network for Gearbox Fault Diagnosis
John Grezmak a, Peng Wang a, Chuang Sun b, and Robert X. Gao a,*

a Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH, 44106-7222, USA
b School of Mechanical Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China

* Corresponding author. Tel.: +1-216-368-6045; fax: +1-216-368-6445. E-mail address: Robert.Gao@case.edu
Abstract

The rapid advancement of data analytics has opened up new opportunities for improving the life cycle of engineered products and enhancing sustainability by intelligent monitoring and fault diagnosis of the related manufacturing processes and systems. Recently, Deep Neural Networks (DNNs) have demonstrated improved accuracy and robustness in classifying machine fault types and severities, when compared with conventional machine learning techniques. A major constraint of DNNs is that they operate as ‘black boxes’, which do not provide insight into how fault classification decisions are made. This not only raises questions on the trustworthiness of the decisions themselves, but also limits further improvement of DNNs for adaptation to a broader range of applications. This paper presents an explainable Deep Convolutional Neural Network (DCNN), which has been developed on the basis of Layer-wise Relevance Propagation (LRP), for fault diagnosis of gearboxes. Vibration signals as time series data are first converted to time-frequency spectra images through wavelet transform, which are then classified by a DCNN. To explain the rationale for a classification decision, LRP decomposes the contributions of local regions in the spectra images to the classification result, and determines which time-frequency points in the spectra image contribute the most to fault type and severity identification. Results of the analysis are then cross-checked with the time-frequency analysis. The effectiveness of the developed explainable DCNN is evaluated by experiments on a gearbox testbed, where gears with different types and degrees of faults are evaluated. LRP results have revealed that a trained DCNN is selective to different frequency bands in the time-frequency spectra for classification of gearbox fault type and severity.
© 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/)
Peer-review under responsibility of the scientific committee of the 26th CIRP Life Cycle Engineering (LCE) Conference.

Keywords: Fault Diagnosis; Explainable Neural Networks; Layer-wise Relevance Propagation
2212-8271 © 2019 The Authors. Published by Elsevier B.V.
10.1016/j.procir.2018.12.008

1. Introduction

Achieving sustainability in manufacturing is of ever-increasing importance to meet the growing demands of strict environmental and occupational safety regulations and diminishing natural resources [1]. As gearboxes are a critical component of power transmission systems in most modern manufacturing systems, consideration of the life cycle of gearboxes is expected to be a key strategy for enhancing the sustainability of manufacturing. Specifically, accurate and timely diagnosis of gear faults is not only essential for avoiding unplanned maintenance or shutdowns. It also provides opportunities for reuse or remanufacturing of faulted gears, and yields useful information on fault occurrence in the current gearbox design that can be used for redesign towards a longer life cycle, as shown in Fig. 1.

Fault diagnosis of gearboxes has traditionally been carried out by analyzing sensor measurements (e.g. vibration) in the time-frequency domain, where fault characteristic features are extracted to reveal the gearbox health condition, abnormalities, and fault types [2]. However, the performance of the time-frequency analysis heavily relies on the accuracy and comprehensiveness of physical knowledge (e.g. the accurate frequency responses of certain fault types). In addition, time-frequency analysis cannot reliably distinguish fault types and severity levels if the faults share the same characteristic frequency. Recently, artificial intelligence algorithms, especially Deep Neural Networks (DNNs), have been
John Grezmak et al. / Procedia CIRP 80 (2019) 476–481 477

investigated for machine fault diagnosis, given their demonstrated capability in characterizing complex patterns underlying data [3]. Several variants of DNNs have been investigated for fault diagnosis of rotating machinery. Autoencoder (AE) networks, as one such variant, reduce data dimensionality while learning efficient representations of the data, which can be used for fault classification [4]. Stacked AE [5] and stacked de-noising AE [6-7] have been investigated for bearing fault diagnosis because of their robustness to noise and their ability to generalize to varying operating conditions. To deal with non-stationary signals, empirical mode decomposition (EMD) has been integrated with autoregressive (AR) modeling and AE [8], in which the AR coefficients were used as inputs to a stacked AE for identifying the fault types. Another DNN variant, Deep Belief Networks (DBNs), is similar in function to AE networks, but uses a stochastic approach to determine the efficient data representations [9-10]. To reduce the training effort, particle swarm optimization has been investigated to automatically optimize the DBN network structure for bearing fault classification [11]. A multi-layer DBN has been developed for both bearing fault type identification and fault severity characterization [12]. Taking the wavelet energy coefficients as the network inputs, the first layer identified the underlying fault type, while the second layer further identified the fault severity based on the first layer results. Deep Convolutional Neural Networks (DCNNs), which have demonstrated advanced capability in image processing and classification, have also been investigated for machine fault diagnosis [13-15]. In past work, the authors have investigated DCNN for gearbox fault diagnosis. Specifically, one-dimensional vibration signals are first converted by wavelet analysis to time-frequency images, from which a DCNN is leveraged to learn the underlying features and perform fault classification [16].

However, despite their demonstrated performance, DNN-based fault diagnosis algorithms are not yet ready for practical application because of their “black-box” nature. In other words, it is difficult to explain the rationale for the decisions made by DNNs, which makes it difficult to evaluate their reliability. Explainable DNN structures have been attracting attention recently. Layer-wise Relevance Propagation (LRP) [17] has recently been proposed as an explanation method for DNN classifiers, which quantifies the contribution of the individual inputs to the DNN output (i.e. the predicted class). LRP has been used for interpreting DNN-based electroencephalogram (EEG) classifications related to motor-imagery Brain-Computer Interfacing [18] and explaining decisions made by an autonomous driving system [19]. To take advantage of the classification abilities of DCNNs while allowing for transparency and physical explanations of their decisions, this paper extends the method in [16] and proposes a framework for explainable DCNN-based fault diagnosis of rotating machinery on the basis of LRP. The proposed framework is validated through an experimental study on gearbox fault diagnosis.

[Fig. 1 diagram; stage labels: Material Preparation/Extrusion, Manufacturing, Use, Condition monitoring, DL-based diagnosis, Diagnosis explanation, Optimized manufacturing, Optimized design option, Redesign, Design, End of Service Life, Reuse, Remanufacture, Recover, Recycle, Reduce]
Fig. 1. Gearbox life cycle in sustainable manufacturing

2. Explainable DCNN

This section reviews the basic theory of DCNNs for image representation and classification, and subsequently elaborates on how LRP can quantify the contributions of the network inputs to the classification results.

2.1 DCNN

DCNN is a type of feed-forward artificial neural network with a data processing structure characterized by local receptive fields and sparse connections between layers. DCNNs can be trained to automatically extract hierarchical features in the input data through their processing layers, while exhibiting high computational efficiency in processing images compared to other DNN variants [20].

A typical DCNN is composed of an input layer, multiple convolutional and pooling layers, fully-connected layers, and an output (classification) layer. The input layer is an image, 2-D (grey image) or 3-D (RGB image), to be classified. Among the DCNN layers, convolutional layers extract features (e.g. edge, curve, and motif) from the image through kernel-based convolution, while pooling layers down-sample the outputs of convolutional layers to improve computational efficiency and allow for spatial invariance of learned features. The fully-connected layers connect the output of the final convolutional or pooling layer through weighted connections to the classification layer, whose outputs (one per class) correspond to the likelihood of an input belonging to a specific class.

A convolutional layer utilizes h kernels (filters) of size p x q to extract features from the outputs of the previous layer. Each kernel, whose values k are a set of trainable weights, is convolved sequentially with p x q regions of the previous outputs. The result is offset by a scalar additive bias b and passed through a non-linear activation function, such as sigm() or the rectified linear unit ReLU(), which introduces non-linearity in the mapping of features from layer to layer. Thus, for previous-layer outputs of size m x n, each kernel creates a feature map of size (m – p + 1) x (n – q + 1), assuming the convolution is done with a unit stride and no padding. The jth feature map of layer l can be obtained through the convolution as

$$y_j^l = \sigma\Big(b_j^l + \sum_{i \in M^{l-1}} y_i^{l-1} * k_j^l\Big) \qquad (1)$$

where σ is the non-linear activation function, the subscript j denotes the jth feature map $y_j^l$ generated by the jth kernel $k_j^l$
and bias $b_j^l$ in the lth layer, $M^{l-1}$ denotes all feature maps in the (l – 1)th layer, and ∗ is the convolution operator. Along the network propagation, the lower-level convolutional layers extract lower-level features (e.g. edges and curves), and higher-level layers extract higher-level features (i.e. combinations of the lower-level features, motifs).

Each convolution layer is typically followed by a pooling layer, which down-samples the outputs of the preceding convolution layer. To perform down-sampling, each feature map is subject to region-wise pooling operations that act on non-overlapping regions of size $N^l \times N^l$. Typical pooling operations include calculating the mean or maximum value in each $N^l \times N^l$ region.

After the final convolutional or pooling layer, the resulting feature maps are concatenated into a feature vector, which is fully connected via weights to the classification layer. The values at the classification layer resulting from the weighted multiplications of the fully-connected layer values are passed through a final non-linear activation function, such as the sigm() function, to obtain a value in the range of 0 to 1 specifying the likelihood that the input corresponds to the class associated with that value. The weights and biases in the network are learned during a training process such that a satisfactory percentage of inputs are correctly classified on a set of testing data. Learning is achieved with the stochastic gradient descent method, which optimizes an objective function related to the error between the actual output of the network and the desired output. The gradients required for this method can be computed by the backpropagation method [21].

2.2. LRP-based pixel-wise explanation

Once a DCNN has been trained to classify the images with satisfactory performance, LRP is applied to interpret the classification decisions made by the DCNN. LRP decomposes an output of a trained DCNN into a set of relevance scores for pixels in the associated input image, which quantify the contribution of each pixel to the output values. Such relevance scores are leveraged in this paper to interpret how pixels in the input image affect the classifier’s decision on fault type and severity for that input.

LRP is performed in a top-down manner with respect to the classifier structure. It starts with a relevance score at the output layer, which can be taken as a real-valued prediction output of the classifier corresponding to a specific class, and propagates relevance scores to the input pixels. The relevance is propagated such that the sum of relevance scores is conserved from layer to layer:

$$f(x) = \cdots = \sum_{d \in l+1} R_d^{(l+1)} = \sum_{d \in l} R_d^{(l)} = \cdots = \sum_{d} R_d^{(1)} \qquad (2)$$

where f(x) is the real-valued prediction output, and $R_d^{(l)}$ is the relevance score for neuron d (i.e. a pixel in a feature map in hidden layers) in layer l. The relevance in a given layer is distributed to the neurons in the preceding layer depending on the layer type. For fully-connected and convolutional layers, the relevance scores in the preceding layer are computed as

$$R_i^{(l)} = \sum_j \frac{z_{ij}}{\sum_{i'} z_{i'j} + \varepsilon \cdot \mathrm{sign}\big(\sum_{i'} z_{i'j}\big)} \, R_j^{(l+1)} \qquad (3)$$

where sign() returns the sign of its argument, and $z_{ij} = a_i^{(l)} w_{ij}^{(l,l+1)}$ is the contribution of the activation at neuron i towards the total pre-activation $z_j = \sum_{i'} z_{i'j}$ of neuron j. Here $a_i^{(l)}$ is the activation of neuron i in layer l, $w_{ij}^{(l,l+1)}$ is the weight connecting neurons i and j, and ε is a numerical stabilizer. This propagation rule implies that, for fully-connected and convolutional layers, the neurons in the preceding layers receive a share of the relevance scores based on their relative contribution to the next layer values in the forward pass. For pooling layers, the relevance scores are up-sampled to match the output dimensions of the previous layer and scaled by the scaling factor between the layers.

Because the pre-activation contribution of a single neuron can have a different sign from the total pre-activation due to all neurons in the same layer, the propagated relevance scores
according to Eq. (3) can take on positive or negative values. Neurons with positive relevance scores are interpreted as having an overall net positive contribution to the classifier output. Thus, input pixels with positive relevance can be interpreted as evidence in the input image for a particular classification. In the same manner, neurons with negative relevance scores can be interpreted as evidence against a particular classification. A relevance score of larger magnitude denotes stronger evidence for or against a particular classification, depending on its sign.

Fig. 2. Framework for LRP-based explainable DCNN for gearbox fault diagnosis

3. Explainable DCNN for Gearbox Fault Diagnosis

In our previous work [16], a DCNN was investigated to recognize and classify time-frequency spectrum images obtained by taking the continuous wavelet transform (CWT) of vibration signals, to diagnose the fault type and severity of gears. To explain the rationale for decisions made by the DCNN, LRP is leveraged, with the framework for DCNN-based classification and LRP-based pixel-wise explanation illustrated in Fig. 2. Once the DCNN has been trained to reach a satisfactory level of classification accuracy, the LRP method is used to quantify the contribution of the pixels of input images to the classification output, as a means of explaining why the DCNN arrived at a given classification decision.

Given a time-frequency spectra image, the classification result is obtained in a forward pass through the network. The value corresponding to the class predicted by the network is set as the final layer relevance score, so that the resulting relevance scores act as evidence for and against the predicted class. The relevance is then propagated towards the input layer using the propagation rules described in Section 2.2. The obtained input pixel relevance scores, as measures of contribution to the predicted class, are then used to interpret which pixels or regions of pixels (physically representing certain frequency ranges) in an input time-frequency image are most relevant to the DCNN’s decision on the classified fault type. Specifically, the locations and patterns of the relevance scores associated with the frequency ranges are cross-checked with known phenomena in the time-frequency domain that characterize the known fault types, so that the DCNN’s decisions can be explained physically.

4. Experimental Evaluation

To verify the proposed explainable DCNN for gearbox fault diagnosis, an experimental study on gearbox fault diagnosis is carried out. This section describes the experimental setup used to obtain the vibration signals for different fault types, the pre-processing steps to obtain the time-frequency spectra images from the signals, and the DCNN structure and training details.

4.1 Experimental setup

The test setup used for collecting vibration signals from gears with various faults is shown in Fig. 3. The gearbox contains a gear pair with a 75/55 transmission ratio, for which the 75 tooth driven gear contains one of four hand-crafted fault types: no fault (normal), slight crack, large crack, or missing tooth. The shaft connected to the 55 tooth driven gear is connected to a brake. Vibration sensors are mounted on the gearbox housing near the bearings in four locations. In this study, vibration signals from a sensor mounted near the driving gear are used for training and testing the DCNN.

Fig. 3. Experimental gearbox setup (top) and gear faults (bottom).

4.2 Image Processing

Two 30 s long vibration signals are collected for each fault type at a sampling rate of 8192 Hz. Every 0.5 s of data (4096 data points) is transformed using the CWT to a time-frequency image. The Morlet wavelet is chosen as the wavelet function on the basis of an energy-to-Shannon-entropy ratio [22]. Each time-frequency image is plotted in gray-scale using the same colormap scale (i.e. the wavelet coefficients for each sample have the same corresponding pixel intensity value in the colormap), and resized to 60 x 60 pixels. A total of 480 images are obtained, with 120 images for each fault type. Half of the images are used for network training and the remaining half for testing.

4.3 DCNN setup and training

The structure of the DCNN used to classify the time-frequency images is shown in Fig. 2. The first convolutional layer consists of 6 feature maps with a kernel size of 11 x 11, and is followed by a mean-pooling layer of size 2 x 2. The second convolutional layer consists of 12 feature maps with a kernel size of 6 x 6, and is followed by a mean-pooling layer of size 2 x 2. The outputs of the second pooling layer are concatenated to form the feature vector, which is fully connected to an output layer with 4 neurons corresponding to the four fault classes. The sigm() function is used as the non-linear activation function at the convolutional layers and the output layer. The network is trained using the stochastic gradient descent method, with a mini-batch size of 10 samples and a learning rate of 0.1, for a total of 100 epochs. The network reached a final classification accuracy of 100% on the testing data set.
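The dataset size in Section 4.2 and the layer configuration in Section 4.3 can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only. The helper names (conv_out, pool_out, dataset_counts, trace_shapes) are hypothetical and do not come from the paper. It assumes non-overlapping 0.5 s windows, which is consistent with the reported 120 images per fault type. It also assumes valid convolution with unit stride, i.e. the (m – p + 1) rule of Section 2.1.

```python
# Numbers taken from Sections 4.2-4.3; helper names are illustrative only.
FS_HZ = 8192              # sampling rate
SIGNAL_S = 30             # length of each recorded signal, in seconds
SIGNALS_PER_CLASS = 2     # two recordings per fault type
WINDOW_S = 0.5            # segment length per time-frequency image
N_CLASSES = 4             # normal, slight crack, large crack, missing tooth


def conv_out(size, kernel):
    """Valid convolution with unit stride: (m - p + 1)."""
    return size - kernel + 1


def pool_out(size, window):
    """Non-overlapping N x N pooling; the size must divide evenly."""
    if size % window:
        raise ValueError(f"{size} is not divisible by {window}")
    return size // window


def dataset_counts():
    points_per_image = int(FS_HZ * WINDOW_S)
    images_per_signal = int(SIGNAL_S / WINDOW_S)   # assumes non-overlapping windows
    per_class = images_per_signal * SIGNALS_PER_CLASS
    return points_per_image, per_class, per_class * N_CLASSES


def trace_shapes(side=60):
    """Follow a side x side gray-scale image through the layers of Section 4.3."""
    layers = [("conv", 11, 6), ("pool", 2, 6), ("conv", 6, 12), ("pool", 2, 12)]
    trace = []
    for kind, k, maps in layers:
        side = conv_out(side, k) if kind == "conv" else pool_out(side, k)
        trace.append((kind, side, maps))
    feature_len = side * side * trace[-1][2]       # concatenated feature vector
    return trace, feature_len


if __name__ == "__main__":
    print(dataset_counts())    # (4096, 120, 480)
    trace, n = trace_shapes()
    for kind, side, maps in trace:
        print(f"{kind}: {maps} maps of {side} x {side}")
    print("feature vector length:", n, "-> 4 sigmoid outputs")
```

Under these assumptions, the 60 x 60 input shrinks to 50 x 50, then 25 x 25, then 20 x 20, and finally 10 x 10. The concatenated feature vector therefore has 10 x 10 x 12 = 1200 entries feeding the 4 output neurons. Each 2 x 2 pooling step divides evenly, which is why the kernel sizes 11 and 6 fit the 60-pixel image without padding.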
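The ε-rule of Eq. (3) can be made concrete with a minimal NumPy sketch. It follows the procedure of Section 3: the predicted-class output is used as the initial relevance and all other classes are set to 0. It is applied to fully-connected layers of a toy two-layer sigmoid network. This is not the authors' implementation, and it rests on several assumptions:

- Biases are omitted, since Eq. (3) has no bias term.
- sign(0) is taken as +1 so the denominator never vanishes.
- Random weights stand in for a trained DCNN.
- Convolutional and pooling layers are not covered.

The names lrp_epsilon, explain and sigm are hypothetical.

```python
import numpy as np


def lrp_epsilon(a, W, R_next, eps=0.1):
    """Epsilon-rule of Eq. (3) for one fully-connected layer (no bias term).

    a      : activations a_i of layer l, shape (I,)
    W      : weights w_ij from layer l to layer l+1, shape (I, J)
    R_next : relevance R_j of layer l+1, shape (J,)
    Returns the relevance R_i of layer l, shape (I,).
    """
    z = a[:, None] * W                      # z_ij = a_i * w_ij
    z_j = z.sum(axis=0)                     # total pre-activation of each neuron j
    # eps * sign(z_j), with sign(0) taken as +1 so the denominator is never 0
    stab = eps * np.where(z_j >= 0, 1.0, -1.0)
    return (z / (z_j + stab)) @ R_next      # sum_j z_ij / (z_j + stab_j) * R_j


def sigm(v):
    return 1.0 / (1.0 + np.exp(-v))


def explain(x, W1, W2, eps=0.1):
    """Forward pass through a toy 2-layer sigmoid net, then LRP back to x."""
    h = sigm(x @ W1)                        # hidden activations
    out = sigm(h @ W2)                      # class scores in (0, 1)
    k = int(np.argmax(out))                 # predicted class
    R_out = np.zeros_like(out)
    R_out[k] = out[k]                       # relevance = predicted-class value, others 0
    R_h = lrp_epsilon(h, W2, R_out, eps)
    R_x = lrp_epsilon(x, W1, R_h, eps)
    return R_x, out[k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random(16)                      # stand-in for flattened, non-negative pixels
    x[:4] = 0.0                             # zero-valued pixels
    W1 = rng.normal(size=(16, 8))
    W2 = rng.normal(size=(8, 4))
    R_x, score = explain(x, W1, W2, eps=0.1)
    print("relevance of zero pixels:", R_x[:4])
    print("sum of input relevance:", R_x.sum(), "vs f(x):", score)
```

With ε = 0 (and no bias), the input relevance sums to f(x) up to floating-point error, as stated in Eq. (2). With ε > 0, each factor $z_j/(z_j + \varepsilon\,\mathrm{sign}(z_j))$ is below 1, so the stabilizer absorbs a small share of the relevance. Zero-valued inputs always receive zero relevance because $z_{ij} = a_i w_{ij} = 0$.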
4.4 LRP-based explanation of DCNN’s classifications

The LRP method is used to identify the features in the time-frequency spectra of the testing data that are relevant to the DCNN’s classifications. The final layer relevance score for each sample is set as the value corresponding to the predicted class (i.e. the values corresponding to all other classes are set to 0). Relevance is then propagated to the input pixels using the propagation rules described in Section 2.2, with a numerical stabilizer value of ε = 0.1. To visualize the relevant features in the inputs, the relevance scores for each sample are normalized and plotted as a heatmap. Positive and negative relevance are shown in hot and cold hues, respectively, and the colormap is centered at relevance scores of zero, which indicate no significant positive or negative contribution to a classification.

5. Results and Discussions

Heatmaps of relevance scores for 4 time-frequency spectra images for each fault type are shown in Fig. 4. Each image represents a 0.5 s time-frequency spectrum (roughly 15 gear revolutions per image) of the vibration data corresponding to a given fault type. The vertical axis shows the frequency values from the time-frequency spectra with which the pixels are associated. The colormaps have been chosen such that red colors correspond to positive relevance, blue colors correspond to negative relevance, and green corresponds to neutral (approximately zero) relevance.

Most of the positive and negative relevance falls within the frequency range of just below 500 Hz to 1500 Hz, while neutral relevance is dominant outside of this range. This is expected for two reasons. First, the vibration components outside of this range are very small in comparison to those within it, which are more closely related to the first harmonic of the meshing frequency and its sidebands. Second, vibration components with very small wavelet coefficients from the CWT are assigned a value of 0 when the spectrum is converted to a gray-scale image. According to the propagation rule of Eq. (3), any input of 0 is assigned a relevance score of 0, since it does not contribute to the output layer values. Additionally, pixels near the image edges tend to have neutral relevance scores, since they have sparser connections to the neurons of the succeeding layer than pixels nearer to the center, and thus receive a smaller amount of relevance.

In the 500 Hz to 1500 Hz range, the relevance scores appear to follow patterns specific to the associated fault type. Specifically, bands of positive and negative relevance with nearly constant widths along the frequency axis span the time domain of the images, with each fault type displaying unique band locations. This suggests that the DCNN classifier has learned to differentiate between the underlying fault types by detecting differences in the relative shapes of the frequency distributions represented in the time-frequency spectra, which have a nearly cyclic pattern in time with the gear revolution. This is intuitively plausible, since changes in the frequency distribution are expected to occur in faulty gears as a result of sideband amplitude changes, which are a function of fault type and severity [23].

The fast Fourier transform of a vibration signal for each fault type is shown in Fig. 5. The differences between these frequency spectra can be easily seen by visual inspection. For example, the amplitude decrease in the frequency range of 1100 to 1200 Hz for the slight crack distinguishes this fault type from the normal gear, and the amplitude decrease in the frequency range of 1200 to 1300 Hz for the missing tooth distinguishes this fault type from the large crack. In Fig. 4, bands of positive or negative relevance can be seen at these frequency locations, with opposite signs between the fault classes. For example, a band of positive relevance is seen at 1100 to 1200 Hz for the slight crack, while a band of negative relevance is seen in the same frequency range for the normal gear. This suggests that the DCNN has been trained to distinguish between the fault classes based on these changes in the frequency spectra, as represented in the time-frequency input images.

6. Conclusions

This paper presents an explainable DCNN, developed on the basis of layer-wise relevance propagation, for gearbox
1500 Negative band
(1100-1200 Hz)
Normal
1000
500
1500
Positive band
Missing Tooth Large Crack Slight Crack
(1100-1200 Hz)
1000
Frequency (Hz)
500
1500 Negative band
(1200-1300 Hz)
1000
500
1500
Positive band
1000 (1200-1300 Hz)
500
0 0.5 1 1.5 2
Time (s)
Fig. 4. LRP results for 4 sample inputs from each fault types
fault diagnosis. Time-frequency spectra of vibration signals related to various gear faults are obtained through the wavelet transform of the signals and then fed to a DCNN for identification of fault type and severity. LRP is used to explain which parts of the time-frequency spectra represented in the input images are relevant to the DCNN's classification of fault type and severity. The LRP results demonstrate that the DCNN clearly distinguishes the change in frequency response caused by crack initiation from the response of the normal gear. This finding is consistent with the known sideband phenomena caused by local gear tooth damage. The decision transparency that LRP provides for the DCNN opens up opportunities for more widespread adoption of DNN classifiers in machine fault diagnosis to support sustainable manufacturing.

Fig. 5. FFT of gear vibration signals (panels: Normal and Slight Crack, marked at 1100-1200 Hz; Large Crack and Missing Tooth, marked at 1200-1300 Hz).

Acknowledgement

This research was partially supported by the Digital Manufacturing and Design Innovation Institute under award DMDII-15-14-01.

References

[1] Jayal AD, Badurdeen F, Dillon Jr. OW, Jawahir IS. Sustainable manufacturing: Modeling and optimization challenges, process and system levels. CIRP J. Manuf. Sci. Technol. 2010;2:3 p. 144-152.
[2] Zappala D, Tavner PJ, Crabtree CJ, Sheng S. Side-band algorithm for automatic wind turbine gearbox fault detection and diagnosis. IET Renew. Power Gener. 2014;8:4 p. 380-389.
[3] Liu R, Yang B, Zio E, Chen X. Artificial intelligence for fault diagnosis of rotating machinery: A review. Mech. Syst. Signal Process. 2018;108 p. 33-47.
[4] Verma N, Gupta VK, Sharma M, Sevakula RK. Intelligent condition based monitoring of rotating machines using sparse auto-encoders. Proc. of 2013 IEEE PHM Conference p. 1-7.
[5] Tao S, Zhang T, Yang J, Wang X, Lu W. Bearing fault diagnosis method based on stacked autoencoder and softmax regression. Proc. of 2015 34th Chinese Control Conference p. 1-5.
[6] Lu C, Wang ZY, Qin WL, Ma J. Fault diagnosis of rotary machinery components using a stacked denoising autoencoder-based health state identification. Signal Process. 2017;130 p. 377-388.
[7] Shao H, Jiang H, Zhao H, Wang F. A novel deep autoencoder feature learning method for rotating machinery fault diagnosis. Mech. Syst. Signal Process. 2017;95 p. 187-204.
[8] Qi Y, Shen C, Wang D, Shi J, Jiang X, Zhu Z. Stacked sparse autoencoder-based deep network for fault diagnosis of rotating machinery. IEEE Access 2017;5 p. 1-14.
[9] Chen Z, Li W. Multisensor feature fusion for bearing fault diagnosis using sparse autoencoder and deep belief network. IEEE Trans. Instrum. Meas. 2017;66:7 p. 1693-1702.
[10] Tao J, Liu Y, Yang D. Bearing fault diagnosis based on Deep Belief Network and multisensor information fusion. Shock Vib. 2016 p. 1-9.
[11] Shao H, Jiang H, Zhang X, Niu M. Rolling bearing fault diagnosis using an optimization deep belief network. Meas. Sci. Technol. 2015;26:11 p. 1-17.
[12] Gan M, Wang C, Zhu C. Construction of hierarchical diagnosis network based on deep learning and its application in the fault pattern recognition of rolling element bearings. Mech. Syst. Signal Process. 2016;72-73 p. 92-104.
[13] Chen Z, Li C, Sanchez RV. Gearbox fault identification and classification with convolutional neural networks. Shock Vib. 2015;2 p. 1-10.
[14] Janssens O, Slavkovikj, Vervisch B, Stockman K, Loccufier M, Verstockt S, Van de Walle R, Van Hoecke S. Convolutional neural network based fault detection for rotating machinery. J. Sound Vib. 2016;377 p. 331-345.
[15] Guo X, Chen L, Shen C. Hierarchical adaptive deep convolution neural network and its application to bearing fault diagnosis. Measurement 2016;93 p. 490-502.
[16] Wang P, Ananya, Yan R, Gao RX. Virtualization and deep recognition for system fault classification. Proc. of NAMRC 45 2017 p. 1-7.
[17] Bach S, Binder A, Montavon G, Klauschen F, Muller KR, Samek W. On pixel-wise explanations for non-linear classifier decisions by Layer-wise Relevance Propagation. PLoS One 2015 p. 1-46.
[18] Sturm I, Lapuschkin S, Samek W, Muller KR. Interpretable deep neural networks for single-trial EEG classification. J. Neurosci. Methods 2016;274 p. 141-145.
[19] Bojarski M, Yeres P, Choromanska A, Choromanski K, Firner B, Jackel L, Muller U. Explaining how a deep neural network trained with end-to-end learning steers a car. arXiv:1704.07911, 2017 p. 1-8.
[20] Weimer D, Scholz-Reiter B, Shpitalni M. Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection. CIRP Annals – Manuf. Tech. 2016.
[21] Bouvrie J. Notes on convolutional neural networks. 2006.
[22] Yan R, Gao RX. Base wavelet selection for bearing signal analysis. Int. J. Wavelets Multiresolut. Inf. Process. 2009;7:4 p. 411-426.
[23] Liu B, Riemenschneider S, Xu Y. Gearbox fault diagnosis using empirical mode decomposition and Hilbert spectrum. Mech. Syst. Signal Process. 2006;20 p. 718-734.