$$ \mathrm{Corr}(F_k, R_k) = \frac{\mathrm{cov}(F_k, R_k)}{\sigma(F_k)\,\sigma(R_k)}, \qquad (1) $$

where σ(R_k) is the standard deviation of the kth target and cov(F_k, R_k) is the covariance between F_k and R_k.
Secondly, the symmetrical uncertainty is defined as follows:

$$ SU(F_k, R_k) = 2\,\frac{IG(F_k|R_k)}{H(F_k) + H(R_k)}, \qquad (2) $$
with

$$ IG(F_k|R_k) = H(F_k) - H(F_k|R_k), \qquad (3) $$
$$ H(F_k) = -\sum_{F'_k \in X(F_k)} P(F'_k) \log P(F'_k), \qquad (4) $$
$$ H(F_k|R_k) = -\sum_{R'_k \in X(R_k)} P(R'_k) \sum_{F'_k \in X(F_k)} P(F'_k|R'_k) \log P(F'_k|R'_k), \qquad (5) $$
$$ P(F'_k) = \frac{1}{N}\sum_{i=1}^{N} \delta(d_i, F'_k), \qquad (6) $$

$$ \delta(d_i, F'_k) = \begin{cases} 1, & \text{if } d_i = F'_k \\ 0, & \text{otherwise.} \end{cases} \qquad (7) $$
Fig. 2. Flowchart of HFS for unsupervised learning.
Table 1
Comparison between FRMV and HFS.

|                        | FRMV                                             | HFS                                                                        |
|------------------------|--------------------------------------------------|----------------------------------------------------------------------------|
| Subspace               | Randomly select N/2 features                     | Randomly divide the feature space into two subspaces X1, X2                |
| Clustering analysis    | Retains the bias towards a single data structure | Considers the bias towards the data structure of each individual algorithm |
| Independent evaluation | None                                             | Re-ranks the features according to a similarity measurement                |

Note: N is the number of all features.
Y. Yang et al. / Expert Systems with Applications 38 (2011) 11311–11320
where H(F_k) is the entropy of F_k and H(F_k|R_k) is the conditional entropy of F_k. X(F_k) denotes all possible values of F_k and X(R_k) denotes all possible values of R_k. P(F'_k) is the probability that F_k equals F'_k, and P(F'_k|R_k) is the probability that F_k equals F'_k under the condition that the instances are assigned to the group R_k. In addition, a symmetrical uncertainty SU(F_k, R_k) of 1 indicates that F_k is completely related to R_k; on the other hand, a value of 0 means that F_k is absolutely irrelevant to the target (Hong et al., 2008a; Shao & Nezu, 2000).
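As a concrete illustration, Eqs. (2)-(7) can be sketched in a few lines of Python; the helper names and the choice of a base-2 logarithm are our own, not from the paper:

```python
import numpy as np

def entropy(values):
    """Empirical entropy H(F), Eq. (4), with probabilities estimated
    by frequency counts as in Eqs. (6)-(7)."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def conditional_entropy(f, r):
    """Conditional entropy H(F|R), Eq. (5): entropy of f within each
    group of r, weighted by the group's probability."""
    h = 0.0
    for rv in np.unique(r):
        mask = (r == rv)
        h += mask.mean() * entropy(f[mask])
    return h

def symmetrical_uncertainty(f, r):
    """SU(F, R) = 2 * IG(F|R) / (H(F) + H(R)), Eqs. (2)-(3)."""
    ig = entropy(f) - conditional_entropy(f, r)   # information gain, Eq. (3)
    denom = entropy(f) + entropy(r)
    return 2.0 * ig / denom if denom > 0 else 0.0

# A feature identical to the cluster assignment gets SU = 1;
# one carrying no information about it gets SU = 0.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(symmetrical_uncertainty(labels, labels))   # 1.0
```

As the endpoint behaviour in the text requires, a feature that perfectly predicts the target scores 1 and an irrelevant one scores 0.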
Thirdly, the Davies-Bouldin (DB) index is a function of the ratio of the sum of within-cluster scatter to between-cluster separation, computed as follows:
$$ DB = \frac{1}{n} \sum_{i=1}^{n} \max_{1 \le j \le n,\ j \ne i} \left\{ \frac{S_n(Q_i) + S_n(Q_j)}{S(Q_i, Q_j)} \right\}, \qquad (8) $$
where n is the number of clusters, Q_i stands for the ith cluster, S_n denotes the average distance of all objects in a cluster to their cluster centre, and S(Q_i, Q_j) is the distance between the cluster centers. The DB index is small if the clusters are compact and far from each other; in other words, a small DB index means a good clustering.
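The DB index of Eq. (8) can be sketched as follows; the function name and the use of Euclidean distances for both the scatter and the centre separation are assumptions for illustration:

```python
import numpy as np

def davies_bouldin(X, labels):
    """DB index of Eq. (8): for each cluster, take the worst ratio of
    summed within-cluster scatters to centre separation, then average."""
    ids = np.unique(labels)
    centers = np.array([X[labels == c].mean(axis=0) for c in ids])
    # S(Q_i): average distance of the cluster's points to its centre
    scatter = np.array([np.linalg.norm(X[labels == c] - m, axis=1).mean()
                        for c, m in zip(ids, centers)])
    n = len(ids)
    total = 0.0
    for i in range(n):
        total += max((scatter[i] + scatter[j]) / np.linalg.norm(centers[i] - centers[j])
                     for j in range(n) if j != i)
    return total / n

# Two tight, well-separated clusters give a small DB index (good clustering).
X = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 10.0], [10.0, 10.1]])
labels = np.array([0, 0, 1, 1])
print(davies_bouldin(X, labels))   # well below 1
```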
2.3. Combination and similarity measurement
Besides the maximization of clustering performance, the other important purpose is the selection of features based on feature dependency, or similarity. Any feature carrying little or no additional information beyond that subsumed by the remaining features is redundant and should be eliminated (Mitra et al., 2002). That is, if a highly ranked feature carrying valuable information is very similar to a lower-ranked feature, the latter should be eliminated because it carries no additional valuable information. Therefore, the similarities between features are taken as the reference for the redundancy evaluation.
In step 2, a consensus function, named the combiner, is utilized to combine all sub-decisions into a pre-final decision. A large number of combiners for combining classifier results were discussed in Dietrich, Palm, and Schwenker (2003). The most common combiners are the majority vote, the simple average and the weighted average. In the simple average, the average of the learning-model results is calculated and the variable with the largest average value is selected as the final decision. The weighted average follows the same concept as the simple average except that the weights are selected heuristically. The majority vote assigns the kth variable a rank j if more than half of the sub-decisions vote it to rank j.
Practically, the determination of the weights in the weighted-average combiner relies on experience. On the other hand, the majority vote can lead to confusion in decision making, e.g. one feature could be nominated with two ranks at the same time. Therefore, the simple-average combiner is applied in this study to combine the sub-decisions, computed as follows:
$$ AR(j) = \frac{1}{M}\sum_{k=1}^{M} \mathrm{rank}^{(k)}(j), \qquad (9) $$
where M is the population of sub-decisions, and rank^(k)(j) is the significance measurement of feature j in the kth sub-decision RF^(k).
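A minimal sketch of the simple-average combiner of Eq. (9), with a made-up 3 x 4 matrix of sub-decision scores (the values are purely illustrative):

```python
import numpy as np

# Each row holds one sub-decision's significance scores rank^(k)(j)
# for four hypothetical features j = 0..3.
sub_decisions = np.array([
    [0.9, 0.4, 0.7, 0.1],
    [0.8, 0.5, 0.6, 0.2],
    [0.7, 0.3, 0.9, 0.1],
])

avg_rank = sub_decisions.mean(axis=0)   # AR(j), Eq. (9): average over M sub-decisions
final_order = np.argsort(-avg_rank)     # pre-final decision: features, best first
print(final_order)   # [0 2 1 3]
```

Unlike the majority vote, this always yields a single unambiguous ordering, which is the reason given in the text for preferring it.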
Thereafter, in step 3, in order to reduce redundancy, those highly ranked but less independent features with respect to the obtained pre-final decision are eliminated. The similarity between features can be utilized to estimate the redundancy. There is a broad class of criteria for measuring the similarity between two random variables based on the linear dependency between them. The reason for choosing linear dependency as a feature-similarity measure is that if the data are linearly separable in the original representation, they remain linearly separable when all but one of a set of linearly dependent features are eliminated. In this research, the most well-known measure of similarity between two random variables, the correlation coefficient, is adopted. The correlation coefficient ρ between two variables x and y is defined as
cient q between two variables x and y is dened as
qx; y
covx; y
varxvary
_ ; 10
where var(x) denotes the variance of x and cov (x, y) is the covari-
ance between two variables x and y.
The elimination procedure is then conducted according to the pre-final decision and the similarity measure between features. For example, the most significant (top-ranked) feature is retained, and the features most related to it according to the similarity measure are considered redundant and removed; the successive features are processed likewise until the remaining ranked features are linearly independent.
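The elimination step can be sketched as a greedy pass over the ranked list; the 0.95 similarity threshold below is a hypothetical choice, since the paper does not state one:

```python
import numpy as np

def corr(x, y):
    """Correlation coefficient of Eq. (10)."""
    return np.cov(x, y)[0, 1] / np.sqrt(np.var(x, ddof=1) * np.var(y, ddof=1))

def eliminate_redundant(features, ranking, threshold=0.95):
    """Walk the ranked list best-first and keep a feature only if it is not
    strongly correlated with any feature kept so far. `features` is an
    (n_samples, n_features) array; `ranking` lists feature indices best-first."""
    kept = []
    for j in ranking:
        if all(abs(corr(features[:, j], features[:, k])) < threshold for k in kept):
            kept.append(j)
    return kept

# Feature 1 is an exact linear copy of feature 0, so it is dropped.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
X = np.column_stack([x, 2 * x, rng.normal(size=100)])
print(eliminate_redundant(X, [0, 1, 2]))   # [0, 2]
```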
3. HFS's application in bearing fault diagnosis
This section applies the HFS to bearing fault diagnostics. The comparison results between HFS and other feature selection methods will be demonstrated and discussed.
To validate that the proposed feature selection scheme can improve the classification accuracy, a comparison between the proposed hybrid feature selection scheme and five other feature selection approaches was carried out. The eight learning algorithms are listed as follows:
(1) HFS with symmetrical uncertainty (HFS_SU);
(2) HFS with linear correlation coefficient (HFS_LCC);
(3) HFS with DB index (HFS_DB);
(4) PCA-based feature selection (Malhi & Gao, 2004);
(5) FRMV based on k-means clustering with symmetrical uncertainty (FRMV_KM) (Hong et al., 2008a);
(6) Forward search feature selection (SFFS) (Oduntan et al., 2008);
(7) Forward orthogonal search feature selection algorithm by maximizing the overall dependency (fosmod) (Wei & Billings, 2007);
(8) Feature selection through feature clustering (FFC) (Li, Hu, Shen, Chen, & Li, 2008).
The comparisons among them were in terms of classification accuracy. According to Hong et al. (2008a), the iteration count of FRMV_KM was set to 100, k-means clustering was used to obtain the population of clustering solutions, and SU was adopted as the evaluation criterion. In order to get a comparable population of sub-decisions, the iteration count of the proposed algorithm was set to 50. The threshold of fosmod was set to 0.2. Two commonly used clustering algorithms were adopted in the HFS: fuzzy c-means (FCM) clustering and hierarchical clustering. In this research, the result of FCM was defuzzified as follows:
$$ R(k) = \begin{cases} 1, & \text{if } P(k) = \max(P) \\ 0, & \text{otherwise,} \end{cases} \qquad (11) $$

where P and P(k) denote the memberships of an instance in all clusters and the membership of the instance in the kth cluster, respectively.
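Eq. (11) amounts to a row-wise argmax over the FCM membership matrix; a minimal sketch with made-up memberships:

```python
import numpy as np

# Hypothetical FCM membership matrix: one row per instance, one column
# per cluster. Eq. (11) turns each row into a crisp assignment at the maximum.
memberships = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [0.25, 0.60, 0.15],
])

crisp = (memberships == memberships.max(axis=1, keepdims=True)).astype(int)
hard_labels = memberships.argmax(axis=1)   # equivalent cluster indices
print(hard_labels)   # [0 2 1]
```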
Features discussed in this chapter for bearing defects included features extracted from the time domain, the frequency domain, the time-frequency domain and empirical mode decomposition (EMD). Firstly, in the time domain, statistical parameters were extracted directly from the waveform of the vibration signals. A wide set of statistical parameters, such as rms, kurtosis, skewness, crest factor and normalized high-order central moments, has been developed (Jack & Nandi, 2002; Lei, He, & Zi, 2008; Samanta & Nataraj, 2009; Samanta, Al-Balushi, & Al-Araimi, 2003). Second, the characteristic frequencies related to the bearing components were located, e.g. the ball spin frequency (BSF), the ball-pass frequency of the inner ring (BPFI), and the ball-pass frequency of the outer ring (BPFO). Besides, in order to interpret real-world signals effectively, the envelope technique for the frequency spectrum was used to extract the features of the modulated carrier-frequency signals (Patil, Mathew, & RajendraKumar, 2008). In addition, a new signal feature proposed by Huang from the envelope signal (Huang, Xi, & Li, 2007b), the power ratio of maximal defective frequency to mean (PMM for short), was calculated as follows:
$$ \mathrm{PMM} = \frac{\max\{p(f_{po}),\, p(f_{pi}),\, p(f_{bc})\}}{\mathrm{mean}(p)}, \qquad (12) $$
where p(f_po), p(f_pi) and p(f_bc) are the average powers at the defective frequencies of the outer-race, inner-race and ball defects, respectively, and mean(p) is the average of the overall frequency power.
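A sketch of two kinds of features described above: a few time-domain statistics and the PMM ratio. It assumes the defective-frequency powers have already been read off the envelope spectrum; the function names are our own:

```python
import numpy as np

def time_domain_stats(x):
    """A few of the time-domain statistics mentioned above (sketch)."""
    rms = np.sqrt(np.mean(x ** 2))
    crest_factor = np.max(np.abs(x)) / rms
    kurtosis = np.mean((x - x.mean()) ** 4) / np.var(x) ** 2
    skewness = np.mean((x - x.mean()) ** 3) / np.var(x) ** 1.5
    return rms, crest_factor, kurtosis, skewness

def pmm(p_outer, p_inner, p_ball, spectrum_power):
    """PMM: ratio of the maximal defective-frequency power to the
    mean of the overall frequency power."""
    return max(p_outer, p_inner, p_ball) / np.mean(spectrum_power)

rms, crest, kurt, skew = time_domain_stats(np.array([1.0, -1.0, 1.0, -1.0]))
print(rms, crest)                                  # 1.0 1.0
print(pmm(4.0, 2.0, 1.0, np.array([2.0, 2.0])))    # 2.0
```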
Thirdly, Yen (2000) introduced the wavelet packet transform (WPT) as follows:

$$ e_{j,n} = \sum_{k} w_{j,n,k}^{2}, \qquad (13) $$
where w_{j,n,k} is the packet coefficient, j is the scaling parameter, k is the translation parameter, and n is the oscillation parameter. Each wavelet packet coefficient measures a specific sub-band frequency content. In addition, EMD was used to decompose the signal into several intrinsic mode functions (IMFs) and a residual. The EMD energy entropy of Yu, Yu, and Cheng (2006) was calculated from the first several IMFs of the signal.
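The wavelet packet node energies described above can be illustrated with a plain Haar wavelet packet; the choice of the Haar wavelet (and a signal length divisible by 2**levels) is an assumption made here for simplicity, not the paper's setting:

```python
import numpy as np

def haar_packet_energies(x, levels):
    """Node energies e_{j,n} = sum_k w_{j,n,k}^2 for a Haar wavelet packet.
    Assumes len(x) is divisible by 2**levels."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        split = []
        for c in nodes:
            split.append((c[0::2] + c[1::2]) / np.sqrt(2.0))   # low-pass half
            split.append((c[0::2] - c[1::2]) / np.sqrt(2.0))   # high-pass half
        nodes = split
    return np.array([np.sum(c ** 2) for c in nodes])

# The Haar packet transform is orthonormal, so the node energies
# partition the total signal energy across the sub-bands.
x = np.array([1.0, 2.0, 3.0, 4.0])
e = haar_packet_energies(x, 2)
print(len(e))   # 4
```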
In this research, a self-organizing map (SOM) was used to validate the classification performance based on the selected features. The theoretical background of the unsupervised SOM has been extensively studied in the literature; a brief introduction of the SOM for bearing fault diagnosis can be found in Liao and Lee (2009). With available data from different bearing failure modes, the SOM can be applied to build a health map in which different regions indicate different defects of a bearing. Each input vector can be represented by a best matching unit (BMU) in the SOM. After training, the input vectors of a specific bearing defect are represented by a cluster of BMUs in the map, which forms a region indicating the defect. If the input vectors are labeled, each region can be defined to represent a defect.
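The BMU lookup can be sketched as a nearest-weight search over the map grid; the weight layout (rows x cols x feature dimension) is an assumed convention, and SOM training itself is not shown:

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the (row, col) grid position of the map unit whose weight
    vector is nearest (in Euclidean distance) to the input vector x.
    `weights` is assumed to have shape (rows, cols, dim)."""
    dists = np.linalg.norm(weights - x, axis=-1)
    r, c = np.unravel_index(np.argmin(dists), dists.shape)
    return int(r), int(c)

# Toy 2x2 map: only unit (1, 1) is near the input, so it is the BMU.
weights = np.zeros((2, 2, 2))
weights[1, 1] = [1.0, 1.0]
print(best_matching_unit(weights, np.array([0.9, 0.9])))   # (1, 1)
```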
3.1. Experiments
In this research, two tests were conducted on two types of bearings, and the class information was considered unknown in both cases.
In the first test, bearings were artificially made to have a roller defect, an inner-race defect, an outer-race defect and four different combinations of these single failures, respectively.

Fig. 3. Vibration signal of the first test, including the normal pattern and the seven failure patterns (panels: normal; roller defect; inner-race defect; outer-race defect; inner-race & roller defect; outer & inner-race defect; outer & inner-race & roller defect; outer-race & roller defect).

In this case, an
SKF32208 bearing was tested, with an accelerometer mounted vertically on its housing. The sampling rate for the
vibration signal was 50 kHz. The BPFI, BPFO, and BSF for this case
were calculated as 131.73 Hz, 95.2 Hz and 77.44 Hz, respectively.
Fig. 3 shows the vibration signals of all defects as well as the normal condition in the first test.
In the second test, a set of 6308-2R single-row deep-groove ball bearings were run to failure, resulting in roller defects, inner-race defects and outer-race defects (Huang et al., 2007b). In total, 10 bearings were involved in the experiment. The data sampling frequency was 20 kHz. The BPFI, BPFO, and BSF in this case were calculated as 328.6 Hz, 205.3 Hz and 274.2 Hz, respectively. It should be pointed out that the beginning of the second test was not stable, after which the signal fell into a long normal period. Hence, two separate segments from the stable normal period were selected as baselines for training and testing, respectively. On the other hand, the data that exceeded the mean value before the end of the test were treated as potential failure patterns. Therefore, 70% of the faulty patterns and half of the good patterns were used for training the unsupervised learning model, while all the faulty patterns and the other half of the good patterns were used for testing. Fig. 4 shows part of the data segments of one bearing from the run-to-failure experiment in the second test.
3.2. Analysis and result
In the first test, a total of 24 features were computed, as listed below. Half of the data was used for training the SOM and the remaining part for testing.

- Energies centered at 1xBPFO, 2xBPFO, 1xBPFI, 2xBPFI, 1xBSF, 2xBSF.
- 6 statistics of the raw signal (mean, rms, kurtosis, crest factor, skewness, entropy).
- 6 statistics of the envelope signal obtained by the Hilbert transform.
- 6 statistics of the spectrum of the waveform obtained by FFT.
Fig. 5 shows the results of the first test, with the x axis representing the number of selected features fed into the unsupervised SOM for clustering and the y axis representing the corresponding classification accuracy. The first 12 features selected by each algorithm are shown for convenience. Taking Fig. 5a as an example, the classification accuracies based on HFS_SU, HFS_LCC and HFS_DB with the top-ranked feature as input were 92.11%, 92.11% and 85.59%, respectively. When using the first three ranked features, accuracies of 97.19%, 99.77% and 97.03% were achieved. Compared with HFS_SU and HFS_DB, the features selected by HFS_LCC achieved a higher classification accuracy of 99.77%; in other words, HFS_LCC apparently selected the most representative features for this specific application. As shown in Fig. 5b, the highest classification accuracy for PCA was 99.38% with 5 features. In Fig. 5c, the classification accuracies based on HFS_SU, HFS_DB and HFS_LCC were higher than the results based on FRMV_KM; for FRMV_KM, the highest classification accuracy of 98.36% was achieved with 12 features. Fig. 5d compares SFFS and the three HFS methods: HFS_LCC selected the most representative features, and the accuracies reached by HFS were higher. For SFFS, the highest classification accuracy of 98.43% was achieved with 9 features. As shown in Fig. 5e, although the first 11 features selected by fosmod ultimately reached an accuracy of 99.14%, HFS not only obtained higher accuracy but also
ranked the features with high reliability.

Fig. 4. Vibration signal of one bearing in the second test: (1) unstable beginning of the test; (2) first stable segment; (3) second stable segment; (4) failure pattern (inner-race defect).

Fig. 5a. Comparison of the classification accuracy of HFS_LCC, HFS_SU and HFS_DB.

Fig. 5b. Comparison of the classification accuracy of HFS_LCC, HFS_SU, HFS_DB and the PCA-based method.

Compared to FFC, as shown in Fig. 5f, the features selected by HFS provided better classification accuracy with fewer features. For FFC, the highest accuracy of 98.43%
was reached with 9 features. The performance improvement of the proposed model over FRMV_KM, SFFS, fosmod and FFC was mainly due to its use of every feature, the combination of clustering solutions and the independence evaluation, which together overcome the deficiencies of limited diversity among clustering solutions and the harm done by correlated features.
In order to illustrate the effect of feature-set redundancy on the classification performance and to demonstrate the robustness of HFS, more candidate features were involved in the second test. In total, 40 features were calculated, as listed below.

- 10 statistics of the raw signal (var, rms, skewness, kurtosis, crest factor, 5th to 9th central moments).
- Energies centered at 1xBPFO, 1xBPFI, 1xBSF for both the raw signal and the envelope.
- PMMs for both the raw signal and the envelope.
- 16 WPNs (wavelet packet nodes).
- 6 IMF energy entropies.
The results in Fig. 6 show the classification accuracy of the second test. As shown in Fig. 6a, HFS_LCC reached the highest classification accuracy of 88.56% with the first 10 features, while HFS_DB and HFS_SU achieved their highest classification accuracies of 87.29% and 85.17% with the first 8 and 11 features, respectively. Fig. 6b shows that, compared to the PCA-based feature selection method (highest accuracy 86.02%), HFS_LCC and HFS_DB outperformed PCA with respect to higher accuracy with the same number of features or fewer features. In comparison with FRMV_KM (as shown in Fig. 6c; highest accuracy 83.90%), the HFS group showed apparently better classification accuracy with fewer features. As shown in Figs. 6d and 6e, the features selected by SFFS and fosmod resulted in accuracies of 84.75% and 85.17%, which were worse than the single feature selected by HFS_DB. Compared to FFC (as shown in Fig. 6f), HFS_LCC showed better performance, since 86.86% accuracy was reached by FFC with 6 features selected.
From the results of the two tests, it can be concluded that the proposed HFS is robust and effective in selecting the most representative features, which maximizes the unsupervised classification performance. It should also be noted that, for both tests, FRMV_KM and HFS_SU shared the same evaluation criterion, but
the decision provided by HFS_SU was always better than that of FRMV_KM, which indicates that the proposed HFS scheme is superior to FRMV under the same evaluation criterion.

Fig. 5c. Comparison of the classification accuracy of HFS_LCC, HFS_SU, HFS_DB and FRMV_KM.

Fig. 5d. Comparison of the classification accuracy of HFS_LCC, HFS_SU, HFS_DB and SFFS.

Fig. 5e. Comparison of the classification accuracy of HFS_LCC, HFS_SU, HFS_DB and fosmod.

Fig. 5f. Comparison of the classification accuracy of HFS_LCC, HFS_SU, HFS_DB and FFC.

Fig. 6a. Comparison of the classification accuracy of HFS_LCC, HFS_SU and HFS_DB.

Fig. 6b. Comparison of the classification accuracy of HFS_LCC, HFS_SU, HFS_DB and the PCA-based method.

Fig. 6c. Comparison of the classification accuracy of HFS_LCC, HFS_SU, HFS_DB and FRMV_KM.

Fig. 6d. Comparison of the classification accuracy of HFS_LCC, HFS_SU, HFS_DB and SFFS.

Fig. 6e. Comparison of the classification accuracy of HFS_LCC, HFS_SU, HFS_DB and fosmod.

Fig. 6f. Comparison of the classification accuracy of HFS_LCC, HFS_SU, HFS_DB and FFC.
Besides, it is worth noting that in both tests the proposed HFS, based on the three evaluation criteria (SU, LCC and the DB index), generated results with slight differences. This suggests that the effectiveness of the features selected by the proposed HFS depends on the applied evaluation criterion; LCC was found more appropriate for both cases. Nonetheless, it is still appropriate to conclude that the overall performance based on the features selected by HFS was better than that of the other five methods.
4. Conclusion
This paper presented a hybrid unsupervised feature selection (HFS) approach to select the most representative features for unsupervised learning, and used two experimental bearing data sets to demonstrate the effectiveness of HFS. The performance of the HFS approach was compared with five other feature selection methods with respect to the accuracy improvement of the unsupervised learning algorithm SOM. The results showed that the proposed model could (a) identify the features that are relevant to the bearing defects, and (b) maximize the performance of unsupervised learning models with fewer features. Moreover, the results suggested that HFS relies on the evaluation criterion chosen for a given application. Therefore, further research will focus on expanding HFS to broader applications and to online machinery defect diagnostics and prognostics.
Acknowledgement
The authors gratefully acknowledge the support of the 863 Program (No. 50821003), PR China, for this work.
References
Blum, A. L., & Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, 12, 245–271.
Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 224–227.
Dash, M., & Koot, P. W. (2009). Feature selection for clustering. In Encyclopedia of database systems (pp. 1119–1125).
Dash, M., & Liu, H. (1997). Feature selection for classification. Intelligent Data Analysis, 14, 131–156.
Dietrich, C., Palm, G., & Schwenker, F. (2003). Decision templates for the classification of bioacoustic time series. Information Fusion, 2, 101–109.
Frigui, H. (2008). Clustering: Algorithms and applications. In 2008 1st international workshops on image processing theory, tools and applications, IPTA 2008, Sousse.
Ginart, A., Barlas, I., & Goldin, J. (2007). Automated feature selection for embeddable prognostic and health monitoring (PHM) architectures. In AUTOTESTCON (Proceedings), Anaheim, CA (pp. 195–201).
Greene, D., Cunningham, P., & Mayer, R. (2008). Unsupervised learning and clustering. Lecture Notes in Applied and Computational Mechanics, 51–90.
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 1157–1182.
Hong, Y., Kwong, S., & Chang, Y. (2008a). Consensus unsupervised feature ranking from multiple views. Pattern Recognition Letters, 5, 595–602.
Hong, Y., Kwong, S., & Chang, Y. (2008b). Unsupervised feature selection using clustering ensembles and population based incremental learning algorithm. Pattern Recognition, 9, 2742–2756.
Huang, J., Cai, Y., & Xu, X. (2007a). A hybrid genetic algorithm for feature selection wrapper based on mutual information. Pattern Recognition Letters, 13, 1825–1844.
Huang, R., Xi, L., & Li, X. (2007b). Residual life predictions for ball bearings based on self-organizing map and back propagation neural network methods. Mechanical Systems and Signal Processing, 1, 193–207.
Jack, L. B., & Nandi, A. K. (2002). Fault detection using support vector machines and artificial neural networks, augmented by genetic algorithms. Mechanical Systems and Signal Processing, 23, 373–390.
Jain, A. K., Duin, R. P. W., & Mao, J. (2000). Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 4–37.
Jardine, A. K. S., Lin, D., & Banjevic, D. (2006). A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mechanical Systems and Signal Processing, 7, 1483–1510.
Kwak, N., & Choi, C. H. (2002). Input feature selection for classification problems. IEEE Transactions on Neural Networks, 1, 143–159.
Lei, Y. G., He, Z. J., & Zi, Y. Y. (2008). A new approach to intelligent fault diagnosis of rotating machinery. Expert Systems with Applications, 4, 1593–1600.
Li, G., Hu, X., Shen, X., et al. (2008). A novel unsupervised feature selection method for bioinformatics data sets through feature clustering. In IEEE international conference on granular computing, GRC 2008, Hangzhou (pp. 41–47).
Li, Y., Dong, M., & Hua, J. (2008). Localized feature selection for clustering. Pattern Recognition Letters, 10–18.
Liao, L., & Lee, J. (2009). A novel method for machine performance degradation assessment based on fixed cycle features test. Journal of Sound and Vibration, 326, 894–908.
Liu, X., Ma, L., Zhang, S., & Mathew, J. (2006). Feature group optimisation for machinery fault diagnosis based on fuzzy measures. Australian Journal of Mechanical Engineering, 2, 191–197.
Liu, H., & Yu, L. (2005). Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 4, 491–502.
Malhi, A., & Gao, R. X. (2004). PCA-based feature selection scheme for machine defect classification. IEEE Transactions on Instrumentation and Measurement, 6, 1517–1525.
Mitra, P., Murthy, C. A., & Pal, S. K. (2002). Unsupervised feature selection using feature similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3, 301–312.
Oduntan, I. O., Toulouse, M., & Baumgartner, R. (2008). A multilevel tabu search algorithm for the feature selection problem in biomedical data. Computers & Mathematics with Applications, 5, 1019–1033.
Patil, M. S., Mathew, J., & RajendraKumar, P. K. (2008). Bearing signature analysis as a medium for fault detection: A review. Journal of Tribology, 1.
Peng, Z. K., & Chu, F. L. (2004). Application of the wavelet transform in machine condition monitoring and fault diagnostics: A review with bibliography. Mechanical Systems and Signal Processing, 2, 199–221.
Samanta, B., Al-Balushi, K. R., & Al-Araimi, S. A. (2003). Artificial neural networks and support vector machines with genetic algorithm for bearing fault detection. Engineering Applications of Artificial Intelligence, 7–8, 657–665.
Samanta, B., & Nataraj, C. (2009). Use of particle swarm optimization for machinery fault detection. Engineering Applications of Artificial Intelligence, 2, 308–316.
Shao, Y., & Nezu, K. (2000). Prognosis of remaining bearing life using neural networks. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, 3, 217–230.
Sugumaran, V., & Ramachandran, K. I. (2007). Automatic rule learning using decision tree for fuzzy classifier in fault diagnosis of roller bearing. Mechanical Systems and Signal Processing, 5, 2237–2247.
Wei, H. L., & Billings, S. A. (2007). Feature subset selection and ranking for data dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 162–166.
Xu, Z., Xuan, J., Shi, T., & Wu, B. (2009). Application of a modified fuzzy ARTMAP with feature-weight learning for the fault diagnosis of bearing. Expert Systems with Applications, 6, 9961–9968.
Yen, G. G. (2000). Wavelet packet feature extraction for vibration monitoring. IEEE Transactions on Industrial Electronics, 3, 650–667.
Yu, Y., Yu, D., & Cheng, J. (2006). A roller bearing fault diagnosis method based on EMD energy entropy and ANN. Journal of Sound and Vibration, 12, 269–277.
Yu, L., & Liu, H. (2004). Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research, 1205–1224.
Zhang, H., & Sun, G. (2002). Feature selection using tabu search method. Pattern Recognition, 35, 701–711.