An Early Fault Detection Method of Rotating Machines Based On Unsupervised Sequence Segmentation Convolutional Neural Network

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 71, 2022 3504712

An Early Fault Detection Method of Rotating Machines Based on Unsupervised Sequence Segmentation Convolutional Neural Network

Wenbin Song, Weiming Shen, Fellow, IEEE, Liang Gao, Senior Member, IEEE, and Xinyu Li, Member, IEEE

Abstract: Early fault detection (EFD) is vital for mechanical systems to reduce downtime and increase stability. The main challenge of EFD for rotating machines is to extract discriminative features from noisy signals to identify early faults. However, the lack of labels for the whole lifecycle data hinders the application of some powerful supervised deep learning methods in EFD. Besides, many EFD methods have to set a criterion manually, such as a threshold, to judge whether an early fault has occurred. To address these challenges, this article proposes a novel EFD method based on unsupervised sequence segmentation convolutional neural network (USSCNN). At first, frequency-domain features are extracted from raw signals and converted to 2-D gray images. Then, historical lifecycle data are labeled by USSCNN so that a CNN classifier can be trained with these labeled data. The deep features of the historical data learned by the CNN classifier are utilized to train the health index (HI) assessment model. The proposed method is tested on three bearing datasets. The results show that the proposed method can detect incipient faults earlier than the compared methods with fewer false alarms. Also, the HIs learned by the HI assessment model show that the proposed method can extract discriminative features for EFD. More importantly, the proposed method can detect an early fault by the well-trained classifier, which avoids manual criterion-making. Results of comparison demonstrate the effectiveness and the robustness of the proposed method.

Index Terms: Convolutional neural network (CNN), early fault detection (EFD), simulated annealing (SA) algorithm.

Manuscript received August 26, 2021; revised November 14, 2021; accepted November 21, 2021. Date of publication December 6, 2021; date of current version March 2, 2022. This work was supported in part by the National Key Research and Development Program of China under Grant 2019YFB1704600, in part by the Program for HUST Academic Frontier Youth Team under Grant 2017QYTD04, and in part by the Huazhong University of Science and Technology under Grant 2021GCRC58. The Associate Editor coordinating the review process was Dr. Datong Liu. (Corresponding author: Weiming Shen.)

The authors are with the State Key Laboratory of Digital Manufacturing Equipment and Technology, Department of Industrial and Manufacturing Systems Engineering, School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).

Digital Object Identifier 10.1109/TIM.2021.3132989

1557-9662 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: Southeast University. Downloaded on September 30,2024 at 09:38:01 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

ROTATING machines are basic components of mechanical systems and have been widely used in intelligent manufacturing [1]. Faults of rotating machines can lead to a breakdown of the entire mechanical system and possibly cause financial losses and casualties [2]. Therefore, it is necessary to develop an early fault detection (EFD) method for rotating machines to conduct predictive maintenance, which can ensure the reliability and stability of rotating machines.

EFD is a kind of condition-based fault diagnosis method. The main purpose is to extract a health index (HI) from collected data to indicate the degradation level of the equipment and recognize the weak changes of HIs to detect early faults. There are two objectives for EFD. One is to identify early faults as early as possible, and the other is to extract the HI to represent the trend of the equipment degradation process. EFD methods aim at mining sensitive features that reflect the condition of degradation from online/offline monitoring data [3]. The degradation process of bearings is generally divided into three stages [4]. The first stage is a stable operation period when rotating machines operate normally. The second stage is a rapid degradation period from the appearance of an early fault to the occurrence of the severe fault. The last stage is an accelerated degradation period from the occurrence of the severe fault to the complete failure. The decrease of the performance in the last two stages is much faster than in the first stage. Alarming faults earlier can provide sufficient time to take maintenance measures in advance, so as to reduce maintenance costs and decrease risks of failures. To deal with this problem, numerous smart sensors are installed on machines to collect operation data, including vibration, displacement, and temperature. Among them, vibration signals are commonly used to detect early faults in rotating machines [5]. The most commonly used feature extraction methods for signals can be categorized into three types: time-domain analysis, frequency-domain analysis, and time–frequency-domain analysis [6]. After feature extraction, an important step of EFD is to construct HI from extracted features. The appearance of an early fault can be defined as the time when the HI begins to deviate from the normal condition. The HI is nearly invariant when rotating machines operate normally. When an early fault occurs, the equipment state degrades rapidly, which causes a sharp change of the HI.

There have been a lot of research efforts on EFD. Cheng et al. [7] proposed a new online health degradation monitoring method of rolling bearings based on growing self-organizing mapping (GSOM) and clustered support vector machine (CSVM). At first, five types of features are extracted to reflect the health degradation process. Then, multiple GSOMs are conducted to adaptively combine extracted features as an HI. Finally, the CSVM is utilized to detect early faults. Li et al. [4] conducted symbolic dynamic filtering (SDF) to extract fault features at first and then applied intrinsic characteristic-scale decomposition (ICD) to recognize early faults. Hong et al. [8] conducted spectral kurtosis (SK) to extract features and then utilized the Gaussian mixture model to detect early faults. Mao et al. [9] proposed a semisupervised architecture to detect early faults. First, deep features are extracted from the Hilbert–Huang transform (HHT) marginal spectrum of raw signals by stacked denoising autoencoder (SDAE). Then, the safe semisupervised support vector machine (S4VM) is utilized to identify the state of the target bearing based on the sequentially arrived monitoring data.

However, most of the existing approaches need to design original features manually, which is not practical in industrial applications [6]. First, manually selected features rely heavily on expert knowledge, which is not always available for mechanical systems. Besides, the extracted features in collected signals need to be further mapped to degradation levels, but most of them are distorted or indifferent to the HI [10]. Another solution of EFD is to conduct anomaly detection. Some anomaly detection-based EFD methods conducted EFD directly by comparing the value of HI with a predefined threshold. One way to construct the HI of these methods is to utilize features such as root mean square (RMS) [11] and kurtosis [4] to serve as HI. Others construct HI based on the similarities of excavated features between the current moment and the beginning normal moment. Commonly used similarity metrics include the Pearson correlation coefficient [12] and cosine distance [13]. However, degradation processes of rotating machines vary from one to another due to different operation conditions, rotating speeds, operating loads, environment temperatures, and so on [14]. It is hard to set a generic threshold to recognize early faults of rotating machines under different operation conditions. Other anomaly detection-based EFD methods applied anomaly detection algorithms, such as local outlier factor (LOF), support vector data description (SVDD), and one-class support vector machine (SVM), which are trained by normal data only. Although these methods can develop models with normal data only, which are easier to access in mechanical systems, they still lack an effective fault alarm threshold that is important to monitor such a small deviation [9].

Recently, deep learning (DL) methods have been applied in mechanical systems and have achieved good performance, especially in the field of fault diagnosis. The DL models for fault diagnosis include convolutional neural network (CNN) [15], [16] and long short-term memory (LSTM) [17]. Wen et al. [18] converted 1-D signals to 2-D images and utilized CNN to detect faults of bearings. Xu et al. [19] proposed a deep transfer convolutional neural network (TCNN) framework and applied it in fault diagnosis. Luo et al. [6] conducted a DL model to automatically select impulse responses from vibration signals and then extracted dynamic properties from impulse responses to detect early faults. Lu et al. [20] constructed a deep neural network (DNN) to extract features and utilized LSTM to detect early faults. Apart from these well-known DL models, deep transfer learning (DTL) has been well studied in fault diagnosis for its low dependence on target data [21], [22]. DTL-based fault diagnosis methods can train a model with data from the source domain and transfer the model to the target domain to detect faults [23], [24]. The data from these two domains can be collected from different working conditions, different types of faults, different locations, and different machines [25]. In order to deal with the issue of lacking labels in the target-domain data, some researchers proposed unsupervised DTL-based fault diagnosis methods, which only utilize training data from a source domain to predict unlabeled target-domain data. Qin et al. [26] proposed a multiscale transfer voting mechanism (MSTVM), which can detect faults without labels in the target domain. The proposed method can extract multiscale transfer features and utilize multiple top classifiers to achieve multiscale classification. Zhao et al. [25] conducted several typical unsupervised DTL methods on five open fault diagnosis datasets and proved the effectiveness of the unsupervised DTL methods.

However, most of the DL-based models have to set a threshold. This limits the application of DL methods because the threshold varies when a well-trained model is used to detect early faults under different operation conditions. Besides, the whole lifecycle data usually do not have enough labels, but most of these methods require labeled data to train their models. As for the unsupervised DTL methods, although target-domain labeled data are not required in the training process, the source-domain labeled data are still needed to train classification models. However, in the field of EFD, the whole lifecycle training data are unlabeled as well. Therefore, this article proposes an EFD method of rotating machines based on unsupervised sequence segmentation convolutional neural network (USSCNN), which can label the historical lifecycle data and train a CNN model with labeled data. The contributions of this article are summarized as follows.

1) A DL structure USSCNN based on LeNet-5 is proposed in this article. The USSCNN converts the classification task into a sequence segmentation problem and conducts simulated annealing (SA) to search for the optimal segmentation. Different from most existing methods, the USSCNN can classify historical unlabeled data without manually designed features and classification criteria. Besides, in order to address the problem of class imbalance, an oversampling strategy is conducted to balance the number of positive and negative samples.

2) Unlike other existing EFD methods, the proposed method can identify the health state of new arrival data without a predefined threshold through the implementation of a well-trained CNN classifier, which is more generic and practical in real applications. Besides, an early fault alarm criterion is utilized to improve the reliability of prediction results.

3) An HI assessment model is developed based on principal component analysis (PCA), which can extract an

HI to represent the degradation process of rotating machines.

The rest of this article is organized as follows. Section II presents the preliminaries. Section III introduces the proposed method for EFD. Section IV illustrates the details and results of the experiment cases. Finally, Section VI concludes this article with a discussion of future work.

II. PRELIMINARIES

A. Simulated Annealing

SA is a global optimization method [27]. It comes from annealing in metallurgy. Annealing involves heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. At the beginning, the atoms in the material stay at positions where the internal energy is a local minimum. If the energy is increased by heating, the atoms leave their original positions and move randomly to other positions. A lower cooling speed makes it possible for more atoms to find optimal positions with lower internal energy than before.

SA decreases the temperature slowly and, at each temperature, searches for a best state in the solution space based on the current state. If the fitness value of a new state is better than that of the current state, SA lets the new state replace the current state. On the contrary, if the fitness value of a new state is worse than that of the current state, SA uses an acceptance function to decide whether to replace the current state or not. The flowchart of SA is presented in Fig. 1.

Fig. 1. Flowchart of SA.

The temperature cooling method can be expressed as follows:

$$T = T \times e^{-c_1 \times \mathrm{iter}^{c_2}} \tag{1}$$

where $c_1$ and $c_2$ are positive integers and iter is the iteration number.

B. 1-D-to-2-D Transformation Method

Data preprocessing is vital for data-driven fault diagnosis methods since the extracted features will influence the performance of fault detection. Incipient faults of rotating machines always cause a variation in the frequency domain. Therefore, we utilize the fast Fourier transform (FFT) to extract frequency-domain features and detect early faults. CNN is a powerful DL method, which can extract representative features from data automatically. In order to take advantage of the learning ability of CNN and explore 2-D features in the frequency domain, a 1-D-to-2-D transformation method [18] is applied in this article.

Fig. 2. 1-D-to-2-D transformation method.

As shown in Fig. 2, in order to obtain an image of size $M \times M$, a segment signal with the length $2 \times M^2$ is obtained from raw signals with a certain time interval. Then, FFT is conducted to extract frequency-domain amplitudes with the length of $M^2$ from the segment time-domain signal. Let $L(i)$, $i = 1, \ldots, M^2$, denote the values of the frequency-domain amplitudes of the segment signal. $P(j, k)$, $j, k = 1, \ldots, M$, denotes the pixel strength of the image, as shown in the following equation [18]:

$$P(j, k) = \mathrm{round}\left\{ \frac{L((j - 1) \times M + k) - \min(L)}{\max(L) - \min(L)} \times 255 \right\}. \tag{2}$$
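The mapping in (2) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `signal_to_gray_image`, the use of the first $M^2$ bins of a one-sided FFT magnitude spectrum, and the handling of a constant spectrum are assumptions made for the example.

```python
import numpy as np


def signal_to_gray_image(segment, m=32):
    """Convert a 1-D segment of length 2*m*m into an m-by-m gray image, following Eq. (2)."""
    segment = np.asarray(segment, dtype=float)
    if segment.size != 2 * m * m:
        raise ValueError("segment must contain exactly 2*m*m points")
    # Assumption: keep the first m*m FFT magnitudes (one side of the spectrum).
    amplitudes = np.abs(np.fft.fft(segment))[: m * m]
    lo, hi = amplitudes.min(), amplitudes.max()
    if hi == lo:
        # Degenerate case (flat spectrum): return a black image instead of dividing by zero.
        return np.zeros((m, m), dtype=np.uint8)
    scaled = (amplitudes - lo) / (hi - lo) * 255.0
    # Row-major reshape places L((j-1)*M + k) at pixel (j, k), as in Eq. (2).
    return np.rint(scaled).astype(np.uint8).reshape(m, m)
```

For a 2048-point sample, `signal_to_gray_image(x)` returns a 32 × 32 array of type `uint8`, which matches the image size used later in the article.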

The function round{·} is the rounding function, which normalizes every pixel value to the range of 0 to 255, i.e., the pixel strength of the gray image. max(L) and min(L) are the maximum and minimum amplitudes of the extracted frequency-domain features, respectively.

III. PROPOSED METHOD

The whole lifecycle data usually lack labels. Therefore, we propose a novel EFD method based on USSCNN. The 1-D raw signals are first transferred to 2-D images. Subsequently, USSCNN is applied to label the historical data and obtain a well-trained CNN early fault classifier. At the same time, an HI assessment model is trained based on the deep features of historical data learned by USSCNN. Finally, the state and the HI of equipment can be obtained online from the new monitoring signal.

A. Framework of the Proposed Method

As shown in Fig. 3, the proposed EFD method contains three main steps, i.e., data processing, off-line training, and online predicting.

Fig. 3. Framework of the proposed EFD method.

In general, vibration signals for EFD are collected by accelerometers with a certain time interval in consideration of the capability of hardware for data transfer and storage. In the data processing procedure, historical lifecycle signals $S$ contain a set of discrete snapshots $[s_{t_1}, \ldots, s_{t_i}, \ldots, s_{t_N}]$ collected with the same time interval, where $t_i$ denotes the sampling time, $t_i - t_{i-1} = t_{i-1} - t_{i-2}$, $i = 3, \ldots, N$. $N$ is the number of snapshots. Then, 2048 consecutive points are randomly selected from individual snapshots to generate a sample $x_{t_i}$. Subsequently, frequency-domain features are extracted from those samples by FFT and converted to 2-D images $[f_{t_1}, \ldots, f_{t_i}, \ldots, f_{t_N}]$ with size 32 × 32 as described in Section II-B.

In the training process, after obtaining 2-D images, the proposed USSCNN is utilized to divide those time-sorted images into two classes: normal conditions and faults. A CNN classifier is obtained based on the labeled historical data. In order to deal with the problem of class imbalance, an oversampling strategy is conducted to balance the number of positive and negative samples in the training process. Besides, the deep state features can be extracted by the well-trained CNN classifier. The deep features of historical data are used to learn an HI representation mapping, which can be utilized to extract HI from new data.

In the online predicting stage, the newly collected signal is first converted to a 2-D gray image. Then, the processed image is fed into the well-trained CNN model to extract deep features and recognize the equipment condition. Finally, the deep features of the new data are used to construct HI based on the HI assessment model.

B. Unsupervised Sequence Segmentation Convolutional Neural Network

The main challenge of EFD is the lack of labels of the whole lifecycle data. However, the whole lifecycle data are in time sequence, and the degradation process is monotonic. Usually, raw signals are collected at intervals throughout the whole lifecycle, represented as $[s_{t_1}, \ldots, s_{t_i}, \ldots, s_{t_N}]$. In each snapshot, a window with the length of 2048 is used to choose the sample $x_{t_i}$. Then, each sample $x_{t_i}$ is converted to a 2-D image $f_{t_i}$. As shown in Fig. 4, the moment $t^*$ at which an early fault occurs divides the sequence data into two subsets, including the normal subset and the fault subset. The image obtained from the signal before collecting time $t^*$ is the normal state, and the image converted from the signal after $t^*$ is the fault state. The objective is to find the optimal segmentation $t^*$ of the whole lifecycle data. The optimal segmentation must satisfy two conditions. First, samples from the normal subset

and fault subset divided by the optimal segmentation have to be classified accurately. Second, the volume of the normal subset should be as small as possible, so as to ensure detecting early faults earlier. Therefore, the USSCNN is proposed to divide the unlabeled historical lifecycle data and train a CNN classifier for online EFD.

The process of USSCNN is presented in Fig. 4. It contains two main steps. The first step is to initialize a segmentation to define pseudo labels for the time sequence data. Second, the 2-D image samples with pseudo labels are utilized to train a CNN classifier.

Fig. 4. Process of USSCNN.

Let $F = \{f_{t_1}, \ldots, f_{t_N}\}$ represent $N$ images sorted by time sequence. Suppose that $t_e$ is the moment when an early fault occurs. The pseudo labels can be defined as follows:

$$l_{t_i} = \begin{cases} 0, & i < e \\ 1, & i \ge e \end{cases} \tag{3}$$

where $l_{t_i}$ represents the pseudo label of the sample $f_{t_i}$, $i = 1, \ldots, N$. The numbers of the normal data and fault data are $e - 1$ and $N - e + 1$, respectively. In general, the normal samples are more than fault samples. However, in some cases, the amount of normal samples is much greater than that of fault samples, which leads to the class-imbalance problem. The biased distribution of training data could lead the classifier to ignore the minority class entirely. An efficient approach to address this problem is to randomly duplicate examples in the minority class, which is called oversampling. If $e - 1 > N - e + 1$ or $e - 1 < N - e + 1$, the oversampling strategy is utilized to randomly select samples from the minority class until both classes contain the same number of samples.

Subsequently, a CNN classifier can be trained by the pseudo labels. In this article, a CNN model based on LeNet-5 is designed to identify the equipment state and extract the HI to represent the degradation trend. LeNet-5 [28] is a classical CNN model, which is proven to be effective and promising in image pattern recognition.

The size of images fed to the CNN model is 32 × 32. As shown in Fig. 4, the proposed CNN model contains two alternating convolutional and pooling layers, two FC layers, and an output layer. In order to prevent the dimension loss, the zero-padding method is applied in the CNN model. The loss function of the CNN classifier is defined as follows:

$$\mathrm{Loss}_{\mathrm{CNN}} = -\frac{1}{N} \sum_{i=1}^{N} \left[ l_{t_i} \times \log p_{t_i} + (1 - l_{t_i}) \times \log(1 - p_{t_i}) \right] \tag{4}$$

where $p_{t_i}$ represents the probability that the CNN classifier predicts the $i$th sample as fault.

The objective of USSCNN is to minimize the loss of the CNN classifier on the historical data with pseudo labels and the volume of the normal subset. The proportion of normal data defined by the pseudo labels in the historical data can be represented as $(t - 1)/N$. Therefore, the loss function of USSCNN is presented as follows:

$$\mathrm{Loss}_{\mathrm{USSCNN}}(t) = \alpha_1 \times \mathrm{Loss}_{\mathrm{CNN}} + \alpha_2 \times \frac{t - 1}{N} \tag{5}$$

where $t$ is the decision variable (the segmentation index, written $e$ in (3)), which indicates when an early fault occurred in the historical data. $\mathrm{Loss}_{\mathrm{CNN}}$ denotes the loss function of the CNN classifier based on the segmentation of data with $t$. $N$ represents the number of samples. The second loss $(t - 1)/N$ is the proportion of normal data. Minimizing the second loss ensures detecting early faults in the training data as early as possible. $\alpha_1$ and $\alpha_2$ are constant weights, which are utilized to adjust the influence of the two losses on the objective function. The SA algorithm is conducted to search for the optimal segmentation time $t$ of the whole lifecycle data in this article. The pseudocode of USSCNN is shown in Algorithm 1.

Algorithm 1 USSCNN
Input: Features extracted from historical data: $F = \{f_{t_1}, \ldots, f_{t_N}\}$, Number of iterations $K$.
Output: Optimal $t^*$, CNN model: $M_C$.
1: Initialization: $k = 1$, $t^* = t_e^{k-1}$
2: repeat
3: Update the pseudo labels of the historical data $F$ based on Equation (3)
4: Train model $M_C$ with labeled historical data $F$
5: Calculate the loss function based on Equations (4) and (5)
6: Update $t_e^k$ based on SA
7: $k = k + 1$
8: until $k = K$

C. Construction of HI

The core point of EFD is to construct HI from extracted features. The early fault starting point is defined as the time when the HI begins to deviate from the normal condition.
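Before the HI is constructed, the segmentation search of Section III-B (Algorithm 1) can be illustrated with a short Python sketch that applies SA to the objective in (5) on synthetic one-dimensional features. This is only a sketch under stated assumptions: the stand-in classifier (a logistic score around the midpoint of the two pseudo-class means), the geometric cooling rule, the Metropolis-style acceptance test, and all function names and default values are hypothetical choices for this example, whereas the article trains a LeNet-5-based CNN and uses the cooling rule in (1).

```python
import math
import random


def pseudo_labels(n, e):
    """Eq. (3): samples 1..e-1 are normal (0); samples e..n are fault (1)."""
    return [0 if i < e else 1 for i in range(1, n + 1)]


def stand_in_loss(features, labels):
    """Cross-entropy of Eq. (4) for a toy classifier (assumption: not a CNN)."""
    normal = [x for x, l in zip(features, labels) if l == 0]
    fault = [x for x, l in zip(features, labels) if l == 1]
    m0, m1 = sum(normal) / len(normal), sum(fault) / len(fault)
    mid, slope = (m0 + m1) / 2.0, 10.0 * (m1 - m0)
    total = 0.0
    for x, l in zip(features, labels):
        z = max(-50.0, min(50.0, slope * (x - mid)))  # clamp to avoid overflow
        p = 1.0 / (1.0 + math.exp(-z))                # predicted fault probability
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total += l * math.log(p) + (1 - l) * math.log(1.0 - p)
    return -total / len(features)


def objective(features, e, alpha1=1.0, alpha2=0.1):
    """Eq. (5): weighted classifier loss plus the proportion of normal data."""
    n = len(features)
    return alpha1 * stand_in_loss(features, pseudo_labels(n, e)) + alpha2 * (e - 1) / n


def anneal_segmentation(features, iterations=300, t_start=1.0, t_stop=1e-3, seed=0):
    """Search the segmentation index e in [2, N] with a basic SA loop."""
    rng = random.Random(seed)
    n = len(features)
    current = rng.randint(2, n)  # e >= 2 keeps both subsets non-empty
    current_val = objective(features, current)
    best, best_val = current, current_val
    temp = t_start
    for _ in range(iterations):
        step = rng.randint(-max(1, n // 10), max(1, n // 10))
        cand = min(n, max(2, current + step))
        cand_val = objective(features, cand)
        delta = cand_val - current_val
        # Always accept improvements; accept worse states with a temperature-dependent probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_val = cand, cand_val
        if current_val < best_val:
            best, best_val = current, current_val
        temp = max(t_stop, temp * 0.97)  # assumed geometric cooling, not Eq. (1)
    return best, best_val
```

On a synthetic sequence such as `[0.0] * 60 + [1.0] * 40`, the objective is lowest at `e = 61`, the first sample of the shifted regime, because an earlier split mislabels normal samples as faults and a later split both mislabels faults and enlarges the normal subset.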

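The oversampling step described in Section III-B can likewise be sketched as random duplication of minority-class samples until both pseudo classes have the same size. The helper names `pseudo_labels` and `oversample_minority` and the fixed random seed are assumptions for illustration only.

```python
import random


def pseudo_labels(n, e):
    """Eq. (3): label 0 for i < e and label 1 for i >= e, with i = 1..n."""
    return [0 if i < e else 1 for i in range(1, n + 1)]


def oversample_minority(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until both classes are equal in size."""
    rng = random.Random(seed)
    idx0 = [i for i, l in enumerate(labels) if l == 0]
    idx1 = [i for i, l in enumerate(labels) if l == 1]
    if not idx0 or not idx1:
        # Nothing to balance when one class is empty.
        return list(samples), list(labels)
    minority, majority = (idx0, idx1) if len(idx0) < len(idx1) else (idx1, idx0)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = list(range(len(labels))) + extra
    return [samples[i] for i in keep], [labels[i] for i in keep]
```

With `N = 10` and `e = 8`, the pseudo labels give 7 normal and 3 fault samples, and the balanced set contains 7 samples of each class.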
The USSCNN can obtain a well-trained CNN classifier based on historical data. The well-trained CNN classifier is able to extract discriminative features from historical data, so as to conduct EFD. Therefore, an unsupervised HI assessment model based on PCA is proposed to compress the discriminative features to a 1-D HI.

Let $D^h = [d^h_{t_1}, \ldots, d^h_{t_N}]$ denote the deep features of historical data extracted by the well-trained CNN and $D^{\mathrm{new}} = [d^{\mathrm{new}}_{t_1}, \ldots, d^{\mathrm{new}}_{t_n}]$ denote the deep features of new monitoring data. At first, PCA is conducted on $D^h$ to calculate the eigenvector $V$ corresponding to the largest eigenvalue. Then, the HI of the new data can be calculated as follows:

$$h^{\mathrm{new}}_{t_i} = V^T d^{\mathrm{new}}_{t_i} \tag{6}$$

where $h^{\mathrm{new}}_{t_i}$ is the HI of the new monitoring sample at time $t_i$ and $V^T$ is the transpose of $V$.

D. Early Fault Alarm Criterion

The early fault is detected based on the well-trained CNN classifier. The proposed method can identify whether the state of a new sample is normal or abnormal. However, isolated anomalies may occasionally appear while rotating machines operate normally. If a signal collected at time $t_i$ is detected to be abnormal and the signals after $t_i$ are judged as normal, then $t_i$ cannot be seen as the early fault starting time. An early fault must manifest as abnormal samples appearing continuously. Therefore, a robust strategy for early fault alarm [9] is utilized to improve the reliability of the proposed method. The strategy has to set two parameters: the length of time window $k$ and the threshold $p$. As shown in Fig. 5, in order to determine whether an early fault has occurred at time $t_i$, the CNN classifier will detect the states of multiple successive samples with a length of $k$. Let $x_{t_i}, x_{t_{i+1}}, \ldots, x_{t_{i+k-1}}$ denote the successive samples, where $x_{t_i}$ denotes the sample at time $t_i$. Let $p_i$ denote the state of the sample $x_{t_i}$. If the state is normal, $p_i = 0$; otherwise, $p_i = 1$. If $\sum_{j=i}^{i+k-1} p_j \ge p$, an early fault is considered to have occurred.

Fig. 5. Strategy of early fault alarm.

The values of $k$ and $p$ are set according to the actual requirements. Large values of $k$ and $p$ will improve the reliability of detection results. In this article, $k$ and $p$ are set to be 10 and 6, respectively.

IV. EXPERIMENTS

In order to evaluate its performance, the proposed method is tested on three frequently used run-to-failure bearing datasets, i.e., PHM 2012, IMS, and XJD datasets.

A. Datasets

1) PHM2012: The PHM 2012 dataset is provided by the FEMTO-ST Institute on the PRONOSTIA platform [29]. Fig. 6 presents the details of the PRONOSTIA platform. There are three parts of the experimental platform, including a rotating part, a load part, and a data collection part. The rotating part drives the bearing by a motor with a power of 250 W. The load part is used to provide an extra load on the bearings to accelerate the degradation. In the data collection part, the signals are collected by accelerometer sensors of type DYTRAN 3035B in the horizontal and vertical directions. The sampling frequency is 25.6 kHz.

Fig. 6. PRONOSTIA platform for rolling element bearings experiments.

In total, 17 run-to-failure accelerated degradation experiments of bearings are conducted in three different operation conditions as follows.

1) Speed: 1800 r/min and load: 4000 N.
2) Speed: 1650 r/min and load: 4200 N.
3) Speed: 1500 r/min and load: 5000 N.

The 10-Hz signals are collected every 10 s. Therefore, the time interval in the PHM 2012 dataset is set to be 10 s. Besides, Bearing 3 under the first condition is first chosen as training data, and Bearing 1 under the same condition serves as testing data. Conversely, Bearing 3 is tested with the model trained by Bearing 1 under the first condition. The bearings are denoted as PHM1_1 and PHM1_3 for short.

2) Intelligent Maintenance System: The experimental dataset is acquired from the Intelligent Maintenance System (IMS) Laboratory of the University of Cincinnati [30]. There are four bearings installed on the shaft of the bearing test rig, which is driven by an ac motor (shown in Fig. 7). The rotation speed is kept at 2000 r/min. A radial load of 6000 lbs is applied to the shaft and bearings by a spring mechanism. An oil circulation system measures the flow and the temperature of the lubricant. Besides, a magnetic plug installed in the oil feedback pipe collects the debris from the oil, which serves as a metric of the bearing system degradation. The system stops when the accumulated debris adhered to the magnetic plug exceeds a threshold. There are three degradation experiments in total. Signal data are collected by accelerometers of type PCB 353B33 High Sensitivity Quartz ICP, which are installed on each bearing housing. The data

sampling rate is 20 kHz and each vibration signal snapshot contains 20 480 points. At the end of the first run-to-failure experiment, an inner race defect occurred in Bearing 3 and a roller element defect occurred in Bearing 4. Also, outer race faults occurred in Bearing 1 at the end of the second experiment and Bearing 3 at the end of the third experiment. As the vibration data are recorded every 10 min, the time interval of the IMS dataset is 10 min. Besides, Bearing 1 of the second experiment is selected as testing data, and Bearing 3 of the third experiment is chosen as training data. They are denoted as IMS2_1 and IMS3_3.

Fig. 7. IMS platform for rolling element bearings experiment.

3) XJD: The bearing dataset is provided by the Institute of Design Science and Basic Component, Xi’an Jiaotong University (XJTU), and Changxing Sumyoung Technology Company Ltd. (SY) [31]. Fig. 8 shows the platform of the rolling element bearings experiment. In total, 15 run-to-failure bearings were collected by conducting accelerated degradation experiments. The signals are collected by two accelerometers of type PCB 352C33, which are installed on the horizontal axis and the vertical axis. The sampling frequency is 25.6 kHz. Every 1 min, a sample with two channels is recorded, including horizontal and vertical vibration signals. Each channel has 32 768 data points (i.e., 1.28 s). Besides, there are three different operation conditions of these bearings. Table I lists the details of bearings, including the operation conditions, whole lifetime, and fault elements [31].

Fig. 8. XJD platform for rolling element bearings experiment.

TABLE I
BEARINGS OF XJD BEARING DATASET

The time interval is set to be 1 min since the signal is collected every 1 min. In this article, Bearing1_1 is chosen as training data, and Bearing1_5 is selected as testing data. They are represented as XJD1_1 and XJD1_5.

Fig. 9. Converting images of PHM1_3.

B. Evaluation Indicators

The performance of the proposed method is evaluated by two indicators. The first one is the earliness of fault detection. The earlier the fault is identified, the better the method performance is. The second indicator is the number of false alarms, i.e., the number of misclassified samples before the recognized early fault. A smaller number of false alarms indicates a better performance of the method.

C. Comparative Methods

The proposed method is compared with 16 different EFD methods as follows:

1) bandwidth empirical mode decomposition (BEMD) + adaptive multiscale morphology analysis (AMMA);

TABLE II
H YPERPARAMETERS OF USSCNN IN E XPERIMENTS

2) RMS + correlation coefficient;

3) SDAE + one-class SVM;
4) RMS + one-class SVM;
5) Kurtosis + one-class SVM;
6) SDAE + SVDD;
7) RMS + SVDD;
8) Kurtosis + SVDD;
9) SDAE + LOF;
10) RMS + LOF;
11) Kurtosis + LOF;
12) SDAE + isolation forest (iForest);
13) RMS + iForest;
14) Kurtosis + iForest;
15) S4VM + second-order difference of radius margin bound
(SODRMB);
16) USSCNN (without oversampling);
17) USSCNN.
Fig. 10. EFD results of PHM1_1. (a) EFD result and the HI of PHM1_1.
(b) States of samples in PHM1_1.

Method 1 [32] is an early fault diagnosis method based
on the BEMD analysis and AMMA. Method 2 [33] detects
early faults by calculating the correlation coefficient of new
data with the first normal state sample. Methods 3–14 are the
combinations of four widely used anomaly detection algorithms
(one-class SVM, SVDD, LOF, and iForest) and three typical
feature extraction methods (SDAE, RMS, and Kurtosis).
Method 15 [9] is the state-of-the-art EFD method based on
safe S4VM and utilizes the SODRMB of SVM as the HI.
Moreover, in order to verify the effectiveness of oversampling,
Method 16 conducts USSCNN without oversampling to train
a CNN classifier.
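Several of the baselines above (Methods 2, 4, 5, 7, 8, 10, 11, 13, and 14) rest on simple scalar statistics: the correlation coefficient, RMS, or kurtosis. As a rough illustration only, and not the implementation used in the cited works, the sketch below computes these three quantities in plain Python. The function names, the choice of the Pearson (non-excess) kurtosis, and the toy signals are assumptions made for this example; whether Method 2 correlates raw signals or derived features is not specified here.

```python
import math


def rms(signal):
    """Root mean square of a 1-D vibration signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def kurtosis(signal):
    """Pearson kurtosis: fourth central moment divided by squared variance."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    fourth = sum((x - mean) ** 4 for x in signal) / n
    return fourth / (var ** 2)


def correlation(a, b):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


# Toy data (hypothetical): a reference "normal" sample and a new sample.
reference = [1.0, -1.0, 1.0, -1.0]
new_sample = [2.0, -2.0, 2.0, -2.0]

print(rms(reference))                      # RMS of the reference sample
print(kurtosis(reference))                 # kurtosis of the reference sample
print(correlation(reference, new_sample))  # similarity to the reference
```

In a baseline of this kind, each scalar would be computed per sample and then passed to an anomaly detector or compared against the first normal sample.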
D. Experiment Results
The hyperparameters of USSCNN in the experiments are
listed in Table II, including the stop and start temperatures of
SA, the maximum number of iterations of SA, and the weights
of the loss function.
1) PHM2012: In this case, the collected data have two
channels. Therefore, the input size of images is 2@32 × 32.
Taking PHM1_3 as an example, the images obtained by
transforming the raw signals are presented in Fig. 9. The images
contain samples whose remaining useful life (RUL) remains 100%,
50%, and 0%.
In order to improve the training speed of USSCNN, the
lower and upper boundaries are set based on the RMS. Let
rms_1 represent the smallest RMS of the first 20 samples in the
whole lifecycle data. The lower and upper boundaries can be
calculated by the following equations:

$$\mathrm{lower} = \max\,[\, i \mid \mathrm{rms}_i \ge (1 + \mathrm{threshold}_l) \times \mathrm{rms}_1 \,]$$
$$\mathrm{upper} = \min\,[\, j \mid \mathrm{rms}_j \le (1 + \mathrm{threshold}_u) \times \mathrm{rms}_1 \,] \tag{7}$$

where i, j = 2, ..., N, with N the number of samples, and
threshold_l and threshold_u are the thresholds of the lower and
upper boundaries, set to 0.01 and 10, respectively.
At first, we select PHM1_3 for training and use PHM1_1
for testing. The USSCNN is utilized to label the data of
PHM1_3 and train a CNN classifier. The prelabeling result
of PHM1_3 is 1286, which means that data collected after
the 1286th sample are all abnormal. The EFD result and the
HI of PHM1_1 are shown in Fig. 10(a). In order to smooth
the HI, a moving average is utilized. The early fault appears
at the 1395th sample, and there is no false alarm. The states
of samples predicted by the CNN classifier are presented in
Fig. 10(b). The prediction results show that after about the
1300th sample, anomalies appear more and more frequently.
Before the 1200th sample, several anomalies are detected by
the CNN classifier. This is mainly because the degradation
of rotating machines is a gradual process. When an early
fault is about to occur, the boundary between normal data
and anomalies is blurred. The early fault alarm strategy can
improve the reliability of the proposed method and decrease
the false alarms significantly.
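The HI smoothing mentioned above can be illustrated with a trailing moving average. This is only a sketch: the function name `moving_average`, the default window length, and the choice of a trailing (causal) window are assumptions, since the text does not state how the moving average is configured.

```python
def moving_average(values, window=5):
    """Trailing moving average; early points average over what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_sum = 0.0
    for k, v in enumerate(values):
        running_sum += v
        if k >= window:
            # Drop the value that has just left the window.
            running_sum -= values[k - window]
        count = min(k + 1, window)
        smoothed.append(running_sum / count)
    return smoothed


# Hypothetical noisy health index: roughly flat, then rising.
hi = [0.0, 0.2, 0.0, 0.2, 0.0, 1.0, 2.0, 3.0]
print(moving_average(hi, window=2))
```

A trailing window uses only past samples, which suits online monitoring; a centered window would smooth more symmetrically but needs future samples.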

TABLE III
COMPARATIVE RESULTS OF FOUR EXPERIMENTS

Second, PHM1_1 is used for training and PHM1_3 is
utilized for testing. The prelabeling result of PHM1_1 is 1350. The
EFD result and the HI of PHM1_3 are shown in Fig. 11(a). The
early fault appears at the 1261st sample, and there is no false
alarm as well. The prediction labels are shown in Fig. 11(b).
The results show that anomalies appear at around the 1150th
sample. The samples before the 1100th sample are all normal
and the samples after the 1261st sample are all abnormal. The
proposed method can capture the change of equipment state
from normal to abnormal, and the early fault alarm strategy
provides a robust criterion to determine whether an early fault
has appeared.
The results have shown that the proposed method can
detect early faults with high reliability. More importantly,
the extracted HIs of PHM1_1 and PHM1_3 begin to rise
sharply after the appearance of early faults, which proves that
the proposed HI assessment model can extract discriminative
features to indicate the degradation process.
2) IMS: In this case, the collected data have one channel.
Therefore, the input size of images is 1@32 × 32. Besides,
threshold_l is set to 0.008 and threshold_u is set to 3.
IMS3_3 is used for training, and the prelabeling result
of IMS3_3 is 5968. The EFD result and the HI of IMS2_1
are shown in Fig. 12(a). The early fault appears at the 531st
sample, and again there is no false alarm. The extracted
HI remains stable when the bearing operates normally and
starts to increase rapidly after an early fault occurs. The
prediction results are shown in Fig. 12(b). There is a clear
boundary between normal data and anomalies in the samples
of IMS2_1.
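Since the IMS samples are recorded every 10 min, a detection index can be read as an approximate elapsed test time. The small sketch below shows the arithmetic; the helper name `elapsed_hours` is hypothetical, and it assumes evenly spaced samples with time counted from the start of the test.

```python
def elapsed_hours(sample_index, interval_min):
    """Approximate elapsed time, in hours, at a 1-based sample index."""
    return sample_index * interval_min / 60.0


# IMS2_1: early fault reported at the 531st sample, 10 min between samples.
print(elapsed_hours(531, 10))  # 88.5
```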
3) XJD: In this case, the collected data have two channels.
Therefore, the input size of images is 2@32 × 32; threshold_l
is set to 0.01 and threshold_u is set to 10. USSCNN
is conducted on XJD1_1 to obtain a CNN classifier. The
prelabeling result of XJD1_1 is 79. It is worth noting that,
in order to compare with other EFD methods, the number of
samples in XJD1_5 is expanded from 52 to 1664 (52 × 32) by
dividing each snapshot into 32 discrete signals. The EFD results
and the HI of XJD1_5 are shown in Fig. 13(a). The early fault
appears at the 1085th sample, and again there is no false alarm.
The degradation process of the HI extracted from XJD1_5 is
similar to that of IMS2_1, which has an obvious boundary
between the normal condition and faults. The prediction results
are shown in Fig. 13(b). The boundary between normal data
and anomalies in the samples of XJD1_5 is also clearer than in
PHM1_1 and PHM1_3.

Fig. 11. EFD results of PHM1_3. (a) EFD result and the HI of PHM1_3.
(b) States of samples in PHM1_3.
Fig. 12. EFD results of IMS2_1. (a) EFD result and the HI of IMS2_1.
(b) States of samples in IMS2_1.
Fig. 13. EFD results of XJD1_5. (a) EFD result and the HI of XJD1_5.
(b) States of samples in XJD1_5.

E. Result Analyses and Discussion
The results of 17 methods under comparison are presented
in Table III. It should be noted that Method 1 conducts EFD
based on the fault characteristic frequency, and Method 2
calculates the correlation coefficient directly to detect early faults.
False alarms are therefore not a concern for these two methods,
and Table III only contains the false alarms of Methods
3–16 and the proposed method.
The proposed method produces no false alarms on the four test
bearings, which is the best result among all compared methods.
Compared with Methods 1 and 2, the proposed method detects
incipient faults earlier. Besides, the proposed method
outperforms the state-of-the-art Method 15 both
on the earliness of fault detection and the number of false
alarms. The results have shown that the proposed method can
detect incipient faults at an early stage precisely.
In Methods 3–14, Method 3 detects the incipient faults
57 and 77 samples earlier than the proposed method
on PHM1_3 and XJD1_5. However, the false alarms of
Method 3 on PHM1_3 and XJD1_5 are 74 and 68, respectively,
which are much greater than those of the proposed method.
It is obvious that Method 3 detects early faults with lower
reliability. Similarly, Method 6 detects incipient faults earlier
than the proposed method on PHM1_3 and XJD1_5 but
generates a greater number of false alarms. The EFD result
of Method 9 on PHM1_3 is slightly better than that of the
proposed method, but it generates more false alarms. The results
of Methods 3–14 on the remaining bearings are worse than those
of the proposed method both on the earliness of fault detection
and the number of false alarms. The results indicate that the
proposed method is more generic and reliable than Methods 3–14.
In order to verify the effectiveness of the oversampling
strategy of the proposed method, Method 16 conducts USSCNN
without oversampling on these bearings to train CNN
classifiers. The EFD results of Method 16 on PHM1_1 and
PHM1_3 are the same as those of the proposed method, but it
generates more false alarms than the proposed method. The results
of Method 16 on IMS2_1 and XJD1_5 are worse than the
proposed method, though it does not generate false alarms
either. The results indicate that the oversampling strategy can
improve the accuracy of the CNN classifiers and contributes
to extracting more sensitive features to detect incipient faults
earlier.

V. CONCLUSION
In this article, an EFD method of rotating machines based
on USSCNN is proposed. The proposed method converts the
1-D vibration signals to 2-D images and utilizes the historical
data to train a CNN classifier by USSCNN. In order to address
the problem of class imbalance, an oversampling strategy is
utilized in the training process of USSCNN. Moreover, an HI
assessment model based on PCA is proposed to extract an
HI from deep features learned by the CNN classifier. The
proposed method is tested on four bearings in the PHM2012, IMS,
and XJD datasets. The results have shown that the proposed
method can detect incipient faults earlier than other methods
with the lowest number of false alarms. Besides, the HI
extracted by the proposed method can reflect the degradation
process of rotating machines and has a clear boundary between
the normal conditions and faults. The oversampling strategy
is also proven to be effective in improving the accuracy and
sensitivity of the proposed method. More importantly, the
proposed method can detect early faults without a predefined
threshold to determine whether an early fault appears.
Nevertheless, the proposed method has some limitations.
On the one hand, it can only detect abnormal states but cannot
identify fault types. On the other hand, the proposed method
can achieve a good performance on bearings with the same
operation conditions, but its prediction performance is not
satisfactory when transferring to a different working mode.
Moreover, the proposed method can only be used to detect
early faults of rotating machines with stable speeds and loads,
while there are a lot of fluctuations in real mechanical systems.
Therefore, our future research work will be conducted in the
following aspects. First, this method will be further developed
to recognize fault types. Moreover, an EFD method based on
CNN and transfer learning will be studied to enhance the
generalization ability. More importantly, an adaptive EFD method
will be studied to adjust to variable operating conditions for
real industry applications.

REFERENCES
[1] Y. Lei, J. Lin, Z. He, and M. J. Zuo, “A review on empirical mode decomposition in fault diagnosis of rotating machinery,” Mech. Syst. Signal Process., vol. 35, nos. 1–2, pp. 108–126, Feb. 2013.
[2] Y. Lei, Intelligent Fault Diagnosis and Remaining Useful Life Prediction of Rotating Machinery. Oxford, U.K.: Butterworth-Heinemann, 2016.
[3] X. Wen, G. Lu, J. Liu, and P. Yan, “Graph modeling of singular values for early fault detection and diagnosis of rolling element bearings,” Mech. Syst. Signal Process., vol. 145, Nov. 2020, Art. no. 106956.
[4] Y. Li, M. Xu, Y. Wei, and W. Huang, “Health condition monitoring and early fault diagnosis of bearings using SDF and intrinsic characteristic-scale decomposition,” IEEE Trans. Instrum. Meas., vol. 65, no. 9, pp. 2174–2189, Sep. 2016.
[5] Z. Gao, C. Cecati, and S. X. Ding, “A survey of fault diagnosis and fault-tolerant techniques—Part I: Fault diagnosis with model-based and signal-based approaches,” IEEE Trans. Ind. Electron., vol. 62, no. 6, pp. 3757–3767, Jun. 2015.
[6] B. Luo, H. Wang, H. Liu, B. Li, and F. Peng, “Early fault detection of machine tools based on deep learning and dynamic identification,” IEEE Trans. Ind. Electron., vol. 66, no. 1, pp. 509–518, Jan. 2018.
[7] Y. Cheng, H. Zhu, K. Hu, J. Wu, X. Shao, and Y. Wang, “Health degradation monitoring of rolling element bearing by growing self-organizing mapping and clustered support vector machine,” IEEE Access, vol. 7, pp. 135322–135331, 2019.
[8] Y. Hong, M. Kim, H. Lee, J. J. Park, and D. Lee, “Early fault diagnosis and classification of ball bearing using enhanced kurtogram and Gaussian mixture model,” IEEE Trans. Instrum. Meas., vol. 68, no. 12, pp. 4746–4755, Dec. 2019.
[9] W. Mao, S. Tian, J. Fan, X. Liang, and A. Safian, “Online detection of bearing incipient fault with semi-supervised architecture and deep feature representation,” J. Manuf. Syst., vol. 55, pp. 179–198, Apr. 2020.
[10] R. Teti, K. Jemielniak, G. O’Donnell, and D. Dornfeld, “Advanced monitoring of machining operations,” CIRP Ann. Manuf. Technol., vol. 59, no. 2, pp. 717–739, Jan. 2010.
[11] I. El-Thalji and E. Jantunen, “A summary of fault modelling and predictive health monitoring of rolling element bearings,” Mech. Syst. Signal Process., vols. 60–61, pp. 252–272, Aug. 2015.
[12] X. Li, X. Yang, Y. Yang, I. Bennett, and D. Mba, “A novel diagnostic and prognostic framework for incipient fault detection and remaining service life prediction with application to industrial rotating machines,” Appl. Soft Comput., vol. 82, Sep. 2019, Art. no. 105564.
[13] Y. Dong, Z. Sun, and H. Jia, “A cosine similarity-based negative selection algorithm for time series novelty detection,” Mech. Syst. Signal Process., vol. 20, no. 6, pp. 1461–1472, Aug. 2006.
[14] W. Mao, J. He, and M. Zuo, “Predicting remaining useful life of rolling bearings based on deep feature representation and transfer learning,” IEEE Trans. Instrum. Meas., vol. 69, no. 4, pp. 1594–1608, May 2020.
[15] L. Wen, L. Gao, and X. Li, “A new deep transfer learning based on sparse auto-encoder for fault diagnosis,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 49, no. 1, pp. 136–144, Jan. 2017.
[16] G. Xu, M. Liu, Z. Jiang, D. Söffker, and W. Shen, “Bearing fault diagnosis method based on deep convolutional neural network and random forest ensemble learning,” Sensors, vol. 19, no. 5, p. 1088, 2019.
[17] Y. T. Wu, M. Yuan, S. Dong, L. Li, and Y. Liu, “Remaining useful life estimation of engineered systems using vanilla LSTM neural networks,” Neurocomputing, vol. 275, pp. 167–179, Jan. 2018.
[18] L. Wen, X. Li, L. Gao, and Y. Zhang, “A new convolutional neural network-based data-driven fault diagnosis method,” IEEE Trans. Ind. Electron., vol. 65, no. 7, pp. 5990–5998, Jul. 2017.
[19] G. Xu, M. Liu, Z. Jiang, W. Shen, and C. Huang, “Online fault diagnosis method based on transfer convolutional neural networks,” IEEE Trans. Instrum. Meas., vol. 69, no. 2, pp. 509–520, Feb. 2020.
[20] W. Lu, Y. Li, Y. Cheng, D. Meng, B. Liang, and P. Zhou, “Early fault detection approach with deep architectures,” IEEE Trans. Instrum. Meas., vol. 67, no. 7, pp. 1679–1689, Jul. 2018.
[21] C. Li, S. Zhang, Y. Qin, and E. Estupinan, “A systematic review of deep transfer learning for machinery fault diagnosis,” Neurocomputing, vol. 407, pp. 121–135, Sep. 2020.
[22] C. Zhao, G. Liu, and W. Shen, “A dual-view alignment-based domain adaptation network for fault diagnosis,” Meas. Sci. Technol., vol. 32, no. 11, Nov. 2021, Art. no. 115102.
[23] G. Liu, W. Shen, L. Gao, and A. Kusiak, “Predictive modeling with an adaptive unsupervised broad transfer algorithm,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–12, 2021.
[24] C. Zhao, G. Liu, W. Shen, and L. Gao, “A multi-representation-based domain adaptation network for fault diagnosis,” Measurement, vol. 182, Sep. 2021, Art. no. 109650.
[25] Z. Zhao et al., “Applications of unsupervised deep transfer learning to intelligent fault diagnosis: A survey and comparative study,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–28, 2021.
[26] Y. Qin, X. Wang, Q. Qian, H. Pu, and J. Luo, “Multiscale transfer voting mechanism: A new strategy for domain adaption,” IEEE Trans. Ind. Informat., vol. 17, no. 10, pp. 7103–7113, Oct. 2021.
[27] M. M. Keikha, “Improved simulated annealing using momentum terms,” in Proc. 2nd Int. Conf. Intell. Syst., Modeling Simulation, Jan. 2011, pp. 44–48.
[28] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[29] P. Nectoux et al., “PRONOSTIA: An experimental platform for bearings accelerated degradation tests,” in Proc. IEEE Int. Conf. Prognostics Health Manage. (PHM), Jun. 2012, pp. 1–8.
[30] H. Qiu, J. Lee, J. Lin, and G. Yu, “Wavelet filter-based weak signature detection method and its application on rolling element bearing prognostics,” J. Sound Vib., vol. 289, nos. 4–5, pp. 1066–1090, 2006.

[31] B. Wang, Y. Lei, N. Li, and N. Li, “A hybrid prognostics approach for estimating remaining useful life of rolling element bearings,” IEEE Trans. Rel., vol. 69, no. 1, pp. 401–412, Mar. 2020.
[32] Y. Li, M. Xu, X. Liang, and W. Huang, “Application of bandwidth EMD and adaptive multiscale morphology analysis for incipient fault diagnosis of rolling bearings,” IEEE Trans. Ind. Electron., vol. 64, no. 8, pp. 6506–6517, Aug. 2017.
[33] Z. Guo, G. Jiang, H. Chen, and K. Yoshihira, “Tracking probabilistic correlation of monitoring data for fault detection in complex systems,” in Proc. Int. Conf. Dependable Syst. Netw. (DSN), 2006, pp. 259–268.

Wenbin Song received the B.S. and M.E. degrees in
mechanical engineering from the Huazhong University
of Science and Technology, Wuhan, China, in
2013 and 2019, respectively, where he is currently
pursuing the Ph.D. degree in mechanical engineering
with the State Key Laboratory of Digital Manufacturing
Equipment and Technology, School of
Mechanical Science and Engineering.
His current research interests include prognostics
and health management, remaining useful life
prediction, and machine learning.

Weiming Shen (Fellow, IEEE) received the bachelor’s
and master’s degrees in mechanical engineering
from Northern (Beijing) Jiaotong University,
Beijing, China, in 1983 and 1986, respectively, and
the Ph.D. degree in system control from the University
of Technology of Compiègne, Compiègne,
France, in 1996.
He is currently a Professor at the Huazhong University
of Science and Technology (HUST), Wuhan,
China, and an Adjunct Professor at the University
of Western Ontario, London, ON, Canada. Prior to
joining HUST in 2019, he was a Principal Research Officer at the National
Research Council Canada, Ottawa, ON, Canada. His work has been cited
over 14 000 times with an H-index of 56. He has published several books
and over 500 papers in scientific journals and international conferences in the
related areas. His recent research interests include agent-based collaboration
technology and applications, the Internet of Things, and big data analytics.
Dr. Shen is a member of the American Society of Mechanical Engineers
(ASME) and the Association for Computing Machinery (ACM). He is also the
Editor-in-Chief of the IET Collaborative Intelligent Manufacturing and an
Associate Editor or an Editorial Board Member of over ten international
journals, including the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND
CYBERNETICS: SYSTEMS, Advanced Engineering Informatics, and Service
Oriented Computing and Applications. He has served as a guest editor for
several international journals.

Liang Gao (Senior Member, IEEE) received the
Ph.D. degree in mechatronic engineering from the
Huazhong University of Science and Technology
(HUST), Wuhan, China, in 2002.
He is currently a Professor with the State Key
Laboratory of Digital Manufacturing Equipment and
Technology, Department of Industrial and Manufacturing
System Engineering, School of Mechanical
Science and Engineering, HUST. He has authored
or coauthored more than 440 refereed articles. His
current research interests include operations research
and optimization, big data, and machine learning.

Xinyu Li (Member, IEEE) received the Ph.D. degree
in industrial engineering from the Huazhong University
of Science and Technology (HUST), Wuhan,
China, in 2009.
He is currently a Professor with the State Key
Laboratory of Digital Manufacturing Equipment and
Technology, Department of Industrial and Manufacturing
System Engineering, School of Mechanical
Science and Engineering, HUST. He has authored
more than 100 refereed articles. His research interests
include intelligent algorithms, big data, and
machine learning.