
Real-Time Bearing Remaining Useful Life Estimation Based on the Frozen Convolutional and Activated Memory Neural Network
ZESHENG CHEN, XIAOTONG TU, YUE HU, AND FUCAI LI
State Key Laboratory of Mechanical System and Vibration, Shanghai Jiao Tong University, Shanghai 200240, China

Corresponding author: Fucai Li (e-mail: fcli@sjtu.edu.cn).


This work was supported in part by the National Science and Technology Major Project under Grant 2018ZX04011001, and in part by the National Natural Science Foundation of China under Grant 11427801.

ABSTRACT Bearings are widely used in rotating machinery such as aircraft engines and wind turbines. In this paper, we propose a new data-driven method, the frozen convolution and activated memory network (FCAMN), for bearing remaining useful life (RUL) estimation based on deep neural networks. The proposed method is composed of two parts: a multi-scale convolutional neural network is first pre-trained on the raw data to directly obtain global and local features; the second step is accomplished by a convolutional-memory neural network, which connects a convolutional layer with a long short-term memory layer to predict the continuous bearing RUL. Compared with traditional networks, the proposed network can extract both the global and local information on the vertical feature axis and the associated context information on the horizontal time axis. Experiments show that the proposed method requires fewer training samples and outperforms other methods in RUL estimation.

INDEX TERMS Bearings, remaining useful life estimation, multi-scale convolutional network, long short-term memory neural network.

I. INTRODUCTION

In modern manufacturing, condition-based maintenance (CBM) has become an increasingly important tool to ensure safe operation [1]. As a maintenance strategy, it is usually based on measuring the condition of equipment to assess whether it will fail during some future period, so that appropriate action can be taken to avoid the failure. In the era of big data, CBM has been widely used in vibration monitoring, sound or acoustic monitoring, lubricant monitoring and other related maintenance situations [2].

Bearings are widely applied in the machinery industry and generally work in harsh environments. According to a survey, about 45%-55% of asynchronous motor failures are caused by bearing failures [3]. Among the related prognostic tasks, predicting the remaining useful life (RUL) plays an important role in prognostic health management. Currently, RUL prediction paradigms can be categorized into three groups: model-based methods, data-driven methods and model-data-hybrid methods. For model-based and hybrid methods, establishing a model which matches well with the actual system can be quite difficult considering the complex mechanical structure and the necessary professional knowledge, such as the tribological failure mechanism [4]. Under these circumstances, using a data-driven method becomes a more efficient and succinct way.

Run-to-failure data can be easily collected from different types of sensors installed on the mechanical structures. Data-driven approaches can be roughly classified into three classes. The first kind is the traditional signal processing algorithm. Heng et al. [5] studied statistical characteristics such as kurtosis and skewness of the rolling bearing vibration signal for reconstructing the degradation process of rolling bearings. Wang et al. [6] applied an enhanced Kalman filter and an expectation maximization algorithm to predict RUL from the deviation of multiple statistics of vibration signals. Zhang et al. [7] adopted information entropy features with SVD to construct a model that tracks the variation of the characteristic parameters. Besides, Si et al. [8] made a comprehensive and detailed review of data-driven approaches.

The second way is traditional machine learning. Lei et al. [9] proposed a novel approach which combines the weighted minimum quantization error and a particle filtering-based algorithm to construct the degradation process. Rai et al. [10] presented a new health indicator using the self-organizing map with support vector regression to predict bearing RUL. Ben Ali et al. [11] explored the Simplified Fuzzy Adaptive Resonance Theory Map (SFAM) using the Weibull distribution to match measurements and to avoid time-domain fluctuation. Xi et al. [12] established a new degradation process with memory using fractional Brownian motion to compensate for the drawback of the memoryless Markovian process. A recent review presented some other machine learning methods [13].

The third class is deep learning, which has made great progress in computer vision [14], speech recognition [15], machine translation [16] and diagnostics [17]. However, there are few references in the field of RUL prediction, and these approaches seldom reach the expected performance on prognostics. Li et al. [18] presented a two-stage method which combines a denoising autoencoder-based deep neural network and a shallow regression neural network to obtain the final RUL. Another representative deep learning model is the convolutional neural network (CNN), first proposed by LeCun et al. [19] and used for image classification on the MNIST dataset. Benefiting from the strong power of the CNN [20], researchers have made great efforts to deal with CBM problems in diagnosis work, but prognosis work is relatively lacking. Babu et al. [21] applied convolutional and pooling filters along the dimensions of the multivariate time series and predicted the bearing RUL with a CNN-based regression approach. Besides, Xiong et al. [22] combined local and global pooling methods to extract rich and robust representations from the sparse feature maps learned from the raw data in pattern recognition. Inspired by the multi-scale method, Zhu et al. [23] presented a new multi-scale convolutional neural network (MSCNN), which takes a time-frequency representation (TFR) of the raw time signals and successfully extracts non-stationary features. After obtaining the TFR, a multi-layer structure which integrates the last convolutional layer with some previous pooling layers is employed to train this regression task. The MSCNN's effectiveness has been validated on the IEEE PHM 2012 Data Challenge set [24]. The other representative deep learning method is the recurrent neural network (RNN). Since the degradation of a bearing is a continuous process, the RNN can be adopted to establish the context along the time axis. Guo et al. [25] selected the main features from time-frequency features with correlation and monotonicity algorithms, and then proposed the RNN-HI model to estimate the bearing RUL. Gugulothu et al. [26] used the RNN to generate embeddings for multivariate time series and verified the method on turbofan engine datasets [27] from the NASA Ames Prognostics Data Repository.

Given the advantages of both the CNN and the RNN, a new method that possesses both capabilities for predicting bearing RUL is expected to perform better. To fill this research gap, the frozen convolution and activated memory network (FCAMN) approach is proposed, which contains a multi-scale convolutional-memory network. The convolution part extracts primary features, and the second part learns from these features to predict the continuous bearing RUL.

The rest of this paper is organized as follows. Section II provides the methodology of the proposed method. Section III verifies the proposed method using several bearing run-to-failure datasets. Section IV discusses future research work, and Section V concludes this paper.

A. NETWORK STRUCTURE
As described in the first section, the proposed model contains an adaptive multi-scale convolutional neural network (A-MSCNN) and a CNN-LSTM network. The overall flowchart is shown in Fig. 1.

FIGURE 1. Flowchart of the proposed network structure

The training process is divided into two steps. Firstly, the A-MSCNN is pre-trained, which aims to extract detailed features from the raw vibration signals. In this step, the data imported into the input layer can be the raw time series, pre-processed frequency series or TFR series. The parameters of the convolutional layers, pooling layers and fully connected layers are randomly initialized since this model is prepared for pre-training. The output layer of the A-MSCNN gives out the dense_layer_2 (the second fully connected layer, FC) values to compare with the labels and start backpropagation. After pre-training, the A-MSCNN model is "frozen" in the graph model, which means the model structure and the weight parameters are fixed as constants. In the second step, the model's dense_layer_1 (the first fully connected layer, FC) is connected to the CNN-LSTM model. The model saved in the first step is loaded, and the pre-trained network parameters are imported into the front part of the FCAMN model at the same time, corresponding to "activation". In this step, the dense_layer_1 value becomes the input of the CNN-LSTM model. In the second training process, the FCAMN model is trained through the A-MSCNN, the CNN, the LSTM and the FC, and the bearing RUL prediction is finally given by the last FC layer. An end-to-end deep learning approach is advocated which requires no expert knowledge. In the following parts of this section, the model structure and mathematical derivation are described.
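To make the two-step procedure concrete, the following is a minimal Keras-style sketch of the freeze-and-connect idea. It is not the authors' released code: the layer sizes follow Tables I and II, the adaptive multi-scale kernel layer is omitted, the optimizer and the plain MSE loss are placeholder assumptions (the paper's actual loss is Eq. (9)), and all variable names are illustrative.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Step 1: simplified stand-in for the A-MSCNN feature extractor (Table I);
# the adaptive multi-scale kernel layer is omitted for brevity.
inp = layers.Input(shape=(2560, 2))
x = layers.Conv1D(32, 8, padding='same', activation='relu')(inp)
x = layers.Conv1D(64, 8, padding='same', activation='relu')(x)
x = layers.MaxPooling1D(pool_size=4, strides=8, padding='same')(x)
x = layers.Conv1D(128, 8, padding='same', activation='relu')(x)
x = layers.MaxPooling1D(pool_size=4, strides=8, padding='same')(x)
x = layers.Flatten()(x)
feat = layers.Dense(50, activation='relu', name='dense_layer_1')(x)
out = layers.Dense(1, name='dense_layer_2')(feat)   # pre-training head

pretrain = models.Model(inp, out)
pretrain.compile(optimizer='adam', loss='mse')      # placeholder loss
# pretrain.fit(raw_snapshots, rul_labels, ...)      # step 1: pre-training

# Step 2: "freeze" everything up to dense_layer_1 and feed its output,
# snapshot by snapshot, into the recurrent head (Table II, Layers 11-15).
frozen = models.Model(pretrain.input,
                      pretrain.get_layer('dense_layer_1').output)
frozen.trainable = False                            # weights fixed as constants

seq_in = layers.Input(shape=(100, 50))              # 100 steps x 50 features
h = layers.LSTM(150, return_sequences=True)(seq_in)
h = layers.LSTM(150)(h)
h = layers.Dense(50, activation='relu')(h)
h = layers.Dropout(0.2)(h)                          # keep_prob=0.8 in the paper
rul = layers.Dense(1)(h)
head = models.Model(seq_in, rul)
head.compile(optimizer='adam', loss='mse')
# features = frozen.predict(windows).reshape(-1, 100, 50)  # step 2 input
# head.fit(features, window_labels, ...)
```

In this reading, the later "activation" of the frozen part (revisited in discussion point 4 of Section IV) would correspond to setting frozen.trainable = True and recompiling before further training.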

B. PRE-TRAINED MULTI-SCALE CONVOLUTION NETWORK
The A-MSCNN is used for extracting features from the raw time signals. After this process, the A-MSCNN model can learn highly concentrated features, which helps the overall model converge quickly and reduces the difficulty of generalizing the whole model. Besides, an adaptive multi-scale strategy is reflected in adjusting the scales automatically to achieve optimal results. Considering the original time-domain vibration signals, the kernel size is set to 8 after trials ranging from 4 to 32. The brief network architecture is presented in Table I.

TABLE I
LAYER DETAILS OF THE A-MSCNN ARCHITECTURE

Layer 1   input                channels=2, sequence_length=2560
Layer 2   1D-convolution       filters=32, kernel_size=8, strides=1, padding=same, act=ReLU
          multi-scale pooling  method=grid_search, target_scales=3
Layer 3   1D-convolution       filters=64, kernel_size=8, strides=1, padding=same, act=ReLU
          1D-max-pooling       pool_size=4, strides=8, padding=same
Layer 4   1D-convolution       filters=128, kernel_size=8, strides=1, padding=same, act=ReLU
          1D-max-pooling       pool_size=4, strides=8, padding=same
Layer 5   fully-connected      layer_size=50, act=ReLU
          dropout              keep_prob=0.8
Layer 6   fully-connected      layer_size=1, act=None
Layer 7   output               bearing RUL prediction, shape=1, value=Layer 6

The A-MSCNN model is composed of multi-scale kernel layers, convolutional layers, down-sampling layers and fully connected layers. The multi-scale kernel layer is the core of the network and is devoted to extracting features of different time scales. The function of the convolutional layer is to abstract low-level features of the signals into high-level features. The down-sampling layer aims to reduce the dimension of the features and enhance the generalization ability of the model, and the fully connected layer plays a comprehensive decision-making role.

1) The Multi-scale Kernel Feature Extraction Layer:
To address the insensitivity of the one-dimensional convolutional neural network to time series, a multi-scale kernel layer structure is proposed. Firstly, the feature extraction machinery is applied to some front-layer features; then the primary-scale features are extracted into secondary-scale segmented features by certain rules. In this paper, sub-sampling intervals up to $M$ are applied to the feature sequences, where $M$ represents the number of scales used in the network. Then, the same convolution kernel is convolved with each secondary feature sequence of a certain layer to obtain features of different time scales. Finally, all the secondary feature sequences inherited from the same primary feature are joined together in turn to form a multi-scale feature.

Jump convolution is defined as

$$J\mathrm{conv1D}(w_k^{l-1}, s_i^{l-1}) = \begin{cases} \mathrm{conv1D}(w_{1k}^{l-1}, s_{i1}^{l-1}) \\ \quad \cdots \\ \mathrm{conv1D}(w_{jk}^{l-1}, s_{ij}^{l-1}) \\ \quad \cdots \\ \mathrm{conv1D}(w_{M_{l-1}k}^{l-1}, s_{iM_{l-1}}^{l-1}) \end{cases} \tag{1}$$

$$M_{l-1} = \mathrm{GridSearch}\!\left(1, \left\lfloor \frac{\mathrm{Length}(s_i^{l-1})-1}{\mathrm{Length}(w_k^{l-1})-1} \right\rfloor\right) \tag{2}$$

where $s_i^{l-1}$ is the $i^{th}$ feature of layer $l-1$; $w_k^{l-1}$ is the $k^{th}$ convolution kernel of layer $l-1$; $s_{ij}^{l-1}$ is the $j^{th}$-scale secondary feature inherited from the primary feature $s_i^{l-1}$ by setting the interval parameter to $j$; $w_{jk}^{l-1}$ is the convolution weight corresponding to $s_{ij}^{l-1}$ and has the same parameters as $w_k^{l-1}$; $\mathrm{conv1D}$ represents the one-dimensional convolution operation; $M_{l-1}$ is the number of different secondary scales of layer $l-1$; $\lfloor\,\rfloor$ refers to the rounding-down operation, and $\mathrm{GridSearch}$ represents a linear search method for finding the optimal parameter.
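As a concrete illustration of Eqs. (1)-(2), here is a small NumPy sketch of jump convolution under one plausible reading: the secondary feature $s_{ij}$ is the primary feature sub-sampled at interval $j$, the kernel weights are shared across scales, and the per-scale outputs are concatenated. The function names and the grid-search bound are assumptions, not the authors' code.

```python
import numpy as np

def jconv1d(kernel, feature, M):
    """Eq. (1) sketch: convolve one shared kernel with M sub-sampled
    (interval j = 1..M) versions of a feature and join the results."""
    outputs = []
    for j in range(1, M + 1):
        sub = feature[::j]                                      # secondary feature s_ij
        outputs.append(np.convolve(sub, kernel, mode='valid'))  # shared weights
    return np.concatenate(outputs)                              # multi-scale feature

def max_scale(feature, kernel):
    """Eq. (2) upper bound on the number of scales; a grid search over
    1..max_scale would then pick the value that performs best."""
    return (len(feature) - 1) // (len(kernel) - 1)

# Example: one 32-point feature, one 8-tap kernel, three scales.
rng = np.random.default_rng(0)
s = rng.standard_normal(32)
w = rng.standard_normal(8)
print(jconv1d(w, s, M=3).shape)   # concatenated outputs of 3 scales
```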
According to Fig. 2, the forward propagation for multi-scale kernel layers between layer $l$ and layer $l-1$ can be expressed as

$$x_k^l = b_k^l + \sum_{i=1}^{N_{l-1}} J\mathrm{conv1D}(w_k^{l-1}, s_i^{l-1}) \tag{3}$$

$$y_k^l = f(x_k^l) \tag{4}$$

where $K$ denotes that layer $l-1$ has $K$ kinds of convolution kernels; $b_k^l$ represents the overall bias of the neuron; $N_{l-1}$ is the number of features of layer $l-1$; $x_k^l$ refers to the input of the $k^{th}$ neuron of layer $l$, and $y_k^l$ indicates the activated output value of that neuron of layer $l$.

Subsequently, a max-pooling layer is employed for the down-sampling operation, which is written as

$$s_k^l = y_k^l \downarrow ss \tag{5}$$

where $s_k^l$ is the output value after the max-pooling processing and $\downarrow ss$ represents the down-sampling operation. In this paper, all down-sampling layers refer to max-pooling layers.

2) The Convolution Feature Extraction Layer:
The overall structure of this layer and the calculation process are shown in Fig. 3.


FIGURE 2. The multi-scale convolution layers

FIGURE 3. The convolution layers

In this layer, the forward propagation between layer $l$ and layer $l-1$ is expressed as

$$x_k^l = b_k^l + \sum_{i=1}^{N_{l-1}} \mathrm{conv1D}(w_{ik}^{l-1}, s_i^{l-1}) \tag{6}$$

$$y_k^l = f(x_k^l) \tag{7}$$

where $x_k^l$ is the input value of the $k^{th}$ neuron of layer $l$; $b_k^l$ is the bias of the $k^{th}$ neuron of layer $l$; $s_i^{l-1}$ is the $i^{th}$ feature of layer $l-1$; $w_{ik}^{l-1}$ is the convolution kernel weight between the $i^{th}$ feature of layer $l-1$ and the $k^{th}$ neuron of layer $l$; $y_k^l$ is the activated output corresponding to the input $x_k^l$, and $f$ is the activation function.

In accord with the multi-scale kernel feature extraction layer, a down-sampling layer is applied to the activated output $y_k^l$, which is given as

$$s_k^l = y_k^l \downarrow ss \tag{8}$$

3) The Iterative Algorithm Derivation of the MSCNN:
Because the A-MSCNN model is first proposed by the authors, it is necessary to give the derivation of the backpropagation (BP) process. Let $l = L$ denote the index of the output layer, $p_i^L$ the expected output value of the $i^{th}$ neuron, and $y_i^L$ the predicted output value of the $i^{th}$ neuron after activation processing. Considering the significance of the loss function to industrial production, a different manner is applied for bearing RUL prediction, since overestimation is much worse than underestimation.

Under these circumstances, the loss function is defined as

$$Loss = \begin{cases} \frac{1}{2}\,\alpha \sum_i (y_i^L - p_i^L)^2, & y_i^L \ge p_i^L \\ \frac{1}{2}\,\beta \sum_i (p_i^L - y_i^L)^2, & y_i^L < p_i^L \end{cases} \tag{9}$$

where $Loss$ is the loss value; $\alpha$ and $\beta$ are the weights of the loss function corresponding to overestimation and underestimation, with the relation $\alpha > \beta > 0$.
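A direct NumPy transcription of Eq. (9) is given below for illustration; the concrete values of alpha and beta are assumptions, as the paper only requires alpha > beta > 0.

```python
import numpy as np

def asymmetric_loss(y_pred, y_true, alpha=2.0, beta=1.0):
    """Eq. (9): overestimating RUL (y_pred >= y_true) is penalized more
    heavily than underestimating it; alpha > beta > 0 (values assumed)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    sq = (y_pred - y_true) ** 2
    return np.sum(np.where(y_pred >= y_true, 0.5 * alpha * sq, 0.5 * beta * sq))
```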


The goal of the BP process is to minimize $Loss$, and the derivative with respect to each weight and bias is required. For the output layer, the gradient value is

$$\delta_j^L = \frac{\partial Loss}{\partial x_j^L} = \begin{cases} \frac{1}{2}\,\alpha \sum_i \frac{\partial (y_i^L - p_i^L)^2}{\partial y_i^L} \cdot \frac{\partial y_i^L}{\partial x_j^L}, & y_i^L \ge p_i^L \\ \frac{1}{2}\,\beta \sum_i \frac{\partial (p_i^L - y_i^L)^2}{\partial y_i^L} \cdot \frac{\partial y_i^L}{\partial x_j^L}, & y_i^L < p_i^L \end{cases} \tag{10}$$

$$\delta_j^L = \begin{cases} \alpha \sum_i (y_i^L - p_i^L) \cdot \frac{\partial y_i^L}{\partial x_j^L}, & y_i^L \ge p_i^L \\ \beta \sum_i (y_i^L - p_i^L) \cdot \frac{\partial y_i^L}{\partial x_j^L}, & y_i^L < p_i^L \end{cases} \tag{11}$$

$$\frac{\partial y_i^L}{\partial x_j^L} = \begin{cases} f', & i = j \\ 0, & i \ne j \end{cases} \tag{12}$$

where $\delta_j^L$ is the gradient of the $j^{th}$ neuron of the output layer; $x_j^L$ is the inactivated output of the $j^{th}$ neuron, and $f'$ is the derivative of the activation function. Taking the output layer as an example, we have $i = j = 1$ and $f' = 1$ since there is no activation function in the output layer. Finally, the gradient matrix $\delta^L$ is obtained based on the formulas above.

When the front layer of the output layer is a convolutional layer, the deviation transfer between the two layers in the BP process is expressed as

$$\delta^{L-1} = (w^{L-1})^T \delta^L \,.\!*\, f'_{relu}(x^{L-1}) \tag{13}$$

$$f'_{relu}(x_j^{L-1}) = \begin{cases} 1, & x_j^{L-1} > 0 \\ 0, & x_j^{L-1} \le 0 \end{cases} \tag{14}$$

where $\delta^{L-1}$ is the gradient matrix of layer $L-1$; $w^{L-1}$ is the weight matrix of the fully connected layer; $x^{L-1}$ is the input matrix of layer $L-1$; $f_{relu}$ represents the activation function named ReLU, and $f'_{relu}$ refers to the derivative of the ReLU function.

When the front layer of the output layer is a pooling layer, the gradient matrix of layer $L-1$ is defined as

$$\delta^{L-1} = up(\delta^L) \tag{15}$$

where $up(\delta^L)$ represents the up-sampling operation. The gradient is averaged over each unit when mean-pooling is applied. Otherwise, when max-pooling is applied, the gradient fills the unit which contained the biggest value in the forward propagation, while the other units are set to zero.

According to the backpropagation process, the gradient of every layer can be calculated. For convolutional layers and multi-scale kernel layers, the gradient matrix is given as

$$\delta^l = up(\delta^{l+1}) \,.\!*\, f'_{relu}(x^l) \tag{16}$$

The gradient matrix of pooling layers is given as

$$\delta^l = conv(\delta^{l+1}, rot180(w^{l+1})) \tag{17}$$

where $\delta^l$ is the gradient matrix of layer $l$; $w^{l+1}$ is the weight matrix between layer $l$ and layer $l+1$, and $rot180$ indicates that the matrix is rotated by 180 degrees.

Finally, the derivatives with respect to weights and biases can be calculated as follows according to Fig. 3. The derivation for the convolutional layer is given as

$$\frac{\partial Loss}{\partial w_k^{l-1}} = \sum_i \mathrm{conv1D}(s_i^{l-1}, \delta_k^l) \tag{18}$$

$$\frac{\partial Loss}{\partial b^l} = \sum \delta^l \tag{19}$$

As shown in Fig. 2, the derivatives with respect to the weights and biases of the multi-scale kernel layer are calculated as

$$\frac{\partial Loss}{\partial w_{jk}^{l-1}} = \sum_i J\mathrm{conv1D}_j(\delta_k^l, s_i^{l-1}) \tag{20}$$

$$\frac{\partial Loss}{\partial b^l} = \sum \delta^l \tag{21}$$

where $w_{jk}^{l-1}$ is the weight of the multi-scale convolutional layer in Fig. 2; $i$ is the index of the features of layer $l-1$, and $J\mathrm{conv1D}_j$ represents the $j^{th}$ convolution result defined in formula (1). In this operation, $\delta_k^l$ is truncated to the same length as each result of $J\mathrm{conv1D}(\delta_k^l, s_i^{l-1})$.

C. CONVOLUTIONAL-MEMORY NETWORK
This CNN-LSTM network is used for predicting the bearing RUL after the A-MSCNN model is pre-trained. It can extract the features along the time axis, as bearing degradation is a continuously changing process. The overall network architecture is shown in Table II, and the A-MSCNN part is also listed in order to describe the connection details; the layer numbering continues from Table I. Layer 11 is the input layer of the LSTM model, and the omitted parts between Layer 1 and Layer 5 are the same as in the A-MSCNN. The input of the FCAMN-specific part is composed of only 50 features. Considering that these features are already highly extracted, the kernel size is set to 2 after trying the range of 2 to 4.

TABLE II
LAYER DETAILS OF THE FCAMN ARCHITECTURE

Layer 1    input             channels=2, sequence_length=2560
...        ...               ...
Layer 5    fully-connected   layer_size=50, act=ReLU
           dropout           keep_prob=0.8
Layer 8    1D-convolution    filters=18, kernel_size=2, strides=1, padding=same, act=ReLU
           1D-max-pooling    pool_size=2, strides=2, padding=same
Layer 9    1D-convolution    filters=36, kernel_size=2, strides=1, padding=same, act=ReLU
           1D-max-pooling    pool_size=2, strides=2, padding=same
Layer 10   1D-convolution    filters=72, kernel_size=2, strides=1, padding=same, act=ReLU
           1D-max-pooling    pool_size=2, strides=2, padding=same
Layer 11   fully-connected   layer_size=50, act=ReLU
           dropout           keep_prob=0.8
Layer 12   LSTM-cell         layer_num=2, cell_num=150
           dropout           keep_prob=0.8
Layer 13   fully-connected   layer_size=50, act=ReLU
           dropout           keep_prob=0.8
Layer 14   fully-connected   layer_size=1, act=None
           dropout           keep_prob=0.8
Layer 15   output            bearing RUL prediction, shape=1, value=Layer 14


The core of this model is the LSTM, one of the most efficient gated RNNs, whose gradient neither vanishes nor gets stuck in explosive growth. Different from traditional RNNs, the self-loop weights of the LSTM are determined by the context. Gates are used to control these weights so that the cumulative time scales can be changed dynamically. The basic framework is shown in Fig. 4.

FIGURE 4. Cell framework of the LSTM

In this paper, two LSTM layers are stacked. Given an input sequence $X = (x_1, \cdots, x_T)$, the output sequence $Y = (y_1, \cdots, y_T)$ is generated using the hidden sequence $H = (h_1, \cdots, h_T)$. According to Fig. 4, the cell of the LSTM works by the following rules [28]:

$$h_t = f(W_{hx} x_t + W_{hh} h_{t-1} + b_h) \tag{22}$$

$$y_t = W_{yh} h_t + b_y \tag{23}$$

where $h_t$ is the vector of the hidden layer at time $t$ and $x_t$ is the vector of the input layer at time $t$. To calculate $h_t$, $W_{hx}$ is the weight vector related to $x$ and $W_{hh}$ is the weight vector related to $h$; $b_h$ is the bias of the hidden sequence. To calculate the output vector $y_t$, $W_{yh}$ is the weight vector connected to $h$, and $b_y$ is the bias of the output sequence. Since the calculation of $h_t$ is an iterative process, it is defined as

$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i) \tag{24}$$

$$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f) \tag{25}$$

$$c_t = f_t c_{t-1} + i_t \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c) \tag{26}$$

$$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o) \tag{27}$$

$$h_t = o_t \tanh(c_t) \tag{28}$$

where $\sigma$ is the logistic sigmoid operator; $\tanh$ is the hyperbolic tangent operator; $i_t$ denotes the input gate; $f_t$ denotes the forget gate; $c_t$ denotes the cell activation vector, and $o_t$ denotes the output gate. $W_{ix}$, $W_{ih}$, $W_{fx}$, $W_{fh}$, $W_{cx}$, $W_{ch}$, $W_{ox}$ and $W_{oh}$ are weight vectors defined in the same way as $W_{hx}$, $W_{hh}$ and $W_{yh}$; $b_i$, $b_f$, $b_c$ and $b_o$ are biases with definitions similar to $b_h$ and $b_y$. All the above formulas refer to time $t$ or time $t-1$ depending on the subscripts.
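The gate equations (24)-(28) translate directly into a few lines of NumPy. The sketch below is for illustration only: the weight arrays W and biases b are hypothetical containers, and the shapes follow the paper's 50-feature input and 150-cell configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell update following Eqs. (24)-(28)."""
    i_t = sigmoid(W['ix'] @ x_t + W['ih'] @ h_prev + b['i'])  # input gate, Eq. (24)
    f_t = sigmoid(W['fx'] @ x_t + W['fh'] @ h_prev + b['f'])  # forget gate, Eq. (25)
    g_t = np.tanh(W['cx'] @ x_t + W['ch'] @ h_prev + b['c'])  # candidate state
    c_t = f_t * c_prev + i_t * g_t                            # cell state, Eq. (26)
    o_t = sigmoid(W['ox'] @ x_t + W['oh'] @ h_prev + b['o'])  # output gate, Eq. (27)
    h_t = o_t * np.tanh(c_t)                                  # hidden state, Eq. (28)
    return h_t, c_t

# Shapes for the paper's configuration: 50 input features, 150 cells.
n_in, n_cell = 50, 150
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((n_cell, n_in if k.endswith('x') else n_cell)) * 0.01
     for k in ('ix', 'ih', 'fx', 'fh', 'cx', 'ch', 'ox', 'oh')}
b = {k: np.zeros(n_cell) for k in ('i', 'f', 'c', 'o')}
h, c = np.zeros(n_cell), np.zeros(n_cell)
h, c = lstm_step(rng.standard_normal(n_in), h, c, W, b)
```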
In the CNN-LSTM model, the input of the LSTM layers has the shape 100x50. The number 100 indicates that 100 vectors are fed into the LSTM layer along the time axis at a time, and the number 50 indicates that each input vector has 50 features extracted by the A-MSCNN network. A comprehensive treatment of the backpropagation process can be found in [28].

Because the whole model is complex, some regularization techniques are adopted to prevent over-fitting. Dropout is one of the most efficient ways to deal with this issue. In addition, a parameter-norm strategy is simultaneously taken into consideration for better performance. In this paper, the L2-norm penalty is chosen for all layers, and the weight value is set to 0.1.

III. EXPERIMENT VERIFICATION

In this section, the proposed model is validated on the dataset from the IEEE PHM 2012 Data Challenge provided by the FEMTO-ST Institute [24]. Descriptions of the experiment platform and the validation datasets are given in the following subsections. The bearing RUL is calculated with the proposed method on the provided datasets after the two-step training. Finally, several traditional estimation methods are briefly introduced, and the corresponding results on the same dataset are compared with respect to the ability to predict the actual remaining useful life of bearings.

A. PLATFORM INTRODUCTION

This platform is dedicated to testing and validating diagnostic and prognostic approaches for bearings. The test rig contains three parts, as shown in Fig. 5. The rotating part includes an asynchronous motor with a speed reducer; the torque is transferred to the tested bearing through a shaft coupling. The loading part contains a pressurized cylinder that accelerates the bearings' degradation by setting the load higher than the bearing's maximum dynamic limit. The measurement part is composed of several sensors: two accelerometers collect the vibration data in orthogonal directions, while speed sensors, torque meters and force sensors record the bearing's working conditions. All the sensor signals are collected by NI DAQ cards for further analysis.

FIGURE 5. Bearing experiment platform (AC motor, speed reducer, speed sensor, torque meter, tested bearing, force sensor, cylinder pressure, pressure regulator, accelerometers, NI DAQ)


In this experiment, the platform is organized to collect bearings' run-to-failure data, which contain vibration and temperature signals. In this paper, the vibration data rather than the temperature data are chosen to predict the bearing RUL. The accelerometer data are collected every 10 seconds in snapshots lasting 0.1 seconds, at a sampling frequency of 25.6 kHz. When the acceleration reaches 20 g, the bearing RUL is taken to be 0.

B. DATA DESCRIPTION

In the PHM 2012 dataset, three bearing operating conditions are presented with run-to-failure data. The proposed FCAMN-based method is applied to the first operating condition, in which the bearing works at a speed of 1800 rpm under a load of 4000 N. Under this condition, seven bearings are tested. The structure of the dataset is shown in Table III. The first row is the index of the columns and the second row lists the main details, in which Horiz.Ac represents the horizontal acceleration data.

TABLE III
ARRANGEMENT OF THE DATASET

Index     Col 1   Col 2    Col 3    Col 4      Col 5      Col 6
Vib.Sig   Hour    Minute   Second   µ-second   Horiz.Ac   Vert.Ac

As mentioned in Part A, every file in the dataset records bearing signals within an interval of 10 seconds. For convenience, the RUL label is set as an integer according to the time interval from the current moment to the failure point. For example, an RUL label of "121" means that at this time stamp there are 121*10 seconds left before the bearing fails. The raw training signals of the horizontal direction are plotted in Fig. 6. Bearing1_1 shows a gradually degrading trajectory, while bearing1_2 exhibits abrupt features near the end of the bearing's lifetime. Besides, the data collected from XJTU are shown at the same time. The characteristics of bearings at different time points are compared in both the time and frequency domains in Fig. 7. The degrading trend can be easily observed, but predicting an accurate RUL remains challenging.

C. FEATURE ENGINEERING

To compare the applicability of the model to various input signals, feature engineering is taken into consideration. Feature engineering transforms raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.

In this paper, three types of input signals are applied. The original time signal is the first choice, for it contains all the information of the bearing data. As the bearing vibration signals are non-stationary complex data, frequency analysis is considered as the second way to describe the vibration characteristics; the time and frequency representations are shown in Fig. 7 in Part B. It should be pointed out that a low-pass filter is used in the Fourier transform process to remove noise and useless information.

The last method is to extract a new feature vector to replace the raw signal. Considering the versatility of the model and the computational cost, time-domain statistics and a simple time-frequency representation are calculated. The selected time-domain features are listed in Table IV, and the degradation described by these statistics is shown in Fig. 10.

TABLE IV
TIME DOMAIN FEATURE EXTRACTION

Mean value:          $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Peak value:          $x_p = \max(|x_i|)$
Peak-to-peak:        $x_{pp} = x_{max} - x_{min}$
Root mean square:    $x_{rms} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$
Standard deviation:  $x_{std} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
Crest factor:        $x_f = x_p / x_{rms}$
Kurtosis factor:     $x_{Kf} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i}{x_{std}}\right)^4$
Skewness:            $x_K = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{x_{std}}\right)^3$

FIGURE 10. Time domain feature description of training data
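A compact NumPy helper computing the Table IV statistics for one vibration snapshot might look as follows; note that the printed skewness formula is corrected here to the third standardized moment.

```python
import numpy as np

def time_domain_features(x):
    """Statistics of Table IV for one vibration snapshot x (1-D array)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    peak = np.abs(x).max()
    p2p = x.max() - x.min()
    rms = np.sqrt(np.mean(x ** 2))
    std = x.std()                                  # population form, 1/n
    crest = peak / rms
    kurt = np.mean((x / std) ** 4)                 # kurtosis factor
    skew = np.mean(((x - mean) / std) ** 3)        # skewness (third moment)
    return np.array([mean, peak, p2p, rms, std, crest, kurt, skew])
```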
In the time-frequency domain, wavelet packet decomposition is applied to the time signal using a 6th-order Daubechies wavelet with seven-layer decomposition. The energy value of each sub-band is then used as a feature vector with 64 elements, given by

$$E_{i,j} = \sum_k p_{i,j,k}^2 \tag{29}$$

where $E_{i,j}$ is the energy value of node $i$ in layer $j$ and $p_{i,j,k}$ represents the corresponding coefficient of the wavelet packet transform.
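With the PyWavelets package, the sub-band energy feature of Eq. (29) can be sketched as below. One assumption to flag: a decomposition depth of 6 is used here because it yields the 64 sub-bands stated in the text; the wording "seven-layer" may count levels differently.

```python
import numpy as np
import pywt  # PyWavelets, assumed dependency

def wpt_energy_features(x, wavelet='db6', level=6):
    """Eq. (29): energy of each terminal wavelet-packet sub-band.
    level=6 gives 2**6 = 64 features, matching the text."""
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, mode='symmetric',
                            maxlevel=level)
    nodes = wp.get_level(level, order='natural')
    return np.array([np.sum(np.square(node.data)) for node in nodes])

# Example on one 0.1 s snapshot (2560 samples at 25.6 kHz):
features = wpt_energy_features(np.random.randn(2560))
print(features.shape)  # (64,)
```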


FIGURE 6. Run-to-fail bearing vibration signals. (a) Training dataset. (b) Testing dataset.

FIGURE 7. Time and frequency domains at different time points. (a) Time domain with different RUL. (b) Frequency domain with different RUL.

FIGURE 8. Training and testing errors over iterations using different features. (a) Time signal. (b) Frequency signal. (c) Feature vector.

FIGURE 9. RUL estimation using different features. (a) Time signal. (b) Frequency signal. (c) Feature vector.


D. EXPERIMENT RESULT

The FCAMN model is trained inheriting the parameters of the pre-trained A-MSCNN model. As the features are highly extracted in the first training step, their spatial distribution generalizes well to the distribution of the actual RUL. Moreover, the training of the whole model converges in a shorter time compared with other models. The training and testing errors over iterations are shown in Fig. 8. For time signals, the error tends to be stable after about 50 epochs and ultimately decreases to less than 100, with the testing error slightly higher.

In this paper, a cross-average method is adopted to obtain the final estimation. Given an input sequence $X = (x_1, \cdots, x_N)$, the output sequence $Y = (y_1, \cdots, y_N)$ can be estimated, where $x_i$ is the vector of the vibration signals containing 5120 points and $y_i$ is the prediction of the bearing RUL corresponding to $x_i$. Assuming that a window of length $l$ slides along $X$, a prediction list $P_j$ is defined as

$$P_j = (y_j^j, \cdots, y_{j+l}^j) \tag{30}$$

where $j$ is the sliding window's start position in the sequence $X$. In $y_{j+l}^j$, the superscript $j$ represents the $j^{th}$ window and the subscript $(j+l)$ represents the $(j+l)^{th}$ prediction in the prediction list of $Y$. Finally, the average value of the $y$ values with the same subscript is calculated as the prediction. The process is illustrated in Fig. 11.

FIGURE 11. Cross average strategy
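In code, the cross-average of Eq. (30) amounts to averaging the overlapping window predictions that refer to the same sample. The sketch below assumes a generic predict_fn and a window length of 100, matching the LSTM input length; both names are illustrative.

```python
import numpy as np

def cross_average(predict_fn, X, window=100):
    """Eq. (30): slide a window along X, collect a prediction list P_j for
    every start position j, and average predictions sharing a subscript."""
    n = len(X)
    sums, counts = np.zeros(n), np.zeros(n)
    for j in range(n - window + 1):
        preds = predict_fn(X[j:j + window])   # one RUL value per window position
        sums[j:j + window] += preds
        counts[j:j + window] += 1
    return sums / np.maximum(counts, 1)       # final per-sample estimate
```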
Another strategy adopted in this paper is capping the maximum RUL at 1200. From the plots of all training and testing signals, the degradation duration is less than 1200 even in the most stable run-to-failure process; therefore, time points before an RUL of 1200 can be treated as the normal stable working condition. With these two assumptions, the estimation results are shown in Fig. 9, where a moving-average smoothing strategy is applied for better observation. Besides, a more severe punishment is imposed on the loss function for overestimation, and Fig. 9 verifies the effectiveness of this punishment: the predictions of the bearing RUL are, on the whole, relatively lower than the actual values.

E. METHOD COMPARISON

In this part, several machine learning and deep learning methods are introduced for comparison with the proposed method on different testing datasets. Before presenting the results, a score function from the IEEE PHM 2012 challenge [24] is cited as a criterion for evaluating the effectiveness of the different methods.

First, the percent error of experiment $i$ is defined as

$$Er_i = 100 \cdot \frac{actRUL_i - predRUL_i}{actRUL_i} \tag{31}$$

where $actRUL_i$ is the actual RUL of the $i^{th}$ experiment and $predRUL_i$ is the RUL prediction of the $i^{th}$ experiment. Underestimation and overestimation are scored in a different manner: good estimates tend to predict less RUL than remains, while bad ones exceed the actual RUL. Considering these issues, the score is defined as

$$A_i = \begin{cases} \exp\left(-\ln(0.5) \cdot (Er_i/5)\right), & Er_i \le 0 \\ \exp\left(+\ln(0.5) \cdot (Er_i/20)\right), & Er_i > 0 \end{cases} \tag{32}$$

The final score is obtained by averaging the scores of all testing bearings:

$$Score = \frac{1}{5} \sum_{i=1}^{5} A_i \tag{33}$$

The mean absolute error (MAE) is defined as

$$MAE = \frac{1}{5} \sum_{i=1}^{5} \left|actRUL_i - predRUL_i\right| \tag{34}$$
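The three indicators of Eqs. (31)-(34) are straightforward to implement; the following sketch reproduces them for a set of test bearings, and the example below recovers the FCAMN row of Table V (score 0.4329, MAE 61.0).

```python
import numpy as np

def phm2012_indicators(act_rul, pred_rul):
    """Eqs. (31)-(34): percent error, asymmetric PHM 2012 score, and MAE."""
    act = np.asarray(act_rul, dtype=float)
    pred = np.asarray(pred_rul, dtype=float)
    er = 100.0 * (act - pred) / act                  # Eq. (31)
    a = np.where(er <= 0,
                 np.exp(-np.log(0.5) * er / 5.0),    # Er <= 0: overestimation
                 np.exp(np.log(0.5) * er / 20.0))    # Er > 0: underestimation
    return a.mean(), np.abs(act - pred).mean()       # Eqs. (33), (34)

# Example with the actual and FCAMN-predicted RULs of Table V:
score, mae = phm2012_indicators([965, 41, 999, 836, 1001],
                                [981, 32, 1151, 884, 1081])
print(round(score, 4), round(mae, 1))  # 0.4329 61.0
```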
For comparison, several methods are evaluated using the indicators above. The first is support vector regression (SVR), which treats underestimation and overestimation in the same manner. The RNN facilitates estimation by exploiting correlation and monotonicity. The MSCNN uses multi-scale features to extract local and global details for a better prediction of the bearing RUL.

1) COMPARISON WITH THE SAME FEATURE
The comparison results with the same time signals can be seen in Table V. The time signal is chosen because it contains complete details. In the table, the MAE shows the average deviation from the actual RUL, and the score is a comprehensive indicator of the effectiveness of the forecasts. The performance of the RNN is not satisfying because of its weak ability to extract features. The MSCNN makes a better prediction, but is not as stable as the proposed method. The whole-life predictions of these methods without the smoothing strategy are plotted in Fig. 12.

TABLE V
COMPARISON I: TIME SIGNAL

Dataset      Actual RUL(10s)   SVR RUL(10s)   RNN RUL(10s)   MSCNN RUL(10s)   FCAMN RUL(10s)
Bearing1_3   965               1134           1049           759              981
Bearing1_4   41                74             352            51               32
Bearing1_5   999               965            917            980              1151
Bearing1_6   836               1202           1141           1006             884
Bearing1_7   1001              1142           1116           1368             1081
MAE          0                 148.6          179.4          154.4            61.0
Score        1                 0.2242         0.2523         0.3027           0.4329


FIGURE 12. Bearing remaining useful life estimation of different methods using time signals

It can be seen that the MSCNN reconstructs the trend of the bearings' RUL with good performance, but the variance is rather high. The RNN predicts a relatively stable RUL trajectory, but the prediction has low precision. The proposed method makes a great improvement in prognostics.

2) COMPARISON WITH DIFFERENT FEATURES
In the experiment with the same feature, there may be cases where the input signal does not match a given model. Some classic models like the SVM may only be suitable for simple and concentrated features; in such cases, complex time signals might lead to bad results. In this part, several models are therefore trained with their best-performing features, and the results are shown in Table VI. The superscript in the header represents the type of input signal selected for training, in which "1" represents the original time signal, "2" the frequency signal and "3" the feature vector.

TABLE VI
COMPARISON II: DIFFERENT SIGNALS

Dataset      Actual RUL(10s)   SVR³ RUL(10s)   RNN³ RUL(10s)   MSCNN¹ RUL(10s)   FCAMN¹ RUL(10s)
Bearing1_3   965               1011            1023            759               981
Bearing1_4   41                61              28              51                32
Bearing1_5   999               1036            864             980               1151
Bearing1_6   836               1153            956             1006              884
Bearing1_7   1001              1097            1189            1368              1081
MAE          0                 103.2           102.8           154.4             61.0
Score        1                 0.2772          0.3209          0.3027            0.4329

From the table, it is clear that the proposed model performs best among all the models, even against manual feature engineering. This verifies that the FCAMN is qualified for different feature signals. Moreover, using this model on time-domain signals can achieve better estimation performance with less time and computing cost.

IV. DISCUSSION

The principle of the proposed model has been fully explained, and its effectiveness has been substantially verified in the previous sections. Nevertheless, there are also some future research directions to be studied.

1) The multiple scales are mainly reflected in only one layer. Although we apply an adaptive method, it is unclear how to form an optimal combination of multiple scales for the network structure. More trials need to be conducted for better results.

2) The raw time signals are imported into the network as input. As shown in Fig. 6, the signals contain strong noise that makes the input sequence rough and unstable, which may reduce the generalization ability of the proposed model. To solve this problem, some filtering processes can be added before the input layer to remove the noise.

3) In most research on bearing RUL prediction, time-domain signal data are not a good choice for training deep neural networks, because features are difficult to extract from time-domain data. Time-frequency representations are better choices because they can assimilate experience from the image convolution field.

4) The training is divided into two steps: firstly, the A-MSCNN model is pre-trained; secondly, the FCAMN applies the parameters from the pre-training process for further training. Classically, much time can be saved in the training step by applying this method. However, the pre-trained parameters may fluctuate greatly depending on the distribution of the input data, which might conversely increase the training time and even lead to over-fitting. A solution is to keep the previous part of the FCAMN frozen in the second step until the training error reaches a low threshold; then the "frozen" layers are "activated", and the parameters of the previous part join in the training process.

5) Currently, the bearings' operating condition is assumed to be constant. We will extend our method to predicting bearing RUL under different working conditions, either with a hybrid training method or by adjusting the model according to the working conditions.

V. CONCLUSION

In this paper, a novel FCAMN model for bearing RUL prediction is proposed. To deal with the challenge that traditional data-driven methods cannot predict the bearing RUL accurately, we presented a two-step training method which combines the adaptive MSCNN and the LSTM. The first part takes advantage of multiple scales so that global and local features can be extracted together. The second part utilizes long short-term memory cells, so that historical bearing conditions can be learned by the network for future predictions. The details of the proposed network are comprehensively described in the previous sections. The proposed model is validated on both the IEEE PHM and XJTU bearing datasets. Comparisons with other methods show that the FCAMN performs best among these approaches. Moreover, drawbacks and future directions are discussed.


REFERENCES

[1] A. K. S. Jardine, D. Lin, and D. Banjevic, "A review on machinery diagnostics and prognostics implementing condition-based maintenance," Mech. Syst. Signal Process., vol. 20, no. 7, pp. 1483–1510, 2006.
[2] R. Ahmad and S. Kamaruddin, "An overview of time-based and condition-based maintenance in industrial application," Comput. Ind. Eng., vol. 63, no. 1, pp. 135–149, 2012.
[3] T. H. Loutas, D. Roulias, and G. Georgoulas, "Remaining useful life estimation in rolling bearings utilizing data-driven probabilistic E-support vectors regression," IEEE Trans. Reliab., vol. 62, no. 4, pp. 821–832, 2013.
[4] J. Schwiesau et al., "Biotribology of alternative bearing materials for unicompartmental knee arthroplasty," Acta Biomater., vol. 6, no. 9, pp. 3601–3610, 2010.
[5] R. B. W. Heng and M. J. M. Nor, "Statistical analysis of sound and vibration signals for monitoring rolling element bearing condition," Appl. Acoust., vol. 53, no. 1–3, pp. 211–226, 2002.
[6] Y. Wang, Y. Peng, Y. Zi, X. Jin, and K. L. Tsui, "A two-stage data-driven-based prognostic approach for bearing degradation problem," IEEE Trans. Ind. Informatics, vol. 12, no. 3, pp. 924–932, 2016.
[7] B. Zhang, L. Zhang, J. Xu, and P. Wang, "Performance degradation assessment of rolling element bearings based on an index combining SVD and information exergy," Entropy, vol. 16, no. 10, pp. 5400–5415, 2014.
[8] X. S. Si, W. Wang, C. H. Hu, and D. H. Zhou, "Remaining useful life estimation – A review on the statistical data driven approaches," Eur. J. Oper. Res., vol. 213, no. 1, pp. 1–14, 2011.
[9] Y. Lei, N. Li, S. Gontarz, J. Lin, S. Radkowski, and J. Dybala, "A model-based method for remaining useful life prediction of machinery," IEEE Trans. Reliab., vol. 65, no. 3, pp. 1314–1326, 2016.
[10] A. Rai and S. H. Upadhyay, "Intelligent bearing performance degradation assessment and remaining useful life prediction based on self-organising map and support vector regression," Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci., vol. 232, no. 6, pp. 1118–1132, 2018.
[11] J. Ben Ali, B. Chebel-Morello, L. Saidi, S. Malinowski, and F. Fnaiech, "Accurate bearing remaining useful life prediction based on Weibull distribution and artificial neural network," Mech. Syst. Signal Process., vol. 56, pp. 150–172, 2015.
[12] X. Xi, M. Chen, and D. Zhou, "Remaining useful life prediction for degradation processes with memory effects," IEEE Trans. Reliab., vol. 66, no. 3, pp. 751–760, 2017.
[13] Y. Lei, N. Li, L. Guo, N. Li, T. Yan, and J. Lin, "Machinery health prognostics: A systematic review from data acquisition to RUL prediction," Mech. Syst. Signal Process., vol. 104, pp. 799–834, 2018.
[14] A. Voulodimos, N. Doulamis, A. Doulamis, and E. Protopapadakis, "Deep learning for computer vision: A brief review," Comput. Intell. Neurosci., vol. 2018, pp. 1–13, 2018.
[15] I. Shahin, M. Azzeh, K. Shaalan, I. Attili, and A. B. Nassif, "Speech recognition using deep neural networks: A systematic review," IEEE Access, vol. 7, 2019.
[16] G. Neubig, "Neural machine translation and sequence-to-sequence models: A tutorial," pp. 1–65, 2017.
[17] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao, "Deep learning and its applications to machine health monitoring: A survey," vol. 14, no. 8, pp. 1–14, 2016.
[18] T. Li, J. Wan, C. W. de Silva, T. Shu, Z. Wang, and M. Xia, "A two-stage approach for the remaining useful life prediction of bearings using deep neural networks," IEEE Trans. Ind. Informatics, 2018.
[19] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Comput., vol. 1, no. 4, pp. 541–551, 1989.
[20] W. Rawat, "Deep convolutional neural networks for image classification: A comprehensive review," Neural Comput., vol. 29, pp. 2352–2449, 2017.
[21] G. S. Babu, P. Zhao, and X. L. Li, "Deep convolutional neural network based regression approach for estimation of remaining useful life," Lect. Notes Comput. Sci., vol. 9642, pp. 214–228, 2016.
[22] W. Xiong, L. Zhang, B. Du, and D. Tao, "Combining local and global: Rich and robust feature pooling for visual recognition," Pattern Recognit., vol. 62, pp. 225–235, 2017.
[23] J. Zhu, N. Chen, and W. Peng, "Estimation of bearing remaining useful life based on multiscale convolutional neural network," IEEE Trans. Ind. Electron., vol. 66, no. 4, pp. 3208–3216, 2019.
[24] P. Nectoux et al., "PRONOSTIA: An experimental platform for bearings accelerated life test," in Proc. IEEE Int. Conf. Progn. Health Manag., 2012.
[25] L. Guo, N. Li, F. Jia, Y. Lei, and J. Lin, "A recurrent neural network based health indicator for remaining useful life prediction of bearings," Neurocomputing, vol. 240, pp. 98–109, 2017.
[26] A. Saxena and K. Goebel, "Turbofan engine degradation simulation data set," NASA Ames Prognostics Data Repository, 2017.
[27] A. Saxena, K. Goebel, D. Simon, and N. Eklund, "Damage propagation modeling for aircraft engine run-to-failure simulation," in Proc. Int. Conf. Prognostics and Health Management, 2008, pp. 1–9.
[28] A. Graves, "Long short-term memory," in Supervised Sequence Labelling with Recurrent Neural Networks, Springer, 2012, pp. 37–45.
[29] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. The MIT Press, 2016.
