Real-Time Bearing Remaining Useful Life Estimation Based on the Frozen Convolutional and Activated Memory Neural Network
Xiaotong Tu (Xiamen University), Yue Hu (Shanghai Jiao Tong University), Fucai Li (Shanghai Jiao Tong University), et al.
ABSTRACT Bearings are widely used in rotating machinery such as aircraft engines and wind turbines. In this paper, we propose a new data-driven method, the frozen convolution and activated memory network (FCAMN), for bearing remaining useful life (RUL) estimation based on deep neural networks. The proposed method is composed of two parts: first, a multi-scale convolutional neural network is pre-trained on the raw data to directly extract global and local features; second, a convolutional-memory neural network connects the convolutional layers with long short-term memory (LSTM) layers to predict the continuous bearing RUL. Compared with traditional networks, the proposed network can additionally extract both the global and local information on the vertical feature axis and the associated context information on the horizontal time axis. Experiments show that the proposed method requires fewer training samples and outperforms other methods in RUL estimation.

INDEX TERMS Bearings, remaining useful life estimation, multi-scale convolutional network, long short-term memory neural network.
[9] proposed a novel approach, which combines the weighted minimum quantization error and a particle-filtering-based algorithm to construct the degradation process. Simultaneously, Rai et al. [10] presented a new health indicator using the self-organizing map with support vector regression to predict bearing RUL. Ben et al. [11] explored the Simplified Fuzzy Adaptive Resonance Theory Map (SFAM) using the Weibull distribution to match measurements and to avoid time-domain fluctuation. Xi et al. [12] also established a new degradation process with memory using fractional Brownian motion to compensate for the drawback of the memoryless Markovian process. A recent review presented some other machine learning methods [13].

The third class is deep learning, which has made great progress in computer vision [14], speech recognition [15], machine translation [16] and diagnostics [17]. However, there are few references in the field of RUL prediction, and these approaches seldom reach the expected performance on the prognostic task. Li et al. [18] presented a two-stage method which combines a denoising-autoencoder-based deep neural network and a shallow regression neural network to obtain the final RUL. Another representative deep learning model is the convolutional neural network (CNN), first proposed by LeCun et al. [19] and used for image classification on the MNIST dataset. Benefiting from the strong power of the CNN [20], researchers have made great efforts to deal with CBM problems in diagnosis work, but prognosis work is relatively lacking. Babu et al. [21] applied convolutional and pooling filters along the dimensions of multivariate time series and predicted the bearing RUL with a CNN-based regression approach. Besides, Wei et al. [22] combined local and global pooling methods to better extract rich and robust representations from the sparse feature maps learned from raw data in pattern recognition. Inspired by the multi-scale method, Zhu et al. [23] presented a new multi-scale convolutional neural network (MSCNN), which takes the time-frequency representation (TFR) of the raw time signals and successfully extracts non-stationary features. After obtaining the TFR, a multi-layer structure which integrates the last convolutional layer with some previous pooling layers is employed to train this regression task. The MSCNN's effectiveness has been validated on the IEEE PHM 2012 Data Challenge Set [24]. The other representative deep learning method is the recurrent neural network (RNN). Since the state of the bearing evolves as a continuous process, the RNN can be adopted to establish the context along the time axis. Guo et al. [25] selected the main features from time-frequency features with correlation and monotonicity algorithms, and then proposed the RNN-HI model to estimate the bearing RUL. Gugulothu et al. [26] used the RNN to generate embeddings for multivariate time series and verified the method on turbofan engine datasets [27] from the NASA Ames Prognostics Data Repository.

Owing to the advantages of both the CNN and the RNN, a new method that possesses both of their capabilities appears likely to perform better in predicting bearing RUL. To fill this research gap, the frozen convolution and activated memory network (FCAMN) approach is proposed, which contains a multi-scale convolutional-memory network. The convolution part extracts primary features, and the second part learns these features to predict the continuous bearing RUL.

The rest of this paper is organized as follows. Section II provides the methodology of the proposed method. Section III verifies the proposed method using several bearing run-to-failure datasets. Section IV discusses future research work, and Section V concludes the paper.

A. NETWORK STRUCTURE
As described in the first section, the proposed model contains an adaptive multi-scale convolutional neural network (A-MSCNN) and a CNN-LSTM network. The overall flowchart is shown in Fig. 1.

FIGURE 1. Flowchart of the proposed network structure

The training process is divided into two steps. Firstly, the A-MSCNN is pre-trained, which aims to extract detailed features from the raw vibration signals. In this step, the data imported into the input layer can be the raw time series, pre-processed frequency series or TFR series. The parameters of the convolutional layers, pooling layers and fully connected layers are randomly initialized, since this model is prepared for pre-training. The output layer of the A-MSCNN gives out the dense_layer_2 (the second fully connected layer, FC) values, which are compared with the labels to start backpropagation. After pre-training, the A-MSCNN model is "frozen" in the graph model, which means the model structure and the weight parameters are fixed as constants.
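For concreteness, the two-step "freeze and activate" scheme can be sketched in PyTorch. This is a minimal illustration rather than the authors' implementation: the class name AMSCNN and the training details are assumptions, the multi-scale kernel layer is omitted, and only the layer sizes stated in Table I are carried over.

```python
import torch
import torch.nn as nn

class AMSCNN(nn.Module):
    """Simplified stand-in for the A-MSCNN (multi-scale kernel layer omitted)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=8, padding='same'), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=8, padding='same'), nn.ReLU(),
            nn.MaxPool1d(4, stride=8),                  # length 2560 -> 320
            nn.Conv1d(64, 128, kernel_size=8, padding='same'), nn.ReLU(),
            nn.MaxPool1d(4, stride=8),                  # length 320 -> 40
            nn.Flatten(),
        )
        self.dense_layer_1 = nn.Linear(128 * 40, 50)    # first FC layer (50 features)
        self.dense_layer_2 = nn.Linear(50, 1)           # pre-training output head

    def forward(self, x):
        f = torch.relu(self.dense_layer_1(self.features(x)))
        return self.dense_layer_2(f)

# Step 1: pre-train the A-MSCNN against the RUL labels, then "freeze" it.
model = AMSCNN()
x = torch.randn(16, 2, 2560)    # dummy batch of raw two-channel segments
y = torch.rand(16, 1)           # dummy normalized RUL labels
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):              # abbreviated pre-training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
for p in model.parameters():    # "frozen": structure and weights become constants
    p.requires_grad = False

# Step 2 ("activation"): the 50-dimensional dense_layer_1 output becomes the
# input of the CNN-LSTM part, which is trained while the A-MSCNN stays fixed.
def frozen_features(x):
    with torch.no_grad():
        return torch.relu(model.dense_layer_1(model.features(x)))

print(frozen_features(x).shape)  # torch.Size([16, 50])
```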
In the second step, the model's dense_layer_1 (the first fully connected layer, FC) is connected to the CNN-LSTM model. The model saved in the first step is loaded, and the pre-trained network parameters are imported into the front part of the FCAMN model at the same time, corresponding to the "activation". In this step, the dense_layer_1 value becomes the input of the CNN-LSTM model. In the second training process, the FCAMN model is trained through the A-MSCNN, the CNN, the LSTM and the FC, and finally the bearing RUL prediction is given by the last FC layer. An end-to-end deep learning approach is thus advocated, which requires no expert knowledge. In the following parts of this section, the model structure and the mathematical derivation are described.

B. PRE-TRAINED MULTI-SCALE CONVOLUTION NETWORK
The A-MSCNN is used for extracting features from the raw time signals. After this process, the A-MSCNN model can learn highly concentrated features, which helps the overall model fit quickly and reduces the difficulty of generalizing the whole model. Besides, an adaptive multi-scale strategy is reflected in adjusting the scales automatically to achieve the optimal results. Considering the original time vibration signals, the kernel size is set to 8 after trials with different values from 4 to 32. The brief network architecture is presented in Table I.

TABLE I
LAYER DETAILS OF THE A-MSCNN ARCHITECTURE
Layer 1: input (channels=2, sequence_length=2560)
Layer 2: 1D-convolution (filters=32, kernel_size=8, strides=1, padding=same, act=ReLU); multi-scale pooling (method=grid_search, target_scales=3)
Layer 3: 1D-convolution (filters=64, kernel_size=8, strides=1, padding=same, act=ReLU); 1D-max-pooling (pool_size=4, strides=8, padding=same)
Layer 4: 1D-convolution (filters=128, kernel_size=8, strides=1, padding=same, act=ReLU); 1D-max-pooling (pool_size=4, strides=8, padding=same)
Layer 5: fully-connected (layer_size=50, act=ReLU); dropout (keep_prob=0.8)
Layer 6: fully-connected (layer_size=1, act=None)
Layer 7: output (bearing RUL prediction, shape=1, value=Layer 6)

The A-MSCNN model is composed of multi-scale kernel layers, convolutional layers, down-sampling layers and fully connected layers. The multi-scale kernel layer is the core of the network and is devoted to extracting features of different time scales. The function of the convolutional layer is to abstract low-level features of the signals into higher-level features. The down-sampling layer aims to reduce the dimension of the features and enhance the generalization ability of the model, and the fully connected layer plays a comprehensive decision-making role.

1) The Multi-scale Kernel Feature Extraction Layer:
To overcome the insensitivity of the one-dimension convolutional neural network to time series, a multi-scale kernel layer structure is proposed. Firstly, the feature extraction machine is applied to some front-layer features; the primary scale features are then extracted into secondary-scale segmented features by certain rules. In this paper, intervals from 1 to $M$ are applied to the feature sequences, where $M$ represents the number of scales used in the network. Then, the same convolution kernel is convolved with each secondary feature sequence of a certain layer to obtain features of different time scales. Finally, all the secondary feature sequences inherited from the same primary feature are joined together in turn to form a multi-scale feature.

Jump convolution is defined as

$$Jconv1D(w_k^{l-1}, s_i^{l-1}) = \begin{cases} conv1D(w_{1k}^{l-1}, s_{i1}^{l-1}) \\ \cdots \\ conv1D(w_{jk}^{l-1}, s_{ij}^{l-1}) \\ \cdots \\ conv1D(w_{M^{l-1}k}^{l-1}, s_{iM^{l-1}}^{l-1}) \end{cases} \quad (1)$$

$$M^{l-1} = GridSearch\left(1, \left\lfloor \frac{Length(s_i^{l-1}) - 1}{Length(w_k^{l-1}) - 1} \right\rfloor\right) \quad (2)$$

where $s_i^{l-1}$ is the $i$th feature of layer $l-1$; $w_k^{l-1}$ is the $k$th convolution kernel of layer $l-1$; $s_{ij}^{l-1}$ is the $j$th-scale secondary feature inherited from the primary feature $s_i^{l-1}$ by setting the interval parameter to $j$; $w_{jk}^{l-1}$ is the convolution weight corresponding to $s_{ij}^{l-1}$ and has the same parameters as $w_k^{l-1}$; $conv1D$ represents the one-dimension convolution operation; $M^{l-1}$ is the number of different secondary scales of layer $l-1$; $\lfloor \cdot \rfloor$ refers to the rounding-down operation, and $GridSearch$ represents a linear search method for finding the optimal parameter.

According to Fig. 2, the forward propagation for multi-scale kernel layers between layer $l$ and $l-1$ can be expressed as

$$x_k^l = b_k^l + \sum_{i=1}^{N^{l-1}} Jconv1D(w_k^{l-1}, s_i^{l-1}) \quad (3)$$

$$y_k^l = f(x_k^l) \quad (4)$$

where $K$ shows that layer $l-1$ has $K$ kinds of convolution kernels; $b_k^l$ represents the overall bias of the neuron; $N^{l-1}$ is the number of features of layer $l-1$; $x_k^l$ refers to the input of the $k$th neuron of layer $l$, and $y_k^l$ indicates the activated output value of that neuron of layer $l$.

Subsequently, a max-pooling layer is employed for the down-sampling operation, which is written as

$$s_k^l = y_k^l \downarrow ss \quad (5)$$

where $s_k^l$ is the output value after the max-pooling processing and $\downarrow ss$ represents the down-sampling operation. In this paper, all down-sampling layers refer to max-pooling layers.

2) The Convolution Feature Extraction Layer:
The overall structure of this layer and the calculation process are shown in Fig. 3. In this layer, the forward propagation between layer $l$ and $l-1$ is expressed as

$$x_k^l = b_k^l + \sum_{i=1}^{N^{l-1}} conv1D(w_{ik}^{l-1}, s_i^{l-1}) \quad (6)$$
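A minimal NumPy sketch may clarify the jump convolution of Eqs. (1) and (2). It assumes that "taking the interval $j$" means subsampling the primary feature with step $j$, which is our reading of the definitions above; the function names are illustrative.

```python
import numpy as np

def max_scale(w, s):
    # Upper bound on the number of scales, Eq. (2): the largest interval
    # that still leaves the subsampled sequence longer than the kernel.
    return max(1, (len(s) - 1) // (len(w) - 1))

def jconv1d(w, s, M=None):
    # Jump convolution, Eq. (1): the same kernel w is convolved with the
    # secondary sequences s[::j] for j = 1..M, and the results are joined.
    M = M or max_scale(w, s)
    return np.concatenate([np.convolve(s[::j], w, mode='valid')
                           for j in range(1, M + 1)])

# Example: a length-2560 feature and a length-8 kernel, as in Table I,
# with the three target scales used by the multi-scale pooling layer.
s = np.random.randn(2560)
w = np.random.randn(8)
print(jconv1d(w, s, M=3).shape)   # concatenated multi-scale feature
```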
LAYER DETAILS OF THE CNN-LSTM ARCHITECTURE (CONTINUED)
…: 1D-max-pooling (pool_size=2, strides=2, padding=same)
Layer 10: 1D-convolution (filters=72, kernel_size=2, strides=1, padding=same, act=ReLU); 1D-max-pooling (pool_size=2, strides=2, padding=same)
Layer 11: fully-connected (layer_size=50, act=ReLU); dropout (keep_prob=0.8)
Layer 12: LSTM-cell (layer_num=2, cell_num=150); dropout (keep_prob=0.8)
Layer 13: fully-connected (layer_size=50, act=ReLU); dropout (keep_prob=0.8)
Layer 14: fully-connected (layer_size=1, act=None); dropout (keep_prob=0.8)
Layer 15: output (bearing RUL prediction, shape=1, value=Layer 14)

For convolutional layers and multi-scale kernel layers, the gradient matrix is given as

$$\delta^l = up(\delta^{l+1}) \mathbin{.*} f'_{relu}(x^l) \quad (16)$$

The gradient matrix of pooling layers is given as

$$\delta^l = conv(\delta^{l+1}, rot180(w^{l+1})) \quad (17)$$

where $\delta^l$ is the gradient matrix of layer $l$; $w^{l+1}$ is the weight matrix between layer $l$ and layer $l+1$, and $rot180$ indicates that the matrix is rotated by 180 degrees. Finally, the derivatives with respect to the weights and biases can be calculated according to Fig. 3.
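The gradient relations (16) and (17) can be sketched in one dimension as follows. The $up()$ operator is assumed here to be a nearest-neighbour expansion by the pooling size, which the text does not define, so this is an interpretation rather than the paper's exact operator.

```python
import numpy as np

def relu_grad(x):
    # Derivative of ReLU evaluated at the pre-activation x.
    return (x > 0).astype(float)

def delta_eq16(delta_next, x, pool_size):
    # Eq. (16): up-sample the next layer's gradient to the length of x,
    # then gate it elementwise (the .* operator) with the ReLU derivative.
    up = np.repeat(delta_next, pool_size)[:len(x)]
    return up * relu_grad(x)

def delta_eq17(delta_next, w_next):
    # Eq. (17): convolve the next layer's gradient with the kernel
    # rotated by 180 degrees (a simple flip in the 1-D case).
    return np.convolve(delta_next, w_next[::-1], mode='full')

# Example shapes only; real layers would also carry channel dimensions.
print(delta_eq16(np.random.randn(50), np.random.randn(200), 4).shape)  # (200,)
print(delta_eq17(np.random.randn(100), np.random.randn(8)).shape)      # (107,)
```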
The core of this model is the LSTM, one of the most effective gated RNNs, whose gradients neither vanish nor get stuck in explosive growth. Different from traditional RNNs, the weights of the LSTM's self-loop are determined by the context. Gates are used to control these weights so that the cumulative time scales can be changed dynamically. The basic framework is shown in Fig. 4.

$$h_t = o_t \tanh(c_t) \quad (28)$$

where $\sigma$ is the logistic sigmoid operator; $\tanh$ is the hyperbolic tangent operator; $i_t$ denotes the input gate; $f_t$ denotes the forget gate; $c_t$ denotes the cell activation vector, and $o_t$ denotes the output gate. $W_{ix}$, $W_{ih}$, $W_{fx}$, $W_{fh}$, $W_{cx}$, $W_{ch}$, $W_{ox}$ and $W_{oh}$ are weight vectors, which are defined in the same way as $W_{hx}$, $W_{hh}$ and $W_{yh}$. $b_i$, $b_f$, $b_c$ and $b_o$ are biases with definitions similar to $b_h$ and $b_y$. All the above formulas correspond to time $t$ or time $t-1$ depending on the subscripts.

In the CNN-LSTM model, the input of the LSTM layers has the shape 100×50. The number 100 indicates that 100 vectors are fed into the LSTM layer along the time axis at a time, and the number 50 indicates that each input vector has 50 features, which are extracted from the A-MSCNN network. A comprehensive report on the backpropagation process can be found in [28].

Because the whole model is complex, some regularization techniques are adopted to prevent over-fitting. Dropout is one of the most efficient ways to deal with this issue. In addition, a parameter-norm penalty is simultaneously taken into consideration for better performance. In this paper, the L2-norm penalty is chosen for all layers, and the weight value is set to 0.1.

FIGURE 5. Bearing experiment platform (AC motor, speed sensor, speed reducer, torquemeter, accelerometers)
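The intermediate gate equations preceding Eq. (28) can be written, under the standard LSTM formulation implied by the weight names above, as the following NumPy sketch; the dictionary-based parameter layout is an illustrative assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # Standard LSTM cell update consistent with the gate and weight
    # definitions above; only Eq. (28), the last line, appears in the text.
    i_t = sigmoid(W['ix'] @ x_t + W['ih'] @ h_prev + b['i'])  # input gate
    f_t = sigmoid(W['fx'] @ x_t + W['fh'] @ h_prev + b['f'])  # forget gate
    g_t = np.tanh(W['cx'] @ x_t + W['ch'] @ h_prev + b['c'])  # cell candidate
    o_t = sigmoid(W['ox'] @ x_t + W['oh'] @ h_prev + b['o'])  # output gate
    c_t = f_t * c_prev + i_t * g_t                            # cell state
    h_t = o_t * np.tanh(c_t)                                  # Eq. (28)
    return h_t, c_t

# 50-dimensional inputs and a 150-unit cell, as stated in the text above.
n_in, n_h = 50, 150
rng = np.random.default_rng(0)
W = {k: 0.1 * rng.standard_normal((n_h, n_in if k.endswith('x') else n_h))
     for k in ('ix', 'ih', 'fx', 'fh', 'cx', 'ch', 'ox', 'oh')}
b = {k: np.zeros(n_h) for k in 'ifco'}
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h), W, b)
print(h.shape)  # (150,)
```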
C. FEATURE ENGINEERING
To compare the applicability of the model to various input signals, feature engineering is taken into consideration. Feature engineering transforms raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.

FIGURE 10. Time domain feature description of training data

In this paper, three types of input signals are applied. The original time signal is the first choice, for it contains all the information of the bearing data. As the bearing vibration signals are non-stationary complex data, frequency analysis is simultaneously considered as a second way to describe the vibration characteristics. In Part B, the time and frequency representations are shown in Fig. 7. It should be pointed out that a low-pass filter is used in the Fourier transform process to remove the noise and useless components.

In the time-frequency domain, wavelet packet decomposition is applied to the time signal using a 6th-order Daubechies wavelet with seven-layer decomposition. Then the energy value of each sub-band is used as the feature vector, which has 64 elements and is given by

$$E_{i,j} = \sum_k p_{i,j,k}^2 \quad (29)$$

where $E_{i,j}$ is the energy value of node $i$ in layer $j$ and $p_{i,j,k}$ represents the corresponding coefficient of the wavelet packet transform.
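A sketch of the sub-band energy feature of Eq. (29) using PyWavelets follows. Note that a seven-layer decomposition would produce 128 terminal nodes, so the sketch assumes six levels, which matches the stated 64-element feature vector; the wavelet and the segment length are taken from the text.

```python
import numpy as np
import pywt

def wpt_energy_features(x, wavelet='db6', level=6):
    # Wavelet packet decomposition of one vibration segment; the energy of
    # each terminal node is one feature (Eq. 29). 2**6 = 64 sub-bands.
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet,
                            mode='symmetric', maxlevel=level)
    nodes = wp.get_level(level, order='natural')
    return np.array([np.sum(node.data ** 2) for node in nodes])

features = wpt_energy_features(np.random.randn(2560))
print(features.shape)   # (64,)
```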
D. EXPERIMENT RESULT
The FCAMN model is trained by inheriting the parameters of the pre-trained A-MSCNN model. As the features are highly extracted in the first training step, their spatial distribution generalizes well to the distribution of the actual RUL. Moreover, the training of the whole model converges in a shorter time compared with other models. The training error and the testing error over the iterations are shown in Fig. 9. For time signals, the error tends to be stable after about 50 epochs and ultimately decreases to less than 100, with the testing error a little higher.

In this paper, a cross-average method is adopted to obtain the final estimation. Given an input sequence $X = (x_1, \cdots, x_N)$, the output sequence $Y = (y_1, \cdots, y_N)$ can be estimated, where $x_i$ is the vector of vibration signals containing 5120 points and $y_i$ is the prediction of the bearing RUL corresponding to $x_i$. Assuming that a window of length $l$ slides along $X$, a prediction list $P_j$ is defined as

$$P_j = (y_j^j, \cdots, y_{j+l}^j) \quad (30)$$

where $j$ is the sliding window's start position in the sequence $X$. In $y_{j+l}^j$, the superscript $j$ represents the $j$th window and the subscript $(j+l)$ represents the $(j+l)$th prediction in the prediction list of $Y$. Finally, the average value of the $y$ with the same subscript is calculated as the prediction. The process is illustrated in Fig. 11.

FIGURE 11. Cross average strategy

Another strategy adopted in this paper is setting the maximum RUL to 1200. From the plots of all training and testing signals, the degradation duration is less than 1200 even in the most stable run-to-failure process. Therefore, the time points before an RUL of 1200 can be treated as the normal stable working condition. With these two assumptions, the estimation results are shown in Fig. 9. A moving-average smoothing strategy is applied for better observation. Besides, a more severe punishment is imposed on the loss function in the case of overestimation. Fig. 9 verifies the effectiveness of this punishment operation: the predictions of the bearing RUL are, on the whole, lower than the actual ones.

E. METHOD COMPARISON
In this part, several machine learning and deep learning methods are introduced for comparison with the proposed method on different testing datasets. Before presenting the results, a score function from the IEEE PHM 2012 challenge [24] is cited as a criterion for evaluating the effectiveness of the different methods.

First, the percent error of experiment $i$ is defined as

$$Er_i = 100 \cdot \frac{actRUL_i - predRUL_i}{actRUL_i} \quad (31)$$

where $actRUL_i$ is the actual RUL of the $i$th experiment and $predRUL_i$ is the RUL prediction of the $i$th experiment. Underestimation and overestimation are considered in different manners: good estimates tend to predict less than the actual RUL, while bad ones exceed it. Considering these issues, the score is defined as

$$A_i = \begin{cases} \exp(-\ln(0.5) \cdot (Er_i/5)), & Er_i \le 0 \\ \exp(+\ln(0.5) \cdot (Er_i/20)), & Er_i > 0 \end{cases} \quad (32)$$

The final score is obtained by averaging the scores of all testing bearings, which is defined as

$$Score = \frac{1}{5} \sum_{i=1}^{5} A_i \quad (33)$$

The mean absolute error (MAE) is defined as

$$MAE = \frac{1}{5} \sum_{i=1}^{5} |actRUL_i - predRUL_i| \quad (34)$$

For comparison, several methods are applied using the indicators above. The first one is support vector regression (SVR), which treats underestimation and overestimation in the same manner. The RNN facilitates estimation by benefiting from feature correlation and monotonicity. The MSCNN uses multi-scale features to extract local and global details to make a better prediction of the bearing RUL.

1) COMPARISON WITH THE SAME FEATURE
The comparison results with the same time signals can be seen in Table V. The time signal is chosen because it contains complete details. In the table, the MAE shows the average deviation from the actual RUL, and the score is a comprehensive indicator of the effectiveness of the forecasts. The performance of the RNN is not that satisfying because of its feeble ability to extract features. The MSCNN makes a better prediction, but is not as stable as the proposed method. The whole-life predictions of these methods without the smoothing strategy are plotted in Fig. 12.

TABLE V
COMPARISON I: TIME SIGNAL

Dataset      Actual RUL(10s)  SVR RUL(10s)  RNN RUL(10s)  MSCNN RUL(10s)  FCAMN RUL(10s)
Bearing1_3   965              1134          1049          759             981
Bearing1_4   41               74            352           51              32
Bearing1_5   999              965           917           980             1151
Bearing1_6   836              1202          1141          1006            884
Bearing1_7   1001             1142          1116          1368            1081
MAE          0                148.6         179.4         154.4           61.0
Score        1                0.2242        0.2523        0.3027          0.4329
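For reference, the cross-average strategy of Eq. (30) and the evaluation metrics of Eqs. (31)-(34) can be sketched as follows; the window bookkeeping in cross_average is our reading of Fig. 11. Applied to the FCAMN column of Table V, the metric function reproduces the reported score of 0.4329 and MAE of 61.0.

```python
import numpy as np

def cross_average(window_preds):
    # Eq. (30) and Fig. 11: every sliding window j yields a prediction
    # list P_j; predictions referring to the same time index are averaged.
    n = len(window_preds) - 1 + len(window_preds[-1])
    acc, cnt = np.zeros(n), np.zeros(n)
    for j, p in enumerate(window_preds):
        acc[j:j + len(p)] += p
        cnt[j:j + len(p)] += 1
    return acc / cnt

def phm_score_and_mae(act, pred):
    # Percent error (31), asymmetric score (32)-(33) and MAE (34).
    act, pred = np.asarray(act, float), np.asarray(pred, float)
    er = 100.0 * (act - pred) / act
    a = np.where(er <= 0,
                 np.exp(-np.log(0.5) * er / 5.0),   # overestimation branch
                 np.exp(np.log(0.5) * er / 20.0))   # underestimation branch
    return a.mean(), np.abs(act - pred).mean()

# FCAMN column of Table V against the actual RULs:
score, mae = phm_score_and_mae([965, 41, 999, 836, 1001],
                               [981, 32, 1151, 884, 1081])
print(round(score, 4), round(mae, 1))   # 0.4329 61.0, matching Table V
```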