6130 IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 9, NO. 7, JULY 2024

Feature-Enhanced Multisource Subdomain Adaptation on Robust Remaining Useful Life Prediction
Hsuan-Wen Lu and Chia-Yen Lee, Senior Member, IEEE

Abstract: In prognostic and health management (PHM), remaining useful life (RUL) prediction is one of the key tasks. However, a complete run-to-failure record (with label) is not always available on some specific machines, for example, a new machine. To address the issue, the new machine may refer to other machines of the same or similar type (even old machines with labels) for developing the prediction model. Transfer learning, particularly domain adaptation, is one methodology used to transfer the knowledge gained from the source domain to the target domain. This study proposes feature-enhanced multi-source subdomain adaptation (FEMSA) to predict the RUL. FEMSA learns the domain-invariant features and characterizes the similarity by redefining the multiple source domains; that is, we handle the cross-domain generalization by reformulating the multiple operating conditions. In the experimental study, two datasets are applied to validate the proposed FEMSA, and the result shows that FEMSA can provide more robust RUL prediction over time and thus improve the PHM system.

Index Terms: Prognostic and health management, transfer learning, domain adaptation, predictive maintenance, data science.

Manuscript received 11 December 2023; accepted 2 May 2024. Date of publication 13 May 2024; date of current version 20 May 2024. This letter was recommended for publication by Associate Editor A. Chehade and Editor C. Yan upon evaluation of the reviewers' comments. This work was supported by the Ministry of Science and Technology under Grant MOST111-2221-E-002-197, Taiwan. (Corresponding author: Chia-Yen Lee.)
Hsuan-Wen Lu is with the Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan 70101, Taiwan (e-mail: [email protected]).
Chia-Yen Lee is with the Department of Information Management, National Taiwan University, Taipei 10617, Taiwan, and also with the Institute of Manufacturing Information and Systems, National Cheng Kung University, Tainan 701, Taiwan (e-mail: [email protected]).
Digital Object Identifier 10.1109/LRA.2024.3400160
2377-3766 © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Center for Science Technology and Information (CESTI). Downloaded on June 12, 2024 at 02:38:57 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION
WITH the digital transformation of the manufacturing industry, process capability, machining accuracy, and product reliability are becoming more sophisticated [1]. To enhance core competence, prognostic and health management (PHM) has been introduced and implemented in recent years. The maintenance strategy is one key activity to improve equipment availability and reduce maintenance costs as well. In fact, equipment maintenance accounts for 15% to 70% of total production costs [2].
The typical strategies include run-to-failure maintenance (also known as reactive maintenance, RM), preventive maintenance (PM), condition-based maintenance (CBM), and predictive maintenance (PdM), which makes the decision according to the predicted condition of equipment and allows flexibility in scheduling [3]. PdM supports reducing the unnecessary maintenance associated with PM costs and preventing the unexpected breakdown associated with RM repair costs. Particularly, the remaining useful life (RUL) prediction, which estimates the lifetime before a piece of equipment fails, is one of the most challenging tasks in PdM. Thus, accurate prediction of RUL makes it possible to evaluate the health state of equipment, plan maintenance actions, and improve the PdM [4], [43].
The techniques of RUL prediction can be roughly divided into model-based and data-driven approaches [4]. The model-based techniques tend to build a physical model that is associated with degradation [5]. Data-driven techniques, which do not require much prior knowledge related to the specific equipment and extract the degradation features from the historical data, have become more popular recently. Many machine learning methods were proposed to characterize the degradation process. For example, a tree-based method with bagging technique called random forest was developed and used for RUL prediction of spur gears failure mode [6]. The deep learning-based methods with diverse neural network (NN) structures were proposed to classify the health status of equipment and then predict RUL in a two-stage approach [7]. Moreover, the collaborative features between failure mode and RUL prediction were captured by using convolutional neural network (CNN) [8]. In addition, the estimation of RUL can also be treated as a time series forecast, and thus the recurrent neural network (RNN), such as long short-term memory (LSTM), is commonly used for degradation analysis [9].
The training and testing datasets are assumed to be observed from the same (or comparable) distributions in these supervised approaches, and they contain features that can be used to identify the traits of real behavior for prediction. In actuality, though, model prediction performance may suffer greatly from changes in operating settings, especially when it comes to the new status, as the models may have been trained under particular, predetermined conditions. Thus, transfer learning (TL) was introduced to PHM. Transfer learning aims to mitigate the cross-domain generalization problem by minimizing data distribution disparities between the source and target domains [10]. The primary purpose of TL is to extract domain-invariant features from which the target domain can be estimated using a regression-type model constructed in the source domain. A branch of TL called domain adaptation (DA) is frequently applied to visual and robotic tasks. To improve the segmentation network's generalization capacity, [11] suggested
a multi-spectral unsupervised domain adaptation for use in thermal image semantic segmentation. [12] addressed the reality gap by transferring deep reinforcement learning strategies from simulated environments to the real-world domain for visual control tasks. To accomplish realistic domain adaptation for grasp pose detection, [13] employed a brand-new Grasp Pose Domain Adaptation Network (GPDAN).
For the RUL prediction, [14] proposed a framework to detect state change time and estimate RUL by multilayer perceptron, which is combined with TL. Domain adversarial neural networks (DANN) [15] combined domain adaptation and feature learning to extract domain-invariant features for prediction. [16] integrated the LSTM and DANN to adapt the target domain, which only contains sensor information, and estimate the RUL of aircraft gas turbine engines. To learn the domain-invariant features, CNN was employed as a feature extractor as well. Cheng et al. [17] proposed a transferable convolutional neural network (TCNN) that used the CNN to extract the feature of degradation and then employed the multiple-kernel maximum mean discrepancies to reduce the difference between distributions. Contrastive adversarial domain adaptation (CADA) [18] focused on target-specific information when learning domain-invariant features at the same time.
However, most of the existing methods for RUL prediction based on transfer learning only use single-source domain adaptation (SDA). The historical data was observed for only a single operating condition to generalize the target condition. In practice, however, a complete run-to-failure record (with label) is scarce and difficult to reproduce. We usually collect data from different operating conditions on different equipment. These data may differ not only from the target domain but also from themselves (i.e., different distributions in the source domain). That is, treating all source domains as one single domain and applying the SDA methods may not be appropriate in this case. Due to the limitation of SDA, the multi-source domain adaptation (MDA) was introduced to address the diverse source domain and gain the common features applicable to target domain. Previous studies applied MDA and proposed the framework to reduce the feature distribution discrepancy between the target domain and each source domain to improve fault diagnosis or RUL prediction [3], [19].
The previous MDA studies usually focused on aligning the domains globally, which may ignore the fine-grained information in each source domain [20]. Besides the subdomain issue, feature engineering and selection are more critical in MDA. Instead of the usual feature selection process, neural networks were frequently used in DA investigations as feature extractors. Although the neural network is powerful, it is also a computational burden. In fact, quite a few studies have applied MDA to fault diagnosis, but there is a gap about how to use run-to-failure data from multiple sources to improve RUL prediction under a new working condition.
To address the limitation, this study proposes feature-enhanced multi-source subdomain adaptation (FEMSA), which redefines the source subdomains and enhances feature extraction through data science techniques. We also propose a new loss function that reduces the domain discrepancy and minimizes the source domain regression error simultaneously. We employ two public datasets to validate the robustness of the RUL prediction, and also consider the concept drift to trigger the model retraining mechanism of FEMSA for the streaming data collected from a real system.
The remainder of this study is organized as follows: Section II presents the fundamentals and methodology, and Section III introduces the FEMSA framework. Section IV uses two datasets including 2012 PHM challenge bearing data and XJTU-SY bearing data to verify the proposed FEMSA. Section V concludes.

II. FUNDAMENTALS AND METHODOLOGIES
This section introduces the MDA, subdomain adaptation and DA for PHM. These techniques are used to construct the proposed FEMSA framework.

A. Multi-Source Domain Adaptation
DA methods usually contain a source domain with ground truth and a target domain without true labels. DA was proposed to extract domain-invariant knowledge from the source domain to infer the label of the target domain. Since the observations in the source and target domains may follow different distributions, the key task is to find the common features to reduce the difference between them. Maximum mean discrepancy (MMD) [21], Wasserstein distance [22], and correlation alignment (CORAL) loss [23] are metrics used for DA assessment. Besides single source domain, the data is often collected from complex and multiple sources. It means that the difference between the source domains should be considered, i.e., MDA. Rezaeianjouybari & Shang [19] transferred the knowledge from multi-source domains into a single target domain by reducing the sliced Wasserstein discrepancy [24]. [25] introduced a moment matching-based distance metric to reduce the distance among all source and target domains.

B. Subdomain Adaptation
Subdomain adaptation (Sub-DA) considers the relationship between subdomains in a global domain and focuses on aligning the distributions of relevant subdomains. [26] used the adversarial structure to capture the fine-grained alignment of different data distributions. Zhu et al. [20] presented a deep subdomain adaptation network (DSAN), which applied local MMD to align the distributions of subdomains and then solved the image classification problem. [27] constructed a deep adversarial subdomain adaptation network (DASAN) in order to align the relevant distributions of subdomains by minimizing the local MMD loss of the same categories in the source domain and target domain. Transfer with manifolds discrepancy alignment (TMDA) [28] was proposed to integrate the subdomains by minimizing the MMD. However, most of the methods mentioned above are based on adversarial network architectures, which may make it difficult to select hyper-parameters and raise the convergence issue.

C. Domain Adaptation for PHM
DA is employed in PHM and improves the scope and scale of equipment maintenance with data-driven models since TL or DA have shown better performance in solving cross-domain issues [29]. However, most methods for DA are focused on diagnosis missions. [30] used TCA and achieved good cross-domain classification results in gearbox fault diagnosis. [31] proposed a deep model where the MMD metric is employed to reduce domain discrepancy in fault diagnosis. A deep generative model to generate fault target data using a labeled source domain and an unlabeled
target domain, resulting in better classification performance, was proposed in [32]. [33] integrated the Kullback-Leibler divergence and autoencoder in TL for fault diagnosis. For RUL prediction, [18] developed a contrastive adversarial domain adaptation, and [34] embedded kernel regression and DA into the autoencoder architecture. [35] employed the Sub-DA for RUL prediction under multiple operating conditions. A novel variational auto-encoder-long-short-term memory network-local weighted deep sub-domain adaptation network (VLSTM-LWSAN) was proposed for RUL prediction in [36]. It applied the LSTM to compress the input data into the interpretable latent space, and LWSAN was designed to capture fine-grained information in multiple degradation stages based on the interpretable latent space. [37] assessed the quality of observations and tended to learn the knowledge from the source domain with higher similarity by using Wasserstein distance-based weighted DA. However, the relationship between subdomains was not characterized in the previous PHM-related studies.

Fig. 1. DA vs FEMSA: (a) original source and target domain; (b) general domain adaptation; (c) redefine source domains; (d) domain adaptation.

III. FEATURE-ENHANCED MULTISOURCE SUBDOMAIN ADAPTATION
This section proposes FEMSA for RUL prediction and fills the gap in the literature about subdomain issues in PHM. Fig. 1 compares DA and FEMSA, which combines Sub-DA and MDA. Fig. 1(a) shows the original source domain and target domain. Fig. 1(b) illustrates general domain adaptation. After general domain adaptation, the distribution of each domain is approximately the same, but different distributions are usually represented in complex systems at multiple degradation stages. Only adopting a general DA approach may confuse features between subdomains represented by different degradation stages, reducing the prediction performance. Therefore, we consider that all source domains can be treated as a global domain and each source domain has its own subdomain, such as in Fig. 1(c); however, the number of subdomains may not be the same as the number of source domains, and different subdomains can be integrated into several new subdomains. In FEMSA, the important step is to identify the subdomains under a global domain. We take these subdomains as the new source domains and then apply the MDA to build the prediction model as shown in Fig. 1(d). The details of identifying subdomains are given in the following subsections.
Fig. 2 shows the FEMSA framework, which contains a feature enhancement module (including feature engineering, source domain redefinition, and feature selection), a DA module, and a regressor construction module. In the feature enhancement module, we extract the features from the source domain, redefine the subdomains, and identify the important features for RUL prediction. The DA module reduces the discrepancy between source and target domains. The regressor construction module trains the model for RUL prediction. We describe the details as follows:

Fig. 2. Framework of FEMSA.

A. Feature Enhancement Module
The feature enhancement module contains three components to generate important features by feature extraction.
1) Feature Engineering: Feature engineering is used to extract the new features that characterize the observations and enhance the prediction performance of RUL analysis [1], [38]. All data of the source domains and target domains are collected from sensor values and production recipes that are related to target equipment. Data include vibration, current, sound, temperature, SVID (status variable identification), etc. The extracted features could be significantly different for different equipment in RUL analysis. In this study, we extract both physical features (i.e., explainable) such as peak-to-peak, mean amplitude, skewness, kurtosis, etc. in the time domain [9], [39] and NN-based features (i.e., non-explainable) [33] for model training. The physical features are statistics or nonlinear patterns extracted from the time domain and frequency domain [40]. The raw data are also transformed into a spectrogram, and then the non-physical features are extracted from the convolutional autoencoder (CAE) with the spectrogram as input [10], [41].
2) Redefine Source Domain: In Sub-DA, we consider the relationship between subdomains and focus on aligning the distributions of relevant subdomains. We attempt to identify each subdomain from the original domain and treat each subdomain as a source domain, then apply the MDA. As mentioned in Section II, an adversarial network could be used for Sub-DA; however, how to adjust the hyperparameters for algorithm convergence should be addressed. In addition, the Sub-DA is often applied in classification problems, which may have a specific number of
labels, but in regression problems, there is no clear category of labels. Thus, we suggest using the clustering technique, which requires fewer computation resources, instead of the adversarial network. In cluster analysis, the number of clusters is often required as an input parameter to trigger algorithms, but it is usually unknown in real data. Therefore, practitioners often resort to guessing a suitable number, leading to unsatisfactory results. Consequently, identifying the appropriate number of clusters is a key step in cluster analysis. [42] proposed a mechanism for automatically selecting the number of clusters, and we extend the idea to identify new source domains. Let $S^d$ be a source domain with $d$ dimensions, $Y$ be a set containing $N$ objects $Y = \{Y_1, Y_2, \ldots, Y_N\}$ in $S^d$, and $T^c$ be the centroid of the target domain. Compute all distances between $T^c$ and $Y_i \in Y$ to construct the distance distribution of $Y$ with respect to observation point $T^c$. Considering that multiple peaks might be present in the distance distribution, i.e., the data could contain more than one cluster, the distribution can be modelled as a Gaussian mixture model (GMM), and the expectation-maximization (EM) algorithm is used to solve the GMM [43]. Since the number of components in the GMM is unknown, we build multiple GMMs with different numbers of components and solve these models. Then, we use the second-order variant of the Akaike information criterion (AIC) to select the best-fitted model whose number of components is the observed number of peaks in the distance distribution. The number of components in the GMM is regarded as the number of clusters. After determining the number of clusters, the k-medoid clustering method is applied to identify the subdomains.
3) Feature Selection: Feature selection has been used for dimension reduction to address the curse of dimensionality in machine learning. The objectives of feature selection include preparing understandable features, providing faster and more cost-effective features, improving the prediction performance of the features, and establishing more comprehensible models [44], [45]. After redefining the source domain into subdomains, the features highly relevant to RUL might be different from the original source domain. Thus, feature selection was applied to identify the important features from subdomains. In this step, several models were applied to select important features by voting scheme [34]. Note that the important features might not be the same in each source subdomain; the selection criterion we suggest is that a feature receives votes from at least two selectors.

B. Domain Adaptation Module
For MDA, this study applies deep learning with a simpler architecture, referring to [3], as shown in Fig. 3. There are two different modules, the domain adaptation module and the regressor construction module, responsible for different tasks. The domain adaptor uses a subnetwork with three layers and a dropout mechanism to align the distribution between redefined source and target domains, and the regressor uses a fully-connected network to infer the RUL of the target domain. Each subnetwork is related to one subdomain; the input of the subnetwork is the output of the feature selection module. We train each regressor with 90% training data and 10% validation data from the source domain to avoid overfitting. The DA module is to minimize the domain discrepancy between each pair of redefined source and target domains.

Fig. 3. Architecture of DA module and regressor module.

MMD [21] and CORAL loss [23] are used as metrics to evaluate the domain discrepancy. MMD is defined as the largest difference in expectations over functions between the redefined source domain distribution and the target domain distribution, which are mapped to a reproducing kernel Hilbert space (RKHS). Let $x_i^s$ be the $i$-th sample of the redefined source domain. Given redefined source domain $D^s = \{x_1^s, x_2^s, \ldots, x_N^s\}$ and target domain $D^T = \{x_1^T, x_2^T, \ldots, x_M^T\}$, the estimation of MMD is defined as

$$d^{MMD} = \left\| \frac{1}{N} \sum_{x_i^s \in D^s} \phi(x_i^s) - \frac{1}{M} \sum_{x_j^T \in D^T} \phi(x_j^T) \right\|_H^2, \tag{1}$$

where $H$ is the RKHS endowed with a characteristic kernel, and $\phi(\cdot)$ denotes the function mapping the original $x$ to the RKHS. The MMD loss between any redefined source domain $k$ and target domain $t$ can be defined as

$$d_k^{MMD} = d^{MMD}\left(F_k(X^{s_k}), F_k(X^t)\right), \tag{2}$$

where $F_k(\cdot)$ is the extractor after passing the feature enhancement module and subnetwork, $X^{s_k}$ refers to all samples of redefined source domain $k$, and $X^t$ represents all samples of the target domain. For the other, CORAL loss is the distance between the covariance of the redefined source and target domains. The general form is shown below.

$$d^{CORAL} = \frac{1}{4d^2} \left\| C^s - C^T \right\|_F^2, \tag{3}$$

where $\|\cdot\|_F^2$ denotes the squared matrix Frobenius norm, $C^s$ and $C^T$ are the covariance matrices of the redefined source and target data, and $d$ is the dimension of the data. Similar to (2), the CORAL loss of source domain $k$ and target domain $t$ is

$$d_k^{CORAL} = d^{CORAL}\left(F_k(X^{s_k}), F_k(X^t)\right). \tag{4}$$

This study proposes a composite metric $d_{dis}$ which combines MMD and CORAL loss to measure the domain discrepancy with a parameter $\alpha$, which provides a tradeoff between the two.


Accordingly, DA is carried out by minimizing $d_{dis}$ in (5).

$$d_{dis} = \frac{1}{m} \sum_{k=1}^{m} d_k^{MMD} + \alpha \cdot \frac{1}{N} \sum_{k=1}^{N} d_k^{CORAL} \tag{5}$$

TABLE I
TRANSFER TASKS FOR MULTI-SOURCE DOMAIN ADAPTATION
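As an illustration of (1), (3), and (5), the following minimal NumPy sketch computes the two discrepancy measures on already-extracted feature matrices and averages them over the redefined source domains. It is not the authors' implementation: the function names (`gaussian_kernel`, `mmd2`, `coral`, `d_dis`), the Gaussian kernel, and its bandwidth `sigma` are assumptions made for the example, the extractor $F_k$ is omitted, and both normalizers in (5) are assumed to equal the number of redefined source domains.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian (RBF) kernel between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def mmd2(xs, xt, sigma=1.0):
    """Biased estimate of the squared MMD in (1), expanded with the kernel trick."""
    return (gaussian_kernel(xs, xs, sigma).mean()
            + gaussian_kernel(xt, xt, sigma).mean()
            - 2.0 * gaussian_kernel(xs, xt, sigma).mean())

def coral(xs, xt):
    """CORAL loss in (3): squared Frobenius distance of covariances over 4 d^2."""
    d = xs.shape[1]
    cs = np.cov(xs, rowvar=False)  # rows are samples, columns are features
    ct = np.cov(xt, rowvar=False)
    return np.sum((cs - ct) ** 2) / (4.0 * d**2)

def d_dis(sources, target, alpha=1.0, sigma=1.0):
    """Composite discrepancy in (5), averaged over the redefined source domains."""
    mmds = [mmd2(s, target, sigma) for s in sources]
    corals = [coral(s, target) for s in sources]
    return np.mean(mmds) + alpha * np.mean(corals)
```

Note that a pure mean shift between source and target changes the MMD term but leaves the covariance-based CORAL term unchanged, which is one reason a combination of the two can be useful.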

C. Regressor Construction Module
The DA narrows the distribution gap between the target representation and each source representation, but does not directly identify the difference of maintenance strategies among each source domain. It means that the estimation of the target domain may vary significantly when passed through different source domain paths. To address the problem, we need to identify the contribution of each source domain and then improve the prediction performance. We calculate the mean square error (MSE) $E_k^{mse}$ of each source domain regressor $k$. The inverse distance weight (IDW) is used to give them different weights $w_k$ and sum them up.

$$E_k^{mse} = \frac{1}{N} \sum_{i=1}^{N} \left(y_i^{s_k} - \hat{y}_i^{s_k}\right)^2, \tag{6}$$

$$d_{err} = \sum_{k=1}^{N} w_k E_k^{mse}, \quad \text{where } w_k = \frac{(E_k^{mse})^{-1}}{\sum_{k=1}^{N} (E_k^{mse})^{-1}} \tag{7}$$

TABLE II
RMSE COMPARISON OF DA METHODS ON TRANSFER TASKS
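The weighting in (6) and (7) can be sketched in a few lines of NumPy. This is a hedged illustration rather than the authors' code: the helper names `source_mse`, `idw_weights`, and `d_err` are hypothetical, and every per-source MSE is assumed to be strictly positive so that its inverse exists.

```python
import numpy as np

def source_mse(y_true, y_pred):
    """Mean squared error of one source-domain regressor, as in (6)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def idw_weights(mse_per_source):
    """Inverse-distance weights in (7): w_k is proportional to 1 / E_k^mse."""
    inv = 1.0 / np.asarray(mse_per_source, dtype=float)  # assumes all MSEs > 0
    return inv / inv.sum()

def d_err(mse_per_source):
    """Weighted regression error in (7): sum over k of w_k * E_k^mse."""
    mse = np.asarray(mse_per_source, dtype=float)
    return float(np.sum(idw_weights(mse) * mse))
```

With these weights, a regressor with a smaller source error contributes more, and the weighted sum in (7) reduces algebraically to the harmonic mean of the per-source MSEs.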

Similarly, minimizing the metric derr is able to enhance the of the bearings were collected with different rotation speeds and
prediction by ensembling the outputs of multiple regressors. magnitudes of radial loading.
Finally, the loss function Ltotal can be combined as follows. The XJTU-SY Bearing dataset includes the vibration accel-
eration data with 25.6kHz sampling frequency in the interval of
1.28s every 1 minute. The dataset has three operating conditions.
Ltotal = ddis + βderr , (8) We setup six different transfer tasks to validate the proposed
where β is a parameter to trade-off the contribution of the FEMSA. Table I shows the details, and all the transfer tasks
two errors and optimized by sensitivity analysis. The proposed (Task A, B, and C for IEEE dataset; Task D, E, and F for XJTU
composite loss considers reducing the domain discrepancy and dataset) are with multi-source data under different operating
minimizing the source domain regression error simultaneously. conditions.
It can fully exploit the supervision from multiple source domains Root mean squared error (RMSE) is a common metric to
to achieve DA between the target and multiple sources. evaluate the model performance in machine learning [1]. Let
Based on the proposed FEMSA, raw data was collected. After n be the number of samples, and yi and ŷi be the real value
feature engineering, the data is expanded to N dimensions, where and predicted value of i-th sample, respectively. RMSE can be
N depends on the number of feature selection module settings. defined as follows.
After the subdomain adaptation, the output is a multidimensional
1 n
matrix, and the dimensions depend on the design of the subnet. RMSE = (ŷi − yi )2 (9)
The final output is the weighted majority prediction under mul- n i=1

tiple operating conditions.

B. Feature Enhancement
IV. EXPERIMENTAL STUDY The raw data of two datasets contain millions of data points
For the experimental study, two datasets are used to verify with different operating condition; the patterns of data presented
the proposed FEMSA— 2012 IEEE PHM Challenge Bearing vary even under the same operating condition.
dataset [46] and XJTU-SY Bearing dataset [47]. The computer For feature engineering, we use sliding window to extract
platform equips with CPU AMD Ryzen5 1600 (3.2G), GPU 55 physical-meaning features from time domain and frequency
Nvidia GeForce GTX 1080Ti, and RAM 32G. domain, respectively [9] [39]. At the same time, the raw data in
each window are transformed to spectrogram, and then CAE is
employed to extract 55 features of spectrogram. Through feature
A. Data Description
engineering, we obtain 110 new features.
For IEEE PHM Challenge 2012 Bearing dataset, the vibra- Then, all the extracted features are used to redefine the new
tion signals are collected from PRONOSTIA, which uses two source domains (i.e., subdomains) by clustering analysis with
accelerometers to collect the signals in horizontal and verti- the GMM and EM algorithm. Next, we use feature selection,
cal directions, respectively. The data were collected every 10 including Elastic net [48], random forest (RF), and gradient
seconds, and each time the data is continuously collected for boosting machine (GBM), to select important features by voting
0.1 seconds with the sampling frequency 25.6 kHz. To simulate scheme. Note that the important features might not be the same
different operating conditions, a complete run-to-failure signals in each source subdomain; the selection criterion we suggest

TABLE III
THE RMSE COMPARISON WITH DIFFERENT NUMBER OF STAGES

TABLE IV
THE SENSITIVITY ANALYSIS OF FEMSA

TABLE V
RMSE COMPARISON WITH DIFFERENT FEATURES

Fig. 4. RUL prediction results of FEMSA.

is that the features are voted by any two selectors in the top competitive with other MDA methods. Given the same important
30% of single source subdomain and presented in every source features, for one sampling cross validation, model training time
subdomain. Finally, we selected 36 important features. with 7500 epochs for DA and regressor modules takes 2.13 hours
in the proposed FEMSA, much less than 4.83 hours in MDAN
and 3.08 hours in LSTM-DA. In addition, we consider the retrain
C. Domain Adaptation and Model Training mechanism of FEMSA by applying concept drift detection. We
For DA and model training, we apply 5 sampling cross assume that RUL decay consists of several distinct stages; that
validation [9]; particularly, we sample 50% from each subdomain in C1, C2, and C3, respectively. In the proposed FEMSA framework, the features of the source and target domains are input to the DA module to calculate the MMD and CORAL loss between each pair of redefined source and target domains. Then the source domain output is input to the regressor construction module to train the RUL prediction model. Fig. 4 shows the RUL prediction results of the transfer tasks. The results show that the prediction presents a good declining trend for each transfer task. Besides, the predictions show a larger deviation in the beginning and a smaller one at the later stages; particularly, the predicted RUL is much closer to the true value at the end, which provides guidance for maintenance strategy when the failure is close.

Table II shows the metrics comparison of the proposed FEMSA on transfer tasks. We compare the mean and standard deviation of RMSE with state-of-the-art methods such as the multi-source domain adaptation network (MDAN) [3] and LSTM-DA (domain adversarial) [16], where MDAN is embedded with ResNet50 and LSTM-DA is an LSTM-based model. The proposed FEMSA with a simpler structure performs well and is

is, we divide the raw data into different stages detected by concept drift. We retrain the model when the stage changes. Table III shows the comparison with the existing benchmarks when splitting target domain data into different numbers of stages. We sample 200 data points at each stage. We then apply 5-sampling cross-validation to ensure robustness. The results show that the proposed FEMSA provides robust performance (RMSE varies between 15.38% and 16.51%), no matter how many stages are detected by concept drift. The results imply that it is feasible to develop FEMSA embedded with concept drift for predicting RUL automatically in the manufacturing system.

To retain a better and more robust performance, the values of the hyperparameters α and β in the loss function of FEMSA should be tested by grid search and sensitivity analysis. These hyperparameter tests are repeated several times on each task. Table IV illustrates the best setting of hyperparameters. The results support the robustness mentioned above (i.e., small variation of RMSE even with a larger range of hyperparameters). Furthermore, Table V compares the performance when applying different feature engineering. The results show that when only


statistical or nonlinear patterns extracted from the time domain, frequency domain, or time-frequency domain are used, the performance of FEMSA is dominated by other state-of-the-art methods. However, when CAE features are extracted, the prediction performance becomes competitive.

Fig. 5 shows the two-dimensional schematic diagram of the t-SNE [49] method after different stages of the empirical study. Fig. 5(a) shows the original source domain and target domain. Fig. 5(b) is the data after redefining source domains and feature selection. Fig. 5(c) shows the data points after subdomain adaptation. The result illustrates that redefining the source domain and then applying subdomain adaptation can improve performance.

Fig. 5. Steps in FEMSA: (a) original source and target domain; (b) redefine source domains; (c) subdomain adaptation.

Furthermore, the step that redefines the source domain is an important component in this study; this step considers the relationship between subdomains and their relevant subdomains. We compare the results with/without redefining the source domain in Tables VI and VII.

Table VI provides the p-values and coefficients for different scenarios. The features are selected by the feature selection module. Linear regression is used to evaluate the p-values, coefficients, and R-squared. These three metrics improved when the redefine source domain mechanism was triggered. The results in Table VII show that redefining the source domain is useful and can significantly reduce the mean and standard deviation of RMSE. In particular, in the scenario without redefining the source domain, the prediction performance is only similar to or slightly better than that of the model that uses only frequency- and time-domain features but no CAE features.

TABLE VI
THE P-VALUE AND COEFFICIENT COMPARISON WITH DIFFERENT SCENARIOS

TABLE VII
RMSE COMPARISON WITH SOURCE DOMAIN REDEFINITION

V. CONCLUSION
This study proposes FEMSA to address MDA under multiple operating conditions. The proposed framework includes feature enhancement, DA, and regressor construction. We redefine the source subdomains using clustering analysis and propose a new loss function. Two public datasets about accelerated degradation are applied to validate the proposed FEMSA. The results show that the FEMSA with a simpler NN structure is useful and competitive. We replace the deep learning networks, which usually have complex network structures (e.g., GAN), with the redefined source domain module and feature selection module. The enhanced features can train the model well. The practical result shows the robustness of RUL prediction using the FEMSA framework. In addition, the number of subnets in the DA module is determined by clustering in the redefined source domain stage, and a large number of subnets is computationally intensive. It would be helpful to develop a mechanism for identifying subdomains even with small data and for dynamically adjusting the influence weights of small clusters during model training. Furthermore, it is worthwhile to design a composite loss metric, including different loss functions for different module evaluations and the weighted majority loss for improving regression accuracy when training the subnets.

REFERENCES
[1] C.-Y. Lee and C.-F. Chien, “Pitfalls and protocols of data science in manufacturing practice,” J. Intell. Manuf., vol. 33, pp. 1189–1207, 2022.
[2] Z. Zhao, B. Liang, X. Wang, and W. Lu, “Remaining useful life prediction of aircraft engine based on degradation pattern learning,” Rel. Eng. Syst. Saf., vol. 164, pp. 74–83, 2017.
[3] Y. Ding, P. Ding, X. Zhao, Y. Cao, and M. Jia, “Transfer learning for remaining useful life prediction across operating conditions based on multisource domain adaptation,” IEEE/ASME Trans. Mechatron., vol. 27, no. 5, pp. 4143–4152, Oct. 2022.
[4] Z. Pan, Z. Meng, Z. Chen, W. Gao, and Y. Shi, “A two-stage method based on extreme learning machine for predicting the remaining useful life of rolling-element bearings,” Mech. Syst. Signal Process., vol. 144, 2020, Art. no. 106899.

[5] A. Cubillo, S. Perinpanayagam, and M. Esperon-Miguez, “A review of physics-based models in prognostics: Application to gears and bearings of rotating machinery,” Adv. Mech. Eng., vol. 8, no. 8, pp. 1–20, 2016, doi: 10.1177/1687814016664660.
[6] P. Kundu, A. K. Darpe, and M. S. Kulkarni, “An ensemble decision tree methodology for remaining useful life prediction of spur gears under natural pitting progression,” Struct. Health Monit., vol. 19, no. 3, pp. 854–872, 2020.
[7] M. Xia, T. Li, T. Shu, J. Wan, C. W. De Silva, and Z. Wang, “A two-stage approach for the remaining useful life prediction of bearings using deep neural networks,” IEEE Trans. Ind. Inform., vol. 15, no. 6, pp. 3703–3711, Jun. 2019.
[8] R. Liu, B. Yang, and A. G. Hauptmann, “Simultaneous bearing fault recognition and remaining useful life prediction using joint-loss convolutional neural network,” IEEE Trans. Ind. Inform., vol. 16, no. 1, pp. 87–96, Jan. 2020.
[9] H. W. Lu and C. Y. Lee, “Kernel-based dynamic ensemble technique for remaining useful life prediction,” IEEE Robot. Automat. Lett., vol. 7, no. 2, pp. 1142–1149, Apr. 2022.
[10] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[11] Y. H. Kim, U. Shin, J. Park, and I. S. Kweon, “MS-UDA: Multi-spectral unsupervised domain adaptation for thermal image semantic segmentation,” IEEE Robot. Automat. Lett., vol. 6, no. 4, pp. 6497–6504, Oct. 2021.
[12] J. Zhang et al., “VR-goggles for robots: Real-to-sim domain adaptation for visual control,” IEEE Robot. Automat. Lett., vol. 4, no. 2, pp. 1148–1155, Apr. 2019.
[13] L. Zheng, W. Ma, Y. Cai, T. Lu, and S. Wang, “GPDAN: Grasp pose domain adaptation network for sim-to-real 6-DoF object grasping,” IEEE Robot. Automat. Lett., vol. 8, no. 8, pp. 4585–4592, Aug. 2023.
[14] J. Zhu, N. Chen, and C. Shen, “A new data-driven transferable remaining useful life prediction approach for bearing under different working conditions,” Mech. Syst. Signal Process., vol. 139, 2020, Art. no. 106602.
[15] Y. Ganin et al., “Domain-adversarial training of neural networks,” J. Mach. Learn. Res., vol. 17, no. 1, pp. 2096–2030, 2016.
[16] P. R. D. O. da Costa, A. Akçay, Y. Zhang, and U. Kaymak, “Remaining useful lifetime prediction via deep domain adaptation,” Rel. Eng. Syst. Saf., vol. 195, 2020, Art. no. 106682.
[17] H. Cheng, X. Kong, G. Chen, Q. Wang, and R. Wang, “Transferable convolutional neural network based remaining useful life prediction of bearing under multiple failure behaviors,” Measurement, vol. 168, 2021, Art. no. 108286.
[18] M. Ragab et al., “Contrastive adversarial domain adaptation for machine remaining useful life prediction,” IEEE Trans. Ind. Inform., vol. 17, no. 8, pp. 5239–5249, Aug. 2021.
[19] B. Rezaeianjouybari and Y. Shang, “A novel deep multi-source domain adaptation framework for bearing fault diagnosis based on feature-level and task-specific distribution alignment,” Measurement, vol. 178, 2021, Art. no. 109359.
[20] Y. Zhu et al., “Deep subdomain adaptation network for image classification,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 4, pp. 1713–1722, Apr. 2021.
[21] A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola, “A kernel two-sample test,” J. Mach. Learn. Res., vol. 13, no. 1, pp. 723–773, 2012.
[22] L. Rüschendorf, “The Wasserstein distance and approximation theorems,” Probability Theory Related Fields, vol. 70, no. 1, pp. 117–129, 1985.
[23] B. Sun and K. Saenko, “Deep CORAL: Correlation alignment for deep domain adaptation,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 443–450.
[24] N. Bonneel, J. Rabin, G. Peyré, and H. Pfister, “Sliced and Radon Wasserstein barycenters of measures,” J. Math. Imag. Vis., vol. 51, no. 1, pp. 22–45, 2015.
[25] Y. Xia, C. Shen, D. Wang, Y. Shen, W. Huang, and Z. Zhu, “Moment matching-based intraclass multisource domain adaptation network for bearing fault diagnosis,” Mech. Syst. Signal Process., vol. 168, 2022, Art. no. 108697.
[26] Z. Pei, Z. Cao, M. Long, and J. Wang, “Multi-adversarial domain adaptation,” in Proc. 32nd AAAI Conf. Artif. Intell., 2018, vol. 32, no. 1, pp. 3934–3941, doi: 10.1609/aaai.v32i1.11767.
[27] Y. Liu, Y. Wang, T. W. S. Chow, and B. Li, “Deep adversarial subdomain adaptation network for intelligent fault diagnosis,” IEEE Trans. Ind. Inform., vol. 18, no. 9, pp. 6038–6046, Sep. 2022.
[28] P. Wei, Y. Ke, X. Qu, and T. Y. Leong, “Subdomain adaptation with manifolds discrepancy alignment,” IEEE Trans. Cybern., vol. 52, no. 11, pp. 11698–11708, Nov. 2022.
[29] C. Tan, F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, “A survey on deep transfer learning,” in Proc. Int. Conf. Artif. Neural Netw., 2018, pp. 270–279.
[30] J. Xie, L. Zhang, L. Duan, and J. Wang, “On cross-domain feature fusion in gearbox fault diagnosis under various operating conditions based on transfer component analysis,” in Proc. IEEE Int. Conf. Prognostics Health Manage., 2016, pp. 1–6.
[31] W. Lu, B. Liang, Y. Cheng, D. Meng, J. Yang, and T. Zhang, “Deep model based domain adaptation for fault diagnosis,” IEEE Trans. Ind. Electron., vol. 64, no. 3, pp. 2296–2305, Mar. 2017.
[32] X. Li, W. Zhang, and Q. Ding, “Cross-domain fault diagnosis of rolling element bearings using deep generative neural networks,” IEEE Trans. Ind. Electron., vol. 66, no. 7, pp. 5525–5534, Jul. 2019.
[33] L. Wen, L. Gao, and X. Li, “A new deep transfer learning based on sparse auto-encoder for fault diagnosis,” IEEE Trans. Syst., Man, Cybern.: Syst., vol. 49, no. 1, pp. 136–144, Jan. 2019.
[34] Y. Ding, M. Jia, Q. Miao, and P. Huang, “Remaining useful life estimation using deep metric transfer learning for kernel regression,” Rel. Eng. Syst. Saf., vol. 212, 2021, Art. no. 107583.
[35] Y. Ding, M. Jia, and Y. Cao, “Remaining useful life estimation under multiple operating conditions via deep subdomain adaptation,” IEEE Trans. Instrum. Meas., vol. 70, 2021, Art. no. 3516711.
[36] J. Zhang, X. Li, J. Tian, Y. Jiang, H. Luo, and S. Yin, “A variational local weighted deep sub-domain adaptation network for remaining useful life prediction facing cross-domain condition,” Rel. Eng. Syst. Saf., vol. 231, 2023, Art. no. 108986.
[37] T. Hu, Y. Guo, L. Gu, Y. Zhou, Z. Zhang, and Z. Zhou, “Remaining useful life estimation of bearings under different working conditions via Wasserstein distance-based weighted domain adaptation,” Rel. Eng. Syst. Saf., vol. 224, 2022, Art. no. 108526.
[38] L. Ren, Y. Sun, J. Cui, and L. Zhang, “Bearing remaining useful life prediction based on deep autoencoder and deep neural networks,” J. Manuf. Syst., vol. 48, pp. 71–77, 2018.
[39] W. Jiang, Y. Hong, B. Zhou, X. He, and C. Cheng, “A GAN-based anomaly detection approach for imbalanced industrial time series,” IEEE Access, vol. 7, pp. 143608–143619, 2019.
[40] C. Y. Lee, T. S. Huang, M. K. Liu, and C. Y. Lan, “Data science for vibration heteroscedasticity and predictive maintenance of rotary bearings,” Energies, vol. 12, no. 5, 2019, Art. no. 801.
[41] Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, “Deep convolutional autoencoder-based lossy image compression,” in Proc. IEEE Picture Coding Symp., 2018, pp. 253–257.
[42] M. A. Masud, J. Z. Huang, C. Wei, J. Wang, I. Khan, and M. Zhong, “I-nice: A new approach for identifying the number of clusters and initial cluster centres,” Inf. Sci., vol. 466, pp. 129–151, 2018.
[43] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Roy. Stat. Soc.: Ser. B, vol. 39, no. 1, pp. 1–22, 1977.
[44] C.-Y. Lee, J.-H. Zeng, S.-Y. Lee, R.-B. Lu, and P.-H. Kuo, “SNP data science for classification of bipolar disorder I and bipolar disorder II,” IEEE/ACM Trans. Comput. Biol. Bioinf., vol. 18, no. 6, pp. 2862–2869, Nov.-Dec. 2021.
[45] J. Li et al., “Feature selection: A data perspective,” ACM Comput. Surv., vol. 50, no. 6, pp. 1–45, 2017.
[46] P. Nectoux et al., “PRONOSTIA: An experimental platform for bearings accelerated degradation tests,” in Proc. IEEE Int. Conf. Prognostics Health Manage., 2012, pp. 1–8.
[47] B. Wang, Y. Lei, N. Li, and N. Li, “A hybrid prognostics approach for estimating remaining useful life of rolling element bearings,” IEEE Trans. Rel., vol. 69, no. 1, pp. 401–412, Mar. 2020.
[48] H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” J. Roy. Stat. Soc.: Ser. B, vol. 67, no. 2, pp. 301–320, 2005.
[49] L. van der Maaten and G. Hinton, “Visualizing data using t-SNE,” J. Mach. Learn. Res., vol. 9, no. 86, pp. 2579–2605, 2008.
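
As an illustrative supplement to the DA module discussed in the empirical study, the following minimal Python sketch computes an MMD term and a CORAL term between one pair of source and target feature sets and combines them with the weights α and β. It is a sketch under stated assumptions, not the FEMSA implementation: the Gaussian kernel and its bandwidth `sigma`, the biased MMD estimator, the 1/(4d²) CORAL scaling, the weighted-sum form, and every function name are assumptions introduced here for illustration only.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """RBF kernel between two feature vectors (lists of floats)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(x, y, sigma) for x in a for y in b)
        return total / (len(a) * len(b))
    return (mean_kernel(source, source) + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


def covariance(samples):
    """Unbiased d x d covariance matrix of n samples (requires n >= 2)."""
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
             for j in range(d)] for i in range(d)]


def coral(source, target):
    """CORAL term: squared Frobenius norm of the covariance gap over 4 d^2."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    frob2 = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return frob2 / (4.0 * d * d)


def adaptation_loss(source, target, alpha, beta, sigma=1.0):
    """Hypothetical weighted sum alpha * MMD^2 + beta * CORAL for one domain pair."""
    return alpha * mmd2(source, target, sigma) + beta * coral(source, target)


# Toy two-dimensional features standing in for one source and one target subdomain.
source_features = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]]
target_features = [[0.5, 1.0], [1.5, 0.0], [2.5, 2.0]]
example_loss = adaptation_loss(source_features, target_features, alpha=0.5, beta=0.5)
```

Under these assumptions, a grid search over α and β would evaluate such a term for each candidate pair of weights and each redefined source and target pair, alongside the regression loss, and compare the resulting RMSE.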
