Dual Mix-Up Adversarial Domain Adaptation For Machine Remaining Useful Life Prediction
University of Electronic Science and Technology of China, Chengdu, China ([email protected])
Shanghai Radio Equipment Research Institute, Shanghai, China ([email protected])
Abstract—Remaining useful life (RUL) prediction is one of the core issues in the equipment maintenance process. It aims to accurately forecast machines' run-to-failure life span using previous and current state data. As various data-driven models are proving to be effective, and because RUL labels for machines in particular conditions are difficult to obtain, domain adaptation approaches are beginning to be explored for the RUL prediction problem. Building on existing RUL domain adaptation studies, we propose a novel Dual Mix-up Adversarial Domain Adaptation (DMADA) approach to further improve RUL forecasting accuracy. In DMADA, both time-series mix-up and domain mix-up regularization are conducted. Virtual samples generated by linear interpolation lead to an enriched and more continuous sample space. The linear interpolation encourages consistent predictions between samples and allows the model to explore the feature space more thoroughly. At the same time, the domain mix-up conserves the invariance of learned features. Combined, the two mix-up regularizations promote both the transferability and the discriminability of extracted features, which is essential to satisfactory unsupervised domain adaptation performance. Thorough experiments on the C-MAPSS dataset are conducted, and the satisfactory results prove the proposed approach effective.

Index Terms—RUL prediction, adversarial domain adaptation, mix-up

I. INTRODUCTION

The goal of remaining useful life (RUL) prediction is to accurately predict the remaining time for which a machine will function normally until a failure happens. Based on the analysis of remaining-life prediction results, equipment availability and reliability can be improved; at the same time, maintenance costs and the risk of operational failure events can be reduced. Studies on RUL prediction help to mitigate the serious costs brought by sudden equipment failures, and therefore have important practical value [1].

The methods for forecasting RUL are mainly divided into two categories: model-based ones [2], [3] and data-driven ones [4]–[6]. Model-based methods require information about machine construction as well as statistical knowledge to build physical models. They are complex to build, expertise-requiring, and hard to generalize. Data-driven approaches are becoming more popular along with the development of machine learning and deep learning; studies such as [7], [8] show their effectiveness, and no prior knowledge of the machines is needed.

Traditional deep learning approaches only work when a large amount of labeled data is supplied and there is no distribution difference between training data and testing data. In real-world industrial applications, however, it can be difficult to collect sufficient labeled data for the target machine under a specific working condition because of time or financial constraints. Domain adaptation has been developed to address such issues.

In the RUL prediction problem, the raw data consists of readings sampled from continuous signal streams on different sensors. Because of real-world precision constraints, this leads to a gap between the sampled data distribution and the real data distribution. As a result, even when the target domain is eventually well aligned with the source domain, the prediction quality in the target domain is still not good enough, because the data from both domains is sampled sparsely and the knowledge learned from the source domain is flawed in the first place.

To address this issue, and inspired by the mix-up technique [9], [10], we propose a novel dual mix-up adversarial domain adaptation (DMADA) approach to conduct domain adaptation for the RUL prediction problem. In our approach, a time-series mix-up regularization and a domain mix-up regularization are both designed to close the gap between the sampled data distribution and the real data distribution. Mix-up generates virtual samples as convex combinations of real samples. With the dual mix-up regularization, a wider range of samples becomes available and the predictions between samples are enforced to be more consistent. The time-series mix-up leads to a more continuous and enriched sample space, allowing the feature extractor to gather more thorough information and to learn more discriminative representations. At the same time, the domain mix-up strengthens the domain discriminator, so that the model learns more domain-invariant representations, since the feature extractor is trained adversarially against it.
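The mix-up operation referenced above follows the standard formulation [9]: a virtual sample is a convex combination of two real samples and of their RUL labels, with the mixing weight drawn from a Beta distribution. A minimal sketch follows; the function name and the value of alpha are illustrative choices, not taken from the paper:

```python
import numpy as np

def mixup(x_i, y_i, x_j, y_j, alpha=0.2):
    """Build a virtual sample as a convex combination of two real
    samples and their RUL labels [9]. alpha is the Beta-distribution
    shape parameter (0.2 is a common choice, not the paper's value)."""
    lam = np.random.beta(alpha, alpha)
    # The same interpolation applies to time-series windows of shape
    # (window_length, num_sensors) and to scalar RUL labels.
    x_mix = lam * x_i + (1.0 - lam) * x_j
    y_mix = lam * y_i + (1.0 - lam) * y_j
    return x_mix, y_mix
```

Because both the input window and the label are interpolated with the same weight, the model is pushed toward linear behavior between training samples, which is what enforces the consistency of in-between predictions described above.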
Authorized licensed use limited to: Center for Science Technology and Information (CESTI). Downloaded on June 12,2024 at 02:39:03 UTC from IEEE Xplore. Restrictions apply.
true value of RUL labels and the predicted RUL values through the network. The loss is formulated as below:

\mathcal{L}_{mse} = \frac{1}{n_S} \sum_{i=1}^{n_S} \left( \hat{y}_S^i - y_S^i \right)^2, \quad (1)

[Figure: framework overview. Source inputs X_S, with time-series mix-up regularization applied, are encoded by the extractor E_S into features f_S and fed to the RUL predictor R.]
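Eq. (1) is a plain mean squared error over the n_S labeled source samples. As a sketch, it could be computed as follows (a minimal NumPy version, not the paper's actual implementation):

```python
import numpy as np

def rul_mse_loss(y_pred, y_true):
    """Eq. (1): L_mse = (1/n_S) * sum_i (yhat_S^i - y_S^i)^2,
    averaged over the n_S labeled source-domain samples."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))
```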
TABLE I: PROPERTIES OF 4 DATASETS [table body not recovered]

TABLE II: PARAMETER SETTINGS FOR λ AND PREHEAT EPOCHS [table body not recovered]
TABLE III
COMPARISON AGAINST FIVE STATE-OF-THE-ART METHODS
(columns 1–5: RMSE, lower is better; columns 6–10: Score, lower is better)
Method WDGRL [25] ADDA [19] RULDDA [18] CADA [20] DMADA WDGRL ADDA RULDDA CADA DMADA
FD001→FD002 21.46 31.26 24.08 19.52 18.40 33160 4865 2684 2122 2016
FD001→FD003 71.7 57.09 43.08 39.58 25.22 15936 32472 10259 8415 2417
FD001→FD004 57.24 56.66 45.7 31.23 28.16 86139 68859 26981 11577 7324
FD002→FD001 15.24 19.73 23.91 13.88 13.65 157572 689 2430 351 316
FD002→FD003 41.45 37.22 47.26 33.53 28.18 19053 11029 12756 5213 3575
FD002→FD004 37.62 37.64 45.17 33.71 31.23 523722 16856 25738 15106 12831
FD003→FD001 36.05 40.41 27.15 19.54 19.90 18307 32451 2391 1451 1667
FD003→FD002 40.11 42.53 30.42 19.33 19.20 32112 459911 6754 5257 4608
FD003→FD004 29.98 31.88 31.82 20.61 21.38 296061 82520 5775 3219 3503
FD004→FD001 42.01 37.81 32.37 20.10 19.63 45394 43794 13377 1840 1092
FD004→FD002 35.88 36.67 27.54 18.5 19.07 38221 23822 4937 4460 3735
FD004→FD003 18.18 23.59 23.31 14.49 14.12 77977 1117 1679 682 665
TABLE IV
ABLATION STUDY OF THE TWO MIX-UP MODULES
(columns 1–5: RMSE; columns 6–10: Score; each percentage is the relative change of the ablated variant with respect to the full DMADA)
w/o time-series mix-up w/o domain mix-up DMADA w/o time-series mix-up w/o domain mix-up DMADA
FD001→FD002 18.89 2.66% 18.70 1.63% 18.40 2204 9.29% 2039 1.10% 2017
FD001→FD003 22.64 -10.23% 25.37 0.59% 25.22 1724 -28.67% 2244 -7.15% 2416
FD001→FD004 29.48 4.69% 29.32 4.12% 28.16 7265 -0.81% 7288 -0.50% 7324
FD002→FD001 13.50 -1.10% 13.62 -0.22% 13.65 312 -1.32% 333 5.33% 316
FD002→FD003 29.67 5.29% 29.88 6.03% 28.81 3860 7.96% 4090 14.39% 3575
FD002→FD004 31.24 0.03% 31.18 -0.16% 31.23 23245 81.16% 17007 32.55% 12831
FD003→FD001 21.48 7.94% 18.65 -6.28% 19.90 2042 22.46% 1231 -26.17% 1667
FD003→FD002 19.76 2.92% 18.91 -1.51% 19.20 6801 47.58% 6568 42.52% 4608
FD003→FD004 21.41 0.14% 21.21 -0.80% 21.38 6028 72.03% 3559 1.57% 3503
FD004→FD001 20.62 5.04% 19.90 1.38% 19.63 1243 13.74% 1153 5.51% 1093
FD004→FD002 19.71 3.36% 18.06 -5.39% 19.70 6701 79.41% 3653 -2.20% 3735
FD004→FD003 14.04 -0.57% 14.26 0.99% 14.12 859 29.08% 627 -5.78% 665
Average 1.68% 0.04% 27.66% 5.10%
DMADA combines the time-series mix-up and the domain mix-up into the adversarial domain adaptation framework. The enriched dataset leads to a more continuous latent feature space. Better alignment between the source domain and the target domain is achieved because DMADA encourages the sample distribution to be closer to the real distribution in both domains. DMADA achieves satisfactory RUL prediction precision on the C-MAPSS dataset: average improvements of 6% and 18% over the state-of-the-art methods are achieved in the two evaluation metrics, indicating its effectiveness.

REFERENCES

[1] Fault Diagnosis. John Wiley & Sons, Ltd, 2006, ch. 5, pp. 172–283.
[2] Y. Lei, N. Li, S. Gontarz, J. Lin, S. Radkowski, and J. Dybala, "A model-based method for remaining useful life prediction of machinery," IEEE Trans. Reliab., vol. 65, no. 3, pp. 1314–1326, 2016.
[3] N. Li, Y. Lei, T. Yan, N. Li, and T. Han, "A Wiener-process-model-based method for remaining useful life prediction considering unit-to-unit variability," IEEE Trans. Ind. Electron., vol. 66, no. 3, pp. 2092–2101, 2019.
[4] Y. Wang, Y. Zhao, and S. Addepalli, "Remaining useful life prediction using deep learning approaches: A review," Procedia Manufacturing, vol. 49, pp. 81–88, 2020.
[5] B. Yang, R. Liu, and E. Zio, "Remaining useful life prediction based on a double-convolutional neural network architecture," IEEE Trans. Ind. Electron., vol. 66, no. 12, pp. 9521–9530, 2019.
[6] K. Liu, Y. Shang, Q. Ouyang, and W. D. Widanage, "A data-driven approach with uncertainty quantification for predicting future capacities and remaining useful life of lithium-ion battery," IEEE Trans. Ind. Electron., vol. 68, no. 4, pp. 3170–3180, 2021.
[7] G. Sateesh Babu, P. Zhao, and X.-L. Li, "Deep convolutional neural network based regression approach for estimation of remaining useful life," in International Conference on Database Systems for Advanced Applications, 2016, pp. 214–228.
[8] Y. Cheng, J. Wu, H. Zhu, S. W. Or, and X. Shao, "Remaining useful life prognosis based on ensemble long short-term memory neural network," IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–12, 2020.
[9] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, "mixup: Beyond empirical risk minimization," arXiv preprint arXiv:1710.09412, 2017.
[10] Y. Wu, D. Inkpen, and A. El-Roby, "Dual mixup regularized learning for adversarial domain adaptation," in European Conference on Computer Vision (ECCV), vol. 12374, 2020, pp. 540–555.
[11] H. Li, W. Zhao, Y. Zhang, and E. Zio, "Remaining useful life prediction using multi-scale deep convolutional neural network," Applied Soft Computing, vol. 89, p. 106113, 2020.
[12] Z. Shi and A. Chehade, "A dual-LSTM framework combining change point detection and remaining useful life prediction," Reliability Engineering & System Safety, vol. 205, p. 107257, 2021.
[13] A. Al-Dulaimi, S. Zabihi, A. Asif, and A. Mohammadi, "Hybrid deep neural network model for remaining useful life estimation," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 3872–3876.
[14] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky, "Domain-adversarial training of neural networks," 2016.
[15] M. Long, Z. Cao, J. Wang, and M. I. Jordan, "Conditional adversarial domain adaptation," in Advances in Neural Information Processing Systems, vol. 31, 2018.
[16] E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko, "Simultaneous deep transfer across domains and tasks," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), December 2015.
[17] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell, "Adversarial discriminative domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
[18] P. R. de Oliveira da Costa, A. Akçay, Y. Zhang, and U. Kaymak, "Remaining useful lifetime prediction via deep domain adaptation," Reliability Engineering & System Safety, vol. 195, p. 106682, 2020.
[19] M. Ragab, Z. Chen, M. Wu, C. K. Kwoh, and X. Li, "Adversarial transfer learning for machine remaining useful life prediction," in 2020 IEEE International Conference on Prognostics and Health Management (ICPHM), 2020, pp. 1–7.
[20] M. Ragab, Z. Chen, M. Wu, C. S. Foo, C. K. Kwoh, R. Yan, and X. Li, "Contrastive adversarial domain adaptation for machine remaining useful life prediction," IEEE Trans. Ind. Informatics, vol. 17, no. 8, pp. 5239–5249, 2021.
[21] J. Li, X. Li, and D. He, "Domain adaptation remaining useful life prediction method based on AdaBN-DCNN," in 2019 Prognostics and System Health Management Conference (PHM-Qingdao), 2019, pp. 1–6.
[22] S. Fu, Y. Zhang, L. Lin, M. Zhao, and S.-s. Zhong, "Deep residual LSTM with domain-invariance for remaining useful life prediction across domains," Reliability Engineering & System Safety, vol. 216, p. 108012, 2021.
[23] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[24] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, pp. 1735–1780, 1997.
[25] J. Shen, Y. Qu, W. Zhang, and Y. Yu, "Wasserstein distance guided representation learning for domain adaptation," in Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[26] A. Saxena, K. Goebel, D. Simon, and N. Eklund, "Damage propagation modeling for aircraft engine run-to-failure simulation," in 2008 International Conference on Prognostics and Health Management, 2008.