
Engineering Applications of Artificial Intelligence 117 (2023) 105582

Contents lists available at ScienceDirect

Engineering Applications of Artificial Intelligence


journal homepage: www.elsevier.com/locate/engappai

Developing health indicators and RUL prognostics for systems with few
failure instances and varying operating conditions using a LSTM autoencoder
Ingeborg de Pater a ,∗, Mihaela Mitici b
a Faculty of Aerospace Engineering, Delft University of Technology, HS 2926 Delft, The Netherlands
b Faculty of Science, Utrecht University, Heidelberglaan 8, 3584 CS Utrecht, The Netherlands

ARTICLE INFO

Keywords:
Remaining Useful Life prognostics
Health indicators
Unlabelled data samples
Autoencoder
Varying operating conditions
Attention

ABSTRACT

Most Remaining Useful Life (RUL) prognostics are obtained using supervised learning models trained with many labelled data samples (i.e., the true RUL is known). In aviation, however, aircraft systems are often preventively replaced before failure. There are thus very few labelled data samples available. We therefore propose a Long Short-Term Memory (LSTM) autoencoder with attention to develop health indicators for an aircraft system instead. This autoencoder is trained with unlabelled data samples (i.e., the true RUL is unknown). Since aircraft fly under various operating conditions (varying altitude, speed, etc.), these conditions are also integrated in the autoencoder. We show that the consideration of the operating conditions leads to robust health indicators and significantly improves the monotonicity, trendability and prognosability of these indicators. These health indicators are further used to predict the RUL of the aircraft system using a similarity-based matching approach. We illustrate our approach for turbofan engines. We show that the consideration of the operating conditions improves the monotonicity of the health indicators by 97%. Also, our approach leads to accurate RUL estimates, with a Root Mean Square Error (RMSE) of only 2.67 flights. Moreover, a 19% reduction in the RMSE is obtained using our approach in comparison to existing supervised learning models.

1. Introduction

Complex technical systems are crucial for the safe and reliable operation of machines, vehicles, manufacturing processes, etc. The unexpected failures of such systems lead to costly unplanned downtime and potential safety risks. To limit the number of failures, systems are often replaced preventively (Ochella et al., 2022; Koutroulis et al., 2022). However, due to such preventive, early replacements, the actual failure time of the system is unobserved. This complicates the estimation of the Remaining Useful Life (RUL) for such systems.

Especially in aviation, preventive replacement of complex, safety-critical systems is common. Replacing these systems early is preferred over keeping them running for a long time and risking a failure. Consequently, the data-monitoring samples from such systems are often unlabelled, i.e., the corresponding true RUL is unknown. In the rare case when such a system does fail during operation, the failure time is observed and the data-monitoring samples coming from this system are labelled (i.e., the true RUL is known) (Berghout et al., 2020). This mix of very few labelled and many unlabelled data samples is often seen for complex aircraft systems.

Common RUL prognostic models are Convolutional Neural networks (de Pater et al., 2022; Shen et al., 2021; de Pater and Mitici, 2022) and Long Short-Term Memory (LSTM) Neural networks (Xiang et al., 2020), which directly predict the RUL. However, such supervised learning methods require the availability of many labelled data samples to train accurate prognostic models. This makes these supervised learning approaches unsuitable for complex, safety-critical aircraft systems.

Instead, accurate RUL prognostics can be obtained by first developing a health indicator using the unlabelled data samples and signal reconstruction, i.e., an autoencoder learns the normal system behaviour from the unlabelled data samples (Fink et al., 2020; Malhotra et al., 2016). This autoencoder is then used to detect deviations from the normal system behaviour that emerge due to increasing degradation (Fink et al., 2020). This approach has been considered in, for instance, Malhotra et al. (2016) and Ye and Yu (2021). In Malhotra et al. (2016), a health indicator is developed using the reconstruction error of a LSTM autoencoder and a linear regression model. Similarly, in Ye and Yu (2021), a health indicator is obtained based on the reconstruction errors of a LSTM autoencoder and a Gaussian distribution. In contrast, in Gugulothu et al. (2017), Yu et al. (2019), Fu et al. (2021) and in Zhai et al. (2021), the embeddings of a recurrent autoencoder and a conditional variational autoencoder, respectively, are used to develop a health indicator. In Liu et al. (2020), a health indicator for low-frequency time-series is developed using both the reconstruction error

∗ Corresponding author.
E-mail address: [email protected] (I. de Pater).

https://doi.org/10.1016/j.engappai.2022.105582
Received 15 July 2022; Received in revised form 19 October 2022; Accepted 28 October 2022
Available online 7 November 2022
0952-1976/© 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).

and the embeddings of a LSTM autoencoder. With such health indicators, accurate RUL prognostics are obtained with just a few labelled data samples.

However, the autoencoders mentioned above cannot be directly applied to develop health indicators for complex aircraft systems. First, health-monitoring sensor measurements are usually recorded at a high frequency during flight. Moreover, the flight itself is several hours long. As such, the time-series of measurements of each flight is long. For the case study presented in this paper, the aircraft system performs many flights, each containing 60-303 time-steps. In contrast, existing studies consider autoencoders that use fixed-length data samples of only 5 (Yu et al., 2019) to 40 (Liu et al., 2020) time-steps.

Also, the conditions under which a system operates are expected to influence the degradation of the system (Wei et al., 2021). This is especially the case for aircraft, where the operating conditions vary due to weather conditions, flying routes, flying altitudes, etc. (Wang et al., 2022). One of the major open challenges for Prognostics and Health Management (PHM), in aviation and in other application domains, is therefore to develop health indicators and RUL prognostics that are robust to varying operating conditions (Fink et al., 2020; Ochella et al., 2022; Koutroulis et al., 2022).

When proposing health indicators using an autoencoder, only one study accounts for the operating regime: a high-level cluster of similar operating conditions (Zhai et al., 2021). The operating conditions of aircraft, however, are highly-varying during each flight. For example, the altitude of an aircraft continuously changes during a flight. Considering only a few aggregated clusters as in Zhai et al. (2021) would thus lead to a loss of information (Fink et al., 2020).

To develop a health indicator for systems with high-frequency measurements and highly-varying operating conditions, we propose to use the reconstruction error of a LSTM autoencoder (LSTM-AE) with (i) attention and (ii) integrated operating conditions. LSTM neural networks are well suited to process varying-length time-series, while avoiding the vanishing gradient problem (Vasilev, 2019). However, a standard recurrent autoencoder cannot reconstruct long time-series of sensor data well: the final embedding of the autoencoder cannot contain all relevant features of a long input sample. Moreover, the final embedding contains more information about the last sensor measurements than about the first sensor measurements of the flight (Vasilev, 2019). In language-related applications with long sentences, these problems have been successfully alleviated by implementing attention, giving state-of-the-art results (Fink et al., 2020). Inspired by this, we also apply attention in the LSTM-AE.

To develop health indicators robust to the highly-varying operating conditions, we input the operating conditions in the autoencoder. These operating conditions are not encoded and then reconstructed, but merely used ``informatively'', i.e., the LSTM-AE is informed on the aircraft operating conditions solely to assist in encoding and reconstructing the sensor data samples. In contrast with Zhai et al. (2021), no information on the operating conditions is thus lost by clustering the operating conditions.

We apply our methodology to develop health indicators and RUL prognostics for the aircraft turbofan engines of the new N-CMAPSS dataset (Arias Chao et al., 2021). An overview of the considered approach is in Fig. 1. The obtained health indicators have a high monotonicity (0.38), trendability (0.95) and prognosability (0.94). We show that the monotonicity of the health indicators decreases by 97% when the operating conditions are not considered in the LSTM-AE. Also, the monotonicity decreases by 11% when no attention is applied in the LSTM-AE.

Fig. 1. Schematic overview of the considered approach.

Having the health indicators, we divide the lifetime of each engine into a healthy and an unhealthy stage. Last, we estimate the RUL of the engines in the unhealthy stage with a similarity-based matching method, using the health indicators and the few available labelled data samples (Yu et al., 2020; Lyu et al., 2020; Malhotra et al., 2016). Due to the high monotonicity, trendability and prognosability of the health indicators, the overall RMSE of these RUL prognostics in the unhealthy stage is only 2.67 flights.

The main contributions of this study are:

1. We propose an unsupervised learning approach for health indicator construction and RUL prognostics for systems with very few labelled data samples, i.e., very few failures. Such systems are rarely operated until failure, but rather replaced preventively. We show that our approach outperforms existing supervised learning methods for the considered system. Specifically, the RMSE of the RUL estimates is 19% lower compared to existing supervised learning methods.

2. We develop health indicators by integrating the highly-varying operating conditions of the system in a LSTM autoencoder. This makes the health indicators robust to these highly-varying operating conditions and significantly improves their monotonicity, trendability and prognosability. Moreover, the obtained health indicators have a high trendability even when the operating conditions from the test set differ significantly from the operating conditions in the training set.

3. We use attention in the LSTM autoencoder to handle the high-frequency measurements gathered during the long flights of the considered system. We show that using attention improves the monotonicity of the health indicators by 11%. This is particularly relevant for novel technical systems whose health is monitored continuously and at a high frequency.

The remainder of this paper is organized as follows. We first introduce the proposed methodology to construct the health indicators in Section 2. Then, we introduce the considered N-CMAPSS dataset in Section 3, and present the resulting health indicators in Section 4. Last, we introduce the similarity-based matching approach to develop RUL prognostics in Section 5, and analyse the RUL prognostics in Section 6. The conclusions are provided in Section 7.

2. Methodology — constructing health indicators with a LSTM autoencoder

In Section 2.1, we introduce the LSTM-AE with attention and integrated operating conditions. In Section 2.2, we use the reconstruction errors from this autoencoder to construct a health indicator.

2.1. LSTM-AE with local Luong attention and informative operating conditions

In this section, we introduce the Long Short-Term Memory autoencoder (LSTM-AE). Let $\mathbf{X}^{e,f} = \{\mathbf{X}^{e,f}_t, t \in \{1, 2, \ldots, n^{e,f}\}\}$ be the multi-sensor measurements of an aircraft system $e$ during a flight $f$,

Fig. 2. A schematic overview of the considered LSTM-AE with informative operating conditions.

with $n^{e,f}$ the number of multi-sensor measurements of flight $f$. Here, $\mathbf{X}^{e,f}_t = [X^{e,f,1}_t, X^{e,f,2}_t, \ldots, X^{e,f,m^{\mathrm{s}}}_t]$ is the $t$th multi-sensor measurement of this flight $f$, with $m^{\mathrm{s}}$ the number of sensors considered. The LSTM-AE first consists of an encoder, which maps the multi-sensor measurements $\mathbf{X}^{e,f}$ to an embedding with a smaller dimension (i.e., encodes), and then a decoder, which reconstructs these measurements from this embedding. The objective of the LSTM-AE is to minimize the total absolute reconstruction error $\mathcal{L}$ of each flight:

$$\mathcal{L} = \sum_{t=2}^{n^{e,f}} \left| \hat{\mathbf{X}}^{e,f}_t - \mathbf{X}^{e,f}_t \right|, \tag{1}$$

with $\hat{\mathbf{X}}^{e,f}_t$ the reconstructed sensor measurements $\mathbf{X}^{e,f}_t$ at time-step $t$ of system $e$ during flight $f$, i.e., the output of the LSTM-AE. We train this LSTM-AE solely with the unlabelled sensor data samples from a just-installed aircraft system, i.e., when the system is still considered healthy.

Besides the multi-sensor measurements, the operating conditions during each flight are also available. Let $\mathbf{O}^{e,f} = \{\mathbf{O}^{e,f}_t, t \in \{1, 2, \ldots, n^{e,f}\}\}$ be the operating conditions during flight $f$ with system $e$. Here, $\mathbf{O}^{e,f}_t = [O^{e,f,1}_t, O^{e,f,2}_t, \ldots, O^{e,f,m^{\mathrm{o}}}_t]$ denotes the operating conditions at time-step $t$ during this flight, and $m^{\mathrm{o}}$ is the number of operating conditions. The operating conditions are used as input for both the encoder and the decoder. But in contrast with the sensor measurements, the operating conditions are not encoded and then reconstructed, but merely used ``informatively'': they assist in encoding and decoding the sensor measurements. A schematic overview of the considered LSTM-AE is in Fig. 2.

2.1.1. Encoder

At each time step $t$ during a flight $f$, the goal of the encoder is to encode the multi-sensor measurement $\mathbf{X}^{e,f}_t$ to the short-term state $h_t$, which has a smaller dimension. The encoder consists of $n^{e,f}$ LSTM-cells for this flight $f$ (Hochreiter and Schmidhuber, 1997; Gers et al., 2000). At time step $t$, we consider as input to the LSTM-cell (i) the long-term state $c_{t-1}$ and the short-term state $h_{t-1}$ of the previous time-step $t-1$, (ii) the multi-sensor measurement $\mathbf{X}^{e,f}_t$, and (iii) the operating conditions $\mathbf{O}^{e,f}_t$. Each LSTM-cell consists of 3 gates (see Fig. 3).

The forget gate determines which part of the long-term state is (partly) erased (Vasilev, 2019; Géron, 2018):

$$g_t = \sigma\left( W_g \mathbf{X}^{e,f}_t + V_g \mathbf{O}^{e,f}_t + U_g h_{t-1} + b_g \right). \tag{2}$$

Here, $W_g$, $V_g$ and $U_g$ are the weight matrices connecting $\mathbf{X}^{e,f}_t$, $\mathbf{O}^{e,f}_t$ and $h_{t-1}$ to the output $g_t$ respectively, and $b_g$ is the bias of this layer. Also, $\sigma(\cdot)$ denotes the logistic activation function.

Fig. 3. A schematic overview of a LSTM-cell with informative operating conditions.

The input gate first proposes a new candidate long-term state $c^{\mathrm{can}}_t$ (Vasilev, 2019; Géron, 2018):

$$c^{\mathrm{can}}_t = \tanh\left( W_c \mathbf{X}^{e,f}_t + V_c \mathbf{O}^{e,f}_t + U_c h_{t-1} + b_c \right), \tag{3}$$

where $W_c$, $V_c$ and $U_c$ are the weight matrices connecting $\mathbf{X}^{e,f}_t$, $\mathbf{O}^{e,f}_t$ and $h_{t-1}$ to the output $c^{\mathrm{can}}_t$ respectively, and $b_c$ is the bias of this layer. A tanh (hyperbolic tangent) activation function is considered. Next, the input gate determines which parts of the candidate long-term state are added to the new long-term state (Vasilev, 2019; Géron, 2018):

$$i_t = \sigma\left( W_i \mathbf{X}^{e,f}_t + V_i \mathbf{O}^{e,f}_t + U_i h_{t-1} + b_i \right). \tag{4}$$

Again, $W_i$, $V_i$ and $U_i$ are the weight matrices connecting $\mathbf{X}^{e,f}_t$, $\mathbf{O}^{e,f}_t$ and $h_{t-1}$ to the output $i_t$ respectively, and $b_i$ is the bias of this layer.

Last, we update the long-term state with the output of the forget and input gates as follows (Vasilev, 2019; Géron, 2018):

$$c_t = \left( c_{t-1} \otimes g_t \right) \oplus \left( i_t \otimes c^{\mathrm{can}}_t \right), \tag{5}$$

where $\oplus$ and $\otimes$ denote element-wise addition and element-wise multiplication, respectively.

The output gate constructs the short-term state $h_t$. The output gate first determines which parts of the long-term state $c_t$ are transferred to the short-term state (Vasilev, 2019; Géron, 2018):

$$p_t = \sigma\left( W_p \mathbf{X}^{e,f}_t + V_p \mathbf{O}^{e,f}_t + U_p h_{t-1} + b_p \right). \tag{6}$$

Here, $W_p$, $V_p$ and $U_p$ are the weight matrices connecting $\mathbf{X}^{e,f}_t$, $\mathbf{O}^{e,f}_t$ and $h_{t-1}$ to the output $p_t$ respectively, and $b_p$ is the bias of this layer.
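As a concrete illustration, the gate equations (2)–(6), together with the state updates of Eqs. (5) and (7), can be written in a few lines of NumPy. The sketch below uses random toy weights and assumed dimensions (13 sensors, 4 operating conditions, hidden size 4, matching the case study); it is an illustrative reconstruction, not the authors' trained implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_with_conditions(x_t, o_t, h_prev, c_prev, p):
    """One LSTM-cell step (Eqs. (2)-(7)): the operating conditions o_t
    enter every gate through a separate weight matrix V."""
    g_t = sigmoid(p["Wg"] @ x_t + p["Vg"] @ o_t + p["Ug"] @ h_prev + p["bg"])    # forget gate, Eq. (2)
    c_can = np.tanh(p["Wc"] @ x_t + p["Vc"] @ o_t + p["Uc"] @ h_prev + p["bc"])  # candidate state, Eq. (3)
    i_t = sigmoid(p["Wi"] @ x_t + p["Vi"] @ o_t + p["Ui"] @ h_prev + p["bi"])    # input gate, Eq. (4)
    c_t = c_prev * g_t + i_t * c_can                                             # long-term state, Eq. (5)
    p_t = sigmoid(p["Wp"] @ x_t + p["Vp"] @ o_t + p["Up"] @ h_prev + p["bp"])    # output gate, Eq. (6)
    h_t = p_t * np.tanh(c_t)                                                     # short-term state, Eq. (7)
    return h_t, c_t

# Toy dimensions: 13 sensors, 4 operating conditions, hidden size 4.
rng = np.random.default_rng(0)
m_s, m_o, n_h = 13, 4, 4
params = {}
for k in "gcip":
    params["W" + k] = rng.normal(scale=0.1, size=(n_h, m_s))
    params["V" + k] = rng.normal(scale=0.1, size=(n_h, m_o))
    params["U" + k] = rng.normal(scale=0.1, size=(n_h, n_h))
    params["b" + k] = np.zeros(n_h)

h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):  # encode a short flight segment
    h, c = lstm_cell_with_conditions(rng.normal(size=m_s), rng.normal(size=m_o), h, c, params)
print(h.shape)  # (4,)
```

The only difference with a standard LSTM cell is the extra $V \mathbf{O}^{e,f}_t$ term in every gate, so the conditions steer the encoding without being reconstructed themselves.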

The new short-term state $h_t$ is now constructed as (Vasilev, 2019; Géron, 2018):

$$h_t = p_t \otimes \tanh(c_t). \tag{7}$$

2.1.2. Decoder

The decoder reconstructs the sensor measurements $\mathbf{X}^{e,f}_t$, $t \in \{2, \ldots, n^{e,f}\}$ using the short-term states from the encoder. We first obtain at each time-step $t$ of flight $f$ the short-term state $h'_t$ of the decoder with a LSTM-cell. Next, local Luong attention is used to update the short-term state $h'_t$ to the augmented short-term state $\tilde{h}'_t$. Last, we input the augmented short-term state together with the operating conditions at time-step $t$ to a fully connected neural network. This network outputs the reconstructed sensor measurements $\hat{\mathbf{X}}^{e,f}_t$ (see Fig. 2).

Recurrent layer. The first layer of the decoder consists of $n^{e,f} - 1$ LSTM-cells. At time-step $t$ of flight $f$, we consider the decoder short-term state $h'_{t-1}$ and the decoder long-term state $c'_{t-1}$ of time-step $t-1$ as input to the LSTM-cell. If $t = 2$, we consider the last short-term state $h_{n^{e,f}}$ and long-term state $c_{n^{e,f}}$ of the encoder as input instead. Moreover, we input the previous reconstructed sensor measurements $\hat{\mathbf{X}}^{e,f}_{t-1}$ during the testing phase. During the training phase, we use teacher forcing instead (Vasilev, 2019), i.e., we input the true sensor measurements $\mathbf{X}^{e,f}_{t-1}$. If $t = 2$, we always input the true sensor measurements from time-step $t = 1$. Last, we use the operating conditions $\mathbf{O}^{e,f}_t$ of time-step $t$ as input, to assist in decoding the sensor measurements. The output of the LSTM-cell is the decoder short-term state $h'_t$ and the decoder long-term state $c'_t$.

Local Luong attention. The dimension of the last encoder short-term state $h_{n^{e,f}}$ is too small to contain all relevant features of a long flight. Moreover, the last encoder hidden state contains more information about the last sensor measurements than about the first sensor measurements of the flight (Vasilev, 2019). We therefore use local Luong attention (Luong et al., 2015) to update the decoder short-term states with all the encoder short-term states $h_t$, $t \in \{1, 2, \ldots, n^{e,f}\}$. A schematic overview of the local Luong attention mechanism is in Fig. 4.

Fig. 4. Schematic overview of the local Luong attention mechanism.

First, we compute how well the initial short-term state $h'_t$ of the decoder aligns with the encoder short-term states $h_j$, $j \in \{t-D, t-D+1, \ldots, t+D\}$. Here, $D$ is the window size of the local attention mechanism. The alignment score $a_{j,t}$ between the decoder short-term state $h'_t$ and the encoder short-term state $h_j$ is determined as (Luong et al., 2015):

$$a_{j,t} = h'^{T}_t W_a h_j, \quad j \in \{t-D, \ldots, t+D\}. \tag{8}$$

Here, $W_a$ is the weight matrix belonging to the attention mechanism, and $T$ denotes the transpose. With these alignment scores, the weights $\bar{a}_{j,t}$ are derived with the softmax function (Luong et al., 2015):

$$\bar{a}_{j,t} = \frac{e^{a_{j,t}}}{\sum_{k=t-D}^{t+D} e^{a_{k,t}}}, \quad j \in \{t-D, \ldots, t+D\}. \tag{9}$$

Next, the weights are used to derive the context vector $v_t$ (Luong et al., 2015):

$$v_t = \sum_{j=t-D}^{t+D} \bar{a}_{j,t} h_j. \tag{10}$$

With this context vector, the decoder short-term state is updated with one fully connected layer (Luong et al., 2015):

$$\tilde{h}'_t = \tanh\left( W_h [v_t, h'_t] \right), \tag{11}$$

with $W_h$ a weight matrix belonging to this fully connected layer.

Fully connected layers. Last, we reconstruct the sensor measurements at time-step $t$ with $l$ fully connected layers. As input, we use both the augmented short-term state $\tilde{h}'_t$ and the operating conditions $\mathbf{O}^{e,f}_t$. By adding the operating conditions as input, we ensure that it is not useful for the augmented short-term states $\tilde{h}'_t$ to contain any information on the current operating conditions. We thus truly aim to encode and decode the sensor measurements only. Here, the first $l-1$ layers have the tanh activation function. The last layer has a linear activation function, and contains $m^{\mathrm{s}}$ nodes.

2.2. Constructing a health indicator with the reconstruction errors of the LSTM-AE

We train the LSTM-AE only with the unlabelled sensor data samples of just-installed aircraft systems, i.e., from systems that are considered healthy. We therefore expect that the reconstruction errors increase when a system degrades over time (Malhotra et al., 2016). The reconstruction errors of the trained LSTM-AE are thus used to derive a health indicator.

Let $\mathcal{L}^{e,s}_f$ be the mean reconstruction loss of a sensor $s$ during flight $f$ performed by system $e$:

$$\mathcal{L}^{e,s}_f = \frac{1}{n^{e,f} - 1} \sum_{t=2}^{n^{e,f}} \left| \hat{X}^{e,f,s}_t - X^{e,f,s}_t \right|, \tag{12}$$

with $\hat{X}^{e,f,s}_t$ the $t$th reconstructed measurement of sensor $s$ during flight $f$ performed by system $e$. Let $\mathcal{L}^{e,s} = \{\mathcal{L}^{e,s}_f, f \in \{1, 2, \ldots, F^e\}\}$ be the time-series of the reconstruction loss for a sensor $s$ that monitors system $e$. Then, we define $\lambda^e = \{\lambda^e_f, f \in \{1, 2, \ldots, F^e\}\}$ as the health indicator of an engine $e$, with

$$\lambda^e_f = \sum_{s=1}^{m^{\mathrm{s}}} \mathcal{L}^{e,s}_f. \tag{13}$$

3. Case study — aircraft engines

In this section, we first describe the considered data set in Section 3.1. Then, we describe the preprocessing of the data in Section 3.2, and illustrate the dataset in Section 3.3. Last, we introduce the metrics to evaluate the health indicators in Section 3.4.
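To make the attention computation of Section 2.1.2 concrete, Eqs. (8)–(11) amount to a few lines of array code. The sketch below uses random toy weights and clips the local window at the flight borders — a boundary choice the paper does not specify, so treat it as an assumption rather than the authors' implementation.

```python
import numpy as np

def local_luong_attention(h_dec_t, H_enc, t, D, W_a, W_h):
    """Local Luong attention, Eqs. (8)-(11): align the decoder state at
    time t with the encoder states in the window [t-D, t+D]."""
    n = H_enc.shape[0]
    idx = list(range(max(0, t - D), min(n, t + D + 1)))  # clip window at flight borders (assumption)
    H_win = H_enc[idx]                                   # encoder states h_j in the window
    scores = H_win @ W_a @ h_dec_t                       # alignment scores a_{j,t}, Eq. (8)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                             # softmax weights, Eq. (9)
    v_t = weights @ H_win                                # context vector v_t, Eq. (10)
    return np.tanh(W_h @ np.concatenate([v_t, h_dec_t]))  # augmented state, Eq. (11)

# Toy example: hidden size 4, window size D = 5, a flight of 60 time-steps.
rng = np.random.default_rng(1)
n_h, D, n_steps = 4, 5, 60
H_enc = rng.normal(size=(n_steps, n_h))        # encoder short-term states
W_a = rng.normal(scale=0.1, size=(n_h, n_h))   # attention weight matrix
W_h = rng.normal(scale=0.1, size=(n_h, 2 * n_h))
h_aug = local_luong_attention(rng.normal(size=n_h), H_enc, t=30, D=D, W_a=W_a, W_h=W_h)
print(h_aug.shape)  # (4,)
```

Subtracting the maximum score before exponentiating does not change the softmax in Eq. (9); it only makes the computation numerically stable.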

Fig. 5. Heatmap of the correlation between the sensor measurements and operating conditions — training engines of DS02, N-CMAPSS.

Table 1
Sensors selected based on the correlation. LPC — Low Pressure Compressor. HPC — High Pressure Compressor. HPT — High Pressure Turbine. LPT — Low Pressure Turbine.

Symbol   Description                        Unit
Wf       Fuel flow                          pps
Nf       Physical fan speed                 rpm
T24      Total temperature at LPC outlet    °R
T30      Total temperature at HPC outlet    °R
T48      Total temperature at HPT outlet    °R
T50      Total temperature at LPT outlet    °R
P2       Total pressure at fan inlet        psia
P50      Total pressure at LPT outlet       psia
W21      Fan flow                           pps
W50      Flow out of LPT                    lbm/s
SmFan    Fan stall margin                   –
SmLPC    LPC stall margin                   –
SmHPC    HPC stall margin                   –

3.1. Aircraft engines in the N-CMAPSS data set

We consider dataset DS02 of the new N-CMAPSS data set (Arias Chao et al., 2021). Here, the degradation of aircraft turbofan engines is simulated with the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) model of NASA (Arias Chao et al., 2021). There are some key differences between this new data set and the previous C-MAPSS data set (see Saxena and Goebel (2008)).

First, dataset DS02 of N-CMAPSS contains a limited number of engines: the training set contains only 6 engines. The test set contains 3 engines, namely engines 11, 14 and 15. For each engine $e$ in the training and the test set of DS02, sensor measurements are available during each flight $f$ from engine installation until engine failure (i.e., run-to-failure instances). Let $F^e$ denote the number of flights performed by engine $e$. Besides the sensor measurements, N-CMAPSS also contains the operating conditions of the flights of the aircraft. A total of $m^{\mathrm{o}} = 4$ operating conditions are available: the altitude of the aircraft (alt), the flight Mach number (Mach), the throttle-resolver angle (TRA) and the total temperature at the fan inlet (T2). Last, N-CMAPSS contains high-frequency sensor measurements, with one measurement per sensor/operating condition per second.

In the beginning of the engine's lifetime, the N-CMAPSS simulator generates sensor measurements using a linear, slow degradation model. Afterwards, an exponential, accelerated degradation model is used to simulate the sensor measurements instead (Arias Chao et al., 2021). The degradation of the engines when the linear, slow degradation model is used is still small, so we consider these sensor measurements as coming from ``healthy'', just-installed engines. We thus train our LSTM-AE only with the sensor measurements obtained with this slow degradation model. Let $f^e_{\mathrm{a}}$ be the last flight for which the sensor measurements are generated with the slow degradation model.

3.2. Data preprocessing

The training dataset consists of 6 engines, which together perform 446 flights. With 28 sensors and $4311 \le n^{e,f} \le 18\,169$ measurements per sensor per flight, we have a total of 147 million data points in the training data set. However, most of the measurements of the 28 sensors are highly correlated. For example, the flow out of the low pressure turbine and the flow out of the high pressure turbine have a correlation of 1.00. To reduce the computational load when training the LSTM-AE, without compromising the information contained in the sensor measurements, we select only one of two or more sensors that have a correlation of 0.99 or higher. This results in the selection of $m^{\mathrm{s}} = 13$ sensors from the available 28 sensors (see Table 1). With this, the number of sensor measurements considered in the training dataset is reduced to 68 million.

To further reduce the computational load for training the LSTM-AE, we aggregate the sensor measurements and operating conditions per minute. In other words, we consider the mean measurement and operating condition per minute. This reduces the number of sensor measurements in the training set to 1.143 million.

Moreover, the sensor measurements and the operating conditions are normalized using min–max normalization:

$$\frac{2 \cdot \left(X^{e,f,s}_t - X^{s}_{\min}\right)}{X^{s}_{\max} - X^{s}_{\min}} - 1, \tag{14}$$

$$\frac{2 \cdot \left(O^{e,f,o}_t - O^{o}_{\min}\right)}{O^{o}_{\max} - O^{o}_{\min}} - 1, \tag{15}$$

where $X^{s}_{\min}$/$O^{o}_{\min}$ and $X^{s}_{\max}$/$O^{o}_{\max}$ are the minimum and maximum measurement of sensor $s$/operating condition $o$ in the training set, respectively.

Last, there are only 101 flights in the training set where the sensor measurements are generated with the linear, slow degradation model. We therefore use data augmentation to increase the number of data samples for training the LSTM-AE (Chao et al., 2022). For each flight $f \le f^e_{\mathrm{a}}$ performed by engine $e$, we consider time-windows with a size of $60, 70, 80, \ldots, n^{e,f}-10, n^{e,f}$ time-steps. These time-windows are rolled over flight $f$ of engine $e$ with a step size (i.e., stride) of 5 min. In this way, we extract, for each time-window size, several time-series of multi-sensor measurements (i.e., data samples) from flight $f$ of engine $e$. With this approach, 25,433 data samples are obtained to train the LSTM-AE.

3.3. Illustration of N-CMAPSS data set

Fig. 6(a) shows the normalized operating conditions during the first flight of engine 2 from the training dataset. Fig. 6(b) shows the normalized sensor measurements of sensors SmHPC, Nf, T48 and P50. These figures and the correlation heatmap in Fig. 5 show that the sensor measurements are highly correlated with the operating conditions. For example, the correlation between the total pressure at the LPT outlet (P50) and the altitude of the aircraft (alt) is −0.98 (see Fig. 5).

Fig. 7 shows the mean normalized sensor measurement per flight, for all flights performed by engine 2 and for sensors SmHPC, Nf, T48 and P50. These mean sensor measurements do not exhibit a clear trend towards failure. A more extensive analysis is thus necessary to obtain a health indicator.
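The preprocessing of Section 3.2 — min–max normalization to $[-1, 1]$ (Eqs. (14)–(15)) and the rolling-window augmentation — can be sketched as follows. The window-size grid and the 5-time-step stride follow the text; the authors' exact windowing code is not given, so this is an illustrative reconstruction.

```python
import numpy as np

def normalize(x, x_min, x_max):
    """Min-max normalization to [-1, 1], Eqs. (14)-(15)."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def augment_flight(X_flight, stride=5, min_size=60, size_step=10):
    """Rolling-window augmentation (Section 3.2): for window sizes
    60, 70, ..., n, slide each window over the flight with the given stride."""
    n = X_flight.shape[0]
    samples = []
    for size in range(min_size, n + 1, size_step):
        for start in range(0, n - size + 1, stride):
            samples.append(X_flight[start:start + size])
    return samples

print(normalize(np.array([0.0, 5.0, 10.0]), 0.0, 10.0))  # [-1.  0.  1.]
flight = np.zeros((90, 13))            # toy flight: 90 one-minute steps, 13 sensors
samples = augment_flight(flight)
print(len(samples), samples[0].shape)  # 16 (60, 13)
```

For a 90-step toy flight this yields window sizes 60, 70, 80 and 90, and the stride of 5 then produces 7 + 5 + 3 + 1 = 16 data samples, illustrating how 101 healthy flights can grow into thousands of training samples.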

Fig. 6. Normalized operating conditions and normalized sensor measurements — flight 1, training engine 2 of DS02, N-CMAPSS.

Fig. 7. Mean normalized sensor measurement per flight — training engine 2 of DS02, N-CMAPSS.

Table 2
Considered hyperparameters of the LSTM-AE.

Hyperparameter                                                          Value
Hyperparameters — architecture
  Hidden size $h_t$, $c_t$, $h'_t$ and $c'_t$                           4
  Window-size $D$                                                       5
  Number of fully connected layers $l$                                  3
  Number of nodes in first $l-1$ fully connected layers                 128
Hyperparameters — optimization
  Optimizer                                                             Adam (Kingma and Ba, 2014)
  Number of epochs                                                      100
  Training–Validation split                                             90%–10%
  Initial learning rate                                                 0.01
  Decrease learning rate when no improvement in validation loss
  for ... epochs in a row                                               10
  Decrease learning rate by                                             1/10

3.4. Metrics to evaluate the health indicators

We evaluate the health indicators with the monotonicity (),


trendability ( ) and prognosability () metrics as follows: 4. Results — health indicator for aircraft engines
Monotonicity. We measure the monotonicity  of the health indicator
𝜆𝑒 of an engine 𝑒 as follows (Liu et al., 2020): In this section, we present the health indicators developed for the
engines in DS02, N-CMAPSS. We first describe the hyperparameters
|∑ 𝐹 𝑒 −1 |
|
1 | of the LSTM-AE in Section 4.1, and then the sensor selection in Sec-
= | 𝐼(𝜆𝑒𝑓 +1 − 𝜆𝑒𝑓 ) − 𝐼(𝜆𝑒𝑓 − 𝜆𝑒𝑓 +1 )|| ,
summed over f = 1, … , F^e − 1, with the indicator function I(x) defined as I(x) = 1 if x > 0, and I(x) = 0 if x ≤ 0.

Trendability. We consider the Spearman correlation coefficient between the health indicator λ^e and the flights {1, 2, … , F^e} to measure the trendability 𝒯 for an engine e (Lei et al., 2018; Koutroulis et al., 2022):

𝒯 = \frac{F^e \sum_{f=1}^{F^e} r_f^{\lambda^e} f - \left( \sum_{f=1}^{F^e} r_f^{\lambda^e} \right) \left( \sum_{f=1}^{F^e} f \right)}{\sqrt{F^e \sum_{f=1}^{F^e} \left( r_f^{\lambda^e} \right)^2 - \left( \sum_{f=1}^{F^e} r_f^{\lambda^e} \right)^2} \cdot \sqrt{F^e \sum_{f=1}^{F^e} f^2 - \left( \sum_{f=1}^{F^e} f \right)^2}},   (16)

where r_f^{λ^e}, f ∈ {1, 2, … , F^e}, is the rank sequence of the health indicator λ^e.

Prognosability. We consider the following prognosability metric 𝒫 (also called consistency) (Lei et al., 2018; Liu et al., 2020):

𝒫 = \exp \left[ \frac{-\mathrm{STD}\left(\lambda_{F^e}^e, \; e \in E^{\mathrm{test}}\right)}{\frac{1}{|E^{\mathrm{test}}|} \sum_{e \in E^{\mathrm{test}}} \left| \lambda_1^e - \lambda_{F^e}^e \right|} \right],   (17)

where STD(λ_{F^e}^e, e ∈ E^test) is the standard deviation of the last health indicator values λ_{F^e}^e, e ∈ E^test (with F^e the last flight of engine e), and E^test is the set with the test engines of DS02.

tion 4.2. Next, we present the health indicators from the LSTM-AE in Section 4.3. Last, we compare our approach with other methods in Section 4.4.

4.1. Hyperparameters of LSTM-AE

Table 2 shows the considered hyperparameters for the LSTM-AE. The architecture is derived using a grid search. After training, we select the weights belonging to the lowest validation loss. Using a computer with an Intel Core i7 processor (4 CPU cores) and 8 GB RAM, it took on average 4.04 min to train the LSTM-AE for one epoch.

4.2. Sensor selection for constructing a health indicator

Fig. 8 shows the reconstructed measurements of sensors T48 and P50 for the first and the last flight of training engine 2. During the first flight, the reconstructed measurements are very close to the actual measurements for both sensors. In contrast, the reconstructed measurements of sensor T48 deviate considerably from the actual measurements during the last flight of engine 2 (Fig. 8(c)). This is expected, since the engine is severely degraded just before failure, while we train the LSTM-AE with sensor measurements from slightly-degraded engines only. This does not hold, however, for all sensors: the reconstructed measurements of sensor P50 are still very close to the actual sensor measurements (Fig. 8(d)).

The different trends towards the time of failure of sensors T48 and P50 are also shown in Fig. 9. The reconstruction loss of sensor

I. de Pater and M. Mitici Engineering Applications of Artificial Intelligence 117 (2023) 105582

Fig. 8. Actual and reconstructed sensor measurements with the LSTM-AE — first and last flight, training engine 2 of DS02, N-CMAPSS.

Fig. 9. Mean loss ℒ_f^{e,s} per flight with the LSTM-AE — training engine 2 of DS02, N-CMAPSS.

Fig. 10. Health indicator with the LSTM-AE — test engines 11, 14 and 15 of DS02, N-CMAPSS.

T48 monotonically increases towards failure, while the reconstruction loss of sensor P50 resembles random noise. Let 𝒯^{e,s} be the Spearman correlation coefficient (see Eq. (16)) between the reconstruction loss ℒ_f^{e,s} of sensor s monitoring engine e and the flights f, i.e., the operating time. Let 𝒯^s be the mean over the Spearman correlation coefficients 𝒯^{e,s} for sensor s, where the mean is taken over all training engines e. This mean correlation is close to 1 for sensors for which the loss clearly increases towards failure. In contrast, it is close to 0 for sensors for which the loss shows no trend towards failure. To construct a health indicator, we include in Eq. (13) only those sensors for which the mean Spearman correlation 𝒯^s between the reconstruction loss and the flights is 0.5 or larger. In this way, we do not construct the health indicator with sensors that are very weakly correlated with the time to failure, such as sensor P50: these sensors add little to no information on the degradation to the health indicator.

4.3. Health indicators of the test engines

Fig. 10 shows the obtained health indicators, and Table 3 shows the sensors selected to construct the health indicators and the health indicator metrics. The three test engines fail when the health indicator equals roughly 0.8/0.9, which is reflected by the high prognosability of 0.94. The increasing trend of the health indicators towards failure is reflected by the mean trendability of 0.95. Some small noise is visible in the health indicators, which is reflected by the mean monotonicity of 0.38.

The operating conditions for test engine 11 are similar to the operating conditions of the engines in the training set, while the operating conditions of test engines 14 and 15 are different than in the training set (Arias Chao et al., 2021). The monotonicity for engines 14 and 15 is indeed lower than for engine 11. However, the trendability is still high for all three test engines. Our approach thus achieves a high trendability and prognosability even when the operating conditions are different than in the training set.

4.4. Comparison with other autoencoders

We compare the health indicators from the proposed approach with the health indicators from several other methodologies: with other recurrent autoencoders, with the LSTM-AE without attention or operating conditions, and with standard, non-recurrent autoencoders. The results for these other methodologies are in Table 3 as well.
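For reference, the three health-indicator metrics reported in Table 3 can be computed as in the following sketch (the function names, the array-based interface and the no-ties assumption in the rank computation are our own; ℳ follows the standard monotonicity metric of Lei et al. (2018), 𝒯 follows Eq. (16) and 𝒫 follows Eq. (17)):

```python
import numpy as np

def monotonicity(hi):
    """M: absolute difference between the fractions of increasing and
    decreasing one-flight steps of a health indicator series."""
    d = np.diff(hi)                       # lambda_{f+1} - lambda_f
    n_up = int(np.sum(d > 0))             # steps where I(d) = 1
    n_down = int(np.sum(d < 0))           # steps where I(-d) = 1
    return abs(n_up - n_down) / (len(hi) - 1)

def trendability(hi):
    """T: Spearman correlation between the indicator and the flight
    index, following Eq. (16); assumes no tied values in `hi`."""
    n = len(hi)
    f = np.arange(1, n + 1, dtype=float)  # flights 1..F^e
    r = np.argsort(np.argsort(hi)) + 1.0  # rank sequence r_f
    num = n * np.sum(r * f) - np.sum(r) * np.sum(f)
    den = (np.sqrt(n * np.sum(r ** 2) - np.sum(r) ** 2)
           * np.sqrt(n * np.sum(f ** 2) - np.sum(f) ** 2))
    return num / den

def prognosability(his):
    """P: Eq. (17), with one run-to-failure indicator array per test
    engine; np.std is the population standard deviation."""
    last = np.array([hi[-1] for hi in his])          # lambda^e_{F^e}
    spread = np.mean([abs(hi[0] - hi[-1]) for hi in his])
    return float(np.exp(-np.std(last) / spread))
```

A strictly increasing health indicator yields ℳ = 𝒯 = 1, while pure noise pushes both towards 0; 𝒫 approaches 1 when all engines fail at the same indicator value.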


Table 3
Evaluation of the health indicators for various autoencoders — test engines 11, 14 and 15 of DS02, N-CMAPSS. ℳ — Monotonicity. 𝒯 — Trendability. 𝒫 — Prognosability. The best results are denoted in bold.

                 Selected sensors                                            | Engine 11  | Engine 14  | Engine 15  | Mean
                                                                             | ℳ     𝒯   | ℳ     𝒯   | ℳ     𝒯   | ℳ     𝒯     𝒫
Proposed method
LSTM-AE          W50, SmFan, SmLPC, SmHPC, Wf, T24, T30, T48, T50            | 0.48  0.97 | 0.31  0.91 | 0.36  0.97 | 0.38  0.95  0.94
Proposed method with other recurrent autoencoders
GRU-AE           W50, SmFan, SmLPC, SmHPC, Wf, T24, T30, T48, T50            | 0.21  0.94 | 0.23  0.79 | 0.12  0.93 | 0.18  0.89  0.94
BiGRU-AE         W50, SmLPC, SmHPC, T48, T50                                 | 0.24  0.88 | 0.23  0.84 | 0.03  0.79 | 0.17  0.84  0.95
BiLSTM-AE        W50, SmFan, SmLPC, SmHPC, Wf, T24, T30, T48, T50            | 0.52  0.97 | 0.33  0.90 | 0.30  0.97 | 0.38  0.94  0.94
Proposed method without operating conditions (no o.c.) or without attention (no att.)
LSTM-AE-no o.c.  SmLPC                                                       | 0.00  0.65 | 0.01  0.53 | 0.03  0.39 | 0.01  0.52  0.67
LSTM-AE-no att.  W50, SmFan, SmLPC, SmHPC, Wf, T24, T30, T48, T50            | 0.48  0.97 | 0.23  0.88 | 0.30  0.97 | 0.34  0.94  0.94
Other non-recurrent autoencoders
1D-CAE           SmHPC                                                       | 0.10  0.55 | 0.01  0.54 | 0.09  0.28 | 0.07  0.45  0.77
FAE              W21, W50, SmFan, SmLPC, SmHPC, Nf, T24, T30, T48, T50, P50  | 0.38  0.77 | 0.12  0.52 | 0.18  0.82 | 0.23  0.70  0.89
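The "Selected sensors" column in Table 3 follows the rule of Section 4.2: a sensor enters the health indicator only if the mean Spearman correlation 𝒯^s between its per-flight reconstruction loss and the flight index is at least 0.5. A minimal sketch of this rule (the dictionary layout, function names and no-ties assumption are our own):

```python
import numpy as np

def spearman_vs_time(loss_per_flight):
    """Spearman correlation (Eq. (16)) between per-flight reconstruction
    losses and the flight index; assumes no tied loss values."""
    n = len(loss_per_flight)
    f = np.arange(1, n + 1, dtype=float)
    r = np.argsort(np.argsort(loss_per_flight)) + 1.0   # rank sequence
    num = n * np.sum(r * f) - np.sum(r) * np.sum(f)
    den = (np.sqrt(n * np.sum(r ** 2) - np.sum(r) ** 2)
           * np.sqrt(n * np.sum(f ** 2) - np.sum(f) ** 2))
    return num / den

def select_sensors(losses, threshold=0.5):
    """`losses[sensor]` holds, per training engine, the array of mean
    reconstruction losses per flight for that sensor."""
    selected = []
    for sensor, per_engine in losses.items():
        mean_corr = np.mean([spearman_vs_time(l) for l in per_engine])
        if mean_corr >= threshold:       # mean Spearman correlation T^s
            selected.append(sensor)
    return selected
```

With this rule, a sensor whose loss steadily grows over the flights (such as T48) is kept, while a sensor whose loss fluctuates without trend (such as P50) is discarded.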

Other recurrent autoencoders. We compare the obtained health indicators with the health indicators from three other recurrent autoencoders: the Gated Recurrent Unit autoencoder (GRU-AE), the bidirectional GRU-AE (BiGRU-AE) and the bidirectional LSTM-AE (BiLSTM-AE). For each recurrent autoencoder, we also implement local Luong attention and integrate the operating conditions.

For the BiLSTM-AE, we consider a bidirectional LSTM encoder (Vasilev, 2019, Chapter 8). However, in the decoder, the reconstructed sensor measurements of time-step t are used as input in the LSTM-cell at time-step t + 1. We therefore cannot consider a bidirectional decoder as well. For the GRU-AE, we replace each LSTM-cell in the autoencoder by a GRU-cell (Vasilev, 2019, Chapter 7). For the BiGRU-AE, we replace each LSTM-cell in the BiLSTM-AE with a GRU-cell.

Table 3 shows the monotonicity, trendability and prognosability for the health indicators of the different recurrent autoencoders. With the (Bi)GRU-AE, the monotonicity and the trendability are considerably lower than with the LSTM-AE. The prognosability with the BiGRU-AE is 0.95, slightly higher than the prognosability of 0.94 with the LSTM-AE.

The monotonicity and the prognosability are the same with the LSTM-AE and the BiLSTM-AE. With the LSTM-AE, however, the mean trendability of 0.95 is slightly higher than the mean trendability of 0.94 for the BiLSTM-AE.

The same subset of sensors is selected to create the health indicators for the GRU-AE, the LSTM-AE and the BiLSTM-AE. For the BiGRU-AE, however, fewer sensors are selected to create the health indicators: here, sensors SmFan, Wf, T24 and T30 are not selected.

No operating conditions or no attention. We also analyse the health indicators when the operating conditions are not incorporated in the LSTM-AE, i.e., when the operating conditions O^{e,f} are completely removed from the autoencoder. When not considering the operating conditions, no sensor had a Spearman trendability for the training engines of 0.5 or higher. Instead, we only include sensor SmLPC, with the highest trendability of all sensors. Without incorporating the operating conditions, the monotonicity, trendability and prognosability of the health indicators decrease considerably (see Table 3).

Table 3 also shows the metrics when we do incorporate the operating conditions, but when no attention is used. The prognosability is the same with and without attention. The monotonicity, however, is lower when not using attention (0.34 instead of 0.38). Also the trendability is slightly lower (0.94 instead of 0.95).

Other non-recurrent autoencoders. Last, we compare our method with two standard, non-recurrent autoencoders, namely a one-dimensional convolutional autoencoder (1D-CAE) and a fully connected autoencoder (FAE). Both the sensor measurements and the operating conditions are selected as input, though only the sensor measurements are reconstructed. To construct fixed-length input samples, we consider a sliding time-window with a fixed size of 16 time-steps and a stride of 1. To create the health indicator value for a flight f of an engine e, we use the mean reconstruction loss ℒ_f^{e,s} over all time-windows of size 16 and stride 1 of this flight f.

The one-dimensional convolutional encoder consists of two blocks, each with two convolutional layers and one max pooling layer with a pooling size of 2. The filters of the convolutional layers have size 4 and a stride of 1. The first three convolutional layers have 8 channels, while the last convolutional layer has only 1 channel. The convolutional decoder consists of the same structure, but instead of pooling layers we use interpolating layers. We consider zero-padding for all convolutional layers. Moreover, all layers use the ReLU activation function, except the last layer of the decoder, which uses the linear activation function.

The encoder of the FAE consists of two fully connected layers. The number of neurons is halved for each subsequent fully connected layer. The decoder consists of the same structure, only here the number of neurons is doubled in each fully connected layer. Each layer applies the ReLU activation function, except the last layer of the decoder, which uses the linear activation function.

Table 3 shows the results for the 1D-CAE and the FAE. For the 1D-CAE, no sensor had a Spearman trendability for the training engines of 0.5 or higher. Instead, we only include sensor SmHPC, which has the highest trendability of all sensors. The trendability and prognosability are considerably higher when considering a recurrent autoencoder instead of the 1D-CAE or the FAE. This shows the added value of processing the time-series of sensor measurements with a recurrent autoencoder.

5. Methodology - Online RUL prognostics using similarity-based matching

In this section, we show how the health indicators developed in Section 4 are used for health state division (Section 5.1) and to obtain RUL prognostics (Section 5.2).

5.1. Health state division using Chebyshev's inequality

Before we predict the RUL, we first diagnose an engine as healthy or unhealthy. This is called health state division or diagnostics (Lei et al., 2018). An engine is diagnosed as unhealthy once its health indicator crosses a threshold η times in a row. This threshold is determined using Chebyshev's inequality (Singh et al., 2020; Kong and Yang, 2019). For our application, this inequality states that:

P(|\lambda_f^e - \mu| \geq k\sigma) \leq \frac{1}{k^2},   (18)

where k > 0, P(⋅) denotes the probability, and μ is the mean and σ is the standard deviation of the health indicator values λ_f^e of the training engines e and flights f for which the sensor measurements are simulated using the slow, linear degradation model (i.e., f ≤ f_a^e). Thus, an engine is diagnosed as unhealthy as soon as the threshold μ + kσ is exceeded by the health indicator λ_f^e η times in a row. The probability that this occurs while the sensor measurements are generated using the slow, linear degradation model is less than (1/k²)^η. Let f_u^e be the flight during which an engine e is diagnosed as unhealthy. We start with predicting the RUL from flight f_u^e onwards.

Fig. 11. Illustration of the similarity-based matching method to predict the RUL of engine i, with H health-indicators from the training set in the library.

Fig. 12. The iterative process of matching the partial health indicator λ̃^i with the offline health indicator λ^e, with different values for the time-lag τ.

5.2. Similarity-based matching method for RUL prognostics

Once an engine is diagnosed as unhealthy, we estimate its RUL after each flight using a similarity-based health indicator matching approach (Yu et al., 2020; Malhotra et al., 2016). These are online RUL prognostics, since the RUL prognostics are updated every time more sensor measurements become available.

Let λ̃^i = {λ_f^i, f ∈ {f̃ − M, … , f̃}} denote a partial health indicator available for engine i, using the sensor measurements available up to a flight f̃ ≥ f_u^e. Here, M is the fixed length of the partial health indicators. Our aim is to predict the RUL of engine i at flight f̃. For this, we consider a library with, for each training engine e, the offline health indicator λ^e. To predict the RUL, we match the partial health indicator λ̃^i with all the offline health indicators λ^e in the library. A schematic overview of the matching procedure is in Fig. 11.

When matching, we determine the similarity between the partial health indicator λ̃^i and the offline indicators λ^e in the library as the average Euclidean distance between these indicators. To maximize the similarity between λ̃^i and λ^e, i.e., to identify the best matches between λ̃^i and λ^e, λ̃^i is shifted along λ^e in the positive time-direction for τ flights.

Fig. 12 shows an example of a matching between the partial health indicator λ̃^i and a health indicator λ^e from the library. In this example, the partial health indicator is M = 40 flights long, while the offline health indicator λ^e is 120 flights long. When τ = 0, the Euclidean distance is based on the first 40 values of both health indicators, so between λ̃^i and {λ_f^e, f ∈ {1, 2, … , 40}}. However, the similarity between these two health indicators is very small (see Fig. 12(a)). In Fig. 12(b), we therefore shift λ̃^i forward with τ = 50 flights. Now, the Euclidean distance between λ̃^i and {λ_f^e, f ∈ {51, … , 90}} decreases. Let τ_max^{i,e} = |λ^e| − M flights denote the maximum number of flights λ̃^i can be shifted forward when matching with λ^e, with |λ^e| the length of the offline health indicator.

Given a time-lag τ, the average Euclidean distance d(λ̃^i, λ^e, τ) between λ^e and λ̃^i is:

d(\tilde{\lambda}^i, \lambda^e, \tau) = \sqrt{ \frac{1}{M} \sum_{f=1}^{M} \left( \lambda_{f+\tau}^e - \tilde{\lambda}_f^i \right)^2 },   (19)

with a corresponding preliminary RUL prognostic of (see also Fig. 12):

\widehat{\mathrm{RUL}}(i, e, \tau) = |\lambda^e| - M - \tau,   (20)

and a similarity score of (Yu et al., 2020; Malhotra et al., 2016):

\rho(i, e, \tau) = \exp\left( -d(\tilde{\lambda}^i, \lambda^e, \tau) / \gamma \right),   (21)

where γ > 0 is a parameter that influences the scaling of the score with respect to the Euclidean distance. The score ρ(i, e, τ) is higher when λ̃^i and λ^e are more similar, given time-lag τ.

Let ρ̃ denote the highest similarity score of engine i, obtained across all training engines e in the library and all time-lags τ (see also Fig. 11),


Table 4
Health state division: Flight f_a^e, after which the sensor measurements are generated using the exponential degradation model, and flight f_u^e, during which the engine is diagnosed as unhealthy — test engines 11, 14 and 15 of DS02, N-CMAPSS. The best results are denoted in bold.

                                                  Engine
Method                                            11    14    15
Flight f_a^e                                      18    35    23
Proposed method
LSTM-AE                            f_u^e          30    36    32
Proposed method with other recurrent autoencoders
GRU-AE                             f_u^e          44    53    48
BiGRU-AE                           f_u^e          46    66    54
BiLSTM-AE                          f_u^e          30    36    32
Proposed method without attention (no att.)
LSTM-AE-no att.                    f_u^e          30    36    37

Fig. 13. RUL predictions with the LSTM-AE — test engines 11, 14 and 15 of DS02, N-CMAPSS. The first RUL prediction is made when the engine is declared unhealthy, and is updated after every flight.

Table 5
Evaluation of the RUL predictions for our proposed approach (LSTM-AE) versus other approaches — test engines 11, 14 and 15 of DS02, N-CMAPSS.

                   Engine 11      Engine 14      Engine 15      All
                   MAE    RMSE    MAE    RMSE    MAE    RMSE    MAE    RMSE
Proposed method
LSTM-AE            2.89   3.50    1.89   2.41    1.95   2.12    2.18   2.67
Proposed method with other recurrent autoencoders
GRU-AE             1.06   1.22    3.76   4.27    0.98   1.24    2.12   3.87
BiGRU-AE           1.24   1.72    3.02   3.45    1.73   2.80    1.91   2.36
BiLSTM-AE          2.72   3.41    2.42   3.02    2.77   2.96    2.62   3.12
Proposed method without attention (no att.)
LSTM-AE-no att.    2.76   3.50    2.49   3.31    2.09   2.28    2.45   3.10

i.e.:

\tilde{\rho} = \max_{e \in E^{\mathrm{train}}, \; \tau \in \{0, 1, \ldots, \tau_{\max}^{i,e}\}} \rho(i, e, \tau),   (22)

where E^train denotes the set with all training engines. To estimate the RUL of engine i, we include all preliminary RUL prognostics \widehat{RUL}(i, e, τ) for which the score ρ(i, e, τ) is high enough, i.e., when (Malhotra et al., 2016):

\rho(i, e, \tau) \geq \alpha \cdot \tilde{\rho},   (23)

with α ∈ [0, 1]. Let Π^i be the set of all combinations (e, τ) of training engines e and time lags τ such that ρ(i, e, τ) ≥ α ⋅ ρ̃. Then the weight of RUL prediction \widehat{RUL}(i, e, τ), (e, τ) ∈ Π^i, is:

p(i, e, \tau) = \frac{\rho(i, e, \tau)}{\sum_{(\epsilon, T) \in \Pi^i} \rho(i, \epsilon, T)}.   (24)

Finally, the predicted RUL RUL^i of engine i is (Yu et al., 2020):

\mathrm{RUL}^i = \sum_{(e, \tau) \in \Pi^i} p(i, e, \tau) \cdot \widehat{\mathrm{RUL}}(i, e, \tau).   (25)

6. Results - Online RUL prognostics for aircraft engines

In this section, we present the RUL prognostics for the test engines of dataset DS02. First, we analyse the health state division and the RUL prognostics in Section 6.1. Then, we compare our results with the results of the other autoencoders in Section 6.2. Moreover, we compare our RUL prognostics with the RUL prognostics of common supervised learning models that directly predict the RUL in Section 6.3. Last, we analyse the RUL prognostics for a decreasing number of offline health indicators in the library in Section 6.4.

6.1. Health state division and RUL prognostics

Table 4 shows the flight f_u^e during which the test engines of DS02 are diagnosed as unhealthy. Each test engine is labelled as unhealthy between 29 to 40 flights before failure (see also Fig. 10). This is 1 to 12 flights after the last flight f_a^e during which the sensor measurements are generated using the linear degradation model.

Fig. 13 shows the RUL predictions for the test engines of DS02. These RUL predictions are generated as soon as the engine is diagnosed as unhealthy, and then updated after each flight. The RUL of engine 11 is slightly overestimated when this engine is diagnosed as unhealthy, with a prediction error of −6 flights. In contrast, the RUL for engine 14 is slightly underestimated (a prediction error of 6 flights) after it is diagnosed as unhealthy. However, the RUL predictions of all test engines quickly converge to the true RUL as the engines approach their failure time. Table 5 shows the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE) of these RUL predictions. The results show that the RUL is well estimated for all three engines, with a RMSE between 2.12 and 3.50 flights only.

The hyperparameters of the health state division and the similarity-based matching, used to obtain these RUL results, are derived using a grid search with leave-one-out cross-validation in the training set (Bishop, 2006). Here, the goal is to minimize the RMSE for the training engines. For the health state division, we obtain η = 3 and k = 5 (see Section 5.1). For the similarity-based matching method, we obtain M = 10, γ = 0.01 and α = 0.7 (see Section 5.2).

6.2. Comparison with the RUL prognostics of other autoencoders

We also analyse the health state division and the RUL predictions with the health indicators of the best autoencoders of Section 4.4: specifically, we consider the other recurrent autoencoders, and the LSTM-AE without attention.

The test engines are diagnosed as unhealthy during the same flights for the BiLSTM-AE and the LSTM-AE (see Table 4). Moreover, test engines 11 and 14 are diagnosed as unhealthy during the same flights for the LSTM-AE with and without attention. However, test engine 15 is diagnosed as unhealthy 5 flights later when no attention is used.

For the BiGRU-AE and the GRU-AE, the engines are diagnosed as unhealthy after a later flight: for the GRU-AE, the engines are labelled as unhealthy 15 to 23 flights before failure, while for the BiGRU-AE, the engines are labelled as unhealthy only 10 to 13 flights before failure. This late diagnosis as unhealthy is expected, given the relatively low monotonicity and trendability of the (Bi)GRU-AE. It is, however, preferable if an engine is diagnosed as unhealthy far before failure, provided that the degradation in the engines is large enough to make accurate RUL predictions.

Table 5 also shows the MAE and the RMSE of the RUL prognostics for the other considered autoencoders. For all four considered autoencoders, we find that η = 3, k = 5 and γ = 0.01. The fixed length equals


Table 6
RMSE for the test engines 11, 14 and 15 of DS02, N-CMAPSS, with various methodologies. The best results are denoted in bold.

                                      Engine
                                      11     14     15     All
Proposed methodology
LSTM-AE                               3.50   2.41   2.12   2.67
Supervised learning neural networks
1D-CNN                                4.09   5.07   2.84   4.16
LSTM-NN                               4.24   3.32   2.22   3.31
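As a small computational companion to Tables 5 and 6, the RMSE and MAE of the RUL predictions (both in flights) can be computed from the true and predicted RUL values as follows (the array-based interface is our own):

```python
import numpy as np

def rmse(rul_true, rul_pred):
    """Root Mean Square Error of the RUL predictions, in flights."""
    return float(np.sqrt(np.mean((rul_pred - rul_true) ** 2)))

def mae(rul_true, rul_pred):
    """Mean Absolute Error of the RUL predictions, in flights."""
    return float(np.mean(np.abs(rul_pred - rul_true)))
```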

Fig. 14. Learning curve of the RMSE and MAE for the LSTM-AE — test engines of DS02, N-CMAPSS.

Table 7
Overview of the considered libraries. Here, E^train is the set with all training engines.

Size of library   # of libraries   Set of offline libraries
1                 6                {(e_1) : e_1 ∈ E^train}
2                 15               {(e_1, e_2) : e_1 ∈ E^train, e_2 ∈ E^train ∖ {e_1}}
3                 20               {(e_1, e_2, e_3) : e_1 ∈ E^train, e_2 ∈ E^train ∖ {e_1}, e_3 ∈ E^train ∖ {e_1, e_2}}
4                 15               {(e_1, e_2, e_3, e_4) : e_1 ∈ E^train, e_i ∈ E^train ∖ {e_j, j = 1, 2, … , i − 1}, i = 2, 3, 4}
5                 6                {(e_1, e_2, e_3, e_4, e_5) : e_1 ∈ E^train, e_i ∈ E^train ∖ {e_j, j = 1, 2, … , i − 1}, i = 2, 3, 4, 5}
6                 1                E^train

M = 10 for the GRU-AE, and M = 5 for the other autoencoders. The parameter α is 0.1 for the BiGRU-AE, 0.2 for the LSTM-AE without attention, 0.4 for the BiLSTM-AE and 0.5 for the GRU-AE.

The RUL predictions with the LSTM-AE have a lower overall RMSE and MAE than the RUL predictions with the BiLSTM-AE. This is as expected, since the health indicators of the LSTM-AE have a slightly higher trendability.

We cannot directly compare the RUL predictions of the (Bi)GRU-AE and the LSTM-AE: the engines are diagnosed as unhealthy much closer to failure when considering the (Bi)GRU-AE. In general, we expect that the RUL predictions improve when an engine degrades over time. We thus expect that the RMSE is lower when an engine is diagnosed as unhealthy later. Nevertheless, the overall RMSE of the LSTM-AE (2.67 flights) is better than the overall RMSE of the GRU-AE (3.87 flights) and only slightly worse than the overall RMSE of the BiGRU-AE (2.36 flights).

For each test engine, the RMSE with the LSTM-AE without attention is larger than or equal to the RMSE with the LSTM-AE with attention. This also holds for test engine 15, even though engine 15 is diagnosed as unhealthy 5 flights later when not using attention. Moreover, the overall RMSE equals 3.10 flights without attention, while it only equals 2.67 flights with attention. This shows the benefits of incorporating attention in the autoencoder.

6.3. Comparison with other, supervised learning methods

Last, we compare our results with the results of neural networks that directly output a RUL prediction, i.e., supervised learning methods. Here, we train two benchmark neural networks to directly predict the RUL: the one-dimensional convolutional neural network (1D-CNN) and the LSTM neural network (LSTM-NN). These two neural networks are also used as benchmark in Chao et al. (2022), and we thus use the same architecture and hyperparameters as in Chao et al. (2022). However, to allow for a fair comparison, we use the same sensors that are used as input to our approach (see Table 1) as input to the benchmark neural networks.

There is no straightforward method for health state division when using a supervised learning method. However, RUL predictions usually improve when the true RUL becomes smaller. The comparison of the RUL prognostics would thus not be fair if we do not use any health state division for the supervised learning methods. Instead, we apply the health state division of the proposed methodology also to the benchmark neural networks. For example, engine 11 is diagnosed as unhealthy at flight 30 with the proposed approach. We thus also predict the RUL of engine 11 with the benchmark neural networks from flight 30 onward.

Table 6 shows the RMSE of the RUL predictions with the various methodologies. The RMSE of the RUL prognostics is lowest for all test engines when considering our proposed approach. This shows that our approach works well for the considered data set with limited failure instances, compared to a supervised learning method.

6.4. Impact of the number of available labelled data samples on the RUL prognostics

Due to preventive maintenance, most aircraft systems are replaced before their failure. There are therefore only limited labelled data samples available. In this section, we thus study the impact of the size of the library in the matching approach, i.e., the number of offline health indicators in the library, on the accuracy of the RUL prognostics. The health indicators are constructed using unlabelled data samples from the beginning of an engine's lifetime only. In real life, there are enough unlabelled data samples to train an autoencoder. We thus use the same online and offline health indicators as in Section 4.

Fig. 14 shows the RMSE and the MAE of the RUL prognostics for an increasing number of available offline health indicators in the library. This is also called the learning curve. We consider, for each number of available offline health indicators, all possible combinations of historical health indicators that give a library of this size. For example, if two health indicators are available, we consider the following 15 libraries:

{(1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 3), (2, 4), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)}.

Here, 1, 2, … , 6 denotes the offline health indicator from the first, second, …, sixth training engine respectively. An overview of all considered libraries is in Table 7. With each library, we predict the RUL of the test engines.

As expected, the RUL predictions improve when the size of the library increases. The decrease in the RMSE and MAE is highest when we consider two offline health indicators in the library, instead of just one. However, even when a library consists of just one offline health indicator, the RUL is well estimated with a RMSE of only 3.95 flights, and a MAE of only 3.38 flights. This shows that our approach works well when only very few labelled data samples are available.

7. Conclusion

In aviation, safety-critical aircraft systems usually undergo preventive maintenance. Consequently, only very few labelled sensor data


samples, with as label the true RUL, are available. Many labelled data samples, however, are required to train supervised learning models that directly predict the RUL. In this paper, we therefore instead propose to construct a health indicator by training a LSTM autoencoder (LSTM-AE) with unlabelled data samples (i.e., the corresponding true RUL is unknown). The reconstruction error with the LSTM-AE increases as the degradation in a system increases, and is therefore used as health indicator. The sensor measurements of aircraft systems are generated at a high frequency during flights of several hours. Each data sample thus consists of a long time-series of multi-sensor measurements. We apply attention in the LSTM-AE to handle these long time-series. Moreover, aircraft are operated under highly-varying operating conditions. To create robust health indicators, we thus integrate the operating conditions in the LSTM-AE.

Next, we divide the lifetime of each engine into a healthy and an unhealthy stage using Chebyshev's inequality on the health indicators. Then, we use the health indicators and the few available labelled data samples in a similarity-based matching approach to predict the RUL of the engines in the unhealthy stage.

We apply this approach to the aircraft engines in the new N-CMAPSS dataset (Arias Chao et al., 2021). The obtained health indicators have a high monotonicity (0.38), prognosability (0.94) and trendability (0.95). Moreover, the health indicators are indeed robust to the varying operating conditions: the trendability is also high for engines with operating conditions deviating from the operating conditions in the training set. Also the obtained RUL prognostics are accurate, with a RMSE of only 2.67 flights. Moreover, our approach outperforms supervised learning methods that directly predict the RUL, with a decrease in the RMSE of 19%.

Our proposed methodology is illustrated for aircraft engines. However, the described methodology is also suitable for applications in other fields, such as wind turbine gearboxes, bearings in industrial applications or batteries. For future research, we therefore plan to apply this methodology to other components and systems in other industries. Moreover, for future research, we plan to analyse whether the failure mode of an engine can be determined based on the reconstruction loss of different sensors.

CRediT authorship contribution statement

Ingeborg de Pater: Conceptualization, Methodology, Software, Validation, Writing – original draft, Writing – review & editing, Visualization. Mihaela Mitici: Conceptualization, Methodology, Writing – original draft, Writing – review & editing, Supervision, Project administration, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data is publicly available at the NASA open data portal.

References

Arias Chao, M., Kulkarni, C., Goebel, K., Fink, O., 2021. Aircraft engine run-to-failure dataset under real flight conditions for prognostics and diagnostics. Data 6 (1), 5.
Chao, M.A., Kulkarni, C., Goebel, K., Fink, O., 2022. Fusing physics-based and deep learning models for prognostics. Reliab. Eng. Syst. Saf. 217, 107961.
de Pater, I., Mitici, M., 2022. Novel metrics to evaluate probabilistic remaining useful life prognostics with applications to turbofan engines. In: PHM Society European Conference, Vol. 7. pp. 96–109.
de Pater, I., Reijns, A., Mitici, M., 2022. Alarm-based predictive maintenance scheduling for aircraft engines with imperfect Remaining Useful Life prognostics. Reliab. Eng. Syst. Saf. 221, 108341.
Fink, O., Wang, Q., Svensen, M., Dersin, P., Lee, W.-J., Ducoffe, M., 2020. Potential, challenges and future directions for deep learning in prognostics and health management applications. Eng. Appl. Artif. Intell. 92, 103678.
Fu, S., Zhong, S., Lin, L., Zhao, M., 2021. A novel time-series memory auto-encoder with sequentially updated reconstructions for remaining useful life prediction. IEEE Trans. Neural Netw. Learn. Syst.
Géron, A., 2018. Neural Networks and Deep Learning. O'Reilly.
Gers, F.A., Schmidhuber, J., Cummins, F., 2000. Learning to forget: Continual prediction with LSTM. Neural Comput. 12 (10), 2451–2471.
Gugulothu, N., Tv, V., Malhotra, P., Vig, L., Agarwal, P., Shroff, G., 2017. Predicting remaining useful life using time series embeddings based on recurrent neural networks. arXiv preprint arXiv:1709.01073.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9 (8), 1735–1780.
Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kong, X., Yang, J., 2019. Remaining useful life prediction of rolling bearings based on RMS-MAVE and dynamic exponential regression model. IEEE Access 7, 169705–169714.
Koutroulis, G., Mutlu, B., Kern, R., 2022. Constructing robust health indicators from complex engineered systems via anticausal learning. Eng. Appl. Artif. Intell. 113, 104926.
Lei, Y., Li, N., Guo, L., Li, N., Yan, T., Lin, J., 2018. Machinery health prognostics: A systematic review from data acquisition to RUL prediction. Mech. Syst. Signal Process. 104, 799–834.
Liu, C., Sun, J., Liu, H., Lei, S., Hu, X., 2020. Complex engineered system health indexes extraction using low frequency raw time-series data based on deep learning methods. Measurement 161, 107890.
Luong, M.-T., Pham, H., Manning, C.D., 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
Lyu, J., Ying, R., Lu, N., Zhang, B., 2020. Remaining useful life estimation with multiple local similarities. Eng. Appl. Artif. Intell. 95, 103849.
Malhotra, P., TV, V., Ramakrishnan, A., Anand, G., Vig, L., Agarwal, P., Shroff, G., 2016. Multi-sensor prognostics using an unsupervised health index based on LSTM encoder-decoder. arXiv preprint arXiv:1608.06154.
Ochella, S., Shafiee, M., Dinmohammadi, F., 2022. Artificial intelligence in prognostics and health management of engineering systems. Eng. Appl. Artif. Intell. 108, 104552.
Saxena, A., Goebel, K., 2008. Turbofan engine degradation simulation data set. NASA Ames Progn. Data Repos. 1551–3203.
Shen, S., Lu, H., Sadoughi, M., Hu, C., Nemani, V., Thelen, A., Webster, K., Darr, M., Sidon, J., Kenny, S., 2021. A physics-informed deep learning approach for bearing fault detection. Eng. Appl. Artif. Intell. 103, 104295.
Singh, J., Darpe, A., Singh, S.P., 2020. Bearing remaining useful life estimation using an adaptive data-driven model based on health state change point identification and K-means clustering. Meas. Sci. Technol. 31 (8), 085601.
Vasilev, I., 2019. Advanced Deep Learning with Python: Design and Implement Advanced Next-Generation AI Solutions using TensorFlow and PyTorch. Packt Publishing Ltd.
Wang, J., Zeng, Z., Zhang, H., Barros, A., Miao, Q., 2022. An hybrid domain adaptation diagnostic network guided by curriculum pseudo labels for electro-mechanical actuator. Reliab. Eng. Syst. Saf. 228, 108770.
Wei, Y., Wu, D., Terpenny, J., 2021. Learning the health index of complex systems using dynamic conditional variational autoencoders. Reliab. Eng. Syst. Saf. 216, 108004.
Xiang, S., Qin, Y., Zhu, C., Wang, Y., Chen, H., 2020. Long short-term memory neural network with weight amplification and its application into gear remaining useful life prediction. Eng. Appl. Artif. Intell. 91, 103587.
Ye, Z., Yu, J., 2021. Health condition monitoring of machines based on long short-term memory convolutional autoencoder. Appl. Soft Comput. 107, 107379.
Yu, W., Kim, I.Y., Mechefske, C., 2019. Remaining useful life estimation using a bidirectional recurrent neural network based autoencoder scheme. Mech. Syst. Signal Process. 129, 764–780.
Yu, W., Kim, I.Y., Mechefske, C., 2020. An improved similarity-based prognostic
algorithm for RUL estimation using an RNN autoencoder scheme. Reliab. Eng. Syst.
Berghout, T., Mouss, L.-H., Kadri, O., Saïdi, L., Benbouzid, M., 2020. Aircraft engines
Saf. 106926.
remaining useful life prediction with an adaptive denoising online sequential
Zhai, S., Gehring, B., Reinhart, G., 2021. Enabling predictive maintenance integrated
extreme learning machine. Eng. Appl. Artif. Intell. 96, 103936.
production scheduling by operation-specific health prognostics with generative deep
Bishop, C.M., 2006. Pattern Recognition and Machine Learning. Springer.
learning. J. Manuf. Syst. 61, 830–855.
