Article
A Generic Framework for Prognostics of Complex Systems
Marie Bieber 1, * and Wim J. C. Verhagen 2
1 Faculty of Aerospace Engineering, Delft University of Technology, 2629 HS Delft, The Netherlands
2 Aerospace Engineering and Aviation, RMIT University, Carlton, VIC 3053, Australia
* Correspondence: [email protected]
Abstract: In recent years, there has been an enormous increase in the amount of research in the field
of prognostics and predictive maintenance for mechanical and electrical systems. Most of the existing
approaches are tailored to one specific system. They do not provide a high degree of flexibility and
often cannot be adaptively used on different systems. This can lead to years of research, knowledge,
and expertise being put into the implementation of prognostic models that ultimately cannot estimate
the remaining useful life of systems, whether because of a lack of data, poor data quality, or simply
because failure behaviour cannot be captured by data-driven models. To overcome this, in this paper we
present an adaptive prognostic framework which can be applied to different systems while providing
a way to assess whether or not it makes sense to put more time into the development of prognostic
models for a system. The framework incorporates steps necessary for prognostics, including data pre-
processing, feature extraction and machine learning algorithms for remaining useful life estimation.
The framework is applied to two systems: a simulated turbofan engine dataset and an aircraft cooling
unit dataset. The results show that the accuracy of the obtained remaining useful life estimates
is comparable to what has been achieved in the literature, and highlight considerations for
assessing the suitability of system data for prognostics.
Keywords: prognostics and health management; adaptive framework; remaining useful life
techniques making use of monitored system condition data and failure data can be applied
in such a case. They are mostly based on statistical or artificial intelligence (AI) methods.
The requirement for such algorithms is the availability of data characterizing system be-
haviour that covers all phases of normal and faulty operation and all degradation scenarios
under different operating conditions. Recent developments in sensing technologies, data
storage, data processing, IT systems and computational power have been major drivers
of data-driven prognostic approaches, leading to an increase in available methods and
algorithms in the state of the art.
Most of the existing literature on data-driven prognostics focuses on the development
of more advanced and more accurate models and algorithms. For this purpose, standard
datasets are often used as these enable comparative evaluation of multiple models. This is
a valid approach when the aim is the development of better-performing methods for those
specific datasets. However, it also makes the approaches application- and system-specific.
When applying those methodologies on ’real’ systems, it can be the case that simple
algorithms outperform very complex ones. Furthermore, tuning a complex algorithm
to reach a better performance generally takes a lot of time and skill, which is often not
available. Consider, for example, an airline operating different types of aircraft and aiming
to introduce prognostics on a broad basis. Each aircraft can be considered as a complex
system with multiple subsystems and components. For each of these subsystems or
components, a dedicated prognostic model is needed and the costs for the airline to hire
data scientists that develop, test, and validate a single model for each of the components
would be immense. Therefore, what would be more desirable is a generic prognostic
framework that chooses the most accurate prognostic approach from a set of algorithms
given component data.
Prior studies proposing such frameworks have yielded promising results. An au-
tonomous diagnostics and prognostics framework (DPF) is suggested by Baruah et al. [11].
It consists of several steps, including data pre-processing, clustering to distinguish operat-
ing conditions and, finally, diagnostics and prognostics steps. A limitation of the approach
is the fact that some parameters, including the number of observations for initialisation
and the optimization of cluster adaptation rates, have to be set manually, and it can be tricky
to tune the algorithm optimally. Another limitation is that a classification
is performed (i.e., at any time it is determined whether the component is faulty or not), rather
than a remaining useful life estimation. To account for this, Voisin et al. [12] provide a
generic prognostic framework that can be instantiated to various applications. However,
their approach is very formal and no specific machine learning algorithms are used in this
framework. Again, this is a limitation, as it is up to the user to define proper techniques.
To overcome this problem, An et al. [13] provide guidelines to help with the selection of
appropriate prognostic algorithms depending on the application. Another way to address
this is by using ensembles of machine learning approaches that combine multiple prog-
nostic algorithms with an accuracy-based weighted-sum formulation [14]. Still, a problem
remains: this addresses only prognostics but not the steps needed before, namely the data
pre-processing and diagnostics. This is overcome by Trinh and Kwon [15], who suggest
a prognostics method based on an ensemble of genetic algorithms that includes all the
steps, from data pre-processing to RUL estimation. With this, it provides a truly
generic framework for prognostics. The authors of the paper validated their framework by
applying it to three commonly used and available datasets and comparing its performance
to other existing approaches. However, their findings are limited to simulated datasets.
This development makes sense, especially considering the problems and challenges
that arise when using real-life data: as Zio [4] points out, sensor signals are often
collected under changing operational and environmental conditions. On top of that,
they are often incomplete or unlabeled, with data missing or scarce. Therefore, extracting
informative content for diagnostics and prognostics can be a challenging task. Still,
this points towards a problematic trend: many prognostic method developments in recent
literature are not tested on real-life industrial cases. While many methods show highly
Aerospace 2022, 9, 839 3 of 27
promising results [1], they may face significant limitations when applied to real-life cases.
However, it is not often that these limitations are identified and addressed in literature.
Nevertheless, several studies using real aircraft data have been published. Fault messages
of an aircraft system have been used in [16] to compare data-driven approaches for aircraft
maintenance to the more classically used experience-based maintenance. An anomaly
detection method for condition monitoring for an aircraft cooling system unit is presented
in [17]. On the same dataset, two more studies have been conducted on remaining useful
life estimation: first, a clustering approach was used to determine degradation models
and failure thresholds and together with a particle filter algorithm this results in RUL
estimates [18]. Second, a health indicator (HI) construction approach integrating physics-based
and data-driven methods was applied to the same dataset to estimate the system's RUL [19].
Still, applications for generic prognostic frameworks are limited to simulated datasets.
We therefore present a generic framework and apply it to both a simulated dataset as well
as a ’real’ dataset of operating aircraft within an airline. For both applications, the aim is
to provide guidance in the choice of prognostic methodologies for a given dataset and a
systems data suitability analysis from a prognostics perspective. We thereby also address
the challenge of applying prognostic methodologies in real applications of complex systems
and provide an assessment of whether or not a system is prognosable given the system
data. A genetic algorithm is used to find the optimal combination of methodologies and
associated hyperparameter settings for each step in the process of generating prognostics.
With respect to the current academic state of the art, our novel contributions include:
• The presentation of a generic prognostic framework with the capability not only to estimate
a system’s RUL, but also to assess the ability to perform prognostics on such a system.
A system is defined to be ’prognosable’ if meaningful and accurate data-driven prognostic
models can be developed based on available operational, contextual and failure data.
’Meaningful’ refers to the models being able to capture degradation trends and learn failure
behaviour, while ’accurate’ pertains to the prediction quality in terms of one or multiple
defined prognostic metrics.
• The implementation of the framework on both real aircraft data, as well as a simulated
dataset.
• An identification of the challenges faced when using prognostic approaches on a real
aircraft dataset as opposed to using simulated data.
The remainder of this paper is organized as follows. Section 2 introduces the generic
prognostic framework. In Section 3, the aircraft systems, underlying data, and failure modes
are described and the results of the case study are presented. Subsequently, the adaptivity
of the framework, the difficulties with applying it to a real dataset and the question of
how to determine the ability to perform prognostics on a system are discussed. Finally, in
Section 4, we conclude by highlighting the most important findings and limitations and
providing directions for further research.
previous steps. Note that we distinguish prognostic algorithms from prognostic models:
when using the term ’prognostic algorithm’ we refer to a certain selected technique used
to perform prognostics, e.g., Random forest (RF) or neural networks, and by ’prognostic
model’ we indicate the derived predictor (as output of the prognostic algorithm and feature
engineering methodologies) that takes system data as an input and outputs the RUL estimate.
The GPF treats the selection of the according techniques as an optimization problem:
the objective is to select the optimal methodology (in terms of Mean squared error (MSE),
defined in Equation (1)) with the optimal hyperparameter settings for each element of
prognostics included in the framework (such as data rebalancing). We implement this
in four steps as shown in Figure 1. In step 1, the selected system data are pre-processed.
As the GPF is a generic framework that is adaptive by nature to different datasets, the
data pre-processing techniques applied are kept to a minimum. Further details about
the pre-processing applied are given in Section 2.1. In step 2, the hyperparameters for the
prognostic algorithms are tuned by grid search, as further explained in Section 2.2. Step
3 aims to solve the optimization problem that can be formulated as follows: find the
optimal combination to generate predictions for a given dataset, where optimality is
evaluated through minimisation of the MSE, given a set of re-balancing, feature engineering
techniques, and prognostic algorithms. A detailed explanation of this process and the
according techniques is given in Section 2.3. Finally, in step 4, the settings are used to
build the prognostic model to output the RUL estimate. The framework as suggested in
this paper can be used in multiple ways, two of which are of primary importance in the
context of our research: either it can provide a quick assessment of the ability to perform
prognostics based on the input data or it can be used to perform an automatic selection
of feature engineering settings. This is further explained in Section 2.4. To guide the reader
through the following sections and make the dynamics of the GPF clearer, we make use of a small
example dataset. It is split into a training and a test set, as it would be for a machine learning
application such as the one considered in this paper. The example training set is presented in Table 1
and the respective test set can be found in Table 2.
Table 1. The example training set.

Current Mean | Current Min | Current Max | Speed Mean | Speed Min | Speed Max | High Current Count | RUL | id
0.00 | 0.0 | 0.0 | 0.00 | 0 | 0 | 751 | 0 | 11
1.19 | 0.0 | 2.1 | 4035 | 0 | 5024 | 967 | 1 | 11
2.15 | 2.1 | 2.2 | 4998 | 4976 | 5024 | 42 | 2 | 11
2.11 | 2.1 | 2.2 | 4997 | 4976 | 5016 | 83 | 3 | 11
2.18 | 1.8 | 2.4 | 4822 | 4472 | 5024 | 2223 | 4 | 11
2.15 | 1.8 | 2.4 | 4516 | 4448 | 5024 | 39,267 | 5 | 11
1.84 | 1.6 | 2.2 | 4547 | 4456 | 4840 | 1693 | 6 | 11
2.13 | 2.1 | 2.2 | 4996 | 4976 | 5008 | 12 | 7 | 11
1.49 | 0.0 | 2.4 | 4564 | 0 | 5032 | 1910 | 0 | 3
2.43 | 2.4 | 2.5 | 4639 | 4576 | 4720 | 39 | 1 | 3
2.43 | 2.4 | 2.5 | 4557 | 4536 | 4584 | 9 | 2 | 3
2.40 | 2.4 | 2.5 | 4497 | 4472 | 4552 | 104 | 3 | 3
2.24 | 2.1 | 2.4 | 4493 | 4464 | 4528 | 846 | 4 | 3
2.13 | 1.9 | 2.2 | 4493 | 4456 | 4528 | 1017 | 5 | 3

Table 2. The example test set.

Current Mean | Current Min | Current Max | Speed Mean | Speed Min | Speed Max | High Current Count | RUL | id
1.08 | 0.0 | 2.2 | 3225 | 0 | 5024 | 567 | 0 | 25
2.11 | 2.1 | 2.2 | 4996 | 4968 | 5032 | 41 | 1 | 25
2.12 | 2.1 | 2.2 | 4998 | 4984 | 5008 | 10 | 2 | 25
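As a sketch, the four GPF steps described above could look as follows in Python; the helper logic, parameter grid, and function names are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of the four GPF steps (hypothetical helpers; details may differ).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error

def run_gpf(X_train, y_train, X_test, y_test):
    # Step 1: minimal pre-processing (here: drop constant-valued sensors)
    keep = X_train.std(axis=0) > 0
    X_train, X_test = X_train[:, keep], X_test[:, keep]

    # Step 2: tune baseline hyperparameters by grid search
    grid = GridSearchCV(RandomForestRegressor(random_state=0),
                        {"n_estimators": [25, 50]},
                        scoring="neg_mean_squared_error", cv=3)
    grid.fit(X_train, y_train)

    # Step 3: the full framework runs a genetic algorithm over re-balancing,
    # feature-engineering, and algorithm choices; the tuned baseline stands
    # in for that search in this sketch.
    model = grid.best_estimator_

    # Step 4: build the prognostic model and output the RUL estimates
    rul_pred = model.predict(X_test)
    return rul_pred, mean_squared_error(y_test, rul_pred)
```

The MSE returned in step 4 is the quantity the framework minimises when selecting methodologies.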
centred on the training data points and then selecting a subset of these during training. The
two selected algorithms are well-established and offer potential advantages in terms of
interpretability and explainability, which is necessary to understand systems retrospectively
and prospectively [23]. This may assist in the adoption of these algorithms for a variety
of applications, potentially even covering safety-critical components. They thereby also
provide the possibility to establish first baseline models for a quick prognostic assessment.
Those two methodologies are chosen as representative machine learning algorithms. Both
RF and SVMs have been shown to be adaptive to different datasets even without
a thorough hyperparameter selection and are, therefore, good candidates for establishing a first
baseline. However, the framework can easily be extended to include further methodologies
or algorithms.
For the chosen algorithms, a grid search is performed on a validation set to find the
optimal hyperparameter settings. Since the aim of the grid search in this case is to establish
quick baseline models that can subsequently be used as an input to the following step of the
framework, we only search a limited set of parameters. The hyperparameters
and their possible settings explored during the grid search are given in Table 3. The settings
found are then used as initial settings for the prognostic algorithms in the genetic
algorithm that is presented in the next section.
Table 3. The hyperparameters and combinations of settings explored during the grid search for each
of the prognostic algorithms.
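A minimal sketch of such a grid search for the two baseline algorithms, using scikit-learn; the parameter grids shown are illustrative and not necessarily those of Table 3.

```python
# Illustrative grid search producing quick baseline settings for the GA.
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

def tune_baselines(X_val, y_val):
    grids = {
        "rf": (RandomForestRegressor(random_state=0),
               {"n_estimators": [25, 50], "max_depth": [None, 10]}),
        "svm": (SVR(), {"C": [1.0, 10.0], "epsilon": [0.1, 1.0]}),
    }
    best = {}
    for name, (model, params) in grids.items():
        search = GridSearchCV(model, params, cv=3,
                              scoring="neg_mean_squared_error")
        search.fit(X_val, y_val)
        # best_params_ become the initial settings for the genetic algorithm
        best[name] = search.best_params_
    return best
```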
MSE = (1/N) Σ_{i=1}^{N} (RUL_i − R̂UL_i)², (1)

with RUL_i the true RUL value and R̂UL_i the predicted RUL value at timestep i, and N the number of predictions.
The MSE has been selected for the evaluation of the prognostics for two main reasons:
first, as a score which captures accuracy, the MSE gives a good indication over how well
the algorithms perform with respect to predicting the RUL. Second, despite the fact that
it is important to not rely on one metric to evaluate predictions [24], we found that the
majority of the literature considering the simulated turbofan engine dataset uses the MSE
or root mean squared error (RMSE) to evaluate RUL predictions. For this reason, it makes
sense for us to apply it in this case study as well to have results that are comparable with
the state of the art and, thereby, can be validated against existing approaches.
The field of evolutionary strategies and genetic algorithms is inspired by the concepts of
natural selection and genetics [25]. Due to their flexibility, genetic algorithms (GAs) are able to solve global
optimization problems and optimize several criteria at the same time, such as in our case the
simultaneous selection of data re-balancing, feature engineering, and prognostic algorithm
techniques [26]. This is what makes them good candidates for our optimization problem.
Figure 3. The prognostic steps and methodologies included in the genetic algorithm.
we introduce the underlying concepts in the following paragraph. The main idea behind
re-balancing methods for continuous target variables is the construction of bins based on
a relevance function. The relevance function maps the values of the target variable into
a range of importance, where 1 corresponds to maximal importance and 0 to minimum
relevance. With this, the bins classify the data in normal (BI NN ) and relevant samples
(BI NR ). In our setup, we use a sigmoid relevance function as defined in [29] and shown in
Figure 4 with a relevance threshold, tr of 0.5. Furthermore, we set all values with a RUL of
less then the threshold cl = 10 to be of importance, set the oversampling rate to 0.9 and the
undersampling rate to 0.1.
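The re-balancing step described above could be sketched as follows; the exact sigmoid shape and the interpretation of the undersampling rate as the fraction of normal samples kept are assumptions for illustration, with the actual definitions following [29].

```python
# Relevance-based re-balancing sketch for a continuous RUL target: a sigmoid
# relevance function marks low-RUL samples as relevant (BIN_R), the rest as
# normal (BIN_N); BIN_R is oversampled and BIN_N undersampled.
import math
import random

def relevance(rul, cl=10.0, k=1.0):
    # ~1 (maximal importance) for RUL below c_l, ~0 far above it.
    # (1 - tanh(x/2)) / 2 is a numerically stable form of 1 / (1 + e^x).
    return 0.5 * (1.0 - math.tanh(k * (rul - cl) / 2.0))

def rebalance(samples, tr=0.5, over=0.9, under=0.1, seed=0):
    rng = random.Random(seed)
    bin_r = [s for s in samples if relevance(s["rul"]) >= tr]  # relevant
    bin_n = [s for s in samples if relevance(s["rul"]) < tr]   # normal
    # oversample the relevant bin by the oversampling rate ...
    extra = rng.choices(bin_r, k=int(over * len(bin_r))) if bin_r else []
    # ... and keep only a fraction of the normal bin (undersampling, assumed
    # to mean "fraction kept")
    kept = rng.sample(bin_n, k=int(under * len(bin_n))) if bin_n else []
    return bin_r + extra + kept
```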
Figure 4. Example of a sigmoid relevance function similar to the one used for the rebalancing task.
Figure 5. The dataset sizes for the different rebalancing strategies when applied to the demonstration
example.
Table 4. Characteristics of the four turbofan engine datasets. Note that the difference between the four
datasets lies in the number of fault modes (’modes’) and operating conditions (’conditions’).
Figure 6. The features selected by the PCA and their relevance scores (the higher the more relevant).
Table 5. The selected most relevant features of the C-MAPSS FD001 dataset by the methodologies
included in the GPF and in existing literature.
Figure 7. The most relevant features selected based on two different relevance scores by [35]. (a): pass
rate of MK test; (b): Uncertainty Score.
In order to validate the outputs of the prognostic algorithms and the GPF itself, we
select three papers from literature presented in Table 6 to compare the metrics reached
when using the SVM and RF of the GPF to the results reached in the respective papers on all
four C-MAPSS datasets. Note that all of those papers use a piecewise linear RUL function
(well explained in [36]), which has been shown to result in much better predictions; to
make the results comparable, we do so too. Therefore, the results presented in the
following are not comparable with the results reached using the linear RUL function as
presented in Section 3.1.5. Furthermore, two metrics are used to compare the results: the
root mean squared error (RMSE), which is simply the square root of the MSE, and the score
function as defined in [37]. The resulting metrics of the three selected papers (in the case of
the RF, only two selected papers) and of the GPF are summarized in Tables 7 and 8.
On all four datasets the results reached by the RF and SVM of the GPF in terms of RMSE
and the score function are in the same range as the algorithms presented in the three papers
in literature.
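A sketch of the piecewise linear RUL target and an asymmetric score function of this kind; the cap of 125 cycles is a value commonly used for C-MAPSS in the literature, not necessarily the one used here, and the score constants follow the commonly cited PHM08 definition, which may differ from [37].

```python
# Piecewise linear RUL target and asymmetric scoring: late predictions
# (overestimated RUL) are penalised more heavily than early ones.
import math

def piecewise_rul(cycles_total, r_early=125):
    """Linear RUL over a run-to-failure trajectory, capped at r_early early on.
    The cap value 125 is an assumption taken from common C-MAPSS practice."""
    return [min(r_early, cycles_total - t) for t in range(cycles_total)]

def phm_score(rul_true, rul_pred):
    s = 0.0
    for t, p in zip(rul_true, rul_pred):
        d = p - t  # d > 0: prediction is late (RUL overestimated)
        s += math.exp(d / 10) - 1 if d >= 0 else math.exp(-d / 13) - 1
    return s
```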
Table 6. Reference papers to validate the output of the prognostic algorithms in the GPF.
Paper ID | Reference
1 | [38]
2 | [39]
3 | [35]
Table 7. Random forest algorithm RMSE and score in the three papers from the literature and using the
GPF.
Table 8. Support vector machine RMSE and score in the three papers from the literature and using the GPF.
Table 9. A comparison of applying different rebalancing methodologies and the resulting MSEs on
dataset FD001.
Rebalancing | Feature Engineering | Prognostic Algorithm | MSE
RO | None | rf | 1657.90
None | None | rf | 1650.41
GN | None | rf | 1656.45
WERCS | None | rf | 1658.01
Table 10. A comparison of applying different feature engineering methodologies and the resulting
MSEs on dataset FD001.
Rebalancing | Feature Engineering | Prognostic Algorithm | MSE
None | correlation | rf | 1769.25
None | importance | rf | 1775.82
None | None | rf | 1650.88
None | PCA | rf | 2105.58
Table 11. A comparison of applying the different prognostic algorithms and the resulting MSEs on
dataset FD001.
Rebalancing | Feature Engineering | Prognostic Algorithm | MSE
None | None | rf | 1650.88
None | None | SVM | 1775.05
Table 12. The resulting MSEs of using the GPF versus purely using RF or SVM.
Dataset | GPF (50 Individuals) | RF | SVM
FD001 | 1649.923528 | 1650.410000 | 1775.053164
FD002 | 1877.882809 | 1974.466387 | 2152.961399
FD003 | 4170.124626 | 4239.466717 | 4650.671887
FD004 | 4559.050200 | 4559.050200 | 5238.340000
More insight into the quality of the predictions can be gained from Figures 10–13,
which show the resulting predictions and the ground truth on six randomly selected trajectories
of the test set for the GPF, the RF, and the SVM models. Note that these six trajectories
might not be a representative choice, as the performance varies between
trajectories. Still, the figures give some insight
into how well the models are able to capture degradation. By and large, the trends are
captured quite well. Throughout all four datasets FD001–FD004 it can be observed that
the true RUL is better approximated by predictions for longer trajectories, i.e., trajectories
operating for longer than 100 time cycles. For shorter trajectories throughout all datasets
the algorithm is not able to predict RUL accurately or capture degradation trends. For
dataset FD001, the least complex dataset, the trends are in most cases very close to ground
truth. For most trajectories, the RF outperforms the SVM (see Figure 10a,d,e). For
trajectories 15 and 87, shown in Figure 10b,f, this is not the case; however, these trajectories
only run for a bit more than 70 and 30 time cycles, respectively. In dataset
FD002 for trajectories 7, 15, and 46 represented in Figure 11a,b,e the RUL prediction is very
close to the ground truth and for most of the other trajectories the RUL towards the end
of the component life is predicted quite accurately. In dataset FD003, the predictions seem
more unstable. Still, the degradation is captured quite well, especially towards the end of life.
As mentioned before, for dataset FD004 the GPF chose the RF without any feature engineering
or rebalancing method as the optimal prognostic settings; therefore, only two lines are visible
in the plots. On the chosen trajectories, it seems as if the SVM outperforms the RF quite often,
although most of the trajectories are quite short (none is longer than 175 time cycles), so the
set of trajectories might not be a good representation of the overall performance.
Figure 9. The MSE of GPF versus purely using RF or SVM for the four CMAPSS datasets.
Figure 10. True and predicted value on dataset FD001 for six different trajectories when using the
GPF, RF, and SVM.
Figure 11. True and predicted value on dataset FD002 for six different trajectories when using the
GPF, RF, and SVM.
Figure 12. True and predicted value on dataset FD003 for six different trajectories when using the
GPF, RF, and SVM.
Figure 13. True and predicted value on dataset FD004 for six different trajectories when using the
GPF, RF, and SVM.
Table 13 shows the chosen prognostic settings when running the GPF on the four
C-MAPSS datasets with 20, 30, and 50 individuals. We see a consistency of the choices
of the GPF across the population sizes. Furthermore, in those cases where the choices of
methodologies differ, the changes in the settings are only minor, e.g., a different selection
of rebalancing method for datasets FD001 and FD003. We also note that the
GPF consistently chooses the RF over the SVM; only for dataset FD003 does it select the SVM,
which, together with the suiting feature engineering and data rebalancing methods, even
outperforms the RF. This shows the importance of including such steps when developing
prognostic models. While the differences in terms of MSE are minor in this case, they could
be larger for a different dataset.
Table 13. The resulting prognostic settings when running the GPF with populations of 20, 30, and 50
individuals on the four C-MAPSS datasets.
All in all, on the C-MAPSS dataset, even simple methodologies, such as applying
RF and SVM without any feature engineering or data rebalancing, yield quite promising
results, and although the GPF improves performance, it does not significantly add to the
prediction quality.
for the training, which is useful for two reasons: first, this reduces the dataset size and,
therefore, also the computational time needed. Second and more importantly though, this
reduces noise introduced by long running trajectories that do not contain much information
about degradation behaviour, and condenses the information on the failure dynamics. Note
that in step 4 of the GPF (Section 2.4), when training the prognostic model using the settings
chosen by the GPF, the train and test sets are fixed rather than determined by cross-validation;
the test set consists of the three trajectories with failure IDs 108, 113, and 116.
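The fixed split could be sketched as follows; the record structure and field names are illustrative, while the three test trajectory IDs come from the text.

```python
# Fixed train/test split by failure ID for the cooling-unit case, with the
# three test trajectories named in the text (108, 113, 116).
TEST_IDS = {108, 113, 116}

def split_by_failure_id(records, test_ids=TEST_IDS):
    """records: iterable of dicts, each carrying a 'failure_id' key."""
    train = [r for r in records if r["failure_id"] not in test_ids]
    test = [r for r in records if r["failure_id"] in test_ids]
    return train, test
```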
Table 14. The 24 trajectories of the CUs, the number of flight cycles in operation and the number of
data points after aggregation.
Figure 14. The mean RUL for all trajectories of the cooling unit dataset.
While the MSE is lower when cutting 50 flight cycles (FC) before failure (see Table 15), it
is not directly comparable to the slightly higher MSEs obtained when cutting 100 or 200 FC before
failure, since the MSE punishes false predictions closer to the end of life of a component
less than false predictions at the beginning.
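The 'cut' pre-processing, under the interpretation that cutting N FC before failure keeps only the last N flight cycles of each run-to-failure trajectory (an assumption based on the text), could be sketched as:

```python
# Keep only the last `cut` flight cycles of each trajectory, discarding the
# long healthy prefix that carries little information about degradation.
def cut_trajectory(cycles, cut):
    """cycles: time-ordered per-flight-cycle records ending at the failure."""
    return cycles[-cut:] if len(cycles) > cut else list(cycles)

def cut_dataset(trajectories, cut):
    return {tid: cut_trajectory(c, cut) for tid, c in trajectories.items()}
```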
Table 15. MSE of using GPF, only RF or SVM for different cut settings (cut 50, 100, 200, or 500 FC
before failure).
Population Size | Cut | GPF | SVM | RF
20 | 50 | 121,133 | 252,559 | 256,327
20 | 100 | 160,608 | 228,725 | 235,180
20 | 200 | 176,610 | 191,486 | 186,678
20 | 500 | 12,818 | 43,626 | 75,002
Figure 15. MSE of using GPF, only RF or SVM for different cut settings (cut 50, 100, 200, or 500 FC
before failure).
Table 16 contains the prognostic settings chosen by the GPF for different cut settings
and the corresponding MSEs. In all cases, rebalancing methods are chosen, and in most
cases, feature engineering methods are also included by the GPF to arrive at the optimal
prognostic output. Still, the MSE remains remarkably high in all cases, even where it is low
compared to using only the RF or SVM.
Table 16. Chosen prognostic settings and MSE for different cut settings (cut 50, 100, 200, or 500 FC
before failure).
The resulting predictions and the ground truth on three trajectories of the test set for
the GPF, using only the RF and only the SVM for predictions when using different cut
settings (cutting 50, 100, 200, and 500 FC before failure) are displayed in Figures 16–19.
Cutting 50 FC before failure results in quite unstable predictions, which do not depict any
degradation trends at all. When including a bit more points and cutting 100 FC before
failure, this changes. In fact, using Gaussian noise for rebalancing and applying the
random forest importance feature selection methodology improves the prediction quality
in such a way that a trend is now captured, in contrast to the predictions of the RF and SVM models
(see Figure 17). Table 15 reflects this behaviour in the lower MSE of the GPF as
compared to using only the RF or SVM. When cutting 200 FC before failure, the predictions seem
to be less stable, perhaps due to the additional noise introduced through the data. This
changes again when cutting 500 FC before failure as displayed in Figure 19. In this case,
the predictions become more stable again and especially the GPF captures the degradation
trend quite well.
Figure 16. True and predicted values for three different trajectories of the SCU test set when using
the GPF, RF, and SVM (cut 50 FC before failure).
Figure 17. True and predicted values for three different trajectories of the SCU test set when using
the GPF, RF, and SVM (cut 100 FC before failure).
Figure 18. True and predicted values for three different trajectories of the SCU test set when using the
GPF, RF, and SVM (cut 200 FC before failure).
Figure 19. True and predicted values for three different trajectories of the SCU test set when using
the GPF, RF, and SVM (cut 500 FC before failure).
All in all, the two main points we find when applying the GPF to the cooling unit
dataset can be summarized as follows: first, the GPF outperforms simple machine
learning methods by a clear margin (in terms of MSE). Second, the choice of when to ’cut’ the
data before failure in the training set has a high impact, which can be seen as the impact of labelling
data as ’healthy’ or ’faulty’. A next step could be to use a piecewise linear function similar to
existing approaches on the C-MAPSS dataset, such as the one presented in [40], or to use a
health indicator flagging data as ’healthy’ or ’faulty’.
3.3.2. Using the GPF to Determine the Ability to Perform Prognostics on a System
As noted in the previous section, applying the GPF to real data results in
predictions of much better quality for the cooling unit dataset. This indicates that the GPF
provides a more thorough prognostic assessment than simply applying an RF or SVM
would. Since the framework is straightforward to apply, it can not only give an indication
of which prognostic methodologies might be the most effective on a given dataset, but
can also give an indication of the ability to perform prognostics on a system. In Section 1
we defined the ability to perform prognostics on a system to mean that meaningful and
accurate data-driven prognostic models can be developed based on given underlying
operational and failure data for a system. To be more precise, the assessment of whether or
not a system is prognosable is approached from a data suitability point of view. The aim is
to understand whether, based on the system data, we are able to obtain first simple prognostic
models. If this is not the case, the system data might not be of sufficient quality and size to
train a prognostic model. It is not very surprising that the simple prognostic methodologies
included in the GPF result in quite accurate predictions on all four simulated datasets, both
in terms of MSE and when visually compared against the true RUL. The C-MAPSS datasets
were created for prognostics, and it has been shown multiple times over the past decade
that even with simple methodologies the RUL of the underlying turbofan engine
can be accurately estimated. For the cooling unit dataset, this is a bit more complicated.
There are several additional challenges when working with a real dataset, as covered in
the previous Section 3.3.1. Other authors who have worked on the cooling unit dataset
noticed the same: in their paper in which they present an anomaly detection method and
apply it to the dataset, Basora et al. [17] point out that the prediction of fault occurrences
proved a challenge, especially due to the fact that fault dynamics are different from one
case to another. This situation is not improved by the small number of faults and the
lack of knowledge of failure modes. Still, other authors found that applying prognostic
methodologies to the same dataset results in quite accurate RUL predictions (see [19,41]).
All in all, based upon previous works in the literature, we would conclude that the system
is prognosable given the collected data. The only remaining challenge is to extend the
dataset and, especially, to collect more data concerning faults. This is in alignment with the results
presented in Section 3.2 and becomes especially visible in Figures 17 and 19. Based on this
and our findings in the previous Sections 3.1 and 3.2, the output of the GPF can be used to
tell if a system is prognosable when keeping the following in mind: even if the MSE
contains some information on the ability to perform prognostics on a system, this information
does not suffice to draw firm conclusions. Depending on the dataset, the resulting MSE can
differ significantly, and it does not give a real indication of whether the degradation trend is captured or
not. Table 15 shows that the GPF results in a lower MSE than the classic machine learning
algorithms, but Figure 16 shows that, exactly like the RF and SVM models, it does not seem
to capture any trends at all on the test dataset. For this purpose, visual representations of the
predictions can be helpful.
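The point that MSE alone can be misleading can be illustrated with a minimal sketch. The snippet below is not part of the GPF implementation; the predictor names, noise levels, and the synthetic linearly decreasing RUL profile are all hypothetical assumptions chosen for illustration. It contrasts a noisy but trend-following predictor with a near-constant predictor: both can end up with MSE values of a similar magnitude, yet a simple correlation between predicted and true RUL (a stand-in for the visual inspection advocated above) immediately separates them.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between true and predicted RUL."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def trend_correlation(y_true, y_pred):
    """Pearson correlation between true and predicted RUL;
    values near 1 indicate the degradation trend is followed."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Hypothetical unit with a linearly decreasing true RUL over 100 cycles.
true_rul = np.arange(100, 0, -1, dtype=float)

rng = np.random.default_rng(0)
# Predictor A: noisy but trend-following estimates.
pred_trend = true_rul + rng.normal(0.0, 30.0, size=true_rul.size)
# Predictor B: hovers around the mean RUL -- similar-magnitude MSE, no trend.
pred_flat = true_rul.mean() + rng.normal(0.0, 5.0, size=true_rul.size)

print(f"trend-following: MSE={mse(true_rul, pred_trend):.0f}, "
      f"corr={trend_correlation(true_rul, pred_trend):.2f}")
print(f"near-constant:   MSE={mse(true_rul, pred_flat):.0f}, "
      f"corr={trend_correlation(true_rul, pred_flat):.2f}")
```

The near-constant predictor's MSE is bounded by the variance of the true RUL, so on short-lived units it can even beat a trend-following model on MSE while conveying no degradation information, which is exactly the behaviour observed for the GPF in Figure 16.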
While the availability and quality of data, especially failure-related data, is one of the biggest challenges when applying prognostics to real system data [4], it is also one of the main points we aim to address with the presented framework. Applying the GPF to system data results in an assessment of the suitability of the underlying data for prognostics. Such an assessment can support the decision of whether to invest further prognostic development effort, and it can also provide insights into emerging requirements for additional, or more densely sampled, failure-related or sensor data.
4. Conclusions
We have presented a generic prognostic framework with two major aims: (1) to provide an approach capable of identifying a suitable prognostic model given related system data; and (2) to provide a way to assess system data in terms of the ability to perform prognostics on a system. To substantiate both points, we have applied the framework to two datasets: the synthetic C-MAPSS dataset and a real aircraft system dataset, enabling a comparative evaluation. This is in contrast to existing literature, which focuses exclusively on either synthetic or real data, with the latter being much less prevalent. Additionally, as pointed out in the introduction, recent advances in prognostic method development lack convincing proof of generalizability, i.e., suitability for application beyond synthetic datasets such as C-MAPSS to real-life industrial cases.
The results of our study suggest that the generic prognostic framework can be adapted to various systems and shows potential for valid remaining useful life estimates for aircraft systems. Furthermore, the framework provides a means to quickly assess the ability to perform prognostics based on system data. In addition, we highlight the limitations and challenges of applying prognostics to real-life datasets.
Future research will focus on expanding and testing the methods included in the
framework. Furthermore, the influence of a variety of metrics on prognostic performance
will be assessed more thoroughly.
Author Contributions: Conceptualization, M.B. and W.J.C.V.; methodology, M.B. and W.J.C.V.;
software, M.B.; validation, M.B.; writing—original draft preparation, M.B.; writing—review and
editing, W.J.C.V.; visualization, M.B.; supervision, W.J.C.V.; funding acquisition, W.J.C.V. All authors
have read and agreed to the published version of the manuscript.
Funding: This research has received funding from the European Union’s Horizon 2020 research and
innovation programme under grant agreement No. 769288.
Conflicts of Interest: The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
References
1. Scott, M.; Verhagen, W.J.C.; Bieber, M.T.; Marzocca, P. A Systematic Literature Review of Predictive Maintenance for Defence
Fixed-Wing Aircraft Sustainment and Operations. Sensors 2022, 22, 7070. [CrossRef] [PubMed]
2. Elattar, H.M.; Elminir, H.K.; Riad, A.M. Prognostics: A literature review. Complex Intell. Syst. 2016, 2, 125–154. [CrossRef]
3. Lei, Y.; Li, N.; Guo, L.; Li, N.; Yan, T.; Lin, J. Machinery health prognostics: A systematic review from data acquisition to RUL
prediction. Mech. Syst. Signal Process. 2018, 104, 799–834. [CrossRef]
4. Zio, E. Prognostics and Health Management (PHM): Where are we and where do we (need to) go in theory and practice. Reliab.
Eng. Syst. Saf. 2022, 218, 108119. [CrossRef]
5. Peng, Y.; Dong, M.; Zuo, M.J. Current status of machine prognostics in condition-based maintenance: A review. Int. J. Adv.
Manuf. Technol. 2010, 50, 297–313. [CrossRef]
6. Zhang, J.; Lee, J. A review on prognostics and health monitoring of Li-ion battery. J. Power Sources 2011, 196, 6007–6014.
[CrossRef]
7. Brownjohn, J.; de Stefano, A.; Xu, Y.L.; Wenzel, H.; Aktan, A.E. Vibration-based monitoring of civil infrastructure: Challenges and
successes. J. Civ. Struct. Health Monit. 2011, 1, 79–95. [CrossRef]
8. Baptista, M.; Henriques, E.P.; de Medeiros, I.P.; Malere, J.P.; Nascimento, C.L.; Prendinger, H. Remaining useful life estimation in
aeronautics: Combining data-driven and Kalman filtering. Reliab. Eng. Syst. Saf. 2019, 184, 228–239. [CrossRef]
9. Downey, A.; Lui, Y.H.; Hu, C.; Laflamme, S.; Hu, S. Physics-based prognostics of lithium-ion battery using non-linear least
squares with dynamic bounds. Reliab. Eng. Syst. Saf. 2019, 182, 1–12. [CrossRef]
10. Reddy Lyathakula, K.; Yuan, F.G. Fatigue Damage Diagnostics-Prognostics Framework for Remaining Life Estimation in Adhesive
Joints. AIAA J. 2022, 1–19. [CrossRef]
11. Baruah, P.; Chinnam, R.B.; Filev, D. An autonomous diagnostics and prognostics framework for condition-based maintenance.
IEEE Int. Conf. Neural Netw.—Conf. Proc. 2006, 3428–3435. [CrossRef]
12. Voisin, A.; Levrat, E.; Cocheteux, P.; Iung, B. Generic prognosis model for proactive maintenance decision support: Application to
pre-industrial e-maintenance test bed. J. Intell. Manuf. 2010, 21, 177–193. [CrossRef]
13. An, D.; Kim, N.H.; Choi, J.H. Practical options for selecting data-driven or physics-based prognostics algorithms with reviews.
Reliab. Eng. Syst. Saf. 2015, 133, 223–236. [CrossRef]
14. Hu, C.; Youn, B.D.; Wang, P. Ensemble of data-driven prognostic algorithms with weight optimization and k-fold cross validation.
Proc. Asme Des. Eng. Tech. Conf. 2010, 3, 1023–1032. [CrossRef]
15. Trinh, H.C.; Kwon, Y.K. A data-independent genetic algorithm framework for fault-type classification and remaining useful life
prediction. Appl. Sci. 2020, 10, 368. [CrossRef]
16. Baptista, M.; Nascimento, C.L.; Prendinger, H.; Henriques, E. A case for the use of data-driven methods in gas turbine prognostics.
In Proceedings of the Annual Conference of the Prognostics and Health Management Society, PHM, Paris, France, 2–5 May 2017;
pp. 441–452.
17. Basora, L.; Bry, P.; Olive, X.; Freeman, F. Aircraft fleet health monitoring with anomaly detection techniques. Aerospace 2021,
8, 103. [CrossRef]
18. Mitici, M.; De Pater, I. Online model-based remaining-useful-life prognostics for aircraft cooling units using time-warping
degradation clustering. Aerospace 2021, 8, 168. [CrossRef]
19. Rosero, R.L.; Silva, C.; Ribeiro, B. Remaining Useful Life Estimation of Cooling Units via Time-Frequency Health Indicators with
Machine Learning. Aerospace 2022, 9, 309. [CrossRef]
20. Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140. [CrossRef]
21. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
22. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1995;
Volume 1.
23. Ward, F.R.; Habli, I. An Assurance Case Pattern for the Interpretability of Machine Learning in Safety-Critical Systems. Lect.
Notes Comput. Sci. 2020, 12235, 395–407. [CrossRef]
24. Lewis, A.D.; Groth, K.M. Metrics for evaluating the performance of complex engineering system health monitoring models.
Reliab. Eng. Syst. Saf. 2022, 223, 108473. [CrossRef]
25. Holland, J.H. Adaptation in Natural and Artificial Systems; MIT Press: Cambridge, MA, USA, 1992. [CrossRef]
26. Stanovov, V.; Brester, C.; Kolehmainen, M.; Semenkina, O. Why don’t you use Evolutionary Algorithms in Big Data? IOP Conf.
Ser. Mater. Sci. Eng. 2017, 173. [CrossRef]
27. Jović, A.; Brkić, K.; Bogunović, N. A review of feature selection methods with applications. In Proceedings of the 2015
38th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO
2015—Proceedings, Opatija, Croatia, 25–29 May 2015; pp. 1200–1205. [CrossRef]
28. Branco, P.; Torgo, L.; Ribeiro, R.P. Pre-processing approaches for imbalanced distributions in regression. Neurocomputing 2019,
343, 76–99. [CrossRef]
29. Gado, J.E.; Beckham, G.T.; Payne, C.M. Improving Enzyme Optimum Temperature Prediction with Resampling Strategies and
Ensemble Learning. J. Chem. Inf. Model. 2020, 60, 4098–4107. [CrossRef] [PubMed]
30. Hoque, N.; Bhattacharyya, D.K.; Kalita, J.K. MIFS-ND: A mutual information-based feature selection method. Expert Syst. Appl.
2014, 41, 6371–6385. [CrossRef]
31. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.;
et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [CrossRef]
32. Frederick, D.K.; DeCastro, J.A.; Litt, J.S. User’s Guide for the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS); NASA:
Washington, DC, USA, 2007.
33. Saxena, A.; Goebel, K.; Simon, D.; Eklund, N. Damage Propagation Modeling for Aircraft Engine Prognostics; IEEE: Piscataway, NJ,
USA, 2008.
34. Wang, T.; Yu, J.; Siegel, D.; Lee, J. A similarity-based prognostics approach for remaining useful life estimation of engineered
systems. In Proceedings of the 2008 International Conference on Prognostics and Health Management, PHM 2008, Denver, CO,
USA, 6–9 October 2008. [CrossRef]
35. Jia, X.; Cai, H.; Hsu, Y.; Li, W.; Feng, J.; Lee, J. A novel similarity-based method for remaining useful life prediction using kernel
two sample test. In Proceedings of the Annual Conference of the Prognostics and Health Management Society, PHM, Scottsdale,
AZ, USA, 21–26 September 2019; Volume 11, pp. 1–9. [CrossRef]
36. Heimes, F.O. Recurrent neural networks for remaining useful life estimation. In Proceedings of the 2008 International Conference
on Prognostics and Health Management, PHM 2008, Denver, CO, USA, 6–9 October 2008. [CrossRef]
37. Saxena, A.; Goebel, K.; Simon, D.; Eklund, N. Damage propagation modeling for aircraft engine run-to-failure simulation. In
Proceedings of the 2008 International Conference on Prognostics and Health Management, PHM 2008, Denver, CO, USA, 6–9
October 2008. [CrossRef]
38. Zhang, C.; Lim, P.; Qin, A.K.; Tan, K.C. Multiobjective Deep Belief Networks Ensemble for Remaining Useful Life Estimation in
Prognostics. IEEE Trans. Neural Networks Learn. Syst. 2017, 28, 2306–2318. [CrossRef] [PubMed]
39. Babu, G.S.; Zhao, P.; Li, X.L. Deep convolutional neural network based regression approach for estimation of remaining useful
life. Lect. Notes Comput. Sci. 2016, 9642, 214–228. [CrossRef]
40. Jayasinghe, L.; Samarasinghe, T.; Yuen, C.; Chen, J.; Low, N.; Ge, S.S. Temporal Convolutional Memory Networks for Remaining
Useful Life Estimation of Industrial Machinery. arXiv 2018, arXiv:1810.05644.
41. de Pater, I.; Mitici, M. Predictive maintenance for multi-component systems of repairables with Remaining-Useful-Life prognostics
and a limited stock of spare components. Reliab. Eng. Syst. Saf. 2021, 214, 107761. [CrossRef]