A Rare Failure Detection Model for Aircraft Predictive Maintenance
https://doi.org/10.1007/s00521-022-07167-8
Received: 29 April 2021 / Accepted: 2 March 2022 / Published online: 26 March 2022
© The Author(s) 2022
Abstract
The use of aircraft operation logs to develop a data-driven model to predict probable failures that could cause interruption
poses many challenges and has yet to be fully explored. Given that aircraft are high-integrity assets, failures are exceedingly
rare. Hence, the distribution of relevant log data containing prior signs will be heavily skewed towards the typical (healthy)
scenario. Thus, this study presents a novel deep learning technique based on the auto-encoder and bidirectional gated
recurrent unit networks to handle extremely rare failure predictions in aircraft predictive maintenance modelling. The auto-
encoder is modified and trained to detect rare failures, and the result from the auto-encoder is fed into the convolutional
bidirectional gated recurrent unit network to predict the next occurrence of failure. The proposed network architecture with
the rescaled focal loss addresses the imbalance problem during model training. The effectiveness of the proposed method is
evaluated using real-world test cases of log-based warning and failure messages obtained from the fleet database of aircraft
central maintenance system records. The proposed model is compared to other similar deep learning approaches. The
results indicated an 18% increase in precision, a 5% increase in recall, and a 10% increase in G-mean values. It also
demonstrates reliability in anticipating rare failures within a predetermined, meaningful time frame.
Keywords Predictive maintenance · Deep learning · Extremely rare failure · Auto-encoder · GRU network · Aircraft
1 Introduction
2992 Neural Computing and Applications (2023) 35:2991–3009
aircraft on ground (AOGs) and the associated operational interruptions [2, 3]. A good predictive model could tell which aircraft parts need scheduled checks and which do not, but achieving such maintenance accuracy necessitates experience and the right technology [2].

Artificial intelligence (AI) and related technologies, such as the Internet of Things (IoT), machine learning, and symbolic reasoning, have recently advanced to the point where they are causing a paradigm shift in every aspect of human life, including manufacturing, transportation, energy, and advertising. Aerospace is one of the industries that has felt the impact of AI. Aircraft maintenance is quickly adopting AI to build predictive maintenance towards “aircraft smart maintenance”. Machine learning algorithms are trained to forecast failure and suggest appropriate actions depending on the predicted failure, which is a step towards smart maintenance solutions. Condition-based predictive maintenance provides cost savings over time-based preventive maintenance (Burijs et al. [4]), as maintenance is done based on the condition of the component rather than on elapsed time. The large amount of data generated by IoT devices installed in aircraft to monitor the health of various components, combined with data analytics through machine learning, can significantly improve aircraft maintenance activities.

Applying correct data analytics and training machine learning algorithms with a vast amount of data can reveal underlying patterns and trends that are not visible to humans. The information discovered can support proactive decision-making, such as recommending the best maintenance actions. Therefore, well-developed machine learning algorithms are needed to harness relevant information from big data. As AI and related technologies continue to advance, data become more available, with less challenging acquisition, storage, and processing methods. However, newer analytical challenges are emerging. One unique challenge is extremely rare event prediction: when events are infrequent, the generated data are imbalanced, meaning that there are significantly fewer data in one class than in the other classes. Training a traditional machine learning algorithm with a skewed dataset has been shown to degrade the resulting model’s performance [4]. Therefore, to develop a robust machine learning model for predictive maintenance, it is vital to address imbalanced data either before training (data-level approach) or while training the model (algorithm-level approach).

The challenge traditional machine learning algorithms face with an extremely imbalanced dataset is that they are built on the assumption that the data distribution is always balanced and that the cost of misclassification is the same for all classes [5]. However, that assumption does not hold, because there exist some domains where the data are highly imbalanced and the cost of misclassification is high. An example of such a domain is log data generated by the aircraft central maintenance system, known as ACMS data. ACMS data are usually imbalanced because aircraft component failure rarely occurs during regular flight operations, owing to robust safety measures. Apart from the extreme imbalance, ACMS data pose several analytical issues: irregular patterns and trends, class overlapping, and small class disjuncts. Standard machine learning algorithms and feature selection or extraction methods become less effective when extremely imbalanced data with class overlapping are used for training [6]. Training machine learning algorithms with imbalanced data has been shown to degrade data-driven models’ performance, causing unreliable prognostics [7, 8].

There have been recent improvements in predictive modelling research from both academic and industrial perspectives [9]. There are four types of predictive maintenance modelling approaches, namely physics-based, knowledge-based, data-driven, and hybrid. The physics-based approach focuses on the equipment degradation process and necessitates knowledge of the underlying physical failure mechanisms of the components [10]. Applications of the physics-based modelling approach can be seen in [11, 12], where a digital model of equipment is created to enable the digital-twin (DT) concept in predictive maintenance applications. DT is the concept where multi-physics modelling is combined with data-driven analytics. GE has developed an intelligent IoT-based monitoring and diagnostics platform based on DT to predict the future of physical assets [13]. The advantage of this approach is that it is applicable even if the dataset is scarce.

Another approach to predictive maintenance modelling is knowledge-based or expert system modelling. This approach involves a combination of domain expert knowledge and computational intelligence techniques. It stores information from domain experts, and rulesets are defined based on the knowledge base for interpretation [14]. The knowledge-based approach has been applied for predictive aircraft maintenance [15, 16]. The authors develop a framework and design methodology for the development of knowledge-based condition monitoring systems. Knowledge-based approaches are more practical for a small and basic system. Their implementation in a big, complicated system is difficult and, in some situations, impossible, since domain experts must constantly update the rules in the event of upgrades or changes, which is time-consuming.

The data-driven approach involves learning systems’ behaviour directly from already collected historical operational data to predict the future state of a system, or to identify and match similar patterns in the dataset to infer remaining useful life (RUL) or other insights. The data-driven modelling methods can be grouped into artificial intelligence (AI)-based, statistical modelling, and sequential pattern mining methods [17]. AI methods include machine learning, Bayesian methods, and deep learning methods. AI-based methods have been widely used for developing predictive maintenance models in different industries. Çinar et al. [19] provided a detailed survey on recent applications of AI in predictive maintenance. The hybrid approach combines two or more techniques for estimation to improve accuracy. Improving accuracy in rare failure prediction requires a robust hybrid approach. In recent times, deep learning (DL) models have been shown to produce state-of-the-art performance when trained with large datasets [18, 19] because of their capability of combining feature extraction with learning. The advances in machine learning research, especially using deep neural networks to learn more complex temporal features, make DL suitable for a large log-based dataset [1]. Other work has shown the effectiveness of DL models in handling extremely imbalanced datasets, especially using log-based ACMS datasets to develop aircraft predictive maintenance models [9].

In this study, a data-driven model is proposed for rare failure prediction. The model consists of deep neural networks: the auto-encoder to detect failures, and bidirectional gated recurrent unit (BGRU) networks combined with convolutional neural networks (CNNs) to learn the co-relationships between variables, enhancing the prediction of rare failure. The effectiveness of the model is evaluated using real-world log-based ACMS time-series data. The proposed model will help mitigate the effects of unscheduled aircraft maintenance, producing systematic condition-based predictive maintenance, a step towards a smart aircraft maintenance system.

The remainder of this paper is structured as follows: Section 2 discusses the related work. Section 3 provides a methodology that shows a detailed architecture of the auto-encoder, the convolutional neural network (CNN), and the bidirectional gated recurrent unit (BGRU). Section 4 presents the experimental set-up and case study. The experimental result is presented and discussed in Sect. 5. Finally, Sect. 6 presents the conclusion and further work.

2 Related work

Deep learning is a branch of machine learning that consists of multiple processing layers that use artificial neural networks (ANNs) to learn data representations at multiple levels of abstraction. Deep learning models have dramatically improved the performance of models in a variety of areas, including large-scale data processing and image identification, among others [7]. The success has been attributed to an increase in the availability of data, hardware and software improvements, and many breakthroughs in algorithm development that speed up training and other data generalisations [20]. Despite the advances, little work has been done to investigate the effect of extreme imbalance, class overlapping, and small class disjuncts on the network architectures. Many researchers have agreed that the subject of imbalanced data with deep learning is understudied [21–24]. In deep learning, the ANNs are trained to find complex structures in a dataset by using a back-propagation algorithm. The algorithm calculates errors made by the model during training, and the model’s weights are updated in proportion to the error. The drawback of this learning method is that examples from both classes are treated the same. When the data are imbalanced, the model adapts more to the majority class than to the minority class, which can affect the performance of the model [20]. The majority of the deep learning methods for imbalanced classification have depended on integrating either resampling or cost-sensitive learning into the deep learning process [25]. For instance, Hensman et al. [26] use random oversampling to balance the data and then train a CNN on the balanced data. Also, Lee et al. [22] use random undersampling to balance the dataset for the purpose of pretraining a CNN. The use of dynamic sampling to adjust the sampling rate according to the class size for training a CNN was proposed by Pouyanfar et al. [27]. Buda et al. [24] investigate the effect of random oversampling, random undersampling, and two-phase learning across many imbalanced datasets on deep neural networks. The literature review [20, 24] reveals that most of the proposed deep learning resampling approaches for imbalanced problems use image datasets and CNN architectures. An investigation of the effect of imbalance on other deep learning architectures, and on time-series data, is still lacking.

On the other hand, several studies have focused on applying cost-sensitive strategies to solve the problem of imbalanced classification, which entails changing the deep learning process to favour both classes during model training. For example, Khan SH et al. [28] proposed a cost-sensitive deep neural network that can automatically learn robust feature representations for both the majority and minority classes. Also, Zhang et al. [29] propose cost-sensitive deep belief networks, and Wang H et al. [30] propose a cost-sensitive deep learning approach to predict hospital readmission. The use of the loss function to control biases has been shown in Wang S et al. [23]. The authors proposed a novel loss function called mean false error and its improved version, mean-squared false error, for learning from an imbalanced dataset. Similarly, a new loss function called focal loss was proposed by Lin et al. [31] for dense object detection in image classification. The
focal loss was proposed to specifically handle the challenge of extreme data imbalance commonly faced in object detection problems, where the background samples usually outnumber the foreground samples. Normally, this type of problem is solved using either the one-stage or the two-stage detection approach. Two-stage detection usually performs better, at the cost of computation time, compared to one-stage detection. Lin et al.’s [31] study focused on determining how the one-stage approach, with its fast computation time, can achieve state-of-the-art performance compared to the two-stage approach. Their study discovered that the main cause of performance degradation in one-stage detection is the imbalanced data problem. The overwhelming background samples create imbalance, causing the majority class to account for most of the overall loss. To address that challenge, Lin et al. [31] proposed a loss function known as the focal loss (FL), derived from the normal binary cross-entropy loss. The FL is expressed as follows:

$$\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma} \log_{10}(p_t) \tag{1}$$

The new FL tries to reduce the impact that the majority of samples have on the loss by multiplying the cross-entropy loss by a modulating factor $(1 - p_t)^{\gamma}$, where the hyperparameter $\gamma \ge 0$ adjusts the rate at which the negative samples are down-weighted. Their implementation shows that one-stage detection with focal loss, given a suitable choice of this rate, outperformed the two-stage approach. The implementation was only compared with cross-entropy and tested for imbalance problems in object detection. The focal loss was later tested in image classification by K Nemoto et al. [32]. The authors use a CNN architecture and compare the performance of focal loss and cross-entropy loss for image classification. The open literature lacks a study investigating the focal loss’s effectiveness on time-series, log-based system datasets, particularly the ACMS dataset.

The identification and prediction of rare failures are active research subjects that have sparked the creation of a variety of methodologies [33]. Asset rare failure prediction is a critical issue that has been approached within various contexts, such as machine learning and statistics [1, 17]. System log data have widely been used to develop rare failure predictive models in different domains. For example, deep learning has been used to predict rare IT software failures using a log-based dataset [34]. Panagiotis et al. [35] developed a failure event model using post-flight records. The authors used multiple-instance learning approaches to structure the model as a regression problem to approximate the risk of a target event occurring. Sipos et al. [36] developed a data-driven approach based on multiple-instance learning for predicting equipment failures. Evgeny [10] developed a data-driven rare failure prediction model using event matching for aerospace applications. As seen in the previous study by Maren et al. [1], one of the approaches to identifying and predicting rare failure is anomaly detection, which is framed in the form of unsupervised machine learning, where the data are divided and labelled as negative and positive samples. In the case of using an auto-encoder, each class is treated separately, and the negatively labelled samples’ low-dimensional features are extracted from higher-dimensional data using any feature extraction process [1]. Then, rare failures are detected and predicted based on the reconstruction error. The best-known traditional data reduction and fault detection methods are principal component analysis (PCA), partial least squares (PLS), and independent component analysis (ICA). These methods use different ways to reduce data dimensionality, and they have achieved varying degrees of success on different data distributions [1, 37]. However, they have fundamental limitations with nonlinear features since they rely on linear techniques. Kernel tricks have been developed to convert the nonlinear raw data into linear data; examples are KPCA [37] and KICA [38]. However, they require high computational power because of the kernel function, especially for large data [1].

Deep learning (DL) has recently shown superior performance in many areas, such as image classification. It has also been widely used in the finance sector for the analysis of time-series data [9]. DL can also be utilised for predictive maintenance. The system installed to monitor an asset’s state generates extensive time-series data. Therefore, deep learning algorithms are trained using time-series data to find patterns to predict failures. Recent developments in deep learning have made it easy for deep, complex artificial neural networks to automatically extract features from the original dataset (dimension reduction) during training [39, 40]. The auto-encoder (AE) [41] is an example of a deep neural network algorithm that has been successfully implemented for fault detection and prediction. However, it needs larger data samples and a longer processing time to achieve higher performance [42].

Advances have been made to tackle slightly rare event predictions, especially in the aerospace domain, using machine learning approaches [1, 43, 44]. Deep learning models have also been developed for rare event predictions. For example, Wu et al. [18] developed a weighted deep representation learning model for imbalanced fault diagnosis in cyber-physical systems. Their model is composed of a long recurrent convolutional LSTM model with a sampling policy. Also, Khanh et al. [19] developed a dynamic predictive maintenance framework based on sensor measurements. Changchang et al. [45] combined multiple DL algorithms for aircraft prognostic and health management. In fact, Burnaev et al. [46] pointed out that
many aircraft predictive maintenance solutions are built on basic threshold settings that detect trivial errors on specific components. On the other hand, the threshold-setting strategy is prone to producing high false-positive rates, which lowers model confidence.

Although the approaches mentioned above have successfully handled normal fault detection and prediction, there has been limited study of the application of deep learning models for extremely rare failure prediction, especially for predictive aircraft maintenance using the ACMS dataset. Also, developing a robust predictive model for costly rare aircraft component failure using a large log-based dataset is quite challenging because many components work together and influence each other’s lifetime. Another challenge is the heterogeneous nature of the ACMS log data, including symbolic sequences, numeric time series, categorical variables, and unstructured text.

Therefore, our approach focuses on extremely rare failure prediction using log-based aircraft central maintenance system (ACMS) data. Secondly, the work also concentrates on applying a hybrid of deep learning techniques for performance optimisation. The proposed model integrates AE with BGRU and CNN to detect and predict extremely rare aircraft component replacements. The hybrid method is designed to address the challenge of irregular patterns and trends caused by skewed data distributions, hence enhancing the prediction of rare failures.

3.1 Auto-encoder and bidirectional gated recurrent unit network architecture

This section explains how to combine auto-encoder and bidirectional gated recurrent unit network designs to improve predictive model performance using large log-based, multivariate, nonlinear, and time-series datasets.

3.1.1 The auto-encoder (AE)

As presented in Maren et al. [1], the auto-encoder [47, 48] is a specific type of multi-layer feedforward neural network in which the number of input neurons is the same as the number of output neurons. AE aims to learn the original data’s internal representation by compressing the input into a lower-dimensional space called the latent-space representation (see Fig. 1). It then uses the compressed representation to reconstruct the output while minimising the error with respect to the input data. Training is done using a back-propagation algorithm with respect to the loss function. AE comprises three components: encoder X, latent-space P, and decoder Y. The encoder compresses the input and produces the latent representation. The decoder then reconstructs the input using only the latent representation. An encoder with more than one hidden layer is called a deep auto-encoder.

The encoding and decoding process can be represented by the following equations:

$$p_i = f(w_p \cdot x_i + b_t) \tag{2}$$

$$y_i = g(w_y \cdot p_i + b_t) \tag{3}$$

where $f(\cdot)$ and $g(\cdot)$ are the sigmoid functions, $w_i$ represents the weights, and $b_i$ represents biases. The model is trained by minimising the following loss function:

$$L(X, Y) = \frac{1}{2n} \sum_{i=1}^{n} \lVert x_i - y_i \rVert^2 \tag{4}$$

where $x_i$ represents the observed values, $y_i$ represents the predicted values, and $n$ represents the total number of predicted values.

Equation (3) helps in checking the validity of the resulting underlying feature P.

Figure 1 shows a more detailed visualisation of an auto-encoder architecture. First, the input data pass through the encoder, a fully connected artificial neural network (ANN), to produce the middle code layer. The decoder, which has a mirrored ANN structure, produces the output from the middle code layer. The goal is to get an output identical to the input. Creating many encoder layers and decoder layers will enable the AE to represent more complex input data distributions [1].

A bidirectional gated recurrent unit (BGRU) is a recurrent neural network that has successfully been used to solve time-series sequential data problems because of its bidirectional learning approach, which enhances the learning of temporal patterns in the time-series data [49]. Each BGRU block contains a cell that stores information. Each block is made up of a reset gate and an update gate, and the cells help tackle the vanishing gradient problem (Janusz et al. [50]). The reset gate determines how to combine new input with previous memory, while the update gate defines how much of the previous memory to retain. A BGRU comprises two GRU blocks. The input data are fed into the two networks, the feedforward and the feedback network with respect to time, and both of them are connected to one output layer [51]. The gates in a bidirectional GRU are designed to store information longer in both forward and backward directions, providing better performance than feedforward networks. The bidirectional approach provides the capability to use both the past and future contexts in a sequence. The BGRU is expressed by Eq. (5).
Fig. 1 Auto-encoder architecture [47]
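The encoder, decoder, and reconstruction loss of Eqs. (2)–(4) can be illustrated with a small forward pass. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the layer sizes, the random untrained weights, the separate bias vectors `b_p` and `b_y`, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Element-wise logistic function, used here for both f and g."""
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, w_p, b_p):
    # Eq. (2): latent code p_i = f(w_p . x_i + b), one row per sample
    return sigmoid(x @ w_p + b_p)

def decode(p, w_y, b_y):
    # Eq. (3): reconstruction y_i = g(w_y . p_i + b)
    return sigmoid(p @ w_y + b_y)

def reconstruction_loss(x, y):
    # Eq. (4): L(X, Y) = 1/(2n) * sum_i ||x_i - y_i||^2
    n = x.shape[0]
    return np.sum((x - y) ** 2) / (2.0 * n)

# Illustrative sizes (assumptions): 5 samples, 8 features, 3 latent units.
rng = np.random.default_rng(0)
x = rng.random((5, 8))
w_p = rng.normal(scale=0.1, size=(8, 3))
b_p = np.zeros(3)
w_y = rng.normal(scale=0.1, size=(3, 8))
b_y = np.zeros(8)

p = encode(x, w_p, b_p)          # compressed latent representation
y = decode(p, w_y, b_y)          # reconstruction of the input
loss = reconstruction_loss(x, y)

# Per-sample squared reconstruction error, the quantity a detector
# would compare against a threshold.
per_sample_error = np.sum((x - y) ** 2, axis=1)
```

With trained weights, samples whose `per_sample_error` is unusually large are the candidates for rare failures; here the weights are random, so the numbers only demonstrate the shapes and the loss computation.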
The BGRU state at time $t$ combines the two blocks:

$$h_t = \left[\overrightarrow{h_t}, \overleftarrow{h_t}\right] \tag{5}$$

where $\overrightarrow{h_t}$ is the feedforward block and $\overleftarrow{h_t}$ the backward block.

The final output layer at time $t$ is:

$$y_t = \sigma(W_y h_t + b_y) \tag{6}$$

where $\sigma$ is the activation function, $W_y$ is the weight, and $b_y$ is the bias vector.

As seen in Figs. 2 and 3, each of the GRU blocks is made up of four components: the input vector $x_t$ with corresponding weights and bias, the reset gate $r_t$ with corresponding weights and bias $W_r, U_r, b_r$, the update gate $z_t$ with corresponding weights and bias $W_z, U_z, b_z$, and the output vector $h_t$ with its weights and bias $W_h, U_h, b_h$. The fully gated unit is represented as follows. Initially, for $t = 0$, the output vector is $h_0 = 0$.

$$z_t = \sigma_g(W_z x_t + U_z h_{t-1} + b_z) \tag{7}$$

$$r_t = \sigma_g(W_r x_t + U_r h_{t-1} + b_r) \tag{8}$$

$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \phi_h\left(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\right) \tag{9}$$

where $\odot$ is the Hadamard product, and $W$, $U$, and $b$ are parameter matrices and vectors. $\sigma_g$ and $\phi_h$ are the activation functions: $\sigma_g$ is a sigmoid function, and $\phi_h$ is a hyperbolic tangent.

The BGRU section of the model is designed as follows. First, the BGRU cells are constructed so that the results of the feedforward propagation ($F_t$) and the feedback propagation ($B_t$) are computed and merged at the first BGRU layer. Four methods can merge the outcome: concatenation (default), summation, multiplication, and averaging. In this study, we will compare the performance of each merging method. The merging is represented as follows:

$$O_t^{1} = \mathrm{concat}\left(\overrightarrow{F_t}, \overleftarrow{B_t}\right) \tag{10}$$

such that $\overrightarrow{F_t} = \left(\overrightarrow{h_1}, \overrightarrow{h_2}, \overrightarrow{h_3}, \ldots, \overrightarrow{h_t}\right)$ and $\overleftarrow{B_t} = \left(\overleftarrow{h_t}, \overleftarrow{h_{t+1}}, \overleftarrow{h_{t+2}}, \overleftarrow{h_{t+3}}, \ldots, \overleftarrow{h_n}\right)$.

Second, a fully connected layer is used to multiply the BGRU network’s output with its weight and bias. Then, a Softmax regression layer makes a prediction using input
from the fully connected layer. A weighted classification layer is used to compute the weighted cross-entropy loss function for the prediction score and training target, which helps tackle the imbalanced classification problem. The following loss is used:

$$\mathrm{FL}(p_t) = -\left(1 - (p_t)^{\gamma}\right) \log_2(p_t)\, \theta_i \tag{11}$$

where $p_t$ represents the estimated probability of each class, $\gamma \ge 0$ is the discount factor parameter that can be tuned for best estimation, and $\theta_i$ is the logic weight of each class (Table 1).

Table 1 Proposed BGRU architecture

Layer (type)                   Output shape   Param #
Bidirectional (Bidirectional   multiple       8256
Bidirectional_1 (Bidirection   multiple       7872
Repeat_vector (RepeatVector)   multiple       0
Bidirectional_2 (Bidirection   multiple       4800
Bidirectional_3 (Bidirection   multiple       12,672
time_distributed (TimeDistri   multiple       585
Total params: 34,185
Trainable params: 34,185
Non-trainable params: 0

3.1.3 The convolutional neural networks

The use of deep learning approaches to process time-series data has recently been shown to produce improved results [52]. One of the deep learning approaches that have been widely used is convolutional neural networks (CNNs). CNN’s popularity is attributed to its capability to read, process, and extract the most important features of two-dimensional data, contributing to its performance improvement, especially for image classification [53, 54]. In a scenario where the input data are not images, such data can be transformed to suit CNN [55]. Time-series data are one of those data structures that can be transformed for CNN applications. As seen in Fig. 3, with a time-series dataset of length M and width N, the length is the number of timesteps in the data, and the width is the number of variables in a multivariate time series. In transforming the time-series data for CNN [56, 57], a 1D convolutional kernel would be of the same width (number of variables). The kernel then moves from top to bottom, performing convolutions until the end of the series. The time-series elements covered at a given time (window) are multiplied with the convolutional kernel elements, the multiplication results are added, and a nonlinear activation function is applied to the value. The resulting value becomes an element of the new filtered series. The kernel then moves forward to produce the next value. Max-pooling is applied to each of the filtered series of vectors. The vector’s largest value is chosen, which is used as an input to a regular, fully connected layer (Fig. 4).

There is no out-of-the-box or specified rule-of-thumb technique for constructing the framework of BGRU with CNN. A standard artificial neural network structure usually consists of an input layer, one or more hidden layers, and an output layer. The number of hidden layers and neurons used to achieve an optimal solution varies per situation, and it is usually a trial-and-error process. The most common approach is the use of K-fold cross-validation, as seen in [58–60]. However, for evaluation, a number of nodes $M_k$ needs to be defined, which can be obtained by a simple formula,

$$M_k = \frac{M_s}{\alpha (M_i + M_0)} \tag{12}$$
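As a sketch of how Eq. (12) yields candidate hidden-layer sizes for cross-validation, the snippet below evaluates it for several scaling factors. The function name, the sample and neuron counts, and the choice of integer scaling factors 2 to 9 are illustrative assumptions, not values taken from the study.

```python
def hidden_nodes(n_samples, n_inputs, n_outputs, alpha):
    """Eq. (12): M_k = M_s / (alpha * (M_i + M_0))."""
    return n_samples / (alpha * (n_inputs + n_outputs))

# Placeholder sizes (assumptions): training samples, input and output neurons.
M_s, M_i, M_0 = 10_000, 20, 2

# One candidate hidden-layer size per scaling factor, rounded to an integer
# so it can be used as a layer width in the cross-validation loop.
candidates = {alpha: round(hidden_nodes(M_s, M_i, M_0, alpha))
              for alpha in range(2, 10)}
```

Each value in `candidates` would then be tried as the hidden-layer width, and the one with the lowest validation error kept. A larger scaling factor gives a smaller network, which is one way to limit overfitting when the training set is small.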
In Eq. (12), $M_s$ is the total number of samples in the training data, $M_i$ and $M_0$ are the numbers of input and output neurons, respectively, and $\alpha$ is the scaling factor. For example, if $\alpha$ is set between two and ten, we can calculate eight different numbers to feed into the validation process to obtain an optimal result. The number of parameters to train is computed as in Eqs. 5–11, the number of inputs in the first layer equals the defined window size, and the number of folds to use in the cross-validation. Each subsequent layer takes the outputs of the previous layer as its input. A simulation is conducted, and the training and testing errors are plotted over the number of neurons in the hidden layer. The number of neurons that minimises the test error is chosen, while keeping an eye on overfitting. Because the problem is formulated as binary classification and the data are extremely imbalanced, we use a modified loss function (Eqs. 6–10) and Softmax as the final activation function.

3.2 Proposed method

3.2.1 AE–CNN–BGRU network

Our goal is to create a model that can identify and predict rare failures using a large log-based dataset. The main idea is to separate the prediction of rare failure from its detection, as shown in Fig. 5. As a result, the proposed model uses two stages: an auto-encoder for detecting unusual failures, and BGRU and CNN architectures for forecasting future instances of that failure.

The BGRU was chosen in the design because it can capture long dependencies in both directions (forward and backward) to allow for successful learning. The rationale for the method selection is based on the dataset’s characteristics (i.e., heterogeneous and time series in nature). Recurrent neural networks (RNNs) are commonly used to train on time-series datasets; nevertheless, RNNs suffer from vanishing gradient difficulties and have a short-term memory. When using a gradient-based learning strategy with back-propagation to train a deep multi-layer RNN (feedforward network), the problem of vanishing gradients emerges [61]. Each iteration of the method updates each weight of the ANN in proportion to the partial derivative of the error function with respect to that weight. The problem develops when valuable gradient information cannot be propagated back from the output layer to the model’s input layer [62]. The gated recurrent unit (GRU) networks were designed to capture long-time dependencies in sequence learning and to overcome the vanishing gradient problem in RNNs by using modified hidden layers or gates.

A convolutional neural network (CNN) uses a process known as convolution when determining a relationship between available variables in the dataset [20]. For example, in convolutional learning, given two functions f and g, the convolution integral expresses how the shape of one function is modified by the other. Traditionally, CNNs were designed to process multi-dimensional data, such as images for classification, not to account for sequential dependencies as RNNs, LSTMs, or GRUs do [63]. Therefore, the key benefit of adding CNN layers for sequential learning is their ability to use filter banks [64] to compute dilations between each cell, also referred to as “dilated convolution”, which in turn allows the CNN layers to better understand the relationships between the different variables in the dataset, generating improved results.

As explained in Maren et al. [1], the dataset is extremely imbalanced; that is, the imbalance ratio between the positively labelled and negatively labelled data is less than
123
Neural Computing and Applications (2023) 35:2991–3009 2999
Fig. 5 An integrated AE, BGRU, and CNN networks for rare fault detection and prediction
5% of the total. In such an extremely rare problem, traditional deep learning algorithms are overwhelmed by the majority class, producing biased results to the detriment of the minority class [41, 65]. Therefore, we propose AE–CNN–BGRU to handle the problem differently. The framework of the proposed model is shown in Fig. 5. The first AE model is used to detect rare failures using reconstruction errors at the detection stage. The data are divided into positively labelled (rare minority class) and negatively labelled (majority class). The AE model is then trained with only negatively labelled data (X_ve) by feeding the encoder layer of the AE with the original negatively labelled data. The latent code, which represents a compressed feature, is extracted in the middle layer. The decoder layers then reconstruct the original data using the compressed latent code as input. After the encode–decode process, the reconstruction error is known, including the highest error, which is later used for threshold setting. Since the AE model is first trained using only negatively labelled data, when the data are combined (X_t) and fed into the AE model, an anomaly can easily be detected: any data point coming from the negatively labelled class is expected to have a low error, and one coming from the positive class is expected to have a higher error. The low error arises because the point comes from the same data used to train the first-section AE model (as seen in the detection phase of Fig. 5). On the other hand, when a new data point is from the positively labelled class, it is expected to have a higher reconstruction error score, which will be picked up as an anomaly [1].
For example, when a data point x_t is fed into the AE model, it will be classified as a fault if the reconstruction error exceeds a defined threshold; otherwise, it will be classified as no-fault. Once the faults are identified, the resulting compressed data are then fed into the next section of the framework, which is the AE–BGRU or AE–CNN–BGRU model for the failure prediction. The input data to the prediction model are the learned latent representation of the original dataset. To determine a threshold that offers the best result, we construct a function that iterates through a loop using precision and recall until the desired threshold is obtained.
4 Case study and experimental setup
The goal of the experiment is to see how well our proposed technique handles the infrequent incidences of failure. The primary research question is whether AE–CNN–BGRU can beat traditional unidirectional deep learning time-series approaches with explicit failure detection and additional training capabilities on an extremely imbalanced dataset. Another significant question is whether learning in two directions (feedforward and feedback propagation) might increase model performance for rare failure prediction. Also, how differently do deep learning architectures treat the input data? A series of experiments is conducted to answer these questions. Log-based data from the aircraft central maintenance system, which include aircraft failure and warning alerts, are used. The following experiments are set up.
1. To investigate whether the proposed AE–BGRU model has a performance advantage over the normal GRU model in predicting rare aircraft component failure.
2. To investigate whether additional layers of training in the AE–CNN–BGRU model architecture can improve model performance.
3. To investigate whether training the proposed model using an extremely imbalanced dataset in a
bidirectional way (forward and backwards) can improve model performance.
4. To provide a performance analysis of deep learning architecture for the rare failure prediction via the log-based ACMS dataset.
The modelling approach is divided into two categories: binary class and multi-class. We characterised the first situation as a multi-class classification problem that predicts all the targeted component failures at once. Second, we modelled it as a binary classification problem in which specific functional items are predicted.
4.1 Dataset
As Maren et al. [66] explained, this study uses over eight years' worth of data recorded from more than 60 aircraft. The dataset is collected from two databases. The first database is the aircraft central maintenance system (ACMS) data, which comprises error messages from BIT (built-in test) equipment (that is, aircraft fault report records) and the flight deck effects (FDE). These messages are generated at different flight phases (take-off, cruise, and landing). The second database is the record of aircraft maintenance activities (i.e., the comprehensive description of all aircraft maintenance activities recorded over time). The dataset is obtained from a fleet comprising A330 and A320 aircraft. Some components, identified by functional item number (FIN), are chosen for validation. The target components are chosen based on their high practical value and an adequate number of known failure cases. The other consideration is that the chosen components are ones replaced through unscheduled maintenance. Figure 6 shows an example of the ACMS dataset.
Data from the year 2011 to 2016 are used for training, while the remaining data from 2016 to 2018 are used for testing. The targeted LRUs from the A330 aircraft family are 4000KS (Electronic Control Unit/Electronic Engine Unit), 4000HA (Pressure Bleed Valve), and 438HC (Trim Air Valve). From the A320 family, they are 11HB (flow control valve), 10HQ (Avionics equipment ventilation computer), and 1TX1 (air traffic service unit).
4.2 Evaluation metrics
In general, "accuracy" is the most important performance metric in machine classification. However, using accuracy to measure performance in extremely imbalanced classification problems can be misleading since, in order to attain high overall accuracy, classifiers would be biased towards the majority class. As a result, various alternative metrics, like precision, recall, g-mean, and area under the curve, are used to better evaluate the classifiers' performance [67].
From Fig. 7, we derive Eqs. (12) to (16).
$$\mathrm{Accuracy} = \frac{tp + tn}{n} \qquad (12)$$
that is, the ability to correctly classify all observations.
Precision (P) is the measure of classifier exactness: the percentage of positive predictions made by the classifier that are truly correct. So, low precision indicates a large number of false positives.
$$P = \frac{tp}{tp + fp} \qquad (13)$$
Recall (R) is the classifier completeness measure and is defined as the percentage of true positives that the classifier can correctly detect. So, low recall indicates many false negatives.
$$R = \frac{tp}{tp + fn} \qquad (14)$$
G-mean measures the root of the product of class-wise sensitivity; it maximises each class's accuracy and keeps the accuracy balanced.
$$G\text{-}\mathrm{mean} = \sqrt{P \times R} \qquad (15)$$
The false-positive rate is calculated using the following equation.
$$\mathrm{FPR} = \frac{fp}{fp + tn} \qquad (16)$$
Receiver operating characteristic curve (ROC): ROC is a graphical representation that illustrates the classifier's diagnostic ability as the discriminant threshold is varied. An excellent model has an area under the curve (AUC) with a value near one, meaning the model has a good separability measure.
Assuming we have two classes, positive and negative, the ROC curves of those classes' probabilities are described in Figs. 8 and 9.
Figure 8 shows the ROC curve for an ideal situation. The green distribution curve represents the positive class (component failure), and the black distribution curve represents the negative class (non-failure). When the two curves do not overlap, the model has an ideal measure of separability (that is, the model can correctly distinguish between positive and negative classes).
Figure 9 depicts a case in which the two distributions intersect. Type 1 and type 2 errors will be introduced in this instance. These errors can be minimised or maximised depending on the threshold value. When the AUC is 0.8, the model has an 80% chance of correctly distinguishing between positive and negative data. The worst case is AUC = 0, where the model is reciprocating the classes, meaning that it predicts a negative class as a positive class and vice versa. When AUC = 0.5, the model has no ability to distinguish between classes.
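As a minimal sketch of how Eqs. (12) to (16) behave when the decision threshold is varied, the Python snippet below computes the metrics from confusion-matrix counts. The function names, the toy anomaly scores and labels, and the assumption that n = tp + tn + fp + fn are illustrative only and are not taken from the study's implementation.

```python
import math


def confusion_counts(scores, labels, threshold):
    """Count tp, tn, fp, fn when a score above `threshold` is predicted as failure (1)."""
    tp = tn = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score > threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        elif predicted == 1 and label == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Metrics of Eqs. (12)-(16); n is assumed to be the total number of observations."""
    n = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / n,                 # Eq. (12)
        "precision": precision,                    # Eq. (13)
        "recall": recall,                          # Eq. (14)
        "g_mean": math.sqrt(precision * recall),   # Eq. (15)
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,  # Eq. (16)
    }


# Hypothetical anomaly scores (e.g. reconstruction errors) and true labels.
scores = [0.02, 0.03, 0.05, 0.04, 0.30, 0.06, 0.45, 0.01, 0.07, 0.25]
labels = [0, 0, 0, 0, 1, 0, 1, 0, 0, 0]

for threshold in (0.05, 0.2, 0.4):
    metrics = classification_metrics(*confusion_counts(scores, labels, threshold))
    print(threshold, {name: round(value, 3) for name, value in metrics.items()})
```

On these toy scores, raising the threshold trades recall for precision while the false-positive rate falls, which is the trade-off that the ROC curve summarises across all thresholds.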
4.3 Sensitivity analysis for BGRU merge modes
Sensitivity analysis was carried out to determine the best merging mode that can be used to integrate the outcomes of the BGRU layers for the proposed model. As shown in Fig. 10, which plots loss against epoch, a line plot is created to compare the four merge modes (summation, concatenation, multiplication, and average). A time-series dataset of size 10,000 was generated and used for training, with the loss shown in Eqs. (5–10) and the BGRU networks run for 200 epochs. The result indicates that concatenation (the green line) is the best merge mode because it has lower loss values.
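To make the four merge modes concrete, the NumPy sketch below combines the hidden states of a forward and a backward pass in each of the four ways. The function name, the mode strings, and the toy arrays are assumptions for illustration; the mode strings happen to mirror the `merge_mode` options of the Keras `Bidirectional` wrapper, but the source does not state which library was used.

```python
import numpy as np


def merge_bidirectional(h_forward, h_backward, mode="concat"):
    """Combine forward and backward hidden states of shape (time_steps, units)."""
    if mode == "sum":
        return h_forward + h_backward
    if mode == "mul":
        return h_forward * h_backward
    if mode == "ave":
        return (h_forward + h_backward) / 2.0
    if mode == "concat":
        # Concatenation keeps both directions intact and doubles the feature size.
        return np.concatenate([h_forward, h_backward], axis=-1)
    raise ValueError(f"unknown merge mode: {mode}")


# Hypothetical hidden states for 3 time steps and 2 units per direction.
h_fwd = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
h_bwd = np.array([[0.5, 0.5], [1.0, 1.0], [2.0, 2.0]])

for mode in ("sum", "concat", "mul", "ave"):
    merged = merge_bidirectional(h_fwd, h_bwd, mode)
    print(mode, merged.shape)
```

Only concatenation preserves the two directions as separate features, leaving the next layer free to weight them independently; the other three modes collapse them element-wise.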
Table 2 Aircraft A330 and A320 rare failure prediction of individual LRUs using ACMS dataset
Aircraft     LRU     IR      GRU (Baseline)              AE–BGRU                         AE–CNN–BGRU
                             P     R     GM    FPR       P      R      GM     FNR        P      R      GM     FNR
A330 Family  4000KS  0.0043  0.60  0.55  0.53  0.005     0.720  0.61   0.67   0.00091    0.909  0.66   0.778  0.00011
             4000HA  0.0047  0.41  0.40  0.41  0.008     0.538  0.538  0.632  0.00127    0.769  0.768  0.769  0.000638
             438HC   0.0044  0.54  0.51  0.53  0.006     0.666  0.600  0.632  0.00083    0.88   0.610  0.730  0.00027
A320 Family  11HB    0.0028  0.62  0.51  0.49  0.005     0.660  0.58   0.624  0.00019    0.66   0.59   0.671  0.00019
             10HQ    0.0031  0.60  0.51  0.55  0.006     0.625  0.49   0.55   0.00028    0.75   0.66   0.707  0.000191
             1TX1    0.0064  0.66  0.52  0.58  0.007     0.866  0.764  0.814  0.00029    0.85   0.741  0.860  0.000193
LRU is an aircraft line replacement unit. IR is the imbalance ratio, P is precision, R is recall, GM is g-mean, FPR is the false-positive rate, and FNR is the false-negative rate
achieved a precision of 66%, recall 59%, g-mean 67%, and a false-positive rate of 0.019% compared to GRU with a precision of 61%, recall 51%, g-mean 49%, and a false-positive rate of 0.005. Similar performance is seen for other components, the 10HQ (Avionics equipment ventilation computer) and 1TX1 (Air traffic service unit).
In the six FINs considered, the proposed models show a significant improvement in reducing the false-positive rate, which is very important for the acceptability of any predictive maintenance model. Also, the AE–CNN–BGRU model shows an overall improvement of 18% in precision, 5% in recall, and 10% in G-mean.
5.1 Measuring the success rate of the proposed models using A330 aircraft
Figure 12 shows the ROC curve for the proposed models AE–BGRU and the AE–CNN–BGRU. The ROC curve for the AE–CNN–BGRU model (Fig. 12b) has AUC = 0.822, which indicates that there is an 82.2% chance that the model will be able to distinguish between the positive class (component failure) and the negative class (non-failure). In contrast, Fig. 12a shows the ROC curve for the AE–BGRU model with AUC = 0.737, which indicates that the model has a 73.7% chance of distinguishing between classes.
Fig. 12 ROC curve for FIN_4000KS prediction using (a) AE–BGRU and (b) AE–CNN–BGRU models
Also, to measure the model success rate in predicting extremely rare failure, a confusion matrix was plotted for both proposed models. Figure 13 shows a confusion matrix for predicting the failure of the electronic engine unit (FIN_4000KS). As seen in Fig. 13a, the AE–BGRU model predicted eight failures correctly out of the eleven true failures, and Fig. 13b shows that the AE–CNN–BGRU model predicted ten out of eleven. This prediction includes 10 flight legs in advance. It can also be observed that the AE–CNN–BGRU model predicts approximately 94% of extremely rare failure of components, which is a reasonable
specificity, especially for aircraft maintenance acceptability.
Similarly, as seen in Fig. 14a, AE–BGRU predicted 7 out of 13, and in Fig. 14b AE–CNN–BGRU predicted 10 out of 13 unplanned replacements of the pressure bleed valve (FIN_4000HA). This prediction includes 10 flight legs in advance, and it can also be observed that the AE–CNN–BGRU model shows superior performance. A similar performance is observed for other components tested. The general result indicated that the proposed AE–CNN–BGRU model detected and predicted approximately 80% of extremely rare failures, which is a reasonable specificity, especially for aircraft maintenance.
5.2 Measuring the success rate of the proposed models using A320 aircraft
Figure 15 shows the ROC curve for the proposed models AE–BGRU and the AE–CNN–BGRU. The ROC curve for the AE–CNN–BGRU model (Fig. 15b) shows AUC = 0.864, which indicates that there is an 86.4% probability that the model will be able to distinguish between the positive class (component failure) and the negative class (non-failure). In contrast, Fig. 15a shows the ROC curve for the AE–BGRU model with AUC = 0.817, which indicates that the model has an 81.7% probability of distinguishing between classes. The result indicated that AE–CNN–BGRU has an 8% better classification performance compared to AE–BGRU.
Similarly, as seen in Fig. 16a, AE–BGRU predicted 4 out of 6, and in Fig. 16b AE–CNN–BGRU predicted 4 out of 6 unplanned replacements of the flow control valve (FIN_11HB). This prediction includes 10 flight legs in advance. A similar performance is observed for other components tested. The general result indicated that the proposed AE–CNN–BGRU model detected and predicted approximately 50% of extremely rare failures.
Although both models predicted 50% of the failures, it can be observed that the AE–CNN–BGRU model shows superior performance in terms of recall. A good recall
indicates that the model has good potential for correctly identifying true positives.
5.3 Sensitivity of AE–CNN–BGRU model to design parameters
Additional analysis was carried out to determine whether adding CNN layers to the AE–BGRU network could improve performance. After the implementation, the result indicated that there was a performance improvement. The AE–CNN–BGRU model performance improvement can be attributed to the following factors. First, in training on a time-series dataset, especially using BGRU or LSTM, such networks account for the sequential dependency in a situation where a correlation exists between the variables in the given dataset (a process known as autocorrelation); during training, a normal GRU/LSTM network would treat all the variables as independent, excluding any relationship that exists between both observed and latent variables, whereas CNN uses a process known as convolution when determining a relationship between available variables in the dataset [20]. For example, in convolutional learning, given two functions f and g, the convolution integral expresses how the shape of one function is modified by the other. Traditionally, CNNs were designed to process multi-dimensional data, such as in image classification, not to account for sequential dependencies as RNNs, LSTMs, or GRUs do [63]. Therefore, the key benefit of adding CNN layers for sequential learning is their ability to use a filter bank [64] to compute dilations between each cell, also referred to as "dilated convolution," which in turn allows the CNN layers to better understand the relationships between the different variables in the dataset, generating improved results.
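For reference, the convolution integral mentioned above is (f * g)(t) = ∫ f(τ) g(t − τ) dτ. The NumPy sketch below shows a discrete, one-dimensional dilated convolution in the cross-correlation form used by most deep learning libraries (the kernel is not flipped), y[i] = Σ_j w[j] · x[i + j·d]. The function name, the toy signal, and the kernel are assumptions for illustration and do not reproduce the layers used in the study.

```python
import numpy as np


def dilated_conv1d(x, kernel, dilation=1):
    """Valid-mode 1D dilated convolution: y[i] = sum_j kernel[j] * x[i + j * dilation]."""
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Receptive field covered by one kernel application.
    span = (len(kernel) - 1) * dilation + 1
    if len(x) < span:
        raise ValueError("input is shorter than the dilated kernel span")
    out_len = len(x) - span + 1
    # Take every `dilation`-th sample inside the receptive field.
    return np.array(
        [np.dot(kernel, x[i : i + span : dilation]) for i in range(out_len)]
    )


# Hypothetical signal and a simple difference kernel.
x = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]

print(dilated_conv1d(x, kernel, dilation=1))  # compares samples 2 steps apart
print(dilated_conv1d(x, kernel, dilation=2))  # compares samples 4 steps apart
```

With the same three weights, a larger dilation widens the receptive field from 3 to 5 samples, which is how dilated filters relate values that are further apart in the sequence without adding parameters.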
5.4 Sensitivity analysis of imbalanced ratio
A sensitivity analysis was carried out for the imbalanced ratio on the designed network architecture and the input data. As observed in Table 2, the six cases considered have different imbalanced ratios (4000KS = 0.0043, 4000HA = 0.0047, 438HC = 0.0044, 11HB = 0.0028, 10HQ = 0.0031, 1TX1 = 0.0064). The components differed not only in the imbalanced ratio but also in distributions and failure patterns. As seen in Fig. 17, it can be observed that the novel model (AE–CNN–BGRU) shows a significant reduction in the false-negative rate as compared to the others, indicating that it is robust to different conditions of the dataset. Also, it is observed that the imbalance ratio impacts the false-negative rate for the test components from the A330 aircraft family (4000KS, electronic control unit/electronic engine unit; 4000HA, pressure bleed valve; and 438HC, trim air valve). For example, 4000HA with the highest imbalance ratio of 0.0047 has a false-negative rate of about 0.000639 compared to 4000KS with the lowest imbalanced ratio and false-negative rate of 0.00011. The analysis for A320 (11HB, flow control valve; 10HQ, Avionics equipment ventilation computer; 1TX1, air traffic service unit) shows insignificant changes in the false-negative rate with respect to the imbalance ratio.
6 Conclusion and future work
This study presents a novel method for condensing a large number of logs of aircraft warning and failure messages recorded by the central maintenance system into a small number of the most significant and relevant logs. The reduced log is then used to create a model for aircraft predictive maintenance, with an emphasis on predicting extremely infrequent failures. The proposed model combines an auto-encoder with bidirectional gated recurrent networks, which work together to correctly link failure/warning signals to aircraft LRU removal while also assisting in the detection of abnormal patterns and trends. The auto-encoder is in charge of detecting unusual failures, while the BGRU networks (with CNN) are in charge of making predictions. The proposed technique is evaluated using real-world aircraft central maintenance system (ACMS) data. The evaluation results indicate that the AE–CNN–BGRU model can effectively handle irregular patterns and trends, mitigating the imbalanced classification problem. Comparing AE–CNN–BGRU with other similar deep learning methods, the proposed approach shows superior performance with 18% better precision, 5% better recall, and 10% better g-mean. The results also indicate the model's effectiveness in predicting component failure within a defined useful period, which aids in minimising operational disruption. By traversing the input data in a bidirectional manner (feedforward and feedback) while making the prediction, the AE–CNN–BGRU model networks can better capture the underlying temporal structure. For specific types of data, such as in text classification and text-to-words prediction in sequence-to-sequence learning, the performance advantage of AE–CNN–BGRU over the unidirectional GRU is reasonable. However, it was unclear whether employing a bidirectional strategy to train on imbalanced time-series data would increase model performance because there may not be enough definite temporal contexts and observable in-text sequence examples. Our findings reveal that AE–CNN–BGRU outperforms standard GRU in forecasting
uncommon failure in log-based aircraft ACMS datasets, answering this question.
In the future, other AE–CNN–BGRU architectures will be studied further by translating time-series data into graphical representations using recurrence plots. The generated images can be used to train CNN–BGRU to improve performance. Other aircraft data can also be imported into ACMS to improve model training.
Acknowledgements The authors would like to acknowledge the Integrated Vehicle Health Management Centre (IVHM), Cranfield University, and the first author would like to thank PTDF Nigeria for sponsoring his study.
Author contributions Maren designed, coordinated this research, and drafted the manuscript. Maren, Steve, and Zakwan carried out experiments and data analysis. Ian and Steve proofread and participated in research coordination. The authors read and approved the final manuscript.
Funding This study is funded by IVHM Centre, Cranfield University, UK, in relation to PTDF, Nigeria.
Availability of data and materials All software used to support the conclusions of this article is publicly available. The dataset used is confidential. It will be available on request.
Declarations
Conflict of interest The authors declare that they have no competing interests. The authors have no relevant financial or non-financial interests to disclose.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
1. Dangut MD, Skaf Z, Jennions IK (2020) Rare failure prediction using an integrated auto-encoder and bidirectional gated recurrent unit network. IFAC-PapersOnLine 53:276–282. https://doi.org/10.1016/j.ifacol.2020.11.045
2. Kingsley-Jones M (2017) Airbus sees big data delivering "zero-AOG" goal within 10 years. Flightglobal
3. Wang Y (2018) Strategies for aircraft using model-based prognostics
4. Buijs YJ (2018) Integration of smart maintenance and spare part logistics for healthcare systems
5. Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Prog Artif Intell 5:221–232. https://doi.org/10.1007/s13748-016-0094-0
6. Dangut MD, Skaf Z, Jennions I (2020) Aircraft predictive maintenance modeling using a hybrid imbalance learning approach. SSRN Electron J. https://doi.org/10.2139/ssrn.3718065
7. Raghuwanshi BS, Shukla S (2018) UnderBagging based reduced Kernelised weighted extreme learning machine for class imbalance learning. Eng Appl Artif Intell 74:252–270. https://doi.org/10.1016/j.engappai.2018.07.002
8. Wu Z, Lin W, Ji Y (2018) An integrated ensemble learning model for imbalanced fault diagnostics and prognostics. IEEE Access 6:8394–8402. https://doi.org/10.1109/ACCESS.2018.2807121
9. Zhang Y, Li X, Gao L, Wang L, Wen L, Lee DH et al (2020) Deep learning for smart manufacturing: methods and applications. J Manuf Syst 56:1–13. https://doi.org/10.1016/j.jmsy.2018.01.003
10. Blancke O, Combette A, Amyot N, Komljenovic D, Lévesque M, Hudon C et al (2018) A predictive maintenance approach for complex equipment based on petri net failure mechanism propagation model. Proc Eur Conf PHM Soc 4:1–12
11. Blancke O, Komljenovic D, Tahan A, Combette A, Amyot N, Lévesque M et al (2018) A predictive maintenance approach for complex equipment based on petri net failure mechanism propagation model. In: Proc Eur Conf PHM Soc p. 1
12. Aivaliotis P, Georgoulias K, Arkouli Z, Makris S (2019) Methodology for enabling digital twin using advanced physics-based modelling in predictive maintenance. Procedia CIRP 81:417–422. https://doi.org/10.1016/j.procir.2019.03.072
13. Parris CJ (2016) The future for industrial services - the digital twin. Infosys Insights pp. 42–9
14. Okoh C, Roy R, Mehnen J (2017) Predictive maintenance modelling for through-life engineering services. Procedia CIRP 59:196–201. https://doi.org/10.1016/j.procir.2016.09.033
15. Phillips P, Diston D (2011) A knowledge driven approach to aerospace condition monitoring. Knowledge-Based Syst 24:915–927. https://doi.org/10.1016/j.knosys.2011.04.008
16. Ferri FAS, Rodrigues LR, Gomes JPP, De Medeiros IP, Galvao RKH, Nascimento CL (2013) Combining PHM information and system architecture to support aircraft maintenance planning. In: SysCon 2013 - 7th Annu IEEE Int Syst Conf Proc pp. 60–5. https://doi.org/10.1109/SysCon.2013.6549859
17. Berberidis C, Angelis L, Vlahavas I (2004) Inter-transaction association rules mining for rare events prediction. In: Proc 3rd Hell Conf
18. Wu Z, Guo Y, Lin W, Yu S, Ji Y (2018) A weighted deep representation learning model for imbalanced fault diagnosis in cyber-physical systems. Sensors (Switzerland). https://doi.org/10.3390/s18041096
19. Nguyen KTP, Medjaher K (2019) A new dynamic predictive maintenance framework using deep learning for failure prognostics. Reliab Eng Syst Saf 188:251–262. https://doi.org/10.1016/j.ress.2019.03.018
20. Johnson JM, Khoshgoftaar TM (2019) Survey on deep learning with class imbalance. J Big Data. https://doi.org/10.1186/s40537-019-0192-5
21. Pouyanfar S, Tao Y, Mohan A, Tian H, Kaseb AS, Gauen K et al (2018) Dynamic sampling in convolutional neural networks for imbalanced data classification. In: Proc - IEEE 1st Conf Multimed Inf Process Retrieval, MIPR 2018, pp. 112–7. https://doi.org/10.1109/MIPR.2018.00027
22. Lee H, Park M, Kim J (2016) Plankton classification on imbalanced large scale database via convolutional neural networks with transfer learning. In: Proc - Int Conf Image Process ICIP 2016; -Augus, pp. 3713–7. https://doi.org/10.1109/ICIP.2016.7533053
23. Wang S, Liu W, Wu J, Cao L, Meng Q, Kennedy PJ (2016) Training deep neural networks on imbalanced data sets. In: Proc Int Jt Conf Neural Networks 2016-Octob, pp. 4368–74. https://doi.org/10.1109/IJCNN.2016.7727770
24. Buda M, Maki A, Mazurowski MA (2018) A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw 106:249–259. https://doi.org/10.1016/j.neunet.2018.07.011
25. Song J, Shen Y, Jing Y, Song M (2017) Towards deeper insights into deep learning from imbalanced data 2: 674–84. https://doi.org/10.1007/978-981-10-7299-4_56
26. Hensman P, Masko D (2015) The impact of imbalanced training data for convolutional neural networks. PhD
27. Pouyanfar S, Tao Y, Mohan A, Tian H, Kaseb AS, Gauen K et al (2018) Dynamic sampling in convolutional neural networks for imbalanced data classification. In: Proc - IEEE 1st Conf Multimed Inf Process Retrieval, MIPR 2018, pp. 112–7. https://doi.org/10.1109/MIPR.2018.00027
28. Khan SH, Hayat M, Bennamoun M, Sohel FA, Togneri R (2018) Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Trans Neural Netw Learn Syst 29:3573–3587. https://doi.org/10.1109/TNNLS.2017.2732482
29. Zhang C, Tan KC, Ren R (2016) Training cost-sensitive Deep Belief Networks on imbalance data problems. In: Proc Int Jt Conf Neural Networks, vol. 2016-Octob, pp. 4362–7. https://doi.org/10.1109/IJCNN.2016.7727769
30. Wang H, Cui Z, Chen Y, Avidan M, Ben AA, Kronzer A (2018) Predicting hospital readmission via cost-sensitive deep learning. IEEE/ACM Trans Comput Biol Bioinf. https://doi.org/10.1109/TCBB.2018.2827029
31. Lin TY, Goyal P, Girshick R, He K, Dollar P (2017) Focal loss for dense object detection. Proc IEEE Int Conf Comput Vis. https://doi.org/10.1109/ICCV.2017.324
32. Nemoto K, Hamaguchi R, Imaizumi T, Hikosaka S (2018) Classification of rare building change using CNN with multi-class focal loss. Satellite Business Division, PASCO Corporation (Japan), pp. 4667–70
33. Salfner F, Lenk M, Malek M (2010) A survey of online failure prediction methods. ACM Comput Surv. https://doi.org/10.1145/1670679.1670680
34. Zhang K, Xu J, Min MR, Jiang G, Pelechrinis K, Zhang H (2016) Automated IT system failure prediction: a deep learning approach. In: Proc - 2016 IEEE Int Conf Big Data, Big Data 2016, pp. 1291–300. https://doi.org/10.1109/BigData.2016.7840733
35. Korvesis P, Besseau S, Vazirgiannis M (2018) Predictive maintenance in aviation: Failure prediction from post-flight reports. In: Proc - IEEE 34th Int Conf Data Eng ICDE 2018, pp. 1423–34. https://doi.org/10.1109/ICDE.2018.00160
36. Sipos R, Wang Z, Moerchen F (2014) Log-based predictive maintenance, pp. 1867–76
37. Kallas M, Mourot G, Anani K, Ragot J, Maquin D (2017) Fault detection and estimation using kernel principal component analysis. IFAC-PapersOnLine 50:1025–1030. https://doi.org/10.1016/j.ifacol.2017.08.212
38. Lee J-M, Qin SJ, Lee I-B (2008) Fault detection of non-linear
network. Sensors (Switzerland). https://doi.org/10.3390/s18051429
41. Park P, Di Marco P, Shin H, Bang J (2019) Fault detection and diagnosis using combined autoencoder and long short-term memory network. Sensors (Switzerland) 19:1–17. https://doi.org/10.3390/s19214612
42. Liu R, Yang B, Zio E, Chen X (2018) Artificial intelligence for fault diagnosis of rotating machinery: a review. Mech Syst Signal Process 108:33–47. https://doi.org/10.1016/j.ymssp.2018.02.016
43. Dangut MD, Skaf Z, Jennions IK (2020) An integrated machine learning model for aircraft components rare failure prognostics with log-based dataset. ISA Trans 113:127–139. https://doi.org/10.1016/j.isatra.2020.05.001
44. Burnaev E (2019) Rare failure prediction via event matching for aerospace applications. In: 2019 3rd Int Conf Circuits, Syst Simulation, ICCSS 2019, pp. 214–20. https://doi.org/10.1109/CIRSYSSIM.2019.8935598
45. Che C, Wang H, Fu Q, Ni X (2019) Combining multiple deep learning algorithms for prognostic and health management of aircraft. Aerosp Sci Technol 94:105423. https://doi.org/10.1016/j.ast.2019.105423
46. Burnaev E (2019) Rare failure prediction via event matching for aerospace applications
47. Baldi P (2012) Autoencoders, unsupervised learning, and deep architectures. ICML Unsupervised Transf Learn. https://doi.org/10.1561/2200000006
48. Le QV (2015) A Tutorial on Deep Learning Part 2: Autoencoders, Convolutional Neural Networks and Recurrent Neural Networks. Tutorial, pp. 1–20
49. Farzad A, Gulliver TA (2019) Log message anomaly detection and classification using auto-B/LSTM and auto-GRU, pp. 1–28
50. Konar A (1999) Artificial intelligence and soft computing. https://doi.org/10.1201/9781420049138
51. Savoy J, Gaussier E (2010) Information retrieval. https://doi.org/10.4324/9781351044677-24
52. Livieris IE, Pintelas E, Pintelas P (2020) A CNN–LSTM model for gold price time-series forecasting. Neural Comput Appl 32:17351–17360. https://doi.org/10.1007/s00521-020-04867-x
53. Debayle J, Hatami N, Gavet Y (2018) Classification of time-series images using deep convolutional neural networks, 23. https://doi.org/10.1117/12.2309486
54. Jafari G, Shirazi AH, Namaki A, Raei R (2011) Coupled time series analysis: Methods and applications. vol. 13. https://doi.org/10.1109/MCSE.2011.102
55. Lu W, Li J, Wang J, Qin L (2020) A CNN-BiLSTM-AM method for stock price prediction. Neural Comput Appl 33:4741–4753. https://doi.org/10.1007/s00521-020-05532-z
56. Zhao B, Lu H, Chen S, Liu J, Wu D (2017) Convolutional neural networks for time series classification. J Syst Eng Electron 28:162–9. https://doi.org/10.21629/JSEE.2017.01.18
57. Ouhame S, Hadi Y, Ullah A (2021) An efficient forecasting approach for resource utilisation in cloud data center using CNN-LSTM model. Neural Comput Appl 33:10043–10055. https://doi.org/10.1007/s00521-021-05770-9
58. Munna MTA, Alam MM, Allayear SM, Sarker K, Ara SJF (2020) Prediction model for prevalence of type-2 diabetes complications with ANN approach combining with K-fold cross validation and
processes using kernel independent component analysis. Can J K-means clustering, vol 69. Springer, Berlin
Chem Eng 85:526–536. https://doi.org/10.1002/cjce.5450850414 59. Applications C. Mathematical and computational applications,
39. Ismail Fawaz H, Forestier G, Weber J, Idoumghar L, Muller PA 2011;16:702–11.
(2019) Deep learning for time series classification: a review. Data 60. Jiang P, Chen J (2016) Displacement prediction of landslide
Min Knowl Discov 33:917–963. https://doi.org/10.1007/s10618- based on generalised regression neural networks with K-fold
019-00619-1 cross-validation. Neurocomputing 198:40–47. https://doi.org/10.
40. Guo S, Yang T, Gao W, Zhang C (2018) A novel fault diagnosis 1016/j.neucom.2015.08.118
method for rotating machinery based on a convolutional neural 61. David Dangut M, Skaf Z, Jennions I. (2020) Rescaled-LSTM for
predicting aircraft component replacement under imbalanced
dataset constraint. In: 2020 Adv. Sci. Eng. Technol. Int. Conf. ASET 2020. https://doi.org/10.1109/ASET48392.2020.9118253
62. Kamath U, Liu J, Whitaker J (2019) Deep Learning for NLP and Speech Recognition. https://doi.org/10.1007/978-3-030-14596-5
63. Lecun Y, Bottou L, Bengio Y, Ha P (1998) LeNet. Proc IEEE, pp. 1–46
64. Lecun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. https://doi.org/10.1038/nature14539
65. Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35:1798–1828. https://doi.org/10.1109/TPAMI.2013.50
66. David Dangut M, Skaf Z, Jennions I (2020) Rescaled-LSTM for predicting aircraft component replacement under imbalanced dataset constraint. In: 2020 Adv. Sci. Eng. Technol. Int. Conf., IEEE; pp. 1–9. https://doi.org/10.1109/ASET48392.2020.9118253
67. Roc B (2021) Comparing two ROC curves – independent groups design. NCSS, LLC, pp. 1–26

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.