A Comparison of Deep Learning Architectures For Spacecraft Anomaly Detection
terest has been seen in leveraging these sophisticated algorithms for anomaly detection in space operations. Our study aims to compare the efficacy of various deep learning architectures in detecting anomalies in spacecraft data. The deep learning models under investigation include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Transformer-based architectures. Each of these models was trained and validated using a comprehensive dataset sourced from multiple spacecraft missions, encompassing diverse operational scenarios and anomaly types. We also present a novel approach to the rapid assignment of spacecraft telemetry data sets to discrete clusters, based on the statistical characteristics of the signal. This clustering allows us to compare different deep learning architectures to different types of data signal behaviour. Initial results indicate that while CNNs excel in identifying spatial patterns and may be effective for some classes of spacecraft data, LSTMs and RNNs show a marked proficiency in capturing temporal anomalies seen in time-series spacecraft telemetry. The Transformer-based architectures, given their ability to focus on both local and global contexts, have showcased promising results, especially in scenarios where anomalies are subtle and span over longer durations. Additionally, considerations such as computational efficiency, ease of deployment, and real-time processing capabilities were evaluated. While CNNs and LSTMs demonstrated a balance between accuracy and computational demands, Transformer architectures, though highly accurate, require significant computational resources. In conclusion, the choice of deep learning architecture for spacecraft anomaly detection is highly contingent on the nature of the data, the type of anomalies, and operational constraints. This comparative study provides a foundation for space agencies and researchers to make informed decisions in the integration of deep learning techniques for ensuring spacecraft safety and reliability.

TABLE OF CONTENTS
1. INTRODUCTION
2. RELATED WORK
3. EXPERIMENTAL SETUP
4. EXPERIMENTS AND RESULTS
5. CONCLUSION AND FUTURE WORK
REFERENCES
BIOGRAPHY

©2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

missions. As mankind expands its presence in outer space, the importance of precise and dependable data from spacecraft systems has become of utmost significance. Time series data, which refers to a sequential arrangement of data points organised in chronological order, holds major significance in the domain of spacecraft telemetry. Spacecraft systems are reflected by telemetry data, which provides information on their state, health, and performance. This data allows for the analysis of both regular and potentially abnormal operations [1].

Anomalies observed in spacecraft telemetry data are unanticipated occurrences that pose potential risks, as they depart significantly from the predicted operational patterns of the system. The quick detection and identification of these abnormalities is of paramount significance in order to avert catastrophic failures, limit risks, and guarantee the durability of space missions. According to [2], the prompt identification and effective detection of these anomalies by operational engineers play a crucial role in enhancing efficiency, minimising expenses, and enhancing safety. As the complexity of spacecraft continues to advance, there is a corresponding growth in the variety of telemetry parameters associated with them. Identifying anomalies with conventional manual or simple “out-of-limits” techniques is becoming ever more difficult [3].

In recent years, there has been considerable focus on the advancement of anomaly detection techniques for satellite telemetry data. Numerous advanced algorithms and strategies have been proposed by prominent organisations such as NASA [4], ESA [3], and CNES [5] to tackle this task. Every approach possesses its own set of advantages and disadvantages. There is a clear trend towards deep learning approaches over statistical methods due to their ability to synthesise the complex multivariate temporally-connected data inherent to spacecraft telemetry [6]. The objective of this paper is to investigate and assess different methodologies for anomaly identification in order to determine the most effective and efficient approach for analysing spacecraft telemetry.

Our work makes several notable contributions to the domain of spacecraft anomaly detection, presenting advancements that enhance the understanding of deep learning in this field. Firstly, it presents a comprehensive side-by-side comparison of multiple deep learning model architectures, shedding light on their effectiveness in detecting anomalies in spacecraft telemetry. This comparison is distinctively valuable as it incorporates models that, to our knowledge, have not been previously applied to spacecraft anomalies, thereby opening new avenues for exploration and implementation. Secondly, we introduce an innovative unsupervised
mechanism to cluster spacecraft telemetry into like-types, using statistical methods, which allows for a more granular and nuanced understanding of telemetry data. Thirdly, our study unveils insights into the comparative performance of different deep learning models across the identified clusters, providing insights for selecting the most suitable model based on the specific type of telemetry data. These diverse contributions collectively elevate the current state of research in spacecraft anomaly detection, offering robust and refined tools and methodologies for practical applications and future explorations.

[Figure: “Anomaly Detection” plot of Value against Time, showing Actual Data, Forecast, Threshold, Point Anomaly and Collective Anomaly.]

Nomenclature
The following terms are used in this work.
Following the approach taken in [4], one model is trained per telemetry channel. Our study compares thirteen different architectures, leading to 82 × 13 = 1,066 trained models overall.

The models were trained utilizing the “fit one cycle” method [53], a technique noted for its efficacy in training deep learning models efficiently and reliably. The experiment endeavoured to keep the setup fair and comparable; thus, hyperparameter tuning was predominantly confined to ensuring that the RNN-based architectures possessed at least equivalent depth to the default LSTM implemented in Telemanom. Apart from this modification, we retained the default hyperparameters provided by the tsai and fastai models to maintain the integrity of the comparative analysis, on the basis that the defaults are sensible [54]. Furthermore, due to the large number of trained models, hyperparameter tuning was infeasible in any case.

Early experience during model training showed that model performance was very sensitive to the learning rate. In order to mitigate this sensitivity, we applied the learning rate reduction scheme ReduceLROnPlateau, provided by the fastai framework [52], to each model. The callback reduces the learning rate on each epoch if the training loss metrics are unchanging between consecutive epochs. This has given good results in studies such as [55] but at the cost of longer training times.

The computational environment for the experiments was provisioned on a virtual machine, equipped with 8 CPU cores (Intel Xeon Platinum 8260 CPU @ 2.40GHz) and 16GB of RAM. No hardware acceleration or GPUs were available.

Due to commercial, legal and security considerations, there are very few well-labelled spacecraft anomaly datasets available to the public. The “SMAP/MSL” dataset provided by [4] is a dataset used in other studies into autonomous detection of spacecraft anomalies (e.g. [9], an LSTM-based study, and [56], a CNN-based approach). This dataset consists of curated telemetry streams from NASA’s Soil Moisture Active Passive (SMAP) [13] and Mars Science Laboratory “Curiosity rover” (MSL) [14] missions. We selected this as the dataset for our study because it offers a good baseline against which to compare our results.

The data in [4] has been scaled to the range (-1, 1) and anonymised. “Model input data also includes one-hot encoded information about commands that were sent or received by specific spacecraft modules in a given time window.” [4]. This results in a collection of 82 multivariate data sets, with around 100 labelled anomalies in total across all data sets, as detailed in Table 2. Each telemetry channel is a multivariate time series of one target parameter and additional parameters to be used as contextual information. The target parameter is the time series to be forecast, in which anomalies are to be detected.

The data was pre-split by [4] into “train” sets of anomaly-free data, which establish the nominal conditions, and “test” sets, one per telemetry channel, which contain the labelled anomalies. We used the same split as in the original study in order to have comparable results.

Data Clustering
Initial inspection of the telemetry channels showed that different telemetry channels had varying general characteristics such as “spiky” or “flat”. We wanted to investigate the link between the characteristics of the telemetry channels and the best performing deep learning model architecture, and whether specific architectures work better for certain types of data. Manual classification is not feasible due to the number of telemetry channels, so our idea was to use an unsupervised clustering approach.

Each cluster represents a particular set of characteristics. The method used the standard central moments (mean, standard deviation, skewness and kurtosis) calculated for the target parameter of each telemetry channel using SciPy [57]. NaN (Not a Number, used to signify an arithmetic error) values are set to 0. Each telemetry channel was therefore represented by a single four-dimensional vector. We applied K-Means clustering [58] to these four-dimensional vectors, as illustrated in Figure 2 and further elaborated in Listing 1.

[Figure 2: time series input data (Parameter 1, Parameter 2, Parameter 3) -> compute statistics -> one vector per parameter ([mean1, std1, skew1, kurt1], [mean2, std2, skew2, kurt2], [mean3, std3, skew3, kurt3]) -> K-Means.]

The handling of NaN values is required for the statistics skewness and kurtosis because some telemetry channels contain parameter values with no variance (“flat”, in Table 5); these are forced to zeros. Skewness measures the asymmetry of the probability distribution. For a constant data series, skewness is not defined, as skewness presupposes that there is variance in the data. Kurtosis measures the “tailedness” of the distribution. For a constant series, like skewness, kurtosis
is also not defined because kurtosis measures the outliers and a constant series has none. Mean and standard deviation are defined in case of constant value so do not need to be treated for NaNs. Skewness and Kurtosis are calculable for non-flat telemetry channels and give a better summary of the data than mean and standard deviation alone.

Our clustering focuses on the training data set, without anomalies, so as to identify what the “normal” behaviour of the parameter is, as summarised by the shape of the curve. In spacecraft operations this is the more likely scenario as often data channels have yet to experience an anomaly [9], [5].

[Figure 3 plot: “The Elbow Method showing the optimal k”; Distortion (y-axis) against k (x-axis).]
Figure 3. Elbow Method used to Determine Optimum Number of Clusters

    For each telemetry channel i:
        Extract target parameter p_i from i
        Calculate central moments of p_i:
            [Mean, standard deviation, skew, kurtosis] => vector_i
        Set any value (vector_ij = NaN) => 0
        Add vector_i to list
    Apply K-Means to list => n clusters
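The clustering steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the study: the paper computes the moments with SciPy [57] and applies K-Means [58], whereas the version below uses the standard library only so that it runs without extra dependencies. The function names (`moment_vector`, `k_means`, `distortion`), the flatness tolerance, the use of population moments with excess kurtosis, and the toy channels are all assumptions made for the example.

```python
import math
import random
from typing import List, Sequence, Tuple

FLAT_TOLERANCE = 1e-12  # assumed cut-off: variance below this counts as "flat"


def moment_vector(series: Sequence[float]) -> List[float]:
    """Return [mean, std, skewness, excess kurtosis] for one target parameter."""
    n = len(series)
    mean = math.fsum(series) / n
    m2 = math.fsum((x - mean) ** 2 for x in series) / n  # population variance
    if m2 < FLAT_TOLERANCE:
        # Constant ("flat") series: skewness and kurtosis are undefined (NaN),
        # so they are set to 0, mirroring the NaN handling described in the text.
        return [mean, 0.0, 0.0, 0.0]
    m3 = math.fsum((x - mean) ** 3 for x in series) / n
    m4 = math.fsum((x - mean) ** 4 for x in series) / n
    return [mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0]


def k_means(points: Sequence[Sequence[float]], k: int,
            iterations: int = 100, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's algorithm; returns (cluster label per point, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [math.fsum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


def distortion(points, labels, centroids) -> float:
    """Sum of squared distances to the assigned centroid (the elbow-plot quantity)."""
    return math.fsum(math.dist(p, centroids[lab]) ** 2
                     for p, lab in zip(points, labels))


# Hypothetical toy channels: two flat and two spiky target parameters.
channels = {
    "flat_a": [0.5] * 50,
    "flat_b": [-0.2] * 50,
    "spiky_a": [0.0] * 48 + [1.0, 1.0],
    "spiky_b": [0.0] * 24 + [0.8] + [0.0] * 24 + [0.8],
}
vectors = [moment_vector(series) for series in channels.values()]
labels, centroids = k_means(vectors, k=2)
for name, label in zip(channels, labels):
    print(name, "-> cluster", label)
```

Evaluating `distortion` for a range of `k` values and looking for the point where the curve flattens gives the kind of elbow plot shown in Figure 3. In practice, library routines such as those in SciPy would normally replace the hand-written moment calculations.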
total training time and the average training time per channel. True positive (TP), False positive (FP) and False negative (FN) values are also given per anomaly.

It is expected that “F1 time point” will not be very high, as the nature of the threshold-based anomaly detector means that data points either side of a labelled anomaly may not be detected as anomalous themselves, even though a domain expert would label them as such. Nevertheless, it gives an indication of the overall model performance when determining if any given data point is anomalous. This is illustrated in Figure 6, whereby a predicted anomaly and actual anomaly may share few actual data points yet nevertheless be considered a successful detection of an anomaly. That is, any overlap of predicted anomaly and actual anomaly is considered a detection, no matter how small (how few data points are correctly labelled). This is the metric used in [4] and we retain it to allow direct comparison of results between their study and ours.

The F1 score pertaining to the detected anomalies (“F1 anomaly”) is more significant in terms of perceived anomaly detection performance by the spacecraft operator [4], [9].

Model Performance
The best performing model architecture in our study is the CNN-based XceptionTimePlus implementation, with an F1 anomaly score of 69.9%. This is lower than the tuned results from the Telemanom study (Table 1) but is more than 6 percentage points better than the worst performing model here, FCNPlus. It is noteworthy that the best and worst performing models are both of the CNN architecture family (XceptionTimePlus 69.9%, FCNPlus 63.2%). This suggests that there is no intrinsic advantage of CNN-based models in general.

The hybrid TransformerLSTMPlus (69.6%) and RNN-based GRUPlus (69.1%) show similar performance although with vastly different training times.

Given the overall good performance of XceptionTimePlus (69.9%) and the relatively low training time, this architecture would be our recommendation for a general purpose anomaly detector, as an initial investigation, before extensive effort is applied to tuning.
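To make the difference between the “F1 anomaly” and “F1 time point” scores concrete, the following Python sketch counts TP, FP and FN both per anomaly (any overlap counts as a detection) and per data point. It is a minimal illustration under stated assumptions rather than the evaluation code of this study or of [4]: anomalies are assumed to be given as inclusive (start, end) index pairs, a predicted interval that touches no labelled anomaly is assumed to count as one false positive, and all function names and the example intervals are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) sample indices (assumed format)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two inclusive intervals share at least one data point."""
    return a[0] <= b[1] and b[0] <= a[1]


def anomaly_counts(predicted: List[Interval],
                   labelled: List[Interval]) -> Tuple[int, int, int]:
    """Per-anomaly (TP, FP, FN): any overlap with a prediction is a detection."""
    tp = sum(any(overlaps(truth, pred) for pred in predicted) for truth in labelled)
    fn = len(labelled) - tp
    fp = sum(not any(overlaps(pred, truth) for truth in labelled)
             for pred in predicted)
    return tp, fp, fn


def point_counts(predicted: List[Interval],
                 labelled: List[Interval]) -> Tuple[int, int, int]:
    """Per-time-point (TP, FP, FN): every individual data point is scored."""
    pred_points = {i for start, end in predicted for i in range(start, end + 1)}
    true_points = {i for start, end in labelled for i in range(start, end + 1)}
    tp = len(pred_points & true_points)
    return tp, len(pred_points) - tp, len(true_points) - tp


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing is detected."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one prediction clips the edge of the first labelled
# anomaly, the second labelled anomaly is missed, and one prediction is spurious.
labelled = [(10, 19), (50, 59)]
predicted = [(18, 25), (80, 85)]

print("F1 anomaly:   ", f1_score(*anomaly_counts(predicted, labelled)))
print("F1 time point:", f1_score(*point_counts(predicted, labelled)))
```

In this example the first labelled anomaly shares only two data points with a prediction, yet it counts as a full detection per anomaly, while the per-time-point score is penalised for every unmatched point. This is why the time-point score is expected to be the lower of the two.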
[Figure 6, panel A: “Unaware of the temporal adjacency”. This prediction, close but disjoint to the ground truth event, should be rewarded. Panel B: “Unaware of the event duration”. Each of those two events should have equal importance irrespective of its duration. Rows: Prediction, Truth; horizontal axis: Time; legend: True Positive (TP), False Negative (FN), False Positive (FP).]
Figure 6. Anomaly detection metrics, per anomaly versus per time point
15% (absolute).
Table 3. Results per Architecture
5. CONCLUSION AND FUTURE WORK
In conclusion, the insights derived from our study have shown innovative advancements in spacecraft anomaly detection, laying a robust foundation for future explorations and discoveries in this domain.

Conclusion
In this work, we performed a comparative study of diverse deep learning model architectures, with the goal of assessing their efficacy in spacecraft anomaly detection. Our findings revealed that the XceptionTimePlus model (69.9%) exhibited the best performance among all the models assessed in the study, across all telemetry channels. However, it is important to note that the overall performance was not on par with the outcomes demonstrated in [4]. A contributing factor to this is the conscious decision to refrain from hyperparameter optimisation in order to preserve the default settings and allow direct relative comparisons. Nevertheless, our study provides valuable insights into which families of deep learning architecture perform well, and which do not.

Furthermore, due to constraints in computational resources it was not possible to follow the standard optimisation strategies such as grid search, which runs many iterations of the model to explore the hyperparameter space. With some models taking more than a day to run once (e.g. LSTMAttentionPlus at 1 day and 5 hours), it is infeasible to run the large number of iterations required.

In addition to this, our research illuminated that different deep learning model architectures exhibit varying degrees of proficiency depending on the nature of the data, be it “spiky”, “flat”, “complex”, “oscillating”, or “binary”. We introduced an innovative clustering methodology in this paper, facilitating the efficient allocation of spacecraft telemetry channels into distinct clusters contingent on the inherent statistical properties of the data, based on the shape of the curve. This novel approach has not only advanced our understanding but has also paved the way for the advent of more sophisticated ensemble models, based on individual models that are harmoniously optimized for disparate data types. This ensemble approach was able to exceed the performance of the baseline study (84.7% vs 83.6%), despite using unoptimised models.

Future Work
The work in our study has suggested new possibilities and directions for future research. A natural extension of this work would be the exploration of ensemble models that are proficiently optimised to accommodate various data types, leveraging the clustering methodology introduced in this paper. Furthermore, a meticulous exploration of hyperparameter space will be pivotal to harness the maximal potential of the models, thereby advancing the state-of-the-art in spacecraft anomaly detection.

As described above, the individual models were not individually optimised, but rather used the defaults from the respective frameworks fastai and tsai. The success of the clustering approach suggests an alternative to the one-model-for-all approach seen in other studies ([9], [4], [3]): that of creating a set of optimised
hyperparameters per data type (spiky, binary, etc).

Additionally, current anomaly detection approaches (e.g. [3], [4], [5], [9]) rely predominantly on forecasting models to deduce nominal behavior, identifying anomalies through a comparative analysis of predictions against predetermined thresholds. A promising avenue for future research would be the application of deep learning classification techniques, which could potentially offer a direct assessment of the telemetry channels without relying on thresholds.

REFERENCES
[1] A. Zacchei, S. Fogliani, M. Maris, L. Popa, N. Lama, M. Türler, R. Rohlfs, N. Morisset, M. Malaspina, and F. Pasian, “Housekeeping and science telemetry: the case of Planck/LFI,” Memorie della Supplementi, pp. 331–334, 2003.
[2] S. Guan, B. Zhao, Z. Dong, M. Gao, and Z. He, “GTAD: graph and temporal neural network for multivariate time series anomaly detection,” Entropy, vol. 24, no. 6, p. 759, 2022. [Online]. Available: https://www.mdpi.com/1099-4300/24/6/759
[3] J. M. Heras and A. Donati, “Enhanced telemetry monitoring with novelty detection,” AI Magazine, vol. 35, no. 4, pp. 37–46, 2014.
[4] K. Hundman, V. Constantinou, C. Laporte, I. Colwell, and T. Soderstrom, “Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding,” arXiv, 2018.
[5] B. Pilastre, L. Boussouf, S. D’Escrivan, and J.-Y. Tourneret, “Anomaly detection in mixed telemetry data using a sparse representation and dictionary learning,” Signal Processing, vol. 168, p. 107320, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0165168419303731
[6] S. Schmidl, P. Wenig, and T. Papenbrock, “Anomaly detection in time series: A comprehensive evaluation,” Proc. VLDB Endow., vol. 15, no. 9, pp. 1779–1797, May 2022. [Online]. Available: https://doi.org/10.14778/3538598.3538602
[7] R. C. J. Chapman, G. Critchlow, and H. Mann, Command and Telemetry Systems. NASA, 1963.
[8] M. Omran, A. Engelbrecht, and A. Salman, “An overview of clustering methods,” Intell. Data Anal., vol. 11, pp. 583–605, 11 2007.
[9] S. Baireddy, S. R. Desai, J. L. Mathieson, R. H. Foster, M. W. Chan, M. L. Comer, and E. J. Delp, “Spacecraft time-series anomaly detection using transfer learning,” in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2021, pp. 1951–1960.
[10] T. Yairi, T. Oda, Y. Nakajima, N. Miura, and N. Takata, “Evaluation testing of learning-based telemetry monitoring and anomaly detection system in SDS-4 operation,” in Proceedings of the International Symposium on Artificial Intelligence, Robotics and Automation in Space (i-SAIRAS), 2014.
[11] P. Fortescue, G. Swinerd, and J. Stark, Spacecraft Systems Engineering, 4th ed. Nashville, TN: John Wiley & Sons, 2011.
[12] R. R. Lutz and I. C. Mikulski, “Empirical analysis of safety-critical anomalies during operations,” IEEE Transactions on Software Engineering, vol. 30, no. 3, pp. 172–180, 2004.
[13] P. O’Neill, D. Entekhabi, E. Njoku, and K. Kellogg, “The NASA Soil Moisture Active Passive (SMAP) mission: Overview,” in 2010 IEEE International Geoscience and Remote Sensing Symposium, 2010, pp. 3236–3239.
[14] A. R. Vasavada, “Mission overview and scientific contributions from the Mars Science Laboratory Curiosity rover after eight years of surface operations,” Space Science Reviews, vol. 218, no. 3, Apr. 2022. [Online]. Available: https://doi.org/10.1007/s11214-022-00882-7
[15] J. He, Z. Cheng, and B. Guo, “Anomaly detection in satellite telemetry data using a sparse feature-based method,” Sensors, vol. 22, no. 17, 2022. [Online]. Available: https://www.mdpi.com/1424-8220/22/17/6358
[16] K. Chakraborty, K. Mehrotra, C. K. Mohan, and S. Ranka, “Forecasting the behavior of multivariate time series using neural networks,” Neural Networks, vol. 5, no. 6, pp. 961–970, 1992.
[17] D. Walther, J. Viehweg, J. Haueisen, and P. Mäder, “A systematic comparison of deep learning methods for EEG time series analysis,” Frontiers in Neuroinformatics, vol. 17, 2023. [Online]. Available: https://www.frontiersin.org/articles/10.3389/fninf.2023.1067095
[18] C. S. Han and K. M. Lee, “Hybrid deep learning model for time series anomaly detection,” in RACS ’23: Proceedings of the 2023 International Conference on Research in Adaptive and Convergent Systems, ser. RACS ’23. New York, NY, USA: Association for Computing Machinery, 2023. [Online]. Available: https://doi.org/10.1145/3599957.3606232
[19] J. Wang, Z. Wang, J. Li, and J. Wu, “Multilevel wavelet decomposition network for interpretable time series analysis,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ser. KDD ’18. New York, NY, USA: Association for Computing Machinery, 2018, pp. 2437–2446. [Online]. Available: https://doi.org/10.1145/3219819.3220060
[20] G. Michau, G. Frusque, and O. Fink, “Fully learnable deep wavelet transform for unsupervised monitoring of high-frequency time series,” Proceedings of the National Academy of Sciences, vol. 119, no. 8, Feb. 2022. [Online]. Available: https://doi.org/10.1073/pnas.2106598119
[21] S. Ye and F. Zhang, “Unsupervised anomaly detection for multilevel converters based on wavelet transform and variational autoencoders,” in 2022 IEEE Energy Conversion Congress and Exposition (ECCE), 2022, pp. 1–6.
[22] H. Liu, Z. Dai, D. R. So, and Q. V. Le, “Pay attention to MLPs,” 2021.
[23] G. Raman MR, N. Somu, and A. Mathur, “A multilayer perceptron model for anomaly detection in water treatment plants,” International Journal of Critical Infrastructure Protection, vol. 31, p. 100393, 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1874548220300573
[24] E. Hedström and P. Wang, “Anomaly detection using a deep learning multi-layer perceptron to mitigate the risk of rogue trading,” Ph.D. dissertation,
KTH, School of Electrical Engineering and Computer Science (EECS), 2021. [Online]. Available: https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-301948
[25] P. Bernal-Mencia, K. Doerksen, and C. Yap, “Machine learning for early satellite anomaly detection,” Proceedings of the Small Satellite Conference, 2021.
[26] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” 2023.
[27] J. Kim, H. Kang, and P. Kang, “Time-series anomaly detection with stacked transformer representations and 1D convolutional network,” Engineering Applications of Artificial Intelligence, vol. 120, p. 105964, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0952197623001483
[28] Y. Jeong, E. Yang, J. H. Ryu, I. Park, and M. Kang, “AnomalyBERT: Self-supervised transformer for time series anomaly detection using data degradation scheme,” 2023.
[29] J. Xu, H. Wu, J. Wang, and M. Long, “Anomaly transformer: Time series anomaly detection with association discrepancy,” in International Conference on Learning Representations, 2022. [Online]. Available: https://openreview.net/forum?id=LzQQ89U1qm
[30] G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eickhoff, “A transformer-based framework for multivariate time series representation learning,” in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, ser. KDD ’21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 2114–2124. [Online]. Available: https://doi.org/10.1145/3447548.3467401
[31] H. Meng, Y. Zhang, Y. Li, and H. Zhao, “Spacecraft anomaly detection via transformer reconstruction error,” in ICASSE 2019: Proceedings of the International Conference on Aerospace System Science and Engineering 2019, 2020. [Online]. Available: https://api.semanticscholar.org/CorpusID:214396765
[32] Y. LeCun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, and L. D. Jackel, “Handwritten digit recognition with a back-propagation network,” in NIPS, 1989. [Online]. Available: https://api.semanticscholar.org/CorpusID:2542741
[33] Y. Song, J. Yu, D. Tang, J. Yang, L. Kong, and X. Li, “Anomaly detection in spacecraft telemetry data using graph convolution networks,” in 2022 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), 2022, pp. 1–6.
[34] M. Tennberg and L. Ekeroot, “Anomaly detection on satellite time-series,” Ph.D. dissertation, Uppsala University, 2021. [Online]. Available: https://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-446292
[35] H. Fanaee-T and J. Gama, “Tensor-based anomaly detection: An interdisciplinary survey,” Knowledge-Based Systems, vol. 98, pp. 130–147, 2016. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0950705116000472
[36] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2015.
[37] Z. Wang, W. Yan, and T. Oates, “Time series classification from scratch with deep neural networks: A strong baseline,” 2016.
[38] E. Rahimian, S. Zabihi, S. F. Atashzar, A. Asif, and A. Mohammadi, “XceptionTime: A novel deep architecture based on depthwise separable convolutions for hand gesture classification,” 2019.
[39] H. I. Fawaz, B. Lucas, G. Forestier, C. Pelletier, D. F. Schmidt, J. Weber, G. I. Webb, L. Idoumghar, P.-A. Muller, and F. Petitjean, “InceptionTime: Finding AlexNet for time series classification,” Data Mining and Knowledge Discovery, vol. 34, no. 6, pp. 1936–1962, Sep. 2020. [Online]. Available: https://doi.org/10.1007%2Fs10618-020-00710-y
[40] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[41] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014.
[42] R. Cahuantzi, X. Chen, and S. Güttel, “A comparison of LSTM and GRU networks for learning symbolic sequences,” 2023.
[43] G. Xiang and R. Lin, “Robust anomaly detection for multivariate data of spacecraft through recurrent neural networks and extreme value theory,” IEEE Access, vol. 9, pp. 167447–167457, 2021.
[44] S. Lin, R. Clark, R. Birke, S. Schönborn, N. Trigoni, and S. Roberts, “Anomaly detection for time series using VAE-LSTM hybrid model,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 4322–4326.
[45] F. Andayani, L. B. Theng, M. T. Tsun, and C. Chua, “Hybrid LSTM-transformer model for emotion recognition from speech audio files,” IEEE Access, vol. 10, pp. 36018–36027, 2022.
[46] Z. Zeng, V. T. Pham, H. Xu, Y. Khassanov, E. S. Chng, C. Ni, and B. Ma, “Leveraging text data using hybrid transformer-LSTM based end-to-end ASR in transfer learning,” 2020.
[47] B. Urazalinov, “Parkinson’s freezing of gait prediction,” 2023. [Online]. Available: https://www.kaggle.com/competitions/tlvmc-parkinsons-freezing-gait-prediction/discussion/416026
[48] I. Oguiza, “tsai - a state-of-the-art deep learning library for time series and sequential data,” GitHub, 2022. [Online]. Available: https://github.com/timeseriesAI/tsai
[49] E. Sanderson and B. J. Matuszewski, “FCN-transformer feature fusion for polyp segmentation,” in Medical Image Understanding and Analysis, G. Yang, A. Aviles-Rivero, M. Roberts, and C.-B. Schönlieb, Eds. Cham: Springer International Publishing, 2022, pp. 892–907.
[50] E. M. Al-Ali, Y. Hajji, Y. Said, M. Hleili, A. M. Alanzi, A. H. Laatar, and M. Atri, “Solar energy production forecasting based on a hybrid CNN-LSTM-transformer model,” Mathematics, vol. 11, no. 3, 2023. [Online]. Available: https://www.mdpi.com/2227-7390/11/3/676
[51] Z. Xu, Z. Cheng, and B. Guo, “A hybrid data-driven framework for satellite telemetry data anomaly detection,” Acta Astronautica, vol. 205, pp. 281–294, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0094576523000590
[52] J. Howard and S. Gugger, “Fastai: A layered API for deep learning,” Information, vol. 11, no. 2, p. 108, Feb. 2020. [Online]. Available: https://doi.org/10.3390%2Finfo11020108
[53] L. N. Smith, “A disciplined approach to neural network hyper-parameters: Part 1 – learning rate, batch size, momentum, and weight decay,” 2018.
[54] J. Howard and S. Gugger, “fast.ai - fastai A Layered API for Deep Learning - fast.ai,” https://www.fast.ai/posts/2020-02-13-fastai-A-Layered-API-for-Deep-Learning.html, 2021, [Accessed 02-10-2023].
[55] A. Al-Kababji, F. Bensaali, and S. P. Dakua, “Scheduling techniques for liver segmentation: ReduceLROnPlateau vs OneCycleLR,” 2022.
[56] L. Liu, L. Tian, Z. Kang, and T. Wan, “Spacecraft anomaly detection with attention temporal convolution network,” 2023.
[57] P. Virtanen, R. Gommers, and T. E. Oliphant, “SciPy 1.0: fundamental algorithms for scientific computing in Python,” Nature Methods, vol. 17, no. 3, pp. 261–272, Feb. 2020. [Online]. Available: https://doi.org/10.1038/s41592-019-0686-2
[58] A. M. Ikotun, A. E. Ezugwu, L. Abualigah, B. Abuhaija, and J. Heming, “K-means clustering algorithms: A comprehensive review, variants analysis, and advances in the era of big data,” Information Sciences, vol. 622, pp. 178–210, 2023. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0020025522014633

BIOGRAPHY

Daniel Lakey received his BSc degree in Computer Science in 2003 from Cardiff University, and is completing an MSc in Data Science with IU International University of Applied Sciences. Working with the European Space Agency, Daniel has been deeply involved with interplanetary exploration missions since 2006. Since 2013 Daniel has been a Spacecraft Operations Engineer on the ESA/NASA Solar Orbiter mission, with a particular focus on anomaly investigation and resolution.
orcid.org/0000-0002-8198-7892

Tim Schlippe is a professor of Artificial Intelligence at IU International University of Applied Sciences and CEO of the company Silicon Surfer. Prof. Dr. Schlippe has in-depth knowledge in the fields of artificial intelligence, machine learning, natural language processing, multilingual speech recognition/synthesis, machine translation, language modeling, computer-aided translation, and entrepreneurship, which can be seen in his numerous publications at international conferences in these areas.
orcid.org/0000-0002-9462-8610