0% found this document useful (0 votes)
68 views

s13042-025-02560-w

This survey reviews the advancements in Deep Learning for Time Series Forecasting (DTSF), addressing limitations of classical statistical models in various domains. It categorizes deep learning models, summarizes feature extraction methods, and compiles datasets while highlighting challenges and future research directions. The article aims to provide a comprehensive overview of DTSF, enhancing understanding and guiding future studies in the field.

Uploaded by

Safaa Kahil
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
68 views

s13042-025-02560-w

This survey reviews the advancements in Deep Learning for Time Series Forecasting (DTSF), addressing limitations of classical statistical models in various domains. It categorizes deep learning models, summarizes feature extraction methods, and compiles datasets while highlighting challenges and future research directions. The article aims to provide a comprehensive overview of DTSF, enhancing understanding and guiding future studies in the field.

Uploaded by

Safaa Kahil
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 34

International Journal of Machine Learning and Cybernetics

https://doi.org/10.1007/s13042-025-02560-w

ORIGINAL ARTICLE

Deep learning for time series forecasting: a survey


Xiangjie Kong1 · Zhenghao Chen1 · Weiyao Liu1 · Kaili Ning1 · Lechao Zhang1 · Syauqie Muhammad Marier1 ·
Yichen Liu1 · Yuhao Chen1 · Feng Xia2

Received: 8 October 2024 / Accepted: 20 January 2025


© The Author(s) 2025

Abstract
Time series forecasting (TSF) has long been a crucial task in both industry and daily life. Most classical statistical models
may have certain limitations when applied to practical scenarios in fields such as energy, healthcare, traffic, meteorology, and
economics, especially when high accuracy is required. With the continuous development of deep learning, numerous new
models have emerged in the field of time series forecasting in recent years. However, existing surveys have not provided a
unified summary of the wide range of model architectures in this field, nor have they given detailed summaries of works in
feature extraction and datasets. To address this gap, in this review, we comprehensively study the previous works and sum-
marize the general paradigms of Deep Time Series Forecasting (DTSF) in terms of model architectures. Besides, we take an
innovative approach by focusing on the composition of time series and systematically explain important feature extraction
methods. Additionally, we provide an overall compilation of datasets from various domains in existing works. Finally, we
systematically emphasize the significant challenges faced and future research directions in this field.

Keywords Time series forecasting · Model architecture paradigm · Feature extraction methodology · Multivariate time
series sata

1 Introduction
* Feng Xia Time series are pervasive in various facets of our manu-
[email protected]
facture and life, serving as a primary dimension to record
Xiangjie Kong historical events. Forecasting, a critical task, leverages
[email protected]
historical information within sequences to infer the future
Zhenghao Chen [113, 191]. It finds extensive applications in various domains
[email protected]
closely intertwined with our lives, including energy produc-
Weiyao Liu tion and consumption [55, 143, 194, 209, 240, 290], mete-
[email protected]
orological variations [177, 275], finance, stock markets,
Kaili Ning and econometrics [6, 29, 37, 119, 163, 213, 224] sales and
[email protected]
demand [17, 23] urban traffic flows [144, 165, 237], and
Lechao Zhang welfare-related healthcare conditions [120, 173, 192, 238].
[email protected]
Machine learning, data science, and other research groups
Syauqie Muhammad Marier employing operations research and statistical methods have
[email protected]
extensively explored time series forecasting [71–73, 81].
Yichen Liu Statistical models typically consider non-stationarity, linear
[email protected]
relationships, and specific probability distributions to infer
Yuhao Chen future trends based on the statistical properties of historical
[email protected]
data such as mean, variance, and autocorrelation. On the
1
College of Computer Science and Technology, Zhejiang other hand, machine learning models learn patterns and rules
University of Technology, Hangzhou 310023, China from the data. With the emergence [203] and rapid devel-
2
School of Computing Technologies, RMIT University, opment of deep learning [94, 134], an increasing number
Melbourne 3000, Australia

Vol.:(0123456789)
International Journal of Machine Learning and Cybernetics

of neural network models are being applied to time series current achievements in DTSF research but also elucidate
forecasting. In contrast to the first two approaches that rely prospective avenues for future exploration. Finally, we con-
on domain-specific knowledge or meaningful feature engi- clude this survey in Sect. 6. In Appendix A, an exhaustive
neering, deep learning autonomously extracts intricate time account of TSF datasets across various domains is presented.
features and patterns from complex data. This capability Figure 1 shows an outline of the entire paper.
enables the capture of long-term dependencies and complex
relationships, ultimately enhancing prediction accuracy. In
this article, we will refer to works on Deep Learning for 2 Time series forecasting
Time Series Forecasting as DTSF works, and Time Series
Forecasting will be abbreviated as TSF. Time series represents a continuous collection of data points
In recent years, deep learning methods have continuously recorded at regular or irregular time intervals, offering a
advanced and innovated in time series forecasting (TSF) chronological record of observed phenomena such as vital
across various domains [18, 132, 150, 183, 196, 206, 257]. signs, sales trends, stock market prices, weather changes,
However, current research efforts primarily focus on key and more. The nature of these observations can encompass
TSF concepts and fundamental model components, while numerical values, labels, etc. Moreover, time series can be
lacking a high-level categorization of deep learning-based either discrete or continuous [101]. It is commonly employed
DTSF model structures, comprehensive summaries of recent for the analysis and prediction of trends and patterns [175]
developments, and in-depth analyses of future prospects and that evolve over time.
challenges. This article aims to address these gaps by draw- TSF is the process of forecasting future values based on
ing on the latest research. The main contributions of this the inherent properties and characteristic patterns found in
work are as follows: historical data. These properties and intrinsic patterns may
provide valuable insights into describing future occurrences.
• Dynamic and systematic taxonomy. We propose a novel Discovering potential features within time series data based
dynamic classification method designed to categorize on the similarity of statistical characteristics between adja-
deep learning models for time series forecasting in a sys- cent data points or time steps is crucial for building a strong
tematic manner. Our survey classifies and summarizes foundation for designing prediction models and achieving
these models from the perspective of their architectural improved results.
structure. To the best of our knowledge, this represents In this section, we will begin with the definition of time
the first dynamic classification of deep learning model series and explain the concept of TSF. Furthermore, we will
architectures for time series forecasting. introduce classical methods based on mathematical statis-
• Comprehensive review of data feature enhancement. We tics. Lastly, we will analyze the factors contributing to lower
analyze and summarize feature enhancement methods prediction accuracy to provide researchers new to this field
for time series data, including dimensional decomposi- with a preliminary understanding.
tion, time-frequency transformation, pre-training, and
patch-based segmentation. Our analysis begins with the 2.1 Time series definition
composition of complex, high-dimensional data features,
aiming to reveal the latent learning potential within time In this survey, we consider time series as observation
series data. sequences recorded in chronological order, which may have
• Summary of challenges and future directions. This sur- fixed or variable time intervals between observations. Let t
vey summarizes major TSF datasets from recent years, denote the time of observation, and yt represents the time
discusses key challenges, and highlights promising future series, corresponding to a stochastic process composed of
research directions to advance the field. random variables observed over time. In most cases, t ∈ ℤ,
where ℤ = (0, ±1, ±2, …) represents the set of positive and
The remaining content is organized as follows. Section 2 negative integers [81]. When only a limited amount of data is
introduces the fundamental aspects of TSF, encompassing available, a time series can be represented as (y1 , y2 , y3 , …).
the definition and composition of time series, forecasting Let Y = {yi,1∶Ti }Ni=1 denote the collection of N univariate time
tasks, statistical models, and existing problems. Section 3, series, where yi,1∶Ti = (yi,1 , … , yi,Ti ), and yi,t represents the
a pivotal component of this paper, mainly delineates the values of t for the i-th time series. Yt1 ∶t2 is the collection of
overarching structural paradigm of DTSF models. Section 4 values for all N time series within the time interval [t1 , t2 ].
outlines the prevalent paradigms for extracting and learn- Time series data differs from other forms of data since
ing features from time series data, constituting the second it is prevalent in all major fields and is significant as one of
major focus. Section 5 is another key focus of this paper. We the aspects that make up our reality. It has a wide range of
not only highlight the limitations and challenges within the attributes and characteristics. First of all, time series data
International Journal of Machine Learning and Cybernetics

W
R

Fig. 1  The outline of this article

are usually noisy and high-dimensional. Techniques such as and long-term forecasting based on the prediction horizon,
dimensionality reduction, wavelet analysis, or filtering can which is determined by specific application requirements
be used to eliminate some noise and reduce dimensionality and domain characteristics. Short-term forecasting typi-
[276]. Secondly, the sample time interval has an impact on cally involves shorter time spans, often ranging from hours
it. Due to its inherent instability in reality, the distribution of to weeks, emphasizing high prediction accuracy and is
time series obtained at different sampling frequencies does suitable for tasks demanding precision. In contrast, long-
not have a uniform probability distribution [262]. Finally, if term forecasting spans longer periods, including months,
time series data is viewed as an information network, each years, or even longer durations, and addresses challenges
time point can be considered a node, with the relationships related to long-term trends and seasonal variations that
between nodes evolving over time. Similar to most real- can significantly impact prediction accuracy. The distinc-
world networks, this data is inherently heterogeneous and tion between these two types of forecasting lies in their
dynamic [189], which presents significant challenges for the specific emphasis. Short-term forecasting prioritizes pre-
modeling and analysis of spatio-temporal data. It is worth cision and relies mainly on extrapolating data, suitable
noting that the representation of time series data is crucial for scenarios where fluctuations within relatively short
for relevant features extraction and dimensionality reduction. periods are critical for prediction outcomes. Conversely,
The success or failure of model design and application is long-term forecasting requires consideration of long-term
closely tied to this representation. trends and seasonal influences, making it more complex
and necessitating additional factors such as extra assump-
2.2 Forecasting task tions and supplemental external data, which may affect its
accuracy. Therefore, the role of external factors is particu-
TSF is a process of predicting future data based on histori- larly important in long-term forecasting, as they help the
cal observations, widely applied in various domains such forecasting model better capture long-term trends, cyclical
as energy, finance, and meteorology to anticipate future fluctuations, and other macro-level changes. For exam-
trends. The task of TSF can be categorized into short-term ple, external factors such as weather, holidays, economic
International Journal of Machine Learning and Cybernetics

indicators, and road network information often have a sig- products or the interrelationships within various financial
nificant impact on the trends and seasonal variations in markets.
time series data. Currently, many researchers have incor- In the following subsections, we will introduce statistical
porated these external factors into forecasting models to forecasting models and highlight their limitations, emphasiz-
improve the accuracy of predictions. Common approaches ing the challenges posed by traditional TSF methods. Subse-
to handling external influences include incorporating quently, we will delve into the development of deep learning
external data as additional features into the model, using forecasting models and methods.
multi-task learning with external data [204], and introduc-
ing exogenous variables into classical time series models.
Deep learning methods, such as LSTM, GRU, and atten- 2.3 Statistical forecasting model
tion mechanisms, also enhance model performance by
considering external factors [193]. Additionally, seasonal The development history of statistical forecasting models
adjustment, periodic modeling, and the integration of road can be traced back to the early 20th century. Equations 1 and
network knowledge are effective methods for addressing 2 illustrate how the first statistical forecasting methods, such
external influences. For instance, MultiSPANS [299] uses as Moving Averages (MA) [24, 50, 109] and simple Expo-
a structural entropy minimization algorithm to generate nential Smoothing (ES) [87], were based on time series.
optimal road network hierarchies, considering complex t
1 ∑
multi-distance dependencies in the road network for pre- MAt (n) = x (1)
n i=t−n+1 i
diction; [128], in summarizing forecasting tasks, con-
structed a new bus station distance network to account for where n is the window size, and MA represents the moving
the relationships between external bus stations. average at time t.
On the other hand, in addition to being categorized as
Univariate [115, 174, 211, 281] and Multivariate [125, ESt+1 = 𝛼 ⋅ xt + (1 − 𝛼) ⋅ ESt (2)
164] forecasting based on whether multiple variables are
considered, TSF can also be distinguished by the distinc- where ESt+1 represents the predicted trend, 𝛼 is the smooth-
tion between global and local models. Univariate forecast- ing coefficient, and ESt is the value predicted at the previous
ing involves tasks where only one variable is considered time step. Moving average smooths data by calculating the
during the forecasting process, primarily focusing on pre- average of observed values over a certain period of time,
dicting the future values of a single variable. Multivariate while exponential smoothing assigns higher weights to more
forecasting, on the other hand, entails the simultaneous recent observations to reflect the trend of the data.
prediction of multiple correlated variables, considering the Subsequently, the autoregressive (AR) [24, 109, 135]
interdependencies among various variables and forecasting and Moving Average (MA) models (represented by Equa-
their future values. When discussing univariate and mul- tions 3 and 4, respectively) were introduced as two important
tivariate forecasting, it’s essential to consider the distinc- concepts, leading to the development of the Autoregressive
tion between global and local models, which impacts the Moving Average Model [1, 24, 109] (ARMA, as shown in
modeling approach and the interpretation of results. Global equation 5). These models aim to accurately capture the auto
models consider all variables across the entire time series correlation and averaging properties of time series data.
dataset, while local models focus on subsets of the data, such AR ∶ Yt = c + 𝜑1 Yt−1 + 𝜑2 Yt−2 + ⋯ + 𝜑p Yt−p + 𝜉t (3)
as specific segments or windows, affecting how dependen-
cies within the data are captured and predictions are made.
In summary, the categorization and focus of forecasting
MA ∶ Yt = 𝜇 + 𝜖t + 𝜃1 𝜖t−1 + 𝜃2 𝜖t−2 + ⋯ + 𝜃q 𝜖t−q (4)
tasks depend on the application context and requirements.
For instance, in the financial domain, short-term forecast- Yt = c + 𝜑1 Yt−1 + 𝜑2 Yt−2 + ⋯ + 𝜑p Yt−p
ing may involve predicting stock price fluctuations within (5)
+ 𝜃1 𝜖t−1 + 𝜃2 𝜖t−2 + ⋯ + 𝜃q 𝜖t−q + 𝜖t
minutes or hours, while long-term forecasting could encom-
pass forecasts over several weeks or months. Similarly, in where Yt represents the time series data under consideration,
meteorology, short-term forecasting might entail predicting 𝜑1 to 𝜑p are parameters of the AR model. These parameters
weather conditions within a few hours, while long-term fore- describe the relationship between the current value and
casting may involve predictions spanning days or weeks. values from the past p time points. Similarly, 𝜃1 to 𝜃q are
For univariate forecasting, the focus could be on forecast- parameters of the MA model, which describe the relation-
ing the sales volume of a particular product or the price of ship between the current value and errors from the past
a specific stock. On the other hand, multivariate forecasting q time points. 𝜀t represents the error term at time t, and c
might simultaneously predict the sales volumes of multiple denotes a constant term.
International Journal of Machine Learning and Cybernetics

Specifically, the AR model leverages past time series models, these limitations have been overcome, leading to
observations to predict future values, while the MA model improved predictive performance.
relies on the moving average of observations to make these
predictions. To address non-stationary time series data,
the Autoregressive Integrated Moving Average (ARIMA) 3 DTSF model architecture
model [24, 50, 102, 109, 280] is introduced. ARIMA is
employed to transform non-stationary sequences into sta- Time series data is prevalent in various real-world domains,
tionary ones by means of differencing, thereby reducing including energy, transportation, and communication sys-
or eliminating trends and seasonal variations in the time tems. Accurately modeling and predicting time series data
series. This transformation is represented by Equation (6) plays a crucial role in enhancing the efficiency of these sys-
as follows: tems. Classical deep learning models (RNN, TCN, Trans-
former, and GAN) have made significant advancements in
ΔYt = (1 − L)d Yt = 𝜖t (6) TSF [250, 251, 284, 294], providing valuable insights for
subsequent research.
where L denotes the lag operator, d represents the differenc-
One of the widely adopted methods is the Recurrent Neu-
ing order, yt signifies the time series, and 𝜖t is the error term.
ral Network (RNN), which utilizes recurrent connections to
This integration of ARIMA helps mitigate non-stationarity,
handle temporal relationships and capture evolving patterns
paving the way for more effective TSF.
in sequential data. Variants of RNNs, namely Long Short-
Machine learning models represented by Random For-
Term Memory (LSTM) and Gated Recurrent Units (GRU),
ests and Decision Trees [5, 111, 129, 201] offer enhanced
are specifically designed to address long-term dependencies
flexibility and predictive performance in statistical fore-
and effectively capture patterns in long time series. There
casting [2, 105]. A decision tree comprises a series of
is a lot of research based on RNNs, DeepAR [206] lever-
decision nodes and leaf nodes, constructed based on the
aged RNN and autoregressive techniques to capture tempo-
selection of optimal features and splitting criteria to mini-
ral dependencies and patterns in time series data. MQRNN
mize prediction errors or maximize metrics like informa-
[248] exploited the expressiveness and temporal nature of
tion gain or Gini index. Each decision node splits based
RNNs, the nonparametric nature of Quantile Regression
on feature conditions, while each leaf node provides pre-
and the efficiency of Direct Multi Horizon Forecasting, pro-
diction results. Random Forest, on the other hand, makes
posed a new training scheme named forking-sequences to
forecasting by constructing multiple decision trees and
boost stability and performance. ES-RNN [225] proposed a
combining their forecasting results. It can handle high-
dynamic computational graph neural network with a stand-
dimensional features and large-scale datasets, capturing
ard exponential smoothing model and LSTM in a common
nonlinear relationships and interactions between features.
framework.
However, the development of emerging technologies
In addition to RNNs, Convolutional Neural Networks
such as the Internet of Things (IoT) has brought efficiency
(CNNs) can also be employed for TSF. By processing time
and convenience to data acquisition, collection, and stor-
series data as one-dimensional signals, CNNs can extract
age [127, 140]. The era of big data has arrived [74, 205],
features from local regions, enabling them to capture local
with data being generated at an increasing rate. Statistical
patterns and translational invariance effectively. Notably,
forecasting models need to better adapt to the demands of
Temporal Convolutional Networks (TCNs) represent a
processing large-scale and high-dimensional data [34, 186,
prominent example of CNN-based models for time series
255]. Different industries and domains are also increas-
analysis.
ingly in need of accurate forecasting models to support
The Temporal Convolutional Network is a classical deep
decision-making and planning [200]. Furthermore, more
learning model that has garnered widespread attention in
complex relationships among data are encountered in prac-
time series forecasting due to its ability to effectively capture
tical applications, requiring more flexible and accurate
long-range dependencies. Unlike traditional RNN, TCNs
models to tackle these challenges.
employ convolutional layers with dilated convolutions to
In summary, traditional statistical forecasting models
expand the receptive field without increasing the number
are limited in terms of computational power, prediction
of parameters. This enables TCNs to handle long-range
accuracy, and length. There are major shortcomings in sta-
dependencies more efficiently while maintaining compu-
tistical forecasting methods in handling non-stationarity,
tational efficiency [15]. TCNs are particularly useful for
nonlinear relationships, noise, and complex dependen-
time series data with complex temporal patterns, as they
cies, and their adaptability to long-term dependencies and
can model sequences of varying lengths without suffering
multi-feature forecasting tasks is also limited. With the
from the vanishing gradient problem [56]. In traffic flow
continuous development and innovation of deep learning
prediction, TCNs have been successfully applied to model
International Journal of Machine Learning and Cybernetics

the temporal dependencies in sensor data, achieving high classic models for time series data modeling, capable of
accuracy in forecasting traffic conditions [289]. Further- capturing long-term dependencies, and have demonstrated
more, when combined with other techniques such as atten- excellent performance in various time series forecasting
tion mechanisms and feature extraction layers, TCNs have tasks, such as financial forecasting and weather prediction
demonstrated improved performance across various predic- [46]. In contrast to traditional RNNs, TCN leverage con-
tion tasks. For instance, integrating TCNs with attention- volutional layers to address long-term dependency issues,
based models has shown enhanced results in multivariate achieving strong results in several time series forecast-
time series forecasting tasks like electricity load prediction ing applications, particularly in traffic flow prediction and
and energy demand forecasting. Overall, TCNs provide a weather forecasting [15]. Moreover, the Bi-directional
powerful and effective approach to time series forecasting, Encoder-Decoder model, which utilizes bidirectional
especially when dealing with long sequences or datasets LSTM, captures both past and future time information,
with complex temporal dependencies. further enhancing the model’s forecasting accuracy [45].
Another valuable technique is the attention mechanism, These classic Encoder-Decoder models, with their abil-
which allows models to assign varying weights to different ity to automatically learn complex patterns in time series
parts of the input sequence. This is particularly beneficial for data, have become essential tools in time series forecast-
handling long-term series or focusing on important informa- ing tasks.
tion at specific time points. Additionally, Generative Adver- Encoder-decoder has also been extensively and success-
sarial Networks (GANs) can be utilized for TSF. Through fully applied in the field of TSF. For instance, [190] was
adversarial training between a generator and a discriminator, inspired by U-net [202] and designed a time fully convo-
GANs can generate synthetic time series samples and pro- lutional network called U-Time based on the U-net archi-
vide more accurate prediction results. tecture. U-Time maps arbitrarily long sequential inputs to
In this section, we dynamically classify existing time label sequences on a freely chosen time scale. The overall
series models based on the model architecture dimension. network exhibits a U-shaped architecture with highly sym-
We focus on the internal structural design of the models and metric encoder and decoder components. We believe that
categorize the five model architectures into explicit struc- the high degree of symmetry in the architecture is because
ture paradigms and implicit structure paradigms. Figure 2 the proposed network’s input and output exist in the same
shows more details of our proposed model classification. space. The encoder maps the input into another space, and
Table 1 comprehensively summarizes the models that have the decoder should map back from this space. Therefore,
made outstanding contributions in recent years. Table 2 the network architecture is theoretically highly symmetric.
selects several key models and provides a detailed analysis There are many highly symmetric encoder-decoder
of their advantages, disadvantages, application domains, and network architectures, as well as cases where the encoder
prediction horizons. The aim is to help readers understand and decoder are asymmetric. The most typical example is
the unique characteristics of each model and guide them the Transformer architecture [251, 265, 292, 294]. It can
in selecting the most suitable model for specific prediction be observed that the decoder differs from the encoder and
tasks. receives input. This encoder-decoder architecture is con-
sidered to require additional information for assistance to
3.1 Model with explicit structure perform better.
Likewise, Guo et al. [97] proposed an asymmetric
3.1.1 Encoder‑decoder model encoder-decoder learning framework where the spatial rela-
tionships and time-series features between multiple build-
The encoder-decoder model is widely used in the field of ings are extracted by a convolutional neural network and a
deep learning, which appears similar to seq2seq and has an gated recurrent neural network to form new input data in the
explicit encoder and a decoder. However, seq2seq seems to encoder. The decoder then makes predictions based on the
be described from an application-level perspective, while input data with an attention mechanism.
the encoder-decoder is described at the network level. U-net There are some other examples of encoder-decoder here
for medical image segmentation [202] and various forms of as well. In [21], a novel hierarchical attention network
Transformers are well-known applications. (HANet) for the long-term prediction of multivariate time
In this context, the classic Seq2Seq model stands as one series was proposed, which also includes an encoder and a
of the most representative Encoder-Decoder architectures. decoder. However, the encoder and decoder architectures are
It uses Long Short-Term Memory networks as both the noticeably different. That is to say, the encoder and decoder
encoder and decoder to map input sequences to output are asymmetric (see Fig. 3). There are also network architec-
sequences, making it particularly suitable for multi-step tures that explicitly involve an encoder but lack an explicit
forecasting tasks [231]. Additionally, LSTM and GRU are decoder [66].
International Journal of Machine Learning and Cybernetics

Table 1  DTSF model architecture paradigm


Architecture Model Multi/uni Output Loss Metrics Year

COST [249] Multi & uni Point contrastive loss MSE, MAE 2022
TS2Vec [274] Multi & uni Point contrastive loss MSE 2022
ACT [146] Multi & uni Point cross-entropy Q50 loss, Q90 loss 2022
SimTS [291] Multi & uni Point cos-similarity loss, InfoNCE loss MAE, MSE 2023
DeepTCN [41] Multi Pro quantile loss NRMSE, SMAPE, MASE 2020
STEP [215] Multi Pro MAE MAE, RMSE, MAPE 2022
DCAN [106] Multi Point RMSE MAE, RMSE 2022
FusFormer [265] Multi Point – RMSE, RMSE Decrease 2022
HANet [21] Multi Point – MAE, RMSE 2022
D3 VAE [145] Multi Pro – MSE, CRPS 2022
TI-MAE [147] Multi Point MSE MSE, MAE 2023
Encoder - Decoder AST [254] Uni Pro cross-entropy Q50, Q90 loss 2020
TFT [150] Multi & uni Prob quantile loss P50, P90 quantile loss 2021
Informer [292] Multi & uni Point MSELoss MSE, MAE 2021
ETSformer [250] Multi & uni Point MSELoss MSE, MAE 2022
FEDformer [294] Multi & uni Point MSELoss MSE, MAE, Permutation 2022
TACTiS [62] Multi & uni Pro log-likelihood CRPS-Sum, CRPS-means 2022
Autoformer [251] Multi & uni Point L2 loss MSE, MAE 2022
NSTformer [159] Multi & uni Point L2 loss MSE, MAE 2023
Dateformer [270] Multi & uni Point MSE MSE, MAE 2023
Crossformer [286] Multi & uni Point MSE MSE, MAE 2023
Scaleformer [214] Multi & uni Pro MSE MSE, MAE 2023
BasisFormer [179] Multi & uni Point MSE MSE, MAE 2023
CRT [282] Multi Point – ROC-AUC, F1-Score 2021
Pyraformer [156] Multi Point MSE MSE, MAE 2022
TDformer [284] Multi Point MSE MSE, MAE 2022
FusFormer [265] Multi Point – RMSE, RMSE Decrease 2022
Scaleformer [214] Multi Point MSE, Huber, Adaptive loss MSE, MAE 2022
Infomaxformer [234] Multi Pro MSELoss MSE, MAE 2023
PatchTST [180] Multi Point Adaptive Loss MSE, MAE 2023
Transformer iTransformer [160] Multi Point L2 Loss MSE, MAE 2023
MCformer [103] Multi Point MSE, MAE MSE, MAE 2024
SAMformer [114] Multi Point MSE MSE, MAE 2024
TSLANet [68] Multi Point MSE MSE, MAE 2024
MASTER [142] Multi Point MSE IC, ICIR, RankIC 2024
TimeSiam [60] Multi Point L2, Cross-Entropy MSE, MAE, Recall, F1 Score 2024
Chronos [12] Multi Point Cross Entropy WQL, CRPS, MASE 2024
TimeXer [246] Multi Point L2 loss MSE, MAE 2024
Time-SSM [112] Multi Point MSE MSE, MAE 2024
SageFormer [288] Multi Point MSE MSE, MAE 2024
TIME-LLM [244] Multi Point MSE, SMAPE MSE, MAE, SMAPE 2024
CARD [245] Multi Point MSE, MAE MSE, MAE 2024
Pathformer [38] Uni Pro L1 loss MSE, MAE 2024
ForGAN [130] Multi & uni Pro RMSE MAE, MAPE, RMSE 2019
COSCI-GAN [212] Multi & uni Pro Global loss = local + central MAE 2022
RCGAN [70] Multi Pro cross-entropy AUROC, AUPRC 2017
TimeGAN [269] Multi Pro Unsupervised, Supervised, Discriminative and Predictive Score 2019
Reconstruction, Loss
PSA-GAN [116] Multi Point Wasserstein loss – 2022
AEC-GAN [243] Multi Point MSE ACF, Skew / Kurt, FD 2023
International Journal of Machine Learning and Cybernetics

Table 1  (continued)
Architecture Model Multi/uni Output Loss Metrics Year

ITF-GAN [121] Multi Point MSE MSE, STS, Pearson, Hellinger, Pred 2024
MAGAN [76] Multi Point – MAE, MAPE 2024
TSGAN [258] Multi Point – MAE, RMSE, MAPE 2022
GAN AST [254] Uni Pro cross-entropy Q50 loss, Q90 loss 2020

Model
Paradigm

Paradigms with Paradigms without


Explicit Structure Explicit Structure

Encoder-Decoder Transformer Generate Integrated Cascade


Model Model Adversarial Model Model Model

Encoder- Attention- Temporal Stack Tree Triangle


Encoder Decoder Generator Descriminator CNN+LSTM RNN+Attention
Decoder Mechanism Self-Attention Cascade Cascade Cascade

Fig. 2   The details of five paradigms

Table 2  A comparative analysis of time series forecasting models: advantages, disadvantages, applications, and prediction lengths
Model Advantages Disadvantages Applications Predic-
tion
horizon

Informer [292] Effcient; strong repre–sentation; Sensitive to data shifts Energy; weather Long
good generalization
HANet [21] Capture complex dependencies; flex- High complexity Weather; ecology Long
ible for multivariate data
Autoformer [251] Efficient; good information High complexity; depend on data Finance; energy; electricity; traffic; Long
periodicity weather; healthcare
ETSformer [250] Combine traditional methods with High computational cost; requires Finance; energy; electricity; traffic; Short
transformer; adaptive time window large data weather; healthcare
FEDformer [294] Frequency enhancement; Better flex- High complexity; large data needed Finance; energy; electricity; traffic; Long
ibility for long-term forecasts weather; healthcare
TreeDRNet [295] Capture time dynamics; efficient High complexity; need large data Finance; energy; electricity; traffic; Long
training with joint networks weather; healthcare
TATCN [242] Capture temporal dependencies; High computational cost; data Electricity; healthcare Short
extract local patterns dependence

3.1.2 Transformer model However, applying Transformer to TSF tasks is not


without challenges and limitations. Recent studies have
With the remarkable performance of Transformer in com- highlighted several issues, such as the inability to directly
puter vision and Natural Language Processing (NLP) handle Long Sequence Time Forecasting (LSTF), includ-
domains, they have also been applied to the field of TSF ing quadratic time complexity, high memory usage, and
and have shown great promise. The main architecture of inherent limitations of the encoder-decoder architecture. To
the Transformer includes the attention mechanism and the address these limitations, Informer [292] was an efficient
encoder-decoder architecture. Transformer-based architecture specifically designed for
International Journal of Machine Learning and Cybernetics

Fig. 3   The overview of HANet model

LSTF. This architecture utilizes the ProbSparse self-atten- other architectural improvements to improve accuracy and
tion mechanism, which reduces the time complexity and computational complexity, which integrates high-perfor-
memory usage to O(LlogL). From the network architecture mance multi-horizon forecasting with interpretable insights
perspective, it is evident that Informer’s architecture [292] into temporal dynamics, capturing temporal relationships
closely resembles the vanilla Transformer, consisting of an at different scales by employing recurrent layers for local
encoder and a decoder. The encoder receives the input, and processing and interpretable self-attention layers for long-
the decoder receives the output from the encoder as well as term dependencies (see Fig. 4).
the input, with the addition of zero-padding in the parts to be Autoformer [251], on the other hand, argues that previ-
predicted. The self-attention mechanism is replaced with the ous Transformer-based prediction models (e.g., Informer
ProbSparse self-attention mechanism. TFT [150] proposed [292]) mainly focused on improving self-attention for sparse

Fig. 4  The overview of


Informer model
International Journal of Machine Learning and Cybernetics

versions. While significant performance improvements were by simultaneously embedding temporal and spatial dimen-
achieved, they sacrificed the utilization of information. One sions of the Seasonal part of the time series decomposition
of the reasons why Transformer cannot be directly applied patches.
to LSTF is the complex characteristics of time series data. There are further works addressing Transformers in the
Without special design, traditional attention mechanisms context of TSF. ETSformer [250] argues that the sequence
struggle to model and learn these characteristics. Autofor- decomposition used by Autoformer makes simplified
mer [251] adopts decomposition as a standard approach for assumptions and is insufficient to properly model complex
time series analysis [49, 167], as it is believed that decom- trend patterns. Considering that seasonal patterns are more
position can untangle the intertwined time patterns and easily identifiable and detectable, ETSformer designs expo-
highlight the intrinsic properties of time series. Autoformer nential smoothing attention (ESA) and frequency attention
[251] introduces a novel decomposition architecture with (FA) mechanisms. The network architecture decomposes the
autocorrelation mechanisms, which is different from the time series into interpretable sequence components such as
conventional series decomposition preprocessing. In terms level, growth, and seasonality. FEDformer combines Trans-
of the network architecture, it follows a macro architecture formers with seasonal-trend decomposition methods. The
similar to Transformers, Informer, and other architectures. decomposition method captures the global profile of the
The difference lies in the input to the Decoder, which is no time series, while the Transformer captures more detailed
longer the original input but rather sub-sequences obtained architectures, making it a frequency-enhanced Transformer.
through time series decomposition, including seasonal and These studies demonstrate the ongoing efforts in lever-
trend dimensions (Fig. 5). aging Transformers for TSF and the development of spe-
In time series forecasting tasks, many researchers pre- cialized architectures and mechanisms to overcome the
fer to divide long time series into smaller segments to help challenges and limitations associated with applying Trans-
Transformer models focus more effectively on local tempo- formers to this domain.
ral features. This approach enhances the model’s ability to
learn local patterns while reducing computational burden. 3.1.3 Generative adversarial model
TSMixer [65] adopts a similar strategy by partitioning time
series data into multiple patches and then processing these GAN (Generative Adversarial Networks) has attracted sig-
patches through MLP-based layers to extract features. This nificant attention since its introduction as a generative model
approach, akin to patch-based methods in computer vision, consisting of an explicit structure including a discrimina-
enables the model to capture local features effectively while tor and a generator. While GANs have been widely used in
reducing computational complexity and memory require- the field of computer vision, their application in TSF has
ments in time series forecasting tasks. Zhang et al. (2023b) been relatively limited. The reason for this limited usage is
[285] proposed a novel Transformer-based multivariate time speculated to be the availability of alternative metrics such
series modeling approach in their work, MTPNet. It achieves as CRPS (Continuous Ranked Probability Score) that can
modeling of temporal information at arbitrary granularities measure the quality of generated samples [19].

Fig. 5   The overview of autoformer model


International Journal of Machine Learning and Cybernetics

In the existing literature on GAN-based TSF, most These studies highlight the application of GANs in TSF,
studies focus on generating synthetic time series datasets specifically in generating synthetic time series data and cap-
[70, 233, 269]. The discriminator is trained to distinguish turing the characteristics of real-world time series data.
between real and generated time series data, with the goal
of producing synthetic data that is indistinguishable from 3.2 Model without explicit structure
real data. TimeGAN [269], a GAN-based network archi-
tecture, was proposed to generate realistic time series data Integrated model
3.2.1 
by leveraging the flexibility of unsupervised models and
the control of supervised models. It utilizes an embedding As widely known, recurrent neural networks (RNNs) are
function and a recovery function to extract high-dimen- often considered suitable for sequence modeling, and the
sional features from time series data, which are then fed chapter on sequence modeling in classic deep learning text-
into the sequence generator and sequence discriminator books is titled “Sequence Modeling: Recurrent and Recur-
for adversarial training. Another study proposed a GAN- sive Nets” [108]. Time series naturally falls within the realm
based network architecture using Recurrent Neural Net- of sequence modeling tasks, and therefore, RNNs, LSTM,
works (RNNs) to generate real-valued multidimensional GRU, and similar models are expected to be applicable to
time series [233]. The study introduced two variations, solve time series-related tasks. However, convolutional
Recursive GAN (RGAN) and Recursive Conditional architectures have achieved state-of-the-art accuracy in tasks
GAN (RCGAN), where RGAN generates real-valued data such as audio synthesis, word-level language modeling, and
sequences, and RCGAN generates sequences conditioned machine translation [15], which has garnered significant
on specific inputs. The discriminators and generators attention and led to inquiries on how to apply convolutional
of both RGAN and RCGAN are based on simple RNN architectures in the domain of sequences. Integrated models
architectures. have emerged as a solution (see Fig. 7).
Furthermore, a deep neural network-based approach was Integrated models can combine the strengths of individual
proposed for modeling financial time series data [233]. This model architectures, with each focusing on learning features
approach learns the properties of the data and generates real- it excels at, resulting in improved performance. For exam-
istic data in a data-driven manner, while preserving statisti- ple, convolutional architectures excel at learning local fea-
cal characteristics of financial time series such as nonlinear ture patterns, while recurrent architectures excel at learning
predictability, heavy-tailed return distributions, volatility temporal dependencies between nodes. Integrated models
clustering, leverage effect, coarse-to-fine volatility correla- have also found various applications in time series tasks [14,
tions, and asymmetric return/loss patterns (see Fig. 6). 15, 218]. In [218], precipitation forecasting was modeled

Fig. 6   The overview of


TimeGAN model
International Journal of Machine Learning and Cybernetics

Fig. 7   The overview of TATCN model

as a spatio-temporal sequence prediction problem, where cascade is widely applied in various network model archi-
a convolutional architecture was designed to replace fully tectures. Firstly, stacking multiple identical modules or the
connected layers in LSTM for sequence modeling, effec- entire network can be considered as utilizing the cascade
tively leveraging the advantages of both convolutional and idea, as seen in the Transformer series [49, 54, 141, 167].
recurrent architectures. Similarly, Asiful et al. (2018) [14] Additionally, some models [295] incorporate specially
integrated multiple network architectures, namely LSTM designed cascade approaches to ensure the flow of informa-
and GRU, for stock prediction. In this model, the input was tion in a specific manner, thereby achieving unique effects.
first fed into the LSTM layer, then into the GRU layer, and
finally into a dense network.
4 Series components and enhanced feature
3.2.2 Cascade model extraction methodology

Cascade networks, which are widely used in deep neural In the previous sections, we have provided a comprehen-
networks, especially in Computer Vision (CV) domain [28], sive overview of five prominent paradigms for construct-
have multiple applications. A cascade network typically ing DTSF models. These paradigms offer researchers a
consists of multiple components, each serving a different concise pathway to understanding and building DL models.
function, collectively forming a deeper and more powerful However, a macroscopic understanding and construction of
network model. The components in a cascade model can be DTSF models alone is insufficient. This chapter delves into
either identical or different. When the components are differ- the methodological aspects of learning temporal features,
ent, each component has a specific role and function. If the which enable models to better capture the underlying repre-
components are the same, it means that a particular module sentations of the data, emphasizing a pre-training, decompo-
or the entire network is repeated several times. When the sition, extraction, and refinement process that aligns closely
same component is repeated multiple times, its concept is with the intrinsic nature of data.
somewhat similar to the iterative approach used in solving The chapter is divided into two parts. It begins by dis-
optimization problems (Fig. 8). secting the constituents of time series data in the real world.
In the field of TSF, there are not many works specifically Subsequently, it proceeds to provide an in-depth explora-
known for their cascade models. However, the concept of tion of four well-established feature extraction methods with

Fig. 8   The overview of TreeDRNet model


International Journal of Machine Learning and Cybernetics

strong theoretical foundations and notable performance in potential flaws in the model, further enhancing the quality
the field. These methods facilitate a richer understanding of and reliability of forecasting.
time series data and its essential features. In the real world, time series data contains discrete infor-
mation and is non-stationary, meaning that its mean and
4.1 Components of a time series variance are not constant over time. By decomposing the
data into its constituent parts, we gain a better understanding
In general, time series data can be decomposed into three of the data’s structure, identify long-term trends and peri-
main components: trend, seasonality, and residuals or white odic variations, and distinguish them from random noise.
noise [219], as illustrated in Fig. 9. These decomposition components aid in making more accu-
rate forecasts, uncovering hidden patterns, extracting useful
4.1.1 Trend information, and providing insights into the mechanisms and
regularities underlying the time series data.
Represents the long-term changes in the time series data
and reflects the overall growth or decline of the data over an 4.2 Methodology for enhanced feature extraction
extended period [175]. For example, the increase in popu-
lation over the years exhibits an upward trend [1], and the Numerous studies have been dedicated to improving the
growing wind power generation during multiple windy sea- model architecture and refining its components in DTSF.
sons can also be considered an upward trend. These studies aim to enhance the predictive performance
of models by optimizing or replacing the methods used for
4.1.2 Seasonality extraction and feature learning. To achieve accurate predic-
tions, it is crucial to learn time series representation features
Refers to the periodic variations observed in time series data, thoroughly, and sufficient information is essential for train-
often caused by seasonal, monthly, weekly, or other time ing high-quality model parameters.
unit influences. For instance, the number of tourists and ice In recent years, influential works on DTSF have shown
cream sales tend to increase during long vacations or in the significant changes in data processing and component mod-
summer. eling. Notably, decomposing time series into its major com-
ponents for analysis has been a primary focus, facilitating
4.1.3 Residuals a more comprehensive exploration of trends and seasonal
dimensions. Furthermore, transforming time-domain data
Represent the part of the data that cannot be explained by the into the frequency domain has proven to be more effective in
trend and seasonality components [170]. They capture the feature differentiation. Additionally, exploring non-end-to-
random fluctuations or noise remaining after the decomposi- end approaches and devising suitable data pre-training meth-
tion of trend and seasonality. Residuals reflect the short-term ods to address the potential mismatch between the target task
fluctuations and irregularities that have not been modeled and the data is also a valuable consideration. In the following
in the time series data. Additionally, residuals exhibit some sections, we will introduce the primary methodologies for
autocorrelation, which can help us identify and adjust for enhancing feature extraction and learning in DTSF.

Fig. 9  Components of the time series. The data is sourced from the Exchange-Rate dataset spanning from January 1, 1990, to June 23, 1990. The
blue line represents the original data, the green indicates the trend, the yellow represents seasonality, and the red signifies the residuals
International Journal of Machine Learning and Cybernetics

4.2.1 Dimension decomposition on the overall time series. Furthermore, decomposing data


dimensions enhances the interpretability of TSF models,
Dimension decomposition plays a vital role in the realm of which facilitates a better understanding of the influence of
TSF. It involves breaking down the data into its constituent different components on overall temporal behavior. As a
dimensions or components, such as trends, seasonal patterns, relatively universal method in time series analysis, dimen-
and residuals (Fig. 10). sion decomposition plays a foundational yet crucial role in
In current research, some works have integrated encoder- enhancing feature extraction methodologies.
decoder architectures with seasonal-trend decomposition
[32, 188, 234, 247, 251, 284, 294, 298]. Wu et al. [251] in
the similar work, devised an internal decomposition block 4.2.2 Time‑frequency conversion
to endow deep forecasting model with intrinsic progres-
sive decomposition capability. Subsequently, Zhou et al. The time-frequency domain conversion plays a crucial
[294] proposed a seasonal-trend-based frequency enhanced role in deep learning-based time series forecasting tasks. It
decomposition Transformer architecture in the FEDformer refers to converting the time-domain data into its frequency-
framework. Additionally, Wang et al. [247] introduced the domain representation, enabling a more effective analysis of
LaTS model, leveraging variational inference to unravel the frequency, spectral characteristics, and dynamic varia-
latent space seasonal trend features, and Zhang et al. [284] tions within time series data (Fig. 11).
presented the TDformer model, using MLP to model trends In current research, the time-frequency domain conver-
and Fourier attention to simulating seasonality. Notably, sion finds extensive application in the preprocessing and
Zhu et al. [298] designed an approach to decompose input feature extraction of time series data [43, 131, 229]. This
sequences into trend and residual components across mul- method reveals the components of the data at different fre-
tiple scales, which summed the learned features as the quencies and aids in identifying repetitive patterns, periodic
model output. In recent work, the challenge of capturing trends, and frequency-domain features such as seasonal pat-
outer-window variations was overcome by employing con- terns or periodic oscillations [294]. Converting time series
trastive learning and an enhanced decomposition architec- data into spectrograms provides an overview of the data’s
ture [10]. It is observed that decomposition networks can distribution in the frequency domain, facilitating the iden-
significantly benefit contrastive loss learning of long-term tification of major frequency components and the shape of
representations, thereby enhancing the performance of long- the spectrum. This is particularly valuable for capturing the
term forecasting. overall spectral characteristics of signals and the primary
The significance of dimension decomposition lies in its fluctuation patterns across frequencies. In their work, [30]
ability to delve into and capture the inherent components or employ StemGNN to jointly capture inter-sequence correla-
dimensions within time series data. On one hand, it aids in tions and temporal dependencies in the spectral domain for
isolating and extracting latent patterns in time series data multivariate time series forecasting. In recent work, Yi et al.
for identification and analysis. On the other hand, it isolates [266] proposed a simple yet effective time series forecast-
individual features that influence the overall behavior, allow- ing architecture, named FreTS, based on Frequency-Domain
ing for a more focused analysis of each constituent part. MLP. It primarily consists of two stages, domain conver-
This contributes to understanding the impact of each feature sion and frequency learning, which enhance the learning of

Fig. 10  The overview of LaST model


International Journal of Machine Learning and Cybernetics

Fig. 11   The overview of FEDformer model

Fig. 10  The overview of LaST model

Fig. 11  The overview of FEDformer model

Converting the series to the frequency domain also facilitates the identification of major frequency components and the shape of the spectrum. This is particularly valuable for capturing the overall spectral characteristics of signals and the primary fluctuation patterns across frequencies. In their work, [30] employ StemGNN to jointly capture inter-sequence correlations and temporal dependencies in the spectral domain for multivariate time series forecasting. In recent work, Yi et al. [266] proposed a simple yet effective time series forecasting architecture, named FreTS, based on Frequency-Domain MLP. It primarily consists of two stages, domain conversion and frequency learning, which enhance the learning of channel and temporal correlations across both inter-series and intra-series scales.

Furthermore, employing time-frequency domain conversion can help reduce the impact of noise and interference [96, 293]. In specific time series forecasting scenarios, noise may affect the data, resulting in a decline in the model's predictive performance. In the FiLM model, Zhou et al. [293] introduced a Frequency Enhancement Layer to address this issue. They achieved noise reduction by combining Fourier analysis and low-rank matrix approximation, which minimized the influence of noise signals and mitigated overfitting problems. Evidently, converting time-domain data into the frequency domain, along with operations like filtering and denoising in the frequency domain, proves effective in lessening the impact of noise.

The importance of time-frequency domain conversion lies in providing a comprehensive and detailed approach to data analysis, which is capable of unveiling the hidden frequency characteristics and dynamic changes within time series. This technique has been widely employed in the domain of TSF, representing a crucial methodology for enhancing predictive performance and comprehending the intricacies of time series data.
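A minimal sketch of this kind of frequency-domain denoising (a plain low-pass filter on the real FFT, rather than the learned frequency layers of FiLM or FreTS; the cut-off ratio is an illustrative assumption) looks as follows:

```python
import numpy as np

def fft_lowpass(x: np.ndarray, keep_ratio: float = 0.1) -> np.ndarray:
    """Denoise a 1-D series by keeping only its lowest-frequency rFFT coefficients."""
    spec = np.fft.rfft(x)
    cutoff = max(1, int(len(spec) * keep_ratio))   # number of low-frequency terms to keep
    spec[cutoff:] = 0.0                            # discard high-frequency (noisy) content
    return np.fft.irfft(spec, n=len(x))

# toy usage: recover a seasonal signal buried in noise
t = np.arange(512)
noisy = np.sin(2 * np.pi * t / 64) + 0.5 * np.random.randn(t.size)
smoothed = fft_lowpass(noisy, keep_ratio=0.05)
```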

4.2.3 Pre-training

Compared to natural language, temporal data exhibits lower information density, necessitating longer sequences to capture temporal patterns. Additionally, temporal data presents challenges such as temporal dynamics, rapid evolution, and the presence of both long- and short-term effects. Due to potential mismatches between pre-training and target domains, downstream performance might suffer. Recent endeavors in TSF involve novel attempts at self-supervised and unsupervised pre-training, yielding promising results [44, 198, 207, 230]. In certain scenarios, the adoption of sampling pre-training methods could be considered (Fig. 12).

Fig. 12  The overview of TF-C

Contrastive pre-training. Due to potential mismatches between pre-training and the target domain, there is a unique challenge in time series pre-training that may lead to diminished downstream performance. While domain adaptation methods can alleviate these changes [20, 222], most approaches are considered suboptimal for pre-training as they often require direct examples from the target domain. To address this, these methods need to adapt to the diverse temporal dynamics of the target domain without relying on any target examples during pre-training.

Contrastive learning, a form of self-supervised learning, aims to train an input encoder to map positive sample pairs closer and negative pairs apart [184]. In time series, if the representations based on time and frequency for the same instance are close in the time-frequency space, it suggests a certain similarity or consistency in their features or attributes. Zhang et al. [282] proposed the need for Time-Frequency Consistency (TF-C) in pre-training, which involves embedding the time-based neighborhood of an example close to its frequency-based neighborhood. This work employs frequency-based contrastive enhancement to leverage rich spectral information and explore time-frequency consistency in time series. Contrastive pre-training can provide robust feature representations for forecasting tasks, contributing to enhanced model performance and generalization (Fig. 13).

Fig. 13  The overview of STEP
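A minimal sketch of the idea behind such objectives (an NT-Xent-style contrastive loss over paired time-domain and frequency-domain embeddings; the actual TF-C formulation differs in its details, and the encoder outputs below are stand-ins):

```python
import torch
import torch.nn.functional as F

def contrastive_tf_loss(z_time: torch.Tensor, z_freq: torch.Tensor, tau: float = 0.1):
    """Pull time- and frequency-view embeddings of the same instance together
    and push other instances in the batch apart.

    z_time, z_freq: (batch, dim) embeddings from a time-domain and a
    frequency-domain encoder for the same batch of series.
    """
    z_t = F.normalize(z_time, dim=-1)
    z_f = F.normalize(z_freq, dim=-1)
    logits = z_t @ z_f.t() / tau                 # cosine similarities scaled by temperature
    targets = torch.arange(z_t.size(0))          # positive pairs lie on the diagonal
    return F.cross_entropy(logits, targets)

# toy usage with random "encoder outputs"
zt, zf = torch.randn(32, 128), torch.randn(32, 128)
loss = contrastive_tf_loss(zt, zf)
```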

Masking pre-training. Time series data is typically continuous and ordered, but in practice it often exhibits incompleteness. Additionally, real-world time series data commonly contains noise and uncertainty, necessitating models to possess robustness in dealing with such uncertainties. To address these crucial challenges in practice, the masking mechanism is regarded in some studies as an effective approach to enhance feature extraction.

In the work STEP, Shao et al. [215] designed an unsupervised pre-training model for time series based on Transformer blocks. The model employs a masked autoencoding strategy for training, which effectively learns temporal patterns and generates segment-level representations. These representations provide contextual information for subsequent inputs, facilitating the modeling of dependencies between short-term time series. The Ti-MAE model [147] exhibits analogous efficacy in this regard. In the pre-training model SimMTM, Dong et al. [59] highlighted that randomly masking parts of the data severely disrupts temporal variations. They relate masking modeling to manifold learning and propose a Simple pre-training framework for Masked Time-series Modeling.

In summary, masking pre-training simulates incompleteness and noise by masking some data points, enabling the model to learn how to handle partially missing information during the pre-training phase. This methodology can enhance the model's ability to capture long-term dependencies, increase tolerance to data uncertainty, and improve overall generalization performance.
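A minimal sketch of masked-reconstruction pre-training (random point-wise masking with a mean-squared reconstruction loss; STEP and SimMTM build considerably more elaborate patch- and manifold-based variants on this idea, and the linear encoder/decoder below are placeholders):

```python
import torch
import torch.nn as nn

def masked_pretrain_step(encoder: nn.Module, decoder: nn.Module,
                         x: torch.Tensor, mask_ratio: float = 0.25):
    """One masked-autoencoding step on a batch of series x of shape (B, L, C)."""
    mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio   # (B, L) masked positions
    x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)              # zero out masked points
    recon = decoder(encoder(x_masked))                             # reconstruct the full series
    loss = ((recon - x) ** 2)[mask].mean()                         # score only masked points
    return loss

# toy usage with linear stand-ins for the encoder and decoder
B, L, C = 16, 96, 7
enc, dec = nn.Linear(C, 64), nn.Linear(64, C)
loss = masked_pretrain_step(enc, dec, torch.randn(B, L, C))
```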

4.2.4 Patch-based segmentation

In recent DTSF works, especially those of the Transformer models, the adoption of patch-based data organization has become prevalent [65, 93, 151, 180, 261, 285]. It is advantageous to enhance the model's local perception capabilities by employing a patch-based strategy. Through segmenting long time series into smaller patches, the model becomes more adept at capturing short-term and local patterns within the sequence, thereby augmenting its comprehension of complex dynamics in the sequence. Simultaneously, the relationships among multivariate variables can yield information gain. Challenges lie primarily in how to learn the relationships among individual variables and introduce valid information into the model, while avoiding redundant information that may interfere with the model training process (Fig. 14).

Fig. 14  The overview of PatchTST

Nie et al. [180] proposed the PatchTST model, where they segment time series into subseries-level patches, serving as input tokens for the Transformer. They independently model each channel to represent a single variable. This channel-independent approach not only effectively preserves local semantic information for each variable in the embedding but also focuses on a more extended history. Furthermore, leveraging the channel-independent characteristics, potential feature correlations between single variables can be further learned through graph modeling methods [287]. It allows for spatial aggregation of representations for global tokens in the graph.

While the modeling emphasis varies across different works, there is a common consideration of employing methods that utilize subseries-level patches to process the raw time series data. This approach proves highly beneficial for capturing and learning the local features of the data. The patch-based segmentation method introduces another methodology for TSF. Additionally, channel independence emerges as a viable avenue for exploring multivariate time series forecasting.
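A minimal sketch of the patching operation itself (splitting each channel into overlapping subseries-level patches that serve as input tokens, in the spirit of PatchTST; the Transformer backbone, embeddings, and the specific patch length/stride are omitted or assumed):

```python
import torch

def make_patches(x: torch.Tensor, patch_len: int = 16, stride: int = 8) -> torch.Tensor:
    """Turn a (batch, length, channels) series into (batch * channels, n_patches, patch_len).

    Each channel is treated independently (channel independence), and every
    patch becomes one token for a downstream sequence model.
    """
    b, l, c = x.shape
    flat = x.permute(0, 2, 1).reshape(b * c, l)                   # one row per (series, channel)
    patches = flat.unfold(dimension=1, size=patch_len, step=stride)
    return patches                                                # (B*C, n_patches, patch_len)

tokens = make_patches(torch.randn(4, 96, 7))                      # -> shape (28, 11, 16)
```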

5 Challenges and prospects

We have investigated the neural network architectures, feature extraction and learning approaches, and significant experimental datasets of deep learning models in the context of TSF. While DTSF models have demonstrated remarkable achievements across diverse domains in recent years, certain challenging issues remain to be addressed, which point towards potential future research directions. We summarize these challenges and propose viable avenues as follows. We classify the challenges into three main categories: data features, model structure, and task-related issues. Within each category, we highlight several representative challenges. Figure 15 illustrates an overview of these challenges.

5.1 Challenges

5.1.1 Lack of data privacy protection and completeness

Federated learning (FL) is gaining momentum in the field of TSF, primarily addressing challenges associated with large local data volumes and privacy concerns during information exchange. With FL, multiple participants can collaboratively train models without the need to share sensitive raw data [171]. In TSF tasks, each participant can leverage their local time series data for model training. Through FL algorithms, the parameters of local models are aggregated to obtain a global predictive model. This distributed learning process ensures privacy protection, mitigating the risks of privacy breaches associated with centralized data storage and transmission. Current research efforts predominantly focus on load detection [25, 83, 232], traffic speed and flow [158, 279], energy consumption [208, 283], and communication networks [58, 227], among others. Exploring feasible solutions in other domains remains an open avenue. Furthermore, federated learning harnesses the diversity of distributed data sources, thereby enhancing model generalization and prediction accuracy. Hence, federated learning holds great promise in the realm of TSF, offering a prospective solution for large-scale, secure, and efficient time series prediction and analysis.
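As a minimal sketch of the aggregation step in such a setting (plain FedAvg-style weighted parameter averaging; production FL systems add client sampling, secure aggregation, and communication compression, and the client sizes below are illustrative):

```python
from typing import Dict, List
import torch

def fedavg(client_states: List[Dict[str, torch.Tensor]],
           client_sizes: List[int]) -> Dict[str, torch.Tensor]:
    """Aggregate local state_dicts into a global model, weighting each client
    by the number of local training samples it holds."""
    total = float(sum(client_sizes))
    global_state = {}
    for name in client_states[0]:
        global_state[name] = sum(
            state[name] * (n / total) for state, n in zip(client_states, client_sizes)
        )
    return global_state

# toy usage: two clients sharing the same model architecture
m1, m2 = torch.nn.Linear(8, 1), torch.nn.Linear(8, 1)
global_state = fedavg([m1.state_dict(), m2.state_dict()], client_sizes=[1200, 800])
```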

5.1.2 Lack of interpretability

So far, the majority of efforts in the field of TSF have primarily focused on enhancing predictive performance through the design of intricate model architectures. However, research into the interpretability of these models has been relatively limited. As neural networks find application in critical tasks [176], the demand for comprehending why and how models make specific predictions has been growing. The N-BEATS model achieves high accuracy and interpretability in TSF by designing an interpretable architecture and output mechanisms [185]. This enables users to better comprehend the model's predictive outcomes while maintaining high forecasting precision.

Post-hoc interpretable models are developed for the purpose of elucidating already trained networks, aiding in the identification of crucial features or instances without modifying the original model weights. These approaches mainly fall into two categories. One involves the application of simpler interpretable surrogate models between the inputs and outputs of the neural network, relying on these approximate models to provide explanations [161, 199]. The other category encompasses gradient-based methods, such as those presented in [124, 220, 221], which scrutinize the network gradients to determine which input features exert the most significant influence on the loss function.

Furthermore, it is noteworthy that, in contrast to the black-box nature of traditional neural networks, a series of TSF models based on the Transformer architecture incorporate attention layers with inherent interpretability. These attention layers can be strategically integrated into other models, with the analysis of attention weights aiding in the comprehension of the relative importance of features at each time step [16, 47, 141]. By scrutinizing the distribution of attention vectors across time intervals, the model can gain better insights into persistent patterns or relationships within the time series [150], such as seasonal patterns.

Recent advancements in the field have focused on learning from perturbations and interpretable sparse system identification methods to enhance the interpretability of time series data [8, 69]. Among these, sparse optimization methods, which obviate the need for time-consuming backpropagation training, exhibit efficient training capabilities on CPUs. These methods offer insights for further exploration into interpretable time series forecasting.

5.1.3 Lack of temporal continuity

Compared to traditional deep learning forecasting models, the proposal of the Neural Ordinary Differential Equation (NODE) [39] has directed our attention towards the derivatives of neural network parameterized hidden states, which showcases superior performance over RNNs in both continuous and discrete time series problems. Recent studies applying Ordinary Differential Equations (ODE) or Partial Differential Equations (PDE) to TSF have explored various directions such as learning latent relationships between variables or events [53, 85, 138], handling irregular data [210], achieving interpretable continuity [84, 117], optimizing model parameters [42], and exploring differential dynamics [90, 153]. The ETN-ODE model proposed by Gao et al. [84] is the first interpretable continuous neural network for multi-step time series forecasting of multiple variables at arbitrary time instances. Additionally, their EgPDE-Net model [85] is also the first to establish the continuous-time representation of multivariate time series as a partial differential equation problem. Its specially designed architecture utilizes ODE solvers to transform the partial differential equation problem into an ODE problem, facilitating predictions at arbitrary time steps.

Temporal continuation is one of the crucial factors to consider in the TSF process. The application of the Neural Differential Equation (NDE) paradigm in DTSF integrates DL with differential equation modeling to naturally and accurately capture the dynamic evolution of time series. It interprets the evolution of individual components more clearly and flexibly captures instantaneous changes by using a differential equation to describe the rate of change of the data at each time point. For deep learning modelling of complicated time series data, the NDE technique offers an innovative and effective paradigm.

5.1.4 Challenges of parallel computing

In the era of massive data, there is an urgent demand for online real-time analysis of time series data. Currently, time series models are constructed based on stand-alone sequence analysis, which often requires the use of high-performance GPU servers to improve computational efficiency. However, on one hand, it is constrained by computational resources and data scale, making real-time online forecasting unattainable. On the other hand, GPU servers are costly. Therefore, research on efficient parallel computing based on deep learning and big data analytics technologies is poised to become a critical challenge.

5.1.5 Challenges of large models

Large models demonstrate advantages in the field of time series forecasting, excelling in capturing long-term dependencies, handling high-dimensional data, and mitigating noise. A noteworthy exploration in this direction occurred on December 13, 2023, when Amazon released work utilizing large models for time series forecasting, marking a pioneering effort in applying large models to temporal prediction [259]. This work leverages large models to construct intricate relationships between sequences while harnessing their robust text data processing capabilities. The integration of large models has enhanced the handling of multimodal data and interpretability in financial forecasting scenarios. Large models have already ventured into various domains, encompassing stock price predictions in financial markets [33, 118, 296], inference of medical data [95, 228], forecasting human mobility trajectories [31], and serving as general-purpose models for weather and energy demand predictions [137, 157, 256, 273, 278].

On another note, significant strides have been made in the training of foundational time series models [88, 260]. The recent TimeGPT-1 model [197] applies the techniques and architecture underlying large language models (LLMs) to the forecasting domain, successfully establishing the first foundational time series model capable of zero-shot inference. This breakthrough opens avenues for creating foundational models specifically tailored for time series forecasting.

We believe that the performance and value of large models in the realm of time series forecasting will continue to unfold as technological advancements and innovations progress.

5.2 Prospects

5.2.1 Potential representation learning

Representation Learning (RL) has recently emerged as one of the hot topics in time series forecasting. While models based on stacked layers can yield respectable results, they often come with high computational costs and may struggle to capture the inherent features of the data. RL, on the other hand, focuses on acquiring meaningful latent features that result in lower-dimensional and compact data representations, capturing the fundamental characteristics of the data. Presently, many self-supervised or unsupervised approaches aim to encode raw sequences to learn these latent representation features [51, 67]. Some works employ multi-module architectures or model ensembles [166, 172, 264], while others use pre-training with denoising, smoothing properties, siamese structures or 2D-variation modeling [252, 277, 291], which provide novel solutions to various domain-specific problems. Besides, contrastive learning is dedicated to enabling models to compare observations at different time points and learn rich data representations by contrasting positive and negative samples. Some works [162, 187, 274, 282] have utilized contrastive learning to assist models in learning meaningful features from unlabeled data, thus enhancing their generalization performance. This is especially valuable when labeled data is limited or unavailable.

Learning temporal representations and employing contrastive training can significantly enhance the model's representation and generalization capabilities in TSF. This greatly improves the model's performance in handling complex, noisy, or changing data distributions.

5.2.2 Counterfactual forecast and causal inference

Counterfactual forecasting and causal inference represent promising avenues for future research in DTSF. Despite the existence of many deep learning methods for estimating causal effects in static settings [3, 104, 268], the primary challenge in time series data lies in the presence of time-dependent confounding effects. This challenge arises due to the time-dependence, where actions that influence the target are also conditioned on observations of the target. Recent research advancements encompass the utilization of statistical techniques, novel loss functions, extensions of existing methods, and appropriate inference algorithms [22, 86, 139, 149, 154].

Moreover, while some efforts provide counterfactual explanations for time series models [57, 178], they fall short of generating realistic or feasible counterfactual explanations for time series models. Recent work has introduced a self-interpretable model capable of generating actionable counterfactual explanations for time series forecasting [263].

Future research directions may revolve around further refining these approaches to address the additional complexities inherent in time series data and obtain more accurate counterfactual interpretations. Additionally, innovative methods should be sought to harness the full potential of deep learning in counterfactual forecasting and causal inference, ultimately enhancing decision-making processes across various domains.

5.2.3 TS diffusion

The burgeoning development of Diffusion models in the domain of image and video streams has sparked novel theories and models, gradually extending into the realm of TSF. Notably, TimeGrad employs RNN-guided denoising for autoregressive predictions [196], while CSDI utilizes non-autoregressive methods with self-supervised masking [235]. Similarly, SSSD utilizes structured state-space models to reduce computational complexity [4]. Despite being early explorations in the TSF domain, these models still suffer from slow inference, high complexity, and boundary inconsistencies.

In recent research, the unconditionally trained TSDiff model employs self-guidance mechanisms to alleviate the computational overhead in reverse diffusion for downstream task forecasting without auxiliary networks [126]. TimeDiff addresses boundary inconsistencies with future mixups and autoregressive initialization mechanisms [216]. The multi-scale diffusion model MR-Diff leverages multi-resolution temporal structures for sequential trend extraction and non-autoregressive denoising [9].

The first framework based on DDPM, Diffusion-TS, accurately reconstructs samples using Fourier-based loss functions, extending to forecasting tasks [7]. Furthermore, the TMDM model combines conditional diffusion generation processes with Transformer to achieve precise distribution prediction for multivariate time series [11].

The work on Diffusion primarily focuses on denoising, and numerous groundbreaking initiatives are emerging in the realm of DTSF. We anticipate Diffusion to become a prominent direction.

5.2.4 Determining the weights of the aggregate model

At present, ensemble learning, as one of the mainstream paradigms, has proven to be effective and robust [13, 169, 236]. However, determining the weights of base models in an ensemble remains an unsolved challenge. Sub-optimal weighting can hinder the full potential of the final model. To address this challenge, Fu et al. [79] proposed a model combination framework based on reinforcement learning (RLMC). It uses deterministic policies to output dynamic model weights for non-stationary time series data and leverages deep learning to extract hidden features from raw time series data, allowing rapid adaptation to evolving data distributions. Notably, in RLMC, the use of DDPG, an off-policy actor-critic algorithm [148], can produce continuous actions suitable for model combination problems and is trained with recorded data to achieve improved sample efficiency. Therefore, the combination of reinforcement learning with some continuous control algorithms [80, 100] presents a unique utility in determining ensemble model weights and is a path worth exploring.

5.2.5 Interdisciplinary exploration

Due to the multidimensional nature of the relationships between causes and effects in reality, there exist complex interconnections among time series. While deep learning models have demonstrated excellent performance in tackling intricate TSF problems, they often lack systematic interpretability and clear hierarchical structures. In the realm of network science, when dealing with extensive data, numerous variables, and intricate interconnections, it is possible to construct multi-layered networks by categorizing and stratifying the relationships among various elements. By examining the dynamic changes in multi-layered networks, it becomes feasible to forecast multidimensional data by analyzing high-dimensional correlations.

For diverse domains, an interdisciplinary approach, such as incorporating network science or other relevant theories, can be a beneficial choice in the future of DTSF research. This approach enables a more insightful analysis of problems and their multidimensional aspects.

6 Conclusion

In this paper, we present a systematic survey of deep learning-based time series forecasting. We commence with the fundamental definition of time series and forecasting tasks and summarize the statistical methods and their shortcomings. Moving on, we delve into neural network architectures for time series forecasting, summarizing five major model paradigms that have gained prominence in recent years: the Encoder-Decoder, Transformer, Generative Adversarial, Integration, and Cascade. Furthermore, we conduct an in-depth analysis of time series composition, elucidating the primary approaches to enhance feature extraction and learning from time series data. Additionally, we survey time series forecasting datasets across major domains, encompassing energy, healthcare, traffic, meteorology, and economics. Finally, we comprehensively outline the current challenges in the field and propose some potential research directions.

Table 3  Time series datasets in primary domains

| Domain | Datasets | Variants | Data time range | Data granularity | Multi/uni | Authors |
|---|---|---|---|---|---|---|
| Energy | ETTh1 | 7 | 2016–2018 | 1 h | Multi + uni | Zhou et al. |
| Energy | ETTm1 | 7 | 2016–2018 | 15 m | Multi + uni | Zhou et al. |
| Energy | Electricity | 321 | 2011–2014 | 1 h | Multi + uni | – |
| Energy | Wind | 28 | 1986–2015 | 1 h | Uni | – |
| Energy | Solar-energy | 137 | 2006–2006 | 10 m | Multi + uni | Solar |
| Healthcare | ILI | 7 | 2002–2021 | 1 w | Uni | – |
| Healthcare | MIT-BIH | 2 | 1975–1979 | 360 Hz | Uni | George |
| Transportation | Traffic | 862 | 2015–2016 | 1 h | Uni | Caltrans |
| Transportation | PeMSD4 / PeMSD7 / PeMSD8 | 307 / 228 / 170 | 2018/1 / 2012/5 / 2016/7 | 5 m | Multi | Chen et al. |
| Meteorology | Weather1 | 12 | 1981–2010 | 1 h | Uni | – |
| Meteorology | Weather2 | 21 | 2020–2021 | 10 m | Multi + uni | Sparks et al. |
| Meteorology | Temperature rain | 2 | 2015–2017 | 1 d | Multi + uni | Rakshitha et al. |
| Economics | Exchange-rate | 8 | 1990–2016 | 1 d | Uni | Lai et al. |
| Economics | LOB-ITCH | 149 | 2010–2010 | 1 ms–10 min | Uni | Adamantios et al. |
| Economics | Dominick | 25 | 1989–1994 | 1 w | Uni | Godahewa et al. |

The table summarizes commonly used datasets and indicates whether they are multivariate, which implies temporal alignment with known timestamps.

challenges in the field and propose some potential research scales, such as sensor data, text, and images, complicating
directions. model construction. To address these issues, several tech-
niques have been proposed.
Multimodal learning, through shared representation learn-
Datasets in different domain ing, integrates diverse data types, improving model handling
of heterogeneous data [99]. Time alignment techniques, such
Time series, which exists in every aspect of our lives, carries as the TAM model, synchronize data from different time
the historical data of various fields in the time dimension. granularities by introducing a novel time-distance measure
Many datasets have been accumulated during the develop- [77]. Deep generative models, like GinAR, address missing
ment of the TSF task. These datasets are often cited in top values and noise by generating new samples and rebuilding
conferences and journals within the computer domain, fur- spatiotemporal dependencies [272]. Self-supervised learning
nishing researchers with high-quality research data charac- methods, such as SimCLR, allow models to learn from unla-
terized by rich samples and features, thus holding significant beled data, improving adaptability to heterogeneous sources
reference value. However, the diversity of these datasets [40]. Finally, collaborative attention mechanisms capture
introduces a significant challenge-data heterogeneity. The complex correlations between multimodal data and adjust
datasets described below cover five key TSF application modality weights dynamically, enhancing model learning
areas: energy, transportation, economics, meteorology, and capacity [61]. These models and techniques effectively inte-
healthcare [89], as shown in Table 3. These fields feature grate heterogeneous data, improving the stability and accu-
data with varying structures, formats, time granularities, and racy of time series forecasting in multi-source environments.
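As a small illustration of the granularity problem mentioned above (aggregating a 15-minute series to hourly resolution so it can be aligned with coarser sources; this is a generic resampling step, not the TAM alignment method itself):

```python
import numpy as np

def to_hourly(values_15min: np.ndarray) -> np.ndarray:
    """Aggregate a (T, C) array sampled every 15 minutes into hourly means."""
    t, c = values_15min.shape
    t_trim = (t // 4) * 4                       # drop a trailing partial hour, if any
    return values_15min[:t_trim].reshape(-1, 4, c).mean(axis=1)

# toy usage: two days of 15-minute readings for 3 variables -> 48 hourly rows
quarter_hourly = np.random.rand(192, 3)
hourly = to_hourly(quarter_hourly)              # shape (48, 3)
```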

Fig. 15  Challenges in time series forecasting



Fig. 16  Time series datasets in primary domains

Energy

TSF is currently being extensively applied in a prominent domain, namely, energy management. Accurate forecasting within this domain plays a crucial role in facilitating status assessment and trend analysis, which in turn enables the implementation of intelligent strategies in engineering planning. Fortunately, modern energy systems autonomously gather extensive datasets encompassing diverse energy sources such as electricity [223], wind energy [75], and solar energy [194]. These data resources are leveraged for the identification of patterns and trends in energy demand and supply, providing valuable insights for the development of advanced forecasting models (see Fig. 16).

Electricity transformer temperature (ETT)

The ETT-small dataset encompasses data originating from two distinct power transformer installations, each situated at a separate site [292]. This dataset comprises a variety of parameters, such as load profiles and oil temperature readings. It serves the purpose of predicting the oil temperature of power transformers and investigating their resilience under extreme load conditions. The temporal scope of this dataset spans from July 2016 to July 2018, with data recorded at 15-minute intervals. These datasets originate from two geographically disparate regions within the same province in China, designated as ETT-small-m1 and ETT-small-m2, respectively. Each of these datasets consists of an extensive 70,080 data points, calculated based on a duration of 2 years, 365 days per year, 24 h per day, and data sampling at 15-minute intervals. Furthermore, the dataset offers an alternate version with hourly granularity, denoted as ETT-small-h1 and ETT-small-h2. Each data point within the ETT dataset is characterized by an 8-dimensional feature vector, which includes the timestamp of the data point, the target variable 'oil temperature', and six distinct types of external load values.
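As a small illustration of how such datasets are typically framed for supervised forecasting (a sliding window over the multivariate series; the window sizes and the synthetic data below are placeholders, not the ETT benchmark protocol):

```python
import numpy as np

def sliding_windows(values: np.ndarray, lookback: int = 96, horizon: int = 24):
    """Slice a (T, C) array into (inputs, targets) pairs for forecasting.

    inputs:  (N, lookback, C) past observations
    targets: (N, horizon, C) future observations to predict
    """
    xs, ys = [], []
    for start in range(values.shape[0] - lookback - horizon + 1):
        xs.append(values[start:start + lookback])
        ys.append(values[start + lookback:start + lookback + horizon])
    return np.stack(xs), np.stack(ys)

# toy usage on a synthetic 7-variable series standing in for an ETT-style table
data = np.random.randn(1000, 7)
x_train, y_train = sliding_windows(data, lookback=96, horizon=24)
```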

Electricity

The initial dataset utilized in this investigation is the Electricity Load Diagrams 2011-2014 Dataset [241], which records 370 customers' electricity usage information between 2011 and 2014. Data is recorded in the original dataset every 15 min. It was necessary to preprocess the dataset by deleting the 2011 data and aggregating it into hourly consumption in order to address the problem of some dimensions having a value of 0. As a result, the final dataset includes information on 321 customers' electrical use from 2012 to 2014.

Wind (European wind generation)

For 28 European countries between 1986 and 2015, this dataset^1 offers hourly estimates of energy potential expressed as a percentage of the maximum output from power plants. It is distinguished from other datasets by having sparser data and a notable frequency of zeros at regular intervals.

Solar-energy

The solar power production of 137 photovoltaic plants in Alabama State in 2006, recorded at 10-minute intervals, constitutes the dataset for our evaluation of short-sequence forecasting capabilities.^2

Healthcare

TSF plays a pivotal role in the healthcare domain, serving as a critical tool for predicting disease onset and progression, evaluating the efficacy of pharmaceutical interventions, and monitoring fluctuations in patients' vital signs. These forecasts empower healthcare practitioners in enhancing disease diagnosis, devising treatment strategies, overseeing patient well-being, and implementing preventive measures for disease surveillance and containment.

ILI (influenza-like illness)

The dataset comprises weekly reports from the US Centers for Disease Control and Prevention from 2002 to 2021. It contains data on the overall number of patients as well as the percentage of patients with influenza-like symptoms.

EEG (Electroencephalogram)

The collection includes EEG^3 recordings of participants obtained both prior to and during the performance of mental math exercises. Every recording is made up of 60-second EEG segments free of artifacts. For every subject in the dataset, there are 36 CSV files in total, and each file has 19 data channels.

MIT-BIH (arrhythmia database)

There are 48 half-hour segments of two-channel ambulatory ECG recordings available in the MIT-BIH Arrhythmia Database.^4 These recordings were taken from 47 individuals that the BIH Arrhythmia Laboratory examined from 1975 to 1979. Every recording was digitized with a resolution of 11 bits over a range of 10 mV, at a rate of 360 samples per second per channel. Electrocardiogram data from this dataset can be used for anticipating arrhythmias, among other uses.

Transportation

Accurate and timely TSF of traffic is vital for urban traffic control and management. It aids in predicting traffic congestion, traffic flow, accident rates, and the utilization of public transportation. These predictions can be used by transportation authorities and companies to plan and manage transportation systems more effectively, thereby improving traffic efficiency and safety.

Traffic

This dataset^5 includes hourly data from 2015-2016 that was collected over a 48-month period from the California Department of Transportation. The data show the hourly road occupancy rate, which ranges from 0 to 1. The San Francisco Bay Area's roadways are home to 862 different sensors from which the measurements are obtained.

PeMSD4/7/8

These datasets are highly regarded as industry standards for traffic forecasting [35].

PeMSD4 is one of them, and it includes traffic speed data from the San Francisco Bay Area. It incorporates data from 307 sensors on 29 roads. The January-February 2018 time frame is covered by the dataset.

PeMSD7 includes traffic information from California's District 7. It covers the workday period from May to June 2012 and includes traffic speeds recorded by 228 sensors. Data are collected at five-minute intervals.

PeMSD8 contains San Bernardino traffic statistics taken during July and August of 2016. It includes data from 170 detectors positioned along 8 distinct routes. Data are collected at five-minute intervals.

^1 https://www.kaggle.com/datasets/sohier/30-years-of-european-wind-generation
^2 https://www.nrel.gov/grid/solar-power-data.html
^3 https://github.com/meagmohit/EEG-Datasets
^4 http://ecg.mit.edu/
^5 https://pems.dot.ca.gov/

Meteorology

TSF has become an indispensable task in the field of meteorology, with wide-ranging applications in weather forecasting, such as meteorological disaster warnings, agricultural production, and more.

Weather1

The dataset Weather1 encompasses climate data from almost 1600 locations in the United States,^6 spanning a 4-year period from 2010 to 2013. Hourly data points were collected, featuring the target value "wet bulb" and 11 climate-related features.

Weather2

Weather2 comprises a meteorological time series featuring 21 weather indicators,^7 collected every 10 min in 2020 by the Max Planck Institute for Biogeochemistry's weather station.

Temperature rain

Consisting of 32,072 daily time series, this dataset [91] presents temperature observations and rain forecasts collected by the Australian Bureau of Meteorology. The data spans 422 weather stations across Australia, covering the period from 02/05/2015 to 26/04/2017.

Economics

In the field of finance, one of the most extensively studied areas in TSF is the prediction of financial time series, particularly asset prices. Typically, there are several subtopics in this field, including stock price prediction, index prediction, foreign exchange price prediction, commodity (such as oil, gold, etc.) price prediction, bond price prediction, volatility prediction, and cryptocurrency price prediction. The following section will introduce commonly used datasets in this domain.

Exchange-rate

This dataset [132] compiles daily exchange rates, mainly on trading days, for eight countries (Australia, Canada, China, Japan, New Zealand, Singapore, Switzerland, and the United Kingdom) spanning the years 1990 to 2016.

LOB-ITCH

Due to the lack of adequate records, few other fields offer millisecond-level data spanning whole days the way finance does. In the financial field, with the advent of automated trading, limit order books were born, which are very conducive to high-frequency traders' operations and leave a large amount of detailed data. The LOB-ITCH dataset comprises around four million events, each with a 144-dimensional representation, pertaining to five stocks over ten consecutive trading days [181], from June 1, 2010 to June 14, 2010. What makes this data different from other data of the same kind is the centralized trading market in the Nordic region. Some researchers found that "the differences between different trading platforms' matching rules and transaction costs complicate comparisons between different limit order books for the same asset [182]". Consequently, markets with decentralized exchanges, such as the United States, involve more influencing factors and are more difficult to model. In contrast, the Helsinki Exchange is a pure limit order market, which can provide purer data.

Dominick

This dataset [92] incorporates data from randomized experiments conducted by the University of Chicago Booth School of Business and the now-defunct Dominick's Finer Foods. The experiments spanned from 1989 to 1994, covering over 25 different categories across all 100 stores in the chain. As a result of this research collaboration, approximately nine years of store-level data on the sales of more than 3,500 UPCs are available through this resource.

Further data sources

In addition to the commonly used datasets mentioned above, we extensively surveyed data sources from various domains and compiled a subset of additional datasets. These datasets are derived from influential works and serve as the foundation for researching niche topics and detailed investigations in respective fields. We provide appropriate descriptions of the datasets listed in Table 4.

Several comprehensive datasets from large-scale competitions are also noteworthy, such as M3/M4/M5. These datasets were put forward by the Makridakis Competitions, which are a series of open competitions to evaluate and compare the accuracy of different TSF methods.

^6 https://www.ncei.noaa.gov/data/local-climatological-data/
^7 https://www.bgc-jena.mpg.de/wetter/

Table 4  Summary of the datasets used in the experiments

| Domain | Variants | Dataset | Data time range | Data granularity | References |
|---|---|---|---|---|---|
| Energy | 21 | The Scada wind farm in Turkey | 2018/1/1-2018/12/29 | 10 m | [152] |
| Energy | – | Global horizontal solar radiation data | 1998/1/1-2007/12/1 | 1 h | [226] |
| Energy | – | Rooftop PV plant | 2015/1/1-2016/12/31 | 30 m | [239] |
| Energy | 9 | UCI household electric power consumption | 2006/12-2010/11 | 1 m | [26] |
| Energy | – | Spanish electricity demand | 2014/01/02-2019/11/01 | 10 m | [133] |
| Energy | – | Electric vehicles power consumption | 2015/3/2-2016/5/31 | 1 h | [133] |
| Healthcare | – | CDC ILI data | 2010-2018 | 1 d | [253] |
| Healthcare | 45 | DEAP | – | 1 interval | [123] |
| Healthcare | 9 | Turkish COVID-19 data | 2020/3/27-2020/6/11 | 1 d | [122] |
| Healthcare | 9 | COVID-19 dataset of Orissa state | 2020/1/30-2020/6/11 | 1 d | [52] |
| Transportation | 207 | METR-LA | 2012/3/1-2012/6/30 | 5 m | [27] |
| Transportation | 325 | PeMS-BAY | 2017/1/1-2017/5/31 | 5 m | [27] |
| Transportation | – | BJER4 | 2014/7/1-2014/8/31 | 5 m | [271] |
| Meteorology | 6 | Daily data of Shenzhen | from 2015 | – | [36] |
| Meteorology | – | CHIRPS | 1981-2015 | – | [82] |
| Meteorology | – | WeatherBench | – | – | [195] |
| Economics | 5 | S&P500 | 1997/1/1-2016/12/1 | 1 d | [136] |
| Economics | 13 | NSE stocks data | 1996/1/1-2015/6/30 | 1 d | [110] |
| Economics | 6 | NYSE stock data | 2011/1/3-2016/12/30 | 1 m | [110] |

M3 M6

This dataset8 comprises yearly, quarterly, monthly, daily, The dataset 10 comprises two categories of assets: one
and other time series. To ensure the development of accu- selected from the Standard & Poor’s 500 Index, consisting
rate forecasting models, minimum observation thresholds of 50 stocks, and the other comprising 50 Exchange-Traded
were established: 14 for yearly series, 16 for quarterly series, Funds (ETFs) from various international exchanges. The
48 for monthly series, and 60 for other series. Time series focus of the M6 competition lies in forecasting the returns
within the domains of micro, industry, macro, finance, and risks associated with these stocks, along with investment
demographic, and others were included. decisions made based on the aforementioned predictions.

M4
Funding Open Access funding enabled and organized by CAUL
and its Member Institutions. This work was supported in part by the
The M4 dataset [168] encompasses 100,000 real-life series National Natural Science Foundation of China under Grant 62476247,
in diverse domains, including micro, industry, macro, 62073295 and 62072409, in part by the "Pioneer" and "Leading Goose"
finance, demographic, and others. R&D Program of Zhejiang under Grant 2024C01214, and in part by
the Zhejiang Provincial Natural Science Foundation under Grant
LR21F020003.
M5
Data Availability Data sharing is not applicable to this article as no new
Covering stores in three US States (California, Texas, and data were created or analyzed in this study.
Wisconsin), this dataset9 includes item-level, department,
product categories, and store details. It incorporates explana- Declarations
tory variables such as price, promotions, day of the week, Conflict of interest The authors declare that they have no known com-
and special events. Alongside time series data, it incorpo- peting financial interests or personal relationships that could have ap-
rates additional explanatory variables (e.g., Super Bowl, peared to influence the work reported in this paper.
Valentine’s Day, and Orthodox Easter) influencing sales, Open Access This article is licensed under a Creative Commons Attri-
enhancing forecasting accuracy. bution 4.0 International License, which permits use, sharing, adapta-
tion, distribution and reproduction in any medium or format, as long
8
https://forecasters.org/resources/time-series-data/
9 10
https://mofc.unic.ac.cy/m5-competition/ https://mofc.unic.ac.cy/
International Journal of Machine Learning and Cybernetics

as you give appropriate credit to the original author(s) and the source, 15. Bai Shaojie, Kolter J Zico, Koltun Vladlen (2018) An empirical
provide a link to the Creative Commons licence, and indicate if changes evaluation of generic convolutional and recurrent networks for
were made. The images or other third party material in this article are sequence modeling. arXiv preprint[SPACE]arXiv:​1803.​01271.
included in the article's Creative Commons licence, unless indicated Accessed 2 Feb 2025
otherwise in a credit line to the material. If material is not included in 16. Bai Tian, Zhang Shanshan, Egleston Brian L, Vucetic Slobodan
the article's Creative Commons licence and your intended use is not (2018) Interpretable representation learning for healthcare via
permitted by statutory regulation or exceeds the permitted use, you will capturing disease progression through time. In: Proceedings of
need to obtain permission directly from the copyright holder. To view a the 24th ACM SIGKDD International Conference on Knowledge
copy of this licence, visit http://​creat​iveco​mmons.​org/​licen​ses/​by/4.​0/. Discovery & Data Mining. pp 43–51
17. Kasun Bandara, Peibei Shi, Christoph Bergmeir, Hansika Hewa-
malage, Quoc Tran (2019) Seaman Brian (2019) Sales demand
forecast in e-commerce using a long short-term memory neu-
References ral network methodology. Neural Information Processing: 26th
International Conference. ICONIP 2019, Sydney, NSW, Aus-
1. Adhikari Ratnadip, Agrawal Ramesh K (2013) An introduc- tralia, December 12–15, 2019, Proceedings, Part III 26. Springer,
tory study on time series modeling and forecasting. arXiv Cham, pp 462–474
preprint[SPACE]arXiv:​1302.​6613. Accessed 2 Feb 2025 18. Bandara Kasun, Bergmeir Christoph, Hewamalage Hansika
2. Ahmed Nesreen K, Atiya Amir F, El Gayar Neamat, El- (2020) Lstm-msnet: leveraging forecasts on sets of related
Shishiny Hisham (2010) An empirical comparison of machine time series with multiple seasonal patterns. IEEE Trans Neural
learning models for time series forecasting. Economet Rev Netw Learn Syst 32(4):1586–1599
29(5–6):594–621 19. Benidis Konstantinos, Rangapuram Syama Sundar, Flunkert
3. Alaa Ahmed M, Weisz Michael, Van Der Schaar Mihaela (2017) Valentin, Wang Yuyang, Maddix Danielle, Turkmen Caner,
Deep counterfactual networks with propensity-dropout. arXiv Gasthaus Jan, Bohlke-Schneider Michael, Salinas David, Stella
preprint[SPACE]arXiv:​1706.​05966. Accessed 2 Feb 2025 Lorenzo et al (2022) Deep learning for time series forecasting:
4. Lopez Alcaraz JM, Strodthoff N (2022) Diffusion-based time tutorial and literature survey. ACM Comput Surv 55(6):1–36
series imputation and forecasting with structured state space 20. Berthelot David, Roelofs Rebecca, Sohn Kihyuk, Carlini Nich-
models. Transactions on Machine Learning Research. arXiv:​ olas, Kurakin Alexey (2022) Adamatch: A unified approach to
2208.​09399. Accessed 3 Feb 2025 semi-supervised learning and domain adaptation. In Interna-
5. Ali Jehad, Khan Rehanullah, Ahmad Nasir, Maqsood Imran tional Conference on Learning Representations. URL https://​
(2012) Random forests and decision trees. Int J Comput Sci openr​eview.​n et/​forum?​i d=​Q 5uh1​N vv5dm. Accessed 2 Feb
Issues (IJCSI) 9(5):272 2025
6. Andersen, Torben G., Bollerslev, Tim, Christoffersen, Peter, Die- 21. Bi Hongjing, Lilei Lu, Meng Yizhen (2023) Hierarchical atten-
bold Francis X (2005) Volatility forecasting tion network for multivariate time series long-term forecasting.
7. Yuan X, Qiao Y (2024) Diffusion-TS: Interpretable diffusion Appl Intell 53(5):5060–5071
for general time series generation. In: The Twelfth International 22. Bica I, Alaa AM, Jordon J, van der Schaar M (2020) Estimating
Conference on Learning Representations. arXiv:​2403.​01742. counterfactual treatment outcomes over time through adver-
Accessed 3 Feb 2025 sarially balanced representations. In: International Confer-
8. Liu X, Chen D, Wei W, Zhu X, Yu W (2024) Interpretable sparse ence on Learning Representations (ICLR). arXiv:​2002.​04083.
system identification: Beyond recent deep learning techniques on Accessed 3 Feb 2025
time-series prediction. In: The Twelfth International Conference 23. Böse Joos-Hendrik, Flunkert Valentin, Gasthaus Jan,
on Learning Representations Januschowski Tim, Lange Dustin, Salinas David, Schelter
9. Shen L, Chen W, Kwok J (2024) Multi-resolution diffusion Sebastian, Seeger Matthias, Wang Yuyang (2017) Proba-
models for time series forecasting. In: The Twelfth International bilistic demand forecasting at scale. Proc VLDB Endow
Conference on Learning Representations. https://o​ penre​ view.n​ et/​ 10(12):1694–1705
forum?​id=​mmjnr​0G8ZY. Accessed 3 Feb 2025 24. Box George EP, Jenkins Gwilym M, Reinsel Gregory C, Ljung
10. Park J, Gwak D, Choo J, Choi E (2024) Self-supervised contras- Greta M (2015) Time series analysis: forecasting and control.
tive forecasting. In: The Twelfth International Conference on John Wiley & Sons, New York
Learning Representations. arXiv:​2402.​02023. Accessed 3 Feb 25. Briggs Christopher, Fan Zhong, Andras Peter (2022) Federated
2025 learning for short-term residential load forecasting. IEEE Open
11. Li Y, Chen W, Hu X, Chen B, Zhou M (2024) Transformer- Access J Power Energy 9:573–583
modulated diffusion models for probabilistic multivariate time 26. Seok-Jun Bu, Cho Sung-Bae (2020) Time series forecasting with
series forecasting. In: The Twelfth International Conference on multi-headed attention-based deep learning for residential energy
Learning Representations. https://​openr​eview.​net/​forum?​id=​ consumption. Energies 13(18):4722
qae04​YACHs. Accessed 3 Feb 2025 27. Cai Ling, Janowicz Krzysztof, Mai Gengchen, Yan Bo, Zhu Rui
12. Ansari AF, Stella L, Turkmen C, Zhang X, Mercado P, Shen H, (2020) Traffic transformer: capturing the continuity and periodic-
Shchur O, Rangapuram SS, Arango SP, Kapoor S, Zschiegner ity of time series for traffic forecasting. Trans GIS 24(3):736–755
J (2024) Chronos: learning the language of time series. Trans- 28. Cai Zhaowei, Vasconcelos Nuno (2018) Cascade r-cnn: delving
actions on Machine Learning Research. arXiv:​2403.​07815. into high quality object detection. In: Proceedings of the IEEE
Accessed 3 Feb 2025 Conference on Computer Vision and Pattern Recognition. pp
13. Arbib Michael A (2003) The handbook of brain theory and neural 6154–6162
networks. MIT Press, New York 29. Callot Laurent AF, Kock Anders B, Medeiros Marcelo C (2017)
14. Asiful Mohammed, Hossain Rezaul Karim, THulasiram Ruppa, Modeling and forecasting large realized covariance matrices and
Bruce Neil DB, Wang Yang (2018) Hybrid deep learning model portfolio choice. J Appl Economet 32(1):140–158
for stock price prediction. In IEEE Symposium Series on Com- 30. Cao Defu, Wang Yujing, Duan Juanyong, Zhang Ce, Zhu Xia,
putational Intelligence, SSCI, Bangalore, India Huang Congrui, Tong Yunhai, Bixiong Xu, Bai Jing, Tong
Jie et al (2020) Spectral temporal graph neural network for
International Journal of Machine Learning and Cybernetics

multivariate time-series forecasting. Adv Neural Inf Process Syst interpretable predictive model for healthcare using reverse time
33:17766–17778 attention mechanism. Adv Neural Inf Process Syst. p 29
31. Cao Defu, Jia Furong, Arik Sercan O, Pfister Tomas, Zheng 48. Cirstea Razvan-Gabriel, Guo Chenjuan, Yang Bin, Kieu Tung,
Yixiang, Ye Wen, Liu Yan (2023a) Tempo: Prompt-based gen- Dong Xuanyi, Pan Shirui (2022) Triformer: Triangular, varia-
erative pre-trained transformer for time series forecasting. arXiv ble-specific attentions for long sequence multivariate time series
preprint[SPACE]arXiv:​2310.​04948. Accessed 2 Feb 2025 forecasting–full version. arXiv preprint[SPACE]arXiv:​2204.​
32. Cao Haizhou, Huang Zhenhao, Yao Tiechui, Wang Jue, He Hui, 13767. Accessed 2 Feb 2025
Wang Yangang (2023) Inparformer: evolutionary decomposition 49. Cleveland Robert B, Cleveland William S, McRae Jean E, Irma
transformers with interactive parallel attention for long-term time Terpenning (1990) Stl: a seasonal-trend decomposition. J. Off.
series forecasting. In: Proceedings of the AAAI Conference on Stat 6(1):3–73
Artificial Intelligence 50. Cochrane John H (1997) Time series for macroeconomics and
33. Chang Ching, Peng Wen-Chih, Chen Tien-Fu (2023) Llm4ts: finance
Two-stage fine-tuning for time-series forecasting with pre-trained 51. Darban Zahra Zamanzadeh, Webb Geoffrey I, Pan Shirui, Salehi
llms. arXiv preprint[SPACE]arXiv:​2308.​08469. Accessed 2 Feb Mahsa (2023) Carla: A self-supervised contrastive representa-
2025 tion learning approach for time series anomaly detection. arXiv
34. Che Dunren, Safran Mejdl, Peng Zhiyong (2013) From big data preprint[SPACE]arXiv:​2308.​09296. Accessed 2 Feb 2025
to big data mining: challenges, issues, opportunities. In Data- 52. Dash Satyabrata, Chakravarty Sujata, Mohanty Sachi Nandan,
base Systems for Advanced Applications: 18th International Pattanaik Chinmaya Ranjan, Jain Sarika (2021) A deep learn-
Conference, DASFAA 2013, International Workshops: BDMA, ing method to forecast covid-19 outbreak. N Gener Comput
SNSM, SeCoP, Wuhan, China, April 22-25, 2013. Proceedings 39(3–4):515–539
18. Springer, New York. pp 1–15 53. De Brouwer Edward, Simm Jaak, Arany Adam, Moreau Yves
35. Chen Chao, Petty Karl, Skabardonis Alexander, Varaiya Pravin, (2019) Gru-ode-bayes: continuous modeling of sporadically-
Jia Zhanfeng (2001) Freeway performance measurement system: observed time series. Adv Neural Inf Process Syst. p 32
mining loop detector data. Transp Res Rec 1748(1):96–102 54. De Livera Alysha M, Hyndman Rob J, Snyder Ralph D (2011)
36. Chen Guici, Liu Sijia, Jiang Feng (2022) Daily weather forecast- Forecasting time series with complex seasonal patterns using
ing based on deep learning model: a case study of shenzhen city, exponential smoothing. J Am Stat Assoc 106(496):1513–1527
china. Atmosphere 13(8):1208 55. Deb Chirag, Zhang Fan, Yang Junjing, Lee Siew Eang, Shah
37. Chen Mu-Yen, Chen Bo-Tsuen (2015) A hybrid fuzzy time series Kwok Wei (2017) A review on time series forecasting tech-
model based on granular computing for stock price forecasting. niques for building energy consumption. Renew Sustain
Inf Sci 294:227–241 Energy Rev 74:902–924
38. Chen Peng, Zhang Yingying, Cheng Yunyao, Shu Yang, Wang 56. Deng Shumin, Zhang Ningyu, Zhang Wen, Chen Jiaoyan, Pan
Yihang, Wen Qingsong, Yang Bin, Guo Chenjuan (2024) Jeff Z, Chen Huajun (2019) Knowledge-driven stock trend pre-
Multi-scale transformers with adaptive pathways for time diction and explanation via temporal convolutional network.
series forecasting. In: International Conference on Learning In: Companion Proceedings of the 2019 World Wide Web Con-
Representations ference. pp 678–685
39. Chen Ricky TQ, Rubanova Yulia, Bettencourt Jesse, Duvenaud 57. Dhaou Amin, Bertoncello Antoine, Gourvénec Sébastien, Gar-
David K (2018) Neural ordinary differential equations. Adv Neu- nier Josselin, Le Pennec Erwan (2021) Causal and interpretable
ral Inf Process Syst. p 31 rules for time series analysis. In: Proceedings of the 27th ACM
40. Chen Ting, Kornblith Simon, Norouzi Mohammad, Hinton Geof- SIGKDD Conference on Knowledge Discovery & Data Min-
frey (2020) A simple framework for contrastive learning of visual ing. pp 2764–2772
representations. In: International Conference on Machine Learn- 58. Díaz González F (2019) Federated learning for time series fore-
ing. PMLR pp 1597–1607 casting using lstm networks: exploiting similarities through
41. Chen Yitian, Kang Yanfei, Chen Yixiong, Wang Zizhuo (2020) clustering. Master’s thesis, KTH Royal Institute of Technology,
Probabilistic forecasting with temporal convolutional neural net- School of Electrical Engineering and Computer Science. http://​
work. Neurocomputing 399:491–501 urn.​kb.​se/​resol​ve?​urn=​urn:​nbn:​se:​kth:​diva-​254665
42. Chen Yuehui, Yang Bin, Meng Qingfang, Zhao Yaou, Abraham 59. Dong Jiaxiang, Wu Haixu, Zhang Haoran, Zhang Li, Wang
Ajith (2011) Time-series forecasting using a system of ordinary Jianmin, Long Mingsheng (2023) Simmtm: A simple pre-
differential equations. Inf Sci 181(1):106–114 training framework for masked time-series modeling. arXiv
43. Chen Yushu, Liu Shengzhuo, Yang Jinzhe, Jing Hao, Zhao preprint[SPACE]arXiv:​2302.​00861. Accessed 2 Feb 2025
Wenlai, Yang Guangwen (2023) A joint time-frequency domain 60. Dong Jiaxiang, Wu Haixu, Wang Yuxuan, Qiu Yunzhong,
transformer for multivariate time series forecasting. arXiv Zhang Li, Wang Jianmin, Long Mingsheng (2024) Timesiam:
preprint[SPACE]arXiv:​2305.​14649. Accessed 2 Feb 2025 A pre-training framework for siamese time-series modeling.
44. Cheng Joseph Y, Goh Hanlin, Dogrusoz Kaan, Tuzel Oncel, arXiv preprint[SPACE]arXiv:​2 402.​0 2475. Accessed 2 Feb
Azemi Erdrin (2020) Subject-aware contrastive learning for 2025
biosignals. arXiv preprint[SPACE]arXiv:​2007.​04871. Accessed 61. Dosovitskiy Alexey, Fischer Philipp, Springenberg Jost Tobias,
2 Feb 2025 Riedmiller Martin, Brox Thomas (2025) Discriminative unsu-
45. Cheng Qi, Chen Yixin, Xiao Yuteng, Yin Hongsheng, Liu pervised feature learning with exemplar convolutional neural
Weidong (2022) A dual-stage attention-based bi-lstm net- networks. arXiv preprint[SPACE]arXiv:​1406.​6909. Accessed
work for multivariate time series prediction. J Supercomput 2 Feb 2025
78(14):16214–16235 62. Drouin Alexandre, Marcotte Étienne, Chapados Nicolas (2022)
46. Cho Kyunghyun (2014) Learning phrase representations using Tactis: Transformer-attentional copulas for time series. In:
rnn encoder-decoder for statistical machine translation. arXiv International Conference on Machine Learning
preprint[SPACE]arXiv:​1406.​1078. Accessed 2 Feb 2025 63. Shengdong Du, Li Tianrui, Yang Yan, Horng Shi-Jinn (2020)
17(11):7849–7859
284. Zhang Xiyuan, Jin Xiaoyong, Gopalswamy Karthick, Gupta Publisher's Note Springer Nature remains neutral with regard to
Gaurav, Park Youngsuk, Shi Xingjian, Wang Hao, Mad- jurisdictional claims in published maps and institutional affiliations.
dix Danielle C, Wang Yuyang (2022) First de-trend then
attend: Rethinking attention for time-series forecasting. arXiv
preprint[SPACE]arXiv:​2212.​08151. Accessed 2 Feb 2025
285. Zhang Yifan, Wu Rui, Dascalu Sergiu M, Harris Jr Frederick C
(2023) Multi-scale transformer pyramid networks for multivari-
ate time series forecasting. arXiv preprint[SPACE]arXiv:​2308.​
11946. Accessed 2 Feb 2025
286. Zhang Yunhao, Yan Junchi (2023) Crossformer: Transformer uti-
lizing cross-dimension dependency for multivariate time series
forecasting. In: The eleventh international conference on learning
representations.
287. Zhang Zhenwei, Wang Xin, Gu Yuantao (2023) Sageformer:
Series-aware graph-enhanced transformers for multivariate time
series forecasting. arXiv preprint[SPACE]arXiv:​2307.​01616.
Accessed 2 Feb 2025
288. Zhang Zhenwei, Meng Linghang, Yuantao Gu (2024) Sage-
former: series-aware framework for long-term multivariate time
series forecasting. IEEE Internet Things J. https://​doi.​org/​10.​
1109/​JIOT.​2024.​33634​51
289. Zhao Wentian, Gao Yanyun, Ji Tingxiang, Wan Xili, Ye Feng,
Bai Guangwei (2019) Deep temporal convolutional networks for
short-term traffic flow forecasting. Ieee Access 7:114496–114507
290. Zhao Yongning, Ye Lin, Li Zhi, Song Xuri, Lang Yansheng, Jian
Su (2016) A novel bidirectional mechanism based on time series
model for wind power forecasting. Appl Energy 177:793–803
291. Zheng Xiaochen, Chen Xingyu, Schürch Manuel, Mollaysa
Amina, Allam Ahmed, Krauthammer Michael (2023) Simts:
Rethinking contrastive representation learning for time series
forecasting. arXiv preprint[SPACE]arXiv:2​ 303.1​ 8205. Accessed
2 Feb 2025
292. Zhou Haoyi, Zhang Shanghang, Peng Jieqi, Zhang Shuai, Li
Jianxin, Xiong Hui, Zhang Wancai (2021) Informer: beyond
efficient transformer for long sequence time-series forecasting.
Proc AAAI Conf Artif Intell 35:11106–11115
293. Zhou Tian, Ma Ziqing, Wen Qingsong, Sun Liang, Yao Tao, Yin
Wotao, Jin Rong et al (2022) Film: frequency improved legendre
memory model for long-term time series forecasting. Adv Neural
Inf Process Syst 35:12677–12690
294. Zhou Tian, Ma Ziqing, Wen Qingsong, Wang Xue, Sun Liang,
Jin Rong (2022) Fedformer: Frequency enhanced decomposed
transformer for long-term series forecasting. In: International
Conference on Machine Learning. PMLR. pp 27268–27286.
295. Zhou Tian, Zhu Jianqing, Wang Xue, Ma Ziqing, Wen Qingsong,
Sun Liang, Jin Rong (2022) Treedrnet: a robust deep model for
