Applied Soft Computing Journal 169 (2025) 112600


Temporal fusion point-interval forecasting: A comprehensive approach for financial time series prediction

Xianghui Qi, Zhangyong Xu, Fenghu Wang
School of Economics and Management, Northwestern University, Xi’an 710100, China

ARTICLE INFO

Keywords: Point-interval prediction; Financial sequence forecasting; Optimization strategy; Model fusion; Model correction

ABSTRACT

In the era of rapid information technology development, the financial markets are increasingly inundated with vast amounts of data, thereby underscoring the critical importance of accurate long-term forecasting of financial sequences. However, the development of a comprehensive long-term point-interval prediction system for financial time series remains an area requiring significant further research. To address this gap, we introduce the TFMADR model, designed to enhance both the accuracy and robustness of long-term financial sequence predictions. Specifically, we integrate the Temporal Fusion Transformer (TFT) long-term forecasting model with the DeepAR probabilistic forecasting model, combining their strengths to optimize prediction outcomes. In addition, we employ the Multi-Objective Simultaneous Search Algorithm (MSSA) for multi-objective optimization, enabling us to identify the optimal fusion parameter configuration. To further refine the predictive system, we incorporate correction factors aimed at improving both the precision and reliability of the forecasts. Comparative experiments conducted across four distinct financial markets reveal the superior predictive accuracy, robustness, and uncertainty analysis capabilities of the TFMADR model. These results demonstrate its potential for providing a more holistic understanding of dynamic fluctuations in financial markets, offering valuable insights for investors seeking to optimize strategies and mitigate risks. The innovative integration of deep learning models, multi-objective optimization, and corrective enhancements positions the TFMADR model as a promising tool for the future of financial sequence forecasting.

∗ Corresponding author.
E-mail addresses: [email protected] (X.H. Qi), [email protected] (Z.Y. Xu), [email protected] (F.H. Wang).
https://doi.org/10.1016/j.asoc.2024.112600
Received 3 March 2024; Received in revised form 11 November 2024; Accepted 29 November 2024
Available online 14 December 2024
1568-4946/© 2024 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

1. Introduction

In the modern financial domain, the accuracy and reliability of financial time series forecasting are crucial. The volatility, complexity, and high uncertainty of financial markets make predicting market trends and price movements a key task in financial decision-making [1]. Whether for investors in asset management or policymakers in economic regulation, precise financial market forecasting provides essential support, helping to mitigate investment risks, optimize resource allocation, and enhance the scientific basis for economic decisions. Consequently, financial time series forecasting based on historical data holds significant academic value and offers valuable decision-making insights for practical financial operations [2]. However, as market dynamics become increasingly complex, existing forecasting models face several challenges in predicting long-term financial data.

Long-term financial time series forecasting is particularly important as it helps investors identify and leverage market trends, thereby formulating more precise and effective investment strategies [3]. However, financial markets are characterized by high levels of uncertainty and complex external factors, such as economic policies, political events, and unforeseen crises, all of which can significantly impact market behavior and increase the difficulty of modeling [4]. Furthermore, the non-linear nature of financial data, along with the long- and short-term dependencies inherent in time series, means that traditional statistical methods and models often struggle to capture market dynamics accurately [5]. Although deep learning techniques have made progress in overcoming some of these limitations, they still face challenges, such as strong data dependence, high computational costs, model overfitting, and local optima, which hinder their stability and reliability in practical applications.

Therefore, an essential problem in improving the accuracy and stability of long-term financial time series forecasting is how to effectively address these challenges by developing a comprehensive forecasting system that not only provides point predictions but also evaluates the uncertainty of the forecasts [6]. Traditional point prediction methods often overlook the diversity and complexity of the market, while interval forecasting offers a more comprehensive view, helping decision-makers understand the potential fluctuations of the predicted values, thereby reducing associated risks and uncertainties. However, most
current research tends to focus on single prediction tasks, neglecting the importance of uncertainty assessment and multi-objective optimization [7]. Thus, the integration of point forecasting with interval estimation, alongside reasonable optimization algorithms to enhance overall model performance, forms the core objective of this study.

To tackle these challenges, this study proposes an innovative forecasting model: temporal fusion multi-objective adaptive differential regression (TFMADR). This model combines the strengths of the temporal fusion transformer (TFT) [8] and DeepAR [9] models to enhance the accuracy and reliability of financial time series predictions. The TFT model excels in capturing long-term dependencies and non-linear features in time series data through its powerful feature extraction capability and self-attention mechanism, while the DeepAR model effectively handles local fluctuations in time series via autoregressive modeling [10]. By integrating these two models, this research aims to provide both point predictions and interval estimates, addressing the dual needs for accurate forecasts and risk assessment in financial markets.

Moreover, to further improve model performance, this study introduces the multi-objective simultaneous search algorithm (MSSA) [11], which optimizes the parameter combinations of the model, balancing prediction accuracy, computational efficiency, and model complexity [12]. The MSSA algorithm not only enhances model stability but also effectively mitigates the risk of overfitting, ensuring the model maintains high predictive performance across different datasets and market conditions. Therefore, the main contributions of this study are as follows:

• Model fusion and optimization: This research introduces an innovative model fusion framework that combines the strengths of the TFT and DeepAR models and fine-tunes the fusion parameters through the MSSA. This approach not only improves the accuracy of long-term financial time series forecasts but also enhances the model’s robustness and stability, enabling it to better adapt to complex market conditions and high-volatility data.
• Point and interval integrated forecasting system: To overcome the limitations of traditional point forecasting methods, this study develops an integrated forecasting system that combines point prediction and interval estimation. This system provides a more comprehensive output, helping investors assess risk more effectively and optimize decision-making processes when formulating investment strategies.
• Introduction of correction factors: To further enhance prediction accuracy and stability, this study introduces a correction factor mechanism. The correction factors help adjust the model’s bias during extreme market fluctuations, thereby improving the model’s adaptability and practical applicability in volatile market environments.

The organization of this paper is as follows: Section 2 provides a review of related research. Section 3 presents the specific methods and theoretical framework of the proposed model. Section 4 outlines the experimental setup for the comprehensive financial sequence point-interval prediction system. Specifically, Section 4.1 describes the dataset used in this study, and Section 4.2 introduces the multidimensional evaluation metrics for point-interval prediction. To demonstrate the functionality of the developed prediction system, Section 5 presents four distinct experiments, discussing and analyzing the experimental results of the proposed TFMADR model in comparison to other models. Finally, Section 6 concludes the paper.

2. Related work

Financial time series forecasting, as a crucial component of financial analysis and decision-making, aims to predict future prices, indices, or other financial variables based on the analysis of historical data. With the increasing complexity of financial markets and the rapid growth of data volumes, traditional statistical methods have shown limitations [13]. As a result, researchers have increasingly turned to various machine learning, deep learning, and fuzzy logic-based approaches for forecasting [14]. In particular, in the domains of point forecasting and combined point-interval prediction, scholars have significantly improved the accuracy and reliability of financial time series forecasting through the development of innovative algorithms and optimization strategies.

2.1. Point forecasting methods for financial time series

Point forecasting methods for financial time series primarily focus on predicting the specific value of a financial variable at a future time point. Common models in this category include traditional statistical models, machine learning methods, and deep learning approaches.

Among the traditional point forecasting methods, the Autoregressive Integrated Moving Average (ARIMA) model is one of the most widely applied models for financial time series forecasting [15]. The ARIMA model forecasts future values by fitting the autocorrelations of historical data. However, since financial market data often exhibit characteristics such as non-linearity and non-stationarity, the ARIMA model tends to produce relatively large errors when applied to financial time series [16]. Consequently, researchers have explored more complex models to improve forecasting accuracy. For example, the Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model, which accounts for volatility changes in the financial market, better captures asset price volatility but still faces limitations in addressing non-linearity and non-stationarity issues.

With the advancement of machine learning and deep learning techniques, point forecasting models based on these methods have gradually gained prominence in the financial domain. Lazcano et al. [17] highlighted that machine learning methods, such as Support Vector Machines (SVM) [18], Artificial Neural Networks (ANN) [19], and deep learning models [20], are capable of capturing complex non-linear relationships within the data, thereby improving prediction accuracy. For instance, Jin et al. [21] proposed a hybrid model combining Empirical Mode Decomposition (EMD) and Backpropagation Neural Networks (BPNN), optimized with a Particle Swarm Optimization (PSO) algorithm, which effectively reduces noise and enhances non-linear prediction performance when applied to financial time series data. Furthermore, deep learning technologies, particularly Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNN) [22], have been widely applied in financial time series forecasting in recent years. Cheng et al. [23] utilized a deep learning neural network model to propose a two-stage oil price forecasting system, demonstrating excellent performance across different time windows, forecast horizons, oil price proxies, and business conditions. In another study, Herrera et al. [24] explored the impact of investor sentiment on predicting renewable energy stock returns and volatility, proposing a sentiment analysis-based deep learning framework that significantly enhanced forecasting performance. Mourtas and Katsikis [25] introduced a new multifunctional model activated by a time-series weight and structure-determined (MAWTS) algorithm, and proposed a three-layer feedforward neural network model (MAWTSNN) for long-term financial time series forecasting. Experimental results comparing the performance of MAWTSNN with ZNN and LVI-PDNN in three different portfolio configurations showed that MAWTSNN is an excellent alternative to traditional methods.

Despite these advancements, traditional point forecasting methods often overlook the inherent uncertainty in financial markets. As a result, there has been a growing focus in recent research on addressing the prediction uncertainty in financial time series, with innovative methods proposed to better capture and describe such uncertainties.

2.2. Point-interval combined forecasting methods for financial time series

Compared to point forecasting methods, point-interval combined forecasting methods for financial time series not only focus on predicting future values but also account for the uncertainty range of these predictions, providing a more comprehensive reflection of market volatility and risk [26]. Interval forecasting typically presents a confidence interval, indicating that the predicted value is likely to fall within a certain range, which is crucial for risk management and investment decision-making. Liu et al. [27] proposed a method based on fuzzy time series (FTS) that can simultaneously handle point, interval, and distributional forecasts. By incorporating fuzzy logic, this approach effectively addresses uncertainty in the data, making it particularly suitable for financial markets, which are characterized by high non-linearity and uncertainty. The advantage of fuzzy time series lies in its ability to better model the inherent vagueness of real-world phenomena [28], allowing the model to provide more robust predictions in complex market scenarios.

In the domain of interval forecasting, Yuan and Che [29] introduced a multi-output least squares support vector regression (MLSSVR) method for interval prediction. This method optimizes the model’s hyperparameters using meta-heuristic optimization algorithms to enhance the precision and stability of interval forecasts. By jointly considering multiple outputs, the MLSSVR method can more accurately predict future price ranges. Their study demonstrated that interval forecasting holds significant value in capturing financial market volatility and uncertainty, offering decision-makers additional insights for informed decision-making. Another common interval forecasting approach is based on dynamic models. Mokarram et al. [30] proposed a particle swarm optimization (PSO)-trained quantile regression neural network (PSOQRNN) for forecasting financial time series volatility. Their research showed that by dynamically adjusting model parameters, the approach can effectively address market uncertainty and volatility, improving the accuracy of interval forecasts. Dynamic models are more adaptable to the rapidly changing financial markets, giving them a distinct advantage in interval forecasting.

Both point forecasting and point-interval combined forecasting methods for financial time series are continuously evolving and innovating [31]. By incorporating new algorithms, optimization strategies, and uncertainty handling techniques, current research has progressively improved the accuracy, stability, and reliability of financial time series predictions, providing more possibilities for decision support in the financial sector. This study contributes to the continuation and further innovation of this field.

3. Methodology

This section provides a detailed overview of the comprehensive financial time series prediction system, which we refer to as the TFMADR model. Fig. 1 illustrates the overall framework of the model. The model comprises three key components: the first stage involves TFT point prediction technology, the second stage utilizes DeepAR probabilistic prediction technology, and the final component is the MSSA algorithm used for optimizing fusion parameters.

The comprehensive design of the TFMADR model demonstrates its outstanding performance in time series forecasting tasks within the financial domain. It provides a holistic solution by combining point prediction and probability forecasting, aiding in better understanding and addressing the dynamic changes and uncertainties of financial markets. By integrating key technologies such as TFT, DeepAR, and MSSA, the TFMADR model equips decision-makers and risk managers in the financial sector with a powerful tool. In Fig. 1, the left side showcases TFT point prediction technology, while the right side represents the probabilistic forecasting part of DeepAR. What sets the TFMADR model apart is its seamless integration of these two crucial stages, passing the results of point prediction to the probability forecasting stage. This process is achieved through the fusion process in Eq. (25), allowing the model to better combine information from both prediction stages and enhance prediction accuracy and reliability. Furthermore, the TFMADR model utilizes the MSSA multi-objective fusion technique to optimize the correction factors in Eq. (25), ensuring that the model performs excellently across different tasks and data conditions. This integrative approach positions the TFMADR model as a vital tool in the financial domain, contributing to improved decision-making and risk management capabilities. Below, we will provide a phased introduction to the forecasting process.

Fig. 1. Framework of the proposed TFMADR model. The left portion of the diagram with red arrows represents the point prediction phase of TFMADR. After the point prediction is completed, the model proceeds to the probability interval prediction phase, as indicated on the right side of the figure.

3.1. Initialize the long horizon point interval forecasting system

Financial time series refer to unique entities that change continuously over time $t_0$, such as electricity prices in different power markets or financial futures prices. We can use historical variable information, denoted as $\mathbf{X}_{i,t_0} := [x_{i,t_0-m}, x_{i,t_0-m+1}, \ldots, x_{i,t_0}]$, from financial series to measure information for the future $T$ time steps. This is achieved through a point prediction model, denoted as $f_P$, which provides point predictions $\mathbf{Y}_{P,t_0} := [y_{i,t_0}, y_{i,t_0+1}, \ldots, y_{i,t_0+T}]$ for the time range $[t_0, t_0+T]$, aiding in understanding future trends. In many cases, to optimize decision-making and risk management, it is necessary to define prediction intervals [32]. These intervals can be determined by considering the best and worst-case scenarios that may occur. Therefore, after generating point predictions, we embed them into a probability interval prediction model, denoted as $f_I$, ultimately yielding the range of future changes in the financial series, $[\mathbf{Y}_L, \mathbf{Y}_U]$. This process can be programmatically executed in the following steps:

$$\mathbf{Y}_{P,t_0:T} = f_P\left(h_{i,t-m:t_0},\ x_{i,t-m:t_0},\ \boldsymbol{\tau}\right), \qquad (1)$$

where $h_{i,t-m:t_0}$ represents the hidden layer information flow within $f_P$, and $\boldsymbol{\tau}$ collectively refers to the parameters to be learned within $f_P$. After completing this stage and obtaining the point prediction information $Y_{i,t_0:T}$, it is further input into the interval prediction stage.

$$\mathbf{Y}_{I,t_0:T} = f_I\left(\mathbf{y}_{i,t_0:T} \mid \mathbf{y}_{i,1:t_0-1},\ \Theta_{i,1:T}\right). \qquad (2)$$

In this stage, we obtain the final probability prediction value, denoted as $\mathbf{Y}_{I,t_0:T} = p(y_{i,t} \mid \Theta(\mathbf{h}_{i,t}, \theta))$, where $\Theta(\mathbf{h}_{i,t}, \theta)$ represents the parameters to be learned in this phase. Subsequently, we define a specific training loss function $\mathcal{L}$, and utilize MSSA to optimize the parameter space $[\boldsymbol{\tau}, \Theta]$ for training both stages. Below, we will provide a detailed introduction to each stage of the model.

3.2. TFT point prediction framework

The temporal fusion transformer (TFT) is a transformer model specifically designed for multi-step prediction tasks. It excels in handling financial time series data and offers excellent interpretability. Below, we will provide a detailed modular introduction to the different components of TFT.

3.2.1. Gated residual network

The gated residual network (GRN) is a critical component within TFT [8], designed to ensure effective information flow within the model. It combines skip connections and gating layers. Skip connections enable the model to pass information between different layers of the network, addressing the issue of gradient vanishing and accelerating model training. Gating layers are used to regulate the flow of information, ensuring that only essential information is transmitted. This aids the model in better understanding key features within time series data [8].

$$\mathrm{GRN}(\tilde{\mathbf{x}}, \mathbf{h}) = \mathrm{LayerNorm}\left(\tilde{\mathbf{x}} + \mathrm{GLU}(\xi_1)\right),$$
$$\xi_1 = \mathbf{W}_1 \xi_2 + \mathbf{b}_1, \qquad (3)$$
$$\xi_2 = \mathrm{ELU}\left(\mathbf{W}_2 \tilde{\mathbf{x}} + \mathbf{W}_3 \mathbf{h} + \mathbf{b}_2\right),$$

where the exponential linear unit (ELU) is an activation function used for non-linear transformations in neural networks. ELU is characterized by exponential smoothing in the negative region, which helps prevent
the problem of gradient vanishing, while exhibiting linear behavior in the positive region. This property contributes to the training and generalization performance of neural networks. $\xi_1 \in \mathbb{R}^{d_N}$ and $\xi_2 \in \mathbb{R}^{d_N}$ (with $d_N$ representing the hidden state size) are intermediate layers, typically hidden layers in neural networks, used for feature extraction and transformation. These intermediate layers introduce non-linearity through the ELU activation function. LayerNorm is a standard normalization layer used to normalize the output of neural networks, ensuring model stability and convergence. Standard normalization layers aid in speeding up the training process and improving model performance [8]. The components of the gated linear unit (GLU) are defined as follows:

$$\mathrm{GLU}(\boldsymbol{\gamma}) = \sigma\left(\mathbf{W}_4 \boldsymbol{\gamma} + \mathbf{b}_4\right) \odot \left(\mathbf{W}_5 \boldsymbol{\gamma} + \mathbf{b}_5\right), \qquad (4)$$

where $\mathbf{W}_{(\cdot)} \in \mathbb{R}^{d_N \times d_N}$ and $\mathbf{b}_{(\cdot)} \in \mathbb{R}^{d_N}$ represent the weight and bias parameters in a neural network. These parameters are utilized for linear transformations, mapping input data to the hidden layers or output layers of the neural network. $\sigma$ denotes the gate's activation function (the sigmoid function in TFT [8]). The Hadamard product is an element-wise multiplication operation, represented by the symbol $\odot$.

3.2.2. Variable selection network

The variable selection network (VSN) is a component used for feature selection in a model. It intelligently selects the most significant features based on the input data, which can enhance the model’s efficiency and performance. By choosing important features, VSN helps reduce the computational complexity of the model and enables it to better handle various types of time series data [8].

$$\tilde{\xi}_t = \sum_{j=1}^{m_\chi} v_{\chi_t}^{(j)}\, \tilde{\xi}_t^{(j)}, \qquad (5)$$
$$v_{\chi_t} = \mathrm{Softmax}\left(\mathrm{GRN}_{v_\chi}(\Xi_t, \mathbf{c}_s)\right), \qquad (6)$$
$$\tilde{\xi}_t^{(j)} = \mathrm{GRN}_{\tilde{\xi}(j)}\left(\xi_t^{(j)}\right), \qquad (7)$$

where $v_{\chi_t}^{(j)}$ represents the weight for feature selection, and $\tilde{\xi}_t^{(j)}$ denotes the feature after non-linear processing. $\xi_t^{(j)} \in \mathbb{R}^{d_N}$ signifies the transformed input of the $j$th variable at time $t$, where $\Xi_t = \left[\xi_t^{(1)T}, \ldots, \xi_t^{(m_\chi)T}\right]^T$ is a flattened vector containing all past inputs at time $t$.

Within the encoder component, there are also static covariate encoders (SCE) used to encode context vectors for static covariates. Static covariates are typically additional information related to time series data, such as weather conditions, holiday information, and the like. SCE’s role is to encode this static information into a format that the model can comprehend, allowing the model to better utilize this information to enhance predictive performance.

3.2.3. Temporal fusion decoder

The role of the temporal fusion decoder (TFD) in the diagram is to learn the temporal relationships present in the dataset. In this process, the LSTM encoder takes past features $\tilde{\boldsymbol{\xi}}_{t-k:t}$ as input, while the LSTM decoder takes future features $\tilde{\boldsymbol{\xi}}_{t+1:t+\tau_{\max}}$ as input [33]. Thus, the LSTM encoder and decoder together generate a unified set of time-series features, where $\phi(t, n) \in \{\phi(t, -k), \ldots, \phi(t, \tau_{\max})\}$ represents the time-series feature, and $n$ is the positional index.

Finally, before entering the TFD, this set of time-series features may undergo an operation:

$$\tilde{\boldsymbol{\phi}}(t, n) = \mathrm{LayerNorm}\left(\tilde{\boldsymbol{\xi}}_{t+n} + \mathrm{GLU}_{\tilde{\phi}}(\phi(t, n))\right), \qquad (8)$$

This operation could include feature transformation, dimension transformation, or other preprocessing steps to prepare the time-series features for use by the TFD. Upon entering the TFD, the data undergoes processing through three internal modules: the static enrichment layer (SEL), the temporal self-attention layer (TSL), and the position-wise feed-forward layer (PFL).

The static enrichment layer (SEL) primarily focuses on effectively integrating static features into the model, such as stock codes, market indicators, and more. In financial time series prediction, these static features can provide additional contextual information, aiding the model in better understanding and interpreting the fluctuations in financial data. The SEL layer can incorporate static features into the model
through various means, such as embedding or other feature engineering During the prediction phase of the DeepAR model, in the absence
methods, to enhance predictive performance. of observed values at the corresponding time points, the previous time
( )
̃ 𝑛), 𝒄 𝑒 . step’s prediction is used as the input for the next time step. This is
𝜽(𝑡, 𝑛) = GRN 𝝓(𝑡, (9)
represented by the following formula for the network’s hidden state
where the GRN are shared throughout the entire layer, and 𝒄 𝑒 is the 𝒉𝑖,𝑡 :
context vector originating from the static covariate encoder. 𝒉𝑖,𝑡 = 𝜑(𝒉𝑖,𝑡−1 , 𝑧𝑖,𝑡−1 , 𝒙𝑖,𝑡 ), (18)
The TSL plays a critical role in financial time series prediction. It
can capture temporal dependencies within the sequence data, including where 𝜑 is a function implemented by a multi-layer recurrent neural
trends, periodicity, and seasonality, among others. In financial markets, network, which includes LSTM units. The initial states 𝒉𝑖,0 and 𝒐 are
temporal factors have a significant impact on prices and volatility, initialized to zero, and 𝑧𝑖,0 is also initialized to zero. This model is
making the TSL layer essential for the model to better understand and autoregressive because the current time step’s hidden state 𝒉𝑖,𝑡−1 is
predict the dynamic behavior of financial time series. This layer uses related to the next time step’s hidden state 𝒉𝑖,𝑡 , and the current time
self-attention mechanisms to learn correlations between different time step’s observed value 𝑧𝑖,𝑡−1 is also related to the next time step’s hidden
steps in the sequence, enabling more accurate predictions of future state 𝒉𝑖,𝑡 . DeepAR assumes the following conditional distribution model:
trends. ∏
𝑇
𝑃𝜃 (𝐳𝑖,𝑡0 ∶𝑇 ∣ 𝐳𝑖,1∶𝑡0 −1 , 𝐱𝑖,1∶𝑇 ) = 𝑃𝜃 (𝑧𝑖,𝑡 ∣ 𝐳𝑖,1∶𝑡−1 , 𝐱𝑖,1∶𝑇 )
𝐁(𝐭) = InterpretableMultiHead(𝜃(𝑡, − 𝑘), … , 𝜃(𝑡, 𝑛)), (10) 𝑡=𝑡0


𝑇
𝛿(𝑡, 𝑛) = LayerNorm(𝜃(𝑡, 𝑛) + GLU𝛿 (𝛽(𝑡, 𝑛))). (11) = 𝑙(𝑧𝑖,𝑡 ∣ 𝜃(𝐡𝑖,𝑡 , 𝛩)). (19)
𝑡=𝑡0
( )
where, 𝐁(𝐭) = [𝛽(𝑡, −𝑘), … , 𝛽 𝑡, 𝜏𝑚𝑎𝑥 ] is the result obtained from the This means that the model assumes that the data at each time step
interpretable multi-head self-attention layer. This result is actually follows some distribution, and the parameters are determined by a
generated through a multi-head mechanism, where shared parameters function 𝜃(𝐡𝑖,𝑡 , 𝛩) output by the network 𝐡𝑖,𝑡 . This function 𝜃 takes the
are used for the input vector 𝐕, while independent parameters are used hidden state 𝐡𝑖,𝑡 and additional parameters 𝛩 as inputs. The likelihood
for the query vector 𝐐 and key vector 𝐊. Subsequently, by calculating function 𝑙(𝑧𝑖,𝑡 ∣ 𝜃(𝐡𝑖,𝑡 , 𝛩)) corresponds to a fixed distribution with
attention scores for multiple heads, vector 𝐕 is weighted, and then a parameters 𝛩, and the shape of this distribution can be chosen based
summation and averaging operation is performed to generate 𝐵(𝑡). The on the statistical properties of the data. In this study, we adopt the
interpretable multi-head self-attention layer is defined as follows: following distribution function:
1
̃ 𝐻,
Int er pr et ableMult iHead(𝐐, 𝐊, 𝐕) = 𝐇𝐖 (12) 𝑝G (𝑧|𝜇 , 𝜎) = (2𝜋 𝜎 2 )− 2 exp(−(𝑧 − 𝜇)2 ∕(2𝜎 2 )), (20)

̃ = 𝐴(𝑸,
𝑯 ̃ 𝑲)𝑽 𝑾 𝑉 ,
{ } 𝜇(𝐡𝑖,𝑡 ) = 𝐰𝑇𝜇 𝐡𝑖,𝑡 + 𝑏𝜇 , (21)
𝑚𝐻 ( )
1 ∑ (ℎ)
= 𝐴 𝑸𝑾 𝑄 , 𝑲 𝑾 (ℎ) 𝑽𝑾𝑉,
𝑚𝐻 ℎ=1 𝐾
(13) 𝜎(𝐡𝑖,𝑡 ) = log(1 + exp(𝐰𝑇𝜎 𝐡𝑖,𝑡 + 𝑏𝜎 )). (22)
𝑚𝐻 ( )
1 ∑
= At t ent ion 𝑸𝑾 (ℎ)
𝑄
, 𝑲 𝑾 (ℎ)
𝐾
,𝑽 𝑾 𝑉 , In the above distribution, a Gaussian likelihood function, 𝑝G (𝑧|𝜇 , 𝜎),
𝑚𝐻 ℎ=1 is used to describe the distribution of the observed values 𝑧. Here, the
mean 𝜇 is given by an affine function of the network’s output. The affine
The PFL is a crucial component used to further process time-series
features. In financial time series prediction, this layer can introduce additional non-linear transformations to better capture complex relationships in the data. It helps the model appropriately transform features at different positions in the time series data to accommodate the needs of different time points.

$$\tilde{\psi}(t, n) = \mathrm{LayerNorm}\left(\tilde{\phi}(t, n) + \mathrm{GLU}_{\tilde{\psi}}\left(\psi(t, n)\right)\right), \tag{14}$$

$$\psi(t, n) = \mathrm{GRN}_{\psi}\left(\delta(t, n)\right). \tag{15}$$

After assembling the components mentioned above, the TFT model obtains its final output:

$$\mathbf{Y}_{P,t_0:T} = \boldsymbol{W}_P \tilde{\boldsymbol{\psi}}(t, \tau) + b_P. \tag{16}$$

3.3. DeepAR interval prediction framework

DeepAR is an RNN-based prediction model capable of handling multiple time series simultaneously and providing predictions in the form of probability distributions [9]. We use $\mathbf{z}_{i,1:t_0-1}$ to represent the observed values of the $i$th time series up to time $t_0 - 1$. Our goal is to establish the following conditional distribution model:

$$P\left(\mathbf{z}_{i,t_0:T} \mid \mathbf{z}_{i,1:t_0-1}, \boldsymbol{x}_{i,1:T}\right). \tag{17}$$

In this model, we aim to predict the future values of the $i$th time series between time $t_0$ and $T$, taking into account the observed values of this sequence up to time $t_0 - 1$ and possible additional information $\boldsymbol{x}_{i,1:T}$.

function can be understood as a linear transformation that maps the network's output to the range of values for the mean $\mu$. The standard deviation $\sigma$ is ensured to be greater than zero by applying a soft positive activation function after the affine transformation. This guarantees a strictly positive standard deviation, as a Gaussian distribution requires.

$$Y_L = \mu(\mathbf{h}_{i,t}) + z_{1-\alpha/2} \cdot \sigma(\mathbf{h}_{i,t}), \tag{23}$$

$$Y_U = \mu(\mathbf{h}_{i,t}) + z_{\alpha/2} \cdot \sigma(\mathbf{h}_{i,t}), \tag{24}$$

where $Y_L$ represents the lower bound, $Y_U$ represents the upper bound, $\mu(\mathbf{h}_{i,t})$ is the mean output by the neural network, $\sigma(\mathbf{h}_{i,t})$ is the standard deviation output by the neural network, and $z_{1-\alpha/2}$ and $z_{\alpha/2}$ are percentiles from the standard normal distribution used to construct the confidence interval. This output interval reflects the model's uncertainty about the predicted values and is typically used in risk management and decision-making. It helps decision-makers better understand potential risks and the range of possible outcomes.

3.4. Model training

Given a set of observed values $X_{t,t_0:T}$, the training process of the TFMADR model can be carried out by defining the following loss function:

$$\mathcal{L} = \lambda \mathcal{L}_P\left(X_{t,t_0:T}, Y_{P,t_0:T}\right) + \mu \mathcal{L}_I\left(X_{t,t_0:T}, \theta(h_{t_0:T})\right), \tag{25}$$

where $\mathcal{L}_P$ represents the loss function for point prediction, typically using mean squared error (MSE) or other appropriate loss functions. $\mathcal{L}_I$


represents the loss function for interval prediction, typically defined as the negative log-likelihood, so that minimizing it maximizes the log-likelihood function $\log p(z_{i,t} \mid \theta(\mathbf{h}_{i,t}))$.

In the objective function, $\lambda$ and $\mu$ are correction factors defined by us, used to balance the accuracy and stability of point and interval predictions. By adjusting the values of these two factors, one can fine-tune the importance of different types of prediction tasks during training. The training objective is to minimize the objective function $\mathcal{L}$, which means the model can achieve optimal performance in both point and interval predictions. To achieve the optimal values for $\lambda$ and $\mu$, we use a multi-objective optimizer like MSSA to search for the best hyperparameter settings that minimize the objective function.

The goal of this training process is to construct a comprehensive model that better handles uncertainty and risk in financial time series prediction tasks by considering both point and interval prediction performance. By adjusting $\lambda$ and $\mu$, one can balance accuracy and stability according to the specific requirements of the problem. This approach allows the model to perform well in multiple prediction tasks and better meet the needs of the financial domain.

3.5. Multi-objective simultaneous search algorithm

To address multi-objective problems, it is typically necessary to simultaneously optimize multiple objective functions. The general form of a multi-objective problem is as follows [32]:

$$\text{Minimize: } F(x) = \left\{f_1(x), f_2(x), \ldots, f_n(x)\right\}, \tag{26}$$

$$\text{Subject to } \begin{cases} g_i(x) \ge 0, & i = 1, 2, \ldots, m \\ h_i(x) = 0, & i = 1, 2, \ldots, p \end{cases}, \tag{27}$$

where $n$ represents the number of objectives, $m$ represents the number of inequality constraints, and $p$ represents the number of equality constraints. Multi-objective optimization aims to find a set of variable values $x$ that simultaneously minimize or maximize all objective functions $f_i(x)$ while satisfying a set of constraint functions $g_i(x)$ and $h_i(x)$.

In this research, financial time series forecasting involves optimizing the parameters $\lambda$ and $\mu$ in the objective function to balance the accuracy and stability of point and interval predictions. This can be viewed as a multi-objective optimization problem where the two objective functions are the values of $\lambda$ and $\mu$, and the goal is to find the optimal parameter configuration that minimizes or maximizes these two objective functions simultaneously. Additionally, there may be other constraints to be satisfied to ensure the reasonableness and feasibility of the parameters.

Algorithm 1 MSSA Algorithm for Optimizing $\lambda$ and $\mu$
Require: $max\_iter$ - maximum iteration number
    $iter$ - current iteration
    $n$ - number of salps
    $Y$ - i-th salp's location
    $A_i$ - fitness of the i-th salp
    $u_j$ - upper bound of j-th dimension
    $l_j$ - lower bound of j-th dimension
    $\lambda$ - initial value of $\lambda$
    $\mu$ - initial value of $\mu$
    $step$ - step size for updating $\lambda$ and $\mu$
1: Initialize the parameters of the algorithm.
2: Initialize the positions of salps randomly.
3: while $iter < max\_iter$ do
4:     $c_1 = 2e^{-\left(\frac{4k}{K}\right)^2}$
5:     for $i = 1$ to $n$ do
6:         Calculate the fitness value of each salp.
7:         Find the non-dominated salps.
8:         Update the repository about the non-dominated salps.
9:     end for
10:    if repository number greater than maximum repository number then
11:        Remove one repository member.
12:        Add non-dominated salp to repository.
13:    end if
14:    Use ranking process and roulette wheel to select a food source.
15:    Update $c_1$.
16:    for $i = 1$ to $n$ do
17:        if $i = 1$ then
18:            Update location of leader salp.
19:            if $A_j + c_1\left((u_j - l_j)c_2 + l_j\right)c_3 \ge 0$ then
20:                $y_j^1 = A_j + c_1\left((u_j - l_j)c_2 + l_j\right)c_3$
21:            else
22:                $y_j^1 = A_j - c_1\left((u_j - l_j)c_2 + l_j\right)c_3$
23:            end if
24:        else
25:            Update location of follower salp.
26:            $y_j^i = \frac{1}{2}\left(y_j^i + y_j^{i-1}\right)$
27:        end if
28:    end for
29:    $iter = iter + 1$
30: end while
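As a rough illustration of the leader and follower updates in Algorithm 1, the sketch below runs a single-objective simplification of the salp swarm update on a toy quadratic in two weights. It is not the authors' MSSA: the repository of non-dominated solutions and the roulette-wheel selection are omitted, the best position found so far plays the role of the food source, and the sign of the leader step is chosen by comparing a uniform random number with 0.5, which is a common convention assumed here. The names `salp_swarm_minimize` and `toy_objective`, the bounds, and the toy optimum are hypothetical.

```python
import numpy as np


def salp_swarm_minimize(objective, lower, upper, n_salps=20, max_iter=100, seed=0):
    """Single-objective salp swarm sketch (no repository, no roulette wheel)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # Random initial positions inside the bounds (step 2 of Algorithm 1).
    positions = rng.uniform(lower, upper, size=(n_salps, dim))
    fitness = np.array([objective(p) for p in positions])
    best = int(np.argmin(fitness))
    food, food_fitness = positions[best].copy(), float(fitness[best])

    for k in range(1, max_iter + 1):
        # c1 decays from about 2 towards 0 as the iterations progress (step 4).
        c1 = 2.0 * np.exp(-((4.0 * k / max_iter) ** 2))
        for i in range(n_salps):
            if i == 0:
                # Leader: move around the food source (assumed sign rule on c3).
                c2 = rng.random(dim)
                c3 = rng.random(dim)
                step = c1 * ((upper - lower) * c2 + lower)
                positions[i] = np.where(c3 >= 0.5, food + step, food - step)
            else:
                # Follower: average of its own and the preceding salp's position.
                positions[i] = 0.5 * (positions[i] + positions[i - 1])
        positions = np.clip(positions, lower, upper)

        # Keep the best position seen so far as the food source.
        fitness = np.array([objective(p) for p in positions])
        best = int(np.argmin(fitness))
        if fitness[best] < food_fitness:
            food, food_fitness = positions[best].copy(), float(fitness[best])
    return food, food_fitness


def toy_objective(weights):
    """Hypothetical stand-in objective with its minimum at (0.5, 0.1)."""
    lam, mu = weights
    return (lam - 0.5) ** 2 + (mu - 0.1) ** 2


best_weights, best_value = salp_swarm_minimize(toy_objective, [0.05, 0.05], [1.0, 1.0])
```

A multi-objective variant would replace the single best position with a repository of non-dominated positions and draw the food source from it, as steps 5 to 14 of Algorithm 1 describe.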
To solve this multi-objective problem, one can use multi-objective optimization algorithms like MSSA, as shown in Algorithm 1 [32]. MSSA can find trade-offs between multiple objectives to obtain the best parameter configuration. This approach helps in adjusting the values of $\lambda$ and $\mu$ to meet the requirements of the forecasting task and strike a balance between point and interval predictions, thereby improving model performance and adaptability.

Through multi-objective optimization algorithms, researchers can gain a better understanding of the trade-offs between parameters and find the optimal parameter configuration to address the complexity and uncertainty of financial time series forecasting tasks. This approach contributes to enhancing the model's robustness and adaptability to better cope with market volatility and risk in the financial domain.

4. Experimental setup

In this section, we provide a detailed overview of the key configurations involved in our experiments, including the dataset, model evaluation metrics, the platform environment for model execution, and an introduction to the baseline models we have chosen.

4.1. Dataset

In this study, we selected four representative financial time series datasets, which contain historical trading data from multiple financial markets. These datasets were chosen based on their widespread applicability and the significant challenges they present in the financial domain. Specifically, we utilized the Brent crude oil dataset (Brent), the carbon emission trading dataset (Carbon), the ETH-USD cryptocurrency dataset, and the Spot-FI electricity price dataset. These datasets span different markets to ensure both diversity and representativeness in the experiment. All data collected are daily observations, and we partitioned them into training and test sets, with the first 80% allocated for training and the remaining 20% for testing. Links for data retrieval are also provided. The detailed characteristics of these datasets are as follows (see Fig. 2 and Table 1).

• Brent crude oil dataset (https://www.investing.com): This dataset consists of time series data for Brent crude oil prices. The dataset exhibits a high standard deviation (20.36), indicating that oil prices are highly influenced by global economic conditions, geopolitical events, supply chain disruptions, and other external


Fig. 2. Four financial time series figures. The first is the Brent crude oil price dataset. The second is the EUA carbon emission quota market dataset. The third is the ETH-USD
cryptocurrency price dataset. The fourth is the spot market price dataset for electricity in Finland.

Table 1
Dataset statistics and train-test splits.
Dataset Count Mean Std Min 25% 50% 75% Max Train window (80%) Test window (20%)
Brent 1399 70.69 20.36 9.12 61.04 70.52 80.51 133.18 2018.01.02–2022.05.27 2022.05.28–2023.07.10
Carbon 874 16.26 8.10 4.36 7.39 18.08 24.02 29.77 2017.01.01–2018.11.30 2018.12.01–2019.05.24
ETH-USD 1774 1120.20 1221.35 84.31 206.74 459.80 1823.54 4812.09 2017.11.10–2021.09.28 2021.09.29–2022.09.18
Spot-FI 2261 36.20 12.87 2.69 28.61 34.57 43.84 107.42 2015.01.01–2019.12.13 2019.12.14–2021.03.10

factors, leading to significant volatility. Notably, during periods of global crises, such as the COVID-19 pandemic or decisions made by the Organization of the Petroleum Exporting Countries (OPEC), oil prices can experience sharp fluctuations. The training period (from January 2, 2018, to May 27, 2022) and testing period (from May 28, 2022, to July 10, 2023) span a time of global market instability. This includes notable events that affected oil markets, such as the pandemic. The high volatility during these periods presents a significant challenge for predictive models, particularly when trying to capture the impact of unforeseen events. Consequently, this dataset serves as a testbed for evaluating a model's robustness and adaptability under extreme market conditions.

• EUA carbon emission dataset (https://www.eex.com/en): This dataset contains data related to the European Union Emissions Trading System (EU ETS) carbon allowances. Given the increasing global emphasis on carbon emissions reduction and the growing interest in carbon trading markets, the volatility in carbon pricing is unique. The dataset has a standard deviation of 8.10, which is lower than that of Brent crude oil and ETH-USD, indicating that carbon markets are generally more stable. The price fluctuations are mainly influenced by policy changes, regulations, and global climate action initiatives, which tend to have less dramatic market impacts compared to more speculative markets. In this context, the challenge lies not only in accurately predicting carbon prices but also in modeling the potential effects of policy shifts, regulatory changes, and emission targets. This requires designing a model capable of capturing subtle market fluctuations in a relatively stable trading environment.

• ETH-USD dataset (https://www.investing.com): The ETH-USD dataset tracks the exchange rate between Ethereum and the US dollar. This market exhibits pronounced volatility (standard deviation of 1221.35), particularly when compared to traditional financial markets. The price dynamics of cryptocurrencies are influenced by a wide range of factors, including investor sentiment, market liquidity, technological advancements, and regulatory developments. The dataset spans a period of rapid expansion in the cryptocurrency market, encompassing significant price fluctuations, with values ranging from a low of 84.31 to a high of 4812.09. This wide disparity highlights the inherent uncertainty in the crypto market. The training and testing periods reveal notable market cycles, especially between 2017 and 2021, during which the volatility of Bitcoin and other digital assets had a significant impact on the ETH-USD market. Predicting prices in this market requires attention not only to traditional market factors but also to the unique dynamics driven by speculative behavior, technological innovation, and evolving investor psychology.

• Spot-FI electricity price dataset (https://www.nordpoolgroup.com): The Spot-FI dataset consists of day-ahead electricity prices in the Finnish market. Compared to the highly volatile oil and cryptocurrency markets, electricity prices in Finland tend to be relatively stable, with a standard deviation of 12.87. This price stability is attributed to the regulated nature of the electricity market, which is influenced by factors such as seasonal demand, climate change, energy policies, and power supply constraints. As electricity is a fundamental infrastructure service, its market typically exhibits strong cyclical and predictable patterns, especially in regions where prices are subject to government regulation and oversight. The dataset spans a longer time period (from January 1, 2015, to March 10, 2021), during which the market experienced various policy changes, energy transitions, and shifts in supply dynamics, especially with the increasing share of renewable energy. The challenge for modeling in this context is to balance high predictive accuracy with the need to capture the effects of long-term trends, seasonal demand patterns, and regulatory adjustments.
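The chronological 80/20 partition and the summary statistics reported in Table 1 can be sketched with the standard library alone. The helper names `chronological_split` and `summarize` and the price values are hypothetical and serve only as an illustration; the quartile convention used for Table 1 is not stated in the paper, so the "inclusive" method below is an assumption.

```python
from statistics import mean, quantiles, stdev


def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence without shuffling: the first part trains, the rest tests."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def summarize(series):
    """Summary statistics of the kind listed in Table 1."""
    q25, q50, q75 = quantiles(series, n=4, method="inclusive")
    return {
        "count": len(series),
        "mean": mean(series),
        "std": stdev(series),
        "min": min(series),
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": max(series),
    }


# Hypothetical daily prices, for illustration only (not taken from the paper's datasets).
prices = [70.1, 71.3, 69.8, 72.4, 73.0, 74.2, 73.5, 75.1, 76.0, 74.8]
train, test = chronological_split(prices)
stats = summarize(train)
```

Keeping the split chronological means the test window always lies after the training window, which matches the train and test date ranges given in Table 1.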


4.2. Evaluation indicators

The standards we use to assess the quality of point predictions involve comparing the predicted results with our ground truth values to quantify the differences between them [13]. In time series forecasting, three commonly used and widely adopted evaluation metrics are the mean absolute percentage error (MAPE), mean absolute error (MAE) and root mean square error (RMSE). Additionally, the median absolute percentage error (MdAPE) characterizes the errors between predicted and actual values. Furthermore, R-squared ($R^2$) assesses the correlation between predicted and actual values, while the index of agreement (IA) measures how closely the predictions agree with the actual values. These metrics help quantify the performance of point predictions by measuring how closely they align with the true values, and they are widely used in assessing the accuracy and quality of time series forecasting models.

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%, \tag{28}$$

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|, \tag{29}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}, \tag{30}$$

$$\mathrm{IA} = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(\left|\hat{y}_i - \bar{y}\right| + \left|y_i - \bar{y}\right|\right)^2}, \tag{31}$$

$$R^2 = \left(\frac{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2 \sum_{i=1}^{N}\left(\hat{y}_i - \bar{\hat{y}}\right)^2}}\right)^2, \tag{32}$$

$$\mathrm{MdAPE} = \mathrm{median}\left(\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\%\right), \tag{33}$$

where $N$ is the number of samples, $y_i$ and $\hat{y}_i$ are the actual and predicted values, and $\bar{y}$ and $\bar{\hat{y}}$ are their respective means.

This study uses four evaluation metrics to assess the proposed model in probabilistic forecasting. These metrics aim to provide a comprehensive assessment of the accuracy and reliability of the model in probabilistic forecasting tasks. Here is a brief description of these metrics.

Prediction Interval Coverage Probability (PICP): PICP measures the proportion of actual observed values that fall within the probability intervals generated by the model. It is an indicator of the model's ability to cover real data. Ideally, PICP should be close to the pre-specified confidence level, such as 95%.

$$\mathrm{PICP} = \frac{1}{N}\sum_{i=1}^{N} I_i \times 100\%, \tag{34}$$

where $N$ represents the number of samples, and $I_i$ represents an indicator function that takes a value of 1 when the observed value falls within the predicted interval and 0 otherwise.

Prediction Interval Normalized Average Width (PINAW): PINAW quantifies the average width of the probability intervals generated by the model. This metric helps assess the model's estimation of uncertainty, and a narrower probability interval is generally preferred.

$$\mathrm{PINAW} = \frac{1}{N}\sum_{i=1}^{N}\frac{U_{i,t} - L_{i,t}}{x_{max} - x_{min}} \times 100\%, \tag{35}$$

where $N$ represents the number of samples, $U_{i,t}$ represents the upper bound of the $i$th sample, $L_{i,t}$ represents the lower bound of the $i$th sample, $x_{max}$ represents the maximum value of the observed values, and $x_{min}$ represents the minimum value of the observed values.

Accumulated Width Deviation (AWD): AWD measures the difference between the width of the probability intervals generated by the model and the ideal width. It provides a measure of bias in the model's probability predictions, indicating whether the model tends to overestimate or underestimate uncertainty.

$$\mathrm{AWD} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{AWD}_i, \tag{36}$$

$$\mathrm{AWD}_i = \begin{cases} \dfrac{L_{i,t} - x_i}{U_{i,t} - L_{i,t}}, & x_i < L_{i,t} \\ 0, & L_{i,t} \le x_i \le U_{i,t} \\ \dfrac{x_i - U_{i,t}}{U_{i,t} - L_{i,t}}, & x_i > U_{i,t}, \end{cases} \tag{37}$$

where $N$ represents the number of samples, $\mathrm{AWD}_i$ represents the cumulative width deviation of the $i$th sample, $L_{i,t}$ represents the lower bound of the $i$th sample, $U_{i,t}$ represents the upper bound of the $i$th sample, and $x_i$ represents the observed value of the $i$th sample.

Mean Pinball Loss (MPL): MPL is a loss function used to measure the difference between the probability intervals generated by the model and the actual observed values. It considers the distance of each point within the interval from the actual value and serves as a comprehensive performance evaluation metric.

$$\mathrm{MPL}_t = \frac{1}{N}\sum_{i=1}^{N}\sum_{q \in Q} L_{q,t}\left(x_i - \hat{y}_{i,q}\right), \tag{38}$$

where $N$ represents the number of samples, $Q$ represents the set of quantiles, $x_i$ represents the observed value, $\hat{y}_{i,q}$ represents the predicted value corresponding to the quantile, and $L_{q,t}$ represents the quantile loss function.

Furthermore, to evaluate the differences between our proposed model and the benchmark models, we conducted the Giacomini-White (GW) [8] test in this study. The GW test can be seen as an extension of the Diebold–Mariano (DM) test that measures Conditional Predictive Ability (CPA) instead of Unconditional Predictive Ability (UPA). This method is commonly used to compare the forecasting performance of two time series models and determine which model's predictions are superior.

$$\mathrm{GW} = \frac{\sum_{t=1}^{T}\left[L_t^{A}\left(y_t, \hat{y}_t\right) - L_t^{B}\left(y_t, \hat{y}_t\right)\right]}{\sqrt{S^2 / p}}, \tag{39}$$

where $L(\cdot)$ represents the loss function for prediction errors, and $S^2$ is the sample variance of the loss differential $d_t = L_t^{A}\left(y_t, \hat{y}_t\right) - L_t^{B}\left(y_t, \hat{y}_t\right)$. $p$ denotes the degrees of freedom. Given the predictions from model A and model B, the test evaluates whether the loss of model A is significantly higher than that of model B. If so, it indicates that the predictions from model B are significantly more accurate than those from model A.

4.3. Baseline models

Patch-based temporal self-attention (PatchTST) [34] is a model for financial time series data that utilizes self-attention mechanisms. It decomposes time series data into multiple patches and applies self-attention within these patches to capture temporal dependencies.

Transformer [35] is a powerful deep learning architecture widely used for various sequence modeling tasks, including financial time series forecasting. It employs self-attention mechanisms to handle sequential data and can capture long-range dependencies.

Informer [36] is specifically designed for time series forecasting and employs multi-layer self-attention mechanisms and CNN structures. It is suitable for handling long sequences in financial data.

Neural basis expansion analysis time series (NBEATS) [37] is a neural network-based model that processes time series by combining multiple base modules. It is commonly used in the financial domain for predicting stock prices and other financial indicators.

Non-autoregressive hierarchical time series (NHITS) [38] is a non-autoregressive model capable of simultaneously predicting multiple


Table 2
Hyperparameter search space for the benchmark models.
Model Parameter values
TFT 𝐻: (64, 128, 256); 𝐿: (2, 3, 4); 𝛼: (0.1, 0.2, 0.3); 𝛽: (4, 8);
PatchTST 𝐻: (64, 128, 256); 𝐿: (2, 3, 4); 𝑃 : (4, 8, 16);
Transformer 𝐻: (64, 128, 256); ℎ: (4, 8); 𝐿: (2, 3, 4);
Informer 𝐻: (64, 128, 256); ℎ: (4, 8); 𝐿: (2, 3, 4);
NBEATS 𝐵: (3, 4, 5); 𝑆: (2, 3);
NHITS 𝐻: (64, 128, 256); 𝐿: (2, 3, 4);
ESRNN 𝐻: (64, 128, 256); 𝜏: (24, 56, 168);
LSTM 𝐻: (64, 128, 256); 𝐿: (1, 2, 3); 𝛼: (0.1, 0.2, 0.3);
TCN 𝐹 : (64, 128, 256); 𝑘: (3, 5, 7); 𝐻: (64, 128, 256); 𝐿: (2, 3, 4);
MLP 𝐻: (64, 128, 256); 𝐿: (2, 3, 4);
TFMADR TFT component: 𝐻: (64, 128, 256); 𝐿: (2, 3, 4); 𝛼: (0.1, 0.2, 0.3); 𝛽: (4, 8);
DeepAR component: 𝐻: (64, 128, 256); 𝐿: (2, 3, 4); 𝛼: (0.1, 0.2, 0.3);
MSSA component: 𝐺: (100, 200, 300); 𝐶𝑅: (0.1, 0.5, 0.9); 𝑀𝑅: (0.01, 0.05, 0.1);
Loss function: 𝜆: (0.05, 0.1, 0.5, 1.0); 𝜇: (0.05, 0.1, 0.5, 1.0);

Note: 𝐻: the number of nodes in the hidden layers. 𝐿: the number of layers in the network. 𝛼: the proportion of randomly deactivated nodes. 𝑃: the length of the patches in the PatchTST model. ℎ: the number of attention heads in the attention mechanism. 𝐵: the number of blocks in the NBEATS model. 𝑆: the number of stacks in the NBEATS model. 𝜏: the seasonality parameter in the ESRNN model. 𝐹: the number of convolution filters in the TCN model. 𝑘: the size of the convolution kernels in the TCN model. 𝐺: the number of generations in MSSA for iterative optimization. 𝐶𝑅: the crossover rate in MSSA, defining the proportion of crossover in the population. 𝑀𝑅: the mutation rate in MSSA, defining the rate at which solutions are mutated. 𝜆: the fusion parameter for the point prediction loss $\mathcal{L}_P$. 𝜇: the fusion parameter for the interval prediction loss $\mathcal{L}_I$.
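To make the role of the fusion parameters 𝜆 and 𝜇 in Table 2 concrete, the following sketch evaluates a weighted objective of the form given in Eq. (25). It assumes mean squared error for the point loss and a Gaussian negative log-likelihood for the interval loss; the paper names these only as typical choices, and the function names and sample numbers here are hypothetical.

```python
import math


def mse(x, y_point):
    """Point loss: mean squared error between observations and point forecasts."""
    return sum((a - b) ** 2 for a, b in zip(x, y_point)) / len(x)


def gaussian_nll(x, mean, std):
    """Interval loss: average negative log-likelihood under N(mean, std^2)."""
    total = 0.0
    for a, m, s in zip(x, mean, std):
        total += 0.5 * math.log(2.0 * math.pi * s * s) + (a - m) ** 2 / (2.0 * s * s)
    return total / len(x)


def combined_loss(x, y_point, mean, std, lam, mu_weight):
    """Weighted sum lam * L_P + mu_weight * L_I, mirroring the form of Eq. (25)."""
    return lam * mse(x, y_point) + mu_weight * gaussian_nll(x, mean, std)


# Hypothetical numbers, chosen only to show the call pattern.
observed = [1.0, 2.0]
point_forecast = [1.0, 3.0]
dist_mean = [1.0, 2.0]
dist_std = [1.0, 1.0]
loss = combined_loss(observed, point_forecast, dist_mean, dist_std, lam=0.5, mu_weight=0.1)
```

The weight is named `mu_weight` in the code to keep it apart from the Gaussian mean, since the paper uses 𝜇 for both quantities.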

time series, making it suitable for scenarios in the financial domain that require multivariate forecasting.

Exponential smoothing recurrent neural network (ESRNN) [39] combines classical exponential smoothing methods with recurrent neural networks for time series forecasting in the financial domain. It can adapt to different data patterns, including seasonality and trends.

Long short-term memory (LSTM) [15] is a recurrent neural network commonly used for financial time series forecasting. It can capture long-term dependencies and is suitable for analyzing dynamic changes in financial markets.

Temporal convolutional network (TCN) [40] is a model based on convolutional neural networks designed for financial time series data. It effectively captures temporal dependencies and performs well in financial forecasting tasks.

Multilayer perceptron (MLP) [22] is a traditional feedforward neural network often used for financial time series forecasting. It consists of multiple fully connected layers and is suitable for various financial data modeling tasks.

4.4. Experiment settings

To ensure fairness and comprehensiveness in the comparisons, we performed hyperparameter tuning using cross-validation across all model configurations. The specific hyperparameter search ranges are detailed in Table 2, with bolded parameters representing the best-performing values selected for prediction. This process provided deeper insights into how to achieve optimal performance for each model.

It is important to note that, for consistency across experiments, certain parameters were held constant across all models. Specifically, the batch size was fixed at 32, the learning rate was set to 0.001, and the number of training epochs was set to 200. Additionally, the input length, denoted as $T$, was standardized: for deep learning models, the input length was set to 128, while for machine learning models, it was set to 64. This distinction was based on experimental optimization results, as we found that machine learning models did not perform optimally with an input length of 128, whereas deep learning models showed better performance with longer input sequences.

All experiments were implemented on the Python platform, utilizing frameworks and tools such as TensorFlow and PyTorch to ensure computational efficiency and the reliability of results across different models. This experimental setup not only enhanced the comparability of the outcomes but also provided strong support for model optimization.

5. Empirical analysis

In this section, we primarily delve into the experimental results. Firstly, we conduct an in-depth analysis of the point prediction results of the integrated forecasting system. Subsequently, we integrate these point prediction results into interval prediction channels for probabilistic forecasting analysis. When evaluating predictive performance, we rely on a comprehensive evaluation metric system to assess the performance of the two different prediction channels. It is worth noting that, in this research phase, we plan to carry out four distinct experiments to evaluate the performance of the integrated forecasting system developed in this paper in terms of point or interval prediction. As model outcomes may vary with each run, we take the average of five runs for the final data results to ensure a fair comparative analysis.

5.1. Experiment I: Analysis of point forecasting results

To validate the superiority of our proposed model in the point forecasting channel, we conducted comparative analyses with 10 other state-of-the-art baseline models. We evaluated the forecasting performance using six different evaluation metrics. The first five models are based on Transformer architectures, which are suitable for long-term financial sequence forecasting and demonstrate relatively superior predictive performance. The latter five models represent mainstream forecasting methods and serve as benchmark models during the model-building process. Therefore, the experimental results can be categorized into two gradient model groups. Table 3 presents their one-step-ahead forecasts (one day ahead). Fig. 3 shows the 90% and 80% prediction intervals.

In the first gradient model group, observing the results in Table 3, it is evident that our proposed TFMADR model consistently outperforms all baseline models across all datasets. Firstly, compared to the state-of-the-art baseline model TFT, our TFMADR model exhibits significant improvements with $\mathrm{MAPE}_1 = 34.11\%$, $\mathrm{MAPE}_2 = 65.91\%$, $\mathrm{MAPE}_3 = 48.57\%$, and $\mathrm{MAPE}_4 = 43.41\%$ on four different datasets, respectively. This is attributed to TFMADR's ability to inherit the advantages of the attention mechanism in Transformers and incorporate self-regulating fusion parameters (also known as correction factors). These correction factors allow the TFMADR model to explore variations within time-varying sequences. When the model detects significant trend changes in the data, it triggers a correction mechanism to adjust itself, thus aligning its predictions with the true trend changes and reducing prediction errors. Secondly, in comparison to the second-best model, PatchTST, TFMADR demonstrates a substantial


Table 3
Comparison of the accuracy of point forecasting among different models in four financial markets.
DataSet Models MAE MAPE RMSE R2 IA MdAPE
Brent TFMADR 1.0696 0.7115 0.8705 0.9939 0.9985 2.1230
TFT 2.9098 0.9542 1.3739 0.8137 0.9488 5.6633
PatchTST 2.8714 1.1015 1.5494 0.8121 0.9450 6.1725
Transformer 2.8472 1.3644 2.2759 0.8061 0.9290 6.5728
Informer 4.4445 1.6410 2.7353 0.7785 0.9343 7.6199
NBEATS 4.6229 1.5729 2.8517 0.7164 0.9106 7.9323
NHITS 5.7980 1.8438 3.0277 0.6835 0.8799 8.3325
ESRNN 5.7455 1.9047 2.9280 0.6973 0.9207 8.4417
LSTM 6.0565 2.2386 3.2026 0.6625 0.8199 9.7304
TCN 6.4277 2.5077 3.4638 0.6287 0.7942 9.8507
MLP 6.6425 2.8451 3.8253 0.6058 0.7586 9.9580
ETH-USD TFMADR 0.8535 0.5107 0.6642 0.9927 0.9981 1.6895
TFT 2.6394 0.8473 1.5413 0.7858 0.9403 4.3509
PatchTST 3.6794 1.1694 1.5832 0.7820 0.9303 4.4410
Transformer 3.6357 1.2735 1.8316 0.7861 0.9396 4.4648
Informer 5.2135 1.5830 1.9250 0.7785 0.9499 4.7639
NBEATS 4.9717 1.8051 2.2167 0.7540 0.9313 5.2137
NHITS 5.1717 1.9553 2.9167 0.6940 0.9013 5.4137
ESRNN 5.8806 2.4687 3.1879 0.7329 0.9083 5.5963
LSTM 6.1347 2.5583 3.6402 0.6413 0.8299 6.5376
TCN 6.4471 2.6428 3.7258 0.6725 0.8824 6.7253
MLP 6.7258 2.8146 3.7823 0.6942 0.8729 6.9453
Spot-FI TFMADR 1.3970 0.8235 1.9141 0.9465 0.9846 2.6498
TFT 3.6042 1.2235 1.9153 0.8690 0.9678 1.8660
PatchTST 4.0555 1.2503 2.9146 0.8781 0.9517 2.7809
Transformer 3.9264 1.4307 2.3067 0.8564 0.9608 2.3713
Informer 4.4094 1.5235 2.8879 0.8085 0.9499 4.7639
NBEATS 5.0072 1.8527 3.4210 0.7540 0.9313 5.2137
NHITS 5.5737 1.9614 4.3433 0.6940 0.9013 5.4137
ESRNN 5.6507 2.1841 3.1598 0.7329 0.9083 5.5963
LSTM 6.3610 2.5175 4.5863 0.6413 0.8299 6.5376
TCN 6.5086 2.8437 4.6812 0.7864 0.8475 6.7353
MLP 6.8364 2.9043 4.7450 0.8135 0.8643 6.8261
Carbon TFMADR 0.7957 0.4372 1.5736 0.9929 0.9982 1.5901
TFT 3.7020 0.6270 1.9804 0.7886 0.9416 3.9320
PatchTST 3.6678 0.9844 2.0236 0.7925 0.9477 4.1131
Transformer 3.6612 1.2073 2.0339 0.7932 0.9418 4.2223
Informer 4.1289 1.2947 2.1056 0.7370 0.9259 4.8991
NBEATS 4.6273 1.6209 2.7601 0.7970 0.9408 4.5274
NHITS 4.3396 1.8835 2.3844 0.7606 0.9286 5.1010
ESRNN 4.9523 2.1063 2.8219 0.7590 0.9280 4.9936
LSTM 5.1345 2.4701 3.5784 0.5933 0.8298 5.5796
TCN 5.2631 2.9672 3.7368 0.6186 0.8364 5.7535
MLP 5.6471 3.1301 3.8253 0.6428 0.8446 5.7754

Note: In the table, the optimal predictions are indicated by bold green font, the second-best model
predictions are shown in purple font, and the third-best model predictions are highlighted in orange
font. The same color scheme applies throughout.
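The six point-forecast metrics reported in Table 3 can be computed along the following lines. This is a minimal sketch, assuming that IA denotes the index of agreement and that R2 is the squared Pearson correlation between actual and predicted values; the function name `point_metrics` and the sample values are hypothetical, and the percentage metrics assume no actual value equals zero.

```python
import numpy as np


def point_metrics(y_true, y_pred):
    """MAE, MAPE, RMSE, IA, R2 and MdAPE for one pair of actual/predicted series."""
    y = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_pred, dtype=float)
    err = y - y_hat
    ape = np.abs(err / y)  # absolute percentage errors; assumes y contains no zeros
    y_bar, y_hat_bar = y.mean(), y_hat.mean()

    # Index of agreement: 1 minus squared error over the "potential error".
    ia_den = np.sum((np.abs(y_hat - y_bar) + np.abs(y - y_bar)) ** 2)
    # Pearson correlation between actual and predicted values.
    corr_num = np.sum((y - y_bar) * (y_hat - y_hat_bar))
    corr_den = np.sqrt(np.sum((y - y_bar) ** 2) * np.sum((y_hat - y_hat_bar) ** 2))

    return {
        "MAE": float(np.mean(np.abs(err))),
        "MAPE": float(np.mean(ape) * 100.0),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "IA": float(1.0 - np.sum(err ** 2) / ia_den),
        "R2": float((corr_num / corr_den) ** 2),
        "MdAPE": float(np.median(ape) * 100.0),
    }


# Hypothetical values, for illustration only.
metrics = point_metrics([100.0, 102.0, 101.0, 105.0], [101.0, 101.0, 102.0, 104.0])
```

Because every error in this toy example has magnitude 1, MAE and RMSE coincide; on real forecasts RMSE is at least as large as MAE, which is a quick sanity check when reading such tables.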

advantage in financial time series forecasting. TFMADR combines the self-attention mechanism of Transformers with time series-specific techniques, enabling it to more effectively capture seasonality, periodicity, and long-term dependencies within time series data. This results in more accurate predictions and enhanced modeling capabilities, making it well-suited for handling complex and volatile data in financial markets. Lastly, relative to the traditional Transformer, which is the third-best model, PatchTST holds a significant advantage in financial time series forecasting. It segments time series data into smaller "patches", allowing the model to handle large volumes of data more efficiently, reducing computational complexity. Simultaneously, it excels at capturing local patterns and rapid changes within time series data, making it particularly suitable for high-frequency trading and situations with significant market volatility. This results in more reliable forecasting performance.

In the second gradient model group, we can observe that the TFMADR model exhibits outstanding performance across all configurations of baseline models. This remarkable advantage can be attributed to the advanced optimization strategy employed in our TFMADR model, enabling it to efficiently optimize model parameters during the process of exploring data features and rapidly converge to optimal solutions. Upon further examination of Table 3, we can notice that the interpretable NBEATS model also performs exceptionally well. However, financial time series are often influenced by external factors, displaying sharp fluctuations. Therefore, developed models need to possess the capability to capture such abrupt features.

Unfortunately, other classical time series forecasting models like ESRNN, LSTM, TCN, and MLP exhibit certain limitations in this regard. Their predictive performance is relatively less stable, making them less adept at handling the complex fluctuations seen in financial time series. Thus, the success of the TFMADR model highlights its exceptional performance in dealing with this instability and capturing abrupt features. This finding holds significant implications for time series forecasting in the financial domain.

Furthermore, to assess the stability of our proposed model, we conducted a multi-step forecasting error analysis for the next 20 days. Fig. 4 presents the multi-step forecasting results for different financial markets. In general, as observed from the color variations in Fig. 4, the forecasting errors of each model gradually increase with an increase in the time step. When observing from right to left, the colors in Fig. 4 also deepen, indicating a gradual decline in the predictive performance of different models. Additionally, as one moves further to the left, the surface of Fig. 4 becomes steeper, suggesting a faster growth rate of prediction errors.


Fig. 3. The four plots display the 90% and 80% prediction intervals for four different datasets. The first plot represents the point-interval prediction for the Brent dataset, where
the data is relatively smooth, and the probability intervals are distributed uniformly. The second plot shows the point-interval prediction for the Carbon dataset, where the data
exhibits jump discontinuities, leading to considerable variation in the interval distribution. The third plot presents the point-interval prediction for the ETH-USD dataset, where
the data distribution is the most stable. The fourth plot illustrates the point-interval prediction for the Spot-FI dataset, which experiences the largest fluctuations, yet the interval
distribution remains relatively uniform.

Fig. 4. The comparison of prediction performance across different models is illustrated in the following figure. The color depth and surface height in the figure reflect the magnitude
of errors, with darker colors and higher surfaces indicating larger errors. The steepness of the surface reflects the stability of the model, with smoother surfaces indicating greater
stability. It is worth noting that TFMADR exhibits the optimal fusion mode, demonstrating the lowest errors and greater stability.
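Prediction intervals of the kind shown in Fig. 3 can be built from a Gaussian mean and standard deviation and then scored with coverage, width, and deviation measures. The sketch below is an illustration under stated assumptions: the interval is symmetric and written with an explicit positive critical value, AWD uses the usual relative-overshoot form (zero inside the interval), and the pinball loss is the standard quantile loss. The function names and sample numbers are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def gaussian_interval(mu, sigma, alpha=0.1):
    """Symmetric (1 - alpha) interval from a Gaussian mean and standard deviation."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # positive critical value
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return mu - z * sigma, mu + z * sigma


def interval_metrics(x, lower, upper):
    """PICP (%), PINAW (%) and AWD for observations x and interval bounds."""
    x, lower, upper = (np.asarray(a, dtype=float) for a in (x, lower, upper))
    width = upper - lower
    covered = (x >= lower) & (x <= upper)
    picp = covered.mean() * 100.0
    pinaw = np.mean(width / (x.max() - x.min())) * 100.0
    # Width deviation: zero inside the interval, relative overshoot outside it.
    below = np.where(x < lower, (lower - x) / width, 0.0)
    above = np.where(x > upper, (x - upper) / width, 0.0)
    awd = np.mean(below + above)
    return {"PICP": float(picp), "PINAW": float(pinaw), "AWD": float(awd)}


def pinball_loss(x, q_pred, q):
    """Mean pinball (quantile) loss of predictions q_pred at quantile level q."""
    diff = np.asarray(x, dtype=float) - np.asarray(q_pred, dtype=float)
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))


# Hypothetical means, standard deviations and observations.
lower, upper = gaussian_interval([10.0, 11.0, 12.0, 13.0], [1.0, 1.0, 1.0, 1.0], alpha=0.1)
scores = interval_metrics([10.5, 14.0, 12.0, 11.0], lower, upper)
```

In this toy example two of the four observations fall outside their 90% intervals, so the coverage is well below the nominal level and AWD is positive; a well-calibrated model would show coverage near the nominal level with a small AWD.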


Table 4
Comparison of the accuracy of electricity price day-ahead forecasting among different models.

Dataset   Model         FICP (90%)   FINAW (90%)   AWD (90%)   FICP (80%)   FINAW (80%)   AWD (80%)
Brent     TFMADR        0.9264       0.1403        0.0291      0.8426       0.2495        0.1298
          TFT           0.9094       0.1564        0.0489      0.8207       0.2892        0.1390
          PatchTST      0.9171       0.1586        0.0653      0.8007       0.2560        0.1446
          Transformer   0.8903       0.1663        0.0751      0.8133       0.3022        0.1575
          Informer      0.8875       0.1793        0.0870      0.7805       0.3506        0.1751
          NBEATS        0.8596       0.1803        0.1015      0.7528       0.3887        0.2059
          NHITS         0.8423       0.2096        0.1268      0.7125       0.3950        0.2156
          ESRNN         0.8269       0.2136        0.1283      0.7068       0.4096        0.2353
          LSTM          0.8193       0.2379        0.1395      0.6890       0.4172        0.2488
          TCN           0.8145       0.2449        0.1569      0.6713       0.4526        0.2679
          MLP           0.8107       0.2577        0.1837      0.6588       0.4490        0.3057
ETH-USD   TFMADR        0.9328       0.1640        0.0538      0.8370       0.2457        0.1343
          TFT           0.9214       0.1819        0.0712      0.8225       0.2743        0.1429
          PatchTST      0.9206       0.1753        0.1083      0.7926       0.2826        0.2569
          Transformer   0.9074       0.2055        0.1305      0.7833       0.3057        0.2981
          Informer      0.8856       0.2321        0.1584      0.7761       0.3429        0.2723
          NBEATS        0.8614       0.2581        0.1651      0.7426       0.3843        0.2831
          NHITS         0.8536       0.2623        0.1755      0.7350       0.4167        0.2839
          ESRNN         0.8503       0.2791        0.1795      0.7308       0.4485        0.2767
          LSTM          0.8452       0.2837        0.2027      0.7014       0.4578        0.3079
          TCN           0.8373       0.2922        0.2143      0.6917       0.4772        0.3116
          MLP           0.8021       0.3178        0.2351      0.6816       0.4827        0.3306
Spot-FI   TFMADR        0.9459       0.1429        0.0536      0.8454       0.2538        0.1587
          TFT           0.9356       0.1806        0.0837      0.8264       0.2681        0.2455
          PatchTST      0.9257       0.2041        0.0706      0.7969       0.2594        0.2196
          Transformer   0.9127       0.2253        0.1379      0.7453       0.3045        0.2787
          Informer      0.8826       0.2553        0.1588      0.7526       0.2943        0.3325
          PatchTST      0.8725       0.2638        0.1464      0.7483       0.3372        0.3586
          NHITS         0.8537       0.3028        0.1556      0.7203       0.3682        0.3739
          ESRNN         0.8241       0.3826        0.2082      0.7036       0.3925        0.4274
          LSTM          0.8056       0.4095        0.2409      0.7031       0.4289        0.4481
          TCN           0.8014       0.4254        0.2846      0.6877       0.4355        0.4513
          MLP           0.7826       0.4526        0.2954      0.6795       0.4967        0.4825
Carbon    TFMADR        0.9342       0.1473        0.0715      0.8259       0.2186        0.1976
          TFT           0.9074       0.1696        0.1028      0.8036       0.2677        0.2058
          PatchTST      0.8692       0.1982        0.1206      0.7826       0.3037        0.2405
          Transformer   0.8822       0.2497        0.1861      0.7936       0.3424        0.2738
          Informer      0.8250       0.3045        0.2144      0.7638       0.3539        0.3165
          NBEATS        0.8072       0.2895        0.2386      0.7472       0.3819        0.3425
          NHITS         0.7949       0.3248        0.2563      0.6950       0.4167        0.3898
          ESRNN         0.7625       0.3766        0.2948      0.6853       0.4254        0.4049
          LSTM          0.7259       0.4254        0.3271      0.6468       0.4637        0.4427
          TCN           0.7064       0.4584        0.3538      0.6346       0.4870        0.4725
          MLP           0.6873       0.4726        0.3624      0.6005       0.4964        0.5089
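
As a reading aid for the three interval metrics reported in Table 4, the sketch below computes them under commonly used definitions: FICP as the share of observations falling inside the interval, FINAW as the mean interval width normalized by the range of the observations, and AWD as the mean relative distance by which missed observations fall outside the interval. These definitions are assumptions and may differ in detail from the formulas used in this paper; the function name `interval_metrics` and the sample data are hypothetical.

```python
from typing import Dict, Sequence


def interval_metrics(
    y: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> Dict[str, float]:
    """Coverage, normalized width, and width deviation of prediction intervals.

    Assumes every interval has positive width and y is not constant.
    """
    n = len(y)
    value_range = max(y) - min(y)  # normalizer for the average width
    covered = 0
    width_sum = 0.0
    deviation_sum = 0.0
    for obs, lo, hi in zip(y, lower, upper):
        width = hi - lo
        width_sum += width
        if lo <= obs <= hi:
            covered += 1  # observation inside the interval
        elif obs < lo:
            deviation_sum += (lo - obs) / width  # missed below the lower bound
        else:
            deviation_sum += (obs - hi) / width  # missed above the upper bound
    return {
        "FICP": covered / n,
        "FINAW": width_sum / (n * value_range),
        "AWD": deviation_sum / n,
    }


# Hypothetical toy data, not taken from the paper's experiments.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
lower_bound = [0.5, 1.5, 3.5, 3.5, 4.0]
upper_bound = [1.5, 2.5, 4.5, 4.5, 4.5]
metrics = interval_metrics(y_true, lower_bound, upper_bound)
print(metrics)
```

Under these definitions a good interval forecast has FICP close to (or above) the nominal level with FINAW and AWD as small as possible, which is the pattern the TFMADR rows show in Table 4.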

Specifically, in the four markets, the curve for our proposed TFMADR model exhibits a relatively stable trend, with forecasting errors gradually increasing but at a slower rate. However, in the ETH-USD market, the LSTM model performs relatively poorly, showing a ridge-like pattern, thereby highlighting the limitations of traditional forecasting models when dealing with highly volatile financial time series forecasting tasks.

5.2. Experiment II: Analysis of interval forecasting results

Based on the point forecasting results obtained, we derive interval forecasting results. As shown in Table 4, we analyze the 90% and 80% prediction interval results using three different metrics (FICP, FINAW, and AWD). It is worth noting that these metrics are crucial for assessing the performance of time series forecasting models, especially in tasks like finance that require accurate uncertainty estimation. They help us understand the model's predictive accuracy and its ability to estimate uncertainty, aiding in the selection of the most suitable model and parameter configuration for specific applications.

Upon observing Table 4, it is evident that our proposed TFMADR model continues to exhibit outstanding performance in interval forecasting. Next in line are the TFT, PatchTST, and Transformer models, which rank as the top three benchmark models. This indicates that our proposed model possesses the highest predictive accuracy and the narrowest prediction intervals. However, it is essential to note that a single model can only reflect its own predictive performance and may not comprehensively consider cases with high data volatility.

When dealing with tasks characterized by significant data volatility, single models may reveal certain disadvantages as they might fail to capture complex and rapidly changing trends. In domains like finance, where data exhibits high volatility and is significantly influenced by external factors, prediction becomes more complex and challenging. Nevertheless, our TFMADR model, through optimization strategies and comprehensive performance, manages to achieve the best overall forecasting results even in such highly unstable environments.

Upon closer examination of Table 4, it is noteworthy that mainstream time series forecasting models such as ESRNN, LSTM, TCN, and MLP require intricate model configurations when conducting interval forecasting. In contrast to these traditional models, our TFMADR model internally integrates model fusion factors, implying that after obtaining results from the previous point forecasting channel, TFMADR can autonomously learn the upper and lower bounds of intervals without the need for additional manual parameter adjustments. This provides the model with higher adaptability and flexibility in terms of performance. However, it is essential to emphasize that a single model followed by a quantile regression model capable of outputting prediction intervals does not have any model connection parameters. Consequently, the errors borne by these two models accumulate, posing a challenge for


Fig. 5. The performance comparison of interval prediction across different financial markets is depicted in the figure above. The figure displays the visualization of Mean Pinball
Loss (MPL) errors with a prediction interval coverage of 90%. The interpretation of the figure is similar to that of point prediction, where the color depth and surface height
represent the magnitude of errors, with darker colors and higher surfaces indicating larger errors.
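
The Mean Pinball Loss shown in Fig. 5 can be illustrated with a minimal sketch. It assumes the standard pinball (quantile) loss and, for a 90% interval, takes the interval bounds to be the 0.05 and 0.95 quantile forecasts; the averaging over quantiles and time steps, the function names, and the toy numbers are assumptions rather than the paper's exact procedure.

```python
from typing import Sequence


def pinball_loss(y: float, q_pred: float, tau: float) -> float:
    """Pinball loss of a single tau-quantile forecast q_pred for observation y."""
    diff = y - q_pred
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_pinball_loss(
    y: Sequence[float],
    quantile_preds: Sequence[Sequence[float]],
    taus: Sequence[float],
) -> float:
    """Average pinball loss over all quantile levels and time steps.

    quantile_preds[k][t] is the forecast of the taus[k]-quantile at step t.
    """
    total = 0.0
    count = 0
    for tau, preds in zip(taus, quantile_preds):
        for obs, pred in zip(y, preds):
            total += pinball_loss(obs, pred, tau)
            count += 1
    return total / count


# Hypothetical two-step example with a 90% interval (0.05 and 0.95 quantiles).
observations = [10.0, 12.0]
lower_q = [9.0, 13.0]   # 0.05-quantile forecasts
upper_q = [11.0, 14.0]  # 0.95-quantile forecasts
mpl = mean_pinball_loss(observations, [lower_q, upper_q], [0.05, 0.95])
print(f"MPL = {mpl:.4f}")
```

Because the loss penalizes both intervals that miss the observation and intervals that are needlessly wide, a lower MPL surface in Fig. 5 reflects sharper and better calibrated intervals.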

prediction tasks with high volatility, such as financial sequences. This error accumulation may lead to a decrease in the accuracy of interval predictions in extreme cases.

In summary, the internal model fusion and adaptability of the TFMADR model give it a significant advantage in interval forecasting. However, for tasks that require more precise interval predictions, careful consideration of model configuration and the impact of error accumulation is necessary to ensure the final prediction results are highly reliable. This underscores the critical importance of model selection and parameter configuration in the field of time series forecasting, particularly when interval estimation is required.

Similar to point forecasting, we also present the model's interval forecasting results for the next 20 steps (future 20 days) to highlight its stability in interval prediction. As shown in Fig. 5, the color depth reflects the magnitude of errors, while the trend of the surface reflects the change in error. Overall, the surface exhibits a trend from high on the left to low on the right, with the color intensity transitioning from lighter at the front to darker at the back.

Specifically, we can observe that our proposed TFMADR model demonstrates the most stable forecasting performance across the four financial markets. This implies that in the task of interval prediction for the next 20 days, the TFMADR model exhibits relatively small variations in prediction errors and can provide more consistent and reliable prediction results compared to other models. This is highly valuable information for decision-makers and investors in fields such as the financial market because they can rely more confidently on the interval predictions generated by the TFMADR model for decision-making and investment strategies.

In summary, the results in Fig. 5 further emphasize the outstanding performance and stability of the TFMADR model in interval prediction tasks, providing robust support for addressing forecasting challenges in highly volatile environments like the financial market. This underscores the importance of selecting models that suit task requirements and conducting detailed error analysis.

5.3. Experiment III: Significance test for the models

In Experiment III, we conducted significance tests aimed at evaluating the performance of different models in financial market forecasting. We employed the Giacomini-White test (GW) to perform significance analysis on ten different models across four markets. This type of significance analysis is also applicable to assessing individual models. The basis of the test was the MPL error values using 90% probability intervals. Fig. 6 displays the results based on the MPL evaluation


Fig. 6. Based on the Mean Pinball Loss (MPL), we conducted the GW tests between the proposed TFMADR model and the benchmark models, resulting in GW test results for the
four financial markets. Each small grid initially appears black. When the model predictions in the column corresponding to that grid are superior to the model predictions in the
row corresponding to that grid, the grid will be displayed in color. The color gradient reflects the variability in model predictions, with darker green indicating larger differences.
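
To make the pairwise comparison behind Fig. 6 concrete, the sketch below implements a deliberately simplified stand-in for the GW test: the unconditional, one-step-ahead case with a constant instrument and no autocorrelation correction, in which the statistic reduces to n times the squared mean loss differential divided by its variance and is compared against a chi-square distribution with one degree of freedom. The full conditional GW test uses additional instruments and, for multi-step horizons, a robust covariance estimate; the function name and the loss series here are hypothetical.

```python
import math
from typing import Sequence, Tuple


def unconditional_pa_test(
    loss_a: Sequence[float], loss_b: Sequence[float]
) -> Tuple[float, float, float]:
    """Test H0: models A and B have equal expected loss.

    Returns (mean loss differential, test statistic, two-sided p-value).
    A negative mean differential means model A has the lower average loss.
    Assumes the loss differential is not constant (non-zero variance).
    """
    d = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    stat = n * mean_d ** 2 / var_d
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return mean_d, stat, p_value


# Hypothetical per-step pinball losses of two models on the same test window.
losses_model_a = [1.0, 2.0, 1.0, 2.0]
losses_model_b = [2.0, 4.0, 2.0, 4.0]
mean_diff, statistic, p = unconditional_pa_test(losses_model_a, losses_model_b)
print(f"mean diff = {mean_diff:.3f}, statistic = {statistic:.2f}, p = {p:.2e}")
```

A cell in a grid like Fig. 6 would be colored when the p-value falls below the chosen significance level and the sign of the mean differential favors the column model.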

metric, providing a visual representation of the significance of each model compared to others.

In the significance testing, we employed the same models as mentioned in the table above: TFT, PatchTST, Transformer, Informer, NBEATS, NHITS, ESRNN, LSTM, TCN, MLP, and our proposed TFMADR model. Each 𝑝-value of the comparison indicates whether the predictive performance of that model is statistically significantly better than the compared models. If the 𝑝-value is lower than the significance level, it suggests that the predictive performance of that model is significantly superior to the other models.

Based on the results of the significance testing, several key findings emerged. Firstly, ESRNN and NHITS exhibited slightly lower trends compared to other models in the ETH-USD and Brent markets, while NHITS outperformed other models in the Carbon market. Additionally, NHITS demonstrated inconsistent performance in the Spot-FI market. These observations highlight that performance variations are influenced by the unique data characteristics of different markets. These findings underscore the significance of economic factors in shaping the predictive capabilities of models. Furthermore, it is worth noting that in any market, no baseline model significantly outperformed our proposed TFMADR model. This suggests that the TFMADR model's performance in financial market forecasting is significantly better than other models. This outcome further validates the innovation and effectiveness of our model, which combines the reasonable fusion of exogenous factors and considers stability to enhance predictive accuracy and robustness.

5.4. Ablation study

Fig. 7(a) compares the multi-step probabilistic forecasting performance of our proposed TFMADR model with the baseline model DeepAR. The results clearly demonstrate that the TFMADR model consistently outperforms DeepAR across multiple prediction horizons, exhibiting a lower Mean Pinball Loss (MPL) at each step. This indicates that the TFMADR model achieves higher accuracy and stability in interval forecasting.

In the Brent dataset, TFMADR shows a lower MPL at all prediction horizons, with a particularly noticeable improvement at longer horizons. For example, at the 7th step (Pred7), TFMADR achieves an MPL of 3.5032, significantly lower than DeepAR's 4.2032. Similarly, in the carbon emission dataset, TFMADR shows a notable improvement right from the initial prediction step. For instance, at the 1st step (Pred1), TFMADR's MPL is 0.4372, nearly half of DeepAR's 0.8372. This advantage persists as the prediction horizon increases, indicating that TFMADR effectively adapts to the dynamic changes in carbon pricing, leveraging its ability to capture long-term dependencies from TFT alongside DeepAR's probabilistic forecasting capabilities.


Fig. 7. The comparison of probability prediction performance between different model versions, based on Mean Pinball Loss (MPL). The first figure shows the comparison of the
proposed TFMADR model with individual versions of the DeepAR model in terms of their probability prediction performance. The second figure shows the comparison of the
prediction performance of the TFMADR model with different model input lengths.
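
The MPL values quoted in the ablation discussion of Fig. 7(a) can be put on a common scale by expressing each as a relative reduction against DeepAR. The short sketch below does only this arithmetic; the MPL pairs are the ones stated in the text, while the helper name `relative_reduction` is hypothetical.

```python
def relative_reduction(baseline: float, proposed: float) -> float:
    """Fraction by which the proposed model lowers the baseline loss."""
    return (baseline - proposed) / baseline


# (DeepAR MPL, TFMADR MPL) at the prediction steps quoted in the text.
ablation_mpl = {
    ("Brent", "Pred7"): (4.2032, 3.5032),
    ("Carbon", "Pred1"): (0.8372, 0.4372),
    ("ETH-USD", "Pred4"): (2.4011, 1.9011),
    ("Spot-FI", "Pred6"): (3.282, 2.582),
}

for (dataset, step), (deepar, tfmadr) in ablation_mpl.items():
    reduction = relative_reduction(deepar, tfmadr)
    print(f"{dataset} {step}: {reduction:.1%} lower MPL than DeepAR")
```

On these four quoted points the reductions are roughly 17% (Brent), 48% (Carbon), 21% (ETH-USD), and 21% (Spot-FI), which matches the description of the Carbon improvement as "nearly half".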

In the ETH-USD dataset, TFMADR also outperforms DeepAR across all prediction steps, with a significant reduction in MPL. For example, at the 4th step (Pred4), TFMADR achieves an MPL of 1.9011, compared to DeepAR's 2.4011, demonstrating that TFMADR provides more reliable interval forecasts in the volatile cryptocurrency market. This suggests that TFMADR's design effectively handles the probabilistic characteristics of high-volatility markets. Similarly, in the Spot-FI dataset, TFMADR's advantage is clear. For instance, at the 6th step (Pred6), TFMADR's MPL is 2.582, significantly lower than DeepAR's 3.282. This trend continues across all horizons, showing that TFMADR not only reduces MPL but also enhances the reliability of financial data forecasting.

Fig. 7(b) illustrates the impact of input length on the probabilistic forecasting performance of the TFMADR model. We evaluated the model's performance with different input lengths (𝑇 = 32, 64, 128, 256, 512) across four datasets. From Fig. 7(b), it is evident that, across all datasets, the MPL initially decreases with increasing input length, but eventually rises as the input length continues to grow. This indicates that an optimal input length helps the model better capture patterns in the time series, while excessively long input lengths may introduce irrelevant information, leading to a decline in forecasting performance. Specifically, when the input length is 128, the MPL reaches its minimum value for all datasets, indicating that this input length minimizes the model's prediction error. For instance, in the Brent dataset, the MPL at an input length of 128 is approximately 1.5, whereas when the input length is increased to 512, the MPL rises to around 2.0, suggesting that longer historical inputs do not positively contribute to prediction performance.

These results suggest that the appropriate input length is crucial for balancing the model's ability to capture both long- and short-term dependencies. Too short an input length may prevent the model from accessing sufficient historical information, thus affecting prediction accuracy. On the other hand, too long an input length can make the model overly complex, susceptible to noise, and prone to interference from irrelevant data. Based on the experiments, an input length of 128 consistently performed best across all datasets, and therefore, we chose 128 as the final input length setting to strike the optimal balance between performance and efficiency.

Next, we conduct an ablation study analyzing the training time and memory usage of the model. The experimental results are shown in Fig. 8. This figure clearly illustrates the differences between the proposed TFMADR model and other baseline models in terms of training time (measured in milliseconds per iteration) and parameter size (memory usage). Overall, Transformer-based deep learning models exhibit significant advantages in capturing the complex patterns of time series data, but they also demand higher training time and memory consumption. In contrast, traditional machine learning models, while requiring fewer parameters and less training time, fall short in predictive accuracy.

Among the Transformer-based deep learning models, Transformer, Informer, PatchTST, TFT, and the proposed TFMADR all exhibit large parameter sizes and longer training times. The Transformer model occupies a moderate position, with a training time of 390 ms and a parameter size of 110, showing relatively high predictive accuracy (MAPE of approximately 1.9%). Informer, with an even longer training time of 560 ms and a parameter size of 90, likely benefits from optimization strategies for handling long sequences, but its predictive accuracy (MAPE of about 2.1%) is lower than that of other Transformer-based models. PatchTST and TFT have training times of 610 and 650 ms, with parameter sizes of 100 and 150, respectively, demonstrating strong capabilities in capturing time series dependencies. The proposed TFMADR model further integrates multiple deep learning structures (such as TFT and DeepAR), with a training time of 510 ms and a parameter size of 180. Despite its larger memory requirements, TFMADR achieves the best predictive accuracy, significantly reducing errors, especially when handling both long- and short-term dependencies and uncertainty.

In comparison, traditional machine learning models offer clear efficiency advantages in terms of memory usage and training time. The MLP model, with the smallest parameter size (22) and an extremely short


Fig. 8. Comparison of training time and model parameter size between different benchmark models. The training time is calculated based on the speed of ms/iter. The memory
is represented by the size of the circle, with the memory size of the benchmark model shown in the top-right corner of the circle.
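
The cost-versus-accuracy trade-off discussed around Fig. 8 can be examined by checking which models are non-dominated, meaning that no other model is at least as good on both training time and MAPE and strictly better on one. The sketch below applies this check only to the models for which the text quotes both numbers; the MAPE values are the approximate figures from the text, so the result is indicative at best, and the function name `non_dominated` is hypothetical.

```python
from typing import Dict, List, Tuple

# (training time in ms/iter, approximate MAPE in %) as quoted in the text.
quoted: Dict[str, Tuple[float, float]] = {
    "MLP": (150, 2.8),
    "ESRNN": (230, 2.0),
    "LSTM": (250, 2.3),
    "N-BEATS": (300, 2.0),
    "Transformer": (390, 1.9),
    "Informer": (560, 2.1),
}


def non_dominated(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of models not dominated on (training time, error); lower is better."""
    front = []
    for name, (time_ms, error) in points.items():
        dominated = any(
            other_time <= time_ms
            and other_error <= error
            and (other_time < time_ms or other_error < error)
            for other, (other_time, other_error) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


print(non_dominated(quoted))
```

With these approximate values the non-dominated set is MLP, ESRNN, and Transformer: each further gain in accuracy costs additional training time, which is the trade-off the surrounding discussion describes.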

training time (150 ms), is highly efficient but suffers from lower predictive accuracy (MAPE of 2.8%). The LSTM model, as a classic RNN structure, has a training time of 250 ms and a parameter size of 26, offering better predictive accuracy (MAPE of around 2.3%) at a moderate computational cost. ESRNN and N-BEATS, with training times of 230 and 300 ms, and parameter sizes of 30 and 50, respectively, strike a good balance between accuracy (MAPE of around 2%) and computational efficiency, making them ideal choices in resource-constrained environments. Models like TCN and NHITS, with more complex structures, show increased training times and parameter sizes, but offer slightly better predictive accuracy than LSTM and MLP.

Thus, Transformer-based models require considerably more training time and larger parameter sizes than traditional machine learning models, but their superior predictive accuracy highlights their advantage in handling complex temporal relationships. The proposed TFMADR model, by integrating deep learning techniques, achieves low-error predictions, with an increase in training cost but a notable improvement in capturing long-term and short-term dependencies and addressing uncertainty. Traditional models, on the other hand, offer moderate predictive accuracy at a lower computational cost, making them more suitable for applications with limited computational resources.

6. Conclusions

The primary aim of this research was to enhance the long-term point and interval forecasting capabilities for financial time series data. To achieve this, we introduced the TFMADR model, which combines the strengths of both the TFT and DeepAR models. The optimal fusion parameter configuration was determined using the Multi-Objective Simultaneous Search Algorithm (MSSA), and correction factors were incorporated to improve the model's accuracy and reliability. Extensive experimentation demonstrated the effectiveness of the proposed model, leading to several key conclusions.

In terms of point forecasting, the TFMADR model outperformed traditional financial time series forecasting models, as well as state-of-the-art benchmark models, achieving superior predictive accuracy. The self-adaptive fusion parameters enabled the model to better align with internal data trends, effectively reducing forecasting errors. Similarly, for interval forecasting, the TFMADR model showed remarkable performance, achieving higher predictive accuracy and narrower prediction intervals. This is particularly important in the financial domain, where precise uncertainty estimates are crucial for risk management and decision-making. The model's internal fusion and adaptability provide significant advantages when handling the highly volatile and complex nature of financial market data.

This research presents several key innovations, particularly in the use of multi-objective optimization and self-adaptive parameter fusion, which enhance the model's ability to adapt to diverse financial data scenarios. The TFMADR model holds substantial promise for practical applications. Financial institutions can leverage its enhanced forecasting accuracy to better estimate portfolio risks and implement more effective risk mitigation strategies. Governments can use the model's predictions to inform macroeconomic and monetary policy decisions, aiding in the management of economic fluctuations and crises. Moreover, the model's capabilities can support the development of innovative financial products, offering more flexible risk management tools for different types of investors. Individual investors can also benefit from the model's predictive insights, enabling them to make more informed investment decisions and optimize their portfolios while gaining a deeper understanding of market risks.

Despite these advantages, the TFMADR model still faces challenges, including error accumulation, operational efficiency, and parameter configuration optimization. Future research efforts should focus on addressing these issues to further enhance the model's stability, efficiency, and scalability, ensuring its applicability across a broader range of tasks and financial contexts.

CRediT authorship contribution statement

Xianghui Qi: Writing – original draft, Visualization, Software, Data curation. Zhangyong Xu: Writing – review & editing, Resources, Data curation. Fenghu Wang: Writing – review & editing, Visualization, Software, Resources, Methodology.


Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research work was partly supported by the National Social Science Foundation of China under Grant No. 19AJL010.

Data availability

Data will be made available on request.
