Journal of Applied Mathematics - 2024 - Araya - A Hybrid GARCH and Deep Learning Method for Volatility Prediction


Wiley

Journal of Applied Mathematics


Volume 2024, Article ID 6305525, 19 pages
https://doi.org/10.1155/2024/6305525

Research Article
A Hybrid GARCH and Deep Learning Method for Volatility Prediction

Hailabe T. Araya,1,2 Jane Aduda,3 and Tesfahun Berhane4

1 Department of Mathematics, Pan African University Institute for Basic Sciences, Technology and Innovation, Nairobi 62000, Kenya
2 Department of Mathematics, Debre Markos University, Debre Markos 269, Ethiopia
3 Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi 62000, Kenya
4 Department of Mathematics, Bahir Dar University, Bahir Dar 26, Ethiopia

Correspondence should be addressed to Hailabe T. Araya; [email protected]

Received 21 February 2024; Revised 5 June 2024; Accepted 15 June 2024

Academic Editor: Man Leung Wong

Copyright © 2024 Hailabe T. Araya et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.

Volatility prediction plays a vital role in financial data analysis. Stock price movements are commonly characterized as highly nonlinear and volatile. This study aims to enhance the accuracy of return volatility forecasts for stock prices by integrating diverse models. The study integrated four powerful methods: the seasonal autoregressive integrated moving average (SARIMA) model, generalized autoregressive conditional heteroskedasticity (GARCH) family models, a convolutional neural network (CNN), and a bidirectional long short-term memory (BiLSTM) network. The hybrid model was developed by using the residuals generated by the SARIMA model as input to the GARCH model. The estimated volatility was then used as an input feature for the hybrid CNN and BiLSTM models. The model's forecasting performance was assessed using key evaluation metrics, including mean absolute error (MAE) and root mean squared error (RMSE). Compared to other hybrid models, the proposed hybrid model reduces MAE and RMSE by an average of 60.35% and 60.61%, respectively. The experimental results show that the proposed model performs well and accurately predicts the volatility of stock prices. These findings offer valuable insights for financial data analysis and risk management strategies.

Keywords: deep learning; GARCH-family models; hybrid model; volatility

1. Introduction

In financial terms, volatility denotes the degree of uncertainty or risk associated with fluctuations in the value of a security. Some securities exhibit high volatility, meaning that their values fluctuate over a wide range, while others exhibit lower volatility, with values varying within a narrower range. Because the volatility of returns is not directly observable, traders, institutional investors, and other market participants seek to understand the relationship between returns and volatility [1]. The global expansion of stock markets has sparked interest among researchers and practitioners in modeling the volatility of stock returns.

Modeling volatility plays a crucial role in designing investment strategies that mitigate risk and enhance stock returns, and in pricing securities and options [2]. Moreover, its significance extends beyond investors and market participants to the broader economy: high levels of volatility can disrupt the stability of capital markets, affect currency values, and impede international trade [3]. Given these consequences, forecasting volatility is imperative if investors are to navigate the complexities of financial markets and make informed, stable investment decisions. Therefore, in this paper, the aim was
4185, 2024, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1155/2024/6305525 by CochraneChina, Wiley Online Library on [02/08/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

to obtain an optimal model (a hybrid of GARCH and deep learning models) to predict volatility in stock prices.

The forecasting of volatility in financial instruments has been extensively examined in recent decades, primarily because volatility serves as an indicator for estimating the risk associated with an asset within a specified time frame. In recent years, a substantial literature has emerged on modeling and predicting volatility in financial markets. Originally, Engle [4] developed the autoregressive conditional heteroskedasticity (ARCH) model, which incorporates a time-varying conditional variance modeled as a function of past errors. This was carried out as part of Engle's attempts to explain how inflation dynamics operate in the United Kingdom.

To enhance Engle's model, Bollerslev [5] developed the GARCH model. This enhancement involved incorporating long memory and creating a more flexible lag structure by adding lagged conditional variances to the original model. The standard GARCH (SGARCH) model cannot capture the leverage effect because its specification assumes that the variance depends on the magnitude of a shock and is independent of its sign [6].

Later, various adaptable GARCH models were introduced, incorporating additional parameters to capture the asymmetric behavior of time series data, such as the exponential GARCH (EGARCH) model proposed by Nelson [7] and the threshold GARCH (TGARCH) model introduced by Zakoian [8].

B. Almansour, Alshater, and A. Almansour [9] assessed the effectiveness of ARCH and GARCH models in predicting volatility within the cryptocurrency market. Their findings indicated that both positive and negative news events have a notable impact on conditional volatility across various cryptocurrency markets. Additionally, the study concluded that the GARCH model demonstrates promising predictive capabilities for cryptocurrency price movements.

Franses and Van Dijk [10] studied forecasting stock market volatility with nonlinear GARCH methods. They analyzed the GARCH model and two of its nonlinear modifications to forecast weekly stock market volatility and found that the quadratic GARCH (QGARCH) model was the most effective for forecasting stock market volatility.

Sen, Mehtab, and Dutta [11] predicted the volatility of stocks from selected sectors of the National Stock Exchange (NSE) of India using GARCH. The researchers found that asymmetric GARCH models, notably, provide more precise forecasts of the future volatility of the selected stocks. In a modified approach based on ARIMA-GARCH modeling, Aduda et al. [12] explored using the residuals of ARIMA as a vital factor for improving forecasting in GARCH.

Implementing deep learning models in financial markets, especially stock markets, has become a burgeoning research subject in recent times due to the increasing use of artificial intelligence (AI) in various fields. Artificial neural network (ANN) models can capture the nonlinearity of the series, do not require the series to be stationary, and perform better in volatility forecasting than SGARCH-type models. Liu [13] showed that, over a considerable time interval, long short-term memory (LSTM) volatility predictions for the Standard & Poor's 500 (S&P 500) and Apple Inc. can outperform those of the GARCH model.

In recent years, the field of financial time series analysis has witnessed a growing interest in hybrid models that combine various deep learning techniques with statistical models to enhance the accuracy of volatility prediction. Kim and Won [14] developed a hybrid model to predict the volatility of the Korea Composite Stock Price Index (KOSPI 200) by integrating GARCH-type models with the LSTM model.

Kakade et al. [15] investigated the advantage of hybridizing GARCH-type models with LSTM to predict the volatility of metals in the Indian commodity market and found that hybrid GARCH-LSTM models outperform standalone models. Vidal and Kristjanpoller [16] studied gold volatility prediction using a hybrid CNN-LSTM approach and found that the hybrid CNN-LSTM model outperforms the GARCH and LSTM models in forecasting the volatility of gold.

Zeng et al. [17] studied a natural gas load volatility prediction model combining GARCH family models, the eXtreme Gradient Boosting (XGBoost) algorithm, and the LSTM network. Mademlis and Dritsakis [18] presented two hybrid models for predicting the volatility of the Financial Times Stock Exchange Milano Italia Borsa (FTSE MIB) index and evaluated their efficacy alongside an asymmetric GARCH model and a neural network.

All of the above studies have limitations. For instance, hybrid SARIMA-GARCH family models for volatility forecasting often struggle with nonlinear sequences and influential factors [19]. Moreover, a single convolutional neural network (CNN) model interprets volatility poorly, which limits its prediction accuracy.

Combining hybrid econometric models, such as SARIMA-GARCH family models, with CNN models can effectively address these shortcomings in volatility forecasting. CNN is better than GARCH family models at capturing complex temporal patterns and nonlinear dependencies in volatility structures. Furthermore, CNN can automatically identify hierarchical features in the data, aiding in the extraction of meaningful representations and proving robust to noisy data. However, integrating CNN with GARCH family models alone often fails to yield superior prediction results, as an abundance of feature inputs can degrade model performance. BiLSTM is better than CNN at capturing long-term dependencies and sequential patterns in the data. By using both forward and backward information patterns of the market, BiLSTM enhances accuracy in time series prediction, and its memory cells effectively handle data prone to irregular and seasonal patterns.

In response to this research gap, the present study introduces a novel hybrid SARIMA-GARCH-CNN-BiLSTM model and evaluates its forecasting performance. The inclusion of BiLSTM in the extended model provides a significant improvement over the unidirectional approach of traditional LSTM models. The findings of this study provide valuable insights into the movement and comovement of price volatility, which is essential for investors, traders, and risk managers in navigating and mitigating the high-risk nature of these investments.

2. Methodology

The study used SARIMA, GARCH-family, CNN, and LSTM models as components of a new hybrid model. Each component captures distinct characteristics of the historical data.

2.1. SARIMA Model. The seasonal AR integrated MA model, SARIMA$(p, d, q)(P, D, Q)_s$, is a member of the Box–Jenkins family of time series forecasting models for nonstationary data series. The SARIMA$(p, d, q)(P, D, Q)_s$ model can be written as

$$\Phi_P(B^s)\,\phi_p(B)\,\nabla^d\,\nabla_s^D\,(r_t - \mu) = \Theta_Q(B^s)\,\theta_q(B)\,\varepsilon_t \tag{1}$$

where $B$ is the backshift operator ($B r_t = r_{t-1}$), and the nonseasonal MA and AR components are represented by $\theta_q(B) = 1 + \sum_{i=1}^{q}\theta_i B^i$ and $\phi_p(B) = 1 - \sum_{j=1}^{p}\phi_j B^j$, respectively. The seasonal AR and MA components are represented by $\Phi_P(B^s) = 1 - \sum_{i=1}^{P}\Phi_i B^{is}$ and $\Theta_Q(B^s) = 1 + \sum_{j=1}^{Q}\Theta_j B^{js}$, respectively. Here $r_t$ represents the time series, $\varepsilon_t$ denotes the random error at time period $t$, and $\mu$ is the mean of the model. $\nabla^d$ and $\nabla_s^D$ represent the nonseasonal and seasonal differencing operators, defined as $\nabla^d r_t = (1 - B)^d r_t$ and $\nabla_s^D r_t = (1 - B^s)^D r_t$.

2.2. SGARCH Model. The SGARCH$(r, s)$ model is one in which the variance of the error term of the SARIMA model follows a GARCH process. For the returns series, the error term is $\varepsilon_t = z_t\sigma_t$, where $z_t$ is independent and identically distributed (i.i.d.) with $E(z_t) = 0$ and $\operatorname{Var}(z_t) = 1$. The conditional variance $\sigma_t^2$ is determined by

$$\sigma_t^2 = \alpha_0 + \sum_{i=1}^{r}\alpha_i\varepsilon_{t-i}^2 + \sum_{j=1}^{s}\beta_j\sigma_{t-j}^2 \tag{2}$$

where $\alpha_0 > 0$, $\alpha_i > 0$, and $\beta_j > 0$ are constants.

2.3. EGARCH Model. The EGARCH model can capture asymmetric responses of volatility to shocks and also quantify the increased impact of large shocks on volatility [7]. The conditional mean equation of EGARCH is the same as above. The EGARCH model, which includes positive and negative asymmetric effects on returns, is expressed as

$$\ln\sigma_t^2 = \alpha_0 + \sum_{j=1}^{p}\beta_j\ln\sigma_{t-j}^2 + \sum_{i=1}^{q}\left(\alpha_i\left|\frac{\varepsilon_{t-i}}{\sigma_{t-i}}\right| + \gamma_i\,\frac{\varepsilon_{t-i}}{\sigma_{t-i}}\right) \tag{3}$$

where $\varepsilon_{t-i} > 0$ and $\varepsilon_{t-i} < 0$ represent good news and bad news, with total effects $(\alpha_i + \gamma_i)\,|\varepsilon_{t-i}|/\sigma_{t-i}$ and $(\alpha_i - \gamma_i)\,|\varepsilon_{t-i}|/\sigma_{t-i}$, respectively. The EGARCH model is covariance stationary when $\sum_{j=1}^{p}\beta_j < 1$.

2.4. TGARCH Model. The TGARCH model is an asymmetric model developed to treat news, such as mergers and discovery launches, by introducing multiplicative dummy variables into the variance equation. It was developed in [20, 21]. The general specification of the TGARCH$(p, q)$ model for the conditional variance equation is

$$\sigma_t^2 = \alpha_0 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2 + \sum_{i=1}^{q}\left(\alpha_i + \gamma_i I_{t-i}\right)\varepsilon_{t-i}^2 \tag{4}$$

where $\alpha_0$, $\alpha_i$, $\beta_j$, and $\gamma_i$ are parameters to be estimated, and $I_{t-i}$ is an indicator dummy variable such that $I_{t-i} = 0$ for $\varepsilon_{t-i} \ge 0$ and $I_{t-i} = 1$ for $\varepsilon_{t-i} < 0$. When $\varepsilon_{t-i} \ge 0$ and $\varepsilon_{t-i} < 0$, the total contribution to volatility is $\alpha_i\varepsilon_{t-i}^2$ and $(\alpha_i + \gamma_i)\varepsilon_{t-i}^2$, respectively. Thus, good and bad news of equal magnitude have different effects on conditional volatility, and when $\gamma_i = 0$ for all $i$, the model reduces to SGARCH. When $\gamma_i > 0$, bad news increases volatility more than good news of the same size, producing a leverage effect at lag $i$. When $\gamma_i < 0$, positive shocks raise conditional volatility more than negative shocks of equal size. The conditional variance is positive when $\alpha_0 > 0$, $\beta_j \ge 0$, $\alpha_i \ge 0$, and $\alpha_i + \gamma_i \ge 0$. According to Poon [22], the TGARCH model is stationary if $\sum_{i=1}^{q}(\alpha_i + \gamma_i/2) + \sum_{j=1}^{p}\beta_j < 1$.

2.4.1. Model Selection Criteria. The study employed a widely used model selection method, the Akaike information criterion (AIC) [23], to determine which GARCH model best fits the data; the model with the lowest AIC was selected. AIC, defined as $\mathrm{AIC} = 2k - 2\ln\hat{L}$ for a model with $k$ estimated parameters and maximized likelihood $\hat{L}$, balances goodness of fit against model complexity. Lower AIC values indicate a better trade-off between fit and complexity.

2.5. CNN Model. Convolution is a mathematical operation that takes two functions and produces a third, which is typically understood to be a filtered or modified version of one of the original functions [24]. One convolution operand, $f[n]$, is the signal, and the other, $h[n]$, is the filter with which we process the signal. When the filter is finite and defined only on the domain $\{0, 1, \dots, K-1\}$, convolution requires $K$ multiplications and $K - 1$ additions for every value of the signal [25]. This can be mathematically represented as

$$(f \ast h)[n] = \sum_{k=0}^{K-1} h[k]\,f[n-k] \tag{5}$$

This operation is called convolution. CNNs are specialized neural networks designed for processing inputs with inherent spatial structure. They are renowned for their efficacy across diverse data formats, such as one-dimensional time series data, two-dimensional image data, and three-dimensional video data [26]; this study used the 1D-CNN model for sequential data. The network learns to automatically recognize and select the most informative features from the raw data. The sequential 1D-CNN model was constructed using convolutional layers, pooling layers, and dense layers. The input was a 3D tensor with shape (batch size, time steps, input dim).

The role of the convolutional layer in the CNN model is to identify temporal patterns and relationships within sequential data, such as time series [27]. The role of the pooling layer, on the other hand, is to perform downsampling to reduce computational complexity and achieve translation invariance, enabling the model to identify features regardless of their location in the input. Furthermore, downsampling helps to reduce the likelihood of overfitting, increase computational efficiency, and decrease the number of parameters. CNNs combine concepts such as weight sharing and local connectivity to improve the model's ability to perform complex tasks. ReLU is a popular activation function in CNNs due to its nonlinearity, computational efficiency, and ability to mitigate the vanishing gradient problem when training deep networks. While CNNs are effective models for time series forecasting applications, they can suffer from overfitting, which can arise from factors such as model complexity and highly correlated training data. Dropout is a regularization technique used in neural networks to prevent overfitting.

Feature maps in 1D-CNNs serve as representations of features extracted from time series data. These maps are generated by convolutional filters, capturing specific patterns and structures inherent in the data. Convolutional filters, also known as kernels, are small matrices used in CNNs to extract features from input data. These filters slide (convolve) across the input data, performing mathematical operations to capture patterns and features at different locations. Each feature map corresponds to learned features within the time series, such as trends or periodicities. As the data propagate through the layers, deeper layers learn more abstract features, building upon earlier representations. In a CNN's convolutional layer, features from the previous layer are combined with learnable kernels and passed through activation functions such as the hyperbolic tangent, sigmoid, or ReLU to produce feature maps [28]. As such, each output feature map combines more than one input feature map. In general, the convolved features at the output of the $l$-th layer can be written as shown in Equation (6) [28].

$$C_i^{l} = \operatorname{ReLU}\Big(b_i^{l} + \sum_{j} C_j^{l-1}\cdot K_{ij}^{l}\Big) \tag{6}$$

In this context, $C_i^l$ denotes the $i$-th feature of the $l$-th layer, $C_j^{l-1}$ refers to the $j$-th feature in the preceding $(l-1)$-th layer, $K_{ij}^l$ signifies the kernel connecting the $i$-th and $j$-th features, $b_i^l$ is the bias associated with these features, and ReLU is the activation function.

Figure 1 illustrates how feature maps are formed in a CNN model. In the diagram, $u$ represents a sample filter of size 3 with adjustable weights. Each $C_i$ denotes the $i$-th element of the feature map, $N$ stands for the number of 1D data tensor units, and $f$ represents the filter size, applied with a stride of 1.

According to Rala Cordeiro et al. [29], the stride value defines how the kernel moves over the input data. The most common value is one, meaning that the kernel moves over one column of the input data at each iteration. After convolution, pooling reduces dimensionality and improves feature robustness. The pooling size is the number of adjacent feature units summarized at once: the pooling layer applies a function to these multiple inputs (convolutional features), and in this study that function is the maximum (max pooling). This study used two CNN layers to strike a balance between complexity and performance. This approach reduces the risk of overfitting by avoiding excessive complexity and mitigates challenges such as vanishing or exploding gradients during training. Previous research, such as [30], suggests that two-layer CNN architectures are effective for this kind of task. The decision reflects a consideration of computational efficiency and training stability. Figure 2 depicts the structure of a 1D-CNN model with two convolutional and two max pooling layers. Following the final pooling layer, the output is flattened and connected to a dense layer containing N units, which is in turn connected to the final output layer consisting of a single neuron.

2.6. BiLSTM Model. Hochreiter and Schmidhuber [31] first proposed the LSTM architecture as a specific type of recurrent neural network (RNN) to overcome the limitations of traditional RNNs in capturing and learning long-term dependencies in sequential data. Although an LSTM captures information over extended periods, that information pertains only to times before the output moment and lacks reverse (future-to-past) information. For time series prediction, however, considering both backward and forward information patterns can enhance predictive performance. A BiLSTM consists of two LSTMs, a forward one and a reverse one. In contrast to the one-way state transfer of a regular LSTM, the BiLSTM accounts for how the data change both before and after each point, enabling it to make more thorough and precise decisions by using both past and future context. It has performed better than expected.

Given the input sequence $x = (x_1, x_2, \dots, x_T)$, the hidden layer sequence $h = (h_1, h_2, \dots, h_T)$ and the network output vector $y = (y_1, y_2, \dots, y_T)$ of the standard BiLSTM model are

[Figure 1 diagram: a filter u slides over the inputs X1–X6 to produce the feature-map elements C1–C4, with C_i = f(u · X[i : i + N − f]).]

Figure 1: Generation of feature maps of convolution layers in 1D-CNN.

[Figure 2 diagram: input layer (L = 1823) → convolutional layer 1 (sliding filter producing feature maps) → max pooling → convolutional layer 2 (sliding filter producing feature maps) → max pooling → flatten → fully connected layer → output layer.]

Figure 2: Simplified view of the 1D-CNN model for Apple Inc. data.

iteratively calculated from $t = 1$ to $t = T$. The updated memory cell is used to calculate the current hidden state $h_t$ through the following formulas:

$$i_t = \sigma\left(W_i\cdot[h_{t-1}, x_t] + b_i\right) \tag{7}$$

$$\tilde{C}_t = \tanh\left(W_c\cdot[h_{t-1}, x_t] + b_c\right) \tag{8}$$

$$f_t = \sigma\left(W_f\cdot[h_{t-1}, x_t] + b_f\right) \tag{9}$$

$$C_t = f_t\cdot C_{t-1} + i_t\cdot\tilde{C}_t \tag{10}$$

$$O_t = \sigma\left(W_o\cdot[h_{t-1}, x_t] + b_o\right) \tag{11}$$

$$h_t = O_t\cdot\tanh(C_t) \tag{12}$$

Equation (9) defines $f_t$, the forget gate: the sigmoid activation function $\sigma$ determines whether the previous memory should be retained in the current memory state. Equation (7) computes $i_t$, the input gate, which assesses how much of the current input to retain. Equation (8) computes $\tilde{C}_t$, the candidate values used to update the cell state. Equation (10) updates the cell state at the current moment. After the new state is obtained, Equation (11) is used to calculate the output gate value $O_t$, and Equation (12) gives the hidden state $h_t$. The predicted value of the system is given by the linear activation function

$$y_t = W_{hy}\cdot h_t + b_y \tag{13}$$
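To make the recursion in Equations (7)–(13) concrete, the following is a minimal NumPy sketch of one LSTM pass and a bidirectional wrapper. It assumes one input feature per time step, zero initial hidden and cell states, and concatenation of the forward and backward hidden states before the linear output of Equation (13). The function names and weight layout are illustrative assumptions, not the authors' implementation (per Table 1, their BiLSTM layers were library layers with ReLU activations, whereas this sketch uses the tanh of Equations (8) and (12)).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(x_seq, W, b, hidden):
    """Run one LSTM over x_seq (shape T x d) using Eqs. (7)-(12).

    W[g] has shape (hidden, hidden + d) and acts on [h_{t-1}, x_t];
    b[g] has shape (hidden,), for gates g in {"i", "c", "f", "o"}.
    Returns the hidden states h_1..h_T as an array of shape (T, hidden).
    """
    h = np.zeros(hidden)  # assumed zero initial hidden state
    c = np.zeros(hidden)  # assumed zero initial cell state
    states = []
    for x_t in x_seq:
        hx = np.concatenate([h, x_t])            # [h_{t-1}, x_t]
        i_t = sigmoid(W["i"] @ hx + b["i"])      # input gate, Eq. (7)
        c_cand = np.tanh(W["c"] @ hx + b["c"])   # candidate state, Eq. (8)
        f_t = sigmoid(W["f"] @ hx + b["f"])      # forget gate, Eq. (9)
        c = f_t * c + i_t * c_cand               # cell update, Eq. (10)
        o_t = sigmoid(W["o"] @ hx + b["o"])      # output gate, Eq. (11)
        h = o_t * np.tanh(c)                     # hidden state, Eq. (12)
        states.append(h)
    return np.array(states)

def bilstm_predict(x_seq, fwd, bwd, W_hy, b_y):
    """Forward and reversed LSTM passes, then the linear output of Eq. (13)."""
    hidden = fwd[1]["i"].shape[0]
    h_fwd = lstm_pass(x_seq, fwd[0], fwd[1], hidden)
    # Run the second LSTM on the reversed sequence, then flip back to time order.
    h_bwd = lstm_pass(x_seq[::-1], bwd[0], bwd[1], hidden)[::-1]
    h = np.concatenate([h_fwd, h_bwd], axis=1)   # shape (T, 2 * hidden)
    return h @ W_hy.T + b_y                      # y_t = W_hy h_t + b_y

# Usage with small random weights (illustrative only).
rng = np.random.default_rng(0)
T, d, H = 5, 1, 4

def random_params():
    W = {g: rng.normal(scale=0.1, size=(H, H + d)) for g in "icfo"}
    b = {g: np.zeros(H) for g in "icfo"}
    return W, b

x = rng.normal(size=(T, d))
y = bilstm_predict(x, random_params(), random_params(),
                   rng.normal(size=(1, 2 * H)), np.zeros(1))
print(y.shape)  # (5, 1)
```

Concatenating the two directions is one common way to merge them; summing or averaging the forward and backward states are alternatives that keep the output dimension at `hidden`.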
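Equations (5) and (6) and the max-pooling step of Section 2.5 can be illustrated for a single input channel with the NumPy sketch below. It is a simplified illustration rather than the authors' layers: it assumes one input feature map, one kernel, and "valid" convolution, and the helper names are hypothetical. Deep learning libraries typically implement convolutional layers as cross-correlation (no kernel flip); with learned weights the two are equivalent up to reversing the kernel.

```python
import numpy as np

def conv1d_valid(f, h):
    """Eq. (5): (f * h)[n] = sum_{k=0}^{K-1} h[k] f[n-k], evaluated only where
    the K-tap filter h lies fully inside the signal f (n = K-1, ..., len(f)-1)."""
    K = len(h)
    return np.array([sum(h[k] * f[n - k] for k in range(K))
                     for n in range(K - 1, len(f))])

def feature_map(x, kernel, bias):
    """Single-channel case of Eq. (6): ReLU(bias + convolution)."""
    return np.maximum(0.0, bias + conv1d_valid(x, kernel))

def max_pool1d(x, size=2):
    """Max pooling with 'same'-style padding: the tail is padded with -inf so
    that ceil(len(x) / size) windows remain and no value is dropped."""
    pad = (-len(x)) % size
    x = np.concatenate([x, np.full(pad, -np.inf)])
    return x.reshape(-1, size).max(axis=1)

# A filter of size 2 (as in Table 1) that acts as a first difference.
x = np.array([0.1, 0.3, 0.2, 0.5, 0.4])
kernel = np.array([1.0, -1.0])
fm = feature_map(x, kernel, bias=0.0)   # ReLU of [0.2, -0.1, 0.3, -0.1]
pooled = max_pool1d(fm, size=2)         # approximately [0.2, 0.3]
```

Each output of `conv1d_valid` costs K multiplications and K − 1 additions, matching the operation count stated for Equation (5).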
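Returning to Sections 2.2–2.4, the conditional-variance recursions in Equations (2)–(4) and the AIC comparison of Section 2.4.1 can be sketched in NumPy for the common (1, 1) order. This is a minimal illustration under stated assumptions, not the authors' estimation code: the parameters are supplied by hand rather than estimated by maximum likelihood, each recursion is started from the sample variance of the residuals, and the log-likelihood assumes Gaussian innovations $z_t$. The function names and the parameter values in the usage lines are hypothetical.

```python
import numpy as np

def sgarch_var(eps, a0, a1, b1):
    """Eq. (2) with r = s = 1: sigma2_t = a0 + a1 eps_{t-1}^2 + b1 sigma2_{t-1}."""
    eps = np.asarray(eps, dtype=float)
    s2 = np.empty_like(eps)
    s2[0] = eps.var()  # assumed start-up value
    for t in range(1, len(eps)):
        s2[t] = a0 + a1 * eps[t - 1] ** 2 + b1 * s2[t - 1]
    return s2

def egarch_var(eps, a0, a1, g1, b1):
    """Eq. (3) with p = q = 1, iterated on ln sigma2_t."""
    eps = np.asarray(eps, dtype=float)
    log_s2 = np.empty_like(eps)
    log_s2[0] = np.log(eps.var())
    for t in range(1, len(eps)):
        z = eps[t - 1] / np.exp(0.5 * log_s2[t - 1])  # eps_{t-1} / sigma_{t-1}
        log_s2[t] = a0 + b1 * log_s2[t - 1] + a1 * abs(z) + g1 * z
    return np.exp(log_s2)

def tgarch_var(eps, a0, a1, g1, b1):
    """Eq. (4) with p = q = 1; I_{t-1} = 1 when the previous shock is negative."""
    eps = np.asarray(eps, dtype=float)
    s2 = np.empty_like(eps)
    s2[0] = eps.var()
    for t in range(1, len(eps)):
        bad_news = 1.0 if eps[t - 1] < 0 else 0.0
        s2[t] = a0 + b1 * s2[t - 1] + (a1 + g1 * bad_news) * eps[t - 1] ** 2
    return s2

def gaussian_aic(eps, s2, n_params):
    """AIC = 2k - 2 ln L, with a Gaussian log-likelihood for eps_t = z_t sigma_t."""
    eps = np.asarray(eps, dtype=float)
    loglik = -0.5 * np.sum(np.log(2 * np.pi) + np.log(s2) + eps ** 2 / s2)
    return 2 * n_params - 2 * loglik

# Illustrative SARIMA residuals and parameter values (not estimates from the paper).
eps = np.array([0.010, -0.020, 0.015, -0.005, 0.012, -0.018])
candidates = {
    "SGARCH": (sgarch_var(eps, 1e-5, 0.10, 0.80), 3),
    "EGARCH": (egarch_var(eps, -0.5, 0.10, -0.05, 0.95), 4),
    "TGARCH": (tgarch_var(eps, 1e-5, 0.05, 0.10, 0.80), 4),
}
aic = {name: gaussian_aic(eps, s2, k) for name, (s2, k) in candidates.items()}
best = min(aic, key=aic.get)  # lowest AIC is preferred, as in Section 2.4.1
```

Setting `g1 = 0` in `tgarch_var` reproduces `sgarch_var`, mirroring the statement that TGARCH reduces to SGARCH when every $\gamma_i = 0$. In EGARCH as written in Equation (3), a negative `g1` makes bad news raise volatility more than good news.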

[Figure 3 diagram: an LSTM block with forget gate, input gate, candidate for cell state, and output gate updates the cell state from C_{t−1} to C_t to determine the new hidden state h_t; forward and backward LSTM layers process the inputs X_1, …, X_{t−1}, X_t, X_{t+1} to produce the outputs y_1, …, y_{t−1}, y_t, y_{t+1}.]

Figure 3: Internal structure of the BiLSTM model.

[Figure 4 diagram: normalized data are split into 80% training data and 20% testing data; within the training data, k = 3 folds are used, training on K − 1 folds and leaving 1 fold (20%) for testing, with accuracy evaluated on each fold.]

Figure 4: Data division criteria.

Figure 3 illustrates the internal organization of the BiLSTM model, constructed with LSTM blocks. Because BiLSTM comprises both forward and backward LSTM components, the computation must also be run in reverse.

2.7. SARIMA-GARCH-CNN-BiLSTM. The study introduced a novel approach to volatility forecasting called the hybrid SARIMA-GARCH-CNN-BiLSTM model. First, the mean model was built using SARIMA, which is well known for its ability to detect seasonal and temporal trends in financial time series data. Second, the volatility was estimated using three models from the GARCH family: GARCH, TGARCH, and EGARCH. Comparing these variants makes it easier to choose the most accurate one and allows asymmetry and time-varying patterns to be captured. SARIMA residuals were used as input to estimate volatility in the GARCH family models. Lastly, the CNN and BiLSTM architectures receive the predicted output of the GARCH family models. CNN effectively captures spatial dependencies, while BiLSTM excels at capturing long-term dependencies; together they leverage both temporal and spatial features for improved forecasting.

2.7.1. Model Development Procedure. The overall modeling process followed the steps below.

2.7.2. Data Preprocessing

1. Filling missing data. Because financial markets close on weekends (Saturdays and Sundays) and public holidays, missing values may occur in stock datasets. Additionally, processing and registration issues during data retrieval could contribute to missing data. Incomplete data introduce biases due to discrepancies between observed and unobserved data. In time series prediction, it is

Table 1: Internal hybrid structure of deep learning models.

| HDL model | Structure of layers |
|---|---|
| CNN | conv1D layer (filters: 256, filter size: 2, ReLU activation) + maxpooling1D (pooling size: 2, padding: same) + conv1D layer (filters: 128, filter size: 2, ReLU activation) + maxpooling1D (pooling size: 2, padding: same) + flatten layer + dense layer (neurons: 1, ReLU activation) + dropout = 0.2 with fully connected layer (neurons: 1, linear activation) |
| CNN-BiLSTM | conv1D layer (filters: 256, filter size: 2, ReLU activation) + maxpooling1D (pooling size: 2, padding: same) + conv1D layer (filters: 128, filter size: 2, ReLU activation) + maxpooling1D (pooling size: 2, padding: same) + flatten layer + BiLSTM layer (neurons: 128, ReLU activation) + BiLSTM layer (neurons: 50, ReLU activation) + dense layer (neurons: 1, linear activation) |

[Figure 5 diagram: the returns series r_t = μ_t + ε_t is fitted with the mean model SARIMA(p, d, q)(P, D, Q)_s; the residuals ε_t = r_t − r̂_t are passed to a GARCH model to obtain the volatilities σ_1, …, σ_t; the volatility data are preprocessed (missing values filled, winsorized), normalized, and split into train, validation, and test data; sliding windows (σ_1, …, σ_p), (σ_2, …, σ_{p+1}), …, (σ_{n−p}, …, σ_{n−1}) with targets σ_{p+1}, …, σ_n are fed to the hybrid CNN-BiLSTM (convolution, max-pooling, convolution, forward and backward LSTM layers, flatten layer).]

Figure 5: The structure of the hybrid SARIMA-GARCH-CNN-BiLSTM model.

essential to address this issue without discarding values from the series [32]. In this study, cubic spline interpolation was used to capture complex and nonlinear trend structures that linear methods cannot adequately handle. In cubic spline interpolation, piecewise cubic polynomials are fitted to the time series data to estimate the missing values. The mathematical formulation of the cubic spline method follows the methodology outlined by Erdogan [33]:

$$S(x) = \begin{cases} s_1(x) & \text{if } x_1 \le x < x_2 \\ s_2(x) & \text{if } x_2 \le x < x_3 \\ \quad\vdots & \\ s_{n-1}(x) & \text{if } x_{n-1} \le x < x_n \end{cases}, \qquad s_i(x) = a_i(x - x_i)^3 + b_i(x - x_i)^2 + c_i(x - x_i) + d_i \tag{14}$$

Equation (14) combines the piecewise definition of $S(x)$ and the cubic polynomial $s_i(x)$ on each subinterval under a single equation number.

[Figure 6 plot: Apple Inc. price (y-axis, 40 to 200) against days (x-axis, 9/20/2018 to 9/19/2023); legend: Interpolated.]

Figure 6: Time series plot of Apple Inc. price across time.

2. Outlier detection. Data points that are located at the


Table 2: Summary statistics of Apple Inc. winsorized returns.
outermost limits of a dataset are referred to be
outliers. Statistic Original returns Winsorized returns
3. Data normalization. The winsorized data, devoid of Mean 0.000648 0.000715
outliers, originates from various forecasting points, Median 0.000617 0.000617
each exhibiting different scales of values. To address Variance 0.000299 0.00017
this issue and enhance the training efficiency of the Skewness −0.128 −0.084
model, min–max normalization, as represented in
Kurtosis 5.52139 −0.3254
Equation (15), is employed to scale the dataset within
the range of 0, 1 . Jarque–Bera 2316.86 10.2145
p value 0.00 0.006
Sample size 1820 1820
xw − min xw
xi = 15
max xw − min xw
validation dataset, and 15% to the test dataset [35].
where xi is the normalized value, xw is the original winsor- Figure 4 presents the data splitting procedure utilized
ized value, max xw is the maximum value of winsorized in this research, illustrating the allocation of datasets.
data xw , and min xw is the minimum value of winsorized In this study, the data is divided into three segments
data value xw . (60:20:20 ratio), facilitating a thorough model assess-
ment. The training set educates the model about
4. Data splitting. To create and assess a forecasting underlying patterns, while the validation set serves as
model, the normalized data needs to be divided. In an impartial benchmark for comparing algorithms
practice, there has not been a universally perfect per- trained on this data. Subsequently, the test set evalu-
centage for data splitting in the past. Nonetheless, ates the model’s performance on fresh data, ensuring
there are various methods to partition the dataset. a robust evaluation process and guarding against over-
One instance involves an 80% training and 20% test- fitting. Furthermore, to bolster model training and
ing split [24, 34], while another approach entails allo- minimize overfitting, a k = 3-fold cross-validation
cating 70% to the training dataset, 15% to the technique is utilized within the training dataset.

Figure 7: The box plot of returns and winsorized returns. (Series: Original returns and Winsorized returns; horizontal axis from −0.10 to 0.10.)


Figure 8: Apple Inc. price winsorized returns. (Axes: Days, 9/21/2018 to 5/13/2023; Returns.)

Table 3: ADF unit root test results.

Daily Apple Inc. winsorized returns ($r_t$)

| Test | Statistic | Critical values | p value |
|---|---|---|---|
| ADF | −10.2166 | 1%: −3.4339; 5%: −2.8631; 10%: −2.5676 | 0.025 |

After splitting, the data must be reshaped into three dimensions to serve as input for the hybrid CNN-BiLSTM model; the input dimensions are samples, time steps, and features. The number of time steps (window size) is a hyperparameter representing the number of previous lags used as input to predict the next time step, and the study fixed its optimal value through empirical testing. The sequence of observations $x_1, x_2, \dots, x_n$ must be converted into multiple examples (samples) by constructing a matrix X, which serves as the independent variable of the model, and a vector y, which serves as the dependent variable from which the model learns. The time series is then divided into n − p + 1 examples, each of size p, the number of time steps (lagged variables). The resulting independent and predicted matrices have sizes (n − p + 1) × p and (n − p + 1) × 1, respectively. The deep learning models described in this paper are outlined in Table 1. CNN layers extract features from the input dataset; their outputs are then passed to BiLSTM layers, followed by a dense output layer that produces the sequence predictions. The comprehensive architecture of the hybrid SARIMA-GARCH-CNN-BiLSTM model is detailed in Figure 5. The structure begins by using the return series of Apple Inc.'s price data to establish the mean model. It then analyzes the residuals of the mean model to estimate volatility using GARCH models. Finally, it preprocesses the GARCH model outputs to serve as inputs for the hybrid CNN-BiLSTM model.

Figure 9: Additive decomposition of Apple Inc. winsorized return data. (Panels: trend, seasonal, and residual components, plotted from 9/21/2018 to 9/12/2023.)

Table 4: Model comparison using the Akaike information criterion (AIC).

| Model | AIC |
|---|---|
| ARIMA(1,0,0)(1,0,0)2 | −5993.333 |
| ARIMA(0,0,1)(0,0,1)2 | −5993.063 |
| ARIMA(1,0,0)(1,0,1)2 | −5992.025 |
| ARIMA(1,0,0)(0,0,1)2 | −5992.182 |
| ARIMA(1,0,1)(1,0,0)2 | −5992.183 |
| ARIMA(0,0,1)(1,0,0)2 | −5993.920 |
| ARIMA(0,0,1)(1,0,1)2 | −5992.0 |

Table 5: Parameter estimation of the ARIMA(0,0,1)(1,0,0)2 model using the training set.

| Parameter | Coef. | Std. err. | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.0011 | 0.001 | 1.660 | 0.097 | 0.001 | 0.002 |
| ma.L1 | 0.1998 | 0.017 | 12.014 | 0.001 | 0.167 | 0.232 |
| ar.S.L2 | −0.2096 | 0.018 | −11.902 | 0.001 | −0.244 | −0.1375 |
| Sigma2 | 0.0003 | 0.00659 | 46.658 | 0.001 | 0.000 | 0.000 |

Table 6: ARCH effect test for residuals of the SARIMA model.

| Series | Test | Lag | Statistic | p value |
|---|---|---|---|---|
| Residuals | ARCH-LM | 1 | 99.0427 | 0.001 |
| Residuals | ARCH-LM | 21 | 153.3352 | 0.001 |
| Residuals | Ljung–Box | 1 | 0.008505 | 0.926522 |
| Residuals | Ljung–Box | 21 | 8.949068 | 0.111113 |
| Squared residuals | Ljung–Box | 1 | 100.864047 | 0.0096 |
| Squared residuals | Ljung–Box | 21 | 265.57485 | 0.00249 |

2.8. Forecasting Performance Evaluation. The study evaluated forecast accuracy using two key metrics: mean absolute error (MAE) and root mean squared error (RMSE). MAE, which is less sensitive to outliers, measures the average error magnitude, while RMSE, which is more sensitive to outliers because errors are squared, gives more weight to large prediction errors. Using both metrics allows the study to assess the model's predictive capabilities comprehensively, taking the properties of the data into account.

$$\mathrm{MAE} = \frac{1}{N} \sum_{k=1}^{N} \left| Y_k - \hat{Y}_k \right| \tag{16}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left( Y_k - \hat{Y}_k \right)^2} \tag{17}$$

where $N$ is the number of observations, $Y_k$ are the actual values at time $k$, and $\hat{Y}_k$ are the values predicted by the model at time $k$.
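The windowing step and Equations (16) and (17) can be sketched in pure Python as follows; the function names and the toy series are hypothetical. Each sample holds p consecutive values shaped as (time steps, features) with one feature. Because every training pair needs a known next value, the sketch yields n − p labeled pairs, one fewer than the n − p + 1 windows of length p that the series contains.

```python
import math


def make_windows(series, p):
    """Split x_1..x_n into (samples, time steps, features) inputs and next-step targets.

    Each sample holds p consecutive values (one feature per time step); its target is
    the value that immediately follows the window, giving n - p labeled pairs.
    """
    if not 0 < p < len(series):
        raise ValueError("window size p must satisfy 0 < p < len(series)")
    X, y = [], []
    for start in range(len(series) - p):
        X.append([[v] for v in series[start:start + p]])  # shape (p, 1)
        y.append(series[start + p])
    return X, y


def mae(actual, predicted):
    """Equation (16): mean absolute error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - f) for a, f in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Equation (17): root mean squared error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, predicted)) / len(actual))


# Example: a naive "last value in the window" forecast on a made-up volatility series.
vol = [0.40, 0.42, 0.39, 0.45, 0.50, 0.48, 0.47]
X, y = make_windows(vol, p=3)
naive = [window[-1][0] for window in X]
print(len(X), mae(y, naive), rmse(y, naive))
```

Stacking the nested lists of `X` into an array gives the (samples, time steps, features) input that convolutional and recurrent layers expect; the naive forecast only serves as a reference point for the two error measures.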

Figure 10: Comparison of standard residuals between distributions. (Density of standardized residuals from −4 to 3; curves: Empirical, Logistic, Norm, t_10, Expon.)

3. Data

This study used secondary data, namely, daily Apple Inc. closing prices from 23 September 2018 to 19 September 2023, extracted from the Yahoo Finance website. There were a total of 1256 daily observations, and the database had 565 missing data points, which the author filled in using cubic spline interpolation, as stated in Equation (14). The dataset was split into three portions: the training set, which accounted for 60% of the data; the validation set, which constituted 20% of the data and was used to assess the performance of the trained model; and the remaining 20%, which was used for testing to evaluate the final performance of the model. Investors tend to focus more on price returns, which reflect price variation, than on the price itself [36, 37], because return data are typically stationary, making them suitable for time series models. Thus, we chose to conduct our experiments using this variable. Additionally, the study scales the price returns to percentages to depict daily returns, as shown in Equation (18).

$$r_t = \log\left(\frac{P_t}{P_{t-1}}\right) \times 100 \tag{18}$$

where $P_t$ is the daily closing price of Apple Inc. on day $t$, $P_{t-1}$ is the daily closing price on the previous day, and $r_t$ represents the daily return of the Apple Inc. price index at time $t$. Since the returns exhibited a leptokurtic property, meaning excess kurtosis compared to a Gaussian distribution (kurtosis = 3), sensitivity to extreme fluctuations around the mean is pronounced. To identify outliers within the return series, the z-score test was employed. The z-score test is a statistical method used to evaluate the position of a data point relative to the mean of a dataset, determining whether it falls within a specified range of values [38]. Upon identifying outliers within the dataset, winsorization was applied as a remedial strategy, using percentile thresholds for adjustment. Specifically, the study selected the 1st and 99th percentiles, substituting exceedingly low values with the value at the 1st percentile and exceedingly high values with the value at the 99th percentile.

4. Result and Discussion

The Apple Inc. price series in Figure 6 illustrates long-term increasing and decreasing trends, which suggests a nonstationary series. Since the price series is nonstationary, the study applies a logarithmic transformation and takes the first difference at lag 1 to obtain the return series. Because the original returns show high kurtosis (Table 2), it is desirable to winsorize the data to make them less susceptible to outliers. Figure 7 displays the distributions of returns and winsorized returns through box plots, showing their respective central tendencies and dispersion. Because outliers were detected in the return data, the study winsorized the dataset, as shown in Figure 7. Table 2 shows the summary statistics of the returns and winsorized returns. The Jarque–Bera test rejects the normality assumption for both return series. With negative excess kurtosis, a lighter tail, and fewer outliers in the winsorized returns, the study used the winsorized returns summarized in Table 2 for further analysis.

Figure 8 displays the winsorized returns, which fluctuate above and below the zero line, indicating that the return series is stationary in the mean. However, the series is not stationary in variance.

The study also tested for stationarity using the augmented Dickey–Fuller (ADF) test. The ADF result in Table 3 rejects the null hypothesis of a unit root, with an ADF statistic of −10.2166 and a p value of 0.02576. It is therefore reasonable to conclude that the winsorized daily returns are stationary.

The visual representation of the decomposition of the return time series is shown in Figure 9. This decomposition separates the series into three distinct components: trend,

seasonal, and residuals. It is evident from the figure that the observed series exhibits seasonality with a period of two. The study used the AIC to choose the model order, as it effectively minimizes prediction error and is asymptotically comparable to cross-validation [39]. Table 4 summarizes the AIC values of different models. After assessing various SARIMA models, the most suitable model for the observed series was ARIMA(0, 0, 1)(1, 0, 0)2, which has the lowest AIC value of −5993.920.

The chosen model parameters are presented in Table 5, and the residual variance was significantly different from zero, suggesting that the model fits the data well. Using the estimates, the mean model SARIMA(0,0,1)(1,0,0,2) is written as

$$r_t = 0.0011 - 0.2096\, r_{t-2} + 0.1998\, \varepsilon_{t-1} + \varepsilon_t \tag{19}$$

At a significance level of 5%, Engle's Lagrange Multiplier (ARCH-LM) test, using up to 21 lags to represent a 1-month trading period of Apple Inc., rejected the null hypothesis of no ARCH effects in the winsorized returns, as presented in Table 6. This means that the variance of the returns was heteroscedastic, which suggested using ARCH/GARCH models to capture the time-varying volatility of the returns. The result was further supported by the Ljung–Box test, which indicated that the residuals were uncorrelated while the squared residuals exhibited serial correlation; in simpler terms, the error variance was autocorrelated. Next, a distribution for the residuals of the return series was estimated to approximate the empirical distribution. Following the work of Budiarti et al. [40], the study examined the distribution of the residuals by comparing the standardized residuals against three distributions, namely, the normal, Student's t, and logistic distributions, as depicted in Figure 10. The Anderson–Darling test showed that the most suitable distribution for the Apple Inc. residual data was the logistic distribution, as shown in Table 7.

Table 7: Anderson–Darling test results.

| Distribution | $W_n^2$ | p value | Best fit |
|---|---|---|---|
| Expo | – | 1.34 | 0 |
| Norm | 20.167945 | 0.784 | 0 |
| Logistic | 4.936475 | 0.66 | 1 |
| t-student | 90.8091 | 1.24 | 0 |

Table 8: AIC and BIC of SARIMA-SGARCH models.

| Model | AIC | Log-likelihood |
|---|---|---|
| SARIMA(0,0,1)(1,1,0)2-SGARCH(1,1) | 1008.52 | −501.259 |
| SARIMA(0,0,1)(1,1,0)2-SGARCH(1,2) | 979.6836 | −485.842 |
| SARIMA(0,0,1)(0,1,1)2-SGARCH(2,1) | 1010.5175 | −501.255 |
| SARIMA(0,0,1)(1,1,1)2-SGARCH(2,2) | 981.6836 | −485.842 |

Table 9: AIC and BIC of SARIMA-TGARCH models.

| Model | AIC | Log-likelihood |
|---|---|---|
| SARIMA(0,0,1)(1,1,0)2-TGARCH(1,1,1) | 1010.28 | −501.139 |
| SARIMA(0,0,1)(1,1,0)2-TGARCH(1,1,2) | 1012.28 | −501.139 |
| SARIMA(0,0,1)(0,1,1)2-TGARCH(1,2,1) | 981.476 | −485.476 |
| SARIMA(0,0,1)(1,1,1)2-TGARCH(2,1,1) | 1012.28 | −501.139 |
| SARIMA(0,0,1)(1,1,1)2-TGARCH(1,2,2) | 983.476 | −485.738 |
| SARIMA(0,0,1)(1,1,1)2-TGARCH(2,1,2) | 1014.28 | −501.139 |
| SARIMA(0,0,1)(1,1,1)2-TGARCH(2,2,1) | 983.476 | −485.738 |
| SARIMA(0,0,1)(1,1,1)2-TGARCH(2,2,2) | 985.476 | −485.738 |

Table 10: AIC and BIC of SARIMA-EGARCH models.

| Model | AIC | Log-likelihood |
|---|---|---|
| SARIMA(0,0,1)(1,1,0)2-EGARCH(1,1,1) | −617.779 | −584.740 |
| SARIMA(0,0,1)(1,1,0)2-EGARCH(1,1,2) | −730.704 | 372.352 |
| SARIMA(0,0,1)(0,1,1)2-EGARCH(1,2,1) | −671.070 | 342.535 |
| SARIMA(0,0,1)(1,1,1)2-EGARCH(2,1,1) | −730.284 | 691.738 |
| SARIMA(0,0,1)(1,1,1)2-EGARCH(1,2,2) | −730.594 | 686.541 |
| SARIMA(0,0,1)(1,1,1)2-EGARCH(2,1,2) | −728.706 | −684.653 |
| SARIMA(0,0,1)(1,1,1)2-EGARCH(2,2,1) | −687.262 | 351.631 |
| SARIMA(0,0,1)(1,1,1)2-EGARCH(2,2,2) | −728.601 | 371.301 |

Table 11: Parameter estimation of GARCH family models using the training set.

| Model | Parameter | Coefficient | p value |
|---|---|---|---|
| SGARCH(1,2) | Intercept ω | 0.0177 | 0.07805 |
| | α1 | 0.2337 | 0.02582 |
| | β1 | 0.3201 | 0.07668 |
| | β2 | 0.4431 | 0.01286 |
| TGARCH(1,2,1) | Intercept ω | 0.0177 | 0.07602 |
| | α1 | 0.2338 | 0.02499 |
| | β1 | 0.2147 | 0.047391 |
| | β2 | 0.4223 | 0.01255 |
| | γ1 | −0.2338 | 0.939 |
| EGARCH(1,1,2) | Intercept ω | 1.9742 | 0.01346 |
| | α1 | 2.5914 | 0.06552 |
| | β1 | 0.9946 | 0.001 |
| | γ1 | −0.7159 | 0.619 |
| | γ2 | −1.5419 | 0.02563 |

The study applied three variants of GARCH family models, SGARCH, EGARCH, and TGARCH, to the residuals of SARIMA for model identification. The AIC and

2.00

1.75

1.50

Volatility 1.25

1.00

0.75

0.50

0.25

0.00
9/21/2022

10/27/2022

12/2/2022

1/7/2023

2/12/2023

3/20/2023

4/25/2023

5/31/2023

7/6/2023

8/11/2023

9/16/2023
Days

Actual volatility SARIMA-SGARCH volatility


SARIMA-TGARCH volatility SARIMA-EGARCH volatility

Figure 11: Volatility forecast of hybrid econometric models.

log-likelihood of the estimated models are noted for each of


Table 12: Evaluation results of hybrid SARIMA-GARCH family
the models. models.
4.1. SARIMA-SGARCH Model Identification. The SAR- Model MAE RMSE
IMA(0,0,1)(0,0,1)2-SGARCH(1,2) model under the logistic
distribution was favored over the other models due to its Hybrid SARIMA-SGARCH 0.2053 0.2731
smallest AIC value (979.6836) and larger log-likelihood Hybrid SARIMA-EGARCH 0.444 1.1706
(−485.842), as indicated in Table 8. Hybrid SARIMA-TGARCH 0.2601 0.3281

4.2. SARIMA-TGARCH Model Identification. The SAR-


IMA(0,0,1)(0,0,1)2-TGARCH(1,2,1) model under the logis- εt−1 ε
tic distribution was favored over the other models due to ln σ2t = −1 9742 + 2 5914 · − 0 7159 · t−1 − 1 5419
σt−1 σt−1
its smallest AIC value (981.476) and larger log-likelihood εt−2
(−485.476), as illustrated in Table 9. · + 0 9946 · ln σ2t−1
σt−2
4.3. SARIMA-EGARCH Model Identification. The SAR- 21
IMA(0,0,1)(0,0,1)2-EGARCH(1,1,2) model under the logis-
tic distribution was favored over the other models due to
its smallest AIC value (−730.704) and larger log-likelihood Alexander [41] stated that volatility takes longer to dimin-
(372.352), as demonstrated in Table 10. ish when β j is relatively large. Therefore, based on Equation
The mean and volatility models were developed as fol- (21), the value of β1 = 0 9946 indicated longer volatility persis-
lows by utilizing Equation (19) and Table 11. tence. Moreover, since both γ1 = −0 7159 and γ2 = −1 5419
were negative, it suggested that adverse news had a greater
Mean model r t = 0 0011 − 0 2096r t−2 + 0 1998εt−1 + εt impact on the volatility of returns for Apple Inc. Finally,
Volatility model σ2t = 0 0177 + 0 2337ε2t−1 + 0 3201σ2t−1 + 0 4431σ2t−1 the hybrid SARIMA(0,0,1)(1,0,0,2)-TGARCH(1,2,1) model
exhibited a similar mean model as described in Equation
20 (20), but the volatility equation was formulated as follows:

Equation (20) shows the mean and volatility model of


hybrid SARIMA(0,0,1)(1,0,0,2)-SGARCH(1,2). In the hybrid σ2t = 0 0177 + 0 2147ε2t−1 − 0 2338ε2t−1 · d t−1 + 0 4223σ2t−1
SARIMA(0,0,1)(1,0,0,2)-SGARCH(1,2) model, the degree of 22
volatility persistence was found to be 0.9969. Given that the
persistence parameter was very close to unity, volatility was
predictable for future periods. From Equation (22), γ1 = −0 2338 ≠ 0 indicates that
For the hybrid SARIMA(0,0,1)(1,0,0,2)-EGARCH(1,1,2) there exists a leverage effect. In addition to this, γ1 < 0 indi-
model, the mean equation was as described in Equation cates that volatility increased more by positive shocks than
(20), while the volatility model was described as follows: negative shocks at equal length.
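The fitted volatility equations can be exercised directly with the Table 11 estimates. The sketch below, with hypothetical helper names, filters residuals through the SGARCH(1,2) recursion of Equation (20), iterates its multi-step variance forecast, checks the persistence α1 + β1 + β2 and the implied long-run variance ω/(1 − α1 − β1 − β2), and evaluates the TGARCH response of Equation (22) to a positive and a negative unit shock. The start-up value of the recursion and the convention d_{t−1} = 1 when ε_{t−1} < 0 (and 0 otherwise) are assumptions, not statements from the text.

```python
# SGARCH(1,2) and TGARCH(1,2,1) coefficients as reported in Table 11.
SG = {"omega": 0.0177, "alpha1": 0.2337, "beta1": 0.3201, "beta2": 0.4431}
TG = {"omega": 0.0177, "alpha1": 0.2338, "gamma1": -0.2338, "beta1": 0.2147, "beta2": 0.4223}


def sgarch12_variance(eps, omega, alpha1, beta1, beta2, sigma2_init):
    """sigma2_t = omega + alpha1*eps_{t-1}^2 + beta1*sigma2_{t-1} + beta2*sigma2_{t-2}.

    The first two conditional variances are set to `sigma2_init` (an assumption).
    """
    sigma2 = [sigma2_init, sigma2_init][: len(eps)]
    for t in range(2, len(eps)):
        sigma2.append(omega + alpha1 * eps[t - 1] ** 2
                      + beta1 * sigma2[t - 1] + beta2 * sigma2[t - 2])
    return sigma2


def sgarch12_forecast(sigma2_now, sigma2_next, omega, alpha1, beta1, beta2, horizon):
    """E_t[sigma2_{t+h}] for h = 1..horizon.

    For h >= 2, E_t[eps_{t+h-1}^2] = E_t[sigma2_{t+h-1}], so
    f_h = omega + (alpha1 + beta1) * f_{h-1} + beta2 * f_{h-2}.
    """
    f = [sigma2_now, sigma2_next]  # f_0 = sigma2_t, f_1 = sigma2_{t+1}, both known at t
    for _ in range(horizon - 1):
        f.append(omega + (alpha1 + beta1) * f[-1] + beta2 * f[-2])
    return f[1:]


def tgarch_news_impact(shock, sigma2_lag1=1.0, sigma2_lag2=1.0, params=TG):
    """sigma2_t from Equation (22) for a given eps_{t-1}, holding lagged variances fixed.

    Assumes d_{t-1} = 1 if eps_{t-1} < 0 and 0 otherwise.
    """
    d = 1.0 if shock < 0 else 0.0
    return (params["omega"] + (params["alpha1"] + params["gamma1"] * d) * shock ** 2
            + params["beta1"] * sigma2_lag1 + params["beta2"] * sigma2_lag2)


persistence = SG["alpha1"] + SG["beta1"] + SG["beta2"]
long_run_var = SG["omega"] / (1.0 - persistence)

print(f"persistence = {persistence:.4f}")                  # 0.9969
print(f"long-run variance = {long_run_var:.2f}")           # 5.71
print(tgarch_news_impact(1.0) > tgarch_news_impact(-1.0))  # True
```

Because the persistence is just below one, the multi-step forecasts approach the long-run variance only slowly, which is the practical meaning of the persistence remark above. The last line reproduces the asymmetry claimed for Equation (22): under the stated convention for d_{t−1}, a positive shock raises next-period variance more than a negative shock of the same size.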

Figure 12: Volatility forecasts for hybrid econometrics with CNN models. (a) Volatility forecast of hybrid SARIMA-SGARCH-CNN, (b) volatility forecast of hybrid SARIMA-TGARCH-CNN, and (c) volatility forecast of hybrid SARIMA-EGARCH-CNN. (Each panel plots Volatility against Days, 2022-09 to 2023-09, for the actual test series and the model prediction.)

4.4. The Prediction Results of Hybrid Econometric Models. Figure 11 shows the fitted curves of the predicted and true values for each hybrid econometric model. The fit of the hybrid SARIMA-SGARCH model is better than that of the other two models. Table 12 gives the evaluation metrics of the hybrid econometric models; among them, the hybrid SARIMA-SGARCH model has the smallest MAE (0.2053) and RMSE (0.2731), so its overall performance is the best of the three. However, these hybrid econometric models did not show good prediction performance, likely because they do not capture complex nonlinear relationships and sequential dependencies in the data.

4.5. Prediction Results of Hybrid Econometric Models With CNN Model. The predictions above show that the hybrid SARIMA-SGARCH model has a better fit and higher prediction accuracy than the other two econometric models. To further improve the accuracy of volatility prediction, the study relies on (1) hybrid econometric models to describe the series pattern and volatility clustering and (2) the CNN model to capture spatial patterns and local dependencies in the data, enhancing model robustness and accuracy. In this experiment, the estimated volatilities of the hybrid econometric models were used as input features for the CNN model. The prediction results of the hybrid econometric models with CNN are shown in Figures 12(a), 12(b), and 12(c), and the corresponding evaluation outcomes are summarized in Table 13. Compared with hybrid SARIMA-SGARCH, the MAE and RMSE of hybrid SARIMA-SGARCH-CNN decreased by 65% and 66.7%, respectively. The MAE and RMSE of hybrid SARIMA-EGARCH-CNN increased by 36.3% and 75.4%, respectively. The MAE and RMSE of hybrid SARIMA-TGARCH-CNN decreased by 51.5% and 54.5%, respectively. The MAE and RMSE values of hybrid SARIMA-SGARCH-CNN are the lowest among the models considered, indicating its superior overall performance. Even so, the hybrid SARIMA-SGARCH-CNN model still showed suboptimal prediction performance, possibly because it cannot effectively capture sequential patterns and long-term dependencies within the data.

Table 13: Evaluation results of hybrid econometrics with CNN models.

| Model | MAE | RMSE |
|---|---|---|
| Hybrid SARIMA-SGARCH-CNN | 0.0717 | 0.0909 |
| Hybrid SARIMA-EGARCH-CNN | 0.3223 | 1.1120 |
| Hybrid SARIMA-TGARCH-CNN | 0.099510 | 0.12419 |

4.6. Prediction Results of the Proposed Hybrid Econometrics With Hybrid CNN-BiLSTM Model. The study introduced three novel hybrid models, denoted hybrid SARIMA-SGARCH-CNN-BiLSTM, hybrid SARIMA-EGARCH-CNN-BiLSTM, and hybrid SARIMA-TGARCH-CNN-BiLSTM, and compared them with the existing hybrid econometric models and with the hybrid econometric models incorporating CNN. The study also compared the newly proposed hybrid models with one another to identify the best-performing model. The new models are constructed by using the forecasted volatility of the hybrid econometric models as input features to the hybrid CNN-BiLSTM model, which captures dependencies from both past and future contexts. The prediction results of the newly proposed models are shown in Figures 13(a), 13(b), and 13(c), and their evaluation results are presented in Table 14.

Compared with hybrid SARIMA-SGARCH-CNN, hybrid SARIMA-SGARCH-CNN-BiLSTM showed notable improvements, with decreases of 28.3% in MAE and 22.3% in RMSE. Conversely, hybrid SARIMA-EGARCH-CNN-BiLSTM showed increases of 74.7% in MAE and 91.8% in RMSE compared with hybrid SARIMA-SGARCH-CNN. Similarly, hybrid SARIMA-TGARCH-CNN-BiLSTM showed a slight increase of 1.7% in MAE and a larger increase of 23.3% in RMSE relative to hybrid SARIMA-SGARCH-CNN. Moreover, hybrid SARIMA-SGARCH-CNN-BiLSTM outperformed hybrid SARIMA-EGARCH-CNN-BiLSTM, with decreases of 81.9% in MAE and 93.6% in RMSE, and outperformed hybrid SARIMA-TGARCH-CNN-BiLSTM, with reductions of 29.5% in MAE and 40.5% in RMSE. Hybrid SARIMA-TGARCH-CNN-BiLSTM, in turn, improved on hybrid SARIMA-EGARCH-CNN-BiLSTM, with reductions of 58.2% in MAE and 89.3% in RMSE. Notably, among the newly proposed hybrid models, hybrid SARIMA-SGARCH-CNN-BiLSTM achieved the highest predictive accuracy, making it the most effective of the proposed models.

Figure 13: Volatility forecasts for a hybrid of econometrics with CNN-BiLSTM models. (a) Volatility forecast of hybrid SARIMA-SGARCH-CNN-BiLSTM, (b) volatility forecast of hybrid SARIMA-TGARCH-CNN-BiLSTM, and (c) volatility forecast of hybrid SARIMA-EGARCH-CNN-BiLSTM. (Each panel plots Volatility against Days, 2022-09 to 2023-09, for the actual test series and the model prediction.)

Table 14: Evaluation results of hybrid econometrics with hybrid CNN-BiLSTM models.

| Model | MAE | RMSE |
|---|---|---|
| Hybrid SARIMA-SGARCH-CNN-BiLSTM | 0.0514 | 0.0706 |
| Hybrid SARIMA-EGARCH-CNN-BiLSTM | 0.2840 | 1.1110 |
| Hybrid SARIMA-TGARCH-CNN-BiLSTM | 0.07301 | 0.11864 |

Based on the obtained results, the hybrid SARIMA-SGARCH-CNN-BiLSTM model outperformed the hybrid econometric models, with average reductions in MAE and RMSE of 81.9% and 82.25%, respectively. It also outperformed the hybrid econometric models with CNN, with average reductions in MAE and RMSE of 53.5% and 52.8%, respectively. Compared with its counterparts among the hybrid econometric models with CNN-BiLSTM, it showed average reductions in MAE and RMSE of 55.5% and 67%, respectively, underscoring its improved predictive accuracy and efficiency. Compared with the three distinct hybrid models, the proposed hybrid SARIMA-SGARCH-CNN-BiLSTM model shows average reductions in MAE and RMSE of 60.35% and 60.6%, respectively. Overall, the hybrid SARIMA-SGARCH-CNN-BiLSTM model emerged as the top performer among the hybrid models evaluated in this study. These findings underscore the effectiveness of the SARIMA-SGARCH-CNN-BiLSTM approach in volatility forecasting.
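The percentage comparisons in this section are arithmetic on the values in Tables 12, 13, and 14. The small sketch below (hypothetical names; numbers copied from the tables) computes a reduction as (baseline − model)/baseline × 100, the convention behind, for example, the 28.3% and 22.3% improvements of hybrid SARIMA-SGARCH-CNN-BiLSTM over hybrid SARIMA-SGARCH-CNN. Treating an "average reduction" as the plain mean of the per-baseline reductions is one reading of the text and an assumption here.

```python
# MAE and RMSE as reported in Tables 12, 13, and 14.
RESULTS = {
    "SARIMA-SGARCH": (0.2053, 0.2731),
    "SARIMA-EGARCH": (0.444, 1.1706),
    "SARIMA-TGARCH": (0.2601, 0.3281),
    "SARIMA-SGARCH-CNN": (0.0717, 0.0909),
    "SARIMA-EGARCH-CNN": (0.3223, 1.1120),
    "SARIMA-TGARCH-CNN": (0.099510, 0.12419),
    "SARIMA-SGARCH-CNN-BiLSTM": (0.0514, 0.0706),
    "SARIMA-EGARCH-CNN-BiLSTM": (0.2840, 1.1110),
    "SARIMA-TGARCH-CNN-BiLSTM": (0.07301, 0.11864),
}


def reduction_pct(baseline, model):
    """Percentage reduction (MAE, RMSE) of `model` relative to `baseline`'s own values."""
    return tuple(100.0 * (b - m) / b for b, m in zip(RESULTS[baseline], RESULTS[model]))


def average_reduction(model, baselines):
    """Mean of the per-baseline percentage reductions, metric by metric."""
    pairs = [reduction_pct(b, model) for b in baselines]
    return tuple(sum(p[i] for p in pairs) / len(pairs) for i in range(2))


best = "SARIMA-SGARCH-CNN-BiLSTM"
print(reduction_pct("SARIMA-SGARCH-CNN", best))  # about (28.31, 22.33)
print(average_reduction(best, ["SARIMA-EGARCH-CNN-BiLSTM",
                               "SARIMA-TGARCH-CNN-BiLSTM"]))  # about (55.75, 67.07)
```

Keeping the table values in one dictionary makes it easy to recompute any pairwise or averaged comparison when a model's metrics are updated.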

5. Conclusion

Based on the findings presented in the study, the following conclusions were drawn. The study used the residuals of the SARIMA model as input for three hybrid econometric models and selected the best model. The results showed that the hybrid SARIMA-SGARCH model outperformed the other two hybrid models. Additionally, the study incorporated the estimated volatilities of the econometric models as input features to the CNN to enhance prediction accuracy. The results indicated that the hybrid SARIMA-SGARCH-CNN model outperformed the hybrid SARIMA-EGARCH-CNN and hybrid SARIMA-TGARCH-CNN models. Finally, the study constructed three new models that use the estimated volatility of the hybrid econometric models as input to the CNN-BiLSTM model and concluded that the hybrid SARIMA-SGARCH-CNN-BiLSTM model performs best among the models considered. This proposed model demonstrated its effectiveness on Apple Inc. data, providing valuable insights for financial data analysis and risk management strategies and thus helping investors make informed decisions.

6. Recommendation

The study was limited to investigating the volatility prediction of Apple Inc. using the three new hybrids of econometrics with CNN-BiLSTM models. As a result, the following aspects are suggested for further investigation. Future studies should explore incorporating attention mechanisms or transformer architectures to capture complex temporal dependencies and improve forecasting accuracy. Additionally, evaluating the impact of additional data sources beyond historical stock prices, such as sentiment analysis and financial news, could strengthen the forecasting model. Furthermore, extending the research to volatility forecasting for a portfolio of assets and exploring the correlations between those assets is recommended.

Data Availability Statement

The data used in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding

This work is supported by the Pan African University Institute for Basic Sciences, Technology and Innovation.

References

[1] E. F. Fama, “The behavior of stock-market prices,” The Journal of Business, vol. 38, no. 1, pp. 34–105, 1965.
[2] T. H. Roh, “Forecasting the volatility of stock price index,” Expert Systems with Applications, vol. 33, no. 4, pp. 916–922, 2007.
[3] D. Bhowmik, “Stock market volatility: an evaluation,” International Journal of Scientific and Research Publications, vol. 3, no. 10, pp. 1–17, 2013.
[4] R. F. Engle, “Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation,” Econometrica: Journal of the Econometric Society, vol. 50, no. 4, pp. 987–1007, 1982.
[5] T. Bollerslev, “Generalized autoregressive conditional heteroskedasticity,” Journal of Econometrics, vol. 31, no. 3, pp. 307–327, 1986.
[6] R. Khaldi, A. El Afia, and R. Chiheb, “Forecasting of BTC volatility: comparative study between parametric and nonparametric models,” Progress in Artificial Intelligence, vol. 8, no. 4, pp. 511–523, 2019.
[7] D. B. Nelson, “Conditional heteroskedasticity in asset returns: a new approach,” Econometrica: Journal of the Econometric Society, vol. 59, no. 2, pp. 347–370, 1991.
[8] J.-M. Zakoian, “Threshold heteroskedastic models,” Journal of Economic Dynamics and Control, vol. 18, no. 5, pp. 931–955, 1994.
[9] B. Y. Almansour, M. M. Alshater, and A. Y. Almansour, “Performance of ARCH and GARCH models in forecasting cryptocurrency market volatility,” Industrial Engineering & Management Systems, vol. 20, no. 2, pp. 130–139, 2021.
[10] P. H. Franses and D. Van Dijk, “Forecasting stock market volatility using (non-linear) GARCH models,” Journal of Forecasting, vol. 15, no. 3, pp. 229–235, 1996.
[11] J. Sen, S. Mehtab, and A. Dutta, “Volatility modeling of stocks from selected sectors of the Indian economy using GARCH,” in 2021 Asian Conference on Innovation in Technology (ASIANCON), Pune, India, 2021.
[12] J. Aduda, P. Weke, P. Ngare, and J. Mwaniki, “Financial time series modelling of trends and patterns in the energy markets,” Journal of Mathematical Finance, vol. 6, no. 2, pp. 324–337, 2016.
[13] Y. Liu, “Novel volatility forecasting using deep learning–long short term memory recurrent neural networks,” Expert Systems with Applications, vol. 132, pp. 99–109, 2019.
[14] H. Y. Kim and C. H. Won, “Forecasting the volatility of stock price index: a hybrid model integrating LSTM with multiple GARCH-type models,” Expert Systems with Applications, vol. 103, pp. 25–37, 2018.
[15] K. Kakade, A. K. Mishra, K. Ghate, and S. Gupta, “Forecasting commodity market returns volatility: a hybrid ensemble learning GARCH-LSTM based approach,” Intelligent Systems in Accounting, Finance and Management, vol. 29, no. 2, pp. 103–117, 2022.
[16] A. Vidal and W. Kristjanpoller, “Gold volatility prediction
using a CNN-LSTM approach,” Expert Systems with Applica-
tions, vol. 157, article 113481, 2020.
Acknowledgments [17] H. Zeng, B. Shao, G. Bian, H. Dai, and F. Zhou, “A hybrid deep
learning approach by integrating extreme gradient boosting-
This work is supported by the Pan African University Insti- long short-term memory with generalized autoregressive con-
tute for Basic Sciences, Technology and Innovation. ditional heteroscedasticity family models for natural gas load
4185, 2024, 1, Downloaded from https://onlinelibrary.wiley.com/doi/10.1155/2024/6305525 by CochraneChina, Wiley Online Library on [02/08/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Journal of Applied Mathematics 19

volatility prediction,” Energy Science & Engineering, vol. 10, the high-voltage subnet of Northeast Germany,” Sensors,
no. 7, pp. 1998–2021, 2022. vol. 23, no. 2, p. 901, 2023.
[18] D. K. Mademlis and N. Dritsakis, “Volatility forecasting using [36] E. Hajizadeh, A. Seifi, M. F. Zarandi, and I. Turksen, “A hybrid
hybrid GARCH neural network models: the case of the Italian modeling approach for forecasting the volatility of S&P 500
stock market,” International Journal of Economics and Finan- index return,” Expert Systems with Applications, vol. 39,
cial Issues, vol. 11, no. 1, pp. 49–60, 2021. no. 1, pp. 431–436, 2012.
[19] F. H. Mustapa and M. T. Ismail, “Modelling and forecasting [37] M. Seo and G. Kim, “Hybrid forecasting models based on the
S&P 500 stock prices using hybrid Arima-Garch model,” Jour- neural networks for the volatility of bitcoin,” Applied Sciences,
nal of Physics: Conference Series, vol. 1366, no. 1, article vol. 10, no. 14, p. 4768, 2020.
012130, 2019. [38] D. S. Moore and G. P. McCabe, Introduction to the Practice of
[20] L. R. Glosten, R. Jagannathan, and D. E. Runkle, “On the rela- Statistics, WH Freeman/Times Books/Henry Holt & Co., 1989.
tion between the expected value and the volatility of the nom-
[39] M. Stone, “An asymptotic equivalence of choice of model by
inal excess return on stocks,” The Journal of Finance, vol. 48,
cross-validation and Akaike’s criterion,” Journal of the Royal
no. 5, pp. 1779–1801, 1993.
Statistical Society: Series B (Methodological), vol. 39, no. 1,
[21] R. Rabemananjara and J.-M. Zakoian, “Threshold ARCH pp. 44–47, 1977.
models and asymmetries in volatility,” Journal of Applied
[40] R. Budiarti, K. Intansari, I. G. P. Purnaba, and F. Septyanto,
Econometrics, vol. 8, no. 1, pp. 31–49, 1993.
“Modelling dependencies of stock indices during Covid-19
[22] S.-H. Poon, A Practical Guide to Forecasting Financial Market pandemic by extreme-value copula,” Jurnal Teori dan Aplikasi
Volatility, John Wiley & Sons, 2005. Matematika, vol. 7, no. 3, pp. 805–819, 2023.
[23] H. Akaike, “A new look at the statistical model identification,”
[41] C. Alexander, Market Risk Analysis, John Wiley & Sons,
IEEE Transactions on Automatic Control, vol. 19, no. 6,
Boxset, 2009.
pp. 716–723, 1974.
[24] F. Berzal, Redes Neuronales & Deep Learning: Volumen II,
Independently published, 2019.
[25] M. Vakalopoulou, S. Christodoulidis, N. Burgos, O. Colliot,
and V. Lepetit, “Deep learning: basics and convolutional neu-
ral networks (CNNs),” in Machine Learning for Brain Disor-
ders, O. Colliot, Ed., vol. 197 of Neuromethods, Humana,
New York, NY, 2023.
[26] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning,
MIT press, 2016.
[27] J. Cao and J. Wang, “Stock price forecasting model based on
modified convolution neural network and financial time series
analysis,” International Journal of Communication Systems,
vol. 32, no. 12, article e3987, 2019.
[28] L. Muhammad, A. A. Haruna, U. S. Sharif, and M. B. Moham-
med, “CNN-LSTM deep learning based forecasting model for
Covid-19 infection cases in Nigeria, South Africa and
Botswana,” Health and Technology, vol. 12, no. 6, pp. 1259–
1276, 2022.
[29] J. Rala Cordeiro, A. Raimundo, O. Postolache, and
P. Sebastião, “Neural architecture search for 1D CNNs—dif-
ferent approaches tests and measurements,” Sensors, vol. 21,
no. 23, p. 7990, 2021.
[30] S. Y. Yerima, M. K. Alzaylaee, and A. P. V. Shajan, “Deep
learning techniques for android botnet detection,” Electronics,
vol. 10, no. 4, p. 519, 2021.
[31] S. Hochreiter and J. Schmidhuber, “Long short-term mem-
ory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[32] M. Noor, A. Yahaya, N. A. Ramli, and A. M. Al Bakri, “Filling
missing data using interpolation methods: study on the effect
of fitting distribution,” Key Engineering Materials, vol. 594,
pp. 889–895, 2014.
[33] K. Erdogan, “Spline interpolation techniques,” Journal of Tech-
nical Science and Technologies, vol. 2, no. 1, pp. 47–52, 2013.
[34] G. Memarzadeh and F. Keynia, “A new short-term wind speed
forecasting method based on fine-tuned LSTM neural network
and optimal input sets,” Energy Conversion and Management,
vol. 213, article 112824, 2020.
[35] F. Aksan, Y. Li, V. Suresh, and P. Janik, “CNN-LSTM vs.
LSTM-CNN to predict power flow direction: a case study of
