Research Article
A Hybrid GARCH and Deep Learning Method for
Volatility Prediction
1 Department of Mathematics, Pan African University Institute for Basic Sciences, Technology and Innovation, Nairobi 62000, Kenya
2 Department of Mathematics, Debre Markos University, Debre Markos 269, Ethiopia
3 Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi 62000, Kenya
4 Department of Mathematics, Bahir Dar University, Bahir Dar 26, Ethiopia
Copyright © 2024 Hailabe T. Araya et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Volatility prediction plays a vital role in financial data analysis. The time series movements of stock prices are commonly characterized as highly nonlinear and volatile. This study aims to enhance the accuracy of return volatility forecasts for stock prices by integrating diverse models. The study combines four methods: the seasonal autoregressive integrated moving average (SARIMA) model, generalized autoregressive conditional heteroskedasticity (GARCH) family models, a convolutional neural network (CNN), and a bidirectional long short-term memory (BiLSTM) network. The hybrid model was developed by using the residuals generated by the SARIMA model as input for the GARCH model. The estimated volatility obtained was then used as an input feature for the hybrid CNN and BiLSTM models. The model's forecasting performance was assessed using key evaluation metrics, namely mean absolute error (MAE) and root mean squared error (RMSE). Compared to the other hybrid models, the proposed hybrid model achieves average reductions in MAE and RMSE of 60.35% and 60.61%, respectively. The experimental results show that the proposed model performs well and accurately predicts the volatility of stock prices. These findings offer valuable insights for financial data analysis and risk management strategies.
to obtain an optimal model (a hybrid of GARCH and deep learning models) to predict volatility in stock prices.
The forecasting of volatility in financial instruments has been extensively examined in recent decades, primarily due to its role as an indicator that enables the estimation of risk associated with an asset within a specified time frame. In recent years, substantial literature has emerged on modeling and predicting volatility in financial markets. Primarily, Engle [4] developed the autoregressive conditional heteroskedasticity (ARCH) model by incorporating conditional variance and modeling the serial correlation of returns as a function of past errors and changing time. This was carried out as part of Engle's attempt to explain how inflation dynamics operate in the United Kingdom.
To enhance Engle's model, the GARCH models were developed by Bollerslev [5]. This enhancement involved incorporating a long memory and creating a more flexible lag structure by adding lagged conditional variance to the original model. The standard GARCH (SGARCH) model cannot model the leverage effect because its specification assumes that the variance depends on the shock's magnitude and is independent of its sign [6].
Later, various adaptable GARCH models were introduced, incorporating additional parameters to capture the asymmetric behavior of time series data, such as the exponential GARCH (EGARCH) model proposed by Nelson [7] and the threshold GARCH (TGARCH) model introduced by Zakoian [8].
B. Almansour, Alshater, and A. Almansour [9] assessed the effectiveness of ARCH and GARCH models in predicting volatility within the cryptocurrency market. The findings indicated that both positive and negative news events have a notable impact on conditional volatility across various cryptocurrency markets. Additionally, the study concluded that the GARCH model demonstrates promising predictive capabilities for cryptocurrency price movements.
Franses and Van Dijk [10] conducted a study on forecasting stock market volatility using nonlinear GARCH methods. The investigation involved an analysis of the GARCH model and two of its nonlinear modifications to forecast weekly stock market volatility. The findings indicated that the quadratic generalized ARCH (QGARCH) model was the most effective for forecasting stock market volatility.
Sen, Mehtab, and Dutta [11] predicted the volatility of stocks from selected sectors of the National Stock Exchange (NSE) of the Indian economy using GARCH. The researchers found that asymmetric GARCH models, notably, provide more precise forecasts of the future volatility levels of the selected stocks. In a modified setting, specifically ARIMA-GARCH modeling as demonstrated by Aduda et al. [12], the residuals of ARIMA have been explored as a vital factor for improving forecasting in GARCH.
Implementing deep learning models in financial markets, especially in stock markets, has become a burgeoning research subject in recent times due to the increasing use of artificial intelligence (AI) in various fields. Artificial neural network (ANN) models can capture the nonlinearity of the series, do not require the series to be stationary for modeling, and perform better in volatility forecasts than SGARCH-type models. Liu [13] showed that, over a considerable time interval, volatility predictions for the Standard & Poor's 500 (S&P 500) and Apple Inc. indicate that the long short-term memory (LSTM) network can outperform the GARCH model.
In recent years, the field of financial time series analysis has witnessed growing interest in the development of hybrid models that combine various deep-learning techniques with statistical models to enhance the accuracy of volatility prediction. Kim and Won [14] developed a hybrid model to predict the volatility of the Korea Composite Stock Price Index (KOSPI 200) by integrating GARCH-type models with the LSTM model.
Kakade et al. [15] investigated the advantage of hybridizing GARCH-type models with LSTM to predict the volatility of metals in the Indian commodity market. The study found that hybrid GARCH-LSTM models outperform standalone models. Vidal and Kristjanpoller [16] studied gold volatility prediction using a hybrid CNN-LSTM approach and found that the hybrid CNN-LSTM model outperforms the GARCH and LSTM models in forecasting the volatility of gold.
Zeng et al. [17] studied a natural gas load volatility prediction model that combines GARCH family models, the eXtreme Gradient Boosting (XGBoost) algorithm, and the LSTM network. Mademlis and Dritsakis [18] presented two hybrid models for predicting the volatility of the Financial Times Stock Exchange Milano Italia Borsa (FTSE MIB) index and evaluated their efficacy alongside an asymmetric GARCH model and a neural network.
All the above studies have their limitations; for instance, hybrid SARIMA-GARCH family models for volatility forecasting often face limitations when confronted with nonlinear sequences and influential factors [19]. Moreover, a single convolutional neural network (CNN) model has a poor interpretation of volatility, so its prediction accuracy will not be high.
Combining hybrid econometric models such as SARIMA-GARCH family models with CNN models can effectively address these shortcomings in volatility forecasting. CNN outperforms GARCH family models in capturing complex temporal patterns and nonlinear dependencies in volatility structures. Furthermore, CNN can automatically identify hierarchical features from the data, aiding in the extraction of meaningful representations, and it proves robust to noisy data. However, integrating CNN with GARCH family models alone often fails to yield superior prediction results, as an abundance of feature inputs can degrade model performance. BiLSTM outperforms CNN in capturing long-term dependencies and sequential patterns in the data. By utilizing both forward and backward information patterns of the market, BiLSTM enhances accuracy in time series prediction, and its memory cells effectively handle data prone to irregular and seasonal patterns.
\[
\Phi_P(B^s)\,\phi_p(B)\,\nabla^d \nabla_s^D (r_t - \mu) = \Theta_Q(B^s)\,\theta_q(B)\,\varepsilon_t \tag{1}
\]
where the nonseasonal MA and AR components are represented by $\theta_q(B) = 1 + \sum_{i=1}^{q} \theta_i B^i$ and $\phi_p(B) = 1 - \sum_{j=1}^{p} \phi_j B^j$, respectively. The seasonal AR and MA components are represented by $\Phi_P(B^s) = 1 - \sum_{i=1}^{P} \Phi_i B^{is}$ and $\Theta_Q(B^s) = 1 + \sum_{j=1}^{Q} \Theta_j B^{js}$, respectively. $r_t$ represents the time series, $\varepsilon_t$ denotes the random error at time period $t$, and $\mu$ is the mean of the model. $\nabla^d$ and $\nabla_s^D$ represent the nonseasonal and seasonal differencing operators, defined as $\nabla^d r_t = (1 - B)^d r_t$ and $\nabla_s^D r_t = (1 - B^s)^D r_t$.

2.2. SGARCH Model. The SGARCH(r, s) model is one in which the variance of the error term of the SARIMA model follows a GARCH process. The model used for the returns series is represented as follows: the error term $\varepsilon_t$ is equal to $z_t \sigma_t$, where $z_t$ is independent and identically distributed (i.i.d.) with $E(z_t) = 0$ and $\operatorname{Var}(z_t) = 1$. The variance $\sigma_t^2$ is determined by the following equation:
\[
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{r} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{s} \beta_j \sigma_{t-j}^2 \tag{2}
\]
where $\alpha_0 > 0$, $\alpha_i > 0$, and $\beta_j > 0$ are constants.

2.3. EGARCH Model. The EGARCH model can accurately evaluate an asymmetric distribution and also quantify the increased impact of significant shocks on volatility [7]. The conditional mean equation of EGARCH is the same as above; the EGARCH variance equation, which includes positive and negative asymmetric effects of returns, expresses the logarithm of the conditional variance as a function of past standardized shocks.

2.4. TGARCH Model. The TGARCH model extends the SGARCH variance equation with an indicator term for negative shocks,
\[
\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \left(\alpha_i + \gamma_i I_{t-i}\right) \varepsilon_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2
\]
where $\alpha_0$, $\alpha_i$, $\beta_j$, and $\gamma_i$ are parameters that will be estimated, and $I_{t-i}$ is an indicator dummy variable such that $I_{t-i} = 0$ for $\varepsilon_{t-i} \ge 0$ and $I_{t-i} = 1$ for $\varepsilon_{t-i} < 0$. When $\varepsilon_{t-i} \ge 0$ and $\varepsilon_{t-i} < 0$, the total contribution to the volatility is $\alpha_i \varepsilon_{t-i}^2$ and $(\alpha_i + \gamma_i)\varepsilon_{t-i}^2$, respectively. Furthermore, when $I_{t-i} = 0$, the model becomes SGARCH. Thus, two pieces of news of equal magnitude have different effects on conditional volatility. When $\gamma_i > 0$, bad news causes volatility to increase, which leads to a leverage effect at the $i$-th order; when $\gamma_i < 0$, positive shocks of equal size raise conditional volatility more than negative shocks. Conditional volatility remains positive when $\alpha_0 > 0$, $\beta_j \ge 0$, $\alpha_i \ge 0$, and $\alpha_i + \gamma_i \ge 0$. According to Poon [22], the TGARCH model is stationary if $\sum_{i=1}^{p} (\alpha_i + \gamma_i/2) + \sum_{j=1}^{q} \beta_j < 1$.

2.4.1. Model Selection Criteria. The study employed the most widely used model selection method, the Akaike information criterion (AIC) [23], to determine which GARCH model fits the data. The best possible model was selected based on the AIC scores of the models. AIC balances the goodness of fit against the complexity of the model; lower AIC values indicate a better trade-off between fit and complexity.
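To make this mean-plus-variance pipeline concrete, the following is a minimal sketch using statsmodels and the arch package: a SARIMA mean model is fitted to the returns, its residuals are passed to SGARCH, TGARCH (GJR-type), and EGARCH variance models, and the candidate with the lowest AIC is kept. The SARIMA orders, the seasonal period, and the yfinance data call are illustrative placeholders, not the study's fitted specification.

```python
# Sketch with assumed orders and an assumed data source; not the paper's tuned setup.
import numpy as np
import yfinance as yf
from statsmodels.tsa.statespace.sarimax import SARIMAX
from arch import arch_model

prices = yf.download("AAPL", start="2018-09-23", end="2023-09-19")["Close"].squeeze()
returns = (100 * np.log(prices / prices.shift(1))).dropna()        # Equation (18)

# Mean model: SARIMA(p, d, q)(P, D, Q, s); the orders here are placeholders.
sarima = SARIMAX(returns, order=(0, 0, 1), seasonal_order=(2, 0, 0, 5)).fit(disp=False)
residuals = sarima.resid

# Variance models fitted to the SARIMA residuals.
candidates = {
    "SGARCH": arch_model(residuals, vol="GARCH", p=1, q=1),
    "TGARCH": arch_model(residuals, vol="GARCH", p=1, o=1, q=1),   # threshold/GJR term
    "EGARCH": arch_model(residuals, vol="EGARCH", p=1, o=1, q=1),
}
fits = {name: model.fit(disp="off") for name, model in candidates.items()}
best = min(fits, key=lambda name: fits[name].aic)                  # AIC-based selection
volatility = fits[best].conditional_volatility                     # feature for the deep-learning stage
print({name: round(f.aic, 1) for name, f in fits.items()}, "-> best:", best)
```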
2.5. CNN Model. Convolution is a mathematical process that takes two functions and produces a third, which is typically understood to be a filtered or modified version of one of the original functions [24]. One convolution operand, $f[n]$, corresponds to the signal, and the other, $h[n]$, to the filter with which we process the signal. The convolution procedure involves carrying out $K$ multiplications and $K-1$ sums for every value of the signal when the filter is finite and only specified in the domain $0, 1, \ldots, K-1$ [25]. This can be mathematically represented as
\[
(f * h)[n] = \sum_{k=0}^{K-1} h[k]\, f[n-k] \tag{5}
\]
This operation is called convolution.

CNNs are specialized neural networks designed for processing inputs with inherent spatial structures. Renowned for their efficacy across diverse data formats such as one-dimensional time series data, two-dimensional image data, and three-dimensional video data [26], the study used the 1D-CNN model for sequential data. The network uses data-mining algorithms to automatically recognize and pick the most critical features from the raw data. The sequential 1D-CNN model was constructed using convolutional layers, pooling layers, and dense layers. The input was a 3D tensor with shape (batch size, time steps, input dim).

The role of the convolutional layer in the CNN model is to identify temporal patterns and relationships within sequential data, such as time series [27]. The role of the pooling layer, on the other hand, is to perform downsampling to address computational complexity and achieve translation invariance, enabling the model to identify features regardless of their location in the input. Furthermore, downsampling helps to minimize the likelihood of overfitting, increase computational efficiency, and decrease the number of parameters. CNNs combine concepts such as weight sharing and local connectivity to improve the model's ability to perform complex tasks. Relu is a popular activation function in CNNs due to its nonlinearity, computational efficiency, and ability to mitigate the vanishing gradient problem when training deep models. While CNNs are effective models for time series forecasting applications, overfitting is a problem that can affect them; it can arise from factors such as the complexity of the model and highly correlated training data. Dropout is a regularization technique used in neural networks to prevent overfitting.

Feature maps in 1D-CNNs serve as representations of features extracted from time series data. These maps are generated through convolutional filters, capturing specific patterns and structures inherent in the data. Convolutional filters, also known as kernels, are small matrices used in CNNs to extract features from input data. These filters slide, or convolve, across the input data, performing mathematical operations to capture patterns and features at different locations. Each feature map corresponds to learned features within the time series, such as trends or periodicities. As the data propagates through the layers, deeper layers learn more abstract features, building upon earlier representations. In a CNN's convolutional layer, features from the previous layer are combined with learnable kernels and activation functions such as the hyperbolic tangent, sigmoid, and Relu to produce feature maps [28]. As such, each feature map output is combined with more than one input feature map. In general, the convolved features at the output of the $l$-th layer can be written as shown in Equation (6) [28]:
\[
C_i^l = \mathrm{Relu}\!\left(b_i^l + \sum_j C_j^{l-1} \cdot K_{ij}^l\right) \tag{6}
\]
In this context, for the $l$-th layer, $C_i^l$ denotes the $i$-th feature, $C_j^{l-1}$ refers to the $j$-th feature in the preceding $(l-1)$-th layer, $K_{ij}^l$ signifies the kernel connecting the $i$-th to the $j$-th features, $b_i^l$ is the bias associated with these features, and Relu stands for the activation function.

Figure 1 illustrates how feature maps are formed in a CNN model. In the diagram, $u$ represents a sample filter with adjustable weights of size 3, each $C_i$ denotes the $i$-th element of the feature map, $N$ stands for the number of 1D data tensor units, and $f$ represents the filter size, with a stride of 1. According to Rala Cordeiro et al. [29], the stride value defines how the kernel moves over the input data; the most common value is one, meaning that the kernel moves over one column of the input data at each iteration. After convolution, pooling reduces dimensionality and improves feature robustness. The pooling size corresponds to the number of input feature units, and the pooling layer applies a function to multiple inputs (convolutional features); max pooling is used for the pooling layers here. This study used two CNN layers to strike a balance between complexity and performance. This approach reduces the risk of overfitting by avoiding excessive complexity and mitigates challenges like vanishing or exploding gradients during training; previous research, such as [30], lends support to the efficacy of two-layer CNN architectures for the task at hand. The decision reflects a consideration of computational efficiency and training stability. Figure 2 depicts the structure of a 1D-CNN model with two convolutional and two max pooling layers. The output of the last pooling layer is flattened and connected to a dense layer with N units, and this dense layer is connected to the final output layer, which consists of a single neuron.
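As a concrete illustration, a minimal Keras sketch of the two-convolutional-layer 1D-CNN described above might look as follows; the filter counts, kernel sizes, and window length are illustrative choices rather than the study's tuned values.

```python
# Sketch (illustrative sizes): two Conv1D + max pooling blocks, flatten,
# dense layer, dropout, and a single output neuron, as described in Section 2.5.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense, Dropout

n_steps, n_features = 30, 1   # input tensor: (batch size, time steps, input dim)

cnn = Sequential([
    Input(shape=(n_steps, n_features)),
    Conv1D(64, kernel_size=3, activation="relu"),   # convolutional layer 1
    MaxPooling1D(pool_size=2),                      # downsampling
    Conv1D(32, kernel_size=3, activation="relu"),   # convolutional layer 2
    MaxPooling1D(pool_size=2),
    Flatten(),
    Dense(32, activation="relu"),                   # dense layer with N units
    Dropout(0.2),                                   # regularization against overfitting
    Dense(1),                                       # single output neuron
])
cnn.compile(optimizer="adam", loss="mse")
```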
2.6. BiLSTM Model. Hochreiter and Schmidhuber [31] first proposed the LSTM architecture as a specific type of recurrent neural network (RNN) to overcome the limitations of traditional RNNs in capturing and learning long-term dependencies in sequential data. While LSTM captures information from extended periods, the acquired information pertains only to the time before the output moment, so it lacks reverse information. However, for time series prediction, it is crucial to consider both backward and forward information patterns to enhance predictive performance. The two LSTMs that make up BiLSTM are a forward one and a reverse one. In contrast to the regular LSTM's one-way state transfer, the BiLSTM takes into account the data's changing laws both before and after data transmission, enabling it to make more thorough and precise decisions by utilizing both past and future knowledge. It has performed better than expected.
Figure 2: Simplified view of the 1D-CNN model for Apple Inc. data.
Given the input sequence $x = (x_1, x_2, \ldots, x_T)$, the hidden layer sequence $h = (h_1, h_2, \ldots, h_T)$ and the network output vector $y = (y_1, y_2, \ldots, y_T)$ of the standard BiLSTM model are iteratively calculated from $t = 1$ to $t = T$. The updated memory cell computes the current hidden state $h_t$ through the following formulas:
\[
i_t = \sigma\!\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{7}
\]
\[
\tilde{C}_t = \tanh\!\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{8}
\]
\[
f_t = \sigma\!\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{9}
\]
\[
C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t \tag{10}
\]
\[
O_t = \sigma\!\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{11}
\]
\[
h_t = O_t \cdot \tanh\!\left(C_t\right) \tag{12}
\]
Equation (9) defines $f_t$, the forget gate mechanism; the sigmoid activation function $\sigma$ is used to judge whether the last memory needs to be retained for the current memory state. Equation (7) describes the computation of $i_t$, serving as the input gate to assess the significance of retaining the current input data. Equation (8) describes the calculation of the candidate memory $\tilde{C}_t$, which holds the data that needs to be updated, and Equation (10) shows how the state at the current moment is updated. After the new state is obtained, Equation (11) gives the output gate value $O_t$, and Equation (12) yields the hidden state $h_t$. The predicted value of the system is then given by the linear activation function
\[
y_t = W_{hy} \cdot h_t + b_y \tag{13}
\]
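A minimal Keras sketch of how such a BiLSTM can be stacked on the CNN front end and fed sliding windows of estimated volatility might look as follows; the window length of 30, the layer sizes, and the training settings are illustrative assumptions rather than the study's tuned configuration.

```python
# Sketch (illustrative configuration): windows of GARCH-estimated volatility are
# passed through a Conv1D block and a bidirectional LSTM to predict the next value.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Bidirectional, LSTM, Dense, Dropout

def make_windows(volatility: np.ndarray, n_steps: int = 30):
    """Turn a 1D volatility series into (samples, time steps, 1) windows and next-step targets."""
    X, y = [], []
    for i in range(len(volatility) - n_steps):
        X.append(volatility[i:i + n_steps])
        y.append(volatility[i + n_steps])
    return np.array(X)[..., np.newaxis], np.array(y)

model = Sequential([
    Input(shape=(30, 1)),
    Conv1D(64, kernel_size=3, activation="relu"),
    MaxPooling1D(pool_size=2),
    Bidirectional(LSTM(32)),   # forward and backward passes over the window
    Dropout(0.2),
    Dense(1),                  # next-step volatility, a linear output as in Equation (13)
])
model.compile(optimizer="adam", loss="mae")
# Usage: X, y = make_windows(volatility_estimates); model.fit(X, y, epochs=50, validation_split=0.2)
```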
Figure 3 illustrates the internal organization of the BiLSTM model, constructed with LSTM blocks. BiLSTM, comprising both forward and backward LSTM components, necessitates a reversal of the computation.

2.7. SARIMA-GARCH-CNN-BiLSTM. The study introduced a novel approach to volatility forecasting called the hybrid SARIMA-GARCH-CNN-BiLSTM model. First, the mean model was built using SARIMA, which is well known for its ability to detect seasonal and temporal trends in financial time series data. Second, the volatility was estimated using three models from the GARCH family: GARCH, TGARCH, and EGARCH. This makes it easier to choose the most accurate model among the GARCH variations; as a result, asymmetry and time-varying patterns can be captured. SARIMA residuals were used as input to estimate volatility in the GARCH family models. Lastly, the CNN and BiLSTM architectures receive the predicted output of the GARCH family models. CNN effectively captures spatial dependencies, while BiLSTM excels at capturing long-term dependencies, leveraging both temporal and spatial features for improved forecasting.

2.7.1. Model Development Procedure. The overall process of modeling followed the algorithms below.

2.7.2. Data Preprocessing

1. Filling missing data. Due to the closure of financial markets on weekends (Saturdays and Sundays) and public holidays, missing values may occur in stock datasets. Additionally, issues with processing and registration during the data retrieval process could contribute to missing data. Incomplete data introduces biases due to discrepancies between observed and unobserved data. In time series prediction, it is therefore necessary to fill such gaps before modeling (see the sketch below).
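A minimal pandas sketch of this step, assuming the cubic spline interpolation reported in Section 3 and an illustrative reindexing to a full daily calendar, might look as follows.

```python
# Sketch (assumptions): reindex closing prices to a full daily calendar so that
# weekend/holiday gaps appear as NaN, then fill them with cubic interpolation
# (SciPy-backed). The calendar choice is illustrative.
import pandas as pd

def fill_missing_prices(close: pd.Series) -> pd.Series:
    full_index = pd.date_range(close.index.min(), close.index.max(), freq="D")
    reindexed = close.reindex(full_index)                                  # missing days become NaN
    filled = pd.Series(reindexed.to_numpy()).interpolate(method="cubic")   # cubic spline-style fill
    return pd.Series(filled.to_numpy(), index=full_index)
```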
Figure 9: Additive decomposition of Apple Inc. winsorized return data.
2.8. Forecasting Performance Evaluation. The study evaluated forecast efficiency using two key metrics: mean absolute error (MAE) and root mean squared error (RMSE). MAE, which is less sensitive to outliers, measures average error magnitude, while RMSE, which is more sensitive to outliers, provides deeper insight into prediction performance. By utilizing both metrics, the study comprehensively assesses the model's predictive capabilities while accounting for the properties of the data.
\[
\mathrm{MAE} = \frac{1}{N} \sum_{k=1}^{N} \left| Y_k - \hat{Y}_k \right| \tag{16}
\]
\[
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left( Y_k - \hat{Y}_k \right)^2} \tag{17}
\]
where $N$ is the number of observations, $Y_k$ are the actual values, and $\hat{Y}_k$ are the predicted values of the model at time step $k$.
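For concreteness, a minimal NumPy rendering of Equations (16) and (17) might look as follows.

```python
# Sketch: Equations (16) and (17) as plain NumPy helpers for scoring the forecasts.
import numpy as np

def mae(actual: np.ndarray, predicted: np.ndarray) -> float:
    return float(np.mean(np.abs(actual - predicted)))           # Equation (16)

def rmse(actual: np.ndarray, predicted: np.ndarray) -> float:
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))   # Equation (17)
```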
Parameter estimates of the fitted SARIMA model:

              Coef.     Std. err.   z         P>|z|    [0.025     0.975]
  Intercept   0.0011    0.001       1.660     0.097    0.001      0.002
  ma.L1       0.1998    0.017       12.014    0.001    0.167      0.232
  ar.S.L2     -0.2096   0.018       -11.902   0.001    -0.244     -0.1375
  sigma2      0.0003    0.00659     46.658    0.001    0.000      0.000

Table 6: ARCH effect test for residuals of the SARIMA model.

                               Lag    Statistics    p value
  For residual series
    ARCH-LM                    1      99.0427       0.001
    ARCH-LM                    21     153.3352      0.001
    Ljung-Box                  1      0.008505      0.926522
    Ljung-Box                  21     8.949068      0.111113
  For squared residual series
    Ljung-Box                  1      100.864047    265.57485
    Ljung-Box                  21     0.0096        0.00249
(Figure: empirical density of the standardized residuals compared with fitted Normal, Logistic, t(10), and Exponential distributions.)
3. Data

This study used secondary data, namely, daily Apple Inc. closing price data from 23 September 2018 to 19 September 2023, extracted from the Yahoo Finance website. There were a total of 1256 daily observations. The database had 565 missing data points; thus, the author filled in values using cubic spline interpolation, as stated in Equation (14). The dataset was split into three portions: the training set, which accounted for 60% of the data; the validation set, which constituted 20% of the data and was used to assess the performance of the trained model; and the remaining 20%, which was used for testing to evaluate the final performance of the model. Investors tend to focus more on price returns, which reflect price variation, rather than the price itself [36, 37], because return data is typically stationary, making it suitable for use in time series models. Thus, we chose to conduct our experiments using this variable. Additionally, the study scales the price returns to percentages to depict daily returns, as shown in Equation (18).
\[
r_t = \log\!\left(\frac{P_t}{P_{t-1}}\right) \times 100 \tag{18}
\]
where $P_t$ is the daily closing price of Apple Inc. on day $t$, $P_{t-1}$ is the daily closing price of Apple Inc. on the previous day, and $r_t$ represents the daily return of the Apple Inc. price index at time $t$. Since the returns exhibited a leptokurtic property, meaning excessive kurtosis compared to a Gaussian distribution (kurtosis = 3), the sensitivity to extreme fluctuations around the mean is pronounced. To identify outliers within the return series, the z-score test technique was employed. The z-score test is a statistical method used to evaluate the position of a data point in relation to the mean of a dataset, determining whether it falls within a specified range of values [38]. Upon identifying outliers within the dataset, winsorization was utilized as a remedial strategy, applying percentile thresholds for adjustment. Specifically, the study selected the 1st and 99th percentiles, substituting exceedingly low values with those corresponding to the 1st percentile and exceedingly high values with those associated with the 99th percentile.

4. Result and Discussion

The Apple Inc. price series in Figure 6 illustrates long-term increasing and decreasing trends, suggesting a nonstationary series. Since the price series exhibits nonstationarity, the study applies a logarithmic transformation and takes the first difference at lag 1 to obtain the return series. Given the high kurtosis of the original return data, as shown in Table 2, it is desirable to winsorize the data to make it less susceptible to outliers. Figure 7 displays the distribution of the returns and winsorized returns through box plots, showcasing their respective central tendencies and dispersion. Due to the detection of outliers in the return data, the study opted to winsorize the dataset as shown in Figure 7. Table 2 shows the summary statistics of the returns and winsorized returns. The Jarque-Bera test rejects the normality assumption for both series. With negative kurtosis, a lighter tail, and fewer outliers in the winsorized returns, the study utilized the winsorized return data summarized in Table 2 for further analysis.

Figure 8 displays the winsorized return series, which fluctuates above and below the zero line, indicating that the series achieved stationarity in the mean. However, the series is not stationary in variance.

The study also tested for stationarity using the augmented Dickey-Fuller (ADF) test. The ADF test result displayed in Table 3 rejects the null hypothesis of a unit root's existence, supported by the ADF test value of -10.2166 and a p value of 0.02576. It is safe to say that the winsorized daily returns are stationary.

The visual representation of the decomposition of the return time series is shown in Figure 9. This decomposition separates the series into three distinct components: trend, seasonal, and residual.
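A minimal sketch of this preprocessing chain (log returns, z-score outlier screening, 1st/99th-percentile winsorization, and the ADF test) might look as follows; the z-score threshold of 3 is an illustrative choice.

```python
# Sketch (illustrative threshold): Equation (18) returns, z-score outlier check,
# percentile winsorization, and an augmented Dickey-Fuller stationarity test.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def preprocess_returns(prices: pd.Series) -> pd.Series:
    returns = (100 * np.log(prices / prices.shift(1))).dropna()      # Equation (18)
    z_scores = (returns - returns.mean()) / returns.std()            # z-score outlier test
    print("outliers flagged:", int((z_scores.abs() > 3).sum()))      # threshold of 3 is illustrative
    lower, upper = returns.quantile(0.01), returns.quantile(0.99)    # 1st and 99th percentiles
    winsorized = returns.clip(lower=lower, upper=upper)              # winsorization
    adf_stat, p_value = adfuller(winsorized)[:2]                     # ADF test on the cleaned series
    print(f"ADF statistic: {adf_stat:.4f}, p value: {p_value:.5f}")
    return winsorized
```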
Table 7: Anderson-Darling test results.
Table 9: AIC and BIC of SARIMA-TGARCH models.
Figure 12: Volatility forecasts for hybrid econometrics with CNN models. (a) Volatility forecast of hybrid SARIMA-SGARCH-CNN, (b)
volatility forecast of hybrid SARIMA-TGARCH-CNN, and (c) volatility forecast of hybrid SARIMA-EGARCH-CNN.
Figure 13: Volatility forecasts for a hybrid of econometrics with CNN-BiLSTM models. (a) Volatility forecast of hybrid SARIMA-
SGARCH-CNN-BiLSTM, (b) volatility forecast of hybrid SARIMA-TGARCH-CNN-BiLSTM, and (c) volatility forecast of hybrid
SARIMA-EGARCH-CNN-BiLSTM.
5. Conclusion

Based on the findings presented in the study, the following conclusions were drawn. The study used the residuals of the SARIMA model as input for three hybrid econometric models and selected the best one. The results showed that the hybrid SARIMA-SGARCH model outperformed the other two hybrid models. Additionally, the study incorporated the estimated volatilities of the econometric models as input features to the CNN to enhance prediction accuracy. The results indicated that the hybrid SARIMA-SGARCH-CNN model outperformed the hybrid SARIMA-EGARCH-CNN and hybrid SARIMA-TGARCH-CNN models. Finally, the study constructed three new models and used the estimated volatility of the hybrid econometric models as input to the CNN-BiLSTM model, concluding that the hybrid SARIMA-SGARCH-CNN-BiLSTM model performs well. This proposed model demonstrated its effectiveness, particularly with Apple Inc. data, providing valuable insights for financial data analysis and risk management strategies, thus aiding investors in making informed decisions.

6. Recommendation

The study was limited to investigating the volatility prediction of Apple Inc. using the three new hybrids of econometrics with CNN-BiLSTM models. As a result, the following aspects for further investigation are suggested. Future studies should explore incorporating attention mechanisms or transformer architectures to capture complex temporal dependencies and improve forecasting accuracy. Additionally, evaluating the impact of incorporating additional data sources beyond historical stock prices, such as sentiment analysis and financial news, could augment the forecast model. Furthermore, extending the research to include volatility forecasting for a portfolio of assets and exploring correlations between the assets is recommended.

Data Availability Statement

The data of this study can be obtained by contacting the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Funding

This work is supported by the Pan African University Institute for Basic Sciences, Technology and Innovation.

Acknowledgments

This work is supported by the Pan African University Institute for Basic Sciences, Technology and Innovation.

References

[1] E. F. Fama, "The behavior of stock-market prices," The Journal of Business, vol. 38, no. 1, pp. 34-105, 1965.
[2] T. H. Roh, "Forecasting the volatility of stock price index," Expert Systems with Applications, vol. 33, no. 4, pp. 916-922, 2007.
[3] D. Bhowmik, "Stock market volatility: an evaluation," International Journal of Scientific and Research Publications, vol. 3, no. 10, pp. 1-17, 2013.
[4] R. F. Engle, "Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation," Econometrica: Journal of the Econometric Society, vol. 50, no. 4, pp. 987-1007, 1982.
[5] T. Bollerslev, "Generalized autoregressive conditional heteroskedasticity," Journal of Econometrics, vol. 31, no. 3, pp. 307-327, 1986.
[6] R. Khaldi, A. El Afia, and R. Chiheb, "Forecasting of BTC volatility: comparative study between parametric and nonparametric models," Progress in Artificial Intelligence, vol. 8, no. 4, pp. 511-523, 2019.
[7] D. B. Nelson, "Conditional heteroskedasticity in asset returns: a new approach," Econometrica: Journal of the Econometric Society, vol. 59, no. 2, pp. 347-370, 1991.
[8] J.-M. Zakoian, "Threshold heteroskedastic models," Journal of Economic Dynamics and Control, vol. 18, no. 5, pp. 931-955, 1994.
[9] B. Y. Almansour, M. M. Alshater, and A. Y. Almansour, "Performance of ARCH and GARCH models in forecasting cryptocurrency market volatility," Industrial Engineering & Management Systems, vol. 20, no. 2, pp. 130-139, 2021.
[10] P. H. Franses and D. Van Dijk, "Forecasting stock market volatility using (non-linear) GARCH models," Journal of Forecasting, vol. 15, no. 3, pp. 229-235, 1996.
[11] J. Sen, S. Mehtab, and A. Dutta, "Volatility modeling of stocks from selected sectors of the Indian economy using GARCH," in 2021 Asian Conference on Innovation in Technology (ASIANCON), Pune, India, 2021.
[12] J. Aduda, P. Weke, P. Ngare, and J. Mwaniki, "Financial time series modelling of trends and patterns in the energy markets," Journal of Mathematical Finance, vol. 6, no. 2, pp. 324-337, 2016.
[13] Y. Liu, "Novel volatility forecasting using deep learning-long short term memory recurrent neural networks," Expert Systems with Applications, vol. 132, pp. 99-109, 2019.
[14] H. Y. Kim and C. H. Won, "Forecasting the volatility of stock price index: a hybrid model integrating LSTM with multiple GARCH-type models," Expert Systems with Applications, vol. 103, pp. 25-37, 2018.
[15] K. Kakade, A. K. Mishra, K. Ghate, and S. Gupta, "Forecasting commodity market returns volatility: a hybrid ensemble learning GARCH-LSTM based approach," Intelligent Systems in Accounting, Finance and Management, vol. 29, no. 2, pp. 103-117, 2022.
[16] A. Vidal and W. Kristjanpoller, "Gold volatility prediction using a CNN-LSTM approach," Expert Systems with Applications, vol. 157, article 113481, 2020.
[17] H. Zeng, B. Shao, G. Bian, H. Dai, and F. Zhou, "A hybrid deep learning approach by integrating extreme gradient boosting-long short-term memory with generalized autoregressive conditional heteroscedasticity family models for natural gas load volatility prediction," Energy Science & Engineering, vol. 10, no. 7, pp. 1998-2021, 2022.
[18] D. K. Mademlis and N. Dritsakis, "Volatility forecasting using hybrid GARCH neural network models: the case of the Italian stock market," International Journal of Economics and Financial Issues, vol. 11, no. 1, pp. 49-60, 2021.
[19] F. H. Mustapa and M. T. Ismail, "Modelling and forecasting S&P 500 stock prices using hybrid Arima-Garch model," Journal of Physics: Conference Series, vol. 1366, no. 1, article 012130, 2019.
[20] L. R. Glosten, R. Jagannathan, and D. E. Runkle, "On the relation between the expected value and the volatility of the nominal excess return on stocks," The Journal of Finance, vol. 48, no. 5, pp. 1779-1801, 1993.
[21] R. Rabemananjara and J.-M. Zakoian, "Threshold ARCH models and asymmetries in volatility," Journal of Applied Econometrics, vol. 8, no. 1, pp. 31-49, 1993.
[22] S.-H. Poon, A Practical Guide to Forecasting Financial Market Volatility, John Wiley & Sons, 2005.
[23] H. Akaike, "A new look at the statistical model identification," IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716-723, 1974.
[24] F. Berzal, Redes Neuronales & Deep Learning: Volumen II, Independently published, 2019.
[25] M. Vakalopoulou, S. Christodoulidis, N. Burgos, O. Colliot, and V. Lepetit, "Deep learning: basics and convolutional neural networks (CNNs)," in Machine Learning for Brain Disorders, O. Colliot, Ed., vol. 197 of Neuromethods, Humana, New York, NY, 2023.
[26] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, MIT Press, 2016.
[27] J. Cao and J. Wang, "Stock price forecasting model based on modified convolution neural network and financial time series analysis," International Journal of Communication Systems, vol. 32, no. 12, article e3987, 2019.
[28] L. Muhammad, A. A. Haruna, U. S. Sharif, and M. B. Mohammed, "CNN-LSTM deep learning based forecasting model for Covid-19 infection cases in Nigeria, South Africa and Botswana," Health and Technology, vol. 12, no. 6, pp. 1259-1276, 2022.
[29] J. Rala Cordeiro, A. Raimundo, O. Postolache, and P. Sebastião, "Neural architecture search for 1D CNNs—different approaches tests and measurements," Sensors, vol. 21, no. 23, p. 7990, 2021.
[30] S. Y. Yerima, M. K. Alzaylaee, and A. P. V. Shajan, "Deep learning techniques for android botnet detection," Electronics, vol. 10, no. 4, p. 519, 2021.
[31] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[32] M. Noor, A. Yahaya, N. A. Ramli, and A. M. Al Bakri, "Filling missing data using interpolation methods: study on the effect of fitting distribution," Key Engineering Materials, vol. 594, pp. 889-895, 2014.
[33] K. Erdogan, "Spline interpolation techniques," Journal of Technical Science and Technologies, vol. 2, no. 1, pp. 47-52, 2013.
[34] G. Memarzadeh and F. Keynia, "A new short-term wind speed forecasting method based on fine-tuned LSTM neural network and optimal input sets," Energy Conversion and Management, vol. 213, article 112824, 2020.
[35] F. Aksan, Y. Li, V. Suresh, and P. Janik, "CNN-LSTM vs. LSTM-CNN to predict power flow direction: a case study of the high-voltage subnet of Northeast Germany," Sensors, vol. 23, no. 2, p. 901, 2023.
[36] E. Hajizadeh, A. Seifi, M. F. Zarandi, and I. Turksen, "A hybrid modeling approach for forecasting the volatility of S&P 500 index return," Expert Systems with Applications, vol. 39, no. 1, pp. 431-436, 2012.
[37] M. Seo and G. Kim, "Hybrid forecasting models based on the neural networks for the volatility of bitcoin," Applied Sciences, vol. 10, no. 14, p. 4768, 2020.
[38] D. S. Moore and G. P. McCabe, Introduction to the Practice of Statistics, WH Freeman/Times Books/Henry Holt & Co., 1989.
[39] M. Stone, "An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion," Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, no. 1, pp. 44-47, 1977.
[40] R. Budiarti, K. Intansari, I. G. P. Purnaba, and F. Septyanto, "Modelling dependencies of stock indices during Covid-19 pandemic by extreme-value copula," Jurnal Teori dan Aplikasi Matematika, vol. 7, no. 3, pp. 805-819, 2023.
[41] C. Alexander, Market Risk Analysis, John Wiley & Sons, Boxset, 2009.