Forecasting Financial Time Series Volatility Using Particle Swarm Optimization Trained Quantile Regression Neural Network

Article history: Received 3 May 2016; Received in revised form 3 March 2017; Accepted 6 April 2017; Available online 24 April 2017

Keywords: Financial time series volatility forecasting; GARCH; Quantile regression; QRNN; PSO

Abstract

Accurate forecasting of volatility from financial time series is paramount in financial decision making. This paper presents a novel Particle Swarm Optimization (PSO)-trained Quantile Regression Neural Network, namely PSOQRNN, to forecast volatility from financial time series. We compared the effectiveness of PSOQRNN with that of the traditional volatility forecasting models, i.e., Generalized Autoregressive Conditional Heteroskedasticity (GARCH), three Artificial Neural Networks (ANNs) including Multi-Layer Perceptron (MLP), General Regression Neural Network (GRNN) and Group Method of Data Handling (GMDH), Random Forest (RF), and two Quantile Regression (QR)-based hybrids, namely Quantile Regression Neural Network (QRNN) and Quantile Regression Random Forest (QRRF). The results indicate that the proposed PSOQRNN outperformed these models in terms of Mean Squared Error (MSE) on a majority of the eight financial time series considered here, which include the exchange rates of USD versus JPY, GBP, EUR and INR, the Gold Price, the Crude Oil Price, the Standard and Poor's 500 (S&P 500) Stock Index and the NSE India Stock Index. This was corroborated by the Diebold–Mariano test of statistical significance. PSOQRNN also performed well in terms of other important measures such as the Directional Change Statistic (Dstat) and Theil's Inequality Coefficient. The superior performance of PSOQRNN can be attributed to the role played by PSO in obtaining better solutions. Therefore, we conclude that the proposed PSOQRNN can be used as a viable alternative for forecasting volatility.

© 2017 Elsevier B.V. All rights reserved.
based on the volatility of the underlying asset, (3) the financial risk managers for reserving capital of at least three times that of Value-at-Risk (VaR) based on the given volatility forecast and (4) the policy makers as a barometer to measure the vulnerability of financial markets and the economy [1].

In finance, volatility is the spread of all likely outcomes of an uncertain variable (e.g., the spread of asset returns). Statistically, volatility [1] often refers to the standard deviation, σ, of returns:

σ̂ = √( (1 / (N − 1)) Σ_{t=1}^{N} (r_t − r̄)^2 )    (1)

where r_t is the return of the financial variable on day t and r̄ is the average return over the N-day period. The standard deviation is the unconditional volatility over the N-day period. In this paper, we use σ_t to refer to the volatility at time t and σ̂_t to refer to its forecast at time t. The returns series is a difference series. As the volatility does not remain constant through time, the conditional volatility that utilizes the volatility reference period is more relevant information for various financial applications such as risk management, asset pricing, investment analysis and option pricing [2].

∗ Corresponding author. E-mail addresses: [email protected] (D. Pradeepkumar), [email protected] (V. Ravi).

Conventional volatility forecasting models include GARCH, MLP, GRNN, GMDH and, to a lesser extent, RF. Their respective advantages
http://dx.doi.org/10.1016/j.asoc.2017.04.014
1568-4946/© 2017 Elsevier B.V. All rights reserved.
36 D. Pradeepkumar, V. Ravi / Applied Soft Computing 58 (2017) 35–52
and disadvantages are as follows. GARCH [3], while modeling conditional heteroskedasticity, allows reducing the number of estimated parameters to just a few. Further, various neural networks such as MLP [4], GRNN [5], and GMDH [6] are employed in volatility forecasting as they are data-driven, nonlinear and adaptive. Compared to linear forecasting models, neural networks can capture complex nonlinear relationships. They generalize well and are good universal approximators [7]. On the other hand, Random Forest [8] is a robust method for both forecasting and classification problems because of its design, which combines an ensemble of several decision trees with random subspace modeling of the feature space. It grows as an ensemble of trees, and each tree, in turn, can handle different types of predictors and missing data too.

Despite their advantages, GARCH variants cannot model asymmetries of the volatility with respect to the sign of past shocks. In MLP, convergence is slow, local minima can affect the training process, and it is hard to scale. The GRNN is often more accurate than MLP but relatively insensitive to outliers and requires more memory space to store the model. The GMDH generates a complex polynomial for a simple system. Because of its limited architecture, it does not consider the input–output relationship well. It also produces complex networks as it tries to stretch for the last bit of accuracy. Conclusively, we can say that the GMDH is very ineffective in modeling nonlinear systems which exhibit different characteristics in different environments [9]. In RF, a large number of trees results in slow convergence, and extreme values are often not accurately predicted, as it underestimates high values and overestimates low values. All of these disadvantages point to the need for better volatility forecasting approaches.

To bridge the gap, one can attempt the two existing QR-based hybrids – QRRF and QRNN – which nobody has yet applied to solve this problem. Quantile regression (QR) is a regression mechanism that models the relationship between the dependent variable and independent variables at different quantiles. The QRRF [10,11] is a QR-based hybrid and a variant of RF. It can yield predictions of volatility at different quantiles without assuming the form of the underlying distribution of the data. On the other hand, QRNN [12] is also a QR-based hybrid and a feedforward neural network that can be used to estimate nonlinear models at different quantiles. It successfully avoids the need for a distributional assumption by applying quantile regression to the historical returns to produce various quantile models. However, the Back Propagation algorithm that trains the QRNN suffers from the problem of entrapment in local minima; therefore, QRNN cannot yield accurate predictions. In this paper, we solved this problem by proposing a QRNN trained by PSO [13], named PSOQRNN.

The major contributions of this paper are:

1. We propose a novel ANN architecture, namely PSOQRNN, and apply it to forecasting volatility from financial time series. The PSOQRNN is a variant of QRNN with the only difference that PSO trains the QRNN, thereby yielding optimal weights and biases. We also compared the performance of the PSOQRNN with that of other volatility forecasting models, such as GARCH, MLP, GRNN, GMDH, RF, QRRF and QRNN, on eight financial time series.
2. In the literature, to the best of our knowledge, there is no work reported on financial time series volatility forecasting models involving RF, QRRF, and QRNN. We also present the application of these volatility forecasting models to eight financial time series. This is the auxiliary contribution of the current study.

The remainder of this paper is structured as follows. Section 2 reviews various volatility forecasting models of financial time series found in the literature. Section 3 presents an overview of the techniques used. Section 4 describes the proposed PSOQRNN and its application in forecasting volatility in detail. Section 5 presents the experimental design. The results of the proposed model and the other volatility forecasting models are presented and discussed in Section 6. Finally, Section 7 concludes the paper.

2. Literature review

There are many types of models that have been used for forecasting volatility from financial time series. Reviews of volatility forecasting models are presented by Franses and McAleer [14], Poon and Granger [2], Andersen et al. [15] and Knight and Satchell [16], respectively. In this section, we briefly review the applications of the various forecasting models used for volatility forecasting, including GARCH and its variants, MLP, GRNN, GMDH, hybrid models and Quantile Regression.

2.1. Forecasting financial time series volatility using Autoregressive Conditional Heteroskedasticity (ARCH) variants

The most widely used are the ARCH models proposed by Engle [17] and then generalized by Bollerslev [3]. These models have led researchers to model and forecast volatilities from financial time series. Bollerslev et al. [18] presented a review of ARCH models in finance. Later, researchers adopted GARCH and its variants to forecast volatility from different financial time series and obtained better volatility forecasts. Noh et al. [19] assessed the performance of two volatility prediction models – the Implied Volatility Regression (IVR) model and GARCH – and concluded that GARCH outperforms IVR by returning a greater profit after experimenting with the S&P 500 Stock Index.

Vilasuso [20] adopted the Fractionally Integrated GARCH (FIGARCH) model, introduced by Baillie et al. [21], to forecast FOREX rate volatility and concluded that the FIGARCH model could capture features of exchange rate volatility better than the GARCH and Integrated GARCH (IGARCH) models. The FIGARCH model also generated better out-of-sample volatility forecasts after working with the exchange rates of Canada, France, Germany, Italy, Japan, and the UK. Chang et al. [22] modeled the volatility of the RM/Sterling exchange rate using the Stationary GARCH-in-Mean (GARCH-M) model and concluded that the volatility of RM/Sterling is constant and that the Stationary GARCH-M outperformed other GARCH models in out-of-sample and one-step-ahead forecasting. Agnolucci [23] compared GARCH-type models with an implied volatility model for forecasting volatility in the Crude Oil price and concluded that GARCH-type models could better predict the volatility.

2.2. Forecasting financial time series volatility using Artificial Neural Networks (ANNs)

Stand-alone ANNs such as MLP and GRNN have been applied to forecast volatility, and these outperformed GARCH and its variants. Donaldson and Kamstra [24] applied MLP to forecast stock return volatility in London, New York, Tokyo and Toronto and concluded that MLP can capture volatility effects overlooked by the GARCH, Exponential GARCH (EGARCH) and Glosten–Jagannathan–Runkle (GJR) models and produced better out-of-sample volatility forecasts. Similarly, Miranda [25] investigated the use of MLP for forecasting the volatility implied in the transaction prices of Ibex35 index options and concluded that the MLP yielded better forecasting results. The author also tested and rejected the hypothesis that volatility changes are unpredictable on an hourly basis. Hamid and Iqbal [26] reported that MLP yielded better volatility forecasts than implied volatility forecasts on S&P 500 futures stock indices and that these are not different from realized volatility.

Aragonés et al. [27] investigated whether ANNs could improve the traditional volatility forecasts from both the time series models
and the implied volatilities obtained from the Spanish stock market index, the IBEX-35, and concluded that GRNN yielded better predictions than implied volatility, GARCH, Threshold ARCH (TARCH) and MLP. Mohsen et al. [28] reported that GRNN outperformed GARCH(1,1) in terms of Root Mean Squared Error (RMSE) in forecasting the volatility of the prices of the two Crude Oil markets of Brent and WTI.

2.3. Forecasting financial time series volatility using hybrid models

During the last few years, various hybrid volatility forecasting models have been proposed. These hybrid models outperformed statistical volatility forecasting models and various ANNs. Zhuang and Chan [29] developed the Hidden Markov Model-GARCH (HMM-GARCH) model, which allows both different volatility states in the time series and state-specific GARCH models within each state. The authors concluded that the proposed model overcame excessive persistence problems and outperformed GARCH in both in-sample and out-of-sample evaluation. Roh [30] proposed hybrid volatility forecasting models combining a neural network with Exponentially Weighted Moving Averages (NN-EWMA), GARCH (NN-GARCH) and EGARCH (NN-EGARCH) to forecast a stock price index and concluded that NN-EGARCH outperformed the other proposed hybrid models. Tseng et al. [31] proposed an ANN-EGARCH model of the volatility of Taiwan stock index option prices. The ANN-EGARCH model could capture the asymmetric volatility. It also simultaneously decreased the stochasticity and nonlinearity of the error term sequence. The proposed model outperformed EGARCH and Grey-EGARCH in terms of Mean Absolute Error (MAE), RMSE and Mean Absolute Percentage Error (MAPE).

Tseng et al. [32] proposed a new hybrid model for stock volatility prediction using the Grey-GARCH model and concluded that the proposed model enhanced the one-period-ahead volatility forecasts of the GARCH model in terms of MAE, RMSE and MAPE. However, the model failed to outperform the GARCH(1,1) model in certain cases. Wang [33] proposed the Grey-GJR-GARCH volatility model for forecasting stock index option prices. It yielded better predictions than other volatility forecasting approaches, including both GARCH and GJR-GARCH, in terms of MAE, RMSE and MAPE. Bildirici and Ersin [34] proposed the ANN-APGARCH model, which increased the performance of the APGARCH model when applied to the daily returns of the Istanbul Stock Exchange. The ANN-extended versions of GARCH models improved forecast results over various variants of GARCH in terms of RMSE.

Hung [35] proposed a Fuzzy-GARCH model for forecasting the volatility of the stock market. The Genetic Algorithm (GA), in the proposed model, is used to achieve a global optimal solution with a fast convergence rate. The author concluded that the proposed hybrid model outperformed the stand-alone GARCH model. Chang et al. [22] introduced a hybrid Adaptive Neuro-Fuzzy Inference System (ANFIS) model based on the AR model and volatility for Taiwan Futures Exchange (TAIEX) forecasting. The proposed model outperformed the AR model and two other works in the literature in terms of RMSE. Hajizadeh et al. [36] proposed a hybrid modeling approach for forecasting the volatility of S&P 500 index returns. In this approach, three variants of GARCH were calibrated, and EGARCH(3,3) turned out to be the best model. Monfared and Enke [37] proposed a hybrid GJR-GARCH Neural Network model for volatility forecasting and concluded that the proposed hybrid model outperformed the GJR-GARCH model after experimenting with 10 NASDAQ indices.

Komijani et al. [38] introduced a hybrid approach, namely Autoregressive Fractionally Integrated Moving Averages-FIGARCH (ARFIMA-FIGARCH), for forecasting Crude Oil price volatility. The proposed model is based on the long memory property and uses wavelet decomposed data. They concluded that applying hybrid methods led to more accurate forecasts. Kristjanpoller et al. [42] implemented hybrid neural network models for volatility prediction by applying them in three different Latin-American emerging stock markets. The hybrid outperformed GARCH and ANN, in their stand-alone mode, in terms of MAPE, whereas, with respect to both Mean Squared Error (MSE) and Mean Absolute Deviation (MAD), the differences between the performances of GARCH and ANN are not significant in most of the cases. Choudhury et al. [39] proposed a novel Self-Organizing Map (SOM)-based hybrid clustering technique integrated with Support Vector Regression for portfolio selection and accurate price and volatility predictions. The research considered the top 102 stocks of the National Stock Exchange (NSE) stock market, India, to identify a set of best portfolios that an investor can maintain for risk reduction and high profitability. The authors concluded that the work could find various applications in software development, acting as an investing guide for a target trader in a volatile market.

Rosa et al. [40] suggested an evolving hybrid neural fuzzy network (eHFN) approach for realized volatility forecasting using the S&P 500 and Nasdaq (United States), FTSE (United Kingdom), DAX (Germany), IBEX (Spain) and Ibovespa (Brazil) indices and concluded that the proposed hybrid outperformed GARCH and related stochastic volatility models. Babu and Reddy [41] predicted the NSE Indian stock index using a partitioning-interpolation based Autoregressive Integrated Moving Averages-GARCH (ARIMA-GARCH) model. The proposed model improved prediction accuracy compared to ARIMA, GARCH and ANN. It also preserved the data trend over the prediction horizon better than these models. Kristjanpoller and Minutolo [42] predicted Gold Price volatility using a hybrid ANN-GARCH model that outperformed GARCH. The authors also demonstrated an innovative method to determine which financial variables are the most important in affecting the volatility of spot gold prices and futures prices. Dash et al. [43] proposed a new hybrid model, namely Interval Type-2 Fuzzy-Computationally Efficient-EGARCH (IT2F-CE-EGARCH), integrating an Interval Type-2 Fuzzy Logic System (IT2FLS) with a Computationally Efficient Functional Link Artificial Neural Network (CEFLANN) and the EGARCH model for accurate forecasting and modeling of stock market (BSE Sensex and CNX Nifty) volatility. The Differential Harmony Search (DHS) algorithm is used to optimize the parameters of the entire fuzzy time series model.

2.4. Forecasting financial time series volatility using Quantile Regression (QR)

There are, however, very few works reported on the application of Quantile Regression (QR) to forecasting financial time series volatility. Taylor [12] proposed a QR-based approach to estimate the conditional probability distribution of multi-period financial returns. After experimenting with exchange rates, the author concluded that the QR-based approach is better suited to estimating the conditional density than GARCH-based quantile estimates. Huang et al. [44] adopted QR to forecast volatility from exchange rates. After experimenting with nine exchange rates using 19 years of data, the authors concluded that QR produced more reliable volatility forecasts. Except for these works, to the best of our knowledge, there is no other published work related to volatility forecasting using quantile regression and related hybrids. In this context, our work bridges the gap by adopting two quantile regression based hybrids, QRRF and QRNN, to forecast volatility. In addition, there are no works found that applied RF in the context of volatility forecasting.
3. Overview of the techniques used

In this paper, we employed various volatility forecasting models, including GARCH, MLP, GRNN, GMDH, RF, QRRF and QRNN. This section presents an overview of the techniques employed.

3.1. Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model

The GARCH model [3] is a generalized version of the ARCH model [17]. The term AR comes from the fact that this model is an autoregressive model in squared returns. The term conditional comes from the fact that, in this model, the next period's volatility is predicted based on information of the current period. Heteroskedasticity means non-constant variance/volatility. The model is based on the assumption that the volatility forecast, changing in time, depends on lagged values of the standard deviation of asset returns. It differs from ARCH in the form of σ_t. In the GARCH model, the forecast σ_t can be obtained using a constant depicting constant variance throughout trading days, a sum of p weighted products of last periods' forecasts (GARCH terms) and a sum of q weighted products of last periods' squared residual terms (ARCH terms). The GARCH(p, q) model can be defined as in Eq. (2):

r_t = μ + ε_t
ε_t = σ_t z_t    (2)
σ_t^2 = K + Σ_{i=1}^{p} A_i σ_{t−i}^2 + Σ_{j=1}^{q} B_j ε_{t−j}^2

where r_t is the return at time t, μ is the mean return and ε_t is the return residual term (also known as the innovation), which is a product of a stochastic piece z_t and a time-dependent volatility; the random variable z_t is a standardized, independent and identically distributed (i.i.d.) draw from a normal distribution with zero mean and unit variance. The parameters of the model are K, A_1, A_2, ..., A_p, B_1, B_2, ..., B_q; K > 0 is a constant, A_i ≥ 0 and B_j ≥ 0 are the coefficients multiplying the GARCH and ARCH terms respectively, p is the degree of the GARCH terms σ_t and q is the degree of the ARCH terms ε_t, which are, in turn, obtained using the Akaike Information Criterion (AIC), as in the Autoregressive Integrated Moving Averages (ARIMA) model.

3.2. Multi-Layer Perceptron (MLP)

The Multi-Layer Perceptron [4] is one of the most commonly employed neural network architectures. It is the most widely used ANN for pattern classification and prediction and can yield accurate predictions for challenging problems. It is too popular to be described here in detail.

3.3. General Regression Neural Network (GRNN)

The GRNN [5] is a 4-layered neural network that has the unique features of learning in one pass and a simple training algorithm. In GRNN, each training sample is considered a kernel during the training process, and the estimation is based on non-parametric regression analysis. It discriminates against occasional outliers and erroneous observations. It can converge to any arbitrary nonlinear function of the data with only a few training samples, and the additional knowledge needed to get the best fit of the function is relatively small. These features make GRNN a very useful tool for performing predictions in real time.

3.4. Group Method of Data Handling (GMDH)

GMDH, proposed by Ivakhnenko [6], is a self-organized feed-forward network based on short-term polynomial transfer functions, called Ivakhnenko polynomials. The coefficients of these transfer functions are obtained using least squares regression [45]. It is the ANN best suited to dealing with inaccurate, noisy, or small data sets. It yields higher accuracy with a simpler structure than traditional ANN models [46]. It is the earliest proposed deep learning neural network architecture, in which nodes in the hidden layers are dropped if they are found not to have sufficient predictive power.

3.5. Random Forest (RF)

In order to understand Random Forest [8], we first need to appreciate how the Classification and Regression Tree (CART) [47] works. CART is a recursively partitioned binary decision tree, and it is one of the non-parametric statistical techniques used to solve classification and regression problems using the 'if-then' rules it generates in the process of training [47]. It can model nonlinearity very well, and it yields readily interpretable results. However, it is worth noting that CART becomes unstable even when there are small changes in the training data. A bootstrap aggregating technique, namely bagging, can be applied to overcome the problem of instability [48]. In bagging, a large number of bootstrap samples are taken from the dataset and a single tree is fit to each bootstrap sample. The predictions are obtained as the average of the predictions obtained from each of the fitted trees.

The Random Forest (RF) is a modified version of the bagged tree. In RF, a random subset of the predictor variables is used for each tree and at each node, which is not possible with bagging. It also includes effective methods of handling missing data.

3.6. Quantile Regression Random Forest (QRRF)

A quantile τ (0 < τ < 1) is a boundary of uniformly sized consecutive subsets. The median (50th percentile), the 70th percentile and the 90th percentile are examples of different quantiles. Both simple linear regression and multiple linear regression models describe the behavior of the predictand by assuming an average Gaussian distribution, whereas Quantile Regression [49] can model it at different quantiles. This departure results in various conclusions compared to examining only the average of the predictand. Quantile Regression Random Forest (QRRF), introduced by Meinshausen [10], is a generalization of RF. Each node in an RF stores only the mean of the observations and excludes other information. In contrast, each node in QRRF keeps track of the spread of the predictand. This allows the construction of prediction intervals that can cover highly probable new observations.

3.7. Particle Swarm Optimization (PSO)

PSO, developed by Kennedy and Eberhart [13], is very simple to implement and has very few parameters to tweak. It progresses towards the solution by the mutual sharing of the knowledge of every particle collectively. In PSO, a population of particles with velocities V_id^old is initially randomly generated. Each particle's velocity is updated with respect to its corresponding old position x_id^old using the neighborhood best p_id and the global best particle p_gd (see Eqs. (3) and (4)) until the convergence criterion is satisfied.
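The swarm dynamics described here can be sketched as a short, generic global-best PSO routine. This is a minimal illustration in Python with numpy; the language, the toy objective and the parameter values (inertia w, constants c1 and c2, swarm size, iteration count) are our own choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(42)

def pso_minimize(f, dim, n_particles=20, iters=200, w=0.7, c1=1.5, c2=1.5):
    """Minimal global-best PSO: minimize f over R^dim."""
    x = rng.uniform(-1.0, 1.0, size=(n_particles, dim))  # initial positions
    v = np.zeros((n_particles, dim))                     # initial velocities
    pbest = x.copy()                                     # personal bests p_id
    pbest_f = np.array([f(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()               # global best p_gd
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity and position updates in the spirit of Eqs. (3)-(4).
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f                          # keep better personal bests
        pbest[improved] = x[improved]
        pbest_f[improved] = fx[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest

# Toy objective: squared distance to 0.3 in every coordinate.
best = pso_minimize(lambda p: float(np.sum((p - 0.3) ** 2)), dim=3)
```

On this toy sphere objective the swarm settles near the optimum at 0.3; in PSOQRNN the objective would instead be the quantile regression error of the network.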
V_id^New = ω V_id^Old + C1 · rand · (p_id − X_id^Old) + C2 · rand · (p_gd − X_id^Old)    (3)

X_id^New = X_id^Old + V_id^New    (4)

where V_id^Old is the old velocity, V_id^New is the updated velocity, X_id^Old is the old particle position, X_id^New is the updated particle position, p_id is the local best particle, p_gd is the global best particle, C1 and C2 are two positive constants, ω is the inertia weight and, finally, rand is a random number between 0 and 1.

3.8. Quantile Regression Neural Network (QRNN)

For the purpose of forecasting, the most extensively used neural network is the single hidden-layer feedforward network [7]. QRNN [50], whose architecture is depicted in Fig. 1, is one such neural network. It consists of m input neurons for the predictors X_1, X_2, ..., X_m, which are connected to n hidden neurons in a single hidden layer, which, in turn, are connected to one output neuron that yields the predictand. The difference between a traditional feedforward ANN and QRNN lies in the process of training. If Eq. (5) is used as the cost function to train the feedforward ANN, then the outputs are estimates of the conditional regression quantiles [51] and the resultant model is QRNN [12]. QRNN is a flexible model that represents nonlinear predictor–predictand relationships, including ones involving interactions between predictors, without prior specification of the form of the relationships by the modeler [50].

E = (1/N) Σ_{t=1}^{N} ρ_τ(y_t − ŷ_t)    (5)

ρ_τ(u) = τu, if u ≥ 0; (τ − 1)u, otherwise    (6)

In Eq. (5), y_t is the actual value at time t, ŷ_t is the predicted value at time t and N is the total number of observations trained. ρ_τ(.) is the tilted absolute value function (also known as the check, tick or pinball function). It accepts an input u, the error value obtained from (y_t − ŷ_t), and returns the value given by Eq. (6).

The output at the hidden layer can be calculated using Eq. (7):

g_{j,t} = tanh( Σ_{i=1}^{m} X_{i,t} w_{ij}^h + b_j^h )    (7)

4. The proposed PSOQRNN

In the current work, we propose a novel ANN, namely the PSO-trained Quantile Regression Neural Network (PSOQRNN), and its application to predicting volatilities from financial time series. The architecture of PSOQRNN is the same as that of QRNN (see Fig. 1), but the weights and biases are estimated using PSO. Typically, the weights and biases in QRNN are determined using the Back Propagation (BP) algorithm at different quantiles. The BP algorithm has drawbacks such as large computational time, a slow convergence rate and entrapment in local minima [52]. The PSO overcomes these drawbacks. PSO is a population-based evolutionary technique which is derivative-free, has fewer parameters to tweak, has a fast convergence rate and provides a near-global optimal solution. Therefore, it has been used effectively by researchers as an alternative to the BP algorithm for training ANNs [53–56]. In the current work, it is likewise employed for training the QRNN.

The reasons for choosing PSO to train the QRNN are as follows. Compared with other evolutionary algorithms, PSO is very intuitive and flexible, less sensitive to the nature of the objective function and able to handle objective functions of a stochastic nature. Moreover, the heuristics involved in PSO are easy to comprehend and implement. Further, it has fewer user-defined parameters to tweak and does not require a good initial solution to start its iteration process [57,58].

In the proposed methodology of obtaining volatility predictions using PSOQRNN, (1) the return series is obtained from the original financial time series using the Return Series Generator, (2) the innovations series is obtained from the returns series using the Innovations Series Generator, (3) all volatilities are computed using the Volatility Generator, (4) both the volatilities and the innovations series are divided into a training set and a test set using the Partition Generator, (5) the training sets are input to PSOQRNN, (6) the QRNN is trained using the weights obtained from PSO, (7) the trained PSOQRNN yields accurate training set predictions, (8) the test sets are input to the trained PSOQRNN and (9) finally, the predictions of the test set volatilities are obtained. This procedure is depicted in full in Fig. 2.

Fig. 3 depicts the same process of obtaining predictions of volatilities using the various forecasting models. For forecasting the volatilities, GARCH, MLP, GRNN, GMDH and RF are trained. As far as QRRF and QRNN are concerned, the models are trained at different quantiles. Finally, PSOQRNN is trained with the weights obtained by PSO at different quantiles.
The step-by-step procedure for obtaining predictions of volatilities using the proposed PSOQRNN is described in detail as follows. Let Y = {y_1, y_2, ..., y_k, y_{k+1}, ..., y_N} be a set of N observations of a financial variable recorded at t = {1, 2, ..., k, k+1, ..., N} respectively. Then the volatilities of the financial variable can be forecast as follows:

1. Obtain the returns series r_t from Y using Eq. (9) (returns series generator):

   r_t = (y_t − y_{t−1}) / y_{t−1}    (9)

   where y_t is the actual observation of the time series at time t, with t = {1, 2, ..., k, k+1, ..., N}. Usually, the volatilities of a financial time series are obtained with the help of the returns series.

2. Obtain the innovations series ε_t from r_t using Eq. (10) (innovations series generator):

   ε_t = r_t − r̄    (10)

   where r_t is the return of the financial variable at time t and r̄ is the mean return over n periods. The innovations (residual terms) series, together with the returns series, helps the model in predicting volatilities.

3. Calculate all of the volatilities σ_t, with t = ω, ω+1, ..., k, k+1, ..., N, based on the window length ω, the 252 trading days per year and the standard deviation std(.), using Eq. (11) (volatilities generator):

   σ_t = std(r_i, r_{i+1}, ..., r_{i+ω−1}) · √252;  i = 1, 2, ..., N − ω    (11)

   On average, there are 252 trading days in a financial year, so a moving window of the past 252 days [42] is used to obtain the volatilities. In other words, each day's volatility is measured with the help of its previous 252 days' returns. In order to present the volatility in annualized terms, we simply need to multiply the standard deviation value by the square root of 252.
Fig. 3. Detailed flow of forecasting volatility from financial time series using various forecasting models.
4. Partition both the volatilities series σ_t and the innovations series ε_t into a training set with t = ω, ω+1, ..., k and a test set with t = k+1, k+2, ..., N respectively, using the partition generator.

5. Input the training set (σ_{t−1}, ε_t, σ_t; t = ω, ω+1, ..., k) and the quantile values Q* = {τ_1, τ_2, ..., τ_q} (0 < τ_i < 1; i = 1, 2, ..., q). Let nH represent the number of hidden nodes of PSOQRNN. Here, σ_{t−1} and ε_t are the inputs while σ_t is the output of PSOQRNN.

6. For each quantile value τ_i in Q*, repeat steps 7, 8, 9 and 10.

7. Obtain the weights using the PSO algorithm as follows:
   (a) Initialize the particles with random values within the specified range [0, 1]. Each particle is a vector whose length is determined by the total number of weights and biases of the PSOQRNN. For example, if nH = 3, then the length of a particle is 13 (no. of weights = 2 × 3 + 3 × 1 = 9, biases = 3 + 1 = 4).
   (b) Evaluate the fitness of each particle using the quantile regression error function as in Eqs. (5) and (6).
   (c) Update the individual and global best fitness values and positions.
   (d) Update the velocity and position of each particle using Eqs. (3) and (4) respectively.
   (e) Repeat steps (b), (c) and (d) until all iterations are finished. The coordinates of the global best particle are the weights and biases of the trained PSOQRNN.

For every fitness evaluation, the activation functions of all neurons have to be computed. PSO in PSOQRNN evaluates the error fitness function over the specified number of iterations with population size Z; the total computational complexity of PSOQRNN is therefore proportional to the product of this per-evaluation cost, the number of iterations and Z.

5. Experimental design

This section presents a description of the datasets used, the performance measures used and the execution environment used to show the effectiveness of the proposed PSOQRNN along with the other volatility forecasting models.

5.1. Datasets used

The datasets collected are of daily US Dollar (USD) exchange rates with respect to four currencies – Japanese Yen (JPY), Great Britain Pound (GBP), Euro (EUR) and Indian Rupee (INR) – the Gold Price in terms of USD, the Crude Oil Price in terms of USD, the S&P 500 Stock Index and the NSE India Stock Index. These eight financial datasets are used for testing the effectiveness of the proposed volatility forecasting models. The foreign exchange data are obtained from http://www.federalreserve.gov/releases/h10/hist/, the Gold Price data from http://www.quandl.com/LBMA/GOLD-Gold-Price-London-Fixing and the Crude Oil Price data from https://
and bias values to PSOQRNN for training the network. www.quandl.com/data/FRED/DCOILBRENTEU-Crude-Oil-Prices-
8. Train PSOQRNN with the weights obtained from PSO in order to Brent-Europe, S&P 500 Stock Index data is obtained from https://
yield predictions of training set volatilities ˆt where t = ω, ω + 1, www.quandl.com/data/YAHOO/INDEX GSPC-S-P-500-Index and
. . ., k. NSE India Stock Index is obtained from https://www.quandl.com/
9. Input test sets of volatilities and innovations t−1 ,t with data/GOOG/NSE BANKINDIA-Bank-of-India-BANKINDIA. Each of
t = k + 1, k + 2, . . ., N to the trained PSOQRNN. the datasets is divided into both Training set (80%) and Test set
10. Obtain the predictions of Test set volatilities k+1 ˆ , k+2 ˆ , . . ., ˆN . (20%) respectively (see Table 1).
Computational complexity is the count of function evaluations. 5.2. Performance measures used
In the proposed PSOQRNN, there are activation functions at each
node and error fitness function to be minimized. If there are Mean Squared Error (MSE), Directional Change Statistic (Dstat)
nodes in PSOQRNN, then it performs 2 multiplications needed and Theil’s Inequality Coefficient (U) are used to measure the
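For concreteness, the returns, innovations and volatilities generators of steps 1–3 (Eqs. (9)–(11)) can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation; it keeps the source's convention of dividing by yt in Eq. (9) and annualizing with √252.

```python
import numpy as np

def volatility_series(prices, window=252):
    """Sketch of the returns, innovations and volatilities generators
    (Eqs. (9)-(11)): simple returns, mean-deviation innovations, and a
    rolling standard deviation annualized with sqrt(252)."""
    prices = np.asarray(prices, dtype=float)
    # Eq. (9): r_t = (y_t - y_{t-1}) / y_t  (denominator y_t, as in the source)
    returns = (prices[1:] - prices[:-1]) / prices[1:]
    # Eq. (10): innovation = deviation of each return from the mean return
    innovations = returns - returns.mean()
    # Eq. (11): std of each `window`-day block of returns, times sqrt(252)
    vols = np.array([returns[i:i + window].std() * np.sqrt(252.0)
                     for i in range(len(returns) - window + 1)])
    return returns, innovations, vols
```

Following step 4, the first 80% of the resulting volatility series would then form the training set and the remaining 20% the test set.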
5.2. Performance measures used

Mean Squared Error (MSE), Directional Change Statistic (Dstat) and Theil's Inequality Coefficient (U) are used to measure the performance of the proposed model, and they are defined as in Eqs. (12)–(14):

   MSE = (1/N) Σt=1..N (σt − σ̂t)²    (12)

   Dstat = (1/N) Σt=1..N at × 100%    (13)

   where at = 1 if (σt+1 − σt)(σ̂t+1 − σ̂t) ≥ 0, and at = 0 otherwise.

   U = √[(1/N) Σt=1..N (σt − σ̂t)²] / { √[(1/N) Σt=1..N σt²] + √[(1/N) Σt=1..N σ̂t²] }    (14)

In Eqs. (12)–(14), N is the number of forecasts obtained, σt is the actual volatility at time t and σ̂t is the forecasted volatility at time t.

The MSE (see Eq. (12)) measures the average of the squares of the errors. MSE is useful when we are concerned about significant errors whose negative consequences are proportionately much bigger than those of equivalent smaller ones [59]. In forecasting volatility, it is important to measure not only the accuracy of the predictions but also the directional change of the time series. For this purpose, Yao and Tan [60] developed a measure (expressed as a percentage), namely Dstat, as in Eq. (13), and we use it in this work. It is well known that Theil's Inequality Coefficient [61,59] measures how close a forecasted time series is to the actual time series. Generally, the value of U lies between 0 and 1: U = 0 means that σt = σ̂t for all observations (a perfect fit), while U = 1 means that the performance is bad.

Table 1
Datasets used.

Dataset                  Total observations   Training set   Test set
USD–JPY                  6036                 4829           1207
USD–GBP                  6036                 4829           1207
USD–EUR                  3772                 3018           754
USD–INR                  6028                 4825           1203
Gold Price               7602                 6081           1521
Crude Oil Price          6857                 5486           1371
S&P 500 Stock Index      7581                 6065           1516
NSE India Stock Index    4232                 3386           846

5.3. Execution environment

The execution of the proposed work is carried out on the Windows® 7 Professional platform; however, it can also be carried out on other platforms. The system used has 8 GB RAM and a 500 GB HDD. Table 2 presents the various tools employed in the experiments with the datasets.

Table 2
Tools and techniques used.

Technique used                                                    Used for                Tool used
MLP/GRNN/GMDH (http://www.neuroshell.com/)                        Obtaining predictions   Neuroshell® 2.0
GARCH                                                             Obtaining predictions   MATLAB®
RF (http://cran.r-project.org/web/packages/randomForest/)         Obtaining predictions   R
QRRF (http://cran.r-project.org/web/packages/quantregForest/)     Obtaining predictions   R
QRNN (http://cran.us.r-project.org/web/packages/qrnn/)            Obtaining predictions   R
PSO                                                               Obtaining coefficients  Java
PSOQRNN                                                           Obtaining predictions   Java

Various parameters are tuned to obtain better results for all datasets. Although Masters [62] recommended the number of hidden nodes as √(Nin × NO), where Nin represents the number of input nodes and NO the number of output nodes, we experimented with various numbers of hidden nodes (2, 3, 4 and 5). After thorough experimentation, the number of hidden nodes of PSOQRNN is fixed as follows: 3 for USD–JPY, 2 for USD–GBP, 5 for USD–EUR, 4 for USD–INR, 3 for Gold Price (USD), 5 for Crude Oil Price (USD), 3 for the S&P 500 Stock Index and 5 for the NSE India Stock Index. Therefore, the recommendation made by Masters did not fit our work. The quantile (τ) values are fixed within the range [0, 1]; among these, τ = 0.5 helped PSOQRNN yield better predictions.

The parameters commonly used in PSO for all datasets are as follows. The inertia weight (ω) controls the momentum of a particle. Vmax and Vmin determine the maximum and minimum changes a particle can undergo in its positional coordinates during an iteration. Shi and Eberhart found that "When Vmax ≥ 3, ω = 0.8 is a good choice" [63]. Based on this, in the current work, we selected Vmin = −5, Vmax = 5 and ω = 0.8. In the PSO literature, it is quite common practice to limit the swarm size to the range 20–60 [63–65]. In the current work, we selected the number of particles as 60 after thorough experimentation, the same as in [66]. Usually, the acceleration coefficients C1 (self-confidence) and C2 (swarm confidence) are within the range [0, 4] [67,65]. In the current work, we selected C1 = C2 = 2, the same as in [66], and the maximum number of iterations is 5000 so that the global best particle converges to a near-optimal solution. These parameters were chosen after rigorous experimentation with PSOQRNN.

Table 3 presents the parameters of the various forecasting models after thorough experimentation. These parameters helped the models yield better predictions. In the GARCH model, p determines the order of the GARCH terms and q the order of the ARCH terms; for all datasets, p = 1 and q = 1 are obtained as the best parameters. In MLP, the learning rate (between 0 and 1) controls the size of the weight and bias changes during training, and the momentum (0 < α < 1) simply adds a fraction α of the previous weight update to the current one. In this paper, we have chosen a learning rate of 0.6 and α = 0.9 for all datasets. We experimented with various numbers of hidden nodes of MLP (3, 4 and 5) and, for each dataset, a different number of hidden nodes is selected. In GRNN, the only adjustable parameter is the smoothing factor (σ) of the kernel function, and its value is varied per dataset. For GMDH, as part of NeuroShell, the parameters selected (MD, MC and MO) are as follows. Model Diversity (MD) defines the maximum number of survivors allowed to pass from the output of each layer to the input of the next one. Model Complexity (MC) determines the allowed length of the formula of a candidate for survival by adjusting the relative penalty for overall model complexity at the output of each layer. Both MD and MC are fixed at the "Medium" option for all datasets. Model Optimization (MO) is set to "Smart", an optimal tradeoff between calculation speed and model quality. In RF and QRRF, ntree determines the number of trees to grow; for both models, ntree is chosen as 100. The quantile determines the boundary of the data and, in QRRF and QRNN, the quantile (τ) values are fixed within the range [0, 1]; among these, τ = 0.5 helped both models yield better predictions. The numbers of hidden nodes chosen in QRNN are the same as those of PSOQRNN.

6. Results and discussion

This section presents the dataset-wise results obtained and discusses them.
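Before turning to the results, the three measures of Eqs. (12)–(14) can be computed as follows. This is a minimal NumPy sketch; note that the Dstat sketch averages over the N − 1 available direction comparisons.

```python
import numpy as np

def mse(actual, pred):
    # Eq. (12): mean of the squared forecast errors
    actual, pred = np.asarray(actual), np.asarray(pred)
    return np.mean((actual - pred) ** 2)

def dstat(actual, pred):
    # Eq. (13): percentage of steps where the forecast moves in the
    # same direction as the actual series
    actual, pred = np.asarray(actual), np.asarray(pred)
    same_dir = (np.diff(actual) * np.diff(pred)) >= 0
    return 100.0 * np.mean(same_dir)

def theil_u(actual, pred):
    # Eq. (14): Theil's inequality coefficient; 0 indicates a perfect fit
    actual, pred = np.asarray(actual), np.asarray(pred)
    num = np.sqrt(np.mean((actual - pred) ** 2))
    den = np.sqrt(np.mean(actual ** 2)) + np.sqrt(np.mean(pred ** 2))
    return num / den
```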
Table 3
Parameters of various models tuned for different datasets.

USD–JPY: GARCH (p = 1, q = 1); MLP (learning rate = 0.6, α = 0.9, nHM = 5); GRNN (σ = 0.0332941); GMDH (MD = Medium, MC = Medium, MO = Smart); RF (ntree = 100); QRRF (τ = 0.5, ntree = 100); QRNN (τ = 0.5, nHQ = 3)

USD–GBP: GARCH (p = 1, q = 1); MLP (learning rate = 0.6, α = 0.9, nHM = 4); GRNN (σ = 0.0138824); GMDH (MD = Medium, MC = Medium, MO = Smart); RF (ntree = 100); QRRF (τ = 0.5, ntree = 100); QRNN (τ = 0.5, nHQ = 2)

USD–EUR: GARCH (p = 1, q = 1); MLP (learning rate = 0.6, α = 0.9, nHM = 3); GRNN (σ = 0.0332941); GMDH (MD = Medium, MC = Medium, MO = Smart); RF (ntree = 100); QRRF (τ = 0.5, ntree = 100); QRNN (τ = 0.5, nHQ = 5)

USD–INR: GARCH (p = 1, q = 1); MLP (learning rate = 0.6, α = 0.9, nHM = 5); GRNN (σ = 0.0216471); GMDH (MD = Medium, MC = Medium, MO = Smart); RF (ntree = 100); QRRF (τ = 0.5, ntree = 100); QRNN (τ = 0.5, nHQ = 4)

Gold Price (USD): GARCH (p = 1, q = 1); MLP (learning rate = 0.6, α = 0.9, nHM = 4); GRNN (σ = 0.0449412); GMDH (MD = Medium, MC = Medium, MO = Smart); RF (ntree = 100); QRRF (τ = 0.5, ntree = 100); QRNN (τ = 0.5, nHQ = 3)

Crude Oil Price (USD): GARCH (p = 1, q = 1); MLP (learning rate = 0.6, α = 0.9, nHM = 5); GRNN (σ = 0.0138824); GMDH (MD = Medium, MC = Medium, MO = Smart); RF (ntree = 100); QRRF (τ = 0.5, ntree = 100); QRNN (τ = 0.5, nHQ = 5)

S&P 500 Stock Index: GARCH (p = 1, q = 1); MLP (learning rate = 0.6, α = 0.9, nHM = 5); GRNN (σ = 0.0138824); GMDH (MD = Medium, MC = Medium, MO = Smart); RF (ntree = 100); QRRF (τ = 0.5, ntree = 100); QRNN (τ = 0.5, nHQ = 3)

NSE India Stock Index: GARCH (p = 1, q = 1); MLP (learning rate = 0.6, α = 0.9, nHM = 5); GRNN (σ = 0.0216471); GMDH (MD = Medium, MC = Medium, MO = Smart); RF (ntree = 100); QRRF (τ = 0.5, ntree = 100); QRNN (τ = 0.5, nHQ = 5)

p = order of GARCH terms, q = order of ARCH terms, α = momentum rate, nHM = No. of hidden nodes of MLP, σ = smoothing factor, τ = quantile, ntree = No. of trees to grow, nHQ = No. of hidden nodes of QRNN, MD = Model Diversity, MC = Model Complexity, MO = Model Optimization. Note: For all of the above models, No. of input variables = 2 and No. of output variables = 1.
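Under the settings described above (swarm of 60, inertia weight 0.8, C1 = C2 = 2, Vmax = 5), the PSO weight search of step 7 can be sketched roughly as below. This is an illustrative Python sketch of the idea, not the authors' Java implementation: the tiny 2-input network with a tanh hidden layer, and the quantile (pinball) error used as the fitness function, are assumptions consistent with the descriptions in the text. For nH = 3 the particle length is 2 × 3 + 3 + 3 + 1 = 13, matching step 7(a).

```python
import numpy as np

rng = np.random.default_rng(0)

def qrnn_forward(w, X, n_hidden):
    """Tiny 2-input, n_hidden-node, 1-output network; the flat vector w
    packs all weights and biases (13 values when n_hidden = 3)."""
    n_in = X.shape[1]
    i = 0
    W1 = w[i:i + n_in * n_hidden].reshape(n_in, n_hidden); i += n_in * n_hidden
    b1 = w[i:i + n_hidden]; i += n_hidden
    W2 = w[i:i + n_hidden].reshape(n_hidden, 1); i += n_hidden
    b2 = w[i]
    h = np.tanh(X @ W1 + b1)
    return (h @ W2).ravel() + b2

def pinball_loss(y, y_hat, tau=0.5):
    # quantile regression error: tau weights under-prediction,
    # (1 - tau) weights over-prediction
    e = np.asarray(y) - np.asarray(y_hat)
    return np.mean(np.where(e >= 0, tau * e, (tau - 1) * e))

def pso_train(X, y, n_hidden=3, tau=0.5, swarm=60, iters=200,
              inertia=0.8, c1=2.0, c2=2.0, vmax=5.0):
    dim = X.shape[1] * n_hidden + n_hidden + n_hidden + 1
    pos = rng.uniform(0, 1, (swarm, dim))   # particles start in [0, 1]
    vel = np.zeros((swarm, dim))
    pbest = pos.copy()
    pbest_fit = np.array([pinball_loss(y, qrnn_forward(p, X, n_hidden), tau)
                          for p in pos])
    gbest = pbest[pbest_fit.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((swarm, dim)), rng.random((swarm, dim))
        # velocity update with inertia and the two acceleration terms
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -vmax, vmax)
        pos = pos + vel
        fit = np.array([pinball_loss(y, qrnn_forward(p, X, n_hidden), tau)
                        for p in pos])
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmin()].copy()
    return gbest  # coordinates = trained weight and bias values
```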
6.1. USD–JPY

Table 4 presents the MSE, Dstat and Theil's U values obtained by the volatility forecasting models, including GARCH, MLP, GRNN, GMDH, RF, QRRF, QRNN and PSOQRNN, for both the training set and the test set of USD–JPY. For the USD–JPY dataset, PSOQRNN yielded 80% less MSE than QRNN on the test set. Further, PSOQRNN yielded the highest Dstat too. These reveal that PSOQRNN yielded better predictions than QRNN on both the training set and the test set. Figs. 4 and 5 depict the predictions of the volatilities of the training set and the test set respectively. The minimum MSE obtained by the proposed PSOQRNN in 30 runs is also better than the MSE obtained by the other techniques. From Table 4, it is also evident that the QR hybrids predicted the volatility of the financial time series better than GARCH, MLP, GRNN, GMDH, and RF. The results of the proposed PSOQRNN model are highlighted in bold face in Tables 4–11 because the model outperformed all the other volatility prediction models specified.

Table 4
Results of volatility forecasting models for USD–JPY data.

6.2. USD–GBP

Table 5 presents the MSE, Dstat and Theil's U values obtained by the volatility forecasting models including GARCH, MLP, GRNN, GMDH, RF, QRRF, QRNN and PSOQRNN for both the training set and the test set of USD–GBP. For the USD–GBP dataset, PSOQRNN yielded 87% less MSE than QRNN. Further, PSOQRNN yielded the highest Dstat and the best Theil's Inequality Coefficient too. These reveal that PSOQRNN yielded better predictions than QRNN on both the training set and the test set. Figs. 6 and 7 depict the predictions of the volatilities of the training set and the test set respectively. The minimum MSE obtained by the proposed PSOQRNN in 30 runs is the least when compared to the MSE obtained by the other techniques. From Table 5, it is also evident that the QR hybrids predicted the volatility of USD–GBP better than GARCH, MLP, GRNN, GMDH, and RF.

Table 5
Results of volatility forecasting models for USD–GBP data.

6.3. USD–EUR

Table 6 presents the MSE, Dstat and Theil's U values obtained by the volatility forecasting models including GARCH, MLP, GRNN, GMDH, RF, QRRF, QRNN and PSOQRNN for both the training set and the test set of USD–EUR. For the USD–EUR dataset, PSOQRNN yielded 85% less MSE than QRNN on the test set. Further, PSOQRNN yielded the highest Dstat and the best Theil's Inequality Coefficient too. These reveal that PSOQRNN yielded better predictions than QRNN on both the training set and the test set. Figs. 8 and 9 depict the predictions of the volatilities of the training set and the test set respectively. The minimum MSE obtained by the proposed PSOQRNN in 30 runs is also the least when compared to the MSE obtained by the other techniques. From Table 6, it is also evident that the QR hybrids predicted the volatility of USD–EUR better than GARCH, MLP, GRNN, GMDH, and RF.

Table 6
Results of volatility forecasting models for USD–EUR data.

6.4. USD–INR

Table 7 presents the MSE, Dstat and Theil's U values obtained by the volatility forecasting models including GARCH, MLP, GRNN, GMDH, RF, QRRF, QRNN and PSOQRNN for both the training set and the test set of USD–INR. For the USD–INR dataset, PSOQRNN yielded 98% less MSE than QRNN on the test set. Further, PSOQRNN yielded the highest Dstat too. These reveal that PSOQRNN yielded better predictions than QRNN on both the training set and the test set. Figs. 10 and 11 depict the predictions of the volatilities of the training set and the test set respectively. The minimum MSE obtained by the proposed PSOQRNN in 30 runs is also the least when compared to the MSE obtained by the other techniques. From Table 7, it is also evident that the QR hybrids predicted the volatility of USD–INR better than GARCH, MLP, GRNN, GMDH, and RF.

Table 7
Results of volatility forecasting models for USD–INR data.
6.5. Gold Price (USD)

Table 8
Results of volatility forecasting models for Gold Price (USD) data.

Table 8 presents the MSE, Dstat and Theil's U values obtained by the volatility forecasting models including GARCH, MLP, GRNN, GMDH, RF, QRRF, QRNN and PSOQRNN for both the training set and the test set of the Gold Price (USD) dataset. For the Gold Price dataset (see Table 8), PSOQRNN yielded 98% less MSE than QRNN on the test set. Further, PSOQRNN yielded the highest Dstat and the best Theil's Inequality Coefficient too. These reveal that PSOQRNN yielded better predictions than QRNN on both the training set and the test set. The predictions of the volatilities of the training set and the test set are depicted in Figs. 12 and 13 respectively. The minimum MSE obtained by the proposed PSOQRNN in 30 runs is also the least when compared to the MSE obtained by the other techniques. From Table 8, it is also clear that the QR hybrids predicted the volatility of Gold Price better than GARCH, MLP, GRNN, GMDH and RF.

6.6. Crude Oil Price (USD)

Table 9 presents the MSE, Dstat and Theil's U values obtained by the volatility forecasting models including GARCH, MLP, GRNN, GMDH, RF, QRRF, QRNN and PSOQRNN for both the training set and the test set of the Crude Oil Price (USD) dataset. For the Crude Oil Price dataset (see Table 9), PSOQRNN yielded 29% less MSE than QRNN. Further, PSOQRNN yielded the highest Dstat and the best Theil's Inequality Coefficient too. The minimum MSE obtained by the proposed PSOQRNN in 30 runs is the least when compared to the MSE obtained by the other techniques. The predictions of the volatilities of the training set and the test set are depicted in Figs. 14 and 15 respectively. From Table 9, it is also clear that the QR hybrids predicted the volatility of Crude Oil Price better than GARCH, MLP, GRNN, GMDH, and RF.

Table 9
Results of volatility forecasting models for Crude Oil Price (USD) data.

Fig. 14. Predictions of training set of Crude Oil Price (USD) volatilities.
Fig. 15. Predictions of test set of Crude Oil Price (USD) volatilities.

6.7. S&P 500 Stock Index

Table 10 presents the MSE, Dstat and Theil's U values obtained by the volatility forecasting models including GARCH, MLP, GRNN, GMDH, RF, QRRF, QRNN and PSOQRNN for both the training set and the test set of the S&P 500 Stock Index dataset. For the S&P 500 Stock Index dataset, PSOQRNN yielded 12% less MSE than QRNN on the test set. Further, PSOQRNN yielded the highest Dstat and the best Theil's Inequality Coefficient too. These reveal that PSOQRNN yielded better predictions than QRNN on both the training set and the test set. The predictions of the volatilities of the training set and the test set are depicted in Figs. 16 and 17 respectively. The minimum MSE obtained by the proposed PSOQRNN in 30 runs is also the least when compared to the MSE obtained by the other techniques. From Table 10, it is also clear that the QR hybrids predicted the volatility of the S&P 500 Stock Index better than GARCH, MLP, GRNN, GMDH, and RF.

Table 10
Results of volatility forecasting models for S&P 500 Stock Index data.

Fig. 16. Predictions of training set of S&P 500 Stock Index volatilities.
Fig. 17. Predictions of test set of S&P 500 Stock Index volatilities.

6.8. NSE India Stock Index

Table 11 presents the MSE, Dstat and Theil's U values obtained by the volatility forecasting models including GARCH, MLP, GRNN, GMDH, RF, QRRF, QRNN and PSOQRNN for both the training set and the test set of the NSE India Stock Index dataset. For the NSE India Stock Index dataset, PSOQRNN yielded 84% less MSE than QRNN on the test set. Further, PSOQRNN yielded the highest Dstat and the best Theil's Inequality Coefficient too. These reveal that PSOQRNN yielded better predictions than QRNN on both the training set and the test set. The predictions of the volatilities of the training set and the test set are depicted in Figs. 18 and 19 respectively. The minimum MSE obtained by the proposed PSOQRNN in 30 runs is the least when compared to the MSE obtained by the other techniques. From Table 11, it is also clear that the QR hybrids predicted the volatility of the NSE India Stock Index better than GARCH, MLP, GRNN, GMDH and RF.

Table 11
Results of volatility forecasting models for NSE India Stock Index data.

Fig. 18. Predictions of training set of NSE India Stock Index volatilities.
Fig. 19. Predictions of test set of NSE India Stock Index volatilities.
Table 12
Results of Diebold–Mariano (DM) test on test sets of all datasets (PSOQRNN vs. each forecasting model).

Forecasting model  USD–JPY   USD–GBP    USD–EUR    USD–INR  Gold Price  Crude Oil Price  NSE India Stock Index  S&P 500 Stock Index
GARCH              2.440949  1.238817   1.379453   3.31054  9.371722    6.731015         6.585208               26.42448
MLP                6.983993  2.917819   2.314754   2.31974  12.86086    7.186184         10.89401               7.575923
GRNN               2.141267  1.015665   1.268521   1.95331  7.733054    5.567485         3.679559               5.727691
GMDH               2.164064  1.019099   1.253273   1.88248  7.806101    5.418079         3.595849               5.626264
RF                 2.490336  1.179588   1.428840   2.61283  9.295843    6.023462         4.277927               6.460665
QRRF               0.000695  0.0010651  0.0009519  0.00535  0.2037165   1.357820         0.0457576              1.483677
QRNN               0.002296  0.0065773  0.0034160  0.46864  1.208080    0.014929         0.0532363              0.1353052
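The paper computes this test with the R 'forecast' package. As a rough illustration only (assuming a squared-error loss differential and omitting the autocorrelation-robust variance correction that the full test applies), the core DM statistic is:

```python
import numpy as np

def dm_statistic(errors_a, errors_b):
    """Diebold-Mariano statistic comparing two forecast error series
    under a squared-error loss differential (one common choice)."""
    d = np.asarray(errors_a) ** 2 - np.asarray(errors_b) ** 2
    n = len(d)
    # mean loss differential divided by its estimated standard error
    return d.mean() / np.sqrt(d.var(ddof=1) / n)
```

Swapping the two error series flips the sign of the statistic, so only its magnitude matters for the two-sided comparison.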
6.9. Discussion

Since QR has the capability of characterizing and modeling the variability of the dependent variable across all the quantiles, covering the whole distribution of the dependent variable, QR used in conjunction with RF (resulting in the hybrid QRRF) outperformed the stand-alone RF. The same phenomenon applies to the hybrid QRNN also. These two observations are easily evidenced by the results presented in Tables 4–11, wherein QRRF (τ = 0.5) and QRNN (τ = 0.5) outperformed RF and the neural network respectively in terms of MSE, Dstat and Theil's U for all datasets.

However, in particular, QRRF outperformed QRNN on five datasets, including USD versus JPY, GBP, EUR and INR, and Gold Price (USD), for the following reasons:

1. QRRF chose an appropriate number of random covariates to find the best split. Once the best split is found, the predictions obtained from it turned out to be better than those obtained from the other forecasting models.
2. The bagging procedure in QRRF reduced the variance in the predictions, so that QRRF could obtain better predictions.
3. QRRF considered random feature subsets of the data at different quantiles, constructed trees for each subset over a bootstrap sample and ensembled the obtained results. The ensembling procedure in QRRF helped it yield better predictions than the other forecasting models. The last two features are missing in MLP and other ANNs.

It is also important to note that QRNN outperformed QRRF on three datasets: Crude Oil Price (USD), S&P 500 Stock Index and NSE India Stock Index. QRRF accepts a range that each observation will lie in (with high probability), and it is less accurate when the range for a new instance is wider [10]. Therefore, based on this observation, QRRF probably could not yield better predictions than QRNN because the ranges for the new instances of these three datasets may be wider.

The Back Propagation algorithm in QRNN implements a gradient descent search through the space of possible network weights. Consequently, these weights converge to a local minimum but not necessarily to the global optimum [52]. PSO, being an evolutionary algorithm, has a higher chance of obtaining the globally optimal solution, so PSO is utilized for training the QRNN. Therefore, PSOQRNN could yield better predictions.

Finally, the above forecasting accuracy measures, including MSE, Dstat, and Theil's U, do not formally test whether one method is statistically significantly different from another. For this purpose, there is a popular test proposed by Diebold and Mariano [68]. We employed this test statistic, implemented as part of the 'forecast' package from the archives of R (https://cran.r-project.org/web/packages/forecast/), to check whether PSOQRNN performs statistically significantly differently from the other volatility forecasting models on average. Table 12 presents the results of the Diebold–Mariano test on the eight datasets. If the test statistic is less than or equal to 0.05, then the corresponding model performs equally accurately with PSOQRNN. From the table, it is clear that PSOQRNN is superior to all models on two datasets, superior to all models except QRNN on three datasets, superior to all models except QRRF on four datasets, and superior to all models except QRRF and QRNN in all cases.

7. Conclusions

This paper proposed a novel PSO-trained QRNN, called PSOQRNN, to forecast volatility from a financial time series. It is observed that the proposed PSOQRNN yielded statistically significant results compared to other popular volatility forecasting models such as GARCH, MLP, GRNN, GMDH, RF, QRRF, and QRNN on eight financial datasets in terms of MSE. It also performed well in terms of the other important measures, Dstat and Theil's Inequality Coefficient. The superior performance of PSOQRNN is due to the presence of PSO, which yielded globally optimal weights while training PSOQRNN. The results are encouraging, and we suggest its further use for volatility forecasting on other similar financial and non-financial data.

References

[1] S.-H. Poon, A Practical Guide to Forecasting Financial Market Volatility, John Wiley & Sons Ltd., 2005.
[2] S.-H. Poon, C. Granger, Practical issues in forecasting volatility types of volatility models, Financ. Anal. J. 61 (2005) 45–56.
[3] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, J. Economet. 31 (1986) 307–327.
[4] D.E. Rumelhart, G.E. Hinton, R.J. Williams, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, 1986, pp. 318–362.
[5] D. Specht, A general regression neural network, IEEE Trans. Neural Netw. 2 (1991) 568–576.
[6] A. Ivakhnenko, The GMDH: a rival of stochastic approximation, Sov. Autom. Control 3 (1968).
[7] G. Zhang, B. Eddy Patuwo, M.Y. Hu, Forecasting with artificial neural networks: the state of the art, Int. J. Forecast. 14 (1998) 35–62.
[8] L. Breiman, Random forests, Mach. Learn. 45 (2001) 5–32.
[9] S.-K. Oh, D.-W. Kim, B.-J. Park, H.-S. Hwang, Advanced polynomial neural networks architecture with new adaptive nodes, Trans. Control Autom. Syst. Eng. 3 (2001) 43–50.
[10] N. Meinshausen, Quantile regression forests, J. Mach. Learn. Res. 7 (2006) 983–999.
[11] V. Ravi, A. Sharma, Support vector-quantile regression random forest hybrid for regression problems, in: MIWAI 2014, Bangalore, India, LNAI 8875, December 9–10, 2014, pp. 149–160.
[12] J.W. Taylor, A quantile regression approach to estimating the distribution of multiperiod returns, J. Deriv. 7 (1999) 64–78.
[13] J. Kennedy, R. Eberhart, Particle swarm optimization, in: IEEE International Conference on Neural Networks, Perth, Australia, 27 November to 1 December, 1995, pp. 1942–1948.
[14] P.H. Franses, M. McAleer, Financial volatility: an introduction, J. Appl. Economet. 17 (2002) 419–424.
[15] T.G. Andersen, T. Bollerslev, P.F. Christoffersen, F.X. Diebold, Volatility and correlation forecasting, Handb. Econ. Forecast. 1 (2006) 777–878.
[16] J.L. Knight, S.S. Satchell, Forecasting Volatility in the Financial Markets, 3rd ed., Butterworth-Heinemann, 2007.
[17] R.F. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50 (1982) 987–1007.
[18] T. Bollerslev, R.Y. Chou, K.F. Kroner, ARCH modeling in finance: a review of the theory and empirical evidence, J. Economet. 52 (1992) 5–59.
[19] J. Noh, R.F. Engle, A. Kane, Forecasting volatility and option prices of the S&P 500 index, J. Deriv. 2 (1994) 17–30.
[20] J. Vilasuso, Forecasting exchange rate volatility, Econ. Lett. 76 (2002) 59–64.
[21] R.T. Baillie, T. Bollerslev, H.O. Mikkelsen, Fractionally integrated generalized autoregressive conditional heteroskedasticity, J. Economet. 74 (1996) 3–30.
[22] J.-R. Chang, L.-Y. Wei, C.-H. Cheng, A hybrid ANFIS model based on AR and volatility for TAIEX forecasting, Appl. Soft Comput. 11 (2011) 1388–1395.
[23] P. Agnolucci, Volatility in crude oil futures: a comparison of the predictive ability of GARCH and implied volatility models, Energy Econ. 31 (2009) 316–321.
[24] R. Donaldson, M. Kamstra, An artificial neural network-GARCH model for international stock return volatility, J. Empir. Finance 4 (1997) 17–46.
[25] F. Gonzalez Miranda, N. Burgess, Modelling market volatilities: the neural network perspective, Eur. J. Finance 3 (1997) 137–157.
[26] S.A. Hamid, Z. Iqbal, Using neural networks for forecasting volatility of S&P 500 Index futures prices, J. Bus. Res. 57 (2004) 1116–1125.
[27] J.R. Aragonés, C. Blanco, P.G. Estévez, Neural network volatility forecasts, Intell. Syst. Acc. Finance Manage. 15 (2007) 107–121.
[28] M. Mohsen, B. Nafiseh, A. Mehdi, M. Mohsen, Forecasting volatility of crude oil price using the GMDH neural network, Q. Energy Econ. Rev. 7 (2010) 89–112.
[29] X.-f. Zhuang, L.-w. Chan, Volatility forecasts in financial time series with HMM-GARCH models, J. Financ. Strateg. Decis. 3177 (2004) 807–812.
[30] T. Hyup Roh, Forecasting the volatility of stock price index, Expert Syst. Appl. 33 (2007) 916–922.
[31] C.-H. Tseng, S.-T. Cheng, Y.-H. Wang, J.-T. Peng, Artificial neural network model of the hybrid EGARCH volatility of the Taiwan stock index option prices, Phys. A: Stat. Mech. Appl. 387 (2008) 3192–3200.
[32] C.-H. Tseng, S.-T. Cheng, Y.-H. Wang, New hybrid methodology for stock volatility prediction, Expert Syst. Appl. 36 (2009) 1833–1839.
[33] Y.-H. Wang, Nonlinear neural network forecasting model for stock index option price: hybrid GJR-GARCH approach, Expert Syst. Appl. 36 (2009) 564–570.
[34] M. Bildirici, Ö.Ö. Ersin, Improving forecasts of GARCH family models with the artificial neural networks: an application to the daily returns in Istanbul Stock Exchange, Expert Syst. Appl. 36 (2009) 7355–7362.
[35] J.-C. Hung, Forecasting volatility of stock market using adaptive Fuzzy-GARCH model, in: 2009 Fourth International Conference on Computer Sciences and Convergence Information Technology (ICCIT09), IEEE, Seoul, Korea, 24–26 November, 2009, pp. 583–587.
[36] E. Hajizadeh, A. Seifi, M. Fazel Zarandi, I. Turksen, A hybrid modeling approach for forecasting the volatility of S&P 500 index return, Expert Syst. Appl. 39 (2012) 431–436.
[37] S.A. Monfared, D. Enke, Volatility forecasting using a hybrid GJR-GARCH neural network model, in: Complex Adaptive Systems, Procedia Computer Science, vol. 36, Philadelphia, PA, November 3–5, 2014, pp. 246–253.
[38] A. Komijani, E. Naderi, N. Gandali Alikhani, A hybrid approach for forecasting of oil prices volatility, OPEC Energy Rev. 38 (2014) 323–340.
[39] S. Choudhury, S. Ghosh, A. Bhattacharya, K.J. Fernandes, M.K. Tiwari, A real time clustering and SVM based price-volatility prediction for optimal trading strategy, Neurocomputing 131 (2014) 419–426.
[40] R. Rosa, L. Maciel, F. Gomide, R. Ballini, Evolving hybrid neural fuzzy network for realized volatility forecasting with jumps, in: 2014 IEEE Conference on Computational Intelligence for Financial Engineering & Economics (CIFEr), London, UK, 27–28 March, 2014, pp. 481–488.
[46] S. Ketabchia, H. Ghanadzadehc, A. Ghanadzadehb, S. Fallahia, M. Ganjia, Estimation of VLE of binary systems (tert-butanol+2-ethyl-1-hexanol) and (n-butanol+2-ethyl-1-hexanol) using GMDH-type neural network, J. Chem. Thermodyn. 42 (2010) 1352–1355.
[47] L. Breiman, J. Friedman, R. Olshen, C. Stone, Classification and Regression Trees, Wadsworth, Belmont, 1984.
[48] L. Breiman, Bagging predictors, Mach. Learn. 24 (1996) 123–140.
[49] R. Koenker, G. Bassett, Regression quantiles, Econometrica 46 (1978) 33–50.
[50] A.J. Cannon, Quantile regression neural networks: implementation in R and application to precipitation downscaling, Comput. Geosci. 37 (2011) 1277–1284.
[51] M. Saerens, Building cost functions minimizing to some summary statistics, IEEE Trans. Neural Netw. 11 (2000) 1263–1271.
[52] T.M. Mitchell, Machine Learning, McGraw-Hill, 1997.
[53] R. Mendes, P. Cortez, M. Rocha, J. Neves, Particle swarms for feedforward neural network training, in: 2002 International Joint Conference on Neural Networks (IJCNN02), vol. 6, Honolulu, Hawaii, 12–17 May, 2002, pp. 1895–1899.
[54] G.K. Jha, P. Thulasiraman, R.K. Thulasiram, PSO based neural network for time series forecasting, in: 2009 International Joint Conference on Neural Networks (IJCNN2009), IEEE, Atlanta, Georgia, USA, 14–19 June, 2009, pp. 1422–1427.
[55] R. Adhikari, R.K. Agrawal, Effectiveness of PSO based neural network for seasonal time series forecasting, in: 5th Indian International Conference on Artificial Intelligence (IICAI 2011), Tumkur, India, 14–16 December, 2011, pp. 231–244.
[56] M.S. Innocente, J. Sienz, Particle swarm optimization with inertia weight and constriction factor, in: The 2nd International Conference on Swarm Intelligence (ICSI'2011), Chongqing, China, 12–15 June, 2011, pp. 1–11.
[57] M. AlRashidi, M. El-Hawary, A survey of particle swarm optimization applications in electric power systems, IEEE Trans. Evol. Comput. 13 (2009) 913–918.
[58] R. Eberhart, Y. Shi, Guest editorial. Special issue on particle swarm optimization, IEEE Trans. Evol. Comput. 8 (2004) 201–203.
[59] S. Makridakis, M. Hibon, Evaluating accuracy (or error) measures, Insead (1995) 1–41.
[60] J. Yao, C.L. Tan, A case study on using neural networks to perform technical forecasting of forex, Neurocomputing 34 (2000) 79–98.
[61] H. Theil, Applied Economic Forecasting, North-Holland Pub. Co., Amsterdam, 1966.
[62] T. Masters, Practical Neural Network Recipes in C++, Academic Press, Inc., London, 1993.
[63] Y. Shi, R. Eberhart, A modified particle swarm optimizer, in: IEEE World Congress on Computational Intelligence, Anchorage, Alaska, 4–9 May, 1998, pp. 69–73.
[64] J. Kennedy, Small worlds and mega-minds: effects of neighborhood topology on particle swarm performance, in: Congress on Evolutionary Computation (CEC99), IEEE, Washington DC, USA, 6–9 July, 1999, pp. 1931–1938.
[65] S. Das, A. Abraham, A. Konar, Particle swarm optimization and differential
[41] C. Narendra Babu, B. Eswara Reddy, Prediction of selected Indian stock using a evolution algorithms: technical analysis, applications and hybridization
partitioning-interpolation based ARIMA-GARCH model, Appl. Comput. Inf. 11 perspectives, Stud. Comput. Intell. 116 (2008) 1–38.
(2015) 130–143. [66] D. Pradeepkumar, V. Ravi, FOREX rate prediction using chaos, neural network
[42] W. Kristjanpoller, M.C. Minutolo, Gold price volatility: a forecasting approach and particle swarm optimization, in: 5th International Conference on Swarm
using the artificial neural network-GARCH model, Expert Syst. Appl. 42 (2015) Intelligence, ICSI 2014, volume LNCS 8795, Springer International Publishing,
7245–7251. Hefei, China, 17–20 October, 2014, pp. 363–375.
[43] R. Dash, P. Dash, R. Bisoi, A differential harmony search based hybrid interval [67] G. Venter, J. Sobieszczanski-Sobieski, Particle swarm optimization, in: 43rd
type2 fuzzy EGARCH model for stock market volatility prediction, Int. J. AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials
Approx. Reason. 59 (2015) 81–104. Conference, American Institute of Aeronautics and Astronautics, Denver,
[44] A.Y. Huang, S.-P. Peng, F. Li, C.-J. Ke, Volatility forecasting of exchange rate by Colorado, 22–25 April, 2002, pp. 1–9.
quantile regression, Int. Rev. Econ. Finance 20 (2011) 591–606. [68] F.X. Diebold, R.S. Mariano, Comparing predictive accuracy, J. Bus. Econ. Stat.
[45] S.J. Farlow, Self-Organizing Methods in Modeling: GMDH Type Algorithms, 13 (1995) 253–263.
Illustrated, volume 54 of Statistics: A Series of Textbooks and Monographs
Edition, CRC Press, 1984.