Comparison of Predictive Algorithms: Backpropagation, SVM, LSTM and Kalman Filter For Stock Market
Abstract: The variation in, and dependency on, the many parameters of the stock market make prediction a complex process. Artificial neural networks have proven useful for predicting stock values in such cases. The parameters involved and the commonly used algorithms are discussed and compared in this paper. In the backpropagation algorithm, a feed-forward network is used and the weights are modified by back-propagating the error. Similarly, significant modification is introduced in the Support Vector Machines Algorithm (SVMA), which results in higher accuracy rates. The presence of a kernel and other parameters makes it more flexible. Long Short-Term Memory (LSTM), another commonly used time series forecasting algorithm, is a special type of Recurrent Neural Network (RNN) that uses a gradient descent algorithm. This paper provides a comparative analysis of these algorithms on the basis of accuracy, variation and time required for different numbers of epochs. A T-test was used for further analysis to test the reliability of each algorithm.

Keywords: Artificial Neural Network (ANN), Support Vector Machines (SVM), Long Short-Term Memory (LSTM), Backpropagation, Recurrent Neural Network (RNN), T-tests

I. INTRODUCTION

Prediction, or forecasting, is essentially a calculated guess based on previous evidence or facts. In today's growing economy, prediction and analysis of the stock market is of great importance because it reflects economic conditions, but it is a very complicated task. This is because of the non-linear behavior of stock data. Many other factors create hurdles for stock prediction, including social and economic conditions, politics, rumours and media hype, traders' behaviour, etc. Professional traders have tried to make forecasts and have developed a variety of analysis methods such as technical, fundamental, quantitative, and so on [1]. These methods use different sources ranging from news to price data, but they all have a common aim: predicting a company's future stock prices so that traders can make educated decisions on their trading options. Over the past 10 years, neural networks, one of the most intelligent data mining techniques, have been used extensively. This has led many researchers to apply machine learning techniques to the stock market as well, some of which have produced quite promising results.

As every algorithm has its own advantages and disadvantages, this research studies 3 different machine learning techniques commonly used for stock market prediction. A comparative approach is followed to find the optimal technique for stock market prediction under different circumstances. Comparison is made on the basis of their performance on the same input dataset and for the same number of epochs. As the main purpose of any predictive model is to minimize error (in the case of the stock market, to minimize risk), our problem statement was addressed by achieving as much accuracy as possible by tweaking the algorithms.

II. LITERATURE SURVEY

Several studies have analyzed these algorithms for use in stock market prediction. In most cases Artificial Neural Networks (ANNs) were used, which suffered from overfitting. This was because of the large number of parameters to be fixed [2] and the little prior user knowledge about the relevance of the inputs in the analyzed problem.

SVM [3] and Backpropagation algorithms were used for many problems centered around classification. They tried to overcome most of the limitations of ANN, and in many instances reasonable accuracy was achieved. LSTM was used for many time series problems, and a similar approach is used to predict stock trends by memorizing historical data. The Kalman filter is a two-stage algorithm which employs linear regression within the data and represents the 'true value' of the market. It strikes a balance between data and predictions based on the relative amount of noise [4].

Several other machine learning techniques are also used for stock market prediction. There is no established criterion by which we can choose the optimal solution for stock market prediction. This paper analyzes these algorithms using the same amount of test and training data to keep the testing environment consistent. T-tests were performed for one-to-one comparison of these algorithms under similar conditions. These tests tell us how consistently these algorithms perform with respect to each other.

III. THEORY

The three algorithms focused on in this paper are:
LSTM, or Long Short-Term Memory, is a type of recurrent neural network (RNN) and uses an appropriate gradient descent algorithm. An RNN feeds the output of a block as the input for the next iteration and uses its feedback connection to store recent events in the form of activation. Unlike multilayer

• Adj Close: Closing value of the stock before the market opens the next day, adjusted by considering corporate actions and distributions
• Volume: Total number of shares that changed hands during that day
So, we collected historical stock price data from Yahoo Finance for the following 9 companies, all belonging to the Technology sector:
• Apple
• Acer
• Amazon
• Google
• HP
• IBM
• Intel
• Microsoft
• Sony

We compiled our own database from these parameters. The key terms used in deriving the input parameters are:

• Momentum: A measure of the direction in which a parameter, the stock price in our case, is moving. It is defined as +1 if the price has increased and -1 if the price has dropped since yesterday.
• Volatility: A measure of the fluctuation in the same parameter over time. In our case it is the difference between yesterday's price and today's price, divided by yesterday's price.

Hence, for the input dataset, we derived the following parameters from the ones above:

• Index Momentum: The momentum of the market, taken in our case as the average over 5 days.
• Index Volatility: Helps determine the fluctuations in the market. It is also calculated as the average over the last 5 days.
• Sector Momentum: Considers the other companies in the same sector and calculates the momentum of these companies over the last 5 days.
• Stock Momentum: The average of the last 5 days of momentum of the respective company.
• Stock Price Volatility: The average of the last 5 days of stock price of the respective company.

The data set consisted of these 6 dimensions for each of the stocks, i.e. each day has 9 records, one record for each company stock. The total number of rows in our input data set is around 22212.

B. Implementation of Algorithm

1) Support Vector Machine: This algorithm was implemented in Python using the scikit-learn library for machine learning. The implementation was done in the following steps [11]:

• Importing the library and CSV data files
• Creating an SVM classifier
• Defining the kernel and setting its parameters
• Predicting and comparing with the target for accuracy calculation

In this case, since the stock market data is non-linear, we used the rbf kernel and kept the hyperparameters C and gamma at their default values of 1 and scale (version 0.22) respectively. Higher values of C and gamma cause the decision boundary to become more curved and the variance to increase, thereby raising the chance of overfitting. The decision function shape was chosen to be ovo as it gave better results [12]. Accuracy can be improved by tuning the parameters using GridSearchCV in sklearn.model_selection. This may take a lot of time depending on the number of training samples. Once the classifier is trained by mapping the given features and labels, prediction is made using the test data, which consists of approximately 20% of the total data (4,412 samples). This is then compared with the target data present in the output file and the accuracy is ascertained.

2) Backpropagation: We followed 5 steps to implement this algorithm on our dataset [13]:

a) Initialize Network:
• Here we initialize weights and biases using the number of inputs, number of hidden neurons and number of outputs.
• Weights are selected randomly from the range 0 to 1.

b) Forward Propagation: In this stage we get an output by propagating our inputs through all the hidden layers and the output layer. We use this technique later in the process for prediction. Forward propagation involves:
• Activation of Neuron: We find the activation value using the weighted sum of inputs.
• Transfer of Neuron: Using this activation value and the sigmoid function, we transfer the activation to get the output.
• Forward Propagation: It propagates inputs through the input, hidden and output layers and stores the output value.

c) Backward Propagation of Error: Here we calculate the slope of the output by differentiating the sigmoid function. The error signal is then generated using the weighted error of each neuron in the output layer.

d) Train Network: The error calculated for each neuron using the backpropagation method is used to update the weights. Similarly, we also update the bias weights. In this stage the learning rate determines how much the weights change in each update. A smaller learning rate over a larger database and more iterations gives a better set of weights. Since we update the weights for each iteration, this is called online learning. Once the weights are
updated, the step repeats for the number of iterations defined (number of epochs) and we train the network.

e) Predict: For prediction we forward-propagate the input set to get the output. In our case we use the output itself as the probability of a pattern belonging to each output class [13]. We turned this output into a crisp class prediction by selecting the class value with the larger probability.

Then we compared this predicted output with the target output to get the accuracy of the algorithm. The database was divided into 5 batches (5 folds), of which 4 were used for training the network and 1 for testing it. This gave us around 80% of the data for training and 20% for testing.

3) Long Short-Term Memory (LSTM): LSTM was implemented using the following steps [14]:

a) Dataset creation for LSTM:
• Converting the time series into a supervised learning problem:
  – The data is divided into an input set and an output set.
  – We defined a value for the time step. Using this time step we shift the dataset and concatenate the two series to get the output set.
• Converting the time series into stationary data:
  – Stationary data is easier to model and is better for forecasting.
  – It is obtained by removing the trend from the data, which is achieved by differencing the data.
• Transforming the observations to a specific scale:
  – LSTM works on data that is within the scale of the activation function of the network.
  – The default activation function for LSTM is the hyperbolic tangent (tanh), which has an output range from -1 to +1, which is ideal for time series data.

b) Model Development:
• LSTM is a type of Recurrent Neural Network (RNN). This type of neural network is useful for remembering over long sequences of data, and it doesn't depend on a window-lagged dataset as input.
• The LSTM layer takes 3 inputs:
  – Samples: The rows of the input dataset, containing independent observations.
  – Time steps: The time steps of the parameters for the input dataset.
  – Features: The parameters considered and observed for the input dataset to predict the output.

While compiling the network we need to specify the loss function and the optimization algorithm. We use mean absolute error (mae) as the loss function and ADAM as the optimization algorithm. ADAM selects a suitable learning rate for the network. The model is then fit to the training data and we predict the output. Then, we calculated the accuracy of the model.

V. RESULTS AND ANALYSIS

To make sure that an algorithm performs consistently and performs better than the others, we ran the same algorithm multiple times. Each algorithm was run 30 times, and for each run the prediction accuracy was calculated on the test data. A different training and testing dataset was used for each run. The accuracy results for each algorithm are given below.

A. SVM Result:

For 10 runs of the SVM algorithm we got a mean accuracy of approximately 66.9823 with a standard deviation of 0.05256. This shows that SVM performance is consistent over 10 runs, which is due to the nature of the SVM algorithm. The algorithm keeps training until it can classify the maximum amount of testing data, resulting in higher accuracy (Table I).

B. LSTM Result:

For 10 runs of the LSTM algorithm we got a mean accuracy of approximately 68.51635 with a standard deviation of 0.71779. But for larger numbers of epochs LSTM gives less variance and thus more reliable output (Table II).

TABLE I: RESULTS OBTAINED FOR SVM

No. of    Mean Accuracy      Standard     Variance    Time Required
Epochs    (in percentage)    Deviation                (in seconds)
10        66.9823            0.05255      0.00276     255.4932
30        67.0316            0.11561      0.01337     933.3335
50        67.0895            0.15970      0.02550     1438.2164
70        67.2219            0.18522      0.03430     2154.9157
100       67.1212            0.16695      0.02787     2258.1878

TABLE II: RESULTS OBTAINED FOR LSTM

No. of    Mean Accuracy      Standard     Variance    Time Required
Epochs    (in percentage)    Deviation                (in seconds)
10        68.51635           0.71779      0.51523     574.5679
30        68.98083           0.05528      0.00306     3504.9207
50        68.95468           0.04981      0.00248     5403.6272
70        68.96183           0.05109      0.00261     7448.0353
100       69.04171           0.05204      0.00271     10692.7814

C. Backpropagation Result:

For 10 runs of the Backpropagation algorithm we got a mean accuracy of approximately 68.649 with a standard deviation of
0.55375. As can be observed, Backpropagation performs similarly to SVM but is faster. It has higher fluctuation in accuracy compared to LSTM, which might cause issues when a steady accuracy is required (Table III).

TABLE III: RESULTS OBTAINED FOR BACKPROPAGATION

No. of    Mean Accuracy      Standard     Variance    Time Required
Epochs    (in percentage)    Deviation                (in seconds)
10        68.649             0.55375      0.30664     12.6310
30        68.433             0.54018      0.29180     52.8089
50        68.339             0.89537      0.80169     113.1676
70        68.249             0.69470      0.48260     143.1701
100       67.434             2.43245      5.91679     229.9802

D. Observations

From the results obtained above we also computed the T-test values for different combinations of algorithms. T-tests are inferential statistical methods used to compare means. A T-test tells us how many standard units apart the two means are. An assumption is made that the dependent variable is normally distributed, which helps calculate the probability. The confidence level is set to 95% (an alpha level, or level of significance, of 0.05). The critical value for 95% confidence is found for epoch = 10. As we can see from Table IV, only the Backpropagation-LSTM combination has a T-stat less than the critical value (CV). From Tables I and II we can see that LSTM has much less variance. So LSTM has a higher chance of giving better results than Backpropagation.

TABLE IV: T-TEST VALUES FOR EPOCH = 10

                        T-Stat    CV
SVM-Backpropagation     6.715     1.771
Backpropagation-LSTM    0.395     1.771
LSTM-SVM                7.666     1.734

VI. KALMAN FILTER AS ANOTHER PREDICTIVE ALGORITHM

The Kalman algorithm is a recursive algorithm that uses time series data to eliminate inaccuracies caused by noise in the measurement of different variables, and produces estimates of the variables that are more accurate than single measurements; hence it is a time series algorithm. It handles the uncertainties of the variables by assigning higher weights to estimates with higher certainty [15]. The Kalman algorithm works in the following steps. The first step produces estimates of the variables along with the noise involved in their measurement and other noise. Once the outcome of the next measurement is obtained, the algorithm updates its uncertainty matrix depending on the covariance between the predicted value and the measured value. The Kalman filter hence develops a state transition model that follows the same flow as the given data and updates it at every step to find the most localized value, and hence an accurate prediction of the next state [16]. The stock market, being highly fluctuating time series data, is a good application for the Kalman filter, which possesses real-time tracking characteristics [17].

A. Algorithm and Flowchart

Fig. 1. Flowchart of Kalman Filter [18]

B. Data Set Creation

The data set used is the same as for the other algorithms. However, the data set used in Kalman filtering is preprocessed as follows:

1) From the adj close parameter in the original dataset, an adjustment factor (adj factor) is calculated as adj factor = adj close / close.
2) The other parameters of the original dataset are multiplied or divided by the adj factor to get the new values:
   Open = Open * adj factor
   Close = Close * adj factor
   High = High * adj factor
   Low = Low * adj factor
   Volume = Volume / adj factor
3) Now, the adj close and open parameters are shifted by one row to enable comparison between the present and the next day's predicted value.
4) The differences between the various parameters are found as follows:
   High diff = High - adj close shift
   Low diff = Low - adj close shift
   Close diff = adj close - adj close shift
   Open diff = Open shift - adj close shift

C. Prediction and Comparison

A prediction is made by applying the Kalman filter to the various parameters of the data set, and the estimated stock prices for the next day are found using the steps of the Kalman filter algorithm. The predicted and actual values are compared by calculating the R2 score between them.

R2 is a statistic that gives some information about the goodness of fit of a model. In regression, the R2 coefficient of determination is a statistical measure of how well the regression predictions approximate the real data points. An R2 of 1 indicates that the regression predictions perfectly fit the data. An acceptable value is between 0 and 1.

D. Experimental Results

VII. CONCLUSION

Through this paper we have observed that we can use machine learning to predict stock market prices and to compare the algorithms used for doing so. The results show how historical data can be used to predict stock movement with reasonable accuracy, but the choice of algorithm depends on requirements such as time, variance and mean accuracy. If the requirement is high accuracy and low variance, LSTM is the better choice, but it is comparatively slower. If the requirement is high speed and accuracy, then backpropagation is better. Also, from the T-test analysis we can conclude that LSTM is more reliable than Backpropagation and SVM. For this implementation, we incorporated 6 factors that affect stock performance. If a larger number of factors is used to train the network model, after adequate preprocessing and filtering of the data, then a higher accuracy can be achieved.
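The run-level statistics (mean accuracy, standard deviation, variance) and the T-test comparison on which this conclusion relies can be illustrated with a short sketch. It is not the authors' code: it assumes Welch's unequal-variance form of the two-sample t statistic, which the paper does not specify, and the accuracy lists runs_a and runs_b are hypothetical values chosen only for illustration. The helper names summarize and welch_t are likewise illustrative.

```python
import math
import statistics


def summarize(accuracies):
    """Return (mean, sample standard deviation, sample variance) of run accuracies."""
    mean = statistics.mean(accuracies)
    variance = statistics.variance(accuracies)  # sample variance (n - 1 denominator)
    return mean, math.sqrt(variance), variance


def welch_t(sample_a, sample_b):
    """Two-sample t statistic without assuming equal variances (Welch's form)."""
    mean_a, _, var_a = summarize(sample_a)
    mean_b, _, var_b = summarize(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical per-run accuracies (in percent); these are NOT the paper's data.
runs_a = [68.9, 69.0, 69.1, 69.0, 68.9]
runs_b = [67.0, 67.1, 66.9, 67.0, 67.1]

t_stat = welch_t(runs_a, runs_b)
print(f"t = {t_stat:.3f}")
```

The resulting statistic would then be compared against a critical value from a t-table at the chosen confidence level, as in Table IV; a value below the critical value means the difference between the two mean accuracies is not significant at that level.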
[12] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statistics and Computing, vol. 14, no. 3, pp. 199–222, 2004.
[13] J. Brownlee. (2016, Jul.) How to implement the backpropagation algorithm from scratch in Python. [Online]. Available: https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
[14] J. Brownlee. (2016, Jul.) Time series prediction with LSTM recurrent neural networks in Python with Keras. [Online]. Available: https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
[15] C. Ku. (2017, May) Beating the naive model in the stock market. [Online]. Available: https://medium.com/@CalvinJKu/beating-the-naive-model-in-the-stock-market-62ec54436cf3
[16] J. Teow. (2017, May) Understanding Kalman filters with Python. [Online]. Available: https://medium.com/@jaems33/understanding-kalman-filters-with-python-2310e87b8f48
[17] Y. Xu and G. Zhang, “Application of Kalman filter in the prediction of stock price,” in International Symposium on Knowledge Acquisition and Modeling (KAM). Atlantis Press, 2015, pp. 197–198.
[18] T. Lacey, “Tutorial: The Kalman filter,” Computer Vision. [Online]. Available: http://www.cc.gatech.edu/classes/cs732298/spring/PS/kf1.p