Comparison of Predictive Algorithms: Backpropagation, SVM, LSTM and Kalman Filter for Stock Market

Divit Karmiani, Ruman Kazi, Ameya Nambisan, Aastha Shah, Vijaya Kamble
Sardar Patel Institute of Technology, Mumbai

Abstract: The variation in, and dependency on, many different parameters makes stock market prediction a complex process. Artificial Neural Networks have proven useful for predicting stock values in such cases. The parameters involved and the commonly used algorithms are discussed and compared in this paper. In the backpropagation algorithm, a feed-forward network is used and the weights are modified by back-propagating the error. Similarly, significant modifications have been introduced in the Support Vector Machines Algorithm (SVMA), which result in higher accuracy rates. The presence of a kernel and other parameters makes it more flexible. Long Short-Term Memory (LSTM), another commonly used time series forecasting algorithm, is a special type of Recurrent Neural Network (RNN) that uses a gradient descent algorithm. This paper provides a comparative analysis of these algorithms on the basis of accuracy, variation and time required for different numbers of epochs. A T-test was used for further analysis to test the reliability of each algorithm.

Keywords: Artificial Neural Network (ANN), Support Vector Machines (SVM), Long Short-Term Memory (LSTM), Backpropagation, Recurrent Neural Network (RNN), T-tests

I. INTRODUCTION

Prediction or forecast is essentially a calculated guess based on some previous evidence or facts. In today's growing economy, prediction and analysis of the stock market is of great importance because it reflects economic conditions, but it is a very complicated task. This is because of the non-linear behavior of stock data. Many other factors hinder stock prediction, including social and economic conditions, politics, rumours and media hype, traders' behaviour, etc. Professional traders have tried to make forecasts and have developed a variety of analysis methods such as technical, fundamental, quantitative, and so on [1]. These methods use different sources ranging from news to price data, but they all have a common aim: predicting a company's future stock prices so that traders can make educated decisions on their trading options. Over the past 10 years, neural networks, one of the most intelligent data mining techniques, have been used extensively. This has led many researchers to apply machine learning techniques to the stock market as well, some of which have produced quite promising results.

As every algorithm has its own advantages and disadvantages, this research studies 3 different machine learning techniques commonly in use for stock market prediction. A comparative approach is followed to find the optimal technique for stock market prediction under different circumstances. The comparison is made on the basis of their performance on the same input dataset and for the same number of epochs. As the main purpose of any predictive model is to minimize error (in the case of the stock market, to minimize the risks), our problem statement was addressed so as to achieve as much accuracy as possible by tweaking the algorithms.

II. LITERATURE SURVEY

Several studies have analyzed these algorithms for use in stock market prediction. In most cases Artificial Neural Networks (ANNs) were used, which suffered from overfitting. This was because of the large number of parameters to be fixed [2] and the little prior user knowledge about the relevance of the inputs in the analyzed problem. SVM [3] and Backpropagation algorithms were used for many problems centered around classification. They tried to overcome most of the limitations of ANNs, and in many instances reasonable accuracy was achieved. LSTM has been used for many time series problems, and a similar approach is used to predict the trend of a stock by memorizing historical data. The Kalman filter is a two-stage algorithm which employs linear regression within the data and represents the 'true value' of the market. It strikes a balance between data and predictions based on the relative amount of noise [4].

Several other machine learning techniques are also being used for stock market prediction. There is no established criterion by which to choose the optimal solution for stock market prediction. This paper analyzes these algorithms using the same amount of test and training data to keep the testing environment consistent. T-tests were performed for one-to-one comparison of these algorithms under similar conditions. These tests tell us how consistently the algorithms perform with respect to each other.

III. THEORY

The three algorithms focused on in this paper are:

978-1-5386-9346-9/19/$31.00 ©2019 IEEE


A. Back Propagation

The Back Propagation algorithm is a supervised learning method used to train a multi-layer feed-forward network; it requires one or more fully interconnected layers. Like SVM, Backpropagation can be used for both classification and regression problems. Stock market prediction can be treated as a classification problem. For each sample in the training dataset the output response is determined. The output is then compared with the given target and the error is calculated using the Widrow-Hoff delta learning rule or gradient descent rule [5]. As the term back propagation suggests, the weight update and the calculation of the gradient proceed backwards, using the least mean square error of the output response to the input sample.

The structure of the network consists of the input layer, which is connected to the hidden layer by interconnection weights, and the hidden layer, which is in turn connected to the output layer by interconnection weights [5]. Increasing the number of layers increases the computational complexity of the neural network, which can make the time taken to converge and to minimize the error very high.

B. SVM

SVM (Support Vector Machine) is a popular algorithm used for many classification and regression problems. It is a supervised learning model based on the concept of hyperplanes as decision boundaries. In two-dimensional space the hyperplane is a line. Given training samples, each belonging to one of the membership classes, the SVM training algorithm builds a model that assigns new examples to one class or the other, with a hyperplane as the decision boundary. This makes it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped such that the points in different categories are at a maximum distance from the decision boundary [6]. SVM does not provide probability estimates directly; they are obtained through a five-fold cross-validation technique, which makes them more expensive. SVMs are effective in higher dimensions, even when the number of dimensions exceeds the number of samples. This is due to the presence of hyperparameters, which include gamma, the regularization parameter (C) and the choice of kernel available in the SVM classifier. C decides the extent to which misclassification of data is allowed, gamma denotes how far the influence of a training sample reaches, and the kernel, which can be linear, poly or rbf, determines how the hyperplane is learned [7].

C. LSTM

LSTM, or Long Short-Term Memory, is a type of recurrent neural network (RNN) and uses an appropriate gradient descent algorithm. An RNN feeds the output of a block back as the input for the next iteration and uses its feedback connection to store recent events in the form of activations. Unlike the multilayer perceptron, LSTM uses diverse processing elements or blocks to build a better recurrent network capable of learning long-term dependencies. The inputs to an LSTM cell are the previous hidden and memory states and the current input, while the outputs are the current hidden and memory states. The memory in LSTM can be seen as a gated cell, where three gates (input, forget and output) control the ability to add or remove information or let it impact the output at the current time step [8]. Each gate uses a sigmoid neural net layer and weights to assign importance to information. This helps the gates protect and control the flow of information, overcoming the vanishing gradient problem that traditional RNNs have. The stochastic gradient descent based algorithm used helps enforce constant error propagation (neither exploding nor vanishing) through its internal units. LSTMs can be used for time series prediction and are hence used for the stock market [9].

IV. MODEL CREATION AND METHODOLOGY

Our approach in this analysis consists of three major steps:
• Data-set creation
• Implementation of algorithm
• Result and analysis

A. Data-set creation

For training and testing the predictive algorithms, we compiled a database [10]. The database should have minimum deviation and should reflect normal behaviour. It was important to select a time frame in which the stock market did not show any sudden jump or dip in prices. Often there are times of economic collapse in a certain sector, industry or country. The extent of such collapses may not be fathomable, and they are unpredictable.

For training and testing of the algorithms and models, we chose the time frame of January 2009 to October 2018, because during 2007-2008 the stock market took a hit from the financial crisis. Yahoo Finance provides a historical database which can be used for training the model. It provides the historical data of a stock in the form of the following parameters:
• Date: The date on which the following values were collected
• Open: Opening value of the stock
• High: Highest value achieved by the stock on that date
• Low: Lowest value achieved by the stock on that date
• Close: Closing value of the stock for that date
• Adj Close: Closing value of the stock before the market opens the next day, adjusted for corporate actions and distributions
• Volume: Total number of shares that changed hands during that day
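
As a rough illustration of the record layout listed above, the following sketch parses rows carrying those seven fields and keeps only the January 2009 to October 2018 window. It is not the authors' code: the names `DailyBar`, `load_bars` and `SAMPLE_CSV` are hypothetical, the sample values are made up, and the exact window bounds (1 January 2009 and 31 October 2018, inclusive) are an assumption, since the text gives only the months.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class DailyBar:
    """One daily record with the seven fields listed above (hypothetical type)."""
    day: date
    open: float
    high: float
    low: float
    close: float
    adj_close: float
    volume: int


# Made-up sample rows in the column layout described in the text.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2008-12-31,10.0,10.5,9.8,10.2,9.9,1000
2009-01-02,10.2,10.9,10.1,10.8,10.5,1500
2018-10-31,50.0,51.0,49.5,50.5,50.5,2000
2018-11-01,50.5,52.0,50.0,51.5,51.5,2500
"""


def load_bars(text, start=date(2009, 1, 1), end=date(2018, 10, 31)):
    """Parse CSV text and keep rows whose date lies in [start, end].

    The default bounds are an assumed reading of "January 2009 to October 2018".
    """
    bars = []
    for row in csv.DictReader(io.StringIO(text)):
        day = date.fromisoformat(row["Date"])
        if start <= day <= end:
            bars.append(DailyBar(
                day=day,
                open=float(row["Open"]),
                high=float(row["High"]),
                low=float(row["Low"]),
                close=float(row["Close"]),
                adj_close=float(row["Adj Close"]),
                volume=int(row["Volume"]),
            ))
    return bars


bars = load_bars(SAMPLE_CSV)
print(len(bars), bars[0].day, bars[-1].day)
```

In this sample the two rows outside the window (the last trading day of 2008 and the first of November 2018) are dropped, which mirrors the stated intent of excluding the 2007-2008 crisis period from training and testing.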

So, we collected historical stock price data from Yahoo Finance for the following 9 companies belonging to the same sector, Technology:
• Apple
• Acer
• Amazon
• Google
• HP
• IBM
• Intel
• Microsoft
• Sony

We compiled our own database from these parameters. The key terms used in deriving the input parameters are:
• Momentum: A measure of the direction in which a parameter, the stock price in our case, is moving. It is defined as +1 if the price has increased and -1 if the price has dropped since yesterday.
• Volatility: It measures the fluctuation of the same parameter over time. In our case it is the difference between yesterday's price and today's price divided by yesterday's price.

Hence, for the input dataset we derived the following parameters from the ones above:
• Index Momentum: It determines the momentum of the market. In our case it is taken as a 5-day average.
• Index Volatility: This helps in determining the fluctuations in the market. It is also calculated as an average over the last 5 days.
• Sector Momentum: This parameter considers the other companies in the same sector and calculates the momentum of these companies over the last 5 days.
• Stock Momentum: This is the average over the last 5 days of the momentum of the respective company.
• Stock Price Volatility: This is the average over the last 5 days of the stock price of the respective company.

The data set consisted of these 6 dimensions for each of the stocks, i.e., each day has 9 records, one record for each company's stock. The total number of rows in our input data set is around 22212.

B. Implementation of Algorithm

1) Support Vector Machine: This algorithm was implemented in Python using the scikit-learn library for machine learning. The implementation was done in the following steps [11]:
• Importing the library and csv data files
• Creating an SVM classifier
• Defining the kernel and setting its parameters
• Predicting and comparing with the target for accuracy calculation

In this case, since the stock market data is non-linear, we used the rbf kernel and kept the hyperparameters C and gamma at their default values of 1 and scale (version 0.22) respectively. Higher values of C and gamma cause the decision boundary to become more curved and the variance to increase, thereby raising the chance of overfitting. The decision function shape was chosen to be ovo as it gave better results [12]. Accuracy can be improved by tuning the parameters using GridSearchCV in sklearn.model_selection. This may take a lot of time depending on the number of training samples. Once the classifier is trained by mapping the given features and labels, prediction is made using the test data, which consists of approximately 20% of the total data (4,412 samples). This is then compared with the target data present in the output file and the accuracy is ascertained.

2) Backpropagation: We followed 5 steps to implement this algorithm on our dataset [13]:

a) Initialize Network:
• Here we initialize weights and biases using the number of inputs, number of hidden neurons and number of outputs.
• Weights are selected randomly from the range 0 to 1.

b) Forward Propagation: In this stage we obtain an output by propagating our inputs through all the hidden layers and the output layer. We use this technique later in the process for prediction. Forward propagation involves:
• Activation of Neuron: We find the activation value using the weighted sum of inputs.
• Transfer of Neuron: Using this activation value and the sigmoid function, we transfer the activation to get the output.
• Forward Propagation: It propagates inputs through the input, hidden and output layers and stores the output value.

c) Backward Propagation of error: Here we calculate the slope of the output by differentiating the sigmoid function. The error signal is then generated using the weighted error of each neuron in the output layer.

d) Train Network: The error calculated for each neuron by the backpropagation method is used to update the weights. Similarly, we also update the bias weights. In this stage the learning rate determines how much the weights change in each update. A smaller learning rate over a larger database and more iterations gives a better set of weights. Since we update the weights on each iteration, this is called online learning. Once the weights are
updated, the step repeats for the defined number of iterations (number of epochs) and the network is trained.

e) Predict: For prediction we forward propagate the input set to get the output. In our case we use the output itself as the probability of a pattern belonging to each output class [13]. We turn this output into a crisp class prediction by selecting the class value with the larger probability.

Then we compared this predicted output with the target output to get the accuracy of the algorithm. The database was divided into 5 batches (5 folds), of which 4 were used for training the network and 1 for testing it. This gave us around 80% of the data for training and 20% for testing.

3) Long Short Term Memory (LSTM): Implementation of LSTM was done using the following steps [14]:

a) Dataset creation for LSTM:
• Converting the time series into a supervised learning problem:
  – We have data divided into an input set and an output set.
  – We defined a value for the time step. Using this time step we shift the dataset and concatenate these two series to get the output set.
• Converting the time series into stationary data:
  – Stationary data is easier to model and is better for forecasting.
  – It is obtained by removing the trend from the data, which is achieved by differencing the data.
• Transforming the observations to a specific scale:
  – LSTM works on data that is within the scale of the activation function of the network.
  – The default activation function for LSTM is the hyperbolic tangent (tanh), whose output range of -1 to +1 is well suited to time series data.

b) Model Development:
• LSTM is a type of Recurrent Neural Network (RNN). This type of neural network is useful when remembering over a long sequence of data, and it does not depend on a window-lagged dataset as input.
• The LSTM layer takes input with 3 dimensions:
  – Samples: The rows of the input dataset, each an independent observation.
  – Time steps: These are the time steps of the parameters for the input dataset.
  – Features: The parameters considered and observed in the input dataset to predict the output.

While compiling the network we need to specify the loss function and the optimization algorithm. We use mae as the loss function and ADAM as the optimization algorithm. ADAM will select a suitable learning rate for the network. And then this model is fit to the training data and we predict the output. Then, we calculated the accuracy of the model.

V. RESULTS AND ANALYSIS

To make sure that an algorithm performs consistently and performs better than the others, we ran the same algorithm multiple times. Each algorithm was run 30 times, and each time the prediction accuracy was calculated on test data. For each run a different training and testing dataset was used. The accuracy results for each algorithm are given below.

A. SVM Result:

For 10 runs of the SVM algorithm we got approximately 66.9823 mean accuracy with 0.05256 standard deviation. This shows that SVM performance is consistent over 10 runs; this is due to the nature of the SVM algorithm. The algorithm keeps training until it can classify the maximum amount of testing data, resulting in higher accuracy (Table I).

B. LSTM Result:

For 10 runs of the LSTM algorithm we got approximately 68.51635 mean accuracy with a standard deviation of 0.71779. But for a larger number of epochs LSTM gives less variance and thus more reliable output (Table II).

TABLE I: RESULTS OBTAINED FOR SVM

No. of   Mean Accuracy     Standard                Time Required
Epochs   (in percentage)   Deviation   Variance    (in seconds)
10       66.9823           0.05255     0.00276     255.4932
30       67.0316           0.11561     0.01337     933.3335
50       67.0895           0.15970     0.02550     1438.2164
70       67.2219           0.18522     0.03430     2154.9157
100      67.1212           0.16695     0.02787     2258.1878

TABLE II: RESULTS OBTAINED FOR LSTM

No. of   Mean Accuracy     Standard                Time Required
Epochs   (in percentage)   Deviation   Variance    (in seconds)
10       68.51635          0.71779     0.51523     574.5679
30       68.98083          0.05528     0.00306     3504.9207
50       68.95468          0.04981     0.00248     5403.6272
70       68.96183          0.05109     0.00261     7448.0353
100      69.04171          0.05204     0.00271     10692.7814

C. Backpropagation Result:

For 10 runs of the Backpropagation algorithm we got approximately 68.649 mean accuracy with a standard deviation of
0.55375. As can be observed, Backpropagation performs similarly to SVM but is faster. It has higher fluctuation in accuracy than LSTM, which might cause issues when a steady accuracy is required (Table III).

TABLE III: RESULTS OBTAINED FOR BACKPROPAGATION

No. of   Mean Accuracy     Standard                Time Required
Epochs   (in percentage)   Deviation   Variance    (in seconds)
10       68.649            0.55375     0.30664     12.6310
30       68.433            0.54018     0.29180     52.8089
50       68.339            0.89537     0.80169     113.1676
70       68.249            0.69470     0.48260     143.1701
100      67.434            2.43245     5.91679     229.9802

D. Observations

From the results obtained above we also computed T-test values for different combinations of algorithms. T-tests are inferential statistical methods used to compare means. The T-statistic tells us how many standard units apart the two means are. An assumption is made that the dependent variable is normally distributed, which makes it possible to calculate the probability. The confidence level is set at 95% (a significance level, alpha, of 0.05). The critical value for 95% confidence is found for epoch = 10. As we can see from Table IV, only the Backpropagation-LSTM combination has a T-stat less than the critical value (CV). From Tables I and II we can see that LSTM has much less variance. So the chances of obtaining better results are higher with LSTM than with Backpropagation.

TABLE IV: T-TEST VALUES FOR EPOCH = 10

                        T-Stat   CV
SVM-Backpropagation     6.715    1.771
Backpropagation-LSTM    0.395    1.771
LSTM-SVM                7.666    1.734

VI. KALMAN FILTER AS ANOTHER PREDICTIVE ALGORITHM

The Kalman algorithm is a recursive algorithm that uses time series data to eliminate inaccuracies caused by noise in the measurement of different variables and to produce estimates of those variables that are more accurate than single measurements; it is hence a time series algorithm. It handles the uncertainties of the variables by giving higher weights to estimates with lower uncertainty [15]. The Kalman algorithm works in the following steps. The first step involves estimates of the variables along with the noise involved in their measurement and other noises. Once the outcome of the next measurement is obtained, the algorithm updates its uncertainty matrix depending on the covariance between the predicted value and the measured value. The Kalman filter hence develops a state transition model that follows the flow of the given data and updates it at every step to find the most localized value and hence an accurate prediction of the next state [16]. The stock market, being highly fluctuating time series data, is a good application for the Kalman filter, which possesses real-time tracking characteristics [17].

A. Algorithm and Flowchart

Fig. 1. Flowchart of Kalman Filter [18]

B. Data Set Creation

The data set used is the same as for the other algorithms. However, the data set used in Kalman filtering is preprocessed as follows:
1) From the adj close parameter in the original dataset, an adjustment factor (adj factor) is calculated as adj factor = adj close / close.
2) The other parameters of the original dataset are multiplied or divided by the adj factor, giving the new values:
   Open = Open * adj factor
   Close = Close * adj factor
   High = High * adj factor
   Low = Low * adj factor
   Volume = Volume / adj factor
3) Now, the adj close and open parameters are shifted by one row to enable comparison between the present day's value and the next day's predicted value.
4) The differences between the various parameters are found as follows:
   High diff = High - adj close shift
   Low diff = Low - adj close shift
   Close diff = adj close - adj close shift
   Open diff = Open shift - adj close shift

C. Prediction and Comparison

A prediction is made by applying the Kalman filter to the various parameters of the data set, and the estimated stock prices of the next day are found using the steps of the Kalman filter algorithm. The predicted and actual values are compared by calculating the R2 score between them.

R2 is a statistic that gives some information about the goodness of fit of a model. In regression, the R2 coefficient of determination is a statistical measure of how well the regression predictions approximate the real data points. An R2 of 1 indicates that the regression predictions perfectly fit the data. An acceptable value is between 0 and 1.

D. Experimental Results

For the dataset of Apple, an R2 score of 0.9856 was calculated, which is within the accepted range of R2 scores. The actual and predicted values are plotted in Fig. 2.

Fig. 2. Graph showing Actual and Predicted values for Apple

On using another data set, we observe that the variance between prices of two consecutive days is too high, or the noise as perceived by the Kalman filter is very high. The predicted value is an indefinite value. As the noise decreases, the Kalman filter is able to predict the following day's closing price accurately. This is shown in Fig. 3.

Fig. 3. Graph showing Actual and Predicted values for Acer

VII. CONCLUSION

Through this paper we have observed that machine learning can be used to predict stock market prices, and we have compared algorithms for doing so. The results show that historical data can be used to predict stock movement with reasonable accuracy, but the choice of algorithm depends on requirements such as time, variance and mean accuracy. If the requirement is high accuracy and low variance, LSTM would be a better choice, but it is comparatively slower. If the requirement is high speed and accuracy, then backpropagation is better. Also, from the T-test result analysis we can conclude that LSTM is more reliable than Backpropagation and SVM. For this implementation, we have incorporated 6 factors that affect stock performance. If a larger number of factors is used to train the network model after adequate preprocessing and filtering of the data, a higher accuracy can be achieved.

REFERENCES

[1] A. Zheng and J. Jin, “Using ai to make predictions on stock market,” Stanford University, Tech. Rep., 2017.
[2] O. Hegazy, O. S. Soliman, and M. Abdul Salam, “A machine learning model for stock market prediction,” International Journal of Computer Science and Telecommunications, vol. 4, pp. 17–23, Dec 2013.
[3] Y. Xia, Y. Liu, and Z. Chen, “Support vector regression for prediction of stock trend,” in Information Management, Innovation Management and Industrial Engineering (ICIII), 2013 6th International Conference on, vol. 2. IEEE, 2013, pp. 123–126.
[4] R. M. . N. Rhoads, “Predicting market data with a kalman filter,” Technical Analysis of Stocks and Commodities, January 2010.
[5] Brilliant.org. (2018) Backpropagation. [Online]. Available: https://brilliant.org/wiki/backpropagation/
[6] S. Madge, “Predicting stock price direction using support vector machines,” Princeton University, Independent Work Report, 2015.
[7] M. A. Hearst, S. T. Dumais, E. Osuna, J. Platt, and B. Schölkopf, “Support vector machines,” IEEE Intelligent Systems and their Applications, vol. 13, no. 4, pp. 18–28, July 1998.
[8] C. Olah, “Understanding lstm networks,” Aug. 2015. [Online]. Available: http://colah.github.io/posts/2015-08-Understanding-LSTMs
[9] A. Moawad. (2018, Feb.) The magic of lstm neural networks. [Online]. Available: https://medium.com/datathings/the-magic-of-lstm-neural-networks-6775e8b540cd
[10] G. Bonde and R. Khaled, “Extracting the best features for predicting stock prices using machine learning,” in Proceedings on the International Conference on Artificial Intelligence (ICAI). The Steering Committee of The World Congress in Computer Science, Computer , 2012, p. 1.
[11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg et al., “Scikit-learn: Machine learning in python,” Journal of machine learning research, vol. 12, pp. 2825–2830, Oct 2011.
[12] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statistics and computing, vol. 14, no. 3, pp. 199–222, 2004.
[13] J. Brownlee. (2016, Jul.) How to implement the backpropagation algorithm from scratch in python. [Online]. Available: https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
[14] J. Brownlee. (2016, Jul.) Time series prediction with lstm recurrent neural networks in python with keras. [Online]. Available: https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
[15] C. Ku. (2017, May) Beating the naive model in the stock market. [Online]. Available: https://medium.com/@CalvinJKu/beating-the-naive-model-in-the-stock-market-62ec54436cf3
[16] J. Teow. (2017, May) Understanding kalman filters with python. [Online]. Available: https://medium.com/@jaems33/understanding-kalman-filters-with-python-2310e87b8f48
[17] Y. Xu and G. Zhang, “Application of kalman filter in the prediction of stock price,” in International Symposium on Knowledge Acquisition and Modeling (KAM). Atlantis press, 2015, pp. 197–198.
[18] T. Lacey, “Tutorial: The kalman filter,” Computer Vision. [Online]. Available: http://www.cc.gatech.edu/classes/cs732298/spring/PS/kf1.p
