Multi-factor Based Stock Price Prediction Using Hybrid Neural Networks with Attention Mechanism
4th Saleh Ahmed
Graduate School of Engineering
Hiroshima University
Hiroshima, Japan
[email protected]

5th Kazi Md. Rokibul Alam
Department of Computer Science and Engineering
Khulna University of Engineering and Technology
Khulna, Bangladesh
[email protected]

6th Yasuhiko Morimoto
Graduate School of Engineering
Hiroshima University
Hiroshima, Japan
[email protected]
Abstract—The prediction of time series data, such as stock prices, is difficult since many factors affect the prediction model. Also, the influence of different factors on a stock price may be linear or nonlinear. The generation of good models for stock prices has challenged researchers in recent years. Long Short-Term Memory (LSTM) is a variation of the Recurrent Neural Network (RNN), which can capture temporal sequences and has achieved great success in time series prediction. Also, the Convolutional Neural Network (CNN) is superior for extracting features from multi-dimensional sequences. In this paper, we propose a CNN-LSTM hybrid neural network with multiple factors to predict stock prices. Moreover, we add an attention mechanism to improve the scalability and the accuracy of the CNN-LSTM model. In the experiments, we compare our proposed model with different approaches on two real stock datasets. The results confirm the efficiency and scalability of our proposed method.

Index Terms—stock prediction, multi-factor, CNN, LSTM, attention mechanism

I. INTRODUCTION

A time series is a series of data points indexed in time order. Most commonly, a time series is a sequence taken at successive, equally spaced points in time. In the past few decades, many researchers have been attracted by time series prediction problems. However, the prediction of time series data such as stock prices, the weather, and exchange rates is difficult since many factors influence the prediction results. For example, the factors affecting stock prices are not only related to historical stock prices, but may also be related to the volume, changes in the exchange rate, and changes in economic policies, etc. Also, the influence of different factors on a stock price may be linear or nonlinear. The complicated relationship between a stock price and multiple factors challenges researchers to generate a reasonable and appropriate model for the prediction of stock prices. In the stock market, it is crucial for investors to form a well-grounded judgment on stocks as early as possible.

In general, there are two classes of models to analyze and predict stock prices: linear models and nonlinear models. Linear models such as the Autoregressive model (AR) [1], the Autoregressive Moving Average (ARMA) [2], and its variation, the Autoregressive Integrated Moving Average (ARIMA) [3], are well-known econometric approaches which can capture linear relationships in historical data. However, in some real-world scenarios, these models cannot capture the underlying dynamics in the data. Artificial Neural Networks (ANNs) are dominant in detecting nonlinear relationships, with applications in regression [4] and classification [5], which has attracted researchers to focus on ANNs in recent years. The Recurrent Neural Network (RNN) [6] is a class of ANNs which is superior for capturing temporal sequences. Unlike feedforward ANNs, RNNs maintain a hidden state that is updated as time goes on, so previous information can be memorized and utilized for the current prediction. Unfortunately, conventional RNNs suffer from the gradient vanishing problem [7] as sequences become longer. In other words, conventional RNNs lack the ability to process long-term information. Long Short-Term Memory (LSTM) [8] and the Gated Recurrent Unit (GRU) [9] are two variations of RNNs which are designed to solve the gradient vanishing problem and learn long-term dependencies in time series data by using memory cells and gate mechanisms.

Some past literature focuses on the prediction of stock prices using LSTM, as recorded in [11], [12], [20]. In the data preprocessing phase, we usually split the stock price data into sequences, where each sequence is sorted by timestamp. If the maximum number of sequences is n, and the length of each sequence is m, the input data of the LSTM model form an n x m 2-dimensional matrix. However, as the sequence length m becomes larger, we have to increase the number of timesteps of the LSTM to process more information. For multi-factor time series data, such as stock prices, a wise choice is to extract features from every sequence. The Convolutional Neural Network (CNN) [21] is an excellent technique for obtaining features from image data, and has shown great success at modeling image data in the computer vision field.
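To make the sequence construction described above concrete, the following minimal Python sketch builds the n x m input matrix from a raw price series with a sliding window. The helper name make_sequences and the use of NumPy are our own illustrative choices, not code from the paper.

```python
import numpy as np

def make_sequences(prices, m):
    """Split a 1-D price series into overlapping length-m sequences, ordered by timestamp."""
    n = len(prices) - m + 1
    # Each row is one sequence; stacking all rows yields the n x m input matrix.
    return np.array([prices[i:i + m] for i in range(n)])

# Example: 100 closing prices -> a (94 x 7) matrix of length-7 sequences.
windows = make_sequences(np.arange(100, dtype=float), m=7)
```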
An optimal architecture should learn the different characteristics of each model to improve the prediction accuracy. From this motivation, we propose an optimal architecture that integrates a CNN and an LSTM to enhance the accuracy of stock price prediction by fusing features of different representations from financial time series data.
E. Attention Mechanism

The attention mechanism is a well-known concept and a useful tool in the deep learning community. Attentive neural networks have gained success in machine translation [14], image captioning [15], and speech recognition [24].

In general, attention is a kind of weighted summation. Wang et al. [25] proposed three kinds of attention mechanisms, called location-based attention, general attention, and concatenation-based attention. Chen et al. [26] introduced a novel attention mechanism in collaborative filtering to address challenging item- and component-level implicit feedback in multimedia recommendation. Specifically, they used two attentive neural networks to select informative components of multimedia items together with an item-level attention module. The experimental results of their model outperformed the state-of-the-art approaches.

In this paper, to improve the scalability and the accuracy of the integrated neural network model, we add an attention mechanism to memorize long sequences.

Fig. 1. Single factor model and multi-factor model.

In the convolutional layer, we select the ReLU activation function and represent the outputs as feature maps. In the pooling layer, the max operation is the most commonly used approach. To extract the important features and reduce the computation from the convolutional layer, we use a Max-pooling layer. After the Max-pooling layer, we apply the dropout technique to mitigate overfitting. In this work, we set the dropout probability to 0.3.
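A minimal Keras-style sketch of this convolution, max-pooling, and dropout stack is given below, assuming a 1-D convolution over the multi-factor sequences. The filter count (32) and kernel size (3) are illustrative assumptions; the paper only fixes the ReLU activation and the 0.3 dropout probability.

```python
from tensorflow.keras import layers, models

def conv_feature_extractor(seq_len, n_factors):
    """Convolution -> max-pooling -> dropout block described above."""
    return models.Sequential([
        # ReLU feature maps over the multi-factor input sequences.
        layers.Conv1D(filters=32, kernel_size=3, activation='relu',
                      input_shape=(seq_len, n_factors)),
        # Max operation keeps the salient features and reduces computation.
        layers.MaxPooling1D(pool_size=2),
        # Dropout with probability 0.3, as set in the paper.
        layers.Dropout(0.3),
    ])
```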
Fig. 3. Basic LSTM structure and unfolded LSTM structure.

C. Attentive Neural Network

Since the attention mechanism can improve the prediction accuracy, we propose an attention mechanism for the regression task. Let $O$ be the matrix consisting of the output vectors of the LSTM model, $[o_1, o_2, \ldots, o_T]$, where $T$ represents the sequence length. We use $s$ to represent the score vector for the output of the LSTM, which is formed by a weighted sum of the matrix $O$:

M = \mathrm{sigmoid}(O) \qquad (2)
\alpha = \mathrm{softmax}(w^{T} M) \qquad (3)
s = O \alpha^{T} \qquad (4)

where $O \in \mathbb{R}^{m \times T}$.

Then we use a fully connected layer to map the output of the attention mechanism and obtain the final result of the proposed architecture.
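The following NumPy sketch traces Eqs. (2)-(4) directly. Treating the attention weights $\alpha$ as a vector over the $T$ time steps, $s = O\alpha^{T}$ reduces to a matrix-vector product; the weight vector $w$ is trainable in the model and is initialized randomly here purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

def attention_score(O, w):
    """Eqs. (2)-(4): score vector s as an attention-weighted sum of O (shape m x T)."""
    M = sigmoid(O)            # Eq. (2)
    alpha = softmax(w @ M)    # Eq. (3): weights over the T time steps
    s = O @ alpha             # Eq. (4): score vector of length m
    return s, alpha

# Example with m = 4 LSTM output units over T = 10 time steps.
O = np.random.randn(4, 10)
s, alpha = attention_score(O, w=np.random.randn(4))
```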
D. Training Phase

In the training phase, we use a fixed number of time steps $T$ for the unfolded LSTM structure shown in Figure 3. Define the training data as $\{[x_1, x_2, \ldots, x_{T-1}]_j, [x_2, x_3, \ldots, x_T]_j\}_{j=1}^{N}$, where $[x_1, x_2, \ldots, x_{T-1}]_j$ and $[x_2, x_3, \ldots, x_T]_j$ are the input data and the true output of the LSTM model, respectively. We train the LSTM model in a sequence-to-sequence manner.

We use Backpropagation Through Time (BPTT) [27] to propagate the gradients of the errors. The errors are computed by the following loss function:

\mathrm{loss} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \mathrm{output}_i)^2 \qquad (6)
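A small sketch of the one-step-shifted sequence-to-sequence pairing and the loss of Eq. (6) follows. In a framework such as Keras, the same loss is obtained by compiling the model with loss='mse', and BPTT is applied automatically when fitting a recurrent model.

```python
import numpy as np

def make_training_pairs(sequences):
    """Shift each length-T sequence by one step: input [x1..x_{T-1}], target [x2..x_T]."""
    return sequences[:, :-1], sequences[:, 1:]  # both of shape (N, T-1)

def mse_loss(y, output):
    """Eq. (6): mean squared error between true values and model outputs."""
    return np.mean((y - output) ** 2)
```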
IV. EXPERIMENT

In this section, we conduct extensive experiments to evaluate the efficiency and effectiveness of the proposed model. We implement all algorithms using Python 3.5.2. We select Support Vector Regression (SVR), the LSTM model, and the GRU model as the baselines and compare them with our proposed CNN-LSTM-Attention model.

A. Data Set

We use two real stock datasets, called the JQData dataset and the Pingan Bank dataset.

JQData dataset: We collect the historical stock data from April 1st, 2017 to March 30th, 2019. This dataset contains 53,153 stock records with six factors, called open price, close price, high price, low price, volume, and money, respectively. This dataset is split into two subsets for training and testing: we use 47,835 stock records for the training phase and 5,318 stock records for the testing phase. Figure 4 shows the close price of the training data and the testing data.

Pingan Bank dataset: We use the Pingan Bank historical stock data from April 1st, 2017 to March 30th, 2019. This dataset contains 23,328 stock records with six factors, called open price, close price, high price, low price, volume, and money, respectively. This dataset is split into two subsets for training and testing: we use 20,995 stock records for the training phase and 2,333 stock records for the testing phase. Figure 5 shows the close price of the training data and the testing data.

Fig. 4. Training data and testing data of JQData dataset.
Fig. 5. Training data and testing data of Pingan Bank dataset.
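The paper does not list the splitting code; a sketch consistent with the reported counts (47,835/5,318 and 20,995/2,333, i.e., roughly a 90/10 split) that preserves time order might look as follows. The chronological ordering and the exact ratio are our assumptions.

```python
def chronological_split(records, train_ratio=0.9):
    """Keep timestamp order: earliest records for training, latest for testing."""
    cut = int(len(records) * train_ratio)
    return records[:cut], records[cut:]

train, test = chronological_split(list(range(53153)))
# len(train) == 47837, len(test) == 5316 -- close to the reported 47,835/5,318.
```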
C. Experiment Results

We compare our proposed model with three baselines: SVR, LSTM, and GRU. Evaluations are executed using the MSE measurement.

Figure 6, Figure 7, and Figure 8 demonstrate the test regression results on the JQData dataset when the sequence length is 7, 14, and 30, respectively.

Fig. 7. Regression Results of JQData dataset when sequence length is 14.
Fig. 8. Regression Results of JQData dataset when sequence length is 30.

We use MSE to evaluate the models and demonstrate the results in Table I. When the sequence length is 7, GRU works better than the other models because the sequence length is so short that the model does not need many parameters. When the sequence length is larger, e.g., SeqLen = 14 and 30, the MSE of our proposed model has the smallest value; more parameters are needed to generate a good model when the sequence length is larger, and our model works better than the other baseline models.

TABLE I
MSE OF THE PREDICTION METHODS IN JQDATA DATASET.

Methods      SeqLen=7    SeqLen=14   SeqLen=30
SVR          1.6753      1.6753      1.6753
GRU          0.0031      0.0041      0.0029
LSTM         0.0032      0.0114      0.0018
Our Model    0.0033      0.0033      0.0012
For the Pingan Bank dataset, we show the test regression results in Figure 9, Figure 10, and Figure 11.

Fig. 10. Regression Results of Pingan Bank dataset when sequence length is 14.
Fig. 11. Regression Results of Pingan Bank dataset when sequence length is 30.
TABLE II
MSE OF THE PREDICTION METHODS IN PINGAN BANK DATASET.

Methods      SeqLen=7    SeqLen=14   SeqLen=30
SVR          1.8003      1.8003      1.8003
GRU          0.0032      0.0036      0.0018
LSTM         0.0022      0.0034      0.0015
Our Model    0.0029      0.0032      0.0014

We demonstrate the MSE results in Table II. When the sequence length is 7, LSTM works better than the other models. When the sequence length is larger, e.g., SeqLen = 14 and 30, the MSE of our proposed model is smaller than that of the other baseline models.
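As one example of how a baseline row in these tables can be produced, here is a scikit-learn sketch of the SVR baseline evaluated with MSE. The RBF kernel and default hyperparameters are assumptions, since the paper does not report the SVR configuration.

```python
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

def evaluate_svr(X_train, y_train, X_test, y_test):
    """Fit an SVR baseline and report its test MSE, as in Tables I and II."""
    model = SVR(kernel='rbf')  # kernel choice is an assumption
    model.fit(X_train, y_train)
    return mean_squared_error(y_test, model.predict(X_test))
```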
V. CONCLUSION

Financial time series prediction, such as stock price prediction, is crucial for investors who need to form a well-grounded judgment on stocks as early as possible. In general, there are multiple factors which can affect a stock price. For this study, we collected six influence factors, namely open price, close price, high price, low price, volume, and money, to predict the close price of a stock. Besides, we used a CNN to extract features from two real stock datasets and integrated the CNN with an LSTM model. To improve the accuracy and the scalability of the prediction, we added an attention mechanism based on the LSTM neural network.

The experiments were conducted to demonstrate the effectiveness and the performance of our proposed model. We compared our model with the SVR, LSTM, and GRU models. The results illustrated that the proposed approach had good accuracy when the sequence length was larger.

ACKNOWLEDGMENT

This work is partially supported by KAKENHI (16K00155, 17H01823) Japan. C. Li is supported by the Japanese YAHATA Scholarship. M. Qaosar and S. Ahmed are supported by the Japanese Government MEXT Scholarship.

REFERENCES

[1] G. E. P. Box and G. Jenkins, "Time Series Analysis, Forecasting and Control," Holden-Day, San Francisco, CA, 1970.
[2] E. D. McKenzie, "General exponential smoothing and the equivalent ARMA process," Journal of Forecasting, pp. 333-344, 1984.
[3] R. J. Hyndman and G. Athanasopoulos, "8.9 Seasonal ARIMA models," in Forecasting: Principles and Practice, OTexts, 2015.
[4] W. Bao, J. Yue, and Y. Rao, "A deep learning framework for financial time series using stacked autoencoders and long-short term memory," PLoS One, 12(7), 2017.
[5] T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur, "Recurrent neural network based language model," in Interspeech, vol. 2, p. 3, 2010.
[6] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, 323(6088):533-536, 1986.
[7] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, 5(2):157-166, 1994.
[8] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, 9(8):1735-1780, 1997.
[9] K. Cho, B. van Merriënboer, D. Bahdanau, and Y. Bengio, "On the properties of neural machine translation: encoder-decoder approaches," arXiv:1409.1259, 2014.
[10] K. Chen, Y. Zhou, and F. Dai, "A LSTM-based method for stock returns prediction: A case study of China stock market," in Proceedings of the 2015 IEEE International Conference on Big Data, pp. 2823-2824, 2015.
[11] D. M. Q. Nelson, A. C. M. Pereira, and R. A. de Oliveira, "Stock market's price movement prediction with LSTM neural networks," in Proceedings of the International Joint Conference on Neural Networks, pp. 1419-1426, 2017.
[12] X. Zhang, C. Li, and Y. Morimoto, "A multi-factor approach for stock price prediction by using recurrent neural networks," Bulletin of Networking, Computing, Systems, and Software, 8(1), pp. 9-13, 2019.
[13] J. F. Chen, W. L. Chen, C. P. Huang, S. H. Huang, and A. P. Chen, "Financial time-series data analysis using deep convolutional neural networks," in 7th International Conference on Cloud Computing and Big Data (CCBD), pp. 87-92, 2016.
[14] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," arXiv:1409.0473, 2014.
[15] K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention," arXiv:1502.03044, 2015.
[16] T. Kimoto, K. Asakawa, M. Yoda, and M. Takeoka, "Stock market prediction system with modular neural networks," in International Joint Conference on Neural Networks, pp. 1-6, 1990.
[17] K. Kim and I. Han, "Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index," Expert Systems with Applications, 19(2):125-132, 2000.
[18] P. M. Tsang, P. Kwok, S. O. Choy, R. Kwan, S. C. Ng, J. Mak, et al., "Design and implementation of NN5 for Hong Kong stock price forecasting," Engineering Applications of Artificial Intelligence, 20(4):453-461, 2007.
[19] J. Z. Wang, J. J. Wang, Z. G. Zhang, and S. P. Guo, "Forecasting stock indices with back propagation neural network," Expert Systems with Applications, 38(11):14346-14355, 2011.
[20] K. Chen, Y. Zhou, and F. Dai, "A LSTM-based method for stock return prediction: A case study of China stock market," in IEEE International Conference on Big Data, 2015.
[21] J. F. Chen, W. L. Chen, C. P. Huang, S. H. Huang, and A. P. Chen, "Financial time-series data analysis using deep convolutional neural networks," in 7th International Conference on Cloud Computing and Big Data (CCBD), pp. 87-92, 2016.
[22] J. L. Ba, K. Swersky, S. Fidler, and R. Salakhutdinov, "Predicting deep zero-shot convolutional neural networks using textual descriptions," in IEEE International Conference on Computer Vision, pp. 4247-4255, 2015.
[23] G. Hu, Y. Hu, K. Yang, Z. Yu, F. Sung, Z. Zhang, et al., "Deep stock representation learning: From candlestick charts to investment decisions," arXiv, 2017.
[24] J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, "Attention-based models for speech recognition," in Advances in Neural Information Processing Systems, pp. 577-585, 2015.
[25] X. J. Wang, L. Yu, K. Ren, G. Tao, W. N. Zhang, Y. Yu, and J. Wang, "Dynamic attention deep model for article recommendation by learning human editors' demonstration," in KDD '17, Canada, 2017.
[26] J. Chen, H. Zhang, X. He, L. Nie, W. Liu, and T. S. Chua, "Attentive collaborative filtering: Multimedia recommendation with item- and component-level attention," in SIGIR '17, Tokyo, Japan, 2017.
[27] P. J. Werbos, "Backpropagation through time: What it does and how to do it," Proceedings of the IEEE, 78(10):1550-1560, 1990.