2012 IEEE Symposium on Computers & Informatics

Stock Market Prediction Based on Interrelated Time Series Data
Ryota Kato and Tomoharu Nagao
Graduate School of Environment and Information Sciences,
Yokohama National University
Kanagawa, Japan
Email: [email protected]

Abstract: In this paper, we propose a stock market prediction method based on interrelated time series data. Although there are many stock market prediction models, few of them predict a stock by considering other time series data. Moreover, it is difficult to discover which data are interrelated with a predicted stock. We therefore focus on extracting interrelationships between the predicted stock and various time series data, such as other stocks, world stock market indices, foreign exchange rates and oil prices. We test our method on predicting the daily up and down changes in the closing value by using the discovered interrelationships, and experimental results show that our method can predict stock directions well, especially in the manufacturing industry.

Keywords: data mining; stock market prediction; Evolution Strategy.

I. INTRODUCTION

These days, many kinds of data are stored and are easy to access. On the other hand, we are faced with the unmanageability of these data, and we still do not know the best way to use them. One of the most important problems is therefore to discover knowledge from these data and to make effective use of it. For this reason, research on data mining and knowledge discovery is required. In recent years, developments in computer technology have made it possible to analyze huge amounts of data, and there are many studies in this field [1], [2].

In finance, there have been many studies of stock market prediction and stock investment using data mining techniques, and many investors all over the world use these techniques. The important tasks in this area are to discover effective data and to make effective use of the discovered data.

There are two approaches to stock market prediction. One focuses on data about the predicted stock itself; the other focuses on data other than the predicted stock. Within the first approach, there are two typical methods. One is fundamental analysis, which predicts the stock market by focusing on financial statements, interest rates, products, management, news and so on, and is used to gain insight into whether a stock is overvalued or undervalued. Some studies extract such insight from news articles by using engineering methods [3], [4], [5]. The other is technical analysis, which predicts the stock market by focusing on previous stock data, usually technical indicators such as the moving average, relative strength index (RSI), stochastic oscillator and so on. Many studies predict the stock market from such data with engineering approaches such as neural networks [6], evolutionary algorithms [7], support vector machines [8], neuro-fuzzy systems [9], hidden Markov models [10] and decision trees [11]. Within the second approach, some studies investigate the interrelationship between the price of the predicted stock and other time series data, such as foreign stocks, temperature, audience ratings and so on. Na and Sohn analyzed association rules for predicting changes in the Korea Composite Stock Price Index based on the time series data of various interrelated world stock market indices [12]. Bollen, Mao and Zeng discovered that measurements of collective public mood states derived from large-scale Twitter feeds correlate with the value of the Dow Jones Industrial Average over time. Moreover, they found an accuracy of 86.7% in predicting the daily up and down changes in the closing value by using these measurements [13]. In this approach, however, it is difficult to discover which data are interrelated with the predicted stock.

We therefore propose a method that automatically extracts, from real data, interrelationships of price changes between the predicted stock and various time series data, such as other stocks, world stock market indices, foreign exchange rates and oil prices. We test our method on predicting the daily up and down changes in the closing value by using the discovered interrelationships.

The rest of this paper is organized as follows. Section II discusses the proposed method. Experiments and results are reported in Section III, and conclusions are given in Section IV.

II. PROPOSED METHOD

Although stock prices change for various reasons, information other than that about the predicted stock itself should affect the predicted stock. For example, a stock related to exports is affected by foreign exchange rates or foreign stocks. We therefore aim to extract, from real data, interrelations of price changes between the predicted stock and various time series data, and then to predict the stock by using the extracted interrelationships. The method is composed of two phases: an interrelation discovery phase and a prediction phase.

978-1-4673-1686-6/12/$26.00 ©2012 IEEE


A. Interrelation discovery phase

In this phase, we discover interrelationships between the predicted stock and various time series data. We investigate whether a specific fluctuation of a referenced time series occurs ahead of a rise or drop of the predicted stock. We define such a specific fluctuation as a variation pattern, which represents how the referenced time series data has changed. A variation pattern of each referenced time series is computed one-on-one with the predicted stock. Figure 1 shows the outline. The variation pattern is computed as follows.

[Fig. 1. Outline of Interrelation discovery phase. For the predicted stock (SONY in the example), one variation pattern over times t-4 to t is shown for each referenced time series: Dow Jones Average, S&P500, FTSE100, SSE Composite Index and Hong Kong Hang Seng Index.]

[Fig. 2. Example of a variation pattern: "#" at t-4, class 1 or 2 at t-3, "#" at t-2, class 4 at t-1 and class 0 at t; the predicted stock changes up at time t+1.]

1) Quantize referenced time series data: In this study, we quantize the referenced time series data to simplify the discovery of interrelationships. First, the referenced time series data is converted to the rate of change of its value:

$$C(t) = \frac{y(t) - y(t-1)}{y(t-1)}, \tag{1}$$

where C(t) is the rate of change at time t and y(t) is the value of the referenced time series data at time t.
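As a rough illustration of equation (1) and of the five-class quantization that expression (2) defines from it, the following Python sketch may help. It is not the authors' code: the function names are hypothetical, the window is assumed to be the N change rates strictly before time t, and the class boundaries are read as S_j < C(t) <= S_i with S sorted in descending order, so that class 0 holds the largest rises and class 4 the largest falls.

```python
def rate_of_change(values):
    """Equation (1): C(t) = (y(t) - y(t-1)) / y(t-1) for t = 1 .. len(values)-1."""
    return [(values[t] - values[t - 1]) / values[t - 1]
            for t in range(1, len(values))]


def quantize(changes, n_window=150):
    """Expression (2): map each C(t) that has a full window to a class 0..4.

    Assumption: the window holds the n_window rates before time t,
    and S_i is the i-th largest of them (1-indexed).
    """
    if n_window < 10:
        raise ValueError("n_window must be at least 10")
    # Ranks i = N*0.1, N*0.3, N*0.7, N*0.9, computed with integer arithmetic.
    ranks = [n_window * pct // 100 for pct in (10, 30, 70, 90)]
    classes = []
    for t in range(n_window, len(changes)):
        window = sorted(changes[t - n_window:t], reverse=True)
        bounds = [window[r - 1] for r in ranks]  # S_i for the four ranks
        # The class is the number of boundaries that C(t) does not exceed.
        classes.append(sum(changes[t] <= b for b in bounds))
    return classes
```

With N = 150 the four ranks are 15, 45, 105 and 135, so under these assumptions roughly 10%, 20%, 40%, 20% and 10% of recent change rates fall into classes 0 to 4.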
Then, C(t) is quantized into five classes. We define the five classes by comparing C(t) with previous change rates so as to maintain the appearance frequency of each class. Expression (2) shows how to quantize.

$$Q(t) = \begin{cases}
0 & (C(t) > S_i,\ i = N \times 0.1) \\
1 & (S_j < C(t) \le S_i,\ i = N \times 0.1,\ j = N \times 0.3) \\
2 & (S_j < C(t) \le S_i,\ i = N \times 0.3,\ j = N \times 0.7) \\
3 & (S_j < C(t) \le S_i,\ i = N \times 0.7,\ j = N \times 0.9) \\
4 & (\text{otherwise})
\end{cases} \tag{2}$$

where Q(t) is the quantized class number at time t and S_i is the i-th C(t) when the change rates of the past N days are sorted in descending order; in this paper N is 150. Class 0 therefore corresponds to the largest rises and class 4 to the largest falls.

2) Discover the variation pattern: A variation pattern represents how the referenced time series data has changed ahead of a rise or drop of the predicted stock. Figure 2 shows an example of a variation pattern. The number at each time denotes the class number, and # denotes "don't care" at that time. The pattern is searched for by using Evolution Strategy (ES). For each time, ES searches the class number, whether that time is cared about or not, and an operator, in order to extract a good variation pattern. The operator denotes whether the class number also accepts a neighboring class number. We define three operators to make the variation pattern flexible: A, B and C. Operator A denotes that the class number does not accept a neighboring number, B denotes that it also accepts the next larger one, and C denotes that it also accepts the next smaller one.

The variation pattern takes into account not only the most recent data but the data of the past M days (M = 5 in this paper), as in Fig. 2. ES tries to find the pattern whose hit rate and number of predictions are both high in the learning period. Expressions (3) and (4) show the hit rate H and the fitness function, respectively.

$$H = \frac{h}{n}, \tag{3}$$

$$\mathit{fitness} = \begin{cases}
H & (H < 0.7) \\
H + n & (\text{otherwise})
\end{cases} \tag{4}$$

where h denotes the number of correct predictions and n denotes the number of times that the variation pattern matches. We discover two variation patterns for each referenced time series: one for predicting up changes and one for predicting down changes.

B. Prediction phase

In the interrelation discovery phase, we obtain variation patterns for each referenced time series. Not all of the obtained variation patterns are useful, however, so we must choose which patterns to use. In this study, we consider a pattern whose h is high to be better, and we set a threshold on h for selecting better patterns. Expression (5) shows how the threshold th is computed:

$$th = \bar{h} + 4\sigma, \tag{5}$$

$$\bar{h} = \frac{1}{N_{rd}} \sum_{i=1}^{N_{rd}} h_i, \tag{6}$$

$$\sigma = \sqrt{\frac{1}{N_{rd}} \sum_{i=1}^{N_{rd}} (h_i - \bar{h})^2}. \tag{7}$$

We consider a variation pattern whose h is higher than the threshold to be better, and we define the corresponding referenced data as likely interrelated data (LID). We predict the stock when one of the LIDs matches its variation pattern. If the number of LIDs is more than 10, we raise the reliability of the prediction by predicting the stock only when more than two of the LIDs match their variation patterns.
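The selection and voting rules of the prediction phase can be sketched in Python as follows. This is only an illustration under stated assumptions: `select_lids` and `should_predict` are hypothetical names, h_i is taken to be the number of correct predictions of the pattern found for the i-th referenced time series, N_rd is taken to be the number of referenced series, and "more than two" is read literally as at least three.

```python
from statistics import mean, pstdev


def select_lids(correct_counts, k_sigma=4.0):
    """Expressions (5)-(7): keep series whose h exceeds th = mean(h) + 4 * sigma.

    correct_counts maps a referenced series name to the h of its pattern.
    """
    hs = list(correct_counts.values())
    # pstdev divides by the number of series, matching expression (7).
    threshold = mean(hs) + k_sigma * pstdev(hs)
    return [name for name, h in correct_counts.items() if h > threshold]


def should_predict(lids, matched_now):
    """Voting rule: one matching LID suffices unless there are more than 10 LIDs.

    matched_now is the set of series whose pattern matches today.
    Assumption: with more than 10 LIDs, at least three must match.
    """
    n_matching = sum(1 for name in lids if name in matched_now)
    required = 3 if len(lids) > 10 else 1
    return n_matching >= required
```

Because the threshold sits four standard deviations above the mean, only series whose patterns are far more successful than the typical referenced series survive, which is consistent with the small numbers of LIDs reported in Tables IV and V.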

TABLE I
PARAMETER SETTINGS FOR ES

Parameter                      Value
Number of generations          5,000
Generation alternation model   (1+4)ES
Mutation rate                  1/(gene length)
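To make the settings in Table I concrete, here is a minimal Python sketch of a (1+4) Evolution Strategy. Everything beyond the three table entries is an assumption: the gene encoding (one care flag, one operator and one class number per day, so a gene length of 3 x M), mutation by resampling a field, acceptance of an offspring that ties with the parent, and all function names. The fitness function is supplied by the caller.

```python
import random

OPERATORS = "ABC"
N_CLASSES = 5


def random_gene(rng):
    """One day of a pattern: (care flag, operator, class number)."""
    return (rng.random() < 0.5, rng.choice(OPERATORS), rng.randrange(N_CLASSES))


def mutate(pattern, rng):
    """Mutate each field independently with probability 1 / (gene length)."""
    rate = 1.0 / (3 * len(pattern))  # assumed gene length: 3 fields per day
    child = []
    for care, op, cls in pattern:
        if rng.random() < rate:
            care = not care
        if rng.random() < rate:
            op = rng.choice(OPERATORS)
        if rng.random() < rate:
            cls = rng.randrange(N_CLASSES)
        child.append((care, op, cls))
    return child


def one_plus_four_es(fitness, generations=5000, n_days=5, seed=0):
    """(1+4)-ES: one parent, four offspring, the best of the five survives."""
    rng = random.Random(seed)
    parent = [random_gene(rng) for _ in range(n_days)]
    parent_score = fitness(parent)
    for _ in range(generations):
        offspring = [mutate(parent, rng) for _ in range(4)]
        best_child = max(offspring, key=fitness)
        child_score = fitness(best_child)
        if child_score >= parent_score:  # ties go to the offspring
            parent, parent_score = best_child, child_score
    return parent, parent_score
```

In the method described here, the fitness passed in would be expression (4) evaluated on the training period; any function that scores a pattern can be used to try the loop out.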
[Fig. 3. Obtained variation patterns when we predict the stock of Sony. Likely interrelated data: Dow Jones Average (index), S&P500 (index), FTSE100 (index).]

TABLE II
PARAMETER SETS FOR THE VARIATION PATTERN

Parameter               Candidate value
Whether caring or not   True, False
Operator                A, B, C
Class number            0, 1, 2, 3, 4
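The candidate values in Table II suggest one possible representation of a variation pattern, sketched below in Python together with the hit rate and fitness of expressions (3) and (4). The tuple layout, the function names, the reading of operators B and C as "also accept the class number one larger / one smaller", and the return value when the pattern never matches are assumptions, not details confirmed by the paper.

```python
def gene_matches(gene, q):
    """gene = (care, operator, class number); q = observed class on that day."""
    care, op, cls = gene
    if not care:
        return True  # "#": this day is ignored
    allowed = {cls}
    if op == "B":
        allowed.add(cls + 1)  # also accept the next larger class number
    elif op == "C":
        allowed.add(cls - 1)  # also accept the next smaller class number
    return q in allowed


def pattern_matches(pattern, window):
    """pattern and window both cover days t-M+1 .. t, oldest first."""
    return all(gene_matches(g, q) for g, q in zip(pattern, window))


def hit_rate_and_fitness(pattern, ref_classes, target_moved):
    """Expressions (3) and (4) over a training period.

    ref_classes[t] is the quantized class of the referenced series on day t;
    target_moved[t] is True if the predicted stock moved in the predicted
    direction on day t. A match on day t is scored against day t+1.
    """
    m = len(pattern)
    n = h = 0
    for t in range(m - 1, len(ref_classes) - 1):
        if pattern_matches(pattern, ref_classes[t - m + 1:t + 1]):
            n += 1
            h += int(target_moved[t + 1])
    if n == 0:
        return 0.0, 0.0  # assumption: a pattern that never matches scores zero
    hit_rate = h / n
    fitness = hit_rate if hit_rate < 0.7 else hit_rate + n
    return hit_rate, fitness
```

For example, the pattern of Fig. 2 could be written as `[(False, "A", 0), (True, "B", 1), (False, "A", 0), (True, "A", 4), (True, "A", 0)]`. Adding n to the fitness once the hit rate reaches 0.7 rewards patterns that are both accurate and frequently applicable.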
III. EXPERIMENTS AND RESULTS

We test our method on stock market prediction. Tables I and II show the parameter settings for ES and the candidate parameter values for the variation pattern at each time t, respectively. The experimental settings are shown in Table III. We chose various referenced time series data and typical Japanese stocks for prediction.

Tables IV and V show numerical evaluations of predicting the daily up and down changes in the closing value. In this area, a hit rate of more than 55% for the daily up or down direction is considered good. Atsalakis and Valavanis summarized 14 studies on stock market prediction [9]; their hit rates were 57.1% on average and 68.3% at best. Against this benchmark, with test-period averages of 58.2% (up) and 60.6% (down), our method predicts stock directions well, especially in the manufacturing industry. We think the reason is that there seem to be more relationships between companies in the manufacturing industry, such as buying or selling components, investing in a company for its technologies and so on. Moreover, our method not only predicts the stock but also discovers likely interrelated data.

Figure 3 shows the LIDs and variation patterns for predicting up changes of the stock of Sony. All obtained LIDs are world stock market indices. Moreover, all variation patterns express that the index rises greatly (at time t) just before the stock of Sony rises. It is often said that a stock related to exports is affected by foreign economies, and our method obtained such interrelations. The same interrelationships are obtained when we predict the stocks of Hitachi, NEC, Nikon and Nissan, all of which are related to exports.

Figure 4 shows the LIDs and an example of a variation pattern for predicting up changes of the stock of MITSUBISHI HEAVY INDUSTRIES (MHI). Almost all obtained LIDs are stocks that seem to be interrelated with heavy industry, such as train manufacture, automobile component manufacture, steel manufacture and so on. Furthermore, MHI is the biggest stockholder of Mitsubishi Steel and has been involved in business cooperation with The Kinki Sharyo. That is to say, our method discovered interrelationships whose reasons can be explained.

[Fig. 4. Obtained variation patterns when we predict the stock of MITSUBISHI HEAVY INDUSTRIES. Likely interrelated data: The Kinki Sharyo (transportation equipment), Tokyo Electron (electric), Toho Zinc (nonferrous metal), NICHIREKI (petroleum and coal), NIDEC TOSOK (transportation equipment), Mitsubishi Steel (steel), Pacific Metals (steel), Daimei Telecom Engineering (building). Example variation pattern (Mitsubishi Steel): "#" at t-4 and t-3, class 1 or 2 at t-2 (it rose a bit), class 3 or 4 at t-1 (it fell), class 0 at t (it rose greatly).]

Figure 5 shows the LIDs and an example of a variation pattern for predicting down changes of the stock of Mitsubishi Corporation (MC). Since MC is a wholesale firm, we can understand why the Dow Jones Average and S&P500 are discovered as LIDs. The others, however, seem not to be interrelated with MC, because they have no business alliance with MC and are not in the same kind of business area as MC. Nevertheless, they are useful because the prediction works well with them. Although we cannot yet explain why these interrelationships are discovered, we think it is also important to discover such interrelations, because there may be indirect interrelationships such as the butterfly effect. In addition, we have to analyze why they are discovered, because doing so may yield new knowledge about the industrial structure.

[Fig. 5. Obtained variation patterns when we predict the stock of Mitsubishi Corporation. Likely interrelated data: Dow Jones Average (index), S&P500 (index), TOYO (wholesale), CHINO (electric), MEITEC (hospitality), Heiwa (machine), The Kita-Nippon Bank (banking). An example variation pattern is shown for CHINO.]
IV. CONCLUSION

In this paper, we proposed a method that extracts interrelationships of price changes between the predicted stock and various time series data, such as other stocks, world stock market indices, foreign exchange rates and oil prices. Our method uses Evolution Strategy to compute variation patterns, which represent how a referenced time series has changed, then extracts likely interrelated data and predicts the stock by using the obtained interrelationships. We tested our method on stock market prediction. Experimental results showed that our method can predict stock directions well, especially in the manufacturing industry. The obtained LIDs include not only stocks that have a stockholding or business-cooperation relationship with the predicted stock but also stocks that seem not to be interrelated with it.

In future work, we will analyze the interrelations between the predicted stock and the other time series output as LIDs, and develop a trading strategy using our method.
REFERENCES

[1] Alex A. Freitas, "Data Mining and Knowledge Discovery with Evolutionary Algorithms", Springer, 2002.
[2] Ashish Ghosh, Lakhmi C. Jain, "Evolutionary Computation in Data Mining", Springer, 2004.
[3] Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Wai Lam, "Stock Prediction: Integrating Text Mining Approach using Real-Time News", Computational Intelligence for Financial Engineering, pp. 395-402, 2003.
[4] Di Wu, Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Qi Pan, "Stock prediction: an event-driven approach based on bursty keywords", Frontiers of Computer Science in China, Vol. 3, No. 2, pp. 145-157, 2009.
[5] Robert P. Schumaker, Hsinchun Chen, "A Discrete Stock Price Prediction Engine Based on Financial News", Computer, Vol. 43, No. 1, pp. 51-56, 2010.
[6] Kyoung-jae Kim, "Artificial neural networks with evolutionary instance selection for financial forecasting", Expert Systems with Applications, Vol. 30, pp. 519-526, 2006.
[7] Pawel B. Myszkowski, Lukasz Rachwalski, "Trading rule discovery on Warsaw Stock Exchange using coevolutionary algorithms", Proceedings of the International Multiconference on Computer Science and Information Technology, Vol. 4, pp. 81-88, 2009.
[8] Wei Huang, Yoshiteru Nakamori, Shou-Yang Wang, "Forecasting stock market movement direction with support vector machine", Computers & Operations Research, Vol. 32, No. 10, pp. 2513-2522, 2005.
[9] George S. Atsalakis, Kimon P. Valavanis, "Forecasting stock market short-term trends using a neuro-fuzzy based methodology", Expert Systems with Applications, Vol. 36, pp. 10696-10707, 2009.
[10] Md. Rafiul Hassan, Baikunth Nath, Michael Kirley, "A fusion model of HMM, ANN and GA for stock market forecasting", Expert Systems with Applications, Vol. 33, No. 1, pp. 171-180, 2007.
[11] Jar-Long Wang, Shu-Hui Chan, "Stock market trading rule discovery using two-layer bias decision tree", Expert Systems with Applications, Vol. 30, pp. 605-611, 2006.
[12] Sung Hoon Na, So Young Sohn, "Forecasting changes in Korea Composite Stock Price Index (KOSPI) using association rule", Expert Systems with Applications, Vol. 38, pp. 9046-9049, 2011.
[13] Johan Bollen, Huina Mao, Xiaojun Zeng, "Twitter mood predicts the stock market", Journal of Computational Science, Vol. 2, pp. 1-8, 2011.

TABLE III
EXPERIMENTAL SETTING

Referenced time series data:
  1371 stocks listed on the Tokyo Stock Exchange
  TOPIX, Dow Jones Average, S&P500, FTSE100, SSE Composite Index, Hong Kong Hang Seng Index
  yen-dollar exchange rate, yen-GBP exchange rate, yen-Euro exchange rate, yen-AUD exchange rate, yen-NZD exchange rate, yen-CAD exchange rate, yen-CHF exchange rate, Euro-dollar exchange rate
  WTI

Predicted stock (industry):
  Nippon Meat Packers, Inc. (food)
  COSMO OIL Co., Ltd. (petroleum and coal)
  ITOCHU Corporation (wholesale)
  Mitsubishi Corporation (wholesale)
  Hitachi, Ltd. (electric)
  NEC Corporation (electric)
  Sony Corporation (electric)
  Nikon Corporation (precision equipment)
  MITSUBISHI HEAVY INDUSTRIES, LTD. (machine)
  Nissan Motor Co., Ltd. (automobile)
  TOYOTA MOTOR CORPORATION (automobile)

Training period: 4 January 1999 to 29 December 2006
Test period: 4 January 2007 to 30 December 2008

TABLE IV
NUMERICAL EVALUATIONS IN PREDICTING THE DAILY UP

                                             Training period            Test period
Stock name                   Number of LIDs  Predictions  Hit rate (%)  Predictions  Hit rate (%)
Nippon Meat Packers                       8          519          68.4          125          52.8
COSMO OIL                                 8          437          68.9          117          52.1
ITOCHU Corporation                        3          217          71.6           81          56.8
Mitsubishi Corporation                    5          196          69.3           58          44.8
Hitachi                                  12          272          79.0          102          55.0
NEC                                       5          451          70.1          135          66.7
Sony                                      3          496          74.0          146          68.5
Nikon                                    12          302          80.1           74          60.8
MITSUBISHI HEAVY INDUSTRIES               8          449          69.3           98          59.2
Nissan Motor                             11          246          80.1          103          63.1
TOYOTA MOTOR                              2          187          70.1           50          60.0
Average                                                           72.8                       58.2

TABLE V
NUMERICAL EVALUATIONS IN PREDICTING THE DAILY DOWN

                                             Training period            Test period
Stock name                   Number of LIDs  Predictions  Hit rate (%)  Predictions  Hit rate (%)
Nippon Meat Packers                      13          203          84.2           38          50.0
COSMO OIL                                 6          259          68.5           85          49.4
ITOCHU Corporation                        8          368          69.2          127          68.5
Mitsubishi Corporation                    7          358          69.1          152          62.5
Hitachi                                  10          238          75.6           88          69.3
NEC                                      16          672          76.2          175          64.6
Sony                                      8          693          68.1          190          59.5
Nikon                                    12          159          71.2           86          59.3
MITSUBISHI HEAVY INDUSTRIES               9          584          68.5          129          54.3
Nissan Motor                             10          108          86.1           22          63.6
TOYOTA MOTOR                              6          411          68.6          116          65.5
Average                                                           73.2                       60.6
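The "Average" rows of Tables IV and V appear to be plain (unweighted) means over the 11 predicted stocks, not means weighted by the number of predictions. The short Python check below, using the test-period hit rates copied from the two tables, reproduces the reported figures; the variable and function names are only illustrative.

```python
from statistics import fmean

# Test-period hit rates (%) in table order, copied from Tables IV and V.
test_hit_rate_up = [52.8, 52.1, 56.8, 44.8, 55.0, 66.7, 68.5, 60.8, 59.2, 63.1, 60.0]
test_hit_rate_down = [50.0, 49.4, 68.5, 62.5, 69.3, 64.6, 59.5, 59.3, 54.3, 63.6, 65.5]


def unweighted_average(rates):
    """Mean over stocks, rounded to one decimal place as in the tables."""
    return round(fmean(rates), 1)


print(unweighted_average(test_hit_rate_up))    # 58.2
print(unweighted_average(test_hit_rate_down))  # 60.6
```

Since the number of predictions differs widely between stocks (for example, 22 versus 190 in the test period of Table V), a prediction-weighted average would generally differ from these values.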
