Improving stock trend prediction through financial time series classification and temporal correlation analysis based on aligning change point (1)
https://doi.org/10.1007/s00500-022-07630-7
Abstract
In order to improve the accuracy of stock prediction, researchers in computer science and technology have begun to apply their
techniques to the financial market. In the financial market, there are many similar but not simultaneous fluctuations caused
by different reaction efficiencies to the same event. Therefore, the trends of stocks that react quickly could improve trend
predictions of similar stocks that react slowly. To find the temporal correlation between stocks in the same securities
exchange, a financial time series classification approach based on aligning change points is proposed to help investors
discover hidden temporal correlations, which could improve stock trend prediction, to adjust portfolios. Firstly, the
securities index of the securities exchange is chosen to be the benchmark, and the important change points are screened out
to mark the essential fluctuations. Secondly, the points of all the constituent stocks of the same securities index which could
be aligned to the important change points of the index are screened out and aligned through the aligning algorithm. Then
the number of aligned stocks’ points in different types helps to divide stocks into lead class and lag class. Temporal
correlation and time difference are obtained through the temporal correlation analysis algorithm. Finally, four different
prediction models are used to verify whether the classification information and time difference obtained from temporal
correlation analysis could improve the stock trend prediction. The results show that our work could reveal potential
connections among stocks as a bridge to introduce valid exogenous information, which is promising for stock trend
prediction studies.
Keywords Financial time series · Temporal correlation · Align change point · Stock trend prediction
Xiaolong Wang
[email protected]
Mengxia Liang
[email protected]
Shaocong Wu
[email protected]
1 Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
2 College of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China

1 Introduction

In recent years, with the rapid growth of data storage capacity and computing power, computer technologies such as machine learning (Efendi et al. 2018), AI (Vargas et al. 2017), data mining, etc. have been applied in all fields (Shih et al. 2019; Nasseri et al. 2015; Du and Rada 2018; Hájek 2018; Dang and Lin 2016). As a typical form of data, time series data exists in many industries, such as finance (Efendi et al. 2018; Wang 2017; He and Shang 2017), biology, medicine, etc. In the financial market, a large amount of time series data, such as stock quote data, is generated every trading day. Mining financial time series to find hidden correlations could help investors make more rational investment decisions.

A stock index of one securities exchange reflects the composite situation of the global securities market. For example, Shanghai Securities Composite Index reflects the composite situation of the listed securities in the Shanghai Securities Exchange in China. The prices of different stocks may go up and down differently at ordinary times before some decisive events happen. When a new event
3656 M. Liang et al.
investment portfolio in time to gain more earnings. The contributions of this paper are as follows:

1. An important change points screening algorithm is proposed to screen important change points of the index and its component stocks.
2. A change points align algorithm is proposed to get the important change points of component stocks aligned with the corresponding important change points of the index. Then stocks could be classified into the lead or the lag class through the aligned data.
3. Time difference and temporal correlation between similar leading and lagging stocks could be obtained through the stock temporal correlation analysis approach.
4. The classification information and the time differences are both used to improve trend prediction accuracies of stocks in the lag class. Four different models (Backtrack, Long Short-Term Memory (Bhandari et al. 2022), Bi-directional Long Short-Term Memory, Gated Recurrent Unit) are used to predict the stock trend in lag class, respectively. Different situations, such as input data with similar stock in lead class obtained by our classification algorithm, or input data without the external information, are used to compare the effectiveness of our classification algorithm and temporal correlation analysis.

The remainder of this paper is organized as follows. Section 2 introduces the traditional time series classification algorithm and compares it with the proposed algorithm. In Sect. 3, stocks are classified into lead class and lag class. The temporal relationship between similar stocks and the delay time calculation algorithm is described in Sect. 4. Experiments and comparisons are shown in Sect. 5. Section 6 shows the conclusion.

2 Related work

2.1 Time series classification and correlation

Time series classification (TSC) is a typical problem (Wu et al. 2021) of time series analysis and has been applied in different domains. Most classifications are based on similarity measurement, and the distance between different samples (Kenji Iwana and Uchida 2020; Majumdar and Laha 2020) could reflect the similarity digitally.

Although there are many different classification methods and similarity measures, when referring to time series classification and time series similarity, most methods are still based on or improved from Dynamic Time Warping. Dynamic Time Warping (DTW) is a classic distance calculation algorithm between time series and is widely used to obtain the distance between two time series of different lengths and get the optimal alignment. It was first proposed for speech recognition to solve the template matching problem with different pronunciation length (Sakoe and Chiba 1978). Then it became one of the most common time series similarity measurements and has been improved and combined with many algorithms to achieve different research purposes (Liang et al. 2021). Combined with perceptually important points (PIPs), which are used to segment price series into subsequences dynamically, dynamic time warping is used to find similar historical subsequences for assessing the efficient market hypothesis (EMH) (Tsinaslanidis and Kugiumtzis 2014) statistically. As DTW suffers from a quadratic computational cost and does not fit well into support vector machine (SVM), mainly because a direct positive definite kernel could not be derived from its definition, a computation speed-up of DTW and its kernelization is proposed to lower the computational cost without losing on the quality of the measure (Soheily-Khah and Marteau 2019).

DTW algorithms perform poorly when aligning sequences of variable sampling frequencies, which makes it challenging to apply DTW to practical problems, so an EventDTW is proposed, which uses information propagated from defined events as the basis for path matching and hence sequence alignment (Jiang et al. 2020b). As a nonlinear pattern-matching approach, DTW can optimally align motion signal sequences tending toward time-varying or speed-varying expressions, so it could be used to capture exceptional motions (Yang et al. 2019). DTW is also used in Ground Penetrating Radar (Jazayeri et al. 2019), handwritten signature verification (Yao and Wei 2016), similar historical pattern extraction of stock price (Yao and Wei 2017; Udagawa 2017), speech segments clustering (Lerato and Niesler 2019), skill classification in surgical VR training (Vaughan and Gabrys 2020), bullish and bearish class predictions for stocks (Tsinaslanidis 2018), correlation characteristics detection between financial time series (Wang et al. 2019), temporal pattern learning (Iwana et al. 2020), and hand gesture recognition (Liu et al. 2019).

Dynamic Time Warping is still the main similarity measure and correlation analysis algorithm. However, its weakness is obvious when analyzing financial time series:

1. The financial market is complex and affected by multiple factors. DTW is easily affected by minor fluctuations and noisy data in the real market.
2. Financial data is generated rapidly. DTW has a high computational cost; it could not simultaneously process a large amount of time series data of different stocks. When analyzing the temporal correlation between stocks, the DTW distances calculated item by item would cost much time and computing resources.
3. DTW is good at judging whether two time series are similar in shape, but not whether one time series leads or lags another time series.
4. DTW distance is focused on the distance in amplitude. However, temporal correlation analysis in this article focuses on the time difference between similar time series.

The comparison between pure DTW and the approach proposed in this paper is shown in Table 1.

A new classification standard and time difference calculation algorithm are proposed to cover the disadvantages of DTW in analyzing financial time series. Unlike traditional classification, stocks are classified not based on distance or similarity but on a time comparison between the important change points of the stocks and their market index. Apart from the research about time series change point detection (Bi et al. 2022), the change point in this work is the stock price reversal point, which represents a trend reversal. As this work focuses on the time difference between change points, the time of the point and its characteristics should be researched, but the time series segments between different change points are not considered.

2.2 Stock prediction

According to the prediction result, the stock prediction could be divided into two classes: stock price prediction (Alhnaity and Abbod 2020; Niu et al. 2020; Zhang et al. 2020; Mohanty et al. 2021) and stock trend prediction (Jiang et al. 2020a; Long et al. 2020; Chen et al. 2021; Huang et al. 2021; Thakkar et al. 2021). Stock price prediction is a regression problem, and stock trend prediction is a classification problem. As this work is focused on trend prediction, some comparisons between the proposed approach and recent trend prediction work are summarized in Table 2 (abbreviations and corresponding detailed descriptions can be checked in Table 8).

Compared with recent works, two ways are evident to improve stock prediction. One is improving the prediction model; for example, instead of a single model (Thakkar et al. 2021), hybrid models are used (Huang et al. 2021). The other is improving the input of the prediction model. When referring to improving the prediction model input, there are still two different directions for improvement. One is changing the form of the input using different preprocessing methods, and the other is adding features to the input (Jiang et al. 2020a; Chen et al. 2021; Thakkar et al. 2021; Wu et al. 2021). However, it is rare in the literature to introduce other stocks' information to help improve the current stock's trend prediction, which is why this work focuses on correlation analysis between different stocks. With the introduction of effective exogenous information, the improvements of effects do not rely on the prediction model.

3 Stock classification

The stock index is chosen as a benchmark, and important change points are chosen to be aligned. If the time of a stock change point is before its aligned index change point, it is defined as a lead point. If the time of the stock change point is after its aligned index change point, it is defined as a lag point. If there are more lead points than lag points in a stock, the stock belongs to the lead class, and on the contrary, it belongs to the lag class. The rule is shown in formula (1).

$$
\mathrm{Class}(st) =
\begin{cases}
\text{lead}, & \mathrm{num}(\mathit{LeadPoints}) \geq \mathrm{num}(\mathit{LagPoints}) \\
\text{lag}, & \mathrm{num}(\mathit{LeadPoints}) < \mathrm{num}(\mathit{LagPoints})
\end{cases}
\tag{1}
$$

In formula (1), LeadPoints is the set of stock st's points whose time is ahead of the aligned index points, LagPoints is the set of stock st's points whose time lags behind the aligned index points, and num() is a method that counts the number of points in the set.

Let StockInfo = {StateTime, StockCode, open, high, low, close, ChangeRate} represent a set of stock daily raw data, in which StateTime is the date of the stock's status record, StockCode is the code of a stock, open is the opening price of the corresponding day, high is the stock's highest price in the period of the corresponding trading day, low is the stock's lowest price in the period of the corresponding trading day, close is the closing price of the corresponding day, and ChangeRate is the rate of change of the corresponding day, which is defined as formula (2).

$$
\mathit{ChangeRate}[i] = \frac{\mathit{close}[i] - \mathit{close}[i-1]}{\mathit{close}[i-1]}
\tag{2}
$$
Table 2 Comparison between Recent Work and the Proposed Approach
Article | Dataset | Input | Preprocessing Method | Predicting Model | Granularity | Effect
Jiang et al. (2020a) | S&P 500, DJIA 30, Nasdaq | OHLC, 8 TIs, 16 MIs | CMDV | Meta-LassoLR | Daily | Acc = 0.7074 (Nasdaq)
Chen et al. (2021) | CSM | OHLC, 10 TIs | ICGN | GC-CNN, Dual-CNN | Daily | Acc = 53.37% (stock 300330)
Thakkar et al. (2021) | NSE of India | Historical Dataset, 10 TIs | PCC, APCC | VNN | Daily | Acc = 78.48% (VNN-PCC, HDFC bank, discrete data)
Huang et al. (2021) | EURUSD, SSCI, SZSESMEP, CNPI | C/OHLCVA/ | NVG | Hybrid Models (CNN, ResNet and GRU) | Daily, Minute | Acc = 0.660 (MotifCNNpred-GRU on SSCI)
Long et al. (2020) | CSC, CTFD | IP | CNN | Attention-based Bi-LSTM | Daily | Acc = 75.89% (CITIC Securities)
Proposed | SSCI related component | C/CR (2 | ICPSA, CPAA, | Backtrack, LSTM | Daily | Acc = 67.5%, 84.21%
In formula (2), ChangeRate[i] is the change rate of day i, close[i] is the close price of day i, and close[i-1] is the close price of day i-1.

3.1 Important change point screening algorithm

The stock's ChangeRate time series and close price time series are used to help screen the important change points. Let IndexPole = {PoleNo, IndexCode, IndexTime, ChangeRate, PoleType} represent a set of change points of the index that meet the condition defining an important change point. PoleNo is the unique identifier of a change point, IndexCode is the code of the index chosen as the benchmark, IndexTime is the timestamp of the change point, ChangeRate is the ChangeRate of the index at the time of the change point (defined as ChangeRate in StockInfo), and PoleType is the type of pole that the change point forms on the close price time series curve; it may be a peak or a trough.

ICPSA (Important Change Point Screening Algorithm) screens important change points from time series, using a defined ChangeRate value to help find change points that represent important fluctuations.

In the real data, the ChangeRate of a stock could represent the stock trend and the amplitude of a stock change in a trading day. If ChangeRate > 0, the stock trend is upward; if ChangeRate < 0, the stock trend is downward; and if ChangeRate = 0, there is no trend. The magnitude of the absolute value of the ChangeRate indicates the fluctuation's severity, which also determines the degree of importance of a point in a time series in this article. Important change points are used to represent significant fluctuations, so a point is defined as an important change point when its ChangeRate is higher than a threshold value, the PeakLimit, or lower than a threshold, the TroughLimit. The important change points screening algorithm is described in Algorithm 1.

The time complexity of Algorithm 1 is O(l-2), where l is the length of the close price time series of the stock index. Important change points of stocks are obtained in the same way, and StockPole = {PoleNo, StockCode, StockTime, ChangeRate, PoleType} is a set of change points of stocks. A schematic diagram of ICPSA is shown in Fig. 3.

3.2 Change points align algorithm

After the important change points of the index and stock are obtained, the change points align algorithm (CPAA) is used to get the stock's important change points aligned with the index's important change points. An align table is used to help record the align situation.

The structure of the align table is AlignData = {StockCode, IndexCode, StockTime, IndexTime, LeadorLag}, in which StockCode is the code of the stock, IndexCode is the code of the chosen index, StockTime is the time of the stock change point aligned with the index change point, IndexTime is the time of the current change point of the index, and LeadorLag is a tag which is used to show whether the stock change point leads or lags behind the index change point. When LeadorLag is "lead", that means the stock's change
point leads its aligned index change point, just as point d of the stock is aligned with point a of the index in Fig. 4, but point d is before point a on the timeline. When LeadorLag is "lag", that means the stock's change point is after its aligned index change point, just as point e of the stock is aligned with point b of the index in Fig. 4, but point e is after point b on the timeline. When the LeadorLag tag is "align", that means the time of the stock change point is the same as the time of the index change point, just like the pair of aligned points, point c and point f, in Fig. 4. The main steps of the aligning algorithm are shown below.

Step 1 Choose an important change point of the index.

Step 2 Take the time of the chosen index change point as the start time, and check the stock's important change points to find whether there is an important change point whose PoleType is the same as the chosen index change point at the same time. If there is, check whether this stock point has been aligned to an index point before; if not, add this align record to the align table with the LeadorLag tag "align"; otherwise, go on to the next index point and back to the start of Step 2. If there is no stock change point whose time is the same as the index change point, let the start time move one unit of time in both the increasing and the decreasing time direction. Then, DecTime is used to represent the time in the decreasing time direction, and IncTime is used to represent the time in the increasing time direction.

Step 3 Check the stock's important change points to find whether there is a stock change point whose PoleType is the same as the chosen index change point at DecTime. If yes, check whether the stock change point has been aligned with an index change point. If it has, go on to the next index point. If it has not, and there is no point at IncTime whose PoleType is the same as the chosen index change point, add this align record to the align table with the LeadorLag tag "lead", go on to the next index change point, and go back to Step 2.

Step 4 Check the stock's important change points to find whether there is a stock change point whose PoleType is the same as the chosen index change point at IncTime. If there is, check whether the stock change point has been aligned with an index change point. If it has, go on to the next index point. If it has not, and there is no point at DecTime whose PoleType is the same as the chosen index change point, add this align record to the align table with the LeadorLag tag "lag", go on to the next index change point, and go back to Step 2.

Step 5 If neither a stock change point at DecTime nor a stock change point at IncTime has the same PoleType as the chosen index change point, move DecTime one time unit in the direction of decreasing time, at the same time move IncTime one time unit in the direction of increasing time, and go back to Step 3 and Step 4.

Step 6 Repeat Step 2, Step 3, Step 4, and Step 5 until all the important change points of the index have been aligned or checked.

Some special situations need attention:

Step 3 and Step 4 of the aligning algorithm are simultaneous in reality. Then there may be a situation in which, at both DecTime and IncTime, there are stock change points of the same PoleType as the index change point, as shown in Fig. 5, where ta - tb = tc - ta. It is difficult to judge which point is the real aligning point of this index point. So as not to affect the classification result, this index change point will not be aligned to either of the two stock change points.

The schematic diagram of the aligning algorithm is shown in Fig. 6. The solid line represents the index, and the dashed line represents the stock. The stock change point b is aligned with index change point a, and the stock change point d is aligned with index change point c.

If the nearest stock point of the same PoleType has already been aligned with another index point, as with points a, b and c in Fig. 7, where stock important point b has already been aligned with index important point c and the nearest stock point of the same PoleType to index important point a is also point b, then index point a would not be aligned to any point of the stock, and it should be marked as checked.

The whole process of the change points align algorithm could also be described by Algorithm 2.
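The aligning steps above can be sketched in a few lines. This is an illustrative reading of Steps 1 to 6, not the authors' Algorithm 2: times are assumed to be integer trading-day indices, change points are assumed to be `(time, PoleType)` pairs, and `max_offset` is a hypothetical bound on how far the search may widen (the source does not state a stopping bound).

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, str]  # (trading-day index, PoleType: "peak" or "trough")


def align_change_points(index_points: List[Point],
                        stock_points: List[Point],
                        max_offset: int) -> List[Dict[str, object]]:
    """Align each index change point to the nearest unused stock change point
    of the same PoleType, searching symmetrically backward and forward."""
    stock_set: Set[Point] = set(stock_points)
    used: Set[Point] = set()          # stock points already aligned
    records: List[Dict[str, object]] = []
    for t, pole in index_points:      # Step 1: pick an index change point
        for k in range(max_offset + 1):
            dec, inc = (t - k, pole), (t + k, pole)   # DecTime / IncTime
            has_dec, has_inc = dec in stock_set, inc in stock_set
            if not has_dec and not has_inc:
                continue              # Step 5: widen the search by one unit
            if k > 0 and has_dec and has_inc:
                break                 # equidistant on both sides: leave unaligned
            hit = dec if has_dec else inc
            if hit not in used:       # Steps 2-4: record only unused points
                used.add(hit)
                tag = "align" if k == 0 else ("lead" if has_dec else "lag")
                records.append({"StockTime": hit[0], "IndexTime": t,
                                "LeadorLag": tag})
            break                     # nearest candidate handled; next index point
    return records


# Toy data, invented for illustration only.
index_pts = [(5, "peak"), (10, "trough"), (15, "peak"), (20, "trough")]
stock_pts = [(3, "peak"), (10, "trough"), (17, "peak"), (19, "trough"), (21, "trough")]
align_table = align_change_points(index_pts, stock_pts, max_offset=3)
```

In the toy data, the last index point has same-type stock points one day before and one day after it, so it is left unaligned, which mirrors the special situation described for Fig. 5.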
Fig. 5 Example of two stock change points could be aligned with the index change point in both directions

Fig. 6 Example of the normal aligned change points couples

Fig. 7 Example of the nearest stock point has been aligned before

The maximum time complexity of Algorithm 2 is O(mn), where m is the number of the index important change points, and n is the number of the stock important change points needed to be aligned to the index. The minimum time complexity is O(m), which is an ideal state that every index point could find its aligned stock point at the first time and check the stock points without searching for other stock points.

After all the change points of the index have been aligned, the align table is obtained. Then the number of lead points of a stock and the number of lag points of a stock could be calculated. If one stock has more lead points than lag points when the important change points align with index change points, it should belong to the lead class. On the contrary, it should belong to the lag class. The SCA (stock classification algorithm) is described by Algorithm 3.
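The counting rule behind the classification can be illustrated with a short sketch. It takes the LeadorLag tags of one stock's align records and compares the counts; the function name `classify_stock` is hypothetical, and assigning a tie to the lead class is an assumption about how equal counts are treated.

```python
from collections import Counter
from typing import Iterable


def classify_stock(tags: Iterable[str]) -> str:
    """Return "lead" or "lag" from a stock's LeadorLag tags.

    Records tagged "align" do not count toward either class. A tie is
    assigned to the lead class here, which is an assumption of this sketch.
    """
    counts = Counter(tags)
    return "lead" if counts["lead"] >= counts["lag"] else "lag"


# Toy tag lists, invented for illustration only.
class_a = classify_stock(["lead", "align", "lag", "lead"])
class_b = classify_stock(["lag", "lag", "lead"])
```

One pass over the align records is enough, which matches a cost that grows linearly with the number of records.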
The time complexity of Algorithm 3 is O(a), where a is the number of the records in AlignData obtained from Algorithm 2.

Through Algorithm 3, it is obvious that the classification of a stock is based on the number of lead-type important points and the number of lag-type important points. The ICPSA is used to find the important points, and the CPAA is used to judge the type of the important points (lead-type or lag-type).

4 Stock temporal relationship analysis and stock prediction

Affected by the same event, similar stocks may have similar volatility, which is shown in Fig. 1. Due to the different reaction efficiencies, a sensitive stock may show volatility earlier, and its similar insensitive stock may show similar volatility after a time interval, which causes a temporal relationship between similar stocks. So, a stock of lead class may have a temporal relationship with similar stocks of lag class.

The dynamic time warping method is used to find the lagging stock's most similar leading stock. The process of finding the most similar stock is shown in Algorithm 4. In reality, because stocks in the lag class have no help with stock prediction, there is no need to calculate all the DTW distances one by one. In practice, a lag-class stock is picked, and DTW distances between this stock and the lead-class stocks are calculated to find a lead-class stock that could help improve the lag-class stock's trend prediction. Only the lag-class stock needs to find its most similar lead-class stock, and the lead-class stocks do not need to find their most similar lag-class stocks, so the proposed classification could reduce unnecessary computation.

The time complexity of Algorithm 4 is O(ls), where ls is the number of stocks classified as the lead-class stocks.

After obtaining a stock in the lag class and its most similar stock in the lead class through Algorithm 4, the real delay time between the lead-class and lag-class stock should be calculated by the delay time calculation algorithm (DTCA) to help improve the lag-class stock prediction. The align table obtained by Algorithm 2 could be used to obtain the time delay. Each stock has its own align table that records the align information of all its important points. By comparing the align tables of similar stocks, the delay time between similar pairs of stocks is calculated by Algorithm 5.

The maximum complexity of Algorithm 5 is O(p^2), where p is the number of index poles that have been aligned, and the minimum complexity of Algorithm 5 is O(p).

The time series of the lead-class stock's price and the delay time could be used to predict the trend of the lag-class stock. The principle is that if the current trend of the lead stock is known, and the delay time between the lead-class stock and the lag-class stock is known, the lag-class stock is supposed to get the same trend as the current lead-class stock trend after the delay time. A prediction verification algorithm is proposed as Algorithm 6.
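The search for a lag-class stock's most similar lead-class stock, described earlier in this section, can be sketched with a textbook DTW distance. This is a plain quadratic-time DTW with absolute difference as the local cost, used here as an assumption; the source does not specify the local cost or any windowing. The names `dtw_distance`, `most_similar_lead`, and the stock labels are hypothetical.

```python
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] matched again
                                 cost[i][j - 1],      # b[j-1] matched again
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def most_similar_lead(lag_series: Sequence[float],
                      lead_series: Dict[str, Sequence[float]]) -> str:
    """Return the code of the lead-class stock with the smallest DTW distance."""
    return min(lead_series,
               key=lambda code: dtw_distance(lag_series, lead_series[code]))


# Toy series, invented for illustration only.
lag = [1.0, 2.0, 3.0, 2.0, 1.0]
leads = {"LEAD_A": [1.0, 2.0, 3.0, 2.0, 1.0, 1.0], "LEAD_B": [5.0] * 5}
related = most_similar_lead(lag, leads)
```

Only one distance per lead-class stock is computed for the chosen lag-class stock, so the number of DTW evaluations grows with the size of the lead class rather than with the whole sample.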
The maximum time complexity of Algorithm 6 is O(s × Max(le, la)), where s is the number of couples that are formed by a lead-class stock and its most similar lag-class stock, le is the length of the current lead-class stock's ChangeRate series, la is the length of the current lag-class stock's ChangeRate series, and Max() returns the maximum value. The minimum time complexity of Algorithm 6 is O(s).

The framework of the proposed approach is shown in Fig. 8.

5 Performance evaluation

Actual stock data are used for the experiments following the algorithm described before, and three prediction models are used to verify the effectiveness of the
proposed classification algorithm and temporal correlation analysis.

5.1 Experiment

Shanghai Stock Index and its constituent stocks are used as experimental data. Two experiment groups used different models to verify the effectiveness of the stock classification and correlation analysis approach.

Group 1 The daily market data of 1437 stocks for the whole year of 2018 are used in the experiment. Close time series are used to classify the stocks and compute the dynamic time warping distance, and the market data of lead-class stocks after classification are used to predict the stock trend of their most similar stocks in the lag class. The trend of any stock in the lag class should be the same as that of its most similar stock in the lead class after the time difference between the two stocks. The prediction process follows the steps below:

Step 1 Choose a lag-class stock as the basic stock.

Step 2 Compute all the DTW distances between the basic stock and all the lead-class stocks, and find the basic stock's most similar lead-class stock as the related stock.

Step 3 DTCA is used to get the delay time dt between the basic and the related stock.

Step 4 Backtrack the time series of the related stock by dt and get the trend at that time point, which is the basic stock's future trend.

The PeakLimit and the TroughLimit are taken as 0.5 to screen important points of the stock index and change points of component stocks. Finally, following formula (1), 58 stocks (about 4.04% of the whole sample) are classified to the lead class, and 1379 stocks (about 95.96% of the whole sample) are classified to the lag class through the stock classification method proposed in Sect. 3. Questions 1 and 2 proposed in Sect. 1 are resolved.

Then stocks in the lag class are the new benchmark. Dynamic Time Warping is used to find the most similar stock in the lead class. Each stock in the lag class has its most similar stock in the lead class. Then stock pairs are obtained. It is believed that there is a temporal correlation between the stock of lead class and the stock of lag class in a pair of stocks. Then the delay time between two stocks in a stock pair is calculated by Algorithm 5. The delay time situation between lag-class stocks and similar lead-class stocks is partly shown in Table 3. This information, obtained from stock classification and correlation analysis, is needed to improve stock trend prediction.
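The backtrack idea in Step 4 of the Group 1 process can be illustrated with a small sketch: the predicted trend of the basic (lag-class) stock on day t is taken to be the trend of the related (lead-class) stock on day t - dt, and the prediction is checked against the basic stock's actual trend. The function names, the integer delay in trading days, and the assumption that both ChangeRate series share the same calendar are illustrative choices, not details from the source.

```python
from typing import List


def trend(rate: float) -> int:
    """Sign of a ChangeRate value: 1 upward, -1 downward, 0 no trend."""
    return (rate > 0) - (rate < 0)


def backtrack_accuracy(lead_rates: List[float],
                       lag_rates: List[float],
                       dt: int) -> float:
    """Share of days on which the lag stock's trend equals the lead stock's
    trend dt days earlier (0.0 if no day can be compared)."""
    hits = 0
    total = 0
    for t in range(dt, min(len(lag_rates), len(lead_rates) + dt)):
        total += 1
        if trend(lead_rates[t - dt]) == trend(lag_rates[t]):
            hits += 1
    return hits / total if total else 0.0


# Toy ChangeRate series, invented for illustration only.
lead = [0.01, -0.02, 0.03, -0.01, 0.02]
lag = [0.0, 0.0, 0.02, -0.01, 0.01, -0.03, 0.01]
acc_with_delay = backtrack_accuracy(lead, lag, dt=2)
acc_no_delay = backtrack_accuracy(lead, lag, dt=0)
```

In the toy data the lag series repeats the lead series' signs two days later, so using the delay matches every comparable day while ignoring it does not.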
Table 5 The settings of three scenarios for each part of the experiments in Group 2
Situations (no.) | Input (conditions): 10 days ChangeRate time series of basic stock | Input (conditions): 10 days ChangeRate time series of related stock | Input (conditions): Time difference | Output (target)
Group 2 In this group, we further explore the effect of the proposed method on the prediction model. The experiments are divided into three parts, and three prediction models are considered: LSTM, Bi-LSTM, and GRU. Three periods of different lengths of the daily data of the Shanghai Index and its component stocks are used to study how the proposed method behaves over different periods and market situations. The parameter settings for these experiments are detailed in Table 4, and we take the first part of the experiment as an example to illustrate.

In the first part, data from 2018-01-01 to 2018-12-31 are used to classify stocks into lead or lag class (abbreviated as "Period for classification"). Closing price time series are used for classification. Subsequently, the stock whose code is 600837.SH in lag class is chosen as the basic stock and its similar stock whose code is 600030.SH in lead class is chosen as the related stock. The classification has been completed, and the next stage is prediction, where three prediction models (LSTM, Bi-LSTM, and GRU) are used to predict the trend of the basic stock. Data from 2018-01-01 to 2019-01-31 are used as a dataset for training and testing prediction models (abbreviated as "Period for prediction"). Considering the dataset's period is about 13 months, most of the earlier data are used as the training set (12/13 of the total) and the remaining 1/13 as the test set.

For each part of the experiments and models, three situations were designed to validate the effectiveness of the proposed classification method and temporal analysis. These situations are progressive, and the information obtained from classification and temporal analysis is taken as the auxiliary information of the input for prediction models. Table 5 details the relevant settings. The output of the prediction model is the same in all situations, and the difference lies in the information contained in the inputs.
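To make the three progressive situations concrete, the following sketch builds input vectors from ChangeRate series: the basic stock's 10-day window only, the basic and related stocks' windows together, and both windows plus the time difference. It also shows a chronological 12/13 split. The function names and the sample-count-based split are assumptions made for illustration; the source describes the split in terms of the dataset period.

```python
from typing import List, Sequence, Tuple

WINDOW = 10  # days of ChangeRate history per stock

Sample = Tuple[List[float], float]  # (input vector, next-day ChangeRate of basic stock)


def build_samples(basic: Sequence[float], related: Sequence[float],
                  time_diff: int, situation: int) -> List[Sample]:
    """Build (input, target) pairs for situation 1, 2 or 3."""
    samples: List[Sample] = []
    for end in range(WINDOW, len(basic)):
        x = list(basic[end - WINDOW:end])          # situation 1: basic stock only
        if situation >= 2:
            x += list(related[end - WINDOW:end])   # situation 2: add related stock
        if situation == 3:
            x.append(float(time_diff))             # situation 3: add time difference
        samples.append((x, basic[end]))            # target: day after the window
    return samples


def chronological_split(samples: List[Sample], train_parts: int = 12,
                        total_parts: int = 13) -> Tuple[List[Sample], List[Sample]]:
    """Keep the earliest train_parts/total_parts of the samples for training."""
    cut = len(samples) * train_parts // total_parts
    return samples[:cut], samples[cut:]


# Toy ChangeRate series, invented for illustration only.
basic_rates = [0.001 * i for i in range(36)]
related_rates = [0.002 * i for i in range(36)]
s1 = build_samples(basic_rates, related_rates, time_diff=2, situation=1)
s3 = build_samples(basic_rates, related_rates, time_diff=2, situation=3)
train, test = chronological_split(s3)
```

The resulting input lengths are 10, 20 and 21 for the three situations, and the split preserves time order so that no later observation is used for training.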
[Figure: LSTM input and output in three situations. Situation 1: input is the Stock-1 (Lag) ChangeRate series C1-1 ... C1-10; output is C1-11. Situation 2: input is the Stock-1 (Lag) series C1-1 ... C1-10 and the Stock-2 (Lead) series C2-1 ... C2-10; output is C1-11. Situation 3: input is the Stock-1 (Lag) series, the Stock-2 (Lead) series, and the time difference between Stock-1 and Stock-2; output is C1-11. In each situation the LSTM is followed by a fully connected (FC) layer.]
The following is an example of the input and output of the LSTM in the three situations (Fig. 9).
1. Situation 1: only the 10-day ChangeRate time series of the basic stock is used to predict the ChangeRate value on the 11th day.
2. Situation 2: the 10-day ChangeRate time series of the basic stock and the ChangeRate time series of the related stock over the same days are used to predict the basic stock’s ChangeRate value on the 11th day.
3. Situation 3: the 10-day ChangeRate time series of the basic stock and the ChangeRate time series of the related stock over the same days, together with the time difference (rounded up) obtained from the approach described in Sect. 4, are used to predict the basic stock’s ChangeRate value on the 11th day.
Situations 2 and 3 also show how to use sensitive stocks’ information to improve the trend prediction accuracy of insensitive stocks without future information, which answers Question 3 in Sect. 1.
To verify the effectiveness of the information obtained through the algorithms in this work while limiting the influence of the model as much as possible, the LSTM model used here is the standard LSTM model; no variant of the model is considered. To further verify the effect of the exogenous information on different models, the Bi-LSTM and GRU models have also been evaluated. Their inputs are the same as those of the LSTM model. The parameter settings of these models are shown in Table 6 below. In all the models, the loss function is MSE, and the learning rate is 0.01.
Table 6 The parameter settings of the LSTM, Bi-LSTM, and GRU
Parameters                      LSTM       Bi-LSTM    GRU
Num of layer                    3          1          2
Num of full connection layer    1          1          1
Input size                      10/20/21   10/20/21   10/20/21
Num of hidden units             5/10/10    5/10/10    5/10/10
Output size                     1          1          1
5.2 Result and discussion
In Group 1, the result of stock classification is obtained first. To test whether it improves stock trend prediction, the stock market time series data of the lead stocks and the delay time are used to predict the trend of the lag stock. The actual stock change time series of the lag class is then used to verify whether the lag-class stock fluctuates with its similar lead-class stock after the time difference between the two stocks.
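How a time difference links a lead-class series to a lag-class series can be illustrated with a small check: shift the lead stock's daily trend forward by the time difference and count how often it matches the lag stock's trend. This is a hedged sketch and not the authors' procedure. The helper names `trend` and `shifted_trend_agreement`, the flat-band threshold `eps`, and the toy series are assumptions made for illustration, and labelling a ChangeRate value as up, flat, or down is only one plausible way to define three trend classes.

```python
from typing import List


def trend(change_rate: float, eps: float = 1e-3) -> int:
    """Map a ChangeRate value to a trend label: 1 up, -1 down, 0 flat.

    The threshold eps is an assumed value, not a setting from the paper.
    """
    if change_rate > eps:
        return 1
    if change_rate < -eps:
        return -1
    return 0


def shifted_trend_agreement(lead: List[float],
                            lag: List[float],
                            time_diff: int,
                            eps: float = 1e-3) -> float:
    """Fraction of days on which lag[t] shows the trend of lead[t - time_diff]."""
    if time_diff < 0:
        raise ValueError("time_diff must be non-negative")
    # Pair lead[t - time_diff] with lag[t]; zip stops at the shorter side.
    pairs = list(zip(lead, lag[time_diff:]))
    if not pairs:
        raise ValueError("series too short for this time difference")
    hits = sum(trend(a, eps) == trend(b, eps) for a, b in pairs)
    return hits / len(pairs)


# Toy data: the lag series repeats the lead series' moves two days later
# at half the size; its first two values are arbitrary.
lead = [0.02, -0.01, 0.0, 0.03, -0.02, 0.01, -0.03, 0.02]
lag = [0.0, 0.01] + [0.5 * x for x in lead[:-2]]

agreement_with_td = shifted_trend_agreement(lead, lag, time_diff=2)
agreement_without = shifted_trend_agreement(lead, lag, time_diff=0)
```

On this constructed example the agreement is higher with the two-day shift than without it, which is the kind of effect the time difference is meant to capture; real stock data would of course be far noisier.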
The final prediction accuracy is 67.5%, an obvious improvement over the prediction accuracy of 33.3% (the theoretical accuracy of three-class classification) without using the stock classification information and the temporal relationship between stocks.
In Group 2, the accuracy of trend prediction is shown in Table 7.
Table 7 Experiment results of Group 2 (SCTS: stock change time series, TD: time difference)
Period for prediction | Basic stock (Lag-class) | Related stock (Lead-class) | Time difference | Models | Accuracy of trend prediction: Situation 1 (1 SCTS) | Situation 2 (2 SCTS) | Situation 3 (2 SCTS + TD)
With the information obtained from classification and correlation analysis, the accuracy of trend prediction improved. With the data of the most similar lead-class stock, higher accuracy could be obtained regardless of the prediction model. The input data of the different models are enlarged step by step to verify the effectiveness of the related stock’s information obtained by the correlation analysis. Moreover, all the models get higher accuracy with the exogenous information, which shows that the classification and the correlation analysis are effective.
The classification of stocks differs between periods. A stock in the lead class in one period may become a lag-class stock in another period, so the pairs of stocks from different classes are not the same in different periods. However, the prediction results show that, whatever the classification result, the lead-class stock obtained by the approach proposed in this work can always improve the trend prediction of the related lag-class stock.
In the experiments, the lead-class stock’s data improved the accuracy of the lag-class stock trend prediction in all the periods tested. However, classification and correlation analysis using half-year stock data could sometimes give more accurate predictions than classification using whole-year data. This situation is discussed and analyzed below:
Because of the complexity of the financial market, different periods and different period lengths can yield different prediction accuracy. It is reasonable that the sensitive stocks differ between periods: some stocks that are sensitive in one period may become insensitive in another. That may be caused by stocks’ different reaction times to different market events. If a period is so long that it includes too many events, including events to which a stock is sensitive and events to which it is insensitive, the stock’s lead or lag status may be cancelled out. The classification would then be inaccurate, and the time difference obtained from the proposed approach would also be inaccurate, which would lead to low prediction accuracy. That is why, when only the stock’s own time series is used, prediction with long-period data is more accurate than with short-period data. However, prediction accuracy using short-period data improves when combined with the lead-class stock’s information and the time difference.
6 Conclusion
A financial time series classification and temporal correlation analysis approach based on aligning change points is proposed in this paper. The time series of the stock index is taken as the benchmark to classify its component stocks into a lead class and a lag class. A stock in the lead class and its most similar stock in the lag class then have a temporal correlation. The stock in the lead class and the time difference between the two stocks are used to predict the trend of the stock in the lag class. This work aims to find useful exogenous information to improve the prediction accuracy of stock trends; the classification of stocks, the pairs of similar stocks from different classes, and the time difference between stock
pairs are all exogenous information obtained by the correlation analysis. The experiment shows that the classification in this approach is effective, and that the temporal correlation based on the classification is useful and could obviously improve the stock trend prediction accuracy.
Appendix
See Table 8.
Table 8
Datasets
S&P 500     The Standard & Poor’s 500 Index
DJIA        The Dow Jones industrial average
Nasdaq      National Association of Securities Dealers Automated Quotations
CSM         Chinese Stock Market
NSE         National Stock Exchange
EURUSD      The exchange rate of Euro to US dollar
SSCI        Shanghai Securities Composite Index
SZSESMEP    Shenzhen Stock Exchange Small & Medium Enterprises Price Index
CNPI        ChiNext Price Index
CSC         Chinese Securities Company
Input
O           The Opening price
H           The Highest price
L           The Lowest price
C           The Closing price
V           The trading Volume
TI          Technical Indicators
MI          Macroeconomic Indicator
MFTS        Multivariate Financial Time Series
A           The trading Amount
IP          Investor Profiles
CTFD        Client Transaction Flow Data
CR          Change Rate
DT          Delay Time
Preprocessing
CMDV        Correlation Matrix of Different Variables
ICGN        Improved Graph Convolutional Network
CNN         Convolutional Neural Network
PCC         Pearson Correlation Coefficient
APCC        Absolute Pearson Correlation Coefficient
NVG         Natural Visibility Graph
Model
LSTM        Long Short-Term Memory
Meta        Meta-classifier
Lasso       Least Absolute Shrinkage and Selection Operator
GRU         Gated Recurrent Unit
RNN         Recurrent Neural Networks
GC-CNN      Graph Convolutional Feature based Convolutional Neural Network
VNN         Vanilla Neural Network
CNN         Convolutional Neural Network
Acknowledgements This work is supported by the National Natural Science Foundation of China [grant number 61573118]; the Science and Technology Planning Project of Shenzhen Municipality [grant number JCYJ20190806112210067].
Author contributions Conceptualization: [ML]; Methodology: [ML]; Formal analysis and investigation: [ML], [SW]; Software: [ML]; Writing (original draft preparation): [ML]; Writing (review and editing): [ML], [XW], [SW]; Funding acquisition: [XW]; Resources: [ML], [XW]; Supervision: [XW].
Funding This work was supported by the National Natural Science Foundation of China [Grant No.: 61573118]; the Science and Technology Planning Project of Shenzhen Municipality [Grant No.: JCYJ20190806112210067].
Data availability The stock data used in this article were collected from Tushare Pro (https://tushare.pro/).
Code availability All code was written in Python.
Declarations
Conflict of interest The authors have no conflicts of interest to declare that are relevant to the content of this article.
Liu YT, Zhang YA, Zeng M (2019) Adaptive global time sequence
averaging method using dynamic time warping. IEEE Trans
References Signal Process 67:2129–2142. https://doi.org/10.1109/TSP.2019.
2897958
Alhnaity B, Abbod M (2020) A new hybrid financial time series Long J, Chen Z, He W et al (2020) An integrated framework of deep
prediction model. Eng Appl Artif Intell 95:103873. https://doi. learning and knowledge graph for prediction of stock price trend:
org/10.1016/j.engappai.2020.103873 an application in Chinese stock exchange market. Appl Soft
Bhandari HN, Rimal B, Pokhrel NR et al (2022) Predicting stock Comput J 91:106205. https://doi.org/10.1016/j.asoc.2020.
market index using LSTM. Mach Learn with Appl 9:100320. 106205
https://doi.org/10.1016/j.mlwa.2022.100320 Majumdar S, Laha AK (2020) Clustering and classification of time
Bi X, Zhang C, Wang F et al (2022) An uncertainty-based neural series using topological data analysis with applications to
network for explainable trajectory segmentation. ACM Trans finance. Expert Syst Appl. https://doi.org/10.1016/j.eswa.2020.
Intell Syst Technol 13:1–18. https://doi.org/10.1145/3467978 113868
Chen W, Jiang M, Zhang WG, Chen Z (2021) A novel graph Mohanty DK, Parida AK, Khuntia SS (2021) Financial market
convolutional feature based convolutional neural network for prediction under deep learning framework using auto encoder
stock trend prediction. Inf Sci (NY) 556:67–94. https://doi.org/ and kernel extreme learning machine. Appl Soft Comput
10.1016/j.ins.2020.12.068 99:106898. https://doi.org/10.1016/j.asoc.2020.106898
Dang HV, Lin M (2016) Herd mentality in the stock market: on the Nasseri AL, Tucker A, De Cesare S (2015) Quantifying StockTwits
role of idiosyncratic participants with heterogeneous informa- semantic terms’ trading behavior in financial markets: an
tion. Int Rev Financ Anal 48:247–260. https://doi.org/10.1016/j. effective application of decision tree algorithms. Expert Syst
irfa.2016.10.005 Appl 42:9192–9210. https://doi.org/10.1016/j.eswa.2015.08.008
Du J, Rada R (2018) A semantic-based, distance-proportional Niu T, Wang J, Lu H et al (2020) Developing a deep learning
mutation for stock classification. Expert Syst Appl 95:212–223. framework with two-stage feature selection for multivariate
https://doi.org/10.1016/j.eswa.2017.11.029 financial time series forecasting. Expert Syst Appl 148:113237.
Efendi R, Arbaiy N, Deris MM (2018) A new procedure in stock https://doi.org/10.1016/j.eswa.2020.113237
market forecasting based on fuzzy random auto-regression time Sakoe H, Chiba S (1978) Dynamic programming algorithm opti-
series model. Inf Sci (ny) 441:113–132. https://doi.org/10.1016/j. mization for spoken word recognition. IEEE Trans Acoust
ins.2018.02.016 26:43–49. https://doi.org/10.1109/TASSP.1978.1163055
Hájek P (2018) Combining bag-of-words and sentiment features of Shih SY, Sun FK, Lee H (2019) Temporal pattern attention for
annual reports to predict abnormal stock returns. Neural Comput multivariate time series forecasting. Mach Learn
Appl 29:343–358. https://doi.org/10.1007/s00521-017-3194-2 108:1421–1441. https://doi.org/10.1007/s10994-019-05815-0
He J, Shang P (2017) Comparison of transfer entropy methods for Soheily-Khah S, Marteau PF (2019) Sparsification of the alignment
financial time series. Phys A Stat Mech Its Appl 482:772–785. path search space in dynamic time warping. Appl Soft Comput J
https://doi.org/10.1016/j.physa.2017.04.089 78:630–640. https://doi.org/10.1016/j.asoc.2019.03.009
Huang Y, Mao X, Deng Y (2021) Natural visibility encoding for time Thakkar A, Patel D, Shah P (2021) Pearson correlation coefficient-
series and its application in stock trend prediction. Knowl-Based based performance enhancement of vanilla neural network for
Syst 232:107478. https://doi.org/10.1016/j.knosys.2021.107478 stock trend prediction. Neural Comput Appl 33:16985–17000.
Iwana BK, Frinken V, Uchida S (2020) DTW-NN: A novel neural https://doi.org/10.1007/s00521-021-06290-2
network for time series recognition using dynamic alignment Tsinaslanidis PE (2018) Subsequence dynamic time warping for
between inputs and weights. Knowl-Based Syst 188:104971. charting: bullish and bearish class predictions for NYSE stocks.
https://doi.org/10.1016/j.knosys.2019.104971 Expert Syst Appl 94:193–204. https://doi.org/10.1016/j.eswa.
Jazayeri S, Saghafi A, Esmaeili S, Tsokos CP (2019) Automatic 2017.10.055
object detection using dynamic time warping on ground Tsinaslanidis PE, Kugiumtzis D (2014) A prediction scheme using
penetrating radar signals. Expert Syst Appl 122:102–107. perceptually important points and dynamic time warping. Expert
https://doi.org/10.1016/j.eswa.2018.12.057 Syst Appl 41:6848–6860. https://doi.org/10.1016/j.eswa.2014.
Jiang M, Liu J, Zhang L, Liu C (2020a) An improved Stacking 04.028
framework for stock index prediction by leveraging tree-based Udagawa Y (2017) Approach for retrieving similar stock price
ensemble models and deep learning algorithms. Phys A Stat patterns using dynamic programming method. ACM Int Conf
Mech Its Appl 541:122272. https://doi.org/10.1016/j.physa.2019. Proceeding Ser. https://doi.org/10.1145/3151759.3151820
122272 Vargas MR, De Lima BSLP, Evsukoff AG (2017) Deep learning for
Jiang Y, Qi Y, Wang WK et al (2020b) EventDTW: An improved stock market prediction from financial news articles. 2017 IEEE
dynamic time warping algorithm for aligning biomedical signals int conf comput intell virtual environ meas syst appl CIVEMSA
123
3672 M. Liang et al.
2017—Proc 60–65. https://doi.org/10.1109/CIVEMSA.2017. 2016 tackling new challenges automation and computing,
7995302 pp 108–113. https://doi.org/10.1109/IConAC.2016.7604903
Vaughan N, Gabrys B (2020) Scoring and assessment in medical VR Yao X, Wei HL (2017) Short-term stock price forecasting based on
training simulators with dynamic time series classification. Eng similar historical patterns extraction. ICAC 2017—2017 23rd
Appl Artif Intell 94:103760. https://doi.org/10.1016/j.engappai. IEEE International conference on automation and computing
2020.103760 Addressing Glob Challenges through automation and computing,
Wang XX, Xu LY, Yu J et al (2019) Detection of correlation pp 7–8. https://doi.org/10.23919/IConAC.2017.8082009
characteristics between financial time series based on multi- Zhang Y, Yan B, Aasma M (2020) A novel deep learning framework:
resolution analysis. Adv Eng Inform 42:100957. https://doi.org/ Prediction and analysis of financial time series using CEEMD
10.1016/j.aei.2019.100957 and LSTM. Expert Syst Appl 159:113609. https://doi.org/10.
Wang Y (2017) Stock market forecasting with financial micro-blog 1016/j.eswa.2020.113609
based on sentiment and time series analysis. J Shanghai Jiaotong
Univ 22:173–179. https://doi.org/10.1007/s12204-017-1818-4 Publisher’s Note Springer Nature remains neutral with regard to
Wu S, Wang X, Liang M, Wu D (2021) Pfc: A novel perceptual jurisdictional claims in published maps and institutional affiliations.
features-based framework for time series classification. Entropy
23:1–23. https://doi.org/10.3390/e23081059
Springer Nature or its licensor (e.g. a society or other partner) holds
Yang CY, Chen PY, Wen TJ, Jan GE (2019) Imu consensus exception
exclusive rights to this article under a publishing agreement with the
detection with dynamic time warping—a comparative approach.
author(s) or other rightsholder(s); author self-archiving of the
Sensors (switzerland). https://doi.org/10.3390/s19102237
accepted manuscript version of this article is solely governed by the
Yao X, Wei HL (2016) Off-line signature verification based on a new
terms of such publishing agreement and applicable law.
symbolic representation and dynamic time warping. 2016 22nd
International conference on automation and computing ICAC
123