
Expert Systems With Applications 228 (2023) 120474

Contents lists available at ScienceDirect

Expert Systems With Applications


journal homepage: www.elsevier.com/locate/eswa

A hybrid stock market prediction model based on GNG and reinforcement learning

Yongming Wu a, Zijun Fu a,b,*, Xiaoxuan Liu a, Yuan Bing a
a State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
b College of Computer Science and Technology, Guizhou University, Guiyang 550025, China

ARTICLE INFO

Keywords:
Reinforcement learning
Stock market prediction
Growing neural gas
Reward function
Triple Q-learning

ABSTRACT

The stock market is a dynamic, complex, and chaotic environment, which makes predictions for the stock market difficult. Many prediction methods are applied to the stock market, but most are supervised learning and cannot effectively parse the trading information present in the stock market. This paper proposes a prediction model that combines unsupervised learning with reinforcement learning to address this problem. Firstly, we capture the stock trend from historical stock data and construct the trading environment state of the market by the growing neural gas (GNG) algorithm in unsupervised learning. Secondly, the reward function is restructured to provide timely feedback on the trading information present in the stock trading market. Finally, a novel trading agent algorithm, Triple Q-learning, is designed to execute the corresponding trading behavior and make comprehensive predictions of the stock market based on the environment state constructed by GNG. Experimental results on several stock datasets demonstrate that the proposed model outperforms other comparative models in this paper.

1. Introduction

One of investors' primary concerns in the stock market is accurately predicting its trends. Due to the high volatility of most stock prices and the numerous factors influencing their movements, predicting the stock market is a daunting task (Li, Liang, & Huynh, 2022). Consequently, investors have had to rely on traditional statistical methods for making predictions. In recent years, the widespread use of computer technology has popularized machine learning in stock market prediction, and it has gained recognition from a growing number of investors due to its stable investment performance (Kumbure, Lohrmann, Luukka, & Porras, 2022). As the scale of international financial markets continues to expand, studying efficient and stable stock market prediction algorithms is essential for promoting the update and iteration of trading methods.

The study of stock market prediction has yielded numerous mature solutions. Most prediction methods, known as predictive regression models, aim to predict the exact stock prices (Yang, Liu, Peng, & Cai, 2021). These models primarily rely on convolutional neural networks (CNN) and recurrent neural networks (RNN) in deep learning to predict the stock market. Zhang, Yuan, & Shao (2018) proposed deep and wide neural networks (DWNN) by adding CNN's convolutional layer to RNN's hidden state transmission process for stock price prediction. Chung & Shin (2020) combined a genetic algorithm (GA) with multi-channel CNN and proposed GA-CNN to predict the fluctuations of stock indices. Chen et al. (Chen, Jiang, Zhang, & Chen, 2021) proposed a graph convolutional neural network (GC-CNN) model that uses graph convolutional features to predict stock trends. Meanwhile, Qiu, Yang, Lu, & Chen (2020) combined multi-layer long short-term memory, multi-layer gated recurrent unit, and single-layer ReLU to propose a hybrid RNN model for stock market prediction. Wang, Xu, Huang, & Yang (2019) designed a new RNN-based ensemble learning (RNN-EL) framework, combining trade-based features with listed company features to predict stock price trends effectively. While these models can efficiently and quickly predict stock prices, they often ignore market behavior in the stock market. Since stock prices are merely the final result of a series of trading behaviors in the stock market, their predictive accuracy is often low when stock prices fluctuate.

To obtain more accurate predictions, researchers use historical price time series data to train models and learn about the trading behavior in the stock market. These models are known as symbol prediction models (Jeon, Hong, & Chang, 2018). They predict behavior or prices to assist in deciding whether to make a long (predicting that stock prices will rise in the future) or short (predicting that stock prices will fall in the future)

* Corresponding author at: State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China.
E-mail address: [email protected] (Z. Fu).

https://doi.org/10.1016/j.eswa.2023.120474
Received 20 October 2022; Received in revised form 10 May 2023; Accepted 10 May 2023
Available online 23 May 2023
0957-4174/© 2023 Elsevier Ltd. All rights reserved.

trade to maximize overall profits during a given period. Li, Tan, Wang, & Chen (2020) proposed an event-driven long short-term memory (LSTM) neural network model. Ray, Ganguli, & Chakrabarti (2021) combined LSTM, a Bayesian structural time series (BST) model, and regression components to detect abnormal behavior or abnormal patterns in stock price movements. Yang, Zhang, Xiong, Wei, Ng, Xu, & Dong (2018) developed a two-layer gated recurrent unit (GRU) model with attention-based mechanisms to predict stock price trends, where the attention layer assigns weights to capture important information, and GRU extracts features from it. Lv, Wang, Gao, & Zhao (2021) designed a LightGBM-optimized LSTM model for short-term stock price prediction, outperforming CNN and RNN algorithms. These models can accurately predict future trends in the stock market based on historical data. However, stock prices of all companies are affected by various objective factors such as the company's business situation, government policies, and natural disasters (Xu, Chai, Luo, & Li, 2022), as well as subjective factors such as market sentiment and investor psychology (Du & Tanaka-Ishii, 2022; Malandri, Xing, Orsenigo, Vercellis, & Cambria, 2018). The models mentioned above have not considered these factors during the prediction process, leading to inaccurate predictions.

With the recent advances in natural language processing (NLP) techniques, researchers have better captured market and investor sentiment from financial text data and applied it to stock market predictions. Xing, Cambria, & Zhang (2019) proposed a new model called sentiment-aware volatility forecasting (SAVING), which incorporates market sentiment into stock return volatility prediction. Picasso, Merello, Ma, Oneto, & Cambria (2019) proposed a combination method, using news data for technical indicators and sentiment analysis for stock trend prediction. They claimed that their model effectively predicted the trends of a portfolio consisting of the 20 most capitalized companies listed on the NASDAQ 100 index. Xing, Cambria, & Welsch (2018) used sentiment analysis and text mining techniques to compute sentiment time series from social media and employed LSTM and Bayesian methods to process the obtained sentiment time series for stock market prediction. Ma, Mao, Lin, Wu, & Cambria (2023) combined the digital features of the target stock, the market-driven news sentiment, and the related news sentiment to propose a new multi-source aggregated classification (MAC) method for predicting stock price trends. These models effectively utilize investor sentiment as an external factor to predict the stock market, and the prediction results are more accurate than before. However, since investor behavior in the stock market is constantly changing, and the models above only use supervised learning methods to predict the stock market, this may lead to overfitting and inaccurate predictions based on current investor behavior (Pendharkar & Cusatis, 2018).

Currently, a model called reinforcement learning has been introduced to address the issues mentioned above. Schultz, Dayan, and Montague first proposed reinforcement learning in 1997 (Schultz, Dayan, & Montague, 1997). Unlike most machine learning methods, reinforcement learning models capture the factors influencing prediction results through an agent and continuously try to find which actions can produce the best results. The benefits of reinforcement learning have constantly been verified in many application scenarios. Hou et al. (Hou, Fei, Deng, & Xu, 2020) used hierarchical deep reinforcement learning to solve the reward sparsity problem in robot double peg-hole assembly tasks. Peng, Ma, Poria, Li, & Cambria (2021) proposed the disambiguate intonation for sentiment analysis (DISA) based on the principle of reinforcement learning, which can eliminate the intonation ambiguity of each Chinese character (Pinyin) and thus learn accurate speech representation. Li et al. (Li, Wang, & Yang, 2021) proposed an optimized scheduling model for isolated microgrids based on automatic reinforcement learning for the multi-period prediction of renewable energy generation and load, thereby reducing system operating costs. Ahmadi et al. (Ahmadi et al., 2022) proposed a combination of spectral clustering and deep-Q reinforcement learning integration, called DQRE-SCnet, which can effectively reduce the communication rounds required in federated learning.

In the financial field, reinforcement learning has been applied with good results. Zhang et al. (2020) proposed a cost-sensitive portfolio selection method with deep reinforcement learning based on a novel dual-stream investment portfolio strategy network and a new cost-sensitive reward function. Chakole, Kolhe, Mahapurush, Yadav, & Kurhekar (2021) proposed two different methods, clustering and candlestick, to represent the discrete state of the environment and used the Q-learning algorithm of reinforcement learning to train trading agents to find the optimal dynamic trading strategy. Carta, Ferreira, Podda, Recupero, & Sanna (2021) proposed an integration method of reinforcement learning agents by training different models in different iterations (periods) and further analyzing the behavior of these integrations based on different protocol thresholds. Shi, Li, Zhu, Guo, & Cambria (2021) combined the end-to-end Double DQN model with CNN to predict stock market trends. Xu et al. (2022) developed a novel stock price prediction model based on reinforcement learning and attention mechanism using bidirectional gated recurrent unit networks. Unlike most machine learning methods, a reinforcement learning model can train continuously to find the action that produces the maximum return and then implement it, which achieves good results in stock market prediction. However, since the environment state of the stock market is dynamic, complex, and chaotic (Shahi, Shrestha, Neupane, & Guo, 2020), and the reward function of the models mentioned above cannot provide timely feedback on the trading information in the stock market, the prediction results of the model are sometimes unsatisfactory. Furthermore, the trading agent of the models mentioned above only relies on the greedy algorithm to value the previous actions (Peer, Tessler, Merlis, & Meir, 2021), which usually leads to biases such as overestimation and underestimation, thereby damaging the policy of the trading agent (Fischer, Eyberg, Werling, Lauer, & Algorithms, 2021).

To address the above issues, in this paper, we propose to combine unsupervised learning GNG with reinforcement learning to predict stock market trends. The model leverages unsupervised learning methods to group historical information from stock market data and reinforcement learning to represent the behavior of the stock market (environment). In this study, we introduce growing neural gas (GNG) from unsupervised learning for the first time to capture the trend of stock data and construct the environment state by classifying the historical trading data in the stock market through GNG. At the same time, we redesign the reward function so that the model can provide timely feedback on the trading information in the stock market and improve the model's performance in capturing stock price trends. Finally, we design a strategy algorithm called Triple Q-learning, which preserves three Q-tables during the training phase and updates them with equal probability during the training process. This enables the algorithm to converge to an optimal policy faster and overcome the maximization bias generated by the trading agent.

The contributions of this paper are as follows:

1. We have redesigned the reinforcement learning method for constructing the environment state by utilizing the GNG model from unsupervised learning to group historical stock trading data. This approach allows the model to represent the environment state using a limited number of states for trading. To our knowledge, this is the first time the GNG model has been applied in the context of stock market prediction.
2. We have redesigned the reward function to evaluate the trading behavior of the model output. The new reward function facilitates timely feedback on the trading information present in the stock trading market and enhances the model's ability to capture stock price movements.
3. We have developed a novel trading agent algorithm called Triple Q-learning, which constructs the trading agent of the model. In this algorithm, we keep three Q-tables in the training phase to iteratively update the Q-values. This approach solves the maximization bias


generated by traditional trading agent algorithms and improves the iteration speed of the model.
4. We have conducted extensive experiments on the proposed and other comparative models on stock datasets from two domains (index and individual stock datasets). The experimental results show that our proposed model is competitive on all datasets.

The rest of the paper is organized as follows: Section 2 explains the relevant theoretical background of reinforcement learning, Q-learning, and GNG. Section 3 describes the proposed model and related theories. Section 4 presents the datasets and evaluation metrics used for the experiments and discusses the results of our experiments. Finally, Section 5 concludes the paper and outlines subsequent work.

2. Background

This section introduces the fundamental theories, focusing on the concepts of GNG and reinforcement learning.

2.1. Growing neural gas

The GNG self-organizing neural network was first proposed by Fritzke (Fritzke, 1994) and is based on the study of neural gas first presented by Martinetz (Martinetz, Berkovich, & Schulten, 1993). GNG is a competitive, unsupervised neural network with features such as scalability, flexibility, rapid adaptability, and excellent input space (Liu, Ishibuchi, Masuyama, & Nojima, 2019). Compared to other unsupervised algorithms at the time, GNG has significant advantages. For example, unlike the K-means clustering method, GNG does not require the number of clusters to be specified in advance and can automatically learn the topological structure of data while smoothing the feature distribution, enabling better expression of the data using a good topological structure (Mahmoudabadi, Kuchaki Rafsanjani, & Javidi, 2021). GNG also overcomes the drawback of other unsupervised algorithms that cannot generate optimal topological maps for any input dataset under time constraints (Wu, Chen, Li, & Chen, 2021). The performance of the GNG algorithm has been further validated in (Liu, Jin, Heiderich, Rodemann, & Yu, 2020; Wu et al., 2021), where GNG is extended to handle non-stationary data. The classical GNG algorithm is described in Algorithm 1.

Algorithm 1
GNG.

1: Initialize the set of all nodes in the network G by two nodes with random weight vectors: $j \leftarrow 0$, $s \leftarrow 0$
2: Initialize parameters: $\partial_{j_n}$, $\partial_{j_t}$, $age_{max}$, $L$, $\beta$
3: for each node n from the set of all topological neighbors do
4:     $s \leftarrow s + 1$
5:     Find the node $j_n$ closest to the input n
6:     Calculate local error: $e_{j_n} \leftarrow e_{j_n} + \lVert n - j_n \rVert^2$
7:     Adjust weight for the winner node $j_n$ and the nodes m in its neighborhood: $j_n \leftarrow j_n + \partial_{j_n}(n - j_n)$, $j_m \leftarrow j_m + \partial_{j_m}(m - j_m)$, $\forall m$
8:     Create an edge between $j_n$ and the node in the neighborhood m if it does not exist: $age_{j_n,m} \leftarrow 0$
9:     for each edge e in all topological neighbors do
10:        if $age > age_{max}$ then
11:            Delete old edges, delete isolated nodes
12:        end if
13:        if s is multiple of L then
14:            Generate a new node $j_t$ between $j_n$ and $j_m$
15:            $j_t \leftarrow \frac{1}{2} e_{max}$
16:            Delete the edge between $j_n$ and $j_m$
17:            Create the edge between $j_n$ and $j_t$
18:            Create the edge between $j_t$ and $j_m$
19:            $age_{j_t} \leftarrow 0$
20:        end if
21:    end for
22:    Adjust the error of all nodes: $E \leftarrow \beta E$
23: end for

The parameters $\partial_{j_n}$, $\partial_{j_t}$, $age_{max}$, $L$, and $\beta$ in the above algorithm represent the learning coefficient of the winning node, the learning coefficient of the neighboring nodes, the maximum connection age, the period of the new node, and the global error adjustment coefficient. In GNG, each node has an accumulated error variable, which is updated in each iteration of the algorithm based on the Euclidean distance between the node and input data and is used to adjust the weights of the winning node and its neighboring nodes. A larger error value means that local error updates can map input data to nodes further away, with a wider coverage range and better adaptation to data patterns. Subsequently, when new input data is presented, the connection age between the winner and its neighbors will increase by 1, and connections with an age greater than the predefined maximum connection age $age_{max}$ will be deleted. Finally, the error of all nodes is adjusted, and the entire network is updated, waiting for the next input of data.

2.2. Reinforcement learning

Unlike other machine learning methods, reinforcement learning can maximize the expected return through trial and error by interacting with the environment (Guan et al., 2021). Reinforcement learning was first proposed by Sutton and Barto (Sutton & Barto, 2018) and is typically represented as a Markov Decision Process (MDP). MDP is interpreted as an autonomous learning process, which describes the problem as a five-tuple $(S, A, P, R, \gamma)$, where:

• S: A finite set of states. It includes an agent receiving information from the environment and involves representing information in the environment and describing the problem.
• A: A finite set of actions. It represents an agent's actions based on the information collected from the state set S.
• P: Action-based state transfer matrix. It represents the probability of entering the next state $S_{t+1}$ when performing action A under state $S_t$, $P_{ss'}^{a} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]$.
• R: State and action-based reward function. It represents the expected reward received by transitioning from state $S_t$ to state $S_{t+1}$ through the execution of action A, $R_s^a = \mathbb{E}[R_{t+1} \mid S_t = s, A_t = a]$.
• γ: A discount factor. It is used to discount future rewards and avoid infinite rewards when dealing with cyclic Markov processes, $\gamma \in (0, 1)$.

As seen from the above, the state transition probability P and reward function R in the MDP process depend not only on the individual's current state S but also on the action A chosen by the individual.

During the MDP process, an agent can select an action from a set of actions based on its understanding of the current state. This selection is known as a policy, denoted by the letter π. A policy π determines the agent's action and is a probability distribution over the set of possible actions in a given state, represented as:

$$\pi(a \mid s) = \mathbb{P}[A_t = a \mid S_t = s] \tag{1}$$

The primary objective of MDP is to identify the policy that maximizes the payoff. During the training phase, the agent can explore multiple policies. However, only one enables the individual to achieve the highest possible gain during the interaction with the environment. This policy is commonly referred to as the optimal policy, i.e.

$$\underset{\pi}{\arg\max}\ \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R_{\pi(s_t)}(s_t, s_{t+1})\right] \tag{2}$$
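As a rough illustration of the policy in Eq. (1) and of the discounted sum that the optimal policy in Eq. (2) maximizes, the following minimal sketch samples an action from a tabular policy and evaluates a finite (truncated) discounted return. The state names, action names, and probabilities are hypothetical and are not taken from the paper.

```python
import random


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence (truncated version of the sum in Eq. 2)."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def sample_action(policy, state, rng=random):
    """Draw an action from pi(a|s), stored as {state: {action: probability}} (Eq. 1)."""
    actions = list(policy[state].keys())
    weights = list(policy[state].values())
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical two-state policy, for illustration only.
policy = {
    "up": {"buy": 0.8, "sell": 0.2},
    "down": {"buy": 0.1, "sell": 0.9},
}

if __name__ == "__main__":
    print(sample_action(policy, "up"))
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.9))  # approximately 2.71
```

With γ closer to 0 the return is dominated by the immediate reward, while γ closer to 1 weights later rewards almost as heavily as early ones.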


In general, for any MDP, there is always at least one optimal policy $\pi^*$ that is at least as good as any other policy, and there may be more than one optimal policy for an MDP (Gao, Gao, Hu, Jiang, & Su, 2020). At this stage, the state value function $v_{\pi^*}(s)$ under the optimal policy equals the optimal state value function $v^*(s)$. Therefore, the optimal policy $\pi^*$ is obtained by iteratively running the MDP to find the optimal state value function $v^*(s)$. The value functions of all other possible policies are no higher than $v^*(s)$, i.e.

$$v^*(s) = \max_{\pi}\left(v_\pi(s)\right) \tag{3}$$

with

$$v_\pi(s) = \mathbb{E}\left[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s\right] \tag{4}$$

where $v_\pi(s)$ denotes the value function of the current state under the policy π.

The Q-learning algorithm is a model-free, off-policy reinforcement learning method that replaces the value function $v(s)$ with the action-value function $Q(S, A)$. The goal of Q-learning is to obtain the optimal action-value function $Q^*(S, A)$. In the process of Q-learning, the action $A_t$ at time t is generated by the policy μ:

$$A_t \sim \mu(\cdot \mid S_t) \tag{5}$$

where policy μ is an ε-greedy policy. In Q-learning, the agent looks up the best action through a lookup table called the Q-table, where the initial Q-values are set to 0, meaning all actions in the state list have a chance of being selected. At time t + 1, the action $A'_{t+1}$ used to update the Q-value is:

$$A'_{t+1} \sim \pi(\cdot \mid S_{t+1}) \tag{6}$$

where the policy π is a fully greedy policy. Then the action-value function $Q(S_t, A_t)$ at time t is updated as:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha\left(R_{t+1} + \gamma Q(S_{t+1}, A'_{t+1}) - Q(S_t, A_t)\right) \tag{7}$$

where γ is the discount factor, α is the learning rate, and $R_{t+1} + \gamma Q(S_{t+1}, A'_{t+1})$ is the value of Q generated to obtain the action $A'_{t+1}$ based on the policy π. According to this value updating method, the value of action $A_t$ obtained by state $S_t$ using the ε-greedy strategy will be updated towards the maximum value of action determined by the greedy strategy in state $S_{t+1}$ by a certain proportion. This algorithm enables the individual's policy μ to become closer to the greedy strategy while ensuring continuous exploration and experience of a sufficient number of new states, eventually converging to the optimal policy and the optimal action-value function.

3. Model architecture

Trading in the stock market can be viewed as a sequential decision problem (Chakole & Kurhekar, 2020). Traders must make trading decisions at each time interval during the trading session. Therefore, it is possible to use reinforcement learning methods to find the optimal dynamic trading strategy and perform stock market prediction. In this section, we present a detailed description of our proposed model. Firstly, we use the unsupervised learning method GNG to group historical stock trading data and represent the state of the environment using a limited number of states. Secondly, we redesign the reward function to evaluate the trading behavior of the model output. Finally, we introduce a new trading agent algorithm called Triple Q-learning to construct the model's trading agent and represent the behavior of the stock market environment. The specific trading model is shown in Fig. 1.

3.1. GNG-based environmental state representation

Due to the multitude of information that impacts stock prices in the stock market, the traditional Q-learning model can only rely on simplistic modeling techniques to construct the model environment (Araújo, Figueiredo, & Botto, 2022). This leads to a slow capture of trading signals in the stock market, which in turn fails to provide useful information to trading agents and ultimately decreases the model's prediction accuracy. Therefore, to effectively analyze the trading information present in the stock market environment, we utilize GNG in unsupervised learning to cluster and analyze trading data and construct the environment state.

Within our model, we assume that historical events in the stock market will repeat themselves, meaning that the current trading behavior state is similar to past behavior states. We aim to identify the optimal trading behavior from past trading behavior states and apply it to the current trading state. Price trends for all stocks can be summarized into three categories: upward, downward, or stable (Chakole et al., 2021). Research by Hu et al. (Hu, Wang, Ho, & Tan, 2021) and Xiaoning et al. (Xiaoning, Shang, Jiang, & Shouyang, 2019) shows that capturing trends in stock data through algorithms can effectively increase the model's returns in the stock market. Therefore, we quantize this trend factor and incorporate it into our clustering algorithm. First, we

Fig. 1. Proposed model for stock market prediction.


preprocess the daily trading data collected from the stock market. We assume that there are n trading days in the stock data and divide it into n trading sessions $(P_1, P_2, P_3, \ldots, P_n)$, each consisting of one trading day. Then, we categorize the data based on [Open, Close, High, Low, Volume, Trend], where Trend is the stock price trend. The classified data is represented using tuples that show the trend and environment state of the stock data within the trading period, where:

• O = Percentage change of the opening price compared to the previous day's closing price $= \left(\frac{open_t - close_{t-1}}{close_{t-1}}\right) \times 100$
• C = Percentage change of the close price compared to the opening price of the day $= \left(\frac{close_t - open_t}{open_t}\right) \times 100$
• H = Percentage change of the high price compared to the opening price of the day $= \left(\frac{high_t - open_t}{open_t}\right) \times 100$
• L = Percentage change of the low price compared to the opening price of the day $= \left(\frac{low_t - open_t}{open_t}\right) \times 100$
• V = Percentage change of the volume compared to the volume of the previous day $= \left(\frac{volume_t - volume_{t-1}}{volume_{t-1}}\right) \times 100$
• T = Percentage change of the opening price compared to the previous day's opening price $= \left(\frac{open_t - open_{t-1}}{open_{t-1}}\right) \times 100$

After that, the tuples Q = [O, C, H, L, V, T] are provided as input data to the GNG for clustering. Fig. 2 shows the flowchart of the proposed structure and Algorithm 2 shows a detailed overview of the GNG-based clustering algorithm.

Algorithm 2
GNG-based environmental state representation.

Input: Historical trading data H containing the data values from $t_1$ to $t_n$
Output: Clustered data
1: Define total number of trading days n
2: Initialize $P_1, P_2, \ldots, P_n$
3: for d = 1 to n do
4:     $P_d = \{t_{d-1}, \ldots, t_d\}$
5: end for
6: Initialize tuple $Q_1, Q_2, \ldots, Q_n$
7: for d = 1 to n do
8:     $Q_d = [O, C, H, L, V, T]$
9: end for
10: Apply GNG clustering algorithm on tuple $(Q_1, Q_2, \ldots, Q_n)$
11: return Clustered data

3.2. Reward function

In Q-learning, the information in the environment needs to be fed back to the trading agent through a reward function (Zhang, Zhang, Yan, Li, & Yang, 2022). As the information in the stock market environment changes according to the continuously emerging trading behavior (Shi et al., 2021), it is necessary to design a reward function that fits the changing behavior of stock market trading in order to effectively improve the model's ability to capture stock price trends. Therefore, we have redesigned the reward function based on the existing trading behavior in the stock market.

Typically, there are two types of trading behaviors in the stock market. The first type encourages buying, indicating a positive outlook on the current stock market and a continual increase in buying. The reward $r_t^b$ during the trading session $P_t$ is:

$$r_t^b = B_t + \exp\left(-\lambda_t^b\left(\frac{U_t}{U_t + D_t}\right)\right) - C \tag{8}$$

where $B_t$ is the logarithmic return of the current trading session, which represents the price change of the stock in the current trading session, $B_t = \ln(close_t) - \ln(open_t)$. $U_t$ is the size of the upward fluctuation of the stock price in the current trading session, $D_t$ is the size of the downward fluctuation of the stock price in the current trading session, and $U_t + D_t$ is the size of the total fluctuation of the stock price in the current trading session. $\lambda_t^b$ is the ratio of the number of times the stock price fluctuates upward to the total number of fluctuations during the current trading session. C is the trading cost, including brokerage fees, stamp duties, stock exchange fees, etc.

The second type encourages selling, indicating a pessimistic outlook on the current stock market and continued selling. The reward $r_t^s$ during the trading session $P_t$ is:

$$r_t^s = -B_t + \exp\left(-\lambda_t^s\left(\frac{D_t}{U_t + D_t}\right)\right) - C \tag{9}$$

where $\lambda_t^s$ is the ratio of the number of times the stock price fluctuates downward to the total number of fluctuations during the current trading session.

Furthermore, we have incorporated two key considerations, the Williams percentage range (WPR) and the commodity channel index (CCI) (Xu et al., 2022), to assess the trading behavior of our model based on the calculated rewards. These indicators measure the extent of recent changes in stock prices to determine whether the current stock market situation is overbought or oversold. By using the closing price and daily trading information, we calculate these indicators to evaluate whether the trading signals generated by the market align with the model's trading behavior. If there is a significant deviation, the trading behavior is corrected accordingly, ensuring that the agent modifies its trading strategy only when market conditions permit such changes. This approach maximizes profits from the agent's actions, ensuring that the optimal action in the Q-table is selected while minimizing any losses from random actions.

Fig. 2. Clustering of historical data from stock market.
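The buy and sell rewards of Eqs. (8) and (9) in Section 3.2 can be sketched as two small functions. This is only an illustration under stated assumptions: the paper does not specify how the fluctuation sizes U and D or the ratios λ are measured from intraday data, so they are passed in as plain numbers here, and the handling of a session with no fluctuation at all (U + D = 0) is a choice made for this sketch.

```python
import math


def log_return(open_price, close_price):
    """B_t = ln(close_t) - ln(open_t)."""
    return math.log(close_price) - math.log(open_price)


def _share(part, total):
    # Assumption of this sketch: the share is 0 when the session shows no fluctuation.
    return part / total if total > 0 else 0.0


def buy_reward(open_price, close_price, up, down, lam_b, cost):
    """Eq. (8): r_b = B + exp(-lam_b * U / (U + D)) - C."""
    b = log_return(open_price, close_price)
    return b + math.exp(-lam_b * _share(up, up + down)) - cost


def sell_reward(open_price, close_price, up, down, lam_s, cost):
    """Eq. (9): r_s = -B + exp(-lam_s * D / (U + D)) - C."""
    b = log_return(open_price, close_price)
    return -b + math.exp(-lam_s * _share(down, up + down)) - cost


if __name__ == "__main__":
    # Hypothetical session: price rises from 100 to 102, mostly upward moves.
    print(buy_reward(100.0, 102.0, up=3.0, down=1.0, lam_b=0.75, cost=0.001))
    print(sell_reward(100.0, 102.0, up=3.0, down=1.0, lam_s=0.25, cost=0.001))
```

On a rising session the log return enters the buy reward with a positive sign and the sell reward with a negative sign, so the two rewards differ by twice the log return when the exponential terms coincide.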

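The tuple Q = [O, C, H, L, V, T] from Section 3.1 can be computed directly from daily bars. The sketch below assumes each trading day is a dictionary with the hypothetical keys `open`, `high`, `low`, `close`, and `volume`; since every feature except C, H, and L needs the previous day, the first day yields no tuple.

```python
def pct_change(new, old):
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


def feature_tuples(bars):
    """Build one (O, C, H, L, V, T) tuple per trading day, starting from the second day."""
    tuples = []
    for prev, cur in zip(bars, bars[1:]):
        o = pct_change(cur["open"], prev["close"])     # open vs. previous close
        c = pct_change(cur["close"], cur["open"])      # close vs. same-day open
        h = pct_change(cur["high"], cur["open"])       # high vs. same-day open
        l = pct_change(cur["low"], cur["open"])        # low vs. same-day open
        v = pct_change(cur["volume"], prev["volume"])  # volume vs. previous volume
        t = pct_change(cur["open"], prev["open"])      # open vs. previous open (trend)
        tuples.append((o, c, h, l, v, t))
    return tuples


if __name__ == "__main__":
    # Hypothetical bars, for illustration only.
    bars = [
        {"open": 100.0, "high": 110.0, "low": 90.0, "close": 100.0, "volume": 1000.0},
        {"open": 110.0, "high": 121.0, "low": 99.0, "close": 99.0, "volume": 1500.0},
    ]
    print(feature_tuples(bars))
```

Expressing every feature as a percentage keeps the six components on comparable scales before they are passed to a distance-based clustering method such as GNG.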
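For readers who want to experiment with the clustering step of Algorithm 2, the following is a compact pure-Python sketch of growing neural gas. It follows the commonly described formulation of Fritzke's algorithm rather than transcribing Algorithm 1 line by line, so it should not be read as the authors' implementation. The names `eps_b`, `eps_n`, `lam`, and `beta` play the roles of the winner learning coefficient, the neighbor learning coefficient, the insertion period L, and the global error coefficient β; `alpha` (error reduction at insertion) and `max_nodes` (a cap on network size) are assumptions of this sketch, as are all default values.

```python
import random


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_node(nodes, x):
    """Id of the node closest to x; usable as a discrete environment state label."""
    return min(nodes, key=lambda i: _dist2(nodes[i], x))


def gng(data, max_nodes=10, eps_b=0.05, eps_n=0.006, age_max=50,
        lam=20, alpha=0.5, beta=0.995, epochs=5, seed=0):
    rng = random.Random(seed)
    nodes = {0: list(rng.choice(data)), 1: list(rng.choice(data))}  # id -> weights
    error = {0: 0.0, 1: 0.0}
    edges = {}  # frozenset({i, j}) -> age
    next_id, step = 2, 0
    for _ in range(epochs):
        for x in data:
            step += 1
            # Winner s1 and runner-up s2 for this input.
            s1, s2 = sorted(nodes, key=lambda i: _dist2(nodes[i], x))[:2]
            error[s1] += _dist2(nodes[s1], x)
            nodes[s1] = [w + eps_b * (xi - w) for w, xi in zip(nodes[s1], x)]
            # Age the winner's edges and pull its neighbors toward the input.
            for e in list(edges):
                if s1 in e:
                    edges[e] += 1
                    (m,) = e - {s1}
                    nodes[m] = [w + eps_n * (xi - w) for w, xi in zip(nodes[m], x)]
            edges[frozenset((s1, s2))] = 0  # create or refresh the edge
            # Remove edges that are too old, then nodes left without edges.
            for e in [e for e, age in edges.items() if age > age_max]:
                del edges[e]
            connected = set().union(*edges)
            for i in [i for i in nodes if i not in connected]:
                del nodes[i]
                del error[i]
            # Periodically insert a node between the worst node and its worst neighbor.
            if step % lam == 0 and len(nodes) < max_nodes:
                q = max(error, key=error.get)
                f = max((next(iter(e - {q})) for e in edges if q in e), key=error.get)
                r, next_id = next_id, next_id + 1
                nodes[r] = [(a + b) / 2 for a, b in zip(nodes[q], nodes[f])]
                del edges[frozenset((q, f))]
                edges[frozenset((q, r))] = 0
                edges[frozenset((r, f))] = 0
                error[q] *= alpha
                error[f] *= alpha
                error[r] = error[q]
            for i in error:
                error[i] *= beta  # global error decay
    return nodes, edges
```

After training, `nearest_node(nodes, q_d)` maps a feature tuple to the id of its closest node, which is one simple way to obtain a limited number of discrete states from continuous trading data.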

3.3. Trading agent

For trading agents, traditional Q-learning algorithms rely solely on the greedy algorithm to estimate rewards, leading to overestimation and maximization bias, potentially resulting in suboptimal behavior (Brim, 2020). Therefore, we propose Triple Q-learning, a reinforcement learning-based Q-learning method, to overcome this maximization bias. Triple Q-learning uses a state-action mapping table called the Q-table to maintain the model's trading strategy. The Q-table senses the trading environment in the stock market through environmental states and contains Q-values for each state-action pair. Since the trading agent has no knowledge of the current market environment, all values in the Q-table are initialized to zero. We retain three Q-tables during the training phase to offset the maximization bias. One Q-table is saved upon receiving an action-based reward and used to update the other two Q-tables based on the maximum state-action pair of a corresponding state. Subsequently, Q-values are iteratively updated, and corresponding trading actions are executed based on the obtained Q-values. Gains and losses are recorded as rewards, and the Q-table is updated based on the reward values until the best trading behavior is achieved. The value of $Q_A(S_t, A_t)$ during the trading session $P_t$ is:

$$Q_A(S_t, A_t) \leftarrow Q_A(S_t, A_t) + \alpha\left(R_{t+1} + \frac{\gamma}{2}\sum_{B=1, B \neq A}^{3} \max_{A'} Q_B(S_{t+1}, A'_{t+1}) - Q_A(S_t, A_t)\right) \tag{10}$$
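The update in Eq. (10) can be sketched as a small tabular agent. This is a minimal illustration, not the authors' code: the class and method names are hypothetical, the default hyperparameters are arbitrary, and the `done` flag is an added convenience that treats the value of a terminal state as zero.

```python
import random
from collections import defaultdict


class TripleQ:
    """Tabular agent holding three Q-tables, updated one at a time as in Eq. (10)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # Three tables keyed by (state, action); unseen entries default to zero.
        self.q = [defaultdict(float) for _ in range(3)]

    def choose_table(self):
        """Pick the table to act on and update, each with probability 1/3."""
        return self.rng.randrange(3)

    def act(self, k, state):
        """Epsilon-greedy action with respect to table k."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[k][(state, a)])

    def update(self, k, s, a, r, s_next, done=False):
        """Move Q_k(s, a) toward r + (gamma / 2) * sum of the other tables' max values."""
        target = r
        if not done:
            others = [b for b in range(3) if b != k]
            target += self.gamma / 2 * sum(
                max(self.q[b][(s_next, a2)] for a2 in self.actions) for b in others
            )
        self.q[k][(s, a)] += self.alpha * (target - self.q[k][(s, a)])


if __name__ == "__main__":
    agent = TripleQ(["buy", "sell"])
    k = agent.choose_table()
    action = agent.act(k, "state_0")
    agent.update(k, "state_0", action, r=1.0, s_next="state_1")
    print(k, action, dict(agent.q[k]))
```

Because the bootstrap target for one table is built only from the other two, an overestimated entry in the table being updated does not feed back into its own target, which is the intuition behind using several tables against maximization bias.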

Since Triple Q-learning employs a greedy strategy, it can update the three tables with equal probability, allowing the algorithm to converge to the optimal strategy faster. As a result, the model is able to increase its profits by the end of training. Algorithm 3 describes the algorithmic process of the proposed Triple Q-learning, and Fig. 3 illustrates the process of training the trading agent.

Fig. 3. Training process of the trading agent in proposed model.

Algorithm 3
Triple Q-learning.

1: for n in number of Q-tables do
2:   for each s in S and a in A(s) do
3:     set Qn(s, a) autonomously
4:     set Qn(terminal state, ⋅) = 0
5:   end for
6: end for
7: for each episode in episodes do
8:   Initialize: S ← first state of episode
9:   for each step of episode do
10:     Choose QA where A ∈ {1, 2, 3} using probability 1/3
11:     A = policy(QA, S) (e.g. ε-greedy policy)
12:     R, S′ = perform action(S, A)
13:     $Q_A(S_t, A_t) \leftarrow Q_A(S_t, A_t) + \alpha\left(R_{t+1} + \frac{\gamma}{2}\sum_{B=1,\,B\neq A}^{3}\max_{A'} Q_B(S_{t+1}, A'_{t+1}) - Q_A(S_t, A_t)\right)$
14:     S ← S′
15:   end for
16: end for

4. Experimental results and discussion

In this section, we describe the experiments in detail, including the data set, performance evaluation metrics, experimental results, and a discussion.

4.1. Experimental dataset

To effectively evaluate the proposed model’s performance in the stock market, we have selected two types of stock datasets: index and individual. The index stock datasets include the NASDAQ index from the American Stock Exchange, the SENSEX index from the Indian Securities Exchange, and the DAX30 index from the German Securities Exchange. The individual stock datasets consist of stock data from companies listed in the S&P 500 index of the American Stock Exchange. Tables 1 and 2 summarize the daily data for the index and individual stocks, respectively. All the data are obtained from Yahoo Finance (https://finance.yahoo.com/).

Table 1
Index stock datasets.

Name of stock   Length   Total period
DAX30           3300     2006/01/02–2018/12/31
NASDAQ          4557     2001/01/02–2018/12/31
SENSEX          2750     2007/09/17–2018/12/31

Table 2
Individual stock datasets.

Name of stock                      Length   Total period
American Power Conversion          1240     2013/02/08–2018/02/07
Intel                              1240     2013/02/08–2018/02/07
Johnson & Johnson                  1240     2013/02/08–2018/02/07
Occidental Petroleum Corporation   1240     2013/02/08–2018/02/07
Procter & Gamble                   1240     2013/02/08–2018/02/07
Tiffany                            1240     2013/02/08–2018/02/07

To compare our results with previous studies, we split all datasets in chronological order, with 60% of the total data used for training, 20% for validation, and 20% for testing, consistent with the split ratios used in previous studies. After training, we validated the models using the validation set and selected the best-performing model for testing.
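The chronological 60/20/20 split described in Section 4.1 can be sketched as follows. The function name and the integer-percentage arguments are illustrative assumptions, not taken from the paper; the only property the sketch relies on is that the rows are already sorted by date, so that validation and test data always lie after the training data in time.

```python
def chronological_split(rows, train_pct=60, val_pct=20):
    """Split date-ordered rows into train/validation/test without shuffling.

    Percentages are integers so the cut points are computed with exact
    integer arithmetic; the test set receives whatever remains.
    """
    n = len(rows)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


# Example with the length of an individual stock dataset from Table 2.
days = list(range(1240))
train, val, test = chronological_split(days)
print(len(train), len(val), len(test))
```

Because no shuffling takes place, the model selected on the validation set is tested only on data from a later period than anything seen during training.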


4.2. Evaluation metrics

To evaluate the effectiveness of the proposed method in practical stock trading settings, we have selected several financial performance evaluation indicators: accumulated return, average annual return, maximum drawdown, standard deviation, and Sharpe ratio (Chakole & Kurhekar, 2020; Chakole et al., 2021).

4.2.1. Accumulated return

The accumulated return (AR) is the percentage change between the initial and final investment amounts. Its value is expected to be positive and as large as possible. The specific formula for calculating the accumulated return rate is:

$$\%AR = \frac{I_t - I_0}{I_0} \times 100\% \tag{11}$$

where $I_t$ is the amount of existing investment at time $t$, and $I_0$ is the initial investment amount.

4.2.2. Average annual return

The average annual return (AAR) is the average of the annual return rates for investing in stocks over a given period. The value of the average annual return should be positive and as large as possible. The specific formula for calculating the average annual return is:

$$\%AAR = \frac{1}{m}\sum_{y=1}^{m}\frac{Y_e^y - Y_b^y}{Y_b^y} \times 100\% \tag{12}$$

where $m$ is the total number of years, $Y_e^y$ is the amount invested at the end of year $y$, and $Y_b^y$ is the amount invested at the beginning of year $y$.

4.2.3. Maximum drawdown

The maximum drawdown (MDD) is a tool used to measure investment risk and is often used to compare the related risks between two or more trading strategies (Drenovak, Ranković, Urošević, & Jelic, 2022). Since the maximum drawdown estimates the downside risk over the considered period, its value should ideally be as small as possible. The specific calculation formula for the maximum drawdown is:

$$MDD_t = \max(\%Drawdown_t) \tag{13}$$

where

$$\%Drawdown_t = \frac{V_{max} - V_t}{V_{max}} \times 100\% \tag{14}$$

where $V_{max}$ is the historical maximum value of the investment income and $V_t$ is the value of the investment income at time $t$.

4.2.4. Standard deviation

The standard deviation (SD) is commonly used to measure the volatility associated with stock returns. A larger standard deviation indicates greater volatility in stock prices (Narayan & Narayan, 2021). Therefore, in an ideal scenario, the value of the standard deviation should be as small as possible. The specific formula for the standard deviation is:

$$SD = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \tag{15}$$

where $n$ is the total number of samples and $\bar{x}$ is the sample mean.

4.2.5. Sharpe ratio

The Sharpe ratio (SR) is a performance metric for investment portfolios that measures how well the returns of an investment compensate for the risk taken by the investor and is commonly used to understand investment returns in relation to risk (Chou, Jiang, & Kuo, 2021). Among competing trading strategies, the one with the highest Sharpe ratio offers the best risk-adjusted return. The formula for the Sharpe ratio is:

$$SR = \frac{R_p - R_f}{\sigma_p} \tag{16}$$

where $R_p$ is the expected return on the investment, $R_f$ is the risk-free rate, and $\sigma_p$ is the standard deviation of the investment return.

4.3. Experimental details

In this experiment, we compared the proposed model with two standard trading strategies: the buy-and-hold strategy (Carta et al., 2021) and the decision tree-based strategy (Brim, 2020), as well as two novel trading strategies: the combination strategy of K-means and Q-learning (Chakole et al., 2021) and the combination strategy of GNG and Q-learning. All models in this experiment can perform long (buy first, then sell) and short (sell first, then buy) operations in the stock market. For simplicity, the models execute an intraday trading strategy, which allows only one type of trading behavior (buying, selling, or holding) per day, and all operations must stop after the market closes. In addition, for each transaction, the trader must pay transaction costs, including brokerage fees, stamp duties, and exchange fees, which are set to 0.10% of the trading amount based on the trading rules in the stock market. To ensure the fairness of the experiment, we performed a grid search over all adjustable parameters for each model on all datasets, aiming to obtain the optimal parameters for every model involved. Specifically, based on previous research, we set the stop-loss threshold to 9%, the learning rate α to 0.05, the discount factor γ to 0.7, the decay coefficient to 0.995, and the decay step to 100.

4.4. Experimental results and discussion

4.4.1. Experimental results on the index stock datasets

The accumulated returns of all models on the index stock datasets are plotted in Fig. 4. As shown in Fig. 4, our proposed model (indicated by the red curve in the figure) outperforms other models in terms of the overall accumulated return on all index stock datasets. Especially in the latter half of all datasets, the proposed model’s accumulated return is consistently higher than that of other comparative models. Compared with other advanced prediction models, such as the trading strategies combining K-means and Q-learning, and the trading strategies combining GNG and Q-learning (indicated by the green and yellow curves in the figure), our proposed model has significant advantages in terms of accumulated returns. This demonstrates that our proposed model can effectively process the complex information in the index stock datasets and achieve higher returns, making it competitive in the stock market. However, on NASDAQ and SENSEX, the proposed model’s performance in the first half is inferior to other models, especially compared to the trading strategy combining K-means and Q-learning. This is because the GNG model has a slow convergence rate in the early stages of training, despite its overall performance advantage. This drawback is compensated for by continuous training in the later stages, which is why the proposed model’s performance in the second half is superior to that of other models, as seen in Fig. 4.

Fig. 4. Performance comparison of all models on the test dataset of index stocks.

Table 3
Comparison of evaluation metrics of all models on the test dataset of index stocks.

Dataset   Evaluation metrics           Buy-and-Hold   Decision Tree   K-means and Q   GNG and Q   Proposed Model
DAX30     Accumulated Return (%)       83.10          59.50           79.47           115.53      145.77
          Average Annual Return (%)    11.90          5.37            9.11            13.29       15.68
          Maximum Drawdown (%)         61.29          52.35           49.51           53.89       47.11
          Standard Deviation (%)       0.95           0.92            0.89            0.86        0.72
          Sharpe Ratio (%)             2.11           1.02            1.71            1.97        2.78
NASDAQ    Accumulated Return (%)       185.22         113.57          265.63          345.88      360.11
          Average Annual Return (%)    11.12          8.02            12.89           14.04       19.55
          Maximum Drawdown (%)         84.35          65.06           82.76           82.34       61.13
          Standard Deviation (%)       1.31           1.20            1.21            1.48        1.14
          Sharpe Ratio (%)             1.77           1.41            2.22            2.72        3.15
SENSEX    Accumulated Return (%)       97.62          67.08           97.15           125.05      150.73
          Average Annual Return (%)    10.63          7.22            10.61           11.57       13.66
          Maximum Drawdown (%)         60.98          53.70           49.42           59.89       48.93
          Standard Deviation (%)       0.94           0.81            0.85            0.86        0.83
          Sharpe Ratio (%)             1.91           1.47            2.15            2.26        2.71

Table 3 presents the evaluation metrics for the optimal results obtained by all models on all index stock datasets. Based on Table 3, it can be concluded that our proposed model outperforms other models on all

evaluation metrics except for standard deviation, showing a 50%-60% improvement over both basic prediction models (the buy-and-hold and decision tree strategies). Furthermore, it shows a 35%-45% improvement over the advanced prediction models, namely the K-means and Q-learning combined trading strategy and the GNG and Q-learning combined trading strategy. Especially on NASDAQ, the proposed model’s accumulated return reached 360.11%, a significant improvement over both benchmark trading strategies, buy-and-hold and decision tree, showing a 90% and 218% increase, respectively. Moreover, it shows a 35% improvement over the K-means and Q-learning combined trading strategy and has a significant advantage over comparable models. These results demonstrate the excellent predictive performance of the proposed model on the index stock datasets. However, as seen in the table, our proposed model’s standard deviation on the SENSEX dataset falls slightly short of the trading strategy that uses decision trees. This is because our model trades frequently in order to converge faster to the optimal strategy, resulting in fluctuations in holding returns and ultimately leading to poorer standard deviation performance and some trading risk. Nevertheless, our proposed model outperforms all other models in terms of the Sharpe ratio. Additionally, our model’s accumulated and average annual returns are significantly better than those of other models. This indicates that while our model incurs some trading risk in order to maximize accumulated and average annual returns, this risk is acceptable overall.

4.4.2. Experimental results on the individual stock datasets

The accumulated returns of all models on the individual stock datasets are plotted in Fig. 5. It can be observed from Fig. 5 that the proposed model (represented by the red curve in the figure) outperforms other models in terms of accumulated return on individual stock datasets. Compared with the buy-and-hold and decision tree strategies (represented by the black and blue curves in the figure), the proposed model consistently achieves better accumulated returns. Particularly on American Power Conversion, while the buy-and-hold and decision tree strategies show fluctuating and decreasing curves, the proposed model’s curve shows an upward trend. This demonstrates the outstanding predictive performance of our proposed model. Further, it shows that the proposed model can effectively handle the complex information in individual stocks, resulting in higher returns. Therefore, our proposed model is competitive in stock market prediction tasks. Compared to the trading strategies combining K-means and Q-learning, and the trading strategies combining GNG and Q-learning (indicated by green and yellow curves in the figure), the proposed model’s accumulated return performance is superior to that of GNG and Q-learning. In turn, GNG and Q-learning’s accumulated return performance is superior to that of K-means and Q-learning. This demonstrates that the GNG algorithm is more effective in handling stock market data than K-means, and that our proposed Triple Q-learning can be well integrated with GNG to predict the stock market, supporting the effectiveness of our model. However, in the case of Tiffany and Occidental Petroleum Corporation, the initial performance of our proposed model was inferior to other models. This can be attributed to the fact that although the GNG algorithm has an overall advantage, it suffers from slow convergence in the early stages of training. The disadvantage is compensated for by continued training in the later stages, as can be observed from Fig. 5, where the later performance of our proposed model was superior to that of other models.

Table 4 presents the evaluation metrics of the optimal results obtained on the individual stock datasets. Based on Table 4, it can be concluded that the proposed model outperforms the other models on most evaluation metrics, exhibiting a significant improvement of 50%-70% over the two benchmark trading strategies (buy-and-hold and decision tree). Compared to the trading strategy combining K-means and Q-learning and the trading strategy combining GNG and Q-learning, our proposed model shows an improvement of 40%-50% and 20%-40%, respectively. Particularly on Tiffany, the proposed model achieved an accumulated return of 103.34%, exhibiting a significant advantage over the other comparative models and demonstrating our proposed model’s excellent predictive performance. However, on Procter & Gamble and Tiffany, the standard deviation of the proposed model is higher (that is, worse) than that of the buy-and-hold trading strategy. This is because the proposed model aims to maximize the return and continuously makes trading actions during the trading process, leading to fluctuations in holding returns and ultimately to poorer standard deviation performance than the buy-and-hold trading strategy. Thus, there is some trading risk associated with the proposed model. However, when looking at the Sharpe ratio, the proposed model outperforms other models, indicating that the trading risk associated with the proposed model is acceptable under the premise of pursuing optimal returns.
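The evaluation metrics of Section 4.2 (Eqs. 11-16), which underlie Tables 3 and 4, can be sketched in a few lines of Python. The function names and input conventions below are assumptions made for illustration: `values` is a time-ordered list of investment values, $V_{max}$ in Eq. (14) is interpreted as the running peak up to time $t$, and the Sharpe ratio uses the sample mean of a return series as $R_p$. The paper does not state which return frequency or risk-free rate it uses, so those are left as inputs.

```python
import math


def accumulated_return(values):
    """Eq. (11): percentage change from the initial to the final value."""
    return (values[-1] - values[0]) / values[0] * 100.0


def average_annual_return(year_start_values, year_end_values):
    """Eq. (12): mean of the yearly return rates, in percent."""
    rates = [(end - start) / start
             for start, end in zip(year_start_values, year_end_values)]
    return sum(rates) / len(rates) * 100.0


def maximum_drawdown(values):
    """Eqs. (13)-(14): largest percentage drop from the running peak."""
    peak = values[0]
    worst = 0.0
    for value in values:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak * 100.0)
    return worst


def sample_std(xs):
    """Eq. (15): sample standard deviation with the n - 1 denominator."""
    mean = sum(xs) / len(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))


def sharpe_ratio(returns, risk_free=0.0):
    """Eq. (16): mean excess return divided by the standard deviation."""
    mean = sum(returns) / len(returns)
    return (mean - risk_free) / sample_std(returns)


# Toy example (made-up values, not data from the paper).
portfolio = [100.0, 120.0, 90.0, 150.0]
print(accumulated_return(portfolio), maximum_drawdown(portfolio))
```

In the toy series the drawdown is measured from the peak of 120 down to 90, even though the series later recovers to 150, which is why a strategy can show both a high accumulated return and a large maximum drawdown, as several rows of Tables 3 and 4 do.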


Fig. 5. Performance comparison of all models on the test dataset of individual stocks.


Fig. 5. (continued).


Table 4
Comparison of evaluation metrics of all models on the test dataset of individual stocks.

Dataset                            Evaluation metrics           Buy-and-Hold   Decision Tree   K-means and Q   GNG and Q   Proposed Model
American Power Conversion          Accumulated Return (%)       −16.86         2.42            9.11            20.19       50.40
                                   Average Annual Return (%)    −7.51          2.45            6.22            10.14       22.54
                                   Maximum Drawdown (%)         44.25          23.31           23.13           27.31       20.28
                                   Standard Deviation (%)       1.68           1.38            1.44            1.42        1.33
                                   Sharpe Ratio (%)             −0.48          0.22            0.47            1.10        1.67
Intel                              Accumulated Return (%)       29.42          4.35            24.65           51.18       77.17
                                   Average Annual Return (%)    16.12          1.25            12.85           21.11       28.27
                                   Maximum Drawdown (%)         33.18          27.28           27.00           34.63       24.98
                                   Standard Deviation (%)       1.31           1.07            1.10            0.97        0.89
                                   Sharpe Ratio (%)             1.24           0.32            0.81            3.18        3.89
Johnson & Johnson                  Accumulated Return (%)       18.44          12.65           35.11           56.97       74.53
                                   Average Annual Return (%)    8.61           3.80            9.65            11.72       14.32
                                   Maximum Drawdown (%)         25.07          27.99           29.95           28.90       24.95
                                   Standard Deviation (%)       0.89           1.34            0.95            1.06        0.84
                                   Sharpe Ratio (%)             1.23           0.65            1.60            2.26        2.55
Occidental Petroleum Corporation   Accumulated Return (%)       −0.19          7.84            27.33           41.26       63.09
                                   Average Annual Return (%)    0.81           4.11            8.29            10.85       14.72
                                   Maximum Drawdown (%)         25.26          24.83           25.70           26.38       22.95
                                   Standard Deviation (%)       1.19           0.95            0.94            1.05        0.92
                                   Sharpe Ratio (%)             0.09           0.55            1.01            1.01        1.87
Procter & Gamble                   Accumulated Return (%)       4.96           15.01           31.59           58.38       77.46
                                   Average Annual Return (%)    0.25           4.25            6.46            8.08        11.59
                                   Maximum Drawdown (%)         14.13          14.34           12.78           15.94       13.68
                                   Standard Deviation (%)       0.55           0.67            0.71            0.77        0.64
                                   Sharpe Ratio (%)             0.05           1.04            1.24            1.66        1.98
Tiffany                            Accumulated Return (%)       47.05          20.69           36.44           69.76       103.34
                                   Average Annual Return (%)    11.31          5.54            6.48            13.41       17.98
                                   Maximum Drawdown (%)         30.09          18.38           23.52           29.18       17.66
                                   Standard Deviation (%)       0.72           1.36            0.99            1.79        0.87
                                   Sharpe Ratio (%)             1.06           0.23            1.42            1.33        1.60

5. Conclusion

Driven by advances in computer technology, the financial sector has developed rapidly, and the scale of the stock market has expanded accordingly. Acquiring valuable information from the stock market to predict market trends, reduce investment risks, and obtain substantial returns has become a formidable challenge for practitioners in the field.

In this paper, we propose a stock market prediction model based on GNG and Triple Q-learning. In the model, we use the GNG algorithm to group and cluster historical stock data to describe stock trends and form the trading environment state of the stock market. We then evaluate the trading behavior output by the model through a redesigned reward function and provide timely feedback on the trading information present in the stock trading market. Finally, we use Triple Q-learning to perceive the trading environment state of the stock market and make corresponding trading decisions, overcoming the maximization bias that traditional models generate when executing such behaviors. Performance experiments on different types of stock data show that our proposed model is competitive on all datasets, especially in terms of accumulated return, average annual return, and Sharpe ratio compared to other models.

For future work, we will focus on several directions. Firstly, the proposed model may need to improve on the standard deviation metric, and how to optimize the model in this respect is a problem to be solved. Secondly, how to quantify the impact of external factors on the stock market and improve the model accordingly is another challenging issue.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The data that has been used is confidential.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 51505094, in part by the Guizhou Provincial Science and Technology Foundation under Grant [ZK(2023) 079], in part by the Science and Technology Support Program in Guizhou under Grant [2017]2029, in part by the Applied Basic Research Program of major projects in Guizhou under Grant JZ[2014]2001, and in part by the Talent Introduction Research Program of Guizhou University under Grant (2014)60.

References

Ahmadi, M., Taghavirashidizadeh, A., Javaheri, D., Masoumian, A., Ghoushchi, S. J., & Pourasad, Y. (2022). DQRE-SCnet: A novel hybrid approach for selecting users in federated learning with deep-Q-reinforcement learning based on spectral clustering. Journal of King Saud University-Computer and Information Sciences, 34(9), 7445–7458.
Araújo, J. P., Figueiredo, M. A., & Botto, M. A. (2022). Control with adaptive Q-learning: A comparison for two classical control problems. Engineering Applications of Artificial Intelligence, 112, Article 104797.


Brim, A. (2020, January). Deep reinforcement learning pairs trading with a double deep Q-network. In 2020 10th Annual Computing and Communication Workshop and Conference (CCWC) (pp. 0222-0227). IEEE.
Carta, S., Ferreira, A., Podda, A. S., Recupero, D. R., & Sanna, A. (2021). Multi-DQN: An ensemble of Deep Q-learning agents for stock market forecasting. Expert Systems with Applications, 164, Article 113820.
Chakole, J. B., Kolhe, M. S., Mahapurush, G. D., Yadav, A., & Kurhekar, M. P. (2021). A Q-learning agent for automated trading in equity stock markets. Expert Systems with Applications, 163, Article 113761.
Chakole, J., & Kurhekar, M. (2020). Trend following deep Q-Learning strategy for stock trading. Expert Systems, 37(4), Article e12514.
Chen, W., Jiang, M., Zhang, W. G., & Chen, Z. (2021). A novel graph convolutional feature based convolutional neural network for stock trend prediction. Information Sciences, 556, 67–94.
Chou, Y. H., Jiang, Y. C., & Kuo, S. Y. (2021). Portfolio optimization in both long and short selling trading using trend ratios and quantum-inspired evolutionary algorithms. IEEE Access, 9, 152115–152130.
Chung, H., & Shin, K. S. (2020). Genetic algorithm-optimized multi-channel convolutional neural network for stock market prediction. Neural Computing and Applications, 32(12), 7897–7914.
Drenovak, M., Ranković, V., Urošević, B., & Jelic, R. (2022). Mean-maximum drawdown optimization of buy-and-hold portfolios using a multi-objective evolutionary algorithm. Finance Research Letters, 46, Article 102328.
Du, X., & Tanaka-Ishii, K. (2022). Stock portfolio selection balancing variance and tail risk via stock vector representation acquired from price data and texts. Knowledge-Based Systems, 249, Article 108917.
Fischer, J., Eyberg, C., Werling, M., & Lauer, M. Sampling-based Inverse Reinforcement Learning Algorithms with Safety Constraints. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 791-798). IEEE.
Fritzke, B. (1994). A growing neural gas network learns topologies. Advances in Neural Information Processing Systems, 7.
Gao, Z., Gao, Y., Hu, Y., Jiang, Z., & Su, J. (2020, May). Application of deep q-network in portfolio management. In 2020 5th IEEE International Conference on Big Data Analytics (ICBDA) (pp. 268-275). IEEE.
Guan, Y., Li, S. E., Duan, J., Li, J., Ren, Y., Sun, Q., & Cheng, B. (2021). Direct and indirect reinforcement learning. International Journal of Intelligent Systems, 36(8), 4439–4467.
Hou, Z., Fei, J., Deng, Y., & Xu, J. (2020). Data-efficient hierarchical reinforcement learning for robotic assembly control applications. IEEE Transactions on Industrial Electronics, 68(11), 11565–11575.
Hu, Z., Wang, Z., Ho, S. B., & Tan, A. H. (2021, November). Stock Market Trend Forecasting Based on Multiple Textual Features: A Deep Learning Method. In 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI) (pp. 1002-1007). IEEE.
Jeon, S., Hong, B., & Chang, V. (2018). Pattern graph tracking-based stock price prediction using big data. Future Generation Computer Systems, 80, 171–187.
Kumbure, M. M., Lohrmann, C., Luukka, P., & Porras, J. (2022). Machine learning techniques and data for stock market forecasting: A literature review. Expert Systems with Applications, Article 116659.
Li, Y., Liang, C., & Huynh, T. L. D. (2022). Forecasting US stock market returns by the aggressive stock-selection opportunity. Finance Research Letters, Article 103323.
Li, Q., Tan, J., Wang, J., & Chen, H. (2020). A multimodal event-driven lstm model for stock prediction using online news. IEEE Transactions on Knowledge and Data Engineering, 33(10), 3323–3337.
Li, Y., Wang, R., & Yang, Z. (2021). Optimal scheduling of isolated microgrids using automated reinforcement learning-based multi-period forecasting. IEEE Transactions on Sustainable Energy, 13(1), 159–169.
Liu, Y., Ishibuchi, H., Masuyama, N., & Nojima, Y. (2019). Adapting reference vectors and scalarizing functions by growing neural gas to handle irregular Pareto fronts. IEEE Transactions on Evolutionary Computation, 24(3), 439–453.
Liu, Q., Jin, Y., Heiderich, M., Rodemann, T., & Yu, G. (2020). An adaptive reference vector-guided evolutionary algorithm using growing neural gas for many-objective optimization of irregular problems. IEEE Transactions on Cybernetics, 52(5), 2698–2711.
Lv, J., Wang, C., Gao, W., & Zhao, Q. (2021). An economic forecasting method based on the LightGBM-Optimized LSTM and Time-Series model. Computational Intelligence and Neuroscience.
Ma, Y., Mao, R., Lin, Q., Wu, P., & Cambria, E. (2023). Multi-source aggregated classification for stock price movement prediction. Information Fusion, 91, 515–528.
Mahmoudabadi, A., Kuchaki Rafsanjani, M., & Javidi, M. M. (2021). Online one pass clustering of data streams based on growing neural gas and fuzzy inference systems. Expert Systems, 38(7), Article e12736.
Malandri, L., Xing, F. Z., Orsenigo, C., Vercellis, C., & Cambria, E. (2018). Public mood–driven asset allocation: The importance of financial sentiment in portfolio management. Cognitive Computation, 10, 1167–1176.
Martinetz, T. M., Berkovich, S. G., & Schulten, K. J. (1993). ’Neural-gas’ network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4(4), 558–569.
Narayan, P. K., & Narayan, S. (2021). Do opinion polls on government preference influence stock returns. Journal of Behavioral and Experimental Finance, 30, Article 100493.
Peer, O., Tessler, C., Merlis, N., & Meir, R. (2021, July). Ensemble bootstrapping for Q-Learning. In International Conference on Machine Learning (pp. 8454-8463). PMLR.
Pendharkar, P. C., & Cusatis, P. (2018). Trading financial indices with reinforcement learning agents. Expert Systems with Applications, 103, 1–13.
Peng, H., Ma, Y., Poria, S., Li, Y., & Cambria, E. (2021). Phonetic-enriched text representation for Chinese sentiment analysis with reinforcement learning. Information Fusion, 70, 88–99.
Picasso, A., Merello, S., Ma, Y., Oneto, L., & Cambria, E. (2019). Technical analysis and sentiment embeddings for market trend prediction. Expert Systems with Applications, 135, 60–70.
Qiu, Y., Yang, H. Y., Lu, S., & Chen, W. (2020). A novel hybrid model based on recurrent neural networks for stock market timing. Soft Computing, 24(20), 15273–15290.
Ray, P., Ganguli, B., & Chakrabarti, A. (2021). A hybrid approach of bayesian structural time series With LSTM to identify the influence of news sentiment on short-term forecasting of stock price. IEEE Transactions on Computational Social Systems, 8(5), 1153–1162.
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599.
Shahi, T. B., Shrestha, A., Neupane, A., & Guo, W. (2020). Stock price forecasting with deep learning: A comparative study. Mathematics, 8(9), Article 1441.
Shi, Y., Li, W., Zhu, L., Guo, K., & Cambria, E. (2021). Stock trading rule discovery with double deep Q-network. Applied Soft Computing, 107, Article 107320.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Wang, Q., Xu, W., Huang, X., & Yang, K. (2019). Enhancing intraday stock price manipulation detection by leveraging recurrent neural networks with ensemble learning. Neurocomputing, 347, 46–58.
Wu, Y. M., Chen, L. S., Li, S. B., & Chen, J. D. (2021). An adaptive algorithm for dealing with data stream evolution and singularity. Information Sciences, 545, 312–330.
Xiaoning, C. U. I., Shang, W., Jiang, F., & Shouyang, W. A. N. G. (2019, December). Stock index forecasting by hidden Markov models with trends recognition. In 2019 IEEE International Conference on Big Data (Big Data) (pp. 5292-5297). IEEE.
Xing, F. Z., Cambria, E., & Welsch, R. E. (2018). Intelligent asset allocation via market sentiment views. IEEE Computational Intelligence Magazine, 13(4), 25–34.
Xing, F. Z., Cambria, E., & Zhang, Y. (2019). Sentiment-aware volatility forecasting. Knowledge-Based Systems, 176, 68–76.
Xu, H., Chai, L., Luo, Z., & Li, S. (2022). Stock movement prediction via gated recurrent unit network based on reinforcement learning with incorporated attention mechanisms. Neurocomputing, 467, 214–228.
Yang, L., Zhang, Z., Xiong, S., Wei, L., Ng, J., Xu, L., & Dong, R. (2018, November). Explainable text-driven neural network for stock prediction. In 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS) (pp. 441-445). IEEE.
Yang, B., Liu, X., Peng, L., & Cai, Z. (2021). Unified tests for a dynamic predictive regression. Journal of Business & Economic Statistics, 39(3), 684–699.
Zhang, R., Yuan, Z., & Shao, X. (2018, July). A new combined CNN-RNN model for sector stock price analysis. In 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC) (Vol. 2, pp. 546-551). IEEE.
Zhang, W., Zhang, N., Yan, J., Li, G., & Yang, X. (2022). Auto uning of price prediction models for high-frequency trading via reinforcement learning. Pattern Recognition, 125, Article 108543.
Zhang, Y., Zhao, P., Wu, Q., Li, B., Huang, J., & Tan, M. (2020). Cost-sensitive portfolio selection via deep reinforcement learning. IEEE Transactions on Knowledge and Data Engineering, 34(1), 236–248.
