
Hindawi

Scientific Programming
Volume 2022, Article ID 4698656, 15 pages
https://doi.org/10.1155/2022/4698656

Research Article
Stock Trading Strategies Based on Deep Reinforcement Learning

Yawei Li,1 Peipei Liu ,2 and Ze Wang2


1 School of Management Science and Engineering, Shandong University of Finance and Economics, Jinan 250014, China
2 School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan 250014, China

Correspondence should be addressed to Peipei Liu; [email protected]

Received 1 August 2021; Revised 30 December 2021; Accepted 4 February 2022; Published 1 March 2022

Academic Editor: Cristian Mateos

Copyright © 2022 Yawei Li et al. This is an open access article distributed under the Creative Commons Attribution License, which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The purpose of stock market investment is to obtain more profits. In recent years, an increasing number of researchers have tried to implement stock trading based on machine learning. Faced with the complex stock market, it is difficult to obtain effective information from multisource data and to implement dynamic trading strategies. To solve these problems, this study proposes a new deep reinforcement learning model to implement stock trading, which analyzes the stock market through stock data, technical indicators, and candlestick charts and learns dynamic trading strategies. The features of the different data sources extracted by deep neural networks are fused to represent the state of the stock market, and the reinforcement learning agent makes trading decisions on this basis. Experiments on a Chinese stock market dataset and an S&P 500 stock market dataset show that our trading strategy can obtain higher profits than other trading strategies.

1. Introduction

Stock trading is the process of buying and selling stocks to obtain investment profit. The key to stock trading is to make the right trading decisions at the right times, that is, to develop a suitable trading strategy [1]. In recent years, many studies have used machine learning methods to predict stock trends or prices in order to implement stock trading. However, long-duration prediction of the price or trend of a stock is not reliable. Besides, a trading strategy based on stock price prediction is static [2, 3]. The stock market is affected by many factors [4-6], such as changes in investor psychology, company policies, natural disasters, and emergencies, so stock prices fluctuate greatly. Compared with a static trading strategy, a dynamic trading strategy can make trading decisions dynamically according to changes in the stock market, which gives it greater advantages.

At present, an increasing number of studies implement dynamic trading strategies based on deep reinforcement learning. Reinforcement learning has gained increasing attention since AlphaZero defeated human players [7]; it has the ability of independent learning and decision-making and has been successfully applied to game playing [8, 9], unmanned driving [10, 11], and helicopter control [12]. Reinforcement learning solves the sequential decision-making problem, so it can be applied to stock trading to learn dynamic trading strategies. Nevertheless, reinforcement learning lacks the ability to perceive the environment. The combination of deep learning and reinforcement learning (i.e., deep reinforcement learning) solves this problem and has more advantages because it combines the decision-making ability of reinforcement learning with the perception ability of deep learning.

One of the challenges when implementing stock trading based on deep reinforcement learning is the correct analysis of the state of the stock market. Financial data are nonlinear and unstable. Most existing studies on stock trading based on deep reinforcement learning analyze the stock market through stock data alone [13-15]. However, there is noise in stock data, which affects the final analysis results. Technical indicators can reflect the changes in the stock market from different perspectives and reduce the influence of noise [16, 17]. Other studies convert financial data into two-dimensional images for analyzing the stock market [18-22]. Different data sources reflect the changes in the stock market from different perspectives. Compared with the analysis of the stock market based on a single data source, multisource data can integrate the characteristics of different data sources, which is more conducive to the analysis of the stock market. However, the fusion of multisource data is difficult.

To analyze the stock market more deeply and learn the optimal dynamic trading strategy, this study proposes a deep reinforcement learning model that integrates multisource data to implement stock trading. Through the analysis of stock data, technical indicators, and candlestick charts, we obtain a deeper feature representation of the stock market, which is conducive to learning the optimal trading strategy. Besides, the setting of the reward function in reinforcement learning cannot be ignored. In stock trading, investment risk should be considered alongside returns, and the two should be balanced reasonably. The Sharpe ratio (SR) represents the profit that can be obtained under a certain level of risk [23]. In this study, the reward function takes investment risk into consideration and combines the SR and the profit rate (PR) to promote the learning of optimal trading strategies.

To verify the effectiveness of the trading strategy learned by our proposed model, we compare it with other trading strategies on practical trading data. For stocks with different trends, our trading strategy obtains higher PR and SR, which shows better robustness. In addition, we conduct ablation experiments, and the experimental results show that the trading strategy learned by analyzing the stock market with multisource data is better than those learned from a single data source. The main contributions of this paper are as follows:

(i) A new deep reinforcement learning model is proposed to implement stock trading. It integrates stock data and candlestick charts to analyze the stock market, which is more helpful for learning the optimal dynamic trading strategy.
(ii) A new reward function is proposed. In this study, investment risk is taken into account, and the sum of SR and PR is taken as the reward function.
(iii) The experimental results show that the trading strategy learned by the deep reinforcement learning model proposed in this paper can obtain better profits for stocks with different trends.

2. Related Work

In recent years, a mass of machine learning methods has been applied to stock trading. Investors make trading decisions based on their judgment of the stock market. However, due to the influence of many factors, they cannot make correct trading decisions in time as the stock market changes. Compared with traditional trading strategies, machine learning methods can learn trading strategies by analyzing information related to the stock market and discovering profit patterns that people do not know about, without requiring professional financial knowledge, which gives them more advantages.

Some studies implement stock trading based on deep learning methods. Deep learning methods usually implement stock trading by predicting the future trend or price of a stock [24-27]. In the financial field, deep learning methods are used for stock price prediction because they can obtain temporal characteristics from financial data [28, 29]; Chen et al. [30] analyzed 2D images transformed from financial data through a convolutional neural network (CNN) to classify the future price trend of stocks. When implementing stock trading based on a deep learning method, the higher the accuracy of the prediction, the more helpful it is to the trading decision. On the contrary, when the prediction result deviates greatly from the actual situation, it causes faulty trading decisions. In addition, the trading strategy implemented by such methods is static and cannot be adjusted in time according to the changes in the stock market.

Reinforcement learning can be used to implement stock trading through self-learning and autonomous decision-making. Chakole et al. [31] used the Q-learning algorithm [32] to find the optimal trading strategy, in which the unsupervised learning method K-means and a candlestick chart were, respectively, used to represent the state of the stock market. Deng et al. [33] proposed the Deep Direct Reinforcement Learning model and added fuzzy learning, which is the first attempt to combine deep learning and reinforcement learning in the field of financial transactions. Wu et al. [34] proposed a long short-term memory based (LSTM-based) agent that could perceive stock market conditions and trade automatically by analyzing stock data and technical indicators. Lei et al. [35] proposed the time-driven feature-aware jointly deep reinforcement learning model (TFJ-DRL), which combines a gated recurrent unit (GRU) and a policy gradient algorithm to implement stock trading. Lee et al. [36] proposed the HW_LSTM_RL structure, which first uses wavelet transforms to remove noise in stock data and then analyzes the stock data with deep reinforcement learning to make trading decisions.

Existing studies on stock trading based on deep reinforcement learning mostly analyze the stock market through a single data source. In this study, we propose a new deep reinforcement learning model to implement stock trading and analyze the state of the stock market through stock data, technical indicators, and candlestick charts. In our proposed model, firstly, different deep neural networks are used to extract the features of the different data sources. Secondly, the features of the different data sources are fused. Finally, reinforcement learning makes trading decisions according to the fused features and continuously optimizes the trading strategy according to the profits. The setting of the reward function in reinforcement learning cannot be ignored either. In this study, the SR is added to the reward function, so that investment risk is taken into account while considering the profits.

3. Methods

We propose a new deep reinforcement learning model and implement stock trading by analyzing the stock market with multisource data. In this section, first, we introduce the overall deep reinforcement learning model; then, the feature extraction process of the different data sources is described in detail. Finally, the specific application of reinforcement learning in stock trading is introduced.

3.1. The Overall Structure of the Model. Implementing stock trading based on deep reinforcement learning while correctly analyzing the state of the stock market is more conducive to learning the optimal dynamic trading strategy. To obtain a deeper feature representation of the stock market state and learn the optimal dynamic trading strategy, we fuse the features of stock data, technical indicators, and candlestick charts. Figure 1 shows the overall structure of the model.

The deep reinforcement learning model we propose can be divided into two modules: the deep learning module for extracting features from the different data sources and the reinforcement learning module for making trading decisions. Candlestick chart features are extracted by a CNN and a bidirectional long short-term memory (BiLSTM) network; stock data and technical indicators are the input of an LSTM network for feature extraction. After the features of the different data sources are extracted, they are concatenated to implement feature fusion. The fused features can be regarded as the state of the stock market, and the reinforcement learning module makes trading decisions on this basis. In addition, in the reinforcement learning module, the algorithms used are Dueling DQN [37] and Double DQN [38].

3.2. Deep Learning Module. The purpose of this study is to obtain a deeper feature representation of the stock market environmental state through the fusion of multisource data, so as to learn the optimal dynamic trading strategy. Although raw stock data can reflect changes in the stock market, they contain considerable noise. To reduce the impact of noise and perceive the changes of the stock market more objectively and accurately, relevant technical indicators are used as one of the data sources for analyzing the stock market in this study. Candlestick charts can reflect the changes in the stock market from another perspective, so this paper also fuses the features of the candlestick charts.

3.3. Stock Data and Technical Indicator Feature Extraction. Due to the noise in stock data, we use relevant technical indicators to reduce its impact. The technical indicators reflect the changes in the stock market from different perspectives. In this paper, stock data and technical indicators are used as inputs to the LSTM network to better capture the main trends of stocks. The raw stock data we use include the opening price, closing price, high price, low price, and trading volume. The technical indicators used in this paper are the MACD, EMA, DIFF, DEA, KDJ, BIAS, RSI, and WILLR. The indicators are calculated by mathematical formulas based on stock prices and trading volumes [39], as reported in Table 1.

To facilitate subsequent calculations, we perform missing value processing on the stock data. First, the stock data are cleaned, and the missing data are supplemented with 0. In addition, the input of the neural network must be a real value, so we replace the NaNs in the stock data and technical indicators with 0. Data with different value ranges may cause gradient explosion during neural network training [42]. To prevent this problem, we normalize the stock data and technical indicators; normalization transforms the data to a fixed interval. In this work, each dimension of the stock data and technical indicators is normalized, and the data are converted into the range [0, 1]. The normalization formula is as follows:

X_norm = (X - X_min) / (X_max - X_min),    (1)

where X represents the original data, X_min and X_max represent the minimum and maximum values of the original data, respectively, and X_norm represents the normalized data.
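As an illustration of equation (1), the following minimal Python sketch normalizes each column of a pandas DataFrame into [0, 1]; the function name and the use of pandas are our own assumptions, and in practice the minimum and maximum would normally be taken from the training window only.

```python
import pandas as pd

def min_max_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Scale each column (price, volume, or indicator) into [0, 1] as in equation (1)."""
    df = df.fillna(0)                          # replace NaNs with 0, as described above
    col_min, col_max = df.min(), df.max()      # per-dimension minimum and maximum
    span = (col_max - col_min).replace(0, 1)   # avoid division by zero for constant columns
    return (df - col_min) / span

# Example with hypothetical columns:
# prices = pd.DataFrame({"open": [...], "close": [...], "MACD": [...]})
# normalized = min_max_normalize(prices)
```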
The neural network structure for extracting features from stock data and technical indicators is shown in Figure 2. The LSTM network is a variant of the recurrent neural network (RNN), and its unit structure is shown in Figure 3. LSTM alleviates the problems of gradient vanishing and gradient explosion in long-sequence training. In the LSTM network, f, i, and o represent a forget gate, an input gate, and an output gate, respectively. The forget gate is responsible for removing information from the cell state, the input gate is responsible for adding information to the cell state, and the output gate decides which part of the cell state forms the next hidden state. C_t is the state of the memory cell at time t; C̃_t is the value of the candidate state of the memory cell at time t; σ and tanh are the sigmoid and tanh activation functions, respectively; W and b represent the weight and bias matrices, respectively; x_t is the input vector; h_t is the output vector. In this paper, x_t is the concatenation of the stock data and the technical indicators. x_t and the other quantities are calculated as follows:

x_t = (open, low, close, ..., MACD, RSI, WILLR),
f_t = σ(W_f · [h_{t-1}, x_t] + b_f),
i_t = σ(W_i · [h_{t-1}, x_t] + b_i),
C̃_t = tanh(W_c · [h_{t-1}, x_t] + b_c),
C_t = f_t * C_{t-1} + i_t * C̃_t,
o_t = σ(W_o · [h_{t-1}, x_t] + b_o),
h_t = o_t * tanh(C_t).    (2)

In the entire process of feature extraction from stock data and technical indicators, the stock data and technical indicators are first cleaned and normalized. Then, the normalized data are used as the input of the LSTM network for feature extraction. Finally, the final feature is obtained through a two-layer LSTM network.

Figure 3: LSTM network unit structure.
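The following PyTorch sketch shows one plausible implementation of the two-layer LSTM branch described above. The hidden size of 128 follows Section 4.4, while the class name, the input dimension, and the choice of the final hidden state as the feature vector are assumptions.

```python
import torch
import torch.nn as nn

class PriceIndicatorEncoder(nn.Module):
    """Two-layer LSTM that turns a window of normalized stock data and
    technical indicators into the feature vector v_d."""
    def __init__(self, input_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=2, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, window_length, input_dim), e.g. a 30-day sliding window
        output, (h_n, c_n) = self.lstm(x)
        return h_n[-1]          # hidden state of the last layer at the final time step

# Example (illustrative dimensions): 30-day window, 13 inputs per day
# encoder = PriceIndicatorEncoder(input_dim=13)
# v_d = encoder(torch.randn(1, 30, 13))   # -> shape (1, 128)
```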

Figure 1: The overall structure of the model. vi represents the candlestick chart feature, vd represents the feature of the stock data and technical indicators, and the feature vector obtained by concatenating these two feature vectors is used as the input of the two fully connected (FC) layers. In this paper, FC layers are used to construct the dueling DQN network; the two FC branches represent the advantage function A(s, a) and the state value function V(s) in dueling DQN. The final Q value is obtained by adding the outputs of the two functions.

Table 1: The list of used technical indicators.


Technical indicator Indicator description
MACD Moving average convergence/divergence
EMA Exponential moving average
DIFF Diff
DEA Difference exponential average
KDJ Stochastics
BIAS Bias
RSI Relative strength index
WILLR Williams %R

Figure 2: The network structure for extracting features of stock data and technical indicators.

3.4. Candlestick Chart Feature Extraction. To extract more informative features, in this study, historical stock data are transformed into candlestick charts. A candlestick chart contains not only the candlesticks but also other information and can be divided into two parts: the upper part shows the candlesticks and the moving average of the closing price, and the lower part shows the trading volume histogram and its moving average. Generally, a candlestick consists of a body, an upper shadow, and a lower shadow. The body is the difference between the closing price and the opening price of the stock during the trading session, as shown in Figure 4. If the opening price is lower than the closing price, the price is rising; this kind of candle is called a bullish candle, and the color of the candlestick is red. If the opening price is higher than the closing price, the price has fallen, and the color of the candlestick is green. For a bullish candlestick, the upper shadow is the difference between the high price and the close price, and the lower shadow is the difference between the low price and the open price. For a bearish candlestick, the upper shadow is the difference between the high price and the open price, and the lower shadow is the difference between the low price and the close price. The period represented by one candlestick can range from one minute to one month; the candlestick charts in this study are based on days.

Figure 4: Candlestick representation.

The network structure for extracting candlestick chart features is shown in Figure 5. In this study, we first obtain the features of the candlestick chart through three layers of convolution and pooling, then transform the obtained feature maps into a vector sequence, input it into the BiLSTM network for feature extraction, and finally obtain the final features.

Figure 5: The network structure for extracting features of a candlestick chart. vi is the final feature.
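A possible PyTorch rendering of this branch is sketched below, assuming a 390 x 290 RGB chart (Section 4.4) and three convolution-pooling blocks as described. The channel counts, kernel sizes, and the way the feature maps are reshaped into a sequence for the BiLSTM are not specified in the paper and are therefore illustrative.

```python
import torch
import torch.nn as nn

class CandlestickEncoder(nn.Module):
    """Three conv+pool blocks followed by a BiLSTM, producing the
    candlestick-chart feature v_i; layer sizes are illustrative."""
    def __init__(self, hidden_dim: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.bilstm = nn.LSTM(64, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # img: (batch, 3, H, W), e.g. a 390 x 290 candlestick chart
        feat = self.cnn(img)                                 # (batch, 64, ~H/8, ~W/8)
        b, c, h, w = feat.shape
        seq = feat.view(b, c, h * w).permute(0, 2, 1)        # treat spatial positions as a sequence
        _, (h_n, _) = self.bilstm(seq)
        return torch.cat([h_n[-2], h_n[-1]], dim=1)          # forward + backward final states -> v_i

# v_i = CandlestickEncoder()(torch.randn(1, 3, 390, 290))    # -> shape (1, 256)
```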
value-based deep reinforcement learning algorithm, ac-
tions are selected according to the Q value. In the DQN
3.5. Reinforcement Learning Module. Reinforcement learn- algorithm, there are two networks with the same structure,
ing includes agent, environment, state, action, and reward. the main network and the target network. Initially, the
The agent chooses action according to the environmental parameters of the two networks are the same. During the
state, which will get the immediate reward every time it training process, the target network and the main network
chooses an action. The agent constantly adjusts the learning update the parameters in different ways. In the DQN al-
strategy according to the reward value to obtain the largest gorithm, under the ε-greedy strategy, the agent that has a
cumulative reward value. For example, in the process of greater probability chooses the action corresponding to the
stock trading, if the trading action selected by the agent gains maximum Q value, which will cause the Q value to be
a profit, it will get a positive reward value. In contrast, if there overestimated.
is a loss after choosing a trade action, the agent will get a Compared with the DQN algorithm, the Dueling DQN
negative reward value. Reward promotes the agent to make algorithm changes the calculation method of the Q value
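A minimal sketch of the reward in equation (3) is given below, assuming the agent keeps the recent total asset values in an array; the risk-free rate and the length of the window over which the SR is computed are assumptions, since the paper does not state them.

```python
import numpy as np

def immediate_reward(assets: np.ndarray, risk_free_rate: float = 0.0) -> float:
    """Reward r = SR + PR from equation (3). `assets` holds the total asset
    value P_t over a recent window (oldest first)."""
    returns = np.diff(assets) / assets[:-1]          # per-step profit rates
    pr = returns[-1]                                 # PR = (P_t - P_{t-1}) / P_{t-1}
    excess = returns.mean() - risk_free_rate
    sr = excess / returns.std() if returns.std() > 0 else 0.0
    return sr + pr

# Example: asset values over a short window
# r = immediate_reward(np.array([100000.0, 100500.0, 100200.0, 101000.0]))
```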
In this paper, we combine the Double DQN algorithm and the Dueling DQN algorithm, both of which are improved algorithms based on the DQN algorithm. In value-based deep reinforcement learning algorithms, actions are selected according to the Q value. In the DQN algorithm, there are two networks with the same structure, the main network and the target network. Initially, the parameters of the two networks are the same. During the training process, the target network and the main network update their parameters in different ways. In the DQN algorithm, under the ε-greedy strategy, the agent chooses the action corresponding to the maximum Q value with high probability, which causes the Q value to be overestimated.

Compared with the DQN algorithm, the Dueling DQN algorithm changes the calculation of the Q value by adding a state value function V(s) and an advantage function A(s, a). The value function V(s) is used to evaluate the quality of the state, and the advantage function A(s, a) is used to evaluate the quality of the agent choosing action a in state s. The calculation formula of the Q value is as follows:


Q(s, a; θ, α, β) = A(s, a; θ, α) + V(s; θ, β),    (4)

where α and β represent the parameters of the value function V(s) and the advantage function A(s, a), respectively, and θ represents the other parameters of the deep reinforcement learning model.
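A sketch of the dueling head in equation (4) and Figure 1 is shown below, with one fully connected layer per branch; the class name and layer sizes are illustrative. Note that many public Dueling DQN implementations additionally subtract the mean advantage, which equation (4) does not do.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling architecture of equation (4): the fused state feature is mapped to a
    state value V(s) and per-action advantages A(s, a), which are summed into Q values."""
    def __init__(self, feature_dim: int, num_actions: int = 3):
        super().__init__()
        self.value = nn.Linear(feature_dim, 1)                 # V(s; θ, β)
        self.advantage = nn.Linear(feature_dim, num_actions)   # A(s, a; θ, α)

    def forward(self, state_feature: torch.Tensor) -> torch.Tensor:
        v = self.value(state_feature)        # (batch, 1)
        a = self.advantage(state_feature)    # (batch, 3) for long / neutral / short
        # Equation (4); a common variant also subtracts a.mean(dim=1, keepdim=True)
        # from the advantages to keep V and A identifiable.
        return v + a
```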
Double DQN changes the calculation of the Q value of the target network and solves the Q value overestimation problem in the DQN algorithm; it can be combined with the Dueling DQN algorithm to improve the overall performance of the model. The formula for calculating the Q value of the target network in the Double DQN algorithm is as follows:

Y_t = r_{t+1} + γ · Q(s_{t+1}, argmax_a Q(s_{t+1}, a; θ); θ'),    (5)

where θ and θ' represent the parameters of the main network and the target network, respectively.

The loss function is the mean square error between the Q value of the main network and the target value. The formula is as follows:

L(θ) = E[(Y_t - Q(s, a; θ))^2].    (6)
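The target of equation (5) and the loss of equation (6) can be computed as in the following sketch, where `main_net` and `target_net` are assumed to map fused state features to Q values (for example, the dueling head above) and the batch layout is illustrative.

```python
import torch
import torch.nn.functional as F

def double_dqn_loss(main_net, target_net, batch, gamma: float):
    """Equations (5) and (6): the main network chooses the next action,
    the target network evaluates it, and the loss is the mean squared error."""
    states, actions, rewards, next_states, dones = batch   # tensors sampled from replay memory D
    q = main_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)    # Q(s, a; θ)
    with torch.no_grad():
        next_actions = main_net(next_states).argmax(dim=1, keepdim=True)      # argmax_a Q(s', a; θ)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)   # Q(s', argmax; θ')
        target = rewards + gamma * next_q * (1.0 - dones)                     # Y_t, no bootstrap at terminal states
    return F.mse_loss(q, target)
```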
In this study, we analyze the stock market from stock data, technical indicators, and candlestick charts and fuse the features of the different data sources to obtain the stock market state feature representation and help the agent learn the optimal dynamic trading strategy. The trading action a_t ∈ {long, neutral, short} = {1, 0, -1}, where long, neutral, and short represent buy, hold, and sell, respectively. When the trading action is long, cash is converted into stock as much as possible, and when the trading action is short, all shares are sold into cash. In addition, transaction costs in stock trading cannot be ignored; high-frequency transactions result in higher costs. The transaction cost in this paper is 0.1% of the traded stock value [40].
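A simplified sketch of how the three actions and the 0.1% transaction cost could be applied to the portfolio is given below; the bookkeeping here (full-position buys and sells, no partial orders) follows the description above but is otherwise an assumption.

```python
TRANSACTION_COST = 0.001   # 0.1% of the traded stock value

def apply_action(action: int, cash: float, shares: float, price: float):
    """Execute a trading action: 1 = long (buy with all available cash),
    -1 = short (sell all shares into cash), 0 = hold."""
    if action == 1 and cash > 0:                      # convert cash into stock as much as possible
        bought = cash / (price * (1 + TRANSACTION_COST))
        cash, shares = 0.0, shares + bought
    elif action == -1 and shares > 0:                 # sell all shares into cash
        cash += shares * price * (1 - TRANSACTION_COST)
        shares = 0.0
    return cash, shares

# total assets P_t used by the reward function:
# assets = cash + shares * price
```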
The trading process is shown in Algorithm 1.

Algorithm 1:
Input: stock data, technical indicators, candlestick chart;
Initialize the experience replay memory D to capacity C;
Initialize the main Q network with random weights θ;
Initialize the target Q network with θ' = θ;
for episode = 1 to N do
    for t = 1 to T do
        The fused features extracted by the deep learning module represent the environment state s_t;
        With probability ε choose a random action a_t, otherwise select a_t = argmax_a Q(s_t, a; θ);
        Get the reward r_t and the next state s_{t+1};
        Store the transition (s_t, a_t, r_t, s_{t+1}) in D;
        if t mod n = 0 then
            Sample a minibatch of transitions (s_t, a_t, r_t, s_{t+1}) randomly from D;
            Set Y_t = r_t if the state s_{t+1} is terminal, otherwise Y_t = r_t + γ · Q(s_{t+1}, argmax_a Q(s_{t+1}, a; θ); θ');
            Train the network with loss function L(θ) = E[(Y_t - Q(s, a; θ))^2];
            Update the target network parameters θ' = θ every N steps;
        end if
    end for
end for

4. Experiment and Results

This section mainly introduces the dataset, evaluation metrics, comparison methods, implementation details, and the analysis of the experimental results.

4.1. Datasets. In this study, we verify the dynamic trading strategy learned from the proposed model on datasets of Chinese stocks and S&P 500 stocks and compare it with other trading strategies. The period range of the dataset is from January 2012 to January 2021. The training period ranges from January 2012 to December 2018; the testing period ranges from January 2019 to January 2021. The stock data include the daily open price, high price, low price, close price, and trading volume of the stock, as shown in Table 2.

4.2. Metrics. The evaluation indicators used in this paper are PR, the annualized rate of return (AR), SR, and the maximum drawdown (MDD). The details are as follows:

(i) PR refers to the difference between the assets owned at the end of the stock transaction and the original assets, divided by the original assets.

(ii) AR is the ratio of the profits to the principal over an investment period of one year. The formula is defined as follows:

AR = (total profits / principal) * (365 / trading days) * 100.    (7)

(iii) SR is a standardized comprehensive evaluation index, which considers both profits and risks at the same time to eliminate the adverse impact of risk factors on performance evaluation.
(iv) MDD refers to the maximum loss that can be borne during trading; the lower the value, the better the performance of the trading strategy.
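For reference, these four metrics can be computed from a series of daily total asset values as in the sketch below; the treatment of the risk-free rate and the use of unannualized daily returns for the SR are assumptions, since the paper does not give these details.

```python
import numpy as np

def evaluate(assets: np.ndarray, trading_days: int, risk_free_rate: float = 0.0):
    """Compute PR, AR (equation (7)), SR, and MDD from daily total asset values."""
    principal, final = assets[0], assets[-1]
    pr = (final - principal) / principal * 100                         # profit rate (%)
    ar = (final - principal) / principal * 365 / trading_days * 100    # annualized rate of return (%)
    daily_returns = np.diff(assets) / assets[:-1]
    sr = (daily_returns.mean() - risk_free_rate) / daily_returns.std()
    running_max = np.maximum.accumulate(assets)
    mdd = ((running_max - assets) / running_max).max() * 100           # maximum drawdown (%)
    return pr, ar, sr, mdd
```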
Table 2: Stock data structure example.

Date        Open    High    Low     Close   Volume
...         ...     ...     ...     ...     ...
2018/4/5    177.01  185.49  175.75  91.98   2749769
2018/4/6    185.30  190.95  184.08  94.13   2079350
2018/4/7    189.06  194.00  186.98  94.28   2565820
2018/4/8    192.03  196.00  188.20  93.98   1271339
...         ...     ...     ...     ...     ...
4.3. Baselines

(i) Buy and Hold (B&H) [41] refers to constructing a portfolio according to a predetermined asset allocation ratio and maintaining this portfolio during the holding period without changing the asset allocation. The B&H strategy is a passive investment strategy.
(ii) Two models based on the Q-learning algorithm are proposed to implement stock trading [31]. The two models perceive the stock market environment in different ways: model 1 analyzes the stock market through the K-means method, and model 2 analyzes the stock market through a candlestick chart. The experimental results show that model 1 performs better than model 2, so we only compare with model 1. In model 1, the number of clusters n is set to 3, 6, and 9, and we compare each setting, respectively.
(iii) An LSTM-based agent is proposed to learn the temporal relationships in the data and implement stock trading [34], and the effects of different combinations of technical indicators on the trading strategy are verified. In this paper, only the group of technical indicators with the best results is compared.
(iv) The time-driven feature-aware jointly deep reinforcement learning (TFJ-DRL) model [35] uses a GRU to extract temporal features of stock data and technical indicators and implements stock trading through the policy gradient algorithm.
(v) HW_LSTM_RL [36] is a structure that combines wavelet transformation and deep reinforcement learning for stock trading and is a relatively new method.

4.4. Implementation Details. This study implements stock trading based on deep reinforcement learning and fuses the features of stock data, technical indicators, and candlestick charts as the state of the stock market. The LSTM network extracts features from the stock data and technical indicators, the size of its hidden layer is 128, and the size of the candlestick chart is 390 x 290. In the process of learning the optimal trading strategy, an episode is the trading period from January 2012 to December 2018, and the number of training episodes is 200. In the ε-greedy strategy of reinforcement learning, ε = 0.8. The length of the sliding time window is set to 30 days, and the learning rate is 0.0001.
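For convenience, the hyperparameters stated in this subsection and elsewhere in the paper can be collected in a single configuration object, as sketched below; parameters the paper does not mention (such as batch size or replay memory capacity) are deliberately omitted.

```python
# Hyperparameters stated in the paper, collected for reference.
CONFIG = {
    "lstm_hidden_size": 128,              # LSTM hidden layer size
    "candlestick_image_size": (390, 290),
    "training_episodes": 200,
    "epsilon": 0.8,                       # ε-greedy exploration
    "sliding_window_days": 30,
    "learning_rate": 1e-4,
    "transaction_cost": 0.001,            # 0.1% of the traded stock value
    "train_period": ("2012-01", "2018-12"),
    "test_period": ("2019-01", "2021-01"),
}
```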
4.5. Experimental Results

4.5.1. Comparative Experiment on the Chinese Stock Dataset. We select 10 Chinese stocks with different trends for comparative experiments, and the initial amount is 100,000 Chinese Yuan (CNY). The results of the experiment are shown in Tables 3-6. We select three stocks with different trend changes to further demonstrate the PR changes, as shown in Figure 6.

The traditional trading strategy B&H is a passive trading strategy, which has an advantage for stocks with rising prices. However, it does not perform well for stocks with large price fluctuations or downward trends. It can be seen from Figure 6 that, for stock 002460 with an upward trend, the B&H trading strategy can obtain a higher PR, while for the other two stocks, 601101 and 600746, with different trends, the PR obtained is not as good as that of the other trading strategies. Trading strategies learned with the Q-learning algorithm are dynamic compared with the traditional trading strategy B&H. In most cases, the trading strategies learned by model 1 obtain higher PR, AR, and SR and lower MDD for stocks with different trends in different fields. Nonetheless, reinforcement learning lacks the ability to perceive the environment, so compared with the trading strategies learned by the deep reinforcement learning models, the trading strategies learned by model 1 do not have obvious advantages. LSTM-based, TFJ-DRL, and HW_LSTM_RL are all methods based on deep reinforcement learning. Although the data sources they analyze are relatively single, compared with the traditional trading strategy B&H and the trading strategies learned by model 1, the trading strategies learned by these methods can obtain more profits for stocks with different trends in different fields. From the experimental results, we can see that the dynamic trading strategies learned by our proposed model have better performance. It can be seen from Tables 3-6 that, for stocks with different trends, the trading strategy learned by the model proposed in this paper has better performance. Compared with other trading strategies, the evaluation indicators of our trading strategy are the highest in most cases. On the whole, the average PR of our trading strategy is 162.87, the average AR is 34.37, and the average SR is 0.97, all higher than those of the other trading strategies, while the average MDD is 29.98, the lowest among them.

Table 3: PR comparison of different methods in the Chinese stock market dataset.


PR (%)
Stock   B&H [41]   Model 1 n=3 [31]   Model 1 n=6 [31]   Model 1 n=9 [31]   LSTM-based [34]   TFJ-DRL [35]   HW_LSTM_RL [36]   Ours
002460 365.62 −50.66 −19.48 −68.11 133.29 379.32 257.68 418.12
601101 −18.93 −14.38 −18.07 −19.77 −18.79 −28.59 −25.09 −9.28
600746 2.49 16.17 7.54 31.55 2.85 0.14 4.05 32.30
600316 438.22 472.61 480.32 524.03 492.31 502.73 536.28 595.93
600028 −19.22 20.87 22.35 19.56 18.70 12.49 21.38 23.94
600900 26.21 28.25 22.11 25.67 9.44 19.63 27.14 24.35
002129 189.41 193.78 163.50 173.92 176.40 178.05 190.31 196.67
600704 −2.91 10.89 16.54 12.34 9.91 15.26 18.28 22.98
600377 0.58 6.09 3.82 10.41 11.12 6.73 9.41 14.05
300122 285.45 268.52 259.87 298.91 275.26 290.35 301.48 309.67
Average 126.69 52.68 50.62 53.69 111.05 137.61 134.09 162.87

Table 4: AR comparison of different methods in the Chinese stock market dataset.


AR (%)
Stock   B&H [41]   Model 1 n=3 [31]   Model 1 n=6 [31]   Model 1 n=9 [31]   LSTM-based [34]   TFJ-DRL [35]   HW_LSTM_RL [36]   Ours
002460 67.25 −51.75 −4.72 −100 45.64 68.09 59.14 70.37
601101 −4.73 −4.63 −7.73 −7.70 −4.64 −11.62 −8.97 0.95
600746 11.58 13.31 9.47 15.52 21.76 16.59 12.30 28.18
600316 72.53 74.23 74.72 76.08 68.28 67.12 72.29 79.83
600028 5.32 5.21 6.81 4.69 6.99 5.72 6.33 7.39
600900 12.35 13.82 10.47 12.04 5.83 11.36 12.74 11.69
002129 53.14 54.26 48.12 50.33 49.41 51.42 54.02 58.90
600704 2.32 7.43 10.06 8.49 9.32 9.95 11.47 13.28
600377 1.73 3.11 1.63 5.84 6.42 2.58 4.23 7.70
300122 60.18 57.12 55.84 61.40 60.52 62.47 59.24 65.39
Average 28.17 17.21 20.47 12.67 26.95 28.37 28.28 34.37

Table 5: SR comparison of different methods in the Chinese stock market dataset.


SR
Stock   B&H [41]   Model 1 n=3 [31]   Model 1 n=6 [31]   Model 1 n=9 [31]   LSTM-based [34]   TFJ-DRL [35]   HW_LSTM_RL [36]   Ours
002460 1.72 −1.01 −0.08 −1.56 1.04 1.75 1.47 1.83
601101 −0.13 −0.21 −0.41 −0.32 −0.13 −0.31 −0.24 0.03
600746 0.25 0.39 0.29 0.62 0.49 0.30 0.41 0.52
600316 1.77 1.79 1.83 1.89 1.65 1.73 1.92 2.00
600028 0.23 0.21 0.34 0.19 0.35 0.22 0.27 0.49
600900 0.74 0.81 0.67 0.72 0.34 0.68 0.71 0.70
002129 1.23 1.25 1.18 1.16 1.20 1.12 1.22 1.34
600704 0.08 0.35 0.41 0.39 0.38 0.40 0.43 0.50
600377 0.10 0.22 0.15 0.35 0.39 0.24 0.30 0.46
300122 1.64 1.67 1.52 1.69 1.48 1.61 1.70 1.78
Average 0.76 0.55 0.59 0.51 0.72 0.77 0.82 0.97

4.5.2. Comparative Experiment on the S&P 500 Stock Dataset. To further verify the performance of the trading strategy learned by the model proposed in this paper, we selected 10 stocks with different trends from the S&P 500 dataset and compared them with other trading strategies; the initial capital is 100,000 USD. The results of the experiment are shown in Tables 7-10. We also show the price and PR changes of stocks with different trends in detail, as shown in Figure 7.

Table 6: MDD comparison of different methods in the Chinese stock market dataset.
MDD (%)
Stock   B&H [41]   Model 1 n=3 [31]   Model 1 n=6 [31]   Model 1 n=9 [31]   LSTM-based [34]   TFJ-DRL [35]   HW_LSTM_RL [36]   Ours
002460 41.32 60.70 50.50 69.37 56.68 48.05 41.32 41.32
601101 56.98 42.17 39.50 43.90 56.29 59.38 57.80 46.44
600746 42.58 44.56 40.65 36.01 52.47 53.53 39.68 49.84
600316 25.36 26.71 26.59 26.83 28.91 30.12 34.40 25.12
600028 50.03 53.17 51.77 56.82 17.20 43.95 23.39 15.58
600900 16.19 15.29 19.40 16.38 19.95 17.23 18.34 17.42
002129 35.48 35.21 38.76 34.92 35.48 42.83 32.57 30.41
600704 25.77 26.91 25.08 26.96 30.79 36.24 24.16 23.37
600377 19.76 19.84 20.62 18.26 20.31 23.48 18.22 15.45
300122 38.07 38.28 40.96 36.05 42.60 39.12 38.55 34.89
Average 35.15 36.28 35.58 36.55 36.07 39.39 32.84 29.98

Figure 6: Changes in the price and PR of stocks with different trends. (a) 002460. (b) 002460. (c) 600746. (d) 600746. (e) 601101. (f) 601101.

Table 7: PR comparison of different methods in the S&P 500 market dataset.


PR (%)
Stock   B&H [41]   Model 1 n=3 [31]   Model 1 n=6 [31]   Model 1 n=9 [31]   LSTM-based [34]   TFJ-DRL [35]   HW_LSTM_RL [36]   Ours
AMD 296.09 323.08 315.36 337.61 301.28 283.45 336.72 352.67
AAL −55.78 −11.73 −75.31 −28.66 −60.89 −70.49 −18.58 −3.69
BIO 122.72 136.15 116.40 129.37 128.24 133.61 140.27 143.67
BLK 79.56 83.41 75.94 84.28 74.25 78.97 82.62 85.39
TSLA 1060.14 1167.90 1005.81 986.43 958.22 1024.26 1130.22 1184.59
AAPL 216.41 218.39 206.28 224.83 197.34 209.41 213.70 227.61
GOOGL 54.76 22.87 34.83 61.24 39.29 50.87 52.97 62.47
IBM 0.41 2.46 5.51 −2.31 18.25 8.50 4.18 28.30
HST −15.05 −10.81 −18.92 −14.06 −17.51 −14.32 −10.63 −3.47
PG 47.52 49.84 50.06 46.61 40.94 44.72 50.25 52.04
Average 180.68 198.16 171.60 182.53 167.94 174.90 198.17 212.96

Table 8: AR comparison of different methods in the S&P 500 market dataset.


AR (%)
Stock   B&H [41]   Model 1 n=3 [31]   Model 1 n=6 [31]   Model 1 n=9 [31]   LSTM-based [34]   TFJ-DRL [35]   HW_LSTM_RL [36]   Ours
AMD 63.19 65.01 64.23 67.49 63.13 54.20 69.41 72.26
AAL −13.27 −0.04 −48.11 −13.73 −20.70 −35.88 6.82 32.25
BIO 38.92 41.57 36.93 39.61 28.29 37.43 40.57 42.73
BLK 31.10 31.48 29.85 32.17 25.30 31.72 32.96 33.20
TSLA 99.24 100.03 97.15 96.09 84.61 94.83 96.25 103.85
AAPL 51.27 52.28 49.54 53.23 46.49 50.30 52.81 55.71
GOOGL 23.87 22.04 14.19 25.57 19.11 22.86 22.97 25.94
IBM 5.06 5.12 6.16 2.59 12.58 8.30 5.65 16.34
HST 4.16 4.69 3.73 4.01 5.04 4.92 6.44 8.92
PG 20.56 20.87 21.65 18.02 19.83 21.41 20.36 24.14
Average 32.41 34.30 27.53 32.51 28.37 29.01 35.42 41.43

Table 9: SR comparison of different methods in the S&P 500 market dataset.


SR
Stock   B&H [41]   Model 1 n=3 [31]   Model 1 n=6 [31]   Model 1 n=9 [31]   LSTM-based [34]   TFJ-DRL [35]   HW_LSTM_RL [36]   Ours
AMD 1.55 1.64 1.59 1.69 1.42 1.30 1.61 1.73
AAL −0.16 0 −0.56 −0.54 −0.24 −0.36 0.12 0.42
BIO 1.29 1.32 1.14 1.30 1.27 1.12 1.22 1.36
BLK 0.98 0.92 0.83 1.16 0.85 0.96 1.06 1.24
TSLA 2.08 2.11 1.87 1.79 1.69 1.92 2.13 2.21
AAPL 1.76 1.78 1.62 1.81 1.68 1.74 1.71 1.85
GOOGL 0.85 0.56 0.93 0.92 0.71 0.81 0.86 1.11
IBM 0.16 0.19 0.23 0.10 0.42 0.28 0.21 0.55
HST 0.08 0.09 0.05 0.07 0.05 0.07 0.11 0.16
PG 0.89 0.91 0.93 0.75 0.65 0.90 0.82 0.98
Average 0.95 0.95 0.86 0.91 0.85 0.87 0.99 1.16

Table 10: MDD comparison of different methods in the S&P 500 market dataset.
MDD (%)
Stock   B&H [41]   Model 1 n=3 [31]   Model 1 n=6 [31]   Model 1 n=9 [31]   LSTM-based [34]   TFJ-DRL [35]   HW_LSTM_RL [36]   Ours
AMD 34.28 34.06 34.82 33.95 35.47 39.05 30.41 32.11
AAL 74.72 30.20 82.01 41.15 77.64 76.10 48.31 34.06
BIO 19.72 19.23 21.65 19.28 25.83 22.64 20.16 18.09
BLK 42.27 45.21 49.32 38.75 50.28 48.45 40.91 36.27
TSLA 33.73 31.37 36.04 38.61 41.29 30.73 35.20 27.06
AAPL 20.37 21.38 23.09 19.71 22.74 21.29 18.43 16.51
GOOGL 32.41 43.69 38.62 44.41 20.22 30.84 22.52 24.01
IBM 38.96 33.68 31.71 32.65 31.60 30.37 24.62 30.55
HST 52.05 47.26 49.25 47.81 59.18 48.20 44.36 43.08
PG 23.14 23.27 27.18 29.02 33.38 30.43 25.58 20.06
Average 37.17 32.93 39.37 34.53 39.76 37.81 31.05 28.18

Figure 7: Changes in the price and PR of stocks with different trends. (a) GOOGL. (b) GOOGL. (c) IBM. (d) IBM. (e) AAL. (f) AAL.

For stocks with different trends in the S&P 500, our trading strategy also has better performance. It can be seen from Tables 7-10 that, compared with other trading strategies, our trading strategy obtains higher PR, AR, and SR and a lower MDD; on the whole, our PR reaches 212.96 and our SR reaches 1.16, obviously higher than those of the other trading strategies.

To further verify the performance of our proposed model in different stock markets, we conducted a Mann-Whitney U test on the profit rates of the 20 stocks selected from the Chinese stock market and the S&P 500 stock market. The result showed that P = 0.677 > 0.05, indicating that there is no significant difference between the returns obtained by our model in the Chinese stock market and the S&P 500 stock market, so the model has good generalization ability.
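A sketch of such a test with SciPy is shown below, using the PR values of our strategy from Tables 3 and 7 as input; whether it reproduces the exact P = 0.677 reported above depends on the return series actually used in the paper.

```python
from scipy.stats import mannwhitneyu

# PR (%) of our strategy for the 10 Chinese stocks (Table 3) and the 10 S&P 500 stocks (Table 7)
chinese_pr = [418.12, -9.28, 32.30, 595.93, 23.94, 24.35, 196.67, 22.98, 14.05, 309.67]
sp500_pr = [352.67, -3.69, 143.67, 85.39, 1184.59, 227.61, 62.47, 28.30, -3.47, 52.04]

stat, p_value = mannwhitneyu(chinese_pr, sp500_pr, alternative="two-sided")
print(p_value)   # compare with the P = 0.677 reported above
```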
4.5.3. Reward Function Comparison Experiment. In this section, we set the reward function with SR and without SR and select two stocks each from the Chinese stock market and the S&P 500 stock market for comparison experiments. The stocks selected from the Chinese stock market are 600746 and 601101, and the stocks selected from the S&P 500 are GOOGL and IBM. The training period ranges from January 2012 to December 2018, and the testing period ranges from January 2019 to January 2021. The experimental results are shown in Table 11.

It can be seen from the experimental results in Table 11 that, compared with the reward function without the SR, the reward function that contains the SR leads to trading strategies with better overall performance. Different from most existing algorithmic trading studies based on deep reinforcement learning, which take the profit rate as the reward function, this study takes investment risk into account and uses the sum of SR and profit rate as the reward function, and the learned trading strategy obtains higher PR, AR, and SR and a lower MDD.

4.5.4. Ablation Experiments. In this section, to verify the effectiveness of multisource data fusion, we conduct an ablation experiment. Three groups of comparative experiments are carried out, all of which implement stock trading based on deep reinforcement learning.

Table 11: Comparison results of the reward function experiment.

Stock   With SR: PR (%)  AR (%)  SR  MDD (%)   Without SR: PR (%)  AR (%)  SR  MDD (%)
600746 595.93 28.18 0.52 28.18 528.31 25.72 0.46 33.39
601101 −9.28 0.95 0.03 46.44 −14.99 −2.31 −0.06 48.54
GOOGL 62.47 25.94 1.11 24.01 52.18 22.90 0.95 29.40
IBM 28.30 16.34 0.55 30.55 24.39 14.28 0.40 35.20


Figure 8: Profit curves of the ablation experiment on stock GOOGL.

Table 12: Comparison results of ablation experiments on stock GOOGL.


Group   PR (%)   AR (%)   SR   MDD (%)
Group one 44.74 22.36 0.95 27.35
Group two 54.76 23.70 0.98 25.61
Group three 62.47 25.94 1.11 24.01

The first group analyzes the stock market through stock data and technical indicators; the second group analyzes the stock market through candlestick charts; and the third group analyzes the stock market through stock data, technical indicators, and candlestick charts. We select the trading data of GOOGL stock from January 2012 to January 2021 as the dataset for this section, in which January 2012 to December 2018 is the training data and January 2019 to January 2021 is the test data. The comparison results are shown in Figure 8 and Table 12.

The experimental results show that, compared with the trading strategies learned by the first two groups, the trading strategy learned by the third group obtains higher PR, AR, and SR and a lower MDD. This also proves that the analysis of multisource data can obtain a deeper feature representation of the stock market, which is more conducive to learning the optimal trading strategy.

5. Conclusion

Correct analysis of the stock market state is one of the challenges when implementing stock trading based on deep reinforcement learning. In this research, we analyze multisource data based on deep reinforcement learning to implement stock trading. Stock data, technical indicators, and candlestick charts can reflect the changes in the stock market from different perspectives; we use different deep neural networks to extract the features of each data source and fuse them, and the fused features are more helpful for learning the optimal dynamic trading strategy.

It can be concluded from the experimental results that trading strategies learned with the deep reinforcement learning method can be dynamically adjusted according to stock market changes and therefore have more advantages. Compared with other trading strategies, our trading strategy has better performance for stocks with different trends, and its average SR value is the highest, which means that under the same risk our trading strategy can obtain more profit. However, textual information such as investor comments and news events has an impact on the fluctuations of stock prices and cannot be ignored. It is important to obtain informative data from relevant texts for stock trading. In future research, we will consider different text information and train more stable trading strategies.

Data Availability

The experimental data in this article can be downloaded from Yahoo Finance (https://finance.yahoo.com/).

Conflicts of Interest

The authors declare no conflicts of interest.

Acknowledgments

This research was funded by the National Natural Science Foundation of China (Grant nos. 61972227 and 61902217), the Natural Science Foundation of Shandong Province (Grant nos. ZR2019MF051, ZR2020MF037, and ZR2019BF043), the NSFC-Zhejiang Joint Fund of the Integration of Informatization and Industrialization (Grant no. U1909210), the Key Research and Development Project of Shandong Province (Grant nos. 2019GGX101007 and 2019GSF109112), and the Science and Technology Plan for Young Talents in Colleges and Universities of Shandong Province (Grant no. 2020KJN007).

References

[1] Y. Li, W. Zheng, and Z. Zheng, "Deep robust reinforcement learning for practical algorithmic trading," IEEE Access, vol. 7, pp. 108014-108022, 2019.
[2] T. Fischer and C. Krauss, "Deep learning with long short-term memory networks for financial market predictions," European Journal of Operational Research, vol. 270, no. 2, pp. 654-669, 2018.
[3] X. Ding, Y. Zhang, and T. Liu, "Deep learning for event-driven stock prediction," in Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, pp. 2327-2333, Buenos Aires, Argentina, 2015.
[4] S. Carta, A. Corriga, A. Ferreira, A. S. Podda, and D. R. Recupero, "A multi-layer and multi-ensemble stock trader using deep learning and deep reinforcement learning," Applied Intelligence, vol. 51, no. 8, pp. 889-905, 2021.
[5] R. Chia, S. Y. Lim, P. K. Ong, and S. F. Teh, "Pre and post Chinese new year holiday effects: evidence from Hong Kong stock market," The Singapore Economic Review, vol. 60, no. 4, pp. 1-14, 2015.
[6] Q. Huang, T. Wang, D. Tao, and X. Li, "Biclustering learning of trading rules," IEEE Transactions on Cybernetics, vol. 45, no. 20, pp. 2287-2298, 2014.
[7] D. Silver, J. Schrittwieser, K. Simonyan et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, no. 7076, pp. 354-359, 2017.
[8] D. Silver, T. Hubert, J. Schrittwieser et al., "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play," Science, vol. 362, no. 6419, pp. 1140-1144, 2018.
[9] V. Mnih, K. Kavukcuoglu, D. Silver et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.
[10] S. Gu, E. Holly, T. Lillicrap, and S. Levine, "Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pp. 3389-3396, Singapore, June 2017.
[11] P. Wolf, C. Hubschneider, M. Weber, and A. Bauer, "Learning how to drive in a real world simulation with deep q-networks," in Proceedings of the IEEE Intelligent Vehicles Symposium (IV), pp. 244-250, Los Angeles, CA, USA, June 2017.
[12] A. Y. Ng, H. J. Kim, M. I. Jordan, and S. Sastry, "Autonomous helicopter flight via reinforcement learning," in Proceedings of the Conference on Neural Information Processing Systems, vol. 16, 2003.
[13] W. Bao, J. Yue, and Y. Rao, "A deep learning framework for financial time series using stacked autoencoders and long-short term memory," PLoS One, vol. 12, no. 7, p. e0180944, 2017.
[14] K. Chen, Y. Zhou, and F. Dai, "A LSTM-based method for stock returns prediction: a case study of China stock market," in Proceedings of the IEEE International Conference on Big Data, pp. 2823-2824, Santa Clara, CA, USA, November 2015.
[15] S. Karaoglu, U. Arpaci, and S. Ayvaz, "A deep learning approach for optimization of systematic signal detection in financial trading systems with big data," International Journal of Intelligent Systems and Applications in Engineering, vol. 2017, Special Issue, pp. 31-36, 2017.
[16] C. J. Neely, D. E. Rapach, J. Tu, and G. Zhou, "Forecasting the equity risk premium: the role of technical indicators," Management Science, vol. 60, no. 7, pp. 1772-1791, 2014.
[17] A. Gorgulho, N. Rui, and N. Horta, "Applying a GA kernel on optimizing technical analysis rules for stock picking and portfolio composition," Expert Systems with Applications, vol. 38, no. 11, pp. 14072-14085, 2011.
[18] O. B. Sezer and A. M. Ozbayoglu, "Algorithmic financial trading with deep convolutional neural networks: time series to image conversion approach," Applied Soft Computing, vol. 70, pp. 525-538, 2018.
[19] T. Kim, H. Y. Kim, and A. Montoya, "Forecasting stock prices with a feature fusion LSTM-CNN model using different representations of the same data," PLoS ONE, vol. 14, no. 2, p. e0212320, 2019.
[20] M. U. Gudelek, S. A. Boluk, and A. M. Ozbayoglu, "A deep learning based stock trading model with 2-D CNN trend detection," in Proceedings of the IEEE Symposium Series on Computational Intelligence, pp. 1-8, Honolulu, HI, USA, 2017.
[21] A. Tsantekidis, N. Passalis, and A. Tefas, "Forecasting stock prices from the limit order book using convolutional neural networks," in Proceedings of the IEEE 19th Conference on Business Informatics, vol. 1, pp. 7-12, Thessaloniki, Greece, 2017.
[22] Y. Y. Chen, W. L. Chen, and S. H. Huang, "Developing arbitrage strategy in high-frequency pairs trading with filter-bank CNN algorithm," in Proceedings of the IEEE International Conference on Agents, pp. 113-116, Singapore, 2018.
[23] W. F. Sharpe, "The Sharpe ratio," Journal of Portfolio Management, vol. 21, no. 1, pp. 49-58, 1994.
[24] S. Liu, C. Zhang, and J. Ma, "CNN-LSTM neural network model for quantitative strategy analysis in stock markets," Neural Information Processing, pp. 198-206, 2017.
[25] Y. Yan and D. Yang, "A stock trend forecast algorithm based on deep neural networks," Scientific Programming, vol. 2021, Article ID 7510641, 2021.
[26] D. T. Tran, M. Magris, and J. Kanniainen, "Tensor representation in high-frequency financial data for price change prediction," in Proceedings of the IEEE Symposium Series on Computational Intelligence, pp. 1-7, Honolulu, HI, USA, 2017.
[27] M. Dixon, D. Klabjan, and J. H. Bang, "Classification-based financial markets prediction using deep neural networks," Algorithmic Finance, vol. 6, no. 3-4, pp. 67-77, 2017.

[28] J. Long, Z. Chen, and W. He, "An integrated framework of deep learning and knowledge graph for prediction of stock price trend: an application in Chinese stock exchange market," Applied Soft Computing, vol. 91, 2020.
[29] S. W. Lee and H. Y. Kim, "Stock market forecasting with super-high dimensional time-series data using ConvLSTM, trend sampling, and specialized data augmentation," Expert Systems with Applications, vol. 161, 2020.
[30] J. F. Chen, W. L. Chen, C.-P. Huang, S.-H. Huang, and A.-P. Chen, "Financial time-series data analysis using deep convolutional neural networks," in Proceedings of the 2016 7th International Conference on Cloud Computing and Big Data, pp. 87-92, Macau, China, 2016.
[31] J. B. Chakole, M. S. Kolhe, and G. D. Mahapurush, "A Q-learning agent for automated trading in equity stock markets," Expert Systems with Applications, vol. 163, 2021.
[32] C. J. C. H. Watkins, Learning from Delayed Rewards, Ph.D. thesis, University of Cambridge, 1989.
[33] Y. Deng, F. Bao, and Y. Kong, "Deep direct reinforcement learning for financial signal representation and trading," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 3, pp. 653-664, 2016.
[34] J. Wu, C. Wang, and L. Xiong, "Quantitative trading on stock market based on deep reinforcement learning," in Proceedings of the International Joint Conference on Neural Networks, pp. 1-8, Budapest, Hungary, 2019.
[35] K. Lei, B. Zhang, and Y. Li, "Time-driven feature-aware jointly deep reinforcement learning for financial signal representation and algorithmic trading," Expert Systems with Applications, vol. 140, 2020.
[36] J. Lee, H. Koh, and H. J. Choe, "Learning to trade in financial time series using high-frequency through wavelet transformation and deep reinforcement learning," Applied Intelligence, 2021.
[37] Z. Wang, T. Schaul, and M. Hessel, "Dueling network architectures for deep reinforcement learning," in Proceedings of the International Conference on Machine Learning, 2016.
[38] H. van Hasselt, A. Guez, and D. Silver, "Deep reinforcement learning with double Q-learning," in Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
[39] X. Wu, H. Chen, and J. Wang, "Adaptive stock trading strategies with deep reinforcement learning methods," Information Sciences, 2020.
[40] T. Théate and D. Ernst, "An application of deep reinforcement learning to algorithmic trading," Expert Systems with Applications, vol. 173, 2020.
[41] E. Chan, Algorithmic Trading: Winning Strategies and Their Rationale, John Wiley & Sons, 2013.
[42] A. J. Hussain, A. Knowles, P. J. G. Lisboa, and W. El-Deredy, "Financial time series prediction using polynomial pipelined neural networks," Expert Systems with Applications, vol. 35, no. 3, pp. 1186-1199, 2008.
