Yuming Li, Pin Ni
Department of Computer Science, University of Liverpool, Liverpool, UK
E-mail: [email protected], [email protected]

Victor Chang
School of Computing, Engineering and Digital Technologies, Teesside University, Middlesbrough, UK
E-mail: [email protected]
Abstract The role of the stock market in the overall financial market is indispensable. How to acquire practical trading signals during the transaction process so as to maximize returns is a problem that has been studied for a long time. This paper applies Deep Reinforcement Learning to stock trading decisions and stock price prediction; the reliability and availability of the model are demonstrated with experimental data, and the model is compared with a traditional model to show its advantages. From the point of view of stock market forecasting and intelligent decision-making mechanisms, this paper demonstrates the feasibility of Deep Reinforcement Learning in financial markets and the credibility and advantages of its strategic decision-making.
Keywords Reinforcement Learning · Financial Strategy · Deep Q Learning
1 Introduction
1.1 Background
As current Artificial Intelligence methods have become closer to the way humans think and behave, there is a need to develop something innovative. Deep Reinforcement Learning (DRL), which integrates the perception of Deep Learning with the decision-making ability of Reinforcement Learning, simulates human cognition and learning. The method can take vision and other multidimensional, high-dimensional resource information as input and then directly output actions through a Deep Neural Network, which can be controlled directly according to the input image without external supervision.
A Deep Neural Network (DNN) can automatically find the corresponding lower-dimensional representation of high-dimensional input data. The core of a DNN is to integrate representational bias into the hierarchical neural network architecture. Therefore, Deep Learning has strong perception and feature-extraction abilities; its weakness is the lack of decision-making capability. Although Reinforcement Learning can be used in decision-making processes, it struggles to express perception fully. This has motivated us to integrate Deep Learning with Reinforcement Learning, since the two methods are complementary. The integrated approach can provide a scheme for constructing the cognitive decision-making component of a sophisticated system.
The stock market is characterized by rapid change, many interfering factors and insufficient periodic data. Stock trading is a game process under incomplete information, and single-objective supervised learning models are ill-suited to such sequential decision problems. Reinforcement learning is one of the effective ways to solve them. Conventional quantitative investing is often based on technical indicators, with relatively poor self-adaptability and a short life span. This paper introduces a Deep Reinforcement Learning model to the financial area, which can deal with the huge volumes of data in financial markets and enhance the ability to process data and extract features from transaction signals, so as to improve trading performance. Besides, this study brings the combination of Deep Learning and Reinforcement Learning theory from computer science to the financial field, and uses the ability of neural networks to capture and analyze the information in massive financial data. For instance, stock trading is a sequential decision-making process; for Reinforcement Learning, the final task is to learn multi-stage behavior strategies. The method can identify the best price in a certain state so as to minimize transaction cost. Consequently, it has strong practicability for the investment field.
1.2 Organization
Section 7 compares our approach with another alternative approach; Section 8 summarizes the full text and proposes further research plans.
2 Literature Review
The major idea of early Deep Reinforcement Learning was to use neural networks to reduce the dimensionality of high-dimensional data and thereby facilitate data processing. Shibata et al. [28][27] first integrated a single-layer neural network with Reinforcement Learning to process visual signals when building a model for an automatic box-pushing task. Lange et al. [18] proposed applying deep auto-encoders to visual learning control and put forward "visual motion learning" to train agents with human-like perception and decision-making capacity. Abtahi et al. [2] introduced Deep Belief Networks (DBN) into Reinforcement Learning; in their model, the DBN replaces the original value-function approximator, and the model was successfully applied to the character segmentation task for license plate images. Then, Lange et al. [19] proposed Deep Q-Learning, which applied vision-based Reinforcement Learning to automatically control a car. Koutnik et al. [16] combined the Neural Evolution (NE) method with Reinforcement Learning in the popular car racing game TORCS [35] and finally realized automatic driving of the car.
In a stock decision model based on DRL, the DL part automatically perceives the current market environment for feature learning, while the RL part interacts with this deep representation and makes trading decisions to accumulate the final return in the continuously updated environment.
Mnih et al. [23] are regarded as pioneers of DRL. In their paper, the pixels of the game screen are taken as the input state (S), and the front, rear, left and right directions of the game joystick are taken as the actions (A) to solve the decision-making problem of Atari games. In 2015, they showed that the Deep Q-Network agent could surpass all existing algorithms [24]. Many researchers then improved DQN. Van Hasselt et al. [30] proposed Double-DQN, in which one Q network chooses the action and the other Q network evaluates it; the two networks work together to solve the bias problem of a single DQN. In 2016, Schaul et al. [25] added a prioritized experience replay mechanism on top of Double-DQN to speed up training by replaying important transitions more frequently. Wang et al. [34] brought forward the Dueling Network, a DQN-based method that splits the original network into one stream that outputs a scalar state value V(s) and another that outputs action advantage values, and then aggregates the two into the final Q value. Silver et al. [29] presented Deterministic Policy Gradient algorithms (DPG); the DDPG paper from Google [21] then combined DQN and DPG to apply DRL to continuous action-space control. In the research from UC Berkeley [26], the essence of the method is a trust-region constraint that improves the stability of the DRL model. Dulac-Arnold et al. [10] creatively introduced the concept of action embedding, embedding discrete real-world actions into a continuous space so that reinforcement learning methods can be applied to large-scale learning problems. The above results show that deep reinforcement learning algorithms are continuously being improved and refined to adapt to more realistic situations. Reinforcement learning can observe the environment without supervision, actively explore through trial and error, and summarize useful experience by itself. Although active learning systems that combine deep learning and reinforcement learning are still at an initial stage, they have achieved excellent results in learning various video games.
In recent years, researchers have become increasingly interested in evolutionary algorithms such as genetic algorithms [8][9][5] and artificial neural networks [6] for devising stock trading strategies. Deep reinforcement learning has been applied to areas such as high-frequency trading and investment portfolios in financial pairs trading. To be more exact, Reinforcement Learning (RL) algorithms have been used in quantitative finance [4]. The advantages of applying RL concepts in finance are widely recognized; they include automated processing of high-frequency real-time data and efficient execution of transactions through an agent. For example, both Sarsa (On-Policy TD Control) and Q-learning (Off-Policy Temporal Difference Control) are used in the optimization of the JP Morgan Chase trading system [15]. The League Championship Algorithm (LCA) [3] has been used for extracting stock trading rules; the process extracts and maintains multiple trading rules for diverse stock market environments.
Krollner et al. [17] reviewed diverse types of machine-learning-based stock market forecasting papers, covering neural network based models, evolutionary and optimization mechanisms, hybrid and compound methods, etc. The Artificial Neural Network (ANN) is commonly used by researchers to predict stock market trends [33][20]. For example, Guresen et al. [13] use Dynamic Artificial Neural Network (DANN) and Multi-Layer Perceptron (MLP) models for NASDAQ stock index prediction. Hu et al. [1] combined a reinforcement learning algorithm with a cointegration pairs-trading strategy to address portfolio selection: using the Sortino ratio as the return index, the model parameters are adjusted adaptively and dynamically, the return rate and Sortino ratio are greatly improved, the maximum drawdown drops markedly, and the transaction frequency is clearly reduced; however, the study covers fewer securities, smaller data sets and fewer state indicators. Vanstone et al. [31] designed an MLP-based trading system to detect trading signals for the Australian stock market. Due to the limitations of single models, hybrid machine learning (HML) models have been used to identify financial trading points from time series, and HML models have become mainstream for financial analysis, as follows. J. Wang et al. [32] propose a hybrid Support Vector Regression model that connects Principal Component Analysis with brainstorm optimization to predict trading prices. Mabu et al. [22] introduced a rule-based evolutionary algorithm combined with MLP to identify the trading points of the stock market.
This section describes the architecture of our deployment, since it helps us to obtain analysis faster and with higher accuracy, as shown in Figure 1. When the raw data comes in, a data analysis phase is performed first to ensure that it contains no extreme values. At the same time, the original data is passed into the data processing phase. The data then becomes the input for the Deep Q Network part. DQN is a network that uses a neural network to predict the Q value and continuously updates that neural network to learn the maximum Q value. There are two neural networks (NN) in DQN: one is the Target-Network, with relatively fixed parameters, which is used to obtain the target value; the other is the Current Q-Network, which is used to evaluate the current Q value. The training data is extracted randomly from the Replay Memory, which records the actions (a), rewards (r), and next-state results as transitions $(s, a, r, s')$. As the environment changes, the networks update their parameters regularly and the Replay Memory changes accordingly. The loss function is the result of subtracting the Q value of the Target-Network from the Q value of the Current Q-Network. The values between modules are updated iteratively until the optimal Q value is achieved and the output action is produced.
Fig. 1 The architecture of our deployment for experiments and data analysis
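As a concrete illustration of the pipeline in Figure 1, the following Python (PyTorch) sketch shows the three components named above: a Current Q-Network, a Target-Network with the same structure, and a Replay Memory holding (s, a, r, s') transitions. The layer sizes, state dimension and action set are assumptions for illustration only; the paper does not publish its implementation.

# Minimal sketch of the components in Figure 1 (illustrative only; the paper
# does not publish its implementation). Layer sizes and dimensions are assumed.
import random
from collections import deque

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Maps a market-state vector to one Q value per action (e.g. buy/hold/sell)."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)


class ReplayMemory:
    """Stores (s, a, r, s') transitions and samples random mini-batches."""

    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)


# Two networks with the same structure: the Current Q-Network is trained, while
# the Target-Network is a periodically synchronised copy with fixed parameters.
current_net = QNetwork(state_dim=8, n_actions=3)
target_net = QNetwork(state_dim=8, n_actions=3)
target_net.load_state_dict(current_net.state_dict())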
6 Yuming Li et al.
4.2 Methods
Deep Q Network is one of the most classical and successful algorithms in DRL. Before DQN appeared, deep reinforcement learning research found that using a nonlinear deep network to represent the value function was unstable and often failed to converge; moreover, deep network training requires samples that are independent of each other, whereas consecutive states in reinforcement learning are correlated. These problems make it difficult to learn control strategies directly from high-dimensional data. DQN introduces experience replay, a target network and other techniques to overcome these problems. DQN was used to build an agent for Atari 2600 games [23], taking the previous four frames of graphics directly as input and outputting control instructions in end-to-end training. In tests, DQN was shown to be comparable to human players, and it even outperformed experienced human experts in less difficult, non-strategic games.
The algorithm blends the benefits of the Q-learning algorithm and neural networks. This is achieved by 1) adding an experience replay function that trains on random samples of previous state transitions (experiences), and 2) thereby overcoming the problems of correlated data and non-stationary distributions. In DQN, the Q value represents the currently learned experience. The key of the DQN model is to learn the Q-value function, so that it finally converges and accurately predicts the Q value of each action in various states. The Q value calculated by the formula is a score obtained by the agent through interaction with the environment and its own experience (namely, the target Q value). Finally, the old Q value $Q(s_t, a_t)$ is updated with the target Q value $r_{t+1} + \gamma \max_{a'} Q(s'_t, a'; \theta)$. The relationship between the target Q value and the old Q value corresponds exactly to the relationship between the label value and the output value in a supervised neural network. The experience pool saves the transition samples $(s_t, a_t, r_t, s_{t+1})$ produced at each time step by the agent and the current environment into a memory unit, from which some are randomly fetched for training when needed.
The loss function of DQN is presented below:

$$L(\theta) = \mathbb{E}\left[\left(r_{t+1} + \gamma \max_{a'} Q(s'_t, a'; \theta) - Q(s_t, a_t; \theta)\right)^2\right]$$

(Here $a_t$ represents the action taken by the agent, $s_t$ refers to the current state of the agent, $r_t$ is a real number giving the reward of the selected action, $\theta$ denotes the network parameters fitted under the mean squared error, and $Q'$, $s'_t$ and $a'_t$ denote the updated values of $Q$, $s_t$ and $a_t$.)
Under certain conditions, the Q-learning algorithm only needs a greedy strategy to ensure convergence, so Q-learning is a commonly used model-free reinforcement learning algorithm.
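The update implied by the loss function above can be sketched as follows, reusing the networks and replay memory from the previous listing. The batch layout and hyperparameters (e.g. γ = 0.99) are assumptions, not the authors' settings.

# Sketch of one DQN update for the loss L(θ) above; GAMMA and the batch layout
# are assumptions, and QNetwork/ReplayMemory come from the previous listing.
import torch
import torch.nn.functional as F

GAMMA = 0.99


def dqn_update(current_net, target_net, optimizer, batch):
    states, actions, rewards, next_states = zip(*batch)
    s = torch.stack(states)                     # (B, state_dim) state tensors
    a = torch.tensor(actions).unsqueeze(1)      # (B, 1) chosen action indices
    r = torch.tensor(rewards, dtype=torch.float32)
    s_next = torch.stack(next_states)

    # Q(s_t, a_t; θ): value of the action actually taken, from the current network.
    q_taken = current_net(s).gather(1, a).squeeze(1)

    # Target r_{t+1} + γ max_{a'} Q(s', a'; θ), computed with the fixed Target-Network.
    with torch.no_grad():
        q_target = r + GAMMA * target_net(s_next).max(dim=1).values

    loss = F.mse_loss(q_taken, q_target)        # mean squared error over the batch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Possible usage (hypothetical): dqn_update(current_net, target_net,
#     torch.optim.Adam(current_net.parameters(), lr=1e-3), memory.sample(32))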
In Q-learning and deep Q-learning, the optimal Q value is used to select and evaluate an action, and choosing an overestimated value leads to overestimation of the true Q value. Van Hasselt et al. [30] found and proved that the traditional DQN method overestimates the Q value, and that the error accumulates as the number of iterations increases. Double Deep Q-learning was proposed to solve the problems caused by this overestimation.
Deep Q-learning can be regarded as a new neural network plus an old neural network: they have the same structure, but their internal parameters are updated at different times. Since the optimal Q value predicted by the neural network is inherently biased, and the error becomes larger and larger with iteration, Double DQN introduces another neural network to mitigate the influence of this error. Using the target network and the main network together effectively reduces the overestimation. The specific operation is to modify the generation of the target Q value to:
$$Target_{DQ} = r_{t+1} + \gamma\, Q\!\left(s'_t,\ \arg\max_{a'} Q(s'_t, a'; \theta);\ \theta'\right)$$
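A minimal sketch of this target computation is shown below: the current network selects the argmax action and the target network with parameters θ' evaluates it. It is illustrative only and reuses the hypothetical networks from the earlier listings.

# Double DQN target: the Current Q-Network chooses the action (argmax), while the
# Target-Network with parameters θ' evaluates it (illustrative sketch only).
import torch


def double_dqn_target(current_net, target_net, rewards, next_states, gamma=0.99):
    with torch.no_grad():
        # argmax_{a'} Q(s', a'; θ) selected by the current (online) network
        best_actions = current_net(next_states).argmax(dim=1, keepdim=True)
        # the selected actions are evaluated with the target network's parameters θ'
        q_eval = target_net(next_states).gather(1, best_actions).squeeze(1)
    return rewards + gamma * q_eval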
In many DRL tasks, the value functions of state-action pairs differ under the influence of different actions. However, in some states, the size of the value function is independent of the action. Based on this observation, Wang et al. [34] proposed Dueling DQN and added it to the DQN network pattern. Dueling DQN combines the Dueling Network with DQN: the dueling network feeds the features extracted by its convolutional layers into two branches. The first is the state value function $V(s_t; \theta, \beta)$, which stands for the value of the current state environment itself; the second is the state-dependent action advantage function $A(s_t, a_t; \theta, \alpha)$, which refers to the extra value of an action (A). Finally, the Q value $Q(s_t, a_t; \theta, \alpha, \beta)$ is obtained by re-aggregating the two paths, $V(s_t; \theta, \beta)$ and $A(s_t, a_t; \theta, \alpha)$. In these functions, $a_t$ stands for an action of the agent, $\theta$ is the convolutional-layer parameter, $s_t$ represents a state of the agent, and $\alpha$ and $\beta$ are the parameters of the two fully connected branches.
In practice, the action-advantage stream is commonly centered: the individual action advantage is reduced by the average of all action advantages $\frac{1}{|A|}\sum_{a'} A(s_t, a'; \theta, \alpha)$ in the given state, so that the final Q value becomes

$$Q(s_t, a_t; \theta, \alpha, \beta) = V(s_t; \theta, \beta) + \left(A(s_t, a_t; \theta, \alpha) - \frac{1}{|A|}\sum_{a'} A(s_t, a'; \theta, \alpha)\right).$$
The advantage of this method is that even when an action has not been sampled, its Q value can still be updated through the shared state value, so data can be used more efficiently and training can be accelerated. In this way, the relative order of the advantage values of the actions in a state is guaranteed to remain unchanged, while the range of the Q values is reduced, removing redundancy and improving the overall stability of the algorithm.
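The following sketch shows one way to implement the dueling aggregation described above, with a shared trunk, a value stream V and an advantage stream A combined via the mean-advantage subtraction. A fully connected trunk (rather than the convolutional layers mentioned in [34]) and the layer sizes are assumptions for a low-dimensional stock-state input.

# Dueling architecture sketch: a shared trunk feeds a state-value stream V(s; θ, β)
# and an advantage stream A(s, a; θ, α); Q is re-aggregated with the mean-advantage
# subtraction described above. The fully connected trunk and its sizes are assumed.
import torch
import torch.nn as nn


class DuelingQNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.value_stream = nn.Linear(64, 1)               # V(s)
        self.advantage_stream = nn.Linear(64, n_actions)   # A(s, a)

    def forward(self, state):
        features = self.trunk(state)
        v = self.value_stream(features)                    # (B, 1)
        a = self.advantage_stream(features)                # (B, n_actions)
        # Q(s, a) = V(s) + (A(s, a) - mean_{a'} A(s, a'))
        return v + a - a.mean(dim=1, keepdim=True)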
– The experimental data were imported and preprocessed: invalid data were cleaned first, then the data were divided into a training set and a test set at a ratio of 4:6, and appropriate experimental parameters were set (a preprocessing sketch follows this list).
– Ten stocks were randomly selected, the three reinforcement learning algorithms were used to simulate trading, and their closing prices were obtained and compared.
– Further analysis was made of the nature of each single stock and combined with the above experimental results.
– The present results are analyzed and discussed.
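The preprocessing sketch referenced in the first step might look as follows; the column names, file layout and cleaning rules are hypothetical, since the paper does not specify its data schema.

# Hypothetical preprocessing for the first step above: drop invalid records and
# split each stock's series 4:6 into training and test sets. Column names such as
# 'date' and 'close' are assumptions, not the authors' actual data schema.
import pandas as pd


def prepare_stock(csv_path: str, train_ratio: float = 0.4):
    df = pd.read_csv(csv_path, parse_dates=["date"]).sort_values("date")
    df = df.dropna(subset=["close"])       # clean missing closing prices
    df = df[df["close"] > 0]               # discard non-positive (invalid) prices
    split = int(len(df) * train_ratio)     # 4:6 train/test split by time
    return df.iloc[:split], df.iloc[split:]

# Example (hypothetical file name): train_df, test_df = prepare_stock("kdmn.us.csv")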
4.4 Problems
Due to the large quantity of data in the dataset, there are certain differences in the sizes of the individual records. For example, some stocks have decades of records, while some newly listed stocks have only a few months. The 10 stocks tested are therefore not consistent in the time dimension. Throughout the experiment, even though time is not taken as an input parameter that influences the experimental results, some external influences will still arise in the actual market.
5 RESULTS
The profits of the training set and test set for the ten stocks under the three Deep Reinforcement Learning models are shown in Table 1.
Fig. 3 The Loss Function and the Reward Function of the stock
One input indicator compares the current stock price with the price a certain number of days earlier, reflecting the degree of change in the stock market. Momentum refers to the ability of stocks (or economic indicators) to continue to grow. VWAP is an average price weighted by the traded volume.
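For reference, VWAP can be computed as in the short sketch below; the series names are assumptions and are not tied to the paper's data set.

# Illustrative VWAP computation: the average price weighted by traded volume
# (column/series names are assumptions, not tied to the paper's data set).
import pandas as pd


def vwap(prices: pd.Series, volumes: pd.Series) -> float:
    return float((prices * volumes).sum() / volumes.sum())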
6 DISCUSSION
From the perspective of individual stocks, as shown in Table 1, the deep reinforcement learning strategy can be applied broadly: some stocks still show a negative profit, but the method is effective for most stocks and has a degree of feasibility. For example, the stock kdmn.us has a negative return when DQN is used for decision-making, but a positive return with the other two improved reinforcement learning models. Most stocks' test profit is higher than their training profit, because the test set uses the optimal Q value obtained during training; in this unsupervised mode, using the optimal Q value works better than the process of exploring for it. At the same time, comparing the three deep reinforcement learning models shows that the benefit of DQN in stock decisions is generally greater than that of Double DQN and Dueling DQN. Although Double DQN and Dueling DQN are improved versions of DQN, the application field of this paper differs from that of the above algorithms: the double Q network and the dueling network outperform traditional DQN in game competition, but in the stock market they are not as suitable for market decision-making.
Traditional human decision-making has the following shortcomings: 1) information is insufficient and cannot be valued accurately; 2) decisions based one-sidedly on a single indicator perform very poorly; 3) summarized indicators and fixed operational strategies cannot dynamically adapt to environmental changes and have weak ability to counter risk. As a result, we propose our approach, whose distinctive features are as follows. First of all, the method has been applied in the financial field and has achieved good results, so it can assist manual decision-making and save manpower to some extent. Second, the algorithm can use a large amount of historical data as learning material. Third, in handling emergencies, the algorithms are in most cases more 'experienced' than human decision-making. It is an improved, hybrid method that uses deep learning intelligence to improve stock returns. All of this is evidenced by the analysis of Figures 2 and 3: the experimental results demonstrate that deep reinforcement learning (represented by the three algorithms) plays an effective role in stock timing and trading decisions, and can secure returns under most conditions.
The experiment provides a paradigm that demonstrates the application of Deep Reinforcement Learning to the decision-making mechanism of the stock market, which subsequent practitioners can apply in the real market. In addition, as shown in Table 1, DQN is the Deep Reinforcement Learning model with the best result, not the improved Double DQN or Dueling DQN. This illustrates that we should not be trapped by empirical preconceptions in practical applications: improvements based on an original method are not always better than the original. Moreover, each method has its own specific field of application; a method that works well in autopilot may not be applicable to the financial field. Divide and conquer is the best policy. Therefore, conclusions must be drawn from feasible scientific experiments that prove the point of view.
It can be seen from the above experimental results that, apart from possible invalid data, the deep reinforcement learning model is applicable to most stocks with sufficient information, but it is not fully applicable to all stocks. Therefore, in actual investment, practitioners should reduce their dependence on the model as far as possible to reduce risk; the model only plays an auxiliary role in judgment rather than serving as a theorem. The empirical results demonstrate, to some extent, the differentiated applicability of deep learning in the financial field.
As shown in Figure 4, the prediction data lags slightly behind the test data. This shows that the decision-making of Deep Reinforcement Learning imitates and learns from past data trends: the decision for a given node is based on the data before that node rather than on future data, which is slightly insufficient at the global level.
Chang et al. [7] proposed an alternative approach based on the development of the Adaboost algorithm [14]. The purpose was to enable computational financial modeling to be conducted and completed with both accuracy and performance achieved. The algorithm is a different sub-division of AI. The approach is to study the historical data and fully understand the trends of the data movements; once the trends have been captured, the algorithm can better predict the movement, and predictive modeling can be achieved. The predicted stock index can be adjusted when there are changes in the market, so that prediction accuracy can be higher. Similarly, optimization algorithms have been developed to ensure better performance. Our approach in this paper is to use three DRL algorithms, which can calculate the best strategies and the ideal prices every second. When there are changes, the DRL algorithms adapt to the changes in the next available time unit and then optimize performance at the same time. In other words, accuracy and performance can be maintained, but better outcomes are more likely in the next time unit rather than the same time unit. Both approaches for com-
The paper implemented a novel Deep Reinforcement Learning approach for stock transaction strategy, proved the practicality of DRL in dealing with financial strategy issues, and compared three classical DRL models. The outcomes demonstrated that the three learning algorithms we developed were effective, particularly the DQN model, which had the best performance in dealing with decision-making problems of stock market strategies. The advantages of using our proposed algorithms are as follows. First, the three algorithms have better intelligence than traditional transactions, as they could respond to the market quickly and adapt to changes. Second, single-objective supervised learning models struggle with such sequential decision problems, whereas the reinforcement learning algorithm is the deep learning approach most similar to the human learning process: human exploration and development resemble reinforcement learning's continuous trial and error, obtaining environmental reward labels and alternating this with learning from empirical data. As a result, our three algorithms can be more suitable for tasks involving humans, such as trading and other human-based operations. However, the weakness is that there was a problem with the data set itself: the differences in data size were large, which could bring some instability to the experiment, even though this was the most complete data set commonly used in the field for financial analysis.
This research proved the feasibility of the Deep Reinforcement Learning algorithm in the financial field and its practicability as an auxiliary tool for financial investment decision-making. Reinforcement learning is a recently prominent algorithm that is comparatively innovative in business applications. Our contribution was mainly to apply deep reinforcement learning (a new reinforcement learning variant) to financial transactions, as an innovative approach at the application level of this model. The decision results of our three algorithms on the same data set were compared. Finally, it was concluded that DQN maximized decision benefits among the three models, which can serve as a reference for future research. We also compared the improved DRL and Adaboost algorithms in detail in terms of simulation, accuracy, performance, optimization and popularity. Each approach had its own focus, apparent strengths and suitability for different cases. We proposed a hybrid solution: for stocks that investors are familiar with, they could use Adaboost; for newly invested stocks, they could use DRL algorithms, then switch to Adaboost algorithms to capture trends of data movement and decide the next investment strategy. In summary, we developed three novel algorithms that fit the direction of interdisciplinary research, broaden the scope of tools available for quantitative investment, and extend the scope of deep learning applications.
The theory of Deep Reinforcement Learning is now widely accepted. However, there are still many challenges to be overcome, such as the exploration-exploitation balance problem, slow convergence rates, the curse of dimensionality in the state space, and so on. Future research will gradually incorporate the latest deep reinforcement learning techniques, continuously enhance the model's strategy-learning ability, and search for methods of high-level abstract logical memory and the control of intelligent agents. We plan to address the main shortcomings in follow-up research, especially in the field of financial applications, and try to introduce the data characteristics and strategy templates of the field, proposing a finance-DRL algorithm tailored to the characteristics of the financial field.
References
1.
2. Abtahi, F., Zhu, Z., Burry, A.M.: A deep reinforcement learning approach to character
segmentation of license plate images. In: Machine Vision Applications (MVA), 2015
14th IAPR International Conference on, pp. 539–542. IEEE (2015)
3. Alimoradi, M.R., Kashan, A.H.: A league championship algorithm equipped with net-
work structure and backward q-learning for extracting stock trading rules. Applied Soft
Computing 68, 478–493 (2018)
4. Almahdi, S., Yang, S.Y.: An adaptive portfolio trading system: A risk-return portfolio
optimization using recurrent reinforcement learning with expected maximum drawdown.
Expert Systems with Applications 87, 267–279 (2017)
5. Berutich, J.M., López, F., Luna, F., Quintana, D.: Robust technical trading strategies
using gp for algorithmic portfolio selection. Expert Systems with Applications 46, 307–
315 (2016)
6. Chang, P.C., Liao, T.W., Lin, J.J., Fan, C.Y.: A dynamic threshold decision system for
stock trading signal detection. Applied Soft Computing 11(5), 3998–4010 (2011)
7. Chang, V., Li, T., Zeng, Z.: Towards an improved adaboost algorithmic method for
computational financial analysis. Journal of Parallel and Distributed Computing 134,
219–232 (2019)
8. Cheng, C.H., Chen, T.L., Wei, L.Y.: A hybrid model based on rough sets theory and
genetic algorithms for stock price forecasting. Information Sciences 180(9), 1610–1629
(2010)
9. Chien, Y.W.C., Chen, Y.L.: Mining associative classification rules with stock trading
data–a ga-based method. Knowledge-Based Systems 23(6), 605–614 (2010)
10. Dulac-Arnold, G., Evans, R., van Hasselt, H., Sunehag, P., Lillicrap, T., Hunt, J., Mann,
T., Weber, T., Degris, T., Coppin, B.: Deep reinforcement learning in large discrete
action spaces. arXiv preprint arXiv:1512.07679 (2015)
11. Foerster, J., Assael, I.A., de Freitas, N., Whiteson, S.: Learning to communicate with
deep multi-agent reinforcement learning. In: Advances in Neural Information Processing
Systems, pp. 2137–2145 (2016)
12. Foerster, J.N., Farquhar, G., Afouras, T., Nardelli, N., Whiteson, S.: Counterfactual
multi-agent policy gradients. In: Thirty-Second AAAI Conference on Artificial Intelli-
gence (2018)
13. Guresen, E., Kayakutlu, G., Daim, T.U.: Using artificial neural network models in stock
market index prediction. Expert Systems with Applications 38(8), 10389–10397 (2011)
14. Hastie, T., Rosset, S., Zhu, J., Zou, H.: Multi-class adaboost. Statistics and its Interface
2(3), 349–360 (2009)
15. JP Morgan. https://www.businessinsider.com/jpmorgan-takes-ai-use-to-the-next-level-2017-8
16. Koutník, J., Schmidhuber, J., Gomez, F.: Online evolution of deep convolutional network
for vision-based reinforcement learning. In: International Conference on Simulation of
Adaptive Behavior, pp. 260–269. Springer (2014)
17. Krollner, B., Vanstone, B., Finnie, G.: Financial time series forecasting with machine
learning techniques: A survey (2010)
18. Lange, S., Riedmiller, M.: Deep auto-encoder neural networks in reinforcement learning.
In: The 2010 International Joint Conference on Neural Networks (IJCNN), pp. 1–8.
IEEE (2010)
19. Lange, S., Riedmiller, M., Voigtlander, A.: Autonomous reinforcement learning on raw
visual input data in a real world application. In: Neural Networks (IJCNN), The 2012
International Joint Conference on, pp. 1–8. IEEE (2012)
20. Liao, Z., Wang, J.: Forecasting model of global stock index by stochastic time effective
neural network. Expert Systems with Applications 37(1), 834–841 (2010)
21. Lillicrap, T.P., Hunt, J.J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D.,
Wierstra, D.: Continuous control with deep reinforcement learning. arXiv preprint
arXiv:1509.02971 (2015)
22. Mabu, S., Obayashi, M., Kuremoto, T.: Ensemble learning of rule-based evolutionary al-
gorithm using multi-layer perceptron for supporting decisions in stock trading problems.
Applied soft computing 36, 357–367 (2015)
23. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D.,
Riedmiller, M.: Playing atari with deep reinforcement learning. arXiv preprint
arXiv:1312.5602 (2013)
24. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves,
A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., et al.: Human-level control through
deep reinforcement learning. Nature 518(7540), 529 (2015)
25. Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv
preprint arXiv:1511.05952 (2015)
26. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy opti-
mization. In: International Conference on Machine Learning, pp. 1889–1897 (2015)
27. Shibata, K., Iida, M.: Acquisition of box pushing by direct-vision-based reinforcement
learning. In: SICE 2003 Annual Conference, vol. 3, pp. 2322–2327. IEEE (2003)
28. Shibata, K., Okabe, Y.: Reinforcement learning when visual sensory signals are directly
given as inputs. In: Neural Networks, 1997., International Conference on, vol. 3, pp.
1716–1720. IEEE (1997)
29. Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., Riedmiller, M.: Deterministic
policy gradient algorithms. In: ICML (2014)
30. Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double q-
learning. In: AAAI, vol. 2, p. 5. Phoenix, AZ (2016)
31. Vanstone, B., Finnie, G., Hahn, T.: Creating trading systems with fundamental variables
and neural networks: The aby case study. Mathematics and computers in simulation
86, 78–91 (2012)
32. Wang, J., Hou, R., Wang, C., Shen, L.: Improved v-support vector regression model
based on variable selection and brain storm optimization for stock price forecasting.
Applied Soft Computing 49, 164–178 (2016)
33. Wang, J.Z., Wang, J.J., Zhang, Z.G., Guo, S.P.: Forecasting stock indices with back
propagation neural network. Expert Systems with Applications 38(11), 14346–14355
(2011)
34. Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., De Freitas, N.: Dueling
network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581
(2015)
35. Wymann, B., Espié, E., Guionneau, C., Dimitrakakis, C., Coulom, R., Sumner, A.:
Torcs, the open racing car simulator. Software available at http://torcs.sourceforge.net 4, 6 (2000)