0327_AaronGerardDaniel
SUBMITTED BY
Name of Candidate: Aaron Gerard Daniel
Registration Number: 19-300-3-01-0327
College Roll Number: 0327
College Room Number: 12
Department: B.Com(Hons.) – Morning
SUPERVISED BY:
Name of the Supervisor: Dr. Soheli Ghose
Name of College: St. Xavier’s College (Autonomous), Kolkata
Signature: ....................................
Acknowledgement
Surpassing milestones towards a mission sometimes gives us such a degree of jubilance that we tend to forget the precious guidance and help extended by the people to whom the success of the mission is dedicated. A project depends on contributions from a wide range of people for its success. I would like to take this opportunity to acknowledge the many people who have contributed a great deal of their time and expertise to the development of this project. Firstly, I would like to express my sincere gratitude to the college and to the Father Principal and Vice Principal for giving us the opportunity to write a dissertation project. I would also like to express my sincere thanks to my project guide, Dr. Soheli Ghose, for getting me started and for guiding me throughout the project. Without her knowledge, guidance and experience, this project would not have come so far. She has been a source of constant inspiration, stimulating me to learn and pick up the finer points of the topic, making my learning process a worthy experience. She has given a sense of completeness to this project and ensured its soundness by closely monitoring my work. I would also like to thank my family, friends and relatives for their continuous support throughout the period of my project. Above all, I want to thank God for giving me the confidence and patience to complete the project successfully.
INTRODUCTION
A Deep Neural Network (DNN) automatically learns successive lower-dimensional representations from high-dimensional input data. The essential feature of a DNN is its hierarchical architecture, which gives Deep Learning a strong ability to perceive and extract features. Its main flaw is that it lacks the ability to make decisions. Reinforcement Learning, on the other hand, can be used to make decisions, but it has limits when it comes to fully representing observations. This has prompted researchers to combine Deep Learning and Reinforcement Learning, because each method complements the other. The integrated approach can provide a framework for developing a complex cognitive decision-making system.
The stock market is characterised by fast fluctuation, a plethora of interfering factors, and a lack of timely data. Stock trading is a game with imperfect information, and sequential decision problems are difficult to handle with a single-objective supervised learning model. Reinforcement Learning is among the most effective methods for resolving these problems. Traditional quantitative investment is frequently based on technical indicators, which have limited self-adaptability and a short lifespan. This study aims to demonstrate the application of a Deep Reinforcement Learning model to the financial sector, which can deal with large amounts of data in the financial market, improve data processing and feature extraction from transaction signals, and improve trading capability. Furthermore, this research applies Deep Learning and Reinforcement Learning theory from computer science to the realm of finance, demonstrating the ability of Neural Networks to capture and evaluate information from large amounts of data. Stock trading, for example, is a sequential decision-making task, and the final aim in Reinforcement Learning is to learn multi-stage behaviour patterns. The approach can determine the optimal price in a given state in order to reduce transaction costs. As a result, it is highly practical in the investment industry.
LITERATURE REVIEW
The main point of view on early Deep Reinforcement Learning is that it uses neural networks to reduce the dimensionality of high-dimensional input in order to speed up data processing. Shibata et al. and, later, Lange et al. proposed applying deep autoencoders to visual learning control, devising "visual motion learning" to train an agent to have human-like perception and decision-making capacity. Abtahi et al. introduced Deep Belief Networks (DBN) into Reinforcement Learning, where the DBN replaces the original value-function approximator during model development; the model was successfully applied to the character-segmentation problem for licence plate images. Lange et al. then introduced Deep Q-Learning, which used Reinforcement Learning to control a car automatically from visual input. Koutnik et al. applied the Neural Evolution (NE) approach together with Reinforcement Learning to the popular car-racing game TORCS and were able to achieve fully automated driving.
The DL part automatically observes the current market environment and learns its features, while the RL part interacts with the environment on the basis of this deep representation and makes trading decisions in order to accumulate the final return from the continually updated environment.
The pioneers of DRL were Mnih et al. To solve the decision-making problem of Atari games, the pixel values of the game screen are used as the input state (S), while the front, rear, left and right directions of the game joystick are used as actions (A). In 2015 they demonstrated that the performance of a Deep Q-Network agent could outperform all other methods. Later on, many researchers enhanced DQN. Double DQN was proposed by Van Hasselt et al., in which one of the Q-networks decides the action and the other Q-network evaluates it; the two networks collaborate to overcome the overestimation problem that exists in a single DQN. To speed up the training process, Silver et al. developed a replay mechanism based on the original Double DQN and added camouflage samples in 2016. The Dueling Network, proposed by Wang et al., is a DQN-based approach that splits the network into a scalar state-value output V(s) and an advantage output for each action, and combines the two streams to produce the Q-values. Deterministic Policy Gradient algorithms (DPG) were introduced by Silver et al., and the DDPG paper from Google combined DQN and DPG to extend DRL to continuous action-space control. According to research from Berkeley University, the method's reliability in simulation and the stability of the DRL model are crucial.
Gabriel et al. established the concept of action embedding, which embeds a discrete real-world action into a continuous space so that the reinforcement learning method can be applied to large-scale learning problems. The preceding findings show that deep reinforcement learning algorithms are constantly being developed and optimised in order to adapt to more realistic scenarios. Reinforcement learning is able to examine the world without supervision, actively explore and experiment, and summarise useful experience on its own. Although the active learning paradigm that combines deep learning with reinforcement learning is still in its early stages, it has been shown to be effective in learning a variety of video games.
Krollner et al. examine a variety of machine-learning-based stock market forecasting articles, including neural-network-based models, evolutionary and optimisation mechanisms, and multiple and hybrid approaches. Researchers frequently use Artificial Neural Networks (ANN) to forecast stock market trends. Guresen et al., for example, employ the Dynamic Artificial Neural Network (DANN) and Multi-Layer Perceptron (MLP) models to predict the NASDAQ stock index. To address the portfolio-selection problem, Hu et al. integrated a reinforcement learning algorithm with a cointegration pairs-trading strategy. Using the Sortino ratio as the return index, adaptive dynamic adjustment of the model parameters is realised, and both the return rate and the Sortino ratio are considerably improved; the maximum drawdown decreases, as does the transaction frequency. On the other hand, the study covers fewer securities, smaller data sets, and fewer state indicators. Vanstone and colleagues developed an MLP-based trading system for detecting trade signals in the Australian stock market. Hybrid machine learning (HML) models have been used to resolve financial trading points that depend on time sequences, owing to the constraints of any single model. As a result, HML models have become a standard for financial analysis. A hybrid Support Vector Regression model proposed by J. Wang et al. can anticipate trading prices by combining Principal Component Analysis and Brain Storm Optimisation. Mabu et al. employed a rule-based evolutionary algorithm in conjunction with an MLP to discover stock market trading points.
Financial market data are uncertain and time-dependent because of the reciprocal influence of a large number of complicated components. Their analysis is a difficult, nonlinear, and inherently unstable problem. In financial forecasting and sequential decision making, traditional statistical models and mass data-mining models are ineffective. Technical indicators and evaluation criteria are frequently used in traditional quantitative investment algorithms; these techniques suffer from a short lifespan and poor self-adaptation. With machine learning algorithms, investment strategies can be greatly improved: the machine's processing speed can greatly increase strategy adaptability and the capacity to derive market features from real-time trading signals.
BACKGROUND
Types of Learning:
This study is primarily based on the application of Deep Reinforcement Learning to Algorithmic Trading, but before we come to that, let's first understand the different types of learning:
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
How can supervised learning be used for Algorithmic Trading?
We can build a model based on supervised learning, but first we must analyse all outcomes and scenarios. Initially, a large number of scenarios have to be fed to the machine in order for it to act. A large number of real-time scenarios and use-cases are used to build the model. The machine will then know what to do in a specific scenario and will be able to act on it, provided that scenario has been fed into the system. Because all scenarios, data and use cases must be fed in, the system becomes dependent.
The reliance here is on supervision. In a sense, all types of learning are supervised by a loss function, a function that tells the machine what is good and what is bad. Since childhood, we humans have been shaped by this type of learning.
The brain takes each situation and tries to act on it in a specific way: it first analyses all types of scenarios, tries to predict the best possible outcome or step to take, and then acts on it. The brain executes that step in the hope that it will turn out as expected, and following execution it receives a success or failure result.
Reinforcement Learning, then, is a technique for teaching machines to solve a task without supervision, with a reward received based on the outcome.
When a task or scenario is assigned, a series of steps must be completed in order for the final result to be achieved. The scenario changes at each step, and a new "state" is presented, on the basis of which the next decision is made. Finally, a reward is given based on the outcome of the steps taken, which defines success or failure. Because Reinforcement Learning learns through experience, it is self-sufficient. This distinguishes it from supervised learning algorithms, in which the computer must be explicitly told what behaviour is acceptable.
Reinforcement learning algorithms are especially well suited to tasks that are difficult to specify explicitly, such as playing chess or driving a car. In these cases, telling the computer exactly what to do to achieve the desired result is difficult or impossible. Reinforcement learning algorithms, by contrast, can learn how to perform a task by trial and error, given feedback (e.g., rewards or punishments) for specific behaviours.
When we perform a specific action, we move from one state to another; essentially, we are transitioning between states. And because the transition is driven by an action, what exactly motivates a specific action? This is referred to as the policy. The policy is incorporated into the system and drives our actions; it could be based on whatever will ultimately benefit us.
APPLICATION TO TRADING:
The question here is how we apply DRL to the stock market. When the machine enters the market, it is essentially entering the environment of a specific stock. When the machine attempts to understand a chart, multiple factors and cases are presented to it. The various market parameters, such as data from candlestick charts, trend analysis, or sentiment analysis, form the state and give the machine an understanding on which it can act, based on the policies it has been given. The policies in this case could be trading strategies or various types of analysis. The machine's actions are limited to buy, sell or hold. If the machine decides to buy a particular stock at a specific step, it reaches the point where it owns the stock. Following that, if it decides to hold while the stock price rises or falls, it reaches another state, and so on. Finally, it receives a reward if it sells the stock at a profit. The system aims to maximise the reward over the short or long term, depending on the strategy or policy fed to it, as the sketch below illustrates.
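The following is a minimal, hedged sketch in Python of how this setting can be framed as a reinforcement-learning environment. The state features, the reward definition, the class name and the price_series input are illustrative assumptions, not the exact design used later in the study.

# A minimal sketch of a trading environment: state = recent price changes
# plus the current position; actions = buy / sell / hold; reward = realised
# profit or loss when a position is closed.
import numpy as np

BUY, SELL, HOLD = 0, 1, 2

class SimpleTradingEnv:
    def __init__(self, price_series: np.ndarray, window: int = 10):
        self.prices = price_series
        self.window = window

    def reset(self):
        self.t = self.window          # current time step
        self.position = 0             # 0 = flat, 1 = long
        self.entry_price = 0.0
        return self._state()

    def _state(self):
        # recent price changes plus the current position flag
        recent = np.diff(self.prices[self.t - self.window:self.t + 1])
        return np.append(recent, self.position)

    def step(self, action: int):
        price = self.prices[self.t]
        reward = 0.0
        if action == BUY and self.position == 0:
            self.position, self.entry_price = 1, price
        elif action == SELL and self.position == 1:
            reward = price - self.entry_price    # realised profit or loss
            self.position = 0
        self.t += 1
        done = self.t >= len(self.prices) - 1
        return self._state(), reward, done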
THE PROBLEM:
To put it simply, the problem is determining when to act so that the shift to another state can occur. In the normal course of events, humans often know the possible scenarios by judging and drawing conclusions from past statistical data, and so we may know the best time to buy, sell or hold. Knowing when to buy, sell or hold comes from experience, and DRL learns through experience, so let us use an example to illustrate the issue.
The machine enters a trade in which it shorts an option at price 'x'. This option has a high level of volatility. The price fluctuates as 'x+10', 'x+5', 'x-2', 'x-7', and finally falls to 'x-10', at which point the machine closes the trade. Now let us look at the entire scenario. When the price was rising initially, it might not have continued to rise indefinitely, so the machine could have waited, speculating that the price would fall; but in doing so it could have incurred a huge loss had the fall never occurred. Another possibility is that the price would have continued to fall even after 'x-10', yielding a larger profit in the long run. There could be an infinite number of possibilities, both in the short and the long term.
As a result, we can conclude that the problem here is to help the machine make decisions in all possible scenarios, and to help it deal with the concept of 'delayed gratification' by labelling the different decisions and actions accordingly.
To deal with the problem of 'delayed gratification' explained above, we label each trade with a particular profit/loss value. Consider, for instance, a case in which the machine opens a long position at $1,200 and exits the trade at $1,300, making a profit of $100. Here the system has valued every hold decision at zero; it skips the earlier exit opportunities purely on the speculation that it might receive a higher profit at a later stage, and it finally closes the trade at $1,300 on the grounds that it is making a profit of $100, whereas an earlier exit would have yielded a smaller profit.
BELLMAN’S EQUATION
The Bellman equation is a mathematical formula used in dynamic programming and reinforcement learning to compute the optimal value function for a given Markov decision process (MDP). The equation is named after its creator, Richard Bellman. It states that the optimal value of a state equals the expected immediate reward plus the discounted optimal value of the successor states; in other words, the optimal value function is the expected value of the discounted sum of all future rewards.
The Bellman equation can be used to solve for the optimal policy of an MDP, the policy that maximises the expected discounted sum of future rewards. It can also be used to compute the value function for a given policy, i.e. the expected discounted sum of future rewards obtained by following that policy. The Bellman equation is a key component of dynamic programming, a method of solving optimisation problems by breaking them down into smaller subproblems, and it can be applied to a wide range of problems involving decision making under uncertainty, such as those arising in economics and finance. It is a recursive equation: it defines the current optimal value in terms of future optimal values, and so tells us how to find the best future decision given our current situation.
Here we use the Bellman equation to label each action. Every hold decision, which in the earlier example was simply valued at zero, is now given a label. Through the Bellman equation we assign to each hold option a value indicating how much holding the trade would be worth by the end. The values attached to these hold decisions can be thought of as a profit-and-loss figure for each of those points; it is not exactly a profit or loss but a quasi profit-and-loss value. This gives an idea of how much the decision to hold is worth compared with the decision to sell at different points during the trade. If the machine concludes that holding is worth more than selling, it will continue to hold the trade, and this cycle continues until it reaches a point at which it determines it can realise a profit.
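For reference, the standard form of the Bellman optimality equation assumed in the discussion that follows is

Q(s, a) = r + γ · max over a' of Q(s', a')

where Q(s, a) is the value of taking action a in state s, r is the immediate reward, s' is the next state reached, and γ is the discount factor.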
Here the value used for γ is 0.9.
• s' is the future state in which we end up after performing an action.
• The equation works backwards: we start from the future state and evaluate the previous state.
• γ is the discount factor; it weights the future state relative to the previous one.
• The larger the value of γ, the higher the weight given to future outcomes. Depending on this value, the reward for holding the position can be higher than that for closing it.
Therefore, by executing trades based on the Bellman equation, we are not merely speculating, acting out of greed, or considering only the on-the-spot return; we also account for the value of continuing to hold the option, with the conviction that we will be able to receive higher returns in the future. Reinforcement Learning thus helps us make decisions based on expected future outcomes rather than on immediate results alone. A small sketch of this labelling procedure follows.
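Below is a minimal Python sketch of how hold decisions can be labelled with Bellman-style discounted values as described above; the list of per-step exit values and the function name are hypothetical illustrations, not the exact procedure used in the study.

# Label each hold decision with the discounted value of the best future exit.
def label_hold_values(step_rewards, gamma=0.9):
    """Work backwards from the end of the trade, discounting future value."""
    values = [0.0] * len(step_rewards)
    future_value = 0.0
    for t in reversed(range(len(step_rewards))):
        # value of holding at t: the better of closing now or holding on,
        # with the future discounted by gamma
        future_value = max(step_rewards[t], gamma * future_value)
        values[t] = future_value
    return values

# Example: a trade whose exit value rises towards the end
print(label_hold_values([10, 5, -2, -7, 100]))   # [65.61, 72.9, 81.0, 90.0, 100]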
There are a few key constraints to using deep reinforcement learning for stock trading.
1. The deep reinforcement learning algorithm must be able to accurately predict future stock
prices.
2. The deep reinforcement learning algorithm must be able to make quick and accurate
decisions in order to take advantage of stock price fluctuations.
3. The deep reinforcement learning algorithm must be able to handle large amounts of data in
order to make informed decisions.
An algorithmic trading strategy is a set of rules that determine when to buy and sell a
security. These rules can be based on a number of factors, including price, volume, and
technical indicators. Algorithmic traders use these rules to create and execute trading orders
automatically. This allows them to execute trades quickly and efficiently, without the need
for human intervention. Algorithmic trading strategies can be used for a variety of purposes,
including market making, arbitrage, and trend following. Algorithmic trading strategies can
be used by individual investors, or by professional money managers. Many institutional
investors use algorithmic trading strategies to minimize the costs of trading, and to improve
the speed and execution of their orders.
There are a variety of different types of algorithmic trading strategies, including trend-
following, breakout, and mean-reversion. Each of these strategies has its own strengths and
weaknesses, and may be more or less appropriate for a particular type of security or market
condition. Trend following is a strategy that attempts to capture long-term price trends. It
looks for stocks that are moving in a particular direction and attempts to ride the trend for as
long as possible. Breakout trading is a strategy that looks for stocks that have recently broken
out of a trading range. The trader buys the stock when it breaks out and sells it when it returns
to the range. Mean reversion is a strategy built on the assumption that prices tend to revert to their long-run average. The trader buys a stock when its price has moved well below that average and sells when the price returns towards the average (and can do the opposite when the price has risen well above it). A brief sketch of two such signal rules is given below.
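The following is a hedged sketch of two of the strategy families described above, expressed as signal functions on a pandas price series; the column, parameter and function names are illustrative assumptions.

# Simple long/flat signal rules for trend following and mean reversion.
import pandas as pd

def trend_following_signal(close: pd.Series, fast: int = 20, slow: int = 50) -> pd.Series:
    """1 (long) when the fast moving average is above the slow one, else 0."""
    fast_ma = close.rolling(fast).mean()
    slow_ma = close.rolling(slow).mean()
    return (fast_ma > slow_ma).astype(int)

def mean_reversion_signal(close: pd.Series, window: int = 20, z: float = 1.0) -> pd.Series:
    """1 (buy) when price is more than z standard deviations below its rolling mean."""
    mean = close.rolling(window).mean()
    std = close.rolling(window).std()
    return (close < mean - z * std).astype(int)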
It is important to remember that no single algorithmic trading strategy is guaranteed to be
successful in all market conditions. It is important to carefully test any algorithmic trading
strategy before using it in a live market. There are many different types of algorithmic trading
strategies, but all share a common goal: to make money by buying and selling securities at
the right time.
In algorithmic trading, time is often discretized into small time-steps in order to make the
calculations involved in trading more manageable. This discretization can be done in a
number of ways, each with its own set of benefits and drawbacks.
One way to discretize time is to use a fixed time-step size. This approach is simple to implement, and it is easy to work out the step size needed to achieve a desired level of accuracy. However, it can lead to inaccuracies if the time-step is too large, and to unnecessary computation if it is too small. Another way to discretize time is to use a variable time-step size. This approach can be more accurate than a fixed step, but it is more complex to implement, and it can be harder to determine the step sizes needed to achieve a desired level of accuracy.
More generally, a time series can be discretized into a sequence of points in time. This sequence can be used to approximate the original series and can be fed into algorithms for time-series analysis and trading. The discretization can be performed in different ways, and different choices can be better or worse for different purposes; a short sketch of a fixed-step discretization follows.
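Below is a hedged sketch of fixed time-step discretization of tick data using pandas; the DataFrame ticks with a datetime index and hypothetical price and volume columns is an assumption for illustration.

# Aggregate irregular ticks into fixed time-step OHLC bars.
import pandas as pd

def discretize(ticks: pd.DataFrame, step: str = "5min") -> pd.DataFrame:
    """Resample tick data into fixed-width bars of length `step`."""
    bars = ticks["price"].resample(step).ohlc()      # open/high/low/close per step
    bars["volume"] = ticks["volume"].resample(step).sum()
    return bars.dropna()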
For reinforcement learning to be applied effectively to such data, the environment and the learner should satisfy a few conditions:
1. The environment should be able to provide feedback to the learner about the success of its actions. This feedback can be in the form of rewards (positive feedback) or punishments (negative feedback).
2. The learner should be able to determine which actions are most likely to result in positive
outcomes, and then select those actions more often.
3. The learner should also be able to adapt its behavior over time, based on the results of its
previous actions.
4. The environment should be able to provide enough information to the learner so that it can
make informed decisions.
5. The environment should be stable, so that the learner can make reliable predictions about
the outcomes of its actions.
The deep Q-network (DQN) algorithm is a neural-network-based algorithm for training artificial intelligence agents, specifically those used in reinforcement learning. The DQN algorithm was proposed by DeepMind in 2013 as an extension of the Q-learning algorithm. It replaces the table of Q-values used in classical Q-learning with a deep neural network that both learns a representation of the environment's state from raw inputs and estimates the value (the Q-value) of each possible action in that state, from which the best action can be chosen.
The DQN algorithm has been shown to be more effective than traditional reinforcement
learning algorithms, such as Q-learning and SARSA. In particular, it has been shown to be
able to learn more effectively from experience and generalize better to new environments.
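As a rough illustration, the following is a hedged PyTorch sketch of the two ingredients just described: a small Q-network mapping a state vector to one Q-value per action, and the standard DQN target. The layer sizes, the number of state features and the function names are assumptions made for illustration.

# Minimal Q-network and DQN target: r + gamma * max_a' Q_target(s', a').
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int = 3):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),   # one Q-value per action (buy/sell/hold)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.layers(state)

def dqn_target(reward, next_state, done, target_net, gamma=0.99):
    """Standard DQN target; `done` is a 0/1 float tensor marking episode ends."""
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q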
Double DQN
Double DQN is a reinforcement learning algorithm that combines deep Q-learning with the double Q-learning idea. A standard DQN uses the same network both to select the next action and to evaluate it, which tends to overestimate action values. Double DQN decouples these two roles: the online network chooses the action with the highest estimated value, while a separate target network evaluates the value of that chosen action. As noted in the literature review, the two networks collaborate to overcome the overestimation problem of a single DQN, which generally leads to more stable training and helps in tasks with many rounds or states, where a plain deep Q-learning agent can struggle.
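A minimal sketch of the Double DQN target computation is given below; online_net and target_net stand for the two Q-networks described above, and the function signature is an illustrative assumption.

# Double DQN target: the online network selects the action, the target
# network evaluates it.
import torch

def double_dqn_target(reward, next_state, done, online_net, target_net, gamma=0.99):
    with torch.no_grad():
        # action selection with the online network
        best_actions = online_net(next_state).argmax(dim=1, keepdim=True)
        # action evaluation with the target network
        next_q = target_net(next_state).gather(1, best_actions).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q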
ADAM optimiser:
Adam (Adaptive Moment Estimation) is a stochastic gradient-based optimiser that is widely used to train deep neural networks, such as the Q-networks discussed above. For every parameter it maintains an exponentially decaying average of past gradients (the first moment) and of past squared gradients (the second moment), and it uses these estimates to adapt the effective learning rate of each parameter individually. In practice, Adam usually converges faster than plain stochastic gradient descent and requires relatively little tuning of its hyperparameters: the learning rate, the decay rates β1 and β2, and the small constant ε used for numerical stability.
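A hedged sketch of how Adam is typically used in PyTorch is shown below; the tiny stand-in network and the learning rate are illustrative assumptions.

# One training step with the Adam optimiser.
import torch

q_net = torch.nn.Linear(32, 3)                       # stand-in Q-network
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

def training_step(loss: torch.Tensor) -> None:
    optimizer.zero_grad()   # clear gradients from the previous step
    loss.backward()         # backpropagate the loss (e.g. the TD error)
    optimizer.step()        # Adam update of all network parameters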
Huber loss
In robust statistics, the Huber loss is a measure of estimation error that is less sensitive to outliers than the squared error. It is named after Peter J. Huber, who introduced it in 1964. For an error e = y − x, it is defined as

L(e) = ½ e² if |e| ≤ ε, and L(e) = ε (|e| − ½ ε) otherwise,

where "x" is the true value of a random variable, "y" is the estimate of "x" produced by an estimator, and "ε" is a positive threshold. The loss is quadratic for small errors and only linear for large ones, so a few estimates that lie very far from the true value do not dominate the total loss; the estimated values therefore stay clustered around the bulk of the data rather than being dragged about by extreme observations.
The Huber loss is a popular choice of loss function because it is relatively insensitive to outliers. This makes it a good choice for estimating quantities that are affected by occasional extreme values, such as the weights of patients in a medical study or sudden jumps in financial price series.
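A minimal sketch of the Huber loss as defined above follows; epsilon is the threshold at which the loss switches from quadratic to linear.

# Element-wise Huber loss.
import numpy as np

def huber_loss(y_true: np.ndarray, y_pred: np.ndarray, epsilon: float = 1.0) -> np.ndarray:
    error = y_pred - y_true
    small = np.abs(error) <= epsilon
    quadratic = 0.5 * error ** 2
    linear = epsilon * (np.abs(error) - 0.5 * epsilon)
    return np.where(small, quadratic, linear)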
Gradient clipping
Gradient clipping limits the size of the gradients that are propagated back through the network during training. When an update would produce an unusually large gradient, for example because of a single extreme temporal-difference error, the gradient is rescaled so that its norm does not exceed a chosen threshold (norm clipping), or each component is capped at a maximum value (value clipping). This prevents exploding gradients and keeps training of the Q-network stable.
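A hedged sketch of gradient-norm clipping during a training step is shown below; the stand-in network and optimiser mirror the earlier Adam example.

# Clip the global gradient norm before the optimiser step.
import torch

q_net = torch.nn.Linear(32, 3)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

def training_step_with_clipping(loss: torch.Tensor, max_norm: float = 1.0) -> None:
    optimizer.zero_grad()
    loss.backward()
    # rescale gradients so that their global norm does not exceed max_norm
    torch.nn.utils.clip_grad_norm_(q_net.parameters(), max_norm)
    optimizer.step()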
Xavier initialisation
Xavier initialisation is a technique used to improve the training of deep learning networks. It was proposed by Xavier Glorot and Yoshua Bengio in 2010. The scheme sets the initial weights of each layer according to the number of units feeding into and out of that layer, so that the variance of the activations, and of the gradients flowing backwards, stays roughly constant from layer to layer. This helps to avoid activations that saturate or vanish in deep networks and generally leads to faster and more reliable training.
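The following is a minimal sketch of Xavier (Glorot) uniform initialisation applied to the linear layers of a small network like the Q-networks discussed earlier; the network itself is a hypothetical stand-in.

# Apply Xavier uniform initialisation to every linear layer.
import torch.nn as nn

net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 3))

def init_xavier(module: nn.Module) -> None:
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)  # bound ~ sqrt(6 / (fan_in + fan_out))
        nn.init.zeros_(module.bias)

net.apply(init_xavier)  # applies init_xavier to every sub-module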
Batch normalisation layers are a type of layer used in deep learning networks. They improve the accuracy and speed of training by normalising the activations across each batch of examples: every feature is rescaled to have roughly zero mean and unit variance, followed by a learned scale and shift. This stabilises the distribution of inputs seen by each layer, often allows larger learning rates, and has a mild regularising effect. Batch normalisation layers are typically inserted between the hidden layers of a deep network, most commonly after a fully-connected or convolutional layer and before its activation function.
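A hedged sketch of where batch normalisation might sit in a Q-network is shown below; the layer sizes mirror the earlier hypothetical examples.

# Fully-connected network with batch normalisation between hidden layers.
import torch.nn as nn

q_net_bn = nn.Sequential(
    nn.Linear(32, 64),
    nn.BatchNorm1d(64),   # normalise activations across the batch
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 3),     # Q-values for buy / sell / hold
)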
Regularisation techniques
In the context of training neural networks, regularisation refers to techniques that reduce overfitting, i.e. that stop the model from memorising the training data at the expense of its performance on unseen data. The most common are:
1. L2 regularisation (weight decay), which adds a penalty proportional to the squared magnitude of the weights to the loss and therefore discourages very large weights.
2. L1 regularisation, which penalises the absolute magnitude of the weights and tends to drive many of them to exactly zero, producing sparser models.
3. Dropout, which randomly deactivates a fraction of the neurons at each training step so that the network cannot rely too heavily on any single unit.
4. Early stopping, which halts training when performance on a held-out validation set stops improving.
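Below is a hedged sketch combining two of these techniques, dropout inside the network and L2 weight decay via the optimiser, reusing names from the earlier hypothetical examples.

# Dropout layers plus L2 weight decay on the optimiser.
import torch
import torch.nn as nn

q_net_reg = nn.Sequential(
    nn.Linear(32, 64),
    nn.ReLU(),
    nn.Dropout(p=0.2),        # randomly zero 20% of activations while training
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(64, 3),
)

# weight_decay adds an L2 penalty on the weights to every update
optimizer_reg = torch.optim.Adam(q_net_reg.parameters(), lr=1e-4, weight_decay=1e-5)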
Preprocessing
The data was preprocessed and normalised using the following steps:
1. The data was split into a training set and a test set.
There are a few different ways to add data to your training set in order to improve your
machine learning models.
Synthetic data is data that is artificially generated, usually using a computer. This can be done
in a number of ways, including:
Sampling from a distribution: this involves randomly selecting values from within a given
distribution. For example, you could use this technique to generate new data points that are
similar to those in your training set. This can help improve the accuracy of your models by
increasing the diversity of the data they are trained on.
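The following is a hedged sketch of the preprocessing steps described above: a chronological train/test split followed by simple synthetic sampling from a distribution fitted to the training data. The column name, file path and sample size are illustrative assumptions.

# Train/test split and synthetic data generation for returns.
import numpy as np
import pandas as pd

prices = pd.read_csv("stock_prices.csv")             # hypothetical input file
returns = prices["close"].pct_change().dropna()

# 1. chronological train/test split (no shuffling for time series)
split = int(len(returns) * 0.8)
train, test = returns.iloc[:split], returns.iloc[split:]

# 2. generate synthetic returns by sampling from a normal distribution fitted
#    to the training data, to increase the diversity of training examples
rng = np.random.default_rng(seed=0)
synthetic = rng.normal(loc=train.mean(), scale=train.std(), size=500)
augmented_train = np.concatenate([train.to_numpy(), synthetic])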
DATA RESULTS
Figure: DQN NETWORK: TRAIN & TEST DATA
Figure: DOUBLE DQN NETWORK: TRAIN & TEST DATA
Figure: DOUBLE DQN NETWORK: TRAIN & TEST – REWARD & LOSS
Figure: DUELING DOUBLE DQN NETWORK: TRAIN & TEST DATA
Figure: DUELING DOUBLE DQN NETWORK: TRAIN & TEST – REWARD & LOSS