Introduction to Reinforcement Learning
Thalesians Ltd
Level39, One Canada Square, Canary Wharf, London E14 5AB
2023.01.17
Introduction to Reinforcement Learning
A historical perspective
I This painting from the Everett Collection depicts Wentworth Works, file and steel
manufacturers and exporters of iron in Sheffield, England, ca. 1860.
I According to the 15th edition of Encyclopædia Britannica, the Industrial Revolution,
in modern history, is the process of change from an agrarian, handicraft economy to
one dominated by industry and machine manufacture.
I It began around 1760 and until around 1830 was largely confined to Britain.
I The technological changes included:
I the use of new basic materials, chiefly iron and steel,
I the use of new energy sources, including both fuels and motive power, such as coal, the
steam engine, electricity, petroleum, and the internal-combustion engine,
I the invention of new machines, such as the spinning jenny and the power loom that permitted
increased production with a smaller expenditure of human energy,
I a new organisation of work known as the factory system, which entailed increased division of
labour and specialisation of function,
I important developments in transportation and communication, including the steam
locomotive, steamship, automobile, airplane, telegraph, and radio, and
I the increasing application of science to industry.
I This was the first step towards automation.
Introduction to Reinforcement Learning
A historical perspective
From https://www.intriguing-history.com/great-exhibition/:
I On 1st May 1851 over half a million people massed in Hyde Park in London to witness
its opening.
I Prince Albert captured the mood of the time when the British considered themselves to
be the workshop of the world.
I The exhibition was to be the biggest display of objects of industry from all over the
world with over half of it given over to all that Britain manufactured. It was to be a
showcase for a hundred thousand objects, of inventions, machines, and creative
works.
I The Works of Industry of All Nations was to be a combination of visual wonder,
competition (between manufacturers with prizes awarded) and shopping.
I The main exhibition hall was a giant glass structure, with over a million square feet of
glass. The man who designed it, Joseph Paxton, named it the Crystal Palace. In
itself it was a wondrous thing to behold and covered nearly 20 acres, easily
accommodating the huge elm trees that grew in the park.
Introduction to Reinforcement Learning
A historical perspective
I According to Wikipedia, the Digital Revolution is the shift from mechanical and
analogue electronic technology to digital electronics which began anywhere from the
late 1950s to the late 1970s with the adoption and proliferation of digital computers
and digital record keeping that continues to the present day.
I The term also refers to the sweeping changes brought about by digital computing and
communication technology during (and after) the latter half of the 20th century.
I The Digital Revolution marked the beginning of the Information Age—a historical
period characterized by a rapid epochal shift from the traditional industry established
by the Industrial Revolution to an economy primarily based upon information
technology.
Introduction to Reinforcement Learning
A historical perspective
Figure: Rings of time: Information Age (Digital Revolution) from 1968 to 2017. Spruce tree. By Petar
Milošević.
Introduction to Reinforcement Learning
A historical perspective
Marvin Minsky
Introduction to Reinforcement Learning
A historical perspective
I Reinforcement learning differs from other types of machine learning in that the
training information is used to evaluate the actions taken rather than to instruct as to
what the correct actions should be.
I Instructive feedback, as in supervised machine learning, points out the correct
action to take independent of the action taken.
I Evaluative feedback, as in reinforcement learning, points out how good the action
taken is, but not whether it is the best or the worst action possible.
I This creates the need for active exploration, a trial-and-error search for good
behaviour.
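A minimal sketch of evaluative feedback and trial-and-error exploration: an ε-greedy agent on a
toy multi-armed bandit. The arm payoffs, the exploration rate and all names below are illustrative
assumptions, not taken from the slides.

```python
import random

# Toy multi-armed bandit: the agent receives only evaluative feedback
# (a noisy reward for the arm it pulled), never the identity of the best arm,
# so it must explore by trial and error.

TRUE_MEANS = [0.1, 0.5, 0.3]         # hypothetical arm payoffs, unknown to the agent
EPSILON = 0.1                        # probability of exploring a random arm
N_STEPS = 10_000

estimates = [0.0] * len(TRUE_MEANS)  # running estimate of each arm's value
counts = [0] * len(TRUE_MEANS)

for _ in range(N_STEPS):
    # Explore with probability EPSILON, otherwise exploit the current best estimate.
    if random.random() < EPSILON:
        arm = random.randrange(len(TRUE_MEANS))
    else:
        arm = max(range(len(TRUE_MEANS)), key=lambda a: estimates[a])
    # Evaluative feedback: a noisy reward for the chosen arm only.
    reward = random.gauss(TRUE_MEANS[arm], 1.0)
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental sample mean

print("Estimated arm values:", [round(v, 3) for v in estimates])
```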
Introduction to Reinforcement Learning
A different kind of learning
Agent
[Figure: the agent–environment loop, in which the agent takes the action a_t and the environment responds with the state change s_{t+1} and the reward r_t.]
Environment
The environment is the world in which the agent exists and operates.
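The loop in the figure can be written schematically as follows. The objects agent and env and
their methods (reset, step, act, observe) are hypothetical placeholders, not a specific library’s API.

```python
# Schematic agent–environment interaction loop. The method names
# (reset, step, act, observe) are illustrative placeholders.

def run_episode(agent, env, max_steps=1000):
    """Run one episode: the agent acts, the environment returns s_{t+1} and r_t."""
    state = env.reset()                                   # initial observation s_0
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                         # choose a_t from the current policy
        next_state, reward, done = env.step(action)       # environment returns s_{t+1}, r_t
        agent.observe(state, action, reward, next_state)  # learning update, if any
        total_reward += reward
        state = next_state
        if done:                                          # episode ends at a terminal state
            break
    return total_reward
```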
Introduction to Reinforcement Learning
Elements of reinforcement learning
Action
Observation
The observation provides the agent with information about the (possibly changed)
environment after taking an action.
Introduction to Reinforcement Learning
Elements of reinforcement learning
State
Reward
The reward is the feedback that measures the success or failure of the agent’s action. It
defines the goal of a reinforcement learning problem.
Introduction to Reinforcement Learning
Elements of reinforcement learning
Total reward
The total (future) reward is given by G_t = ∑_{i=t+1}^{∞} r_i. This sum may or may not converge.
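As a quick numerical illustration, the return over a finite trajectory can be computed as below.
The reward sequence is made up, and the discount factor γ is borrowed from the value-function
slide that follows; with γ < 1 the infinite sum stays bounded for bounded rewards.

```python
# Compute G_t = sum over i = t+1..T of gamma^(i-t-1) * r_i for a finite trajectory.
# With gamma = 1 this is the plain total reward, which need not converge over an
# infinite horizon; gamma < 1 keeps the sum bounded for bounded rewards.

def total_reward(rewards, gamma=1.0):
    g = 0.0
    for k, r in enumerate(rewards):          # rewards = [r_{t+1}, r_{t+2}, ...]
        g += (gamma ** k) * r
    return g

rewards = [1.0, 0.0, 2.0, 1.0]               # hypothetical rewards
print(total_reward(rewards))                 # undiscounted total reward: 4.0
print(total_reward(rewards, gamma=0.9))      # discounted total reward
```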
Introduction to Reinforcement Learning
Elements of reinforcement learning
Reward hypothesis
All goals and purposes can be described as the maximisation of the expected cumulative reward.
History
The history is the sequence of all observations, actions, and rewards (i.e. all
observable variables) up to the current time:
H_t = s_0, a_0, r_0, s_1, a_1, r_1, s_2, a_2, r_2, s_3, a_3, r_3, . . . , s_t.
Introduction to Reinforcement Learning
Elements of reinforcement learning
Environment state
I The agent state, s_t, may or may not match the environment state, s_t^e.
I Consider, for example, a poker game. The agent (a poker player) knows only his own hand.
The environment state includes the hand of each poker player.
I In chess, on the other hand, s_t = s_t^e — it is a perfect-information game.
Introduction to Reinforcement Learning
Elements of reinforcement learning
Markov state
A state s_t is Markov if and only if
P[s_{t+1} | s_t] = P[s_{t+1} | s_0, . . . , s_t],
in other words, the future is independent of the past given the present.
Introduction to Reinforcement Learning
Elements of reinforcement learning
Policy
The policy is the agent’s behaviour: a mapping from states to actions, either deterministic,
a = π(s), or stochastic, π(a | s) = P[a_t = a | s_t = s].
Value function
v_π(s) = E_π[r_t + γ r_{t+1} + γ^2 r_{t+2} + γ^3 r_{t+3} + . . . | S_t = s].
I Whereas the reward signal indicates what is good in an immediate sense, a value
function specifies what is good in the long run.
I Roughly speaking, the value of a state is the total amount of reward an agent can
expect to accumulate over the future, starting from that state.
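A minimal Monte Carlo sketch of this idea: estimate v_π(s) for a toy two-state chain by averaging
discounted returns over simulated episodes. The chain, the uniformly random policy and the
discount factor are hypothetical, purely for illustration.

```python
import random

GAMMA = 0.9
STATES = ["A", "B"]

def step(state, action):
    """Toy dynamics: action 0 tends to keep the state, action 1 tends to switch it."""
    if random.random() < 0.8:
        next_state = state if action == 0 else ("B" if state == "A" else "A")
    else:
        next_state = random.choice(STATES)
    reward = 1.0 if next_state == "B" else 0.0    # being in state B is rewarded
    return next_state, reward

def policy(state):
    return random.choice([0, 1])                  # uniformly random policy

def mc_value(start_state, n_episodes=5000, horizon=50):
    """Average discounted return from start_state: a Monte Carlo estimate of v_pi."""
    total = 0.0
    for _ in range(n_episodes):
        s, g, discount = start_state, 0.0, 1.0
        for _ in range(horizon):
            s, r = step(s, policy(s))
            g += discount * r
            discount *= GAMMA
        total += g
    return total / n_episodes

print({s: round(mc_value(s), 2) for s in STATES})
```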
Introduction to Reinforcement Learning
Elements of reinforcement learning
Model
The model is the agent’s representation of the environment: it predicts what the environment
will do next, for example the next state and/or the next reward.
Phil’s breakfast
A prop trader
A proprietary trader [Car15, Cha08, Cha13, Cha16, Dur13, Tul15] observes the dynamics
of market securities and watches economic releases and news unfold on his Bloomberg
terminal. Based on this information, both tactical and strategic, he places buy and sell
orders, stop losses and stop gains. The trader’s goal is to have a
strong PnL.
Introduction to Reinforcement Learning
Examples of reinforcement learning
A vanilla options market maker [Che98, Cla10, JFB15, Tal96, Wys17] produces two-sided
quotes in FX options. She hedges her options position with spot. The market moves all the
time, so her risk (delta, gamma, vega, etc.) keeps changing. The market maker’s goal is to
hedge the position as safely and as cheaply as possible.
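In the language of the elements above, the market maker’s state includes her current risk (e.g.
delta), the action is the hedge trade in spot, and the reward trades off residual risk against
transaction cost. A deliberately toy sketch with made-up parameter values (not a production
hedging model):

```python
# Toy reward for a single hedging step: penalise both the squared residual delta
# (risk left after the hedge) and the cost of trading the hedge itself.

RISK_AVERSION = 1.0     # hypothetical weight on squared residual delta
COST_PER_UNIT = 0.01    # hypothetical proportional transaction cost in spot

def hedge_reward(portfolio_delta, hedge_trade):
    residual = portfolio_delta + hedge_trade    # delta remaining after the hedge
    return -(RISK_AVERSION * residual ** 2 + COST_PER_UNIT * abs(hedge_trade))

# Example: the book is 100 deltas long; compare a full hedge with a half hedge.
for trade in (-100.0, -50.0):
    print(trade, round(hedge_reward(100.0, trade), 2))
```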
Introduction to Reinforcement Learning
Origins of reinforcement learning
Checkers (i)
The game of checkers [Sam59, Sam67], following some ideas from [Sha50].
Introduction to Reinforcement Learning
Successes of reinforcement learning
Checkers (ii)
Checkers (iii)
Backgammon (i)
Backgammon (ii)
Backgammon (iii)
Backgammon (iv)
Go (i)
Go (ii)
In Mastering the game of Go with deep neural networks and tree search [SHM+16]:
The game of Go has long been viewed as the most challenging of classic games for artificial
intelligence owing to its enormous search space and the difficulty of evaluating board positions
and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to
evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are
trained by a novel combination of supervised learning from human expert games, and
reinforcement learning from games of self-play. Without any lookahead search, the neural
networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate
thousands of random games of self-play. We also introduce a new search algorithm that
combines Monte Carlo simulation with value and policy networks. Using this search algorithm,
our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated
the human European Go champion by 5 games to 0. This is the first time that a computer
program has defeated a human professional player in the full-sized game of Go, a feat previously
thought to be at least a decade away.
Introduction to Reinforcement Learning
Successes of reinforcement learning
Go (iii)
In Simulation, learning, and optimization techniques in Watson’s game strategies [TGL+12]:
The game of Jeopardy! features four types of strategic decision-making: 1) Daily Double
wagering; 2) Final Jeopardy! wagering; 3) selecting the next square when in control of the board;
and 4) deciding whether to attempt to answer, i.e., “buzz in”. Strategies that properly account for
the game state and future event probabilities can yield a huge boost in overall winning chances,
when compared with simple “rule-of-thumb” strategies. In this paper, we present an approach to
developing and testing components to make said strategy decisions, founded upon development
of reasonably faithful simulation models of the players and the Jeopardy! game environment. We
describe machine learning and Monte Carlo methods used in simulations to optimize the
respective strategy algorithms. Application of these methods yielded superhuman game
strategies for IBM Watson that significantly enhanced its overall competitive record.
Introduction to Reinforcement Learning
Successes of reinforcement learning
In Personalized ad recommendation systems for life-time value optimization with guarantees [TTG15]:
In this paper, we propose a framework for using reinforcement learning (RL) algorithms to learn
good policies for personalised ad recommendation (PAR) systems. The RL algorithms take into
account the long-term effect of an action, and thus, could be more suitable than myopic
techniques like supervised learning and contextual bandit, for modern PAR systems in which the
number of returning visitors is rapidly growing. However, while myopic techniques have been
well-studied in PAR systems, the RL approach is still in its infancy, mainly due to two fundamental
challenges: how to compute a good RL strategy and how to evaluate a solution using historical
data to ensure its “safety” before deployment. In this paper, we propose to use a family of
off-policy evaluation techniques with statistical guarantees to tackle both these challenges. We
apply these methods to a real PAR problem, both for evaluating the final performance and for
optimising the parameters of the RL algorithm. Our results show that a RL algorithm equipped
with these off-policy evaluation techniques outperforms the myopic approaches. Our results also
give fundamental insights on the difference between the click through rate (CTR) and life-time
value (LTV) metrics for evaluating the performance of a PAR algorithm.
Introduction to Reinforcement Learning
Successes of reinforcement learning
In The QLBS Q-Learner Goes NuQLear: Fitted Q Iteration, Inverse RL, and Option
Portfolios [Hal18]:
The QLBS model is a discrete-time option hedging and pricing model that is based on Dynamic
Programming (DP) and Reinforcement Learning (RL). It combines the famous Q-Learning method
for RL with the Black–Scholes (–Merton) model’s idea of reducing the problem of option pricing
and hedging to the problem of optimal rebalancing of a dynamic replicating portfolio for the
option, which is made of a stock and cash.
Here we expand on several NuQLear (Numerical Q-Learning) topics with the QLBS model. First,
we investigate the performance of Fitted Q Iteration for a RL (data-driven) solution to the model,
and benchmark it versus a DP (model-based) solution, as well as versus the BSM model.
Second, we develop an Inverse Reinforcement Learning (IRL) setting for the model, where we
only observe prices and actions (re-hedges) taken by a trader, but not rewards.
Third, we outline how the QLBS model can be used for pricing portfolios of options, rather than a
single option in isolation, thus providing its own, data-driven and model independent solution to
the (in)famous volatility smile problem of the Black–Scholes model.
Introduction to Reinforcement Learning
Financial applications of reinforcement learning
RL hedging—Kolm/Ritter
Deep hedging—Buehler/Gonon/Teichmann/Wood/Mohan/Kochems
Deep hedging—Cao/Chen/Hull/Poulos
Wealth management—Dixon/Halperin
Optimal execution—Ning/Lin/Jaimungal
Sutton/Barto
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning:
An Introduction. MIT Press, 2nd edition, 2018.
Szepesvári
Csaba Szepesvári. Algorithms for Reinforcement Learning.
Synthesis Lectures on Artificial Intelligence and Machine
Learning, Morgan & Claypool, 2010 [Sze10].
Reinforcement learning is a learning paradigm concerned with learning to control a system so as
to maximize a numerical performance measure that expresses a long-term objective. What
distinguishes reinforcement learning from supervised learning is that only partial feedback is
given to the learner about the learner’s predictions. Further, the predictions may have long term
effects through influencing the future state of the controlled system. Thus, time plays a special
role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to
understand the algorithms’ merits and limitations. Reinforcement learning is of great interest
because of the large number of practical applications that it can be used to address, ranging from
problems in artificial intelligence to operations research or control engineering. In this book, we
focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic
programming. We give a fairly comprehensive catalog of learning problems, describe the core
ideas, note a large number of state of the art algorithms, followed by the discussion of their
theoretical properties and limitations.
Available online for free:
https://sites.ualberta.ca/~szepesva/rlbook.html
Introduction to Reinforcement Learning
Textbooks
Bertsekas
Dimitri Bertsekas. Reinforcement Learning and Optimal Control.
Athena Scientific, 2019 [Ber19].
This book considers large and challenging multistage decision problems, which can be solved in
principle by dynamic programming, but their exact solution is computationally intractable. We
discuss solution methods that rely on approximations to produce suboptimal policies with
adequate performance. These methods are known by several essentially equivalent names:
reinforcement learning, approximate dynamic programming, and neuro-dynamic programming.
They underlie, among others, the recent impressive successes of self-learning in the context of
games such as chess and Go. One of the aims of the book is to explore the common boundary
between artificial intelligence and optimal control, and to form a bridge that is accessible by
workers with background in either field. Another aim is to organize coherently the broad mosaic
of methods that have proved successful in practice while having a solid theoretical and/or logical
foundation. This may help researchers and practitioners to find their way through the maze of
competing ideas that constitute the current state of the art. The mathematical style of this book is
somewhat different than other books by the same author. While we provide a rigorous, albeit
short, mathematical account of the theory of finite and infinite horizon dynamic programming, and
some fundamental approximation methods, we rely more on intuitive explanations and less on
proof-based insights. We also illustrate the methodology with many example algorithms and
applications. Selected sections, instructional videos and slides, and other supporting material
may be found at the author’s website.
Introduction to Reinforcement Learning
Textbooks
Agarwal/Jiang/Kakade/Sun
I Work in progress: Alekh Agarwal, Nan Jiang, Sham M. Kakade, Wen Sun.
Reinforcement Learning: Theory and Algorithms [AJKS21].
I A draft is available at https://rltheorybook.github.io/
I Current contents:
I Markov decision processes and computational complexity
I Sample complexity
I Approximate value function methods
I Generalization
I Multi-armed and linear bandits
I Strategic exploration in tabular MDPs
I Linearly parameterized MDPs
I Parametric models with bounded Bellman rank
I Policy gradient methods and non-convex optimization
I Optimality
I Function approximation and the NPG
I CPI, TRPO, and more
I Linear quadratic regulators
I Imitation learning
I Offline reinforcement learning
I Partially observable Markov decision processes
Introduction to Reinforcement Learning
Textbooks
Lapan
Maxim Lapan. Deep Reinforcement Learning Hands-On.
Packt, 2018 [Lap18].
Recent developments in reinforcement learning (RL), combined with deep learning (DL) have
seen unprecedented progress made towards training agents to solve complex problems in a
human-like way. Google’s use of algorithms to play and defeat the well-known Atari arcade
games has propelled the field to prominence, and researchers are generating new ideas at a
rapid pace.
Deep Reinforcement Learning Hands-On is a comprehensive guide to the very latest DL tools
and their limitations. You will evaluate methods including cross-entropy and policy gradients,
before applying them to real-world environments. Take on both the Atari set of virtual games and
family favourites such as Connect4. The book provides an introduction to the basics of RL, giving
you the know-how to code intelligent learning agents to take on a formidable array of practical
tasks. Discover how to implement Q-learning on ‘grid world’ environments, teach your agent to
buy and trade stocks, and find out how natural language models are driving the boom in chatbots.
Introduction to Reinforcement Learning
Textbooks
Zai/Brown
Alex Zai and Brandon Brown. Deep Reinforcement Learning in
Action. Manning, 2020.
Dixon/Halperin/Bilokon
Matthew Dixon, Igor Halperin, and Paul Bilokon. Machine
Learning in Finance: From Theory to Practice. Springer, 2020.
This book is written for advanced graduate students and academics in financial econometrics,
management science and applied statistics, in addition to quants and data scientists in the field of
quantitative finance. We present machine learning as a non-linear extension of various topics in
quantitative economics such as financial econometrics and dynamic programming, with an
emphasis on novel algorithmic representations of data, regularisation, and techniques for
controlling the bias-variance tradeoff leading to improved out-of-sample forecasting. The book is
presented in three parts, each part covering theory and applications. The first presents
supervised learning for cross-sectional data from both a Bayesian and frequentist perspective.
The more advanced material places a firm emphasis on neural networks, including deep learning,
as well as Gaussian processes, with examples in investment management and derivatives. The
second part covers supervised learning for time series data, arguably the most common data
type used in finance with examples in trading, stochastic volatility and fixed income modeling.
Finally, the third part covers reinforcement learning and its applications in trading, investment and
wealth management. We provide Python code examples to support the readers’ understanding of
the methodologies and applications. As a bridge to research in this emergent field, we present
the frontiers of machine learning in finance from a researcher’s perspective, highlighting how
many well known concepts in statistical physics are likely to emerge as research topics for
machine learning in finance.
Introduction to Reinforcement Learning
Textbooks
Novotny/Bilokon/Galiotos/Délèze
Jan Novotny, Paul Alexander Bilokon, Aris Galiotos, and Frédéric Délèze.
Machine Learning and Big Data with kdb+/q. Wiley, 2019.
Philip E. Agre.
The Dynamic Structure of Everyday Life.
PhD thesis, Massachusetts Institute of Technology, Cambridge MA, 1988.
Alekh Agarwal, Nan Jiang, Sham M. Kakade, and Wen Sun.
Reinforcement Learning: Theory and Algorithms.
2021.
https://rltheorybook.github.io/.
Richard Bellman.
Dynamic Programming.
Princeton University Press, NJ, 1957.
Dimitri P. Bertsekas.
Dynamic programming and optimal control, Volume I.
Athena Scientific, Belmont, MA, 2001.
Dimitri P. Bertsekas.
Dynamic programming and optimal control, Volume II.
Athena Scientific, Belmont, MA, 2005.
Dimitri P. Bertsekas.
Reinforcement Learning and Optimal Control.
Athena Scientific, 2019.
Hans Buehler, Lukas Gonon, Josef Teichmann, Ben Wood, Baranidharan Mohan, and
Jonathan Kochems.
Introduction to Reinforcement Learning
Bibliography
Deep hedging: Hedging derivatives under generic market frictions using reinforcement
learning.
Research Paper 19–80, Swiss Finance Institute, 2019.
Justin A. Boyan and Michael L. Littman.
Packet routing in dynamically changing networks: A reinforcement learning approach.
In Advances in Neural Information Processing Systems 6 (NIPS 1993), 1993.
Nicole Bäuerle and Ulrich Rieder.
Markov Decision Processes with Applications to Finance.
Springer, 2011.
Dimitri P. Bertsekas and Steven E. Shreve.
Stochastic optimal control.
Academic Press, New York, 1978.
Robert Carver.
Systematic Trading: A Unique New Method for Designing Trading and Investing
Systems.
Harriman House, 2015.
Jay Cao, Jacky Chen, John C. Hull, and Zissis Poulos.
Deep hedging of derivatives using reinforcement learning.
SSRN, December 2019.
Ernest P. Chan.
Quantitative Trading: How to Build Your Own Algorithmic Trading Business.
Wiley, 2008.
Introduction to Reinforcement Learning
Bibliography
Ernest P. Chan.
Algorithmic Trading: Winning Strategies and Their Rationale.
Wiley, 2013.
Ernest P. Chan.
Machine Trading: Deploying Computer Algorithms to Conquer the Markets.
Wiley, 2016.
Zhaohui Chen, editor.
Currency Options and Exchange Rate Economics.
World Scientific, 1998.
Iain J. Clark.
Foreign Exchange Option Pricing: A Practitioner’s Guide.
Wiley, 2010.
Matthew Dixon and Igor Halperin.
G-learner and GIRL: Goal based wealth management with reinforcement learning.
arXiv, 2020.
Eugene A. Durenard.
Professional Automated Trading: Theory and Practice.
Wiley, 2013.
Eugene A. Feinberg and Adam Shwartz.
Handbook of Markov decision processes.
Kluwer Academic Publishers, Boston, MA, 2002.
Introduction to Reinforcement Learning
Bibliography
Igor Halperin.
QLBS: Q-learner in the Black–Scholes (–Merton) worlds.
SSRN, 2017.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3087076.
Igor Halperin.
The QLBS Q-learner goes NuQLear: Fitted Q iteration, inverse RL, and option
portfolios.
SSRN, 2018.
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3102707.
Onesimo Hernández-Lerma and Jean B. Lasserre.
Discrete-time Markov control processes.
Springer-Verlag, New York, 1996.
Ronald A. Howard.
Dynamic programming and Markov processes.
The Technology Press of M.I.T., Cambridge, Mass., 1960.
Engin İpek, Onur Mutlu, José F. Martínez, and Rich Caruana.
Self-optimizing memory controllers: A reinforcement learning approach.
In Proceedings of the 35th Annual International Symposium on Computer
Architecture, pages 39–50. IEEE Computer Society Washington, DC, 2008.
Jessica James, Jonathan Fullwood, and Peter Billington.
FX Option Performance and Data Set: An Analysis of the Value Delivered by FX
Options Since the Start of the Market.
Introduction to Reinforcement Learning
Bibliography
Wiley, 2015.
Petter N. Kolm and Gordon Ritter.
Dynamic replication and hedging: A reinforcement learning approach.
The Journal of Financial Data Science, 1(1):159–171, 2019.
Petter N. Kolm and Gordon Ritter.
Modern perspectives on reinforcement learning in finance.
Journal of Machine Learning in Finance, 1(1), 2019.
Maxim Lapan.
Deep Reinforcement Learning Hands-On.
Packt, 2018.
Yuanlong Li, Yonggang Wen, Dacheng Tao, and Kyle Guan.
Transforming cooling optimization for green data center via deep reinforcement
learning.
IEEE Transactions on Cybernetics, pages 1–12, 2019.
José F. Martínez and Engin İpek.
Dynamic multicore resource management: A machine learning approach.
Micro, IEEE, 29(5):8–17, 2009.
Donald Michie.
Experiments on the mechanization of game-learning. Part I. Characterization of the
model and its parameters.
The Computer Journal, 6(3):232–236, November 1963.
Introduction to Reinforcement Learning
Bibliography
Marvin Minsky.
Form and content in computer science, 1969 Turing Award lecture.
Journal of the Association for Computing Machinery, 17(2), 1970.
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou,
Daan Wierstra, and Martin Riedmiller.
Playing Atari with deep reinforcement learning.
https://arxiv.org/abs/1312.5602, December 2013.
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness,
Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg
Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen
King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis.
Human-level control through deep reinforcement learning.
Nature, 518, February 2015.
Jan Novotny, Paul Alexander Bilokon, Aris Galiotos, and Frédéric Délèze.
Machine Learning and Big Data with kdb+/q.
Wiley, 2019.
Andrew Y. Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben Tse,
Eric Berger, and Eric Liang.
Experimental Robotics IX: The 9th International Symposium on Experimental
Robotics, chapter Autonomous Inverted Helicopter Flight via Reinforcement Learning,
pages 363–372.
Springer, 2006.
Introduction to Reinforcement Learning
Bibliography
Arthur L. Samuel.
Some studies in machine learning using the game of checkers.
IBM Journal of Research and Development, 3(3):210–229, 1959.
Arthur L. Samuel.
Some studies in machine learning using the game of checkers. II — Recent progress.
IBM Journal of Research and Development, 11(6):601–617, 1967.
Richard S. Sutton and Andrew G. Barto.
Reinforcement Learning: An Introduction.
MIT Press, 2nd edition, 2018.
Matthias Schnaubelt.
Deep reinforcement learning for optimal placement of cryptocurrency limit orders.
FAU Discussion Papers in Economics 05/2020, Friedrich-Alexander-Universität
Erlangen-Nürnberg, Institute of Economics, Nürnberg, 2020.
Claude E. Shannon.
Programming a computer for playing chess.
Philosophical Magazine and Journal of Science, 41(314):256–275, 1950.
Lloyd Stowell Shapley.
Stochastic games.
Proceedings of the National Academy of Sciences of the United States of America,
39(10):1095–1100, October 1953.
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George
van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam,
Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya
Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel,
and Demis Hassabis.
Mastering the game of Go with deep neural networks and tree search.
Introduction to Reinforcement Learning
Bibliography
Gerald Tesauro, David C. Gondek, Jonathan Lenchner, James Fan, and John M.
Prager.
Analysis of Watson’s strategies for playing Jeopardy!
Journal of Artificial Intelligence Research, 47:205–251, 2013.
Philip S. Thomas.
Safe Reinforcement Learning.
PhD thesis, University of Massachusetts, Amherst, 2015.
Georgios Theocharous, Philip S. Thomas, and Mohammad Ghavamzadeh.
Personalized ad recommendation systems for life-time value optimization with
guarantees.
In Proceedings of the Twenty-Fourth International Joint Conference on Artificial
Intelligence (IJCAI 2015). AAAI Press, Palo Alto, CA, 2015.
Igor Tulchinsky.
Finding Alphas: A Quantitative Approach to Building Trading Strategies.
Wiley, 2015.
Alan Mathison Turing.
The Essential Turing, chapter Intelligent machinery, pages 410–432.
Oxford University Press, Oxford, 2004.
Timothy Woodbury, Caroline Dunn, and John Valasek.
Autonomous soaring using reinforcement learning for trajectory generation.
In 52nd Aerospace Sciences Meeting, 2014.
Introduction to Reinforcement Learning
Bibliography
Uwe Wystup.
FX Options and Structured Products.
Wiley, 2nd edition, 2017.
Alex Zai and Brandon Brown.
Deep Reinforcement Learning in Action.
Manning, 2020.