Praise for Foundations of Deep Reinforcement Learning
“This book provides an accessible introduction to deep reinforcement learning covering
the mathematical concepts behind popular algorithms as well as their practical
implementation. I think the book will be a valuable resource for anyone looking to apply
deep reinforcement learning in practice.”
—Volodymyr Mnih, lead developer of DQN
“An excellent book to quickly develop expertise in the theory, language, and practical
implementation of deep reinforcement learning algorithms. A limpid exposition which
uses familiar notation; all the most recent techniques explained with concise, readable
code, and not a page wasted in irrelevant detours: it is the perfect way to develop a solid
foundation on the topic.”
—Vincent Vanhoucke, principal scientist, Google
“As someone who spends their days trying to make deep reinforcement learning methods
more useful for the general public, I can say that Laura and Keng’s book is a welcome
addition to the literature. It provides both a readable introduction to the fundamental
concepts in reinforcement learning as well as intuitive explanations and code for many of
the major algorithms in the field. I imagine this will become an invaluable resource for
individuals interested in learning about deep reinforcement learning for years to come.”
—Arthur Juliani, senior machine learning engineer, Unity Technologies
“Until now, the only way to get to grips with deep reinforcement learning was to slowly
accumulate knowledge from dozens of different sources. Finally, we have a book bringing
everything together in one place.”
—Matthew Rahtz, ML researcher, ETH Zürich
Foundations of Deep Reinforcement Learning
The Pearson Addison-Wesley
Data & Analytics Series
The series aims to tie all three of these areas together to help the reader build
end-to-end systems for fighting spam; making recommendations; building
personalization; detecting trends, patterns, or problems; and gaining insight
from the data exhaust of systems and user interactions.
Laura Graesser
Wah Loon Keng
The authors and publisher have taken care in the preparation of this book, but make no expressed or
implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed
for incidental or consequential damages in connection with or arising out of the use of the information or
programs contained herein.
For information about buying this title in bulk quantities, or for special sales opportunities (which may
include electronic versions; custom cover designs; and content particular to your business, training goals,
marketing focus, or branding interests), please contact our corporate sales department
at [email protected] or (800) 382-3419.
For questions about sales outside the U.S., please contact [email protected].
All rights reserved. This publication is protected by copyright, and permission must be obtained from the
publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or
by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding
permissions, request forms and the appropriate contacts within the Pearson Education Global Rights &
Permissions Department, please visit www.pearson.com/permissions.
ISBN-13: 978-0-13-517238-4
ISBN-10: 0-13-517238-1
For those people who make me feel that anything is possible
—Laura
Contents

Foreword
Preface
Acknowledgments

2 REINFORCE
2.1 Policy
2.2 The Objective Function
2.3 The Policy Gradient
2.3.1 Policy Gradient Derivation
2.4 Monte Carlo Sampling
2.5 REINFORCE Algorithm
2.5.1 Improving REINFORCE
2.6 Implementing REINFORCE
2.6.1 A Minimal REINFORCE Implementation
2.6.2 Constructing Policies with PyTorch
2.6.3 Sampling Actions
2.6.4 Calculating Policy Loss
2.6.5 REINFORCE Training Loop
2.6.6 On-Policy Replay Memory
2.7 Training a REINFORCE Agent
2.8 Experimental Results
2.8.1 Experiment: The Effect of Discount Factor γ
2.8.2 Experiment: The Effect of Baseline
2.9 Summary
2.10 Further Reading
2.11 History

3 SARSA
3.1 The Q- and V-Functions
3.2 Temporal Difference Learning
3.2.1 Intuition for Temporal Difference Learning
13 Hardware
13.1 Computer
13.2 Data Types
13.3 Optimizing Data Types in RL
13.4 Choosing Hardware
13.5 Summary

14 States
14.1 Examples of States
14.2 State Completeness
14.3 State Complexity
14.4 State Information Loss
14.4.1 Image Grayscaling
14.4.2 Discretization
14.4.3 Hash Conflict
14.4.4 Metainformation Loss
14.5 Preprocessing
14.5.1 Standardization
14.5.2 Image Preprocessing
14.5.3 Temporal Preprocessing
14.6 Summary

15 Actions
15.1 Examples of Actions
15.2 Action Completeness
15.3 Action Complexity
15.4 Summary
15.5 Further Reading: Action Design in Everyday Things

16 Rewards
16.1 The Role of Rewards
16.2 Reward Design Guidelines
16.3 Summary
Epilogue
References
Index
Foreword
In April of 2019, OpenAI’s Five bots played in a Dota 2 competition match against OG, the
2018 human world champions. Dota 2 is a complex, multiplayer battle arena game where
players can choose different characters. Winning a game requires strategy, teamwork, and
quick decisions. Building an artificial intelligence to compete in this game, with so
many variables and a seemingly infinite search space for optimization, seems like an
insurmountable challenge. Yet OpenAI’s bots won handily and, soon after, went on to win
over 99% of their matches against public players. The innovation underlying this
achievement was deep reinforcement learning.
Although this development is recent, reinforcement learning and deep learning have
both been around for decades. However, a significant amount of new research combined
with the increasing power of GPUs has pushed the state of the art forward. This book
gives the reader an introduction to deep reinforcement learning and distills the work done
over the last six years into a cohesive whole.
While training a computer to beat a video game may not be the most practical thing to
do, it’s only a starting point. Reinforcement learning is an area of machine learning that is
useful for solving sequential decision-making problems—that is, problems that are solved
over time. This applies to almost any endeavor—be it playing a video game, walking down
the street, or driving a car.
Laura Graesser and Wah Loon Keng have put together an approachable introduction to
a complicated topic that is at the forefront of what is new in machine learning. Not only
have they brought to bear their research into many papers on the topic; they have also created an
open source library, SLM Lab, to help others get up and running quickly with deep
reinforcement learning. SLM Lab is written in Python on top of PyTorch, but readers only
need familiarity with Python. Readers intending to use TensorFlow or some other library
as their deep learning framework of choice will still get value from this book as it
introduces the concepts and problem formulations for deep reinforcement learning
solutions.
This book brings together the most recent research in deep reinforcement learning
along with examples and code that the readers can work with. Their library also works
with OpenAI’s Gym, Roboschool, and the Unity ML-Agents toolkit, which makes this
book a perfect jumping-off point for readers looking to work with those systems.
Preface

We first discovered deep reinforcement learning (deep RL) when DeepMind achieved
breakthrough performance in the Atari arcade games. Using only images and no prior
knowledge, artificial agents reached human-level performance for the first time.
The idea of an artificial agent learning by itself, through trial and error, without
supervision, sparked something in our imaginations. It was a new and exciting approach to
machine learning, and it was quite different from the more familiar field of supervised
learning.
We decided to work together to learn about this topic. We read books and papers,
followed online courses, studied code, and tried to implement the core algorithms. We
realized that not only is deep RL conceptually challenging, but that implementation
requires as much effort as a large software engineering project.
As we progressed, we learned more about the landscape of deep RL—how algorithms
relate to each other and what their different characteristics are. Forming a mental model of
this was hard because deep RL is a new area of research and the theoretical knowledge had
not yet been distilled into a book. We had to learn directly from research papers and online
lectures.
Another challenge was the large gap between theory and implementation. Often, a deep
RL algorithm has many components and tunable hyperparameters that make it sensitive
and fragile. For it to succeed, all the components need to work together correctly and with
appropriate hyperparameter values. The implementation details required to get this right
are not immediately clear from the theory, but are just as important. A resource that
integrated theory and implementation would have been invaluable when we were learning.
We felt that the journey from theory to implementation could have been simpler than
we found it, and we wanted to contribute to making deep RL easier to learn. This book is
our attempt to do that. It takes an end-to-end approach to introducing deep RL—starting
with intuition, then explaining the theory and algorithms, and finishing with
implementations and practical tips. This is also why the book comes with a companion
software library, SLM Lab, which contains implementations of all the algorithms discussed
in it. In short, this is the book we wished existed when we were starting to learn about this
topic.
Deep RL belongs to the larger field of reinforcement learning. At the core of
reinforcement learning is function approximation; in deep RL, functions are learned using
deep neural networks. Reinforcement learning, supervised learning, and unsupervised
learning make up the three core machine learning techniques, and each technique differs
in how problems are formulated and how algorithms learn from data.
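
To make the phrase “functions are learned using deep neural networks” concrete, the minimal
sketch below (not taken from the book or from SLM Lab) shows a small PyTorch network
standing in for a Q-function, mapping a state to one estimated value per action; the
CartPole-sized dimensions and layer sizes are illustrative assumptions.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """A tiny function approximator: a state goes in, one Q-value per action comes out."""
    def __init__(self, state_dim=4, num_actions=2, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, state):
        return self.net(state)

q_net = QNetwork()
state = torch.rand(1, 4)          # a dummy CartPole-like state
q_values = q_net(state)           # shape (1, 2): one estimate per action
action = q_values.argmax(dim=-1)  # act greedily with respect to the current estimates
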
In this book we focus exclusively on deep RL because the challenges we experienced
are specific to this subfield of reinforcement learning. This bounds the scope of the book
in two ways. First, it excludes all other techniques that can be used to learn functions
in reinforcement learning. Second, it emphasizes developments between 2013 and
2019 even though reinforcement learning has existed since the 1950s. Many of the
recent developments build from older research, so we felt it was important to trace the
development of the main ideas. However, we do not intend to give a comprehensive
history of the field.
This book is aimed at undergraduate computer science students and software engineers.
It is intended to be an introduction to deep RL and no prior knowledge of the subject is
required. However, we do assume that readers have a basic familiarity with machine
learning and deep learning as well as an intermediate level of Python programming. Some
experience with PyTorch is also useful but not necessary.
The book is organized as follows. Chapter 1 introduces the different aspects of a deep
reinforcement learning problem and gives an overview of deep reinforcement learning
algorithms.
Part I is concerned with policy-based and value-based algorithms. Chapter 2 introduces
the first Policy Gradient method known as REINFORCE. Chapter 3 introduces the first
value-based method known as SARSA. Chapter 4 discusses the Deep Q-Networks
(DQN) algorithm and Chapter 5 focuses on techniques for improving it—target
networks, the Double DQN algorithm, and Prioritized Experience Replay.
Part II focuses on algorithms which combine policy-based and value-based methods.
Chapter 6 introduces the Actor-Critic algorithm which extends REINFORCE.
Chapter 7 introduces Proximal Policy Optimization (PPO) which can extend
Actor-Critic. Chapter 8 discusses synchronous and asynchronous parallelization techniques
that are applicable to any of the algorithms in this book. Finally, all the algorithms are
summarized in Chapter 9.
Each algorithm chapter is structured in the same way. First, we introduce the main
concepts and work through the relevant mathematical formulations. Then we describe
the algorithm and discuss an implementation in Python. Finally, we provide a configured
algorithm with tuned hyperparameters which can be run in SLM Lab, and illustrate the
main characteristics of the algorithm with graphs.
Part III focuses on the practical details of implementing deep RL algorithms.
Chapter 10 covers engineering and debugging practices and includes an almanac of
hyperparameters and results. Chapter 11 provides a usage reference for the companion
library, SLM Lab. Chapter 12 looks at neural network design and Chapter 13 discusses
hardware.
The final part of the book, Part IV, is about environment design. It consists of Chapters 14,
15, 16, and 17, which treat the design of states, actions, rewards, and transition functions,
respectively.
The book is intended to be read linearly from Chapter 1 to Chapter 10. These chapters
introduce all of the algorithms in the book and provide practical tips for getting them to
work. The next three chapters, 11 to 13, focus on more specialized topics and can be read
in any order. For readers who do not wish to go into as much depth, Chapters 1, 2, 3, 4, 6,
and 10 are a coherent subset of the book that focuses on a few of the algorithms. Finally,
Part IV contains a standalone set of chapters intended for readers with a particular interest
in understanding environments in more depth or building their own.
SLM Lab [67], this book’s companion software library, is a modular deep RL
framework built using PyTorch [114]. SLM stands for Strange Loop Machine, in homage
to Hofstadter’s iconic book Gödel, Escher, Bach: An Eternal Golden Braid [53]. The specific
examples from SLM Lab that we include use PyTorch’s syntax and features for training
neural networks. However, the underlying principles for implementing deep RL
algorithms are applicable to other deep learning frameworks such as TensorFlow [1].
The design of SLM Lab is intended to help new students learn deep RL by organizing
its components into conceptually clear pieces. These components also align with how deep
RL is discussed in the academic literature to make it easier to translate from theory to code.
Another important aspect of learning deep RL is experimentation. To facilitate this,
SLM Lab also provides an experimentation framework to help new students design and
test their own hypotheses.
The SLM Lab library is released as an open source project on GitHub. We encourage
readers to install it (on a Linux or macOS machine) and run the first demo by following
the instructions on the repository website, https://github.com/kengz/SLM-Lab. A
dedicated git branch, “book”, has been created with a version of the code compatible with
this book. A short set of installation instructions copied from the repository website is
shown in Code 0.1.
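
As a rough guide, the steps look something like the sketch below, based on the repository
README at the time of writing; the branch name comes from the paragraph above, but the
setup script and demo spec name are assumptions that may have changed, so treat the
repository website as the authoritative reference.

git clone https://github.com/kengz/SLM-Lab.git
cd SLM-Lab
git checkout book     # the branch with code compatible with this book
./bin/setup           # install dependencies and create the Conda environment

# run the first demo: a DQN agent on CartPole
conda activate lab
python run_lab.py slm_lab/spec/demo.json dqn_cartpole dev
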
We recommend you set this up first so you can train agents with algorithms as they are
introduced in this book. Beyond installation and running the demo, it is not necessary to
be familiar with SLM Lab before reading the algorithm chapters (Parts I and II)—we give
all the commands to train agents where needed. We also discuss SLM Lab more extensively
in Chapter 11 after shifting focus from algorithms to more practical aspects of deep
reinforcement learning.
Register your copy of Foundations of Deep Reinforcement Learning on the InformIT site for
convenient access to updates and/or corrections as they become available. To start the
registration process, go to informit.com/register and log in or create an account. Enter the
product ISBN (9780135172384) and click Submit. Look on the Registered Products tab
for an Access Bonus Content link next to this product, and follow that link to access any
available bonus materials. If you would like to be notified of exclusive offers on new editions
and updates, please check the box to receive email from us.
Acknowledgments
There are many people who have helped us finish this project. We thank Milan Cvitkovic,
Alex Leeds, Navdeep Jaitly, Jon Krohn, Katya Vasilaky, and Katelyn Gleason for supporting
and encouraging us. We are grateful to OpenAI, PyTorch, Ilya Kostrikov, and Jaromir
Janisch for providing high-quality open source implementations of different components
of deep RL algorithms. We also thank Arthur Juliani for early discussions on environment
design. These resources and discussions were invaluable as we were building SLM Lab.
A number of people provided thoughtful and insightful feedback on earlier drafts of this
book. We would like to thank Alexandre Sablayrolles, Anant Gupta, Brandon Strickland,
Chong Li, Jon Krohn, Jordi Frank, Karthik Jayasurya, Matthew Rahtz, Pidong Wang,
Raymond Chua, Regina R. Monaco, Rico Jonschkowski, Sophie Tabac, and Utku Evci
for the time and effort you put into this. The book is better as a result.
We are very grateful to the Pearson production team—Alina Kirsanova, Chris Zahn,
Dmitry Kirsanov, and Julie Nahil. Thanks to your thoughtfulness, care, and attention to
detail, the text has been greatly improved.
Finally, this book would not exist without our editor Debra Williams Cauley. Thank
you for your patience and encouragement, and for helping us to see that writing a book
was possible.