Deep Reinforcement Learning in Action 1st Edition Alexander Zai pdf download
Deep Reinforcement Learning in Action
BRANDON BROWN AND ALEXANDER ZAI
MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit
www.manning.com. The publisher offers discounts on this book when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: [email protected]
Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in the book, and Manning Publications
was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have
the books we publish printed on acid-free paper, and we exert our best efforts to that end.
Recognizing also our responsibility to conserve the resources of our planet, Manning books
are printed on paper that is at least 15 percent recycled and processed without the use of
elemental chlorine.
ISBN: 9781617295430
Printed in the United States of America
brief contents
PART 1 FOUNDATIONS
1 ■ What is reinforcement learning?
2 ■ Modeling reinforcement learning problems: Markov decision processes
3 ■ Predicting the best states and actions: Deep Q-networks
4 ■ Learning to pick the best policy: Policy gradient methods
5 ■ Tackling more complex problems with actor-critic methods
contents
preface
acknowledgments
about this book
about the authors
about the cover illustration
preface
We began blogging about the things we were learning in the machine learning
world and projects that we were using in our work. We ended up getting a fair amount
of positive feedback, which led us to the idea of collaborating on this book. We believe
that most of the resources out there for learning hard things are either too simple and
leave out the most compelling aspects of the topic or are inaccessible to people without
sophisticated mathematics backgrounds. This book is our effort at translating a
body of work written for experts into a course for those with nothing more than a
programming background and some basic knowledge of neural networks. We employ
some novel teaching methods that we think set our book apart and lead to much
faster understanding. We start from the basics, and by the end you will be implementing
cutting-edge algorithms invented by industry-based research groups like DeepMind
and OpenAI, as well as by high-powered academic labs like the Berkeley
Artificial Intelligence Research (BAIR) Lab and University College London.
acknowledgments
This book took way longer than we anticipated, and we owe a lot to our editors Candace
West and Susanna Kline for helping us at every stage of the process and keeping
us on track. There are a lot of details to keep track of when writing a book, and without
the professional and supportive editorial staff we would have floundered.
We’d also like to thank our technical editors Marc-Philippe Huget and Al Krinker
and all of the reviewers who took the time to read our manuscript and provide us with
crucial feedback. In particular, we thank the reviewers: Al Rahimi, Ariel Gamiño,
Claudio Bernardo Rodriguez, David Krief, Dr. Brett Pennington, Ezra Joel Schroeder,
George L. Gaines, Godfred Asamoah, Helmut Hauschild, Ike Okonkwo, Jonathan
Wood, Kalyan Reddy, M. Edward (Ed) Borasky, Michael Haller, Nadia Noori, Satyajit
Sarangi, and Tobias Kaatz. We would also like to thank everyone at Manning who
worked on this project: Karen Miller, the developmental editor; Ivan Martinović, the
review editor; Deirdre Hiam, the project editor; Andy Carroll, the copy editor; and
Jason Everett, the proofreader.
In this age, many books are self-published using various online services, and we
were initially tempted by this option; however, after having been through this whole
process, we can see the tremendous value in professional editing staff. In particular, we
thank copy editor Andy Carroll for his insightful feedback that dramatically improved
the clarity of the text.
Alex thanks his PI Jamie who introduced him to machine learning early in his
undergraduate career.
Brandon thanks his wife Xinzhu for putting up with his late nights of writing and
time away from the family and for giving him two wonderful children, Isla and Avin.
about this book
Who should read this book
Deep Reinforcement Learning in Action is a course designed to take you from the very
foundational concepts in reinforcement learning all the way to implementing the latest
algorithms. As a course, each chapter centers around one major project meant to
illustrate the topic or concept of that chapter. We’ve designed each project to be
something that can be efficiently run on a modern laptop; we don’t expect you to
have access to expensive GPUs or cloud computing resources (though access to these
resources does make things run faster).
This book is for individuals with a programming background, in particular, a working
knowledge of Python, and for people who have at least a basic understanding of
neural networks (a.k.a. deep learning). By “basic understanding,” we mean that you
have at least tried implementing a simple neural network in Python even if you didn’t
fully understand what was going on under the hood. Although this book is focused on
using neural networks for the purposes of reinforcement learning, you will also probably
learn a lot of new things about deep learning in general that can be applied to
other problems outside of reinforcement learning, so you do not need to be an expert
at deep learning before jumping into deep reinforcement learning.
about the cover illustration
The figure on the cover of Deep Reinforcement Learning in Action is captioned “Femme
de l’Istria,” or woman from Istria. The illustration is taken from a collection of dress
costumes from various countries by Jacques Grasset de Saint-Sauveur (1757-1810),
titled Costumes de Différents Pays, published in France in 1797. Each illustration is finely
drawn and colored by hand. The rich variety of Grasset de Saint-Sauveur’s collection
reminds us vividly of how culturally apart the world’s towns and regions were just 200
years ago. Isolated from each other, people spoke different dialects and languages. In
the streets or in the countryside, it was easy to identify where they lived and what their
trade or station in life was just by their dress.
The way we dress has changed since then and the diversity by region, so rich at the
time, has faded away. It is now hard to tell apart the inhabitants of different continents,
let alone different towns, regions, or countries. Perhaps we have traded cultural
diversity for a more varied personal life—certainly for a more varied and fast-paced
technological life.
At a time when it is hard to tell one computer book from another, Manning celebrates
the inventiveness and initiative of the computer business with book covers
based on the rich diversity of regional life of two centuries ago, brought back to life by
Grasset de Saint-Sauveur’s pictures.
Part 1
Foundations
Part 1 consists of five chapters that teach the most fundamental aspects of
deep reinforcement learning. After reading part 1, you’ll be able to understand
the chapters in part 2 in any order.
Chapter 1 begins with a high-level introduction to deep reinforcement learning,
explaining its main concepts and its utility. In chapter 2 we’ll start building
practical projects that illustrate the basic ideas of reinforcement learning. In
chapter 3 we’ll implement a deep Q-network—the same kind of algorithm that
DeepMind famously used to play Atari games at superhuman levels.
Chapters 4 and 5 round out the most common reinforcement learning algorithms,
namely policy gradient methods and actor-critic methods. We’ll look at
the pros and cons of these approaches compared to deep Q-networks.
What is reinforcement learning?
Computer languages of the future will be more concerned with goals and less with
procedures specified by the programmer.
—Marvin Minsky, 1970 ACM Turing Lecture
If you’re reading this book, you are probably familiar with how deep neural networks
are used for things like image classification or prediction (and if not, just
keep reading; we also have a crash course in deep learning in the appendix). Deep
reinforcement learning (DRL) is a subfield of machine learning that utilizes deep
learning models (i.e., neural networks) in reinforcement learning (RL) tasks (to be
defined in section 1.2). In image classification we have a bunch of images that correspond
to a set of discrete categories, such as images of different kinds of animals,
and we want a machine learning model to interpret an image and classify the kind
of animal in the image, as in figure 1.1.
[Figure 1.1 diagram labels: Image classifier; Class labels (Cat, Dog)]
[Figure 1.2 diagram labels: Parametric function; Training data]
Figure 1.2 Perhaps the simplest machine learning model is a simple linear function of the
form f(x) = mx + b, with parameters m (the slope) and b (the intercept). Since it has adjustable
parameters, we call it a parametric function or model. If we have some 2-dimensional data, we
can start with a randomly initialized set of parameters, such as [m = 3.4, b = 0.3], and then use
a training algorithm to optimize the parameters to fit the training data, in which case the optimal
set of parameters is close to [m = 2, b = 1].
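
The fitting process described in the caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the training data is synthetic (points scattered around the line y = 2x + 1), the training algorithm is plain gradient descent on mean squared error, and the names `fit_line`, `lr`, and `steps` are hypothetical choices rather than anything defined in the book.

```python
import random

def f(x, m, b):
    """The parametric model from the caption: slope m, intercept b."""
    return m * x + b

def fit_line(xs, ys, m=3.4, b=0.3, lr=0.01, steps=5000):
    """Adjust m and b by gradient descent on mean squared error.

    The starting values match the caption's example; lr and steps
    are assumed settings chosen for this toy data set.
    """
    n = len(xs)
    for _ in range(steps):
        errors = [f(x, m, b) - y for x, y in zip(xs, ys)]
        grad_m = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b

# Hypothetical training data: points near y = 2x + 1 with a little noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [2 * x + 1 + rng.gauss(0, 0.1) for x in xs]

m, b = fit_line(xs, ys)
print(f"m = {m:.2f}, b = {b:.2f}")
```

With this data the fitted parameters should land near m = 2 and b = 1; the exact digits depend on the random noise.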
Figure 1.3 A sentence like “John hit the ball” can be decomposed
into simpler and simpler parts until we get the individual words. In this
case, we can decompose the sentence (denoted S) into a subject noun
(N) and a verb phrase (VP). The VP can be further decomposed into a
verb, “hit,” and a noun phrase (NP). The NP can then be decomposed
into the individual words “the” and “ball.”
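
One way to make this decomposition concrete is to write the tree as nested Python tuples. The sketch below is an illustration, not code from the book: the labels S, N, VP, V, NP, and D follow the figure, while the tuple layout and the helper names `leaves` and `depth` are assumptions made for this example.

```python
# Parse tree for "John hit the ball" as nested (label, child, ...) tuples.
# A plain string is a word, i.e. a leaf of the tree.
tree = ("S",
        ("N", "John"),
        ("VP",
         ("V", "hit"),
         ("NP",
          ("D", "the"),
          ("N", "ball"))))

def leaves(node):
    """Return the words at the leaves of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    words = []
    for child in children:
        words.extend(leaves(child))
    return words

def depth(node):
    """Count the levels of decomposition from this node down to a word."""
    if isinstance(node, str):
        return 0
    _label, *children = node
    return 1 + max(depth(child) for child in children)

print(leaves(tree))  # the original sentence, word by word
print(depth(tree))   # how many times the sentence was decomposed
```

Reading the leaves left to right recovers the sentence, and each level of nesting corresponds to one step of breaking a larger unit into smaller ones.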
which are composed into elementary shapes, and so on, until you get the complete,
complex image. This ability to handle complexity with compositional representations
is largely what makes deep learning so powerful.
[Figure 1.5 diagram: deep learning is a subset of machine learning; deep learning can be
used as the learning algorithm for reinforcement learning; reinforcement learning is a
framework for solving control tasks.]
Figure 1.5 Deep learning is a subfield of machine learning. Deep learning algorithms can
be used to power RL approaches to solving control tasks.
One added complexity of moving from image processing to the domain of control
tasks is the additional element of time. With image processing, we usually train a deep
learning algorithm on a fixed data set of images. After a sufficient amount of training,
we typically get a high-performance algorithm that we can deploy to some new, unseen
images. We can think of the data set as a “space” of data, where similar images are closer
together in this abstract space and distinct images are farther apart (figure 1.6).
In control tasks, we similarly have a space of data to process, but each piece of data
also has a time dimension—the data exists in both time and space. This means that
what the algorithm decides at one time is influenced by what happened at a previous
time. This isn’t the case for ordinary image classification and similar problems. Time
Figure 1.6 This graphical depiction of words in a 2D space shows each word as a colored point.
Similar words cluster together, and dissimilar words are farther apart. Data naturally lives in some
kind of “space” with similar data living closer together. The labels A, B, C, and D point to particular
clusters of words that share some semantics.
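
The idea that similar data sits closer together can be checked with a simple distance calculation. In the sketch below the words and their 2D coordinates are made up for illustration; they are not the points plotted in the figure, and the helpers `distance` and `nearest` are hypothetical names.

```python
import math

# Hypothetical 2D coordinates for a few words (illustrative values only).
points = {
    "cat": (1.0, 1.2),
    "dog": (1.3, 0.9),
    "car": (6.0, -4.0),
    "truck": (6.4, -3.6),
}

def distance(a, b):
    """Euclidean distance between two words in the 2D space."""
    return math.dist(points[a], points[b])

def nearest(word):
    """Return the other word that lies closest to the given word."""
    others = [w for w in points if w != word]
    return min(others, key=lambda w: distance(word, w))

print(nearest("cat"))
print(nearest("car"))
print(distance("cat", "dog") < distance("cat", "car"))
```

Here the two animals are near each other and far from the two vehicles, which is the kind of clustering the figure describes.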
GAMBOLLING WITH
GALATEA: A BUCOLIC ROMANCE
By CURTIS DUNHAM
Author of “The Casino Girl in London,” “Two in a Zoo,” “The Golden
Goblin,” etc.
WITH ILLUSTRATIONS
BY OLIVER HERFORD
PART I
Initiation of the Two-Legged Partners
PART II
Fair Warning to the Horseless
PART III
Pig-Malion and Galatea
PART IV
The Obsequies of Bos Nemo
PART V
Equus Minor, Detective
PART VI
Taurus Cupid, Esq.
Illustrations
Half a mile away the Poet and his sister sat on a boulder beside the
road. It was a semi-public road winding around the foot of a wooded
hill. Behind them, a mile away, was the railway station. That mile
had been mostly uphill, and the Poet did not love physical exercise.
He was tall and lean, with a geometrical figure composed mainly of
acute angles. When in a state of repose, it resembled a carpenter’s
pocket rule which protested at being entirely shut up. The Poet’s
sister, on the contrary, was mainly curves—those delicate, subtle
curves that deny the presence of bones, yet repel any suggestion of
fat. She was young; not too young—just young enough to have won
the crowning glory of spinsterhood. She had quantities of red hair,
the kind of red hair that always goes with that astonishingly
transparent skin underneath which scattering amber freckles come
and go over-night. There was one now on the side of her nose, which
had a becomingly mirthful tilt at the end. Her lips were full at the
centre, carmine, and with finely shaped corners which could not by
any possibility be drawn downward. She wore a solid pair of calfskin
boots, with military heels which looked small while being ample in
size. Her dark walking-skirt barely reached the interesting spot
where her bootlaces were tied. Her waist, of a soft, cream-tinted
material, left her neck and throat bare—for which the Lord be
praised!—and a shapeless, yet shapely, fluffy white thing resting on
the coils of her hair seemed to absorb warmth from them. In short,
you will make no mistake when you keep your mind fixed on the
Poet’s sister.
“Just around the next turn of the road, George,” she was saying, “our
little summer Elysium will burst upon your view.”
The Poet mopped the long, solemn countenance that was belied by
his eyes and his manner of speech.
“Galatea, I have observed that most things elysian in this life are
generally just around the corner. I am not impatient. I can wait. In
fact, I should prefer to have that first view burst upon me while I am
comfortably seated in the spring wagon of—What did I understand
you to say the gentleman’s name was, Galatea?”
“He is called Gabe.”
“Doubtless a corruption of Gabriel. I wonder if Gabriel blows his
trumpet for breakfast?”
Galatea’s lips parted in a musical ripple of laughter. The sight would
have caused a dentist to pass on, with misgivings about his future.
The Poet merely remarked:—
“Galatea, are you sure we brought our toothbrushes?” Whereupon
the dentist would have been heartened by the sight of a tiny point of
gold shining out of the crown of her left bicuspid.
“George, you lazy thing, come on. It’s only half a mile further. Gabriel
probably missed us at the station, and has returned by the main
road.”
“Oh, well, if all roads lead to Elysium, I suppose it’s no use waiting
here.”
Slowly the Poet’s angles adjusted themselves to the upright position,
and he strode on beside his sister.
“So you really like the place, Galatea?”
“It’s lovely—just the spot to give you inspiration, George. I shall
expect great things of you, dear.”
“Will it inspire me to reduce the rhythm of Anacreon to ragtime, do
you think?”
“O George! And there are the Professor’s pets, you know—Mrs.
Cowslip, Clarence, Reginald, Gustavius, and William. I told you
about them. The Professor has the most wonderful knack of
understanding domestic animals and making them understand him.
Really, they look upon him as one of themselves. The Professor says
we do our domestic animal pets great injustice when we overlook
their loyalty and intelligence, refusing to meet them half-way in
friendly companionship. Why, with only a little encouragement they
develop the most remarkable emotions, almost human in their
complexity; while their powers of expression develop
correspondingly. Positively the Professor and his cow, and colt, and
pig, and bull-calf,—William the goat, Napoleon the dog, and
Cleopatra the mare were away the day I called to arrange about the
lease for the summer,—are just one big happy family.”
Galatea’s cheeks were flushed with enthusiasm. The Poet’s eyes
twinkled, but his face remained long and solemn.
“What name does the pig answer to?”
“Reginald; but he’s a nice, clean pig.”
“Yes, of course, being a member of the Professor’s family. By the way,
did you have an opportunity to note Reginald’s table manners?”
“O George, how perfectly absurd!”
“Not necessarily. I give way to no man in my determination to do
justice to my fellow creatures, irrespective of the number of legs with
which they are equipped. As the Professor has left us in undisputed
possession for the next six months, there’s no telling what we may
accomplish. What sort of voice has Reginald?”
“George, I shan’t tell you another thing!”
“There, there. It merely occurred to me that, as neither you nor I nor
Arthur sings—By the way, Galatea, I suppose Arthur will run over
occasionally in his new automobile, the lucky beggar?”
“I lay claim to no advance information respecting Arthur’s
intentions,” answered the Poet’s sister, in cool, even tones. The
flapping brim of her headgear was between the Poet’s eyes and her
cheek, suddenly turned pink.
“Oh, well, I was only thinking what a boon Arthur’s banjo and my
guitar would turn out to be if the pig should develop a romantic tenor
voice. By Jove, Galatea! If that’s the place, I apologize for
everything.”
They had reached the turn of the road that overlooked their summer
Elysium. The Poet distributed his joints over another roadside
boulder, while Galatea stood by his side, and gave his attention to the
charming scene in detail.
“Really, a fine, rambling old house surrounded by shaded verandas
below, and not too near the road. A stone-walled inclosure of half a
dozen acres sloping down to a pretty brook that flows under the
lower wall just below the barn—a comfortable red barn; a barn that
isn’t red is only half a barn. A kitchen-garden and an orchard, and
the rest pasture that is neat enough for a lawn. What romps we shall
have, Galatea, with the colt and the bull-calf! What’s that vine-
covered affair reared against the west gable of the house? Oh, a
water-tank. Just so; there’s a pipe connecting underground with the
brook, and that wind-wheel on the barn roof does the pumping.
Good! I anticipate the luxury of an occasional tub. I was afraid
Elysium was like Germany—lots of romance and no bathtubs.
Galatea, we shall do—we shall do beautifully. But I say, what’s that
funny-looking thing on the peak of the house roof?”
“Isn’t it the chimney?”
“It looks to me like a saw-horse.”
They walked on. After passing through a grove of chestnuts, they had
a nearer and better view of the house.
“No, it isn’t a saw-horse,” said the Poet. “It moves. Did you see it?”
Galatea looked embarrassed.
“Galatea, the thing on our roof looks to me uncommonly like a billy-
goat. Galatea, it is a billy-goat—I can make out his whiskers.”