IN ACTION
Alexander Zai
Brandon Brown

MANNING
Deep Reinforcement Learning in Action
Deep Reinforcement
Learning in Action
BRANDON BROWN
AND ALEXANDER ZAI

MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit
www.manning.com. The publisher offers discounts on this book when ordered in quantity.
For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: [email protected]

©2020 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by means electronic, mechanical, photocopying, or otherwise, without prior written
permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in the book, and Manning Publications
was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have
the books we publish printed on acid-free paper, and we exert our best efforts to that end.
Recognizing also our responsibility to conserve the resources of our planet, Manning books
are printed on paper that is at least 15 percent recycled and processed without the use of
elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Karen Miller
Technical development editor: Marc-Philippe Huget
Review editor: Ivan Martinović
Production editor: Deirdre Hiam
Copy editor: Andy Carroll
Proofreader: Jason Everett
Technical proofreader: Al Krinker
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781617295430
Printed in the United States of America
brief contents

PART 1 FOUNDATIONS
1 ■ What is reinforcement learning?
2 ■ Modeling reinforcement learning problems: Markov decision processes
3 ■ Predicting the best states and actions: Deep Q-networks
4 ■ Learning to pick the best policy: Policy gradient methods
5 ■ Tackling more complex problems with actor-critic methods

PART 2 ABOVE AND BEYOND
6 ■ Alternative optimization methods: Evolutionary algorithms
7 ■ Distributional DQN: Getting the full story
8 ■ Curiosity-driven exploration
9 ■ Multi-agent reinforcement learning
10 ■ Interpretable reinforcement learning: Attention and relational models
11 ■ In conclusion: A review and roadmap

contents

preface
acknowledgments
about this book
about the authors
about the cover illustration

PART 1 FOUNDATIONS

1 What is reinforcement learning?
1.1 The “deep” in deep reinforcement learning
1.2 Reinforcement learning
1.3 Dynamic programming versus Monte Carlo
1.4 The reinforcement learning framework
1.5 What can I do with reinforcement learning?
1.6 Why deep reinforcement learning?
1.7 Our didactic tool: String diagrams
1.8 What’s next?

2 Modeling reinforcement learning problems: Markov decision processes
2.1 String diagrams and our teaching methods
2.2 Solving the multi-arm bandit
Exploration and exploitation ■ Epsilon-greedy strategy ■ Softmax selection policy
2.3 Applying bandits to optimize ad placements
Contextual bandits ■ States, actions, rewards
2.4 Building networks with PyTorch
Automatic differentiation ■ Building Models
2.5 Solving contextual bandits
2.6 The Markov property
2.7 Predicting future rewards: Value and policy functions
Policy functions ■ Optimal policy ■ Value functions

3 Predicting the best states and actions: Deep Q-networks
3.1 The Q function
3.2 Navigating with Q-learning
What is Q-learning? ■ Tackling Gridworld ■ Hyperparameters ■ Discount factor ■ Building the network ■ Introducing the Gridworld game engine ■ A neural network as the Q function
3.3 Preventing catastrophic forgetting: Experience replay
Catastrophic forgetting ■ Experience replay
3.4 Improving stability with a target network
Learning instability
3.5 Review

4 Learning to pick the best policy: Policy gradient methods
4.1 Policy function using neural networks
Neural network as the policy function ■ Stochastic policy gradient ■ Exploration
4.2 Reinforcing good actions: The policy gradient algorithm
Defining an objective ■ Action reinforcement ■ Log probability ■ Credit assignment
4.3 Working with OpenAI Gym
CartPole ■ The OpenAI Gym API
4.4 The REINFORCE algorithm
Creating the policy network ■ Having the agent interact with the environment ■ Training the model ■ The full training loop ■ Chapter conclusion

5 Tackling more complex problems with actor-critic methods
5.1 Combining the value and policy function
5.2 Distributed training
5.3 Advantage actor-critic
5.4 N-step actor-critic

PART 2 ABOVE AND BEYOND

6 Alternative optimization methods: Evolutionary algorithms
6.1 A different approach to reinforcement learning
6.2 Reinforcement learning with evolution strategies
Evolution in theory ■ Evolution in practice
6.3 A genetic algorithm for CartPole
6.4 Pros and cons of evolutionary algorithms
Evolutionary algorithms explore more ■ Evolutionary algorithms are incredibly sample intensive ■ Simulators
6.5 Evolutionary algorithms as a scalable alternative
Scaling evolutionary algorithms ■ Parallel vs. serial processing ■ Scaling efficiency ■ Communicating between nodes ■ Scaling linearly ■ Scaling gradient-based approaches

7 Distributional DQN: Getting the full story
7.1 What’s wrong with Q-learning?
7.2 Probability and statistics revisited
Priors and posteriors ■ Expectation and variance
7.3 The Bellman equation
The distributional Bellman equation
7.4 Distributional Q-learning
Representing a probability distribution in Python ■ Implementing the Dist-DQN
7.5 Comparing probability distributions
7.6 Dist-DQN on simulated data
7.7 Using distributional Q-learning to play Freeway

8 Curiosity-driven exploration
8.1 Tackling sparse rewards with predictive coding
8.2 Inverse dynamics prediction
8.3 Setting up Super Mario Bros.
8.4 Preprocessing and the Q-network
8.5 Setting up the Q-network and policy function
8.6 Intrinsic curiosity module
8.7 Alternative intrinsic reward mechanisms

9 Multi-agent reinforcement learning
9.1 From one to many agents
9.2 Neighborhood Q-learning
9.3 The 1D Ising model
9.4 Mean field Q-learning and the 2D Ising model
9.5 Mixed cooperative-competitive games

10 Interpretable reinforcement learning: Attention and relational models
10.1 Machine learning interpretability with attention and relational biases
Invariance and equivariance
10.2 Relational reasoning with attention
Attention models ■ Relational reasoning ■ Self-attention models
10.3 Implementing self-attention for MNIST
Transformed MNIST ■ The relational module ■ Tensor contractions and Einstein notation ■ Training the relational module
10.4 Multi-head attention and relational DQN
10.5 Double Q-learning
10.6 Training and attention visualization
Maximum entropy learning ■ Curriculum learning ■ Visualizing attention weights

11 In conclusion: A review and roadmap
11.1 What did we learn?
11.2 The uncharted topics in deep reinforcement learning
Prioritized experience replay ■ Proximal policy optimization (PPO) ■ Hierarchical reinforcement learning and the options framework ■ Model-based planning ■ Monte Carlo tree search (MCTS)
11.3 The end

appendix Mathematics, deep learning, PyTorch
Reference list
index

preface
Deep reinforcement learning was launched into the spotlight in 2015, when DeepMind
produced an algorithm capable of playing a suite of Atari 2600 games at superhuman
performance. Artificial intelligence seemed to be finally making some real
progress, and we wanted to be a part of it.
Both of us have software engineering backgrounds and an interest in neurosci-
ence, and we’ve been interested in the broader field of artificial intelligence for a long
time (in fact, one of us actually wrote his first neural network before high school in
C#). These early experiences did not lead to any sustained interest, since this was
before the deep learning revolution circa 2012, when the superlative performance of
deep learning was clear. But after seeing the amazing successes of deep learning
around this time, we became recommitted to being a part of the exciting and bur-
geoning fields of deep learning and then deep reinforcement learning, and both of
us have incorporated machine learning more broadly into our careers in one way or
another. Alex transitioned into a career as a machine learning engineer, making his
mark at little-known places like Amazon, and Brandon began using machine learning
in academic neuroscience research. As we delved into deep reinforcement learning,
we had to struggle through dozens of textbooks and primary research articles, parsing
advanced mathematics and machine learning theory. Yet we found that the funda-
mentals of deep reinforcement learning are actually quite approachable from a soft-
ware engineering background. All of the math can be easily translated into a language
that any programmer would find quite readable.

We began blogging about the things we were learning in the machine learning
world and projects that we were using in our work. We ended up getting a fair amount
of positive feedback, which led us to the idea of collaborating on this book. We believe
that most of the resources out there for learning hard things are either too simple and
leave out the most compelling aspects of the topic or are inaccessible to people with-
out sophisticated mathematics backgrounds. This book is our effort at translating a
body of work written for experts into a course for those with nothing more than a pro-
gramming background and some basic knowledge of neural networks. We employ
some novel teaching methods that we think set our book apart and lead to much
faster understanding. We start from the basics, and by the end you will be implementing
cutting-edge algorithms invented by industry-based research groups like DeepMind
and OpenAI, as well as from high-powered academic labs like the Berkeley
Artificial Intelligence Research (BAIR) Lab and University College London.
acknowledgments
This book took way longer than we anticipated, and we owe a lot to our editors Candace
West and Susanna Kline for helping us at every stage of the process and keeping
us on track. There are a lot of details to keep track of when writing a book, and without
the professional and supportive editorial staff we would have floundered.
We’d also like to thank our technical editors Marc-Philippe Huget and Al Krinker
and all of the reviewers who took the time to read our manuscript and provide us with
crucial feedback. In particular, we thank the reviewers: Al Rahimi, Ariel Gamiño,
Claudio Bernardo Rodriguez, David Krief, Dr. Brett Pennington, Ezra Joel Schroeder,
George L. Gaines, Godfred Asamoah, Helmut Hauschild, Ike Okonkwo, Jonathan
Wood, Kalyan Reddy, M. Edward (Ed) Borasky, Michael Haller, Nadia Noori, Satyajit
Sarangi, and Tobias Kaatz. We would also like to thank everyone at Manning who
worked on this project: Karen Miller, the developmental editor; Ivan Martinović, the
review editor; Deirdre Hiam, the project editor; Andy Carroll, the copy editor; and
Jason Everett, the proofreader.
In this age, many books are self-published using various online services, and we
were initially tempted by this option; however, after having been through this whole
process, we can see the tremendous value in professional editing staff. In particular, we
thank copy editor Andy Carroll for his insightful feedback that dramatically improved
the clarity of the text.
Alex thanks his PI Jamie who introduced him to machine learning early in his
undergraduate career.
Brandon thanks his wife Xinzhu for putting up with his late nights of writing and
time away from the family and for giving him two wonderful children, Isla and Avin.

about this book
Who should read this book
Deep Reinforcement Learning in Action is a course designed to take you from the very
foundational concepts in reinforcement learning all the way to implementing the lat-
est algorithms. As a course, each chapter centers around one major project meant to
illustrate the topic or concept of that chapter. We’ve designed each project to be
something that can be efficiently run on a modern laptop; we don’t expect you to
have access to expensive GPUs or cloud computing resources (though access to these
resources does make things run faster).
This book is for individuals with a programming background, in particular, a work-
ing knowledge of Python, and for people who have at least a basic understanding of
neural networks (a.k.a. deep learning). By “basic understanding,” we mean that you
have at least tried implementing a simple neural network in Python even if you didn’t
fully understand what was going on under the hood. Although this book is focused on
using neural networks for the purposes of reinforcement learning, you will also proba-
bly learn a lot of new things about deep learning in general that can be applied to
other problems outside of reinforcement learning, so you do not need to be an expert
at deep learning before jumping into deep reinforcement learning.

How this book is organized: A roadmap

The book has two sections with 11 chapters.
Part 1 explains the fundamentals of deep reinforcement learning.
■ Chapter 1 gives a high-level introduction to deep learning, reinforcement
learning, and the marriage of the two into deep reinforcement learning.
■ Chapter 2 introduces the fundamental concepts of reinforcement learning that
will reappear through the rest of the book. We also implement our first practi-
cal reinforcement learning algorithm.
■ Chapter 3 introduces deep Q-learning, one of the two broad classes of deep
reinforcement learning algorithms. This is the algorithm that DeepMind used to
outperform humans at many Atari 2600 games in 2015.
■ Chapter 4 describes the other major class of deep reinforcement learning algo-
rithms, policy-gradient methods. We use this to train an algorithm to play a sim-
ple game.
■ Chapter 5 shows how we can combine deep Q-learning from chapter 3 and pol-
icy-gradient methods from chapter 4 into a combined class of algorithms called
actor-critic algorithms.
Part 2 builds on the foundations we built in part 1 to cover the biggest advances in
deep reinforcement learning in recent years.
■ Chapter 6 shows how to implement evolutionary algorithms, which use princi-
ples of biological evolution, to train neural networks.
■ Chapter 7 describes a method to significantly improve the performance of deep
Q-learning by incorporating probabilistic concepts.
■ Chapter 8 introduces a way to give reinforcement learning algorithms a sense of
curiosity to explore their environments without any external cues.
■ Chapter 9 shows how to extend what we have learned in training single agent
reinforcement learning algorithms into systems that have multiple interacting
agents.
■ Chapter 10 describes how to make deep reinforcement learning algorithms
more interpretable and efficient by using attention mechanisms.
■ Chapter 11 concludes the book by discussing all the exciting areas in deep rein-
forcement learning we didn’t have the space to cover but that you may be inter-
ested in.
The chapters in part 1 should be read in order, as each chapter builds on the concepts
in the previous chapter. The chapters in part 2 can more or less be approached in any
order, although we still recommend reading them in order.

About the code

As we noted, this book is a course, so we have included all of the code necessary to run
the projects within the main text of the book. In general, we include shorter code
blocks as inline code, which is formatted in a fixed-width font, as well as separate
numbered code listings that represent larger code blocks.
At press time we are confident all the in-text code is working, but we cannot guar-
antee that the code will continue to be bug free (especially for those of you reading
this in print) in the long term, as the deep learning field and consequently its libraries
are evolving quickly. The in-text code has also been pared down to the minimum nec-
essary to get the projects working, so we highly recommend you follow the projects in
the book using the code in this book’s GitHub repository: http://mng.bz/JzKp. We
intend to keep the code on GitHub up to date for the foreseeable future, and it also
includes additional comments and code that we used to generate many of the figures
in the book. Hence, it is best if you read the book alongside the corresponding code
in the Jupyter Notebooks found on the GitHub repository.
We are confident that this book will teach you the concepts of deep reinforcement
learning and not just how to narrowly code things in Python. If Python were to some-
how disappear after you finish this book, you would still be able to implement all of
these algorithms in some other language or framework, since you will understand the
fundamentals.

liveBook discussion forum

Purchase of Deep Reinforcement Learning in Action includes free access to a private web
forum run by Manning Publications where you can make comments about the book,
ask technical questions, and receive help from the authors and from other users. To
access the forum, go to https://livebook.manning.com/#!/book/deep-reinforcement-learning-in-action/discussion.
You can also learn more about Manning’s forums
and the rules of conduct at https://livebook.manning.com/#!/discussion.
Manning’s commitment to our readers is to provide a venue where a meaningful
dialogue between individual readers and between readers and the authors can take
place. It is not a commitment to any specific amount of participation on the part of
the authors, whose contribution to the forum remains voluntary (and unpaid). We
suggest you try asking the authors some challenging questions lest their interest stray!
The forum and the archives of previous discussions will be accessible from the pub-
lisher’s website as long as the book is in print.
about the authors
ALEX ZAI has worked as Chief Technology Officer at Codesmith, an immersive coding
bootcamp where he remains a Technical Advisor, as a software engineer at Uber, and
as a machine learning engineer at Banjo and Amazon. He is a contributor to the
open source deep learning framework Apache MXNet. He is also an entrepreneur
who has co-founded two companies, one of which was a Y Combinator entrant.

BRANDON BROWN grew up programming and worked as a part-time software engineer
through college but ended up pursuing a career in medicine; he worked as a software
engineer in the healthcare technology space along the way. He is now a physician and
is pursuing his research interests in computational psychiatry inspired by deep
reinforcement learning.

about the cover illustration
The figure on the cover of Deep Reinforcement Learning in Action is captioned “Femme
de l’Istria,” or woman from Istria. The illustration is taken from a collection of dress
costumes from various countries by Jacques Grasset de Saint-Sauveur (1757-1810),
titled Costumes de Différents Pays, published in France in 1797. Each illustration is finely
drawn and colored by hand. The rich variety of Grasset de Saint-Sauveur’s collection
reminds us vividly of how culturally apart the world’s towns and regions were just 200
years ago. Isolated from each other, people spoke different dialects and languages. In
the streets or in the countryside, it was easy to identify where they lived and what their
trade or station in life was just by their dress.
The way we dress has changed since then and the diversity by region, so rich at the
time, has faded away. It is now hard to tell apart the inhabitants of different conti-
nents, let alone different towns, regions, or countries. Perhaps we have traded cultural
diversity for a more varied personal life—certainly for a more varied and fast-paced
technological life.
At a time when it is hard to tell one computer book from another, Manning cele-
brates the inventiveness and initiative of the computer business with book covers
based on the rich diversity of regional life of two centuries ago, brought back to life by
Grasset de Saint-Sauveur’s pictures.

Part 1

Foundations

Part 1 consists of five chapters that teach the most fundamental aspects of
deep reinforcement learning. After reading part 1, you’ll be able to understand
the chapters in part 2 in any order.
Chapter 1 begins with a high-level introduction to deep reinforcement learn-
ing, explaining its main concepts and its utility. In chapter 2 we’ll start building
practical projects that illustrate the basic ideas of reinforcement learning. In
chapter 3 we’ll implement a deep Q-network—the same kind of algorithm that
DeepMind famously used to play Atari games at superhuman levels.
Chapters 4 and 5 round out the most common reinforcement learning algo-
rithms, namely policy gradient methods and actor-critic methods. We’ll look at
the pros and cons of these approaches compared to deep Q-networks.
What is reinforcement learning?

This chapter covers

■ A brief review of machine learning
■ Introducing reinforcement learning as a subfield
■ The basic framework of reinforcement learning

Computer languages of the future will be more concerned with goals and less with
procedures specified by the programmer.
—Marvin Minsky, 1970 ACM Turing Lecture

If you’re reading this book, you are probably familiar with how deep neural net-
works are used for things like image classification or prediction (and if not, just
keep reading; we also have a crash course in deep learning in the appendix). Deep
reinforcement learning (DRL) is a subfield of machine learning that utilizes deep
learning models (i.e., neural networks) in reinforcement learning (RL) tasks (to be
defined in section 1.2). In image classification we have a bunch of images that cor-
respond to a set of discrete categories, such as images of different kinds of animals,
and we want a machine learning model to interpret an image and classify the kind
of animal in the image, as in figure 1.1.

Figure 1.1 An image classifier is a function or learning algorithm that takes in an image and returns
a class label, classifying the image into one of a finite number of possible categories or classes.

1.1 The “deep” in deep reinforcement learning

Deep learning models are just one of many kinds of machine learning models we
can use to classify images. In general, we just need some sort of function that takes
in an image and returns a class label (in this case, the label identifying which kind of
animal is depicted in the image), and usually this function has a fixed set of adjust-
able parameters—we call these kinds of models parametric models. We start with a
parametric model whose parameters are initialized to random values—this will pro-
duce random class labels for the input images. Then we use a training procedure to
adjust the parameters so the function iteratively gets better and better at correctly
classifying the images. At some point, the parameters will be at an optimal set of val-
ues, meaning that the model cannot get any better at the classification task. Para-
metric models can also be used for regression, where we try to fit a model to a set of
data so we can make predictions for unseen data (figure 1.2). A more sophisticated
approach might perform even better if it had more parameters or a better internal
architecture.
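
To make the training procedure concrete, here is a minimal sketch in plain Python for the simplest parametric model, a line f(x) = mx + b. The data points, learning rate, number of steps, and the helper names (predict, mse, train) are our own illustrative assumptions rather than code from later chapters; the starting parameters and the target values m = 2, b = 1 mirror the example in figure 1.2.

```python
# Minimal sketch: fit f(x) = m*x + b by gradient descent on mean squared error.
# The data, learning rate, and step count below are illustrative assumptions.

def predict(m, b, x):
    """The parametric model: a line with slope m and intercept b."""
    return m * x + b

def mse(m, b, xs, ys):
    """Mean squared error of the model on the training data."""
    return sum((predict(m, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(xs, ys, m, b, lr=0.05, steps=2000):
    """Repeatedly nudge m and b against the gradient of the error."""
    n = len(xs)
    for _ in range(steps):
        errors = [predict(m, b, x) - y for x, y in zip(xs, ys)]
        grad_m = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]      # noiseless data generated with m = 2, b = 1
m0, b0 = 3.4, 0.3                 # arbitrary starting parameters
m, b = train(xs, ys, m0, b0)

print(f"before training: m={m0}, b={b0}, error={mse(m0, b0, xs, ys):.3f}")
print(f"after training:  m={m:.3f}, b={b:.3f}, error={mse(m, b, xs, ys):.6f}")
```

The same loop of "predict, measure the error, adjust the parameters" is what trains a deep neural network; the difference is mainly the number of parameters and how the gradients are computed.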
Deep neural networks are popular because they are in many cases the most accu-
rate parametric machine learning models for a given task, like image classification.
This is largely due to the way they represent data. Deep neural networks have many
layers (hence the “deep”), which induces the model to learn layered representations
of input data. This layered representation is a form of compositionality, meaning that a
complex piece of data is represented as the combination of more elementary compo-
nents, and those components can be further broken down into even simpler compo-
nents, and so on, until you get to atomic units.
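
Structurally, a layered representation is a chain of functions in which each layer's output becomes the next layer's input. The following is only a sketch under stated assumptions: the layer sizes, the tanh nonlinearity, and the names dense, random_layer, and forward are hypothetical, and the weights are random rather than trained, so the intermediate values carry no learned meaning.

```python
import math
import random

def dense(inputs, weights, biases):
    """One fully connected layer followed by a tanh nonlinearity."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Random (untrained) weights and biases for a layer of the given shape."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases

def forward(x, layers):
    """Apply the layers in order, keeping every intermediate representation."""
    representations = [x]
    for weights, biases in layers:
        x = dense(x, weights, biases)
        representations.append(x)
    return representations

rng = random.Random(0)
sizes = [4, 3, 3, 2]  # input size followed by the size of each layer
layers = [random_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

reps = forward([0.5, -0.2, 0.1, 0.9], layers)
for depth, rep in enumerate(reps):
    print(f"depth {depth}: {[round(v, 3) for v in rep]}")
```

Each entry of reps is a re-description of the same input at a different depth. Training is what would make those intermediate descriptions useful.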
Human language is compositional (figure 1.3). For example, a book is composed
of chapters, chapters are composed of paragraphs, paragraphs are composed of sen-
tences, and so on, until you get to individual words, which are the smallest units of
meaning. Yet each individual level conveys meaning—an entire book is meant to con-
vey meaning, and its individual paragraphs are meant to convey smaller points. Deep
neural networks can likewise learn a compositional representation of data—for exam-
ple, they can represent an image as the composition of primitive contours and textures,
which are composed into elementary shapes, and so on, until you get the complete,
complex image. This ability to handle complexity with compositional representations
is largely what makes deep learning so powerful.

Figure 1.2 Perhaps the simplest machine learning model is a simple linear function of the
form f(x) = mx + b, with parameters m (the slope) and b (the intercept). Since it has adjustable
parameters, we call it a parametric function or model. If we have some 2-dimensional data, we
can start with a randomly initialized set of parameters, such as [m = 3.4, b = 0.3], and then use
a training algorithm to optimize the parameters to fit the training data, in which case the optimal
set of parameters is close to [m = 2, b = 1].

Figure 1.3 A sentence like “John hit the ball” can be decomposed into simpler and simpler parts
until we get the individual words. In this case, we can decompose the sentence (denoted S) into
a subject noun (N) and a verb phrase (VP). The VP can be further decomposed into a verb, “hit,”
and a noun phrase (NP). The NP can then be decomposed into the individual words “the” and “ball.”

1.2 Reinforcement learning

It is important to distinguish between problems and their solutions, or in other words,
between the tasks we wish to solve and the algorithms we design to solve them. Deep
learning algorithms can be applied to many problem types and tasks. Image classifica-
tion and prediction tasks are common applications of deep learning because auto-
mated image processing before deep learning was very limited, given the complexity
of images. But there are many other kinds of tasks we might wish to automate, such as
driving a car or balancing a portfolio of stocks and other assets. Driving a car includes
some amount of image processing, but more importantly the algorithm needs to learn
how to act, not merely to classify or predict. These kinds of problems, where decisions
must be made or some behavior must be enacted, are collectively called control tasks.

Figure 1.4 As opposed to an image classifier, a reinforcement learning algorithm dynamically
interacts with data. It continually consumes data and decides what actions to take, and those
actions change the subsequent data presented to it. A video game screen might be input data
for an RL algorithm, which then decides which action to take using the game controller, and this
causes the game to update (e.g., the player moves or fires a weapon).

Reinforcement learning is a generic framework for representing and solving control
tasks, but within this framework we are free to choose which algorithms we want to
apply to a particular control task (figure 1.4). Deep learning algorithms are a natural
choice as they are able to process complex data efficiently, and this is why we’ll focus
on deep reinforcement learning, but much of what you’ll learn in this book is the general
reinforcement framework for control tasks (see figure 1.5). Then we’ll look at
how you can design an appropriate deep learning model to fit the framework and
solve a task. This means you will learn a lot about reinforcement learning, and you’ll
probably learn some things about deep learning that you didn’t know as well.
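
The interaction loop described in figure 1.4 can be sketched in a few lines. This is only a minimal illustration under stated assumptions: `LineWorld`, `random_policy`, and `run_episode` are hypothetical names, the environment is a toy stand-in for a video game, and a random policy stands in for the learning algorithm, so nothing is actually learned here. The point is the shape of the loop: the current state is the input, the policy picks an action, and the action changes the next state the policy sees.

```python
import random


class LineWorld:
    """Toy environment: the agent walks along positions 0..size-1.

    Reaching the rightmost position ends the episode with reward 1.
    """

    def __init__(self, size=5):
        self.size = size
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (move left) or +1 (move right); walls clamp the move.
        self.position = min(max(self.position + action, 0), self.size - 1)
        done = self.position == self.size - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def random_policy(state, rng):
    """Placeholder for a learned policy: ignores the state, acts randomly."""
    return rng.choice([-1, 1])


def run_episode(env, policy, rng, max_steps=1000):
    """Run one episode of the observe -> act -> update loop."""
    state = env.reset()
    total_reward = 0.0
    history = [state]
    for _ in range(max_steps):
        action = policy(state, rng)             # decide what to do
        state, reward, done = env.step(action)  # the action changes the data
        total_reward += reward
        history.append(state)
        if done:
            break
    return total_reward, history


rng = random.Random(0)
total_reward, history = run_episode(LineWorld(), random_policy, rng)
print(total_reward, len(history))
```

In a deep reinforcement learning setup, `random_policy` would be replaced by a model whose parameters are updated from the rewards it observes; the surrounding loop would stay essentially the same.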

Figure 1.5 Deep learning is a subfield of machine learning. Deep learning algorithms
can be used to power RL approaches to solving control tasks.

One added complexity of moving from image processing to the domain of control
tasks is the additional element of time. With image processing, we usually train a deep
learning algorithm on a fixed data set of images. After a sufficient amount of training,
we typically get a high-performance algorithm that we can deploy to some new, unseen
images. We can think of the data set as a “space” of data, where similar images are closer
together in this abstract space and distinct images are farther apart (figure 1.6).
In control tasks, we similarly have a space of data to process, but each piece of data
also has a time dimension—the data exists in both time and space. This means that
what the algorithm decides at one time is influenced by what happened at a previous
time. This isn’t the case for ordinary image classification and similar problems. Time


Figure 1.6 This graphical depiction of words in a 2D space shows each word as a colored point.
Similar words cluster together, and dissimilar words are farther apart. Data naturally lives in some
kind of “space” with similar data living closer together. The labels A, B, C, and D point to particular
clusters of words that share some semantics.
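
The idea that similar data sits closer together in some “space” can be made concrete with a distance measure. The sketch below is an assumption-laden toy: the words and their 2D coordinates are invented for illustration and are not taken from the figure, and the helper names `distance` and `nearest` are hypothetical.

```python
import math

# Hypothetical 2D coordinates; real embeddings would come from a trained model.
embeddings = {
    "run": (4.0, 5.0),
    "walk": (4.5, 5.5),
    "ran": (-3.0, 2.0),
    "blue": (-6.0, -7.0),
}


def distance(a, b):
    """Euclidean distance between the points for words a and b."""
    return math.dist(embeddings[a], embeddings[b])


def nearest(word):
    """Return the other word whose point lies closest to the given word."""
    others = (w for w in embeddings if w != word)
    return min(others, key=lambda w: distance(word, w))


print(nearest("run"))
print(round(distance("run", "walk"), 3), round(distance("run", "blue"), 3))
```

With these made-up coordinates, "run" and "walk" are placed near each other while "blue" is far from both, which is the kind of clustering the figure depicts.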
The Project Gutenberg eBook of Gambolling with
Galatea: a Bucolic Romance
This ebook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it away
or re-use it under the terms of the Project Gutenberg License
included with this ebook or online at www.gutenberg.org. If you
are not located in the United States, you will have to check the
laws of the country where you are located before using this
eBook.

Title: Gambolling with Galatea: a Bucolic Romance

Author: Curtis Dunham

Illustrator: Oliver Herford

Release date: March 28, 2018 [eBook #56861]

Language: English

Credits: E-text prepared by Richard Tonsing, David Edwards,
and the Online Distributed Proofreading Team
(http://www.pgdp.net) from page images generously
made available by Internet Archive
(https://archive.org)

*** START OF THE PROJECT GUTENBERG EBOOK GAMBOLLING
WITH GALATEA: A BUCOLIC ROMANCE ***

Note: Images of the original pages are available through Internet
Archive. See
https://archive.org/details/gambollingwithga00dunhiala
I WOULDN’T ROOST
IN A CHERRY TREE

(page 30)
GAMBOLLING WITH
GALATEA: A BUCOLIC ROMANCE
By CURTIS DUNHAM
Author of “The Casino Girl in London,” “Two in a Zoo,” “The Golden
Goblin,” etc.

WITH ILLUSTRATIONS
BY OLIVER HERFORD

HOUGHTON MIFFLIN COMPANY BOSTON & NEW
YORK ⁘ THE RIVERSIDE PRESS CAMBRIDGE
MDCCCCIX
COPYRIGHT, 1909, BY CURTIS DUNHAM AND HOUGHTON
MIFFLIN COMPANY

Published May 1909

Preliminary and Confidential
Fair reader (and unfair one, of either sex), I pray you be
not dismayed by the profundity of this discourse.
Doubtless there are some light-minded observers who
would have seen in the natural phenomena herein
recorded the very quintessence of humor, the apotheosis
of the comical. Such pretenders to scientific and literary eminence
would entertain the same view of the noble Titanotherium
Robustum, or the sublime Stegosaurus Ungulatus. They would have
cast merry doubts upon the improving conversation between Balaam
and his Ass; ridiculed the psychic resources of the Birds of St. Francis
d’Assisi; scoffed at the gratitude of Æsop’s Lion; denied the acumen
of the Jumping Frog of Calaveras; yea, and presumed to say “scat” to
the sacred Cat of Bubastis.
Fair reader (or unfair one), be warned against all such triflers with
the important truths of nature. Life is earnest. Turn the page—read,
ponder, and be wise.
C. D.
Contents

Preliminary and Confidential

PART I
Initiation of the Two-Legged Partners

PART II
Fair Warning to the Horseless

PART III
Pig-Malion and Galatea

PART IV
The Obsequies of Bos Nemo

PART V
Equus Minor, Detective

PART VI
Taurus Cupid, Esq.
Illustrations

“I wouldn’t roost in a cherry tree” (Frontispiece)

The goat seemed to nod his approval

Sit perfectly still for five minutes while the gentleman takes your picture

Seized her hand and kissed it ardently

The guests ate their turnips decorously

All the four-legged members of the firm had drawn near
GAMBOLLING WITH GALATEA
I
Initiation of the Two-Legged Partners
The thing was incredible. It was intolerable—just cause for
mutiny. Talk about injustice, arrogant denial of the
equal rights of man and beast! Well, here was a spectacle
calculated to make the heavens weep. Yet never had a
June sky revealed a deeper shade of blue for fleecy
clouds to sail upon. The wind that should have risen in a shriek of
indignation blew softly around the corner of the barn, and was laden
with fragrance from all the flowers that bloom. In the meadow just
beyond the stone fence, the tall grass waved gently, whispering
contentment to the brook that gurgled with happiness. Birds sang,
grasshoppers chirped—
Clarence could stand it no longer. With his neck stretched far out of
his stall window, the colt lifted up his voice and whinnied
remonstrance.
“O Amanda! Why are we still prisoners, and the sun half-way up the
roof of heaven? It is an outrage, Amanda. Come quickly and let us
out.”
Reginald—the round fat one with the tight kink in his tail—stood on
his hind-legs inside the barnyard fence under the colt’s nose, and
voiced his personal grievance in short sharp squeaks.
“Let me out, let me out, let me out! My trough is empty. My flattened
belly cleaves to my backbone.”
On either side of him were Mrs. Cowslip and Gustavius, with their
heads over the fence and their noses in the air.
“Amanda, O Amanda!” bawled the bull-calf, while his mother—she of
the liquid eyes and the crumpled horn—lowed her gentle reminder:—
“Good, kind Amanda, this yard is barren; in the pasture the long
grass is luscious. Amanda, O Amanda!”
And William, the big-horned and bearded one, butted foolishly at the
hinges of the barnyard gate.
The others gave no heed to William’s puerile devices. He was only an
addle-pated goat anyway, devoid of reasoning power and puffed up
with vanity. They put their noses together and considered the matter,
the bull-calf wrinkling his yellow muzzle at Clarence’s ear and
dropping now and then a superfluous comment. Ordinarily the colt,
having an exalted sense of his own superiority, would have indulged
in no such familiarity with a placid old cow and her lubberly calf; but
it was plain that the present occasion was one demanding the sinking
of the individual in the organization. So Clarence patiently reviewed
the situation, inviting their suggestions.
To go back to the events of the early morning. Why had that two-
legged tyrant, who always responded so promptly to the vulgar name
of Gabe whenever Amanda hailed him from the kitchen door,
harnessed the mare and driven off, leaving them deprived of their
customary liberty, and without a word of explanation? The act was
contrary to the Professor’s most sacred principle of equity for all
living creatures, whether having four legs or only two.
“And yet just now you led us in our supplications to Amanda,”
observed Mrs. Cowslip. “Why did you not remind the Professor of
our—”
“Ah!” broke in Gustavius, “you can trust the Professor to understand
the needs of a bull-calf.”
“You don’t have to ask the Professor twice when you want your back
scratched,” grunted Reginald, his tail kinking tighter than ever with
delicious memories.
“The Professor has a large, round, and most inviting stomach,”
commented William. “Never before have I spared such a stomach.
Yet never have I felt the slightest inclination to butt the Professor.”
Mrs. Cowslip turned her mild eyes inquiringly on the colt. “I
suggest,” she said, “that we remind the Professor—”
“My gracious!” interrupted Clarence with impatience. “Can’t you
fellows remember anything over-night? The Professor drove off
behind my mother yesterday morning. There was a box beside him in
the wagon. He wore his high hat. Mother came home without him.
There’s nobody left in the house but Amanda and that two-legged
Gabe.”
Just then Gustavius tossed his immature horns and bellowed:—
“Amanda! Amanda!”
With an apron over her head and a tin pail on her arm, Amanda had
come into view beyond the angle of the barn.
“She’s going to the strawberry-patch over beyond the orchard,” said
Clarence, excitedly. “Quick! Now, all together!”
Amanda had not the hardihood to ignore the resulting chorus of
appeals to her. But she passed quickly on out of sight, after turning
long enough to wave her hand and answer:—
“Jest be patient, you critters. Gabe’ll ’tend to you when he gits
home.”
The colt nearly burst with indignation.
“That settles it,” he shrieked, lashing out with his heels so that there
was a great clatter of things loose in the barn. Then he drew back his
lips, baring his teeth, and began snapping at the latch-string of the
barn-door, which was just beyond his reach.
“It’s a pity,” said Mrs. Cowslip. “I’ve seen your mother let herself in
that way many a time, when she was full of grass and eager for her
midday nap.”
“If I was only out of here, I could reach that string,” grunted
Reginald, with one thought for the colt and two for himself.
“Oh, we know all about you,” retorted Clarence with exasperation. “If
you could get out you’d scoot for those artichokes down by the brook
and never look behind you, you fat, selfish, kink-tailed little beast.”
“Just you try me,” urged the pig, for he had great confidence in the
colt’s resources.
Once more their noses were close together, while Clarence instructed
them in the details of a desperate effort designed to gain freedom for
them all.
To contend with the smug incredulity of those millions of human
kind who spend their lives in little brick-and-mortar boxes set one on
top of another in long double rows is the fate of all chroniclers of the
important aspects of nature. But truth is mighty and will prevail. Let
us therefore proceed calmly with the facts.
When Clarence had repeated his instructions several times, Reginald
gave three sharp, intelligent grunts and ran straight to the barnyard
gate. With his stiffened snout he began furiously attacking the hard
earth beneath the lower bar.
“Not there, you idiot!” squealed the colt. “The other end. The other
end, where the iron hinges are!”
Reginald stood corrected. While the dirt flew from under the hinged
end of the gate, Gustavius galloped foolishly around the yard with his
tail aloft, and William, with a coolly calculating eye on those hinges,
backed away slowly, with significance understood by all the other
conspirators. Mrs. Cowslip looked on benignantly. Presently the pig
got his sturdy shoulders under the gate and heaved with all his
might. William, with head down, leaped to the assault. The crash of
his horns on those hinges reëchoed between orchard and wooded
hills. But the gate was raised only an inch or two, and Reginald stuck
fast. His squeals as he struggled would have melted a heart of stone.
William backed away for another assault. It was while he was in mid-
air that Clarence shrilled:—
“Not the hinges! The pig, the pig!”
William understood. This time all the weight behind his horns
landed with a resounding smack on Reginald’s inviting posterior. In
the midst of heart-rending squeals the gate rose in the air and the
barnyard prisoners looked out on liberty. Instantly Reginald was off
in the direction of the artichokes.
“Stop!” shrieked Clarence. “As I’m a thoroughbred, you shall feel my
heels among your spareribs!”
Reginald looked back, and seeing immediate menace in the lowered
horns of Mrs. Cowslip and Gustavius, turned about, ran to the barn-
door, stood on his hind-legs, seized with his teeth the leather string
at which the colt was frantically snapping, gave one sharp pull—and
the deed was done. If Amanda, a moment later, had looked up from
her strawberry-picking, she would have seen, circling over the half-
lawn, half-pasture between the barn and the house, all tails in the air,
a triumphant procession consisting of one yearling colt, one cow with
a crumpled horn, one bull-calf, one he-goat making short stiff-legged
jumps with horns lowered, and one pig bringing up the rear with a
tail now so tightly kinked that it lifted his hind-quarters clear of the
ground at every second leap.
But Amanda’s mind was glued on strawberries; and for the present
other matters of moment require us, too, to leave the escaped
prisoners to their own devices.

Half a mile away the Poet and his sister sat on a boulder beside the
road. It was a semi-public road winding around the foot of a wooded
hill. Behind them, a mile away, was the railway station. That mile
had been mostly uphill, and the Poet did not love physical exercise.
He was tall and lean, with a geometrical figure composed mainly of
acute angles. When in a state of repose, it resembled a carpenter’s
pocket rule which protested at being entirely shut up. The Poet’s
sister, on the contrary, was mainly curves—those delicate, subtle
curves that deny the presence of bones, yet repel any suggestion of
fat. She was young; not too young—just young enough to have won
the crowning glory of spinsterhood. She had quantities of red hair,
the kind of red hair that always goes with that astonishingly
transparent skin underneath which scattering amber freckles come
and go over-night. There was one now on the side of her nose, which
had a becomingly mirthful tilt at the end. Her lips were full at the
centre, carmine, and with finely shaped corners which could not by
any possibility be drawn downward. She wore a solid pair of calfskin
boots, with military heels which looked small while being ample in
size. Her dark walking-skirt barely reached the interesting spot
where her bootlaces were tied. Her waist, of a soft, cream-tinted
material, left her neck and throat bare—for which the Lord be
praised!—and a shapeless, yet shapely, fluffy white thing resting on
the coils of her hair seemed to absorb warmth from them. In short,
you will make no mistake when you keep your mind fixed on the
Poet’s sister.
“Just around the next turn of the road, George,” she was saying, “our
little summer Elysium will burst upon your view.”
The Poet mopped the long, solemn countenance that was belied by
his eyes and his manner of speech.
“Galatea, I have observed that most things elysian in this life are
generally just around the corner. I am not impatient. I can wait. In
fact, I should prefer to have that first view burst upon me while I am
comfortably seated in the spring wagon of—What did I understand
you to say the gentleman’s name was, Galatea?”
“He is called Gabe.”
“Doubtless a corruption of Gabriel. I wonder if Gabriel blows his
trumpet for breakfast?”
Galatea’s lips parted in a musical ripple of laughter. The sight would
have caused a dentist to pass on, with misgivings about his future.
The Poet merely remarked:—
“Galatea, are you sure we brought our toothbrushes?” Whereupon
the dentist would have been heartened by the sight of a tiny point of
gold shining out of the crown of her left bicuspid.
“George, you lazy thing, come on. It’s only half a mile further. Gabriel
probably missed us at the station, and has returned by the main
road.”
“Oh, well, if all roads lead to Elysium, I suppose it’s no use waiting
here.”
Slowly the Poet’s angles adjusted themselves to the upright position,
and he strode on beside his sister.
“So you really like the place, Galatea?”
“It’s lovely—just the spot to give you inspiration, George. I shall
expect great things of you, dear.”
“Will it inspire me to reduce the rhythm of Anacreon to ragtime, do
you think?”
“O George! And there are the Professor’s pets, you know—Mrs.
Cowslip, Clarence, Reginald, Gustavius, and William. I told you
about them. The Professor has the most wonderful knack of
understanding domestic animals and making them understand him.
Really, they look upon him as one of themselves. The Professor says
we do our domestic animal pets great injustice when we overlook
their loyalty and intelligence, refusing to meet them half-way in
friendly companionship. Why, with only a little encouragement they
develop the most remarkable emotions, almost human in their
complexity; while their powers of expression develop
correspondingly. Positively the Professor and his cow, and colt, and
pig, and bull-calf,—William the goat, Napoleon the dog, and
Cleopatra the mare were away the day I called to arrange about the
lease for the summer,—are just one big happy family.”
Galatea’s cheeks were flushed with enthusiasm. The Poet’s eyes
twinkled, but his face remained long and solemn.
“What name does the pig answer to?”
“Reginald; but he’s a nice, clean pig.”
“Yes, of course, being a member of the Professor’s family. By the way,
did you have an opportunity to note Reginald’s table manners?”
“O George, how perfectly absurd!”
“Not necessarily. I give way to no man in my determination to do
justice to my fellow creatures, irrespective of the number of legs with
which they are equipped. As the Professor has left us in undisputed
possession for the next six months, there’s no telling what we may
accomplish. What sort of voice has Reginald?”
“George, I shan’t tell you another thing!”
“There, there. It merely occurred to me that, as neither you nor I nor
Arthur sings—By the way, Galatea, I suppose Arthur will run over
occasionally in his new automobile, the lucky beggar?”
“I lay claim to no advance information respecting Arthur’s
intentions,” answered the Poet’s sister, in cool, even tones. The
flapping brim of her headgear was between the Poet’s eyes and her
cheek, suddenly turned pink.
“Oh, well, I was only thinking what a boon Arthur’s banjo and my
guitar would turn out to be if the pig should develop a romantic tenor
voice. By Jove, Galatea! If that’s the place, I apologize for
everything.”
They had reached the turn of the road that overlooked their summer
Elysium. The Poet distributed his joints over another roadside
boulder, while Galatea stood by his side, and gave his attention to the
charming scene in detail.
“Really, a fine, rambling old house surrounded by shaded verandas
below, and not too near the road. A stone-walled inclosure of half a
dozen acres sloping down to a pretty brook that flows under the
lower wall just below the barn—a comfortable red barn; a barn that
isn’t red is only half a barn. A kitchen-garden and an orchard, and
the rest pasture that is neat enough for a lawn. What romps we shall
have, Galatea, with the colt and the bull-calf! What’s that vine-
covered affair reared against the west gable of the house? Oh, a
water-tank. Just so; there’s a pipe connecting underground with the
brook, and that wind-wheel on the barn roof does the pumping.
Good! I anticipate the luxury of an occasional tub. I was afraid
Elysium was like Germany—lots of romance and no bathtubs.
Galatea, we shall do—we shall do beautifully. But I say, what’s that
funny-looking thing on the peak of the house roof?”
“Isn’t it the chimney?”
“It looks to me like a saw-horse.”
They walked on. After passing through a grove of chestnuts, they had
a nearer and better view of the house.
“No, it isn’t a saw-horse,” said the Poet. “It moves. Did you see it?”
Galatea looked embarrassed.
“Galatea, the thing on our roof looks to me uncommonly like a billy-
goat. Galatea, it is a billy-goat—I can make out his whiskers.”