simple_rl: Reproducible Reinforcement Learning in Python

David Abel
[email protected]
Abstract
Conducting reinforcement-learning experiments can be a complex and time-consuming process. A full experimental pipeline will typically consist of a simulation of an environment, an implementation of one or many learning algorithms, a variety of additional components designed to facilitate the agent-environment interplay, and any requisite analysis, plotting, and logging thereof. In light of this complexity, this paper introduces simple_rl (https://github.com/david-abel/simple_rl), a new open-source library for carrying out reinforcement-learning experiments in Python 2 and 3 with a focus on simplicity. The goal of simple_rl is to support seamless, reproducible methods for running reinforcement-learning experiments. This paper gives an overview of the core design philosophy of the package, describes how it differs from existing libraries, and showcases its central features.
[Figure 1 graphic: a block diagram in which a script runs the main experiment and produces a learning-curve plot ("Reproduction: Gridworld h-3 w-4"; cumulative reward vs. episode number for Q-learning and Random), which can then be reproduced from the logged experiment file.]
Figure 1: The core functionality of simple_rl: Create agents and an MDP, then run and plot their resulting interactions. Running an experiment also creates an experiment log (stored as a JSON file), which can be used to rerun the exact same experiment, thereby facilitating simple reproduction of results. All practitioners need to do, in theory, is share a copy of the experiment file with someone who has the library to ensure result reproduction.
1 Introduction
Reinforcement learning (RL) has recently soared in popularity, due in large part to recent successes in challenging domains, including learning to play Atari games from image input [29], beating the world champion in Go [35], and robotic control from high-dimensional sensors [22]. In concert with the field's growth, experiments have become more complex, leading to new challenges for the empirical evaluation of RL methods. Recent work by Henderson et al. [17] highlighted many of the issues involved in handling this new complexity, raising concerns about emerging RL experimental practices. Additionally, Python has become a prominent programming language among machine-learning researchers due to the availability of powerful deep learning libraries like PyTorch [31] and tensorflow [1], along with scipy [20] and numpy [30].

Figure 2: Example code for running a basic experiment. First, define a grid-world MDP (line 6), then make our agents (lines 9-10), and then run the experiment (line 13). Running the above will generate the plot shown in Figure 4.
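A minimal sketch of a script of this form, assuming the GridWorldMDP constructor parameters width, height, init_loc, and goal_locs and a 4x3 grid (to match the gridworld_h-3_w-4 experiment directory discussed later); the exact argument names and values here are illustrative assumptions:

from simple_rl.agents import QLearningAgent, RandomAgent
from simple_rl.tasks import GridWorldMDP
from simple_rl.run_experiments import run_agents_on_mdp

# Define a 4x3 grid-world MDP (argument names assumed for illustration).
mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])

# Make the agents.
ql_agent = QLearningAgent(actions=mdp.get_actions())
rand_agent = RandomAgent(actions=mdp.get_actions())

# Run the experiment and generate the plot.
run_agents_on_mdp([ql_agent, rand_agent], mdp, instances=5, episodes=50, steps=25)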
To accommodate this growth, there is a need for a simple, lightweight library that supports quick execution and analysis of RL experiments in Python. Certainly, many libraries already fulfill this need for many use cases; as will be discussed in Section 2, many effective RL libraries for Python already exist. However, the design philosophy and ultimate end users of these packages are distinct from those targeted by simple_rl: users who seek to quickly run simple experiments, look at a plot that summarizes results, and quickly share and reproduce these findings.
The core design principle of simple_rl is that of simplicity, per its name. The library is stripped down to the bare necessities required to run basic RL experiments. The focus of the library is on traditional, tabular domains, though it does have the capacity to cooperate with high-dimensional environments like those offered by the OpenAI Gym [6]. The assumed objective of a practitioner using the library is to (1) define an RL agent (or collection of agents), (2) define an environment (an MDP, POMDP, or similar Markov model), (3) let the agent(s) interact with the environment, and (4) view and analyze the results of this interaction. This basic pipeline serves as the “end-game” of simple_rl, and dictates much of the design and its core features. A block diagram of this process is presented in Figure 1: run an experiment, see the results, and reproduce these results according to an auto-generated JSON file logging the experimental details. The actual code of the experiment run is shown in Figure 2: in around five lines, we define a Q-learning agent, a random actor, and a simple grid-world domain, and let these agents interact with the environment for a set number of instances. As mentioned, running this code produces both a JSON file tracking the experiment, which can be used (or shared) to run the same experiment again, and the plot seen in Figure 4a.
Many excellent libraries already exist in Python for carrying out RL experiments. What separates simple_rl? As the name suggests, its distinguishing feature is its emphasis on simplicity, which also brings with it a shortage of certain features. We here describe the objectives of other RL libraries in Python, and briefly cover what some have implemented, in case those are a better fit for the needs of different programmers.
2 Related Libraries
2.1 RLPy
RLPy offers a well-documented, expansive library for RL and planning experiments in Python 2 [16]. The library has a similar overall structure to that of simple_rl: the core entities are agents, environments, experiments, policies, and representations. The main focus of RLPy is on value-
function approximation, but the library also offers several MDP solvers in the form of the usual
dynamic programming algorithms like value iteration [4] and policy iteration [19]. Notably, the
library also includes a large number of canonical RL tasks, including Mountain Car, Acrobot, Puddle
World, Swimmer, and Cart Pole.
Get it here: https://github.com/rlpy/rlpy
2.2 mushroom
Mushroom is a new library aimed at simplifying RL experimentation with OpenAI gym and tensor-
flow, but also offers support for traditional tabular experiments [13]. Mushroom offers implemen-
tations of many recent Deep RL algorithms, including DQN [29], Stochastic Actor-Critic [12], and
a template for Policy Gradient algorithms. All of its neural network code is based on tensorflow.
Additionally, Mushroom comes with noteworthy RL tasks like Mountain Car, Inverted Pendulum,
and a classic Linear-Quadratic Regulator control task.
Get it here: https://github.com/AIRLab-POLIMI/mushroom
2.3 PyBrain
PyBrain is an established, expansive, general-purpose library for machine learning in Python [33] that also offers infrastructure for conducting RL experiments, with a similar focus to RLPy. The library includes a number of the standard environments and agents, along with a large collection of model-free algorithms.
Get it here: http://www.pybrain.org/
2.4 keras-rl
keras-rl provides integration between Keras [9] and many popular Deep RL algorithms.
keras-rl offers an expansive list of implemented Deep RL algorithms in one place, including:
DQN, Double DQN [40], Deep Deterministic Policy Gradient [24], and Dueling DQN [41]. For
those who use Keras for deep learning and mostly want to focus on deep RL, the keras-rl library is a great choice.
Get it here: https://github.com/keras-rl/keras-rl
2.5 RLLib
RLLib is built on top of ray (https://github.com/ray-project/ray), which serves to parallelize typical machine-learning experimental pipelines [23]. RLLib allows for either PyTorch or tensorflow as a backend, and excels at running experiments in parallel. It contains implementations of many of the latest deep RL algorithms and offers an interface to the OpenAI Gym along with multi-agent environments.
Get it here: https://ray.readthedocs.io/en/latest/rllib.html
2.6 Horizon
Horizon is Facebook’s new applied RL library [15]. Per the enterprise-scale needs of Facebook, Horizon is primarily designed for large-scale deployment: “Horizon is designed with production use cases as top of mind”. The library offers many of the canonical deep RL algorithms and is built on
top of PyTorch.
Get it here: https://github.com/facebookresearch/Horizon
2.7 python-rl
python-rl [11] provides integration with the classic language-agnostic framework RL-Glue [39].
The main goal of this library is to bring RL-Glue up to date with somewhat more recent features, agents, and environments common in RL experiments.
Get it here: https://github.com/amarack/python-rl
2.8 reinforcement-learning
reinforcement-learning is Denny Britz's widely used collection of RL algorithm implementations and accompanying exercises [5], organized to follow standard introductory treatments of the field. It is a useful reference for those looking for standalone implementations of classical algorithms.
Get it here: https://github.com/dennybritz/reinforcement-learning
2.9 dopamine
dopamine is a recently released library [3] offering many of the most recent deep RL algorithms, including Rainbow [18], Prioritized Experience Replay [34], and Distributional RL [2], with an eye toward reproducibility in the Arcade Learning Environment (ALE) based on the suggestions given by Machado et al. [27]. dopamine offers a lot for those whose main agenda is to run experiments in the ALE or to perform new research in deep RL.
Get it here: https://github.com/google/dopamine
To summarize: Many great packages are already out there. The main differentiating features of simple_rl are (1) quick generation of plots, (2) a focus on reproducibility, and (3) an emphasis on simplicity, both in terms of algorithmic development and its attachment to classical RL problems (like grid worlds).
3 Overview of Features
We begin by unpacking the example in Figure 2 to showcase the main design philosophy of simple_rl. The library primarily consists of agents and environments (called “tasks” in the library). Agents, by default, are all subclasses of the abstract class Agent, which is only responsible for a single method, act(self, state, reward) → action. A list of agents, planning algorithms, and tasks currently implemented is presented in Table 1.
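To illustrate how small this interface is, the sketch below defines a toy agent against it; the assumption that the Agent base constructor takes a name and an action list (and stores the latter as self.actions) is mine, not a documented signature.

import random
from simple_rl.agents import Agent  # import path assumed

class UniformRandomAgent(Agent):
    '''A toy agent that ignores reward and acts uniformly at random.'''

    def __init__(self, actions):
        # Assumed base-class signature: a name and the available actions.
        Agent.__init__(self, name="uniform-random", actions=actions)

    def act(self, state, reward):
        # The single method every agent must provide: map the current
        # state (and the last reward received) to an action.
        return random.choice(self.actions)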
Tasks, for the most part, all inherit from the abstract MDP class, MDP. The core of an MDP is its transition function and reward function, captured in the abstract class by the class-wide variables transition_func and reward_func:

transition_func(state, action) → next_state,    (1)
reward_func(state, action, next_state) → reward.    (2)

When defining an MDP instance, the user must pass in functions for T and R that output a state and reward, respectively. In this way, no MDP is ever responsible for enumerating either S or A explicitly, thereby allowing for (1) simple specification of these two functions, and (2) efficient implementation of high-dimensional domains: we need only represent and store the states that are visited during experimentation.
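For instance, a simple chain MDP could be specified generatively along the following lines; the import path and the MDP constructor arguments shown here are assumptions (and a real task would typically wrap states in the library's State class rather than use plain integers), so read this as a sketch of the idea rather than the library's exact interface.

from simple_rl.mdp import MDP  # assumed import path for the abstract MDP class

def chain_transition(state, action):
    # Move right or left along an integer chain, clamped to [0, 10].
    return min(state + 1, 10) if action == "right" else max(state - 1, 0)

def chain_reward(state, action, next_state):
    # Reward of 1 for reaching the end of the chain, 0 otherwise.
    return 1.0 if next_state == 10 else 0.0

# Assumed constructor: the action set, the two generative functions, and a start state.
chain_mdp = MDP(actions=["left", "right"],
                transition_func=chain_transition,
                reward_func=chain_reward,
                init_state=0)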
Naturally, MDP subclasses have a variety of arguments: in the earlier grid-world example, we saw the GridWorldMDP class take as input the dimensions of the grid, a starting location, and a list of goal locations. Such inputs are typical of MDP classes in simple_rl. The experiment function run_agents_on_mdp, in turn, takes several arguments that determine the structure of the experiment:
• instances: The number of times to repeat the entire experiment (will be used to form
95% confidence intervals for all experiments conducted).
• episodes: The number of episodes per instance. An episode will consist of steps number
of steps, after which the agent is reset to the start state (but gets to remember what it has
learned so far).
• steps: The number of steps per episode.
The plotting is set up to plot all of the above appropriately. For instance, if a user sets episodes=1
but steps=50, then the library produces a step-wise plot (that is, the x-axis is steps, not episodes).
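As a concrete (hypothetical) call, reusing the agents and MDP from the grid-world sketch earlier, such a single-episode experiment might look like:

# One 50-step episode per instance: the resulting learning curve is plotted per step.
run_agents_on_mdp([ql_agent, rand_agent], mdp, instances=10, episodes=1, steps=50)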
Running the function run_agents_on_mdp will create a JSON file detailing all of the components of the experiment needed to rerun it. Then, it will create a local folder, “results”, store each agent's stream of received rewards there, and print the status of the experiment to the console. When the experiment concludes, a learning curve with 95% confidence intervals will be generated (via simple_rl/utils/chart_utils.py) and opened. The JSON file lets users of the library reconstruct and rerun the original experiment using another function from the run_experiments.py script. In this way, the JSON file is effectively a certificate that this plot can be reproduced if the same experiment were run again. We provide more detail on this feature in Section 3.2.
We can also run a similar experiment in the OpenAI Gym (Figure 3).
As can be seen in Figure 3, the structure of the experiment is identical. Since we define a GymMDP, we pass as input the name of the environment we'd like to use: in this case, we're running experiments in CartPole-v1, but any of the usual Gym environment names will work. We can also pass in the render Boolean flag, indicating whether or not we'd like to visualize the learning process. Alternatively, we can pass in the render_every_n_episodes flag (along with render=True), which will only render the agent's learning process every N episodes.

On longer experiments, we may want additional feedback about the learning process. For this purpose, the run_agents_on_mdp function also takes as input a Boolean flag, verbose, which, if true, will provide detailed episode-by-episode tracking of the progress of the experiment in the console.
from simple_rl.tasks import GymMDP
from simple_rl.agents import RandomAgent, LinearQAgent
from simple_rl.run_experiments import run_agents_on_mdp

# Gym MDP
gym_mdp = GymMDP(env_name='CartPole-v1', render=True)
num_feats = gym_mdp.get_num_state_feats()

# Setup agents and run.
rand_agent = RandomAgent(gym_mdp.get_actions())
lin_q_agent = LinearQAgent(gym_mdp.get_actions(), num_feats, rbf=True)
agents = [lin_q_agent, rand_agent]

# Run.
run_agents_on_mdp(agents, gym_mdp, instances=5, episodes=5000, steps=200)

Figure 3: Example code for running an experiment in the OpenAI Gym.
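If more feedback is desired during a long run like this one, the verbose flag described above can simply be added to the same call; a minimal sketch, reusing the agents and gym_mdp from Figure 3:

# Print detailed episode-by-episode progress to the console while the experiment runs.
run_agents_on_mdp(agents, gym_mdp, instances=5, episodes=5000, steps=200, verbose=True)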
There are a number of other ways to run experiments, but these examples capture the core experi-
mental cycle.
Other Environment Types The library offers support for other types of environments beyond typ-
ical MDPs, including classes for Object-Oriented MDPs or OOMDPs [14], k-Armed Bandits [8],
Partially Observable MDPs or POMDPs [21], a probability distribution over MDPs for lifelong
learning [7], and Markov Games [25]. Aspects of these classes are handled slightly differently to
accommodate the different kinds of decision-making problems they capture, but the interface to run
experiments with each type is nearly identical. Examples for how to run experiments with each type
of environment are included in the examples directory in the repository along with a test script that
ensures each example can run on a given machine. Running experiments with these other environment types follows the same pipeline described so far: a function in the run_experiments.py script handles all of the interactions between agent(s) and environment and produces a plot when the experiment finishes. Notably, the reproducibility feature is not yet fully developed for all environment types; this is a major direction for future development of the library.
3.2 Reproducibility
Due to its simplicity, the library is naturally suited to reproducing results from previously run experiments. As mentioned, every experiment conducted with the library creates a directory with the experiment name containing a JSON file, “full_experiment_data.json”, that enumerates every parameter, agent, MDP, and type needed to launch the exact same experiment another time. The idea is that these files can be shared across users of the library: if a user gives someone else this file (and the necessary agents and environments), it serves as a contract that exactly the same experiment just run with simple_rl can be rerun.

Using one of these experiment files, the function reproduce_from_exp_file(exp_name) will read the experiment file, reconstruct all the necessary components, rerun the entire experiment, and remake the plot. Thus, providing one of these JSON files is to be interpreted as a certificate that this experiment is guaranteed to produce similar results.
As an example, consider again the code from Figure 2. Running this code will create: (1) the “results” directory, (2) the “gridworld_h-3_w-4” directory within results, and (3) the “full_experiment_data.json” file, which contains all necessary parameters to rerun the experiment.

Suppose someone provided the directory gridworld_h-3_w-4 containing the experiment file for the above grid-world experiment. Then, we could run the following code:

from simple_rl.run_experiments import reproduce_from_exp_file

reproduce_from_exp_file("gridworld_h-3_w-4")
Running this will automatically generate the plot in Figure 4b.
Figure 4: Cumulative reward versus episode number for Q-learning and a random agent in the grid world: (a) the original experiment, and (b) the experiment reproduced from the generated experiment file.
To ensure reproducibility of new subclasses or other bells and whistles attached to the library, any agent or MDP must implement a get_parameters(self) method that returns a dictionary containing all parameters relevant to reconstructing the instance. For example, consider the QLearningAgent class in Figure 5. Any introduced subclass that wants to work with the reproduction infrastructure in simple_rl must implement such a method.
We consider this a lightweight means of ensuring reproduction for three reasons: 1) it is entirely hidden from the programmer, as all tracking of experimental parameters is done automatically; 2) a single, universally formatted document (JSON) contains all the information needed to guarantee reproduction of results (along with a copy of the library itself, and any new agents/MDPs); and 3) the library is simple enough that most experiments consist of only a small number of moving parts. The feature to reproduce from a JSON file does not yet fully support all environment types, but it is an active area of development for the library.

To recap, the introduced components define the essence of the library: 1) center everything around agents, MDPs, and interactions thereof; 2) completely hide the complexity of plotting and experiment tracking from the programmer, while making it simple to plot and reproduce results if needed; 3) value simplicity above all else; 4) treat things generatively, namely, MDP transition models and reward functions are best implemented as functions that return a state or reward, rather than enumerations of all state-action pairs.
def get_parameters(self):
    '''
    Returns:
        (dict) key=param_name (str) --> val=param_val (object).
    '''
    param_dict = defaultdict(int)

    param_dict["alpha"] = self.alpha
    param_dict["gamma"] = self.gamma
    param_dict["epsilon"] = self.epsilon_init
    param_dict["anneal"] = self.anneal
    param_dict["explore"] = self.explore

    return param_dict

Figure 5: The get_parameters method of the QLearningAgent class, which returns the parameters needed to reconstruct the instance.
Figure 6: Example visuals generated by the library: (a) the Four Rooms grid world from Sutton et al. [37]; (b) the estimated value of states during learning in the grid world from Russell and Norvig [32].
Utilities In addition to the core experimental pipeline described, the library is well stocked with other utilities. These include a variety of plotting tools that allow direct control over the plots created (for instance, experiments can also generate plots comparing the CPU time taken by different algorithms, and more). For more details on plotting, see the chart_utils.py script. The library also offers functionality for visualizing grid-world domains using pygame (https://pygame.org). Two example visuals are presented in Figure 6; in these cases, the learning process is visualized while the experiment runs. On the right, the estimated value of each state is shown. The library includes several default planning algorithms such as Value Iteration, Monte Carlo Tree Search [10], and Bounded Real Time Dynamic Programming [28]. Planners can be used to compute the value function, the optimal (or near-optimal) policy, or to enumerate a state-action space (see planning_example.py in the repository). For other supported features, see the examples directory.
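As a rough sketch of that planning interface (the ValueIteration import path and its run_vi and policy methods are assumptions here; see planning_example.py for the library's actual usage):

from simple_rl.planning import ValueIteration  # assumed import path
from simple_rl.tasks import GridWorldMDP

# Plan in a small grid world, then query the resulting greedy policy.
mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])
vi = ValueIteration(mdp)
vi.run_vi()                                    # assumed: run value iteration to convergence
best_action = vi.policy(mdp.get_init_state())  # assumed: greedy action at a given state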
4 Conclusion
simple_rl offers a lightweight suite of tools for conducting RL experiments in Python 2 and 3. Its design philosophy focuses on hiding complexity from the end user, including the tracking of experimental details, the generation of plots, and the construction of agents and MDPs. The result is a package that is light on features but easy to use: only a few lines of code are needed to generate and visualize results that are guaranteed to be reproducible. The library is available on the Python package index, and thus can be installed with the usual pip install simple_rl. Many features are currently under development: the most important near-term goal is to expand the suite of reproducibility tools to account for more variety across different operating systems and other variables that might impact experiments. Additionally, the library would benefit from a suite of basic deep RL algorithms for use in experimentation and a more general interface for visuals.
Acknowledgments This library was heavily influenced by time spent working with the Java li-
brary for RL and Planning, BURLAP [26]. I extend my sincere gratitude to its creator, James Mac-
Glashan, for his care in crafting such an expansive library, and the influence he (and BURLAP) had
in shaping simple rl. Additionally, I would like to thank my advisor Michael Littman for letting
me make this library throughout my Ph.D, and for his help in its development. Thanks to Stefanie
Tellex for her willingness to pick up the library for use in her lab and her encouragement in the
library’s development. Thanks to those who have helped contribute to the library either in code
or concept, including: Cameron Allen, Sebastien Arnold, Dilip Arumugam, Kavosh Asadi, Akhil
Bagaria, Fernando Diaz, Owain Evans, David Halpern, Mark Ho, Yuu Jinnai, Nishanth J Kumar,
Jessica Forde, Neev Parikh, Emily Reif, Yagnesh Revar, John Salvatier, Sean Segal, Yuhang Song,
Paul Touma, Nate Umbanhowarm, Ansel Vahle, David Whitney, and all the members of RLAB and
H2R at Brown. Lastly, thanks to the anonymous student in Michael Littman’s 2017 Fall Reinforce-
ment Learning course at Brown, who initially suggested reproducing experiments from parameter
files.
References
[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: a system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.
[2] Marc G. Bellemare, Will Dabney, and Rémi Munos. A distributional perspective on rein-
forcement learning. In Proceedings of the International Conference on Machine Learning,
volume 70, pages 449–458, 2017.
[3] Marc G. Bellemare, Pablo Samuel Castro, Carles Gelada, Saurabh Kumar, and Subhodeep
Moitra. dopamine, 2018.
[4] Richard Bellman. A Markovian decision process. Journal of Mathematics and Mechanics, 6
(5):679–684, 1957.
[5] Denny Britz. reinforcement-learning. github.com/dennybritz/reinforcement-learning, 2018.
[6] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang,
and Wojciech Zaremba. Openai gym. arXiv preprint arXiv:1606.01540, 2016.
[7] Emma Brunskill and Lihong Li. Pac-inspired option discovery in lifelong reinforcement learn-
ing. In International Conference on Machine Learning, pages 316–324, 2014.
[8] Robert R Bush and Frederick Mosteller. A stochastic model with applications to learning. The
Annals of Mathematical Statistics, pages 559–585, 1953.
[9] François Chollet et al. Keras. https://keras.io, 2015.
[10] Rémi Coulom. Efficient selectivity and backup operators in Monte-Carlo tree search. In Pro-
ceedings of the International Conference on Computers and Games, pages 72–83. Springer,
2006.
[11] Will Dabney and Pierre-Luc Bacon. python-rl. https://github.com/amarack/python-rl, 2013.
[12] Thomas Degris, Patrick M Pilarski, and Richard S Sutton. Model-free reinforcement learning
with continuous action in practice. In American Control Conference (ACC), 2012, pages 2177–
2182. IEEE, 2012.
[13] Carlo D’Eramo and Davide Tateo. mushroom. https://github.com/AIRLab-POLIMI/mushroom, 2018.
[14] Carlos Diuk, Andre Cohen, and Michael L Littman. An object-oriented representation for
efficient reinforcement learning. In Proceedings of the International Conference on Machine
Learning, pages 240–247. ACM, 2008.
[15] Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Zhengxing Chen, Yuchen He,
Zachary Kaden, Vivek Narayanan, and Xiaohui Ye. Horizon: Facebook’s open source applied
reinforcement learning platform. arXiv preprint arXiv:1811.00260, 2018.
[16] Alborz Geramifard, Robert H Klein, Christoph Dann, William Dabney, and Jonathan P How. RLPy: The reinforcement learning library for education and research. http://acl.mit.edu/RLPy, 2013.
[17] Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. Deep reinforcement learning that matters. In Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
[18] Matteo Hessel, Joseph Modayil, Hado Van Hasselt, Tom Schaul, Georg Ostrovski, Will Dab-
ney, Dan Horgan, Bilal Piot, Mohammad Azar, and David Silver. Rainbow: Combining im-
provements in deep reinforcement learning. Proceedings of the AAAI Conference on Artificial
Intelligence, 2018.
[19] Ronald A Howard. Dynamic programming and Markov processes. 1960.
[20] Eric Jones, Travis Oliphant, and Pearu Peterson. Scipy: Open source scientific tools for python.
2014.
[21] Leslie Pack Kaelbling, Michael L Littman, and Anthony R Cassandra. Planning and acting in
partially observable stochastic domains. Artificial Intelligence, 101(1-2):99–134, 1998.
[22] Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end training of deep
visuomotor policies. The Journal of Machine Learning Research, 17(1):1334–1373, 2016.
[23] Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E Gonzalez, Michael I Jordan, and Ion Stoica. RLlib: Abstractions for distributed reinforcement learning. In Proceedings of the International Conference on Machine Learning, 2018.
[24] Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. In Proceedings of the International Conference on Learning Representations, 2016.
[25] Michael L Littman. Markov games as a framework for multi-agent reinforcement learning. In
Machine Learning, pages 157–163. Elsevier, 1994.
[26] J MacGlashan. Brown-umbc reinforcement learning and planning (burlap), 2016.
[27] Marlos C Machado, Marc G Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, and
Michael Bowling. Revisiting the arcade learning environment: Evaluation protocols and open
problems for general agents. arXiv preprint arXiv:1709.06009, 2017.
[28] H Brendan McMahan, Maxim Likhachev, and Geoffrey J Gordon. Bounded real-time dynamic
programming: RTDP with monotone upper bounds and performance guarantees. In Proceed-
ings of the International Conference on Machine Learning, pages 569–576. ACM, 2005.
[29] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G
Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al.
Human-level control through deep reinforcement learning. Nature, 518(7540):529, 2015.
[30] Travis Oliphant. Guide to NumPy. 01 2006.
[31] Adam Paszke, Sam Gross, Soumith Chintala, and Gregory Chanan. Pytorch: Tensors and
dynamic neural networks in python with strong gpu acceleration, 2017.
[32] Stuart J Russell and Peter Norvig. Artificial intelligence: a modern approach. Pearson Education Limited, 2016.
[33] Tom Schaul, Justin Bayer, Daan Wierstra, Yi Sun, Martin Felder, Frank Sehnke, Thomas
Rückstieß, and Jürgen Schmidhuber. PyBrain. Journal of Machine Learning Research, 2010.
[34] Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay.
arXiv preprint arXiv:1511.05952, 2015.
[35] David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van
Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanc-
tot, et al. Mastering the game of go with deep neural networks and tree search. nature, 529
(7587):484, 2016.
[36] Richard S Sutton. Learning to predict by the methods of temporal differences. Machine
Learning, 3(1):9–44, 1988.
[37] Richard S Sutton, Doina Precup, and Satinder Singh. Between mdps and semi-mdps: A frame-
work for temporal abstraction in reinforcement learning. Artificial intelligence, 112(1-2):181–
211, 1999.
[38] Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradi-
ent methods for reinforcement learning with function approximation. In Advances in Neural
Information Processing Systems, pages 1057–1063, 2000.
[39] Brian Tanner and Adam White. Rl-glue: Language-independent software for reinforcement-
learning experiments. Journal of Machine Learning Research, 10(Sep):2133–2136, 2009.
[40] Hado Van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double
q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
[41] Ziyu Wang, Tom Schaul, Matteo Hessel, Hado Van Hasselt, Marc Lanctot, and Nando De Fre-
itas. Dueling network architectures for deep reinforcement learning. Proceedings of the Inter-
national Conference on Machine Learning, 2016.
[42] Christopher JCH Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3-4):279–292,
1992.
[43] Ronald J Williams. Simple statistical gradient-following algorithms for connectionist rein-
forcement learning. Machine Learning, 8(3-4):229–256, 1992.