Ray: A Distributed Execution Framework for
Emerging AI Applications
Presenters: Philipp Moritz, Robert Nishihara
Spark Summit West
June 6, 2017
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with Philipp Moritz and Robert Nishihara
Why build a new system?
Supervised Learning
(Diagram: a data point is fed to a model, which predicts the label "CAT".)
Emerging AI Applications
Supervised Learning → Reinforcement Learning
● One prediction → Sequences of actions
● Static environments → Dynamic environments
● Immediate feedback → Delayed rewards
RL Application Pattern
Process inputs from different sensors in parallel and in real time
Execute large numbers of simulations, e.g., up to 100s of millions
Rollout outcomes are used to update the policy (e.g., via SGD)
Policies are often implemented by DNNs
Most RL algorithms are developed in Python
(Diagram: many parallel rollouts feed repeated policy updates; the policy maps observations to actions.)
RL Application Requirements
Need to handle dynamic task graphs, where tasks have
• Heterogeneous durations
• Heterogeneous computations
Schedule millions of tasks/sec
Make it easy to parallelize ML algorithms written in Python
Ray API - remote functions
import numpy as np

def zeros(shape):
    return np.zeros(shape)

def dot(a, b):
    return np.dot(a, b)
Ray API - remote functions
import numpy as np
import ray

@ray.remote
def zeros(shape):
    return np.zeros(shape)

@ray.remote
def dot(a, b):
    return np.dot(a, b)

id1 = zeros.remote([5, 5])
id2 = zeros.remote([5, 5])
id3 = dot.remote(id1, id2)
ray.get(id3)

● id1, id2, and id3 are Object IDs.
(Task graph: zeros → id1, zeros → id2; dot(id1, id2) → id3.)
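Without a cluster, the zeros/zeros/dot task graph can be imitated loosely with Python's standard library. This is only an analogy: Ray's Object IDs live in a distributed object store and can be passed to `.remote()` calls unresolved, while these futures are process-local and must be resolved explicitly; `np.dot` is replaced by a plain-Python stand-in.

```python
from concurrent.futures import ThreadPoolExecutor

def zeros(shape):
    rows, cols = shape
    return [[0.0] * cols for _ in range(rows)]

def dot(a, b):
    # Plain-Python matrix multiply standing in for np.dot.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

with ThreadPoolExecutor() as pool:
    f1 = pool.submit(zeros, (5, 5))
    f2 = pool.submit(zeros, (5, 5))
    # Ray would accept the IDs f1, f2 directly; here we resolve them first.
    f3 = pool.submit(dot, f1.result(), f2.result())
    result = f3.result()  # a 5x5 matrix of zeros
```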
Ray API - actors
class Counter(object):
    def __init__(self):
        self.value = 0
    def inc(self):
        self.value += 1
        return self.value

c = Counter()
c.inc()  # This returns 1
c.inc()  # This returns 2
c.inc()  # This returns 3
Ray API - actors
import ray

@ray.remote(num_gpus=1)
class Counter(object):
    def __init__(self):
        self.value = 0
    def inc(self):
        self.value += 1
        return self.value

c = Counter.remote()
id1 = c.inc.remote()
id2 = c.inc.remote()
id3 = c.inc.remote()
ray.get([id1, id2, id3])  # This returns [1, 2, 3]

● State is shared between actor methods.
● Actor methods return Object IDs.
● Can specify GPU requirements.
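The actor idea can be sketched in plain Python as a stateful object whose methods run one at a time on a dedicated thread — a toy illustration of why actor method calls see shared state in order, not Ray's implementation. The `ToyActor` wrapper below is invented for this sketch; its futures play the role of Object IDs.

```python
import queue
import threading
from concurrent.futures import Future

class ToyActor:
    """Runs submitted method calls sequentially on one worker thread."""
    def __init__(self, state_factory):
        self._state = state_factory()
        self._inbox = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            method_name, args, fut = self._inbox.get()
            fut.set_result(getattr(self._state, method_name)(*args))

    def call(self, method_name, *args):
        fut = Future()  # plays the role of a Ray Object ID
        self._inbox.put((method_name, args, fut))
        return fut

class Counter:
    def __init__(self):
        self.value = 0
    def inc(self):
        self.value += 1
        return self.value

c = ToyActor(Counter)
ids = [c.call("inc") for _ in range(3)]
print([f.result() for f in ids])  # [1, 2, 3]
```

Because all calls go through one queue and one thread, the three `inc` calls mutate the shared counter in submission order, mirroring the `[1, 2, 3]` result on the slide.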
Ray architecture
● Each node runs worker processes; one node also runs the driver.
● Each node has a shared-memory Object Store and a Local Scheduler.
● Tasks that cannot be scheduled locally are forwarded to a replicated Global Scheduler.
● System control state lives in a replicated Global Control Store.
● Debugging tools, profiling tools, and a web UI are built on top of the Global Control Store.
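The local/global split can be caricatured in a few lines — a hypothetical `schedule` helper illustrating the bottom-up idea (place tasks locally, spill to a global scheduler when the node is saturated), not Ray's actual scheduling policy:

```python
def schedule(task, local_queues, node, capacity, global_queue):
    """Place task on its node's local queue, spilling to the global
    scheduler's queue when the local queue is full."""
    if len(local_queues[node]) < capacity:
        local_queues[node].append(task)
        return "local"
    global_queue.append(task)
    return "global"

local_queues = {0: [], 1: []}
global_queue = []
placements = [schedule(f"task{i}", local_queues, 0, 2, global_queue)
              for i in range(3)]
print(placements)  # ['local', 'local', 'global']
```

Keeping the common case local is what lets most tasks avoid a round trip through the global scheduler.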
Ray performance
● One million tasks per second
● Latency of local task execution: ~300 µs
● Latency of remote task execution: ~1 ms
Ray fault tolerance
Lost objects are recovered by reconstruction: the tasks that produced them are re-executed.
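The reconstruction idea can be illustrated with a toy lineage table: record how each object was produced, and replay that recipe if the object is lost. This is a conceptual sketch only; the `remote`/`get` helpers and the dictionaries here are invented for illustration, not Ray's API.

```python
lineage = {}  # object id -> (function, args) that produced it
objects = {}  # object id -> value (the "object store")

def remote(fn, *args, _next_id=[0]):
    oid = _next_id[0]
    _next_id[0] += 1
    lineage[oid] = (fn, args)  # remember how to recreate the object
    objects[oid] = fn(*args)
    return oid

def get(oid):
    if oid not in objects:        # object lost (e.g., a node failed)...
        fn, args = lineage[oid]   # ...so replay its lineage
        objects[oid] = fn(*args)
    return objects[oid]

oid = remote(lambda a, b: a + b, 2, 3)
del objects[oid]  # simulate losing the object
print(get(oid))   # prints 5
```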
Evolution Strategies
(Diagram: the Policy sends actions to a Simulator, which returns observations and rewards.)
Try lots of different policies and see which one works best!
Pseudocode
class Worker(object):
    def do_simulation(self, policy, seed):
        # perform simulation and return reward

workers = [Worker() for i in range(20)]
policy = initial_policy()
for i in range(200):
    seeds = generate_seeds(i)
    rewards = [workers[j].do_simulation(policy, seeds[j])
               for j in range(20)]
    policy = compute_update(policy, rewards, seeds)
Pseudocode
@ray.remote
class Worker(object):
    def do_simulation(self, policy, seed):
        # perform simulation and return reward

workers = [Worker.remote() for i in range(20)]
policy = initial_policy()
for i in range(200):
    seeds = generate_seeds(i)
    rewards = [workers[j].do_simulation.remote(policy, seeds[j])
               for j in range(20)]
    policy = compute_update(policy, ray.get(rewards), seeds)
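To make the pseudocode concrete without a Ray cluster, here is a serial toy version. Everything specific is made up for illustration: the quadratic "simulation" reward, the seed scheme, and the finite-difference step standing in for `compute_update`. Because each worker reuses the same seed for both evaluations, the noise cancels (a common-random-numbers trick) and the policy converges to the reward's peak.

```python
import random

class Worker(object):
    def do_simulation(self, policy, seed):
        # Dummy "simulation": noisy reward peaked at policy == 3.0.
        rng = random.Random(seed)
        return -(policy - 3.0) ** 2 + rng.gauss(0, 0.01)

workers = [Worker() for i in range(20)]
policy = 0.0  # stands in for initial_policy()
for i in range(200):
    seeds = [i * 20 + j for j in range(20)]  # stands in for generate_seeds(i)
    # Evaluate each perturbation pair with the same seed so noise cancels.
    rewards = [workers[j].do_simulation(policy + 0.1, seeds[j]) -
               workers[j].do_simulation(policy - 0.1, seeds[j])
               for j in range(20)]
    # Crude finite-difference gradient step in place of compute_update.
    policy += 0.05 * sum(rewards) / (2 * 0.1 * len(rewards))
print(round(policy, 1))  # converges to 3.0
```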
Evolution strategies on Ray

Simulator steps per second:

           10 nodes   20 nodes   30 nodes   40 nodes   50 nodes
Reference  97K        215K       202K       N/A        N/A
Ray        152K       285K       323K       476K       571K

The Ray implementation takes half as much code and was implemented in a couple of hours.
Policy Gradients
Ray + Apache Spark
● Complementary
○ Spark handles data processing, “classic” ML algorithms
○ Ray handles emerging AI algorithms, e.g., reinforcement learning (RL)
● Interoperability through object store based on Apache Arrow
○ Common data layout
○ Supports multiple languages
Ray is a system for AI Applications
● Ray is open source! https://github.com/ray-project/ray
● We have a v0.1 release!
pip install ray
● We’d love your feedback
Philipp, Ion, Alexey, Stephanie, Johann, Richard, William, Mehrdad, Mike, Robert