Reinforcement Learning
John Schulman
Goal of the Course
■ Understand how deep reinforcement learning can be applied in various domains
■ Learn about three classes of RL algorithm and how to implement them with neural networks
  ■ policy gradient methods
  ■ approximate dynamic programming
  ■ search + supervised learning
■ Understand the state of deep RL as a research topic
Outline of Lecture
Sequential Decision Making
[Diagram: the agent sends an action to the environment; the environment returns an observation and a reward.]
Goal: maximize the expected total reward, with respect to the policy: a function from observation history to next action
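As a concrete illustration of this interaction loop, here is a minimal Python sketch. The `Environment` class, its reward, and the simple threshold policy are toy stand-ins invented for this example, not anything from the lecture; the only structure taken from the slide is that each action yields an observation and a reward, and that the policy maps the observation history to the next action.

```python
import random

class Environment:
    """Toy stand-in for the environment: hides its internal state, emits observations and rewards."""
    def __init__(self):
        self.state = 0.0

    def step(self, action):
        self.state += action                              # hidden state evolves with the action
        observation = self.state + random.gauss(0, 0.1)   # noisy sensor reading
        reward = -abs(self.state - 10.0)                  # e.g. reward for being near a target
        return observation, reward

def policy(history):
    """A policy: a function from the observation history to the next action."""
    last_obs = history[-1] if history else 0.0
    return 1.0 if last_obs < 10.0 else -1.0

env = Environment()
history, total_reward = [], 0.0
for t in range(20):
    action = policy(history)
    observation, reward = env.step(action)
    history.append(observation)
    total_reward += reward
print(total_reward)
```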
Applications
■ Robotics:
  ■ Actions: torque at joints
  ■ Observations: sensor readings
  ■ Rewards:
    ■ navigate to target location
    ■ complete manipulation task
Applications
■ Business operations
  ■ Inventory management: how much inventory and spare parts to purchase
  ■ Resource allocation: e.g., in a call center, who to service first
  ■ Routing problems: e.g., for management of a shipping fleet, which trucks/truckers to assign to which cargo
Applications
■ Finance
  ■ Investment decisions
  ■ Portfolio design
  ■ Option/asset pricing
Applications
■ E-commerce / media
  ■ What content to present to users (using click-through / visit time as reward)
  ■ What ads to present to users (avoiding ad fatigue)
Applications
■ Medicine
  ■ What tests to perform, what treatments to provide
Applications
■ Structured prediction: the algorithm has to make a sequence of predictions, which are fed back into the predictor (see the sketch below)
  ■ in NLP: text generation & translation, parsing [1,2]
  ■ multi-step pipelines in vision [3]
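A minimal sketch of that feedback structure, using a hypothetical `predict_next` function as a stand-in for a learned model (not from the lecture): each prediction becomes part of the input for the next one, so early mistakes propagate.

```python
def predict_next(prefix):
    """Toy stand-in for a learned model: predict the next token from the current prefix."""
    return "b" if prefix and prefix[-1] == "a" else "a"

prefix = []
for t in range(6):
    token = predict_next(prefix)   # the prediction depends on earlier predictions...
    prefix.append(token)           # ...and is fed back in as input for the next step
print(prefix)                      # ['a', 'b', 'a', 'b', 'a', 'b']
```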
RL vs. Other Learning Problems
■ Contextual bandits
  ■ given observation, output action, receive reward, with unknown and stochastic dependence on action and observation
  ■ e.g., advertising
RL vs. Other Learning Problems
■ Reinforcement learning
  ■ given observation, output action, receive reward, with unknown and stochastic dependence on action and observation
  ■ AND we perform a sequence of actions, and states depend on previous actions
RL vs. Other Learning Problems

[Diagram: a timeline of alternating observations (o), actions (a), and rewards (r) over many steps.]

Supervised learning ⊂ Contextual bandits ⊂ Reinforcement learning
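To make the distinction concrete, here is a toy Python sketch contrasting the two interaction patterns described above. Both the bandit's reward rule and the `ToyEnv` dynamics are invented for illustration and are not from the lecture; the point is only that in the bandit setting each context is drawn independently of past actions, while in RL the state carries the effect of every earlier action.

```python
import random

random.seed(0)

# Contextual bandit: each context is drawn independently of anything we did before.
def bandit_round():
    context = random.random()                          # observation, unaffected by past actions
    action = 1 if context > 0.5 else 0                 # some fixed policy
    return context if action == 1 else 1.0 - context   # reward

# Reinforcement learning: later observations and rewards depend on the whole action sequence.
class ToyEnv:
    def __init__(self):
        self.state = 0.0

    def step(self, action):
        self.state += action                   # earlier actions shape what comes later
        reward = -abs(self.state - 5.0)
        return self.state, reward              # observation, reward

bandit_return = sum(bandit_round() for _ in range(10))

env, rl_return = ToyEnv(), 0.0
for t in range(10):
    observation, reward = env.step(1.0)
    rl_return += reward

print(bandit_return, rl_return)
```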
How is RL different from Supervised Learning, in Practice?
What is “Deep RL”?

[Diagram: the agent sends an action to the environment; the environment returns an observation and a reward.]
What is “Deep RL”?

[Diagram: the agent is replaced by a learned function fθ(history), which maps the interaction history to an action; the environment returns an observation and a reward.]
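In deep RL, fθ is typically a neural network. Below is a minimal numpy sketch of such a function, mapping a fixed-length window of recent observations to scores over a discrete set of actions. The layer sizes, window length, and architecture are arbitrary choices for illustration, not anything specified in the lecture.

```python
import numpy as np

np.random.seed(0)

# f_theta: a tiny two-layer network from a window of recent observations to action scores.
obs_window, n_actions, hidden = 4, 3, 16
theta = {
    "W1": 0.1 * np.random.randn(obs_window, hidden),
    "b1": np.zeros(hidden),
    "W2": 0.1 * np.random.randn(hidden, n_actions),
    "b2": np.zeros(n_actions),
}

def f_theta(history, theta):
    x = np.asarray(history[-obs_window:], dtype=float)   # most recent observations as input
    h = np.tanh(x @ theta["W1"] + theta["b1"])           # hidden layer
    return h @ theta["W2"] + theta["b2"]                 # action scores (e.g. softmax these)

history = [0.0, 0.1, -0.2, 0.3]
print(f_theta(history, theta))
```

Everything in `theta` is a plain array, so any loss built from the output is differentiable with respect to θ, which is the property the next slide relies on.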
Deep RL: Algorithm Design Criteria
■ Algorithm learns a parameterized function fθ
■ Algorithm does not depend on the parameterization, just that the loss is differentiable w.r.t. θ
■ Optimize using gradient-based algorithms, using gradient estimators ∇θ Loss (one such estimator is sketched below)
  ■ the actual objective E[total reward] is an expectation over random variables of an unknown system
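Because E[total reward] is an expectation over the randomness of a system we do not know, its gradient with respect to θ is estimated from samples rather than computed by ordinary backpropagation through the environment. Here is a minimal sketch of one such estimator, the score-function (likelihood-ratio) estimator, on a made-up one-step problem with a Gaussian policy; the reward function, step size, and batch size are all illustrative assumptions, not the course's setup.

```python
import numpy as np

np.random.seed(0)

# Made-up one-step problem: Gaussian policy with mean theta; reward peaks at action = 3.
def reward(action):
    return -(action - 3.0) ** 2

theta, sigma, lr = 0.0, 1.0, 0.05
for iteration in range(200):
    actions = theta + sigma * np.random.randn(64)       # sample actions from the policy
    rewards = reward(actions)
    # Score-function estimator: grad_theta E[R] ≈ mean( R * grad_theta log p_theta(a) )
    grad_log_p = (actions - theta) / sigma ** 2
    grad_estimate = np.mean(rewards * grad_log_p)
    theta += lr * grad_estimate                          # gradient ascent on estimated E[reward]
print(theta)                                             # should end up near 3.0
```

The same trick extends to multi-step trajectories, which is where the policy gradient methods listed in the course goals pick up.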
Deep RL Allows Unified Treatment of Problem Classes
Deep RL Frontier
■ Opportunity for theoretical / conceptual advances
  ■ How to explore the state space
  ■ How to have a policy that involves actions with different timescales, or has subgoals (hierarchy)
  ■ How to combine reinforcement learning with unsupervised learning
Deep RL Frontier
■ Opportunity for empirical/engineering advances
  ■ The entire computer vision field now uses deep neural networks for feature extraction, and is moving towards end-to-end optimization of the entire pipeline
[Figure: the CNN architecture from [KSH2012] Krizhevsky, Sutskever, & Hinton, ImageNet Classification with Deep Convolutional Neural Networks, 2012.]
Where is RL Deployed Today
■ Operations research (see, e.g., [1])
  ■ Inventory / storage
    ■ Power grid: when to buy new transformers. Each costs $5M, but failure leads to much bigger costs
    ■ How much of each item to purchase and keep in stock
  ■ Resource allocation
    ■ Fleet management: assign cargos to truck drivers, locomotives to trains
    ■ Queueing problems: which customers to serve first in a call center
RL in Robotics
■ Most industrial robotic systems perform a fixed motion repeatedly, with simple or no perception.
■ Iterative Learning Control [1] is used in some robotic systems: using a model of the dynamics, it corrects errors in trajectories. But these systems still use simple or no perception.

[1] Bristow, Douglas, Marina Tharayil, and Andrew G. Alleyne. A survey of iterative learning control.
Classic Paradigm for Vision-Based Robotics

[Diagram: a processing pipeline from visual sensor data to motor commands.]
Future paradigm?

[Diagram: sensor data (images / lidar) → deep neural net → motor commands; the network illustration is again the CNN architecture from [KSH2012].]
Frontiers in Robotic Manipulation
Frontiers in Robotic Locomotion

Mordatch, Igor, Kendall Lowrey, and Emanuel Todorov. Ensemble-CIO: Full-Body Dynamic Motion Planning that Transfers to Physical Humanoids.
Frontiers in Locomotion

Schulman, Levine, Moritz, Jordan, Abbeel (2015). Trust Region Policy Optimization.
Where Else Could Deep RL Be Applied?
Outline for Next Lectures
■ Mon 8/31: MDPs
■ Weds 9/2: neural nets and backprop
■ Mon 9/9: policy gradients
Brushing up on RL: refs
■ MDP review
  ■ Sutton and Barto, ch. 3 and 4
  ■ See Andrew Ng’s thesis, ch. 1-2, for a nice concise review of MDPs
Reinforcement Learning Textbooks
■ Sutton & Barto, Reinforcement Learning: An Introduction
■ Bertsekas, Dynamic Programming and Optimal Control
  ■ Vol. 1, ch. 6: survey of some of the most useful practical approaches for control, e.g. MPC, rollout algorithms
  ■ Vol. 2 (Approximate Dynamic Programming, 3rd ed.): linear and otherwise tractable methods for solving for value functions, policy iteration algorithms