RL Lecture 1 - Introduction (IITH)
Easwar Subramanian
TCS Innovation Labs, Hyderabad
Email : [email protected]
Outline
▶ Introduction
▶ Historical Notes
▶ Course Logistics
Classification
Figure Source: Aura Portal - AI/ML Blog
Unsupervised Learning
▶ Data : (x) → Only data; No label
▶ Goal: Learn underlying structure
▶ Techniques : Clustering (a minimal sketch follows below)
Clustering
Figure Source: Aura Portal - AI/ML Blog
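To make the clustering idea concrete, here is a minimal sketch (not from the original slides) that fits k-means on made-up, unlabeled data using scikit-learn; the synthetic two-blob data, the seed, and the number of clusters are all assumptions for illustration.

```python
# A minimal clustering sketch on made-up data; two Gaussian blobs are
# generated without labels, and k-means recovers the two groups.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),   # blob around (0, 0)
                    rng.normal(3.0, 0.5, size=(50, 2))])  # blob around (3, 3)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.cluster_centers_)  # approximate centres of the two blobs
print(kmeans.labels_[:5])       # cluster assignment of the first few points
```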
Reinforcement Learning
▶ Data : Agent interacts with environment to collect data
▶ Goal : Agent learns to interact with environment to maximize a utility
▶ Examples : Learn a task, Navigation
▶ Task : Start from square S and reach square G in as few moves as possible
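A minimal sketch of such a navigation task is given below; the 4x4 grid, the start and goal squares, and the random policy are assumptions for illustration. The point is that an agent acting randomly needs far more moves than necessary, and learning is about closing that gap.

```python
# A toy version of the S-to-G gridworld task; grid size, start, and goal
# are assumed for illustration. A random (untrained) agent wanders a lot.
import random

SIZE = 4                      # assumed 4x4 grid
START, GOAL = (0, 0), (3, 3)  # S in one corner, G in the opposite corner
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(pos, action):
    """Apply a move, clipping at the grid border."""
    dr, dc = MOVES[action]
    r = min(max(pos[0] + dr, 0), SIZE - 1)
    c = min(max(pos[1] + dc, 0), SIZE - 1)
    return (r, c)

random.seed(0)
pos, n_moves = START, 0
while pos != GOAL:            # random policy; RL would learn to do better
    pos = step(pos, random.choice(list(MOVES)))
    n_moves += 1
print(f"Reached G in {n_moves} moves (optimal is 6).")
```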
Reinforcement Learning
▶ Generally, the agent makes a sequence of decisions (or actions)
▶ Actions affect future observations
▶ Actions taken have consequences
Agent
▶ Executes an action upon receiving an observation
▶ Receives an appropriate reward for the action taken
Environment
▶ An external system that an agent can perceive and act on.
▶ Receives an action from the agent and, in response, emits an appropriate reward and the (next) observation
State
▶ State can be viewed as a summary or an abstraction of the past history of the system
⋆ For example, in Tic-Tac-Toe, the state could be a raw image or a vector representation of the board
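As a small illustration, here is one possible vector encoding of a Tic-Tac-Toe board; the particular scheme (0 for empty, 1 for X, 2 for O, read row by row) is an assumption for the sketch, not necessarily the one used in the lectures.

```python
# One way to turn a Tic-Tac-Toe board into a compact state vector.
# Encoding is assumed: 0 = empty, 1 = X, 2 = O, read row by row.
board = [["X", "O", " "],
         [" ", "X", " "],
         [" ", " ", "O"]]

encoding = {" ": 0, "X": 1, "O": 2}
state = tuple(encoding[cell] for row in board for cell in row)
print(state)   # (1, 2, 0, 0, 1, 0, 0, 0, 2) -- a summary of play so far
```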
Reward
▶ Reward is a scalar feedback signal
▶ Indicates how well the agent acted at a given time
▶ The agent’s aim is to maximise cumulative reward
▶ Delayed feedback : the consequence of an action may only be seen much later
▶ Stochastic environment : the same action can lead to different outcomes
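To tie these definitions together, below is a minimal sketch of the agent-environment interaction loop; the toy environment, its reward rule, the discount factor gamma = 0.9, and the random policy are all assumptions for illustration. The agent accumulates the discounted sum of rewards r_0 + gamma*r_1 + gamma^2*r_2 + ..., which is the cumulative quantity it would try to maximise.

```python
# A minimal agent-environment loop; the environment dynamics, rewards,
# and discount factor below are illustrative assumptions.
import random

GAMMA = 0.9                   # discount factor (assumed)

def env_step(state, action):
    """Toy environment: reward +1 when the action matches the state's parity."""
    reward = 1.0 if action == state % 2 else 0.0
    next_state = random.randint(0, 9)            # stochastic transition
    return next_state, reward

def agent_act(observation):
    """A trivial (random) policy; learning would improve on this."""
    return random.choice([0, 1])

random.seed(0)
state, ret, discount = 0, 0.0, 1.0
for t in range(10):           # one short episode
    action = agent_act(state)                # agent acts on its observation
    state, reward = env_step(state, action)  # env emits reward + next obs
    ret += discount * reward                 # accumulate discounted reward
    discount *= GAMMA
print(f"Cumulative (discounted) reward: {ret:.3f}")
```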
Tic-Tac-Toe
▶ Outcomes are partly random and partly under the control of the decision maker
▶ Markov Decision Process (MDP) (Bellman, 1957) is used as a framework to model and solve sequential decision problems
▶ People working in control theory have contributed to optimal sequential decision
making
▶ The temporal difference (TD) thread and the optimal control thread were brought together by Watkins (1989) when he proposed the famous Q-learning algorithm (a sketch of its update rule follows this slide)
▶ Gerald Tesauro (1992) employed TD learning to play backgammon; the resulting software agent was able to beat expert players
Figure Source: https://www.linuxjournal.com/article/11038
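Since Q-learning is referred to above, here is a sketch of Watkins' tabular update rule, Q(s,a) <- Q(s,a) + alpha * [ r + gamma * max_a' Q(s',a') - Q(s,a) ]; the learning rate, discount factor, action set, and the single toy transition are assumptions for illustration.

```python
# A sketch of Watkins' tabular Q-learning update; alpha, gamma, and the
# toy transition below are illustrative assumptions.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9       # learning rate and discount (assumed)
ACTIONS = [0, 1]
Q = defaultdict(float)        # Q[(state, action)], initialised to 0

def q_update(s, a, r, s_next):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

q_update(s=0, a=1, r=1.0, s_next=2)   # one hypothetical transition
print(Q[(0, 1)])                      # 0.1 after this single update
```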
Era of Deep (Reinforcement) Learning
▶ Prerequisites
⋆ Probability
⋆ Linear Algebra
⋆ Machine Learning
⋆ Deep Learning
▶ Programming Prerequisites
⋆ Good Proficiency in Python
⋆ TensorFlow / Theano / PyTorch / Keras
⋆ Other Associated Python Libraries
▶ Mode
⋆ In class lectures at LHC-3 (possibly recorded for MDS students)
▶ Timing
⋆ Saturday - 10.00 AM to 1.00 PM (??)
▶ Course Co-ordinator
⋆ Prof. Konda Reddy
▶ Most concepts, ideas, and figures that form part of the course lectures are drawn from several sources across the web; most of these are listed as course material
▶ Care has been taken to provide appropriate attribution; omissions, if any, are unintentional and regretted