
Intro to Reinforcement Learning

prof. Carlo Lucibello
Department of Computing Sciences
Bocconi University
Machine learning paradigms

● Supervised learning: “learn to predict”

● Unsupervised learning: “learn the representation”

● Reinforcement learning: “learn to control”


Supervised Learning Example: diagnosing heart abnormalities

● ML approach: teach the computer program through examples

● Ask cardiologists to label a large number of recorded ECG signals

● The learning algorithm adjusts the program/model so that its predictions agree with the cardiologists’ labels
Supervised Learning

Training set of input-output pairs: D = {(x_1, y_1), …, (x_n, y_n)}

Model:
● deterministic prediction: y = f_θ(x)
● probabilistic prediction: p_θ(y | x)
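To make the supervised setting concrete, here is a minimal sketch in PyTorch (the library used in the Colab notebook linked at the end). The data, feature size, and architecture below are placeholders, not the actual ECG model.

import torch
import torch.nn as nn

# Toy training set of input-output pairs (x_i, y_i): random placeholders
# standing in for ECG features labelled by cardiologists.
X = torch.randn(256, 16)            # 256 examples, 16 features each
y = torch.randint(0, 2, (256,))     # binary label: normal / abnormal

# Model f_theta: outputs class logits, i.e. a probabilistic prediction p_theta(y | x)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)     # disagreement between predictions and labels
    loss.backward()                 # adjust theta to reduce the disagreement
    opt.step()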
Reinforcement Learning
Games
Trading
Robotics
Reinforcement Learning in nature
Reinforcement Learning basics

Two key elements:


• Learning via trial and error
• No supervision
MAIN INGREDIENTS

● Agent: It’s me, Mario!

● State: observations about the world

● Actions: decisions on what to do next

● Rewards: positive or negative consequences of the actions

● Learning: learn to choose the action that gets the largest reward!
Basic set-up

● An agent repeatedly interacts with and learns from a stochastic environment, with the goal of maximizing cumulative rewards.

● “Trial-and-error” method of learning.

● Algorithmic principles motivated by psychology and neuroscience: rewards provide a positive reinforcement for an action.
Key features

Lack of a “supervisor”
● No labels telling us the best action to take
● We only have a reward signal

Delayed feedback
● The effect of an action may not be entirely visible instantaneously; it may affect the reward signal many steps later

Sequential decisions
● The sequence in which you make your moves determines the path you take and hence the final outcome

Actions affect observations
● Observations are a function of the agent’s own actions, which the agent may decide based on its past observations
Agent & Environment
State (Level Info)


Action (Up, Down, Left, Right, …)

Rewards:
● Finish the level
● Still in game
● Game over
● Get a coin
What do we want to learn?

Given the current state and the possible rewards, what is the best action we can choose?

The optimal POLICY!
LET'S FORMALIZE ALL THIS MATHEMATICALLY
State -> Action -> Reward -> State (SARS)

State: a vector of numbers
Action: a discrete choice
Rewards:
● Finish the level: +100
● Still in game: +0.1 per second
● Game over: -10
● Get a coin: +1
Return: cumulative reward

G_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + …

Future rewards are weighted by the discount factor γ ∈ [0, 1]: the further away a reward is, the less it contributes to the return today.
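As a small illustration, a Python sketch of how the discounted return could be computed from a sequence of rewards; the reward values are made up.

def discounted_return(rewards, gamma=0.99):
    """G = r_0 + gamma*r_1 + gamma^2*r_2 + ... (cumulative discounted reward)."""
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G

# Example episode: a bit of time alive, two coins, then game over (illustrative values)
print(discounted_return([0.1, 1.0, 0.1, 1.0, -10.0]))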
HOW CAN WE LEARN?
What do we want to learn?

Given the current state and the possible rewards, what is the best action we can choose?
First approach: Policy Learning

How can we find the best action given a specific state?

We can “build” a function, the so-called POLICY, and try to optimize it.

Deterministic policy: a = π(s)

Stochastic policy: a ~ π(a | s)
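A small illustration of the difference, with made-up states and probabilities: a deterministic policy maps a state to a single action, a stochastic policy maps a state to a probability distribution over actions and samples from it.

import numpy as np

ACTIONS = ["up", "down", "left", "right"]

def deterministic_policy(state):
    # a = pi(s): the same state always gives the same action (toy rule)
    return "right" if state["coin_ahead"] else "up"

def stochastic_policy(state, rng=np.random.default_rng()):
    # a ~ pi(a | s): sample an action from a state-dependent distribution (toy probabilities)
    probs = [0.1, 0.1, 0.1, 0.7] if state["coin_ahead"] else [0.4, 0.2, 0.2, 0.2]
    return rng.choice(ACTIONS, p=probs)

state = {"coin_ahead": True}
print(deterministic_policy(state), stochastic_policy(state))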
Second Approach: Q-Learning

Given a policy, one can compute the average return (total reward) obtained by playing a certain action in a certain state and then continuing to play according to the policy. This is called the Q-function:

Q^π(s, a) = expected return when taking action a in state s and then following π

One can then use the Q-function to improve the policy, then compute a new Q-function, and so on.

E.g. the greedy policy derived from the Q-function: π(s) = argmax_a Q(s, a)
HOW TO LEARN THE
Q-FUNCTION?
Q-Learning (Classic Algorithm)

Idea: what if we consider the expected return of each action in each of the different states?

        A1    A2     A3
S1      12     0    -10
S2       4    10   1000

● Q-Learning helps the agent make decisions by estimating the value of different actions in different states

● It learns from experience which is the best action for each state

● You build your policy, then play the game many times and update the Q-function
Q-Table and Q-value
Learning the Q-table (pseudo-code)
Q-value update

Q(s, a) ← Q(s, a) + α · [ r + γ · max_{a'} Q(s', a') − Q(s, a) ]

● α: learning rate
● γ: discount factor
● r: reward obtained after the step
● max_{a'} Q(s', a'): the maximum value of the Q-function over all possible actions in the new state s'
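A minimal tabular Q-learning sketch implementing the update rule above, written for a Gymnasium-style environment (FrozenLake, as in the notebooks linked at the end); the hyperparameters are illustrative, and this is a sketch rather than the exact pseudocode from the slide.

import numpy as np
import gymnasium as gym

env = gym.make("FrozenLake-v1", is_slippery=False)
Q = np.zeros((env.observation_space.n, env.action_space.n))  # the Q-table
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount factor, exploration rate

for episode in range(5000):
    s, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection (see the exploration/exploitation slide)
        a = env.action_space.sample() if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        s_next, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = r + gamma * np.max(Q[s_next]) * (not terminated)
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next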
DOES IT REALLY WORK?
Real world case

Chess game:
● State: 10^50
● Action: 30-40 (legal ones) or 2^18 (possible)
● Transitions to a new state are deterministic, but depend on the adversary
● Rewards: 0 for each intermediate step, {-1, 0, 1} at the end
Real world case

GO game:
● State: 3^361 (possible) or 10^170 (legal)
● Action: 200 (average) or 361 (beginning)
● Transitions to a new state are deterministic, but depend on the adversary
● Rewards: 0 for each intermediate step, {-1, 0, 1} at the end
Real world case

Texas Hold ‘em:
● State: 10^31 × 4^8 (9 players)
● Action: 4 (fold, raise, call, check)
● Transitions to a new state are stochastic and depend on the adversary
● Rewards: 0 for each step, {−1, 0, 1} at the end
Real world case

Tram in Milano:
● State: 18 × 30 × 2 (lines × stops × (at or going to))
● Action: 100^3 (#tram^#state)
HOW DO WE DEAL WITH COMPLEX SETTINGS?
Deep Q-Learning

Constructing this function exactly, even using Q-learning, can be an impossible task due to the large state space.

However, we can approximate it.

And the best way to do this is to use a neural network…
Deep Q-Learning
Deep Q-Learning (pseudocode)
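A compressed sketch of the deep Q-learning idea: a network Q_θ(s) outputs one value per action and is trained to match the bootstrapped target r + γ·max_{a'} Q(s', a'). Real implementations (including the one in the Colab notebook) add an experience replay buffer and a target network, which are omitted here; the sizes and the batch of transitions below are fake placeholders.

import torch
import torch.nn as nn

n_state_features, n_actions, gamma = 8, 4, 0.99   # placeholder sizes

# Q-network: maps a state vector to one Q-value per action
q_net = nn.Sequential(nn.Linear(n_state_features, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def td_loss(s, a, r, s_next, done):
    # Regression towards the target r + gamma * max_a' Q(s', a')
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)          # Q(s, a) for the chosen actions
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max(dim=1).values * (1 - done)
    return nn.functional.mse_loss(q_sa, target)

# One illustrative update on a fake batch of 32 transitions (s, a, r, s', done)
s      = torch.randn(32, n_state_features)
a      = torch.randint(0, n_actions, (32,))
r      = torch.randn(32)
s_next = torch.randn(32, n_state_features)
done   = torch.zeros(32)

loss = td_loss(s, a, r, s_next, done)
opt.zero_grad(); loss.backward(); opt.step()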
A trained Q-network playing Super Mario!
AI Olympics
Tag Game - Evolving Strategies
It works!!! (Funny examples)
It works!!! (Funny examples)
Tricks and Problems
Exploration vs Exploitation

● Exploration: try new actions to gather information about the environment

● Exploitation: choose actions based on the current knowledge to maximize rewards

A balance between exploration and exploitation is essential!


Exploration vs Exploitation: the epsilon-greedy policy

With probability ε take a random action (explore); otherwise take the action with the highest estimated Q-value (exploit).
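A minimal sketch of the rule, assuming a Q-table with one row per state as in the tabular algorithm above.

import numpy as np

def epsilon_greedy(Q, state, epsilon=0.1, rng=np.random.default_rng()):
    # With probability epsilon explore (random action), otherwise exploit (best known action)
    n_actions = Q.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))   # exploration
    return int(np.argmax(Q[state]))           # exploitation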
Getting stuck against walls?

Global (or Delayed) Rewards: instead of looking at the reward at every instant, look at all the rewards of the episode together.

Delayed Rewards vs. Delayed Punishment


Let's train our Neural Networks using PyTorch and Google Colab!

https://tinyurl.com/BocconiRL
Other Links

https://tinyurl.com/BocconiFrozenLake

https://tinyurl.com/FrozenLakeTEO

https://huggingface.co/learn/deep-rl-course/unit0/introduction
