Q-learning
Consider the grid-world given below and an agent who is trying to learn the optimal policy.
Rewards are only awarded for taking the Exit action from one of the shaded states. Taking this
action moves the agent to the Done state, and the MDP terminates. Assume γ = 1 and α = 0.5 for
all calculations. Where necessary, write your equations explicitly in terms of γ and α.
1. The agent starts from the top-left corner, and you are given the following episodes from runs
of the agent through this grid-world. Each line in an episode is a tuple (s, a, s′, r).
Fill in the following Q-values, obtained by direct evaluation from the samples:
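As a rough illustration of what direct evaluation computes, here is a minimal Python sketch: it averages the observed returns following each (state, action) pair over the sampled episodes. The episode data in the sketch is a made-up placeholder, not the episodes from this worksheet.

```python
from collections import defaultdict

gamma = 1.0

# Each episode is a list of (s, a, s_next, r) tuples, as in part 1.
# Hypothetical placeholder data only, NOT the worksheet's episodes:
episodes = [
    [((1, 1), "E", (1, 2), 0),
     ((1, 2), "Exit", "Done", 10)],
]

returns = defaultdict(list)
for episode in episodes:
    g = 0.0
    # Walk backwards so g accumulates the return from each step onward.
    for (s, a, s_next, r) in reversed(episode):
        g = r + gamma * g
        returns[(s, a)].append(g)

# Direct evaluation: Q(s, a) is the average observed return after taking a in s.
Q = {sa: sum(gs) / len(gs) for sa, gs in returns.items()}
print(Q)
```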
2. Q-learning is an online algorithm for learning optimal Q-values in an MDP with unknown rewards
and transition function. The update equation is:

Q(s_t, a_t) ← (1 − α) Q(s_t, a_t) + α [ r_t + γ max_{a′} Q(s_{t+1}, a′) ],

where γ is the discount factor, α is the learning rate, and the sequence of observations is
(…, s_t, a_t, s_{t+1}, r_t, …).
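A minimal Python sketch of this update with γ = 1 and α = 0.5, assuming a Q-table that defaults to 0; the action set and the sample transition at the end are assumptions for illustration, not facts from the worksheet.

```python
from collections import defaultdict

gamma, alpha = 1.0, 0.5
Q = defaultdict(float)                      # Q-values default to 0
actions = ["N", "S", "E", "W", "Exit"]      # assumed action set for this grid-world

def q_learning_update(s, a, s_next, r, terminal):
    # The terminal Done state contributes no future value.
    best_next = 0.0 if terminal else max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)

# Hypothetical sample transition, not taken from the worksheet's episodes:
q_learning_update((3, 3), "Exit", "Done", 100, terminal=True)
```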
(a) Fill in the following Q-values obtained after running Q-learning on the episodes from part 1:

Q((2, 1), E) =
Q((2, 2), E) =
Q((2, 2), S) =
Q((2, 3), N ) =
Q((2, 3), S) =
Q((3, 1), S) =
(b) Given the episodes in part 1, fill in the time at which the following Q-values first become
non-zero. Your answer should be of the form (episode#, iter#), where iter# is the Q-learning
update iteration in that episode.
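One way to answer questions like this mechanically is to replay the episodes through Q-learning and record when each Q-value first becomes non-zero. The sketch below is illustrative only; the episode format and action set are assumptions, not taken verbatim from part 1.

```python
from collections import defaultdict

gamma, alpha = 1.0, 0.5
actions = ["N", "S", "E", "W", "Exit"]      # assumed action set

def first_nonzero_times(episodes):
    """Replay episodes with Q-learning updates and record (episode#, iter#)
    for the first update that makes each Q(s, a) non-zero (both 1-indexed)."""
    Q = defaultdict(float)
    first = {}
    for ep, episode in enumerate(episodes, start=1):
        for it, (s, a, s_next, r) in enumerate(episode, start=1):
            best_next = 0.0 if s_next == "Done" else max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
            if Q[(s, a)] != 0 and (s, a) not in first:
                first[(s, a)] = (ep, it)
    return first
```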
3. Repeat with SARSA. The update equation is:

Q(s_t, a_t) ← (1 − α) Q(s_t, a_t) + α [ r_t + γ Q(s_{t+1}, a_{t+1}) ],

where a_{t+1} is the action actually taken from state s_{t+1}.
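A minimal sketch of the SARSA update with the same γ and α, mirroring the Q-learning sketch above but bootstrapping from the action actually taken next rather than the max over actions.

```python
from collections import defaultdict

gamma, alpha = 1.0, 0.5
Q = defaultdict(float)  # Q-values default to 0

def sarsa_update(s, a, s_next, r, a_next, terminal):
    # Bootstrap from the action a_next actually taken in s_next, not the max.
    next_q = 0.0 if terminal else Q[(s_next, a_next)]
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * next_q)
```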
(a) Fill in the following Q-values obtained after running SARSA on the episodes from part 1:

Q((2, 1), E) =
Q((2, 2), E) =
Q((2, 2), S) =
Q((2, 3), N ) =
Q((2, 3), S) =
Q((3, 1), S) =
(b) Given the episodes in part 1, fill in the time at which the following Q-values first become
non-zero.