MAS-Lab7-QFA
- Lab 7 -
Q-Learning with Linear Value Function Approximation
Q-Learning Recap
● The action-value function explicitly stores the value of executing an action in a given state: q(s, a)

  $q_\pi(s, a) = E_\pi\left[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s, A_t = a\right] = E_\pi\left[\sum_{\tau = t+1}^{\infty} \gamma^{\tau - t - 1} R_\tau \,\middle|\, S_t = s, A_t = a\right]$
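  As a purely illustrative example: with $\gamma = 0.9$ and a trajectory that collects rewards $R_{t+1} = R_{t+2} = R_{t+3} = 1$ and then terminates, $q_\pi(s, a) = 1 + 0.9 + 0.81 = 2.71$.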
● Instance of model-free learning – i.e. the environment dynamics are unknown to the agent
● So far we have tackled environments where the number of states is small enough to use a tabular representation of the Q-function
Q-Learning Recap
● Many real-world problems have enormous state and/or action spaces (e.g. robotics control, self-driving)
● A tabular representation is no longer appropriate
● Idea: use a function to approximate the value
Q-Learning with Linear Value Function Approximation – General Formulation
● Use features to represent state and action: $x(s, a) = \big(x_1(s, a), x_2(s, a), \dots, x_n(s, a)\big)^T$
● Q-function represented as a weighted linear combination of features:

  $\hat{Q}(s, a, w) = x(s, a)^T w = \sum_{j=1}^{n} x_j(s, a)\, w_j$

● Learn the weights w through stochastic gradient descent updates on the objective

  $\nabla_w J(w) = \nabla_w E_\pi\left[\left(Q_\pi(s, a) - \hat{Q}(s, a, w)\right)^2\right]$
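Expanding this gradient for the linear case makes the resulting update explicit (the factor of 2 is conventionally absorbed into the learning rate α); a short derivation using only the definitions above:

  $\nabla_w J(w) = -2\, E_\pi\left[\left(Q_\pi(s, a) - \hat{Q}(s, a, w)\right) \nabla_w \hat{Q}(s, a, w)\right] = -2\, E_\pi\left[\left(Q_\pi(s, a) - \hat{Q}(s, a, w)\right) x(s, a)\right]$

so a single stochastic update moves the weights by

  $\Delta w = \alpha \left(Q_\pi(s, a) - \hat{Q}(s, a, w)\right) x(s, a)$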
Q-Learning with Linear Value Function Approximation – Simplified
● When the action space A is small and finite, consider a featurised representation of states only: $x(s) = \big(x_1(s), x_2(s), \dots, x_n(s)\big)^T$
● Q-function represented as a collection of weighted linear combinations of features – one model per action:

  $\hat{Q}_a(s, w) = x(s)^T w = \sum_{j=1}^{n} x_j(s)\, w_j, \quad \forall a \in A$

● Learn the weights w through stochastic gradient descent updates on the objective

  $\nabla_w J(w) = \nabla_w E_\pi\left[\left(Q_\pi(s, a) - \hat{Q}_a(s, w)\right)^2\right]$
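A minimal NumPy sketch of such a per-action linear estimator; the class name LinearQEstimator and its predict/update interface are assumptions chosen here to match the pseudocode used later, not code supplied with the lab.

import numpy as np

class LinearQEstimator:
    """One linear model per action: Q̂_a(s, w) = x(s)^T w_a."""

    def __init__(self, n_features, n_actions, lr=0.01):
        self.W = np.zeros((n_actions, n_features))  # one weight vector per action
        self.lr = lr

    def predict(self, x):
        # Returns [Q̂_a1(s), ..., Q̂_am(s)] for the featurised state x = x(s)
        return self.W @ x

    def update(self, x, a, target):
        # Semi-gradient SGD step on (target - Q̂_a(s, w))^2, touching only action a's weights
        td_error = target - self.W[a] @ x
        self.W[a] += self.lr * td_error * x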
Q-Learning with Linear Value Function Approximation – TD Target
● For the Q-function, instead of the actual return per episode under the current policy, $Q_\pi(s, a)$, use the TD target $r + \gamma \max_{a'} \hat{Q}(s', a', w)$
● Learn the weights w through stochastic gradient descent updates (the target is treated as a constant when differentiating, i.e. a semi-gradient update):

  $\nabla_w J(w) = \nabla_w \left(r + \gamma \max_{a'} \hat{Q}_{a'}(s', w) - \hat{Q}_a(s, w)\right)^2$

  $\Delta w = \alpha \left(r + \gamma \max_{a'} \hat{Q}_{a'}(s', w) - \hat{Q}_a(s, w)\right) x(s)$
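Using the (assumed) LinearQEstimator sketched earlier, a single semi-gradient Q-learning step then reads as follows, where x and x_next are the feature vectors of the current and next state, a is the action taken, r the reward and gamma the discount factor (all placeholder names for this sketch).

q_next = estimator.predict(x_next)       # [Q̂_a'(s', w) for every action a']
td_target = r + gamma * np.max(q_next)   # r + γ max_a' Q̂_a'(s', w)
estimator.update(x, a, td_target)        # Δw_a = α (td_target − Q̂_a(s, w)) x(s)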
Q-Learning, Linear Approximation, TD target
for each episode do
    s ← initial state, os ← initial observation
    while s not final state do
        pick action a using ε-Greedy(os, estimator, ε)
        execute a → get reward r, next state s' and observation os'
        x(s') = featurize(os')
        [q̂_a1(s'), …, q̂_am(s')] = estimator.predict(x(s'))
        td_target = r + γ max_a' q̂_a'(s')
        estimator.update(x(s), a, td_target)
        s ← s', os ← os'
    end while
end for

procedure ε-Greedy(os, estimator, ε)
    x(s) = featurize(os)
    with probability ε: return a random action from A
    otherwise: return argmax_a q̂_a(s), with [q̂_a1(s), …, q̂_am(s)] = estimator.predict(x(s))
end procedure

● Q-values are adjusted through temporal differences
● Learning is off-policy: the learning policy is greedy, while the play policy allows for exploration
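A possible Python rendering of this loop for any Gymnasium environment with a discrete action space. It assumes the LinearQEstimator sketch above and a featurize(obs) function returning a feature vector of length FEATURE_DIM (one possible featuriser is sketched in the Cartpole section below); the hyperparameter values are placeholders, not part of the lab skeleton.

import numpy as np
import gymnasium as gym

def epsilon_greedy(obs, estimator, epsilon, n_actions, rng):
    # With probability ε pick a random action, otherwise the greedy one
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(estimator.predict(featurize(obs))))

def train(env_id="CartPole-v1", n_episodes=500, gamma=0.99, epsilon=0.1):
    env = gym.make(env_id)
    rng = np.random.default_rng(0)
    n_actions = env.action_space.n
    estimator = LinearQEstimator(n_features=FEATURE_DIM, n_actions=n_actions)
    returns = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done, ep_return = False, 0.0
        while not done:
            a = epsilon_greedy(obs, estimator, epsilon, n_actions, rng)
            next_obs, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # TD target: bootstrap from the next state unless the episode terminated
            q_next = 0.0 if terminated else np.max(estimator.predict(featurize(next_obs)))
            estimator.update(featurize(obs), a, r + gamma * q_next)
            obs, ep_return = next_obs, ep_return + r
        returns.append(ep_return)
    return returns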
OpenAI Gym Cartpole Environment
● Cartpole-v1 environment in Gymnasium:
– Objective: keep a pendulum upright for as long as possible
– 2 actions: left (force = -1), right (force = +1)
– Reward: +1 for every time step that the pole remains upright
– Game ends when the pole is more than 15° from vertical OR the cart moves > 2.4 units from the center
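For reference, creating and stepping this environment with the Gymnasium API looks roughly like the snippet below; the comments describe Gymnasium's standard spaces for CartPole-v1.

import gymnasium as gym

env = gym.make("CartPole-v1")
print(env.observation_space)   # Box(4,): cart position, cart velocity, pole angle, pole angular velocity
print(env.action_space)        # Discrete(2): 0 = push left, 1 = push right

obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
# reward is +1 per step; terminated becomes True once the pole tips too far or the cart leaves the track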
OpenAI Gym Cartpole Environment
● Cartpole-v1 environment in Gymnasium:
– Max number of steps per episode = 100
– Use one Q-function estimator per action: q̂_left, q̂_right
– Use an MLP-like feature extractor (a possible implementation is sketched after this list)
● Sample 4 sets of weights and 4 sets of biases:
– $w^k_{ij} \sim \sqrt{2 \gamma_k}\, N(0, 1)$, $k = 1..4$, $i = 1..100$, $j = 1..4$, with $\gamma_k \in \{5.0, 2.0, 1.0, 0.5\}$
– $b^k_i \sim \mathrm{uniform}(0, 2\pi)$, $k = 1..4$, $i = 1..100$
● Explore three different activation functions: cos(x), sigmoid(x), tanh(x)
● Explore different values of the SGD learning rate
● Explore different values of ε: ε = 0.0, ε = 0.1, ε = decay(init = 0.1, min = 0.001, factor = 0.99)
– Plot agent learning curves for each case
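One possible implementation of the sampled feature extractor described above (random-projection features with the cos activation, in 4 blocks of 100 features following the sampling scheme on this slide); the seed, names and the small ε-decay helper at the end are assumptions for this sketch, not part of the lab skeleton.

import numpy as np

rng = np.random.default_rng(42)          # seed chosen arbitrarily for the sketch
GAMMAS = [5.0, 2.0, 1.0, 0.5]            # gamma_k values from the slide
N_FEATURES, OBS_DIM = 100, 4             # i = 1..100 features per set, j = 1..4 observation dims

# One (W, b) pair per gamma_k: w^k_ij ~ sqrt(2 * gamma_k) * N(0, 1), b^k_i ~ uniform(0, 2*pi)
WEIGHTS = [np.sqrt(2.0 * g) * rng.standard_normal((N_FEATURES, OBS_DIM)) for g in GAMMAS]
BIASES = [rng.uniform(0.0, 2.0 * np.pi, size=N_FEATURES) for _ in GAMMAS]

def featurize(obs, activation=np.cos):
    # Concatenate the four blocks of 100 features: activation(W^k · obs + b^k)
    # Swap in np.tanh or a sigmoid to explore the other activation functions
    return np.concatenate([activation(W @ obs + b) for W, b in zip(WEIGHTS, BIASES)])

FEATURE_DIM = 4 * N_FEATURES             # 400 features in total

def decayed_epsilon(eps, minimum=0.001, factor=0.99):
    # Multiplicative decay applied once per episode, floored at `minimum`
    # (matches the decay(init, min, factor) schedule above)
    return max(minimum, eps * factor)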