
Part 3— Building a deep Q-network to play

Gridworld — Learning Instability and Target


Networks
NandaKishore Joshi
Published in Towards Data Science
5 min read · Dec 5, 2021

In this article, let's understand learning instability, a common problem with deep
reinforcement learning agents. We will solve this problem by implementing a
target network.

Welcome to the third part of the deep Q-network tutorials. This is a continuation of
part 1 and part 2. If you have not read them, I strongly suggest you do, as much of the
code and many of the explanations in this article build directly on what was covered
there.

Till now in part 1 !!

1. We started by understanding what Q-learning is and the formula used to update the Q value.

2. We then looked at the GridWorld game and defined its states, actions, and rewards.

3. Next, we came up with a reinforcement learning approach to win the game.

4. We learnt how to import the GridWorld environment and its various modes.

5. We designed and built a neural network to act as the Q function.

6. We trained and tested our RL agent and got very good results at solving static
GridWorld, but we failed to solve random GridWorld.
In part 2 !!

1. We learnt what catastrophic forgetting is and how it affects the DQN agent.

2. We solved catastrophic forgetting by implementing experience replay.

3. We saw that DRL agents still suffer from learning instability.

In this article we will learn how to implement a target network to get rid of
learning instability.

What is learning instability ??

When the Q-network's parameters are updated after every move, there is a chance of
instability in the network because the reward is very sparse (a significant reward is given
only on winning or losing). Since a significant reward is not available at every step, the
algorithm starts to behave erratically.

For example, suppose that in some state moving ‘up’ wins the game and earns a reward of
+10. The algorithm concludes that the action ‘up’ is good for that state and updates its
parameters to predict a high Q value for it. But in the next game the network again
predicts a high Q value for ‘up’, and this time the move earns a reward of -10. Now the
algorithm thinks the action is bad and updates its parameters again. Then, some games
later, moving up results in a win once more. The outcome is confusion: the predicted Q
value never settles to a reasonably stable value. This is very similar to the catastrophic
forgetting we discussed in the previous article.

Devise a duplicate Q-network called the target network!!

The solution DeepMind devised is to duplicate the Q-network into two copies, each
with its own model parameters: the “regular” Q-network and a copy called the target
network (symbolically denoted Q^-network, read “Q hat”). The target network is
identical to the Q-network at the beginning, before any training, but its own
parameters lag behind the regular Q-network in terms of how they’re updated.
Fig 1 : Q-learning with target network

The above figure shows the general overview for Q-learning with a target network.
It’s a fairly straightforward extension of the normal Q-learning algorithm, except
that you have a second Q-network called the target network whose predicted Q
values are used to backpropagate through and train the main Q-network. The target
network’s parameters are not trained, but they are periodically synchronized with
the Q-network’s parameters. The idea is that using the target network’s Q values to
train the Q-network will improve the stability of the training.

The steps followed in using a target network are:

1. Initialize the Q-network with parameters (weights) θ(Q) (read “theta Q”).

2. Initialize the target network as a copy of the Q-network, but with separate
parameters θ(T) (read “theta T”), and set θ(T) = θ(Q).

3. Use the epsilon-greedy method to select an action a based on the Q values of the
Q-network.

4. Observe the reward r(t+1) and the next state s(t+1) after taking action a.

5. Set the target value to r(t+1) if the episode has just terminated (i.e., the game was
won or lost), or to r(t+1) + γ max Qθ(T)(s(t+1)) otherwise (written out below).

6. Backpropagate this target value through the Q-network. We do not use the
Q-network's own Q value as the target, as this would lead to learning instability.

7. Every C iterations, set the target network's weights equal to the Q-network's
weights.
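Restating step 5 as a formula, the target value Y used to train the Q-network is
(θ(T) denotes the target network's parameters):

$$
Y =
\begin{cases}
r_{t+1}, & \text{if } s_{t+1} \text{ is terminal (the game was won or lost)} \\
r_{t+1} + \gamma \max_a Q_{\theta_T}(s_{t+1}, a), & \text{otherwise}
\end{cases}
$$

In the training code further below, the “otherwise” branch is implemented by multiplying
the discounted maximum Q value with (1 - done).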

Let's see the implementation of the target network using PyTorch:

import copy
import torch

# l1, l2, l3, l4 are the layer sizes defined in part 1 of this series
# (the state input is 64-dimensional and there are 4 possible actions)
model = torch.nn.Sequential(
    torch.nn.Linear(l1, l2),
    torch.nn.ReLU(),
    torch.nn.Linear(l2, l3),
    torch.nn.ReLU(),
    torch.nn.Linear(l3, l4)
)

model2 = copy.deepcopy(model)                 # 1
model2.load_state_dict(model.state_dict())    # 2
sync_freq = 50                                # 3

loss_fn = torch.nn.MSELoss()
learning_rate = 1e-3
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

1 Creates a second model by making an identical copy of the original Q-network model

2 Copies the parameters of the original model into the copy

3 Sets the synchronization frequency; every 50 steps we will copy the parameters of
model into model2
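As a quick sanity check (not part of the original article), we can verify that the target
network starts out identical to the main network but holds its own, independent copies of
the parameters:

# Each parameter tensor holds the same values as in the main network,
# but lives in a separate object, so updating model does not touch model2.
for p_main, p_target in zip(model.parameters(), model2.parameters()):
    assert torch.equal(p_main, p_target)   # identical values at initialization
    assert p_main is not p_target          # but independent tensors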

Let's now build a DQN with experience replay and a target network:

from collections import deque
import random
import numpy as np
import torch
from IPython.display import clear_output

# Gridworld, action_set, epsilon, and gamma are defined in parts 1 and 2 of this series

epochs = 5000
losses = []
mem_size = 1000
batch_size = 200
replay = deque(maxlen=mem_size)
max_moves = 50
h = 0
sync_freq = 500                                                      # 1
j = 0

for i in range(epochs):
    game = Gridworld(size=4, mode='random')
    state1_ = game.board.render_np().reshape(1,64) + np.random.rand(1,64)/100.0
    state1 = torch.from_numpy(state1_).float()
    status = 1
    mov = 0
    while(status == 1):
        j += 1
        mov += 1
        qval = model(state1)
        qval_ = qval.data.numpy()
        if (random.random() < epsilon):          # epsilon-greedy action selection
            action_ = np.random.randint(0,4)
        else:
            action_ = np.argmax(qval_)

        action = action_set[action_]
        game.makeMove(action)
        state2_ = game.board.render_np().reshape(1,64) + np.random.rand(1,64)/100.0
        state2 = torch.from_numpy(state2_).float()
        reward = game.reward()
        done = True if reward > 0 else False
        exp = (state1, action_, reward, state2, done)
        replay.append(exp)                       # store the experience in the replay buffer
        state1 = state2

        if len(replay) > batch_size:
            minibatch = random.sample(replay, batch_size)
            state1_batch = torch.cat([s1 for (s1,a,r,s2,d) in minibatch])
            action_batch = torch.Tensor([a for (s1,a,r,s2,d) in minibatch])
            reward_batch = torch.Tensor([r for (s1,a,r,s2,d) in minibatch])
            state2_batch = torch.cat([s2 for (s1,a,r,s2,d) in minibatch])
            done_batch = torch.Tensor([d for (s1,a,r,s2,d) in minibatch])
            Q1 = model(state1_batch)
            with torch.no_grad():
                Q2 = model2(state2_batch)                            # 2
            Y = reward_batch + gamma * ((1 - done_batch) * torch.max(Q2,dim=1)[0])
            X = Q1.gather(dim=1, index=action_batch.long().unsqueeze(dim=1)).squeeze()
            loss = loss_fn(X, Y.detach())
            print(i, loss.item())
            clear_output(wait=True)
            optimizer.zero_grad()
            loss.backward()
            losses.append(loss.item())
            optimizer.step()

            if j % sync_freq == 0:                                   # 3
                model2.load_state_dict(model.state_dict())

        if reward != -1 or mov > max_moves:      # episode ends on win/loss or too many moves
            status = 0
            mov = 0

losses = np.array(losses)

1 Sets the update frequency for synchronizing the target model parameters to
the main DQN

2 Uses the target network to get the maximum Q value for the next state

3 Copies the main model parameters to the target network

Below is the loss plot of the DQN with target network

Fig 2 : Loss plot with Target Network
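A plot like this can be produced with a short matplotlib snippet (not from the original
article) once training finishes:

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.plot(losses)                 # losses collected during training above
plt.xlabel("Training steps")
plt.ylabel("Loss")
plt.show()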

We can see that the loss has a more stable downward trend. Experiment with the
hyperparameters, such as the experience replay buffer size, the batch size, the
target network update frequency, and the learning rate. The performance can be
quite sensitive to these hyperparameters.
When tested on 1,000 games, we got an improvement of about 3% in accuracy over using
experience replay alone. The accuracy now stands at around 93%.
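The evaluation code is not shown in this article, but a minimal sketch of such a test loop
might look like the following. It assumes the same Gridworld environment, action_set, and
reward scheme (+10 win, -10 loss, -1 per step) used in parts 1 and 2, and it plays greedily
(no epsilon exploration):

def test_model(model, games=1000, max_moves=50):
    wins = 0
    for _ in range(games):
        game = Gridworld(size=4, mode='random')
        state = torch.from_numpy(
            game.board.render_np().reshape(1, 64) + np.random.rand(1, 64) / 100.0
        ).float()
        for _ in range(max_moves):
            with torch.no_grad():
                qval = model(state)
            action = action_set[int(torch.argmax(qval))]   # always pick the highest Q value
            game.makeMove(action)
            reward = game.reward()
            if reward != -1:                               # +10 means we won, -10 means we lost
                wins += reward > 0
                break
            state = torch.from_numpy(
                game.board.render_np().reshape(1, 64) + np.random.rand(1, 64) / 100.0
            ).float()
    return wins / games

print(f"Win rate over 1000 games: {test_model(model):.1%}")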

The entire code for this project can be found in this GIT link

Check out the part 1 of this article here:

https://nandakishorej8.medium.com/part-1-building-a-deep-q-network-to-play-
gridworld-deepminds-deep-q-networks-78842007c631

Check out the part 2 of this article here:

https://nandakishorej8.medium.com/part-2-building-a-deep-q-network-to-play-
gridworld-catastrophic-forgetting-and-experience-6b2b000910d7
