Deep Deformable Q-Network: An Extension of Deep Q-Network
ABSTRACT
The performance of Deep Reinforcement Learning (DRL) algorithms is usually constrained by instability and variability. In this work, we present an extension of Deep Q-Network (DQN) called Deep Deformable Q-Network, which is based on deformable convolution mechanisms. The new algorithm can readily be built on existing models and can be easily trained end-to-end by standard backpropagation. Extensive experiments on the Atari games validate the feasibility and effectiveness of the proposed Deep Deformable Q-Network.

CCS CONCEPTS
• Computing methodologies → Supervised learning by classification;

KEYWORDS
Deep Q-Network, deformable convolution layer, Deep Learning, Reinforcement Learning

ACM Reference format:
Beibei Jin, Jianing Yang, Xiangsheng Huang, and Dawar Khan. 2017. Deep Deformable Q-Network: An Extension of Deep Q-Network. In Proceedings of WI '17, Leipzig, Germany, August 23-26, 2017, 4 pages.
https://doi.org/10.1145/3106426.3109426

1 INTRODUCTION
Reinforcement Learning (RL) learns how to map states to actions so as to obtain the maximum numerical reward signal. In Reinforcement Learning, an agent seeks an optimal policy for a sequential decision making problem [18]. To the best of our knowledge, delayed reward and trial-and-error search are the two most important and distinguishing features of reinforcement learning. Reinforcement Learning has wide application prospects for solving complex control and decision making problems.

During the development of Reinforcement Learning, many algorithms including Q-learning [19], SARSA [18, 20], and policy gradient methods [21] have been introduced to solve RL problems. However, most traditional methods assume that the state space and the action space are discrete, which is inconsistent with the reality that real-world problems are usually high-dimensional. Moreover, these methods rely on manually extracted features to represent the environment, which limits the flexibility of the agent. Recently, deep learning has made great progress in various fields [22–24] because neural networks have efficient generalization ability and powerful abstraction ability. Mnih et al. [1] first put forward deep reinforcement learning and the Deep Q-Network (DQN) by successfully integrating Deep Learning and Reinforcement Learning. DQN presented a remarkably flexible and stable algorithm, showing great success in the majority of games within the Arcade Learning Environment (ALE) [25]. The success of DQN inspires researchers to seek improvements in order to further its learning abilities. In [2], Mnih et al. introduced a mechanism to improve the deep Q-network, which learns successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. Similarly, another extension of DQN is presented in [26], which works on "soft" and "hard" attention mechanisms. Another method is Double DQN, which uses the existing architecture and deep neural network of the DQN to find better policies [6]. Schaul et al. developed a framework [27] for prioritizing experience. This framework replays the important transitions more frequently and therefore learns more efficiently. In [7], Wang et al. presented two separate estimators: one for the state value function and one for the state-dependent action advantage function, which generalizes the learning across actions without imposing any change to the underlying reinforcement learning algorithm.

However, the CNNs used in the Deep Q-Network usually have a fixed shape of receptive field, which is undesirable for high-level layers that encode the semantics over spatial locations and limits them in modeling large, unknown transformations.

In this work, we propose a reinforcement learning method called Deep Deformable Q-Network, based on a deformable convolutional neural network obtained by improving the structure of the neural network in the Deep Q-Network. Different from the conventional Deep Q-Network, the CNNs used in the Deep Deformable Q-Network have a new kind of convolution unit with more diverse forms of receptive fields rather than a fixed shape. Moreover, the shape of the receptive fields can be learned during the training procedure.

The rest of the paper is organized as follows. In Section 2, the specific algorithms are described. Section 3 presents the structure of the network and analyzes the results of the experiments. Conclusions are formulated in Section 4.
2 DEEP DEFORMABLE Q-NETWORK
In Reinforcement Learning (RL), an agent is faced with a sequential decision making problem [1, 2], where interaction with the environment takes place at discrete time steps (t = 0, 1, ...).
Figure 1: The overall network structure diagram. The box above is the illustration of 3×3 deformable convolution. The box at the bottom is the illustration of the network used in this paper.
At time t, the agent observes a state s_t ∈ S and selects an action a_t ∈ A, which results in a scalar reward r_t ∈ R and a transition to the next state s_{t+1} ∈ S. We consider infinite-horizon problems with the discounted cumulative reward objective
$$R_t = \sum_{t'=t}^{T} \gamma^{t'-t} r_{t'},$$
where γ ∈ [0, 1] is the discount factor. The goal of the agent is to find an optimal policy that maximizes its expected discounted cumulative reward.
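For concreteness, the following minimal Python sketch computes this return for a single finite episode; the `discounted_return` helper and its example rewards are illustrative, not part of the original implementation.

```python
def discounted_return(rewards, gamma=0.99):
    """Compute R_t = sum_{t'=t}^{T} gamma^(t'-t) * r_{t'} for t = 0,
    given the reward sequence r_0, ..., r_T of one episode."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# Example: three consecutive rewards of 1.0 give 1 + 0.99 + 0.99^2 = 2.9701
print(discounted_return([1.0, 1.0, 1.0]))
```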
Figure 2: The first plot shows the average maximum predicted action-value on a held-out set of states on Breakout. The third plot shows the average reward per episode on Breakout during training. The statistics were computed by running an ε-greedy policy.
2.1 Deep Q-Learning
We consider the usual Deep Q-Network [1]. A Deep Q-Network is a multi-layered neural network that, for a given state s, outputs a vector of action values Q(s, a; θ), where θ denotes the parameters of the neural network. For an n-dimensional state space and an action space of m actions, the Deep Q-Network is a function from R^n to R^m. The loss function of the Deep Q-Network is as follows [1]:
$$L_i(\theta_i) = \mathbb{E}_{s,a,r,s'}\left[\big(y_i - Q(s, a; \theta_i)\big)^2\right], \qquad (1)$$
where
$$y_i = r_i + \gamma \max_{a'} Q(s', a'; \theta^-). \qquad (2)$$
L_i represents the expected error when the parameters are θ_i, θ^- represents the parameters of a separate target network, and θ_i represents the parameters of the online network. Using the target network improves the stability of the learning updates. The gradient of the loss is as follows [1]:
$$\nabla_{\theta_i} L_i(\theta_i) = \mathbb{E}_{s,a,r,s'}\left[\big(y_i - Q(s, a; \theta_i)\big)\,\nabla_{\theta_i} Q(s, a; \theta_i)\right]. \qquad (3)$$
In order to avoid correlated updates, experience replay with a fixed maximum capacity is introduced to the Deep Q-Network. Transitions from previous episodes are sampled from the replay memory multiple times to update the network, so the divergence issues caused by correlated updates are avoided.
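As a concrete illustration of Eqs. (1)-(2), here is a minimal NumPy sketch of how the targets and the minibatch loss could be computed. The names `online_q` and `target_q` are assumed callables returning batches of Q-values under θ_i and θ^- respectively, and the terminal-state masking via `dones` is an implementation detail not spelled out in the text.

```python
import numpy as np

def dqn_targets(rewards, next_states, dones, target_q, gamma=0.99):
    """y_i = r_i + gamma * max_a' Q(s', a'; theta^-) from Eq. (2).
    `rewards` and `dones` are 1-D arrays; terminal transitions
    (dones == 1) keep only the immediate reward."""
    next_q = target_q(next_states)                  # shape (batch, num_actions)
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)

def dqn_loss(online_q, states, actions, targets):
    """Squared error of Eq. (1), averaged over a sampled minibatch."""
    q_values = online_q(states)                               # (batch, num_actions)
    q_taken = q_values[np.arange(len(actions)), actions]      # Q(s, a; theta_i)
    return np.mean((targets - q_taken) ** 2)
```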
2.2 Deformable Convolution
Generally, a convolution unit has a number of learnable filters, and each filter is convolved with its receptive field. For 2D convolution, the grid R defines the receptive field size and dilation [9]. For example,
$$R = \{(-1, -1), (-1, 0), \ldots, (0, 1), (1, 1)\}$$
defines a 3 × 3 kernel with dilation 1. For each location p_0 on the output feature map y, we have
$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n). \qquad (4)$$
Deformable convolution [9] augments the regular grid R with offsets {Δp_n | n = 1, ..., N}, where N = |R|, and Eq. (4) becomes
$$y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n). \qquad (5)$$
The sampling is now performed over the irregular, offset locations p_n + Δp_n. Since the offset Δp_n is typically fractional, Eq. (5) is implemented through bilinear interpolation as
$$x(p) = \sum_{q} G(q, p) \cdot x(q), \qquad (6)$$
where p denotes an arbitrary location (p = p_0 + p_n + Δp_n for Eq. (5)), q enumerates all integral spatial locations in the feature map x, and G is the bilinear interpolation kernel.
The offsets are obtained by applying a convolutional layer over the same input feature map, producing an offset field of the same spatial resolution. The channel dimension of this field is 2N, encoding N 2D offset vectors.
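The following Python sketch illustrates Eqs. (5)-(6) at a single output location. The 3×3 grid, the `bilinear_sample` helper, and the per-location offset list are illustrative names for the mechanism, not the paper's actual layer implementation.

```python
import numpy as np

def bilinear_sample(x, p):
    """Evaluate x(p) at a fractional location p = (py, px) via Eq. (6):
    x(p) = sum_q G(q, p) * x(q), where G is the bilinear kernel."""
    h, w = x.shape
    py, px = p
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    val = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                g = max(0.0, 1 - abs(py - qy)) * max(0.0, 1 - abs(px - qx))
                val += g * x[qy, qx]
    return val

def deformable_conv_at(x, weights, offsets, p0):
    """One output location of Eq. (5): sum_n w(p_n) * x(p0 + p_n + dp_n).
    `weights` holds w(p_n) for the 3x3 grid R; `offsets` holds the learned
    2D offsets dp_n predicted by a separate convolutional layer."""
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]  # the grid R
    out = 0.0
    for n, pn in enumerate(grid):
        p = (p0[0] + pn[0] + offsets[n][0], p0[1] + pn[1] + offsets[n][1])
        out += weights[n] * bilinear_sample(x, p)
    return out
```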
3 EXPERIMENT
The proposed algorithm was tested on Breakout, a popular Atari 2600 game. The project was implemented in Keras with a TensorFlow backend.

3.1 Network Architecture
The structure of the network in this work is based on the network in [1]. We adjust the network in [1] and combine deformable convolution layers into it. The overall network structure is illustrated in Figure 1, and Table 1 details the parameters of each layer of the network. The final input representation to the neural network is an 84×84×4 image stacked from 4 frames. The first hidden layer convolves 32 8×8 filters with stride 1 over the input image and applies a subsample of 4×4, followed by a rectifier nonlinearity. The second hidden layer convolves 64 4×4 filters with stride 1, followed by a subsample of 2×2 and a rectifier nonlinearity. The third hidden layer convolves 64 3×3 filters with stride 1, again followed by rectifier units. The final hidden layer is fully connected and consists of 512 rectifier units. The output layer is a fully connected linear layer with a single output for each valid action. A deformable convolution layer can be placed before any convolution layer; the number of filters and the shape of the output of the deformable convolution layer must be consistent with the layer before it. Table 1 is an example with two deformable convolution layers placed before the convolution hidden layers.
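A minimal Keras sketch of this stack is shown below. It assumes a custom `DeformableConv2D` layer class (not part of stock Keras), interprets the 4×4 and 2×2 "subsamples" as strides, and picks one possible placement and filter count for the deformable layers; the authoritative configuration is the paper's Table 1.

```python
from keras.layers import Input, Conv2D, Flatten, Dense
from keras.models import Model

def build_q_network(num_actions, DeformableConv2D=None):
    """Sketch of the Section 3.1 architecture under stated assumptions."""
    frames = Input(shape=(84, 84, 4))            # four stacked 84x84 frames
    x = frames
    if DeformableConv2D is not None:
        # Hypothetical custom layer; filters/output shape should follow Table 1.
        x = DeformableConv2D(filters=4, kernel_size=(3, 3))(x)
    x = Conv2D(32, (8, 8), strides=(4, 4), activation='relu')(x)
    if DeformableConv2D is not None:
        x = DeformableConv2D(filters=32, kernel_size=(3, 3))(x)
    x = Conv2D(64, (4, 4), strides=(2, 2), activation='relu')(x)
    x = Conv2D(64, (3, 3), strides=(1, 1), activation='relu')(x)
    x = Flatten()(x)
    x = Dense(512, activation='relu')(x)
    q_values = Dense(num_actions, activation='linear')(x)  # one Q-value per action
    return Model(inputs=frames, outputs=q_values)
```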
3.2 Hyper-parameters
In our experiments, the discount factor γ was set to 0.99 and the learning rate α was set to 0.00025. The number of steps between target network updates was 10,000. Training was done over 12,000 episodes. The agent was evaluated after every 10,000 steps based on the average reward per episode obtained by running an ε-greedy policy with ε annealed linearly from 1 to 0.1 over the first million frames and fixed at 0.1 thereafter. The size of the experience replay memory was 400,000 tuples. The memory was sampled to update the network every 4 steps with minibatches of size 32. The model was trained using backpropagation through time.
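The exploration schedule described above can be written as a small helper; the function name and the settings dictionary below are just an illustrative summary of the listed values, not code from the original project.

```python
def epsilon_at(frame, eps_start=1.0, eps_end=0.1, anneal_frames=1000000):
    """Linearly anneal epsilon from 1.0 to 0.1 over the first million frames,
    then keep it fixed at 0.1 (Section 3.2)."""
    if frame >= anneal_frames:
        return eps_end
    return eps_start + (eps_end - eps_start) * frame / anneal_frames

HYPERPARAMS = {
    'gamma': 0.99,               # discount factor
    'learning_rate': 0.00025,
    'target_update_steps': 10000,
    'replay_capacity': 400000,   # experience replay tuples
    'update_every': 4,           # environment steps between gradient updates
    'batch_size': 32,
}
```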
3.3 Training and Results
Unlike in supervised learning, accurately evaluating the progress of an agent in reinforcement learning can be challenging. One metric is the total reward the agent obtains in an episode, averaged over a number of games, which we compute periodically during training. The first plot in Figure 2 shows how the average total reward evolves during training on the game Breakout. The average total reward metric tends to be very noisy because small changes to the weights of a policy can lead to large changes in the distribution of states the policy visits [1]. The estimated action-value function Q, which estimates how much discounted reward the agent can obtain, is more stable. We observe relatively smooth improvement in the predicted Q during training and did not experience any divergence issues.
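The smoother metric, the average maximum predicted action-value over a fixed held-out set of states (Figure 2), could be computed as in the sketch below; `online_q` and `holdout_states` are assumed placeholders rather than names from the original code.

```python
import numpy as np

def average_max_q(online_q, holdout_states):
    """Average of max_a Q(s, a) over a fixed held-out set of states,
    the smoother progress metric plotted in Figure 2."""
    q_values = online_q(holdout_states)      # shape (num_states, num_actions)
    return float(np.mean(q_values.max(axis=1)))
```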
4 CONCLUSION AND FUTURE WORK
In this paper, we have presented one way of integrating deformable convolution mechanisms, which give a conventional convolution more freedom, into the structure of the Deep Q-Network. Through experiments, we showed the feasibility and effectiveness of our approach.
In future work, we may dynamically learn when and how to insert deformable convolution layers for the best results. Finally, incorporating deformable convolution layers into on-policy methods such as SARSA and Actor-Critic methods [25] may further improve these algorithms.
REFERENCES
[1] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M., 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
[2] Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G. and Petersen, S., 2015. Human-level control through deep reinforcement learning. Nature, 518(7540), pp.529-533.
[3] Hausknecht, M. and Stone, P., 2015. Deep recurrent Q-learning for partially observable MDPs. arXiv preprint arXiv:1507.06527.
[4] Osband, I., Blundell, C., Pritzel, A. and Van Roy, B., 2016. Deep exploration via bootstrapped DQN. In Advances in Neural Information Processing Systems (pp. 4026-4034).
[5] Sutton, R.S. and Barto, A.G., 1998. Reinforcement learning: An introduction (Vol. 1, No. 1). Cambridge: MIT Press.
[6] Van Hasselt, H., Guez, A. and Silver, D., 2016. Deep reinforcement learning with double Q-learning. In AAAI (pp. 2094-2100).
[7] Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M. and de Freitas, N., 2015. Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581.
[8] Schulman, J., Moritz, P., Levine, S., Jordan, M. and Abbeel, P., 2015. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438.
[9] Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H. and Wei, Y., 2017. Deformable convolutional networks. arXiv preprint arXiv:1703.06211.
[10] Jeon, Y. and Kim, J., 2017. Active convolution: Learning the shape of convolution for image classification. arXiv preprint arXiv:1703.09076.
[11] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and Wojna, Z., 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2818-2826).
[12] Simonyan, K. and Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[13] Krizhevsky, A., Sutskever, I. and Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
[14] Wang, J., Zhou, J., Wonka, P. and Ye, J., 2013. Advances in neural information processing systems. In Neural Information Processing Systems Foundation.
[15] Sutton, R.S. and Barto, A.G., 1998. Introduction to reinforcement learning (Vol. 135). Cambridge: MIT Press.
[16] Stadie, B.C., Levine, S. and Abbeel, P., 2015. Incentivizing exploration in reinforcement learning with deep predictive models. arXiv preprint arXiv:1507.00814.
[17] LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. Nature, 521(7553), pp.436-444.
[18] Sutton, R.S. and Barto, A.G., 1998. Reinforcement learning: An introduction (Vol. 1, No. 1). Cambridge: MIT Press.
[19] Watkins, C.J. and Dayan, P., 1992. Q-learning. Machine Learning, 8(3-4), pp.279-292.
[20] Rummery, G.A. and Niranjan, M., 1994. On-line Q-learning using connectionist systems. University of Cambridge, Department of Engineering.
[21] Sutton, R.S., McAllester, D.A., Singh, S.P. and Mansour, Y., 1999. Policy gradient methods for reinforcement learning with function approximation. In NIPS (Vol. 99, pp. 1057-1063).
[22] Krizhevsky, A., Sutskever, I. and Hinton, G.E., 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
[23] Long, J., Shelhamer, E. and Darrell, T., 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431-3440).
[24] Girshick, R., Donahue, J., Darrell, T. and Malik, J., 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 580-587).
[25] Mnih, V., Badia, A.P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D. and Kavukcuoglu, K., 2016. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning (pp. 1928-1937).
[26] Sorokin, I., Seleznev, A., Pavlov, M., Fedorov, A. and Ignateva, A., 2015. Deep attention recurrent Q-network. arXiv preprint arXiv:1512.01693.
[27] Schaul, T., Quan, J., Antonoglou, I. and Silver, D., 2015. Prioritized experience replay. arXiv preprint arXiv:1511.05952.