
Deep Reinforcement Learning-Based Autonomous Navigation of

Mobile Robots in Smart Factory

A SENIOR DESIGN PROJECT REPORT

Submitted by

01FE21BAR013 – PRUTHVI ASANGI


01FE21BAR028 – RAMPRASAD AVALAKKI
01FE21BAR040 – SATHVIKA ANUGU
01FE21BAR041 – PRAJWAL MAHENDRAKAR

In partial fulfillment of the senior design project

Under the guidance of:

Asst. Prof. Shilpa Tanvashi

Bachelor of Engineering

IN

AUTOMATION AND ROBOTICS

KLE TECHNOLOGICAL UNIVERSITY, HUBBALLI

DECEMBER 2024

DEPARTMENT OF AUTOMATION AND ROBOTICS

Certified that the project work entitled

Deep Reinforcement Learning-Based Autonomous Navigation of Mobile
Robots in Smart Factory

is a bonafide work carried out by

Pruthvi Asangi, Ramprasad Avalakki, Sathvika Anugu, and Prajwal Mahendrakar in partial fulfillment
for the award of the degree of Bachelor of Engineering in AUTOMATION AND ROBOTICS of
KLE Technological University, Hubballi, during the year 2024. It is certified that all
corrections/suggestions indicated for internal assessment have been incorporated in the report
deposited in the departmental library. The project report has been approved as it satisfies the
academic requirements in respect of the project work prescribed for the Bachelor of Engineering
degree.

Signature of the guide Signature of the HOD


Asst. Prof. Shilpa Tanvashi Prof. Sachin Karadgi

Name of the Examiners Signature with Date

1.
2.

ACKNOWLEDGEMENT

We have put sincere effort into this project. However, it would not have been possible without
the kind support and help of many individuals. We would like to extend our sincere thanks to all of
them.

We are highly indebted to our guide, Asst. Prof. Shilpa Tanvashi, for their guidance and constant
supervision, for providing the necessary information regarding the project, and for their support in
completing the project.

We would like to express our gratitude towards all the members of the Department of Automation
and Robotics for their kind cooperation and encouragement, which helped us in the completion of
this project.

We would like to express our special gratitude and thanks to the Head of the Department, Prof. Sachin
Karadgi, for giving us attention and time.

Our thanks and appreciation to all our classmates and everyone who willingly helped us with their
abilities.

01FE21BAR013 01FE21BAR028 01FE21BAR040

01FE21BAR041

ABSTRACT

The Deep Reinforcement Learning-Based Autonomous Navigation of Mobile Robots in Smart
Factory project leverages the DQN algorithm to optimize the movement of TurtleBot3 within a
dynamic environment. Tested extensively in ROS2 and Gazebo, this project demonstrates how
DQN outperforms traditional methods like Nav2 SLAM in terms of navigation efficiency and
adaptability. By integrating advanced reinforcement learning techniques, the system achieves
superior autonomous navigation, essential for modern smart factory operations.

This project aims to enhance operational efficiency by enabling mobile robots to navigate complex
factory layouts autonomously. Utilizing the DQN algorithm allows for real-time decision-making
and path optimization, crucial for maintaining high productivity and flexibility in manufacturing
processes.

Through meticulous design and implementation, the system ensures seamless interaction between
the physical and digital domains. The deployment in ROS2 and Gazebo provides a robust testing
ground, simulating real-world scenarios to validate the effectiveness of the DQN-based navigation
system.

The results showcase the potential of deep reinforcement learning in transforming autonomous
navigation. Compared to Nav2 SLAM, DQN offers improved accuracy, reduced computational
overhead, and better adaptability to changes in the environment. These advancements contribute
significantly to the development of smart factories, where automation and efficiency are paramount.

This project sets a new benchmark in autonomous navigation within smart factories. By harnessing
the power of DQN and rigorous testing in ROS2 and Gazebo, it highlights the future of mobile
robot applications in industrial automation, paving the way for smarter, more efficient production
environments.

TABLE OF CONTENTS

Chapter No Title of the chapter Page No.


Acknowledgement I
Abstract II
List of contents III
List of tables V
List of figures VI
Chapter 1 Introduction 1
Chapter 2 Problem Statement 2
2.1 Motivation 2
2.2 Goals and Objectives 2
Chapter 3 Literature Survey 4
Chapter 4 Methodology 12
4.1 Black Box 12
4.2 Function Structure Diagram 13
4.3 Functions 14
4.4 Software 15
4.5 Building a dynamic environment for the use case 16
4.6 Navigation using SLAM and Nav2 17
4.6.1 Creating a Warehouse Environment in Gazebo 17
4.6.2 Adding Warehouse Elements 17
4.6.3 Customizing the Warehouse 17
4.6.4 Saving and Using the Environment 18
4.6.5 Integrating with ROS2 18
4.6.6 Building a Map and Navigation 18
4.6.7 SLAM and Nav2 19
4.6.8 Setting Navigation Goals 19
4.7 DQN Training 21
4.8 DQN Testing 28
Chapter 5 Results and discussions 29

Chapter 6 Conclusion and future scope 30
References 31
Appendix 32

LIST OF TABLES

Table No. Title of the Tables Page No

Table 4.3 Functions of the System 14

Table 4.4 Software used 15

LIST OF FIGURES

Figure No. Title of the Figures Page No

Fig 4.1 Black Box 12

Fig 4.2 Function Structure Diagram 13

Fig 4.5 Dynamic Environment of Warehouse scenario 16

Fig 4.6.3 Customized Warehouse Environment in Gazebo 17

Fig 4.6.5 Custom Warehouse Environment with TurtleBot3 18

Fig 4.6.6 Map of the Warehouse Environment 19

Fig 4.6.7 Navigation of TurtleBot3 in Warehouse Environment 20

Fig 4.7a No Obstacles 21

Fig 4.7b Static Obstacles 21

Fig 4.7c Dynamic obstacles 21

Fig 4.7d Combination of Static and Dynamic Obstacles 22

Fig 4.7e Launching the Environment 23

Fig 4.7f Setting Goal Points 24

Fig 4.7g Initializing Environment for DQN 25

Fig 4.7h Training in terms of Episodes 26

Fig 4.7i Navigation Metrics Overview 26

Fig 4.7j Goal Navigation Feedback 27

Fig 4.8 Testing in Warehouse Environment 28

CHAPTER 1
INTRODUCTION

The rapidly evolving landscape of industrial automation demands innovative solutions to enhance efficiency,
reduce costs, and streamline operations. In this context, the integration of deep reinforcement learning
(DRL) techniques, specifically the Deep Q-Network (DQN) algorithm, for autonomous navigation of mobile
robots presents a significant advancement. Traditional navigation methods like Nav2 SLAM have shown
limitations in adapting to dynamic environments and optimizing path planning, often resulting in
inefficiencies and increased operational costs.

As industries strive to improve productivity and respond to ever-changing manufacturing requirements, the
ability to deploy mobile robots capable of autonomously navigating complex factory environments becomes
crucial. This project addresses these needs by leveraging the DQN algorithm to enable TurtleBot3 robots to
move efficiently within a simulated environment, tested extensively in ROS2 and Gazebo.

The primary objective of this research is to demonstrate the superiority of DQN over traditional navigation
methods by providing more accurate and adaptable path planning solutions. The DQN algorithm's ability to
learn optimal policies through interactions with the environment allows for real-time decision-making,
essential for maintaining high productivity levels in smart factories.

In modern manufacturing, the need for flexibility, real-time responsiveness, and data-driven decision-
making is paramount. Current automation systems often struggle to meet these demands, leading to
inefficiencies and higher operational costs. This project seeks to fill this gap by introducing an advanced
navigation system that integrates seamlessly with existing factory automation processes.

The implementation of the DQN algorithm within ROS2 and Gazebo provides a robust platform for testing
and validating the system's performance in realistic scenarios. By comparing DQN to Nav2 SLAM, this
research highlights the improvements in navigation accuracy, computational efficiency, and adaptability,
ultimately contributing to the development of more intelligent and efficient smart factories.

Addressing the challenges of autonomous navigation in dynamic and complex environments, this project
sets a new benchmark for mobile robot applications in industrial automation. The integration of deep
reinforcement learning techniques not only enhances the operational capabilities of mobile robots but also
paves the way for future advancements in smart factory technologies.

CHAPTER 2
PROBLEM STATEMENT

"DEEP REINFORCEMENT LEARNING-BASED AUTONOMOUS NAVIGATION OF MOBILE


ROBOTS IN SMART FACTORY"

2.1 Motivation

The motivation behind documenting the Deep Reinforcement Learning-Based Autonomous Navigation of
Mobile Robots project is driven by a strong commitment to advancing the efficiency and capability of smart
factory environments. The project highlights the use of the DQN algorithm to enable TurtleBot3 to navigate
complex environments autonomously, tested within the robust simulation platforms of ROS2 and Gazebo.
The goal is to illustrate how such advanced techniques can outperform traditional navigation methods like
Nav2 SLAM, thus setting new standards in the field of industrial automation.

This project aims to inspire future researchers, engineers, and innovators by sharing the comprehensive
journey of development, including the challenges faced and the solutions found. Documenting this project
not only preserves valuable insights and experiences gained throughout its execution but also serves as a
guide and motivation for those venturing into cutting-edge technologies.

Moreover, this documentation seeks to amplify the impact of this innovation by reaching a broader audience
and fostering a collaborative mindset within the scientific and industrial communities. By detailing the
intricacies of integrating deep reinforcement learning into autonomous navigation, the project hopes to spark
discussions, invite constructive criticism, and encourage further advancements. This continuous cycle of
innovation and improvement is crucial for the evolution of autonomous systems in industrial settings.

Ultimately, the team believes that the knowledge and solutions developed through this project should act as a
beacon, guiding future endeavors toward greater technological achievements and societal benefits.

2.2 Goals and Objectives

1. Optimize Autonomous Navigation: Utilize the DQN algorithm to enhance the autonomous navigation
capabilities of mobile robots, ensuring efficient and accurate movement within complex factory
environments.
2. Performance Benchmarking: Compare the performance of DQN-based navigation with traditional
methods such as Nav2 SLAM to highlight improvements in navigation efficiency, accuracy, and
adaptability.
3. Seamless Integration: Integrate advanced reinforcement learning techniques with ROS2 and Gazebo to
provide a robust and scalable solution for autonomous navigation in smart factories.
4. Real-Time Decision Making: Enable mobile robots to make real-time decisions based on dynamic
environmental conditions, enhancing their ability to navigate and perform tasks autonomously.

5. Reduce Operational Costs: Demonstrate how improved navigation efficiency can lead to reduced
operational costs and increased productivity in industrial settings.
6. Foster Innovation: Encourage ongoing research and experimentation in the field of autonomous
navigation, promoting a research-driven environment that keeps pace with the latest technological
advancements.

This problem statement sets the stage for addressing the challenges of autonomous navigation in smart
factories, with a focus on leveraging deep reinforcement learning to achieve superior performance and
operational efficiency.

CHAPTER 3
LITERATURE SURVEY

[1] This paper presents a comprehensive review examining mobile robot path planning
approaches through deep reinforcement learning (DRL) algorithms. The research analyzes
both value function-based algorithms (DQN and variants) and policy gradient methods (A3C,
DDPG, TRPO, PPO), demonstrating how DRL combines deep learning's perceptual strengths
with reinforcement learning's decision-making capabilities. The paper explores DRL
applications across computer games, video games, and autonomous navigation, while
highlighting that despite promising results in experimental settings, significant challenges
remain in transitioning DRL-based path planning to real-world applications.

[2] This paper introduces PIC4rl-gym, a comprehensive modular framework in
ROS2/Gazebo for training and testing deep reinforcement learning agents specifically for
mobile robot navigation tasks. The research demonstrates a flexible approach to autonomous
navigation by integrating various robotic platforms, sensors, and deep reinforcement learning
algorithms across different navigation scenarios, including point-to-point navigation,
vineyard row navigation, and person monitoring. The framework provides a standardized
testing package and environment to enhance reproducibility and comparability of DRL-based
navigation solutions.

[3] This paper presents a novel autonomous mobile robot navigation system utilizing deep
reinforcement learning (DRL) with depth estimation from a monocular camera. The proposed
system employs a Double Deep Q-Network (DDQN) to derive action policies based on input
states generated from monocular depth images, enabling navigation without reliance on grid
maps or 2D LiDAR. Experimental results demonstrate the system's capability for
autonomous navigation in real-world environments, highlighting the importance of accurate
localization and depth estimation improvements for future enhancements.

[4] This paper presents a comprehensive study on autonomous mobile robot navigation using
deep reinforcement learning (DRL) in warehouse environments. The research employs a
Deep Q-Network (DQN) algorithm to enable autonomous navigation without pre-existing
maps, utilizing LIDAR sensor data and odometry information. Through extensive simulation
and real-world experiments, the study demonstrates the algorithm's capability to navigate
complex environments, avoid obstacles, and reach target locations, highlighting the potential
of DRL for adaptive and flexible robotic navigation in dynamic warehouse settings.

[5] This paper presents a comprehensive approach to autonomous mobile robot navigation
using Deep Deterministic Policy Gradient (DDPG) reinforcement learning integrated with
ROS2 and PyTorch frameworks. By leveraging LiDAR sensors and an advanced Actor-Critic
architecture, the research develops a robust navigation system capable of dynamically
adapting to changing environments. The proposed method demonstrates significant
improvements in navigation accuracy, achieving 88.97% success rate in avoiding obstacles
and navigating complex scenarios, thereby addressing critical challenges in traditional map-
based navigation techniques for autonomous robotic systems.

[6] This paper presents an approach to autonomous mobile robot navigation by integrating the
Twin-Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning algorithm
with ROS2 and Gazebo simulation frameworks. Leveraging advanced LiDAR sensor data
and an innovative Actor-Critic architecture, the research develops a sophisticated navigation
system capable of dynamically exploring and mapping complex, unknown environments. By
addressing critical limitations of traditional reinforcement learning methods, the proposed
approach demonstrates remarkable adaptability, achieving robust obstacle avoidance and
goal-driven exploration with a simulated Pioneer P3DX mobile robot, thereby offering a
transformative solution to autonomous robotic navigation challenges.

[7] This paper presents an actor-critic, model-free algorithm based on the deterministic policy
gradient that can operate over continuous action spaces. Their work primarily focused on
simulated environments, introducing a significant advancement in deep reinforcement
learning by addressing the challenge of learning continuous control policies. The research
demonstrated the potential of using neural networks to directly learn policies for complex
control tasks, particularly in scenarios where traditional reinforcement learning methods
struggled. By proposing a method that could handle continuous action spaces more
effectively, the study opened new avenues for applying deep reinforcement learning to
robotics, autonomous systems, and other domains requiring nuanced, continuous control
strategies. This approach laid the groundwork for subsequent research in deep deterministic
policy gradient (DDPG) methods and inspired further innovations in autonomous navigation
and robotic control.

[8] In this paper, a two-step approach is proposed to develop an optimal action strategy model
for mobile robot navigation using deep reinforcement learning. Initially, the Deep Q Network
(DQN) model is trained in a simulated environment, specifically within the Gazebo platform,
to mitigate the challenges associated with noise and reward delays often encountered in real-
world scenarios. Once the DQN is well-trained, it is applied to a real mobile robot for
autonomous navigation. The system architecture includes modules for image acquisition and
preprocessing, value function acquisition, and action selection, where the robot utilizes real-
time RGB images as input to determine the optimal actions for obstacle avoidance and target
reaching. The experimental results demonstrate the effectiveness of the proposed method in
both simulated and real environments, confirming its adaptability and robustness in
autonomous navigation tasks.

[9] This paper presents a goal-driven autonomous exploration system using deep
reinforcement learning that enables mobile robots to navigate unknown environments by
combining a global navigation strategy with a Twin Delayed Deep Deterministic Policy
Gradient (TD3) neural network. The approach learns a generalized motion policy through
simulation, allowing robots to extract points of interest, select optimal waypoints, and
generate continuous action decisions based on laser sensor data. Experimental results
demonstrate the system's effectiveness in autonomously exploring complex static and
dynamic environments, overcoming local optimum problems without prior map information.

[10] Autonomous robot exploration in large-scale environments has been traditionally
approached through frontier-based, sampling-based, and information-based methods, each
facing computational and adaptability challenges. Recent advancements in Deep
Reinforcement Learning (DRL) have introduced novel strategies for robotic navigation,
enabling more adaptive and efficient exploration techniques. Current DRL methods
predominantly utilize frontier points and LiDAR-based exploration, but often struggle with
obstacle avoidance and computational overhead. This paper proposes an end-to-end LiDAR-
based Active SLAM system using deep reinforcement learning that addresses these
limitations by integrating point cloud and map information with an innovative neural network
architecture.

[11] The paper by Ravi Raj and Andrzej Kos presents a study on mobile robot (MR)
navigation in unknown and complex environments using reinforcement learning (RL),
specifically deep Q-learning (DQN). The authors emphasize the increasing reliance on MRs
across various industries and the necessity for effective control strategies that adapt to
dynamic surroundings. Their method involves training a neural network to learn navigation
policies through trial and error, where the robot receives rewards for actions that lead to goal
attainment while avoiding obstacles. The study demonstrates that their RL-based approach
outperforms traditional navigation techniques in terms of efficiency and safety, particularly in
environments characterized by static and dynamic obstacles. The research highlights the
potential of deep reinforcement learning to enable autonomous navigation without the need
for precise mapping, thereby enhancing the capabilities of MRs in real-world applications.

[12] The paper presents an innovative autonomous navigation system for mobile robots that
integrates deep reinforcement learning (DRL) with sensor data fusion to navigate complex
and unknown environments without relying on pre-existing maps. Utilizing the Soft Actor-
Critic (SAC) algorithm, the system processes fused data from onboard lidar sensors and
cameras to generate motion control strategies, enabling the robot to explore and navigate
effectively. The proposed approach includes a heuristic function to evaluate potential
navigation points, guiding the robot toward its global target while avoiding obstacles.
Experimental results demonstrate that the system outperforms traditional navigation methods
and other DRL algorithms regarding efficiency and adaptability, showcasing its potential for
real-world applications in autonomous navigation.

[13] The paper proposes a novel path-planning method for mobile robots operating in
unknown environments, utilizing deep reinforcement learning (DRL) techniques. It addresses
the limitations of traditional path-planning methods, which often rely on global maps and can
be inefficient in dynamic or complex settings. The proposed approach leverages a model-free
DRL framework to enable robots to learn optimal navigation strategies through interaction
with their environment, thereby improving adaptability and efficiency. The methodology is
validated through simulations, demonstrating its effectiveness in generating efficient paths
while avoiding obstacles and adapting to environmental changes. The results indicate that the
proposed method significantly enhances the performance of mobile robots in real-time path-
planning tasks.

[14] The paper presents a novel approach for visual navigation of mobile robots in complex
environments using distributed deep reinforcement learning (DRL). It addresses the
limitations of traditional map-based navigation methods, particularly in unknown and
dynamic settings, by dividing the environment into distinct regions and employing a
combination of long short-term memory (LSTM) and proximal policy optimization (PPO)
algorithms for local navigation. The proposed model utilizes RGB-D images and target
coordinates as inputs to generate continuous motion outputs, enabling end-to-end navigation
without maps or human intervention. A new reward function is designed to enhance training
efficiency by considering collision avoidance, distance to the target, and robot speed.
Experimental results demonstrate the model's effectiveness, achieving an overall navigation
success rate of 84% in a simulated large-scale complex environment, indicating its potential
for practical applications in autonomous navigation.

[15] The paper presents a novel navigation approach for mobile robots utilizing a Quantile
Regression Deep Q-Network (QR-DQN) within a distributional deep reinforcement learning
framework. Traditional reinforcement learning methods often fail in uncertain environments
due to their reliance on expected values, which can misrepresent real situations, particularly
in the presence of dynamic obstacles. The proposed QR-DQN model leverages depth images
from a Kinect 3D camera to simplify data collection and enhance navigation efficiency. The
methodology includes a robust policy learning architecture, a carefully designed reward
function to promote safe and steady movement, and a distribution parameter calculation that
captures the variability of potential outcomes. Simulation results demonstrate that the QR-
DQN outperforms other deep reinforcement learning models in terms of navigation efficiency
and obstacle avoidance across various scenarios, highlighting its effectiveness.

[16] The paper presents a path planning algorithm for mobile robots in unknown
environments using Deep Reinforcement Learning (DRL), specifically the Proximal Policy
Optimization (PPO) algorithm. The research addresses the challenge of planning a safe,
collision-free path without a global map, relying solely on local information obtained from a
2D Lidar sensor. The path planning problem is modeled as a Partially Observed Markov
Decision Problem (POMDP), where the state space includes the robot's position relative to
the target and Lidar distance data, while the action space consists of target positions for the
robot to reach. Notably, the algorithm improves training efficiency by executing the motion
controller only during testing, rather than throughout the training phase. Simulations
demonstrate the algorithm's effectiveness, achieving a high success rate in navigating
complex environments, and the paper concludes with insights on enhancing DRL-based path
planning efficiency through reward shaping and potential future research directions.

[17] The paper presents a novel approach for the motion control of non-holonomic
constrained mobile robots, focusing on point stabilization using deep reinforcement learning
(DRL). It begins by constructing a kinematic model of the robot, which serves as the
foundation for the DRL framework, incorporating elements such as state space, action space,
and reward functions. The authors employ a deep deterministic policy gradient algorithm to
develop a control law that enables the robot to stabilize at a desired point through iterative
learning. Simulation and experimental results demonstrate the effectiveness of the proposed
method, showing that the robot can successfully reach the target state while adhering to non-
holonomic constraints. The study highlights the potential of DRL in overcoming traditional
control limitations in mobile robotics and suggests future applications in other non-
holonomic systems.

[18] The paper discusses a method for mobile robot path planning using Deep Reinforcement
Learning (DRL) that addresses the challenge of sparse rewards in larger map scenarios.
Traditional path planning methods often struggle with high computational demands and poor
adaptability to dynamic environments. The authors propose a time-sensitive reward function
that enhances the learning process by incorporating various node information and collision
data, thereby improving convergence rates in complex environments, including trap maps.
The method is validated through simulation experiments on multiple maps, demonstrating its
effectiveness compared to existing DRL algorithms. The results indicate that the proposed
time-sensitive reward allows for efficient exploration and quick convergence to feasible
solutions, highlighting its adaptability across different scenarios. Future work may involve
developing an adaptive parameterization for the reward function to further enhance
performance in varying map sizes.

[19] The paper discusses a study on using Multi-Agent Deep Deterministic Policy Gradient
(MADDPG) for controlling mobile robots in flocking scenarios while avoiding obstacles. It
highlights the importance of mobile robots in various applications and their need for effective
navigation and responsiveness to human activities and environmental challenges. The authors
present a reinforcement learning approach that allows robots to learn autonomously in a
simulated environment, focusing on maintaining distances between robots, reaching targets,
and avoiding collisions. The methodology involves defining states, actions, and reward
functions to guide the robots' learning process. Despite implementing the algorithm in
MATLAB/Simulink and conducting extensive simulations, the results indicate that the
proposed approach did not achieve the desired objectives, prompting plans for further
improvements and real-world testing.

[20] The paper explores the use of deep reinforcement learning (DRL) combined with transfer
learning to improve the path planning capabilities of mobile robots in unknown
environments. By implementing a Double Deep Q-Network (DDQN) architecture, the robot
autonomously learns to navigate towards a target while avoiding static and dynamic
obstacles. Transfer learning enables the reuse of trained data from previous models, reducing
the time required for the robot to adapt to new scenarios. Simulations in various
environments demonstrate that this approach enhances the robot’s learning efficiency and
stability, achieving effective path planning and obstacle avoidance with minimal reliance on
manual supervision.

[21] This paper addresses the challenges faced by deep reinforcement learning (RL) in real-
world autonomous robot navigation, such as safety, generalization, and sample efficiency.
The authors identify four key desiderata for effective navigation systems: reasoning under
uncertainty, safety, learning from limited data, and generalization to diverse environments.
They explore various RL techniques, including memory-based neural networks, safe RL
methods, model-based RL, and domain randomization, and evaluate their performance in a
newly developed open-source navigation benchmark. This benchmark simulates complex
obstacle courses and includes real-world environments to assess the effectiveness of different
approaches. The results indicate that while RL methods show promise, they still lag behind
classical navigation systems in safety and generalization, highlighting the need for further
research to improve these aspects in RL-based navigation.

[22] The research paper provides an extensive overview of advancements and challenges in
the field of autonomous mobile robots, particularly focusing on their navigation capabilities.
The authors discuss various critical aspects, including sensor types, mobile robot platforms,
simulation tools, path planning and following, sensor fusion methods, obstacle avoidance
techniques, and simultaneous localization and mapping (SLAM). They emphasize the rapid
evolution of autonomous navigation technologies, driven by deep learning methods, and
highlight the importance of regular literature surveys to keep the research community
informed. The paper also identifies gaps in existing surveys, noting that many have
overlooked modern obstacle avoidance methods and SLAM techniques. The authors
conclude by outlining future research directions, including the deployment of mobile robots
in real-world scenarios, improvements in SLAM methodologies, and the exploration of
unseen environments. Overall, the paper serves as a valuable resource for both experienced
researchers and newcomers in the field of autonomous navigation.

[23]The paper discusses the hierarchical structure of motion planning, which includes
strategic decisions, trajectory planning, and control. The paper highlights the significance of
modeling the environment, state representation, and reward systems in designing effective
DRL systems. It reviews various techniques and frameworks used in the field, including
vehicle dynamics models and simulation environments like SUMO and CARLA. The paper
also categorizes existing research based on different driving tasks, such as car-following,
lane-keeping, and merging in traffic, while addressing challenges and open questions in the
field, particularly regarding safety and the transfer of learned behaviors from simulation to
real-world applications. Overall, it emphasizes the potential of DRL to enhance autonomous
driving capabilities while acknowledging the complexities and limitations that remain to be
addressed.

[24] The paper discusses the significance of DRL in learning complex policies in high-
dimensional environments, particularly in tasks where traditional supervised learning
methods fall short. The authors categorize various autonomous driving tasks that can benefit
from DRL, such as driving policy development, motion planning, and predictive perception.
They also highlight the challenges associated with real-world deployment, including the need
for robust training methods, validation, and testing of DRL systems. The paper further
explores related domains like imitation learning and inverse reinforcement learning,
emphasizing the role of simulators in training autonomous agents. Key contributions include
a detailed literature review, identification of computational challenges, and suggestions for
future research directions in DRL for autonomous driving. Overall, the paper serves as a
foundational resource for researchers and practitioners interested in leveraging DRL for
developing safe and efficient autonomous driving systems.

[25] The research paper discusses advancements in robot navigation using Deep
Reinforcement Learning (DRL) techniques, emphasizing the challenges and solutions
associated with applying these methods in dynamic environments. It highlights the
importance of effective navigation for mobile robots, particularly in complex settings where
traditional methods may fall short. The paper reviews various DRL algorithms, such as Deep
Q-Learning and Deep Deterministic Policy Gradients, and their applications in improving
navigation efficiency and safety. It addresses issues like data inefficiency, transfer
difficulties, and collision avoidance, proposing strategies to enhance the performance of
robotic systems in real-world scenarios. The findings suggest that while DRL offers
significant potential for robot navigation, further research is needed to optimize these
techniques for practical deployment, ensuring reliability and adaptability in diverse
environments.

[26] This paper surveys recent advancements in applying deep reinforcement learning (DRL)
to autonomous driving (AD). It presents a comprehensive view of DRL's role in AD by
examining the entire AD stack, from perception and decision-making to motion control. The
paper highlights key DRL methodologies, discussing models like Safe RL and Hierarchical
Reinforcement Learning (HRL) and how they tackle complex driving scenarios. It
emphasizes challenges such as safety, decision interpretability, and continuous learning,
proposing the use of emerging technologies like Large Language Models and Diffusion
Models to improve real-world AD performance. The authors introduce a new six-mode DRL
framework for AD applications and explore the potential of DRL in handling the
uncertainties and dynamic nature of real-world driving environments.

[27] This study focuses on multiagent collision avoidance in scenarios where agents cannot
communicate. Using deep reinforcement learning, the paper presents a decentralized
algorithm that enables agents to navigate collision-free in shared environments without prior
knowledge of each other's intentions. The algorithm incorporates a value network to estimate
the time to the goal based on each agent’s joint configuration with its neighbors. This method
offloads complex computations to an offline learning process, allowing agents to make
efficient, real-time collision avoidance decisions. The results demonstrate significant
improvements over traditional methods, like the Optimal Reciprocal Collision Avoidance
(ORCA), particularly in scenarios with unpredictable interactions.

[28] This paper explores deep reinforcement learning (DRL) for autonomous navigation of
mobile robots in unknown environments, focusing on collision avoidance and target
detection. Two types of DRL agents, Deep Q-Network (DQN) and Double DQN (DDQN),
are applied to allow robots to autonomously learn navigation skills without relying on pre-
mapped environments. The study finds that the DDQN agent achieves higher accuracy and
navigational efficiency than the DQN. Simulation and real-world testing results confirm that
the DRL-trained robots can effectively navigate towards target objects, handling static and
dynamic obstacles. The paper advocates for DRL’s potential in overcoming the limitations of
traditional map-based navigation methods in dynamic environments.

[29] This paper explores an approach to enable autonomous robots to navigate pedestrian-rich
environments safely and in a socially compliant way. By using a framework called Socially
Aware Collision Avoidance with Deep Reinforcement Learning (SA-CADRL), the study
trains robots to follow social norms like "passing on the right," resulting in smoother
interactions with pedestrians. Unlike conventional methods that treat humans as dynamic
obstacles, this approach models human-like social behavior to create a more natural and
acceptable presence in spaces like airports or shopping malls. The model is validated in both
simulations and real-world environments, where robots successfully navigated while adhering
to human social conventions.

[30] This paper proposes a framework for robots to navigate autonomously in multi-agent environments
without requiring communication. Using deep reinforcement learning, the approach enables
each agent to independently anticipate and avoid potential collisions by predicting the
movements of others around it. This decentralized, non-communicative model is well-suited
for dynamic environments where agents—such as robots or autonomous vehicles—encounter
multiple other moving entities, such as pedestrians, without pre-defined coordination. The
framework emphasizes scalability and adaptability, allowing for real-time applications in
settings where communication constraints or unpredictable agent behaviors complicate
traditional centralized methods.

CHAPTER 4
METHODOLOGY

4.1 Black Box

The core functionality of the Deep Reinforcement Learning-Based Autonomous Navigation System is
represented by the black box, as shown in Fig 4.1. This black box houses inputs such as Robot State, Sensor
Data, and Factory State. The transformation processes take place inside this closed system, producing
outputs like Control Actions, Navigation Path, and Task Execution. This abstract representation highlights
the system's revolutionary potential to optimize smart factory operations, emphasizing the power of the
DQN algorithm to enhance navigation efficiency and adaptability while concealing the intricate details of
the internal workings.

Fig 4.1 Black Box

4.2 Function Structure Diagram

Fig 4.2 Function Structure Diagram

4.3 Functions

Table 4.3 Functions of the System

1. Deep Reinforcement Learning (DRL) Framework: The mobile robot learns optimal navigation strategies by interacting with the environment and receiving rewards or penalties. The DRL algorithm continuously updates its policy for real-time decision-making.

2. Sensor-Based Perception and Data Input: LiDAR sensors help the robot perceive its surroundings and navigate around obstacles by providing data to the DRL model.

3. DRL Training Process: Rewards are given for efficient navigation and avoiding collisions, while penalties are assigned for inefficiencies. Training occurs in a simulated environment using Gazebo, running multiple episodes.

4. Efficient Navigation and Path Planning: DRL allows dynamic path adjustment based on real-time changes, improving collision avoidance and overall navigation efficiency.

5. Smart Factory Operations and Task Efficiency: The DRL-based system enables the robot to autonomously perform tasks, reduce downtime, and improve productivity.

6. Performance Metrics and Evaluation: Tracks learning progress, evaluates efficiency, and assesses adaptability in dynamic environments.

7. Simulation and Deployment: Training and testing occur in Gazebo, with real-time policy deployment to the mobile robot for practical use.

8. DRL Algorithm Implementation: Uses algorithms like DQN, allowing the model to continue learning and improving navigation strategies over time.

4.4 Software

Table 4.4 Software used

1. ROS2 (Robot Operating System 2): Robot operation management. The ROS2 framework provides the backbone for managing robot operations, communication between nodes, and integration of various software modules. It ensures efficient and modular development of the autonomous navigation system.

2. Gazebo: Simulation environment. Gazebo is used to create a realistic simulation environment for testing and training the TurtleBot3. It allows for the safe exploration of navigation strategies and optimization of the DRL model before real-world deployment.

3. Python: Algorithm implementation. Python is utilized to implement the Deep Q-Network (DQN) algorithm and other related machine learning models. It provides the flexibility to develop and test different components of the navigation system efficiently.

4. TensorFlow: Neural network training. TensorFlow is used to develop and train the neural network models required for the DRL algorithm. It allows for the efficient processing and optimization of large datasets during the training phase.

5. Keras: High-level neural networks API. Keras is used alongside TensorFlow to simplify the process of building and training deep learning models. It provides an easy-to-use interface for creating complex neural network architectures.

6. RViz: Visualization tool. RViz is used for visualizing the robot's environment, sensor data, and navigation paths. It provides a graphical interface to monitor and analyze the robot's performance in real-time.
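As a brief illustration of how TensorFlow and Keras are combined in this software stack, the sketch below builds a small fully connected Q-network of the kind typically used for LiDAR-based TurtleBot3 DQN navigation. The state size, action size, layer widths, and learning rate are illustrative assumptions, not the exact configuration used in the project.

```python
# Minimal sketch (assumed sizes, not the project's exact architecture):
# a dense Q-network mapping a laser-scan based state vector to Q-values
# for a small set of discrete TurtleBot3 actions.
from tensorflow import keras
from tensorflow.keras import layers

STATE_SIZE = 26   # e.g. 24 downsampled scan ranges + distance and angle to goal (assumed)
ACTION_SIZE = 5   # discrete angular-velocity choices (assumed)

def build_q_network(state_size=STATE_SIZE, action_size=ACTION_SIZE):
    model = keras.Sequential([
        keras.Input(shape=(state_size,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(action_size, activation="linear"),  # one Q-value per action
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=2.5e-4), loss="mse")
    return model

if __name__ == "__main__":
    build_q_network().summary()
```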

4.5 Building a dynamic environment for the use case

In today's fast-paced logistics landscape, the integration of Automated Mobile Robots (AMRs) and
Automated Guided Vehicles (AGVs) is transforming warehouse operations. In this dynamic environment,
four strategically positioned storage racks are utilized to hold various packages as shown in Figure 4.5, while
a designated area in one corner contains a pile of boxes. This pile serves as the initial point for the AGV,
which is responsible for picking up packages and transporting them to the AMR, specifically the TurtleBot
3. The AMR is equipped with advanced navigation capabilities, allowing it to autonomously maneuver
through the warehouse and receive packages from the AGV.

Once the AGV successfully transfers a package to the TurtleBot 3, the AMR determines the optimal storage
rack for delivery based on factors such as rack availability and storage algorithms. It then navigates to the
selected rack, where it interfaces with an Automated Storage and Retrieval System (ASRS) designed to
optimize space and streamline the storage process. This interaction not only enhances the efficiency of
package handling but also ensures that the warehouse operates smoothly and effectively.

The benefits of this dynamic environment are substantial, including increased operational efficiency,
improved space utilization, and reduced labor costs. By automating repetitive tasks, the warehouse can
maintain continuous operations, allowing human staff to focus on higher-value activities that require critical
thinking and problem-solving. Overall, the integration of AMRs and AGVs exemplifies how technology can
revolutionize warehouse management, making it more flexible and responsive to the ever-changing demands
of the supply chain.

Fig 4.5 Dynamic Environment of Warehouse scenario

4.6 Navigation using SLAM and Nav2

4.6.1 Creating a Warehouse Environment in Gazebo

Gazebo offers a robust platform for designing and simulating realistic environments, making it ideal for
creating virtual warehouses. This guide provides a step-by-step approach to building a warehouse
environment, incorporating essential elements such as shelves, cabinets, and cardboard boxes. It also
explains how to save the environment and integrate it with ROS2 for advanced simulations.

Begin by launching Gazebo from the terminal. The default interface presents an empty world with a ground
plane, serving as the foundation for your warehouse. From the Insert tab on the left panel, you can drag and
drop various built-in models into the environment. These models include common warehouse components
such as shelves, cabinets, and tables.

4.6.2 Adding Warehouse Elements


Gazebo provides pre-built models like shelves for storage racks, cabinets for office areas, and cardboard
boxes to simulate realistic storage conditions. If additional models are required, you can download them
from the official Gazebo model repository. After downloading, extract and place the models in the
~/.gazebo/models directory, and update the GAZEBO_MODEL_PATH in your environment configuration
file for accessibility.

4.6.3 Customizing the Warehouse


Once the basic elements are in place, customization ensures the warehouse layout supports realistic
simulations. Clear aisles should be maintained between shelves for robot navigation as shown in Figure
4.6.3, and proper lighting can be added for better visibility. While adding details enhances realism, be
cautious of performance bottlenecks caused by overloading the simulation with complex models.

Fig 4.6.3 Customized Warehouse Environment in Gazebo

4.6.4 Saving and Using the Environment

After constructing your warehouse, save the environment via File -> Save World As in
Gazebo. For example, saving the file as
/home/sathvika/warehouse_ws/src/my_custom_world3.world allows for easy future access.
You can launch this custom environment directly by running the command:

gazebo /home/sathvika/warehouse_ws/src/my_custom_world3.world

4.6.5 Integrating with ROS2

To enable advanced simulations with ROS2, edit the .world file to include ROS2-compatible
plugins. These plugins enable functionalities such as state management and spawning objects
dynamically as shown in Figure 4.6.5. The launch file of the TurtleBot3 simulation package
can be modified to use your custom warehouse environment. Copy and edit the TurtleBot3
launch file to reference your .world file, and launch it with:

ros2 launch /home/sathvika/warehouse_ws/src/custom_turtlebot3_world.launch.py
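For reference, a minimal sketch of such a custom launch file is shown below. It assumes the Gazebo Classic gazebo_ros package used with TurtleBot3 and only opens the saved world; the actual TurtleBot3 launch files additionally start robot_state_publisher and spawn the robot model, which is omitted here for brevity.

```python
# custom_turtlebot3_world.launch.py (sketch): open the saved warehouse world
# in Gazebo. The world path is the example path used above.
import os
from launch import LaunchDescription
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from ament_index_python.packages import get_package_share_directory

def generate_launch_description():
    gazebo_ros = get_package_share_directory('gazebo_ros')
    world = '/home/sathvika/warehouse_ws/src/my_custom_world3.world'

    # Start the Gazebo server with the custom world file.
    gzserver = IncludeLaunchDescription(
        PythonLaunchDescriptionSource(
            os.path.join(gazebo_ros, 'launch', 'gzserver.launch.py')),
        launch_arguments={'world': world}.items(),
    )
    # Start the Gazebo client (graphical interface).
    gzclient = IncludeLaunchDescription(
        PythonLaunchDescriptionSource(
            os.path.join(gazebo_ros, 'launch', 'gzclient.launch.py')),
    )
    return LaunchDescription([gzserver, gzclient])
```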

Fig 4.6.5 Custom Warehouse Environment with TurtleBot3

4.6.6 Building a Map and Navigation

With the environment set, use SLAM (Simultaneous Localization and Mapping) to generate a map.
Launch the TurtleBot3 simulation and the Cartographer SLAM package to visualize the mapping process
in RViz. Drive the robot around manually using teleop tools to explore the warehouse and complete the
map. Save the map files as .pgm and .yaml for future use with the navigation stack as shown in Figure
4.6.6.

Fig 4.6.6 Map of the Warehouse Environment

4.6.7 SLAM and Nav2

ROS2 Nav2 Stack combines SLAM for mapping with Adaptive Monte Carlo Localization (AMCL) for
determining the robot’s position on the generated map. For navigation, global planners like A* and NavFn
compute efficient routes, while local planners such as the Dynamic Window Approach (DWA) ensure safe,

19
real-time navigation by avoiding obstacles. Cost maps are used to represent environmental details, enabling
smooth path planning and execution.

4.6.8 Setting Navigation Goals

Once the map is generated, use RViz to set the robot’s initial pose and navigation goals. The "2D Pose
Estimate" tool helps define the starting location, while the "Nav2 Goal" tool allows you to set destinations
on the map. The robot will autonomously navigate to the goal as shown in Figure 4.6.7, leveraging the Nav2
Stack's planning and localization capabilities.

By following these steps, a functional warehouse environment is created in Gazebo, integrated with ROS2,
and enabled for autonomous navigation in robotic simulations. This setup is invaluable for testing and
refining algorithms for real-world applications.
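The same initial pose and goal can also be set programmatically for scripted tests. The sketch below uses the nav2_simple_commander helper shipped with recent Nav2 releases; the coordinates are placeholders for the warehouse map, not values from the project.

```python
# Programmatic counterpart of the RViz "2D Pose Estimate" and "Nav2 Goal"
# tools, sketched with nav2_simple_commander (coordinates are placeholders).
import rclpy
from geometry_msgs.msg import PoseStamped
from nav2_simple_commander.robot_navigator import BasicNavigator, TaskResult

def make_pose(navigator, x, y):
    pose = PoseStamped()
    pose.header.frame_id = 'map'
    pose.header.stamp = navigator.get_clock().now().to_msg()
    pose.pose.position.x = x
    pose.pose.position.y = y
    pose.pose.orientation.w = 1.0   # facing along +x for simplicity
    return pose

def main():
    rclpy.init()
    navigator = BasicNavigator()

    navigator.setInitialPose(make_pose(navigator, 0.0, 0.0))  # like "2D Pose Estimate"
    navigator.waitUntilNav2Active()                           # wait for Nav2 servers

    navigator.goToPose(make_pose(navigator, 2.0, 1.5))        # like "Nav2 Goal"
    while not navigator.isTaskComplete():
        feedback = navigator.getFeedback()  # progress info (distance remaining, etc.)

    result = navigator.getResult()
    print('Goal reached' if result == TaskResult.SUCCEEDED else 'Navigation did not succeed')
    rclpy.shutdown()

if __name__ == '__main__':
    main()
```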

Fig 4.6.7 Navigation of TurtleBot3 in Warehouse Environment

4.7 DQN Training

The TurtleBot3 is trained in four different environments: no obstacles, as shown in Figure 4.7a; static
obstacles, as shown in Figure 4.7b; dynamic obstacles, as shown in Figure 4.7c; and a combination of static
and dynamic obstacles, as shown in Figure 4.7d.

Fig 4.7a No Obstacles

Fig 4.7b Static Obstacles

Fig 4.7c Dynamic obstacles

Fig 4.7d Combination of Static and Dynamic Obstacles

The command executed ros2 launch turtlebot3_gazebo turtlebot3_dqn_stage1.launch.py is part of the
Robot Operating System 2 (ROS 2) framework, specifically designed to launch a simulation for the
TurtleBot3 robot using the Gazebo simulation environment. The command starts with ros2, which is the
command-line interface for ROS 2, followed by the launch keyword that indicates the intention to run a
launch file. The package turtlebot3_gazebo contains the necessary configurations and models for simulating
the TurtleBot3 robot. Finally, turtlebot3_dqn_stage1.launch.py is the specific launch file being executed,
which likely sets up a simulation scenario tailored for reinforcement learning tasks using a Deep Q-Network
(DQN).

This file orchestrates the initialization of various nodes and parameters required for the simulation.

Upon executing the command, the system begins by providing log initialization messages. The first output
indicates where all log files generated during this session will be stored, specifically in the user’s home
directory under .ros/log. This is important for debugging and reviewing the simulation's performance later.
The log also states that the default logging verbosity is set to "INFO," meaning that the system will display
informational messages that provide insights into what is happening during the execution of the launch file.
This level of verbosity helps users track the progress and identify any issues that may arise.

Next, the output indicates that the URDF (Unified Robot Description Format) file for the TurtleBot3 Burger
model is being utilized. The URDF file is crucial as it defines the robot's physical structure, including its
components, joints, and dimensions. This information is essential for simulating the robot accurately in the
Gazebo environment. By knowing the robot's configuration, the simulation can replicate its movements and
interactions with the environment realistically.

As the launch file executes, various processes are initiated, each with a unique process ID (pid). The output
shows that the Gazebo server (gzserver), Gazebo client (gzclient), and other nodes like robot_state_publisher
and spawn_entity.py are started. The Gazebo server is responsible for running the simulation, while the
Gazebo client provides a graphical interface for users to interact with the simulation environment. The
robot_state_publisher publishes the robot's state to the ROS network, which is vital for tracking its position
and orientation. Meanwhile, spawn_entity.py is tasked with loading the robot model into the Gazebo
simulation.
The robot_state_publisher outputs messages indicating that it is successfully identifying and publishing
various segments of the robot, such as base_footprint, base_link, and others. This information is critical for
visualizing the robot's state within the simulation and ensuring that its movements and interactions are
accurately represented. Additionally, there is a note about the selected interface being non-multicast-capable,
which is a technical detail indicating that certain network communication features are disabled due to the use
of the loopback interface.

The process of spawning the TurtleBot3 model into the Gazebo environment is indicated by the output from
spawn_entity.py. It confirms that the entity has been successfully loaded, meaning the TurtleBot3 Burger
model is now part of the simulation. This step is crucial for enabling any interactions or behaviors that the
robot will exhibit during the simulation, such as moving, sensing, or responding to commands.

Following the successful spawning of the robot, the Gazebo server provides additional configuration details.
It specifies parameters related to the robot's wheels, such as their separation and diameter, which are vital for
simulating realistic movement. The server also indicates that it is publishing odometry data, which tracks the
robot's movement and position relative to its environment. This data is essential for navigation and for any
algorithms that rely on understanding the robot's location.

Finally, the output concludes with a message indicating that the spawn_entity.py process has finished
cleanly as shown in Fig 4.7e. This signifies that the spawning operation was successful and that the
simulation is now fully set up and ready for interaction. Users can now engage with the Gazebo simulation,
sending commands to the TurtleBot3 and observing its behavior in the simulated environment. This setup is
particularly useful for testing algorithms, such as those based on reinforcement learning, in a controlled and
safe manner before deploying them on a physical robot.

Fig 4.7e Launching the Environment

The command executed is ros2 run turtlebot3_dqn dqn_gazebo 1 as shown in Fig 4.7f, which is part of
the Robot Operating System 2 (ROS 2) framework. This command is used to run a specific node from the
turtlebot3_dqn package, which contains implementations for reinforcement learning algorithms tailored for
the TurtleBot3 robot. In this case, the node being executed is dqn_gazebo, which suggests that it is
responsible for controlling the TurtleBot3 in a Gazebo simulation environment. The final argument, 1,
indicates a specific configuration or mode for the node, such as a particular scenario or seed for random
number generation.

Upon execution, the output begins with a log message indicating that the selected network interface is "lo,"
which refers to the loopback interface. This interface is not multicast-capable, so multicast communication is
disabled. This detail is primarily informational, indicating that certain network features will not be available
during this run. Following this message, the output prints "init!!!," suggesting that the initialization process
for the dqn_gazebo node has started successfully. This initialization is critical for setting up the necessary
parameters and configurations for the algorithm to operate effectively within the simulation.

Finally, the output specifies the goal pose for the TurtleBot3, which is indicated as "0.5 0.0." This means that
the target position for the robot in the simulation is set to coordinates (0.5, 0.0) in the Gazebo environment.
These coordinates likely represent a position in 2D space, with the first value (0.5) indicating the x-
coordinate and the second value (0.0) indicating the y-coordinate. Setting a goal pose is an
essential step in any robotic navigation task, as it defines the target location that the robot will aim to reach
during its operation in the simulation. Overall, this command and its output demonstrate the initialization of
a reinforcement learning scenario for the TurtleBot3 robot within a simulated environment.
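As an illustration of how such a goal pose could be supplied from a node of your own, the sketch below publishes the (0.5, 0.0) target on a goal_pose topic. The topic name and message type are assumptions made for this example and are not taken from the turtlebot3_dqn package's actual interface.

```python
# Hypothetical sketch: publish a goal pose like the (0.5, 0.0) target above.
# The 'goal_pose' topic name and Pose message type are assumptions.
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Pose

class GoalPublisher(Node):
    def __init__(self):
        super().__init__('goal_publisher')
        self.pub = self.create_publisher(Pose, 'goal_pose', 10)
        self.timer = self.create_timer(1.0, self.publish_goal)  # republish once per second

    def publish_goal(self):
        goal = Pose()
        goal.position.x = 0.5   # x-coordinate of the target
        goal.position.y = 0.0   # y-coordinate of the target
        self.pub.publish(goal)
        self.get_logger().info('Goal pose: 0.5 0.0')

def main():
    rclpy.init()
    rclpy.spin(GoalPublisher())

if __name__ == '__main__':
    main()
```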

Fig 4.7f Setting Goal Points

The command ros2 run turtlebot3_dqn dqn_environment as shown in Fig 4.7g is used to execute a
specific node from the turtlebot3_dqn package within the Robot Operating System 2 (ROS 2) framework.
This command is likely part of a reinforcement learning setup for the TurtleBot3 robot. The node being
executed, dqn_environment, is responsible for managing the environment in which the reinforcement
learning algorithm operates.

Upon execution, the output begins with a log message indicating that the selected network interface is "lo,"
which refers to the loopback interface. This interface is not multicast-capable, leading to the disabling of
multicast communication. This message serves as an informational note, indicating that certain network
features will not be available during this run.

Fig 4.7g Initializing Environment for DQN

The initialization of the dqn_environment node is crucial as it sets up the parameters and configurations
necessary for the reinforcement learning process. This environment typically includes defining the state
space, action space, and reward structure that the learning algorithm will use to train the TurtleBot3.

Overall, the command ros2 run turtlebot3_dqn dqn_environment signifies the start of the environment setup
for the reinforcement learning task, allowing the TurtleBot3 to interact with its surroundings and learn from
its experiences in a simulated context.
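In the environment listed in the Appendix, the state is a four-element vector (distance to goal, angle to goal, distance to the nearest obstacle, and its scan index), the action space holds five discrete steering commands, and the reward combines heading, progress, and obstacle terms. The fragment below restates that reward structure in isolation, mirroring dqn_environment.py rather than introducing anything new:

import math

STATE_SIZE = 4    # [goal_distance, goal_angle, min_obstacle_distance, min_obstacle_angle]
ACTION_SIZE = 5   # five discrete steering commands at a fixed forward speed


def compute_reward(goal_angle, goal_distance, init_goal_distance, min_obstacle_distance):
    # Heading term: largest when the robot points straight at the goal
    yaw_reward = 1 - 2 * math.sqrt(abs(goal_angle / math.pi))
    # Progress term: positive while the robot is closer to the goal than where it started
    distance_reward = (2 * init_goal_distance) / (init_goal_distance + goal_distance) - 1
    # Obstacle term: penalise approaching an obstacle closer than 0.25 m
    obstacle_reward = -2 if min_obstacle_distance < 0.25 else 0
    return yaw_reward + distance_reward + obstacle_reward


if __name__ == '__main__':
    # Example: facing the goal, halfway there, nothing within 0.25 m
    print(compute_reward(0.0, 0.5, 1.0, 3.5))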

When executing the command ros2 run turtlebot3_dqn dqn_agent 1, the DQN agent for the TurtleBot3 robot
is initiated within a ROS 2 environment. The output generated during the execution provides insights into
the agent's decision-making process and performance metrics. Initially, the agent makes a prediction using
its neural network model, indicated by the line 1/1 [==============================] - 0s
15ms/step, which shows that it processed one sample in approximately 15 milliseconds. The subsequent
lines display the actions chosen by the agent, with outputs such as 3, 3, and 4, representing the indices of the
actions selected for the current state. Each index corresponds to a steering command in which the robot keeps
a constant forward speed and applies a different angular velocity, as defined in the environment's action space.
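The printed action indices map onto velocity commands inside the environment node: the forward speed is fixed at 0.3 m/s and the angular speed is derived from the index, so action 0 is a sharp left turn, action 2 drives straight ahead, and action 4 is a sharp right turn (see dqn_environment.py in the Appendix). A condensed sketch of that mapping:

ACTION_SIZE = 5


def action_to_velocity(action):
    # Fixed forward speed; angular speed runs from +3.0 rad/s (action 0) to -3.0 rad/s (action 4)
    linear_x = 0.3
    angular_z = ((ACTION_SIZE - 1) / 2 - action) * 1.5
    return linear_x, angular_z


if __name__ == '__main__':
    for action in range(ACTION_SIZE):
        print(action, action_to_velocity(action))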

As the episode progresses, the output includes a summary indicating the episode number as shown in Fig
4.7h, which in this case is Episode: 60, along with the cumulative score of -41.11082100868225. This
negative score suggests that the agent encountered penalties or performed poorly during the episode.
Additionally, the memory length is reported as 24268, indicating that the agent has accumulated a substantial
number of experiences in its replay memory, which it can use for training. The current value of epsilon is
noted as 0.6491026283684022, reflecting the exploration rate of the agent; this means the agent is still
actively exploring its action space. Following this summary, the output continues with additional action
selections, such as 3, 3, 2, 4, 2, 1, and 2, which further illustrates the agent's ongoing interaction with the
environment and its evolving decision-making process as it learns from its experiences. Overall, the output
encapsulates the agent's learning journey, highlighting both its actions and performance metrics as it
navigates its task.
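The epsilon value reported here comes from the epsilon-greedy scheme used in dqn_agent.py: the agent starts fully exploratory (epsilon = 1.0), multiplies epsilon by 0.99 at the end of every episode, and does not decay below 0.05. A stand-alone sketch of the selection and decay logic (the model argument is assumed to be the trained Keras Q-network):

import random

import numpy


def choose_action(model, state, epsilon, action_size=5):
    # Explore with probability epsilon, otherwise exploit the network's Q-values
    if numpy.random.rand() <= epsilon:
        return random.randrange(action_size)
    q_values = model.predict(numpy.asarray(state).reshape(1, -1))
    return int(numpy.argmax(q_values[0]))


def decay_epsilon(epsilon, decay=0.99, minimum=0.05):
    # Applied once per episode, matching the check in dqn_agent.py
    return epsilon * decay if epsilon > minimum else epsilon


if __name__ == '__main__':
    epsilon = 1.0
    for _ in range(60):
        epsilon = decay_epsilon(epsilon)
    print(epsilon)   # roughly 0.55 after 60 decays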

Fig 4.7h Training in Terms of Episodes

The output from the command ros2 run turtlebot3_dqn dqn_environment provides insights into the
TurtleBot3 robot's navigation and performance as it interacts with its environment. The output as shown in
Fig 4.7i often presents a series of numerical values that represent key metrics related to the robot's operation.
For instance, these values may include the robot's distance to a target goal, the angle relative to that goal, and
measurements from various sensors, such as distance to nearby obstacles. These metrics are critical for the
robot's navigation and decision-making processes, as they inform the system about its environment and
guide its actions. Additionally, the output may indicate the success or failure of specific tasks, providing

26
feedback that can be used to adjust the robot's behavior or improve its performance in future attempts.
Overall, the output serves as a valuable tool for monitoring the robot's status and facilitating its learning and
adaptation in a dynamic environment.
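The distance-to-goal and angle-to-goal values in this output are computed from the odometry pose and the published goal in the odom_callback of dqn_environment.py; a stand-alone sketch of that geometry is shown below.

import math


def goal_distance_and_angle(robot_x, robot_y, robot_yaw, goal_x, goal_y):
    # Euclidean distance from the robot to the goal
    distance = math.hypot(goal_x - robot_x, goal_y - robot_y)
    # Bearing of the goal relative to the robot's current heading
    angle = math.atan2(goal_y - robot_y, goal_x - robot_x) - robot_yaw
    # Wrap the result into the range [-pi, pi]
    if angle > math.pi:
        angle -= 2 * math.pi
    elif angle < -math.pi:
        angle += 2 * math.pi
    return distance, angle


if __name__ == '__main__':
    # Robot at the origin facing +x, goal at (0.5, 0.0): distance 0.5 m, angle 0.0 rad
    print(goal_distance_and_angle(0.0, 0.0, 0.0, 0.5, 0.0))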

Fig 4.7i Navigation Metrics Overview

In the second terminal output after running the command, the messages indicate the status of the robot's goal
navigation. When you see "generate a new goal :)" as shown in Fig 4.7j, it signifies that the robot has
successfully reached its previous goal and is now ready to pursue a new one. This is a positive confirmation
of successful navigation. Conversely, the repeated message "reset the gazebo environment :(" suggests that
the robot was unable to reach its intended goal, necessitating a reset of the Gazebo environment. This reset is
typically triggered when there are obstacles or issues preventing the robot from completing its task. The
output thus reflects the robot's progress: successful goal achievement is marked by the generation of a new
goal, while persistent resets indicate challenges in navigation that require the environment to be cleared for a
fresh start.
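These two messages are printed by the task_succeed and task_fail service callbacks in dqn_gazebo.py (Appendix): success only respawns the goal marker at a new position, whereas failure also resets the Gazebo world before a new goal is generated. The sketch below summarises that decision, with plain callables standing in for the ROS service plumbing:

def handle_episode_end(succeeded, reset_world, spawn_new_goal):
    # Mirrors the two service callbacks in dqn_gazebo.py
    if succeeded:
        spawn_new_goal()
        print("generate a new goal :)")
    else:
        reset_world()          # clear the stalled simulation first
        spawn_new_goal()
        print("reset the gazebo environment :(")


if __name__ == '__main__':
    handle_episode_end(True, lambda: None, lambda: None)
    handle_episode_end(False, lambda: None, lambda: None)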

Fig 4.7j Goal Navigation Feedback

4.8 DQN Testing

Testing Procedure in Gazebo Environment

The testing of the DQN-based navigation framework was conducted in the Gazebo simulation environment
using the ros2 run turtlebot3_dqn dqn_test command. This command initializes the TurtleBot3 robot in
the simulation environment and evaluates its navigation performance using pre-trained DQN models. The
pre-trained models, stored in .h5 files, encapsulate the neural network weights and learned policies, while
accompanying .json files contain metadata, including the epsilon values, which represent the exploration rate
during training.

The training process leading to these models involved running the DQN framework for 90 to 100 episodes,
during which the agent learned optimal navigation strategies by interacting with the environment. The
storage of trained model data occurred at intervals of 10 episodes, ensuring that only meaningful checkpoints
of the learning process were saved. This periodic saving not only reduced storage redundancy but also
provided the flexibility to resume training or testing from specific milestones.
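The periodic checkpointing described above matches the saving logic in dqn_agent.py (Appendix), where every tenth episode the network weights go to a .h5 file and the current epsilon to a companion .json file. A condensed sketch, assuming a Keras model object and a writable model directory:

import json
import os


def save_checkpoint(model, epsilon, stage, episode, model_dir):
    # Persist a checkpoint only every 10th episode, as in dqn_agent.py
    if episode % 10 != 0:
        return
    base = os.path.join(model_dir, 'stage{}_episode{}'.format(stage, episode))
    model.save(base + '.h5')                      # network weights and architecture
    with open(base + '.json', 'w') as outfile:    # training metadata (exploration rate)
        json.dump({'epsilon': epsilon}, outfile)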
During the testing phase, the dqn_test command loads the stored model data and evaluates the agent's ability
to navigate efficiently within the simulated environment. By using the trained models, the robot executed
navigation tasks with minimal exploration, focusing primarily on exploiting the learned policy to achieve
efficient path planning and obstacle avoidance, as shown in Fig 4.8. The Gazebo environment provided a realistic platform for assessing the model's performance, simulating dynamic industrial scenarios, and
validating the robustness of the DQN framework.

This approach of testing with pre-trained models ensures that the results reflect the true potential of the
DQN-based navigation strategy under near-real-world conditions. The structured storage of models and
metadata, combined with the episodic testing framework, forms a robust methodology for iterative
improvement and deployment in physical implementations.
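Although the dqn_test node itself is not listed in the Appendix, the checkpoint-loading code in dqn_agent.py indicates the testing flow: restore the .h5 weights, read the stored epsilon from the companion .json file, and then act almost purely greedily on the learned Q-values. A sketch of that loading step under those assumptions, following the stage/episode file-naming convention used during training:

import json
import os

import numpy
from keras.models import load_model


def load_checkpoint(stage, episode, model_dir):
    base = os.path.join(model_dir, 'stage{}_episode{}'.format(stage, episode))
    model = load_model(base + '.h5')              # pre-trained Q-network
    with open(base + '.json') as infile:
        epsilon = json.load(infile).get('epsilon')
    return model, epsilon


def greedy_action(model, state):
    # During testing the learned policy is exploited directly
    q_values = model.predict(numpy.asarray(state).reshape(1, -1))
    return int(numpy.argmax(q_values[0]))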

Fig 4.8 Testing in Warehouse Environment

CHAPTER 5
RESULTS AND DISCUSSIONS

The evaluation of Deep Q-Network (DQN) and SLAM with Navigation2 (NAV2) for autonomous
navigation of mobile robots in a smart factory environment revealed significant differences in their
performance. DQN demonstrated clear advantages in terms of adaptability, efficiency, and real-time
decision-making compared to the map-dependent SLAM NAV2 algorithm. This study highlights the
suitability of reinforcement learning techniques for complex and dynamic industrial settings.

One of the most significant findings is that DQN does not rely on a predefined map for navigation. Unlike
SLAM NAV2, which depends on accurate mapping and predefined routes, DQN learns optimal paths
through interaction with the environment. This independence from static maps allows DQN to adapt
dynamically to changes in the factory layout, making it ideal for smart factories where configurations often
shift to meet production demands. DQN’s ability to identify efficient paths in real time resulted in smoother
and more flexible navigation, especially in unstructured or constantly changing environments.

In scenarios involving dynamic obstacles, DQN exhibited better obstacle avoidance and adaptability
compared to SLAM NAV2. The decision-making process in DQN, driven by reinforcement learning
policies, enabled it to react proactively to unexpected changes in the environment. SLAM NAV2, on the other hand, struggled with obstacles outside its mapped data and occasionally required manual
reinitialization or re-planning. This ability to learn from experience and continuously refine navigation
strategies makes DQN particularly effective in unpredictable settings.

While DQN requires significant training in a controlled environment before deployment, its computational
demands during real-time operation are lower than those of SLAM NAV2. This characteristic makes DQN
more efficient for deployment in resource-constrained systems typical of smart factories. The initial training
phase ensures that the robot acquires robust navigation strategies, but once deployed, DQN’s lightweight
computational requirements enhance its scalability and usability in industrial applications.

Overall, the adaptability and efficiency of DQN make it a strong contender for autonomous navigation in
smart factories. By eliminating the dependence on predefined maps, DQN simplifies the setup process and
enhances flexibility, reducing downtime during factory reconfigurations. However, despite its strengths,
DQN’s ability to generalize across entirely new environments without retraining remains a limitation that
warrants further exploration. Hybrid frameworks that combine the dynamic adaptability of DQN with the
initial mapping strengths of SLAM NAV2 could offer even greater performance.

This study demonstrates that DQN has the potential to revolutionize navigation strategies in smart factories
by providing a robust, flexible, and efficient solution for dynamic and unstructured environments. Its
advantages over traditional map-based methods like SLAM NAV2 make it a promising choice for the next
generation of industrial automation systems.

CHAPTER 6
CONCLUSION AND FUTURE SCOPE

This study demonstrates that Deep Q-Network (DQN)-based navigation outperforms SLAM with
Navigation2 (NAV2) in terms of adaptability, obstacle avoidance, and operational efficiency for autonomous
mobile robots in smart factories. By eliminating the reliance on predefined maps, DQN provides a robust
and flexible solution that aligns well with the dynamic and evolving layouts of modern industrial
environments. These findings highlight the potential of reinforcement learning techniques to revolutionize
navigation strategies, offering greater efficiency and reduced setup complexity.

While the results in simulation showcase the promise of DQN for industrial applications, physical
implementation remains a critical next step. Deploying the DQN framework in real-world smart factory
environments will provide insights into its performance under practical conditions, including sensor noise,
hardware limitations, and environmental unpredictability. Such implementations will also help address
challenges like generalization to new factory layouts and unforeseen obstacles.

Future work could explore hybrid navigation frameworks that integrate the adaptability of DQN with
traditional mapping techniques to enhance overall robustness. Additionally, real-world trials can focus on
optimizing the training process, ensuring that DQN models can be deployed faster and more efficiently. By bridging the gap between simulation and physical deployment, this research can pave the way for smarter,
more autonomous robots capable of seamlessly integrating into the next generation of automated factories.

REFERENCES

[1] Zhao, Y., Zhang, Y., & Wang, S. (2021, December). A review of mobile robot path planning based on deep reinforcement learning algorithm. In Journal of Physics: Conference Series (Vol. 2138, No. 1, p. 012011). IOP Publishing.

[2] Martini, M., Eirale, A., Cerrato, S., & Chiaberge, M. (2023, March). PIC4rl-gym: A ROS2 modular framework for robots autonomous navigation with deep reinforcement learning. In 2023 3rd International Conference on Computer, Control and Robotics (ICCCR) (pp. 198-202). IEEE.

[3] Yokoyama, K., & Morioka, K. (2020, January). Autonomous mobile robot with simple navigation system based on deep reinforcement learning and a monocular camera. In 2020 IEEE/SICE International Symposium on System Integration (SII) (pp. 525-530). IEEE.

[4] Balachandran, A., Lal, A., & Sreedharan, P. (2022, October). Autonomous navigation of an AMR using deep reinforcement learning in a warehouse environment. In 2022 IEEE 2nd Mysore Sub Section International Conference (MysuruCon) (pp. 1-5). IEEE.

[5] Moorthy, C., Shafeek, A., Gurunathan, V., Manoj, M., Naveen, V., & Krishnan, P. H. (2024, August). Comprehensive exploration of enhanced navigation efficiency via deep reinforcement learning techniques. In 2024 Second International Conference on Intelligent Cyber Physical Systems and Internet of Things (ICoICI) (pp. 1087-1094). IEEE.

[6] Abuali, S., Sammour, P., Asfaw, M. T., Robu, B., & Hably, A. Controlling a mobile robot using ROS2 and machine learning.

[7] Surmann, H., Jestel, C., Marchel, R., Musberg, F., Elhadj, H., & Ardani, M. (2020). Deep reinforcement learning for real autonomous mobile robot navigation in indoor environments. arXiv preprint arXiv:2005.13857.

[8] Yue, P., Xin, J., Zhao, H., Liu, D., Shan, M., & Zhang, J. (2019, June). Experimental research on deep reinforcement learning in autonomous navigation of mobile robot. In 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA) (pp. 1612-1616). IEEE.

[9] Cimurs, R., Suh, I. H., & Lee, J. H. (2021). Goal-driven autonomous exploration through deep reinforcement learning. IEEE Robotics and Automation Letters, 7(2), 730-737.

[10] Chen, J., Wu, K., Hu, M., Suganthan, P. N., & Makur, A. (2024). LiDAR-based end-to-end active SLAM using deep reinforcement learning in large-scale environments. IEEE Transactions on Vehicular Technology.

APPENDIX
dqn_gazebo.py file:

import os
import random
import sys

from gazebo_msgs.srv import DeleteEntity


from gazebo_msgs.srv import SpawnEntity
from geometry_msgs.msg import Pose
import rclpy
from rclpy.node import Node
from rclpy.qos import QoSProfile
from std_srvs.srv import Empty

class DQNGazebo(Node):
def __init__(self, stage):
super().__init__('dqn_gazebo')

"""************************************************************
** Initialise variables
************************************************************"""
# Stage
self.stage = int(stage)

# Entity 'goal'
self.entity_dir_path = os.path.dirname(os.path.realpath(__file__))
self.entity_dir_path = self.entity_dir_path.replace(
'turtlebot3_machine_learning/turtlebot3_dqn/turtlebot3_dqn/dqn_gazebo',
'turtlebot3_simulations/turtlebot3_gazebo/models/turtlebot3_dqn_world/goal_box')
self.entity_path = os.path.join(self.entity_dir_path, 'model.sdf')
self.entity = open(self.entity_path, 'r').read()
self.entity_name = 'goal'

self.goal_pose_x = 0.5
self.goal_pose_y = 0.0

self.init_state = False

"""************************************************************
** Initialise ROS publishers, subscribers and clients
************************************************************"""
qos = QoSProfile(depth=10)

# Initialise publishers
self.goal_pose_pub = self.create_publisher(Pose, 'goal_pose', qos)

# Initialise client
self.delete_entity_client = self.create_client(DeleteEntity, 'delete_entity')
self.spawn_entity_client = self.create_client(SpawnEntity, 'spawn_entity')
self.reset_simulation_client = self.create_client(Empty, 'reset_simulation')

# Initialise servers
self.task_succeed_server = self.create_service(
Empty,
'task_succeed',
self.task_succeed_callback)
self.task_fail_server = self.create_service(Empty, 'task_fail', self.task_fail_callback)

# Process

self.publish_timer = self.create_timer(
0.010, # unit: s
self.publish_callback)

"""*******************************************************************************
** Callback functions and relevant functions
*******************************************************************************"""
def publish_callback(self):
# Init
if self.init_state is False:
self.delete_entity()
self.reset_simulation()
self.init_state = True
print("init!!!")
print("Goal pose: ", self.goal_pose_x, self.goal_pose_y)

# Publish goal pose


goal_pose = Pose()
goal_pose.position.x = self.goal_pose_x
goal_pose.position.y = self.goal_pose_y
self.goal_pose_pub.publish(goal_pose)
self.spawn_entity()

def task_succeed_callback(self, request, response):


self.delete_entity()
self.generate_goal_pose()
print("generate a new goal :)")

return response

def task_fail_callback(self, request, response):


self.delete_entity()
self.reset_simulation()
self.generate_goal_pose()
print("reset the gazebo environment :(")

return response

def generate_goal_pose(self):
if self.stage != 4:
self.goal_pose_x = random.randrange(-15, 16) / 10.0
self.goal_pose_y = random.randrange(-15, 16) / 10.0

else:
goal_pose_list = [[1.0, 0.0], [2.0, -1.5], [0.0, -2.0], [2.0, 2.0], [0.8, 2.0],
[-1.9, 1.9], [-1.9, 0.2], [-1.9, -0.5], [-2.0, -2.0], [-0.5, -1.0]]
index = random.randrange(0, 10)
self.goal_pose_x = goal_pose_list[index][0]
self.goal_pose_y = goal_pose_list[index][1]
print("Goal pose: ", self.goal_pose_x, self.goal_pose_y)

def reset_simulation(self):
req = Empty.Request()
while not self.reset_simulation_client.wait_for_service(timeout_sec=1.0):
self.get_logger().info('service not available, waiting again...')

self.reset_simulation_client.call_async(req)

def delete_entity(self):
req = DeleteEntity.Request()
req.name = self.entity_name
while not self.delete_entity_client.wait_for_service(timeout_sec=1.0):
self.get_logger().info('service not available, waiting again...')

self.delete_entity_client.call_async(req)

def spawn_entity(self):
goal_pose = Pose()
goal_pose.position.x = self.goal_pose_x
goal_pose.position.y = self.goal_pose_y
req = SpawnEntity.Request()
req.name = self.entity_name
req.xml = self.entity
req.initial_pose = goal_pose
while not self.spawn_entity_client.wait_for_service(timeout_sec=1.0):
self.get_logger().info('service not available, waiting again...')

self.spawn_entity_client.call_async(req)

def main(args=None):

if args is None:

args = sys.argv[1:] # Get all command-line arguments

if len(args) < 1:

print("Usage: ros2 run turtlebot3_dqn dqn_gazebo <stage_number>")

return

stage = args[0] # Get the stage argument

try:

stage = int(stage) # Convert to integer

except ValueError:

print("Stage must be an integer.")

return

rclpy.init(args=args)

dqn_gazebo = DQNGazebo(stage) # Pass the integer stage

rclpy.spin(dqn_gazebo)

dqn_gazebo.destroy_node()

rclpy.shutdown()

if __name__ == '__main__':
main()

dqn_environment.py file:

import math
import numpy

from geometry_msgs.msg import Pose
from geometry_msgs.msg import Twist
from nav_msgs.msg import Odometry
import rclpy
from rclpy.node import Node
from rclpy.qos import QoSProfile
from rclpy.qos import qos_profile_sensor_data
from sensor_msgs.msg import LaserScan
from std_srvs.srv import Empty

from turtlebot3_msgs.srv import Dqn

class DQNEnvironment(Node):
def __init__(self):
super().__init__('dqn_environment')

"""************************************************************
** Initialise variables
************************************************************"""
self.goal_pose_x = 0.0
self.goal_pose_y = 0.0
self.last_pose_x = 0.0
self.last_pose_y = 0.0
self.last_pose_theta = 0.0

self.action_size = 5
self.done = False
self.fail = False
self.succeed = False

self.goal_angle = 0.0
self.goal_distance = 1.0
self.init_goal_distance = 1.0
self.scan_ranges = []
self.min_obstacle_distance = 10.0
self.min_obstacle_angle = 10.0

self.local_step = 0

"""************************************************************
** Initialise ROS publishers and subscribers

************************************************************"""
qos = QoSProfile(depth=10)

# Initialise publishers
self.cmd_vel_pub = self.create_publisher(Twist, 'cmd_vel', qos)

# Initialise subscribers
self.goal_pose_sub = self.create_subscription(
Pose,
'goal_pose',
self.goal_pose_callback,
qos)
self.odom_sub = self.create_subscription(
Odometry,
'odom',
self.odom_callback,
qos)
self.scan_sub = self.create_subscription(
LaserScan,
'scan',
self.scan_callback,
qos_profile=qos_profile_sensor_data)

# Initialise client
self.task_succeed_client = self.create_client(Empty, 'task_succeed')
self.task_fail_client = self.create_client(Empty, 'task_fail')

# Initialise servers
self.dqn_com_server = self.create_service(Dqn, 'dqn_com', self.dqn_com_callback)

"""*******************************************************************************
** Callback functions and relevant functions
*******************************************************************************"""
def goal_pose_callback(self, msg):
self.goal_pose_x = msg.position.x
self.goal_pose_y = msg.position.y

def odom_callback(self, msg):


self.last_pose_x = msg.pose.pose.position.x
self.last_pose_y = msg.pose.pose.position.y
_, _, self.last_pose_theta = self.euler_from_quaternion(msg.pose.pose.orientation)

goal_distance = math.sqrt(
(self.goal_pose_x-self.last_pose_x)**2
+ (self.goal_pose_y-self.last_pose_y)**2)

path_theta = math.atan2(
self.goal_pose_y-self.last_pose_y,
self.goal_pose_x-self.last_pose_x)

goal_angle = path_theta - self.last_pose_theta


if goal_angle > math.pi:
goal_angle -= 2 * math.pi

elif goal_angle < -math.pi:


goal_angle += 2 * math.pi

self.goal_distance = goal_distance
self.goal_angle = goal_angle

def scan_callback(self, msg):


self.scan_ranges = msg.ranges
self.min_obstacle_distance = min(self.scan_ranges)
self.min_obstacle_angle = numpy.argmin(self.scan_ranges)

def get_state(self):
state = list()
state.append(float(self.goal_distance))
state.append(float(self.goal_angle))
state.append(float(self.min_obstacle_distance))
state.append(float(self.min_obstacle_angle))
self.local_step += 1

# Succeed
if self.goal_distance < 0.20: # unit: m
print("Goal! :)")
self.succeed = True
self.done = True
self.cmd_vel_pub.publish(Twist()) # robot stop
self.local_step = 0
req = Empty.Request()
while not self.task_succeed_client.wait_for_service(timeout_sec=1.0):
self.get_logger().info('service not available, waiting again...')
self.task_succeed_client.call_async(req)

# Fail
if self.min_obstacle_distance < 0.13: # unit: m
print("Collision! :(")
self.fail = True
self.done = True
self.cmd_vel_pub.publish(Twist()) # robot stop
self.local_step = 0
req = Empty.Request()
while not self.task_fail_client.wait_for_service(timeout_sec=1.0):
self.get_logger().info('service not available, waiting again...')
self.task_fail_client.call_async(req)

if self.local_step == 500:
print("Time out! :(")
self.done = True
self.local_step = 0
req = Empty.Request()
while not self.task_fail_client.wait_for_service(timeout_sec=1.0):
self.get_logger().info('service not available, waiting again...')
self.task_fail_client.call_async(req)

return state

def reset(self):
# Return the current observation when the environment is reset
return self.get_state()

def dqn_com_callback(self, request, response):


action = request.action
twist = Twist()
twist.linear.x = 0.3
twist.angular.z = ((self.action_size - 1)/2 - action) * 1.5
self.cmd_vel_pub.publish(twist)

response.state = self.get_state()
response.reward = self.get_reward(action)
response.done = self.done

if self.done is True:
self.done = False
self.succeed = False
self.fail = False

if request.init is True:
self.init_goal_distance = math.sqrt(
(self.goal_pose_x-self.last_pose_x)**2
+ (self.goal_pose_y-self.last_pose_y)**2)

return response

def get_reward(self, action):


yaw_reward = 1 - 2*math.sqrt(math.fabs(self.goal_angle / math.pi))

distance_reward = (2 * self.init_goal_distance) / \
(self.init_goal_distance + self.goal_distance) - 1

# Reward for avoiding obstacles


if self.min_obstacle_distance < 0.25:
obstacle_reward = -2
else:
obstacle_reward = 0

reward = yaw_reward + distance_reward + obstacle_reward

# + for succeed, - for fail


if self.succeed:
reward += 5
elif self.fail:
reward -= 10  # penalty for collision or timeout
print(reward)

return reward

"""*******************************************************************************
** Below should be replaced when porting for ROS 2 Python tf_conversions is done.
*******************************************************************************"""
def euler_from_quaternion(self, quat):
"""
Converts quaternion (w in last place) to euler roll, pitch, yaw
quat = [x, y, z, w]
"""
x = quat.x
y = quat.y
z = quat.z

w = quat.w

sinr_cosp = 2 * (w*x + y*z)


cosr_cosp = 1 - 2*(x*x + y*y)
roll = numpy.arctan2(sinr_cosp, cosr_cosp)

sinp = 2 * (w*y - z*x)


pitch = numpy.arcsin(sinp)

siny_cosp = 2 * (w*z + x*y)


cosy_cosp = 1 - 2 * (y*y + z*z)
yaw = numpy.arctan2(siny_cosp, cosy_cosp)

return roll, pitch, yaw

def main(args=None):
rclpy.init(args=args)
dqn_environment = DQNEnvironment()
rclpy.spin(dqn_environment)

dqn_environment.destroy_node()
rclpy.shutdown()

if __name__ == '__main__':
main()

dqn_agent.py file:

import collections
from keras.layers import Activation
from keras.layers import Dense
from keras.layers import Dropout
from keras.models import Sequential
from keras.models import load_model
from keras.optimizers import RMSprop
import json
import numpy
import os
import random
import sys

import time

import rclpy
from rclpy.node import Node

from turtlebot3_msgs.srv import Dqn

class DQNAgent(Node):
def __init__(self, stage):
super().__init__('dqn_agent')

"""************************************************************
** Initialise variables
************************************************************"""
# Stage
self.stage = int(stage)

# State size and action size


self.state_size = 4
self.action_size = 5
self.episode_size = 3000

# DQN hyperparameter
self.discount_factor = 0.99
self.learning_rate = 0.00025
self.epsilon = 1.0
self.epsilon_decay = 0.99
self.epsilon_min = 0.05
self.batch_size = 100
self.train_start = 100

# Replay memory
self.memory = collections.deque(maxlen=1000000)

# Build model and target model


self.model = self.build_model()
self.target_model = self.build_model()
self.update_target_model()
self.update_target_model_start = 2000

# Load saved models

self.load_model = False
self.load_episode = 0
self.model_dir_path = os.path.dirname(os.path.realpath(__file__))
self.model_dir_path = self.model_dir_path.replace(
'turtlebot3_dqn/dqn_agent',
'model')
self.model_path = os.path.join(
self.model_dir_path,
'stage'+str(self.stage)+'_episode'+str(self.load_episode)+'.h5')

if self.load_model:
self.model.set_weights(load_model(self.model_path).get_weights())
with open(os.path.join(
self.model_dir_path,
'stage'+str(self.stage)+'_episode'+str(self.load_episode)+'.json')) as outfile:
param = json.load(outfile)
self.epsilon = param.get('epsilon')

"""************************************************************
** Initialise ROS clients
************************************************************"""
# Initialise clients
self.dqn_com_client = self.create_client(Dqn, 'dqn_com')

"""************************************************************
** Start process
************************************************************"""
self.process()

"""*******************************************************************************
** Callback functions and relevant functions
*******************************************************************************"""
def process(self):
global_step = 0

for episode in range(self.load_episode+1, self.episode_size):


global_step += 1
local_step = 0

state = list()
next_state = list()
done = False

init = True
score = 0

# Reset DQN environment


time.sleep(1.0)

while not done:


local_step += 1

# Action based on the current state


if local_step == 1:
action = 2 # Move forward
else:
state = next_state
action = int(self.get_action(state))

# Send action and receive next state and reward


req = Dqn.Request()
print(int(action))
req.action = action
req.init = init
while not self.dqn_com_client.wait_for_service(timeout_sec=1.0):
self.get_logger().info('service not available, waiting again...')

future = self.dqn_com_client.call_async(req)

while rclpy.ok():
rclpy.spin_once(self)
if future.done():
if future.result() is not None:
# Next state and reward
next_state = future.result().state
reward = future.result().reward
done = future.result().done
score += reward
init = False
else:
self.get_logger().error(
'Exception while calling service: {0}'.format(future.exception()))
break

# Save <s, a, r, s'> samples

if local_step > 1:
self.append_sample(state, action, reward, next_state, done)

# Train model
if global_step > self.update_target_model_start:
self.train_model(True)
elif global_step > self.train_start:
self.train_model()

if done:
# Update neural network
self.update_target_model()

print(
"Episode:", episode,
"score:", score,
"memory length:", len(self.memory),
"epsilon:", self.epsilon)

param_keys = ['epsilon']
param_values = [self.epsilon]
param_dictionary = dict(zip(param_keys, param_values))

# While loop rate


time.sleep(0.01)

# Update result and save model every 10 episodes


if episode % 10 == 0:
self.model_path = os.path.join(
self.model_dir_path,
'stage'+str(self.stage)+'_episode'+str(episode)+'.h5')
self.model.save(self.model_path)
with open(os.path.join(
self.model_dir_path,
'stage'+str(self.stage)+'_episode'+str(episode)+'.json'), 'w') as outfile:
json.dump(param_dictionary, outfile)

# Epsilon
if self.epsilon > self.epsilon_min:
self.epsilon *= self.epsilon_decay

def build_model(self):

model = Sequential()
model.add(Dense(
64,
input_shape=(self.state_size,),
activation='relu',
kernel_initializer='lecun_uniform'))
model.add(Dense(64, activation='relu', kernel_initializer='lecun_uniform'))
model.add(Dropout(0.2))
model.add(Dense(self.action_size, kernel_initializer='lecun_uniform'))
model.add(Activation('linear'))
model.compile(loss='mse', optimizer=RMSprop(lr=self.learning_rate, rho=0.9, epsilon=1e-06))
model.summary()

return model

def update_target_model(self):
self.target_model.set_weights(self.model.get_weights())

def get_action(self, state):


if numpy.random.rand() <= self.epsilon:
return random.randrange(self.action_size)
else:
state = numpy.asarray(state)
q_value = self.model.predict(state.reshape(1, len(state)))
print(numpy.argmax(q_value[0]))
return numpy.argmax(q_value[0])

def append_sample(self, state, action, reward, next_state, done):


self.memory.append((state, action, reward, next_state, done))

def train_model(self, target_train_start=False):


mini_batch = random.sample(self.memory, self.batch_size)
x_batch = numpy.empty((0, self.state_size), dtype=numpy.float64)
y_batch = numpy.empty((0, self.action_size), dtype=numpy.float64)

for i in range(self.batch_size):
state = numpy.asarray(mini_batch[i][0])
action = numpy.asarray(mini_batch[i][1])
reward = numpy.asarray(mini_batch[i][2])
next_state = numpy.asarray(mini_batch[i][3])
done = numpy.asarray(mini_batch[i][4])

q_value = self.model.predict(state.reshape(1, len(state)))
self.max_q_value = numpy.max(q_value)

if not target_train_start:
target_value = self.model.predict(next_state.reshape(1, len(next_state)))
else:
target_value = self.target_model.predict(next_state.reshape(1, len(next_state)))

if done:
next_q_value = reward
else:
next_q_value = reward + self.discount_factor * numpy.amax(target_value)

x_batch = numpy.append(x_batch, numpy.array([state.copy()]), axis=0)

y_sample = q_value.copy()
y_sample[0][action] = next_q_value
y_batch = numpy.append(y_batch, numpy.array([y_sample[0]]), axis=0)

if done:
x_batch = numpy.append(x_batch, numpy.array([next_state.copy()]), axis=0)
y_batch = numpy.append(y_batch, numpy.array([[reward] * self.action_size]), axis=0)

self.model.fit(x_batch, y_batch, batch_size=self.batch_size, epochs=1, verbose=0)

def main(args=None):
# Read the stage number from the command line (e.g. ros2 run turtlebot3_dqn dqn_agent 1); default to stage 3
stage = sys.argv[1] if len(sys.argv) > 1 else '3'
rclpy.init(args=args)
dqn_agent = DQNAgent(stage)
rclpy.spin(dqn_agent)

dqn_agent.destroy_node()
rclpy.shutdown()

if __name__ == '__main__':
main()
