Artificial Potential Field Incorporated Deep-Q-Network Algorithm for Mobile
DOI: 10.32604/iasc.2023.028126
Article
1 Department of Robotics and Automation Engineering, PSG College of Technology, Coimbatore, 641004, India
2 Department of Electrical and Electronics Engineering, PSG College of Technology, Coimbatore, 641004, India
*Corresponding Author: A. Sivaranjani. Email: [email protected]
Received: 03 February 2022; Accepted: 13 March 2022
1 Introduction
The challenge of steering a mobile robot in an unknown area is known as motion planning. Local path planning is a common and practical problem in autonomous mobile robotics research. Autonomous robots use a planned path to travel from point A to point B without colliding with any barriers [1]. With increasing scientific and technological breakthroughs, mobile robots are now confronted with complicated and dynamic environments [2]. The traditional path
planning algorithms lack salient merits such as low operating cost and minimal processing time. To overcome these limitations, Reinforcement Learning (RL) approaches have been proposed in recent work. Deep-neural-network-based methods form a part of RL. In this work, the Deep Q-Network (DQN) is utilized to train the TurtleBot3 (TB3) in a Robot Operating System (ROS) simulation environment. TB3 is a mobile robot platform that is frequently used in robotic systems. In this research, the TB3 Waffle Pi is selected over the TB3 Burger due to its additional features, such as enhanced sensing capabilities, better computing, and higher power available to drive the wheels, which aids in handling heavier payloads.
ROS is an accessible robotics framework that comprises a variety of open-source tools and applications for building complex mobile robots [3]. It provides a transparent computation graph that allows robotics software to be written and merged in a flexible and orderly manner, covering low-level firmware, control algorithms, sensory perception methodologies, navigation algorithms, and so on. ROS connects with Gazebo directly, so programs can execute in simulation rather than in the real world without any code modifications [4]. It enables the entire robotics software stack to run in real time while being advanced at the simulation's desired pace. ROS Melodic is the ROS distribution that was employed in this study. The necessary tools for training the DQN agents are provided by ROS and Gazebo.
The DQN agent's mission is to get the TB3 to its target without colliding. TB3 receives a positive reward when it moves closer to its goal, and a negative reward when it moves farther away. The episode terminates when the TB3 collides with an obstacle or after a set amount of time has passed. TB3 obtains a large positive reward when it arrives at its target and a large negative reward when it crashes into an obstacle during the episode. The DQN approach was presented by Tai et al. for robot route planning in simulators. Overestimation of action values and slow convergence were determined to be DQN's flaws [5].
To overcome this, a shortest-path algorithm is combined with DQN to achieve better results and, primarily, to reduce the training time of the DQN algorithm. For effective path planning, the Artificial Potential Field (APF) method is used with DQN. In the coordinate space, the artificial field consists of a repulsive potential field and an attractive potential field. The resultant force is the combination of the attractive and repulsive forces. The magnitude and direction of the resultant force define the robot's motion state.
Because of its efficient quantitative equations and simplicity, this algorithm is commonly used for automatic guided mobile robot navigation [6–8]. However, the method frequently faces the local minima dilemma, which arises when the total force applied to the robot is equal to zero. Alternative approaches for avoiding local minima have been documented in several research projects, such as altering the Gaussian function of the potential or using a sample selection scheme, although these can force the robot to take a longer route. This problem is not considered in this study because both DQN and APF are used to find the next step of the robot. The advantages of the combined APF and DQN are minimum training time and that the robot takes the shortest path to reach the goal.
Yang et al. explained that fully automated warehouses and stock-keeping units following the "Goods to People" model rely largely on material handling robots. With such robots, the industry can conserve manpower and increase plant efficiency by eliminating manual material handling and transportation. Their article combined the Deep-Q-Network algorithm with Q-learning methods and neural network innovations using relevance feedback. The goal of that paper is to address the problem of multi-robot trajectory tracking in fully automated warehouses and stock-keeping units [9].
Bae et al. analyzed the effectiveness of several practical methods to the challenge of multi-robot path planning for accomplishing specific tasks. All the robots involved in the group operate individually as well as cooperatively depending on the given scenario, so the search area is expanded. Reinforcement Learning in such path planning approaches gives major focus to a fixed space where each object has interactivity [10].
Yu et al. (2020) explained that, by observing the scene and extracting features, the fitting of the state-action function was achieved using neural network models. The mapping from the current state to hierarchical reinforcement feedback was achieved with the help of an enhancement function, allowing the robot to become even more mobile. To improve route optimization efficiency in robotic systems, these two methodologies were naturally merged. That research had yet to conduct experiments on dynamic environments and related scenarios, but emphasized theoretical accuracy in the conducted study [11].
Wang et al. (2020) described that dynamic path planning algorithms are incapable of solving problems involving wheeled mobile robots in scenarios with slopes and dynamic obstacles constantly moving at their own rate. The Tree-Double Deep Q-Network technique for variable trajectory tracking in robotic systems is suggested in that research. A Double Deep Q-Network is used in this approach; it rejects incomplete and outlier-detected paths, improving performance. The DDQN approach is combined with a tree-based method built on a binary tree. The study took the best option available in the present state and performed fewer actions, resulting in a path that fits the constraints; the obtained state-based results were then extended to the set of eligible paths. That research utilizes ROS simulations and practical experiments to verify the hypothesis [12].
Ali et al. (2019) explained a sensor fusion method using a Laser Range Finder (LRF) and laser sensors to reduce noise and ambiguity in sensor information and provide optimal path planning strategies. The experimental results show that the methods can safely drive the robot in road driving and roundabout situations. During navigation, motion control employing twin feedback mechanisms based on the RAC-AFC controller is employed to track the wheeled mobile robot's (WMR) planned path while rejecting disturbances. Localization and control algorithms are developed for a novel WMR platform that navigates in a road roundabout environment. A Resolved Acceleration Control (RAC) scheme combined with Active Force Control (AFC) for rejecting disturbances is utilized to control the kinematic characteristics of the WMR [13].
Sun et al. (2019) analyzed how the number of samples necessary for random exploration grows with the number of steps required to achieve a reward, and presented a reward-based curriculum training strategy. The method displays an outstanding capacity to transfer to unfamiliar places and good planning skills in a mapless world, and it can also withstand current disturbances. Using sensor data as input and surge force and yaw moment as output, the system achieves motion planning through the learned policy. This strategy addresses the issue of sparse rewards in complicated contexts while avoiding the poor training effects of intermediate rewards [14].
Lin et al. (2013) explained that TSK-type neural fuzzy networks are designed using a novel Adaptive Group Organization Cooperative Evolutionary Algorithm (AGOCEA). To autonomously create neural fuzzy networks, the suggested AGOCEA employs a group-based cooperative evolutionary method and a self-organizing technique. It can dynamically estimate the parameters of adaptive neuro-fuzzy inference networks, eliminating the need to assign some critical parameters ahead of time. According to simulation data, their model has a shorter rescue time than a standard model that uses a static judgement method [15].
Xin et al. (2017) explained that, to approximate the mobile robot's state-action value function, a deep Q-network (DQN) is developed and trained. Without any hand-crafted features or feature detection, the DQN receives the raw RGB values (image pixels) from the surroundings. Finally, while traveling, the mobile robot reaches the goal point. According to the experimental data, this learned path planning method is an effective end-to-end solution for mobile robot trip planning. That research proposes a novel end-to-end path planning approach for mobile robots utilizing a deep learning approach [16].
Liu et al. (2015) analyzed topological segmentation quality criteria, notably for semi-structured 2D environments. They showed how to develop topological segmentation for semi-structured settings using an incremental technique. Their method is based on spectral clustering of discretized metric maps decomposed incrementally using a modified Voronoi decomposition. The robustness and quality of the topology obtained using the proposed approach are demonstrated in real-world experiments [17].
Mnih et al. (2015) explained that agents must construct efficient models of the surroundings based on high-dimensional sensory information, and use them to transfer previous experience to entirely new situations, in order to use learning algorithms effectively in settings nearing real-world complexity. They showed that, using only the pixels and the game score as inputs, the deep Q-network agent was able to outperform all previous methods and achieve a level similar to that of a professional human player. The gap between raw sensory information and actions is bridged in that research [18].
An end-to-end three-branch embedding network has been established by Sun et al. (2021) [19] for precise feature extraction of vehicles. The local and global features of the vehicles are utilized for re-identification of a vehicle across cameras. Mainly the region features of the vehicles are emphasized for better prediction using colour image analysis. Training on the dataset results in identifying the similarities and differences between the target and other vehicles. The proposed method could be further enhanced by adding an adaptive technique for greater prediction efficiency.
A deep learning-based object detection strategy for traffic information collection using an unmanned aerial vehicle has been proposed by Sun et al. (2021) [20]. A real-time small-object detection algorithm incorporating YOLOv3 is utilized to detect objects in traffic monitoring and provide specific information. The local and global features are fused and emphasized by a feature pyramid network. The work could be enhanced with further optimization for improved prediction.
This paper contributes a Reinforcement Learning approach that enables efficient path prediction for robot navigation in a ROS-based environment using an Artificial Potential Field incorporated Deep-Q-Network.
$U(q) = U_{att}(q) + U_{rep}(q)$ (7)

$F(q) = -\nabla U_{att}(q) - \nabla U_{rep}(q)$ (8)
The potential field at location $q$ is denoted by $U(q)$. Eq. (7) expresses the potential field acting on the robot as the sum of the attractive field of the goal and the repulsive field of the obstacles. In the same way, the force can be split into an attractive and a repulsive component, as illustrated in Eq. (8). A parabolic function, such as in Eq. (9), can be used to define the attractive potential.
$U_{att}(q) = \frac{1}{2} k_{att}\, \rho_{goal}^{2}(q)$ (9)
Here, $\rho_{goal}(q)$ denotes the distance between $q$ and the goal, and $k_{att}$ is a constant scaling parameter. The attractive potential is differentiable, as indicated in Eq. (10), resulting in the attractive force $F_{att}$. The repulsive potential is capable of producing a force that repels the robot from all known obstacles. Whenever the robot is close to an obstacle, this force ought to be quite strong, but when the robot is far away from the obstacle, it should have little effect on its motion.
$F_{att}(q) = -k_{att}\,(q - q_{goal})$ (10)
$U_{rep}(q) = \begin{cases} \frac{1}{2} k_{rep} \left( \frac{1}{\rho(q)} - \frac{1}{\rho_{0}} \right)^{2}, & \text{if } \rho(q) \le \rho_{0} \\ 0, & \text{if } \rho(q) > \rho_{0} \end{cases}$ (11)
$F_{rep}(q) = \begin{cases} k_{rep} \left( \frac{1}{\rho(q)} - \frac{1}{\rho_{0}} \right) \frac{1}{\rho^{2}(q)} \, \frac{q - q_{obstacle}}{\rho(q)}, & \text{if } \rho(q) \le \rho_{0} \\ 0, & \text{if } \rho(q) > \rho_{0} \end{cases}$ (12)
where $k_{rep}$ is a scaling parameter, $\rho_{0}$ is the influence distance of the obstacle, and $\rho(q)$ is the shortest distance between $q$ and the obstacle. Eq. (11) depicts the repulsive potential, which approaches infinity as $q$ approaches the obstacle [27]. $\rho(q)$ is differentiable everywhere in the free coordinate space if the obstacle boundary is convex and piecewise differentiable. The repulsive force $F_{rep}$, given in Eq. (12), results from this. The combined force $F(q) = F_{att}(q) + F_{rep}(q)$ applied to a point robot subjected to the attractive and repulsive forces causes the TB3 to move away from the obstacles towards the target.
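To make Eqs. (7)–(12) concrete, the following is a minimal Python sketch (not the authors' implementation) that evaluates the attractive, repulsive, and resultant forces for a planar robot; the gain values k_att and k_rep and the influence distance rho_0 are illustrative assumptions.

```python
import numpy as np

def attractive_force(q, q_goal, k_att=1.0):
    """F_att(q) = -k_att * (q - q_goal), Eq. (10)."""
    return -k_att * (q - q_goal)

def repulsive_force(q, q_obstacle, k_rep=0.5, rho_0=1.0):
    """F_rep(q) from Eq. (12); zero outside the influence distance rho_0."""
    diff = q - q_obstacle
    rho = np.linalg.norm(diff)          # shortest distance to the obstacle
    if rho > rho_0 or rho == 0.0:
        return np.zeros_like(q)
    return k_rep * (1.0 / rho - 1.0 / rho_0) * (1.0 / rho**2) * (diff / rho)

def resultant_force(q, q_goal, obstacles):
    """Total force F(q) = F_att(q) + sum of F_rep(q) over obstacles, Eqs. (7)-(8)."""
    f = attractive_force(q, q_goal)
    for q_obs in obstacles:
        f = f + repulsive_force(q, q_obs)
    return f

# Example: robot at (0, 0), goal at (4, 3), one obstacle at (1, 1)
q = np.array([0.0, 0.0])
force = resultant_force(q, np.array([4.0, 3.0]), [np.array([1.0, 1.0])])
print(force)  # the direction of this vector gives the APF-suggested heading
```

The magnitude and direction of the printed resultant vector correspond to the robot's motion state described above.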
Fig. 3 illustrates the overall structure of the proposed method for robot path prediction utilizing deep RL and the shortest-path (APF) technique.
To reduce the training time and maximize the rewards, the shortest-path algorithm, APF, is incorporated in the DQN algorithm. The Artificial Potential Field is one of the algorithms most used by researchers, as it gives minimum processing time and path length. When the algorithm is combined with Reinforcement Learning, it yields a greater number of successful targets and takes minimum average time to reach the goal autonomously within a reduced time.
A succession of depth images collected from a depth camera, combined with an angle denoting the heading toward the intended target, was chosen as the state. The effectiveness of the suggested planning strategy is evaluated in a simulated world based on the Robot Operating System (ROS) and contrasted with the conventional DQN algorithm. The number of successful targets, average training time, and average rewards have been taken to validate the results. The proposed hybrid algorithm gives a greater number of successful targets; it also takes less training time and maximizes rewards.
While navigating from the start to the goal position, two points are generated at each step: one by DQN and another by APF. The next point from the current position is always chosen with the DQN algorithm, but it is verified with APF to check whether the chosen point is optimal or not. The robot heading and goal are shown in Fig. 4.
In mobile robot navigation, the agent should be rewarded for moving the robot toward the goal while avoiding barriers, and penalized for traveling away from the goal or colliding with obstacles. To do so, the reward function $r$ was created as shown in Eq. (13).

$r = \cos\theta$ (13)
If the value of the reward function, i.e., the cosine of the angle between the two points, is positive, then the robot moves on to the next location, since the location selected by DQN is ideal. Otherwise, the robot is likely to collide in the environment.
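A minimal sketch of this verification step is shown below, assuming the DQN-proposed next point and the APF resultant force are available as 2D NumPy vectors; the function and variable names are illustrative, and the acceptance test simply checks that the cosine in Eq. (13) is positive.

```python
import numpy as np

def verification_reward(current_pos, dqn_next_point, apf_force):
    """Reward r = cos(theta), Eq. (13): cosine of the angle between the
    DQN-proposed step direction and the APF resultant-force direction."""
    step = dqn_next_point - current_pos
    cos_theta = np.dot(step, apf_force) / (
        np.linalg.norm(step) * np.linalg.norm(apf_force) + 1e-9)
    return cos_theta

def accept_dqn_point(current_pos, dqn_next_point, apf_force):
    """Accept the DQN point only when it agrees with the APF direction
    (positive cosine); what happens on rejection is left to the agent."""
    return verification_reward(current_pos, dqn_next_point, apf_force) > 0.0
```

For example, a DQN step heading roughly along the APF resultant force yields a cosine near 1 and is accepted, while a step pointing toward an obstacle yields a negative cosine and is rejected.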
To solve the inherent instability of using function approximation in RL, DQN uses two strategies: experience replay and target networks. The experience replay memory, which stores transitions of the form (st, at, st+1, rt+1) in a cyclic buffer, allows the agent to sample from and learn on previously witnessed data. This not only reduces the number of interactions with the environment but also allows batches of experience to be sampled, minimizing the variance of learning updates. Furthermore, the temporal correlations that can harm RL systems are eliminated by sampling uniformly from a large memory. From a practical standpoint, current hardware can easily process batches of data simultaneously, improving throughput. While the initial DQN system employed uniform sampling, subsequent research has
revealed that prioritizing samples based on TD errors is more advantageous for learning purposes.
Reinforcement learning functions are shown in Fig. 5.
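As an illustrative sketch (assumed structure, not the authors' code), a cyclic replay buffer that stores (st, at, st+1, rt+1) transitions and samples uniform mini-batches could look like the following; the capacity and batch size are hypothetical values.

```python
import random
from collections import deque

class ReplayBuffer:
    """Cyclic experience replay memory for DQN training."""

    def __init__(self, capacity=100_000):
        # deque with maxlen discards the oldest transition when full (cyclic buffer)
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, next_state, reward, done):
        """Store one transition (s_t, a_t, s_{t+1}, r_{t+1}, done)."""
        self.buffer.append((state, action, next_state, reward, done))

    def sample(self, batch_size=64):
        """Sample a uniform random mini-batch, breaking temporal correlations."""
        batch = random.sample(self.buffer, batch_size)
        states, actions, next_states, rewards, dones = zip(*batch)
        return states, actions, next_states, rewards, dones

    def __len__(self):
        return len(self.buffer)
```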
If such a lookup table of Q-values could be created, it would be sparsely populated, and knowledge received from one state-action pair could not be generalized to others. The DQN's strength rests in its ability to use deep neural networks to compactly encode both the state and the Q-function. Without such a capability, it would be impossible to tackle the discrete Atari domain from raw visual inputs.
Instead of calculating the TD error based on its own rapidly fluctuating estimates of the Q-values, the policy network utilizes a fixed target network. The parameters of the target network are adjusted to match the policy network after a set number of steps during learning. Both experience replay and target networks have since been employed in other DRL experiments.
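A minimal sketch of the periodic target-network update is given below, using PyTorch modules; the small network, its layer sizes, and the sync interval are illustrative assumptions rather than the configuration used in the paper.

```python
import copy
import torch.nn as nn

# hypothetical small Q-network: 4-dimensional state -> Q-values for 5 actions
policy_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 5))
target_net = copy.deepcopy(policy_net)   # fixed target network, same initial weights

SYNC_INTERVAL = 2000  # hypothetical number of learning steps between syncs

def maybe_sync_target(step):
    """Copy the policy network's parameters into the target network every
    SYNC_INTERVAL steps; between syncs the TD target stays fixed."""
    if step % SYNC_INTERVAL == 0:
        target_net.load_state_dict(policy_net.state_dict())
```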
The network maps the state, a stack of grey-scale frames from the video game, through convolutional and fully connected layers, with nonlinearities between every layer. The network outputs a discrete action at the last layer, which corresponds to one of the game's potential control inputs. The game generates a fresh score based on the current state and the action taken. The DQN learns from its choice by using the reward, which is the difference between the new and prior score. The reward is used to refresh the network's estimate of Q, and the difference between the old and new estimates is backpropagated through the system. The design of a Dueling Deep Q-network changes a standard DQN into an architecture better suited to model-free reinforcement learning, with the goal of lowering loss as much as feasible. A typical DQN design is made up of a single stream of fully connected layers, but the Dueling DQN divides the stream into two parts: one for the value function and the other for the advantage function. Double DQN and Dueling DQN are combined in Dueling Double DQN; as a result, the overestimation problem is reduced and efficiency is better. The training sequence is viewed as a Markov Decision Process (MDP), as in the traditional reinforcement learning approach. Engaging with the environment allows the robot to make choices. Moving ahead, turning half-left, turning left, turning half-right, and turning right are some of the actions.
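The dueling split described above can be sketched as follows in PyTorch; the layer sizes, the 26-dimensional state (e.g., laser scan plus goal distance and heading), and the five-action output are assumptions for illustration, not the exact architecture used in the paper.

```python
import torch
import torch.nn as nn

class DuelingDQN(nn.Module):
    """Q-network whose head is split into a state-value stream V(s) and an
    advantage stream A(s, a), recombined as Q = V + (A - mean(A))."""

    def __init__(self, state_dim=26, num_actions=5):
        super().__init__()
        self.feature = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.value = nn.Linear(64, 1)                 # V(s)
        self.advantage = nn.Linear(64, num_actions)   # A(s, a)

    def forward(self, state):
        x = self.feature(state)
        v = self.value(x)
        a = self.advantage(x)
        # subtract the mean advantage so the V and A streams are identifiable
        return v + a - a.mean(dim=1, keepdim=True)

# Example: a batch of 8 hypothetical 26-dimensional laser-plus-goal states
q_values = DuelingDQN()(torch.randn(8, 26))
print(q_values.shape)  # torch.Size([8, 5]) -> one Q-value per discrete action
```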
2.3.3 DQN Based on Deep Reinforcement Learning
The suggested navigation technique can successfully achieve autonomous and collision-free motion of the robot to the location of the target object without constructing an environmental map in advance, according to simulated navigation experimental data. It is a successful autonomous navigation technique that demonstrates the viability of Deep Reinforcement Learning in mobile robot automated driving. This method's use is confined to agents with a discrete action space. However, some early works have used the DQN methodology to learn optimal actions from visual input. Because the convolutional neural network can automatically extract complicated features, it is the best choice when dealing with high-dimensional and continuous state data. The network is improved by Dueling DQN, which takes advantage of the model structure to express the value function more thoroughly, allowing the model to perform much better and reducing the overestimation of the DQN Q-value.
The TB3 Waffle Pi is selected here because of its advantages over previous versions. The TurtleBot3 series by ROBOTIS offers a range of open-source robotics platforms that are programmable on ROS, and highly powerful and performant despite their small size and low price. These mobile robots are equipped with a full autonomous navigation system, two servos, a motion controller, and a microcontroller. The TurtleBot3 is a lightweight, compact, cost-effective, and adaptable mobile robotics platform. It includes a single-board computer with an Intel CPU for computation, a RealSense sensor supporting deep-learning-based object detection and 3D SLAM, and increased-power actuators that ensure a maximum linear speed of 0.26 m/s and an angular speed of 1.8 rad/s. Fig. 7 shows the TB3 Waffle Pi in a ROS-Gazebo context.
A 4 × 4 map with no internal obstacles is taken as the environment, while the four walls are considered as obstacles. The TB3 should reach the goal without colliding with the walls. The main aim of the research is to test the efficiency of the conventional and proposed algorithms. The environment selected for this research is shown in Fig. 8.
In this environment, the TB3 is trained with the conventional DQN algorithm and the proposed DQN algorithm. The results are compared in terms of the number of successful targets, average time (seconds), and average rewards. In the training period, the proposed algorithm gives more successful targets compared to DQN. The comparison is tested over 1000 episodes, where an episode is a single run of the agent from the start state until a terminal state (reaching the goal, colliding, or timing out). In the 50th episode, the DQN algorithm gives only 6 successful targets, but the proposed algorithm gives 10 successful targets. As the episodes increase, the number of successful targets also increases. In the 500th episode, the proposed algorithm gives 349 successful targets while the conventional algorithm gives only 295 successful targets. So, when compared with the conventional algorithm, the proposed algorithm gives more successful targets, as shown in Fig. 9. The performance improvement rate of the proposed DQN + APF in comparison with DQN in terms of the number of successful targets is 88%.
Figure 9: Comparison of algorithms concerning the number of successful targets and episodes
The average time taken to train the TB3 Waffle Pi is lower for the proposed algorithm when compared to the conventional DQN. With the reward function providing direct feedback and the trained policy selecting the action with the highest Q-value at each step, by the 50th episode the average time taken for the conventional algorithm is 108 s, but the proposed algorithm takes only 81 s to reach the target. In the initial training period, from the 50th episode to the 400th episode, the average time taken is higher for the DQN algorithm, whereas the proposed algorithm takes very little time, as shown in Fig. 10. The performance improvement of the proposed DQN + APF in comparison with DQN in terms of average time is 0.331 s.
Figure 10: Comparison of algorithms concerning the time taken in seconds and episodes
When TB3 takes an action in a state, it receives a reward [10]. The reward design is very important for learning. A reward can be positive or negative. When TB3 gets to the goal, it gets a big positive reward; when TB3 collides with the wall, it gets a negative reward. In the initial training, from the 50th episode to the 250th episode, the conventional DQN algorithm gets only negative rewards, so the number of successful targets is very low, as shown in Fig. 8. But in the proposed algorithm, a positive reward is obtained by the 250th episode itself. Between the 300th episode and the 550th episode, the proposed algorithm gets a better average reward than the DQN algorithm, as shown in Fig. 11. The performance of the proposed DQN + APF in comparison with DQN in terms of average rewards shows that the positive goal is attained by 85% and the negative goal by −90%.
4 Conclusion
In this paper, the Artificial Potential Field is integrated with the DQN algorithm to increase the average reward and sample efficiency. In addition, the suggested method minimizes path planning time, increases the number of successful targets throughout training, reduces convergence time, and improves the TB3 mobile robot's seamless and effective mobility characteristics. In a ROS-Gazebo simulator, this study shows that the proposed algorithm can navigate a TB3 Waffle Pi to specified places. In this study, the proposed approach is limited to a static setting; the suggested methodology has yet to be investigated under dynamic environmental changes. By comparing the proposed DQN + APF with DQN in terms of the number of successful targets, the performance improvement rate is 88%. In terms of average time when compared to DQN, the proposed DQN + APF attains an improvement of 0.331 s. In terms of average rewards compared with DQN, the proposed DQN + APF attains the positive goal by 85% and the negative goal by −90%.
Acknowledgement: The authors, with a deep sense of gratitude, thank the supervisor for his guidance and constant support rendered during this research.
Funding Statement: The authors received no specific funding for this study.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the
present study.
References
[1] S. Muthukumaran and R. Sivaramakrishnan, “Optimization of mobile robot navigation using hybrid dragonfly-
cuckoo search algorithm,” Tierärztliche Praxis, vol. 40, pp. 1324–1332, 2020.
[2] P. Wang, X. Li, C. Song and S. Zhai, “Research on dynamic path planning of wheeled robot based on deep
reinforcement learning on the sloping ground,” Journal of Robotics, vol. 2020, pp. 1–20, 2020.
[3] M. Quigley, B. Gerkey and W. D. Smart, “Programming Robots with ROS,” in A Practical Introduction to the
Robot Operating System, 1st ed., Sebastopol, CA: O’Reilly Media, Inc., pp. 1–417, 2015.
[4] S. Dilip Raje, “Evaluation of ROS and gazebo simulation environment using turtlebot3 robot,” in Proc. 11th
International Conference on Simulation and Modeling Methodologies, Technologies and Applications, Setúbal,
Portugal, pp. 1–5, 2020.
[5] L. Tai and M. Liu, “Towards cognitive exploration through deep reinforcement learning for mobile robots,” arXiv
preprint arXiv:1610.01733, 2016.
[6] J. Xin, H. Zhao, D. Liu and M. Li, “Application of deep reinforcement learning in mobile robot path planning,” in
Proc. 2017 Chinese Automation Congress (CAC), Jinan, China, pp. 7112–7116, 2017.
[7] Z. Peng, J. Lin, D. Cui, Q. Li and J. He, “A multiobjective trade-off framework for cloud resource scheduling
based on the deep Q-network algorithm,” Cluster Computing, vol. 23, no. 4, pp. 2753–2767, 2020.
[8] M. M. Rahman, S. H. Rashid and M. M. Hossain, “Implementation of Q learning and deep Q network for
controlling a self balancing robot model,” Robotics and Biomimetics, vol. 5, no. 1, pp. 1–6, 2018.
[9] Y. Yang, L. Juntao and P. Lingling, “Multi-robot path planning based on a deep reinforcement learning DQN
algorithm,” CAAI Transactions on Intelligence Technology, vol. 5, no. 3, pp. 177–183, 2020.
[10] H. Bae, G. Kim, J. Kim, D. Qian and S. Lee, “Multi-robot path planning method using reinforcement learning,”
Applied Sciences, vol. 9, no. 15, pp. 3057, 2019.
[11] J. Yu, Y. Su and Y. Liao, “The path planning of mobile robot by neural networks and hierarchical reinforcement
learning,” Frontiers in Neurorobotics, vol. 14, no. 63, pp. 1–12, 2020.
[12] M. A. Ali and M. Mailah, “Path planning and control of mobile robot in road environments using sensor fusion
and active force control,” IEEE Transactions on Vehicular Technology, vol. 68, no. 3, pp. 2176–2195, 2019.
[13] Y. Sun, J. Cheng, G. Zhang and H. Xu, “Mapless motion planning system for an autonomous underwater vehicle
using policy gradient-based deep reinforcement learning,” Journal of Intelligent & Robotic Systems, vol. 96, no.
3–4, pp. 591–601, 2019.
[14] S. F. Lin and J. W. Chang, “Adaptive group organization cooperative evolutionary algorithm for TSK-type neural fuzzy
networks design,” International Journal of Advanced Research in Artificial Intelligence, vol. 2, no. 3, pp. 1–9, 2013.
[15] J. Xin, H. Zhao, D. Liu and M. Li, “Application of deep reinforcement learning in mobile robot path planning,” in
Proc. 2017 Chinese Automation Congress (CAC), Jinan, China, pp. 7112–7116, 2017.
[16] M. Liu, F. Colas, L. Oth and R. Siegwart, “Incremental topological segmentation for semi-structured
environments using discretized GVG,” Autonomous Robots, vol. 38, no. 2, pp. 143–160, 2015.
[17] S. S. Ge and Y. J. Cui, “Dynamic motion planning for mobile robots using potential field method,” Autonomous
Robots, vol. 13, no. 3, pp. 207–222, 2002.
[18] N. Sariff and N. Buniyamin, “An overview of autonomous mobile robot path planning algorithms,” in Proc. 2006
4th Student Conf. on Research and Development, Shah Alam, Malaysia, pp. 183–188, 2006.
[19] W. Sun, G. Dai, X. Zhang, X. He and X. Chen, “TBE-net: A three-branch embedding network with part-aware
ability and feature complementary learning for vehicle re-identification,” IEEE Transactions on Intelligent
Transportation Systems, vol. 99, pp. 1–13, 2021.
[20] Z. Jiangzhou, Z. Yingying, W. Shuai and L. Zhenxiao, “Research on real-time object detection algorithm in traffic
monitoring scene,” in Proc. 2021 IEEE International Conference on Power Electronics, Computer Applications
(ICPECA), Shenyang, China, pp. 513–519, 2021.
[21] L. Tai and M. Liu, “A robot exploration strategy based on q-learning network,” in Proc. 2016 IEEE Int. Conf. on
Real-Time Computing and Robotics (RCAR), Angkor Wat, Cambodia, pp. 57–62, 2016.
[22] V. François-Lavet, P. Henderson, R. Islam, M. G. Bellemare and J. Pineau, “An introduction to deep reinforcement
learning,” Foundations and Trends in Machine Learning, vol. 11, no. 3–4, pp. 219–354, 2018.
[23] K. M. Jung and K. B. Sim, “Path planning for autonomous mobile robot using potential field,” International
Journal of Fuzzy Logic and Intelligent Systems, vol. 9, no. 4, pp. 315–320, 2009.
[24] J. Sfeir, M. Saad and H. Saliah-Hassane, “An improved artificial potential field approach to real-time mobile robot
path planning in an unknown environment,” in Proc. 2011 IEEE Int. Symp. on Robotic and Sensors Environments
(ROSE), Montreal, QC, Canada, pp. 208–213, 2011.
[25] O. Khatib, “Real-time obstacle avoidance for manipulators and mobile robots,” in Proc. Autonomous Robot
Vehicles, New York, NY, pp. 396–404, 1986.
[26] H. E. Romeijn and R. L. Smith, “Simulated annealing for constrained global optimization,” Journal of Global
Optimization, vol. 5, no. 2, pp. 101–126, 1994.
[27] R. Dhaya, R. Kanthavel and A. Ahilan, “Developing an energy-efficient ubiquitous agriculture mobile sensor
network-based threshold built-in MAC routing protocol, ” Soft Computing, vol. 25, no. 18, pp. 12333–12342, 2021.
[28] A. Ahilan, G. Manogaran, C. Raja, S. Kadry, S. N. Kumar et al., “Segmentation by fractional order darwinian
particle swarm optimization based multilevel thresholding and improved lossless prediction-based compression
algorithm for medical images,” IEEE Access, vol. 7, pp. 89570–89580, 2019.
[29] B. Sivasankari, A. Ahilan, R. Jothin and A. J. G. Malar, “Reliable N sleep shuffled phase damping design for
ground bouncing noise mitigation,” Microelectronics Reliability, vol. 88, pp. 1316–1321, 2018.
[30] M. Liu, F. Colas, F. Pomerleau and R. Siegwart, “A markov semi-supervised clustering approach and its
application in topological map extraction,” in Proc. 2012 IEEE/RSJ Int. Conf. on Intelligent Robots and
Systems, Vilamoura-Algarve, Portugal, pp. 4743–4748, 2012.
[31] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., “Human-level control through deep
reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[32] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long et al., “Caffe: Convolutional architecture for fast feature
embedding,” in Proc. 22nd ACM Int. Conf. on Multimedia, Lisboa, Portugal, pp. 675–678, 2014.
[33] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., “Mastering the game of go with deep neural
networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.