Multi_Agent_Reinforcement_Learning_for_SWARM_UAV_Collision_Avoidance

The document discusses a proposed Deep Q-Learning based Dynamic Swarm Pattern Formation (DSPF) model for Unmanned Aerial Vehicles (UAVs) that enhances automation in mission execution through a Speed Control based Reinforcement Learning (SC-RL) algorithm. This model facilitates efficient pattern formation and collision avoidance among multiple UAVs while allowing for dynamic pattern switching and decentralized coordinate calculation. Simulation results indicate significant improvements in pattern formation time and distance covered, showcasing the model's feasibility in complex environments.

Uploaded by

Edison Chandraseelan

Simulation and Path Optimization of SWARM UAV Formation Exchange Using Multi Agent Reinforcement Learning
Vasantharaj Rajagopal, K. Senthil Kumar

Vasantharaj Rajagopal and Senthil Kumar K are with the Department of Aerospace Engineering, Anna University, Chennai, India (e-mail: [email protected], ksk [email protected]).

Abstract—Unmanned Aerial Vehicles (UAVs) have been widely used in various applications, mainly for surveillance and crowd-sensing. A swarm setup is an environment where multiple UAVs coordinate to execute a specific mission. Such a swarm of UAVs is useful for accessing areas that humans cannot reach. To meet the need for automation among a swarm of UAVs, a Deep Q-Learning based Dynamic Swarm Pattern Formation (DSPF) model is proposed. The proposed Speed Control based Reinforcement Learning (SC-RL) algorithm enhances the DSPF model to achieve pattern formation in an automated manner. The SC-RL algorithm strives to switch patterns efficiently by avoiding inter-UAV collisions while keeping the trajectory optimal throughout the pattern switching mechanism. In this pattern formation process, a Decentralized Coordinate Calculation (DCC) algorithm is proposed to facilitate parallelized coordinate calculation and to reduce the pattern formation time. The DSPF model is also able to switch patterns on the fly with the help of the Servo Interrupt based Pattern Switch (SIPS) control, making it adaptable to a multitude of scenarios. Simulation has been performed for 50 UAVs, and the results showcase the feasibility of the proposed DSPF model in a dense collision-prone environment by improving the pattern formation time and distance covered by almost 95.68% and 66.67%, respectively.

Index Terms—Pattern Formation, Servo Interrupt, Reinforcement Learning, Decentralized Coordinate Calculation, Speed Control, Swarm of UAVs

I. INTRODUCTION

Unmanned Aerial Vehicles (UAVs) have proven a significant asset to the world by bringing efficiency and accuracy to various commercial and civilian applications, such as military combat, transportation, and agriculture [34]. It has been estimated that the global UAV market will reach US$21.47 billion, with the Indian market touching the US$885.7 million mark, by 2021. UAVs boast several functionalities, namely reactive autonomy [33], Simultaneous Localization And Mapping (SLAM), swarming, etc., which have tremendous potential for extensive research. Of the several functionalities demonstrated by UAVs, their pattern formation capability holds key importance in their primary use cases [13].

During pattern formation, a swarm of UAVs is empowered with the capability of forming different patterns that adapt to the different underlying environments in which they are deployed. This dynamic pattern switching is necessary for the efficient execution and completion of any UAV mission like search and rescue, drone delivery systems, unmanned surveillance, etc. The primary aim of the operation is to reduce the distance between the drones and the target areas [7]. Traditional pattern formation mechanisms involve pre-programming of the entire mission, where patterns are formed at specific intervals of time [12]. However, these models would not be suitable in scenarios where pattern requirements are uncertain. Thus, the pattern formation mechanism must be equipped with the ability to switch patterns on demand dynamically.

A framework for dynamic multi-target tracking and data-collection sensing with UAVs has been implemented using a holistic reputation model and dynamic intelligent matching. The optimization for each target uses a Stackelberg game-theoretic formulation and Stackelberg equilibrium conditions. The work is based on labor economics, and its authors plan to extend it using the principles of contract theory. This method can be widely used to support crowdsourcing IoT applications and public safety scenarios [44].

One of the most crucial factors in designing an autonomous pattern formation algorithm is the mechanism of collision avoidance [11]. Several collision avoidance mechanisms rely on vision-based methods, dynamic programming approaches, or data obtained from multiple sensors mounted on the drones [26], [27]. Since the pattern formation environment tends to be highly collision prone, such systems prove inefficient due to excessive processing requirements, which prolong pattern formation. To cope with this increased processing load, traditional pattern formation mechanisms add an extra time delay between the movements of UAVs [25]. Unfortunately, these systems offer reduced processing requirements at the cost of scalability, making them unsuitable for use in widespread multi-drone applications. It is therefore essential to build pattern formation mechanisms which are speedy, scalable, and able to mitigate inter-UAV collisions with high accuracy.

Further, a typical swarm pattern formation algorithm focuses on the Leader-Follower methodology. This model adopts a centralized approach in which the leader drone alone calculates the positions of the follower drones, communicates essential run-time decisions to the followers, and so on. This centralized approach tends to increase the time complexity of
the algorithm, thereby decreasing the overall efficiency of the system.

Real-time environments involve complex data with many uncertainties, so dynamic measurement requires efficient self-adaptive methods. Certain self-adaptive methods have been developed for communication applications. To overcome the dependence on stochastic sampling in Next Generation Multiple Access (NGMA) systems, a pair of scheduling policies called multiple vacation and startup threshold has been developed to reduce the power consumption of TDMA, FDMA and CDMA systems. This algorithm is less complex and works well for TDMA and FDMA systems [39]. A swarm of unmanned aerial/surface/underwater vehicles (UAV-USV-UUV) has been proposed to provide fine-grained spatial-temporal information acquisition for underwater target hunting. A DQN-based energy optimization scheme is proposed, with a reported success rate of 95%. However, this method still needs to be tested in real-time and multi-target environments [40]. Successive refinement of Gaussian and spherical codebooks with an arbitrary memoryless source under the quadratic distortion measure has been proposed [41]. An achievable joint excess-distortion probability is achieved under a mild moment condition. For large deviations, the Gaussian codebook alone is considered for refinement. This method can be used for many multi-terminal lossy source coding problems. An optimal power allocation algorithm for Non-Orthogonal Multiple Access (NOMA) is proposed [42] to achieve minimum transmission power with reliable transmission in 5G networks carrying low-latency critical traffic. The Bit Error Rate (BER) of the receiver is improved for a limited number of receivers. A low-complexity, graph-theory-based Integrated Sensing and Communication (ISAC) network has been developed to provide low transmission delay and resource optimization. The proposed methodology offers an alternative solution for avoiding unbearable data transmission delay in routers [43].

Thus, to tackle the problems mentioned above and to increase the autonomy in operations, this paper proposes the following solutions for the pattern formation environment.

• A Speed Control based Reinforcement Learning (SC-RL) algorithm helps to detect and avoid collisions automatically among a dynamic number of UAVs. This SC-RL algorithm also ensures that the drones' trajectory remains optimal, thereby reducing the pattern formation time.
• A Servo Interrupt based Pattern Switch (SIPS) control helps to switch patterns on demand dynamically. This mechanism caters to the varying needs of the pattern formation environment effectively.
• A Decentralized Coordinate Calculation (DCC) algorithm brings parallelism to the calculation of drones' positions in the Leader-Follower environment. This algorithm helps to reduce the burden on the leader drone and also drastically reduces the pattern formation time.

The specific pattern formation can cover a large area with grid formation, which is most suitable for surveillance and search and rescue operations with optimized time and resources. Forming a pattern reduces redundancy through less overlapping coverage. Pattern formation can also provide efficient resource allocation by adding or removing drones in the swarm based on resource availability. Representing the UAVs in the form of patterns enables effective collaboration with humans, which can help to easily identify faults.

The rest of the paper is organized as follows. The existing methodologies of pattern formation among UAVs, obstacle detection, and utilization of reinforcement learning in the swarm of UAVs have been studied and presented in Section II. Section III gives an in-depth view of the proposed DSPF model, describing the primary SC-RL algorithm and the other submodules, namely the DCC algorithm and SIPS control. A detailed analysis of the simulation and the results, along with performance analysis, has been presented in Section IV. Finally, Section V presents the conclusion with future directions.

II. RELATED WORK

Swarm of UAVs is a concept that paves the way for coordinated UAV missions [8]–[10] in variable ecosystems. Communication and coordination among drones employed in the system must be established [1] so that the multiple drones will act as a single unit. This communication is generally facilitated with the help of the MAVLink protocol [3]. A critical function of such a system is pattern formation. Any multi-UAV configuration setup consists of a Base Station (BS) and a Ground Control Station (GCS), which provide grounds for wireless communication and centralized control of the swarm of UAVs [4]–[6]. The work in [2] proposed an idea of a coordinated multi-UAV pattern formation mechanism that involves consideration of various physical parameters like wind speed, the direction of the wind, obstacles, etc.

Making the multiple UAVs work as a single unit will help us to form patterns that help in the efficient execution of UAV missions [12]–[14]. Before pattern formation, the task needs to be grouped into multiple closely related clusters based on its geographical location [17]. During pattern formation, the coordination among UAVs is of prime importance. Leader-follower based formation flight [15], [16] is an efficient mechanism for inter-UAV coordination. In this mechanism, a single drone is selected as the leader and the rest as followers. The leader broadcasts its positional coordinates and other state manipulation functions to all the followers, based on which critical decisions are taken.

With an increase in the number of UAVs in the pattern formation environment, it becomes complicated to control the system manually. There is a critical need for automation, which is achieved by Reinforcement Learning (RL) [28]–[32]. In the UAV field, RL provides methodologies for autonomous UAV navigation, collision avoidance, the stable hovering of drones, etc. [18]–[20].

Another determining factor in the application of autonomous pattern formation algorithms is the mechanism of collision avoidance. The collision avoidance mechanism of
multiple drones is based on the UAV relative distance calculation model [21]–[23]. Every agent in a multi-UAV system considers only the relative positions of its nearest neighbors on both left and right sides. The work in [25] instructs the drones to fly to their destinations at different timings to avoid collisions. In the absence of the Global Positioning System (GPS), the collision avoidance mechanism can be executed with the help of a camera and a laser detector [24]. However, these mechanisms pose serious processing overhead and poor scalability, which makes them inefficient in pattern formation environments. Control of multiple UAVs with the Micro Air Vehicle Link (MAVLink) protocol using the Model-View-Controller (MVC) pattern has been proposed in [38]. A new MAVLink message has been established to control a group of UAVs. However, this method needs an improvement in the communication channel, which often gets overloaded. Table I shows the comparison of different patterns with their advantages and disadvantages. It shows the need for automatic pattern formation and control for multi-agent systems.

TABLE I: Comparison of existing patterns with their merits and demerits

| REF NO. | ALGORITHM | ADVANTAGES | DISADVANTAGES |
|---|---|---|---|
| [7] | Artificial Immune System (AIS), Particle Swarm Optimization (PSO), Virtual Bee Algorithm (VBA) | Solves NP-hard problem. Provides best results compared with classical combinatorial optimization methods. | Does not consider trajectory optimization. |
| [4] | Many-to-Many Learning Matching (MLMA), Fast Potential Matching Without Substitutability (FPMA) | FPMA provides a convergence performance which is robust to dynamic UAV communication networks. | Dynamic situation of drone needs to be analyzed. |
| [1] | Oxyrrhis Marina-inspired search (OMS), Circular or Elliptical | Performs both exploration and exploitation of the search space using Levy flight and Brownian search. | The complexity of the formation leads to high computational and communication overhead. |
| [16] | Leader–Follower consensus algorithm, 2 Drones | Consensus can be achieved in multivehicle wirelessly networked uncertain systems with some conditions. | It may not be suitable for scenarios where the formation needs to adapt to changing conditions or obstacles. |
| [14] | Multiverse optimizer (MVO) | Path planning is superior compared with other meta-heuristic algorithms. | Produces noise in the communication phase. |
| [20] | Sophisticated path planning and collision avoidance algorithms, Circle and fountain pattern | Application of Reinforcement Learning deeply analyzes the state space, action space and environment model. | The obtained control decision time is 100 ms. |
| [25] | Deep Reinforcement Learning (DRL) | It optimizes the trajectory of the UAV-BS to minimize the outage probability and maximize the user satisfaction index. | Computationally expensive and requires large amounts of data to train the models. |
| [28] | Agent Learning-based Clustering Algorithm | Improves the network stability by facilitating adaptive clustering. | Nodes in the transmission range affect the decision in route selection. |

This paper proposes an idea to automate the pattern formation of multiple drones in a multi-agent system. The concept of RL is employed to develop the SC-RL algorithm that controls the DSPF model. For pattern formation, the SC-RL algorithm takes the mission and the pattern as input and automatically determines the position for each drone without any collision in the shortest time possible. Further, the DCC algorithm has been modeled to reduce pattern formation time.

III. DYNAMIC SWARM PATTERN FORMATION (DSPF) MODEL

In a multi-UAV environment, pattern formation is an essential function that helps in making UAV missions efficient and cost-effective. The proposed DSPF model helps automate this pattern formation process and ensures that collision is avoided throughout the UAV mission. This automation process would considerably reduce human intervention, saving much manual work like setting waypoints, tracking UAV trajectories for collisions, etc. The DSPF model comprises two functionalities, namely the SIPS control and DCC mechanisms. The DSPF is also simulated and evaluated using a Software In The Loop (SITL) setup.

A. Software In The Loop (SITL) Setup

In the SITL environment, the proposed DSPF model has been tested over a multitude of scenarios. The SITL environment provides ground for effectively simulating any swarm of UAVs module, replacing all the hardware components by corresponding software modules. This environment helps to reduce the testing costs and expand the number of scenarios to which the given module can be exposed.

The proposed DSPF model manipulates the physical simulation control layer to achieve efficient pattern formation based on the input data obtained from the GCS via the Ardupilot. Fig. 1 illustrates the various modules in a SITL architecture for the DSPF model. Here the GCS and telemetry port modules are executed on MAVProxy software that helps to establish overall control of the swarm system and collect valuable data during mission execution. The obtained telemetry data can also be shared with multiple other GCS modules. The Ardupilot module serves to provide a user interface enabling communication among the user, the GCS, and the physical control modules like Physics Simulation, Flight Gear, etc. These physical control modules contain built-in functions that help to replicate the actual UAV's motion and operation. TCP provides the communication between the Ardupilot, GCS, and Telemetry port modules, while UDP carries the rest of the communications [36].

Fig. 1: SITL architecture (modules shown: Ardupilot, Physics Simulation, Flight Control, GCS / Mission Planner, Telemetry Port, Other GCS; links labeled TCP and UDP)

B. Servo Interrupt based Pattern Switching (SIPS) Control

The outermost layer of the proposed DSPF model, which decides the pattern to be formed at a specific juncture, is the interaction between the user and the multi-UAV system. In the traditional and most common approach, patterns are pre-programmed in a time-interleaved sequential manner. This time-based sequential pattern formation is not feasible in situations where the swarm needs to switch to a desired pattern on the fly. To cater to such scenarios, a SIPS control mechanism is proposed to easily switch to the desired pattern on the fly. These servo(s) can be controlled directly through the mission planner via commands sent from the GCS [35]. The servo input consists of many Radio Frequency (RF) channels, also called the servo source. Each RF channel can be programmed to form a specific pattern under a given scenario.

Algorithm 1 illustrates the servo based pattern switching mechanism for a general quadcopter [12] for a sample of three shapes, namely square, triangle, and diamond. In a quadcopter, the first four RF channels are reserved for specific drone motion commands, and hence channels 5, 6, 7, and 8 are used for the pattern formation. The system listens for a servo interrupt, based on which it determines the appropriate servo source value. The leader drone is instructed to switch to a specified formation whenever a toggle of a particular servo source has happened. A toggle in the value of servo source 8 instructs the swarm of drones to return to their initial takeoff location, where the drones eventually land. Thus the SIPS mechanism serves to dynamically form patterns in a multi-UAV environment on the fly.

Algorithm 1 Servo Interrupt Based Pattern Formation
Input: The servo interrupt signal, k
Output: Pattern formation based on the servo interrupt
procedure SERVOBASEDPATTERNFORMATION(k)
    Let the source of the servo interrupt be servoSource
    if servoSource == 5 then
        Switch to a square pattern
    else if servoSource == 6 then
        Switch to a triangle pattern
    else if servoSource == 7 then
        Switch to a diamond pattern
    else if servoSource == 8 then
        Return to home and land

C. Decentralized Coordinate Calculation (DCC)

Once the specified pattern has been input to the system through servo interrupts, it is essential to determine the coordinates of each UAV in the swarm system. The general UAV coordinate calculation algorithm in a leader-follower environment would take place in a centralized manner [12]. In such a scenario, the leader would be responsible for calculating the positions of the followers based on its position and then communicating the coordinates to the followers. The follower's task would be to receive the calculated coordinates from the leader and then move to the specified destination. In this scenario, the complete overhead lies on the leader, which reduces the efficiency of the system. When the number of drones increases on a large scale, the complexity of the coordinate calculation algorithm increases to O(n).

To avoid such an overhead, the DCC algorithm is proposed. Fig. 2 shows the flow of control of the proposed DCC algorithm. The GCS specifies the pattern to be formed to the leader drone. The leader drone, upon receiving the pattern information, broadcasts the pattern information and its positional coordinates to all the follower drones. Each follower is pre-assigned a unique ID before the start of the process so that no two drones are assigned the same position during the coordinate calculation. The follower uses its unique ID, the leader's positional data, and the pattern formation information to calculate its absolute position. Once the destination coordinates are obtained, the followers move to the specified location independently. When all the followers reach the destination, they issue feedback to the leader. When the leader receives the feedback from all the followers, it confirms the pattern formation, and this information is made known to the GCS.

Since the followers have a unique ID, the coordinate calculation for each follower becomes independent, and hence it is parallelized. Each follower computes only its own position, which makes the DCC algorithm independent of the number of drones, reducing the complexity to O(1).

DCC for Square, Diamond and Triangle Patterns: The DCC algorithm is the central part of the pattern formation mechanism. The algorithm operates in a parallel fashion and calculates the destination positions for all the drones based on the leader's coordinates, the drone's uniqueID, and the information about the pattern to be formed. The implementation of the proposed DCC algorithm has been done for three shapes, namely square, diamond, and triangle, and has been designed to operate for any number of drones, $n_d$. For each drone, the positional coordinates are represented
as $(r, \theta)$, where $r$ represents the displacement from the leader drone, and $\theta$ represents the azimuth angle. The azimuth angle is the angle between true north and the line joining the leader and the follower drone. The leader drone is taken as the reference from which true north is measured, and its coordinates are represented as (0, 0). The hover height for all the drones is fixed as 20 meters, and the minimum inter-drone distance is 10 meters.

Fig. 2: Distributed coordinate calculation mechanism (the Ground Control System specifies the pattern to the leader; the leader sends its position to followers 1 to n; each follower performs the coordinate calculation from its unique ID and the leader's position and returns its own position)

Fig. 3: Representation of Square, Diamond and Triangle Patterns; panels (a) Square, (b) Diamond, (c) Triangle, each with vertices A, B, C, D and sides Side 1 to Side 4

TABLE II: Notations of the variables used in patterns

| S. No | Variable | Function |
|---|---|---|
| 1 | $r$ | Displacement from the leader |
| 2 | $\theta$ | Azimuth angle |
| 3 | $l$ | Edge length |
| 4 | $n_{de}$ | Number of drones per edge |
| 5 | $n_d$ | Number of drones |
| 6 | $n_r$ | Remainder number of drones |
| 7 | $k$ | Coordinate calculation ID |

As shown in Fig. 3, the shapes, namely square, diamond, and triangle, are considered to contain four sides, and drones are equally divided among these sides. The edge length, $l$, is calculated as the product of the highest number of drones in an edge and the minimum inter-drone distance. The number of drones per edge $n_{de}$ is calculated as

\[ n_{de} = \left\lfloor \frac{n_d}{4} \right\rfloor \tag{1} \]

With this value of $n_{de}$ and the uniqueID, the drone positions are calculated by the followers. Also, the remainder number of drones $n_r$ is calculated as

\[ n_r = n_d - 4 \cdot n_{de} \tag{2} \]

1) Square Pattern: In a square pattern, the length of each side is $l = 10\,n_{de}$, since a gap of 10 metres is assumed between adjacent drones. Here, the value of $(r, \theta)$ is calculated for a drone with uniqueID $k$ for the following cases:

Case I: $n_r = 0$, i.e., $n_d$ is an integral multiple of 4

The following equations represent the positional coordinates of drones when the total number of drones $n_d$ is an integral multiple of 4, i.e., $n_r = 0$.

\[ r = k \cdot 10, \quad \theta = 90^\circ, \quad \text{where } 1 \le k \le n_{de} \tag{3} \]

\[ r = \sqrt{l^2 + (t \cdot 10)^2}, \quad \theta = 90^\circ + \tan^{-1}\!\left(\frac{t \cdot 10}{l}\right), \quad \text{where } n_{de} < k \le 2n_{de},\ t = k - n_{de} \tag{4} \]

\[ r = \sqrt{l^2 + \big((n_{de} - t) \cdot 10\big)^2}, \quad \theta = 90^\circ + \tan^{-1}\!\left(\frac{l}{(n_{de} - t) \cdot 10}\right), \quad \text{where } 2n_{de} < k \le 3n_{de},\ t = k - 2n_{de} \tag{5} \]

\[ r = (n_{de} - t) \cdot 10, \quad \theta = 180^\circ, \quad \text{where } 3n_{de} < k < 4n_{de},\ t = k - 3n_{de} \tag{6} \]

Eq. (3), (4), (5), and (6) give the positions of drones when the uniqueID $k$ lies in the respective ranges.

Case II: $n_r = 1$

When the remaining number of drones is $n_r = 1$, the additional drone is placed on side2 of the square. The equations for the coordinate calculation for a drone with uniqueID $k$ change as follows:

\[ r = k \cdot \frac{l}{n_{de}}, \quad \theta = 90^\circ, \quad \text{where } 1 \le k \le n_{de} \tag{7} \]
\[ r = \sqrt{l^2 + (t \cdot 10)^2}, \quad \theta = 90^\circ + \tan^{-1}\!\left(\frac{t \cdot 10}{l}\right), \quad \text{where } n_{de} < k \le 2n_{de} + 1,\ t = k - n_{de} \tag{8} \]

\[ r = \sqrt{l^2 + \left((n_{de} - t) \cdot \frac{l}{n_{de}}\right)^2}, \quad \theta = 90^\circ + \tan^{-1}\!\left(\frac{n_{de}}{n_{de} - t}\right), \quad \text{where } 2n_{de} + 1 < k \le 3n_{de} + 1,\ t = k - (2n_{de} + 1) \tag{9} \]

\[ r = (n_{de} - t) \cdot \frac{l}{n_{de}}, \quad \theta = 180^\circ, \quad \text{where } 3n_{de} + 1 < k < 4n_{de} + 1,\ t = k - (3n_{de} + 1) \tag{10} \]

In Eq. (7), (8), (9) and (10), the length of the side of the square is $l = 10 \cdot (n_{de} + 1)$. The ranges of the equations have also been adjusted to accommodate the extra drone's positional calculation.

Case III: $n_r = 2$

When the remaining number of drones is $n_r = 2$, the additional drone is placed on side4 of the square, compared to the previous configuration. Here, Eq. (7), (8) and (9) remain the same, but Eq. (10) is changed as follows:

\[ r = (n_{de} - t + 1) \cdot 10, \quad \theta = 180^\circ, \quad \text{where } 3n_{de} + 1 < k \le 4n_{de} + 1,\ t = k - (3n_{de} + 1) \tag{11} \]

Case IV: $n_r = 3$

When the remaining number of drones is $n_r = 3$, the additional drone is placed on side3 of the square, compared to the previous configuration. Here, Eq. (7) and (8) remain the same, whereas Eq. (9) and (11) are changed as follows:

\[ r = \sqrt{l^2 + \big((n_{de} - t + 1) \cdot 10\big)^2}, \quad \theta = 90^\circ + \tan^{-1}\!\left(\frac{l}{(n_{de} - t + 1) \cdot 10}\right), \quad \text{where } 2n_{de} + 1 < k \le 3n_{de} + 2,\ t = k - (2n_{de} + 1) \tag{12} \]

\[ r = (n_{de} - t + 1) \cdot 10, \quad \theta = 180^\circ, \quad \text{where } 3n_{de} + 2 < k \le 4n_{de} + 2,\ t = k - (3n_{de} + 2) \tag{13} \]

Thus in the square formation, the azimuth angle is constant across all drones that are part of side1 and side4, having values of 90° and 180°, respectively.

2) Diamond Pattern: The diamond shape is assumed to be a square which has been tilted by 45°. Here, the value of $(r, \theta)$ is calculated for a drone with uniqueID $k$ for the following cases:

Case I: $n_r = 0$, i.e., $n_d$ is an integral multiple of 4

Eq. (3), (4), (5) and (6), with an addition of 45° to the azimuth angle $\theta$, determine the drones' positions in a diamond when the total number of drones $n_d$ is an integral multiple of 4, i.e., $n_r = 0$.

Case II: $n_r = 1$

When the remaining number of drones is $n_r = 1$, the additional drone is placed on side2 of the diamond. The equations for the coordinate calculation for a drone with uniqueID $k$ change as follows:

\[ r = k \cdot \frac{l}{n_{de}}, \quad \theta = 135^\circ, \quad \text{where } 1 \le k \le n_{de} \tag{14} \]

\[ r = \sqrt{l^2 + (t \cdot 10)^2}, \quad \theta = 135^\circ + \tan^{-1}\!\left(\frac{t \cdot 10}{l}\right), \quad \text{where } n_{de} < k \le 2n_{de} + 1,\ t = k - n_{de} \tag{15} \]

\[ r = \sqrt{l^2 + \left((n_{de} - t) \cdot \frac{l}{n_{de}}\right)^2}, \quad \theta = 225^\circ - \tan^{-1}\!\left(\frac{n_{de} - t}{n_{de}}\right), \quad \text{where } 2n_{de} + 1 < k \le 3n_{de} + 1,\ t = k - (2n_{de} + 1) \tag{16} \]

\[ r = (n_{de} - t) \cdot \frac{l}{n_{de}}, \quad \theta = 225^\circ, \quad \text{where } 3n_{de} + 1 < k < 4n_{de} + 1,\ t = k - (3n_{de} + 1) \tag{17} \]

In Eq. (14), (15), (16) and (17), the length of the side of the diamond is $l = 10 \cdot (n_{de} + 1)$. This value of $l$ remains the same for the further equations as well.

Case III: $n_r = 2$

When the remaining number of drones is $n_r = 2$, the additional drone is placed on side3 of the diamond, compared to the previous configuration. Here, Eq. (14) and (15) remain the same, whereas Eq. (16) and (17) are changed as follows:

\[ r = \sqrt{l^2 + \big((n_{de} - t + 1) \cdot 10\big)^2}, \quad \theta = 225^\circ - \tan^{-1}\!\left(\frac{(n_{de} - t + 1) \cdot 10}{l}\right), \quad \text{where } 2n_{de} + 1 < k \le 3n_{de} + 2,\ t = k - (2n_{de} + 1) \tag{18} \]

\[ r = (n_{de} - t) \cdot \frac{l}{n_{de}}, \quad \theta = 225^\circ, \quad \text{where } 3n_{de} + 2 < k < 4n_{de} + 2,\ t = k - (3n_{de} + 2) \tag{19} \]

Case IV: $n_r = 3$

When the remaining number of drones is $n_r = 3$, the additional drone is placed on side4 of the diamond, compared to the previous configuration. Here, Eq. (14), (15) and (18) remain the same, whereas Eq. (19) is modified as follows:

\[ r = (n_{de} - t + 1) \cdot 10, \quad \theta = 225^\circ, \quad \text{where } 3n_{de} + 2 < k \le 4n_{de} + 2,\ t = k - (3n_{de} + 2) \tag{20} \]
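As a rough illustration of how a single follower might evaluate the Case I equations on its own, the sketch below computes $n_{de}$ and $n_r$ from Eq. (1) and (2) and then $(r, \theta)$ from Eq. (3) to (6), with an optional 45° tilt for the Case I diamond. It is only a sketch under stated assumptions: the function names (`drones_per_edge`, `case1_position`, `to_north_east`) and the north/east conversion are hypothetical and do not come from the paper, and `atan2` is used in place of $\tan^{-1}$ of a quotient so that the corner where $(n_{de} - t) = 0$ does not divide by zero.

```python
import math

SPACING_M = 10.0  # minimum inter-drone distance stated in the text


def drones_per_edge(n_d):
    """Eq. (1) and Eq. (2): drones per edge n_de and remainder n_r."""
    n_de = n_d // 4
    n_r = n_d - 4 * n_de
    return n_de, n_r


def case1_position(k, n_de, tilt_deg=0.0):
    """(r, theta in degrees) for follower k when n_r == 0, per Eq. (3)-(6).

    tilt_deg=45 gives the Case I diamond (square equations plus 45 degrees).
    Hypothetical helper, not the authors' implementation.
    """
    if not 1 <= k < 4 * n_de:
        raise ValueError("k outside the ranges covered by Eq. (3)-(6)")
    l = SPACING_M * n_de  # side length of the square
    if k <= n_de:  # side 1, Eq. (3)
        r, theta = k * SPACING_M, 90.0
    elif k <= 2 * n_de:  # side 2, Eq. (4)
        t = k - n_de
        r = math.hypot(l, t * SPACING_M)
        theta = 90.0 + math.degrees(math.atan2(t * SPACING_M, l))
    elif k <= 3 * n_de:  # side 3, Eq. (5)
        t = k - 2 * n_de
        r = math.hypot(l, (n_de - t) * SPACING_M)
        theta = 90.0 + math.degrees(math.atan2(l, (n_de - t) * SPACING_M))
    else:  # side 4, Eq. (6)
        t = k - 3 * n_de
        r, theta = (n_de - t) * SPACING_M, 180.0
    return r, theta + tilt_deg


def to_north_east(r, theta_deg):
    """Assumed convention: azimuth measured clockwise from true north."""
    rad = math.radians(theta_deg)
    return r * math.cos(rad), r * math.sin(rad)


if __name__ == "__main__":
    n_de, n_r = drones_per_edge(8)
    for k in range(1, 4 * n_de):
        print(k, case1_position(k, n_de), case1_position(k, n_de, tilt_deg=45.0))
```

Because each call depends only on `k`, `n_de` and the pattern, every follower can run it independently, which is the property the DCC description relies on. The remainder cases ($n_r = 1, 2, 3$) are deliberately left out of the sketch.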
Thus in the diamond formation, the azimuth angle is constant across all drones that are part of side1 and side4, having values of 135° and 225°, respectively.

3) Triangle Pattern: The triangle considered in this formation is right-angled at the vertex 'A'. Even though the triangle is a three-sided figure, we consider it to contain four sides, with the hypotenuse being divided into two equal sides at 'C', as shown in Fig. 3c. The length of half of the hypotenuse is $l = 10\,n_{de}$ due to a gap of 10 meters between adjacent drones. The length of each of the other two sides of the triangle is equal to $l\sqrt{2}$. Here, the value of $(r, \theta)$ is calculated for a drone with uniqueID $k$ for the following cases:

Case I: $n_r = 0$, i.e., $n_d$ is an integral multiple of 4

The following equations represent the positional coordinates of drones when the total number of drones $n_d$ is an integral multiple of 4, i.e., $n_r = 0$.

\[ r = k \cdot 10\sqrt{2}, \quad \theta = 135^\circ, \quad \text{where } 1 \le k \le n_{de} \tag{21} \]

\[ r = \sqrt{l^2 + \big((n_{de} - t) \cdot 10\big)^2}, \quad \theta = 180^\circ - \tan^{-1}\!\left(\frac{n_{de} - t}{n_{de}}\right), \quad \text{where } n_{de} < k \le 2n_{de},\ t = k - n_{de} \tag{22} \]

\[ r = \sqrt{l^2 + (t \cdot 10)^2}, \quad \theta = 180^\circ + \tan^{-1}\!\left(\frac{t}{n_{de}}\right), \quad \text{where } 2n_{de} < k \le 3n_{de},\ t = k - 2n_{de} \tag{23} \]

\[ r = (n_{de} - t) \cdot 10\sqrt{2}, \quad \theta = 225^\circ, \quad \text{where } 3n_{de} < k < 4n_{de},\ t = k - 3n_{de} \tag{24} \]

Eq. (21), (22), (23) and (24) give the positions of drones when the uniqueID $k$ lies in the respective ranges.

Case II: $n_r = 1$

When the remaining number of drones is $n_r = 1$, the additional drone is placed on side2 of the triangle. The equations for the coordinate calculation for a drone with uniqueID $k$ change as follows:

\[ r = k\sqrt{2} \cdot \frac{l}{n_{de}}, \quad \theta = 135^\circ, \quad \text{where } 1 \le k \le n_{de} \tag{25} \]

\[ r = \sqrt{l^2 + \big((n_{de} - t + 1) \cdot 10\big)^2}, \quad \theta = 180^\circ - \tan^{-1}\!\left(\frac{n_{de} - t + 1}{n_{de} + 1}\right), \quad \text{where } n_{de} < k \le 2n_{de} + 1,\ t = k - n_{de} \tag{26} \]

\[ r = \sqrt{l^2 + \left(t \cdot \frac{l}{n_{de}}\right)^2}, \quad \theta = 180^\circ + \tan^{-1}\!\left(\frac{t}{n_{de}}\right), \quad \text{where } 2n_{de} + 1 < k \le 3n_{de} + 1,\ t = k - (2n_{de} + 1) \tag{27} \]

\[ r = (n_{de} - t) \cdot \frac{l}{n_{de}} \cdot \sqrt{2}, \quad \theta = 225^\circ, \quad \text{where } 3n_{de} + 1 < k < 4n_{de} + 1,\ t = k - (3n_{de} + 1) \tag{28} \]

In Eq. (25), (26), (27) and (28), the length of the non-hypotenuse sides of the triangle is $l = 10 \cdot (n_{de} + 1)$. This value of $l$ remains the same for the further equations as well.

Case III: $n_r = 2$

When the remaining number of drones is $n_r = 2$, the additional drone is placed on side3 of the triangle, compared to the previous configuration. Here, Eq. (25) and (26) remain the same, whereas Eq. (27) and (28) are modified as follows:

\[ r = \sqrt{l^2 + (t \cdot 10)^2}, \quad \theta = 180^\circ + \tan^{-1}\!\left(\frac{t}{n_{de} + 1}\right), \quad \text{where } 2n_{de} + 1 < k \le 3n_{de} + 2,\ t = k - (2n_{de} + 1) \tag{29} \]

\[ r = (n_{de} - t) \cdot \frac{l}{n_{de}}, \quad \theta = 225^\circ, \quad \text{where } 3n_{de} + 2 < k < 4n_{de} + 2,\ t = k - (3n_{de} + 2) \tag{30} \]

Case IV: $n_r = 3$

When the remaining number of drones is $n_r = 3$, the additional drone is placed on side4 of the triangle, compared to the previous configuration. Here, Eq. (25), (26) and (29) remain the same, whereas Eq. (30) is modified as follows:

\[ r = (n_{de} - t + 1) \cdot 10\sqrt{2}, \quad \theta = 225^\circ, \quad \text{where } 3n_{de} + 2 < k \le 4n_{de} + 2,\ t = k - (3n_{de} + 2) \tag{31} \]

Thus in the triangle formation, the azimuth angle is constant across all drones that are part of side1 and side4, having values of 135° and 225°, respectively.

Thus, the DCC implementation illustrated in Algorithm 2 calculates the coordinates for all the drones in a decentralized and parallel fashion.

D. Speed Control based Reinforcement Learning (SC-RL) Model

During the pattern formation process, if all the drones move simultaneously towards their destinations, there are chances of a collision. Hence, a small time delay is generally introduced between the movements of successive drones [25]. Although such a mechanism creates a collision-free pattern formation in many cases, it is not the most efficient one since pattern formation time is high, and there might still be scenarios where collisions
Algorithm 2 Decentralized Coordinate Calculation performed is determined based on the current state and the
Input: The number of drones nd , the pattern to be formed p. reward information from the Q Table. The previous state
Output: nd coordinates – one for each drone information and the action helps to determine the next state
1: procedure DCC(nd , p) are used to train the DSPF model.
2: Each follower is assigned with a uniqueID In the multi UAV environment, the length of the state and
3: The leader issues the takeoff command to the swarm action vectors are equal to 3 ∗ nd , where nd represents the
4: The leader broadcasts its position as posl total number of drones in the swarm system. The state vector
5: The follower’s relative position is denoted as posrel consists of each drone’s positional coordinates (x,y) and its
6: if p == square then velocity. The possible actions that can be executed on this
7: for i = 1, 2, 3, ...(nd − 1) do system are to either increase/decrease the velocity or leave it
8: Each followers initiate a parallel process unchanged. The action executed on each drone is independent
9: Calculate posrel based on Eq. (3)-(13) of each other, and hence the action space extends to 3 ∗ nd .
10: else if p == diamond then Since the state space and action space extends to large
11: for i = 1, 2, 3, ...(nd − 1) do values, a regular Q learning would be inefficient to perform
12: Each followers initiate a parallel process the reinforcement learning on the environment. Hence the
13: Calculate posrel based on Eq. (14)-(24) system demands a scalable algorithm to perform reinforcement
14: else if p == triangle then learning, which can be obtained using Deep Q Learning (or)
15: for i = 1, 2, 3, ...(nd − 1) do Dyna Q Learning methodology. This learning algorithm uses
16: Each followers initiate a parallel process the fact that the values in the Q table or Q matrix have
17: Calculate posrel based on Eq. (25)-(35) relative importance only. Hence it uses a deep neural network
18: Final position, posf = posl + posrel to approximate the Q table values preserving this relative
importance measure. This Deep Q Learning methodology is
scalable for many states and actions as the neural network’s
Previous State memory requirements are independent of the number of states
and actions of the system, making it a perfect fit for our
proposed model.
Leader's
state The reward calculation forms an essential factor in the
DSPF Model reinforcement learning setup. In the proposed SC-RL model,
a significant positive reward ξf is assigned upon a successful
Current Action

Follower's
state
pattern formation, and a significant negative reward ξf is
assigned for cases when a collision happens. During pattern
formation, if there is any change in the drones’ velocity, a
Determine small reward ξa is detected, which must be less than 10% of
System state
action the goal reward (success/failure). This reward detection helps
the system to refrain from changing the velocity of the drones
unnecessarily. Also, a small reward ξs is detected for every
Q Table iteration to speed up pattern formation.
The SC-RL mechanism is explained in detail in algorithm
Fig. 4: UAV Reinforcement Learning Model 3. Initially, a neural network is created, and the reward
mechanism is fixed, as explained. During training for every
100 episodes, the initial and the destination points are changed.
can take place. To avoid collision among drones and to speed During every episode, a set of nd random actions are predicted
up the process of automated pattern formation, the Speed for nd drones. This prediction is made in a randomized manner
Control based Reinforcement Learning (SC-RL) mechanism during the initial training phases, and later the partially trained
is proposed. neural network is used to predict actions. After predicting the
In SC-RL, each drone is considered as an agent of the actions, the set of critical drones are obtained. The critical
RL algorithm. The goal of the SC-RL model is to form the drones are those drones that are present inside the safety
required pattern in the shortest possible time without any boundary of one or more drones. Then, the action is applied
collision. This goal is achieved by varying the drones’ speed to only those critical drones and rest all the drones move at
as per necessity without changing its optimal trajectory. the same speed. The score for the action in the given state is
Fig. 4 shows an abstract flow of control in the reinforcement calculated and updated in the memory. The state of the drones
learning environment. The states of the leader and follower is also gets updated. Once the pattern formation is complete,
drones together form the state of the environment. The Q the episode score is calculated and updated in the memory.
Table consists of the reward information about every possible Also, if any pair of drones collide, the episode is deemed
state transitions in the RL environment. The action to be complete, and the episode score is evaluated.
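The action-prediction step described above (random at first, later driven by the partially trained network) is commonly realized with an epsilon-greedy rule. The following is a minimal sketch in plain Python under stated assumptions, not the authors' implementation: the `q_function` callable standing in for the neural network, the exploration rate `epsilon`, the speed step `dv`, and the action encoding (0 = decrease, 1 = keep, 2 = increase) are all hypothetical names introduced here for illustration. Restricting the actions to critical drones is not shown.

```python
import random


def build_state(positions, velocities):
    """Flatten (x, y, v) for each drone into a state vector of length 3 * n_d."""
    state = []
    for (x, y), v in zip(positions, velocities):
        state.extend([float(x), float(y), float(v)])
    return state


def select_actions(state, q_function, epsilon, rng):
    """Epsilon-greedy choice of one action index per drone.

    q_function is a hypothetical stand-in for the Q-network: it maps the
    state to one row [q_decrease, q_keep, q_increase] per drone.
    """
    actions = []
    for q_row in q_function(state):
        if rng.random() < epsilon:
            actions.append(rng.randrange(3))  # explore: random action
        else:
            actions.append(max(range(3), key=q_row.__getitem__))  # exploit
    return actions


def apply_actions(velocities, actions, dv, v_min, v_max):
    """Change each speed by -dv, 0 or +dv and clamp it to [v_min, v_max]."""
    deltas = (-dv, 0.0, dv)
    return [min(v_max, max(v_min, v + deltas[a]))
            for v, a in zip(velocities, actions)]
```

With `epsilon` close to 1 the choice is almost entirely random, which matches the early training phase; decaying `epsilon` towards 0 hands control to the learned Q-values. The decay schedule itself is left open here because the text does not specify one.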
Fig. 5: Scenarios of operation for the reinforcement learning algorithm. Each panel shows the Collision Boundary and the Safety Boundary: (a) No collision chances scenario, (b) Collision may occur shortly, (c) Collision has occurred.

Algorithm 3 Speed Control based Reinforcement Learning Model
Input: The number of drones n_d, initial coordinates init, final coordinates dest
Output: A deep Q learning model with smart and collision-free pattern switching capability
procedure SCRL(n_d, init, dest)
    Initialize the velocity range as [v1, v2], where v2 > v1
    Initialize the neural network
    Fix a reward ξ_a for increase / decrease velocity actions
    Fix a reward ξ_f for successful completion / collision
    Fix a reward ξ_s for each step in an episode
    for each episode do
        Every 100 episodes, read n_d, init, and dest
        Set episodeScore ← 0
        Set episodeDone ← False
        while episodeDone ≠ True do
            Predict the set of n_d actions
            Find all the criticalDrones
            Apply the action to only the criticalDrones
            Find the number of criticalDrones as n_da
            episodeScore ← episodeScore − (n_da ∗ ξ_a)
            if there is a collision then
                episodeScore ← episodeScore − ξ_f
                episodeDone ← True
            if there is a successful formation then
                episodeScore ← episodeScore + ξ_f
                episodeDone ← True
            episodeScore ← episodeScore − ξ_s
        Print episodeScore
    Save the neural network model
    Plot the episode vs episodeScore graph
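The scoring inside the while loop of Algorithm 3 can be sketched as follows. This is an illustrative Python sketch under assumptions rather than the authors' code: the function names, the arrival tolerance `tol` used to decide that the formation is complete, and the choice to test for a collision before testing for success are all introduced here. Critical drones are taken to be those closer than the safety radius `dist_s` to any other drone, and a collision is taken to occur at or inside the collision radius `dist_c`.

```python
import math
from itertools import combinations


def critical_drones(positions, dist_s):
    """Indices of drones that lie inside the safety boundary of another drone."""
    critical = set()
    for i, j in combinations(range(len(positions)), 2):
        if math.dist(positions[i], positions[j]) < dist_s:
            critical.update((i, j))
    return critical


def has_collision(positions, dist_c):
    """True if any pair of drones is at or inside the collision boundary."""
    return any(math.dist(p, q) <= dist_c
               for p, q in combinations(positions, 2))


def step_score(positions, targets, xi_a, xi_f, xi_s, dist_s, dist_c, tol=0.5):
    """Score change and termination flag for one iteration of the loop."""
    n_da = len(critical_drones(positions, dist_s))
    score = -n_da * xi_a          # penalty per critical drone
    done = False
    if has_collision(positions, dist_c):
        score -= xi_f             # collision: large negative reward
        done = True
    elif all(math.dist(p, t) <= tol for p, t in zip(positions, targets)):
        score += xi_f             # formation complete: large positive reward
        done = True
    score -= xi_s                 # per-step penalty to speed up formation
    return score, done
```

For example, with `xi_a = 1`, `xi_f = 100` and `xi_s = 0.5` (so that `xi_a` stays below 10% of `xi_f`), two drones within the safety radius that have both reached their targets yield a step score of 97.5 and end the episode.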

E. DSPF Collision Scenarios

In the DSPF model, two boundaries are considered for each drone, namely a safety boundary and a collision boundary, to determine the different collision scenarios. These collision scenarios are represented for any pair of drones $i$ and $j$, where $1 \le i, j < n_d$ and $i \ne j$. Here $(x_i, y_i)$ and $(x_j, y_j)$ represent the positions of the $i$th and $j$th drones, respectively. The radius of the collision boundary is represented as $\mathit{dist}_c$, whereas the radius of the safety boundary is represented as $\mathit{dist}_s$, with $\mathit{dist}_c < \mathit{dist}_s$. These scenarios are illustrated in Fig. 5.

1) Scenario 1: No Collision Possibility: There is no collision possibility between any pair of drones $i$ and $j$ iff

$$\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \ge \mathit{dist}_s$$

In such a scenario, the SC-RL model does not alter the drones' speed, and the drones proceed with the same speed.

2) Scenario 2: Possibility of Collision in the Near Future: There is a chance of a collision occurring in the near future between any pair of drones $i$ and $j$ iff

$$\mathit{dist}_c < \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} < \mathit{dist}_s$$

Such drones are called critical drones. The SC-RL model decides, independently for each of the drones $i$ and $j$, whether to increase the velocity, decrease it, or leave it unchanged.

3) Scenario 3: Occurrence of Collision: A collision is said to have occurred between any pair of drones $i$ and $j$ iff

$$\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \le \mathit{dist}_c$$

If a collision has occurred, the episode is terminated and the pattern switching is unsuccessful.

IV. PERFORMANCE ANALYSIS AND RESULTS

The proposed DSPF model has been implemented successfully in a SITL environment. All the drones have been connected to the SITL setup using the initialized port numbers in the MAVProxy.

Fig. 6: Square pattern formation
Fig. 7: Diamond pattern formation

Fig. 10: Pattern formation Time for Different Models (x-axis: Number of Drones; y-axis: Formation Time (seconds); curves: CCC, Time Delayed with DCC, SC-RL with DCC)

A. Results of Simulation

Once the drones take off to an altitude of 20 m, a servo interrupt is provided via the Ardupilot. Once the servo interrupt is received, the servo source value is determined. As shown in Fig. 6, 7 and 8, the square, diamond and triangle patterns, respectively, are formed according to the servo source value. These patterns can be formed whenever required by providing the appropriate servo interrupt via the Ardupilot.

Fig. 8: Triangle pattern formation

This SIPS mechanism is solely dependent on the servo interrupt, so the user can easily change among different formations on the fly. These shapes can be formed in any order based on the provided servo interrupt. This module addresses the changing requirements of any mission, as the patterns can be formed as per the mission requirements. The trained SC-RL model handles the entire pattern switching mechanism, and the coordinates are calculated using the proposed DCC algorithm whenever a pattern formation is initiated.

B. Reinforcement Learning Model

Fig. 9 illustrates the rewards obtained over 50 episodes for two scenarios. The first scenario represents a situation where there is no possibility of collision during the pattern switching mechanism. In such a case, the graph is observed to be constant, with the reward tending to the maximum possible value. The second scenario represents a situation where collision is possible during pattern switching. Here, the neural network initially makes decisions that keep the rewards fluctuating, but as the episodes increase, the rewards also increase, tending to the maximum obtainable value. Thus, Fig. 9 supports the correctness of the proposed DSPF model.

Fig. 9: Training rewards for presence and absence of collision (x-axis: Number of Episodes; y-axis: Reward; curves: Collisionless, With Collision)

The computational complexity of the proposed algorithm may increase slightly with the number of collisions and settles down as the number of episodes increases. The proposed method works well in a real-time environment, with a small possibility of positional error.

C. Analysis of Pattern Formation Time

Fig. 10 shows the pattern formation times with the increasing number of drones for three mechanisms, namely the traditional time-delayed mechanism with Centralized Coordinate Calculation (CCC) [12], the traditional time-delayed mechanism [25] with the proposed DCC mechanism, and the proposed DSPF model. Because the pattern formation time becomes independent of the number of drones in the proposed DCC, it brings a significant improvement over the traditional CCC algorithm. The proposed DSPF model with the SC-RL and DCC algorithms flattens the curve completely, providing excellent scalability and improving the pattern formation time by 95.68%.

TABLE III: Comparison of proposed algorithm with other algorithms

Pattern | Model Architecture                                   | Rewards | Episodes | Effectiveness
Square  | Recurrent Deep Deterministic Policy Gradient (RDDPG) | 110     | 230      | 60%
Square  | Deep Recurrent Q-Network (DRQN)                      | 73      | 123      | 72%
Square  | Proposed Algorithm                                   | 12      | 52       | 86%

Table III compares the proposed algorithm with other algorithms for the formation of the square pattern. It shows that the proposed method achieves higher effectiveness in fewer episodes than the other methods.

D. Analysis of Distance Travelled during Formation

Fig. 11 compares the total distance travelled during the pattern switching mechanism by the existing Trajectory Control Reinforcement Learning (TC-RL) algorithm [30] and the proposed SC-RL algorithm. The TC-RL algorithm deviates the trajectory of a drone whenever there is a possibility of collision. This deviation significantly increases the total distance travelled if the number of collision possibilities during a pattern switch is high. However, the SC-RL algorithm maintains the optimal trajectory irrespective of the number of possible collisions. Thus, the total distance covered in the SC-RL mechanism is always the least, reducing the distance covered by almost 66.67%.

Fig. 11: Total Distance Travelled between SC-RL and TC-RL (x-axis: Number of Drones; y-axis: Distance Travelled (in metres); curves: SC-RL, TC-RL)

V. CONCLUSION AND FUTURE WORK

The proposed SC-RL algorithm based DSPF model serves to automate the process of pattern formation among a swarm of UAVs by efficiently avoiding collisions and keeping an optimal trajectory during the pattern switching mechanism. This model also serves to switch patterns on the fly, making it a beneficial component for missions that involve a lot of runtime decision making. The DCC algorithm obtains the destination points quickly in a decentralized and parallel manner, thereby speeding up the overall mission accomplishment. The DSPF model scales up to 300 to 400 drones easily owing to its independence from the number of drones. However, the difficulty lies in extending the mechanism to a wide variety of shapes, because of the tediousness involved in framing generalized algorithms for all the available shapes. This scalability to a wide variety of shapes can be improved by embedding point cloud mechanisms that convert a 2D/3D figure into a set of points, thereby paving the way for a wide variety of shapes to be incorporated into the DSPF model. The effectiveness of the given work has been expanded by deploying a radar stealth structure, a reconfigurable structure for cooperative management, and FPGA-based computing, which reduces the circuit complexity with low power consumption. To cope with complex real-time environments, different UAV techniques such as heterogeneous swarm operation, visual-inertial odometry design, flocking features, multi-UAV coordination control and onboard diagnosis will be developed. The constraints in the communication environment have been improved by data encryption, distributed computing, and SDR design for long-range communication with frequency hopping.

VI. ACKNOWLEDGEMENT

This publication is an outcome of the R&D work undertaken in the project under the Indian Air Force Mehar BABA Challenge, Government of India, being implemented by the Centre for Aerospace Research, Anna University, MIT campus.

REFERENCES

[1] K. Harikumar, J. Senthilnath and S. Sundaram, "Multi-UAV Oxyrrhis Marina-Inspired Search and Dynamic Formation Control for Forest Firefighting," in IEEE Transactions on Automation Science and Engineering, vol. 16, no. 2, pp. 863-873, April 2019
[2] M. Aljehani and M. Inoue, "A swarm of computational clouds as multiple ground control stations of multi-UAV," IEEE 6th Global Conference on Consumer Electronics (GCCE), Nagoya, pp. 1-2, 2017
[3] A. Koubâa, A. Allouch, M. Alajlan, Y. Javed, A. Belghith and M. Khalgui, "Micro Air Vehicle Link (MAVlink) in a Nutshell: A Survey," in IEEE Access, vol. 7, pp. 87658-87680, 2019
[4] D. Liu et al., "Task-Driven Relay Assignment in Distributed UAV Communication Networks," IEEE Transactions on Vehicular Technology, vol. 68, no. 11, pp. 11003-11017, Nov. 2019
[5] J. Yao and N. Ansari, "QoS-Aware Power Control in Internet of Drones for Data Collection Service," IEEE Transactions on Vehicular Technology, vol. 68, no. 7, pp. 6649-6656, July 2019
[6] M. Cui, G. Zhang, Q. Wu and D. W. K. Ng, "Robust Trajectory and Transmit Power Design for Secure UAV Communications," IEEE Transactions on Vehicular Technology, vol. 67, no. 9, pp. 9042-9046, Sept. 2018
[7] J. Senthilnath, S. N. Omkar, V. Mani and A. R. Katti, "Cooperative communication of UAV to perform multi-task using nature inspired techniques," IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), Singapore, pp. 45-50, 2013
[8] Peng Jian-liang, Sun Xiu-xia, Zhu Fan and Li Xiang-qing, "Multi UAVs Cooperative Task Assignment Using Multi Agent," Chinese Control and Decision Conference, Yantai, Shandong, pp. 4517-4520, 2008
[9] C. Xia and A. Yudi, "Multi-UAV path planning based on improved neural network," Chinese Control and Decision Conference (CCDC), Shenyang, pp. 354-359, 2018
[10] B. H. Lee, J. R. Morrison and R. Sharma, "Multi-UAV control testbed for persistent UAV presence: ROS GPS waypoint tracking package and centralized task allocation capability," International Conference on Unmanned Aircraft Systems (ICUAS), Miami, FL, USA, pp. 1742-1750, 2017
[11] F. Ho, R. Geraldes, A. Gonçalves, M. Cavazza and H. Prendinger, "Improved Conflict Detection and Resolution for Service UAVs in Shared Airspace," IEEE Transactions on Vehicular Technology, vol. 68, no. 2, pp. 1231-1242, Feb. 2019
[12] K. Z. Y. Ang, X. X. Dong, W. Q. Liu, et al., "High-precision multi-UAV teaming for the first outdoor night show in Singapore," in Unmanned Systems, vol. 6, no. 1, pp. 39-65, 2018
[13] W. Yuan, Q. Chen, Z. Hou and Y. Li, "Multi-UAVs formation flight control based on leader-follower pattern," 36th Chinese Control Conference (CCC), Dalian, pp. 1276-1281, 2017
[14] P. Kumar, S. Garg, A. Singh, S. Batra, N. Kumar and I. You, "MVO-Based 2-D Path Planning Scheme for Providing Quality of Service in UAV Environment," IEEE Internet of Things Journal, vol. 5, no. 3, pp. 1698-1707, June 2018
[15] B. Wang, J. Wang, B. Zhang, W. Chen and Z. Zhang, "Leader-Follower Consensus of Multivehicle Wirelessly Networked Uncertain Systems Subject to Nonlinear Dynamics and Actuator Fault," in IEEE Transactions on Automation Science and Engineering, vol. 15, no. 2, pp. 492-505, April 2018
[16] H. Park, I. Choi, S. Park and J. Choi, "Leader-follower formation control using infrared camera with reflective tag," 10th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), Jeju, pp. 321-324, 2013
[17] A. Kumari, S. Tanwar, S. Tyagi, N. Kumar, M. Maasberg, K. K. R. Choo, "Multimedia big data computing and Internet of Things applications: A taxonomy and process model," Journal of Network and Computer Applications, vol. 124, pp. 169-195, 2018
[18] R. Gunasekaran, V. R. Uthariaraj, U. Yamini et al., "A Distributed Mechanism for Handling of Adaptive/Intelligent Selfish Misbehaviour at MAC Layer in Mobile Ad Hoc Networks," Journal of Computer Science and Technology, vol. 24, pp. 472-481, 2009
[19] T. Sugimoto and M. Gouko, "Acquisition of Hovering by Actual UAV Using Reinforcement Learning," 3rd International Conference on Information Science and Control Engineering (ICISCE), Beijing, pp. 148-152, 2016
[20] F. Bin, F. XiaoFeng and X. Shuo, "Research on Cooperative Collision Avoidance Problem of Multiple UAV Based on Reinforcement Learning," 10th International Conference on Intelligent Computation Technology and Automation (ICICTA), Changsha, pp. 103-109, 2017
[21] N. Kumar, R. Iqbal, S. Misra, J. J. P. C. Rodrigues, "Bayesian coalition game for contention-aware reliable data forwarding in vehicular mobile cloud," Future Generation Computer Systems, vol. 48, pp. 60-72, 2014
[22] V. M. Polyakov, I. N. Kaliteevsky, K. S. Amelin, V. A. Smyslov and M. A. Permyakov, "Complexed NIR laser detector and LWIR camera optical system with neural network management for UAV collision avoidance system," International Conference Laser Optics (ICLO), St. Petersburg, pp. 280-280, 2018
[23] N. Kumar, S. Misra, J. J. P. C. Rodrigues and M. S. Obaidat, "Coalition Games for Spatio-Temporal Big Data in Internet of Vehicles Environment: A Comparative Analysis," in IEEE Internet of Things Journal, vol. 2, no. 4, pp. 310-320, Aug. 2015
[24] B. Taha and A. Shoufan, "Machine Learning-Based Drone Detection and Classification: State-of-the-Art in Research," in IEEE Access, vol. 7, pp. 138669-138682, 2019
[25] D. Kwon and J. Kim, "Optimal Trajectory Learning for UAV-BS Video Provisioning System: A Deep Reinforcement Learning Approach," International Conference on Information Networking (ICOIN), Kuala Lumpur, Malaysia, pp. 372-374, 2019
[26] Wenjie Song, Yi Yang, Mengyin Fu, Fan Qiu, and Meiling Wang, "Real-Time Obstacles Detection and Status Classification for Collision Warning in a Vehicle Active Safety System," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 3, March 2018
[27] Sushil Pratap Bharati, Yuanwei Wu, Yao Sui, Curtis Padgett, and Guanghui Wang, "Real-Time Obstacle Detection and Tracking for Sense-and-Avoid Mechanism in UAVs," IEEE Transactions on Intelligent Vehicles, vol. 3, no. 2, June 2018
[28] N. Kumar, N. Chilamkurti, J. H. Park, "ALCA: Agent Learning-based Clustering Algorithm in Vehicular Ad Hoc Networks," Personal and Ubiquitous Computing, vol. 17, pp. 1683-1692, 2013
[29] X. Liu, Y. Liu and Y. Chen, "Reinforcement Learning in Multiple-UAV Networks: Deployment and Movement Design," IEEE Transactions on Vehicular Technology, vol. 68, no. 8, pp. 8036-8049, Aug. 2019
[30] G. Raja, S. Anbalagan, V. S. Narayanan, S. Jayaram and A. Ganapathisubramaniyan, "Inter-UAV Collision Avoidance using Deep-Q-Learning in Flocking Environment," IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York City, NY, USA, pp. 1089-1095, 2019
[31] G. Raja, A. Ganapathisubramaniyan, S. Anbalagan, S. B. M. Baskaran, K. Raja and A. K. Bashir, "Intelligent Reward based Data Offloading in Next Generation Vehicular Networks," IEEE Internet of Things Journal, 2020
[32] N. Kumar, J. Lee and J. J. P. C. Rodrigues, "Intelligent Mobile Video Surveillance System as a Bayesian Coalition Game in Vehicular Sensor Networks: Learning Automata Approach," IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 3, pp. 1148-1161, June 2015
[33] D. Floreano, R. Wood, "Science, technology and the future of small autonomous drones," Nature, vol. 521, pp. 460-466, 2015
[34] S. Garg, A. Singh, S. Batra, N. Kumar and L. T. Yang, "UAV-Empowered Edge Computing Environment for Cyber-Threat Detection in Smart Vehicles," in IEEE Network, vol. 32, no. 3, pp. 42-51, May/June 2018
[35] M. Kostadinović, M. Stojćev, Z. Bundalo and D. Bundalo, "Simulation model of DC servo motor control," 14th International Power Electronics and Motion Control Conference EPE-PEMC, Ohrid, pp. T7-10-T7-14, 2010
[36] J. Wang, Y. Tang, J. Kavalen, A. F. Abdelzaher and S. P. Pandit, "Autonomous UAV Swarm: Behavior Generation and Simulation," International Conference on Unmanned Aircraft Systems (ICUAS), Dallas, TX, pp. 1-8, 2018
[37] C. Lin, D. He, N. Kumar, K. R. Choo, A. Vinel and X. Huang, "Security and Privacy for the Internet of Drones: Challenges and Solutions," in IEEE Communications Magazine, vol. 56, no. 1, pp. 64-69, Jan. 2018
[38] Iulisloi Zacarias, Carlos E. T. Leite, Janana Schwarzrock, Edison P. de Freitas, "Control Platform for Multiple Unmanned Aerial Vehicles," IFAC-PapersOnLine, vol. 49, no. 30, pp. 36-41, 2016, ISSN 2405-8963, https://doi.org/10.1016/j.ifacol.2016.11.119
[39] Zhengru Fang, Jingjing Wang, Yong Ren et al., "Age of Information in Energy Harvesting Aided Massive Multiple Access Networks," IEEE Journal on Selected Areas in Communications, vol. 40, no. 5, May 2022
[40] Wei Wei, Jingjing Wang, Zhengru Fang et al., "3U: Joint Design of UAV-USV-UUV Networks for Cooperative Target Hunting," IEEE Transactions on Vehicular Technology, vol. 72, no. 3, 2023, https://doi.org/10.1109/TVT.2022.3220856
[41] Lin Bai, Zhuangfei Wu and Lin Zhou, "Achievable Refined Asymptotics for Successive Refinement Using Gaussian Codebooks," IEEE Transactions on Information Theory, vol. 69, no. 6, June 2023, https://doi.org/10.1109/TIT.2023.3244232
[42] Yibo Zhang, Jingjing Wang, Lanjie Zhang et al., "Reliable Transmission for NOMA Systems With Randomly Deployed Receivers," IEEE Transactions on Communications, vol. 71, no. 2, February 2023
[43] Jiaxing Wang, Lin Bai, Jianrui Chen et al., "Starling Flocks-Inspired Resource Allocation for ISAC-Aided Green Ad Hoc Networks," IEEE Transactions on Green Communications and Networking, vol. 7, no. 1, March 2023
[44] Nathan Patrizi, Georgios Fragkos, Kendric Ortiz et al., "A UAV-enabled Dynamic Multi-Target Tracking and Sensing Framework," GLOBECOM 2020 - 2020 IEEE Global Communications Conference, IEEE, 2020

No ratings yet
Notes On Vellore District
40 pages
Notes On Kaniyakumari District
No ratings yet
Notes On Kaniyakumari District
27 pages
Water Quality Using Ann
No ratings yet
Water Quality Using Ann
17 pages
BBS Part A Questions
No ratings yet
BBS Part A Questions
1 page
Sixth Generation Innovation Model Description of A Success Model
No ratings yet
Sixth Generation Innovation Model Description of A Success Model
26 pages
Unit IV Accidents - IH5023 Behaviour Based Safety
No ratings yet
Unit IV Accidents - IH5023 Behaviour Based Safety
14 pages
Analysis of Water Quality - A Review
No ratings yet
Analysis of Water Quality - A Review
8 pages
Write A Detailed Note On The Following BBS &WCA 1923
No ratings yet
Write A Detailed Note On The Following BBS &WCA 1923
14 pages
Systematic Thinking and Accident Analysis Models
No ratings yet
Systematic Thinking and Accident Analysis Models
28 pages
Desalination Solar Powered Thesis
No ratings yet
Desalination Solar Powered Thesis
209 pages
Kalpana Chawla
No ratings yet
Kalpana Chawla
6 pages
MN It Conference
No ratings yet
MN It Conference
5 pages
Unit 1 Solid Mechanics
67% (3)
Unit 1 Solid Mechanics
21 pages
Geology of Madurai District
100% (2)
Geology of Madurai District
45 pages
Icdtms 2021
No ratings yet
Icdtms 2021
8 pages
Eg Course File
No ratings yet
Eg Course File
130 pages
The Pedagogical Model: Eva Lua
100% (1)
The Pedagogical Model: Eva Lua
28 pages
Assessment of Water Quality of Ganga River in Kanpur Byusing Principal Components Analysis
No ratings yet
Assessment of Water Quality of Ganga River in Kanpur Byusing Principal Components Analysis
8 pages
Instructions To Candidates
No ratings yet
Instructions To Candidates
4 pages
AEDP 02v1
100% (1)
AEDP 02v1
33 pages
FTI - Tech Trends Report 2017
No ratings yet
FTI - Tech Trends Report 2017
155 pages
FPGA Based Flexible Autopilot Platform For Unmanned Systems: W. Alvis, S. Murthy, K. Valavanis, W. Moreno, S. Katkoori
No ratings yet
FPGA Based Flexible Autopilot Platform For Unmanned Systems: W. Alvis, S. Murthy, K. Valavanis, W. Moreno, S. Katkoori
9 pages
Yogesh 369
No ratings yet
Yogesh 369
69 pages
Design and Stress Analysis of LSU 05 Twin Boom Using Finite Element Method
No ratings yet
Design and Stress Analysis of LSU 05 Twin Boom Using Finite Element Method
11 pages
Perdix Fact Sheet
No ratings yet
Perdix Fact Sheet
1 page
Drone Technology Seminar
No ratings yet
Drone Technology Seminar
24 pages
MEng UAS Challenge - Quad-Rotor CDR PDF
No ratings yet
MEng UAS Challenge - Quad-Rotor CDR PDF
351 pages
Metroplex RPG
No ratings yet
Metroplex RPG
11 pages
TJ150 PDF
No ratings yet
TJ150 PDF
2 pages
The Military Legacy of Alexander the Great: Lessons for the Information Age 1st Edition Ferguson All Chapters Instant Download
100% (1)
The Military Legacy of Alexander the Great: Lessons for the Information Age 1st Edition Ferguson All Chapters Instant Download
55 pages
IELTS Reading Matching Sentence Endings: Answers
No ratings yet
IELTS Reading Matching Sentence Endings: Answers
9 pages
Journal of Air Transport Management: Rico Merkert, James Bushell
No ratings yet
Journal of Air Transport Management: Rico Merkert, James Bushell
10 pages
Ai in Surveying and Geomatics
No ratings yet
Ai in Surveying and Geomatics
13 pages
Routing and Scheduling Optimization For UAV Assisted Delivery System: A Hybrid Approach
No ratings yet
Routing and Scheduling Optimization For UAV Assisted Delivery System: A Hybrid Approach
21 pages
Urban Air Mobility: A Paradigm Shift in Transport Systems in Metropolitan Areas
100% (1)
Urban Air Mobility: A Paradigm Shift in Transport Systems in Metropolitan Areas
1 page
Veronte Autopilot Kit Datasheet
No ratings yet
Veronte Autopilot Kit Datasheet
2 pages
Drones in Agriculture
100% (1)
Drones in Agriculture
28 pages
DJI Remote Identification Whitepaper 3-22-17
No ratings yet
DJI Remote Identification Whitepaper 3-22-17
10 pages
Idea Forgery IPO Scam - SEBI
No ratings yet
Idea Forgery IPO Scam - SEBI
1,321 pages
Final RKN KTG
No ratings yet
Final RKN KTG
31 pages
2021 Design Aerodynamic Analysis and Test Flight of A Bat-Inspired Tailless Flapping Wing Unmanned Aerial Vehicle
No ratings yet
2021 Design Aerodynamic Analysis and Test Flight of A Bat-Inspired Tailless Flapping Wing Unmanned Aerial Vehicle
10 pages
WESCAM's MX-15D. Fully Digital. High Definition
100% (2)
WESCAM's MX-15D. Fully Digital. High Definition
2 pages
Nps Future Ucavs
No ratings yet
Nps Future Ucavs
195 pages
Drone Technology
No ratings yet
Drone Technology
18 pages
Electrical Vertical Takeoff and Landing
No ratings yet
Electrical Vertical Takeoff and Landing
15 pages
Project Charter Wilmonts Pharmacy Drone
No ratings yet
Project Charter Wilmonts Pharmacy Drone
2 pages
Guidelines On Industry 4.0 and Drone Entrepreneurship For VET Students
No ratings yet
Guidelines On Industry 4.0 and Drone Entrepreneurship For VET Students
45 pages
Proceedings 2012
No ratings yet
Proceedings 2012
97 pages



