Reinforcement Learning-Based Dynamic Anti-Jamming Power Control in UAV Networks: An Effective Jamming Signal Strength Based Approach
Abstract— Unmanned aerial vehicle (UAV) assisted air-to-ground (A2G) communication is vulnerable to malicious jamming due to the broadcast nature of wireless communications. In this letter, an anti-jamming power control framework with an unknown jamming model and unknown jamming power is proposed. In particular, the probability density function (PDF) of the effective jamming signal strength (EJSS) is first estimated via kernel density estimation (KDE). Then, utilizing the EJSS, a deep deterministic policy gradient (DDPG) based framework is proposed to acquire the power control strategy in real time. Moreover, a trajectory design scheme based on K-means++ is proposed to track the locations of the users. The simulation results show that the proposed framework yields an improved sum rate and energy efficiency over the reference schemes.

Index Terms— UAV, anti-jamming power control, unknown jamming model, deep deterministic policy gradient, kernel density estimation.

Manuscript received 30 June 2022; accepted 16 July 2022. Date of publication 21 July 2022; date of current version 10 October 2022. This work was supported in part by the National Natural Science Foundation of China under Grant 62071485, Grant 61901519, and Grant 62001513, and in part by the Basic Research Project of Jiangsu Province under Grant BK20192002 and the Natural Science Foundation of Jiangsu Province under Grant BK20201334 and Grant BK20200579. The associate editor coordinating the review of this letter and approving it for publication was F. Kara. (Corresponding author: Kui Xu.) The authors are with the College of Communications Engineering, Army Engineering University of PLA, Nanjing 210007, China (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/LCOMM.2022.3193309

I. INTRODUCTION

AS SUPPLEMENTS to cellular mobile communication systems, unmanned aerial vehicles (UAVs) can greatly improve coverage performance due to their deployment flexibility [1]. However, the links between UAV base stations (BSs) and ground users are vulnerable to malicious jamming due to the broadcast nature of wireless transmission. This makes the anti-jamming problem a critical issue in UAV-assisted air-to-ground (A2G) communication systems.

Anti-jamming power control has often been investigated through game theory due to its noncooperative nature [2]–[6]. In [7], the authors proposed a beam-domain anti-jamming transmission scheme for the downlink massive multi-input multi-output (MIMO) system, in which the hierarchical interactions between the jammer and the BS were formulated as a Bayesian Stackelberg game. In [8], a noncooperative power control game was formulated and solved to determine the users' power investment in a dual communication environment. However, the drawbacks of these schemes are twofold: 1) they require the BS to perfectly know the jamming model between the smart jammer and the users [9], and this model is usually difficult to obtain; and 2) they are executed in an iterative manner that lowers the efficiency of making an optimal decision, especially in a time-varying environment.

Reinforcement learning (RL) is a promising approach for solving such a problem when the jamming model is not available in advance. In [10], a cooperative anti-jamming resource allocation algorithm was proposed, where the BS selected an appropriate power and sub-band based on deep RL to realize energy-efficient anti-jamming transmission. In [11], an RL-based algorithm was proposed to improve the quality of service of low-latency Visual Internet of Things (VIoT) video streaming in a jamming environment. In [9], the authors proposed a Q-learning based algorithm to optimize the utility of the BS in a millimeter-wave massive MIMO system with a smart jammer. However, most of the previous works assume discrete power levels and perfect estimation of the jamming power by the BS, which is not always available in practice.

Motivated by the above research, in this letter we model the interaction between a UAV and a smart jammer as a Markov decision process (MDP) and then use a deep RL-based algorithm to contend with such a problem. Considering the mobility of the users and the kinematic constraints of fixed-wing UAVs, we propose a location-based trajectory design scheme to track the users. Under the assumption that the system has no prior knowledge of the jamming model and jamming power, we use the jamming signal strength (JSS) as an indicator of the jamming strategy, and a kernel density estimation (KDE) based effective JSS (EJSS) estimation method is proposed. Moreover, unlike the above RL-based works, we address continuous power control with the deep deterministic policy gradient (DDPG). Simulation results show the superiority of the proposed framework compared with Q-learning and the benchmark algorithm.

II. SYSTEM MODEL

The UAV-assisted A2G communication system with a smart jammer is illustrated in Fig. 1. It consists of one fixed-wing UAV with M antennas and K single-antenna users located within a circle of radius r_d. The UAV serves as a BS to provide internet access for the users. The smart jammer is equipped with M antennas and wishes to jam the users with suitable power.

A. Channel Model

We consider a UAV that flies along a circular trajectory at a fixed altitude H_U. The time duration for the UAV to fly through one rotation is T, which is equally divided into N epochs. In the nth epoch, the locations of the UAV and of user k are denoted by (x_U(n), y_U(n), H_U) and (x_k(n), y_k(n), 0) in a three-dimensional (3D) Cartesian coordinate system, respectively. The channel vector between the UAV and user k is expressed as [12]

$h_{U,k}(n) = \beta_{U,k}(n)\, g_{U,k}(n), \quad n \in \{1, \ldots, N\},$   (1)
For the smart jammer, the agent is designed as follows:

1) State: the state of the jammer in epoch n is expressed as s_jam(n) = [P_U(n-1)], where P_U(n-1) is the transmission power of the UAV estimated by the jammer.

2) Action: the action of the jammer in epoch n is the total jamming power P_J(n).

3) Reward: the reward function of the jammer in epoch n is expressed as:

$R_{\mathrm{jam}}(n) = -\sum_{k=1}^{K} \log_2\bigl(1+\gamma_k(n)\bigr) - c_J P_J(n),$   (12)

where c_J denotes the transmission cost of the jammer.

III. ANTI-JAMMING POWER CONTROL FRAMEWORK WITHOUT THE JAMMING MODEL AND POWER

In this work, we model the interactions between the UAV and the jammer as an MDP, and, considering the mobility of the users and continuous power optimization, a KDE-based DDPG framework is proposed to realize anti-jamming power control at the UAV.

A. Trajectory Design Based on K-Means++

Considering the flight payload and the on-board battery, a fixed-wing UAV is adopted. Thus, the UAV must maintain forward flight to remain aloft [16], and the majority of studies on trajectory optimization [17] are not applicable to this scenario.

Instead, we use the location information of the users to determine a circular trajectory, which makes the UAV capable of adapting its trajectory to track the users and provide better service. First, we determine the cluster trajectory center [cc_x, cc_y] and radius r_{c,uav}(n+1) with K-means++ [18]. Then, we make the actual trajectory track the change of the cluster trajectory slowly, i.e., the next center and radius are updated by [c_x(n+1), c_y(n+1)] = 0.5[c_x(n), c_y(n)] + 0.5[cc_x, cc_y] and r_uav(n+1) = r_uav(n) + δ_r, where δ_r = r_{c,uav}(n+1) - r_{c,uav}(n) denotes the change of the cluster radius, and [c_x(n), c_y(n)], r_uav(n), and r_{c,uav}(n) are the current center, radius, and cluster trajectory radius, respectively.
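The following Python sketch illustrates one possible implementation of this trajectory update. It is only a sketch under stated assumptions: the letter does not specify how many clusters K-means++ uses or how the cluster radius r_{c,uav} is obtained, so a single cluster and the mean user-to-centroid distance are assumed here, and the function and variable names are ours.

```python
# Hypothetical sketch of the trajectory update in Section III-A.
# Assumptions: a single K-means++ cluster; the cluster radius is taken as
# the mean user distance from the centroid.
import numpy as np
from sklearn.cluster import KMeans

def cluster_center_and_radius(user_xy: np.ndarray):
    """user_xy: (K, 2) array of user positions in the current epoch."""
    km = KMeans(n_clusters=1, init="k-means++", n_init=10).fit(user_xy)
    center = km.cluster_centers_[0]                       # [cc_x, cc_y]
    radius = np.mean(np.linalg.norm(user_xy - center, axis=1))
    return center, radius

def update_trajectory(center_n, r_uav_n, rc_uav_n, user_xy):
    """One update of the UAV's circular trajectory (applied every N epochs)."""
    cc, rc_next = cluster_center_and_radius(user_xy)
    center_next = 0.5 * np.asarray(center_n) + 0.5 * cc   # slow tracking of the cluster center
    r_uav_next = r_uav_n + (rc_next - rc_uav_n)           # r_uav(n+1) = r_uav(n) + delta_r
    return center_next, r_uav_next, rc_next
```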
B. The KDE-Based EJSS Estimation

We define the term $\frac{P_J}{K}\bigl[(\mathbf{H}_J(n)\mathbf{H}_J(n)^{H})^{-1}\bigr]_{k,k}$ in the denominator of (11) as the JSS at user k, denoted by P̄_{J,k}. We consider the JSS as an indicator of the jammer's current strategy, and it is used to construct the state space of the UAV in the next subsection. In practice, the JSS can be estimated from the SINR fed back by the users. However, due to channel fluctuations, the JSS may have a large dynamic range, which results in a massive state space for the UAV. In this subsection, we propose an EJSS estimation algorithm based on KDE to address this problem.

In particular, we define the total EJSS over all K users as P̂_J, which can be estimated from the JSS P̄_J = Σ_{k=1}^{K} P̄_{J,k}. To compress the dynamic range and reduce the variance of the JSS, a logarithmic operation is applied, i.e., P̃_J(i) = 10 log_10 P̄_J(i), 1 ≤ i ≤ n. Moreover, we obtain the probability density function (PDF) of the compressed JSS P̃_J by applying KDE to the historical values of the JSS.

KDE is a nonparametric method for estimating probability densities [19], [20]. If x_1, x_2, ..., x_n is a set of independent sample points drawn from an unknown density f_h(x), the estimate of f_h(x) can be obtained as follows:

$\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} O\!\left(\frac{x - x_i}{h}\right),$   (13)

where h denotes the bandwidth parameter and O(·) denotes the kernel function, which can be selected as a Gaussian kernel or a rectangular kernel [21].

We use KDE to estimate the PDF of the compressed JSS samples, i.e., P̃_J(1), P̃_J(2), ..., P̃_J(n). After the PDF is estimated, the dominant interval [P̃_{J,min}, P̃_{J,max}] of the compressed JSS is defined as the minimum interval that satisfies the condition

$\int_{\tilde{P}_{J,\min}}^{\tilde{P}_{J,\max}} \hat{f}_h(x)\,dx \;\ge\; \eta \int_{-\infty}^{+\infty} \hat{f}_h(x)\,dx,$   (14)

where the parameter η is a positive value close to 1. Then, the UAV can compress the JSS into the dominant interval, and outliers are eliminated.

The KDE-based estimation of the EJSS in epoch n+1 is summarized in Algorithm 1.

Algorithm 1 The KDE-Based Estimation of the EJSS
Input: the JSS quantization level L_num, the kernel function O(·), the ratio η, and the JSS P̄_J(n+1) observed in epoch n+1.
1: Perform logarithmic compression on the historical JSS data, i.e., P̃_J(i) = 10 log_10 P̄_J(i), 1 ≤ i ≤ n.
2: Estimate the PDF f̂_h of the compressed JSS from P̃_J(i), 1 ≤ i ≤ n, by KDE.
3: Determine the dominant interval [P̃_{J,min}, P̃_{J,max}] of the data according to (14) and the pre-defined ratio η.
4: Quantize the dominant interval into L_num levels, with quantization step ξ = (P̃_{J,max} - P̃_{J,min})/(L_num - 1).
5: The quantized value P̂_J(n+1) = (P̃_J(n+1) - P̃_{J,min})/ξ is defined as the EJSS.
Output: the EJSS P̂_J(n+1).
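A minimal Python sketch of Algorithm 1 is given below. It assumes a Gaussian kernel (via scipy.stats.gaussian_kde) and approximates the minimum dominant interval of (14) by trimming (1 - η)/2 of the estimated probability mass from each tail; both choices, and all identifiers, are ours rather than the letter's.

```python
# Hypothetical sketch of Algorithm 1 (KDE-based EJSS estimation).
# Assumption: the dominant interval of (14) is approximated by trimming
# (1 - eta)/2 of the estimated probability mass from each tail.
import numpy as np
from scipy.stats import gaussian_kde

def estimate_ejss(jss_history, jss_new, L_num=16, eta=0.95, n_grid=2048):
    """jss_history: P_bar_J(1..n); jss_new: P_bar_J(n+1). Returns the EJSS level."""
    p_tilde = 10.0 * np.log10(np.asarray(jss_history))        # step 1: log compression
    kde = gaussian_kde(p_tilde)                                # step 2: PDF estimate (Gaussian kernel)

    grid = np.linspace(p_tilde.min(), p_tilde.max(), n_grid)
    cdf = np.cumsum(kde(grid))
    cdf /= cdf[-1]                                             # normalized CDF on the grid
    tail = (1.0 - eta) / 2.0
    p_min = grid[np.searchsorted(cdf, tail)]                   # step 3: dominant interval
    p_max = grid[np.searchsorted(cdf, 1.0 - tail)]

    xi = (p_max - p_min) / (L_num - 1)                         # step 4: quantization step
    p_new = np.clip(10.0 * np.log10(jss_new), p_min, p_max)    # clip outliers to the interval
    return int(round((p_new - p_min) / xi))                    # step 5: quantized EJSS
```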
C. The DDPG Based Power Control Framework

In the proposed framework, the design of the UAV agent is formulated as follows:

1) State: the state of the UAV in epoch n is expressed as s_uav(n) = [r(n-1), P̂_J(n-1)], where r(n-1) = Σ_{k=1}^{K} log_2(1 + γ_k(n-1)) is the sum rate of the users and P̂_J(n-1) is the EJSS estimated by the UAV.

2) Action: the action of the UAV in epoch n is the total transmission power P_U(n).

3) Reward: the reward function of the UAV in epoch n is expressed as:

$R_{\mathrm{uav}}(n) = \sum_{k=1}^{K} \log_2\bigl(1+\gamma_k(n)\bigr) - c_U P_U(n),$   (15)

where c_U denotes the transmission cost of the UAV.
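As a small illustration, the reward in (15) can be computed from the users' SINRs as follows. This is a sketch: the function name is ours, and the default cost c_U = 15 is taken from the simulation setup in Section IV.

```python
# Minimal sketch of the UAV reward in (15); sinr_users holds gamma_k for the
# current epoch and c_u weights the transmission cost.
import numpy as np

def uav_reward(sinr_users, power_uav, c_u=15.0):
    sum_rate = np.sum(np.log2(1.0 + np.asarray(sinr_users)))  # sum_k log2(1 + gamma_k)
    return sum_rate - c_u * power_uav
```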
We consider continuous power optimization; thus, value-based approaches are not applicable because they can only generate discrete actions.

DDPG is an actor-critic (AC) algorithm. It concurrently learns a policy network approximation μ(s|θ^μ), i.e., the actor network, and a Q-function network approximation Q(s, a|θ^Q), called the critic. Since the output of the policy network in DDPG directly maps states to actions, instead of computing a probability distribution across a discrete action space, it is more suitable for a continuous action space.

Because the target policy in DDPG is deterministic, the Q-function is trained using the Bellman equation

$Q^{\mu}(s, a) = \mathbb{E}_{(s,a,r,s') \in \mathcal{D}}\left[ r(s, a) + \gamma Q^{\mu}\bigl(s', \mu(s'|\theta^{\mu})\bigr) \right],$   (16)

where s' is the next state of the UAV and γ denotes the discount factor. D is the experience replay buffer: in each epoch, the tuple (s, a, r, s') is stored in the replay buffer. At each timestep, the actor and critic are updated by sampling a minibatch uniformly from the buffer, in which the samples are assumed to be independently and identically distributed.

Moreover, because the updated network Q^μ(s, a) is also used in calculating the target value (16), it is difficult for the Q update to achieve convergence. To overcome this drawback, we create a copy of the actor and critic networks, namely the target actor network μ'(s|θ^{μ'}) and the target critic network Q'(s, a|θ^{Q'}), that are used for calculating the target values. Thus, for each transition (s(n), a(n), r(n), s(n+1)) in the minibatch, the target value y(n) is calculated as

$y(n) = r(n) + \gamma\, Q'\!\left(s(n+1), \mu'(s(n+1)|\theta^{\mu'}) \,\middle|\, \theta^{Q'}\right).$   (17)

The critic network is updated by minimizing the loss L(θ):

$L(\theta) = \frac{1}{N_r}\sum_{i=1}^{N_r}\left( Q(s(i), a(i)|\theta^{Q}) - y(i) \right)^2,$   (18)

where N_r is the size of a minibatch. Then, the actor network update using the sampled gradient is expressed as:

$\nabla_{\theta^{\mu}} \mu \big|_{s(n)} \approx \frac{1}{N_r}\sum_{i=1}^{N_r} \nabla_{a} Q(s, a|\theta^{Q})\big|_{s=s(i),\, a=\mu(s(i))}\; \nabla_{\theta^{\mu}} \mu(s|\theta^{\mu})\big|_{s(i)}.$   (19)

In the deep Q-network (DQN) algorithm, the weights of the target network are regularly copied from the current network, which decreases the stability of training. In the DDPG algorithm, a "soft update" is used to address this issue, which means that the weights of the target actor and critic networks are updated by having them slowly track the learned networks:

$\theta^{Q'} \leftarrow \varsigma\,\theta^{Q} + (1-\varsigma)\,\theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \varsigma\,\theta^{\mu} + (1-\varsigma)\,\theta^{\mu'},$   (20)

where ς ∈ (0, 1) denotes the soft coefficient that determines the update speed of the target networks.

To explore the continuous action space effectively, we construct an exploration policy μ_e by adding noise sampled from a noise process N to the actor policy:

$\mu_e(s) = \mu(s|\theta^{\mu}) + \mathcal{N},$   (21)

where N can be chosen according to the environment.

Based on the above, the DDPG based anti-jamming power control framework is presented in Algorithm 2.

Algorithm 2 DDPG Based Anti-Jamming Power Control Framework
Initialization: Randomly initialize the actor network μ(s|θ^μ) and the critic network Q(s, a|θ^Q) with weights θ^μ and θ^Q. Initialize the target networks μ' and Q' with weights θ^{μ'} ← θ^μ and θ^{Q'} ← θ^Q. Initialize the replay buffer D. Initialize a random process N for action exploration and randomly generate the initial state s_uav(1).
1: Generate the initial trajectory with center [0, 0] and a pre-defined radius.
2: for n = 1 : N_epoch do
3:   Select action a(n) = μ(s|θ^μ) + N.
4:   Execute action a(n) and observe the reward R_uav(n).
5:   Estimate the EJSS P̂_J(n) following Algorithm 1 and construct the next state s_uav(n+1).
6:   Store the transition (s_uav(n), a(n), R_uav(n), s_uav(n+1)) in the replay buffer D.
7:   Sample a minibatch of N_r transitions randomly from the buffer D.
8:   Calculate the target value y(n) following (17).
9:   Update the weights of the critic network θ^Q by minimizing the loss in (18).
10:  Update the weights of the actor network θ^μ with the sampled gradient in (19).
11:  Update the target critic network θ^{Q'} and the target actor network θ^{μ'} following (20).
12:  if n = kN, k ∈ {1, ..., N_epoch/N} then
13:    Update the trajectory according to Part A, Section III.
14:  end if
15: end for
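The update rules (17)–(21) can be summarized in the following PyTorch sketch. The hidden-layer sizes, the Adam optimizers, the Gaussian exploration noise, and the soft-update coefficient ς = 0.005 are assumptions; only the state and action dimensions, the discount factor, and the learning rates are taken from the letter.

```python
# Hypothetical PyTorch sketch of one DDPG update step, following (17)-(21).
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, out_dim))
    def forward(self, x):
        return self.net(x)

state_dim, action_dim, gamma, varsigma = 2, 1, 0.95, 0.005      # varsigma: soft-update coefficient (assumed)
actor, critic = MLP(state_dim, action_dim), MLP(state_dim + action_dim, 1)
actor_t, critic_t = MLP(state_dim, action_dim), MLP(state_dim + action_dim, 1)
actor_t.load_state_dict(actor.state_dict()); critic_t.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=5e-4)           # actor learning rate from Section IV
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)          # critic learning rate from Section IV

def select_action(s, p_max=1.5, noise_std=0.1):
    """Exploration policy (21): deterministic actor output plus noise, clipped to [0, P_max]."""
    with torch.no_grad():
        a = actor(s) + noise_std * torch.randn(action_dim)
    return a.clamp(0.0, p_max)

def update(batch):
    s, a, r, s2 = batch                                          # tensors of shape (N_r, dim); r is (N_r, 1)
    with torch.no_grad():                                        # target value, Eq. (17)
        y = r + gamma * critic_t(torch.cat([s2, actor_t(s2)], dim=1))
    critic_loss = ((critic(torch.cat([s, a], dim=1)) - y) ** 2).mean()   # Eq. (18)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()         # ascends the sampled gradient in (19)
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    for tgt, src in ((actor_t, actor), (critic_t, critic)):              # soft update, Eq. (20)
        for pt, p in zip(tgt.parameters(), src.parameters()):
            pt.data.mul_(1.0 - varsigma).add_(varsigma * p.data)
```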
IV. SIMULATION RESULTS

In this section, the performance of the proposed framework is presented and compared with Q-learning and the benchmark algorithm (the benchmark algorithm was proposed in [9] and is simulated without EJSS estimation and trajectory optimization). The UAV flies over a circular mission area with a radius of r_d = 500 m, either along the trajectory determined by the scheme in Part A, Section III, or along a pre-defined trajectory with a radius of 250 m. The smart jammer is located at the center of the mission area. The altitude of the UAV is H_U = 100 m. The number of antennas employed by the UAV and by the jammer is M = 16. The number of users is K = 10. The parameters of the A2G channels are G_array = 8 [14], a = 9.61, b = 0.16, η_LoS = 1, and η_NLoS = 20. The path loss at the reference distance is τ_0 = -40 dB. The noise power is σ² = -110 dBm. ψ is a log-normal random variable with a standard deviation of σ_shadow = 8 dB. κ = 3.8 denotes the path loss factor. The maximal transmission power of the UAV is P_U^max = 1.5 W and can be continuously adjusted. The maximal jamming power is P_J^max = 150 W, which is equally divided into L_num = 16 levels. In the DDPG algorithm, the learning rates of the actor and critic are 0.0005 and 0.001, respectively. The discount factor is γ = 0.95. The ratio η used in KDE is 0.95. The number of epochs in one flying cycle is N = 200, and N_epoch = 10^5. The transmission costs are c_U = 15 and c_J = 1. In addition, we assume that the users move independently and randomly during each epoch. The CSI error at the jammer follows CN(0, 0.1).
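For convenience, the setup above can be collected in a single configuration object; the values are taken directly from the text, while the field names (and the reading of the array-gain symbol) are ours.

```python
# Simulation setup of Section IV as a configuration dictionary (sketch).
SIM_CONFIG = {
    "mission_radius_m": 500.0, "fixed_trajectory_radius_m": 250.0,
    "uav_altitude_m": 100.0, "num_antennas": 16, "num_users": 10,
    "array_gain": 8, "a": 9.61, "b": 0.16, "eta_los_db": 1, "eta_nlos_db": 20,
    "ref_path_loss_db": -40, "noise_power_dbm": -110, "shadowing_std_db": 8,
    "path_loss_exponent": 3.8, "p_uav_max_w": 1.5, "p_jam_max_w": 150.0,
    "jss_levels": 16, "lr_actor": 5e-4, "lr_critic": 1e-3,
    "discount_factor": 0.95, "kde_eta": 0.95,
    "epochs_per_cycle": 200, "total_epochs": 100_000,
    "cost_uav": 15.0, "cost_jammer": 1.0, "csi_error_var": 0.1,
}
```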
The time complexity of a deep neural network is represented by the number of floating-point operations (FLOPs). Hence, the FLOPs of the proposed DDPG during inference are

$\mathrm{FLOPs} = 2\cdot(128\times|\mathcal{S}| + 64\times|\mathcal{A}|) + 32768 = 2\cdot(128\times 2 + 64\times 1) + 32768 = 33408.$   (22)
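As a quick sanity check, the count in (22) can be reproduced as follows; keeping the 32768-FLOP term outside the factor of two is our reading of the expression, chosen so that the stated total of 33408 is obtained.

```python
# Reproduce the inference-cost count in (22) for |S| = 2 and |A| = 1.
S_DIM, A_DIM = 2, 1
flops = 2 * (128 * S_DIM + 64 * A_DIM) + 32768
assert flops == 33408
```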
Fig. 2 shows the average sum rate as a function of the number of flying cycles. It can be seen that the scheme with the adaptive trajectory achieves an average sum rate gain of 2 bit/s/Hz for Q-learning and 0.5 bit/s/Hz for DDPG, due to its ability to track the users' real-time locations.

Fig. 3 compares the outage probability (the fraction of users whose SINR is less than 1) of each scheme. The scheme based on DDPG and trajectory optimization converges to the lowest outage probability. However, this metric does not converge to zero, because the smart jammer also optimizes its jamming policy according to the ongoing transmission.

Moreover, the proposed DDPG framework achieves a lower outage probability with less energy consumption, as shown in Fig. 4. This advantage may derive from the continuous power optimization approach, which could decrease the estimation precision at the jammer because the jammer adopts a discrete state space.
Furthermore, compared with the Q-learning algorithm, the DDPG based framework can reduce the power requirement by 38% without incurring a performance loss. Thus, the DDPG based framework can achieve higher energy efficiency, as shown in Fig. 5, which is more attractive for energy-limited UAVs.

In addition, the performance improves substantially when the jammer has imperfect CSI. The reason is that the CSI error makes the jammer unable to achieve effective precoding; thus, the jamming efficiency decreases, and the UAV can achieve better performance with less energy consumption.

Fig. 2. Average sum rate.
Fig. 3. Outage probability.
Fig. 4. Energy consumption.
Fig. 5. Energy efficiency.

V. CONCLUSION

In this letter, an anti-jamming DDPG based framework is proposed to optimize the anti-jamming power control strategy of the UAV. Considering the mobility of the users, a trajectory optimization scheme based on K-means++ is used, and a KDE-based EJSS estimation approach is proposed to address the difficulty the UAV faces in obtaining the jamming power. Simulation results show that the proposed DDPG based framework with the EJSS and adaptive trajectory can achieve better performance with lower energy consumption than the Q-learning and benchmark algorithms.
REFERENCES

[1] B. Ji et al., "Several key technologies for 6G: Challenges and opportunities," IEEE Commun. Standards Mag., vol. 5, no. 2, pp. 44–51, Jun. 2021.
[2] D. Yang et al., "Coping with a smart jammer in wireless networks: A Stackelberg game approach," IEEE Trans. Wireless Commun., vol. 12, no. 8, pp. 4038–4047, Aug. 2013.
[3] L. Xiao et al., "Anti-jamming transmission Stackelberg game with observation errors," IEEE Commun. Lett., vol. 19, no. 6, pp. 949–952, Jun. 2015.
[4] L. Jia et al., "Bayesian Stackelberg game for antijamming transmission with incomplete information," IEEE Commun. Lett., vol. 20, no. 10, pp. 1991–1994, Oct. 2016.
[5] Y. Xu et al., "Anti-jamming transmission in UAV communication networks: A Stackelberg game approach," in Proc. IEEE/CIC Int. Conf. Commun. China (ICCC), Chengdu, China, Oct. 2017, pp. 1–6.
[6] A. Garnaev et al., "A power control game with uncertainty on the type of the jammer," in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), Ottawa, ON, Canada, Nov. 2019, pp. 1–5.
[7] Z. Shen et al., "Beam-domain anti-jamming transmission for downlink massive MIMO systems: A Stackelberg game perspective," IEEE Trans. Inf. Forensics Security, vol. 16, pp. 2727–2742, 2021.
[8] P. Vamvakas et al., "Exploiting prospect theory and risk-awareness to protect UAV-assisted network operation," EURASIP J. Wireless Commun. Netw., vol. 2019, no. 1, pp. 286–305, Dec. 2019.
[9] Z. Xiao et al., "Learning based power control for mmWave massive MIMO against jamming," in Proc. IEEE Global Commun. Conf. (GLOBECOM), Abu Dhabi, United Arab Emirates, Dec. 2018, pp. 1–6.
[10] Y. Li et al., "Power and frequency selection optimization in anti-jamming communication: A deep reinforcement learning approach," in Proc. IEEE 5th Int. Conf. Comput. Commun. (ICCC), Changchun, China, Dec. 2019, pp. 815–820.
[11] Y. Xiao et al., "Learning-based low-latency VIoT video streaming against jamming and interference," IEEE Wireless Commun., vol. 28, no. 4, pp. 12–18, Aug. 2021.
[12] X. Xia et al., "Toward digitalizing the wireless environment: A unified A2G information and energy delivery framework based on binary channel feature map," IEEE Trans. Wireless Commun., early access, Feb. 10, 2022, doi: 10.1109/TWC.2022.3149636.
[13] A. Al-Hourani et al., "Optimal LAP altitude for maximum coverage," IEEE Wireless Commun. Lett., vol. 3, no. 6, pp. 569–572, Dec. 2014.
[14] L. Liu et al., "CoMP in the sky: UAV placement and movement optimization for multi-user communications," IEEE Trans. Commun., vol. 67, no. 8, pp. 5645–5658, Aug. 2019.
[15] H. Q. Ngo et al., "Energy and spectral efficiency of very large multiuser MIMO systems," IEEE Trans. Commun., vol. 61, no. 4, pp. 1436–1449, Apr. 2013.
[16] Q. Song et al., "A survey of prototype and experiment for UAV communications," Sci. China Inf. Sci., vol. 64, no. 4, pp. 1–21, Feb. 2021.
[17] X. Wang et al., "Jamming-resilient path planning for multiple UAVs via deep reinforcement learning," in Proc. IEEE Int. Conf. Commun. Workshops (ICC Workshops), Montreal, QC, Canada, Jun. 2021, pp. 1–6.
[18] J. An and F. Zhao, "Trajectory optimization and power allocation algorithm in MBS-assisted cell-free massive MIMO systems," IEEE Access, vol. 9, pp. 30417–30425, 2021.
[19] D. W. Scott, Multivariate Density Estimation. Hoboken, NJ, USA: Wiley, 1992.
[20] R. O. Duda, Pattern Classification. Hoboken, NJ, USA: Wiley, 2000.
[21] C. M. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer, 2006.
[22] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," 2015, arXiv:1509.02971.