Beamforming Design for Physical Security in Movable Antenna-aided ISAC Systems: A Reinforcement Learning Approach

Hoang Le Hung, Nguyen Hoang Huy, Nguyen Cong Luong, Quoc-Viet Pham, Senior Member, IEEE, Dusit Niyato, Fellow, IEEE, and Nguyen Tien Hoa, Member, IEEE

Abstract—In this paper, we investigate a secure Integrated Sensing and Communication (ISAC) system aided by Movable Antennas (MAs). Therein, the positions of the MAs can be dynamically adjusted to enhance channel conditions, thereby improving communication security for legitimate users in the presence of adversaries. Additionally, we consider a realistic scenario, where the Base Station (BS) only has imperfect eavesdropper's Channel State Information (CSI). In this context, our objective is to optimize the system secrecy rate by jointly designing the BS's transmit beamforming vector and MA movement. To address the dynamics of the ISAC environment and the uncertainties of the imperfect CSI scenario, as well as the complexity and non-convexity of the problem, we propose a deep reinforcement learning method to simultaneously design the transmit beamforming and MA movement against the eavesdropper. The simulation results show that the use of MAs improves the secrecy rate by 52% and that the proposed algorithm outperforms baseline methods in terms of adaptability and performance.

Index Terms—Movable Antenna, Integrated Sensing and Communication, Reinforcement Learning, Physical Layer Security

H. L. Hung and N. C. Luong are with the Faculty of Computer Science, Phenikaa University, Hanoi 12116, Vietnam. Emails: (hung.hoangle, luong.nguyencong)@phenikaa-uni.edu.vn. N. T. Hoa and N. H. Huy are with the School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi 100000, Vietnam. Emails: [email protected], [email protected]. Q.-V. Pham is with the School of Computer Science and Statistics, Trinity College Dublin, D02 PN40, Ireland. Email: [email protected]. D. Niyato is with the College of Computing and Data Science, Nanyang Technological University, Singapore. Email: [email protected].

I. INTRODUCTION

Integrated Sensing and Communication (ISAC) has become a key enabler for next-generation wireless networks, including Beyond Fifth Generation (B5G) and Sixth Generation (6G) systems. By integrating sensing and communication into a unified framework, ISAC facilitates intelligent applications such as vehicle-to-infrastructure networks and unmanned aerial vehicles [1]. Unlike traditional wireless systems, where the sensing and communication functions are performed independently, ISAC simultaneously delivers communication services to users while acquiring critical information about sensing targets, thereby improving resource efficiency and reducing hardware cost [2].

However, designing ISAC systems poses critical challenges. The need to create highly directional sensing beams while carrying information signals leads to inevitable signal leakage, which can be intercepted by eavesdroppers [3]. Although radar is considered a disruptive element in ISAC systems, it can be used to disrupt illegal receivers and protect confidential information from eavesdropping. Extensive studies on Physical Layer Security (PLS) have considered the use of interfering/jamming signals to disrupt eavesdroppers [4], [5]. Specifically, the authors in [4] employed symbol-level precoding to implement secure beamforming, while the authors in [5] designed beamforming under imperfect CSI conditions, including bounded and Gaussian CSI errors.

Another challenge for ISAC implementation is the limitation of available Degrees of Freedom (DoFs) in the spatial domain when using Fixed-Position Antenna (FPA) arrays. Antenna Selection (AS) techniques can partially address this limitation by exploiting spatial DoFs, but they typically require a large number of antenna elements, thus driving up costs [6]. Recently, MAs have emerged as a cost-effective alternative to address the challenge of providing additional DoFs. By using real-time controllers, such as stepper motors or servos, MAs allow flexible adjustment of antenna positions, enabling full spatial diversity with fewer antennas or even a single antenna. With these advantages, this approach has attracted attention in ISAC system design [7], [8]. For example, the work in [7] demonstrated significant uplink and downlink communication improvements in MA-based ISAC systems, and [8] addressed secure communication using MAs in combination with reconfigurable intelligent surfaces.

The aforementioned studies primarily employed traditional convex optimization methods to address the complex non-convex problems inherent in ISAC design. However, in dynamic and uncertain environments under imperfect CSI scenarios, such one-shot optimization methods prove to be ineffective. To address this limitation, Deep Reinforcement Learning (DRL) has emerged as an effective method, enabling systems to learn from environmental interactions and adapt to changes or uncertainties within the operational context. The authors in [9], [10] investigated the application of DRL to MA design, showing its ability to solve problems encountered by traditional convex optimization methods. However, there is no research yet on the use of DRL methods to improve the secure communication of MA-aided ISAC systems. Inspired by this research motivation, this paper considers a secure communication problem for an MA-aided ISAC system with imperfect eavesdropper's CSI. Specifically, we aim to maximize the worst-case secrecy rate between the users and the eavesdropper over multiple time slots by jointly optimizing the transmit beamforming of the BS and the MA locations. We propose a DRL algorithm, namely Proximal Policy Optimization (PPO), to jointly design the transmit beamforming and MA movement. The simulation results demonstrate the superior performance, in terms of secrecy rate, of using MAs. Moreover, the PPO algorithm has excellent adaptability to dynamic and uncertain environments under the imperfect CSI scenario.
II. SYSTEM MODEL AND PROBLEM FORMULATION

A. System Model

Fig. 1: MA-aided secure ISAC system model.

We consider a secure ISAC system, as illustrated in Fig. 1, consisting of a BS equipped with N transmitting MA (Tx-MA) elements and a fixed single receiving antenna (Rx), M communication users (CUs), denoted by the set M = {1, . . . , M}, and a sensing target. We consider that an untrusted or suspicious eavesdropper may attempt to collect confidential messages directed to any CU m. We consider the operation of the secure ISAC system over multiple time slots. Specifically, the radar function detects sensing targets, while the BS uses linear beamforming to transmit the confidential information and the sensing signal. Let s = [s_0, . . . , s_M]^T ∈ C^{M+1} be the transmit signal vector, where s_0 is the radar signal, s_m is the symbol for CU m, and E[ss^H] = I. The beamforming matrix is W = [w_0, . . . , w_M] ∈ C^{N×(M+1)}, where w_0 and w_m ∈ C^{N×1} are the transmit beamforming vectors for radar sensing and for the confidential information of CU m, respectively.

Since the size of the antenna is much smaller than the signal transmission distance, we consider the far-field channel model between the CUs and the BS. Therefore, for each l-th channel path component, all the Tx-MA elements experience the same Angle of Departure (AoD) φ ∈ [0, π]. In addition, the positions of the Tx-MA elements can be flexibly adjusted within a given one-dimensional line segment of length A_Tx. In detail, the positions of the N Tx-MA elements are indicated by x = [x_1, . . . , x_N]^T ∈ R^N, where 0 ≤ x_1 < · · · < x_N ≤ A_Tx without loss of generality. In order to prevent coupling of adjacent MA elements, a minimum distance D_0 between antenna pairs is ensured, i.e., |x_z − x_q| ≥ D_0, z, q ∈ {1, . . . , N}, z ≠ q. The Field-Response Vectors (FRVs) of the Tx-MA are then expressed as

a_s(x) = [e^{j(2π/λ) x_1 cos φ_s}, . . . , e^{j(2π/λ) x_N cos φ_s}]^T,  (1)
a_eva^l(x) = [e^{j(2π/λ) x_1 cos φ_eva^l}, . . . , e^{j(2π/λ) x_N cos φ_eva^l}]^T,  (2)
a_m^l(x) = [e^{j(2π/λ) x_1 cos φ_m^l}, . . . , e^{j(2π/λ) x_N cos φ_m^l}]^T,  (3)

where a_s(x) and a_eva^l(x)/a_m^l(x) are the FRVs between the BS and the sensing target, and for the l-th propagation path from the BS to the eavesdropper/CUs, respectively, and λ is the wavelength.

1) Secure Communication: We denote L_p as the number of channel paths from the BS to CU m. The channel between the BS and CU m is then obtained as [11]

h_m = Σ_{l=1}^{L_p} β_m^l a_m^l(x) ∈ C^{N×1},  (4)

where β_m^l denotes the path gain of the l-th transmit path from the BS to CU m. We assume that β_m^l follows the circularly symmetric complex Gaussian distribution [11], i.e., β_m^l ∼ CN(0, ρ d_m^{−α}/L_p), where ρ represents the path loss at the reference distance of 1 m and α is the path-loss exponent. Accordingly, the received signal at CU m is expressed as

y_m = h_m^H(x) w_m s_m + Σ_{j=1, j≠m}^{M} h_m^H(x) w_j s_j + h_m^H(x) w_0 s_0 + n_m,  (5)

where the three terms are the expected signal, the inter-user interference, and the sensing interference, respectively, and n_m ∼ CN(0, σ_m^2) is the additive white Gaussian noise (AWGN) with mean zero and variance σ_m^2. The communication Signal-to-Interference-plus-Noise Ratio (SINR) at CU m can then be expressed as:

γ_m(W, x) = ‖h_m^H(x) w_m‖^2 / ( Σ_{j=0, j≠m}^{M} ‖h_m^H(x) w_j‖^2 + σ_m^2 ).  (6)

For the eavesdropping model under the imperfect CSI scenario, we assume that the BS has imperfect CSI of the eavesdropper. Let h_eva denote the perfect channel between the BS and the eavesdropper, which can be formulated similarly to (4), and ĥ_eva denote the estimated CSI corresponding to the eavesdropper. The CSI estimation error vector of the eavesdropper is expressed as

Δh_eva = h_eva − ĥ_eva.  (7)

Similarly to [5], we assume bounded CSI errors, where the CSI estimation error vector of the eavesdropper is subject to a constraint, i.e., ‖Δh_eva‖ ≤ ε, where ε = µ|h_eva| represents the maximum allowable CSI error, with µ denoting the error percentage. Thus, the received signal at the eavesdropper is expressed as [10]

y_eva = ĥ_eva^H(x) w_m s_m + Σ_{j=0}^{M} Δh_eva^H(x) w_j s_j + Σ_{j=1, j≠m}^{M} ĥ_eva^H(x) w_j s_j + ĥ_eva^H(x) w_0 s_0 + n_eva.  (8)

Then, the SINR achieved by the eavesdropper on the message of CU m is given by

γ̂_eva^m(W, x) = ‖ĥ_eva^H(x) w_m‖^2 / Ĵ_eva^m,  (9)

where Ĵ_eva^m = Σ_{j=0}^{M} ‖Δh_eva^H(x) w_j‖^2 + Σ_{j=1, j≠m}^{M} ‖ĥ_eva^H(x) w_j‖^2 + σ_eva^2 is the interference-plus-noise power, including the channel estimation errors. To evaluate the security performance of the system, we employ the worst-case secrecy rate as the secure communication metric, which is given by

ΔR_m(W, x) = min_{m∈M} [ log_2(1 + γ_m) − log_2(1 + γ̂_eva^m) ]^+.  (10)
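For readers who wish to prototype the secure-communication model, the following Python sketch evaluates the FRVs (1)-(3), the multipath channel (4), the SINRs (6) and (9), and the worst-case secrecy rate (10). It is illustrative only: the function names, array layouts, and the idea of sampling one error vector d_h inside the bounded uncertainty set are assumptions, not the authors' implementation.

```python
import numpy as np

def frv(x, cos_phi, lam):
    """Field-response vector in (1)-(3) for MA positions x and a direction cosine."""
    return np.exp(1j * 2.0 * np.pi / lam * x * cos_phi)

def multipath_channel(x, cos_phis, gains, lam):
    """Channel h in (4): sum of per-path gains times the corresponding FRVs."""
    return sum(b * frv(x, c, lam) for b, c in zip(gains, cos_phis))

def sinr_cu(h_m, W, m, sigma2):
    """Communication SINR (6) of CU m; column 0 of W is the radar beam w_0."""
    p = np.abs(h_m.conj() @ W) ** 2            # |h_m^H w_j|^2 for j = 0..M
    return p[m] / (p.sum() - p[m] + sigma2)

def sinr_eve(h_hat, d_h, W, m, sigma2_eva):
    """Eavesdropper SINR (9) from the estimated CSI h_hat and an error sample d_h."""
    p_hat = np.abs(h_hat.conj() @ W) ** 2
    p_err = np.abs(d_h.conj() @ W) ** 2
    interference = p_err.sum() + p_hat[1:].sum() - p_hat[m]
    return p_hat[m] / (interference + sigma2_eva)

def worst_case_secrecy_rate(h_cus, h_hat, d_h, W, sigma2, sigma2_eva):
    """Worst-case secrecy rate (10): minimum over the CUs, clipped at zero."""
    rates = [np.log2(1.0 + sinr_cu(h, W, m + 1, sigma2))
             - np.log2(1.0 + sinr_eve(h_hat, d_h, W, m + 1, sigma2_eva))
             for m, h in enumerate(h_cus)]     # data beams occupy columns 1..M
    return max(min(rates), 0.0)
```

Here W stacks [w_0, . . . , w_M] column-wise and x holds the MA positions; a bounded error sample d_h with ‖d_h‖ ≤ ε can be drawn to emulate the uncertainty set behind the worst-case metric.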

2) Radar Sensing: We consider that the radar operates using a tracking model, where an initial estimate of the target's Angle of Arrival (AoA) is obtained from previous scans. The radar channel is modeled as slowly varying over time, with the following expression [12]:

g_s = β_s e^{j2π f_d τ} a_s(x),  (11)

where f_d = 2 v f_c / c is the Doppler frequency, with f_c, v, and c representing the carrier frequency, the target's velocity, and the speed of light, respectively, τ is the system sampling period, and β_s represents the attenuation coefficient, which incorporates the round-trip path loss and the radar cross-section (RCS) α_s, given by

β_s = √( λ^2 α_s / ((4π)^3 d_s^4) ).  (12)

The received signal at the radar system consists of the target echo signal, interference from communication echoes, and noise, which is given by:

y_s = g_s^H w_0 s_0 + Σ_{j=1}^{M} g_s^H w_j s_j + n_s,  (13)

where the first term is the target return, the second term collects the echoes of the user signals, and n_s ∼ CN(0, σ_s^2) is the AWGN with mean zero and variance σ_s^2. The radar SINR can then be written as

γ_s(W, x) = ‖g_s^H(x) w_0‖^2 / ( Σ_{j=1}^{M} ‖g_s^H(x) w_j‖^2 + σ_s^2 ).  (14)

B. Problem Formulation

In this paper, we aim to design the BS transmit beamforming W and control the MA element positions x to maximize the worst-case secrecy rate, while the radar target detection performance constraint, the MA constraints, and the power constraint of the BS are satisfied. Mathematically, the dynamic beamforming and antenna movement optimization problem is formulated as:

max_{W, x}  Σ_{m∈M} ΔR_m(W, x),  (15a)
s.t.  Tr(W^H W) ≤ P_0,  (15b)
      |x_z − x_q| ≥ D_0,  z, q ∈ {1, . . . , N}, z ≠ q,  (15c)
      γ_s(W, x) ≥ γ_min,  (15d)

where P_0 is the maximum transmission power and γ_min denotes the sensing SINR threshold. Since the objective function (15a) is non-concave with respect to the MA locations and the minimum distance constraint in (15c) is non-convex, problem (15) is a non-convex optimization problem. In addition, since we consider the problem over multiple time slots, input parameters such as the CU locations and AoDs vary over time, which means that traditional one-shot optimization algorithms would need to be re-run in every time slot. This results in a high computation cost. Furthermore, we aim to maximize a long-term reward, and thus we propose to use DRL. In the next sections, we reformulate problem (15) as a stochastic optimization problem and develop an efficient DRL algorithm to obtain the solution.

III. STOCHASTIC OPTIMIZATION PROBLEM FORMULATION AND DRL ALGORITHM

A. Stochastic Optimization Problem

The system optimization is formulated as a Markov decision process defined as <S, A, P, R>, which represent the state space, action space, state transition probability function, and reward function, respectively.

1) State Space: The state space is represented by the factors that influence the objective and the action decisions. The state of the system can be determined by the CSI between the BS and the target/CUs. In particular, we also use the imperfect CSI based on (7) so that the proposed algorithm can directly find favorable actions based on the estimated imperfect CSI. Therefore, the state space at time slot t is

S_t = {g_s^t, h_m^t, ĥ_eva^t}, m ∈ M.  (16)

2) Action Space: In the optimization problem in (15), the BS needs to design the transmit beamforming W and control the MA element positions x to maximize the worst-case secrecy rate. Thus, the action space is defined by

A_t = {W_t, x_t}.  (17)

3) Reward: The reward is designed to maximize the worst-case secrecy rate of the system while satisfying the important constraints in (15b), (15c), and (15d). Thus, the reward function is formulated as

r_t = { Σ_{m∈M} ΔR_m(W_t, x_t),  if (15b), (15c), (15d) are satisfied,
        0,  otherwise.  (18)

Accordingly, the BS only receives rewards when it satisfies the constraints on the transmission power, the MA movement, and the sensing SINR.
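To make the MDP concrete, the sketch below wraps the state (16), the action (17), and the reward (18) in a Gym-style interface. The class name, the real-valued encoding of W and x into an action vector, and the default parameter values are assumptions chosen for illustration rather than the authors' implementation; the state transition (drawing new CSI each slot) is omitted.

```python
import numpy as np

class SecureMaIsacEnv:
    """Sketch of the MDP <S, A, P, R> of Section III-A (names and shapes are illustrative)."""

    def __init__(self, N=4, M=3, P0_dbm=25.0, D0=0.05, A_tx=1.0, gamma_min_db=10.0):
        self.N, self.M = N, M
        self.P0 = 10.0 ** (P0_dbm / 10.0) * 1e-3         # power budget P_0 in watts
        self.D0, self.A_tx = D0, A_tx                    # MA spacing and segment length
        self.gamma_min = 10.0 ** (gamma_min_db / 10.0)   # sensing SINR threshold

    def state(self, g_s, h_cus, h_eva_hat):
        """State (16): radar channel, CU channels, and estimated eavesdropper CSI."""
        parts = [g_s, *h_cus, h_eva_hat]
        return np.concatenate([np.concatenate([v.real, v.imag]) for v in parts])

    def decode_action(self, a):
        """Action (17): split a real vector into the beamformer W and the positions x."""
        n_w = 2 * self.N * (self.M + 1)
        w = a[:n_w].reshape(2, self.N, self.M + 1)
        W = w[0] + 1j * w[1]
        x = np.sort(a[n_w:]) * self.A_tx                 # raw entries assumed in [0, 1]
        return W, x

    def feasible(self, W, x, radar_sinr):
        """Constraints (15b)-(15d): transmit power, MA spacing, and sensing SINR."""
        return (np.trace(W.conj().T @ W).real <= self.P0
                and np.all(np.diff(x) >= self.D0)
                and radar_sinr >= self.gamma_min)

    def reward(self, W, x, secrecy_rates, radar_sinr):
        """Reward (18): sum of the secrecy rates if (15b)-(15d) hold, zero otherwise."""
        return float(np.sum(secrecy_rates)) if self.feasible(W, x, radar_sinr) else 0.0
```

The secrecy rates and radar SINR passed to reward() would be computed from the channel model of Section II, e.g., with the helper functions sketched after (10).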

B. PPO-based Algorithm

Stochastic optimization problems like the one presented in Section III-A can be solved using value-based algorithms, such as Deep Deterministic Policy Gradient (DDPG) [13], where the algorithm estimates the value function (Q-function) and selects actions to optimize the Q-value at each step. However, the learning process of these algorithms requires a large amount of resources and often yields suboptimal results. This is due to their off-policy training nature and the need to iterate the environment exploration process. In contrast, PPO, an on-policy algorithm, integrates policy gradient methods with a learned value function. This approach enables agents trained with PPO to continuously improve their policies while interacting with the environment, allowing them to adapt effectively to environmental changes while maintaining exploratory behavior.

The actor-critic architecture is central to the PPO framework. The actor network, parameterized by θ_a, defines a policy based on the state, while the critic network, parameterized by θ_c, serves as a value function (similar to value-based algorithms). The critic network evaluates the actor's actions, thereby improving the policy using the policy gradient. Therefore, the PPO algorithm becomes effective when the agent can directly learn policies from the environment. In addition, by limiting the update of the new policy to a small range around the old policy, PPO balances policy improvement and learning stability.

Specifically, the PPO algorithm utilizes a state-value function V(s_t) to estimate the expected return from the current state according to the policy π(a_t|s_t), which is mathematically formulated as:

V(s_t) = E[ Σ_{i=0}^{∞} γ^i r_{t+i} ],  (19)

where γ is the discount factor and r_{t+i} indicates the reward obtained according to (18). The critic network then estimates the advantage function. To ensure stable updates of the policy, the Generalized Advantage Estimation (GAE) method is employed, and the estimated advantage function is defined as:

Â(s_t) = Σ_{i=0}^{∞} (γω)^i ( r_{t+i} + γ V(s_{t+i+1}) − V(s_{t+i}) ),  (20)

where ω is the GAE coefficient. In order to address the problem of large policy updates during model training, the PPO algorithm introduces the clip function. This function improves the policy stability by limiting the ratio between the current policy and the previous policy to a range [1 − ϵ, 1 + ϵ], which removes the incentive for the current policy to move too far from the old one. Mathematically, it can be expressed as

clip(ψ, 1 − ϵ, 1 + ϵ) = { 1 + ϵ,  if ψ > 1 + ϵ,
                          ψ,      if 1 − ϵ ≤ ψ ≤ 1 + ϵ,
                          1 − ϵ,  if ψ < 1 − ϵ,

where ϵ is the limit adjustment parameter. The loss function of the actor network, which utilizes the clip function, can then be evaluated as

L(θ_a) = E_t[ Â(s_t, a_t) min( p_t(θ_a), clip(p_t(θ_a), 1 − ϵ, 1 + ϵ) ) ],  (21)

where p_t(θ_a) = π_{θ_a^new}(a_t|s_t) / π_{θ_a^old}(a_t|s_t) is the probability ratio between the new policy and the old policy. In addition, the loss function of the critic network can be expressed as

L(θ_c) = E_t[ r_{t+i} + γ V(s_{t+i+1}) − V(s_{t+i}) ].  (22)

Finally, the actor network is updated using gradient ascent and the critic network is updated using gradient descent:

θ_a ← θ_a + β_{θ_a} ∇_{θ_a} L(θ_a),  (23a)
θ_c ← θ_c − β_{θ_c} ∇_{θ_c} L(θ_c),  (23b)

where β_{θ_a} and β_{θ_c} are the learning rates of the actor and critic networks, respectively.
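As a reference point, the GAE estimator (20) and the two losses can be written compactly in PyTorch. This is a generic sketch rather than the authors' code: the squared-error critic target stands in for the literal form of (22), the surrogate uses the standard min-of-products form (which coincides with (21) whenever the advantage is positive), and actor_loss is negated so that minimizing it performs the gradient ascent of (23a).

```python
import torch

def gae(rewards, values, gamma=0.99, omega=0.95):
    """Generalized Advantage Estimation, cf. (20); values carries one extra bootstrap entry."""
    advantages = torch.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(rewards.shape[0])):
        delta = rewards[t] + gamma * values[t + 1] - values[t]   # TD residual
        running = delta + gamma * omega * running
        advantages[t] = running
    return advantages

def ppo_losses(new_logp, old_logp, advantages, values, returns, eps=0.2):
    """Clipped surrogate for (21) and a squared-error stand-in for the critic loss (22)."""
    ratio = torch.exp(new_logp - old_logp)                 # p_t(theta_a)
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)     # clip(psi, 1 - eps, 1 + eps)
    actor_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    critic_loss = ((returns - values) ** 2).mean()         # minimized, cf. (23b)
    return actor_loss, critic_loss
```

In practice, the advantages would be computed from the critic's value estimates over a rollout collected with the environment of Section III-A, and the two losses would be minimized with separate optimizers using the learning rates β_{θ_a} and β_{θ_c}.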

IV. PERFORMANCE EVALUATION

This section presents the simulation results to evaluate the performance of the proposed PPO algorithm. We also introduce the Advantage Actor-Critic (A2C) [14] and DDPG algorithms as baseline schemes against which to evaluate our proposed algorithm. We use the tuple (φ, d, v) to represent the AoD, distance, and velocity of the target/eavesdropper/CUs. The radar target is at (π/3, 25 m, 10 m/s), while the eavesdropper/CUs are stationary with their AoDs and distances randomly distributed within [0, π] and [d_min, d_max], respectively. The model parameters are listed in Table I.

TABLE I: Simulation Parameters
Parameter | Value | Parameter | Value
M | 3 | N | 4
τ | 1 µs [15] | P_0 | 25 dBm
L_p | 13 | λ | 0.1 m
α | 2.2 | D_0 | λ/2
σ_m^2, σ_s^2 | −80 dBm | ρ | −46 dB [16]
A_Tx | 10 × λ | γ_min | 10 dBm
d_max | 100 | d_min | 25
f_c | 2.4 GHz | c | 3 × 10^8 m/s

Fig. 2: The convergence of the algorithms.

We first discuss the convergence of the PPO, A2C, and DDPG algorithms in Fig. 2. The results indicate that these algorithms converge to stable reward values, confirming their feasibility. After training, the proposed PPO algorithm delivers robust and real-time solutions for transmit beamforming and antenna movement with imperfect CSI. Notably, the reward achieved by the MA system is significantly higher than that of the FPA system due to the utilization of DoFs for attaining more favorable channels. Specifically, for the proposed PPO algorithm, the reward of the MA system is improved by 52% compared to the FPA system. However, the FPA system demonstrates faster convergence because it only optimizes the beamforming vector, whereas the MA system simultaneously optimizes both the beamforming vector and the antenna movement. Additionally, the proposed PPO algorithm outperforms A2C and DDPG in terms of system reward in both the FPA and MA scenarios, which confirms its excellent adaptability compared to the baseline algorithms.

Fig. 3: The impact of (a) the maximum transmission power and (b) the sensing SINR threshold on the sum secrecy rate.

Next, we examine how the maximum transmission power and the sensing SINR threshold affect the sum secrecy rate. Fig. 3a illustrates the relationship between the sum secrecy rate and the maximum transmission power. It is clear that as the transmission power increases, all schemes exhibit a significant improvement in the sum secrecy rate. Notably, the MA scheme achieves the same sum secrecy rate with significantly lower power requirements than the FPA scheme. For instance, under the proposed PPO algorithm, the MA scheme achieves a sum secrecy rate of 0.6 bit/s/Hz with only 25 dBm, while the FPA scheme requires 27 dBm to reach the same performance. Fig. 3b shows the sum secrecy rate as a function of the sensing SINR threshold. The results indicate a consistent decline in the secrecy rate for all approaches as the sensing SINR threshold becomes stricter. This is because a more stringent target detection constraint demands a higher power allocation for the sensing beamformers. Consequently, with a fixed transmission power budget, less power is available for communication tasks, leading to a reduction in the sum secrecy rate. Importantly, the PPO-based MA scheme exhibits superior adaptability compared to the baseline schemes. For example, when the sensing SINR threshold increases from 5 dB to 20 dB, the MA-PPO scheme experiences only a 9% reduction in performance, while the FPA-DDPG scheme suffers a significantly larger drop of 28%.

Fig. 4: The impact of (a) the number of antennas and (b) the number of channel paths on the sum secrecy rate.

Finally, we examine how the number of MA elements and channel paths affects the sum secrecy rate. Fig. 4a presents the sum secrecy rate as a function of the number of antenna elements. Increasing the number of antennas enables the system to achieve spatial diversity and beamforming gains, resulting in higher sum secrecy rates for all schemes. Notably, the superior spatial DoF utilization achieved by MAs allows them to require fewer antennas to deliver the same level of security performance. Specifically, the MA-PPO scheme achieves a sum secrecy rate of 0.6 bit/s/Hz with only 4 antennas, compared to the 6 antennas needed by the FPA-PPO scheme. Fig. 4b shows the variation in the sum secrecy rate with the number of channel paths. It is clear that as the number of channel paths grows, the sum secrecy rate increases considerably across all schemes, with the performance gaps becoming more pronounced. Specifically, when L_p = 3, the sum secrecy rate gaps between the MA and FPA systems are 0.02, 0.05, and 0.04 bit/s/Hz for the PPO, A2C, and DDPG algorithms, respectively, and these gaps expand to 0.22, 0.19, and 0.17 bit/s/Hz when L_p = 15. This improvement occurs because the additional channel paths provide MAs with greater diversity and more DoFs, further enhancing the sum secrecy rate.

V. CONCLUSIONS

This paper has investigated an MA-aided secure ISAC system. With their high flexibility, MAs improve the channel conditions and ensure the secure communication of legitimate users. To optimize the system secrecy rate and tackle the challenges of dynamic and uncertain imperfect CSI scenarios, we have proposed the PPO algorithm to jointly optimize the BS transmit beamforming and the MA movements. The simulation results have confirmed the effectiveness of the proposed DRL algorithm and highlighted the performance advantages of MAs over traditional FPA schemes.

REFERENCES

[1] Z. Du, F. Liu, W. Yuan, C. Masouros, Z. Zhang, S. Xia, and G. Caire, "Integrated Sensing and Communications for V2I Networks: Dynamic Predictive Beamforming for Extended Vehicle Targets," IEEE Transactions on Wireless Communications, vol. 22, no. 6, pp. 3612-3627, 2023.
[2] F. Liu, Y. Cui, C. Masouros, J. Xu, T. X. Han, Y. C. Eldar, and S. Buzzi, "Integrated Sensing and Communications: Toward Dual-Functional Wireless Networks for 6G and Beyond," IEEE Journal on Selected Areas in Communications, vol. 40, no. 6, pp. 1728-1767, 2022.
[3] F. Liu, C. Masouros, A. P. Petropulu, H. Griffiths, and L. Hanzo, "Joint Radar and Communication Design: Applications, State-of-the-Art, and the Road Ahead," IEEE Transactions on Communications, vol. 68, no. 6, pp. 3834-3862, 2020.
[4] M. R. A. Khandaker, C. Masouros, and K.-K. Wong, "Constructive Interference Based Secure Precoding: A New Dimension in Physical Layer Security," IEEE Transactions on Information Forensics and Security, vol. 13, no. 9, pp. 2256-2268, 2018.
[5] Z. Ren, L. Qiu, J. Xu, and D. W. K. Ng, "Robust Transmit Beamforming for Secure Integrated Sensing and Communication," IEEE Transactions on Communications, vol. 71, no. 9, pp. 5549-5564, 2023.
[6] A. Molisch and M. Win, "MIMO Systems with Antenna Selection," IEEE Microwave Magazine, vol. 5, no. 1, pp. 46-56, 2004.
[7] H. Qin, W. Chen, Q. Wu, Z. Zhang, Z. Li, and N. Cheng, "Cramér-Rao Bound Minimization for Movable Antenna-Assisted Multiuser Integrated Sensing and Communications," IEEE Wireless Communications Letters, pp. 1-1, 2024.
[8] Y. Ma, K. Liu, Y. Liu, L. Zhu, and Z. Xiao, "Movable-Antenna Aided Secure Transmission for RIS-ISAC Systems," 2024. [Online]. Available: https://arxiv.org/abs/2410.03426
[9] C. Wang, G. Li, H. Zhang, K.-K. Wong, Z. Li, D. W. K. Ng, and C.-B. Chae, "Fluid Antenna System Liberating Multiuser MIMO for ISAC via Deep Reinforcement Learning," IEEE Transactions on Wireless Communications, vol. 23, no. 9, pp. 10879-10894, 2024.
[10] C. Weng, Y. Chen, L. Zhu, and Y. Wang, "Learning-Based Joint Beamforming and Antenna Movement Design for Movable Antenna Systems," IEEE Wireless Communications Letters, vol. 13, no. 8, pp. 2120-2124, 2024.
[11] X. Wei, W. Mei, D. Wang, B. Ning, and Z. Chen, "Joint Beamforming and Antenna Position Optimization for Movable Antenna-Assisted Spectrum Sharing," IEEE Wireless Communications Letters, vol. 13, no. 9, pp. 2502-2506, 2024.
[12] Z. Liu, S. Aditya, H. Li, and B. Clerckx, "Joint Transmit and Receive Beamforming Design in Full-Duplex Integrated Sensing and Communications," IEEE Journal on Selected Areas in Communications, vol. 41, no. 9, pp. 2907-2919, 2023.
[13] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous Control with Deep Reinforcement Learning," International Conference on Learning Representations (ICLR), 2016.
[14] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, "Asynchronous Methods for Deep Reinforcement Learning," Proceedings of the 33rd International Conference on Machine Learning, vol. 48, pp. 1928-1937, 2016.
[15] S. Huang, M. Zhang, Y. Gao, and Z. Feng, "MIMO Radar Aided mmWave Time-Varying Channel Estimation in MU-MIMO V2X Communications," IEEE Transactions on Wireless Communications, vol. 20, no. 11, pp. 7581-7594, 2021.
[16] W. Mei, X. Wei, B. Ning, Z. Chen, and R. Zhang, "Movable-Antenna Position Optimization: A Graph-Based Approach," IEEE Wireless Communications Letters, vol. 13, no. 7, pp. 1853-1857, 2024.
