Secure Movable Antennas
Abstract—In this paper, we investigate a secure Integrated Sensing and Communication (ISAC) system aided by Movable Antennas (MAs). Therein, the positions of the MAs can be dynamically adjusted to enhance channel conditions, thereby improving communication security for legitimate users in the presence of adversaries. Additionally, we consider a realistic scenario where the Base Station (BS) only has imperfect eavesdropper's Channel State Information (CSI). In this context, our objective is to optimize the system secrecy rate by jointly designing the BS's transmit beamforming vector and the MA movement. To address the dynamics of ISAC environments and the uncertainties of the imperfect CSI scenario, as well as the complexity and non-convexity of the problem, we propose a deep reinforcement learning method to simultaneously design the transmit beamforming and the MA movement against the eavesdropper. The simulation results show that the use of MAs improves the secrecy rate by 52% and that the proposed algorithm outperforms baseline methods in terms of adaptability and performance.

Index Terms—Movable Antenna, Integrated Sensing and Communication, Reinforcement Learning, Physical Layer Security

I. INTRODUCTION

Integrated Sensing and Communication (ISAC) has become a key enabler for next-generation wireless networks, including Beyond Fifth Generation (B5G) and Sixth Generation (6G) systems. By integrating sensing and communication into a unified framework, ISAC facilitates intelligent applications such as vehicle-to-infrastructure networks and unmanned aerial vehicles [1]. Unlike traditional wireless systems, where the sensing and communication functions are performed independently, ISAC simultaneously delivers communication services to users while acquiring critical information about sensing targets, thereby improving resource efficiency and reducing hardware cost [2].

However, designing ISAC systems poses critical challenges. The need to create highly directional sensing beams while carrying information signals leads to inevitable signal leakage, which can be intercepted by eavesdroppers [3]. Although radar is considered a disruptive element in ISAC systems, it can be used to disrupt illegal receivers to protect confidential information from eavesdropping. Extensive studies on Physical Layer Security (PLS) have considered the use of interfering/jamming signals to disrupt eavesdroppers [4], [5]. Specifically, the authors in [4] employed symbol-level precoding to implement secure beamforming, while the authors in [5] designed beamforming under imperfect CSI conditions, including bounded and Gaussian CSI errors.

Another challenge for ISAC implementation is the limited number of Degrees of Freedom (DoFs) available in the spatial domain when using Fixed-Position Antenna (FPA) arrays. Antenna Selection (AS) techniques can partially address this limitation by exploiting spatial DoFs, but they typically require a large number of antenna elements, thus driving up costs [6]. Recently, MAs have emerged as a cost-effective alternative for providing additional DoFs. By using real-time controllers, such as stepper motors or servos, MAs allow flexible adjustment of antenna positions, enabling full spatial diversity with fewer antennas or even a single antenna. With the advantages mentioned above, this approach has attracted attention in ISAC system design [7], [8]. For example, the work in [7] demonstrated significant uplink and downlink communication improvements in MA-based ISAC systems, and [8] addressed secure communication using MAs in combination with reconfigurable intelligent surfaces.

The aforementioned studies primarily employed traditional convex optimization methods to address the complex non-convex problems inherent in ISAC design. However, in dynamic and uncertain environments under imperfect CSI scenarios, such one-shot optimization methods prove to be ineffective. To address this limitation, Deep Reinforcement Learning (DRL) has emerged as an effective method, enabling systems to learn from environmental interactions and adapt to changes or uncertainties within the operational context. The authors in [9], [10] investigated the application of DRL to MA implementation, showing its ability to solve problems encountered by traditional convex optimization methods. However, there is no research yet on the use of DRL methods to improve the secure communication of MA-aided ISAC systems. Inspired by this research motivation, this paper considers a secure communication problem for an MA-aided ISAC system with imperfect eavesdropper's CSI. Specifically, we aim to maximize the worst-case secrecy rate between the users and the eavesdropper over multiple time slots by jointly optimizing the transmit beamforming of the BS and the MA locations. We propose a DRL algorithm, namely Proximal Policy Optimization (PPO), to jointly design the transmit beamforming and MA movement. The simulation results demonstrate the superior performance, in terms of secrecy rate, of using the MAs. Moreover, the PPO algorithm has excellent adaptability to dynamic and uncertain environments under the imperfect CSI scenario.

H. L. Hung and N. C. Luong are with the Faculty of Computer Science, Phenikaa University, Hanoi 12116, Vietnam. Emails: (hung.hoangle, luong.nguyencong)@phenikaa-uni.edu.vn.
N. T. Hoa and N. H. Huy are with the School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Hanoi 100000, Vietnam. Emails: [email protected], [email protected].
Q.-V. Pham is with the School of Computer Science and Statistics, Trinity College Dublin, D02 PN40, Ireland. Email: [email protected].
D. Niyato is with the College of Computing and Data Science, Nanyang Technological University, Singapore. Email: [email protected].

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. System Model

[Fig. 1: System model — an ISAC-MA BS serves CUs 1, …, M over communication links and senses a target over radar links in the presence of an eavesdropper.]

… from the BS to the eavesdropper/CUs, respectively, and $\lambda$ is the wavelength.

1) Secure Communication: We denote $L_p$ as the number of channel paths from the BS to CU $m$. The channel between the BS and CU $m$ is then obtained as [11]

$$\mathbf{h}_m = \sum_{l=1}^{L_p} \beta_m^l \mathbf{a}_m^l(\mathbf{x}) \in \mathbb{C}^{N \times 1}, \tag{4}$$

where $\beta_m^l$ denotes the path gain of the $l$-th transmit path from the BS to CU $m$. We assume that $\beta_m^l$ follows the circularly symmetric complex Gaussian distribution [11], i.e., $\beta_m^l \sim \mathcal{CN}(0, \rho d_m^{-\alpha}/L_p)$, where $\rho$ represents the path loss at the reference distance of 1 m and $\alpha$ is the path-loss exponent. Accordingly, the received signal at CU $m$ is expressed as

$$y_m = \underbrace{\mathbf{h}_m^H(\mathbf{x})\mathbf{w}_m s_m}_{\text{Expected Signal}} + \underbrace{\sum_{j=1,\, j \neq m}^{M} \mathbf{h}_m^H(\mathbf{x})\mathbf{w}_j s_j}_{\text{Inter-user Interference}} + \underbrace{\mathbf{h}_m^H(\mathbf{x})\mathbf{w}_0 s_0}_{\text{Sensing Interference}} + n_m, \tag{5}$$

where $n_m \sim \mathcal{CN}(0, \sigma_m^2)$ is the AWGN at CU $m$.
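To make the channel model concrete, the following minimal Python sketch builds $\mathbf{h}_m$ in (4) from the MA positions. The far-field steering model $[\mathbf{a}_m^l(\mathbf{x})]_n = e^{j 2\pi x_n \cos\theta_l/\lambda}$ for a linear transmit region, the path angles, and the CU distance are our own assumptions for illustration; they are not specified in the excerpt above.

```python
import numpy as np

def steering_vector(x, theta, lam):
    # Assumed far-field model: [a(x)]_n = exp(j * 2*pi * x_n * cos(theta) / lam)
    return np.exp(1j * 2 * np.pi * x * np.cos(theta) / lam)

def cu_channel(x, thetas, d_m, lam=0.1, rho_db=-46.0, alpha=2.2,
               rng=np.random.default_rng(0)):
    """Channel h_m of Eq. (4): a sum over L_p paths with i.i.d.
    CN(0, rho * d_m^(-alpha) / L_p) path gains (defaults follow Table I)."""
    L_p = len(thetas)
    var = 10 ** (rho_db / 10) * d_m ** (-alpha) / L_p
    beta = np.sqrt(var / 2) * (rng.standard_normal(L_p)
                               + 1j * rng.standard_normal(L_p))
    A = np.stack([steering_vector(x, th, lam) for th in thetas], axis=1)  # N x L_p
    return A @ beta  # h_m in C^N

# Example: N = 4 MAs in a 10*lam aperture, L_p = 13 paths, CU at d_m = 50 m.
lam = 0.1
x = np.sort(np.random.default_rng(1).uniform(0.0, 10 * lam, size=4))
h_m = cu_channel(x, thetas=np.linspace(0.2, 2.9, 13), d_m=50.0, lam=lam)
```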
2) Radar Sensing: We consider that the radar operates using a tracking model, where an initial estimate of the target's Angle of Arrival (AoA) is obtained from previous scans. The radar channel is modeled as slowly varying over time, with the following expression [12]:

$$\mathbf{g}_s = \beta_s e^{j 2\pi f_d \tau} \mathbf{a}_s(\mathbf{x}), \tag{11}$$

where $f_d = 2 v f_c / c$ is the Doppler frequency, with $f_c$, $v$, and $c$ representing the carrier frequency, the target's velocity, and the speed of light, respectively, $\tau$ is the system sampling period, and $\beta_s$ represents the attenuation coefficient, which incorporates the round-trip path loss and the radar cross-section (RCS) $\alpha_s$, given by

$$\beta_s = \sqrt{\frac{\lambda^2 \alpha_s}{(4\pi)^3 d_s^4}}. \tag{12}$$

The received signal at the radar system consists of the target echo signal, interference from communication echoes, and noise, and is given by:

$$y_s = \underbrace{\mathbf{g}_s^H \mathbf{w}_0 s_0}_{\text{target}} + \underbrace{\sum_{j=1}^{M} \mathbf{g}_s^H \mathbf{w}_j s_j}_{\text{users}} + n_s, \tag{13}$$

where $n_s \sim \mathcal{CN}(0, \sigma_s^2)$ is the AWGN with mean zero and variance $\sigma_s^2$. The radar SINR can then be written as

$$\gamma_s(\mathbf{W}, \mathbf{x}) = \frac{\|\mathbf{g}_s^H(\mathbf{x})\mathbf{w}_0\|^2}{\sum_{j=1}^{M} \|\mathbf{g}_s^H(\mathbf{x})\mathbf{w}_j\|^2 + \sigma_s^2}. \tag{14}$$
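As a numerical sanity check, the radar SINR in (14) can be sketched as follows; the column layout of W (the sensing beamformer w_0 first, then the M user beamformers) is an assumed convention.

```python
import numpy as np

def radar_sinr(g_s, W, sigma_s2):
    """Radar SINR of Eq. (14). W is N x (M+1), with column 0 the sensing
    beamformer w_0 and columns 1..M the CU beamformers (assumed layout)."""
    signal = np.abs(g_s.conj() @ W[:, 0]) ** 2
    interference = np.sum(np.abs(g_s.conj() @ W[:, 1:]) ** 2)
    return signal / (interference + sigma_s2)

# Example with random N = 4 channel/beamformers; -80 dBm noise power in watts.
rng = np.random.default_rng(0)
g_s = rng.standard_normal(4) + 1j * rng.standard_normal(4)
W = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
print(radar_sinr(g_s, W, sigma_s2=10 ** (-80 / 10) * 1e-3))
```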
B. Problem Formulation

In this paper, we aim to design the BS transmit beamforming $\mathbf{W}$ and control the MA element positions $\mathbf{x}$ to maximize the worst-case secrecy rate, while satisfying the radar target detection performance constraint, the MA movement constraints, and the power constraint of the BS. Mathematically, the dynamic beamforming and antenna movement optimization problem is formulated as:

$$\max_{\mathbf{W},\,\mathbf{x}} \ \sum_{m \in \mathcal{M}} \Delta R_m(\mathbf{W}, \mathbf{x}), \tag{15a}$$
$$\text{s.t.} \quad \operatorname{Tr}(\mathbf{W}^H \mathbf{W}) \le P_0, \tag{15b}$$
$$\qquad |x_z - x_q| \ge D_0, \quad z, q \in \{1, \ldots, N\},\ z \neq q, \tag{15c}$$
$$\qquad \gamma_s(\mathbf{W}, \mathbf{x}) \ge \gamma_{\min}, \tag{15d}$$
where $P_0$ is the maximum transmission power and $\gamma_{\min}$ denotes the sensing SINR threshold. Since the objective function (15a) is non-concave with respect to the MA locations and the minimum distance constraint in (15c) is non-convex, problem (15) is a non-convex optimization problem. In addition, since we consider the problem over multiple time slots, input parameters such as the CU locations and AoDs vary over time, which means that traditional one-shot optimization algorithms would have to be re-run in every time slot, resulting in a high computation cost. Furthermore, we aim to maximize a long-term reward, and thus we propose to use DRL. In the next sections, we reformulate problem (15) as a stochastic optimization problem and develop an efficient DRL algorithm to obtain the solution.

III. STOCHASTIC OPTIMIZATION PROBLEM FORMULATION AND DRL ALGORITHM

A. Stochastic Optimization Problem

The system optimization is formulated as a Markov decision process defined by the tuple $\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R} \rangle$, whose elements represent the state space, action space, state transition probability function, and reward function, respectively.

1) State Space: The state space is represented by the factors that influence the objective and the action decisions. The state of the system can be determined by the CSI between the BS and the target/CUs. In particular, we also use the imperfect CSI based on (7) so that the proposed algorithm can directly find favorable actions based on the estimated imperfect CSI. Therefore, the state space at time slot $t$ is

$$S_t = \{\mathbf{g}_s^t, \mathbf{h}_m^t, \hat{\mathbf{h}}_{\mathrm{eva}}^t\}, \quad m \in \mathcal{M}. \tag{16}$$

2) Action Space: In the optimization problem in (15), the BS needs to design the transmit beamforming $\mathbf{W}$ and control the MA element positions $\mathbf{x}$ to maximize the worst-case secrecy rate. Thus, the action space is defined by

$$A_t = \{\mathbf{W}_t, \mathbf{x}_t\}. \tag{17}$$

3) Reward: The reward is designed to maximize the worst-case secrecy rate of the system while satisfying the important constraints in (15b), (15c), and (15d). Thus, the reward function is formulated as

$$r_t = \begin{cases} \sum_{m \in \mathcal{M}} \Delta R_m(\mathbf{W}_t, \mathbf{x}_t), & \text{if (15b), (15c), and (15d) hold}, \\ 0, & \text{otherwise}. \end{cases} \tag{18}$$

Accordingly, the BS only receives a reward when it satisfies the constraints on the transmission power, MA movement, and sensing SINR.
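For illustration, a minimal Gym-style sketch of this MDP is given below. The channel draws and the `secrecy_rate_fn` callable (a hypothetical stand-in for the worst-case secrecy rate $\Delta R_m$, whose definition is not part of this excerpt) are our own; only the constraint checks (15b)–(15d) and the reward rule (18) follow the text.

```python
import numpy as np

class SecureIsacMaEnv:
    """Sketch of the MDP in Section III-A: state (16), action (17), reward (18)."""

    def __init__(self, secrecy_rate_fn, M=3, N=4, P0_dbm=25.0, D0=0.05,
                 gamma_min_db=10.0, sigma_s2=1e-11, seed=0):
        self.secrecy_rate_fn = secrecy_rate_fn   # placeholder for Delta R_m
        self.M, self.N = M, N
        self.P0 = 10 ** (P0_dbm / 10) * 1e-3     # dBm -> W
        self.D0 = D0                             # minimum MA spacing, (15c)
        self.gamma_min = 10 ** (gamma_min_db / 10)
        self.sigma_s2 = sigma_s2
        self.rng = np.random.default_rng(seed)

    def _cn(self, n):  # placeholder CN(0, 1) draws; the paper uses (4), (7), (11)
        return (self.rng.standard_normal(n) + 1j * self.rng.standard_normal(n)) / np.sqrt(2)

    def _draw_channels(self):
        self.g_s = self._cn(self.N)                          # radar channel
        self.h = [self._cn(self.N) for _ in range(self.M)]   # CU channels
        self.h_eva_hat = self._cn(self.N)                    # estimated eavesdropper CSI

    def reset(self):
        self._draw_channels()
        return self._state()

    def _state(self):  # Eq. (16), stacked into a real vector for the agent
        z = np.concatenate([self.g_s, *self.h, self.h_eva_hat])
        return np.concatenate([z.real, z.imag])

    def step(self, W, x):
        """Action (17): W is N x (M+1) (w_0 sensing, w_1..w_M users), x the MA positions."""
        ok_power = np.trace(W.conj().T @ W).real <= self.P0              # (15b)
        gaps = np.abs(np.subtract.outer(x, x))[np.triu_indices(self.N, 1)]
        ok_space = np.all(gaps >= self.D0)                               # (15c)
        sig = np.abs(self.g_s.conj() @ W[:, 0]) ** 2
        itf = np.sum(np.abs(self.g_s.conj() @ W[:, 1:]) ** 2)
        ok_sinr = sig / (itf + self.sigma_s2) >= self.gamma_min          # (15d)
        feasible = bool(ok_power and ok_space and ok_sinr)
        reward = sum(self.secrecy_rate_fn(self, m, W, x)
                     for m in range(self.M)) if feasible else 0.0        # Eq. (18)
        self._draw_channels()                                            # next slot t+1
        return self._state(), reward
```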
B. PPO-based Algorithm

Stochastic optimization problems like the one presented in Section III-A can be solved using value-based algorithms, such as Deep Deterministic Policy Gradient (DDPG) [13], where the algorithm estimates the value function (Q-function) and selects actions to optimize the Q-value at each step. However, the learning process of these algorithms requires a large amount of resources and often yields suboptimal results, due to their off-policy training nature and the need to iterate the environment exploration process. In contrast, PPO, an on-policy algorithm, integrates policy gradient methods with a learned value function. This approach enables agents trained with PPO to continuously improve their policies while interacting with the environment, allowing them to adapt effectively to environmental changes while maintaining exploratory behavior.

The actor–critic architecture is central to the PPO framework. The actor network, parameterized by $\theta_a$, defines a policy based on the state, while the critic network, parameterized by $\theta_c$, serves as a value function (similar to value-based algorithms). The critic network evaluates the actor's actions and thus improves the policy through the policy gradient. Therefore, the PPO algorithm is effective because it can directly learn policies from the environment. In addition, by limiting the update of the new policy to a small range around the old policy, PPO balances policy improvement and learning stability.
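This "small range" is enforced by PPO's clipped surrogate objective. A minimal PyTorch sketch of the clipped policy loss is shown below; the clipping parameter (here `eps = 0.2`, a common default) is not specified in the excerpt.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate loss: the probability ratio of the new to the old
    policy is clipped to [1 - eps, 1 + eps], bounding each policy update."""
    ratio = torch.exp(logp_new - logp_old)          # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()    # maximize -> minimize negative
```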
Specifically, the PPO algorithm utilizes a state-value function $V(s_t)$ to estimate the expected return from the current state under the policy $\pi_{\theta_a}(a_t|s_t)$, which is mathematically formulated as:

$$V(s_t) = \mathbb{E}\left[\sum_{i=0}^{\infty} \gamma^i r_{t+i}\right]. \tag{19}$$
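For instance, a finite-horizon Monte-Carlo sample of the return in (19) can be computed by a simple backward recursion; here γ is read as the standard discount factor, whose value the excerpt does not state.

```python
def discounted_return(rewards, gamma=0.99):
    """Monte-Carlo sample of Eq. (19): sum_i gamma^i * r_{t+i} over one episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([0.4, 0.3, 0.5]))  # 0.4 + 0.99*0.3 + 0.99^2*0.5
```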
To estimate the advantage of each action, Generalized Advantage Estimation (GAE), with decay parameter $\omega$, is employed, and the estimated advantage function is then defined as:

$$\hat{A}(s_t) = \sum_{i=0}^{\infty} (\gamma \omega)^i \left( r_{t+i} + \gamma V(s_{t+i+1}) - V(s_{t+i}) \right). \tag{20}$$
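A truncated-horizon sketch of (20), computed backward over a collected trajectory, might look as follows; `values` holds the critic's estimates $V(s_t)$ for each step plus one bootstrap entry, and the γ and ω defaults are illustrative.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, omega=0.95):
    """Eq. (20) over a finite trajectory: values has len(rewards) + 1 entries
    (the last one bootstraps the value beyond the horizon)."""
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * omega * running
        adv[t] = running
    return adv
```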
IV. SIMULATION RESULTS

TABLE I: Simulation Parameters

Parameter        Value            Parameter    Value
M                3                N            4
τ                1 µs [15]        P0           25 dBm
Lp               13               λ            0.1 m
α                2.2              D0           λ/2
σ_m^2, σ_s^2     −80 dBm          ρ            −46 dB [16]
A_Tx             10 × λ           γ_min        10 dBm
d_max            100              d_min        25
f_c              2.4 GHz          c            3 × 10^8 m/s
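For reproducibility, the Table I settings can be collected into a single configuration object; the field names below are our own, and units are noted where Table I states them.

```python
SIM_PARAMS = {
    "M": 3, "N": 4,                        # CUs and MA elements
    "tau_s": 1e-6, "P0_dbm": 25.0,         # sampling period, power budget
    "L_p": 13, "lam_m": 0.1,               # channel paths, wavelength (m)
    "alpha": 2.2, "D0_m": 0.1 / 2,         # path-loss exponent, min spacing (lambda/2)
    "sigma2_dbm": -80.0, "rho_db": -46.0,  # noise power, reference path loss
    "A_tx_m": 10 * 0.1, "gamma_min_dbm": 10.0,
    "d_max": 100, "d_min": 25,             # units not stated in Table I
    "f_c_hz": 2.4e9, "c_mps": 3e8,
}
```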
[Figure: mean reward of the MA and FPA schemes under the PPO, A2C, and DDPG algorithms.]

Fig. 3: The impact of (a) the maximum transmission power and (b) the sensing SINR threshold on the sum secrecy rate.
As shown in Fig. 3b, the sum secrecy rate decreases as the sensing SINR threshold increases. This is because a more stringent target detection constraint demands a higher power allocation for the sensing beamformers. Consequently, with a fixed transmission power budget, less power is available for the communication tasks, leading to a reduction in the sum secrecy rate. Importantly, the PPO-based MA scheme exhibits superior adaptability compared to the baseline schemes. For example, when the sensing SINR threshold increases from 5 dB to 20 dB, the MA-PPO scheme experiences only a 9% reduction in performance, while the FPA-DDPG scheme suffers a significantly larger drop of 28%.

Fig. 4: The impact of (a) the number of antennas and (b) the number of channel paths on the sum secrecy rate.

Finally, we examine how the number of MA elements and channel paths affect the sum secrecy rate. Fig. 4a presents the sum secrecy rate as a function of the number of antenna elements. Increasing the number of antennas enables the system to achieve spatial diversity and beamforming gains, resulting in higher sum secrecy rates for all schemes. Notably, the superior spatial DoF utilization achieved by MAs allows them to require fewer antennas to deliver the same level of security performance. Specifically, the MA-PPO scheme achieves a sum secrecy rate of 0.6 bits/s/Hz with only 4 antennas, compared to the 6 antennas needed by the FPA-PPO scheme. Fig. 4b shows the variation in the sum secrecy rate with the number of channel paths. It is clear that as the number of channel paths grows, the sum secrecy rate increases considerably across all schemes, with the performance gaps becoming more pronounced. Specifically, when Lp = 3, the sum secrecy rate gaps between the MA and FPA systems are 0.02, 0.05, and 0.04 bit/s/Hz for the PPO, A2C, and DDPG algorithms, respectively, and these gaps expand to 0.22, 0.19, and 0.17 bit/s/Hz when Lp = 15. This improvement occurs because the additional channel paths provide MAs with greater diversity and more DoFs, further enhancing the sum secrecy rate.

V. CONCLUSIONS

This paper has investigated an MA-aided secure ISAC system. With their high flexibility, MAs improve channel conditions and ensure the secure communication of legitimate users. To optimize the system secrecy rate and tackle the challenges of dynamic and uncertain imperfect CSI scenarios, we have proposed the PPO algorithm to jointly optimize the BS transmit beamforming and the MA movements. The simulation results have confirmed the effectiveness of the proposed DRL algorithm and highlighted the performance advantages of MAs over traditional FPA schemes.

REFERENCES

[1] Z. Du, F. Liu, W. Yuan, C. Masouros, Z. Zhang, S. Xia, and G. Caire, “Integrated Sensing and Communications for V2I Networks: Dynamic Predictive Beamforming for Extended Vehicle Targets,” IEEE Transactions on Wireless Communications, vol. 22, no. 6, pp. 3612–3627, 2023.
[2] F. Liu, Y. Cui, C. Masouros, J. Xu, T. X. Han, Y. C. Eldar, and S. Buzzi, “Integrated Sensing and Communications: Toward Dual-Functional Wireless Networks for 6G and Beyond,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 6, pp. 1728–1767, 2022.
[3] F. Liu, C. Masouros, A. P. Petropulu, H. Griffiths, and L. Hanzo, “Joint Radar and Communication Design: Applications, State-of-the-Art, and the Road Ahead,” IEEE Transactions on Communications, vol. 68, no. 6, pp. 3834–3862, 2020.
[4] M. R. A. Khandaker, C. Masouros, and K.-K. Wong, “Constructive Interference Based Secure Precoding: A New Dimension in Physical Layer Security,” IEEE Transactions on Information Forensics and Security, vol. 13, no. 9, pp. 2256–2268, 2018.
[5] Z. Ren, L. Qiu, J. Xu, and D. W. K. Ng, “Robust Transmit Beamforming for Secure Integrated Sensing and Communication,” IEEE Transactions on Communications, vol. 71, no. 9, pp. 5549–5564, 2023.
[6] A. Molisch and M. Win, “MIMO Systems with Antenna Selection,” IEEE Microwave Magazine, vol. 5, no. 1, pp. 46–56, 2004.
[7] H. Qin, W. Chen, Q. Wu, Z. Zhang, Z. Li, and N. Cheng, “Cramér-Rao Bound Minimization for Movable Antenna-Assisted Multiuser Integrated Sensing and Communications,” IEEE Wireless Communications Letters, pp. 1–1, 2024.
[8] Y. Ma, K. Liu, Y. Liu, L. Zhu, and Z. Xiao, “Movable-Antenna Aided Secure Transmission for RIS-ISAC Systems,” 2024. [Online]. Available: https://arxiv.org/abs/2410.03426
[9] C. Wang, G. Li, H. Zhang, K.-K. Wong, Z. Li, D. W. K. Ng, and C.-B. Chae, “Fluid Antenna System Liberating Multiuser MIMO for ISAC via Deep Reinforcement Learning,” IEEE Transactions on Wireless Communications, vol. 23, no. 9, pp. 10879–10894, 2024.
[10] C. Weng, Y. Chen, L. Zhu, and Y. Wang, “Learning-Based Joint Beamforming and Antenna Movement Design for Movable Antenna Systems,” IEEE Wireless Communications Letters, vol. 13, no. 8, pp. 2120–2124, 2024.
[11] X. Wei, W. Mei, D. Wang, B. Ning, and Z. Chen, “Joint Beamforming and Antenna Position Optimization for Movable Antenna-Assisted Spectrum Sharing,” IEEE Wireless Communications Letters, vol. 13, no. 9, pp. 2502–2506, 2024.
[12] Z. Liu, S. Aditya, H. Li, and B. Clerckx, “Joint Transmit and Receive Beamforming Design in Full-Duplex Integrated Sensing and Communications,” IEEE Journal on Selected Areas in Communications, vol. 41, no. 9, pp. 2907–2919, 2023.
[13] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous Control with Deep Reinforcement Learning,” International Conference on Learning Representations (ICLR), 2016.
[14] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous Methods for Deep Reinforcement Learning,” Proceedings of The 33rd International Conference on Machine Learning, vol. 48, pp. 1928–1937, 2016.
[15] S. Huang, M. Zhang, Y. Gao, and Z. Feng, “MIMO Radar Aided mmWave Time-Varying Channel Estimation in MU-MIMO V2X Communications,” IEEE Transactions on Wireless Communications, vol. 20, no. 11, pp. 7581–7594, 2021.
[16] W. Mei, X. Wei, B. Ning, Z. Chen, and R. Zhang, “Movable-Antenna Position Optimization: A Graph-Based Approach,” IEEE Wireless Communications Letters, vol. 13, no. 7, pp. 1853–1857, 2024.